BERT-style pre-training on ConvNets? Sparse masked modeling with hierarchies from Beijing U, ByteDance, and Oxford U leads the way
The field of Natural Language Processing (NLP) has made remarkable advances driven by the power of large language models built on the Transformer architecture, such as Google’s BERT and OpenAI’s GPT. Transformer architectures have also recently achieved state-of-the-art performance on various computer vision tasks. Efforts to extend this pre-training approach to BERT-style masked-image modeling (where a …
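To make the BERT analogy concrete, below is a minimal sketch of the general masked-image-modeling recipe: split an image into patches, randomly mask a fraction of them, and compute a reconstruction loss only on the masked positions. This is an illustrative PyTorch toy, not the method from the paper discussed here; the helper names `patchify` and `random_mask` and the 60% mask ratio are assumptions for the example.

```python
# Hedged sketch of BERT-style masked image modeling on patches.
# Helper names and the mask ratio are illustrative, not from the paper.
import torch

def patchify(images, patch_size=16):
    """Split a batch of images (B, C, H, W) into flattened patches (B, N, C*p*p)."""
    B, C, H, W = images.shape
    p = patch_size
    patches = images.unfold(2, p, p).unfold(3, p, p)        # (B, C, H/p, W/p, p, p)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
    return patches

def random_mask(patches, mask_ratio=0.6):
    """Randomly mask a fraction of patches; return the masked input and the boolean mask."""
    B, N, D = patches.shape
    num_mask = int(N * mask_ratio)
    noise = torch.rand(B, N)
    ids = noise.argsort(dim=1)                               # random permutation per image
    mask = torch.zeros(B, N, dtype=torch.bool)
    mask.scatter_(1, ids[:, :num_mask], True)                # True = masked patch
    masked = patches.clone()
    masked[mask] = 0.0                                       # zero out masked patches
    return masked, mask

# Toy usage: a real model would encode the visible patches and reconstruct the
# masked ones; the loss is computed only on masked positions, mirroring BERT's
# masked-token prediction objective.
images = torch.randn(2, 3, 224, 224)
patches = patchify(images)
masked_patches, mask = random_mask(patches)
recon = masked_patches                                       # placeholder for a model's output
loss = ((recon - patches) ** 2)[mask].mean()
print(loss.item())
```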