Google DeepMind Researchers Propose a Novel AI Method Called Sparse Fine-grained Contrastive Alignment (SPARC) for Fine-Grained Vision-Language Pretraining
Contrastive pre-training on large, noisy image-text datasets has become a popular way to build general vision representations. These models align global image and text features in a shared embedding space by pulling matched image-text pairs together and pushing mismatched pairs apart, and they excel at tasks such as image classification and retrieval. However, they struggle with fine-grained tasks such as localization and understanding spatial relationships. Recent efforts…
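
To make the global alignment described above concrete, here is a minimal sketch of a symmetric contrastive loss over a batch of image and text embeddings. It illustrates the general CLIP-style objective, not SPARC itself. The function names, the toy embeddings, and the temperature value of 0.07 are assumptions chosen for illustration and do not come from the article.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; row k of each input is assumed to be a matched pair.

    The temperature default is an illustrative assumption, not a value from the article.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)

    # logits[i][j]: scaled cosine similarity between image i and text j.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text direction: each image should pick out its own caption.
    image_to_text = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: each caption should pick out its own image.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    text_to_image = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (image_to_text + text_to_image)


# Toy global embeddings (hypothetical data): two images and two captions.
toy_images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned = contrastive_loss(toy_images, matched_texts)
misaligned = contrastive_loss(toy_images, swapped_texts)
print(f"aligned loss: {aligned:.6f}, misaligned loss: {misaligned:.6f}")
```

Because each image and each caption is reduced to a single vector before comparison, a loss of this kind only rewards agreement at the whole-image level, which is one way to see why such models can fall short on localization and spatial relationships.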
