Contrastive Language-Image Pre-training (CLIP)

Summary: A video digest of the paper "Learning Transferable Visual Models From Natural Language Supervision" by A. Radford et al., published at ICML 2021, which introduced the CLIP family of models. The paper can be found on arXiv here.
Topics: computer vision, zero-shot learning, vision and language
Slides: link (PDF)

