Tag: vision-language
7 verified claims carrying this tag. Each has 2+ primary sources and an HMAC-SHA256 signature.
CLIP introduced in paper: Learning Transferable Visual Models From Natural Language Supervision (Radford et al., 2021).
bcdef949cc6d3644 · 2 sources · 100% confidence
Flamingo introduced in paper: "Flamingo: a Visual Language Model for Few-Shot Learning" (Alayrac et al., 2022), a DeepMind few-shot vision-language model.
72ea74efc723bd06 · 2 sources · 100% confidence
Mistral Pixtral 12B publicly released on: 2024-09-11 by Mistral AI — 12B multimodal vision-language model, Apache 2.0.
16ad8de26aee4424 · 2 sources · 100% confidence
Meta Llama 3.2 Vision publicly released on: 2024-09-25 by Meta — 11B + 90B vision-language variants of Llama 3.2.
8b8ff1a29ec72daa · 2 sources · 100% confidence
Cohere Aya Vision publicly released on: 2025-03-04 by Cohere For AI — multilingual open-weight vision-language models (8B + 32B), 23 languages.
e3a0e94cbeffe2c4 · 2 sources · 100% confidence
Allen AI Molmo publicly released on: 2024-09-25 by Allen Institute for AI — fully-open multimodal VLM family (1B/7B/72B), Apache 2.0.
dc2a0606a47f3ac4 · 2 sources · 100% confidence
Show and Tell (Neural Image Caption Generator) introduced in paper: Show and Tell: A Neural Image Caption Generator (Vinyals et al., 2014).
47f58a443dd825ac · 2 sources · 82% confidence