Tag
multimodal
19 verified claims carrying this tag. Each has 2+ primary sources and an HMAC-SHA256 signature.
CLIP (Contrastive Language-Image Pretraining) introduced in paper: Learning Transferable Visual Models From Natural Language Supervision (Radford et al., 2021).
85a3ca745eaf4ee0 · 2 sources · 100% confidence
GPT-4o released on: 2024-05-13.
bd065b91ca6e880b · 1 source · 100% confidence
Llama 3.2 (multimodal release including 11B and 90B vision models) released on: 2024-09-25.
e27816c692a28ce9 · 2 sources · 100% confidence
CLIP introduced in paper: Learning Transferable Visual Models From Natural Language Supervision (Radford et al., 2021).
bcdef949cc6d3644 · 2 sources · 100% confidence
DALL·E 2 released on: 2022-04-06.
0b0e64476bd25bd6 · 2 sources · 100% confidence
Flamingo introduced in: Alayrac et al. 2022 — DeepMind few-shot vision-language model.
72ea74efc723bd06 · 2 sources · 100% confidence
Llama 4 released on: 2025-04-05 by Meta — Scout + Maverick + Behemoth lineup.
d5ce871dc69e7b04 · 2 sources · 100% confidence
GPT-4 Vision publicly released on: 2023-09-25 by OpenAI.
e15378ee47c08761 · 2 sources · 100% confidence
Mistral Pixtral 12B publicly released on: 2024-09-11 by Mistral AI — 12B multimodal vision-language model, Apache 2.0.
16ad8de26aee4424 · 2 sources · 100% confidence
Meta Llama 3.2 Vision publicly released on: 2024-09-25 by Meta — 11B + 90B vision-language variants of Llama 3.2.
8b8ff1a29ec72daa · 2 sources · 100% confidence
OpenAI o4-mini publicly released on: 2025-04-16 by OpenAI — successor to o3-mini reasoning model with multimodal input + tool use.
dc1848333261768b · 2 sources · 100% confidence
Cohere Embed v4 publicly released on: 2025-04-09 by Cohere — multimodal embedding model, 256k context support, 100+ languages.
2f9c856fbdd1a2c7 · 2 sources · 100% confidence
Google Gemma 3 publicly released on: 2025-03-12 by Google DeepMind — Gemma 3 family (1B/4B/12B/27B), 128k context, multimodal vision.
a832aaba30ee658e · 2 sources · 100% confidence
Black Forest Labs Flux.1 Kontext publicly released on: 2025-05-29 by Black Forest Labs — image generation with in-context editing (text + image input).
288c8e0043942285 · 2 sources · 95% confidence
Anthropic Files API publicly released on: 2025-03-25 by Anthropic — file upload + reference API for Claude, supports PDF + image + spreadsheet.
d5836ef4cc0e8a20 · 2 sources · 85% confidence
Microsoft Phi-4 Multimodal publicly released on: 2025-02-26 by Microsoft — Phi-4-multimodal 5.6B parameters with audio + image + text.
cd1056deef1c0597 · 2 sources · 100% confidence
Reka Core publicly released on: 2024-04-15 by Reka AI — frontier multimodal LLM (text + image + audio + video input).
6569f95366abae46 · 2 sources · 95% confidence
Allen AI Molmo publicly released on: 2024-09-25 by Allen Institute for AI — fully-open multimodal VLM family (1B/7B/72B), Apache 2.0.
dc2a0606a47f3ac4 · 2 sources · 100% confidence
Jina AI founded in: 2020 by Han Xiao — open-source multimodal AI infrastructure (Jina Framework, Jina Embeddings, Reader API).
3cabd56c2d4ac308 · 2 sources · 100% confidence