Tag: reasoning
14 verified claims carry this tag. Each is backed by at least two primary sources and signed with HMAC-SHA256.
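This page does not document how the HMAC-SHA256 signatures are computed. As a rough illustration only, here is a sketch of how a claim could be signed and verified with Python's standard `hmac` and `hashlib` modules. The key `SECRET_KEY`, the JSON canonicalization in `canonical_payload`, and the claim fields are all hypothetical assumptions, not this site's actual scheme.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; the real signing key is not published on this page.
SECRET_KEY = b"example-secret-key"


def canonical_payload(claim: dict) -> bytes:
    # Assumption: the signed message is the claim serialized as compact JSON
    # with sorted keys, so the same claim always yields the same bytes.
    return json.dumps(claim, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_claim(claim: dict, key: bytes = SECRET_KEY) -> str:
    # HMAC-SHA256 over the canonical payload, hex-encoded (64 hex characters).
    return hmac.new(key, canonical_payload(claim), hashlib.sha256).hexdigest()


def verify_claim(claim: dict, signature: str, key: bytes = SECRET_KEY) -> bool:
    # Recompute the tag and compare in constant time.
    expected = sign_claim(claim, key)
    return hmac.compare_digest(expected, signature)


# Hypothetical claim record for illustration.
claim = {
    "text": "Grok 3 publicly released by xAI on 2025-02-17.",
    "tag": "reasoning",
    "sources": 2,
}
sig = sign_claim(claim)
print(verify_claim(claim, sig))  # True

# Changing any field (here, the date) invalidates the signature.
tampered = dict(claim, text="Grok 3 publicly released by xAI on 2025-03-17.")
print(verify_claim(tampered, sig))  # False
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not depend on how many leading characters match. A full HMAC-SHA256 tag is 32 bytes, or 64 hex characters.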
ReAct (Reasoning + Acting) introduced in the paper "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022).
fceea64fa7d04d3a · 2 sources · 100% confidence
DeepSeek-R1 released by DeepSeek on 2025-01-20, with chain-of-thought reasoning capabilities.
c6660e2e910f2680 · 2 sources · 100% confidence
OpenAI o1 publicly released on 2024-12-05 as the full (non-preview) version, with chain-of-thought reasoning.
b5990a667b668e82 · 2 sources · 100% confidence
OpenAI o3-mini publicly released on 2025-01-31.
9c871f7e7d056dc8 · 2 sources · 100% confidence
Gemini 2.5 Pro publicly released by Google DeepMind on 2025-03-25.
1529ff37ac65916a · 2 sources · 100% confidence
Tree of Thoughts introduced in the paper "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" (Yao et al., 2023).
9d7676f71d1ee4f3 · 2 sources · 100% confidence
ARC-AGI benchmark introduced in the paper "On the Measure of Intelligence" (Chollet, 2019), which presented the Abstraction and Reasoning Corpus (ARC).
cc5df3c14d35fa49 · 2 sources · 100% confidence
Grok 3 publicly released by xAI on 2025-02-17.
232df1b22ed7dca3 · 2 sources · 100% confidence
Chain-of-Thought (CoT) introduced in the paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (Wei et al., 2022).
a8503ad535423b54 · 2 sources · 100% confidence
Group Relative Policy Optimization (GRPO) introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (Shao et al., 2024).
f73e50d63643df21 · 3 sources · 92% confidence
GPQA benchmark introduced in the paper "GPQA: A Graduate-Level Google-Proof Q&A Benchmark" (Rein et al., 2023).
26f75f130f7b395a · 3 sources · 92% confidence
GSM8K introduced in the paper "Training Verifiers to Solve Math Word Problems" (Cobbe et al., 2021).
dc1ccb567aff584d · 3 sources · 92% confidence
MATH dataset introduced in the paper "Measuring Mathematical Problem Solving With the MATH Dataset" (Hendrycks et al., 2021).
8c1f847ae98570da · 3 sources · 92% confidence
MMLU-Pro benchmark introduced in the paper "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" (Wang et al., 2024).
2df92e0b0e4c891b · 3 sources · 92% confidence