Tag: dataset
7 verified claims carry this tag. Each has at least two primary sources and an HMAC-SHA256 signature.
ImageNet dataset introduced in paper: ImageNet: A Large-Scale Hierarchical Image Database (Deng et al., 2009).
045e628def62181d · 2 sources · 100% confidence
Common Crawl founded in: 2007.
4a2689e6230ef2e1 · 2 sources · 95% confidence
C4 (Colossal Clean Crawled Corpus) introduced in paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019).
0d24c97977ebd744 · 2 sources · 100% confidence
The Pile dataset released on: 2020-12-31.
4aef1422b96df26c · 2 sources · 100% confidence
RedPajama dataset released on: 2023-04-17.
ea8b7be3a49101be · 2 sources · 95% confidence
GSM8K introduced in paper: Training Verifiers to Solve Math Word Problems (Cobbe et al., 2021).
dc1ccb567aff584d · 3 sources · 92% confidence
MATH dataset introduced in paper: Measuring Mathematical Problem Solving With the MATH Dataset (Hendrycks et al., 2021).
8c1f847ae98570da · 3 sources · 92% confidence
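The HMAC-SHA256 signature mentioned at the top of this page can be illustrated with a minimal sketch using Python's standard library. The page does not say which fields are signed, how they are serialized, or how keys are managed, so the function names `sign_claim` and `verify_claim`, the example key, and the choice to sign the claim's UTF-8 text are all assumptions made for illustration only.

```python
import hashlib
import hmac


def sign_claim(claim_text: str, key: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the claim's UTF-8 text.

    Hypothetical helper: the actual signed payload format is not
    documented on this page.
    """
    return hmac.new(key, claim_text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_claim(claim_text: str, signature_hex: str, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_claim(claim_text, key)
    return hmac.compare_digest(expected, signature_hex)


# Assumed example key; a real deployment would load a secret from secure storage.
key = b"example-secret-key"
claim = "Common Crawl founded in: 2007."

signature = sign_claim(claim, key)
is_valid = verify_claim(claim, signature, key)
is_tampered_valid = verify_claim(claim.replace("2007", "2008"), signature, key)
```

With a matching key and unmodified text, `is_valid` is true; changing any character of the claim, as in the `2008` variant, makes verification fail. `hmac.compare_digest` is used instead of `==` to avoid leaking timing information during comparison.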