The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Advances in Neural Information Processing Systems 35 (NeurIPS 2022), pages 31809-31826. Curran Associates, Inc., December 2022.
