togethercomputer/RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

last commit: Dec 7, 2024
stars: 4,934