huggingface/datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Last commit: May 6, 2026

Stars: 3,066 (+6 in 7 days, +54 in 30 days, +171 in 90 days)