huggingface/datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Last commit: Apr 10, 2026
Stars: 2,989 (7d: +8, 30d: -, 90d: -)