NVIDIA/Model-Optimizer
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.
[view on github]last commit: May 24, 2026
stars
2,753
7d
+60
30d
+182
90d
+706
## star history