NVIDIA/Model-Optimizer

A unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM, TensorRT, and vLLM to speed up inference.

Last commit: Apr 15, 2026 · Stars: 2,488 (+78 in the last 7 days)