666DZY666/micronet
A Python library for compressing neural networks via quantization and pruning, and deploying them on TensorRT with FP32/FP16/INT8 support.

Velocity (7d): +1.0 ★/day · Trend: steady
The library provides three families of model compression techniques: quantization-aware training (QAT) for both high-bit and low-bit (ternary and binary) schemes, post-training quantization (PTQ) for 8-bit inference, and structured pruning. It also supports fusing batch normalization layers into the preceding layer so that the fused weights can be quantized directly. For deployment, it integrates with TensorRT, offering FP32 and FP16 inference, INT8 inference with calibration, operator adaptation, and dynamic shape support.
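To make two of the ideas above concrete, here is a minimal pure-Python sketch of batch normalization fusion and symmetric 8-bit quantization. It is not micronet's API: the function names (`fold_batchnorm`, `quantize_symmetric`, `dequantize`), the list-of-lists weight layout, and the per-tensor max-abs scale are all assumptions chosen for illustration, and a real implementation would operate on framework tensors.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN statistics into a linear layer (hypothetical helper).

    weights holds one row per output channel; every other argument holds
    one value per output channel.
    """
    folded_w, folded_b = [], []
    for row, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        # BN(y) = g * (y - m) / sqrt(v + eps) + bt, with y = row . x + b
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in row])
        folded_b.append((b - m) * scale + bt)
    return folded_w, folded_b


def quantize_symmetric(values, num_bits=8):
    """Per-tensor symmetric quantization using a max-abs scale (assumption)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to [-qmax, qmax].
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer levels back to approximate real values."""
    return [q * scale for q in quantized]
```

Folding first and quantizing second means only one set of weights per layer has to be quantized, and the reconstruction error of each value stays within half a quantization step as long as it lies inside the clamped range.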