ikawrakow/ik_llama.cpp

A high-performance fork of llama.cpp adding advanced quantization methods and optimized hybrid CPU/GPU inference for large language models.

Star velocity (7d): +3.8 ★/day · Trend: steady

This repository extends llama.cpp with state-of-the-art quantization formats and performance improvements for running large language models. It adds row-interleaved quant packing, fused MoE operations, FlashMLA optimizations, and first-class Bitnet support. The focus is on efficient inference using hybrid CPU/GPU compute backends with improved memory utilization and throughput.
