tjake/Jlama

A Java-based inference engine for running LLMs locally with quantization and SIMD acceleration

Velocity (7d): +1.2 ★ / day
Trend: steady

Jlama is a Java LLM inference engine that runs large language models directly inside Java applications. It supports popular model architectures, including Llama, Gemma, Mistral, and Qwen2, along with paged attention, mixture of experts, and tool calling. It handles the F32, F16, and BF16 data types as well as the Q8 and Q4 quantization formats, and offers optional SIMD acceleration and WebGPU support for better performance.
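
To give a rough sense of what an 8-bit quantization format such as Q8 involves, here is a minimal, generic sketch of symmetric block quantization in plain Java. It is not Jlama's code or API: the class name `Q8Sketch`, its methods, and the one-scale-per-block layout are assumptions chosen for illustration, and Jlama's actual formats may differ.

```java
public final class Q8Sketch {
    // Hypothetical helper: pick a scale so the largest magnitude maps to 127.
    public static float scaleFor(float[] block) {
        float maxAbs = 0f;
        for (float v : block) {
            maxAbs = Math.max(maxAbs, Math.abs(v));
        }
        // An all-zero block gets an arbitrary non-zero scale to avoid dividing by zero.
        return maxAbs == 0f ? 1f : maxAbs / 127f;
    }

    // Store each value as a signed byte: round(value / scale), clamped to [-127, 127].
    public static byte[] quantize(float[] block, float scale) {
        byte[] q = new byte[block.length];
        for (int i = 0; i < block.length; i++) {
            int r = Math.round(block[i] / scale);
            q[i] = (byte) Math.max(-127, Math.min(127, r));
        }
        return q;
    }

    // Recover approximate floats: each one is off by at most about half a scale step.
    public static float[] dequantize(byte[] q, float scale) {
        float[] out = new float[q.length];
        for (int i = 0; i < q.length; i++) {
            out[i] = q[i] * scale;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] weights = {0.5f, -1.0f, 0.25f, 0.0f};
        float scale = scaleFor(weights);
        byte[] q = quantize(weights, scale);
        float[] restored = dequantize(q, scale);
        for (int i = 0; i < weights.length; i++) {
            System.out.println(weights[i] + " -> " + q[i] + " -> " + restored[i]);
        }
    }
}
```

The trade-off is one byte per value plus a shared scale, instead of four bytes per F32 value, at the cost of a small rounding error per element.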
