mukel/llama3.java

A pure Java implementation of Llama 3+ LLM inference with GGUF support, vectorized SIMD operations, and GraalVM Native Image optimization.

★ 811 stars · Java · Inference · Serving · Language Models


Velocity (7d): +1.0 ★ / day · Trend: steady

This project implements inference for Llama 3, 3.1, and 3.2 models entirely in Java, with no external dependencies. It parses the GGUF model format, supports Q4_0 and Q8_0 quantization, implements Grouped-Query Attention, and applies RoPE scaling for the 3.1 variant. It uses Java’s Vector API for fast SIMD matrix operations and can be compiled with GraalVM Native Image to reduce time-to-first-token.
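
To make the quantization support concrete, here is a minimal sketch of a dot product between one Q8_0-encoded weight row and a float vector. It assumes the standard GGML Q8_0 block layout (a 2-byte little-endian fp16 scale followed by 32 signed 8-bit values). The class and method names (Q8Dot, dot, fp16ToFloat) are hypothetical and not taken from the project, and the loop is plain scalar Java rather than Vector API code, so it only illustrates the arithmetic that a SIMD kernel would speed up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class Q8Dot {
    static final int BLOCK_SIZE = 32;              // quantized values per Q8_0 block
    static final int BLOCK_BYTES = 2 + BLOCK_SIZE; // fp16 scale + 32 int8 values

    /** Decodes an IEEE 754 half-precision value stored in a short. */
    static float fp16ToFloat(short h) {
        int bits = h & 0xFFFF;
        int sign = bits >>> 15;
        int exp = (bits >>> 10) & 0x1F;
        int mant = bits & 0x3FF;
        float value;
        if (exp == 0) {
            value = Math.scalb((float) mant, -24);            // zero or subnormal
        } else if (exp == 31) {
            value = (mant == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            value = Math.scalb(1f + mant / 1024f, exp - 15);  // normal number
        }
        return sign == 0 ? value : -value;
    }

    /**
     * Dot product of a Q8_0-encoded row with x.
     * Assumes row is little-endian and holds x.length / 32 blocks.
     */
    static float dot(ByteBuffer row, float[] x) {
        if (x.length % BLOCK_SIZE != 0) {
            throw new IllegalArgumentException("length must be a multiple of 32");
        }
        float sum = 0f;
        for (int b = 0; b < x.length / BLOCK_SIZE; b++) {
            int base = b * BLOCK_BYTES;
            float scale = fp16ToFloat(row.getShort(base));
            float blockSum = 0f;
            for (int i = 0; i < BLOCK_SIZE; i++) {
                // each stored byte q represents the weight scale * q
                blockSum += row.get(base + 2 + i) * x[b * BLOCK_SIZE + i];
            }
            sum += scale * blockSum; // apply the per-block scale once
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer row = ByteBuffer.allocate(BLOCK_BYTES).order(ByteOrder.LITTLE_ENDIAN);
        row.putShort(0, (short) 0x3800);           // fp16 encoding of 0.5
        for (int i = 0; i < BLOCK_SIZE; i++) {
            row.put(2 + i, (byte) (i - 16));       // quants -16 .. 15
        }
        float[] x = new float[BLOCK_SIZE];
        Arrays.fill(x, 1f);
        System.out.println(dot(row, x));           // prints -8.0
    }
}
```

Applying the scale once per block, rather than once per element, keeps the inner loop to a multiply and an add, which is the part a vectorized implementation would map onto SIMD lanes.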