mukel/llama3.java
A pure Java implementation of Llama 3+ LLM inference with GGUF support, vectorized SIMD operations, and GraalVM Native Image optimization.

This project implements inference for Llama 3, 3.1, and 3.2 models entirely in Java, with no external dependencies. It parses the GGUF model format and supports several quantization formats (Q4_0 through Q8_0), Grouped-Query Attention, and RoPE scaling for the 3.1 variant. Matrix operations use Java’s Vector API for SIMD acceleration, and the project can be compiled with GraalVM Native Image to reduce time-to-first-token.
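
To illustrate the Grouped-Query Attention mentioned above, here is a minimal sketch of how query heads can be mapped onto a smaller set of shared key/value heads. The class name `GqaSketch` and the method `kvHeadFor` are hypothetical and are not taken from this project's source; the head counts (32 query heads, 8 KV heads) match the published Llama 3 8B configuration and are used only as an example.

```java
// Hypothetical illustration of Grouped-Query Attention head mapping.
// Names here are assumptions for this sketch, not the project's actual API.
final class GqaSketch {
    private GqaSketch() {}

    /**
     * Returns the index of the key/value head that a given query head reads from.
     * Consecutive query heads are grouped, and each group shares one KV head.
     */
    static int kvHeadFor(int queryHead, int nHeads, int nKvHeads) {
        if (nKvHeads <= 0 || nHeads % nKvHeads != 0) {
            throw new IllegalArgumentException("nHeads must be a positive multiple of nKvHeads");
        }
        if (queryHead < 0 || queryHead >= nHeads) {
            throw new IndexOutOfBoundsException("queryHead out of range: " + queryHead);
        }
        int groupSize = nHeads / nKvHeads; // number of query heads per KV head
        return queryHead / groupSize;
    }

    public static void main(String[] args) {
        int nHeads = 32;   // example: query heads in Llama 3 8B
        int nKvHeads = 8;  // example: key/value heads in Llama 3 8B
        for (int h = 0; h < nHeads; h++) {
            System.out.println("query head " + h + " -> kv head " + kvHeadFor(h, nHeads, nKvHeads));
        }
    }
}
```

With these example sizes, every four consecutive query heads share one KV head, so the key/value cache holds 8 heads per layer instead of 32. When `nKvHeads` equals `nHeads`, the mapping reduces to ordinary multi-head attention.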