
mukel/llama3.java

A pure Java implementation of Llama 3+ LLM inference with GGUF support, vectorized SIMD operations, and GraalVM Native Image optimization.

Star velocity (7d): +1.0 ★ / day · Trend: steady

This project implements inference for Llama 3, 3.1, and 3.2 models entirely in Java, with no external dependencies. It parses the GGUF model format and supports quantization formats from Q4_0 through Q8_0, Grouped-Query Attention, and the RoPE scaling used by the 3.1 variant. Matrix operations use Java’s Vector API for SIMD acceleration, and the project can be compiled with GraalVM Native Image to reduce time-to-first-token.
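To make the quantization support more concrete, here is a minimal scalar sketch of a dot product against Q8_0-quantized weights. It is not taken from the project. The class `Q8Dot` and its method names are hypothetical, and it assumes the standard GGML Q8_0 block layout (one little-endian fp16 scale followed by 32 signed 8-bit values). A Vector API version would process several lanes of the inner loop at once; plain loops are used here so the example runs without incubator-module flags.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class, not part of llama3.java.
final class Q8Dot {
    static final int BLOCK_SIZE = 32;                   // weights per Q8_0 block
    static final int BYTES_PER_BLOCK = 2 + BLOCK_SIZE;  // fp16 scale + 32 int8 quants

    // Decodes IEEE 754 half-precision bits into a float.
    static float halfToFloat(short bits) {
        int h = bits & 0xFFFF;
        int sign = (h >>> 15) & 1;
        int exp = (h >>> 10) & 0x1F;
        int mant = h & 0x3FF;
        float value;
        if (exp == 0) {
            value = mant * (float) Math.pow(2, -24);    // zero or subnormal
        } else if (exp == 31) {
            value = (mant == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            value = (1 + mant / 1024f) * (float) Math.pow(2, exp - 15);
        }
        return sign == 0 ? value : -value;
    }

    // Dot product of Q8_0 weights with a float vector.
    // Assumes `weights` is little-endian and x.length is a multiple of 32.
    static float dot(ByteBuffer weights, float[] x) {
        float sum = 0f;
        int blocks = x.length / BLOCK_SIZE;
        for (int b = 0; b < blocks; b++) {
            int base = b * BYTES_PER_BLOCK;
            float scale = halfToFloat(weights.getShort(base));
            float blockSum = 0f;
            for (int i = 0; i < BLOCK_SIZE; i++) {
                // Each stored byte is a signed quant; the real weight is quant * scale.
                blockSum += weights.get(base + 2 + i) * x[b * BLOCK_SIZE + i];
            }
            sum += scale * blockSum;                    // apply the scale once per block
        }
        return sum;
    }

    // Builds a single Q8_0 block for demonstration: given fp16 scale bits, quants 0..31.
    static ByteBuffer demoBlock(short scaleBits) {
        ByteBuffer buf = ByteBuffer.allocate(BYTES_PER_BLOCK).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort(0, scaleBits);
        for (int i = 0; i < BLOCK_SIZE; i++) {
            buf.put(2 + i, (byte) i);
        }
        return buf;
    }
}
```

Applying the scale once per block rather than once per weight keeps the inner loop a plain integer-times-float accumulation, which is the part that benefits most from SIMD.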
