Is Jlama open source?

Yes. tjake/Jlama is open source, released under the Apache-2.0 license.

What language is Jlama written in?

tjake/Jlama is primarily written in Java.

How popular is Jlama?

tjake/Jlama has 1.3k stars on GitHub.

Where can I find Jlama?

tjake/Jlama is on GitHub at https://github.com/tjake/Jlama.

tjake/Jlama

A llama that actually runs on the JVM

Jlama runs Gemma, Llama, and Mistral models directly inside the JVM, no Python process required.

★ 1.3k stars | Java | Inference · Serving | Language Models

What it does

Jlama is a native Java inference engine for large language models. It loads Hugging Face SafeTensors weights directly into the JVM and runs inference for models like Llama 3, Mistral, Gemma, and GPT-2 without requiring a Python runtime or external process. You can embed it as a library, expose an OpenAI-compatible REST API, or coordinate distributed inference across a cluster.
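
Because the REST server is described as OpenAI-compatible, a client should be able to reach it with nothing more than the JDK's built-in HTTP client. The sketch below builds a chat-completions request. The base URL, the model name, and the /v1/chat/completions path are assumptions: the path follows the OpenAI API convention rather than anything stated here about Jlama, and ChatClientSketch is a hypothetical helper, not part of Jlama.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class, not part of Jlama.
final class ChatClientSketch {

    /** Escapes the characters that would break a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') sb.append("\\\"");
            else if (c == '\\') sb.append("\\\\");
            else if (c == '\n') sb.append("\\n");
            else if (c == '\r') sb.append("\\r");
            else if (c == '\t') sb.append("\\t");
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.toString();
    }

    /** Builds a minimal OpenAI-style chat body with a single user message. */
    static String chatBody(String model, String userMessage) {
        return "{\"model\":\"" + jsonEscape(model) + "\","
             + "\"messages\":[{\"role\":\"user\",\"content\":\""
             + jsonEscape(userMessage) + "\"}]}";
    }

    /** Builds the POST request; the endpoint path is assumed from the OpenAI convention. */
    static HttpRequest chatRequest(String baseUrl, String model, String userMessage) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/chat/completions"))
                .timeout(Duration.ofSeconds(60))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(model, userMessage)))
                .build();
    }

    /** Sends the request and returns the raw JSON response body. */
    static String send(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

With a server running, a call would look like ChatClientSketch.send(HttpClient.newHttpClient(), ChatClientSketch.chatRequest("http://localhost:8080", "example-model", "Say hello.")), where both the address and the model name are placeholders for whatever your server actually uses.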

The interesting bit

Instead of wrapping a C++ runtime via JNI, Jlama uses the JDK's incubating Vector API (jdk.incubator.vector, from Project Panama) as its default backend for SIMD operations, with optional native SIMD and experimental WebGPU support. The hot path stays in Java code that the JIT compiles to vector instructions, which keeps tensor data inside the JVM and avoids the usual cross-language marshaling overhead.

Key highlights

Supports modern LLM features: Paged Attention, Mixture of Experts, tool calling, and embeddings
Built-in quantization: runs Q4 and Q8 quantized models, plus F32, F16, and BF16 dtypes
Distributed inference via a coordinator/worker model
Ships with a CLI and an OpenAI-compatible REST server, or embeds via Langchain4j
Apache 2.0 licensed
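
To make the quantization highlight concrete, here is a rough sketch of the general idea behind 8-bit weights: each block of floats is stored as signed bytes plus one shared float scale, and dot products are computed per block before the scale is applied. This is a generic symmetric block-wise scheme shown for illustration only. The class name Q8BlockSketch and the block size of 32 are assumptions, and nothing here reflects Jlama's actual storage layout or API.

```java
// Hypothetical illustration of block-wise 8-bit quantization; not Jlama code.
final class Q8BlockSketch {
    static final int BLOCK = 32;   // assumed block size

    final byte[] q;                // one signed byte per weight
    final float[] scale;           // one scale per block

    private Q8BlockSketch(byte[] q, float[] scale) {
        this.q = q;
        this.scale = scale;
    }

    /** Quantizes weights so that w[i] is approximately q[i] * scale[i / BLOCK]. */
    static Q8BlockSketch quantize(float[] w) {
        int blocks = (w.length + BLOCK - 1) / BLOCK;
        byte[] q = new byte[w.length];
        float[] scale = new float[blocks];
        for (int b = 0; b < blocks; b++) {
            int start = b * BLOCK;
            int end = Math.min(start + BLOCK, w.length);
            float maxAbs = 0f;
            for (int i = start; i < end; i++) {
                maxAbs = Math.max(maxAbs, Math.abs(w[i]));
            }
            float s = maxAbs / 127f;   // map the largest magnitude to +/-127
            scale[b] = s;
            for (int i = start; i < end; i++) {
                if (s == 0f) {
                    q[i] = 0;          // all-zero block: avoid dividing by zero
                } else {
                    int r = Math.round(w[i] / s);
                    q[i] = (byte) Math.max(-127, Math.min(127, r));
                }
            }
        }
        return new Q8BlockSketch(q, scale);
    }

    /** Reconstructs a single weight from its byte and block scale. */
    float get(int i) {
        return q[i] * scale[i / BLOCK];
    }

    /** Dot product against float activations, applying each scale once per block. */
    float dot(float[] x) {
        float sum = 0f;
        for (int b = 0; b < scale.length; b++) {
            int start = b * BLOCK;
            int end = Math.min(start + BLOCK, q.length);
            float blockSum = 0f;
            for (int i = start; i < end; i++) {
                blockSum += q[i] * x[i];
            }
            sum += scale[b] * blockSum;
        }
        return sum;
    }
}
```

The trade-off is the usual one: storage drops to roughly a quarter of F32 (one byte per weight plus a small per-block overhead), at the cost of a reconstruction error of at most half a scale step per weight.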

Caveats

Requires Java 20 or later, launched with the incubating Vector API module and preview features enabled (--add-modules jdk.incubator.vector --enable-preview)
WebGPU backend is explicitly marked experimental
Native SIMD backend is optional; Panama Vector is the default

Verdict

Worth a look if you run a Java shop and want LLM inference colocated with your existing stack. Skip it if you are already optimized around Python or GPU-native frameworks and have no patience for JVM tuning.
