chengchingwen/Transformers.jl

BERT in Julia: no Python required, no Python assumed

A native Julia transformer stack that loads HuggingFace weights and runs on Flux, for developers who'd rather not leave the REPL.

★572 stars · Julia · Language Models · ML Frameworks


Velocity (7d): +0.2 ★/day · Trend: steady

What it does

Transformers.jl implements attention-based models natively in Julia on top of the Flux.jl framework. It loads pretrained HuggingFace weights through the hgf"..." string macro (the README example uses hgf"bert-base-uncased"), and its TextEncoders module handles the full preprocessing pipeline: tokenization, special-token injection, truncation, padding, and one-hot encoding.
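To make those preprocessing stages concrete, here is a minimal, dependency-free Julia sketch of what each one does to a small batch. It is not the TextEncoders API: the toy vocabulary, the whitespace tokenizer (real BERT uses WordPiece), the `(vocab, seqlen, batch)` one-hot layout, and the names `toy_encode`, `pad_batch`, and `onehot_batch` are all hypothetical. Each sentence becomes its own sequence; sentence pairs are out of scope.

```julia
# Hypothetical sketch of tokenize → special tokens → truncate → pad → one-hot.
# NOT Transformers.jl's TextEncoders implementation; toy vocabulary for illustration.
const VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "peter", "piper", "picked",
               "a", "peck", "fuzzy", "wuzzy", "was", "bear"]
const INDEX = Dict(tok => i for (i, tok) in enumerate(VOCAB))

# Toy tokenizer: lowercase and split on whitespace.
toy_tokenize(s) = String.(split(lowercase(s)))

# Truncate so that [CLS] + body + [SEP] fits in maxlen, then add the special tokens.
function add_special_and_truncate(tokens, maxlen)
    body = tokens[1:min(length(tokens), maxlen - 2)]
    return vcat("[CLS]", body, "[SEP]")
end

# Right-pad every sequence with [PAD] up to the longest one in the batch.
function pad_batch(seqs)
    len = maximum(length, seqs)
    return [vcat(s, fill("[PAD]", len - length(s))) for s in seqs]
end

# One-hot encode into a (vocab, seqlen, batch) Bool array; unknown words map to [UNK].
function onehot_batch(padded)
    V, L, B = length(VOCAB), length(padded[1]), length(padded)
    out = falses(V, L, B)
    for b in 1:B, (t, tok) in enumerate(padded[b])
        out[get(INDEX, tok, INDEX["[UNK]"]), t, b] = true
    end
    return out
end

function toy_encode(sentences; maxlen = 8)
    seqs = [add_special_and_truncate(toy_tokenize(s), maxlen) for s in sentences]
    padded = pad_batch(seqs)
    return padded, onehot_batch(padded)
end

padded, onehot = toy_encode(["Peter Piper picked a peck of pickled peppers",
                             "Fuzzy Wuzzy was a bear"])
```

With `maxlen = 8` the first sentence is truncated after "of", which is out of vocabulary and lands on `[UNK]`; the second sentence is one token short and gets a single `[PAD]`.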

The interesting bit

The project isn’t a wrapper around PyTorch or TensorFlow; it’s a from-scratch Julia implementation of the transformer architecture. That means GPU acceleration through Julia’s native CUDA stack and differentiation through Flux’s automatic differentiation, all without a Python process in sight. The hgf"..." string macro for loading pretrained models is a nice touch of Julia metaprogramming.
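For readers new to the trick: a Julia non-standard string literal is just a macro, so writing `prefix"text"` calls `@prefix_str("text")` when the code is parsed. The sketch below defines a hypothetical `hub"..."` literal to show the mechanism only. Unlike `hgf"..."` it loads nothing; it merely parses an assumed `name:revision` syntax into a NamedTuple at macro-expansion time.

```julia
# Hypothetical `hub"..."` literal (not part of Transformers.jl). Julia rewrites
# `hub"text"` into `@hub_str("text")`, so this body runs at macro-expansion time.
macro hub_str(s)
    # Split off an optional ":revision" suffix; default to "main" (assumed convention).
    name, rev = occursin(':', s) ? split(s, ':'; limit = 2) : (s, "main")
    # Malformed input fails during expansion, before any of the code runs.
    isempty(name) && error("hub\"...\" needs a model name")
    # Return a plain value; it is spliced into the call site as a constant.
    return (model = String(name), revision = String(rev))
end

spec = hub"bert-base-uncased"       # (model = "bert-base-uncased", revision = "main")
pinned = hub"bert-base-uncased:v1"  # (model = "bert-base-uncased", revision = "v1")
```

Because the parsing happens during expansion, the literal costs nothing at run time, which is what makes string macros a natural fit for identifiers like model names.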

Key highlights

Native Julia implementation of transformers (not Python bindings)
Loads HuggingFace pretrained models directly via the hgf"..." string macro
Built on Flux.jl for differentiation and GPU execution
Includes text encoding pipeline: tokenization, truncation, padding, special tokens
Examples folder with complete working code
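Because the model code is native Julia rather than Python bindings, a transformer’s core operation is ordinary array code. As a minimal illustration (plain Base Julia, no Flux, no GPU, and not the package’s actual implementation), scaled dot-product attention, softmax(KᵀQ / √d) applied to V, looks like this with one column per position, following Julia’s column-major convention:

```julia
using LinearAlgebra  # matrix products and adjoint (loaded by default; explicit for clarity)

# Column-wise softmax, stabilized by subtracting each column's maximum.
function softmax_cols(x::AbstractMatrix)
    e = exp.(x .- maximum(x; dims = 1))
    return e ./ sum(e; dims = 1)
end

# Scaled dot-product attention, one column per position:
# q is (d, n_q), k is (d, n_k), v is (d_v, n_k); the result is (d_v, n_q).
function attention(q::AbstractMatrix, k::AbstractMatrix, v::AbstractMatrix)
    scores = (k' * q) ./ sqrt(size(q, 1))  # (n_k, n_q): key-query similarity
    weights = softmax_cols(scores)         # each column sums to 1 over the keys
    return v * weights                     # weighted average of values per query
end

# Example: self-attention over 4 positions with 8-dimensional embeddings.
x = randn(8, 4)
y = attention(x, x, x)   # size(y) == (8, 4)
```

A full transformer layer adds learned query/key/value projections, multiple heads, and masking on top of this; the sketch omits all three to keep the core computation visible.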

Caveats

README warns the current version is “almost completely different” from 0.1.x; breaking changes have happened
Documentation is sparse beyond the dev docs link; most guidance points to “read the code in example”
Roughly 570 stars suggests a niche audience and a smaller ecosystem than the Python alternatives

Verdict

Worth a look if you’re already committed to Julia’s ML ecosystem and want to avoid Python interop overhead. If you’re not using Flux or need battle-tested production tooling, the Python HuggingFace stack remains the pragmatic default.