BERT in Julia: no Python required, no Python assumed
A native Julia transformer stack that loads HuggingFace weights and runs on Flux, for developers who'd rather not leave the REPL.

What it does
Transformers.jl implements attention-based models natively in Julia, built on the Flux.jl framework. It can load pretrained HuggingFace weights using a string-macro syntax such as hgf"bert-base-uncased", and its TextEncoders module handles the full text-preparation pipeline: tokenization, special-token injection, truncation, padding, and one-hot encoding.
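As a rough illustration of that pipeline, here is a minimal sketch modeled on the usage pattern in the project's README. It assumes Transformers.jl is installed and that the model files can be downloaded on first use. The sample sentences are illustrative, and the exact names (encode, decode, sample.token, hidden_state) may differ between releases, so treat it as a starting point rather than a reference.

```julia
# Assumption: Transformers.jl is installed and the model can be downloaded on first use.
# The names below follow the README's usage pattern and may change between releases.
using Transformers
using Transformers.TextEncoders
using Transformers.HuggingFace

# The hgf"..." string macro returns the text encoder and the model together.
textencoder, bert_model = hgf"bert-base-uncased"

# One batch holding a pair of consecutive sentences (illustrative text).
text = [["Peter Piper picked a peck of pickled peppers", "Fuzzy Wuzzy was a bear"]]

# encode tokenizes, adds special tokens, truncates or pads, and one-hot encodes.
sample = encode(textencoder, text)

# Decode back to token strings to inspect what the encoder produced.
tokens = decode(textencoder, sample.token)

# Run the model and read the final features from the hidden_state field.
bert_features = bert_model(sample).hidden_state
```

Printing tokens is a quick way to confirm that special tokens such as [CLS] and [SEP] were inserted where you expect before feeding the batch to the model.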
The interesting bit
The project isn’t a wrapper around PyTorch or TensorFlow; it’s a from-scratch Julia implementation of the transformer architecture. That means GPU acceleration via Julia’s native CUDA stack and differentiation through Flux’s AD, all without a Python process in sight. The hgf"..." string macro for loading pretrained models is a nice touch of Julia metaprogramming.
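For readers new to string macros: in Julia, a literal written as name"..." invokes a macro called @name_str, so hgf"..." resolves to a macro named hgf_str. The sketch below shows the mechanism with a hypothetical demo"..." macro and a made-up in-memory registry. It is not how Transformers.jl implements its loader, and nothing is downloaded.

```julia
# Hypothetical registry standing in for a real model hub; nothing is downloaded.
const DEMO_MODELS = Dict(
    "bert-base-uncased" => (layers = 12, hidden = 768),
)

# Defining `macro demo_str` is what makes the demo"..." literal syntax available.
macro demo_str(name)
    # `name` arrives as a plain String at macro-expansion time,
    # so unknown names can be rejected before the program runs.
    haskey(DEMO_MODELS, name) || error("unknown demo model: $name")
    # Return an expression that performs the lookup when the code executes.
    return :(DEMO_MODELS[$name])
end

config = demo"bert-base-uncased"
println(config.layers)
```

Because the macro body runs at expansion time, a loader built this way can validate or resolve the model name early, which is part of what makes the syntax more than decoration.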
Key highlights
- Native Julia implementation of transformers (not Python bindings)
- Loads HuggingFace pretrained models directly via string macros
- Built on Flux.jl for differentiation and GPU execution
- Includes text encoding pipeline: tokenization, truncation, padding, special tokens
- Examples folder with complete working code
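To make the Flux point concrete, here is a small sketch of differentiation and device placement using Flux on its own. It assumes Flux.jl is installed. The tiny Dense layer and the random input are illustrative stand-ins, not part of Transformers.jl.

```julia
# Assumption: Flux.jl is installed; layer sizes and data are illustrative only.
using Flux

model = Dense(4 => 2)        # small dense layer standing in for a transformer
x = rand(Float32, 4, 3)      # 4 features, batch of 3

# `gpu` moves data to the GPU when a functional GPU backend is loaded;
# otherwise it leaves everything on the CPU.
model, x = gpu(model), gpu(x)

# Differentiate a scalar loss with respect to the model's parameters.
loss(m) = sum(abs2, m(x))
grads = Flux.gradient(loss, model)[1]

# The gradient mirrors the layer's structure, e.g. grads.weight and grads.bias.
println(size(grads.weight))
```

The same two calls, gpu for placement and gradient for differentiation, are the ones a model built on Flux would be trained with, which is why no Python process is needed.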
Caveats
- README warns the current version is “almost completely different” from 0.1.x; breaking changes have happened
- Documentation is sparse beyond the dev docs link; most guidance points to “read the code in example”
- 573 stars suggest a niche audience and a smaller ecosystem than the Python alternatives
Verdict
Worth a look if you’re already committed to Julia’s ML ecosystem and want to avoid Python interop overhead. If you’re not using Flux or need battle-tested production tooling, the Python HuggingFace stack remains the pragmatic default.