maderix/ANE

Apple said the Neural Engine was inference-only. They were wrong.

This project exists to prove that Apple's inference-locked Neural Engine can run full transformer training if you bypass CoreML and speak to the hardware directly via reverse-engineered private APIs.

★ 6.9k stars · Objective-C · ML Frameworks

What it does

ANE is a research hack that trains small transformers directly on Apple’s Neural Engine by reverse-engineering the private _ANEClient and _ANECompiler APIs. Instead of using CoreML’s inference-only path, it constructs raw MIL compute graphs at runtime to run forward and backward passes on the NPU itself, while offloading weight gradients and the Adam optimizer to the CPU. The result is a fully functional training loop for models like Stories110M and Qwen3-0.6B on an M4 Mac, no GPU required.
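
The division of labor described above can be sketched in a few lines. This is an illustrative outline, not the project's code: `run_on_ane` and `weight_grads_on_cpu` are hypothetical callables standing in for the ANE kernels and the CPU gradient code, and the Adam hyperparameters are the textbook defaults rather than values taken from the repository.

```python
import math

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One in-place Adam update over flat lists of floats; t is the 1-based step count."""
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1.0 - beta1) * g        # first-moment estimate
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m[i] / (1.0 - beta1 ** t)              # bias correction
        v_hat = v[i] / (1.0 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

def train_step(params, state, batch, run_on_ane, weight_grads_on_cpu):
    """One training step split the way the summary describes (callables are hypothetical)."""
    # 1. Accelerator side: forward pass and backward pass for input gradients (dx).
    activations, dx = run_on_ane(params, batch)
    # 2. CPU side: weight gradients computed from saved activations and dx.
    grads = weight_grads_on_cpu(activations, dx)
    # 3. CPU side: optimizer update.
    state["t"] += 1
    adam_step(params, grads, state["m"], state["v"], state["t"])
    return params
```

Keeping the optimizer state (`m`, `v`, `t`) in an ordinary dictionary on the CPU also makes it straightforward to write to a checkpoint.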

The interesting bit

The real trick is treating the ANE not as a black-box accelerator but as a programmable coprocessor you can feed with hand-rolled MIL text. The author packs weights and activations into a single spatial IOSurface dimension so the compiled kernels stay fixed even as the weights change. The compiler leaks resources and allows only about 119 compiles per process; when that limit is reached, the program restarts itself with exec() and resumes from a checkpoint.
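
The restart-from-checkpoint workaround is a general pattern and can be sketched independently of the ANE. The version below is a minimal illustration under stated assumptions: the checkpoint holds only a step counter, `compiles_per_step` is a made-up knob standing in for whatever actually triggers a compile, and the `restart` parameter exists only so the loop can be exercised without replacing the process. None of these names come from the repository.

```python
import json
import os
import sys

COMPILE_BUDGET = 119  # approximate per-process compile limit quoted in the text

def save_checkpoint(path, step):
    """Write the checkpoint atomically so a restart never sees a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    """Return the saved step, or 0 when no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["step"]

def restart_process():
    """Replace the current process image with a fresh copy of the same command."""
    os.execv(sys.executable, [sys.executable] + sys.argv)

def train(path, total_steps, compiles_per_step=1, restart=restart_process):
    step = load_checkpoint(path)
    compiles = 0
    while step < total_steps:
        if compiles + compiles_per_step > COMPILE_BUDGET:
            save_checkpoint(path, step)
            restart()      # does not return when it really exec()s
            return step    # reached only when `restart` is a stub
        compiles += compiles_per_step  # stand-in for "compile and run one step"
        step += 1
    save_checkpoint(path, step)
    return step
```

Because exec() replaces the process image, any leaked compiler resources are released along with the old process, and the new process simply picks up from the saved step.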

Key highlights

Trains a 109M-parameter Stories110M model at ~91 ms/step and a 596M-parameter Qwen3-0.6B at ~412 ms/step on an M4, with the forward pass and the backward dx (input-gradient) pass running on the ANE.
INT8 W8A8 quantization pushes throughput to 35.1 TOPS, roughly 1.88× over FP16, by halving L2 SRAM bandwidth with MIL quantize/dequantize ops.
A zero-copy GPU↔ANE pipeline shares IOSurface memory, so the GPU can prefill while the ANE decodes without copying tensors.
No external dependencies: it resolves private APIs at runtime via objc_msgSend and uses only system frameworks.
The author is explicit that this is a weekend research project, not a maintained framework or CoreML replacement.
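
To make the W8A8 highlight concrete, here is a generic symmetric per-tensor INT8 scheme in plain Python. It is an assumption-laden sketch of what "quantize" and "dequantize" mean in general, not the MIL ops or the scaling strategy the project uses; the function names are invented for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (integer codes in [-127, 127], scale)."""
    max_abs = max((abs(x) for x in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]

def int8_dot(w_codes, w_scale, a_codes, a_scale):
    """W8A8 dot product: accumulate in integers, apply both scales once at the end."""
    acc = sum(w * a for w, a in zip(w_codes, a_codes))
    return acc * w_scale * a_scale
```

Each value is stored in one byte instead of the two bytes FP16 needs, which is where the halved memory traffic mentioned above comes from; the price is a rounding error of at most half a quantization step per value.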

Caveats

Hardware utilization is currently low, around 5–9% of peak, and many element-wise operations still fall back to the CPU.
The ANE ignores attn_mask in SDPA ops, so causal attention has to be decomposed across ANE and CPU manually.
Because the ANE compiler leaks resources, a single process can perform only about 119 compiles; the workaround is to restart the process from a checkpoint.
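
The attention caveat can be illustrated with a three-stage decomposition in which the causal mask is applied in its own step. Which stage runs on which device is an assumption made here for illustration; the source only says the work is split manually between the ANE and the CPU. The helper names are hypothetical and the code is plain Python rather than anything from the repository.

```python
import math

def matmul(a, b):
    """Naive matrix product over lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def scores_stage(q, k):
    """Stage 1 (assumed accelerator side): scaled Q K^T with no mask applied."""
    scale = 1.0 / math.sqrt(len(q[0]))
    return [[s * scale for s in row] for row in matmul(q, transpose(k))]

def causal_softmax_stage(scores):
    """Stage 2 (assumed CPU side): hide future positions, then softmax each row."""
    out = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]                 # position i may see positions 0..i
        peak = max(visible)                    # subtract the max for stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        out.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return out

def values_stage(probs, v):
    """Stage 3 (assumed accelerator side): weighted sum of the value vectors."""
    return matmul(probs, v)

def causal_attention(q, k, v):
    return values_stage(causal_softmax_stage(scores_stage(q, k)), v)
```

Since the mask lives entirely in stage 2, the two matrix products never need an attn_mask argument, which is the property the workaround relies on.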

Verdict

Grab this if you are an Apple Silicon reverse-engineering enthusiast or an edge-AI researcher curious about what NPUs can do when freed from vendor SDKs. Skip it if you need a production training stack—GPU training via MLX or PyTorch is still far more practical.

Frequently asked

What is maderix/ANE? This project exists to prove that Apple's inference-locked Neural Engine can run full transformer training if you bypass CoreML and speak to the hardware directly via reverse-engineered private APIs.
Is ANE open source? Yes. maderix/ANE is open source, released under the MIT license.
What language is ANE written in? maderix/ANE is primarily written in Objective-C.
How popular is ANE? maderix/ANE has 6.9k stars on GitHub.
Where can I find ANE? maderix/ANE is on GitHub at https://github.com/maderix/ANE.