Teaching neural networks to speedrun GTA V traffic laws
A supervised model that watches five frames of gameplay and decides which keys to mash, trained on 130 GB of human driving data.

What it does
T.E.D.D. 1104 is a PyTorch model that plays Grand Theft Auto V by looking at the screen. It ingests a sliding window of five screenshots taken 0.1 seconds apart, then predicts which keyboard keys (or Xbox controller inputs) to press next. The goal: reach a minimap waypoint fast while dodging cars, pedestrians, and the occasional lamp post. Everything is supervised — no reinforcement learning, just imitation of human players.
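The sliding window described above can be sketched with a fixed-length buffer. This is a minimal illustration, not the repo's capture code: the names FrameWindow and windows_from_stream are hypothetical, and integers stand in for screenshots. With five frames spaced 0.1 seconds apart, each window spans 0.4 seconds of gameplay.

```python
from collections import deque

WINDOW_SIZE = 5        # frames per model input, as described above
FRAME_INTERVAL = 0.1   # seconds between captured frames, as described above


class FrameWindow:
    """Hypothetical helper: keeps the most recent frames in arrival order."""

    def __init__(self, size=WINDOW_SIZE):
        self.frames = deque(maxlen=size)

    def push(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)

    def ready(self):
        # True once the buffer holds a full window.
        return len(self.frames) == self.frames.maxlen

    def sequence(self):
        # Oldest frame first, newest last.
        return list(self.frames)


def windows_from_stream(frames, size=WINDOW_SIZE):
    """Yield one full window per new frame once enough frames have arrived."""
    window = FrameWindow(size)
    for frame in frames:
        window.push(frame)
        if window.ready():
            yield window.sequence()


# Integers stand in for screenshots here.
for sequence in windows_from_stream(range(7)):
    print(sequence)
```

Each new frame produces a fresh five-frame sequence, so consecutive windows overlap by four frames.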
The interesting bit
The architecture stacks EfficientNetV2 image encoders with a transformer encoder over the temporal sequence, using a [CLS] token to spit out key combinations. It’s end-to-end classification (or optional regression), not modular perception-planning-control. The author also ships three model sizes — 26M, 68M, and 138M parameters — with pretrained weights, and claims the setup can generalize to “any existing video game.” That last part is aspirational; the repo only demonstrates GTA V.
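To make the classification framing concrete, here is a rough sketch of how a vector of class scores could be turned into a key combination. The class list and its order are assumptions for illustration only; the repo defines the real ones. The helper names softmax and decode_keys are hypothetical, and the model itself is not shown.

```python
import math

# Hypothetical class list: one class per key combination.
# The actual classes and their order are defined by the repo, not this sketch.
KEY_COMBINATIONS = [
    (),            # no key pressed
    ("A",),
    ("D",),
    ("W",),
    ("S",),
    ("A", "W"),
    ("A", "S"),
    ("D", "W"),
    ("D", "S"),
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_keys(logits):
    """Return the most probable key combination and its probability."""
    if len(logits) != len(KEY_COMBINATIONS):
        raise ValueError("expected one logit per key combination")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return KEY_COMBINATIONS[best], probs[best]


# Made-up scores in which the "A + W" class dominates.
scores = [0.0] * len(KEY_COMBINATIONS)
scores[5] = 4.0
keys, confidence = decode_keys(scores)
print(keys, round(confidence, 3))
```

Treating each combination as its own class keeps the output a single softmax, at the cost of the model never being able to emit a combination that is missing from the list.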
Key highlights
- Trained on ~130 GB of human-labelled data across weather and time-of-day variations
- Real-time inference script (run_TEDD1104.py) with parallel sequence recording for higher key-press frequency
- Supports both keyboard and virtual Xbox controller output regardless of training mode
- Training works on Linux; data generation and live inference require Windows 10/11
- Pretrained models available via GitHub Releases, with published test accuracies (city driving ~47-56% top-1, highway ~63-80%)
Caveats
- The 130 GB training dataset is listed as “Coming soon” — only dev and test sets are currently downloadable
- torchvision compatibility is pinned below 0.15.0 with a vague “future release” promise
- Test accuracy tables show macro-averaged city performance below 50% for the largest model, so “avoiding obstacles” is a generous framing
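For readers unfamiliar with the term, macro-averaged accuracy weights every class equally, so rare key combinations count as much as common ones. The sketch below uses made-up toy labels, not the repo's results, to show how it can sit well below plain (micro) accuracy; the function names are hypothetical.

```python
from collections import defaultdict


def micro_accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_accuracy(y_true, y_pred):
    """Mean of per-class accuracies, each class weighted equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    per_class = [hits[c] / totals[c] for c in totals]
    return sum(per_class) / len(per_class)


# Toy data only: a model that always predicts "W" on an imbalanced set.
y_true = ["W"] * 8 + ["AW"] * 2
y_pred = ["W"] * 10

print(micro_accuracy(y_true, y_pred))  # 0.8
print(macro_accuracy(y_true, y_pred))  # 0.5
```

A model that leans on the most common action can look decent on micro accuracy while the macro figure exposes how poorly it handles the rarer classes.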
Verdict
Worth a spin if you’re researching imitation learning or want a concrete PyTorch Lightning project that bridges computer vision and game interaction. Skip it if you need reproducible training data today or expect autonomous driving insights transferable to physical vehicles.