ikergarcia1996/Self-Driving-Car-in-Video-Games

Teaching neural networks to speedrun GTA V traffic laws

A supervised model that watches five frames of gameplay and decides which keys to mash, trained on 130 GB of human driving data.


What it does

T.E.D.D. 1104 is a PyTorch model that plays Grand Theft Auto V by looking at the screen. It ingests a sliding window of five screenshots taken 0.1 seconds apart, then predicts which keyboard keys (or Xbox controller inputs) to press next. The goal: reach a minimap waypoint fast while dodging cars, pedestrians, and the occasional lamp post. Everything is supervised — no reinforcement learning, just imitation of human players.
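The five-frame input can be pictured as a fixed-length buffer that drops its oldest screenshot whenever a new one arrives. Below is a minimal Python sketch of that idea. It assumes hypothetical grab_frame, predict_keys and press_keys callables; none of these names come from the repo, and the real inference script is not reproduced here.

```python
import collections
import time
from typing import Any, Callable, Deque, List

SEQUENCE_LEN = 5      # frames per model input (from the description above)
FRAME_INTERVAL = 0.1  # seconds between captured frames (from the description above)


class FrameWindow:
    """Holds the most recent SEQUENCE_LEN frames; the oldest falls off on push."""

    def __init__(self, length: int = SEQUENCE_LEN) -> None:
        self._frames: Deque[Any] = collections.deque(maxlen=length)

    def push(self, frame: Any) -> None:
        self._frames.append(frame)

    def ready(self) -> bool:
        return len(self._frames) == self._frames.maxlen

    def snapshot(self) -> List[Any]:
        return list(self._frames)  # oldest first, newest last


def drive_loop(
    grab_frame: Callable[[], Any],             # hypothetical screen-capture hook
    predict_keys: Callable[[List[Any]], Any],  # hypothetical model wrapper
    press_keys: Callable[[Any], None],         # hypothetical input-injection hook
    steps: int,
    interval: float = FRAME_INTERVAL,
) -> None:
    window = FrameWindow()
    for _ in range(steps):
        window.push(grab_frame())
        if window.ready():  # only predict once a full 5-frame sequence exists
            press_keys(predict_keys(window.snapshot()))
        time.sleep(interval)
```

A real loop would also subtract inference time from the sleep so captures stay close to 0.1 s apart. This sketch just sleeps a fixed interval.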

The interesting bit

The architecture runs an EfficientNetV2 image encoder over each of the five frames, feeds the resulting frame embeddings to a transformer encoder that models the temporal sequence, and classifies the output at a [CLS] token into key combinations. It’s end-to-end classification (or optional regression), not a modular perception-planning-control pipeline. The author also ships three model sizes (26M, 68M, and 138M parameters) with pretrained weights, and claims the setup can generalize to “any existing video game.” That last part is aspirational; the repo only demonstrates GTA V.
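Treating control as classification means every allowed key combination becomes one output class, and the scores from the [CLS] head are reduced to a single combination by argmax. A minimal sketch, assuming a hypothetical nine-class label set (no keys, W/A/S/D alone, and the four diagonals W+A, S+A, W+D, S+D). The repo's real class list and ordering may differ.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical label set for illustration; not read from the repo.
KEY_COMBOS: List[FrozenSet[str]] = [
    frozenset(),      # no keys pressed
    frozenset("A"),
    frozenset("D"),
    frozenset("W"),
    frozenset("S"),
    frozenset("WA"),  # frozenset("WA") == {"W", "A"}
    frozenset("SA"),
    frozenset("WD"),
    frozenset("SD"),
]


def encode(keys: Iterable[str]) -> int:
    """Map a set of pressed keys to its class index (ValueError if not a known combo)."""
    return KEY_COMBOS.index(frozenset(keys))


def decode(logits: List[float]) -> FrozenSet[str]:
    """Turn per-class scores (e.g. from a [CLS] head) into the argmax key combo."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return KEY_COMBOS[best]
```

Opposing pairs such as W+S get no class at all, which is one plausible reason to classify whole combinations rather than predict each key independently.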

Key highlights

  • Trained on ~130 GB of human-labelled data across weather and time-of-day variations
  • Real-time inference script (run_TEDD1104.py) with parallel sequence recording for higher key-press frequency
  • Supports both keyboard and virtual Xbox controller output regardless of training mode
  • Training works on Linux; data generation and live inference require Windows 10/11
  • Pretrained models available via GitHub Releases, with published test accuracies (city driving ~47-56% top-1, highway ~63-80%)
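How parallel sequence recording raises key-press frequency is not detailed here. One plausible reading, assumed for illustration rather than taken from run_TEDD1104.py, is to capture frames faster than every 0.1 s and deal them round-robin into several windows. Each window still sees 0.1 s spacing, but once all windows are warm, some window completes a fresh sequence on every captured frame.

```python
import collections
from typing import Any, Deque, List, Optional


class InterleavedWindows:
    """n_parallel sliding windows fed round-robin from one fast frame stream.

    If frames arrive every (0.1 / n_parallel) s, each window holds frames spaced
    0.1 s apart, and after warm-up every push yields a complete sequence.
    """

    def __init__(self, n_parallel: int, seq_len: int = 5) -> None:
        self._windows: List[Deque[Any]] = [
            collections.deque(maxlen=seq_len) for _ in range(n_parallel)
        ]
        self._count = 0

    def push(self, frame: Any) -> Optional[List[Any]]:
        """Add a frame; return its window's sequence once that window is full, else None."""
        window = self._windows[self._count % len(self._windows)]
        self._count += 1
        window.append(frame)
        if len(window) == window.maxlen:
            return list(window)
        return None
```

With n_parallel=2 and a frame every 0.05 s, the model receives a new five-frame input every 0.05 s, and each input still spans 0.4 s. The trade-off is that inference has to keep pace with the faster rate.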

Caveats

  • The 130 GB training dataset is listed as “Coming soon” — only dev and test sets are currently downloadable
  • torchvision compatibility is pinned below 0.15.0 with a vague “future release” promise
  • Test accuracy tables show macro-averaged city performance below 50% for the largest model, so “avoiding obstacles” is a generous framing
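Top-1 and macro-averaged numbers can diverge sharply when one key combination dominates the labels. A minimal sketch, assuming "macro-averaged" means the unweighted mean of per-class recall (the repo's tables may use a different macro metric), with toy labels invented purely for illustration:

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def accuracies(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (top-1 accuracy, macro-averaged per-class recall)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two equal-length, non-empty label sequences")
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    top1 = sum(correct.values()) / len(y_true)
    # Each class counts equally, however rare it is.
    macro = sum(correct[c] / total[c] for c in total) / len(total)
    return top1, macro
```

With eight correct "W" samples plus one missed "A" and one missed "D", top-1 is 0.8 while the macro average is about 0.33, because the two rare classes weigh as much as the common one.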

Verdict

Worth a spin if you’re researching imitation learning or want a concrete PyTorch Lightning project that bridges computer vision and game interaction. Skip it if you need reproducible training data today or expect autonomous driving insights transferable to physical vehicles.
