run-house/kubetorch

Kubernetes compute that feels like a local process pool

Kubetorch lets you ship Python functions to a K8s cluster without the usual YAML ceremony or ten-minute build cycles.

1.2k stars · Python · ML Frameworks · LLMOps · Eval
Velocity (7d): +0.8 ★/day · Trend: steady

What it does

Kubetorch is a Python SDK that wraps Kubernetes so you can run functions remotely as if they were local. You define the compute you need (a fractional CPU such as 0.1, one or more GPUs, and so on), decorate a function, and call it. The cluster handles the rest. Logs, exceptions, and hardware faults stream back in real time. There is no local runtime to install and no code serialization step.
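To make the "define compute, wrap a function, call it" flow concrete, here is a minimal local stand-in that mimics only the `fn(...).to(compute)` call shape quoted from the README. It does not import kubetorch and nothing in it talks to a cluster: the `Compute` and `RemoteFn` classes, their fields, and the `add` function are hypothetical names invented for illustration, not the library's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Compute:
    """Hypothetical compute spec; NOT the kubetorch class."""
    cpus: str = "0.1"
    gpus: int = 0


class RemoteFn:
    """Hypothetical wrapper that mimics the fn(...).to(compute) call shape."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func
        self.compute: Optional[Compute] = None

    def to(self, compute: Compute) -> "RemoteFn":
        # A real SDK would provision or reuse cluster resources here.
        self.compute = compute
        return self

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self.compute is None:
            raise RuntimeError("call .to(compute) before invoking the function")
        # A real SDK would dispatch to the cluster; this stand-in runs locally.
        return self.func(*args, **kwargs)


def fn(func: Callable[..., Any]) -> RemoteFn:
    """Wrap a plain function so it can be bound to a compute spec."""
    return RemoteFn(func)


def add(a: int, b: int) -> int:
    return a + b


remote_add = fn(add).to(Compute(cpus="0.1"))
print(remote_add(2, 3))  # prints 5
```

The point of the pattern is that the call site stays an ordinary function call; only the binding step names where the work should run.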

The interesting bit

The pitch is iteration speed: the README claims 1–3 second turnaround for complex ML workloads such as RL and distributed training, down from 10+ minutes. That matters because the gap between “works on my laptop” and “works on 64 GPUs” is where most ML infrastructure projects quietly die.

Key highlights

  • Python-native API: wrap a function with kt.fn(my_func).to(compute), then call it like a regular function
  • No local runtime dependency — works from IDEs, notebooks, CI, or production code
  • Helm-based controller deploys to your cluster; managed serverless option available through Runhouse
  • Claims 50%+ cost savings via bin-packing and dynamic scaling, plus fault handling with programmatic recovery
  • Apache 2.0 licensed; client and server components now unified in one repo
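For readers unfamiliar with why bin-packing can cut costs, the sketch below packs GPU requests onto fixed-size nodes using first-fit decreasing, a standard textbook heuristic. It is an illustration of the general idea only: the README does not say which algorithm Kubetorch's scheduler uses, and the job sizes and 8-GPU node capacity are made-up numbers.

```python
from typing import List


def first_fit_decreasing(requests: List[int], node_capacity: int) -> List[List[int]]:
    """Pack GPU requests onto nodes; returns the requests placed on each node."""
    nodes = []  # each entry is [remaining_capacity, [placed requests]]
    for req in sorted(requests, reverse=True):
        if req > node_capacity:
            raise ValueError(f"request {req} exceeds node capacity {node_capacity}")
        for node in nodes:
            if node[0] >= req:
                # Fits on an existing node: place it there.
                node[0] -= req
                node[1].append(req)
                break
        else:
            # No existing node has room: open a new one.
            nodes.append([node_capacity - req, [req]])
    return [placed for _, placed in nodes]


jobs = [4, 2, 2, 1, 1, 4, 2]  # GPUs requested per job (illustrative)
packed = first_fit_decreasing(jobs, node_capacity=8)
print(packed)       # [[4, 4], [2, 2, 2, 1, 1]]
print(len(packed))  # 2 nodes, versus 7 if every job got its own node
```

Whether savings on that scale show up in practice depends on the workload mix, which is exactly what the README's figure leaves unstated.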

Caveats

  • The README states its “100x faster”, “50%+ savings”, and “95% fewer faults” claims without any methodology or benchmarks
  • Version 0.5.0 suggests early-stage software; the managed serverless platform requires contacting the company directly

Verdict

Worth a look if you’re currently duct-taping Ray, K8s Jobs, and custom Docker builds together for ML experimentation. Skip it if you need battle-tested, fully transparent infrastructure or aren’t already running a Kubernetes cluster.
