pathwaycom/pathway

Python stream processing that secretly runs on Rust

Pathway lets you write ETL pipelines in Python, then executes them in a Rust engine built on Differential Dataflow.

63.1k stars · Python · RAG · Search · Data Tooling
Velocity (7d): +49 ★/day · Trend: steady

What it does

Pathway is a Python framework for stream processing, real-time analytics, and LLM/RAG pipelines. You write pipelines with a pandas-like API, connect to sources like Kafka or PostgreSQL, and run the same code locally, in batch mode, or on streaming data. The pitch is one codebase for development and production.

The interesting bit

The Python is just a frontend. The actual computation happens in a Rust engine based on Differential Dataflow, which performs incremental computation and can run multithreaded, multiprocess, or distributed. It’s a bit like writing PyTorch code that gets compiled to CUDA, except here your “GPU” is a Rust stream processor.
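
To make “incremental computation” concrete, here is a minimal pure-Python sketch of the general idea. It is not Pathway’s engine or API: the class IncrementalSum, the helper batch_sum, and the (key, value, diff) update format are hypothetical names chosen for illustration. Each update carries a diff of +1 (insert) or -1 (retract), and the aggregate is adjusted in place instead of being recomputed from all rows.

```python
class IncrementalSum:
    """Toy per-key sum maintained from (key, value, diff) updates.

    diff is +1 for an inserted row and -1 for a retracted row.
    Hypothetical illustration only; not part of Pathway.
    """

    def __init__(self):
        self._state = {}  # key -> (running_sum, live_row_count)

    def apply(self, key, value, diff):
        """Fold one change into the state and return (key, old_sum, new_sum)."""
        total, count = self._state.get(key, (0, 0))
        old = total if count else None
        total += value * diff
        count += diff
        if count:
            self._state[key] = (total, count)
        else:
            # No live rows remain for this key, so drop it from the output.
            self._state.pop(key, None)
        new = total if count else None
        return key, old, new

    def snapshot(self):
        """Current aggregate for every key that still has live rows."""
        return {key: total for key, (total, _) in self._state.items()}


def batch_sum(rows):
    """Recompute the same aggregate from scratch, for comparison."""
    out = {}
    for key, value in rows:
        out[key] = out.get(key, 0) + value
    return out


agg = IncrementalSum()
updates = [("a", 5, +1), ("b", 2, +1), ("a", 3, +1), ("a", 5, -1)]
deltas = [agg.apply(*update) for update in updates]

# After the retraction, only rows ("a", 3) and ("b", 2) are live.
print(agg.snapshot())
print(batch_sum([("a", 3), ("b", 2)]))
```

Both print calls show the same result, but the incremental version did a constant amount of work per update. That property is what lets one pipeline definition serve both batch and streaming inputs.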

Key highlights

  • Connectors for Kafka, GDrive, SharePoint, PostgreSQL, plus an Airbyte bridge to 300+ sources
  • Stateful transformations (joins, windowing, sorting) implemented in Rust; you can drop to arbitrary Python functions when needed
  • In-memory real-time Vector Index and LLM wrappers for RAG pipelines, with LlamaIndex and LangChain integrations
  • Persistence for crash recovery; free tier offers “at least once” consistency, enterprise adds “exactly once”
  • Deployment via Docker, Kubernetes, or pathway spawn with thread count flags
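
The difference between “at least once” and “exactly once” in the persistence bullet can be sketched in plain Python. This is a generic illustration of the two guarantees, not a description of how Pathway implements recovery; replay_after_crash, naive_total, and deduplicated_total are hypothetical helpers, and the event IDs are an assumption of the example.

```python
def replay_after_crash(events, checkpoint_offset, crash_after):
    """Simulate at-least-once delivery.

    The consumer handles the first `crash_after` events, crashes, and then
    restarts from the last checkpoint, so events between the checkpoint and
    the crash point are delivered a second time.
    """
    yield from events[:crash_after]
    yield from events[checkpoint_offset:]


def naive_total(deliveries):
    """Sum every delivery, counting replayed events twice."""
    return sum(amount for _, amount in deliveries)


def deduplicated_total(deliveries):
    """Sum each event ID once, giving an exactly-once result."""
    seen = set()
    total = 0
    for event_id, amount in deliveries:
        if event_id in seen:
            continue  # replayed event, already counted
        seen.add(event_id)
        total += amount
    return total


events = [(1, 10), (2, 20), (3, 30), (4, 40)]  # (event_id, amount)
deliveries = list(replay_after_crash(events, checkpoint_offset=2, crash_after=3))

print(naive_total(deliveries))         # event 3 is counted twice
print(deduplicated_total(deliveries))  # matches the true total of 100
```

With at-least-once delivery no event is lost, but downstream results can overcount unless the consumer is idempotent or deduplicates, which is the gap an exactly-once guarantee closes.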

Caveats

  • macOS and Linux only; Windows users need a VM
  • The “outperforms Flink, Spark, and Kafka Streaming” performance claim is stated but not substantiated with numbers in the README — the benchmarks section is truncated
  • Distributed scaling and “exactly once” consistency are enterprise features. Separately, the open-source version is released under the Business Source License (BSL), which is not OSI-approved

Verdict

Worth a look if you want Python ergonomics for stream processing without the usual Python performance ceiling, especially for real-time RAG or event-driven pipelines. If you’re already deep into Flink or Kafka Streams and happy there, the switching cost may not be justified.
