Google's on-device ML pipeline that ships in Meet and AR apps
MediaPipe is the graph-based framework Google uses to run perception models on phones, web, and edge devices without phoning home.

What it does
MediaPipe is a cross-platform framework for building machine-learning pipelines that run locally on mobile, web, desktop, and IoT hardware. It provides pre-built “Solutions” for vision, text, and audio tasks, plus a lower-level graph framework for wiring custom on-device inference. Google uses it in production for Meet background effects, AR features, and real-time hand tracking.
The interesting bit
The framework models ML inference as a dataflow graph of “Calculators” that pass timestamped “Packets” along streams, essentially a streaming-first, media-aware dataflow engine. That abstraction is what lets the same face-mesh model run on an Android phone, a browser tab, and a Raspberry Pi without rewriting the plumbing.
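A toy version of the Calculator/Packet model fits in a few dozen lines of plain Python. Everything in it is hypothetical and is not MediaPipe's API: `Packet`, `Calculator`, `Graph`, the stream names, and the "detector" are all made up. Real calculators are C++ classes wired together by a graph config. The sketch shows two ideas. First, every packet carries the timestamp of the frame it came from. Second, a node with several inputs fires only once it holds a packet at the same timestamp on every input. To keep it small, the sketch waits indefinitely for a missing input. As we understand it, MediaPipe also tracks timestamp bounds so that a node can move on when a packet will never arrive.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet: a payload tagged with its frame's timestamp."""
    timestamp: int
    payload: Any


class Calculator:
    """Hypothetical node that fires once all inputs share a timestamp."""

    def __init__(self, name: str, inputs: List[str], output: str,
                 fn: Callable[..., Any]) -> None:
        self.name = name
        self.inputs = inputs
        self.output = output
        self.fn = fn
        # Payloads waiting for their partners, keyed by timestamp, then stream.
        self.pending: Dict[int, Dict[str, Any]] = defaultdict(dict)

    def receive(self, stream: str, packet: Packet) -> Optional[Packet]:
        slot = self.pending[packet.timestamp]
        slot[stream] = packet.payload
        if len(slot) < len(self.inputs):
            return None  # still missing an input at this timestamp
        del self.pending[packet.timestamp]
        args = [slot[s] for s in self.inputs]
        # The output inherits the input timestamp, keeping streams aligned.
        return Packet(packet.timestamp, self.fn(*args))


class Graph:
    """Routes packets from each stream to the calculators that consume it."""

    def __init__(self, calculators: List[Calculator]) -> None:
        self.consumers: Dict[str, List[Calculator]] = defaultdict(list)
        for calc in calculators:
            for stream in calc.inputs:
                self.consumers[stream].append(calc)
        # Every packet seen on every stream, for inspection.
        self.observed: Dict[str, List[Packet]] = defaultdict(list)

    def send(self, stream: str, packet: Packet) -> None:
        self.observed[stream].append(packet)
        for calc in self.consumers[stream]:
            out = calc.receive(stream, packet)
            if out is not None:
                # Depth-first propagation; a real scheduler could run
                # ready nodes concurrently instead of recursing.
                self.send(calc.output, out)


# Toy pipeline: downscale -> detect, then join detections with the frame.
graph = Graph([
    Calculator("downscale", ["frame"], "small", lambda f: f[::2]),
    Calculator("detect", ["small"], "boxes",
               lambda s: [i for i, v in enumerate(s) if v > 5]),
    Calculator("annotate", ["frame", "boxes"], "annotated",
               lambda f, b: (len(f), b)),
])
for t, frame in enumerate([[1, 9, 7, 2], [8, 0, 6, 3]]):
    graph.send("frame", Packet(t, frame))

print(graph.observed["annotated"])
# → [Packet(timestamp=0, payload=(4, [1])), Packet(timestamp=1, payload=(4, [0, 1]))]
```

In this run, `annotate` receives `boxes` before `frame` for each timestamp, because the detector branch resolves first during propagation. It still emits exactly one correctly paired result per frame. Pairing inputs by timestamp rather than by arrival order is what keeps multi-branch pipelines in sync, whichever branch happens to finish first.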
Key highlights
- Ships with pre-trained models and cross-platform APIs (MediaPipe Tasks) for object detection, pose tracking, text classification, and audio classification
- Includes Model Maker for fine-tuning on your own data and Studio for browser-based benchmarking
- Framework layer exposes C++, Android, and iOS APIs for building custom pipelines from graph primitives
- Used in production by Google Meet, AR apps, prosthesis control (Mirru), and sign-language interfaces (SignAll)
- Support for the legacy Solutions ended in March 2023; the new MediaPipe Solutions are the path forward
Caveats
- Primary documentation has moved off GitHub to developers.google.com/mediapipe as of April 2023
- MediaPipe Solutions Preview is explicitly flagged as an early release
- Legacy Solutions code and binaries remain available “as-is” but without support
Verdict
Worth a look if you’re shipping computer-vision or audio ML to resource-constrained devices and want battle-tested infrastructure. Skip it if you need server-scale training frameworks or fully open governance: this is Google’s codebase, not a community foundation.