Microsoft's quiet framework for robots that need to think in real time
A C# toolkit for wiring up sensors, AI models, and actuators when latency actually matters.

What it does
\psi (yes, the Greek letter; it stands for Platform for Situated Intelligence) is a .NET framework for building systems that ingest multiple streaming sensor feeds (audio, video, depth, whatever), run them through AI components, and coordinate responses under tight latency constraints. Think social robots, HoloLens apps, or smart-space installations where “eventually consistent” means “failed demo.”
The interesting bit
The framework treats time as a first-class citizen. Streams carry explicit timestamps, and the infrastructure handles synchronization, persistence, and replay of temporally aligned multimodal data. This is the unglamorous plumbing that separates research prototypes from systems that actually work in the wild.
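To make "time as a first-class citizen" concrete, here is a minimal, language-neutral sketch of the idea in Python. It is not \psi's API (the real framework is C#). The `Message` type, the `join_nearest` function, and the millisecond tolerance are hypothetical names invented for illustration. Each message carries the time its data originated, and two streams are aligned on that time, not on arrival order.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Message:
    """Hypothetical stream item: a payload plus the time the data originated."""
    originating_ms: int
    value: Any


def join_nearest(
    primary: List[Message],
    secondary: List[Message],
    tolerance_ms: int,
) -> List[Tuple[int, Any, Any]]:
    """Pair each primary message with the secondary message closest in
    originating time. Primary messages with no partner within
    tolerance_ms are dropped. Both inputs must be sorted by time."""
    times = [m.originating_ms for m in secondary]
    pairs = []
    for msg in primary:
        # Only the neighbours on either side of the insertion point can be nearest.
        i = bisect_left(times, msg.originating_ms)
        candidates = secondary[max(i - 1, 0): i + 1]
        best = min(
            candidates,
            key=lambda c: abs(c.originating_ms - msg.originating_ms),
            default=None,
        )
        if best is not None and abs(best.originating_ms - msg.originating_ms) <= tolerance_ms:
            pairs.append((msg.originating_ms, msg.value, best.value))
    return pairs


if __name__ == "__main__":
    # Made-up feeds: video roughly every 33 ms, audio every 20 ms.
    video = [Message(0, "f0"), Message(33, "f1"), Message(66, "f2"), Message(100, "f3")]
    audio = [Message(t, f"a{t}") for t in range(0, 101, 20)]
    for pair in join_nearest(video, audio, tolerance_ms=6):
        print(pair)
```

Because the pairing depends only on the timestamps stored with the data, running it over a recording gives the same result as running it live. That property is what makes replaying temporally aligned data useful for debugging.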
Key highlights
- Cross-platform core (.NET Standard) with Windows/Linux support; some components are platform-specific
- Built-in visualization and annotation tools (PsiStudio) for debugging live and recorded pipelines
- Component ecosystem spans Azure Kinect, HoloLens 2, microphones, webcams, and various ML wrappers (including MaskRCNN)
- Includes SIGMA, a research-only mixed-reality task-assistance prototype built atop the framework
- Active use in academic labs (Georgia Tech, Cornell Tech, IMT Atlantique) and Microsoft Research projects
Caveats
- Still in beta (v0.19 as of March 2024); API stability not guaranteed
- Some components and tools are Windows-only, so “cross-platform” has fine print
- HoloLens and advanced sensor integrations require specific hardware and non-trivial setup
Verdict
Worth a look if you’re building embodied AI or multimodal interactive systems and have outgrown gluing together ROS nodes and Python scripts. Skip it if your problem is batch processing or doesn’t involve synchronizing multiple real-time sensor streams.