Microsoft's sandbox for fake focus groups and synthetic customers
An experimental Python library that lets you simulate believable personas inside Jupyter notebooks to test ads, products, and ideas before spending real money.

What it does
TinyTroupe spawns LLM-driven agents (TinyPersons with specific personalities, goals, and backstories) inside simulated TinyWorld environments. You script the scenario and the agents interact within it. The intended payoff: cheap, repeatable “customer interviews,” ad pre-testing, synthetic data generation, and focus-group-style feedback without recruiting humans.
The interesting bit
The project leans into simulation rather than assistant or game use cases. It includes mechanisms like Proposition checks (persona adherence, self-consistency) and an Intervention system for event-based adjustments to a running simulation. These features only make sense when you care about behavioral fidelity rather than task completion. There is also an empirical validator that statistically compares simulation outputs against real survey data, which suggests someone is actually trying to ground this in reality rather than vibes.
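TinyTroupe's validator is not reproduced here. As a rough illustration of what statistically comparing simulated answers with real survey answers can look like, the sketch below runs a permutation test on the total variation distance between two answer distributions. The function names, the sample data, and the choice of test are assumptions for illustration, not the library's method.

```python
import random
from collections import Counter


def tv_distance(a, b):
    """Total variation distance between the answer distributions of two samples."""
    counts_a, counts_b = Counter(a), Counter(b)
    categories = set(counts_a) | set(counts_b)
    return 0.5 * sum(
        abs(counts_a[c] / len(a) - counts_b[c] / len(b)) for c in categories
    )


def permutation_test(simulated, real, n_permutations=2000, seed=0):
    """Estimate how often a random relabeling of the pooled answers looks at least
    as different as the observed split, assuming both samples share one distribution."""
    rng = random.Random(seed)  # fixed seed so reruns give the same p-value
    observed = tv_distance(simulated, real)
    pooled = list(simulated) + list(real)
    n_sim = len(simulated)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Small tolerance so float rounding does not break exact ties.
        if tv_distance(pooled[:n_sim], pooled[n_sim:]) >= observed - 1e-12:
            as_extreme += 1
    # Add-one smoothing counts the observed split as one permutation,
    # so the estimate never reports an impossible p = 0.
    return observed, (as_extreme + 1) / (n_permutations + 1)


# Hypothetical purchase-intent answers: 20 synthetic personas vs. 20 survey respondents.
simulated = ["yes"] * 12 + ["maybe"] * 5 + ["no"] * 3
real = ["yes"] * 9 + ["maybe"] * 6 + ["no"] * 5
distance, p_value = permutation_test(simulated, real)
print(f"TV distance = {distance:.2f}, p = {p_value:.3f}")
```

A large p-value only means this test could not tell the two samples apart at this sample size; it does not show that the personas are faithful.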
Key highlights
- Jupyter-native workflow: define agents, run simulations, inspect conversations in notebooks
- Vision modality support (as of v0.7.0) for image-based product feedback
- Cost tracking at agent, environment, and client levels—useful when your “focus group” burns GPT-5-mini tokens
- Experimental Ollama support for local model runs
- Persona fragments for reusable personality components across agents
- Paper and validation notebooks included in the repo
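One plausible way to get agent-, environment-, and client-level cost views, sketched under the assumption that every LLM call can be tagged with the agent and environment that issued it, is a single call log rolled up by key. `CostLedger`, `CallRecord`, and the prices below are hypothetical and are not TinyTroupe's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One LLM call, tagged with the agent and environment that triggered it."""
    agent: str
    environment: str
    input_tokens: int
    output_tokens: int


class CostLedger:
    """Hypothetical client-level ledger: every call is logged once, then rolled up."""

    def __init__(self, usd_per_m_input, usd_per_m_output):
        self.usd_per_m_input = usd_per_m_input    # price per million input tokens
        self.usd_per_m_output = usd_per_m_output  # price per million output tokens
        self.calls = []

    def record(self, agent, environment, input_tokens, output_tokens):
        self.calls.append(CallRecord(agent, environment, input_tokens, output_tokens))

    def _cost(self, call):
        return (call.input_tokens * self.usd_per_m_input
                + call.output_tokens * self.usd_per_m_output) / 1_000_000

    def _rollup(self, key):
        # Sum call costs into buckets chosen by `key` (agent name, environment name, ...).
        totals = defaultdict(float)
        for call in self.calls:
            totals[key(call)] += self._cost(call)
        return dict(totals)

    def by_agent(self):
        return self._rollup(lambda call: call.agent)

    def by_environment(self):
        return self._rollup(lambda call: call.environment)

    def total(self):
        return sum(self._cost(call) for call in self.calls)


# Placeholder prices, not any provider's actual rates.
ledger = CostLedger(usd_per_m_input=1.0, usd_per_m_output=4.0)
ledger.record("physician", "focus_group", input_tokens=1000, output_tokens=500)
ledger.record("skeptic", "focus_group", input_tokens=2000, output_tokens=250)
ledger.record("physician", "ad_test", input_tokens=500, output_tokens=0)
print(ledger.by_agent(), ledger.by_environment(), ledger.total())
```

Logging each call once and deriving the per-agent and per-environment views from tags keeps the three totals consistent by construction; this sketch makes no claim that TinyTroupe is built this way.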
Caveats
- The API is explicitly unstable: Microsoft warns of frequent breaking changes and says the code still needs “further tidying up”
- The default model has changed several times (GPT-4o-mini → GPT-4.1-mini → GPT-5-mini), so existing configurations need retesting after each upgrade
- A prominent legal disclaimer marks outputs as research-only and puts full responsibility for their use on you
- Simulation outputs currently render best on dark backgrounds, which is a minor but real papercut
Verdict
Worth a look if you run user research, product testing, or ad evaluation and want to cheaply explore “what would a skeptical physician think of this?” before building for real. Skip it if you need production-stable APIs or are hoping for a drop-in replacement for actual human subjects—the authors are upfront that this is early-stage research tooling.