Pandas copilot that actually reads your data first
Sketch feeds column summaries into LLMs so its code suggestions know what they're working with.

What it does
Sketch is a Python library that bolts a .sketch accessor onto any pandas DataFrame. You can ask natural-language questions about your data (df.sketch.ask), request generated code snippets (df.sketch.howto), or even run LLM-powered transformations row-by-row (df.sketch.apply). No IDE plugin required—just import sketch and go.
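For readers who haven't seen the accessor pattern: pandas lets any library register a namespace on DataFrame through its public extension API. The snippet below is a minimal illustration of that mechanism, not Sketch's own code. The accessor name "peek" and the method describe_columns are hypothetical, chosen so they don't collide with anything Sketch registers.

```python
import pandas as pd


# Hypothetical accessor name "peek"; Sketch registers its own accessor separately.
@pd.api.extensions.register_dataframe_accessor("peek")
class PeekAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # pandas passes in the DataFrame the accessor is being used on.
        self._df = pandas_obj

    def describe_columns(self) -> dict:
        """Return a {column name: dtype string} mapping for the wrapped DataFrame."""
        return {str(col): str(dtype) for col, dtype in self._df.dtypes.items()}


df = pd.DataFrame({"city": ["Oslo", "Lima"], "population": [709000, 10092000]})
columns = df.peek.describe_columns()
print(columns)
```

Once the decorated class has been imported, every DataFrame in the session exposes df.peek, which is why a plain import is enough to make an accessor available.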
The interesting bit
The hook is in the name: “sketch” refers to data sketches, the approximation algorithms that summarize your columns cheaply. Rather than dumping the whole DataFrame into the prompt (expensive, slow, privacy nightmare), Sketch compresses the schema and statistics into context the LLM can actually use. It’s a pragmatic compression layer between your data and a language model that otherwise works blind.
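To make the idea concrete, here is a rough sketch of turning a DataFrame into a few lines of prompt context. It is an assumption-laden stand-in: it uses exact pandas statistics rather than approximation algorithms, and the helper names summarize_column and build_context are invented for illustration, not taken from Sketch.

```python
import pandas as pd


def summarize_column(series: pd.Series, max_examples: int = 3) -> dict:
    """Collect a few cheap statistics for one column (hypothetical helper)."""
    non_null = series.dropna()
    summary = {
        "name": str(series.name),
        "dtype": str(series.dtype),
        "rows": int(len(series)),
        "nulls": int(series.isna().sum()),
        # Exact count here; a real data sketch would estimate this instead.
        "distinct": int(non_null.nunique()),
        "examples": [str(v) for v in non_null.unique()[:max_examples]],
    }
    if pd.api.types.is_numeric_dtype(series) and not non_null.empty:
        summary["min"] = float(non_null.min())
        summary["max"] = float(non_null.max())
    return summary


def build_context(df: pd.DataFrame) -> str:
    """Render one compact line per column, suitable for pasting into a prompt."""
    lines = []
    for _, series in df.items():
        s = summarize_column(series)
        line = (
            f"{s['name']} ({s['dtype']}): rows={s['rows']}, nulls={s['nulls']}, "
            f"distinct={s['distinct']}, examples={s['examples']}"
        )
        if "min" in s:
            line += f", range=[{s['min']}, {s['max']}]"
        lines.append(line)
    return "\n".join(lines)


df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Oslo", None],
        "population": [709000, 10092000, 709000, 650000],
    }
)
context = build_context(df)
print(context)
```

The output grows with the number of columns, not the number of rows, which is the property that keeps prompts small. Note that the example values still leak a little raw data, so a privacy-sensitive setup would want to drop or mask them.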
Key highlights
- Three modes: ask for exploration, howto for code generation, apply for data generation/transforms
- Runs against a hosted endpoint by default (prompts.approx.dev) for zero-config startup
- Can switch to local Hugging Face models (MPT-7B, StarCoder) or your own OpenAI key via environment variables
- Built on the team’s own lambdaprompt library for templated LLM calls
- Explicitly targets the “glue work” of data cleaning, feature extraction, and compliance masking
Caveats
- The apply mode requires an OpenAI API key; the free hosted endpoint won’t cover everything
- Local model setup involves three environment variables and downloading weights, so “usable in seconds” really means “usable in seconds if you use their cloud endpoint”
- The README’s future hope of “custom made data + language foundation models” is just that: future hope
Verdict
Worth a spin if you live in pandas and want quick, context-aware code stubs without leaving your notebook. Skip it if you need deterministic, auditable data pipelines—this is exploratory acceleration, not production infrastructure.