A four-day crash course in making LLMs do your SQL homework
Course materials for data engineers who want LLMs to write queries, mimic their boss's LinkedIn voice, and build RAG chatbots.

What it does
This repo holds the lab materials for a DataExpert.io course on shoehorning LLMs into data engineering workflows. Over four days, students progress from generating SQL queries with LLMs, to orchestrating pipelines with LangChain, to deploying a “ZachGPT” RAG chatbot via Pinecone. It’s essentially a curated walkthrough with video lectures, Python setup via uv, and a PostgreSQL dump of sample data.
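Once the documents are embedded, the retrieval half of a RAG chatbot like ZachGPT fits in a few lines. The sketch below is a minimal illustration, not the course's code. A plain in-memory cosine-similarity search stands in for Pinecone, the three-dimensional vectors are toy values rather than real embeddings, and `retrieve` and `build_prompt` are hypothetical helpers.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, index, k=2):
    """Return the k texts whose vectors are most similar to query_vec.

    `index` is a hypothetical list of (text, vector) pairs standing in
    for a vector database such as Pinecone.
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context_chunks):
    """Put the retrieved chunks into the prompt so the LLM answers from them."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Toy "embeddings"; a real pipeline gets these from an embedding model.
index = [
    ("Day 1 covers prompt-to-SQL.", [1.0, 0.0, 0.0]),
    ("Day 4 covers RAG with Pinecone.", [0.0, 1.0, 0.1]),
    ("Setup uses uv sync.", [0.0, 0.1, 1.0]),
]
chunks = retrieve([0.1, 0.9, 0.0], index, k=1)
print(build_prompt("What does Day 4 cover?", chunks))
```

Swapping the list for a vector database changes where the similarity search runs. It does not change the shape of the prompt the LLM ultimately sees.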
The interesting bit
The course treats LLMs less as oracles and more as overeager interns: useful for drafting SQL from dimensional schemas, auto-generating LinkedIn posts in a specific human’s voice, and fielding business questions via retrieval-augmented generation. The progression from raw prompting to LangChain to vector databases mirrors how many teams actually adopt this stuff—incrementally, with growing API bills.
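Drafting SQL from a dimensional schema mostly comes down to two steps: put the schema in the prompt, then check the intern's work before it touches a database. The sketch below is minimal and rests on several assumptions. It uses a hypothetical two-table star schema (`dim_players`, `fact_matches`) and assumes the model replies with a plain SQL string. The LLM call itself is omitted, and `draft` is a hand-written stand-in for such a reply.

```python
import re

# Hypothetical star schema for illustration; the course's real tables are not shown in the source.
SCHEMA_DDL = (
    "CREATE TABLE dim_players (player_id INT PRIMARY KEY, gamertag TEXT);\n"
    "CREATE TABLE fact_matches (match_id INT, player_id INT, kills INT, played_at DATE);"
)

# Keywords for statements a read-only analytics assistant should never emit.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant)\b", re.IGNORECASE
)

def build_sql_prompt(question: str, ddl: str = SCHEMA_DDL) -> str:
    """Give the model the schema and constrain it to a single read-only query."""
    return (
        "You write PostgreSQL for the schema below.\n"
        f"{ddl}\n\n"
        "Reply with exactly one SELECT statement and no explanation.\n"
        f"Question: {question}"
    )

def looks_read_only(sql: str) -> bool:
    """Crude check on a model draft: one statement, SELECT/WITH only, no DML or DDL keywords.

    Keyword matching can reject harmless queries (e.g. a string literal
    containing 'delete'); it is an illustration, not a SQL parser.
    """
    body = sql.strip().rstrip(";")
    if ";" in body:
        return False  # more than one statement
    if not re.match(r"(select|with)\b", body, re.IGNORECASE):
        return False
    return FORBIDDEN.search(body) is None

prompt = build_sql_prompt("Which player has the most total kills?")
draft = (
    "SELECT p.gamertag, SUM(f.kills) AS total_kills "
    "FROM fact_matches f JOIN dim_players p USING (player_id) "
    "GROUP BY p.gamertag ORDER BY total_kills DESC LIMIT 1;"
)
print(looks_read_only(draft))
```

The keyword check is deliberately blunt. Connecting with a read-only database role is the sturdier safeguard.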
Key highlights
- Four structured days: prompt-to-SQL, LangChain orchestration, business-value automation, and RAG with Pinecone
- Uses a real PostgreSQL dump (`halo_data_dump.dump`) for hands-on query generation labs
- Requires an OpenAI API key throughout; a Pinecone key is added for Day 4 vector search
- Setup via `uv sync`, or `pip install .` as a fallback
- Links to external repos for the Day 3 (auto-feedback) and Day 4 (vector database) labs
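Every day needs the OpenAI key, and only Day 4 adds Pinecone, so a preflight check can catch a missing key before a lab fails halfway through. This is a minimal sketch. It assumes the variable names `OPENAI_API_KEY` and `PINECONE_API_KEY`, which the official OpenAI and Pinecone Python clients read by default, though the course may name them differently. `required_keys` and `missing_keys` are hypothetical helpers.

```python
import os
from collections.abc import Mapping

def required_keys(day: int) -> list[str]:
    """OpenAI is needed every day; Pinecone only for the Day 4 vector search lab."""
    keys = ["OPENAI_API_KEY"]
    if day >= 4:
        keys.append("PINECONE_API_KEY")
    return keys

def missing_keys(day: int, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required_keys(day) if not env.get(name)]
```

Passing `env` explicitly keeps the check testable. With the default, it reads the live process environment, so it works the same way whether the keys were exported in a macOS shell or set through the Windows environment-variable settings.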
Caveats
- README is essentially a syllabus with setup instructions; no actual code or notebooks visible in the provided source
- Live cohort dependency: Day 2 lab mentions a cloud database URL “Zach gives in Zoom”—self-serve learners must restore the local dump instead
- Mac-centric PostgreSQL download link; Windows users get a terse “Or set it in Windows” for env vars
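Self-serve learners who lack the cloud URL from the live session need to restore `halo_data_dump.dump` locally. This is a minimal sketch with several assumptions. It assumes the file is a custom-format archive that `pg_restore` can read, which the source does not confirm. It also assumes a local server on the default port 5432 and a hypothetical target database `halo` owned by the `postgres` user. The sketch only builds the command and the matching connection URL; it does not run anything.

```python
import shlex

def pg_restore_cmd(dump_path: str, dbname: str = "halo", user: str = "postgres",
                   host: str = "localhost", port: int = 5432) -> list[str]:
    """Build a pg_restore invocation that loads a custom-format dump into a local database."""
    return [
        "pg_restore",
        "--host", host,
        "--port", str(port),
        "--username", user,
        "--dbname", dbname,
        "--no-owner",  # skip ownership commands that reference the original machine's roles
        dump_path,
    ]

def local_db_url(dbname: str = "halo", user: str = "postgres",
                 host: str = "localhost", port: int = 5432) -> str:
    """libpq-style connection URL to use in place of the cloud URL from the live session."""
    return f"postgresql://{user}@{host}:{port}/{dbname}"

cmd = pg_restore_cmd("halo_data_dump.dump")
print(shlex.join(cmd))
print(local_db_url())
```

Create the empty database first (for example, with `createdb halo`). If the dump turns out to be a plain SQL file, `pg_restore` will refuse it, and `psql -f` is the right tool instead.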
Verdict
Worth bookmarking if you’re a data engineer who learns best from guided, project-based courses and doesn’t mind paying OpenAI for the privilege. Skip it if you want standalone, runnable code without video homework.