A neural network that reverse-engineers dinner from a photo
A 2018 student project trains CNNs on more than 400,000 food images to guess what you’re eating and fetch the recipe.

What it does
Upload a photo of a dish. The system classifies it into one of 230 food categories, retrieves the five most visually similar images from a dataset of more than 400,000, and cross-checks the category predictions against those neighbors’ known recipes. The output is a recipe that (hopefully) matches what’s on your plate. A web app called DeepChef was planned but is marked “work in progress.”
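To make the retrieval step concrete, here is a minimal sketch of finding the most similar images by feature vector. It is not the project's code: it swaps in brute-force cosine similarity for the approximate nearest-neighbor search the project uses, and the names `cosine_similarity`, `nearest_images`, and the toy three-dimensional vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_images(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[str]:
    """Return the ids of the k indexed images most similar to the query.

    Brute force: scores every image in the index. A real system over
    400,000+ images would use an approximate index instead.
    """
    scored = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [image_id for image_id, _ in scored[:k]]


# Toy index: image id -> (hypothetical) reduced feature vector.
toy_index = {
    "schnitzel_01": [0.9, 0.1, 0.0],
    "schnitzel_02": [0.8, 0.2, 0.1],
    "lasagne_01": [0.1, 0.9, 0.2],
    "salad_01": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.1, 0.0]
top_two = nearest_images(toy_query, toy_index, k=2)
print(top_two)
```

In a real pipeline the vectors would come from a CNN's penultimate layer, and `k=5` would match the five neighbors described above.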
The interesting bit
The author combines two fairly standard techniques, transfer-learned CNN classification (VGG, ResNet, etc.) and approximate nearest-neighbor search on PCA-reduced feature vectors, to work around a hard problem: different dishes often look nearly identical. The CNN’s top-5 category guesses get cross-checked against the categories of the visually closest images, a kind of ensemble voting between semantic and visual similarity.
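One way that voting could work is sketched below. The scoring rule is an assumption, since the write-up does not spell out how the two signals are weighted: each CNN candidate keeps its probability and gains a bonus proportional to how many retrieved neighbors share its category. The function name `rerank_categories`, the `neighbor_weight` parameter, and the sample numbers are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def rerank_categories(
    cnn_top5: List[Tuple[str, float]],
    neighbor_categories: List[str],
    neighbor_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Re-rank CNN category guesses using the categories of similar images.

    cnn_top5: (category, probability) pairs from the classifier.
    neighbor_categories: category label of each retrieved neighbor image.
    Assumed rule: score = probability + weight * share of neighbor votes.
    Only categories the CNN proposed are scored.
    """
    votes = Counter(neighbor_categories)
    total_votes = len(neighbor_categories) or 1  # avoid division by zero
    scores = {
        category: prob + neighbor_weight * votes.get(category, 0) / total_votes
        for category, prob in cnn_top5
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up example: the CNN slightly prefers lasagne,
# but three of the five nearest images are moussaka.
cnn_guesses = [
    ("lasagne", 0.40),
    ("moussaka", 0.35),
    ("casserole", 0.15),
    ("pizza", 0.06),
    ("quiche", 0.04),
]
neighbors = ["moussaka", "moussaka", "lasagne", "moussaka", "casserole"]
ranked = rerank_categories(cnn_guesses, neighbors)
print(ranked[0][0])
```

Here the neighbor votes flip the top guess from lasagne to moussaka, which is exactly the look-alike situation the combination is meant to handle.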
Key highlights
- Built on a large German-language dataset: >300,000 recipes from chefkoch.de with >400,000 associated images
- Explores the full pipeline from data cleaning and augmentation through kNN/k-Means baselines to transfer learning and custom CNN training
- Includes t-SNE visualization of the learned feature space
- The code is a set of heavily commented Jupyter notebooks, though the comments are in German
- Written as a 2018 Maturaarbeit (Swiss high school thesis)
Caveats
- The web application is listed as “work in progress” with no visible updates since 2018
- The core algorithm notebook is the primary runnable artifact; the rest is exploratory code
- Recipe categories were determined dynamically via topic modeling on recipe names; how well this scales to the long tail of world cuisine is unclear
Verdict
Worth a look if you’re building a food-recognition pipeline and want a worked example of combining CNN classification with nearest-neighbor retrieval. Skip it if you need production-ready code or care about cuisines beyond the German-language dataset’s coverage.