
cpldcpu/MisguidedAttention

A collection of prompts that test whether large language models can reason through modified versions of thought experiments and riddles without falling back to familiar but incorrect solutions.

476 stars · Python · LLMOps · Eval · Learning
Velocity (7d): +0.6 ★ / day · Trend: steady

This repository contains prompts designed to evaluate LLM reasoning by presenting modified versions of classic riddles, paradoxes, and thought experiments. The prompts are structured to trigger recognition of familiar problems while requiring different solutions, testing whether models apply logical deduction or fall back to memorized responses. An evaluation framework tracks how different models perform on this benchmark over time, with interactive results available via a GitHub Pages deployment.
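As a rough illustration of the idea described above, the following minimal sketch scores a model response against a modified riddle. Everything in it is an assumption made for illustration: the `PromptCase` schema, the `score_response` function, the keyword-matching approach, and the sample riddle are hypothetical and are not taken from the repository's actual evaluation framework.

```python
from dataclasses import dataclass


@dataclass
class PromptCase:
    """One benchmark item (hypothetical schema, not the repository's format)."""
    name: str
    prompt: str
    expected_keywords: list[str]  # phrases a correct answer should contain
    trap_keywords: list[str]      # phrases typical of the memorized answer


def score_response(case: PromptCase, response: str) -> str:
    """Classify a response as 'memorized', 'reasoned', or 'unclear'.

    Naive substring matching; a real evaluation would need something
    more robust (for example, a rubric applied by a human or a judge model).
    """
    text = response.lower()
    # Check the trap first: an answer that repeats the classic solution
    # counts as a fallback even if it also mentions the right phrase.
    if any(k.lower() in text for k in case.trap_keywords):
        return "memorized"
    if all(k.lower() in text for k in case.expected_keywords):
        return "reasoned"
    return "unclear"


# Illustrative item: a river-crossing riddle with the conflict removed,
# so the familiar multi-trip solution no longer applies.
case = PromptCase(
    name="river_crossing_simplified",
    prompt=(
        "A farmer and a goat need to cross a river. The boat holds the "
        "farmer and one animal. What is the minimum number of crossings?"
    ),
    expected_keywords=["one crossing"],
    trap_keywords=["seven crossings", "7 crossings"],
)
```

Calling `score_response(case, "The farmer takes the goat across in one crossing.")` would classify the answer as reasoned, while a response that reproduces the classic seven-crossing solution would be classified as memorized. Keyword matching is brittle, so this is only meant to show the shape of the check, not a recommended scoring method.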
