← all repositories
declare-lab/MELD

Friends transcripts, now with feelings (and faces, and audio)

A multimodal dataset that turns sitcom dialogue into a benchmark for detecting who feels what, when, and how strongly.

1.1k stars Python Data Tooling
MELD
Velocity · 7d
+0.4
★ / day
Trend
steady
star history

What it does

MELD is a research dataset built from the TV show Friends: over 1,400 dialogues and 13,000 utterances, each annotated with one of seven emotions (anger, disgust, fear, joy, neutral, sadness, surprise) plus sentiment polarity. Unlike text-only predecessors, it bundles the actual audio and video clips alongside transcripts, so models can learn from tone of voice, facial expressions, and words together.

The interesting bit

Most conversational emotion datasets are two-person affairs. MELD keeps the full ensemble-cast chaos—multiple speakers talking over each other in the same scene—which makes context modeling genuinely harder and more realistic. The authors also cleaned up the source data: they discovered some “dialogues” in the original EmotionLines dataset were actually spliced from separate conversations, and filtered those out by enforcing timestamp continuity within single episodes and scenes.

Key highlights

  • Three modalities per utterance: text, audio, and video (≈3.6 seconds average clip length)
  • Splits: 1,039 train / 114 dev / 280 test dialogues; 260+ unique speakers in training alone
  • Emotion distribution is heavily skewed toward neutral (≈47% of training utterances), with disgust and fear as rare classes
  • Accepted at ACL 2019; subsequent SOTA work (COSMIC, DialogueGCN) has used it as a benchmark
  • Raw data available as .mp4 clips; annotations and CSV schemas live in the repo

Caveats

  • The raw download link (University of Michigan server) is separate from the newer Hugging Face mirror; the README lists both without clarifying which is preferred
  • Visual features require external extraction (ResNet features are in a sister repo, MM-Align)
  • Baseline code has migrated to the conv-emotion repo, so this repository is essentially documentation and data pointers

Verdict

Grab this if you’re building multimodal transformers or conversational AI that needs to read the room. Skip it if you want a clean, balanced classification problem—neutral dominates, and the real challenge is context tracking across multiple speakers, not single-utterance sentiment.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.