Is ConvoKit open source?

Yes. CornellNLP/ConvoKit is open source and released under the MIT license.

What language is ConvoKit written in?

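GitHub lists Jupyter Notebook as the primary language of CornellNLP/ConvoKit, largely because of its example and tutorial notebooks; the toolkit itself is a Python package.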

How popular is ConvoKit?

CornellNLP/ConvoKit has 639 stars on GitHub.

Where can I find ConvoKit?

CornellNLP/ConvoKit is on GitHub at https://github.com/CornellNLP/ConvoKit.


CornellNLP/ConvoKit

Social science for people who'd rather code than theorize

A scikit-learn-flavored toolkit that turns messy conversations into measurable social signals.

639 stars · Jupyter Notebook · Data Tooling · Language Models




What it does

ConvoKit packages conversational analysis as a Python toolkit with a scikit-learn-compatible interface. It bundles a dozen research-backed feature extractors (politeness strategies, linguistic coordination, hypergraph structure, conversational forecasting) along with ready-to-download datasets spanning Supreme Court arguments, Reddit threads, Wikipedia talk pages, and movie dialogues.
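ConvoKit's transformers roughly follow the scikit-learn pattern of `fit` then `transform`, with results written onto the corpus as metadata rather than returned as a separate feature matrix. The sketch below imitates that pattern in plain Python so it runs without ConvoKit installed; `Utterance`, `LengthRatio`, and the `length_ratio` metadata key are hypothetical stand-ins, not ConvoKit classes or keys.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Utterance:
    """Hypothetical stand-in for a corpus utterance carrying a metadata dict."""
    id: str
    speaker: str
    text: str
    reply_to: Optional[str] = None
    meta: dict = field(default_factory=dict)


class LengthRatio:
    """Toy transformer (not a ConvoKit class): fit() learns state, transform() annotates."""

    def __init__(self) -> None:
        self.mean_tokens: Optional[float] = None

    def fit(self, corpus: List[Utterance]) -> "LengthRatio":
        if not corpus:
            raise ValueError("corpus is empty")
        # Learn the average whitespace-token count across the whole corpus.
        counts = [len(u.text.split()) for u in corpus]
        self.mean_tokens = sum(counts) / len(counts)
        return self

    def transform(self, corpus: List[Utterance]) -> List[Utterance]:
        if self.mean_tokens is None:
            raise RuntimeError("call fit() before transform()")
        # Store each utterance's length relative to the corpus mean in its metadata.
        for u in corpus:
            u.meta["length_ratio"] = len(u.text.split()) / self.mean_tokens
        return corpus  # returning the corpus lets calls be chained pipeline-style

    def fit_transform(self, corpus: List[Utterance]) -> List[Utterance]:
        return self.fit(corpus).transform(corpus)


corpus = [
    Utterance("u1", "alice", "Thanks, agreed."),
    Utterance("u2", "bob", "We should revisit the proposal first", reply_to="u1"),
]
LengthRatio().fit_transform(corpus)
print([u.meta["length_ratio"] for u in corpus])  # [0.5, 1.5]
```

Keeping learned state on the transformer and annotations on the data is what lets one fitted object be reused on a second corpus.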

The interesting bit

The toolkit doesn’t just give you bag-of-words; it bakes in published social science. The “linguistic coordination” feature, for instance, measures how much one speaker echoes another’s function-word usage (articles, pronouns, and the like), a signal the underlying research ties to power dynamics. The “Expected Conversational Context Framework” lets you characterize utterances by the conversational context they typically appear in. These are specific, citable methods from Cornell NLP papers, not generic NLP utilities.
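The coordination idea can be made concrete. In the published definition, speaker B's coordination toward speaker A on a marker class m is the probability that B's reply exhibits m given that A's utterance did, minus the baseline probability that B's reply exhibits m at all. The sketch below computes that difference for one marker class over (A utterance, B reply) pairs. It is a simplified illustration, not ConvoKit's `Coordination` transformer; the `ARTICLES` marker set, the regex tokenizer, and the example exchanges are assumptions.

```python
import re
from typing import Iterable, Set, Tuple

# Hypothetical marker class: English articles, one kind of function word.
ARTICLES = {"a", "an", "the"}


def exhibits(text: str, markers: Set[str]) -> bool:
    """True if any lowercase word token in `text` belongs to the marker class."""
    return any(tok in markers for tok in re.findall(r"[a-z']+", text.lower()))


def coordination(pairs: Iterable[Tuple[str, str]], markers: Set[str]) -> float:
    """Coordination of replier B toward target A on one marker class.

    `pairs` holds (utterance by A, reply by B) exchanges. Returns
    P(reply exhibits m | A's utterance exhibits m) - P(reply exhibits m).
    """
    pairs = list(pairs)
    # Baseline: how often B's replies use the marker at all.
    reply_flags = [exhibits(reply, markers) for _, reply in pairs]
    # Conditional: the same rate, restricted to exchanges where A used the marker.
    triggered = [exhibits(reply, markers) for target, reply in pairs
                 if exhibits(target, markers)]
    if not triggered:
        raise ValueError("need at least one exchange where A uses the marker")
    baseline = sum(reply_flags) / len(reply_flags)
    conditional = sum(triggered) / len(triggered)
    return conditional - baseline


exchanges = [
    ("Is the brief ready?", "The brief is on the desk."),
    ("Can you file it?", "Yes, tomorrow."),
    ("What about the deadline?", "It moved to a later date."),
    ("Thanks for checking.", "No problem at all."),
]
print(coordination(exchanges, ARTICLES))  # 0.5
```

A positive value means B uses articles more often right after A does than B's own baseline predicts; a value near zero means no measurable echo.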

Key highlights

Ships with 10+ curated corpora (Supreme Court, Parliament Q&A, 900k subreddits, etc.) via convokit.download()
Implements published methods: politeness strategies, redirection detection, pivotal moment identification, CRAFT forecasting model
Scikit-learn-inspired unified interface; includes interactive Colab tutorials
Active maintenance: v4.1.1 released May 2026, 37 contributors, Discord community
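To give a feel for what a politeness-strategy extractor produces, here is a deliberately crude keyword version. ConvoKit's `PolitenessStrategies` relies on parsed text and covers many more strategies; the regex patterns below are hypothetical approximations of a few strategy names from the politeness literature (gratitude, please, apologizing, hedges, direct questions), chosen only for illustration.

```python
import re
from typing import Dict

# Hypothetical regex approximations of a few politeness strategies.
# Real extractors use parses and many more cues; these are illustrative only.
STRATEGY_PATTERNS = {
    "gratitude": re.compile(r"\b(thanks|thank you|appreciate)\b", re.IGNORECASE),
    "please": re.compile(r"\bplease\b", re.IGNORECASE),
    "apologizing": re.compile(r"\b(sorry|apologi[sz]e)\b", re.IGNORECASE),
    "hedges": re.compile(r"\b(i think|maybe|perhaps|i suppose)\b", re.IGNORECASE),
    # A wh-word at the very start of the utterance.
    "direct_question": re.compile(r"^\s*(what|why|who|how)\b", re.IGNORECASE),
}


def politeness_markers(text: str) -> Dict[str, int]:
    """Return a 0/1 indicator per strategy, like one row of a sparse feature matrix."""
    return {name: int(bool(p.search(text))) for name, p in STRATEGY_PATTERNS.items()}


print(politeness_markers("Sorry, could you please take a look? I think it's broken."))
# {'gratitude': 0, 'please': 1, 'apologizing': 1, 'hedges': 1, 'direct_question': 0}
print(politeness_markers("Why is this still open?"))
# {'gratitude': 0, 'please': 0, 'apologizing': 0, 'hedges': 0, 'direct_question': 1}
```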

Caveats

Some features (prompt types, surface motifs) appear commented out in the README; their status is unclear
Several dataset download links point to a Cornell server (zissou.infosci.cornell.edu); long-term availability not guaranteed
Heavy tilt toward academic research use cases; production deployment guidance is sparse

Verdict

Researchers studying online discourse, power dynamics, or conversation derailment should start here. Engineers building chatbots or generic conversational AI will find useful pieces but may need to bridge gaps themselves.

Frequently asked

What is CornellNLP/ConvoKit?: A scikit-learn-flavored toolkit that turns messy conversations into measurable social signals.