cdqa-suite/cdQA

A retired BERT pipeline that still teaches

cdQA was a Python toolkit for building closed-domain QA systems on your own documents, before its authors sent everyone to Haystack instead.

What it does

cdQA is an end-to-end question-answering pipeline that lets you point BERT (or DistilBERT) at your own document collection (PDF or Markdown files) and ask it natural-language questions. It handles the full loop: converting documents into a pandas DataFrame, retrieving relevant paragraphs, and running a pre-trained reader to extract answers. There's also a Flask API and a companion UI project if you want to wrap it in a web interface.
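To make that loop concrete, here is a toy sketch of the same three stages in plain Python. It is not cdQA's code: a list of dicts stands in for the pandas DataFrame, the retriever counts query-term occurrences, and the "reader" is a keyword-overlap stand-in rather than a BERT model. All function names (to_rows, retrieve, read, answer) are hypothetical and chosen for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_rows(documents):
    """Flatten {title: text} into one row per paragraph.

    A list of dicts stands in for the DataFrame the source mentions.
    """
    rows = []
    for title, text in documents.items():
        for para in text.split("\n\n"):
            para = para.strip()
            if para:
                rows.append({"title": title, "paragraph": para})
    return rows


def retrieve(rows, query, top_n=2):
    """Rank paragraphs by how many query-term occurrences they contain."""
    q_terms = set(tokenize(query))

    def score(row):
        counts = Counter(tokenize(row["paragraph"]))
        return sum(counts[t] for t in q_terms)

    ranked = sorted(rows, key=score, reverse=True)
    # Drop paragraphs that share no terms with the query at all.
    return [r for r in ranked[:top_n] if score(r) > 0]


def read(paragraphs, query):
    """Stand-in reader: pick the sentence sharing the most terms with the query.

    A real reader would be a fine-tuned model extracting an answer span.
    """
    q_terms = set(tokenize(query))
    best, best_overlap = None, 0
    for row in paragraphs:
        for sentence in re.split(r"(?<=[.!?])\s+", row["paragraph"]):
            overlap = len(q_terms & set(tokenize(sentence)))
            if overlap > best_overlap:
                best, best_overlap = (row["title"], sentence), overlap
    return best


def answer(documents, query):
    """Run the full loop: convert, retrieve, read."""
    return read(retrieve(to_rows(documents), query), query)
```

Calling answer() with a dict of document titles and texts returns a (title, sentence) pair, or None when nothing overlaps with the query. The point is the shape of the pipeline, not the quality of the answers.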

The interesting bit

The pipeline explicitly splits the problem into a retriever stage and a reader stage, then blends their scores with a tunable weight. That’s not exotic now, but the project arrived early enough that its Medium article and NLP Breakfast talk became reference material for people learning how BERT-based QA actually works.
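The score blending can be sketched as a simple linear interpolation. This is an assumption about the form of the blend: the text above only says the two scores are combined with a tunable weight, so the formula, the Candidate class, and the function names below are illustrative rather than cdQA's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate answer with a score from each stage (hypothetical type)."""
    answer: str
    retriever_score: float
    reader_score: float


def blend(candidate, weight):
    """Linearly interpolate the two scores.

    weight=0.0 trusts only the reader; weight=1.0 trusts only the retriever.
    The linear form is an assumption made for this sketch.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * candidate.retriever_score + (1.0 - weight) * candidate.reader_score


def best_answer(candidates, weight):
    """Return the candidate with the highest blended score."""
    return max(candidates, key=lambda c: blend(c, weight))
```

With one candidate that the retriever likes and another that the reader likes, sliding the weight from 0 to 1 flips which one wins, which is the whole reason to expose it as a knob.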

Key highlights

Built on HuggingFace transformers, with ready-to-use BERT and DistilBERT readers fine-tuned on SQuAD 1.1
Includes converters for PDF and Markdown; PDF parsing requires a Java runtime (OpenJDK)
Supports custom fine-tuning on SQuAD-like annotated data via a separate web annotator tool
Provides notebook tutorials runnable on Binder or Google Colab
Ships with a lightweight Flask API for deployment

Caveats

Not maintained. The README banner points users to Haystack as the actively supported alternative
Converter support is limited to PDF and Markdown; the README’s “plan to add more” never materialized
GPU experiments were run on a single Tesla V100; no explicit guidance on whether smaller hardware is viable

Verdict

Worth a look if you’re studying how retriever-reader QA pipelines work and want a clean, educational codebase to dissect. Skip it for production use; the authors themselves redirect you to Haystack.

Frequently asked

What is cdqa-suite/cdQA? cdQA was a Python toolkit for building closed-domain QA systems on your own documents, before its authors sent everyone to Haystack instead.
Is cdQA open source? Yes. cdqa-suite/cdQA is open source, released under the Apache-2.0 license.
What language is cdQA written in? cdqa-suite/cdQA is primarily written in Python.
How popular is cdQA? cdqa-suite/cdQA has 617 stars on GitHub.
Where can I find cdQA? cdqa-suite/cdQA is on GitHub at https://github.com/cdqa-suite/cdQA.