BERT + GPT-2: a very expensive WebMD search bar
A hackathon project that pipelines two language models to retrieve and generate medical answers, with the authors explicitly begging you not to use it for actual medical advice.

What it does
DocProduct takes a medical question, encodes it with a fine-tuned BioBERT model, runs similarity search via FAISS over 700k scraped Q&A pairs from Reddit and WebMD, then feeds the retrieved context to a fine-tuned GPT-2 (117M parameters) to generate an answer. The whole thing is glued together with custom Keras feedforward networks and a lot of TensorFlow version contortion.
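For a feel of the retrieval step, here is a minimal sketch with plain NumPy standing in for FAISS and made-up 4-dimensional vectors standing in for BioBERT-derived embeddings. The function name, the toy data, and the choice of inner-product similarity are all assumptions for illustration, not the project's actual code.

```python
import numpy as np

def top_k_by_inner_product(query_vec, answer_matrix, k=3):
    """Return (indices, scores) of the k stored vectors most similar to the query.

    Brute-force inner-product search; a FAISS index would do the same job
    at scale. This helper is hypothetical and not part of DocProduct.
    """
    scores = answer_matrix @ query_vec   # one dot product per stored answer
    top = np.argsort(-scores)[:k]        # indices sorted by descending score
    return top, scores[top]

# Toy stand-ins for answer embeddings (assumed values, 4 dimensions each).
answers = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
# Toy stand-in for an encoded question.
question = np.array([0.9, 0.1, 0.0, 0.0])

idx, sims = top_k_by_inner_product(question, answers, k=2)
print(idx, sims)
```

In the pipeline described above, the Q&A pairs behind the returned indices would be what gets handed to GPT-2 as context.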
The interesting bit
The training trick is the clever part: instead of standard negative sampling, they compute every question-answer dot product in a batch, softmax across rows, and use cross-entropy against a ground-truth pairing matrix. It’s a neat workaround for the fact that embeddings change every step, so NCE loss won’t fly.
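The loss described here can be sketched in a few lines of NumPy. This is a forward-pass illustration only, and it assumes the ground-truth pairing matrix is the identity (question i belongs with answer i in the batch), which is the usual setup for this kind of in-batch loss but is not confirmed by the project itself. The function name is hypothetical.

```python
import numpy as np

def in_batch_softmax_loss(q_emb, a_emb):
    """Cross-entropy over in-batch question-answer dot products.

    q_emb, a_emb: arrays of shape (batch, dim). Row i of each is assumed
    to be a matching question/answer pair (identity pairing matrix).
    """
    logits = q_emb @ a_emb.T                              # (batch, batch) dot products
    logits = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))  # row-wise log-softmax
    targets = np.eye(len(q_emb))                          # assumed ground-truth pairing matrix
    return -(targets * log_probs).sum(axis=1).mean()      # mean cross-entropy over questions

# Perfectly aligned pairs score low; the same answers shifted by one row score high.
aligned = 5.0 * np.eye(3)
loss_aligned = in_batch_softmax_loss(aligned, aligned)
loss_shuffled = in_batch_softmax_loss(aligned, np.roll(aligned, 1, axis=0))
print(loss_aligned, loss_shuffled)
```

Every other answer in the batch acts as a negative for a given question, so the negatives are always scored with the current embeddings rather than stale ones.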
Key highlights
- Scraped and wrangled 700k medical Q&A pairs from six different forums, each with its own HTML mess
- Re-implemented BERT in TF 2.0 alpha and got it talking to a TF 1.x GPT-2 model via tf.compat.v1.disable_eager_execution
- Top-6 finalist in the #PoweredByTF 2.0 Challenge; presented to the TensorFlow team
- Provides Colab notebooks for retrieval, training, and an “experimental” end-to-end pipeline
- Authors are upfront: “IT SHOULD NOT TO BE USED FOR ACTIONABLE MEDICAL ADVICE” (their caps, their wisdom)
Caveats
- Built on TF 2.0.0-alpha0, which is now archaeological; expect dependency pain
- The full pipeline is explicitly labeled experimental in the README
- Over a terabyte of generated TFRecords/CSV/checkpoints, but the actual model weights live on OneDrive
Verdict
Worth a look if you’re researching medical NLP retrieval architectures or need a case study in mashing BERT and GPT-2 together. Skip it if you want production code, current dependencies, or (heaven forbid) actual medical advice.