xmed-lab/CLIP_Surgery
A research method that improves the explainability of CLIP models and enhances open-vocabulary tasks such as segmentation and multi-label recognition, without fine-tuning.

CLIP Surgery is a technique that addresses two explainability problems in CLIP: opposite visualization in self-attention, where the maps highlight background rather than foreground regions, and noisy activations shared across labels. It fixes both by modifying the model at inference time (the "surgery"), with no fine-tuning or additional supervision. The method also improves downstream open-vocabulary tasks, including multi-label recognition and semantic segmentation, and it can drive the Segment Anything Model (SAM) by turning text queries into point prompts. The repository provides a Jupyter notebook demo and a Python implementation for interpretability analysis.
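The two ideas in the paragraph above, suppressing activations that are common to every label and converting a text-driven similarity map into point prompts for SAM, can be illustrated with a small NumPy sketch. This is not the repository's code. It assumes patch-to-class similarities are already computed, and it approximates the shared "noise" with the plain mean over labels. The repository's implementation may compute this differently. The function names `suppress_shared_activation` and `similarity_to_points` are hypothetical.

```python
import numpy as np


def suppress_shared_activation(sim: np.ndarray) -> np.ndarray:
    """Remove the response that every label shares (hypothetical helper).

    sim: (num_patches, num_classes) similarities between image patch
    features and class text embeddings. Assumption: the shared, label-
    agnostic component is approximated by the mean over classes, so
    subtracting it leaves only label-specific activation.
    """
    return sim - sim.mean(axis=1, keepdims=True)


def similarity_to_points(sim_map: np.ndarray, k: int = 1) -> list:
    """Return (row, col) of the k highest-scoring cells in a 2-D map.

    Hypothetical helper: these locations could serve as positive point
    prompts for SAM after rescaling to image pixels and converting to
    SAM's (x, y) order.
    """
    # Sort flattened scores in descending order and keep the top k.
    top = np.argsort(sim_map.ravel())[::-1][:k]
    return [tuple(int(v) for v in np.unravel_index(i, sim_map.shape)) for i in top]


if __name__ == "__main__":
    # Two patches, two labels: patch 1 responds equally to both labels,
    # so after suppression it carries no label-specific signal.
    sim = np.array([[0.3, 0.5], [0.2, 0.2]])
    print(suppress_shared_activation(sim))

    # A 2x2 similarity map for one text query; pick the two best cells.
    print(similarity_to_points(np.array([[0.1, 0.9], [0.4, 0.2]]), k=2))
```

Mean subtraction is the simplest way to cancel a component that appears equally under every label. Ranking the map and taking the top cells is likewise only one straightforward way to derive point prompts from a similarity map.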