JIA-Lab-research/LISA
LISA is a large language model-based segmentation assistant that reasons over images using natural language instructions to produce pixel-level segmentation masks.

LISA extends large language models to reasoning-based image segmentation, taking free-form text instructions that require visual understanding and reasoning. Given an image and a query such as ‘Who was the president of the US in this image? Please output segmentation mask’, it produces both a segmentation mask and a textual response explaining its reasoning. The project includes trained models (7B and LISA++ variants), training code, inference scripts, a Gradio demo, and the LISA+ dataset for training reasoning segmentation models.
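
To make the output pair concrete, here is a minimal sketch of how a mask and its accompanying text might be held together and visualized. It does not call LISA: `SegmentationResult` and `overlay_mask` are hypothetical names introduced for illustration, the image and mask are toy arrays, and NumPy is assumed to be installed.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SegmentationResult:
    """Hypothetical container for illustration; not part of the LISA codebase."""
    mask: np.ndarray  # boolean array of shape (H, W); True marks the target region
    text: str         # the textual response that accompanies the mask


def overlay_mask(image: np.ndarray, mask: np.ndarray,
                 color=(255, 0, 0), alpha: float = 0.5) -> np.ndarray:
    """Blend `color` into an (H, W, 3) uint8 image wherever `mask` is True."""
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")
    out = image.astype(np.float32)
    tint = np.asarray(color, dtype=np.float32)
    # Only masked pixels are changed; everything else keeps its original value.
    out[mask] = (1.0 - alpha) * out[mask] + alpha * tint
    return out.round().astype(np.uint8)


# Toy stand-ins for a real image and a real model output.
image = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
result = SegmentationResult(mask=mask, text="(model explanation would go here)")

blended = overlay_mask(image, result.mask)
```

In practice the mask would come from one of the repository's inference scripts or the Gradio demo; the overlay step is just one common way to inspect a pixel-level mask alongside the text answer.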