Meta's image segmentation model that needs no retraining
SAM lets you isolate any object in an image with a click or bounding box, no custom training required.

What it does
The Segment Anything Model (SAM) generates high-quality object masks from sparse prompts: single points, boxes, or a rough mask. Feed it an image and a hint, and it returns one or more candidate masks, each with a predicted quality score. No fine-tuning, no labeled dataset for your specific objects. The repo provides inference code, three model checkpoints (ViT-H/L/B), and Jupyter notebooks to get started in minutes.
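
As a rough sketch of that image-plus-hint flow, the snippet below prompts SAM with a single foreground click through the repo's `SamPredictor` interface. The helper names `segment_from_click` and `best_mask`, the default checkpoint path, and the choice to keep only the highest-scoring candidate are assumptions made for illustration, not something the repo prescribes.

```python
def best_mask(masks, scores):
    """Return the candidate mask with the highest predicted quality score."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return masks[best_index]


def segment_from_click(image_rgb, x, y,
                       checkpoint="sam_vit_h_4b8939.pth",  # assumed local path
                       model_type="vit_h"):
    """Segment the object under pixel (x, y) of an HxWx3 uint8 RGB image."""
    # Heavy dependencies are imported lazily so best_mask() works without them.
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)  # runs the image encoder once

    # One click prompt: label 1 marks foreground, 0 would mark background.
    masks, scores, _low_res_logits = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True,  # several candidates for an ambiguous click
    )
    return best_mask(masks, scores)
```

Calling `segment_from_click(image, 500, 375)` on an RGB array would return a boolean HxW mask. For a box prompt, `predict` also accepts a `box` argument in XYXY pixel coordinates.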

The interesting bit
The real labor isn’t the architecture; it’s the data engine. Meta built a model-in-the-loop pipeline in three stages: SAM first assisted human annotators, then proposed masks that annotators completed, and finally generated masks fully automatically. The released SA-1B dataset, 1.1 billion masks across 11 million images, comes from that last automatic stage, and SAM was retrained as the data grew. It’s a brute-force approach to zero-shot generalization: cover enough visual diversity and the model stops needing domain-specific examples.

Key highlights
- Three model sizes trade accuracy for speed; ViT-H is default, ViT-B is the lightweight option
- ONNX export path lets you run the mask decoder in-browser or on edge devices
- Includes a React demo showing browser-based inference with multithreading
- Apache 2.0 license, which is unusually permissive for a major FAIR release
- Meta has already superseded this with SAM 2 for video; this repo remains the reference image implementation

Caveats
- The repo is inference-only; no training code or SA-1B dataset hosting here (download links point to Meta’s site with a research license)
- CUDA is “strongly recommended” but not enforced; CPU fallback will hurt on the larger backbones
- The README’s top half now redirects to SAM 2, which may confuse newcomers looking for video support
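
The CPU caveat and the three model sizes suggest a simple fallback policy, sketched here: use ViT-H when CUDA is available and drop to ViT-B otherwise. The policy itself and the helper names `pick_backbone` and `load_sam` are assumptions for illustration. The checkpoint filenames are the ones the README links to, assumed to be already downloaded into `checkpoint_dir`.

```python
import os

# Checkpoint filenames as linked from the README; assumed to be downloaded locally.
CHECKPOINTS = {
    "vit_h": "sam_vit_h_4b8939.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_b": "sam_vit_b_01ec64.pth",
}


def pick_backbone(cuda_available):
    """Hypothetical policy: largest backbone on GPU, smallest on CPU."""
    return "vit_h" if cuda_available else "vit_b"


def load_sam(checkpoint_dir="."):
    """Load SAM on the best available device and return (model, model_type)."""
    # Imported lazily so the policy above can be used without PyTorch installed.
    import torch
    from segment_anything import sam_model_registry

    cuda_available = torch.cuda.is_available()
    model_type = pick_backbone(cuda_available)
    checkpoint = os.path.join(checkpoint_dir, CHECKPOINTS[model_type])

    sam = sam_model_registry[model_type](checkpoint=checkpoint)
    sam.to(device="cuda" if cuda_available else "cpu")
    return sam, model_type
```

ViT-L sits between the two and may be a reasonable middle ground on smaller GPUs; the right cutoff depends on your latency budget.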

Verdict
Computer vision researchers and product engineers needing quick segmentation prototypes should grab this. If you’re already committed to video or need trainable pipelines, skip straight to SAM 2 instead.