← all repositories

shenyunhang/APE

A foundational vision-language model for universal visual perception that detects and segments objects using natural language descriptions across diverse domains.

607 stars Python Computer Vision
APE
Velocity · 7d
+0.6
★ / day
Trend
steady
star history

APE aligns visual and language representations through an innovative prompting mechanism to enable universal visual perception. The model performs open-world object detection, instance segmentation, semantic segmentation, and referring expression comprehension using a single unified architecture. It achieves competitive performance across 160 datasets by leveraging vision-language transformer foundations with thousands of vocabulary entries and language descriptions for flexible perception in varied scenarios.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.