Image · Video · Audio

newcomers · gaining speed
02
OpenBMB/VoxCPM
+404 ★/day · accelerating

VoxCPM2 generates speech directly from text using continuous diffusion, with no discrete audio tokens required.

28.3k Python Image · Video · Audio · explained
03
NVIDIA/cosmos
+172 ★/day · accelerating

Cosmos 3 tries to unify video generation, robot action prediction, and physical reasoning inside a single 16B–64B Mixture-of-Transformers architecture.

9.8k Jupyter Notebook Image · Video · Audio · explained
04
HKUDS/ViMax
+148 ★/day · accelerating

ViMax orchestrates director, screenwriter, and producer agents to generate multi-shot videos from raw ideas, novels, or scripts.

9.6k Python Agents · explained
05
Anil-matcha/Open-Generative-AI
+114 ★/day · accelerating

An Electron app that wraps 200+ generative models behind a single UI, with an unusual pitch: no guardrails, no cloud lock-in, and a split personality between local and remote inference.

18.8k JavaScript Image · Video · Audio · explained
06
modelscope/FunASR
+104 ★/day · accelerating

A Chinese speech toolkit that bundles ASR, diarization, emotion detection, and streaming into one MIT-licensed package.

17.7k Python Image · Video · Audio · explained
07
abus-aikorea/voice-pro
+56 ★/day · accelerating

Voice-Pro bundles Whisper, F5-TTS, CosyVoice, and a dozen other tools into a single Gradio interface for creators who want ElevenLabs-like results without the API bills.

10.9k Python Inference · Serving · explained
08
openai/whisper
+158 ★/day · accelerating

OpenAI's Whisper replaces the usual Rube Goldberg pipeline of speech-processing tools with a single Transformer trained to do it all.

102.4k Python Image · Video · Audio · explained
09
facebookresearch/sam-3d-body
+32 ★/day · accelerating

A foundation model that turns one image into a full 3D body mesh, optionally guided by keypoints or masks like the original SAM.

3.2k Python Computer Vision · explained
10
k2-fsa/sherpa-onnx
+31 ★/day · accelerating

A fully offline speech toolkit packing ASR, TTS, diarization, and VAD into one C++ runtime with ONNX, then wrapping it for twelve languages and every edge platform imaginable.

12.9k C++ Image · Video · Audio · explained
11
liwenxi/SWIFT-AI
+14 ★/day · accelerating

A speed-focused deep learning system for analyzing massive scientific images, from crowds to cancer slides to galaxies.

1.4k Jupyter Notebook Computer Vision · explained
12
HITsz-TMG/VideoClaw
+15 ★/day · accelerating

VideoClaw turns a one-sentence prompt into a full production pipeline with editable checkpoints, not just a black-box video dump.

1.4k Python Agents · explained
13
SamurAIGPT/Generative-Media-Skills
+17 ★/day · accelerating

Structured prompt packs that let Claude Code or Cursor generate, edit, and display images, video, and audio via a unified CLI backend.

3.5k Shell Agents · explained
14
Blaizzy/mlx-vlm
+20 ★/day · accelerating

MLX-VLM crams speculative decoding, continuous batching, and KV cache quantization into a Mac-native toolkit for running multimodal models locally.

5k Python Image · Video · Audio · explained
15
ZhengPeng7/BiRefNet
+13 ★/day · accelerating

BiRefNet splits images into layers using bilateral references, then offers a whole zoo of task-specific weights for everything from background removal to camouflaged-object detection.

3.7k Python Computer Vision · explained
17
Roblox/cube
+10 ★/day · accelerating

A foundation model that turns prompts into game-ready meshes, because someone finally had to do it for the metaverse.

1.1k Jupyter Notebook Image · Video · Audio · explained
18
tin2tin/Pallaidium
+6.7 ★/day · accelerating

A free add-on that turns Blender's Video Sequence Editor into an end-to-end AI movie pipeline, from script to screen and back again.

1.4k Python Image · Video · Audio · explained
19
hustvl/4DGaussians
+12 ★/day · accelerating

Extends 3D Gaussian Splatting to time-varying scenes without sacrificing the real-time rendering speed that made the original technique appealing.

3.7k Jupyter Notebook Computer Vision · explained
20
OpenWhispr/openwhispr
+23 ★/day · accelerating

OpenWhispr is the open-source, privacy-first alternative to WisprFlow and Granola that lets you choose between local Whisper/Parakeet models or your own cloud API keys.

3.7k TypeScript Image · Video · Audio · explained