Image · Video · Audio

heavyweights · velocity + momentum
02
OpenBMB/VoxCPM
+404 ★/day · accelerating

VoxCPM2 generates speech directly from text using continuous diffusion, no discrete audio tokens required.

28.3k Python Image · Video · Audio · explained
04
microsoft/VibeVoice
+221 ★/day · accelerating

A research family of ASR and TTS models built on the bet that voice should be processed as long-form narrative, not chopped into seconds-long shards.

49.2k Python Image · Video · Audio · explained
05
NVIDIA/cosmos
+172 ★/day · accelerating

Cosmos 3 tries to unify video generation, robot action prediction, and physical reasoning inside a single 16B–64B Mixture-of-Transformers architecture.

9.8k Jupyter Notebook Image · Video · Audio · explained
06
HKUDS/ViMax
+148 ★/day · accelerating

ViMax orchestrates director, screenwriter, and producer agents to generate multi-shot videos from raw ideas, novels, or scripts.

9.6k Python Agents · explained
07
openai/whisper
+158 ★/day · accelerating

OpenAI's Whisper replaces the usual Rube Goldberg pipeline of speech-processing tools with a single Transformer trained to do it all.

102.4k Python Image · Video · Audio · explained
08
MisoLabsAI/MisoTTS
+122 ★/day · steady

MisoTTS brings Sesame-style conversational speech synthesis to local hardware, with a Llama backbone and a stubbornly English-only vocabulary.

2.6k Python Image · Video · Audio · explained
09
debpalash/OmniVoice-Studio
+127 ★/day · accelerating

OmniVoice Studio runs voice cloning, dubbing, and dictation locally on macOS, Windows, and Linux — no API keys, no cloud, no subscription.

6.8k Python Image · Video · Audio · explained
10
Comfy-Org/ComfyUI
+143 ★/day · accelerating

A visual programming interface for image, video, 3D, and audio generation that treats model pipelines as composable graphs.

116.5k Python Image · Video · Audio · explained
11
Anil-matcha/Open-Generative-AI
+114 ★/day · accelerating

An Electron app that wraps 200+ generative models behind a single UI, with an unusual pitch: no guardrails, no cloud lock-in, and a split personality between local and remote inference.

18.8k JavaScript Image · Video · Audio · explained
12
AIDC-AI/Pixelle-Video
+112 ★/day · accelerating

Pixelle-Video wires LLMs, image/video generators, TTS engines, and ffmpeg into a single Streamlit app that spits out short-form videos from a topic string.

22k Python Image · Video · Audio · explained
13
modelscope/FunASR
+104 ★/day · accelerating

A Chinese speech toolkit that bundles ASR, diarization, emotion detection, and streaming into one MIT-licensed package.

17.7k Python Image · Video · Audio · explained
15
boona13/image-extender
+63 ★/day · steady

Outpainting is the appetizer; the main course is automated 2D game asset generation with seam-aware tooling that exports engine-ready packs.

969 TypeScript Image · Video · Audio · explained
16
jamiepine/voicebox
+73 ★/day · cooling

Voicebox bundles seven TTS engines, Whisper dictation, and MCP agent hooks into a single Tauri app — all offline.

29.7k TypeScript Image · Video · Audio · explained
17
bytedance/Bernini
+53 ★/day · steady

A research framework that uses a multimodal LLM to plan video edits semantically, then hands off to a diffusion transformer to actually draw the frames.

671 Python Image · Video · Audio · explained
18
basketikun/infinite-canvas
+56 ★/day · steady

A self-hostable infinite canvas that wires AI image generation, reference editing, and chat into one collaborative workspace.

1.3k TypeScript Creative · Design · explained
19
cjpais/Handy
+64 ★/day · accelerating

Handy is an offline, open-source dictation app that pastes your words into any text field. It is built to be extended, not monetized.

23.5k Rust Image · Video · Audio · explained
20
abus-aikorea/voice-pro
+56 ★/day · accelerating

Voice-Pro bundles Whisper, F5-TTS, CosyVoice, and a dozen other tools into a single Gradio interface for creators who want ElevenLabs-like results without the API bills.

10.9k Python Inference · Serving · explained