Image · Video · Audio

newcomers · velocity + momentum
03
jamiepine/voicebox
+221 ★/day · steady

Voicebox bundles seven TTS engines, Whisper dictation, and MCP agent hooks into a single Tauri app — all offline.

29.5k TypeScript Image · Video · Audio · explained
05
microsoft/VibeVoice
+170 ★/day · steady

A research family of ASR and TTS models built on the bet that voice should be processed as long-form narrative, not chopped into seconds-long shards.

48.7k Python Image · Video · Audio · explained
07
MisoLabsAI/MisoTTS
+120 ★/day · steady

MisoTTS brings Sesame-style conversational speech synthesis to local hardware, with a Llama backbone and a stubbornly English-only vocabulary.

2.2k Python Image · Video · Audio · explained
09
neilsonnn/image-blaster
+94 ★/day · steady

A TypeScript skillset that turns a single photo into meshes, gaussian splats, and sound effects by orchestrating multiple generative models through Claude.

4.5k TypeScript Agents · explained
10
boona13/image-extender
+77 ★/day · steady

Outpainting is the appetizer; the main course is automated 2D game asset generation with seam-aware tooling that exports engine-ready packs.

945 TypeScript Image · Video · Audio · explained
13
OpenBMB/VoxCPM
+104 ★/day · steady

VoxCPM2 generates speech directly from text using continuous diffusion, no discrete audio tokens required.

27.5k Python Image · Video · Audio · explained
14
AIDC-AI/Pixelle-Video
+102 ★/day · steady

Pixelle-Video wires LLMs, image/video generators, TTS engines, and ffmpeg into a single Streamlit app that spits out short-form videos from a topic string.

21.7k Python Image · Video · Audio · explained
15
waooAI/waoowaoo
+93 ★/day · steady

A solo-built TypeScript studio that turns Chinese web novels into AI-generated storyboards, characters, and voiced video.

12.6k TypeScript Image · Video · Audio · explained
16
QwenLM/Qwen3-TTS
+86 ★/day · steady

A 1.7B-parameter speech model that streams its first audio packet after a single character and takes voice design instructions in plain English—or Chinese, or nine other languages.

11.8k Python Image · Video · Audio · explained
18
chatfire-AI/huobao-drama
+82 ★/day · steady

A TypeScript stack that automates scriptwriting, storyboarding, and video synthesis for the short-drama gold rush.

12.6k TypeScript Domain Apps · explained
19
Comfy-Org/ComfyUI
+94 ★/day · steady

A visual programming interface for image, video, 3D, and audio generation that treats model pipelines as composable graphs.

116.1k Python Image · Video · Audio · explained
20
bytedance/Bernini
+58 ★/day · steady

A research framework that uses a multimodal LLM to plan video edits semantically, then hands off to a diffusion transformer to actually draw the frames.

556 Python Image · Video · Audio · explained