Image · Video · Audio — the hottest AI repositories on heatdrop

+303 ★/day→steady

This Codex Skill turns Chinese articles into hand-drawn, slightly absurd 16:9 illustrations where a deadpan black blob does the conceptual heavy lifting.

★ 3.4k Coding Assistants · explained

EvoLinkAI/awesome-gpt-image-2-API-and-Prompts

+322 ★/day→steady

Curated prompts and API patterns for OpenAI's GPT-Image-2, organized by real use case rather than vibe or aesthetic.

★ 16.3k Python Image · Video · Audio · explained

jamiepine/voicebox

+221 ★/day→steady

Voicebox bundles seven TTS engines, Whisper dictation, and MCP agent hooks into a single Tauri app — all offline.

★ 29.5k TypeScript Image · Video · Audio · explained

freestylefly/awesome-gpt-image-2

+163 ★/day→steady

A crowdsourced library that reverse-engineers GPT-Image-2 examples into structured, reusable prompt templates for automation workflows.

★ 7.1k JavaScript Image · Video · Audio · explained

microsoft/VibeVoice

+170 ★/day→steady

A research family of ASR and TTS models built on the bet that voice should be processed as long-form narrative, not chopped into seconds-long shards.

★ 48.7k Python Image · Video · Audio · explained

YouMind-OpenLab/awesome-gpt-image-2

+138 ★/day→steady

A curated, multilingual prompt library for OpenAI's GPT Image 2, with preview images and Raycast snippet support.

★ 7.2k TypeScript Learning · explained

MisoLabsAI/MisoTTS

+120 ★/day→steady

MisoTTS brings Sesame-style conversational speech synthesis to local hardware, with a Llama backbone and a stubbornly English-only vocabulary.

★ 2.2k Python Image · Video · Audio · explained

debpalash/OmniVoice-Studio

+111 ★/day→steady

OmniVoice Studio runs voice cloning, dubbing, and dictation locally on macOS, Windows, and Linux — no API keys, no cloud, no subscription.

★ 6.6k Python Image · Video · Audio · explained

neilsonnn/image-blaster

+94 ★/day→steady

A TypeScript skillset that turns a single photo into meshes, gaussian splats, and sound effects by orchestrating multiple generative models through Claude.

★ 4.5k TypeScript Agents · explained

boona13/image-extender

+77 ★/day→steady

Outpainting is the appetizer; the main course is automated 2D game asset generation with seam-aware tooling that exports engine-ready packs.

★ 945 TypeScript Image · Video · Audio · explained

AUTOMATIC1111/stable-diffusion-webui

+118 ★/day→steady

A Gradio-based web UI that crams every community trick for image generation into one browser tab.

★ 163.5k Python Image · Video · Audio · explained

huangserva/3DCellForge

+85 ★/day→steady

A React prototype that wires multiple image-to-3D APIs into a single presentation-ready studio.

★ 2.4k JavaScript Image · Video · Audio · explained

OpenBMB/VoxCPM

+104 ★/day→steady

VoxCPM2 generates speech directly from text using continuous diffusion, no discrete audio tokens required.

★ 27.5k Python Image · Video · Audio · explained

AIDC-AI/Pixelle-Video

+102 ★/day→steady

Pixelle-Video wires LLMs, image/video generators, TTS engines, and ffmpeg into a single Streamlit app that spits out short-form videos from a topic string.

★ 21.7k Python Image · Video · Audio · explained

waooAI/waoowaoo

+93 ★/day→steady

A solo-built TypeScript studio that turns Chinese web novels into AI-generated storyboards, characters, and voiced video.

★ 12.6k TypeScript Image · Video · Audio · explained

QwenLM/Qwen3-TTS

+86 ★/day→steady

A 1.7B-parameter speech model that streams its first audio packet after a single character and takes voice-design instructions in plain English, Chinese, or nine other languages.

★ 11.8k Python Image · Video · Audio · explained

hacksider/Deep-Live-Cam

+95 ★/day→steady

Deep-Live-Cam swaps faces in real time using a single source image and your laptop camera.

★ 93.7k Python Image · Video · Audio · explained

chatfire-AI/huobao-drama

+82 ★/day→steady

A TypeScript stack that automates scriptwriting, storyboarding, and video synthesis for the short-drama gold rush.

★ 12.6k TypeScript Domain Apps · explained

Comfy-Org/ComfyUI

+94 ★/day→steady

A visual programming interface for image, video, 3D, and audio generation that treats model pipelines as composable graphs.

★ 116.1k Python Image · Video · Audio · explained

bytedance/Bernini

+58 ★/day→steady

A research framework that uses a multimodal LLM to plan video edits semantically, then hands off to a diffusion transformer to actually draw the frames.

★ 556 Python Image · Video · Audio · explained
