A curated prompt cookbook for OpenAI's latest image model, covering portraits, UI mockups, game screenshots, and posters you can drop straight into the API.
Image · Video · Audio
VoxCPM2 generates speech directly from text using continuous diffusion, no discrete audio tokens required.
This Codex Skill turns Chinese articles into hand-drawn, slightly absurd 16:9 illustrations where a deadpan black blob does the conceptual heavy lifting.
A research family of ASR and TTS models built on the bet that voice should be processed as long-form narrative, not chopped into seconds-long shards.
Cosmos 3 tries to unify video generation, robot action prediction, and physical reasoning inside a single 16B–64B Mixture-of-Transformers architecture.
ViMax orchestrates director, screenwriter, and producer agents to generate multi-shot videos from raw ideas, novels, or scripts.
MisoTTS brings Sesame-style conversational speech synthesis to local hardware, with a Llama backbone and a stubbornly English-only vocabulary.
OpenAI's Whisper replaces the usual Rube Goldberg pipeline of speech-processing tools with a single Transformer trained to do it all.
OmniVoice Studio runs voice cloning, dubbing, and dictation locally on macOS, Windows, and Linux — no API keys, no cloud, no subscription.
A visual programming interface for image, video, 3D, and audio generation that treats model pipelines as composable graphs.
An Electron app that wraps 200+ generative models behind a single UI, with an unusual pitch: no guardrails, no cloud lock-in, and a split personality between local and remote inference.
Pixelle-Video wires LLMs, image/video generators, TTS engines, and ffmpeg into a single Streamlit app that spits out short-form videos from a topic string.
A Chinese speech toolkit that bundles ASR, diarization, emotion detection, and streaming into one MIT-licensed package.
Outpainting is the appetizer; the main course is automated 2D game asset generation with seam-aware tooling that exports engine-ready packs.
Curated prompts and API patterns for OpenAI's GPT-Image-2, organized by real use case rather than vibe or aesthetic.
A research framework that uses a multimodal LLM to plan video edits semantically, then hands off to a diffusion transformer to actually draw the frames.
A self-hostable infinite canvas that wires AI image generation, reference editing, and chat into one collaborative workspace.
Voicebox bundles seven TTS engines, Whisper dictation, and MCP agent hooks into a single Tauri app — all offline.
Handy is an offline, open-source dictation app that pastes your words into any text field—built to be extended, not monetized.
OpenMontage turns Claude, Cursor, or Copilot into a full video production studio that researches, scripts, generates assets, and renders finished pieces.



