A curated prompt cookbook for OpenAI's latest image model, covering portraits, UI mockups, game screenshots, and posters you can drop straight into the API.
Image · Video · Audio
newcomers · gaining speed
VoxCPM2 generates speech directly from text using continuous diffusion, no discrete audio tokens required.
Cosmos 3 tries to unify video generation, robot action prediction, and physical reasoning inside a single 16B–64B Mixture-of-Transformers architecture.
ViMax orchestrates director, screenwriter, and producer agents to generate multi-shot videos from raw ideas, novels, or scripts.
An Electron app that wraps 200+ generative models behind a single UI, with an unusual pitch: no guardrails, no cloud lock-in, and a split personality between local and remote inference.
A Chinese speech toolkit that bundles ASR, diarization, emotion detection, and streaming into one MIT-licensed package.
Voice-Pro bundles Whisper, F5-TTS, CosyVoice, and a dozen other tools into a single Gradio interface for creators who want ElevenLabs-like results without the API bills.
OpenAI's Whisper replaces the usual Rube Goldberg pipeline of speech-processing tools with a single Transformer trained to do it all.
A foundation model that turns one image into a full 3D body mesh, optionally guided by keypoints or masks like the original SAM.
A fully offline speech toolkit packing ASR, TTS, diarization, and VAD into one C++ runtime with ONNX, then wrapping it for twelve languages and every edge platform imaginable.
A speed-focused deep learning system for analyzing massive scientific images, from crowds to cancer slides to galaxies.
VideoClaw turns a one-sentence prompt into a full production pipeline with editable checkpoints, not just a black-box video dump.
Structured prompt packs that let Claude Code or Cursor generate, edit, and display images, video, and audio via a unified CLI backend.
MLX-VLM crams speculative decoding, continuous batching, and KV cache quantization into a Mac-native toolkit for running multimodal models locally.
BiRefNet splits images into layers using bilateral references, then offers a whole zoo of task-specific weights for everything from background removal to camouflaged-object detection.
A grab-bag node pack whose Set/Get rewrite might finally tame your worst workflow tangles.
A foundation model that turns prompts into game-ready meshes, because someone finally had to do it for the metaverse.
A free add-on that turns Blender's Video Sequence Editor into an end-to-end AI movie pipeline, from script to screen and back again.
Extends 3D Gaussian Splatting to time-varying scenes without sacrificing the real-time rendering speed that made the original technique appealing.
OpenWhispr is the open-source, privacy-first alternative to WisprFlow and Granola that lets you choose between local Whisper/Parakeet models or your own cloud API keys.






