donahowe/AutoStudio
AutoStudio is a training-free multi-agent framework that uses LLM-based agents to orchestrate consistent multi-turn interactive image generation with Stable Diffusion.

AutoStudio lets users generate coherent image sequences over multiple turns of interaction while keeping each subject consistent from one image to the next. The framework employs three LLM-based agents (a subject manager, a layout generator, and a supervisor) and a Stable Diffusion-based drawer agent. It introduces a Parallel-UNet architecture with dual cross-attention modules for subject-aware feature exploitation, and a subject-initialized generation method that preserves small subjects across image generations.
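The division of labor among the four agents can be pictured with a minimal sketch. Everything below is hypothetical and is not taken from the AutoStudio codebase: the class and function names, the rule of one equal-width box per subject, and the box-clamping check are assumptions chosen only to show how a multi-turn loop might carry subject identities forward. The LLM calls and the Stable Diffusion drawer are replaced by plain Python placeholders, and subject names are passed in explicitly rather than extracted from the prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    subject_id: int
    name: str


@dataclass
class SubjectManager:
    """Hypothetical stand-in for the LLM-based subject manager."""
    known: dict = field(default_factory=dict)

    def identify(self, names):
        # Reuse the ID of a subject seen in an earlier turn; otherwise assign a new one.
        subjects = []
        for name in names:
            if name not in self.known:
                self.known[name] = len(self.known)
            subjects.append(Subject(self.known[name], name))
        return subjects


def generate_layout(subjects):
    """Placeholder layout generator: one equal-width box (x0, y0, x1, y1) per subject."""
    if not subjects:
        return {}
    width = 1.0 / len(subjects)
    return {
        s.subject_id: (i * width, 0.0, (i + 1) * width, 1.0)
        for i, s in enumerate(subjects)
    }


def supervise(layout):
    """Placeholder supervisor: clamp every box coordinate into [0, 1]."""
    return {
        sid: tuple(min(max(v, 0.0), 1.0) for v in box)
        for sid, box in layout.items()
    }


def draw(prompt, layout):
    """Placeholder drawer: a real system would call a diffusion model here."""
    return {"prompt": prompt, "layout": layout}


def run_turn(manager, prompt, subject_names):
    """One dialogue turn: identify subjects, lay them out, check, then draw."""
    subjects = manager.identify(subject_names)
    layout = supervise(generate_layout(subjects))
    return draw(prompt, layout)
```

Reusing one `SubjectManager` across calls to `run_turn` is what keeps a subject's ID stable between turns in this sketch; a subject that reappears in a later prompt gets the same ID it had before, which is the hook a real drawer would need to condition on the same subject features.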