XaiJu
Innovate Futures @ Benji



Qwen Image Edit & Wan 2.2 - Create Coherent AI Video Scenes With This!

Tutorial Video : https://youtu.be/YQLq--X--HY

This video explores an AI storytelling pipeline that combines a language model, text-to-image generation, and an image-to-video workflow to create structured, multi-scene AI videos. Instead of relying on a single reference image or generating random clips, the creator uses Qwen3-Max to write a sequence of detailed text prompts for a cohesive 30-second narrative, each describing one scene by its subject, action, and environment. The prompts are used to generate consistent character images with Flux Kontext, and each image is then turned into a 5-second video clip using Wan 2.2 MoE with the LightX2V image-to-video LoRAs. The result is a cinematic-style AI video made of six distinct but visually coherent scenes (6 × 5 seconds = 30 seconds), complete with sound design. This method gives far more control than generating one long AI video in a single pass, and it helps avoid issues such as prompt drift and visual inconsistency.
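The scene plan described above can be sketched as a small data structure. This is only an illustration, not the creator's actual workflow: the `Scene` class, its field names, the `total_runtime` helper, and the placeholder storyboard text are all hypothetical. In the real pipeline the prompts come from the LLM and the images and clips are produced in ComfyUI.

```python
from dataclasses import dataclass
from typing import List

# Each scene becomes one 5-second clip, as described in the post.
CLIP_SECONDS = 5


@dataclass
class Scene:
    """One storyboard entry (hypothetical structure for illustration)."""
    subject: str
    action: str
    environment: str

    def to_prompt(self) -> str:
        # Join the three parts into a single text prompt for the image step.
        return f"{self.subject}, {self.action}, {self.environment}"


def total_runtime(scenes: List[Scene], clip_seconds: int = CLIP_SECONDS) -> int:
    """Total video length in seconds if every scene yields one clip."""
    return len(scenes) * clip_seconds


# Placeholder storyboard: the same subject in every scene helps keep
# the character consistent. The text here is made up, not from the video.
SUBJECT = "a young explorer in a red jacket"
storyboard = [
    Scene(SUBJECT, "studies an old map", "a cluttered attic"),
    Scene(SUBJECT, "walks out the front door", "a quiet street at dawn"),
    Scene(SUBJECT, "boards a small boat", "a foggy harbor"),
    Scene(SUBJECT, "rows across open water", "a calm sea at noon"),
    Scene(SUBJECT, "steps onto the shore", "a rocky island beach"),
    Scene(SUBJECT, "opens a wooden chest", "a cave lit by a lantern"),
]

if __name__ == "__main__":
    for number, scene in enumerate(storyboard, start=1):
        print(f"Scene {number}: {scene.to_prompt()}")
    print(f"Total runtime: {total_runtime(storyboard)} seconds")
```

Writing the subject once and reusing it in every scene mirrors the idea of fixing the character first and varying only the action and environment from scene to scene.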

Who is This Content Suitable For?

This content is ideal for:

Why Does This Matter?

Most AI video models struggle with long-term coherence and often break down after 10–15 seconds, producing random objects, shifting styles, or illogical transitions. This video presents a smarter alternative: treating AI video creation like real filmmaking by planning scenes, maintaining character consistency, and editing clips together. By using LLMs for script breakdowns, controlled image generation, and modular video synthesis, creators can produce coherent, meaningful narratives instead of chaotic clips. The approach marks a shift from experimental AI demos to practical, repeatable content creation systems, making it easier to produce professional-grade AI videos for storytelling, marketing, or entertainment.

lovis93/next-scene-qwen-image-lora-2509

https://huggingface.co/lovis93/next-scene-qwen-image-lora-2509

lightx2v/Wan2.2-I2V-A14B-Moe-Distill-Lightx2v

https://huggingface.co/lightx2v/Wan2.2-I2V-A14B-Moe-Distill-Lightx2v

HunyuanVideo-Foley Custom Node:

https://github.com/phazei/ComfyUI-HunyuanVideo-Foley

HunyuanVideo-Foley Model Download:

https://huggingface.co/phazei/HunyuanVideo-Foley/tree/main

SRPO LoRA:

https://huggingface.co/Alissonerdx/flux.1-dev-SRPO-LoRas/tree/main

Attached are the 3 workflows mentioned in this tutorial:

Comments

I'd like to thank you for this amazing workflow! This is just what I needed right now, and it saved me from a lot of headaches! This is just perfect! May I ask for another workflow that could do these morphing videos? Same idea, but with Wan VACE and Wan combined. Like in this video: https://www.youtube.com/live/lPMhXfNne0E?si=9YyawfcWejxHIX0p, at 37:19.

Minna

Fantastic! Forgive me for the question, how many GB do the models weigh for the entire workflow?

Enzo Brand

