XaiJu
Innovate Futures @ Benji



Wan 2.2 Sound To Video Released! Tutorial Guide To Make Your AI Character Talk!

Tutorial Video: https://youtu.be/MegoM8KSO_s

For Patreon Supporters: https://www.patreon.com/posts/137551686

In this video, we dive deep into the newly released Wan 2.2 Sound-to-Video (S2V) model, just dropped by the Wan AI team. We test both image-to-video and video-to-video workflows in ComfyUI, exploring how well the model handles lip syncing, facial expressions, and motion accuracy. From setup and model files to performance tweaks like block swap, context window, and frame interpolation, we cover it all. You’ll see comparisons between the FP16, FP8, and GGUF quantized models, plus why using LoRAs like LightX2V can ruin your results. This is essential viewing for AI creators, developers, and content producers working with talking avatars or animated characters. If you're into AI-generated video, realistic lip sync, or pushing the limits of what’s possible with Wan 2.2, this breakdown will save you hours of trial and error.

Who this is for:

This content is ideal for AI video creators, deep learning enthusiasts, digital artists, and tech-savvy YouTubers who want to generate realistic talking avatars or animate existing characters using audio input. Whether you're using high-end GPUs or optimizing for lower VRAM setups, this guide gives you practical insights into making the most of the WAN 2.2 S2V model.

Why it matters:

With the rapid evolution of AI video generation, getting accurate lip sync and natural facial motion is crucial for believable content. This video highlights the pitfalls and best practices so you can avoid wasting time on unusable outputs and focus on high-quality, production-ready results.

Wan 2.2 S2V in ComfyUI

https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files

Models You Need To Get This Working

-----------------------------------

models/diffusion_models

-----------------------------------

wan2.2_s2v_14B_bf16 (For High VRAM):

https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/diffusion_models/wan2.2_s2v_14B_bf16.safetensors

WanVideo_comfy_fp8_scaled/S2V (For Low VRAM):

https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled/tree/main/S2V

Wan2.2-S2V-14B-GGUF (For Low VRAM):

https://huggingface.co/QuantStack/Wan2.2-S2V-14B-GGUF

models/audio_encoders

-----------------------------------

wav2vec2_large_english_fp16:

https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/audio_encoders/wav2vec2_large_english_fp16.safetensors

models/loras

-----------------------------------

wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors:

https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors

models/text_encoders

-----------------------------------

https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/text_encoders

models/vae

-----------------------------------

https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/vae
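
To make the folder layout above concrete, here is a minimal Python sketch that maps two of the listed files to their ComfyUI folders and builds direct download links. It assumes ComfyUI lives in a folder named ComfyUI (adjust comfy_root to your install) and relies on the Hugging Face convention that swapping /blob/ for /resolve/ in a file page URL gives the raw file. The names MODEL_FILES, to_download_url and plan_downloads are made up for this example. Only the diffusion model and audio encoder are included, since the text encoder and VAE links above point to folders rather than single files. Nothing is downloaded; the script only prints a plan.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# (ComfyUI subfolder, Hugging Face file page URL) pairs copied from the list above.
MODEL_FILES = [
    (
        "models/diffusion_models",
        "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/"
        "split_files/diffusion_models/wan2.2_s2v_14B_bf16.safetensors",
    ),
    (
        "models/audio_encoders",
        "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/"
        "split_files/audio_encoders/wav2vec2_large_english_fp16.safetensors",
    ),
]


def to_download_url(blob_url: str) -> str:
    """Turn a file page URL (/blob/) into a direct download URL (/resolve/)."""
    return blob_url.replace("/blob/", "/resolve/", 1)


def plan_downloads(comfy_root: str = "ComfyUI") -> list[tuple[str, str]]:
    """Return (download_url, target_path) pairs; comfy_root is an assumed install path."""
    plan = []
    for folder, url in MODEL_FILES:
        # The file name is the last component of the URL path.
        filename = PurePosixPath(urlparse(url).path).name
        target = PurePosixPath(comfy_root) / folder / filename
        plan.append((to_download_url(url), str(target)))
    return plan


if __name__ == "__main__":
    for url, target in plan_downloads():
        print(f"{url}\n  -> {target}")
```

Feed each printed URL to your browser or download tool of choice and save the file at the path shown. If you use the FP8 or GGUF variant instead of bf16, it goes in the same models/diffusion_models folder.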

https://github.com/benjiyaya/ComfyUI-Logic

Optional, for audio separation: https://huggingface.co/Kijai/MelBandRoFormer_comfy/tree/main

Attached are the Wan 2.2 S2V sample video workflow and the S2V I2V workflow I optimized for low VRAM, free for everyone.

