AI Knowledge Central

Your Talking Avatar - 🔥NSFW🔥 Audio

Added 2025-09-22 19:27:20 +0000 UTC

Hey friends! 💛 I put together a dead-simple InfiniteTalk + WAN I2V setup so you can get talking-head video generation working in ComfyUI without guesswork. You'll install ComfyUI and a few essential custom nodes, then grab the exact models this workflow expects. Everything's below; just match the GGUF quantization to your GPU's VRAM and you're golden.

One-click installer for my Patreon supporters 💓:
https://www.patreon.com/posts/comfyui-infinite-139122339

💻 Software

Get Git
If you haven’t already, install Git from here:
https://git-scm.com/
ComfyUI — Generate video, images, audio, and more with a node graph.
https://github.com/comfyanonymous/ComfyUI
Download and install.
Or, if you have an RTX 50XX-series GPU, grab the portable build from the same repository:
https://github.com/comfyanonymous/ComfyUI
Download and unzip.
ComfyUI-Manager
Open a command prompt (cmd) in your ComfyUI\custom_nodes folder and run:
git clone https://github.com/Comfy-Org/ComfyUI-Manager.git
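If you'd rather script this step, here's a minimal Python sketch that does the same clone. It assumes `git` is already on your PATH (see "Get Git" above) and that you pass in your own ComfyUI folder; `manager_clone_command` and `install_manager` are hypothetical helper names, not part of ComfyUI.

```python
import shutil
import subprocess
from pathlib import Path

MANAGER_URL = "https://github.com/Comfy-Org/ComfyUI-Manager.git"


def manager_clone_command(comfy_root) -> list[str]:
    """Build the git command that clones ComfyUI-Manager into custom_nodes."""
    dest = Path(comfy_root) / "custom_nodes" / "ComfyUI-Manager"
    return ["git", "clone", MANAGER_URL, str(dest)]


def install_manager(comfy_root) -> str:
    """Clone ComfyUI-Manager unless its folder already exists."""
    dest = Path(comfy_root) / "custom_nodes" / "ComfyUI-Manager"
    if dest.exists():
        return "already installed"
    if shutil.which("git") is None:
        raise RuntimeError("git not found on PATH; install Git first")
    subprocess.run(manager_clone_command(comfy_root), check=True)
    return "installed"
```

Usage would look like `install_manager(r"C:\ComfyUI")` (example path). Custom nodes are loaded at startup, so restart ComfyUI afterwards.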

🧩 Must-have Custom Nodes

Install these from inside ComfyUI using ComfyUI-Manager:

KJNodes
https://github.com/kijai/ComfyUI-KJNodes

rgthree-comfy
https://github.com/rgthree/rgthree-comfy

VideoHelperSuite
https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite

MelBandRoFormer (node)
https://github.com/kijai/ComfyUI-MelBandRoFormer

WanVideoWrapper
https://github.com/kijai/ComfyUI-WanVideoWrapper
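To confirm all five node packs actually landed, here's a small Python sketch that lists which ones are missing from `custom_nodes`. It assumes each pack lives in a folder named after its GitHub repo; because the Manager may create lowercase folder names when installing from its registry, the match is case-insensitive. `missing_node_packs` is a hypothetical helper.

```python
from pathlib import Path

# Folder names assumed to match the GitHub repo names listed above.
REQUIRED_NODE_PACKS = [
    "ComfyUI-KJNodes",
    "rgthree-comfy",
    "ComfyUI-VideoHelperSuite",
    "ComfyUI-MelBandRoFormer",
    "ComfyUI-WanVideoWrapper",
]


def missing_node_packs(comfy_root) -> list[str]:
    """Return the required node packs with no matching folder in custom_nodes."""
    custom_nodes = Path(comfy_root) / "custom_nodes"
    present = set()
    if custom_nodes.is_dir():
        # Compare lowercased names so "comfyui-kjnodes" matches "ComfyUI-KJNodes".
        present = {p.name.lower() for p in custom_nodes.iterdir() if p.is_dir()}
    return [name for name in REQUIRED_NODE_PACKS if name.lower() not in present]
```

An empty list means every pack has a folder; it does not prove the pack's Python dependencies installed cleanly.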

🧠 Pick the right GGUF for your VRAM
8 GB → Q4_K_M → fallback Q3_K_M
12 GB → Q5_K_M → fallback Q4_K_M
16 GB → Q6_K → fallback Q5_K_M
24 GB → Q6_K (MelBandRoFormer fp32)
32 GB+ → Q8_0
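The table above boils down to a simple lookup. Here's a Python sketch of it, assuming a card that falls between two rows (say 10 GB) should use the lower row. The table gives no fallback for 24 GB or 32 GB+, so those are `None`. The MelBandRoFormer column is my reading of the fp32 note on the 24 GB row plus the fp16/fp32 guidance in the models section, not a rule stated by the workflow itself.

```python
# (minimum VRAM in GB, primary GGUF, fallback GGUF, MelBandRoFormer precision)
GGUF_TABLE = [
    (32, "Q8_0", None, "fp32"),
    (24, "Q6_K", None, "fp32"),
    (16, "Q6_K", "Q5_K_M", "fp16"),
    (12, "Q5_K_M", "Q4_K_M", "fp16"),
    (8, "Q4_K_M", "Q3_K_M", "fp16"),
]


def pick_gguf(vram_gb: float) -> dict:
    """Return the table row for the largest tier that fits the given VRAM."""
    for min_gb, primary, fallback, melband in GGUF_TABLE:
        if vram_gb >= min_gb:
            return {"primary": primary, "fallback": fallback, "melbandroformer": melband}
    raise ValueError(f"{vram_gb} GB is below the smallest tier (8 GB) in the table")
```

For example, `pick_gguf(12)` returns `{"primary": "Q5_K_M", "fallback": "Q4_K_M", "melbandroformer": "fp16"}`. Drop to the fallback if the primary runs out of memory.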

📦 Models (what each is for)

😊 InfiniteTalk (GGUF) — Talking-head driver
This is the core GGUF model that powers lip-sync and head motion generation from audio input. It listens to the audio and turns it into realistic mouth movements, facial animation, and subtle head nods. ➡️ Store in: models/diffusion_models
https://huggingface.co/Kijai/WanVideo_comfy_GGUF/tree/main/InfiniteTalk

📹WAN 2.1 I2V 14B 480p (GGUF) — Image-to-Video backbone
Provides the motion engine that takes a still input frame and animates it into smooth video at 480p resolution. While InfiniteTalk handles facial sync, WAN I2V ensures natural motion and temporal coherence. ➡️ Store in: models/diffusion_models
https://huggingface.co/city96/Wan2.1-I2V-14B-480P-gguf/tree/main
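Why does the quant level matter so much for a 14B model? A rough back-of-the-envelope estimate is parameters × bits-per-weight ÷ 8. This Python sketch uses the nominal bits per weight of the base ggml quant types (Q8_0 stores 34 bytes per 32 weights; the K-quants use 256-weight super-blocks). "14B" is just the nominal count from the model name. The _M variants mix in some higher-precision tensors, and GGUF conversions commonly keep small tensors such as norms and biases at higher precision, so real files come out larger than this estimate.

```python
# Nominal bits per weight for the base ggml quant types.
BITS_PER_WEIGHT = {
    "Q3_K": 3.4375,  # 110 bytes per 256 weights
    "Q4_K": 4.5,     # 144 bytes per 256 weights
    "Q5_K": 5.5,     # 176 bytes per 256 weights
    "Q6_K": 6.5625,  # 210 bytes per 256 weights
    "Q8_0": 8.5,     # 34 bytes per 32 weights
}


def approx_weights_gb(n_params: float, quant: str) -> float:
    """Lower-bound size of the quantized weights in decimal GB (1e9 bytes)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9
```

With `n_params = 14e9` this gives about 7.9 GB for Q4_K, 11.5 GB for Q6_K and 14.9 GB for Q8_0, before counting the text encoder, CLIP vision, VAE and activations.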

⚡Lightx2v LoRA (Lightning 4-step) — Speed booster for I2V
A step- and CFG-distilled LoRA (hence cfg_step_distill in the filename) that cuts sampling down to about 4 steps while largely keeping quality. That makes the whole pipeline faster, which matters most for long sequences. ➡️ Store in: models/loras
https://huggingface.co/Kijai/WanVideo_comfy/resolve/main/Lightx2v/lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors?download=true
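Here's a rough Python sketch of where the speedup comes from: counting diffusion-model forward passes. It assumes the common optimisation where a CFG scale of 1.0 skips the unconditional pass, while any other scale runs two passes per step. The 30-step, CFG 5.0 baseline is a hypothetical illustration, not a measured setting. The sketch also ignores extra guidance passes (e.g. separate audio guidance) that a workflow may add, and fixed costs like VAE decoding.

```python
def model_passes(steps: int, cfg_scale: float) -> int:
    """Forward passes of the diffusion model for one sampling run.

    Assumes CFG 1.0 runs only the conditional pass; any other scale runs
    a conditional and an unconditional pass on every step.
    """
    passes_per_step = 1 if cfg_scale == 1.0 else 2
    return steps * passes_per_step
```

Under those assumptions, a 30-step run at CFG 5.0 needs 60 passes and a 4-step run at CFG 1.0 needs 4, about 15× fewer per generated window.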

🧰 MelBandRoFormer (fp16 / fp32) — Vocal Separation Model
Separates raw audio into vocals and instruments, ensuring a clean speech track for lip-sync. InfiniteTalk relies on this isolated voice to avoid background noise interference.
➡️ Store in: models/diffusion_models
fp16: recommended for GPUs below ~24 GB VRAM.
https://huggingface.co/Kijai/MelBandRoFormer_comfy/resolve/main/MelBandRoformer_fp16.safetensors?download=true
fp32: more stable on larger setups with high VRAM.
https://huggingface.co/Kijai/MelBandRoFormer_comfy/resolve/main/MelBandRoformer_fp32.safetensors?download=true
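If you want to script the downloads, here's a minimal Python sketch that maps a Hugging Face `/resolve/` link to the right `models/` subfolder and downloads it with the standard library. It ignores the `?download=true` query string and writes to a `.part` file first so an interrupted download isn't mistaken for a finished model. The helper names and the `ComfyUI` root path are hypothetical. The `/tree/` links (InfiniteTalk, WAN I2V) are folder pages, so pick the GGUF file you want there and copy its download link.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse
from urllib.request import urlretrieve


def target_path(url: str, comfy_root, subfolder: str) -> Path:
    """Map a download URL to ComfyUI/models/<subfolder>/<filename>.

    Only the URL path is used, so a trailing "?download=true" is ignored.
    """
    filename = unquote(PurePosixPath(urlparse(url).path).name)
    return Path(comfy_root) / "models" / subfolder / filename


def download(url: str, comfy_root, subfolder: str) -> Path:
    """Download url into the models folder unless the file already exists."""
    dest = target_path(url, comfy_root, subfolder)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        tmp = dest.with_name(dest.name + ".part")  # avoid a half-written model file
        urlretrieve(url, tmp)
        tmp.replace(dest)
    return dest
```

For example, `download(fp16_url, "ComfyUI", "diffusion_models")` would save `MelBandRoformer_fp16.safetensors` into `ComfyUI/models/diffusion_models`.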

📜 UMT5-XXL (Text Encoder) — Prompt interpreter
A massive text encoder (based on mT5 XXL) that converts written prompts into semantic embeddings. This lets the model understand and follow style, context, and conditioning beyond audio. ➡️ Store in: models/text_encoders
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp16.safetensors?download=true

🖼️ CLIP-Vision H — Visual encoder
Processes input frames or reference images to ensure the animated video remains faithful to the original identity and composition. ➡️ Store in: models/clip_vision
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors

🛠️WAN 2.1 VAE — Latent encoder/decoder
The VAE compresses frames into latents for efficient processing and reconstructs them into visuals. Using the repackaged WAN VAE ensures maximum compatibility. ➡️ Store in: models/vae
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors?download=true
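To get a feel for how much the VAE shrinks the video, here's a Python sketch of the latent shape. It assumes the Wan 2.1 VAE layout: 16 latent channels, 8× spatial compression, and 4× temporal compression with the first frame encoded on its own. That is why frame counts take the form 4k + 1. Treat these numbers as assumptions rather than values read from this workflow.

```python
def wan_latent_shape(num_frames: int, height: int, width: int) -> tuple:
    """Latent (channels, frames, height, width) for a Wan 2.1 VAE encode.

    Assumes 16 channels, 4x temporal compression after the first frame,
    and 8x spatial compression.
    """
    if (num_frames - 1) % 4 != 0:
        raise ValueError("frame count should be of the form 4k + 1, e.g. 81")
    if height % 8 or width % 8:
        raise ValueError("height and width should be multiples of 8")
    return (16, (num_frames - 1) // 4 + 1, height // 8, width // 8)
```

Under these assumptions, 81 frames at 480×832 become a 16×21×60×104 latent. The latent grows with both clip length and resolution, so shortening the audio or lowering the resolution is a cheap way to run a quick test.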

🧰 Tencent Wav2Vec2 (Base) — Speech Embedding Model
Extracts speech embeddings from the clean vocal track. These embeddings capture phoneme- and prosody-like features, which InfiniteTalk uses to generate accurate mouth shapes and timing.
No download link needed; it is downloaded automatically at runtime.
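Before launching, you can sanity-check the model folders with a short Python sketch like this one. The fixed filenames come from the download links above. GGUF filenames depend on the quant you picked, so the sketch only counts `.gguf` files in `diffusion_models`, assuming the InfiniteTalk and WAN I2V GGUFs are the two expected there. Wav2Vec2 is skipped because it downloads at runtime. `missing_models` is a hypothetical helper.

```python
from pathlib import Path

# Filenames from the download links above, mapped to their models/ subfolders.
EXPECTED_FILES = {
    "lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors": "loras",
    "umt5_xxl_fp16.safetensors": "text_encoders",
    "clip_vision_h.safetensors": "clip_vision",
    "wan_2.1_vae.safetensors": "vae",
}
# Either MelBandRoFormer precision satisfies the workflow.
MELBAND_CHOICES = ("MelBandRoformer_fp16.safetensors", "MelBandRoformer_fp32.safetensors")


def missing_models(comfy_root) -> list[str]:
    """List expected model files that are not present under ComfyUI/models."""
    models = Path(comfy_root) / "models"
    missing = [f"{sub}/{name}" for name, sub in EXPECTED_FILES.items()
               if not (models / sub / name).is_file()]
    diffusion = models / "diffusion_models"
    if not any((diffusion / name).is_file() for name in MELBAND_CHOICES):
        missing.append("diffusion_models/MelBandRoformer_fp16 or _fp32.safetensors")
    ggufs = list(diffusion.glob("*.gguf")) if diffusion.is_dir() else []
    if len(ggufs) < 2:
        missing.append("diffusion_models/*.gguf (InfiniteTalk + WAN 2.1 I2V)")
    return missing
```

An empty list means every expected file is in place.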

👉 TL;DR flow

1. Install ComfyUI + ComfyUI-Manager
2. Add the listed custom nodes
3. Download the models above
4. Launch ComfyUI, load the InfiniteTalk workflow, feed in your image, audio, and prompt, and render

Comments

Hi, I haven't found a way to port it to Wan 2.2 yet. Regarding the Spanish audio, that is a tough one. I found this, but it seems it needs to be compiled somehow: https://huggingface.co/flax-community/wav2vec2-spanish/tree/main To be honest, that is beyond my knowledge, sorry.

Chris Wenzl

2025-10-28 22:25:57 +0000 UTC

Hi Chris, any chance for this workflow to: 1. move to Wan 2.2 (better LoRA support, higher quality), 2. include Spanish-language lip sync? Currently it doesn't work well with Spanish audio (I assume due to the Chinese wav2vec2 model).

Niko Louvranos

2025-10-28 13:14:33 +0000 UTC

Hi, could you try lowering the output resolution or the length of the audio file, so that you get a "quick" test? DM me if the problem persists.

Chris Wenzl

2025-10-19 15:01:46 +0000 UTC

Hi Chris, I can't seem to get this past WanVideoSampler. It gets stuck there, which is 75% through the whole process but 0% through WanVideoSampler. I've tried changing the blocks_to_swap value to 40, but it didn't make a difference. Same with lowering the resolution of the source image. I've also tried Q5_K_M and Q4_K_M to no avail. Any help would be appreciated.

Iain Forbes

2025-10-18 20:29:15 +0000 UTC
