My workflow/setup for WAN 2.1
Added 2025-03-22 11:12:17 +0000 UTC
Hello everybody,
this time a quick overview of how I was able to achieve pretty consistent videos with a length of 10-20 seconds.
Prerequisites:
Nvidia GPU with 12 GB VRAM or more (less is possible, but generation then takes far longer) and, ideally, 32 GB RAM.
If you don't have the required hardware, use a ComfyUI Wan template on Runpod.
WAN 2.1 GGUF models and clips (GGUF models are quantized models that make WAN 2.1 accessible for GPUs with less VRAM):
I2V 480P models - Q4_k_s 12 GB VRAM - Q5_k_s 16 GB VRAM -> place it into comfyui/models/diffusion_models
https://huggingface.co/city96/Wan2.1-I2V-14B-480P-gguf/tree/main
Clip Model -> place it into comfyui/models/clip_vision
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/clip_vision/clip_vision_h.safetensors
Wan 2.1 VAE -> place it into comfyui/models/vae
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors
Text Encoder -> place it into comfyui/models/text_encoders
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/text_encoders
A recent ComfyUI release with ComfyUI Manager
1-click installer for most of the needed components: https://civitai.com/models/1309415/comfyui-auto-installer-wan21-or-gguf-or-upscale
The workflows I use are attached. The main workflow is by https://civitai.com/user/UmeAiRT . In my testing, these workflows were the best in terms of features and usability.
After you have placed and installed everything, simply drag the workflows into the ComfyUI window and click "Install missing nodes".
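Before loading the workflows, it can help to check that the downloads ended up in the folders listed above. The following is only a rough sketch: the function name, the `EXPECTED` table and the file suffixes are my own assumptions for illustration, and the `comfyui` path is a placeholder for your own install location.

```python
from pathlib import Path

# Folder -> file suffixes expected there, following the list above.
# The suffixes are an assumption about which files you downloaded.
EXPECTED = {
    "diffusion_models": (".gguf",),
    "clip_vision": (".safetensors",),
    "vae": (".safetensors",),
    "text_encoders": (".safetensors", ".gguf"),
}


def missing_model_folders(comfyui_root):
    """Return the model subfolders that hold no file with an expected suffix."""
    models_dir = Path(comfyui_root) / "models"
    missing = []
    for folder, suffixes in EXPECTED.items():
        target = models_dir / folder
        found = target.is_dir() and any(
            p.is_file() and p.suffix.lower() in suffixes for p in target.iterdir()
        )
        if not found:
            missing.append(folder)
    return missing


if __name__ == "__main__":
    # "comfyui" is a placeholder; point it at your own ComfyUI folder.
    for folder in missing_model_folders("comfyui"):
        print(f"Nothing found in comfyui/models/{folder}")
```

If the script prints nothing, every folder contains at least one matching file. It does not check that the file is the right model, only that something is there.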
Process:
The process is split into three parts, each with its own workflow:
The Generation Workflow generates the videos from your chosen image.
Here are some tips for the settings:
Teacache: Speeds up generation through caching, with only a small quality loss. Use 0.19 for medium speed and 0.26 for fast generation. These are the values from the official GitHub page.
Frames: More frames = longer generation and inconsistent videos. I had the best results with a maximum of 48 frames.
Shift: This value controls the amount of movement in the video. You shouldn't go above 4, or you will lose too much quality.
Upscaler ratio and model: Use your desired settings, but keep in mind that very high resolutions are not supported on mobile devices. For that reason I have it set to 0.2.
Steps: 20-50 steps, depending on the video and your GPU.
Prompting Structure: Subject (Subject Description) + Scene (Scene Description) + Motion (Motion Description) + Camera Language + Atmosphere Words + Style
This is a no-brainer structure. You can also use it to give AIs a structure for analyzing your image.
Loras: You can find my muscle lora on Patreon as Super Raccoon and a bunch of others on Civitai: https://civitai.com/models
Interpolation: A necessary feature that makes the video smoother by adding frames.
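To keep prompts in the order of the prompting structure above, a small helper can assemble them. This is just an illustration: `build_prompt` is a made-up name, the example texts are invented, and joining the parts with commas is my assumption (the structure itself only says "+").

```python
def build_prompt(subject, scene, motion, camera="", atmosphere="", style=""):
    """Join the parts in the order Subject, Scene, Motion, Camera, Atmosphere, Style."""
    parts = [subject, scene, motion, camera, atmosphere, style]
    # Skip empty parts so optional fields do not leave stray commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())


# Example values are invented for illustration only.
prompt = build_prompt(
    subject="a red fox with thick winter fur",
    scene="standing in a snowy forest clearing",
    motion="slowly turning its head toward the viewer",
    camera="slow push-in, shallow depth of field",
    atmosphere="calm, cold morning light",
    style="cinematic, photorealistic",
)
print(prompt)
```

The same six headings also work as a checklist when you ask an AI to describe your start image.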
The Video Combiner Workflow simply combines several iterations into one video. Simply copy or delete nodes to match your number of videos.
The Reactor Workflow fixes an inconsistent face in your video. Simply take a screenshot of the face in the initial image that was used for the video generation. All settings are already configured in the workflow. On first start, the nodes will download additional models.
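To estimate how many clips you need to generate and combine for a target length, a quick calculation helps. This is a rough sketch with assumptions of my own: a base frame rate of 16 fps before interpolation (check the value in your workflow), and interpolation raising the playback frame rate together with the frame count, so the clip duration stays the same.

```python
import math

FRAMES_PER_CLIP = 48  # the maximum that gave the best results (see Frames above)
BASE_FPS = 16         # assumption: frame rate before interpolation


def clips_needed(target_seconds, frames_per_clip=FRAMES_PER_CLIP, fps=BASE_FPS):
    """Number of clips to generate and combine to reach target_seconds."""
    seconds_per_clip = frames_per_clip / fps
    return math.ceil(target_seconds / seconds_per_clip)


for target in (10, 20):
    print(target, "s ->", clips_needed(target), "clips")
```

Under these assumptions one clip is 3 seconds long, so a 10-20 second video means combining 4 to 7 clips.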
This guide is a WIP and a first iteration. I will update it if my workflow changes.
I hope that I could help you a little bit with the WAN 2.1 model usage.
Yours Fino!