Doc

09/07/2025 (Sun) 03:39 No.67011

>>67010
It's only using speech-to-video via the Wan 2.2 model (Chinese, open source), which is pretty good for what it is but far, far, far behind Act 2's video-to-video performance. That is what I'll ultimately use.

All the voices are actually just very advanced text-to-speech, plus dozens of generations and some editing to get the acting right. I didn't even do multiple generations for the video, because it takes 30+ minutes for every 10 seconds of video.

I did all of the writing, speech and video today.