Doc
09/07/2025 (Sun) 03:39
No.67011
>>67010
It's only using speech-to-video via the Wan 2.2 model (Chinese open source), which is pretty good for what it is but far, far, far behind Act 2's video2video performance. Act 2 is what I'll use ultimately.
All the voices are actually just very advanced text-to-speech, plus dozens of generations and some editing to get the acting right. I didn't even do multiple generations for the video, because it takes 30+ minutes for every 10 seconds of video.
I did all of the writing, speech and video today.
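To put that render time in perspective, here's a rough back-of-the-envelope sketch. It assumes the rate stays constant at 30 minutes per 10 seconds of output, which is only the lower bound of the "30+" figure, and that time scales linearly with clip length. The function name and the clip lengths are made up for illustration.

```python
def estimate_render_minutes(video_seconds, minutes_per_chunk=30.0, chunk_seconds=10.0):
    """Lower-bound render time in minutes.

    Assumption: a constant rate of `minutes_per_chunk` minutes per
    `chunk_seconds` seconds of output, scaling linearly with length.
    """
    if video_seconds < 0:
        raise ValueError("video_seconds must be non-negative")
    return video_seconds / chunk_seconds * minutes_per_chunk


if __name__ == "__main__":
    # Hypothetical clip lengths, in seconds of finished video.
    for clip in (10, 60, 120):
        minutes = estimate_render_minutes(clip)
        print(f"{clip:>4}s of video: at least {minutes:.0f} min ({minutes / 60:.1f} h) per take")
```

So a single take of a one-minute clip would already be around three hours at that rate, which is why regenerating the video the way I did with the voices wasn't practical.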