Kitten TTS by KittenML is the voice engine powering this workflow. It is free, open-source, and can run offline.
Pipeline Overview
Step 1: Prompt Design
Write image generation prompts that match your script topic, and include style and composition parameters in each one.
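One way to keep prompts consistent is to build them from the script segments plus a shared set of style and composition parameters. The sketch below is only an illustration: the `PromptStyle` class, the `build_prompts` helper, and the sample style strings are hypothetical names and values, not part of any tool in this pipeline.

```python
from dataclasses import dataclass


@dataclass
class PromptStyle:
    """Hypothetical container for shared style and composition parameters."""
    style: str = "digital illustration, soft lighting"
    composition: str = "wide shot, rule of thirds"
    negative: str = "text, watermark, blurry"


def build_prompts(segments, topic, style=None):
    """Return one prompt dict per script segment."""
    style = style or PromptStyle()
    prompts = []
    for index, segment in enumerate(segments):
        # Drop the trailing period so the segment reads as a prompt fragment.
        subject = segment.strip().rstrip(".")
        positive = ", ".join([subject, topic, style.style, style.composition])
        prompts.append({"index": index, "positive": positive, "negative": style.negative})
    return prompts


segments = [
    "A kitten discovers a microphone.",
    "The kitten records its first podcast.",
]
prompts = build_prompts(segments, topic="cozy home studio")
for prompt in prompts:
    print(prompt["index"], prompt["positive"])
```

Keeping the style in one place means a change of look only needs one edit, and every segment still gets its own subject.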
Step 2: Image Generation
ComfyUI generates the images with an SDXL or Flux model. Batch-generate the images for each script segment.
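Batch generation can be scripted against ComfyUI's HTTP queue endpoint. The sketch below assumes a local ComfyUI server on the default address and a workflow exported in API format. The node id "6" and the one-node `demo_workflow` are placeholders, since node ids depend on your own exported workflow, and `make_jobs` and `queue_job` are hypothetical helper names.

```python
import copy
import json
import urllib.request

# Assumption: ComfyUI is running locally on its default port.
COMFYUI_URL = "http://127.0.0.1:8188/prompt"


def make_jobs(workflow, prompts, text_node_id):
    """Return one workflow copy per prompt with the prompt text swapped in."""
    jobs = []
    for prompt in prompts:
        job = copy.deepcopy(workflow)  # leave the template untouched
        job[text_node_id]["inputs"]["text"] = prompt
        jobs.append(job)
    return jobs


def queue_job(job, url=COMFYUI_URL):
    """POST one workflow to the queue endpoint and return the parsed JSON reply."""
    body = json.dumps({"prompt": job}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Placeholder fragment: a real API-format export contains the full node graph.
demo_workflow = {"6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}}}
jobs = make_jobs(
    demo_workflow,
    ["a kitten at a microphone", "a kitten wearing headphones"],
    text_node_id="6",
)
print(len(jobs), "jobs prepared")
# With ComfyUI running and a full workflow loaded, queue them:
# for job in jobs:
#     queue_job(job)
```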
Step 3: TTS Voiceover
Kitten TTS generates the narration. Match the audio timing to the image count so that each image stays on screen for its share of the voiceover.
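A simple way to match timing is to divide the voiceover length evenly across the images. The sketch below assumes the voiceover is a PCM WAV file that Python's standard `wave` module can read. The helper names are hypothetical, and the demo writes three seconds of silence so it runs without a real recording.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def seconds_per_image(audio_seconds, image_count):
    """Split the audio length evenly across the images."""
    if image_count <= 0:
        raise ValueError("image_count must be positive")
    return audio_seconds / image_count


# Demo input: 3 seconds of 16-bit mono silence at 24 kHz.
demo_path = os.path.join(tempfile.gettempdir(), "demo_silence.wav")
with wave.open(demo_path, "wb") as demo:
    demo.setnchannels(1)
    demo.setsampwidth(2)
    demo.setframerate(24000)
    demo.writeframes(b"\x00\x00" * 24000 * 3)

duration = wav_duration_seconds(demo_path)
print(seconds_per_image(duration, image_count=6))  # prints 0.5
```

An even split is the simplest choice. If you generate one audio file per script segment instead, each image can take the duration of its own segment.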
Step 4: Video Composition
Combine the generated images with the TTS audio, and add a Ken Burns effect or zoom transitions to give the still images motion.
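One possible way to script the Ken Burns effect is FFmpeg's `zoompan` filter. The original workflow does not name a composition tool, so FFmpeg is an assumption here, as are the `ken_burns_command` helper, the file names, and the default zoom, size, and frame rate. The sketch only builds the command; running it requires FFmpeg to be installed.

```python
def ken_burns_command(image_path, output_path, seconds,
                      fps=25, max_zoom=1.2, size="1280x720"):
    """Build an ffmpeg command that slowly zooms into the center of one image."""
    frames = max(1, round(seconds * fps))
    # Per-frame zoom step chosen so the zoom reaches max_zoom on the last frame.
    step = (max_zoom - 1.0) / frames
    zoompan = (
        f"zoompan=z='min(zoom+{step:.6f},{max_zoom})'"
        ":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"
        f":d={frames}:s={size}:fps={fps}"
    )
    return [
        "ffmpeg", "-y", "-i", image_path,
        "-vf", zoompan,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        output_path,
    ]


command = ken_burns_command("segment_000.png", "segment_000.mp4", seconds=4.0)
print(" ".join(command))
# To render the clip, pass the list to subprocess.run(command, check=True).
```

Each image becomes a short clip whose length matches its share of the narration. The clips can then be joined and combined with the voiceover in your video editor.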
Step 5: Post-Processing
Add captions, background music, and a brand overlay, then apply the final color grade.
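Captions can be generated from the same script segments used for narration. The sketch below writes them in the SubRip (SRT) format, which most video editors can import. It assumes you already know each segment's duration in seconds; the helper names and sample text are hypothetical.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(segments):
    """Build SRT text from a list of (caption_text, duration_seconds) pairs."""
    blocks = []
    start = 0.0
    for number, (text, duration) in enumerate(segments, start=1):
        end = start + duration
        blocks.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
        start = end  # the next caption begins where this one ends
    return "\n".join(blocks)


srt_text = build_srt([
    ("A kitten discovers a microphone.", 2.5),
    ("It records its first podcast.", 3.0),
])
print(srt_text)
```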
Step 6: Output
Export the complete video, ready for YouTube, TikTok, or Instagram.
Key Tools in This Pipeline
- Kitten TTS: Voice generation (free, offline, 8 voices)
- Python: Automation and batch processing
- Video Editor: Final assembly and effects
Quick Start Code
from kittentts import KittenTTS

# Load the Kitten TTS model by name
model = KittenTTS("KittenML/kitten-tts-mini-0.8")

# Generate the voiceover for the pipeline and save it as a WAV file
model.generate_to_file("Your script text here.", "output.wav", voice="Jasper")
print("Voiceover ready for next pipeline step!")
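For a multi-segment script, the quick start can be extended into a small batch helper that writes one audio file per segment. This is a sketch: `generate_segment_audio` and the file naming scheme are hypothetical, and it relies only on the `generate_to_file` call shown in the quick start. The model is passed in as an argument so the helper itself does not need to load it.

```python
from pathlib import Path


def generate_segment_audio(model, segments, out_dir, voice="Jasper"):
    """Write one WAV file per script segment and return the file paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    files = []
    for index, text in enumerate(segments):
        target = out_path / f"segment_{index:03d}.wav"
        # Same call as in the quick start, once per segment.
        model.generate_to_file(text, str(target), voice=voice)
        files.append(target)
    return files


# Usage, assuming kittentts is installed:
# from kittentts import KittenTTS
# model = KittenTTS("KittenML/kitten-tts-mini-0.8")
# generate_segment_audio(model, ["First segment.", "Second segment."], "voiceover")
```

Numbered file names keep the audio in the same order as the images, which makes the later assembly step easier to script.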