Kitten TTS by KittenML is the voice engine powering this workflow. It is free, open-source, and can run offline.

Pipeline Overview

Step 1: Prompt Design

Write image-generation prompts that match your script topic. Include style and composition parameters in each prompt so the images stay visually consistent.
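
One way to keep style and composition consistent is to build every prompt from the same template. The sketch below is illustrative only: the PromptStyle class, the build_prompt helper, and the sample style strings are all hypothetical and not part of any tool named in this guide.

```python
from dataclasses import dataclass


@dataclass
class PromptStyle:
    """Hypothetical container for shared style and composition parameters."""
    style: str = "cinematic digital painting"
    composition: str = "wide shot, rule of thirds"
    lighting: str = "soft natural light"


def build_prompt(segment: str, style: PromptStyle) -> str:
    """Join the segment's subject with the shared style parameters."""
    subject = segment.strip().rstrip(".")
    return ", ".join([subject, style.style, style.composition, style.lighting])


# Example script segments (placeholders, one image prompt per segment).
segments = [
    "A lighthouse on a rocky coast at dawn.",
    "A fishing boat leaving the harbor.",
]
prompts = [build_prompt(s, PromptStyle()) for s in segments]
for p in prompts:
    print(p)
```

Changing one field of PromptStyle restyles every prompt at once, which is useful when a whole video should share one look.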

Step 2: Image Generation

ComfyUI generates the images using an SDXL or Flux model. Generate one batch of images for each script segment.
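
Batch generation can be scripted by queueing one job per prompt through ComfyUI's HTTP endpoint. The sketch below assumes a ComfyUI server at the default local address and a workflow exported in API format. The node id "6" and the trimmed template are hypothetical, so substitute the id of the text-encode node from your own export.

```python
import copy
import json
import urllib.request

# Assumption: default local ComfyUI address; adjust to your setup.
COMFYUI_URL = "http://127.0.0.1:8188/prompt"


def build_payloads(workflow: dict, text_node_id: str, prompts: list[str]) -> list[dict]:
    """Return one queue payload per prompt, leaving the template untouched."""
    payloads = []
    for prompt in prompts:
        wf = copy.deepcopy(workflow)
        wf[text_node_id]["inputs"]["text"] = prompt
        payloads.append({"prompt": wf})
    return payloads


def queue(payload: dict, url: str = COMFYUI_URL) -> bytes:
    """POST one payload to a running ComfyUI server and return the raw response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


# Hypothetical, heavily trimmed workflow: a real export has many more nodes.
template = {"6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}}}
payloads = build_payloads(template, "6", ["a lighthouse at dawn", "a fishing boat"])
print(len(payloads))
```

Calling queue(payload) for each entry submits the jobs. It is left out of the demo lines because it needs a running server.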

Step 3: TTS Voiceover

Kitten TTS generates the narration. Match the audio timing to the image count: the on-screen time of all images should add up to the length of the narration.
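
A simple way to match timing is to divide the narration length evenly across the images. This is a minimal sketch using Python's standard wave module. It assumes the voiceover is a PCM WAV file that wave can read, and the helper names are hypothetical. The demo writes three seconds of silence (the 24000 Hz rate is just a demo value) so it runs without any TTS output.

```python
import os
import tempfile
import wave


def wav_duration(path: str) -> float:
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def seconds_per_image(audio_seconds: float, image_count: int) -> float:
    """Split the narration length evenly across the images."""
    if image_count <= 0:
        raise ValueError("image_count must be positive")
    return audio_seconds / image_count


# Demo: a 3-second silent mono 16-bit file stands in for the real voiceover.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(24000)
    wav.writeframes(b"\x00\x00" * 24000 * 3)

per_image = seconds_per_image(wav_duration(demo_path), 6)
print(per_image)
```

An even split is the simplest choice. If you generate one audio file per script segment, measuring each file separately gives tighter sync.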

Step 4: Video Composition

Combine the generated images with the TTS audio. Add a Ken Burns effect or zoom transitions so the still images do not look static.
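
If you assemble the video with ffmpeg rather than a graphical editor, its zoompan filter can produce a slow Ken Burns style zoom. The sketch below only builds the command line and does not run it. It assumes ffmpeg is installed, and the function name, file names, and default values are hypothetical.

```python
def ken_burns_command(image, output, seconds, fps=25, size="1280x720", max_zoom=1.2):
    """Build an ffmpeg command that slowly zooms into the center of one still image."""
    frames = max(1, round(seconds * fps))
    step = (max_zoom - 1.0) / frames  # zoom increment per output frame
    zoompan = (
        f"zoompan=z='min(zoom+{step:.6f},{max_zoom})'"
        f":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"
        f":d={frames}:s={size}:fps={fps}"
    )
    return ["ffmpeg", "-y", "-i", image, "-vf", zoompan,
            "-c:v", "libx264", "-pix_fmt", "yuv420p", output]


cmd = ken_burns_command("img_000.png", "clip_000.mp4", seconds=4.0)
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would render one silent clip per image. Joining the clips and adding the narration track is a separate step.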

Step 5: Post-Processing

Add captions, background music, and a brand overlay. Apply the final color grade.
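
Captions can be generated from the same script segments used for the voiceover. This sketch writes them in SubRip (SRT) format, which most video editors can import. It assumes you already know each segment's duration in seconds, and the helper names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(segments: list[tuple[str, float]]) -> str:
    """Turn (text, duration) pairs into back-to-back SRT caption blocks."""
    blocks, start = [], 0.0
    for index, (text, duration) in enumerate(segments, start=1):
        end = start + duration
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
        start = end
    return "\n".join(blocks)


srt_text = build_srt([("Hello.", 2.5), ("World.", 1.0)])
print(srt_text)
```

Writing srt_text to a file such as captions.srt gives the editor a caption track that lines up with the narration segments.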

Step 6: Output

Export the complete video. It is then ready for YouTube, TikTok, or Instagram.

Key Tools in This Pipeline

  • Kitten TTS: Voice generation (free, offline, 8 voices)
  • Python: Automation and batch processing
  • Video Editor: Final assembly and effects

Quick Start Code

from kittentts import KittenTTS

# Load the Kitten TTS model used in this pipeline
model = KittenTTS("KittenML/kitten-tts-mini-0.8")

# Generate the voiceover and write it to a WAV file
model.generate_to_file("Your script text here.", "output.wav", voice="Jasper")
print("Voiceover ready for next pipeline step!")
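
For a multi-segment script, the same call can be looped to produce one audio file per segment. This is a rough sketch: synthesize_segments, the output folder, and the file naming scheme are hypothetical. It relies only on generate_to_file being callable as in the quick start, and takes the loaded model as an argument so the helper itself does not import kittentts.

```python
from pathlib import Path


def synthesize_segments(model, segments, out_dir="voiceover", voice="Jasper"):
    """Write one WAV per script segment and return the paths in order."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(segments):
        path = out / f"segment_{index:03d}.wav"
        # Same call shape as the quick start: text, output path, voice.
        model.generate_to_file(text, str(path), voice=voice)
        paths.append(path)
    return paths
```

With the quick-start model loaded, synthesize_segments(model, ["First line.", "Second line."]) would write segment_000.wav and segment_001.wav into the voiceover folder.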