Kitten TTS by KittenML is the voice engine powering this workflow. It is free, open source, and runs offline.

Pipeline Overview

Step 1: Script Preparation

Write the video script. Break it into scenes. Estimate the timing for each segment.
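
The timing estimate can be roughed out from word counts before any audio exists. The sketch below assumes scenes are separated by blank lines and that narration runs at about 150 words per minute; both are illustrative assumptions, not Kitten TTS settings, and the helper names are hypothetical.

```python
import re

# Assumed narration pace; tune it after listening to real TTS output.
WORDS_PER_MINUTE = 150


def split_scenes(script: str) -> list[str]:
    """Split a script into scenes wherever a blank line appears."""
    return [s.strip() for s in re.split(r"\n\s*\n", script) if s.strip()]


def estimate_seconds(text: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Rough spoken duration of `text` at the given words-per-minute pace."""
    return len(text.split()) * 60.0 / wpm


script = """Welcome to the channel. Today we build a voiceover pipeline.

First, we write the script and split it into scenes.

Finally, we export the video for each platform."""

scenes = split_scenes(script)
for i, scene in enumerate(scenes, start=1):
    print(f"Scene {i}: ~{estimate_seconds(scene):.1f}s  {scene[:40]}")
```

Treat the numbers as planning figures only. The real clip lengths come from the generated audio in Step 2.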

Step 2: TTS Generation

Generate all narration with Kitten TTS. Save the audio files organized by scene.
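
One way to keep the files organized is a small batch helper that writes one numbered WAV per scene. This is a minimal sketch: it relies only on the generate_to_file(text, path, voice=...) call shown in the Quick Start code, and the folder name "audio", the scene_NN.wav naming scheme, and both function names are assumptions made for illustration.

```python
from pathlib import Path


def generate_scene_audio(model, scenes, out_dir="audio", voice="Jasper"):
    """Write one WAV per scene and return the paths in scene order.

    `model` is any object exposing generate_to_file(text, path, voice=...),
    the same call used in the Quick Start code.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(scenes, start=1):
        path = out / f"scene_{i:02d}.wav"  # zero-padded so files sort in order
        model.generate_to_file(text, str(path), voice=voice)
        paths.append(path)
    return paths


def run_with_kitten_tts(scenes):
    """Load Kitten TTS lazily and generate audio for every scene."""
    from kittentts import KittenTTS  # imported here so the helper above stays dependency-free

    model = KittenTTS("KittenML/kitten-tts-mini-0.8")
    return generate_scene_audio(model, scenes)
```

Calling run_with_kitten_tts(["Welcome to the video.", "Thanks for watching."]) would produce audio/scene_01.wav and audio/scene_02.wav, ready to import in Step 3.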

Step 3: Import to CapCut

Open CapCut. Import TTS audio files. Import footage, images, and B-roll.

Step 4: Timeline Assembly

Place the audio on the timeline. Match video clips to the audio timing. Add transitions between scenes.
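
Knowing each clip's exact length makes the matching step easier. The sketch below reads durations with Python's standard wave module and prints where each scene would start and end on the timeline. It assumes the audio files are PCM WAVs in a folder named "audio" and that a 0.25 second gap between scenes is wanted; the folder, the gap, and the function names are illustrative assumptions.

```python
import wave
from pathlib import Path


def wav_duration(path) -> float:
    """Duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def build_timeline(paths, gap=0.25):
    """Return (filename, start, end) tuples with `gap` seconds between clips."""
    timeline, cursor = [], 0.0
    for p in paths:
        end = cursor + wav_duration(p)
        timeline.append((Path(p).name, round(cursor, 2), round(end, 2)))
        cursor = end + gap
    return timeline


# Prints nothing if the (assumed) audio folder has no scene files yet.
for name, start, end in build_timeline(sorted(Path("audio").glob("scene_*.wav"))):
    print(f"{name}: {start:.2f}s -> {end:.2f}s")
```

The printed start and end times can serve as a cut list when trimming footage to fit each scene in the editor.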

Step 5: Text & Effects

Add captions with CapCut's auto-captions feature. Apply text overlays, stickers, and visual effects.

Step 6: Export

Export at the desired resolution. Optimize the settings for the target platform (TikTok, YouTube, etc.).
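
Per-platform settings are easier to keep consistent when written down once. The sketch below stores them as a small lookup table. The values are commonly used starting points (vertical 1080x1920 for TikTok, horizontal 1920x1080 for YouTube, 30 fps), included here as assumptions rather than official platform requirements, and the preset structure itself is hypothetical; the values still have to be entered by hand in the editor's export dialog.

```python
from dataclasses import dataclass
from math import gcd


@dataclass(frozen=True)
class ExportPreset:
    width: int
    height: int
    fps: int


# Assumed starting points; check each platform's current guidelines before relying on them.
PRESETS = {
    "tiktok": ExportPreset(width=1080, height=1920, fps=30),
    "youtube": ExportPreset(width=1920, height=1080, fps=30),
}


def aspect_ratio(preset: ExportPreset) -> str:
    """Reduce width:height to its simplest form, e.g. '16:9'."""
    d = gcd(preset.width, preset.height)
    return f"{preset.width // d}:{preset.height // d}"


for name, preset in PRESETS.items():
    print(f"{name}: {preset.width}x{preset.height} @ {preset.fps} fps ({aspect_ratio(preset)})")
```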

Key Tools in This Pipeline

  • Kitten TTS: Voice generation (free, offline, 8 voices)
  • Python: Automation and batch processing
  • Video Editor (CapCut): Final assembly and effects

Quick Start Code

from kittentts import KittenTTS

# Load the Kitten TTS model
model = KittenTTS("KittenML/kitten-tts-mini-0.8")

# Generate the voiceover and write it to a WAV file
model.generate_to_file("Your script text here.", "output.wav", voice="Jasper")
print("Voiceover ready for next pipeline step!")