Polished Narrated Video Playbook

April 17, 2026

Note — April 23, 2026

This playbook was originally built around the local LTX video pipeline. After the Runway Gen-4.5 migration, some of the underlying steps are in flux. The pattern (narrate first, split on natural pauses, render clips in parallel, upscale once at the end) still holds — read the Runway note for what's currently live.

Each video clip is capped at 5 seconds, so anything longer (a 30-second explainer, a founder pitch, a podcast visual) has to be assembled from several clips. Alfrada now runs that assembly end to end without step-by-step supervision: narrate first, score with music, split the narration on natural speech pauses, render each slice in parallel, stitch with crossfades, and upscale once at the end. Iterations stay cheap, and the one expensive call happens only at the very end, once you approve the draft.
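
To make the "split on natural speech pauses" step concrete, here is a minimal sketch of one way it could work. It is not Alfrada's actual implementation: the function name, the greedy "latest pause that still fits" rule, and the assumption that pause timestamps have already been detected are all illustrative. Only the 5-second cap comes from the playbook.

```python
from typing import List, Tuple

MAX_CLIP_SECONDS = 5.0  # per-clip cap stated in this playbook


def split_on_pauses(duration: float, pauses: List[float],
                    max_len: float = MAX_CLIP_SECONDS) -> List[Tuple[float, float]]:
    """Cut [0, duration] into segments no longer than max_len.

    Hypothetical helper: prefers the latest detected pause that still fits
    under the cap, and falls back to a hard cut at the cap when the speaker
    does not pause in time.
    """
    segments = []
    start = 0.0
    candidates = sorted(p for p in pauses if 0.0 < p < duration)
    while duration - start > max_len:
        limit = start + max_len
        fitting = [p for p in candidates if start < p <= limit]
        cut = fitting[-1] if fitting else limit
        segments.append((start, cut))
        start = cut
    segments.append((start, duration))
    return segments


# Example: a 12-second narration with pauses detected at these timestamps.
slices = split_on_pauses(12.0, [2.1, 4.6, 7.0, 9.3, 11.5])
```

Each resulting slice ends on a pause where one is available, so a clip boundary falls between phrases rather than in the middle of a word.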

Why it works as a workflow

  • Narration drives the pacing. The audio is generated first and split on natural speech pauses, so each video clip lines up with a natural phrase — not an arbitrary 5-second boundary.
  • Clips render in parallel instead of one at a time, so a 30-second video doesn't take six times as long as a 5-second one.
  • Cheap iteration, expensive upscale only at the end. The agent iterates at low resolution until you approve, then upscales the final draft. You don't pay for an upscale on every regeneration.
  • Slide interstitials can land between clips — title cards, section breaks, simple diagrams — for product walkthroughs and explainers.
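
The parallel-render point above can be sketched with Python's standard thread pool. This is an illustration under stated assumptions, not the platform's code: `render_clip` is a hypothetical stand-in that sleeps instead of calling a real renderer, and the worker count is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def render_clip(segment: Tuple[float, float]) -> str:
    """Hypothetical stand-in for a render call; sleeps instead of rendering."""
    start, end = segment
    time.sleep(0.05)  # pretend the remote render takes a while
    return f"clip_{start:.1f}_{end:.1f}.mp4"


def render_all(segments: List[Tuple[float, float]], workers: int = 6) -> List[str]:
    """Render every segment concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_clip, segments))


clips = render_all([(0.0, 4.6), (4.6, 9.3), (9.3, 12.0)])
```

Because `Executor.map` returns results in the order of its inputs, the clips can be stitched in narration order even if they finish rendering out of order.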

Common variants the playbook handles

  • Silent or music-only video — skip narration, the agent adapts.
  • Speech over static slides — skip clip generation, render slides plus narration.
  • Extending a clip — add a few seconds to a single clip rather than regenerating it.

When you'd reach for it

  • A 30-second founder pitch for demo day. "Make me a 30-second narrated video for [company]: cinematic style, piano underscore, sections for problem, solution, traction." The agent narrates, scores, splits the audio into natural phrases, generates the clips in parallel, stitches them with crossfades and slide cards, and shows you the draft. You approve, it upscales, you get the final.
  • A podcast launch visual. "Make a 45-second video with the intro read by Ada, a piano-and-synth bed, and abstract visuals matched to the mood of each section."
  • A product walkthrough with technical callouts. You provide the script and ask for slide interstitials between clips ("Stage 1: Ingest", "Stage 2: Analyze", "Stage 3: Output"), and the agent builds the 60-second walkthrough.

Try it

  • "Make me a 30-second narrated video explaining [topic], cinematic style, piano underscore."
  • "Make a 45-second product walkthrough. Narrate the script I'll paste. Cut in 3 slide sections: Problem, Solution, Next Steps."
  • "Take this existing 5-second clip and extend it by 3 seconds with a continued camera move, then add narration."
  • Once you approve the draft, say "upscale it" — one call, sharper output.

Heads up

The upscale is a separate call you approve explicitly. The agent iterates cheaply at low resolution and only spends on the upscale when you're happy with the cut. If you want it to upscale automatically, say so.
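
A rough way to see why deferring the upscale matters is to compare the two policies with placeholder numbers. The function below is a hypothetical sketch: the unit costs are arbitrary illustrative values, not Alfrada or Runway pricing, and it assumes one upscale call covers the whole stitched draft.

```python
def total_cost(drafts: int, clips: int, draft_cost: float,
               upscale_cost: float, upscale_every_draft: bool = False) -> float:
    """Cost of `drafts` low-resolution passes plus upscaling.

    All costs are in arbitrary units; the values used below are placeholders.
    """
    render = drafts * clips * draft_cost
    upscales = drafts if upscale_every_draft else 1
    return render + upscales * upscale_cost


# Four draft rounds of a six-clip video, with made-up unit costs.
deferred = total_cost(drafts=4, clips=6, draft_cost=1.0, upscale_cost=10.0)
eager = total_cost(drafts=4, clips=6, draft_cost=1.0, upscale_cost=10.0,
                   upscale_every_draft=True)
```

With these placeholder values the draft renders cost the same either way; the difference is entirely the extra upscale calls, which grow with every regeneration.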

Built for the Alfrada platform.