Interactive GitHub Pages

Understand the files this workflow expects and the artifacts it creates.

This page turns the current repo workflow into a technical walkthrough: voice JSON, dialogue formatting, exact CLI steps, and the output folder structure your users should expect.

Workflow shape
  1. Define speaker voices in JSON.
  2. Write dialogue with bracketed speaker tags.
  3. Render numbered WAV fragments and a manifest.
  4. Merge fragments into a single WAV.
  5. Optionally export MP3 or MP4 for sharing.
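Step 4 above can be sketched with Python's standard-library wave module, assuming all fragments share one sample format. This is an illustration, not the repository's actual merge implementation:

```python
import wave

def merge_wavs(fragment_paths, out_path):
    """Concatenate WAV fragments that share one format into a single WAV file."""
    with wave.open(fragment_paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for path in fragment_paths:
            with wave.open(path, "rb") as frag:
                # Channels, sample width, and frame rate must match to concatenate.
                if frag.getparams()[:3] != params[:3]:
                    raise ValueError(f"format mismatch in {path}")
                out.writeframes(frag.readframes(frag.getnframes()))
```

Because the fragments are numbered, sorting their filenames before passing them in preserves scene order.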

Environment requirements

  • Windows
  • NVIDIA GPU with CUDA
  • Python 3.12
  • Hugging Face access to Qwen3-TTS models
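A quick pre-flight sketch for the interpreter and GPU requirements. The torch import is an assumption about how CUDA is detected here; any CUDA-aware library would serve:

```python
import sys

def python_version_ok(required=(3, 12)):
    """Return True when the running interpreter matches the required major.minor."""
    return sys.version_info[:2] == required

def cuda_available():
    """Best-effort CUDA check; torch may not be installed in every environment."""
    try:
        import torch
        return torch.cuda.is_available()
    except ImportError:
        return False
```

Hugging Face model access is separate from this check: gated models also require an authenticated login before download.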

Example output: Villain merged scene

This sample uses the directed villain scene and shows the kind of merged deliverable the workflow produces after narration, merge, and conversion.

  • MP3 preview: villain_directed_merged.mp3
  • MP4 preview: villain_directed_merged_image.mp4
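The page does not show how the conversion step is implemented; one common approach is ffmpeg. This sketch only builds the command lists rather than running them, and both the use of ffmpeg and the file names are assumptions:

```python
def wav_to_mp3_cmd(wav_path, mp3_path):
    """Build an ffmpeg invocation that encodes a WAV file as MP3."""
    return ["ffmpeg", "-y", "-i", wav_path, "-codec:a", "libmp3lame", mp3_path]

def wav_to_mp4_cmd(wav_path, image_path, mp4_path):
    """Build an ffmpeg invocation pairing the audio with a still image."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image_path,  # repeat the still image as video
        "-i", wav_path,
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac",
        "-shortest",                     # stop when the audio ends
        mp4_path,
    ]
```

The resulting lists can be passed to subprocess.run once ffmpeg is on PATH.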

Inputs

Input Lab

Switch between a quickstart sample and an emotional-variants sample to see how the workflow expects the input files to be structured.

What to look for
  • Voice JSON is an object keyed by speaker name.
  • Each dialogue line matches [SpeakerName]: Text.
  • Dialogue speakers exist in the selected voice file.
  • Emotional variants use separate speaker keys.
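The checks in the list above can be automated. A validation sketch, assuming the voice JSON is an object keyed by speaker name and each dialogue line follows [SpeakerName]: Text:

```python
import json
import re

LINE_RE = re.compile(r"^\[(?P<speaker>[^\]]+)\]:\s*(?P<text>.+)$")

def validate_dialogue(voice_json_path, dialogue_path):
    """Return the set of dialogue speakers missing from the voice file."""
    with open(voice_json_path, encoding="utf-8") as f:
        voices = json.load(f)  # object keyed by speaker name
    missing = set()
    with open(dialogue_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            m = LINE_RE.match(line)
            if m is None:
                raise ValueError(f"line does not match [SpeakerName]: Text -> {line!r}")
            if m.group("speaker") not in voices:
                missing.add(m.group("speaker"))
    return missing
```

An empty return set means every speaker in the dialogue has a voice definition, including any emotional-variant keys.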

Voice file

characters.json
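A hypothetical characters.json illustrating the expected shape. Only the speaker-keyed structure comes from this page; the per-speaker fields are invented for illustration:

```json
{
  "Hero": { "voice": "voice-id-a", "speed": 1.0 },
  "Villain": { "voice": "voice-id-b", "speed": 0.95 },
  "Villain (angry)": { "voice": "voice-id-b", "speed": 1.1 }
}
```

The third key shows the emotional-variant convention: a variant is just another speaker key.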

Dialogue file

dialogue.txt
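A matching hypothetical dialogue.txt, where each line uses the bracketed speaker tag and the variant key appears as its own speaker:

```text
[Hero]: We end this tonight.
[Villain]: You are welcome to try.
[Villain (angry)]: Enough!
```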

Commands

Pipeline map

Select a step to inspect what it consumes, what it emits, and the exact command surface used in this repository.

Commands are shown with a plain python ... prefix. Use the Python interpreter from your active environment, whether that is a virtual environment, conda, or another setup.
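To confirm which interpreter a bare python resolves to before running the pipeline:

```python
import sys

# Print the interpreter and version the pipeline commands will run under.
print(sys.executable)
print(".".join(map(str, sys.version_info[:3])))
```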

Artifacts

Output Explorer

The narration run produces more than a merged file. This section explains what belongs in an output folder and why each artifact exists.
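A sketch of inspecting such a folder, assuming numbered WAV fragments alongside a manifest.json; the manifest schema shown in the docstring is hypothetical, not the repository's actual format:

```python
import json
from pathlib import Path

def summarize_output(folder):
    """Count numbered WAV fragments and compare against the manifest.

    Assumes a layout like:
        output/
          001.wav, 002.wav, ...  # numbered fragments
          manifest.json          # hypothetical: {"fragments": [...]}
          merged.wav
    """
    folder = Path(folder)
    fragments = sorted(folder.glob("[0-9]*.wav"))
    expected = None
    manifest_path = folder / "manifest.json"
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        expected = len(manifest.get("fragments", []))
    return {"found": len(fragments), "expected": expected}
```

A mismatch between found and expected is a quick signal that a fragment failed to render before the merge step.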