Self-Publishing an Audiobook with AI Voice: A Developer's Guide

Narrators hired through ACX (Audible's audiobook marketplace) typically charge $200–$400 per finished hour, so a 10-hour audiobook costs $2,000–$4,000 to produce with a professional narrator.

The same audiobook with AI narration: about $5.

That's not a typo. A 10-hour audiobook is roughly 90,000 words — about 540,000 characters. At $0.01/1K chars (Pro tier), that's $5.40 total.

Here's how to do it properly.

#What works well with AI narration

AI voice narration is excellent for:

Non-fiction — business books, self-help, technical guides, how-to content
Educational content — courses, explainers, reference material
Newsletters and blogs — audio versions of written content
Short fiction — short stories, anthologies where a consistent narrator voice works

Where human narrators still have an edge: novels with many characters who need distinct voices, highly emotional dramatic scenes, and literary fiction where the author's unique cadence matters.

#Choosing a narrator voice

The most important decision. A good narrator voice for audiobooks should be:

Warm but not distracting
Clear articulation at natural speed
Consistent emotional tone
Not overly dramatic (listeners will hear this for hours)

LeanVox's narrator category has voices tuned for long-form listening. Preview before committing — you'll hear a big difference between voices that sound great in a 10-second demo and ones that hold up over an hour.
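One way to run a longer preview is to pull a few real paragraphs from your own manuscript and send the same sample to each candidate voice. The sketch below only prepares the sample and the request arguments; `take_sample` and `build_preview_requests` are hypothetical helper names, and the 3,000-character default is an arbitrary assumption, not a LeanVox recommendation.

```python
from typing import Iterable


def take_sample(manuscript: str, min_chars: int = 3000) -> str:
    """Collect whole paragraphs from the start until at least min_chars."""
    picked, total = [], 0
    for para in manuscript.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        picked.append(para)
        total += len(para)
        if total >= min_chars:
            break
    return "\n\n".join(picked)


def build_preview_requests(sample: str, voices: Iterable[str], model: str = "pro") -> list[dict]:
    """One set of keyword arguments per candidate voice, all sharing one sample."""
    return [{"text": sample, "model": model, "voice": voice} for voice in voices]


# Usage sketch (needs the SDK client shown later in this guide):
# for req in build_preview_requests(take_sample(text), ["narrator_warm_male", "podcast_casual_male"]):
#     result = client.generate(**req)
```

Listening to the same passage in every voice makes the comparison fair, and at Pro tier rates a 3,000-character sample costs about $0.03 per voice.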

For fiction with multiple characters, the Max tier lets you describe specific character voices and maintain them throughout:

NARRATOR = {
    "voice": "narrator_warm_male",
    "model": "pro"
}

CHARACTERS = {
    "detective_sarah": {
        "model": "max",
        "instructions": "Confident female detective, mid-30s. Direct speech. Occasionally dry humor. American accent."
    },
    "villain": {
        "model": "max",
        "instructions": "Cultured male villain, 50s. Smooth, controlled. Never raises voice. British accent. Slight condescension."
    }
}

#Processing a full book

You don't need to split your manuscript manually. LeanVox supports async jobs that accept full text files as input — the server handles chunking, processing, and reassembly automatically. Just point the CLI at your file and wait:

# Install the CLI
npm install -g leanvox

# Authenticate
lvox auth login

# Generate a full book from a .txt file (async, handles any length)
lvox generate --use-async \
  --model pro \
  --voice narrator_warm_male \
  --file my_book.txt \
  --output audiobook.mp3

# Also works directly with .epub files
lvox generate --use-async \
  --model pro \
  --voice narrator_warm_male \
  --file my_book.epub \
  --output audiobook.mp3

The CLI submits an async job, polls for completion, and downloads the final file when done. A 90,000-word book typically completes in 10–20 minutes.

You can also check job status manually:

# Check status of a running job
lvox jobs get <job-id>

# List all recent jobs
lvox jobs list

Via Python SDK

If you prefer to automate from a Python pipeline, the SDK also supports async generation:

from leanvox import Leanvox
import requests

client = Leanvox(api_key="lv_live_...")

# Submit the full manuscript — no chunking needed
with open("my_book.txt", encoding="utf-8") as f:
    manuscript = f.read()

job = client.generate_async(
    text=manuscript,
    model="pro",
    voice="narrator_warm_male",
)

print(f"Job submitted: {job.job_id}")

# Poll until complete
result = job.wait()  # blocks until done, handles retries automatically

# Download
audio = requests.get(result.audio_url).content
with open("audiobook.mp3", "wb") as f:
    f.write(audio)

print("📚 Audiobook complete!")

#Handling dialogue in fiction

For novels with dialogue, you can parse character speech and generate different voices for each speaker:

import re

def parse_dialogue(text: str) -> list[dict]:
    """Extract narrator and character segments from a passage."""
    segments = []
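    # Matches "..." optionally followed by an attribution such as: said Sarah.
    # The verb and name are matched together, so a quote with no attribution
    # leaves group(2) empty instead of capturing the next word of narration.
    # Straight quotes only; adjust for curly quotes or "Sarah said" word order.
    pattern = r'"([^"]+)"(?:\s*(?:said|replied|asked|whispered)\s+(\w+)[.,]?)?'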

    last_end = 0
    for match in re.finditer(pattern, text):
        # Narrator segment before dialogue
        narrator_text = text[last_end:match.start()].strip()
        if narrator_text:
            segments.append({"type": "narrator", "text": narrator_text})

        # Dialogue segment
        dialogue = match.group(1)
        speaker = match.group(2) or "narrator"
        segments.append({"type": "character", "speaker": speaker.lower(), "text": dialogue})

        last_end = match.end()

    # Remaining narrator text
    remaining = text[last_end:].strip()
    if remaining:
        segments.append({"type": "narrator", "text": remaining})

    return segments


def generate_scene_with_dialogue(text: str) -> bytes:
    segments = parse_dialogue(text)
    audio_parts = []

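    for seg in segments:
        if seg["type"] == "narrator":
            result = client.generate(text=seg["text"], model="pro", voice="narrator_warm_male")
        else:
            # CHARACTERS keys must match the lowercased name parsed from the text
            # (e.g. "sarah"); unknown or unattributed speakers get the fallback voice.
            config = CHARACTERS.get(seg["speaker"], {"voice": "podcast_casual_male", "model": "pro"})
            result = client.generate(text=seg["text"], **config)

        audio_parts.append(requests.get(result.audio_url).content)

    # Naive byte concatenation of the MP3 parts. If a player misreports the
    # duration, re-encode the joined file with ffmpeg.
    return b"".join(audio_parts)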

#Distribution options

Once generated:

ACX / Audible: requires MP3 at 192 kbps or higher (constant bit rate), with each chapter as a separate file. Disclose AI narration, and confirm ACX's current policy before uploading: its submission requirements have restricted non-human narration.
Findaway Voices: broader distribution (Apple Books, Kobo, libraries), same disclosure requirements.
Direct sale: sell on your own site, Gumroad, or Payhip with no disclosure requirement.
Spotify / podcast RSS: publish as a podcast feed, chapter by chapter.
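Because several of these channels want one file per chapter, it can help to split the manuscript before generating audio and submit each chapter as its own job. The sketch below assumes every chapter starts with a line beginning "Chapter" followed by a number or word. `split_chapters` and `chapter_filename` are hypothetical helpers, and the heading pattern will need adjusting for other manuscript formats.

```python
import re

# Assumed heading format: a line such as "Chapter 1" or "Chapter Two: Title".
# A prose line that happens to start with "Chapter ..." would also match.
CHAPTER_HEADING = re.compile(r"^(chapter[ \t]+\w+.*)$", re.IGNORECASE | re.MULTILINE)


def split_chapters(manuscript: str) -> list[tuple[str, str]]:
    """Return (heading, body) pairs in reading order."""
    matches = list(CHAPTER_HEADING.finditer(manuscript))
    chapters = []
    for i, match in enumerate(matches):
        # A chapter body runs until the next heading or the end of the text
        end = matches[i + 1].start() if i + 1 < len(matches) else len(manuscript)
        body = manuscript[match.end():end].strip()
        chapters.append((match.group(1).strip(), body))
    return chapters


def chapter_filename(index: int) -> str:
    """Zero-padded names keep files sorted, e.g. chapter_01.mp3."""
    return f"chapter_{index:02d}.mp3"
```

Any text before the first heading (title page, dedication) is dropped by this sketch, so handle front matter separately if you want it narrated.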

#Quality tips

Proofread before generating: fix typos, unusual proper nouns, and formatting issues. Regenerating costs money.
Use punctuation for pacing: em dashes (—), ellipses (...), and commas control rhythm more than speed settings.
Test on a sample chapter first: generate chapter 1 in full before committing to the whole book.
Normalize audio: run each chapter MP3 through Audacity or ffmpeg so it lands inside ACX's limits (RMS between -23 dB and -18 dB, peaks no higher than -3 dB).

# Normalize toward ACX levels with ffmpeg. loudnorm targets LUFS, which only
# approximates RMS, so check the measured levels before uploading.
ffmpeg -i chapter_01.mp3 -af "loudnorm=I=-20:TP=-3:LRA=11" -ar 44100 -b:a 192k chapter_01_normalized.mp3
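For a book with dozens of chapters, the same ffmpeg call can be wrapped in a short script. This is a sketch that assumes ffmpeg is on your PATH and that chapter files are named `chapter_*.mp3`. `loudnorm_command` and `normalize_folder` are hypothetical helpers, and the loudness targets are passed in so they match whatever values you use on the command line.

```python
import subprocess
from pathlib import Path


def loudnorm_command(src: Path, dst: Path, integrated: float, true_peak: float,
                     lra: float = 11.0) -> list[str]:
    """Build the ffmpeg argument list for one file (no shell quoting needed)."""
    audio_filter = f"loudnorm=I={integrated}:TP={true_peak}:LRA={lra}"
    return ["ffmpeg", "-y", "-i", str(src), "-af", audio_filter, str(dst)]


def normalize_folder(folder: str, integrated: float, true_peak: float) -> list[Path]:
    """Normalize every chapter_*.mp3 in folder into folder/normalized/."""
    out_dir = Path(folder) / "normalized"
    out_dir.mkdir(exist_ok=True)
    written = []
    for src in sorted(Path(folder).glob("chapter_*.mp3")):
        dst = out_dir / src.name
        # check=True raises CalledProcessError if ffmpeg exits with an error
        subprocess.run(loudnorm_command(src, dst, integrated, true_peak), check=True)
        written.append(dst)
    return written
```

Writing to a separate `normalized/` folder keeps the original renders intact, so you can rerun with different targets without regenerating audio.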

#What it costs

Book length	Word count	Chars (est.)	Pro tier cost
Novella (2h)	~20,000	~120K	$1.20
Short non-fiction (4h)	~40,000	~240K	$2.40
Full book (10h)	~90,000	~540K	$5.40

Your free signup credit covers 200+ minutes of audio — enough to produce a complete novella before spending a cent. A full business book costs less than a cup of coffee.
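To estimate your own book, the arithmetic behind the table fits in a few lines. The rate is the Pro tier figure quoted in this guide. The characters-per-word and words-per-hour constants are assumptions back-derived from the 90,000-word, 540,000-character, 10-hour example, so treat the output as a rough estimate.

```python
PRO_RATE_PER_1K_CHARS = 0.01  # USD per 1,000 characters, Pro tier rate quoted above
CHARS_PER_WORD = 6            # assumption: 540,000 chars / 90,000 words
WORDS_PER_HOUR = 9_000        # assumption: 90,000 words / 10 finished hours


def estimate(word_count: int) -> dict:
    """Rough character count, running time, and Pro tier cost for a manuscript."""
    chars = word_count * CHARS_PER_WORD
    return {
        "chars": chars,
        "hours": round(word_count / WORDS_PER_HOUR, 1),
        "cost_usd": round(chars / 1000 * PRO_RATE_PER_1K_CHARS, 2),
    }


print(estimate(90_000))  # {'chars': 540000, 'hours': 10.0, 'cost_usd': 5.4}
```

For a tighter number, replace the word-based character estimate with `len(text)` on the actual manuscript, since billing is per character.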

#Try it

Browse narrator voices — find the right voice before committing to a full manuscript.

No-code option: Use our n8n community node to automate audiobook generation in a workflow — RSS feed → extract text → generate speech → upload. No coding required.
