AutoRhythm

Syllable-level rhythm correction for rap vocals. Every syllable on beat. Your voice, untouched.

Zero-Sample Error · AI Guide Generation · Audacity Integration

Lyrics

Yeah
Load up Roblox, headset sittin' on my dome,
Full dive mindset, now the block world home.
And I swear it gets weird, yeah I'm tellin' no lies—
Got that "phantom pain" thing messin' up my mind.
Not the medical type, nah, this one's absurd,
It's like my brain bought in and forgot what it heard.
Some dude walks through me in a lobby so packed—
And I flinch in real life like I just got smacked.
Avatar bump—yo I feel that ghost,
Like a pixel just grazed me, man I'm doin' the most.

Audio Pipeline

1. Backing Beat

The instrumental track everything syncs to

2. Human Vocal (Raw)

Original rap vocal — off-beat timing, unprocessed

3. AI Guide (Full Mix)

ACE-Step 1.5 generates a rap vocal from the lyrics over a backing track

4. AI Guide (Vocals Stripped)

Demucs isolates just the vocal — this becomes the timing reference

5. Corrected Human Vocal

Every syllable time-warped to match the AI guide's rhythm — zero-sample anchor error. Skip to ~14s for the vocals.

6. Raw Vocal + Beat (Comparison)

Original uncorrected vocal over the backing track — hear the timing drift

7. Final Mix

Corrected human vocal layered over the backing track

Audacity Session

The pipeline outputs a full Audacity session with visible clips, labels, and waveform tracks.

[Screenshot: Audacity session showing corrected vocal clips and label tracks]

Syllable Alignment Viewer

110 syllables aligned between human vocal and AI guide. Each block is one syllable region.

[Interactive viewer: backing track, human vocal with onset anchors, and one block per syllable region; 110 syllables · 48 kHz]

Anchor Map

Every syllable's human timing vs. guide timing. Delta shows how far each syllable was shifted.

Columns: # · Syllable · Human Onset · Guide Onset · Delta · Confidence
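The Delta column is simply the guide onset minus the human onset for each syllable. A minimal sketch with made-up onset values (not taken from the actual session):

```python
# Illustrative anchor-map rows: (syllable, human onset in s, guide onset in s).
anchors = [
    ("load", 0.512, 0.500),
    ("up",   0.790, 0.750),
    ("ro",   1.043, 1.000),
]

# Delta = guide onset - human onset: how far each syllable is shifted.
# Negative means the syllable moves earlier to land on the guide.
deltas_ms = [round((guide - human) * 1000) for _, human, guide in anchors]

for (syl, human, guide), d in zip(anchors, deltas_ms):
    print(f"{syl:>4}  human={human:.3f}s  guide={guide:.3f}s  delta={d:+d} ms")
```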

How It Works

9-phase pipeline. AI handles guide generation and alignment. Everything after that is fully deterministic.

1. Normalize (Deterministic): Resample to 48 kHz, mono, for analysis
2. Guide Generation (AI): ACE-Step generates a vocal from lyrics, Demucs isolates it
3. Syllabify (AI): CMUdict + G2P maps lyrics to canonical syllables
4. Forced Alignment (AI): Montreal Forced Aligner extracts phone-level timestamps
5. Anchor Mapping (Deterministic): Map each human syllable to the guide's timing
6. Clip Grouping (Deterministic): Safe-boundary scoring finds clean split points
7. Edit Plan (Deterministic): Piecewise time-warp specification for every segment
8. Render (Deterministic): Rubber Band pitch-preserving time-stretch
9. Audacity Session (Deterministic): Clips, labels, and tracks via mod-script-pipe
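The edit-plan phase can be pictured as a piecewise-linear time map: between consecutive anchors, time is stretched so that each human anchor lands exactly on its guide anchor. A minimal sketch of that mapping (illustrative only; the actual pipeline expresses the equivalent stretch ratios as a Rubber Band plan per clip):

```python
from bisect import bisect_right

def warp_time(t, human_anchors, guide_anchors):
    """Piecewise-linear map from human-vocal time to guide time.

    Each segment between consecutive anchors is stretched linearly,
    so every human anchor maps exactly onto its guide anchor.
    """
    i = bisect_right(human_anchors, t) - 1
    i = max(0, min(i, len(human_anchors) - 2))  # clamp to a valid segment
    h0, h1 = human_anchors[i], human_anchors[i + 1]
    g0, g1 = guide_anchors[i], guide_anchors[i + 1]
    ratio = (g1 - g0) / (h1 - h0)  # local stretch factor for this segment
    return g0 + (t - h0) * ratio

# Made-up anchor times (seconds) for the sketch.
human = [0.0, 0.512, 1.043]
guide = [0.0, 0.500, 1.000]
print(warp_time(0.512, human, guide))  # an anchor maps exactly: 0.5
```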

Key Features

Zero-Sample Precision

Every rendered anchor lands exactly on the guide anchor at the integer sample index. Mathematically exact.
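One way to read "mathematically exact": anchor targets are expressed as integer sample indices at 48 kHz rather than floating-point seconds, so the rendered anchor and the guide anchor quantize to the same sample by construction. A hedged sketch of the idea (not AutoRhythm's actual code):

```python
SR = 48_000  # analysis/render sample rate from the pipeline

def anchor_sample(t_seconds: float) -> int:
    # Quantize an anchor time to an integer sample index. The render
    # stage targets this exact index, so the anchor error is 0 samples
    # by construction rather than a sub-sample float residue.
    return round(t_seconds * SR)

guide_idx = anchor_sample(0.500)
rendered_idx = anchor_sample(0.500)
print(guide_idx, rendered_idx - guide_idx)  # 24000 0
```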

Voice Preservation

The human voice is only cut and time-stretched. Never regenerated or transformed by AI.

Transparent Editing

Full Audacity session with visible clips, labels, and waveforms. Every edit is inspectable.

Two Modes

Guide mode uses an AI vocal as reference. Beat-only mode snaps to the detected beat grid.
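Beat-only mode can be pictured as nearest-gridpoint snapping. A minimal sketch assuming a steady tempo from t=0 and a hypothetical `snap_to_grid` helper (the real pipeline detects the beat grid from the backing track):

```python
def snap_to_grid(onsets_s, bpm, subdivisions=2):
    """Snap each onset (seconds) to the nearest beat-grid point.

    `subdivisions=2` snaps to eighth notes at the given BPM. All names
    and defaults here are illustrative, not AutoRhythm's API.
    """
    step = 60.0 / bpm / subdivisions  # grid spacing in seconds
    return [round(t / step) * step for t in onsets_s]

# At 120 BPM with eighth-note subdivisions the grid step is 0.25 s.
print(snap_to_grid([0.27, 0.49, 1.02], bpm=120))
```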

Built With