feat(core,engine): data-duck — compile-time audio ducking under voice tracks #1337

Open
mvanhorn wants to merge 2 commits into heygen-com:main from mvanhorn:fix/hyperframes-audio-autoduck

@mvanhorn (Contributor)

What

Declarative audio ducking: a music track marked data-duck automatically lowers itself whenever any data-role="voice" track is audible, with smooth fade ramps.

<audio id="music" src="music.mp3" data-duck="-12dB" data-start="0" data-duration="30"></audio>
<audio id="vo" src="voiceover.mp3" data-role="voice" data-start="2" data-duration="6"></audio>

data-duck accepts dB ("-12dB") or a linear multiplier; data-duck-fade sets the ramp (default 0.3s). Rendered evidence (same composition with and without data-duck):
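As a rough illustration of the two accepted forms, the sketch below converts a `data-duck` value to a linear gain using the standard amplitude relation gain = 10^(dB/20). The function name `parseDuckGain` and its handling of invalid input are assumptions made for this example, not the parser shipped in this PR.

```typescript
// Hypothetical helper, not the PR's actual parser.
// Returns a linear gain multiplier, or null when the value is not understood.
function parseDuckGain(value: string): number | null {
  const trimmed = value.trim();

  // dB form, e.g. "-12dB": amplitude ratio = 10^(dB / 20).
  const dbMatch = /^([+-]?\d+(?:\.\d+)?)\s*dB$/i.exec(trimmed);
  if (dbMatch) {
    return Math.pow(10, Number(dbMatch[1]) / 20);
  }

  // Linear form, e.g. "0.25": used as the multiplier directly.
  const linear = Number(trimmed);
  if (trimmed !== "" && Number.isFinite(linear) && linear >= 0) {
    return linear;
  }

  return null; // assumption: unparseable or negative values are rejected
}
```

Under this conversion, `"-12dB"` corresponds to a multiplier of roughly 0.251, so `data-duck="-12dB"` and `data-duck="0.25"` would produce nearly the same duck depth.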

[Figure: duck waveforms]

The difference row is the ducking, isolated: digital silence outside the voice windows (the two mixes are bit-identical there) and exactly the removed music energy inside them.

Why

Voiceover-over-music is the default shape of narrated video, and getting the balance right currently means hand-authoring volume keyframes against every voice clip's start and end, then re-timing them whenever narration shifts (#1176 was a user doing exactly this with GSAP volume tweens). Consumer editors auto-duck; neither HyperFrames nor Remotion offers it declaratively.

The timeline knows every overlap window at compile time, so ducking doesn't need sidechain analysis. It compiles to volume keyframes and rides the sample-accurate volume automation the engine already has (#1117) — deterministic, and identical in preview and render.
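To make "compiles to volume keyframes" concrete, here is a minimal sketch of turning known voice windows into duck keyframes for one music clip. The `Interval` and `Keyframe` shapes, the `duckKeyframes` name, and the ramp placement (fade down finishing at the voice start, fade up beginning at the voice end, both clamped to the music clip) are assumptions for illustration; the PR's compiler may place ramps differently.

```typescript
// Hypothetical types and helper, in timeline seconds.
interface Interval { start: number; end: number; }
interface Keyframe { time: number; gain: number; }

// Assumes `windows` is sorted and already merged so ramps cannot overlap.
function duckKeyframes(
  windows: Interval[],
  clip: Interval,
  duckGain: number,
  fade: number,
): Keyframe[] {
  const frames: Keyframe[] = [];
  for (const w of windows) {
    // Restrict the voice window to the part that overlaps the music clip.
    const start = Math.max(w.start, clip.start);
    const end = Math.min(w.end, clip.end);
    if (end <= start) continue; // no overlap with this clip

    // Ramps sit outside the window and are clamped to the clip bounds.
    const rampIn = Math.max(start - fade, clip.start);
    const rampOut = Math.min(end + fade, clip.end);

    if (rampIn < start) frames.push({ time: rampIn, gain: 1 });
    frames.push({ time: start, gain: duckGain }, { time: end, gain: duckGain });
    if (rampOut > end) frames.push({ time: rampOut, gain: 1 });
  }
  return frames;
}
```

For the composition in the "What" section (music over 0 to 30 s, voice over 2 to 8 s, 0.3 s fade), this sketch yields four keyframes: full volume at about 1.7 s, ducked at 2 s and 8 s, and full volume again at about 8.3 s.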

How

  • packages/core/src/compiler/timingCompiler.ts: compileAudioDucking reads data-duck / data-duck-fade / data-role, computes voice-overlap intervals from clip timing (merging gaps shorter than 2x the fade so the music doesn't pump between close voice clips), and writes the resulting keyframes to an internal data-hf-duck-keyframes attribute. Recompilation-safe: the attribute is regenerated, never accumulated.
  • packages/engine/src/services/audioElementParser.ts (extracted from audioMixer): parses the duck keyframes into each track's volumeKeyframes, multiplying with any authored data-volume-keyframes envelope rather than replacing it.
  • packages/core/src/runtime/mediaVolumeEnvelope.ts: preview applies the same multiplied envelope, so what you hear in preview is what renders.
  • probeStage.ts: runtime-probed automation (GSAP volume tweens) multiplies with duck keyframes instead of overwriting them.
  • Docs: data-duck / data-duck-fade / data-role rows in the html-schema reference plus a concepts section.
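Two of the steps above lend themselves to a short sketch: merging voice windows whose gap is shorter than 2x the fade, and multiplying a duck envelope with an authored one. The names `mergeVoiceWindows`, `gainAt`, and `multiplyEnvelopes`, the data shapes, and the piecewise-linear interpolation are all assumptions for illustration, not the code in `timingCompiler.ts` or `audioElementParser.ts`.

```typescript
// Hypothetical types and helpers, in timeline seconds.
interface Interval { start: number; end: number; }
interface Keyframe { time: number; gain: number; }

// Merge windows that overlap or are separated by less than 2 * fade,
// so the music stays ducked instead of pumping between close voice clips.
function mergeVoiceWindows(windows: Interval[], fade: number): Interval[] {
  const sorted = [...windows].sort((a, b) => a.start - b.start);
  const merged: Interval[] = [];
  for (const w of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && w.start - last.end < 2 * fade) {
      last.end = Math.max(last.end, w.end);
    } else {
      merged.push({ start: w.start, end: w.end });
    }
  }
  return merged;
}

// Piecewise-linear lookup (assumed); holds the first/last gain outside the range.
function gainAt(keyframes: Keyframe[], t: number): number {
  if (keyframes.length === 0) return 1; // no automation means unity gain
  if (t <= keyframes[0].time) return keyframes[0].gain;
  for (let i = 1; i < keyframes.length; i++) {
    const a = keyframes[i - 1];
    const b = keyframes[i];
    if (t <= b.time) {
      const span = b.time - a.time;
      if (span === 0) return b.gain;
      return a.gain + ((b.gain - a.gain) * (t - a.time)) / span;
    }
  }
  return keyframes[keyframes.length - 1].gain;
}

// Multiply two envelopes at the union of their keyframe times.
// Exact at keyframes; between them it is an approximation when both
// envelopes ramp at once, since a product of two ramps is not linear.
function multiplyEnvelopes(a: Keyframe[], b: Keyframe[]): Keyframe[] {
  const times = Array.from(new Set([...a, ...b].map((k) => k.time)))
    .sort((x, y) => x - y);
  return times.map((time) => ({ time, gain: gainAt(a, time) * gainAt(b, time) }));
}
```

With a 0.3 s fade, voice clips ending at 8 s and starting at 8.4 s would collapse into one window because the 0.4 s gap is below the 0.6 s threshold. Multiplying rather than replacing means an authored envelope held at 0.8 still applies, so a 0.25 duck lands at an effective gain of 0.2 inside the voice window.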

Out of scope by design: runtime-triggered audio (clips whose timing isn't declarative) — documented as a limitation.

Test plan

The waveform evidence above is real render output from this branch. Sample-subtracting the two renders: bit-identical outside voice windows (diff RMS −270dB), and inside them the measured delta is −37.7dB against a theoretical −37.6dB for a −12dB duck of this music bed — the envelope lands within 0.1dB of spec.
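One plausible reading of the theoretical figure, offered as an assumption rather than the author's stated method: inside a fully ducked window the voice cancels in the subtraction, leaving only the removed music, which is the music signal $m(t)$ scaled by $1 - g$.

```latex
\begin{aligned}
g &= 10^{-12/20} \approx 0.2512 \\
d(t) &= m(t) - g\,m(t) = (1 - g)\,m(t) \\
L_d &= L_m + 20\log_{10}(1 - g) \approx L_m - 2.51\ \text{dB}
\end{aligned}
```

Here $L_m$ is the RMS level of the music bed and $L_d$ the RMS level of the difference signal, with fade ramps ignored. Under this reading, a theoretical −37.6dB difference would correspond to a music bed near −35.1dB RMS inside the voice windows, a level the PR does not state.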

  • Unit tests added/updated (dB and linear gain parsing, overlap merging across fade gaps, ramp clamping at clip bounds, keyframe multiplication with authored envelopes, compiler integration, mixer envelope application)
  • Manual testing performed (real duck/no-duck renders above, macOS say voiceover over a music bed)
  • Documentation updated (html-schema reference, data-attributes concepts, docs nav)
