How to Remove Audio Artifacts from AI Voiceovers Using Audacity
Fix AI voiceover artifacts in Audacity with this step-by-step processing chain — noise reduction, EQ, compression, and re-sync with video.
ElevenLabs gives you a voice that sounds like a BBC correspondent. Google TTS hands you clean, professional narration. Then you listen back and hear it: a faint metallic buzz at the end of a sentence, a weird breath that sounds like a robot sighing, a consonant that clips like someone’s mic was taped to a ceiling fan. AI voiceovers have gotten genuinely impressive, but the artifacts haven’t gone away — they’ve just gotten subtler and more annoying.
The good news is that Audacity — free, open-source, and stubbornly unsexy — handles most of these problems with tools that have existed for years and work surprisingly well on synthetic speech. No paid plugins required, no cloud API subscription, no waiting for some mythical integration to materialize. This tutorial walks you through the full workflow: diagnosing what kind of artifact you’re dealing with, removing it, and re-syncing your cleaned audio with video. It works on exports from ElevenLabs, Google Cloud TTS, Amazon Polly, or any other AI TTS platform that gives you a WAV or MP3 file.
What You’ll Walk Away With
By the end of this, you’ll have a repeatable post-processing chain in Audacity that handles the four most common AI voiceover problems: background hiss and buzz, sibilance harshness (those piercing S and T sounds), unnatural silence gaps between sentences, and clipping on loud syllables. You’ll also have a workflow for exporting the cleaned audio and snapping it back into your video editor without losing sync. The tools are Audacity 3.x, your AI-generated audio file, and optionally the free Nyquist plugins that ship with Audacity.
What You Need Before You Start
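Download Audacity from audacityteam.org. Any recent 3.x release works, and this guide assumes 3.4 or later. Audacity 3.2 added real-time effects you can stack non-destructively, but the repair tools used here (Noise Reduction, Clip Fix, Truncate Silence) still write their changes directly into the audio, so Undo and a backup copy are your safety net. Recent versions also group the Effect menu into submenus; Noise Reduction, for example, sits under Effect → Noise Removal and Repair. Your source file should be exported from your TTS platform at the highest quality available: WAV at 44.1kHz or 48kHz, 16-bit minimum. If your platform only gives you MP3, that’s workable, but avoid re-exporting as MP3 after processing, or you’ll stack compression artifacts on top of your repairs. Export your final file as WAV or FLAC, then convert to MP3 once at the very end if you need it.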
Pro tip ✅
Always keep the original AI-generated file untouched in a separate folder. Audacity’s undo history disappears when you close the project, and you’ll want that original if you decide your edits made things worse — which, with noise reduction, is easier to do than you’d think.
Step 1: Diagnose Your Artifact Before You Touch Anything
Open your AI voiceover in Audacity (File → Import → Audio). Before running any effects, zoom into the waveform and look at the silent sections between sentences. AI TTS tools often generate audio with a constant low-level noise floor: sometimes a faint hiss, sometimes a subtle hum. Switch to Spectrogram view (click the track name dropdown → Spectrogram) and you’ll see exactly which frequencies are causing trouble. Hiss shows up as a diffuse haze spread across the upper frequencies. Hum appears as thin, steady horizontal lines at fixed frequencies, often 50Hz or 60Hz and their multiples, possibly inherited from mains hum in the recordings the voice was trained on. Sibilance shows up as sharp, bright bursts, mostly between 6kHz and 10kHz, lined up with S and T sounds.
Knowing what you’re dealing with before you apply any effect saves you from over-processing. Noise reduction applied too aggressively turns speech into underwater garbling — a problem that’s arguably worse than the original hiss.
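If you want a number to back up what the spectrogram shows, the Goertzel algorithm measures the level of a single frequency without an FFT library. The Python sketch below is not an Audacity feature: `tone_amplitude` and `hum_report` are hypothetical helpers, it assumes you’ve already loaded a silent stretch of the file as floats between -1.0 and 1.0, and the candidate frequencies (50/60 Hz and their second harmonics) are an assumption to adjust. Use a window of exactly one second so each bin lands on a whole hertz.

```python
import math

# Sketch only: hypothetical helpers, not part of Audacity. `samples` is assumed
# to be a list of floats in [-1.0, 1.0] taken from a silent stretch of the file.

def tone_amplitude(samples, sample_rate, target_hz):
    """Estimate the amplitude of one frequency with the Goertzel algorithm
    (a single DFT bin, so no FFT library is needed)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin to target_hz
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2  # |X(k)|^2
    # A sine of amplitude A that lands exactly on bin k gives |X(k)| = A * n / 2
    return 2.0 * math.sqrt(max(power, 0.0)) / n


def hum_report(samples, sample_rate, candidates=(50, 60, 100, 120)):
    """Map each candidate hum frequency to its estimated level in dBFS."""
    report = {}
    for hz in candidates:
        amp = tone_amplitude(samples, sample_rate, hz)
        report[hz] = 20 * math.log10(amp) if amp > 0 else float("-inf")
    return report


# Usage: one second of "silence" carrying a faint 60 Hz hum at -40 dBFS
rate = 8000
quiet = [0.01 * math.sin(2 * math.pi * 60 * i / rate) for i in range(rate)]
levels = hum_report(quiet, rate)
print({hz: round(db, 1) for hz, db in levels.items()})
```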
Note 💡
ElevenLabs voices tend to produce sibilance harshness and occasional breath artifacts. Google Cloud TTS and Amazon Polly are generally cleaner but can introduce a subtle metallic resonance on sustained vowels. Knowing your source platform helps you know where to look first.
Step 2: Capture a Noise Profile
Find a section of your audio with silence: between sentences, at the start, or at the end. It needs to contain at least half a second of the noise you want to remove (more is better) and no speech. Select that section by clicking and dragging. Then go to Effect → Noise Removal and Repair → Noise Reduction and click “Get Noise Profile.” Audacity now knows what the background noise signature looks like.
Next, select the entire track (Ctrl+A / Cmd+A) and go back to Effect → Noise Reduction. The default settings are often too aggressive for synthetic speech. Use these starting values:
Noise Reduction (dB): 6
Sensitivity: 4.00
Frequency Smoothing (bands): 3
These conservative settings remove the noise floor without hollowing out the voice. If the artifact is light, the kind you can barely hear without headphones, start even lower at 4dB reduction. Hit Preview before applying to hear what you’re doing to the voice. If it sounds watery or phasey, lower the Sensitivity first, then the reduction amount. Apply only when you’re satisfied with the preview.
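A 6 dB reduction roughly halves the noise amplitude, and 4 dB takes it to about 63%. To check what a pass actually removed, you can compare the RMS level of the same silent selection before and after. This is a hedged Python sketch with hypothetical helper names; Audacity’s Noise Reduction works band by band, so a single broadband RMS figure is a sanity check, not a measurement of the algorithm.

```python
import math

# Sketch only: hypothetical helpers for a rough before/after check. Samples are
# assumed to be floats in [-1.0, 1.0] from the same silent selection.

def rms_dbfs(samples):
    """RMS level of a block of samples in dBFS (full scale = 1.0)."""
    if not samples:
        raise ValueError("empty selection")
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def reduction_db(before, after):
    """How many dB quieter `after` is than `before` (positive = quieter)."""
    return rms_dbfs(before) - rms_dbfs(after)


# Usage: a stand-in noise floor, then the same floor at half the amplitude
floor = [0.002 * math.sin(1.7 * i) + 0.001 * math.cos(0.31 * i) for i in range(4410)]
halved = [0.5 * x for x in floor]
print(round(reduction_db(floor, halved), 2))  # 6.02
```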
Warning ⚠️
Do not run Noise Reduction twice on the same file trying to get “more” of the artifact out. You’ll remove consonant detail and the voice will sound like it’s speaking through a wet blanket. One careful pass beats two aggressive ones every time.
Step 3: Fix Sibilance with the Equalizer
Sibilance, the harsh piercing quality on S, T, and CH sounds, is endemic to AI voices. It isn’t noise in the technical sense, so Noise Reduction won’t touch it. The fix is a targeted EQ cut centered around 8kHz. Go to Effect → Filter Curve EQ (or Graphic EQ if you prefer sliders). Filter Curve EQ has no Q or bandwidth control; you shape the cut by dragging points on the curve. Aim for:
Center: 8000 Hz
Depth: -3 dB to -5 dB
Width: back to 0 dB by about 6kHz on the low side and 10kHz on the high side
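Listen back. If the S sounds are still sharp, push to -6dB. If the voice starts sounding dull or muffled, you’ve gone too far, so pull back. The goal is to tame harshness without making the voice sound like it’s speaking into a pillow. If the cut leaves the voice dull, a slight boost around 3kHz–4kHz can restore presence. Keep in mind that a static EQ cut lowers that band all the time, not only during S sounds; if it can’t tame the sibilance without dulling the voice, a dedicated de-esser, which only cuts while sibilance is present, is the next tool to try.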
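Filter Curve EQ lets you draw any shape, so there’s no single formula behind it. To see what a dip like the one above does across the spectrum, here is a sketch of a standard peaking biquad from Robert Bristow-Johnson’s Audio EQ Cookbook, a swapped-in technique rather than Audacity’s implementation. The Q of 2 (about 4kHz wide at 8kHz), the 44.1kHz sample rate, and the helper names are assumptions.

```python
import cmath
import math

# Sketch only: an RBJ "Audio EQ Cookbook" peaking filter, not Audacity's
# Filter Curve EQ. Helper names, Q = 2, and fs = 44100 are assumptions.

def peaking_eq(f0, gain_db, q, fs):
    """Biquad coefficients (b, a) for a peaking EQ, normalised so a[0] == 1."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, f, fs):
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))


def apply_biquad(b, a, samples):
    """Filter float samples with a Direct Form I biquad."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


fs = 44100
b, a = peaking_eq(8000, -4.0, 2.0, fs)  # -4 dB dip centred on 8 kHz
for f in (1000, 4000, 6000, 8000, 10000, 14000):
    print(f, "Hz:", round(response_db(b, a, f, fs), 2), "dB")
```

The printout shows the cut deepest at 8kHz and fading out on either side, while low frequencies, where the body of the voice lives, pass essentially untouched.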
Step 4: Repair Clipping and Silence Gaps
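AI TTS can occasionally clip loud syllables: the waveform hits maximum amplitude and gets squared off, producing a harsh distortion. You can spot this in the waveform as flat-topped peaks, and View → Show Clipping in Waveform highlights them in red. Audacity’s Clip Fix effect (Effect → Clip Fix) reconstructs the clipped sections by interpolating across them. It has two settings: Threshold of Clipping (%), default 95, and Reduce amplitude to allow for restored peaks (dB), which leaves room for the rebuilt peaks. The default threshold works for most files. Restored peaks can end up above 0 dB, but Audacity processes in 32-bit float by default, so they aren’t lost, and the Normalize step at the end brings them back under the ceiling. Clip Fix works best on short, light clipping; it can’t fully rebuild heavily squared-off syllables.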
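To find flat-topped peaks programmatically rather than by eye, you can look for runs of consecutive samples that sit at essentially the same value near the file’s peak. This is a Python sketch with a hypothetical `find_clipped_runs` helper. Measuring the threshold against the selection’s own peak echoes Clip Fix’s percentage setting, but that is an assumption about the idea, not a copy of Audacity’s implementation, and `min_run` and `flat_tol` are guesses to tune.

```python
import math

# Sketch only: hypothetical detector, not Audacity's Clip Fix. A "clipped run"
# is assumed to be >= min_run consecutive samples that are near the peak AND
# flat (each within flat_tol of the previous sample).

def find_clipped_runs(samples, threshold=0.95, min_run=3, flat_tol=1e-4):
    """Return (start, end) index pairs (end exclusive) of flat-topped runs."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return []
    limit = threshold * peak
    runs = []
    start = None
    for i, x in enumerate(samples):
        near_peak = abs(x) >= limit
        if start is not None and near_peak and abs(x - samples[i - 1]) <= flat_tol:
            continue  # still on the same flat top
        if start is not None and i - start >= min_run:
            runs.append((start, i))  # the previous flat run ended before i
        start = i if near_peak else None
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples)))
    return runs


# Usage: a 100 Hz sine hard-clipped at 0.8, one second at 8 kHz
rate = 8000
clipped = [max(-0.8, min(0.8, math.sin(2 * math.pi * 100 * i / rate))) for i in range(rate)]
runs = find_clipped_runs(clipped)
print(len(runs), "clipped runs; first:", runs[0])
```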
Silence gaps are a different issue. AI tools often generate inconsistent pause lengths between sentences: sometimes too long, sometimes jarringly absent. Truncate Silence handles the long ones. Select the whole track and go to Effect → Truncate Silence:
Threshold: -40 dB
Duration: 1.0 seconds (only pauses at least this long are touched)
Action: Truncate Detected Silence
Truncate to: 0.7 seconds
This collapses any awkward three-second dramatic pause into something that sounds like an actual human taking a breath. Truncate Silence only shortens pauses; it can’t lengthen them. For a zero-gap transition, click between the two sentences and insert a short pause with Generate → Silence. Adjust the “Truncate to” value to match the pacing of your content: fast-paced explainer videos want shorter pauses, documentary narration wants more space.
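The logic behind truncating silence is simple enough to sketch: find runs below a level threshold and shorten any run longer than the minimum duration. This Python version is an illustration under stated assumptions, not Audacity’s code: it checks each sample against the threshold (Audacity’s detection may differ), keeps half of the allowed pause on each side of the cut, and records where it removed audio and how much, which is what you need if you later have to restore the original timing.

```python
# Sketch only: `truncate_silence` is a hypothetical helper, not Audacity's code.
# Assumptions: per-sample level detection, and keeping half of the allowed
# pause on each side of the cut.

def truncate_silence(samples, rate, threshold_db=-40.0, min_duration=1.0, truncate_to=0.7):
    """Shorten every run below threshold_db lasting at least min_duration seconds
    down to truncate_to seconds. Returns (new_samples, cuts), where cuts lists
    (index_in_original, samples_removed) for each shortened pause."""
    limit = 10 ** (threshold_db / 20)  # -40 dB -> 0.01 of full scale
    min_len = round(min_duration * rate)
    keep_len = round(truncate_to * rate)
    out, cuts = [], []
    i, n = 0, len(samples)
    while i < n:
        if abs(samples[i]) >= limit:
            out.append(samples[i])
            i += 1
            continue
        j = i
        while j < n and abs(samples[j]) < limit:
            j += 1  # find the end of this quiet run
        run = j - i
        if run >= min_len and run > keep_len:
            head = keep_len // 2
            tail = keep_len - head
            out.extend(samples[i:i + head])
            out.extend(samples[j - tail:j])
            cuts.append((i + head, run - keep_len))
        else:
            out.extend(samples[i:j])  # short pause: leave it alone
        i = j
    return out, cuts


# Usage at a toy 1 kHz rate: a 3 s pause and a 0.3 s pause between "speech"
sig = [0.5] * 500 + [0.0] * 3000 + [0.5] * 500 + [0.0] * 300 + [0.5] * 200
new, cuts = truncate_silence(sig, rate=1000)
print(len(sig) / 1000, "s ->", len(new) / 1000, "s; cuts:", cuts)
```

Note that the short 0.3-second pause passes through unchanged: nothing in this logic can make a pause longer.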
Pro tip ✅
If your AI voice has unnatural breath sounds or generates phantom breaths at odd moments, zoom into the waveform, select just that breath artifact, and use Effect → Amplify with a negative gain (e.g., -12 dB) to quietly suppress it without cutting it entirely. A complete silence where a breath used to be sounds more unnatural than a quiet one.
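Amplify with a negative value multiplies every selected sample by the same factor, so the dB figure maps straight to a multiplier: -12 dB keeps roughly a quarter of the breath’s amplitude. A small Python sketch with hypothetical helper names:

```python
# Sketch only: hypothetical helpers mirroring what Amplify with a negative gain
# does to a selection (every sample multiplied by the same factor).

def db_to_gain(db):
    """Convert a dB change into a linear amplitude multiplier."""
    return 10 ** (db / 20)


def attenuate_region(samples, start, end, db=-12.0):
    """Return a copy with samples[start:end] scaled by `db` (negative = quieter)."""
    g = db_to_gain(db)
    return samples[:start] + [x * g for x in samples[start:end]] + samples[end:]


print(round(db_to_gain(-12.0), 3))  # 0.251: about a quarter of the original amplitude
```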
Step 5: Apply Light Compression to Even Out Dynamics
AI voiceovers sometimes have uneven volume: one sentence booms, the next drops off. A gentle compressor evens this out and makes the final audio feel more polished. Go to Effect → Compressor and use these settings for narration. They’re written for the classic Compressor; Audacity 3.6 replaced it with a redesigned Compressor whose controls are laid out differently, so carry over the threshold and ratio and start the rest near their defaults.
Threshold: -18 dB
Noise Floor: -40 dB
Ratio: 2:1
Attack Time: 0.20 seconds
Release Time: 1.0 second
Make-up gain for 0 dB after compressing: checked
For a more aggressive podcast-style sound, push the ratio to 3:1 and lower the threshold to -24 dB. For documentary narration where you want dynamic range preserved, drop the ratio to 1.5:1. The make-up gain checkbox brings the overall level back up after compression, so leave it on.
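Under the hood, a compressor is a gain curve plus an envelope follower. Above the threshold, every extra dB of input yields 1/ratio dB of output, so at -18 dB and 2:1 a peak at -6 dBFS comes out at -12 dBFS. The Python sketch below is a textbook feed-forward design, not Audacity’s Compressor: the helper names are hypothetical, it applies no make-up gain, and its attack and release are simple one-pole time constants that may not match how Audacity interprets those fields.

```python
import math

# Sketch only: a textbook feed-forward compressor with hypothetical helper
# names. Not Audacity's Compressor; no make-up gain is applied.

def static_gain_db(level_db, threshold_db=-18.0, ratio=2.0):
    """Hard-knee gain curve: the gain change (dB) applied at a given input level."""
    if level_db <= threshold_db:
        return 0.0
    compressed = threshold_db + (level_db - threshold_db) / ratio
    return compressed - level_db  # negative = gain reduction


def compress(samples, rate, threshold_db=-18.0, ratio=2.0, attack=0.2, release=1.0):
    """Compress float samples using a peak envelope follower whose attack and
    release are one-pole time constants in seconds (an assumed interpretation)."""
    a_att = math.exp(-1.0 / (attack * rate))
    a_rel = math.exp(-1.0 / (release * rate))
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        coeff = a_att if level > env else a_rel  # rise with attack, fall with release
        env = coeff * env + (1.0 - coeff) * level
        level_db = 20 * math.log10(env) if env > 1e-9 else -180.0
        out.append(x * 10 ** (static_gain_db(level_db, threshold_db, ratio) / 20))
    return out


print(static_gain_db(-6.0))   # -6.0: a -6 dBFS peak comes out at -12 dBFS
print(static_gain_db(-30.0))  # 0.0: below the threshold, nothing changes
```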
Step 6: Normalize and Export
Final step before export: normalize the track so your audio hits a consistent target level. Go to Effect → Normalize and set:
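Normalize peak amplitude to: -1.0 dB
Remove DC offset (center on 0.0 vertically): checked
The -1.0 dB ceiling leaves headroom so peaks don’t clip when your video editor mixes the track or when it’s later encoded to a lossy format such as MP3 or AAC. Removing DC offset recenters a waveform whose baseline has drifted away from zero; left alone, that offset eats into your headroom and can cause clicks at edit points.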
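Peak normalization with DC removal comes down to two steps: subtract the average sample value, then scale so the largest peak lands on the target. Here is a Python sketch with a hypothetical `normalize` helper; -1 dBFS corresponds to a peak of about 0.891 on a -1.0 to 1.0 scale.

```python
import math

# Sketch only: `normalize` is a hypothetical helper, not Audacity's code.

def normalize(samples, peak_db=-1.0, remove_dc=True):
    """Optionally remove DC offset, then scale so the largest peak hits peak_db dBFS."""
    if not samples:
        return []
    if remove_dc:
        mean = sum(samples) / len(samples)  # the DC offset is the average value
        samples = [x - mean for x in samples]
    peak = max(abs(x) for x in samples)
    if peak == 0.0:
        return list(samples)
    target = 10 ** (peak_db / 20)  # -1 dBFS is a peak of about 0.891
    return [x * target / peak for x in samples]


# Usage: a sine sitting 0.05 above zero (a DC offset) with a 0.3 peak
offset_sine = [0.3 * math.sin(2 * math.pi * i / 100) + 0.05 for i in range(1000)]
fixed = normalize(offset_sine)
print(round(max(fixed), 3), round(min(fixed), 3))
```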
Export via File → Export Audio. Choose WAV (Microsoft) with Signed 16-bit PCM encoding. Match the sample rate to your video project: 48kHz is the usual standard for video, broadcast, and YouTube, while 44.1kHz is fine for audio-only delivery. In Audacity 3.4 and later, the export dialog has its own Sample Rate setting; in older versions, set the project rate before exporting.
Pro tip ✅
If you’re delivering audio for a video that was already edited with the original AI voiceover (with the artifacts), swap in your cleaned WAV and the sync should hold to the frame. Noise Reduction, EQ, Clip Fix, compression, and normalization don’t change the file’s duration, and neither does exporting at a different sample rate. Truncate Silence, inserted silence, and manual cuts do: if you used any of them, everything after the first changed pause is offset, and you’ll need to re-sync as described in Step 7.
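Before swapping files, you can confirm the durations match by comparing the two WAV headers. This sketch uses Python’s standard-library `wave` module, which reads integer PCM WAV files such as the 16-bit exports recommended above, but not 32-bit float WAV or MP3. The helper names are hypothetical, 30 fps is an assumed frame rate, and the usage lines generate two silent test files so the snippet runs on its own; point it at your real exports instead.

```python
import os
import tempfile
import wave

# Sketch only: hypothetical helpers built on the stdlib `wave` module
# (integer PCM WAV files only).

def wav_duration(path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def sync_drift_frames(original_path, cleaned_path, fps=30.0):
    """How many video frames longer (+) or shorter (-) the cleaned file is."""
    return (wav_duration(cleaned_path) - wav_duration(original_path)) * fps


def _write_silent_wav(path, frames, rate=48000):
    """Write a silent 16-bit mono WAV file (test fixture only)."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * frames)


tmp = tempfile.mkdtemp()
orig = os.path.join(tmp, "original.wav")
clean = os.path.join(tmp, "cleaned.wav")
_write_silent_wav(orig, 48000)          # 1.000 s
_write_silent_wav(clean, 48000 + 1600)  # 1/30 s longer
drift = sync_drift_frames(orig, clean, fps=30.0)
print(f"cleaned file is {drift:+.2f} frames off")
```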
Step 7: Re-Sync with Video
In most video editors (Premiere Pro, DaVinci Resolve, CapCut), replacing an audio track is straightforward: right-click the original audio clip on the timeline, choose the editor’s replace or relink command (the exact name varies by editor), and point it to your cleaned WAV. If your editor doesn’t have a replace function, mute the original audio track, create a new audio track, import your cleaned file, and align it to the start of the original clip using the waveform view. The overall shape still matches after processing, so manual alignment is fast.
If you used Truncate Silence and your audio is now shorter than the original, you have two options: extend the video clips to match the new pacing (often an improvement), or go back into Audacity and add silence manually at the points where you removed it, to restore the original duration. The second approach is fiddly but preserves your original edit.
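If you go the restore-the-duration route, each cut shifts everything after it earlier, so the spots where silence has to go back are not at their original timestamps. Here is a Python sketch with a hypothetical helper, assuming you’ve noted each cut as (original time in seconds, seconds removed):

```python
# Sketch only: `reinsertion_points` is a hypothetical helper. Each cut is
# assumed to be recorded as (original_time_s, seconds_removed).

def reinsertion_points(cuts):
    """Translate cuts on the original timeline into (cleaned_time_s, seconds_to_add)
    pairs on the shortened file. Every earlier cut pulls later audio forward,
    so each position is offset by the total removed before it."""
    points, shift = [], 0.0
    for t, removed in sorted(cuts):
        points.append((t - shift, removed))
        shift += removed
    return points


cuts = [(10.0, 2.0), (30.0, 1.5), (52.0, 0.8)]
for when, length in reversed(reinsertion_points(cuts)):
    # Work backwards so each insertion leaves the earlier positions untouched
    print(f"insert {length:.1f} s of silence at {when:.1f} s")
```

Inserting from the last point backwards means every position you still need sits before the silence you just added, so none of them move.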
Note 💡
DaVinci Resolve’s Fairlight audio module has its own noise reduction built in, and if you’re already editing in Resolve, it’s worth trying Fairlight’s tools first before bouncing to Audacity. Audacity’s Noise Reduction can be more transparent on synthetic speech, though: Fairlight’s tools are geared toward room and location noise rather than synthesis artifacts and can sound more aggressive on TTS output.
Full Effect Chain — Copy This Order
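Order matters in audio processing. Running compression before noise reduction raises the noise floor unevenly: the quiet gaps get make-up gain while the noise under loud syllables gets turned down, so a profile captured from one pause no longer matches the noise elsewhere, and you end up reaching for more aggressive reduction settings. Run things in this sequence and you’ll get cleaner results: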
1. Noise Reduction (profile capture → apply at 6dB / Sensitivity 4)
2. Clip Fix (if clipping present — Threshold 95%)
3. Filter Curve EQ (3 to 5 dB cut centered on 8kHz for sibilance)
4. Truncate Silence (-40 dB threshold, long pauses truncated to 0.7s)
5. Compressor (Threshold -18dB, Ratio 2:1)
6. Normalize (-1.0 dB, DC offset removal checked)
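Save this chain as an Audacity Macro (Tools → Macro Manager, then New) and you can apply the whole thing to a batch of files in one pass. Genuinely useful if you’re processing a multi-episode podcast or a video course where every episode has the same AI narrator. One caveat: a macro can’t capture a noise profile for you, so Noise Reduction reuses the profile you last captured in that session. That works when every file comes from the same voice and settings; capture a fresh profile before running the batch.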
Avoid 🚫
Don’t apply the EQ step before Noise Reduction. EQ changes the spectral shape of your noise profile, which makes the Noise Reduction algorithm less accurate. Always capture the profile and apply reduction on the unprocessed audio first.
Bonus: Nyquist Prompt for Custom Noise Reduction
Audacity includes a Nyquist scripting environment (Effect → Nyquist Prompt) that lets you write custom audio processing code. For a quick high-pass filter that removes low-frequency rumble common in some TTS outputs without touching the voice, paste this:
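(highpass8 *track* 80)
This applies an 8th-order (48 dB per octave) high-pass filter at 80Hz, which cuts subsonic rumble cleanly without touching the fundamental frequency of most voices: adult male voices typically sit around 85Hz to 180Hz and female voices around 165Hz to 255Hz. For a narrator with a higher voice, you can raise the cutoff to 120Hz, but expect it to thin out a deep male voice:
(highpass8 *track* 120)
Run it via Effect → Nyquist Prompt, paste the code, and click Apply. Current Audacity versions refer to the selected audio as *track*; the s variable you’ll see in older tutorials only works with the legacy (version 3) syntax option. The Nyquist Prompt writes its result directly into the audio rather than adding a real-time effect, so use Undo if you don’t like what you hear. If you’d rather not type code, Audacity’s built-in High-Pass Filter effect with a 48 dB-per-octave rolloff does a comparable job.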
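To see what an 8th-order cutoff at 80Hz versus 120Hz means in decibels, here is the magnitude response of an ideal analog Butterworth high-pass. Treat it as an approximation: it assumes `highpass8` behaves like an ideal 8th-order Butterworth filter, which is not the same thing as its exact digital response, and `butterworth_hp_db` is a hypothetical helper.

```python
import math

# Sketch only: `butterworth_hp_db` is a hypothetical helper modelling highpass8
# as an ideal analog Butterworth high-pass (an approximation, not Nyquist's code).

def butterworth_hp_db(f, cutoff, order=8):
    """Magnitude response in dB of an ideal order-N Butterworth high-pass."""
    return -10 * math.log10(1 + (cutoff / f) ** (2 * order))


for cutoff in (80, 120):
    row = {f: round(butterworth_hp_db(f, cutoff), 1) for f in (40, 60, 85, 100, 150)}
    print(f"{cutoff} Hz cutoff:", row)
```

Under that model, a deep male fundamental at 85Hz loses about 1.4 dB with the 80Hz cutoff but about 24 dB with the 120Hz one, which is why the higher setting suits higher voices.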
The Bottom Line
AI voiceovers are good enough to use; they’ve been good enough for a while. The artifacts that remain are real but fixable, and you don’t need a specialized API, a cloud subscription, or a plugin that may or may not actually exist to deal with them. Audacity’s built-in tools, applied in the right order with conservative settings, handle the majority of what TTS platforms produce. The effect chain above takes about four minutes to work through by hand on a five-minute voiceover file, and the improvement is audible even on decent laptop speakers. Save it as a Macro, process your files in batch, and stop letting metallic hiss be the thing your audience notices instead of your content.


