Remove background noise from audio (online)
A technical, practical workflow: identify noise type → pick the right cleanup step → avoid artifacts → verify output.
What “background noise” usually means
Different noise types require different treatment. Misclassifying the noise is one of the most common reasons results sound unnatural.
| Noise type | How it sounds | What usually works | Common failure mode |
|---|---|---|---|
| Broadband hiss | Steady “shhh” (often high-frequency) | Denoise / spectral reduction | Watery/metallic artifacts if pushed |
| Tonal hum | Low-frequency tone (50/60 Hz) + harmonics | Notch filtering + light denoise | Voice thinness if filters are too wide |
| Room echo (reverb) | “Hollow” voice; tail after words | De-reverb (separate from denoise), better capture | Over-denoise makes reverb more obvious |
| Wind / handling | Low-frequency bursts, thumps, bumps | High-pass + repair + selective attenuation | Speech distortion if treated as steady noise |
| Crowd/traffic ambience | Non-stationary, changing background | Moderate denoise + careful thresholds | Pumping/gating in pauses |
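
For tonal hum, it helps to know whether the mains frequency is 50 Hz or 60 Hz before choosing filter frequencies. The sketch below compares the energy at each fundamental and its first harmonics using the Goertzel algorithm in plain Python. The function names, the three-harmonic comparison, and the synthetic test signal are illustrative assumptions, and the result only says which hum family is stronger, not whether hum is present at all.

```python
import math

def goertzel_power(samples, sample_rate, freq_hz):
    """Squared magnitude of one frequency component (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def dominant_mains_hum(samples, sample_rate):
    """Return 50 or 60: whichever fundamental plus harmonics carries more energy."""
    def family_power(base_hz):
        return sum(goertzel_power(samples, sample_rate, base_hz * k) for k in (1, 2, 3))
    return max((50, 60), key=family_power)

# Synthetic one-second signal (assumption): 60 Hz hum with a 120 Hz harmonic.
RATE = 8000
hum = [0.05 * math.sin(2 * math.pi * 60 * n / RATE)
       + 0.02 * math.sin(2 * math.pi * 120 * n / RATE) for n in range(RATE)]
print(dominant_mains_hum(hum, RATE))
```

Run this on a noise-only stretch of the recording where possible, so that voiced speech near the same frequencies does not skew the comparison.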
80/20 workflow (fast and safe)
- Get a baseline: listen to a few seconds of silence and a few seconds of speech.
- Classify noise: steady (hiss/hum) vs changing (crowd/wind) vs echo.
- Fix tonal hum first: hum removal before denoise reduces artifacts.
- Denoise conservatively: reduce the noise that masks speech without changing the character of the speaker's voice.
- Verify in context: check dialogue intelligibility and breaths/sibilants.
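
The first two steps can be backed by a quick measurement. The following is a minimal sketch, assuming mono float samples in the range [-1, 1]: it estimates a rough signal-to-noise ratio from a speech excerpt and a noise-only pause, and uses the frame-to-frame level spread of the pause as a heuristic for steady versus changing noise. The 3 dB spread threshold, the frame length, and the synthetic signals are assumptions for illustration, not calibrated values.

```python
import math
import random
import statistics

def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-10))  # clamp to avoid log(0)

def frame_levels_db(samples, frame_len=1024):
    """Level of each full, non-overlapping frame."""
    return [rms_dbfs(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def baseline(speech, pause, max_spread_db=3.0):
    """Return (snr_db, is_steady) from a speech excerpt and a noise-only pause."""
    snr_db = rms_dbfs(speech) - rms_dbfs(pause)  # speech level includes the noise
    spread_db = statistics.pstdev(frame_levels_db(pause))
    return snr_db, spread_db <= max_spread_db

# Synthetic stand-ins (assumption): steady hiss, and a 200 Hz tone in place of speech.
rng = random.Random(0)
pause = [rng.gauss(0, 0.005) for _ in range(16000)]
speech = [0.1 * math.sin(2 * math.pi * 200 * n / 16000) + p
          for n, p in enumerate(pause)]
snr_db, is_steady = baseline(speech, pause)
print(f"SNR ~ {snr_db:.1f} dB, steady noise: {is_steady}")
```

A large spread in the pause levels suggests changing noise such as crowd or wind, which calls for the more cautious settings described for non-stationary noise.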
What to preserve (so it doesn’t sound “AI”)
- Speech transients: consonant onsets (“t”, “k”, “p”) drive intelligibility.
- Sibilance: “s” and “sh” energy can be mistaken for hiss; avoid over-reduction.
- Breaths: fully removing breaths can create unnatural gaps and pumping.
- Room tone consistency: absolute silence between words can sound edited.
How to avoid the 4 most common artifacts
1) Metallic / watery sound
- Reduce the denoise strength and prefer multiple lighter passes over one aggressive pass.
- Ensure the model is not treating sibilance as noise (listen to “s”, “f”, “sh”).
2) Pumping / gating in pauses
- Use a less aggressive threshold so the noise floor doesn’t jump between words.
- If possible, keep a small amount of room tone instead of forcing “pure silence”.
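
One way to picture "keep a small amount of room tone" is a gain curve that attenuates quiet frames by a bounded amount instead of muting them. The sketch below is a hypothetical downward-expander curve: the threshold, the 12 dB maximum reduction, and the 10 dB ramp are assumed values chosen only to illustrate the idea.

```python
def pause_gain_db(level_db, threshold_db=-45.0, max_reduction_db=12.0, ramp_db=10.0):
    """Gain in dB (<= 0) for a frame at the given level.

    Frames at or above the threshold pass unchanged. Below it, attenuation
    ramps in over ramp_db and then stops at max_reduction_db, so pauses
    keep some room tone instead of dropping to silence.
    """
    if level_db >= threshold_db:
        return 0.0
    depth = min((threshold_db - level_db) / ramp_db, 1.0)
    return -max_reduction_db * depth

for level in (-40.0, -50.0, -80.0):
    print(level, pause_gain_db(level))
```

A hard gate would apply unlimited attenuation below the threshold; capping the reduction keeps the noise floor from jumping between words.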
3) Speech thinning
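- If removing hum, avoid wide cuts that eat into voice fundamentals (roughly 85–255 Hz for typical adult speech). Hum harmonics at 100/120 Hz and 150/180 Hz fall inside that range, so keep each notch narrow.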
- Do hum removal first, then use gentle denoise for hiss/ambience.
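
To illustrate why notch width matters, here is a minimal sketch of a narrow biquad notch in plain Python, using the widely published Audio EQ Cookbook coefficient formulas. The Q of 30, the 8 kHz sample rate, and the helper names are assumptions for the example; a real tool would also notch the harmonics.

```python
import math

def notch_coefficients(freq_hz, sample_rate, q=30.0):
    """Biquad notch coefficients, normalized so that a0 == 1."""
    w0 = 2 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)  # higher q gives a narrower notch
    a0 = 1 + alpha
    b = (1 / a0, -2 * math.cos(w0) / a0, 1 / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a

def apply_biquad(samples, b, a):
    """Direct-form I filtering of a list of float samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def steady_state_gain(tone_hz, notch_hz=60.0, rate=8000, q=30.0):
    """Output/input RMS for a sine tone, measured after the filter settles."""
    tone = [math.sin(2 * math.pi * tone_hz * n / rate) for n in range(2 * rate)]
    out = apply_biquad(tone, *notch_coefficients(notch_hz, rate, q))
    return rms(out[-rate // 2:]) / rms(tone[-rate // 2:])

print("gain at 60 Hz:", steady_state_gain(60.0))
print("gain at 85 Hz:", steady_state_gain(85.0))
```

With a narrow notch the 60 Hz tone is almost entirely removed while a tone at 85 Hz, near the bottom of the voice range, passes nearly untouched. Lowering Q widens the cut and starts to thin the voice.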
4) “Underwater” ambience
- Non-stationary noise (crowds/traffic) is harder to remove; aim for intelligibility, not total removal.
- Consider splitting the file: treat noisy sections separately if the background changes.
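
Splitting can be guided by the background level itself. The sketch below is a hypothetical helper that takes per-frame background levels in dB (for example, measured in pauses) and starts a new segment whenever the level departs from the running mean of the current segment by more than an assumed 6 dB.

```python
def split_on_level_change(levels_db, max_jump_db=6.0):
    """Return (start, end) frame index pairs; end is exclusive."""
    segments = []
    start, total = 0, 0.0
    for i, level in enumerate(levels_db):
        count = i - start
        # Compare each frame with the mean of the segment built so far.
        if count and abs(level - total / count) > max_jump_db:
            segments.append((start, i))
            start, total = i, 0.0
        total += level
    if levels_db:
        segments.append((start, len(levels_db)))
    return segments

levels = [-60, -59, -61, -40, -41, -39, -60]
print(split_on_level_change(levels))  # [(0, 3), (3, 6), (6, 7)]
```

Each resulting segment can then be denoised with its own settings, which avoids applying one compromise setting to the whole file.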
Quality check (what to listen for)
- Noise floor: does it drop without swirling artifacts?
- Consonants: are “t/k/s/f” crisp or smeared?
- Breaths: do breaths remain natural (not clipped or exaggerated)?
- Between phrases: any pumping or sudden silence jumps?
- Context test: play through speakers + earbuds (artifacts show differently).
When to re-record instead of denoise
- The voice is already heavily clipped/distorted.
- The room echo is dominant (high reverb-to-direct ratio).
- The speaker is far from the mic and the background is as loud as speech.
A simple capture fix often beats any post-processing: move closer, reduce room reflections, and set input gain to avoid clipping.
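
Clipping is the easiest of these conditions to check numerically. The following is a minimal sketch, assuming float samples in [-1, 1], that reports the fraction of samples sitting at or near full scale. The 0.999 ceiling is an assumed value, and how large a fraction counts as "heavily clipped" is a judgment call rather than a fixed rule.

```python
import math

def clipped_fraction(samples, ceiling=0.999):
    """Fraction of samples whose magnitude is at or beyond the ceiling."""
    if not samples:
        return 0.0
    return sum(1 for x in samples if abs(x) >= ceiling) / len(samples)

# Synthetic example (assumption): a sine recorded with 2x too much gain,
# hard-limited to full scale, next to a clean sine at half scale.
RATE = 8000
overdriven = [max(-1.0, min(1.0, 2.0 * math.sin(2 * math.pi * 100 * n / RATE)))
              for n in range(RATE)]
clean = [0.5 * math.sin(2 * math.pi * 100 * n / RATE) for n in range(RATE)]
print(clipped_fraction(overdriven), clipped_fraction(clean))
```

A clean take should report a value at or very close to zero; a large fraction means the waveform tops were lost at capture and denoising will not bring them back.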
FAQ
Will denoise remove echo?
Not reliably. Echo/reverb consists of delayed copies of the voice itself, so a denoiser tuned for unrelated background noise cannot separate it cleanly. If echo is the main issue, prioritize better capture or a dedicated de-reverb step.
Should I denoise before or after compression?
Usually denoise first. Compression with makeup gain raises the level of quiet passages, including the noise floor, which makes the noise harder to remove afterward.
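
The effect on the noise floor can be worked through with the static curve of a simple compressor. This is a sketch of a generic hard-knee downward compressor, not any particular tool's behavior; the threshold, ratio, makeup gain, and the example levels are assumed values.

```python
def compressor_output_db(input_db, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
    """Output level (dBFS) of a hard-knee downward compressor for a given input level."""
    if input_db > threshold_db:
        # Above the threshold, level changes are divided by the ratio.
        input_db = threshold_db + (input_db - threshold_db) / ratio
    return input_db + makeup_db

speech_peak_db, noise_db = -8.0, -60.0
speech_out = compressor_output_db(speech_peak_db, makeup_db=9.0)
noise_out = compressor_output_db(noise_db, makeup_db=9.0)
print("before:", speech_peak_db - noise_db, "dB between speech peak and noise")
print("after: ", speech_out - noise_out, "dB between speech peak and noise")
```

In this example the speech peak stays at -8 dBFS while the noise rises from -60 to -51 dBFS, so the gap between them narrows from 52 dB to 43 dB.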
Can I get “studio voice” from any recording?
Only to a point. If the recording has a low signal-to-noise ratio (SNR) or heavy echo, the cleanest result comes from improving capture.