Remove background noise from audio (online)

A technical, practical workflow: identify noise type → pick the right cleanup step → avoid artifacts → verify output.

What “background noise” usually means

Different noise types require different treatment. Misclassifying the noise is a common reason results sound unnatural.

Noise type	How it sounds	What usually works	Common failure mode
Broadband hiss	Steady “shhh” (often high-frequency)	Denoise / spectral reduction	Watery/metallic artifacts if pushed
Tonal hum	Low-frequency tone (50/60 Hz) + harmonics	Notch filtering + light denoise	Voice thinness if filters are too wide
Room echo (reverb)	“Hollow” voice; tail after words	De-reverb (separate from denoise), better capture	Over-denoising makes reverb more obvious
Wind / handling	Low-frequency bursts, thumps, bumps	High-pass + repair + selective attenuation	Speech distortion if treated as steady noise
Crowd/traffic ambience	Non-stationary, changing background	Moderate denoise + careful thresholds	Pumping/gating in pauses
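One rough way to tell steady noise from changing noise is to measure how much the level of a noise-only excerpt varies from frame to frame. The sketch below uses only the Python standard library. The function names, the 20 ms frame size, and the synthetic signals are illustrative assumptions, not part of any particular tool.

```python
import math
import statistics

def frame_levels_db(samples, frame=320):
    """RMS level of each full frame, in dBFS (float samples in [-1.0, 1.0])."""
    levels = []
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        mean_square = sum(s * s for s in chunk) / frame
        # Guard against log(0) on digital silence.
        levels.append(10.0 * math.log10(max(mean_square, 1e-12)))
    return levels

def level_spread_db(samples, frame=320):
    """Standard deviation of frame levels: small = steady, large = changing."""
    return statistics.pstdev(frame_levels_db(samples, frame))

# Synthetic stand-ins for noise-only excerpts (assumption: 16 kHz mono).
sr = 16000
steady = [0.01 * math.sin(2 * math.pi * 3000 * n / sr) for n in range(sr)]
changing = [
    (0.1 if (n // 3200) % 2 else 0.01) * math.sin(2 * math.pi * 3000 * n / sr)
    for n in range(sr)
]

steady_spread = level_spread_db(steady)      # close to 0 dB
changing_spread = level_spread_db(changing)  # several dB
```

A small spread suggests hiss or hum that a steady noise profile can describe. A large spread suggests crowd, traffic, or wind, where the same settings are more likely to pump. Where to draw the line between the two is a judgment call that depends on the material.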

Get a baseline: listen to a few seconds of silence and a few seconds of speech.
Classify noise: steady (hiss/hum) vs changing (crowd/wind) vs echo.
Fix tonal hum first: hum removal before denoise reduces artifacts.
Denoise conservatively: remove the noise that masks speech without altering the character of the voice.
Verify in context: check dialogue intelligibility and breaths/sibilants.
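The baseline step can be made concrete by comparing the level of a speech excerpt with the level of a silence (noise-only) excerpt. The following is a minimal sketch using only the standard library. The function names and synthetic signals are assumptions for illustration, and real excerpts would come from the decoded audio file.

```python
import math

def rms_db(samples):
    """RMS level in dBFS for float samples in [-1.0, 1.0]."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    # Guard against log(0) on digital silence.
    return 10.0 * math.log10(max(mean_square, 1e-12))

def estimate_snr_db(speech, silence):
    """Rough SNR: speech-excerpt level minus noise-only level, in dB."""
    return rms_db(speech) - rms_db(silence)

# Synthetic stand-ins for the two excerpts (assumption: 16 kHz mono).
sr = 16000
speech = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
silence = [0.005 * math.sin(2 * math.pi * 3000 * n / sr) for n in range(sr)]

snr = estimate_snr_db(speech, silence)  # about 40 dB for these levels
```

Running the same measurement on the cleaned file gives a before/after number to set beside the listening check. In a real recording the speech excerpt also contains the noise, so the figure is only an approximation.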

Speech transients: consonant onsets (“t”, “k”, “p”) drive intelligibility.
Sibilance: “s” and “sh” energy can be mistaken for hiss; avoid over-reduction.
Breaths: fully removing breaths can create unnatural gaps and pumping.
Room tone consistency: absolute silence between words can sound edited.

If the voice sounds watery or metallic, reduce the denoise strength and prefer multiple lighter passes over one aggressive pass.
Ensure the model is not treating sibilance as noise (listen to “s”, “f”, “sh”).

If the background pumps or gates in pauses, use a less aggressive threshold so the noise floor doesn’t jump between words.
If possible, keep a small amount of room tone instead of forcing “pure silence”.
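Keeping some room tone can be sketched as a gate that turns pauses down by a fixed amount instead of muting them. This is an illustrative standard-library example, not how any specific online tool works. The function name, the -40 dBFS threshold, the 12 dB reduction, and the 10 ms frame are all assumed values.

```python
import math

def attenuate_pauses(samples, threshold_db=-40.0, reduction_db=12.0, frame=160):
    """Turn quiet frames down by reduction_db instead of muting them."""
    floor_gain = 10.0 ** (-reduction_db / 20.0)
    out = []
    gain = 1.0
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        mean_square = sum(s * s for s in chunk) / len(chunk)
        level_db = 10.0 * math.log10(max(mean_square, 1e-12))
        target = 1.0 if level_db >= threshold_db else floor_gain
        # Ramp linearly to the target over the frame to avoid clicks.
        step = (target - gain) / len(chunk)
        for s in chunk:
            gain += step
            out.append(s * gain)
        gain = target  # discard accumulated rounding error
    return out

# Synthetic test signal (assumption: 16 kHz mono): speech, then a pause.
sr = 16000
speech = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
pause = [0.003 * math.sin(2 * math.pi * 1000 * n / sr) for n in range(sr)]
processed = attenuate_pauses(speech + pause)
```

With these settings the pause ends up 12 dB quieter but still audible, so the background does not drop to digital silence between words. A smaller `reduction_db` makes the transition less noticeable.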

If removing hum, avoid wide cuts that eat into voice fundamentals (~85–255 Hz typical).
Do hum removal first, then use gentle denoise for hiss/ambience.
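The effect of notch width can be checked numerically. The sketch below builds a second-order notch from the widely used Audio EQ Cookbook formulas and compares a narrow notch (Q = 30) with a wide one (Q = 1) at 60 Hz. The function names, the Q values, and the 8 kHz test rate are assumptions chosen for illustration.

```python
import math

def notch_coefficients(f0_hz, q, sample_rate):
    """Biquad notch (Audio EQ Cookbook form), normalized so a0 = 1."""
    w0 = 2.0 * math.pi * f0_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def biquad(samples, b, a):
    """Direct-form I filter: y = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def steady_state_gain(freq_hz, f0_hz, q, sample_rate=8000):
    """Gain of the notch for a test tone, measured after 1 s of settling."""
    n = 2 * sample_rate
    tone = [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    filtered = biquad(tone, *notch_coefficients(f0_hz, q, sample_rate))
    return rms(filtered[sample_rate:]) / rms(tone[sample_rate:])

hum_gain = steady_state_gain(60, 60, q=30)    # 60 Hz hum, narrow notch
narrow_85 = steady_state_gain(85, 60, q=30)   # low voice fundamental, narrow
wide_85 = steady_state_gain(85, 60, q=1)      # same tone, wide notch
```

The narrow notch removes the 60 Hz tone while leaving an 85 Hz component almost untouched. The wide notch cuts that same 85 Hz component by several decibels, which is one way a voice ends up sounding thin. Real hum usually needs further notches at the harmonics (120 Hz, 180 Hz, and so on for 60 Hz mains).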

Non-stationary noise (crowds/traffic) is harder to remove. Aim for intelligibility, not total removal.
Consider splitting the file: treat noisy sections separately if the background changes.

A simple capture fix often beats any post-processing: move closer, reduce room reflections, and set input gain to avoid clipping.
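Whether input gain was set too high can be checked after the fact by counting samples that sit at full scale. This is a minimal standard-library sketch. The function name and the 0.999 ceiling are assumptions, and the test signals are synthetic.

```python
import math

def clipping_report(samples, ceiling=0.999):
    """Peak level and share of samples at or above the ceiling."""
    peak = max(abs(s) for s in samples)
    clipped = sum(1 for s in samples if abs(s) >= ceiling)
    return {
        "peak_dbfs": 20.0 * math.log10(max(peak, 1e-12)),
        "clipped_fraction": clipped / len(samples),
    }

# Synthetic examples (assumption: 16 kHz mono floats in [-1.0, 1.0]).
sr = 16000
clean = [0.5 * math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]
too_hot = [
    max(-1.0, min(1.0, 1.5 * math.sin(2 * math.pi * 100 * n / sr)))
    for n in range(sr)
]

clean_report = clipping_report(clean)
hot_report = clipping_report(too_hot)
```

A recording with a noticeable clipped fraction is better re-recorded at lower gain, since denoising does not restore flattened peaks.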

Denoising does not reliably remove echo. Echo/reverb is different from background noise. If echo is the main issue, prioritize better capture or a dedicated de-reverb step.

Denoise before compressing in most cases. Compression, particularly with makeup gain, raises the noise floor and makes denoising harder.
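The effect of compression on the noise floor can be worked through with simple level arithmetic. The sketch below models only a static compressor curve in decibels. The threshold, ratio, and input levels are assumed example values, and attack and release behavior is ignored.

```python
def compressed_level_db(level_db, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
    """Static compressor curve on levels in dBFS (no attack/release modelled)."""
    if level_db > threshold_db:
        # Above the threshold, each extra dB in gives 1/ratio dB out.
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_db

speech_in, noise_in = -8.0, -60.0

# Makeup gain chosen to bring speech back to its original level.
makeup = speech_in - compressed_level_db(speech_in)               # 9 dB

speech_out = compressed_level_db(speech_in, makeup_db=makeup)     # -8 dBFS
noise_out = compressed_level_db(noise_in, makeup_db=makeup)       # -51 dBFS

snr_before = speech_in - noise_in     # 52 dB
snr_after = speech_out - noise_out    # 43 dB
```

In this example the speech level is unchanged, but the noise in pauses comes up by the full 9 dB of makeup gain. A denoiser that runs afterwards has a louder noise floor to deal with.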

Post-processing can only improve a recording up to a point. If the recording has a low signal-to-noise ratio (SNR) or heavy echo, the cleanest result comes from improving capture.