George Gilder, Microcosm (1989 cassette source, 14h 28m). Same 30-second
test passage every time, loudness-matched so you're judging the sound, not the volume.
✅
Winner found — Adobe Podcast (Enhance Speech v2)
After four misses, the Adobe pass sounds genuinely good to you. A/B it against the
raw tape below. Next step is batching the full 14.5-hour book through the same engine into a
clean .m4b with chapters + cover preserved.
The test passage
"…human imagination and moral revival. The exemplary technology of this era is the
microchip. The computer inscribed on a tiny piece of processed material. More than
any other invention this device epitomizes the overthrow of matter. Consider a parable of
the microchip once told by Gordon Moore, chairman of Intel…"
A/B — the fix vs the raw tape
✓ ADOBE v2Enhance Speech — the winner"sounds really good"
Adobe's neural restorer rebuilt the full top end naturally — clear consonants,
real "air", no underwater smear. His voice, just un-muffled.
BEFOREOriginal raw tapeboxy / muffled
The untouched 1989 source for reference.
Local contenders — offline, no Adobe account
LOCAL · 1Resemble-Enhancedenoise + generative restore
Strongest general local restorer. Rebuilds natural detail; far less "underwater"
than VoiceFixer. Keeps his exact voice. Runs fully offline on your M4.
A/B this against the Adobe winner above — does it hold up?
LOCAL · 2AudioSR — diffusion super-resolution (speech model)strong top-end rebuild
Diffusion model rebuilds the highs iteratively. Bright and detailed; listen for any
faint diffusion "shimmer" on sibilants. Local.
LOCAL · 3AP-BWE — purpose-built bandwidth extension (12k→48k)gentlest / most conservative
GAN trained to rebuild a band-limited signal to 48 kHz. The most restrained of the three —
keeps your real content, adds a modest, clean top end. Local, and very fast (~1.4s for 30s).
How they compare on paper (energy rebuilt above 6 kHz, after matching loudness):
Resemble −43 dB (most aggressive) · AudioSR −47 · Adobe −52 (natural) · AP-BWE −55 (gentlest) · raw original −58.
More lift isn't automatically better — too much can sound hissy/brittle. Trust your ears; this is just the map.
Your idea — resynthesis (know the words + the delivery, rebuild clean)
NEW · RESYNTHESISIndexTTS-2 — his cloned voice re-reads it, cleanfully synthetic
This is the "fill in the blanks with what we know he's saying" idea. We cloned his
voice from the Adobe-cleaned clip, fed it the transcript, and steered the tone with his
original recording. The output is brand-new, perfectly clean audio — but it's a clone re-performing,
not his real waveform. Listen hard: does it sound like him, or like a very good impostor?
The honest tradeoff: this gives perfect fidelity but loses his real voice, and any
transcription slip becomes a wrong word. Over 14.5 h that's the risk. Bandwidth-extension (Adobe / the
locals above) keeps his actual voice, just clearer.
You can see the fix too
Before — energy dies above ~5 kHz (dark top band).
After Adobe v2 — top end rebuilt with natural detail.
Next — the full book
Doing all 14.5 h by hand on the website is painful (free tier caps daily minutes; long-file
upload is paid). The clean path is Adobe's Enhance API — same engine, batched:
Split the .m4b at its existing chapter marks into ~30–60 min WAV chunks.
Run each chunk through the Enhance API (the exact v2 you just approved).
Loudness-normalize to the −18 LUFS audiobook standard and reassemble.
Re-wrap into a new .m4b with chapters + cover art preserved.
I'll confirm the API access + cost before running anything — you approve the plan first.