Ask HN: What is your recommended speech to text/audio transcription tool?

tikkun · on June 12, 2023

For an end-user application, Otter.ai is the best I've seen. I wish there were a better, faster one built on top of Whisper, but I haven't come across a good one.

If you're looking for an API, check out AssemblyAI, Google Cloud transcription, and Deepgram. I have a list here: https://llm-utils.org/List+of+AI+APIs

solardev · on June 13, 2023

Descript.com was pretty good at it when I tried it, but it's pretty expensive: https://www.descript.com/transcription

We ended up using Otter.ai, which if I remember correctly didn't have as good a speaker separation model, but it was good enough for the price: https://otter.ai/

There's also the much more expensive, human-powered Rev: https://www.rev.com/

tmaly · on June 12, 2023

Microsoft has a tool that accepts wav or mp3 and transcribes it.

But I do not think it can distinguish between speakers.

How well does Whisper work in terms of correctness for single speakers?

elektor · on June 12, 2023

Using the large model, it works really well, even in low volume settings/speakers mumbling. Some of my transcripts are pharma related and Whisper stumbles on the drug names, but I’m pretty understanding of that.
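One way to patch up misheard domain terms after the fact is to fuzzy-match each word of the transcript against a list of known terms. The following is only a minimal sketch using Python's standard difflib: the function name correct_terms, the vocabulary list, and the 0.8 cutoff are all assumptions for illustration, not part of Whisper or any transcription tool.

```python
import difflib

def correct_terms(transcript, vocabulary, cutoff=0.8):
    """Replace words that closely resemble a known term with that term.

    vocabulary is a hypothetical list of domain terms (e.g. drug names);
    cutoff is the minimum difflib similarity ratio (0..1) for a replacement.
    """
    # Map lowercase forms back to the preferred spelling.
    lowered = {term.lower(): term for term in vocabulary}
    candidates = list(lowered)
    corrected = []
    for word in transcript.split():
        # Compare without surrounding punctuation, but keep it in the output.
        core = word.strip(".,;:!?")
        match = []
        if core:
            match = difflib.get_close_matches(core.lower(), candidates, n=1, cutoff=cutoff)
        if match:
            corrected.append(word.replace(core, lowered[match[0]]))
        else:
            corrected.append(word)
    return " ".join(corrected)

print(correct_terms("Patient takes metformine daily.", ["metformin"]))
```

A lower cutoff catches more misspellings but also risks rewriting ordinary words that happen to look like a term, so it is worth reviewing the replacements on a sample transcript first.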

java_beyb · on June 13, 2023

What you're looking for is called diarization. Almost all enterprise speech-to-text (STT) services do that, and you can find individual libraries on GitHub too.

Fine-tuning Whisper is a nightmare. I don't know what the interviews are for, but again, most enterprise STTs offer customization, so you can add medical terminology.

Google, Amazon, and Nuance have medical models, but they are either expensive or not available for personal projects.
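To make the diarization idea concrete: Whisper on its own returns timestamped text without speaker labels, so a common approach is to run a separate diarization step and merge the two outputs by time overlap. The sketch below assumes the transcript arrives as (start, end, text) tuples and the diarization output as (start, end, speaker) tuples. Those shapes and the name label_segments are hypothetical, so adapt them to whatever your tools actually emit.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_segments(segments, turns):
    """Attach to each transcript segment the speaker whose turn overlaps it most.

    segments: list of (start, end, text) from a transcription tool (assumed shape)
    turns:    list of (start, end, speaker) from a diarization tool (assumed shape)
    """
    labeled = []
    for start, end, text in segments:
        best_speaker, best_overlap = "unknown", 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append((best_speaker, text))
    return labeled

segments = [(0.0, 2.0, "Hi"), (2.0, 5.0, "Hello there"), (10.0, 11.0, "Bye")]
turns = [(0.0, 2.2, "A"), (2.2, 6.0, "B")]
for speaker, text in label_segments(segments, turns):
    print(f"{speaker}: {text}")
```

Segments that overlap no speaker turn are labeled "unknown" rather than guessed, which makes gaps in the diarization output easy to spot.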

elektor · on June 13, 2023

Thanks for that! Searching for diarization really helped me narrow down what I was looking for.