Hacker News
Ask HN: What is your recommended speech to text/audio transcription tool?
3 points by elektor on June 12, 2023 | 6 comments
Currently, I use a GUI for Whisper AI (https://github.com/Const-me/Whisper) to upload MP3s of interviews to get text transcripts. However, I'm hoping to find another tool that would recognize and split out the text per speaker.

Does such a thing exist?



For an end-user application, Otter.ai is the best I've seen. I wish there were a better, faster one built on top of Whisper, but I haven't come across a good one.

If you're looking for an API, check AssemblyAI, Google Cloud transcription, or Deepgram. I have a list here: https://llm-utils.org/List+of+AI+APIs


Descript.com was pretty good at it when I tried it, but it's pretty expensive: https://www.descript.com/transcription

We ended up using Otter.ai, which if I remember correctly didn't have as good a speaker separation model, but it was good enough for the price: https://otter.ai/

There's also the much more expensive, human-powered Rev: https://www.rev.com/


Microsoft has a tool that accepts wav or mp3 and transcribes it.

But I do not think it can distinguish between speakers.

How well does Whisper work in terms of correctness for single speakers?


Using the large model, it works really well, even in low volume settings/speakers mumbling. Some of my transcripts are pharma related and Whisper stumbles on the drug names, but I’m pretty understanding of that.
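
For the drug names, one workaround is to clean up the transcript afterwards. Here is a minimal sketch, assuming you keep your own list of the terms you expect; `correct_terms` and the sample names are made up for illustration and are not part of Whisper. It fuzzy-matches each word against the list using Python's standard `difflib`.

```python
import difflib
import re


def correct_terms(text, vocabulary, cutoff=0.8):
    """Replace words that closely resemble a known term with that term.

    `vocabulary` is a hand-maintained list of expected terms (an assumption
    of this sketch). `cutoff` is the minimum similarity ratio (0 to 1).
    """
    # Map lowercase forms back to the preferred spelling.
    canonical = {term.lower(): term for term in vocabulary}

    def fix(match):
        word = match.group(0)
        close = difflib.get_close_matches(
            word.lower(), list(canonical), n=1, cutoff=cutoff
        )
        # Leave the word alone unless something in the list is close enough.
        return canonical[close[0]] if close else word

    # Only alphabetic runs are checked; punctuation and digits are untouched.
    return re.sub(r"[A-Za-z]+", fix, text)


if __name__ == "__main__":
    print(correct_terms("Patient takes metformine daily.", ["metformin"]))
```

A high cutoff keeps ordinary words from being rewritten, but a short everyday word that happens to resemble a listed term could still be replaced, so it is worth reviewing the changes.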


What you're looking for is called diarization. Almost all enterprise STT (speech-to-text) services do that, and you can find standalone libraries on GitHub too.

Fine-tuning Whisper is a nightmare. I don't know what the interviews are for, but again, most enterprise STT services offer customization, so you can add medical terminology.

Google, Amazon, and Nuance have medical models, but they are either expensive or not available for personal projects.
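
If you end up running a separate diarization library next to Whisper, the remaining work is lining up the two outputs. A rough sketch, assuming the transcript comes as segments with `start`, `end`, and `text` fields (the shape the openai-whisper Python package returns; other front ends may differ) and the diarization tool gives you speaker turns as start/end/label. `Turn`, `label_segments`, and `format_transcript` are hypothetical names for this example. Each segment gets the speaker whose turn overlaps it the most.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One speaker turn from a diarization tool (hypothetical shape)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Pair each transcript segment with the most-overlapping speaker."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg["text"].strip()))
    return labeled


def format_transcript(labeled):
    """Merge consecutive segments from the same speaker into one line."""
    lines = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)
        else:
            lines.append((speaker, [text]))
    return "\n".join(f"{spk}: {' '.join(parts)}" for spk, parts in lines)


if __name__ == "__main__":
    # Made-up sample data, only to show the expected input shapes.
    segments = [
        {"start": 0.0, "end": 2.0, "text": " Hello."},
        {"start": 2.0, "end": 4.0, "text": " How are you?"},
        {"start": 4.2, "end": 6.0, "text": " Fine, thanks."},
    ]
    turns = [Turn(0.0, 4.1, "SPEAKER_00"), Turn(4.1, 6.0, "SPEAKER_01")]
    print(format_transcript(label_segments(segments, turns)))
```

Segments that overlap no turn are labeled `UNKNOWN` rather than guessed. A segment that spans a speaker change goes entirely to whoever talked longest in it, so word-level timestamps would be needed for cleaner splits.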


Thanks for that! Searching for diarization really helped me narrow down what I was looking for.



