Currently, I use a GUI for Whisper AI (https://github.com/Const-me/Whisper) to upload MP3s of interviews to get text transcripts. However, I'm hoping to find another tool that would recognize and split out the text per speaker.
Does such a thing exist?
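If you're looking for an API, check AssemblyAI, Google Cloud Speech-to-Text, or Deepgram. All three offer speaker diarization, which is the feature that labels each part of the transcript with the speaker it belongs to. I have a list here: https://llm-utils.org/List+of+AI+APIs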
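
Whichever service you pick, the per-speaker output usually comes back as small pieces (words or short segments), each tagged with a speaker label, so you still need to merge them into readable turns. Here is a minimal sketch of that step in Python. The input shape (`speaker` and `text` keys) and the sample data are assumptions for illustration, not any particular API's response format, so you would need to map the real fields onto it.

```python
from itertools import groupby

# Hypothetical diarized output: one entry per word, tagged with a speaker label.
# Real services use their own field names; this shape is an assumption.
words = [
    {"speaker": "A", "text": "How"},
    {"speaker": "A", "text": "did"},
    {"speaker": "A", "text": "you"},
    {"speaker": "A", "text": "start?"},
    {"speaker": "B", "text": "By"},
    {"speaker": "B", "text": "accident,"},
    {"speaker": "B", "text": "really."},
    {"speaker": "A", "text": "Go"},
    {"speaker": "A", "text": "on."},
]


def group_by_speaker(words):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    # groupby only groups adjacent items, which is what we want here:
    # a speaker who talks again later gets a new, separate turn.
    for speaker, run in groupby(words, key=lambda w: w["speaker"]):
        turns.append((speaker, " ".join(w["text"] for w in run)))
    return turns


for speaker, text in group_by_speaker(words):
    print(f"Speaker {speaker}: {text}")
```

The labels you get back are generic (A/B or 0/1), so matching them to actual interviewee names is a manual step afterwards.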