Through the 90s and 00s, and well into the 10s, I generally dismissed speech recognition as useless to me personally.
I have a minor speech impediment because of my hearing loss, and I don't speak like a standard American: I have a regional accent on top of the impediment. The older systems never worked very well for me. Modern speech recognition doesn't seem to have a problem with any of that anymore.
IBM's ViaVoice from 1997 in particular was a major step. It was really impressive in a lot of ways, but the accuracy rate was something like 90-95%, which in practice means fixing major errors in almost every sentence. And that was for people who could speak clearly. It never worked very well for me.
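To put rough numbers on why 90-95% feels so much worse than it sounds, here is a minimal sketch. It assumes the figure is a per-word accuracy and that errors on different words are independent; both are simplifications of mine, not something ViaVoice documented, and the function name is just illustrative.

```python
def p_sentence_has_error(word_accuracy: float, words: int) -> float:
    """Chance that at least one word in a sentence is wrong.

    Assumption: each word is recognized correctly with probability
    `word_accuracy`, independently of the other words.
    """
    return 1.0 - word_accuracy ** words


if __name__ == "__main__":
    # Try the two ends of the quoted accuracy range on a few sentence lengths.
    for acc in (0.90, 0.95):
        for n in (10, 15, 20):
            p = p_sentence_has_error(acc, n)
            print(f"{acc:.0%} accuracy, {n}-word sentence: {p:.0%} chance of an error")
```

Under those assumptions, a 10-word sentence at 90% word accuracy has about a 65% chance of containing at least one mistake, and longer sentences fare worse, which matches the experience of correcting something in almost every sentence.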
You also needed to speak in an unnatural way [pause] comma [pause] and it would not be fair to say that it transcribed truly natural speech [pause] full stop
Voice recognition systems before about 2016 also required training on the specific speaker. You would read many pages of text to the recognition engine to tune it to your voice.
You could not just point one at the soundtrack of an old 1980s TV show and get back a set of time-synced captions accurate enough to enjoy the show. But that can be done now.