jqpabc123 a day ago

Expecting intelligence and accuracy to "emerge" from a statistical process is absurd.

In other words, LLMs are only clearly useful if the results don't really matter or if they can and will be externally verified.

LLMs negate a fundamental argument for computing --- instead of accurate results at low cost, we now have inaccurate results at high cost.

There is undoubtedly some utility to be had here but it is not at all clear or obvious that this will be widely transformative.

  • Ukv a day ago

    I don't think it's necessarily absurd to expect accuracy from statistical methods - in many domains (including voice transcription) they blow the accuracy of traditional non-statistical approaches out of the water, and in some areas even surpass human-level accuracy.

    Main thing is to measure the accuracy of the approach (regardless of whether it's traditional, statistical, or human) to determine if it's fit for purpose. In this case it sounds like the transcription shouldn't be solely relied on for high-risk decisions in its current state, but could be useful for something like searching through the reference audio if it were available.

    That the issue tends to be from "pauses, background sounds or music playing" also makes me suspect a lot of the cases could be relatively low-hanging fruit - check the noise gate and normalization on the microphones, or potentially have the model output a quality score for each word, so that low-confidence background noise can be displayed to the end user as smaller, fainter text, for instance, instead of as part of the conversation.
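The per-word quality-score idea above can be sketched roughly as follows. This assumes the word-level output shape produced by openai-whisper when called with `word_timestamps=True` (each segment carries a `words` list with `word` and `probability` fields); the 0.5 threshold and the sample result are illustrative, not tuned or real values.

```python
# Sketch: flag low-confidence words from a Whisper-style transcription result
# so a UI could render suspect words smaller/fainter instead of as speech.
# Assumes openai-whisper's word_timestamps=True structure; threshold is arbitrary.

def tag_low_confidence(result, threshold=0.5):
    """Return (word, is_suspect) pairs for downstream display."""
    tagged = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            tagged.append((w["word"].strip(), w["probability"] < threshold))
    return tagged

# Hypothetical result dict, for illustration only:
sample = {
    "segments": [
        {"words": [
            {"word": " take", "probability": 0.97},
            {"word": " two", "probability": 0.95},
            {"word": " hydroactivated", "probability": 0.21},  # likely noise
            {"word": " tablets", "probability": 0.93},
        ]}
    ]
}

for word, suspect in tag_low_confidence(sample):
    print(word, "(low confidence)" if suspect else "")
```

A real UI would map the flag to styling (opacity, font size) rather than a text label, but the gating logic is the same.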

  • pachorizons a day ago

    Isn't that what is promised, though? What is the benefit of automated transcription if every single transcription must be manually audited? Where is the cost or labor saving?

    • 39896880 a day ago

      It is much easier to correct a transcription than to generate one wholesale. Besides, the task of correcting transcriptions has long since been commoditized by the deployment of speech recognition on every smartphone.

      It’s not quite a solved problem but it’s close.

      • jqpabc123 6 hours ago

        It’s not quite a solved problem but it’s close.

        As long as the results don't really matter and no one is auditing, it appears more "solved" than it actually is.

logn a day ago

There will always be a need for both human oversight and accountability, and this is a good example. I think the net result will be, eventually, more and better jobs. It's a better job to validate the transcriptions than to actually transcribe.

Another example, in medicine: radiologists will start handling orders of magnitude more cases. But the number of scans performed might also increase exponentially as costs likewise drop.

  • jqpabc123 a day ago

    It's a better job to validate the transcriptions than to actually transcribe.

    In the real world "better" typically translates to lower cost.

    Which costs less? 1) Pay someone to transcribe a recording, or 2) pay for an LLM transcription plus pay someone to verify that transcription against the recording.

    It is far from certain or obvious that #2 is actually "better".

rahimnathwani 20 hours ago

  A machine learning engineer said he initially discovered hallucinations in about half of the over 100 hours of Whisper transcriptions he analyzed.
The '100 hours' is almost useless information. 'About half' is meaningless without knowing the sample size. Perhaps he had 5 transcripts averaging 20 hours each, and 2 of the 5 had issues. Or perhaps there were hundreds of short transcripts, in which case 'about half' would imply significance.
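The point about sample size can be made concrete with a confidence interval. A minimal sketch using the standard 95% Wilson score interval for a binomial proportion (the n=5 and n=200 scenarios are the hypothetical ones from the comment above, not data from the article):

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for an observed proportion successes/n."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

# "About half" from 2 of 5 transcripts vs. 100 of 200:
lo5, hi5 = wilson_interval(2, 5)        # very wide interval
lo200, hi200 = wilson_interval(100, 200)  # much tighter
print(f"n=5:   {lo5:.2f}-{hi5:.2f}")
print(f"n=200: {lo200:.2f}-{hi200:.2f}")
```

With 5 transcripts the plausible true hallucination rate spans most of the 0-1 range; with 200 it is pinned near one half, which is why the headline number alone tells you little.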
notjulianjaynes 17 hours ago

I used Whisper to create an SRT file from some voice memos I made while driving, and it 'hallucinated' "subtitles by the amara.org community" at the very end. Re-ran it as txt, and what do you know, that line disappeared.
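This particular artifact (caption credits surfacing during silence, a residue of subtitle data in Whisper's training set) is well known, and one blunt mitigation is a post-processing blocklist. A minimal sketch; the phrase list here is illustrative, not exhaustive:

```python
# Sketch: drop caption-credit lines Whisper is known to hallucinate in silence.
# The blocklist below is an illustrative sample, not a complete list.

KNOWN_HALLUCINATIONS = {
    "subtitles by the amara.org community",
    "thanks for watching",
}

def drop_hallucinated_lines(lines):
    """Remove exact (case-insensitive) matches against the blocklist."""
    return [ln for ln in lines
            if ln.strip().lower() not in KNOWN_HALLUCINATIONS]

transcript = [
    "remember to refill the prescription",
    "Subtitles by the Amara.org community",
]
print(drop_hallucinated_lines(transcript))
```

Exact-match filtering only catches the recurring boilerplate phrases; it does nothing for hallucinations that invent novel content, which is the harder problem the article describes.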

AStonesThrow a day ago

This clinical service is not something that you, the patient, should want or allow.

I use a digital recorder app to record audio from my clinical consultations. It's important for me, as a patient, to have a record, because I'm alone in there, and I frequently misremember or misunderstand things that were said.

My current recorder app has a transcription feature. It's fairly good at picking out words. It's supposed to recognize and label speakers as well, but that requires a lot of manual editing after the fact.

Still, it's fantastic having my own durable record of what was said to me, and by me. There are usually a few surprises in there!

Now, I've stopped asking for permission to record, because usually they become hostile to it. Nevertheless, it's legal, and it's my right to have.

sirolimus a day ago

Well, no shit, AI isn't meant for anything as serious as medical record-keeping

  • Spivak 21 hours ago

    But it's fine for their document OCR? Dragon has been doing dictation for years and years. Either the service works to an acceptable degree or it doesn't. Audio transcription isn't some unknown quantity.