
This 100%, yes!

I've found myself putting in filler words or holding a noise "Uhhhhhhhhh" while I'm trying to form a thought but I don't want the LLM to start replying. It's a really hard problem for sure. Similar to the problem of allowing for interruptions but not stopping if the user just says "Right!", "Yes", aka active listening.
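To make the two cases concrete, here is a rough sketch in Python. It assumes the voice app already gets partial transcripts from a streaming STT engine. The word lists and function names are made up for illustration and are not taken from any real product.

```python
import re

# Hypothetical word lists; a real system would tune or learn these.
FILLERS = {"uh", "um", "er", "hm", "ah"}
BACKCHANNELS = {"right", "yes", "yeah", "ok", "okay", "mhm", "sure"}


def _words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def _is_filler(word):
    """Collapse repeated letters so that 'uhhhhhh' matches 'uh'."""
    collapsed = re.sub(r"(.)\1+", r"\1", word)
    return collapsed in FILLERS


def user_is_holding_floor(partial_transcript):
    """True if the user's latest word is a filler, i.e. they are still thinking."""
    words = _words(partial_transcript)
    return bool(words) and _is_filler(words[-1])


def should_stop_speaking(overlapping_user_speech):
    """True if speech heard while the assistant talks looks like a real interruption.

    Pure backchannels ("Right!", "Yes") and fillers are treated as active
    listening, so the assistant keeps talking.
    """
    words = _words(overlapping_user_speech)
    content = [w for w in words if not _is_filler(w) and w not in BACKCHANNELS]
    return len(content) > 0
```

With this, `user_is_holding_floor("I think we should Uhhhhhhhhh")` is true, so the assistant would keep waiting, while `should_stop_speaking("Right!")` is false, so it would keep talking. Plain word matching is obviously crude: "Right" can also open a real objection, which is part of why this is hard.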

One thing I love about MacWhisper (not special to just this STT tool) is it's hold to talk so I can stop talking for as long as I want then start again without it deciding I'm done.



I recently came across this paper[1] that differentiates between 'uh' and 'um'.

> The proposal examined here is that speakers use uh and um to announce that they are initiating what they expect to be a minor (uh), or major (um), delay in speaking. Speakers can use these announcements in turn to implicate, for example, that they are searching for a word, are deciding what to say next, want to keep the floor, or want to cede the floor. Evidence for the proposal comes from several large corpora of spontaneous speech. The evidence shows that speakers monitor their speech plans for upcoming delays worthy of comment. When they discover such a delay, they formulate where and how to suspend speaking, which item to produce (uh or um), whether to attach it as a clitic onto the previous word (as in “and-uh”), and whether to prolong it. The argument is that uh and um are conventional English words, and speakers plan for, formulate, and produce them just as they would any word.

[1]: https://www.sciencedirect.com/science/article/abs/pii/S00100...


I hate when you get "out of sync" with someone for a whole conversation. I imagine sine waves on an oscilloscope, and they're just slightly out of phase.

You nearly have to do a hard reset to get things comfortable again: walk out of the room, or ring them back.

But some people are just out of sync with the world.


So they basically train us to worsen our speech to avoid being interrupted.

I remember my literature teacher telling us in class how we should avoid those filler words, and instead allow for some simple silences while thinking.

Although, to be fair, there are quite a few people in real life who use long filler words to stop anyone from interrupting them, and it's obnoxious.


Somehow need to overlap an LLM with vocal stream processing to identify semantically meaningful transition points to interrupt naturally instead of just looking for any pause or sentence boundary.
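A minimal sketch of that idea in Python, under stated assumptions: the audio front end reports how long the user has been silent, and something (ideally an LLM) scores how finished the utterance sounds. Here `completeness` is only a toy stand-in for that model call, and every name and threshold is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical list of words that rarely end a finished thought.
DANGLING = {"and", "but", "or", "so", "because", "the", "a", "an",
            "to", "of", "that", "uh", "um"}


@dataclass
class TurnState:
    transcript: str   # what the user has said so far in this turn
    silence_ms: int   # how long they have been silent


def completeness(transcript):
    """Toy stand-in for an LLM's estimate (0.0 to 1.0) that the user is done."""
    words = transcript.lower().rstrip(".?! ").split()
    if not words:
        return 0.0
    if words[-1].strip(",") in DANGLING:
        return 0.1  # trailing "to", "and", "um": almost certainly mid-thought
    ends_sentence = transcript.rstrip().endswith((".", "?", "!"))
    return 0.9 if ends_sentence else 0.5


def required_silence_ms(score, min_ms=300, max_ms=2000):
    """The more complete the thought, the shorter the pause we wait for."""
    return int(max_ms - score * (max_ms - min_ms))


def should_reply(state):
    """Reply only when the pause is long enough for this particular utterance."""
    return state.silence_ms >= required_silence_ms(completeness(state.transcript))
```

For example, `should_reply(TurnState("What time is it?", 600))` is true, while `should_reply(TurnState("I want to book a flight to", 600))` is false, because the dangling "to" pushes the required pause close to two seconds. The point is that the pause threshold moves with the semantic score instead of being one fixed timeout.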



