
This, pretty much. Everyone knows that LLMs are great for text generation and processing. What people have been questioning is the end goal promised by their builders, i.e. is it useful? And from most of what I've seen, it's very much a toy.


What would you need to see to call it useful?

To give you an example– I've used it for legal work such as an EB2-NIW visa application. Saved me countless hours. For my next visa I'll try to do without a lawyer, using just LLMs. I would never attempt this without having LLMs at my disposal.

As a hobby– and as someone with a scientific background– I've been able to build an artificial ecosystem simulation in Rust from scratch, without prior programming experience: https://www.youtube.com/@GenecraftSimulator

I recently moved from fish to plants and believe I've developed some new science at the intersection of CS and Evolutionary Biology that I'm looking to publish.

This tool is extremely useful. For now, you do require a human in the loop for coordination.

My guess is that this is the kind of benchmark we'll see within a few years: how well an AI coordinates multiple other AIs to build, deploy, and iterate on something that functions in the real world. Basically a manager AI.

Because they'll literally be able to solve every single one-shot problem, so we won't be able to create benchmarks anymore.

But that's also when these models will be able to build functioning companies in a few hours.


> ...me countless of...would never try this without having LLMs...is extremely useful...they'll literally be able to solve...will be able to... in a few hours.

That's marketing language, not scientific or even casual language. So many outsized claims, without even a basic explanation. Like, how did it help you save those hours? Explaining terms? Outlining processes? Going to the post office for you? You don't need to sell me anything, I just want the how.


My issue with LLMs is that you require a review-competent human in the loop to fix confabulations.

Yes, I’m using them from time to time for research. But I’m also aware of the topics I research and can see through bs. And the best LLMs out there, right now, produce bs within just 3-4 paragraphs, even in nicely documented areas.

A recent example is my question on how to run N vpn servers on N ips on the same eth with ip binding (in ip = out ip, instead of using a gw with the lowest metric). I had no idea how to do it, but I know how networks work and the terminology. It started out helping: created a namespace, set up lo, set up two interfaces for inner and outer routing, and then made a couple of crucial mistakes (in the routing setup for outgoing traffic) that couldn’t be detected or fixed by anyone even a little clueless. I didn’t even argue and just asked what that step does wrt my task, and that started the classic “oh wait, sorry, here’s more bs” loop that never ended.
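For reference, the standard way to get in-ip = out-ip on one NIC is source-based policy routing: one routing table per public address, selected by an `ip rule` on the source IP, so gateway metrics never come into play. A minimal sketch below; the interface name, addresses, and gateway are illustrative, not from my actual setup, and this skips the namespace part entirely.

```shell
#!/bin/sh
# Illustrative only: replies must leave from the IP they arrived on.
IF=eth0
GW=203.0.113.1                    # upstream gateway (made up)
set -- 203.0.113.10 203.0.113.11  # the N public IPs (made up)

table=100
for ip in "$@"; do
  ip addr add "$ip/24" dev "$IF"
  # A dedicated default route in its own table...
  ip route add default via "$GW" dev "$IF" table "$table"
  # ...chosen purely by source address, not by metric.
  ip rule add from "$ip" table "$table"
  table=$((table + 1))
done
```

Each VPN server then binds to its own address, and the kernel picks the matching table for its outgoing packets.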

Eventually I distilled the general idea and found an article the AI very likely learned from, because it was almost the same code verbatim, but without the mistakes.

Does that count as helping? Idk, probably yes. But examples like this show that you not only cannot leave an LLM unsupervised on any non-trivial question, you have to keep a competent reviewer in the loop.

I think the programming community is just blinded by LLMs succeeding at writing kilometers of untalented react/jsx/etc crap that has no complexity or competence in it beyond repeating “do like this” patterns backed by literally millions of examples, so noise cannot break through that “protection”. Everything else suffers from LLMs adding inevitable noise to what they learned from a couple of sources. The problem here, as I understand it, is that only specific programmer roles and s{c,p}ammers (ironically) write the same crap again and again millions of times; other info usually exists in only a few important sources and blog posts, and only a few of those are complete and well explained.



