There’s the scenario where LLMs get more size-efficient, and you will be able to get 2026 SOTA performance from a consumer-grade laptop.
Sure, with a 1000B-parameter model you will get better performance, but the average person will have it write some Python script, not derive new physics equations.
So in a sense the demand for LLM intelligence will reach a plateau (arguably we are there today for the average person), so no subsidy will be required, because the average person will not need the latest and greatest.
There’s not the same demand pattern for something like Uber.
> There’s the scenario where LLMs get more size-efficient, and you will be able to get 2026 SOTA performance from a consumer-grade laptop.
But isn't that bad for the AI companies, too? Because then people just run a ~2026 SOTA open-source model on their laptop for free and don't pay for any subscription.
Regular folks will not pay Anthropic, but NSA, NASA or research labs might.
I’m not implying this will be a good time for AI companies. I am saying AI as a technology can provide value without it being controlled by only 3 companies.
In a hypothetical future with 2026-level LLMs on a (high-end) consumer laptop, I still think that the majority of buyers would prefer to pay 20 USD/month for a service, just for the convenience and flexibility.
> In a hypothetical future with 2026-level LLMs on a (high-end) consumer laptop, I still think that the majority of buyers would prefer to pay 20 USD/month for a service, just for the convenience and flexibility.
$20 a month is a lot of money. I don't think the "convenience and flexibility" you get would actually be worth it, unless 1) you've got money to burn, 2) you lack the skills to install software, or 3) the open source community totally fails to develop a reasonable installer. The LLM service would probably be akin to a scam preying on ignorance, like those companies that will rent you a water softener for like $100/month.
It is a lot compared to what? I believe that an LLM-capable laptop will cost considerably more than something that is good enough for non-LLM productivity tasks, at least within the next 5 years. Say that it would cost 600 USD more; that would buy 30 months of subscription. It is in this kind of scenario that I think many people will favor the subscription.
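The break-even arithmetic here can be sketched in a few lines. The 600 USD premium and the 20 USD/month price are the figures from this thread; the function name is made up for illustration.

```python
def break_even_months(hardware_premium_usd: float,
                      subscription_usd_per_month: float) -> float:
    """Months of subscription that cost the same as the extra hardware spend."""
    if subscription_usd_per_month <= 0:
        raise ValueError("subscription price must be positive")
    return hardware_premium_usd / subscription_usd_per_month


# Figures from the thread: a 600 USD laptop premium vs. a 20 USD/month service.
months = break_even_months(600, 20)
print(months)  # 30.0
```

This deliberately ignores electricity, resale value, and future price changes on either side, so it is only a rough comparison.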
Maybe I’m not creative enough to see the potential, but what value does this bring?
Given the example I saw about CRISPR, what does this model give over a different, non-explaining model in the output?
Does it really make me more confident in the output if I know the data came from Arxiv or Wikipedia?
I find the LLM outputs are subtly wrong, not obviously wrong.
It makes the black box slightly more transparent. Knowing more in this regard allows us to be more precise: you go from prompt-tweak witchcraft and divination to something closer to science and a precise method.
Can this method be extended to go down to the sentence level?
In the example it shows how much of the reason for an answer is due to data from Wikipedia. Can it drill down to show paragraph or sentence level that influences the answer?
Your question should be "Can it drill down to show the paragraphs or sentences that influence the answer?"
I believe that the plagiarism complaint about LLMs comes from the assumption that there is a one-to-one relationship between training data and answers. I think the real and delightfully messier situation is that there is a many-to-one relationship.
Exactly! We will have a future post that shows this more granularly over the coming weeks. Here is a post we wrote on how this works at smaller scale: https://www.guidelabs.ai/post/prism/
Oh, that looks like a wonderful article. I just skimmed it, and I hope to get back to it later today. One thing I would love to see is how much of the training set is substantially similar to each other, especially in the code training set.
Great questions. We have several posts in the works that will drill down more into these things. The model was actually designed to answer these questions for any sentence (or group of tokens it generates).
It can tell you which specific text (chunk) in the training data led to the output the model generated. We plan to show more concrete demos of this capability over the coming weeks.
It can tell you where in the model's representation it learned about science, art, religion, etc. And you can trace all of these to the input context, the training data, or the model's representations.
Does it? If I make a system prompt for most models right now, tell them they were trained on {list} of datasets, and tell them to attribute their answer to their training data, I get quite similar output. It even seems quite reasonable. The reason is that each data corpus has a "vibe" to it, and the predictions simply match the response vibe to a dataset vibe.
OK, I promised videos; here are two. The LLM had serious head issues with C and Python on x86 versus MIPS C. Now it produces coherent English. Phase two is a chat interface so we can prompt without seeded prompts. Check the code, it's real inference though!
The Emulator: https://bottube.ai/watch/shFVLBT0kHY
This feels like an AI agent doing its own thing. The screenshot of this working is garbled text (https://github.com/sophiaeagent-beep/n64llm-legend-of-Elya/b...), and I'm skeptical of reasonable generation with a small hard-coded training corpus. And the linked devlog on YouTube is quite bizarre too.
But leaving a light on for 2x the time will cost very close to 2x the price.
Asking “what day is today” vs “create this API endpoint to adjust the inventory” will cost vastly different amounts. And honestly I have no clue where to start to even estimate the cost unless I run the query.
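One rough way to see why, assuming per-token pricing: cost scales with how much text goes in and comes out, not with wall-clock time. A minimal sketch follows. The prices and token counts are made-up placeholders for illustration, not any provider's real numbers, and the function name is hypothetical.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one request under simple per-token pricing (prices per million tokens)."""
    total = input_tokens * usd_per_m_input + output_tokens * usd_per_m_output
    return total / 1_000_000


# Hypothetical prices, USD per million tokens (assumptions, not real quotes).
PRICE_IN = 3.0
PRICE_OUT = 15.0

# Hypothetical token counts: a one-line question vs. an agentic coding task
# that reads a lot of repository context and writes a lot of code.
small = estimate_cost_usd(20, 10, PRICE_IN, PRICE_OUT)
large = estimate_cost_usd(50_000, 4_000, PRICE_IN, PRICE_OUT)

print(f"small request: ${small:.6f}")
print(f"large request: ${large:.6f}")
```

The hard part stays hard: the input size can be counted up front, but the output token count is not known until the query has actually run.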
Which means implementations also have to be correspondingly complicated. You have to handle quite a few different primitive data types each with their own opcodes, class hierarchies, method resolution (including overloading), a "constant pool" per class, garbage collection, exception handling, ...
I would expect a minimal JVM that can actually run real code generated by a Java compiler to require at least 10x as much code as a minimal Bedrock VM, and probably closer to 100x.
Why do you think that this means "idle GPU" rather than a company recognizing a growing need and allocating resources toward it?
It's cheaper because it's a different market with different needs, which can be served by systems optimizing for throughput instead of latency. Feels like you're looking for something that's not there.
I wouldn’t be so dismissive. Research is just a loop: form a hypothesis, run experiments, collect data, make a new hypothesis.
There’s some creativity required for scientific breakthroughs, but 99.9% of scientists don’t need this creativity. They just need grit and stamina.
That loop involves far more flexible, goal-oriented attention, more intrinsic/implicit understanding of plausible cause and effect based on context, and more novel idea creation than it seems.
You can only brute force things with combinatorics and probabilities that have been well mapped via human attention, as piggy-backing off of lots of human digested data is just a clever way of avoiding those issues. Research is by definition novel human attention directed at a given area, so it can't benefit from that strategy in the same way domains which have already had a lot of human attention can.
I think the whole idea of "original insight" is doing a lot of heavy lifting here.
Most innovation is derivative, either from observation or cross-application. People aren't sitting in isolation chambers their whole lives and coming up with things in the absence of input.
I don't know why people think a model would have to manifest a theory in the absence of input.
> I think the whole idea of "original insight" is doing a lot of heavy lifting here.
This is my biggest issue with AI conversations. Terms like "original insight" are just not rigorous enough to have a meaningful discussion about. Any example an LLM produces can be said to be not original enough, and conversely you could imagine trivial types of originality that simple algorithms could simulate (e.g. speculating on which existing drugs could be used to treat known conditions). Given the number of drugs and conditions, you are bound to propose some original combination.
People usually end up just talking past each other.
And insight. Insight can be gleaned from a comprehensive knowledge of all previous trials and the patterns that emerge. But the big insights can also come from simple random attempts people make because they don't know something is impossible. While AI _may_ be capable of the first type, it certainly won't be capable of the second.
Awfully bold to claim that 99.9% of scientists lack the need for "creativity". Creativity in methodology creates gigantic leaps away from reliance on grit and stamina.