Hacker News | euclaise's comments

Maybe RL? It's just like the similar corrections in reasoning traces. You can train non-'thinking' models the same way (though if you're naive about it, you might end up with responses that are similarly rambly), and I'd expect it to have been


There isn't, though you can run it via wasm. I tried that a while back with a port of the w2c2 transpiler (https://github.com/euclaise/w2c9/), but something like wazero is a more obvious choice.


This is not exactly propaganda in the typical sense, but people clearly do edit Wikipedia successfully to further their objectives. As an example, the Wikipedia page for Meta-analysis (not even that obscure a topic) currently contains content that plausibly promotes Suhail Doi's methods, and it seems to have been like this for years. It cites 5 of his papers, more than anyone else's, of which the most-cited has 297 citations. It devotes a subsection to his method of meta-analysis, despite it being a rather obscure and rarely used method. Additional subsections have been added over time, also focused on somewhat obscure areas, and frankly those additions are sketchy in similar ways.

In general, it is not uncommon to come across slanted content. Is it completely clear that Doi himself came along and maliciously added his papers? Not quite, but good propaganda wouldn't be clear-cut either; it would actually look far less suspicious.

https://en.wikipedia.org/wiki/Meta-analysis


Simpler than, but somewhat reminiscent of, Plan 9's windowing system, rio: https://man.cat-v.org/plan_9/4/rio


Between the official NVIDIA drivers and the Linuxulator, FreeBSD can run CUDA applications, but it's a bit hacky.

No other BSDs can.


This one does have attention; it's just chunked into segments of 4096 tokens.


Yes, but the claim is about "unlimited context length." I doubt attention over each segment can be as good at recall as attention over the full input context.
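To make the recall concern concrete, here's a toy numpy sketch of segment-local attention (not this model's actual kernel; a chunk size of 4 stands in for 4096). Tokens in different chunks never attend to each other, which is exactly why recall across segments can suffer:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def chunked_attention(q, k, v, chunk):
    """Attention computed independently within fixed-size segments,
    so a query in one chunk never sees keys/values in another."""
    n, d = q.shape
    out = np.zeros_like(v)
    for s in range(0, n, chunk):
        e = min(s + chunk, n)
        scores = q[s:e] @ k[s:e].T / np.sqrt(d)
        out[s:e] = softmax(scores) @ v[s:e]
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 4)) for _ in range(3))
```

With `chunk >= n` this reduces to ordinary full attention; with a smaller chunk the outputs diverge, because cross-chunk key/value pairs are simply dropped.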


A lot of embedding models are built on top of T5's encoder; this offers a new option.

The modularity of the enc-dec approach is useful: you can insert additional models in between (e.g. a diffusion model), you can use different encoders for different modalities, etc.
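A toy numpy sketch of that modularity point (all components here are hypothetical stand-ins, not any real model's API): because encoder and decoder only meet at a shared hidden-state interface, any encoder with the right output shape, plus anything inserted in between, plugs in unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(d_in, d_out):
    """A random linear+tanh layer standing in for a trained component."""
    W = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)
    return lambda x: np.tanh(x @ W)

d = 16
text_encoder = dense(d, d)        # encoder for one modality
image_encoder = dense(3 * d, d)   # different input width, same interface
adapter = dense(d, d)             # e.g. a model inserted between enc and dec
decoder = dense(d, d)

def run(encoder, x):
    # The decoder only sees the shared hidden-state interface, so
    # swapping encoders or inserting the adapter requires no changes.
    return decoder(adapter(encoder(x)))

text_out = run(text_encoder, rng.normal(size=(5, d)))
image_out = run(image_encoder, rng.normal(size=(5, 3 * d)))
```

The design choice being illustrated: the inserted `adapter` (which could be a diffusion model, a retrieval module, etc.) and the per-modality encoders are interchangeable as long as they respect the hidden dimension.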


LM Studio is closed source, so no.


Neat. I've worked on some similar projects in the past.

I have previously ported w2c2 to Plan 9 here: https://github.com/euclaise/w2c9

It ran basic Rust code fine.

I later managed to run C++ code without wasm, by (partially) porting musl and doing some linker hacking here: https://sr.ht/~euclaise/cross9/


There's a new 7B version that was trained on more tokens, with longer context, and there's now a 14B version that competes with Llama 34B in some benchmarks.

