I wrote my first post about parsing webpages into structured data with an LLM in January, using local models. Now it's October and I did it again with current models and libraries. Boy, what a difference.
Since you seem to know your stuff: why do LLMs need so much data anyway? Humans don't. Why can't we make models aware of their own uncertainty, e.g. by feeding the variance of the next-token distribution back into the model, as a foundation for guiding their own learning? With that kind of signal, maybe LLMs could develop 'curiosity' and 'rigor' and seek out the data that best refines them. Let the AI make and test its own hypotheses, using formal mathematical systems, during training.
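For what it's worth, the kind of self-uncertainty signal described here can be as simple as the entropy of the next-token distribution. A toy sketch in plain Python, not tied to any particular model or training scheme:

```python
import math

def token_uncertainty(logits):
    """Shannon entropy of the next-token distribution: one possible
    scalar 'uncertainty' signal a model could attend to."""
    # numerically stable softmax
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # entropy in nats: near 0 when the model is certain,
    # log(vocab_size) when it has no idea
    return -sum(p * math.log(p) for p in probs if p > 0)

# a peaked distribution has low entropy, a flat one high entropy
print(token_uncertainty([10.0, 0.0, 0.0, 0.0]))  # small
print(token_uncertainty([1.0, 1.0, 1.0, 1.0]))   # log(4) ≈ 1.386
```

Actually feeding such a signal back into training is the hard, open part; the signal itself is cheap to compute.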
IANAL, but it means the commit itself is in the public domain. Even when it is integrated into a code base with a more restrictive license, you can still use that isolated snippet however you want.
A more interesting question is whether one could strip the GPL restrictions from public code by telling an AI to rewrite the code from scratch, providing only a description of its behavior.
This could be accomplished by having the AI generate a comprehensive test suite first, and then letting it write the application code while seeing only the test suite.
Hmm, so basically automated clean room reimplementation, using coding agents? Our concepts of authorship, copying, and equivalence are getting a real workout these days!
you'd need pretty good opsec, a non-search-capable agent, and logs of all its actions/chain of thought/process to be able to truly claim a cleanroom implementation tho
The logs and traceability are the secret sauce here. It's one thing to have an artifact that mysteriously replicates the functionality of a well-known IP-protected product without just straight up copying it. It's another thing to be able to demonstrate that said artifact was generated solely from information in the public domain or otherwise legally valid to use.
if it's of interest, i was investigating this and found that all the big labs, like OpenAI, offer an indemnity clause for enterprise customers that is supposed to assure you the model doesn't output code under a non-compliant license (copyrighted, AGPL, whatever). BUT you have to accept them keeping all your logs, give them access, and let them and their lawyers build their own case if you get sued.
I guess they're mostly selling insurance to BigCos: hey, we have the money to go to court and an interest in winning such a case, so we'll handle it.
This won't apply to everyone, but if you've been writing scientific papers with LaTeX, you may have come across this issue.
You go to an online database (Inspire or ADS) to fetch some references for your paper. Then you have to copy/paste twice: the citation key into your LaTeX document, and the BibTeX entry into your .bib file. Doing redundant things is annoying, right? autobib removes the need for the latter. You still have to look up the key online and cite it in your LaTeX document, but autobib downloads the entry automatically to your .bib file.
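The core idea can be sketched in a few lines of Python. This is not autobib's actual code, and `fetch_bibtex` is a placeholder for the real lookup against Inspire or ADS:

```python
import os
import re

def ensure_bib_entry(bib_path, key, fetch_bibtex):
    """If `key` is not already in the .bib file, fetch its BibTeX
    entry and append it. `fetch_bibtex(key)` stands in for a query
    to an online database like Inspire or ADS."""
    existing = ""
    if os.path.exists(bib_path):
        with open(bib_path) as f:
            existing = f.read()
    # BibTeX entries start like '@article{Some:Key,'
    if re.search(r"@\w+\{\s*" + re.escape(key) + r"\s*,", existing):
        return False  # already present, nothing to do
    with open(bib_path, "a") as f:
        f.write(fetch_bibtex(key) + "\n")
    return True
```

In the real tool, the keys to resolve would presumably be collected from the `\cite{...}` commands in your .tex source, so citing a new key is all you ever do by hand.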
Apart from the visible changes to the user interface, I completely swapped out the foundation. iminuit consists, at its core, of Python bindings to the Minuit2 C++ library.
We used to generate those bindings with Cython, but Cython is very bad at generating bindings for C++. It does not support all modern C++ features and imposes restrictions on what you can wrap. It is also an external code generator that you have to install separately.
Cython was a real problem, so we switched to the excellent pybind11 library. It is a header-only C++ library. Generating Python bindings with it is a breeze, and it supports essentially all C++ constructs. We lost a lot of weight and awkward complexity by switching out the foundation.