
How does an LLM approach to OCR compare to say Azure AI Document Intelligence (https://learn.microsoft.com/en-us/azure/ai-services/document...) or Google's Vision API (https://cloud.google.com/vision?hl=en)?


OmniAI has a benchmark that compares LLMs to cloud OCR services.

https://getomni.ai/blog/ocr-benchmark (Feb 2025)

Please note that LLMs have progressed at a rapid pace since February. We see much better results with the Qwen3-VL family, particularly Qwen3-VL-235B-A22B-Instruct for our use case.


The Omni OCR team says that, according to their own benchmark, the best OCR is Omni OCR. I am quite surprised.


Magistral-Small-2509 is pretty neat as well for its size: it has reasoning + multimodality, which helps in cases where the context isn't immediately clear or there are a few missing spots.


My base expectation is that the proprietary OCR models will continue to win on real-world documents, and my guess is that this is because they have access to a lot of good private training data. These public models are trained on arxiv and e-books and stuff, which doesn't necessarily translate to typical business documents.

As mentioned though, the LLMs are usually better at avoiding character substitutions, but worse at consistency across the entire page. (Just like a non-OCR LLM, they can and will go completely off the rails.)


Classical OCR probably still makes undesirable su6stıtutìons in CJK, since there are far too many similar characters, including some absurd pairs that are only distinguishable under a microscope or by comparing binary representations. LLMs are better constrained to valid sequences of characters, so they should be more accurate.

Or at least that kind of thing would motivate re-implementing OCR with an LLM.


Huh... Would it work to have some kind of error checking model that corrected common OCR errors? That seems like it should be relatively easy.


It's harder than it first seems. The root problem is that for text like "hallo", correcting to "hello" may be fixing an error or introducing an error. In general, the more aggressive your error correction, the more errors you inadvertently introduce. You can try to make a judgement based on context ("hallo, how are you?"), which certainly helps, but it's only a mitigation. Light error correction is common and effective, but you can't push it to a full solution. The only way to fully solve this problem is to look at the entire document at once so you have maximum context available, and this is what non-traditional OCR attempts to do.
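The ambiguity is easy to see with a toy edit-distance corrector (a minimal sketch; the lexicon and substitution-only distance are assumptions for illustration, not a real OCR pipeline):

```python
# Toy illustration: why "hallo" has no unambiguous correction.
# Both "hallo" (a valid greeting in German and Dutch) and "hello" sit
# within one substitution, so a context-free corrector can't pick a winner.

def candidates(token, lexicon, max_subs=1):
    """Return lexicon words reachable by at most max_subs character substitutions."""
    matches = set()
    for word in lexicon:
        if len(word) == len(token):
            diffs = sum(a != b for a, b in zip(word, token))
            if diffs <= max_subs:
                matches.add(word)
    return matches

lexicon = {"hello", "hallo", "hollow"}
print(candidates("hallo", lexicon))  # {'hello', 'hallo'}: ambiguous without context
```

Aggressiveness here is just `max_subs`: raising it pulls in more candidates and makes the ambiguity worse, which is the trade-off described above.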


Okay, but there are way more common errors that should be easy to fix. "He11o", "Emest Herningway", incorrect diacritics like the other person mentioned, etc.
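Some of these are indeed nearly safe to fix: a digit inside an otherwise alphabetic token is almost always a glyph confusion. A minimal sketch (the confusion table and heuristic are made-up examples, not from any real OCR tool):

```python
# Hypothetical table of common digit-for-letter glyph confusions.
CONFUSIONS = {"0": "o", "1": "l", "5": "s", "6": "b"}

def fix_digit_confusions(token):
    """Rewrite digits only when the token is mostly letters."""
    letters = sum(c.isalpha() for c in token)
    digits = sum(c.isdigit() for c in token)
    if letters > digits:  # mostly letters: stray digits are likely OCR errors
        return "".join(CONFUSIONS.get(c, c) for c in token)
    return token

print(fix_digit_confusions("He11o"))  # Hello
print(fix_digit_confusions("2024"))   # mostly digits, left untouched
```

Harder cases like rn→m ("Herningway" for "Hemingway") change token length and need a dictionary or language model, which runs straight into the "hallo"/"hello" ambiguity problem.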


Not sure how it compares, but we did some trials with Azure AI Document Intelligence and were very surprised at how good it was. One of our examples was a poor photograph of a document with quite a skew, and (to our surprise) it also detected the customer's human-legible signature and extracted their name from that signature.


Not sure about the others, but we use Azure AI Document Intelligence and it's working well for our resume parsing system. Took a good bit of tuning, but we haven't had to touch it for almost a year now.


Aren't all of these multimodal LLM approaches, just open vs. closed ones?


Not sure why you're being downvoted, I'm also curious.


It still worked a few years ago but no longer :( (http://techno.org/electronic-music-guide/)


The Wayback Machine still displays a 2007 version (1900 captures!) ... complete with a sample for each genre ... even musique concrète.

https://web.archive.org/web/20071118083704/http://techno.org...



The archive.org variation appears to be missing both samples and descriptions.

A ported version got made over at: https://ishkur.kenxaj.cyou/

From a cursory inspection, it appears to have all the music files and descriptions, although it's missing Tools, Samples, and Sounds.

Gives credit to Ishkur, recommends checking v3, and was made using: https://github.com/igorbrigadir/ishkurs-guide-dataset/


That's what I get for stopping at the front page after it looked just like I remember. :/

Thanks for the other link.


I feel like all new AI tools only integrate with GitHub though, like Claude Code. We're actually thinking of moving from GitLab to GitHub, just for this reason.


Is that a problem with GitLab, or a problem that should make you wary of Claude Code? It's one thing to lock yourself into one LLM provider, but when they start chaining you to other SaaS organizations, aren't they just locking you down even more?


Claude works great with forgejo/gitea. It's all just git, after all.


In some industries, all the tools you actually need (say, MISRA checking) work with GitLab out of the box.


same reason why we haven't left github yet

most SaaS tools only have github integration, which sucks


All these tools seem to be GitHub-centric. Any tips for teams using GitLab to store their repositories?


I use Claude Code daily at work; it writes all my PRs. It uses the GitHub CLI to manage them.

Since all agents are able to use the terminal, I suggest looking up the GitLab CLI and having it use that. Should work locally and in runners.
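For instance, a note in CLAUDE.md steering the agent toward GitLab's `glab` CLI might look like this (a sketch; the exact wording and flags are assumptions, not from the comment):

```markdown
This repo is hosted on GitLab, not GitHub. Use the `glab` CLI, never `gh`.

- Open a merge request: `glab mr create --fill`
- List open merge requests: `glab mr list`
- Check pipeline status for the current branch: `glab ci status`
```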


We use it extensively in our codebase. We started without any types and added Sorbet later. It's similar to TypeScript in that you can gradually sprinkle your code with types, building up the typing coverage over time.

I just completed a big refactoring. We have a good suite of tests. But Sorbet has provided an added layer of confidence. Especially when it comes to the treatment of null values. Sorbet will raise an error if you try to call a method on an object that may be null. So it forces you to think through: what should happen if the object is null?

So the tests combined with Sorbet typechecking meant that we could almost blindly deploy refactoring after refactoring, with only a handful of bugs across several thousand lines of changed code.


Useful resource!


Seems far-fetched. No one is forcing you to buy an S&P 500 fund. And you could have sold your S&P 500 shares the day Tesla entered the index.


Do we expect everyone and their grandmas to 'diversify investments without Tesla' and 'know what date Tesla entered the index'?

The government incentivizes retirement and investment. I have plenty of that thing the government hates because it doesn't produce value, it just stores value.

These bad actors have me thinking more and more about these non-producing assets that continue to grow because they are scarce and necessary.


Short answer: yes. If you invest in the S&P 500, you should know what you're investing in.


"Every industry that has enough political power to utilise the state will seek to control entry." - George Stigler, Nobel prize winner in Economics, and worked extensively on regulatory capture

This explains why BigTech supports regulation. It distorts the free market by increasing the barriers to entry for new, innovative AI companies.


Stigler in particular (and transaction cost economics in general) points out that it's mainly industries with sunk resources (esp. immovable assets) that are incentivized to regulate market entry.

The tech sector has highly mobile resources (AI this year, crypto last year, big data the year before...), even to the point where many skills are transferable; further, its markets include anything that can be digitized ("software will eat the world"), so investment can be quickly retooled as opportunities arise. As a result, tech virtually never seeks regulation (and can hide behind contract-law fictions to disclaim liability in software licenses and impose arbitration clauses for services). So it's not an instance of capture, and certainly not for the usual economic reasons.

Biden wants tech on his side. Tech wants to escape further blows to its goodwill like Facebook/Google ad tracking, because every consumer tech application involves users trusting tech. So they cut a deal to put themselves on the right side of history, long on symbolism and short on real impact.

In AI, resources matter only to the extent you believe that larger LLMs (a) cannot be replicated, (b) provide significant advantages, or (c) can impose a winner-take-all world where operations lead to more operations. In AI more than most markets, the little guy still has a chance at changing the world.


Agree. I can't stand the coldness of brutalist architecture, it's dystopian. It's disagreeable in summer, let alone on a grey winter day.


I appreciate its novelty at the time, and its nostalgia now. It was used in a lot of public buildings in the 1970s because it tended to be more economical than Ivy League-looking brick buildings with white trim. I appreciate that its design had something to say, even if that thing wasn't clear or useful. Now public buildings look like boring brick boxes with maybe a curved wall or some other "accent".


I think of the federal building in Manchester, NH which looks like something the lizardman aliens would have built in “V” or might make you think the government faked the 9/11 attacks.


Boring beats ugly, IME. A certain amount of ambition is welcome, and a certain amount of failure is an inevitable cost of that, but textured concrete should have been consigned to the history books a long time ago.


Honestly, I liked the brutalist behemoths more than the tilt-ups with fake Italian-villa crown molding, arches, and weird curves.

At least the buildings usually convey one specific message instead of a mish-mash of "I'm a big cheap commercial building but I wish I was this other thing instead"


For most startups that raise money, tech is hardly a big cost. Most of the money goes towards fueling growth through paid marketing and so on.


People and office space are traditionally two major contributors

