
Azul Systems asked Intel to do this once, but ended up building their own processors with interesting memory barrier properties for a while, which sped up JVMs well beyond what was possible (at the time) on x86-32/ppc/sparc. Eventually they gave up and became a purely software company, but their "Java Mainframe" product was many times faster than the Intels of the age executing the same code, despite much slower CPUs. It died a quick death despite the cool factor.


I had a mentor who worked at Intel labs when this was happening. The reason this died was that someone invented a GC algorithm which consistently outperformed it, leading Intel to drop their hardware GC plans.


Could that not be hardware implemented / augmented?


It could be, but then after investing a billion dollars what happens when someone develops another algorithm?

What happens when a programming language with different GC requirements becomes popular?


Seems to be one of the main risks for any specialized circuits, if I understand you correctly. You always have to guess "will this really be relevant long enough to invest the money to bake it into hardware?" .. and if you guess wrong you just wasted a part of your silicon budget for something no one will use.


Right. Got it.

There may still be room for specialized GC assist circuits.


When I thought about this (using my programmer brain, which knows nothing about hardware), I came up with the idea of a 'store pointer' instruction, which took two addresses and an offset, and stored the first address into the field of the object pointed to by the second address. In addition, if the two addresses referred to different memory regions, it recorded the pair of addresses in some kind of buffer on the processor. When that buffer got full, the processor would trap to some preconfigured location.

That could be used as a basis for a write barrier.

The devil would be in the detail of how the regions were defined.

And maybe the trapping would mean this wasn't even all that fast.
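
A rough software model of that 'store pointer' idea, just to make the moving parts concrete. Everything here is an assumption made for illustration: the class and method names are made up, regions are taken to be aligned 1 MiB blocks (the region definition is exactly the open question above), and the "trap" is an ordinary callback that receives the drained buffer.

```python
REGION_BITS = 20      # assumption: a region is an aligned 1 MiB block
BUFFER_CAPACITY = 4   # assumption: a very small on-chip buffer


class StorePointerUnit:
    """Toy model of the hypothetical 'store pointer' instruction."""

    def __init__(self, trap_handler, capacity=BUFFER_CAPACITY):
        self.memory = {}              # field address -> stored pointer
        self.buffer = []              # recorded (value, object) address pairs
        self.capacity = capacity
        self.trap_handler = trap_handler

    @staticmethod
    def region(addr):
        # Two addresses share a region when their high bits match.
        return addr >> REGION_BITS

    def store_pointer(self, value_addr, object_addr, offset):
        # The store itself: write value_addr into the object's field.
        self.memory[object_addr + offset] = value_addr
        # Barrier part: remember stores that cross a region boundary.
        if self.region(value_addr) != self.region(object_addr):
            self.buffer.append((value_addr, object_addr))
            if len(self.buffer) >= self.capacity:
                # Buffer full: "trap" by handing the pairs to the runtime.
                pairs, self.buffer = self.buffer, []
                self.trap_handler(pairs)


if __name__ == "__main__":
    remembered = []
    unit = StorePointerUnit(remembered.extend, capacity=2)
    unit.store_pointer(0x100000, 0x100100, 8)    # same region, not recorded
    unit.store_pointer(0x200000, 0x100100, 16)   # cross-region, recorded
    unit.store_pointer(0x300000, 0x100100, 24)   # fills the buffer, traps
    print(remembered)
```

In this model the runtime only does work when the trap fires, so the cost depends on how often stores cross regions and how big the buffer is, which is the "maybe not that fast" worry above.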


It wasn't inventing an algorithm in hardware; it was write barrier and escape analysis support in the hardware itself, IIRC.


IBM POWER, Z, and Oracle SPARC silicon have instructions designed explicitly for GC and other common managed runtime tasks.


Indeed, OpenJ9's Pause-less GC is implemented using IBM z14 guarded load instructions and is able to outperform software by 10x in pause times and 3.4x in throughput for a given SLA [1].

[1] https://blog.openj9.org/2019/03/25/concurrent-scavenge-garba...


I feel like Amazon is going to bring back custom hardware like this. Imagine if this was an instance type.


I think everyone is about to make custom hardware.

A recent custom chip project I have been a part of for basically a decade is nearing production.

Made on a fairly large, old process. Despite this, many custom features have been added. For many tasks, the performance will be competitive with much more complex, resource-intensive devices. It is built in a way that allows for efficient multi-core computing, concurrent or parallel, sans an OS.

People will write those, but many will just grab the pieces they want, put them on cores, then write their target app on top. The prior version, Propeller 1, made doing that a lot easier than one might think.

What struck me was the combination of very well planned silicon, coupled with software, can really perform. It reminds me of the custom chips we saw in early computing. Amiga, SGI, many others made adapter cards to get things done.

All of that stuff nailed the tasks cold and would always perform. As CPUs got quicker, more could be moved to software, of course.

But now that is hard again.

Software plus purpose built silicon is going to deliver peak performance.

Always has.

In that project, it took years, but a great many use cases were considered, FPGA simulations were run, and code was written and then augmented with hardware, special instructions and subsystems intended to maximize performance while retaining a lot of flexibility.

The way I see it, general purpose computers may just end up back where they were before.

In 8 bit times, an Apple 2 was an all software machine. People bought and made cards to do specific things very well. Other computers had custom chips that focused on games, etc...

In the later era, Amiga, SGI made great hardware that was focused on specific things, while the PC was more like the Apple 2.

We are currently leaving a long era where general purpose computing made sense for most cases, and software got refined, things got faster, and we saw many good cycles.

Soon, more efficient CPU designs, often with well tuned instructions for given tasks may be directing a lot of purpose built silicon. Phones and tablets vs laptops give us a tiny look at one part of how it might go.

GPU instances and specialized network, and maybe filesystem, CPUs with highly optimized instruction sets already exist.

More will come.

Many use cases can benefit from real time or just faster performance per watt. Software plus custom silicon will nail that, and it is getting easier to do.

RISC V has plans for specialized instructions baked in. It may well be that "one to bind them all", nudging out the more expensive ARM, for example. And maybe not. ARM is lean and mean and mature. Who knows?

All I know is that the drive to do more per watt will continue, as will the drive to improve peak sequential compute, because there are still too many use cases where either the nature of the problem space or the accumulated software warrants the effort.

We are already seeing the GPU idea expanded on.

Gonna be interesting and more difficult times ahead.


What do you think about Intel chips with integrated FPGAs, and AMD HyperTransport style chip expansion-by-bus? Is there a clear winner or loser?


Azul essentially provided a tagged architecture and, I think, forwarding pointers. The former gave you precise GC for free; the latter allowed concurrent GC to move data around without pauses.
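
For anyone unfamiliar with forwarding pointers, here is a minimal software sketch of the general idea. It is not a description of Azul's hardware; `HeapObject`, `relocate` and `resolve` are hypothetical names, and real collectors do the check in a barrier on every reference load rather than in an explicit call.

```python
class HeapObject:
    """Toy heap object with a slot for a forwarding pointer."""

    def __init__(self, payload):
        self.payload = payload
        self.forward = None   # set once the collector has moved the object


def relocate(obj):
    """Collector side: copy the object and leave a forwarding pointer behind."""
    new_copy = HeapObject(obj.payload)
    obj.forward = new_copy
    return new_copy


def resolve(obj):
    """Mutator side (read barrier): follow forwarding pointers to the live copy."""
    while obj.forward is not None:
        obj = obj.forward
    return obj


if __name__ == "__main__":
    stale = HeapObject("data")
    moved = relocate(stale)                  # collector moves the object
    print(resolve(stale) is moved)           # stale reference still finds it
    print(resolve(stale).payload)
```

The point is that the mutator can keep holding the old reference while the collector moves the object, because every access goes through the forwarding check. Doing that check cheaply is where hardware support would come in.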


I can't seem to find anything on the Vega processor other than "874 cores"... any links?


If it's old news, it often helps to just type the URL into Wayback Machine and click back a few years. I do it all the time. That got us this:

https://web.archive.org/web/20160310165634/https://www.azul....



