I love this. My hobby is multithreaded programming. I enjoy trying to get high requests per second out of parallel programs. I tend to use Java, C and Rust, but if shared-memory threading were possible in JavaScript I would probably do more in JavaScript too.

Most CPUs have multiple cores, and Amdahl's law and the Universal Scalability Law mean there's a very real scalability advantage to running on multiple cores, especially with Hyper-Threading, big.LITTLE, chiplet designs, or Intel's efficiency and performance cores.
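As a rough sketch of what those two laws predict, the functions below compute the textbook formulas in Rust. The function and parameter names are illustrative, and any coefficients passed in are assumptions rather than measurements of a real workload.

```rust
/// Amdahl's law: speedup on `cores` cores when `parallel_fraction`
/// (between 0.0 and 1.0) of the work can run in parallel.
fn amdahl_speedup(parallel_fraction: f64, cores: f64) -> f64 {
    1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)
}

/// Universal Scalability Law: relative capacity on `cores` cores.
/// `contention` models serialized work (queueing on shared resources);
/// `coherency` models pairwise crosstalk, such as cache-line traffic.
fn usl_speedup(cores: f64, contention: f64, coherency: f64) -> f64 {
    cores / (1.0 + contention * (cores - 1.0) + coherency * cores * (cores - 1.0))
}
```

With the coherency term set to zero, the Universal Scalability Law reduces to Amdahl's law with a serial fraction equal to `contention`. With a nonzero coherency term, throughput peaks and then falls as cores are added, which is why adding threads can make a program slower.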

One limitation of multithreading that rarely gets discussed is covered in this paper. It essentially says that parallel speed-up is limited to a polynomial: at most cubic on a two-dimensional machine.

https://ieeexplore.ieee.org/document/2150

Unfortunately, the paper requires payment to read.

I found a summary in this PDF:

https://web.eecs.umich.edu/~imarkov/pubs/jour/dt13-limits.pd...

"To this end, the 1998 IEEE Transactions on Computers paper ‘Your Favorite Parallel Algorithms Might Not Be as Fast as You Think’ by David Fisher accounts for the finite density of processing elements in space, the (low) dimension d of the space in which parallel computation is performed, the finite speed of communication, and the linear growth of communication delay with distance. Neglected in most publications, these four factors limit parallel speed-up to power d + 1. Considering matrix multiplication as an example where exponential speed-up is possible in theory, a two-dimensional computing system (a planar circuit, a modern GPU, etc.) can offer at most a cubic speed-up. Given that the general result is asymptotic, it is significant only for large numbers of processing elements that communicate with each other. In particular, for circuits and FPGAs, it limits the benefits of three-dimensional integration to power 4/3 (optimistically assuming a fully isotropic system). For two-dimensional GPUs, at most a cubic speed-up over sequential computation is possible. To this end, a 2012 report by the Oak Ridge Leadership Computing Facility analyzed widely used simulation applications (turbulent combustion; molecular, fluid and plasma dynamics; seismology; atmospheric science; nuclear reactors, etc.). GPU-based speed-ups ranged from 1.4 to 3.3 times for ten applications and 6.1 times for the eleventh (quantum chromo-dynamics). These mediocre speed-ups likely reflect flaws in prevailing computer organization, where heavy reliance on shared memories dramatically increases communication costs, but alternatives would drastically complicate programming."

I wrote a multithreaded parallel actor implementation that can get 100 million requests per second without locks. I also wrote a parallel assembly interpreter, built on that actor implementation, which can execute the program below. Notice the mailbox command. This assembly program essentially sends integers between 25 threads, and it also sends a method call between threads (it sends a jump instruction to the other thread, to be run on that thread).

   threads 25
   <start>
   mailbox numbers
   mailbox methods
   set running 1
   set current_thread 0
   set received_value 0
   set current 1
   set increment 1
   :while1
   while running :end
   receive numbers received_value :send
   receivecode methods :send :send
   :send
   add received_value current
   addv current_thread 1
   modulo current_thread 25
   send numbers current_thread increment :while1
   sendcode methods current_thread :print
   endwhile :while1
   jump :end
   :print
   println current
   return
   :end

https://GitHub.com/samsquire/multiversion-concurrency-contro...
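For readers who don't want to dig through the repository, here is a minimal Rust sketch of the same ring-of-mailboxes shape. It is not the implementation linked above: it swaps in the standard library's `std::sync::mpsc` channels as mailboxes, it only passes integers (no code sending), and `run_ring`, `Message` and the hop counter are names invented for this illustration.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Messages that travel around the ring (hypothetical, for illustration only).
enum Message {
    Number { value: u64, hops_left: u64 },
    Stop,
}

/// Passes a counter around a ring of `thread_count` threads for `hops` hops.
/// Each hop adds 1, so the returned value equals `hops`.
fn run_ring(thread_count: usize, hops: u64) -> u64 {
    assert!(thread_count > 0);
    let mut senders: Vec<Sender<Message>> = Vec::with_capacity(thread_count);
    let mut receivers: Vec<Receiver<Message>> = Vec::with_capacity(thread_count);
    for _ in 0..thread_count {
        let (tx, rx) = channel();
        senders.push(tx);
        receivers.push(rx);
    }
    let (result_tx, result_rx) = channel();

    let mut handles = Vec::with_capacity(thread_count);
    for (index, mailbox) in receivers.into_iter().enumerate() {
        // Each thread owns its mailbox and a sender to the next thread's mailbox.
        let next = senders[(index + 1) % thread_count].clone();
        let result_tx = result_tx.clone();
        handles.push(thread::spawn(move || {
            for message in mailbox {
                match message {
                    Message::Number { value, hops_left } if hops_left > 0 => {
                        let forwarded = Message::Number {
                            value: value + 1,
                            hops_left: hops_left - 1,
                        };
                        let _ = next.send(forwarded);
                    }
                    Message::Number { value, .. } => {
                        // Last hop: report the result, then shut the ring down.
                        let _ = result_tx.send(value);
                        let _ = next.send(Message::Stop);
                        break;
                    }
                    Message::Stop => {
                        // Pass the stop signal on; the send fails harmlessly
                        // once it reaches a thread that has already exited.
                        let _ = next.send(Message::Stop);
                        break;
                    }
                }
            }
        }));
    }

    senders[0]
        .send(Message::Number { value: 0, hops_left: hops })
        .expect("thread 0 should be running");
    drop(senders);

    let result = result_rx.recv().expect("ring should produce a result");
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    result
}
```

Calling `run_ring(25, 1_000)` mirrors the `threads 25` setup and returns 1000. Something like `sendcode` could be approximated by sending a boxed `FnOnce` closure instead of an integer, but that is left out here to keep the sketch short.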

One problem with a multithreaded JavaScript runtime, which Python also suffers from with its subinterpreter design, is that data must be marshalled between threads. This is slow.

I would like to design an interpreter that can share data with zero-cost abstractions when moving data across a subinterpreter boundary. I think it can be done with zero copies.
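One existing model for zero-copy handoff is ownership transfer, as in Rust: sending a `Vec` through a channel moves only the pointer, length and capacity, and the heap buffer stays where it is. The sketch below checks this by comparing the buffer address before and after the send. It illustrates the general idea only, not the interpreter design described above, and `send_without_copy` is a made-up name.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Sends `payload` to another thread and returns
/// (buffer address before the send, address seen by the receiver, length).
fn send_without_copy(payload: Vec<u8>) -> (usize, usize, usize) {
    let address_before = payload.as_ptr() as usize;
    let (tx, rx) = channel::<Vec<u8>>();

    let worker = thread::spawn(move || {
        // The receiving thread now owns the Vec; no bytes were marshalled.
        let received = rx.recv().expect("sender was dropped");
        (received.as_ptr() as usize, received.len())
    });

    // Moving the Vec into the channel transfers ownership of the heap buffer.
    tx.send(payload).expect("receiver was dropped");
    let (address_after, len) = worker.join().expect("worker thread panicked");
    (address_before, address_after, len)
}
```

For a 1 MiB payload, the two addresses come back equal, so only the small `Vec` header crossed the thread boundary. The trade-off is that the sender can no longer touch the data after the send, which is what makes the handoff safe without a lock or a copy.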

Erlang copies data when it is sent between processes, which allows garbage collection to be per process.