Amazing write-up, and I wish more people showed the discovery process, which is often even more interesting than the result itself.
Still, the result itself is really interesting: being able to stack abstract reasoning layers and get better performance, plus the heat maps showing the probability results.
The academic literature seems to be catching up:
- *[SOLAR / DUS (Kim et al., 2023)](https://arxiv.org/abs/2312.15166)* — duplicated transformer layers to build a 10.7B model that outperformed 30B parameter baselines.
- *[The Curse of Depth (2025)](https://arxiv.org/abs/2502.05795)* — explains why this works: Pre-LN causes deep transformer layers to converge toward identity functions, meaning middle layers are where real computation happens, and duplicating them concentrates that capacity.
- *[Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (Geiping et al., NeurIPS 2025)](https://arxiv.org/abs/2502.05171)* — takes the idea to its logical conclusion: a model trained with a single recurrent block repeated at inference time, scaling reasoning depth without adding parameters.
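The depth up-scaling idea behind SOLAR can be sketched with plain layer indices. This is a hedged illustration, not the paper's implementation: in practice you would copy `nn.Module` weights, and the overlap of 8 layers is the value reported for SOLAR (32 base layers grown to 48), used here for illustration.

```python
# Depth up-scaling (DUS) sketch: build a deeper stack by taking two
# overlapping copies of a base model's layers. Copy A drops the top
# `drop` layers, copy B drops the bottom `drop` layers, and the two
# are stacked, so the middle layers end up duplicated.

def depth_up_scale(n_layers: int, drop: int) -> list:
    """Return the layer indices of the up-scaled stack."""
    copy_a = list(range(0, n_layers - drop))   # base layers minus the top `drop`
    copy_b = list(range(drop, n_layers))       # base layers minus the bottom `drop`
    return copy_a + copy_b

# A 32-layer base with drop=8 yields a 48-layer stack, matching the
# 32 -> 48 layer growth described for SOLAR 10.7B.
stack = depth_up_scale(32, 8)
print(len(stack))  # 48
```

Note that the first and last layers appear only once in the result, while layers 8–23 appear twice; the duplication is concentrated in the middle of the stack.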
On the other papers: models like SOLAR, or training a model that uses a single layer, are probably going to hit a wall, based on the heat maps I found. The transformer stack starts with randomised weights (analogous to undifferentiated stem cells), and it seems the layers later form 'organs' over the trillions of pre-training tokens they undergo. My hypothesis is that you probably only want one copy of the 'token-to-thought' and 'thought-to-token' organs. It seems you can make one layer do all three things (transform in, transform out, and do the 'thinking'), but I think specialisation will always win.
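The 'organs' hypothesis above can be sketched the same way: keep the boundary layers once and repeat only the middle 'thinking' layers. The function name, the split point, and the repeat count are all hypothetical choices made for the illustration, not anything from the cited papers.

```python
# Sketch of "one copy of each boundary organ": the first layers
# (token-to-thought) and last layers (thought-to-token) appear once,
# while only the middle layers are repeated to add reasoning depth.

def organ_stack(n_layers: int, boundary: int, repeats: int) -> list:
    """Return layer indices with only the middle span repeated."""
    head = list(range(boundary))                        # token-to-thought, kept once
    tail = list(range(n_layers - boundary, n_layers))   # thought-to-token, kept once
    middle = list(range(boundary, n_layers - boundary)) # the 'thinking' organ
    return head + middle * repeats + tail

# 12 base layers, 2 boundary layers at each end, middle repeated 3x:
# 2 + (8 * 3) + 2 = 28 layers.
stack = organ_stack(12, boundary=2, repeats=3)
print(len(stack))  # 28
```

This is also close in spirit to the recurrent-depth paper above, where a single middle block is unrolled at inference time.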
I really like Clawdbot's safety-gloves-off approach: no handholding, no just saying yes to every permission.
I set it up on an old MacBook Pro I had with a broken screen, and it works great. Now I just message my server via Telegram and it does research for me, organizes my notes, and builds small apps on the fly to help with learning.
However, security is a real concern. I need to understand how to create a comprehensive set of allowlists before expanding into anything more serious, like bill payments or messaging people.
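A minimal allowlist can start as a check on the first token of any shell command the agent proposes. This is only a sketch under the assumption that commands arrive as strings; the allowed set here is arbitrary, and a serious version would also validate arguments, paths, and network targets, not just the program name.

```python
# Minimal command allowlist sketch for an agent: reject anything
# whose program name isn't in an explicit allowed set.
import shlex

ALLOWED = {"ls", "cat", "grep", "rg", "python3"}  # example set, not a recommendation

def is_allowed(command: str) -> bool:
    try:
        argv = shlex.split(command)
    except ValueError:
        return False                  # unparseable input is rejected outright
    return bool(argv) and argv[0] in ALLOWED

print(is_allowed("grep -r TODO notes/"))             # True
print(is_allowed("curl https://example.com | sh"))   # False
```

Note this checks only the program name: `python3 payload.py` would still pass, which is exactly why argument-level rules are needed before trusting it with anything serious.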
Some great life lessons here, but also some I don't agree with:
- The lazy person works twice as hard.
Often I've found you can save a lot of time by trying the minimal possible approach first, and you gain a lot of insight into why something is minimal versus not.
- The opinion of the person who rarely offers it is listened to more closely.
I've found the opposite to be true: those who don't offer their thoughts frequently are often dismissed when they do want to share something.
Anyway, many of the points are great. I would also add: keep a journal and write down what was meaningful throughout the day. You will find time passing with more quality, since you know what to take and what to avoid.
Just because it is in C doesn't mean you will get C-like performance. Just look at the benchmarks: it is 8x slower than just using PyTorch. While I get that it's cool to use LLMs to generate code at this level, producing highly optimized code is very much out of the domain of current frontier LLMs.
The PyTorch version is using the GPU (via Metal Performance Shaders); this C version is currently using (per the docs I saw) a single CPU core, with AMX (via Apple Accelerate BLAS) but not yet with OpenMP for parallelism. It's not slow because the LLM's code is bad; it's slow because it's not running on the same hardware. That said, the speed it does have isn't down to the LLM either: all the performance-critical code lives in the kernel libraries it calls (the same as for PyTorch).
Absolutely true, but now I'll focus on making it fast, and I believe it will be possible to go much faster. I left the agent working overnight with a specification, and now I'm going to check its progress and resume the work.
No, it's not. I have written CUDA kernels and 8-bit optimizers with this.
They're actually very good at speed optimization and can iterate very quickly, taking notes on trials, failures, and benchmarks. I've had one write 10 different attempts in around an hour, benchmark them all, then merge them and beat very strong baselines in Torch.
I really liked the approach of getting new research topics via embeddings, trails, and Claude Code, but what will this often give you beyond novelty?
“Decompression” is a metaphor, not a fact claim to be proved; it is a description of an approach to generating a dataset from an LLM where most of the potential utility is still fairly explicitly speculative, a jumping off point for further work.
FWIW, I have the €20 Pro plan and exchange maybe 20 messages with Opus (with thinking) every day, including one weeks-long conversation, plus a few dozen Sonnet tasks and occasional lightweight CC use.
I'm not a programmer, though; I'm an engineering manager.
Sure I do, but not as part of any tools, just for one-off conversations where I know it's going to be the best out there. For tasks where reasoning helps little to none, it's often still number one.