Hacker News | imranq's comments

Amazing write-up, and I wish more people showed the process of discovery, which is often even more interesting than the result itself.

Still, the result is really interesting: being able to stack layers of abstract reasoning and get better performance, and the heat maps showing the probability results.

The academic literature seems to be catching up:

- *[SOLAR / DUS (Kim et al., 2023)](https://arxiv.org/abs/2312.15166)* — duplicated transformer layers to build a 10.7B model that outperformed 30B parameter baselines.

- *[The Curse of Depth (2025)](https://arxiv.org/abs/2502.05795)* — explains why this works: Pre-LN causes deep transformer layers to converge toward identity functions, meaning middle layers are where real computation happens, and duplicating them concentrates that capacity.

- *[Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (Geiping et al., NeurIPS 2025)](https://arxiv.org/abs/2502.05171)* — takes the idea to its logical conclusion: a model trained with a single recurrent block repeated at inference time, scaling reasoning depth without adding parameters.
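For intuition, the depth up-scaling (DUS) recipe behind SOLAR boils down to concatenating two overlapping copies of the same trained layer stack, so the middle layers end up duplicated. A toy sketch (layer objects are just string stand-ins here; `depth_up_scale` is a made-up helper name, and the 32 → 48 layer setting follows the paper's reported configuration):

```python
# DUS-style depth up-scaling: take the first `keep` layers and the last
# `keep` layers of one trained n-layer stack and concatenate them. With
# n=32 and keep=24, layers 8..23 appear twice in the deeper model.

def depth_up_scale(layers, keep):
    """Concatenate the first `keep` and last `keep` layers of one stack."""
    return layers[:keep] + layers[-keep:]

base = [f"layer_{i}" for i in range(32)]   # a trained 32-layer toy "model"
deeper = depth_up_scale(base, keep=24)     # 32 -> 48 layers

print(len(deeper))       # 48
print(deeper[24])        # "layer_8": the second copy starts repeating here
```

The duplicated span is exactly the middle of the stack, which is where, per the "Curse of Depth" argument above, the real computation is claimed to live.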


Hi, thanks for the praise!

On the other papers: models like SOLAR, or training a model that reuses a single layer, are probably going to hit a wall, based on the heatmaps I found. The transformer stack starts with randomised weights (analogous to undifferentiated stem cells), and it seems the layers later form 'organs' over the trillions of pre-training tokens they undergo. My hypothesis is that you probably only want one copy of the 'token-to-thought' and 'thought-to-token' organs. It seems you can make one layer do all three things (transform in and out, and do the 'thinking'), but I think specialisation will always win.


I think it might be 2020 when the M1 was released, since I remember I had bought a MacBook in 2019 and it was still Intel.

It was a Christmas gift, so maybe 2020... not super positive about this.

November 2020

I really like Clawdbot's safety-gloves-off approach: no handholding, and no just saying yes to every permission.

I set it up on an old MacBook Pro I had with a broken screen, and it works great. Now I just message my server over Telegram and it does research for me, organizes my notes, and builds small apps on the fly to help with learning.

However, security is a real concern. I need to understand how to create a comprehensive set of allowlists before expanding into anything more serious like bill payments or messaging people.


You know that's the easier and more careless thing to implement. You're flattering someone for being reckless.


But prompt injection is still a thing. Remember the lethal trifecta.


Some great life lessons here, but also some I don't agree with:

- The lazy person works twice as hard. Often I've found you can save a lot of time just trying the minimal possible approach, and you gain a lot of insight into why something is minimal vs. not.

- The opinion of the person who rarely offers it is listened to more closely. I found the opposite to be true: those who don't offer their thoughts frequently are often dismissed when they do want to share something.

Anyway, many of the points are great. I would also add: keep a journal and write down what was meaningful throughout the day. You will find time passing with more quality, since you know what to take and what to avoid.


Just because it is in C doesn't mean you will get C-like performance. Just look at the benchmarks: it is 8x slower than just using PyTorch... While I get that it's cool to use LLMs to generate code at this level, getting highly optimized, high-performance code is very much out of the domain of current frontier LLMs.


The PyTorch version is using the GPU (with Metal Performance Shaders); this C version is currently using (in the docs I saw) a single CPU core, with AMX (via Apple Accelerate BLAS) but not yet with OpenMP for parallelism. It’s not slow because LLM code is bad, but because it’s not running on the same hardware. That said, it’s also not as fast as it is because of the LLM—all the critical code is in kernel libraries it calls (the same as for PyTorch).
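The "it's all in the kernel library" point is easy to see with a quick (illustrative, not rigorous) measurement: a single dense matmul is just a dispatch into whatever BLAS the runtime linked, and the GFLOP/s you get depends on that backend's hardware, not on the language of the calling code.

```python
import time
import numpy as np

# One dense float32 GEMM. NumPy hands this straight to the linked BLAS
# (Accelerate on macOS, OpenBLAS/MKL elsewhere), the same kind of call the
# C port and PyTorch both make; the hardware behind it dominates timing.
n = 1024
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

t0 = time.perf_counter()
c = a @ b
dt = time.perf_counter() - t0

flops = 2 * n ** 3  # multiply-adds in an n x n x n GEMM
print(f"{flops / dt / 1e9:.1f} GFLOP/s from this BLAS backend")
```

Run the same snippet pinned to one core (e.g. `OMP_NUM_THREADS=1`) versus all cores and the number moves by roughly the core count, which is the gap being described here.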


Absolutely true, but now I'll focus on making it fast, and I believe it will be possible to go much faster. I left the agent working overnight with a specification, and now I'm going to review the progress and resume the work.


No it's not. I have written CUDA kernels and 8-bit optimizers with this.

They're actually very good at speed optimization and can iterate very quickly, taking notes on trials, failures, and benchmarks. I've had one write 10 different attempts in around an hour, benchmark them all, then merge the best and beat very strong baselines in torch.


I really liked the approach of finding new research topics via embeddings, trails, and Claude Code, but what will this give you beyond novelty?


Hey HN! I built an explorer for all NeurIPS 2025 Main Conference and workshop papers with reviews, scores, and code links.

But the unique feature is the AI-generated "explainers" that break down complex papers with interactive visualizations. Example: https://neurips2025.pages.dev/explainers/linear_attention/

It explains why attention is hard to optimize, shows the math with interactive demos, and includes critical analysis of limitations.

The explainers are generated using Gemini 3 to parse papers and create:

- Interactive visualizations

- Step-by-step mathematical walkthroughs

- Critical analysis sections

- "What would convince me?" sections

Tech stack: OpenReview API, Gemini API for explainer generation, static hosting on Cloudflare Pages for speed.

I'm planning to generate explainers for more papers based on what people find interesting, so any feedback would be amazing.


The claims in this paper don't make sense. There is no proof that anything has been decompressed.


"Decompression" is a metaphor, not a fact claim to be proved. It describes an approach to generating a dataset from an LLM where most of the potential utility is still fairly explicitly speculative: a jumping-off point for further work.





Claude Opus pricing is nuts. I'd be surprised if anyone uses it without the top Max subscription.


FWIW, I have the €20 Pro plan and exchange maybe 20 messages with Opus (with thinking) every day, including one weeks-long conversation. Plus a few dozen Sonnet tasks and occasionally lightweight CC.

I'm not a programmer, though - engineering manager.


Sure I do, but not as part of any tools, just for one-off conversations where I know it's going to be the best out there. For tasks where reasoning helps little to none, it's often still number one.


Some people have startup credits


Turning it off and then on again works in a lot of surprising places

