Hacker News | cactusplant7374's comments

There's also a reward for not overthinking it and letting the AI bring the solutions to you. The outcomes are better when it's a question, answer, and execution session.

Tyler Cowen has said that he doesn't drink coffee and he is worried about what it might be doing to us. There is a big unknown.

I admit that I don't know who Tyler Cowen is, but millions (billions?) of people have drunk coffee daily for centuries. If there were ill effects in the same ballpark as opioids or tobacco, surely we would know by now?

There is even a decent chance that the Industrial Revolution and the phenomenal wealth and progress it's brought was caused by the introduction of coffee to Europe.

Hey, let’s not discount the opinion of some internet guy just because of the lived experience of the rest of humanity throughout history. /s

That's an attack on HN comments in general.

A professor of economics has opinions on the health effects of an extremely common substance?

And I have opinions on nuclear energy, but neither of us is worth listening to outside our areas of expertise. Unless you can supply a reason, why would I bother listening to him rather than an actual expert on the subject?


Why should I care what an economist's opinion is on coffee consumption?

> There is a big unknown.

Because some dude with no health or nutrition background said uninformed things, that he isn't qualified to have opinions about, on the internet? Come on, now.


This whole thing stinks.

That assumes he is all knowing.

No developer writes the same prompt twice. How can you be sure something has changed?

I regularly run the same prompts twice and through different models. Particularly, when making changes to agent metadata like agent files or skills.

At least weekly I run a set of prompts to compare Codex and Claude against each other. This is quite easy; the prompt sessions are just saved text files.

The problem is doing it enough for statistical significance and judging the output as better or not.
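The significance problem can be made concrete with a small sketch. It assumes each prompt run is scored as a simple pass or fail (by hand or by a test suite), which is itself an assumption, and the function name `two_proportion_z_test` is hypothetical. It uses the normal approximation, which is rough at the small sample sizes a weekly run produces; an exact test such as Fisher's would be a more careful choice.

```python
import math


def two_proportion_z_test(passes_a, n_a, passes_b, n_b):
    """Return (z, two_sided_p) for H0: both models share the same pass rate.

    Assumes every prompt run was scored as pass/fail; uses the normal
    approximation, so treat results from small n with caution.
    """
    p_a = passes_a / n_a
    p_b = passes_b / n_b
    # Pooled pass rate under the null hypothesis that the models are equal
    pooled = (passes_a + passes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both models all-pass or all-fail: no evidence of any difference
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # 20 prompts per model: one passes 14, the other passes 10
    z, p = two_proportion_z_test(14, 20, 10, 20)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

With 20 prompts where one model passes 14 and the other 10, z comes out around 1.29 and the two-sided p-value around 0.2, so even a 70% versus 50% gap is hard to tell apart from noise at that sample size.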


I suspect you may not be writing code regularly... If I have to ask Claude the same things three times and it keeps saying "You are right, now I've implemented it!" and the code is still missing 1 out of 3 things or worse, then I can definitely say the model has become worse (since this wasn't happening before).

> I suspect you may not be writing code regularly...

You have no reason to suspect this.




I haven't experienced this with gpt-5.3-codex (xhigh), for example. Opus/Sonnet usually work well when just released, then degrade quite regularly. I know the prompts are not the same every day, or even across a single day, but if the types of problems are always the same (at least in my case) and a model starts doing stupid things, then something is wrong. Everyone I know who uses Claude regularly usually has the same experience whenever I notice it degrading.

When I use Claude daily (both professionally and personally with a Max subscription), there are things that it does differently between 4.5 and 4.6. It's hard to point to any single conversation, but in aggregate I'm finding that certain tasks don't go as smoothly as they used to. In my view, Opus 4.6 is a lot better at long standing conversations (which has value), but does worse with critical details within smaller conversations.

A few things I've noticed:

* 4.6 doesn't look at certain files that it used to

* 4.6 tends to jump into writing code before it's fully understood the problem (annoying but promptable)

* 4.6 is less likely to do research, write to artifacts, or make external tool calls unless you specifically ask it to

* 4.6 is much more likely to ask annoying (blocking) questions that it can reasonably figure out on its own

* 4.6 is much more likely to miss a critical detail in a planning document after being explicitly told to plan for that detail

* 4.6 needs to write its memories to file more proactively within a conversation to avoid going off track

* 4.6 is a lot worse about demonstrating critical details. I'm so tired of it explaining something conceptually without thinking through how the details will actually be implemented.


Just hit a situation where 4.6 is driving me crazy.

I'm working through a refactor and I explicitly told it to use a block (as in Ruby Blocks) and it completely overlooked that. Totally missed it as something I asked it to do.


Ralph Wiggum would like a word

Same prompt assumes same context state. But I think you get what I mean.

Cervical radiculopathy can cause shoulder pain. I have experienced this quite a bit and it's probably also because of my sleeping style. I wouldn't get an MRI unless I was planning to have surgery.

Wouldn’t the MRI decide if surgery would be beneficial?

There are so many other factors, and physical therapy is required for insurance approval anyway.

It sounds really expensive to run inference as a crawler.

The gains come from pairing Ghidra with a coding agent. It works amazingly well.

I'll second this. I used opencode + Opus 4.6 + Ghidra to reverse engineer a seed-key generation algorithm[1] from V850 assembly. I gave it the binary, the known address of the generation function, and a set of known inputs/outputs, and it was able to crack it.

[1] https://github.com/Mattwmaster58/ic204
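The verification step described above, checking a guessed algorithm against known input/output pairs, can be sketched as a small harness. Everything here is hypothetical: `check_candidate` and `candidate_xor_rotate` are made-up names, the rotate-and-XOR placeholder is invented and is not the algorithm in the linked repo, and 32-bit integer seeds and keys are assumed purely for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def check_candidate(
    candidate: Callable[[int], int],
    known_pairs: Iterable[Tuple[int, int]],
) -> List[Tuple[int, int, int]]:
    """Return every (seed, expected_key, got_key) where the candidate disagrees."""
    mismatches = []
    for seed, expected in known_pairs:
        got = candidate(seed)
        if got != expected:
            mismatches.append((seed, expected, got))
    return mismatches


def candidate_xor_rotate(seed: int) -> int:
    """Hypothetical placeholder guess, NOT the real seed-key algorithm.

    Rotates a 32-bit seed left by 3 bits, then XORs with an invented constant.
    """
    rotated = ((seed << 3) | (seed >> 29)) & 0xFFFFFFFF
    return rotated ^ 0xA5A5A5A5
```

An empty list means the candidate reproduces every known pair; any mismatch triple shows exactly which seed to look at next. The pairs themselves would come from the real target, not from this sketch.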


Would you have a tutorial on that?

Sorry, I don't. Giving the agent high level context has worked well for me.

It sounds like a mental health crisis. So many people are experiencing them when interacting with AI.


It is really good at highlighting my core flaw: marketing. I can ship stuff great and I feel insanely productive, and then I just hit a wall when it comes to marketing, move on to the next thing, and repeat.

I think this is more aimed at the people who talk to AI like it is a person, or use it to confirm their own biases, which is painfully easy to do, and should be seen as a massive flaw.

For every one person who prompts AI intentionally to garner unbiased insights and avoid the sycophancy by pretending to be a person removed from the issue, who knows how many are unaware that is even a thing to do.


There was no mental health crisis, it was a bank account crisis. As in, "I sold my options on the secondary market, and those numbers on my bank statement are now so large I'm scared to stay at my job!" It was no secret what they were signing up for, so I find it too convenient that Anthropic raises a bunch of money, and suddenly this person has an ethical crisis.


It definitely seems to induce a bit of mania (ignoring the obvious joke about AI hype)


How would that compare to subtle bugs introduced by developers? I have seen a massive amount of bugs during my career, many of those introduced by me.


it compares... unfavorably, on the side of ai


Not from what I'm seeing. 5.3 codex xhigh is pretty amazing.

