There's also a reward for not overthinking it and letting AI bring the solutions to you. The outcomes are better when it's a question, answer, and execution session.
I admit that I don't know who Tyler Cowen is, but millions (billions?) of people have drunk coffee daily for centuries, and if there were ill effects in the same ballpark as opioids or tobacco, surely we would know by now?
There is even a decent chance that the Industrial Revolution and the phenomenal wealth and progress it's brought was caused by the introduction of coffee to Europe.
A professor of economics has opinions on the health effects of an extremely common substance?
And I have opinions on nuclear energy - but neither of us is worth listening to outside our areas of expertise. Unless you can supply a reason, why would I bother listening to him rather than an actual expert on the subject?
Because some dude with no health or nutrition background said uninformed things, that he isn't qualified to have opinions about, on the internet? Come on, now.
I regularly run the same prompts twice and through different models, particularly when making changes to agent metadata like agent files or skills.
At least weekly I run a set of prompts to compare codex/claude against each other. This is quite easy since the prompt sessions are just text files that get saved.
The problem is doing it enough for statistical significance and judging the output as better or not.
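A minimal sketch of what such a weekly comparison run could look like. The CLI invocations here (`claude -p` and `codex exec`), the `prompts/` directory layout, and the output naming are all assumptions for illustration, not the commenter's actual setup; substitute whatever non-interactive commands your tools provide.

  import subprocess, datetime, pathlib

  PROMPTS = pathlib.Path("prompts")                         # one prompt per .txt file (assumed layout)
  OUT = pathlib.Path("runs") / datetime.date.today().isoformat()

  # Assumed command templates for non-interactive, single-shot runs.
  MODELS = {
      "claude": ["claude", "-p"],
      "codex": ["codex", "exec"],
  }

  OUT.mkdir(parents=True, exist_ok=True)
  for prompt_file in sorted(PROMPTS.glob("*.txt")):
      prompt = prompt_file.read_text()
      for name, cmd in MODELS.items():
          result = subprocess.run(cmd + [prompt], capture_output=True, text=True)
          # Save each session as plain text so later runs can be diffed by hand.
          (OUT / f"{prompt_file.stem}.{name}.txt").write_text(result.stdout)

Saving every session as a dated text file is what makes the later judgment call ("better or not") possible at all, even if it doesn't solve the statistical-significance problem.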
I suspect you may not be writing code regularly...
If I have to ask Claude the same things three times and it keeps saying "You are right, now I've implemented it!" and the code is still missing 1 out of 3 things or worse, then I can definitely say the model has become worse (since this wasn't happening before).
I haven't experienced this with gpt-5.3-codex (xhigh), for example. Opus/Sonnet usually work well when just released, then they degrade quite regularly. I know the prompts are not the same every day, or even across the day, but if the types of problems are always the same (at least in my case) and a model starts doing stupid things, then something is wrong. Everyone I know who uses Claude regularly has the same experience whenever I notice it degrade.
When I use Claude daily (both professionally and personally with a Max subscription), there are things that it does differently between 4.5 and 4.6. It's hard to point to any single conversation, but in aggregate I'm finding that certain tasks don't go as smoothly as they used to. In my view, Opus 4.6 is a lot better at long-standing conversations (which has value), but does worse with critical details within smaller conversations.
A few things I've noticed:
* 4.6 doesn't look at certain files that it used to
* 4.6 tends to jump into writing code before it's fully understood the problem (annoying but promptable)
* 4.6 is less likely to do research, write to artifacts, or make external tool calls unless you specifically ask it to
* 4.6 is much more likely to ask annoying (blocking) questions that it can reasonably figure out on its own
* 4.6 is much more likely to miss a critical detail in a planning document after being explicitly told to plan for that detail
* 4.6 needs to more proactively write its memories to file within a conversation to avoid going off track
* 4.6 is a lot worse about demonstrating critical details. I'm so tired of it explaining something conceptually without thinking through how the details actually get implemented.
Just hit a situation where 4.6 is driving me crazy.
I'm working through a refactor and I explicitly told it to use a block (as in Ruby Blocks) and it completely overlooked that. Totally missed it as something I asked it to do.
Cervical radiculopathy can cause shoulder pain. I have experienced this quite a bit and it's probably also because of my sleeping style. I wouldn't get an MRI unless I was planning to have surgery.
I'll second this. I used opencode + opus 4.6 + ghidra to reverse engineer a seedkey generation algorithm[1] from v850 assembly. I gave it the binary, the known address for the generation function, and a set of known inputs/outputs, and it was able to crack it.
It is really good at highlighting my core flaw: marketing. I can ship stuff great, I feel insanely productive, and then I just hit a wall when it comes to marketing, move on to the next thing, and repeat.
I think this is more aimed at the people who talk to AI like it is a person, or use it to confirm their own biases, which is painfully easy to do, and should be seen as a massive flaw.
For every person who intentionally prompts AI to get unbiased insights and avoid the sycophancy, say by pretending to be someone removed from the issue, who knows how many are unaware that's even a thing to do.
There was no mental health crisis, it was a bank account crisis. As in, "I sold my options on the secondary market, and those numbers on my bank statement are now so large I'm scared to stay at my job!" It was no secret what they were signing up for, so I find it too convenient that Anthropic raises a bunch of money, and suddenly this person has an ethical crisis.