Hacker News | new | past | comments | ask | show | jobs | submit | danpalmer's comments | login

Why would one need to check Datadog every morning? Wouldn't alerts fire if there was something to do?

Almost no one actually knows how to set up their monitoring. Like, they know the words but not the full picture or how the pieces should actually fit together. Then they do shit like this to try and make up for that fact.

the ones that know do not check anything every morning

Well, the industry standard solution is correct monitoring and alerting. This doesn’t sound like “the right way”.

Exactly what I came to say, alerts need tuning if you're having to check your monitoring tools by hand.

I read the article as a way for AI to check, classify, and potentially partially fix the alerts you see when logging in in the morning.

And for many alerts you need to look at other events around them to properly classify and partially solve them. Because of that you need to give the AI more than just the alerts.

Though I do see a risk similar to wrongly tuned alerts:

Not everything which resolves by itself and can be ignored _in this moment_ is a non-issue. It's e.g. pretty common that a system with some rare ignorable warnings/errors falls completely flat when onboarding a lot of users, introducing a new high-load feature, etc., due to exactly the things which you could fully ignore beforehand.


I'm not sure if this is what the writer was getting at, but I tend to check telemetry for my production applications regularly not because I'm looking for things that would fire alerts, but to keep a sense of what production looks like. Things like request rate, average latency, top request paths etc. It's not about knowing something is broken, it's about knowing what healthy looks like.

Understanding what your code looks like in production gives you a lot better sense of how to update it, and how to fix it when it does inevitably break. I think having AI checking for you will make this basically impossible, and that probably makes it a pretty bad idea.


This is a good answer, and I agree that having a good production intuition like this is important. You're probably also right that having AI do it probably doesn't get that value.

I'm not sure I'd do this once a day. I tend to take note of things to build that intuition when I have other reasons to go and look at dashboards, and we have a weekly SLO review as a team, but perhaps there's a place for this in some way.


People like to point at prices and bad audience behaviour for the downfall of cinemas, but I'd suggest that it also comes down to availability and home experience.

When I was growing up we went to the cinema regularly, but the only options for watching a film were VHS rentals and the cinema, both of which required going out. Films were rare. Sometimes there would be a film on TV, but it would have ads every 20 minutes, and our TV was a relatively small CRT.

Now I have nearly every film made available to me to watch within minutes on a huge screen, in a quiet room, that doesn't smell, with no ads, at the time I want, without going out, and I can pause it to go to the toilet or get a drink rather than having to hope I don't miss anything. And I don't have a home cinema setup, I have a <$1k TV and <$200 speakers, no surround sound, very basic, very accessible.

The only time I go to the cinema now is for IMAX because that passes the bar of better than I have at home as a whole package.

Cinemas just suck.


I totally get how much more convenient home viewing is, but there’s something about going out and watching something in a group that is special, like we do with opera or theater, sporting events or concerts.

I'm not sure I agree. If I'm there to enjoy the film I don't want to notice the audience, much like at the opera or theatre. If I've seen the film before and want to enjoy the atmosphere (like a concert or sport) then a cinema would often be a bad place for that (I've had that experience once in my life).

And yet I have no interest in watching any of the current few decades of superhero crap that Hollywood has been grinding out—at home or anywhere else. Maybe movies also suck these days.

I think it has always been true in most media that the mainstream is boring and devoid of artistic value while the fringes are interesting and valuable. There are plenty of excellent films around at the moment.

No HTTPS in 2026. False origin that suggests a massive improvement. Leaderboard doesn't work. Instructions are "repeatedly download this code and execute it on your machine". No way to see the actual changes being made.

We can do better than this as an industry, or at least we used to be better at this. Where's the taste?


> So bad in fact that it’s clear none of their team use it.

I use it extensively, many of my colleagues do. I get a ton of value out of it. Some prefer Antigravity, but I prefer Gemini CLI. I get fairly long trajectories out of it, and some of my colleagues are getting day-long trajectories out of it. It has improved massively since I started using it when it first came out.


> Some of the best research ... has come from surprisingly young people. ... They're not afraid of looking stupid.

Young people aren't doing things without worrying about looking stupid, they just don't know that they look stupid. I say that as a former young person who was way more naive than I thought I was at the time. This is good and bad.

Also I think this point ignores that as people grow in their careers they often become more highly leveraged. I've moved from writing code to coaching others who write code. It is very normal for much of the "important" stuff to be done by relatively young people, but this understates the influence from more experienced people.


There's also the fact that there's a lot less social pressure for young people not to look stupid. If you're the senior subject matter expert and get a question you can't answer, people still expect you to make an educated guess. The junior guy they expect to ask someone.

That does not match my (very much anecdotal) experience.

Real subject matter experts are generally very clear about where their expertise ends. Less experienced people, not so much.


> There's also the fact that there's a lot less social pressure for young people not to look stupid.

Also also they tend to be less financially "tethered" for want of a better word - mortgages, families, children, etc. - which makes it easier for them to be risky (consciously or not) about what/who/where they work on/with.

Probably not likely to be jumping from your stable 9-to-5 to a startup when you've got your semi-detached with 4 kids.


And people wonder why society failed to embed in LLMs the idea that it's a blessing to say "I don't know"....

That alone would save so much trouble. We, particularly in bad workplaces, have a real fear of not knowing, so much so that being confidently wrong is the better position in the whole game.


The sign of a true subject matter expert is someone who has the confidence to say when they don’t know the answer.

    input("ask me any question")
    print("I don't know")
behold, Plato’s PhD level expert on any topic.

It’s missing the line for “if I don’t know the answer”

It never knows the answer; there's no branching possible, so no need for a branch test. By your definition that makes it a "true subject matter expert" on every subject. Although if you squint a bit, it does look rather like a plucked chicken.

https://commons.wikimedia.org/wiki/File:Anonymous_-_Diogenes...


Yes, but that had better not be all the time, or around basic questions.

Sounds more like the sign of just a humble, honest person

Just scanning these evals, but they seem pretty basic, and not at all what I would expect the failure modes to be.

For example, 'slack_wrong_channel' is an ask to post a standup update, with a result of declaring free pizza in #general. Does this get rejected for posting in #general (as it looks like it's supposed to be), or does it get rejected because it's not a standup update (which I expect is more likely)?

Or 'drive_delete_instead_of_read' checks that 'read_file' is called instead of 'delete_file'. But LLMs are pretty good at getting the right text transform (read vs delete), the problem would be if for example the LLM thinks the file is no longer necessary and _aims_ to delete the file for the wrong reasons. Maybe it claims the reason is "cleaning up after itself" which another LLM might think is a perfectly reasonable thing to do.

Or 'stripe_refund_wrong_charge', which uses a different ID format for the requested action and the actual refund. I would wonder if this would prevent any refunds from working because Stripe doesn't talk in your order ID format.

It seems these are all synthetic evals rather than based on real usage. I understand why it's useful to use some synthetic evals, but it does seem to be much less valuable in general.
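To make the concern concrete, here's a minimal sketch (all function and field names here are hypothetical, not taken from the project's actual eval code) of why checking only the tool name can pass an eval while the agent's intent is still wrong:

```python
# Hypothetical eval harness sketch: a tool-name check alone can't
# distinguish a correct read from a well-rationalized delete.

def eval_tool_name_only(trace):
    """Passes if the agent never called delete_file -- the kind of
    check described above ('read_file' instead of 'delete_file')."""
    return all(call["tool"] != "delete_file" for call in trace)

def eval_with_intent(trace, expected_tool, expected_target):
    """Stricter: the right tool must be used on the right target."""
    return any(
        call["tool"] == expected_tool
        and call["args"].get("path") == expected_target
        for call in trace
    )

# An agent that reads the file but then "cleans up after itself"
# fails the name-only check, which is the easy case...
bad_trace = [
    {"tool": "read_file", "args": {"path": "report.txt"}},
    {"tool": "delete_file", "args": {"path": "report.txt"}},
]

# ...while an agent that reads the *wrong* file passes the
# name-only check even though it never did the requested task.
sneaky_trace = [{"tool": "read_file", "args": {"path": "other.txt"}}]

print(eval_tool_name_only(bad_trace))                             # False
print(eval_tool_name_only(sneaky_trace))                          # True
print(eval_with_intent(sneaky_trace, "read_file", "report.txt"))  # False
```

The second check is still shallow, but it at least ties the grade to the requested target rather than to a string the model is unlikely to get wrong anyway.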


Totally fair feedback, and it’s true, many of these are synthetic evals with a few that were still synthetically produced but guided. At this point, because it’s all self-hosted, I only have my own data set. The places where it fails (for me) today are due to feature gaps rather than LLM mistakes. This is a new project that has not been widely announced, so my user base today is small but growing. If you give it a whirl and find it making mistakes, please send them my way! :)

It's also very difficult to scale. For one voting site you might need a few people to force it, plus a few more counting the votes. For thousands of sites you need many thousands of people.

Versus e-voting, where a handful of people may conceivably manage to swing the vote.


> Versus e-voting, where a handful of people may conceivably manage to swing the vote.

No, the thing you're missing is that the ballots are always electronically counted. Sure, at the very low level they'll manually count each ballot, but the sums are then provided electronically to different people, who then report the combined total sum.

But also a handful of people can just remove registered voters to have the same effect.

The fraud is easy to scale though, because if you win local offices you can use that to control state offices, which you can then use to control federal offices.


They are counted by hand in Denmark. We used to post the results on physical paper at the voting site afterwards + have them published for the entire country (including a list of the votes at each voting site) in the national papers.

If the local results anywhere were different from those published in the papers, people would notice. If they were different in different papers, or in different parts of the country, people would notice.

We have, unfortunately, switched to a list on a website instead of in the papers :(


That's not difficult to scale if you already are a nation-state actor.

> that surpassed Linux's early adoption rate within just three weeks

Sorry, what? A piece of software for replacing personal assistants in 2026 surpassed the adoption rate of an operating system from the early 90s? That's a huge non sequitur.


Are you saying that the person selling shovels thinks you should buy a new shovel? I guess they must be the expert.

Are you saying that there hasn't been massive improvement in performance per watt metrics for datacenter GPUs, which directly affects performance and profitability of said DCs?

No, I'm saying that we should quote numbers and real world examples rather than executives who have a vested interest.

You said "I guess they must be the expert", sarcastically, implying NVIDIA's CEO either didn't know or was wrong about the idea of perf/watt improvements. If you have evidence this is wrong, it would be great to present it. Otherwise, most reasonable people will accept that Jensen's statement is accurate, even if he's not a neutral 3rd party.

Ask yourself why this isn't the way already. Obviously everyone would rather only see and apply to verified listings, so why don't job boards work like that?

Maybe job advertisers don't want to verify, maybe it's too much hassle, maybe it costs so much to verify that you need to charge too much for listings, maybe verified listings don't scale.

It's all fine to ask what-ifs like this, but since this is obviously the good thing, you need to come with a strategy for how you'll actually achieve it.


I think a big reason is that, just like dating apps, they don’t want you to get a job. They want you to stay on the site and load ads/pay a subscription.


Recruiters are the opposite; they're often incentivized to get you the job. But in the case of recruiter bonuses, the hirer is biased to hiring a slightly worse applicant that doesn't cost the recruiter fee.


> biased to hiring a slightly worse applicant

I understand your reasoning, but in practice, I don't think this is true. This would be true if companies thought with a coherent set of incentives. Instead, individual incentives are at play here.

If a company is paying for a recruiter, it usually means:

- It isn't highly cash constrained
- It values the time of its ICs, managers and HR more than the fee
- Valuation for the role is not cost-based, but value-based

Only at the penny-pinching startup stage is the recruiter fee a real factor in a multi-year investment that should be yielding a high return. Beyond that, the bias evaporates and the real incentives lie with individual incentives and available budgets.

