More

dekhn · 2026-03-13T16:57:23 1773421043

How could it not? That information is always available to userspace.

bityard · 2026-03-13T17:36:58 1773423418

"Available to userspace" is a much different thing than "available to every website that wants it, even in private mode".

I too was a little surprised by this. My browser (Vivladi) makes a big deal about how privacy-conscious they are, but apparently browser fingerprinting is not on their radar.

dekhn · 2026-03-13T17:42:04 1773423724

We switched to talking about llmfit in this subthread, it runs as native code.

swiftcoder · 2026-03-13T17:42:02 1773423722

It's pretty hard to avoid GPU fingerprinting if you have webgl/webgpu enabled

dekhn · 2026-03-12T20:45:38 1773348338

https://www.cnn.com/TECH/computing/9904/08/cdrom.idg/ and https://forums.civfanatics.com/threads/wow-what-a-system-ftp... I'm not aware of any others with more details and war stories.

dekhn · 2026-03-12T03:43:42 1773287022

Your comment reads as negative about the corporations even if that's not what you intended. There's something about the way it's reads (I noticed myself when I read your comment, then noticed your edit at the bottom). Nuance usually does not translate well on the internet.

dekhn · 2026-03-11T23:07:37 1773270457

The same joke is in David Macauley's Motel of the Mysteries (see drawing in https://www.byanyothernerd.com/2020/04/stranger-days-39-myst...).

zem · 2026-03-11T23:38:02 1773272282

relatedly, https://en.wikisource.org/wiki/Report_on_Grand_Central_Termi...

dekhn · 2026-03-11T18:54:08 1773255248

Are you going to build a competitor to CERN?

There are many things that cannot be feasibly verified empirically without access to rare resources.

dekhn · 2026-03-11T17:26:43 1773250003

Maybe nature and cell and a few other journals should be exceptions: they should be the place that the most advanced scientists publish interesting ideas early for the consumption by their competitors. At that level of science, all the competitors can reproduce each other's experiments if necessary; the real value is expanding the knowledge of what seems possible quickly.

(I am not seriously proposing this, but it's interesting to think about distinguishing between the very small amount of truly innovative discovery versus the very long tail of more routine methods development and filling out gaps in knowledge)

Bratmon · 2026-03-11T17:35:15 1773250515

> that level of science, all the competitors can reproduce each other's experiments if necessary

But they don't, and that's the problem!

godelski · 2026-03-11T19:57:40 1773259060

The problem is bigger. It even blocks research!

In my own experience I was unable to publish a few works because I was unable to outperform a "competitor" (technically we're all on the same side, right?). So I dig more and more into their work and really try to replicate their work. I can't! Emailing the authors I get no further and only more questions. I submit the papers anyways, adding a section about replication efforts. You guessed it, rejected. With explicit comments from reviewers about lack of impact due to "competitor's" results.

Is an experience I've found a lot of colleagues share. And I don't understand it. Every failed replication should teach us something new. Something about the bounds of where a method works.

It's odd. In our strive for novelty we sure do turn down a lot of novel results. In our strive to reduce redundancy we sure do create a lot of redundancy.

jltsiren · 2026-03-11T22:44:15 1773269055

I've seen this from both sides.

Sometimes the result is wrong, or it's not as big or as general as claimed. Or maybe the provided instructions are insufficient to replicate the work. But sometimes the attempt to replicate a result fails, because the person doing it does not understand the topic well enough.

Maybe they are just doing the wrong things, because their general understanding of the situation is incorrect. Maybe they fail to follow the instructions correctly, because they have subtle misunderstandings. Or maybe they are trying to replicate the result with data they consider similar, but which is actually different in an important way.

The last one is often a particularly difficult situation to resolve. If you understand the topic well enough, you may be able to figure out how the data is different and what should be changed to replicate the result. But that requires access to the data. Very often, one side has the data and another side the understanding, but neither side has both.

Then there is the question of time. Very often, the person trying to replicate the result has a deadline. If they haven't succeeded by then, they will abandon the attempt and move on. But the deadline may be so tight that the authors can't be reasonably expected to figure out the situation by then. Maybe if there is a simple answer, the authors can be expected to provide it. But if the issue looks complex, it may take months before they have sufficient time to investigate it. Or if the initial request is badly worded or shows a lack of understanding, it may not be worth dealing with. (Consider all the bad bug reports and support requests you have seen.)

godelski · 2026-03-11T23:44:27 1773272667

I definitely think all these are important, even if in different ways. For the subtle (and even not so subtle) misunderstandings it matters who misunderstands. For the most part, I don't think we should concern ourselves with non-experts. We do need science communicators, but this is a different job (I'm quite annoyed at those on HN who critique arxiv papers for being too complex while admitting they aren't researchers themselves). We write papers to communicate to peers, not the public. If we were to write to the latter each publication would have to be prepended by several textbooks worth of material. But if it is another expert misunderstanding, then I think there's something quite valuable there. IFF the other expert is acting in good faith (i.e. they are doing more than a quick read and actually taking their time with the work) then I think it highlights ambiguity. I think the best way to approach this is distinguish by how prolific the misunderstanding is. If it is uncommon, well... we're human and no matter how smart you are you'll produce mountains of evidence to the contrary (we all do stupid shit). But if the misunderstanding is prolific then we can be certain that ambiguity exists, and it is worth resolving. I've seen exactly what you've seen as well as misunderstandings leading to discoveries. Sometimes our idiocracy can be helpful lol.

But in any case, I don't know how we figure out which category of failures it is without it being published. If no one else reads it it substantially reduces the odds of finding the problem.

FWIW, I'm highly in favor of a low bar to publishing. The goal of publishing is to communicate to our peers. I'm not sure why we get so fixated on these things like journal prestige. That's missing the point. My bar is: 1) it is not obviously wrong, 2) it is not plagiarized (obviously or not), 3) it is useful to someone. We do need some filters, but there's already natural filters beyond the journals and conferences. I mean we're all frequently reading "preprints" already, right? I think one of the biggest mistakes we make is conflate publication with correctness. We can't prove correctness anywhere, science is more about the process of elimination. It's silly to think that the review process could provide correctness. It can (imperfectly) invalidate works, but not validate them. It isn't just the public that seems to have this misunderstanding...

jltsiren · 2026-03-12T00:26:12 1773275172

Things are easier when you are writing to your peers within an established academic field. But all too often, the target audience includes people in neighboring fields. Then it can easily be that most people trying to replicate the work are non-experts.

For example, most of my work is in algorithmic bioinformatics, which is a small field. Computer scientists developing similar methods may want to replicate my work, but they often lack the practical familiarity with bioinformatics. Bioinformaticians trying to be early adopters may also try to replicate the work, but they are often not familiar with the theoretical aspects. Such a variety of backgrounds can be a fertile ground for misunderstandings.

godelski · 2026-03-12T03:34:17 1773286457

Sure. You can't write to everyone and there's tradeoffs to broadening your audience. But I'm also not sure what your point is. That people are arrogant? Such variety of backgrounds can also be fertile ground for collaboration. Something that should happen more often

jltsiren · 2026-03-12T04:27:11 1773289631

As a gross simplification, there are two kinds of fields. Some are defined by the methods they use, and some by the topics they study.

The latter will use any methods that may yield results. That creates a problem. The people who are in the target audience for a paper and may try to replicate the results often fail to do so, because they lack the expertise. Because their background is too different.

godelski · 2026-03-12T05:08:40 1773292120

I think you think that because we don't agree that I have some grave misunderstanding of some, to be frank, basic facts. I assure you, I perfectly understand what you're bringing up here and in the last comment.

But I think you still haven't understood my point about trade-offs. At least you aren't responding as if these exist.

Our disagreement isn't due to lack of understanding the conditions, it is due to a difference in acceptable limitations. After all, perfection doesn't exist.

So you can't just solve problems like this by bringing up limitations in an opposing viewpoint. I assure you, I was already well aware of every single one you've mentioned...

jltsiren · 2026-03-12T06:35:21 1773297321

My original point was that replication attempts often fail, because the person trying to replicate the result is not an expert in the field, and they do not have enough time to devote to the effort. This is a common situation in fields that use results from other fields. If they don't have the time for proper replication, they probably don't have the time for publishing the attempt.

As for your point, I don't really understand what you are trying to say.

dekhn · 2026-03-11T17:38:02 1773250682

Advanced groups usually replicate their competitor's results in their own hands shortly after publication (or they just trust their competitor's competence). But they don't spend any time publishing it unless they fail to replicate and can explain why they can't replicate. From their perspective, it's a waste of time. I think this has been shown to be a naive approach (given the high rate of image fraud in molecular biology) but people who are in the top of the field have strong incentives to focus on moving the state of the art forward without expending energy on improving the field as a whole.

MarkusQ · 2026-03-11T19:29:37 1773257377

"strong incentives to focus on moving the state of the art forward without expending energy on improving the field as a whole"

That sort of Orwellian doublethink is exactly the problem. They need to move it forward without improving it, contribute without adding anything, challenge accepted dogma without rocking the boat, and...blech!

godelski · 2026-03-11T20:10:28 1773259828

  > challenge accepted dogma without rocking the boat

I think the funniest part is how we have all these heroes of science who faced scrutiny by their peers, but triumphed in the end. They struggled because they challenged the status quo. We celebrate their anti authoritative nature. We congratulate them for their pursuit of truth! And then get mad when it happens. We pretend this is a thing of the past, but it's as common as ever[0,1].

You must create paradigm shifts without challenging the current paradigm!

[0] https://www.scientificamerican.com/article/katalin-karikos-n...

[1] https://www.globalperformanceinsights.com/post/how-a-rejecte...

RobotToaster · 2026-03-12T06:34:10 1773297250

"Science is the belief in the ignorance of experts" - Richard Feynman

Bratmon · 2026-03-11T18:44:28 1773254668

All that makes it more important for top journals to reward replication, not less!

jltsiren · 2026-03-11T19:00:41 1773255641

Top journals are not inherently prestigious. They are prestigious because they try to publish only the most interesting and most significant results. If they started publishing successful replication studies, they would lose prestige, and more interesting journals would eventually rise to the top. (Replication studies that fail to replicate a major result in a spectacular way are another matter.)

godelski · 2026-03-11T20:02:10 1773259330

Are you explaining this from experience or from speculation?

I can tell you that it doesn't match my own experience. I also think it doesn't match your example. Those cases of verified image fraud are typically part of replication efforts. The reason the fraud is able to persist is due to the lack of replication, not the abundance of it.

dekhn · 2026-03-11T20:16:42 1773260202

Mostly experience (based on being a PhD scientist, a postdoc, a National Lab scientist, and engineer at several bigtech companies), partly speculation (none of the groups/labs I worked in operated at "the highest level", but I worked adjacent to many of those).

I'm pretty sure most image fraud went completely unrealized even in the case of replication failure. It looks like (pre AI) it was mostly a few folks who did it as a hobby, unrelated to their regular jobs/replication work.

godelski · 2026-03-11T20:31:37 1773261097

In most of the labs I've worked in replication is not a common task[0]

  > 'm pretty sure most image fraud went completely unrealized even in the case of replication failure

Part of my point is that being unable to publish replication efforts means we don't reduce ambiguity in the original experiments. I was taught that I should write a paper well enough that a PhD student (rather than candidate) should be able to reproduce the work. IME replication failures are often explained with "well I must be doing something wrong." A reasonable conclusion, but even if true the conclusion is that the original explanation was insufficiently clear.

  > It looks like (pre AI) it was mostly a few folks who did it as a hobby

I'm sorry, didn't you say

  >>> Advanced groups usually replicate their competitor's results in their own hands shortly after publication

Because your current statement seems to completely contradict your previous one.

Or are you suggesting that the groups you didn't work with (and are thus speculating) are the ones who replicate works and the ones you did work with "just trust their competitor's competence")? Because if this is what you're saying then I do not think this "mostly" matches your experience. That your experience more closely matches my own.

[0] I should take that back. I started in physics (undergrad) and went to CS for grad. Replication could often be de facto in physics, as it was a necessary step towards progress. You often couldn't improve an idea without understanding/replicating it (both theoretical and experimental). But my experience in CS, including at national labs, was that people didn't even run the code. Even when code was provided as part of reviewing artifacts I found that my fellow reviewers often didn't even look at it, let alone run it... This was common at tier 1 conferences mind you... I only knew one other person that consistently ran code.

dekhn · 2026-03-11T21:13:09 1773263589

Note that my field is biophysics (quantitative biology) while yours is physics and CS. Those are done completely differently from biology; with the exception of some truly enormous/complex/delicate experiments that require unique hardware, physics tends to be much more reproducible than biology, and CS doubly-so.

Replication of an experiment and finding image fraud are kind of done as two different things. If somebody publishes a paper with image fraud, it's still entirely possible to replicate their results(!) and if somebody publishes a paper without any image fraud, it's still entirely possible that others could fail to replicate. Also, most image errors in papers are, imho, due to sloppy handling/individual errors, rather than intentional fraud (it's one of the reasons I worked so hard on automating my papers- if I did make an error, there should be audit log demonstrating the problem, and the error should be rectified easily/quickly in the same way we fix bugs in production at big tech).

This came up a bunch when I was at LBL because of work done by Mina Bissell there on extracellular matrix. She is actively rewriting the paradigm but many people can't reproduce her results- complex molecular biology is notororiously fickle. Usually the answer is, "if you're a good researcher and can't reproduce my work, you come to my lab and reproduce it there" because the variables that affect this are usually things in the lab- the temperature, the reagents, the handling.

See https://www.nature.com/articles/503333a (written by Dr. Bissell).

godelski · 2026-03-11T22:52:10 1773269530

  > physics tends to be much more reproducible than biology, and CS doubly-so.

With physics I think there is a better culture of reproduction, but that is, I believe, due more to culture. That it is acceptable to "be slow". There's a high stress on being methodical and extremely precise. The prestige is built on making your work bulletproof, and so you're really encouraged to help others reproduce your work as it strengthens it. You're also encouraged to analyze in detail and to faithfully reproduce, because finding cracks also yields prestige. I don't know if it's the money, but no one is in it for the money. Physics sure is a lot harder than anything else I've done and it pays like shit.

For CS the problem is wildly different. It should be easy to reproduce as code is trivial to copy. Ignoring the issue of not publishing code alongside results, there's also often subtle things that can make or break works. I've found many times in replication efforts that the success can rely on a single line that essentially comes form a work that was the reference to a reference of the work I'm trying to reproduce. The problem here is honestly more of laziness. In contrast to physics there's an extreme need for speed. In physics (like everyone else I knew) I often felt like I was not smart enough, and that encouraged people to dive deeper and keep improving or to give up. In CS (like everyone else I knew) I often felt like I was not fast enough, and that encouraged people to chase sponsorships from labs that provided more compute, it encouraged a "shotgun" approach (try everything), or for people to give up (aka "GPU poor").

The reason I'm saying this is because I think it is important to understand the different cultures and how replication efforts differ. In physics a replication failure was often assumed to be due to a lack of intelligence. In CS a replication effort is seen as a waste of time. Both are failures of the scientific process. Science is intended to be self-correcting. Replication is one means of this, but at its heart is the pursuit of counterfactual models. This gives us ways to validate, or invalidate, models through means other than direct replication. You can pursue the consequences of the results if you are unable to pursue the replication itself. This is almost always a good path to follow as it is the same one that leads to the extension and improvement of understanding.

There's a lot I agree and disagree with from Dr Bissell's article. Our perspectives may differ due to our different fields, but I do think it also serves as some a point of collaboration, if not on the subject of meta-science. Biology is not unique in having expensive experiments. I want to point out two famous and large physics projects: the LHC's discovery of the Higgs Boson[0] and LIGO's Observation of a Gravitational Wave[1]. The former has 9 full pages of authors (IIRC over 200) while the latter has about 3. These works are both too expensive to replicate while also demonstrating replication. Certainly we aren't going to take another 2 decades to build another CERN and replicate the experiments. But there's an easy to miss question that might also make apparent the existence of replication: who is qualified to review the paper and is not already an author of it? There's definitely some, but it really isn't that many. In these mega projects (and there are plenty more examples) the replication is done through collaboration. Independent teams examine the instruments that make the measurements. Independent teams make measurements, using the same device or different devices (ATLAS isn't the only detector at CERN), different teams independently analyze and process the information, and different teams model and simulate them. With LIGO this is also true. It would be impossible to locate those black holes without at least 2 facilities: one in Hanford (Washington) and the other in Livingston (Louisiana) (and now there's even more facilities). Astrophysics has a long history of this type of replication/collaboration as one team will announce an observation and it is a request for other observations. Observations that often were already made! In HEP (high energy particle physics) this may be less direct, but you'll notice other particle physics labs are in the author list of[0]. That's because despite the exact experiment not being replicatable in other facilities, there are still other experiments done. In the effort to find the Higgs there were many collisions performed at Fermi Lab.

I don't think this same in biophysics, but I think there are nuggets that may be fruitful. Bissell mentions at the end of her argument that she believes replication might have higher success were labs to send scientists to the original labs. I fully agree! That would follow the practice we see in these mega experiments in physics. But I also do think she's brushing off an important factor: it is far quicker and cheaper to replicate works than it is to produce them. You're a scientist, you know how the vast majority of time (and usually the vast majority of money) is "wasted" in failures (it'd be naive to call it waste). Much of this goes away with replication efforts. The greater the collaboration the greater the reduction in time and money.

And I do agree with Bissell in that we probably shouldn't replicate everything[2]. At least if we want to optimize our progress. But also I want to stress that there is no perfect system and there are many roadblocks to progress. Frankly, I'd argue that we waste far more time in things like grant writing and publication revisions. I don't know a single scientist who hasn't had a work rejected due to reviewers either not giving the work enough care or simply because they were unqualified (often working in a different niche so don't understand the minutia of the problem). As for the grant writings, I think they're a necessary evil but I'm also a firm believer of what Mervin Kelly (former director of Bell Labs) said when asked how you manage a bunch of geniuses: "you don't"[3]. You're a scientist, an expert in your domain. You already know what directions to look in. You've only gotten this far because you've been honing that skill. We don't have infinite money, so of course we have to have some bar, but we can already sniff out promising directions and we're much better at sniffing out fraud. Science has been designed to be self-correcting.

[More of a side note]

  > Usually the answer is, "if you're a good researcher and can't reproduce my work, you come to my lab and reproduce it there" because the variables that affect this are usually things in the lab- the temperature, the reagents, the handling.

And we should not undermine the importance of these variables. Failures based on them are still informative. They still inform us about the underlying causal structure that leads to success. If these variables were not specified in the paper, then a replication failure shows the mistake of the writing. Alternatively a failure can bound these variables, by making them more explicit. I'm no expert in biophysics, but I'm fairly certain that understanding the bounds of the solution space is important for understanding how the processes actually work.

[0] https://arxiv.org/abs/1207.7214

[1] https://arxiv.org/abs/1602.03837

[2] I also would be very cautious about paid replication efforts. I am strongly against it as well as paywalls on publishing (both in creation of publication as well as the access of).

[3] https://1517.substack.com/p/why-bell-labs-worked

dekhn · 2026-03-11T16:37:05 1773247025

The story has been updated with new data. It looks like (again, based on unofficial information sources) that the US military believes it made the mistake.

https://www.nytimes.com/2026/03/11/us/politics/iran-school-m...

The times is quite good at representing information accurately and "analysis suggests" is a clearly worded way of them saying "we're pretty sure but we don't have enough evidence to say for certain".

dekhn · 2026-03-11T16:29:24 1773246564

I had a somewhat similar experience- was a postdoc for a pre-tenure professor at berkeley. after writing up a paper based on her methods, with poor results, I handed the draft to her. She rewrote it- basically adding carefully worded/presented results that made it look as good as possible. And then submitted it (to a niche conference where the editor was a buddy of hers). When I read her submission I asked her to remove my name from it and she immediately withdrew the submission. I left her lab shortly after because I am not going to tarnish my publication record with iffy papers like that.

Over time I learned that most papers in my field (computational biology) are embellished to some extent or another (or cherry-picked/curated/structured for success) and often irreproducible- some key step is left out, or no code is provided that replicates the results, etc. I can see this from two perspectives:

1) science should be trivially reproducible; it should not require the smartest/most capable people in the field to read the paper and reproduce the results. This places a burden on the people who are at the state of the art of the field to make it easy for other folks, which slows them down (but presumably makes overall progress go faster).

2) science should be done by geniuses; the leaders in the field don't need to replicate their competitors paper. it's sufficient to read the paper, apply priors, and move on (possibly learning whatever novel method/technique the paper shows so they can apply it in their own hands). It allows the field innovators to move quickly and discover new things, but is prone to all sorts of reliability/reproducibility problems, and ideally science should be egalitarian, not credentials-based.

dekhn · 2026-03-10T16:40:02 1773160802

I wish he'd stop saying they solved the protein folding problem. They didn't. They made a step increase in the quality of protein structure prediction. That's a different (and simpler) problem. It's still a significant result, but nothing in AlphaFold tells us "why" (in a physics sense) a specific amino acid sequence adopts its final folded conformation, nor does it say anything about what the protein is doing once it reaches the final folded conformation.

dekhn · 2026-03-10T03:01:48 1773111708

It's a variation of nerd snipe. https://xkcd.com/356/

People get taken by the theoretical coolness and ultimate utility of the idea, and assume it's just a matter of clever ideas and engineering to make it a reality. At some point, it becomes mandatory to work on it because the win would be so big it would make them famous and win all sorts of prizes and adulation.

QC is far earlier than "linear regression" because linear regression worked right away when it was invented (reinvented multiple times, I think). Instead, with QC we have: an amazing theory based on our current understanding of physics, and the ability to build lab machines that exploit the theory, and some immediate applications were a powerful enough quantum computer built. On the other side, making one that beats a real computer for anything other than toy challenges is a huge engineering challenge, and every time somebody comes up with a QC that does something interesting, it spurs the classical computing folks to improve their results, which can be immediately applied on any number of off-the-shelf systems.

antonvs · 2026-03-10T09:29:57 1773134997

> People get taken by the theoretical coolness and ultimate utility of the idea, and assume it's just a matter of clever ideas and engineering to make it a reality. At some point, it becomes mandatory to work on it because the win would be so big it would make them famous and win all sorts of prizes and adulation.

Good description. Commercial fusion power seems to be in the same category currently.

The next step once you have enough thinkers working on the problem is to start pretending that commercial success is merely a few years away, with 5 or 10 years being the ideal number.