Kind of a weird leaderboard if you can't even check whether answers are correct. I've seen plenty of leaderboards where the top answers would fail cases that weren't covered by the leaderboard's testing.
I don't think it's a big deal, because the input generators are usually pretty good, and because most solutions that trade some correctness for performance can cheaply detect when they got it wrong and fall back to a slow path that starts over with a different approach. That fallback matters because scoring is the median runtime of 3 successful runs (so if you fail one of the 3, your submission fails completely). It also means that probabilistic solutions that would be correct in non-adversarial environments (like a Bloom filter, or N Bloom filters tuned to give an exact result with probability >99.9% given the input constraints) are admissible.
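To make the fallback idea concrete, here's a rough sketch in Python. Everything in it is made up for illustration (the distinct-count task, the function names, the table size, the `score` helper); it is not the competition's harness or anyone's actual submission. The fast path skips collision handling, notices when that assumption breaks, and hands off to a slow exact path; `score` mirrors the "median of 3 successful runs" rule described above.

```python
from statistics import median

_EMPTY = object()  # sentinel for an unused slot (hypothetical helper)


def count_distinct_fast(values, table_size=1 << 16):
    """Fast path: one slot per hash bucket, no collision handling.

    Returns None as soon as two different values land in the same slot,
    i.e. when the shortcut can no longer be trusted.
    """
    slots = [_EMPTY] * table_size
    count = 0
    for v in values:
        i = hash(v) % table_size
        if slots[i] is _EMPTY:
            slots[i] = v
            count += 1
        elif slots[i] != v:
            return None  # collision detected, give up on the fast path
    return count


def count_distinct_slow(values):
    """Slow path: always exact."""
    return len(set(values))


def count_distinct(values):
    """Try the fast path first; start over on the slow path if it bails."""
    result = count_distinct_fast(values)
    if result is None:
        result = count_distinct_slow(values)
    return result


def score(runtimes_ms, successes):
    """Median runtime of 3 runs, or None if any of the 3 runs failed."""
    if len(runtimes_ms) != 3 or not all(successes):
        return None
    return median(runtimes_ms)
```

With this shape, an unlucky input only costs extra time on that run instead of producing a wrong answer, which is what you want when a single failed run disqualifies the whole submission.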
This particular solution does satisfy correctness, although I can't share the source code (as the competition is ongoing). Feel free to provide your input data and I'll run it to compare with your expected result.