Hacker News | shiftingleft's comments

If there's anywhere I don't want LLM slop it's probably my database system.

Shouldn't your snippet be using lzcnt? I can't see how this would result in the desired lookup.

For Zen 5, rustc generates the following:

  utf8_sequence_length_lookup:
    shl edi, 24
    mov ecx, 274945
    not edi
    lzcnt eax, edi
    shl al, 2
    shrx rax, rcx, rax
    and al, 7
    ret
https://rust.godbolt.org/z/hz1eKjnaG
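For reference, here is a plausible Rust source that matches that asm — my reconstruction, not necessarily the original code. The constant 274945 loaded into ecx is 0x43201, a packed table of 3-bit entries indexed by the number of leading one bits:

```rust
// Hedged reconstruction of the function behind the asm above (not
// necessarily the original source). 0x43201 == 274945 packs a table
// mapping "leading ones of the first byte" to UTF-8 sequence length:
// 0 ones -> 1 byte (ASCII), 1 -> 0 (continuation byte, invalid as a
// leading byte), 2 -> 2, 3 -> 3, 4 -> 4.
fn utf8_sequence_length_lookup(first_byte: u8) -> u32 {
    const TABLE: u64 = 0x43201; // == 274945

    // Move the byte to the top of a u32 so leading_ones() counts its
    // leading one bits; this corresponds to the shl/not/lzcnt in the asm.
    let n = ((first_byte as u32) << 24).leading_ones();

    // Select the n-th nibble and mask to 3 bits (the shrx + "and al, 7").
    ((TABLE >> (n * 4)) & 7) as u32
}
```

So the lzcnt is counting leading ones (via the `not`), not zeros, and the shift/mask is the table lookup.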


I can't see any elegant solution.

  Struggling proc
    lzcnt ecx, ecx
    test cl, cl
    setz al
    lea edx, [ecx-2] ; [0,1,2]
    cmp rdx, 2
    cmovle eax, ecx
    ret
  Struggling endp


and the leading count comes in cl

We can assume the count has already been done.


Ah I can't read, thanks :-)


  After we implemented advanced bot traffic detection and filtering, their reported traffic plummeted by 71%. [...]
  But then the sales report came in. Their actual sales went up by 34%.
  Their real conversion rate optimization (CRO) efforts had been working all along, but the results were buried under an avalanche of fake clicks. They were not bad at marketing; they were just spending thousands of dollars advertising to robots programmed never to buy anything. Their marketing ROI went from "terrible" to "excellent" overnight.
I don't understand how detecting bot traffic would directly lead to less ad spend.

Can you just tell e.g. Google Ads that you don't want to pay for certain clicks?

Did they modify their targeting to try to avoid bots?


I could imagine that blocking bot traffic would improve their retargeting and ensure that the retargeting budget is spent on real people, leading to an increase in conversions.


What's the API here for Google Ads? How does their site report to Google Ads whether that was a good/bad user? Is this done through conversion tracking? If so, why would you track anything but a completed purchase in the first place?


I think Google calls it remarketing and it goes through Google Tag Manager. You can "tag" visitors how you want (duration, action, page scroll, etc.). It's just a JavaScript call to the API, which you can trigger however you like.

You wouldn't necessarily want to track conversions for retargeting, since depending on your product or service, a second purchase might be unlikely. But someone who checks out multiple product pages or articles on your site might be interested and buy in the near future. Those are, of course, also actions bots could easily perform.


If you're building look-alike or remarketing audiences, having any bot users in there could give the wrong signal to Facebook or other platforms.

>Can you just tell e.g. Google Ads that you don't want to pay for certain clicks?

No


Sites send conversion events back to Google so they can target highly converting traffic.

If a bot network hits all the conversion events, then Google will tailor the traffic to look more like the bot network.

If you filter the bot traffic out then Google can tailor the traffic to look like real converting users instead.


I assume it's the filtering - detect the user is a bot, don't even load the ads, etc?


As I understand it they are placing ads on other sites and are paying for visits to their site.


You didn't think this through, did you?

How would you do that on Google or a third-party site?


The author admits to it.

https://www.reddit.com/r/rust/comments/1mh7q73/comment/n6uan...

The reply to that comment is also a good explainer of why the post has such a strong LLM smell for many.


Yeah, I completely agree with that reply, thanks for the link.

BTW that Reddit post also has replies confirming my suspicions that the technical content wasn't trustworthy, if anyone felt like I was just being snobby about the LLM writing: https://www.reddit.com/r/rust/comments/1mh7q73/comment/n6ubr...


The easiest way to parallelize this RNG is to just run it in parallel on multiple states.
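A minimal sketch of that pattern, making no assumptions about the specific generator in the article — SplitMix64 here is just an illustrative stand-in, and the names `splitmix64`/`parallel_streams` are mine:

```rust
use std::thread;

// SplitMix64 step function: an illustrative stand-in for whatever
// RNG is being parallelized (the pattern, not the generator, matters).
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

// Run several independent streams in parallel: one state per thread,
// each seeded differently, with no shared mutable state to contend on.
fn parallel_streams(n_threads: u64, draws_per_thread: usize) -> Vec<Vec<u64>> {
    (0..n_threads)
        .map(|i| {
            thread::spawn(move || {
                let mut state = 0x1234_5678 ^ i; // per-thread seed
                (0..draws_per_thread).map(|_| splitmix64(&mut state)).collect()
            })
        })
        .collect::<Vec<_>>() // spawn all threads first
        .into_iter()
        .map(|h| h.join().unwrap())
        .collect()
}
```

Each stream is deterministic given its seed, so results are reproducible regardless of thread scheduling.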


Do they help deter people from becoming smokers in the first place?


Not sure if much serious research has been put into it. I would be suspicious of it deterring them because a lot of initial smoking happens in social situations where friends pass out individual cigarettes.

By the time someone buys their own pack they are probably hooked.

I suspect the obscene taxes pricing out young folks are one of the most effective strategies.


I doubt that this is a problem in need of a technical solution. In any case, this system can easily be circumvented by emulating the key presses on that website.


Looking at a few files, there are definitely some generated comments in there. Do you have any method to quantify how much of it is (likely) generated?


> Shouldn't be too hard to do even with pen and paper since the 2-adic eval of 52! is large.

Could you elaborate?


https://en.wikipedia.org/wiki/P-adic_valuation

It's nothing fancy: get the prime power decomposition of your number and pick the exponent of p.

There's a clever way to do that for a factorial, but I have the Pari/GP app on my phone so I just did:

    valuation(52!,2)
which gives the answer 49, so 52! is divisible by 2 forty-nine times. Interestingly, chatgpt4 turbo got it right with no extra prodding needed.


One can do this mentally easily enough. 52! = 52·51·50·…·3·2·1, and there are 26 even numbers in this product, so we have 26 powers of 2. Taking them out, 13 of those even factors are divisible by 4, but we already took out the first 2 from each, so we have 13 more 2's, giving 26 + 13 = 39. Now on to factors divisible by 8: they are half of those divisible by 4, so half of 13, giving 6 more 2's (52 is not divisible by 8, so we round down). Thus so far we have 39 + 6 = 45 twos in the factorization of 52!.

On to numbers up to 52 that are divisible by 16: that's half of those divisible by 8, so 3 more, getting us to 48. Finally there is the factor 32 = 2^5 of 52!, giving us one more 2, hence 49.

In general, for p a prime, the largest k such that p^k divides n! is given by k = Floor(n/p) + Floor(n/p^2) + ... + Floor(n/p^t), where p^(t+1) > n.
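That closing formula (Legendre's formula) is mechanical enough to code up; a small sketch, with `factorial_valuation` as my own name for it:

```rust
// Legendre's formula: the exponent of prime p in n! is
// floor(n/p) + floor(n/p^2) + ..., stopping once the power exceeds n.
fn factorial_valuation(n: u64, p: u64) -> u64 {
    let mut total = 0;
    let mut power = p;
    loop {
        total += n / power; // multiples of p^k contribute one factor each
        match power.checked_mul(p) {
            // continue while the next power still fits under n
            Some(next) if next <= n => power = next,
            _ => return total,
        }
    }
}
```

For n = 52, p = 2 this walks 26 + 13 + 6 + 3 + 1 and returns 49, matching the hand computation above.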


Does not seem right, the number is way too low.. after all, just the last factor (52) can be divided by 2 at least 5 times.

My calculator says 225 bits, and the text suggests the same. Looks like chatgpt4 was wrong as usual :)


The 2-adic valuation is about how often 2 is a prime factor of a number.

For 52, for example, 2 is a prime factor twice, because (52/2)/2 = 13, which is no longer divisible by 2.

Or in other words 52! / (2^49) is an integer, but 52! / (2^50) is not, thus 49 is the correct answer.


When you see something that doesn't look right it's good to engage and work things out, but it's also courteous to check that you haven't misunderstood. I see how you could arrive at the understanding you had, that "how many times you can divide by 2" is equivalent to base-2 logarithm. It's not the right interpretation however, and in context it's clear.

Could I recommend phrasing this kind of comment as a question in future? (Notwithstanding the lifehack that making a false statement on the internet is the shortest path to an answer.)


Fair, I should have rephrased the comment to more directly reference the thread-starter, which is encoding bits "using lexicographic order of the permutation, doing a binary search each time." It's not that your computation of the 2-adic valuation is wrong; it's that the idea of using the 2-adic decomposition produces a number that is too low.

Let me elaborate:

I am not 100% sure what user qsort meant by "binary search", but one of the simplest manual algorithms I can think of is to use input bits as decision points in binary-search-like input state split: you start with 52 cards, depending on first input bit you take top or bottom half of the set, then use 2nd input bit to select top or bottom of the subset, and so on, repeat until you get to a single card. Then place it in the output, remove from input stack, and repeat the procedure again. Note there is no math at all, and this would be pretty trivial to do with just pen & paper.

What would be the resulting number of bits encoded this way? With 52 cards, you'd need to consume 5 to 6 bits, depending on input data. Once you are down to 32 cards, you'd need 5 bits exactly, 31 cards will need 4-5 bits depending on the data, and so on... If I've calculated this correctly, that's at least 208 bits in the worst case, way more than the 51 bits mentioned above.

(Unless there is some other meaning to "51" I am missing? But all I see in the thread are conversations about bit efficiency...)
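The split procedure described above could be sketched roughly like this — my reading of it, not necessarily what qsort meant; `deal_by_bits` is a hypothetical name, and bit 1 arbitrarily means "take the top half":

```rust
// Hedged sketch of the split described above: consume input bits to
// binary-search the remaining deck down to one card, deal that card
// to the output, and repeat until the deck is exhausted.
fn deal_by_bits(bits: &mut impl Iterator<Item = bool>) -> Vec<u8> {
    let mut remaining: Vec<u8> = (0..52).collect();
    let mut dealt = Vec::with_capacity(52);
    while remaining.len() > 1 {
        let (mut lo, mut hi) = (0, remaining.len());
        // narrow the candidate range until one card is selected
        while hi - lo > 1 {
            let mid = (lo + hi) / 2;
            // bit 1 -> top half; bit 0 (or exhausted input) -> bottom half
            if bits.next().unwrap_or(false) { lo = mid } else { hi = mid }
        }
        dealt.push(remaining.remove(lo));
    }
    dealt.push(remaining.remove(0));
    dealt
}
```

With an all-zero input stream this always picks the bottom card, so the output is the deck in its original order; any other bit pattern yields a different permutation.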


To be clear I agree with your interpretation about how much data you can store in the deck permutation and how to search it, my previous comment was only about p-adic valuations. I can't actually see how the 49 is relevant either.


And 50 of the factors of 52! are greater than 2.


The basic method would be to assign a number, 0 through 52!-1, to each permutation in lexicographic order. Because 52! is not a power of 2, if you want to encode binary bits, you can only use 2^N permutations, where that number is the largest power of 2 less than 52!. You can not losslessly encode more than N bits, that's a hard bound, they just won't fit.

If you wanted to turn this into an actual protocol, you would presumably flag some permutations as invalid and use the other ones. You would then encode one bit at a time doing a binary search of the set of valid permutations.

Because 52! has a large number of 2s in its factorization, for a careful choice of the valid permutations it should be practical (or at least not significantly more impractical than the OP's proposed method) to perform this by hand because you would be able to "eyeball" most splits of the binary search.
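The numbering step in the first paragraph is the factorial number system (Lehmer code). A sketch, using u128 so it only covers decks up to about 26 cards — the full 52! range needs a bignum, and `index_to_permutation` is my name for it:

```rust
// Hedged sketch: map an index in 0..n!-1 to the permutation with that
// lexicographic rank, via the factorial number system (Lehmer code).
// u128 limits this to n <= 26 or so; 52! itself needs a bignum.
fn index_to_permutation(mut index: u128, n: usize) -> Vec<usize> {
    // fact[i] = i!
    let mut fact = vec![1u128; n];
    for i in 1..n {
        fact[i] = fact[i - 1] * i as u128;
    }
    let mut items: Vec<usize> = (0..n).collect();
    let mut perm = Vec::with_capacity(n);
    // peel off factorial-base digits from the most significant down
    for i in (0..n).rev() {
        let digit = (index / fact[i]) as usize;
        index %= fact[i];
        perm.push(items.remove(digit));
    }
    perm
}
```

Index 0 gives the identity ordering and n!-1 gives the fully reversed one, so restricting to indices below 2^N gives exactly the N-bit encoding bound described above.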


Their talk was quite nice; they talk about their experiences with other HSMs, their history, what led them to design their own, and the many aspects of their design, and then go through potential attacks:

https://youtu.be/zD5EdvGs98U?t=13m23s

