
A friend of mine co-runs a semi-popular, semi-niche news site (for more than a decade now), and complains that traffic has recently risen with bots masquerading as humans.

How would they know? Well, because Google, in its omniscience, started to downrank them for faking views with bots (which they do not do): it shows bot percentage in traffic stats, and it skyrocketed relative to non-bot traffic (which is now less than 50%) as they started to fall from the front page (feeding the vicious circle). Presumably, Google does not know or care it is a bot when it serves ads, but correlates it later with the metrics it has from other sites that use GA or ads.

Or, perhaps, Google spots the same anomalies that my friend (an old-school sysadmin who pays attention to logs) did, such as the increase in traffic along with never-before-seen popularity among iPhone users (who are so tech savvy that they apparently do not require CSS), or users from Dallas who famously love their QQBrowser. I’m not going to list all the telltale signs, as the crowd here is too hyped on LLMs (which is our going theory so far; it is very timely), but my friend hopes Google learns them quickly.

These newcomers usually fake the UA, use inconspicuous Western IPs (requests from Baidu/Tencent data center ranges do sign themselves as bots in the UA), ignore robots.txt, and load many pages very quickly.
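FWIW, the "load many pages very quickly" part is easy to spot mechanically. A minimal sketch (the thresholds and the pre-parsed `(ip, timestamp)` input are assumptions; tune for your own logs):

```python
from collections import defaultdict, deque

def find_bursty_ips(records, max_hits=30, window_secs=60):
    """Flag IPs exceeding max_hits requests inside any sliding window.

    `records` is an iterable of (ip, unix_timestamp) pairs, assumed to be
    pre-parsed from an access log and sorted by time per IP.
    """
    windows = defaultdict(deque)
    flagged = set()
    for ip, ts in records:
        q = windows[ip]
        q.append(ts)
        # Drop hits that have fallen out of the window.
        while ts - q[0] > window_secs:
            q.popleft()
        if len(q) > max_hits:
            flagged.add(ip)
    return flagged
```

No single heuristic is conclusive on its own, but combined with the UA and robots.txt signals it narrows things down quickly.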

I would assume the bot traffic increase also applies to feeds, since they are just as useful for LLM training purposes.

My friend does not actually engage in stringent filtering like Rachel does, but I wonder how soon it becomes actually infeasible to operate a website with actual original content (which my friend co-writes) without either that or resorting to Cloudflare or the like for protection, given the domination of these creepy-crawlies.

Edit: Google already downranked them, not threatened to downrank. Also, traffic rose but did not skyrocket, but relative amount of bot traffic skyrocketed. (Presumably without downranking the traffic would actually skyrocket.)



Are you saying that Google down-ranked them in search engine rankings for user behaviour in AdWords? Isn't that an abuse of monopoly? It still surprises me a little bit.


Who's going to call them on it if it is?


Yeah, but then who is going to stop them acting monopolistic?

New administration is going to be monopoly friendly.

I was honestly pleased that Gaetz was nominated for AG solely because he's big on antitrust. Or has been.


Any sentiment expressed by the party which has dedicated itself to unrestricted corporate rights in this direction is an insincere attempt to pander to whatever culture war front they are fighting that week; in this case, likely something along the lines of 'Twitter censored Trump's hydroxychloroquine post - we MUST PUNISH THEM AND REIN IN BIG TECH [for not contributing to the fascist project]'.

EDIT: Direct quote - "The internet's hall monitors out in Silicon Valley, they think they can suppress us, discourage us. Maybe if you're just a little less patriotic. Maybe if you just conform to their way of thinking a little more, then you'll be allowed to participate in the digital world,"

This isn't an attempt to ensure freedom from monopoly, this is an attempt to enforce partisan control of the message, weaponizing the idea of free speech using force.

I can assert that the 'common public square' idea central to freedom of speech is disappearing, and that this is a bad thing, but that's not what this man has been arguing or why this man has chosen this issue.


If you believe their words (and I can't blame anyone who doesn't), apparently they want to lighten regulations on everything except big tech. So there may be a chance all those Google/Amazon cases will keep going on into the Trump administration.


To be clear this isn't because they have a problem with monopoly businesses abusing consumers. It's because big tech exercised their First Amendment rights in ways he found undesirable.

https://www.bbc.com/news/world-us-canada-57754435

Note that he's still talking about breaking up tech companies but not... X? (Surely that will resume once he and Elon have a falling out)


It's not that hard to dominate bots. I do it for fun, I do it for profit. Block datacenters. Run bot motels. Poison them. Lie to them. Make them have really really bad luck. Change the cost equation so that it costs them more than it costs you.

You're thinking of it wrong, the seeds of the thinking error are here: "I wonder how soon it becomes actually infeasible to operate a website with actual original content".

Bots want original content, no? So what's the problem with giving it to them? But that's the issue, isn't it? Clearly, contextually, what you should be saying is "I wonder how soon it becomes actually infeasible to operate a website for actual organic users" or something like that. But phrased that way, I'm not sure a CDN helps (I'm not sure they don't suffer false positives that interfere with organic traffic when they intermediate; it's more security theater, because hangings and executions look good, look at the numbers of enemy dead).

Take measures that any damn fool (or at least your desired audience) can recognize.

Reading for comprehension, I think Rachel understands this.


what is a bot motel and how do you run one?


An easy way is to implement, e.g., a 4xx handler which serves content with links that generate further 4xx errors, rewriting the status code to something like 200 before sending it to the requester. Load the garbage pages up with... garbage.
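A minimal stdlib sketch of the idea (the /motel/ path and page shape are made up; a real deployment would hang this off your server's error handling). Every would-be 404 answers 200 with a deterministic junk page whose links only lead to more junk:

```python
import hashlib
from http.server import BaseHTTPRequestHandler, HTTPServer

def garbage_page(path, n_links=5):
    """Deterministic junk: same path, same junk, so it's cheap to serve."""
    seed = hashlib.sha1(path.encode()).hexdigest()
    links = "".join(
        '<a href="/motel/{}">{}</a>'.format(
            hashlib.sha1((seed + str(i)).encode()).hexdigest()[:12],
            seed[i:i + 8],
        )
        for i in range(n_links)
    )
    return "<html><body><p>{}</p>{}</body></html>".format(seed * 4, links)

class MotelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Anything that would normally 404 answers 200 with junk links,
        # so a crawler keeps descending instead of backing off.
        body = garbage_page(self.path).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To run: HTTPServer(("127.0.0.1", 8080), MotelHandler).serve_forever()
```

Because the labyrinth is derived from hashes, you never store it; it's generated on the fly, so the cost asymmetry favors you.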


Since this is getting upvoted, I will put forth a suggestion I've made to the people who've paid me to help with this sort of subterfuge: turn your 404 handler into search. Then a human who goes there has a way out. But absolutely, load it up with garbage and broken links.


Thanks, and you can make money with this? Sorry I'm a total noob in this area.


Not really... You cost the bots money.

Many are trying to index the web for whatever reason. By feeding them a Library of Babel, you can clog up their storage with noise.


Once in a while people pay you to do something you enjoy doing, like making people cry and wish they had a job flipping burgers instead. But I do it on my own systems for fun, honestly.


The idea is that bots are inflexible to deviations from accepted norms and can't actually "see" rendered browser content. So your generic 404 and 403 error pages return a 200 status instead, with invisible links to other inaccessible pages. The bots will follow the links but real users will not, trapping them in a kind of isolated labyrinth of recursive links (the URLs should be slightly different, though). It's basically how a lobster trap works, if you want a visual metaphor.

The important part here is to do this chaotically. The worst sites to scrape are buggy ones. You are, in essence, deliberately following bad practices in a way real users wouldn't notice but that would still influence bots.
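A sketch of the trap entrance (the /archive/ prefix and link count are placeholders): links hidden with CSS that real users never see or click, with URLs derived from the page's own URL so every page's entrances differ slightly:

```python
import hashlib

def trap_links(page_url, n=3):
    """Hidden labyrinth entrances: invisible to rendered-browser users,
    but a naive crawler parsing raw HTML will follow them."""
    h = hashlib.sha1(page_url.encode()).hexdigest()
    return "".join(
        '<a href="/archive/{}" style="display:none">.</a>'.format(
            h[i * 4:i * 4 + 10]
        )
        for i in range(n)
    )
```

Append the output to real pages; anything that requests one of those /archive/ URLs has outed itself and can be routed into the motel.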


QQBrowser users from Dallas are more likely to be Chinese using a VPN than bots, I would guess.


That much is clear, yeah. The VPN they use may not be a service advertised to the public and featured in lists, however.

Some of the new traffic did come directly from Tencent data center IP ranges and reportedly those bots signed themselves in UA. I can’t say whether they respect robots.txt because I am told their ranges were banned along with robots.txt tightening. However, US IP bots that remain unblocked and fake UA naturally ignore robot rules.


> The VPN they use may not be a service advertised to public and featured in lists, however.

Well, of course not, since the service is illegal.


I'm seeing some address ranges in the US clearly serving what must be VPN traffic from Asia, and I'm also seeing an uptick in TOR traffic looking for feeds as well as WP infra.


At my company we have seen a massive increase in bot traffic since LLMs became mainstream. Blocking known OpenAI and Anthropic crawlers has decreased traffic somewhat, so I agree with your theory.
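For the well-behaved subset, the published crawler names can be refused in robots.txt; GPTBot, ClaudeBot, and CCBot are the documented user-agent tokens for OpenAI, Anthropic, and Common Crawl respectively (check the vendors' own docs for the current list). A minimal fragment:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```

The masquerading bots discussed upthread ignore robots.txt entirely, so this only trims the honest traffic; the rest needs UA/IP filtering at the server.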


I don’t think it’s a bot thing. Traffic is down for everyone and especially smaller independent websites. This year has been really rough for some websites.


I think it's also because a lot of sites have started paywalling. So users walk away.


I too found an extremely unlikely % of iPhone users when checking access logs.


> who are so tech savvy that they apparently do not require CSS

Lmao!


Here's Crime^H^H^H^H^(ahem)Cloudflare requesting assets from one of my servers. I don't use Cloudflare; they have no business doing this.

  104.28.42.8 - - [21/Dec/2024:13:58:35 -0800] consulting.m3047.net "GET /apple-touch-icon-precomposed.png HTTP/1.1" 404 980 "-" "NetworkingExtension/8620.1.16.10.11 Network/4277.60.255 iOS/18.2"
  104.28.42.8 - - [21/Dec/2024:13:58:35 -0800] consulting.m3047.net "GET /favicon.ico HTTP/1.1" 200 302 "-" "NetworkingExtension/8620.1.16.10.11 Network/4277.60.255 iOS/18.2"
  104.28.42.8 - - [21/Dec/2024:13:58:35 -0800] consulting.m3047.net "GET /dubai-letters/balkanized-internet.html HTTP/1.1" 200 16370 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0"
  104.28.42.8 - - [21/Dec/2024:13:58:35 -0800] consulting.m3047.net "GET /apple-touch-icon.png HTTP/1.1" 404 980 "-" "NetworkingExtension/8620.1.16.10.11 Network/4277.60.255 iOS/18.2"

  # dig -x 104.28.42.8

  ; <<>> DiG 9.12.3-P1 <<>> -x 104.28.42.8
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 35228
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

  ;; OPT PSEUDOSECTION:
  ; EDNS: version: 0, flags:; udp: 1280
  ; COOKIE: 6b82e88bcaf538fc7ab9d44467685e82becd47ff4492b1be (good)
  ;; QUESTION SECTION:
  ;8.42.28.104.in-addr.arpa.      IN      PTR

  ;; AUTHORITY SECTION:
  28.104.in-addr.arpa.    3600    IN      SOA     cruz.ns.cloudflare.com. dns.cloudflare.com. 2288625504 10000 2400 604800 3600

  ;; Query time: 212 msec
  ;; SERVER: 127.0.0.1#53(127.0.0.1)
  ;; WHEN: Sun Dec 22 10:46:26 PST 2024
  ;; MSG SIZE  rcvd: 176
Further osint left as an exercise for the reader.


104.28.42.0/25 is one of the IP ranges used by Apple's Private Relay (via Cloudflare).

https://github.com/hroost/icloud-private-relay-iplist/blob/m...

(There is also a list of ranges on Apple's site, but I forget where…)

Edit: found it https://mask-api.icloud.com/egress-ip-ranges.csv


What is the issue with this request?


> What is the issue with this request?

I didn't realize this was an Apple thing, but that's fine. It changes the color of the horse and the name of the river, but the same road leads to the same destination.

1) There is a notion that Cloudflare is a content distribution network. The risk profile for a content distribution network is different from a VPN service. Now I know it's a VPN service (or is it?). Changes it from "seems weird and inappropriate" to "do I care about people relying on this? no, probably not". Cloudflare can't be arsed to provide reverse DNS for something which is clearly not part of their CDN, or is it?

1.5) Is it layer 2 or application? Cloudflare runs a CDN. Correct me if I'm wrong, but the CDN is a reverse proxy, is it not? Is Cloudflare caching my website's content? Can they observe it? (It's surprisingly hard to find a solid explanation, but they talk about "proxies" and "decrypts the name of the website you requested", and none of that adds clarity; it makes it sound more like "believe what we want you to believe".)

2) I don't block incoming SYNs from Cloudflare (yet) the way I do with Amazon, and this traffic per se isn't going to trip any mitigations here. But not all of the traffic is as benign (and it's impressive that they're so technically savvy they don't need the CSS as noted elsewhere). Presumably those exit points are shared by multiple customers. Did I mention I block all incoming SYNs from Amazon?


> and it's impressive that they're so technically savvy they don't need the CSS as noted elsewhere

With the logs you provided, they appear to be coming from within iMessage.

So when someone posts a link in iMessage, it will fetch the favicon(s) and the HTML in order to generate a “preview” of the page with the title of the page and one of the favicons. It doesn’t need to fetch any CSS files to do this.

Not saying bad actors don’t fetch css either, but the lack of it being fetched doesn’t mean that it’s a bad actor.

As for why CF don’t reverse-DNS their IPs stating it’s iCloud Private Relay: well, CF are not Apple’s only 3rd-party egress provider (Akamai is another that springs to mind). Since the number of providers can change at any time, the best source of information about valid egress providers is Apple themselves.

But Apple do also publish these changes to geo-location databases for you to query; for example, https://www.ip2location.com/demo/104.28.42.8 lists it as iCloud Private Relay.

As for “are Cloudflare caching my site when run through Private Relay?”, I’m not 100% sure. I’ll have to check my own logs and can’t be arsed right now, but I don’t think so (it’s been a while since I ran tests on it to see how it behaved, so I can’t be 100% sure right this minute).

But I think it would be silly of them if they did, as they may not be aware of what to cache and for whom. Let’s say they cached /profile without knowing what the server uses to determine who the logged-in user is; they might get a false cache hit and leak data from a previous request. When they act as your site’s CDN you explicitly tell them what to cache on, but when acting as a relay (either for Apple or their own WARP product) for a site they are not a CDN for, they are missing this info. Sure, they could guess, but why risk being wrong?


Thanks for the explanation.



