> Innovation and privacy go hand in hand here at Mozilla [...] Rest assured, the way we gather these insights will always put user privacy first [...] Remember, you can always opt out of sending any technical or usage data to Firefox
Wouldn't actually putting "user privacy first" lead to the conclusion that gathering insights like this shouldn't be done on an opt-out basis and should instead be opt-in, at the very least?
Personally, I'd see "privacy first" as not needing to sell any user data at all, in the first place, but we're clearly beyond that already.
I don't have a problem with the server side tracking metrics on what queries are being made. It becomes a problem when they try to associate queries with a single user, regardless of anonymization attempts. Of course, there's no money in that.
Wasn't it something like 15 years ago, with the AOL anonymized search data release, that people found multiple ways to connect the dots and whittle it down to just one person or a small subset of people?
Wasn't this also shown with anonymized taxi-cab data (released in NY?) many moons ago?
Knowing that this data is being tracked, would it not be possible to funnel people into doing searches in a way that would reveal things?
Directions to the out-of-state reproductive health clinic, combined with card data, would be all it takes to do serious harm to people in some states.
Defaults matter. A lot.
Anonymized data is not always anonymous, collected server side or otherwise.
Yes, this is a process called (fittingly) data re-identification.
There are many papers on the topic. One of the more popular examples is "Robust De-anonymization of Large Sparse Datasets" using the Netflix Prize Dataset.
>We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.
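The core of a linkage attack like the one in that paper can be sketched in a few lines: join an "anonymized" dataset with a public one on quasi-identifiers. This is a minimal illustration with made-up data; the names, fields, and records are all hypothetical, and the real attack is statistical (fuzzy matching on sparse ratings), not an exact join.

```python
# Hypothetical linkage attack: match records between an "anonymized"
# dataset and a public dataset on shared quasi-identifiers
# (zip code + birth year + gender). All data here is invented.

anonymized_ratings = [
    # (zip, birth_year, gender, sensitive_value)
    ("10001", 1985, "F", "watched: documentary X"),
    ("10001", 1985, "F", "watched: documentary Y"),
    ("94110", 1990, "M", "watched: series Z"),
]

public_profiles = [
    # (name, zip, birth_year, gender) -- e.g. from a public website
    ("Alice Smith", "10001", 1985, "F"),
    ("Bob Jones", "94110", 1990, "M"),
    ("Carol White", "60601", 1972, "F"),
]

def reidentify(anonymized, public):
    """Link sensitive records to names via exact quasi-identifier match."""
    matches = []
    for name, zip_code, year, gender in public:
        hits = [row[3] for row in anonymized
                if row[:3] == (zip_code, year, gender)]
        if hits:
            matches.append((name, hits))
    return matches

for name, history in reidentify(anonymized_ratings, public_profiles):
    print(name, "->", history)
```

Note that neither dataset contains a name next to the sensitive data; the identification falls out of the overlap between the two.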
However, it should be noted that the AOL dataset had a bunch of stuff that was identifiable by its nature (e.g. people searching for their full names or address), and the dataset wasn't scrubbed of those searches.
So the controversy wasn't just re-identification of data, but also just a bunch of already-identifiable data.
>Anonymized data is not always anonymous
More important, in my opinion, is that data that is anonymous now is just one other dataset away from not being anonymous anymore.
> Anonymized data is not always anonymous, collected server side or otherwise
If anything, I think it's both safer and more accurate to start from the assumption that "anonymized" data can be de-anonymized and require evidence to refute that, rather than starting from the assumption that anonymization works and then trying to find a way to attack it. In practice, there's just not a good track record of this being done effectively, and I think people should generally be skeptical of whether it's even possible in many cases.
There is only one way that data can really be "anonymized": if the individual data points are aggregated and the original collected data is deleted. Short of that, anonymization is basically illusory.
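The aggregate-and-delete approach described above can be sketched as follows. This is a toy illustration, not anything Firefox actually does: the queries are invented, and the minimum-group-size threshold is an assumed safeguard so that a unique query doesn't survive aggregation as a bucket of size one.

```python
from collections import Counter

# Sketch of aggregation-based anonymization: collapse per-user records
# into counts, suppress buckets too small to hide an individual, and
# delete the raw data. All queries here are made up for illustration.

raw_queries = [
    ("user1", "weather"), ("user2", "weather"), ("user3", "weather"),
    ("user4", "rare medical condition"),  # unique, hence identifying
    ("user5", "news"), ("user6", "news"), ("user7", "news"),
]

MIN_GROUP_SIZE = 3  # assumed threshold; buckets below this are dropped

def aggregate_and_delete(records, k=MIN_GROUP_SIZE):
    counts = Counter(query for _user, query in records)
    # Keep only buckets large enough that no single person stands out.
    safe = {query: n for query, n in counts.items() if n >= k}
    records.clear()  # delete the original per-user data
    return safe

stats = aggregate_and_delete(raw_queries)
# stats keeps "weather" and "news"; the unique query is suppressed,
# and raw_queries is now empty
```

Even this only helps if the deletion actually happens, which is exactly the trust problem raised below.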
The trouble is that we'd still have to take the word of the entity doing the data collection that they've done this properly, and it's clear that we can't take anyone's word for that.
Anonymization is effectively not achievable. Limited anonymity may be possible within the scope of a particular dataset, but all it takes is one enrichment pipeline to strip that away. If you don't think that's what places like insurers do on a regular basis, you're a fool.
Do you believe there is no utility in knowledge? Do you believe absolutism is the only philosophy for privacy?
I think telemetry and the data software collects can help with usability, design, and product enhancement and that it's very likely this can be done to some extent without harming privacy.
> I think telemetry and the data software collects can help with usability, design, and product enhancement
It can, but too often that's little more than a justification to take as much data as possible, much of which is misused. In this case, what a person searches for on the internet isn't helping one bit with "usability, design, and product enhancement".
This has nothing to do with mental gymnastics. Consent isn't a meaningful concept when it comes to each individual setting of a piece of software. You'd have to make 500 decisions every minute if you had to actively consent to everything your software does.
In fact, cookie banners show this. People hate them because they force meaningless choices on them. If you make a website with tracking as an opt-out option, almost everyone clicks "accept all". If you make a website with tracking as opt-in, almost everyone clicks "accept all". That shows that opt-in/out consent does literally nothing to reflect people's preferences; the act of making a choice completely dominates the actual decision.
That means that if you want to respect user preferences you don't actually get around making default choices for them, and it's why consent is pretty much meaningless.
> In fact, cookie banners show this. People hate them because they force meaningless choices on them. If you make a website with tracking as an opt-out option, almost everyone clicks "accept all". If you make a website with tracking as opt-in, almost everyone clicks "accept all". That shows that opt-in/out consent does literally nothing to reflect people's preferences; the act of making a choice completely dominates the actual decision.
I disagree with this interpretation - the banners force themselves in front of the user before accessing the content. And then the choice is almost always "Accept all" and "complete a checklist mini-game of things you don't want cookies for". It's not a shock that people when confronted with this will click the easy button, and that doesn't mean it reflects their actual interests. It's just fatigue. If the "accept our cookies" button was off to the side of the page, and defaulted to "none" unless the user did something otherwise, I wonder what the "accept all" numbers would look like then. Actually, I don't.
> It's not a shock that people when confronted with this will click the easy button, and that doesn't mean it reflects their actual interests.
Yes, but that was my actual point. If one simple UI design trick is enough to completely flip the choices of users, then consent forms aren't a robust way to collect preferences at all. In fact, if you wanted to genuinely and in good faith provide access to granular preferences, giving a more complicated set of choices would be the only way to go about it, and that fatigue is still real even if the design has a legitimate purpose.
What you're saying is true: the only way for the choice to be representative would be something like a simple binary yes/no, but that's not necessarily what the user wants either. You're going to get a significantly more accurate view of people's real preferences by collecting data, like what Firefox is doing here, and then setting defaults accordingly.
> Personally, I'd see "privacy first" as not needing to sell any user data at all, in the first place, but we're clearly beyond that already.
That requires users to pay, something often suggested, but which I haven't heard of working commercially anywhere a competitor can offer a free alternative with the user as the product. Privacy just isn't that valuable.
That's clearly not a solution either since expensive products and services with subscriptions still routinely collect every scrap of data they can get their hands on. Companies will always make more money by violating your privacy while also charging you as much as they possibly can so that's exactly what they do.
I think a small fee subscription can be very attractive. Less than $5 a month. I already pay $5/month for better search experience. Sometimes Kagi finds things neither DDG nor Brave can.
> Wouldn't actually putting "user privacy first" lead to the conclusion that gathering insights like this shouldn't be done on an opt-out basis and should instead be opt-in, at the very least?
At the very least, collection of non-anonymized data should be opt-in. So where is the problem?