> Innovation and privacy go hand in hand here at Mozilla [...] Rest assured, the way we gather these insights will always put user privacy first [...] Remember, you can always opt out of sending any technical or usage data to Firefox
Wouldn't actually putting "user privacy first" lead to the conclusion that gathering insights like this shouldn't be done on an opt-out basis and should instead be opt-in, at the very least?
Personally, I'd see "privacy first" as not needing to sell any user data at all, in the first place, but we're clearly beyond that already.
I don't have a problem with the server side tracking metrics on what queries are being made. It becomes a problem when they try to associate queries with a single user, regardless of anonymization attempts. Of course, there's no money in that.
Wasn't it something like 15 years ago, with the AOL anonymized search data release, that people found multiple ways to connect the dots and whittle it down to just one person or a small subset of people?
Wasn't this also shown with anonymized taxi-cab data (released in NY?) many moons ago?
Knowing that this data is being tracked, would it not be possible to funnel people into doing searches in a way that would reveal things?
Directions to the out-of-state reproductive health clinic, combined with card data, would be all it takes to do serious harm to people in some states.
Defaults matter. A lot.
Anonymized data is not always anonymous, collected server side or otherwise.
Yes, this is a process called (fittingly) data re-identification.
There are many papers on the topic. One of the more popular examples is "Robust De-anonymization of Large Sparse Datasets" using the Netflix Prize Dataset.
>We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.
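The core of a linkage attack like the one in that paper can be sketched in a few lines: join an "anonymized" dataset with a public one on quasi-identifiers. This is a minimal illustration with made-up data; the names, fields, and records are all hypothetical, and the real attack is statistical (fuzzy matching on sparse ratings), not an exact join.

```python
# Hypothetical linkage attack: match records between an "anonymized"
# dataset and a public dataset on shared quasi-identifiers
# (zip code + birth year + gender). All data here is invented.

anonymized_ratings = [
    # (zip, birth_year, gender, sensitive_value)
    ("10001", 1985, "F", "watched: documentary X"),
    ("10001", 1985, "F", "watched: documentary Y"),
    ("94110", 1990, "M", "watched: series Z"),
]

public_profiles = [
    # (name, zip, birth_year, gender) -- e.g. from a public website
    ("Alice Smith", "10001", 1985, "F"),
    ("Bob Jones", "94110", 1990, "M"),
    ("Carol White", "60601", 1972, "F"),
]

def reidentify(anonymized, public):
    """Link sensitive records to names via exact quasi-identifier match."""
    matches = []
    for name, zip_code, year, gender in public:
        hits = [row[3] for row in anonymized
                if row[:3] == (zip_code, year, gender)]
        if hits:
            matches.append((name, hits))
    return matches

for name, history in reidentify(anonymized_ratings, public_profiles):
    print(name, "->", history)
```

Note that neither dataset contains a name next to the sensitive data; the identification falls out of the overlap between the two.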
However, it should be noted that the AOL dataset had a bunch of stuff that was identifiable by its nature (e.g. people searching for their full names or address), and the dataset wasn't scrubbed of those searches.
So the controversy wasn't just re-identification of data, but also just a bunch of already-identifiable data.
>Anonymized data is not always anonymous
More important, in my opinion, is that data that is anonymous now is just one other dataset away from not being anonymous anymore.
> Anonymized data is not always anonymous, collected server side or otherwise
If anything, I think it's both safer and more accurate to start from the assumption that "anonymized" data can be de-anonymized and require evidence to refute that, rather than starting from the assumption that anonymization works and then trying to find a way to attack it. In practice, there's just not a good track record of this being done effectively, and I think people should generally be skeptical of whether it's even possible in many cases.
There is only one way that data can really be "anonymized": if the individual data points are aggregated and the original collected data is deleted. Short of that, anonymization is basically illusory.
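The aggregate-and-delete approach described above can be sketched as follows. This is a toy illustration, not anything Firefox actually does: the queries are invented, and the minimum-group-size threshold is an assumed safeguard so that a unique query doesn't survive aggregation as a bucket of size one.

```python
from collections import Counter

# Sketch of aggregation-based anonymization: collapse per-user records
# into counts, suppress buckets too small to hide an individual, and
# delete the raw data. All queries here are made up for illustration.

raw_queries = [
    ("user1", "weather"), ("user2", "weather"), ("user3", "weather"),
    ("user4", "rare medical condition"),  # unique, hence identifying
    ("user5", "news"), ("user6", "news"), ("user7", "news"),
]

MIN_GROUP_SIZE = 3  # assumed threshold; buckets below this are dropped

def aggregate_and_delete(records, k=MIN_GROUP_SIZE):
    counts = Counter(query for _user, query in records)
    # Keep only buckets large enough that no single person stands out.
    safe = {query: n for query, n in counts.items() if n >= k}
    records.clear()  # delete the original per-user data
    return safe

stats = aggregate_and_delete(raw_queries)
# stats keeps "weather" and "news"; the unique query is suppressed,
# and raw_queries is now empty
```

Even this only helps if the deletion actually happens, which is exactly the trust problem raised below.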
The trouble is that we'd still have to take the word of the entity doing the data collection that they've done this properly, and it's clear that we can't take anyone's word for that.
Anonymization is effectively not achievable. Limited anonymity may be possible within the scope of a particular dataset, but all it takes is one enrichment pipeline to strip that away. If you don't think that's what places like insurers do on a regular basis, you're a fool.
Do you believe there is no utility in knowledge? Do you believe absolutism is the only philosophy for privacy?
I think telemetry and the data software collects can help with usability, design, and product enhancement and that it's very likely this can be done to some extent without harming privacy.
> I think telemetry and the data software collects can help with usability, design, and product enhancement
It can, but too often that's little more than a justification to take as much data as possible, much of which is misused. In this case, what a person searches for on the internet isn't helping one bit with "usability, design, and product enhancement".
This has nothing to do with mental gymnastics. Consent isn't a meaningful concept when it comes to each individual setting of a piece of software. You'd have to make 500 decisions every minute if you had to actively consent to everything your software does.
In fact, cookie banners show this. People hate them because they force meaningless choices on them. If you make a website with tracking as an opt-out option, almost everyone clicks "accept all". If you make a website with tracking as opt-in, almost everyone clicks "accept all". That shows that opt-in/out consent does literally nothing to reflect people's preferences; the act of making a choice completely dominates the actual decision.
That means that if you want to respect user preferences you don't actually get around making default choices for them, and it's why consent is pretty much meaningless.
> In fact, cookie banners show this. People hate them because they force meaningless choices on them. If you make a website with tracking as an opt-out option, almost everyone clicks "accept all". If you make a website with tracking as opt-in, almost everyone clicks "accept all". That shows that opt-in/out consent does literally nothing to reflect people's preferences; the act of making a choice completely dominates the actual decision.
I disagree with this interpretation - the banners force themselves in front of the user before accessing the content. And then the choice is almost always "Accept all" and "complete a checklist mini-game of things you don't want cookies for". It's not a shock that people when confronted with this will click the easy button, and that doesn't mean it reflects their actual interests. It's just fatigue. If the "accept our cookies" button was off to the side of the page, and defaulted to "none" unless the user did something otherwise, I wonder what the "accept all" numbers would look like then. Actually, I don't.
> It's not a shock that people when confronted with this will click the easy button, and that doesn't mean it reflects their actual interests.
Yes, but that was my actual point. If one simple UI design trick is enough to completely flip the choices of users, then consent forms aren't a robust way to collect preferences at all. In fact, if you wanted to genuinely and in good faith provide access to granular preferences, giving a more complicated set of choices would be the only way to go about it, and that fatigue is still real even if the design has a legitimate purpose.
What you're saying is true: the only way for the choice to be representative would be something like a simple binary yes/no, but that's not necessarily what the user wants either. You're going to get a significantly more accurate view of people's real preferences by collecting data, like what Firefox is doing here, and then setting defaults accordingly.
> Personally, I'd see "privacy first" as not needing to sell any user data at all, in the first place, but we're clearly beyond that already.
That requires users to pay, something often suggested, but which I haven't heard of working commercially anywhere a competitor can offer a free alternative with the user as the product. Privacy just isn't that valuable.
That's clearly not a solution either since expensive products and services with subscriptions still routinely collect every scrap of data they can get their hands on. Companies will always make more money by violating your privacy while also charging you as much as they possibly can so that's exactly what they do.
I think a small fee subscription can be very attractive. Less than $5 a month. I already pay $5/month for better search experience. Sometimes Kagi finds things neither DDG nor Brave can.
> Wouldn't actually putting "user privacy first" lead to the conclusion that gathering insights like this shouldn't be done on an opt-out basis and should instead be opt-in, at the very least?
At the very least, collection of non-anonymized data should be opt-in. So where is the problem?