Tor Users Might Soon Have a Way to Avoid Those Annoying CAPTCHAs (vice.com)
117 points by walterbell on Oct 1, 2016 | 47 comments


Here is the gist of how it works: the user solves a one-time traditional CAPTCHA. A browser extension creates a bunch of "tokens", cryptographically blinds them, and asks the server to sign them. Each token can be used only once to prove the user is not a robot: later, the user can submit to the server an unblinded, signed token, effectively saying "hey, you previously verified me as human, here is a signed token you gave me".

The key insight is the use of blind signatures (https://en.wikipedia.org/wiki/Blind_signature) to provide anonymity: two unblinded tokens can't be linked to the same user, because the server never saw what it signed.
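
For the curious, here is a minimal sketch of a textbook RSA blind signature in Python. This is an illustration of the general technique, not the actual protocol or parameters Cloudflare proposed; the key sizes are toy-sized and the function names are mine:

```python
import hashlib
import secrets
from math import gcd

# --- Server side: a toy RSA key (tiny primes for illustration only;
# a real deployment would use 2048-bit keys or larger) ---
p, q = 1000003, 1000033            # small primes, demo only
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

def server_sign_blinded(blinded):
    """Server signs the blinded value without ever seeing the token."""
    return pow(blinded, d, n)

def server_verify(token, sig):
    """Server checks validity, but can't link sig to any signing request."""
    m = int.from_bytes(hashlib.sha256(token).digest(), "big") % n
    return pow(sig, e, n) == m

# --- Client side ---
token = secrets.token_bytes(16)    # random one-time token
m = int.from_bytes(hashlib.sha256(token).digest(), "big") % n
while True:
    r = secrets.randbelow(n - 2) + 2          # random blinding factor
    if gcd(r, n) == 1:
        break
blinded = (m * pow(r, e, n)) % n              # all the server ever sees
sig_blinded = server_sign_blinded(blinded)    # (m * r^e)^d = m^d * r mod n
sig = (sig_blinded * pow(r, -1, n)) % n       # unblind: m^d mod n

assert server_verify(token, sig)
```

Because the server only ever saw `blinded`, it cannot later match the unblinded pair `(token, sig)` back to the signing request, which is exactly the unlinkability property described above.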

It's a neat idea, but it's just moving the trade-off towards user convenience and away from security: if a server agrees to sign 100 tokens per CAPTCHA, then solving one CAPTCHA now lets a robot do 100x the work it could before.


It's a bit more than that. Normally, once you pass a CAPTCHA we leave you alone across the network for a while. With Tor, however, the circuit changes every 10 minutes and on every origin change. Tracking you across circuits would obviously be an invasion of your privacy.

This allows us to extend the lifetime of an "approval" across time and domains.

So it's not just changing the trade-off. It's enabling a human behavior that should make the user experience much better, while preserving privacy and not really changing the equation for bots.

(Disclaimer: I work for Cloudflare and wrote the original incomplete draft.)


This does not preserve anonymity. I suspect you may be confused about why Tor changes circuits; it's not to prevent users from logging into websites.


Yes it is. They are using blind signatures to track users who have solved a CAPTCHA. https://en.wikipedia.org/wiki/Blind_signature


  Hi I am 54.123.22.16 let's open a secure connection my public key is 5468..
  connection -> This is FarkUser57, token:fark123
  connection -> FarkUser57 -> "Actual message 1"
  Connection times out.

  Hi I am 55.55.55.123 let's open a secure connection my public key is 5592..
  connection -> This is FarkUser57, token:fark123
  connection -> FarkUser57 -> "Actual message 2"
  Connection times out.
As long as FarkUser57 does not map back to the real you, you preserve anonymity.


Blind signatures do not work like user names. You would use a new signed token for every authentication and because the server itself does not know which tokens it signed, it can only verify that a token is valid, but not find out which tokens belong to the same user.


I wonder about that. Could it use a different key for every user and check which key it used, in order to track them? Or would that be infeasible due to the cost of verifying the signature?


The point is that blind signatures are not needed. If the first Tor connection is anonymous, then it can hand you a token that is anonymous the second time. Using multiple IPs over time is a way to maintain anonymity against traffic analysis, so you don't want third parties to see the token.

You do need to trust the endpoint, though. The endpoint can already easily deanonymize you in under a minute based on the traffic it's sending, if that can be compared with the traffic you get from Tor.


> then solving 1 CAPTCHA now allows a robot to do 100x the work it did before

The point of a CAPTCHA is to validate that the person browsing a website is a human being. Once they have done that, the person can click on 1 page, 100x pages, or 1000x pages on the site, and from a security perspective that is the same result. The goal was to identify the nature of the entity behind the request, not limit how many times the person accessed the site.

If a server agrees to sign 100 tokens per CAPTCHA, it means the server assumes those 100 tokens represent the same human. It does not allow a robot to do 100x more work than before unless the CAPTCHA is so faulty that the assumption is incorrect, in which case the CAPTCHA is broken and should be replaced. A CAPTCHA that only works 90% of the time is not actually preventing bots, since bot owners can easily just add 10x more bots.


Let me reexplain the threat model that I see.

Consuming one token bypasses the need to solve one CAPTCHA. Now you have heard about people being hired to solve CAPTCHAs, right? Well with these tokens, every time a human solves one CAPTCHA and obtains 100 tokens, it's as if he helped a robot bypass 100 CAPTCHAs.

Other example: say a CAPTCHA system is so good that only 0.1% of its challenges can be solved by a robot. With these tokens, if 100 are signed per solved CAPTCHA, then it's as if the robot could solve 10% of them. A 100x improvement.


1) Bot sends the CAPTCHA to a human (perhaps using https://de-captcher.com/)

2) Have the bot crawl 100 pages

3) ...


It uses a blind signature protocol allowing the client to generate bypass tokens without future correlation. That's good.

Unfortunately, because it requires that the user use a plugin, this creates two groups of Tor users: those that are using this protocol and those that aren't. This is more information that can be used---with other information---to aid in de-anonymizing users. (To be clear: using ephemeral JavaScript, as they mentioned, is not a credible option, so they have chosen the better route here.)

CloudFlare stores cookies today, yes, but they can be ephemeral with good client cookie policies. A browser plugin usually persists sessions---even if the tokens don't, the fact that it is _installed_ does.

I understand that this is the case for other plugins as well.

In any case, CloudFlare criticism aside: I'm glad that CloudFlare is listening to the Tor community, and has come up with a protocol that does its best to respect users' privacy.


The Tor Browser Bundle is pretty persistent about updates. If you're a version behind, it lets you know frequently, with flashy annoying notices. Being a version behind often has security implications, and users hiding behind Tor are often very dependent on being secure, so it makes sense.

That also means that if TBB were to be extended with a new plugin, it would get to every user very quickly. Especially if they did some sort of time delay (probably overkill), where the browser updated 2 weeks before the update actually kicked in. Then everyone who has upgraded in the past 2 weeks instantly gets group-anonymity, and everyone who hasn't upgraded has only themselves to blame because the browser gives you a nice flashy warning immediately when you open it up.

I hope that the code is audited for back doors by multiple independent parties, but other than that I think this is fantastic.


> The Tor Browser Bundle is pretty persistent about updates.

That's assuming that the Tor Browser Bundle (and Tails) will include it. I'm curious what they will decide.


Isn't the bigger issue that Cloudflare's default 'protection' setting is too eager?

A few years back I used Cloudflare on my company's website in an attempt to improve speed. When I browsed from public WiFi, Cloudflare showed challenges for static pages; this seemed pretty pointless. What kind of attack is Cloudflare preventing when it blocks people from accessing static sites?


Since the CAPTCHAs trigger based on IP reputation, I assume they're trying to prevent things like automated forum spam, by default for everyone. I don't know if there's a setting to avoid challenges even from shady IPs unless the site is under attack; that sounds like it could apply to a lot of websites.


The CAPTCHAs might not appear for everyone, but automated JavaScript challenges seem to be required on some sites regardless of your IP (and without JavaScript, you just get stuck in a refresh loop on the challenge page).

It's a real slap in the face to have to enable JavaScript just to gain access to a site, especially if it's a site that doesn't even use JavaScript, or a site that you don't entirely trust.

The funny thing is that it's entirely possible to open a bunch of these tabs in the background (e.g. search for 'site:somesitethatusescloudflare' and just middle click some results that interest you for later reading) and forget about them while they sit there refreshing over and over again, until you really do get blocked (assuming they keep track of unanswered challenges like they do for failed ones).


While obviously useful for the poor souls that live in backwards dictatorships/monarchies like North Korea and the UK, I don't personally think Tor exit nodes are that great an idea.

tor traffic should stay on the tor network.


I was under the impression allowing people to access the internet at large is explicitly part of Tor's design. If you want something more inward-facing, I2P seems to lean in that direction. I2P's comparison with Tor suggests this too:

https://geti2p.net/en/comparison/tor

Tor: "Designed and optimized for exit traffic, with a large number of exit nodes"

I2P: "Designed and optimized for hidden services, which are much faster than in Tor"


Yes, but my point was that I think it's a "bad" design.

It is not part of the internet; if servers want to provide "anonymous" access, they can provide a Tor address. But if they don't, then accessing a standard web server via Tor is little different from any of the other means of illegitimately accessing computers.

Web access via Tor seems like a lot of effort to provide what is little more than "hacking" services against normal web servers.

I would much rather see for example: https://thepiratebay.se.onion

where tor contacts the dns for thepiratebay.se to get the underlying onion address to use.

much better all round than either going through an exit node (most of which are malicious) to thepiratebay.se or trying to remember uj3wazyk5u4hnvtk.onion or whatever it has changed to now.


> What kind of attack is cloudflare preventing when they block people from accesing static sites?

DoS. The Tor network has quite a bit of bandwidth (https://metrics.torproject.org/bandwidth.html) and we see DoS attacks at L7 (e.g. floods of HTTP requests) that attempt to knock sites with poor bandwidth/servers offline.

There are many, many other attacks that come through Tor.


> sites with poor bandwidth

that point is beyond moot as we are talking exclusively about sites already fronted by CF


Nope.

Just because Cloudflare has a ton of bandwidth doesn't mean the origin web server does. An attacker who wants to hurt an origin server just needs to find a URI that Cloudflare doesn't cache (and that's very often /; think of any site that's dynamically generated by some CMS as an example).

So, if you let traffic through to an origin without running any security it's trivial for an attacker to knock off a web server. Thus Cloudflare has to take different security measures to protect servers.

And we do see L7 DoS capable of knocking origin servers down via Tor.


No sympathy then. 1. Cloudflare is a crappy CDN, and 2. if you have zero caching on a cheap server, you can't expect much to begin with.


How would CloudFlare know it's a static site?


CloudFlare usually just serves you a cached version of the site to improve performance and reduce bandwidth. It automatically caches CSS, images and scripts, but the rules are customizable.


> CloudFlare usually just serve you a cached version of the site to increase performances and reduce the bandwidth.

This is nonsense.



No, it does not. Resources such as CSS, JS, and images, often yes.

You don't get a cached Hacker News page just because they use CF in front. As the other commenter said, it's nonsense.


> What kind of attack is cloudflare preventing when they block people from accesing static sites?

Lots of sites use Cloudflare to prevent scraping.


This!

It's absurd for people to be required to do CAPTCHAs just for read access to a page.

The experience of having to repeatedly perform CAPTCHAs the whole time is a real barrier.


There's no such thing as "read access" when it comes to web security. People think there is but consider, for example,

1. Benign GET / repeated 1000 times per second. That's a DoS on the server

2. Shellshock. Looks like a benign GET / but nasty payload in User-Agent header

3. Simple GET but with SQLi in the URI

All these are real examples. All seen through Tor (and, of course, without Tor).


Unless they do fairly deep analysis of the page or force customers to configure it correctly, they can't know if the page will trigger "write access" later. And while it's easy to inject a CAPTCHA into the initial access, you can't do so easily in a later POST, especially if it is triggered by JS. It would be possible with the cooperation of the site's code, but I doubt many companies would be willing to go to that length.


From the spec:

    The scheme requires the server to detect nonce reuse with reasonable
    reliability. However, there might be no need for a zero false positive
    rate, because if an attacker needs to make 10,000 requests to have one
    succeed, that's possibly an acceptable trade-off.

    Therefore, the server could use data structures such as Bloom filters or
    cuckoo filters to store tokens that it has witnessed. The parameters of
    these structures can be chosen to ensure a false-positive probability of
    any given amount. Cuckoo filters may be more efficient but Bloom filters
    may be easier to construct.

I don't think this makes sense. "False positive" for a Bloom filter means it thinks an item was previously inserted when it wasn't really. If the filter represents a set of used nonces, the result of seeing an item as previously inserted would have to be blocking the request as a duplicate: therefore a false positive would cause a fraction of legitimate requests to be blocked, not malicious requests to be allowed as the first paragraph seems to imply.

This result wouldn't necessarily be unacceptable either, especially if there was some mechanism for the browser to automatically retry a HTTP request with a new token if it received a "reused token" error. However, this behavior would have to be specified, and it's somewhat tricky: it's also possible for tokens to be (actually) reused by accident, e.g. if the user restores their system from a backup or a VM snapshot. In that case, it would make more sense for the browser to respond to a duplicate token error by throwing away all its tokens, since it would have no way to know which of them were clean.

Then again, if the Bloom filter is big enough that the probability of a false positive is very low, even spuriously forcing the user to complete another CAPTCHA may not be the end of the world.
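
To illustrate the direction of the error, here is a toy Bloom filter over spent nonces (the class and its parameters are made up for this sketch, not taken from the spec):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter tracking spent token nonces (illustration only)."""
    def __init__(self, size_bits=1 << 20, num_hashes=7):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions from k salted hashes of the item.
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(item))

spent = BloomFilter()
token = b"nonce-123"
assert token not in spent   # fresh token: accept and redeem it
spent.add(token)
assert token in spent       # replay: guaranteed to be rejected
# A *false positive* here would flag a never-seen token as spent, i.e.
# block a legitimate request -- the opposite of letting an attacker through.
```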


So, most captchas don't seem to pose much of a barrier to bots that need to get round them.

A couple of months ago there was an article on the state of web scraping in 2016: https://goo.gl/eUtkRA. In it, the author easily identified and integrated one of many captcha solvers.

Worst case scenario, there is also crowdsourced mechanical turk style captcha solving as a service: e.g. https://anti-captcha.com.

I guess this raises the question as to whether captchas pose more of a barrier to users than bots, and whether they should be used at all?


The big networks don't use CAPTCHAs in the original sense they were meant to be used anymore. They all moved on to phone verification many years ago. I was a part of some of the discussions in Google on this issue - should we keep using CAPTCHAs at all given that they'd been basically replaced by better systems?

The answer was yes. CAPTCHAs are still present because they act as a throttle. The point of a strong CAPTCHA is to limit the amount of abuse that can get through if the other mechanisms break down, by exploiting the fact that humans are kind of slow. Even though OCR can handle most CAPTCHAs these days, it's still not 100% effective, so by ramping up the number of CAPTCHAs you ask users to solve you can still put a throttle on activity. In this way it acts as a last line of defence.

That's why I'm not sure this is going to work out. CAPTCHAs are not a way to distinguish good users from bad, which is how CloudFlare is trying to use them here. CAPTCHAs are a way to slow down and throttle traffic that might be auto-generated when you can't tell if it's good or bad. Building a new way to show you solved a CAPTCHA previously doesn't help if the reason you're being shown CAPTCHAs is specifically to slow you down, regardless of whether you're good or bad.


Couldn't CloudFlare track a Tor user by tracking the tokens it gave to a particular user, then track them when their client used one of those tokens to validate?


I think that's what the "blind" portion of token authentication is for.


How would users know that tokens had actually been signed blindly?


If the tokens are never sent, only their blinded versions, it is pretty much guaranteed that the signature you get back was made without looking at the actual token.


I get that. What I wonder is whom nontechnical users would need to trust about that. CloudFlare? The Tor Project?


I'm not sure, but it can be done with just CloudFlare changes; if the plugin is open source it should be fine. Maybe if Tor Browser integrates the plugin it should be fine too.


Optimal would be only needing to trust the Tor Project.


I had a similar idea for transit cards a while ago. http://jimkeener.com/posts/public-transit-and-ring-signature... Right now, you can probably narrow down someone's home and work locations pretty well.


I see how this would still be enough to stop DDoS attacks or other high-volume automated activity like web spidering, but now spammers can save way more on captcha farms by solving just 1 captcha per N spambot submissions.


Cloudflare has shitty tech that doesn't work, now everybody needs to play along. Booooooo.


This affects other VPNs as well, not just Tor.



