
The marketing site says there’s full text search. The paper doesn’t cover this. How is this being solved here?


Done completely client side. We actually have multiple blog posts on this: https://skiff.com/blog/private-search


So if I have 15 GB of email, I have to process all of that on every client bootstrap?

How do you plan on scaling your service with respect to this problem? Once you have a non-trivial user base with non-trivial data volumes, this is likely to become a substantial problem.


To put this in context, take the trivial example of a user with a 15 GB account, and say you happen to be using S3 for storage. They buy a new phone, and that costs you ~$1.50 that month, or 50% of your revenue at current pricing. They buy a new iPad and a new laptop? You’re 50% in the red.

Similarly, you’ll have some users who are, say, content creators. They shove a 10 GB video in their drive. Let’s say they have a laptop, a workstation, a phone and an iPad. Their upload is downloaded at least 3 times, costing you ~$4.50.
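The arithmetic behind the 15 GB example can be sketched roughly as follows. The $0.10/GB egress rate and the $3.00 monthly revenue are assumptions back-derived from the figures quoted above (15 GB costing ~$1.50, described as 50% of revenue); they are not published prices, and the function names are made up for illustration.

```python
# Assumed figures, back-derived from the comment above (not published prices).
EGRESS_USD_PER_GB = 0.10     # 15 GB costing ~$1.50 implies roughly $0.10/GB
MONTHLY_REVENUE_USD = 3.00   # $1.50 was described as 50% of revenue


def reindex_egress_cost(account_gb: float, new_devices: int) -> float:
    """Egress cost if every new device downloads the whole account once."""
    return account_gb * new_devices * EGRESS_USD_PER_GB


def margin_after_egress(account_gb: float, new_devices: int) -> float:
    """Monthly revenue left after paying for those downloads (may be negative)."""
    return MONTHLY_REVENUE_USD - reindex_egress_cost(account_gb, new_devices)


if __name__ == "__main__":
    for devices in (1, 3):
        cost = reindex_egress_cost(15, devices)
        margin = margin_after_egress(15, devices)
        print(f"{devices} new device(s): cost ${cost:.2f}, margin ${margin:.2f}")
```

Under these assumptions, one new device eats half the month's revenue, and three new devices leave the account at a loss equal to half the month's revenue.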


This isn't how it works at all? We don't pay for storage on users' devices... buying the device = buying the storage. It's actually much more efficient than doing search through some massive database.


Based on the blog you referenced up thread:

I upload a large document to your drive product from my workstation. I go to search on my phone. My phone needs to download the content in order to index it. My phone downloads the content from you. You pay for the bandwidth.

If I provision a new device, and it needs a new search index, it needs to download all of my content once, in order to populate the local index of the content.

If I'm something like a YouTube content producer, I might put extremely large files in the drive. Per the blog post, all the other devices signed into drive will see this new file and pull it down to index it.

So if I upload a 15 GB video from my iPhone to later process it on the workstation, my laptop, iPad and workstation will all download it. That means you need to serve up 45 GB of bandwidth. The cost of operation is as described in the post above.


Depends. On an older phone, downloading all emails just to allow local search won't be very efficient. Logging out also becomes a problem: if emails are stored on a device that gets stolen, the adversary now has access to the local index, since all the keys are on the device, usually with no FDE. Meanwhile, with Gmail a log-out would clear all traces instantly.


Also, not really true of Gmail. Try turning your WiFi off, then deleting your Gmail account. You might have mail stored offline on your phone (let alone any other device), as well as any IMAP or other clients. It's the same or worse.


Emails are downloaded when you receive them. Isn't that how email works?


Normal email providers don't download all emails whenever a user logs into a new device.


We also don't do this. In a near future implementation you can just synchronize the end-to-end encrypted search index.


This step is what I was expecting you to talk about, and it has some tricky subtleties to get right, which is why I looked for it in the whitepaper.

A trivial problem with a naive implementation is being able to perform presence proofs using side channel information: send someone mail containing a term you want to verify, and watch for the associated high-level costs of operations that are likely to be incremental index change uploads.
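To make the presence-proof concern concrete, here is a toy sketch of one way such a leak could arise. It assumes a deliberately naive sync scheme in which every posting list touched by a new message is re-uploaded in full; that scheme, and all the names below, are hypothetical and are not taken from the whitepaper or the blog post. Serialized length stands in for ciphertext length, on the assumption that encryption hides content but not size.

```python
import json


def add_document(index, doc_id, text):
    """Add a message to the index and return the posting lists it touched.

    Naive scheme (hypothetical): each touched posting list is returned in
    full, as if it had to be re-encrypted and re-uploaded as a whole.
    """
    touched = {}
    for term in set(text.lower().split()):
        index.setdefault(term, []).append(doc_id)
        touched[term] = list(index[term])
    return touched


def upload_size(delta):
    """Stand-in for ciphertext length: content is hidden, size is not."""
    return len(json.dumps(delta, sort_keys=True).encode("utf-8"))


def probe(index, term):
    """Upload size triggered by a probe mail whose body is just `term`."""
    scratch = {t: list(ids) for t, ids in index.items()}  # leave index intact
    return upload_size(add_document(scratch, "probe", term))


if __name__ == "__main__":
    mailbox = {}
    add_document(mailbox, "m1", "merger terms attached")
    add_document(mailbox, "m2", "merger call tomorrow")
    # Same-length terms, so any size difference comes from existing postings.
    print(probe(mailbox, "merger"), probe(mailbox, "qwerty"))
```

In this toy model, an observer who can see upload sizes would notice that a probe for a term already in the mailbox produces a larger upload than a probe for an unseen term of the same length. Padding or batching index updates are the usual kinds of countermeasure, but whether they apply here depends on the real sync design.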


You mean you currently do this, but plan not to in the future.


All common operating systems can encrypt keys or full disks.


Yeah, this is where it gets real. VC funding and unbounded potential operational costs. Ouch.


Seems trivial enough to do client-side on modern hardware, especially if you exclude attachments.


To do this, you would have to download all emails on all devices all the time to index them. That kind of makes the whole point of cloud-based email moot: if everything is on my device anyway, and logging in and out resets it all, I might as well use an email client app.


Without producing confirmation side channels and so on? Doesn’t seem that trivial. At the very least, it needs more thought than an assumption of simplicity.


We do it in a fairly straightforward manner right now. Check out the blog I linked to above - generate an in-memory index, end-to-end encrypt it, and store it in browser storage. It's only decrypted in memory.
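As a rough illustration of the "generate an in-memory index" step, here is a minimal inverted-index sketch. It is not the actual implementation: the tokenizer, the function names and the JSON serialization are all assumptions, and the encryption and browser-storage steps are left out because they depend on the client's crypto library.

```python
import json
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(messages):
    """Map each term to the set of message ids whose body contains it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for term in tokenize(body):
            index[term].add(msg_id)
    return index


def search(index, query):
    """Return ids of messages containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


def serialize(index):
    """Plaintext bytes that a client would encrypt before persisting."""
    payload = {term: sorted(ids) for term, ids in index.items()}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    idx = build_index({"a": "Quarterly report attached", "b": "Report delayed"})
    print(search(idx, "quarterly report"))
```

The point of the design described above is that only the output of something like `serialize` ever leaves memory, and only after it has been encrypted with a key the server never sees.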



