The interesting part: it's cheaper than AWS Glacier ($4 per TB per month) and slightly more expensive than AWS Glacier Deep Archive ($0.99 per TB per month), but the data is available immediately, not in hours like Glacier, where you have to pay a hefty premium for faster access to the data.
Interesting: unlike Glacier, this is significantly cheaper than Backblaze B2, meaning I might have to reconsider how I do my backups again. Any good backup tools supporting this type of service?
I rely on Restic at the moment, which seems to need fast read access to data, but its incremental snapshotting is great. It'd be ideal if I could find something like that which supports these "cold storage" solutions.
One thing I do consider a real value add for AWS Glacier, though, is their native support for offline media import/export. I.e., you can just send them a hard drive of your own for data loading, and pay to get a hard drive back out as well. As gigabit (or faster) class WAN slowly spreads this will someday become unnecessary, but right now, in many, many places, a company could easily have terabytes to back up but 10/1 ADSL as its best available connection. Even with faster connections, aggressive data caps are sadly not infrequent. Whether it's for initial load, ongoing use, or faster recovery, sometimes there is still nothing like a multi-TB drive or two in the mail.
There are 3rd parties that will do it for you (Iron Mountain is at least one), but that's an extra cost and Google takes no responsibility for it. I assume this is an example of a place where Amazon is able to leverage its holistic business, with a cloud service that can also take advantage of its physical logistics system. Google's service here is significantly cheaper and has some nice features, but even if it's not worth a $4 vs. $1.23 premium for Amazon, I could definitely see continuing to pay Amazon some premium (say $2 vs. $1.23) for that alone anywhere with limited high-speed WAN availability.
We also have a Transfer Appliance [1], that comes in two sizes (100T and just under 500T). We don’t currently support shipping one filled up with your data for recovery/export though.
Backblaze also offers that option. You can mail them up to 8 TB on an external HDD and have it loaded into their system for $190, or up to 256 GB on a USB stick for $100. [1]
You can also request a "B2 Fireball" [2] from them. It's basically a small array that they mail to you for $550 with 70 TB of storage. You fill it up and send it back to them within the month, and they'll load the data into your account.
For comparison, Amazon supports up to 16 TB in their basic service, with an $80 flat handling fee per storage device and then $2.50 per data-loading hour. Since they support 2.5"/3.5" SATA and external eSATA & USB 2.0/3.0 interfaces, and it's a pure sequential transfer, it's not much trouble to get at least close to maximum sequential throughput, which even for decent spinning rust should allow a good half TB an hour at least. I've never tried an SSD, so I'm not sure if they can saturate 6 Gbps, but as even a 32-hour transfer of 16 TB would only be another +$80, it may not be generally relevant anyway.
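That cost arithmetic can be sketched quickly. The $80 flat fee and $2.50/hour are the figures quoted above; the 0.5 TB/hour throughput is my assumption for decent spinning rust:

```python
def import_cost(tb, tb_per_hour=0.5, flat_fee=80.0, hourly_fee=2.50):
    """Estimated cost to load `tb` terabytes from one mailed-in device."""
    hours = tb / tb_per_hour
    return flat_fee + hours * hourly_fee

# 16 TB at half a TB/hour is 32 loading hours: $80 flat + $80 in hours
print(import_cost(16))   # 160.0
```

Even doubling the loading time only adds another $80, so throughput barely matters to the bill.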
Amazon's equivalent to the B2 Fireball is "AWS Snowball" (amusingly enough; not sure if there is a bit of fun name riffing between the two here), which has a service fee of $200 per 50 TB device or $250 per 80 TB device, with any onsite days after the first 10 at $15/day.
It's interesting how the pricing mix works on this feature, though. Amazon offers lower potential ingress pricing depending on your use, though notably, if you kept the Snowball a whole month, the pricing would get very close to the Fireball's (+20 days @ $15/day brings the prices to $500/$550 respectively, though the former with 20 TB less capacity and the latter with 10 TB more).
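As a sketch of that comparison (all numbers are the list prices quoted in this thread; the flat $550 Fireball fee is assumed to cover a full month):

```python
def snowball_cost(service_fee, onsite_days, free_days=10, per_day=15):
    """Total Snowball cost: service fee plus per-day charge past the free days."""
    return service_fee + max(0, onsite_days - free_days) * per_day

fireball_cost = 550  # flat for the ~70 TB B2 Fireball, returned within the month

print(snowball_cost(200, 30))  # 50 TB Snowball kept 30 days: 200 + 20*15 = 500
print(snowball_cost(250, 30))  # 80 TB Snowball kept 30 days: 250 + 20*15 = 550
```

Returned inside the free window, the Snowball stays at its base fee, which is where Amazon's ingress pricing can undercut the Fireball.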
Backblaze and Google are both much cheaper to get data out of, though; Amazon's Glacier and descendant services remain very much deep-freeze focused.
It looks like a lower tier than the existing Coldline and Nearline (7x cheaper for storage than the former). Both have a minimum storage period, so this one is likely to have one as well. Coldline and Nearline are more expensive than regular storage when fetching objects, which means ice-cold storage is probably even more expensive when you restore (is it going to be 7x too, keeping the symmetry?).
The concept, idea, and flexibility of Arq is great, ideal even. The amount of control is nice. I wish it were open source.
The actual product is pretty painful when you need to do a recovery, especially if you don't know where the file lived on disk. I haven't tried the newer Arq Cloud Backup destination to see if it improves the search experience.
That said, my experience is from more than a year ago, and I would try it again if they were able to bring their search on par with current consumer backup offerings.
The place where this won't be as cheap as Backblaze is retrieval. Unless Google makes a big change, you'll still have to pay for network egress, which is obscenely priced: https://cloud.google.com/storage/pricing#network-egress
Borg Backup is mostly the same as Restic (regarding dedup / incremental backup) [1] and aggregates data into large chunks.
If you only back up from a single machine, it keeps a local cache of already-backed-up data. This has the large advantage that it basically only needs to push the delta to the remote, without doing any kind of synchronization to check what is already there.
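A minimal sketch of that local-cache idea (fixed-size chunks and plain SHA-256 are simplifications here; real Borg uses content-defined chunking and keyed chunk IDs):

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024
seen = set()  # local cache of chunk ids believed to already exist on the remote

def backup(data, upload):
    """Upload only chunks not in the local cache; return bytes actually sent."""
    sent = 0
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        cid = hashlib.sha256(chunk).hexdigest()
        if cid not in seen:
            upload(cid, chunk)   # push the chunk to the remote, keyed by its hash
            seen.add(cid)
            sent += len(chunk)
    return sent
```

A second run over unchanged data sends nothing at all: every chunk id is already in the local cache, so no round-trip to the remote is needed even to check.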
"Borg Backup is mostly the same as Restic (regarding dedup / incremental backup)"
... with one very big difference: you can only point Borg at an SSH host. You can't point Borg at S3 or B2 or Glacier, etc.
rsync.net supports both borg and restic, but even the heavily discounted plans[1] are much more expensive than "Cold Storage" or Glacier, because they are live, random access UNIX filesystems ...
Shameless plug: I built a backup service [1] just for Borg, and the price on the large plan is $5/TB. Not as cheap as "cold storage", but still better than rsync.net and the same as B2.
Also worth pointing out that my storage is calculated after compression and deduplication, so depending on the data, a Borg backup can be much smaller than the actual data.
True, which is kind of weird, because as far as I understand their respective "databases", Borg would be more suited for arbitrary remote storage: it should basically only need an "upload file" command without any interactivity, except for its robustness checks and some additional flexibility (having multiple backup sources, deleting data that is no longer needed).
Restic seems to have been built from the ground up to use the power of an existing filesystem as its database, so it needs remote storage that offers quick interactivity (especially checking for existing files), which makes it impossible to use something like Glacier as a backend.
It's not a problem for me since I just back up to a local drive and (am planning to set up) synchronization to a remote dumb storage.
Does anybody know what the retrieval fees will likely look like? I've been wary of most of the "cloud archival" solutions because while they're cheap to put data into, they seem to charge you a billion dollars to actually retrieve it.
FWIW, this is still an ideal model for backup storage: If your more regular backup model is robust and your network is well-secured, you'll never need retrieval. And if you need it, you need it, and it's justifiable to spend big to save your business.
I'd be confident with periodically testing just little random parts.
For me this is a "last resort backup": it costs little to keep around, and god forbid we ever need it. BUT that means we need to account for the case where we do need it! And if it's going to cost too much, then there's no point in the backup anyway.
I would generally agree. First of all, you're going to test a lot of your restore processes with backups which are closer to home: You should make sure your VMs can all restore from your onsite (or just less icy) backups, for instance. As long as you're confident in that, the only thing you need to test with "ice cold" storage is that you can successfully restore a single VM from it, since you know all of your VMs can be restored.
Same here. As a company you can go "we need this to save our asses, I don't care if it costs $50k in a 4 person company", but personally I kind of do care about the cost for retrieval...
I've been comparing cloud storage prices to hard drive prices for years now. My first thought when seeing the storage prices was "huh, that might actually be worth it", but depending on the retrieval costs, you might still want to roll your own no matter the storage costs. For private use, I am (was?) planning a variant of this as soon as I am finished doing a server migration: https://old.reddit.com/r/DataHoarder/comments/7rjcdn/home_ma...
It's your backup, not your primary system. The odds that more than one drive fails within the same, say, week are probably perfectly acceptable for most people.
It is universal practice within cloud service providers to span redundancies across "fault domains" - basically things which could make failures correlated, like being in the same machine/power strip/datacenter/geo region. If you assume your fault domain analysis is good, then your failures should be independent. Many global outages are the result of a previously-unidentified fault domain, like the Azure certificate issues. Of course past a certain point it becomes unimportant - who cares about your data if an asteroid takes out every datacenter on Earth at once?
Nothing in engineering is 100%. As far as engineering goes 11 9s is pretty much as good as you can get. For comparison, AWS S3 and Glacier are 11 9s durability too.
It's worth bearing in mind the difference between durability and availability. Durability is roughly the chance of losing your data over a given time span (in this case a year), whereas availability is about how reliably you can access the data (and is almost certainly a lot lower than 11 9s). A service can be very durable but have very poor availability.
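Toy numbers make the distinction concrete. Suppose (both rates invented purely for illustration) each of three independent replicas loses data with probability 1e-4 per year, but all reads go through a single frontend that is down 0.1% of the time:

```python
p_replica_loss = 1e-4    # assumed annual loss probability per replica
p_frontend_down = 1e-3   # assumed fraction of time the lone frontend is down

# Data is gone only if all three independent replicas die:
durability = 1 - p_replica_loss ** 3

# But a single frontend caps availability no matter how many replicas exist:
availability = 1 - p_frontend_down

print(durability)    # ~twelve nines of durability from replication alone
print(availability)  # only three nines of availability
```

Replication multiplies the nines of durability, while availability stays pinned at whatever the weakest serving path offers.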
"What is the meaning of the claim about "99.999999999% annual durability"?"
It has no meaning whatsoever. Someone on the marketing side of the team decided that was a "competitive" number to present, outwards, and someone in engineering was tasked with, working backward from that number, coming up with some plausible calculation that resulted in it.
In the real world, they, like Azure and Amazon, will have single point in time outages that will wipe that out for a year or more.
Here is what an honest assessment looks like:[1]
"Historically (updated April, 2019) we have maintained 99.95% or better availability. It is typical for our storage arrays to have 100+ day uptimes and entire years have passed without interruption to particular offsite storage arrays."
...
"In the event of a conflict between data integrity and uptime, rsync.net will ALWAYS choose data integrity."
> In the real world, they, like Azure and Amazon, will have single point in time outages that will wipe that out for a year or more.
An outage affects availability, but as long as it's not permanent it doesn't affect durability. For example, if I add a new backup provider that stores data on-premise I've added a (nearly) independent data store. This substantially decreases my risk of losing my data unrecoverably (increases durability) but if I don't set up any sort of automatic failover I'm still at risk for substantial outages (no practical increase in availability).
> Someone on the marketing side of the team decided that was a "competitive" number to present, outwards, and someone in engineering was tasked with, working backward from that number, coming up with some plausible calculation that resulted in it.
I would be incredibly surprised if that happened. That's not the way I've seen anyone work here.
(Disclosure: I work at Google, though not in Cloud)
You are mixing availability (access at any given moment) with durability (not losing data). From the FAQ:
Cloud Storage is designed for 99.999999999% (11 9's) annual durability, which is appropriate for even primary storage and business-critical applications. This high durability level is achieved through erasure coding that stores data pieces redundantly across multiple devices located in multiple availability zones.
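For intuition, the simplest instance of erasure coding is single XOR parity (RAID-5 style). Real systems like GCS use Reed-Solomon codes across many devices and zones, but the rebuild-from-survivors idea is the same. A toy sketch:

```python
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(pieces):
    """Append one XOR parity piece to a list of equal-length data pieces."""
    return pieces + [reduce(xor, pieces)]

def recover(stored):
    """Rebuild the single missing piece (marked None) from the survivors."""
    missing = stored.index(None)
    stored[missing] = reduce(xor, [p for p in stored if p is not None])
    return stored[:-1]  # drop the parity piece, return the data pieces
```

Losing any one of the stored pieces, including the parity itself, is survivable; tolerating more simultaneous losses takes more parity pieces, which is where Reed-Solomon comes in.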
Disclaimer: I work at GCP, although not in GCS specifically.
(In a nutshell, pack some hot data with a lot of cold data on many large drives, then put a Flash-based cache in front of them to get long tail performance predictability back.)
There was also a talk about the low level storage service and the performance isolation work that allows it to mix batch and latency-sensitive traffic on the same drive, but it doesn't seem to have been recorded: http://www.pdl.cmu.edu/SDI/2012/101112.html
Gory details are in the patents, 9781054, 9262093 and 8612990, which I'm not linking directly, because your lawyers might not approve. There's even a follow up, 10257111. It's so new, from two days ago, that Google Patents can't find it, while Justia can.
You have to keep the data in that class for a certain period of time; that's the drawback. You can access the data at that price as long as you commit to keeping it there for a long time.
So I suspect this is not fully cold storage; that's why they can retrieve the data faster. It seems more like an economics hack (a longer commitment to keep the data allows them to buy and operate the storage hardware/software at a cost that can be amortized against those commitments).
It's a pretty good price, but assuming you are storing 8 TB and you get your own drive, the drive would pay for itself in about 14 months... so you would basically get the next 4 years for free if you are willing to manage it...
Will that storage have "11 9's annual durability" and be stored in multiple locations?
Let's say that you only need to write to it once and have two secure locations available for free; that would still mean you need two drives, which would pay for themselves in 28 months.
Sure it's "cheaper" but it's far from being as good and the price difference isn't that big.
Google offers some interesting services, but their API is always so awfully complicated and cumbersome that I've given up entirely trying to use anything.
While one major use of something like this would be backups, how does one handle these backup sets with respect to GDPR requests? The window to respond is 30 days, so keeping backups longer than, say, 25 days seems cumbersome. You would need hot access to the sets to load them up and delete the data.
You don't keep a single copy of each key, but store enough redundant copies to get the proper number of nines. Preferably that's redundant geographically, in terms of storage technology, and in write frequency.
The important part is just that the keys don't end up in long term cold storage. Either it's only retained for a short period (e.g. tape backups that get rotated after two weeks), or it supports live deletion.
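A sketch of that second option, often called crypto-shredding: per-user keys live in hot, deletable storage while the encrypted backups sit untouched in cold storage, so erasing the key makes the archived ciphertext unrecoverable without ever rewriting the archive. (XOR with a random one-time key stands in for real authenticated encryption here, purely for illustration.)

```python
import secrets

key_store = {}  # hot, mutable store of per-user keys (never archived)

def archive(user_id, data):
    """Encrypt a user's data; the ciphertext can go to cold storage."""
    key = secrets.token_bytes(len(data))
    key_store[user_id] = key
    return bytes(d ^ k for d, k in zip(data, key))

def restore(user_id, ciphertext):
    """Decrypt; raises KeyError if the user's key has been shredded."""
    key = key_store[user_id]
    return bytes(c ^ k for c, k in zip(ciphertext, key))

def gdpr_erase(user_id):
    """Handle an erasure request without touching the cold archive."""
    del key_store[user_id]
```

The 30-day clock then only has to cover deleting a small key, not thawing and rewriting terabytes of archived sets.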
I was also looking for that, the only piece of info about that was:
Unlike tape and other glacially slow equivalents, we have taken an approach that eliminates the need for a separate retrieval process and provides immediate, low-latency access to your content. Access and management are performed via the same consistent set of APIs used by our other storage classes, with full integration into object lifecycle management so that you can tier cold objects down to optimize your total cost of ownership.
It seems to have the same pricing as the other storage classes: no fees for accessing the files within the same region, and the typical bandwidth fees if the backups are downloaded to somewhere else.
Google has burned people so many times by shuttering products with little to no warning that I'd be hesitant to trust them with my long-term data storage.
Eh... for consumer stuff, sure, and perhaps even for new/experimental GCP features. But this is storage, a core function, on GCP, an enterprise service with actual contracts and SLAs attached.
Just don't think of it as something you'll ever want to restore unless the building burns down and you've lost everything.
Glacier's restore costs had a lot of fees in my one experience. We could have bought several RAID units for the price of a fast restore. If you asked for it back over a long period of time, the price dropped dramatically.