I wonder if you could do some kind of automated A/B system here, and instead of every commit, just do it once a day: Every night, update one set of servers with the latest stable branch code. Route N% of your users to those servers. If faults start showing up, push N to 0 and notify developers. Otherwise, slowly move N towards 100 throughout the day. By the end of the day you're either fully deployed, or rolled back.
(I already see faults with this...it's presented here as a thought experiment, not advice)
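To make the thought experiment concrete, here is a minimal sketch of the ramp in Python. The function names, the hash-based bucketing, and the step size are all assumptions for illustration; nothing here describes a real deployment system.

```python
import hashlib


def routes_to_new(user_id, percent):
    """Decide whether a user lands on the updated servers.

    Hashing the id keeps each user in the same bucket all day
    (hypothetical scheme, not taken from the discussion above).
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def next_percent(current, fault_detected, step=10):
    """Return the new N: 0 on a fault, otherwise one step closer to 100."""
    if fault_detected:
        return 0  # roll back: everyone goes to the old servers
    return min(100, current + step)


def run_day(fault_checks, step=10):
    """Walk N through a day; fault_checks holds one boolean per check."""
    percent = step
    history = [percent]
    for fault in fault_checks:
        percent = next_percent(percent, fault, step)
        history.append(percent)
        if percent == 0:
            break  # this is where developers would be notified
    return history
```

A day with no faults ends at 100, and a single fault drops N to 0 and stops the ramp.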
What we implement is much closer to this suggestion. We move the code out to a couple of machines from each of our different web frontend servers. After a minute we compare before and after across numerous metrics (load average, CPU, errors, page failures, etc). If the revision passes, we roll it out to 100% of the cluster and do the same monitoring for 5 minutes.
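A before/after comparison like the one described could look roughly like this. It is only a sketch: the metric names, the 20% tolerance, and the rule for metrics that start at zero are assumptions, not the actual thresholds used.

```python
def regressions(before, after, tolerance=0.2):
    """Return the metrics that grew by more than `tolerance` (relative).

    `before` and `after` map metric names to readings where lower is
    better (load average, CPU, error counts, page failures).
    """
    failed = []
    for name, old in before.items():
        new = after[name]
        # Metrics that were 0 before get an absolute allowance instead
        # of a relative one (assumed rule, chosen for illustration).
        allowed = old * (1 + tolerance) if old > 0 else tolerance
        if new > allowed:
            failed.append(name)
    return failed


def revision_passes(before, after, tolerance=0.2):
    """The revision passes only if no metric regressed."""
    return not regressions(before, after, tolerance)
```

The same check would run twice: once after a minute on the first few machines, and again during the 5 minutes after the full rollout.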
Originally we intended to put business metrics in these tests, but it turns out we rarely regress on them via code changes, and when we do it takes a human to figure out what went wrong. Instead we test business metrics (and lots of other stuff) via Nagios, which gives us a 1-5 minute sampling frequency, good enough for most of our issues.
I did not cover how you would iterate towards the ideal (instant deploy) and what concessions you might have to make. Our 6-minute deploy is actually quite inefficient, but it's not the bottleneck in our deploy system.
(If you're wondering what our bottleneck is, our automated tests take 9 to 12 minutes despite being spread across 40 machines... Selenium in Internet Explorer is slow.)
I ask because I swear A/B and multivariate tests have been on my mind a lot lately, and when I finished reading the article, the first thing I thought was: Why not just deploy to 1% of the users and see if it works?
Then I thought about how hard it would be to manage multiple versions of the same software, especially the data, amongst different users. Certain features presented to the 1% might be incompatible with the other 99%. But that's a technical problem. Very hard to solve, but manageable. Then I imagined some kind of framework that would make communication between different versions of the data floating around easier, with "how to transform" rules to convert data between version 1.1 and 1.2 in both directions.
Anyway, I am really curious, because it sounds like a good solution :)
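The "how to transform" idea could be sketched as a registry of converters keyed by version pair. Everything below is hypothetical: the `converter` decorator, the record fields, and the 1.1/1.2 schema change are made up for illustration.

```python
CONVERTERS = {}


def converter(src, dst):
    """Register a function that converts a record from src to dst."""
    def register(fn):
        CONVERTERS[(src, dst)] = fn
        return fn
    return register


@converter("1.1", "1.2")
def upgrade(record):
    # Hypothetical change: 1.2 splits "name" into "first" and "last".
    first, _, last = record["name"].partition(" ")
    return {"first": first, "last": last}


@converter("1.2", "1.1")
def downgrade(record):
    # The reverse direction, so 1.1 servers can read 1.2 data.
    return {"name": f'{record["first"]} {record["last"]}'.strip()}


def convert(record, src, dst):
    """Convert a record between two versions using the registry."""
    if src == dst:
        return dict(record)
    return CONVERTERS[(src, dst)](record)
```

This only works when every change has a lossless reverse, which is probably the hard part in practice.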
As far as data transformation goes, you have two options:
1. Something akin to ActiveMigration in the Ruby on Rails world. This allows going back and forth between different versions of your data's schema.
2. Use a more open data scheme, such as the one Google App Engine uses, where adding/removing properties on an object is not as disruptive as in SQL-based solutions.
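For the first option, the core idea is that each schema change carries both an "up" and a "down" step. Here is a minimal sketch using Python's built-in sqlite3 module. It is not how Rails implements migrations, and the table and index names are invented for the example.

```python
import sqlite3

# Each entry is (up, down). The list index is the schema version it leads from.
MIGRATIONS = [
    ("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
     "DROP TABLE users"),
    ("CREATE INDEX idx_users_name ON users (name)",
     "DROP INDEX idx_users_name"),
]


def migrate(conn, current, target):
    """Move the schema from version `current` to `target`, either way."""
    while current < target:
        conn.execute(MIGRATIONS[current][0])  # apply the "up" step
        current += 1
    while current > target:
        current -= 1
        conn.execute(MIGRATIONS[current][1])  # apply the "down" step
    conn.commit()
    return current


def schema_objects(conn):
    """List the user-defined tables and indexes, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    return [row[0] for row in rows]
```

Calling `migrate(conn, 0, 2)` on an empty in-memory database builds the table and index, and `migrate(conn, 2, 0)` removes them again in reverse order.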
It turns out the system works fine with SQL-based alters. We do have to do real work to deploy expensive alters (apply them to standbys, fail over, repeat, or worse) but in general it's cheap to change schemas.
Unfortunately, rolling back schema changes takes a lot of manual work, so it's one of the few places where we put old-school process in place (a DBA who reviews all schema changes prior to deployment).
ActiveMigration really solves a different problem. Our problem is that adding indexes to or altering popular tables is impossible to do on a live production database. To get those changes out we have to go through quite a bit of extra work. It's really a MySQL limitation, not a process problem.
This is an excellent idea, please continue to explore it.
As my token contribution - Google App Engine allows storing and accessing several versions of the app (accessed through different subdomains). Perhaps one could use DNS to trick different users into seeing different app versions? Not quite what you wanted, but down the right path.