I don’t agree here. There are operators, like the one I maintain (CloudNativePG), that work directly with Kubernetes, teaching it how to handle Postgres clusters as a coordinated set of instances. Enormous improvements have been made in the last couple of years, and we are particularly focused on working together with the storage groups in Kubernetes to handle database workloads, for example with declarative support for tablespaces and volume snapshots.
I have been a very happy user of CNPG, even with occasional issues (database backup to GCS tripped me up a few times, but it works; mostly UX rough edges where I was never sure the failure wasn't mine).
Now I only really need some automation for "recover the database and switch clients over to it" (I understand why CNPG doesn't do recovery into an existing database, but it is a bit annoying).
I actually do not understand the point here. And maybe you are not very familiar with the concept of transactions. Backups can only account for committed transactions.
However, we are talking about Postgres here, not a generic database. PostgreSQL natively provides continuous backup and streaming replication, including synchronous (controlled at the transaction level), cascading, and logical replication. With Postgres, even in Kubernetes with CloudNativePG, you can easily implement architectures with RPO=0 (yes, zero data loss) and low RTO within the same Kubernetes cluster (normally a region), and RPO <= 5 minutes with low RTO across regions. Out of the box, with CloudNativePG, through replica clusters.
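For in-cluster RPO=0, the idea is quorum-based synchronous replication declared on the Cluster resource. A minimal sketch, assuming the CloudNativePG v1 `Cluster` API (the name and storage size are illustrative):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-main
spec:
  # One primary plus two standbys
  instances: 3
  # Synchronous replication: a commit returns only after at least
  # one standby has confirmed it, giving zero data loss (RPO=0)
  # if the primary is lost
  minSyncReplicas: 1
  maxSyncReplicas: 1
  storage:
    size: 50Gi
```

Cross-region disaster recovery would then be layered on top via a replica cluster bootstrapped from the same backup object store.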
We are also now launching native declarative support for the Kubernetes VolumeSnapshot API in CloudNativePG, with the possibility to use incremental/differential backup and recovery to reduce RTO when recovering very large databases (like ... dozens of seconds to restore a 500GB database).
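To give a flavour of the declarative volume snapshot support, here is a hedged sketch: the `backup.volumeSnapshot` stanza and the `Backup` resource's `method` field are from the CloudNativePG v1 API as I understand it, while the snapshot class name and sizes are placeholders you would replace with your CSI driver's values:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-main
spec:
  instances: 3
  storage:
    size: 500Gi
  backup:
    volumeSnapshot:
      # VolumeSnapshotClass provided by your CSI driver (placeholder name)
      className: csi-snapclass
---
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: pg-main-snap
spec:
  cluster:
    name: pg-main
  # Take a storage-level snapshot instead of a traditional base backup
  method: volumeSnapshot
```

Because recovery from a snapshot is a storage-level clone rather than a byte-by-byte restore, the restore time stops scaling with database size.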
So maybe it is time to reconsider some assumptions.
I did read the article you linked. It does touch a little bit on potential benefits of having databases managed the same way the services using them are, and integration tests that include the database.
However, I’m still scratching my head over how any of that is better than, or not possible with, a Postgres installation that is outside of Kubernetes. Let’s take the management of the databases themselves out of the picture and assume a managed database service like RDS Postgres. Why would one want to run Postgres on EKS over having their pods on EKS talk to RDS Postgres?
I feel like I’m missing some technical reason/advantage that makes all these people choose to run Postgres on Kubernetes with operators and whatnot.
What does Kubernetes bring to the table over a separate Postgres installation that doesn't run the risk of Kubernetes interfering with its reliability or operation?
The reason could be the last 4 years of evolution in Kubernetes. Have you heard of the DoK (Data on Kubernetes) Community? It might be a good place to start.
Why not dedicate some worker nodes using taints/tolerations/labels, even on bare metal, with locally attached storage? I wrote this many years ago now but that's the reason why we started CloudNativePG (OpenEBS might not be the answer today, but there are many storage engines now, including topolvm which brings LVM to the game): https://www.2ndquadrant.com/en/blog/local-persistent-volumes...
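Dedicating worker nodes this way is mostly standard Kubernetes scheduling. A sketch of what that could look like, assuming the `affinity` stanza of the CloudNativePG `Cluster` spec (the taint key, label, and storage class name are made up for illustration):

```yaml
# Taint the dedicated nodes first, e.g.:
#   kubectl taint nodes node-a dedicated=postgres:NoSchedule
#   kubectl label nodes node-a node-role/postgres="true"
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-main
spec:
  instances: 3
  affinity:
    # Only schedule Postgres pods on the labelled nodes...
    nodeSelector:
      node-role/postgres: "true"
    # ...and allow them to tolerate the taint that keeps everyone else off
    tolerations:
    - key: dedicated
      value: postgres
      effect: NoSchedule
  storage:
    # Locally attached storage via a local/LVM storage class (e.g. topolvm)
    storageClass: local-nvme
    size: 500Gi
```

With locally attached NVMe and Postgres-level replication on top, you get shared-nothing performance without giving up automated failover.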
It is ultimately your choice. I am a big fan of shared nothing architecture for the database. (I am a maintainer of CloudNativePG)
Yeah, and let Postgres take care of redundancy.
I agree that this is an interesting proposition.
AFAIK Portworx could do a similar thing, but then with storage redundancy.
Basically:
- storage is synced to 3 local storage devices spread across 3 different k8s nodes. This could be NVMe.
- pod is only scheduled next to one of the three
- reads are local, writes are local (for fsync) and synchronised to the other devices.
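The replication factor in that scheme is usually set on the storage class. A rough sketch, with the caveat that the provisioner name and the `repl` parameter are my recollection of the Portworx documentation, not something I have verified recently:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-repl3
# Portworx CSI provisioner (assumed name; check your install)
provisioner: pxd.portworx.com
parameters:
  # Keep 3 synchronous replicas of each volume across different nodes
  repl: "3"
```

Any pod using a PVC from this class would then be scheduled next to one of the three replicas, matching the local-read/local-write behaviour described above.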
I would love to test Portworx with pg_tps_optimizer.
Back then, we evaluated the Crunchy operator's source code. Being primarily imperative and using an external tool for failover were the two main reasons we decided to start a new project in 2019 that was entirely declarative and purely based on the Kubernetes API server for cluster status. That project was released as open source last April under the name CloudNativePG, and hopefully it will enter the CNCF Sandbox soon (fingers crossed).
Regarding being opinionated, I believe that is exactly what we expect from an operator: an operator simulates what human DBAs would do. I am a maintainer of CloudNativePG, and I have been running and supporting PostgreSQL in production for 15+ years, also creating another open source tool for backups (Barman). In CloudNativePG we have essentially translated our recipes into Go code and tests.
Many people believe that databases should not run in Kubernetes. I not only believe the opposite; I believe that running Postgres in Kubernetes is potentially the best way to run Postgres out there.
What I've seen from teams that use operators is that nobody ends up understanding how to manage the situation when something goes awry.
I've run into way too many exotic edge cases with Kubernetes to trust an operator to do the right thing with data I care about, especially when the operator is also managing replication, the replicas, and their underlying storage.
I am pro running datastores and other stateful workloads on Kubernetes. I've been running databases on Kubernetes since PetSets.
That is why we took the approach of reducing the number of components and integrating everything into Kubernetes, especially logging (we log JSON directly to standard output) and the use of application containers. This lets us cover the troubleshooting case via the fencing mechanism: your pods are up and you can access the storage, but Postgres is shut down, giving you the possibility to investigate even potential data corruption issues.
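Fencing is itself declarative: you annotate the Cluster, the operator shuts Postgres down, and the pods and volumes stay available for inspection. A sketch, assuming the `cnpg.io/fencedInstances` annotation from the CloudNativePG documentation:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-main
  annotations:
    # Fence every instance: pods keep running and PVCs stay mounted,
    # but Postgres is stopped so you can safely inspect the data files
    cnpg.io/fencedInstances: '["*"]'
spec:
  instances: 3
  storage:
    size: 50Gi
```

Removing the annotation (or listing only specific instance names instead of `"*"`) lifts the fence and lets the operator restart Postgres.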
Also, the status is directly available in Kubernetes, which in our view makes it easier for Kubernetes administrators.
Finally, the source code is open source and directly available for inspection - if you want to understand what is happening.
So, instead of (re-)inventing the wheel, I would encourage your team to look into something like CloudNativePG, and perhaps help contribute to better documentation.
I believe that would be the way to a) let your project benefit from using an operator for Postgres (hence scaling and improving more quickly) and b) help a great bunch of other folks also benefit from that shared knowledge.