pg_stat_statements is useful, but it isn't completely free, e.g. [1] (admittedly from pg12, which is three major versions ago).
Most people should probably leave it enabled, but if you have a database with a very defined and understood query pattern, then it might be worth running perf and checking if it is costing you.
Most modern processors are fast enough that it's cheaper to check than not to check. Some people might argue for lower-level optimizations like this, but in this case I don't think it's a big deal.
Bottom line: enabling this doesn't hurt. It might help, and it doesn't cost anything to check.
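For anyone who wants to measure rather than guess, enabling and querying the extension is straightforward. A sketch (column names assume pg13+, where the older total_time was split into total_exec_time):

```sql
-- postgresql.conf (loading the library requires a restart)
shared_preload_libraries = 'pg_stat_statements'

-- then, in the target database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- top 5 statements by cumulative execution time
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```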
Wondering about pgbouncer: a Postgres connection can have state attached, e.g. via SET. Does pgbouncer save and restore that state, with an associated perf cost, or does a pooled connection just receive whatever state the previous owner left behind, on the assumption that all state should be identical?
pgbouncer supports session and transaction pooling. With session pooling, a connection is associated with the same session throughout its use. With transaction pooling, each transaction on the connection may get a different session.
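The difference between the modes shows up in the config. A pgbouncer.ini sketch (the reset behavior described in the comments is my understanding of the defaults; worth verifying against your version's docs):

```ini
; pgbouncer.ini (sketch)
[pgbouncer]
; session: one server connection per client session, so SET state is safe
; transaction: server connections are handed out per transaction, so
;              session state (SET, prepared statements) can leak
pool_mode = transaction

; In session mode, this runs when a connection returns to the pool:
server_reset_query = DISCARD ALL
; In transaction mode the reset query is skipped by default (clients are
; expected to avoid session state); setting this to 1 forces it, at a cost.
server_reset_query_always = 0
```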
What does Crunchy cost? The page says to ask, which typically means you can't afford it. I wonder if it's feasible for a small company. Is it competitive with RDS?
The starting prices for hobby (i.e. development) instances [0] are about half of what you would pay for AWS RDS [1]; if you choose high availability, the prices come out equal. This is for Crunchy Bridge, which has a calculator on its website. As with Amazon RDS, though, prices increase dramatically once you move to a decent machine and larger storage. I'm using us-east-1 for the comparison.
Honestly, as always, I think these managed database services are way too expensive. If I really needed to choose one of them I would go with the AWS service, because you also have to factor in data egress costs in your cloud costs calculation which will vary a lot depending on your workload.
It's quite a bit more than RDS/cloud-provider managed. I don't see a reason to use them over a cloud provider's native service if you're already in that provider's Hotel California.
Product manager here from Crunchy. We aim to be quite in line with any of the major cloud providers; a number of customers have moved over from each of them and seen better price-to-performance and quality of support.
Just as a data point: we are planning to move to Crunchy and away from Heroku soon. We tested the two, and we have 4X improved performance on Crunchy Postgres vs Heroku Postgres.
4X. We were astounded. And so were the Heroku folks I talked to, who I'm guessing will want to try to bring similar performance boosts.
We haven’t done the switch quite yet because we’re waiting on Crunchy to bring out their Heroku Dataclips replacement, which is something we rely on a lot. But once it’s ready, we’re going to switch!
Great. Wonder if you offer a la carte consulting on modeling, performance, etc.? Wouldn't need it often, but rather occasional "ask an expert" help when hitting a bottleneck. This is a bit different than support.
We don't do this so much a la carte for non-customers, but we definitely have it available for any of our existing customers. It's a common request, and in general we stay very busy with it already. I often tell customers to begin with a support request; often in 15-30 minutes we can give some helpful insights without having to go for a bucket of consulting hours. It just depends on what and how much folks need.
One thing I really like about RDS is the query visualizer. It shows which queries ran when and for how long and what they spent their time waiting for, CPU or disk or locks or whatever. It’s really good. Are there any free tools that show a similar view?
edit: i think it’s called performance insights in RDS
Would be a whole lot better if it didn't prune their query logs to x (I think 1000?) characters.
We autogenerate many of our own queries which can have significant complexity (regularly over 10 joins, sometimes up to 30!) and our infrastructure isn't quite there yet to be able to recreate the exact query plan a customer saw on their own data without a lot of work. It could all be so much simpler, so if there is a setting to prevent this please tell me!
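If the truncation is happening on the Postgres side rather than inside Performance Insights itself, my understanding (worth verifying) is that it's governed by track_activity_query_size, which caps the query text visible in pg_stat_activity, the view Performance Insights samples:

```ini
# postgresql.conf / RDS parameter group (requires a restart)
# Default is 1024 bytes; query text longer than this is truncated.
track_activity_query_size = 32768
```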
Which is a great thing. If this were on YouTube, we'd have a 20-minute video with a long, unnecessary introduction and a sneaky one-minute ad in the middle.
>There seems to be a common lifecycle of indexes within applications. First you start off with almost none, maybe a few on primary keys. Then you start adding them, one by one, two by two, until you've got quite a few indexes for most any query you can run. Something is slow? Throw an index at it. What you end up with is some contention on overall throughput of your database, and well a lot of indexes that became a tangled ball of yarn over time.
I spent years doing index maintenance on a large corporate database. Creating and rebuilding indexes was a hassle. So I set about building a new kind of database engine where every column in a table organizes its data for fast access (e.g. a columnar store with no separate indexing structure).
I expected my system to be substantially faster than a Postgres table without indexes for a broad range of queries. What I didn't expect was for it to also be much faster for tables that were highly indexed in Postgres. I ran tons of queries on both systems and almost all of them were like this video:
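For readers unfamiliar with why a per-column layout can beat even an indexed row store on scan-style queries, here's a toy Python sketch of the general idea (nothing here reflects Didget's actual implementation): a predicate on one column only touches that column's contiguous array, instead of every field of every row.

```python
# Toy illustration: filtering on one column in a columnar layout scans
# only that column's array; a row store must visit whole records.

rows = [  # row-oriented: each record stored together
    {"id": i, "city": "Chicago" if i % 4 == 0 else "Other", "amount": i * 10}
    for i in range(1_000)
]

# column-oriented: one array per column
columns = {
    "id": [r["id"] for r in rows],
    "city": [r["city"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# Row store: scan whole records to evaluate the predicate.
row_hits = [r["id"] for r in rows if r["city"] == "Chicago"]

# Column store: scan only the 'city' array, then fetch matching ids.
col_hits = [columns["id"][i]
            for i, c in enumerate(columns["city"]) if c == "Chicago"]

assert row_hits == col_hits
print(len(col_hits))  # 250 rows match
```

Real columnar engines add compression and vectorized execution on top of this, which widens the gap further.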
You self-promote here often¹ (not judging, just an observation), and here are some thoughts that have occurred to me while reading your posts and comments.
• Your vision for Didget is blurry. Sometimes it's a database (or "general purpose data management system"), sometimes it's a file system. Products that are dessert toppings and floor waxes rarely find purchase. My advice: Decide which problem domain you're playing in, then which problem(s) within that domain this solves.
• The Postgres comparison seems naive. As the Postgres wiki notes, "PostgreSQL ships with a basic configuration tuned for wide compatibility rather than performance. Odds are good the default parameters are very undersized for your system." A comparison with a system-appropriate Postgres configuration would be more interesting.
• Promoting a new closed-source database seems anachronistic when there's a plethora of fine open-source options. If you're wondering why you're not getting many leads from HN, I'd imagine this is the main reason.
Whenever I mention my project's website in an HN comment, I am 'self-promoting' so you are correct. In my experience that is not something that is frowned upon on HN, since I see this kind of thing all the time (the article I commented on was promoting another product).
Didgets, as it exists today, is a technology rather than a 'product'. It consists of a set of highly optimized data objects that can be used to build complex systems like hierarchical file systems, relational database tables, logging frameworks, configuration managers, or content indexers. I have implemented enough functionality into the browser application to prove that it can do those things well, but none of them are fully implemented yet. I am trying to find the right product-market fit.
I have not yet open-sourced any of the code, but I probably will once I decide which open-source license is best for it. I am simply trying to introduce this technology to others to see if there is any interest in it.
Note: While I didn't explore every configuration option in Postgres (so I can't be confident it ran as fast as it possibly could), I did not just use the default configuration. I tried all the usual tricks to speed it up.
> It consists of a set of highly optimized data objects that can be used to build complex systems like hierarchical file systems, relational database tables, logging frameworks, configuration managers, or content indexers.
That sounds like a "general-purpose database", unless there's something that disqualifies Didget from being that. If it's a database, you're probably competing with something closer to the free and open-source SQLite than you are the free and open-source Postgres. This can be done (see rqlite), but in any case it seems like being free and open-source would be necessary (although not sufficient) for people to consider it.
> I tried all the usual tricks to speed it up.
I don't know what "usual tricks" means (maybe you have a link?), but I think it would add some credence to performance claims to specify what you're doing.
One potential hint is the Postgres queries are being executed as sequential scans, as you can see from the explain output. The first query only returns 200k rows (out of 7M), so even without any config tweaks it should be using an index if one is present. So, my guess is those queries are being made against unindexed columns. Also, I'm guessing this is vanilla Postgres, so it's possible OP hasn't tweaked its config for the machine they're using, and the default config pg ships with is designed for fairly resource-constrained machines by today's standards. If the data is indexed, it's possible the rows are all over the place, and clustering could help.
In addition, I wonder if random_page_cost has been lowered toward 1. On machines with SSD-only storage, I've noticed that lowering it makes the query planner much more predictable: it favors indexes, and sequential scans go almost unused.
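A quick way to test both guesses (the table and column names below are made up for illustration, not taken from the video):

```sql
-- See whether the planner picks a sequential scan or an index scan:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM crimes WHERE primary_type = 'THEFT';

-- The default random_page_cost of 4.0 overstates the penalty of random
-- I/O on SSDs; values near 1 make index scans more attractive.
SET random_page_cost = 1.1;   -- session-level, for experimentation
-- or persist it cluster-wide:
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();
```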
I would love it if even one of the database experts who are so sure the comparisons are somehow rigged would actually run their own benchmark to prove them wrong instead of just speculating.
Get any moderately sized table (millions of rows and a dozen or more columns) on your 'highly-tuned' Postgres database with all the proper indexes in place, load that same data into Didgets (it takes all of 10 minutes to download the software and start using it), and run whatever queries suit your fancy.
My post doesn't assume bad intent. As the new entrant in a space, the onus is on you to show your work. So, if you want people to run their own benchmark then share the data and configs you used.
I actually tried to find the dataset you used yesterday to run my own benchmark, but the only official "Chicago crime database" I could find had only 300k rows and 19 columns, so I assumed it wasn't the same data you mention in your video.
Even though I replied to your post, it was more a general observation than a reply to you specifically. My video has been viewed over a thousand times, and I get comments all the time saying it can't be real or that I must be handicapping the other databases (I also have a video comparing against SQLite) by not configuring them correctly.
I have yet to have a single person who claims to be a database expert try it out on their own favorite data set and tell me that they found Didgets to be slower than their preferred database engine.
The engine was created from scratch. It can do JOINs but hasn't yet implemented all the different kinds of joins.
It is a completely different architecture (originally designed as a file-system replacement where multiple tags could be attached to each and every file), and the database functionality was almost discovered by accident. The tags I invented to make finding files extremely fast looked a lot like a columnar store, so I tried building regular relational tables out of them. It surprised me when queries against my tables gave other databases (with decades of development behind them) a real run for their money.
A more generous interpretation would be the assumption that primary keys are (as they frequently are) synthetic data, and perhaps not the most commonly queried field aside from direct get-that-resource queries.
GP edited their comment after I responded and didn’t mention it. Completely changed the comment. Was very antagonistic to the author of the article before.
A DBMS still has to identify a particular row somehow. That identifier is the CTID in Postgres and the ROWID in Oracle; both change if the data is moved around.
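In Postgres you can watch this happen directly (`items` here is a hypothetical table):

```sql
-- ctid encodes (block number, tuple index): the row's current physical
-- location in the heap. It is stable only until the row is moved.
SELECT ctid, * FROM items LIMIT 3;

-- An UPDATE (or VACUUM FULL / CLUSTER) writes a new row version at a new
-- location, so the ctid changes; don't persist ctids as long-lived ids.
UPDATE items SET updated_at = now() WHERE id = 1;
SELECT ctid FROM items WHERE id = 1;
```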
[1] https://www.alibabacloud.com/blog/postgresql-v12-how-pg-stat...