All The Benchmarks | Scaling Postgres 375

Episode 375 | July 20, 2025 | 00:16:19

Scaling Postgres

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss new benchmarks published in response to the PlanetScale Postgres announcement, various platform improvements and a deep dive into Multigres.

To get the show notes as well as get notified of new episodes, visit: 
https://www.scalingpostgres.com/episodes/375-all-the-benchmarks/

Want to learn more about Postgres performance? Join my FREE training called Postgres Performance Demystified here: https://www.scalingpostgres.com/courses/postgres-performance-demystified/


Episode Transcript

[00:00:00] Two episodes ago on Scaling Postgres, we talked about PlanetScale Postgres. PlanetScale is now offering a Postgres solution in addition to its MySQL solution, and specifically it is running Postgres on bare metal. So essentially it's using local NVMe drives, and they posted performance benchmarks comparing themselves to a lot of other cloud providers. In the content this week, more than one organization has come forward to discuss those results and give their take on them. Also, be sure to stay till the end for my consulting corner, where I'm going to give another brief update on the summer of upgrades I'm doing for Postgres. But I hope you, your friends, family and co-workers continue to do well.

Our first piece of content is a reaction to the PlanetScale PostgreSQL benchmarks. This is from xata.io. Now, they were not specifically mentioned in the PlanetScale Postgres benchmarks, but they do have a hosted offering as well. So they went through the benchmarks, discussed where they thought they were good and where they thought they needed improvement, and gave their overall take. They did take issue with the fact that PlanetScale left the Postgres configuration at each platform's defaults, as opposed to tweaking it for each one. But most of the platforms I'm familiar with do adjust things like shared_buffers and effective_cache_size as you go up in instance size, so I'm not too concerned about that. They also emphasized that the biggest negative of going with local NVMe drives on instances is that you can't just extend the storage. So if you do choose to go the local storage route, it's much easier to run out of disk, whereas if you're using AWS or any cloud provider with network-attached volumes, you can just say, hey, expand the volume by another terabyte or so.

[00:01:59] But of course, they also compared their own Postgres solution to the benchmarks, and of course, in their case they came out on top as well, which is what you would expect. Basically, whenever a company runs a benchmark, you'll usually see them at the top of the list, so take benchmarks with a grain of salt. The next blog post, still related to this, is separation of storage and compute without a performance trade-off. This is from neon.com, and basically they took issue with some of the PlanetScale benchmarks because the benchmarks didn't enable Neon's local file cache. Because Neon separates compute from storage, it relies on a remote storage system, but in order to keep performance high it implements a local file cache. I think this was discussed in another blog post covered a few weeks ago on Scaling Postgres as well, the one I think OrioleDB did, where they talked about the Neon architecture and what I believe they call page servers. Neon ran some benchmarks here: with a local disk solution you hit a QPS of 18,000 and a P99 latency of 330 milliseconds. With the Neon defaults you hit a QPS of 12,500, so definitely much lower than the local disk solution, and the P99 latency is around 600 milliseconds, so that's close to double the local disk number. But if you enable their large LFC, or local file cache, the performance gets much better: it reaches 17,000 QPS, so only 1,000 behind the local disk, and a P99 latency of 384 milliseconds compared to 330 for the local disk.

[00:03:44] So basically they're trying to show that you can get local-disk-like performance if you add a large LFC to your Neon installation.
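As a quick aside on the configuration-defaults point from the xata.io discussion above, here is a minimal sketch of how you could check whether a platform really left those memory settings at stock defaults before comparing benchmark numbers. It assumes psycopg2 and a DATABASE_URL environment variable, both my own choices rather than anything from the posts.

```python
import os
import psycopg2

# Connect using a DSN taken from the environment (hypothetical setup).
conn = psycopg2.connect(os.environ["DATABASE_URL"])
try:
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT name, setting, unit, source
            FROM pg_settings
            WHERE name IN ('shared_buffers', 'effective_cache_size', 'work_mem')
            ORDER BY name
            """
        )
        for name, setting, unit, source in cur.fetchall():
            # source = 'default' means nobody has overridden the stock value
            print(f"{name}: {setting} {unit or ''} (source: {source})")
finally:
    conn.close()
```

The source column is the useful part here: 'default' means the compiled-in value is still in effect, while values like 'configuration file' indicate the platform has tuned the setting.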
Neon also mentioned the fact that it's hard to expand the volumes when you're dealing with a local disk solution. Now, kind of related to this, Neon also made an announcement this week about recent storage performance improvements at Neon. Again, I'm wondering if this was still triggered by the PlanetScale announcement, but they talk about various enhancements like sharded pageserver ingestion, adding compressed WAL transmission, faster safekeeper disk writes (so that's the WAL consumption side), scan prefetches and more responsive L0 compaction to lower read latency during heavy ingestion, and parallel, pipelined S3 uploads. So they've definitely been doing some engineering improvements. Although another post they released this week is a recap on May/June stability, so apparently they've had some stability issues with their platform, but the reason is that the platform is seeing a lot more use. Very recently, new database creation increased more than 5x and the rate of branch creation increased more than 50x. So it's not unexpected; when you have a very popular platform that's being used heavily, you're probably going to have some growing pains. This post basically talks about that.

Next piece of content: there was another episode of Postgres FM last week. This one was on Multigres, and Sugu, who actually designed and worked on Vitess and has now started on Multigres, joined Nikolai and Michael to talk about the project. Basically, Multigres is Vitess for Postgres. So to my thinking, it's a proxy of sorts that sits in front of your Postgres installation and handles things like high availability.

[00:05:43] Maybe some sharding too, although sharding doesn't seem to be the primary purpose of it. They talked about the benefits of local storage, because a Multigres solution isn't necessarily going to use a network-attached storage layer; you could use local disks with it. But they're basically talking about whole-database-cluster management, where you can achieve five nines of availability, which is pretty impressive, and that's by having more, smaller databases as opposed to really huge ones. He talked about looking to implement a consensus algorithm within Multigres that is not necessarily Raft, but similar to Raft, designed specifically to work in conjunction with Postgres WAL records, because ideally he doesn't want to modify anything in Postgres. But he did say there are, I think, a few patches that would have to be made, and they would work to get those done. They also talked about a situation where synchronous replication doesn't necessarily 100% avoid data loss, and they were referencing the talk called Myths and Truths about Synchronous Replication in PostgreSQL.

[00:06:53] But they were saying that Multigres will work to try to ensure no data loss if a particular instance goes down. They also talked about the ability to support distributed materialized views that get updated automatically within the system, as well as two-phase commits and even Postgres isolation. Now, I will warn you, this is probably their longest episode. I think it was over an hour and 20 minutes. But it is very interesting hearing about the plans for Multigres. So if you want to do that, you can listen to the audio version of the show here or watch the YouTube video down here.

[00:07:34] Next piece of content: OrioleDB beta 12 features and benchmarks.
This is from orioledb.com, and it seems appropriate given all the benchmark talk in this episode and the one a few weeks ago. OrioleDB continues to march forward with their different enhancements. Basically, they're changing the table access method from the heap storage engine to a newly designed one that's essentially based on undo logs, and we've covered a lot of this in previous episodes of Scaling Postgres. But the interesting thing to me about beta 12 is the set of benchmarks down here. The first one is go-tpc, which is a Go-based TPC-C benchmark, and here they're comparing stock Postgres to OrioleDB. What's interesting is the performance difference as they scaled up instance sizes. On the smallest one, called large, the score still changes from around 6,000 to 7,500, so that's nothing to sneeze at. But once they got up to a 16XL size, it went from 83,000 to 218,000, so that's more than twice as fast. So the performance advantage accelerates the better hardware you throw at it, given the more modern architectural changes that have been made in OrioleDB. The next benchmark they looked at was sysbench. It looks like they didn't increase the instance sizes here, but on average it was basically a doubling of performance. And then the third benchmark was an OLTP one, and this wasn't as significant: the Postgres heap averaged 32,000 queries per second, whereas OrioleDB averaged 37,000 queries per second.

[00:09:16] Now, this was a more read-heavy benchmark, so perhaps that's why the performance improvement wasn't as significant.

[00:09:23] But if you want to learn more, definitely check out this blog post.

[00:09:27] Next piece of content: Postgres LISTEN/NOTIFY does not scale. This is from Recall.ai, and they have a product that basically sits in meetings like a meeting bot, records, takes notes and then sends them to people afterwards. But as their system was running, apparently back in March, they saw massive spikes in database load and active and waiting sessions.

[00:09:52] Database query throughput dropped massively, and database CPU, disk I/O and network traffic all plummeted, so no real work was going on. The reason is, quote, when a NOTIFY query is issued during a transaction, it acquires a global lock on the entire database during the commit phase of the transaction, which has the effect of basically serializing all commits. So if you have a lot of writers and you're trying to use the LISTEN/NOTIFY functionality, you could hit a problem just like they apparently were. They were seeing messages that said access exclusive lock on object 0 of class 1262 of database 0. Then they looked in the Postgres source code, where writers are basically serialized by locking the shared database object with an access exclusive lock. They also show some of the results without LISTEN/NOTIFY: there still seems to be a lot of waiting, but the wait types are much different, mostly locked objects and locked tuples. And you can easily see the drop in performance with LISTEN/NOTIFY in place: the network and the CPU utilization basically dropped off a cliff compared to when they weren't using it.

[00:11:12] So I've never been one to do implementations using LISTEN/NOTIFY, mostly because you have to keep a Postgres session open all the time to monitor it. Maybe if there were some additional tooling around it, but I was unaware of how restrictive this code path can be.
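To make that "keep a session open" point concrete, here is a minimal sketch of the listener side using psycopg2, my choice of driver rather than anything from the Recall.ai post, with a hypothetical DSN and channel name. This is the long-lived, dedicated connection you have to babysit; the commit-serialization problem the post describes happens on the NOTIFY side whenever notifications are sent inside write transactions.

```python
import select
import psycopg2
import psycopg2.extensions

# Hypothetical DSN and channel name, purely for illustration.
conn = psycopg2.connect("dbname=app")
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

with conn.cursor() as cur:
    cur.execute("LISTEN job_events;")

print("Waiting for notifications on 'job_events'...")
while True:
    # Block until the connection's socket is readable, or time out after 5s.
    if select.select([conn], [], [], 5) == ([], [], []):
        continue  # nothing arrived; keep the session alive and loop
    conn.poll()
    while conn.notifies:
        n = conn.notifies.pop(0)
        print(f"NOTIFY from pid {n.pid} on {n.channel}: {n.payload}")
```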
But if you want to learn more about what they ran into, definitely check out their post.

[00:11:31] Next piece of content: How to Get Foreign Keys Horribly Wrong. This is from hakibenita.com. Now, I should warn you, this is a very Django-heavy post. You can tease some Postgres things out of it, but a lot of it is about working with the Django ORM, potentially fighting against it, and configuring it to do what it should be doing. For example, Django automatically likes to create an index when you create a foreign key relationship, and maybe sometimes you don't want that, or maybe there are only specific values you need an index for, so you want to use a partial index instead (there's a minimal Django sketch of that idea at the end of this segment). This was a talk given at DjangoCon Europe.

[00:12:13] So it's definitely Django-heavy. But if you use Django and want to know how to work with its ORM to implement foreign key relationships in Postgres, it's definitely a piece of content to check out.

[00:12:25] Next piece of content: Detection and Resolution of Conflicts in PostgreSQL Logical Replication. This is from postgresql.fastware.com, and it's actually an embedded presentation talking about how conflict detection and resolution have evolved from pre-Postgres 15 through the improvements in Postgres 15 and what's planned for 18.

[00:12:46] So definitely a lot of changes are coming to help you understand whether any logical replication conflicts are happening in your installation. Check this out if you want to learn more.

[00:12:57] Next piece of content: PostgreSQL storage: Comparing storage options. This is from cybertec-postgresql.com. He's talking about different ways you could store data, and he compares the PostgreSQL row store, basically the heap, a column store using the Citus columnar store, CSV files and then Parquet files, looking at sizes and the ability to retrieve some of that data. With the heap, when he loads up 144 million rows of data, it takes up 72 gigabytes, and you can query individual rows very fast, way less than 1 millisecond, once you put an index on it. The column store, by contrast, is dramatically smaller, only around six or six and a half gigabytes, but you can't really index it efficiently to pull out an individual value; it's more optimized for doing an average or a sum over a single column.

[00:13:54] Next he looked at a CSV file, and you can see the CSV file size is about the same as the heap, a little bit larger. Then he looked at Parquet, which is really used in online analytical processing, so it's essentially a column store, and that size was roughly 5.5 gigabytes. So this post shows some of the different storage options and the advantages and disadvantages of each one. Sometimes they're smaller and optimized for certain workloads, like analytical workloads, and sometimes they're larger, but with indexes you can more efficiently pull out individual values. Check this out if you want to learn more.

[00:14:29] Next piece of content: Fixing Slow Row Level Security Policies. This is from dianemfe.com. I don't have time to go over all of this, but it is definitely a comprehensive post about how to optimize your row-level security policies for better performance, because one of the main disadvantages of row-level security is that it can have a negative performance impact.
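Going back to the foreign key post for a moment, here is a minimal, hypothetical Django sketch of the idea mentioned above: skipping the index Django would create for a foreign key and declaring a partial index instead. The model and field names are my own, and the actual post may structure this differently.

```python
from django.db import models


class Order(models.Model):
    # db_index=False skips the index Django would normally create for the FK.
    customer = models.ForeignKey(
        "Customer", on_delete=models.PROTECT, db_index=False
    )
    status = models.CharField(max_length=20)

    class Meta:
        indexes = [
            # Partial index: only index the rows we actually look up by customer.
            models.Index(
                fields=["customer"],
                name="order_customer_active_idx",
                condition=models.Q(status="active"),
            ),
        ]
```

Whether this is a win depends entirely on your query patterns; the point is just that the ORM's defaults are configurable rather than mandatory.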
[00:14:53] And the last piece of content is another episode from the Talking Postgres podcast at talkingpostgres.com. This one is on how I got started leading database teams, with Shireesh Thota, who works at Microsoft on Azure databases. So definitely check this out if you want to learn more.

[00:15:12] Now it's time for my consulting corner. Another quick update this week.

[00:15:17] I'm still preparing for the upgrades and doing a lot of automation work to try to make the cutover as seamless as possible.

[00:15:25] So we're trying to automate, as much as possible, the failover from a publisher on an old version of Postgres to the subscriber on the new version. For the particular client I'm working with right now, it'll be a logical replication upgrade, and we're using Ansible to try and coordinate all the different movements that need to happen, not only amongst the databases, but also all the extra services that need to be paused or stopped before we do this kind of database transition.

[00:15:56] But if you have any questions about that, feel free to leave them in the comments.

[00:16:01] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content mentioned, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show, as well as a full transcript. Thanks, and I'll see you next week.

Other Episodes

Episode 13

May 21, 2018 00:16:00

Sharding Future, Query Optimization, Replication Read Performance, PostGIS | Scaling Postgres 13

In this episode of Scaling Postgres, we review articles covering the future of sharding PostgreSQL databases, query optimization, replication read performance and PostGIS. To...


Episode 183

September 20, 2021 00:14:50

Broken Indexes, Trademark Issues, Percentile vs. Average, Logical Improvements | Scaling Postgres 183

In this episode of Scaling Postgres, we discuss one cause of broken indexes, Postgres trademark issues, percentiles vs. averages and logical replication improvements. To...


Episode 94

December 16, 2019 00:12:29

State of Postgres, Dumping DBs, Configuration Parameters, Partial WAL | Scaling Postgres 94

In this episode of Scaling Postgres, we discuss the State of Postgres, how best to dump your database, setting configuration parameters and managing partial...
