Episode Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about shared buffers, compression algorithms, the MERGE command, and Postgres contributions. I'm Creston Jamison and this is Scaling Postgres, episode 239.
[00:00:23] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "Harnessing Shared Buffers, part two". This is from Enterprisedb.com, and they're following up on a previous post on this topic, where they looked at what you should set shared buffers to when you're using magnetic hard disks. This one explores using enterprise-grade SSDs instead, so they're essentially using enterprise servers with 128 GB of RAM and six SSDs configured in RAID 0. When they ran a pgbench benchmark, which is OLTP-based, you can see that across a range of shared buffer sizes there was hardly any difference, and only once you exceed about a quarter of the memory does performance start to decline. They attribute that to having to scan through all of the shared buffers on a periodic basis. They also show flame graphs that give an indication of how frequently data was accessed multiple times, and you can see with pgbench that something being accessed five times is relatively rare; it's usually zero or very few times. So it was super interesting that there was essentially no difference, whereas there was quite a difference with magnetic disks. But they also wanted to test what happens if you reuse data more frequently, in other words if you have a smaller working set in memory, and they used the TPROC-C OLTP benchmark to look at that. You can see from this flame graph that the data was reused much more, so much of the working set was kept in memory. With this result, you can see improved performance as you add more shared memory, but again, around that 25% mark for shared buffers, which has been the go-to recommendation, is where it essentially levels off; it's linear up to that point. So basically, when you have a working set that can fit mostly in memory, a quarter of the memory seems like a good estimate. But if you don't reuse data that's already in memory, shared buffers start to matter less and performance is basically more dependent on the disks. They also tried the data warehouse benchmark, and in part one of this post they showed that smaller shared buffers led to better performance, and that's what they saw here as well. So in this case a lower number is better, and you can see that 128 megabytes of shared buffers gave the lowest result, and as they increased it, performance diminished. Now, it's only by a few percentage points as they ramp it up, so it doesn't make a significant difference, but it's probably just the overhead of having that extra shared buffer memory to scan; it makes more sense to keep shared buffers low and leave the rest of the memory for caching. So, definitely a super interesting result with enterprise-class SSDs. Their ultimate conclusion is that OLAP or warehouse-oriented servers can operate normally with the default of 128 megabytes of shared buffers, and that you should continue to follow the 25% rule for OLTP systems. They also mention that you should exercise caution when exceeding 32 gigabytes of RAM and never exceed 64 gigabytes of RAM, although I don't know if that's just because this machine had 128 gigabytes. Therefore, a more appropriate recommendation would be to exercise caution when exceeding 25% of your RAM and never exceed 50% of it.
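As a rough illustration of that 25% guideline, here's a minimal sketch of what it would look like on a server like theirs with 128 GB of RAM. This is my own example, not something from the article, and the right value will of course depend on your workload:

    -- Applying the 25%-of-RAM guideline on a hypothetical 128 GB OLTP server.
    -- shared_buffers cannot be changed with a reload, so a server restart is
    -- required for this to take effect.
    ALTER SYSTEM SET shared_buffers = '32GB';

    -- After restarting, confirm the running value:
    SHOW shared_buffers;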
But looking back on the results, I think it depends on how many times you're going to access data that's already in memory. The more frequently you access recent data, the more it makes sense to have shared buffers high enough to accommodate it, up to a certain level. Otherwise, keep it as low as possible, because again, the OLAP workload is presumably scanning a wide swath of data that doesn't fit in shared buffers, so its performance is all based on the disk system. Similarly, the pgbench flame graph shows that if data is not accessed frequently, then shared buffers really doesn't make much of a difference; it's again all about disk access. So to my mind, your shared buffers setting comes down to how often you access the same data repeatedly, and you can gauge the size of your shared buffers by the amount of data you access on a regular basis. If you want to learn more about this, this post and the previous one are covered in Five minutes of Postgres episode 41, "Tuning shared buffers for OLTP and data warehouse workloads, and is 25% of RAM a good choice?". This is from Pganalyze.com, and Lukas covers both parts of this blog post and gives his conclusions as well. So if you're interested in that, definitely check out this piece of content too.

Next piece of content: "You can now pick your favorite compression algorithm for your WAL". This is from Enterprisedb.com. They're talking about the enhancement where you can define which compression algorithm to use for WAL compression if you want to enable that for your system (a quick sketch of the setting appears after the merge discussion below). Previously Postgres only offered its own built-in algorithm, pglz, but you can now use LZ4 and Zstandard as well, and they show some of the results here in terms of the size of the WAL generated. Compared with having compression entirely turned off, you can see the huge savings in disk space using any one of these compression algorithms. Then they zoomed in to compare the different algorithms, and I believe this is a report on disk space: they say pglz provides slightly better compression than LZ4, but Zstandard provides better compression than pglz. Frankly, I was expecting more of a difference between these, so it's kind of surprising that it's relatively minimal. They also looked at the transactions per second that were possible when enabling this, and again Zstandard was the winner by a narrow margin, with LZ4 and pglz following behind. Again, I'm surprised by this, because for different clients I've actually started using Zstandard to compress things like backups, or to compress a bunch of WAL files together, and it's been a game changer in terms of how fast it works and the compression ratio it achieves, because it can run in parallel. With this use case, I guess those advantages aren't coming through, but I'm still surprised there's not much of a difference between these. If you want to learn more, definitely check out this blog post.

Next piece of content: "The MERGE command". This is from Enterprisedb.com, and they're talking about the new MERGE command in Postgres 15 and how it essentially lets you merge data from one table into another. The way you typically write it is MERGE INTO a target, USING a source, ON a particular condition. The example they have is two tables where some rows exist in one table but not the other, and some rows exist in both but have changed. So you do a MERGE INTO the target USING the source, and then have conditions: WHEN NOT MATCHED, THEN INSERT the values; or WHEN MATCHED, THEN DELETE; or WHEN MATCHED, THEN UPDATE.
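To make that syntax concrete, here's a small sketch of what such a statement can look like on Postgres 15. The table and column names are made up for illustration; they're not the ones from the EDB post:

    -- Hypothetical example: fold a staging table into an inventory table.
    MERGE INTO inventory AS t
    USING inventory_staging AS s
        ON t.item_id = s.item_id
    WHEN MATCHED AND s.quantity = 0 THEN
        DELETE                                    -- matched rows that dropped to zero go away
    WHEN MATCHED THEN
        UPDATE SET quantity = s.quantity          -- other matched rows get refreshed
    WHEN NOT MATCHED THEN
        INSERT (item_id, quantity)
        VALUES (s.item_id, s.quantity);           -- brand new rows get inserted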
So basically you define how you want to handle the merge, and you can even have multiple conditions within a single MERGE command to say in this case delete, in this case update, et cetera. So this is a very brief post about the MERGE command. If you want to learn more, definitely check this piece of content out.
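Coming back to the WAL compression post for a moment, here's the quick sketch I mentioned of what enabling it looks like on Postgres 15. This is just my own minimal example; the lz4 and zstd options are only available if the server was built with those libraries:

    -- wal_compression is reloadable, so no restart is needed.
    ALTER SYSTEM SET wal_compression = 'zstd';    -- or 'lz4', or 'pglz'
    SELECT pg_reload_conf();

    -- Confirm the active setting:
    SHOW wal_compression;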
[00:07:42] Next piece of content is actually the next episode of Postgres FM, talking about contributing to Postgres. They discuss all sorts of different ways you can contribute to Postgres, not only with code but in other ways too. So you can definitely listen to the episode here or click here to view it on YouTube. Related to this is a post from Supabase, "What is PostgreSQL commitfest and how to contribute". If you want to contribute code, you're going to want to get involved with the commitfests, and this explains what commitfests are: basically periods where the community gets together to review and commit different patches to Postgres, and they describe how that process works. So if you're interested in doing that, definitely check out this article. And the last article related to Postgres contributions is actually a look at Postgres 15 statistics. This is from Eisentraut.org, and he shows, from version 12 on, different metrics across versions of Postgres, including changes listed in the release notes, number of commits, number of contributors listed in the release notes, total files, lines of code, and the number of files changed. He notes that development is continuing at a pretty steady pace, even with the size of the code base, the number of features, et cetera, slowly growing. If you want to learn more about that, definitely check out this blog post.

Next piece of content: "Cross-partition uniqueness guarantee with global unique index". This is from Highgo.ca, and it continues their feature exploration of adding globally unique indexes to partitioned tables. Right now, a unique index on a partitioned table only enforces uniqueness within each individual partition (it has to include the partition key), so there's no way to guarantee uniqueness on an arbitrary column across all partitions. They talk about what they're proposing to get this working in Postgres, so it's essentially a feature in progress. If you want to check that out, definitely check out this blog post.

Next piece of content: "Postgres, databases and schemas". This is from Crunchydata.com. Now, this is a simpler post, but it gives you the 30,000-foot view of Postgres down to a little more detail. For example, the first thing you address with Postgres is what instance or container it's running on: there's some sort of operating system that Postgres runs on, and that operating system runs on some type of instance. Then, when Postgres is running, it's actually running as a cluster. That's a little bit of a misnomer now, but basically a cluster means a running database installation that can host one or many databases. You don't connect to the cluster itself; you connect to a particular database on the cluster, and he explains how to do that. Then he discusses what happens within the database and talks about schemas. Basically, these are namespaces within the database that hold the database objects you're going to be using. Most people just use the public schema, but you can also create your own schemas, and a lot of times that means creating schemas for individual users, for example. So if you want this kind of overview of the Postgres landscape, you can definitely check out this blog post.

Next piece of content: "Easier upgrades and image management for Postgres in Kubernetes". This is from Crunchydata.com.
They're talking about some enhancements that have been made to their Crunchy Postgres Operator for running Postgres in Kubernetes, specifically enhancements they've added to make upgrading easier. The upgrade feature was already there, but they've added a new pause feature as well as easier ways to increment versions. So if you're interested in that, you can check out this blog post.

Next piece of content: "PostgresML is 8-40x faster than Python HTTP microservices". This is from Postgresml.org. Now, I'm not that familiar with machine learning, but apparently they set up a PostgresML installation and a Python installation, ran some performance tests, and the results were exactly what the title says: eight to forty times faster. This is coming from the Postgresml.org website, so there may be a bit of bias in that, but if you're interested in machine learning, maybe you'll want to explore this particular extension.
[00:11:57] Next piece of content: "Moving objects and geofencing with Postgres and PostGIS". This is from Crunchydata.com, and they're talking about how to set this up as a web service using a few of the tools they offer, pg_eventserv and pg_featureserv, along with Postgres, PostGIS, and the OpenLayers map API, and they walk through the features they were looking to build with this example. They do have a "Try it out" button that leads you to a website where you can experiment with it. It offers a real-time view of the state of different objects on the map, live notifications when objects enter and leave a set of geofences, and the ability to query the history of the system to see where objects have been and to summarize their state, like, for example, "Truck 513 spent 50% of its time in the yard". So if you're interested in building these types of applications or features, I definitely encourage you to check out this blog post.
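Just to give a rough idea of what the geofencing piece boils down to in SQL, here's a simplified sketch of a containment check in PostGIS. The table and column names are made up for illustration; the actual demo's schema and event-driven setup are in the Crunchy Data post:

    -- Hypothetical tables: geofences(name, geom) and vehicle_positions(vehicle, geom, recorded_at).
    -- Which recently reported vehicles are currently inside a geofence?
    SELECT v.vehicle, g.name AS fence
    FROM vehicle_positions v
    JOIN geofences g
      ON ST_Contains(g.geom, v.geom)
    WHERE v.recorded_at > now() - interval '5 minutes';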
[00:12:52] Next piece of content: "Routing with lines through polygons". This is from Rustprooflabs.com, and their post describes how to do exactly that. I'm not that familiar with geographic information systems, but if you're looking to achieve this, definitely check out this blog post.
[00:13:06] Next piece of content: "PostGIS: upgrade GEOS with Ubuntu in 3 steps". This is from Cybertec-postgresql.com. PostGIS apparently relies on the GEOS and GDAL libraries, and they go through the process of upgrading these on Ubuntu. You can check that out if you're interested.
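As a quick aside, if you just want to see which library versions your PostGIS installation is currently linked against before and after an upgrade like this, PostGIS can report them itself. This is just a simple check I'm adding here, not one of the three steps from the article:

    -- Reports the PostGIS build details, including the GEOS version
    -- (and GDAL, when raster support is installed).
    SELECT postgis_full_version();

    -- Or just the GEOS version on its own:
    SELECT postgis_geos_version();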
[00:13:26] Next piece of content: "News: Postgres 15 available in Azure Cosmos DB for PostgreSQL". This is from Citusdata.com. I was not familiar with Cosmos DB, but apparently Postgres 15 runs on it now, and they talk about the announcement down here. The next post they link to talks about distributed PostgreSQL coming to Azure Cosmos DB, so apparently they're merging the Citus capabilities into Cosmos DB. The link here says that Cosmos DB is basically a fully managed, serverless, distributed database on Microsoft's Azure platform, and it looks like they have different variants of it. So they've had a MongoDB variant and an Apache Cassandra variant, but now they're offering a relational one, PostgreSQL, leveraging the Citus extension to do it.
[00:14:20] The next piece of content: the PostgreSQL Person of the Week is ittian Prasak. If you're interested in learning more about him and his contributions to Postgres, definitely check out this blog post. And the last piece of content: we did have another episode of The Rubber Duck Dev Show this past Wednesday evening. This one was a holiday party discussing coding horror stories, so we had on a panel of developers talking about the worst coding horror stories they've had. If you're interested in that type of content, we welcome you to check out our show.
[00:14:53] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.