Episode Transcript
[00:00:00] Speaker A: In this episode of Scaling Postgres, we talk about postgres releases, a performance discussion, index types, and index power use. I'm Kristen Jameson, and this is scaling postgres episode 178.
All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is PostgreSQL 13.412, point 811.1310, point 18 and 9.6 as well as 14 beta three have been released. Now, this release does include a security issue, a memory disclosure in certain queries problem. And it looks like the known vulnerabilities only happen if you have max worker processes set to zero. But they say it is possible if you have a different setting that there may be some vulnerabilities with regard to it. But it's not known at this time. But basically this is a security issue that you should upgrade to address, as well as over 75 bug fixes and improvements that were done for a lot of these different versions. The other thing they mentioned is that PostgreSQL 9.6 is end of life as of November 11 of this year. So be prepared for that if you have not upgraded to at least version ten. So another opportunity to go ahead and upgrade to fix some issues with postgres as well as test out postgres 14 beta Three, which should be released sometime in the fall.
The next piece of content is that PG Bouncer 1.16 has been released. The main features of this is now you can make adjustments to the TLS settings without having to restart PG Bouncer. You can just do a reload so it's not going to drop any connections. They also mentioned that the max length of password has been increased to accommodate some cloud services as well as better cancel request handling. So it does have a number of bug fixes as well, and you can click on the Change log link to find out more.
The next piece of content is we did have our Rubber Duck Dev Show presentation on database performance this past week, and we discussed database performance in general, but about the 15 minutes mark of the episode, it's about an hour in length. We go into a deep dive of PostgreSQL performance, although it is applicable to other relational database systems. But I kind of go through the process that I tend to use to optimize database performance and the things to look out for, and spent a lot of time, of course, talking about indexes. So if you're interested in that, you can go ahead and check out this piece of content.
The next piece of content is using Postgres Create Index, understanding Operator classes, index types, and more. This is from Pganalyze.com and they're talking about indexes in postgres. So this is a good complement to the previous presentation. But they have a focus on operators in the first part of the post where they're talking about how the parser uses them to determine the best plan that it's going to follow. And it says how index types are related to that, in that certain operators work with certain index types. So for example, a hash index is mostly for equalities, whereas a b tree can do equal to greater than, et cetera, as well as you can't do any sort of contains with a b tree index, but you can with a gen index. So the operator kind of dictates what kind of index you can use, and they give some examples of that, and then they go into the different index types that are possible in terms of B tree index. Both a. Single column, a multicolum, as well as using functional index or expression indexes, such as taking the lower of an email, which we also discussed in our presentation, as well as partial indexes. So only indexing part of the data in a table, talking about index only scans and using the new include clause to add additional columns to it, as well as following up with when you're applying and removing indexes from a large database that's active, you definitely want to add and remove those indexes concurrently so you don't lock up the table. So if you're interested in learning more.
[00:04:26] Speaker B: About that, you can check out this blog post.
[00:04:30] Speaker A: The next piece of Content power Use of Indexes in PostgreSQL A User Perspective this is a video on YouTube on the Percona YouTube channel and again more discussion about indexes and how they affect performance. And this approximately hour presentation goes into all of the different index types and what they do, what their capabilities are from gen to Brend to Bloom to Gist to the Btree, etc. E. So if you want a great overview of all the different index types and how they can be used to help improve PostgreSQL performance, you can check.
[00:05:06] Speaker B: Out this blog post.
[00:05:09] Speaker A: The next piece of content PostgreSQL versus Redis versus memcached performance this is from Cyberdeck Hyphen Postgresql.com and he's talking about an article he recently discovered that talked about redis versus memcache performance and he asked the hmm, how would storing this data in postgres be compared to storing it in something like redis and memcached? And he talked about his setup here he's doing a few things such as using an unlocked table and using asynchronous mode to try and get better performance to match what you would see in Redis or memcached. And the blog post that he looked at compared memcached and redis based upon the number of records inserted and the speed both from a write and from a read operation perspective. So he replicated that for postgres. Now, unfortunately with these results, I have to agree that this is a little bit of an apple and oranges comparison because he did not 100% know how they did the redis and memcache tests. And frankly, what I think would be more beneficial if he had actually installed Redis and used his exact same test script to test redis in the same way to test memcached in the same way as well as PostgreSQL. I think that would have been a much better comparison to look at the number of reads versus writes because you do see quite a disparity like in terms of the read times, how they're significantly better for postgres. So I suspect there's something different because again, he's taking these test results from a blog post, whereas he produced these. So I think there's some discrepancy there. But this blog post was interesting in terms of seeing how he did the test to how he tried to approximate what Redis and memcached are doing, basically kind of removing postgres features. So when you have an unlocked table, you're not going to log that much to the wall. So that automatically gives you a boost in performance. And it looks like he only inserted a row at a time. I wonder what would happen if he tried to insert more than one row at a time because that should give some better performance. So it's an interesting blog post, but I do think it's kind of an apple to oranges comparison.
The next piece of content is Impact of Network and cursor on query performance of PostgreSQL. This is from Procona.com and they're talking about how the network or the speed of the network can have an impact on your query times. So for example, based upon the amount of data you're transferring to a client or to your application servers from the database server is going to have an impact on performance. So for example, they were bringing over a large amount of data and they did an analysis of query times saying, okay, what happens if we run this query on the database host itself? And you can see most of the work here, they're saying was done in about 1 second, around 1069 milliseconds, but the log reported it was complete in about 1.6 seconds. So basically 0.6 seconds slower than what it actually took to run it looks like. And they were trying to understand the discrepancy. And when you look at executing from a remote application, the timings go from again around that 1 second to over six and a half seconds, so significantly longer. So what they did is instead of pulling all the data, what they did is just pulled like the max from one of the rows. So it's much smaller, the amount of data being transferred. And when running on the host itself, you can see it's nearly identical, the runtimes for actually running the query and what's reported in the log and then from the application host it's still similar, nearly identical. So it looks like the time differences here are all due to the amount of data having to be transferred to the client and they use this capability in a PG gather script that they use. But this wait information is also available in the PGSTAT Activity view and what they see is that the majority of the wait time is due to client writes and that's writing data to the client. So all this discrepancy appears to be due to just the amount of data that has to be written to the applications client. And then they took a look at the impact of cursors and they looked at client side cursors. My understanding of this is they're pulling all the data down or a big subset of the data down and iterating through on the client itself that data not impacting the database when they're going through each record. But if you're using a server side cursor, you get a little bit of a false sense of security because what it logs is just the initial fetch and not subsequent fetches in the log. So you want to be cautious of that. But again, with the cursors, they saw a lot of client reads that were happening. So basically this post is talking about being aware of how much data you're transferring from the database to your application servers and what is that distance and what is that bandwidth. The faster and the lower latency connection you have will get you better performance. And a few other things they mentioned here is that try to avoid selecting all the columns from a table to be able to minimize the data transfer, only select the columns you need. Also avoid fetching large number of rows at once because again, that's just going to impact how responsive the application can be. They also suggest avoiding server side cursors whenever possible and maybe doing the iteration on the application itself. Pull back a set of records and iterate over it on the client. So, definitely an interesting blog post and things to take into account as you're working with your application.
The next piece of content NOSET PostgreSQL extension for blocking, set and reset execution. And this is from ungrace.com. And this is interesting that they're talking about a new extension that they created called NOSET. And what this does is a user, when they're using a PostgreSQL session, can actually alter different parameters of that session, including increasing their work memory or maybe increasing max parallel work per gather. And you could impact the resources of the server and its availability. So there may be cases where you want to limit a user being able to set these things so you're not hogging all the resources of the server. So that's exactly what this extension does and they talk about how to install it and how to enable it for your database. So if you're interested in that, you.
[00:11:49] Speaker B: Can check out this blog post.
[00:11:52] Speaker A: The next piece of Content migrating PostgreSQL to Kubernetes. So this is from Percona.com and they're talking about how if you have a postgres server, how you can convert it to using Kubernetes. Now they're using their own operator, the procona distribution for PostgreSQL operator. Different organizations have their own Kubernetes operator, essentially, and they're using their procona based one. And the technique that they're using is basically log shipping. So you have a postgres server set up and it's going to ship its logs to a bucket, in this case Google Cloud Services.
And then this Kubernetes Pod is going to pick up those logs and apply them to the database cluster or clusters within the Kubernetes Pod and that's how you get things in sync. Now, it's interesting that they didn't describe using, say, streaming replication or logical replication, but those should also be viable paths. But this particular example uses log shipping and they're using PG backrest for that purpose. Maybe it's because the percona operator has PG backrest installed as a means to manage the cluster and that's why they did it that way. But anyway, they go through the whole process and give you the documentation on how to do it and cut it over. So if you're interested in that, you.
[00:13:11] Speaker B: Can check out this blog post next.
[00:13:14] Speaker A: Piece of content horizontal Scalability Options in PostgreSQL this is from Heigo CA. They're talking about the different ways that you can horizontally scale postgres out. Now, the first one you can do is read scalability. So you can have, again, one primary database that has many replicas and you can scale those reads across the replicas. That's one way to do scalability is through reads. Well, you can also do a read write scalability. Now, in this scenario, they're using postgres's foreign data wrappers. So basically you're sharding your database into multiple instances and then using a foreign data wrapper to write to each one. Now, there's not a convenient bait in way to do this with postgres yet we're moving towards that. But this is definitely something that takes a bit of work to set up. Another option that's the most popular one I would say out there right now is the Citus extension. So that allows easy horizontal scalability. And there are some costs if you want all the features of it, of course. And the other option they mentioned is doing logical replication. So it is possible to copy data from one database to another and logically replicate it. So if you want to learn more about how to horizontally scale postgres, you.
[00:14:31] Speaker B: Can check out this blog post.
[00:14:33] Speaker A: And the last piece of content. The PostgreSQL Person of the week is Julian MarkWatt. So if you're interested in learning more about Julian and his contributions to postgres, definitely check out this blog post that does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the Show Notes. Be sure to head over to Scalingposgres.com where you can sign up to receive weekly notifications of each episode or or you can subscribe via YouTube Ritens. Thanks.