5 Minutes of Postgres, Unnest, SCRAM, Multi-master Review | Scaling Postgres 200

Episode 200 January 30, 2022 00:12:11
Scaling Postgres

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss the new "5 Minutes of Postgres" series, using unnest, switching to SCRAM authentication, and reviewing multi-master solutions.

To get the show notes as well as get notified of new episodes, visit:

https://www.scalingpostgres.com/episodes/200-5-minute-postgres-unnest-scram-multi-master-review/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about 5 Minutes of Postgres, unnest, SCRAM, and a multi-master review. I'm Creston Jamison, and this is Scaling Postgres episode 200.

[00:00:23] I hope you, your friends, family, and coworkers continue to do well. Welcome to our 200th episode. I've been doing this for about four years and I don't intend to stop anytime soon, so I hope you enjoy the content. What I do on a weekly basis is review all the Postgres content I can find and present roughly the top ten articles, in prioritized order, to share with you.

Our first piece of content is the start of what looks like a new video series called 5 Minutes of Postgres. This is "5 Minutes of Postgres, Episode 1: Using Postgres statistics to improve bad query plans, pg_hint_plan extension", and this is from pganalyze.com. It looks like this is something they are doing on a weekly basis, and I have the first two episodes as the first and second pieces of content here. This first one covers how someone made a query 200 times faster by adjusting it. The key problem, which he reviews in the video as well as in the post, was that the planner had invalid statistics: it thought there were more rows to search through than there actually were, so it chose an inappropriate plan. When I've encountered issues like this, I've basically had to increase the amount of sampling done for statistics, or you can create extended statistics that look at more than one column to add relationships to the data (there's a sketch of both below). He talks about this in the post and uses other articles as references, like one that shows using Postgres extended statistics to achieve a 3,000x speedup. So this episode goes through that process, and I definitely encourage you to check it out.

The next episode was on using unnest, generate_series, and PostGIS. He talks about an article on unnest that I'm actually going to cover here; generate_series we covered last week; and the PostGIS article is also in this week's Scaling Postgres. So there's definitely some more video content if you're interested in checking that out.

Now, one of the unnest articles discussed in 5 Minutes of Postgres is "Postgres unnest cheat sheet for bulk operations", and this is from atdatabases.org. They're talking about using unnest to take an array and create rows out of it; that's what unnest does. He says it is actually faster to insert records when you use unnest. Now, I always like to see performance comparisons, and unfortunately this post doesn't include one, but the reason this should be faster is that less parsing is required: instead of the server parsing a long multi-row insert, you just hand it a few arrays. He also shows how to do this for updating records, saying the normal UPDATE statement only really lets you update multiple records in one go if you want to set them all to the same value. Well, you can also do that with a subquery, updating based upon values in another table, but I suppose this technique has advantages if you want to present the data as arrays. So there is another way to do it. I think you can do some of the same things either way, but maybe there are some use cases where unnest has advantages. Definitely an interesting blog post that I encourage you to check out.
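
As a minimal sketch of the unnest patterns just described (the users table, columns, and values here are hypothetical, for illustration only):

    -- Hypothetical table for illustration
    CREATE TABLE users (id int PRIMARY KEY, name text, email text);

    -- Bulk insert: each array is one column; unnest() turns them into rows.
    -- A driver can bind the three arrays as three parameters instead of
    -- building a long multi-row VALUES list that has to be parsed.
    INSERT INTO users (id, name, email)
    SELECT * FROM unnest(
        ARRAY[1, 2, 3],
        ARRAY['ann', 'bob', 'cal'],
        ARRAY['ann@example.com', 'bob@example.com', 'cal@example.com']
    ) AS t(id, name, email);

    -- Bulk update with a different value per row: pair ids with new values.
    UPDATE users
    SET email = data.email
    FROM unnest(
        ARRAY[1, 2],
        ARRAY['ann@new.example.com', 'bob@new.example.com']
    ) AS data(id, email)
    WHERE users.id = data.id;
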
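And going back to the planner-statistics item above, a minimal sketch of the two remedies mentioned, raising the sampling target and creating extended statistics (the orders table and its columns are hypothetical):

    -- More sampling for one column (the default statistics target is 100)
    ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;

    -- Extended statistics: tell the planner two columns are related,
    -- so row estimates for predicates on both columns improve
    CREATE STATISTICS orders_city_zip (dependencies) ON city, zip FROM orders;

    -- The statistics are gathered on the next ANALYZE
    ANALYZE orders;
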
The next piece of content is "PostgreSQL 14 and Recent SCRAM Authentication Changes - Should I Migrate to SCRAM?", from percona.com. What makes this post relevant is that Postgres 14 has made SCRAM the default, so some people are running into issues when they upgrade to 14 and part of their environment, or some of their client libraries, isn't ready for it. But it is just a default: you can still use md5, which was the normal way of handling passwords before. The post covers different settings to be aware of when using it, such as potentially setting channel_binding when you connect, as well as password_encryption, because setting that to "on" doesn't work anymore; you now need to set it explicitly to scram-sha-256 or to md5. So if you want to learn more, you can check out this blog post (a configuration sketch appears at the end of this segment).

The next piece of content is "PG Phriday: Replication Engine Potpourri", from enterprisedb.com. They review a collection of multi-master utilities that work with Postgres. One is Bucardo. Another is SymmetricDS. [00:05:13] Another is the EDB Replication Server. [00:05:18] The next is SharePlex, as well as BDR, which is bi-directional replication. They talk about each of these solutions and the different ways they achieve a multi-master Postgres installation. So if you want to learn more about that, definitely check out this blog post.

The next piece of content is "Accelerating Vacuum in Postgres", from dataegret.com, talking about techniques you can use to accelerate vacuum. Now, while I agree with the parameters they say should be changed to accelerate vacuum or autovacuum, I would place the emphasis on other areas, for example the vacuum cost limit, which is not even among the listed values. To me, this is the most important value to change, because it sets how fast vacuum can progress: vacuum does work until it reaches a particular cost limit and then pauses to let the database system continue doing its own work. This can be a value from 1,000 on up, and I've put it at 8,000 or 9,000 on some server systems, because they were very large multi-terabyte systems with the CPU processing power to spare to run vacuum frequently. I would say this is the number one thing to literally make vacuum go faster. Apart from that, you can reduce the costs that get counted against this limit, the page-hit, page-miss, and page-dirty costs they list; reducing those is essentially equivalent to increasing the limit, because it allows vacuum to do more work. Now, their first suggestion is increasing autovacuum_max_workers, but you have to be aware that the workers are governed by the vacuum cost limit. If you leave that at its default and just increase the number of max workers, each worker becomes slower, because the limit is global: go from three workers to ten, and each individual autovacuum worker is now roughly three times slower, since they are all measured against the same limit. This will help you vacuum more tables in parallel, but in my experience, when vacuum is slow it's usually one table that's taking forever to vacuum, so what you literally want is for vacuum to run faster. Generally I keep autovacuum_max_workers at three; I've brought it up as high as six if needed, but that is usually not the area that needs adjustment. And of course there are the other things they mention: reducing the nap time, which can be important for running autovacuum frequently enough, as well as defining when autovacuum runs by adjusting the different thresholds they mention. But if you want to learn more about autovacuum configuration, you can check out this blog post.
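
As a rough sketch of the kind of tuning just described (the values are illustrative, not recommendations for any particular system):

    -- Raise the shared cost limit so autovacuum can do more work per cycle.
    -- vacuum_cost_limit defaults to 200; autovacuum_vacuum_cost_limit
    -- defaults to -1, which means it inherits vacuum_cost_limit.
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;

    -- All workers share that limit, so adding workers slows each one down.
    ALTER SYSTEM SET autovacuum_max_workers = 3;  -- changing this needs a restart

    -- Check for tables needing vacuum more often (default is 1min).
    ALTER SYSTEM SET autovacuum_naptime = '30s';

    SELECT pg_reload_conf();  -- applies the reloadable settings
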
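And going back to the SCRAM item above, a minimal sketch of the kind of migration the Percona post describes (run from a superuser psql session; the role name is hypothetical):

    -- 1. Hash newly set passwords with SCRAM instead of md5
    ALTER SYSTEM SET password_encryption = 'scram-sha-256';
    SELECT pg_reload_conf();

    -- 2. Re-set each password so the stored hash becomes a SCRAM hash
    \password app_user

    -- 3. Find roles still carrying md5 hashes
    SELECT rolname FROM pg_authid WHERE rolpassword LIKE 'md5%';

    -- 4. Then switch the auth method in pg_hba.conf from md5 to
    --    scram-sha-256; clients that should enforce channel binding can
    --    add channel_binding=require to their connection string.
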
[00:08:03] The next piece of content is "PostgreSQL: Lessons from Advent of Code Challenges", from heap.io. They talk about different code challenges they used SQL for and the particular features of SQL they relied on, such as window functions to get particular ranges and CTEs, common table expressions, to keep the code readable. In one of the challenges it was beneficial to think relationally instead of iteratively to come up with a solution, and in one case the SQL was much more compact than a JavaScript solution. So it's definitely a post for exploring different ways you can use SQL.

[00:08:47] Also related to this is "Not so equal texts in PostgreSQL - and how to compare texts in a more elaborate way", from Andreas Scherbaum. He looks at string comparisons where values appear equal but actually aren't, and the post takes a different path to the comparison by creating its own data types and operators. So it's definitely an example of how extensible Postgres can be. If you're interested in that, you can check out this blog post.

[00:09:19] The next piece of content is "Read-only mode for PostgreSQL", from jkatz05.com. He's talking about a parameter called default_transaction_read_only. If you set it, you can no longer send DML to the server: basically, you can only send SELECT queries, and you can't insert, update, or delete data. It is actually a session variable, so you can set it per session or for the whole system, but the thing to keep in mind is that an individual session can turn it back off. So it's not great for ensuring that a database is truly in read-only mode, but you may find a use case for it (a short sketch appears below). If you want to learn more, you can check out this blog post.

[00:09:59] The next piece of content is "Waiting for PostgreSQL 15 - Add HEADER support to COPY text format", from depesz.com. Basically, much like the CSV output for COPY, you can now request a header with the text format as well, and it gives you a header row in that case (see the sketch below). That's apparently coming with Postgres 15.

[00:10:23] The next piece of content is "Elevation Profiles and Flight Lines with Postgres", from crunchydata.com. This covers a scenario where someone was trying to project flight paths at a particular elevation. So if you're interested in geographical information systems, this is a post you'll probably want to check out.

[00:10:45] The next piece of content is "Transaction ID and Snapshot Information Functions", from highgo.ca. They're talking about the transaction functions such as txid_current, which gives you the current transaction ID, and a few others. These have actually had their names changed: the old ones still work, but newer versions of Postgres added replacements such as pg_current_xact_id. The post explains how each of these is used, so if you're interested in that, you can check out this blog post.
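
A quick sketch of that rename (the txid_* functions still work; the newer spellings arrived in Postgres 13):

    -- Older style (still available)
    SELECT txid_current();

    -- Newer style, returning the 64-bit xid8 type (Postgres 13+)
    SELECT pg_current_xact_id();

    -- The snapshot functions follow the same pattern
    SELECT pg_current_snapshot();  -- formerly txid_current_snapshot()
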
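Returning to the read-only mode item above, a minimal sketch of the parameter's behavior (reusing the hypothetical users table from earlier):

    -- Per session:
    SET default_transaction_read_only = on;
    DELETE FROM users;  -- ERROR: cannot execute DELETE in a read-only transaction

    -- ...but any session can simply turn it back off:
    SET default_transaction_read_only = off;

    -- It can also be set system-wide, though sessions can still override it:
    ALTER SYSTEM SET default_transaction_read_only = on;
    SELECT pg_reload_conf();
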
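And for the COPY item above, a sketch of what the Postgres 15 behavior should look like (again using the hypothetical users table):

    -- Postgres 15: HEADER is now accepted for the default text format too
    COPY users TO STDOUT (FORMAT text, HEADER);

    -- Previously a header row was only available with CSV
    COPY users TO STDOUT (FORMAT csv, HEADER);
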
The next piece of content: the PostgreSQL Person of the Week is Flavio Gurgel. So if you're interested in Flavio and his contributions to Postgres, definitely check out this blog post.

And for the last piece of content, we had another episode of the Rubber Duck Dev Show this past Wednesday. This one was on refactoring in the wild: basically, how you would refactor your code in the real world. So if you're interested in a long-form developer discussion format, you can definitely check out our show.

[00:11:47] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 97

January 20, 2020 00:13:11
Episode Cover

Tips & Tricks, Faster APIs, Full Text Search, pgbouncer_fdw | Scaling Postgres 97

In this episode of Scaling Postgres, we discuss tips & tricks, how to get faster APIs, full text search considerations and pgbouncer_fdw. To get...

Listen

Episode 268

June 04, 2023 00:11:38
Episode Cover

Faster Copy, psql Variables, backup_label, Bad Encoding | Scaling Postgres 268

In this episode of Scaling Postgres, we discuss faster COPY in Postgres 16, how to use psql variables, the importance of backup_label and issues...

Listen

Episode 136

October 19, 2020 00:13:08
Episode Cover

Stat Tracking, Temporary Tables, pg_squeeze, pg_auto_failover | Scaling Postgres 136

In this episode of Scaling Postgres, we discuss stat tracking, temporary tables, a new utility called pg_squeeze and an update to pg_auto_failover. To get...

Listen