[00:00:00] In this episode of Scaling Postgres, we talk about prewarming, nondeterministic collations, generated column performance and foreign keys. I'm Kristen Jameson, and this is Scaling Postgres episode 86.
[00:00:21] Alright, I hope you're having a great week. Our first piece of content is "Prewarming PostgreSQL I/O caches", and this is from cybertec-postgresql.com. It's talking about an extension called pg_prewarm that lets you essentially warm up your shared buffers, because normally when you restart the system you're going to lose some of that in-memory cache. Now, some of it may still be present, because PostgreSQL does utilize some of the OS caching, but this extension should help you preserve things like shared buffers. And using this extension, he says there's two ways you can do it: manual caching, or automatic caching on startup, which is probably what most people would want. To manually cache something, you first create the extension in your database; pg_prewarm is part of the contrib modules, so install those first. Then you just do a select pg_prewarm, specifying the table name, and the result returned tells you how many pages have been read and put into the cache, 4,425 in their example. Now, there are some other options here you can use, like specifying the first block, last block, things of that nature. But then he goes into automatically populating your PostgreSQL cache. Basically, once you install the extension, you add pg_prewarm to shared_preload_libraries in your postgresql.conf file and do a restart. What you will then see is an autoprewarm master process that is started in addition to the other standard processes. And as he says here, pg_prewarm will store a list of blocks which are currently in memory on disk, and after a crash or restart, pg_prewarm will automatically restore the cache as it was when the file was last exported. So if you have a larger database system and you find it takes time after a restart for the caches to warm up, this is definitely an extension to check out and try using.
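Based on the post's description, a minimal session might look like the following sketch; the table name my_table is a placeholder:

```sql
-- pg_prewarm ships with the contrib modules, so create it once per database
CREATE EXTENSION pg_prewarm;

-- Manually load a table into the cache; the result is the number
-- of 8 kB blocks that were read in
SELECT pg_prewarm('my_table');

-- Optional arguments let you choose the read mode, the relation fork,
-- and a first/last block range
SELECT pg_prewarm('my_table', 'buffer', 'main', 0, 999);

-- For automatic cache restoration after a restart, add to postgresql.conf
-- and restart the server:
--   shared_preload_libraries = 'pg_prewarm'
```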
[00:02:28] The next post is "Nondeterministic collations", and this is from postgresql.verite.pro. Starting with version 12, PostgreSQL collations are created with a parameter called deterministic. With a deterministic collation, comparisons happen essentially at the binary level to determine whether things are equal. But you can also, of course, set this to false, so you get nondeterministic collations, and what that enables you to do is potentially do case-insensitive searching, or searching and ordering without considering accents that are part of words. And he has a list here of what he calls the fancy comparison features that are enabled by nondeterministic collations. So this is something you couldn't do before, but now you can in version 12. First, you can determine equality between canonically equivalent sequences of code points. So they're doing a comparison here, and when deterministic is false, you get a true comparison between these values: in a binary sense they are different, but just looking at them they are identical. You'll get that with a nondeterministic comparison. The next one is equality between compatible sequences of code points. So, comparing two separate Latin f letters with the combined "ff" ligature, you can get an equal comparison on that if you create a collation that has a secondary strength. By default, when you set deterministic to false, it still won't consider these equal; you actually have to set the strength at the secondary level to allow these comparisons to equate to true. Then they go on to ignoring case. And again, you can ignore case as long as the collation is set to a secondary strength. You can ignore case and accents if you set the strength to primary. So at a primary strength you can ignore capitalization and accents.
You can also ignore accents but not case, using some more keywords here, saying the strength is primary but the case level is yes. You can ignore spacing and punctuation by setting different values here, alternate equals shifted; you can match compatible symbols; and you can ignore code points assigned to individual characters. So there's a lot of different capabilities with this support for nondeterministic collations. I'm not 100% up on all of this, but there are very interesting possibilities in terms of doing ordering and comparison operations when working with text. So if you do a lot of that, definitely check out this blog post to learn more about this new feature and how it can potentially be used.
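As a sketch of the kind of collation being described, assuming PostgreSQL 12+ built with ICU (the collation names here are placeholders; the locale strings use ICU's BCP 47 keyword syntax):

```sql
-- Secondary strength ignores case differences
CREATE COLLATION case_insensitive (
    provider = icu,
    locale = 'und-u-ks-level2',
    deterministic = false
);

SELECT 'ABC' = 'abc' COLLATE case_insensitive;   -- true

-- Primary strength additionally ignores accents
CREATE COLLATION ignore_case_and_accents (
    provider = icu,
    locale = 'und-u-ks-level1',
    deterministic = false
);
```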
[00:05:29] The next post is called "Regenerated", and it's basically about generated columns. Now, we've seen some similar posts about generated columns in previous episodes of Scaling Postgres. What is particularly interesting about this one is that it compares a trigger-based approach with the new generated columns feature. If you wanted to achieve what generated columns do, basically generate a column based upon existing data within that table row, for example adding two values together, before version 12 you had to use triggers, because the feature didn't exist yet. So he implemented a trigger approach, then used the native generated column approach, and then looked at their performance. That's what I found interesting here. He compares inserts of a million rows using both methods. Using the trigger method, it completed in 6 seconds; for the generated columns feature, it was 4 seconds. So it's more performant, which you would kind of expect because it's now a core feature of PostgreSQL. But with no generated columns used, the insert performance was about 2.4 seconds. So generated columns still do take a performance hit, just not as significant as when you're using triggers, and that's something to keep in mind when using this new feature. Then he looks at updates. An update using the trigger was at 13 seconds, at 9 seconds with the generated column, and about 5.8 seconds with no generated columns or triggers. So again you'll see a performance degradation with updates, but not as bad as with the trigger. And in terms of deletes, the performance between generated columns and the trigger was actually pretty similar, about four and a half seconds each, and less than half that time when no trigger or generated columns were used.
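To illustrate the two approaches being compared, here is a rough sketch assuming a simple two-column sum; the table and function names are hypothetical, not the post's actual schema:

```sql
-- Native generated column (PostgreSQL 12+)
CREATE TABLE t_generated (
    a     numeric,
    b     numeric,
    total numeric GENERATED ALWAYS AS (a + b) STORED
);

-- Pre-12 equivalent using a trigger
CREATE TABLE t_trigger (a numeric, b numeric, total numeric);

CREATE FUNCTION fill_total() RETURNS trigger AS $$
BEGIN
    NEW.total := NEW.a + NEW.b;  -- compute the derived value by hand
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_fill_total
    BEFORE INSERT OR UPDATE ON t_trigger
    FOR EACH ROW EXECUTE FUNCTION fill_total();
```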
So definitely a great post covering the generated column feature in general, how you could use triggers instead if you're on a version prior to 12, as well as the performance comparisons. And this is from pgdba.org. The next post is "PostgreSQL 12: foreign keys and partitioned tables". This is a pretty brief post, but it talks about the pretty big benefits for referential integrity that come with the new features added in 12: being able to create foreign keys on a partitioned table, and being able to reference a partitioned table with a foreign key. They give an example where you have an items partitioned table, a stock partitioned table, and then a warehouses table that is not partitioned, and how in stock you can actually use foreign keys that reference items and reference warehouses. So now you can use foreign keys pretty much any way, which is a great boon to help maintain your referential integrity when you are using partitioned tables. And this is from 2ndquadrant.com. The next post, also from 2ndquadrant.com, is about Postgres-BDR, and it is also about fast, safe upgrades. BDR is their bi-directional replication product, basically their proprietary master-to-master PostgreSQL solution. But I thought this post was interesting because it does advocate where BDR has some advantages, particularly with upgrades. A lot of people use it because they want 24/7 availability and nothing to go down. When you have a primary and a replica database and you need to switch over, there's some time that you are actually going to be down, because maybe the primary goes down and you need to wait for your monitoring system to detect that it's down, and then an automated failover method runs that can take on the order of minutes, whereas they're claiming here they've gotten it down to about a second flipping over with their BDR product. And again, everything's always in sync; you don't have to wait for something to be synced over from a primary to a replica.
Now, things may eventually need to catch up, but I can definitely see this has some advantages. In particular, they're talking about upgrades and how you can have bi-directional replication between different versions, which, of course, we've seen some posts do with regard to logical replication, but their product apparently does it more out of the box, to be able to keep things up and replicated. So definitely an interesting use case for BDR that I had not considered before. If you're interested, it's definitely a blog post to check out. And another post from 2ndquadrant.com is "Managing another PostgreSQL commitfest". So if you want some insight into how the PostgreSQL commitfests are run, here's a brief post that discusses that.
[00:10:31] The next post is "Braces are too expensive", and this is from Robert Haas at rhaas.blogspot.com. Basically, he's talking about the PostgreSQL executor being volcano-style. And what does that mean? It means a number of things. The query planner generates a plan that is organized into a sort of tree structure, and the system is designed to be extensible, with individual nodes that appear in the plan tree having no data-type-specific knowledge, but rather obtaining their knowledge about specific data types from support functions. And it means that the executor is using a pull model, so plan nodes generally support an operation that says "give me the next tuple". He has an example here where basically the nested loop node is run, and it says give me the next tuple from one child, which in turn asks its own child for the next tuple, retrieves it, and so on back up the tree. So I would say this is kind of a top-down model, but all the interesting work happens essentially in the leaf nodes. And he says this can create the overhead of a lot of function calls, which is why I believe he's saying braces are too expensive; it's not about orthodontics, it's about the number of function calls that are being made. Now, they had already done an overhaul of PostgreSQL expression evaluation, because expressions were done in a similar tree of executable nodes, and they reworked it to transform the expression tree into a series of steps that are executed one after another, like a sort of dynamically constructed program. They actually got some increased performance by doing this for expressions. So basically they're considering potentially doing the same for the plan trees, which is something that may be able to gain PostgreSQL some better performance. So if you're interested in digging into some of this, definitely a blog post to check out.
[00:12:32] The next post is "Tuning checkpoints", and this is from evol-monkey.blogspot.com. He's basically talking about how to set up your configuration for checkpointing, in terms of adjusting checkpoint_timeout, min_wal_size, max_wal_size and checkpoint_completion_target. He gives some advice: first decide what you want your checkpoint_timeout to be, and then work backward from there, as well as running some queries to see how fast you're generating WAL files, to get a sense of what to set these parameters to. And he gives an example where they helped optimize a database, taking it from very high checkpoint activity and calming it down by tuning these parameters. So if you haven't done this yet, definitely a blog post to check out.
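As a sketch of the kind of tuning involved (the values are purely illustrative, not recommendations from the post, and the literal LSN is a placeholder you would replace with an earlier recorded position):

```sql
-- Example checkpoint-related settings
ALTER SYSTEM SET checkpoint_timeout = '30min';
ALTER SYSTEM SET max_wal_size = '16GB';
ALTER SYSTEM SET min_wal_size = '2GB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
SELECT pg_reload_conf();

-- To gauge the WAL generation rate: record the current WAL position,
-- wait a while, then diff the new position against the recorded one
SELECT pg_current_wal_lsn();
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), '0/3000000'));
```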
[00:13:21] The next post is that PgBouncer 1.12.0 is released, and they say this release contains a variety of minor enhancements and fixes. It also contains some fixes to the new SCRAM support introduced in PgBouncer 1.11, improving interoperability with newer PostgreSQL versions, so users of SCRAM are particularly advised to upgrade. So if you upgraded for SCRAM support, you'd probably want to move to 1.12.0 relatively soon. And this is from pgbouncer.org.
[00:13:54] The next post is "PostgreSQL Connection Pooling Part 1: Pros and Cons". So, following the PgBouncer post, this is a very general post that talks at a very high level about the benefits of connection pooling and some of its disadvantages. The benefits are being able to have more connections, virtual connections as it were, relative to the physical connections to the database. Some of the cons are that you're introducing another potential source of downtime if something goes down, as well as increased latency. And in a lot of cases you're going to be putting essentially your authentication layer within the pooler, because the clients need to authenticate to PgBouncer. But if you're considering using a connection pooler and haven't already, definitely a blog post to check out.
[00:14:43] The next post is that PostGIS 3.1.0 release candidate 2 is released, and there's information on that here from postgis.net. Related to that is a post about PostGIS: generating land-constrained geographical point grids with Postgres. This is from Corbend Net. So if you have a use case for doing that, definitely a blog post to check out.
[00:15:10] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you could subscribe via YouTube or iTunes. Thanks.