Recovery Configuration, Alter System, Transaction Isolation, Temp Table Vacuum

[00:00:00] In this episode of Scaling Postgres we talk about recovery configuration, alter system transaction isolation in temp table vacuum. I'm Kristen Jameson, and this is scaling postgres episode 85. [00:00:16] You all right? I hope you're having a great week. The first piece of content is replication configuration changes in PostgreSQL twelve and this is from secondquadrant.com. And basically with postgres twelve the way you set up recovery as well as replication has changed because the recovery.com file is no longer used. In fact, if you have one in the directory when you try to start the PostgreSQL twelve cluster it's going to not start and give you a fatal error using recovery command file recovery.com is not supported and in place of this file instead it's using two signal files. One is that the cluster should be a standby and it has a standby signal cluster in the data directory I believe, or it has a recovery signal file which means it's starting up in a targeted recovery mode and if both files are present the standby signal file takes precedence. And it mentions some of the things here that when the standby is promoted, the standby signal file is removed and if a point in time recovery is taking place, the recovery signal will be removed once the recovery target is reached. And of course what this also means is that if neither of these files are present the cluster will start up acting as a primary database server. Now pretty much all of the configuration options that used to exist in recovery.com are now will be in the postgresql.com file except for two things that they mentioned here. One, the standby mode is no longer used. They're using the two different file signal types now. And secondly, the trigger file has been renamed to promote trigger file. So it's the file that will, when exists, promote this to the primary. Now the advantage of having these commands in the postgresql.com file is that now you can do things such as alter the configuration and doing a reload, whereas usually changes to the recovery.com file required a full restart. Now what's great about this post it also mentions some gotchas with this and the first one is that alter system settings take priority. So basically if someone changes the configuration by using alter system, what it actually does is alter a PostgreSQL auto.com file that actually exists in the data directory. Like I tend to use Ubuntu and the PostgreSQL comp file is located in the etc PostgreSQL directory and there's a folder structure for each cluster that's on that system whereas this auto file exists in the data directory and basically you're not supposed to modify it. And these changes take precedence over settings in the PostgreSQL comp file. So basically when you use the alter system command it actually makes those changes into this separate file. So it's just something to keep in mind that settings in this file take precedence over ones in the PostgreSQL comp file so you may run into confusion as different things are going to be configured in different areas and why this is important. I'll get into some of the it looks like some of the utilities may be storing some settings here. The next gotcha is that replication configuration settings may be present even on primary servers. So because this is in the postgresql.com file it could have like primary connection info setting here could be in the primary but it's not a replica. Basically you have to rely upon the standby files to identify if it's a replica or not or query a system table. The next gotcha is no canonical location to write configuration settings basically since it's part of the system configuration now it could be in multiple files or multiple locations whereas before it was just known to be in one file. And the other thing they also mentioned is of course that the last configuration parameter red takes priority. And again, we're referencing again the PostgreSQL auto.com file that is modified by Altersystem and other utilities that they mentioned here such as PG based Backup or Rep Manager, which is a backup Replication Manager, I believe by second quadrant another gotcha is a risk of signal file confusion. So basically you need to just be aware of these two files in their existence and their purpose. And the last gotcha that's really mostly just a configuration change is that only one parameter from the recovery target family may be specified and they mentioned that in a PostgreSQL eleven and earlier the last instance of these parameters was used. However this time it system won't start up unless there's one and only one which I think is a safer setting for the recovery target but it's just a change you need to keep in mind. Now related to this post there's actually three other posts that talk about these configuration changes. The next one is by Procona.com and its title is how to set up streaming Replication in PostgreSQL twelve. Now again they talk a lot about the changes that the fact that recovery.com has gone away and you need to do your configuration slightly differently to do replication. So they are doing this on a CentOS system? I believe so. They're showing the different ways to get the change the parameters you need to set replication, create a user, update the Pghba file so that you can connect to the primary. And then they do a PG based backup to the primary. And they're actually using the R option, which we'll look at in a second here. Basically, that writes the appropriate files to be able to start at the system in the recovery mode. So for example using this R option the Pgbase backup utility actually writes to the PostgreSQL auto.com file that was mentioned before that's in the data directory PostgreSQL and it adds the listen addresses and the primary connection info that was input with the Pgbase backup command here. So this is a pretty good post about showing you how to walk through and talks a little bit more about the changes that have come with PostgreSQL twelve with regard to recovery and also replication. The only thing that I saw here that it kind of gave me pause is the fact that they actually did an echo to write to the PostgreSQL auto.com file when it explicitly says in the PostgreSQL auto.com file is do not edit this file manually, it will be overwritten by the Alter system command. So they did overwrite it here. I would probably not do this because it seems to be not the best practice but just something to keep in mind. [00:06:50] Next post also related to this change. So apparently with people trying twelve they may be running into this issue. This is where is my recovery.com file in PostgreSQL version twelve and this is from Postgres Rocks Enterprisedb.com and his answer is the short answer is it's gone. And again he runs through the different process, why it's gone again talking about how PG based backup with Er option and how it works differently. So again, another resource to go and read to help you get up to speed before you're implementing twelve and needing to change your recovery processes potentially as well as your replication processes. With this I haven't used the Altra system command a lot. Typically what I do is modify the postgresql.com file and then do a reload of the system but this may see more increased usage so it would probably be a good practice. And again, this command has been around for a long long time. But to get familiar with the Alter system command and how it works as well as looking at the PG based backup command. So in versions eleven and prior the R option, the capital R option or the Writerecovery.com option was write a minimalrecovery.com file, well those no longer exist and in version twelve that same option in PG based backup actually does a create a standby signal file and append connection settings to the PostgreSQL auto.com file. And this also adds the replication slot if Pgbase backup is using a replication slot. So again, just some changes and some gotchas to be aware of with regard to PostgreSQL twelve in terms of recovery as well as replication. [00:08:35] The next post is Time and Relative Dimension in Space and this is from Pgdba.org and this is a great post. [00:08:46] He explains things very well, talking about essentially transaction isolation levels. First. He covers MVCC. So it's MultiVersion concurrency control how PostgreSQL handles concurrent access to a lot of the different data. And then what each individual can see, he goes into how the XID is assigned once a transaction started. And that along with essentially hidden columns and tables that define what XID it was created or deleted at. Determine what you can see in the database based upon the version that you're using. And then a part of this. And this was a great explanation of how this works. So if you're not familiar with that, definitely check this out. And then he follows up with a transaction isolation and how essentially this is the SQL standard defines four levels of transaction isolation in terms of what's possible. Like at the lowest level, it's possible to get dirty reads when a transaction can access the data written by a concurrent not committed transaction, when it's possible to get a non repeatable read, a phantom read and a serialization anomaly. So all sorts of different possibilities. And then what he did is created this table here that compares these possibilities with the isolation level options available in PostgreSQL. So essentially with a read uncommitted everything but a dirty read is possible. Essentially with PostgreSQL it's not possible to get dirty reads. And I should mention that the default state is read committed for PostgreSQL. So if you do nothing, it's going to be recommitted. So it is possible to get non repeatable reads. You do a select, you get a value, another transaction modifies it when you do a select, again you're going to get that updated setting. So you get the most updated data that's in the database when you do a query. But that's what a non repeatable read is. But you can change the isolation level in PostgreSQL to be a repeatable read. And this can be on a per transaction level or for your particular session that you're connecting to the database as. And with a repeatable read, you actually get rid of non repeatable reads and phantom reads. Although it's still possible to get a serialization anomaly where the order of the transactions, there's no guarantee as to when that can happen. And here he talks about being able to set the transaction isolation level as a part of a transaction and he goes over discussing this in more detail, read committed along with some examples, repeatable read and then finally serializable. So again, this is a great post and discusses how this kind of system works and how using these transaction isolation levels you can alter the concurrency of PostgreSQL depending upon your use case. So particularly if you're a developer, I encourage you to check out this post. [00:11:33] The next article is what is Autovacium doing to my temporary tables? And this is from CyberTech postgresql.com and he's talking about what Autovacium does. This essentially cleans tables, like what was mentioned in this previous post is that essentially to handle MVCC, what PostgreSQL does is it doesn't do updates in place when there's an update. What it does is it actually creates a new row and then flags the old row essentially for deletion, but it can still be visible based upon the transaction isolation level. What Auto vacuum does is it goes into the table and then clears out all the old rows, essentially finally deletes them as opposed to just being marked for deletion, as well as reset the XID that was mentioned in this post as well. So Auto Vacuum goes on a regular basis and essentially vacuums all these dead tuples from the tables. But an issue is it actually does not vacuum temporary tables. Temporary tables exist within the essentially connection that you're using and they get discarded once that connection is stopped or that session has stopped. And he has an example here where he's actually using an extension to be able to get statistics with regard to the tuple. He creates a real table called T Real and then creates a temporary table called Ttemp. He inserts 5 million rows. He deletes half of those rows from each of the tables, waits a bit for auto vacuum to clean up the real table, and then when he checks the stats of the tuple for the real table, you can see hardly any dead tuples, and a fair amount of free percentage of space has essentially been reclaimed by doing the vacuum operation. Whereas when he looks at the temporary table, you can see tons of dead tuples. So Auto Vacuum has not touched this table and you of course, don't have that much free space. So this can be an issue if you're trying to keep temporary tables around a long time because they're not going to be auto vacuumed. He says you can manually vacuum, so that works, but definitely, again, don't have like long running transactions with temporary tables that last a while because they're not going to be vacuumed. If you do have a temporary table, they get so large it's going to be around a while or potentially run into a transaction ID wraparound, depending on how much data you're working with. Sounds like you may need to manually vacuum it or drop those temp tables when you can and recreate them again. So just something to keep in mind. And another great blog post from CyberTech Postgresql.com. [00:14:04] The last post is newbie to PostgreSQL. Where to start? And this is from the Heigo website and it's basically listing resources that they suggest to their developers or people getting to start to use PostgreSQL for the first time. And the number one link he mentions here is Postgresqltutorial.com, as well as Momgm US's website at his extended presentations and of course, the official PostgreSQL documentation. So if you are at the Getting Started stage, it's definitely a blog post to check out. [00:14:37] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show Notes. Be sure to head over to Scalingpostgres.com where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or itunes. Thanks.

Show Notes

Episode Transcript

Other Episodes

Episode 167

Interpreting Explain Analyze, Bulk Loading Data, Limiting Update & Delete, pg_buffercache | Scaling Postgres 167

Episode 35

Terabyte Scaling, Postgres 11, Connections, Backup | Scaling Postgres 35

Episode 124

Partition Migration, Like Performance, Best Fillfactor, In-Memory Tables | Scaling Postgres 124

Recovery Configuration, Alter System, Transaction Isolation, Temp Table Vacuum | Scaling Postgres 85