State of Postgres, Dumping DBs, Configuration Parameters, Partial WAL

[00:00:00] In this episode of Scaling Postgres, we talk about the state of postgres dumping, DBS, configuration parameters and partial write ahead log. I'm Kristen Jameson. And this is scaling postgres episode 94 one. [00:00:21] Alright, I hope you're having a great week. Our first piece content is actually called The State of Postgres. And this [email protected] and this is a survey that they've published along with the results from 500 members of the Postgres community. And this was done by Timescale DB and they have a very interactive site that goes over some of the results and some conclusions from that survey. Now personally, I kind of like at the top link here, the full results to be able to see where different results came from. And this is from about 500 people. So not a huge number of people using Postgres, but definitely a representative sample, but nevertheless interesting pieces of information. It's like for example, it seems like predominantly software developers, engineers responded to it and the vast majority are using it for work, doing software development and of course, whole other set of questions that were asked. So if you're interested in this type of content, definitely a link to check out. The second piece of content related to that is five quick thoughts on the State of the Postgres survey. And this is from Exilla Net at the Azilla blog and he mentions five points from his thoughts. And the first thing section he mentions here is that 500 people, which is a pretty small subset of everyone in the postgres community, so that's something to keep in mind. He appreciates the slow and steady growth. It's not that suddenly people in the last few years have started using Postgres. It's had a long runtime which hopefully demonstrates that it will be relatively stable as opposed to something that's Flash in the pan talked about. You do what now? And he's basically talking about it's predominantly software developers that were filling out the survey using Postgres and predominantly with regard to software talking about the clouds, I didn't necessarily agree with some of the things mentioned here as I did feel the survey gave some pretty clear indications about how they're running Postgres. And he had some opinions about particular quotes that were picked out. But I thought this was an additional piece of content to complement the survey. So if you're checking that out, maybe you would like to check out this as well. [00:02:37] The next piece of content, how to effectively dump PostgreSQL databases. Now of course there's the tool PG dump all, which dumps the entire database, cluster every database with it along with the global things such as roles, table spaces, users, et cetera. And then PG dump just dumps a single database. All of those globals that are global for the entire cluster are not dumped. And basically he discusses kind of the advantages and disadvantages of each one. And basically he prefers doing using PG dump and not PG dump all. PG dump all is very convenient to get started with. But really as you increase your sophistication you're probably going to want to switch to using PG dumps. So using PG dump all has a number of disadvantages. You can't easily restore, say, a single table. You can't easily do dumps or restores in parallel. So what he advocates is dumping the globals and then dumping each database individually. And he talks about the different formats that you can do with pgdump, that you can't do with PG dump all. PG dump all just dumps out to the plain format, whereas different formats, such as the directory format, they allow you to do dumps in parallel, do restores in parallel, and you can use the J option to specify how many jobs to do that in. And you can even do partial restore. So you just want to restore this particular table, for example. And he has an example script here of doing this backup process. Basically, he chooses to dump the roles into one dump, dump the table spaces into another dump, and then go over per database dumps, doing PG dumps for each one. So, very good blog post. That explains the rationale for using PG dumps versus PG dump all. And if you're using logical backups as your primary way to do postgres backups, I would definitely advocate the solution that he has outlined here. And I should mention this is from Dep. [00:04:41] The next post is setting PostgreSQL configuration parameters, and this is from CyberTech Postgresql.com. Now, the first way to set the configuration, of course, is the postgresql.com file. So that has all the different configuration parameters that you can just go in and edit, although it does support having included files. So there's usually a separate directory called conf D that you can put files into that will be included as part of the postgresql.com. Now, I tend to use ansible configuration management to do my PostgreSQL configurations, and you can do a copy of this, but I prefer to use included files. So just say this is the configuration that I'm changing and basically leave the postgresql.com file unchanged. Now, a thing to keep in mind is that the last parameter set wins. So if you've defined, say, shared buffers in two different places, once in postgresql.com and once in an included file, the last one to be read will be set. So generally the included files will override what's set in the postgresql.com file. The next area that you can change to get configuration changes is running the Altersystem command. Now, what the Altersystem command does is it modifies a PostgreSQL auto.com file. So this is a file that specifically says do not edit this file manually. It will be overwritten by the Altersystem command. So basically, these changes take precedence over what's set in the postgresql.com file. So any Alter system commands are going to make changes to this file to persist over reboots. And as it said, what I mentioned here these values will have precedence over the postgresql.com file. The other thing that they mentioned is that you can set built in settings. So for example, when you start postgres, like for example using the Pgctl command, you can actually pass in parameters at that time that you're starting it up so that database cluster will have that parameter when you've started it. The next area that you can set configuration for is at the database level. So you can do an altered database command to set certain parameters. Not all parameters can be set. There are certain ones that cover the entire cluster that of course you cannot set using an altered database statement. But that's another way that you can set certain parameters. You can also alter the user and set parameters that are impacting a single user. You can also alter user in a particular database. So again, more granularity to be able to set it. And then lastly, of course, is setting at the session level so the connection, you've made a connection to a PostgreSQL server and within that session you can set certain parameters that will be persisted during that session and then they even go into the concept of assigning parameters to functions. So definitely an interesting blog post and if you're wanting to know where different configuration settings can be set for PostgreSQL, this is a very good post that reviews all of those different capabilities. [00:07:45] The next post is Barman 210 recovery of partial wall files. Now this is generally an announcement of the availability of Barman 210 but it also discusses partial wall files and how they've made it a lot easier to handle those. Now I bring this up because it's another post that also educates along with doing an announcement. So for example, normally when you have Archive mode enabled in PostgreSQL, you're going to enable an archive command that's going to save those wall files to a particular location. And generally I always have Archive command running, but there's also the ability to stream the wall files to another system that's running the PC receive wall process. So essentially doing physical replication of the wall to a separate system or process. And Barman actually supports both methods. Archive command can copy it to a particular directory defined by Barman as well as use a PC receive wall process to stream the wall changes. Now the advantage of PC receive wall is that with the Archive command it waits for an entire wall file to be filled up before archiving it. So it must have a full wall file before the archive command is run, whereas PC receive wall constantly streams the changes that have been written to the wall file. So in that case, when it's not a full file, you're going to get a file with a partial suffix. Now the advantage of this is there's actually more data that you will have if you're using PCPG receive wall versus just relying on the archive command. So potentially you will lose less data as they're talking about here. Your recovery point objective will be much closer to zero, and it may be zero if you're using synchronous wall streaming. And they say that basically Barman supports this fulfilling the RPO of zero, but this two point ten point release actually makes it a lot more seamless to be able to handle the recovery process by automatically handling these partial suffix wall files. So if you use Barman, definitely a blog post to check out for the update, as well as also get some good information about the differences between using an archive command and using PG Receivewall. And this is from the second quadrant.com blog. [00:10:10] The next post is how to perform a major version upgrade using PG Upgrade in PostgreSQL. And this is from Crunchydata.com, and this is a relatively short post, but it goes through the process of how you could do a PG upgrade, listing the different commands here. Now, you'll probably find a much more comprehensive documentation for doing an upgrade, but this is a simple post that actually just goes over the steps you would need to go through. There's points of validation you'll want to do and things of that nature, but this gives you a high level overview of how to do an actual PG upgrade. So definitely a blog post to check out if you're interested in that. The next post is how to run postgres on Docker part Three. So this discusses the process of setting up a primary and a replica as docker containers and have them sync data between each other. Now, I didn't actually see where part two and part one were, however, this shows you how to set up a cluster of PostgreSQL instances. So if you're interested in doing that, definitely a blog post to check out. [00:11:15] The next post is Power for New Power Collector Daemon, and this discusses the performance monitor Power for Postgres, and it's in a beta state. And this discusses the implementation of the collector. So if you use Power or you're considering it, definitely a blog post to check out. And the last post discusses a new connection pooler. If you're unfamiliar with it, they're classifying it it's called Odyssey and they're classifying it as a scalable PostgreSQL connection pooler because they indicate it's using a multi threaded process. And this is by Yandex, so it is apparently driving a fair amount of traffic. Now, I tend to use PG Bouncer, but if you're interested in looking at connection pooler alternatives, maybe you would want to check out the Odyssey project here. [00:12:06] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or itunes. Thanks.

Show Notes

Episode Transcript

Other Episodes

Episode 3

Modeling, Query Performance, Statistics, pgmetrics | Scaling Postgres 3

Episode 47

pgBouncer, Postgres 11 Gems, DocumentDB, JSON | Scaling Postgres 47

Episode 52

fsync Stopgap, CTE Changes, autovacuum_naptime, Postgres Community | Scaling Postgres 52

State of Postgres, Dumping DBs, Configuration Parameters, Partial WAL | Scaling Postgres 94