Episode Transcript
[00:00:00] In this episode of Scaling Postgres we talk about logical replication, prepared transactions, Ansible and tidy partitioning. I'm Creston Jamison and this is Scaling Postgres, episode 43.
[00:00:21] I wanted to make an announcement first: Scaling Postgres will not be presented next week due to the holidays. So I hope everyone has a great holiday next week if you are celebrating it, and we'll catch up on the content the week after. In terms of this week, our first post is Using PostgreSQL Logical Replication to Maintain an Always Up-to-Date Read/Write Test Server, and this is from the Severalnines.com blog. The first part they talk about is just logical replication: what it is and how to set it up. They go into a little bit of the logical replication concepts, and then they talk a little bit about caveats. And this is the part that I found interesting, because if you set up logical replication, the subscriber for that logical replication can essentially be read/write, and you run the danger of running into conflicts. Maybe you are using sequences, but the problem is that one of the restrictions they show here is that sequences are not replicated. So you could run into, say, a primary key conflict, and the ramification of that is that logical replication actually stops, and you could start building up WAL files on the publisher to the point where you run out of disk space. So logical replication is still relatively new as of version 10, and improvements have been made in version 11, but there's still some care you need to take, particularly if you're going to be trying to use one of the subscribers for writing to the exact tables it's replicating. So this post talks about some of those issues to be aware of, and it goes through the process of setting up logical replication and how you want to watch for potential errors and fix them as soon as you're able to, to avoid problems. It goes a little bit into how to deal with primary keys with sequences, and also discusses some good practices, including how to monitor the solution. So if you're just getting into logical replication, this would be a good blog post to check out.
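For anyone who wants to see the moving parts, here is a minimal sketch of the kind of setup the post walks through. The table, publication, subscription and connection details are hypothetical placeholders, not taken from the post:

```sql
-- On the publisher (the "events" table is a made-up example):
CREATE TABLE events (id bigserial PRIMARY KEY, payload text);
CREATE PUBLICATION pub_events FOR TABLE events;

-- On the subscriber: logical replication does not copy DDL,
-- so create the matching table first, then subscribe.
CREATE TABLE events (id bigserial PRIMARY KEY, payload text);
CREATE SUBSCRIPTION sub_events
    CONNECTION 'host=publisher dbname=app user=repuser'  -- placeholder connection string
    PUBLICATION pub_events;

-- The caveat discussed in the post: sequences are not replicated,
-- so a write like this on the subscriber can consume an id the
-- publisher later replicates, causing a primary key conflict that
-- halts the subscription and lets WAL pile up on the publisher.
INSERT INTO events (payload) VALUES ('written on the subscriber');
```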
[00:02:27] The next post is Be Prepared for Prepared Transactions, and this is from the Cybertec-postgresql.com blog. Now, they're talking about prepared transactions. These are not prepared statements, where essentially you parse a statement once and can just execute it later to make queries or statements faster. This is talking about prepared transactions, which are actually two-phase commit transactions. Now, according to the PostgreSQL documentation, they have a note here: PREPARE TRANSACTION is not intended for use in applications or interactive sessions. Its purpose is to allow an external transaction manager to perform atomic global transactions across multiple databases or other transactional resources. Unless you're writing a transaction manager, you probably shouldn't be using PREPARE TRANSACTION. So, a lot of warnings with this, and it seems like 99% of us will never use this particular feature, only if you're using a distributed transaction manager. So that's just something to keep in mind. Now, this particular blog post is talking about being prepared for prepared transactions: if those happen to be enabled (they are disabled by default), how can you get out of trouble? So this blog post talks about what prepared transactions are, what the use case is (which I mentioned some of here), the problems that can happen, and a little bit about the implementation details, because some things for these transactions actually get written to disk, which is why it's hard to get rid of orphaned prepared transactions. And they give an example here of how you can lock up your database with prepared transactions. Basically, you can start a transaction, take a lock on the pg_authid table, run PREPARE TRANSACTION 'locked', and then disconnect from the database. Now, pg_authid contains the database users, and since it's locked, all future connection attempts will hang, and restarting the database won't help because the prepared transaction is still retained. They even tried starting single-user mode, and that won't work either. And basically, the solution he came to is actually deleting the file in the data directory (PGDATA) where the prepared transaction state gets persisted, to remove this prepared transaction. So again, this should be a super rarely used feature, because getting out of problems with it seems like a big issue. But if this is something of interest to you, definitely a blog post to check out.
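To make the mechanics concrete, here is a small sketch of an ordinary, non-malicious prepared transaction life cycle, assuming max_prepared_transactions has been raised above zero. The table and transaction identifier are hypothetical:

```sql
-- Requires max_prepared_transactions > 0 (it defaults to 0, i.e. disabled).
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- hypothetical table
PREPARE TRANSACTION 'xfer_1';  -- the transaction state is persisted to disk here

-- The session can now disconnect; the transaction survives restarts
-- and keeps holding its locks until it is explicitly resolved.

-- From any later session, find lingering prepared transactions...
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- ...and resolve each one, one way or the other:
COMMIT PREPARED 'xfer_1';
-- or: ROLLBACK PREPARED 'xfer_1';
```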
[00:05:09] The next post is Video: Ansible and PostgreSQL, and this is from the 2ndQuadrant.com blog. This is a presentation they put up, about 30 to 40 minutes in length, talking about using Ansible to deploy PostgreSQL. Ansible, if you're not familiar with it, is a configuration management tool similar to Chef or Puppet, but I find it a lot easier to use. I've actually been using it for a number of years, and it's how I deploy my systems, including configuring PostgreSQL. Now, this presentation is a little bit on the basic side, but it goes over using Ansible and how you would potentially deploy a set of, say, five different servers: a primary, a replica with a backup server, and then another replica with a backup server, and potentially how you can distribute that across multiple data centers or availability zones. So if you're looking for a potentially different deployment solution for your PostgreSQL database instances, maybe check out Ansible and check out this presentation.

The next post is Keeping Postgres tidy with partitioning, and this is from the Dataegret.com blog. He's basically talking about how partitioning can help you with data management. So say you have a history log or an event log, or some sort of telemetry data, and you periodically want to remove or delete old data. In this case they have what looks like maybe 2 billion rows, and if you want to clean out the old events using a DELETE by a certain date, it's going to take forever to run. He says, quote, the query would take twelve minutes to complete and it would generate a lot of write ahead log files. Now, even after the delete, you still have a huge table on your hands: all those DELETE statements really haven't freed any space. You'd have to use one of the tools that lets you compact the table, such as VACUUM FULL, which locks the table, or one of the third-party tools that can do it while the table is still accessible. But if you were using partitioning, partitioned by month or by year (in this case month would probably be the best choice), then you could simply drop that partition, or multiple partitions, and all that data would just go away and you would reclaim all of your space (there's a quick sketch of this further below). So definitely something to keep in mind on how partitioning can help you keep Postgres tidy.

The next post is Monitoring PostgreSQL WAL Files, and this is from the pgdash.io blog. They talk about what WAL files are: the write-ahead log, basically a log of every change occurring, and why you should monitor them. Basically, the greatest risk is running out of disk space. Now, I don't really monitor the number of WAL files like they are talking about here; generally what I monitor is the disk space, so where those WAL files are getting created and where they are being archived to, and that is what I watch to notice problems. Some of the issues they talk about here, such as archival failures, replication failures, or even a lot of long-running transactions, can all cause WAL files to build up and use a lot of disk space. But they do have some interesting things here, like a query to get a count of WAL files, which I thought was particularly interesting (there's a sketch along those lines further below as well). So if you're interested in that, definitely a blog post to check out.

The last post is Getting Started Running PostgreSQL on Kubernetes, and this is from the Crunchydata.com blog.
Now, last week, in the previous episode of Scaling Postgres, we talked about using Kubernetes to essentially build your own database as a service for PostgreSQL, where you can run and monitor multiple PostgreSQL database systems, and there they were basically building it out themselves. But Crunchy Data here has a few tools that they've come up with to help with this process. So again, if you're interested in using Kubernetes to potentially manage and deploy multiple PostgreSQL database servers, then this is definitely a blog post to check out.
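Here is the partitioning sketch mentioned above, using the declarative partitioning available since PostgreSQL 10. The table layout and monthly ranges are hypothetical, not taken from the Dataegret post:

```sql
-- A hypothetical event log, range-partitioned by month:
CREATE TABLE events (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2018_11 PARTITION OF events
    FOR VALUES FROM ('2018-11-01') TO ('2018-12-01');
CREATE TABLE events_2018_12 PARTITION OF events
    FOR VALUES FROM ('2018-12-01') TO ('2019-01-01');

-- Retiring a month of old data is a quick metadata operation that
-- returns the disk space immediately, unlike a huge DELETE followed
-- by VACUUM FULL or a third-party compaction tool:
DROP TABLE events_2018_11;
```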
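And here is the WAL-counting idea mentioned above. I don't know the exact query the post uses, but on PostgreSQL 10 and later something along these lines works; pg_ls_waldir() is a built-in function, while the thresholds you alert on are up to you:

```sql
-- Count WAL segment files currently in pg_wal
-- (segment file names are 24 hexadecimal characters):
SELECT count(*) AS wal_files
FROM pg_ls_waldir()
WHERE name ~ '^[0-9A-F]{24}$';

-- Total disk space used by pg_wal, which is closer to what I
-- actually monitor and alert on:
SELECT pg_size_pretty(sum(size)) AS wal_size
FROM pg_ls_waldir();
```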
[00:09:25] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or iTunes. Thanks.