Stat Tracking, Temporary Tables, pg_squeeze, pg_auto_failover | Scaling Postgres 136

Scaling Postgres
Episode 136 | October 19, 2020 | 00:13:08

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss stat tracking, temporary tables, a new utility called pg_squeeze and an update to pg_auto_failover.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/136-stat-tracking-temporary-tables-pg_squeeze-pg_auto_failover/

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about stat tracking, temporary tables, pg_squeeze and pg_auto_failover. I'm Creston Jamison, and this is Scaling Postgres, episode 136.

[00:00:22] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "Tracking Postgres Stats". This is from pgcraftsman.io, and it's a blog post about a very simple SQL script the author has put together that looks at existing system tables within your database and takes snapshots of them. The stats tables that exist within Postgres hold either point-in-time statistics, like pg_stat_activity, or cumulative statistics, like pg_stat_all_tables. He argues you basically need to take snapshots of these tables over time to glean useful information from them, and his snapshot SQL script takes those snapshots and records them in the database itself. So you could schedule it with a cron job to run on a periodic basis and take a look at the data, or do some work or some job and then take another snapshot to compare how your system was modified by the changes. (A bare-bones version of the same idea appears in the sketch at the end of this segment.) He gives an example of running the command once you have the SQL script: it creates a separate schema for itself, as well as a number of tables and indexes, and he shows the tables that are present. Now, he does mention that some columns, like memory and CPU, aren't currently used because there's no way to query Postgres to get that, but he alludes to something coming, because he mentions a future post. I suspect he may be using an extension that contacts the underlying OS to pull things like CPU stats. Taking a snapshot is rather simple: you just run one command to take a snapshot of those existing tables. He also talks a little bit about what it takes to do a snapshot, and it basically depends on how many different objects you have in the database, or, in the case of pg_stat_activity, how many connections you have. The post then goes through checking out the snapshots as they exist in the separate schema, and running a report from one snapshot to another, which gives you stats per table. For example, for pgbench_history you can see that a workload inserted over half a million rows. So even though this looks like it was just introduced and is relatively simple, it could build up to something pretty interesting. It's basically a very bare-bones way to collect a time series of statistics from these tables over time. If you're interested in that, definitely check out this blog post.

The next piece of content is "PostgreSQL: Sophisticating temporary tables". This is from Cybertec-postgresql.com, and it's all about temporary tables. These are tables that are created within a session; once that session is complete or the connection is closed, the temporary table goes away. In addition, he mentions a few configuration options you can use with temporary tables. The default is to preserve rows on a commit. However, you could also choose to delete rows, so you do a bunch of work and then, when you commit, it deletes the rows and empties the table. Or you could choose to drop the table entirely on commit.
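First, the sketch promised above: a minimal, hypothetical version of the snapshot idea, not the author's actual script. It uses pg_stat_all_tables, and the snapshots schema and tab_stats table are invented names.

  -- Hypothetical sketch (not the pgcraftsman script): keep timestamped
  -- copies of a cumulative stats view so deltas can be computed later.
  CREATE SCHEMA IF NOT EXISTS snapshots;

  CREATE TABLE IF NOT EXISTS snapshots.tab_stats AS
  SELECT now() AS snapped_at, * FROM pg_stat_all_tables WHERE false;

  -- Take a snapshot (run from cron, or before and after a workload):
  INSERT INTO snapshots.tab_stats
  SELECT now(), * FROM pg_stat_all_tables;

  -- Example report: rows inserted into one table between the oldest
  -- and newest snapshots.
  SELECT max(n_tup_ins) - min(n_tup_ins) AS rows_inserted
  FROM snapshots.tab_stats
  WHERE relname = 'pgbench_history';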
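And to make the three ON COMMIT behaviors just described concrete, a quick sketch with placeholder table names:

  -- Default: rows survive each commit; the table lasts for the session.
  CREATE TEMPORARY TABLE scratch_keep (id int) ON COMMIT PRESERVE ROWS;

  -- Rows are removed at every commit, but the table itself remains.
  CREATE TEMPORARY TABLE scratch_empty (id int) ON COMMIT DELETE ROWS;

  -- The table is dropped at commit, so it only makes sense inside
  -- an explicit transaction block.
  BEGIN;
  CREATE TEMPORARY TABLE scratch_gone (id int) ON COMMIT DROP;
  -- ... use scratch_gone here ...
  COMMIT;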
Now, if you're going to start using more of these temporary tables, one thing he says to be aware of is the amount of memory dedicated to temporary tables: the temp_buffers setting, which defaults to eight megabytes. You may want to increase that if you want more temporary table data to remain memory resident. So this is a very quick, simple post about temporary tables in Postgres. Feel free to check it out.

[00:04:04] The next piece of content is "pg_squeeze: Optimizing PostgreSQL storage". This is a new utility that enables you to shrink tables live while the database is working, move tables and indexes from one tablespace to another, index-organize or cluster a table, as well as change the on-disk fill factor. It seems the primary use case is to shrink tables in a live database, similar to what pg_repack does. What's interesting is that it actually uses logical replication to do the majority of its work, so it may require less locking than pg_repack. They discuss table bloat in general: because of how Postgres works, whenever you do an update, a separate row version is created and the old row is eventually removed. But even though vacuum cleans up that dead row, it never reclaims the disk space. For example, if you have a table and insert some data, it's at 69 megabytes; if you then update all those rows, it goes to 138 megabytes, and vacuuming the table won't reduce its size. However, with pg_squeeze they show the table being compressed back down to its original 69 megabytes. They also show how you can set up scheduling: there is a squeeze.tables table where you insert a row to have tables squeezed on a periodic basis, with a schedule option that appears to work like cron. (A hedged sketch of this interface appears at the end of this segment.) So it looks like a new utility that does something similar to pg_repack, but does it slightly differently. If you're interested in that, maybe you want to check out this new utility from Cybertec-postgresql.com.

[00:05:53] The next piece of content is "What's new in pg_auto_failover 1.4 for Postgres high availability". This is from Citusdata.com. pg_auto_failover is a very simple utility for failing over from a primary to a secondary in order to maintain a high-availability Postgres solution. Basically, it just requires having a monitor that does health checks against a primary and a secondary instance, so it's a relatively simple setup: it takes just four commands to get it running. What's new in 1.4 is the ability to have multiple secondaries, or multiple replicas. Previously, if the primary went down, the utility would of course promote the standby, the secondary node, to become the primary. But what if your secondary node goes down? What happens then? Well, with synchronous commit configured, the primary actually stops accepting writes, because each write must be confirmed on both the primary and the secondary. So what the utility did was turn off that synchronous replication: you could still do writes to the primary, but then essentially your data was unprotected.
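Here is the sketch of the pg_squeeze interface promised above, going by the post and the extension's README; treat the function signature and the schedule values as assumptions to verify against the docs, and the table name as a placeholder.

  -- Illustrative one-off squeeze, based on the post's examples (the
  -- extra arguments select a clustering index and tablespaces):
  SELECT squeeze.squeeze_table('public', 'pgbench_accounts',
                               null, null, null);

  -- Register a table for periodic squeezing; the composite schedule
  -- value works roughly like cron (values here are illustrative):
  INSERT INTO squeeze.tables (tabschema, tabname, schedule)
  VALUES ('public', 'pgbench_accounts',
          ('{30}', '{22}', NULL, NULL, '{3,5}'));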
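For context on the synchronous replication trade-off just described, here is a sketch of the underlying server setting involved. pg_auto_failover manages this for you, so this is only an illustration, and the standby name shown is a placeholder.

  -- With a synchronous standby configured, every commit waits for it:
  SHOW synchronous_standby_names;   -- e.g. 'node_2' (placeholder name)

  -- Clearing the setting keeps the primary accepting writes while the
  -- standby is down, at the cost of unreplicated (unprotected) commits:
  ALTER SYSTEM SET synchronous_standby_names = '';
  SELECT pg_reload_conf();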
But with support for multiple replicas, even if a secondary node goes down, you still have data redundancy between the primary and a remaining secondary, as long as you have two or more replicas available. This version supports that capability as well as Postgres 13. So if you're looking for a high-availability solution that is relatively simple to set up and keeps gaining new features, definitely check out this post from Citusdata.com.

[00:07:32] The next piece of content is "Multitenancy with Postgres schemas: key concepts explained". Schemas are essentially namespaces for tables, so if you're doing multitenancy where each customer account gets a dedicated schema, this blog post describes that approach. It shows how you can create additional schemas, how they namespace the objects within them, and how, when you query, you're dealing with separate objects under different namespaces. They created a tenant_1 schema, so you can look at the things table in that namespace, and there is also a default schema, public, that all objects are created in unless you specify a schema. It talks about querying a specific tenant by schema-qualifying the table name, or defining a custom search_path so that Postgres only looks in particular schemas for tables, which you can do with a SET command for the session. (A small sketch of this appears at the end of this segment.) The big caveat to keep in mind, which they cover, is that if you're using something like PgBouncer set to transaction pooling, you're probably going to mix up tenants' data, because session state is not compatible with transaction pooling. But this post goes over the things to keep in mind if you're exploring schema-based separation for multitenancy.

[00:09:05] The next piece of content is "PostgreSQL monitoring for app developers: alerts and troubleshooting". This is from Crunchydata.com. It follows up a series of posts they've written about monitoring, particularly their Postgres Operator tool that works with Kubernetes clusters and pgMonitor for the monitoring itself, and it discusses using Alertmanager from Prometheus, I believe, to set up alerts and get notified when something is outside the expected bounds. They have some defaults set up for their particular solution, and the most important things they suggest alerting on are: "Is this thing on?", basically whether the Postgres database is up and available; second, "Am I out of space?", tracking disk usage to make sure you're not going to run out of space, whether due to excessive logging or just database growth; next, "Why is this not clean?", tracking vacuum operations, in particular keeping an eye out for transaction ID wraparound to make sure you don't get close to that limit; next, "Falling behind the leader", which is monitoring replication lag; and finally "Running out of numbers", which is sequence exhaustion: a sequence is set to a particular limit, and you either run out of sequence numbers or overrun the size of the integer column the sequence feeds.
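Backing up to the multitenancy piece, here is the small sketch promised there. The tenant_1 schema and things table follow the post; the column definitions are made up for illustration.

  -- Each tenant gets its own schema; table names repeat per schema.
  CREATE SCHEMA tenant_1;
  CREATE TABLE tenant_1.things (id serial PRIMARY KEY, name text);

  -- Query by schema-qualifying the table name...
  SELECT * FROM tenant_1.things;

  -- ...or point the session's search_path at one tenant. This is
  -- session state, which is what breaks under PgBouncer transaction
  -- pooling, as mentioned above.
  SET search_path TO tenant_1, public;
  SELECT * FROM things;  -- now resolves to tenant_1.things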
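And since the last alert, running out of numbers, is easy to check for yourself, here is one hedged way to do it using the pg_sequences view:

  -- Rough check for sequence exhaustion: how much of each sequence's
  -- range has been consumed so far (pg_sequences is Postgres 10+).
  SELECT schemaname, sequencename, last_value, max_value,
         round(100.0 * last_value / max_value, 2) AS pct_used
  FROM pg_sequences
  WHERE last_value IS NOT NULL
  ORDER BY pct_used DESC;
  -- Note: an int4 column can overflow long before a bigint sequence
  -- reaches its own max, so check the column types as well.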
So those are definitely some key things to monitor, and this is definitely a post to check out, along with the tools they have set up, if you're interested in those.

[00:10:43] The next piece of content is "Tuning your Postgres database for high write loads". This is also from Crunchydata.com, and they're basically talking about just one parameter change here: max_wal_size. They do discuss some others related to checkpoints, because what they've sometimes seen in log files is checkpoints occurring too frequently. That happens when max_wal_size is too small to prevent early checkpoints, so the one thing they adjust to get things back to normal is increasing max_wal_size. This is a very brief post on that, and of course for high write loads you may want to consider more configuration changes, such as adjusting your checkpoint_timeout and checkpoint_completion_target, but this particular post only covers max_wal_size, so you can check it out if you want to learn more.

[00:11:37] The next piece of content is "Postgres 13 Observability Updates". This is from Dataegret.com. They have taken their image showing where you can get insight into Postgres performance and updated it with the changes in Postgres 13. It points to the different subsystems of Postgres and the system views you can look at to get insight into each of them; for example, you can look at pg_stat_wal_receiver or pg_stat_replication. It lists all the different system views where you can get insight into the performance of Postgres, and they talk a little bit about the changes in 13. If you're interested in an image like this, definitely check out this post.

[00:12:16] The next piece of content is "Talking Postgres on Podcasts". This is from Cleverelephant.ca, and he references a couple of different podcasts he has mentioned, since he hasn't been attending conferences recently, so you can get more GIS content there.

[00:12:32] And the last piece of content is that the PostgreSQL person of the week is Greg Sabino Mullane. If you're interested in learning more about Greg and his contributions to Postgres, definitely check out this blog post.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 59

April 14, 2019 | 00:12:20

Popularity, Load Testing, Checksums, pg_hba | Scaling Postgres 59

In this episode of Scaling Postgres, we review articles covering Postgres' popularity, conducting load testing, checksums in PG12 and pg_hba configuration. To get the...

Episode 182

September 12, 2021 | 00:13:50

Boundless Text, Revoked Permissions, Index Bloat, Hardware Performance | Scaling Postgres 182

In this episode of Scaling Postgres, we discuss having boundless text fields, revoking public schema permissions, less index bloat in PG14 and comparing hardware...

Episode 290

November 12, 2023 | 00:14:22

Postgres Releases = Reindexing | Scaling Postgres 290

In this episode of Scaling Postgres, we discuss my Postgres Performance Starter Kit, new releases of Postgres, removing JIT, and a lightweight message queue....
