Ballad of Bloat, Auditing Options, Configuration Options, Need Vacuum? | Scaling Postgres 99

Episode 99 | February 03, 2020 | 00:15:11

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss the "Ballad of Bloat", options for auditing, a configuration help resource and determining when a vacuum is needed.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/99-ballad-of-bloat-auditing-options-configuration-options-need-vacuum/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about the Ballad of Bloat, auditing options, configuration options and needing vacuum. I'm Creston Jamison and this is Scaling Postgres, episode 99.

[00:00:21] All right, I hope you're having a great week. Next week will be episode 100, so I'm still trying to think through what I could perhaps do a little bit differently in that episode. If you have any suggestions, please leave them in the comments below or send me an email. Now, in terms of this week, we have a lot of posts and a lot of content to get through, so we'll go ahead and get started.

The first piece of content is just something fun and interesting. It's actually a song that someone put together, and it's on YouTube: Nothing Compares to VACUUM, the Ballad of Bloat, from the Warsaw PostgreSQL Users Group. It's a song using the melody of Nothing Compares 2 U by Sinéad O'Connor. So quite interesting; I definitely encourage you to check it out.

The next piece of content is Row change auditing options for PostgreSQL, and this is from cybertec-postgresql.com. So if you're wanting to audit row changes, that is inserts, updates, deletes or truncates, they go through a few different options for doing it. The first way is to change log_statement in the postgresql.conf file: you can set it to none, so don't log anything; ddl, to log just DDL statements where you're creating or manipulating objects; mod, for data modifications; or all. But it has a number of downsides, and they go through the pros and the cons here. The pros are that this is super easy to set up, it's one configuration variable to tune, and it starts giving you some auditing. The cons are that it catches only the top-level DML statement issued by the user, so stored procedures and things of that nature can bypass it; there's no row context for multi-row updates; no table or schema based filtering; the information is purely text based; and queries from failed transactions are also included.

The next step up they mention is the pgaudit extension. The pros of this are that you have quite granular logging and auditing options, and internally, dynamically generated SQL can also be logged. The cons are that it is a third-party extension, there's possibly a heavy disk I/O footprint, there's no row context for multi-row updates, the information is purely text based, and queries from failed transactions are also included.

The next option is to build your own custom audit tables and triggers for those tables to populate. This gives you the most flexibility over what you want to audit, but it's probably the most work, and they go through the different pros and cons for that.

Now, the next option is interesting, and I've used it myself: using one generic audit trigger and table for all audited tables. So you have just a generic log table, you create indexes on it, and you're basically using a JSONB field to store the row data and the changes that were made. He creates the different indexes necessary to do that, the trigger function defines what to store and how to store the rows based on the changes, and then you put this trigger after insert, update or delete on each table that you want to audit. Now, the pros of this are that there's less code to manage, and automatic attachment of audit trail triggers can easily be configured for new tables. The cons are that it is a bit more resource intensive to do this, and it says some exotic indexing (GIN) may be needed. I don't necessarily know if that's too much of a downside. SQL searching may become a bit more complex; again, working with JSONB fields takes a little more work, but it's not so bad. So I personally kind of like this approach.
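To give a feel for that generic JSONB trigger approach, here is a minimal sketch of the pattern. It is not the author's exact code; the table, function and trigger names, including my_table, are just placeholders:

-- One generic audit table for all audited tables.
CREATE TABLE generic_log (
    mtime      timestamptz NOT NULL DEFAULT now(),
    action     char(1)     NOT NULL CHECK (action IN ('I', 'U', 'D')),
    username   text        NOT NULL,
    table_name text        NOT NULL,
    row_data   jsonb       NOT NULL
);

CREATE INDEX ON generic_log USING brin (mtime);
CREATE INDEX ON generic_log USING gin (row_data);   -- the "exotic" GIN index

-- One trigger function shared by every audited table.
CREATE FUNCTION generic_log_trigger() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO generic_log (action, username, table_name, row_data)
        VALUES ('D', session_user, TG_TABLE_NAME, to_jsonb(OLD));
        RETURN OLD;
    ELSE
        INSERT INTO generic_log (action, username, table_name, row_data)
        VALUES (left(TG_OP, 1), session_user, TG_TABLE_NAME, to_jsonb(NEW));
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Attach it to each table you want audited (my_table is a placeholder);
-- on PostgreSQL 10 and older, use EXECUTE PROCEDURE instead of EXECUTE FUNCTION.
CREATE TRIGGER my_table_audit
    AFTER INSERT OR UPDATE OR DELETE ON my_table
    FOR EACH ROW EXECUTE FUNCTION generic_log_trigger();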
The next approach is doing logical replication, and there are two different ways. One way is PostgreSQL native: basically you set up a separate database server, create a publication on the primary, and then create a subscription on the server you're replicating to, in order to audit the changes that are made. The other way is logical replication with a custom application: you develop an application that supports logical decoding, and you essentially read the WAL stream and log whatever you want from it. And they come up with a good summary table that compares each of the options. So definitely a great blog post to consider if you're wanting to implement auditing, or perhaps change how you're auditing your current application.

[00:04:51] The next post is postgresql.conf configuration for humans, and this is a blog post from ongres.com talking about postgresqlco.nf, which they describe as your postgresql.conf documentation and ultimate recommendations source. So for all the different configuration parameters, they have documentation that mirrors some of what's in the actual PostgreSQL documentation but also gives some recommendations. So it's essentially another resource, and the link they're referring to is this one here. It has all the different configuration options; you can just click one or do a search, and it helps with finding the best configuration options to change out of the 300 or so that are available. For example, if you click autovacuum, it gives a discussion of what it is and some recommendations, they even offer the ability to add comments, and it gives general parameter info as well as a view of it by version. So if you want an additional resource to help you configure your PostgreSQL settings, definitely check this out.

The next post is The most neglected Postgres feature, and this is from Richard Yen. Basically he's talking about log_line_prefix: many times he's seen an empty log_line_prefix on older installations of Postgres, or just a simple timestamp or something similar that is not that descriptive. Now, more recent versions have a more comprehensive default, but if you've upgraded and haven't updated it, this is definitely something to consider doing, because you can add some of the items he lists as his favorite prefixes: the application name, the username, the database name (particularly if you have multiple databases on one database cluster), the remote host IP or name, the process ID, session and per-session log line numbers, as well as transaction IDs. So you may want to review this post if you want to add a little more descriptive information to your logs. And then at the end he says that's not all, and he covers some other logging settings that are fairly important and that he feels you should set. One of my favorites is log_min_duration_statement to help find slow queries, plus log_statement, log_connections and log_disconnections, log_rotation_age and log_rotation_size, log_autovacuum_min_duration, log_checkpoints, log_temp_files, and auto_explain. So if you want to update how your logging is set up for PostgreSQL, definitely a blog post to check out.
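As a quick illustration of that kind of logging setup, here is a sketch using ALTER SYSTEM; the specific values are assumptions of mine rather than the author's exact recommendations:

-- Illustrative values only; ALTER SYSTEM writes them to postgresql.auto.conf
-- and they take effect after a configuration reload.
ALTER SYSTEM SET log_line_prefix = '%m [%p] %q%u@%d app=%a ';  -- time, PID, user, database, application
ALTER SYSTEM SET log_min_duration_statement = '1s';            -- log statements slower than one second
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET log_autovacuum_min_duration = 0;               -- log every autovacuum run
ALTER SYSTEM SET log_temp_files = 0;                            -- log every temp file created
SELECT pg_reload_conf();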
[00:07:32] The next post is Which tables should be autovacuumed or autoanalyzed, and this is from depesz.com. [00:07:39] Basically, he had an issue where autovacuum wasn't keeping up with changes being made to the database, so he needed to do a manual vacuum to catch everything up. But he said he didn't have a great way to check which tables were waiting for autovacuum. So he developed a query that does this: it consults the different autovacuum settings to work out which table would be up next for an autovacuum. You can run the query on your own database to do exactly that, and he also created it as a view that can be used the same way, so you can know what should be vacuumed or analyzed next (a rough sketch of this kind of check appears after this section of the transcript). So if you think you could benefit from this, definitely check out this blog post.

[00:08:31] The next blog post, also from depesz, is Don't do these things in PostgreSQL. Now, this is based on a wiki page that was set up on what not to do, and we had talked about it last year when it was posted, or excuse me, the year before last. And I remember saying some of these were kind of opinionated, in that I might not do the same thing. But what's interesting is what he did here: he took all of these don't-do items and set up a SQL script that looks in the database to see how things are set and what's being done, so it can actually look for violations of these don't-do rules. So if you're interested in that, check out the post to get access to the SQL script to run on your own database.

[00:09:21] The next post is pg_ctl Tips and Tricks. This is from pgdash.io, and they're talking about pg_ctl, essentially Postgres control, for managing your database cluster. They cover where this command is typically located on Debian and Ubuntu as well as Red Hat and CentOS, and they go over the process of creating a database cluster, how to start a database cluster, and the different ports and options you can set at the point at which you start it. So if you're interested in learning more about pg_ctl, definitely check out this blog post.

[00:10:00] The next post is Debian integration of Patroni and vip-manager. So they've set up vip-manager in Debian. This is a virtual IP manager that helps you manage a cluster of PostgreSQL servers, one acting as the primary and one or more replicas, working with Patroni. The virtual IP manager essentially helps direct write traffic to the active primary. So if you're interested in Patroni and vip-manager for managing a set of PostgreSQL servers on Debian, or perhaps Ubuntu, definitely check out this blog post.

[00:10:41] The next post is Scheduled PostgreSQL backups and retention policies with Kubernetes, and this is from crunchydata.com. They have developed the PostgreSQL Operator, which helps manage Postgres instances within Kubernetes, and they've added the ability to use pgBackRest for backups and to schedule retention policies with it, all within Kubernetes. So if you're interested in using that, definitely check out this blog post.
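As referenced above, here is a rough sketch of the kind of check the depesz post builds. This is not his actual query or view; it is a simplified version that only uses the global autovacuum settings and ignores per-table storage parameters:

-- Tables whose dead tuples exceed the global autovacuum trigger point.
SELECT s.schemaname,
       s.relname,
       s.n_dead_tup,
       round(current_setting('autovacuum_vacuum_threshold')::float8
             + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples)
           AS vacuum_trigger_point
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.n_dead_tup > current_setting('autovacuum_vacuum_threshold')::float8
                     + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples
ORDER BY s.n_dead_tup DESC;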
[00:11:12] The next blog post, also from crunchydata.com, is Migrating from Oracle to PostgreSQL: Tips and Tricks. They talk about solutions to different issues you may run into if you're migrating from Oracle to Postgres. They mention orafce, which includes Postgres versions of functions you're familiar with using in Oracle that work similarly, and they tell you how to install it and create the extension to be able to do that. They talk about how to handle it if in Oracle you do things like disable and enable constraints, or disable not-null constraints, and how to handle that in Postgres; how the GRANT command is similar but a little bit different; how best to drop objects in PostgreSQL versus Oracle; how to check for not null; and they discuss ROWID, ctid and identity columns. So if you are migrating from Oracle to PostgreSQL, definitely a blog post to check out.

[00:12:12] The next post is Barman Cloud Part 1: WAL Archive, and this is from 2ndquadrant.com. As of Barman 2.10, there's a way to archive your WAL files to a cloud location, for example Amazon S3, and that's what this post discusses with barman-cloud-wal-archive. Essentially, in the archive_command you configure in Postgres, there's a Barman program you can call that will send the WAL files to S3, and part two will cover barman-cloud-backup. So if you use Barman and want to start archiving directly to a cloud location such as Amazon S3, definitely check out this blog post.

[00:12:55] The next post is pg_timetable: advanced PostgreSQL scheduling, and this is from cybertec-postgresql.com. Now, it's interesting they call it pg_timetable, because as far as I can see this is not an extension for Postgres; it's a separate application written in Go, and it looks like it's typically delivered as a container. I guess I'm used to a pg_ prefix meaning an extension for Postgres, but this is actually a separate application that does scheduling of tasks. It uses PostgreSQL essentially as the data source, but because it is its own application, you can do all sorts of things outside of Postgres, like send emails, download files, et cetera. It looks like you do all of the configuration in SQL, so it's definitely an interesting project, and it looks like they have a cron style you can use as well. I'm going to be sticking with cron myself, I think, but if you're interested or looking for another scheduling option, maybe check out this blog post.

The next post is Understanding prepared transactions and handling the orphans, and this is from highgo.ca. Prepared transactions generally have to do with a two-phase commit process, so you normally don't use them in everyday application development, and I haven't really consulted this. But if you are considering distributed transaction management with things like two-phase commits, maybe you want to check out this blog post (a small illustration follows the transcript).

[00:14:30] And the last post is Developing PostgreSQL for Windows, Part 1, and this is from 2ndquadrant.com. So if you are doing development of PostgreSQL itself and want to do it on Windows, here's a blog post that gives you some tips and tricks to help you do that.

[00:14:47] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
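Following up on the prepared transactions post mentioned in the transcript, here is a small sketch of what two-phase commit looks like in SQL and how orphans show up. The accounts table is hypothetical, and max_prepared_transactions must be set above zero for PREPARE TRANSACTION to work:

-- Phase one: do the work, then make it durable under a global identifier.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
PREPARE TRANSACTION 'txn_demo';

-- An orphaned prepared transaction keeps holding its locks and blocks vacuum,
-- so it's worth checking for old entries:
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- Phase two: resolve it one way or the other.
COMMIT PREPARED 'txn_demo';
-- or: ROLLBACK PREPARED 'txn_demo';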

Other Episodes

Episode 207

March 20, 2022 00:17:54
Episode Cover

Autovacuum Configuration, Duplicate Key Violations, Pipelining, Tuning max_wal_size | Scaling Postgres 207

In this episode of Scaling Postgres, we discuss how best to configure autovacuum, the dangers of duplicate key violations, how to set up pipelining,...

Listen

Episode 62

May 05, 2019 00:15:39
Episode Cover

Index Include Clause, Exporting CSV, JSON, Zedstore | Scaling Postgres 62

In this episode of Scaling Postgres, we review articles covering the index include clause, exporting data to CSV, using JSON in Postgres and Zedstore....

Listen

Episode 136

October 19, 2020 00:13:08
Episode Cover

Stat Tracking, Temporary Tables, pg_squeeze, pg_auto_failover | Scaling Postgres 136

In this episode of Scaling Postgres, we discuss stat tracking, temporary tables, a new utility called pg_squeeze and an update to pg_auto_failover. To get...

Listen