Postgres 13 Released, What's New In Postgres 13, Debugging PL/PGSQL, Monitoring | Scaling Postgres 133

Episode 133 September 27, 2020 00:13:53

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss the release of Postgres 13, what's new in Postgres 13, how to debug PL/pgSQL, and Postgres monitoring best practices.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/133-postgres-13-released-what-is-new-in-postgres-13-debugging-plpgsql-monitoring/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about Postgres 13 released, what's new in Postgres 13, debugging PL/pgSQL, and Postgres monitoring. I'm Creston Jamison and this is Scaling Postgres, episode 133. [00:00:19] Alright, I hope you, your family, friends, and coworkers continue to do well. Our first piece of content is PostgreSQL 13 Released, and this is from the postgresql.org website. So it was released on the 24th, and this news announcement goes over some of the new features, such as deduplication in B-tree indexes; an extended statistics system that better handles statistics dependencies between columns for OR clauses and ANY/ALL lookups; improvements to the hash aggregate functionality that allow large aggregates and grouping sets to spill to disk as opposed to being done only in memory; partitioned table enhancements with regard to improved pruning, as well as being able to do more partition-wise joins; parallelized vacuum for indexes (again, not for autovacuum, but when you manually run VACUUM); and replication slots that let you define how much WAL to retain so you don't run out of disk space on your primary. Then they mention some application development conveniences, such as the datetime function in SQL/JSON path support, the included gen_random_uuid function so you don't need an extension, partitioned tables supporting logical replication as well as BEFORE row-level triggers, and FETCH FIRST WITH TIES as well, in addition to some security enhancements: trusted extensions, being able to require channel binding with SCRAM authentication, and additional features such as that. So I encourage you to check out this post as well as the release notes linked here, which of course go into detail on all the different features that have been released with PostgreSQL 13. Now, there are a number of other posts I'm going to follow up with that cover, essentially, what's new in PostgreSQL 13.
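A few of the conveniences covered in the announcement can be sketched in SQL. This is a minimal sketch against Postgres 13; the players table, its score column, and the 10GB value are hypothetical:

```sql
-- gen_random_uuid() is now built in, no pgcrypto extension needed
SELECT gen_random_uuid();

-- FETCH FIRST ... WITH TIES: return the top 3 scores, plus any rows tied with the 3rd
SELECT *
FROM players
ORDER BY score DESC
FETCH FIRST 3 ROWS WITH TIES;

-- A manual vacuum can now process a table's indexes in parallel
VACUUM (PARALLEL 4) players;

-- Cap how much WAL a replication slot may retain, so the primary can't fill its disk
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
```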
The first one is Lessons Learned From Running Postgres 13: Better Performance, Monitoring & More. So they ran this in a staging environment on the beta and, I believe, the release candidate. With regard to index deduplication, they found about a threefold improvement in space savings, which again could be a huge win for performance, because smaller indexes have a higher probability of being cached and, being smaller, are much easier to search through. Then they talked about the extended statistics improvements and some of the benefits 13 brings with that. They talk about the parallel vacuum and, again, how this doesn't apply to autovacuum, but it does work when you run a manual VACUUM, because they assume you want maximum performance when running that vacuum. Although I wonder if they're going to make autovacuum support parallel vacuum in the next version, or maybe the version after. They talked about the incremental sorting improvement, where if you presort something and then limit by it, or do another sort in sub-queries, the planner will detect it's already been sorted and won't try to re-sort the whole thing again. So it supports incremental sorting to give you better performance. They mentioned some of the monitoring changes, where you can actually track WAL usage to see how much WAL a given query is using. So, for example, here they did a DELETE FROM users and asked for the WAL, and you can see how many WAL records were generated, how many were full-page images, and the number of bytes involved. Then, in particular, pg_stat_statements can track this information as well. So now you can get insight into what's writing all of these WAL records, which is something I've been asked about on occasion, and with Postgres 13 it should be much easier to find out this information. Then they talk about the sampling statement logging that is supported now.
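The WAL tracking described above can be sketched like this; the users table and WHERE clause are hypothetical, and the output line shown in the comment is only the general shape of what EXPLAIN prints:

```sql
-- Per-query WAL usage: the new WAL option to EXPLAIN in Postgres 13
EXPLAIN (ANALYZE, WAL) DELETE FROM users WHERE id < 1000;
-- The plan output gains a line like: WAL: records=... fpi=... bytes=...

-- pg_stat_statements now tracks WAL per statement as well,
-- so you can find the biggest WAL writers across the whole server
SELECT query, wal_records, wal_fpi, wal_bytes
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 10;
```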
So you can take a sample rather than logging all slow statements, and it also allows logging parameters for failed statements so you can get more insight into why a statement failed. And then when you're analyzing, you can also include buffers to see, for example, what has been hit with regard to shared buffers, and this is reflected in pg_stat_statements as well. And then they cover the inclusion of gen_random_uuid support and some psql improvements. So again, another great post from pganalyze.com talking about the new features in Postgres 13. [00:04:40] The next one is a post on what's new in PostgreSQL 13 from enterprisedb.com. They talk about the faster vacuum, due to being able to do the indexes in parallel. They talk about the security features avoiding man-in-the-middle attacks now that the client can require channel binding with SCRAM authentication; being able to reduce errors in production with a catalog check utility you can use to check your system catalogs and make sure everything is good; and being able to verify backups with the newly included pg_verifybackup utility, so you can verify you have all the information required to restore a full backup. Going back to joins of partitioned tables, these can happen more frequently now, as can logical replication for partitioned tables. Then they also mention the deduplication of B-tree indexes as well as other security enhancements. [00:05:33] So the next post also covering new features is New Features in PostgreSQL 13: Full Webinar Video, and there's a full webinar video, I included the video right here. It's on YouTube, and it's about an hour and 18 minutes in length.
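The statement sampling and failed-statement parameter logging mentioned above are controlled by server parameters in Postgres 13; a sketch with illustrative values:

```sql
-- Log only a fraction of statements that exceed this duration threshold
ALTER SYSTEM SET log_min_duration_sample = '250ms';
ALTER SYSTEM SET log_statement_sample_rate = 0.1;  -- sample 10% of those

-- Log (truncated) bind parameter values for statements that error out
ALTER SYSTEM SET log_parameter_max_length_on_error = 1024;

-- Apply the changes without a restart
SELECT pg_reload_conf();
```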
And it covers things like the partitioning enhancements, B-tree deduplication, incremental sorting, hash aggregation enhancements, the backup manifest that enables backup verification, trusted extensions, replication slot size limits to avoid running out of disk space on your primary, the parallel vacuum capabilities, security enhancements, as well as general improvements for large databases. So a lot of content related to all the new features in Postgres 13 that you can check out and analyze to see which ones you want to use when you decide to upgrade to Postgres 13. [00:06:22] The next piece of content is Debugging PL/pgSQL: GET STACKED DIAGNOSTICS. This is from cybertec-postgresql.com, and they're talking about a function, which they called a broken function, that does a divide by zero, which results in an error. And this is called from within another function, but you can't really know why it crashed, and you don't get a backtrace or a debug of the code. And he says one way you can do this is to create another wrapper function, although you can name it whatever you want, that runs the function of interest and then has an exception handler where you do GET STACKED DIAGNOSTICS and print out information related to it. So when you wrap your function in something like he has done here, you get this type of output, and you can see what the message was (it was a division by zero), the context related to that division by zero, and then what line numbers, in terms of the stack trace, things occurred on. So this is a great way to help you debug your PL/pgSQL code. So if you're interested in that, check out this post. [00:07:29] The next piece of content is actually a YouTube video, Monitoring PostgreSQL, a PostgreSQL talk. This is from the Percona YouTube channel, and it's given by Jason Yee at Datadog. So they talk about different aspects of monitoring your PostgreSQL installation, in addition to iterating through some things you should be monitoring.
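The GET STACKED DIAGNOSTICS wrapper pattern from the debugging post above looks roughly like this; the function names here are placeholders, not the ones from the post:

```sql
CREATE FUNCTION broken_function() RETURNS int AS $$
BEGIN
  RETURN 1 / 0;  -- raises division_by_zero
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION debug_wrapper() RETURNS int AS $$
DECLARE
  v_state   text;
  v_msg     text;
  v_context text;
BEGIN
  RETURN broken_function();
EXCEPTION WHEN OTHERS THEN
  -- Pull the error details and the PL/pgSQL call stack out of the exception
  GET STACKED DIAGNOSTICS
    v_state   = RETURNED_SQLSTATE,
    v_msg     = MESSAGE_TEXT,
    v_context = PG_EXCEPTION_CONTEXT;
  RAISE NOTICE 'state: %, message: %, context: %', v_state, v_msg, v_context;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```

Calling debug_wrapper() then reports the SQLSTATE, the division-by-zero message, and the line numbers in the stack where the error occurred.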
It also talks about best practices with regard to doing it. So if you're interested in learning more about monitoring, you can check out this YouTube video. The next piece of content is Simple Anomaly Detection Using Plain SQL. This is from hakibenita.com, and he's talking about a technique he used to detect anomalies in a set of log entries. Like, they were looking at status codes and wanted to be, say, alerted on them. So he goes through the process of doing that. First, how does he detect anomalies? He looks at the data and then does a calculation in SQL to get a standard deviation, and from the standard deviation he then gets a z-score. A z-score tells you how many standard deviations each data point is from the mean. So you can see in this post, looking at the z-scores, the only one that is more than one standard deviation from the mean is this twelve value, so clearly it is the outlier. So essentially he's using "outlier" as a term for an anomaly. And then he goes into analyzing the server log and the different status codes, and he goes through the process of getting those standard deviations and calculating a z-score from them so that you can then identify the anomalous status codes. And then the question becomes: what z-score threshold should you use? Should you use one standard deviation from the mean? Two? Three? Four? What makes sense? And he advocates backtesting. So back-test previous log information that you've been collecting to determine what you're seeing, what you want to alert on, and what you do or don't call an anomaly. And he goes through the process of determining that for his set of data, and how to eliminate, essentially, repeat alerts. Then he goes into increasing accuracy through weighted means, potentially using a median or the median absolute deviation. So this does involve a little bit of statistics, but also a lot of SQL code in terms of doing this type of anomaly detection.
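The z-score calculation he describes can be sketched in plain SQL; the entries table, its value column, and the threshold of 1 are illustrative, not taken from his post:

```sql
-- Compute the mean and standard deviation once, then flag outliers by z-score
WITH stats AS (
  SELECT avg(value)    AS mean,
         stddev(value) AS sd
  FROM entries
)
SELECT e.value,
       (e.value - s.mean) / s.sd AS z_score
FROM entries e, stats s
WHERE abs((e.value - s.mean) / s.sd) > 1;  -- points more than 1 stddev from the mean
```

Raising the threshold in the WHERE clause (to 2, 3, or more standard deviations) makes the alert less sensitive, which is exactly the parameter he suggests tuning via backtesting.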
So if you're interested in that, you can check out this post. [00:09:51] The next piece of content is In-Memory Table with Pluggable Storage API. This is from highgo.ca, and this is a project they're developing, which I assume might become an extension to Postgres or a pluggable storage solution, that only uses memory to store tables and information. And they basically want the buffer to become that memory storage mechanism, or at least to intercept its writes to disk and store them in this memory structure they're creating. So they've gone through the process of developing an initial prototype, and for 20 million rows they're seeing a 50% performance increase for inserts, a 70% increase for updates, a 60% increase for deletes, and a 30% increase in vacuum operations. So that's a pretty good improvement, but it's something they're still working on. If you're interested in learning more, check out this post. [00:10:42] The next piece of content is Announcing Crunchy Bridge: A Modern Postgres as a Service. So crunchydata.com has released a service called Crunchy Bridge, a Postgres-as-a-service offering that provides production-ready Postgres out of the box, with automated backups, high availability, point-in-time recovery, and encryption in transit and at rest. This is supported on AWS and Azure right now, and they can basically work between different clouds, if that's something of interest to you, or with on-premise solutions. So if you're interested in a managed solution that is cloud agnostic, maybe you want to check out this offering from Crunchy Data. [00:11:27] The next piece of content is Postgres Data Types in PL/Python, and they're talking about translating Python data types to Postgres data types used in PL/Python. So if you're interested in that, you can check out this post from Crunchy Data. [00:11:42] The next piece of content is Using Postgres and pgRouting to Explore the Smooth Waves of Yacht Rock.
And basically, pgRouting is typically used for geographic information system calculations, but you can use it when you have any sort of interrelationships between pieces of data, so anything that defines relationships. So you can look at one artist here at one hop, and then go to two hops, three hops, four hops, and they're using the pgRouting extension to do that. So if you're interested in doing this type of analysis, check out this post from crunchydata.com. [00:12:16] The next piece of content is Diary of an Engineer: Delivering 40x Faster Percentiles Using Postgres, Citus, and t-digest. This is from citusdata.com. Now, this particular post is talking about a use case with Citus, which is an extension for Postgres that enables you to scale out Postgres. But they're talking about an extension called tdigest for doing estimates of percentile calculations, and how with it they were able to achieve a 40x improvement in performance. So if you calculate these types of percentiles, maybe you want to check out this post. [00:12:52] The next piece of content is How We Are Building a Self-Sustaining Open-Source Business in the Cloud Era, version two. So this is from timescale.com, and it's for TimescaleDB, an extension that does time-series data in Postgres, and they've basically altered their license to make it more applicable to more use cases. So if you're interested in a time-series database using Postgres, maybe you want to check out this post to see what they're planning. [00:13:18] And the last piece of content is: the PostgreSQL Person of the Week is Dan Langille. So if you're interested in learning more about Dan and his contributions to Postgres, definitely check out this blog post. That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes.
Thanks.
