Modeling, Query Performance, Statistics, pgmetrics

[00:00:00] In this episode of Scaling Postgres, we talk about modeling, query performance statistics, PG metrics, and high availability. I'm creston. Jameson. And this is scaling postgres episode three. [00:00:21] All right, we're going to start things off a little bit different. I'm going to prioritize content that is more applicable to developers first, and then follow up with the other articles that would be of general interest to someone looking to scale postgres. So the first article we have is database modelization antipatterns. This is from the Tap Oueh.org blog. [00:00:48] So in this article, he discusses three antipatterns that he finds when people are modeling their application. [00:01:00] The first is Entity attribute Values, and how you should pretty much not use this pattern. The second is having multiple values per column. So the example he uses here is you have a Tags field that you contain in a text field with some sort of delimiter. Here's a semicolon, for example. And these are, for example, Tweets and how this is an anti pattern. And if you use normalization, there's a better way to structure this data with a Tags table and a Tweets to Tags linking table. And the third is looking at is actually UUIDs. So clearly he's not a fan of UUIDs. I mean, I generally tend to use sequences. However, UUIDs do become interesting when you need something that's globally unique, like you're having data coming in from multiple, say, postgres databases, that you need to do something to ensure uniqueness. So why I chose this article is that I think this is some education that would be of benefit to a lot of developers because really, it's the modeling that can really impact performance. And if you're looking to scale your database, it's important to get the modeling right as soon as you can. [00:02:21] So definitely encourage you to check out this article. The next article is from the CyberTech blog and the title is Detecting Performance Problems Easily in PostgreSQL. [00:02:33] So right from the get go, he says, improving PostgreSQL performance beyond parameter tuning. And that goes back to what I just mentioned. That a lot of the performance gains. If you're looking to scale your database, getting the modeling right, getting the indexing right, that can give you amazing performance gains, 100 fold, 1000 fold, potentially, compared to trying to tune the database better, there's a place for it. But generally, query optimization can get you bigger wins. So in this article, he goes over how to look for slow query statements in terms of enabling PG Stat statements and gives you some queries you can look at to be able to pull out those queries that are slow in the database using some of these PG Stats statements. [00:03:29] So, definitely an article to check out if you're unfamiliar with using PG Stat statements. [00:03:36] The next article is from the Citize Data blog, and the title is the postgres Ten feature you didn't know about. Create Statistics. [00:03:45] So this goes over, of course, a new feature in Postgres Ten where historically the Statistics Planner only operated on single columns, and that's pretty much what it does by default. However, it has introduced two additional statistics where you can define relationships between columns. So it's not just looking at single columns, but you can define that there are relationships that exist between these two columns to get better statistics. So it offers two different ways to define these related statistics. The first way is to define that there are dependencies between two columns, and near the bottom of the article they mention some real world implications of these types of dependencies. So for example, here's a quote having columns for month, quarter and year because you want to show statistics grouped by all in reports, well, there's a relationship between month, quarter and year. Also relationships between geographic hierarchies, so a country estate city columns and filtering grouping by them. [00:04:58] There are only certain cities in a state, there are only certain states in a country or cities in a country. So by creating these statistics, you give the optimizer more accurate information to be able to come up with the best query plan. And they also mentioned the other type of statistic you can create is an indistinct, which is a distinct count between the columns. So there are certain use cases where this could be beneficial. And if you think your data could benefit from it, I definitely suggest checking out this blog post on it. The next article is from the ops.com blog Announcing Pgmetrics so this appears to be a command line tool that enables you to retrieve from the system tables of postgres all sorts of statistics information and export it via JSON or just get human readable text. So as opposed to opening a database connection and querying certain tables, they've designed this utility to be able to pull out this information. And it covers replication status, replication slots, standby replication Status while Archiving BG, Writer Backends, Vacuum Progress, Table Spaces, databases all the sort of information to help you get a gauge on how your database is performing. So if you're looking for some different ways to monitor your database, you might want to check out this project. [00:06:34] So this is the Announcing blog post, but there's also Pgmetrics IO, which is a website dedicated to this project. The next article is from the several nines.com blog and the title is Top PG Clustering. Ha Solutions for Postgres so Ha means High Availability, and this just goes over a broad overview of all the High Availability options that are available in postgres. Some built in, some you need separate products for so standby databases clustered using DRBD, et cetera. So if you're considering a High Availability solution, this gives a broad overview of different options that are available. [00:07:21] The next article is from the second quadrant blog PostgreSQL meltdown Benchmarks so, with the recent vulnerabilities that have been discovered in CPU speculative execution, meltdown Inspector, there's been a lot of concerns about how will this impact performance of servers? And they went in and did some performance testing with PostgreSQL, as well as looking at it with different patches. So they have these graphs that define the different performance impacts for both online transaction processing loads and online analytical processing loads. And it's not an insignificant impact in the conclusion here, but definitely something to keep in mind that this will have an impact on performance as you install these patches. Potentially, maybe they'll get better in the future. [00:08:18] And the last article is actually a tutorial I [email protected] and it's PostgreSQL Backup and Point in Time Recovery. So if you're at the getting started stages of setting up your backup and a point in time recovery plan, I encourage you to go ahead and check it out. That does it. For episode three of Scaling Postgres. You can get the links to all the content presented in the show. Notes be sure to head over to Scalingpostgres.com to receive weekly notifications of these episodes. You can also subscribe via YouTube or itunes. Thanks.

Show Notes

Episode Transcript

Other Episodes

Episode 173

Delayed Replication, Materialized View Permissons, Paranoid Postgres, Memory Overcommit | Scaling Postgres 173

Episode 367

How Often Does That Query Run? | Scaling Postgres 367

Episode 75

Combining Indexes, Regular Expressions, Parallelism, PostGIS | Scaling Postgres 75

Modeling, Query Performance, Statistics, pgmetrics | Scaling Postgres 3