Slow Queries, pg_auto_failover, Roles, Hyperscale

[00:00:00] In this episode of Scaling Postgres, we talk about slow queries, PG auto failover roles and hyperscale. I'm Kristen Jameson, and this is scaling postgres episode 63. [00:00:21] All right, I hope everyone's having a great week. So our first piece of content this week is how to identify slow queries in PostgreSQL. And this is from Dev two pythonmeister. And this is a very short post, three minute read, but it tells you exactly what you need to do to be able to find slow queries. [00:00:42] Number one thing he mentions here is adjust your long men duration statement in your PostgreSQL configuration to be some time period. It could be 500 milliseconds a second. He suggests 5 seconds here so that in your logs you see your slow queries. And of course, he says you need to reload your configuration after you do that and even shows you how to test it by doing using a PG sleep. So if you're unfamiliar with some of these functions, you can see them right here. And then the next thing he says to do is to use PG Stat statements. So this is a extension that you put in the shared preload libraries configuration of PostgreSQL. Then you create an extension in the database that you're interested in and it tracks all of these statistics for queries that are running. And he gives some example output of what you can look at to get a good sense of all the queries that are running their average time frame for you to identify those slow queries. And then he also does, which I haven't seen a lot of blog posts do, is mention that you can because these are aggregate the PG Stat statements, you can reset them and this is the function to reset them when you feel as if you need to on a periodic basis or if you're diagnosing some issue. So very short, but a very good post telling you exactly how to find your slow queries. [00:02:01] The next post is introducing PG auto failover, open source extension for automated failover and high availability in PostgreSQL. Now this is actually on the Microsoft.com blog and they've developed this open source tool that enables you to do automated failover. Now I'm going to look more into this, but it's pretty interesting what they're mentioning here. Now there's other tool sets that do that, but I'd be interested to see how this works. And basically this tool allows you to set up, for example, a primary and a secondary. And it says it does, it automatically sets up streaming between the primary and the secondary. And it institutes a monitor that does state checks for each of the databases to make sure each one is up. And this is actually on the Citus data GitHub archive. So the first thing they do is they install the packages, assume the user postgres, and they create the monitor that's going to be monitoring everything. Then they in this example created a small disks for their first node, node A. And they created a database again using this PG Auto control command and they defined the monitor for this particular database that they created. So this would be their primary basically. And then they say they set up to run a Keeper here, which the Keeper is what runs on the database to monitor that it is up and active. And then they create their second node and run the Keeper on the second node. But in this case this is all happening in one instance. In their example in here you can look at, they say a watch replication with this command here and you can add data to the primary and then you can see it on the secondary. And then they caused a failover by filling up the disk essentially in their node one or node A. And they say after a few failed attempts, its Keeper signals that the node is unhealthy and the node is put into the demoted state and the monitor promotes node B to be the new primary. And that node B is not considered in full primary state since there is no secondary present. It is marked as wait primary until a secondary appears. And the other interesting thing they list here is something that was done with Postgres ten is that the psql client and others based upon Lidpq, you can set up more than one connection to a database and define which one is read write. So they're using this ability to be able to seamlessly switch between node A or node B. So this is a potentially very interesting tool and I'm actually going to be checking this out and looking at it. One thing that came to my mind is PG rewind, would PG rewind work in a scenario like this or is that a feature that's being added? So I'm definitely going to look into that, but definitely pretty interesting, definitely encourage you to check it out. [00:04:56] The next post is PostgreSQL using Create user with caution. And this is again another very short post, but it's talking about distinguishing, they say the golden rule or distinguishing between users and roles. And basically PostgreSQL just has the term of role. You can make it a user or you could essentially make it a role and have other roles inside of it or users. But the issue they say that you can run into is that when you assign a user so let's say you created a table A and you assign the user Joe to it. So Grant select on a to joe if you need to then drop that user, you're going to get an error that says error role joe can't be dropped because some objects depend on it. In other words, there's privileges on the table A and that there's no such thing as drop user cascade. And the reason being is that these users are created at the instance level, not on each database. So how you get around this is you actually assign roles to objects and then put essentially users in those roles. So, for example, they use cleaning staff, for example, they have a role cleaning staff with no login. So it's not meant to be a user, it's meant to be a role that others are placed into, for example, and you grant the select on that role. And then you can grant or add Joe to the cleaning staff role using this command. Grant cleaning staff that role to Joe. And then lastly, they also mentioned they also have a utility that they developed called PG Permission that allows you to look at all the permissions that exist on your system. Now, related to this post, there's also a second one that did a follow up called The Role of a Role within Another Role. And this is from Luca 1978 GitHub IO, or I should say F Luca, because it's Luca of Ferrari and he's talking about the know roles. Are they users or groups? And really they are both users and groups. And a role is an abstraction for saying it's a collections of permissions to do some stuff. So basically it's a collection of permissions. [00:07:03] A role can be logged into or cannot be logged into. And this goes into more depth about inheritance and non inheritance and how things work. So he goes over a lot of different examples and mentions how PostgreSQL by default now actually grants inherit to users added to a role so they have the permissions of that role. But you can also do no inherit as well if that's something you desire. So if you're wanting to get more in depth knowledge about roles and how they work with users, definitely two blog posts to check out. And I should mention the first blog post about roles is actually from CyberTech postgresql.com. [00:07:43] The next post is introducing Hyperscale Citus on Azure database for PostgreSQL. So this is a post on the Citusdata.com blog. And a number of months ago, Microsoft acquired Citis data. So this looks like one of the first big things they're introducing as a result of that purchase is that they've introduced a new product called Hyperscale Citus that's part of their Azure database platform and essentially takes Cytus and makes it more convenient easy to use with Azure. So this was just basically an announcement post to that fact. I'm talking a little bit about how Citus works. It allows scale out of hundreds or more nodes of PostgreSQL and do scale out and execute queries in parallel. Now they go into more depth about that in this YouTube video. That's called building PostgreSQL apps at any scale with hyperscale citus. And this is on the Microsoft Developer Channel. Now, this is actually a variation that Cytus has done presentations on before called PostgreSQL at any scale. So this is a variation of that talk. But they're incorporating it as the final step using Hyperscale with Cytus so it talks a little bit about Cytus and how it does scale out. It talks about the Hyperscale offering that's part of Azure, as well as some additional features and things they've built into it with just a little bit of general advice on how to get started with PostgreSQL. So if you're interested in learning more, this is a presentation to check out. There's also a shorter video that's only about four minutes in length, also related to Hyperscale and Citus called Unleash Analytics on Operational Data with Hyperscale. Citus on Azure database for PostgreSQL. So again, another piece of content if you're interested in checking out these new offerings. [00:09:40] The next post is PostgreSQL version twelve new feature optimizer support for functions. So this is from Cybertechn Postgresql.com and they're talking about functions as black boxes, meaning the optimizer. They say, quote here, that PostgreSQL optimizer can't really do a lot about functions, so it can't understand basically what they're doing. And for example, if you do an Explain select all from unnest an array, it can't look inside the array to see how big it is. And it's just the planner is just giving you an arbitrary amount of 100 here. But version twelve has added support for this support function. Now, I believe, if I'm interpreting this correctly, is that you actually have to use some C code to kind of do this function. So at this stage it seems like this may be of benefit for internal usage, like internal functions for PostgreSQL. I mean, you could clearly probably build your own, but it looks like some of the advantages could be done with the internals tool. So for example, that same example we just looked at in version twelve, you actually get the correct estimate. So again, we have three units in a row doing an unnest and you can see the estimate is three rows. So being able these support functions to be able to look inside of what functions are doing from the planner's perspective could be of further benefit for performance for version twelve and moving armed from there as different functions are looked at and optimized. So if you're interested in that, definitely a blog post to check out. [00:11:16] The last post is PG backrest. A great backup solution and a wonderful year of growth. So this is talking about essentially PG Backrest backup tool for PostgreSQL. They go over the installation, how you from packages or from source if you want to, how to configure it, do a backup and then do a restore of that backup. So if you're wanting to get started with PG Backrest, definitely a blog post to check out. [00:11:43] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show Notes. Be sure to head over to Scalingpostgres.com where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or itunes. Thanks.

Show Notes

Episode Transcript

Other Episodes

Episode 104

Reducing WAL, What is Null?, Understanding SQL, TPC-H Benchmarking | Scaling Postgres 104

Episode 273

Debian / Ubuntu Packaging, ivfflat Indexes, In vs Any, View Logging | Scaling Postgres 273

Episode 85

Recovery Configuration, Alter System, Transaction Isolation, Temp Table Vacuum | Scaling Postgres 85

Slow Queries, pg_auto_failover, Roles, Hyperscale | Scaling Postgres 63