Episode Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about pg_show_plans, WAL monitoring, using UUIDs, and default logging. I'm Creston Jamison, and this is Scaling Postgres episode 109.
[00:00:22] Alright, given the unusual times that we live in, I hope you, your family and coworkers are safe and doing well. Our first piece of content is a detailed look on the new PostgreSQL troubleshooting extension, pg_show_plans. And this is from cybertec-postgresql.com. And we had discussed this before. This new extension called pg_show_plans allows you to see what plans live queries are using. So this goes into a little bit more detail about the extension. Basically, the idea is kind of like pg_stat_activity, but for each of those queries you can see what the plan is. And it says, what does this do exactly? Quote: "it copies query execution plan info from the private session backend context to shared memory context when the query starts", so that it is accessible to other users. Now, in terms of the installation, you can install from source code, but they also look to be offering RPM and DEB packages. And it works from 9.1 up to 13 beta. Now of course, once you have it installed, you do need to add pg_show_plans to shared_preload_libraries. And it does have a few different configuration options that you can set for it. And of course, it will require a restart of your database. So here they show an example of it in action. They've set up pgbench here, they've created the extension and then run the query to be able to look at the output. And here you can see the plan for this query. So the intent is, if you have really long running queries, you could use this to see, okay, what plan has this query taken? Why is this particular query taking so long? Now on the previous announcement they mentioned something about performance. And here they're saying that for very short parallel statements, say less than 0.1 milliseconds, there's a penalty of about 20%, which is not insignificant. But for normal, more slowish real-life queries, the performance hit was small enough to be ignored. So based upon this, it looks like your mileage may vary. But it also looks like they have a function that allows you to disable or enable pg_show_plans. So perhaps you could have the extension installed and then dynamically enable it or disable it, depending on whether you have problems you need to investigate. So again, some more details on this new extension. And if you're wanting to learn more, definitely check out this blog post.
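As a rough illustration of what that workflow can look like, here is a minimal sketch. The view and the enable/disable function names follow the extension as described in the post, so treat the exact names as assumptions rather than a definitive reference.

```sql
-- Sketch: inspecting live query plans with pg_show_plans.
-- Assumes shared_preload_libraries = 'pg_show_plans' has been set
-- in postgresql.conf and the server restarted.

CREATE EXTENSION pg_show_plans;

-- Join the live plans with pg_stat_activity to see each backend's
-- query text alongside the plan it is currently executing.
SELECT a.pid, a.query, p.plan
FROM pg_show_plans p
JOIN pg_stat_activity a ON a.pid = p.pid
WHERE a.pid <> pg_backend_pid();

-- Toggle plan collection dynamically if the overhead is a concern
-- (function names as described by the extension's documentation).
SELECT pg_show_plans_disable();
SELECT pg_show_plans_enable();
```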
[00:02:46] The next piece of content is New in PG 13: WAL Monitoring. And this is from rjuju.github.io. So basically in PG 13 it looks like they are starting to track WAL generation statistics. It says, quote: "each backend will track various information about WAL generation", such as the number of WAL records, the size of WAL generated, and the number of full page images generated. Now this is great because I've had to do some analysis of WAL files to determine essentially why they were so large, and being able to track this per backend is a huge benefit. And then you can do per-query WAL activity with pg_stat_statements. So essentially some of these columns will be added to pg_stat_statements. So this is huge. This is a great feature I'm looking forward to. And he goes through how some of it is implemented. So if you're wanting to learn more about that, definitely check out this blog post.
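To make that concrete, here is a small sketch of the kind of per-query WAL query this enables, assuming the PostgreSQL 13 pg_stat_statements columns described in the post (wal_records, wal_fpi, wal_bytes):

```sql
-- Sketch: which statements generate the most WAL in PostgreSQL 13?
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query,
       calls,
       wal_records,   -- number of WAL records generated
       wal_fpi,       -- number of full page images written
       wal_bytes      -- total WAL volume generated
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 10;
```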
[00:03:42] And from the same site is New in PG 13: Monitoring the query planner. So most of the counters in pg_stat_statements just give you execution times. But with this addition, you can also get planning times. So for example, it looks like you can enable pg_stat_statements.track_planning, and that will track the planning time for your queries. So if you have queries that are quite complex and take a while in the planning stage, you can actually collect that information now with this new feature. So if you're wanting to learn more, check out this blog post.
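A quick sketch of what that looks like in practice, assuming the PG 13 setting and planning-time columns discussed in the post:

```sql
-- Sketch: enable planner tracking and find planning-heavy queries.
ALTER SYSTEM SET pg_stat_statements.track_planning = on;
SELECT pg_reload_conf();

SELECT query,
       plans,             -- number of times the statement was planned
       total_plan_time,   -- cumulative planning time (ms)
       mean_plan_time,
       total_exec_time    -- for comparison against execution time
FROM pg_stat_statements
ORDER BY total_plan_time DESC
LIMIT 10;
```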
[00:04:16] The next piece of content is Empowering a Rails application with UUID as default primary key. And this is from Better Programming. So we talked a little bit about BDR in the last episode and how they had the concept of a shadow master, and I said, well, one thing you're going to have to do is centrally create any primary keys, be they IDs or UUIDs. So this is a way, using a Ruby on Rails application, to set up UUIDs. And they show an example of how you would typically pull up a given article using an ID as the primary key, whereas you can switch to a UUID, so it looks like this. Some of the benefits he mentions are that you get to obfuscate information in the URL; perhaps it's a little bit of a security layer, although it's security by obscurity, of course. And another benefit of UUIDs is that you can actually generate them from client applications; they don't all have to be generated within the database. So he shows you how to do what they call a database migration in Rails to be able to set up the extension. Here he's using the pgcrypto extension to do it, although there is another one you can use as well, as well as setting the primary key type to be UUID. Now, what he also mentions is that this framework, Ruby on Rails, also has the concept of being able to pull up the first record or the last record from a given set you're pulling, and generally it orders that by the primary key. Well, that may not be what you're intending if you switch over to UUIDs, but he has a way to deal with this where you can specify a different implicit order within the application framework, say the created-at date, so that the first and last record make more intuitive sense. And of course he makes the disclaimer: be super careful if you're going to be transitioning from, say, integers to UUIDs for existing projects. But if you're looking to make this change, here's an article on how you could potentially do it.
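On the database side, a Rails migration like the one in the post essentially boils down to SQL along these lines. This is a sketch assuming pgcrypto's gen_random_uuid() as the generator (uuid-ossp's uuid_generate_v4() is the other option alluded to); the table and column names are illustrative, not from the post.

```sql
-- Sketch: UUID primary keys generated by the database via pgcrypto.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE articles (
    id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    title      text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- With UUIDs, ordering by the primary key no longer reflects insertion
-- order, so "first" and "last" should order by a timestamp instead:
SELECT * FROM articles ORDER BY created_at ASC  LIMIT 1;  -- "first"
SELECT * FROM articles ORDER BY created_at DESC LIMIT 1;  -- "last"
```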
[00:06:20] The next post is Debian default logging explained. And this is from mydbainok.org, and this kind of goes into how Debian tends to set up PostgreSQL when you're using their packages. And this would apply, for example, to an Ubuntu server. So when you actually query pg_current_logfile() from Postgres, it's actually blank, even though you know it's logging somewhere, and by default it logs into /var/log/postgresql. So it goes over a little bit of an investigation into how to find it. So for example, checking the logging settings in postgresql.conf: log_destination was not set, meaning it defaults to standard error, and logging_collector was off. So according to Postgres, there's no logging to this location. So how is it doing it? So he looks into, I believe it's a Perl wrapper, pg_ctlcluster, that is set up on Debian systems. So he gets the right one, looks at this particular code, and it looks like it launches the systemd postgres service. And here's the service definition for it, or I should say the template unit file for it. And basically it works because it's using the -l, or log file, switch of pg_ctl, because essentially this calls pg_ctl. And he said, of course, that means if you use the -l log file switch of pg_ctl, the pg_current_logfile() function won't be able to help you find where your log file is. So if you use PostgreSQL on a Debian system, you may want to check out this blog post to get a better understanding of how Postgres is set up by the Debian packages.
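A quick sketch of the checks walked through in the post, run from psql on a Debian or Ubuntu package install:

```sql
-- Sketch: why pg_current_logfile() comes back empty on Debian defaults.
SELECT name, setting
FROM pg_settings
WHERE name IN ('log_destination', 'logging_collector', 'log_directory');

-- Returns NULL here because logging happens through pg_ctl's -l switch
-- (via pg_ctlcluster and the systemd unit), not the logging collector,
-- and the output ends up under /var/log/postgresql/ instead.
SELECT pg_current_logfile();
```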
[00:07:55] Next post is Stored Procedures in PG 11 - Better late than never. This is from highgo.ca, and they're basically explaining stored procedures and what they are. And he goes over the difference between functions and stored procedures. Basically, both are executable code and can run in different languages. Functions cannot have transactions, whereas stored procedures can. And you execute functions just by running the function within a query, whereas a stored procedure you actually CALL. And he goes into a little bit of discussion of how you can return a value from a stored procedure, how it gets invoked, its general structure, which is very similar to a function, and that it even supports overloading. So if you want to get more insight into the differences between stored procedures and functions, definitely check out this blog post.
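Here is a minimal sketch contrasting the two; the table and routine names are hypothetical, not from the post, and it assumes PostgreSQL 11 or later for CREATE PROCEDURE.

```sql
-- Functions are invoked as part of a query.
CREATE FUNCTION add_numbers(a int, b int) RETURNS int
LANGUAGE sql AS $$ SELECT a + b $$;

SELECT add_numbers(1, 2);

-- Procedures are invoked with CALL and may control transactions.
CREATE PROCEDURE archive_old_rows() LANGUAGE plpgsql AS $$
BEGIN
    -- 'events' is a hypothetical table for illustration.
    DELETE FROM events WHERE created_at < now() - interval '1 year';
    COMMIT;  -- transaction control is allowed in procedures, not functions
END;
$$;

CALL archive_old_rows();
```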
[00:08:42] The next piece of content is Parallel query inside function. And this is from Abdul Yadi's blog, and he's talking about where he was running a query here that had a bunch of unions. And when he just runs the raw SQL, it runs in parallel, something similar to this, and then appends everything together and gives the ultimate result, and that takes around 1.9 seconds. But if you put this exact SQL code in a PL/pgSQL function or in a pure SQL function, it actually takes much longer: about 3.4 seconds here and again about 3.4 seconds here. So the question is why? So he actually looked at the CPU output, and you can tell the plain SQL query looks to be running in parallel, it's using all the CPU cores, whereas the PL/pgSQL function only seems to be using one CPU core, and the same thing for the SQL function. So how do you get it to be parallel? Now, he actually did kind of a hack to get it to work, where instead of returning the query directly, the result set is saved into a temporary array in memory and then returned once finished. However, in the comments they also mention that you can add PARALLEL SAFE to your function to get it to be able to work in parallel. So this is a quick post, but it's just something to keep in mind: if you're using functions and you want them to run in parallel, maybe you need to mark them PARALLEL SAFE.
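For reference, this is roughly what the suggestion from the comments looks like: declaring the function PARALLEL SAFE (and STABLE) so the planner is free to consider parallel plans around it. The function body and table names here are stand-ins, not the post's actual query.

```sql
-- Sketch: a set-returning SQL function marked PARALLEL SAFE.
-- sales_2019 and sales_2020 are hypothetical tables for illustration.
CREATE OR REPLACE FUNCTION big_union_report()
RETURNS TABLE (id int, total numeric)
LANGUAGE sql
STABLE
PARALLEL SAFE
AS $$
    SELECT id, sum(amount) FROM sales_2019 GROUP BY id
    UNION ALL
    SELECT id, sum(amount) FROM sales_2020 GROUP BY id
$$;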
[00:10:12] The next piece of content is Optimize PostgreSQL Server Performance Through Configuration. And this is from Crunchydata.com, and they're basically going over some of the areas that you most want to change as you're optimizing server performance as you increase the size of your hardware. Namely, the four that they've chosen are shared_buffers, wal_buffers, effective_cache_size and maintenance_work_mem. So they go over each of these configuration options, what they do, and some recommendations for settings. So if you want a refresher on that, definitely a blog post to check out.
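As a sketch of how you might apply changes to those four settings, here is one way to do it with ALTER SYSTEM; the values are illustrative starting points for a larger server, not recommendations from the post.

```sql
-- Sketch: adjusting the four settings discussed (values are examples only).
ALTER SYSTEM SET shared_buffers        = '8GB';   -- often sized around 25% of RAM
ALTER SYSTEM SET wal_buffers           = '16MB';
ALTER SYSTEM SET effective_cache_size  = '24GB';  -- rough estimate of OS cache + shared_buffers
ALTER SYSTEM SET maintenance_work_mem  = '1GB';

-- shared_buffers and wal_buffers require a server restart;
-- the other two take effect on a configuration reload.
SELECT pg_reload_conf();
```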
[00:10:48] The next piece of content is actually a YouTube video, and it's Postgres Pulse Insights: How to Solve a Bloated Postgres Database and Improve Database Performance. And this is from the EnterpriseDB YouTube channel, and it looks to be on about a weekly basis they're going to have this live stream, I believe, where they answer questions that they find on various online forums. So if you're wanting a live stream or a video covering questions that the community has about Postgres, definitely a piece of content to check out.
[00:11:21] The next piece of content is the new PostgreSQL Person of the Week: it's Michael Paquier. My apologies if I'm pronouncing that incorrectly. So if you're interested in learning more about Michael and his contributions and work in PostgreSQL, definitely check out this new post.
[00:11:39] The next piece of content is Polygon averaging in PostGIS and this is from Crunchydata.com and they show you how to average polygons in PostGIS. So if you're interested in that, check out this blog post.
[00:11:52] And the last piece of content is related to machine learning. It's How to use the Random Forest Machine Learning Model with 2UDA - PostgreSQL and Orange, Part 3. So this is part three of this article series, so if you're interested in that, definitely a blog post to check out.
[00:12:11] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or iTunes. Thanks.