Episode Transcript
[00:00:00] In this episode of Scaling Postgres we talk about F Sync stopgap, auto vacuum nap time width and CTE changes and postgres community. I'm creston. Jameson. And this is scaling postgres episode 52.
[00:00:22] I hope everyone is doing well. Unfortunately have a bit of a cold. So my apologies for my voice this week. But the first thing I wanted to mention is that there is a live show that's coming up this Thursday. So if you go to Scalingpostgres.com, at the very top any of the pages there's a way to click the check it out button and there's a way to go ahead and claim your spot for the free webinar. And we're going to be discussing scaling your postgres configuration. So basically from 10GB to ten terabytes, what are the different configuration options you need to change for your database? And again, this will be happening on Thursday, February 20 eigth at 02:00 p.m. Eastern Standard Time. Or eleven, actually, a M Pacific Standard Time. So I'll include a link in the show notes but feel free to go ahead and sign up to attend the webinar. Our first piece of content this week is actually new releases of PostgreSQL. So eleven point 210.7, et cetera. Now one of the big highlight changes is changing the behavior of F sync. So we covered this on a more recent episode talking about that F sync issue where if there's an error that occurs during Fsync, the data potentially may get lost. So the change that they've made is actually causing a panic which would basically cause the PostgreSQL instance to stop and restart itself. And then PostgreSQL can then replay the data from the write ahead log to help ensure the data is written. So I would call this kind of a stopgap measure. They're potentially thinking about things to maybe do for Twelve that may be a little bit different. But basically they chose this path to say we want to avoid data loss insofar as creating a panic to avoid that potential of data loss. So normally I would suggest go ahead and install your point releases pretty quickly, but definitely you want to monitor this better after making this changes in case you're having problems with your file system because this may bring some things to light. Now in addition to that, there are tons of other bug fixes and improvements that are listed here that I'll let you look at from the link in the show notes.
[00:02:37] The next post is Tuning Auto Vacuum Nap Time and this is from the Robert Haas blog and basically he talks about tuning the configuration parameter Auto Vacuum Nap Time and what it is and what it does. And I always like his in depth blog posts where he talks about certain features like this. So I highly suggest checking this out. But basically this triggers when an auto vacuum launcher process is going to start a new auto vacuum worker to scan a database in the cluster to determine what are the next tables that need to be vacuumed. Now he says by default this is 1 minute, but he's had some cases where he's seen it increase to an hour or even one case one day. And he says that extending it can cause huge problems because basically you're like turning auto vacuum off. And how this configuration works, it's like how soon does it wait to actually start up another auto vacuum worker. So if it's set for example to a day that first day it may only bring up one worker to auto vacuum one table, the next day it'll bring up another worker, the next day it bring up another worker. So that's getting close to the position of having auto vacuum turned off. So you definitely don't want to extend this too long if you need to. And he said there's only one case where he's actually made it shorter than a minute, that it actually offered some improvement. That was for a table that was basically being used as a job queue and there were a lot of inserts and deletes happening on that and in that situation it helped but otherwise it probably makes sense to leave it where it is or just tweak it slightly if you are noticing issues. So definitely blog post to check out. The next post is with queries, present and future and this is from Crunchydata.com and it's talking about CTEs which are common table expressions or with queries and basically it helps make your SQL more readable. Now one thing that they're mentioning here is that there's a new patch that has been made with PostgreSQL twelve. And what this does is it changes some previous behavior and actually pushes down queries into the CTE. So for example, in a previous episode of Scaling Postgres, we talked about this post, be careful with CTE and PostgreSQL basically using the with clause, they noticed that CTEs and PostgreSQL is that the database will evaluate the query inside the CTE and store the results. And we have this example here where you have a with, he just called it CTE. So a with clause select all from Foo. Then he says select from the CTE where the IED equals an integer. It's going to do a full sequential scan on the Foo table and not use an index that exists. So it materialized. The CTE? It actually created this as a data structure and did not push down essentially this where statement to be able to execute the query faster and use the index. And they were saying whereas you can do a subselect and this technique it does use the index scan. So going back to this original post, they basically made a change in PostgreSQL twelve that's projected to be released when that's released where in certain conditions it will push down those where statements to give you better performance and they go through and do a bunch of tests to show you how it gets pushed down and in what situations. Now you can control the behavior to a certain extent by saying whether a CTE should be materialized or not. So there is a materialized keyword and a not materialized keyword to specify whether you want that CTE materialized or not. So overall this is a great feature and should improve a lot of queries. But on the other hand, because it's changing the default behavior, it actually may negatively impact some queries. Now again, this is not coming until PostgreSQL twelve and we'll have to see what's included in there. But this could be something to watch over when you make the migration to twelve on what the impact will be to your queries.
[00:06:46] The next post is if PostgreSQL is the fastest growing database, then why is the community so small? And this is from the Timescale.com blog, and he talks about how for the second year in a row, PostgreSQL is still the fastest growing database management system. And then they reference these different articles on all these organizations that have started standardizing or converting to using PostgreSQL. But what's interesting, the number of attendees at PostgreSQL conferences seems rather small, at least compared to others. Like for example, they mentioned here the two biggest PostgreSQL conferences had about 1100 attendees, whereas MongoDB World had over 2000 attendees. And then Oracle Open World has about 60,000 attendees. But the point about this Oracle Open World I believe they sponsor Java. There's a lot more other content that's being discussed rather than just the Oracle database, so I can kind of understand that. But it's still interesting that the PostgreSQL community is not larger. Looking at some of the other statistics anyway, they're proposing how you could potentially get more involved with the community in terms of joining the Postgres Slack channel, following on Twitter, or joining a Meetup or a Postgres users group. So I'd be interested in what you think. Why do you think the community is potentially smaller than it possibly should be? Do you know if a lot of developers follow PostgreSQL? Anyway, I thought this was an interesting blog post to check out. The next post is PROJ six in PostGIS. Now, I'm definitely not an expert in PostGIS, I haven't really used it, but they do have a reprojection support library called PROJ, and they're talking about changes that they've started making in order to support PROJ Six. So if you use PostGIS, and potentially this utility, definitely a blog post to check out the next post is georedundancy of PostgreSQL database backups with Barman. So if you're using Barman, they've introduced a way to do georedundant backups. So if you have a database server in one location being backed up by Barman, and another database server in another geographic location being backed up by Barman, you can actually transfer the backups between them to ensure georedundancy. So if you use Barman or want to use this feature, that's a blog post to check out. And the last post is monitor PG backrest backups with Nagios. So again, this is specific. If you're using PG backrest to backup your PostgreSQL instance and you want to better monitor using Nagios, definitely a blog post to check out.
[00:09:25] That does it. For this episode of Scaling Postgres, you could get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or itunes. Thanks.