Just Upgrade, Vacuum Updates, Connection Strings, Postgres Showcase | Scaling Postgres 80

Episode 80 September 09, 2019 00:13:47
Just Upgrade, Vacuum Updates, Connection Strings, Postgres Showcase | Scaling Postgres 80
Scaling Postgres
Just Upgrade, Vacuum Updates, Connection Strings, Postgres Showcase | Scaling Postgres 80

Sep 09 2019 | 00:13:47

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss why you should just upgrade, vacuum updates in Postgres 12, psql connection strings and a Postgres showcase.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/80-just-upgrade-vacuum-updates-connection-strings-postgres-showcase/

View Full Transcript

Episode Transcript

[00:00:00] In this episode of Scaling Postgres we talk about just upgrading vacuum updates, connection strings and postgres showcase. I'm Kristen Jameson and this is Scaling Postgres, episode 80. [00:00:20] Alright, I hope you're having a great week. Our first piece of content is Just Upgrade how PostgreSQL Twelve can improve your performance. This is from Crunchydata.com blog. Now this blog post starts off talking about what is the major feature or features of this release, but what he says here quote many of the features and enhancements in PostgreSQL twelve will just make your applications run better without doing any work other than upgrading. So hence the title. Just upgrade. So he talks about some of these different areas and the first area he mentions is major improvements to indexing. Now this is something I wasn't aware of or haven't read about, but he said PostgreSQL twelve makes significant improvements to how B tree indexes work. And from experiments using TPCC like test showed a 40% reduction in space utilization on average. So things that update tables regularly could see notice improvements to disk utilization and in turn that would mean less space required for caching those indexes, presumably. And the other big benefit he also mentions here of course is being able to re index indexes concurrently. And actually you may need to do some of that to gain some of the benefits listed here in terms of space utilization. And then he also talks about it also reduces the overhead of wall records for the gist gen and SP gist indexes when an index is building. So that is another potential advantage. Then he talks about partitioning improvements and significantly in the area of being able to process tables that have thousands of partitions. So things like maybe you're using time series storing time series data, those will presumably be more efficient, not for us, just from querying a few tables out of those partition sets, but also improvements on insert speed. And he also mentions along with this that there's also been some boosts to copy. Then he mentioned CTEs being in line. So that is a big boon that's mentioned here in this section with queries get a big boost the fact that JIT is now essentially a default. And then of course there's all the additional features, but all of the ones he mentioned here of mostly of performance nature in some space wise that could maybe get you some performance improvements are all really great additions. So definitely a blog post to check out if you want to learn more. [00:02:55] The next post giving that PostgreSQL twelve is around the corner, is a primer on PostgreSQL upgrade methods. And this is from Cybertechn Postgresql.com and he basically is covering the three main backup methods that exist. The first one is just a logical based dump and restore. Just dump all the table data or in doing a restore of it. And then he covers binary in place upgrades using a PG upgrade and it can happen really fast with using the link method and then talking about using a logical replication as an upgrade method. And he has this convenient table here that says essentially the logical dump and restore, simple, safe, somewhat flexible. The downsides it's the slowest method of doing an upgrade and doing a per database approach. Has some pitfalls using a PG upgrade. Basically the binary in place, as he's saying here, is a fast, very fast. If you're doing the link method you still have to have some downtime, but it's still pretty fast and the old instance won't be affected in the default mode. In other words, when you're not doing the link, the downsides are it's more complex than the dump and restore method. It can be risky in link mode because you're actually reusing the same data files. It's just using a Linux hard link you're going to possibly lose standby servers. You have to do a lot of work in using R sync to be able to sync them back up. There's no default way to upgrade the standby servers and if you're using the default mode, essentially you're going to have to have double the disk space required and then the last method using logical replication. So just transferring data from one version to a newer version of PostgreSQL can be the shortest downtime possible safe with a possibility of thorough live tests and essentially very flexible because you're just determining how you want to move it over. But this is the most complex method. You may need some schema changes. It can possibly be slow performance wise because you're having to send everything over using logical replication and it's always per database so you can't do a whole cluster that way. You need to do each database. So if you're looking to upgrade your version of PostgreSQL, definitely a blog post to check out to help you determine what is the best method for upgrading your database. [00:05:25] The next post is PG Friday postgres Twelve revs Up Vacuum so basically they're talking about vacuum, they talk about some basics and they say that when you're doing a vacuum, it also needs to vacuum each of the indexes because they can bloat as well. So they did a test here and running a full vacuum using a PG bench example, which I think they had about 10 million rows in it. It took 21 seconds, or about 21 seconds, but there's this feature in version twelve where you can avoid cleaning up the indexes. So when you set index cleanup to false, it actually happens about half the time, about 12 seconds. So that's pretty great. You're probably thinking but you always need to eventually vacuum the indexes, otherwise they're going to bloat. And I imagine there's issues if you are never going to vacuum those indexes. And I don't think there's a way to configure auto vacuum to ignore the index cleanup necessarily. But they have some suggestions on how to potentially do this a little bit lower down in this blog post. Now here they added another index and you could see the time extended to 28 seconds. But doing the index cleanup Faust, it was a little bit higher, but still it's about half the time of vacuuming the indexes. And if you have tons of indexes on the table, I imagine the benefits could be increased. Now then they go over the disadvantage of skipping indexes. So basically you don't want to do this as a long term thing. Your indexes do need to be cleaned up. Maybe you could have some scenario where you're doing index rebuilds along with not doing the index cleanup versus vacuum. And they talk about going into the transactions and the 2 billion XID limit, which is why vacuum, one of the primary reasons vacuum exists in the first place is to be able to freeze visible rows so that they can all be visible to everyone. So they get into using a vacuum and vacuum DB command line tool and you can actually say only do a vacuum where the minxid age is a certain value. So when the transaction age is at say 98,000 for each of these tables, you can set it at 100,000 and basically no work will be done when you're running a vacuum of the database. However, if you put the minimum at 50,000, then all of these will be picked up and it will do the vacuum and it finished in 37 seconds. What's interesting is when you combine this with actually doing a freeze. So for example, they had a table with about 100,000 max transaction ID for one table and close to 200,000 for other tables. So we actually did a vacuum DB freeze with a min transaction age of 100,000 and now it's going to just freeze those tables that have a transaction greater than 100,000. So you could see this as they talk about in the conclusion here. This enables you to freeze certain tables at a certain level on a periodic basis, basically do the work of vacuum in smaller bits and parts, but more frequently, maybe it's a part of maintenance jobs that you schedule. Now, auto vacuum should still be running, but this enables you to fine tune and do maybe more work when the database is less busy and even the index cleanup flag. Maybe you need to get a vacuum done on a table, but maybe you want to defer the cleanup on the indexes. Now you'll eventually need to do it, but there's probably one time maintenance tasks or you have issues you could address having this feature. It's just the suggestion of course here, which makes sense, is don't always do your vacuum with just an index cleanup, you need to eventually vacuum your indexes as well. So this is a great blog post talking about some of the new features coming in version twelve for vacuum. So I encourage you to check it out the next post is Postgres Connection Strings and psql. So this is talking about the psql command line to utility and how it can use convenient connection strings when connecting to it. Instead of using what you may typically see, where you're using parameters like U for the user, d for the database, h for the host, p for the port. You can build it as a connection string like this, similar to how you may see it in some application development environments. Well, you can pass it in essentially as the database name and it will go ahead and connect. So essentially here there's three different connection methods being used and each of them gives you the same value. And he gives an example of using Python here that uses Lib PQ. So just a quick little post on connection strings for PostgreSQL. [00:10:20] The next post is actually a new blog that has been started. Again, this is for the Art of Postgresql.com site for the book that was recently released by Dimitri Fontaine and he started doing some blogging about SQL. Now it's not really about PostgreSQL administration, but just discussing SQL in general and using PostgreSQL as the example. So if you feel this would be of interest to you, definitely check out this new blog. [00:10:48] The next post is updates for the postgres showcase project. [00:10:53] So this is a list of topics covered mostly of an SQL related nature on how you create tables, create databases and they go into more detail how to connect, doing transaction methods, creating altering tables, partitioning other table modifiers constraints, view store procedures, triggers so a lot of information related to how you set up the table structures and objects within the database. So you can see the GitHub project here, that is the Postgres Showcase and it shows basically very short set of code. So a lot of code that shows you how to do things like in this view it explains views kind of what they are and shows you some examples. So if you're interested in this kind using this as a resource, definitely a blog post and a GitHub project you may want to check out. [00:11:46] The next post is remote backup and restore with PG backrest. Again. This is from CyberTech postgresql.com. So this is a follow up to a previous blog post about PG Backrest where they just installed and did a backup on the same database server. This one actually sets up a separate PG backup server that then connects to a database to do the backup and even tests to restore. So it goes through all the different code that you would need to install and get it running and does the backup and restore. So if getting to learn PG backrest is of interest to you, definitely blog post to check out the next post is Cloud Vendor deep dive PostgreSQL on Google Cloud platform. [00:12:30] And this is from the several nines.com blog and this is a comprehensive examination of Google Cloud's implementation of their PostgreSQL service. [00:12:40] So they go over very long and comprehensive post about all the different features that are available. What are some advantages and disadvantages. So talking about almost every single example you can think of and using PostgreSQL on the Google Cloud platform. So if that is of interest to you, definitely blog post to check out. And the last post is Creating a PostgreSQL cluster with Kubernetes. CRDs and CRDs are custom resource definitions. So this is a post about the PostgreSQL operator, which is a tool developed by Crunchydata.com, to my knowledge, that eases creation of PostgreSQL clusters on Kubernetes. So if you're interested in doing that, definitely a blog post to check out. [00:13:24] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes be sure to head over to Scalingposgres.com, where you can sign up to receive weekly notifications of each episode, or you could subscribe via YouTube or itunes. Thanks.

Other Episodes

Episode 105

March 16, 2020 00:19:22
Episode Cover

World's Best, Subtransactions, Sharding, Schema Changes | Scaling Postgres 105

In this episode of Scaling Postgres, we discuss how Postgres is the world's best database, performance ramifications of subtransactions, the current state of sharding...

Listen

Episode 178

August 16, 2021 00:15:11
Episode Cover

Postgres Releases, Performance Discussion, Index Types, Index Power Use | Scaling Postgres 178

In this episode of Scaling Postgres, we discuss new releases for Postgres & pgbouncer, have a performance discussion, describe the different index types and...

Listen

Episode 242

November 21, 2022 00:11:38
Episode Cover

Reduce Replication Lag, Explain Parameterized Query, PostGIS Day, Multiple PgBouncers | Scaling Postgres 242

In this episode of Scaling Postgres, we discuss how PG15 helps reduce replication lag, how to get the parameters used in prepared statements, PostGIS...

Listen