Book Sales, B-tree Boost, More Postgres 12, Using pgBackRest | Scaling Postgres 92

Episode 92 December 02, 2019 00:13:17

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss book sales, boosts to B-tree indexes, more Postgres 12 features and how to set up and use pgBackRest.

To get the show notes, as well as to get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/92-book-sales-btree-boost-more-postgres-12-using-pgbackrest/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about book sales, B-tree boosts, more Postgres 12 and using pgBackRest. I'm Creston Jamison, and this is Scaling Postgres, episode 92.

[00:00:20] Alright, I hope you're having a great week. The first piece of content is actually an announcement that two books are on sale this Cyber Monday and also Black Friday. Of course, you will have missed it by the time this episode is published, so hopefully you've watched this episode early. SQL Performance Explained is on sale, and there's a Cyber Monday 19 code where you can get the PDF for free. I've actually purchased this, so if you're interested, it's definitely a book to check out. The second book is The Art of PostgreSQL, and they're also having a sale: 50% off with the coupon Black Friday 2019. So definitely a great time to get some great educational materials on PostgreSQL.

[00:01:14] The next post is "B-tree index improvements in PostgreSQL 12". This is from cybertec-postgresql.com, and it's talking specifically about the improvements that were made to the index; it's mostly an infrastructure improvement. In the example they list here, they created a two-column table that could be considered a join table. It has columns aid and bid, which are bigints, there's a composite or multicolumn index for the primary key across both, and then a secondary index on bid in case you wanted to query those values faster. To that they added test data using generate_series, such that each bid is related to 10,000 aids. The first thing you'll notice is the size difference: in version 11 the bid index alone is 545 megabytes, but in version 12 it's 408 megabytes, so it's about 33% bigger in version 11. So the question is, how does this happen? Every value of bid occurs 10,000 times in this index, so there are a lot of duplicates, and on many of the leaf pages all of the keys are the same. Since PostgreSQL doesn't quite know where a new entry should go, one of the middle leaf pages can get split, and you won't necessarily fill in each leaf page, because entries with the same value are stored in no special order in the index. They do mention that the rightmost leaf page was always split towards the right end, to optimize for monotonically increasing inserts; in contrast, other leaf pages were split in the middle, which wasted space. What version 12 does is add the physical address, the TID or tuple ID, as part of the index key, so duplicate index entries are stored in table order, the physical order of the table. The obvious benefit is that you should get significant performance improvements when pulling out a given value, because the matching rows will be clustered together on disk. Moreover, pages that consist only of duplicates are now split at the right end, resulting in a densely packed index, which is where the space savings come from. The second index improvement is more compact storage of internal index pages. Comparing version 11 and version 12, when doing a query, it looks to be on the primary key, one less block is read in version 12, because internal pages can fit more index entries. So how does it do that? Version 12 stores the TID as part of the index, but they also have a way to reduce that redundancy, because doing it for every single index entry would hurt the space savings. Where the TID is redundant, as are non-key attributes from an INCLUDE clause, they introduced truncation of these redundant index attributes. Looking at it with the pageinspect extension, you can see an item length of 24 in version 11 versus 16 in version 12, so that also gives you some space savings. The important thing to remember, if you want to get these improvements, is that you actually have to reindex or rebuild the index, because doing an upgrade with, say, pg_upgrade is not going to automatically rewrite the indexes. So definitely a good explanation of the index changes that came with version 12 of PostgreSQL.
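To make that concrete, here is a minimal sketch of the kind of test the article describes. The table, column and index names, and the row counts, are illustrative assumptions rather than the article's exact script; the REINDEX at the end shows the rebuild you would need after a pg_upgrade to pick up the version 12 index format.

```sql
-- Hypothetical join table: each bid value repeats 10,000 times.
CREATE TABLE rel (
    aid bigint NOT NULL,
    bid bigint NOT NULL,
    PRIMARY KEY (aid, bid)
);

-- Secondary index with many duplicate keys.
CREATE INDEX rel_bid_idx ON rel (bid);

-- Load test data with generate_series: 10,000 aids per bid.
INSERT INTO rel (aid, bid)
SELECT i, i / 10000
FROM generate_series(1, 10000000) AS s(i);

-- Compare the size of the duplicate-heavy index on v11 vs v12.
SELECT pg_size_pretty(pg_relation_size('rel_bid_idx'));

-- After pg_upgrade, old indexes keep the old on-disk format;
-- rebuild them to get the version 12 space savings.
REINDEX INDEX CONCURRENTLY rel_bid_idx;
```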
[00:05:12] The next post, also from cybertec-postgresql.com, is "Discovering less-known PostgreSQL 12 features". Some of these are going to be redundant with what we've covered in previous episodes, but I'll quickly run through the ones mentioned here. The first is automatic inlining of common table expressions, which has been discussed before. Parallelized queries are now allowed in serializable isolation mode, which could give some performance benefits if you use that isolation level for some of your queries. JIT is enabled by default, but they also mention that if you don't have a data-warehouse use case, or your database is not that large, you can tune the JIT threshold family of parameters (jit_above_cost and friends) so that small and medium data sets won't use it. Next is support for the SQL/JSON path language. Foreign keys can now reference partitioned tables, which we've heard before. There are partition introspection functions to look at the roots and ancestors in a tree of partitioned tables. A connection parameter, tcp_user_timeout, controls libpq's TCP timeout. The manual page URL is now shown in psql's help output for an SQL command. VACUUM can skip index cleanup, which can have benefits, but of course you don't want to rely on it, since you do eventually need to vacuum those indexes. There's a new EXPLAIN option, SETTINGS, to output non-default optimizer settings, so SETTINGS ON can be useful to get additional detail when looking at an explain plan. You can log statements from only a percentage of transactions to keep the log volume down. Recovery now advances to the latest timeline by default, which is beneficial because you no longer have to set it explicitly, since latest is usually what you want.

Indexes can now be rebuilt without blocking writes using REINDEX CONCURRENTLY, which is great. pg_checksums can now enable or disable page checksums for an offline cluster, and they discuss that a bit here; hopefully it will eventually work in online mode, but right now it's offline only. A CREATE TABLE tablespace specification for a partitioned table now affects the tablespace of its children, which looks like a definite convenience. Values produced by queries can be assigned to pgbench variables, which will be useful if you use a lot of pgbench. Fractional input is allowed for integer server parameters, which is interesting because now you can do things like set your work memory to 1.5GB as opposed to, say, 1500 megabytes. vacuumdb can select tables for vacuuming based upon their wraparound horizon. And lastly, there are the recovery changes: recovery.conf has gone away, replaced by the signal files. So, yet another post covering all the different changes that came in PostgreSQL 12; if you're interested in learning more, definitely a blog post to check out.
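As a quick, hedged illustration of a few of these features, here is what they can look like in practice. The table name reuses the hypothetical rel table from the earlier sketch, and the values are arbitrary.

```sql
-- Integer server parameters now accept fractional values with units.
SET work_mem = '1.5GB';

-- EXPLAIN's new SETTINGS option lists planner settings that differ
-- from their defaults alongside the plan.
EXPLAIN (SETTINGS ON) SELECT count(*) FROM rel;

-- VACUUM can skip index cleanup (the indexes still need cleanup eventually).
VACUUM (INDEX_CLEANUP OFF) rel;

-- vacuumdb (a shell command, shown here as a comment) can pick tables
-- by wraparound horizon, e.g.:
--   vacuumdb --min-xid-age=100000000 mydb
```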
[00:08:16] The next post is "Setting SSL/TLS protocol versions with PostgreSQL 12", and this is from 2ndquadrant.com. They're talking about the new settings ssl_min_protocol_version and ssl_max_protocol_version. I tend to do a lot of this sort of thing when working with NGINX, and now PostgreSQL has the capability to set a minimum protocol version as well as a maximum one. They go through when the different protocol versions were released: SSL is not considered secure anymore, but it's mentioned here for completeness, and then they cover the TLS versions. 1.2 is pretty much what most people are using, some are still supporting 1.0 and 1.1, and 1.3 is relatively new and not in much use yet. If you want to see what your PostgreSQL instance is actually doing, there is, as he mentions here, a pg_stat_ssl view to see the SSL status of all connections. He mentions you can also enable log_connections on the server, because each connection attempt will then list the SSL parameters used. When you connect, you'll generally get output showing whether it's an SSL connection along with the different parameters, and you can use the psql command \conninfo to have that information printed at any time. So if you're interested in learning more about these settings, how to adjust them, and the different ways to determine what SSL version PostgreSQL is currently using, definitely a blog post to check out.
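Here is a minimal sketch of what this can look like, assuming SSL is already enabled on the server; the exact protocol value is just an example.

```sql
-- Require at least TLS 1.2 for client connections (PostgreSQL 12+).
ALTER SYSTEM SET ssl_min_protocol_version = 'TLSv1.2';
SELECT pg_reload_conf();

-- Check the negotiated protocol and cipher for current connections.
SELECT pid, ssl, version, cipher
FROM pg_stat_ssl;
```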
[00:09:49] The next post is "Hope is not a strategy: how hardware issues affect your PostgreSQL data", from crunchydata.com, and this is pretty much an opinion piece. The author was made aware of a bulletin from Hewlett Packard Enterprise where certain models of their SSDs have a firmware bug that causes the drives to deterministically fail, very suddenly, at precisely 32,768 hours of operation. As he says, this is pretty much due to an overflow of a signed short integer, and it could also affect other models of these drives. If you have these in some sort of RAID array, you could suddenly have the whole system go down, because this isn't a single drive failing due to a mechanical or electronic issue; it's a programmatic issue that takes all of the drives, and your data, offline at once. It prompted him to ask some questions: where exactly is your data? If you're trusting the cloud, are they taking care of the hard drives and keeping track of issues like this? What kind of drives hold your data, SSDs or magnetic storage? Does the firmware get updated, et cetera? Are you doing regular continuous backups, and are you testing those backups to verify that you can restore them? He mentions someone he worked with previously who said "hope is not a strategy", and if you're just hoping, that's not a strategy; you really want to know, and be reasonably confident, that your data is being taken care of.

Related to this, they have another post called "How to get started with pgBackRest and PostgreSQL 12". They go through the whole process of using pgBackRest to do a backup and then a test restore, exactly what the first post is advocating. They do this on a CentOS box with pgBackRest 2.19, the most recent version, and they note that if you're using PostgreSQL 12 with the new recovery changes, you'll want a pgBackRest version of 2.18 or greater. They install PostgreSQL, install pgBackRest, configure PostgreSQL to be able to do the backups, configure pgBackRest, put it all together along with archiving the WAL files, perform the first backup, and then, as the last step, restore the backup. So if you want to get started using pgBackRest, this is definitely a great post to check out.

The last item is that PGConf.Russia's YouTube channel has recently added a number of videos from a recent conference in Russia. Some of these are in English and others are translated into English, so if you're looking for some video educational content, it's definitely a channel to check out.

[00:12:53] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
