Episode Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about arm64 with apt, contributors, backup manifests, and now() timestamps. I'm Creston Jameson, and this is Scaling Postgres, episode 113.
[00:00:22] Alright, I hope you, your family, and coworkers continue to do well with the situation that continues to be with us. Our first piece of content is "arm64 on apt.postgresql.org". This is from Christoph Berg's blog, and basically the apt repository for PostgreSQL now covers the arm64 architecture. So these are for Arm processors, and the Raspberry Pi is a popular one. They're not supporting all Arm processors, but they are doing arm64, which is the 64-bit version. These packages are now available, targeting Debian buster, bullseye, and sid, as well as bionic and focal for Ubuntu. He also mentions that the focal repositories for Ubuntu 20.04 are now available, along with a note about old PostgreSQL versions. So if you potentially want to use an Arm processor, maybe you want to check this out. Now, normally this has been for smaller devices, but I know that Amazon EC2, not that I'm necessarily advocating them, has had A1 instances for a while that use a custom-built Graviton processor based on the Arm architecture, and they also have the newer M6g instances for medium workloads that get up to 256 GB of memory. Those use the second version of the Graviton processor, and they state that it delivers up to 40% better price/performance over the current-generation M5 instances, which generally use Intel chips. Now, it looks like there's a greater-than-20% cost savings, so maybe there's more performance with these, I'm not really sure, but having another processor architecture available is definitely interesting.
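If you want to try these packages out, the setup follows the standard PGDG apt instructions from the PostgreSQL download pages; here's a minimal sketch, assuming a Debian or Ubuntu arm64 machine (the postgresql-12 package is just an example choice):

    # Add the PGDG repository for this distribution's codename
    sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
    # Import the repository signing key
    wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
    # Install PostgreSQL (the packages are now built for arm64 as well)
    sudo apt-get update
    sudo apt-get install postgresql-12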
[00:02:14] The next post is "Who Contributed to PostgreSQL Development in 2019?". This is from Robert Haas's blog and is, as he says, the fourth annual post on who contributes to PostgreSQL development. Contributors are ranked by the number of lines contributed; the percentage of lines and the number of commits are also shown. So you can see who's taking the top spots here, as well as committers of non-self-authored code and the people with at least 80 emails to the pgsql-hackers list. So definitely thank you to everyone who contributes to PostgreSQL and makes it what it is today. And this is from the blog at rhaas.blogspot.com.
[00:03:00] The next post is "Backup Manifests and pg_verifybackup in PostgreSQL 13". So this is something to look forward to in 13, where they are now providing a backup manifest to track what files have been backed up, as well as a means to verify the backup. This post from 2ndQuadrant.com walks through the new feature set. So what is the content of the backup manifest file? Basically, it's a JSON object with the following keys: the manifest version, all the different files included in the backup, WAL ranges that include information such as the timeline and the LSNs of the backup start and end, as well as a checksum. He goes through and runs through the process. He does a pg_basebackup, not streaming the WAL here, and specifies the directory. He looks at the content of the backup directory, and there's a new file called backup_manifest, and here's what the JSON file looks like. Then, of course, with this file you can verify the backup with pg_verifybackup, and he goes through and does that validation. Now, he took this backup without the WAL files because he was handling those separately; that's what the -X none means: do not back up the WAL files. So it's going to result in an error when trying to verify it, and he actually skips the validation of the WAL files using the -n option, and you can see it was verified. My assumption is that you could specify the location of the WAL files if you didn't record them, but that wasn't mentioned in this post. And here he actually goes in and modifies a WAL file to essentially corrupt it, and you can see it does result in an error when doing a pg_verifybackup. So, definitely an interesting feature coming in Postgres 13, and if you're interested, check out this blog post.
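To give a rough idea of that workflow, here's a minimal sketch, assuming PostgreSQL 13 and a hypothetical backup directory /var/backups/pg1 of my own choosing, not taken from the post:

    # Take a base backup, excluding WAL files (-X none); PostgreSQL 13 writes a
    # backup_manifest file into the backup directory automatically
    pg_basebackup -D /var/backups/pg1 -X none

    # A plain verify will complain, because the WAL files are missing...
    pg_verifybackup /var/backups/pg1

    # ...so skip WAL parsing with -n; data files are still checked against the manifest
    pg_verifybackup -n /var/backups/pg1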
[00:04:54] The next piece of content is "PostgreSQL: now() vs. 'now'::timestamp vs. clock_timestamp()". now() is a function that returns the current time, whereas 'now'::timestamp is actually a constant. So whenever you execute now(), it returns the current timestamp, but the constant stores the value the timestamp had when it was evaluated. In the first example he shows, when it's part of a transaction, both of these methods, the function and the constant, return the exact same timestamp. Then, still within that transaction, if you sleep for 10 seconds, it will still return that exact same timestamp, down to the same fractional seconds. So essentially, time according to now() is frozen inside a transaction. So where do you see differences with the constant? Well, if you use it as part of creating a table with a default. For example, if you say DEFAULT now(), the column will always get the current timestamp, and if you examine the table, you can see the default is now(). Whereas if you specify the constant, it takes the time at table creation and stores that as the default, so you can see it's actually storing a fixed time. So I think that's probably the more limited use case for 'now'::timestamptz. Now, the difference between now() and clock_timestamp() is that clock_timestamp() gets constantly updated. So in that previous example, inside the transaction, clock_timestamp() would continue to advance with the current clock time, whereas now() maintains the same value. So this is a pretty good review of these different timing functions, and if you want to look at his examples in more depth, definitely check out this blog post from Cybertec-postgresql.com.
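Here's a minimal sketch of those behaviors (the table names are hypothetical, not from the post):

    BEGIN;
    SELECT now(), clock_timestamp();  -- both show the current time
    SELECT pg_sleep(10);
    SELECT now(), clock_timestamp();  -- now() is unchanged; clock_timestamp() has advanced
    COMMIT;

    -- DEFAULT now() is re-evaluated for every inserted row
    CREATE TABLE t_func (created timestamptz DEFAULT now());

    -- DEFAULT 'now'::timestamptz is frozen at the moment the table is created
    CREATE TABLE t_const (created timestamptz DEFAULT 'now'::timestamptz);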
[00:06:48] The next piece of content is actually Bruce Momjian's site at momjian.us, where he posts nuggets of wisdom from time to time. They're very short, but one was particularly noteworthy this week that I wanted to mention, and that is: with PostgreSQL 12, CTEs, or WITH clauses, are now inlined by default, and there is a keyword you can use, MATERIALIZED, to determine whether or not to inline them. So this is one of the few optimizations you can add to a SQL query within Postgres. Normally there is no way to add optimizer hints, but with your CTEs you can specify whether they should be materialized or not. So something to keep in mind once you eventually upgrade to Postgres 12: if you have some CTEs or WITH queries that aren't as performant as you think they should be, check out the MATERIALIZED keyword to see whether materializing them or not gets better performance for your queries.
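As a quick illustration of the syntax, assuming a hypothetical orders table:

    -- PostgreSQL 12+: the planner may inline this CTE into the outer query
    WITH recent AS (
        SELECT * FROM orders WHERE created_at > now() - interval '7 days'
    )
    SELECT count(*) FROM recent;

    -- Force the pre-12 behavior: compute the CTE once as a separate result set
    WITH recent AS MATERIALIZED (
        SELECT * FROM orders WHERE created_at > now() - interval '7 days'
    )
    SELECT count(*) FROM recent;

    -- Explicitly request inlining even if the CTE is referenced more than once
    WITH recent AS NOT MATERIALIZED (
        SELECT * FROM orders WHERE created_at > now() - interval '7 days'
    )
    SELECT count(*) FROM recent;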
[00:07:51] The next post is "Index Corruption in PostgreSQL: The Hidden Cost of Your Queries". So basically it's talking about index corruption, how to potentially identify it, and then, of course, what to do about it. Now, I personally haven't encountered this, knock on wood, but the main symptom is that queries aren't using the index for whatever reason, and the way to resolve it is to rebuild the index. If you're on version 12, you can use REINDEX CONCURRENTLY; on previous versions, create the index again concurrently and then drop the old one once the new index is active. So if you're interested in learning more about this, definitely check out this post from Enterprisedb.com.
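Here's a sketch of both approaches, using a hypothetical index idx_orders_email on an orders table:

    -- PostgreSQL 12+: rebuild the index in place without blocking writes
    REINDEX INDEX CONCURRENTLY idx_orders_email;

    -- Earlier versions: build a replacement, drop the corrupt index, then rename
    CREATE INDEX CONCURRENTLY idx_orders_email_new ON orders (email);
    DROP INDEX CONCURRENTLY idx_orders_email;
    ALTER INDEX idx_orders_email_new RENAME TO idx_orders_email;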
[00:08:32] The next post is "The Best Medium-Hard Data Analyst SQL Interview Questions". This is from a document on Quip.com; I don't know if there's a related post somewhere, but it goes over different SQL interview questions and how to approach them. So, for example, number one is month-over-month percentage change, number two is tree structure labeling, number three is retained users per month, number four is cumulative sums, number five is rolling averages, number six is multiple join conditions, and then there are several window function practice problems as well. So if you're really wanting to understand more of SQL, and again, these are medium-hard problems, definitely check out this piece of content.
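To give a flavor of the window-function style these problems use, here's a minimal sketch of my own, assuming a hypothetical daily_sales table with day and amount columns:

    -- Cumulative sum and a 7-day rolling average in a single pass
    SELECT day,
           amount,
           sum(amount) OVER (ORDER BY day) AS cumulative_amount,
           avg(amount) OVER (ORDER BY day
                             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7day_avg
    FROM daily_sales
    ORDER BY day;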
[00:09:23] The next post is "Local Persistent Volumes and PostgreSQL usage in Kubernetes". This is from 2ndQuadrant.com, and they're talking about using Kubernetes and the type of storage, specifically setting up a local persistent volume to check its performance versus network storage. They believed the direct storage would result in better performance, so they wanted to verify this. They describe the test system they set up and then ran different performance benchmarks, and here are the results. Looking at sequential reads and writes, they show the disk numbers as declared by the manufacturer and then their bare-metal results. I believe they put the disks in a RAID 1, which explains why the reads come in about twice as fast as a single disk's rating. Using local persistent volumes, performance is within a few percentage points, just a hair slower than bare metal, and the same holds for the writes. Whereas if you look at the OpenEBS cStor pool volume, the more network-oriented solution, you can tell the performance dropped dramatically: essentially one-twentieth of the read performance and one-tenth of the write performance. So what this communicates to me is that if you're wanting to use PostgreSQL on Kubernetes, you should favor local persistent volumes over the cStor pool volume. If you're wanting to learn more about that and the methodologies they used for testing, definitely check out this post from 2ndQuadrant.com.
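For context, a local persistent volume in Kubernetes is a PersistentVolume object pinned to a specific node; here's a minimal sketch, where the names, path, and capacity are my own placeholders rather than anything from the post:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pg-local-pv
    spec:
      capacity:
        storage: 100Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local-storage
      local:
        path: /mnt/disks/ssd1        # directory on the node's local disk
      nodeAffinity:                  # local volumes must be pinned to a node
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                    - node-1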
[00:10:55] The next post is "Multi-Kubernetes Cluster PostgreSQL Deployments". So this is talking about deploying PostgreSQL across multiple regions or data centers using Kubernetes federation, and it runs through the process of doing it for two Kubernetes clusters. Now, they're using an active-standby scenario, and the post describes how you can do this using the Crunchy Data PostgreSQL Operator. This post is from Crunchydata.com, so if you're interested in learning how to do that, check it out.
[00:11:28] The next post, also from Crunchydata.com, is "Deploy pgAdmin 4 with PostgreSQL on Kubernetes". So if you're wanting to do that to help manage the multiple instances of Postgres you'll be running on Kubernetes, you can check out this blog post.
[00:11:44] The next piece of content is that Jimmy Angelakos is the next PostgreSQL Person of the Week. So if you're wanting to learn more about Jimmy and his contributions to Postgres, definitely check out this post.
[00:11:57] The next piece of content is a review of four top PostgreSQL books. This is from the Enterprisedb.com blog, and I mentioned a YouTube video about this previously; this is basically a text version of it. It covers the different books along with a synopsis of each. So if you're wanting a more text-based version of that content, you can check out this blog post.
[00:12:20] And lastly is "Routing with PostgreSQL and Crunchy Spatial". This is from Crunchydata.com, and they're discussing the pgRouting extension for doing routing on dynamically generated graphs. So if you're wanting to calculate routes using this in conjunction with PostGIS, definitely check out this blog post.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you could subscribe via YouTube or iTunes. Thanks.