Episode Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about backup compression, Postgres I/O, Parquet files, and pg_stat_monitor. I'm Creston Jamison, and this is Scaling Postgres, episode 214.
[00:00:22] Alright, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is Parallel Server-Side Backup Compression. This is from rhaas.blogspot.com, and he's talking about a new feature that should be coming in Postgres 15 that does server-side compression. If you use pg_basebackup to back up your database, that is essentially a client application; the server side really doesn't do that much other than allow connections from pg_basebackup. The pg_basebackup client essentially backs up and stores the data the way you want it. Maybe you want it compressed, maybe you want it put into a plain format which replicates the existing file structure. So all that configuration happens from the client perspective; not really much happens on the server. This new feature, however, enables you to do compression on the server side as it's sending the data to the client, and to do it in parallel. And that's what this post talks about. Now, in my use case for backing up databases, I typically run pg_basebackup on the actual server that is running the database, so essentially it's just a local connection. But I've never used the compression option in pg_basebackup because it's just too slow. What we do instead is pipe it into pigz, which is a parallel gzip. So I've been using that solution for years, essentially, to use pg_basebackup to back up databases. But this new feature is pretty interesting. He's using a particular data set from a UK land registry, and the database is 3.8 GB, so not too big. He tested two different backup formats using this server-side compression capability, the tar format and the plain format, and he tested a variety of compression methods: none, so no compression; the default gzip method; LZ4; and Zstandard (zstd), which is actually something I've learned Facebook developed a number of years ago, I think around 2016.
[00:02:33] And zstd actually allows you to do things in parallel, much like the parallel gzip that pigz offers, so this might be able to do the whole thing in one go. So, of course, what's most interesting to me is the parallel compression capability, and looking at zstd with eight workers. This means the server takes on some compression responsibilities, so if you're backing up from a separate server, which I don't typically do, there will be load on the database server because it's compressing before it sends the data over to you. Now, looking at this, the compression ratio is about a third: it went from 3.8 GB to 1.3 GB once compressed. That is a very good result. It's better than LZ4, it's even better than gzip. Although I think you can adjust the compression level with zstd, so I'm not quite sure what parameters he set for that. But in terms of timing, you'll see that the tar solution is almost twice as fast as no compression at all. It is 20 times faster than gzip, and it's three times faster than LZ4. So this is a huge improvement from my perspective. Now, what you notice is that the plain format in pg_basebackup takes essentially twice as long as the tar format. The reason is that when you're using server-side compression, the server compresses everything and sends it to the pg_basebackup client. In the tar format, the client simply writes that data to disk. But with the plain format, it actually has to decompress everything and rewrite it out to match the file layout on the server. So essentially it has to do a decompression step, which is why it's twice as slow as just writing a tar file that accepts the data from the network and puts it on disk. But this is a fascinating feature, and it may alter how I'm backing up databases in the future; a rough sketch of both approaches is below. If you want to learn more, definitely check out this blog post.
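To make that concrete, here is a rough sketch of the two approaches mentioned above: the pigz pipe I've been using, and the new Postgres 15 server-side zstd compression. These are not the exact commands from the post; the paths, worker counts, and WAL options are placeholders you'd adjust for your own setup.

```bash
# Client-side approach: stream an uncompressed tar-format backup to stdout
# and compress it in parallel with pigz (parallel gzip) on the client.
pg_basebackup -D - --format=tar --wal-method=fetch | pigz -p 8 > /backups/base.tar.gz

# Postgres 15 approach: compress on the server with zstd using multiple
# workers before the data is sent to the client.
pg_basebackup -D /backups/base --format=tar --compress=server-zstd:workers=8
```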
[00:04:36] The next piece of content is IO and PostgreSQL: Past, Present, Future. This is from the CMU, or Carnegie Mellon, Database Group YouTube channel, and they're talking about I/O in Postgres and its current state. That in and of itself is very interesting, but the second half of the talk is essentially the future, where things are going. With the advent of NVMe drives and asynchronous I/O becoming more prevalent, given the new standards and designs that have become available in the last three or four years, asynchronous I/O has become almost essential for getting great performance out of these drives. So Andres is going through the process of making changes to Postgres to make asynchronous I/O a reality. He did show one slide where checkpointing happened something like ten times faster than normal. Not everything will be that performant, and there are still issues they're dealing with, but this looks like a pretty impressive improvement once it's fully implemented. Now, this is probably not going to be present in version 15, maybe not even version 16; we'll have to see. It's going to take a fair amount of time to make the changes to support this type of asynchronous I/O, but if you want to learn more, I definitely encourage you to check out this YouTube video.
[00:05:58] The next piece of content is Parquet and Postgres in the Data Lake. This is from Crunchydata.com, and this blog post is talking about a data format called Parquet, which I actually wasn't familiar with. They start the post referring to Retool's conversion experience to an up-to-date version of Postgres. We actually covered this in a previous episode of Scaling Postgres, where they were migrating a four terabyte database. If you remember, over half of that database, I believe, was essentially log-related tables of data that really doesn't change much. He represents it here in a graph where you have many terabytes that are relatively static, and then a smaller, say gigabyte-level, amount of dynamic data that is probably changing quickly. Now, what changes in the static part is probably just new inserts into append-only tables. But in terms of conversion, moving a lot of data is much harder than moving a smaller amount of data. So he theorized that if you were to partition this larger data into smaller partitioned tables, it should make it a little easier to work with. Essentially, these are still static, because an append-only table is unchanging and it's just the most recent partition that gets written to. But he theorized this should make a conversion much easier if you ever need to do one. Now, one way he proposed doing this is to place the data in a data lake, basically storing all of these tables in another data source using these Parquet files. What's interesting is that there's actually a foreign data wrapper for Parquet files. So you can go into Postgres, run a query, and it will actually read these Parquet files and return data from them. And apparently these files can also be consumed by R, Python, Go and other types of applications. Now, what makes Parquet files a little different is that they are language independent; they are column oriented, as opposed to row oriented; they are typed; they are binary, which makes them more efficient; and they're also compressed. It's very easy to compress something in a column-oriented format because the data types within a column are all similar. He actually came up with an example of doing this: taking some data from a CSV file, loading it into Postgres, and then converting it into Parquet files. Then he queried the Parquet files, did some comparisons, and the timings weren't significantly bad. Now, this was with a smaller data set, so with large data, say terabyte level, does it work as well? I don't know. And there are a lot of caveats, because the Parquet foreign data wrapper is relatively new, so how well would this work ultimately? But even with this proposal, he mentioned you could instead store the data in a separate database. So going back to the original scenario of migrating a large database: could you move all of this static data into a separate Postgres database, not even using Parquet files, and then use the postgres_fdw foreign data wrapper to query it if you need to? But this was a pretty interesting technique to handle some historical data. So if you're interested in learning more, you can check out this blog post.
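For a sense of what the foreign data wrapper approach looks like, here is a rough sketch of querying a Parquet file from Postgres. This assumes the parquet_fdw extension; the table layout, file path, and option names are illustrative rather than taken from the post.

```sql
-- Assumes the parquet_fdw extension is installed on the server.
CREATE EXTENSION parquet_fdw;

CREATE SERVER parquet_srv FOREIGN DATA WRAPPER parquet_fdw;

-- Map a foreign table onto a Parquet file (columns and path are made up).
CREATE FOREIGN TABLE events_2021 (
    event_time timestamp,
    user_id    bigint,
    payload    text
)
SERVER parquet_srv
OPTIONS (filename '/data/lake/events_2021.parquet');

-- Once defined, the historical data can be queried like any other table.
SELECT user_id, count(*) FROM events_2021 GROUP BY user_id;
```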
The next piece of content is Understand Your PostgreSQL Workloads Better with pg_stat_monitor. This is from Percona.com, and they're talking about the general availability of pg_stat_monitor, which is an extension developed by Percona. Now, this is meant to be an enhancement to pg_stat_statements. pg_stat_statements keeps a record, if you enable it, of all the statements running against your server. But it's a cumulative record of those statements, and while the counters can be reset, therein lies the problem: because it's cumulative, it's hard to trace back to a particular window of time if something happened to your queries, or to correlate it to, say, a deployment that went out. What pg_stat_monitor does is place these metrics in buckets. So as opposed to having something cumulative, it is bucketed by a time frame that you can define. That enables you to correlate, say, a deploy that results in poorer database performance: you can more easily pinpoint when it happened and what change caused it. In addition, it adds a number of columns that aren't present in pg_stat_statements to give more insight. One of the main ones is the client IP, so you know which server is sending which traffic. They have the bucket and the bucket start time, things related to the query itself like the plan ID, the relations and the application name, as well as CPU user time and CPU system time. These are all meant to let you keep better track of what's going on with the statements running against Postgres. And this is designed to work with another solution that stores the information long term. The extension defaults to a maximum of ten buckets, each containing 60 seconds of data, so that's quite short, essentially ten minutes of retention, but the intention is to send the data to a longer-term storage solution elsewhere, so you're not storing all of this generated data on the Postgres database itself. And they do have their own monitoring solution that you can tie this into, of course. But this extension is something that can help you bucket your statements, as opposed to having one cumulative set of metrics in pg_stat_statements. Now, there are some comments on the post where some people ran into issues using it, so you may want to review those before you start implementing it. Related to this, Timescale.com posted an article, Point-in-Time PostgreSQL Database and Query Monitoring with pg_stat_statements. They're essentially doing the same thing: they want to bucket pg_stat_statements, and they're using their TimescaleDB extension to do it, creating hypertables to store the data, so essentially partitioned data, and instead of calling them buckets, they're calling them snapshots. So you're taking periodic snapshots of the pg_stat_statements view and storing them in these hypertables, or partitioning scheme. Now, you need the TimescaleDB extension to follow their process, but you could also just use partitioned tables in stock Postgres; I don't think you need the TimescaleDB extension to implement something similar. They do have all the code here for how you could do this yourself, as opposed to using the extension mentioned in the previous article, and a minimal sketch of the snapshot idea is shown below. But if you want to learn more, definitely check out this blog post.
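Here is a minimal sketch of that snapshot idea in stock Postgres: periodically copy pg_stat_statements into a history table so you can compare points in time. The table name and the subset of columns are my own choices, not code from either article.

```sql
-- History table for periodic snapshots of pg_stat_statements.
CREATE TABLE statement_snapshots (
    captured_at      timestamptz NOT NULL DEFAULT now(),
    queryid          bigint,
    query            text,
    calls            bigint,
    total_exec_time  double precision  -- column name as of Postgres 13+
);

-- Run this on a schedule (cron, pg_cron, etc.) to take a snapshot.
INSERT INTO statement_snapshots (queryid, query, calls, total_exec_time)
SELECT queryid, query, calls, total_exec_time
FROM pg_stat_statements;
```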
[00:12:50] Next piece of content, PostgreSQL 15: Using MERGE in SQL. This is from Cybertec-postgresql.com, and they're talking about the new MERGE statement coming in Postgres 15. They have an example here where they created a table and inserted some data into it using generate_series, so they have IDs one through ten with values ten to 100, ten times the ID value. Then they use the MERGE command to merge in a particular query. The set of data they're generating is eight rows, because it goes up to 16 but skips every other value, so essentially just the odd numbers. So eight rows, and you can see it says MERGE 8 down here. They're using the same IDs, and then a random number times 1,000. Now, they're matching it against the table using this ON clause, so they're matching on the ID, and they're saying when the data from the query matches what's in the table, then update the row and set the value to this random number; when it's not matched, then insert that value into the table. As you can see, for the odd numbers that already exist it does the update, but once it hits eleven there's nothing left to match, so it just inserts those values instead: it inserts 11, 12, and 13. Now, in addition to doing inserts and updates, you can also do deletes. They show that example here: the same kind of query run against the same table, but when it's matched, delete the row. So essentially all the odd rows are now gone. But you'll notice 14 is a new value: it still needed to merge eight rows, and there wasn't a 14th value, so it went ahead and inserted that 14th value. But this is a good blog post that covers the new MERGE command coming in Postgres 15.
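Here's a simplified version of that MERGE pattern so you can see the shape of the statement. The table, column names, and numbers are illustrative; this is not the exact code from the Cybertec post.

```sql
-- Target table: IDs 1-10 with values ten times the ID.
CREATE TABLE t_demo (id int PRIMARY KEY, value numeric);
INSERT INTO t_demo SELECT i, i * 10 FROM generate_series(1, 10) AS i;

-- Merge in eight rows of odd IDs (1, 3, ..., 15) with random values:
-- existing IDs get updated, missing IDs get inserted.
MERGE INTO t_demo AS t
USING (SELECT i AS id, random() * 1000 AS value
       FROM generate_series(1, 16, 2) AS i) AS src
ON t.id = src.id
WHEN MATCHED THEN
    UPDATE SET value = src.value
WHEN NOT MATCHED THEN
    INSERT (id, value) VALUES (src.id, src.value);

-- The delete variant simply swaps the matched action:
--   WHEN MATCHED THEN DELETE
```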
[00:14:54] Next piece of content, It's Beer o'Clock for PostgreSQL. This is from ads' corner, and he's talking about the now() function: when you use it, it's going to say "it's beer o'clock" instead of giving you the time. So how do you implement something like this? He explains that functions in PostgreSQL can be overloaded, but you can also have functions with the same name in different schemas. So he's going to use this to create a function in a schema and then alter the search path, which determines which schema gets consulted first when you execute a command and therefore which function gets run. Built-in functions like now() live in pg_catalog, and by default pg_catalog is first in the search path. So what he needed to do is put the public schema ahead of the pg_catalog schema in the search path. He did that for the session: SET search_path TO public, pg_catalog. Now, at this point, he can create his function. He created a now() function, so it has the same name as the existing now(), but he also added a parameter where you can change your beverage, and essentially it returns "it's", then whatever beverage you select, then "o'clock". By default, it's going to say it's beer o'clock. So now, when you do SELECT now(), it'll say "it's beer o'clock", or you could do SELECT now('whiskey') and get "it's whiskey o'clock". So this is a quick little post about how you can override functions in Postgres, and it gives you insight into the search path as well as creating functions in different schemas. But if you want to learn more, definitely check out this blog post.
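Here's a rough reconstruction of the trick, not the exact code from the post. This sketch uses two overloads rather than a default parameter, and the function bodies are just for fun.

```sql
-- Put public ahead of pg_catalog so our functions shadow the built-in now().
SET search_path TO public, pg_catalog;

-- Zero-argument version: the default beverage.
CREATE FUNCTION public.now() RETURNS text AS
$$ SELECT 'It''s beer o''clock' $$ LANGUAGE sql;

-- Overload that lets you pick the beverage.
CREATE FUNCTION public.now(beverage text) RETURNS text AS
$$ SELECT format('It''s %s o''clock', beverage) $$ LANGUAGE sql;

SELECT now();            -- It's beer o'clock
SELECT now('whiskey');   -- It's whiskey o'clock
```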
[00:16:33] The next piece of content is Atomic Commit with Two-Phase Commit in Foreign Data Wrapper Distributed Setup. This is from Highgo.ca, and they're talking about the scenario where you have a coordinator database and you want to distribute your data across multiple nodes. So part of the data resides on distributed node one, part on another node, and part on another. When you want to do an update that impacts multiple distributed nodes, how do you coordinate that? If you do it very simplistically, sending an update from the coordinator node to the distributed nodes and then sending the commits, it's possible that one node doesn't commit because of a network issue or some other problem with a host. Now you have data committed here and not committed there, and that could cause a big problem with the consistency of the database. One way around that is to use a two-phase commit. Basically, you do the updates on all the nodes and then you send a prepare transaction. If one node fails to prepare, it's easy: you can just roll everything back, no harm, no fuss. The issue comes when all of them did prepare, so all three distributed nodes got the update and prepared the transaction, but at the point that you commit, one of the prepared nodes fails to commit. Again, this leaves you in an inconsistent state. The way they're planning to handle this is to send a commit warning message and rely on a global transaction manager to keep track of these issues, because once that node comes back online, it should be possible to auto-commit the transaction: the node does have the information needed to commit it, it just wasn't done for some reason. So there is a way to handle this scenario, and the SQL building blocks involved are sketched below. If you want to learn more about this, definitely check out this blog post.
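For reference, these are the SQL-level primitives the two-phase approach is built on, a minimal sketch rather than the coordinator logic from the post. The transaction identifier and the table are made up, and each participating node needs max_prepared_transactions set above zero.

```sql
-- Phase one, run on each participating node: do the work, then prepare.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- illustrative table
PREPARE TRANSACTION 'txn_42';

-- Phase two, once every node has prepared successfully:
COMMIT PREPARED 'txn_42';

-- Or, if any node failed to prepare, roll back on the nodes that did:
ROLLBACK PREPARED 'txn_42';
```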
[00:18:29] The next piece of content: the PostgreSQL person of the week is Abbas. If you want to learn more about Abbas and his contributions to Postgres, definitely check out this blog post. And the last piece of content: we did have another episode of the Rubber Duck Dev Show this past Wednesday evening. This one was on the Jira/Confluence incident, where they were down for many days for a subset of their customers. We discussed ways this could perhaps have been handled better or potentially avoided entirely. So if you're interested in that type of content, we definitely welcome you to check out our show.
[00:19:05] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.