Faster Paging? | Scaling Postgres 325

Episode 325 | July 21, 2024 | 00:20:07

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss more ways to keep the superior performance of keyset pagination, how to implement UUIDv7 in SQL functions, how expensive extended statistics are and the benefits of range columns.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/325-faster-paging/

Want to learn more about Postgres performance? Join my FREE training called Postgres Performance Demystified here: https://www.scalingpostgres.com/courses/postgres-performance-demystified/


Episode Transcript

[00:00:00] I think the user interface pattern I dislike the most for web applications is a paging interface. I feel it runs counter to what a web page typically is, where you just scroll down to get more information. [00:00:16] Now, these paging interfaces typically use a query to get the total count of the records that need to be displayed and then use an offset in order to bring up that specific page. Well, as your record set grows, the performance just gets worse and worse. But our first blog post has some ways around that. But I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "Keyset pagination with descending order". This is from cybertec-postgresql.com. And this bit is from Markus Winand, apparently with a copyright of 2014: [00:00:55] do not use OFFSET for pagination. So basically he endorses definitely using keyset pagination, but that comes from a different site; the author of this article is actually Laurenz from Cybertec. This blog post describes how to do keyset pagination when you have a specific ordering you want for how you're displaying the data. So first he creates a table that has an integer value, a timestamp value and a text value in it, and he generates some pseudo-random data to store in it. To do keyset pagination, you select the data you want from the table, order it by the values you want to appear, and limit it to 50. So that's the first 50 results. Then for every subsequent query, you take the last row that was returned and use it in the next query to say: give me the same data, but where the values are greater than the last record you gave me. That's pretty much the only difference between the two queries: the second one has a where clause saying give me everything greater than the last record retrieved in the first result. That allows you to page through the data, and as long as you have a multicolumn index on this essentially composite key, it will return super quickly, no matter the size of the data. It is also possible to do this in a descending fashion: you just change your ORDER BY to be descending and then adjust your where clause so that it's still taking the last row and using it for comparison purposes. The problem comes when you have a mixed order. So for example, here you're sorting ascending on value three, descending on value one, and then the id. If you put these columns into the comparison, what are you supposed to use? It doesn't really work in this case. So you might be thinking, well, you can't do keyset pagination, but he has some inventive ways to get around that. The first is, if your value is an integer, you actually use the negative of the integer. So you're ordering the same way as above, but instead of ordering by value one descending, you order by the negative of value one. Your next page still orders by that, but you specify the where clause with the negative of the value. Now, he says you do of course need to create an index with the negative in it, but this should work without a problem. So that's great for integers. What about timestamps? How do you do a negative of a timestamp? He says what you can do is convert it to an epoch, which is basically the number of seconds since, I think, 1970, and then take the negative of that.
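To make the shape of that concrete, here is a minimal sketch with a table and column names of my own invention (the article's schema, index definitions and exact queries differ); the literals in the where clauses stand in for the last row of the previous page:

    -- Hypothetical table roughly matching the article's description
    CREATE TABLE sortme (
        id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        val1 integer     NOT NULL,
        val2 timestamptz NOT NULL,
        val3 text        NOT NULL
    );

    -- A composite index matching the sort order makes each page an index range scan
    CREATE INDEX ON sortme (val3, val1, id);

    -- First page
    SELECT * FROM sortme ORDER BY val3, val1, id LIMIT 50;

    -- Next page: row comparison against the last row returned by the previous page
    SELECT * FROM sortme
    WHERE (val3, val1, id) > ('last val3', 42, 12345)
    ORDER BY val3, val1, id
    LIMIT 50;

    -- Mixed order (val3 ASC, val1 DESC): negate the integer so a single ascending
    -- row comparison and a matching expression index still work
    CREATE INDEX ON sortme (val3, (-val1), id);

    SELECT * FROM sortme
    WHERE (val3, -val1, id) > ('last val3', -42, 12345)
    ORDER BY val3, (-val1), id
    LIMIT 50;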
Now, he says you are going to run into an issue if you try to index that epoch expression, because the index expression is not considered immutable. So you actually need to create a function that is declared immutable and converts the timestamp to the epoch. With that, you can use the negative of this function in the index, and it must also be used in the queries; then your keyset pagination will use the index. He says another way you can do this, other than using a negative, is splitting the query into multiple components. I'm not going to go into all the detail here, but basically you do three comparisons for the three different columns, which on its own is not going to be very performant. What you can do instead is rewrite it as a UNION ALL, where you say: give me every record where the first order-by column is greater than its last value; then give me all the records where that column equals its last value but the second column is less than its last value; and then give me all records where both of those columns equal their last values and the id is greater than the last id. These are all combined with UNION ALL, with a specified order, limited to 50 records, and that gives you the full set of the next 50 records. And he says, quote, this looks ugly, but it's definitely performant. Now, he says there is a third way, and that's defining a custom data type. But this starts getting super complicated; you can see all the different work that has to be done to get it working. He was able to do it successfully, but even he says the third trick is cute, but perhaps too much black magic for practical use. So it seems like you may have to be a little bit desperate before you would try that option. But these are some great solutions to still use keyset pagination even when you need a very specific sort for your data. And if you want to learn more, you can check out this blog post.

[00:05:46] Next piece of content: "Implementing UUIDs version 7 in pure SQL". This is from postgresql.verite.pro, and he says that as of May 2024 these UUIDs are part of an IETF standard, RFC 9562. This basically gets us timestamp-ordered UUIDs, which should be much better to work with in a database environment, where a totally random UUID can cause a lot of issues with random writes. He shows the structure here: the timestamp component is the very first part, then there's the version, a random A and a random B area. So the timestamp is at the beginning and the randomization is at the end. Now, in terms of generating these, I'm going to skip to the bottom real quick: he has a git repository, postgres uuidv7 sql, that has all of the functions he discusses here, which you can just run in your own system and start using with Postgres. You don't have to wait for a Postgres feature, because these functions simply adhere to the standard that was written, so they should work with any standards-compliant UUIDv7 that's created. So what are some of these functions? The first one just returns a UUID version 7, so this allows you to create UUIDv7s in your database.
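As a rough usage sketch, assuming you have loaded the functions from that repository into your database (the uuidv7() name here is taken from the post's description and may differ from what you end up installing), a time-ordered primary key default might look like this:

    -- Hypothetical table; assumes a uuidv7() SQL function is already installed
    CREATE TABLE events (
        id         uuid PRIMARY KEY DEFAULT uuidv7(),
        payload    jsonb NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- New ids sort roughly by insertion time, so primary key writes are less random
    INSERT INTO events (payload) VALUES ('{"kind": "signup"}');
    SELECT id, created_at FROM events ORDER BY id;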
He does make a note on timestamp precision, because right now it's at millisecond precision. If the order of the UUIDs is super important and you're creating them rather fast, like more than 1,000 per second, you're going to see an issue where they are not time-ordered. For example, if you order the UUIDs here, you'll see the row numbers are all over the place. They're not in time order, because the sort starts using the random component of the UUID. [00:07:42] But what you can do is trade away some of that randomness: you sacrifice the random A section to be part of the timestamp, so you get down to microsecond precision, which should be basically equivalent to the clock timestamp in Postgres. He has a function for this, a uuidv7 sub-ms (sub-millisecond) variant. As you can see, when you order by this one, all the row numbers are in the order of their insertion, because more of the UUID is used for time in exchange for being a little less random. So if randomness is less important to you but the time component is super important, you could choose this option to generate the v7 UUIDs, and he says the standard does take this into account as an option. He also includes functions that help you do partitioning by the UUID: he has a UUIDv7 boundary function that lets you partition by a range, essentially like per-month partition tables. And lastly, there's a function that allows you to extract the timestamp from the UUID. So this is great. I highly suggest you bookmark this post or the GitHub repository so you have access to these functions and can start using them. I know it's something I'm going to be doing.

[00:08:59] Next piece of content: "How expensive is it to maintain extended statistics?" This is from danolivo.substack.com. Last week he had a post where he talked about using a tool to automatically generate extended statistics, and he says the question then becomes, quote, what if it's too computationally demanding to keep statistics fresh, because then no one would want to do it. So he decided to analyze how long it takes to generate some of these statistics. He created a table with eight integer columns, because that's the maximum number of columns you can collect extended statistics for, and he tried several different statistics variations: create statistics on some number of columns, then analyze the table and measure the performance. [00:09:52] The first row here is just plain statistics: two columns is 250 milliseconds, four columns is 285, eight columns is 367. So pretty fast. But what about extended statistics? If you just did ndistinct, it was 300, 484 and then 7,500 milliseconds. So the two- and four-column cases were pretty quick, but the eight-column case definitely suffered a lot in performance. If you did ndistinct and MCV, again the two- and four-column cases weren't too terrible, but the eight-column case was still bad. And when you add dependencies as well, the four-column case started getting a little worse and the eight-column case was 38,000 milliseconds. So the more columns you have to analyze, the longer it's going to take. But again, with these extended statistics the plan is to create them for indexes, and I haven't seen that many indexes with four or more columns in them. I mean, I have seen it, but when I create indexes I generally like to keep it to three columns or fewer, maybe four if I need to push it. So he had some ideas around this, and he says maybe we could add some options to CREATE STATISTICS so that we could minimize the number of combinations.
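For reference, here is a hedged sketch of the kind of statistics objects whose maintenance cost he is measuring; the table, column and statistics names are placeholders of mine, not the post's, and the cost shows up when the table is analyzed:

    -- Illustrative table: the post used eight integer columns
    CREATE TABLE stats_test (c1 int, c2 int, c3 int, c4 int,
                             c5 int, c6 int, c7 int, c8 int);

    -- Extended statistics of the kinds discussed, on two, four and eight columns
    CREATE STATISTICS s2 (ndistinct) ON c1, c2 FROM stats_test;
    CREATE STATISTICS s4 (ndistinct, mcv) ON c1, c2, c3, c4 FROM stats_test;
    CREATE STATISTICS s8 (ndistinct, mcv, dependencies)
        ON c1, c2, c3, c4, c5, c6, c7, c8 FROM stats_test;

    -- The maintenance cost being benchmarked is paid here
    ANALYZE stats_test;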
So, for example, the reason this takes so long is that all combinations of the columns, in different orders, are considered when collecting the statistics. He added an option that basically keeps things linear, so only one combination is assessed. This is something he's exploring to see what changes could be made to optimize what gets created, and how, so that the performance burden of maintaining extended statistics is as small as possible. So check out this blog post if you want to learn more.

[00:11:42] Next piece of content: "Simplifying time-based queries with range columns". This is from supabase.com, and they're talking about when you have something like event data, or maybe some object like a room that people may want to reserve. Historically, the main way to do it was to create two columns in your table: a started_at or start_at, and then an end_at or stopped_at or something similar. But this makes some queries you want to do a lot more complex, and it can interfere with data integrity, because making sure things don't overlap becomes a really hard problem and a lot of application code is required. Using a range type makes this much easier. So instead of a start_at and an end_at, you could have, say, a duration column with a timestamp range that defines the beginning and the end, and you can write queries that say: find the durations that overlap with the specific duration I'm looking for. Further, it allows you to add constraints to the table so you make sure you don't insert overlapping values. Now, it's not too useful to have a constraint preventing any date overlap across the whole table; it's more advantageous when you apply it per object. So in this case, he's looking at the reservations table, and he wants to make sure reservations for a given table don't overlap in time. Both columns need to be taken into account, and you can do this with an exclusion constraint: basically EXCLUDE USING gist, where the table id is equal and the durations overlap, so overlapping reservations for the same table are rejected. And to do this you do need the btree_gist extension. So this is a pretty quick post, but if you're dealing with some sort of object that you need to check for overlaps on, range types are a great way to do that.
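As a minimal sketch of that reservations pattern, based on the description above (the post's actual schema may differ):

    -- btree_gist lets the gist exclusion constraint also compare the scalar table_id
    CREATE EXTENSION IF NOT EXISTS btree_gist;

    CREATE TABLE reservations (
        id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        table_id bigint NOT NULL,
        duration tstzrange NOT NULL,
        -- reject any two reservations for the same table whose durations overlap
        EXCLUDE USING gist (table_id WITH =, duration WITH &&)
    );

    -- Overlap checks become a single range operator
    SELECT * FROM reservations
    WHERE table_id = 1
      AND duration && tstzrange('2024-07-21 18:00+00', '2024-07-21 20:00+00');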
[00:13:35] Next piece of content: "Basics of tuning checkpoints". This is from enterprisedb.com. First of all, he has a good discussion of checkpoints and why they're important. Basically, when Postgres is operating, it commits all writes to the write-ahead log (WAL) and applies the data changes in memory. At some point that memory gets written to the actual database files, and when that happens a record is inserted into the WAL saying: okay, as of this point in the WAL, all the changes have been committed to the files. That way, if you need to do a restore or the database system crashes, it knows it can restore the data files and then replay the WAL from that point forward to bring the database up successfully. [00:14:21] Then he talks about checkpoint tuning, and the most important value for that is basically your checkpoint timeout. By default the checkpoint timeout is five minutes. The shorter your checkpoints, the more load they create on the system; but the longer your checkpoints, the longer a recovery will take to get the database back up. Usually the sweet spot I use is 15 to 30 minutes. He uses 30 minutes here, although further down he says 30 to 60 minutes. So I think the sweet spot is around that 30-minute mark; you could go a little shorter or a little longer, based upon whether you want a fast recovery or want to minimize the overhead of checkpoints. Now, that's not the only configuration option that governs checkpoints. There's also max WAL size, because if the WAL reaches a certain size, that also triggers a checkpoint, so checkpoints could be happening more often than that 30-minute limit. He describes a way to estimate how much WAL size you potentially need and does different analyses. What I don't see here is a discussion of the checkpoint warning parameter. What I like to do is set max WAL size from a gut perspective, based upon how much WAL there is likely to be (you could use a more accurate method), and then set checkpoint warning close to the checkpoint timeout, maybe five minutes less. If the timeout is 30 minutes, I set the warning at 25 minutes, and then you check the logs: it logs a warning if the max WAL size is hit before the checkpoint timeout, so it's a way for you to know that maybe you need to increase your max WAL size. If you're seeing this warning a lot, it basically means you need to bump up your max WAL size a fair amount. If you don't see it at all, maybe your max WAL size is too high and you can try reducing it. Once you start to see it once or a few times, maybe that's the right point, or you could increase it just a little from there. Personally, that's how I like to adjust the max WAL size. But this is a pretty comprehensive blog post I encourage you to check out.

The next one is also from Tomas: "Autovacuum tuning basics", from enterprisedb.com. He talks all about what vacuum is, why it exists, how to monitor it, and then how to tune it, covering the different thresholds and scale factors you can set to make sure your tables are well vacuumed, as well as making sure you're not throttling autovacuum too much, because the default values are really too low, like the vacuum cost limit. For production databases, I usually have to set it somewhere in the range of at least 2,000. He also mentions being careful about adding too many autovacuum workers, because the throttles I just mentioned apply to the whole pool of autovacuum workers. So if you add, say, ten autovacuum workers, each worker will work more slowly because they share that common throttle pool; fewer autovacuum workers can actually get more work done. Lastly, he also discusses whether you should make these autovacuum setting changes at a global level or per table. He prefers doing it at the global database level, and per table only if you need to, and I definitely agree with that assessment. But check this out if you want to learn more.
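To make those knobs concrete, here is a hedged sketch of setting the values discussed in these two posts; the numbers are only the ballpark figures mentioned above plus a guessed max_wal_size, not recommendations for any particular system:

    -- Ballpark values from the discussion above; tune for your own workload
    ALTER SYSTEM SET checkpoint_timeout = '30min';
    ALTER SYSTEM SET checkpoint_warning = '25min';  -- warns when WAL growth forces earlier checkpoints
    ALTER SYSTEM SET max_wal_size = '8GB';          -- hypothetical starting point; watch for the warnings
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
    SELECT pg_reload_conf();                        -- these settings take effect on reload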
Next piece of content: "CloudNativePG benchmarking". This is from dbi-services.com, and it's another blog post by Daniel Westermann covering CloudNativePG, talking about how to benchmark its performance. So check this out if you want to learn more about that.

Next piece of content: "System logs aggregation with Postgres". This is from Tuxnet Dev, and basically, under the heading of "just use Postgres for everything", he wants to use it as a log aggregator. This is a pretty long post, and he goes through how he sets up the database, his schemas, and different users for each of the systems that will be logging data. But the real key for me is the rsyslog-pgsql package; this is what allows rsyslog on a node to log into a Postgres database. So if you want to use Postgres for everything, including log aggregation, this blog post can give you a head start on that.

Last piece of content: there was another episode of Postgres FM last week. This one was on the Postgres startup ecosystem, so Nikolay and Michael discussed all the different businesses and service providers doing Postgres nowadays. If you look at the links here, it's a sea of different companies, some self-funded, some VC-funded. In the last few years the ecosystem has definitely grown dramatically, just judging by the number of blog posts I'm covering per week. I used to be able to do a 10-15 minute show and cover every blog post; now I'm lucky if I get through half of them. But if you want to learn more, you can definitely check out this piece of content.

[00:19:41] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links for all the content mentioned, as well as sign up to receive weekly notifications of each episode. There's also an audio version of the show on the site, as well as a full transcript. Thanks, and I'll see you next week.

Other Episodes

Episode 124

July 27, 2020 00:15:56

Partition Migration, Like Performance, Best Fillfactor, In-Memory Tables | Scaling Postgres 124

In this episode of Scaling Postgres, we discuss how best to migrate to using partitions, like & ilike performance, determining the best fillfactor and...

Episode 202

February 14, 2022 00:12:49

New Postgres Releases, A Hairy Incident, Slow Down To Go Faster, Loadable Module Archiving | Scaling Postgres 202

In this episode of Scaling Postgres, we discuss the new releases of Postgres, a hairy upgrade incident, why slowing down can make you go...

Episode 251

February 05, 2023 00:16:56

Filter Clause, Hypothetical Index, Non-Relational Data, Using TOAST | Scaling Postgres 251

In this episode of Scaling Postgres, we discuss how to use the filter clause, create hypothetical indexes, store non-relational data and manage TOAST. To...
