Seven Years of Scaling Postgres | Scaling Postgres 350

Episode 350 | January 26, 2025 | 00:19:09

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss our seven-year anniversary, better pg_search performance, reducing table locks, NOT VALID constraints, and better GROUP BY statistics.

To get the show notes as well as get notified of new episodes, visit: 
https://www.scalingpostgres.com/episodes/350-seven-years-of-scaling-postgres/

Want to learn more about Postgres performance? Join my FREE training called Postgres Performance Demystified here: https://www.scalingpostgres.com/courses/postgres-performance-demystified/

Episode Transcript

[00:00:00] Welcome to episode 350 of Scaling Postgres. Because I usually take a week or two off every year, that's about 50 episodes a year, so essentially that's seven years of Scaling Postgres. I don't know if I have viewers or listeners who have watched all seven years, but if you have, kudos to you, and I want to thank everyone who tunes in every week to check out episodes, even if you may miss an episode here or there. Also, I want to thank everyone for the comments on last week's show expressing well wishes and hoping I'd get better soon. Hopefully I sound better this week. I do feel a lot better, and thank you for all the kind words.

Since it's been seven years and 350 episodes, I thought it would be a good opportunity to introduce a little something new to the show. Nothing major. We're still covering 10 or so blog posts that I prioritize by the ones I like the best or that have the most to do with scaling Postgres. But that also means that at the end you're looking at blog posts I ranked 8, 9 or 10, so maybe people tend to drop off. What I want to do at the end of the show is add something I'm calling Consulting Corner. Basically, I'll take a minute or two at the end of the show and describe what I've been doing in my consulting. It might be something I just discovered while working with someone, or I may just give a report on the kind of things I've been working on that week. That starts this week, and it will be at the end, after all the blog posts are covered. But I hope you, your friends, family and coworkers continue to do well.

Our first piece of content is "A new Postgres block storage layout for full text search". This is from paradedb.com, and if you're not familiar with it, this is a new (or newish) extension called pg_search, where they actually embed an Elasticsearch-like search engine within Postgres. That search engine is called Tantivy, and I think it's written in Rust, but it's basically Elasticsearch-like, so you get Elasticsearch-style search inside Postgres. They've recently done an enhancement in the most recent version where they are now using Postgres block storage, essentially using heap storage for working with files. It used to be independent, but now that it's using the block storage, the heap storage of Postgres, it gives them a number of advantages. First, they can now leverage the write-ahead log, which allows physical streaming replication and point-in-time recovery from crashes. It now supports Postgres MVCC and also utilizes the Postgres buffer cache. Some of the results are that index creation time looks to be three times faster now, and insert throughput seems to be about six to seven times faster, although the transactions per second for inserts is only 170, so that's still pretty slow, I think. I'm not quite sure why that's so slow. If you happen to know, feel free to put it in the comment section. They go on to talk about block storage, a little bit about pg_search, which I just mentioned, and why they wanted to migrate to it. Basically, it looks like the big reasons were being able to use the buffer cache, a simple battle-tested API to write buffers to the WAL (so that's not something they had to reinvent), and the creation and cleanup of files being handled for them, so again not something they had to do programmatically.
They then go into a lot of detail on how Tantivy does its file-based index layout, how they mapped it to the Postgres block storage layout, and some of the challenges they had to overcome, such as large files spilling over a single block, which they handled with a linked list of blocks. Challenge number two was the fact that blocks cannot be memory-mapped, which is something Tantivy tends to do. And then the segment count that Tantivy relies on can explode in update-heavy scenarios. So they talk about the different resolutions they came up with to get around these challenges. I've always been interested in this extension. I still haven't had the opportunity to try it yet, but if you're looking for Elasticsearch-like search and don't want to implement Elasticsearch, maybe you want to give pg_search a try.

[00:04:34] Next piece of content: "Anatomy of table-level locks: reducing locking impact". This is from xata.io. This is a follow-up to a post we talked about last week, and it goes more into table locking.

[00:04:48] Another way you can avoid table locking is using CONCURRENTLY commands where they exist. For example, there's CREATE INDEX CONCURRENTLY: you're going to always want to do that to avoid excessive table locks. Similarly, if you are working with partitioned tables and need to detach one, you always want to try to alter the table and detach it concurrently. Now, the downside of these commands is that they take longer to complete, because they actually go through multiple phases. They are non-transactional, so they can't be in a transaction block and can't be rolled back, and they do require additional care to handle failures. For example, if a CREATE INDEX CONCURRENTLY fails, that invalid index is left hanging out there, and you need to drop it concurrently and then try adding it back again.

Then they talk about operations that you need to do but that you know are going to be very taxing on the database or could lock a table. Well, you can take some of the techniques used in CONCURRENTLY and apply migrations in a stepwise pattern to get those implemented. One example they use here is adding a new column to a table, making it NOT NULL, and making the default clock_timestamp(). This requires rewriting every row of the table, because this function basically needs to be executed for every row. So that's going to take a very long ACCESS EXCLUSIVE lock, and basically your table is locked while it's adding this new column and setting every value for it, essentially rewriting the whole table. Instead of doing that, you can break it into three less-blocking steps. First, you add the column allowing nulls and set the new default to clock_timestamp(). That does require a very brief lock, but as long as you're using a lock timeout, it shouldn't be a problem. Then you update every row in the table with that new clock timestamp, and this is something that should be done in batches, so you minimize the number of rows you're locking each time you run the command. And lastly, you alter the column and set it to NOT NULL. This process is far less blocking than the previous single command. They also mention that they have an extension called pgroll that can automate some of these migration changes for you, so you can check that out if you're interested.
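None of this is quoted from the post; it's just a minimal sketch of the CONCURRENTLY commands described above, using hypothetical table and index names:

```sql
-- Build an index without holding a long table lock.
-- CONCURRENTLY commands cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_measurements_recorded_at
    ON measurements (recorded_at);

-- If the build fails partway, an INVALID index is left behind;
-- drop it concurrently and retry the CREATE INDEX.
DROP INDEX CONCURRENTLY idx_measurements_recorded_at;

-- Detach a partition without blocking the whole partitioned table (Postgres 14+).
ALTER TABLE measurements DETACH PARTITION measurements_2024_01 CONCURRENTLY;
```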
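And here is a rough sketch of the three-step, less-blocking migration described above, with a hypothetical orders table. The post's exact statements may differ; one common variation, used here, is to attach the default in a separate statement so the column addition itself never triggers a rewrite:

```sql
-- Step 1: add the column as nullable, then attach the volatile default separately.
-- ALTER COLUMN ... SET DEFAULT only affects future inserts, so neither statement
-- rewrites the table; each needs only a brief ACCESS EXCLUSIVE lock, which the
-- lock_timeout keeps from queueing behind long-running queries.
SET lock_timeout = '1s';
ALTER TABLE orders ADD COLUMN created_at timestamptz;
ALTER TABLE orders ALTER COLUMN created_at SET DEFAULT clock_timestamp();

-- Step 2: backfill existing rows in small batches so each statement locks few rows.
UPDATE orders
   SET created_at = clock_timestamp()
 WHERE id IN (SELECT id FROM orders WHERE created_at IS NULL LIMIT 10000);
-- ...repeat until it reports UPDATE 0.

-- Step 3: once the backfill is done, enforce NOT NULL.
-- (This still scans the table under a lock; the NOT VALID constraint trick
-- covered next avoids that.)
ALTER TABLE orders ALTER COLUMN created_at SET NOT NULL;
```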
Now, they do say that the last step in that approach still blocks writes when you're adding the NOT NULL to that column. An alternative is to use check constraints. You can add a check constraint to make sure that something is not null, but in order to prevent blocking, you want to add it as NOT VALID. What that means is that, moving forward, the constraint will be evaluated for new data changes. In the next step, you validate that constraint for all the old data. Then, optionally, as the last two steps, you can alter the column to set it NOT NULL and drop the constraint. So those are ways to be less blocking but still make the schema changes you want to make. They also note that the only reason adding the column with a default required a table rewrite is because of the function clock_timestamp(). If you were adding a constant, Postgres can handle this just fine: you can simply add the column and it's not going to cause issues, although it still needs a very brief ACCESS EXCLUSIVE lock. But as long as you're using a lock timeout, you shouldn't run into issues with it. If you want to learn more, definitely check out this blog post.

[00:08:26] Next piece of content: there was another episode of Postgres FM last week. This one was on NOT VALID constraints, which is very similar to the blog post we just discussed. The first thing Nikolai and Michael said when they started discussing this is that they're generally of the opinion that constraint validation is a good thing to have in your database. Some people advocate doing it at the application level, but really the database is probably the best place to handle it, because it's closest to your actual data. They also talked about NOT NULL constraints, and I think Nikolai in particular was saying, well, if we have check constraints we don't really need NOT NULL. But they acknowledge it is part of the SQL standard, so it doesn't look like that's necessarily going anywhere. Still, adding some type of constraint to a table is a blocking operation, so of course the recommendation is to use NOT VALID when adding that constraint. That means you just have to worry about the lock acquisition, so as long as you're able to acquire the lock relatively quickly, you can easily add a constraint as NOT VALID. Again, just make sure to use a lock timeout and retry if it happens to fail. Once that NOT VALID constraint is in place, your data is protected for any new changes that happen, but to apply it to the old data, you then need to validate the constraint, as was shown in the previous blog post. They actually had a discussion about the best way to go about this, and I think where they ended up is that first you want to make sure your application is not generating any more invalid data. So before even applying the constraint, even in a NOT VALID state, you probably want to verify that the column is as expected, correct any data that's there, and make sure the application isn't generating any invalid data in that column. Only then should you apply the constraint in a NOT VALID state and then validate it. That should allow you to avoid any unnecessary application errors. If you want to learn more, you can listen to the episode here or watch the YouTube video down here.
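To make that concrete, here's a minimal sketch of the NOT VALID pattern that both the blog post and the podcast episode describe, reusing the hypothetical orders table from above:

```sql
-- Add the constraint without checking existing rows. Only a brief lock is needed,
-- and from here on, new and updated rows must satisfy it.
SET lock_timeout = '1s';
ALTER TABLE orders
    ADD CONSTRAINT orders_created_at_not_null
    CHECK (created_at IS NOT NULL) NOT VALID;

-- Check the existing rows. VALIDATE CONSTRAINT takes only a SHARE UPDATE EXCLUSIVE
-- lock, so normal reads and writes continue while it scans the table.
ALTER TABLE orders VALIDATE CONSTRAINT orders_created_at_not_null;

-- Optionally (Postgres 12+): with a validated check constraint in place,
-- SET NOT NULL can skip its own full-table scan, and the check constraint
-- can then be dropped.
ALTER TABLE orders ALTER COLUMN created_at SET NOT NULL;
ALTER TABLE orders DROP CONSTRAINT orders_created_at_not_null;
```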
Next piece of content: "GROUP BY: fixing optimizer estimates". This is from cybertec-postgresql.com, and he's talking about an issue where, when you have more than one column in the GROUP BY of a query, the number of rows estimated by Postgres is often overestimated because of how it does its calculations: statistics are only collected per column. He shows some of the math here when looking at distinct columns; in his example the planner expected to find 240,000 rows, whereas the actual number of rows from the query was 60,000. The way to get closer estimates is to use CREATE STATISTICS. In this case, because of how he's doing the GROUP BY and how the query is structured, he created an ndistinct statistic across all the columns that were part of his GROUP BY. Once he applied these statistics and analyzed the table again, he got much closer estimates: instead of over 200,000, it's now much closer to the 60,000 rows that are actually there. So if you're ever running a query, you see the row estimates are really far off, and you're using GROUP BY, you may want to use CREATE STATISTICS to try to get better estimates. Although you do have to keep in mind that sometimes a change like this doesn't actually improve query performance: in his example, the runtime of the query was exactly the same no matter what the estimate was, because there was no plan change. So evaluate whether adding statistics actually gets you a plan change and makes performance better. But if you want to learn more, definitely check out this blog post.

[00:12:29] Next piece of content: "January Meetup: Who needs Kafka?". This is from hdombrovskaya.wordpress.com, and it's an hour-long presentation from a January meetup asking who needs Kafka. Basically, as opposed to using Kafka for pub/sub, they're using Postgres's LISTEN/NOTIFY publisher-subscriber capabilities, and he goes through setting that up and using it instead of Kafka. If you're interested in that type of content, feel free to check out this presentation.

[00:13:02] Next piece of content: "PostgreSQL 18: Add OLD/NEW support to RETURNING in DML queries". This is from dbi-services.com. Normally when you do an INSERT, you can return all of the data that was inserted, or maybe just the unique ID if you're using a serial or bigserial primary key, so you know what that ID is. You can also use RETURNING with updates, so it returns the value that was updated, or return the value that was deleted. Well now, apparently in 18, you can reference the old and the new rows. In the case of returning from an INSERT, the old row is essentially null, but the new row has the new values in it. For a DELETE, the old row returns the old information and the new row is null. And an UPDATE returns both the old and the new version. So this is a great enhancement. Check out this blog post if you want to learn more.

[00:14:00] Next piece of content: "Waiting for PostgreSQL 18 – Enable BUFFERS with EXPLAIN ANALYZE by default". This is from depesz.com, and this is something I know Michael and Nikolai of Postgres FM are very interested in having. Basically, you don't have to type out EXPLAIN (ANALYZE, BUFFERS) every time: you can just do EXPLAIN ANALYZE, and buffers are included by default. But it looks like this is a commit that David Rowley just decided to make, to see if it gets through the commitfest without anyone reverting it or raising issues. So I guess if it makes it through, we'll see it in Postgres 18.
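As a rough sketch of the extended-statistics fix from the Cybertec post (his table and column names differ; these are made up):

```sql
-- Per-column statistics assume GROUP BY columns are independent, which inflates
-- the estimated number of groups. An ndistinct statistic records how many
-- distinct combinations the columns actually produce.
CREATE STATISTICS sales_groupby_stats (ndistinct)
    ON region, product, sales_rep FROM sales;

-- Extended statistics are only populated by the next ANALYZE.
ANALYZE sales;

-- Then compare estimated vs. actual rows for the aggregate.
EXPLAIN ANALYZE
SELECT region, product, sales_rep, count(*)
FROM sales
GROUP BY region, product, sales_rep;
```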
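For the Kafka-versus-Postgres piece, the core mechanism is just LISTEN/NOTIFY; here is a tiny sketch with a made-up channel and payload:

```sql
-- Session A (subscriber): register interest in a channel.
LISTEN order_events;

-- Session B (publisher): send a message. Listeners receive it asynchronously
-- once the publishing transaction commits.
NOTIFY order_events, 'order 42 created';

-- Equivalent form that is handy inside functions and triggers:
SELECT pg_notify('order_events', 'order 42 created');
```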
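Here's roughly what the new RETURNING support looks like, going by the dbi-services examples; since Postgres 18 isn't released yet, treat the exact syntax as subject to change:

```sql
-- UPDATE can return both the previous and the new version of each changed row.
UPDATE accounts
   SET balance = balance - 100
 WHERE id = 1
RETURNING old.balance AS balance_before,
          new.balance AS balance_after;

-- For an INSERT the old row is all NULLs; for a DELETE the new row is all NULLs.
DELETE FROM accounts
 WHERE id = 2
RETURNING old.balance AS final_balance;
```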
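And for anyone not already in the habit, this is the difference in practice (the table name is hypothetical):

```sql
-- Today: buffer counters have to be requested explicitly.
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE id = 42;

-- With the Postgres 18 change: plain EXPLAIN ANALYZE would include them by default.
EXPLAIN ANALYZE SELECT * FROM orders WHERE id = 42;
```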
[00:14:39] Next piece of content: "How to test your PostgreSQL connection". This is from timescale.com. Personally, I just open up psql to make sure I can connect to the database, but there is actually another utility that comes with Postgres called pg_isready, and I haven't actively used it. I usually just use psql, because it takes essentially the same connection parameters. It looks like pg_isready returns "accepting connections" or "no response"; I don't know how many different things it returns, but I generally use psql. They also cover all sorts of different ways to connect and make sure that Postgres is working from different languages, operating systems, etc. So check this out if you're interested.

Next blog post: "Logical replication in Postgres: Basics". This is from enterprisedb.com, and they are basically just setting up simple logical replication. If you're interested in that, you can check out this blog post.

Next piece of content: "Who contributed to PostgreSQL development in 2024?". This is from rhaas.blogspot.com, and every year he posts the contributors to Postgres in terms of a few different metrics. The first chart looks to be principal authors by percentage of lines of code, so we've got that list here. The next list is committers, again by percentage of lines of code, where they weren't the principal author of the patch. And the last one is people who sent at least 100 emails to pgsql-hackers in 2024, sorted by number of emails, so you can see all the people contributing that way as well. If you had any role in developing or working on Postgres, thank you so much for what you do. It is greatly appreciated.

Last piece of content: "PostgreSQL Hacking Workshop - February 2025". This is from rhaas.blogspot.com, and it looks like it's a discussion of Heikki Linnakangas's talk on the wire protocol. So if you want to get started hacking on Postgres and want to start participating in these meetings, there's a form here to be able to get access to them. If you're interested in joining, definitely check out this blog post.
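A quick sketch of both connection checks mentioned above; the host, port, user and database are placeholders:

```sh
# pg_isready prints one of four statuses and sets a matching exit code:
#   0 = accepting connections, 1 = rejecting connections (e.g. still starting up),
#   2 = no response, 3 = no attempt made (bad parameters).
pg_isready -h db.example.com -p 5432 -U app_user -d app_db -t 3
echo "exit code: $?"

# The psql equivalent: run a trivial query and check that it succeeds.
psql -h db.example.com -p 5432 -U app_user -d app_db -c 'SELECT 1;'
```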
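And a bare-bones sketch of the kind of logical replication setup the EDB post covers, with placeholder connection details:

```sql
-- On the publisher (requires wal_level = logical in postgresql.conf):
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the subscriber (the table must already exist there with a matching schema):
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=publisher.example.com dbname=app_db user=repl_user'
    PUBLICATION orders_pub;

-- Check progress on the subscriber.
SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;
```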
[00:16:51] And for this week's Consulting Corner, I think I'll talk about an issue I've seen with a few different clients, so I guess I'll classify it as something to watch out for. Out of the different clients I've worked with, the ones with the most WAL generation are the clients who are using random UUIDs for their primary keys. We're talking WAL generation that over 24 to 48 hours can exceed the size of their entire database, so potentially terabytes of WAL being generated. That's because all the inserts are happening all over the heap or the indexes with these random UUIDs, and with that random I/O, whenever a checkpoint happens, a full-page write usually has to go into the WAL, and that just generates a ton of WAL records. Now, probably the worst is using a randomized text string for your primary key, because its size is bigger than the binary representation of a UUID, whereas a sequential integer is the best in this regard. Although now there is UUID version 7. It's not in Postgres yet, that's hopefully coming in 18, but with UUID version 7 the first part of the identifier is basically a timestamp, and the latter part is random. The benefit of that is much less random I/O, because you're pretty much going to have more sequential I/O, since you're working with incrementing timestamps at the beginning of the UUID. So if you're seeing more WAL generation than you expect out of your app, and you are using a random text string or UUID version 4 (a random UUID), you may want to think about switching to UUID version 7 IDs that are timestamp-based at the beginning and random at the end. Or just switch to integers, but that's much harder for people to do.
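To make that concrete, a hedged sketch: gen_random_uuid() (UUIDv4) is built in today, while uuidv7() is assumed here to arrive in Postgres 18 as discussed, so treat that part as illustrative:

```sql
-- Random v4 keys: inserts land all over the primary key index, causing random I/O
-- and a lot of full-page writes into the WAL after every checkpoint.
CREATE TABLE events_v4 (
    id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    body text
);

-- Timestamp-prefixed v7 keys: new rows cluster at the "recent" end of the index,
-- so the I/O pattern is mostly sequential and WAL volume drops.
-- (uuidv7() is the function expected in Postgres 18; on older versions you'd
-- generate v7 values in the application or via an extension.)
CREATE TABLE events_v7 (
    id   uuid PRIMARY KEY DEFAULT uuidv7(),
    body text
);
```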
[00:18:51] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content discussed, as well as sign up to receive weekly notifications of each episode there. You can also find an audio version of the show, as well as a full transcript. Thanks, and I'll see you next week.

Other Episodes

Episode 101

February 17, 2020 00:17:46

Postgres Releases, Useless Vacuum, Isolation Differences, WAL Compression | Scaling Postgres 101

In this episode of Scaling Postgres, we discuss new postgres releases, useless vacuuming, isolation differences between databases, and different ways to compress WAL files....

Episode 244

December 04, 2022 00:14:59

Index Merge vs Composite, Transparent Column Encryption, Trusted Language Extensions | Scaling Postgres 244

In this episode of Scaling Postgres, we discuss merging indexes vs. a composite index, implementing transparent column encryption, developing trusted language extensions, and reviewing...

Episode 148

January 17, 2021 00:11:54

Hash Indexes, Int Float Numeric Types, Postgres 14 Features, Data Science | Scaling Postgres 148

In this episode of Scaling Postgres, we discuss hash indexes, the int, float and numeric data types, features coming in Postgres 14 and data...
