Episode Transcript
[00:00:00] Welcome to the 300th episode of Scaling Postgres. I can't believe it's been almost six years since I did the first episode. Officially, it was February 27, 2018, and here's what episode one was about: petabyte scale and locking. I think the Postgres version at the time was version 10, so it's been quite a while. What's amazing is looking at how many links used to be in the episodes. That one has, I think, eight links, whereas in preparing the content for this week, I think there are close to 30 links. So while it's great to celebrate the 300th episode, Postgres itself has come a long, long way. We've had six new versions, and the amount of content produced every week has at least doubled, if not more. So that's incredible. There are so many more companies now that weren't there in the past supporting Postgres or designing Postgres solutions. So that's just phenomenal. But I would also like to thank you, the viewer, or the listener if you listen to the podcast version of the show, because without you, I wouldn't have kept doing it. So thank you so much for that. I greatly appreciate it. And as always, I hope you, your friends, family and coworkers continue to do well. I think that's enough nostalgia. But what about moving forward? Of course, the thing I'm going to mention is my course, which I said was going to be coming in 2024. I projected the end of January. I'm not going to quite make that, but I can now say definitively it will open on February 26th, so in about four weeks or so. That's the course open date. Of course it will be available to all the early adopters that signed up for it, but if you're looking to get in on the course, there will be opportunities to do so coming up. The sales page hasn't been prepared yet, but it's very close, and I thought I'd give you a sneak peek of some things I'm working on. My plan was to create as real a database as I could, with two terabytes of data in it. As you can see, this is the database I'm working with currently. It is just over two terabytes, and I have a little snapshot here of the different tables in it. There's an events table for capturing page views and things of that nature, which you can see has 5 billion rows in it. There's an email transmissions table with a billion rows, a payments table with 150 million, about 20 million users and about 4,000 accounts. So this is going to be something like a software-as-a-service application, and I'll provide more details about what the application is as we move toward the course open date. But I thought I'd run a query here really quickly, because you'll see in some of the content this week people were talking about their really large table being 20 million rows, and I was like, I've got a 5-billion-row one. So I just ran a query to see how fast it would return. What I'm doing is pulling a month of data by recorded_at, looking at a particular account and a user ID, which is not part of any index. Uncached, it ran in 138 milliseconds, whereas cached it was 1.8 milliseconds. And I should say there are, relatively speaking, hardly any indexes on the database yet. This size will only grow. I don't want to make it too, too large, but it is over the two-terabyte threshold at this point. A lot of this is subject to change, of course; I can recreate the database if I need to make alterations as I'm finishing up the course. But that was just a little sneak peek.
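For the curious, here's a minimal sketch of roughly what that sneak-peek query could look like. The table and column names (events, recorded_at, account_id, user_id) and the literal values are assumptions based on the description above, not the actual course schema:

-- Hypothetical approximation of the sneak-peek query: a month of events
-- for one account and one user, where user_id has no index yet.
SELECT *
FROM events
WHERE recorded_at >= '2024-01-01'
  AND recorded_at <  '2024-02-01'
  AND account_id = 1234
  AND user_id = 56789;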
Let's go on to the first piece of content, which is "One PID to lock them all: finding the source of the lock in Postgres". This is from crunchydata.com. If your system is having issues and it looks like it's because it's locking excessively, or there's a lock that is really long-running, these are some techniques to find it. The first place you want to look is pg_stat_activity, specifically at the wait_event and wait_event_type columns, to see what's waiting, and look for things that are not idle. So you'll see active sessions and also ones that are idle in transaction. Here are the results of that query. If we look at the state, you do see one active session and one that's idle in transaction, and you can see the active one has a Lock wait_event_type and it's waiting on a relation. Now it's probably pretty obvious that this is the one that's blocked, but if you have a system with hundreds of processes, it could be hard to identify. So the next thing they propose is to find which PID is locking the table, meaning which process holds a lock on that relation. You go to pg_locks, put in the process ID of the one that's blocked and look for where the lock is not granted. It'll give you this row, and then you can see the relation it's trying to get access to but can't. This may be obvious if you look at the query, but this tells you right here. Then you query pg_locks again using the relation OID, this time where the lock is granted, and you can see the process that has that lock and its PID. So this is the one that's holding the lock. Then you can query pg_stat_activity again for this PID and you get the exact process that has the lock and has not released it yet. This was the same one we saw earlier, just because there aren't that many processes in this example. Now, he says this is pretty straightforward to do for one example, but you can have chains of locks, where something is blocked on something else, which in turn is blocked on something else. And what he's provided is this CTE query to identify patient zero, the one that's causing all the locks to happen. You can see the output of this particular query here, so I would definitely keep this to use in your own database. You can see all these active states that are blocked, and the culprit is this one up here. In this case it's slightly different: they're doing a SELECT FOR UPDATE, and that caused all these other updates and an ALTER TABLE statement to be blocked. He then goes on to ways to end the process holding the lock. If you're in that session, you can do a COMMIT or a ROLLBACK. If you're outside of it, you can try to cancel it or even terminate the whole backend. He talks a little bit about other causes of locks, but the other thing to mention is that this process of finding locks works great if they're long-running, but if they're very transient, or they happened two hours ago and aren't happening anymore, how do you figure out what went on? Well, that's where you're going to want to turn on log_lock_waits, because then the logs can lead you through the same process to identify the patient zero, the culprit. The other way to minimize problems, especially when you're doing migrations, is setting a lock_timeout, which will cancel a query if it's unable to do its work because of a locking situation within that time period.
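If you want to follow along with the general approach, here's a minimal sketch of the kinds of queries involved, using standard pg_stat_activity and pg_locks columns. The PID and table name are placeholders, and this is a simplified take rather than the exact queries or the blocking-chain CTE from the Crunchy Data post:

-- 1. Which sessions are waiting, and on what?
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state <> 'idle';

-- 2. For a blocked PID (12345 is a placeholder), which relation can't it lock?
SELECT relation::regclass, mode, granted
FROM pg_locks
WHERE pid = 12345 AND NOT granted;

-- 3. Who currently holds a granted lock on that relation?
SELECT pid, mode
FROM pg_locks
WHERE relation = 'some_table'::regclass AND granted;

-- Postgres can also shortcut steps 2 and 3:
SELECT pg_blocking_pids(12345);

-- For transient locks and safer migrations: set log_lock_waits = on in
-- postgresql.conf, and in the migration session cap how long you'll wait.
SET lock_timeout = '5s';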
So that's a must-have for migrations, in my experience. But if you want to learn more, check out this blog post. Next piece of content: "Why Postgres RDS didn't work for us (and why it won't work for you if you're implementing a big data solution)". This is from Max Creamer on Medium. He said they started using Postgres RDS, and he says, quote, when you're storing a large time series table, say 20 million rows, and the size of a table on disk is 10 GB and you want to scan the table. Basically, he wanted to get a report of throughput, which is very hard to do on RDS the way they had set things up. And this is what I was talking about earlier, calling 20 million rows large, because of course I've seen a lot larger. They tried to improve their situation by increasing the IOPS of their EBS volumes, but it didn't really help with the performance they were looking for. They then switched to Aurora, hoping for the, quote unquote, promises of five to seven times performance increases. But unfortunately their costs went up instead of down, and apparently they still weren't getting the performance they were looking for at that cost. So they went with rolling their own. They basically set up EC2 instances with a master, or primary, and a standby replica, and it says they were kept up to date using WAL-G, so I don't know if they were using streaming replication or log shipping; not quite sure about that. They said the write node could be slow since it's only ingesting data, so apparently there's not a lot of data coming in on a daily basis, and EBS was good enough, but they were actually able to double the throughput using ZFS with compression. Then on the read node, or the replica, they set up ephemeral NVMe drives as well as ZFS, and I imagine that really boosted read performance. But also take a look at their bill: it went from $11,000 down to $2,100. I'm assuming that's monthly, so that's really great savings, a fivefold improvement. Also, queries that would take hours or time out altogether now run in seconds, so that's pretty incredible. I must admit, for my servers I still just use EC2 instances; I don't use RDS for anything at this point. But I will say with regard to ZFS: be careful, because I have other clients that have set up ZFS for their storage with compression, and because there's only one apply worker in Postgres, the replicas have a really hard time keeping up with streaming replication from the primary, so sometimes the delay is significant. There may be some cases where ZFS and compression really help things, but it can also cause some issues, in my experience. But check out this blog post if you want to learn more. Next piece of content: "How to partition Postgres tables by timestamp-based UUIDs". This is from pganalyze.com, and in this Five minutes of Postgres he covers a forum post where they used ULIDs to partition a table to get improved performance. And that article is here. Again he said, quote, we had a very large table, 28 million rows. So again, that reference to about 20 million rows being really large; I just thought that was interesting. But they were getting timeouts when querying by anything other than the primary key, to which I'm thinking 28 million is not really that large, so what was perhaps going on?
There was probably an indexing solution to resolve this, but they went ahead with partitioning by ULID, and they describe how they did it, and they said queries went from multiple minutes to sub-second. I didn't see a lot of detail about what made the difference other than the partitioning, because in my experience partitioning either doesn't give you a performance improvement or it's only slightly incremental versus just having a large table. It's really great for easing maintenance, because vacuum and autovacuum can work much more efficiently on multiple small tables versus one large table. But going back to Lukas's video, he actually shows doing this with a development version of Postgres 17 using UUID version 7, which has a time component at the beginning and then a random component at the end, and he used range partitioning by day, looking at the time component of the UUID. I thought this was fascinating, so I definitely encourage you to check it out, because this is probably how we're going to be doing this in the future. I can't wait for UUID version 7 to get into Postgres. It looks like it might make it into Postgres 17, and he discusses that in the video; we'll just have to see. Next piece of content: "PGXN creator David Wheeler joins Tembo to strengthen PostgreSQL extension ecosystem". This is from tembo.io, and exactly as it says, David, who is the creator of the PostgreSQL Extension Network, has joined Tembo with the objective of maybe making that the definitive extension repository for Postgres, or at least putting work towards that. And they have a few objectives listed down here. One, be the canonical source of record for all Postgres extensions. Two, be super easy for developers to publish their extensions in the ecosystem. Three, make extensions easy to find and understand. Four, make extension documentation consistent, standardized and comprehensive. Five, provide stable, easy-to-use and comprehensive APIs that downstream developers can use to build other tools and products. Six, provide automatic binary packaging for a wide variety of platforms. And seven, provide intuitive, powerful interfaces for managing extensions. So this is super interesting to me, if we start using something like this as an extension repository to easily install and uninstall extensions. We'll just have to see. The next post is from justatheory.com, and this is David writing about his transition as well, in "I'm a Postgres Extensions Tembonaut". Next piece of content: there was another episode of Postgres FM last week. This one was on pgvector, and in this one Michael was joined by Jonathan Katz, who's a PostgreSQL core team member and has done a lot of work on the pgvector extension. They talked all about pgvector, the two primary index types being used now, IVFFlat as well as HNSW, and where pgvector is potentially going in the future. What I thought was interesting is that people have been advocating for pgvector going into core, but the thing Jonathan mentioned is that maintaining it as a separate extension at this particular point in time helps it be more nimble compared to being in the core Postgres product, meaning Postgres does yearly releases, whereas they can iterate quite rapidly in an independent extension. Given that all this machine learning and artificial intelligence is so new, that adaptability and being able to react quickly is highly beneficial. But really great episode; definitely encourage you to check it out.
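To make the UUID version 7 partitioning idea a bit more concrete, here's a rough sketch of day-based range partitioning on a UUIDv7 column. This is my own simplified take, not the code from Lukas's video: uuid_v7_boundary is a hypothetical helper that builds the lowest possible UUID for a given moment, relying on the fact that the first 48 bits of a UUIDv7 are milliseconds since the Unix epoch, and the table here is made up.

-- Hypothetical helper: the lowest UUID whose 48-bit timestamp prefix
-- matches the given moment (all remaining bits zeroed).
CREATE FUNCTION uuid_v7_boundary(ts timestamptz) RETURNS uuid
LANGUAGE sql IMMUTABLE AS $$
  SELECT (lpad(to_hex((extract(epoch FROM ts) * 1000)::bigint), 12, '0')
          || '00000000000000000000')::uuid
$$;

-- Assumed table layout: the application supplies UUIDv7 values for id.
CREATE TABLE events_p (
    id      uuid NOT NULL,
    payload jsonb
) PARTITION BY RANGE (id);

-- One partition per day, bounded by the minimal UUIDv7 for each day.
CREATE TABLE events_p_2024_01_29 PARTITION OF events_p
    FOR VALUES FROM (uuid_v7_boundary('2024-01-29'))
                 TO (uuid_v7_boundary('2024-01-30'));

Rows then route to the right daily partition based on the timestamp embedded in their primary key, without needing a separate timestamp partition column.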
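And for reference, creating the two pgvector index types they discuss looks roughly like this; the items table, the embedding column and the L2 operator class are just illustrative assumptions:

-- Assumes: CREATE EXTENSION vector; and a table items(embedding vector(1536)).
-- IVFFlat clusters vectors into lists; it builds quickly, and recall depends
-- on how many lists are probed at query time.
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- HNSW builds a graph; slower to build and larger, but generally better
-- speed/recall trade-offs at query time.
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);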
[00:14:06] Next piece of content: "Extensions for PostgreSQL: how we participate at Adjust". This is from Adjust's engineering blog, and they indicate down here that they have about 30 extensions, most of them private, but 13 of which are public, so those are available. This is basically a directory of all the different extensions they have available. One I thought was particularly interesting is the first/last aggregate. It basically gives you aggregate functions that let you grab the first value or the last value when doing a GROUP BY. They also have istore, which is basically a key-value store but strictly for integers, so that has a unique use case. And they have multiple different enums, like ones set up for country, currency or device type. So pretty interesting. You may want to check this out to see if there are any of these extensions you want to consider trying out.
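As a quick illustration of why a first/last aggregate is handy, usage could look something like the sketch below. The exact extension and function names here are assumptions based on the description in the post, and the table is made up:

-- Grab the first and last page viewed per user in a single GROUP BY,
-- assuming the extension exposes first() and last() aggregates.
SELECT user_id,
       first(page ORDER BY viewed_at) AS landing_page,
       last(page ORDER BY viewed_at)  AS exit_page
FROM page_views
GROUP BY user_id;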
[00:15:03] Next piece of content: "PostgreSQL 17: COPY and SAVE_ERROR_TO". This is from dbi-services.com, and this is really great. Normally, if you are importing data and there's some sort of error in it, you get behavior like this: this first data layout gets imported no problem, but this second layout has problems with a few rows, so you'll get an error and the COPY will just stop and not import anything; the table will be empty. But if you define SAVE_ERROR_TO, it will load the data it can and then, I believe, save the failed rows to a location. Although in his example he set it to none, so I guess those rows are just ignored. But I assume this feature is eventually supposed to write them out somewhere separate, so you can make adjustments to just the rows that have issues, correct them there, and then load those. So I think that's a great addition. Check out this blog post if you want to learn more.
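Based on the development-branch syntax described in the post (the option name could still change before PostgreSQL 17 ships), a COPY that skips bad rows instead of aborting would look something like this sketch, with a made-up table and file path:

-- Load what it can; with none, rows that fail to parse are skipped
-- rather than aborting the whole COPY.
COPY my_table FROM '/path/to/data.csv'
WITH (FORMAT csv, SAVE_ERROR_TO none);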
[00:16:03] Next piece of content: "PostgreSQL is now a CVE Numbering Authority (CNA)". This is from postgresql.org, and it means the Postgres security team can now assign CVEs on their own.
[00:16:16] Next piece of content: "PostgreSQL parallel query problems in JDBC and DBeaver". This was an interesting problem where parallel queries were not working when using this particular tool, DBeaver. If you look down here, when they analyze queries from it, you can see two workers were planned but none were launched. So the planner thinks it can go ahead and do a parallel plan, but the execution doesn't use it; it didn't start any extra processes. There are some limitations on parallel query, but the particular problem here is that DBeaver uses the JDBC driver, and if you specify a row limit at the protocol level, as it does here, you don't actually get parallel execution. The solution is to change that limit to zero, and then the parallel plan works. So that's really strange. And the recommendation is, if you need to limit the results, don't use that driver feature; just add a LIMIT to your query. The last piece of content: "POSETTE: an event for Postgres 2024". This is actually a rebranded Citus Con, so Citus Con is now POSETTE, and it stands for Postgres Open Source Ecosystem, Talks, Training and Education. So definitely look for more information about that in the future.
[00:17:34] I hope you enjoyed that episode. Be sure to check out scalingpostgres.com, where you can find links to all the content discussed, as well as a podcast version of the show and a full transcript. You can also sign up to receive weekly notifications via email. Thanks for watching this episode. I really appreciate it. Please like and subscribe, and I will see you next week.