Petabyte Scale and Locking | Scaling Postgres 1

Episode 1 | February 27, 2018 | 00:17:44

Hosted by Creston Jamison

Show Notes

In this episode of Scaling Postgres, we review articles covering petabyte scale techniques as well as various articles on locking.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/1-petabyte-scale-and-locking/


Episode Transcript

[00:00:00] Welcome to episode one of Scaling Postgres. My name is Creston Jamison and I will be your host. Scaling Postgres is a free weekly show that curates content from around the web, along with my commentary, to help you understand the best ways to scale and get the most performance from your Postgres database. I'm going to try and focus especially on those topics that are of interest to developers, since typically the greatest performance gains are achieved through database design and architecture and how you use the database. In terms of format, I'm going to be curating the best content from around the web and presenting it to you to help you understand how best to scale and get the most performance out of your Postgres database. All right, let's get started.

[00:00:54] All right, the first article that we're going to be covering is Testing Database Changes the Right Way. This is an article from the Heap Analytics blog, and it talks about their petabyte-scale cluster of Postgres instances. So if you had the question of does Postgres scale, clearly it does. Now, at a petabyte scale they're not talking about a single database; from my recollection, they are sharding their database, so there are multiple databases that handle that petabyte scale of data they're working with. But nonetheless, they seem to be at the forefront of pushing Postgres in terms of the amount of scaling and the amount of data it can handle.

[00:01:37] Now, this is an interesting article because it tells you how they test database changes before they push them to production. [00:01:45] They discuss some things that didn't work and then talk about how they came upon the solution of using a shadow production. So basically, they clone out segments of their production database, mirror traffic to both production and the shadow database, make changes to the shadow portion, and check to see how those changes behave. [00:02:09] And they go into detail about how they populate it, how they mirror the reads and writes to make sure they're getting accurate results, and then how they analyze them. They also go into an example where they were rolling out table partitioning and used this technique in order to ensure that their rollout went off without a hitch. So I highly encourage you to check out this post, particularly if you're looking to scale to a significant level.

Now, another thing about this post is that it also has links to three articles that are equally valuable if you're looking to scale. For example, the first link has the title Running 10 Million PostgreSQL Indexes in Production, about how they serve their customers by utilizing a lot of, I believe, partial indexes. The next article is a basic performance analysis that saved them millions (I believe that's millions of dollars), as well as one on how they analyze the performance of millions of SQL queries to basically eke out the best performance. So I definitely suggest checking out this blog post as well as the associated ones.

For our next article, it's actually a presentation that's been placed on SlideShare. The title is PostgreSQL at 20TB and Beyond. What's a little interesting about this is that it is actually another example of a cluster of databases, and the total amount of data the cluster is managing is 400 terabytes. So it's essentially a set of 20-terabyte PostgreSQL databases that they're using to do their analytics, basically doing their analytics at a massive scale. Sometimes it's hard to track what's going on with a presentation, but this basically gives you insight into how they're using Postgres to ingest and process all of this data. And looking at the last couple of slides, they basically state that they're using PostgreSQL as a big data platform, that it provides real-time analytics on users' actions, and that they have yet to outgrow PostgreSQL given the size that they are. So again, if you're looking to scale, here's another example of an organization that is using Postgres in interesting ways to basically manage a lot of different data.

[00:05:05] For our next article, we're going to look at Rustproof Labs. In this blog post, they cover Postgres 10 parallel queries and performance. [00:05:17] Basically, they're looking to see what kind of difference parallel queries can make for the performance of different queries. [00:05:27] They discuss some of their testing methodology and have these great charts that show the different simple queries they've done. They're comparing against Postgres 9.5, not 9.6, but that's okay, and at least they have a control of Postgres 10 where they have disabled parallel execution. So probably the most apples-to-apples comparison is Postgres 10 single versus Postgres 10 parallel, and you can see significant differences in terms of execution time for simple queries. [00:06:04] And they make the statement that just upgrading to Postgres 10 essentially gives you a performance increase. [00:06:11] They also mention a warning: setting max_parallel_workers_per_gather greater than the number of cores seems to cause a decrease in performance. So you want to be aware of that if you're looking at enabling this for your Postgres database. [00:06:30] Then they go into a more complex example and show the results there too. Again, they're getting a fairly good performance increase, almost twice as fast with the parallel execution times. They also mention the CPU time, where, as one would think, the actual CPU usage increases as well. So if you're using Postgres 10, this is a great article to look at to review some of the settings you can make and think about how you want to configure it for the best performance, as well as just seeing some third-party evidence of the improvements in Postgres 10 versus the version 9 series.
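As a quick illustration of the setting they're tuning, here's a minimal sketch; the table name and worker count are placeholders of mine, not values from the Rustproof Labs tests.

-- Cap parallel workers at or below your physical core count, per the warning above.
SET max_parallel_workers_per_gather = 4;
-- Look for "Gather" and "Workers Launched" in the resulting plan.
EXPLAIN (ANALYZE, COSTS OFF) SELECT count(*) FROM measurements;
-- Set it to zero to reproduce a "Postgres 10 single" style control run.
SET max_parallel_workers_per_gather = 0;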
[00:07:18] In our next article, we're going to look at Postgres rocks, except when it blocks: understanding locks. This is from the Citus Data blog, and they talk about how, of course, Postgres is great at doing highly concurrent transactions, but you do need to understand how locking works, particularly when certain locks need to be exclusive or shared or things of that nature. So there's a great table that they present, and I'm thinking about printing it out for myself, but it basically looks at, given the different operations in Postgres, what can run concurrently and safely together. So selects can run alongside selects as well as inserts, updates and deletes perfectly fine. But things get interesting when, say, you need to create a trigger on a table. Well, you can pretty much only do selects while you're creating that trigger; otherwise, insert, update and delete statements are going to be blocked. Whereas when you need to do an alter table, it'll need an exclusive lock on the table to be able to make that change, blocking essentially everything. So it's a great blog post to check out.

[00:08:35] It also goes over a quick example of where you're doing a select and how a lock on the table can essentially prevent that select from happening. [00:08:50] And it has another table that goes into the difference between shared and exclusive locks for what you're trying to achieve, so that's another good thing to take a look at. [00:09:02] It also goes over the system table pg_locks and tells you how you can find useful information from this Postgres system table, telling you the status of the different locks in the system and what's going on.
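To give a sense of the kind of information you can pull out of pg_locks, here's a small sketch of a query for sessions currently waiting on a lock; the exact column selection is mine, not taken from the Citus post.

-- Show sessions that are waiting on a lock, along with the query they're running.
SELECT a.pid, a.query, l.locktype, l.mode, l.granted
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted;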
[00:09:17] Now, in addition to that post, our next article, another very interesting post with a lot of beneficial information, is again from the Citus Data blog: When Postgres blocks: 7 tips for dealing with locks. Pretty much everyone should probably take this, post it up, and have it as a readily accessible reference as you're planning database changes.

Tip number one: never add a column with a default value, because basically when you do this, the entire table will essentially have to be rewritten with that default value added to every row, which can block read and write activity. Instead, they suggest adding the column without a default and then updating it. And of course, if you have a very large table, to prevent extended locking from occurring with even that update statement, you're going to want to break up your updates into batches.

Tip number two: beware of lock queues; use lock timeouts. Because locks wait in a queue, they can back up behind one another. So, like the example they list here, if you have long-running update, insert or even select statements, and an alter table command needs an exclusive lock on that table, it needs to wait until those are done. But the problem is that every other insert or update statement that comes along afterward is going to be blocked, because the alter table is ahead of them in the queue. So basically you start blocking your entire table from activity. What they suggest is to set a lock timeout of 2 seconds right before you run this command, so that if it isn't able to acquire the lock it needs to alter the table, it times out after 2 seconds. So definitely a valuable thing to do.

Tip three is create your indexes concurrently. And again, this is definitely the golden rule of PostgreSQL, as they state here: always create your indexes concurrently so that you don't block writes to your table while you're creating indexes.

[00:11:41] Tip four is take aggressive locks as late as possible. [00:11:45] So basically, using your knowledge of locks, load into a new table and then do a renaming, as opposed to doing a long-running operation within a transaction. In general, you want to keep your transactions as short as possible. If you're updating data, for example, keep the duration that that transaction is open as small as possible.

[00:12:10] Tip five is adding a primary key with minimal locking. Adding a primary key is another long-running operation, because it has to build the index, and you don't want to hold a lock open for a very long time. The proposed workaround is to create a unique index concurrently, per tip three, on that column; then adding the primary key constraint to the table will be fast, because the unique index in this case will have already been created.

[00:12:45] Tip six: never vacuum full. [00:12:49] Vacuum is usually sufficient, and it's interesting how they put it here: doing a vacuum full essentially says, quote, please freeze my database for hours. [00:12:58] So definitely a good thing to take note of.

And tip seven: avoid deadlocks by ordering commands. I personally haven't seen a lot of deadlocks, but I think it's more likely if you're using a lot of transactions in your application, like explicit begin, commit and rollback type things. If you're doing a lot of that, I think there's a higher probability that a deadlock will happen. [00:13:32] In my experience, however, I haven't really seen much of this, but they give some good advice here on how to avoid deadlocks by ordering your commands.
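To make a few of these tips concrete, here's a minimal sketch; the table, column and index names are hypothetical, and the statements are my own illustration rather than examples taken from the Citus post.

-- Tips one and two: add the column without a default (no table rewrite), and set a
-- short lock_timeout first so the ALTER TABLE gives up instead of queueing behind
-- long-running queries and blocking everything that comes after it.
SET lock_timeout = '2s';
ALTER TABLE items ADD COLUMN last_seen timestamptz;

-- Tip three: build indexes without blocking writes.
CREATE INDEX CONCURRENTLY items_last_seen_idx ON items (last_seen);

-- Tip five: add a primary key with minimal locking by building the unique index first,
-- so the constraint itself is quick to add.
CREATE UNIQUE INDEX CONCURRENTLY items_pkey_idx ON items (id);
ALTER TABLE items ADD CONSTRAINT items_pkey PRIMARY KEY USING INDEX items_pkey_idx;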
All right, in our next post we're looking at Migrations and Long Transactions in Postgres, which is a blog post from the Fin Exploration Company. [00:13:53] This goes right back to the previous article in terms of lock queues in Postgres. Essentially, they were adding a column to a table without a default, but things started slowing down: twenty minutes in, the database migration they were running still hadn't completed, and they were trying to figure out what was going on. After investigation, they found they were running a long-running backfill of some old data; once they stopped that, the migration finished immediately. So again, they were doing an update in a long transaction that started locking up other actions once this alter table was trying to occur. He again goes into looking at the pg_locks table and the different locks that get created, basically replicating what happened in his instance, as well as talking about his takeaways and prevention strategies. Although, what would also assist is definitely setting a lock timeout, as they were talking about in the previous post.

The next blog post comes from pgstef's blog, and it's an introduction to PostgreSQL Automatic Failover. He's basically talking about PAF, also known as PostgreSQL Automatic Failover, which is a resource agent for providing high availability for PostgreSQL based on Pacemaker and Corosync. So if you're at the stage where you're wanting a highly available cluster that fails over to a replica without manual intervention, this post definitely shows how you can do that using Pacemaker as well as this resource agent, and he goes through the whole process of setting up the servers. So if you're at that point, I definitely encourage you to check out this blog post.

[00:16:02] And last is kind of a fun blog post from the 2ndQuadrant PostgreSQL blog called PostgreSQL Maximum Table Size. So basically, the maximum table size in Postgres is 32 terabytes, and as they say, it's been that way for many years. [00:16:19] One of the ways that's not quite right, he says, is that you can adjust the block size to make it larger. In addition, they talk about table partitioning and how Postgres 10 now has declarative partitioning, which makes partitioning easier, but they also resolved a bug in PostgreSQL 11; David Rowley found the bug, and fixing it basically expands the number of subtables allowed for partitioning. [00:16:52] So Postgres 10 only allowed about 65,000 partitions, which is still quite a lot, but now you can go to the full number of partitions allowed. So basically this is very interesting: a maximum table size of 32 terabytes in Postgres 9.6, 2 exabytes in 10, and 131 yottabytes in PostgreSQL 11.
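If you haven't seen declarative partitioning in Postgres 10 yet, here's a small sketch of what it looks like; the table and the monthly ranges are made up purely for illustration.

-- The parent table is partitioned by a range on event_time.
CREATE TABLE events (
    event_time  timestamptz NOT NULL,
    payload     jsonb
) PARTITION BY RANGE (event_time);

-- Each partition (subtable) covers one month of data.
CREATE TABLE events_2018_02 PARTITION OF events
    FOR VALUES FROM ('2018-02-01') TO ('2018-03-01');
CREATE TABLE events_2018_03 PARTITION OF events
    FOR VALUES FROM ('2018-03-01') TO ('2018-04-01');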
[00:17:15] So that does it for episode one of Scaling Postgres. To get all the links for the articles in this episode, check the link in the show notes. Be sure to head over to scalingpostgres.com if you'd like to sign up to receive weekly notifications of these episodes, or you're also welcome to subscribe on YouTube or iTunes. Thanks.

Other Episodes

Episode 212

April 24, 2022 00:21:23

Slow Queries, 4TB Upgrade, Postgres on ZFS, Storage Stampede | Scaling Postgres 212

In this episode of Scaling Postgres, we discuss dealing with slow queries, upgrading a 4TB database, running Postgres on ZFS and handling a storage...


Episode 156

March 15, 2021 00:16:52

Columnar Storage, Docker Containers, Notify & Listen, Tuning RHEL | Scaling Postgres 156

In this episode of Scaling Postgres, we discuss Citus 10 columnar storage, using Postgres in docker containers, using notify & listen for real-time apps...


Episode 33

October 08, 2018 00:09:12

Per-Table Autovacuum, FDW Synchronization, Distinct On | Scaling Postgres 33

In this episode of Scaling Postgres, we review articles covering per-table autovacuum, FDW for data synchronization, distinct on and Postgres Open. To get the...
