Locking Tables, Foreign Key Issue, Slow Decimals, Memoize & Joins | Scaling Postgres 249

Episode 249 January 23, 2023 00:09:06
Locking Tables, Foreign Key Issue, Slow Decimals, Memoize & Joins | Scaling Postgres 249
Scaling Postgres
Locking Tables, Foreign Key Issue, Slow Decimals, Memoize & Joins | Scaling Postgres 249

Jan 23 2023 | 00:09:06

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss issues with locking tables, foreign keys and slow decimals. We also cover when joins use memoize.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/249-locking-tables-foreign-key-issue-slow-decimals-memoize-joins/

View Full Transcript

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about locking tables, foreign key issue, slow decimals and memoise and joins. I'm Kristen Jameson, and this is scaling postgres episode 249. [00:00:22] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is, okay, sometimes you can lock tables. This is from thebuild.com and it looks like the blog is PostgreSQL when it's not your job. And this is very short post that's actually talking about something he wrote before where you should never lock tables. Basically, it's definitely something that's going to hurt your performance. And he actually in this post, suggested using advisory locks instead of other types of more heavy locks. But he says there is at least one condition where you need a lock. And that's basically if you're going to modify part of the table, like you want to add a column to the table, to actually make changes to that table, you need an access of exclusive lock, even a brief one. But the problem is that if the table is so busy, even if it just needs this brief hold, once it says it needs this other activity for this table is going to start queuing behind it. And that's known as a lock queue and that could cause some big problems potentially. Now his recommendation on how to handle this is to use this code here where it looks like he's trying to acquire a lock. And if it's not available, try again in about a second. But frankly, what I use is just set a lock timeout and put it in a loop and maybe you set your lock timeout to 5 seconds, 10 seconds, whatever you feel is most appropriate for your database and the load on it and then just put it in a loop trying to apply that change. Or you could try using this code that he's provided. But if you want to learn more, you can check out this post. [00:01:54] The next post also from thebel.com, is a foreign key Pathology to Avoid. And he came up with a scenario where you have, say, a streaming service and you have number of streams, you have a number of users and then you want to link those users to those streams. And he called the table stream. Viewer. And because you want to maintain data integrity, a foreign key constraint is added on the Stream Viewer table to users and to the stream ID. But the problem is if some popular person joins your service now, suddenly you could have millions of new people watching an individual stream. Now you have to add all of these millions of records to the Stream Viewer. Well, you're going to have tons of foreign key relationships to that one stream that has to be maintained, which is adding a four key share lock on that record. But you're going to get diminished performance and you're going to start spilling two disk once you have an excessive number of these locks on the system and he proposes two different ways to get around this. One is don't use the foreign key. Of course you're going to lose referential integrity. Or another way is potentially batch up the join operations. So don't do an individual insert for each one, but batch them up in maybe 101,000 per. But if you want to learn more about that, you can check out this blog post. [00:03:08] The next post, also from thebill.com is how slow is decimal anyway? So he did a test comparing the numeric data type, which is essentially a decimal versus the double precision floating point type. And he found doing a straight sum, it was twice as slow as the float and it went up to four times slower. If there was a simple calculation taking the number and multiplying it by an integer, and it was five times slower. When you multiply the number by itself and do a summation of that and even just inserting records, inserting 100 million rows, took twice as long to do the numeric versus the double precision. So it's just something to keep in mind if you're choosing to use numerics is that there is a performance penalty for using them. And if you want to learn more, you can check out this blog post. [00:03:56] Next Piece of Content using Memoise to speed up joins between two postgres tables this is from Pginalyze.com and this is the next episode of five minutes of Postgres. And Lucas is referencing the presentation that Bruce momgum released that we mentioned last week. And out of the 40 or so different nodes that postgres can use in doing query planning, one of them is Memoise. And we actually discussed this in Scaling Postgres about a year ago when this was introduced in postgres 14. And it can happen where, as he says here, quote two tables that are being joined using a nested loop and the inner plan node uses a parameterized index scan. So those cases memoise can be used. And it's a bit like a cache of sorts, like when it tries to calculate a function and it keeps getting the same value, it can use a cached value in those cases. And there's another quote here. Effectively, what Memoise does is it speeds up the loading of the data because it doesn't have to keep going back to the original sequential scan in this case. And the post that he covers here, which is a little bit of an older post, shows where someone was using a lateral join and they saw a thousand times faster performance. So if you want to learn more about that, definitely check out Lucas's piece of content this week. [00:05:12] Next piece of content usql universal psql. This is from CyberTech postgresql.com. So Usql is a new command line interface that was based upon psql, but it's designed to be used against any database, I believe Oracle, Microsoft, SQL Server, MySQL, etc. And of course you can use it against postgres. So in this post they try using Usql to connect to a database and do some exploration of it. And not all of the familiar commands work in Usql, so your mileage may vary a bit. But if you use different databases on a regular basis, perhaps you would want to explore using Usql instead of psql. So I definitely encourage you to check out this blog post if you want to learn more about that. [00:05:58] Next piece of content. How to solve advent of Code 2022 using Postgres Day 13 this is from Crutchydata.com and Greg explores his solution to day 13 of the advent of code. [00:06:10] Next piece of content building a PostgreSQL wire protocol server using Vanilla Modern Java 21. This is from Gavinray 97 GitHub IO and here he is exploring using Java, something I'm not very familiar with on how to talk to PostgreSQL using its Wire protocol. So this is definitely a more in depth review of how to work with postgres. And if you're interested in learning more about this level of detail, definitely encourage you to check out this blog post. [00:06:43] Next Piece of Content debugging PostgreSQL CI Failures Faster Four tips this is from Citusdata.com and they're talking about the CI environment that is used to build postgres on multiple operating systems. And they're expressing that when building on multiple operating systems you can run into errors. And what are some tips on how to determine where the errors are coming from? The first tip here is connecting to the CI environment with a terminal to see interactively how things are processing. Tip two is enabling debug build options using them on CI, so adding more options to get more debugging information. Tip three is gathering PostgreSQL and other files from CI runs so you can actually see what's in the log files. And tip number four running specific commands on a failure. So if something fails, are there commands you can run to gain more insight into what's going on? And of course they go into much more depth of each of these tips and how to use it with the Sirius build system. So if you're interested in learning about that, definitely check out this blog post. [00:07:43] Next Piece of Content postgres FM did have another new episode this week. This one was on auditing. So if you want to learn more about different ways you can audit your database for compliance purposes or even security purposes, definitely check out this piece of content. And they definitely covered a lot of different solutions during this episode, as you can see here. Next piece of content. The postgres Dual Person of the Week is miroslav Shadivi. If you're interested in learning more about Miroslav and his contributions to Postgres, definitely check out this blog post and the Last Piece of Content we did have another episode of the Rubber Duck Dev show this Thursday afternoon. This one was on teaching new developers with Melissa Amaya. So we not only talked about someone who's say a new developer coming onto a team, but we also discussed how to teach development to someone who's completely new to it, like a student in a secondary grade level. So if you're interested in this type of long form, developer based discussion, we definitely welcome you to check out our show. [00:08:42] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks.

Other Episodes

Episode 288

October 29, 2023 00:19:02
Episode Cover

Index Corruption From ICU Collation Change | Scaling Postgres 288

In this episode of Scaling Postgres, we discuss index corruption from a ICU collation change, another caveat for PgBouncer prepared statements, ways to version...

Listen

Episode 81

September 16, 2019 00:12:44
Episode Cover

Data Loading Speed, View Dependencies, Users & Roles, H/A Clusters | Scaling Postgres 81

In this episode of Scaling Postgres, we discuss data loading speeds, view dependencies, users & roles and high availability clusters. To get the show...

Listen

Episode 130

September 06, 2020 00:15:15
Episode Cover

External Compression, Parallel Vacuum, JSON Aggregates, JSON & Arrays | Scaling Postgres 130

In this episode of Scaling Postgres, we discuss external compression, vacuuming in parallel, working with JSON Aggregates and the JSON & Array datatypes. To...

Listen