Episode Transcript
[00:00:00] When I work with transactions in my application, I actually use them very simply. I'm usually only locking one table at a time, maybe two depending on what's required. But apparently there are some situations you can get into, particularly if you're doing joins, that can cause data to vanish for some sessions. So we'll talk about that. Also, there are new Postgres releases that fix some regressions. But I hope you, your friends, family and co-workers continue to do well. Our first piece of content is Row locks with joins can produce surprising results in PostgreSQL. This is from hakibenita.com, and he had a situation with two tables, one of them having a foreign key to the other, where a SELECT query joining those two tables returned no results for some of the sessions.
[00:00:57] Well, this blog post goes into that. So he has an example here where you have a DMV, a Department of Motor Vehicles. There's an owner table, the owners of vehicles, and then a car table with a foreign key called owner_id that references the owner table.
[00:01:16] So he inserted three owners into the owner table, and he has one car assigned to owner ID 1.
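Based on his description, the schema might look roughly like this. This is a sketch, not the post's exact DDL; the column names beyond owner_id are my own assumptions:

```sql
-- Hypothetical reconstruction of the DMV example schema
CREATE TABLE owner (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE car (
    id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    owner_id bigint NOT NULL REFERENCES owner (id)
);

-- Three owners, one car assigned to owner 1
INSERT INTO owner (name) VALUES ('Alice'), ('Bob'), ('Carol');
INSERT INTO car (owner_id) VALUES (1);
```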
[00:01:24] Now let's imagine you change ownership. You start a transaction and run a SELECT query with a join, joining from car to owner using the foreign key, with FOR NO KEY UPDATE OF car.
[00:01:38] So this basically locks it. So no one else can change the ownership of this car while this particular transaction is going on, and it returns the data from that select query. The next step is to just update it to the new owner and then commit the transaction. So everything's fine. The problem is if you introduce a second session into this.
[00:02:00] So in the first session you start a transaction and do the same SELECT query with the join and the FOR NO KEY UPDATE OF car. That works. But at this point, if you begin a second session and run that same SELECT query with the join, the second session will be blocked because the first session holds the lock. Once the update completes on the first session and it's committed, the second session can continue. But the problem is it returns no data; it didn't find any rows.
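As a sketch of the two-session sequence he describes (table and column names are assumed, matching the example above):

```sql
-- Session 1
BEGIN;
SELECT car.id, owner.id, owner.name
FROM car
JOIN owner ON owner.id = car.owner_id
WHERE car.id = 1
FOR NO KEY UPDATE OF car;               -- locks the car row

-- Session 2: runs the same SELECT ... FOR NO KEY UPDATE OF car
-- and blocks here, waiting on session 1's row lock

-- Session 1
UPDATE car SET owner_id = 3 WHERE id = 1;
COMMIT;

-- Session 2 now unblocks but returns zero rows: the join was evaluated
-- against the old snapshot (owner_id = 1), while the locked car row is
-- re-checked with its new value (owner_id = 3), so nothing matches.
```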
[00:02:31] So his expectation was that it would then do the query and acquire the lock and show the new state of the data where the owner ID is 3 now. But instead of returning data, it doesn't return anything.
[00:02:44] So he says that's very strange. So he said, all right, let me do a LEFT OUTER JOIN to at least see if there is some data there. And when he does the LEFT OUTER JOIN, he does see the correct information for the car, which has been changed to owner 3, but you don't see the owner information: the ID and name from the owner table come back as NULL. So there's an unusual concurrency issue going on, or as he calls it, an elusive concurrency issue, and he does reference some of the Postgres docs. But he says what actually happens here is that part of the query executes before the lock and another part after.
[00:03:24] So the query is essentially paused in mid execution until the lock is released.
[00:03:29] This is why you get the updated owner from the car table, but the fields from the owner table are all NULLs. It's essentially not showing you the old owner's information.
[00:03:40] But at Postgres' default isolation level, which is Read Committed, it is possible for, quote, an updating command to see an inconsistent snapshot, and this behavior, quote, makes Read Committed mode unsuitable for commands that involve complex search conditions.
[00:03:59] And his conclusion is: if you lock a row in a query that includes a join, and in the same transaction you also update one of the foreign keys you joined on, a concurrent transaction might get partial or no results.
[00:04:14] So the question is, if you want to keep this type of join and doing the locking, how can you resolve it?
[00:04:20] So he tried locking both tables.
[00:04:23] That didn't work either. What did work is using a stricter isolation level, so he used Repeatable Read. Now, that doesn't just return no data; it actually raises an error: could not serialize access due to concurrent update. So it solves it by letting you know there is an issue, and you have to retry the command.
[00:04:44] But that wasn't ideal for their use case. So another solution is splitting the query, where you basically get rid of the join. First you do a SELECT FOR UPDATE on the car, then you query the owner, then you do the update and commit. In that scenario, the second session works just fine. So really it was the join, combined with the type of data being changed, that caused the consistency issue. However, he didn't want to send multiple calls to the database to handle this, so he actually wanted to use a subquery or CTE.
[00:05:21] And in his testing of this, he saw that locking the rows was one of the final steps, and he wanted to move that further up the chain. So he wrote a CTE that first locks the car row and then does the join to get the data. In that scenario everything worked as expected. And you don't have to use a CTE; you can also use a subquery, which he also presented.
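A sketch of that lock-first CTE approach, with assumed names rather than the post's exact query:

```sql
-- Lock the car row first, then join from the locked result
WITH locked_car AS (
    SELECT id, owner_id
    FROM car
    WHERE id = 1
    FOR NO KEY UPDATE
)
SELECT locked_car.id, owner.id, owner.name
FROM locked_car
JOIN owner ON owner.id = locked_car.owner_id;
```

The idea is that the row lock is acquired while the CTE is scanned, before the join runs, which is why the second session saw consistent results in his testing.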
[00:05:45] So definitely some things to be cautious of when you're using transactions.
[00:05:49] So it looks like this was related to the join as well as perhaps the rows updated as a result of that join.
[00:05:56] But if you want to learn more, definitely encourage you to check out this blog post. Next piece of content: PostgreSQL 18.3, 17.9, 16.13, 15.17 and 14.22 are released. This is from postgresql.org, and these are fixes for regressions from the earlier release a couple of weeks ago. Now, surprisingly, there are a lot more bug fixes and improvements compared to what prompted them to do the out-of-cycle release. These first two were definitely the main cause for the out-of-cycle release, but now they also have a strict_word_similarity function issue in pg_trgm, again related to the fix, as well as a json_strip_nulls issue where the function actually became mutable as opposed to immutable.
[00:06:46] And of course a number of other fixes as well. But with regard to json_strip_nulls, if you have migrated to Postgres 18, there are actually some steps you need to take to make sure these functions are immutable, so be sure to follow the updating guidelines down here.
[00:07:02] Next piece of content: pgvector 0.8.2 is released. This is mostly a security fix, because parallel HNSW index builds can leak sensitive data due to a buffer overflow, and it's also possible to crash the database server.
[00:07:18] So definitely upgrade this when you get a chance.
[00:07:21] Next piece of content: Making large Postgres migrations practical: 1 TB in 2 hours with PeerDB. This is from clickhouse.com, and what I thought was interesting is that they're offering a solution they say is better than using logical replication to migrate a database. The solution is fully open source and you can use it for any Postgres-to-Postgres migration, including self-hosted or managed setups, or across cloud providers. So of course they're using it to get people migrated onto their new hosted Postgres database solution, but it's great that this tool can be used for any migration.
[00:08:03] So in terms of their tests, they migrated an AWS RDS instance into their Postgres solution managed by ClickHouse. I think they did a test on a single table that was one terabyte in size, because they wanted to get timings for different ways of doing it. The first thing they tested was pg_dump and pg_restore, and it took about 17 hours. That takes a really long time.
[00:08:31] And the worst thing about it is that's essentially how long you're not capturing data changes, so that becomes your downtime window. This is not a viable solution for hardly anyone, unfortunately. The next thing they looked at was native logical replication, setting up one publication and one subscriber to transfer the data; that took 8 hours and 40 minutes.
[00:08:51] And then they looked at using Peer DB, where you can configure the number of threads.
[00:08:57] And with four threads it took 4 hours and 39 minutes; with eight threads, 1 hour 50 minutes; and with 16 threads, 2 hours and 10 minutes. So a little bit longer with the 16 threads, but it looks like they were hitting a network throughput bottleneck with the instance type they were using.
[00:09:13] Now, they are doing this transfer in parallel, so it's still a bit apples to oranges, the native logical replication versus what PeerDB is doing. An apples-to-apples comparison would be setting up a parallel transfer using logical replication; there, I think the numbers would be more equivalent.
[00:09:34] So they are transferring using CTID ranges, and it says it has an automatic retry-on-failure mechanism, so there are some niceties added. Another thing that may help with the performance is that they are using the binary format protocol as opposed to the text representation. Now, in order to ensure that the data is consistent, they are using pg_export_snapshot, and this is a tool I've used to do parallel dumps of a very large multi-terabyte table when doing logical replication upgrades. So you can do this sort of thing yourself alongside logical replication, but it requires a lot more work, and it looks like PeerDB has built a lot of conveniences around it. They also have support for unchanged TOAST columns, so that can be beneficial as well.
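For context, here's a minimal sketch of how snapshot synchronization with pg_export_snapshot works; the snapshot identifier shown is just an illustrative placeholder:

```sql
-- Session A: open a transaction and export its snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();
-- returns a snapshot identifier, e.g. '00000003-0000001B-1'

-- Session B: attach to the exact same snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
-- Both sessions now see identical data and can dump different
-- ranges of the same table in parallel, consistently.

-- pg_dump can also attach to an exported snapshot:
--   pg_dump --snapshot=00000003-0000001B-1 ...
```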
[00:10:22] But go ahead and check this out if you want to learn more.
[00:10:25] Next piece of content. There was another episode of Postgres FM last week. This one was on pgash.
[00:10:31] I was thinking, what is pgash?
[00:10:34] And actually it's a new tool that Nikolai developed. And here's the GitHub for it. ASH stands for Active Session History for PostgreSQL and it basically does wait event sampling for you.
[00:10:48] So if you've used RDS Performance Insights, it shows you the wait events and what's going on for the different sessions on your instance. He wrote this because he really feels wait event sampling is one of the most important metrics to look at when assessing a Postgres server's performance. As for why he developed his own tool: first, he wanted to be able to install it anywhere, so this is actually not an extension but pure SQL scripts that set up the utility. You just run the SQL and it sets everything up. The other requirement is that he wanted LLMs to be able to easily view the output data. And this is also just a bare-bones way to get some monitoring data for your Postgres instance, particularly if you're not paying for or haven't set up sophisticated monitoring yet.
[00:11:40] So this takes a sample every one second by default, I think, stores the data in a local Postgres table very efficiently, and has various functions you can use to query the data and see what's going on, and even graphs that show you the wait events happening in your system.
[00:11:59] So back to their episode. It was ironic, as Michael mentioned, that extensions were developed so people could add additional functionality to Postgres, whereas now extensions are often a bottleneck, because so many people are using hosted database solutions where you're only allowed to install approved extensions. So really you get further reach by not releasing an extension.
[00:12:24] So I thought that perspective was quite interesting.
[00:12:27] And Nikolai did mention there is pg_wait_sampling, which is an extension, and the Supabase group is also working on PG Flight Recorder, so that's another example as well. But if you want to learn more, listen to the episode here or watch the YouTube video down here.
[00:12:44] Next piece of content: Postgres JSONB Columns and TOAST: A Performance Guide. This is from snowflake.com, and he's talking about how a lot of people are starting to use more and more JSONB in Postgres, but you need to be wary of how large your JSONB gets. Postgres always operates on an 8 kilobyte page, and if you have small JSON, everything fits inside it well and you get relatively good performance for queries and data modification. But if all the columns of a row, including the JSONB, exceed the size of the page, the value gets placed into multiple pages, as shown here, so you have the JSON spread across these TOAST pages. And the problem with this is performance. Looking at a small JSON item, he generates a 40-byte JSON document, and in a small table the TOAST size is relatively negligible; all the size is in the actual table. When you query it, it returns in less than 10 milliseconds, and this is without indexes. But when you start using 40-kilobyte JSONB documents, things become far different. You're using a huge amount of TOAST, maybe 500 times as much as the actual table size, and when you run that same query, it's 40 times slower.
[00:14:08] And he calls this quote, the TOAST tax. Because when you run this query, it needs to go through and find the data, but then it needs to assemble all of those individual pages together to return the data to the client.
[00:14:20] So that just takes time.
[00:14:21] In terms of ways to avoid the TOAST tax, he recommends putting common fields into table columns. One solution he has for this is generated columns. You can create a dedicated column and have your application extract the data from the JSON and place it there, or you can have the database do it for you with a generated column. So it's going to look for the item price and store it as a float in this column, and find the item name and store it in this column. The other way to speed things up is, of course, to add an index, and you can use an expression index for that.
[00:14:58] So, an index that specifically targets the value you're looking for. This ran in 80 milliseconds, which is far faster than the 500 milliseconds without the index, but still slower than a direct filter on the name column. And if you actually need exact keys and values, you can use a GIN index, of course. But if you want to learn more, definitely check out this blog post.
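To make those fixes concrete, here's a sketch of the generated-column and index ideas; the orders table and payload column names are my own assumptions, not necessarily the post's:

```sql
-- Promote hot JSONB fields into real columns the database maintains itself
ALTER TABLE orders
    ADD COLUMN item_price numeric
        GENERATED ALWAYS AS ((payload ->> 'item_price')::numeric) STORED,
    ADD COLUMN item_name text
        GENERATED ALWAYS AS (payload ->> 'item_name') STORED;

-- Or index the JSONB expression directly, without adding a column
CREATE INDEX orders_item_name_idx ON orders ((payload ->> 'item_name'));

-- For arbitrary key/value containment queries (@>), a GIN index
CREATE INDEX orders_payload_gin ON orders USING gin (payload);
```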
[00:15:21] Next piece of content: Semantic caching in PostgreSQL, a hands-on guide to PG Semantic Cache. This is from pgedge.com, and I thought this was interesting: if you're running a chatbot with a RAG pipeline, using a cache can help cut down on expensive LLM calls, because these queries, even though they're phrased very differently using different words, essentially all want the same thing, the Q4 revenue. And he says, quote, research shows that 40 to 70% of all queries are semantic duplicates. So if you can cache these and the results the LLM produces, you can save a lot on your AI bills.
[00:16:04] So they developed a PG semantic cache to help do this caching in your postgres database.
[00:16:10] And it's provided as a Docker container. So you do make an API call to generate the embedding, but once you have that embedding, it computes a similarity score against what's already been cached, and if it finds a match that's very close and relatively recent, it can return that as the result instead of asking the LLM to process the question again. Anyway, I thought this was super interesting, and if you want to learn more, definitely check it out. Next piece of content: Create your PostgreSQL clusters with the built-in C collation. This is from cybertec-postgresql.com, and he's clearly advocating using the C collation for your database because it's relatively simple and more performant. Of course, the disadvantage is sorting the way one would typically expect for different languages. And he says, well, you can get around that by explicitly setting a collation for specific columns, or you can do it for particular queries; just make sure your index is compatible with that particular collation as well.
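A sketch of those per-column and per-query collation overrides; this assumes a Postgres build with ICU support, and collation names vary by setup:

```sql
-- Cluster initialized with the builtin C locale provider, e.g.:
--   initdb --locale-provider=builtin --builtin-locale=C ...

-- Override the collation on a specific column
CREATE TABLE people (
    name text COLLATE "en-x-icu"
);

-- Or just for one query
SELECT name FROM people ORDER BY name COLLATE "en-x-icu";

-- An index must use the same collation to support that ORDER BY
CREATE INDEX people_name_en_idx ON people (name COLLATE "en-x-icu");
```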
[00:17:15] And he says this is far better to do, so that the vast majority of your indexes are immune from collation changes when upgrades happen, and he shows you how you would create a cluster using the built-in C collation provider. So check this out if you want to learn more. Next piece of content: Inside PostgreSQL's 8 kilobyte page. This is from boringsql.com, and it's a definite deep dive into how Postgres's 8 kilobyte page is set up and all its different components. So if you want to learn more about that, definitely check out this blog post. And the last piece of content: Why it's fun to hack on Postgres performance, with Tomas Vondra, from the talkingpostgres.com podcast, so definitely something to check out if you're interested in performance work. I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content mentioned, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show as well as a full transcript. Thanks. I'll see you next week.