Zero Bloat Postgres Queue | Scaling Postgres 416

Episode 416 | May 10, 2026 | 00:15:09

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss a new zero-bloat Postgres queue called PgQue, maybe pgBackRest isn't dead, multixact members at 64 bits, and using session variables.

To get the show notes as well as get notified of new episodes, visit: 
https://www.scalingpostgres.com/episodes/416-zero-bloat-postgres-queue/

Want to learn more about Postgres performance? Join my FREE training called Postgres Performance Demystified here: https://www.scalingpostgres.com/courses/postgres-performance-demystified/


Episode Transcript

[00:00:00] Thankfully, I've never had to deal with queue issues that much. Most of my larger clients have a queue system set up outside of Postgres, maybe Redis or some other technology. [00:00:13] And at a certain scale you can run a queue in Postgres. For example, my current SaaS app uses the Rails Active Storage library, which is essentially using a Postgres queue with FOR UPDATE, SKIP LOCKED, etc. But what if you wanted to keep your queue in Postgres and push the limits of what's possible with things like FOR UPDATE SKIP LOCKED? Well, we're going to talk about that today, but I hope you, your friends, family and coworkers continue to do well. Our first piece of content is actually a repository, and this is Nik's repository from Postgres.FM. Just recently he released this tool called pgq, which he describes as a zero-bloat PostgreSQL queue built on top of Skype's battle-proven PgQ: one SQL file to install, pg_cron to tick. So the first notable thing about this: it is not an extension. If you look at the code distribution, it's basically a lot of PL/pgSQL files; you just clone the repo and run a SQL file to install it. You need Postgres 14 or greater, plus something that calls the pgq ticker periodically, so you could use pg_cron. Apparently you can set up a 1-second tick and it then re-ticks every 100 milliseconds, so basically 10 ticks per second. [00:01:45] That means this is available essentially everywhere; you don't have to install an extension to get started. But what really makes this interesting is the implementation: it does not use FOR UPDATE SKIP LOCKED. And you can see this cool graphical representation where he shows dead tuples rising with all the other queuing systems, whereas pgq essentially has no dead tuples, because it never updates or deletes. [00:02:13] What it does is use snapshot-based batching.
[00:02:17] So it uses snapshots to determine which transactions are new within the queue table and need to be processed, combined with a truncate-based rotation method for the tables: it cycles through three tables at a time, and that results in zero bloat and no performance decay. It's built for heavily loaded systems, just like the Skype PgQ architecture was. [00:02:44] It has all the same Postgres guarantees, and it even works on managed Postgres from cloud vendors. [00:02:51] Now, there is one trade-off, and that's latency, because pgq ticks 10 times per second by default, so jobs are going to have a 50 to 150 millisecond latency. I think for the vast majority of use cases that's perfectly fine, but it kind of depends on what you're using your queue for. [00:03:11] He said you could maybe reduce the latency by doing 20 ticks per second, [00:03:16] and there might be a way to push it even more. [00:03:19] But if you need single-millisecond dispatch, pgq is not the right tool for that use case. [00:03:25] If you want to be stable under load without bloat, though, that's what pgq was built to do. [00:03:32] Now, the next blog post related to this is "pgq: two snapshots and a diff", from thebuild.com, and of course he's talking about pgq. He covers how queues are typically built: you insert into some sort of jobs table, then you use a CTE to identify the jobs you want to process using FOR UPDATE SKIP LOCKED, then you update the jobs table to indicate which jobs were pulled off, and eventually you probably delete those rows. [00:04:06] But once you start scaling this up, you hit what he says Brandur called the queue "spiral of death": a long-running transaction of some sort holds back the global xmin, and autovacuum can't reclaim the dead tuples.
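The conventional pattern he walks through, insert, claim with FOR UPDATE SKIP LOCKED, update, then delete, might be sketched roughly like this; the table and column names are illustrative, not taken from any particular library:

```sql
-- Illustrative sketch of the conventional Postgres queue pattern
-- (schema invented for this example):
CREATE TABLE jobs (
    job_id    bigserial PRIMARY KEY,
    payload   jsonb,
    locked_at timestamptz,
    done      boolean NOT NULL DEFAULT false
);

-- A worker claims one job: the CTE picks an unclaimed row, skipping rows
-- other workers hold locked, and the UPDATE marks the row as taken.
WITH next_job AS (
    SELECT job_id
    FROM jobs
    WHERE NOT done AND locked_at IS NULL
    ORDER BY job_id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs j
SET locked_at = now()
FROM next_job
WHERE j.job_id = next_job.job_id
RETURNING j.job_id, j.payload;

-- When the work finishes, the row is updated again and eventually deleted
-- ($1 stands for the claimed job's id in a prepared statement):
UPDATE jobs SET done = true WHERE job_id = $1;
DELETE FROM jobs WHERE done;
```

Every claim and completion rewrites the row, so a busy queue constantly generates dead tuples for autovacuum to clean up.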
The jobs table grows, index lookups slow down, then inserts slow down because the indexes are full of dead entries, and the SKIP LOCKED scan slows down because of all the extra tuples it has to sort through. That causes throughput to drop, the backlog grows, and it's a never-ending cycle, a death spiral. pgq uses a different technique. Like I mentioned, it maintains three tables, and it looks like you can specify independent queues, with each queue getting its own set of three tables; it just cycles through them and truncates a table once it's done with it, so there are never any deletes or updates. And the ticker, which he says runs once per second (although from the documentation it looks like it's actually 10 times per second), populates a table that basically stores a snapshot for each tick: [00:05:21] the xmin, the xmax, and the list of in-progress transaction IDs at that moment in time. [00:05:27] When a consumer, someone connecting to the queue to see what jobs are available, runs, it basically takes the last tick and the current tick to determine whether there are any transactions that need to be processed. You can see here where a job is visible in the snapshot for the current tick but not visible in the last snapshot. [00:05:51] So the only changes that happen to this table are the inserts of new jobs coming in, and the only consumer hot path is a simple SELECT asking, hey, is anything available for me within this snapshot range? He of course goes into more detail about the implementation, including the truncation, and essentially the disadvantage of this approach is the latency involved in picking up each job. [00:06:17] But he says, you know, everyone's been setting up a similar queue system using FOR UPDATE SKIP LOCKED, and this is an innovative alternative, even if the underlying design is circa 20 years old.
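As a rough sketch of the two-snapshot diff idea, here is one way it could be expressed using Postgres's built-in snapshot functions (pg_current_snapshot() and pg_visible_in_snapshot() exist as of Postgres 13); the table layout is my own illustration, not pgq's actual schema:

```sql
-- Hypothetical sketch of snapshot-diff batching (not pgq's real schema).
-- The ticker records one snapshot per tick:
CREATE TABLE tick (
    tick_id    bigserial PRIMARY KEY,
    snap       pg_snapshot NOT NULL DEFAULT pg_current_snapshot(),
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Jobs are insert-only; each row remembers the xid that wrote it:
CREATE TABLE job (
    job_id  bigserial PRIMARY KEY,
    xid     xid8 NOT NULL DEFAULT pg_current_xact_id(),
    payload jsonb
);

-- A consumer's batch is the diff between the two latest ticks: jobs whose
-- writing transaction is visible in the new snapshot but was not yet
-- visible in the old one -- no UPDATE, no DELETE, no FOR UPDATE SKIP LOCKED:
SELECT j.*
FROM job j,
     (SELECT snap FROM tick ORDER BY tick_id DESC LIMIT 1)          AS new_t,
     (SELECT snap FROM tick ORDER BY tick_id DESC OFFSET 1 LIMIT 1) AS old_t
WHERE pg_visible_in_snapshot(j.xid, new_t.snap)
  AND NOT pg_visible_in_snapshot(j.xid, old_t.snap);
```

Because pg_visible_in_snapshot() only reports a transaction as visible once it has committed, an insert still in flight when a tick fires simply stays invisible until it commits and is picked up by a later tick, so nothing is lost.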
Basically, as the name of the blog post says, you take two snapshots and do a diff between them to see what has changed in the queue and needs to be processed. [00:06:37] So check this out if you're interested. Another blog post related to this is "Potential consequences of using Postgres as a job queue", from richyan.com. He talks about the typical queue using FOR UPDATE SKIP LOCKED and some problems related to it, and he mentions some alternatives, like using advisory locks instead, [00:06:58] as well as the PGQ that Skype developed but that is no longer really maintained, though of course we now have the new pgq. You could also use Redis or Kafka. [00:07:10] Next piece of content: "Maybe pgBackRest isn't dead". [00:07:14] There's been a maintenance update added to the pgBackRest repository in which the maintainer, David, said his inbox blew up, especially with messages from people or organizations who have pgBackRest users of their own to support. [00:07:33] They would prefer the project to continue with him as the primary maintainer. He said he didn't think this was going to happen when he was trying to do fundraising earlier, but now, based upon this influx of support, it appears, quote, "all but certain that I will be able to secure enough funding to continue the project". [00:07:52] I'm assuming that means as the primary maintainer, and that this will be supported by a coalition of sponsors, as well as potentially bringing on another maintainer to distribute the workload and provide continuity in the future. [00:08:05] And this is what I kind of suspected might happen, because if there are organizations relying on this tool and recommending it, they're probably going to step in to try and save the project or, alternatively, choose another solution. [00:08:20] So this is definitely an interesting outcome. Check this out if you're interested.
[00:08:24] The next blog post, related to that, is "PGX backup continuity support for pgBackRest", from thebuild.com. Apparently they have forked pgBackRest to make PGX backup, because David's request was that if the project is going to be forked, the fork should be renamed, and the post lists the support that PGX will be providing for this backup solution. But I wonder if this will be necessary, based upon the previous article; we'll just have to see. Next piece of content: "Multixact members at 64 bits: one last wraparound to worry about". This is from thebuild.com, and there's been an enhancement in Postgres 19 making the multixact offset 64 bits. [00:09:11] Now, this is not the multixact ID, which still has the same limit; it's the members. I believe we covered a blog post a number of months ago where an organization hit the multixact members limit, and they were doing things like tracking disk space to monitor how large it was getting. But apparently it's been updated to 64 bits, so that should no longer be a problem. [00:09:39] One downside he mentions is that when you upgrade, pg_upgrade is going to take a while to run this type of migration due to the size change. [00:09:52] But if you want more insight into this, you can definitely check out this blog post. [00:09:56] Next piece of content: "It depends: using session variables in Postgres", from pgedge.com. He's talking about how you can use variables in Postgres, basically using session variables. First he looks at what MySQL does: you can just use a SET command to set a variable and then use it directly in queries. With SQL Server it's a little bit more formal, but you can declare variables; the same goes for Oracle, which has its own syntax as well.
[00:10:28] Now, if you use psql, you can set psql client-side variables to do this type of thing, but if you're trying to do it from an application, that won't work, because those only exist in psql. [00:10:42] You can also set session variables: as long as you include a period in the variable name, Postgres will accept it as a custom variable. So you can create a variable set to anything you want, as you can see here, and then you can SHOW it. The problem is you can't use these directly in queries, because SET, SHOW, RESET, etc. are commands. What you can do instead is use the functions that work with these session variables: set_config() to set one and current_setting() to extract, or get, its value. He shows various examples of using those. [00:11:21] So if you want to learn more about this, definitely check out this blog post. [00:11:25] Next piece of content: "Cybertec contributions to PostgreSQL 19", from Cybertec PostgreSQL. A lot of these have been covered in previous episodes of Scaling Postgres, but I did want to say that the technique being used for REPACK CONCURRENTLY is a result of working with the people at Cybertec who developed pg_squeeze, because they were using that technique, logical replication, to keep the old table and the new table in sync and then do the changeover. So thanks for all their work to get this new feature into Postgres; hopefully it'll make it into Postgres 19. They also mention various documentation and usability enhancements, as well as getting Postgres to run on all the different flavors of Debian operating systems. So if you want to learn more, definitely check this out. Next piece of content: "pgkeeper: building the bouncer we needed for Postgres", from figma.com. This is yet another connection pooler for Postgres, and Figma decided to build their own. They were using PgBouncer for a time, but they were hitting limits with it.
Mainly, it's single-threaded, and I understand that can be a nuisance. [00:12:40] They also couldn't prioritize particular traffic, they wanted better connection management logic, and they essentially wanted better operational tooling. [00:12:51] They did evaluate PgCat as well, which would have given them a multi-threaded pooler, but adding all the different observability features they needed, plus feature flagging and admission control, [00:13:04] felt like too many changes to make, so they built pgkeeper instead. [00:13:10] If you want to learn more about the tool they built, as well as its rollout, definitely check out this blog post. [00:13:17] Next piece of content: "Christophe's seven rules of disaster response", from thebuild.com. His first one is "wind your watch", which, to his point, basically means take a deep breath and take your time to assess the situation; don't react too quickly. The second is "no one on the call who doesn't have something to do", so basically keep a small, focused team. Three is "have a final decision maker": don't have a bunch of developers sitting around on the call wondering what to do; you definitely need a decision maker there. [00:13:49] Next is "analyze, don't argue": [00:13:52] basically, rely on the data and follow where it leads, as opposed to trying to be right. Next is "point and call": explicitly say, all right, our plan is to do A, and then, now we are doing A; explicitly call out what you're doing. Next is "appoint someone to deal with senior management": [00:14:12] designate someone for communication so that management is kept up to date, but ideally keep them off the call. And lastly, "no postmortems until the all clear is called": basically, wait for the issue to be resolved, and then you can examine what went wrong and why. So I thought these were great guidelines to cover. And the last piece of content: "How are committers selected?"
This is from vondra.me, and he goes through how Postgres committers are selected and chosen for the project, in terms of who does it, the process, and the general criteria. So check this out if you're interested. [00:14:51] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content discussed, as well as sign up to receive weekly notifications of each episode. [00:15:02] There you can also find an audio version of the show, as well as a full transcript. Thanks; I'll see you next week.
