Episode Transcript
[00:00:00] At long last, asynchronous I/O is coming to Postgres, and it looks to be landing in Postgres 18. Asynchronous I/O has been around a while elsewhere, but it's great to be finally getting it in Postgres as well. And be sure to stay till the end of the show for my Consulting Corner segment, when I talk about when it makes sense to look deeper into a query and question how it's actually constructed. But I hope you, your friends, family and co-workers continue to do well.
[00:00:33] Our first piece of content is actually that new Postgres versions have been released, so PostgreSQL 17.5, 16.9, 15.13, 14.18 and 13.21 are out. This is from postgresql.org, and the first thing they mention, just as an FYI, is that Postgres 13 is reaching end of life this November 13th, so if you're still running Postgres 13 in production, you're probably going to want to upgrade in advance of that date. There was a security issue for these releases, but it was related to the GB18030 encoding encountering a buffer over-read when doing validation. So that security issue was fixed, if you happen to be using that encoding. And they have over 60 bug fixes that were resolved. I usually like looking at the first three and also checking the updating requirements, because those usually mean you have to do additional things beyond just installing the update. And the first bug listed is self-referential foreign key issues on partitioned tables. So that was fixed. Basically you could get a failure in enforcing the constraint fully, and they even make a note here, quote, to fix this please see the instructions in the updating section. And basically they're saying you should drop and recreate any of these self-referential foreign keys on partitions that were created or attached since the constraint was created. So basically you need to correct them to get around this issue. But feel free to review the additional bug fixes to see if they impact you and plan when you want to upgrade your version.
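Just to make that concrete, the fix they describe is essentially a drop and re-add of the affected constraint. Here's a rough sketch with a made-up table and constraint name, so definitely follow the exact steps in the release notes' updating section for your own schema:

```sql
-- Hypothetical example: "org_units" is a partitioned table whose
-- parent_id column references its own primary key (a self-referential FK).
-- Drop the existing self-referential foreign key...
ALTER TABLE org_units
  DROP CONSTRAINT org_units_parent_id_fkey;

-- ...and recreate it so it gets re-validated against all partitions,
-- including ones created or attached since the constraint was first added.
ALTER TABLE org_units
  ADD CONSTRAINT org_units_parent_id_fkey
  FOREIGN KEY (parent_id) REFERENCES org_units (id);
```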
[00:02:13] Next piece of content, PostgreSQL 18 Beta 1 is released. Postgres 18 itself is due to be released this fall, so for anything mentioned here there's no guarantee it will land in that release, but I would say the most significant improvement mentioned is support for asynchronous I/O and the AIO subsystem. Now, this first version only affects sequential scans, bitmap heap scans and vacuums, so if you have a highly optimized database and everything is using indexes, you may not get much of a benefit out of this particular feature. But for these use cases we have seen a 2-3x performance improvement, and we actually look at a blog post next that showcases some of those improvements. But there are plenty of other performance benefits. One is skip scan coming to Postgres 18, which could be awesome for multi-column indexes, some optimizations around OR and IN clauses within WHERE clauses, and also some join improvements as well. And we've talked about a lot of the improvements, but another big one in my opinion is UUIDv7 actually landing, with a function within Postgres 18 to support it, because I think this will supplant a lot of the UUIDv4 use cases and be the go-to if you wanted to use UUIDs for primary keys.
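As a quick illustration of why that's appealing, here's a rough sketch of using the new function as a primary key default, assuming it ships under the uuidv7() name that's in the beta today; the table and column names are just examples:

```sql
-- Time-ordered UUIDs as primary keys (Postgres 18 beta sketch).
-- UUIDv7 values are roughly sequential in time, so new rows land near
-- the end of the index rather than at random positions, which tends to
-- be friendlier to btree indexes than UUIDv4.
CREATE TABLE orders (
    id         uuid PRIMARY KEY DEFAULT uuidv7(),
    created_at timestamptz NOT NULL DEFAULT now()
);

INSERT INTO orders DEFAULT VALUES;
SELECT id, created_at FROM orders;
```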
[00:03:36] But check this out if you want to hear about all the goodies that are in Postgres 18 right now.
[00:03:41] Next piece of content, waiting for Postgres 18: accelerating disk reads with asynchronous I/O. This is from pganalyze.com, and he first talks about what async I/O is. Well, right now we have synchronous I/O. So one of the backends says, okay, system, I want to read an 8 kilobyte page, so it goes to the disk and it returns that data. Or, all right, now I need to read another 8 kilobyte page, it goes to the disk subsystem, it does its work and then it returns the data. So everything's processed synchronously, and there are wait states between each of these requests as it waits for data to be returned from the disk.
[00:04:18] Whereas with asynchronous I/O it can just say I need this 8 kilobyte page, and then this 8 kilobyte page, and make all of its requests of the async I/O subsystem, and then eventually those results will be returned, and it can presumably do other work while it's waiting, but overall there will be fewer waits. And they use the analogy of a librarian going to get one book at a time. Well, now there's a cart and you can pack multiple books on it and deliver everything in essentially one round trip. Now, async I/O at this point only supports reads, so it doesn't support writes at this point, and synchronous writes are actually super important with Postgres, to be able to get your write durably on disk before returning control back to the client that requested the insert. But like I mentioned, this is for reads, and right now it's only focused on sequential scans, bitmap heap scans, and vacuum operations. Now, there are three different I/O methods available.
[00:05:20] So essentially the sync method matches the existing behavior. So if you set the I/O method to sync in Postgres 18, the performance should be exactly like it was in 17. There's no difference. You can also set it to worker, and by default there are three worker processes that are dedicated to reading data.
[00:05:38] So the main process receiving the read request can pass off those requests to the workers. And these workers, as it says here, interact with the Linux kernel to fetch the data, which is then delivered into shared buffers without blocking the main process. So the main process doesn't have to wait for that work. And these read workers are shared across all connections and databases.
[00:06:01] And this worker method is supported on any operating system. But there's another I/O method, io_uring, and this is only available on the Linux platform. It was introduced in kernel 5.1, and it establishes a shared ring buffer between Postgres and the kernel. This is more efficient than the worker method, and it essentially eliminates the workers, so you don't even need worker processes to use this async I/O method. The caveat, of course, is that it only runs on Linux, and only on newer Linux kernels, and it requires the file system and the configuration options to be compatible with io_uring support as well. But let's see the async I/O in action. So they created a table with a single integer column and populated it with a hundred million rows, which was about 3.5 gigabytes. Async I/O can be particularly advantageous if you have network-attached storage, because there's more latency associated with it. The higher the latency, the more async I/O improves performance; if you have a very fast local SSD, there's less latency, so we don't see as much improvement from adding async I/O. They used an AWS c7i.8xlarge instance, which has 32 virtual CPUs, and they used an io2 class EBS volume provisioned at 20,000 IOPS. So first they tested Postgres 17, and they did a sequential scan, so a count from this table, since there are no indexes on it. It ran in just over 15 seconds. Then they ran it on Postgres 18 with the I/O method set to sync, so again, this is essentially how Postgres behaves today, and that ran in just over 15 seconds. So pretty much identical between 17 and 18.
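To give a feel for the knobs involved, here's a minimal sketch of how you might set up a similar comparison on a Postgres 18 beta, assuming the io_method and io_workers settings as they appear there; the table name and row count just follow the post's description, and changing io_method requires a server restart:

```sql
-- Roughly recreate the test table from the post: one integer column,
-- one hundred million rows (around 3.5 GB of heap).
CREATE TABLE aio_test (id int);
INSERT INTO aio_test SELECT g FROM generate_series(1, 100000000) AS g;

-- Switch I/O methods between runs (each change needs a restart):
ALTER SYSTEM SET io_method = 'sync';        -- behaves like Postgres 17
-- ALTER SYSTEM SET io_method = 'worker';   -- uses dedicated I/O workers (3 by default)
-- ALTER SYSTEM SET io_workers = 6;         -- optionally raise the worker count
-- ALTER SYSTEM SET io_method = 'io_uring'; -- Linux kernel 5.1+ only

-- The query being timed: a full sequential scan, ideally with a cold cache.
SELECT count(*) FROM aio_test;
```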
[00:07:53] And then they switched to the worker method, and again by default it uses three I/O workers, and that ran in 10 seconds. So not quite twice as fast, but pretty good. They also raised the number of workers, which shows up in the graph we'll look at in a second. But then they ran it using io_uring, and it ran in a little over five seconds. So basically three times as fast as the sync method.
[00:08:18] So that is super impressive. They also compared cold and warm cache, and as you can see, the improvements come with a cold cache, because you're doing more disk accesses. In terms of timing, you can see io_uring was a little over 5 seconds, where it was a little over 15 seconds for PG 17, and a little bit faster, but still over 15 seconds, for the PG 18 sync method. And you can see with three worker processes, when you've chosen that option, the performance was around 10 seconds, and they got it down to maybe 7 and a half seconds by going to 6 or 16 workers. So there are definitely diminishing returns, but io_uring is definitely impressive. I mean, I know it's only sequential scans, vacuums and bitmap heap scans, but this one enhancement can triple the performance of some of these operations.
[00:09:07] That's pretty impressive. And they mentioned some tweaks being made to effective_io_concurrency and some considerations relative to that. They also mentioned that with async I/O, some of the system views and some of the monitoring tools we've relied on to give us insight into how much disk is being used will have to be updated, because a lot of the wait events will be different. For example, it may be the workers that actually show a data file read when you're looking in pg_stat_activity, while the backend that the client is connected to is essentially waiting on an asynchronous I/O completion. So the wait events will be different and you'll have to interpret things differently. And when you're talking about io_uring, you don't even see what's going on in those terms, it's not really visible that way, but there is a new view that gives you some insight into it called the pg_aios view, and this can, as they say, help you debug I/O requests in flight, even when using io_uring. And even when you're using EXPLAIN ANALYZE, some of the timing information is going to be harder to interpret, but overall this type of performance improvement is definitely worth any kind of disadvantage when it comes to tracking and observability.
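As a quick sketch of what that monitoring might look like on a Postgres 18 beta, you could combine the familiar wait event columns with the new view; I'm just selecting everything from pg_aios since its exact columns may still change before release:

```sql
-- See which backends are currently waiting, and on what kind of event.
SELECT pid, backend_type, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;

-- Inspect asynchronous I/O requests in flight (new view in Postgres 18).
SELECT * FROM pg_aios;
```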
[00:10:24] But if you want to learn more, definitely check out this blog post. Next piece of content, PgDog vs PgBouncer. This is from pgdog.dev, and PgDog is a new sharding connection pooler. He's been working on its performance and wanted to compare it against PgBouncer, because he thinks that's the North Star in terms of the performance he wants to be able to achieve, and according to this chart he seems to be doing pretty well. So PgBouncer does have a slight edge at the 1 to 10 connection counts, but when it gets up to, I think this is logarithmic, I guess around 50 connections, PgDog and even PgCat pull away at that point. Now, of course, the thing to keep in mind is that PgBouncer is single-threaded, whereas PgDog and PgCat use worker threads. He shows the configuration he used for each. It mentions how PgBouncer uses libevent for async messaging, whereas PgCat uses Tokio, which is a Rust library. But again, the key thing about PgDog is sharding, so being able to shard writes. So if you had a need for that, maybe you'd like to check out this blog post.
[00:11:32] Next piece of content, there was another episode of Postgres FM last week. This one was on synchronous_commit, which is kind of ironic given that we're talking about async I/O today, because synchronous_commit is a setting that lets you determine when write operations should be considered committed to disk. By default it's on, which means when you do an insert from a client, it's not going to return control back to the client until that write is actually flushed to the disk.
[00:11:59] Now, you can turn synchronous_commit off, but then you're at risk of data loss if something happens between the point at which you request an insert at the client and the point it gets flushed to the disk. If the system shuts down, you could lose that data. But there are many more settings for synchronous_commit beyond on and off, because this one variable also controls what happens across multiple replica servers as well.
[00:12:26] So you could also set it to local, or set it to remote_write or remote_apply, basically determining a commit by whether something has been written to the replicas, or waiting until it's all the way applied to the database files of the replicas themselves. And Nikolay definitely expressed his frustration at the fact that for some of these settings, what values are valid and what they mean depends on the context in which you use them. So it seems a bit like reusing configuration variables, and you have to look at multiple settings to determine what the actual behavior of Postgres is. They also talked about synchronous_standby_names, to be able to create synchronous replicas as well.
[00:13:06] And maybe the important point is that you probably always have synchronous_commit on, or set at a particular level; if you have multiple synchronous replicas, maybe you have it at remote_write. But you can also change the setting per session, per user, or per transaction, so that gives you a little bit more flexibility. Personally, I've never turned it off or considered it, because the data is generally more important than getting the most performance out of it. And they even made the point that if you're under write pressure, maybe just try batching those writes, so use something like a multi-row insert where you have one commit but you're committing a thousand rows of data. Try that first before considering turning off synchronous_commit. But if you want to learn more, you can definitely listen to their show here or watch the YouTube video down here.
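To make that flexibility concrete, here's a rough sketch of both ideas, with a made-up table name: scoping synchronous_commit to a single transaction instead of the whole server, and batching many rows into one commit.

```sql
-- Relax durability for just this one transaction (e.g. low-value
-- telemetry) while leaving the rest of the system at the default.
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO metrics (recorded_at, value) VALUES (now(), 42);
COMMIT;

-- Or keep synchronous_commit on and simply batch: one commit, many
-- rows, which amortizes the WAL flush across the whole batch.
INSERT INTO metrics (recorded_at, value)
VALUES (now(), 1), (now(), 2), (now(), 3);  -- ...up to thousands of rows
```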
[00:13:55] Next piece of content, CREATE INDEX: data types matter. This is from cybertec-postgresql.com, and he's looking at index build times for different data types. So he looks at an int, an int8, which is a bigint, a float, a numeric and a text column. In this table he created 50 million random rows that look like this here, then applied an index to each column and looked at the timing, and the fastest index was created in 15 seconds, and that was on the integer.
[00:14:30] The bigint was a little slower at a little over 16 seconds. The floating point was a little over 22 seconds, so slower, but still not a ton. Then numeric gets even slower at 26 seconds, and finally text comes in at a whopping 42 seconds, and they have a little chart showing that here. So basically your index build times vary with the data type you're using. I would have also liked to have seen a timestamp, because I know that can take some significant time as well. But if you want to learn more, check out this blog post.
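If you want to run a comparison like this yourself, a minimal sketch might look like the following; the table and column names are my own, and the post's exact data generation may differ:

```sql
-- One table with the same column types being compared.
CREATE TABLE idx_test AS
SELECT g::int            AS col_int,
       g::bigint         AS col_bigint,
       random()::float8  AS col_float,
       random()::numeric AS col_numeric,
       md5(g::text)      AS col_text
FROM generate_series(1, 50000000) AS g;

-- In psql, turn on \timing to see how long each build takes.
CREATE INDEX ON idx_test (col_int);
CREATE INDEX ON idx_test (col_bigint);
CREATE INDEX ON idx_test (col_float);
CREATE INDEX ON idx_test (col_numeric);
CREATE INDEX ON idx_test (col_text);
```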
[00:15:04] Next piece of content. Another look into PostgreSQL, CTE materialization and non idempotent subqueries. This is from Shayan.dev and we talked about the blog post that he's referring to here last week and he reviews it again and goes through different scenarios of changing the query to see what different results happen.
[00:15:25] Do you get a limit query executing multiple times? And I remember at the time I mentioned I haven't really seen a CTE written this way. What I normally see is that the CTE does the select and then you have a delete statement that uses that CTE, and that's essentially what he's written in this last one here. So this does the select, it used to be one row, but he's doing 10 rows in this example, and then you run your delete, returning the item ID.
[00:15:54] So this is the way I've most commonly seen a CTE used for this type of pulling items off of a queue: using FOR UPDATE SKIP LOCKED and then deleting the item. And you also have the benefit that you can now materialize it, which he's done here. But if you want to learn more, you can check out this blog post.
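For reference, the general shape of that pattern, with a hypothetical queue table, looks roughly like this:

```sql
-- Pull a batch of items off a queue: select and lock them in a
-- materialized CTE, then delete exactly those rows and return them.
WITH next_items AS MATERIALIZED (
    SELECT item_id
    FROM queue_items
    ORDER BY created_at
    LIMIT 10
    FOR UPDATE SKIP LOCKED
)
DELETE FROM queue_items q
USING next_items n
WHERE q.item_id = n.item_id
RETURNING q.item_id;
```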
[00:16:10] Next piece of content, Anatomy of a Database Operation. This is from karenjex.blogspot.com, and this is a very long presentation and blog post based on a talk she did for, I believe, the Django community. It looks at the process a query goes through from a Django app: how it gets connected, does it go through a connection pooler, as well as parsing the query, doing the rewrite and the plan, executing it, and then returning the data. So the whole anatomy of a database operation. If you're interested in something like that, definitely check out this blog post.
[00:16:49] Next piece of content, waiting for PostgreSQL 18: add function to get memory context stats for processes. So there has been a view since Postgres 14, as they mention here, called pg_backend_memory_contexts, and honestly I have to say I've never looked at this view, but it's there to help you if you are running into memory constraint issues. Well, now there's a function to give you the same type of information for other processes.
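As a quick taste, the existing view lets any backend inspect its own memory usage, something like the sketch below; the new Postgres 18 function is described as taking a process ID so you can look at other backends, but check the docs for its exact name and signature:

```sql
-- Top memory contexts for the current backend (view available since Postgres 14).
SELECT name, total_bytes, used_bytes
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 10;
```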
[00:17:16] Frankly, I usually deal with disk I/O considerations more than memory, but if you want to learn more about this, you can definitely check out this blog post.
[00:17:25] And now it's time for my consulting corner.
[00:17:29] So I was asked to optimize a particular query. There was a table that had a set of events, and they wanted to look at the last 24 hours of events, and then they had another table that recorded activities for those events, so there would be multiple activities or steps that had to be done for each of those events, with particular people assigned to them. The problem was that there was an intermediate join table. When you look at the last 24 hours of events, that's not an insignificant number; there were quite a few events over a 24-hour period. And then when it joined through that intermediate table to look at all of the different activities for each event, it would explode out the number of rows. Now, there was a where clause on that activity table, but even when Postgres could use it, it wasn't date-constrained, so it would still return thousands of rows spanning many years.
[00:18:25] So this is somewhere I kind of had to step back and say, okay, wait a minute, if you're looking at events over the last 24 hours, why can't you also constrain the activities? Maybe not to that exact range, but you know an activity is not going to happen for an event that doesn't exist yet, so you can apply that same lower bound of date they're using on the events to the activities as well. So this is a case of just using logic to say, well, it should still give you the same answer if you add an additional where clause on a timestamp for the activity table. So it was actually a where clause change.
[00:19:03] As well as adding a multi-column index on the column they were narrowing the activities by plus the timestamp, so it could get a really efficient scan. And now the query against that table went from retrieving thousands and thousands of rows to only a handful, fewer than 10.
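The shape of the change, with completely made-up table and column names since I can't share the real schema, was roughly this:

```sql
-- Before: the activities side of the join had no date constraint, so it
-- pulled years of rows for every matching event. After: reuse the events'
-- lower time bound on activities too, since an activity can't predate
-- the event it belongs to.
SELECT e.id, a.status, a.assigned_to
FROM events e
JOIN event_activities ea ON ea.event_id = e.id
JOIN activities a        ON a.id = ea.activity_id
WHERE e.created_at >= now() - interval '24 hours'
  AND a.status = 'pending'
  AND a.created_at >= now() - interval '24 hours';  -- the added bound

-- A multi-column index on the narrowed column plus the timestamp lets
-- that extra predicate become an efficient index scan.
CREATE INDEX ON activities (status, created_at);
```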
[00:19:22] So that dramatically improved the query performance.
[00:19:25] So sometimes it's not just about finding an index to add, or the right index; it's about rethinking what the use case for this query is and how you could approach it differently to get the same answer in a much more efficient fashion. Hopefully you found that helpful.
[00:19:42] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com where you can find links for all the content mentioned, as well as sign up to receive weekly notifications of each episode there. You can also find an audio version of the show as well as a full transcript. Thanks and I'll see you next week.