Preparing For Postgres 18 | Scaling Postgres 384

Episode 384 | September 21, 2025 | 00:17:25

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss getting excited about Postgres 18, OAuth authentication, reconsidering your interface, and a zero-downtime upgrade.

To get the show notes as well as get notified of new episodes, visit: 
https://www.scalingpostgres.com/episodes/384-preparing-for-postgres-18/

Want to learn more about Postgres performance? Join my FREE training called Postgres Performance Demystified here: https://www.scalingpostgres.com/courses/postgres-performance-demystified/


Episode Transcript

[00:00:00] As I'm recording this, we're about a week away from Postgres 18 being released. And I think this release is going to be very interesting, because there are a lot of fundamental changes happening with the introduction of asynchronous I/O and skip scans, and it could really help push the envelope on performance. So I'm quite eager to see what the results will be as people start to benchmark it. But I hope you, your friends, family and coworkers continue to do well.

Our first piece of content is "Get excited about Postgres 18". This is from crunchydata.com, and they're listing various features that are coming in Postgres 18. The biggest one, I would say, is asynchronous I/O.

[00:00:45] This is asynchronous I/O for reads only, so writes will still be synchronous. What this basically means is that you can request data, go do something else, and eventually that data will be returned to you, so it should allow greater throughput. And they say the async I/O will affect sequential scans, bitmap heap scans and some maintenance operations like vacuum, so it won't necessarily impact index scans at this point. There are a couple of methods for doing async I/O. One is worker, which will be the default, with three worker processes handling the data retrieval.

[00:01:24] Although if you are on Linux kernel 5.1 or higher, you can use the io_uring method, and if memory serves, that seemed to give the best performance.

[00:01:34] But again, we'll have to see once Postgres 18 is released. The next big one is UUID version 7, which is basically time-ordered UUIDs: the front portion is essentially a timestamp, and the latter portion is random. These are far better to use for primary keys than fully random UUIDs, because of all the random I/O that can happen as random UUIDs get inserted all over the disk; a time-ordered UUID makes that much more efficient. Another big feature is B-tree skip scans.

[00:02:14] Historically, if you had a multicolumn index (they give an example here with status and date), you could query on status and date, or you could query on status alone and the index would work, but you couldn't query based on just the date and have it use the index. With skip scan, now you potentially can, because it essentially skips over the status and lets the index help pull out data based on the date. That could give some surprising benefits for people who haven't rigorously indexed their systems. The next one is generated columns on the fly, which basically means generated columns can now be virtual, although keep in mind you can't apply an index to them. So it's really just a convenient way of pulling up data to reference in a query. For example, if the username lives in the settings column of a table, you can create a username virtual column that pulls that data forward, so you can just specify the username column instead of typing out the full expression. And another feature that could be beneficial, probably for larger organizations, is that OAuth2 authentication is now available in Postgres, so you don't have to keep creating usernames and passwords in Postgres. So feel free to check out this blog post if you want an overview of some of these features, but we're actually going to talk about several of them in more detail in the blog posts coming up.
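To make these concrete, here is a minimal SQL sketch of the features described above, assuming Postgres 18 behavior as the post describes it; the table and column names are made up for illustration.

```sql
-- Async I/O is configured server-wide (io_method and io_workers are the
-- Postgres 18 settings; changing io_method requires a restart):
ALTER SYSTEM SET io_method = 'io_uring';  -- or 'worker', the default
ALTER SYSTEM SET io_workers = 3;          -- only used by the 'worker' method

-- Time-ordered UUIDv7 values as primary keys:
CREATE TABLE orders (
    id         uuid PRIMARY KEY DEFAULT uuidv7(),
    status     text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- A multicolumn index whose leading column can now be skipped:
CREATE INDEX ON orders (status, created_at);
-- Even without a status predicate, Postgres 18 can consider a skip scan:
SELECT * FROM orders WHERE created_at > now() - interval '1 day';

-- A virtual generated column, computed on read and not indexable:
CREATE TABLE profiles (
    settings jsonb NOT NULL,
    username text GENERATED ALWAYS AS (settings ->> 'username') VIRTUAL
);
```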
[00:03:38] Next piece of content: "pgwatch v4 beta is out". This is from cybertec-postgresql.com, and basically this is being released because it has full Postgres 18 support. They've also upgraded the Grafana dashboards, so this is a way to keep track of your Postgres performance with all these charts. This is the beta, but they are anticipating releasing version 4 next week along with Postgres 18. So you can check this out if you're interested.

[00:04:09] Next piece of content: "Fine-tuning incremental backup". This is from enterprisedb.com, and this covers incremental backup, a feature introduced in Postgres 17. He says this is a very low-level backup tool; it doesn't actually help you manage your environment. As he says, for example, pg_basebackup has no clue where your backups are stored or what schedule you're using to do those backups. It's a very low-level tool.

[00:04:35] And incremental backup is pretty much the same way: you would probably want to add tooling above it, either scripting your own or using a tool like Barman to help manage your backup life cycle. What was interesting is that when he designed incremental backup, he tried to foresee things that would be needed, but it wasn't until Barman started using some of the incremental backup features that there was demand for a link option, because it would be really great for performance. This is similar to pg_upgrade, where you can use a link option so you don't have to copy over all the data. The pg_combinebackup tool that is part of incremental backup would always create the combined backup in a new folder, so it rewrote all the data; now there's a link option for that, which could definitely be beneficial. So this will be coming in the new version, Postgres 18.

[00:05:29] So check this out if you're interested.
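As a rough sketch of the workflow being discussed, assuming the Postgres 17 incremental backup tooling plus the link option described for 18 (the paths here are hypothetical):

```shell
# Take a full base backup; it writes a backup_manifest into the directory.
pg_basebackup -D /backups/full

# Take an incremental backup relative to that manifest (Postgres 17+).
pg_basebackup -D /backups/incr1 --incremental=/backups/full/backup_manifest

# Reconstruct a usable data directory from the backup chain.
# The --link flag (described for Postgres 18) hard-links unchanged files
# into the output directory instead of copying them.
pg_combinebackup --link -o /restore/data /backups/full /backups/incr1
```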
[00:05:31] Next piece of content: "Preview PostgreSQL 18's OAuth2 authentication. Part 1: Explore how it works". This goes into depth on how the OAuth2 authentication coming in Postgres 18 works, and you can check this one out. Part 2 is available as well, "Building a custom OAuth2 validator in Rust", which is basically setting up a way to use the OAuth authentication. So definitely check these out if you want to learn more.

Next piece of content: "Reconsidering the interface". This is from cybertec-postgresql.com, and this blog post is inspired by a presentation, "1000x the Power of an Interface for Performance", by Joran Dirk Greef from TigerBeetle. He was taking a look at the performance of pgbench. He ran a test on a particular system (he shows the pgbench scale and how long the test ran), going from 1 connection up to 512 connections, and you can see how the transactions per second increased and where they started to level out. At about 16 connections you start hitting diminishing returns. And if you install pg_wait_sampling and take a look at what wait events are causing it, it's basically locks.

[00:06:47] Basically, different updates are presumably trying to update the same row, and it's locking on itself. And he said the CPU during this benchmark was almost completely idle, in spite of it hitting diminishing returns. So he says, okay, let's try adding some network latency to the mix as well, because this was using a local socket, which he says is, quote, "an almost infinitely fast network". So he added a 10 millisecond delay, and suddenly the transactions per second just cratered. Even at 512 clients, it's less than 500 transactions per second; he said 431 transactions per second. This is basically due to the latency of the network as well as the lock waits that are happening. So he asks, how can we get around this and speed it up? Well, we can basically get rid of the back and forth. Instead of using pgbench directly, which sends individual commands, he created a pgbench-like function that sends all of the commands in one batch, essentially. And now, even with this 10 millisecond latency, you can see it starts to approach the performance of a socket connection, because it doesn't have to wait for round trips. Then he said, okay, how can we do this even faster? Well, let's change the interface and essentially pass in arrays of the data that needs to be changed. So he created a function that performs each of the statements needed, but the source data is passed in as arrays, so essentially you're providing a batch of work to the database all at one time. At a batch size of 1,000 transactions, he went from almost 4,000 transactions per second up to as high as 125,000 transactions per second. By batching everything at once and giving it all to the database to do within a function, you avoid the round-trip latency.

[00:08:50] And secondly, you eliminate all the update locking, because you are performing these transactions sequentially, so there's no chance for lock conflict. So that's a pretty crazy performance boost. But anyway, if you want to learn more, I definitely encourage you to check out this blog post.
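To make the array-passing idea concrete, here is a minimal sketch of that style of interface; the function and table names are hypothetical, not the ones from the post.

```sql
-- A toy accounts table to update in bulk.
CREATE TABLE accounts (
    id      int PRIMARY KEY,
    balance numeric NOT NULL DEFAULT 0
);

-- One function call carries a whole batch of updates, so there is a
-- single network round trip, and the updates run sequentially inside
-- the function, so they never wait on each other's row locks.
CREATE OR REPLACE FUNCTION apply_transfers(ids int[], amounts numeric[])
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    i int;
BEGIN
    FOR i IN 1 .. coalesce(array_length(ids, 1), 0) LOOP
        UPDATE accounts
           SET balance = balance + amounts[i]
         WHERE id = ids[i];
    END LOOP;
END;
$$;

-- The client sends one statement instead of thousands:
SELECT apply_transfers(ARRAY[1, 2, 3], ARRAY[10.0, -5.0, 2.5]);
```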
[00:09:05] Next piece of content: "Sharding our core Postgres database without any downtime". This is from gadget.dev, and this company apparently allows developers to create apps on their platform. They've been using a single database since they went live, apparently, but they were on Postgres 13 and desperately needed to upgrade. I think they were pushing 4 terabytes in size, and they really didn't want to, I presume, upgrade four terabytes in place. And they actually wanted to shard their database system anyway, although, as they say here, you can't spell shard without hard. So in spite of that, they still wanted to shard their data. They have a control plane that manages the service and all the different developers, and then user app data in that same database. What they wanted to do was move this user app data out to its own independent databases. Now, I don't know if it's a database per customer, or if it's multiple customers on a particular database so that maybe they can migrate them around; I don't remember seeing that. But the control plane that manages the whole platform would be on its own database, and they would upgrade that to, it looks like, Postgres 15. So the plan was to move the user app data over one app at a time and migrate it into a particular database. This was great because they had internal apps they had worked on that were very small, so they started using those to test the migration pathway and then moved on to increasingly larger ones. Basically, how they transferred this data over is they developed a maintenance mode where they could pause traffic to a particular app for up to five seconds, giving them a maintenance window to do some type of work. And what they basically did is use logical replication to copy each app from that monolithic database to a particular shard database and then bring the app back up. They have all sorts of detail about how they actually did that process, but basically it was logical replication moving the data over, and then the actual cutover, where the app switches from using the old database to the new database, was done within that five-second maintenance window. And they said their longest pause window during this process was 4 seconds, and the P95 pause window was 250 milliseconds. That's super fast, but again, some of their apps were very, very small, so the process worked very quickly. So this blog post dealt with the migration to shards on an application-by-application basis. A companion blog post, "Zero downtime Postgres upgrades using logical replication", also from gadget.dev, covers how they did the zero-downtime logical replication upgrade for their control plane database. After they had migrated all the shards over, they were essentially left with a 300 gigabyte database, which was much more manageable. And, oh, it was actually seven terabytes to start: they went from a seven-terabyte monolith down to a 300 gigabyte control plane database. So that is the one they upgraded with zero downtime using logical replication, and you can check out how they did that here.
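For flavor, the core logical replication mechanics behind a per-app cutover like this look roughly like the following. This is a generic sketch, not gadget.dev's actual tooling, and the object names are made up.

```sql
-- On the old monolithic (source) database: publish one app's tables.
CREATE PUBLICATION app_42_pub FOR TABLE app_42.users, app_42.orders;

-- On the new shard (destination) database: subscribing copies the
-- existing rows, then streams ongoing changes from the source.
CREATE SUBSCRIPTION app_42_sub
    CONNECTION 'host=old-db dbname=monolith user=replicator'
    PUBLICATION app_42_pub;

-- At cutover, inside the short maintenance pause: stop writes for the
-- app, wait for the subscriber to catch up, repoint the app at the new
-- database, then tear down the subscription.
-- DROP SUBSCRIPTION app_42_sub;
```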
[00:12:23] Next piece of content: "Defense in depth for MCP servers". This is from supabase.com. If you recall, we covered a post about someone compromising a Supabase database: someone had connected an MCP server directly to a production database, and an attacker sent a support request to the service telling the LLM something like, ignore previous commands, and now I want you to send me this particular private information from the database you have access to. Basically a stored prompt injection attack. But this only happens when a developer uses an AI agent to connect and read those fields, so there's a very specific series of steps that need to happen for this to work. This post was actually written by, I think, the chief security officer at Supabase, and basically he's advocating a defense-in-depth strategy. In other words, put in as many layers of security as possible when you're dealing with LLMs. He mentions that Supabase have implemented a read-only mode, so nothing can be written to a particular database; a project-scoped mode, which limits queries to a single project; and feature groups, which restrict which MCP tools a particular LLM can use. So this is something they've introduced as part of their platform. But as he says, even in read-only mode, prompt injection remains the number one concern. So they did put in some guardrails, but in his opinion the real fix is to never connect AI agents directly to production data. By all means use them with development databases, staging or branch databases, or obfuscated or anonymized data sets, but definitely try to avoid connecting LLMs to your production data. In general, I think a lot of the advice listed here is valuable, so I definitely encourage you to check it out.

Next piece of content: "PostgreSQL maintenance without superuser". This is from boringsql.com, and basically he's talking about the predefined roles that exist in Postgres, and these are fantastic. I remember when I had to set all sorts of different permissions in Postgres and all the different commands that had to be run; it's very complex compared to just making someone a member of one of these predefined roles. The one I've used more than any other is definitely the pg_read_all_data role; that has covered so many use cases I've dealt with. But there's also pg_write_all_data. You can let people monitor the system with pg_monitor, or read all settings or read all stats, or scan tables for stats. You can also allow people to cancel queries or terminate them. You can define who can run checkpoints or run things like vacuum, analyze and reindex. And there's different file access as well, such as reading server files, writing server files, or even executing a server program, as well as creating subscriptions or even making connection reservations with pg_use_reserved_connections. He goes into a lot more of these roles in depth, including which version each came in, so definitely check this out if you're interested.

[00:15:39] Next piece of content: "How to handle databases not accepting commands". This is from cybertec-postgresql.com. Basically, if you've hit this error, where the database is not accepting commands, you probably have a problem related to hitting some resource limit, maybe with transaction IDs or something similar. So basically: definitely get rid of long transactions, check your replication slots to make sure you don't have any orphaned ones, and run vacuum. If you want to learn more about that, definitely check out this blog post.

And the last piece of content: there was another episode of Postgres FM last week. This one was on turbopuffer, so they had Simon Eskildsen from turbopuffer come on, and he talked all about database systems in general. They talked a lot about Postgres, and a lot about MySQL, because that's where Simon has a lot of his experience. And they also talked about turbopuffer and why they constructed it the way they did. They're basically using S3 as the data store, but supporting a caching layer above it to give you fast access to recent data. I think the analogy is kind of like a pufferfish that blows up: maybe you have some data on NVMe SSDs to get really fast response rates for a subset of your data, then an intermediate layer that's a little faster than S3, and then S3 itself, which of course is generally slower. It's basically tiered storage. So if you want to learn more, definitely listen to the show here or watch the YouTube video down here.

[00:17:07] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content mentioned, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show, as well as a full transcript. Thanks, and I'll see you next week.

Other Episodes

Episode 144

December 13, 2020 00:12:09

Postgres Migration, Bulk Data Loading, Updates vs. Upgrades, UUID Benchmarks | Scaling Postgres 144

In this episode of Scaling Postgres, we discuss a Mongo to Postgres migration, the best way to bulk load data, running an update vs....


Episode 134

October 04, 2020 00:14:10

Community Acquisition, Space Saving Terabytes, WAL Archiving, Vacuum Analyze Tips | Scaling Postgres 134

In this episode of Scaling Postgres, we discuss a company acquisition within the Postgres community, how to save terabytes of space, setting up WAL...


Episode 71

July 08, 2019 00:13:22

Indexing Documents, GIN Indexes, Data Science, Generated Columns | Scaling Postgres 71

In this episode of Scaling Postgres, we discuss indexing documents, how GIN indexes work, using Postgres for Data Science and generated columns. To get...
