Episode Transcript
[00:00:00] It's interesting to think about what's in store for the architecture of Postgres in the future. So you have solutions like Aurora or Neon that have actually replaced the storage layer of Postgres, enabling you to have independent compute nodes that are separate from the storage. And this has many advantages for being able to scale the compute load up and down really quickly. So if you want five more replicas, you can bring them up pretty fast compared to standard Postgres, where you would need to build five replicas and copy the entire database to each of them. But there are also more internal changes being talked about, particularly the ones that OrioleDB is working on.
[00:00:45] But we're going to be talking about some of these today, and also be sure to stay until the end for my consulting corner, where I talk about a rollback method we came up with for RDS Blue Green deployments. I mean, it's definitely not rocket science, but it works. But I hope you, your friends, family and co-workers continue to do well. Our first piece of content is the differences between OrioleDB and Neon. This is from orioledb.com, and I think the main difference between these is that Neon has replaced the storage layer entirely with essentially a logging mechanism, storing the data in object storage but caching it in something called page servers. So their innovations are very storage-layer focused, whereas OrioleDB is making more low-level code changes to Postgres. So they're implementing MVCC, or multi-version concurrency control, using an undo log as opposed to creating additional row versions in the heap, which helps prevent bloat and lets you avoid having to run vacuum. You also get I/O-friendly copy-on-write checkpoints and very compact row-level WAL, as well as a new shared memory caching layer based on swizzled pointers, which I still need to do research on.

Then they go into some of the architectural comparison. Basically, OrioleDB with that new shared memory caching layer helps prevent bottlenecks in the buffer manager, where I have seen a lot of I/O waits on really loaded systems on occasion, and it removes WAL writer bottlenecks as well because of, again, the row-level write-ahead logging. Neon really doesn't have any of these enhancements, although you can instantly add additional compute nodes because compute is separated from storage. In terms of I/O scalability, OrioleDB implements copy-on-write checkpoints and row-level WAL to save on IOPS, whereas Neon, again replacing the storage system, uses a distributed network storage layer that he says can potentially scale to infinity. The only disadvantage would be network latency. And in terms of vacuum and bloat, because OrioleDB is using an undo log, you basically eliminate the need for routine vacuums, and it also does automatic merging of sparse pages to minimize the risk of bloating, whereas Neon still basically has to use the standard Postgres vacuum. Now, I don't know why this is the case, but he says the scalability of multi-tenant storage mitigates the disadvantages of vacuum and the risk of bloat. I don't know if that's just because of its high performance, but I'm not sure.

And then he shows the architecture of Neon. Basically you have the compute nodes and then the multi-tenant storage, which is the area that they have essentially rewritten. And you can see the WAL is streamed to a quorum of safekeepers, which is a replicated log service, and it eventually gets stored in object storage. And then page servers are how the reads are done, so they cache recent data, but the cold storage is in the object storage. Now, he's saying OrioleDB can also use distributed S3 object storage. So it has a special experimental mode where it manually evicts cold data to S3 object storage. So basically you have hot local storage, which could be very fast NVMe SSDs, and the cold storage would be in S3. And he mentioned that Neon is able to scale to zero, which basically means removing all compute nodes when there's no active traffic happening.
And he says OrioleDB can do the same thing in its experimental mode, because you can actually get rid of your compute nodes if there's no activity happening, since all your data is being stored in S3, and they have a chart showing a little bit about that storage. He mentions a little bit about production readiness. So OrioleDB is still in beta and not recommended for production usage, and they're still actively working to get it to be just an extension that doesn't require patches to Postgres, but for now it still does. Whereas Neon is more like a service you would get, and they patch Postgres to be able to do the things that they do.
[00:05:12] So I thought this was an interesting blog post on what could potentially happen in the future with the architecture of Postgres. I am interested in seeing some of these OrioleDB features making it into Postgres if they can, or at least becoming available as an extension. And it also would be super interesting to be able to scale compute independently of the storage with Postgres itself. But time will tell what will happen with that. But check this out if you want to learn more. Next piece of content: there was another episode of Postgres FM last week. This one was on multi-tenant options, and here Nikolay and Michael were joined by Gwen Shapira to talk about multi-tenant architectures, because she actually, I think, helped found Nile, and they're trying to make it easy for developers or SaaS companies to set up their multi-tenant database, because they were saying a lot of SaaS companies struggle to come up with what model they want to use for their multi-tenancy. And she mentioned pooled versus isolated. So pooled means you basically put all the data into shared tables, but you separate the data for each customer or each tenant by a column storing, say, a tenant ID. I happen to use an account ID, but it's whatever designates an individual tenant. And that's in contrast to isolated, where you set up separate schemas or separate databases for each customer you have. Now, I will say every company that I've ever interacted with or consulted with has always chosen the pooled approach.
[00:06:46] They never created a database or schema per customer.
And my solution as well that I've built used that pooled method. But they say if you think you're not going to have too many customers and they are highly security conscious, you may want to go with a more isolated multi-tenant method. But Gwen mentioned another reason people may want to consider switching is if they suddenly get a very large client, like a big bank or a big series of hospitals or maybe even a government, that may want their own database. Frankly, if that was the case, I'd probably call it an enterprise plan and just replicate the whole solution for them, as opposed to trying to separate data into a separate database on the same environment or system.
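As a rough sketch of what that pooled model looks like in practice, with made-up table and column names, it's basically every shared table carrying a tenant column and every query filtering on it:

```sql
-- Pooled model: all tenants share the same tables,
-- and every row carries the tenant it belongs to.
CREATE TABLE orders (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id  bigint NOT NULL,          -- or account_id, whatever designates the tenant
    total      numeric NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Indexes generally lead with the tenant column too.
CREATE INDEX ON orders (tenant_id, created_at);

-- Every query scopes to one tenant.
SELECT id, total
FROM orders
WHERE tenant_id = 42
ORDER BY created_at DESC
LIMIT 20;
```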
[00:07:33] But she did talk about their solution, Nile, and how it helps you partition data for each tenant out of the box. And it sounds like they are indeed using Postgres partitioned tables. I didn't hear if they were using list partitioning, which I would hope not, because I think list partitioning falls down once you get more than, say, a hundred or a thousand items in a list. But maybe they're using hash partitioning, I'm not really sure. But then they also offer sharding as a part of it as well, using foreign data wrappers. And another feature they offer is tenant migration, being able to migrate a tenant to either an independent compute node or maybe a separate shard. And she did mention that they are planning to open source the solution, so that would definitely be interesting, so you can actually see how it is working. And they did have a fair number of discussions about AI as well.
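Now, I don't know what Nile actually does under the hood, but if it is hash partitioning on the tenant column, a sketch with made-up names would look something like this, where the number of partitions stays fixed no matter how many tenants you add:

```sql
-- Hypothetical sketch: hash partitioning on the tenant column.
CREATE TABLE events (
    tenant_id  bigint      NOT NULL,
    id         bigint      NOT NULL,
    payload    jsonb,
    created_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (tenant_id, id)          -- the partition key must be part of the primary key
) PARTITION BY HASH (tenant_id);

CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```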
[00:08:26] So if you're interested in learning more about this, you can definitely listen to the episode or watch the YouTube video down here.
[00:08:33] Next piece of content: Waiting for SQL:202y: Vectors. This is from peter.eisentraut.org, and he's saying with the upcoming SQL standard, which will be SQL:202-something, he doesn't know the year, so he just puts a Y in place of it, there's a lot of talk about vectors, given everything that's happening with LLMs and AI. So vectors are being added to the SQL standard, so you will be able to create a vector with however many dimensions and then a coordinate type, maybe integer, maybe float16, etc. The standard will have a few functions that can be run, and you can calculate vector distance, which is the main thing you do with vectors to find their similarity, using a number of calculation methods from cosine to dot product to Euclidean, Hamming, Manhattan, etc. And then there's also support for ordering by the distance and for being able to fetch the first 10 rows only.
[00:09:40] But it looks like they actually may introduce a new keyword called approx, for approximate. Given the approximate nature of similarity search, I don't know if they exactly need to do this, because what would be the difference between fetch the first 10 rows versus fetch approx first 10 rows? I don't really know, but I do think, with some of the presentations I've seen about sorting vectors, there are conditions in which you get a variable number of rows returned. So maybe that's why they have the approximate, so you could say five to ten rows.
[00:10:17] So you may not always get the exact number of rows you expect. So maybe that's why it's approximate. Okay, but if you want to learn more about these changes coming in the SQL standard at some point, check this out.
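The standard syntax isn't implemented anywhere that I know of yet, but if you want to play with this kind of query today, the pgvector extension gives you roughly the same shape; the table and values here are just made up:

```sql
-- Using the pgvector extension today, not the future SQL standard syntax.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id        bigint PRIMARY KEY,
    embedding vector(3)   -- however many dimensions your model produces
);

-- Order by distance and fetch the closest 10 rows.
-- <-> is Euclidean distance, <=> is cosine distance, <#> is negative inner product.
SELECT id
FROM items
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
FETCH FIRST 10 ROWS ONLY;
```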
[00:10:29] Next piece of content: Beyond the Basics of Logical Replication. This is from boringsql.com, and it follows on a post he did last week talking about first steps with logical replication. In this one he goes beyond the basics. So here he's talking about the initial data copy, and normally you can set up a publisher and then set up a subscriber, and it will sync up and transfer all the data for you. But if you have a lot of data, that can take a really long time or potentially never finish. So he talks about some different ways you can manually synchronize the data. He also talks about different ways you can monitor logical replication, and he tends to use the system view pg_replication_slots, but he never mentions pg_stat_replication, and that's the one I use more than anything else for monitoring replication. Now, his view has benefits too, but I would also use the statistics view pg_stat_replication as well. He talks about evolving publications, where you can actually create a blank publication and add or drop tables to it, and if you do that, you do need to make sure you refresh the subscription. Then he talks about logical decoding. This is where you're actually consuming the WAL using logical decoding to essentially do change data capture duties. Maybe you want to capture all the changes that are happening to a particular table and then fire off some other process. And then he talked about fine-grained publication control, where you can literally define the columns and/or the rows that you're going to be replicating to the next system, and then different ways to secure your replication as well.
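To make a couple of those points concrete, here's a rough sketch, with made-up names, of evolving a publication, refreshing the subscription, and the two monitoring views:

```sql
-- On the publisher: start with an empty publication and evolve it.
CREATE PUBLICATION my_pub;
ALTER PUBLICATION my_pub ADD TABLE orders;
-- Fine-grained control (Postgres 15+): publish only some columns and rows.
ALTER PUBLICATION my_pub ADD TABLE customers (id, name) WHERE (active);

-- On the subscriber: after changing the publication, refresh the subscription.
ALTER SUBSCRIPTION my_sub REFRESH PUBLICATION;

-- Monitoring on the publisher: slot state in pg_replication_slots...
SELECT slot_name, active, confirmed_flush_lsn FROM pg_replication_slots;
-- ...and per-connection lag in pg_stat_replication.
SELECT application_name, state, write_lag, flush_lag, replay_lag FROM pg_stat_replication;
```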
[00:12:07] But check this out if you want to learn more.
[00:12:10] Next piece of content: Why Postgres needs better connection security defaults. This is from neon.com, and he's talking about sslmode=require, I think from a client perspective. When you specify require, it does give you an encrypted connection, but an encrypted connection to what system? You just hope you are being accurately routed to the right host. The problem is this is ripe for man-in-the-middle attacks, and he actually blogged about this back in 2023.
[00:12:42] So the proper way to secure connections using certificates is sslmode=verify-full, because that encrypts the connection and also authenticates the host you're connecting to.
[00:12:56] The downside of this is that you have to supply the client with the root certificates used to generate the server certificate. So that definitely adds extra friction to the process.
[00:13:07] But as of Postgres 16 there's a new connection parameter value, sslrootcert=system, which gives you your encryption but also authenticates the server based upon the trusted root certs that already exist on the system. So the same way a browser connecting to a random web server knows it's contacting the right one, or trusts that one, when it looks at the certificate and it matches the trusted certs in its operating system, this would do the same thing: psql would connect to a server and could trust it's connecting to the right place because it's using the trusted root certificates already on the system.
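On the client side the difference is just the connection parameters, something along these lines, with a made-up host name:

```
# Encrypts, but does not verify which server you are actually talking to:
psql "host=db.example.com dbname=app user=me sslmode=require"

# Encrypts and verifies the server certificate against a CA file you supply:
psql "host=db.example.com dbname=app user=me sslmode=verify-full sslrootcert=root.crt"

# Postgres 16+: verify against the operating system's trusted root certificates:
psql "host=db.example.com dbname=app user=me sslmode=verify-full sslrootcert=system"
```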
[00:13:47] Now, he said back in 2023 sslrootcert=system was not working in most places, but fast forward to 2025 and things aren't much better. So apparently on the Mac, the psql bundled into Postgres.app now works, and the psql installed by EDB on macOS now works too. The MacPorts package now works, Homebrew's psql works, but the Docker Postgres maintainers have declined to include the Debian ca-certificates package. However, it looks like Debian may get it, where it's been taken up to add ca-certificates as a recommended dependency for psql. So hopefully this will happen for Debian and Ubuntu installations soon.
[00:14:30] And here's a quote: "a Win store, but not for the Win." So apparently Windows is having even more issues.
[00:14:38] But if you want to learn more about this, definitely check out this blog post.
[00:14:42] Next piece of content: Build a personalized AI assistant with Postgres. This is from supabase.com. Now, a lot of this is not Postgres related, but I thought some of the examples they showed of what could be built were very interesting, along with what's used on the Postgres side to do it. So they show examples like a run tracking app, and again, these are personalized AI apps: personal recipe and meal planning, company feedback analysis, an interest-based article bookmarker. And it basically uses a Postgres database. Here they're using a Supabase account and certain functions that you can run on it, essentially cloud functions, but you could replicate this functionality in your own language of choice, along with OpenAI as the LLM. Telegram is essentially the front-end interface, and a Zapier account is used for doing MCP tasks, because a lot of what these AI bots do is run something based upon a cron schedule: it may send an email, it may do searches.
[00:15:46] So I thought this was very interesting. Again, not highly Postgres specific, but you can look at how they're using the database and particularly they're using scoped database control. So basically the LLM has access to one schema that they happen to call memories, and the LLM operates under a specific role, the Memories role, that only has access to that schema. So basically it's very contained in what the LLM can do. But I thought this was interesting, so feel free to check it out.
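I don't know the exact DDL Supabase uses here, but the idea of scoping the LLM to a single schema is roughly this kind of thing, with the role name just taken from their description:

```sql
-- A role the LLM connects as, restricted to a single schema.
CREATE ROLE memories_role LOGIN PASSWORD 'change-me';

CREATE SCHEMA memories;

-- Lock everything else down, then grant access only to the memories schema.
REVOKE ALL ON SCHEMA public FROM memories_role;
GRANT USAGE, CREATE ON SCHEMA memories TO memories_role;
GRANT ALL ON ALL TABLES IN SCHEMA memories TO memories_role;
ALTER DEFAULT PRIVILEGES IN SCHEMA memories
    GRANT ALL ON TABLES TO memories_role;
```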
[00:16:15] Next piece of content: there was another episode of the Talking Postgres podcast. This one was on 12 years of Postgres Weekly with Peter Cooper. So this is the weekly newsletter that goes out that talks all about Postgres, so definitely check out this episode if you're interested. And the last piece of content: PostgreSQL hacking and patch review workshops for July 2025. This is from rhaas.blogspot.com, and next month there will be discussions on the presentation Fast-path locking improvements in PG 18 that was given at the 2025 pgconf.dev, and if you're interested in joining, you can sign up using the form located here.
[00:16:54] And in addition, Paul Jungwirth is organizing a Patch Review workshop that coincides with the July Commit Fest and you can sign up here. So check that out if you're interested.
[00:17:05] So now it's time for my consulting corner. So as far as I know, by default RDS's Blue Green deployment does not offer a way for you to roll back. In other words, once you've switched over to the green instance, there's no way to switch back to the old blue instance. But with a particular client I was working with, they were highly interested in having a rollback process.
[00:17:31] Now, because of all the automation involved in a Blue Green deployment and the switchover process, it's hard to find an ideal point at which to start replication back to the old cluster. So what we decided to do is take a bit of downtime prior to the Blue Green switchover. What that means is we take the system offline in terms of traffic to the database, and on the green cluster, the future cluster, we create the publications and the replication slots there.
[00:18:04] Then we trigger the switchover process.
[00:18:06] So the green cluster gets promoted to essentially the main cluster, and then we turn traffic back on and the application can start normally. But those replication slots have basically held the WAL at the position from when we created them. Then we set up subscriptions on that old cluster, which still exists, pointing to those publications. But we don't copy any data, because the data in both clusters should be in the same state, since no data changes were happening. Then data will flow from the publisher to the subscriber, which is from the newer cluster back to the old cluster. So, heaven forbid we have to fail back, we can take another downtime, update the sequences in that older cluster, and then point the application to use that one.
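In rough SQL terms, and with made-up names, the idea looks something like this; it's a sketch of the approach rather than the exact commands we ran:

```sql
-- On the green (soon-to-be primary) cluster, during the downtime window,
-- before triggering the switchover:
CREATE PUBLICATION rollback_pub FOR ALL TABLES;
SELECT pg_create_logical_replication_slot('rollback_slot', 'pgoutput');

-- After the switchover, on the old blue cluster, subscribe to the new primary.
-- copy_data = false because both clusters were identical at switchover,
-- and we reuse the slot that was created ahead of time.
CREATE SUBSCRIPTION rollback_sub
    CONNECTION 'host=new-primary.example.com dbname=app user=repl'
    PUBLICATION rollback_pub
    WITH (copy_data = false, create_slot = false, slot_name = 'rollback_slot');
```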
[00:18:51] It's still a bit of a manual process to do this kind of rollback or fail back, but for this client it was an important aspect to have. So again, I'd be interested to know in the comments what your experiences are with Blue green deployments.
[00:19:07] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com where you can find links to all the content mentioned, as well as sign up to receive weekly notifications of each episode there. You can also find an audio version of the show, as well as a full transcript. Thanks and I'll see you next week.