Episode Transcript
[00:00:00] I hope you didn't watch last week's episode of Scaling Postgres, see that Postgres had new releases covering five security vulnerabilities, and think, "I need to get this upgraded as soon as possible."
[00:00:14] Why? Well, because some regressions have been identified, and Postgres is doing an out-of-cycle release. It should be this coming Thursday. But I hope you, your friends, family and co-workers continue to do well. Our first piece of content is "Out-of-cycle release scheduled for February 26, 2026". This is from postgresql.org, and there were two regressions identified in the most recent release, which happened on February 12th. One: the substring function raises an "invalid byte sequence for encoding" error for non-ASCII text values. The other: a standby may halt and return an error, "could not access status of transaction". Now, they do say a fix is in for the substring error, but they didn't say that about the standby issue. So basically, if your security exposure allows it, you may want to hold off on upgrading until the next cycle of releases that includes these regression fixes.
[00:01:19] And thanks to the Postgres team for jumping on this so quickly to resolve these. But it's another reason to maybe wait a little while after new releases come out to make sure there are no regressions, or at least to wait before putting those releases directly into production.
[00:01:36] Related to that, Percona also posted information about this: "PostgreSQL minor release postponed in Q1 2026". They're talking about their Percona Distribution for PostgreSQL, which bundles in some specialized extensions. They're actually not shipping a February 12th release and are instead waiting until Postgres does its February 26th update, and then theirs will follow. Check this out if you want to learn more. Next piece of content: "Supabase incident on February 12, 2026". This is always unfortunate, but I always like looking into these, given that everyone is trying to achieve five nines of availability if they can, and what are some failure modes that happen? So this outage was really long, 3 hours and 42 minutes, but it was exclusively for their services in US East 2, the Ohio region of AWS.
[00:02:37] So basically customers with projects in this region were unable to access their Postgres databases, authentication data, APIs, edge functions, storage, real time, pretty much any Supabase service.
[00:02:50] Now, what happened is they had a new internal monitoring service that inadvertently enabled AWS's VPC Block Public Access feature, so basically all network connectivity was shut down for their VPC in this region. It only affected people in this one region, US East 2, so that's another argument for operating in multiple regions if you want the highest availability.
[00:03:17] And they described the VPC Block Public Access feature; it's basically designed for compliance-sensitive environments where you don't want any Internet access at all. This got triggered when the new monitoring service was deployed. Now, you're probably thinking: if this was just one configuration change, why did it take 3 hours and 42 minutes? They addressed that here, and the reason is that the outage triggered alarms on shared services in a different region.
[00:03:44] So initially they were investigating that region and not US East 2. And this change appeared in only one row, one line item, in the CloudTrail logs, so it wasn't immediately visible. And I don't think even the network team was immediately involved, because they thought it was something else. Now, they do have a pre-production environment, but none of it made use of the US East 2 region. Then they go into what they're doing about it: basically trying to lock down and avoid that feature, putting in more guardrails and access controls, and doing better account isolation, taking all non-customer-facing services and putting them into separate AWS accounts. But this is a huge one here: setting up external connectivity probes that continuously monitor network connectivity, so something will immediately pop up and tell them there's something fundamentally wrong with it. They want to implement full parity between production and pre-production environments so they can test each region accurately, and also do faster incident coordination. They also mentioned communication challenges, where they felt they could have responded faster to the issues. But anyway, I thought this was interesting and I encourage you to check it out.
[00:05:01] Next piece of content: "How we optimized Top K in Postgres". This is from paradedb.com, and this is about finding the top sorted rows by some column or value within a query. They're using LIMIT K in this example here.
[00:05:16] Now, this is super easy to do with Postgres: you just ORDER BY a particular column and then say how many rows you want, and it's very fast. You can build an index for it that makes it equally fast. But if you add a filter, then it starts breaking down a little bit, because now, in this example, you need to filter by severity. The solution, of course, is to add that column as the leading column of the index. So you index on severity and timestamp, and that filters efficiently and gives you great performance again for those top K queries. But then, what if you want to add another filter? Well, that's when things start to break down. Yes, you can add that additional column to the index, but now you start creating a very large index. Or what if your filters vary depending on how you want to query things? Then that makes things more difficult.
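To make that progression concrete, here's a minimal sketch, assuming a hypothetical logs table with severity and timestamp columns (the blog post's actual schema may differ):

```sql
-- Hypothetical logs table, for illustration only
CREATE TABLE logs (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    severity   text,
    message    text,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Plain top-K: a single-column index makes this nearly instant
CREATE INDEX ON logs (created_at DESC);
SELECT * FROM logs ORDER BY created_at DESC LIMIT 10;

-- Add a filter and that index no longer covers the whole query;
-- a composite index with the filter as the leading column restores it
CREATE INDEX ON logs (severity, created_at DESC);
SELECT * FROM logs
WHERE severity = 'error'
ORDER BY created_at DESC
LIMIT 10;
```

Each new filter combination tends to want its own composite index, which is where the approach stops scaling.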
[00:06:05] And then, particularly, what if you want to start doing text search, potentially along with rankings? Then those B-tree indexes can't really be used, and you're maybe using something like a GIN index.
[00:06:18] And even once you optimize as much as possible, trying to get the top 10 ranked results of something is still incredibly hard to do. You can use a GIN index, you can even combine B-tree and GIN indexing with the btree_gin extension, but the performance still isn't going to be super great. It's acceptable for a lot of use cases, but once you start talking about millions and millions of rows, it starts to break down.
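As a hedged sketch of why ranked text search is the hard case (again using a hypothetical logs table with severity and message columns): the GIN index can satisfy the filters, but the ranked ORDER BY still forces Postgres to score every matching row before taking the top 10.

```sql
-- btree_gin lets a GIN index mix scalar columns with full-text search
CREATE EXTENSION IF NOT EXISTS btree_gin;
CREATE INDEX ON logs USING gin (severity, to_tsvector('english', message));

-- The WHERE clause can use the index, but ts_rank must be computed for
-- every match; there is no index-ordered shortcut to the top 10 by rank
SELECT id,
       ts_rank(to_tsvector('english', message),
               plainto_tsquery('english', 'disk full')) AS rank
FROM logs
WHERE severity = 'error'
  AND to_tsvector('english', message) @@ plainto_tsquery('english', 'disk full')
ORDER BY rank DESC
LIMIT 10;
```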
[00:06:41] So here's where they start talking about ParadeDB's product, basically their BM25 index and the optimizations they've done to make top K searches across various values or columns much easier. They're using an inverted index, but they're actually storing the data in a column store as well, to be able to retrieve it efficiently.
[00:07:04] And I thought this was a pretty good summary down here, where they say, quote, "Postgres' top K approach is a bit like all or nothing. The happy path is nearly instant." So basically, if you have one, two, three columns that you want to get the top K records for, it's super fast. Continuing the quote: "But the worst case can take seconds or even minutes."
[00:07:25] Whereas with ParadeDB, or their BM25 index, it's designed to make any top K query reasonably fast. So you don't have specifically dedicated indexes; it indexes the whole data set, or at least the set of filters and sort keys, to make retrieval efficient no matter how you're querying it.
[00:07:45] But if you want to learn more, you can definitely check out this blog post.
[00:07:49] Next piece of content: "Read Efficiency Issues in Postgres Queries". This is from pgmustard.com, and I would say a good summary of this blog post is: how to make your reads as efficient as possible.
[00:08:01] And one way to do that is to avoid bloat, the extra space in your database. If you have a lot of updates happening, then given the MVCC rules of Postgres, a new row version is created and the old one is eventually marked for deletion.
[00:08:17] So he examines the case of bloat. He creates a table, inserts a million rows of random text into it, does a VACUUM ANALYZE, and turns off autovacuum for this table, again for demonstration purposes only, because he doesn't want vacuum to run and clean things up; he wants to look at the impact of bloat.
[00:08:37] So when the table is created anew, it has 135 megabytes of heap space and 21 megabytes of index space. Just reading the whole table took about 233 milliseconds and read 17,000 buffers, which, he says, if you multiply by eight-kilobyte pages, gives you about 135 megabytes of heap space.
[00:09:02] So he then updates the data in this table nine times, so basically makes it ten times the size.
[00:09:10] Now, there's still only a million rows, but there's the dead tuples that still exist in the table.
[00:09:17] So you can see the heap space and the index space are now approximately 10 times larger. And when you run your query, it's four to five times slower, and the number of buffers hit and read is almost 10 times the amount from when the query had essentially zero bloat.
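Reconstructed from the description, the demonstration looks roughly like this; the table and column names here are my own, not necessarily the post's:

```sql
-- A million rows of random text
CREATE TABLE bloat_test AS
SELECT g AS id, md5(random()::text) AS data
FROM generate_series(1, 1000000) g;

CREATE INDEX ON bloat_test (id);
VACUUM ANALYZE bloat_test;

-- Demonstration only: stop autovacuum from cleaning up dead tuples
ALTER TABLE bloat_test SET (autovacuum_enabled = false);

-- Each UPDATE rewrites every row, leaving old versions behind as dead
-- tuples; run this nine times to roughly 10x the heap and index size
UPDATE bloat_test SET data = md5(random()::text);

-- Compare heap and index sizes before and after
SELECT pg_size_pretty(pg_table_size('bloat_test'))   AS heap,
       pg_size_pretty(pg_indexes_size('bloat_test')) AS indexes;

-- EXPLAIN (ANALYZE, BUFFERS) shows how many 8 kB buffers the scan touches
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM bloat_test;
```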
[00:09:37] So this is an example of read inefficiency. So how do you make it efficient? Well, before he got to that, he actually said, all right, well, let's take a look at the index. So he does an index query that's retrieving a fair number of buffers, around 24,000.
[00:09:53] The way you make the index more efficient is to just reindex it concurrently, and as you can see, that shrinks the index back down to its original size. The same query that read almost 25,000 shared buffers now reads 33.
[00:10:09] And of course, the speed is much faster, maybe 60 times faster. So that's the result of inefficient reading. Now, how do you address the rows in the table? Well, vacuum does clean up some of them, but it's not going to resolve all bloat. So how do you fully de-bloat a table? He recommends the two extensions that are commonly used, pg_repack or pg_squeeze, because they can do online table rebuilds. You don't want to use the VACUUM FULL or CLUSTER tools within Postgres, because they halt all reads and writes to the table.
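In SQL terms, the online index rebuild is a single command; the index name here is an assumption for illustration:

```sql
-- Rebuilds the index without blocking reads or writes on the table
REINDEX INDEX CONCURRENTLY bloat_test_id_idx;  -- index name is hypothetical

-- For the heap itself, pg_repack is a separate extension plus CLI tool, e.g.:
--   $ pg_repack --table bloat_test mydb
-- VACUUM FULL would also rewrite the table, but it takes an ACCESS EXCLUSIVE
-- lock, blocking all reads and writes for the duration of the rewrite
```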
[00:10:45] He also gets into data locality as well.
[00:10:49] This is basically about whether the physical layout of the data matches how you're actually looking for that data. If you can increase that alignment, for example by using CLUSTER to rewrite the table along a particular column, that can make reads more efficient. And he discusses a few other issues as well, so if you want to learn more, definitely check out this blog post. Next piece of content: "PostgreSQL bloat is a feature, not a bug".
[00:11:15] So much like we were talking about in the last blog post, where under MVCC an update creates a new row version and keeps the old one, and even a delete still retains the row until vacuum goes through and cleans it up, this goes into more detail about how that works. So if you want to learn more about that, you can check out this piece of content.
[00:11:34] Next piece of content: "pg_background: make Postgres do the long work while your session stays light". This is from vibhorkumar.wordpress.com, and he's talking about an extension called pg_background, which I'm not that familiar with. It basically allows you to create a background worker process to do some unit of work, entirely disconnected from a backend session, so you don't have to keep a client session open while Postgres does the work. So you could tell it to kick off a vacuum process or an analyze process, and it's essentially disconnected from your session, so presumably it won't be killed if there's a disconnect or something. These background processes run on the Postgres instance. And here are some benefits he mentions this extension offers. One, you can kick off a long-running query or maintenance task without holding the client connection open. Two, you can run autonomous-transaction-style side effects, so basically commit or roll back independently of the caller. Three, you can monitor, wait, detach or cancel explicitly. And four, you can keep the operational model Postgres-native instead of adding another job system.
[00:12:46] Although, to my knowledge, this doesn't run anything automatically; it's only the means to set up a background worker. So you would still probably want to use something like pg_cron or some other scheduling system to actually kick these processes off if you want them to run on a particular schedule. And another thing he mentions here: it does offer server-side observability into what's going on in terms of the workers and their current state.
[00:13:12] He says there is a version 2 API that he highly recommends using, and he mentions various enhancements that have been added that might make it good for production use cases.
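As a hedged sketch of the basic flow, based on the extension's documented functions (check your installed version for the exact signatures, and the pid value here is a placeholder):

```sql
CREATE EXTENSION pg_background;

-- Launch a command in a background worker; returns the worker's pid
SELECT pg_background_launch('VACUUM ANALYZE my_table');

-- Later, fetch the result using that pid; blocks until the worker finishes
SELECT * FROM pg_background_result(12345) AS (result text);  -- 12345 = pid from launch

-- Or detach and let the worker run to completion on its own
SELECT pg_background_detach(12345);
```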
[00:13:22] So check out this blog post if you're interested. Next piece of content There was another episode of Postgres FM last week. This one was on comments and metadata.
[00:13:32] So they discussed how you can put comments on objects in your database. Postgres does support this, and I actually haven't seen it used that much with OLTP applications, mainly because, in my experience, there's one primary application using the database. But if you have one common database used by a multitude of applications, then I think those database comments become more important.
[00:13:57] Particularly if it's a data warehouse or a data mart or a data lake. Having comments in the SQL itself to explain to people what the data holds and what it represents becomes more important.
[00:14:09] They also talked about a query comment syntax. These query comments can, for example, say what area of the code called this SQL command. Marginalia, for example, which is now actually built into Rails, tells you what part of the code called the SQL query. So if you're looking to optimize it, you know exactly where to go in the application.
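For illustration, the two kinds of comments look something like this; the orders table and the annotation keys are hypothetical:

```sql
-- Object comments are stored in the catalog and show up in psql's \d+ output
COMMENT ON TABLE orders IS 'One row per customer order; source of truth for billing';
COMMENT ON COLUMN orders.status IS 'pending | shipped | cancelled';

-- Query comments travel with the statement itself; a Marginalia-style
-- annotation appears in pg_stat_activity and the server logs:
SELECT * FROM orders WHERE status = 'pending'
/*application:Shop,controller:OrdersController,action:index*/;
```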
[00:14:33] But I thought this was interesting. If you want to learn more, you can listen to the episode here or watch the YouTube video down here.
[00:14:39] Next piece of content: "The MCD: your Rosetta Stone for turning 'we need a database' into actual requirements". This is from mydbanotebook.org, and this is part two of their Merise series. It talks about data modeling: defining the entities and relationships and assessing cardinalities and attributes for the use case of setting up a database. She goes through an example of a zoo database and all the different interactions and entities involved in building a good data model. If you're interested in that, I highly encourage you to check out this piece of content.
[00:15:17] Next piece of content.
Introducing Blue Box Docker, a living PostgreSQL sample database. This is from softwareandbooz.com, and he wanted to set up a more realistic sample database. This is a set of simulated data for a DVD rental kiosk that he's calling Blue Box. He does say Redbox is a thing in the US for DVD rentals, but there are comparable services in other areas of the world, if you're curious.
[00:15:47] So basically it's a Docker container and what you get is a fully populated database simulating a video rental kiosk business. You get real movie data from the TMDB data source.
[00:15:59] You get geographically realistic store and customer locations across the state of New York.
[00:16:04] You get automated pg_cron jobs that generate new rentals every five minutes. It handles customer lifecycle events. It has PostGIS for spatial queries and a boatload of extensions already installed.
[00:16:19] So if you're looking for an interesting sample database, definitely check this out.
[00:16:24] Last piece of content: "Hacking workshop for March 2026". This is from rhaas.blogspot.com, and the next hacking workshop is going to discuss Tomas Vondra's talk "Performance Archaeology". If you're interested in attending, you can use this link to sign up.
[00:16:41] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content mentioned, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show, as well as a full transcript. Thanks, and I'll see you next week.