Episode Transcript
[00:00:00] Has everyone been installing a lot of patches lately? I know I have, particularly with all of these Linux bugs that have been coming out recently.
[00:00:11] And also it looks like Postgres is no exception because the new releases this quarter cover 11 security issues, but I hope you, your friends, family and co workers continue to do well.
[00:00:26] Our first piece of content is PostgreSQL 18.4, 17.10, 16.14, 15.18 and 14.23 are released. The first thing they mention here is that Postgres 14 is approaching end of life as of 11-12-2026.
[00:00:45] And the main thing to note with these releases is there are 11 security issues and I would say this is four to five times higher than we typically get. Maybe you get two or three, maybe four at the most. To have 11 is a definite outlier. So I think more people are using AI the LLMs to help them discover these security issues.
[00:01:12] Now normally I read through all of them. I'm not going to do it this time, but I'll just give a little flavor of what's going on.
[00:01:18] The highest base score is an 8.8 and there are three of them that have 8.8s. Oh sorry, actually four of them and they cover the gamut of different things. There are integer wraparound issues, disclosing of server memory by a time of day function overriding unrelated files, some SQL injections, some overriding client memory stack uncontrolled recursion causing a denial of service buffer overflows, SQL injections again, and many of these issues exist all the way back to version 14. There are some that are just on more recent versions, but now more than ever it seems to be good advice to go ahead and install these security patches as soon as you're able.
[00:02:09] And of course with the security issues there are all of these bug fixes and improvements that have been done there. Thankfully there's nothing special you need to do about these releases, but again you do need to check each point release or minor release to make sure that you may need to do re indexing or some other steps to properly do the upgrade. But check this out for more details. Next piece of content, Postgres May 2026 security update 11 CVEs all versions affected this is from Robbins.in and he goes over the 11 vulnerabilities, but what he tracked is how far back they had to apply the patches. So for example this CVE had to patch everything back to 14, whereas this particular CVE was 18 only.
[00:02:56] So we looked at the ratio for different releases that have happened and how far back they had to go.
[00:03:02] And this month is a definite outlier where more times they had to go all the way back to 14 to address these issues.
[00:03:12] And he proposed exactly what I was thinking, could AI be behind the spike? And I probably think there's no denying that it is.
[00:03:21] And he's actually linking an article where a lot of people have started to feel AI bug reports have evolved from slop to genuinely high quality submissions.
[00:03:31] But I thought he had some good perspective in this blog post.
[00:03:35] Next piece of content pgbouncer1.25.2 is released again, the main issue is fixing CVEs four of them. Again, this is a higher ratio than we are typically used to seeing with PgBouncer. So if we look back because this is just a single changelog page, you could see there was 1 CVE in the last one, none in this release, 1 CVE in this one, none in this release.
[00:04:03] So we're all the way back to 2024, 2023 and maybe there's one CVE and in this release we're getting four. So clearly there's a pattern going on where seemingly more people are using AI to find vulnerabilities in software.
[00:04:17] So get ready to keep on patching Next piece of content. There was actually another episode of Postgres FM last week. This one was on pgq. Unfortunately I had my whole episode last week on pgq. But this is actually Nick giving his perspective of it, the creator as well as Michael as they discuss this new project. And Nick mentioned he has for the longest time recommended for update Skip Lock for people trying to do a queue on Postgres and do it using partitions to try and avoid too much deleting. You could just drop the partition as opposed to incurring the delete cost. But then as he learned more about pgq I think he mentioned he had known about it before but but looking at it more recently he akin to remembering forgotten Kung Fu. I love that quote from the show.
[00:05:09] He also mentioned discussion where people push back that well this really isn't a queue, it's more an immutable log system. And he says indeed it is more like an immutable log system as opposed to a queue.
[00:05:24] So it might be more like Kafka than some other queue systems because I think he said it essentially has one consumer but he's actually looking at and continuing to enhance it and there may be sub consumers at some point as well.
[00:05:38] But definitely encourage you to listen to the episode or watch the YouTube video next piece of content making JSON be more queryable with generated columns this is from rich yen.com and he says he has encountered tables in the wild that basically look like this, which is a little bit scary to me. How few columns it has definitely sets off a little bit of alarm bells. I'm fine having a JSON B column. It's just like typically would have at least a few other columns present, but they're basically putting all of their data in the JSON B as you can see here.
[00:06:12] So what are the best ways to query it? Well, one approach he mentions is you could put a gen index on it.
[00:06:19] But in terms of querying it, remember you're really querying it using containment or key existence operators.
[00:06:27] So like this query uses a gen index with this particular esthan symbol, whereas trying to run the query like this with the gen index, it does not get used. You need to use the write operators with it. Next option is using expression indexes. So writing a function to basically extract that value and you're creating a btree index on it.
[00:06:49] And the third option is to actually generate columns. So you use essentially a function to create a dedicated column for the data it contains. So here you're creating a stored column of the user ID that is kept in sync with the state of the jsonb. So whenever the JSONB gets updated, this value will change in this column because it's generated from the source data listed here. And you can create indexes on these generated columns. So we looked at the query performance and the fastest one was either the expression index or the generated column with a B tree index on it. Those are the faster ones, the gen ones were slower. He then looked at storage and overall table size and index size. The smallest is expression indexes, of course, because you're not creating any additional data, it's just a index definition. So that's the smallest followed by the generated columns version and then all the gen indexes. Again because gen indexes are rather large, they're almost the size of the table and here it's the same size as the table. He looked at the right throughput and again the generated columns in B tree and the expression indexes one again with Jen following up and he gives a rough guide down here that I liked. You know, if your situation is you have unknown or atom field queries, just use a gen index. If you have known fields, a few queries schema's not changing that much. You could do an expression index. If you do have known fields high query volume, evolving code base, especially ones with range queries or some mix. It's usually best to go generated columns and if you need some ad hoc maybe you add a gen index in addition.
[00:08:42] But he makes an important point here that I mentioned at the beginning is the real win is making data typed and relational again because you can use B tree indexes and you can use multi column indexes, etc. Expression indexes do have some difficulty because you need exact predicate matching because it must exactly match how you're querying the data.
[00:09:05] Also be aware that generated column expressions must be immutable and and they cannot be directly updated and that if you have a heavily written to table gen index is going to take a big hit in terms of maintenance. But I thought this was a great blog post. Definitely encourage you to check it out. Next piece of content Strong views on PostgreSQL views this is from boringsql.com and he's talking about views and how sometimes some projects choose to abstract queries behind views and and this can start to get you into a little bit of trouble because maybe you have an active customers views and then you use active customers in the customer order view and then you use the customer orders views in the customer summary view and then you have this chain that needs to be rebuilt in order to run the query.
[00:09:58] And he says the problem with views is that they are queried just like tables. So this query is indistinguishable from this particular query, but one is a view and one is a table. You can have the nested view issue that he showed and how views are basically just rewrite rules. They're rewriting the query and it says if you just put a subquery in it, it's essentially rewriting it as that to run it and trying to write through views. There's always issues with that.
[00:10:28] But the bigger problem can come when you actually need to change a table. Like if you need to drop a table if you have a view applied to it, suddenly you can't do it because maybe it's referenced by that view.
[00:10:39] And he says it's not just dropping. Try widening a table. So even altering the column type can cause an issue. If you have a view applied now maybe you try cascade, but that could cause a nightmare of things being lost.
[00:10:55] And he has this full process for going through to make changes like this. If you have nested views applied to tables, which is a whole lot of code I would not want to try to maintain.
[00:11:07] He mentions the select all the columns trap because again the views reference position, not the actual columns, which can cause issues. In terms of working around some of these issues, you could avoid views entirely, he said. Maybe versioning your views could help.
[00:11:25] He says there are other views you can use to try and track the state of things, but again, it gets quite difficult.
[00:11:31] Basically, here are the big bad things select in a view body is a trap because it freezes all the columns at creation time and if you add a column and delete a column then your view has to be updated.
[00:11:44] If you have nested views, every layer multiplies the teardown cost for when you need to recreate these views and and cascade is not a fix.
[00:11:54] So especially with different application frameworks and the way that they do migrations, essentially a view is typically just run once, whereas if you use views this whole maintenance lifecycle needs to be considered. And if you want to learn more about that, I definitely encourage you to check out this blog post. Next Piece of Content the wall level you set is not the wall level you get. This is from thebuild.com and he's talking about in Postgres 19 where it dynamically sets the wall level based upon whether a replication slot exists or not. Because historically I've recommended just set the wall level as logical in case you want to do logical replication at some point in the future, but you're generating all this extra wall when you're not actually using logical replication. So now with Postgres 19 it it actually elevates the level based upon a slot being present or not.
[00:12:48] But the transition is not actually free because it does have to force a checkpoint because as he says here quote it must force a checkpoint. So every writing back end has observed the change and only then start the slot at an LSN guaranteed to have a logical level wall behind it and because of this create replication slot can block until that next checkpoint. But he mentioned some other things you want to consider once Postgres 19 releases with this feature, so check it out if you're interested. Next piece of content A field guide to alternative storage engines for PostgreSQL. This is from thebill.com as well, and he's talking about the table access method that was actually released back in Postgres 12 and how the hope was that we would have multiple storage engines other than the typical heap, but the reality we really don't have any right now. He says part of the reason is that the TAM API is not really a true pluggable storage layer. It is a tuple shaped abstraction over an interface that was designed around the heap and then partially generalized.
[00:13:59] But given that what is the state of the environment? According to him he has one category which are heap replacements and and amongst them he mentions oreldb.
[00:14:10] So they're moving to an undo based log as opposed to storing row versions in the actual heap itself. But they're also adding a whole slew of other features on top of it in terms of copy on write checkpoints with row level, wall in memory caching layer using squizzled pointers and Even an experimental S3 mode where wall archives and undo segments live on object storage.
[00:14:36] But this project as of today is still in public beta and not recommended for production. But this is probably the furthest along Then we have Z heap that was attempted but then essentially abandoned at this point I believe he did mention you know it was started by edb.
[00:14:55] Cybertech did pick it up but I don't know if anything has moved forward with it. Then he mentions the columnar tams. This is like situs columnar that has existed for a while. There's also hydro columnar that's relatively new and he also mentions storage engine and PD cryogen. My understanding not too many of these are big.
[00:15:18] I think more people are probably using cytos columnar, maybe Hydra as well. Then he lists category three lakehouse back tams. These are where Postgres actually talks to an external storage system outside of the heap. So looking at parquet files for example.
[00:15:36] So he's mentioning pgmooncake that creates a column store mirror of your Postgres tables in iceberg or delta lake format. He mentions pgduckdb that embeds duckdb in the Postgres backend and exposes a duckdb table access method to query things like parquet files or JSON data etc.
[00:15:56] And he also mentions paradedb and pgsearch as others that are doing text search methods and then a few domain specific tams which don't seem to really be active. But he said as of today orioldb is the most credible undo log option, but it's still in beta. Citus, Columnar and Hydra are credible read mostly columnar options.
[00:16:18] PGMooncake and PGDuckDB are credible options if you're willing to push execution out to duckdb and if you're working with time series, basically you should be using timescaledb. But check this out if you want to learn more.
[00:16:32] Next piece of content, 8 bytes is the easy part. This is from thebuild.com and he's talking about in Postgres 19 where they widen the multi exact offset to 64 bits. Hey, does this mean people might be asking well hey, does this mean we're going to get 64 bit transaction IDs anytime soon? And he says probably not. Now the main reason why is upgrades and specifically PGUpgrade, because when you run PGUpgrade in order to do the upgrade as fast as possible, you would typically use a link mode to do it. And what that means is, is that you use the existing files of the old database version in the new database version. Essentially you're just changing the engine out of it. But the problem is if we change to 64 bit transaction IDs, all the data files and data tuples have to change.
[00:17:24] So as he says, you are now in PG upgrade copy territory, which will take forever on a large database cluster and PGDump PGRESTORE would be even worse.
[00:17:37] So that's the biggest reason why we're probably not getting them soon. He also mentioned snapshot issues because the snapshots are much larger because it stores an xmin XMAX and an array of xids that were in flight at the time the snapshot was taken. So that's going to be a larger size and then you have the least recently used cache references as well.
[00:18:02] So he says what is potentially being worked on is adding an epoch somewhere so that the effective XID is 64 bits while the actual storage remains at 32 bit.
[00:18:13] This is the way that Postgres Pro and others have achieved 64 bit xids and it might be the way the Postgres project goes, but again, time will tell, but it's probably not going to be soon.
[00:18:28] But check this out if you want to learn more. Next piece of content cnpgrecipe24 Migrating from Crunchy PGO to PostgreSQL 18 with Cloud native PG this is from gabrielebartolini.it and he's talking about migrating from Crunchy Data's Postgres operator to the cloud native PGE operator. And he has a process to follow to do that migration if you're interested.
[00:18:54] Next piece of content. No Compiler required, writing SQL, only Postgres extensions. This is from pgedge.com and here he walks through how to set up and an extension using pure SQL. No C code required. So he goes through the whole process to actually build it and Last piece of content from MEMSQL to Horizon An Engineer's Journey with Adam Prout this is from talking Postgres and this is a podcast where she's discussing with him being an engineer at Microsoft and seemed to talk a lot about Microsoft's Horizon DB product.
[00:19:31] But you can listen to the episodes here or watch the YouTube video down here.
[00:19:37] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com where you can find links to all the content discussed, as well as sign up to receive weekly notifications of each episode there. You can also find an audio version of the show as well as a full transcript. Thanks. I'll see you next week.