Episode Transcript
[00:00:00] When I took off last week to go to a conference, I didn't expect all of the blog posts to be released last week and the week before last. So not a lot of intro this week, because we've got a lot to cover. But before we start, I hope you, your friends, family and co-workers continue to do well.

Our first piece of content is "PostgreSQL indexes for columnstore: 1,185x faster lookup queries, 224x faster inserts in TimescaleDB". This is from timescale.com, and normally I don't cover niche-specific databases, but I thought this was significant because you don't normally see indexes on column stores. Normally those are used for analytical queries, so you're just getting full aggregates of the data. But they've engineered a way to add indexes to the columnstore, I think in a beta or preview mode, and they are offering B-tree indexes and hash indexes to start. So in terms of what's faster, it's point lookups. Say you need to update a specific piece of data in a row in the columnstore; well, these indexes allow you to do that, and you can see the enormous performance gains when you're using these hash indexes or B-tree indexes compared to no index. Similarly, they got 224 times faster INSERT ... ON CONFLICT DO NOTHING, because it does need to check an index for a conflict happening. Then they go into how they did the implementation. Basically, Timescale started off not using the table access method interface that Postgres released, but now they've started using it and implementing different features with it. Normally you have the heap table access method in stock Postgres, which is a row store on the Postgres heap, whereas the TimescaleDB hypercore storage has a row store and a column store. The way Timescale normally works is that fresh data is put into the row store, and then it is compressed and stored in the column store once that data has hit a certain age. And they're using a hypercore table access method to do this implementation.
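To make that insert benchmark concrete, here's the general shape of an INSERT ... ON CONFLICT DO NOTHING in plain Postgres (a generic sketch with made-up names, not Timescale-specific syntax). Every such insert has to probe a unique index to detect conflicts, which is why index support on the columnstore matters so much:

```sql
-- Generic Postgres sketch; table and column names are made up.
CREATE TABLE readings (
    device_id bigint,
    ts        timestamptz,
    value     double precision,
    UNIQUE (device_id, ts)
);

-- The conflict check below probes the unique index; without a usable
-- index on the compressed data, that check gets very expensive.
INSERT INTO readings (device_id, ts, value)
VALUES (42, now(), 98.6)
ON CONFLICT (device_id, ts) DO NOTHING;
```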
And of course they describe it all in here, even how you apply indexes to your columnstore. At the bottom here they cover when to use and when not to use indexes on your compressed data: basically, if you need fast lookups on non-segment-by keys (so you need to look up specific records), or query latency is a problem on your compressed data, or you perform frequent updates to your historical data in the columnstore, or you need efficient uniqueness enforcement. But there are some other optimizations the columnstore has that they mention down here, so maybe you wouldn't necessarily need it. There's a lot I didn't cover, so if you want to look at it, you can check out this blog post.

The next post, also related to Timescale because this is their launch week, is "You too can scale Postgres to 2 petabytes and 1 trillion metrics per day". This is basically about the Insights feature of the Timescale platform that collects all of the database metrics for the databases they're hosting. And they have a little chart down here: as of the first quarter of 2025, they've recorded 250 trillion metrics stored, just under 2 petabytes, and they're collecting a trillion metrics per day. So they're basically letting you know Timescale can scale.

Next piece of content, also from Timescale, is "Speed up triggers by seven times with transition tables". This is a feature that's existed in Postgres since version 10, so seven versions ago. It's basically statement-level AFTER triggers: you can create a trigger that fires after a statement is complete and apply some action to all of those rows. That's much more efficient than a per-row trigger, so it should offer better performance by not having to do per-row work. This feature should be available in Timescale now, so check it out if you're interested.

Next piece of content: "PostgreSQL: 1 trillion rows in Citus".
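Before moving on, here's a quick sketch of what those statement-level AFTER triggers with transition tables look like in plain Postgres (table and function names are made up):

```sql
-- The trigger function sees every row the statement touched via the
-- transition table, so it runs once per statement, not once per row.
CREATE OR REPLACE FUNCTION log_inserted_total() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_totals (inserted_count, logged_at)
    SELECT count(*), now() FROM new_rows;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_after_insert
    AFTER INSERT ON orders
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE FUNCTION log_inserted_total();
```

On Postgres 10 itself you'd write EXECUTE PROCEDURE rather than EXECUTE FUNCTION, which arrived in Postgres 11.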
This is from cybertec-postgresql.com, and he said on a rainy weekend he wanted to see if he could squeeze a trillion rows into his personal computer. So he chose Citus and is using Citus columnar storage, and it looks like he's storing an integer and a bigint. For a trillion rows, he said it was about 1.5 terabytes; if this were heap storage, he calculated it would be approximately 34 terabytes using the Postgres heap. Now, he is using Citus columnar storage, but he also decided to create four separate nodes on his machine, because Citus is known for its scaling-out capabilities. I don't know what that necessarily offers on a single machine, but that's how he did it. Then he ran some queries: a count query took 53 minutes to run, and he shows how it actually spreads the load across everything in the system. Interestingly, what he encountered was a CPU bottleneck as opposed to an I/O bottleneck, so if you had more CPUs this would run even faster. I assume that's because of the compression of the columnar store, which has to be decompressed to work with the data, but I'm not quite sure. He also looked at grouping the large data set, as well as some things that would actually have a problem with the scale-out of Citus; this window function up here wouldn't be very efficient. But if you want to learn more, definitely check out this blog post.

Next piece of content: "Citus: the misunderstood Postgres extension". This is from crunchydata.com, and this blog post says that Citus is good for essentially two things. One is sharding a multi-tenant application. Basically he's imagining a software-as-a-service app where you have many different customers using it with their own data, and you want to keep that data segregated.
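For reference, creating a columnar table in Citus looks roughly like this (a sketch assuming the Citus extension is installed; names are made up):

```sql
CREATE EXTENSION IF NOT EXISTS citus;

-- The columnar table access method compresses data column by column,
-- which is how a trillion small rows can fit in roughly 1.5 TB.
CREATE TABLE measurements (
    a integer,
    b bigint
) USING columnar;
```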
Well, normally what you would do is shard your database and then target a particular tenant ID, usually some integer or maybe a UUID, to go to a particular shard, and all of that customer's data would be located on that shard. So any joins could happen locally on whatever instance that shard is operating on, which should allow very fast individual queries. And he mentions some of the nomenclature used: you have the physical nodes, which are basically physical machines, each holding one or more shards; a shard, which is basically a logical container of the data that resides on a physical node and can be moved between different nodes; and then the placement group, which uses a hash-based algorithm to assign a tenant ID to a shard. If you're not familiar with Citus, there's a coordinator node that receives queries and sends them down to the individual worker databases that have the shards; those are processed there and then sent back to the coordinator. So you would imagine customer one would have all of their information in a shard here, and customer two would have all of their information in a shard over here, so all of their data is kept together for efficient queries.

But he said another example of using Citus is real-time analytics. As opposed to locating data on an individual shard, querying that particular shard, and having all the joins happen locally, for these types of analytics you want to leverage the whole cluster and all of its CPU to answer broad questions. So one analytical query can be sent to every single worker node that has a portion of the data required. If you're doing a sum query, all of the shards are active, working on the nodes to answer that particular query, which allows much faster analytical answers by spreading the load across multiple systems. So this would be more of an OLAP use case, whereas the software-as-a-service application example would be more of an online transaction processing use case.
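The multi-tenant setup he describes might be sketched like this (made-up schema; assumes a running Citus cluster):

```sql
CREATE TABLE events (
    tenant_id bigint NOT NULL,
    event_id  bigint NOT NULL,
    payload   jsonb,
    PRIMARY KEY (tenant_id, event_id)
);

-- Hash-distribute on tenant_id so each tenant's rows land on a single
-- shard, keeping that tenant's joins local to one worker node.
SELECT create_distributed_table('events', 'tenant_id');
```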
And he does follow up talking about the importance of designing for sharding ahead of time, before trying to transition an established application to use it. But if you want to learn more, check out this blog post.

Next piece of content, also from Crunchy Data, is "Postgres troubleshooting: fixing duplicate primary key rows". I read this and thought, what are you talking about? Primary keys are unique. But the issue he's talking about is that you can indeed run a query against a table and see a duplicate ID. So what's going on? The problem is that the unique index on the primary key is invalid because, potentially during an upgrade of your operating system, the glibc library changed the collations, and now you have violations of your unique indexes, which can cause duplicate primary key issues, which is no bueno. So he works through the process of correcting this problem. The first thing he suggests is disabling index scans, so turning off enable_indexscan, enable_bitmapscan and enable_indexonlyscan, so that the invalid indexes aren't playing a role in the queries you're checking. Two is to make sure you're using the right table and the right data that you think is duplicated. Then be sure to take a backup and make a copy of your table, so that you have the data in case you make some sort of mistake, and then get started with the cleanup. You do this within a transaction, and he sets session_replication_role locally to replica. What that basically does is turn off foreign keys, so that you can go ahead and delete that data without running into foreign key issues, but you still want to make sure the data will be consistent after you do this work. And this is the process he has: a CTE to identify good rows, which basically finds the minimum unique row identifier, the ctid, grouped by the ID.
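Putting those steps together, the cleanup might look something like this (a sketch with made-up table names, not the exact SQL from the post; test it on a copy first, and note that min(ctid) requires Postgres 14 or later):

```sql
BEGIN;
-- Keep the invalid indexes out of the query plans.
SET LOCAL enable_indexscan     = off;
SET LOCAL enable_bitmapscan    = off;
SET LOCAL enable_indexonlyscan = off;
-- Disable foreign key enforcement for this transaction.
SET LOCAL session_replication_role = replica;

-- Somewhere to keep a record of what we delete.
CREATE TABLE IF NOT EXISTS duplicate_rows (LIKE my_table);

WITH good_rows AS (
    -- For each id, keep the row at the earliest physical location.
    SELECT min(ctid) AS keep_ctid, id
    FROM my_table
    GROUP BY id
), deleted AS (
    DELETE FROM my_table t
    WHERE NOT EXISTS (
        SELECT 1 FROM good_rows g
        WHERE g.id = t.id AND g.keep_ctid = t.ctid
    )
    RETURNING t.*
)
INSERT INTO duplicate_rows SELECT * FROM deleted;

COMMIT;

-- Rebuild the corrupted indexes afterwards.
REINDEX TABLE my_table;
```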
So you're finding all of the good rows with the earliest ctid, and then you delete from the table where those good rows don't exist, basically, and return all the data and insert it into a duplicate rows table, so you know exactly what you've deleted. And once you've done that, you should of course reindex the table. But here's a procedure to use if you get in this unfortunate position.

Next piece of content: there were two episodes of Postgres FM that I missed. The first one was on GIN indexes, that is, the generalized inverted index. As opposed to a B-tree index, which is a tree, a GIN index is really more like a book's index: you look up a particular word, and there are many entries for each page, or location in the heap, where that word exists within the data. So you can look up that specific word to find the many locations where that data exists. Now, this is useful for full text search. It's also useful for JSONB columns, which seem to be only increasing in usage, and also arrays, because there are good use cases for using arrays in Postgres, and GIN indexes are the index type you would want to apply to an array data type. And they had one quote a little bit later in the episode where they said this works well, until it doesn't. I think they're mostly talking about full text search, but JSONB can be involved as well. Once your data gets up to a certain size, GIN indexes start really slowing down, particularly with a lot of text in full text search; maybe with JSONB and arrays it's more efficient. But I think Nikolai mentioned that once people hit a certain scale, they tend to move their searching off to Elasticsearch or some other tool. The other difficulty with them is when you want to narrow down a query by something with a GIN index and then something with a B-tree index. There is the btree_gin extension you can use that allows you to use both in the same query, but I've had hit-or-miss success with that.
Sometimes it works well, sometimes it doesn't. But if you want to go into more detail about GIN indexes, I definitely encourage you to listen to the episode here or watch the YouTube video down here.

The next episode was about snapshots, so using disk or volume snapshots as part of your backup solution. Now, I generally like to use pg_basebackup, and I know other people use pgBackRest, but I've also used snapshots for backups as well, because they are very fast to back up and restore when you have multiple terabytes of data. And Nikolai made the comment that once he sees people's database size hitting 10 terabytes, they tend to move to snapshots just because they're easier to work with. Of course, with a lot of cloud providers your data is now locked in there with these snapshots; you can't really take your snapshot elsewhere, as far as I know. So I think there's still a role for some other type of backup as well. But I kind of agree with them that some of the future is going to be using snapshots for backup more than they're used today. And if you want to learn more, you can listen to the episode or check out the YouTube video.

Next piece of content: "Why PostgreSQL needs a better API for alternative table engines". This is from orioledb.com, and they say Postgres has had a table access method and an index access method API, but for where they want to take the performance of Postgres, the current API is limiting. So they're basically trying to make a case in this blog post to say, hey, we need to make changes to this API to really push the limits of Postgres performance. Because right now the OrioleDB solution can't just be an extension, given the limits of the API; you also have to add some patches to Postgres. So they'd very much like the API to have more capabilities or features, so they could build their solution and deliver it as just an extension. But if you want to learn more, you can check out this blog post.
Next piece of content: "SCaLE 22x live streams" and "Row Level Security Sucks". This is from vyruss.org, and what I thought was interesting was the YouTube video of the presentation by Jimmy Angelakos, "Row level security sucks: can we make it usable?", and there's even a link to the slides down here. I'm still not a fan of row level security given its performance issues, but he goes into a lot of depth and covers some things you could do if you still want to use it.

Next piece of content: "Selective asynchronous commits in PostgreSQL: balancing durability and performance". This is from shayon.dev, and he's talking about synchronous_commit, which is on by default. That basically means that when you run, say, an insert statement, Postgres writes that insert to the write-ahead log, then flushes those WAL records to permanent storage, basically your disk, and only then returns control to the client to say, okay, this insert was successful. That's what is meant by a synchronous commit. Now, if you turn this off, you get a huge performance boost, he's talking about 20-30% better performance in some cases, but you have a risk: if there's a crash before these writes happen, you could lose data, essentially, so you probably don't want to turn it off globally. However, you can change the setting per session, per transaction, or for particular operations, so you have some granularity in how you set it. Keep in mind that the setting also applies when you have synchronous replication going on, because it lets you define whether you only want the synchronous commit happening locally, meaning just on the primary, or whether you want it remotely written to every synchronous replica, at least to their disk, or actually applied to the database files and available for query. That last option clearly has the highest latency, while local-only offers the lowest latency that still gives you durability on the primary.
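As an illustration, here's how you can relax synchronous_commit for a single transaction, along with the replication-related values he's referring to (standard Postgres settings; the table name is made up):

```sql
-- Relax durability for one low-value transaction only: on a crash this
-- transaction may be lost, but the database stays consistent.
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO page_views (url, viewed_at) VALUES ('/home', now());
COMMIT;

-- With synchronous replication, the same setting controls how far the
-- commit wait extends:
--   local        -> flushed on the primary only (lowest latency)
--   remote_write -> written to the standby's OS
--   on           -> flushed to the standby's disk (the default)
--   remote_apply -> applied and visible on the standby (highest latency)
SET synchronous_commit = remote_apply;
```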
But if you want to learn more, you can check this out.

Next piece of content: "PostgreSQL 18: more granular log_connections". Apparently there's a patch coming in Postgres 18 that should allow us to define what we want logged when a connection event happens. Currently it logs three lines per connection, which can make your logs very busy, but now you can define a particular value here. Here he defined that he wants to know about just the authorization events, and now only that one line is being logged, which can make logs a lot easier to read. So check this out if you want to learn more.

Next piece of content: "Postgres Security Checklist from the Center for Internet Security". This is from crunchydata.com, and the CIS benchmark for PostgreSQL 17 was just released.

And the last piece of content: "Mentoring applications and hacking workshop for April 2025". This is from rhaas.blogspot.com, and he has a call for applications for a one-on-one mentoring program for the PostgreSQL hackers group, as well as, in April, a discussion of how autovacuum goes wrong and whether we can please make it stop doing that. You can check that out if you're interested.
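Going back to the Postgres 18 log_connections change for a second, the new setting takes a list of connection stages. Based on the post's description, it would look something like this (hedged, since the feature is still in development and the accepted values may change before release):

```sql
-- Log all connection stages (roughly the old three-line behavior):
ALTER SYSTEM SET log_connections = 'receipt,authentication,authorization';
-- Or log just the single authorization line:
ALTER SYSTEM SET log_connections = 'authorization';
SELECT pg_reload_conf();
```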
[00:17:15] In terms of the consulting corner, I just have a very small nugget of advice, and I got this from watching some of the presentations at the conference I attended. If you ever have to delete a large amount of data from a table, I would definitely look at partitioning it, because deleting a lot of data adds locking that may impact concurrent select queries, and it creates more work for vacuum to do. It's usually much more efficient to detach an old partition and then just drop it, as opposed to running millions of delete statements. And if you already have a partitioned table you're deleting from, but it's not using a timestamp as the partitioning key, maybe you want to consider hierarchical partitioning and add some timestamp sub-partitions, so that you can drop those partitions when they're done. Just a little piece of advice.
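The detach-and-drop pattern I'm describing looks like this (a sketch with made-up partition names; DETACH ... CONCURRENTLY needs Postgres 14 or later and can't run inside a transaction block):

```sql
-- Detach takes only a brief lock on the parent, unlike a huge DELETE
-- that locks rows, bloats the table, and leaves work for vacuum.
ALTER TABLE events DETACH PARTITION events_2024_01 CONCURRENTLY;
DROP TABLE events_2024_01;
```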
[00:18:06] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content I covered, as well as the content I didn't cover. While you're at the site, you can also sign up to receive weekly notifications of each episode, along with an audio version of the show and a full transcript. Thanks, and I'll see you next week.