Episode Transcript
[00:00:00] A lot of times when a client contacts me about having an issue, they may blame vacuum.
[00:00:07] So there's some problem with the database and vacuum was running when it happened, so let's blame that. But if you're familiar with postgres, you know vacuum usually isn't the problem and it's usually something else causing issues. Usually the answer to some issues is actually running vacuum more as opposed to less.
[00:00:26] But let's assume that you believe vacuum is good and you should run it frequently. And you know that as this blog post says here, dead tuples will get cleaned up, transaction IDs will be recycled, space will be reclaimed, although ever so slightly, and your database will live happily ever after. There are some areas that vacuum will not help, particularly with regard to indexes, and we'll talk more about that this week.
[00:00:53] And since we are in the holiday season, I will be taking off next week. So there will be no Scaling Postgres show next week, but it will be coming back the 1st of January. I hope everybody has a great holiday season, and I hope you, your friends, family and co-workers continue to do well. Our first piece of content is "Vacuum is a lie about your indexes". This is from boringsql.com, and he said vacuum does all the good things I mentioned in the intro to the show, but vacuum is lying to you about your indexes. He says first you have to understand how storage is laid out. Basically, the heap is just sitting out there unordered, so when you delete data and a new row comes in, it can go into that freed-up location, filling in any gaps. But even if it doesn't, vacuum comes along: it removes dead tuples and compacts the remaining rows within each page.
[00:01:50] And, quote, "if an entire page becomes empty, PostgreSQL can reclaim it entirely." But indexes are different, primarily because they are sorted. So if you delete enough data, you basically just have these open areas within the index, and vacuum doesn't fundamentally restructure the B-tree. He says vacuum does remove dead tuple pointers from index pages, marks completely empty pages as reusable, and updates the free space map. But vacuum cannot merge sparse pages together, reduce the tree depth, deallocate empty but still-linked pages, or change the physical structure of the B-tree. So he shows an example here where he created a demo table, inserted 100,000 rows, analyzed it, and he's looking at the file size of the index and the table, as well as the actual data he got by doing a pgstattuple calculation based upon the tuple length. Then he deleted 80% of the middle data from the table.
[00:02:52] So from 10,000 to 90,000 rows. He then ran vacuum again and did the size analysis again, and this time the file size stayed the same. You can see the file size matched before and after the deletion, because vacuum doesn't give back parts of the file that sit in the middle of it. It can truncate empty pages at the end, but he deleted in the middle, so the file is going to be the same size. You can see the actual data shrunk for the table, but not for the index. Now this is interesting, because I actually ran this demo and I got less actual data in the index, so I couldn't replicate his results here. I did see reduced data for the index after I ran vacuum, and I'm using Postgres 17.
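To make this concrete, here's a rough sketch of the kind of demo he's describing. This is my own approximation with made-up table and column names, not his exact script, using the pgstattuple extension to compare file size against actual live data:

```sql
-- A rough approximation of the demo described above (not the post's exact script).
CREATE EXTENSION IF NOT EXISTS pgstattuple;

CREATE TABLE demo (id int PRIMARY KEY, payload text);
INSERT INTO demo
SELECT i, repeat('x', 100) FROM generate_series(1, 100000) AS i;
ANALYZE demo;

-- File size versus actual live data, for both the table and its index.
SELECT pg_relation_size('demo')                            AS table_bytes,
       pg_relation_size('demo_pkey')                       AS index_bytes,
       (SELECT tuple_len  FROM pgstattuple('demo'))        AS table_tuple_len,
       (SELECT leaf_pages FROM pgstatindex('demo_pkey'))   AS index_leaf_pages;

-- Delete 80% of the rows from the middle, then vacuum.
DELETE FROM demo WHERE id BETWEEN 10000 AND 90000;
VACUUM demo;

-- Re-run the size query: the files stay the same size on disk,
-- even though the table now holds far less live data.
```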
[00:03:37] I'm not sure what version he was using here. Now, he does talk about fill factor and how it relates to this, but in spite of that difference in the amount of actual data, everything else I did matched what he was seeing.
[00:03:50] So when you go to do an EXPLAIN ANALYZE on the data that was deleted, and I did run ANALYZE on this table as well, you will see that the planner thinks there are going to be 200 rows, but there are actually zero rows when you run it. And if you run this query, you see that the number of rows is accurate, around 20,000 rows left. But you have this page estimate that's based upon the size of the physical file, and that can cause the cost estimates to be off. As he shows, the cost here is based upon random_page_cost times the number of pages, plus cpu_index_tuple_cost times the number of tuples. So it has the accurate row count, but the number of pages is bloated, and that's the issue. And then he looked at pgstatindex for this index, and you see that the average leaf density is 86%. You might be thinking, hey, we're doing pretty good, but the problem is a lot of these are deleted pages. Now, I'm still analyzing this blog post. I think there are a few things I might be missing here, given I did see that discrepancy when I was running some of the examples. But what is true is that vacuum helps only in a limited fashion when you're deleting a lot of entries from indexes. And the solution to that is of course not more vacuum, it's reindexing, so entirely rebuilding the index from scratch. And because you can do it concurrently, you can do it without blocking operations.
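As a rough illustration of those inspection steps, again with my assumed names from the sketch above rather than the post's exact queries:

```sql
-- The planner estimates rows from statistics, but its page count comes
-- from the physical file, which vacuum did not shrink.
EXPLAIN ANALYZE
SELECT * FROM demo WHERE id BETWEEN 20000 AND 30000;  -- falls in the deleted range

-- Inspect the B-tree itself: avg_leaf_density can still look healthy
-- while many pages are deleted but not yet reclaimable for other uses.
SELECT tree_level, leaf_pages, deleted_pages, avg_leaf_density
FROM   pgstatindex('demo_pkey');
```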
[00:05:19] Although I will say, I have done this reindex process for some clients using partitioned tables, and even though you're supposed to be able to do it concurrently on partitioned tables from version 14 on, I have noticed some query cancellations when running it. I haven't seen that when reindexing other tables, though. But if you do the reindex on this column, you see that you now have 55 leaf pages, no more of these deleted pages in the index, and the index file size is significantly smaller. Now, in terms of the actual data, this is what my query showed for the index: it did show, before the reindexing, that this was the amount of actual data in it, so that makes more sense to me. But if you have a lot of data changes going on, particularly deletion of a lot of data, reindexing is a really good idea to keep those indexes optimized. And if you're wanting to shrink the table file as well, he mentions an extension to do that, and that is pg_squeeze. He says you can also do VACUUM FULL, but that takes an ACCESS EXCLUSIVE lock on the table, locking out all reads and writes, so that's usually not a viable option. And some of the reasons you may want to reindex, he says, are: after a massive delete; if your bloat ratio exceeds 2.0 and keeps climbing; if query plans suddenly prefer sequential scans over index scans because the statistics are so far off it thinks, well, I'll just scan the table; or if the index size is wildly disproportionate to the row count. Overall this was a good blog post. I definitely encourage you to check it out and learn more.
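The fix he lands on is a concurrent reindex. A minimal example, continuing with the assumed index name from my sketch:

```sql
-- Rebuild the index without blocking reads or writes. It takes longer
-- than a plain REINDEX and needs roughly double the index's disk space
-- while it runs.
REINDEX INDEX CONCURRENTLY demo_pkey;

-- Afterwards, deleted_pages should drop to zero and the index file
-- should be much smaller.
SELECT leaf_pages, deleted_pages, avg_leaf_density
FROM   pgstatindex('demo_pkey');
```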
[00:07:01] Next piece of content: "Building AI agents on Postgres: why we built the pgEdge Agentic AI Toolkit". This is from pgedge.com. I've noticed a number of open source solutions coming out from pgEdge recently, and that's why I've been posting more of their blog posts. This is a beta release of a full toolkit for using AI with Postgres. Now, it is not dedicated to their own products; they do offer pgEdge Enterprise Postgres and pgEdge Distributed Postgres, but this toolkit can be used with any Postgres system. It can be used on-prem, in the cloud with other providers, or with their own solutions.
[00:07:42] And the toolkit contains: a pgEdge MCP server, so basically granting LLMs access to the database to read, and eventually write, data to it if you're comfortable with that; the pgEdge vectorizer for chunking and creating embeddings, which we discussed in last week's Scaling Postgres episode; a pgEdge RAG server for doing retrieval augmented generation of text content; a doc loader, which I wasn't familiar with, which brings in online material to make it searchable by the agents; and then VectorChord BM25, an extension to do BM25-ranked search for hybrid semantic and full text searching. The BM25 ranking is similar to the Elasticsearch indexes that are used for searching. So this makes up the AI toolkit. They also have another blog post introducing the pgEdge Postgres MCP server and how to connect it to Claude Code and Cursor. This one is focused on their MCP server and, again, the fact that it doesn't just work with their solutions: it works with on-premises and self-managed cloud accounts, or their own managed cloud service of course. It allows you to do full schema introspection, it can do some performance analysis for you, and they've put some effort into the security side of things. Initially it is going to be read-only, but they say you will be able to write at some point. This particular MCP server is usable from version 14 and higher, I believe, but the pgEdge Agentic AI toolkit is only supported for Postgres 16 and higher. You can go to this link to download the MCP server for your operating system and get started connecting it up to Claude or other LLMs, and they show you how to get it configured so that any LLM you work with will be familiar with your database. So check that out if you're interested.
[00:09:41] Next piece of content: "Which indexes can be corrupted after an operating system upgrade?". So this is the problem where your collation changes, usually based upon the operating system. I hope to see this less and less, given that Postgres 17 offers a built-in collation provider, so collation is no longer necessarily tied to the operating system. But basically, this problem is a result of the sort order changing and your text-based indexes becoming corrupt or invalid, which could lead to wrong query results or violated unique constraints, so you have duplication where you don't want it. And basically, how you would resolve this problem is that during the upgrade process you would reindex your indexes. Of course that adds to the downtime, but he discusses exactly which indexes you need to do this for. It's text-based indexes that are not using the binary collation, because if you use the binary collation, that's not going to change.
[00:10:40] But you also need to consider collatable expressions. These are expressions that depend on the collation, like taking the length or upper of a particular text string, or even conditional (partial) indexes that have collatable expressions in them. He says he hasn't seen it, but it's also possible that range-partitioned tables with a string expression could be impacted, or maybe even check constraints that use a collatable expression, though he hasn't necessarily seen those. And he has a query to show potentially vulnerable indexes. Now, he does say you can also take the retroactive, or I would say reactive, approach: find indexes that have become corrupted after the operating system upgrade. I personally wouldn't want to do that, but if you do, he gives you a query to find the corruption, and then you can reindex them at that time. He did mention here that a way to avoid this problem is to actually do your upgrade as a logical replication upgrade. So you logically replicate to a newer version of Postgres, and that avoids the problem because you're newly creating those indexes on the new system and version with whatever collation is set. So that bypasses this problem completely, and that's the one I prefer to use. But if you want to learn more, you can check out this blog post.
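If you want a feel for what those checks can look like, here's a minimal sketch. These are not the author's queries: it uses the collation version catalog columns added in Postgres 15 and the amcheck extension, and the index name is hypothetical.

```sql
-- Compare the collation version recorded for the database against what
-- the operating system's library currently reports (PostgreSQL 15+).
SELECT datname,
       datcollversion,
       pg_database_collation_actual_version(oid) AS actual_version
FROM   pg_database
WHERE  datname = current_database();

-- amcheck can verify B-tree ordering; a corrupted text index after a
-- glibc upgrade typically fails this check.
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_check('some_text_index');  -- hypothetical index name

-- The fix is simply to rebuild the affected indexes.
REINDEX INDEX CONCURRENTLY some_text_index;
```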
[00:11:59] Next piece of content: "$50 PlanetScale Metal is generally available for Postgres". So they have been talking about this, and now they are offering PlanetScale Metal. Now, it's not necessarily bare metal, but they have gotten the price down to $50 a month, when it was, I think, just over $500 a month. Yeah, $589 per month. And you might be wondering, how on earth are they dropping the price that much? Well, it's because the lowest plan gives you 1/8 of a virtual CPU, 1 gigabyte of memory and 10 gigabytes of NVMe storage. So a very, very small database. So even though they call this Metal, there is clearly virtualization going on; they're probably using Kubernetes and giving you a slice of a CPU and a portion of the memory and storage. But based upon how they engineered this, you can configure different levels of CPU and RAM for the system that you're looking for.
[00:12:58] And if we check out the pricing page here: I went ahead and chose Postgres as the database engine, and a primary with multiple replicas. So with two replica nodes, using PlanetScale Metal and the ARM64 architecture, you get 1/8 of a CPU, 1 gigabyte of memory and 10 gigabytes of NVMe drive for $50 per month. And you can click that to scale up and choose different options based upon what you're looking for. Now, I'm not necessarily endorsing this, but I do find the direct-attached SSD super appealing.
[00:13:39] I'm thinking about whether there are any projects of mine where I might want to try this out, just to experiment with it. But check it out if you're interested. Next piece of content: "Enhancing PostgreSQL OIDC with PG OIDC validator". This is from percona.community, and with the introduction of OAuth2 in PostgreSQL 18, they released an update to their PG OIDC validator to do verification of OIDC tokens directly within Postgres. And the major feature here is being able to cache some of the data from external identity providers. Otherwise each backend has to perform this validation itself, and if they can cache it, it just makes the system much more efficient.
[00:14:22] So you can check this out if you're interested. Next piece of content: "Anonymizing PII in PostgreSQL with pgEdge Anonymizer". This is from pgedge.com, and this is another, I think, open source tool that allows you to anonymize the data within your Postgres database, which can definitely help with compliance issues.
[00:14:42] So you can check this out if you're interested. And the last piece of content: "What Postgres developers can expect from PGConf.dev", with Melanie Plageman. So if you want to learn more about that, you can definitely check out this podcast. And now it's time for my consulting corner. I'll just talk about some of the Postgres optimization work I've been doing. It's interesting that, with this most recent engagement, about a third of what I have done is actually optimizing queries to run faster. Another third of it is actually figuring out how to pre-calculate aggregated data needed for queries. So these queries were doing all sorts of aggregations each time they were needed. Well, it was determined that the most efficient way to handle that is: all right, let's pre-build these, store the totals in Postgres, and then just query those when we need them.
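As a very simple sketch of that pattern (the table and column names here are made up for illustration, not the client's schema):

```sql
-- A rollup table that stores pre-computed totals, refreshed by the
-- periodic process instead of aggregating on every read.
CREATE TABLE daily_sales_totals (
    sale_day     date PRIMARY KEY,
    order_count  bigint  NOT NULL,
    total_amount numeric NOT NULL
);

-- Periodic refresh: recompute recent days and upsert them.
INSERT INTO daily_sales_totals (sale_day, order_count, total_amount)
SELECT created_at::date, count(*), sum(amount)
FROM   orders
WHERE  created_at >= now() - interval '7 days'
GROUP  BY created_at::date
ON CONFLICT (sale_day)
DO UPDATE SET order_count  = EXCLUDED.order_count,
              total_amount = EXCLUDED.total_amount;

-- Reads then become a cheap lookup instead of a full aggregation.
SELECT total_amount FROM daily_sales_totals WHERE sale_day = current_date;
```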
[00:15:33] Now, this can be hard to do if you're dealing with super real-time data, but these were processes that run periodically, so it was easy to update those aggregations when needed. The other third of the work was actually dealing with paging issues. They had a number of cases where a paging interface did LIMIT with OFFSET, which clearly started running poorly once you went out too many pages. Now, when I see something like that, generally I want to do keyset pagination or maybe an infinite scrolling solution, but they actually opted to just limit the number of pages that could be viewed, which solves the problem as well, all without having to optimize any queries. But those are just some things I've been working on. And just before I sign off, remember there is no Scaling Postgres episode next week because of the holidays. I hope your holidays are great and I will see you in 2026. I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content discussed, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show, as well as a full transcript. Thanks and I'll see you in 2026.