Episode Transcript
[00:00:00] When you start scaling Postgres, it can be important to know whether the workload you're dealing with is primarily going to be read heavy or write heavy, because that may dictate what changes you're going to make. So for example, if it's read heavy, you probably want more indexes. If it's write heavy, you probably want fewer indexes, so the database doesn't have the burden of maintaining them. But the reality is that this differentiation exists all the way down to the table level, so it can be beneficial to know which tables tend to be read more versus written to more. Well, we're going to talk about that this week. Also, in my consulting corner I just have a Postgres backup public service announcement and a few questions about backups, as I'm looking into that now. But I hope you, your friends, family and coworkers continue to do well. Our first piece of content is: Is Postgres read heavy or write heavy?
[00:00:57] Why should you care? And when I first read this I was thinking, Postgres read heavy or write heavy? I mean, it's kind of both. But really what they're talking about is, from a database perspective, is your database read heavy or write heavy, not necessarily Postgres itself. And like I mentioned in the intro, whether it's read heavy or write heavy can dictate what kind of changes you're going to make to optimize the database. And what I thought was cool about this blog post is he actually has a query that looks at your statistics tables, some I/O statistics, and even pg_stat_statements to give you a sense of how often tables get read from or written to. He's predominantly using the info from pg_statio_user_tables to get this insight. And he has a configurable ratio here so you can define what is read heavy versus write heavy. He used a ratio of five.
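To give a sense of the idea, here's a minimal sketch of this kind of classification query. It's not the author's exact query (he works from the block-level I/O statistics); this simplified version just uses the tuple counters in pg_stat_user_tables, with the same arbitrary threshold of five reads per write:

    -- Rough sketch: classify each table as read- or write-heavy using the
    -- cumulative tuple counters in pg_stat_user_tables. The threshold of
    -- 5 reads per write is arbitrary and tunable.
    SELECT
      relname,
      seq_tup_read + idx_tup_fetch      AS tuples_read,
      n_tup_ins + n_tup_upd + n_tup_del AS tuples_written,
      CASE
        WHEN n_tup_ins + n_tup_upd + n_tup_del = 0 THEN 'read-only'
        WHEN seq_tup_read + idx_tup_fetch = 0      THEN 'write-only'
        WHEN (seq_tup_read + idx_tup_fetch)::numeric
             / (n_tup_ins + n_tup_upd + n_tup_del) >= 5 THEN 'read-heavy'
        ELSE 'write-heavy'
      END AS workload
    FROM pg_stat_user_tables
    ORDER BY tuples_read + tuples_written DESC;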
[00:01:51] So once the ratio drops below five reads per write, it's considered write heavy, because you're pretty much always going to have more reads than writes, even if your database is particularly write heavy. And here is an example of what the output looks like: basically it reports a write-only workload, a read-only workload, tends to be write heavy, tends to be read heavy, or, at a roughly one to one ratio, balanced. So I ran this against my own database to see what it looked like, and more than 95% of the tables were read heavy and a few were write heavy. The write-heavy ones were mostly tables where API keys needed to be refreshed periodically, so they had a lot of writes, basically a lot of updates, and apparently not a lot of reads for the number of API calls being made. He also showed another query you can use to check how much is being read from or written to the database system as a whole, and you can even narrow it down by particular statements.

So the reality is that most Postgres databases are read heavy, and if you're looking for configuration changes for the whole database, he says, okay, in that case, if it's read heavy, you definitely need effective caching. Ideally the whole working set will fit in memory, so you can check things like the cache hit ratio to make sure that's as high as possible. You want to be sure, of course, to optimize your queries, and you can add indexes where necessary, as long as they're not so excessive that they actually hinder your writes. And you can scale out with read replicas to help deal with that read heavy workload. Also, because of async I/O in Postgres 18, you could give that a try as well.

Now, if your application happens to be write heavy, one of the most important things, because of all the writes that have to happen, particularly with regard to the WAL, is having faster storage. Next, more RAM can help, along with larger shared buffers, to better manage the flushing to disk. You want to minimize your indexes and, of course, try to optimize heap-only tuple (HOT) updates. So try not to have an index on a column that you tend to update frequently, because then you won't be getting those heap-only tuple updates and it will cause a lot more writes to the system. It can be beneficial to tune your WAL, like adjusting wal_buffers, and of course to optimize checkpoints, like setting checkpoint_timeout and checkpoint_completion_target as well. But again, the reality is that you may also need to do some optimization per table. So if you're noticing a write heavy table and it's impacting performance, you can make some of these changes at the table level to try and optimize it. For example, do some of the index optimizations we talked about, or even change the fill factor to try and optimize for different kinds of writes.
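As a quick illustration of two of those checks, here's a rough sketch (not from the blog post) of how you might look at the cache hit ratio and the share of HOT updates per table, using the standard statistics views:

    -- Buffer cache hit ratio for the current database; for a read-heavy
    -- workload you generally want this well above 99%.
    SELECT round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
    FROM pg_stat_database
    WHERE datname = current_database();

    -- Share of updates that were heap-only (HOT) per table; a low share on a
    -- write-heavy table often points at an index on a frequently updated column.
    SELECT relname,
           n_tup_upd,
           n_tup_hot_upd,
           round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 2) AS hot_update_pct
    FROM pg_stat_user_tables
    ORDER BY n_tup_upd DESC;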
But if you want to learn more, definitely check out this blog post.

Next piece of content: "Benefits of a descending index." This is from cybertec-postgresql.com, and I rarely if ever create an index where I specify the order, because I know the database can easily scan an index in reverse order to find values. So I never create an index specifically for that purpose. But there are three cases he mentions here where it could be advantageous to do so. The first is when you have a mixed ORDER BY clause. So in his example, where you're ordering by one column descending and then a second column ascending, you actually need a dedicated index to be able to use an index for that ORDER BY at all. When you're mixing the directions in the ORDER BY, Postgres can't just scan a single default index backward to produce that ordering.
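Here's a hypothetical example of that first case, just to make it concrete (the table and index names are made up, not from the article):

    -- A single-direction index (or a backward scan of it) can't produce a
    -- mixed-direction ordering, but an index declared with matching directions can.
    CREATE TABLE events (id bigint, created_at timestamptz, priority int);

    CREATE INDEX events_created_desc_priority_asc
        ON events (created_at DESC, priority ASC);

    -- This ORDER BY matches the index definition, so Postgres can return rows
    -- in index order without a separate sort step.
    SELECT *
    FROM events
    ORDER BY created_at DESC, priority ASC
    LIMIT 50;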
[00:05:35] The next case is when you can take advantage of some space efficiency. There's actually an optimization in Postgres B-tree indexes where they are optimized for insertion at the upper (rightmost) end, and this particular optimization helps when building the index with sequence-generated keys or increasing timestamps. Now, that's the vast majority of use cases. But he had an example here where he generated a series going backwards, and when you insert data in descending order like that into a regular ascending index, you can actually end up with a larger index. So if you look at this, the specifically descending index is actually smaller than the ascending one, because the Postgres optimization for ascending insertions actually hinders things when you're inserting data in a descending way, and your indexes will be smaller if you declare them with a descending order for that pattern. In addition, you can take a look at the average leaf density: it's at 90% with the descending index, but the ascending one is at 50%, so the descending index is definitely much more efficiently packed in this particular use case.
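If you want to check this on your own indexes, a sketch like the following should work, assuming the pgstattuple extension is available (the index names here are hypothetical):

    -- pgstatindex() comes from the pgstattuple extension and reports, among
    -- other things, the average leaf density of a B-tree index.
    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    SELECT pg_size_pretty(pg_relation_size('idx_asc'))  AS asc_size,
           pg_size_pretty(pg_relation_size('idx_desc')) AS desc_size;

    SELECT avg_leaf_density FROM pgstatindex('idx_asc');
    SELECT avg_leaf_density FROM pgstatindex('idx_desc');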
[00:06:42] The next case where it could be advantageous is if you're doing range scans in a descending direction. So here he basically restarted Postgres, cleared out the cache entirely, and did index-only scans forward and backward. And you can see the backward scan took over 5 seconds, whereas the forward scan using the descending index only took 680 milliseconds. Now why was this so much faster? It's basically due to Linux's read-ahead. So if you think you can take advantage of range scans in a descending order, then it could be advantageous to create a dedicated index in descending order. I thought this was particularly interesting, so definitely check it out if you want to learn more.

Next piece of content: there was another episode of Postgres FM last week. This one was on lightweight locks, and a lot of the content covers the Postgres marathon blog posts that Nikolai has been doing at Postgres AI, so a lot of it will probably be redundant with what I presented in the last two episodes. Some new information that was mentioned is that when you look in the docs, there are fifty-some different lightweight lock types, covering all the different internal areas that can be locked as Postgres is processing its statements. And of course they rehashed the lock manager issues that can happen, as well as buffer mapping thrashing, which can happen if your shared_buffers is a bit too small or you're trying to update the same pages over and over. But feel free to listen to the episode here or watch the YouTube video down here.
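If you want to see whether lightweight lock waits are showing up on your own system right now, a quick check like this against pg_stat_activity is one way to do it (a simple sketch, not something from the episode):

    -- Sessions currently waiting on a lightweight lock, grouped by which one
    -- (for example LockManager or BufferMapping).
    SELECT wait_event, count(*) AS sessions
    FROM pg_stat_activity
    WHERE wait_event_type = 'LWLock'
    GROUP BY wait_event
    ORDER BY sessions DESC;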
[00:08:21] Next piece of content: "Postgres and Kubernetes: the commands every DBA should know." This is from gabrielebartolini.it, and it covers using CloudNativePG and the commands you would typically run for different operations in Postgres, converted to their Kubernetes equivalents. So for example: which cluster am I on, what machines are in this cluster, and basically all of them have kubectl commands. So if nothing else, this is a great reference for saying "I need to do this activity; what command do I need to run in a Kubernetes environment to do it?" Check this out if you want to learn more.

Next piece of content: "Say hello to OIDC in PostgreSQL 18." This is from Percona Community, and with the release of Postgres 18 it now has OAuth 2 support. They were saying it's a small step to go from OAuth 2 to OIDC, which is OpenID Connect authentication, so they decided to work on a validator library to support it.
[00:09:27] And right now this is in its testing phase, not ready for production, but it should be compatible with Okta, Ping Identity, Keycloak, and Microsoft Entra ID. It's not yet compatible with Google's OIDC implementation, but they plan to get to that eventually. So if you're interested, you can check this out.

Next piece of content: "Hybrid search in PostgreSQL: the missing manual." This is from paradedb.com, and we've talked about hybrid search before, but that was PostgreSQL's built-in full text search plus vector search, basically combining those so you get exact matches as well as semantic search. Well, their concept of hybrid search is actually using the BM25 index type, which is basically what Elasticsearch uses for its searching and ranking, and combining that with vectors. Because of course what ParadeDB does is offer an extension to give you Elasticsearch-like search within Postgres.
[00:10:25] So they explain all of this, of course, and then they show how you would get this set up and working as a hybrid search, combining the results in a reciprocal rank fusion step. Basically this converts the rankings from the outputs of both of these search engines into scores. Now, one thing to keep in mind with doing BM25 lexical search is that right now the community ParadeDB indexes are not written to the WAL, so these indexes won't be replicated to replicas, and you can't do database restores with the WAL and have those indexes come up, because they're not maintained in the WAL. However, the enterprise version of ParadeDB does offer those features. So just something to keep in mind.
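To show what reciprocal rank fusion looks like in general, here's a generic SQL sketch. This is not ParadeDB's actual syntax; it just assumes you already have two ranked result sets, one lexical and one semantic (the table names here are hypothetical), and combines them with the usual 1 / (k + rank) formula:

    WITH lexical AS (
        SELECT id, row_number() OVER (ORDER BY lexical_score DESC) AS rank
        FROM lexical_results            -- hypothetical: ranked full-text / BM25 hits
    ), semantic AS (
        SELECT id, row_number() OVER (ORDER BY embedding_distance) AS rank
        FROM semantic_results           -- hypothetical: ranked nearest-neighbour hits
    )
    SELECT id,
           coalesce(1.0 / (60 + l.rank), 0) +
           coalesce(1.0 / (60 + s.rank), 0) AS rrf_score   -- k = 60 is a common default
    FROM lexical l
    FULL OUTER JOIN semantic s USING (id)
    ORDER BY rrf_score DESC
    LIMIT 20;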
[00:11:13] Next piece of content: "Waiting for PostgreSQL 19: support COPY TO for partitioned tables."
[00:11:19] So before, you couldn't COPY directly from partitioned tables, but now you can with a patch for Postgres 19. So hopefully that makes it into the next version, but we've got a year to wait for it.
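For a bit of context, here's a hypothetical example of what that changes ("measurements" is a made-up partitioned table, and the exact behavior is whatever the committed patch provides):

    -- Up to Postgres 18, COPY TO on a partitioned table is rejected, so you
    -- have to go through a query instead:
    COPY (SELECT * FROM measurements) TO '/tmp/measurements.csv' WITH (FORMAT csv);

    -- With the Postgres 19 patch, COPY TO can target the partitioned table
    -- directly:
    COPY measurements TO '/tmp/measurements.csv' WITH (FORMAT csv);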
[00:11:32] Next piece of content: "Introducing generalized consensus, an alternate approach to distributed durability." This is from multigres.com, and if you remember, Multigres is basically Sugu's version of Vitess for Postgres. So this is in active development, and he's saying we really need to examine how our solution is going to handle consensus. He talks about the different goals, and basically he wants a more flexible consensus algorithm where you can actually customize what kind of durability you want.
[00:12:03] And he's put together a full series describing it, and the link is down here; it talks about the problem, the foundation, and what they're basically looking to build and incorporate into Multigres. So if you're interested in that, you can check out this content. And the last piece of content: "Hacking workshop for November 2025." This is from rhaas.blogspot.com, and he's scheduling two to three discussions of Matthias van de Meent's talk "Improving scalability; reducing overhead in shared memory." So if you're interested in joining that, you can sign up using the link right here.
[00:12:40] And now it's time for my consulting corner.
[00:12:43] So just a public service announcement with regard to backups, because I've actually seen this more than once, where someone takes a backup of the system using, say, pg_basebackup, retains that backup, and then tries to restore from it. But the problem is they don't have the corresponding WAL files. When you're taking a pg_basebackup, those WAL files are essential for doing a restore or point-in-time recovery, because the backup won't work without them, and you need the WAL files from the point at which the backup starts until a consistent point is reached.
Now, with its default flags, pg_basebackup does save the WAL files for you, but it's easy to configure that differently, or maybe to archive the backup files but not the corresponding WAL files, and then think you can still use them. So just remember: when you're taking a database backup, definitely save those WAL files. And actually, in terms of backups, I have a question for you. If you're doing your own backups, do you tend to just use pg_basebackup, or do you use things like pgBackRest or Barman? And which do you prefer?
[00:13:52] And in addition, since Postgres 17, incremental backups have been available. Have you started using those, and what do you think about the experience so far? Please let me know.
[00:14:03] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content we discussed, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show as well as a full transcript. Thanks, and I'll see you next week.