Latency Killing Performance? | Scaling Postgres 398

Episode 398 | January 04, 2026 | 00:19:42

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we examine how latency impacts performance, a list of naughty queries, BM25 indexes, and how to quickly create database clones.

To get the show notes as well as get notified of new episodes, visit: 
https://www.scalingpostgres.com/episodes/398-latency-killing-performance/

Want to learn more about Postgres performance? Join my FREE training called Postgres Performance Demystified here: https://www.scalingpostgres.com/courses/postgres-performance-demystified/


Episode Transcript

[00:00:00] Happy New Year, everyone. I hope your holidays went well and I hope your 2026 is looking good as well. We'll start off this episode by asking a question. [00:00:11] How many barriers exist between your application server and your database server? Now, what do I mean by barriers? Well, that could be a network path that has to be traversed, or your application server could be on your database server, although that's not very common nowadays. Frequently you have multiple application servers that talk across an availability zone to speak with a single primary database, and maybe you have other proxies you route through as well. [00:00:44] Every point in that chain increases the latency of the connection, and unfortunately that can kill your performance. That's what we're going to talk about this week, but I hope you, your friends, family and coworkers continue to do well.

Our first piece of content is "PostgreSQL Performance: Latency in the Cloud and On Premises". This is from cybertec-postgresql.com, and he's showing an example: you have a client app that sends a SELECT query to your database system. [00:01:15] The execution happens there and then the result set is returned. [00:01:19] Well, even if you have a super highly optimized database for queries, the latency of sending the SELECT and then returning the result could be killing your performance. [00:01:31] To test this out, he ran pgbench over a UNIX socket connection, so not over TCP/IP. For his configuration, he got an average latency of 0.019 milliseconds and almost 52,000 transactions per second, and this is for a single connection. But when he tried TCP/IP using localhost, that same run almost doubled the latency to 0.034 milliseconds and reduced the transactions per second to 29,000. [00:02:10] So throughput dropped by 44%. That's pretty close to half, just from moving from a UNIX socket to localhost. [00:02:18] Now, if you're actually using a real network — and in this example it just had a single hop — running the same pgbench, the latency jumped twenty-fold to 0.42 milliseconds. [00:02:31] That cratered the transactions per second down to 2,378, a massive performance hit. Again, this is with a single connection. [00:02:42] One way you can help alleviate this situation is to ramp up your concurrency. In this example, he ran pgbench against localhost again, but with four clients as opposed to one. The average latency was 0.049 milliseconds, a little higher than the 0.034 milliseconds for the single client, but transactions per second were 82,000 compared to the previous result of 29,000. [00:03:17] So concurrency definitely helps improve throughput, but that latency tax is still there. You definitely want to be aware of how many hops or barriers you have between your application and your actual database server and try to minimize them as much as possible to get the best performance. Another alternative we've seen in previous blog posts is trying to batch up as much work as possible to send to the database. This could be a CTE or joining some queries together — there's a quick sketch of that idea after this segment. I'm not necessarily going to endorse doing your business logic in procedures or functions, because I think that causes other issues. [00:03:58] However, if you have a particular process that needs to be more performant, maybe that's a way to go. But check this out if you want to learn more.
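To make that batching idea concrete, here is a minimal sketch — not from the article, with made-up table and column names — of collapsing two round trips into one using a data-modifying CTE:

    -- Instead of paying the network latency twice (one INSERT round
    -- trip, then one SELECT round trip), a data-modifying CTE does
    -- both in a single statement, so the latency is paid once.
    WITH new_order AS (
        INSERT INTO orders (customer_id, total)
        VALUES (42, 99.50)
        RETURNING id, customer_id
    )
    SELECT n.id AS order_id, c.name AS customer_name
    FROM new_order n
    JOIN customers c ON c.id = n.customer_id;

On a 0.42-millisecond network hop, every statement you can fold into a single round trip like this saves that latency again.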
[00:04:09] Next piece of content: "PostgreSQL Santa's Naughty Query List: How to Earn a Spot on the Nice Query List". This is from drunkdba.medium.com, and it's basically a top-ten list of naughty queries. Naughty query number one is using SELECT *, basically pulling all the columns of a table when you do a query. [00:04:31] It's bad because it increases the cost of the query everywhere: you're going to be using more memory, you're going to have more cache misses, any sort or hash operations are going to be bigger, and there's a bigger network payload. Another deficiency is that index-only scans pretty much become impossible because you're pulling back so much data, and you'll pay a bigger spill tax — meaning that if things exceed memory, they need to spill to disk, and that happens sooner the larger your queries are. [00:05:00] Of course, the way to fix it is to only select what you need, so that's pretty easy to do. Number two on the list is hitting the ORDER BY ... LIMIT issue, where the optimizer sometimes chooses to prioritize the column being ordered by, thinking it can more quickly find the rows you need by scanning along the ORDER BY column as opposed to filtering by what's in the WHERE clause. [00:05:25] So that can definitely tank your performance. [00:05:27] Now, he does offer a number of fixes for this. [00:05:31] Each of these I think I've mentioned on the show previously, but the first one is a manual query change where you basically make the ORDER BY an expression as opposed to a bare column. So if it's an integer, you add zero to it — it's still the same value, but now Postgres considers it an expression and won't use the column's index unless there's an explicit expression index. [00:05:50] If you're ordering by a text string, you just append an empty string to it, or if you're ordering by a date, you add a zero-second interval to it. There's a sketch of this trick after this segment. Another way to get around it is to set column statistics, generally on the columns in the WHERE clause, which can help the planner understand, oh, this is a more efficient route to retrieve the right data. And then the third one, which he didn't mention, is that a lot of times a compound or composite index will work: you index by tenant ID and then by the column you're ordering by. Number three is dealing with idle-in-transaction issues. This is where a client starts a transaction and then starts doing all sorts of work while that transaction sits open on the database. Autovacuum can't reclaim any dead tuples while it's open, bloat grows in the indexes and the tables, and you're also just holding a database backend open for no reason. So basically, keep your database transactions as short as possible to avoid that. He also recommends setting idle_in_transaction_session_timeout, and statement_timeout as well. Number four is mega CTE chains, and he says these can have issues with row count estimation errors, so you may get some bad plans. His recommendation is to split the CTEs into temp tables, which kind of gives me pause, because I know we have heard about, as he mentions here, the pg_catalog bloat issue. But he says he's seen bigger wins from good autovacuum tuning than from worrying about catalog bloat. Personally, I would be hesitant to go this route unless it was absolutely necessary. I'd probably still stick with the CTE and see if materialization helps at all.
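Here's a rough sketch of that ORDER BY rewrite from number two — the table and column names are made up, not from the post:

    -- Risky plan: the optimizer may walk an index on created_at in
    -- order, hoping to hit ten matching rows quickly, instead of
    -- filtering by tenant_id first.
    SELECT id, created_at
    FROM events
    WHERE tenant_id = 123
    ORDER BY created_at
    LIMIT 10;

    -- The rewrite: adding a zero-second interval keeps the same sort
    -- order but turns the ORDER BY into an expression, so the planner
    -- stops favoring the created_at index.
    SELECT id, created_at
    FROM events
    WHERE tenant_id = 123
    ORDER BY created_at + INTERVAL '0 seconds'
    LIMIT 10;

The composite-index fix mentioned above — here, an index on (tenant_id, created_at) — often solves the same problem without any query change.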
Number five is wildcard search, where you're doing a LIKE search for a value in the middle of a string, which a normal B-tree index can't optimize. You can create an index that helps with left-anchored searches, depending on the collation of the index or what kind of operator class you add to it. But if you truly need an internal search of a string, he suggests using the pg_trgm extension to break the string up into three-character trigrams. [00:08:08] Then you can use GIN or GiST indexes to service that query pattern — there's a sketch of this after this segment. He also mentions Biscuit, which I have not heard of, but he has a link to a GitHub project, so you can check that out if you're interested. [00:08:22] Number six is functions on indexed columns, and it's only bad if you don't have the corresponding index for it. [00:08:30] So if you're going to have a query with an expression in the WHERE clause, you'll need an index on that expression as well to make it efficient. [00:08:38] Number seven is OR conditions that derail index usage. Sometimes the planner can have issues choosing the best plan, and sometimes it's best just to split things up using a UNION ALL: you have one query WHERE a = 1, another query WHERE b = 2, and just do a UNION ALL between them. Number eight is mismatched join types or implicit casts on join keys. Thankfully I haven't seen too much of this, but he's basically saying fix your schema or your data types first, as opposed to having to explicitly cast things. Number nine is missing indexes on join keys or foreign keys — so basically, index your foreign keys. And lastly is SELECT DISTINCT as a band-aid for join explosions. I've seen that it's quite convenient sometimes to just add a DISTINCT to a query and say, oh look, it's working. But as he says here, it often hides missing join predicates, unintended many-to-many relationships and data modeling issues. So his recommendation is to actually fix the join. I mean, maybe you need to use GROUP BY in some cases, but just be cautious with it. [00:09:48] But if you want to learn more, check out this blog post.
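As a minimal sketch of the pg_trgm approach from number five — with hypothetical table and column names, assuming the contrib extension is available:

    -- pg_trgm ships with Postgres' contrib modules.
    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- A GIN index over trigrams lets the planner service unanchored
    -- patterns like '%widget%' that a B-tree index can't help with.
    CREATE INDEX products_name_trgm_idx
        ON products USING gin (name gin_trgm_ops);

    -- This middle-of-the-string search can now use the index.
    SELECT id, name
    FROM products
    WHERE name ILIKE '%widget%';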
[00:09:51] Next piece of content: "You Don't Need Elasticsearch: BM25 Is Now in Postgres". This is from tigerdata.com, and in spite of the somewhat misleading title — BM25 is now in Postgres — it definitely wasn't released in Postgres 18. What they're talking about is that Tiger Data has open sourced pg_textsearch, which offers a BM25 index for doing text search on your data. This blog post basically shows some of the disadvantages of Postgres' built-in text search versus the BM25 method: [00:10:28] it helps avoid keyword-stuffing issues, it helps prevent common words from dominating, it reduces the prioritizing of long documents over shorter ones, and it helps with all-or-nothing matches. This index type is available if you install the pg_textsearch extension: following the GitHub link here, you create the extension, create BM25 indexes on your data, and then use their query syntax to do a text search. They even show an example where you can use this alongside pgvector search to get both semantic and syntactic search, and then combine those results. We have talked about solutions using Reciprocal Rank Fusion before — it's a way to combine result sets together to give you the best overall result set — [00:11:25] and they show an example here using it: they have a CTE doing a search using the BM25 index, then a search using a vector, and then they combine those result sets together. They also have a demo at a GitHub link that puts it all together. [00:11:44] Now, when you look at the GitHub repo, you see it is open source — my understanding is that it's using the PostgreSQL license — but it's still considered pre-release and not yet recommended for production use. So keep that in mind, but it's definitely something to keep track of to see where this project goes.

Next piece of content: "Instant Database Clones with PostgreSQL 18". This is from boringsql.com. We've all heard about database clones, and some vendors say use our solution because you're able to make clones super quickly. [00:12:20] Well, he's come up with a method using PostgreSQL 18 and the right file system to do cloning super fast on your local system, and he does it by using the CREATE DATABASE command and specifying a particular template. He notes that before version 15, when you created a new database, the default strategy was a file copy, but since Postgres 15 it has been using a WAL-log method, and that is very slow. [00:12:48] What you really want is the file copy strategy — but there's another setting you can change, which is the file copy method. By default it's just copy, and if you're using a file system like ext4 on Linux, it will do a standard file copy. However, if you have a file system like ZFS, XFS with reflinks, or APFS, you can switch it to clone. [00:13:18] So basically these are the requirements: a ZFS- or XFS-like file system; PostgreSQL 18, because it has the feature to specify clone; and the file copy method set to clone in your config. He shows an example where he sets up a six-gigabyte database and copies a database from a template using the WAL-log strategy, and it took about a minute and seven seconds. [00:13:45] Then he used the file copy strategy and it took a little over 200 milliseconds. Now, even though it says strategy file copy, it's actually doing a clone, because the file copy method is set to clone — that's a little confusing, but keep it in mind. There's a rough sketch of the commands after this segment. And because this is copy-on-write — meaning the clone points at the same database files until you make a change — he shows an example with XFS of the changed data blocks growing in size once you modify the clone. This is great for test or development environments, but one disadvantage is that the source database can't have any active connections during cloning, and that's a Postgres limitation. So you're probably not going to be using this on your production database to make a clone. The other disadvantage is that it only works within a single file system, so you can't move that clone elsewhere. And of course, the solution won't work in the various cloud environments, although they do have things like EBS snapshots to try to get a quick copy of something — but definitely not as seemingly easy as this. But check this out if you want to learn more.
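Here's a minimal sketch of those commands, based on the post's description — the database names are made up, and this assumes PostgreSQL 18 on a reflink-capable file system such as XFS:

    -- In postgresql.conf (PostgreSQL 18+):
    --   file_copy_method = clone

    -- Slow path: the default WAL-logged copy of the template,
    -- which took over a minute for the six-gigabyte example.
    CREATE DATABASE clone_slow
        TEMPLATE source_db
        STRATEGY wal_log;

    -- Fast path: the file copy strategy, which with
    -- file_copy_method = clone becomes a copy-on-write clone —
    -- roughly 200 milliseconds in the post's test. The template
    -- database must have no active connections while this runs.
    CREATE DATABASE clone_fast
        TEMPLATE source_db
        STRATEGY file_copy;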
Next piece of content: "CloudNativePG in 2025: CNCF Sandbox, PostgreSQL 18 and a New Era for Extensions". This is from gabrielebartolini.it, and he's covering the successes in 2025 for CloudNativePG. So if you want to read that retrospective, you can check it out here.

[00:15:19] Next piece of content: there was another episode of Postgres FM sometime over the past two weeks. This one was on archiving, and they had a bit of a discussion on, okay, what does archiving mean to you? Does archiving mean backup? Or does archiving mean you take a subset of older data and place it on a less expensive file system or disk so you can still access it periodically? [00:15:45] So really the question comes down to: what do you mean by archiving? Are you just talking about a backup, or do you still want that data accessible in some way? [00:15:54] How often are you going to access it, and how fast do you need to get to it? Because a lot of the solutions for archiving data mean it takes longer to get access to that data — for example, Glacier on Amazon S3. Now, they mentioned a couple of different ways you can do archiving. It could just be a backup: maybe you take a backup, store it on S3, and can retrieve and restore the data later if you want to. It could also mean using partitioned tables: maybe you partition your data and then periodically detach those partitions, and either delete them or export those tables, maybe using pg_dump, to another file system or to S3 for storage, so you can retrieve that data later if you need it. Or archiving could mean extracting data and putting it into a different format — maybe you want to take historical data, put it in a compressed Parquet file or an Iceberg file, and archive that on storage you can retrieve later. So definitely an interesting discussion, and if you want to learn more, you can listen to the episode here or watch the YouTube video down here.

[00:17:08] Next piece of content: "The OOM Killer Summoning Ritual: Just Increase work_mem". This is from drunkdba.medium.com. If you have Postgres backends that start disappearing — which is a bad sign — and you see that the kernel is reaping Postgres processes, but someone spots an out-of-memory error and says, hey, let's increase work_mem, that's the wrong thing to do. As he says here, that recommendation is frequently backwards for operating system out-of-memory kills, because you're actually instructing Postgres to use more memory, whereas the operating system is fighting against it, trying to kill processes because they're using too much memory. [00:17:50] Keep in mind that work_mem is a per-operation budget for the executing nodes of a query, like sorts or hash tables, before they spill to temporary files. That's a normal operation — the executor spilling to disk when it runs out of memory is fine. The problem is the operating system kernel actually killing processes; that's a big problem. And the number one thing to do in that case is to reduce work_mem, as well as maybe other memory settings like maintenance_work_mem. He talks a little about how the multiplier calculation is done for hashes and sorts, and shows the calculations on why increasing work_mem can actually make the out-of-memory killer kill even more. He shows an example where adjusting it upward slightly takes the memory usage from 38 gigabytes to 76 gigabytes. [00:18:45] The main thing to keep in mind is that you should have a relatively modest-sized work_mem in your overall database configuration. But if you have, as he says here, a reporting job, an ETL job or an admin session that needs more work memory, go ahead and set it per session — there's a quick sketch below. That way you just have the one backend with increased work_mem to do the work it needs to do, and once that connection is finished and the user's commands are done, that level of work_mem no longer applies. And if you set up Postgres yourself, be sure to set the Linux overcommit behavior for Postgres — that's definitely important to do. But check this out if you want to learn more.
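A minimal sketch of that per-session approach — the values here are arbitrary, not taken from the post:

    -- The server-wide default stays modest in postgresql.conf:
    --   work_mem = '16MB'

    -- Give only the heavy reporting or ETL session a bigger budget.
    SET work_mem = '512MB';
    -- ... run the expensive reporting queries here ...

    -- Or scope it to one transaction so it can't outlive the work:
    BEGIN;
    SET LOCAL work_mem = '256MB';
    -- ... the one big sort- or hash-heavy query ...
    COMMIT;

With SET, the value lasts until the session disconnects; with SET LOCAL, it resets automatically at commit or rollback.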
I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links for all the content mentioned, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show as well as a full transcript. Thanks, and I'll see you next week.

Other Episodes

Episode 395

December 07, 2025 00:16:11

16 Times Faster Join Plans | Scaling Postgres 395

In this episode of Scaling Postgres, we discuss a patch that can plan joins 16 times faster, the best way to record text values,...

Episode 105

March 16, 2020 00:19:22

World's Best, Subtransactions, Sharding, Schema Changes | Scaling Postgres 105

In this episode of Scaling Postgres, we discuss how Postgres is the world's best database, performance ramifications of subtransactions, the current state of sharding...

Episode 365

May 11, 2025 00:20:01

Here Comes Async I/O! | Scaling Postgres 365

In this episode of Scaling Postgres, we discuss async I/O introduced in PG 18 Beta 1, new Postgres releases, PgDog performance and innovative ways...
