Half as Fast on Linux 7 | Scaling Postgres 412

Episode 412 | April 12, 2026 | 00:15:31

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss how Postgres may be half as fast on the new Linux 7 kernel, PostgresBench, what is missing in Postgres, and using the WAL to distribute data.

To get the show notes as well as get notified of new episodes, visit: 
https://www.scalingpostgres.com/episodes/412-half-as-fast-on-linux-7/

Want to learn more about Postgres performance? Join my FREE training called Postgres Performance Demystified here: https://www.scalingpostgres.com/courses/postgres-performance-demystified/


Episode Transcript

[00:00:00] I love getting new things. I like getting new technology that's hopefully faster or more efficient. But sometimes on the road to improvement there may be issues with other hardware or software you work with, and unfortunately it looks like something new in Linux might be impacting Postgres performance. [00:00:22] So we'll talk about that today, but I hope you, your friends, family and coworkers continue to do well.

[00:00:29] Our first piece of content is "AWS engineer reports PostgreSQL performance halved by Linux 7.0, but a fix may not be easy". This is from Phoronix.com, and unfortunately the site has a lot of ads and the page tends to rebuild, so hopefully we'll see how easy it is to review this article. Now, there's a proposed reversion of changes to the kernel that would resolve the performance issue, but it looks like it may not get in, because presumably the 7.0 kernel is due to release in maybe a week by the time you see this. Now, it was noticed on a Graviton 4 server. I don't know if it's only for ARM architectures; I don't know if AMD or Intel is affected as well. But this was due to a change in the kernel that restricts the available preemption modes of the kernel, and a patch was proposed to revert it to get performance back to normal, but it says it might not be picked up. So now we have the Linux developers pointing to Postgres: okay, this is a Postgres issue, they need to change their code based upon the changes in Linux 7. [00:01:42] So unfortunately there's a bit of finger pointing going on. We'll just have to keep track of this and see what happens in the next week or two.

[00:01:49] Next piece of content: "PostgresBench: a reproducible benchmark for Postgres services". This is from ClickHouse.com, and there has been the public ClickHouse benchmark; well, now they have published a public Postgres benchmark, and at the base of it is pgbench. You can get to the site by going to postgresbench.clickhouse.com, and as you can see here, amazing performance by ClickHouse on their benchmark site, which you would expect. Some of the other ones mentioned here are Crunchy Bridge, AWS, Aurora, Neon, and there are of course plenty of other services that can be added. [00:02:27] Like I said, they are using pgbench to do the performance check, and of course the question I had is, why was the performance so much better? [00:02:37] Are they using NVMe drives? And that's exactly what it is. When you scroll down to "Why Postgres managed by ClickHouse leads this benchmark", they're basically attributing it to the NVMe-backed storage. [00:02:49] So one would assume PlanetScale and their Postgres Metal solution would be in the running for the top spot in this as well. But if you want to know more about it or visit the site, you can check out this piece of content.

[00:03:03] Next piece of content: there was another episode of Postgres FM last week. This one was on "What's missing in Postgres", and Nikolay and Michael were joined by Bruce Momjian to talk about his presentation, What's Missing in Postgres. Now, I think I linked to this in a Scaling Postgres episode a few weeks ago, maybe more than a few weeks ago, but they covered this topic in particular and talked about the presentation. [00:03:28] And Bruce's idea for this was: what are the big features that they've just started working on, or that aren't even in the ballpark of what they're working on, that people are potentially interested in? Not so much particular features or gaps in a given area.
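Going back to the PostgresBench item for a moment, here is a minimal sketch of what a pgbench-based benchmark actually runs. The scale factor, client counts, and database name below are placeholders, not the settings PostgresBench itself uses:

```sh
# Create the standard pgbench schema and fill it at scale factor 50 (placeholder).
pgbench -i -s 50 benchdb

# Run the default TPC-B-like workload: 16 clients, 4 worker threads, 60 seconds.
pgbench -c 16 -j 4 -T 60 benchdb
```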
Like something is maybe 90% feature complete; he's not talking about the 10% that still needs to be done, necessarily. So, looking at the presentation to see what they talked about, I'm looking at slide 32 here, where he's basically saying what the current status of things is, and he listed 12 things that are missing by his measure, and you'll see the different colors here. Green is something where there is in-progress work for core Postgres, or maybe in some of the contrib module extensions. [00:04:20] Red is no progress on anything, and I guess no plans to in the near future. And blue means it is available as an external extension or a fork of Postgres.

So the first one, which is actually numbered 2 here, is cluster file encryption, or basically TDE, transparent data encryption. [00:04:37] He says they are working on it, but you can get forks of Postgres that have that built in. The next is 64-bit transaction IDs. That is something they're working on, but like anything, the transition is taking place over multiple versions. But if you want 64-bit transaction IDs today, you can use a fork of Postgres. The next thing he mentioned is optimizer hints, which are coming in Postgres 19, at least a first iteration, but you can also get extensions that give you hints as well. The next is columnar storage. That is something they're working on, although I'm not aware of any active work. But of course there are extensions that help you do columnar storage, and other dedicated database solutions that do that. Next is global indexes, which I didn't even know they were potentially working on, but he says that apparently they are. [00:05:33] These are indexes that cross partitioned tables, so you could actually have a unique index across many different partitions. The next is direct I/O, and I know we just had some work with regard to it, but they're continuing to do more, and we'll see where that goes in the future. The next is server-side threading, and I don't know how actively this is being worked on, because the way Bruce was talking, this was something where sometimes the gains aren't worth all the additional complexity, but maybe there are certain areas where they could do more threading. The next one is an internal connection pooler; that's not even on the table at this point, so we're still basically using things like PgBouncer, Pgpool, or newer arrivals like PgDog, maybe.

[00:06:19] Next he covers multi-host areas and something like Oracle RAC, where basically you're decoupling storage from your compute, which is the same thing Aurora does, the same thing that a Google solution does (the name is escaping me right now), but no one is really working on that at all in Postgres core, being able to separate compute from storage. [00:06:44] The next is multi-master replication; this is something they're working on, apparently, but I'm not quite sure what movement has been made in this area, though you can always get that through a fork. Next is logical replication of DDL, and I'm actually saddened by this, because I think it would be a really cool feature, but it's not being worked on at all, although apparently you can get it through a fork. And lastly is sharding, which is something they're working on, but I think he also mentioned the work has kind of stagnated and has not moved forward that much, because the machines you can get now are so large that it has diminished the need for sharding.
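On the global indexes item above, here is a small sketch of the limitation as it stands today, with a made-up table for illustration: a unique constraint on a partitioned table has to include the partition key, because there is no global index spanning all partitions.

```sql
-- A range-partitioned table.
CREATE TABLE events (
    id         bigint,
    created_at timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

-- Fails: a unique constraint on a partitioned table must include all partitioning columns.
ALTER TABLE events ADD CONSTRAINT events_id_uniq UNIQUE (id);

-- Works, because the partition key is part of the constraint.
ALTER TABLE events ADD CONSTRAINT events_id_date_uniq UNIQUE (id, created_at);
```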
But maybe it means this area is ripe for external solutions, which he doesn't mention here, like, for example, PgDog or Multigres, which the creator of Vitess is now designing for Postgres. [00:07:36] But if you want to learn more, you can listen to the episode or watch the YouTube video down here.

[00:07:42] Next piece of content: "WAL as the data distribution layer". This is from richyan.com, and he's talking about people who have their application and their data analysts needing access to the data, and how they have an ETL process set up that takes hours to finish, and it can break and cause issues. [00:08:01] And he says it makes them wonder if they need all that plumbing to get a snapshot of their live data set. And back at somewhere he worked, they did something a little bit different. [00:08:12] So he said this is what the common pattern is: first, you allow people to query the primary. [00:08:19] Now, this is a bad idea on a number of levels, because you could have some long analytic query hang up write operations, lock something for too long, or cause increased bloat. [00:08:33] Basically, you're potentially fighting against your existing application users. [00:08:37] Another solution is setting up a streaming replica. [00:08:40] But if you do that, those long-running analytical queries are probably going to create replay lag, so your replica is not following the primary as closely as it could. You could hit vacuum conflicts, get canceled queries, and depending on how you set things up, I/O contention can affect the primary upstream. The next solution is doing nightly snapshots or builds. These could be pg_dumps, or it could be just exporting data to CSV and then loading it into another database system. [00:09:09] But you do have more stale data, and again, it tends to be a little bit of a brittle process. [00:09:16] But he says, you know, why don't you just use log shipping? So use the WAL files. Basically, this is like streaming replication without the stream, and there's no impact on the primary from doing it. So basically you set up a replica, but you don't do streaming replication. You transfer the WAL files from the primary to a central location, which could be an S3 bucket or some other volume, and then the replica, or replicas, it doesn't matter how many, retrieve those WAL files and apply them to their database. And this gives you close to real-time data, and they are replicas of the primary database. [00:09:55] But importantly, there's no impact to the primary from all of these replicas operating, because you're just transferring files back and forth. And he even set up a simple demo showing this pattern. So if you're interested in that, you can check out this blog post.

[00:10:11] Next piece of content: "Don't let AI touch your production database". This is from boringsql.com, and amen to that. [00:10:20] He says the reason why you don't want to do that is because AIs have wiped production databases. But he says really the AI agent just needs some specific pieces of data in order to give you recommendations. [00:10:32] Number one is the schema: what are the tables and indexes it's dealing with, and the columns and data types. And the other thing you need is the statistics. So if you have those things, you don't need to connect your MCP servers up to your production database; you can connect to a smaller database that has up-to-date statistics and the full schema in order to analyze it.
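Circling back to the log-shipping pattern described above, here is a minimal configuration sketch, not the exact setup from the blog post. The archive path is a placeholder; an S3 bucket with a tool that pushes and pulls WAL segments works the same way.

```ini
# Primary (postgresql.conf): ship each completed WAL segment to a shared location.
archive_mode = on
archive_command = 'cp %p /mnt/wal-archive/%f'

# Replica (postgresql.conf): restored from a base backup, with a standby.signal
# file present but no primary_conninfo, so it never connects to the primary.
# It simply replays whatever WAL files appear in the archive.
restore_command = 'cp /mnt/wal-archive/%f %p'
```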
So he actually set up another tool called dryrun, which is a Postgres schema intelligence tool. It ships as a CLI and as an MCP server, and it works from a JSON snapshot of your schema. It basically captures all of your objects, basically the pg_catalog, as well as per-table statistics from pg_statistic, as well as runtime counters such as pg_class reltuples, seq_scan, idx_scan, etc. [00:11:26] So now the LLMs have the information they need to do an analysis of something, and they don't have access to your actual production database nor all the data that it contains. So he goes through an example of how to use this. So if you're interested, definitely check out this piece of content.

[00:11:44] Next piece of content: [00:11:46] "What is collation and why is my data corrupt?" This is from pgedge.com, and this basically goes back to the issue that happened in 2018, where the GNU C library was updated for some operating system instances, which caused havoc with Postgres indexes, causing unique indexes to become invalid and data to not show up, because the collation had changed in the new version. [00:12:14] So because of that, he's basically advocating that when you set up Postgres, you use the new built-in C collation that's available. [00:12:22] And if you need linguistic sorting, go ahead and do that per column or on a per-index basis. He basically gives some recommendations on how to handle that and keep your collation correct for your database. But another solution that avoids this entirely is, when you're doing an upgrade or changing the operating system version of your database, actually do it as a logical replication upgrade, because that creates all the indexes anew on whatever operating system version you're using. But if you want to learn more, definitely check out this piece of content.

[00:12:55] Next piece of content: "PAX: the storage engine strikes back". This is from mydbanotebook.org, and she covered PAX in a blog post a month or two ago, talking about how it gives you the cache performance you're looking for, in that it's highly cache efficient. The reason why is that the storage is kind of columnar-like. So she was actually doing a deeper dive into this and realized that some of the ways it handles things, based upon the original paper, won't work that well for Postgres, but she discusses a possible way around it to be able to achieve this. But again, this seems more like a long-term project to try to achieve columnar-like performance, or at least space efficiency, with Postgres. So check this out if you're interested.

[00:13:43] Next piece of content: "pg_column_size: what you see is not what you get". This is again from mydbanotebook.org, and she's talking about how pg_column_size is great for giving you the size of a column if it's in the primary heap table or if it's compressed. The problem comes in when TOAST kicks in; then it doesn't give you an accurate count. So ideally this should be patched to properly report that, but we'll have to see what happens with that in the future.

Next piece of content: "Schemas in PostgreSQL and Oracle: what is the difference?" This is from cybertec-postgresql.com, and he covers the difference between schemas in Oracle and PostgreSQL. So if you're migrating, it's definitely something to take a look at.

[00:14:28] Next piece of content: "A conversation with Paul Masurel, creator of Tantivy". This is from paradedb.com, and again, this is the person who built Tantivy, which is a Rust implementation of the Lucene engine that powers Elasticsearch and Solr.
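As a small sketch of the per-column and per-index collation approach recommended in the collation article above (the table and the "en-x-icu" collation name are just examples, and the ICU collation is only available in a Postgres build with ICU support):

```sql
-- Keep the stored sort order independent of the OS libc by using the "C" collation.
CREATE TABLE users (
    id   bigint PRIMARY KEY,
    name text COLLATE "C"
);

-- This index sorts in "C" (byte) order, so a glibc update can't silently break it.
CREATE INDEX users_name_idx ON users (name);

-- Request linguistic ordering only where a query actually needs it.
SELECT name FROM users ORDER BY name COLLATE "en-x-icu";
```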
[00:14:46] So this is the tool that ParadeDB wraps to do their full-text searching. So if you're interested in this, you can definitely check it out. And the last piece of content is that a new book is available, called Lift: Scaling PostgreSQL Beyond Query Optimization. [00:15:03] It is available on leanpub.com, and it is in a 100% complete state. So if you're interested in a book, you can check it out. I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content mentioned, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show, as well as a full transcript. Thanks. I'll see you next week.
