Episode Transcript
[00:00:00] Normally when you have a performance problem with Postgres, the first thing you do is probably look for what type of index to add, because generally that's the fastest way to resolve a performance issue, and maybe that covers 90 to 95% of cases. Where things get interesting is when that doesn't work, and we're going to talk a little bit about that with the first post this week. But I hope you, your friends, family and co-workers continue to do well. Our first piece of content is "Absorbing the Load". This is from pgedge.com, and Shaun mentioned that he was looking at the pgsql-performance mailing list and noticed a question that popped up with regard to a top-end query that had, quote, gone wrong.
[00:00:49] So it had a join and some CTEs, a dozen tables, a few million rows, and it required two to three seconds of execution time, which he says, you know, it's not ideal, but it's not a showstopper. But the problem was this was being called a lot. So it had a high concurrency problem.
[00:01:08] So when you have 40 of these queries running at the same time, it saturated all the CPU threads. And like I mentioned in the intro, in this case a new index was sufficient to resolve it.
[00:01:18] But he theorycrafts: okay, what if you couldn't get your way out of this by adding an optimal index? What do you do?
[00:01:26] So he says, suppose you have a query that takes 2 seconds after being fully optimized, and there are 4,000 users that each need that result once per page load, spread across a 60-second window.
[00:01:39] So that's something you can't really easily resolve. But he comes up with some solutions for this that don't entail adding an index. The first solution he discusses is materialized views. So you could create a materialized view which is an actual table that represents that query. So querying that table should be much faster and you can even add indexes on top of it.
[00:02:04] So it's a way of creating a summary table with specifically designed indexes.
[00:02:10] Now the problem with this is that it freezes the state of the data at the point at which it's built. So if you want fresh data, you continuously have to refresh it, and of course that can take time to do. So it's not a good solution if you need real-time data.
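As a quick sketch of that approach (the table and column names here are hypothetical):

```sql
-- Build a summary table for a hypothetical expensive aggregation:
CREATE MATERIALIZED VIEW order_totals AS
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id;

-- You can index the materialized view like any other table:
CREATE UNIQUE INDEX ON order_totals (customer_id);

-- Refreshing rebuilds the snapshot; CONCURRENTLY avoids blocking
-- readers, but it requires a unique index like the one above:
REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals;
```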
[00:02:28] So another similar solution is the pg_ivm extension, which is the incremental view maintenance extension. Basically it uses a materialized view, but it applies triggers to the source tables that keep that view up to date. So you actually take a write performance hit with the triggers, but you get pretty much real-time data out of it. As he says, quote, we've essentially traded faster reads for slightly slower writes. In addition, currently pg_ivm does not support ORDER BY, LIMIT, window functions, UNION, or subqueries in the view definition. So clearly it has some limitations, especially the ORDER BY and LIMIT, because many queries I've dealt with of course use those. Okay, well what if your writes are happening too fast?
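If I remember its API correctly, pg_ivm builds these with a function call rather than CREATE MATERIALIZED VIEW. Something like this, where the table is hypothetical, and you should check the extension's docs for the exact signature:

```sql
CREATE EXTENSION pg_ivm;

-- create_immv() takes the view name and its defining query;
-- triggers added to the source table then keep it current:
SELECT create_immv('order_totals_immv',
  'SELECT customer_id, sum(amount) AS total
   FROM orders
   GROUP BY customer_id');
```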
[00:03:21] Another solution is can you cache things?
[00:03:25] So he says, quote, here the TTL of your cache only needs to be greater than or equal to the query execution time, which is essentially saying, as he puts it, quote, a query with a 2 second runtime has a minimum resolution of 2 seconds. So could you run the query with a background worker or background process, say every 5 seconds, and cache that result? When an incoming request comes in from the application, it uses the cached version. So you've separated the client requests coming in from always hitting the database. You have a backend process that hits it just once every five seconds, refreshing the data, and the clients get data that is at most five seconds out of date. And he says, quote, clients never touch the database directly, they only read from the cache. And he shows a Python example of that here.
[00:04:19] So the next logical question is, of course, where does this cache live? It can live in the in-process memory of every application server, but then you're still going to get a query per application server. That may be fine depending upon how fast the query runs, but if it still doesn't meet your needs, then you can go to a common cache, and now we're probably talking about Redis. So you can store the results of this query in a common cache such as Redis.
[00:04:51] And yes, that means not using Postgres, of course. He mentions how you could probably handle this solution, but another way to solve it is, as he says here, pushing it to the edge. So if the queries you are running have common public data, meaning they don't need user-specific data, then you can cache the result on a CDN like Cloudflare, CloudFront, Fastly, etc.
[00:05:17] And you just set the appropriate Cache-Control header for that page. So if you have a common public dashboard, this could be a great solution. And then he has a great flowchart here on how you could speed things up. Is your data public? Then you could probably use a CDN. If not, does the data change frequently? If it doesn't, you could probably get by with a materialized view that's refreshed concurrently. If it does change frequently:
[00:05:43] Okay, do you have high write throughput?
[00:05:46] If you don't, you can use pg_ivm, incremental view maintenance. If you do, you probably need Redis or some sort of caching of the results.
[00:05:56] And he says in reality most high scale systems use multiple layers simultaneously. So he gives a convenient chart here to assess which type of caching layer would be best for you.
[00:06:10] And I will mention the other solution that I've used when it's appropriate, particularly if you have append-only data: you can maintain a history of all of that data in a separate table, and then only query the raw tables in real time for, say, the last day or two. Anything older than that, you consult the historical data table for, and merging those results together can give you fast, effectively real-time data. So that's another solution that can be done.
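A rough sketch of that pattern, with all table and column names hypothetical: keep a pre-aggregated history table maintained by, say, a nightly job, and union it with a live aggregation over only the recent raw rows.

```sql
-- events_history(day, metric, value) holds pre-aggregated older data.
-- events_raw(metric, created_at) holds the raw, append-only rows.

SELECT metric, sum(value) AS value
FROM (
  -- cheap: small, indexed summary table for everything older
  SELECT metric, value
  FROM events_history
  WHERE day < current_date - 1

  UNION ALL

  -- real time: raw rows, but only the last day or so
  SELECT metric, count(*) AS value
  FROM events_raw
  WHERE created_at >= current_date - 1
  GROUP BY metric
) combined
GROUP BY metric;
```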
[00:06:43] But I thought this was a great blog post. Definitely encourage you to check it out. Next piece of content: "pg_textsearch 1.0: how we built a BM25 search engine on Postgres pages". This is from tigerdata.com, and I had mentioned an earlier version of this. Well, now it is released and apparently ready for production. So this is an extension you can install, and it uses BM25 indexes, which are essentially like Elasticsearch's indexes, and they have a lot of benefits mentioned here compared to the standard text search built into Postgres for doing lexical searching, particularly with regard to ranking.
[00:07:26] And here's some of the updates in this release.
[00:07:29] So this is ready for production. They say they used to have a memory-only architecture for the indexes, but now they're using disk-based segments. They're using a Block-Max WAND optimization for fast top-k queries. They're using posting list compression with SIMD-accelerated decoding, resulting in 41% smaller indexes. They have parallel index builds enabled. And they do have some performance results comparing themselves to ParadeDB, looking at performance and concurrency, which we'll look at here in a second. So they do go into a lot of detail about the architecture that I'm not going to cover in the interest of time, but they do describe a lot of it. This was all written in C, and it seems like it might be a little bit more of a native Postgres solution compared to ParadeDB, because ParadeDB wraps an existing search engine with their extension, whereas this is written from the ground up in C. And we'll take a look at some of the benchmarks here.
[00:08:36] So this particular system size, it's an eight core 16 thread system, 123 gigabytes of RAM with an NVMe SSD.
[00:08:44] So the index size is 17 gigabytes compared to ParadeDB's 23 gigabytes. The build time is about twice as long for pg_textsearch compared to ParadeDB.
[00:08:56] But maybe it's doing more compaction, which is why it's smaller, and that should lead to better performance because there's less you have to read. And based on how many lexemes you search, you can get a dramatic speedup with pg_textsearch versus ParadeDB.
[00:09:14] So with one lexeme it's as high as almost 12 times faster for pg_textsearch, but it drops down to a one-to-one ratio when you're at seven lexemes and higher.
[00:09:25] So they say, quote, pg_textsearch is fastest on short queries, but the systems converge at longer lengths. And in terms of concurrent throughput, it was about eight times higher for pg_textsearch versus ParadeDB. But again, you always need to be cautious about benchmarks. Still, it is great to now have two solutions that offer BM25 index capability with Postgres. And they do mention there are some limitations with pg_textsearch. You can't do phrase queries, but there's a workaround they demonstrate here, using an ILIKE predicate wrapped around the BM25 search. Right now they only support OR query semantics; they're planning to add AND and NOT operators.
[00:10:16] There's no highlighting or snippet generation, and there's no expression indexing; if you want something like that, they say you can use a generated column to concatenate multiple fields. There's no fuzzy matching or typo tolerance.
[00:10:32] I don't know if that will be coming in the future or not. And of course you have to add it to shared_preload_libraries.
[00:10:38] So again, it's great to have another BM25 text searching solution in Postgres, and it seems like it's built more natively to me compared to ParadeDB. And this is freely available as open source, so if you're interested in it, go ahead and give it a try.
[00:10:58] Next piece of content: "Good CTE, Bad CTE". This is from boringsql.com, and this is talking all about common table expressions, and a lot of it is about how Postgres decides to materialize or not materialize them. He says, you know, before Postgres 12, all CTEs were materialized, meaning Postgres first computed the CTE into a little mini table and then ran the rest of the query against it. But since Postgres 12 this was rewritten, and CTEs are inlined by default where possible. So it's essentially exactly like a subquery; there is no materialization.
[00:11:37] However, you can use the MATERIALIZED keyword if you want that old behavior. But now he goes over a bunch of cases to determine when a CTE gets materialized. So the first example: when you have a single reference and no side effects, the CTE is always inlined. What about multiple references to a CTE?
[00:12:00] So here, summary is referenced twice, and it is materialized.
[00:12:04] So that makes sense. A recursive CTE is always materialized.
[00:12:09] A data-modifying CTE, where you're actually, say, deleting or updating data, is always materialized.
[00:12:16] Those with volatile functions like random() are materialized. Those with stable functions like now() can be inlined. And like I mentioned, you can force a CTE to be materialized just by writing AS MATERIALIZED. If you're doing FOR UPDATE or FOR SHARE, that gets materialized too.
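To make those rules concrete, here's a minimal sketch of forcing the behavior either way (table and column names hypothetical):

```sql
-- Force materialization: the CTE is computed once into its own
-- result set, and the outer WHERE is not pushed down into it:
WITH summary AS MATERIALIZED (
  SELECT customer_id, sum(amount) AS total
  FROM orders GROUP BY customer_id
)
SELECT * FROM summary WHERE customer_id = 42;

-- Force inlining even with multiple references (Postgres 12+),
-- which effectively runs the subquery twice:
WITH summary AS NOT MATERIALIZED (
  SELECT customer_id, sum(amount) AS total
  FROM orders GROUP BY customer_id
)
SELECT *
FROM summary s1
JOIN summary s2 USING (customer_id);
```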
[00:12:38] And you see there's a chart here covering when things get materialized or inlined. But the thing about materialization is that the outer query doesn't really get access to any statistics about the CTE's result, although statistics propagation does happen in Postgres 17 and higher. Now, this blog post goes on to talk about CTEs in a lot more detail, so if you're interested in learning more, definitely encourage you to check it out. Next piece of content: "The Hidden Behavior of plan_cache_mode". This is from richyen.com, and he's talking about working with prepared statements. When a backend starts, the first five executions are done with a custom plan.
[00:13:18] That means they develop the plan based upon the actual parameters for the first five executions of that query.
[00:13:26] So that's most optimal. But it requires planning overhead. Every time it's done.
[00:13:31] On the sixth execution you get a generic plan and it compares the average cost of the custom plans that were done against the estimated cost of the generic plan. And if the generic plan is cheaper or equal, the planner switches to it permanently for that session. Otherwise it keeps the custom.
[00:13:50] And he said this really isn't a problem unless you have a lot of data skew. So he shows an example here where you just have a yes/no flag and there are many more Y values than N values. So when you force a custom plan and do an EXPLAIN looking up 'Y', you're going to get a sequential scan, because that's the fastest way to look up the data, rather than doing an index scan and then hitting the table, since there are so many matching values.
[00:14:19] Whereas if you force a generic plan and again try to look up 'Y', now it's going to do an index scan first, because the planner believes the cost is lower than a sequential scan. But of course it's not, because it has to look up all those values in the index and then fetch them from the heap. He then shows that the first five times it runs you get custom plans, on the sixth you get the generic plan, and that plan stays locked in until you deallocate the prepared statement or the session ends. So if you want to learn more about that, definitely check out this blog post.
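You can watch this behavior, or override it, from psql. A sketch, with the table and values hypothetical:

```sql
PREPARE flag_count AS
  SELECT count(*) FROM accounts WHERE flag = $1;

-- Executions 1 through 5 get custom plans; on the sixth, the planner
-- may switch to a cached generic plan for the rest of the session:
EXECUTE flag_count('Y');

-- EXPLAIN EXECUTE shows which plan you actually got:
EXPLAIN EXECUTE flag_count('Y');

-- If the generic plan is wrong for your skewed data, you can
-- force custom plans for the session:
SET plan_cache_mode = force_custom_plan;  -- or force_generic_plan, auto
```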
[00:14:54] Next piece of content: there was another episode of Postgres FM last week. This one was on long-running transactions, and they talked about how, in general, databases are going to have some long-running transactions; the key is how harmful they can be.
[00:15:09] Because the problem with a long running transaction is that remember that locks only get released at the end of the transaction.
[00:15:17] So as long as those locks are in place, you can't free up transaction IDs, you can't vacuum those tuples, etc. And with MVCC and the transaction IDs, even if this just happens on one table, it can impact the whole system, because those transaction IDs can't advance and be freed up for future transactions. They also mentioned cases where maybe a pg_dump was going on and then someone tried to run a migration, which caused tons of locking issues to occur.
[00:15:48] In terms of mitigating long-running transactions, the key thing to do is set your timeouts: statement_timeout; transaction_timeout, if you're on a recent version of Postgres; and especially idle_in_transaction_session_timeout. And lastly, to avoid excessive locking or lock queues, use a lock_timeout, so that if a statement waits on a lock for too long, it basically sacrifices itself.
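As a sketch, those settings look like this. The values are illustrative, not recommendations, and transaction_timeout requires Postgres 17 or later:

```sql
SET statement_timeout = '30s';    -- cancel any single statement over 30s
SET transaction_timeout = '5min'; -- limit a whole transaction (PG 17+)
SET idle_in_transaction_session_timeout = '60s';  -- terminate sessions
                                                  -- sitting idle mid-transaction
SET lock_timeout = '2s';  -- give up waiting on a lock rather than
                          -- blocking a queue of other statements
```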
[00:16:16] But there was definitely a lot covered in this episode. If you want to learn more, definitely encourage you to listen to the episode here or watch the YouTube video down here.
[00:16:23] Next piece of content: "pg_service.conf: the spell your team forgot to learn". This is from mydbanotebook.org, and they're talking about the pg_service.conf file, which I had only learned about maybe a year or two ago. It's basically an INI-format file where you have an alias for your service, and then you put all the connection parameters within it, except for the password. Then you can connect to the service by running psql service= and whatever alias you set up. Generally this would be the name of the service, as opposed to prod or staging, but you could clearly set it up that way if you want.
[00:17:05] And she mentions it lives per user, so this is generally the ~/.pg_service.conf file for a particular user. This is how I've generally used it, although you can set a system-wide one based upon your system config directory for Postgres. A lot of times I also use these on application servers, so I'm just using whatever the application user is and their pg_service file. And you can change the name and location of this file; it's wherever the PGSERVICEFILE environment variable points.
[00:17:36] She actually has set up a configuration where she has dedicated service files for production and staging. That way she just connects to whatever the service name is, and as long as the environment file is accurate, she's going to connect to staging or production. And she wrapped it all in a script she can just run to connect to a given service.
[00:18:01] And you don't want to put the passwords in here; instead, you use this together with a .pgpass file, so you put the passwords for these connections in there.
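A sketch of what the pair of files might look like, with host names and credentials entirely hypothetical:

```ini
# ~/.pg_service.conf
[staging]
host=staging-db.internal
port=5432
dbname=app
user=app_user

# ~/.pgpass (separate file, chmod 0600), one credential per line in the
# format hostname:port:database:username:password, for example:
#   staging-db.internal:5432:app:app_user:s3cret
```

With that in place, connecting is just `psql service=staging`, and libpq-based applications can use `service=staging` in their connection string as well.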
[00:18:10] So if you want to learn more, definitely check out this blog post.
[00:18:13] Next piece of content: "Waiting for PostgreSQL 19 – Add UPDATE/DELETE FOR PORTION OF". This is from depesz.com, and this is about temporal tables. So this is where you're essentially storing history in a table.
[00:18:28] So two months ago the value was 6 for a unique record.
[00:18:33] Now the value is 8 and maybe in the future it will be 10. But it's still a unique record. But that history is all stored in the same table.
[00:18:43] And this enhancement adds an easier way to do those updates and deletes.
[00:18:50] So he shows an example of how you had to do it before and how you can do it now. It's a simple update statement, whereas before there was a fair amount of manipulation you had to do, definitely a head-scratcher, and again a simple delete statement as well. So if you want to learn more, definitely check out this blog post.
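Based on how the patch is described there, the new syntax looks something like the following. The table and range names are hypothetical, and since this targets the unreleased Postgres 19, the exact syntax could still change:

```sql
-- A temporal table with a validity range:
-- prices(item_id int, valid_at daterange, price numeric)

-- Update only part of the time range; a row spanning the boundary
-- gets split so history outside the portion is preserved:
UPDATE prices
  FOR PORTION OF valid_at FROM '2025-01-01' TO '2025-06-30'
  SET price = 8
  WHERE item_id = 1;

-- Delete only a portion of the history the same way:
DELETE FROM prices
  FOR PORTION OF valid_at FROM '2025-01-01' TO '2025-06-30'
  WHERE item_id = 1;
```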
[00:19:10] Next piece of content: "Waiting for PostgreSQL 19 – JSON format for COPY TO". This is from depesz.com, and this is awesome: potentially we will now be able to export data not just as text or CSV, but as JSON. So you could send this to a programming language and have it parse the JSON, which is generally much easier to do than parsing text or potentially CSV files. So check this out if you're interested. I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content mentioned, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show, as well as a full transcript. Thanks, and I'll see you next week.