PgBouncer, PgCat, Supavisor Fight!!! | Scaling Postgres 303

Episode 303 February 18, 2024 00:19:55
Scaling Postgres

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss new Postgres releases, performance comparisons between PgBouncer, PgCat and Supavisor, a new extension pg_analytics, and new planner capabilities in PG16.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/303-pgbouncer-pgcat-supavisor-fight/

 


Episode Transcript

[00:00:00] Before we start the show this week, I wanted to let you know that there is one more webinar time available for "Postgres Performance Demystified: Learn a Simple Framework to Achieve Thousandfold Improvements." This last time is on February 21, which is a Wednesday, at 1:00 PM Eastern Standard Time. It's entirely free, so I welcome everyone to attend. And in case it was confusing why there were three times, it's the same content each time; the three sessions are just three opportunities to see it. So if you thought you had to attend all three, you don't, you just need one. This is your last opportunity to see it if you want to. But I hope you, your friends, family, and coworkers continue to do well.

Our first piece of content is "PostgreSQL 16.2, 15.6, 14.11, 13.14, and 12.18 Released!" These releases fix a number of bugs and include some improvements, mainly for Postgres 16, although some of the issues also affect previous versions. There are two significant things you should look at. One is the security issue that was resolved, CVE-2024-0985: a non-owner REFRESH MATERIALIZED VIEW CONCURRENTLY can execute arbitrary SQL. Basically, as it says here, if a materialized view's owner could persuade a superuser or other high-privileged user to perform a concurrent refresh on their view, the view's owner could execute arbitrary code. So if you think you're vulnerable to that, you definitely want to accelerate applying these releases. It looks like it only affects versions 12 through 15, not 16. The other important point to note is that there was a fix to GIN indexes, where they say, quote, "you may need to reindex after updating to this release," and they refer you to the release notes for more information. Looking at the release notes, one of the bugs that was fixed could result in corruption of GIN indexes during concurrent updates. So the reason you may need to reindex is if you suspect this type of corruption. Personally, to be on the safe side, I would reindex my GIN indexes after applying this update.
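If you do want to rebuild every GIN index in a database after applying the update, here is a minimal sketch of one way to do it. This is my own illustration, not something from the release announcement: it assumes psycopg 3, a connection that owns the indexes (or is superuser), and that you're comfortable with REINDEX ... CONCURRENTLY, which cannot run inside a transaction block, hence autocommit.

```python
# Hypothetical helper: find every GIN index and rebuild it after the minor update.
import psycopg

# Placeholder DSN; point this at the database you want to reindex.
with psycopg.connect("dbname=mydb", autocommit=True) as conn:
    gin_indexes = conn.execute("""
        SELECT n.nspname, c.relname
        FROM pg_class c
        JOIN pg_am am ON am.oid = c.relam
        JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE c.relkind = 'i' AND am.amname = 'gin'
    """).fetchall()

    for schema, index in gin_indexes:
        # CONCURRENTLY avoids blocking writes but is slower; drop the keyword if
        # you would rather do a fast, blocking rebuild in a maintenance window.
        conn.execute(f'REINDEX INDEX CONCURRENTLY "{schema}"."{index}"')
        print(f"reindexed {schema}.{index}")
```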
But let's get on to the main event, which gave this episode its title, and that is "Benchmarking PostgreSQL connection poolers: PgBouncer, PgCat and Supavisor." This is from tembo.io, and they took PgBouncer, which is probably the reigning champ among poolers, at least in terms of market share, and compared its performance against PgCat and Supavisor, two up-and-coming connection poolers. Now, I should say this fight, this comparison, is mostly from a performance perspective. PgCat and Supavisor are adding a lot of features, and the feature sets even vary between PgCat and Supavisor, so that may explain why performance was worse in one compared to the other. Some things to note: PgBouncer is single-threaded and written in C, PgCat is written in Rust and is multi-threaded, and Supavisor is written in Elixir and is multi-threaded. The post also discussed the setup: PgBouncer is relatively easy to set up, and PgCat seemed relatively easy as well, but Supavisor was a bit of a challenge with the different moving parts they needed to get running.

But let's take a look at the performance results. The dark blue or purple line is baseline Postgres, not reconnecting during the performance test. So it didn't disconnect and reconnect; it just ran the benchmarking workload without taking a hit from tearing down or starting new connections. As you can see, this had the lowest latency. The red line is baseline reconnect: this is PostgreSQL without a pooler, connecting and disconnecting for the benchmark run, and it has the worst latency, which is why you would generally want to use a pooler. Then the purple or pink line is PgBouncer, the blue is PgCat, and the yellow is Supavisor. At a relatively low number of clients, ten to 25, PgBouncer still comes out on top with the lowest latency. But PgCat starts to win at 50, 75, and 100 clients, and it's super, super close to Postgres without any pooler at all, where they're not connecting and disconnecting. So that's pretty amazing. Unfortunately, Supavisor is having a lot of latency issues; it seems to be more than double PgCat at 100 clients.

Now let's take a look at the throughput, and it's a similar story. The highest throughput is where you're connected to Postgres with no pooler and you're not connecting and disconnecting during the benchmark run. The most atrocious throughput is not using a pooler but connecting and disconnecting each time; as you can see, the red line is on the bottom for all the different client counts. Again, at low client counts PgBouncer has higher throughput, but once you get to 50 and higher, PgCat takes over, and again it has more than double the throughput of Supavisor at 100 clients.

Then he moved on to testing larger connection counts, up to 2,500 connections, and here the latency story stays pretty much the same: PgCat comes out on top with the lowest latency ramp, while Supavisor has the highest one. Looking at the average throughput across these client counts, PgBouncer went up to around 45,000 transactions per second, but then dropped as the number of clients increased, to maybe 28,000, and continued to drop from there. Supavisor was worse: it peaked at maybe 21,000 and then declined to around 20,000, but stayed relatively consistent. Whereas PgCat was amazing, reaching around 59,000 transactions per second and holding pretty consistently until it got up to maybe 2,500 client connections. Then he showed the statistical variation in query latencies. PgBouncer was relatively consistent, PgCat was relatively consistent, but there were huge spikes with Supavisor, so again, there seems to be something quite amiss there. Now, as you look at these benchmark results, you always have to take them with a grain of salt and even ask who's doing the benchmarking runs. But Tembo, to my knowledge, shouldn't prefer PgCat over Supavisor or PgBouncer, because I'm not sure they had a hand in any of them. If you happen to know otherwise, please post it in the comments.

The next chart takes a look at CPU utilization, and PgBouncer hit 100% CPU usage at 50 clients. The chart goes from zero to 800 percent because there are eight cores in the machine, so PgBouncer was using all of one core, because it's single-threaded. Supavisor basically pegged seven or a little more of the CPUs at only 100 clients, which is pretty crazy for that client count. Whereas the chart for PgCat shows it going up to six CPU cores at 100%, but it does that at 1,250 clients, over ten times more clients than Supavisor. So this may make it ten to twenty times more efficient; I'm not sure, but that's pretty crazy.

Then he shows latency with 1,250 clients, and again there's less consistency with Supavisor, but PgCat again comes out on top with the lowest latency at this high connection count, and it gets even worse at the p99. So I thought this was fascinating, and if it holds true, it looks like PgCat is going to be my new go-to pooler, given its performance, but also because, from what they said here, the installation and setup are relatively easy. I haven't set it up yet, but I'm definitely going to have to give it a try now. And are PgBouncer's days numbered at some point? Yes, they've been adding great new features, like support for prepared statements with transaction pooling, but the thing that would be really hard to change is its single-threaded nature. Would that require an almost ground-up rewrite to support multi-threading? You could do it, but it seems like it would be hard, whereas multi-threading comes out of the box with PgCat and even Supavisor. But definitely check out this post if you're interested.
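If you want an intuitive feel for the connection overhead these poolers exist to absorb, which is what the "baseline reconnect" line is measuring, here is a tiny sketch you could run against your own server. To be clear, this is not the benchmark from the Tembo post; it assumes psycopg 3, and the DSN and iteration count are placeholders of mine.

```python
# Rough illustration of per-connection overhead: open a new connection for every
# query versus reusing one connection, which is roughly what a pooler gives you.
import time
import psycopg

DSN = "host=localhost dbname=postgres"  # placeholder; or point at a pooler's port
N = 200

def reconnect_per_query() -> float:
    start = time.perf_counter()
    for _ in range(N):
        with psycopg.connect(DSN) as conn:
            conn.execute("SELECT 1")
    return (time.perf_counter() - start) / N

def reuse_one_connection() -> float:
    with psycopg.connect(DSN) as conn:
        start = time.perf_counter()
        for _ in range(N):
            conn.execute("SELECT 1")
        return (time.perf_counter() - start) / N

print(f"reconnect per query: {reconnect_per_query() * 1000:.2f} ms")
print(f"reuse connection:    {reuse_one_connection() * 1000:.2f} ms")
```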
[00:08:46] Next piece of content: "pg_analytics: Transforming Postgres into a Fast OLAP Database." This is from blog.paradedb.com, and it's a new extension they're calling pg_analytics, which basically gives you a column-oriented analytics store inside Postgres. They show a ClickBench score, and the top entry is ClickHouse, of course, using a single Parquet file, but ParadeDB with this new extension is the next fastest, followed by Elasticsearch, and even a tuned Postgres is down there as well. Again, because this comes from the people who made the extension, you have to take these benchmarks with a grain of salt. However, I find what ParadeDB is doing really interesting. They developed a new type of table for column storage called a delta lake table, and these delta lake tables, quote, "behave like regular Postgres tables, but use a column-oriented layout via Apache Arrow and leverage Apache DataFusion, a query engine optimized for column-oriented data." So they're actually storing the data in Parquet files, similar to what the ClickHouse benchmark entry was using, and they use the embedded DataFusion engine to query it very quickly. This is very similar to what they're doing for text search. If you remember, a month or two ago I covered the post from ParadeDB about a new extension they call pg_bm25, which embeds a search library called Tantivy, a Rust-based alternative to Apache Lucene, the search library that powers Elasticsearch. That gives them amazing search performance, and they're doing the same kind of thing here for analytics, or column-stored, data. As a result, they've also gotten huge file compression, five times better than Postgres normally gets. You can see ClickHouse again has the smallest data file size, but ParadeDB follows next, followed by Timescale compression. This is an open source library and it's available now; I think they said it's definitely an early release, but you can get started with their Docker container to see how it works. So it's definitely something I'm going to keep my eye on, and I definitely encourage you to check it out as well.
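The extension itself is queried with ordinary SQL from inside Postgres, but if you want a feel for why a column-oriented Parquet layout is so fast for analytics, here is a small, purely illustrative sketch using pyarrow. This is my own example, not anything from the ParadeDB post, and the file and column names are made up.

```python
# Illustrative only: a columnar scan over a Parquet file. Only the referenced
# columns are read, and row groups whose statistics rule out the filter can be
# skipped, which is the kind of work a columnar engine does for every query.
import pyarrow.dataset as ds
import pyarrow.compute as pc

dataset = ds.dataset("hits.parquet", format="parquet")  # placeholder file

table = dataset.to_table(
    columns=["user_id", "duration_ms"],          # column pruning
    filter=pc.field("duration_ms") > 1000,       # predicate pushdown
)
print(table.num_rows)
```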
[00:11:11] Next piece of content: "What's new in the Postgres 16 query planner / optimizer?" This is from citusdata.com, and it's a really interesting post. It goes through ten improvements that have been made to the planner in Postgres 16 alone, and the author of the post, David Rowley, actually implemented many of these improvements himself. A lot of these optimizations apply to particular shapes of query that the planner sees, so the gain may be a 50% improvement or a 500-fold improvement; I think I even saw one that was 1,250 times faster. I believe that was the ninth one, short-circuiting trivial DISTINCT queries: if you don't have to do the work at all, that's the fastest way to get an answer. I'm just going to list what they are, and I do encourage you to read the post, because I don't have time to go through the detail of all ten and what he discusses is quite significant. They are: incremental sorts for DISTINCT queries, faster ORDER BY and DISTINCT aggregates, Memoize for UNION ALL queries, support for Right Anti Joins, Parallel Hash Full and Right Joins, optimized window function frame clauses, optimizations for various window functions, join removal for partitioned tables, short-circuiting of trivial DISTINCT queries, and incremental sort after merge joins in more cases. So I definitely encourage you to check this out. This is also the post Lukas covered in his Five Minutes of Postgres, "Faster query plans with Postgres 16: incremental sorts, anti-joins, and more," and it took him a while to go over it; his five minutes ran to about eight-plus minutes. So you can check out that content if you want to hear another discussion of this material.

Next piece of content: "Recommended resources to improve your knowledge of databases and PostgreSQL." This is from blog.anayrat.info, and it's a list of about eleven books he suggests. The first one is "The Manga Guide to Databases" by Mana Takahashi. Next is "Database Design for Mere Mortals" by Michael Hernandez. Next is "SQL Antipatterns, Volume 1" by Bill Karwin. Next is "SQL for Smarties: Advanced SQL Programming" by Joe Celko. Next is "The Art of SQL" by Stéphane Faroult. [00:13:31] Next is "SQL Performance Explained" by Markus Winand. Next is "Indexing Beyond the Basics"; I actually didn't see the author for that one. Next is "Practical SQL" by Anthony DeBarros. Next is "PostgreSQL Architecture and Advanced Notions," but this one is in French and hasn't been translated into English yet; if you know French, you could read it. Next is "Learn PostgreSQL" by Luca Ferrari and Enrico Pirozzi. And next is "PostgreSQL 14 Internals" by Egor Rogov. So check out these books to learn more about SQL and PostgreSQL.

[00:14:11] Next piece of content: "Waiting for PostgreSQL 17 - Support identity columns in partitioned tables." This is from depesz.com, and apparently you can now use generated identity columns on partitioned tables, whereas you couldn't in previous versions. Now, I've used serial, or if not serial, I've created an integer column and attached a sequence to it, to achieve the same thing he shows here, but it now works with the newer, SQL-standard approach of generated identity columns.
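Here is a minimal sketch of what that looks like, assuming a Postgres 17 development build and psycopg 3; the table and partition names are placeholders of mine, not from the post.

```python
# Hypothetical example: an identity column on a partitioned table, which
# Postgres 17 accepts and earlier versions reject. All partitions share one
# identity sequence, so ids are unique across the whole table.
import psycopg

with psycopg.connect("dbname=mydb", autocommit=True) as conn:
    conn.execute("""
        CREATE TABLE events (
            id         bigint GENERATED ALWAYS AS IDENTITY,
            created_at timestamptz NOT NULL
        ) PARTITION BY RANGE (created_at)
    """)
    conn.execute("""
        CREATE TABLE events_2024_02 PARTITION OF events
            FOR VALUES FROM ('2024-02-01') TO ('2024-03-01')
    """)
    row = conn.execute(
        "INSERT INTO events (created_at) VALUES (now()) RETURNING id"
    ).fetchone()
    print(row[0])  # id generated by the parent table's identity sequence
```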
[00:14:43] The next piece of content actually makes me a little sad: "Waiting for PostgreSQL 17 - Add new COPY option SAVE_ERROR_TO" and "Rename COPY option from SAVE_ERROR_TO to ON_ERROR." This is from depesz.com. So what does that mean? I talked about this in a previous episode of Scaling Postgres and I was so excited for it, because they were adding a COPY option called SAVE_ERROR_TO. My interpretation was that when you're copying data into the database and some row hits an error, instead of the copy just stopping and the whole process failing, I could save the rows that had an error to a separate file. But that's not what this feature does. They've now renamed SAVE_ERROR_TO to ON_ERROR, and there are only two options: stop, which is the default, meaning any error stops the copy entirely; and ignore, which skips the offending rows and doesn't load them. So we don't really have a "save the errors somewhere" capability yet. However, he does say, quote, future options could include a file option where you specify the file name. So what I was hoping for may still arrive, but it's not there yet. Check out this blog post if you want to learn more about that.

Next piece of content: "Connecting to Postgres with psql and pg_service.conf." This is from timescale.com, and it's crazy, I had forgotten this capability exists; as much as I learn about Postgres, I still learn something new every day. The pg_service file is like an SSH config: you can define your own aliases for connecting to different databases. So you set up this file the way you want and then just use a convenient alias to connect. The latter part of the post talks about some Timescale features that were added, but I thought the post was interesting for this part alone.
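As a quick illustration of the idea (my own sketch, not from the Timescale post, with made-up service and host names): you put an entry in ~/.pg_service.conf and then connect by alias from psql or anything else built on libpq, such as psycopg.

```python
# ~/.pg_service.conf might contain something like this (hypothetical entry):
#
#   [prod-replica]
#   host=db-replica.example.com
#   port=5432
#   dbname=app
#   user=readonly
#
# Then "psql service=prod-replica" connects by alias, and so does libpq-based
# code like psycopg:
import psycopg

with psycopg.connect("service=prod-replica") as conn:
    print(conn.execute("SELECT current_database(), current_user").fetchone())
```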
[00:16:38] Next piece of content: "Postgres not starting: the case of the missing output." This is from crunchydata.com. They had a database that couldn't start, and it was logging absolutely nothing, so they had to figure out what to do about it. They went down to running the raw postgres executable in single-user mode, just specifying the data directory, and even that did not start. So, okay, let's try getting rid of the configuration file and going to a default configuration that doesn't load any extensions: they pointed the config file option at an empty file, and it still crashed. What do you do at that point? They decided to turn debugging up to the highest level, five, with the -d option, and with that they finally got some output: it was in the middle of the IPC memory create call, setting up shared memory. So, wait a minute, this might be a huge pages issue. They added a flag turning huge pages off, because by default the setting is "try," and apparently it was having a problem allocating huge pages; with that, the system finally started. Now, what was the ultimate problem? They traced it to a bug in the Linux kernel they were using. But I thought this was an interesting process to walk through for how you could troubleshoot Postgres when it has trouble starting.

[00:18:05] Next piece of content: "Postgres 16 Contribution Analysis 2023." This is from enterprisedb.com. A couple of weeks ago, Robert Haas posted about who the contributors to Postgres are in terms of lines of code and contributions to the mailing lists. This post, and one that follows it, takes a company-level perspective. So this is 2023's lines of code contributed to PostgreSQL, as a percentage by company, presumably attributing each contributor to their employer to get these stats. According to this, EDB was in the top spot with approximately 27% of the code changes, AWS was at about 18%, Crunchy Data was at 11%, Postgres Pro at 10%, Microsoft at around 8%, and so on down the line with other contributions. The next post looks at it from the perspective of the number of employees contributing: approximately 38 at EnterpriseDB, 20 at AWS, 15 at Postgres Pro, 15 at Microsoft, some independents, Fujitsu at 13, Crunchy Data at eleven, et cetera. So to all the people and all the companies that work on Postgres, thank you so much for your contributions. I greatly appreciate it, because it's great having Postgres as a database.

[00:19:28] I hope you enjoyed that episode. Be sure to check out scalingpostgres.com, where you can find links to all the content mentioned, as well as the video and audio of the show. And while you're there, please go ahead and sign up for the mailing list to get notified of every show that comes out. Thanks, and I will see you next week.

Other Episodes

Episode 34

October 15, 2018 00:11:46

Inserts, Nearest Neighbor, Freezing, Comment | Scaling Postgres 34

In this episode of Scaling Postgres, we review articles covering all about inserts, how to handle nearest neighbor, freezing confirmation and Comment. To get...


Episode 12

May 14, 2018 00:11:12

The Future, Slow Counts, Avoid Or, Replication | Scaling Postgres 12

In this episode of Scaling Postgres, we review articles covering the future of databases, why counting is slow, the best ways to avoid ORs,...


Episode 265

May 14, 2023 00:16:47

pg_stat_statements, Transaction ID Wraparound, Consultant Knowledge, CitusCon | Scaling Postgres 265

  In this episode of Scaling Postgres, we discuss different ways to use pg_stat_statements, how to handle transaction ID wraparound, consultant knowledge and all the...
