15% - 250% Faster Query Throughput | Scaling Postgres 287

Episode 287 October 22, 2023 00:14:25
15% - 250% Faster Query Throughput | Scaling Postgres 287
Scaling Postgres
15% - 250% Faster Query Throughput | Scaling Postgres 287

Oct 22 2023 | 00:14:25

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss how the new version of pgbouncer could get you 15% to 250% faster query throughput, the availability of using kubernetes snapshots for backup and restore with CloudNativePg and Ruby on Rails Postgres improvements.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/287-15-250-percent-faster-query-throughput/

 

View Full Transcript

Episode Transcript

[00:00:00] Do you use? PG bouncer. Do you want 15% to 250%? Better query throughput. We talk about that in this episode of Scaling Postgres, but I hope you, your friends, family and coworkers continue to do well. But before we get started, I wanted to thank everyone for reaching out to schedule zoom discussions with regard to my course. So if you reached out, I did send you an email and hopefully we can try to schedule these over the coming week. I also wanted to say that I haven't forgotten the interview shows that I was talking about. And my plan for those right now is not to put them on Scaling Postgres because everyone kind of likes the format of the channel as it is now. But I'm actually going to use my other channel, the Rubber Duck Dev Show, to do those interviews because that's already an interview format. But Chris and I will be bringing on people with postgres expertise and other people from the community. But don't worry, whenever we have a guest who is a postgres expert, or at least that's what we predominantly talk about, I'll include it in the content for Scaling Postgres. But for this week, we're talking about the release of PG Bouncer 121. And the biggest thing about this is support for named prepared statements. And I should say in transaction or statement pooling mode. Of course, it already worked in Session mode. So if we look at the change log link here, it has a little bit more of a description. And you can see this is for protocol level named prepared statements. So it actually doesn't work with SQL based prepared statements. So this is for libraries that use the client server protocol. And just a reminder as to what prepared statements are they're basically you prepare a statement ahead of time and then you execute it multiple times with variations on the parameter used. Like, give me all of the orders for this account. Well, you could change which account you pull orders from. That statement doesn't need to be planned. Again, essentially it knows the plan it's going to use. It just uses different parameters to run it. Now, the reason why this didn't work with Session and Transaction Pooling in the past is because you actually need a session maintained that you prepare the statement in one step and then you send multiple execute statements presumably after that. And PG Bouncer had no way to know which server Session initiated the prepare or not. So now PG Bouncer actually tracks prepared statements and they mention here in the docs that we'll get to later and they say using prepared statements together with PG Bouncer can reduce the CPU load on your system a lot, both on the PG Bouncer side and the Postgres side. And that from synthetic benchmarks they saw query throughput increases from 15% to 250%. So basically, there's a new setting called max prepared statements. It's set at zero. So it doesn't do any statement tracking. And basically you set it up to a value of how many to track. They had a recommendation of maybe 100 is reasonable, but of course it'll depend on your workload. And then in the documentation, they go into the detail of how it's designed. Basically every query that comes in gets assigned a unique string. So it needs to be tracking this prepared statement. So it does take some CPU and memory to do that, and they're talking about the cost of that. So you have to be aware, since PG Bouncer is not multithreaded, one process is going to be taking this additional load. So you need to be cautious if you're already at the point where PG Bouncer is pegged before you turn this on. So generally good to use caution. And of course, they say here this tracking and rewriting of prepared statements doesn't work for SQL level prepared statement commands such as these listed here prepare, Execute, deallocate All, and Discard All. So it only uses the protocol level name prepared statements. So basically, your programming libraries have to be able to support this. In addition, this release also firms up a lot of OpenSSL settings. The defaults have now become closer to what the OpenSSL defaults are now, but this is a huge win. I know that Pgcat implemented something similar a number of months ago, so it's great that PG Bouncer is doing this because I know, for example, I use Ruby on Rails, and you have to disable prepared statements and lose the benefit they provide if you're running through a PG Bouncer in Statement pooling mode, for example. And it would be great to still be able to use prepared statements even though you're running in Statement Pooling mode. But check out this announcement if you want to learn more. [00:04:22] Next piece of content PostgreSQL Disaster Recovery with Kubernetes Volume Snapshots this is from Enterprisedb.com and this post is essentially about their cloud native PG operator for running postgres on Kubernetes and specifically looking for backup and restore solutions that use snapshots instead of other backup tools. So basically they say Kubernetes has supported volume snapshots for a while, but the feature set has gotten to the point where they can actually build native Kubernetes volume snapshots into their postgres operator, and that with the release of 1.21, they now support volume Snapshots as a backup solution. Now, in this first release, it only supports cold physical backup. So the database has to actually shut down in order to do it. Now, because you're running a Kubernetes cluster, essentially you're going to have replicas of the database. So how it works is it takes one of the replicas down, makes a backup of it, and then brings it back up and in sync with the primary. So they look at the configuration of setting up a postgres cluster and how you would configure things and commands to use to set up the snapshot backups as well as how to do the recovery process. [00:05:38] Now, the reason why they do this is because snapshots offer a very fast way to do a backup and restore, especially on the restore side. So they have four representative sizes of database. A 4.5 gigabyte one, a four to four gigabyte one, a 438 gigabyte one, and essentially a four terabyte one. And they showed these different sizes with different backup durations. So the tiny takes two minutes to do the first backup and four minutes to do the second backup after an hour of PG bench has been run against it. Whereas the large, the first backup took about 4 hours and the second backup duration took about 2 hours. But look at these recovery times. The tiny one was done in 31 seconds, the large one, a four and a half terabyte database, was restored in two minutes and 2 seconds. So that's super fast. Normally restoring like from a PG based backup, it could take a really long time to restore that much data, but it ultimately depends on the amount of wall that's been generated and you need to replay up to. But that's some of the promise of these snapshots. And I actually have a client who is not opting for doing a PG based backup method, but they wanted to use AWS snapshots. So we are doing like the PG start backup, taking a snapshot and doing the PG stop backup. And essentially that's how we're doing their backups. And for the size of the database, the restores are very fast. Not as fast as this, but pretty quick compared to some other clients that I work with where we use PG based backup. Now, of course, the disadvantage of this right now is it only runs on a stopped postgres instance, but their plan is to have hot backup support by version 1.22. So the next version but I found this particularly interesting. If you want to learn more, you can check out this blog post next piece of Content postgres goodies in Ruby on Rail 7.1. This is from Crunchydata.com and there are some new database focused features in Rails that they mentioned here. The first is expansion of Async queries. So it provides additional tools to be able to do Async queries, particularly from being able to give it some raw SQL. You send it a query, it'll take a while, and then you can check that the value has been populated at some point in time later. And to be clear, this is not Pipelining, which is something that Postgres has been working on and I think done some implementation. But this is more on the Rails framework side. [00:08:04] This post talked about using it for long pages that have a lot of data it needs to pull, but I kind of see it more beneficial for long running background jobs. And whatnot next feature is Composite primary key support, which I am so thankful for because I use Composite primary keys on my partition tables for my app, and for years I've seen Warning Active Record does not support composite primary keys and I just ignore it. So having support for it now greatly appreciative. Next is native support for CTEs. So again rails DSL for sending SQL queries. They now offer the width method, so you could choose to do this or just use raw SQL. Still, if you want support for unlocked tables for doing tests, this is awesome for running your test using unlocked tables in Postgres. And the last feature is a normalizes capability so you can actually define that an email should have some transformation applied, like for example, strip all the white space and put it in downcase. But you can check out this blog post if you want to learn more, as well as Lucas's episode of five minutes of Postgres where he talks about what the new Rail 7.1 brings for Postgres users. [00:09:15] Next piece of content unleashing the power of vector embeddings with PostgreSQL. This is from Timbo IO, and this is another PG vector artificial intelligence discussion. They have a little bit of Python code in here explaining how it works with Chat GPT, and I frankly am trying to get up to speed on some of these things. And any additional posts that can help solidify my knowledge, I appreciate. So they were testing sentence and blog similarity using Cosign distance. They didn't use the new HMAC index, they were just using IVF Flat index. But another piece of content to educate you a little bit more on AI use cases. So you can check out this blog post if you're interested in that. [00:09:56] Next piece of content. Using Vim as my PostgreSQL client, this is from Trstringer.com. Now this is super short, but I found it really interesting. So it describes how he uses psql. So I use psql as my standard SQL client, but how he set it up is in three separate windows. I think he said using tmux, this top pane is the psql prompt, the lower left pane is the file of the SQL code he's running, and the right pane is the output. And as I'm looking at this, I'm thinking back on different database Gui's I've used in the past, and I remember how I could work with SQL code and it would appear in a pane below the results of the output. And that's kind of something I'm missing sometimes when I'm using psql, not being able to see the code and the results at the same time. Now sometimes you can, it's just different. But I'm definitely going to try this to see what the experience is like and if I might want to use it. [00:10:55] Next piece of content there was another episode of Postgres FM last week. This one was on query hints. So basically what they are, what they do, and why we don't have them in postgres, and some things that we do have that you could potentially use, like query hints are usually to get around a bad plan. So then the question comes, well, what is causing the bad plan? And some of the recommendations they discussed is basically increasing the amount of statistics you are collecting because maybe it doesn't know enough about your data. Or maybe you need to add some enhanced statistics where say you define relationships between two different columns that can help the planner choose a better plan of execution. Or there are some variables you can adjust like to enable different features of the planner. You definitely don't want to use those globally, but maybe for a given session that's something you can explore. But they also talked about doing things like CTEs with materialize to basically explicitly dictate how you want your long query to be executed. So definitely interesting piece of content to check out. [00:11:55] Next Piece of Content PostgreSQL 17 Log Event Triggers this is from Dbizervices.com and this is a trigger that doesn't get fired based upon a row being inserted, updated or deleted, but it's based on some other event, so some existing ones are a DDL command start or End, a table rewrite, or an SQL drop. Well now there's a Login event. So at the point at which someone logs into a given database, you can now perform some action and he just gives an example of a simple one where you raise an exception if a particular user logs into a database. There's other ways to control this, but this is just an example of how it could be used. So check this out if you want to learn more about this new feature coming in 17. [00:12:38] Next piece of Content PostgreSQL 17 Support for at local this is also from Dbiheffingservices.com and here he's talking about how you can output times in different time zones. Well now you can specify at your local time by just saying at local or at local time zone, so it will output the dates and times based upon your local time zone. So check this out if you want to learn more about this feature. The last piece of content kill long running queries in PostgreSQL this is from Cybertechnpusgresql.com. Now they do have two different links on how to actually identify slow running queries, but mostly this covers how to kill them once you've found that they're running. And there are two functions to do this. The first one is PG cancel backend which terminates the query and keeps the connection alive, or PG terminate Backend which terminates the whole database connection. So usually you would run this and then if that doesn't work fast enough or you need to actually terminate the whole connection, of course you can run this. [00:13:39] And lastly he mentions you can put a governor on long running queries by using statement timeouts. So again the statement gets timed out if it reaches the limit that you've set. Now usually I don't like setting this at the database level, but I like setting it at the per user level because otherwise administrative functions may get caught with a really restrictive statement timeout. But check out this blog post if you want to learn more. [00:14:04] I hope you enjoyed this episode. Be sure to check out Scalingpostgres.com where you can find all the links to the content covered in this episode, as well as a transcript and the podcast recordings as well. Thanks and I will see you next week.

Other Episodes

Episode 294

December 11, 2023 00:14:09
Episode Cover

How Much Observability Is Needed? | Scaling Postgres 294

In this episode of Scaling Postgres, we discuss how much observability is needed, avoiding locks during database migrations, a huge Postgres events blogging event...

Listen

Episode 97

January 20, 2020 00:13:11
Episode Cover

Tips & Tricks, Faster APIs, Full Text Search, pgbouncer_fdw | Scaling Postgres 97

In this episode of Scaling Postgres, we discuss tips & tricks, how to get faster APIs, full text search considerations and pgbouncer_fdw. To get...

Listen

Episode 121

July 06, 2020 00:16:33
Episode Cover

Fantastic Data Types, Domain Data Types, Insert Performance, Row Differences | Scaling Postgres 121

In this episode of Scaling Postgres, we discuss fantastic data types, custom domain data types, improving insert performance and how to query row differences....

Listen