SQL Functions, Explain Analyze Buffers, Debug Autovacuum, RLS Multi-Tenancy | Scaling Postgres 226

Episode 226 | July 31, 2022 | 00:16:19
Scaling Postgres

Hosted By

Creston Jameson

Show Notes

In this episode of Scaling Postgres, we discuss Postgres 14's new SQL function syntax, including buffers when running EXPLAIN ANALYZE, how to debug autovacuum, and using row-level security to handle multi-tenancy.

To get the show notes as well as get notified of new episodes, visit:

https://www.scalingpostgres.com/episodes/226-sql-functions-explain-analyze-buffers-debug-autovacuum-rls-multi-tenancy/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about SQL functions, explain analyze buffers, debugging autovacuum, and row-level security multi-tenancy. I'm Creston Jameson, and this is Scaling Postgres episode 226.

[00:00:25] Alright, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is Better SQL functions in PostgreSQL v14. This is from cybertec-postgresql.com, and they're talking specifically about SQL functions. So not procedural language functions, but pure SQL, so all they contain is structured query language. Basically, in version 14 there was an enhancement that improves error handling as well as security. Typically you would have a function that looked like this: in this example, he wants to create a new function called mangle that does a lower() of an unaccent() of a given text. So he has to create the unaccent extension, and then he creates this function that returns text, using the language SQL, as this statement. And when you run the function, it unaccents the text as well as lowercasing it. Now, this function is stored as literal text, and as he says here, the function body of a PostgreSQL function is simply a string constant. Essentially, that string is parsed again when it comes time to execute the function. But this has at least two issues. One, there's a lack of dependency tracking. For example, if he drops the unaccent extension that this function depends upon and tries running it again, he gets an error. Ideally, if there were some sort of dependency tracking, it would say you can't drop this extension because it's being used in this function, but because the function is simply stored as a text string, Postgres can't do any dependency handling like that. Two, you can run into search_path problems depending on how the functions have been set up, so there are potential security issues around that. But Postgres 14 has a new SQL function syntax, and as opposed to storing a string literal, the function body is, quote, parsed at function definition time and stored in the pg_proc system catalog. So it does the parsing ahead of time, which lets it do things like dependency tracking to make sure the function has everything it needs. He shows that example here: when you drop the extension, it gives you an error saying you can't drop it. It also fixes some of the search_path issues. So this is a new way to define functions. As he notes, certain clients, particularly older versions of psql or older versions of pgAdmin, need to add support for this syntax before you can use these new features.

Our second post is related to this: BEGIN ATOMIC: a better way to create functions in PostgreSQL 14. This is from jkatz05.com, and he's talking about the same new feature. Again, he shows how functions were typically recorded as string literals, which is what the double dollar signs mean. But if you use BEGIN ATOMIC, you'll see there are no string literals; you simply say BEGIN ATOMIC, put one or more statements inside, and then END. And he talks about the benefits of dependency tracking as well as creation-time parsing checks that catch additional problems.
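To make this concrete, here is a minimal sketch of the two styles, loosely based on the mangle function discussed in these posts (the exact signatures are illustrative, not copied from the articles):

-- Old style: the function body is a string constant ($$ ... $$),
-- parsed only when the function is executed.
CREATE EXTENSION IF NOT EXISTS unaccent;

CREATE FUNCTION mangle(t text) RETURNS text
LANGUAGE sql
AS $$ SELECT lower(unaccent(t)) $$;

-- New Postgres 14+ style: the body is parsed at definition time,
-- so dependencies (like the unaccent extension) are tracked and
-- DROP EXTENSION unaccent is rejected while this function exists.
CREATE FUNCTION mangle_atomic(t text) RETURNS text
LANGUAGE sql
BEGIN ATOMIC
    SELECT lower(unaccent(t));
END;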
So this is a new feature I was not aware of in version 14, and you can check out both of these posts if you want to learn more.

[00:03:52] The next piece of content is Using buffers for query optimization. This is from pgmustard.com, and they're saying that when someone does optimization work, they use EXPLAIN ANALYZE and track the timings of the different steps. But they suggest this has a few weaknesses. One, timings vary from run to run, which is absolutely true, so you have to kind of get an average. Two, they are dependent on the cache, which is also true. And three, timings alone can hide efficiency issues behind parallelism. So it doesn't necessarily hide them, but it may be harder to discern how many loops are happening. Whereas asking for buffers as well when you do an EXPLAIN ANALYZE can help give a sense of how much I/O is being done and how much is being served from the cache. It should also be relatively more consistent, since it reflects how much data the query needs to pull back, avoiding some of the timing issues. They go into what the different buffer statistics mean, looking at the shared blocks, temp blocks and local blocks areas, as well as what a hit, a read, a dirtied block and a written block are. And they give an example where they do an EXPLAIN ANALYZE on a particular table and you can see it is doing an index scan; in the buffers, the shared area shows three hits, so that data was found in shared buffers, but it did have to read one block from disk. They feel that using buffers helps: one, spot when operations are doing way more I/O than you expected; two, get a sense of the total I/O of the query; three, spot operations spilling to disk, although a lot of times that's pretty obvious; and four, see signs of cache performance problems. I would say the buffers definitely help when optimizing your queries, because any additional information you have is beneficial. And if you want to learn more about that, you can definitely check out this blog post.
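If you haven't turned this on before, here is a minimal sketch of what it looks like (the table and column are hypothetical, just to show the syntax):

-- Ask for buffer statistics alongside the timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE customer_id = 123;

-- Each plan node then reports a line such as:
--   Buffers: shared hit=3 read=1
-- meaning three blocks were found in shared buffers (cache)
-- and one block had to be read from disk.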
[00:05:43] The next piece of content is Debugging Postgres autovacuum problems: 13 tips. This is from citusdata.com, and they're talking about addressing issues that come up with autovacuuming. This is similar to other vacuum and autovacuum posts that we've had, so some of the content will be familiar, but they cover three main problem areas and some possible fixes for each.

First, what do you do when a vacuum is not being triggered, in other words, your table is not being vacuumed? Well, one, you can decrease autovacuum_vacuum_scale_factor, which reduces the number of rows that need to be changed for a vacuum to kick off, and you can set this globally or for a specific table. You can also decrease autovacuum_vacuum_insert_scale_factor; this is a newer configuration option that lets you separately set the scale factor for inserts.

Second, what do you do if your vacuums are too slow? They say you can reduce the impact of cost limiting. The main lever for this, in my opinion, is increasing vacuum_cost_limit or autovacuum_vacuum_cost_limit. You can also reduce vacuum_cost_delay or autovacuum_vacuum_cost_delay. Those two changes allow vacuum or autovacuum to use more resources and vacuum a table faster. The next thing you could do is increase the number of parallel workers. Now, the issue is that the cost limit applies across all workers; it's one setting for your whole pool of workers. So if you increase the number of workers, each worker works more slowly, because they share that global budget. If you have a small number of very large tables, it makes sense to keep autovacuum_max_workers low. But if you have, say, a lot of relatively small partitioned tables, it may make sense to increase autovacuum_max_workers to get to those tables more quickly. Another way to increase vacuum speed is to improve the scan speed by prefetching and caching. I hadn't heard of this technique, but they're basically suggesting using pg_prewarm to get some of the heap into shared buffers, or simply increasing the shared_buffers setting so more of the heap is cached and vacuum doesn't have to pull as much from disk. The next way to speed things up is increasing the memory available to store dead tuples, basically increasing maintenance_work_mem so vacuum can hold more dead tuples and doesn't have to make multiple passes over the indexes. And the last way to speed things up is to vacuum your indexes in parallel. You can't do this with autovacuum, so it requires a manual vacuum, but a manual vacuum can process indexes in parallel, so you do have that advantage there. And you may want to, as they suggest, adjust max_parallel_maintenance_workers to do more of that work in parallel.

The third problem area is that vacuum isn't cleaning up dead rows, so basically something is preventing vacuum from removing them. You could have long-running transactions, or long-running transactions on a standby that matter because hot_standby_feedback is on for your replicas. Another possible cause is unused replication slots, although with those you would generally also see your disk space ever increasing because WAL can't be removed, or uncommitted prepared transactions. So if you're using two-phase commit and only the first phase has happened, those rows can't be cleaned up until the prepared transaction completes, and having these orphaned prepared transactions does cause vacuum problems. So that's something you can look at. This is another great post about how to configure vacuum and autovacuum for Postgres, so I definitely suggest checking it out.
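As a rough sketch of what a few of these knobs look like in practice (the table name and the specific values are arbitrary examples, not recommendations from the post):

-- Per-table: trigger autovacuum after roughly 2% of rows change.
ALTER TABLE my_big_table
    SET (autovacuum_vacuum_scale_factor = 0.02,
         autovacuum_vacuum_insert_scale_factor = 0.02);

-- Globally: let autovacuum do more work per cost cycle (reload to apply).
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
SELECT pg_reload_conf();

-- A manual vacuum can process a table's indexes in parallel.
VACUUM (PARALLEL 4, VERBOSE) my_big_table;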
[00:09:28] The next piece of content is Shipping multi-tenant SaaS using Postgres row-level security. This is from thenile.dev, and I was interested in looking at this post to see how they did it and whether this is a technique I would want to explore. But I really didn't see anything that persuaded me it's an option I would want to pursue. A lot of the performance concerns I've seen in so many other pieces of content still exist. Now, maybe things are changing with newer versions in ways that will alleviate some of those concerns, but having this security overhead on every query still concerns me. And even in this post, further down, they say that their access policy is relatively simple at this time, but, quote, more mature access control policies such as RBAC/ABAC require their own schema design and can be more challenging to integrate with RLS (row-level security), and even more challenging to make performant. I think that's the key thing here: with regard to row-level security, I keep hearing about a struggle to make it performant. Again, that's not to say you can't use it, but if you want something performant, maybe you would want to avoid it for multi-tenancy. I should also add that the technique they're using here essentially stores the user in a session variable, which is then read to determine what tenant they belong to. Now, as you scale, presumably you're going to have to start using some sort of connection pooler such as PgBouncer, and if you did that, you would be restricted to session pooling, which is definitely not as efficient as transaction pooling, where any particular transaction can be sent over any available server connection. So with this technique you could not use the more performant transaction pooling with PgBouncer; you'd have to stay on session pooling, which takes away some of the benefits of PgBouncer.

Now, this is the exact post that pganalyze covers in their Five Minutes of Postgres, episode 28: Row-level security in Postgres, security invoker views, and why leakproof functions matter. He talks about the post in general, and I highly suggest watching his episode as well, but he also covers some security considerations you should take into account, because most views will bypass row-level security, so you have to use newer features like security_invoker views to actually get it working properly, and you need to be careful with functions, because you need to use functions that are marked LEAKPROOF. And with regard to performance, he even shows someone reporting that they tried row-level security and ran into performance issues even with relatively simple queries. So again, there's that performance issue I keep hearing about with row-level security. But if you want to learn more about this, definitely check out both of these posts.
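For context, a session-variable-based RLS policy for multi-tenancy generally looks something like the following minimal sketch (the schema, policy name and setting name are hypothetical and not taken from the post):

-- Hypothetical tenant-scoped table.
CREATE TABLE invoices (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id uuid NOT NULL,
    amount    numeric NOT NULL
);

ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;

-- Queries from non-owner roles are restricted to the tenant
-- stored in a custom session variable.
CREATE POLICY tenant_isolation ON invoices
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- The application sets the variable on the connection it checked out,
-- which is why this approach pairs with session pooling rather than
-- transaction pooling in PgBouncer.
SET app.current_tenant = '8b7c6d5e-1234-4abc-9def-001122334455';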
[00:12:22] The next piece of content is State of PostgreSQL 2022: 13 tools that aren't psql. This is from timescale.com, and they're revealing some more results from the State of Postgres survey. They show that psql is far and away the most-used tool for running queries and doing administration tasks, but they also mention pgAdmin, DBeaver, DataGrip, IntelliJ and a few others that they list here as well. They also look at the third-party Postgres tools used for application development; the top response is that people don't use any, and the second is PgBouncer, which doesn't surprise me, followed by tools such as pganalyze, pgBackRest, pgMustard and some others. And then they cover some of the visualization tools people use, such as Grafana, pgAdmin or DBeaver. If you want to learn more about the results from the survey, you can check out this blog post.

[00:13:18] The next piece of content is actually a YouTube channel, the Percona YouTube channel. This week they posted approximately 60 or 70 videos from, I believe, Percona Live, so a lot of video content. Not all of it is Postgres related, since they cover a number of different database solutions, but there are numerous Postgres-specific talks as well, such as ones on Oracle to Postgres or on solving Postgres "wicked problems", although that one showcases another database. So there is definitely some Postgres content among all of these videos, and if you're looking for video content, I suggest checking out this channel.

[00:13:58] The next piece of content is Partitioning with native Postgres and pg_partman. This is from crunchydata.com, and they're talking about partitioning: how to get started, and the different benefits of partitioning, predominantly around data lifecycle and cost management. Basically, if you can partition and maybe archive older data to another area, your working data set can be much smaller, and even if you keep all of the data in the database, partitioning can make vacuum much easier, since each partition is maintained separately. There are performance benefits too; they're not as great as you might expect, but you do see some improvements. They talk about the different types of partitioning and how to actually get started, and then, for managing those partitions, keeping them maintained, creating new ones, et cetera, they cover the pg_partman extension as well. So if you want to learn more, you can check out this one.

[00:14:53] The next piece of content is Running PostgreSQL on Docker. This is from endpointdev.com, and it's a pretty comprehensive post about running Postgres on Docker, covering things such as handling persistent storage so that your data survives container restarts, setting up port forwarding from the host to the container, and managing the container with Docker Compose. So if you're interested in that, you can check out this blog post.

The next piece of content: the PostgreSQL person of the week is Elizabeth Garrett Christensen. So if you're interested in learning more about Elizabeth and her contributions to Postgres, definitely check out this blog post.

And the last piece of content: we did have another episode of the Rubber Duck Dev Show this past Wednesday evening. This one covered the twelve principles that the Agile Manifesto is based upon. So if you're interested in that type of long-form developer discussion, we welcome you to check out our show.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 150

January 31, 2021 00:16:57

Scaling Advice, Vicious Cycle, Improve Observability, Addressing Corruption | Scaling Postgres 150

In this episode of Scaling Postgres, we discuss scaling advice, a vicious cycle of WAL writes, how to improve observability and addressing database corruption....


Episode 309

March 31, 2024 00:15:12

Will Postgres Pull A Redis? | Scaling Postgres 309

In this episode of Scaling Postgres, we discuss whether Postgres will pull a Redis, remembering Simon Riggs, built-in collation provider and C.UTF-8 in PG...


Episode 84

October 07, 2019 00:12:37

Postgres 12 Released, Generated Columns, Parallel Estimates, Change Data Capture | Scaling Postgres 84

In this episode of Scaling Postgres, we discuss the release features of Postgres 12, generated columns, parallel estimates and change data capture. To get...
