Declarative Partitioning, Permissions, Trigger Speed | Scaling Postgres 20

Episode 20 | July 09, 2018 | 00:14:40

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we review articles covering declarative partitioning, handling permissions to view data, and the speed of triggers.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/20-declarative-partitioning-permissions-trigger-speed/

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about declarative partitioning, permissions, trigger performance and window functions. I'm Creston Jamison and this is Scaling Postgres, episode 20.

[00:00:21] Alright, our first piece of content this week is actually a presentation and it's called Declarative Data Partitioning with PostgreSQL. And this is from PGDay, I believe in Italy, by Gabriele Bartolini. This presentation basically goes over declarative partitioning in PostgreSQL as of version ten, because that's when the feature was introduced, and it also covers what's coming in eleven. So this was a very comprehensive presentation about the state of partitioning: how it started out, supporting partitioning through inheritance and having to use triggers, to declarative partitioning, which makes it a lot easier, and even what's coming in version eleven, which are some really great changes. It talks about support for multilevel partitioning, so that you can have a central master table, or in this case a parent, then use a range of dates as a first level and then even a second level, breaking those dates out further by customer ID or account ID. And then he goes into more depth about what is introduced in version eleven: hash partitioning in addition to the list and range partitioning, being able to add indexes on partitioned tables, foreign key support for partitioned tables, local unique index support, updates that move rows across partitions, and also default partition support. In other words, if you have a piece of data that needs to be inserted into one of the partitions but there's no match for it, it can be put into this default partition. So a lot of great changes are coming in eleven. But he does mention in this presentation that there are a number of things where eleven is still not all the way where we want it to be, such as being able to prune not just for selects to choose the right partition, but also for updates and deletes; making it easier to create partitions in the first place; foreign keys with partitions as the target; and concurrent detach and attach of partitions. And then this last important point is that performance still requires work, so partitioning by years is not the same thing as partitioning by hours. When you get a lot of partitions, you can still see some performance issues, but all of those are probably going to be worked on in the next versions, version twelve, version 13, et cetera. So if you're interested in partitioning, this is definitely a presentation to check out.

[00:02:48] Now, related to this, the next post is PostgreSQL 10: how to create a partition table based on a JSONB column field value. And this is from the newbiedba WordPress.com blog, "the little things about databases". This is a super short post, but he shows you in very minimal detail how to set up partitioning in Postgres ten using data that's contained within a JSONB field. So I think this is a great example of how easy it can be to set up partitioning, and you don't have to use a JSONB field to do it.
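
If you haven't seen declarative partitioning before, here is a rough sketch of what the commands look like, combining the range partitioning, hash subpartitioning and default partition features mentioned above. The table and column names are made up for illustration, and the hash and default partitions require PostgreSQL 11 or later.

    -- Parent table, range partitioned by date (PostgreSQL 10+).
    -- Table and column names here are hypothetical.
    CREATE TABLE measurements (
        account_id  bigint      NOT NULL,
        recorded_at timestamptz NOT NULL,
        reading     numeric
    ) PARTITION BY RANGE (recorded_at);

    -- First level: one partition per month, itself hash partitioned
    -- by account_id (hash partitioning requires PostgreSQL 11).
    CREATE TABLE measurements_2018_07 PARTITION OF measurements
        FOR VALUES FROM ('2018-07-01') TO ('2018-08-01')
        PARTITION BY HASH (account_id);

    CREATE TABLE measurements_2018_07_p0 PARTITION OF measurements_2018_07
        FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE measurements_2018_07_p1 PARTITION OF measurements_2018_07
        FOR VALUES WITH (MODULUS 2, REMAINDER 1);

    -- PostgreSQL 11: rows that match no other partition land here
    -- instead of raising an error.
    CREATE TABLE measurements_default PARTITION OF measurements DEFAULT;
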
[00:03:26] The next post is also related to partitioning, although the title doesn't necessarily indicate it. The title is Scaling IoT Time Series Data with Postgres-BDR, and this is from the 2ndQuadrant blog. He talks about the Internet of Things and how time series data tends to be append only, which makes it a good candidate for partitioned tables. So a lot of this is covered in the section on time based partitioning in Postgres. Now, you can pretty much ignore the BDR, which stands for Bi-Directional Replication, something that 2ndQuadrant produces; this is in general a good post on how to set up partitioning. He shows you how to do it by setting up partitioning over time, and again this is using declarative partitioning, how you can insert the data and then select it out. Then he also shows you how you can partition over time and space. Now, for space he's using different devices that are collecting temperature sensor data in his example, but like in the previous presentation, the very first piece of content I covered, they were doing it by account ID. So it's similar, just a different way to do it. Definitely, if you're looking to get into partitioning, these last three posts are really great ones to look over on how to get started, and particularly on what's coming in version eleven, which are some really great changes.

[00:04:51] The next post is Using Search Path and Views to Hide Columns for Reporting with Postgres. Now, this post is essentially about permissions and what type of data someone is allowed to view. As an example, they note that GDPR has gone into effect and there's increased concern about access to personally identifiable information.

[00:05:14] In this example they use a users table that has an email, a first name and a last name. But maybe you want someone to be able to look at other elements of the users table without those identifying fields being included. In this example they use schemas and views, and then revoke and grant privileges, to do that. So the first thing they do is create a new schema called mask and create a user called nonadmin, or this could essentially be a role. Then they revoke all privileges on the schema public, which is the default schema you receive for a database, so this nonadmin user can no longer access the public schema. Then you create a view in that schema that selects from the main users table, grant usage on the schema mask to nonadmin, and then grant select on all the tables in the schema mask to nonadmin. So basically, nonadmin can only access data through this newly created mask schema. At this point, users other than nonadmin can just do a select all from the users table in the public schema and get all the data, whereas if you are the nonadmin user you can do a select all from mask.users and you will see only the created at date, because that's all that is included in the view. But you can of course include other columns that don't contain personally identifiable information.

[00:06:47] So this is a pretty brief post, but a great example of how you can use schemas and views in order to segment access to your data. And this tends to become important as you scale up your database and more users want access to that data.
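
To make the steps concrete, here is a minimal sketch of that approach. The schema, role and table names (mask, nonadmin, users) follow the post's example, but the exact columns and statements here are illustrative rather than copied from the post.

    -- A users table with identifying fields (columns are illustrative).
    CREATE TABLE public.users (
        id         bigserial PRIMARY KEY,
        email      text,
        first_name text,
        last_name  text,
        created_at timestamptz DEFAULT now()
    );

    CREATE SCHEMA mask;
    CREATE ROLE nonadmin LOGIN;

    -- Remove the default access everyone has to the public schema.
    REVOKE ALL ON SCHEMA public FROM PUBLIC;

    -- Expose only the non-identifying column through a view in the mask schema.
    CREATE VIEW mask.users AS
        SELECT created_at FROM public.users;

    GRANT USAGE ON SCHEMA mask TO nonadmin;
    GRANT SELECT ON ALL TABLES IN SCHEMA mask TO nonadmin;

    -- nonadmin can now run: SELECT * FROM mask.users;
    -- but cannot read email, first_name or last_name from public.users.
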
[00:07:01] The next post is Rules or Triggers to Log Bulk Updates. There are a few pieces of this post that I found interesting, but they're talking about rules. Now, I haven't used these, but upon looking through this post it seems like rules are a bit faster, although they're more of an older technology, and he has a quote here: there have been attempts to deprecate them, but they are used in the implementation of views, so they're probably here to stay for a while. Not sure. Historically I've mostly used triggers; I haven't really used a rule. Now, just talking about the test case here, he's actually setting up an unlogged table, so there's no crash recovery for it, but it's also faster because it's not logging all the operations. He has also turned autovacuum off for the creation of this table, and basically he's creating a second table to serve as the audit log for it. Then he does a loop to generate a series of data, checking the performance using two different triggers or a rule. So he shows logging with a rule, then logging with a row level trigger, which will be fired after the update of every row, so if you are updating ten rows it's going to fire ten times, and then logging with a statement level trigger, which just gets executed once for an entire update. An interesting point he makes is that this statement level trigger uses transition tables, new in PostgreSQL version ten, and he quotes: the REFERENCING clause will make all rows modified by the statement available in a temporary table called newrows. Now, I haven't tested this implementation, but presumably it will only work in version ten and later. Then he looks at the test results: essentially the baseline for just inserting the data was 295 milliseconds, the rule was 454, so about 54% slower, a row level trigger was almost 4 seconds, so it really took a lot longer to execute, whereas the statement level trigger was pretty respectable at 642 milliseconds, so just a little over twice as slow. But given that rules are, I don't know if necessarily antiquated, but may be deprecated in the future, it seems like for this particular use case a statement level trigger might be the way to go. So if you're using triggers, this post gives you a good examination of the different performance levels, and with some of the new features in version ten, maybe you would want to consider using a statement level trigger to get better performance versus a row level trigger.
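
For reference, here is a rough sketch of what a statement level trigger with a transition table looks like. The table names and audited columns are hypothetical, not the ones from the post, and this needs PostgreSQL 10 or later.

    -- Hypothetical table plus an audit log table for it.
    CREATE TABLE accounts (
        id      bigint PRIMARY KEY,
        balance numeric
    );
    CREATE TABLE accounts_log (
        id         bigint,
        balance    numeric,
        changed_at timestamptz DEFAULT now()
    );

    -- The trigger function reads the transition table (newrows below),
    -- which holds every row modified by the triggering statement.
    CREATE FUNCTION log_accounts() RETURNS trigger
    LANGUAGE plpgsql AS
    $$
    BEGIN
        INSERT INTO accounts_log (id, balance)
        SELECT id, balance FROM newrows;
        RETURN NULL;
    END;
    $$;

    -- Fired once per UPDATE statement, not once per row.
    CREATE TRIGGER accounts_log_trigger
        AFTER UPDATE ON accounts
        REFERENCING NEW TABLE AS newrows
        FOR EACH STATEMENT
        EXECUTE PROCEDURE log_accounts();
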
[00:09:40] The next post is PostgreSQL 11's Support for SQL Standard GROUPS and EXCLUDE Window Function Clauses. So of course these are features in version eleven with regard to window clauses. First, what he's talking about is that there is a frame clause which gives you a sliding average over your data. So essentially it's a moving average: the current value you are on, the two preceding and the two following are averaged to give you the moving average, and he gives an example of the data here. Now, in this clause he's using rows between two preceding and two following, but he says there are other units you can use: you can use range, and now with version eleven you can use groups. He says, quote, rows counts the exact number of rows in the frame, range performs logical windowing where we don't count the number of rows but look for a value offset, and groups counts all the groups of tied rows within the window. He actually gives two comprehensive examples of how rows behave, how range behaves and how the new groups option behaves. This post also covers the exclude clause that can be used in your window framing and gives an example of that, again for rows, range and groups. So this is a very comprehensive post about these new features in PostgreSQL eleven with regard to window functions. If you're using window functions, or you want to use them, it's definitely a post to check out to really grasp their potential use cases. And I should say this is from the jooq.org blog, which is Java, SQL and jOOQ.

[00:11:31] The next post is Diagnostic of an Unexpected Slowdown. This was an interesting post because again it gives you a real world scenario of a problem that was encountered, how to diagnose what was going on and how to get around it. Basically they had a 40 core server, essentially 80 threads with hyperthreading enabled, 128 GB of RAM, shared buffers at 16 GB and max connections of 1500. So that's a lot of connections; you don't normally see them that high, but they had some special constraints: due to an application using the database, it had to run on 9.3.5, that specific version, and they could not use a connection pooler, so they had to have 1500 connections to the database. And with that they were getting long run times for what should be relatively simple queries. So, for example, with normal timing, doing a count of pg_stat_activity returned in six milliseconds, whereas when they were having these particular issues it took as long as 670 milliseconds, so over 100 times as slow. So he uses the explain plan and looks into the different possible causes. Now, this does get very technical and a little bit deep in the weeds, but it's definitely an interesting read if you are into that sort of thing. Basically it led down the trail of a potential memory issue, as frequently happens when you have 1500 connections. He talked about the solution they went with: basically they chose to reduce the amount of shared buffers to the point that it resolved this memory issue while still giving adequate performance. But he did note that as of 9.4, if that had been an available version, they could have allocated the shared buffers in huge pages, and he believes that could have alleviated some of the memory issues. So this is definitely one of those behind the scenes posts about having a problem and the solution they came up with. I definitely find these particular posts quite interesting.

[00:13:41] The last post is PostgreSQL Concurrency: Isolation and Locking. This is a pretty comprehensive post discussing isolation and locking, so it's good as documentation to augment what's available in the PostgreSQL documentation. It goes over the different isolation levels as they relate to concurrent updates and gives a practical example of how they behave. So if you're wanting to learn more about that, it's definitely a blog post to check out.

[00:14:15] That does it for this episode of Scaling Postgres. You can get links to all the content presented in the show notes. Be sure to head over to Scalingpostgres.com where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 54

March 11, 2019 00:15:29

Index-Only Scans, Lock Table, Hot Standby Feedback, Large Backups | Scaling Postgres 54

In this episode of Scaling Postgres, we review articles covering index-only scans, locking tables, hot standby feedback and handling very large backups. To get...

Episode 64

May 20, 2019 00:13:12

Leveraging Indexes, Slugs, Addresses, Security Definer | Scaling Postgres 64

In this episode of Scaling Postgres, we discuss leveraging indexes, a slug function, addresses and security definer in functions. To get the show notes...

Episode 237

October 16, 2022 00:18:53

Postgres 15 Released, File Systems, Connection Pooling, Secure Connections | Scaling Postgres 237

In this episode of Scaling Postgres, we discuss the release of Postgres 15, the performance of different file systems with Postgres, options for connection...
