Efficient Pagination, Rotating Passwords, BRIN, Row Level Security | Scaling Postgres 78

Episode 78 | August 25, 2019 | 00:17:36

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss efficient pagination, how to rotate passwords, BRIN benefits and Row Level Security.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/78-efficient-pagination-rotating-passwords-brin-row-level-security/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about efficient pagination, rotating passwords, BRIN and row level security. I'm Creston Jamison and this is Scaling Postgres, episode 78.

[00:00:21] Alright, I hope you're having a great week. Our first piece of content is "Pagination with Relative Cursors", and this is from the Shopify Engineering blog. They're talking about doing pagination. The way a lot of tools do pagination is they use a LIMIT and then an OFFSET, but this is very inefficient once the offset gets large, and they show an example of it here. The first query, with an offset of ten, is six milliseconds, then seven, then eight; that's not too bad. Once you get up to an offset of 10,000 it reaches 80 milliseconds, and this apparently is against 14 million products. At an offset of 100,000 it's over 2 seconds, and then it just times out, they say, at the 1,000,000th offset, which probably means something north of 30 seconds. So there's no way for that to be performant. Basically, as other articles have mentioned, this method of pagination works for small data sets, but once you get large, it's very inefficient.

So they're talking about relative cursor pagination. Now, this isn't a cursor as you typically think of it in terms of server-side programming within a database, like a database cursor, but it's still a cursor nonetheless: you keep the last ID of what was looked at. For example here, if the last ID someone was looking at as they were paging through was 67890, then you would show the products where the ID is greater than that last value, essentially where the cursor is, and order by the ID ascending. This assumes that whatever you're paging through is ordered by ID, and it becomes very efficient. You can keep paging deeper, say to the 100,000th row of those 14 million, and you'll still get great performance; the performance doesn't diminish.

But of course, maybe you need to order by something other than an ID. Maybe you're sorting the records by a particular date or a name. How could you do that? Here they talk about sorting and skipping records, where they want to sort by a title and then by the product ID, in case you have duplicate product titles. They use the technique of selecting rows where the title is greater than the cursor value, 'pants', or the title equals 'pants' and the product ID is greater than the last ID seen (2 in their example), and then of course sorting appropriately by the title ascending and then the ID ascending as well. That will let you page through these types of records efficiently. Now, one thing I was looking at is this OR condition, and I wonder if they could instead use a WHERE clause comparing two columns at once: with parentheses you can write (title, id) > ('pants', 2) as a row comparison. I wonder if that is more efficient for the Postgres planner. So if you're investigating a solution like this for your pagination, maybe try that technique as well; comparing the two columns at the same time as a combined value may be more efficient than using the OR.
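As a concrete illustration, here is a minimal sketch of both forms of keyset (relative cursor) pagination in SQL. The products table, its columns and the LIMIT values are assumed for illustration; they are not copied from the Shopify post.

```sql
-- Simple case: paging by a unique, monotonically increasing id.
-- 67890 is the id of the last row the client saw (the "cursor").
SELECT id, title
FROM products
WHERE id > 67890
ORDER BY id ASC
LIMIT 50;

-- Sorting by a non-unique column (title) with id as a tiebreaker,
-- using the OR form described in the post:
SELECT id, title
FROM products
WHERE title > 'pants'
   OR (title = 'pants' AND id > 2)
ORDER BY title ASC, id ASC
LIMIT 50;

-- The row-value comparison mentioned above; it expresses the same
-- predicate and can be easier for the planner to match against a
-- multicolumn index on (title, id):
SELECT id, title
FROM products
WHERE (title, id) > ('pants', 2)
ORDER BY title ASC, id ASC
LIMIT 50;
```

Either form benefits from a multicolumn index, for example CREATE INDEX ON products (title, id), so Postgres can seek directly to the cursor position instead of scanning.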
They also timed this example. It wasn't as efficient as paging by the ID alone, but it still gives pretty consistent performance, and you would of course need a multicolumn index to get the most efficiency out of this type of pagination. So if you're looking to improve your pagination and you're using the offset method, definitely check out this blog post for some ideas.

[00:04:14] The next post is "Rotating PostgreSQL passwords with no downtime". This covers the case where you have a password for a Postgres user that an application is using and you want to change that password. They talk about three scenarios: you could update the database and then update the application, accepting a short downtime in between; you could make both the old and new password known to the application and let it fall back to the new one once the old one fails; or you could create a copy of the database user, update the application, and delete the old user. That last option sounds ideal, but it also sounds like it might be a little complicated, and it's exactly what this post describes how to do. Essentially they use roles within Postgres. They create a myapp role with no login, and that's the one that has the permissions to all the objects in the database. Then they create myapp_tom, essentially a different role with no login that is a member of myapp, with myapp set as its main role, and then myapp_jerry, another username with presumably a different password and the same permissions, also tied to the myapp role. Assuming myapp_tom is the active one, you give it LOGIN and set its password, so the application can log in as that user and access all the objects that myapp has access to. Then, down the line, when you want to change the password being used, you take the secondary, inactive role, myapp_jerry, give it LOGIN, set its new password, and change the application to use this new username. Once it begins using it, you deactivate the old one by reverting it to NOLOGIN. This is a pretty efficient way to do password changes, coordinating what the application is using with the database, without any downtime. So if you're interested in that, definitely a blog post to check out. They also talked about using Liquibase, which I didn't really cover, but that may be of interest to you as well.
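Here is a minimal sketch of that role-pair scheme. The role names follow the episode's description; the exact statements, including setting the role parameter as a login default and the broad GRANT, are my assumptions about how the post wires it together, not copied from it.

```sql
-- Group role that holds all object privileges; it can never log in.
CREATE ROLE myapp NOLOGIN;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO myapp;

-- Two interchangeable login roles that are members of myapp.
CREATE ROLE myapp_tom   NOLOGIN IN ROLE myapp;
CREATE ROLE myapp_jerry NOLOGIN IN ROLE myapp;

-- Assumption: make sessions act as myapp by default.
ALTER ROLE myapp_tom   SET role = 'myapp';
ALTER ROLE myapp_jerry SET role = 'myapp';

-- Activate the first credential for the application.
ALTER ROLE myapp_tom WITH LOGIN PASSWORD 'first-secret';

-- To rotate: bring up the inactive role with a new password,
-- repoint the application at myapp_jerry, then retire the old role.
ALTER ROLE myapp_jerry WITH LOGIN PASSWORD 'second-secret';
ALTER ROLE myapp_tom WITH NOLOGIN;
```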
[00:06:24] The next post is "Create a BRIN index at a fraction of the normal size", and this is from Alibaba Cloud. They talk about B-trees, which are essentially the most common index type, but also BRIN, and how useful it can be for the Internet of Things or anything using time series data. BRIN is a block range index: it indexes things in ranges. So for use cases where you have constantly increasing data, like an incremental ID with a serial data type, or dates that are constantly increasing, a BRIN index can be beneficial. Now, we've discussed before in previous episodes of Scaling Postgres how a BRIN index is super small because it only stores ranges, not the exact data; it targets a range and then has to look within that range to find the exact data when you're doing a query. For that reason it's super, super small, but it's not as performant for looking up individual records as a B-tree index.

One thing that's interesting about this post is that it also looks at insert performance. Because far fewer index entries are written for a BRIN index, it's definitely more performant for inserts as well. He goes through all the different testing scenarios here, but I want to jump down to the meat of the material in the graphs, where they talk about space. For example, again, what we've heard before: BRIN is super small. Look at this comparison: the table is over 4,000 megabytes, the B-tree index is 2,500 megabytes, whereas the BRIN index is, I think, 4.5 KB. So, super small relative to the table size and the B-tree index. Then he also ran a range query and an exact query. You'll see the cost of a full table scan is immensely large for either kind; the B-tree index is the fastest, 24 milliseconds for the range query and a fraction of a millisecond for the exact query, whereas the BRIN index is still vastly faster than a full table scan, but not as fast as a B-tree. But given the space savings, is it fast enough? Now, the next part, which I've seen discussed elsewhere but makes total sense, is how insert performance compares. With no index you get essentially 66,000 rows per second; with a BRIN index it's 62,000, so an almost negligible impact on inserts, whereas with a B-tree you're at less than half of your insert performance. So if you're working with a lot of time series data and things that need fast inserts, definitely check out BRIN indexes, and check out this blog post to see how you could potentially incorporate them into your database system.
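For reference, here is what a BRIN index looks like in practice; the table and column names are illustrative, not taken from the Alibaba Cloud post.

```sql
-- BRIN suits naturally ordered, append-only data such as a
-- timestamp column; pages_per_range trades index size for
-- precision (128 is the default).
CREATE TABLE sensor_readings (
    id          bigserial,
    recorded_at timestamptz NOT NULL,
    value       double precision
);

CREATE INDEX sensor_readings_recorded_at_brin
    ON sensor_readings USING brin (recorded_at)
    WITH (pages_per_range = 128);

-- Range queries like this can use the BRIN index to skip most
-- of the table's blocks:
SELECT avg(value)
FROM sensor_readings
WHERE recorded_at >= now() - interval '1 day';
```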
[00:09:18] The next post is "PostgreSQL Row Level Security, views and a lot of magic". This post basically goes over row level security, what it is, how to set it up and some things to watch out for. First they create two users, bob and alice, and then they create a table with two different types of data, and they allow bob and alice to access the schema and the table so they can query it. They also created a little function to help debug exactly what was happening once row level security was turned on. Then they altered the table: ALTER TABLE t_service ENABLE ROW LEVEL SECURITY. Once that happens, no one can query anything, so you need to set a policy. They create a policy on this table to allow bob to select rows when the service type is open source (that's one of the column values here), and another so that when the service type is closed source, alice can select them. So when you SET ROLE to bob and select from the service table, you only see the open source service types; when you SET ROLE to alice, you only see the closed source rows. Okay, that works great. But then what if we're using views? Here he sets the role back to postgres and creates two different views that both users can select from; for the second view, v2, he sets the owner to alice, so alice is the owner, and grants select on v2 to bob. Now, when you SET ROLE to bob and query the first view, you get both sets of data. The reason is that the v1 view is owned by the postgres user who created it, so the policy isn't being applied to bob: essentially, bob can now see both sets of data because postgres can see both sets of data. That's maybe not what you want. Still as bob, when you look at the second view, you only see closed source rows, and they say here, quote, the reason is that v2 belongs to alice and therefore PostgreSQL will check Alice's RLS policy. So it shows what she can see, even though you're bob, because bob has select permission on the view. Again, these are corner cases you need to be aware of in how views behave when row level security is enabled. They then go through another scenario of setting up row level security that may have some unexpected behavior. Now, in my own application I don't use row level security; I tend to do security at the application layer. But if you have a need to do it at the database layer, this is definitely a great post to check out, explaining some of the corner cases, as they describe, for row level security in PostgreSQL.
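A minimal sketch of the setup described above, reconstructed from the episode's description rather than copied from the post; the exact table definition, value spellings and policy names are assumptions.

```sql
CREATE ROLE bob LOGIN;
CREATE ROLE alice LOGIN;

CREATE TABLE t_service (
    id           serial PRIMARY KEY,
    service_type text,
    name         text
);
GRANT SELECT ON t_service TO bob, alice;

-- With RLS enabled, no rows are visible until a policy allows them.
ALTER TABLE t_service ENABLE ROW LEVEL SECURITY;

CREATE POLICY bob_policy ON t_service
    FOR SELECT TO bob
    USING (service_type = 'open_source');

CREATE POLICY alice_policy ON t_service
    FOR SELECT TO alice
    USING (service_type = 'closed_source');

-- The view corner case: RLS is checked against the view's OWNER,
-- not the querying user.
CREATE VIEW v1 AS SELECT * FROM t_service;  -- owned by postgres: RLS
GRANT SELECT ON v1 TO bob;                  -- not applied, bob sees all rows

CREATE VIEW v2 AS SELECT * FROM t_service;
ALTER VIEW v2 OWNER TO alice;               -- alice's policy applies,
GRANT SELECT ON v2 TO bob;                  -- bob sees closed_source rows
```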
[00:12:09] The next post is "Managing High Availability in PostgreSQL Part Three: Patroni", and this is from ScaleGrid.io. We've covered the previous two posts: part one on PostgreSQL Automatic Failover, or PAF, and part two on Replication Manager (repmgr). This one covers Patroni, essentially another high availability framework for PostgreSQL. It goes over how it works, how it handles the split-brain scenario, talks about pros versus cons, describes their whole testing methodology, and then discusses which is the best PostgreSQL high availability framework, comparing PAF, repmgr and Patroni across the different tests they ran. So if you're looking for a high availability solution for PostgreSQL, definitely a blog post to check out.

The next post is "Developing on many versions of PostgreSQL at once". This is from someone who uses many versions of PostgreSQL, I believe doing development and support work, and it covers how he juggles them. There are virtual environment tools, akin to Ruby's RVM or Python's virtualenv; he mentions pg_virtualenv and pgenv, tools that let you set up multiple environments and switch between different versions of PostgreSQL on the fly. He actually does some things a little differently, which he describes in this blog post: he talks about ccache and being able to clear your cache when you need to, particularly if you're jumping between versions, and about how he works with Git worktrees. So if you need to juggle multiple versions of PostgreSQL, definitely a blog post to check out.

The next post is "Clone schema in Postgres". Basically, there's no out-of-the-box way to clone a schema, either just the schema itself (the objects without data) or the objects with data. This blog post, from pateldenish.com, links to a tool that lets you clone a schema, either schema-only or schema-with-data. So if you're wanting a tool to clone schemas, maybe this is a blog post and a tool to check out.

[00:14:32] The next post is "Percona Distribution for PostgreSQL 11 Beta is now available", and this is from Percona.com. I don't believe this is a proprietary build of PostgreSQL; basically, they've put together an open source package with version 11.5, the most recent version, and complemented it with a selection of extensions. They mention additional extensions supported by the PostgreSQL Global Development Group, which looks like the contrib library; pg_repack, for being able to do something akin to a VACUUM FULL on tables while they're live; pgaudit, to enhance PostgreSQL's object audit logging; pgBackRest as a replacement for, or an alternative to, pg_basebackup; and then Patroni, which we just saw a blog post on, the high availability solution for PostgreSQL. So essentially they package these up together as a cohesive unit. If that is of interest to you, definitely a blog post to check out.

[00:15:39] The next post is "Porting a PostgreSQL extension from Unix to Windows 10". This is very specific to Windows: it describes how they ported an extension from a Unix system to Windows 10. So if you do extension development, and particularly want to get something working on Windows, there's a blog post to check out.

And the last blog post is "Waiting for PostGIS 3: parallelism in PostGIS". Historically, PostGIS hasn't been able to use that many parallel queries; as they say here, very few queries would parallelize, and you had to force certain configurations to get them to use a parallel plan. But they say, quote, with PostgreSQL 12 and PostGIS 3, parallel query plans will be generated and executed far more often, because of changes to both pieces of software. PostgreSQL 12 includes a new API that extends the ability to modify query plans and add index clauses, and PostGIS 3 has taken advantage of it to enable more parallel plans to run. They have a little example of a particular function where they saw more parallel plans being used, dropping the runtime of a particular query, although of course, as you ramp up the number of cores, as with anything parallel, you do get diminishing returns. But this is a boon to people using PostGIS, because you'll get more parallel query planning.

[00:17:13] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 50

February 11, 2019 00:14:23

Outage from TXIDs, Breaking Scale, fsync Gate, pg_stat_statements | Scaling Postgres 50

In this episode of Scaling Postgres, we review articles covering an outage from running out of TXIDs, breaking scale, that PG fsync issue and...


Episode 261

April 17, 2023 00:16:24

Easy Foreign Data Wrappers, JSONB Cheatsheet, Updating Cost Limit, Parallel Aggregate | Scaling Postgres 261

In this episode of Scaling Postgres, we discuss how to easily create a foreign data wrapper to consume an API, present a convenient JSONB...


Episode 211

April 17, 2022 00:11:17

Multiranges, Missing Metrics, Newbie PostGIS, Conference Videos | Scaling Postgres 211

In this episode of Scaling Postgres, we discuss working with multiranges, missing Postgres metrics, PostGIS for newbies and videos from CitusCon. To get the...
