Episode Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about materialized views, PgBouncer authentication, implicit versus explicit joins, and configuring JIT. I'm Creston Jameson, and this is Scaling Postgres, episode 40.
[00:00:24] We have very little content this week; I only have four posts to discuss. Basically, it's Thanksgiving here in the United States, and I guess that's the reason why there's not as much content this week. So our first post is the benefits of materialized views and how to use them in your Ruby on Rails project, and this is from Ryan Rebo on Medium.
[00:00:50] And basically, they had a query that was slow, and they wanted to use materialized views to speed it up and use it as a form of caching. He shows how he does a database migration in Rails where he creates the materialized view, essentially making a physical table out of a view, selecting the exact query that he's looking for. Now, some of the technique used here, I'm not sure how well it will scale as your database grows in size and your data size increases. What he chose to do in this example is create a trigger on particular tables that refreshes the materialized view when data is updated. So first he created a function that refreshes the materialized view, and it gets used as a trigger. Now, one thing you may want to consider if you're looking at this technique is to actually refresh the view concurrently, so it doesn't block read access to the view while it's being rebuilt. So that's one thing that wasn't mentioned here that you may want to consider doing.
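As a rough sketch of the pattern being described (table, view, and function names here are hypothetical, not necessarily the ones from the post), including the concurrent refresh the post doesn't mention:

```sql
-- Hypothetical schema: a "songs" table cached by a materialized view.
CREATE MATERIALIZED VIEW song_stats AS
SELECT s.id, s.title, count(p.id) AS play_count
FROM songs s
LEFT JOIN plays p ON p.song_id = s.id
GROUP BY s.id, s.title;

-- A unique index is required before REFRESH ... CONCURRENTLY will work.
CREATE UNIQUE INDEX ON song_stats (id);

-- Trigger function that refreshes the view; CONCURRENTLY avoids
-- blocking readers while the view is rebuilt.
CREATE OR REPLACE FUNCTION refresh_song_stats()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  REFRESH MATERIALIZED VIEW CONCURRENTLY song_stats;
  RETURN NULL;
END;
$$;

-- Fire once per statement (not per row) on changes to the base table.
CREATE TRIGGER refresh_song_stats_trg
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE ON songs
FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_song_stats();
```

Note the trade-off discussed next: every write statement on the base table now pays the full cost of rebuilding the view.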
[00:02:02] And then secondly, for each statement that updates a particular column, or deletes or truncates from a given table, he executes this procedure that refreshes the view. Now, if you have a table that's being updated frequently, I'm not sure how well this would perform. I mean, he says that the materialized view refreshes fast, but that probably depends on how much data you have in it, especially if you have a lot of data and are looking to scale. In a previous episode of Scaling Postgres, I believe episode 39, or potentially 38, we talked about materialized views versus rollups, and that was from the citusdata.com blog. So using rollups may be more appropriate than materialized views, but it depends on how frequently your data is changing. And then he goes into Rails and how they set it up for their use case, where they can actually trigger a refresh of the materialized view when they need to through a Ruby class. And basically, once they got everything set up, he did a timing benchmark, and a query went from 7.8 seconds to 1.7 seconds. So that's a pretty good win. But there may be other cases where you could use optimized indexes to achieve similar performance, and actually in the comments they ask, hey, why not use a partial index? So this was an interesting use case, but definitely consider other potential options if you're looking to scale. Maybe you want to use rollups, or if you want to use a materialized view, maybe refresh it concurrently so you're not blocking read access. And if your table is really active, you may not want a trigger on every statement that updates the data; you may want to refresh it on a periodic basis, like maybe through a cron job or something. The next post is PgBouncer Pro Tip: Use auth_user.
So the simplest way to set up PgBouncer, which is essentially a connection pooler for Postgres, is to have usernames and passwords in a userlist.txt that defines who can connect and what their password is. But there's another way to do it, using auth_user and auth_query. So basically, you define a user in your target database that has sufficient privileges to query the pg_shadow table, which is basically where usernames and passwords are stored, and you define a query to be able to look up this table. So for example, they created a role with a particular password, and then established a function that role could use, which selects the username and password from the pg_shadow table. And once you add this configuration to PgBouncer, every time a user connects to PgBouncer, it will actually look up the username and password to see if they are authenticated in the destination database. So if you have a lot of users connecting to PgBouncer, this is potentially a better way to manage that process. Now of course, one thing to keep in mind: Postgres 10 and 11 support SCRAM, and a lot of people are potentially moving to that authentication system, and I'm not so sure that SCRAM will work with auth_user and auth_query. And to my knowledge, PgBouncer doesn't support SCRAM at all, actually. But if you want to use this technique to potentially make user management easier for your PgBouncer install, definitely a blog post to check out.
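The setup looks roughly like this (the role and function names below are my own illustration, not necessarily the post's exact definitions):

```sql
-- Dedicated login role that PgBouncer will use for lookups
-- (name and password are placeholders).
CREATE ROLE my_auth_user LOGIN PASSWORD 'change-me';

-- pg_shadow is only readable by superusers, so the lookup runs
-- as a SECURITY DEFINER function created by a superuser.
CREATE OR REPLACE FUNCTION pgbouncer_lookup(p_user text)
RETURNS TABLE (usename name, passwd text)
LANGUAGE sql SECURITY DEFINER AS $$
  SELECT usename, passwd FROM pg_shadow WHERE usename = p_user;
$$;

-- Lock the function down so only the auth role can call it.
REVOKE ALL ON FUNCTION pgbouncer_lookup(text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pgbouncer_lookup(text) TO my_auth_user;
```

In pgbouncer.ini you would then point `auth_user` at that role and set something like `auth_query = SELECT usename, passwd FROM pgbouncer_lookup($1)`, so PgBouncer authenticates incoming users against the destination database instead of a static userlist.txt.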
[00:05:31] The next post is PostgreSQL implicit versus explicit joins. So it basically talks about the differences. An implicit join is where you just say SELECT * FROM, put a comma between the tables, and state in the WHERE clause how they should be joined. An explicit join is where you literally say one table JOINs another table ON, and define the columns they're joined on for each table. You can also use a USING clause, as long as the column names are identical in the two tables. So that's considered an explicit join, using the JOIN syntax, and implicit is basically using a comma and stating the condition in the WHERE. Now, personally, I use explicit joins all the time. I don't use implicit, because I tend to favor being explicit about what I want to have happen, and it just makes things much more readable for me. And in this blog post he addresses the question people ask, well, is there any performance difference, and not really, but he says there is a case involving join_collapse_limit, where the planner stops reordering joins once you get up past around eight tables, which is the default limit. So it's only in rare circumstances that you may get a difference; normally the query plans for implicit and explicit joins are identical.
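For example, with hypothetical orders and customers tables that share a customer_id column, all three forms below express the same query and normally produce the same plan:

```sql
-- Implicit join: comma-separated tables, join condition in WHERE
SELECT o.id, c.name
FROM orders o, customers c
WHERE o.customer_id = c.customer_id;

-- Explicit join: JOIN ... ON spells out the relationship
SELECT o.id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- USING works because the column name is identical in both tables
SELECT o.id, c.name
FROM orders o
JOIN customers c USING (customer_id);
```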
[00:06:56] The last post is installing and configuring JIT, or Just-in-Time compilation, in PostgreSQL 11. This is from the Percona.com blog, and basically this is a new feature in 11 where they're using Just-in-Time compilation, and you have to actually install it a certain way and then enable it in Postgres. So this talks about, okay, how do you install it. The first example is installing it from a package repository, and he shows some examples for CentOS and Red Hat, the different packages, and the process you need to follow. And then also for Ubuntu users, which is definitely the simplest one to do: essentially you trust the PostgreSQL repository and just do sudo apt install postgresql-11, and the Ubuntu package from the PostgreSQL repository already has JIT support built in. And then, if you actually want to build from source, here's how you do that. And lastly, to actually use it and turn it on, you need to enable the jit setting, and even once it's enabled, it may only be triggered in certain cases; they talk about setting certain cost thresholds here to be able to get it to start working. So if you want to try out JIT compilation, definitely a blog post to check out.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.