Episode Transcript
[00:00:00] In this episode of Scaling Postgres we talk about Postgres 11, multi-tenant scale, trigger speed and full text search. I'm Creston Jameson, and this is Scaling Postgres, episode 14.
[00:00:20] Alright, the first article of the week is actually a news announcement: PostgreSQL 11 Beta 1 is released. So I have the release notes here, and they talk about all the different major features that have been added to PostgreSQL 11. The first thing they talk about is major improvements to partitioning. So you can now partition by a hash key; update statements issued on a partition key now move affected rows to the appropriate partitions, which helps make it easier to manage your data and keep it in the right partitions. There's improved select query performance due to enhanced partition elimination, so basically you can pare down the partitions that you're hitting much more efficiently, and there's support for primary keys, foreign keys, indexes, and triggers on partitions. So this is a lot of great improvement to partitioning.
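As a quick illustration of the new hash partitioning, here's a minimal sketch; the events table and tenant_id column are hypothetical names of my own, not from the announcement.

```sql
-- Hash partitioning, new in PostgreSQL 11; table and column names
-- here are illustrative.
CREATE TABLE events (
    id        bigint NOT NULL,
    tenant_id int    NOT NULL,
    payload   text
) PARTITION BY HASH (tenant_id);

-- Four partitions; rows are routed by hashing tenant_id.
CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```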
[00:01:16] Another is improvements to parallelism, so basically being able to use more cores of your database system to satisfy queries. They've added the ability to parallelize hash joins, CREATE INDEX for B-tree indexes, as well as CREATE TABLE AS, CREATE MATERIALIZED VIEW, and certain queries with UNION. So again, another set of great improvements that are coming. The next feature listed, and a lot of these we've talked about in previous episodes of Scaling Postgres, but these appear to be the ones that will definitely be released this coming year in PostgreSQL 11, is SQL stored procedures. So basically the ability to embed transactions within a function or a procedure, so being able to BEGIN, COMMIT and ROLLBACK within it, and those are called procedures.
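To make that concrete, here's a minimal sketch of a procedure that commits between batches; the batch_log table and load_batches name are hypothetical, purely for illustration.

```sql
-- A stored procedure with transaction control, new in PostgreSQL 11.
-- batch_log and load_batches are illustrative names.
CREATE TABLE batch_log (batch int, note text);

CREATE PROCEDURE load_batches()
LANGUAGE plpgsql
AS $$
BEGIN
    FOR i IN 1..3 LOOP
        INSERT INTO batch_log VALUES (i, 'loaded');
        COMMIT;  -- each batch is committed inside the procedure
    END LOOP;
END;
$$;

-- Procedures are invoked with CALL rather than SELECT.
CALL load_batches();
```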
[00:02:08] They introduced just-in-time compilation to help optimize execution code and operations at runtime, window functions now fully support the SQL:2011 standard, and then there are a few other user experience enhancements that they mention here. The big one for people that have large tables is this one here, I believe, where basically before, when you were adding a column to a table and you had a default, it would have to rewrite the entire table. But here it says PostgreSQL 11 removes the need to rewrite the table in most cases. With time we'll have to determine in which cases it still needs to fall back to the old way of doing it. But basically now, when you add a column with a default, it should execute, they say, extremely quickly. And another quite interesting one is psql on the command line: normally to exit it you do a backslash q (\q), but they've now added that you can use quit and exit as well, making it a lot more intuitive, particularly for new users. So a lot of great features in this release, and I've just kind of talked over the major points. Feel free to check the link in the show notes if you want to read it in more detail.
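Here's a rough sketch of the kind of statement that benefits, assuming a hypothetical big_table; as I understand it, the fast path applies to non-volatile defaults, which is where I'd expect the fallback to the old behavior.

```sql
-- big_table is a hypothetical example table.
-- In PostgreSQL 11 a constant (non-volatile) default is stored in the
-- catalog and applied on read, so no table rewrite is needed:
ALTER TABLE big_table ADD COLUMN status text NOT NULL DEFAULT 'active';

-- A volatile default must be computed per row, so a statement like
-- this still rewrites the table (the fallback case):
ALTER TABLE big_table ADD COLUMN token float8 DEFAULT random();
```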
[00:03:24] The next post is preparing your multi-tenant app for scale, and this is from the Citusdata.com blog. So in this post they assume you have a multi-tenant app: multiple accounts that you satisfy with one app, or multiple customers whose data is commingled. And of course they give some initial advice that you want to avoid schema-based sharding as well as one-database-per-customer sharding. But assuming that you've done that, what prep work would you need if you're wanting to further scale your multi-tenant app, potentially eventually moving on to Citus Data, as they hope? The first thing they mention is denormalize earlier to scale later. So normally you're going to set your database up in a normalized fashion, but as you continue using it, to get better performance you can start denormalizing certain areas. And the way they suggest here is: typically you have a tenant ID that breaks up your database into what data is dedicated to what customer. Now, I've seen this called a tenant ID, I've seen this called an account ID, I've seen this called a customer ID, but whatever you're going to be using, you should pretty much add it to all your tables, even though that's probably not the fully normalized best practice. But it does allow you to segment your data and return queries a lot faster, because generally you're going to want to look at, hey, what is this customer's or this account's activity in a certain area? The second recommendation they have is to adapt your keys to leverage the multi-tenant schema. Now, I see here that they have created a table and defined a composite or multicolumn primary key, going by the tenant ID essentially, and then by the unique ID for the row. Now, I haven't seen this done a lot in practice. Normally the databases that I see follow the structure of what their ORM does, and typically the primary key is just an ID, and it can take some work to readjust it on a live app. And I'm wondering if this is a case of premature optimization, because in order to avoid pain later on, you're going to have to do this really early. I myself have scaled an application's database system up to a terabyte, and I've seen other customers with multiple terabytes, and they haven't had to change the primary key in this way. I can see some advantages to it, but I wonder how useful it could be, and whether other multicolumn indexes, not necessarily on the primary key, would get you 90% of the way there. So I'm not wedded to this particular suggestion, because I think you could do it with just additional indexes on the table; a sketch of both approaches is below. But maybe this becomes important if you're wanting to do sharding and migrate to Citus Data; I'm not sure. And the last suggestion is manage your Postgres connections before they manage you. So basically, once you start getting a high level of connections, you're going to want to introduce some sort of connection manager like PgBouncer, and that's what I've used historically. So a couple of good recommendations, and one that I'm wondering if it might be a little bit premature, but definitely a blog post to check out.
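Here's a minimal sketch of the two key layouts I'm comparing; all table and column names are my own illustration, not taken from the Citus post.

```sql
-- The composite-key pattern the post recommends: tenant_id leads
-- the primary key. Names here are illustrative.
CREATE TABLE pages (
    tenant_id bigint NOT NULL,
    id        bigint NOT NULL,
    title     text,
    PRIMARY KEY (tenant_id, id)
);

-- The alternative I mention: keep the ORM-style surrogate primary
-- key and add a multicolumn index that leads with tenant_id.
CREATE TABLE pages_alt (
    id        bigserial PRIMARY KEY,
    tenant_id bigint NOT NULL,
    title     text
);
CREATE INDEX ON pages_alt (tenant_id, id);
```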
[00:06:38] The next post is Are triggers really that slow in Postgres? This discusses using triggers in your application or your database system. And he talks about how some application developers would say using triggers is an awful practice, doing magic stuff kind of secretly, whereas people more versed in databases would probably see good use cases for them. Now, I haven't used a lot of triggers; usually they're for a particular purpose, and it's rare that I use them in my applications. But what's interesting is that this article uses pgbench to actually take a look at what kind of load a trigger places, because it's going to place some sort of load on the database system. But how much is it?
[00:07:22] And he goes through all of the methodology that he used. The scenario he used was auditing. So he created a separate table with two columns, a timestamp and a text column, to basically record when something was last modified and by whom, auditing what changes are happening to particular tables in the database, and then he used pgbench to run it. And essentially, with these triggers added, there was basically a 0.4% difference. So I consider that pretty negligible, and I actually wonder what the standard deviation is, because how measurable is it? Now, some of this kind of makes sense, and I would imagine that triggers have a negligible impact compared to doing the same work elsewhere. For example, if your application is doing some sort of auditing, and he mentions that here, then thinking about it logically, your application has to make the calls to the database, so you have a network round trip to deal with, and potentially, if you're doing things in the ORM, that requires application load to deal with. Whereas keeping it self-contained in the database, adding additional rows to a table, or potentially maybe even updating them, shouldn't take that much time. So really, triggers are not that slow, as long as they're kept relatively simple. So if you're considering using triggers, maybe you want to check out this post.
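To give a feel for the kind of trigger being measured, here's a minimal audit-trigger sketch; the audit_log table and record_change function are hypothetical names of mine, and pgbench_accounts is one of the standard tables pgbench creates.

```sql
-- Minimal audit trigger in the spirit of the post; audit_log and
-- record_change are illustrative names.
CREATE TABLE audit_log (
    table_name  text,
    modified_at timestamptz,
    modified_by text
);

CREATE FUNCTION record_change() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Record which table changed, when, and by whom.
    INSERT INTO audit_log VALUES (TG_TABLE_NAME, now(), current_user);
    RETURN NEW;
END;
$$;

-- Attach it to pgbench's standard accounts table (created by pgbench -i).
CREATE TRIGGER accounts_audit
    AFTER INSERT OR UPDATE ON pgbench_accounts
    FOR EACH ROW EXECUTE PROCEDURE record_change();
```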
[00:08:46] The next post is Create View versus Alter Table in PostgreSQL. Now, I haven't used a lot of views in the applications that I've developed. I've used them historically when doing a lot of reporting, and typically that's how I've used them in my applications: to give simplified views of the data for reporting purposes. But it's an interesting post, and it goes into the background of how views are managed, how they handle changes you make to the underlying table, and how the views are able to keep up with that, and then it even lists some cases where they aren't able to handle it, like when you're dropping a column. But in most cases the views can be maintained and kept up to date with changes to the table. So definitely an interesting blog post to check out if you're interested in how that works behind the scenes.
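As a quick illustration of the behaviors discussed, here's a sketch with illustrative table and view names: a view survives a column rename but blocks a column drop.

```sql
-- Illustrative names; a view tracks a renamed column because it
-- references columns internally by position, not by name.
CREATE TABLE customers (id int, full_name text);
CREATE VIEW customer_names AS SELECT id, full_name FROM customers;

ALTER TABLE customers RENAME COLUMN full_name TO name;
SELECT * FROM customer_names;  -- still works after the rename

-- Dropping a column a view depends on is a case views can't handle:
ALTER TABLE customers DROP COLUMN name;
-- ERROR: cannot drop column ... because other objects depend on it
-- (DROP ... CASCADE would drop the dependent view as well)
```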
[00:09:35] And I should say that these last two posts were from the Cybertec-postgresql.com blog.
[00:09:43] The next post is Full Text Search made almost right in PostgreSQL 11.
[00:09:49] And this is from the akorotkov.github.io blog, or Alexander Korotkov's blog. And in it he discusses changes made to PostgreSQL 11. And specifically, I'll actually mention what's listed here: long story short, using PostgreSQL 11 and a RUM index, you can do both a top-N query and a count(*) query for non-selective full text search queries without fetching all the matching results from the heap, so essentially making it much faster. So normally when doing full text search I've added GIN indexes, but he's saying that with some changes added to the RUM index, you're able to add some additional information to it that makes doing top-N queries as well as counting queries a lot more efficient, at least in PostgreSQL 11. So if you do full text search, I definitely suggest you check out this blog post.
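Here's a rough sketch of the RUM usage pattern, assuming the rum extension is installed; the docs table and the query terms are my own illustration.

```sql
-- RUM is a separate extension, not part of core PostgreSQL.
CREATE EXTENSION rum;

-- Illustrative table with a precomputed tsvector column.
CREATE TABLE docs (
    id       serial PRIMARY KEY,
    body     text,
    body_tsv tsvector
);
CREATE INDEX docs_rum_idx ON docs USING rum (body_tsv rum_tsvector_ops);

-- A top-N query ordered by RUM's distance operator <=>, which can
-- be answered from the index without visiting every matching row:
SELECT id
FROM docs
WHERE body_tsv @@ to_tsquery('english', 'postgres & scaling')
ORDER BY body_tsv <=> to_tsquery('english', 'postgres & scaling')
LIMIT 10;
```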
[00:10:54] The next post is actually a YouTube video: it's RailsConf 2018, Postgres 10, Performance and You, by Gabe Enslein. So actually, in Scaling Postgres episode 10, I shared this presentation, I believe it was on SlideShare. But now we see that the YouTube video has been posted. Basically it presents some business scenarios, like, for example, oh, you're seeing poor performance with tables with a lot of rows in them; what's the solution to it? So basically presenting business problems and then identifying features in PostgreSQL 10 that allow you to address them efficiently. So now that we have the full presentation, I definitely suggest you review it, particularly if you've looked at the slides and found them interesting.
[00:11:47] The next post is Understanding Deadlocks in MySQL and PostgreSQL. This is from the Severalnines.com blog, and it basically describes deadlocks and then how to simulate them using PostgreSQL as well as MySQL. Historically, given how I've structured my applications, I have not seen a lot of, or hardly any, deadlocks, but there are certain cases when you're doing manual transactions and special types of locking where you can run into this situation. So it basically describes what they are as well as gives you an example of how one happens and the error that you will see. So if you're seeing deadlocks, or you're just interested in what they are, basically the database can't resolve two updates or deletes that happen concurrently and wait on each other. Definitely a blog post to check out.
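Here's a minimal sketch of the kind of two-session deadlock the post simulates; the accounts table and balances are purely illustrative.

```sql
-- Run these statements in two separate psql sessions, in the order
-- shown; accounts is an illustrative table.

-- Session 1:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;

-- Session 2:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 2;

-- Session 1 (blocks, waiting for session 2's lock on row 2):
UPDATE accounts SET balance = balance + 10 WHERE id = 2;

-- Session 2 (closes the cycle; PostgreSQL detects it and aborts one
-- transaction with "ERROR: deadlock detected"):
UPDATE accounts SET balance = balance + 10 WHERE id = 1;
```

That does it for this episode of Scaling Postgres. You can get links to all the content in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you could subscribe via YouTube or iTunes. Thanks.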