[00:00:00] In this episode of Scaling Postgres, we talk about Postgres 12, window frames, index types, and synchronous replication. I'm Creston Jamison and this is Scaling Postgres, episode 83.
[00:00:16] Alright. I hope you're having a great week. Our first piece of content is actually an announcement that PostgreSQL 12 Release Candidate 1 is released.
[00:00:32] And it looks like the full release of PostgreSQL 12 is set for October 3, 2019. So that would be this Thursday. So this week, it looks like Postgres 12 is being released.
[00:00:46] Related to this release, there's also a timely webinar that was done, called "Webinar: New Features in Postgres 12 Follow Up". This is from 2ndQuadrant.com, and they go through the different topics shown here: an intro to PostgreSQL 12, SQL/JSON, improvements to partitioning, REINDEX CONCURRENTLY, progress monitoring, generated columns, case-insensitive collations, and plans for Postgres 13. So, definitely a webinar to watch. I haven't had an opportunity to watch it yet, but I will be registering to go ahead and look at the replay to get ready for Postgres 12. Definitely a timely piece of content.
[00:01:30] The next post, also from 2ndQuadrant, is "PostgreSQL 12: A Few Special-Case Performance Enhancements", and these are very short and abbreviated. They're basically talking about minimal decompression of TOAST values. TOAST is where values get stored when they're too large to fit in a single record; it's kind of a spillover area. Prior to 12, Postgres decompresses the entire value, whereas in 12 they say, quote, "we only decompress what we need", which can speed up accessing the TOASTed value. So that could be of benefit for very large values that you may be storing. Another improvement is faster float conversion to text. And the third one is parallel query with serializable, so it's now possible to do parallel queries at the serializable isolation level. So just a few additional performance enhancements done for PostgreSQL 12.
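To give a rough idea of that last enhancement, here's a minimal sketch, assuming a hypothetical measurements table large enough to warrant a parallel plan:

    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    -- In Postgres 12 an aggregate like this can use a parallel plan even
    -- under serializable isolation; in 11 and earlier, serializable
    -- transactions disabled parallel query entirely.
    SELECT avg(reading) FROM measurements;
    COMMIT;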
[00:02:28] The next piece of content is "Advanced SQL window frames". This is a great article that I definitely suggest checking out. He says there was a previous article on window functions, and this one talks about window frames, so areas within a window. Here's a quick overview of what window functions are: basically, they allow you to do things like rolling averages and things of that nature. And he has these great graphs that represent what he's talking about. So for example, he's looking at windows by release year. You can see this is a particular window, and it's going to average the values across that window to give you the year average. Same thing for the next window, which is 2016, and the next window, which is 2017. To expand upon his window example, he wanted a window frame example, quote: "for each film, find an average rating of all strictly better films in its release year". So you're basically doing a comparison for each row. He again gives an example here where you have this partition, this window, and then you're looking at the window frames within it. He goes over the different syntax and then talks about the three different modes. There's the rows mode that operates at the row level, essentially, and all the different ways that you can define that frame. He looks at groups mode and again how to define the frame start and the frame end, and then follows up with the range mode. Then he breaks that into some real-world examples, like the sketch below. Now, I of course don't have time to cover all of this in depth, but this is a great post, and I definitely suggest you take the time to check it out if you want to get better with window functions and/or window frames.
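As a minimal sketch of that "strictly better films" query, assuming a hypothetical films table with title, release_year, and rating columns (not necessarily the exact query from the article):

    SELECT title, release_year, rating,
           avg(rating) OVER (
             PARTITION BY release_year
             ORDER BY rating
             RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
             EXCLUDE GROUP  -- drop the current row and its rating peers,
                            -- leaving only strictly higher-rated films
           ) AS avg_better_rating
    FROM films;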
[00:04:22] The next post is "What DjangoCon has to do with Postgres and crocodiles: an interview with Louise Grandjonc of Microsoft". This is from the Citusdata.com blog, and the interview is mostly talking about a presentation that's going to be given, called "Postgres index types and where to find them". So that's mostly what the content is about. But it starts off great with the first three things she's going to be covering in the presentation. First, she wants the audience to understand that Postgres indexes are useful for two reasons: performance and constraints. So you can use a unique index to ensure uniqueness and also get better performance when doing lookups. The second thing is an overview of the different options with indexing: you could use partial indexes, unique indexes, and multicolumn indexes, as well as just standard indexes. And then also know when to use the different index types. So there's the standard B-tree, but also GIN, which is generally used for JSONB or text, GiST, and then BRIN indexes, which can give huge space savings, particularly for data types that are more sequential in nature; there's a quick sketch of these below. Now, going through this, she talks about being a Django developer and also what advice she gives to fellow developers in terms of learning Postgres. So it gives a lot of valuable information for developers that are using PostgreSQL. If you're a developer, I definitely suggest you check out this piece of content to see how you could potentially improve working with PostgreSQL.
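As a quick sketch of those index types, with hypothetical table and column names:

    -- Standard B-tree that also enforces a constraint
    CREATE UNIQUE INDEX ON users (email);

    -- Partial index: only index the rows you actually query
    CREATE INDEX ON orders (created_at) WHERE status = 'pending';

    -- Multicolumn index
    CREATE INDEX ON orders (customer_id, created_at);

    -- GIN, generally used for JSONB or full-text search
    CREATE INDEX ON events USING gin (payload);

    -- BRIN: a tiny index that works well for sequential data like timestamps
    CREATE INDEX ON readings USING brin (recorded_at);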
[00:06:01] The next post is "Synchronous replication is a trap", and this is from Robert Haas's blog at rhaas.blogspot.com. What does he mean by a trap? It means don't just rely on synchronous replication to assure that your data is safe and on two systems. You need to take into account the holistic system that you're developing to ensure that data doesn't get lost. He uses an example where you have a user inputting data into a web application that then talks to a database system, and there are different failure points along that path. Just implementing high availability or implementing synchronous replication won't automatically make sure everything's safe; you have to do more and think more holistically about the system to actually accomplish not losing data. He just wants to make sure that people are using the features appropriately. For example, one thing he says here is, quote, "I have few reservations about the use of synchronous replication for data consistency": basically, setting the master's synchronous_commit to remote_apply and then setting synchronous_standby_names to a value that will cause it to wait for all standbys to respond to every commit. That assures that commits are applied across all the synchronous replicas at the same time, so that you truly shouldn't lose any data; a sketch of those settings is below. But there are of course downsides to that, which he discusses: if one of the replicas happens to go down, now your whole system is down, because it cannot synchronously replicate to that replica. So you have to have a process in place to handle those conditions so that your processing can still continue. He also mentions that he doesn't think it's a big problem to use synchronous replication to control replication lag, basically having a smaller delay when replicating to a replica. But the issue he mostly covers is, quote, "where I think a lot of people go wrong is when they think about using synchronous replication for data durability", so basically a reliable system that doesn't lose data. And that's when he goes into the discussion about looking at it as a holistic system, where this is just one feature that may or may not help you accomplish that overall goal. Overall this is a great blog post, as is almost every blog post he does, so I definitely suggest checking it out.
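As a minimal sketch of that strict configuration, with hypothetical standby names:

    # postgresql.conf on the primary
    synchronous_commit = remote_apply
    # Wait for both named standbys to confirm every commit. Note the
    # downside he describes: if either standby goes down, commits on
    # the primary hang until it comes back or the setting is changed.
    synchronous_standby_names = 'FIRST 2 (standby1, standby2)'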
[00:08:34] The next post is "How to run short ALTER TABLE without long locking concurrent queries". This covers a case he's recently seen where an application had to add a column to a table. The application ran ALTER TABLE ... ADD COLUMN without a default, and everything stopped for many minutes. Basically, there was some long-running query, and when this ALTER TABLE started running, it requested an ACCESS EXCLUSIVE lock and had to wait for that query. And because there's a lock queue, everything else started backing up behind the waiting ALTER TABLE. So what you need to do in this case is use a timeout. Now, he talks about using a statement timeout, but as mentioned in the comments, which he agrees with, what would be more appropriate is the lock_timeout setting. So generally, when you're doing these types of DDL changes, definitely use a lock_timeout of some number of milliseconds or seconds; then if the statement cannot acquire the lock in time, it just errors out and doesn't run the DDL, as sketched below. So definitely something to keep in mind when making changes to your database.
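Here's a minimal sketch of that approach, with a hypothetical table and column:

    BEGIN;
    -- Give up after 5 seconds instead of queueing behind a long-running
    -- query and blocking everything that arrives after us.
    SET LOCAL lock_timeout = '5s';
    ALTER TABLE orders ADD COLUMN note text;
    COMMIT;

If the lock can't be acquired in time, the ALTER TABLE errors out harmlessly and you simply retry later.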
[00:09:45] The next series of posts are all related to security. The first one is very brief, but it talks about implementing transparent data encryption in Postgres. It says this is something that they've been discussing, and now it looks like it's actually moving forward: they're going to be implementing a transparent data encryption method, where the first thing they're going to accomplish is all-cluster encryption. So that's what they're going to start with, and the hope is, quote, that this "can be completed in Postgres 13". So we're already looking to version 13 and what can potentially be in it, and it looks like this is something they're going to try for. And this is from the Momjian.us blog.
[00:10:31] The next post related to security is "Using row level security to make large companies more secure", and this is from cybertec-postgresql.com. They're talking about setting up two tables, a company table and a manager table, defining a relationship between them, and then going through and defining a policy, based on a query of these tables, to determine who has access to what data in the system. They grant various roles and then show how each role is able to query different outputs. Now, one thing they mention is that relying on a query like this means it needs to be run to check the policy every time, so you could get into a performance issue. It's just something to keep in mind if you're looking into and investigating row level security; a rough sketch of the pattern is below.
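As a rough sketch of that pattern (the table, policy, and role names here are hypothetical, not necessarily the ones from the post):

    CREATE TABLE t_company (id int PRIMARY KEY, name text);
    CREATE TABLE t_manager (
      company_id int REFERENCES t_company (id),
      username   text
    );

    ALTER TABLE t_company ENABLE ROW LEVEL SECURITY;

    -- Each manager sees only the companies assigned to them. Note that
    -- this subquery is evaluated as part of every query on t_company,
    -- which is where the potential performance issue comes from.
    CREATE POLICY company_policy ON t_company
      USING (id IN (SELECT company_id
                    FROM t_manager
                    WHERE username = current_user));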
[00:11:21] The next post is "New version of PostgreSQL Anonymizer and more", and this is talking about the Anonymizer, a tool they've developed; this is from taadeem.net. It enables you to anonymize data. So if you have production data, it enables you to take that database, scrub it, and create an anonymized version of it, blanking out names or replacing them with something else, along with email addresses, contact information, things of that nature. They're talking about how, with the GDPR rules in place, this could be important. And he envisions use cases like: if you have a staging system and you want to put production data in it, you can do that if you anonymize it first, so you're not worried about real production data being on a separate system. If you're interested in a tool like this, it's definitely a post to check out.
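Based on the project's general approach of declaring masking rules on columns, usage might look something like this. The exact function names and calls here are assumptions on my part and may differ between versions, so check the project's documentation:

    CREATE EXTENSION IF NOT EXISTS anon CASCADE;

    -- Declare masking rules: replace real values with fake ones
    -- (column and function names below are illustrative)
    SECURITY LABEL FOR anon ON COLUMN customer.name
      IS 'MASKED WITH FUNCTION anon.fake_last_name()';
    SECURITY LABEL FOR anon ON COLUMN customer.email
      IS 'MASKED WITH FUNCTION anon.fake_email()';

    -- Rewrite the masked columns in place across the whole database
    SELECT anon.anonymize_database();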
[00:12:21] And the last piece of content: there was PGConf Brazil 2019, and they've just uploaded a playlist of, it looks like, 22 videos to the Info Cube Brazil YouTube channel. Now, as far as I could tell, all of these presentations are in Portuguese, so I was unable to understand them. But I do know that we have an international audience, so if this is of interest to you, these are definitely videos to check out.
[00:12:51] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.