New Postgres Releases, Privilege Escalation CVE, Chaos Testing, High Availability

[00:00:00] In this episode of Scaling Postgres, we talk about new postgres releases, privilege escalation, CVE chaos testing and high availability. I'm Kristen Jameson and this is Scaling Postgres, episode 228. [00:00:24] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is PostgreSQL 14.513, point 812.1211, point 1710.22, and 15 beta three is released. So quite a few releases. This is part of a regular release schedule, so basically all the recent versions of Postgres have been updated. They also give an FYI that postgres ten will stop receiving fixes on November 10, 2022. So that's coming up rather quickly. So if you're still on postgres ten, you'll want to upgrade to at least eleven, if not further. They mentioned they resolved over 40 bugs with these releases, as well as fixed this security issue here. And finally, this marks the third beta of postgres 15, so you can download that and test that out if you're interested. But if you want to learn more, definitely check out this post from Postgresql.org. The next piece of content is related to that CVE that was fixed in these releases in this article. PostgreSQL Extensions Impacted by CVE 2022 26 25 with Privilege escalation this is from Enterprisedbe.com and it goes into more depth about this vulnerability. So it's rated A 7.1, which is a little on the high side. So you're probably going to want to patch this as soon as you can. And this post works through the vulnerability. Basically, there's two scenarios. You could have a rogue extension that you install that acquires more permissions than it should, or an existing user could set the stage such that if an extension is installed, you would get a Privilege Escalation and it walks through more in depth than the postgres post about this CVE and the Ramifications. Now, if you're not able to patch this relatively soon, there's different mitigation strategies you can use that they go into in this post. One of them is this convenient query to determine if there are any objects that are owned by a user other than the extension owner or a bootstrap super user. And basically, if they're found, you have to do an investigation to see what security impact that has and they go over some different mitigation strategies to handle it if you're not going to be doing the patching. So if you're interested in that aspect, definitely check out this blog post next piece of content. Chaos testing of a postgres cluster managed by the Zolando postgres operator. This is from Koru.com and they're talking about the Zolando postgres operator. So from what I understand, Zolando is a fashion site. So it appears their It department has developed this postgres operator for using Kubernetes to manage postgres installations. And they're using this particular postgres operator to set up a postgres cluster. And they're using their tool co route to monitor it, while they use chaos testing to test different failure scenarios. And see how it reacts. So I believe this is how they set up their postgres cluster using the Zolando operator and they're creating three instances. One is a primary with two Replicas and they have different supporting servers as well as the application. And these lines signify direct traffic. And I believe this is part of the co root application that's monitoring things. Now, normally I wouldn't look at such a product heavy post, but this post is really cool from the fact that they do a test like, all right, let's simulate a Replica failure. So they're going to fail a Replica and they use this network chaos toolkit to do that, but then they have a video representation of what happens. So this is what happens when you isolate a Replica and you can play it and it shows the different connections dropping from this Replica and the active connections from the still existing Replica. So you can see in real time how the operator handles these failure scenarios. And then they go over RA what happens when the primary fails. And they even have a video on that they show you what happens if you have excessive replication lag. And they simulated that by reducing traffic to the Replica by 80%, I believe yes, at 80% loss, which causes it to delay and fall behind and then the existence of a connection pooler. Then you have a pooler where the connectivity is broken and they even do a major postgres upgrade to see how that impacts the cluster. So even though this shows a lot of a particular product, it's fascinating seeing how the operator handles these different failure scenarios. And if you're interested in that, definitely check out this blog post next piece of content, how High availability works in our cloud database. This is from Timescale.com, and they're going over at a very high level how they handle High availability for their cloud solution. Their timescale cloud is what they call it now. Basically, they're highlighting the feature set that kind of already exists with Amazon. AWS like they're emphasizing how they separate their compute nodes from their storage nodes and that if you have a failure of a compute node, you can actually take the storage node and put it on another compute node. Which is how Amazon basically works. And given that the availability of the compute nodes is much lower than the storage nodes, nine times out of ten, if you have a failure, it's going to be in one of the compute nodes and basically there's a scenario to replace them and put the storage node on another compute node. But again, that's pretty much Amazon basics. But how they do that flip over. They do say that they are using Kubernetes for some of this. Now, they didn't mention if they're using a particular operator or how they're using it, but they say there may be a future blog post about it so that may have more detail. Again, this was basically a more high level post. Then they talk about storage and how they have a load balancer that directs where your read traffic is going and write traffic is going and how it directs traffic between different nodes and it looks like it is using the standard wall streaming mechanism. And they talk about what happens when a failure occurs and basically doing a promotion and the last area they cover is backups and how they handle that. So if you want a little bit of insight at a high level of how timescale Cloud manages its databases, you can definitely check out this blog post. [00:06:35] Next Piece of Content PG friday Do's and Don'ts of Postgres High Availability Q and A this is from Enterprisedb.com, and there was a post talking about Postgres High Availability the Do's and Don'ts back at Postgres Build 2021, but there were some questions that weren't addressed, and they're basically going through those unaddressed questions there. One is talking about split brain scenarios, one is talking about the ordering of starting a replica in primary. What order should you start in them? Could that trigger a failover? Talking about different backup commands when you have a standby that's falling a little bit far behind and potentially causing wall to accumulate on the primary using multihost connection strings as well as other failover solutions such as PG Auto Failover and talking about that a little bit and then actually some petroni questions as well for tools for doing high availability. So if you want to learn more about that, definitely check out this blog post. [00:07:35] Next piece of content five minutes of postgres episode 30 postgres performance puzzle linux. Read ahead and tuning effective I o concurrency. This is from pganalyze.com. And this is the performance puzzle that I covered in about the last two episodes of Scaling Postgres. And Lucas basically addresses this in his episode as well and actually talks about how it relates a little bit to the setting of the effective IO concurrency and how that can potentially impact Linux. Read ahead. Now, I will say one thing that I was a little confused by is that he states some of the reasons for the slowness is that the index layout is ineffective. I was a little confused by that because the index will be an ordered structure of those values. I don't see it necessarily being ineffective. But the reason for the slowness is actually because you scan the index and then you need to go to the heap because in this performance example you're counting rows from the heap, not the index. So whether you use the first index or the second index, if you are pulling back the data from that index, it'll be equally fast. The slowness comes when you have a low correlation between the ordering of the index and the heap. So the first index is highly correlated in terms of order with the data on the table. So that's what makes it fast and be able to do essentially a sequential scan. The correlation between the index order in the second column compared to that order on the table is very low. Therefore, doing a search, you can do a fast sequential search on the index, but then when you actually need to go the heap, you're doing random I O all over the place. Now, if you do just an index only scan, counting the rows in the index is super fast. But it's when you go to the heap that things get super slow because it has to do all that random I O. So I think really the issue is that the correlation of the index to the data in the heap in the slow example is very low, which causes the issue. But if you want to learn more about that, you can definitely check out this blog post. [00:09:40] Next piece of content having a less understood SQL clause. This is from smallthingsql.com and basically how I explain having is it's the where clause that happens after a group by because basically you group a set of data and then if you want to further filter it and say only show me the rows where the total is greater than 100. That's what having is used for. It's essentially a where clause for the rows that get generated by a group by. Now, in this example here he comes up with a pretty complex example where he has a couple of tables and to get the data he's interested in without using having he did this very complex four union, really a ton of unions to be able to get the data necessary. Whereas he filters this big long query into this very simple query by introducing the heading clause as well as doing the group by a cube also helped eliminate some of the complexity. But if you want to learn more about that, definitely check out this blog post next Piece of Content queries in PostgreSQL Six hashing this is from Postgres.com and they're talking about hash Joins and how they work in Postgres. And these blog posts on Postgres Pro definitely are super comprehensive. So if you really want to know all there is to know about Hash Joins, definitely encourage you to check out this blog post next Piece of Content the next episode of Postgres FM has been posted and they've been doing a really good job of posting episodes every week. Now before with Postgres TV and Postgres FM it was a little more inconsistent, but now for over a month they've been pretty consistent having weekly episodes. So for now I'll post the episodes link, but definitely check this out. Interestingly, they don't have a link to the Postgres TV YouTube because they are posting these on the Postgres TV YouTube channel as well. But if you're looking for more long form content, you can definitely check out these episodes. The most recent one was on Vacuum. As you can see here the next piece of content. The PostgreSQL Person of the Week is Adam Wright. If you want to learn more about Adam and his contributions to Postgres, definitely check out this blog post and the last piece of content we did have another episode of the Rubber Duck Dev Show this past Wednesday evening. This episode was on open source software experiences and our guest was Chris Oliver, and he's done many projects, but one of them is The Paygem So. It's a Ruby library that helps you extract away different payment providers and provides a common interface for working with them. If you're looking for a more longford developer discussion, we encourage you to check out our show. [00:12:20] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingposgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks.

Show Notes

Episode Transcript

Other Episodes

Episode 140

Replication Conflicts, Postgres Governance, pg_crash, MobilityDB | Scaling Postgres 140

Episode 341

The Future of Postgres Upgrades | Scaling Postgres 341

Episode 369

Move Off Of RDS? | Scaling Postgres 369

New Postgres Releases, Privilege Escalation CVE, Chaos Testing, High Availability | Scaling Postgres 228