Man-In-The-Middle, pg_auto_failover, Lesser Known Features, LZ4 Compression | Scaling Postgres 191

Scaling Postgres

Episode 191 | November 15, 2021 | 00:14:54

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss new releases of Postgres due to a man-in-the-middle vulnerability, the high availability solution pg_auto_failover, lesser known Postgres features and LZ4 compression.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/191-man-in-the-middle-pg_auto_failover-lesser-known-features-lz4-compression/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about man-in-the-middle attacks, pg_auto_failover, lesser known features, and LZ4 compression. I'm Creston Jamison, and this is Scaling Postgres episode 191.

[00:00:23] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "PostgreSQL 14.1, 13.5, 12.9, 11.14, 10.19, and 9.6.24 Released!", and this announcement is on postgresql.org. The primary reason for the releases is two security issues: one is "server processes unencrypted bytes from man-in-the-middle", and the other is "libpq processes unencrypted bytes from man-in-the-middle". So it's both a server and a client issue, where their interactions can allow a man-in-the-middle attack. Now, it primarily looks to only affect you if you're using trust authentication with a clientcert requirement, or using cert authentication. So if you're using, say, MD5 or SCRAM authentication, you should be unaffected by this particular vulnerability. But changes have been made, so it is a good practice to upgrade anyway. The releases also include a number of bug fixes and improvements; it says most of these affect 14, but they may impact prior versions as well.

Now, related to this, there was another blog post called "Postgres MITM21 Vulnerabilities" (MITM being the abbreviation for man-in-the-middle), and it goes into more depth about this particular vulnerability. They do say it is classified as high severity, but it is also high complexity, so certain conditions definitely need to be in place for it to be exploited. This blog post gives a good overview of the vulnerability and its impact, and they even include a query to check whether your pg_hba.conf file has any of the known vulnerable configurations, namely trust combined with a clientcert requirement, or cert authentication alone. So I definitely encourage you to check that out.
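To give a flavor of that check, here is a minimal sketch (not the exact query from the post) that inspects pg_hba.conf for the vulnerable patterns using the built-in pg_hba_file_rules view; reading that view typically requires superuser privileges:

    -- Flag rules matching the vulnerable patterns: cert authentication,
    -- or trust combined with a clientcert requirement.
    SELECT line_number, type, database, user_name, auth_method, options
    FROM pg_hba_file_rules
    WHERE auth_method = 'cert'
       OR (auth_method = 'trust' AND options::text LIKE '%clientcert%');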
Also related to this, there was a YouTube video released called "Securing PostgreSQL From External Attack". This speaks directly to mitigating this sort of attack, because ideally your PostgreSQL server should be on its own internal network, speaking only to very specific servers, and if you need to make any sort of connection to it, you should be going through some kind of proxy or bastion host. So basically, have a well locked down system and network; this is a roughly 43-minute presentation covering some ways to do that.

[00:02:54] The next piece of content is "PgBouncer 1.16.1 released". This is also related to the security fix, because PgBouncer essentially lets clients connect to it as if it were Postgres and then connects to a Postgres server on their behalf, so presumably it had to be updated to handle the security fixes in PostgreSQL. You can check this piece of content for additional information.

[00:03:19] The next post is "An introduction to the pg_auto_failover project", and this is from tapoueh.org. It covers pg_auto_failover, which is now on version 1.6.3. Now, I find this interesting because I would say this is probably the simplest way to set up automatic failover: it tries to automagically handle which instance is the primary and which is the replica, and basically lets you set up a high availability configuration, ideally with three or more instances. The post goes into the details of the project, such as its early history, developing the prototype, and the reason they built it, which was basically customer demand from people who were using Citus. It also covers a bit of the architecture, how it works and why, and how it differs from solutions built on distributed consensus systems such as Patroni with etcd. It's a somewhat different implementation, and as a result it is simpler and less prone to some issues, because to make automatic failover work with something like Patroni and etcd, there is a lot of sophistication you have to be aware of to keep the system up and reliable. pg_auto_failover might be a simpler alternative. It's definitely a long blog post covering a lot of the history and architecture, so if you are looking for a high availability solution, check it out to see if it might fit your needs.

[00:04:55] The next piece of content is "Lesser known PostgreSQL features", and this is from hakibenita.com. Like many of his posts, this is a very long one; I think of it as 18 mini blog posts put together, covering all sorts of things you can accomplish in PostgreSQL, some of them lesser known. I'll list a few: getting the number of updated versus inserted rows in an upsert (not necessarily a feature per se, since it uses the xmax system column to check), granting permissions on individual columns rather than just on a whole table, matching against multiple patterns, finding the current value of a sequence without advancing it, and using COPY with multiline SQL. A lot of quick hits of different features PostgreSQL offers. I highly encourage you to check out this post and skim through it; it doesn't take that long, and you may discover some features and capabilities you were not aware of.
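A few of those tips look roughly like the following sketch; the table, column, and role names here are invented for illustration:

    -- 1. Tell inserted from updated rows in an upsert: the xmax system
    --    column is 0 for freshly inserted rows and non-zero for rows
    --    updated by ON CONFLICT (assumes users has a unique key on id).
    INSERT INTO users (id, email)
    VALUES (1, 'a@example.com')
    ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email
    RETURNING id, (xmax = 0) AS inserted;

    -- 2. Grant permissions on specific columns instead of the whole table.
    GRANT SELECT (id, email), UPDATE (email) ON users TO reporting_role;

    -- 3. Read a sequence's current value without advancing it.
    SELECT last_value FROM pg_sequences WHERE sequencename = 'users_id_seq';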
[00:06:13] The next piece of content is "What is the new LZ4 TOAST compression in PostgreSQL 14, and how fast is it?", and this is from postgresql.fastware.com. In version 14, LZ4 compression has become available as an alternative to the built-in PostgreSQL TOAST compression. This compression applies only to TOAST: when the amount of data you want to insert into a row exceeds the row size limit, it spills over into the TOAST table, where it is compressed by default using pglz. Now LZ4 is available as an alternative compression algorithm in version 14. If you describe a table using \d+ in psql, it shows the compression setting for each column, and you can alter a column to change the compression algorithm, for example setting it to lz4. You do need to be aware that LZ4 support has to be enabled when PostgreSQL is compiled and packaged; I tested this on Ubuntu, and the Ubuntu PostgreSQL 14 package does include LZ4 support by default, so no separate build step is needed there.

If you do set it per column, only newly written data is compressed with the new algorithm. You can use the function pg_column_compression to see what a given row's data is compressed with: in their example, row one had pglz compression, but after they changed the column, the next inserted row used lz4. So I imagine if you want to apply this to a large table, you may essentially need to rewrite all of the data in the table to get the new compression.

But the interesting part of this post is the performance testing. They compared table sizes first and discovered that LZ4 output is slightly larger than PostgreSQL's default compression; its advantage is compression and decompression speed. For example, they checked insert performance, and LZ4 was very close to uncompressed in insert speed, and up to five times faster than the default compression. That could be a huge performance gain if you're inserting very large values that have to go into TOAST. They then checked select performance, and again LZ4 had a speed advantage. They also tried multiple clients, and LZ4 still came out ahead at both the insert and the select level. So from these tests, LZ4 seems to have a big speed advantage when working with TOAST compression. If you want to learn more, definitely check out this blog post.
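The commands involved look roughly like this; a hedged sketch assuming a server built with LZ4 support (the table and column names are invented):

    -- Use LZ4 for a column's TOAST compression from the start...
    CREATE TABLE docs (id bigint, body text COMPRESSION lz4);

    -- ...or switch an existing column; only newly written values use lz4.
    ALTER TABLE docs ALTER COLUMN body SET COMPRESSION lz4;

    -- In psql, \d+ docs shows the per-column compression setting.
    -- Check which algorithm compressed a stored value (old rows still
    -- report pglz; short values may not be compressed at all):
    SELECT id, pg_column_compression(body) FROM docs;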
[00:09:15] The next piece of content is "PostgreSQL 14's enable_memoize for improved performance of nested loop joins", and this is from jooq.org. He's talking about a new feature in PostgreSQL 14, and he includes a tweet from someone that says: wow, memoize in PostgreSQL 14 makes one of my queries using a lateral join 1000 times faster. Okay, so what is this? He explains what memoization is: basically, when a given function always produces the same output for a given input, you can cache that value per input, essentially building a hash map, so you can answer a question very quickly rather than recomputing it every time. So it is a form of caching. By default in Postgres 14 this is turned on; he checked, and the current setting of enable_memoize was on. And if you run EXPLAIN on a query that could use it with a nested loop, you can see the Memoize node in the plan. Now, he ran some benchmarks. With a generic run he didn't see significant variation; there was a slight advantage with memoization, but not much. Then he benchmarked a lateral join, and with memoization off the query took about 3.5 seconds, while with it on the time dropped to about 1 second. That's a pretty significant boost in performance; not 1000-fold like that tweet suggested, but a significant speed-up, and maybe at larger data volumes it could lead to an even greater improvement. He did also try a correlated subquery, which essentially gives you the same results as a lateral join, but he didn't see the same advantage there. So it looks like the optimizer only knows to apply memoization in particular cases, depending on how the query is written. Definitely an interesting feature added in version 14, and if you want to learn more, check out this blog post.
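Checking the setting and spotting the node looks something like this hedged sketch; the tables are invented, and whether the planner actually chooses Memoize depends on statistics and data distribution:

    SHOW enable_memoize;  -- 'on' by default in PostgreSQL 14

    EXPLAIN
    SELECT o.id, cust.name
    FROM orders o
    CROSS JOIN LATERAL (
        SELECT c.name
        FROM customers c
        WHERE c.id = o.customer_id
    ) AS cust;
    -- Look for a "Memoize" node above the inner scan: repeated
    -- customer_id values are then answered from a cache instead of
    -- re-running the inner query each time.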
[00:11:20] The next piece of content is "Generating more realistic sample time-series data with PostgreSQL generate_series". This is from timescale.com, and it's a follow-on to their earlier generate_series post; this one focuses on creating more realistic data: realistic distributions of numbers, realistic text, as well as sample JSON. Now, a lot of the time I use developer tools for generating sample data, like Faker in Ruby (I think Python has something similar). I imagine these SQL-based solutions are not as robust as those libraries, but I bet the performance of generating a lot of data would be a lot higher compared to using those other libraries. So if you're interested in doing that, you can check out this blog post.

[00:12:08] The next piece of content is "Multi-factor SSO authentication for Postgres on Kubernetes". This is from crunchydata.com, and following up on the man-in-the-middle attack, this describes a setup that could have sidestepped it: multi-factor authentication, where you have a certificate but must also provide a password, such as through SCRAM or MD5. The blog post explains how to set up that type of dual authentication mechanism (something you have, a certificate, and something you know, a password), specifically on Kubernetes. So if you're interested in learning how to do this with their Postgres operator on Kubernetes, check out this blog post.

The next piece of content is about the pgagroal connection pool; I'm sorry, but I don't know how to properly pronounce that name. Basically, this is another connection pooler, similar to PgBouncer, that enables connection pooling for Postgres with a focus on performance. The post explains a bit about using it, particularly on Red Hat Enterprise Linux distributions. If you're interested in learning more about that, you can check out this blog post.

The next piece of content is "Find missing crossings in OpenStreetMap with PostGIS". This is from rustprooflabs.com, and it's about the 30-day map challenge going on in November. For day five, he completed a submission using OpenStreetMap data along with PostGIS, and QGIS for visualization. If you want to see how he did that and his implementation, check out this blog post.

The next piece of content: the PostgreSQL Person of the Week is John Naylor. If you're interested in learning more about John and his contributions to Postgres, definitely check out this blog post.

And the last piece of content: we had another episode of the Rubber Duck Dev Show this past Wednesday. This episode was on how to start testing when you don't really have any tests for your project. You can check out the video or the podcast audio if you're interested. We will be having another episode this Wednesday at 8:00 PM Eastern Standard Time.

[00:14:29] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
