Postgres 15 RC2, Shared Buffers, Secure Connections, PGSQL Phriday | Scaling Postgres 236

Episode 236 October 10, 2022 00:19:54
Scaling Postgres

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss the Postgres 15 RC2, optimizing shared buffers, how to secure your database connections and blog posts from PGSQL Phriday.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/236-postgres-15-rc2-shared-buffers-secure-connections-pgsql-phriday/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about Postgres 15 Release Candidate 2, shared buffers, secure connections, and PGSQL Phriday. I'm Creston Jamison, and this is Scaling Postgres, episode 236.

[00:00:24] Alright, I hope you, your friends, family and coworkers continue to do well. We have a lot of content this week. I've been doing this for four years, and I don't think I've ever had this many articles. Normally I pick only the best articles, but there were just a lot of them this week, so I'm going to have to move pretty quickly.

Our first piece of content is "PostgreSQL 15 RC 2 Released". There are just a few changes; unfortunately, it looks like a particular feature from Release Candidate 1 was reverted, but otherwise it's just a few fixes. They're still projected to release it October 13, which is about a week from now, but it may be pushed back to October 20.

[00:01:10] The next piece of content is "Harnessing Shared Buffers and Reaping the Performance Benefits, Part 1". This is from enterprisedb.com. Normally, when you ask how to configure shared buffers, the answer is: make it a quarter of the memory and revisit it if you need to. And frequently, people don't revisit it. Well, they actually ran empirical tests to see what is potentially the best setting for shared buffers today, given the current version of Postgres. In this part one, they addressed more antiquated hardware, for example, machines using magnetic hard disks. They used pgbench, which primarily tests OLTP, did another TPC evaluation using HammerDB, and then ran essentially an OLAP test as well. What they found with pgbench is that the more shared buffers they gave it, up to half the memory, the more performance continued to improve: performance roughly doubled going from a quarter of memory up to half of memory. Similarly with HammerDB, when they were doing the OLTP test with TPC-C, they found a pretty significant increase in performance as shared buffers went up to half the memory; this looks like a fourfold improvement. From their analysis, it came down to how frequently data was in memory: the more often data is kept in memory, the better the performance you'll see with larger shared buffers. But then they tested TPC-H, which is a more OLAP-focused test, and they found the reverse: the smaller the shared buffers, the better the performance. What they found is that this test rarely hit the same rows again, so it was gathering data from all around the disk; it wasn't always in memory and available. So basically, the lower your setting for shared buffers, the better the performance. The conclusion at this point is that, on magnetic disks, it depends: if you have an active working set that you can fit in shared buffers, setting them as high as half of memory seems to give the best performance. If you have more OLAP concerns, where you have so much data that it's not going to fit in memory, then a lower shared buffers setting may give you better performance. They're also going to do future parts looking at more RAM and NVMe devices or SATA drives, so I'm definitely looking forward to the next parts of this blog post. But if you want to learn more, definitely check out this one.
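To make that concrete, here's a minimal sketch of how you might run this kind of experiment yourself; the sizes assume a hypothetical 16GB machine, and the database name is made up:

-- Check the current setting (the default is often quite small, e.g. 128MB).
SHOW shared_buffers;

-- Try the common rule of thumb, a quarter of RAM (4GB on a 16GB box).
-- shared_buffers only takes effect after a full restart, not a reload.
ALTER SYSTEM SET shared_buffers = '4GB';

-- After restarting, benchmark from the shell with pgbench, e.g.:
--   pgbench -i -s 1000 testdb          (initialize roughly 15GB of data)
--   pgbench -c 32 -j 8 -T 300 testdb   (32 clients for 5 minutes)

-- Then repeat with half of RAM and compare transactions per second.
ALTER SYSTEM SET shared_buffers = '8GB';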
[00:03:57] Next piece of content: "The majority of PostgreSQL servers on the internet are insecure", and this one graph essentially tells it all. They found, from a particular service, 820,000 PostgreSQL servers listening on the internet, and over 500,000 of them had no support for connecting via SSL, which seems like semi-insanity to me. Number one, it's a best practice not to put PostgreSQL where it's accessible on the internet. Number two, if you're doing it, you should enable SSL and also enforce certificate authentication, or at least get dedicated certificates for it as well. They also found around 200,000 servers where SSL was supported but not required at all, and another set of around 100,000 where SSL was supported but they weren't quite sure if it was required or not; it might have been. But only a very paltry number were known to require SSL. They describe their testing methodology in the appendix, and they go into all of the data they found. Now, my understanding is that bit.io is a hosted service, so they offer hosted Postgres. But of course, what do you do to correct this? Number one, don't put Postgres out on the internet. Or, if you do, you need to force or require SSL at the server and ideally at the client, as well as use dedicated certificates. Now, to require it on the server, you basically need to go to your pg_hba.conf file, like they talk about here, and only use hostssl. People frequently just use host, which is not going to enforce SSL; you want to use hostssl. Then there are some settings in your postgresql.conf file, including turning on SSL and making dedicated certificates, and there's more information about it here. They even include a link to mozilla.org that tells you how to configure Postgres to set up SSL securely. Now, in terms of the clients, they say to use verify-full, because if you have those certificates set up, then you can connect with sslmode=verify-full and hopefully avoid man-in-the-middle attacks. This post goes into a lot more depth, so if you want to learn more, definitely check it out.
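As a minimal sketch of that server-side hardening, assuming placeholder file paths, network range and hostname rather than their exact recommendations:

-- In pg_hba.conf, use hostssl rather than plain host so that
-- non-SSL connections are rejected outright:
--   hostssl  all  all  0.0.0.0/0  scram-sha-256

-- In postgresql.conf (or via ALTER SYSTEM), turn SSL on and point it
-- at dedicated certificates:
ALTER SYSTEM SET ssl = on;
ALTER SYSTEM SET ssl_cert_file = '/etc/postgresql/server.crt';
ALTER SYSTEM SET ssl_key_file = '/etc/postgresql/server.key';
SELECT pg_reload_conf();

-- On the client, verify the server's certificate to help block
-- man-in-the-middle attacks:
--   psql "host=db.example.com dbname=app sslmode=verify-full sslrootcert=root.crt"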
[00:06:16] Next piece of content: "PGSQL Phriday #001: Two truths and a lie about PostgreSQL". This is from softwareandbooz.com, and this is the start of a monthly community blog post initiative called PGSQL Phriday. They initially called it PSQL Phriday, which confused people: is this only about the command-line client psql? No, this is about PostgreSQL generally, so it's PGSQL Phriday, and this is the kickoff for it. The next five blog posts were written on this topic.

The first article is from Andreas Scherbaum, and he's talking about setting up Postgres initially and how he uses Ansible, having developed playbooks to set up Postgres for himself. These are, in his view, the three most important setup areas, but one of them is a quote-unquote lie that he says you should do, and then he corrects the lie later. The first is changing the most important configuration parameters and restarting the database. The second is using pg_dump for backups. And the third is creating the necessary accounts. So definitely check out this blog post if you want to learn more about those.

The second post is from mydbanotebook.org, and she did it a little differently: her title is two it-depends and one absolutely-always-do-this, and she covers psql. She talks about the psql -E option, which shows you the SQL that gets generated when you run backslash commands in psql. The second is combining different psql options to achieve what you want; she covers a scenario where someone wanted to export to CSV and the options that let you do it in psql. I tend to use the COPY command, but I suppose there are areas where psql could be used. And then she talks about using the -W option to force a password prompt. Now, I'm not going to mention which of these is the quote-unquote lie, but you can check these posts out to determine more about that.

The next post is from andyatkinson.com, and this post is about query stats, log tags and N+1 queries. Query stats means using pg_stat_statements to make an assessment of your server. Log tags means annotating the queries sent to the database with, for example, the application class they came from, so you can identify which queries are being run by which objects, in the case of object-oriented programming, or modules. And avoiding N+1 queries means that when your application needs to pull back a thousand rows, it does one query rather than going and asking a thousand times.

The next post is from scarydba.com, and he's talking about backups and restores using pg_dump and pg_dumpall, and how to use pg_dump for individual schemas. So you can check out that blog post. And the final post is from sqlasylum.wordpress.com; his post on truths and lies is all about logical replication and the different configuration parameters you should set to get logical replication working for your database. So definitely check these blog posts out if you want to learn more about Postgres.

Next piece of content: "Future-proofing SQL with carefully placed errors". This is from hakibenita.com, and he's talking about trying to avoid silent errors, focusing on conditionals where the ELSE hasn't been defined. If a value you don't expect comes through and you don't have an ELSE, it will kind of silently fail. So, for example, instead of if statements, in SQL's case he's using a CASE expression: when it's this value, do this; when it's that value, do that. But he doesn't have an ELSE. Now, you can add an ELSE and have something take that path, but what he advocates is essentially throwing an exception if a particular case isn't handled. And how do you do that in SQL? His proposal is to create a dedicated function, which he calls assert_never, whose only job is to raise an exception. So in your ELSE, you call that function, passing in the value being assessed, and he casts it to an int. Therefore, if it ever sees a value that doesn't match any of your WHEN branches, it throws this exception and alerts you: hey, this is something that hasn't been handled. So this is a pretty interesting technique. He also follows up with how you could do this if you don't want to use functions in your database. He tried a division by zero, but there are some cases where that still won't do the job, so it's not a good solution. He then tested casting the value to an int, and that seemed to work, although you do run into issues when casting from non-text types. But if you want to learn more about this technique, definitely check out this blog post.
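Here's a minimal sketch of that technique; the accounts table and status values are made up for illustration, and unlike the post, this version leaves the value as text rather than casting it to an int:

-- A function whose only job is to raise an exception, surfacing
-- unhandled values instead of letting them fall through silently.
CREATE FUNCTION assert_never(v anyelement) RETURNS text AS $$
BEGIN
    RAISE EXCEPTION 'Unhandled value: %', v;
END;
$$ LANGUAGE plpgsql;

-- Use it as the ELSE branch of a CASE expression; any status that
-- is not explicitly handled now errors out loudly.
SELECT id,
       CASE status
           WHEN 'active'   THEN 'Active'
           WHEN 'inactive' THEN 'Inactive'
           ELSE assert_never(status)
       END AS status_label
FROM accounts;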
Next piece of content: "VACUUM does not shrink my PostgreSQL table". This is from cybertec-postgresql.com, and he's talking about how, when you insert and delete data, it doesn't always free up disk space. People delete a lot of data and expect their disk space to be reclaimed, and it isn't; even running VACUUM doesn't necessarily do it. This post explains why. He creates a test table, inserts three rows each as part of two transactions, and shows you the hidden columns that help define the visibility of those rows to other clients using the Postgres system. You can see that the first set of three rows is part of one transaction and the second set of three is part of another, and the ctid is basically the physical location where the system can find that data in the heap. Then he updates the last row, and you can see its ctid goes from (0,6) to (0,7). He updates it a few more times, and now the ctid is (0,9). So essentially, as he says here, quote, "everything between five and nine is full of dead rows which have to be removed". So he runs a VACUUM on the table, and those rows do get vacuumed up, but that doesn't reclaim space. Postgres knows the space is available, so it can reuse it if it needs to, but it's not going to relinquish it back to the operating system, because there's no way to free up a middle part of a database file back to the operating system. However, you can do it at the end of the file, which he talks about next. He covers this exception to the vacuum rule, showing the relation size: if you delete all the rows from the table and then run a vacuum, you can see that the relation size does shrink. You could also remove just half the rows, as long as they're at the end of the file; it's hard to know exactly where that is, but you should see your file shrink as a result as well. And of course, if you really want to shrink and compact all the available space in the file, there's VACUUM FULL, and he also mentions another solution called pg_squeeze. So if you want to learn more about vacuum and how to free up space, definitely check out this blog post.
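If you want to see this behavior for yourself, here's a minimal sketch along the lines of his demo; the table is made up, and the exact ctid values can vary:

-- Create a small table and look at each row's physical location.
CREATE TABLE t (x int);
INSERT INTO t VALUES (1), (2), (3);
SELECT ctid, x FROM t;              -- (0,1), (0,2), (0,3)

-- An UPDATE writes a new row version; the old one becomes a dead tuple.
UPDATE t SET x = 30 WHERE x = 3;
SELECT ctid, x FROM t;              -- the updated row now sits at (0,4)

-- VACUUM marks dead tuples reusable but normally keeps the file size.
SELECT pg_relation_size('t');
VACUUM t;
SELECT pg_relation_size('t');       -- typically unchanged

-- The exception: empty pages at the end of the file can be truncated.
DELETE FROM t;
VACUUM t;
SELECT pg_relation_size('t');       -- now the file can actually shrink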
Next piece of content: "Tuning a Postgres BRIN index". This is from jcarson.github.io, and they're talking about ways to use a block range index efficiently for their use case. Block range indexes basically index ranges of blocks, so they're particularly good for append-only data, like time series data where you're storing dates. And they're really good when you want, say, aggregates, where you're looking at a whole bunch of rows and aggregating that data. Not so much if you're looking to pull up a single value; a B-tree index is going to beat it in that case. The other advantage is that, because they only index the ranges, they can be super small relative to, say, a B-tree index. But they go into detail about how to assess the best size to make them, because determining how many rows fall within a particular range is important. They also talk about the importance of things like enabling autosummarize. So if you're considering using a BRIN index, definitely check out this blog post, as well as this next piece of content, which is "Five minutes of Postgres episode 38: When to use BRIN indexes in Postgres, tuning pages_per_range and determining correlation with pg_stats". This is from pganalyze.com, and Lukas goes into more depth about when it's a good time to use a BRIN index, doing things like looking at pg_stats to check the correlation between the physical layout of the data and particular columns, i.e. which columns would be good candidates for a BRIN index. So definitely check out this piece of content as well if you want to learn more about that.
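As a minimal sketch of what this looks like in practice, assuming a hypothetical append-only events table; the pages_per_range value is purely illustrative, since picking it well is exactly what these posts are about:

-- BRIN stores only a small summary (e.g. min/max) per range of blocks,
-- so it stays tiny as long as the column tracks physical row order.
CREATE INDEX events_created_at_brin
    ON events
    USING brin (created_at)
    WITH (pages_per_range = 32, autosummarize = on);

-- A good fit: aggregating over a time range in append-only data.
SELECT count(*), avg(amount)
FROM events
WHERE created_at >= '2022-09-01'
  AND created_at <  '2022-10-01';

-- Check physical-versus-logical correlation for candidate columns;
-- values near 1 or -1 suggest a column suits a BRIN index.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'events';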
Next piece of content: "PostgreSQL 15: new features to be excited about". This is from percona.com, and it's a post that runs down some noteworthy features of Postgres 15. Number one is MERGE. Number two is row and column filters in logical replication. Next is server-side compression and client decompression in pg_basebackup: as opposed to the client doing the compression after pulling all the data, the server itself compresses it as it sends it to the pg_basebackup client. Next is the new JSON log format for logging, a number of performance enhancements in terms of parallelism and sorting algorithms, and also the new technique of storing the server stats in shared memory. So if you want to learn more about any of those, definitely check out this blog post.

Next piece of content: "postgres-wasm" by Snaplet and Supabase. This is from supabase.com, and WASM is WebAssembly, so this is basically running Postgres in your browser. This is similar to something Crunchy Data did, but this is an open source version of the project, which they developed in collaboration with Snaplet. They talk about the numerous features available and why they did it. They mention documentation tutorials and demos, potential offline uses, like working with data or doing data analysis offline without having to be connected to a main database, as well as testing and dev environments. They go into a lot of depth about how they got this running, similar to the previous Crunchy Data post that I talked about on a previous episode of Scaling Postgres. So if you want to learn more about that, definitely check out this blog post.

[00:16:58] Next piece of content: "Learning PostgreSQL internals". This is from cleverelephant.ca, and here he's addressing someone who wants to get started learning about the internals of Postgres, maybe to become a contributor, or maybe to develop some extensions. He goes through a process and good resources for learning more about the internals. Related to that, the next piece of content is "Queries in PostgreSQL: 7. Sort and merge". This is from postgrespro.com, and their blog posts always go into intense detail about the internals of Postgres, discussing how it works in depth. So I would say a lot of Postgres Pro's blog posts are excellent resources for learning more about the internals of Postgres.

Also related to that is "How to add more environments to the Postgres CI". This is from citusdata.com, and he's talking about how, if you're getting started with developing Postgres itself, you're going to want the CI system set up, and this walks through that process. So check out this blog post if you want to learn more about that. Now, if you want to learn about Postgres more generally, maybe this next blog post will help you with that: "Learning PostgreSQL". This is from proopensource.it. They're talking about resources they think are good for more of a beginner getting started with Postgres, in terms of tutorials, learning portals, different types of online courses and books, and of course the Postgres documentation itself. So these are all sorts of different resources to help you learn more about Postgres.

There was another episode of Postgres FM this week. This one was on 102 query optimization, with 102 being, as they put it, the next step of query optimization work, namely, quote, "the difficult topic of how to verify your changes won't make other things worse". So if you want to, listen to this episode, or you can click here to watch the YouTube video as well.

[00:18:58] The next piece of content: the PostgreSQL Person of the Week is Joseph Sharina. If you're interested in learning more about Joseph and his contributions to Postgres, definitely check out this blog post. And the last piece of content: we did have another episode of the Rubber Duck Dev Show this past Wednesday evening. This one was on a beginner's journey with code, with Julie. In this show we talked about developers getting started with programming and the different resources that could potentially be available to them. If you're interested in that content, we definitely welcome you to check out our show.

[00:19:29] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 227

August 07, 2022 00:08:53

Researching Performance, Postgres Survey, pgAdmin Survey, Long Running Queries | Scaling Postgres 227

In this episode of Scaling Postgres, we discuss research into a performance...

Listen

Episode 190

November 08, 2021 00:12:32

Hello Babelfish, Planner Deconstruction, Exist & Not Exist, Fun With SQL | Scaling Postgres 190

In this episode of Scaling Postgres, we discuss the open sourcing of Babelfish, deconstructing the Postgres planner, when to avoid exist & not exist...

Listen

Episode 155

March 08, 2021 00:17:16

Citus Open Source, Time Series Performance, Subscripting Updates, target_session_attrs | Scaling Postgres 155

In this episode of Scaling Postgres, we discuss Citus 10 open source, time series performance in native Postgres, using subscripting for updates and new...

Listen