Episode Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about a community acquisition, saving terabytes of space, WAL archiving, and vacuum analyze tips. I'm Creston Jamison, and this is Scaling Postgres, episode 134.
[00:00:17] Alright, I hope you, your family, friends, and coworkers continue to do well. Our first piece of content is "EDB completes acquisition of 2ndQuadrant; becomes largest dedicated provider of PostgreSQL products and solutions worldwide". So, this is an announcement from the enterprisedb.com website, or EDB, and as you can tell, they have acquired 2ndQuadrant, and this post goes over some of the details as to why they did it. They give you some highlights of the deal in terms of the number of employees, customers, locations, things of that nature. There's also an announcement on their blog about how EDB became the leader in the Postgres market, with some discussion about why they decided to do this now. So I find this a definitely noteworthy item, even though it's not specifically about how to scale.
[00:01:15] You know, I believe this probably has an impact both positively and negatively for the Postgres community. Having a larger organization advocating for Postgres is beneficial, but I can't help thinking that we lose a little bit of the community's diversity when something like this happens. Either way, feel free to check out these two posts.
[00:01:35] The next piece of content is "PostgreSQL, a community project". This is from 2ndquadrant.com, and he's talking about how PostgreSQL is truly a community open source project. He says open source governance models usually have three different structures: one, the project is run by an individual, or maybe two, but not many more; two, it's run by one company; or three, it's actually community led. He talks about each of these models and how a project can move from one to another and back, but how it's very hard to actually become community led, because it's difficult to go from one company to suddenly having multiple companies and individuals supporting the project, or, if you start with one or two individuals, to grow a community with multiple organizations as well as people assisting on it. He talks about how Postgres is definitely the community-led variety and how that has a lot of benefits. And he mentioned this because when Postgres 13 came out, he noticed a lot of people discussing it as a successful community-led open source project. So it's definitely an interesting perspective, and it's kind of why I mentioned that one of the potential negatives of the acquisition of 2ndQuadrant by EDB is that maybe we lose some of that community diversity. But definitely some interesting news items happening this week.
[00:02:54] The next piece of content is "PostgreSQL at scale: saving space basically for free". This is from Braintree Product Technology on Medium, and basically they're talking about ordering columns to get a space savings. Now, he says that with this technique they've been able to save about 10% of their disk space, and they have over 100 terabytes of data, so essentially it looks like they've saved maybe about ten terabytes. As he mentions, quote, this technique isn't revolutionary and it's been well documented by others such as 2ndQuadrant, EDB, GitLab, et cetera. A lot of it comes down to the padding that some types need in order to preserve alignment boundaries. They give a good example of this: a bigint wants eight-byte alignment, so if you have an integer followed by a bigint, the bigint can only start at the next eight-byte boundary, and the space after the four-byte integer has to be padded, so it essentially becomes wasted space. Whereas if you order the bigint first, followed by the integer, you avoid that padding; in their example that saves about a quarter of the space. So they're using this technique, as well as a few other ones that they mention here, such as handling text and binary types in a certain way, and handling defaults, not nulls, and primary keys a little bit differently, basically putting primary keys and columns that are likely to contain data ahead of the others. This is how they've been able to achieve these space savings. They actually packaged it up into a Ruby gem, because they use Ruby, called pg_column_byte_packer. For those people who use Rails as their application framework, what it does is order the columns in the migrations you run against the database, so that when you create a new table it places things in the proper order to try and conserve space. They even went one step further and used pg_dump to rewrite the tables of their whole database to reorder the data in them. Now, normally I wouldn't anticipate wanting to do that for just a 10% savings, but when you're talking about hundreds of terabytes and you're going to save tens of terabytes doing it, then maybe it makes sense. Definitely an interesting way to save space in your database, and I've put a small sketch of the alignment idea just below. So if you're interested in learning more, you can check out this post.
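To make the alignment padding idea concrete, here is a minimal sketch you could run yourself; the table and column names are just made up for illustration and aren't from the Braintree post:

    -- Unfavorable ordering: each 4-byte integer is followed by 4 bytes of padding
    -- so that the next bigint can start on an 8-byte boundary.
    CREATE TABLE padded (a integer, b bigint, c integer, d bigint);

    -- Favorable ordering: the 8-byte columns come first, then the 4-byte columns,
    -- so no alignment padding is needed between them.
    CREATE TABLE packed (b bigint, d bigint, a integer, c integer);

    -- Load the same data into both and compare the on-disk sizes.
    INSERT INTO padded (a, b, c, d) SELECT 1, 2, 3, 4 FROM generate_series(1, 1000000);
    INSERT INTO packed (b, d, a, c) SELECT 2, 4, 1, 3 FROM generate_series(1, 1000000);
    SELECT pg_size_pretty(pg_total_relation_size('padded')) AS padded,
           pg_size_pretty(pg_total_relation_size('packed')) AS packed;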
The next piece of content is "PostgreSQL WAL archiving and point-in-time recovery". This is from highgo.ca, and it's a post about exactly that: how the write-ahead log records all of the database's activity to do crash recovery and maintain durability and consistency, and how, for backup purposes, you're going to want to archive those WAL files, because they enable you to restore the database to a specific point in time. They talk about the different archiving settings you can use to do that. One is archive_mode, which is generally set to on or off, though there's also an always setting. Then there's archive_command, which is essentially a command or shell script that saves the WAL files wherever you want them. And then there's wal_level, which controls how much information goes into the WAL: just a minimal level, enough for a replica, or enough to do logical replication from that WAL stream as well. Then they go through setting that up, as well as doing a backup and a point-in-time recovery, and I've sketched the relevant settings below. So if you want to learn more about how to do that, definitely check out this post.
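As a rough sketch of the kind of settings that post walks through, you could set them like this; the archive directory path is just an assumption, and the first two settings need a server restart to take effect:

    -- 'replica' writes enough WAL for archiving and physical replicas;
    -- use 'logical' if you also want logical replication from this WAL stream.
    ALTER SYSTEM SET wal_level = 'replica';
    ALTER SYSTEM SET archive_mode = 'on';
    -- %p is the path to the WAL file and %f is just its file name.
    ALTER SYSTEM SET archive_command =
      'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f';

For the restore side you would point restore_command at that same archive location and set a recovery target, which is the walkthrough the post itself covers.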
[00:06:15] The next piece of content is "PostgreSQL vacuum and analyze: best practice tips". This is from 2ndquadrant.com. They're talking about vacuum and analyze: basically, vacuuming up dead tuples after an update or delete, as well as keeping statistics up to date with analyze. The first tip is don't run manual vacuum or analyze without reason. Basically, they say it just wastes resources, and they didn't mention any reason other than resource usage as to why you wouldn't want to run them too frequently. The second is to fine-tune the autovacuum threshold, so basically define how often you want autovacuum kicked off for a particular table; the default values are good for smaller tables, but once your tables start getting larger, you're going to want to adjust them. Next, do the same thing for the autoanalyze threshold to determine how often tables are analyzed for statistics purposes. Then he mentions fine-tuning the autovacuum workers, meaning how many workers to have. The thing to keep in mind with autovacuum workers is that they share a pool: the default is three, and you can increase it, but that's not actually going to make your autovacuuming faster, because there is a common limit set by vacuum_cost_limit, or autovacuum_vacuum_cost_limit, and that limit is shared across all of those workers. So just increasing the number of workers won't make things go faster; increasing the limit is the best thing you can do to actually make vacuums run faster, or more frequently, essentially. I'll sketch an example of this kind of tuning below. So if you're wanting to learn some more best practice tips on vacuum and analyze, definitely check out this post from 2ndquadrant.com.
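Just to illustrate, here's a rough example of what that tuning can look like; the table name and the numbers are only examples, not the post's specific recommendations:

    -- Per-table thresholds: with the default 20% scale factor, a billion-row table
    -- only gets vacuumed after ~200 million dead rows, so tighten it for big tables.
    ALTER TABLE big_events SET (
      autovacuum_vacuum_scale_factor = 0.01,
      autovacuum_vacuum_threshold = 1000,
      autovacuum_analyze_scale_factor = 0.005
    );

    -- More workers alone won't speed things up because they all share the same
    -- cost limit; raising the limit is what lets autovacuum do more work.
    ALTER SYSTEM SET autovacuum_max_workers = 5;           -- needs a restart
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;  -- picked up on reload
    SELECT pg_reload_conf();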
[00:07:51] The next piece of content is "PostgreSQL FDW authentication changes in PostgreSQL 13". This is from percona.com, and they're mentioning two changes that were made to the Postgres foreign data wrapper. The first is that a superuser can now permit non-superusers to establish a passwordless connection through the foreign data wrapper. Previously only superusers could do it, but now a superuser can specify that it works for non-superusers as well. And secondly, you can do authentication via an SSL certificate, and they even go through the process of how to generate a certificate and key to set this up for use in Postgres. So if you're interested in that, you can check out this post.
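To give a flavor of what those two changes look like in SQL, here is a minimal sketch; the server, host, user, and certificate paths are made-up placeholders, and these statements assume you're running them as a superuser:

    CREATE EXTENSION IF NOT EXISTS postgres_fdw;
    CREATE SERVER remote_pg FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'remote.example.com', dbname 'appdb');

    -- Allow a non-superuser's mapping to connect without supplying a password.
    CREATE USER MAPPING FOR app_user SERVER remote_pg
      OPTIONS (user 'app_user', password_required 'false');

    -- Alternatively, authenticate the mapping with an SSL client certificate,
    -- which PostgreSQL 13 now allows in the user mapping options.
    ALTER USER MAPPING FOR app_user SERVER remote_pg
      OPTIONS (ADD sslcert '/path/to/client.crt', ADD sslkey '/path/to/client.key');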
[00:08:35] The next post, also from percona.com, is "PostgreSQL 13 new feature: dropdb --force". So normally you have to have no connections to a database if you're going to drop it. However, a force option has been added, so that now, even if there are existing connections, it will go ahead and drop the database. Definitely a high-risk thing to do, but this new feature is now available in Postgres 13.
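As a quick example of what that looks like (the database name here is just a placeholder), you can use the new dropdb --force command-line flag they mention, or the equivalent SQL:

    -- Terminates any existing connections to appdb and then drops it (PostgreSQL 13+).
    DROP DATABASE appdb WITH (FORCE);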
[00:09:00] The next piece of content is "A quick look at PostgreSQL 13 Release Candidate 1 query performance". He ran a bunch of queries that he had, mentions his test setup, and looked for performance differences between 12.4 and Release Candidate 1 of Postgres 13. Basically there were positives and negatives; on balance, nothing was dramatically faster in the queries he was testing, and essentially his conclusion was, quote, so in the end, some test items were a bit slower, others faster, and most importantly it seems like there are no grave problems. So that's basically some good news. This is from cybertec-postgresql.com. Similarly, there's another post called "PostgreSQL 13 upgrade and performance check on Ubuntu/Debian: 1.6GB/s random reads". This is from crunchydata.com, and he did an analysis using a bunch of different queries that he has, run against 13.
[00:09:59] It looks like he did not use the release candidate but the actual release, and it looks like he checked both 13 and 12. He found one example where some things were a little bit slower, but for the most part, across all the queries he checked, it again seemed to come to the same conclusion: some were a little faster, some a little slower, but overall everything looked pretty good. So no great improvements reported in either of these two posts, but basically there were little to no regressions. Good news to know, and if you want to learn more about the details of these posts, definitely check them out.
[00:10:32] The next piece of content is "PostGIS vs. Geocoder in Rails". They're talking about Ruby on Rails, which is a Ruby application framework, and a Ruby gem called Geocoder that helps you do things like geocode cities, states, and countries into longitude and latitude and then calculate distances between them. They were comparing it to actually using PostGIS, because Geocoder does not use PostGIS to do its calculations. So they ran through the process of installing PostGIS into their environment, and they got the activerecord-postgis-adapter gem, which enables the Rails database layer to support PostGIS-specific data types, such as its point type; PostGIS tends to use points and things of that nature, whereas the Geocoder gem just uses floats for longitude and latitude. Then they talked about building some helper classes to do some of these calculations and find nearby records. They compared Geocoder and PostGIS and got pretty much the same performance, although it seems like PostGIS will be more accurate because it actually takes into account the curvature of the earth, whereas Geocoder does not. But I would imagine that should only matter if you're doing really long-distance calculations; for shorter distances it won't matter as much. Then they looked at finding records within a bounding box with PostGIS and Geocoder and compared those. Then they looked at some things that actually require PostGIS: if you're looking for records inside of a specific polygon, as opposed to just a bounding box, that's not something the Geocoder gem does, and you have to reach for PostGIS to do it. They have an example of doing that type of query, as well as finding related nearby records; there's a small sketch of what those queries can look like below. So if you use Ruby on Rails and you're interested in learning more about actually using PostGIS as opposed to something like Geocoder, definitely check out this blog post.
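For a flavor of what the PostGIS side of those queries can look like in plain SQL, here is a small sketch; the table, columns, and coordinates are placeholders rather than anything from their post, and they assume a geom column stored as a point in SRID 4326:

    -- Nearby records within 5 km of a point; casting to geography makes the
    -- distance meters and accounts for the curvature of the earth.
    SELECT id, name
    FROM places
    WHERE ST_DWithin(
            geom::geography,
            ST_SetSRID(ST_MakePoint(-122.42, 37.77), 4326)::geography,
            5000);

    -- Records inside an arbitrary polygon, the kind of query the Geocoder gem
    -- can't do on its own.
    SELECT id, name
    FROM places
    WHERE ST_Contains(
            ST_GeomFromText('POLYGON((-122.52 37.70, -122.35 37.70, -122.35 37.83, -122.52 37.83, -122.52 37.70))', 4326),
            geom);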
[00:12:38] The next piece of content is "Pgpool-II 4.2 features". This is from highgo.ca, and it's basically covering the list of features that are coming in the next major release of Pgpool-II, which is 4.2. They talk about things like the logging collector, support for logging disconnections, health check improvements, health check stats, per-backend pool stats, LDAP authentication support, and the snapshot isolation mode, as well as others. So if you're interested in using Pgpool-II, definitely check out this post.
[00:13:11] The next post, also related to Pgpool-II, is "How to configure SCRAM and MD5 authentication in Pgpool-II". There was a previous post talking about authentication in Pgpool-II in general, whereas this one focuses on how to set up SCRAM and MD5 authentication with it. This is from b-peng.blogspot.com, so check out this post if you're interested in doing that. And the last piece of content is the PostgreSQL Person of the Week: it's Thomas Munro. So if you're interested in learning more about his contributions to Postgres, definitely check out this blog post. That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.