New Releases, Reconnect After Failure, Bablefish, Troubleshoot Disk & Memory

[00:00:00] In this episode of Scaling Postgres, we talk about postgres releases, reconnecting after failure, BabbleFish, and troubleshooting disk in memory. I'm Kristen Jameson and this is Scaling Postgres, episode 152. You alright? I hope you, your friends, family and coworkers continue to do well. Our first piece of content is PostgreSQL 13.212, point 611.1110, point 16, 9.6 and 9.5 have been released. This is from Postgresql.org and the reason for the release primarily are two security issues. One is a partition constraint violation, errors, leak values of denied columns. So basically it looks like some columns can leak through on certain operations that someone shouldn't have the permission to view. And also single column select privilege enables reading all columns. So again, kind of column leakage. This latter one looks like it only affects PostgreSQL version 13 and the other between versions eleven and 13. But overall there's been a number of bug fixes and improvements that have been done to all the releases. And it's always good to look over these because some of them require an additional work. So for example, there was an issue that was fixed with just indexes and then they suggest you to reindex those just indexes. Or there were certain cases where create index concurrently had issues. If you're using prepared transactions and if you are an installation that uses those, then you should re index any concurrently built indexes previously. So go ahead and consult this post to look at all the different changes that have been made and resolved in postgres. [00:01:51] The next piece of content is reconnecting your application after a postgres failover. This is from Citusdata.com and they're talking about how when you're setting up a postgres high availability solution, one thing that you should also consider in addition to the server infrastructure is your client or your application side. And make sure that it can handle those disconnects. Because when a primary server fails over to a secondary server, you are going to get dropped connections, it doesn't transparently handle it off. So your application needs to handle that. And it discusses some techniques of doing that and even gives you a way to do some testing. But in terms of some tools that you can use, one is if you're not using something like Pgpool or PG bouncer that have some techniques to handle transitions between servers, you could use the multi host connection strings that are part of the libpq library. Now, even if your application doesn't use libpq directly, it may rely on libraries that do use that. So for example, they're mentioning that the postgres drivers in Python, Ruby, PHP or even node JS actually use libpq under their standard libraries. But there are others that use a more direct implementation and you need to check if they support these multi host connection strings. And basically it enables you in the host connection string to be able to specify multiple nodes and it will try one after the other to make a connection. So if Node One is the primary and it goes down, if it tries a reconnect and it's unable to reach Node One, it'll try Node Two. In addition, you can also set target session attributes and you can specify that this particular connection for this client should be Read Write. So if one of these nodes it connects to only supports read only connections, it will then go to the next node. So this is a simple way you can set up a way for a client to be able to try and find the PostgreSQL primary, say, after a Failover. Now, of course they did mention you can use PG Bouncer and PG Pool, and there's a number of different ways that you can handle that. But then what it talks about is a way to have your application being prepared to handle those types of transition failures. Now, Cytus Data has developed a PG Auto Failover, which is an open source solution for doing Auto failing over or a high availability solution for Auto Failing over your Postgres instance. And they've actually packaged it up into a docker container that enables you to run it very simply in your development environment and test out how your application behaves with a PG Failover with this. Now, of course you wouldn't use this docker container in production, but it gives you a basis to do some testing to make sure that the client side of the application can handle these types of transitions. But of course, you can choose to implement PG Auto Failover for your production area or use another solution. So if you're interested in learning more about this and the capabilities of PG Auto Failover, definitely check out this post from Citusdata.com. [00:04:53] The next piece of content is BabbleFish, the elephant in the room. This is from PostgreSQL Fund, and this is a discussion piece about the presence of Babelfish. So BabbleFish is quote, an endpoint to PostgreSQL that understands the SQL Server Wire Protocol tabular data stream so it can interpret things like TSQL commands and things for SQL Server. I love this quote that he says here. Quote what this means is that BabbleFish will be able to impersonate an SQL Server database. Applications may be able to run unchanged believing that they are connecting to SQL Server when they will actually be connecting to PostgreSQL. BabbleFish. Now, from my understanding, AWS developed this for their Aurora database, which is based upon Postgres but is not Postgres, but they have pledged to open source it. And this post talks about that possibility. And basically it's trying to rally the PostgreSQL community to help support this and to try to implement it as an extension to postgres and make changes to postgres to more easily support this type of extension that gives you this SQL Server compatibility because he doesn't want to get into this situation where they have to create a fork of postgres to support this or they just use it for Aurora. I suppose and potentially don't offer it for PostgreSQL. So I found this a very interesting post and I think you should take time to review it to understand kind of his perspective and why he thinks the PostgreSQL community should help and assist BabbleFish to be able to work better with PostgreSQL. So if you're interested you can check out this post. [00:06:36] The next piece of content is Troubleshooting performance Issues due to Disk and Ram. So this is from Higo CA and it basically talks about a number of tools you can use to diagnose performance issues as they relate to say, disk or memory issues. So the first thing he mentions is top in order to check out CPU utilization, checking out memory using the Free command and the disk space with the DF command. But if you want to look more into disk performance, he mentions three other tools here DSTAT, Iotop and Systat, as well as a few other tools such as iostat, SAR, DSTAT and also reviewing the Proc Mem info to get more detailed information from what Free gives you. So if you want to learn some more Linux tools to help you diagnose disk or memory issues as it relates to operating postgres, you can definitely check out this post. [00:07:29] The next piece of content starting with PG where is the config? This is from Dep and he's talking about you've been introduced to a new postgres database and you need to find out where its configuration is. And by configuration he basically means the postgresql.com file, the PG underscore hba.com file and maybe the PG underscore identcomp file. Now if you have access to a. psql terminal you can do show config file and it will show you the file location of where that configuration file is at. Or you can do show. HBA file or show. Ident file and it gives you the directory location, he says. But what if you don't have access to a psql prompt? Well, you can actually take a look at the process running on the server. So if you do a PS command, you can actually pull out the running postgres process and it tells you where the data directory is located as well as the configuration file. So you can use that to be able to see what its configuration is. Now the other thing to keep in mind that there is also the PostgreSQL auto.com file and there may be some configuration in there that's typically located in the root of the data directory. So if you want to learn a little bit more about where to find different configuration in postgres, you can check out this post. [00:08:43] The next piece of content also from the same website is why is their database named Postgres? And basically when you bring up a cluster, it comes with three databases template Zero, template one and postgres. So template one is basically the template from which all new databases are created and you can make alterations to this template that will then be implemented in the databases you create. Now, template zero exists in case anything happens to template one. So you can recreate template one from template zero. But of course, what is this Postgres database? Because if you look in it, do a backslash D, it has no objects in it, and basically it's described as a way just to be able to connect to some database, to execute some commands, even to create a new database. So you can connect to the Postgres database to be able to generate commands. Because sometimes if you connect to template one, then there's certain commands that don't work and he does some different tests connecting his different users so you can see what the different behavior is. And there's also some comments below as well that discusses it. So if you're interested in that, you can check out this blog post. [00:09:53] The next piece of content exploring SQL command changes in postgres 14 this is from Higo CA, and they're listing different changes at the SQL level that have happened to Postgres. The first is that the as keyword to give a new name for a column is now no longer necessary. So before you had to do say select LoC as Analyze, well now you can just say select LoC Analyze and it will interpret that in postgres 14 correctly. So you no longer need the as keyword. It's optional, although there are some that still need to be explicitly set because they're reserved keywords. The next is adding support for leading trailing trimming of byte array values. So that's a new feature allowing the current role in the granted by command allowing or replace when you're working with triggers. So before you had to drop a trigger and then create a new one, well, this now allows create or replacing a trigger, which is a great addition, and also support for reindexing concurrently on partition tables, which it did not do previously. So if you're interested in some of these SQL changes coming to postgres 14, definitely check out this post from Higo CA. [00:11:06] The next piece of content is PostgreSQL Toast data Corruption error, unexpected chunk number. And this is from fluca 1978 GitHub IO. Now he's describing an issue where data has been corrupted. Now I personally haven't seen this and he actually says down here that quote, so far I've only experienced human caused damages. So PostgreSQL itself while running he hasn't experienced Toast corruption, but apparently someone has gone into the data directory of Postgres and has altered something or some program has run and impacted the data directory. That's where he sees these corruption changes. Well, he's developed a function that can actually identify and find out where these are and he's placed it in a GitLab repository here. So he goes through the process of testing it out and actually corrupting a database. So definitely do this in a test system. But having done that, you can then use the function to be able to identify the data file that has the issue. So if you're interested in that, you can check out this blog post. [00:12:12] The next piece of content deep PostgreSQL thoughts. The Linux Assassin. This is from Crunchydata.com and he's basically talking about the out of memory killer. There are various different configurations that you do to try to avoid the out of memory killer on your postgres system. One of the main ones is setting the overcommit memory to two and then adjusting your over commit ratio as well to make sure you're using the optimum amount of memory. Failing that, there is a way to change the out of memory score adjustment, so that's also a possibility. And then it goes into there are host level out of memory killer mechanics as well as C group level, so you can apply some of these out of memory parameters at the C group level. Then he talks about why it's important to avoid this and basically you don't want to bring down your database or require restart because that has its own host of issues. And he goes into more detail of a lot of this here. But the post then turns to Kubernetes and how Kubernetes actively sets VM over commit memory to one so it doesn't turn it off as the setting of two, which is the recommendation. It also has different C group out of memory behavior and you can't really change the score adjustment and where a swap memory can help reduce some of this over committed memory by using swap, he says that Kubernetes turns off the swap. So running postgres in Kubernetes has some particular challenges and he goes into detail with some of these and gives you a few options on Mitigation. And he also says that they're actively working to try and make these issues be minimized in future versions in the future. So if you're interested in this content, you can check out this post from Crunchydata.com. [00:13:52] The next piece of content. Also from Crunchydata.com is Kubernetes Pod tolerations and postgres deployment strategies. So if you want to learn a bit more about Kubernetes and its new Pod Tolerations capabilities with regard to postgres and their upgraded PostgreSQL operator, you can check out this blog post. [00:14:10] The next piece of content petrone Environment Setup PostgreSQL High Availability for Windows and it discusses how you can use a graphical installer to set up patroni on Windows. So if you're interested in that, you can check out this post from CyberTech postgresql.com. [00:14:28] The next post, also from CyberTech, is catchment areas with PostgreSQL and PostGIS. So if you want to learn more about these catchment areas, definitely check out this blog post. [00:14:40] The next piece of content is check. PG Backrest 2.0 has been released. This is a monitor for the PG Backrest Backup tool, so if you're interested in learning more about this, you can check out this post from Pgstuff GitHub IO and the last piece of content is the PostgreSQL Person of the Week is Hubert Lubachowski. So if you're interested in learning more about Hubert and his contributions to Postgres, definitely check out this blog post that does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks. [00:15:27] You close.

Show Notes

Episode Transcript

Other Episodes

Episode 245

ENUMs vs Check Constraints, Faceting With Roaring Bitmaps, Better Scaling, In DB Business Logic | Scaling Postgres 245

Episode 90

Chaos Order, Pub Sub, Petabyte Scale, PgBouncer Connections | Scaling Postgres 90

Episode 230

Stats Collector Gone, Handling Latency, PG In The Browser, Future High Availability | Scaling Postgres 230

New Releases, Reconnect After Failure, Bablefish, Troubleshoot Disk & Memory | Scaling Postgres 152