pg_receivewal, Application Failover, pg_checksums, pgBouncer | Scaling Postgres 87

Episode 87 October 28, 2019 00:12:39

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss the benefits of pg_receivewal, how to set up simple application failover, the new pg_checksums and setting up pgBouncer.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/87-pg_receivewal-application-failover-pg_checksums-pgbouncer/

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about pg_receivewal, application failover, pg_checksums and pgBouncer. I'm Creston Jamison, and this is Scaling Postgres, episode 87.

[00:00:21] Alright, I hope you're having a great week. Our first piece of content is "Never lose a PostgreSQL transaction with pg_receivewal", and this is from cybertec-postgresql.com. This article is talking about pg_receivewal. Basically, your PostgreSQL database is constantly producing WAL files, or write-ahead log files, and if you have enabled archiving, you need to keep those around so that you can restore your database up to a point in time. Now, the typical way to do that is to archive them with the archive_command. As they mention here, that command might copy each file to a local drive and then send it somewhere else, send it to network-attached storage such as NFS, secure-copy or rsync it somewhere, or hand it to backup software. Basically, you eventually want to get the WAL off of the database server onto some other storage medium, so that it is safe in case something happens to the database that generated it. Now, one thing they mention, under the section on when archive_command isn't good enough, is that you can still lose some committed transactions, because archiving waits until a whole WAL file is full before it archives it, and by default each WAL file is 16 megabytes. So the current file could be partially filled, and you could be missing transactions if your system goes down and you only have what archive_command has produced. That is something the pg_receivewal utility addresses. It is designed to stream WAL to a separate location. It is not itself a database; it uses streaming replication to read the WAL stream and write out WAL files wherever it is running. An added benefit is that it also writes partial files: the file currently being filled by the replication stream has a .partial extension. You should also note that it is called pg_receivewal as of version 10; prior to version 10 it was called pg_receivexlog. The section here about partial files notes that pg_receivewal is an alternative to WAL archiving that avoids the gap between the current and the archived WAL location, so that is one extra benefit. I've used it to easily get WAL files off of the primary database server, because you can set up a dedicated system to run pg_receivewal that automatically receives the WAL from, say, your primary database system. That also leaves your primary database with more resources for other purposes, since WAL archiving is handled by a separate system. They say that by default the replication with pg_receivewal is asynchronous. You can make it synchronous, but if you do, WAL has to be written to the pg_receivewal location before a transaction can commit, so it can slow down your write performance, and if anything happens to that pg_receivewal process, your system won't accept any commits at all, which essentially means you're down. To avoid this type of problem, they suggest running at least two synchronous pg_receivewal processes. This is a great post that covers pg_receivewal and some of the benefits it has over archive_command, so if you're not yet using it, it's definitely a blog post to check out.
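To make that concrete, here is a minimal sketch of running pg_receivewal (not taken from the article itself); the host name, user, slot name and archive directory are illustrative assumptions:

    # Create a replication slot so the primary retains WAL until it has been
    # received (run once; the user needs the REPLICATION attribute).
    # Host, user, slot and directory names here are made up for illustration.
    pg_receivewal --create-slot --slot=wal_archive \
        -h primary.example.com -U replicator

    # Stream WAL continuously into a local directory; the file currently
    # being filled keeps a .partial extension until it completes:
    pg_receivewal -D /var/lib/wal-archive --slot=wal_archive \
        -h primary.example.com -U replicator

    # Add --synchronous to make commits wait on this receiver, with the
    # write-performance trade-off discussed above.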
[00:03:55] The next post is "Seamless application failover using libpq features in PostgreSQL", and this is from percona.com. This is talking about the ability of libpq, and by extension clients such as psql, to accept multiple hosts in a connection string and try each one in turn. That gives you a way to do application failover, because a lot of application environments use libpq to connect to Postgres. You can also specify whether the connection you're making must go to a read-write server or whether any server will do. They give an example here with three servers, a primary and two standbys. You can build a connection string like this, with a comma between each host you want to contact, and then specify target_session_attrs=read-write. What it's going to do is run the command on the first read-write server it finds; you can see it connects to the host ending in .10. The first one listed in the string ends in .20 and the second in .10, which is the read-write server, the primary, so that's how it knows to connect there. Secondly, you can connect to any server for reads. That example uses target_session_attrs=any, and it selects the first host in the list. If something went wrong with one of these servers, it would of course connect to the second one, and they show this by stopping PostgreSQL on the .10 server: with the same connection string, you can see it automatically connects to .20. Now, they note you will see some delay while it tries to reach the first host before it eventually connects to the second one. They then give examples in Python and PHP of how you can use these connection strings in application environments, and of course the feature exists in others as well; in those examples you can see them specifying target_session_attrs=read-write too. This has been a native libpq feature since PostgreSQL 10, so it gives you a bare-bones way to do application failover. If this is of interest to you, definitely a blog post to check out.
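As an illustration of the connection-string pattern the post describes (the host addresses here are made up), libpq tries the listed hosts in order:

    # Connect only to a server that accepts writes, i.e. the primary:
    psql "host=192.168.80.20,192.168.80.10 port=5432 dbname=postgres target_session_attrs=read-write"

    # Accept whichever listed server answers first, e.g. for read traffic:
    psql "host=192.168.80.20,192.168.80.10 port=5432 dbname=postgres target_session_attrs=any"

If the first host is down, the client pauses while that connection attempt fails, then moves on to the next host in the list.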
[00:06:15] The next post is "Auto failover with PostgreSQL 12", and this is from the tech community at microsoft.com. They're talking about pg_auto_failover. With the release of PostgreSQL 12 and the changes to the recovery.conf file, utilities such as this need to be updated, and they have done so: their version 1.0.5 works with PostgreSQL 12. This seems to be a relatively simple solution for doing auto failover. They talk a little bit about what it can do currently, as well as what they have planned for pg_auto_failover's future. So if you're looking for a failover solution that is compatible with PostgreSQL 12, this is definitely one to check out.

[00:07:00] The next post is "pg_checksums 1.0 released", and this is from credativ.com. This is a utility that verifies, activates or deactivates data checksums in PostgreSQL instances. [00:07:15] Now, there is already a pg_checksums utility in PostgreSQL, but this one allows online verification of checksums; that's pretty much its primary feature, so you don't have to bring down the database to run the checksum process. And it's compatible with all supported versions of PostgreSQL, as opposed to just the most recent one, where those changes were made to the in-core pg_checksums. It also adds progress reporting during operation and I/O rate limiting. So if you want a more robust way of doing checksums with PostgreSQL, definitely a utility to check out.

[00:07:52] The next post is "pgBouncer tutorial: installing, configuring and testing persistent PostgreSQL connection pooling", and this is from enterprisedb.com. This is the first of what they say will be a multi-part series on using pgBouncer and handling database failovers and things of that nature. This first post is relatively simple: install pgBouncer, connect it to a Postgres instance, and then see what happens if you bounce the network connection. Can pgBouncer still reconnect without the client losing its connection? A relatively simple getting-started post, but I'm interested to see what's coming in the second and third installments of this blog post series.

[00:08:41] The next post is "Monitoring PostgreSQL clusters in Kubernetes", and this is from crunchydata.com. They're talking about the PostgreSQL Operator, developed by Crunchy Data, which manages PostgreSQL clusters on Kubernetes, and they show how you can set up monitoring using pgMonitor, an open source solution that uses Grafana, with Prometheus as a data store, to track reporting information for PostgreSQL. This post goes into how to set this up for a set of PostgreSQL clusters in Kubernetes, so if that's of interest to you, definitely a blog post to check out. And if you're looking at monitoring solutions in general, they do have a link to pgMonitor as a way to build something up relatively quickly; I believe by default it works most easily with CentOS and Red Hat Linux.

[00:09:39] The next post is "A beginner's guide to formatting dates in SQL", and this is from the Panoply.io blog. It's a very simple post covering timestamps, years, how to store certain dates, converting dates into strings, extracting certain parts of a date, and truncating to certain parts of a date. So if you're interested in learning more about the date/time functions in PostgreSQL, definitely a blog post to check out.

[00:10:08] The next post is "PostGIS 3.0.0 is released". Now that PostgreSQL 12 has been fully released, they have released PostGIS 3.0.0. So if you're interested in PostGIS, definitely a project to check out.

The next series of posts is a little bit older, but they're talking about text search in Postgres. The first post is "Using PostgreSQL to create an efficient search engine", and it compares the different ways you might want to search. For example, if you're looking for a prefix in a fuzzy query, you can use a B-tree index, and they show how to specify those searches. If you're looking for a suffix in a fuzzy query, you can search using a reversed-column expression and a B-tree index. If you want a prefix and/or suffix in a fuzzy query, you can use the pg_trgm extension with GIN indexes, or you can use GIN and RUM indexes for full text search. The post then goes into a very comprehensive discussion of full text searching in general: how words are broken apart, defining search syntax, sorting algorithms, and other topics. So if you want more information about text searching in Postgres, definitely a post to check out.
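As a rough SQL sketch of the index patterns that first post walks through (the table and column here are hypothetical, and text_pattern_ops is only needed when the database doesn't use the C collation):

    -- Hypothetical table for illustration.
    CREATE TABLE docs (id bigserial PRIMARY KEY, body text);

    -- Prefix fuzzy search: a B-tree index satisfies LIKE 'abc%' patterns.
    CREATE INDEX ON docs (body text_pattern_ops);
    SELECT * FROM docs WHERE body LIKE 'abc%';

    -- Suffix search: index the reversed column, then reverse the pattern.
    CREATE INDEX ON docs ((reverse(body)) text_pattern_ops);
    SELECT * FROM docs WHERE reverse(body) LIKE reverse('%xyz');

    -- Infix / fuzzy search: the pg_trgm extension with a GIN index.
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE INDEX ON docs USING gin (body gin_trgm_ops);
    SELECT * FROM docs WHERE body LIKE '%abc%';

    -- Full text search: a GIN index over a tsvector expression.
    CREATE INDEX ON docs USING gin (to_tsvector('english', body));
    SELECT * FROM docs
    WHERE to_tsvector('english', body) @@ to_tsquery('english', 'postgres & scaling');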
Follow-on posts in that series cover the RUM index and full text search using PostgreSQL. One compares storing data as an array versus in a text field, using a GIN index or a RUM index, and sees how performance changes with different types of queries. The next post in the series is on using the built-in and custom ranking algorithms of PostgreSQL for full text searches; this goes into ranking algorithms using a tsvector or a multidimensional array. And the last post is on term frequency statistics in PostgreSQL full text searches, that is, how frequently or infrequently a particular term appears in a body of text. So if you're interested in these areas of Postgres, these are definitely posts to check out.

[00:12:16] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 229

August 22, 2022 00:11:49

Postgres Playground, PG14 Internals, DB Corruption, Anti-Join | Scaling Postgres 229

In this episode of Scaling Postgres, we discuss a new Postgres playground, a book about PG14 internals, how to corrupt your database and using...

Episode 227

August 07, 2022 00:08:53

Researching Performance, Postgres Survey, pgAdmin Survey, Long Running Queries | Scaling Postgres 227

In this episode of Scaling Postgres, we discuss research into a performance...

Episode 32

October 01, 2018 00:11:58

Sysadmin Concerns, Power of Indexing, pgbouncer Monitoring | Scaling Postgres 32

In this episode of Scaling Postgres, we review articles covering sysadmin concerns, the power of indexing, pgbouncer monitoring and pg_prewarm. To get the show...
