[00:00:00] In this episode of Scaling Postgres, we talk about ANSI schmancy, split brain, performance scenarios and parallelism. I'm Creston Jameson, and this is Scaling Postgres episode 10.
[00:00:20] Our first article is ANSI Schmancy: How I Learned to Stop Worrying and Love Postgres-isms. This was actually a presentation done at the recent PostgresConf in Jersey City, New Jersey, and it can be found on Sehrope Sarkuni's github.io blog, because it was Sehrope Sarkuni who gave the presentation. What this article is about is that you can choose to treat your database in an agnostic way, meaning you store your data in Postgres just as you would in MySQL or Oracle and use ANSI-compliant SQL for everything, so that potentially, in the future, you could transition to another database. However, this presentation talks about how to stop worrying about that and love all the features and power that come with using Postgres-specific features. It's a really great presentation that covers a lot of different features of Postgres, especially for developers, and how to use them. Here are just a few of the interesting things he mentions.
[00:01:28] Support for dollar quoting: instead of escaping everything inside single quotes, you can use dollar quotes. He talks about a hotel booking example using range types and exclusion constraints to make sure you don't double-book a room.
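As a rough sketch of what he's describing (my own illustration, not code from the talk; the table and column names are hypothetical), dollar quoting and an exclusion constraint over a range type might look something like this:

  -- Dollar quoting: no need to escape single quotes inside the string
  SELECT $$It's easier to write O'Brien this way$$ AS example;

  -- Hotel booking: prevent double-booking by excluding overlapping
  -- stay ranges for the same room
  CREATE EXTENSION IF NOT EXISTS btree_gist;

  CREATE TABLE booking (
      room_id  int NOT NULL,
      guest    text NOT NULL,
      stay     tstzrange NOT NULL,
      EXCLUDE USING gist (room_id WITH =, stay WITH &&)
  );

  -- The second insert for the same room with an overlapping range fails
  INSERT INTO booking VALUES (101, 'Alice', '[2018-05-01, 2018-05-03)');
  INSERT INTO booking VALUES (101, 'Bob',   '[2018-05-02, 2018-05-04)');  -- error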
[00:01:42] He talks about filtered aggregates: as opposed to using a complex CASE statement, you can use filtered aggregates to do these types of queries. He also talks about foreign data wrappers and how they can connect to Redis or Oracle or other database systems. So it's very, very comprehensive, and I've just touched the surface of what's there, but it's definitely a presentation you should review.
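Again as an illustrative sketch (hypothetical table and columns, not the example from the talk), a filtered aggregate replaces the classic CASE-inside-an-aggregate pattern:

  -- Old style: CASE expression inside the aggregate
  SELECT count(CASE WHEN status = 'shipped' THEN 1 END) AS shipped,
         count(CASE WHEN status = 'pending' THEN 1 END) AS pending
  FROM orders;

  -- FILTER clause: same result, easier to read
  SELECT count(*) FILTER (WHERE status = 'shipped') AS shipped,
         count(*) FILTER (WHERE status = 'pending') AS pending
  FROM orders;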
[00:02:08] Now, there is a YouTube video of the talk that someone put up; it's a manual recording of the session. So if you want to watch the presentation itself, you can check out this YouTube video, and I'll include a link in the show notes.
[00:02:24] The next article is GitLab crawling back online after breaking its brain in two: Database replication snafu took down three out of five PostgreSQL servers. This is at The Register (theregister.co.uk), which is where I saw the initial report. It discusses an issue GitLab was very recently experiencing, where they had a split brain problem, and it includes links (again, I'll include these links in the show notes) to where they discuss the issue, in the infrastructure issues area on GitLab itself. They say there was a database failover that was accidentally performed, leading to a split brain problem. If you're unfamiliar with this, it's basically that you have a primary database, but suddenly you promote one of the replicas. Now you have two primaries, and some of your traffic may be talking to the old primary while some is talking to the new one. So you have transactions potentially going to both, and how do you resolve those? What I found interesting about this is that it goes into depth and has discussions on how they're dealing with the situation and what steps they're taking.
[00:03:42] So it's definitely an in-depth, fly-on-the-wall view of what's going on. Related to this, they also released a Google Doc that describes the issue, which you may want to review as well. So if you are responsible for operations, you might review this and potentially find out some things that are beneficial to you.
[00:04:03] The next post is actually a presentation: PostgreSQL 10 Performance and You, by Gabriel Insulin, and I will include the link in the show notes, of course. This was particularly interesting because he actually goes over different scenarios. For example, in a scenario where you're collecting quotes, he discusses encountering a particular performance issue, and a solution to that would be using the native table partitioning that's part of version 10. He discusses another scenario where using full text JSON searching in Postgres 10 could help resolve the issue. So he basically has business cases and then discusses how Postgres 10 can help resolve them. He also talks about using parallel queries for particular queries hitting the database. So, basically, taking a business case and then applying a new feature in Postgres 10 that could help solve it. It's definitely a presentation I suggest you check out.
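To give a sense of the native table partitioning feature he's referring to, here's a minimal sketch with made-up table names (not the exact example from the presentation):

  -- PostgreSQL 10 declarative (range) partitioning
  CREATE TABLE quotes (
      id         bigserial,
      quoted_at  timestamptz NOT NULL,
      body       text
  ) PARTITION BY RANGE (quoted_at);

  CREATE TABLE quotes_2018_04 PARTITION OF quotes
      FOR VALUES FROM ('2018-04-01') TO ('2018-05-01');

  CREATE TABLE quotes_2018_05 PARTITION OF quotes
      FOR VALUES FROM ('2018-05-01') TO ('2018-06-01');

  -- Rows are routed to the right partition automatically
  INSERT INTO quotes (quoted_at, body) VALUES (now(), 'a quote');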
[00:05:08] The next article, related to parallelism, is actually called Let's speed things up, and this is from the Data Egret blog (dataegret.com). It discusses the new parallel features; a few were added in version 9.6 and more in version 10. He discusses several different parameters that you can set, and there's even a very useful graph to help you understand what these parameters affect, in terms of max_worker_processes, max_parallel_workers_per_gather (the individual processes doing the parallel work for a query) and max_parallel_workers. He even has a suggestion on how you can configure these for a given size of database server; here he uses an example with 32 CPU cores and SSDs. One thing to keep in mind if you're changing these configuration settings is that you're not really getting anything for free. Whenever you utilize more CPUs or CPU cores for parallel operations, there's less available for something else. So if you're already running high on CPU, 60, 70, 80%, and you're hoping this is going to magically give you faster queries, it all depends on what kind of load you're experiencing. You may execute certain queries faster in parallel, but it may be at the expense of concurrency for non-parallel queries. It's definitely something you'll have to experiment with, given your workload, to see what settings work best.
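As a hedged illustration of the kinds of parameters discussed (the values are placeholders, not recommendations; tune them for your own hardware and workload, and the table name is hypothetical):

  -- Inspect the current parallelism-related settings
  SHOW max_worker_processes;
  SHOW max_parallel_workers;              -- PostgreSQL 10+
  SHOW max_parallel_workers_per_gather;

  -- Example settings for a many-core machine (changing
  -- max_worker_processes requires a restart; the others take
  -- effect after a reload)
  ALTER SYSTEM SET max_worker_processes = 16;
  ALTER SYSTEM SET max_parallel_workers = 8;
  ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
  SELECT pg_reload_conf();

  -- Check whether a query actually gets a parallel plan (look for Gather nodes)
  EXPLAIN (ANALYZE) SELECT count(*) FROM some_large_table;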
[00:06:50] So definitely a blog post to check out if you're considering using more of the parallel features that PostgreSQL offers.
[00:06:59] The next post is PostgreSQL's fsync() surprise. This is an article discussing an interesting issue with the Linux kernel and how it handles writes to disk. PostgreSQL relies upon the operating system to actually commit items to disk; it doesn't do direct writes, it relies on the operating system (if it runs on Linux, the Linux kernel) to do it. But there are some cases where errors can occur and not get reported back, which means you could potentially get data corruption. This goes into a lot of detail, with a lot of quotes from, I believe, the mailing lists where people are discussing the issue and how to potentially address it. So if you're responsible for ops, again, this is an interesting article to look at just to understand the situation. Long term, moving to direct I/O would give Postgres greater control, but there's a quote here that says it would be "a metric ton of work" to do that and not rely on the operating system. So it's definitely an interesting issue that's been brought to the forefront.

The next article is Getting started with Patroni, and this is from the ops.com blog. Patroni is a tool that allows PostgreSQL servers to operate in high availability configurations, so it can automatically fail over from a primary to a running replica. This post goes over how to set up Patroni, and it says you do need a distributed configuration store to use Patroni; it indicates it supports ZooKeeper, etcd and Consul, and in their example they're using etcd. They discuss how to install it: installing Postgres and setting up the configuration files so that it can automatically trigger a failover when needed. So if you're considering automatic failover for your PostgreSQL replication instances, definitely check out this article.

The next article is called ora_migrator: moving from Oracle to PostgreSQL even faster, and this is on the Cybertec (cybertec-postgresql.com) blog. They're describing a tool or utility called ora_migrator. It basically uses an Oracle foreign data wrapper to let a Postgres server connect to Oracle, copy over system tables and then copy over data, as it indicates, as part of a single transaction. So it's basically a way to migrate from an Oracle database to a PostgreSQL database. Now, it assumes it's just going to be copying over tables, maybe some primary and foreign keys, some indexes, things of that nature that it recreates; not so much the stored procedures that exist in Oracle, which you would have to handle manually, but this tool can get you a lot of the way there. (I'll show a rough sketch of the foreign data wrapper approach it builds on at the end of this segment.) So if you're currently using Oracle and considering migrating to PostgreSQL, it's definitely a blog post to check out.

The next blog post is Cloud SQL for PostgreSQL now generally available and ready for your production workloads. Much as I reported a number of weeks ago that Azure's PostgreSQL offering had gone into general availability, now it appears that Google has done the same. So if you use Google Cloud or you're looking for a hosting platform for your PostgreSQL database, you can check them out.
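Here is that rough, hypothetical sketch of the foreign data wrapper approach underlying a tool like ora_migrator. This is generic oracle_fdw usage I'm assuming for illustration, not the tool's own commands; the server name, schema and credentials are made up:

  -- Install the Oracle foreign data wrapper and point it at an Oracle instance
  CREATE EXTENSION oracle_fdw;

  CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw
      OPTIONS (dbserver '//oracle.example.com:1521/ORCL');

  CREATE USER MAPPING FOR CURRENT_USER SERVER oradb
      OPTIONS (user 'scott', password 'tiger');

  -- Pull in foreign table definitions for an Oracle schema
  IMPORT FOREIGN SCHEMA "SCOTT" FROM SERVER oradb INTO public;

  -- Copy the data locally (a migration tool would also recreate
  -- constraints and indexes on the local copies)
  CREATE TABLE emp_local AS SELECT * FROM emp;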
[00:10:49] And the last article is actually an in-depth tutorial that I put together on PostgreSQL replication monitoring. I'd done a number of tutorials on setting up replication in different ways, and I went ahead and did a tutorial about monitoring, or what you want to keep track of, using the system views that PostgreSQL provides. So I go over those, show you how to track different timings, and explain what some of the different columns mean in terms of extracting the lag or delay between a master and its replicas, and what to watch out for when you have replication active. Probably the most important points are that, if you're using replication slots, the replication slot should be active for any active replica, and that you should have an entry in pg_stat_replication for each replica. So if you use streaming replication and want a bit more information about monitoring, I encourage you to check it out.
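For a flavor of the kinds of checks the tutorial covers, here is a minimal sketch using the PostgreSQL 10 system views (not the exact queries from the tutorial):

  -- One row per connected replica; on PostgreSQL 10 the lag columns
  -- (write_lag, flush_lag, replay_lag) report timing directly
  SELECT application_name, client_addr, state,
         sent_lsn, replay_lsn,
         write_lag, flush_lag, replay_lag
  FROM pg_stat_replication;

  -- If you use replication slots, make sure every slot for a live
  -- replica shows active = true, otherwise WAL can pile up
  SELECT slot_name, slot_type, active, restart_lsn
  FROM pg_replication_slots;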
[00:11:48] That does it for this episode of Scaling Postgres. You can get links to all the content presented in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.