Scaling Advice, Vicious Cycle, Improve Observability, Addressing Corruption | Scaling Postgres 150

Episode 150 | January 31, 2021 | 00:16:57

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss scaling advice, a vicious cycle of WAL writes, how to improve observability, and how to address database corruption.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/150-scaling-advice-vicious-cycle-improving-observability-addressing-corruption/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about scaling advice, a vicious circle, improving observability and addressing corruption. I'm Creston Jamison and this is Scaling Postgres, episode 150.

[00:00:21] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "Postgres scaling advice for 2021". This is from cybertec-postgresql.com, and in a nutshell, most of the advice is: don't prematurely optimize. If you want to scale, don't start out building some huge distributed database system. Just start with a single server, because Postgres can get you a long way on that single server. The post covers things such as: all distributed systems are inherently complex and difficult to get right, so the more you try to scale out or become distributed, the more complicated things get. He mentions that with a good server, a single PostgreSQL instance can easily do hundreds of thousands of transactions per second. He did some read-only tests on his local system and got 32,000 transactions per second, and on another workstation he saw around 45,000 write transactions per second. So he says a single node can typically do tens of thousands of write transactions per second, that a single Postgres instance can easily handle dozens of terabytes of data, which is true, and that a single-node instance is literally bulletproof as far as data consistency is concerned. But to help achieve this, he says, you need to declare your constraints so that your data stays in a good state, don't fool around with the fsync or synchronous_commit settings, so that things get committed to disk appropriately, and use good disk systems, because we're actually going to cover a post in a little bit that talks about corruption, and one of the main ways corruption happens is through hardware issues. He also covers how to be prepared for when you do need to scale. So if you need to scale out, what should you do? First, don't be afraid to run your own database; you can migrate to the cloud later if that's something you feel you need to do. Second, don't make the mistake of centering your entire data architecture around one huge table; be sure to use plenty of tables to spread things out. And thirdly, make sure you bake in a proper sharding key in case that is a direction you have to go. In other words, make sure you could partition pretty much all of your tables by a shard key, because if all of your tables include that key, then presumably you could shard your data that way. This post doesn't have a lot of specific scaling advice, since that's covered in other posts; it's more of a thought piece on not prematurely optimizing your database. So if you're interested in learning more, you can check out this post.

[00:02:59] The next piece of content is "The vicious circle of huge writes". This is from mydbanotebook.org, and they're talking about a cycle of WAL writing that you can get into: if you have a lot of write activity, a lot of WAL files are being written, which can cause checkpoints to occur more frequently, which of course triggers more full-page image writes, which gives you more WAL. So more WAL leads to more frequent checkpoints, which leads to more WAL being generated, which leads to more checkpoints. It's a vicious circle, and then you have autovacuum kicking in when you're updating a lot of rows, for example, which also causes more WAL to be written, further propagating the cycle. There's a graph in the post that depicts it: huge writes on the database in a short time cause a lot of WAL to be written; this triggers checkpoints more frequently, which causes more WAL to be written because of the full-page image writes; and while this is going on, it also triggers autovacuum to run more frequently, which causes still more WAL to be written. In terms of how to solve it, the number one thing she advocates is looking at what your application is doing. For example, if you have a lot of updates, is there a way to reduce them? Maybe you insert most of this data and then update it less frequently, or architect your writes a little differently. The other thing you can do is increase your max_wal_size so that checkpoints don't happen as frequently, or change your checkpoint_timeout or other checkpoint settings so that you checkpoint less often, which gives you fewer full-page image writes. You could also potentially do some tweaks to autovacuum. So if you're interested in learning more about this, you can check out this post.
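As a rough sketch of the kind of checkpoint tuning mentioned above (the values below are placeholders chosen to illustrate the mechanism, not recommendations from the post), you could let more WAL accumulate between checkpoints so that fewer full-page images get written:

    -- Illustrative only: spread checkpoints out so they are triggered less often
    -- by WAL volume, which reduces full-page image writes.
    ALTER SYSTEM SET max_wal_size = '8GB';
    ALTER SYSTEM SET checkpoint_timeout = '30min';
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;
    SELECT pg_reload_conf();

The trade-off is longer crash recovery, since more WAL has to be replayed after a crash.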
[00:04:48] The next piece of content is "Configuring PostgreSQL for Observability", which is about being able to determine what's going on in your database. The first thing they mention is changing your log_line_prefix to add more information, such as the application name of the client, the username, the database being connected to, and the client IP address, so you can potentially identify the host and things of that nature. This gives you more visibility into what's going on when, say, a particular error gets written to the logs. The next thing they mention is logging your slow queries using log_min_duration_statement; they propose one second, so any query longer than a second gets logged. If that produces too much log volume, you can use sampling in more recent versions of Postgres, and they show how you could sample, say, 25% of the slow statements. In addition to that, they mention logging your locks, for example turning on log_lock_waits, and logging autovacuum, specifying a duration so that when a vacuum takes too long you can see what's happening with the process, as well as logging checkpoints. You also might want to turn on track_io_timing or track_commit_timestamp. Beyond configuration settings, you can add the pg_stat_statements extension to get a sense of all the different statements being run, and the auto_explain extension, which automatically puts an explain plan in the log when queries take too long; they show how to configure that here. Lastly, on the client side, you can set the application_name parameter so you can uniquely identify which clients are connecting to the database. For example, they set an application name for the client that runs the weekly revenue report, so they'll know exactly which connection is causing particular issues if there's a problem in the logs. So this is an interesting blog post that you can check out from pgdash.io.
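To make those settings concrete, here is a minimal sketch of the kind of configuration the post describes; the thresholds and the prefix format are placeholder choices, not values taken from the article:

    -- Placeholder observability settings (tune the thresholds for your workload).
    ALTER SYSTEM SET log_line_prefix = '%m [%p] %q%u@%d app=%a host=%h ';
    ALTER SYSTEM SET log_min_duration_statement = '1s';     -- log queries slower than one second
    ALTER SYSTEM SET log_lock_waits = on;                    -- log waits longer than deadlock_timeout
    ALTER SYSTEM SET log_autovacuum_min_duration = '1min';   -- log autovacuum runs longer than a minute
    ALTER SYSTEM SET log_checkpoints = on;
    ALTER SYSTEM SET track_io_timing = on;
    -- pg_stat_statements and auto_explain are loaded via shared_preload_libraries
    -- and require a server restart:
    ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements,auto_explain';
    SELECT pg_reload_conf();

On the client side, a connection string such as 'dbname=app application_name=weekly_revenue_report' (the names here are illustrative) makes that client identifiable in the logs via the %a escape.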
[00:06:49] The next piece of content is "Avoiding, Identifying, and dealing with PostgreSQL database corruption - Part 1". They talk a little bit about what database corruption is and some of its causes: mainly bad hardware, though bad software is also a culprit. One thing they mention is that ZFS has been known to have some issues on Linux, though I don't know how true that is anymore. Then there are misconfigurations and user errors. They go on to the symptoms of a corrupted database: maybe you have duplicate rows that shouldn't be there, rows that have disappeared and no longer show up, system crashes when attempting to read the data, or error messages related to internal tables or internal file structures that aren't related to client activity. Those can all be indications of corruption. In terms of best practices to avoid data corruption, they talk about backups and restores. Of course, those won't prevent corruption, but they will help you recover from it in a worst-case scenario. As for actually avoiding it: keep fsync on, keep full_page_writes on, and set an appropriate value for wal_sync_method. Additionally, if you want to protect yourself even more, you can turn on checksums for your data. The last thing they cover is some other best practices. For example, they mention doing a plug-pull test of your system to make sure it can recover successfully, so you know your hardware and software behave as you expect. Next, never modify your data files manually, which is a very bad thing to do, don't go in and delete postmaster.pid, don't run antivirus on the data directory, and don't kill PostgreSQL processes; use pg_cancel_backend or, if you really need to, pg_terminate_backend instead. So this is a great list of recommendations for trying to avoid corruption in your database, and I definitely suggest you check it out. This post is from highgo.ca.
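As a brief sketch of how you might check those safeguards (assuming PostgreSQL 12 or later for the pg_checksums utility, and with the data directory path as a placeholder):

    -- These safety settings are the defaults; the advice is simply not to turn them off.
    SHOW fsync;
    SHOW full_page_writes;
    SHOW wal_sync_method;
    -- Check whether data checksums are enabled on the cluster:
    SHOW data_checksums;
    -- Checksums are set at initdb time (initdb --data-checksums), or can be enabled
    -- offline on an existing cluster with pg_checksums (PostgreSQL 12+), for example:
    --   pg_checksums --enable -D /path/to/data   (run only while the cluster is cleanly shut down)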
[00:08:52] The next piece of content is "Our journey to PostgreSQL 12". This is from coffeemeetsbagel.com, and they're talking about an upgrade they did from 9.6 to 12.4. They give a description of their architecture: basically they have three replicas, an additional background replica and an ETL replica, with PgBouncer in the mix, and they needed to upgrade this system. The reasons for the upgrade were that the primary database had been running for about 3.5 years and had a number of oddities. For example, quote, new services would refuse to run in systemd, so they had to run things like the Datadog agent in a screen session, and sometimes, when CPU usage went above 50%, the server would become completely unresponsive to SSH, which is a serious concern. Disk usage was also an issue because they were running out of space, so they needed to address that. For the upgrade, they wanted minimal downtime, of course, and to roll out new clusters on new instances to replace the aging fleet and upgrade the instance sizes for a little more headroom. They considered doing a logical dump and restore, but that would take way too long. They then considered pg_upgrade, but they didn't want any downtime at all, so they chose logical replication, and specifically pglogical, because, I believe, it has some additional niceties such as being able to synchronize sequences, which native logical replication does not.

They went through the process of doing it: basically they first brought up a new Postgres 12.4 primary that was replicating via pglogical, then set up new 12.4 replicas from that new primary using standard streaming replication. Once everything looked good, they put the site into maintenance mode, switched the DNS record to the new primary database, forced a sync of all the primary key sequences (I believe this is a pglogical capability), ran a manual checkpoint on the old primary, performed some data validation tests against the new primary, and brought the system back up. And it happened successfully. Now, they did mention a couple of issues. One is that slow synchronization can be dangerous: during the initial pglogical synchronization they had to copy a lot of data, and that caused WAL to build up on the old primary, because WAL is retained while the initial data copy is being done. That was a concern for them, and they had to cancel it. To make it go faster, they dropped all the indexes on the tables being synchronized, set fsync to off (I would be really concerned about setting that, given what we understand about database corruption), set max_wal_size to 50GB, and set checkpoint_timeout to 1 hour, so basically they spread out checkpoints to minimize the amount of WAL being written. With that, they were able to get the data transferred over. They also mention that every update was logged as a conflict, so they set pglogical.conflict_log_level to DEBUG to avoid those messages. So if you're interested in learning more about how they did this, go ahead and check out this blog post.

The next piece of content is "Monitoring replication: pg_stat_replication". This talks about looking at the pg_stat_replication view on the primary to determine what replication is happening. As mentioned there, it only knows about the replicas connected directly to that server. So if you have a cascading streaming setup, where the primary streams to some replicas and one of those replicas streams to further replicas downstream, the primary doesn't know anything about those downstream replicas; it only has two entries in pg_stat_replication, whereas the replica that has three replicas connected to it has those three entries. Just something to keep in mind if you're using this view to understand what's going on with your replication. There are four LSN indicators. First is sent_lsn: has the primary sent this LSN to the replica? Second is write_lsn: has it been written to the operating system on the replica, not necessarily flushed to disk, but the OS knows it's a write that needs to be done? Third is flush_lsn: has that LSN been flushed to disk on the replica? And finally replay_lsn: the LSN that has been replayed into the database files. If you query this view, you get that data back, and they also talk about replication lag: there are lag columns measured by time in pg_stat_replication, so you can see the write lag, the flush lag and the replay lag, and know how closely the replicas are keeping up with the primary. So if you're interested in learning more, you can check out this post from cybertec-postgresql.com.
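For reference, a minimal query against that view looks like the following; run it on the primary, and also on any standby that has its own downstream standbys, to see each directly connected replica:

    -- Each row is one directly connected standby (cascaded standbys do not appear here).
    SELECT application_name, client_addr, state,
           sent_lsn, write_lsn, flush_lsn, replay_lsn,
           write_lag, flush_lag, replay_lag
    FROM pg_stat_replication;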
[00:13:46] The next piece of content is "How to run some tasks without user intervention, at specific times? - part 2". This is part two of a post from depesz.com, and this one talks about pg_cron, another extension you can use with Postgres to do cron-style scheduling of tasks. So if you're interested in that, you can check out this blog post.

[00:14:08] The next piece of content is actually a YouTube video: "Managing PostgreSQL with Ansible".

[00:14:14] This is from the San Francisco Bay Area PostgreSQL Users Group, and it's about a 50-minute presentation on how to use Ansible to configure and set up Postgres.

[00:14:27] The next piece of content is "How to create PostgreSQL custom builds and Debian packages". This is from percona.com, and it does exactly that: it walks you through the process of creating your own Debian packages for custom builds of Postgres.

[00:14:43] The next piece of content is "A preliminary exploration of dynamic hashing in PostgreSQL", and this is from highgo.ca. This is a very internals-related post about Postgres, so if that is of interest to you, you can check it out.

[00:14:58] The next piece of content is "Deploy PostgreSQL with TLS in Kubernetes". So if you use Kubernetes and want to set up TLS with Postgres, check out this post from crunchydata.com.

[00:15:10] Also from crunchydata.com is "Deploy TLS for PgBouncer in Kubernetes". So if you are setting up PgBouncer in Kubernetes, you can follow this post to determine how to set that up.

[00:15:22] The next piece of content is "Does PostGIS 3.1 mean the end of TopologyException?" Apparently, when you're doing spatial analysis, certain operations can fail when geometries don't comply with OGC validity rules, and the hope was that with this update those exceptions could be reduced. Unfortunately, in the testing done here, the conclusion was that there are still some issues, so he basically says you probably shouldn't hurry to update PostGIS. If you want to learn more about that, check out this post from elephanttamer.net.

The next piece of content is "Better OpenStreetMap places in PostGIS". This is from Rustproof Labs, and they have been covering the new flex output available in osm2pgsql. Here they go over some additions and changes since 0.0.3; they were looking at 0.0.7. So if you're interested in those additions and changes, you can check out this blog post.

And the last piece of content: the PostgreSQL Person of the Week is Franck Pachot. So if you're interested in learning more about Franck and his contributions to Postgres, definitely check out this blog post.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
