Any Scale, Monitoring Vacuum, Copy, Annotated Config | Scaling Postgres 7

Episode 7 April 09, 2018 00:11:09
Any Scale, Monitoring Vacuum, Copy, Annotated Config | Scaling Postgres 7
Scaling Postgres
Any Scale, Monitoring Vacuum, Copy, Annotated Config | Scaling Postgres 7

Apr 09 2018 | 00:11:09

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we review articles covering Postgres at any scale, monitoring vacuum, using copy, annotated config files, and how to do easy replication failback to an old primary database.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/7-any-scale-monitor-vacuum-copy-annotated-config/

View Full Transcript

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about postgres at any scale, monitoring vacuum using copy in annotated config files. I'm Kristen Jamison, and this is Scaling postgres episode seven. [00:00:21] Alright, our first piece of content is actually YouTube video and it is Postgres. It's from a presentation called Postgres at Any Scale by Craig Kirstein's, I believe at Citus Data. Now this was actually done in 2017, but it was just recently, a relatively recently post in February. And in this, this is intended to be more of a developer talk. So he goes over at the different scales from where you're starting your app to where you are getting up to a medium level of traffic and then a higher level of traffic, what kind of things you need to pay attention to. So definitely if you're a developer, I would give this YouTube video a watch. The only comment I have with regard to it is that when he discussed replication, he mentioned all these tools, but not the ones that Postgres provides by default, which I found kind of interesting. And he really didn't go over too much partitioning and where that could potentially be useful. He went straight into Sharding. But of course, Citizens Data is a Sharding Data platform, so that makes sense that he's more focused on that. But it's definitely a good presentation for developers and some good guidance. [00:01:41] The next piece of content is actually a YouTube channel, not a specific presentation, because there were quite a few presentations, but Pgcomf India from 2018, they recently posted all the videos on this YouTube channel. So definitely check it out and see if there's something that you might find of interest that will help you scale your PostgreSQL database. [00:02:10] The next article is Monitoring PostgreSQL vacuum processes, and this is from the Datadoghq.com blog by Emily Chang, and it goes over how to monitor vacuum and what to pay attention to. So, for example, they talk about the different metrics that you should be monitoring, including dead rows, table disk usage, last time vacuum or auto vacuum ran, and different vacuum events. [00:02:40] And they also go over investigating different vacuum related issues. So is Auto vacuum running or not? Are vacuums running into lock conflicts? Do you have long running open transactions that can cause issues with vacuum? So, definitely a pretty comprehensive blog post about vacuum. So I definitely suggest you doing a readover of this to get more familiar with vacuum. If you're not, or if you have particular issues, maybe there's something here that could assist. [00:03:15] The next article is fast. CSV and JSON ingestion in PostgreSQL with copy by Jonathan Katz. And this is on the Crunchy Enterprise PostgreSQL blog. So basically this is an article that discusses how to use copy, and if you're not familiar with copy, it's definitely something to be aware of. It basically allows you to insert a lot of records from a CSV file, or they also have a JSON data file, and it does it much faster than doing individual insert statements, so it's definitely a good thing to have in your toolkit. And Copy can be used to upload data into a PostgreSQL database or actually do export it to a CSV file as well, for example. [00:04:03] So they go over some examples here. So definitely something to check out. [00:04:08] Now, with this, in terms of scaling, one thing you may be doing is inserting updating a lot of data. And for that, I recommend taking a look at this documentation, populating a database and different performance tips that you can do, such as disabling auto commit using copy, removing indexes, foreign key constraints and then adding them back after you do the import. For example. So different ways to speed up if you're trying to import a lot of data into tables. [00:04:40] The next article is advanced. SQL. Why? First Value and Last Value are no bugs, or probably are not considered bugs. This is from the Cybertechql.com Blog and in it First Value and Last Value are related to windowing functions and this is a very short blog post, but he goes over something you may not expect in their usage and explains why. So if you're not that familiar with window functions, I definitely suggest checking it out and seeing what capabilities they can provide for you. It may be more efficient to do some of these capabilities in the database as opposed to at the application level, but definitely a good blog post to check out. [00:05:28] The next article is New annotated config files for PostgreSQL Ten. And this is from the database Soup blog by Josh Perkus. And in it he has basically set up an annotated version of the PostgreSQL comp file, and he has also included a simple comp file with descriptions for the ten most or excuse me, 20 most commonly changed settings and detailed advice on how to set them, and then an extra 20 most likely to be changed settings. So if you've ever been confused by some of the settings in the PostgreSQL comp for maybe what some of them did, even after looking some of the PostgreSQL documentation, this would be another source to check out and also general recommendations as to what you potentially should be changing when you're doing your PostgreSQL configuration. [00:06:25] And this is the GitHub site that has the and I'll include this link in the show notes has the annotated CSV file as well as the simple configuration and the extra configuration. So definitely a piece of content to check out. The next post is Upgrading PostgreSQL from 9.4 to 10.3 with PG Logical. Now I presented a presentation that discussed using PG Logical to upgrade from one major version to another major version with minimal downtime, and this blog post essentially does that, but it actually gives all the instructions and goes through how they actually did the upgrade. So if you're considering going through this process, definitely check out this blog post. [00:07:14] Oh, and this is from Douglas Hunley at Hunleyd GitHub IO. [00:07:22] The next article is announcing PG. [00:07:25] And this is from the Rapidloop.com blog. [00:07:30] And a few episodes ago I mentioned that Pgmetrics was released. This was a command line tool that essentially sent SQL functions to postgres and collected the data and then formatted them into a JSON format. Well, this is an online service called PG Dash that actually processes those metrics that PG metrics produces. Now this is a service that runs in the cloud. It's free during beta, it says in this post. But if you're looking for additional ways to monitor your PostgreSQL instances, you may want to check this out. The next post is Citize 7.3, broader SQL coverage, tableau integration, top end extension, and more. This is from the Citusdata.com blog. And this is they're announcing their version 7.3. So in the interest of scaling postgres, if at some point you reach the scale that you feel you need to start sharding your database, you may want to check out Citus data to help you with that. [00:08:33] And the last post is fun with PG backrest. And this is from the Pgdba.org blog. Now, typically I use the built in or the community provided utilities that PostgreSQL provides, but there are a number of other backup technologies or replication technologies you can use. PG backrest is one. So if you're interested in potentially using this utility, it goes over here, how to set it up, and even has some videos with regard to that. The only disadvantage with this particular post, and maybe it will be something in the future, is that they talk about performing the backup, but they don't do the restore part. Usually I always like to see that because that's the most important part of the backup is the restore. [00:09:20] Generally when I do my tutorials, I also show how to do the restore. [00:09:26] And the next article is PostgreSQL replication with Easy failback. And this is a in depth video tutorial I put [email protected] and it details how when you have a failover event, so you have a master and one or more replicas and you promote one of those replicas or also considered a failover event, you fail over to that replica. At some point you want to fail back to the master. Now, typically you would have to restore the entire database to the master and then make it become a replica of that new primary. However, if you have a large database, that can take a really long time. So there is a way to reuse that old master or that old primary. Now, one way is you make sure that no writes have happened to that old primary after the promotion event. And I call that easy way to fail back to that old primary. And that's what this tutorial goes over. The other way is using PG rewind and there's going to be a tutorial coming up on showing you how to do that. So if you are using Replication and want to do failback events, definitely check out this tutorial that does it. For this episode of Scaling Postgres, you can get all the content presented in the Show Notes be to head over to Scalingpostgres.com where you can sign up to receive weekly notifications of each episode. Or you can choose to subscribe via YouTube or itunes. Thanks.

Other Episodes

Episode 62

May 05, 2019 00:15:39
Episode Cover

Index Include Clause, Exporting CSV, JSON, Zedstore | Scaling Postgres 62

In this episode of Scaling Postgres, we review articles covering the index include clause, exporting data to CSV, using JSON in Postgres and Zedstore....

Listen

Episode 301

February 04, 2024 00:18:14
Episode Cover

Postgres LLM OS & 30 Times Faster Index Builds | Scaling Postgres 301

In this episode of Scaling Postgres, we discuss how you can build a GPT in 500 lines of SQL code, how to optimize extension...

Listen

Episode 245

December 12, 2022 00:11:40
Episode Cover

ENUMs vs Check Constraints, Faceting With Roaring Bitmaps, Better Scaling, In DB Business Logic | Scaling Postgres 245

In this episode of Scaling Postgres, we discuss ENUMs vs. check constraints, querying table facets with roaring bitmaps, a better way to handle scaling...

Listen