Popularity, Load Testing, Checksums, pg_hba | Scaling Postgres 59

Episode 59 April 14, 2019 00:12:20
Popularity, Load Testing, Checksums, pg_hba | Scaling Postgres 59
Scaling Postgres
Popularity, Load Testing, Checksums, pg_hba | Scaling Postgres 59

Apr 14 2019 | 00:12:20

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we review articles covering Posgres' popularity, conducting load testing, checksums in PG12 and pg_hba configuration.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/59-popularity-load-testing-checksums-pg_hba/https://www.scalingpostgres.com/episodes/59-popularity-load-testing-checksums-pg_hba/

View Full Transcript

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about postgres's popularity, load testing, checksums, and host based access. I'm Kristen Jameson and this is Scaling Postgres, episode 59. [00:00:20] All right, I hope everyone's having a great week. Not a lot of content this week. So this week's episode may be pretty short, but the first piece of content we have is Postgres is the coolest database. Reason number one, developers love it. And this is from the Second Quadrant.com blog. And this is a very short post, but it just reiterates some of the things that have been mentioned previously. Like DB engines for two years in a row has awarded PostgreSQL their database management system of the year. So 2017, 2018. And they do this, I believe, at the end of the year. So it was just recently awarded this reward. And this was out of a total of 343 databases that they track. So pretty good indication of its popularity. Also mentioned the stack overflow where PostgreSQL came in second to Redis. So definitely the most popular relational database system. And we'll see a presentation here. They really also classify it as the most popular comprehensive database. In other words, it's very good at doing a lot of different things. The next closest relational database I see is MariaDB. Now, I'm not including where it says Amazon RDS because is that Postgres RDS or is it only Aurora? So I'm not quite sure about that. And they list Azure and Google cloud storage, whereas you can run PostgreSQL on these services. So, I mean, for relational database, I'd consider MariaDB the next popular one. And then just looking, this is trends over the last eight years in hacker news. And you can see the popularity and these are PostgreSQL increasing. And you can see these are four different relational databases except for MongoDB. But even that its trend is slowly going down. So definitely a couple of different indications of the current popularity of PostgreSQL. [00:02:20] The next post is actually presentation that was done recently called PostgreSQL Hyperconverged Database Management System. That fully virtualizes database management. And this is by Simon Riggs, the CTO at second quadrant. Now, I did watch this presentation by Fully Virtualized is not quite sure what was meant by that in the title, but Hyperconverged, he's saying if there's some task you want to get done, pretty much PostgreSQL can probably meet that need. Now, there may be more specific databases that excel in particular areas, but because of all the features, it's super comprehensive in its feature set to be able to handle all sorts of data analytical needs, from handling JSON data types, to full text search, to even doing some GraphQL queries using with recursive. And it's a whole extension ecosystem that adds additional features to PostgreSQL apart from the core. So this presentation didn't have anything necessarily new, but it does give you a perspective of how many different areas PostgreSQL can help fill your needs. So if you have interest in. That definitely a YouTube video to watch. [00:03:36] The next post is building a PostgreSQL load tester. And this is from Lawrence Jones and I believe he's at Go cardless. And basically he had a need to simulate production load on, I believe, a test environment or a stress test environment. And he had previously used a tool called Pgrepay, which basically reads a PostgreSQL log. Now, not the wall, not the write ahead log, but the actual what you're logging to a text file on what queries are happening or statements are happening. It can read that log, parse it and then replay it against a database system. Now he had some problems when he was trying to do that where the database, the stress test environment would just stall because some queries weren't executing the same as they were in production. And he says here the new cluster was sufficiently different that several queries were now much slower than they had been in the original log capture. PG replay would ensure queries execute in the original order. Any queries that take longer to execute in the replay will block subsequent queries. So basically he had stalls when trying to use it. And his idea was, well, since this tool doesn't quite do it, maybe I'll make my own. And the most important change he sought to make is to not have all queries back up behind a stalled query, but do it by connection. So for each connection, if one of the queries in a connection gets stalled, all the other connections queries can still happen when they were supposed to, but it's just that one connection that gets delayed if one of the queries is slower than it was when it ran in production. Now he calls this new tool Pgrepayo because he used the Go language. So this post goes over how he basically wrote the first part in five days. So he talks about parsing the logs to be able to interpret it streaming from the log, to be able to replay it, the debugging process he went through and then having to go in and analyze some issues he was dealing with. So if you are interested in getting a little bit more insight to PostgreSQL or in Go in general on how he developed this, this might be of interest. But the other part I felt was of interest is that if you're wanting to use a tool like this to load test your PostgreSQL instance for stress testing purposes, maybe you can check out both PG replay, the tool he had used previously, as well as his new iteration, Pgrepayo. [00:06:16] The next post is postgres twelve, highlight PG checksums. And this is from the Michael Peculiar blog. So in postgres version, I believe it's 9.3, they introduced the concept of checksums where you can enable checksums when you create a database to be able to check for things like a file or discorruption. And then in postgres version eleven they introduced a tool called PG Verify Checksum. So this is something that can run on a database that has been shut down to verify that there's no corruption in that stopped database system. [00:06:54] But apparently for version twelve they're renaming it to the PG Checksums because it actually has three different modes of working. The first mode is Check, which is basically what it does. Now it does what PG verify checksums already does. It scans a stop database to ensure that file integrity is where it needs to be. But it also adds an enable and a disable of checksums because historically when it was introduced in 9.3, it can only be set at database creation time or cluster creation time. But with these new switches you should be able to enable it or disable an existing PostgreSQL cluster. So definitely an interesting feature to add. [00:07:39] Now you still have to stop it, but he also mentions a way you could potentially do it if you have a primary and replica setups, a process to go through to be able to enable checksums for your primary and replicas. So if that's of interest to you, definitely a feature to watch for in PostgreSQL twelve. [00:07:59] The next article is related to PostgreSQL twelve as well and it's optimizing storage of small tables in PostgreSQL. In here they did an example of creating a table and just inserting a single text value into it and they looked at the different files that created. One is the free Space map, the visibility map, some file areas related to the Toast table size, the toast index size, as well as the heap size. And they say with this one table with one value in it, you need 48 space. Due to all of this supporting files for PostgreSQL. Now many people, their tables are huge, so you're not going to be feeling this. But the scenario that apparently they have seen is the situation where an application uses schemas for multitenancy. So there you could have tons of small tables and perhaps this could be an opportunity for space savings. And what they said here, and if you look at this, the largest area that takes the most space is the free space Map. And they say with PostgreSQL twelve you only get a free space map if they exceed four pages or 32 KB with the default page size of 8. If you have a lot of small tables, this could potentially be an opportunity to save some disk space. With this new feature in PostgreSQL Twelve, the next post is PostgreSQL Security. A quick look at authentication best Practices tutorial. This is from Packethub.com and this is an excerpt from a book called Learning PostgreSQL Eleven, the third edition by Andre Volkov and Salahadin Juba. Now this post basically talks about the Pghba file, the host based access file for allowing in network traffic to access the PostgreSQL database. So it goes over the different settings that you can make for it, and some suggested best practices on how to set up your HBA comp file. They also cover a little bit about listened addresses as well as some best practices. So if you're wanting more guidance with setting up your Pghba file, definitely a blog post to check out. [00:10:15] The next post is continuous replication from a legacy PostgreSQL version to a newer version using Slony. And this is from the Percona.com blog. Now, this is part of their series where they're using different techniques to upgrade PostgreSQL. And here they're saying if you have an existing PostgreSQL database that is older than 9.4, this is a technique you can use to do the upgrade with zero or essentially near zero downtime. And it's using Slony to do it so it goes over what Slony is. It's basically using triggers to try and replicate and send the data over to other database systems. So again, this is potentially a good scenario if you have a PostgreSQL instance older than 9.4, if you have one that's at least 9.4, generally the recommendation is to use PG logical to do logical replication if you want to try a zero downtime upgrade. But if you're interested in this, definitely a blog post to check out. Related to it is a next blog post called Fast upgrade of legacy PostgreSQL with minimal downtime using PG upgrade. So PG Upgrade is basically, unless you're using PG dump, PG Upgrade is generally the recommendation to do an upgrade, and you can basically use PG Upgrade to move up to different versions. And this does require some downtime, but you can drastically minimize that amount of downtime if you're using hard links when using PG Upgrade. So if you're wanting to learn more about PG Upgrade, here's our recent blog post you can check out. [00:11:57] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Galingposgres.com, where you can sign up to receive weekly notifications of each episode, or you could subscribe via YouTube or itunes. Thanks.

Other Episodes

Episode 320

June 16, 2024 00:17:32
Episode Cover

100 TB and Beyond! | Scaling Postgres 320

In this episode of Scaling Postgres, we discuss three organizations scaling their databases to 100 TB and beyond, collation speed, configuring memory and new...

Listen

Episode 166

May 23, 2021 00:16:31
Episode Cover

Postgres 14 Beta 1, PG 14 Improvements, Best Primary Keys, Composite Keys | Scaling Postgres 166

In this episode of Scaling Postgres, we discuss the release of Postgres 14 Beta 1, different PG 14 improvements, choosing the best type of...

Listen

Episode 250

January 30, 2023 00:14:03
Episode Cover

Performance Issue, Survive Without Superuser, Reserved Connections, Partition Management | Scaling Postgres 250

In this episode of Scaling Postgres, we discuss resolving a performance issue, how PG16 allows you to survive without a superuser, reserving connections and...

Listen