Terabyte Scale, Permissions, Fast Column Adds, pgBouncer

[00:00:00] In this episode of Scaling Postgres, we talk about terabyte, scale database permissions, faster column ads and PG bouncer. I'm Kristen Jameson, and this is scaling postgres episode six. [00:00:21] Alright, our first article is PostgreSQL at 20 terabytes and beyond analytics at a massive scale. Now, if you've been watching my episodes, that title might be familiar to you because in episode one of Scaling Postgres, I found the slides to this presentation. But here they recently posted the YouTube video of the presentation by Chris Travers and this is part of the FOSSASIA YouTube channel. [00:00:49] So again, just to remind you, 20 terabytes is a single database, but it's actually multiple databases that are handling all the data that this service supports. So it's up to 400 terabytes at this point, I believe. So definitely a piece of content to check out if you're looking at how to potentially scale your PostgreSQL database and see some of the challenges that they've run into. [00:01:18] Now, related to this, he did an earlier presentation at an entirely different organization and that presentation was called PostgreSQL at Ten Terabytes and beyond. So this was a different organization, different application, and also different problems and issues he had to encounter. Now, this is an older presentation. It was posted last February, or I should say February 2017 on the Edument YouTube channel. [00:01:48] But again, if you are at this level, it would be a good presentation to review when you're looking how to scale your PostgreSQL database. [00:01:57] The next article is from the Squarespace Engineering blog, and the title is Building on Solid Ground getting Postgres Foundations right with PG Bedrock. [00:02:09] So basically, this is a tool that enables you to manage your PostgreSQL permissions. So they were having some issues because they have many users and different responsibilities for each of those users for accessing data to their PostgreSQL database. And the issues they were encountering is that permissions are managed in multiple tables across PostgreSQL and it was challenging to correctly set the appropriate permissions for every user and maintain consistency. [00:02:44] So they developed this tool with these goals in mind to collate all configuration within one file, simplify their permission complexity and assert that the configuration matches reality. So again, they developed this tool called PG Bedrock to manage the roles, role memberships, schema existence and ownership and object privileges. So it's doing a lot for maintaining permissions in their database. [00:03:16] And they define a configuration file, as you can see here. So they have a particular user, they can log in, they're a member of a particular role, they have these attributes, they own particular schemas, and here are their privileges to other schemas tables or sequences. [00:03:34] So in terms of scaling, this isn't necessarily a performance related post, but it's when you scale, say, staff members and you start to get really complex permissions that need to be established for access to the data within PostgreSQL. So definitely something to check out if you're experiencing these kind of pains to see if this particular tool would help you. Now on GitHub, here's the tool on the squarespace PG bedrock and I'll include the link in the show notes, but it has a very, very comprehensive README that describes how everything gets set up. So again, if you're experiencing permissions issues, I would definitely suggest checking out this new tool. [00:04:20] The next post is pain free ad column with non null defaults waiting for PostgreSQL eleven. Fast alter table add column with a non null default so this is a feature that's coming in PostgreSQL eleven, and this was mentioned on the Dataegrid.com blog by Andrew Dunstan. So basically, once you have a larger database, adding a default the same time that you add a column essentially causes an entire rewrite of the table and could potentially lock things up for you. And as they mentioned here, what you would typically do to add a column where you want a default value is first, adding a new column with no default value, then secondly, setting a default value which will be used for new and updated rows. And then last is updating the existing rows and setting the default value into the added column. So it's a bit of a process, particularly on a larger table, to add a column with the default that you want. However, with this patch that should be coming in PostgreSQL eleven, it enables you to add a default value when you add the column without rewriting the entire table. And this blog post goes into a little bit of the details about it, so I suggest checking it out if you want to learn more. [00:05:49] The next article is from datanami.com and it's Making Hadoop relatable again. So I found this post and I was reading it a little bit, it's not about PostgreSQL, but I found it interesting. They're talking about making Hadoop more like a relational database. And here's a quote if you make Hadoop more like a relational database, then people will do more with it. So it's recognizing the power of a relational database. So I just found this interesting that all of these different database systems keep going back to the relational database and using SQL in order to allow easy access to the data. So it's interesting how these NoSQL options seem to keep going back to SQL. [00:06:43] The next couple of articles are very short, brief ones that give suggestions for PostgreSQL configuration. [00:06:52] So for example, these are from the Build blog. The first one is mount points in the single PostgreSQL server. So basically it's a very quick little post that says two rules to keep in mind when choosing a name for the mount point in your directory structure for your PostgreSQL database. One, always include the major version in the directory you're using, but never include the major version in the mount point. [00:07:23] For example, use something like PgSQL for a mount point, something that doesn't include the version. But when you're establishing your PG data or the actual data directory do use a version for it. So here they had an example of within the mount point ten data, the reason being because at some point when you're doing a PG upgrade, you're going to want to potentially have two different versions running when you're using PG upgrade. And having the directory structure in this manner enables upgrades to happen much more easily. [00:07:58] The next article is again from thebuild.com Change this wall compression. And basically he's suggesting that pretty much everybody should go ahead and enable this parameter even though it's off by default. And basically as long as you're not severely CPU bound, you get a benefit in disk space and a reduction in network traffic if you're sending to a streaming replica. So definitely something to take a look at if you have not enabled that setting. The next post is a guide to using PG Bouncer for PostgreSQL. And this is from the several nines.com blog and this goes over the general reasons why you would potentially want to use Pgbouncer and some other alternatives such as PG Pool. But this focuses on PG Bouncer and how basically it helps make many, many connections to the database much more efficient than the stock PostgreSQL. [00:08:56] And then at the bottom here, it actually goes into a configuration and different parameters that you can use to set it up. So if your connections are starting to increase more and you haven't looked into PG Bouncer, and I would say your connections going up to 200, 300, 400, definitely check out PG Bouncer as a way to help manage that many connections to your PostgreSQL database easier. [00:09:24] The last article is PostgreSQL wall archiving with PG Receive wall. So this is a video tutorial on essentially how to use PG Receive wall. Now, if you haven't heard of that, that is utility that simply lets you use streaming replication protocol to stream wall files to another server that doesn't have to be a database or a regular replica. So I would say this is one of the most efficient ways to get the wall files off of a master database. If you're doing that, you essentially streamlim and then produce the files on this other server. And I go into in depth how to set that up and the different commands that you would use. So if you're looking for a potential solution for better wall archiving, I encourage you to check it out. [00:10:16] That does it. For this episode of Scaling Postgres, you can get links to all the content presented in the notes below. Be sure to head over to Scalingpostgres.com where you can sign up to receive weekly notifications of each episode. You can also subscribe via YouTube or itunes. [00:10:34] Thanks.

Show Notes

Episode Transcript

Other Episodes

Episode 231

PG15 Public Schema, Kubernetes Experiences, Dump Manifests, PgBouncer Fork | Scaling Postgres 231

Episode 275

No More Vacuum, Zero-Downtime Cut-Over, Network Impact, Not In Optimization | Scaling Postgres 275

Episode 281

HNSW Indexes, Vacuuming Bloat, Watch Me Now, Connections | Scaling Postgres 281

Terabyte Scale, Permissions, Fast Column Adds, pgBouncer | Scaling Postgres 6