Postgres Playground, PG14 Internals, DB Corruption, Anti-Join | Scaling Postgres 229

Episode 229 August 22, 2022 00:11:49
Postgres Playground, PG14 Internals, DB Corruption, Anti-Join | Scaling Postgres 229
Scaling Postgres
Postgres Playground, PG14 Internals, DB Corruption, Anti-Join | Scaling Postgres 229

Aug 22 2022 | 00:11:49

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss a new Postgres playground, a book about PG14 internals, how to corrupt your database and using anti-joins.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/229-postgres-playground-pg14-internals-db-corruption-anti-join/

View Full Transcript

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about a postgres playground, PG 14 internals database corruption and an anti join. I'm Kristen Jameson, and this is scaling postgres episode 229. [00:00:24] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is Learn Postgres at the playground. This is from Crunchydata.com, and they're talking about a new tool that runs postgres in your local web browser. It uses WebAssembly from what they're saying and it takes particular data sets that are loaded in and it gives you tutorials to follow along. So it gives you a sandbox or a playground to play around in and get to learn postgres. And so some of the initial tutorials that they're covering is the Basics of psql. So just using the psql client, doing partitioning, doing some performance analysis, joins indexing PostGIS as well as window functions and CTEs. So if we take a look at the page, it looks like this at a crunchydata.com slash developer slash tutorials. And it has different links that you can click on. And so for example, if I click on psql Basics on the right pane, it actually starts a psql database presumably, and loads it with some sample data using WebAssembly. And then the left pane, you can just follow along and enter commands to see what the results are. So as you can see, this is an actual database. You can look at the databases installed. So you just see postgres and the templates and you can see what tables available, just this one table and you can do selects against it. So this is a pretty cool tool set, in my opinion. Now, also related to this, there was a YouTube video recently released called The Past, Present and Future of Postgres and the Postgres Playground. And this is on the Hashura YouTube channel. And this is about an hour interview with Craig from Crunchy Data. And whereas the first 20 minutes or so is just talking about postgres in general, the latter half goes into more depth of the postgres playground. So if you want to learn more about that, definitely check out these pieces of content. [00:02:20] Next piece of content, PostgreSQL 14 Internals part Two this is from Postgres.com and maybe a month or two months ago we mentioned the first part was released. This is a free PDF describing the Postgres 14 internals. So they released part one, and this is now released as part two, where they cover design of the buffer cache and explains the need for write ahead logging. So if you want to learn more about the internals of postgres, definitely recommend checking out this free PDF. [00:02:51] The next piece of content how to corrupt your PostgreSQL database. This is from CyberTech postgresql.com. And this is the post that literally describes how to corrupt your database. So it lists about seven different methods. The first one is creating a corrupt database by setting Fsync off and then how you corrupt it is basically you put a load on it. They're using PG bench in all of these cases and then cut the power to the database and eventually you'll get a corruption error. The next one is creating a corrupt database from a backup and this has to do with not creating a backup label and he goes through the process of how to corrupt it this way. Another way is using PG reset wall. So for example, if you do an immediate shutdown and then you reset the wall, essentially the wall is not available to restore the database. So now you have a corrupted database. The next is using PG upgrade with a link option, but then turning on the old database before you've dropped that old database cluster. Because they're using shared files because of the link that could cause corruption. You can just go in and manipulate the database files and edit them any way that you want. You're going to get corruption that way or even modifying things in the Postgres catalog. So here they're deleting something from the PG attribute table. So this is also more a list of what not to do. So as he says here, don't mess with the system catalogs, don't modify things in the data directory, don't run with F sync off, don't call Pgreset wall on a crash server, don't delete or omit the backup label. And the last three items are things you should do remove the old cluster after an upgrade with PG upgrade, run a supported version of Postgres and run on a reliable hardware. So if you do these things, you'll definitely minimize your chance of corruption. But if you want to learn more, definitely check out this blog post next Piece of Content rise of the Anti Join this is from Crunchydata.com and they're talking about anti joins, where you want to do a join to a table and exclude everything that exists in the second table. Now they actually approach this in a few different ways, but to get started they generated a series and created a table a with all the same values, b with all the same values, and then just deleted two values from the B table and says, okay, how do we pull those out now? Really, they were wanting to use this for a spatial query where they exclude everything that is not a county. So I'm assuming this is pulling all the data that exists in the water in this image I'm looking at here. So one way they tried to do it is use a not in and that just takes forever. You don't want to use that. Another way is using the accept keyword, but that's pretty slow as well. I mean, it gives you the right answer, but the fastest way to do it is an anti join. And there's two ways to write this. You could do a where not exists and then your query for matching them up or and this is the syntax that I always tend to use. You just do a join to the table where the values are equal, but then you look in table B where the value is null, and that's definitely the fastest way to do it. And then he says you can also apply this technique doing a left outer join and then checking if the second table's value is null as a way to pull out these points that are not a part of the county. So it looks like items that were in the water here, but if you want to learn more, definitely check out this blog post. [00:06:13] Next piece of content AWS RDS PostgreSQL development with PG Admin Four. This is from Enterprisedb.com and this is pretty cool. I don't use PG Admin, I tend to just use psql. But for those who use PG Admin, you can now actually create instances in it in cloud platforms. It looks like they have one for Azure already for creating hosted database systems in Azure. Now you can do it for Amazon RDS and basically you just put your keys in and it just uses the AWS API to provision a given RDS database instance for you. So definitely check out this blog post if you want to learn more about that. [00:06:52] Next piece of content. Leverage a new way to import an existing postgres database to Kubernetes. This is from Enterprisedb.com and they are using their cloud native PG operator for Kubernetes and they are going through a scenario where you're running postgres ten on RDS and how do you transfer that to a three cluster? Kubernetes managed postgres 14 installation and they go through the YAML file that you would need for the operator to be able to set up that environment and go through the process for transferring the data over. So definitely check out this if you're interested in that. [00:07:28] The Next Piece of Content PostgreSQL 16 Part One or Commit Fest 2022 Seven so this is talking about postgres 16, which is due to be released about 15 months from now. So over a year from now. But they've already started the Commit Fest to see what features are going to be landing in postgres 16. So this goes over the notable features that were worked on during the Commit Fest, and there's a lot of detail with regard to each one. So if you're interested in what the future holds, you could definitely check out this blog post. [00:08:00] Next Piece of Content aliases for subselects in from clause this is from Cybertechn Postgresql.com and we talked about that previously, but if you're not aware, when you use a sub query, you need to give it an alias. So this is a sub query and it has no alias and it produces an error. But in postgres 16, so here they're using 16 development, you now no longer require that alias and it works as expected. So definitely something to look forward to. In the next 15 months or so. [00:08:34] The Next Piece of Content the next episode of Postgres FM was published, and this episode covers a monitoring checklist. So they go over a checklist of all the different monitoring that you would want to do for your Postgres installation. So they're talking about TPS and QPS latency connections or your sessions longest transactions tracking those tracking commits versus rollbacks transactions left until a transaction ID wraparound replication. Lags the count of walls waiting to be archived, which is an archiving. Lag wall generation rates, locks and deadlock counts. Basic query analysis graphs such as top n by total time or meantime and basic weight event analysis. So I haven't had a chance to listen to this episode yet, but I'm definitely going to be doing that because all that stuff sounds pretty cool. [00:09:27] Next Piece of Content five Minutes of Postgres episode 31 Postgres Security Patch Release Spotting Vulnerable Extensions and Securing the Public Schema so last week, new versions of Postgres were released to handle a security bug as well as introduce some additional features. And in this episode, Lucas covers that update, and he also talks about some additional extensions that you can use to identify, as he says, problematic vulnerabilities. So there's the PG Spot extension that looks for potential vulnerabilities in extensions that you're using. And he also talks about an extension called PG Hostel, which basically is great for pen testing that tests your database to see how vulnerable it is. And he also mentioned that with Postgres fifteen's new default, they're revoking public create from the public schema, and that should alleviate a lot of potential security issues that come up. Basically, it's just safer to revoke the public create from the public schema. Otherwise, as he says here, users can just go ahead and create objects like a table or a function in the public schema. So definitely for security, a great new default coming, but everyone will definitely have to test their database on version 15 to make sure that all their permissions are set correctly to make sure they don't have any issues. But if you want to learn more, definitely check out his episode. [00:10:48] The next piece of content. The PostgreSQL Person of the Week is H Juzia. My apologies for that pronunciation if it is incorrect, but if you want to learn more about H and his contributions to Postgres, definitely check out this blog. Post and the last piece of content. We did have another episode of the Rubber Duck Dev show this past Wednesday evening. This one was on code quality with Ernesto Tagworker. He has worked on a number of Ruby libraries that assess code quality, including Ruby, Credit and Skunk. So if you're interested in code quality as well as testing, I definitely recommend checking out our show. [00:11:25] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks.

Other Episodes

Episode 84

October 07, 2019 00:12:37
Episode Cover

Postgres 12 Released, Generated Columns, Parallel Estimates, Change Data Capture | Scaling Postgres 84

In this episode of Scaling Postgres, we discuss the release features of Postgres 12, generated columns, parallel estimates and change data capture. To get...

Listen

Episode 232

September 12, 2022 00:14:36
Episode Cover

PG15 Beta 4, Primary Key Options, Sequence Limits, Configuration | Scaling Postgres 232

In this episode of Scaling Postgres, we discuss the release of Postgres 15 Beta 4, different primary key options, sequence limits and how to...

Listen

Episode 157

March 22, 2021 00:18:40
Episode Cover

auto_explain Overhead, Postgres Data Lake, Citus Shard Rebalance, SSL Authentication | Scaling Postgres 157

In this episode of Scaling Postgres, we discuss auto_explain's overhead, setting up a Postgres data lake, rebalancing a Citus shared database and implementing SSL...

Listen