Per-Operation Explain, PG14 Contributions, Foreign Parallel Execution, Incremental View Maintenance | Scaling Postgres 172

Episode 172 | July 04, 2021 | 00:12:39
Scaling Postgres

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss tracking per-operation explain analyze, Postgres 14 contribution stats, parallel execution on foreign servers and incremental view maintenance.

To get the show notes as well as get notified of new episodes, visit: 

 https://www.scalingpostgres.com/episodes/172-per-operation-explain-pg14-contributions-foreign-parallel-execution-incremental-view-maintenance/


Episode Transcript

[00:00:00] Speaker A: In this episode of Scaling Postgres, we talk about per-operation explain, PG 14 contribution stats, foreign parallel execution, and incremental view maintenance. I'm Creston Jamison, and this is Scaling Postgres episode 172.

[00:00:25] Speaker A: Alright, I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is Calculating per-operation times in EXPLAIN ANALYZE. This is from pgmustard.com, and they're talking about calculating, at the operation level, how long each part of a query plan takes, because sometimes that's relatively easy and sometimes it becomes a little more difficult. They start off with a very simple example where you're just doing a sequential scan and a sort, and they use EXPLAIN ANALYZE to get the actual times. Now, the actual time of the whole execution is 0.077 milliseconds, and the time to do the sort is 0.054 milliseconds. But if you look at the sequential scan on the table, its actual time is 0.018 milliseconds. So you can work out how long the sort itself took by subtracting the operation nested within it: 0.054 minus 0.018 gives you 0.036 milliseconds. That's how long the sorting took, ignoring the sequential scan part.

The next thing they talk about is loops, so you want to check each operation to see whether it loops at all. There were no loops in the first example; you can see it says loops=1. But in the next one, which is a nested loop, one operation loops once while another loops nine times. That means you need to take the actual time shown and multiply it by nine to estimate how long that entire operation took: 0.028 times nine gives you 0.252 milliseconds. Then, to get the overall time of the nested loop itself, you take the total time and subtract that newly calculated 0.252, as well as the 0.019, which gives 0.064 milliseconds for the nested loop part.

Things get more complicated when you're talking about parallel execution. They point out that if you want the individual worker timings for each parallel piece of work, you have to use the VERBOSE parameter, so they ran EXPLAIN ANALYZE VERBOSE, which shows the actual time for each worker in parallel operations. Their example is a parallel sequential scan with three processes: one is the coordinating leader, and the other two workers collect the data of interest. The leader isn't reported separately, but you can take the node's total, and again it's three loops, so three times the total here, which is about 7 seconds, then subtract each worker's time to get a sense of what the main process did. And things get even more complicated when you're talking about CTEs. They have a CTE example, and being able to subtract these timings breaks down a little there because some work is reported twice in the timing output, in both the Append node and the scan and Limit nodes.

[00:03:36] Speaker A: Now, they also mention a couple of tools that can help you diagnose this, including explain.depesz.com, explain.dalibo.com, FlameExplain, and I think pgMustard does some of this as well.
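To make that arithmetic a bit more concrete, here is a minimal sketch of the calculation; the table name and timings are hypothetical, not taken from the pgMustard post:

    -- Hypothetical example: the table and numbers below are invented to
    -- illustrate the per-operation arithmetic, not copied from the post.
    EXPLAIN (ANALYZE, VERBOSE)
    SELECT * FROM orders ORDER BY created_at;

    -- Sample output shape:
    --   Sort (actual time=0.050..0.054 rows=100 loops=1)
    --     ->  Seq Scan on orders (actual time=0.006..0.018 rows=100 loops=1)
    --
    -- Per-operation ("self") time of a node:
    --   self_time = (actual total time * loops) - sum(child total time * child loops)
    -- For the Sort above: 0.054 * 1 - 0.018 * 1 = 0.036 ms.
    -- For a node with actual time 0.028 and loops=9, its total contribution is
    -- roughly 0.028 * 9 = 0.252 ms before subtracting its own children.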
So if you're looking for a post to help you better interpret your EXPLAIN ANALYZE output, definitely check out this one.

The next piece of content is Some interesting statistics about PG14 contributions. This is from highgo.ca, and they went through a number of sources, including the commit logs and release notes, to collect statistics on the upcoming Postgres 14 release. The first thing they did was produce a chart of the number of features per category area, so you can see there are ten features that cover vacuuming and four features related to the optimizer, for example. He also comes up with the top seven features of 14 he's most looking forward to. The first is asynchronous execution of the postgres_fdw Append node, which lets you push work out to foreign servers and have them do it in parallel. The second is the improved connection scalability from the GetSnapshotData() work, which should hopefully reduce the need for PgBouncer in some use cases because you can ramp connections up further. The third is the overhaul of UPDATE and DELETE processing, which is particularly relevant for partitioned tables. The next is logical streaming of large in-progress transactions, so if you have a long-running transaction, logical replication can start streaming its changes before the transaction is complete. Then there is support for multirange data types, which we've discussed in previous episodes of Scaling Postgres; allowing B-tree index additions to remove expired index entries to prevent page splits; and lastly, postgres_fdw batching, so where before inserts were sent to a foreign server one at a time, you can now specify a batch size for how many inserts to send at once, be it 100, 1,000, et cetera. Then he gives a few more charts showing contributions by author, as well as by company and by country. So I thought these were some pretty interesting statistics, and I encourage you to check it out as well.

[00:06:02] Speaker A: The next piece of content is Parallel execution of postgres_fdw scans in PG14, an important step forward for horizontal scaling. This is also from highgo.ca, and they're talking about being able to eventually scale out Postgres. They reference three earlier posts covering the work that needs to be done for horizontal scale-out of Postgres, and a new feature coming in 14 is the ability to push scans out to foreign servers, have them work in parallel, and then send the results back to the local server, or primary node, to be returned to the end user. So it's basically a parallel Append across foreign servers. He talks a lot about the contributors who developed it, and then goes into some example code for setting it up. Now, he sets this up on just a single server, so it's not the multi-server setup you would use for an actual scale-out, but even locally they were able to get a performance improvement from about 136 milliseconds to 81 milliseconds using these new features. There's still a long way to go before the community edition of PostgreSQL can do horizontal scaling of databases, but this is the next new feature working toward that ultimate goal.
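As a rough sketch of how the new postgres_fdw options look in practice, here is a minimal example; the server, table, and host names are hypothetical, and this assumes PostgreSQL 14 with postgres_fdw on both sides:

    -- Hypothetical setup: a partitioned table whose partition lives on a
    -- foreign server, with PG 14's async execution and insert batching enabled.
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'shard1.example.com', dbname 'app',
               async_capable 'true',  -- allow asynchronous Append execution (PG 14)
               batch_size '100');     -- send remote inserts 100 rows at a time (PG 14)

    CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
      OPTIONS (user 'app', password 'secret');

    CREATE TABLE measurements (id bigint, recorded_at timestamptz, value numeric)
      PARTITION BY RANGE (recorded_at);

    CREATE FOREIGN TABLE measurements_2021
      PARTITION OF measurements
      FOR VALUES FROM ('2021-01-01') TO ('2022-01-01')
      SERVER shard1;

    -- With async_capable enabled, a query spanning several foreign partitions
    -- should show "Async Foreign Scan" nodes under the Append in EXPLAIN.
    EXPLAIN SELECT count(*) FROM measurements;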
And if you're interested in learning more, check out this blog post.

[00:07:25] Speaker A: The next piece of content is Implementing Incremental View Maintenance for PostgreSQL, Part Three. This post is talking about a future feature, targeting maybe Postgres 15, that would basically let you create a materialized view you don't have to refresh; in other words, it's kept up to date automatically, and the post discusses the process of how that's done. The example they give has two separate tables, and the materialized view is essentially a natural join between those two tables with the resulting output. They then ask: if you update one column from "one" to "first", how do you go about updating that materialized view? They do it in three steps. First, they extract the table change that was made. Then they calculate the view delta, that is, what needs to change in the view to stay consistent with what the table now stores. Finally, they apply those deltas to the view. Predominantly, the way they capture table changes is with triggers on inserts, deletes, and updates on any table that is a source for this kind of incrementally maintained materialized view. They then walk through how the updates are applied to keep the view consistent. There is a fair amount of overhead in doing this, of course, but it's definitely something they're working on, with the goal of getting something implemented by Postgres 15. So if you're interested in learning more, check out this blog post.
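As a very rough sketch of the interface this work proposes, here is an example loosely modeled on the two-table natural join described in the post; the table definitions are hypothetical, and the CREATE INCREMENTAL MATERIALIZED VIEW syntax comes from the patch series under discussion, so it is not available in released PostgreSQL and may change:

    -- Hypothetical base tables for an incrementally maintained view.
    CREATE TABLE r0 (i int PRIMARY KEY, v text);
    CREATE TABLE r1 (i int PRIMARY KEY, w text);

    -- Proposed syntax from the IVM patch (not in released PostgreSQL):
    CREATE INCREMENTAL MATERIALIZED VIEW mv AS
      SELECT * FROM r0 NATURAL JOIN r1;

    -- With IVM, a change to a base table is captured (via triggers in the
    -- current implementation), a view delta is calculated, and that delta is
    -- applied, so no manual REFRESH MATERIALIZED VIEW is required:
    UPDATE r0 SET v = 'first' WHERE i = 1;
    SELECT * FROM mv;  -- reflects the update without a refresh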
[00:08:59] Speaker A: The next piece of content is Updates of PostgreSQL Observability Diagram. This is from Alexey Lesovsky on Medium, and it covers a chart, with a link to pgstats.dev, that diagrams the different processes and objects in Postgres and how you can observe what is going on with them. For example, if you want to learn about the client backends, you can look at pg_stat_activity, or at pg_backend_memory_contexts, which is new in Postgres 14. If you're interested in the write-ahead log, there are pg_stat_wal and pg_ls_waldir(). It's color-coded and gives you a sense of how to get insight, or observability, into Postgres. So if you're interested in learning more, you can check out this post and the link included.

[00:09:52] Speaker A: The next piece of content is Ora2Pg now supports the Oracle foreign data wrapper to increase data migration speed. This is from migops.com, and if you're looking to migrate from Oracle to Postgres, one utility you can use is Ora2Pg. It has been updated recently, it looks like to version 22, and apparently the project has been worked on for around 20 years. With the new implementation using an Oracle foreign data wrapper, they got a lot of speed improvements. The post walks through what configuration you would need on the Postgres side and how you would set this up to do the migration, and it goes into a lot of depth, which I'm not going to cover; but the end result from the testing they did is that, with this new version using the Oracle foreign data wrapper, they were able to more than double their migration speed. So that's a huge performance improvement. So if you want to learn more, check out this blog post.

[00:10:51] Speaker A: The next piece of content: PostGIS 3.1.3 is released. If you want to learn more, you can check out postgis.net.

[00:11:00] Speaker A: Related to that, the next piece of content is PostGIS setup with Ubuntu 20.04.2. This is from cybertec-postgresql.com. They talk through the process of installing PostGIS on Ubuntu, and particularly one of the recent 3.1 versions. So if you're interested in that, you can check out this blog post.

[00:11:23] Speaker A: The next piece of content is How to Deploy pgAdmin in Kubernetes. This is from enterprisedb.com. So if you want to install pgAdmin on Kubernetes, definitely check out this blog post.

[00:11:36] Speaker A: The next piece of content: the PostgreSQL Person of the Week is Stefan Keller. So if you're interested in learning more about Stefan and his contributions to Postgres, definitely check out this blog post.

[00:11:48] Speaker A: And the last piece of content I'll be following up with today is that we published a new episode of the Rubber Duck Dev Show, talking about project management, specifically software development technical project management. Again, this is more long-form content, about an hour in length, so I welcome you to check that out. We will also be doing another episode on the human side of project management this Wednesday at 8:00 PM Eastern Standard Time.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 38

November 12, 2018 00:09:53

Parallel Query, Real Time Analytics, Killed Index Tuples | Scaling Postgres 38

In this episode of Scaling Postgres, we review articles covering parallel queries, real time analytics, killed index tuples and unexpected downtime. To get the...


Episode 3

March 12, 2018 00:09:02

Modeling, Query Performance, Statistics, pgmetrics | Scaling Postgres 3

In this episode of Scaling Postgres, we review articles covering modeling, checking for long queries using pg_stat_statements, statistics in PostgreSQL 10 and pgmetrics. To...


Episode 104

March 09, 2020 00:15:21

Reducing WAL, What is Null?, Understanding SQL, TPC-H Benchmarking | Scaling Postgres 104

In this episode of Scaling Postgres, we discuss how to reduce the size of your WAL files, what is null, how to better understand...
