Faster Partitions, Count Speed, Extensions, Education | Scaling Postgres 82

Episode 82 September 23, 2019 00:12:56
Faster Partitions, Count Speed, Extensions, Education | Scaling Postgres 82
Scaling Postgres
Faster Partitions, Count Speed, Extensions, Education | Scaling Postgres 82

Sep 23 2019 | 00:12:56

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss faster partition performance, count speed, extensions and Postgres education.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/82-faster-partitions-count-speed-extensions-education/

View Full Transcript

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about faster partitions, count speed, extensions and education. I'm Kristen Jameson. And this is scaling postgres episode 82. [00:00:20] All right, I hope everyone's having a great week. Our first piece of content is postgres SQL. Twelve partitioning is now faster. This is from the second quadrant.com blog. The first thing they talk about is copy performance, where they say that bulk loading of data into a partition table using copy is now able to make use of bulk inserts. Previously, only one row was inserted at a time. And they show a scale here going from two to 8192 partitions. And they looked at version eleven and twelve and the number of seconds to do a copy from 100 million rows. And as you can see, twelve was noticeably faster. It starts getting into diminishing returns as you go higher and they're talking about the reason for the slowdown is due to how copy code makes up to 1000 slots for each tuple per partition. So there may be some use cases where you may not see some of this effect, but definitely much improvement with partitions in general, not just with larger numbers of partitions. Next thing you talked about is insert performance and they said in version eleven every partition was locked when you were doing an insert, no matter if it received a new record or not. But with version twelve, quote we now lock a partition just before the first time it receives a row. So if they're just inserting one row, then only one partition is locked. And as you can see here, as the number of partitions goes from again two to over 8000, the transactions per second, it's pretty good, up to about 32 starts getting the difference. And you can see a huge difference that happens as you add more and more partitions. In version twelve, you can see there's no penalty as you're increasing the number of partitions that exist for a parent table. So that's a huge improvement. If you have a lot of partitions in your partition tables, then they talk about a select performance. Now they mentioned here, of course, that PostgreSQL eleven added a partition pruning and with version twelve they just made it a whole lot better. And as you can see what a whole lot better means. Again, the TPS, the transactions per second is still pretty consistent, although you start getting a little bit of a drop down in the 4000 to 8000 partitions level, but it knocks the socks off version eleven. So version twelve is much improved in this area. Then they go over some other partitioning performance improvements in terms of ordered partition scans, getting rid of single subpland append and merge pin to nodes, and some various improvements to the runtime partition Pruning. And there are some particular use cases that again, twelve is improved. But as you can see, as you get larger partition sizes, there still can be some issues that of course they'll be working on. But even with this drop off, they said if they change the where clause slightly, swapping out the stable function, call for a constant, the planner is able to take care of Pruning, so there's still some use cases. Again, that again your mileage may vary, but definitely a lot of great improvements with partitioning. So it seems like Postgres just keeps getting better and better with regard to this. [00:03:20] The next post is what's faster, count asterisks or Star or Count One. And this post basically says there shouldn't be too much of a difference in it and you don't see much of a difference with MySQL Oracle, some other databases, but with PostgreSQL you actually see or he saw a little bit of a difference and they know what is the difference. Count asterisks or Account Star counts all Tuples in a group account by an expression counts all Tuples in a group for which the expression evaluates to something that is not null. So sometimes this comes in useful when doing accounts of outer joins and also when you're doing things like counting groups in a subset when using a case or if you're using Filter in PostgreSQL. Now. He also talks about looking at various different database systems. And pretty much for MySQL Oracle SQL Server doesn't really matter. But it does matter for PostgreSQL. And that Count asterisks or Count Star was consistently faster by around 10% on 1 million rows. So more than he had anticipated. And someone actually in the comments mentioned that they saw something similar too. So it's definitely something to be aware of. But he does note here quote do note that the benchmark only tried a very simple query. So as you get more sophisticated in your counting with different types of queries, you may not see this behavior, or maybe when using having or order by, there's different things that can affect it, but it's something interesting to observe and remember and see if you see this kind of a difference in your queries. And this is from a J Oq.org blog. [00:04:57] Next post is PostgreSQL Extensions a Deeper Look. And this is from a presentation that's been embedded here on the LinkedIn SlideShare called PostgreSQL Extensions a Deeper Dive. Now, this is specifically about Amazon RDS, but I did still find the content relevant for anyone using PostgreSQL. But it talks about what are extensions, what are some of the common extensions that people use. Again. PG Stat Statements is a huge one. PostGIS for doing spatial support, postgres foreign data wrapper, procedural language and JavaScript. And then UUID, he shows different commands to be able to look at the supported PostgreSQL extensions. Again, this is on RDS, how you would go about creating them, listing the used ones and how to upgrade or remove them. And then he also goes into different plugins and how to develop extensions as well. So if you're interested in extensions and getting a little bit more information with regard to them and development. Even though this is again about RDS because it's an AWS presentation, still some relevant content in this presentation so I suggest you to check it out. And this is at Jignesh Shah's Blog. [00:06:09] The next post is PostgreSQL top Learning and Training Resources. So this is basically a list of where you can get more information about learning PostgreSQL and of course the number one resource of course is the PostgreSQL Manual. So it's the online manual that's [email protected]. Other thing mentioned is distribution lists. So email list or distribution list, you can get on these to get communication and communicate with others in the community. [00:06:36] It talks about partners and software tools, official PostgreSQL webinars, a list of PostgreSQL events, all the different PostgreSQL events taking place, local and regional user groups, the PostgreSQL Planet which is a list of different blog posts. So a bunch of different resources to learn more about PostgreSQL. Continuing with the theme of PostgreSQL education, Postgres Open just posted 30 to 40 different presentations that occurred at that conference. So if you're looking for more PostgreSQL education material, definitely check out the Postgres Open YouTube channel to look for more content. [00:07:16] Again, continuing with the theme of education, there is a website that has come up called the Postgrescheaat.com and it is a cheat sheet for accomplishing various different functions in PostgreSQL. So for example, I'm on the tables, it tells you how to show a table backslash d the table name showing more details with it get the list of tables from the current schema. You can look at users and it tells you how to set or change a user's password. So there's all sorts of just quick little cheat sheet of different commands you can use for PostgreSQL and this was built from the team at TimescaleDB so definitely an interesting resource to check out and keep on hand in case you need a way to look up something quickly. [00:08:02] The next post. Another educational resource is postgres execution plans. Field glossary. Now this is defining each of the different things when you're looking at education plans and it is truly a glossary. It tells you a node type and then what it is tells you the different parent relationships, what a filter is, parallel aware, relation, name, alias, estimate fields in terms of total cost, startup cost, plan roles plan with actual value fields in terms of actual loop, actual total time. So this is a good resource in case you're trying to look, explain, analyze a query and you want a little bit more information about what you're seeing in the data returned. It would be great if this had some maybe small examples that could highlight it. But this was intended to be a glossary and that's exactly what it is. So if you're interested in this, definitely a blog post to check out. This is from the Pgmuster blog on Medium next post. Again, related to education is The Art of PostgreSQL the Transcript part one. And this is a text version of the presentation that was given by Dimitri Fontaine at Postgres Open. Again, one of the YouTube videos that was uploaded in the Postgres Open channel. And this is a text version of his presentation. Now this presentation actually seems to cover more why PostgreSQL? In other words, why would you use PostgreSQL? And it's basically talking about its relational database roots and how PostgreSQL fits into that. So if you're interested in this content, definitely a blog post to check out. [00:09:38] The next post is SQL and now GQL. So this is talking about making GQL an official standard alongside SQL and the subtitle says here a standard query language for property graphs. So this is a little bit outside of PostgreSQL's bailewick. But again, talking on the topic of conversation, this is pretty interesting and of course it brings up questions in my mind. Are there different types of support that will come into PostgreSQL with this potentially new definition of a language at GQL? So if you want to get more information about this graph query language and this is a great post to check out, that is on Pulse on LinkedIn.com. [00:10:23] The next post is Manage Linux Control groups in PostgreSQL with PGC groups. So this is in reference to a previous blog post on CyberTech Postgresql.com that they did talking about C groups. And they actually have a PostgreSQL plugin called PGC Groups for making the management of them easier. Now, the first thing to talk about is what is a PGC Groups good for? Basically the C groups kind of put a constraint around the processes that are running on a system like a Linux system. So if you have one PostgreSQL cluster running on a system and you want it to have full resources of that machine, you wouldn't want to use C Groups. But if you had multiple clusters that were running and you wanted to restrict their access to certain resources on the machine, then maybe C Groups is something you would want to potentially use. And PGC Groups, as they talk about here, is something that makes it easier. So they've set up a project page for it and they go through to the steps to how to set it up. And then once you set it up, you can, as they say here, support limits on memory consumption, CPU time, and I O bandwidth. And they say one particularly nice feature is this is the only way to limit the amount of file system cache that is used for PostgreSQL. So if you have a use case that would benefit from constraining the resources that PostgreSQL uses on the machine, definitely a blog post to check out. [00:11:49] The last post is PostgreSQL twelve beta four, up and running in less than six minutes. And this is from Luca Ferrari at Fluco 1978 GitHub IO. And he's using PGN, which is kind of like an environmental builder for postgres, so much like you hear of Python's Environment Builder or Ruby's RBM for building a Ruby environment. This builds postgres environments, and you could use this utility to just build version twelve, beta four pretty quickly, set it to use, and then start up the database. So if you have a need to run different versions, this is a quick and easy demonstration of how you could use PGM to accomplish that. [00:12:32] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show Notes. Be sure to head over to Scalingposgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks.

Other Episodes

Episode 153

February 21, 2021 00:12:24
Episode Cover

Int to BigInt, Resistance is Futile, Optimizer Statistics, Advanced Query Optimization | Scaling Postgres 153

In this episode of Scaling Postgres, we discuss how to convert an integer primary key to a bigint, how container resistance is futile, how...

Listen

Episode 205

March 07, 2022 00:09:56
Episode Cover

Indexing With tsearch, Tuples Not Yet Removable, Wicked Problems, Pattern Matching | Scaling Postgres 205

In this episode of Scaling Postgres, we discuss the best ways to index with tsearch, what to check if tuples are not yet removable,...

Listen

Episode 278

August 20, 2023 00:15:58
Episode Cover

Squeeze Your System, One Million Connections, Indexing LIKE, pgvector HNSW | Scaling Postgres 278

  In this episode of Scaling Postgres, we discuss how to squeeze the most out of your database, achieving one million connections to Postgres, how...

Listen