ARM Tests, Tips & Tricks, Hierarchical Structures, Benchmarking Framework | Scaling Postgres 149

Episode 149 January 24, 2021 00:12:23
Scaling Postgres


Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss tests of Postgres on ARM processors, 2021 tips & tricks, working with hierarchical structures and creating a benchmarking framework.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/149-arm-tests-tips-tricks-hierarchical-structures-benchmarking-framework/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about ARM tests, tips and tricks, hierarchical structures and a benchmarking framework. I'm Creston Jamison, and this is Scaling Postgres, episode 149.

[00:00:18] Alright, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "PostgreSQL on ARM-based AWS EC2 instances: Is it any good?" This is from Percona.com, and they're talking about the new Graviton2-based AWS instances that were added in May 2020. They compared them against an x86 platform. Now, they're calling the x86 one older, so maybe there'll be a new one at some point, but it was an older x86-based m-series instance against the new ARM-based Graviton2 m6gd.8xlarge. Both have the same number of virtual CPU cores and the same amount of memory. The storage is a little bit different: they used the local NVMe storage, not the EBS storage that Amazon offers. But of course, the difference you'll notice is that the ARM instance is about 25% cheaper. So if the performance is equivalent, you already get a win just by paying 25% less. They installed Ubuntu on these systems, did the Postgres configuration as defined in the post, and ran some pgbench tests. With a read-write workload, they found that the ARM processor gave a 19% performance gain. So even though it costs 25% less, it delivered roughly 20% more performance, which is pretty good. With read-only, it was a little better: a 30% gain in transactions per second for the ARM processor versus the x86 processor. So it looks like by switching to these, you could immediately get better performance for a lower price. Next, they tested sysbench-tpcc. One thing they did mention is that while they didn't run into any issues with PostgreSQL on ARM, they did have issues with sysbench on it, so that's just something to keep in mind.
So they looked at an in-memory load with 16 threads, and you can see the ARM instance in blue here. [00:02:28] It gives about 15% better performance. Now, this drop-off, they said, is due to the checkpointing and the full-page writes that happen; it just happened earlier here, since the ARM instance was processing more than the x86 platform. They did in-memory with 32 threads, and the difference between them narrowed. Remember, these instances have 32 virtual CPU cores. Then they went to 64 threads, and the difference got narrower and narrower as the thread count increased. In some I/O tests, they also saw better performance from the x86 platform. [00:03:05] So it's not always the case that ARM wins every time. But in a fair number of the tests shown here, it gave what they call about a 15% to 20% performance gain for 25% less money. So definitely an interesting finding, and perhaps you want to check out using PostgreSQL on ARM for your workloads.

[00:03:25] The next piece of content is "Tips and tricks to kickstart the Postgres year 2021". This is from CyberTec-postgresql.com, and the first tip they cover is partial indexes. This is something that is frequently on his tips-and-tricks list, he says, because it's so important for performance: if you only index certain parts of the data, that index will be smaller and much faster to look through. A place where I see partial indexes being important is if you have, say, a status column with only a few distinct statuses. If you only ever query for one or two of those statuses, create a partial index for each status you query frequently; each index can then be that much smaller because it only covers those rows. In general, for low-cardinality columns, consider a partial index for each of the values you actually query.
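To make the partial-index tip concrete, here is a minimal sketch; the table and column names are made up for illustration, not taken from the post:

```sql
-- Hypothetical orders table where most rows end up in a terminal status
CREATE TABLE orders (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status     text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Index only the status value you actually query for; the index stays
-- small because rows in other statuses are excluded entirely
CREATE INDEX orders_pending_idx
    ON orders (created_at)
    WHERE status = 'pending';

-- A query like this can use the partial index
SELECT *
FROM orders
WHERE status = 'pending'
ORDER BY created_at;
```

Note that the planner will only consider a partial index when the query's WHERE clause implies the index's predicate.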
He uses the example of a column with a lot of nulls but only a few actual values; a partial index that excludes the nulls would be good in that case. The next tip he mentions is estimating the average row or record size in bytes, which he says you can do pretty easily using the function pg_column_size. Now, the name says column size, but you can actually apply it to all the columns of a table by referencing the entire table in the query, which makes it possible to get an average row size. [00:04:46] The next tip is avoiding risk when upgrading older Postgres instances to new major versions, basically by using logical replication. So if there's some reason you can't have downtime, logical replication is a means to do that upgrade, ideally without any downtime at all. The next tip is detecting which index types are available for a given column type. He gives a query where you specify what column type or data type you're using, and it lists the available index types for it. He mentions you can also add extensions such as btree_gin or btree_gist to get a few more options as well. So basically, be aware of the different index types that can potentially give you better performance for what you're trying to query. And the last tip is inspecting metadata at the speed of light with psql's \gdesc, which gives you a description of all the data types from a query you've run. So if you're interested in any of these tips, go ahead and check out this post from CyberTec-postgresql.com.

[00:05:49] The next piece of content is "Hierarchical structures in PostgreSQL". They're talking about a hierarchical structure of departments and sub-departments, where each department has a name.
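As a rough sketch of the kind of departments table and recursive path-building query such a post describes (the schema, names, and data here are assumed for illustration, not copied from the original):

```sql
CREATE TABLE departments (
    id        int PRIMARY KEY,
    name      text NOT NULL,
    parent_id int REFERENCES departments (id)
);

INSERT INTO departments VALUES
    (1, 'engineering',     NULL),
    (2, 'administration',  NULL),
    (3, 'product',         1),
    (4, 'finance',         2),
    (5, 'human resources', 2);

-- Walk the tree from the roots downward, accumulating a path string
WITH RECURSIVE tree AS (
    SELECT id, name, name AS path
    FROM departments
    WHERE parent_id IS NULL
    UNION ALL
    SELECT d.id, d.name, tree.path || '/' || d.name
    FROM departments d
    JOIN tree ON d.parent_id = tree.id
)
SELECT path FROM tree ORDER BY path;
-- administration
-- administration/finance
-- administration/human resources
-- engineering
-- engineering/product
```

Wrapping a query like this in a materialized view, as the post does, lets you refresh the computed paths on demand rather than recomputing them on every read.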
So the engineering department has no parent, whereas, say, the product department is part of engineering, and finance and human resources are part of administration: a hierarchical organization. They then wanted to create a path that defines that hierarchy that you can query. They built this using two different implementations. The first implementation used materialized views with recursive CTEs. The materialized view is just there so the data can be refreshed conveniently; the recursive CTE is what goes through and builds the path structure. It's a relatively simple query to do that, and the materialized view refreshes it on a periodic basis. They show an example of how you can query it, what the generated path looks like, and how you can pull out individual values as well as look for, say, where product sits in the hierarchy. The second implementation uses ltree, which we've discussed in a previous episode of Scaling Postgres. ltree is an extension you can add that builds this for you. When they created the teams table, they used the ltree data type, then loaded the data, and it gives you the same kind of path. It doesn't look identical, it's a little bit different, but they show how you can query this path to get the same types of answers. So if you're interested in learning how to store hierarchical data, you may want to check out this blog post from Hoverbear.org.

[00:07:35] The next piece of content is "Demo of another benchmarking framework". This is from markwkm.blogspot.com, and he's talking about developing a more convenient benchmarking framework into which you can load your own queries to benchmark different data sets or different databases. He used a data warehousing example, with dbgen and qgen from the TPC-H framework.
His user-defined benchmark is called "PG, not TPCH". [00:08:08] So basically he is developing a benchmark framework that you could potentially feed your own information into to test against databases. If you're interested in that, you can check out this blog post.

[00:08:21] The next piece of content is "Waiting for PostgreSQL 14: Add pg_stat_database counters for sessions and session time". Basically, the pg_stat_database view is gaining session time, active time, idle-in-transaction time, the number of sessions, and the number of sessions abandoned, fatal or killed. So it gives you an overview of how much time sessions spend in different activities, as well as where they potentially fall down, and you can gather statistics on that. This is from Depesz.com. Also from that same website: "Are there limits to partition counts?" Now, this is interesting. He actually created up to 70,000 partitions for a table, because he saw someone referencing some limit on it, and he went all the way up to 70,000 without issue. However, he did notice an increase in the planning time required for queries against those partitioned tables. So it does work, but you probably don't want to do that. Still, it's an interesting post about what's possible, and he has a quote here: "I would probably discourage people from making 5000+ partitions on a single table, but PG can definitely handle it." If you want to learn more, you can check out this blog post.

[00:09:41] The next piece of content is "How to with Dave Page, episode 3: Install and run pgAdmin using the Python package". There's a convenient YouTube video that shows you exactly how to do that. This is from Enterprisedb.com.

[00:09:56] The next piece of content is "Golden proportions in PostgreSQL".
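That post does its calculations in plain SQL, and the golden ratio itself is easy to reproduce that way. A quick sketch, not taken from the post itself:

```sql
-- The golden ratio φ = (1 + √5) / 2 ≈ 1.618033988749895
SELECT (1 + sqrt(5)) / 2 AS golden_ratio;

-- φ is the positive root of x² = x + 1, which we can check numerically:
-- both columns come out equal up to floating-point rounding
SELECT ((1 + sqrt(5)) / 2) ^ 2 AS phi_squared,
       (1 + sqrt(5)) / 2 + 1   AS phi_plus_one;
```

Here sqrt() returns double precision and ^ is Postgres's exponentiation operator, so the comparison is approximate rather than exact.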
This is from CyberTec-postgresql.com, and they're talking about golden proportions: certain mathematical values with interesting properties. One particular value he mentions is 1.618, and he uses it in reference to building a new office they were discussing. The real interest of this post, though, is all the different mathematical functions and conversions you can use within PostgreSQL, because the whole post performs its calculations and computations using SQL. So if you want to learn more about the mathematical functions and computations Postgres is capable of, maybe you want to check out this blog post.

[00:10:41] The next piece of content is "Production PostGIS vector tiles: Caching". This is from Crunchydata.com. They're talking about a scenario where you have, say, a public website that needs to display map images, and you're using pg_tileserv to dynamically pull that data to present to users of your application, which creates a huge load on pg_tileserv as well as on the database behind it. Now, one way you could handle it is to set up multiple databases in a cluster, but he advocates that probably the easiest thing to do is set up caching. He used Varnish in this case: basically, you cache the results of pg_tileserv before serving them to your application. You could probably get by with a one-minute, five-minute, or even 30-minute cache of the data; you don't necessarily need it all to be real time, and this kind of caching would still give your application near-real-time results. So if you're interested in learning more about that, you can check out this blog post.

[00:11:47] And the last piece of content: the PostgreSQL Person of the Week is Marc Linster. So if you're interested in learning more about Marc and his contributions to PostgreSQL, definitely check out this blog post.

[00:11:59] That does it.
For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 152

February 14, 2021 00:15:32

New Releases, Reconnect After Failure, Babelfish, Troubleshoot Disk & Memory | Scaling Postgres 152

In this episode of Scaling Postgres, we discuss new PostgreSQL releases, reconnecting after a connection failure, Babelfish SQL Server compatibility and troubleshooting disk and...


Episode 188

October 25, 2021 00:15:26

Automatic Indexing, Function Pipelines, With Hold Cursors, Query Scans | Scaling Postgres 188

In this episode of Scaling Postgres, we discuss automatic indexing, function pipelines, with hold cursors and the different query scans. To get the show...


Episode 201

February 07, 2022 00:15:16

JSONB Performance, Bidirectional Replication, Most Recent Record, PG14 JSON | Scaling Postgres 201

In this episode of Scaling Postgres, we discuss JSONB performance, bidirectional replication, getting the most recent record and using JSON in Postgres 14. To...
