Episode Transcript
[00:00:00] In this episode of Scaling Postgres we talk about practical indexing, vertical scalability, migrating to partitions, and foreign data wrapper performance. I'm Creston Jamison and this is Scaling Postgres, episode 56.
[00:00:22] Alright, I hope everyone is having a great week. We have a fair amount of content this week, so that's better than last week. Our first piece of content is a presentation given at Nordic PGDay recently, called "Look It Up: Practical PostgreSQL Indexing" by Christophe Pettus of PostgreSQL Experts. This goes through the different index types in PostgreSQL and why it's important to use them, covering B-tree, hash, GiST, GIN, SP-GiST, and BRIN indexes. What I found particularly interesting is that around slide 53 he actually has a decision tree: he shows what index to use based on your use case, such as how many rows you have, what type the column is, and whether it's a small scalar, walking through a decision process to determine the best type of index for your situation. I found this a very valuable piece of content, particularly if you're a developer, but of course also for DBAs, to find out the best index to use in PostgreSQL. So, definitely a piece of content I suggest checking out.
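To make those index types concrete, here's a minimal sketch against a hypothetical events table; the table and index names are mine, not from the talk:

    -- Hypothetical table for illustration.
    CREATE TABLE events (
        id         bigserial PRIMARY KEY,
        created_at timestamptz NOT NULL,
        payload    jsonb
    );

    -- B-tree: the default; good for equality and range queries on scalars.
    CREATE INDEX events_created_at_idx ON events (created_at);

    -- GIN: good for containment queries on composite values such as jsonb.
    CREATE INDEX events_payload_idx ON events USING gin (payload);

    -- BRIN: very compact; works well on huge tables where values correlate
    -- with physical row order, such as an append-only timestamp column.
    CREATE INDEX events_created_at_brin ON events USING brin (created_at);

Which one wins depends on exactly the questions his decision tree asks: the column's type and the shape of your queries.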
[00:01:39] The next post is "Vertically Scaling PostgreSQL". This is how you typically think of scaling: horizontal scale-out means using multiple servers, whereas vertical scaling is basically getting bigger and bigger servers. This is kind of the default way to scale Postgres, and the post approaches it from the perspective of CPU, memory, and disk, and which configuration options you would change in PostgreSQL for each. With regard to CPU, maybe you want to increase connections and set connection limits at the database level or for particular user roles. It talks about adjusting the number of backend processes based on the number of CPUs, as well as parallel query workers, and workers for handling things like logical replication or maintenance, and then of course autovacuum as well. Then it covers the memory parameters. The main ones you generally change when configuring PostgreSQL are shared_buffers, effective_cache_size, and of course work_mem, but they also mention temp_buffers and others such as maintenance_work_mem and autovacuum_work_mem. In terms of disk, it talks about temp_file_limit, effective_io_concurrency, and the different page costs. So they go through the different parameters you would want to change as you scale up your PostgreSQL instance; definitely a good post to check out with regard to PostgreSQL configuration (a quick sketch of these settings appears after the partitioning discussion below).

The next post is "Migrating simple table to partitioned. How?", and this is a relatively brief post, but it gives you a convenient way to take a particular table and convert it to using partitions. Here's the technique he uses. First he creates a generic users table and places some rows in it. Then, inside a transaction, you rename the table to something different, create a new partitioned table that will be the parent, and attach the table you just renamed as the default partition of that new partitioned table.
[00:03:51] And then you commit, so all that data is accessible again. Essentially, as he says, quote, "now we have a users table that's partitioned into a single partition." From there you can use an additional technique to move the data into separate partitions: you create a table that will become a partition, use a CTE that deletes rows from what is now the default partition and inserts them into the new table, and then attach it as a new partition. He does have a caveat: unfortunately this will lock the rows for the duration of the move, but that can be easily alleviated by doing the migration in small steps. He has another example of this as well, and in the comments there are some additional suggestions you should potentially evaluate to see if those patterns work better for you. I thought it was a pretty interesting blog post, and if you want to move to partitioned tables, it's definitely something to check out.
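To make the technique concrete, here's a minimal sketch of the overall flow. The table name, column definitions, and partition boundaries are illustrative rather than taken from the post, and the DEFAULT partition syntax assumes PostgreSQL 11 or later:

    -- Step 1: swap the existing table in as the default partition.
    BEGIN;
    ALTER TABLE users RENAME TO users_default;
    CREATE TABLE users (
        id         bigserial,
        created_at timestamptz NOT NULL
    ) PARTITION BY RANGE (created_at);
    ALTER TABLE users ATTACH PARTITION users_default DEFAULT;
    COMMIT;

    -- Step 2: peel rows out of the default partition into a real partition.
    CREATE TABLE users_2019_01 (LIKE users INCLUDING ALL);
    WITH moved AS (
        DELETE FROM users_default
        WHERE created_at >= '2019-01-01' AND created_at < '2019-02-01'
        RETURNING *
    )
    INSERT INTO users_2019_01 SELECT * FROM moved;
    ALTER TABLE users ATTACH PARTITION users_2019_01
        FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');

Doing step 2 over smaller ranges is what he means by migrating in small steps: each batch only locks the rows it moves, for as long as it takes to move them.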
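And here's the quick sketch of the vertical-scaling parameters mentioned earlier. These are placeholder values to adapt to your own hardware, not recommendations from the post:

    -- CPU: connection and parallelism limits.
    ALTER SYSTEM SET max_connections = '200';            -- restart required
    ALTER SYSTEM SET max_worker_processes = '16';        -- restart required
    ALTER SYSTEM SET max_parallel_workers = '16';
    ALTER SYSTEM SET max_parallel_workers_per_gather = '4';

    -- Memory: the usual suspects when the server gets more RAM.
    ALTER SYSTEM SET shared_buffers = '8GB';             -- restart required
    ALTER SYSTEM SET effective_cache_size = '24GB';
    ALTER SYSTEM SET work_mem = '64MB';
    ALTER SYSTEM SET maintenance_work_mem = '1GB';

    -- Disk: I/O-related settings, e.g. for SSD storage.
    ALTER SYSTEM SET effective_io_concurrency = '200';
    ALTER SYSTEM SET random_page_cost = '1.1';
    ALTER SYSTEM SET temp_file_limit = '20GB';

    SELECT pg_reload_conf();  -- applies the settings that don't need a restart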
[00:04:52] The next post is "Foreign data wrapper for PostgreSQL: performance tuning", and this is from cybertec-postgresql.com.
[00:05:01] So foreign data wrappers essentially allow PostgreSQL to communicate with other database systems, or even files; basically, to access data outside of PostgreSQL. Here they're talking about a feature of the postgres foreign data wrapper, postgres_fdw, in particular: you're in one PostgreSQL database or cluster and you want to talk to another one. He goes through the process of creating a table, inserting data, creating the foreign server, mapping a user to it, and importing the schema, basically getting to the point where he has a foreign table he can access. Then he tests the performance. Querying the table locally, he retrieves all the rows in about 7.5 milliseconds, but querying through the foreign data wrapper returns in 90 milliseconds. So the question is why, and he attributes it to the fact that, by default, only 50 rows are fetched at a time and sent over the network. If you're trying to send 100,000 rows, that's going to be a lot of network round trips. What you can do is change the fetch size; in this scenario he increased it to 50,000, and now that query runs in 40 milliseconds. So essentially he more than doubled his performance by altering this one parameter of the foreign data wrapper configuration. If you're using foreign data wrappers and can potentially use a performance tip like this, it's definitely something to check out.
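Here's a hedged sketch of the setup and the tuning step; the server, host, database, and credential names are invented for illustration:

    -- One-time setup of the postgres_fdw link.
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;
    CREATE SERVER remote_db
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'remote.example.com', dbname 'appdb');
    CREATE USER MAPPING FOR CURRENT_USER SERVER remote_db
        OPTIONS (user 'app', password 'secret');
    IMPORT FOREIGN SCHEMA public FROM SERVER remote_db INTO public;

    -- The tuning step from the post: raise fetch_size so large result
    -- sets need far fewer network round trips.
    ALTER SERVER remote_db OPTIONS (ADD fetch_size '50000');

    -- fetch_size can also be set on an individual foreign table:
    -- ALTER FOREIGN TABLE some_table OPTIONS (ADD fetch_size '50000');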
[00:06:38] The next post is "PostgreSQL roles are cluster-wide", and this is from blogs.ed.ac.uk. Basically it describes how roles, essentially users and groups, are cluster-wide rather than specific to a database, and how their organization handles it: they prefix each role they create with the name of its database. He then proposes possible ways PostgreSQL could support database-specific users. The one I found most interesting is approach three, where, when you create a role, you could give a DB-only option that restricts that particular role to a particular database. If you have a need to restrict certain roles to certain databases within a cluster, this is definitely a blog post to check out.
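That DB-only option is a proposal, not something PostgreSQL supports today. A common workaround, sketched here with invented names, is the prefix convention the post describes combined with per-database CONNECT privileges:

    -- Convention: prefix the role with the database it belongs to.
    CREATE ROLE salesdb_readonly LOGIN PASSWORD 'secret';

    -- Stop roles from connecting by default, then grant CONNECT
    -- explicitly to the roles that belong to this database.
    REVOKE CONNECT ON DATABASE salesdb FROM PUBLIC;
    GRANT CONNECT ON DATABASE salesdb TO salesdb_readonly;

    -- Note: the role can still connect to other databases unless
    -- CONNECT is revoked from PUBLIC on those databases as well.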
[00:07:31] The next post is "Having Group By Clauses" from Elein's General Bits, and this is from 2ndQuadrant.com. Basically it explains how to use GROUP BY with HAVING clauses. A HAVING clause is kind of like a WHERE, but it operates at the aggregate level: after your GROUP BY, it acts as an additional filter. They go through a few scenarios of using GROUP BY with HAVING, so if you want a little more practice with that, it's definitely a blog post to check out.
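As a quick illustration, with a table and columns of my own rather than the post's:

    -- WHERE filters rows before grouping; HAVING filters the groups
    -- after aggregation. Keep only customers whose completed orders
    -- total more than 1000.
    SELECT customer_id, sum(amount) AS total
    FROM orders
    WHERE status = 'complete'
    GROUP BY customer_id
    HAVING sum(amount) > 1000;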
[00:08:00] The next post is actually older; it's one of the presentations from Postgres Open 2018, and it's another YouTube video I suggest checking out, called "How to Monitor Your Database", or "How to Monitor PostgreSQL", from Baron Schwartz. He talks about a general perspective on monitoring and what he likes to focus on. I found it a really interesting perspective, and if you have the time, I highly recommend checking out this presentation.
[00:08:34] The next post is "Configuring PostgreSQL for Business Continuity", and this is from the Severalnines.com blog. It covers a few steps to take to help assure high availability for your database, in terms of enabling WAL archiving and setting up automatic failover; in this example they're using repmgr. We've discussed some of this in previous episodes of Scaling Postgres, but this is definitely another post to check out if you're interested in enabling high availability for PostgreSQL (a minimal WAL-archiving sketch appears below).

The next post is "PostgreSQL Zero to Hero: Getting Started with RPMs, Part One". This covers using RPM packages to install and configure PostgreSQL on a CentOS or Red Hat based system, so if you need to do that, here's a blog post to check out.
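Here's that minimal WAL-archiving sketch from the business continuity discussion above; the archive destination is a placeholder, and in production you would typically point archive_command at a proper backup or archiving tool:

    -- Enable WAL archiving; archive_mode and wal_level need a restart.
    ALTER SYSTEM SET wal_level = 'replica';
    ALTER SYSTEM SET archive_mode = 'on';
    -- Simple illustrative command modeled on the documentation's example.
    ALTER SYSTEM SET archive_command = 'cp %p /mnt/archive/%f';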
[00:09:26] The next post is "Azure Data Studio: An Open Source GUI Editor for Postgres". Now, I know a lot of people tend to just use psql, but if you're looking for a GUI tool, here is another one; its Postgres support is currently in preview, so I guess you would classify it as something like a beta. But if you're looking for a GUI tool, maybe this is something to check out and potentially use in your installation.
[00:09:50] And the last blog post is "Shared Relation Cache", from a blog covering PostgreSQL and Pgpool. It talks about a shared relation cache within Pgpool working with PostgreSQL, so if you use Pgpool and want to learn more about this feature, it's definitely a blog post to check out.
[00:10:12] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.