Episode Transcript
[00:00:00] In this episode of Scaling Postgres, we discuss high volume processing, indexing, PG wall issues and restoring with PG receive wall. I'm Kristen Jamison and this is Scaling Postgres, episode nine.
[00:00:21] Welcome to Scaling Postgres. An event of notes that happened this week is Postgres Conf took place in Jersey City in the US.
[00:00:30] Now, as a part of that, our first content is related to that because it actually was presented there. It's how to Index Your Database by Baron Schwartz. That occurred at Postgres Conf. Now, some of the initial slides are things that were already pretty familiar with to me, but starting on around slide 29, it started to get a really good refresher on six ways to optimize index usage visa vis, performance related issues.
[00:01:02] So for example, he covers how not to defeat indexes in that you need to do a search on a value in an index, not necessarily an expression, and about the left prefix rule, whereas multicolumn indexes are sorted by column one and column two. And that queries can use the prefix of an index, but not necessarily the suffix.
[00:01:28] Related to that is that it's always best to do equalities first and then the ranges when you're using a multicolumn index. Another piece of advice is to exploit index only queries. So basically create covering indexes for important queries so that you can get index only queries.
[00:01:49] The next piece of advice is to exploit clustered indexes. However, this is not supported in PostgreSQL, so this presentation is more generic. But the other pieces of advice up to this point have been beneficial.
[00:02:03] And again, five is consider column selectivity and order. So again, the leftmost prefix rule applies.
[00:02:12] And lastly, he mentions avoid over indexing in terms of it does increase your right load. But the reality is the other side of it is don't fear indexes because you can probably put a lot more indexes on a table than you think you can, particularly if you're using partial indexes strategically, you're not covering the entire database with them. So, some pretty good pieces of advice. So definitely a piece of content to check out.
[00:02:43] The next article is PostgreSQL's developments for High Volumes Processing. So in it since about PostgreSQL 9.5, he goes over all the different features that have been added up to and including what is projected to go into version eleven of postgres on features that help with high volume processing. And this is from the select all from Adrian blog. And in it he goes over features that impact parallelism, how well it can execute queries in parallel, different access methods, methods of partitioning, internal improvements to the code to execute things faster maintenance tasks, task areas, foreign data wrappers. So, a very comprehensive post about each of these features. So definitely a piece of content to review to ensure you're using all the features that can help you scale your database.
[00:03:41] The next article is postgres as the substructure for IoT and the next wave of computing. And this is from the Timescale.com blog. And again, Timescale is a extension for PostgreSQL that adds time series features to postgres. Now, this title is not very specific about what it covers, but why I wanted to take a look at this post and what I thought was interesting is it's reporting down here showing once you have large data sizes, millions of rows in a table, how insert performance can start to degrade. And this is something that they experienced with their particular instance size they were using. Now, there's different ways to address this with base PostgreSQL in terms of you can start using partitions, larger instance sizes, there's some other things you can do. But because of this pain that they were experiencing, because they were doing consulting in this area with regard to time series, they developed their solution where you can get consistent performance gains up to looks like up to a billion rows here in the table. And comparing the performance drop off. So if you're using, or I should say collecting time series data in your PostgreSQL database and you're experiencing slowdowns in the inserts, you might want to check out this extension called Timescale DB to see if it can help you improve your performance. And you can see some of the results that they've seen here with this, with their experiment that they did, is 20 times higher inserts, 2000 times faster deletes and up to 14,000 times faster queries. I'm sure some of the partitioning can get you some of the way there, but I'm sure some of their enhancements have definitely increased that or made it faster.
[00:05:40] The next article is actually a series of articles about PostgreSQL data types. So if you're a developer, this is something that you might find of interest. And I will include three links to the three posts. One is on arrays, one is on range data types, and one is on network addresses, so IP addresses. The one I find of more interest is the arrays where he goes over a particular use case where these could be advantageous and goes over loading data and showing you how to set up the array data type.
[00:06:18] But he also goes over when you're needing to search in certain ways, applying a gen index to them to be able to search for certain values, as well as cases where it's beneficial to use a nest to UNNEST the array data type. So, definitely a blog post to check out and these series of blog posts are from the Tap Oueh.org blog. So the next series of content would be of interest to database administrators. The next article is PG wall is too big. What's going on? So this is a very interesting blog post in terms of where they went in and diagnosed a problem that was occurring and trying to find out the solution to it and giving you some general advice about how PostgreSQL works. So if you are doing management of a database, I definitely suggest checking this out. Basically, he talks a little bit about how essentially the active log files that PostgreSQL uses are stored in versions prior to Ten in PGx Log. And you never want to touch those, you never want to delete them manually. Let Postgres manage those because they actually mentioned in the post how some people thought they were just logs and they could delete them.
[00:07:35] And that may be part of the reason why in version ten they changed the name from PGx Log to PG Wall. So it's clear that these are the right ahead logs. And basically it talks about different ways to address if your disk space is filling on, how you can grab some free space, and how you can diagnose different or resolve different issues that you encounter. And there's even an interesting comment here about another issue that someone ran into. So definitely if you're a DBA, this is very good post to check out.
[00:08:10] The next article is a guide to PG Pool for PostgreSQL part one. And this is from the Several nines.com blog. And as the name implies, PG Pool is a connection pooler for postgres, but it also does a lot more. So for example, they list terminator replication load balancing for read, scalability, high Availability.
[00:08:32] So PG Pool is one of these utilities from my understanding. I've never set up PG Pool, but it's kind of has a lot of different features under it, whereas I've tended to use PG Bouncer because it's a focused connection pooler. But if you're interested in looking at PG Pool, they discuss how to set it up and how to get it working. So definitely a blog post to check out if you're investigating using PG Pool.
[00:08:58] The next article is an expert's guide to Slony replication for PostgreSQL. So Slony is another way to do replication. It's a third party replication system. It's not part of the PostgreSQL community. I tend to use the built in the community's PostgreSQL replication, but this is another option that has advantages in some use cases. So if you're interested in checking out, you can review this blog post.
[00:09:28] The next article is PostgreSQL restore when using Pgraceive Wall. So I did a tutorial a week or two ago about setting up Pgraceive Wall and that basically streams wall files from say, your primary database so that you don't have to worry about archiving them on the master and finding ways to get those files off the primary database. You just use a replication slot and stream them to another system to aggregate them there and then send them off to long term storage, say in S Three or whatever you would like to do. But with using PG Receive Wall, you have to take into account some special situations like it creates partial files. And how do you deal with that when doing a restore, particularly when you want to get it up to a particular point in time, or to restore synchronization with an existing replica and even having the PG receive wall instance following the Master after Restore. So I cover that in this video tutorial, and if this is something you're interested in, be sure to check it out.
[00:10:36] That does it. For this episode of Scaling Postgres, you can get the links to all the content presented in the show Notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks.