[00:00:00] In this episode of Scaling Postgres, we talk about constraints, user-based sorting, why Postgres is terrible, time series databases, and zero downtime upgrades. I'm Creston Jamison, and this is Scaling Postgres episode five.
[00:00:24] Okay, the first post is Database Constraints in Postgres: The Last Line of Defense, and this is from the Citusdata.com blog, and it goes over the importance of constraints in terms of restricting the type of data you insert into your database. Now, depending on the application that you're developing, usually the framework that you're using allows you to place constraints at the application level to ensure only certain data gets inserted into your database. However, what he's talking about here are actual database constraints. So these are constraints that you apply on the tables and columns of your data to ensure that only the right data gets inserted there. Because if you place your constraints at the application level, there are always other ways, using direct SQL, to get in and alter the data. So having constraints on the database is an important thing to set up. And one of the first examples he uses here of a constraint is basically unique indexes. Now, he's specifically talking about partial unique indexes, but just having a unique index itself is a type of constraint. The partial unique index example he uses is: say you have a users table, and you want the emails within that users table to be unique only when a deleted_at column is null. So he is saying a partial unique index is a tool you can use for this. And then they go over other basic constraints you can set. So for example, you can check that the price is greater than zero, or that the sales price is greater than zero. So establishing a classical check constraint on the data being inserted to ensure that prices are greater than zero, and you can even add other checks, such as that the price will always be greater than the sales price. And they also go into more sophisticated check constraints you can establish for your database. They also discuss exclusion constraints, and these help prevent overlap of particular data sets. And then they talk about simple data types being database constraints. So you could use a text column for everything, but it can be really helpful to use a boolean if all you're looking for is a true/false value, or, if you need to restrict the length of a particular column, to actually use a fixed-size varchar field. Or there are IP address column types you can use.
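To make that concrete, here is a rough sketch of the kinds of constraints being described; the table and column names are hypothetical examples rather than the ones from the post.

    CREATE TABLE users (
        id bigserial PRIMARY KEY,
        email text NOT NULL,
        deleted_at timestamptz
    );

    -- Partial unique index: emails only need to be unique among rows
    -- where deleted_at is null, i.e. users who have not been deleted
    CREATE UNIQUE INDEX users_active_email_idx
        ON users (email)
        WHERE deleted_at IS NULL;

    CREATE TABLE products (
        id bigserial PRIMARY KEY,
        price numeric NOT NULL CHECK (price > 0),
        sale_price numeric CHECK (sale_price > 0),
        -- Cross-column check: the regular price must exceed the sale price
        CHECK (price > sale_price)
    );

    -- Exclusion constraint: no two reservations for the same room may
    -- overlap in time (btree_gist provides the = operator for gist)
    CREATE EXTENSION IF NOT EXISTS btree_gist;

    CREATE TABLE reservations (
        room int,
        during tstzrange,
        EXCLUDE USING gist (room WITH =, during WITH &&)
    );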
[00:03:03] So this is a good article for developers to look at, especially if you are looking for ways to help increase your data's integrity.
[00:03:13] The next article is User-Defined Order in SQL, and this is from the Begriffs.com blog.
[00:03:24] And this blog post discusses a particular use case where you have an ordered list of items, and the user can move those items and arrange them in any order that they want, and how best to accomplish it from an efficiency standpoint, a robustness standpoint, and an elegance standpoint. And he goes over four different approaches that you can use to support user-based ordering. He talks about using an integer method, using a decimal position method, an approach using true fractions or rational numbers that's supported by a database extension he created, as well as using that true-fraction technique but doing it with floats. So if user-based ordering is a feature your application has, you might want to check out this post to see if implementing some of these methods makes sense for you. Now, his database extension is called pg_rational. I'm not 100% sold on using it where I use this feature in my applications. However, I would encourage you to check out this blog post to see if it's something you might be interested in looking into.
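As a rough illustration, here is a minimal sketch of the simplest variant he covers, the decimal position method; the table and values are hypothetical.

    CREATE TABLE items (
        id bigserial PRIMARY KEY,
        name text NOT NULL,
        position numeric NOT NULL   -- user-controlled sort key
    );

    -- To move an item between two neighbors, assign it the midpoint of
    -- their positions; no other rows need to be renumbered
    UPDATE items
    SET position = (1.0 + 2.0) / 2.0   -- neighbors sit at 1.0 and 2.0
    WHERE id = 42;

    SELECT id, name FROM items ORDER BY position;

Repeated midpoint moves like this keep consuming precision, which is the sort of problem the true-fraction approach is meant to avoid.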
[00:04:45] The next post is PostgreSQL: Speeding Up Analytics and Windowing Functions, and this is from the Cybertec-postgresql.com blog, and the subheading is Improving SQL Performance, or How to Trick the Optimizer. So in this example he sets up an array aggregate window function that orders once by ID and then once by ID descending, and then orders the final result by ID. And here's the explain plan for it. But what's interesting is that by simply moving the column order in the select, he is able to eliminate one of the sorting operations in the explain plan. So this is an example of where a simple change can get you a performance boost in your queries. So if you're using window functions that utilize multiple order bys in this way, you might want to check out this blog post to see how you can potentially make them more efficient.
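The shape of the query in question is roughly the following sketch; the table name is hypothetical, and the exact plans will depend on your version and data, so compare the explain output yourself with the two window columns swapped.

    SELECT id,
           array_agg(id) OVER (ORDER BY id DESC) AS agg_desc,
           array_agg(id) OVER (ORDER BY id)      AS agg_asc
    FROM t
    ORDER BY id;

    -- With the ascending window listed last, the rows already come out
    -- sorted by id, so the final ORDER BY no longer needs its own Sort
    -- step; list agg_asc before agg_desc and an extra Sort reappears.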
The next article is actually a presentation that was given at Nordic PGDay in 2018 by Christophe Pettus, I believe from PostgreSQL Experts, and the title is Why is PostgreSQL Terrible? So it's always a little bit difficult to interpret presentations from slides. However, with this presentation you can pretty much get the gist of it by reading through it. I hope at some point they post it on YouTube, because it is a really interesting presentation, and it basically discusses the pain points that people typically have with PostgreSQL from the perspective of query planning, vacuum, upgrades, and connection management (so, how many connections to have to your PostgreSQL server, as well as whether you are using PgBouncer or PgPool), plus high availability and, of special interest to this channel, scaling. So this presentation talks a lot about the problems that PostgreSQL has, or that people bring up that it has. And it doesn't necessarily come up with exact solutions, but rather a general direction that the PostgreSQL community could follow. So if you use PostgreSQL for any purpose, I highly suggest you check out this presentation, because it gives you a good perspective on not only where PostgreSQL is right now, but how it can be improved in the future.

The next article is actually a YouTube video, and its name is Time Series Database Lectures #6: Mike Freedman, TimescaleDB.

[00:07:37] And this is from the Carnegie Mellon University Database Group channel. Now, this didn't happen this week; this was back in November. However, I just happened upon it while looking for other types of content, and I had heard a little bit about time series databases, so I thought I would take a look at it. I had seen another time series database where they basically explained what they were doing, and it sounded a lot like they were developing features that already existed in PostgreSQL. For example, they were relying on the operating system for a lot of things. You know, inserts happened very fast, whereas updates were relatively slow, which is typically what PostgreSQL is like. Deletion happens via tombstone records, which is kind of similar. They had a process they went through called compaction, which sounds like vacuum, essentially. But they weren't an SQL database; they had an SQL-like language. They had write-ahead logs. They had indexes in memory, but in the next version they were going to disk, so they would have to cache them in memory. So for a lot of it, it looked like they were replicating the kinds of things that PostgreSQL already does. However, this particular presentation is on TimescaleDB. Now, this is a PostgreSQL extension, so it uses all the power and the features of PostgreSQL, but then it adds on additional capabilities to support its time-oriented features: basically a lot of high write rates and time-oriented optimizations, as they say here. So if you've heard about time series databases and are kind of interested in what they are, and if you're using PostgreSQL, I definitely suggest checking out this presentation, because if I were ever to need these types of features, I would definitely reach for this particular extension as opposed to changing the entire database that I'm using. And it gives you a good overview of the direction they're taking in terms of this extension.
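For a flavor of what using the extension looks like, here is a minimal sketch based on TimescaleDB's documented create_hypertable call; the table itself is just an illustrative example.

    -- TimescaleDB installs as an extension into an existing database
    CREATE EXTENSION IF NOT EXISTS timescaledb;

    CREATE TABLE conditions (
        time        timestamptz NOT NULL,
        device_id   text,
        temperature double precision
    );

    -- Turn the plain table into a time-partitioned hypertable; queries
    -- and inserts continue to target the original table name
    SELECT create_hypertable('conditions', 'time');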
[00:09:48] The next article is An Overview of Logical Replication in PostgreSQL, and this is from the Severalnines.com blog, and it's exactly what it says. It gives you an overview of logical replication in PostgreSQL version ten: its features, its use cases, its limitations, and then it breaks out how to actually use it by setting it up with the commands here. So if you're thinking about logical replication, definitely check out this blog post. Now, related to that, there is another blog post on the 2ndQuadrant.com blog, and it's Near-Zero Downtime Automated Upgrades of PostgreSQL Clusters in the Cloud, and this is actually part one. And he goes over a presentation that he did discussing exactly what the title says: near-zero downtime automated upgrades of PostgreSQL clusters in the cloud. And they say automated because they were using Ansible to do this upgrade. So the blog post goes over some of the options for doing the upgrades and then goes more into depth about using logical replication for upgrades. But the actual presentation slides have been posted at a link that will be in the show notes, and the technique that they're using for a 9.5 to 9.6 upgrade actually uses pglogical, which to my understanding is the precursor to the logical replication feature in version ten. And essentially, in this example, you had a primary database server with three standbys, and PgBouncer is connecting to the primary database. When you want to do the upgrade, you take one of the standbys, or an entirely new server, and you set up version 9.6, and you do logical replication of all the data from the primary at 9.5 to the primary at 9.6.
And then you set up one or more standbys on that version 9.6 using standard physical replication, for example. Then, at the point that you need to do the application switch, you simply change PgBouncer to point to the new 9.6 primary and stand up the additional standbys that are needed. So if you're considering using logical replication as a way to upgrade your PostgreSQL database clusters, I definitely suggest checking out this blog post and then the presentation that it links to.
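For the built-in logical replication in version ten that the Severalnines post covers, the basic setup looks roughly like this; the publication, subscription, and connection details are hypothetical, and wal_level must be set to logical on the publishing server. The 9.5 to 9.6 path in the presentation relies on pglogical instead, since the built-in feature only arrived in version ten.

    -- On the source (publishing) database:
    CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

    -- On the destination (subscribing) database:
    CREATE SUBSCRIPTION upgrade_sub
        CONNECTION 'host=old-primary dbname=app user=replicator'
        PUBLICATION upgrade_pub;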
[00:12:18] The next article is from the Azure.microsoft.com blog, and this is Announcing General Availability of Azure Database Services for MySQL and PostgreSQL. So basically, much like Amazon, these services are now generally available to be used on Azure, and there's a presentation here that goes over the feature set as well as a lot of the items mentioned in the blog post. So if you are looking for a hosting service, Azure is now another possibility for you.
[00:12:50] The next article is Using pg_badplan to Create Statistics Is a Good Plan, and this is from the Joel on SQL blog
at joelonsql.com. And this is a post related to a presentation that explained how to manually compare the predicted rows a query plan would use versus the actual rows. So if there's too much of a divergence between what the planner thinks should be there statistically and what's actually there when it runs the query, it will make you aware that these variances exist so that you can potentially use create statistics. Again, this is a version ten feature in PostgreSQL that allows you to define dependencies between columns. So for example, here they're looking at the dependencies for a zip code table with city, state, and country. So generally, certain zip codes reside in certain cities, states, and countries. So if you suspect that you may have queries or data that could benefit from this, you may want to check it out.
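Using the zip code example, the definition would look something like the following sketch; the table and column names are illustrative rather than taken from the post.

    -- Tell the planner these columns are functionally dependent, so it
    -- stops multiplying their selectivities as if they were independent
    CREATE STATISTICS zip_dependencies (dependencies)
        ON zip, city, state, country
        FROM zipcodes;

    -- Rebuild the statistics so the planner can use the new definition
    ANALYZE zipcodes;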
[00:14:00] The next article is PostgreSQL Replication Slots, and this is an in-depth tutorial, done by me, that discusses how to set up PostgreSQL replication slots for physical replication. If this is something you're thinking about implementing, I definitely suggest checking it out.
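The core of that setup, as a quick sketch with hypothetical slot and host names, applies to the recovery.conf style of configuration used by the PostgreSQL versions current at the time:

    -- On the primary: create a physical replication slot so WAL is
    -- retained until this standby has consumed it
    SELECT pg_create_physical_replication_slot('standby_1');

    -- On the standby, in recovery.conf:
    --   standby_mode = 'on'
    --   primary_conninfo = 'host=primary-host user=replicator'
    --   primary_slot_name = 'standby_1'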
[00:14:18] That does it for this episode of Scaling Postgres. You can get all the links to the content presented in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode. You can also subscribe via YouTube or iTunes. Thanks.