Sharding, Giving Back, SCRAM, Partitioning Evolution

[00:00:00] In this episode of Scaling Postgres, we talk about sharding, giving back scram and partitioning evolution. I'm creston. Jameson and this is scaling postgres episode 36. [00:00:21] Alright, I hope everyone is doing well. So this week, our first article is why Citus Data is donating 1% equity to PostgreSQL organizations. This is from the Citusdata.com blog and it basically is exactly as they've described to contribute to PostgreSQL. Citus Data has contributed 1% of its stock to the project and specifically it's the PostgreSQL Association, which is in the US. And that will work with PostgreSQL Europe to support the growth, education and future innovation of postgres in both the US and Europe. So again, if you're not familiar, citus Data as an extension to Postgres for Sharding your database and they also offer it as a service. And I definitely encourage you to read this blog post because it's really great and I really like some of the things that they say here. And I found this part interesting. [00:01:13] PostgreSQL is the world's only independent database and they talk about it's significant because it's unique and that no single company backs or represents the Postgres project. So it's essentially a community, a coalition of companies, organizations and individuals that have made and will continue to make PostgreSQL the database that it is. I definitely encourage you to check out this blog post and I really find it interesting the investment that Citus Data has made here. [00:01:45] The next post is sharding your data with PostgreSQL eleven. And this is from the PG IO blog. Now, as we've discussed in previous episodes of Scaling Postgres, one of the big time features of PostgreSQL Eleven is all the different partitioning enhancements that have been added. And one of the ones that hasn't quite been mentioned in a lot of the posts is a change for foreign data wrappers that kind of lets you start setting up your own sharding system, if that's of interest to you, and they walk through how to do it in this blog post. Basically you're going to be using a declarative partitioning introduced in version Ten. So in their example here, they created a temperatures table and partitioned it by a range of dates and created some partitions for it by year. So they did 2017, 2018, and this is just your standard partitioning. But the interesting part is getting into the foreign data wrapper. So they create the extension Postgres foreign data wrapper FDW, and they essentially set up a second server system they're calling box Two. So it's a separate server instance. And they also map a user for that second server instance. On that second box, they're going to create a temperatures table for 2016. So the primary database server, as it will, has 2017 2018. So they create 2016 on their separate instance. Then within their primary, they create that foreign table and make it a partition table and they indicate it's on the second server. And now you can do inserts into temperatures with the old date on the primary server and it will write that data to the second server. So it's communicating over the network to store the data for this partition child table on that second server. And they say here a quote being able to insert rows into a remote partition is new in version eleven. With this feature you can now have your data sharded logically via partitions and physically by the foreign data wrapper. So again, Sharding is not something that you typically hear about as a mainstay feature of PostgreSQL, but we've been slowly moving towards it and this kind of puts some additional pieces in play where you could potentially start using these techniques now. But with each release we're going to get more features that make Sharding a reality for more use cases. So if you're potentially interested in Sharding, this is definitely a blog post to check out. [00:04:14] The next post is how to set up Scramshaw 256 authentication in PostgreSQL. And this is from the Hacksoclock Blogspot.com blog and basically most everybody's using MD Five authentication. However, Scram support was added in PostgreSQL ten and basically it's much more secure and you should look to move to Scram as soon as you're able. And one of the main delays you might get here, some libraries may not support this level of authentication yet. So before you choose to go this route, definitely confirm that all of your libraries that need to communicate with PostgreSQL support this type of authentication. And he just goes over the steps, how to set up and the process to run through and it was all very clear to me. So if you're wanting to increase your security by using Scram, definitely a blog post to check out the next post is PostgreSQL eleven partitioning evolution from postgres 9.6 to eleven and basically table partitioning has existed for a long time in postgres and it wasn't until ten that they added declarative partitioning and then much more features in eleven. And I keep saying this in different episodes and talking about blog posts, but this kind of lays out a feature comparison you may see for a product that's on sale and you could see the significant enhancements that have been added via eleven compared to ten. And even in 9.6 had hardly anything, only a little bit of optimizer partition elimination. So again, this reiterates it in a very visible form, the number of enhancements that have been added in PostgreSQL eleven with regard to partitioning. And the other thing they talk about here is the performance. Now here they're loading partition data, 10 million rows worth. With 100 partitions we could see a dramatic increase in speed from version 9.6 to ten and eleven, so really big improvements. [00:06:15] And here they're looking up a single record and performing some DML statements and you can see going from 9.6 to ten to eleven. Again, dramatic improvements in selects inserts. It looks in terms of updates we lost a little bit with Ten but have gained it back in eleven and then Deletes are about the same. But definitely it talks a lot about partitioning and how if you're not using it, PostgreSQL eleven is definitely a version to start trying it out if you think it'll help you scale your database better. [00:06:45] The next post is actually the YouTube channel I've discussed for a few episodes of Scaling Postgres, but it's the Postgres Open SV 2018 YouTube channel. I've actually gone through a few of these videos and I want to kind of mention the top one I've seen thus far as well as a runner up, I would say. So out of the ones I've seen thus far, the one that I would give kind of the top honors is Dude, where's my byte? And basically he traces where is your data stored actually in postgres when it stores it. And he talks about where on the file system looking in the different directories and the base directories, what's stored there. And he talks about how that data is there and how it gets there. Like when you create a database, where is that actually created? When you create a table, where is that in the file system? And then he talks about the importance of when you insert data, why does it take a while to get there. And he talks about the different processes in terms of background writing and checkpoints and wall. And basically he goes through a lot of the rationale and the engineering behind PostgreSQL. So it is a bit of a Whirlwind tour but it gives you a good overview on kind of how postgres works and kind of why it works the way it does. So out of the videos I've seen thus far, it's not specifically related to scaling, but I would say this is one of the better presentations I've seen thus far from that conference. And what I'll do is over the next few weeks as I've watched more of these videos, I'll definitely share what I think are some of the top videos from that YouTube channel. Now, the runner up is the evolution of postgres high availability. So again, this talk goes from postgres versions prior to, I believe, eight and going up to all the different changes. And essentially what it's mostly talking about is replication. So at first you had wall shipping where you just take your wall files and you ship them to another server and restore them there to be able to do real time physical streaming and then moving into logical replication and logical decoding of the wall files and ending up talking about bi directional replication. Now, this is not something that the community PostgreSQL does, but Second Quadrant does have their BDR, their bi directional replication product and he talks about that and some of the trade offs in using that. So if you're wanting to learn more about replication or maybe altering or changing how you're doing? Some of it definitely a presentation to watch. The next post is PostgreSQL locking part Two heavyweight locks. And this is from the Procona.com blog. And basically this is just an overview of all sorts of different locks that are available in the system. So he talks about Row sharing locks, exclusive locks, different Share locks, Share Update, exclusive, Access Exclusive, and talk about PG advisory locks and Deadlocks. So this is definitely a great review for learning all about the different locks that are in PostgreSQL that the system uses as well as you could use to lock certain objects. [00:09:39] The last post is basically an announcement that Pipeline DB has reached 1.0.0, and this is a high performance times aggregation for PostgreSQL. And the TLDR is Pipeline DB is now a postgres extension. So similar to Timescale DB, they've made their time series database essentially now a PostgreSQL extension. It used to be a fork, but now it's moved back to an extension, and they classify it as a enabling real time reporting use cases at scale where only summary data is needed. And it's now licensed under the liberal Apache 20 license. So if you're potentially interested in Pipeline DB or a time series database, that's essentially just an extension of PostgreSQL, definitely a blog post to check out. [00:10:27] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can sign up via YouTube or itunes. Thanks.

Show Notes

Episode Transcript

Other Episodes

Episode 175

Zheap Status, Sequence Gaps, Efficient Pagination, Set Up Patroni | Scaling Postgres 175

Episode 288

Index Corruption From ICU Collation Change | Scaling Postgres 288

Episode 348

The Future Is Containers? | Scaling Postgres 348

Sharding, Giving Back, SCRAM, Partitioning Evolution | Scaling Postgres 36