10-100 Times Higher Latency With Distributed Postgres | Scaling Postgres 298

Episode 298 January 14, 2024 00:19:28
10-100 Times Higher Latency With Distributed Postgres | Scaling Postgres 298
Scaling Postgres
10-100 Times Higher Latency With Distributed Postgres | Scaling Postgres 298

Jan 14 2024 | 00:19:28

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss hopes for Postgres in 2024, whether you need foreign keys, incremental backups, and five ways of doing not exists.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/298-10-100-times-higher-latency-with-distributed-postgres/

 

View Full Transcript

Episode Transcript

[00:00:00] Did you know that some distributed postgres solutions can increase latencies up to ten to a hundred times? We'll talk about that in today's episode, but I hope you, your friends, family and coworkers continue to do well. Our first piece of content is an overview of distributed PostgreSQL architectures. This is from crunchydata.com, and what I first noticed about this is it's from Marco Slott, and he mentioned that he is now working at crunchy data, whereas before he used to be at Citus data that then was acquired by Microsoft. So that's a pretty interesting move. I also saw where crunchy Data has a hosted solution called Crunchy Bridge, I believe, and now it supports Citis. So I wonder how much of a hand he had in that. But this particular post is exactly as it says, an overview of distributed postgres architectures, and specifically he's looking at things such as network attached block storage. An example is EBS on AWS. So as a post attaching a disk directly to an instance, it uses a network attached storage solution. He talks about read replicas, which are an out of box capability from postgres. The next one he covers is dbms optimized cloud storage. So this is like Aurora or AlloyDB at Google, where there's more of a separation between the storage solution and the database engine itself talks about active, active. So multiple primaries, which is something that BDR does in terms of bi directional replication. He talks about transparent sharding, which is something Citis does, and I think a few other products too, and then talks about distributed key value stores that have SQL bolted onto them. And he uses gigabyte as an example. [00:01:51] Now what he first establishes here is that the fastest way to get postgres is on a single machine because you are eliminating so much latency compared with these other distributed solutions. So as he says, quote, there's virtually no network latency on the database layer, and you can even colocate your application server. So if you have one server that's running the application and talking to the database, you can't really match the speed of that any other way, particularly if you're directly attaching high speed NVMe storage directly on that server, as opposed to having to go over the network. And he says here, quote, millions of I Ops are available depending on the machine configuration, and disk latency is measured in microseconds, so that's definitely the fastest way to go. But he doesn't say this here, but I would definitely say it's not the safest way to go, because the reason that people choose these distributed architectures is for protection of their data and higher availability of the data. And he does talk about that, of course. So that's being able to set up replication so you can copy data to another location or distribute it and partition it and protect it and make it available and not vulnerable to a single machine going down, and now you have no application anymore. And he specifically talks a lot about latency, because if you look at a lot of these distributed systems, the number one disadvantage of them is the higher latency for setting up a distributed postgres system and how latency really hurts, particularly on OLTP systems. But the first one he covers here is the network attached storage. So this basically separates the database engine itself from the storage area. So you do have to communicate across a network to access your storage. Plus he says, quote, the block storage service internally replicates to two or three storage nodes as well. So this definitely gives you higher durability, as he says here, and higher uptime to avoid failures of a particular one. But it comes with pretty big cons with higher disk latency. And for locally attached storage, maybe it's around 20 microseconds or so, but it can be 1000 microseconds for something like EBS. And even the IAPS is significantly impacted. So you go from millions to tens of thousands. Although of course you can pay a lot more money to get maybe up to 100,000. I don't know what the current limit, at least on AWS is, but you can definitely spend a lot more money to solve this. I don't know how much more money impacts this. I know they have different tiers or different versions of their storage systems where they're looking to improve this, but it's just a reality of physics when you separate things in this way, there's a latency in its communication. The next area he covers is read replicas. So this is having a primary system and creating replicas of the data that maybe you want to query for readonly purposes. And of course there's latency inherent in that. And if it gets too extreme, depending on how your application works, you can run into problems. And he gave an example of a shopping cart. So if a customer purchases something and then goes to look at the shopping cart, if the purchase is sent to the primary database, but you're using a replica to determine what's in the shopping cart, if there's significant latency with the replica being updated, someone looks at their cart and they don't see the item that they just placed in it, that could be a problem. [00:05:21] So as he says here, that's pretty much eventual consistency. So eventually you can read your the next error he mentions is dbms optimized cloud storage. Again, this is things like Aurora and Alloydb where they've created basically a whole new storage solution to handle database changes. And how he explains it is as opposed to in a standard installation where you're writing wall to the storage system, the write ahead log, but you're also writing to the data pages in the database files. So when changes are made within the database system in memory, it runs a checkpoint to save those changes to the disk. Whereas he says these dbms optimized solutions actually just write the wall and pass the wall around to different storage areas for these solutions. And sending this wall all around definitely increases the amount of writes that happen, as does the latency as well, because frequently you're copying this over availability zones in AWS's case. So he mentioned some of the pros of this as well as some of the cons. So for example, write latency is high by default, and of course you have the high cost and pricing of all that write activity essentially. Now, one of the biggest pros I see with this particular solution that I've seen some clients run into with just using standard replicas is with these solutions you can reuse the storage. So there's one storage system essentially, and you can put as many compute nodes on top of it as you want, whereas with the typical postgres read replica solution, every replica has its own copy of the data, whereas they're using a common storage pool with these solutions. To my understanding, next area he covers is active active. So this is where you have multiple primaries, each receding reads and writes. And in that you're going to inherently have latency because something's updated here and updated here and it takes a while for it to be put in sync. And then you have collision problems too, because what if they choose the same value to update by one in his example on both servers at the same time? Well, what should the value be? How are those conflicts managed? You need to be very careful with how you program your application if you're going to be using an active active solution. And this solution again is basically eventual consistency. Eventually you'll be able to read the writes that have been done, but there's no monotonic read consistency and a possibility that updates might conflict after a commit, which is not really possible in the other solutions that have been mentioned. [00:07:53] Next one he mentioned is transparent sharding. So this is the citizen solution, basically distributing your data across multiple tables. Now of course this comes with a lot of advantages, but again, the major disadvantage is you have high read and write latency because you're sending queries to a particular node and it must send them to other nodes that do the computation and then send it back over the network to the node that requested it to present the answer to the application. [00:08:19] And then he mentions here data model decisions have a high impact on performance as well, but the pros are it definitely allows you to scale your reads and writes in terms of cpu and I ops, so that's a benefit. Lastly, he does cover distributed key value storage with SQL. I'm not really going to cover that too much because I think the architectures are too dissimilar to postgres to make a good comparison. It feels a little bit like apples to oranges, even though they speak the postgres dialect. But I found this post super interesting. And thinking about all these distributed postgres solutions do give benefits, but one of the primary disadvantages that you'll see in all these is performance, and specifically with regard to latency. So definitely a great blog post. I suggest you check out next piece of content incremental backup what to copy this is from Rhos, blogspot.com and last week where we mentioned that there's been a commit in postgres 17 for an incremental backup solution. And this allows backing up just what has changed. So you don't have to do a full backup every time. Now you can just back up what has changed since the last backup. And Robert Haas is actually the committer of this particular patch, and he goes through his thought process of how he designed it, and I found this super interesting. So first he considered how general file copy tools identify what's changed, and they may use file sizes, modification times, or checksums to identify which files have changed. But that really wouldn't work for postgres because what he wants is a block level solution. So he needs to identify the modified blocks. And he said he might have figured out a way to do some checksums with regard to that, but he also didn't want to rely too much on timing, specifically the timing that the database server is at to make some of these decisions. And he does admit that, quote PG backcrest has had success with these kinds of techniques, but he chose a different path. The next way he considered is basically keeping a bitmap of what blocks have changed. And he mentions that PG track does this. And he didn't choose this particular approach because he was a little worried about quote code complexity, storage space requirements of this, the crash safety, and the performance. So what he ultimately used was PostgresQl's write ahead log to identify the modified blocks. Now, using the write ahead log to do this, he knew we couldn't use the full write headlock because that would be way too large of a thing to track because I literally have some customers that their wall files in a given day exceed the size of the database by maybe up to two times. So you can have a really high amount of wall generation. So just trying to read through that to see what's changed is a no go, basically, he said. But he thought, could we just get the information we need and store that? Because he says, quote, we only need information about which blocks were modified and which relations were truncated. So he doesn't need to know how many times something was changed or when it was changed, or the order just needs to know that it was or not. And he was able to define a very small amount of data from the wall that would be needed to be retained for this purpose. And that is the wall summarizer. So we covered this incremental backup last week, and you have to turn on a wall summarization process in order to be able to do incremental backups. And that's what the wall summarization process does. Basically, it stores these summaries of what blocks have changed in the files under PGwall summaries. And he says, quote, you get one file per checkpoint cycle. And basically when you want to do a backup, you just request the backup and you specify the backup manifest file and it compares that with the summaries to determine what needs to be backed up in that incremental backup. And he did have to deal with particular issues like what if a relation is truncated and then a new relation is created with the same name, how to handle some of those potential issues and conflicts. And he discussed that here. But a great blog post explaining the process and that he definitely encourages. Please everyone, test as soon as you're able, because it would be great to identify and get this working really well because as he says here, quote, this feature is not guaranteed to appear in PostgresQl 17, but I think a lot of people would very much like for it to be there. So if you have the ability to test out these changes with the developer version, I'd go ahead and do it. [00:12:56] Next blog post is also from rhos blogspot.com and this is incremental backups, evergreen and other use cases. So here he's theorizing on how people may use incremental backups. And I covered a lot of this last week. Maybe you have a full backup once a week and incrementals every day, or maybe you have full backups every day and incrementals in between, and then the wall files to bring you up to a particular point in time. So he's interested to see how people are going to be using this new capability. But he also mentioned something here is that you don't have to keep a ton of incremental backups. And what you could do is if you have a full backup in this example on Sunday and then create an incremental on Monday, you can use PG combined backup to combine Mondays with Sundays to create a new Monday full backup. So now this is your new full backup. Then you can do another incremental backup, and if you want to consolidate them, you could use the same procedure. But these are some thoughts on how it might be used next piece of content waited for PostgresQl 17 add support for incremental backup this is from depeche.com and this is a post that goes into detail running the commands to get incremental backup working. Now, because I covered this last week in a separate blog post, I won't cover this here, but you can check this out for a different person's perspective on using it. [00:14:17] Next piece of content PostgreSql is the dbms of the year 2023. This is from dbengins.com and this is great being recognized yet again. What I found interesting, they have the last winners for the last ten years, and PostgreSQL has won four years out of the last ten years, which is an impressive feat because other database systems only won at most two years out of the ten. So that's pretty good. [00:14:43] Next piece of content there was another episode of Postgres FM last week. This was on explain, and this is a solo episode by Michael where he talks all about the explain plan and the best approach to use this command to understand how your queries are performing. So you can definitely listen to the episode with this link or watch the YouTube video down here. [00:15:03] Next piece of content create an AI expert with open source tools and Pgvector. This is from procona.com and if you're looking to build your own artificial intelligence solution, your own expert of your proprietary data, and you don't want to necessarily use something like chat, GPT, unincorporated this blog post walks you through how to set up your own expert system using a fair amount of python. But along with postgres and the PGvector extension. Now in their examples here, they are using their procona operator and kubernetes. But as long as you have postgres and the pgvector extension, you should be able to mimic what they're doing here. But they have code for all the process of building and setting this up. So if you're looking to do some AI work, definitely encourage you to check out this blog post. [00:15:51] Next piece of content best PG Dunk compression settings for postgres in 2024 this is from Kmopple GitHub IO and he did some tests with various different compression solutions with pgdump, Gzip, LZ four and Z standard, and his ultimate conclusion was Zstandard indeed should be the new default pg dump compression method for most data sets. And that was a conclusion I came to, I think six months to a year ago where I started using Zstandard before they were incorporated in the postgres tools. So basically I would pipe PG dump or pgbase backup into Zstandard because it can use multiple cpus in parallel. Now he has his results down here. I had a little difficulty interpreting some of what he mentioned here, but you can definitely review this yourself. I know when I was looking at it a while ago, it seemed like LZ four was very fast for smaller sets of data. So maybe if we're wanting to compress wall files, LZ four might be faster. Zstander took a little bit of time to rev up, as it were. So it worked really well on really large data files, like a backup would be. And the benefits that we found is really good compression ratios, really good speed compared to something like Gzip. And the third big thing is being able to use multiple cores. Now his tests here, he said he didn't use multiple cores, but that's a huge advantage of Zsander compared to stock Gzip. But check out his blog post if you want to learn more about this comparison. [00:17:25] Next piece of content my postgresql wish list this is from ryangill.com and I found this fascinating. It's basically, I don't know if it's 20 or 30 different wishes he had, postgres had, and it ranges from I wish I could set a column as insertable but not updatable. So basically make it immutable. So I know some frameworks they have a created at date for every table they make. Well, that is an example that created at once it's set should not be changed. The other one is I wish I could set a column to be generated on update and basically that's what the updated at column is typical for frameworks that is supposed to be updated with every data change to the table. Now you can use triggers for this, but his wish is for something that you don't really have to use explicit triggers, and that there's some sort of feature even if it uses triggers internally. He has a whole set of wishes around foreign keys and json some things that seemed really hard to do. I wish that we could control the default order of columns without having to rewrite the table. That sounds really hard to do, but I thought this was pretty interesting and a lot of interesting features that you hmm, what if. [00:18:39] And for the last piece of content, there's actually two blog posts. So if you use Ruby or Rails, Andrew Atkinson was actually on two different podcasts discussing his book coming out about postgres and rails. So you can check out remote Ruby unleashing the power of postgres with Andrew Atkinson, or you can check out the rails change log 14 postgresQL for Rails developers with Andrew Atkinson. [00:19:02] I hope you enjoyed this episode. Be sure to visit scalingpostgres.com where you can find links to all the content discussed, as well as the podcast version of the show and the full transcript there. You can also sign up to receive notifications of each episode as it's released. Thanks and I'll see you next week. [00:19:25] Our.

Other Episodes

Episode 79

September 02, 2019 00:10:06
Episode Cover

PgBouncer SCRAM, Art of PostgreSQL, Distributed Time-Series, Window Functions | Scaling Postgres 79

In this episode of Scaling Postgres, we discuss PgBouncer SCRAM support, The Art of PostgreSQL, making Postgres a distributed time-series DB and window functions....

Listen

Episode 246

December 18, 2022 00:10:18
Episode Cover

Columnar Tables, Advent of Code, PG 16 Features, Commit Times | Scaling Postgres 246

In this episode of Scaling Postgres, we discuss columnar table storage, solving Advent of Code using Postgres, new features coming in Postgres 16 and...

Listen

Episode 51

February 18, 2019 00:15:58
Episode Cover

Value of SQL, Window Functions, DB Migrations, Data Storage | Scaling Postgres 51

In this episode of Scaling Postgres, we review articles covering the value of SQL, window functions, scaling database migrations and efficient data storage. To...

Listen