Hacking On Postgres is Hard! | Scaling Postgres 315

Episode 315 May 12, 2024 00:19:31
Scaling Postgres

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss how hacking on Postgres is hard, a notifier pattern for using Listen/Notify, using histograms for metrics and saturated arithmetic.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/315-hacking-on-postgres-is-hard/

 


Episode Transcript

[00:00:01] If you've ever been disappointed with the pace of Postgres development, or questioned why it takes forever to get certain features implemented, or why we have to rely on external tools or extensions to get certain work done, the first blog post this week gives you some insight into that. But I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is Hacking on PostgreSQL is Really Hard. This is from rhaas.blogspot.com, and this particular post focuses purely on the technical aspect of working on Postgres, namely the extreme difficulty of writing reasonably correct patches. For most of the blog post he talks about his recent implementation of incremental backup, which should be making its way into Postgres 17. He worked on this for a fair amount of time in 2023, finally getting the commit in on December 20. A day later there were four commits, two by him, meaning Robert Haas, and two by Tom Lane, fixing defects in that commit. Then on January 15 there were 16 more commits to resolve issues, only two of which were planned, so essentially 14 were unplanned. He says all of these were genuine issues, essentially oversights. Then he had to deal with data-corrupting bugs on March 4, April 5, potentially April 19, and April 25. So a lot of work to stabilize incremental backup. Now, of course there were tests, and other people ran tests too, but there were simply unforeseen things that resulted in these problems. He says one possible theory is that he's just not very good at this whole hacking-on-Postgres thing, but he doesn't think that's the whole explanation.

Now, he doesn't talk too much about this, but I think the first problem that is apparent is basically a problem of oversight, of not seeing something. And I hate to say it, but as Postgres grows larger and takes on more responsibilities, the domain knowledge a committer or a programmer has to have in order to implement a feature without breaking something else somewhere is only going to increase. Doing something really ambitious like incremental backup in the core of Postgres is by its nature going to touch a lot of things and potentially cause unforeseen problems, and it starts to become an argument that maybe separate programs or extensions should take on some of these responsibilities, or maybe they should be broken off into their own projects. For example, there are independent backup tools like Barman and pgBackRest; maybe those should take the backup burden. And if we want to keep the existing Postgres backup tools, maybe they should be broken off into their own project or extension, and the core Postgres project would just provide the hooks to allow backup to work. That way the surface area of what needs to be changed is not as significant. This reminds me of when we talked about pgvector and how it can have multiple releases in a given year; that's an advantage Postgres doesn't have. It only has one release a year, so there's a finite set of feature changes that can go into each release. But I just find it fascinating to think about.

He also talks about some of the other things that make hacking on PostgreSQL really hard. If you're a committer, you're not necessarily the person who wrote all the patches or all the changes; you're merely committing those changes to the code base.
As a result, as a committer you take on a liability. If you commit code to the project and it breaks the build farm or causes an issue, it's incumbent upon you and whoever submitted the code to correct the problem. As a consequence, some committers are very careful about what they choose to commit to the project, because they carry a responsibility, a liability, for the changes that people want to make. He also talks about the requirements of being a committer: you have to have an immense knowledge of the code base itself, as well as the ability to read other people's code and judge whether a patch is good enough to commit. He goes into a lot more detail than I have time to cover here, but I found this post super interesting, and a lot of the comments were particularly interesting as well. One commenter used a phrase I was thinking about while reading it: minimizing the blast radius. So when a commit is made, a change to Postgres, try to make sure each commit is not going to impact too many things at once. Of course, the consequence of that is that your commits get smaller and velocity slows down. Couple that with the fact that you only get essentially one feature release a year, and velocity drops even further, which goes back to the idea that maybe the Postgres core team should focus on the core Postgres platform and rely on other sub-projects, programs, or extensions to add new features to Postgres. A really great blog post; I definitely encourage you to check it out and share your thoughts as well.

[00:05:23] Next piece of content: The Notifier Pattern for Applications That Use Postgres. This is from brandur.org, and he's talking about LISTEN and NOTIFY in Postgres, which is basically a lightweight messaging facility. In Postgres you can simply listen on a particular channel name; in this case he's calling it test topic. Then in another session you call pg_notify, referencing that channel and sending it a message, which is basically just another string. And then on the connection that's listening, you receive a notification on that channel with the payload that was sent. So it's a very simple messaging system; a minimal sketch of that round trip is shown below. But he has some advice for implementing this in your own project. The first piece of advice is to hold a single Postgres connection per process, or, he also says, per program; allow other components in the same program to use it to subscribe to any number of topics, wait for notifications, and distribute them to listening components as they're received. That's a mouthful, but there is a diagram here, which is great. Basically he's saying the implementation you don't want is a listener connection per channel; instead you want one listener per program or per process. The program has a module he calls a notifier that makes one connection to Postgres to listen for the notification messages coming from Postgres. So don't open multiple listening connections per program, just one, the reason being that it has to be a session held open, waiting and listening for notifications to come across. It's then the notifier's job to listen for those notifications coming in and route them to the different components that need them.
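To make the LISTEN/NOTIFY round trip he describes concrete, here is a minimal sketch you can run in two psql sessions. The channel name test_topic and the payload are placeholders for illustration, not taken from his article:

    -- Session 1: subscribe to a channel
    LISTEN test_topic;

    -- Session 2: send a string payload to that channel
    SELECT pg_notify('test_topic', 'hello from session 2');
    -- NOTIFY test_topic, 'hello from session 2';  -- equivalent statement form

    -- Session 1 receives the notification asynchronously; in psql it is printed
    -- the next time the session processes a command (e.g. after running SELECT 1;).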
Here he has three components that are turning on feature flags, processing a job, or handling a websocket event, etcetera. He talks a little bit about the implementation details and shows some example code, and he says it's very important for the notifier to use a buffered channel, which to me basically means keeping things as asynchronous as possible, because you don't want one busy component of the program failing to pull items off the queue or forcing others to wait on a synchronous call. The other thing to keep in mind, he says, is to make sure this is a dedicated session. You can't necessarily have PgBouncer in between here unless you're using session pooling; most people are using transaction pooling. That's fine for, say, your main application connections, but these workers using LISTEN/NOTIFY need to have a dedicated session open. So if you want to learn more, definitely check out this blog post from brandur.org.

Next piece of content, also from brandur.org, is Histograms Worked. He's talking about some metrics he's working with for Crunchy Bridge's metrics dashboard. Historically they were just collecting the raw data, putting it in partition tables by day, and, based upon whatever metric a user wanted to see, they would hit the different partition tables and aggregate the data needed. That's a pretty simple implementation, and they tended to have around 10 GB per partition table, although that grows over time as more people use the service. But he decided to create a separate process that builds a histogram model of this raw data so he could store it at varying granularities, he says, like five minutes, one hour, six hours, etcetera. So it's essentially aggregated data, but you can query it at whatever granularity you want. As a result, he dropped the size of these aggregate tables down to about 1 GB. That makes it much easier to query the data at different granularities over a time range, and of course it speeds things up immensely because it's ten times smaller. So if you want to learn how he did it and the package he chose for generating the histograms, definitely check out this blog post.

[00:09:21] Next piece of content: Saturation Arithmetic with a PostgreSQL Extension. This is from engineering.adjust.com. Saturation arithmetic deals with what happens when you hit the limit of an integer. In the example here, if you try to cast a very long number, it looks like ten or twelve nines, to an integer, you're going to get an out-of-range error. Or, what can also happen, if you do some sort of aggregate or arithmetic on integers that are already at their maximum, you're going to hit an integer overflow error. Well, they developed a saturated integer extension that allows you to avoid that error; it basically returns the maximum integer available instead. It supports a number of comparison operators, arithmetic operators, as well as a sum aggregate function. They show an example of it in action: when you cast a very large number to their saturated int type, instead of giving an error, it gives you back the maximum possible integer. I thought this was pretty interesting; if you want to learn more about it, definitely check out this blog post.
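To see the behavior being described, here is what stock Postgres does with a 4-byte integer today, plus a sketch of the saturating behavior the article describes. The saturated_int type name below is only a guess at the extension's naming, so check the post for the actual syntax:

    -- Stock Postgres: exceeding the 4-byte integer range raises an error
    SELECT 999999999999::integer;
    -- ERROR:  integer out of range

    SELECT 2147483647 + 1;
    -- ERROR:  integer out of range

    -- With a saturating integer type as described in the post, the same operations
    -- would instead clamp to the maximum 4-byte value (illustrative only):
    -- SELECT 999999999999::saturated_int;  --> 2147483647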
[00:10:29] Next piece of content: Postgres Planner Quirks: The Impact of ORDER BY and LIMIT on Index Usage. This is from pganalyze.com, and this is Lukas's Five Minutes of Postgres. In this scenario he's talking about the case where you have a table and you're querying by an object id, and as long as you have an index on the object id, this should be a relatively fast operation. But if you suddenly add an ORDER BY id to that query, the fast query becomes not so fast anymore, because it stops using the index on the object id. Without the ORDER BY, it uses the object id index and returns the results very quickly, but when you add the ORDER BY id, it starts using the table's primary key index instead of the object id index, and it can take a really long time to find that particular object id in the data. Now of course, how bad the performance gets depends on the data, but in this example, just adding the ORDER BY id made performance really poor: sub-millisecond compared to around ten minutes. Of course the question is why it chooses this path, and it basically comes down to statistics. Lukas did another example in the video shown here where the reason Postgres isn't using the better index is that it thinks the costs are inappropriate. One possible solution, he said, is to increase default_statistics_target, so more statistics are collected and the planner has more information to choose a better plan. Another workaround he proposed is to add a plus one to your ORDER BY, so you order by the order id plus one. That turns the ordering into an expression, essentially as if a functional index would be needed, so the plain index on the id is no longer considered for satisfying the ORDER BY, which gives you better performance. The other solution is a multicolumn index where the first column is the shipping date and the second column is the order id; that should return the rows super fast. But check out his piece of content if you want to learn more.

Related to that, Postgres FM had another episode last week, this one on LIMIT versus performance, and they essentially talked about the same thing: ORDER BY and LIMIT and the different performance issues that can come from them. They covered a lot of similar topics, and Nikolay was definitely in the camp that limits are always good, so you should always put a LIMIT on your queries, and if you run into issues with the ORDER BY, then maybe you should add a multicolumn index to make it an efficient query. Although Michael mentioned that if you're doing ad hoc queries and you're unsure of the potential impact, go ahead and throw an EXPLAIN in front of it to see what the planner thinks the optimal path is, and if it seems off to you, maybe hold off on that query or think again before deciding to run it. But you can listen to their show here or watch the YouTube video down here.
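As a rough sketch of the pattern both pieces describe, using made-up table and column names rather than the ones from the articles:

    -- Fast with an index on object_id alone... until an ORDER BY id LIMIT is added,
    -- which can tempt the planner into scanning the primary key index instead:
    SELECT * FROM events WHERE object_id = 123 ORDER BY id LIMIT 10;

    -- Workaround 1: a multicolumn index that covers both the filter and the ordering
    CREATE INDEX ON events (object_id, id);

    -- Workaround 2: order by an expression so the primary key index no longer
    -- matches the ORDER BY
    SELECT * FROM events WHERE object_id = 123 ORDER BY id + 1 LIMIT 10;

    -- Workaround 3: collect more statistics so the cost estimates improve
    -- (globally via default_statistics_target, or per column as below), then ANALYZE
    ALTER TABLE events ALTER COLUMN object_id SET STATISTICS 1000;
    ANALYZE events;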
[00:13:23] Next piece of content: What's New in pgvector v0.7.0. This is from supabase.com, and we've covered a lot of this before, but I specifically want to mention their explanation of sparse vectors. Of course, 0.7.0 implements the half vector, or float16, vector representation, and he says that in addition to using it for indexes, you can also create tables using the half vector type, so you can get additional space savings that way. That's something I didn't realize, but he says it's possible to do. He also mentioned sparse vectors and what those do, because there are so many dimensions, over 1,500 of them if you're using OpenAI, for example. Sparse vectors basically only store the non-zero components of the vector, which makes them much more efficient: out of those 1,536 dimensions, if only three of the values are non-zero, those are the only ones you would need to store. He also talked about bit vectors with binary quantization, and mentioned that there are three new distance operators for HNSW indexes: there's the L1 distance operator, and for the binary quantization there's also Hamming distance and Jaccard distance. So feel free to check out this content if you want to learn more.

Next piece of content: Bi-temporal Tables, PostgreSQL and the SQL Standard. This is from hdombrovskaya.wordpress.com, and she's a big-time advocate for temporal tables. This is basically storing the entire history of a piece of data in a table, using time ranges to determine during what period a particular version of the data was active. So if you have a unique order with an id of seven, it could be represented by multiple rows, each with a non-overlapping time range, so that each row shows the state of that id seven at a point in time. Now, she says Postgres 17 implements a unitemporal framework with unitemporal primary keys and unique constraints, but she really wants bi-temporal support, and I wondered what that is. Apparently, with regard to temporal tables and, I believe, the SQL standard, there is system time, in other words when data was inserted or changed, essentially the time in the real world, but there's also application or business time, which is an entirely different time range. How I think about this: I used to work for a nonprofit, and we would collect gifts at the end of the year, but it took a long time to process all the checks coming in at year end, so not all gifts were processed until maybe five days after the end of the year. The system time for those gifts being processed was five days after year end, but the application or business time at which they needed to be recognized was the end of the year, so the end of the calendar year or the end of the fiscal year for the nonprofit. So Postgres 17 provides, I think, the system time side, but not yet the application time, and that apparently is something being worked on. She also addressed some questions people have had, like don't temporal tables require a lot of disk space? She indicated not really, and she goes into detail about that, as well as whether it causes excessive IO; again, she said it wasn't necessarily significant. These temporal features are something that exists in the SQL standard, and other database products have them, so it's good to see Postgres getting them. Check this blog post out if you want to learn more.

[00:16:59] Next piece of content: How Not to Change PostgreSQL Column Type. This is from notso.boringsql.com, and this is basically a reminder that if you have a very large table and you need to change a column's data type, say from an int to a bigint, do not simply run ALTER TABLE table_name ALTER COLUMN column_name TYPE with the new type, because Postgres basically has to rewrite the whole table and take locks to do so, you may have indexes that need to be rebuilt, not to mention all the writing to the write-ahead log, in addition to potential replication issues. So what you want to do instead is add an entirely new column, create a trigger to write to both the new column and the old column (handling any conversion between them as well), then backfill the new column with the old data, and then, in a single transaction, switch over to the new column. Now, they don't give you the code to do that, but I remember previous episodes of Scaling Postgres have shown how to do this as part of a single transaction, and a rough sketch of the pattern is shown below. But check out this blog post if you want to learn more.
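Since the post itself reportedly doesn't include the code, here is a rough sketch of that migration pattern, assuming a hypothetical orders table with an id column going from int to bigint (batching the backfill, and recreating constraints, indexes, and sequence ownership, are left out for brevity):

    -- 1. Add the new column alongside the old one
    ALTER TABLE orders ADD COLUMN id_bigint bigint;

    -- 2. Keep the new column in sync with ongoing writes
    CREATE OR REPLACE FUNCTION orders_sync_id() RETURNS trigger AS $$
    BEGIN
      NEW.id_bigint := NEW.id;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_sync_id_trigger
      BEFORE INSERT OR UPDATE ON orders
      FOR EACH ROW EXECUTE FUNCTION orders_sync_id();

    -- 3. Backfill existing rows (in practice, in batches)
    UPDATE orders SET id_bigint = id WHERE id_bigint IS NULL;

    -- 4. Switch over in a single transaction
    BEGIN;
      DROP TRIGGER orders_sync_id_trigger ON orders;
      ALTER TABLE orders DROP COLUMN id;
      ALTER TABLE orders RENAME COLUMN id_bigint TO id;
      -- recreate the primary key, indexes, defaults, and foreign keys here
    COMMIT;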
[00:18:02] The last piece of content: cursor_tuple_fraction and the PostgreSQL JDBC Driver. This is from cybertec-postgresql.com, and he says here, quote, this article is kind of a failure story. So he was trying to gain some new knowledge, but it didn't really work out how he had planned. I wasn't that familiar with cursor_tuple_fraction because, frankly, once I started using Postgres I pretty much stopped using cursors; I didn't really have a use case for them, because in web development, keeping a connection open and having cursors constantly interacting with the data is not a common occurrence. I used to use Oracle, where that was definitely more common. What cursor_tuple_fraction does is determine how the queries sent through a cursor are optimized: should they be optimized for very fast retrieval of the first rows, or optimized to pull all the data, essentially materializing it so that fetching everything is quick? But if you want to learn more about his failed experiment, you can definitely check out this blog post.

[00:19:05] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content mentioned, as well as sign up to receive notifications of each episode as they come out weekly. On each show page you can also find an audio podcast and eventually a transcript. Thanks, and I'll see you next week.

Other Episodes

Episode 65

May 26, 2019 00:14:08

Multi Column, Multi DB, Dos & Don'ts, RUM Indexes | Scaling Postgres 65

In this episode of Scaling Postgres, we discuss multi-column indexes, using multiple databases, Postgres dos & don'ts as well as using RUM indexes. To...

Episode 259

April 02, 2023 00:13:34

Recovery Time, Lost Data, Production Ready, PG16 Highlights | Scaling Postgres 259

  In this episode of Scaling Postgres, we discuss max_wal_size as it relates to recovery time, losing your data with collation changes, getting production ready...

Episode 242

November 21, 2022 00:11:38

Reduce Replication Lag, Explain Parameterized Query, PostGIS Day, Multiple PgBouncers | Scaling Postgres 242

In this episode of Scaling Postgres, we discuss how PG15 helps reduce replication lag, how to get the parameters used in prepared statements, PostGIS...
