Episode Transcript
[00:00:00] About a decade ago, document-oriented databases and key-value store databases were all the rage. It seemed like most applications would start using those instead of SQL databases. But today, with the popularity of Postgres, it's clear that SQL databases are stronger than ever.
[00:00:20] However, we're going to talk about a project that may revitalize some document database use cases. But I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "DocumentDB joins the Linux Foundation". This is from opensource.microsoft.com. I haven't kept too much up with what DocumentDB is, but it is a Postgres-backed open source document database project, so it uses Postgres behind the scenes. Microsoft released it earlier this year, and it's basically meant to be compatible with the open source MongoDB drivers and tools. So basically it acts like MongoDB, but it's running Postgres under the covers, and it's all open source. That's really being emphasized now that the Linux Foundation is going to be shepherding it. Now you may be wondering, why join the Linux Foundation? They do talk about a commitment to open source and a number of different things, but the next post we're going to talk about is basically AWS joining the DocumentDB project.
[00:01:29] And I could imagine moving this project to the Linux Foundation makes it a bit of an impartial third party, and now other corporations are going to join in and say, all right, let's make this as strong an open source standard as we can.
[00:01:43] And in addition to that post, you have comments by AWS listed here, as well as by someone from Google Cloud and even Yugabyte. So now all these corporations are joining in to support and build up the DocumentDB project. So that's pretty interesting.
[00:02:01] I'm not going to jump out and start using something like DocumentDB, but it's interesting. In terms of the next post, this is AWS posting about it: "AWS joins the DocumentDB project to build interoperable open source document database technology".
[00:02:16] This is from aws.amazon.com, and again they're talking about this project maintaining MongoDB API compatibility, as well as different innovations they want to work on. So check this out if you want to learn more.
[00:02:29] Next piece of content: there was another episode of Postgres FM last week. This one was on multi-column indexes.
[00:02:36] So Nick and Michael discussed how important multi-column indexes can be. Because if you look at some of the documentation within Postgres, it says they should be used sparingly, but Michael mentions that that's actually not the case: what he sees is that there are frequent use cases where they are the best solution for a performance problem. And I wholeheartedly agree. There are some engagements I'm on where 75% of my recommendations are "okay, use a multi-column index in this way", because nine times out of ten it's so much more efficient than searching on two different existing single-column indexes, or heaven forbid three different ones, and trying to hash those together.
[00:03:16] They also really help with ordering. So if you are narrowing a query by a column or two and then you need to order it by a third column, adding that column as the last column in the multi-column index can really help in those instances.
[00:03:30] The thing to remember with multi-column indexes is that the first column should generally be the most selective. Also, that first column no longer needs its own independent index.
[00:03:42] So the multi-column index can easily satisfy queries when you're just looking at that single leading column. That's not true for the other columns.
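To sketch that leftmost-prefix behavior (the table and column names here are hypothetical, not from the episode):

```sql
-- Hypothetical orders table for illustration
CREATE TABLE orders (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    account_id bigint NOT NULL,
    status     text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Most selective column first; created_at last so it can serve ORDER BY
CREATE INDEX orders_account_created_idx
    ON orders (account_id, created_at);

-- Queries that can use this index (leftmost prefix):
--   WHERE account_id = 42
--   WHERE account_id = 42 ORDER BY created_at DESC
-- A query that generally cannot use it efficiently:
--   WHERE created_at > now() - interval '1 day'   (second column alone)
```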
[00:03:51] So if you have a query that's trying to narrow by, say, the second column in a multi-column index, you're going to need a separate index on that column. They also discussed booleans and potentially adding them to a multi-column index. I usually try to avoid that because they're not very selective in general. I usually like to use partial indexes for those, which they did mention in the show as well, and also for status fields, basically low-cardinality fields. I think partial indexes work better for that than trying to include them as additional columns. But they did have a warning that you want to be careful about having too large a multi-column index, because adding all of those columns could actually hurt your update performance if they prevent you from having heap-only tuple (HOT) updates. So the performance gains in terms of queries can be great, but you want to make sure you're not putting too many columns in these types of indexes, or indexing too many columns overall, because you actually may hurt your update performance.
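A partial index for a low-cardinality field, along the lines they discussed, might look like this (again with hypothetical names, assuming an orders table with a boolean shipped column):

```sql
-- Rather than adding the boolean as an extra index column,
-- index only the rows you actually query for:
CREATE INDEX orders_unshipped_idx
    ON orders (account_id)
    WHERE shipped = false;

-- Queries must include the predicate for the planner to use it:
--   SELECT * FROM orders WHERE account_id = 42 AND shipped = false;
```

The index stays small because shipped rows, usually the vast majority over time, are excluded entirely.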
[00:04:51] But if you want to learn more, you can definitely listen to the episode here or watch the video down here.
[00:04:56] Next piece of content: "PostgreSQL 18: SCRAM pass-through authentication for Foreign Data Wrapper connections". This is from enterprisedb.com, and this is a great enhancement where, if you're using the Postgres foreign data wrapper or the older dblink foreign data wrapper, you don't need to use plain-text passwords in your user mapping anymore. Basically, it can pass that SCRAM authentication right through to the foreign server, which is awesome. And they have an example of setting this up here for SCRAM-SHA-256.
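A rough sketch of what the setup looks like, as I understand the feature (the option name, server and role names here are my best understanding, so check the blog post for the exact syntax):

```sql
-- Postgres 18: SCRAM pass-through for postgres_fdw (sketch).
-- Both servers need the same SCRAM-SHA-256 secret for the role,
-- since the local credentials are what get passed through.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER remote_pg
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'appdb',
             use_scram_passthrough 'true');

-- No plain-text password option needed in the user mapping anymore
CREATE USER MAPPING FOR app_user SERVER remote_pg;
```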
[00:05:31] Oh, and I should mention this is something that's supposed to be coming in Postgres 18, as I mentioned in the title. But if you want to learn more about this, definitely check out this blog post. Next piece of content: "Introduction to Postgres extension development". This is from pgedge.com, and this blog post works through creating your own C extension that you can use with Postgres and demonstrates that it's not too hard to do. So if you're interested in that, you can check out this blog post.
[00:05:59] Next piece of content: "LLMs are new database users. Now we need a way to measure them: meet text-to-SQL eval". This is from tigerdata.com, and this is a tool they're using to evaluate how well LLMs interpret text and generate the correct SQL.
[00:06:18] So they talked a little bit about this last week, where they developed a semantic catalog to give LLMs more context for solving SQL queries. Well, now they needed a better way to evaluate the LLMs' ability to generate the right SQL, and they developed this tool to do that.
[00:06:36] So if you want to learn more about that, you can definitely check out the tool and download it. And the last piece of content: "Using JWT or JSON web tokens to establish a trusted context for RLS or row level security". This is from vondger me, and he's saying how he generally likes row level security, but it can be a hassle identifying the trusted context for RLS policies. This is some identifier that determines what data someone can view, such as, say, this person can see all of the data for this particular tenant ID in a multi-tenant database. He says many times people use roles and check the current user, but he says that also means there has to be a role per application user, so now you have to do database role management for all of those application users.
[00:07:27] And of course this can also complicate pooling, because these contexts are essentially set using session variables. So the solution he came up with is using JSON web tokens, in what he basically calls cryptography-based context.
[00:07:41] So this context is generated by a trusted system and delivered via JSON web token. And he describes his architecture and how he set it up. So if you're interested in this and you use row level security, maybe you want to check it out.
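As a rough sketch of the row-level-security side of this idea (this is not his exact architecture; the table, column and setting names are hypothetical, and it assumes a trusted layer has already verified the JWT and extracted a tenant ID):

```sql
-- Hypothetical multi-tenant table protected by RLS
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;

-- Policy keyed off a custom setting that only the trusted layer populates
CREATE POLICY tenant_isolation ON invoices
    USING (tenant_id = current_setting('app.tenant_id')::bigint);

-- After verifying the JWT, the trusted layer scopes the context
-- to the current transaction:
--   BEGIN;
--   SET LOCAL app.tenant_id = '42';
--   SELECT * FROM invoices;   -- only tenant 42's rows are visible
--   COMMIT;
```

Using SET LOCAL rather than SET keeps the context from leaking across pooled connections, since it resets at transaction end.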
[00:07:57] And now it's time for my consulting corner. It seems like the summer of upgrades has been extended a little bit into a fall of upgrades, as some of the projects have been delayed just a little bit for different reasons. One thing I've been working on recently is looking at some issues with RDS Proxy in AWS.
[00:08:18] More than once, I've seen that when you are proxying writers, or things that send inserts, updates and deletes, you get a really good ratio of client connections to database connections, meaning you can have a lot of client connections and very few database connections. However, for the readers, or replica systems that mostly receive select queries, the ratio has not been as good, and in a few cases I've seen the number of database connections far exceed the number of client connections, which is not a behavior you're supposed to see, because the point of the proxy, at least when you're doing transaction pooling, is to minimize the number of database connections you're using.
[00:09:03] Now, there might be some application-level reason this is happening; we're looking into what might be the issue. But I've actually seen this behavior more than once, so I'm not quite sure what is going on. So out of curiosity, have you run into a similar type of issue? Please let me know in the comments below. Other than that, I hope the final days of summer are treating you well.
[00:09:25] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com where you can find links to all the content mentioned, as well as sign up to receive weekly notifications of each episode there. You can also find an audio version of the show, as well as a full transcript. Thanks and I'll see you next week.