One Billion Vectors On Postgres | Scaling Postgres 401

Episode 401 January 25, 2026 00:22:06

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss searching one billion vectors on Postgres, how to bypass a vector index search, unconventional optimizations and a 40x boost to listen/notify performance.

To get the show notes as well as get notified of new episodes, visit: 
https://www.scalingpostgres.com/episodes/401-one-billion-vectors-on-postgres/

Want to learn more about Postgres performance? Join my FREE training called Postgres Performance Demystified here: https://www.scalingpostgres.com/courses/postgres-performance-demystified/

Episode Transcript

[00:00:00] I've reviewed a fair amount of content, particularly the content from Postgres FM, and Nick tends to say that once you hit a billion vectors, things start to fall apart. [00:00:11] Well, we have a couple of posts this week that talk about vectors. One is about being able to search through a billion vectors on Postgres, and another covers a technique that bypasses a huge vector index entirely. Also, there is no consulting corner this week because there is a lot of content I wanted to cover in the show, but I hope you, your friends, family and coworkers continue to do well.

Our first piece of content is "Scaling vector search to 1 billion on PostgreSQL". This is from blog.vectorchord.ai, and the subtitle is "fast indexing in two hours using one machine". [00:00:49] So they are using VectorChord, which I think is a Postgres extension. I don't think it's open source from what I recall, but you can check their website if you want to learn more about their pricing. [00:01:01] But they ran the DEEP1B benchmark, which is Yandex's benchmark of one billion 96-dimensional embeddings, compared themselves to two others here, SilaDB and YugabyteDB, and showed latency and recall performance. They were middle of the pack on latency and a little lower on recall, but still at 95%. In terms of hardware, they only used one machine, an i7ie.6xlarge, which is not gargantuan, and the index build took 1.8 hours, compared with a build time that was either not reported or over 24 hours for the index in SilaDB. And I should note that some of the other systems here are using potentially six instances, whereas VectorChord is using just one. [00:01:54] They even have a memory usage map where you can see how much memory is being used by VectorChord compared to some other vector search systems. So it is a 24 vCPU machine with 192 gigabytes of memory and a fair amount of disk space, it comes with two 7.5 terabyte drives, at a cost of about $2,200 a month. And they drill into the more specific results as well. [00:02:20] Now, these benchmark posts you always have to take with a grain of salt, but it definitely indicates the promise of being able to handle at least a billion vectors, and presumably more than that. [00:02:32] So check this post out and visit the VectorChord website if you want to learn more.

[00:02:36] Next piece of content: "Retrieve and re-rank: personalized search without leaving Postgres". This is from paradedb.com, and they take the scenario where someone searches for "king" and ask: okay, do you want to know about The Lord of the Rings: The Return of the King, or do you want to look at a movie called The King's Speech? Deciding how to rank those is really a question of what you've searched for before, what clues exist in your personal data that indicate what you're actually interested in. So you can imagine an AI model trying to interpret what you're looking for. Now, you may have your own opinion on whether doing this is a good idea, but I really like this aspect of how they're thinking about searching data. [00:03:26] So they are proposing something they call retrieve and then re-rank. We've looked at use cases where people use Postgres full text search, or the BM25 index that is similar to Elasticsearch's search engine, to do a lexical search, then do a semantic or vector search using, say, pgvector, and then combine both result sets using reciprocal rank fusion to get a final answer. But this proposal is not using that approach. [00:04:02] In this proposal they do retrieval using only a BM25 index. So essentially the WHERE clause of the query uses the BM25 index and pulls the results, [00:04:14] say the top 100 candidates for the query. Then the question is, all right, how should we rank these? [00:04:21] Well, you can just apply a semantic or vector comparison to those hundred results, or thousand results, and importantly, you don't need an index on the vector column to do that. [00:04:34] So again, going back to the example, you're looking for "king". Given other context, either through personalization or other things associated with the query, you can use the semantic comparison to determine which result is more appropriate and rank it higher. So essentially the re-ranking is what you do in the ORDER BY clause, because when you work with vectors, that's where the search goes: in the ORDER BY clause. And they go through the process of setting this up and give an example of it.
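Just to make that shape concrete, here is a rough sketch of what such a query could look like. This is my own illustration, not the SQL from the post: the movies table, its columns and the user_embedding parameter are made up, and it assumes ParadeDB's @@@ operator and score function for the BM25 side plus a pgvector embedding column for the re-rank, so check their docs for the exact syntax.

    -- Retrieve: let the BM25 index do the filtering and keep the top lexical candidates.
    WITH candidates AS (
        SELECT id, title, embedding
        FROM movies
        WHERE description @@@ 'king'             -- BM25 lexical match (ParadeDB)
        ORDER BY paradedb.score(id) DESC
        LIMIT 100
    )
    -- Re-rank: order just those candidates by vector distance to the user's
    -- personalization embedding. No vector index is needed for 100 rows.
    SELECT id, title
    FROM candidates
    ORDER BY embedding <=> :user_embedding       -- cosine distance via pgvector
    LIMIT 10;

The point of the pattern is that the expensive part, narrowing a big table down to a shortlist, is done by the lexical index, and the vector math only ever touches the shortlist.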
So what I thought was interesting about this blog post is that it's thinking differently, saying all right, let's use BM25 for the search to eliminate as many records as possible, and then bring in semantic search only for the final ordering, to make the relevance as good as possible. That way you don't have to maintain an index on your vectors to do this type of searching. Now, this won't work in all use cases, of course. For example, if you're looking for something to eat and you search "pizza", it's not going to find "spaghetti" just because it's similar to pizza in that it has, you know, wheat and tomato sauce. You still need semantic or vector search to find things like that. But I thought this was interesting. You can check it out if you want.

Next piece of content, also in the thinking-outside-the-box arena: "Unconventional PostgreSQL optimizations". This is from hakibenita.com, and he proposes three unconventional techniques for dealing with Postgres issues. [00:06:08] The first is eliminating full table scans using check constraints. So imagine you have a table with, say, a status column; in this case he has a plan column with a check constraint allowing "free" and "pro". But then someone runs a search and actually capitalizes "Pro", and it returns zero results. The problem, of course, is that it had to do a full sequential scan through the table to find those results, so it was doing real work on the system. And he asks, well, why do we have to do that? We already have constraint exclusion in the database, which should know that this value cannot exist. And he says there is actually a way to use it, and that's by changing the constraint_exclusion parameter. [00:06:53] So he changed it from its default of partition, because you definitely want constraint exclusion for your partitions, and set it to on. That makes the planner consider, for all queries, whether the table scan is even necessary. Now, he did say the reason this is not on by default is that, for small queries, the planning overhead usually exceeds what it would take to just execute the query; that's why the default is left at partition. [00:07:23] And he's not necessarily proposing changing this for your typical application server or OLTP type workload. [00:07:31] But if you have a data warehouse or data mart where analysts are running many different kinds of queries, they may hit something like this, and turning on constraint exclusion might minimize unnecessary work in the system.
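Here is a minimal sketch of that trick, with my own table and column names rather than his exact example:

    CREATE TABLE accounts (
        id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        plan text NOT NULL CHECK (plan IN ('free', 'pro'))
    );

    -- With the default setting (constraint_exclusion = partition) this still
    -- sequentially scans the table, even though the CHECK constraint guarantees
    -- that no 'Pro' rows can exist:
    EXPLAIN SELECT * FROM accounts WHERE plan = 'Pro';

    -- With it set to on, the planner can prove the predicate is impossible and
    -- skip the scan entirely (you get a Result node with "One-Time Filter: false"),
    -- at the cost of doing that extra proof work during planning for every query:
    SET constraint_exclusion = on;
    EXPLAIN SELECT * FROM accounts WHERE plan = 'Pro';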
The next optimization he mentions is optimizing for lower cardinality with a function-based index. [00:07:52] He has a sales table with a timestamp column, and the query looks for sales within a particular date range. He says, well, how do you make this faster? You can slap a B-tree index on it. And okay, great, that index works, but you've just added an index that is roughly half the size of the existing table, at 214 megabytes. So you could do that, or, because you're only interested in a date range rather than a timestamp range, you can build a date-based functional index on that timestamp column, which is what he shows here. [00:08:29] And if you look at the size of that index, it's less than half the size, because it only stores a date instead of the full timestamp, and in addition the query is a little faster as well. But he says there are downsides: you always have to query the data the same way you defined the functional index, otherwise it won't be used. So in his example here it's not using the index, even though one exists for that date range; the expressions definitely need to match. Now, one alternative he suggests to handle that is to create a view, [00:08:59] so the expression is already built in. But then, of course, people could bypass the view and still query the table directly. Another solution he proposes is using generated columns: add a date column to the table and create the index on that. But again, you start to lose your size advantage, because now you have an extra date column in the table. [00:09:21] So yes, maybe the index is smaller, but you've got more data in the table itself. He says, well, maybe you could use virtual generated columns, but the problem is that currently you cannot create an index on them; hopefully in the future that will be allowed for use cases such as this.

[00:09:39] The final recommendation he has is enforcing uniqueness with a hash index. [00:09:44] So imagine you have a URL table. [00:09:48] You can slap a unique B-tree index on it to make sure each URL is unique, and of course this lets you look up URLs very quickly. But the problem is that the index ends up about the same size as the table itself, so it's huge. So he thought, maybe we can use a unique hash index. The problem is you can't create a hash index that is unique. Now, I'm not quite sure why; maybe it's because of collisions, since when you hash a value there is the potential for a collision, so maybe that's why it isn't allowed. However, you can create an exclusion constraint on the table using hash, which essentially gives you a unique constraint backed by a hash index. He tested it out and everything seems to work: he's getting good query performance, actually better than with the B-tree index, and the index is about a fifth of the size as well. But he does note a few limitations with exclusion constraints: the column can't be referenced by foreign keys, and there are limitations when doing INSERT ... ON CONFLICT. But check this out if you want to learn more.
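For reference, the kind of constraint he's describing looks roughly like this. This is a minimal sketch with my own table name, not his exact example:

    CREATE TABLE urls (
        id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        url text NOT NULL,
        -- "no two rows may have equal url values", enforced via a hash index:
        EXCLUDE USING hash (url WITH =)
    );

    INSERT INTO urls (url) VALUES ('https://example.com/a');
    INSERT INTO urls (url) VALUES ('https://example.com/a');  -- fails with an exclusion constraint violation

You get uniqueness enforcement and fast equality lookups, and because a hash index stores only hash codes rather than the full URL text, it stays much smaller than the equivalent B-tree.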
[00:10:59] Next piece of content: "Turbocharging LISTEN/NOTIFY with a 40x boost". This is from Robbins.in, and apparently there is a new commit in Postgres, probably headed for Postgres 19, that optimizes some LISTEN/NOTIFY functionality. Basically, when a backend sends a notification, the system needs to deliver it to whichever backends are listening, and historically it has done a linear walk through a list for each notification. That starts getting very slow if you have tons of unique channels, like thousands and thousands of them. [00:11:37] So to optimize it, they've created a shared hash table to track the listeners, so when a notification comes in, the system knows exactly who to send it to. He says the implementation uses partitioned hash locking, allowing concurrent LISTEN/UNLISTEN commands without global lock contention, plus some other optimizations to reduce CPU context switches, as well as a new wait event you can see in pg_stat_activity. Now, I don't know whether this addresses the global lock that causes commits of notifying transactions to be serialized; I'm assuming that is not part of this particular optimization. But still, this is a great enhancement, particularly if you listen on a lot of channels. You can look at the performance benchmark he ran: you start seeing a difference once you get to about 1,000 listeners, and then it just goes off the rails as you go up to 100,000 listeners. So if you have a use case like this, it could really improve performance for you. Check it out if you're interested.

[00:12:43] Next piece of content: "The strange case of the underestimated Merge Join node". This is from yulif.github.io, and he's talking about a customer that contacted them because the first execution of a batch process was very slow and every one after that was fast. [00:13:02] So they said, okay, just do a VACUUM ANALYZE after your batch process and you should be good. And the customer said, no, that's not what's happening. So they dug into it, and the first execution of the query looked like this: [00:13:16] a relatively complex merge join. The second execution looked like this: dramatically faster, maybe 300,000 times faster. But the very suspicious thing about the first execution is that there were 71,000 dirtied buffers, so something was doing a lot of writing while this query was running. What they determined was that all this writing was index pages setting the dead flag on index tuples that pointed to dead heap tuples. Basically, the batch process deleted a bunch of rows, but visibility information is not stored in the index, so when the index is used it discovers, oh, this tuple is dead, I should set the dead flag on this index entry. That's what all the dirty writing was while the query was executing, and the issue is that this can take some time. [00:14:15] But there is a patch from Simon Riggs in the get_actual_variable_endpoint function that limits the number of heap pages that can be fetched: after reading 100 heap pages, it gives up, preventing a planning-time explosion. So the first time the query runs, this function gives up and the planner just uses whatever values it has in the histogram, which results in this essentially terrible plan. The second time it's run, enough index entries have been marked as killed that the function succeeds, and the planner gets actual data it can use to build an accurate plan. [00:14:59] So I found this fascinating. I don't know of a particular way to fix it; he's basically describing the issue as it exists and why it happens. If you want to learn more, I definitely encourage you to check this out.
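One practical takeaway from that story is where the clue showed up: the buffers numbers in the plan. If you ever see a plain SELECT dirtying lots of buffers, this kind of index dead-flag work is a likely suspect. A generic sketch (the table and column names here are hypothetical):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*)
    FROM some_table                                   -- hypothetical table
    WHERE created_at >= now() - interval '1 day';     -- hypothetical column
    -- In the output, look for lines like
    --   Buffers: shared hit=... read=... dirtied=...
    -- A large "dirtied" count on a read-only query means pages were being
    -- modified while the query ran, as in the case described above.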
Next piece of content: "Understanding ALTER TABLE behavior on partitioned tables in PostgreSQL". This is from highgo.ca. [00:15:21] Now, this is interesting because the Postgres documentation largely spells out what ALTER TABLE does, but it's not as clear what happens when you run it against a partitioned table. Can you run a particular ALTER TABLE command on the parent table, or only on an individual partition? A lot of this behavior is arguably undocumented, until now. [00:15:46] So I think this blog post is a good reference to keep on hand, because he goes through and tests: do changes get propagated, do they get inherited by new partitions, and what is the effect of running a command only on the parent table, or whether you even can. [00:16:06] And then independence: can parent and child partitions have different values? He went through and tested all of these different commands. I'm not going to go through them here, but he breaks down very well which commands work and why some don't. One that I found particularly interesting is that you can apparently have different defaults on the parent versus the child tables. But again, I encourage you to bookmark this in case you're ever curious about how something works with partitioned tables.
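To give a tiny example of the kind of thing he tests, here is a sketch of the different-defaults case. The table names are my own, not the ones from the post:

    CREATE TABLE events (
        id     bigint NOT NULL,
        status text DEFAULT 'new',
        day    date NOT NULL
    ) PARTITION BY RANGE (day);

    CREATE TABLE events_2026_01 PARTITION OF events
        FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

    -- The partition can carry its own default, independent of the parent.
    -- It applies to rows inserted directly into the partition, while inserts
    -- routed through the parent still use the parent's default.
    ALTER TABLE events_2026_01 ALTER COLUMN status SET DEFAULT 'archived';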
[00:16:39] Next piece of content: "Introducing Postgres best practices". This is from supabase.com. It's actually a GitHub repo containing guidance you can hand to the LLMs you're working with about best practices for working with databases. [00:16:57] So this is not an MCP server; it's a set of docs you can load for your LLM to read, or you can install it as a plugin via Claude Code. The thing that would make me hesitate to use it is row level security: they talk a lot about row level security in here, and I don't know if it's just the clients I've worked with, but I've never seen any of them use row level security. I know it is something Supabase relies on heavily, though, so just keep that in mind if you're looking at it.

Next piece of content: "PostgreSQL on Kubernetes vs VMs: a technical decision guide". This is from stormatics.tech, and this is a question I've asked myself at times: should I move from VMs to Kubernetes, or should I stay where I am? He works through a decision path you can follow to determine which best fits your needs. [00:17:53] I thought this was a particularly interesting thought piece about what you should choose based upon your infrastructure, your expertise and what you're planning to do. [00:18:03] His conclusion is pretty simple: Kubernetes is excellent when you're building a database platform or already have a lot of knowledge and experience with it, whereas VMs are excellent when you're essentially just running a database. But he does cover the positives and negatives of each side in the blog post as well, so check it out if you're interested.

Next piece of content: "Postgres serials should be bigint, and how to migrate". This is from crunchydata.com, and we covered a post like this just last week from cybertec-postgresql.com, so I'm not going to go through it in depth again. But this one does have a bit more information, particularly on how to handle foreign key relationships when you need to transition a primary key from an int to a bigint. So check this out if you want to learn more.

[00:18:56] Next piece of content: this is the annual list from Robert Haas on who contributed to PostgreSQL development. In terms of the most commits, Tom Lane is again number one, and the top person committing patches for which they were not themselves the principal author was Michael Paquier. And of course, the person who sent the most emails to the pgsql-hackers list is again Tom Lane. Thank you to everyone who makes Postgres what it is; we really appreciate it.

[00:19:28] Next piece of content: "Introducing pgEdge load generator: realistic PostgreSQL workload simulation". [00:19:36] Typically pgbench is used to benchmark a database system to see how well it performs, but the problem with those benchmarks is that they don't really mirror a typical database workload, whereas that is what this tool attempts to do, they say. So you can configure seven different pre-built applications spanning different use cases: a wholesale application, which is basically classic OLTP; analytics, which is decision support; [00:20:08] a brokerage, which is a mixed read and write stock trading simulation; retail, which is multi-channel retail decision support; and then even some semantic-search-related ones, e-commerce doing product search with vector embeddings, a knowledge base doing FAQ and documentation search, and doc management, basically enterprise document management. In addition, they have several temporal profiles, basically time-of-day or time-of-season patterns: a local office profile, a global profile that is essentially 24/7, a store regional profile, as well as a store global profile. [00:20:49] So if you're interested in a tool like this, check it out.

Next piece of content: there was another episode of Postgres FM last week. On this episode they had Radim Marek from Boring SQL come on to talk about RegreSQL. I'm slaughtering the name, I'm not quite sure how to pronounce it, but it's basically a regression testing tool for SQL queries. I don't remember how they pronounced it in the episode, my apologies. [00:21:15] But you can listen to how to properly pronounce it here, or watch the YouTube video down here.

Next piece of content: there was another episode of Talking Postgres. [00:21:25] This one was "How I got started with DBtune and why we chose Postgres" with Luigi Nardi, so you can check out that episode.

And the last piece of content: there is a hacking workshop coming up for February 2026, where they're discussing Amit Langote's talk "Hacking Postgres executor for performance", so you can use this form to sign up and learn more about that.

I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content mentioned, as well as sign up to receive weekly notifications of each episode. There you can find an audio version of the show as well as a full transcript. Thanks, and I'll see you next week.
