150 Times Faster pgvector? | Scaling Postgres 314

Episode 314 May 05, 2024 00:16:30

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss performance improvements in pgvector 0.7.0, a guide to vector embeddings, building a Retrieval Augmented Generation app, and only allowing logins to replicas.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/314-150-times-faster-pgvector/

 


Episode Transcript

[00:00:00] If you are looking for more AI or pgvector news, this episode definitely has it, but I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "The 150x pgvector speedup: a year-in-review". This is from jkatz05.com, and clearly this is a year in review by Jonathan of all the work that's been done on pgvector, and he gives a special callout to Andrew Kane for all the work he's put into this extension. He does a benchmarking analysis, not against other AI or vector-based search systems, but against pgvector itself: how has pgvector improved over the past year? First he describes his test methodology. He prioritizes recall, and I totally agree with that, because if you can't get good recall, the performance really doesn't matter. He also looked at storage size, how long it takes to load the data, index build times, latency, and single-connection throughput in terms of queries per second.

Over the last year pgvector has gone from version 0.4.1 up to 0.7.0, which was just released, and I'll show that blog post in a second. He tested the IVFFlat index type for the 0.4 versions, but for 0.5 and above he used HNSW indexes. He also tested the halfvec indexes that are now available, which are basically two-byte float representations as opposed to four-byte, as well as the binary quantization representations using Jaccard with a re-rank and Hamming with a re-rank, and he has notes here describing them. For index creation he used these parameters; I think these are the defaults for HNSW, and I'm not sure whether this is the default number of lists, but he used 1,000 lists for IVFFlat. He tested two different systems, both 16xlarge instances, one ARM-based and one Intel-based, each with 64 virtual CPUs and 512GB of memory, using local NVMe or gp3 storage, so I might expect a performance difference between those, but we'll see. Here's the Postgres configuration he ran these tests with, and he used the ANN Benchmarks framework, the approximate nearest neighbor benchmark framework, to run them.

The first benchmark he looked at is dbpedia-openai-1000k-angular, and I'm going to scroll up a bit so you can see it better. It might be a little hard to see, but here are the versions, from 0.4 all the way up to 0.7, then the halfvec, and then the two binary quantizations down here. I'm going to look primarily at the differences in performance, and you can see an incredible 30-times-faster jump in queries per second for the HNSW indexes. That was achieved in 0.5, and that improvement has held throughout the remainder of the versions. In addition, latency improved by a factor of 27 relative to the baseline, so a vast improvement in latency as well using HNSW indexes. As a consequence, though, HNSW indexes increased the index build time: IVFFlat was 16 times faster to build than the HNSW index, 476 seconds compared to 7,479 seconds. But in version 0.6 you can see the index build speedup actually doubled relative to the original IVFFlat, which I think is basically due to the parallel index build capability HNSW gained in that version.
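For reference, here is a minimal sketch of what those index types look like in pgvector 0.7.0 DDL. This is not taken from the post: the items table, its tiny three-dimensional embedding column, the cosine operator classes and the query values are made up so the statements run as-is (real embeddings have far more dimensions), and the exact parameters used in the benchmark are in Jonathan's write-up.

-- assumes the pgvector extension is installed
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));

-- IVFFlat, as tested for the 0.4.x versions (the benchmark used lists = 1000)
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 1000);

-- HNSW, used from 0.5.0 on; m = 16, ef_construction = 64 are pgvector's defaults
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);

-- halfvec: index a two-byte float representation through an expression index (0.7.0)
CREATE INDEX ON items USING hnsw ((embedding::halfvec(3)) halfvec_cosine_ops);

-- binary quantization with a Hamming-distance operator class (0.7.0)
CREATE INDEX ON items USING hnsw ((binary_quantize(embedding)::bit(3)) bit_hamming_ops);

-- "Hamming with a re-rank": fetch candidates via the binary index,
-- then re-order them using the full-precision vectors
SELECT * FROM (
  SELECT * FROM items
  ORDER BY binary_quantize(embedding)::bit(3) <~> binary_quantize('[1,-2,3]')
  LIMIT 100
) candidates
ORDER BY candidates.embedding <=> '[1,-2,3]'
LIMIT 10;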
Coming back to the results, performance continues to look pretty good across the board up to 0.7. With halfvec, the main improvement seems to be a bit less than a doubling of the index build speedup, which is basically due to the smaller index sizes, since the vectors are half the size, yet it still achieves, as you can see here, really good recall. So using halfvec seems like a no-brainer to me. And with binary quantization, the index build speedup increases by a factor of five compared to not using binary quantization. I'll turn my camera off so you can see it: the index build is five times faster compared to the base 0.7, and the indexes are 16 times smaller compared to the base 0.7. So the 150x speedup is with regard to index build time, not query performance or anything like that; it's basically measured from 0.5, where the HNSW index build speedup was the baseline, to the 150 you can see here for binary quantization. But still, that's an impressive improvement over the year. Now, this is the ARM-based system. There are similar results for the Intel-based system here: the index build improvements aren't quite as significant, but the queries per second and latency are actually a little better on the Intel system.

The other thing to keep in mind is that binary quantization doesn't always work; it depends on your data. With the different benchmarks here, the recall was poor enough that he didn't even include binary quantization in the next set of benchmarks, sift-128-euclidean, and you can see the performance differences are not as significant as in that first benchmark we looked at. He then looked at gist-960-euclidean: again, the same sort of thing, binary quantization can't be used due to poor recall, and the performance change is not as significant. The same goes for glove-25 and glove-100 as well. But overall this is very impressive for only a year, and we'll see how things continue to improve going forward.

Next piece of content: "pgvector 0.7.0 released". This is from postgresql.org. So if you're wondering, hey, when can I do binary quantization for my indexes? Well, you can now, because it's just been released, actually just yesterday as I'm recording this on May 1st. So definitely download it and give it a try.

[00:06:33] Next piece of content: "A beginner's guide to vector embeddings". This is from timescale.com, and it's a good introduction to vectors: what they are and what embeddings are. Basically, vectors are arrays of floating-point values, and this happens to be the means AI uses to interpret text, images or whatever. A model converts text, images or sounds into vectors, and those resulting vectors are called embeddings. They talk a little bit about how these are created with large language models and basically how they work. An embedding is essentially a vector; I think OpenAI uses vectors with 1,536 dimensions. You compare one embedding to another by calculating the distance between the two vectors to determine how similar they are. They have a basic example here: maybe "cat" has this two-dimensional vector, whereas "dog" has this one, and those are relatively similar, whereas a "car" vector would be much further away, so less similar.
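As a rough illustration of that idea (a made-up sketch with invented numbers, not the guide's actual example), with the vector extension installed you can compute these distances directly in SQL. The <-> operator below is Euclidean (L2) distance; <=> would give cosine distance.

-- hypothetical two-dimensional embeddings for "cat", "dog" and "car"
SELECT '[1.5, 3.2]'::vector <-> '[1.7, 3.0]'::vector AS cat_vs_dog, -- small distance: similar
       '[1.5, 3.2]'::vector <-> '[8.4, 0.9]'::vector AS cat_vs_car; -- large distance: not similar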
The thing to keep in mind, and this is actually what I did for one of the bonuses of the course I created: yes, you have to take your source data and convert it into embeddings, essentially vectors. But when you want to query it, you also need to query using an embedding. So you have to translate whatever is being queried into an embedding and then compare it against the data you've already created embeddings for. If you have a set of recipes you've created embeddings for, stored in a vector column in the database, and someone asks "hey, I feel like pizza, what pizza recipes do you have?", that query needs to be converted into a vector embedding, which is then compared against the recipe embeddings to find the closest matches to pizza among the recipes you have and present them as an ordered list. They talk about the different ways you can create these embeddings, using the OpenAI API as an example. I thought this was a pretty good blog post giving an introduction to vectors and embeddings, so check it out if you're interested.

[00:08:39] Next piece of content: "PG Phriday: Papa's Got a Brand New RAG". This is from bonesmoses.org. A week or two ago on Scaling Postgres we covered how he developed what he called a "dirty RAG", a very bare-bones retrieval augmented generation app. In this new blog post he does it using more tools, which actually makes it much easier to build something. He uses LM Studio to easily browse and download large language models, so there's a variety to choose from. He then uses Ollama to run the LLM and provide a REST API, because, like I said, you have to convert a user's query into embeddings and then compare those against what's stored in the database, and the REST API lets you do that in real time against the model. He uses Docker to set things up easily, and then pg_vectorize, which I think lets you convert your data into vectors within the database itself without relying on an external tool. He basically goes through setting this all up, and there's actually very minimal code to get it working. So if you want an easier way to set up a RAG app, or at least get started, definitely check out this blog post.

[00:09:52] Next piece of content: "Boost database security: restrict users to read replicas". This is from thatguyfromdelhi.com. Basically, he wants to only allow logins on read replicas and standbys, and not allow those users to log into the primary database. So how did he do that? He used his own extension. To do this he uses pg_tle, the trusted language extensions extension, and basically creates a trusted language extension he calls standby_user_check, which asks: is the database in recovery? If it is, the user can log in; if not, they can't. He then registers that function for the clientauth feature. This seems pretty interesting, so check it out if you want to learn more.
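To give a feel for the approach, here is a rough sketch of what such a check could look like. This is not the post's code, and the clientauth hook signature and settings shown are from my recollection of the pg_tle documentation, so treat them as assumptions and check the post and the pg_tle docs for the real interface.

-- pg_tle must be loaded via shared_preload_libraries, and clientauth hooks enabled
-- (something like pgtle.enable_clientauth = 'on'; verify the exact setting in the docs)
CREATE EXTENSION IF NOT EXISTS pg_tle;

-- hypothetical clientauth hook: reject logins unless this server is in recovery,
-- i.e. a standby / read replica
CREATE FUNCTION standby_user_check(port pgtle.clientauth_port, status integer)
RETURNS void AS $$
BEGIN
  IF NOT pg_is_in_recovery() THEN
    RAISE EXCEPTION 'logins are only permitted on read replicas';
  END IF;
END;
$$ LANGUAGE plpgsql;

-- register the function for the clientauth feature so it runs on every connection attempt;
-- a real setup would also exempt admin users so the primary stays reachable
SELECT pgtle.register_feature('standby_user_check', 'clientauth');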
[00:10:39] Next piece of content: "Buffers II (the SQL)". This is the latest episode of Postgres FM, and here they're discussing the BUFFERS option of EXPLAIN. Basically, both Nikolay and Michael think BUFFERS should be the default when you run EXPLAIN ANALYZE, and they go into more depth about why. Their main reasoning is that it gives you an indication of how much work is being done, as opposed to just a timing where you're not quite sure what work was done. It tells you how many shared buffers were hit or had to be read, and you can convert those block counts into an indication of how much data you're actually moving around. I think it was Michael who said it also really helps you find read-efficiency issues: if something is slow but the block counts are very low, maybe you've got a problem with your disk, or maybe there's a network issue when you're working with network-attached storage. They also covered a little bit about the new SERIALIZE option that's supposed to be coming in Postgres 17. So definitely feel free to listen to the episode here or check out the YouTube video down here.

[00:11:46] Next piece of content: "Waiting for Postgres 17: Streaming I/O for sequential scans and ANALYZE". This is from pganalyze.com, and this is Lukas's Five Minutes of Postgres. It's basically about the road to asynchronous I/O in Postgres. Some new changes in Postgres 17 add internal APIs for streaming reads, which basically make reading more efficient. This is not asynchronous I/O yet; they're putting in the building blocks to be able to support it later. But he already ran some tests, and you can see improvements for table sequential scans; I think the performance improvement was maybe ten to fifteen percent. So basically another great addition to Postgres 17, but Lukas mentioned he's definitely looking forward to getting actual async I/O, potentially in Postgres 18. We'll have to see.

Next piece of content: "When and why PostgreSQL indexes are ignored". This is from boringsql.com. The first reason he mentions is a missing condition: maybe you have a partial index, but your query's condition doesn't match the partial index's predicate, so the index isn't going to be used. The second is when you have a multicolumn index but you're querying the data in a way the index can't support, because column order matters. I can query the example index here on both country_id and city and the index will be used, or on country_id alone and it will be used, but I can't query on city alone and have that index be used. Basically, you need to use the leading column or columns of the index for it to be used. The next reason is low selectivity. Imagine you have an orders table where the vast majority of rows have a state of delivered. If you run a query hoping to use an index where the state is delivered, and 95% or 99% of the rows are delivered, it's not going to use that index, because it might as well just go to the table and do a sequential scan. Another reason indexes aren't used is outdated statistics, where the planner makes poor decisions and may not use an index that makes perfect sense to use. And one reason that's not mentioned here: your data may simply not be large enough. If you only have a couple hundred rows in a table, or maybe a thousand, you don't need an index; the table can be scanned pretty quickly. Check out this blog post if you want to learn more.
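Tying the BUFFERS discussion and the column-order point together, here is a small made-up example (the table and values are hypothetical, not the post's exact schema) showing which queries can use a multicolumn index, with BUFFERS included so the plans also report shared-buffer block counts.

CREATE TABLE addresses (id bigserial PRIMARY KEY, country_id int, city text);
CREATE INDEX addresses_country_city_idx ON addresses (country_id, city);

-- can use the index: the leading column (or both columns) appears in the WHERE clause
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM addresses WHERE country_id = 42 AND city = 'Berlin';
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM addresses WHERE country_id = 42;

-- typically cannot use it: filtering on city alone skips the leading column,
-- so expect a sequential scan instead (the Buffers lines show the blocks touched)
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM addresses WHERE city = 'Berlin';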
[00:14:11] Next piece of content: "Nightly Postgres backups via GitHub Actions". This is from joshstrange.com, and if I were doing a Postgres backup, I think the last option I would choose would be GitHub Actions. I don't mean to criticize the blog post too heavily, because people use what they know; I know Ansible really well, for example, so I'm more inclined to use that for a bunch of different things, whereas other people may not be. He is actually running on Neon, where they do the backups for you, but he wanted a redundant backup, so he chose to do it with GitHub Actions: he set up an S3 bucket on AWS and then used GitHub Actions to take another backup of his cloud database from another provider. So here's a solution for doing that if you're interested in it. Personally, I would probably just set up a five-dollar-a-month server on AWS, do the backup there, send it to S3, trigger it with a cron job, and have it send me an email letting me know the state of the backup. But there are many ways to take a backup; do what works for you.

Last piece of content: "Why PostgreSQL is the bedrock for the future of data". This is from timescale.com. Under the heading of "just use Postgres", or apparently the newer phrase "PostgreSQL for everything", it collects a lot of different quotes from people about using Postgres as the central data store for all of your data. I thought this particular image was interesting, where you replace all these different data storage services with just Postgres. It's what I'm doing, not that you necessarily should, but I found the post pretty interesting. They do eventually get into talking about all the things they've done at Timescale in the latter part, since this is from timescale.com, but still, it looks like the future is pretty bright for Postgres, so feel free to take a look at this content.

I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content discussed, as well as an audio version of the show and a full transcript. You can also sign up to receive weekly notifications of each episode. Thanks, and I'll see you next week.
