Index Merge vs Composite, Transparent Column Encryption, Trusted Language Extensions

[00:00:00] In this episode of Scaling Postgres, we talk about index merge versus composite, transparent column encryption, trusted language extensions, and Wall Archive module. I'm creston. Jameson. And this is Scaling. Postgres episode 244. [00:00:25] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is index merges versus composite indexes in Postgres and MySQL. This is from Sirupson.com, and he's comparing the difference between two indexes each in its own column, and looking for data in both of those columns versus having a single index that is a combination of both of the columns. So a multicolumn index. And he looked at this for Postgres and MySQL, and he did something called napkin math, which is actually a tool on GitHub that gives you references for how fast an operation you would expect to be. And based upon those estimates, he felt a composite index could happen in one millisecond. And an index merge solution using two indexes, one on each column, would be ten to 30 milliseconds. Now, of course, this is a contrived example. They're comparing integers, but it still has some relevance here. And when he actually did the test, he found the composite index was about five milliseconds for both Postgres and MySQL. And I should say this was doing index only scans for the type of query. It was doing just a count of all the rows, and the index merge for MySQL took 30 to 40 milliseconds. So about the range that it was predicted, the actual composite index was a little bit slower, but he says it's normal for the estimates to be in order of magnitude different, but the postgres was, on average, a little bit slower, but had a very wide variance. And the reason being is because it wasn't actually doing an index only scan when doing these index merges. So basically, in terms of comparison between two indexes versus a multicolumn index, the multicolumn index was about ten times faster. And he says, particularly with Postgres, as you get a higher number of records, it looks like that difference will become more prominent. And this is what I've kind of found in my own consulting, is that normally as the base case, I stick with single column indexes. But then as the data grows and it comes time for optimization, usually a multicolumn index is the first thing to reach for because the application or the use case has matured at that point, and you have a good sense of what queries are going to be running and what could take advantage of a multicolumn index. But if you want to learn more about that, definitely check out this blog post next piece of content, transparent column encryption with Postgres. This is from Superbase.com, and he's talking about a new enhancement to the extension called Pgsodium. Now, this relies on the encryption library Lib Sodium, and it does encryption in the database. Now, what it does is it does store a reference to keys in tables, but it doesn't actually store the key within the database. So the keys that do the decryption are not accessible from SQL, for example, nor are they placed in the logs or things of that nature. I believe you probably can place it in a file in the database, or you can point to any key management system where the keys are actually stored and integrate that with Lib Sodium, as far as I understand. So basically, the new version of Pgsodium has this API based key management in place, and he shows you how you can do some encryption of particular data and then do a decrypt of that data as well. But then the thing of interest that was released in PG Sodium 3.0 is transparent column encryption. So basically, it has a way to insert data into a table where it's automatically encrypted and then be able to retrieve that information in a decrypted state. So how it works, they're using security labels to set this up. Once you have the extension installed, and they had the example of a credit card table where you're putting a credit card number in a text field, and then you apply a security label on that particular column and says, encrypt it with this particular key ID. And again, this is a reference to the actual key. So PG Sodium knows where to find the actual key to do the encryption, even though the key reference is stored in a table in postgres. Now, you can also do one key per row, so that way every individual piece of data has its own dedicated key for doing it. And by default, these are also nuanceless. So you can do rows with nuance support, so you can add that as a byte array as well. So PG Sodium supports that, and you can even include associated data with the encryption. So definitely a lot of different features to set up your encryption. Now, how it actually works is that you can now insert your data into the table, so you specify what data you're inserting. So of course, you want to be cautious not to log this information in Postgres. Like, don't log your insert statements, for example. But then if you pull it out from that table, you will see encrypted values. So you could see the credit card number has an encrypted value, whereas it sets up a separate view for that table from where you set it up, where you can actually call the view called decrypted credit cards, where it actually shows you the decrypted credit card. So this is definitely an interesting enhancement to the existing PG Sodium extension. And if you're interested in doing transparent column encryption, definitely suggest you check it out. [00:05:33] Next piece of content new trusted language extensions for PostgreSQL on Amazon Aurora and Amazon RDS. This is from AWS, amazon.com. They're talking about a new open source extension that they have developed that allows you to create extension using only trusted languages, so languages that exist within a sandbox. So this is an open source project that they've released. And in terms of trusted languages, what they mean is you can use SQL Plpg, SQL JavaScript or Perl because these have security constraints where they can't write to the file system. Now, probably the reason that why they created this is because they have a hosted infrastructure and they don't want you writing to the file system. So they've given you a convenient way to develop your own extensions and install them quite easily into their hosted infrastructure because nothing's going to be writing to disk essentially. Now this should work with Postgres, of course, not just on Amazon's infrastructure, but once you go ahead and get this extension set up, you create it within your database and then you can define an extension like this pgtle install extension and you just put in the code for your custom made extension. They did a sample use case here where they're actually checking for bad passwords. So this new project also supports hooks into Postgres, so they're using the check password hook to be able to do password validation. So that's what this extension does and they show you how to go ahead and install and get that activated within the database. So this is a great enhancement to the open source community. And if you want to learn more about it, I definitely encourage you to check out Creating Custom Extensions on Amazon, RDS and Aurora with [email protected], where Lucas goes into much more depth about different use cases for this new project. And furthermore, you can check out the code for the project on GitHub as well. Next piece of Content new wall archive module library in PostgreSQL 15 this is from Procona.com and they're talking about how you archive walls. Now, for the longest time we've had the archive command where you actually run an operating system command that takes active current walls and then archives them. Well, now you can use a programmable library to do it as of postgres 15. So for example, you can set this specific archive library and do a reload to get it working, but you need to use an acceptable library. Now, it does come with a sample archive library called Basic Archive, but really based upon my reading of this post, there seems to be a lot of issues. So it's definitely not something used for production, but it's something for you to assess the code to essentially build your own if you want to do that. Or alternatively, it is a source for people who are writing archive solutions for postgres to use. So for example, I believe things like PG Backrest may start to use this or other backup archiving solutions. But if you want to learn more about how this works and some of the benefits of it, definitely encourage you to check out this post. [00:08:40] Next piece of content it's time for pgSQL Friday. And this is episode three where the community is covering what is PostgreSQL Community to you. And the introductory post is here by Sqlasylum WordPress.com so you can read more about why he wanted to discuss this particular issue. The first post I saw was PostgreSQL Community passionate, knowledgeable and thoughtful. And this is on Software and booze.com, so definitely encourage you to check out this blog post. The next one is pgSQL Friday number three. What is the PostgreSQL community to you? And this is from Scarydba.com where he discusses his feelings on Community. And finally there is what is the PostgreSQL community to you? pgSQL Friday number three from Rustprooflabs.com, and he has a number of different linked resources for keeping up with the postgres community. Next piece of content PostgreSQL 14 Internals part four. This is from Postgresprogue.com and this is the PostgreSQL 14 internals book that is being translated and released in different parts as free PDF downloads. So if you want an in depth coverage of Postgres Internals, definitely encourage you to download the next part of this where they cover the workings of the planner and the executor. [00:09:59] Next piece of Content Partition PostgreSQL which partition did I insert my data into? This is from CyberTech Postgresql.com and they're talking about which partition a particular data has been sent to. So they just set up a basic partition table with three partitions and a default partition that they set up just using a list of values. And when they do the insert, they actually do a returning statement and they look at the table OID and they cast it to a reg class. So that actually gives you the table name that it was inserted into. And I didn't know that this was possible to do. So that's pretty interesting because they're loading something that does not have a defined partition, so it gets placed in the default partition, which they call trest. And you can even do queries on this and include the table OID casted to a reg class so you can look at particular data and see what table that resides in. So if you want to learn more about that, definitely check out this blog post. [00:10:58] Next piece of Content postgres strings to Arrays and back again. Here they're talking about a situation where you have a CSV and a CSV. So someone has formatted more than one value in a spreadsheet of some sort and has exported it to a CSV file. And now you have data that looks like this within a particular column. So the question is, how do you deal with that? And what he did is he just loaded it into a two column table, used the copy command to get it in there. So now your table essentially looks like this, and he uses the function string to array to actually break that array into individual values. But then of course, what do you do with it at that point? You can now unnest it. And if you unnest that string to array, it actually creates multiple rows with how many values there are along with this station. And you can click this downward link to see what the query results are. Now with this in place, you can now do different queries against it. Here he's using a CTE, but he's looking at okay, what was the max temp and the min temp for a particular station. So if you want to learn more about that, definitely check out this blog post next piece of content. Using Postgres as a task queue for Rowdy gophers, this is from Inte IO, and he's talking about building a queuing system for Postgres. But the TLDR is he set up a polling system using Update Skiplocked. Now, this post is quite substantial. He talks about different paths he went on that didn't work or didn't work as well, and how he basically ended up using for Update with Skip Lock. So if you want to learn more about that, you can check out this blog post next piece of content. Grant vacuum analyze in PostgreSQL 16. This is from Cybercap and postgresql.com they're talking about. In version 16, you will now be able to grant particular users the ability to run Vacuum or Analyze. Right now, only owners and super users can run these commands, but now you can grant them to particular users. There will also be two new predefined roles. One is PG vacuum all tables and PG analyze all tables. So this is great for people who are doing Ops work but don't actually need super admin capabilities. So check this blog post out if you want to learn more about that. Next piece of content. PostGIS Day 2022 this is from Crunchydata.com, and this is a review of PostGIS Day that happened on November 17. They talk about a number of talks they found interesting, as well as different tools that were showcased as a part of the event. So if you want to learn more about that, definitely check out this blog post. Next piece of Content there was another episode of Postgres FM this week. This one was on Postgres and they cover what it is as well as the pros and cons of using it. So basically, this is a standalone web server that turns Postgres into a restful API endpoint. So then you could just write client code against it. So if you want to learn more about that, you can click to listen to the episode here or click here to watch the YouTube next piece of content. The PostgreSQL Person of the Week is Alexander Schulgen. If you're interested in learning more about Alexander and his contributions to Postgres, definitely check out this blog post and the last piece of content. We did have another episode of the Rubber Duck Dev show this past Wednesday evening. This was on. What happens after Rubycrytic with Ernesto Tag worker? So Rubycrytic is a way to analyze your code and give you recommendations on how to make it better and more maintainable. And we covered. After you did that, how do you actually go about deciding what to change and how to do that? So if you're interested in that, we definitely welcome you to check out our show. [00:14:35] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks.

Show Notes

Episode Transcript

Other Episodes

Episode 101

Postgres Releases, Useless Vacuum, Isolation Differences, WAL Compression | Scaling Postgres 101

Episode 307

Collation Conundrum | Scaling Postgres 307

Episode 235

Postgres 15 RC1, ICU Collations, Listen & Notify, Understanding TOAST | Scaling Postgres 235

Index Merge vs Composite, Transparent Column Encryption, Trusted Language Extensions | Scaling Postgres 244