Leveraging Indexes, Slugs, Addresses, Security Definer

[00:00:00] In this episode of Scaling Postgres, we talk about leveraging indexes, slugs addresses and security. Definer. I'm creston, Jameson. And this is scaling postgres episode 64. [00:00:21] Hi, I hope everyone is having a great week. So before we start off with the content for this week, I wanted to just say thank you for watching so many episodes and for subscribing because actually the Scaling Postgres YouTube channel just crossed over a thousand subscribers. So thank you very much for tuning in every week. So our first piece of content this week is actually a news item from PostgreSQL, where new versions of PostgreSQL have been released since 9.4, so 9.4 up to 11.3. Now there are four security issues. Few that are related to Windows, one related to bypassing Row security policies. But this is probably the one that may impact probably most people if you're on eleven. So it only affects eleven and it's a memory disclosure and partition routing. So it relates to partitioning. So if you are using that version eleven and are using that feature, you probably want to go ahead and get patched as soon as you can. And also with this release, they say they have over 60 bug fixes and list the different things that were resolved. So definitely something to check out if you need to do an upgrade. The next post is getting the most out of your PostgreSQL indexes. So this is sort of a best practice document for using PostgreSQL indexes. So the first section it covers is covering indexes, where you can essentially attach an additional payload to an index so that you can increase the probability of getting an index only scan. And in this example, they're looking for active customers and they actually put in the include statement the email. So if you're wanting to get the email from the customers on who is active, you can get an index only scan by using a covering index using the include clause. But again, this is only available for PostgreSQL eleven, so just keep that in mind. The next area to think about is using partial indexes. So partial indexes only cover a certain part of the data in the table. So in this case they have an address table and what they wanted to do is actually create a partial index where the district is California. So it's only going to be maintaining those index values for the part of the table where the district equals California. Now, where I find partial indexes particularly useful is if you have a column that is indicating some sort of status for the items in the table. Like maybe you have three or four statuses, so maybe the cardinality how random the values is rather low. Using separate partial indexes will help pull out different records of different statuses. So I find those very effective to use in that case. And then they talk about MultiValue indexes. Now initially I thought they were referring to multicolumn indexes, but they're actually talking about data types that store more than one value. So like JSON fields or array fields and things of that nature. So in this example they're using the containment operator to see if the left hand is a superset of what's in the right hand and when they're just doing a query on a I believe they're using an array field here for this value it just does a sequential scan. When they use a b tree index, again, it's still a sequential scan. So with columns that contain multiple values, what you want to use is actually a gen index. So things like wanting to use it for full text search or arrays or JSON or JSON b data types, you would want to use a gen index in that case. And then they talk about in general you want to eliminate duplicate indexes because of course it uses up more disk space, it requires resources to maintain those indexes. So you want to review and find unused indexes and remove them when they're no longer used. And then they talk about rebuilding indexes and how for versions ten and eleven you can bump up some of the max parallel settings to get faster index creation as well as once your database gets of a certain size or it's a best practice to do it in general is used concurrently with a live or production database. So definitely a lot of great advice to follow with regard to PostgreSQL indexes. Oh, and I should say this is from the PG IO blog. [00:04:39] The next post is ultimate PostgreSQL slug function. And this is from Kdobson Net and he said he's developed a few web apps and APIs and utilities and he's created Slug functions. That basically takes some sort of a title. Like a blog title and converts it into a URL friendly format, generally undercasing it and putting a hyphen between usually where a space is and eliminating certain characters such as a double quote, exclamation points and then convert non ASCII characters as well. Although he says that may not be 100% necessary. But he's actually looked to do this in a PostgreSQL function that he can call from his applications. Now it's based upon one that was developed by Ianks on GitHub and he has the link here but he's done a little bit of his own in terms of removing a single and double quotes. And if you look it's pretty simple to understand. It creates a slugify function and it is using the unaccent function to handle some of those non ASCII characters. So you do need to install the extension to the database, but it's basically just a CTE that first removes the accents, lowercase the string, remove single double quotes, replace anything that is not a letter number, Hyphen or underscore. With a hyphen and then trims the hyphens if they exist on the head or tail of the strings so it goes through and does each of these steps in order to create this slug. Now, once you have that, you can just call the function and it will output the slug value right here. Now you can immediately start using it or you can actually use it as a trigger. In his example here, he creates a trigger and before the insert for each row where the title is not null and the slug is null, go ahead and execute the procedure. So, a quick little blog post if you want to look into slugifying using a PostgreSQL function. [00:06:38] The next post is quick and dirty address matching with Lib Postal. And this is from Crunchydata.com Blog and this is looking at Lib Postal which is a they describe as a natural language processing library trained on over 1 billion address records in the international OpenStreetMap database. And they're actually using it through an extension called psql Postal to call the library from SQL. So what it does is it takes a string that represents some sort of an address and you can actually get normalized values of how this could potentially be interpreted. And then you can also do a parse and then outputting to JSON pretty to actually interpret this string. Now you could do this pretty well by yourself using like us for addresses for example. However, this library has the great ability to handle all sorts of different international addresses as well. So here's something from an address in Berlin and it automatically knows how to translate the address in the example here, STR. It's able to know it means Strauss which is street as they say here. And then they go into if you need to use this type of tool, it tells you how to download and install the Lib Postal library, prep the extension and then go ahead and load some data in order to test the features of this. So if you're wanting or needing to work with a lot of different addresses, potentially from different sources and different formatting and want to come up with a common way to do it, this is definitely a blog post to check out because they also discuss searching them using the full text search engine. So definitely an interesting blog post if you're interested in that. [00:08:29] The next article is Abusing security definer functions. This is from CyberTech postgresql.com. Now the first question is what is a security definer function? And they say here PostgreSQL functions are defined as security invoker. By default, that means they're executed with the user ID and security context of the user that calls them. Okay, that makes sense, but a security definer function will run with the user ID and security context of the function owner. So this enables you to pass on certain powers to someone. But as you can tell, this is pretty risky that they're basically running something as you, so you need to be extra careful when you're using this. And one of the recommendations they have here is that the search path should be set to exclude any schemas writable by untrusted users. This prevents malicious users from creating objects, tables, functions operating that mask objects intended to be used by the function. And in terms of the dangers they look at this here where they have this pretty harmless function that just says select what is passed in and add one to it looking for an integer. However, if you do these series of steps, you create a function that along with an operator, set the search path here and then you execute this function. You can literally make yourself a super user. So this looks to be something you need to be really be careful with. And of course, the last section is how can you protect yourself. And they suggest three measures here as recommended by the documentation always set search path on a security definer function and put Pgtemp on the list as the last element and that's emphasized in the PostgreSQL documentation. Also don't have any schemas in the database where untrusted users have the create privilege. In particular, remove the default public create privilege from the public schema and also revoke the public execute privilege on all security definer functions and grant it only those users that need it. So again, a set of security guidelines you should definitely follow, particularly if you're using these security definer functions. The next post, I actually don't know what the title is because this is from Postgres Pro which I believe is located in Russia, but this is a super comprehensive presentation that was given. It is in English, but I'm not sure as to what the title is here, but it's over 90 slides long. [00:10:55] A great many are talking about the JSON feature set that's available in PostgreSQL, talking about Controllable CTEs which are coming with PostgreSQL twelve where you can define whether it will by default be materialized or not. Talking about the K nearest to neighbor, talking about different indexes features that are coming in twelve, the new pluggable storage that was implemented as well as potentially Zheep which will probably be one of the first new storage type solutions available and talking about the different partitioning improvements. So if you want to get a sense on where Postgres is today and going in twelve, definitely a great presentation to check out. The next post is Power four, brings a remote mode available in beta. This is from our Juju GitHub IO. And this is talking about the postgres workflow analyzer. So it basically does monitoring and performance analysis of your PostgreSQL database instance. [00:11:58] And before it ran on the node with its own database here, collecting statistics and it had a web interface that would interface with a database. But with this new mode it can actually have a separate power repository from your database systems and thus causing less load on those to run it. So if you're interested in this type of monitoring solution, definitely a blog post to check out. [00:12:26] The last post is how to play with upcoming unreleased PostgreSQL and this is from Dep and he goes through all the process to actually build from source the most recent commits that have been made to PostgreSQL. So if you want to get your hands on twelve early, this is a blog post that you can check out and get it running. [00:12:49] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or itunes. Thanks.

Show Notes

Episode Transcript

Other Episodes

Episode 319

When Postgres Development Stopped! | Scaling Postgres 319

Episode 273

Debian / Ubuntu Packaging, ivfflat Indexes, In vs Any, View Logging | Scaling Postgres 273

Episode 40

Materialized, PgBouncer Auth, Implicit vs Explicit, JIT | Scaling Postgres 40

Leveraging Indexes, Slugs, Addresses, Security Definer | Scaling Postgres 64