Foreign Key Indexes, Graph Queries, Linux Huge Pages, Text Column Size | Scaling Postgres 192

Episode 192 November 23, 2021 00:12:39
Foreign Key Indexes, Graph Queries, Linux Huge Pages, Text Column Size | Scaling Postgres 192
Scaling Postgres
Foreign Key Indexes, Graph Queries, Linux Huge Pages, Text Column Size | Scaling Postgres 192

Nov 23 2021 | 00:12:39

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss if foreign keys should have indexes, how to run graph queries, how to configure Linux huge pages and the benefits as well as text size similarities.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/192-foreign-key-indexes-graph-queries-linux-huge-pages-text-column-size/

View Full Transcript

Episode Transcript

[00:00:00] Speaker A: In this episode of Scaling Postgres we talk about foreign key indexes, graph queries, Linux, huge pages and text column size. I'm Kristen Jameson and this is scaling postgres episode 192. One alright, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is should I create an index on foreign keys in PostgreSQL? This is from Procona.com and they're saying a lot of the advice and I've even given it is that generally you want to have indexes on your foreign keys and these are references to other tables that are frequently used with query. However, it's not always the case that those would be used. So they present an example here where you have a supplier table and a product table and most of the time you're going to be looking and searching for products and just wanting to know who the supplier is. So for that type of query, you're not going to be needing an index on the supplier ID. You would only need it if you are looking at products from a given supplier. Now clearly that would happen, but it's probably not going to happen as frequently. So do you need an index on it? And some of the arguments they're making is that it may be more efficient just to do a sequential scan of the table. But of course they also discuss the flip side of where it's having an index on a foreign key is pretty much essential. So when you have a customer and you have orders, a lot of times you're just going to be looking up a customer's order. So in this case, this foreign key would be essential for those types of queries. But this post basically talks about it's not always essential to have indexes on foreign keys, although I would still probably do it as a first pass unless I'm sure it's the case like above here where it's not going to be used that much because you can always add the index later. And even if you've added one, postgres has tables that let you examine how frequently that index is being used or if it's being used or not. So it's a good practice to use those views to look at postgres and remove any indexes that just aren't being used. But if you want to learn more. [00:02:12] Speaker B: You can check out this blog post. [00:02:15] Speaker A: The next piece of content is webinar graph queries with PostgreSQL. And this is on the EDB YouTube channel and this is a webinar about an hour in length. And he explains how you do graph queries with PostgreSQL. And basically graph queries are where you have separate nodes that are linked or there's relationships between one another. An example I typically think of you have a person that has relationships, maybe it's a friend relationship or a familial relationship. Those are links between two different areas. Or you could also have a graph query that looks at where different HTML pages are and how they're linked together. Well, how you do these types of queries in PostgreSQL is you use Recursive queries, so you use the with Recursive keyword to recursively search through those trees essentially. And this webinar goes through how you can write queries that run efficiently, use indexes that can also avoid circular references which are possible with a graph as well as handling multi paths to the same node as well. So if you're interested in that you can check out this YouTube video. The next piece of content improving PostgreSQL performance without making changes to PostgreSQL. This is also from Enterprisedb.com and this post is basically all about Linux huge pages. Now, we've had some other presentations about it, but as your database scales this becomes more and more important to look at and configure. So it's basically an operating system level configuration, not a PostgreSQL configuration, although you do have to make an adjustment. But this discusses the importance of it as well as some of the magnitude of the changes. So they say when they were working with one client after configuring Linux huge Pages, the CPU usage of one database system went from 51% down to 15% with essentially the exact same load. So that's definitely really significant. And they give you some queries here you could use to check out the status of the page table to see if you could potentially benefit from this. So they go about discussing hey, does this make sense? And how to check that out. And then once you're ready, making the change even in a test system and testing the results. So you can definitely see some TPS adjustments based upon how you configure Linux huge pages. And I really like the advice at the bottom of this post. The first thing, set up a test server. Make sure neither transparent huge pages nor huge pages are enabled and run a test load. Then repeat with the two megabyte size huge pages. Record the results and then repeat it with the 1GB huge pages. Record those results as well. Determine what's best for you and then choose what you want to put into production. So this is a great post about Linux Huge Pages that I definitely encourage you to check out if you're running a larger size database. The next piece of content does Varcaren use less disk space than Vercare or Text? I personally have always heard that there's essentially no difference between these. But he actually set up tables and tested all sorts of different character links to examine is there a difference? And he has the table right here and basically for anything greater than a string length of 100, the results are exactly identical for all different column types, both the Text column, a Vercare and a varicare with a specified number of characters in it. The only two areas that were different is where the string length was ten or 100 and that was for their care of a specific number, but he narrowed it down to what he believes is the presence or not of a toast table for it. So basically it pretty much looks to me like all of these are identical and it doesn't really matter which you choose in terms of disk space utilization, but if you want to learn more, you can check out this [email protected]. The next piece of content using PostgreSQL views in Django. This is from Pganalyze.com and they're talking about using views which are basically virtual views of a table. And you can create a view that looks at multiple tables and it basically is a way, as they say here, to cut down on complexity. So as opposed to having to join the same tables over and over again and rewrite a long query, you can simply write that query once and save it as a view using the Create View command. Now you can simply select from this view to get the information of interest. Now these are virtual tables, but you can also materialize them, which is what a materialized view is. So you're actually creating a separate table and putting the data into it. A standard view has no data stored in it, it always just goes and reads directly from the tables, but a materialized view actually creates that table. Now the advantage of that is that now you can put indexes on it to get potentially better performance and they talk about how you can create it. But the other thing you need to keep in mind is that now that is a static representation of a point in time and if you want up to date data, you're going to have to refresh that materialized view so they have the command to do that and ideally you're always going to want to concurrently refresh it. And then the latter part of the post shows you how to set up and use these views in Django because in terms of Django, they're essentially just another model that you can define. And I've personally done the same thing with Rails. I've set up views and created models that point to those views and you can use them for reports, for example. But if you want to learn more about how to do that, you can. [00:07:52] Speaker B: Definitely check out this blog post. [00:07:55] Speaker A: The next piece of Content PG Friday Isolating postgres with Rip Manager this is from Enterprise db.com and they're talking a lot about high availability with regard to repmgr, which is essentially a rep manager. And specifically this talks a lot about consensus as well as fencing and how it handles those cases with rep manager. So if you're interested in that you. [00:08:20] Speaker B: Can check out this blog post. [00:08:22] Speaker A: The next piece of content, PostgreSQL on WSL two for Windows install and setup. This is from Cybertechn postgresql.com and WSL two stands for the Windows subsystem for Linux. So basically it's kind of like having a virtual machine infrastructure within Windows. So if you set up this WSL Two, you can essentially install a Linux distribution within Windows with presumably better performance than you would get from say, a VM solution. And this talks about installing PostgreSQL in that type of an environment. Now they're referencing a stack overflow survey here where they show a lot of people are using Windows for their development environment. Frankly, a lot of what I see is Mac. So I'm a little surprised Mac is only 25%. I'm a little surprised that Linux is this high at 25%, but the Windows subsystem for Linux is about 3%. But there are some users out there I know personally I use Linux on the desktop and I have for years. I used to predominantly use Windows, but about a decade ago when I started Rails development, I pretty much switched over to Linux. But this post shows you how you get WSL Two installed on Windows and then how you can then get a Ubuntu distribution and install it on that, and then from that point install PostgreSQL on it and be able to connect to it successfully. So if you're interested in that, you. [00:09:53] Speaker B: Can check out this blog post. [00:09:55] Speaker A: The Next Piece of Content waiting for PostgreSQL 15 allow publishing the tables of schema. This is from Dep.com and basically they've made an enhancement to logical replication where now you can also define a schema that you want to publish. So before you could publish individual tables or the whole database, well, now you can do it at the resolution of the schema. So if you're interested in that feature. [00:10:20] Speaker B: You can check out this blog post. [00:10:24] Speaker A: The Next Piece of Content PostgreSQL 15 will include some more regular expression functions. This is from Migops.com and they're talking about Postgres 14 and prior have a fair number of regular expression support within a number of functions that they provide. But this post was discussing that Oracle provides even more. So this individual or organization put in a fair amount of work to add a lot more regular expression functions to postgres that should be Live by version 15. So if you're interested in learning more. [00:10:58] Speaker B: About that, you can check out this blog post. [00:11:02] Speaker A: The Next piece of Content easily running BabbleFish for PostgreSQL on Kubernetes this is from Stackres IO and they're talking about Babelfish, which is the open source project that allows PostgreSQL to act like a Microsoft SQL Server, essentially a line compatible version. And since the source code has recently been open sourced, they've used their solution, their stackres operator for Kubernetes, I believe, to easily enable you to run BabbleFish on it. Now they do provide a hosted solution, but it appears that there is an open source version as well that you could try out. So if you want to try a BabbleFish without having to compile postgres from source code, maybe you would like to try out this alternative. The Next piece of Content the PostgreSQL person of the week is Luis Granjol. If you're interested in learning more about Louise and her contributions to Postgres, definitely. [00:11:57] Speaker B: Check out this blog post. [00:11:59] Speaker A: In the last piece of content, we did have another episode of the Rubber Duck Dev Show this past week. This one was on what is DevOps? So if you're interested in a long form discussion about DevOps and what it is, you can feel free to check out our show. That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube for itunes. Thanks.

Other Episodes

Episode 230

August 28, 2022 00:12:33
Episode Cover

Stats Collector Gone, Handling Latency, PG In The Browser, Future High Availability | Scaling Postgres 230

In this episode of Scaling Postgres, we discuss how the stats collector disappears in PG15, steps to mitigate high latency connections, how to run...

Listen

Episode 106

March 23, 2020 00:14:58
Episode Cover

Recursive CTEs, DB Access Patterns, Dates & Times, Cluster | Scaling Postgres 106

In this episode of Scaling Postgres, we discuss how to use recursive CTEs, database access patterns, using dates and times and how to cluster...

Listen

Episode 114

May 18, 2020 00:20:28
Episode Cover

Data Directory, Vacuum Features, Binary Performance, Network Latency | Scaling Postgres 114

In this episode of Scaling Postgres, we discuss the Postgres data directory, new vacuum features, binary storage performance and the impact of network latency....

Listen