IN vs ANY, Ghost Conditions, Percentage Calculations, Variadic Unnest | Scaling Postgres 243

Episode 243 | November 27, 2022 | 00:10:25

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss using IN vs. ANY, ghost conditions for unindexed columns, percentage calculations with window functions, and variadic unnest.


To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/243-in-vs-any-ghost-conditions-percentage-calculations-variadic-unnest/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about IN versus ANY, ghost conditions, percentage calculations, and variadic unnest. I'm Creston Jamison, and this is Scaling Postgres episode 243.

[00:00:24] All right, I hope you, your friends, family and coworkers continue to do well. We don't have as much content this week as normal, probably because of the Thanksgiving holiday last week, but there are ten pieces of content to go through.

Our first piece of content is "Five minutes of Postgres, episode 45: IN lists vs ANY operator, bind parameters and performance." This is from pganalyze.com, and it covers a post we discussed on last week's Scaling Postgres, which is a Postgres query boost, using ANY instead of IN. That post examined using Java and how they couldn't really use IN and had to use ANY, because IN essentially accepts a list of values, whereas ANY accepts an array of any size. While Lukas covers this post and the points in it, he also brings up performance questions by looking at previous posts that compared IN versus ANY. He compares a benchmark from 2017 that showed IN lists were faster up until you got to about 128 values in the IN list; after that, the ANY version started performing much better. He also talked about someone who was using thousands of different integers in an IN list and seeing poor performance, and a patch went into Postgres 14 that improved the performance of IN lists. So he's advocating that if you use large IN lists, you should definitely move to Postgres 14 or later, or consider changing over to ANY, which may give you more performance. He even commented on some cases where a Postgres index was not used with ANY but was used with IN; this was with row type comparisons, but they were eventually able to get it to work by typecasting the values appropriately. I definitely encourage you to check out his episode, because it covers a lot of performance-related information with regard to IN and ANY.

Next piece of content: "Ghost conditions for unindexed columns." This is from sqlfordevs.com, and he's talking about something he calls ghost conditions, a term I haven't heard used before. Basically, you have a table with indexes on certain columns, but what you want to search by is not indexed. For example, status and insurance are either not indexed or are indexed individually, not as a composite index. How can you improve the performance of this query without adding additional indexes? He suggests asking whether there are other, indexed columns that this set of data fits within; if so, you can add that condition to the query. He calls these ghost conditions because they aren't really relevant for pulling in the data, but they are relevant if they allow you to use an index, say on type, that narrows the range of records you're interested in so the results come back faster. He goes through and talks about that in this blog post. So basically, if you're seeing poor performance, you can always add an index, of course, but also look at the other columns: is there correlated, indexed data you could use to return results faster? Definitely check out this blog post if you want to learn more.
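To make the IN versus ANY difference from the first piece of content concrete, here is a minimal sketch, assuming a hypothetical users table with an integer id column (the names are illustrative, not from the episode):

-- IN takes an explicit list of values, so the SQL text changes with the number of values:
SELECT * FROM users WHERE id IN (1, 2, 3);

-- ANY takes a single array, so one prepared statement can cover any number of values:
SELECT * FROM users WHERE id = ANY (ARRAY[1, 2, 3]);
-- or, with a bind parameter supplied by application code:
-- SELECT * FROM users WHERE id = ANY ($1);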
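And a minimal sketch of a ghost condition, assuming hypothetical contracts columns where only type is indexed and pending contracts are known to always have type 'new':

-- Only unindexed filters: likely a sequential scan.
SELECT * FROM contracts
WHERE status = 'pending' AND insurance = 'full';

-- Adding the logically redundant, indexed filter on type lets the planner use that index,
-- without changing which rows are returned.
SELECT * FROM contracts
WHERE type = 'new' AND status = 'pending' AND insurance = 'full';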
Next piece of content: "Percentage calculations using Postgres window functions." This is from crunchydata.com, and he's talking about how, before window functions, it was quite hard to do various percentage calculations over all the rows in a table, or over a subset of data in the table. He gives an example of how you would calculate the percentage that someone earned compared to the rest of the people in the table. Basically, you have to do two scans of the table to collect that information, whereas with a window function you can get everything you need in one scan. In this case, the numerator is the earnings times 100, and he divides it by a denominator that uses the sum function in windowing mode, summing all of the earnings over the entirety of the table. Even calculating the percentage of earnings by band becomes more difficult without a window function. With one, instead of computing it over the whole table, as shown with just an empty set of parentheses after OVER, you can say PARTITION BY band, and that breaks it out by band. So, a quick post on how to do percentages using window functions; I definitely encourage you to check out this blog post if you want.

[00:04:58] Next piece of content: "Variadic unnest." This is from postgresonline.com. He's talking about a situation where he received some data that was entered quite poorly into a spreadsheet. He had a practice name and employee names, but each employee was just separated by a newline within the cell, rather than being in its own row. So there are three rows of data, yet they include five employees, plus a blank employee for one of the practices. The credentials were handled the same way, with a newline used to line up the second credential with the second person listed, and likewise for their specialty. He received this and said, oh my gosh, how am I going to deal with this data? He used a technique where he converted the strings to arrays, splitting on the newlines to keep the values separate, and did that for each column so they would line up. Then he used the unnest call, which you can call in a variadic fashion, meaning you can pass multiple arrays and each one gets its own column, without producing a cross join, which he demonstrates in the first part of the post. With this technique, he got the data all nicely structured. If you're interested in learning more, definitely check out this blog post.
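A minimal sketch of the window function approach described above, assuming a hypothetical players table with name, band and earnings columns:

-- Percentage of total earnings per row, computed in a single scan
-- (100.0 keeps the division from being done in integer arithmetic):
SELECT name,
       earnings * 100.0 / sum(earnings) OVER () AS pct_of_total
FROM players;

-- The same calculation broken out per band:
SELECT name, band,
       earnings * 100.0 / sum(earnings) OVER (PARTITION BY band) AS pct_of_band
FROM players;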
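And a minimal sketch of variadic unnest, assuming two newline-separated cells like those described above:

-- string_to_array splits each cell on newlines; unnest with multiple array arguments
-- (allowed in the FROM clause) pairs the elements up instead of cross joining them.
SELECT t.employee, t.credential
FROM unnest(
       string_to_array(E'Alice\nBob', E'\n'),
       string_to_array(E'MD\nRN', E'\n')
     ) AS t(employee, credential);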
Next piece of content: "SQL or NoSQL? Why not use both with PostgreSQL?" This is from supabase.com, and basically he goes over a scenario where you have a meal planning application that tracks essentially what you eat. You could go the NoSQL route and just throw everything into a JSON document and store that, which is much easier than trying to break out the data into relational tables. The problem comes when you want to analyze it: that becomes much more difficult with JSON documents than with a SQL query over relational data. On the other hand, going fully relational normally means setting up a lot of tables and doing all of those inserts, and maybe you don't want to do that. So he developed kind of a hybrid schema, where there's one row in a main table per person per day, but exactly what they ate is stored in a JSONB column, in terms of amounts of food, water and exercise. I wouldn't necessarily advocate this type of design, but it does give him a way to show you how to analyze this kind of JSON data using different JSON functions. If you want to learn more about how to use JSON functions, definitely check out this blog post.

Next piece of content: "PostgreSQL error: permission denied for schema public." This is from cybertec-postgresql.com. This happens, of course, when you upgrade to Postgres 15, create a user, and start trying to create objects; you may run into this error. This blog post covers that issue and how you can resolve it. Basically, you now need to grant permissions on the public schema to that user, or, perhaps more appropriately, create a dedicated schema that the user has rights to. The way he resolved it was simply to run GRANT ALL ON SCHEMA public TO demo. If you want to learn more about that, you can definitely check out this blog post.

Next piece of content: "Global Index benchmark with pgbench." This is from highgo.ca, and they're talking about a global index on partitioned tables. Basically, they're trying to create a unique global index that applies across all partitions of a table, because right now you can't have a globally unique index, and this is their proposal, or proof of concept, for how to do it. The post also shows some performance results, looking at different numbers of partitions, from one to 1,000. As you can tell, the transactional performance of the global index, given the data they have here, looks really good at the one-partition and ten-partition levels, but it drops off significantly at the 100-partition and 1,000-partition levels. So I don't know if this is a long-term solution, but this is the stage the work is at. If you want to learn more about that, you can definitely check out this blog post.

Next piece of content: "PostgreSQL scary settings: data_sync_retry." This is from fluca1978.github.io, and he's talking about a setting in your postgresql.conf file called data_sync_retry and how you really shouldn't turn it on, because there's a high probability you're going to corrupt your data. He goes through the whole explanation of why that is. If you want to learn more about that, definitely check out this blog post.

Next piece of content: there was another episode of Postgres FM this week. This one was on materialized views, and they cover what they are, their pros, their cons, and some areas where they could improve. You can click the button to listen to the episode, or you can watch it on YouTube from the link.

[00:09:50] Last piece of content: the PostgreSQL Person of the Week is Tobias Bussmann. If you're interested in learning more about Tobias and his contributions to Postgres, definitely check out this blog post.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
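As a small illustration of the hybrid SQL/JSONB layout from the Supabase article above, here is a minimal sketch; the table and key names are hypothetical, not the article's actual schema:

-- One row per person per day; the detail of what was eaten lives in a JSONB column.
CREATE TABLE daily_logs (
    person_id bigint NOT NULL,
    log_date  date   NOT NULL,
    details   jsonb  NOT NULL,
    PRIMARY KEY (person_id, log_date)
);

-- Analysis is still plain SQL: for example, total water logged per person,
-- pulling a numeric value out of the JSONB document with ->>.
SELECT person_id,
       sum((details ->> 'water_ml')::numeric) AS total_water_ml
FROM daily_logs
GROUP BY person_id;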
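And for the Postgres 15 "permission denied for schema public" error discussed above, the fix boils down to one of these, assuming a role named demo as in the episode:

-- Option 1: give the role back the ability to create objects in the public schema.
GRANT ALL ON SCHEMA public TO demo;

-- Option 2 (often cleaner): give the role its own dedicated schema instead.
CREATE SCHEMA demo AUTHORIZATION demo;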

Other Episodes

Episode 217

May 29, 2022 00:16:33

Schema Change Mistakes, Canceling Statements, pg_rman, Pedantry Removal | Scaling Postgres 217

In this episode of Scaling Postgres, we discuss mistakes you can make when doing schema changes, how best to cancel statements, looking into pg_rman,...


Episode 77

August 19, 2019 00:12:14

Timeseries Queries, Arrays, Ranges, C Triggers | Scaling Postgres 77

In this episode of Scaling Postgres, we discuss timeseries queries, using arrays with queries, range use cases and developing triggers in C. To get...


Episode 269

June 11, 2023 00:18:41

Upgrade Struggles, Hyperloglog, PgBouncer Usage, Postgres Scaling | Scaling Postgres 269

  In this episode of Scaling Postgres, we discuss Postgres upgrade struggles, how and why to use hyperloglog, using pgbouncer for session vs. transaction pooling,...
