Entity-Attribute-Value Design, JSON Subscripting, mysql-fdw Push-down, New Regex | Scaling Postgres 193

Episode 193 | November 28, 2021 | 00:09:26

Scaling Postgres

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss using entity-attribute-value designs, new JSON subscripting capabilities, mysql-fdw support for aggregate push-down, and new regex functions in PG 15.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/193-entity-attribute-value-design-json-subscripting-mysql-fdw-push-down-new-regex/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about entity-attribute-value design, JSON subscripting, the MySQL foreign data wrapper push-down, and new regex functions. I'm Creston Jamison, and this is Scaling Postgres episode 193.

[00:00:24] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is Entity-attribute-value design in PostgreSQL: don't do it. This is from cybertec-postgresql.com, and they're talking about the design pattern using entities, attributes and values. Basically this means using a generic entity table for all objects. You don't have an address table, a name table, or an orders table; instead, you put everything into an entity table and attach attributes to it via different tables. So in this example, you have your objects table, which is essentially the entities, and then you have a dedicated attribute table for strings, which just has a single string column, and a dedicated table for integers, which has a dedicated integer column. There are also designs with a separate attribute table that lists all of the attributes and then references the individual values, so it's really an entity table and an attribute table along with one or more value tables. Now, while you may think this is not too bad on the surface, or it has some appeal to certain developers, it can really get you into a lot of trouble, as they talk about in this post. The one thing he mentions here that a lot of people bring up is that it's flexible. Well, yes, it is flexible in terms of not having to create dedicated columns and tables for particular sets of data, but it has huge performance issues, which he talks about here. If you're going to do a simple insert in his design, it essentially requires inserting four rows into these different tables when all you want to store is a name, a start and an end. This is far less efficient than inserting into a dedicated table with dedicated columns, which is one simple row insert. And when you're deleting something, that delete has to cascade through everything, so you can issue one delete, but it still has to remove all of these row references. Now, updates can have an advantage, because you're just updating a value in a very small row in a single table rather than in a larger table, so there could be some advantages from an update perspective. But when you get to the selects, it is join city: depending on the data you have to pull back, you're going to do a lot of joins, and this is just a simple example. In addition, column aggregates are more complicated, like when you want to sum a given value, and in more complex queries it is again join city, with so many joins and tables joining to themselves just to find relatively simple data. It's much easier to properly lay out the data with, say, a dedicated person table and address table and pull out the same information. Now, one of the benefits mentioned in this post is the flexibility of this design, because you don't have to create a new column or new tables for new entities or objects you need to add. But one way to handle that is to simply create the tables or columns when you need them. It's not that hard to do; a lot of frameworks have the ability to migrate the database to handle this.
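To make the trade-off concrete, here is a rough SQL sketch of the kind of EAV layout the post warns about versus a conventional table. The table and column names are illustrative, not taken from the article.

    -- A simplified EAV layout: one generic entity table plus per-type value tables.
    CREATE TABLE objects (
        id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
    );

    CREATE TABLE text_attributes (
        object_id bigint NOT NULL REFERENCES objects ON DELETE CASCADE,
        name      text   NOT NULL,
        value     text   NOT NULL,
        PRIMARY KEY (object_id, name)
    );

    CREATE TABLE int_attributes (
        object_id bigint NOT NULL REFERENCES objects ON DELETE CASCADE,
        name      text   NOT NULL,
        value     bigint NOT NULL,
        PRIMARY KEY (object_id, name)
    );

    -- Reading just two attributes of one entity already requires two joins.
    SELECT t.value AS name, i.value AS price
    FROM   objects o
    JOIN   text_attributes t ON t.object_id = o.id AND t.name = 'name'
    JOIN   int_attributes  i ON i.object_id = o.id AND i.name = 'price'
    WHERE  o.id = 42;

    -- A conventional table answers the same question with a single-row lookup.
    CREATE TABLE products (
        id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        name  text   NOT NULL,
        price bigint NOT NULL
    );

    SELECT name, price FROM products WHERE id = 42;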
And what you can do, which I do as well, is use a JSONB column to give you more flexibility, so you don't necessarily have to create new columns. But he does have some very good caveats here. First, if you're going to be putting attributes in a dedicated JSONB field, don't put in ones that you will use for joins, because that will just make your life more difficult; make those dedicated columns. Second, don't use it when you need a dedicated database constraint. For example, if you're working with date times, probably don't put those in the JSONB; use a dedicated column so you can enforce that the value really is a date time. And third, if you're going to use a where condition that is not an equality, like a greater than or less than, then probably give that attribute a dedicated column as well. This is great advice to follow, and I definitely agree that EAV, or entity-attribute-value, designs cause bad performance. But if you want to learn more, you can definitely check out this blog post.

[00:04:46] The next piece of content is What's new with JSON in PostgreSQL 14? This is from logrocket.com. This particular post is mainly talking about subscripting. For example, if you have a data column that is JSONB and you insert information into it like a title and an author with a first name and a last name, then in order to pull that out where the author's first name is Ada, you would have to use the single arrow or double arrow syntax to get at the data. The important point is that this syntax is not the easiest, and it's not common in the programming world, where you would normally use subscripting to do this. Well, with version 14 you can do that now. For example, you can say where data, subscripted by author and then first_name, equals Ada. The only thing that's a little unusual is that you do have to use double quotes around the text when doing that. And if you want to grab the title, you just subscript data with title and it works. Then they go into updates. Before version 14 you had to use jsonb_set, whose syntax is even less appealing than the arrow syntax. But with version 14 you can write an update statement with subscripting and just say set data's author first_name to Sarah. That is much easier to interpret than the jsonb_set function. Now, they did say there are some caveats. In particular, they say, quote, if the value of the JSON key that is being updated is null, assignment via subscripting will act as if the value of the key is an empty object or array. So that may not necessarily be what you want, and you will have to drop back to using jsonb_set for certain use cases, because it does have an optional create parameter. But again, this is another post describing this great enhancement to Postgres, and I'm sure there will be more. If you want to find out more about how this works, you can definitely check out this blog post.
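As a concrete sketch of the subscripting feature discussed above (the table and data here are made up for illustration; PostgreSQL 14 or later is required for the subscript syntax):

    CREATE TABLE books (
        id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        data jsonb
    );

    INSERT INTO books (data) VALUES
      ('{"title": "Notes on the Analytical Engine",
         "author": {"first_name": "Ada", "last_name": "Lovelace"}}');

    -- Before version 14: the arrow operators.
    SELECT data -> 'title'
    FROM   books
    WHERE  data -> 'author' ->> 'first_name' = 'Ada';

    -- Version 14 and later: subscripting, which reads like most programming languages.
    -- Note the double quotes: the right-hand side is compared as a jsonb string.
    SELECT data['title']
    FROM   books
    WHERE  data['author']['first_name'] = '"Ada"';

    -- Updating through subscripting instead of jsonb_set().
    UPDATE books
    SET    data['author']['first_name'] = '"Sarah"'
    WHERE  id = 1;

    -- The older jsonb_set() equivalent, still needed for some edge cases.
    UPDATE books
    SET    data = jsonb_set(data, '{author,first_name}', '"Sarah"')
    WHERE  id = 1;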
[00:06:48] The next piece of content is Aggregate push-down in MySQL foreign data wrapper. This is from enterprisedb.com. They're talking about an enhancement to the MySQL foreign data wrapper where it can now push down aggregates. So as opposed to pulling all of the data back into Postgres from MySQL through the foreign data wrapper and then computing the aggregate in Postgres, it can now ask the MySQL server to do that aggregate. They point out the performance impact: a particular query goes from 6 seconds to deliver the result down to 380 milliseconds, so that's a great improvement from this aggregate push-down. They also say you can use a join and an aggregate push-down together to get query results from MySQL, and they give an example of pushing the aggregate down to a partitioned table. So if you want to learn more about this enhancement, you can definitely check out this blog post.

[00:07:44] The next piece of content is Waiting for PostgreSQL 15: add assorted new regexp SQL functions. This is from depesz.com, and it is an additional post discussing the new regular expression functions that are coming in Postgres 15. So if you want to learn more about their use and their capabilities, you can definitely check out this blog post.

[00:08:07] The next piece of content is Using TimescaleDB with PGO, the open source Postgres Operator. This is from crunchydata.com, and this post basically describes how you can use their Postgres operator to get TimescaleDB running on Kubernetes. So if you're interested in that, you can check out this blog post.

The next piece of content is How to run a specific regression test. This is from highgo.ca. When doing Postgres development, you of course have regression tests to make sure that changes to the code don't impact other features of Postgres. This post describes how you can run a specific regression test when you're working on a particular area of Postgres. So if you're interested in learning more about that, you can check out this blog post.

And the last piece of content: the PostgreSQL person of the week is Pavel Luzanov. If you're interested in learning more about Pavel and his contributions to Postgres, definitely check out this blog post.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
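As a supplement to the mysql_fdw segment above, here is a minimal sketch of what such a setup might look like. The connection details, table, and columns are hypothetical, and aggregate push-down requires a mysql_fdw release that includes the enhancement.

    CREATE EXTENSION mysql_fdw;

    CREATE SERVER mysql_server
        FOREIGN DATA WRAPPER mysql_fdw
        OPTIONS (host '127.0.0.1', port '3306');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER mysql_server
        OPTIONS (username 'app', password 'secret');

    CREATE FOREIGN TABLE orders (
        customer_id int,
        amount      numeric
    )
    SERVER mysql_server
    OPTIONS (dbname 'shop', table_name 'orders');

    -- With aggregate push-down, the GROUP BY and sum() run on the MySQL side;
    -- EXPLAIN VERBOSE should show the aggregate embedded in the remote query
    -- rather than a plain column fetch followed by a local aggregate.
    EXPLAIN (VERBOSE)
    SELECT customer_id, sum(amount)
    FROM   orders
    GROUP  BY customer_id;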
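And for the new regular expression functions mentioned for Postgres 15, a few quick illustrative calls (the input strings are made up):

    -- New in PostgreSQL 15: regexp_count(), regexp_instr(), regexp_like(), regexp_substr().
    SELECT regexp_count('postgres 14, postgres 15', '\d+');   -- 2 matches
    SELECT regexp_instr('postgres 15', '\d+');                -- match starts at position 10
    SELECT regexp_like('scaling postgres', 'post.*');         -- true
    SELECT regexp_substr('episode 193 of the show', '\d+');   -- '193'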

Other Episodes

Episode 324

July 14, 2024 00:16:29
Episode Cover

Four Million TPS | Scaling Postgres 324

In this episode of Scaling Postgres, we discuss experiments to achieve four million transactions per second, the importance of extended statistics, parallelism in Postgres...

Listen

Episode 216

May 22, 2022 00:13:50
Episode Cover

PG 15 Beta 1 Release, Query Breakdown, Sort Performance, Prepared vs. Partitioned | Scaling Postgres 216

In this episode of Scaling Postgres, we discuss the Beta 1 release of Postgres 15, optimizing a query by breaking it down, improvements to...

Listen

Episode 311

April 14, 2024 00:18:52
Episode Cover

1,400 Times Faster Max & Group By Performance | Scaling Postgres 311

In this episode of Scaling Postgres, we discuss a 1,400 times faster max and group by implementation, new quantization capabilities in pgvector, adaptive radix...

Listen