Full Text Search, JSONB_AGG, Upgrades, CIS | Scaling Postgres 24

Episode 24 August 06, 2018 00:12:31
Full Text Search, JSONB_AGG, Upgrades, CIS | Scaling Postgres 24
Scaling Postgres
Full Text Search, JSONB_AGG, Upgrades, CIS | Scaling Postgres 24

Aug 06 2018 | 00:12:31

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we review articles covering full text search, aggregating JSON with jsonb_agg, upgrades and CIS benchmarks.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/24-full-text-search-jsonb_agg-upgrades-cis/

View Full Transcript

Episode Transcript

[00:00:00] In this episode of Scaling Postgres we talk about full text search, JSON, B, aggregates upgrades and CIS benchmarks. I'm Kristen Jameson, and this is scaling postgres episode 24. [00:00:21] I hope everyone is doing well this week. Our first art goal is setting up a fast comprehensive search routine with PostgreSQL. This is from the Rob Connery blog and in it he wants to replicate some of the search features that you can find in applications such as Shopify where you do a comprehensive search essentially across multiple areas of the application or multiple tables as it were. Now this blog post walks through things gradually, so he starts with a regex query and doing a union for the different tables in order to retrieve the results. And for each of the steps he doesn't explain analyze in order to see how that query would be executed and how fast it would run. Now very quickly he turned to a materialized view. So a view of course is a virtualized table and materializing essentially makes it like a real table and you can refresh it on a periodic basis. And after setting up the materialized view, he then introduces the full text searching with the inclusion of a gen index. Now, generally when I've used full text search in applications I just use the gen index, I haven't used a materialized view in a large application I'm not necessarily sold on the fact that the materialized view could be refreshed fast enough when you have a lot of data. So I don't know, for a large application this implementation would be the best choice, but it is an interesting use case because you can just refresh the materialized view when you want the search results to be able to be viewed by a user. So it was an interesting use case, but again, for a large application, I'm not sure how well this particular implementation would work. And then he shows using the full text features to do the particular query and how he got the execution time again for a small data set, but relatively low under a millisecond. Now he does mention here in the last paragraph, to be honest, it's not really made for small loose searches like this. That's the full text search where he's just searching for a name across a couple of different tables in certain fields. And he continues and it's really easy to generate a false positive full text indexing really shines over things like blog posts, comment searches and so on. So I thought this was an interesting blog post in order to look at how materialized views could be used with full text indexes and maybe there are some use cases in your application that they could be used for. [00:02:48] The next post is Sqltip JSONB underscore AGG in PostgreSQL for simple one to many joins. And this is from the Geek Uelmi blog on Medium. My apologies for the pronunciation of that, but this is a super short post and really focused on the JSONB aggregate function in PostgreSQL. So essentially he had a use case where he wanted to create an API endpoint that displays items in a list like this. Historically, he would do a simple join. So join from To Do Lists to todo items and it will generate a table like this. But then in his application code, he would then have to group it how it needed to be in order to present this output as part of the API. However, using the JSON B aggregate function, he's able to rewrite the query like this and he takes all of the To Do items, sets it to JSON B and then aggregates them. In addition, he removes the To Do list ID from the output of the todo items table. So he gets just the data he wants, the ID, the name field and grouping by the To Do Lists ID. And now as you can see here, it's going to output this the To Do list ID, the name as well as all the items in essentially a JSON array. And he says, I quote, I then return that to my API client. No other changes needed. Now, we've mentioned previously how some people have used PostgreSQL to directly return JSON to their APIs as opposed to having to do more on the application side to prepare it. So this was a very quick focus post on how to do that. [00:04:34] The next piece of content is Meetup PostgreSQL how PostgreSQL's SQL dialect stays ahead. Now, this is a YouTube video from the Liferay Agora channel. However, this is for a presentation that was mentioned in the previous episode of Scaling Postgres that was about 96 97 slides long. But here is the YouTube video for and it is over an hour. Now, the audio is not great, but if you found that presentation interesting in the content, again, this is the previous episode of Scaling Postgres. Here's a YouTube video for you to check out and get the audio along with that presentation. [00:05:13] The next post is Upgrading PostgreSQL on AWS RDS with minimal or zero downtime. And this is from the preply engineering blog on Medium. Now essentially this post covers Bucardo for asynchronous multimaster replication. So essentially they had AWS on RDS two instances, a master and a slave. And they wanted to upgrade them from 9.4 to 9.5, but they wanted to do it with no downtime. So this blog post goes into how they the different options that they looked at and those options included BDR, PG, Cluster, Ruby, Grep and Bucardo. And in the end, due to the different constraints they choose, bucardo and I talked about what Bucardo is. Essentially it uses triggers to replicate, inserts updates and deletes between instances. And he goes into a lot of detail of how they were setting this up on their staging environment. Again with AWS RDS instances and how they got it working on staging and test and they say we tested this multimaster replication on test servers and staging and it works really good. But they had problems in production so they were having some issues with the replication, not being able to keep up with their insert update delete operations and they were potentially getting some primary key collisions and unfortunately they weren't able to do the upgrade with no downtime. So he then says for production we went ahead and did the RDS upgrade with Downtime. Now that may seem like a little bit of a disappointment, but their environment is presumably different than yours. And if you wanted to pursue this course, perhaps you could get around some of the issues that they were encountering when they attempted this with their production system. And it's also interesting to note that it really only required, it looks to be about as much as seven minutes to do the upgrade with the downtime. So a good blog post that goes over how they chose, what solution they were going to be using and how they used it, and then ultimately the issues they ran into and prevented them from using it. But still a good overview of using these tools in their environment. [00:07:29] The next post is using the CIS PostgreSQL benchmark to enhance your security. So essentially Crunchy data here, and this is from the Crunchy Enterprise postgresql.com blog. [00:07:43] They worked with the center for Internet Security to develop a benchmark and these are essentially standards that you should follow to help secure your PostgreSQL instances. So if you're particularly focused on security, you might want to check out these benchmarks that they've put together and they have a link here. You do have to submit email and contact information when retrieving it, but again, a great recommendation to check out this guide. [00:08:13] The next post is designing the most performant row level security schema in postgres. And this is from the Caleb Brewer blog on Medium. Now in this post he discusses potentially using your database as an access layer. So for example, he's saying if you need to do a select against the database for a particular item for a user, typically you would do this. You would say maybe you'd have some sort of created at date collect some items where you are the owner of that item using the owner ID whereas with this implementation using row liberal security for the user that's connecting could you potentially use this type of query, whereas essentially use the access controls and the role level security to only show the objects to that particular owner? Now, for my use cases, as an application developer, I don't necessarily see the solution he's describing as something I would use at this time. He did go over and record different performance issues he was seeing and for different implementations. So it is kind of a work in progress post. But if you are using row level security or considering it, this is definitely a post to check out to kind of understand where the state of things are with the version of PostgreSQL that we're at. [00:09:34] The next post is actually an announcement for a new webinar, an introduction to performance monitoring for PostgreSQL. And this is on the Several nines.com blog. So I believe this is a webinar that they are hosting. And since this show is Scaling Postgres, something performance monitoring related is definitely a piece of content to check out. And this is scheduled for August 21. So if you are interested in learning more about performance monitoring, definitely a webinar to go sign up at. [00:10:06] The next post is also from the Several nines.com blog and it is PostgreSQL Triggers and Stored Function Basics so basically it talks about what triggers are, what stored functions are, and what are the use cases for them and what are potentially some disadvantages of them. So of course they go through how do you declare or set up a trigger in a function, and then they talk about different use cases. So this is a pretty long blog post and some of the use cases they cover are data validation audit logging, which we've seen before in previous episodes of Scaling Postgres, where blog postgres have described using triggers to do audit logging, using them for derived values. So again, this is a use case with regard to full text searching capabilities. Maybe you store information for searching purposes. So again a very comprehensive post. So if you are interested or you believe triggers or functions could assist you in your application, definitely a blog post to check out. [00:11:09] The last post is understanding and reading the PostgreSQL system catalog. And again, this is from the Several nines.com blog. And this goes over basically the PostgreSQL system catalog and it discusses essentially all the different system views that are available. So they talk about PG underscore Database and what are the different fields there? What does it mean? Also PGSTAT database. They talk about PG Stat BG rewriter for checkpoints in the background writer PG Stat Activity. That's a very large one. Understanding what queries are going on in your system. Currently looking at PG locks to understand what currently locks are doing. PGSTAT User tables to get information about each of your different tables. PGSTAT User Indexes again to understand all the indexes in your system. So they go over some of the main system view tables that you would typically query to understand what's going on with your database. So definitely a blog post to check out. [00:12:07] That does it. For this episode of Scaling Postgres, you can get links to all the content presented in the show notes. Be sure to head over to Scalingpostgres.com where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or itunes. Thanks.

Other Episodes

Episode 76

August 12, 2019 00:12:54
Episode Cover

New CVEs, Postgres Programming, JSONB, Advisory Locks | Scaling Postgres 76

In this episode of Scaling Postgres, we discuss new CVEs, programming Postgres, JSONB comparison and using advisory locks. To get the show notes as...

Listen

Episode 276

July 30, 2023 00:15:57
Episode Cover

BRIN & Correlation, Poor Partitioning, 10 Beginner Tips, Table & Index Usage | Scaling Postgres 276

  In this episode of Scaling Postgres, we discuss the importance of correlation with BRIN indexes, how partitioning can kill performance, 10 tips for beginners...

Listen

Episode 250

January 30, 2023 00:14:03
Episode Cover

Performance Issue, Survive Without Superuser, Reserved Connections, Partition Management | Scaling Postgres 250

In this episode of Scaling Postgres, we discuss resolving a performance issue, how PG16 allows you to survive without a superuser, reserving connections and...

Listen