Tips & Tricks, Faster APIs, Full Text Search, pgbouncer_fdw | Scaling Postgres 97

Episode 97 | January 20, 2020 | 00:13:11

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss tips & tricks, how to get faster APIs, full text search considerations and pgbouncer_fdw.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/97-tips-tricks-faster-apis-full-text-search-pgbouncer-fdw/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about tips and tricks, faster APIs, full text search, and pgbouncer_fdw. I'm Creston Jamison, and this is Scaling Postgres, episode 97.

[00:00:22] Alright, I hope you're having a great week. Our first piece of content is "Tips and tricks to kickstart Postgres year 2020". This is from cybertec-postgresql.com, and it's basically just a list of tips and tricks. Some of these I wasn't aware of, so it was great to be able to get this content.

[00:00:42] The first tip is that you can do dynamic SQL loops with psql's \gexec command. Basically, if you need to query the database and then do something with each result, say select a set of tables and vacuum them all, you build an SQL statement as the return value of your query, and \gexec runs through the results and executes each returned statement. So this makes it easy to execute statements essentially over a loop, and they make a note here: be sure to know that this is not an SQL command, but a psql command.

The next tip is that you can use session variables. You can set a variable under a particular scope to a value, and they say that the data types aren't enforced, so you could set it to an integer or a text value. Then you can SHOW that variable to print out what it is, or use current_setting() to read the setting and use it in an SQL query. They say something might change in version 13 with a feature called schema variables, but as of right now, this works in current versions.

The next tip is that in addition to pg_sleep(), which sleeps for the number of seconds you specify, you can use pg_sleep_for(), which takes an interval, so you can pass in something more human friendly like five minutes or two days. Or you can use pg_sleep_until() with a specific time; they're using tomorrow at 03:00 A.M., but you could give a specific date and time. So again, these are functions I was not aware of.

The next tip is that you can set planner constants per function. For example, for a particular function you can alter that function to set enable_seqscan to off. Now, most people use plain SQL statements and don't have a lot of functions, but this is a way you could change how a query gets executed. You can also alter the amount of work memory a function gets or set a particular statement timeout, so I could see some use cases where this could be useful.

The next tip is duplicating databases for testing purposes. As they say here, you probably mostly do pg_dump to back up and then restore elsewhere. But if you want to do it within the same cluster, you can use the CREATE DATABASE command with the source database as a template. The thing to watch out for is that it locks the source database while it's doing this, and no connections can be active, so that's definitely something you need to be aware of. But this is another alternative for duplicating a database that should be much faster than pg_dump.

The next tip is Debian specific, and I believe the next two are as well, or also Ubuntu: you can define cluster and initialization defaults. There is actually a cluster configuration file at /etc/postgresql-common/createcluster.conf, and it defines all sorts of configuration options for setting up clusters. So this could be really useful to alter if you want to change how clusters get created.
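To make a few of those tips concrete, here is a rough psql session sketching them; the schema, function, and database names are mine for illustration, not from the post:

    -- Dynamic SQL loop: build VACUUM statements from a catalog query, then
    -- run each result with \gexec (a psql feature, not server-side SQL).
    SELECT format('VACUUM ANALYZE %I.%I', schemaname, relname)
    FROM pg_stat_user_tables
    WHERE schemaname = 'public' \gexec

    -- Session variables: any "namespace.name" setting works; types are not enforced.
    SET myapp.current_tenant = '42';
    SELECT current_setting('myapp.current_tenant');   -- returns '42' as text

    -- Interval-based sleeping instead of counting seconds:
    SELECT pg_sleep_for('5 minutes');
    SELECT pg_sleep_until('tomorrow 03:00');

    -- Planner and resource settings scoped to a single (hypothetical) function:
    ALTER FUNCTION my_report(int) SET enable_seqscan = off;
    ALTER FUNCTION my_report(int) SET work_mem = '256MB';

    -- Clone a database within the cluster; the source must have no active
    -- connections while this runs, but it is much faster than dump and restore.
    CREATE DATABASE myapp_test TEMPLATE myapp_prod;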
The next tip is that there are sample configuration files, located in one directory on Debian systems and another on Red Hat based systems. So if you've altered your configuration files and you want to see what the originals were, you can consult these directories.

The next set of tips is with regard to randomly sampling rows, and they're using the TABLESAMPLE SQL clause. You can use TABLESAMPLE SYSTEM (0.1) for a 0.1% sample of rows, and it's pretty fast. Or, if you want more randomness, albeit slower, you can use BERNOULLI, as they demonstrate here, and they give a few other examples of how you can use TABLESAMPLE.

And last but not least, the tip is that you can save characters when checking for NOT NULL: instead of writing IS NOT NULL, you can actually just write NOTNULL. So again, yet another tip I was not familiar with. I'd say this is definitely a blog post you're going to want to keep as a reference to be able to use some of these tips and tricks. A great blog post to check out.

The next post is "Faster Ruby APIs with Postgres", and this is from the Goiabada blog. Now, this is talking about Ruby on Rails; however, the technique they're using is applicable to any application framework. What they're basically saying is that for API purposes, that is, returning JSON, you can do a select against the database, pull all that data into Ruby, build objects with it as the ORM (Active Record) would, and then format that output as JSON. The alternative is to send a SQL query directly to Postgres and have it return JSON to you. They do this for the Ruby on Rails case using the json_build_object function, along with the json_agg aggregate function, to build out the JSON they want. It essentially returns the same data you could get using Ruby on Rails or any other application framework, and it does it much more efficiently. They have some graphs down here that show it's about tenfold faster using this technique compared to going through the ORM and having Ruby build the JSON. And this essentially holds true when looking at single tables, a joined table, or even an increasing number of elements: it's still about ten times faster. Now, other frameworks that are faster than Ruby probably won't see a full tenfold speedup, but it's still going to be relatively significant. This may involve more complex SQL queries, but if you have an API that requires high performance, then querying Postgres directly, returning JSON from it, and just passing it through the framework should be the most efficient option. So if you're interested in using this technique, definitely check out this blog post.
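As a rough sketch of that technique (the table and column names here are mine, not the article's), a single query can hand the whole payload back as JSON:

    -- Build the API response in Postgres: json_build_object creates one JSON
    -- object per row, and json_agg collects them into a single JSON array.
    SELECT json_agg(
             json_build_object(
               'id',    u.id,
               'name',  u.name,
               'posts', (SELECT json_agg(json_build_object('id', p.id,
                                                           'title', p.title))
                         FROM posts p
                         WHERE p.user_id = u.id)
             )
           ) AS payload
    FROM users u;

The framework then returns that string as is, skipping object instantiation and serialization in the application layer entirely.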
The next post is "Two edge cases in PostgreSQL full text search". So they're using full text search, and they have a tsvector column, this "tsv" that they've built alongside their table. If they have a title and they want to query it, they build a to_tsvector value, and they do it using a trigger function, using the english catalog. As you can see, it reduces things like "painting" to "paint", "roofing" to "roof", and "heating and cooling" to "heat" and "cool". And that's fine: you can query "roof", converting whatever you're searching for with to_tsquery, and it will find the record. However, it won't find "roofing".

Now, their solution to this was to actually use additional catalogs, and in this case the simple catalog. So what they did is change their trigger function to concatenate the results of two different catalogs, the english catalog and the simple catalog. Those conversions turn "painting" into "paint" and "painting", and "roofing" into "roof" and "roofing". And now when you do a query, you get the one record expected. So if you're not getting what you expect with full text search, maybe look at using a different catalog to see if that can give you the results you're looking for. And this blog post is from jetrockets.pro.

[00:08:34] The next post is "How to simplify pgBouncer monitoring with pgbouncer_fdw", and FDW stands for foreign data wrapper. This is from crunchydata.com. So this is a new tool, essentially a foreign data wrapper that allows you to query and monitor PgBouncer, because by default you get details on how PgBouncer is running by connecting to it with a specific administrator or operator user and then running SHOW commands, for example SHOW POOLS to get output on the different pools. What they wanted to do is be able to use SQL queries to retrieve the different metrics of PgBouncer, to be able to easily monitor it. So they set up a foreign data wrapper, they're using the dblink foreign data wrapper, and they go through the process of doing it: adding the foreign server and doing all the granting for the selects. And now you can do something like SELECT * FROM pgbouncer_pools and it returns all the same data. So basically you're querying your main database, and it goes and talks to PgBouncer to get its metrics for monitoring purposes. They also mention that they've incorporated this into pgMonitor, their open source monitoring utility. So this seems like a really great addition, I really like this, and if you are interested in learning more, definitely check out this blog post.

[00:10:04] The next post is "Effectively using materialized views in Ruby on Rails", and this is from pganalyze.com. Now again, this is a general subject about views and materialized views. It is specific to Ruby on Rails, but you can learn more about views and materialized views if you want to consult this post. Basically, they cover what a view is: essentially a virtual table that helps minimize complexity. So if you have a very complex query, you can put it into a view, and it will run that query in real time. Alternatively, you can use a materialized view, which actually creates an underlying table of all of this data; it has materialized the virtual view, so essentially it's getting a one-time snapshot of it. Now, as a consequence, you have to refresh that snapshot periodically, so essentially it's like a cache for your query. But if you want to learn more about views and materialized views and how you can use them, specifically with Ruby on Rails or a Ruby application framework, it's definitely a blog post to check out.
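A minimal sketch of that materialized view pattern, with the table and view names assumed for illustration:

    -- A materialized view caches the result of an expensive query as a table.
    CREATE MATERIALIZED VIEW monthly_sales AS
    SELECT date_trunc('month', ordered_at) AS month,
           sum(total) AS revenue
    FROM orders
    GROUP BY 1;

    -- The snapshot is static until refreshed, so this needs to run periodically.
    -- CONCURRENTLY avoids blocking readers, but requires a unique index.
    CREATE UNIQUE INDEX ON monthly_sales (month);
    REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales;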
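Circling back to the full text search post, a sketch of the two-catalog trick looks something like this; the table, trigger, and function names are my assumptions, not the article's exact code:

    -- Index both the stemmed form ('english': roofing -> roof) and the
    -- verbatim form ('simple': roofing stays roofing) in one tsvector.
    CREATE TABLE services (
        id    serial PRIMARY KEY,
        title text,
        tsv   tsvector
    );

    CREATE FUNCTION services_tsv_update() RETURNS trigger AS $$
    BEGIN
        NEW.tsv := to_tsvector('english', coalesce(NEW.title, ''))
                || to_tsvector('simple',  coalesce(NEW.title, ''));
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER services_tsv_trg
        BEFORE INSERT OR UPDATE ON services
        FOR EACH ROW EXECUTE FUNCTION services_tsv_update();

    -- Both the literal and the stemmed search now match the same row:
    SELECT title FROM services WHERE tsv @@ to_tsquery('simple', 'roofing');
    SELECT title FROM services WHERE tsv @@ to_tsquery('english', 'roof');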
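And for pgbouncer_fdw, the general shape of the setup is roughly the following; the connection details, user, and password are placeholders, and the project's own install scripts handle the specifics:

    -- pgbouncer_fdw builds on dblink: a server entry points at PgBouncer's
    -- special "pgbouncer" admin database, and the packaged views translate
    -- SHOW commands into ordinary relations you can query with SQL.
    CREATE EXTENSION dblink;
    CREATE SERVER pgbouncer FOREIGN DATA WRAPPER dblink_fdw
        OPTIONS (host '127.0.0.1', port '6432', dbname 'pgbouncer');
    CREATE USER MAPPING FOR PUBLIC SERVER pgbouncer
        OPTIONS (user 'monitor', password 'secret');

    -- After installing the pgbouncer_fdw views, SHOW POOLS becomes:
    SELECT * FROM pgbouncer_pools;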
[00:11:16] The next piece of content is actually a YouTube video, "Why developers love Postgres". Now, this was presented at Microsoft Ignite, which happened recently, and the presentation is about 38 minutes in length. It covers, for roughly the first 15 to 20 minutes, some of the reasons that developers love Postgres. The latter half talks about how an organization switched over from Oracle to Postgres and the benefits they garnered from it. So if you're interested in viewing this presentation, go ahead and check it out.

[00:11:50] The next post is "Webinar: BDR latest features and updates - follow up", and this is from 2ndQuadrant.com. This is a webinar that you can register for; there's a link right here to learn more about their bidirectional replication product. I haven't had the opportunity to look at it, but if you are interested in learning more about bidirectional replication, definitely check out this webinar.

[00:12:14] And the last post is "Waiting for PostgreSQL 13: ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION". They mention here that with the generated columns feature added in Postgres 12, which lets you do, for example, calculations based on other columns, there was actually no way to remove the expression once you added it to a table. This new command enables you to do that. So if you want to learn more about it, definitely check out this blog post.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
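For reference, a quick sketch of that last command, using a hypothetical table with a generated column:

    -- A generated column computed from another column (Postgres 12+):
    CREATE TABLE measurements (
        celsius    numeric,
        fahrenheit numeric GENERATED ALWAYS AS (celsius * 9.0 / 5.0 + 32) STORED
    );

    -- New in Postgres 13: detach the expression, turning "fahrenheit" into an
    -- ordinary column that keeps its current values but is no longer computed.
    ALTER TABLE measurements ALTER COLUMN fahrenheit DROP EXPRESSION;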

Other Episodes

Episode 21

July 16, 2018 | 00:19:41

Using JSON, Procedures, Concurrency, Kubernetes | Scaling Postgres 21

In this episode of Scaling Postgres, we review articles covering how to use JSON & JSONB, procedures, deal with concurrency issues and Kubernetes. To...


Episode 165

May 16, 2021 | 00:13:33

Postgres Releases, Data Warehouses, Exclusion Operators, Read-Only Users | Scaling Postgres 165

In this episode of Scaling Postgres, we discuss the newest Postgres releases, implementing a data warehouse, using exclusion operators and setting up read-only users....


Episode 87

October 28, 2019 | 00:12:39

pg_receivewal, Application Failover, pg_checksums, pgBouncer | Scaling Postgres 87

In this episode of Scaling Postgres, we discuss the benefits of pg_receivewal, how to setup simple application failover, new pg_checksums and setting up pgBouncer....
