Max Connections, SQL Tricks, Fast Text Search, Security & Compliance | Scaling Postgres 110

Episode 110 | April 20, 2020 | 00:13:33
Scaling Postgres

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss tuning max connections, different SQL tricks, setting up fast text search and handling security & compliance.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/110-max-connections-sql-tricks-fast-text-search-security-compliance/

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about max_connections, SQL tricks, fast text search, and security and compliance. I'm Creston Jamison, and this is Scaling Postgres, episode 110.

[00:00:22] All right, I hope you, your coworkers and family continue to do well, and I hope you have a great week. This week, our first piece of content is "Tuning max_connections in PostgreSQL", and this is from cybertec-postgresql.com. They're talking about how best to tune max_connections, which is the maximum number of connections that can be made to PostgreSQL. They mention that the default setting is 100, but three of those are reserved for superuser connections, so essentially you only have 97. But they caution against setting max_connections too high.

The first thing they talk about is why you would set it too high. They say, well, first, because it requires a restart to change it, you probably want to be on the safe side and keep it higher. Or you may be getting the error message "remaining connection slots are reserved for non-replication superuser connections", so you just raise the limit. Or the application developers convince the DBA that they need more database connections for better performance.

Then they go into what the problems actually are with setting it too high. The first one is overloading the database: the more active connections you have, the more they will be actively using all the resources of that server. Now, the thing that I've run into is people putting in 100, 200, 400 connections. But the thing you have to keep in mind is that whether you have an 8-core, 16-core or 32-core server, each of those connections is a process. If those processes are very actively doing work, you can't have the number of connections too far removed from the number of cores. There is wait time associated with network activity or disk access, so the cores can be doing something else and can handle more connections than just one per core. But you have to keep in mind that all of those connections are going to be using the resources of that database server, so you want to be cautious and not overload it, or overload its memory.

With regard to memory, they talk about insufficient resources for each database connection: each connection uses your work_mem allocation, or more than that work_mem, depending upon the statements that need to be run. So each connection consumes memory, and you want to make sure that setting the limit too high doesn't make you run out of memory. Now, they give a formula you could use to derive how many max connections you should have, but mostly I've had to determine it empirically: test out the server and the different connection counts to see what the workload needs. Still, this is definitely one way to look at it and examine it.

Then, of course, they say that when you want to start having really high connection counts, you're probably going to want to investigate a connection pooler. A lot of application frameworks have built-in poolers, but there are also advantages to having a separate connection pooler. The one that I use the most is PgBouncer, but there's also Pgpool. They talk about how you can have many clients connected through PgBouncer while utilizing a smaller number of actual database connections, and you typically achieve this using transaction pooling. So if you want to learn more about max_connections and setting it, here's a blog post to check out.
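As a rough companion to that discussion, here is a minimal sketch (my own illustration, not from the Cybertec post) of how you might check the current limit and the actual connection usage from psql before deciding to change anything:

    -- Current limit and the slots reserved for superusers
    SHOW max_connections;
    SHOW superuser_reserved_connections;

    -- Total connections versus connections actively doing work right now
    SELECT count(*) AS total_connections,
           count(*) FILTER (WHERE state = 'active') AS active_connections
    FROM pg_stat_activity;

    -- Raising the limit requires a restart, for example:
    -- ALTER SYSTEM SET max_connections = 200;  -- then restart PostgreSQL

Comparing the active count against the number of cores on the server is a rough way to apply the advice above before reaching for a pooler.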
[00:03:36] The next piece of content is "Six SQL Tricks Every Data Scientist Should Know", and this is from towardsdatascience.com. They're talking about some SQL tricks you can use if you are a data scientist, but it's equally applicable if you use SQL in your daily work. The first trick they talk about is using COALESCE to recode null values into some other reference value. For example, here they coalesce a null value into something called "Missing", so when they query it, it says "Missing". Now, I should say this is for Oracle, so it's not PostgreSQL specific, but a lot of the tips they give here are still beneficial.

The second trick covers how you would do a rolling total and a cumulative frequency. They show it using window functions, and a number of these tricks use window functions. As part of the subquery they build, they're able to calculate the cumulative frequency, partitioning by a join ID that they set up. The third trick is finding the record with extreme values without self-joining, and again they use a window function to achieve that. The fourth is a conditional WHERE clause, using a CASE statement and a formula to determine what needs to be selected. The fifth is using LAG and LEAD to work with consecutive rows; here's an example where they use the previous row's value as a reference to calculate differences. And the sixth has to do with Python and R, so if you use those, it's potentially relevant. So if you want to learn some more tricks and tips for working with SQL, particularly window functions, definitely check out this blog post.

[00:05:25] The next piece of content is "SQL Optimizations in PostgreSQL: IN vs EXISTS vs ANY/ALL vs JOIN". They're looking at different ways to write particular queries. First, they look at some inclusion queries, and no matter which way you write them, whether using IN, ANY, EXISTS or an INNER JOIN to get the answer they're looking for, the PostgreSQL planner is intelligent enough to produce the exact same execution plan. So no matter which way you write it, when you run EXPLAIN it gives you the same result. That is good news: it's able to find the most efficient path to give the answer you need.

But then they looked at some exclusion queries, using NOT IN, NOT ALL, NOT EXISTS, and a LEFT JOIN where the joined value is null. Here they got a couple of different variations in the plans, but based on what they were interested in, it looks like NOT EXISTS, or the LEFT JOIN where the data is null, tends to give close to the best performance. Now, one variant actually did report something faster, but they said that once they increased the number of records, the NOT EXISTS and LEFT JOIN versions won out. The other thing of note was down in the summary, where they gave some pretty good advice for developers writing SQL statements: first make a list of the tables from which the data should be retrieved, then think about how to join those tables, and then think about how to have the minimum number of records participating in the join condition. So always think about the minimum number of records you need to pull to do the job; that will generally lead to the most efficient query. And then, of course, quote: never assume that the query is performing well with a small amount of data in the table. So it's always advantageous to work on a copy of production, or production itself, in order to gauge what is a good query, because the statistics are different and the query planner will make different decisions based upon those statistics. You need something as close to production as possible. So this is another good post about SQL to check out.
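To make the exclusion query comparison a little more concrete, here is a minimal sketch of the kinds of forms being compared; the customers and orders tables are my own illustration, not the schema from the post:

    -- Customers that have no orders, written three ways.
    -- NOT EXISTS and the LEFT JOIN form typically plan as an anti-join;
    -- NOT IN returns no rows at all if the subquery produces any NULLs.
    SELECT c.id
    FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

    SELECT c.id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.customer_id IS NULL;

    SELECT c.id
    FROM customers c
    WHERE c.id NOT IN (SELECT o.customer_id FROM orders o);

Running EXPLAIN or EXPLAIN ANALYZE on each form against realistically sized tables, per the advice above, shows which plan the planner actually chooses.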
[00:07:34] The next post is "Full Text Search in Milliseconds with Rails and PostgreSQL". Now, this is talking about the Ruby on Rails framework, but most of the post covers PostgreSQL and full text search. They give some comparisons of using LIKE and ILIKE versus the trigram extension for similarity searches, and then full text search, and they show how you can use the tsvector and tsquery types to do a full text search, even including ranking.

Once they've covered some of that, they then use full text search within the Rails application framework. Here they're using a library that makes it easier to work with; the library is a Ruby gem called pg_search. They show how to put it into your application, how you can do a basic search and what query actually runs to give you the results of that search, and then how you can configure it to define what columns you're searching against, what dictionary you're using, and even apply weighting.

Now, in terms of optimizing it, and they're using Postgres 12 to do this, they use generated columns. They show a database migration where they add a GENERATED ALWAYS AS ... STORED column, where they set weights: they convert the title to a tsvector using the English dictionary and weight it as A, and weight the description as B. So they're doing as much work as possible within the generated column without having to resort to triggers. Then, of course, they apply a GIN index to it and adjust the application configuration to look at that particular column. After doing that optimization, a particular query that took 130 milliseconds now takes 7 milliseconds. I'd be interested to know what difference the generated column made versus the GIN index, but it's still quite a significant speed boost. So if you're interested in learning more about full text search, this is definitely a blog post to check out.
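To make the generated column approach a little more concrete, here is a minimal sketch in plain SQL of roughly that technique on PostgreSQL 12 or later; the articles table and its columns are my own illustration, not the schema from the post:

    -- A stored generated column that weights the title (A) above the description (B)
    ALTER TABLE articles
      ADD COLUMN searchable tsvector
      GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(description, '')), 'B')
      ) STORED;

    -- A GIN index so full text searches don't have to scan the whole table
    CREATE INDEX articles_searchable_idx ON articles USING gin (searchable);

    -- Query and rank against the precomputed column
    SELECT id, title,
           ts_rank(searchable, websearch_to_tsquery('english', 'postgres performance')) AS rank
    FROM articles
    WHERE searchable @@ websearch_to_tsquery('english', 'postgres performance')
    ORDER BY rank DESC;

Because the tsvector is computed when the row is written, the query only has to probe the GIN index rather than building tsvectors on the fly.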
The next piece of content is actually a YouTube video: "Webinar: Security and Compliance with PostgreSQL" by Boriss Mejías, I believe. My apologies if I'm pronouncing that incorrectly. This is from the 2ndQuadrant PostgreSQL YouTube channel. The webinar is about an hour in length and covers all sorts of different aspects of setting up the security of your PostgreSQL installation, as well as compliance, namely PCI DSS compliance. So if you have an interest in that topic, definitely check out this webinar on YouTube.

[00:10:16] The next piece of content is "Waiting for PostgreSQL 13 – Add logical replication support to replicate into partitioned tables", and this is from depesz.com. They're talking about how you can now do logical replication into a partitioned table, which is particularly interesting. This is another feature added to the capabilities of logical replication, so definitely an interesting new addition.

[00:10:45] The next piece of content is "The Origin in PostgreSQL Logical Decoding". This is talking about logical decoding, basically the decoding of the WAL files by some other application: it reads the WAL files and decodes logically what changes have been made, for, say, a change data capture solution. They're talking about the origin, which defines where the data came from, that is, from what origin it was logically replicated, and they go through a bit of the internals describing how it works. So if you're interested in that, check out this blog post from highgo.ca.

[00:11:22] The next post, also from highgo.ca, is "Replicate multiple PostgreSQL servers to a single MongoDB server using logical decoding output plugin". So that's definitely a specific use case, but if you're interested in that, again dealing with logical decoding, check out this blog post from highgo.ca.

[00:11:43] The next post is "Intersecting GPS tracks to identify infected individuals". Now, this is with regard to the sickness that is happening, and using GPS tracking they show a solution to essentially do contact tracing, it looks like. They set up data structures within PostgreSQL, set up sample tracks via QGIS, segment the sample tracks to retrieve individual track points, and then do the intersection with an infected individual's track to retrieve possible contacts. So if you're interested in this type of content, definitely check out this post from cybertec-postgresql.com.

[00:12:22] The next post is that the PostgreSQL Person of the Week is Ilaria Battiston. Please forgive me for that pronunciation, but if you want to learn more about Ilaria and her work in and contributions to PostgreSQL, definitely check out this blog post.

[00:12:39] The last two pieces of content are from depesz.com, and they cover things we've mentioned in previous episodes about features that are coming in version 13. The first post is "Allow autovacuum to log WAL usage statistics", which is definitely advantageous. The second is "Add the option to report WAL usage in EXPLAIN and auto_explain". So if you're interested in those topics, definitely check out those blog posts.

[00:13:09] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 36

October 29, 2018 00:10:51

Sharding, Giving Back, SCRAM, Partitioning Evolution | Scaling Postgres 36

In this episode of Scaling Postgres, we review articles covering PG11 sharding, giving back, SCRAM authentication and the evolution of partitioning. To get the...

Episode 164

May 09, 2021 00:17:29

Fast Random Rows, Fast Distinct Queries, Temp & Unlogged Tables, Terminating Connections | Scaling Postgres 164

In this episode of Scaling Postgres, we discuss selecting random rows quickly, faster distinct queries, how replication handles temp and unlogged tables and how...

Episode 257

March 19, 2023 00:19:16

Adopting PgCat, Time Bins, work_mem Settings, Bad Constraints | Scaling Postgres 257

In this episode of Scaling Postgres, we discuss Instacart adopting PgCat, binning or bucketing your data by time, the best settings for work_mem and...
