Postgres 15 Released, File Systems, Connection Pooling, Secure Connections

[00:00:00] In this episode of Scaling Postgres, we talk about postgres 15 released file systems, connection pooling and secure connections. I'm Kristen Jameson, and this is scaling postgres episode 237. [00:00:19] One. [00:00:23] Alright, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is PostgreSQL 15 released. This is from Postgresql.org and they've released postgres 15. They mentioned some of the main improvements made down below. They talk about improved sort performance with some benchmarks showing speed ups of 25% to 400% based upon the data type sorted. Improved performance benefits with window functions with things like row number, rank, dense rank and count and select distinct can now be executed in parallel. They talked about a number of foreign data wrapper improvements as well as the new encryption options for all files as well as PG based backup. Now on the development side, they talk about the new merge command, new regular expression functions and creating views that query data using the permissions of the caller as opposed to the view creator, and also a host of logical replication improvements. So definitely check out this blog post or any of the other content that we've been covering on Scaling Postgres. If you want to learn more about all the new features in postgres 15, next piece of content is be ready. Public schema changes in Postgres 15 this is from Crunchydata.com and they're talking about where a general user can no longer create objects in the public schema because it was deemed too much of a security risk. So if you have a general user and you just go to create a table, you're going to get a permission denied for schema public. So this makes it not as easy to get started with Postgres, but it's relatively easy to get started with a general user. If you want to do that, you could always use an admin user to create new objects. Or you could do it the way he describes here, where essentially you've created your user, he creates a user here. You create a schema for that user and authorize that user to use it. That means it becomes the owner of that schema essentially. And then when you create tables, it gets created in that schema. But you don't have to specify the schema beforehand because it's already part of the default search path. And he also shows you how you could share that data with other users. So basically grant usage on that schema for that user to another user and then grant selects on tables. Or maybe you want to grant inserts or other types of privileges for other users to the objects in that schema. But he says the easiest way to do this is using roles. So you create a role, grant the privileges to that role and then you assign users to that role. So definitely check out this blog post if you want to learn more about that next piece of content. A look at Postgres 15 merge command with examples. This is from Crunchydata.com and he's talking about the new merge command and it basically allows you to take two data sets and merge together the changes. And he says one primary use case for this is if you're collecting a bunch of sensor data in batches. He uses the example where you place that sensor data in a temporary table and then you merge it in with the main permanent table. So he shows you exactly how to do that for this particular use case. So if you want to learn more about that, definitely check out this blog post next piece of content postgres versus file Systems a performance comparison. This is from Enterprisedb.com. Now, if Postgres 15 wasn't just a release, this would probably be the first piece of content because this is a great piece of content, doing a lot of analysis using two different types of systems, hardware systems, that is, to determine what difference file system storage makes with postgres. So for example, a lot of people are just using Ext Four if you're using a Linux system. But ZFS is also becoming more popular now and Butterfs has been around for a while. So basically they used all these different operating systems and then used PG bench to run different metrics against them to see what difference the file system makes with postgres. Because Postgres, unlike database systems like Oracle, doesn't write directly to the hardware. Postgres relies on the operating system's file systems to write to the disk. Now, this is a super comprehensive post with a lot of graphs, as you see here when I'm scaling through it. But overall, after looking at all this, the conclusion I came to is that Ext Four is still the most performant file system you can use with postgres. But they did have problems with Butterfs, so you probably don't want to use that. ZFS was interesting. It is getting close to the performance of Ext Four and because you can turn off full page writes with it, you can actually have a much more consistent performance level of your disk compared to the checkpoints that are pretty visible when using Ext Four. But ZFS still was not quite up to the performance of the Ext Four file system. In some use cases they were showing it was half as fast when doing select queries of a small data set, that's not great. But in a lot of the other performance areas it was much closer to Ext Four. Now, I do have some experience with a client who actually chose ZFS as the underlying file system with Postgres, but they did that because they wanted the compression that ZFS offered. And of course, when you're adding encryption on top of it, you're adding even more overhead to it, potentially impacting performance. But this was a great analysis and I definitely recommend you checking this out to help you make the decision about what file system you want to use under Postgres. [00:05:56] Next piece of content PG Bouncer types of PostgreSQL connection pooling. This is from Cyber Hyphen Postgresql.com, and the first thing they're talking about is why would you want to use it? Well, basically whenever you create a new connection in postgres, it forks a process and that has a lot of overhead associated with it. So what a lot of people do is implement a connection pooler such as PG Bouncer because they can very quickly create connections. It doesn't use a process based model to do that and it maintains back end connections to postgres. So you can think of it kind of like a proxy. And he shows an example where he has a query where he's simply doing select one. And when he processes it using PG bench over 10 seconds and ten concurrent transactions, he gets 295,000 transactions per second. Now this is without pooling at all. It's essentially creating ten connections and doing the performance run. But here he says, okay, create a new connection for each query. Now upon doing that, the transactions per second goes from 295,000 down to 3000 or 3.7 thousand. So a 98.7% drop in performance. So it's huge. That overhead of creating new connections for each query, essentially. Now you're not always going to do that, but it just demonstrates how important not having to create a connection every time you need to use the database is and the argument for using a connection pooler like PG Bouncer. Then he goes on to talk about the different types of pooling that PG Bouncer can do. One is session pooling where essentially each connection gets its own session. So it's essentially a one to one. You connect to PG Bouncer, you connect to Postgres. It's simply an intermediary. Then there's transaction pooling and basically PG Bouncer grants client connections coming in per transaction. So basically this potentially lets you have more clients than you have postgres connections because you can have a many to one relationship. Because if there's any latency going on or if there's nothing actively happening with a given connection, PG Bouncer can give it to another connection that's coming in. So it helps you utilize more of your resources and you can have more connections on the client end. And I've seen as high as five to ten x the number of client connections to postgres connections. So you could, for example, have 500 or 1000 client connections going through 100 postgres connections. And then lastly is statement pooling where basically each statement coming in could go to a different connection on postgres. Now I haven't seen statement pooling used that much. I have done some session pooling, but for performance reasons, transaction pooling is the usual go to if you want pretty good performance. It's not as much as statement pooling, but you can run into issues having transactions being spread across statements, for example. But if you want to learn more about that, definitely check out this blog post. Next Piece of Content don't make databases available on the public Internet this is from Tailscale.com, and this is in reference to the post that we discussed last week where Bit IO was discussing how many postgres instances are open to the Internet and not asking for SSL encryption. So one of the things is just don't make it available on the Internet as the title suggests, but their solution to that is to actually use a proxy system. So for example, this is a VPN solution, Tailscale, and if you're in the VPN's network, you don't have to worry as much about configuring certificate SSL and using encrypted connections if you're going through presumably their VPN network. So basically you don't have to be as stringent with the postgres client. But this new open source software they released, PG Proxy, it handles the strict TLS and encrypted connections to the destination postgres server that's being hosted somewhere. So they've released this proxy essentially as open source software. Now, interestingly, you could probably do the exact same thing with PG Bouncer like we just discussed. So you could, for example, set the client connections to not require SSL and not use certificate authentication if it's a secure internal network. But then the server connections that PG Bouncer makes, you can specify it to require SSL and to use verify full to avoid man in the middle attacks. Another solution is just using any kind of VPN to talk to the ultimate server that you're working with. This could be using things like IPsec as well. So there are scenarios where you potentially could have postgres open on the Internet, but you need to be very careful with your security if you're going to do that. Now, another solution addresses this next piece of content. It's. VPC peering From Zero To Hero. This is from Timescale.com, and they're talking about a feature of AWS called VPC Peering, where you can actually have two accounts set up so that they communicate with each other as peers and virtual private networks essentially. And they talk about how to set that up with their Timescale cloud service. But presumably other providers offer this as well. So if you have your other resources in an Amazon VPC, maybe you could use another hosted provider's, VPC and Peering to be able to securely connect them without going over the Internet in an unsecure fashion. [00:11:20] Next Piece of Content PgSQL Friday wrap up and Review this is from Softwareandboos.com, and he's reviewing the community blogging that happened, and he talks about each of the blog posts that were done. These should have all been covered in last week's Scaling Postgres episode, so feel free to check that out as well if you'd like. [00:11:40] Next Piece of Content six Findings We Rely on when Managing PostgreSQL Indexes this is from Noct app and they're talking about how their engineering team thinks about and uses indexes. Now this is a more basic post, but in the interest of time I'd actually like to recommend watching Lucas's take on it because this is the episode he covered this week in his five minutes of postgres episode 39, Postgres 15 release and six tips for managing indexes. And this can be [email protected] next piece of content. Don't forget to tune Stats Collector for PostgreSQL 14 and older. This is from Procona.com and he's talking about with postgres 15 they move the Stats Collector from essentially writing files to disk to being stored in memory. But every version from 14 and older is essentially writing to disk. And there's actually a recommendation if you want the most performance out of your system and basically avoiding statistics impacting disk writes, you can actually use a Ram based file system for it. And two recommendations they have is for Ramfs or tempfs and ultimately it looks like they recommend Temp FS and they walk through how to set that up for your postgres installation. So definitely check out this blog post if you want to learn more about that next piece of content partitioning in Postgres 2022 edition. This is from Brandyear.org and he's talking about how far partitioning has come and basically he references years ago when before declarative partitioning came out, a partitioning solution and all the work that had to go into it was overwhelming. But with all the improvements that have been made in the versions over the years, he says it was quite easy to convert a large table into a partition table and he goes through the process of doing that here. But he does talk about some drawbacks. One is you can't concurrently create an index, which is a big issue I wish they would resolve at some point soon. But there is a way to get around it when you're adding indexes is that you can create a concurrently on the child and then at the end just create the normal index on the parent and it should see that the index has already been created on the children and it'll just create it relatively quickly. And he does say there is an outstanding patch for this and he hopes it's going to get in postgres 16. [00:14:03] The other drawback is not being able to support unique indexes across the whole table. So you can create unique indexes within a partition, but it doesn't operate on the whole parent table essentially. Now with regard to that, the next post global index, a different approach addresses exactly that. Point is when you're working with partition tables, again, you can't have a unique index across all of those tables and this is a bit of an issue and they've discussed different approaches and he says hey, this may be a different approach. So there's work definitely being done by Heigo CA here on trying to figure out how to do a global index for partition tables. So if you want to learn more about that, definitely check out this blog post. [00:14:46] Next piece of content. Five years of postgres on Kubernetes. This is from TheNewStack IO and this is from, I believe, a founder of Crunchy Data. And Crunchy Data I think made one of the first, if not the first postgres operator to use in Kubernetes. And he talks about the five year experience of doing it and all the changes that have been made, especially with the most recent version. So if you are looking to use Kubernetes with Postgres, definitely recommend checking out this blog. Post second blog post related to that is run PostgreSQL in Kubernetes solution Pros and Cons. This is from Procona.com and this is essentially a review of five different operators for Postgres that use Kubernetes. They review the Crunchy Data PostgreSQL operator, the cloud native PG operator from Enterprise DB, the Stackres operator from Postgres, the Zalando postgres operator as well as the Procona operator for PostgreSQL. So as you can tell here, there may be a little bit of a bias in this review because it's from Procona.com, but nevertheless, definitely recommend looking this over if you're looking to start using Postgres with Kubernetes. [00:15:58] Next piece of content postgres full text search versus the rest. This is from Supabase.com and in this post they compared several different search engines to postgres's full text search. They compared Miele search OpenSearch, which is in the open source variant of Elastasearch. I believe SQLite, FTS and typesense. Now he goes through the whole process of how they did the measurements, but I'm looking down here for the ultimate results and as you can tell, Postgres is in green and you can see it holds its own with essentially all the other search engines. Not quite sure what the issue is with Open Search here in terms of the latency of what's returned, but they show the number of results that were returned for given queries the average latency for them. You can see Postgres sometimes gets up there in terms of average latency, but overall pretty good showing compared to these other dedicated search engines except for potentially the SQLite solution. So it's pretty impressive. Now, what they did say is that some of the other search engines handled Misspellings much better. So there were some details in the actual results of the search where they excelled. So it wasn't just a pure speed decision. But of course we should mention that Subabase does run, I believe on Open Source Postgres. So there may be a bit of bias in this comparison, but nevertheless, interesting set of results if you would like to check them out. [00:17:23] Next Piece of Content how to set up streaming replication to keep your PostgreSQL database performant and up to date this is from Enterprisedb.com and this post does exactly that. It walks through setting up streaming replication including Asynchronous, as well as synchronous replication. So if you're interested in that, you can check out this blog post. Next Piece of Content there was another episode of Postgres FM this week. This one was on PostgreSQL 15 and it's new release. So if you want to learn more about that, definitely check out their show or click here to watch the YouTube video. [00:17:55] Next piece of content. The PostgreSQL person of the week is Christoph Berg. If you want to learn more about Christophe and his contributions to Postgres, definitely check out this blog post and last piece of Content we did have another episode of the Rubber Duck Dev show this past Wednesday evening. This one was on live streaming tools and toys with Aaron Francis. So if you want to learn more about the software and hardware that are used to do streaming for example, live coding streaming or just recording coding definitely welcome you to check out our show. [00:18:28] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingposgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks.

Show Notes

Episode Transcript

Other Episodes

Episode 290

Postgres Releases = Reindexing | Scaling Postgres 290

Episode 140

Replication Conflicts, Postgres Governance, pg_crash, MobilityDB | Scaling Postgres 140

Episode 184

Postgres 14 RC1, Little Things, Index Advisor, Million Inserts | Scaling Postgres 184

Postgres 15 Released, File Systems, Connection Pooling, Secure Connections | Scaling Postgres 237