Per-Table Autovacuum, FDW Synchronization, Distinct On

[00:00:00] In this episode of Scaling Postgres, we talk about per table auto vacuum, foreign data wrapper, data synchronization, distinct on and postgres open. I'm creston, Jameson. And this is scaling postgres episode 33. [00:00:18] Point. [00:00:22] All right, our first article is per table auto vacuum tuning. And this is from the Keith's Ramblings blog at keithf four. And basically this blog post covers auto vacuum configuration and he has some suggested changes for how he likes to set up auto vacuum, some different scale factor and threshold sizes he likes to adopt for the whole database system. But this post also goes into what you can do per table because as your tables get extraordinarily large, you may want to consider setting auto vacuum settings per table and potentially getting rid of the scale factor and only going with a threshold. So for example, he talks about using PGSTAT all tables in order to look at the statistics with regard to when a table's been vacuum and analyzed. And then for a particular table, he shows you how you can update per Table auto vacuum settings for it. So as your database gets larger and even more specifically certain tables, this is a great post to check out to see if you want to potentially make changes on a per table basis to your auto vacuum settings. [00:01:28] The next post is foreign data wrappers for data synchronization. Now, normally when you think of synchronization, or at least when I think about it, you think of you're setting up a replica so using physical replication to replicate your database to a readonly copy or potentially even doing logical replication where you're defining certain tables you want to replicate from one database system to another. However, this post considers using data wrappers in conjunction with materialized views to do it. So he goes over installing the foreign data wrapper for this purpose, which is basically creating the extension postgres foreign data wrapper, setting up or creating the remote server and mapping a particular user to be able to access that foreign server and then create the foreign tables so that you can reference them. And he shows the command here to use, for example, import foreign schema and then you can limit it to particular tables. So that pretty much sets up your foreign data wrapper. And then he went ahead and set up a materialized view for these foreign tables and he says, quote, whenever the data needs to be fresh, just refresh materialized view in the view name. Or you could also choose to do this concurrently if you'd like. So this is a little bit different way of setting up synchronization, kind of like on demand. So you may have a use case for this. [00:02:48] The next post is select distinct on in PostgreSQL and this is from the Geekytidbits.com blog. Now, this is a query that you will see a lot, or at least I've used it a lot. Basically you want to get the most recent record from a table and get more than just the date from it. So in a sub query or in a join table, you pull out an ID or the data that you're working for and the max timestamp, and you pull all that data for the most recent timestamp. Essentially. However, this is a perfect use case for distinct on. So instead of having to do code like this using Join or sub queries, you can simply do select distinct on. You're grouping the data by what you specify in the on clause. The order of the statement places in the proper order with the most recent timestamp first, and then after the select distinct on, you specify the fields or the columns you want for each row returned. So essentially you do the same thing, but with a much cleaner syntax. So this is a very quick and easy post, and if you're not using select distinct on or familiar with it, definitely a post to check out. [00:03:56] The next piece of content is actually a YouTube channel and it's the Postgres Open SV 2018 channel, and I mention it because it has five new videos that were posted this week. Now unfortunately, I haven't had time to go through all these videos, but they cover topics like Debugging, the Postgres Planner, a Scalable version, document store in PostgreSQL, the Evolution of Postgres, High Availability Dude, Where's My Byte? And Bloat in PostgreSQL. A taxonomy. So if you're interested in learning about any of these topics, definitely a piece of content and YouTube channel to check out. [00:04:31] The next post is new in postgres eleven, monitoring JIT performance, auto prewarm and stored procedures. And this is from the Pganalyze.com blog. So last week I went over a lot of content talking about the new features in postgres Eleven. So some of this is a repeat, but actually some of his JIT measurements were pretty interesting. But he covers the JIT Endtime compilation in postgres Eleven, the auto prewarm feature that's coming as well as store procedures in postgres Eleven. So if you're interested in learning more about these specific features, definitely a blog post to check out. [00:05:08] The next post is webinar. New features in PostgreSQL eleven. Follow up. And this is from the second quadrant.com blog. And this is about an hour and five minute presentation that goes over all the features in PostgreSQL Eleven. Now, he did mention they're projecting the release date for PostgreSQL Eleven coming around October 18, which is about two weeks away, so that's definitely some interesting news. So if you prefer a video format defining out about the new features in PostgreSQL Eleven, this is definitely a webinar to check out. You do have to register for it, but it's on demand so you immediately get access to it. Although I didn't notice anything that wasn't in the other articles that mentioned the new features coming in the previous episode of Scaling Postgres. So if you've already read over these features, this presentation may be a bit redundant, but if you prefer a video format, he definitely goes over a presentation that shows all the new features that are coming in PostgreSQL eleven. [00:06:08] The next post is scaling PostgreSQL using connection poolers and load balancers balancers for an enterprise grade environment. And this is from the Procona.com blog. So they've had a number of presentations that they've covered for what they called enterprise environments in terms of handling security, high availability, backups and now this one is on scaling with connection poolers and load balancers. So these posts are mostly a summary and in terms of connection pooling, they're basically talking about setting up Pgbouncer and in terms of scaling across multiple servers, they're talking about using HAProxy for that purpose. And they go into the configuration not in detail but some of the main points about how to configure PG bouncer on your application server, to talk through HAProxy, to Direct, some statements to Replicas and some statements to your primary database. So it's relatively brief posts, but if connection pooling and high availability are of interest to you, this is definitely a blog post to check out. [00:07:12] The next post is Moving Table Spaces and this is from the Momgen US blog and basically he goes over the process of how you could potentially move a table space from one area to another and he has a very quick reference for it right here. First record the OID of the table space you want to move. Shut down the postgres cluster, move the table space directly either within the same file system or to a different file system. Update the OID symbolic link that represents the move table space to the new table space directory location and then restart the server. So basically you shut down the cluster, move the data over and then move the pointer so that postgres can find where that new location is. So if you've ever had the need to do this, this is definitely a piece of content to check out. [00:08:01] The last set of postgres are all about postgres and all from the Paul Ramsey blog at Cleverelifent CA. The first post is five times faster spatial. Join with this one weird trick and basically he did it by not compressing the data using uncompressed data. And in his conclusion he says for a 50% storage penalty we achieved a 500% performance improvement. Now I'm not that familiar with postgres, but if you use it, this might be a blog post you want to check out. The next two posts also from the Paul Ramsay blog are talking about the PostGIS Code Sprints. First one is the code Sprint number one and Code Sprint number two. So again if you're using PostGIS, definitely two pieces of content to check out. [00:08:48] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or itunes. Thanks.

Show Notes

Episode Transcript

Other Episodes

Episode 318

Postgres 17 Beta 1 Released | Scaling Postgres 318

Episode 189

Index Downsides, TCP Keep Alive, Development with Postgres, Learning PL/pgSQL | Scaling Postgres 189

Episode 377

Sharding At Network Speeds | Scaling Postgres 377

Per-Table Autovacuum, FDW Synchronization, Distinct On | Scaling Postgres 33