State Of Postgres 2023 | Scaling Postgres 296

Scaling Postgres

Episode 296 | December 24, 2023 | 00:17:45

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss the State of Postgres 2023 survey, a repository of Postgres how-tos, ways foreign keys can break and a custom SQL playground.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/296-state-of-postgres-2023/

 


Episode Transcript

[00:00:00] So this episode is due to release on Christmas Eve, so let's check out the gifts the Postgres community gave to us in the content this week. As always, I hope you, your friends, family and coworkers continue to do well. I was thinking that since we're in the holiday season, there probably wasn't going to be a lot of content this week. Boy, was I wrong, there's a lot of linked content here. But as I typically do, this is the top ten content according to me. I usually cover the top-rated pieces first and then follow up with the rest, and it's not exactly ten; maybe it's eleven, maybe it's nine.

The first piece of content is "State of Postgres 2023", and this is from timescale.com. They have released the survey that they ran, I think, in August and September. I probably shouldn't have shown the preview, because a lot of this information is repeated, but we'll take a look at some of it. I did mention some of it last week. One thing that wasn't mentioned is why people chose Postgres. The number one reason is the open source license, which is far and away the most popular reason, and I believe this chart is broken down by how long people have been using Postgres. Next came the feature set, except for those who've recently adopted it, where it's not as popular a reason, which is kind of interesting. Next is its reliability, and what's interesting to me is that the older you are, the more you like the reliability. So lesson learned, everyone: [00:01:32] reliability is important. After that it's pretty much extensions, then performance, and then it falls off from there. So that's some pretty interesting information.

[00:01:45] The primary use case for Postgres is application development, which is not too surprising. For "What kind of industry are you in?", information technology, software, SaaS, and finance and fintech were at the top. There are some sections about the community, we already talked about the extensions, and there's one on the tools used to connect to PostgreSQL. Most often psql wins out, and that's the only thing I use; I don't use any of the others such as pgAdmin, DBeaver, DataGrip or IntelliJ. For "What other third-party tools are used?", most people said they don't use any third-party tools, or they use PgBouncer, although there are a few others like Depesz explain, pgBackRest, pganalyze, et cetera. They also looked at the different visualization tools being used, and then there's the section about AI that we discussed last week. So that's pretty much the survey; go ahead and check it out if you want to look at it in more detail.

[00:02:46] The next piece of content is actually a repository on GitLab, under postgres-ai / postgresql-consulting / postgres-howtos. This is from Nikolay, who is one of the hosts of Postgres FM and, I think, the founder of Postgres.AI. Apparently, according to the readme, he started a Twitter thread he calls a PostgreSQL marathon, where his goal is to create 365 daily posts, and he's currently on number 79. It's just how to do different things in PostgreSQL, seemingly from a consultant's perspective: one is how to troubleshoot and speed up Postgres restarts, another is how to troubleshoot startup, and there are a few on pg_stat_statements. I figured this would be a great reference to have, so I would definitely keep this link as he keeps it updated.
I don't know if he's eventually planning to turn this into a different format, but it's available here now, and if you click on an entry, it gives you more detail in a blog post format. So I definitely encourage you to check this out, or at least bookmark it for when you have problems and can investigate it then.

The next piece of content is "Broken foreign keys: how can that happen in PostgreSQL?" This is from cybertec-postgresql.com, and they're talking about ways foreign keys can become broken. Foreign keys actually use system triggers: a single foreign key constraint will create an AFTER INSERT and an AFTER UPDATE trigger on the referencing table, and an AFTER UPDATE and an AFTER DELETE trigger on the referenced table. This is to make sure no orphan records get created when data changes. Now, these system triggers, as he mentions here, are written in C and ignore the ordinary MVCC rules to avoid race conditions. He says that if you try to use user-defined triggers to do something similar to what foreign keys do, you're going to run into race conditions unless you use serializable isolation or implement some sort of locking. [00:04:50] But you can take a look at these foreign key system triggers in the pg_trigger catalog, and he shows the query to see what they look like when you set up a foreign key.

So one way foreign keys can get broken is, of course, through database corruption. Thankfully, data corruption tends to be a rare occurrence with Postgres. The second way is through session_replication_role. He says, quote, "when logical replication replays data modifications on the subscriber, the order of these changes might conflict with foreign key constraints on the subscriber." One way to get around that is to set session_replication_role to replica. Then triggers won't fire, including the triggers that enforce foreign keys. This happens on the subscriber, so it is possible to create orphan records that way, and you need to be cautious of that. The next way they can become broken is by simply disabling triggers with ALTER TABLE ... DISABLE TRIGGER ALL, which disables your own triggers as well as the triggers that maintain the foreign key constraints. So that's something to be aware of. And a final way you can break foreign keys is with cascades: you can define cascading deletes from the referenced table to the referencing table, but the problem is that triggers can also fire on that referencing table. If you set up a BEFORE trigger on the referencing table that cancels the operation, which you can do just by returning NULL, then, in the example he shows here, the referencing row doesn't get deleted and you end up with an orphan row. So those are some of the ways you can break Postgres's foreign key constraints, and you probably just want to review them and make sure you try not to do that.
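To make that trigger-cancellation case a bit more concrete, here is a minimal sketch of my own (the table, function and trigger names are made up, not taken from the article) showing how a BEFORE DELETE trigger that returns NULL on the referencing table can leave an orphan row behind an ON DELETE CASCADE, plus a query against pg_trigger to see the system triggers a foreign key creates:

  -- Hypothetical schema: a referenced table and a referencing table with ON DELETE CASCADE
  CREATE TABLE parent (id int PRIMARY KEY);
  CREATE TABLE child (
      id int PRIMARY KEY,
      parent_id int REFERENCES parent (id) ON DELETE CASCADE
  );
  INSERT INTO parent VALUES (1);
  INSERT INTO child VALUES (10, 1);

  -- The system triggers backing the foreign key show up in pg_trigger
  SELECT tgrelid::regclass AS table_name, tgname
  FROM pg_trigger
  WHERE tgisinternal
    AND tgrelid IN ('parent'::regclass, 'child'::regclass);

  -- A BEFORE row trigger that cancels the delete by returning NULL
  CREATE FUNCTION cancel_delete() RETURNS trigger AS $$
  BEGIN
      RETURN NULL;  -- skips the row-level operation
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER keep_children
      BEFORE DELETE ON child
      FOR EACH ROW EXECUTE FUNCTION cancel_delete();

  -- The cascade tries to delete the child row, the trigger cancels it, and
  -- child 10 is left referencing a parent that no longer exists
  DELETE FROM parent WHERE id = 1;
  SELECT * FROM child;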
The next piece of content is "The Postgres Playground BYOS: Bring Your Own SQL", and this is from crunchydata.com. The Postgres Playground is something Crunchy Data set up in their developer area that lets you run an instance of Postgres in your web browser, and they're saying, hey, you can bring your own SQL files that create tables and data within that instance, because it's all just running in your browser. They have an example here where they create an example table and insert two values into it, and they say that because it's its own instance, you can even set all sorts of parameters as part of the SQL you send to it. Basically, their recommendation on how to set this up is to create a gist on GitHub that contains the SQL you want to run, and then you build your playground URL, passing it an SQL parameter that links directly to your gist. He has an example right here, and when you run it, it loads up their playground, Postgres in the browser runs your SQL file, and the schema or data you've prepped is all loaded and ready to go. So this is pretty cool; go ahead and check it out if you want to learn more.

[00:07:55] The next piece of content is "Zero downtime Postgres upgrades and how to logically replicate very large tables". This is from pganalyze.com, and it covers the post we discussed last week, I think as the second piece of content, where the Knock app was logically replicating from Postgres 11 to 15, and they used logical replication to do the upgrade. Lukas covers that in this episode of Five Minutes of Postgres, and he also includes a number of references to other ways you could replicate that data. So if you plan on doing any type of logical replication upgrade, I definitely recommend checking out his piece of content as well. Secondly, I included the post I mentioned last week but didn't actually show, which describes, as it says, logical replication for some tables based upon a dump. This is a different technique from the ones mentioned in Five Minutes of Postgres: you start a psql session using a replication connection to the database, with the syntax shown here, and you create a replication slot within that session. You keep that session active, and then you can take a dump using the snapshot name given in the output, which basically holds the position of that slot while the pg_dump is going on. Once you've copied the data over, you use this command to create the subscription, specifying the slot name, telling it not to create the slot, and making sure enabled is false and copy_data is false, because you've already copied the data over with pg_dump in this example. Then you alter the subscription to enable it. As for how this can help with large tables, I've actually used this technique in parallel. Instead of using pg_dump I think I used COPY, but you could probably use pg_dump as well. For a single table I parallelized it across eight or ten sessions or processes to the database, and that gave me the maximum amount of throughput on the system to move the data over as fast as possible. So this is another technique you'll want to review in case you're considering doing this type of logical replication upgrade.
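As a rough sketch of what that slot-plus-snapshot sequence can look like (the connection details, slot, publication and table names here are placeholders of mine, not taken from the post):

  -- On the publisher, open psql with a replication connection, for example:
  --   psql "dbname=mydb replication=database"
  -- and create a logical slot; note the exported snapshot name in the output.
  CREATE_REPLICATION_SLOT my_slot LOGICAL pgoutput;

  -- Keep that session open, then dump the large table using that snapshot:
  --   pg_dump -d mydb --snapshot=<snapshot_name> --table=big_table | psql <subscriber>

  -- On the subscriber, create the subscription against the existing slot,
  -- disabled and without copying data (the dump already did that):
  CREATE SUBSCRIPTION my_sub
      CONNECTION 'host=publisher dbname=mydb'
      PUBLICATION my_pub
      WITH (slot_name = 'my_slot', create_slot = false,
            enabled = false, copy_data = false);

  -- Once the dump has been restored, start streaming changes from the slot:
  ALTER SUBSCRIPTION my_sub ENABLE;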
[00:10:11] Next, there was another episode of Postgres FM last week, this one on hash indexes. They talked a little bit about their history and their pros and cons, and basically the pros are not that high and the cons are pretty high. There may be specific use cases; I think the one they mentioned was when you have a really large text field, because the index doesn't store the text itself, it only stores the four-byte hash of whatever value you're indexing. B-trees, for example, have finite limits on how much you can store in an index entry, so if you have huge text values and only need to test equality, a hash is a great way to do that: you just hash the value, and that hash key is what's stored in the hash index. That smaller size may give some advantages on lookups and queries, perhaps, but the big downside is that hashes only work on equality. You can't do range lookups, greater than or less than doesn't work, you can't do any ordering like you can with a B-tree index, and you can't do index-only scans because there's no value stored with it. An interesting thing I thought about during the episode is that Nikolay mentioned how he created his own hash-like index by using a B-tree index over a hash function, basically indexing a hash. Well, you could make that a covering index and include certain payload columns along with it, which could give you a hash-like index with the potential for an index-only scan, depending on the type of data you want to bring along. But this is another interesting episode you can check out if you're interested.
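As a loose sketch of that idea (the table, column and index names are mine, and I'm using md5() simply as an example hash function, not necessarily what was discussed in the episode):

  -- Hypothetical table with a long text column that is only ever tested for equality
  CREATE TABLE documents (
      id bigint PRIMARY KEY,
      body text,
      status text
  );

  -- A plain hash index: small keys, equality-only, no index-only scans
  CREATE INDEX documents_body_hash ON documents USING hash (body);

  -- The B-tree alternative: index a hash of the value and include a payload column,
  -- so an equality lookup could potentially be served without reading the long text
  CREATE INDEX documents_body_md5 ON documents (md5(body)) INCLUDE (status);

  -- Queries filter on the same expression; the extra check on body guards against
  -- hash collisions, but it needs the heap row (or body included in the index)
  SELECT status
  FROM documents
  WHERE md5(body) = md5('some long text value')
    AND body = 'some long text value';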
[00:11:53] The next piece of content is "What should you know about PostgreSQL minor upgrades?" This is from cybertec-postgresql.com. When you're looking at a Postgres version, the number before the dot is the major version and the number after it is the minor version. Minor versions only contain bug fixes, so there should not be any new features that change something unexpectedly, and minor upgrades are always binary compatible, meaning you should just be able to install the new software, restart Postgres, and everything works with the existing data. There are no data changes needed, although there's a caveat to that in a second. He also says minor upgrades shouldn't introduce any bugs, although he notes that has happened. I would say it's rare to get new bugs with them, and the general recommendation is to install these as soon as they come out, because usually the benefits outweigh any disadvantages. If you have any concerns, maybe wait a week or two to see if anyone is reporting issues before you go ahead and do the upgrade. Now, how do you know a new release has happened? Well, of course I announce these on Scaling Postgres, but there is a PostgreSQL release roadmap linked here that tells you when they're projecting to do each of the point releases. The unplanned ones usually only happen if there's a really severe bug, probably data corruption, or some big security issue; otherwise they tend to follow the plan. They also say you can get alerted by subscribing to the pgsql-announce mailing list. Now, back to what I was saying about there being no real changes needed to the data: you do have to look at the release notes, because not infrequently there are specific things you have to do. Reindexing certain types of indexes is probably the number one thing I see. So you need to follow the recommendations, check whether you're using those specific index types, and if so, plan a reindexing operation after the minor version upgrade. But if you want to learn more, check out this blog post.

The next piece of content is "Global search inside a database". This is from postgresql.verite.pro, and it covers the situation where you know you have a value, but you don't know which tables or columns that value may be stored in. So this blog post creates a database-wide search in the form of a function: you can do a global search with a particular search term and then narrow which parts of the database you want to focus on through different parameters, and it basically searches all the columns and rows to find matches. [00:14:43] It goes into a lot of detail on the different parameters, but here are the 85 lines of source code that do it. It looks through the different schemas to search the data, and there are some examples down here where you can see what a search for "foo" returns: it says, okay, in this schema, for this table name and this column, this column value was discovered, and here's the row ctid. [00:15:10] So if you have a need for something like this, definitely check out this blog post.

[00:15:15] The next piece of content is "Highlights from podcast episode about Postgres monitoring with Lukas Fittl and Rob Treat". This is from citusdata.com, and it's basically the text version of a podcast episode, but I'm sure you can get the link to the podcast if you want to learn more about this topic. [00:15:33] Next is "Extracting SQL from WAL", and this is from Archjuju GitHub IO. This is the second part, where he needed to extract data from a WAL stream, but it wasn't set up for logical replication, so it didn't have what you would normally expect to see. He goes through the process of analyzing and pulling out the data he needed, so you can check out this blog post if you want to learn more.

[00:15:59] And the last set of posts are GIS-related. The first one is "Data science is getting ducky", from blog.cleverelephant.ca. He's talking about how he just got a new M2 Pro Mac, where a GEOS compile that used to take 20 minutes went down to 45 seconds, which is impressive. As a result, he believes you could use more in-process databases on your own system to do GIS tasks, and he specifically mentions DuckDB here. So it's basically a thought experiment about this and the evolution of column-oriented and hybrid storage layouts for storing data and working with Parquet files. If you're interested in that, you can check out this blog post. The next one is "Working with GPS data in Postgres", from rustprooflabs.com, and they show you how you can get this set up in PostGIS. [00:16:56] And finally, there's "PostGIS clustering with DBSCAN" from crunchydata.com, where they discuss how to work with clustered data using these different functions in Postgres and PostGIS. So check this content out if you'd like to learn more.

[00:17:12] And before I sign off, there will be no episode next week because I am taking some holiday time off. I hope you have a great holiday, and new episodes will be coming in the new year. Other than that, I hope you enjoyed this episode. Be sure to check out scalingpostgres.com where you can find links for all the content discussed, as well as a podcast version of the show and a full transcript. Thanks, and I'll see you in 2024.

Other Episodes

Episode 214

May 08, 2022 00:19:29

Backup Compression, Postgres IO, Parquet Files, pg_stat_monitor | Scaling Postgres 214

In this episode of Scaling Postgres, we discuss parallel server-side backup compression, IO in Postgres, parquet files and the new pg_stat_monitor extension. To get...

Episode 198

January 16, 2022 00:14:46

Monitoring Progress, More SQL, Replication Slot Failover, Postgres Contributors | Scaling Postgres 198

In this episode of Scaling Postgres, we discuss how to monitor DML & DDL progress, using more SQL, one way to handle replication slot...

Episode 189

November 01, 2021 00:15:26

Index Downsides, TCP Keep Alive, Development with Postgres, Learning PL/pgSQL | Scaling Postgres 189

In this episode of Scaling Postgres, we discuss the downsides of indexes, TCP keep alive options, developing with Postgres as your DB and learning...
