Postgres is Fine, Generated Columns, Postgres Inserts, Synchronous Replication | Scaling Postgres 122

Scaling Postgres

Episode 122 | July 13, 2020 | 00:10:14

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss determining if Postgres is fine, generated columns vs. triggers, insert features and synchronous replication usage.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/122-postgres-is-fine-generated-columns-postgres-inserts-synchronous-replication/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about Postgres is fine, generated columns, Postgres inserts and synchronous replication. I'm Creston Jamison, and this is Scaling Postgres episode 122.

[00:00:20] Alright, I hope you, your family and coworkers continue to do well in these times. Our first piece of content is "PG Phriday: It's the end of the world as we know it (and Postgres is fine)". This is from 2ndQuadrant.com, and it was a presentation given at the Chicago Postgres User Group that has been converted into a blog post. It's all about monitoring. He first covers why you should monitor, and of course this is good for determining why something broke or failed. However, it's also important for knowing what could break in the future. He then talks about some different areas you would want to monitor, such as transactions, sessions, status of replication, disk space, et cetera. And he mentions a few key areas that you definitely want to be monitoring to make sure that your Postgres installation is running optimally. The first is transaction ID wraparound, to ensure that it essentially does not happen on your tables. He gives you two different queries, one at the database level and one at the table level, to see where the transaction ID age stands. The next is making sure you don't have any prepared transactions that have been orphaned. Now, prepared transactions are not used very much, but if they are being used at all, you want to make sure none of them have been orphaned, because that could definitely cause a problem. Also monitor your number of idle transactions; you want to minimize these as much as possible. There's also the question of whether you have hot_standby_feedback on, because that could potentially cause WAL to build up. Then he talks about checking replication lag, looking at slots, and checking pg_stat_replication in general, as well as reviewing your temporary storage space, checkpoint activity, active transactions, and potentially your WAL throughput. So if you are looking for a good set of areas to monitor in your Postgres installation, definitely check out this blog post from 2ndQuadrant.com.

[00:02:21] The next post is a comparative approach between generated columns and triggers. This is from OnGres.com, and they're comparing the generated columns feature, new in Postgres 12, to using a trigger. A generated column enables you to run an immutable function against data in the table to generate a new column value, essentially. So in the table they've created here, they have a radius column named r, and the area is calculated by a generated column, using a function to determine the area based upon the radius. Now, they checked the performance of this: they show the system they used, and here are the benchmarks they looked at. Using generated columns was the fastest approach. A trigger written in C, as well as an immutable trigger in C, got very close to the performance of generated columns, but generated columns still came out on top. The slowest was the PL/pgSQL trigger, with the immutable PL/pgSQL trigger the second slowest. So in general, if you have an immutable function, you would want to use generated columns. If you have to use a trigger and need better performance, writing it in C will give you better performance than, say, PL/pgSQL. Note that this is for insert operations; the update operations weren't that different between each of these methods, which is pretty interesting.
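To make the feature concrete, a generated column along the lines described might look like this. This is just a minimal sketch of the Postgres 12 syntax; the table and column names are my own, not necessarily those from the post:

    -- Hypothetical table: area is computed from the radius r
    -- by an immutable expression (PostgreSQL 12+).
    CREATE TABLE circles (
        id   bigserial PRIMARY KEY,
        r    double precision NOT NULL,
        area double precision GENERATED ALWAYS AS (pi() * r * r) STORED
    );

    INSERT INTO circles (r) VALUES (2.5);
    SELECT r, area FROM circles;  -- area is filled in automatically

The generation expression must be immutable, which is why the comparison against immutable triggers is the interesting one.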
So if you want to learn more about these generated columns versus trigger benchmarks, definitely check out this post from OnGres.com.

[00:03:52] The next post is a walkthrough of the PostgreSQL INSERT statement. This is from Crunchydata.com. The first thing it covers is that if you need to insert multiple rows, you definitely want to do this as part of one statement. You can see here it has VALUES and then a comma between each row you want to insert. So if you have multiple rows, this is a much faster way to do inserts. The next thing they cover is that if you happen to need data back after an insert, you can use the RETURNING clause. For example, in this case you're returning the ID, so you know the ID of the row you just inserted. That can be beneficial if you then need to do work with that record. And the last thing they cover is the different ON CONFLICT options you can use. So if you have a conflict during the insert, you can do nothing, so just ignore that insert, such as when you're violating a primary key. Or you can do an update, essentially an upsert: it's going to try the insert, and if that fails, it's going to do the update. So this is just a very quick post showing some different features of the INSERT statement in Postgres.
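For illustration, a multi-row insert combined with RETURNING might look like this. This is a small sketch against a hypothetical table, not the exact example from the post:

    -- Hypothetical table for illustration.
    CREATE TABLE users (
        id    bigserial PRIMARY KEY,
        email text NOT NULL UNIQUE,
        name  text
    );

    -- One statement, multiple rows; RETURNING hands back the generated IDs.
    INSERT INTO users (email, name)
    VALUES ('a@example.com', 'Alice'),
           ('b@example.com', 'Bob')
    RETURNING id;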
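And the two ON CONFLICT behaviors, sketched against the same hypothetical table:

    -- Ignore the insert if the unique constraint on email would be violated.
    INSERT INTO users (email, name)
    VALUES ('a@example.com', 'Alice')
    ON CONFLICT (email) DO NOTHING;

    -- Upsert: try the insert; on a duplicate email, update the existing row.
    INSERT INTO users (email, name)
    VALUES ('a@example.com', 'Alice A.')
    ON CONFLICT (email) DO UPDATE
    SET name = EXCLUDED.name;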
[00:04:59] The next piece of content is actually a YouTube video called "When to use Synchronous Replication in PostgreSQL", from the EnterpriseDB YouTube channel. This is a follow-on from the blog post that was mentioned in the previous episode of Scaling Postgres, but I think it goes into a little more depth, with the different individuals in the video discussing when to use synchronous replication. They also cover cases where you may want one setting at the database level, but then, for a particular session or transaction, use a more aggressive synchronous_commit setting so that you can ensure a particular transaction or set of data gets committed on all replicas with a high degree of confidence. So if you're investigating synchronous replication, definitely check out this YouTube video for some additional insight.

[00:05:49] The next piece of content is about a feature in Postgres 13: deduplication in B-tree indexes. This talks about the new feature that deduplicates more entries in B-tree indexes in version 13 compared to 12. They talk a little bit about how it works; it doesn't deduplicate everything, but it significantly increases the amount of deduplication that is done. Now, in their example, they inserted about a million rows, but then, to create the duplicates, they just updated those rows three times. So this isn't really duplicated data itself; they're creating multiple row versions that I'm assuming haven't been vacuumed. It's interesting that they didn't create true duplicate entries, so that may have given a slightly different result. But looking at Postgres 12, you can see the index size is 86 megabytes, whereas in Postgres 13 beta 2 the index size is 62 megabytes, so that's a pretty good space savings. They also ran a query to see if it's any faster: in version 12 the query returned in 190 milliseconds, whereas in Postgres 13 it returned in 174 milliseconds, so again, a little bit faster. So if you want to learn more about the performance implications and space savings coming with the deduplication of B-tree indexes in Postgres 13, definitely check out this blog post from Highgo.ca.

[00:07:17] The next post is about random strings and integers that actually aren't. This is a blog post that covers how to create random-looking alphanumeric strings, for things like coupon codes. What it does is convert an integer into this type of string repeatably, using a Feistel network, implemented with PL/pgSQL functions. The first function takes an integer, so imagine a serial data type: you take the integer primary key and it converts that into a random-looking number. That's the first step, and you can use the same function to get the original number back. Then, once you have this random-looking number, a string generation algorithm, again in PL/pgSQL, converts it into one of these alphanumeric strings. As a final bonus, if you use Python, he gives a Python implementation as well. So if you need to generate these types of random-looking strings for coupon codes or other use cases, you may want to check out this blog post from EndPoint.com.

[00:08:30] The next post is about getting started with the PostgreSQL Operator 4.3 in OpenShift. So if you use Kubernetes and want to use the PostgreSQL Operator by Crunchy Data, check out this blog post on how to install it and use it with OpenShift.

The next piece of content is about a simple way to trace a bug. This is from Highgo.ca, and it's talking about a bug the writer discovered in Postgres, in the circle function. It describes how he went through the source code, using tracing to find where the bug was. So if you're interested in the technique he used to discover this bug, definitely check out this blog post.

[00:09:13] The next piece of content covers pg_timetable startup improvements. These are some improvements that Cybertec-postgresql.com has implemented in their tool pg_timetable, which is a scheduling tool for PostgreSQL. So if you want to learn about the changes and improvements to it, definitely check out this blog post.

[00:09:33] And the last piece of content is: the PostgreSQL Person of the Week is Kohei Kaigai. Forgive me if that pronunciation is incorrect, but if you want to learn more about Kohei and his contributions to PostgreSQL, definitely check out this blog post.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 69

June 24, 2019 00:15:13

Performance Training, JSON Path, Hypothetical Indexes, Interpolation | Scaling Postgres 69

In this episode of Scaling Postgres, we discuss performance training, SQL/JSON path support, hypothetical indexes and linear interpolation of data. To get the show...


Episode 27

August 27, 2018 00:18:04

Multiple Databases, Slow Queries, Sorting Performance | Scaling Postgres 27

In this episode of Scaling Postgres, we review articles covering scaling with multiple databases, detecting slow queries, and sorting performance. To get the show...


Episode 221

June 27, 2022 00:17:35

psql Shortcuts, Debugging Deadlocks, Find & Stop Queries, Using SCRAM | Scaling Postgres 221

In this episode of Scaling Postgres, we discuss psql shortcuts, how to debug deadlocks, how to find & stop queries and how to understand...
