Postgres 15 RC1, ICU Collations, Listen & Notify, Understanding TOAST

[00:00:00] In this episode of Scaling Postgres, we talk about postgres 15 release candidate one, ICU Coalitions listen and notify and toasting. I'm Kristen Jameson, and this is scaling postgres episode 235. [00:00:24] All. All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is PostgreSQL 15 RC One released. So this is the first release candidate and this is from Postgresql.org. And this means we're really close from releasing postgres 15 and they're projecting to do it in basically two weeks, october 13, 2022. So it looks like, of course, most of the changes are primarily bug fixes. Now is a good chance to try out this release if you're planning to upgrade it as soon as it comes out. [00:01:03] The next piece of content, ICU features in PostgreSQL 15, this is from Peter Eisentrout.org. So he says since version ten, they've had ICU support to add it to specific objects, but not at the whole database level. And now this is possible with postgres 15. So when you create your new database using in a DB or your operating system versions way of doing it, you can specify locale provider to be ICU, and then you can also specify the ICU locale you want to use. So this is great and it should make it a lot easier for people to start using ICU Coalitions. Now, there is an objective to eventually move to this as the default ICU correlations as opposed to the Libc correlations. But there's still some things in the postgres code that you still need to define the default locale as well, using the Libc correlations, for example. And he talks a little bit about that here. Now the reason why this is particularly important is that when you do version upgrades of the operating system on which your system runs, typically the Libc version changes and there could be hidden correlation changes that break essentially your indexing of your database and it could cause rows to be missing or unique indexes to not operate properly. So it can cause a big mess. And basically, if you're changing operating system versions, which typically implies a Lib C version change, you have to re index all the text based indexes to handle that. But that pretty much includes every non number based index because even I think some JSON B type indexes run into issues with correlation changes. But I really like the last bullet point he mentions here is that quote, I would like to move toward making ICU the default. So basically, quote, get to a point where we think ICU and Utf Eight first and where all the binaries are built with ICU enabled. So I really like this path to avoid this surprising version change, because presumably you could then upgrade operating system versions and upgrade your ICU collation version whenever you wanted to. You didn't have to do it along with an operating system upgrade. But it looks like this is still a work in progress. I mean this feature is ready for 15 but in order to make the reality here ICU the default, it sounds like it may be version 16 or version 17. We'll just have to see. But if you want to learn more about that definitely check out this blog post. [00:03:35] Now related to that, the next piece of content is five minutes of postgres episode 37 new in postgres 15 logical Replication Column Filters and Database Wide ICU Correlations. So of course the content related to the previous article was the database wide ICU Coalition. So Lucas talks about that. But he also mentions the logical replication column filters that's coming in postgres 15 as well. And that was mentioned on last week's Scaling Postgres episode. But if you want to learn more about that definitely check out this piece of content from Pganalyze.com next piece of content, listennotify automatic Client Notification in PostgreSQL. This is from Cyber Python postgresql.com and they're talking about essentially the pub sub feature of postgres called listen notify. And basically this lets you check for changes and do some work. Now one way you could do it is doing polling. So you could have some sort of code that has a loop, checks for changes, say to a table, does some work and then it sleeps for a period of time. But then it polls again. So this is Pub sub where there's an actual change in the database and it notifies you that the change has happened rather than you having to check periodically. So first there's the Listen command. And that essentially lets you listen on a channel that you choose whatever name you want to make it. And then if a particular event happens you can notify those who are listening to that channel. So you notify and then say that same channel name that say other clients are listening on. And you can optionally include a text based payload. So to get started you would just open a client connection and say listen and then whatever channel name. And he chose x here. But then if you want to send a notification you just say notify x the channel name. And then whatever payload you want to send say some message. Now the psql client sends this type of information. It says asynchronous notification x with payload shows the payload received from server process with PID and the process ID. Now different clients are going to behave differently. So if you have a Ruby client or a JavaScript client there's different ways to hold a connection open and keep listening for notifications to come in. So there's different ways to do this type of implementation. But this is how psql gets back the notifications. But you don't just want to send a notify. You want the notify typically to be triggered by something. And the example he has here is he creates a table. He creates a function that will be fired by a trigger and basically waits for an insert or update, and then it sends a message where it's executing the Notify command the given channel, and then the payload that's coming in where essentially he's just sending the contents of the row. But there's another convenient function that you can use called PG Notify, and that basically is a function. The first parameter is whatever the channel name is, and the second parameter is the text based payload. So you can see here, when it sent the notice, it sends the payload as essentially whatever the row contents were. Now, they do emphasize the point here is that the notification is not immediately sent as soon as something is inserted. It needs to be waited until that transaction is committed, because other sessions need to be able to see that this insert has happened. So if you do this as a part of a transaction, the notice will be sent as soon as it's committed. But this is a great post that explains how Listen Notify works and how you can get it set up. [00:07:05] Now, related to that, there's another piece of content, this one from Crunchydata.com, called Real Time Database Events with PGE Eventserve. So they're covering. Listen notify as well. So they have a diagram here that shows a separate web service that they designed, I believe in Go, but essentially uses Listen Notify, and then it sends out information via WebSockets. So you could design something equivalent to PG Eventserve in any other language you want to, but this is a particular web component that they created to be able to do it. It's still using Listen, so it happens to listen on a channel they called People, and then they're using, again, that same trigger mechanism to send notifications on the People channel. And then their PG Event Serve picks up those notifications and then sends out those messages via WebSockets. Another example of the function they're using, they're actually using JSON b to put everything together and then using PG Notified to send that JSON b that they've constructed as text. So they're casting it to text because the payload of the notification needs to be text. But then presumably their PG Event Serve parses that text back into an object based format. So you can download this and run it. And it gives this little interface here. Once you go to localhost 7700, you put in the channel name and it will send notifications as you insert rows into the database. So this is another type of implementation where you actually can look at the Go code, presumably to see how you can actually consume these messages. But if you want to learn more about that, definitely check out this blog post next piece of content, toasting Strategies in PostgreSQL. This is from DBI services.com, and this is a series of three blog posts explaining toast, and they're relatively short, but packed full of a lot of useful information. If you're not as familiar with Toast, which stands for the Oversize Attribute Storage technique and it's basically the separate table dedicated to storing additional information that can't fit in a particular row for a particular column. And when you create a table, there's actually three different strategies that can be used for storing the data by a column. So for example, one column could be main, which means there's no compression and no out of line storage, which means it's not going to be stored in the toast. There's plain, which can be compressed but no out of line storage. It's not going to be stored in the toast and then extended, it's compressed and it can be in the toast. So he created a table here with three different data types an int, a text and a numeric. And then you can actually look at the different strategies that by default the database assigns to them. So an integer is plain, which means it can be compressed, but it won't be stored in the toast. Numerics don't get compressed, don't get stored in the toast, whereas text can be compressed and stored in the toast. Then he mentioned there's also extended, which doesn't do compression, but it offers out of line storage. And he says this is good for pre compressed files. Potentially you want to store like PDFs or Pings or MP3 s, anything that's already compressed. You don't want to compress it again because you're just wasting CPU cycles, basically. So that's what this post talks about, basically the strategies that are available. The second post toasting and PostgreSQL toast tables. So this shows basically how toasting actually works. It shows you that when you look in your list of relations, like if you do backslash D in a psql prompt, it won't show these extra toasted tables, you actually have to explicitly look for them. And this is how they show up in the PG Toast schema. And it's a prefix of PG toast followed by the OID of the table. And the best way to find the OID of a table because it can potentially change, is use a PG class. So he did this here to identify the particular toast table and he shows what the structure looks like. So basically you have a chunk ID that pinpoints the OID of the particular object or column. You have a chunk sequence number, which is an integer, which means if it data exceeds more than can fit in a row, it splits it down to multiple rows. So you can have as many different sequence numbers related to that particular column for a row in a database and then the actual data which is a byte array. So everything gets converted to a byte array when things are stored in the toast. So another brief post giving a great explanation of toast. And then the third one is toasting in PostgreSQL, let's see it in action. And here he's using the same table and he inserts a very large value into a text field, but he takes a look at how many Toast rows are available, and there's actually zero. So why is that? Well, it's being compressed, and it's being compressed so well, it doesn't have to store it in the toast. Then he talks about compression and how by default, PostgreSQL uses a built in Pglz compression method. But as of postgres 14, you can use LZ Four. And then he took a look at that compression algorithm, and the performance benefit, he says, is quite huge. So inserting a million rows using each type of compression algorithm, LZ Four was able to store data in 3 seconds versus 51 seconds for the default Postgres compression. So that's pretty significant win as well as the size is almost half the size for LZ Four, so that seems like a win all the way around. And he shows different ways that you can turn this on and enable it for your Postgres database as well. And lastly, he shows if you go back to do the inserts and make the inserts totally random so it's harder to compress that type of data. You now can see that 81 chunks have been created in the Toast table. So if you want to learn more about toast, highly suggest checking out these three blog posts. [00:12:59] Next piece of content book Release Mastering PostGIS and OpenStreetMap this is from Rustprooflabs.com, and basically he's announcing a book that is available for purchase and it's all about PostGIS and using OpenStreetMap data to work with it. So if you're interested in getting started with Spatial Analysis and Postgres and Postgres, definitely recommend checking out this blog post and the book to see if you're interested. [00:13:25] Next piece of Content all about PG Stat statements this is from Dev Two in the Gigabyte section, and this is exactly what it says talking all about PG Stat statements, which is probably the most used extension that's available for tracking the performance of your statements that are running in Postgres. So if you want to learn more about how that works, definitely check out this blog post. Next piece of content preferred Types this is from Mydbainobook.org, and she's talking about an issue where someone was doing a Power function and they noticed that when they tried to do a mathematical adjustment minusing it by one, it wasn't working. It was giving the exact same value, even though one was subtracted from the version one code. The reason being is because of floating point errors. So basically, if you don't explicitly state it the Power function, when you're putting in numbers, it assumes it's a double precision or it's a float, a double. But in order to get the correct answer, you need to actually cast it to a numeric or ensure that it's a numeric when you're doing it. And if you do that, you get the right answer. But it just so happens the double is actually what the default is. So if you don't specify what the data type is, it assumes it's a double, and she goes in and shows how you can discern what the preferred data types are for postgres. So definitely check out this blog post if you want to learn more about that next piece of content. Introducing PGSQL Lite, a pure Python module to import SQLite databases into postgres. This is from Innerjoin Bit IO, and they're talking about a utility that can convert your SQLite database into postgres, and it's a Python module, and he goes over the process of creating that and has a link to the Python module. But the other thing he mentions that has been around a while longer is PG Loader. So this is another way to do a migration into Postgres from three different types of databases MySQL or SQLite or even Microsoft SQL Server. And this is from PG Loader IO next piece of content. There was another episode of Postgres FM this week. This one is on why is Postgres popular? So definitely listen to this episode, or check out the YouTube video if you're interested in that. [00:15:45] And the last piece of content the PostgreSQL Person of the week is Nicholas Sullivan. If you're interested in learning more about Nicholas and his contributions to Postgres, definitely check out this blog post that does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingposgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks, Sam.

Show Notes

Episode Transcript

Other Episodes

Episode 269

Upgrade Struggles, Hyperloglog, PgBouncer Usage, Postgres Scaling | Scaling Postgres 269

Episode 350

Seven Years of Scaling Postgres | Scaling Postgres 350

Episode 317

Time-Series Open Source Extension | Scaling Postgres 317

Postgres 15 RC1, ICU Collations, Listen & Notify, Understanding TOAST | Scaling Postgres 235