Episode Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about Postgres releases, AlloyDB, time zones, and connecting to Postgres. I'm Creston Jamison, and this is Scaling Postgres episode 215.
[00:00:22] All right, I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is PostgreSQL 14.3, 13.7, 12.11, 11.16, and 10.21 are released. This is from Postgresql.org, and this release seems to be primarily being done due to a security issue that affects autovacuum, REINDEX, and others, where these actions omit a security-restricted sandbox operation. So it says, quote, they make incomplete efforts to operate safely when a privileged user is maintaining another user's objects. Now, along with this security patch, there are also a number of other improvements included with this release, and you can feel free to review this post to get more details about each one. And then, as part of this notification, they included a reminder that PostgreSQL version 10 is end of life this November, November 10, 2022. So if you're still on version 10, you'll want to plan your upgrade process, ideally before then.
[00:01:26] The next piece of content, Introducing AlloyDB for PostgreSQL: free yourself from expensive legacy databases. This is from cloud.google.com, and they have released this Postgres-compatible database they're calling AlloyDB. So it looks to be similar to what Amazon has done with Aurora. As for what makes them different from other PostgreSQL solutions, their number one is superior performance and scale. So basically they're saying, with whatever metric they've used, that AlloyDB is four times faster than standard Postgres and two times faster than Amazon's comparable PostgreSQL-compatible service, by which I think they're referring to Aurora. In addition to this article, there's also a companion YouTube video called Introducing AlloyDB, a PostgreSQL-compatible cloud database service, which is about 20 minutes. And looking at that and this article, it looks like they're doing a lot of separation between the storage layer and the compute layer, dedicating resources to each in a unique combination, which has some similarity to what Aurora was doing as well, where it was mostly the storage environment that changed the most from stock PostgreSQL. But in addition to the performance improvements, they're also providing a pretty high SLA of 99.99%. And they're saying it works great for real-time analysis too, in that it automatically takes data and puts it in a column format that allows it to do much faster aggregate queries. It says it uses machine learning to determine, based upon your queries, which data should be column-stored, so basically handling some of these hybrid transactional and analytic workloads. And then there's their machine learning-assisted management and insights. So basically they'll take care of handling patches, the backups, scaling up and down if need be, and the replication. So this is another hosted offering, and I guess it's not surprising that Google has come up with this, because Amazon has had Aurora for a while, which is a PostgreSQL-compatible hosted database service, and Microsoft, with their Citus acquisition, is kind of doing similar things, I suppose, although that seems to be a little bit more open source than this solution. So even though they're talking about getting away from legacy database systems, this is a new system, and you do have vendor lock-in if you choose this route. But if the presumably lower maintenance and better performance is important to you, then maybe you choose to go this route.

The next piece of content, Time zone management in PostgreSQL. This is from Cybertec-postgresql.com, and they're talking all about time zones. The first thing they discuss in this post is that the SQL standard doesn't have a lot of support for time zones, particularly daylight savings time. It has the concept of zones and a time zone displacement from UTC, but that's basically the extent of the support; there's nothing that helps handle things like daylight savings time. So then he gets into talking about the Postgres timestamps, and there are basically two. There's timestamp without time zone, so you don't include a time zone when you use it, and there's timestamp with time zone, and when you use it, you're generally including the time zone. Now, what's interesting here is that timestamp with time zone doesn't actually store the time zone. How it's stored is the number of seconds since midnight of January 1, 2000, UTC.
So with time zone basically just means you use it with a time zone: when you're inserting data into it, you include the time zone, whereas no time zone is included when you're using timestamp without time zone. Now, he has a side discussion here, which is that some people choose to store times as the number of seconds from the Unix epoch. But he says you should really try to avoid that, and I personally agree, because when you go to look in the database, you're not going to be able to tell what date it's talking about. It's also harder to do, as he says here, date/time arithmetic, so support for intervals, such as adding or subtracting hours, months, or days from a particular time, doesn't work anymore, and it could potentially lead to bad performance for your queries. So now, with time zones, assuming that you're using timestamp with time zone, how does it actually calculate the time zones? Well, basically it stores everything in that one internal UTC representation in the database, but then it uses a parameter to translate it based upon what time zone you're looking for. So for example, he created a table here with a timestamp with time zone column. He inserted the values, and because it's with time zone, he included the time zones here, plus two and minus two. And what gets sent to the client is determined by how the client session is set. In this particular case, he set it to UTC, so the dates and times he gets back are in UTC. This could instead be Eastern Standard Time or the Pacific time zone or a time zone in Europe, and it will translate those internal dates to the proper time zone based upon what you requested. Now, how does it handle this translation? Primarily it's through a database maintained by IANA. The database, he says here, is also known as the Olson database after its founder. It includes an area and a location, and then an offset, essentially, from UTC. Now, Postgres includes this database, but you can also use your operating system's time zone database instead, and actually, when I use Postgres on Ubuntu, the default installer for it does use the operating system's copy of the time zones. And then he shows an example here of how you can create a timestamp with a time zone and how it outputs it appropriately based upon the time zone that's set for the client or the session. Now, he also mentions two caveats: you should probably avoid the time zone abbreviations, these are the three-letter or sometimes four-letter acronyms for the time zones, and he also suggests avoiding the POSIX-style time zones as well. And then he talks about conversion. So this is basically converting, say, a timestamp into a timestamp with time zone, or potentially changing the time zone of a particular timestamp you have. There are two different ways to do it: one is with a cast, but I tend to use the AT TIME ZONE clause when I'm doing these; I think it makes a little bit more sense to me. Now, this is the most important part of the post: when you're going to be using time zones in your application, he says you have two choices, and I wholeheartedly agree. One, use timestamp throughout, meaning timestamp without time zone throughout, store UTC timestamps only in the database, and let the application handle time zone conversions. So basically you allow the user to set their time zone, you set it automatically through JavaScript, or maybe you have time zone settings per account for your application.
You basically store UTC timestamps in the database without the time zone, and you convert those from UTC into whatever the person's local time is based upon their settings. The second option is to use the timestamp with time zone data type throughout your database and, for each session, set the time zone correctly. This basically lets Postgres handle the time zone conversions as opposed to your application. This is a great post and a really comprehensive discussion of the different time zone features within Postgres, so I definitely encourage you to check it out.
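To make the session time zone behavior he describes a bit more concrete, here is a minimal sketch; the table, values, and zone names are purely illustrative, not the exact ones from the post.

-- timestamp with time zone stores an absolute point in time internally;
-- the session's TimeZone setting only controls how values are read in and displayed
-- (illustrative table and values)
CREATE TABLE events (happened_at timestamp with time zone);

INSERT INTO events VALUES
    ('2022-05-12 12:00:00+02'),
    ('2022-05-12 12:00:00-02');

SET timezone = 'UTC';
SELECT happened_at FROM events;   -- displayed as 10:00:00+00 and 14:00:00+00

SET timezone = 'America/New_York';
SELECT happened_at FROM events;   -- same instants, now displayed in Eastern time

-- or convert a single value explicitly instead of changing the session
SELECT happened_at AT TIME ZONE 'Europe/Vienna' FROM events;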
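And as a rough sketch of that first option, storing plain UTC timestamps and converting at the edges; the table and column names here are hypothetical.

-- Option 1 sketch: store UTC in timestamp WITHOUT time zone
-- (hypothetical table and column names)
CREATE TABLE logins (
    user_id      bigint,
    logged_in_at timestamp without time zone DEFAULT (now() AT TIME ZONE 'UTC')
);

-- render the stored UTC value in a user's local time at query time,
-- or do this conversion in the application instead
SELECT logged_in_at AT TIME ZONE 'UTC' AT TIME ZONE 'America/Los_Angeles' AS local_time
FROM logins;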
[00:09:02] The next piece of content, PG Friday: what should you know about basic Postgres authentication? This is from Enterprisedb.com, and this is an interesting post about what happens when someone installs Postgres and then starts trying to use it for the first time. So, for example, say you're on Debian or Ubuntu, you do a sudo apt install of PostgreSQL 14, and you say, all right, I want to get started, so I'll type psql, and you get a connection error that says, FATAL: role "bones" does not exist. It's like, what is that? Well, bones is presumably the user's name, and he's like, wait, my role doesn't exist? Okay, well, there's a command to create a user, so let me do create user bones. But of course, that command fails, giving you the same error message. So you do a little more digging and say, oh, I think I can do it if I assume the postgres user, because the postgres user has rights against the database, and I can create the user bones. But then that causes a problem, because the database bones does not exist. So basically, it can get very frustrating for new users to figure out how to connect to the actual database. One thing you can do is use psql with the list option just to get a list of the databases that exist; you can see that bones clearly doesn't exist, but you can see, oh, I can connect to postgres. So you can do psql postgres, and it will actually connect you. Now, they go into the process of actually creating a database for your app, how to get it connected with a password, how to modify the configuration to listen on particular addresses, and how to configure pg_hba.conf. So, as this post is clearly discussing, doing this as a new user involves a pretty steep learning curve, and basically, as a community, we should probably try to figure out how to make this a little bit easier. But this is a very interesting post, highlighting the issues people experience when they're new to Postgres and trying to use it for the first time.
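To give a rough idea of the kind of fix the post walks through, here is a minimal sketch run as the postgres superuser (for example via sudo -u postgres psql); the role and database names are just placeholders.

-- create a login role matching the OS user, plus a database it owns
-- (placeholder names; pick a real password)
CREATE ROLE bones LOGIN PASSWORD 'changeme';
CREATE DATABASE bones OWNER bones;

-- after this, a plain  psql  works for the OS user "bones", since psql
-- defaults to connecting to a database named after the role;
-- remote connections additionally need listen_addresses in postgresql.conf
-- and a matching entry in pg_hba.conf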
[00:11:04] The next piece of content, The vectors of database encryption. This is from Crunchydata.com, and they're talking about basically three different ways you can encrypt your data as it relates to data being stored in Postgres. The first vector is data at rest. So presumably this is data residing on disk, not data that's just being held in memory temporarily; data at rest is what's on the disk, and you can use an operating system-level or disk-level method to encrypt it, such that the encryption key must be used in order to access the data. So he discusses this first vector and its advantages and disadvantages. The second vector is data in transit. This is data being transported, not within the database machine, like from disk to memory, but to another machine or some other location. There you're going to want to employ things like TLS and certificates to securely encrypt the data when it's in transit between machines, such as between your application server and your database server. And the third vector is data in use. So this is data that is actively being accessed. Now, I would kind of prefer a slightly different name for this, but it's data that is actively moving around, say in the memory of the database, and it's still encrypted at that point. What encrypts and decrypts it is actually a key residing on the application server, or on a separate server that the application server has access to, which dynamically encrypts and decrypts the data before the database even sees it. So basically, when it's in memory, when it's moving between locations within the database system, it's always encrypted, because the database doesn't have the keys; it's the application server or a separate server that has the keys to do the encryption and decryption operations. And then he closes off the post talking about backups and how you can handle data at rest, in transit, and in use with regard to backups. It's pretty much the same thing: ensure that when you take the data off the database server and put it in files or some other location, it's encrypted; when it's in transit, be sure to encrypt that transmission; and for data in use, it's essentially always encrypted within the database, so you don't need to do any additional work to protect it. But this is a pretty interesting perspective on the advantages and disadvantages of different levels of encryption with Postgres.

The next piece of content, Demystifying database performance for developers. This is from Crunchydata.com, and they use the example of a library, comparing it to a database, and discuss how data interactions lead to better performance. They talk about a few different areas. One is indexes: how advantageous indexes can be in terms of speeding up access to data, and how they can also be a burden when you need to change that data. They talk about the important concept of index cardinality. This is basically, as they say here, quote, the number of records returned per value. So for example, if you have a unique column with an index on it, that's very high cardinality, whereas if you have an index on a Boolean field, which you really shouldn't, there are essentially only two values in there. So if you have a billion rows in a table with two values, that cardinality is going to be super low.
If you think you need something like an index on a Boolean field, or some kind of status field that only has a handful of values, a better approach would be to use a partial index. So you could say where the value is true or where the value is false, and it would be more efficient, in the Boolean case, to create two partial indexes, one for each case, as opposed to creating one index on the column. That would be a much more efficient way to pull the data. They also cover table scans and some of their disadvantages, as well as, when you're building queries, being cautious about using functions on the columns you're filtering on. For example, if you take a column in a table and add an interval to it, the database basically has to perform that addition on every row before comparing against your now() value. Functional indexes can potentially handle some of this, but it's much better to do the calculation on the other side, for example now() minus three days, and then compare that to the value in the database. Then a plain index can be used efficiently without having to resort to a functional index. They also talk about the process of updating and deleting records, and about scaling. Now, if you want to learn more about this, this is actually the article that pganalyze covered in their Five Minutes of Postgres Episode 17: Demystifying Postgres for Application Developers, a Mental Model for Tables and Indexes. So if you want to learn a little bit more about it from Lukas's perspective, you can definitely check out this content.
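Here is a minimal sketch of those two ideas, partial indexes for a low-cardinality flag and keeping the date arithmetic off the column; the table and column names are made up for illustration.

-- partial indexes instead of one index on a two-value column (hypothetical names)
CREATE INDEX orders_archived_idx     ON orders (created_at) WHERE archived;
CREATE INDEX orders_not_archived_idx ON orders (created_at) WHERE NOT archived;

-- harder for an index to help: the expression is applied to every row's column value
SELECT * FROM orders WHERE created_at + interval '3 days' > now();

-- index-friendly equivalent: do the arithmetic on the constant side
SELECT * FROM orders WHERE created_at > now() - interval '3 days';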
[00:15:53] The next piece of content, Queries in PostgreSQL: 4. Index scan. This is from Postgrespro.com, and they go into a lot of in-depth detail about how index scans work in Postgres. So if you want a more internal look at and understanding of how index scans work, you can definitely check out this blog post.
[00:16:13] Next piece of content, pg_ivm 1.0 released. So this is the Postgres incremental view maintenance extension, and version 1.0 is now released. Next piece of content, the PostgreSQL Person of the Week is Abhijit Menon-Sen. If you want to learn more about Abhijit and his contributions to Postgres, you can definitely check out this article. And the last piece of content: we did have another episode of the Rubber Duck Dev Show this past Wednesday evening. This one was on symbols, strings, and freezing. So if you're interested in more developer-based content, we encourage you to check out our show.
[00:16:49] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.