Pluggable Storage, Developer Gotchas, Cursors, PG12 | Scaling Postgres 60

Episode 60 April 21, 2019 00:14:51

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we review articles covering pluggable storage, gotchas for developers, using cursors and new PG12 features.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/60-pluggable-storage-developer-gotchas-cursors-pg12/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres we talk about pluggable storage, developer gotchas, cursors and Postgres 12. I'm Creston Jamison and this is Scaling Postgres, episode 60.

[00:00:19] All right, I hope everyone's having a great week. Our first article is "PG Phriday: Around the World in 2 Billion Transactions", and this is from the 2ndQuadrant.com blog. In it they're talking about the issue of transaction IDs, or XIDs, and the fact that they are limited to 2 billion transactions because Postgres uses a 32-bit counter for them. This can be an issue because if you have a lot of activity, you can run out of them, and they mention some high-profile incidents of this happening. As databases process more and more data, this becomes an ever bigger limitation. Now, I have covered previous articles that talk about making the counter 64 bits to lengthen the runway a bit, but here they're reframing the problem, looking at how to think about it and what the way forward is. A lot of people just say "monitor it", but as more database use cases reach higher and higher volume, the amount of time you have to deal with a potential problem gets smaller and smaller. So you definitely need to monitor your XIDs, but given enough volume, you could still run out of them pretty quickly. Now, they do discuss one apparent way out: you could change the counter to 64 bits, but that would require a lot of work, and if you already have a large database it would mean, as they frame it, a very time-consuming and antiquated dump-and-restore procedure, so you couldn't use something like pg_upgrade. They say, quote, "who wants to spend days or weeks transferring 50 terabytes worth of data to a new target that removes the XID issue?" So even if we were able to move to 64-bit XIDs, the problem is transitioning large databases to them. But interestingly, under the heading "A way out", the point he's referring to is Andres Freund's announcement that, as part of Postgres 12 I believe, Postgres now has pluggable table storage. Why this is beneficial is that once data storage becomes decoupled from the engine itself, a whole new world opens up. So instead of having to do a whole database transition at once, maybe you could create a new table storage engine with 64-bit transaction IDs and transition to it that way. They also mention some of the back ends being worked on, like EnterpriseDB's zheap, which could potentially use this pluggable storage. Some of the advantages: maybe we wouldn't need to vacuum up rows anymore, and updates could be done in place without having to keep a copy of the old row around. So basically this offers a potential way forward from the 32-bit limitation on XIDs. It's definitely a very interesting article and I encourage you to check it out.
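
As a rough illustration of the kind of XID monitoring mentioned here (this query is a minimal sketch and is not taken from the article), you can check how far each database has advanced toward the wraparound limit:

    -- Minimal sketch: approximate transaction ID age per database.
    -- Autovacuum must freeze rows before age(datfrozenxid) approaches
    -- the roughly 2 billion transaction wraparound horizon.
    SELECT datname,
           age(datfrozenxid) AS xid_age,
           round(100.0 * age(datfrozenxid) / 2000000000, 2) AS pct_toward_wraparound
    FROM pg_database
    ORDER BY xid_age DESC;
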
[00:03:23] The next piece of content is actually a YouTube video, "PostgreSQL Gotchas for App Developers" by Magnus Hagander, on the FOSS North YouTube channel. It has some great advice for developers using PostgreSQL on gotchas to watch out for, and I'll list a few that occur in the video. They talk about the importance of pooler size and perhaps basing your connections on the number of cores. In other words, you don't want 10,000 connections going to the database when you only have ten cores in your database system. Also important, with regard to the security notices that have been released, is that the user your application connects as should not be a superuser; you want it to be a generic user, or at most the owner of the schema you're working with. You want to be sure to turn off trust in your pg_hba.conf file, since that bypasses the authentication security in place. He talks about different ways you could use JSON, as well as some best practices with regard to migrations. He also cautions about using CTEs, or common table expressions (the WITH clause), because PostgreSQL tends to materialize them. However, in Postgres 12, I believe, they are no longer materialized by default, though you can still choose to materialize them. And then he has a whole other section talking about ORMs and how best to work with them in your application with PostgreSQL. So if you're a developer, it's definitely a really great presentation to check out.

[00:05:01] The next post is "DECLARE CURSOR in PostgreSQL or how to reduce memory consumption", and this is from cybertec-postgresql.com. This is a post about cursors. Now, I actually haven't used cursors that much in PostgreSQL; I used them more when I was an Oracle DBA, but not in PostgreSQL that I can recall. But PostgreSQL has cursors, and they're talking about specific use cases where they can be beneficial. In this scenario he put 10 million rows into a table, and if you just select everything from the table, it takes a long time to load and pull that data from the database because you're pulling everything at once. Whereas if you use a cursor, it allows you to fetch a certain number of rows each time. So in the first example he can immediately fetch one row, and then when he fetches the next four rows, it gives him the next four rows from his cursor. So imagine a screen cursor: it saves your place, and you can get one record and then another four, or ten, or however many you need, while keeping a single main query. You move that cursor along within the data set, and it returns data much faster because it's only pulling back the specific amount of information you need at a time. There are also different movement capabilities, like moving to the next row, the prior row, the first or the last, forward or backward, so all of these commands let you move the cursor throughout the data set you're selecting. Now, one thing to keep in mind is that cursors generally have to be used within a transaction. Normally when I used cursors it was within a stored procedure, but it's just something to keep in mind. However, they do have a section on using cursors across transactions, where they explain that to make sure the data can survive the transaction, PostgreSQL has to materialize the result; once it does, you can use the cursor for a period of time until you close it out. So if you need to work on a particular data set and move forward and backward through it while retrieving data, perhaps cursors could help you get better performance for the activity you're doing. This is definitely a blog post to check out.
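
To make the fetch workflow concrete, here is a minimal sketch (the table name, ordering column and fetch counts are placeholders for illustration, not taken from the post):

    -- Cursors normally have to live inside a transaction
    -- (unless they are declared WITH HOLD).
    BEGIN;

    -- SCROLL allows moving backward as well as forward.
    DECLARE big_cursor SCROLL CURSOR FOR
        SELECT * FROM big_table ORDER BY id;

    FETCH 1 FROM big_cursor;      -- the first row
    FETCH 4 FROM big_cursor;      -- the next four rows
    FETCH PRIOR FROM big_cursor;  -- step backward one row
    FETCH LAST FROM big_cursor;   -- jump to the last row

    CLOSE big_cursor;
    COMMIT;
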
[00:07:24] The next post is "How to update objects inside JSONB arrays with PostgreSQL", and this is from freeCodeCamp.org. Basically this is a post about working with JSONB in PostgreSQL. He assumes a scenario where you have a customers table and you want to store contact information as JSONB, in a format where each entry has a phone type and a value. Now, if you watched the gotchas video above, they would definitely endorse having a separate phone column and a separate email column as opposed to using JSONB to store them, but you may have use cases where this is beneficial. This post focuses on how you can update a single element of that JSONB without having to rewrite the entire contents. For that you can use the jsonb_set function that's part of PostgreSQL, and he goes over how to specifically look up and update the one element you need without rewriting the whole JSONB value. So if you use JSONB in your database or applications with PostgreSQL, definitely a blog post to check out.
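
As a rough sketch of the technique (the table layout and JSON shape here are assumptions loosely based on the description above, not copied from the post):

    -- Hypothetical customers table with contacts stored as a JSONB array.
    CREATE TABLE customers (
        id       serial PRIMARY KEY,
        name     text,
        contacts jsonb
    );

    INSERT INTO customers (name, contacts) VALUES
        ('Jane', '[{"type": "work",   "value": "555-1234"},
                   {"type": "mobile", "value": "555-9876"}]');

    -- jsonb_set rewrites only the addressed element (array paths are 0-based),
    -- so here just the mobile number changes and the rest stays intact.
    UPDATE customers
       SET contacts = jsonb_set(contacts, '{1,value}', '"555-0000"')
     WHERE name = 'Jane';
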
[00:08:33] The next series of posts are all about new features slated to come in PostgreSQL 12, and the first three are "Waiting for PostgreSQL 12" posts from depesz. The first one is "Report progress of CREATE INDEX operations". Now, for me this is huge.

[00:08:56] So finally we get insight into what a CREATE INDEX or CREATE INDEX CONCURRENTLY statement is doing, because currently you can't really track it; the only way I've found to do it is to monitor the size of the index as it's being built. But this commit for Postgres 12 adds a system view called pg_stat_progress_create_index. It lets you know what stage things are at, like scanning the table or loading the tuples into the tree. I like what he says here and I agree wholeheartedly: "This is amazing. Thanks to all involved." For me this was definitely the best thing to see, because it will be so welcome when you're building indexes on terabyte-plus databases. The next post is "Generated columns". I also found this very interesting. This is part of the SQL standard, although I'm not sure which version of the standard it was added in, but it lets you define a computation for a stored column; that is, a stored column as opposed to a virtual one. For example, they have one int column, a second int column, and a third int column that is calculated for you. You don't insert anything into it; it's essentially read-only to you. You say, whatever value is in the first column, add it to the second column. So when you insert into this table, you only insert into the other two columns, say 1 and 2, and it automatically calculates the generated value as 3. You can even use UPDATE statements, say to set the second column to 5, and it will calculate the sum again: 1 plus 5 equals 6. So it always maintains whatever calculation you define. Again, this is called generated columns. Presumably in the future they will offer virtual generated columns that aren't actually stored, but in this first release of the feature only stored columns are supported. They mention that historically you had to do this through triggers, but this mechanism is much more efficient; he did some testing and it took twice as long to use a trigger for this kind of functionality versus using generated columns. So if this feature is of interest to you, it's definitely something to check out once PostgreSQL 12 gets released.
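
Here is a small sketch of what that looks like in practice (the table and column names are made up for illustration):

    -- PostgreSQL 12 stored generated column: c is always computed as a + b.
    CREATE TABLE sums (
        a int,
        b int,
        c int GENERATED ALWAYS AS (a + b) STORED
    );

    INSERT INTO sums (a, b) VALUES (1, 2);  -- c is stored as 3
    UPDATE sums SET b = 5 WHERE a = 1;      -- c is recomputed and stored as 6

    SELECT a, b, c FROM sums;               -- returns 1, 5, 6
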
[00:11:14] The next post in the series is "Log all statements from a sample of transactions". This relates to looking through your PostgreSQL logs: maybe you have log_min_duration_statement set to, say, one second, so it prints out every query slower than a second, which is very beneficial for finding slow queries. What this feature adds is a log_transaction_sample_rate setting, here set to 0.001, or 0.1%, which logs all the statements from that sample of transactions. So basically it makes your log much smaller. Now, you don't capture every slow query, of course, because you're only sampling, but at least you get insight into some example queries running in your system. So the benefit of this is less logging; with some of the testing he did in this post, it resulted in a dramatically smaller log. So if you have a lot of transactions going through your database and very large log files, this may be a feature you want to check out in PostgreSQL 12.

[00:12:21] The next post, again related to Postgres 12, is "New in pg12: statistics on checksums errors", and this is from rjuju.github.io. Now, we talked about data checksums in the previous episode of Scaling Postgres; they were introduced in 9.3, but you basically have to enable them at cluster creation. With 12, they are adding a mechanism to enable them on existing clusters. This post, though, is about new counters available in the pg_stat_database view that make checksum errors easier to monitor. It basically lets you track checksum failures by checking pg_stat_database, and it tracks validation errors both for backend activity and for base backup activity, per database. It shows the cumulative number of errors as well as the time of the last checksum failure. So again, this is a really great feature being added to PostgreSQL 12, and if you want to learn more, definitely check out the blog post.

[00:13:25] The last piece of content is another YouTube video, and it's called "The Mother of all Query Languages: SQL in Modern Times" by Markus Winand. This presentation is really interesting and opened my mind up with regard to SQL and all the advanced modern features that are being added. It's a little bit on the cutting edge, because a lot of the features that entered the standard even a few years ago haven't made it into all of the major database platforms yet. PostgreSQL has a fair number of the features he mentions, but a few of the real mind-blowing ones aren't part of PostgreSQL yet. If you want to get more insight into the power of SQL, what it can do, and what features will probably eventually be added to PostgreSQL as well as other relational databases, or, since he didn't like the term relational, SQL-based databases, it's definitely a presentation to check out.

[00:14:27] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 158

March 28, 2021 00:19:48

Vacuum Speed Up, Faster Foreign Tables, Fast Queries With Union, Query Optimizer | Scaling Postgres 158

In this episode of Scaling Postgres, we discuss a vacuum speed up and faster foreign tables in Postgres 14, running faster queries with union...


Episode 207

March 20, 2022 00:17:54

Autovacuum Configuration, Duplicate Key Violations, Pipelining, Tuning max_wal_size | Scaling Postgres 207

In this episode of Scaling Postgres, we discuss how best to configure autovacuum, the dangers of duplicate key violations, how to set up pipelining,...


Episode 256

March 12, 2023 00:12:23

11TB WAL, pgec Writes, Patroni 3 & Citus, PostgREST | Scaling Postgres 256

In this episode of Scaling Postgres, we discuss WAL growing to 11 Terabytes, pgec supporting writes, Patroni 3.0 & Citus for high availability, and...
