Postgres 14 RC1, Little Things, Index Advisor, Million Inserts | Scaling Postgres 184

Episode 184 September 26, 2021 00:12:37

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss Postgres 14 RC1, little improvements in PG14, an index advisor and how to handle 1 million insert statements.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/184-postgres-14-rc1-little-things-index-advisor-million-inserts/

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about Postgres 14 Release Candidate 1, little things, an index advisor and a million inserts. I'm Creston Jamison, and this is Scaling Postgres, episode 184.

[00:00:18] Alright, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "PostgreSQL 14 RC1 Released". This is from postgresql.org. So this is the first release candidate of Postgres 14, the release prior to the finalized version that is expected to be released next Thursday, September 30. And you can tell there are not a lot of changes since the third beta, although one item of note is that, quote, the feature where autovacuum analyzes partitioned tables is reverted and will not be available in PostgreSQL 14. I wasn't really familiar with this, so I looked into it a little bit, and basically the analyze portion doesn't get run on partitioned tables by autovacuum. Apparently this patch was an attempt to start doing that, but unfortunately it didn't make it in. If you want to learn more about the release candidate, you can check out this content.

[00:01:27] The next piece of content is "Postgres 14: It's The Little Things". This is from crunchydata.com, and he's talking about a few little features in Postgres 14 that he definitely appreciates. The first one is the JSON syntax, where you can now use JSON subscripting. So, much like you can with hashes in other languages, you can use brackets with quoted keys to identify a piece of nested JSON that you want. I definitely appreciate this syntax more than the standard operator syntax. The next area he mentions is read-only roles. It's now quite easy to add someone to some predefined roles. The one he's talking about is pg_read_all_data, and then there's also pg_database_owner. Those are great additions to have. He also mentions a few enhancements to psql, the command line client for Postgres. Then he mentions the enhancement of query pipelining. This is basically a way to communicate with Postgres in an asynchronous fashion: you can send it a query and then send another query before you receive the results back. So basically you can queue up multiple queries to send to the database, and it can go ahead and execute them and then send the results back asynchronously. They're calling this pipeline mode, and you can click on this link to read more about pipeline mode in Postgres.

[00:02:55] But mostly it has to do with the client being able to handle it, and the libpq client library, which is what a lot of adapters for application frameworks use, will support this. But of course the application framework needs to handle it itself. So if I wanted to use this in Ruby on Rails, I would be using the pg gem, and that would have to have support for this, and the Ruby on Rails framework would have to utilize it as well. But it's definitely an interesting performance improvement that I hope people are able to leverage.
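As a quick illustration of the JSON subscripting mentioned above, here is a minimal sketch, assuming a hypothetical table named orders with a jsonb column named details:

```sql
-- Hypothetical table for illustration
CREATE TABLE orders (id int PRIMARY KEY, details jsonb);
INSERT INTO orders VALUES (1, '{"customer": {"name": "Acme", "city": "Atlanta"}}');

-- Postgres 14 subscripting syntax
SELECT details['customer']['name'] FROM orders;

-- The equivalent pre-14 operator syntax
SELECT details -> 'customer' -> 'name' FROM orders;

-- Subscripting also works on the left side of an UPDATE;
-- the assigned value must itself be valid JSON (note the quoted string)
UPDATE orders SET details['customer']['city'] = '"Chicago"' WHERE id = 1;
```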
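And for the predefined roles mentioned above, a minimal sketch, assuming a hypothetical role named reporting_user:

```sql
-- Hypothetical read-only login role
CREATE ROLE reporting_user LOGIN PASSWORD 'secret';

-- New in Postgres 14: read access to all tables, views and sequences
-- without having to enumerate and re-grant them
GRANT pg_read_all_data TO reporting_user;

-- pg_database_owner can be granted privileges that should always apply
-- to whoever currently owns the database
GRANT CREATE ON SCHEMA public TO pg_database_owner;
```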
[00:03:28] The next piece of content is "A better way to index your Postgres database: pganalyze Index Advisor". This is from pganalyze.com, and they're talking about a way to determine what indexes should be placed on a table. They were thinking about different ways of doing this. One question they considered: is machine learning ready to do this? They actually tried it using GitHub's Copilot, which is an AI-based helper for developers, and they really weren't able to get anything effective out of it. So then they asked, well, how else could we do it? They thought about actually using the Postgres planner, and they had the question: what if we use the Postgres planner to tell us which index it would like to see for a given query? So basically ask the question, quote, what's the perfect index for this query? In other words, present a query and let Postgres say, well, this would be the ideal index for it; then that would be the index you would probably want to create. Based upon that logic, they created the pganalyze Index Advisor. This is a tool within the pganalyze product, but they also made a standalone one that's entirely free, and you can just click on this link to get to the free one. Basically, you paste your schema into the top field, you place your query in the bottom field, and it gives you index recommendations. Now, I tried this out and it does work for relatively simple indexes, and they say it only works for B-tree indexes at this point. But when I tried some operators, like greater than and less than on a date, I really didn't see it responding to that. So this is probably still early days, but it's an interesting concept. You may want to check out this post and this new tool.

[00:05:25] The next piece of content is "When you have millions of insert statements in a file". This is from dev.to in the YugabyteDB section, and he's talking about the situation where you have a file that just contains multiple insert statements in a row, and how to run, say, a million of those to load up a table. Now, of course, the post mentions this is probably a very inefficient way to do it. A more efficient way would be to use a COPY command and just have the raw data. Another approach that might work well is to batch multiple rows per insert statement, so that more data gets inserted per statement. But assuming that this is just how you have the data, how could you make it execute quickly? Well, the problem is that because of Postgres's autocommit, a commit actually happens after each statement is run, and running this took 21 minutes. However, when he used an explicit transaction, with a BEGIN before running through that process, it ran in a little over a minute. So just avoiding all that per-statement transaction overhead drastically improved the performance. That's something to keep in mind. The latter part of the post does talk about YugabyteDB, but the Postgres-related portion is up near the top. If you want to learn more about that, you can check out this post.
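To illustrate the difference he measured, here is a minimal sketch, assuming a hypothetical table named events and a hypothetical file named inserts.sql containing one INSERT per line:

```sql
-- Under autocommit, each INSERT is its own transaction (slow).
-- Wrapping the whole batch in one explicit transaction avoids the
-- per-statement commit overhead (hypothetical table "events"):
BEGIN;
INSERT INTO events (id, payload) VALUES (1, 'a');
INSERT INTO events (id, payload) VALUES (2, 'b');
-- ... the remaining insert statements ...
COMMIT;

-- With psql, the same effect without editing the file:
--   psql --single-transaction -f inserts.sql
```

The psql --single-transaction flag wraps the whole script in a single BEGIN/COMMIT, which is handy when the file is generated and not convenient to edit.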
[00:06:50] The next piece of content is "Restarting a sequence: how hard could it be? (PostgreSQL & Oracle)". This is from fluca1978.github.io, and he's talking about sequences in Postgres. This is the way you typically get auto-incrementing IDs for your primary keys: generally you create an integer or bigint column and apply a sequence to it, and it's the sequence that provides what the next value should be. Now, you can create a sequence by just doing CREATE SEQUENCE, giving it a name, incrementing it by whatever value you want, typically one, and determining where you want it to start, and that gives you a sequence. Then he runs through and does a nextval for the sequence he created up to 100 times, and you can see that if you look at the current value, it is at 100. Now, in terms of resetting it, you can do a SELECT setval() with the sequence and the number you want to set it to. Although, personally, I prefer the ALTER SEQUENCE command, just because everything else is ALTER TABLE, ALTER INDEX, et cetera. This is ALTER SEQUENCE: give it a name, and then you can say RESTART. By default it will restart over at the beginning, which would be one if that's how you've generally set it up with START WITH 1; it'll reset it to that, or you can tell it to restart at a particular value, like RESTART WITH a given number. So it's pretty simple. But he says the thing to keep in mind when you're looking for what the value is, is that if you do SELECT currval(), it'll just tell you what the current value is, not what the next one is going to be; for that you have to do SELECT nextval() and give it the sequence. Then he goes into the comparisons to Oracle, and interestingly, it wasn't until Oracle 18 that their ALTER SEQUENCE could actually restart a sequence, which I wasn't familiar with. If you want to learn more about sequences, you can check out this blog post.
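Pulling the commands from that discussion together, here is a minimal sketch, assuming a hypothetical sequence named my_seq:

```sql
-- Create a sequence that starts at 1 and increments by 1
CREATE SEQUENCE my_seq INCREMENT BY 1 START WITH 1;

SELECT nextval('my_seq');   -- returns 1, then 2, 3, ... on later calls
SELECT currval('my_seq');   -- the value most recently returned in this session

-- Reset it with setval()
SELECT setval('my_seq', 1);

-- Or with ALTER SEQUENCE, matching the ALTER TABLE / ALTER INDEX style
ALTER SEQUENCE my_seq RESTART;          -- back to its START value
ALTER SEQUENCE my_seq RESTART WITH 50;  -- or restart at a specific value
```

Note that setval() sets the current value (so the next nextval() returns the following number), while RESTART makes the next nextval() return the restart value itself.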
[00:08:49] The next piece of content is "Building an Oracle to PostgreSQL migrator: lessons learned". This is from cybertec-postgresql.com. They're talking about a tool they developed, an Oracle to Postgres migrator, and these are some of the best practices the tool takes into account. The first one is that transaction length matters, particularly for Oracle. When you're trying to transfer, they say, terabytes of data, doing something like change data capture to make an initial copy of the data and then keep it up to date while getting ready to switch over to Postgres, with Oracle you can sometimes get "snapshot too old" error messages or "can't serialize access for this transaction". So there are some considerations you need to take into account on the Oracle side when you do this, and apparently the tool supports that. They also mention that the Oracle catalog is very slow compared to Postgres, and they had to do some things like having the GUI avoid reading from the Oracle catalog, or even caching the data that's in the Oracle catalog, which seems a bit surprising to me. The next one is that it's easy to have invalid objects coming in; you just need to be prepared to handle those when you're doing a migration, and also be aware of null bytes and broken table content that could possibly happen. The post then follows up with ways to efficiently load data into Postgres, and says that because you're writing to the WAL as well as the data files themselves, if you're doing the data transfer using a tool such as COPY, you would probably want to do a COPY with FREEZE, and also be aware of hint bits, to try to minimize the number of writes that have to happen to transfer the data into Postgres. Then they give some final advice about making sure you get your data types lined up correctly when transferring data, and as they say, quote, a migration is a chance to do a cleanup. Generally you're going to be doing this if you're migrating from one database to another. An interesting blog post, and if you plan on moving from Oracle to Postgres, maybe you would like to check it out.

[00:10:54] The next piece of content is "Pgpool-II configuration parameters - reserved_connections". This is from b-peng.blogspot.com, and it talks about Pgpool-II and reserved_connections, where you can set a parameter so that it will actually send the message "Sorry, too many clients already" as opposed to blocking connections. If you're interested in that, you can check out this blog post. Another post, "Pgpool-II configuration parameters - enable_shared_relcache", is also from b-peng.blogspot.com, and it talks about the relation cache that exists in Pgpool-II and how it works, so you can check that out as well.

[00:11:33] The next piece of content: the PostgreSQL Person of the Week is Michael Goldberg. If you want to learn more about Michael and his contributions to Postgres, you can check out this blog post. And the last piece of content: we had another episode of the Rubber Duck Dev Show this past week, on background job processing. So if you're looking for two developers discussing how to handle background jobs, you can check out this episode. This coming week we're going to be covering working with UUIDs, so if you want to hear a discussion on how best to use UUIDs, you can check us out on Wednesday at 08:00 p.m. Eastern Standard Time.

[00:12:13] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 104

March 09, 2020 00:15:21

Reducing WAL, What is Null?, Understanding SQL, TPC-H Benchmarking | Scaling Postgres 104

In this episode of Scaling Postgres, we discuss how to reduce the size of your WAL files, what is null, how to better understand...


Episode 195

December 13, 2021 00:18:31

Aurora vs. Postgres, Surprising Transactions, Write-Only & Read-Only, Indexing Advice | Scaling Postgres 195

In this episode of Scaling Postgres, we discuss Aurora vs. Postgres, surprising transaction behavior, write-only & read-only database connections and indexing best practices. To...


Episode 100

February 10, 2020 00:15:12

Full Text Search, Query Optimization, Exception Blocks, Procedural Language | Scaling Postgres 100

In this episode of Scaling Postgres, we discuss full text search, a process for query optimization, caution with exception blocks and adding a procedural...
