Inserts, Nearest Neighbor, Freezing, Comment | Scaling Postgres 34

Episode 34 October 15, 2018 00:11:46
Inserts, Nearest Neighbor, Freezing, Comment | Scaling Postgres 34
Scaling Postgres
Inserts, Nearest Neighbor, Freezing, Comment | Scaling Postgres 34

Oct 15 2018 | 00:11:46

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we review articles covering all about inserts, how to handle nearest neighbor, freezing confirmation and Comment.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/34-inserts-nearest-neighbor-freezing-comment/

View Full Transcript

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk all about inserts nearest neighbor freezing and comments. I'm Kristen Jameson, and this is scaling postgres episode 34. [00:00:20] Alright, our first piece of content is actually a notice that postgres QL eleven release candidate one is released. So not too many changes since beta four, mostly bug fixes. So basically our release date is approaching quickly. And again, as mentioned in last week's episode, october 18 appears to be the release date for PostgreSQL eleven. [00:00:43] The next piece of content is PostgreSQL the versatile insert. And this is from PG IO. And basically they cover different ways you can do insert other than your typical doing single row inserts. So they talk about copying in bulk and in this case they're talking about where you're creating a table from an existing table and maybe appending some more data to it that essentially inserts a lot of data into this new table. And then they also mentioned you can do an insert select where you're inserting into a particular table and selecting from one or more other tables in your database. The next area they talk about is Up Certs. And basically this is using the on conflict clause. So if you're doing an insert you can actually say on a conflict of a particular value, you can tell it to do nothing, which is not really an upset, it's not updating anything, but on a conflict you can do nothing, or on a conflict you can do an update which is an upset. And they go into more complex use cases of where Up Certs can be used. The next area they talk about is returning generated values. So you can do an insert into a table and actually using the returning clause, return a given piece of data like a key. So I see this frequently used if you have a sequence, maybe you want to return the primary key or that that sequence generated for your table when you do an insert. And then a little more sophisticated use case is moving rows with CTE clauses. So you can create a common table expression that does a delete from one table, returning the values from that row and then inserting the data into the new table. So this blog post goes over some of the different ways you could insert data other than your typical inserting a single row of data. [00:02:32] The next post is Won't you be my neighbor? Quickly finding who is nearby. And this is from the Crunchydata.com blog and basically this is referring to the kneeest neighbor calculation. [00:02:45] So first thing they did is they built a visits table where someone is visiting a coffee shop, they have a visitor UUID a timestamp that they visited at and a geocode column to record the longitude and latitude point. And you can calculate distances between these two points using this distanced operator. And the smaller the number, the closer the two points are to each other. So if you wanted to find the three closest individuals, you would select from the visits table and put a particular time range and order it from a particular location to what's geocoded in the database and limit it to three so it'll pull the three nearest ones in this time frame that are nearby in this coffee shop. Now, they mentioned they do have, I believe, a Beatri index in place that they discussed in a previous article, but even with that previous index that it's in place, no indexes were used according to the explain plan for doing this particular query. So basically for finding nearest neighbors using an index, you should use a just index. So they applied a Gist index over the geocode column of their table and when they run it not using the time restriction but only the point, it does use the index and returns relatively quickly in less than one millisecond. However, when they added that time range back in, it still used that just index, but we don't have anything indexed on the visited that to be able to use within this query. But what they mentioned is that there is a Btree Gist extension that you can add to your database. So that allows you to create a multicolumn index on visitedat and geocode. And now with this query, it executes very far under one millisecond. So if you have this particular type of problem, maybe you want to check out the Btree Gist extension for use in your application or database. [00:04:45] The next article is managing freezing in PostgreSQL. So this blog post talks about how PostgreSQL basically manages its MultiVersion concurrency control. And basically as the database is actively working, it's constantly moving a transactional ID with each statement that gets executed. And that transaction ID is able to see 2 billion active transactions back and 2 billion transactions ahead of it. Now, before a transaction hits being 2 billion old, it needs to be frozen. And frozen means there's no active transaction that needs to see it anymore. And basically Vacuum or auto vacuum handles that process of freezing. So this goes over this concept in detail and tells you about some different parameters you can adjust on how to alter the freezing behavior. So it talks about vacuum freeze, min age, vacuum freeze, table age, and Auto vacuum Max freeze age, as well as some multitransaction configuration options as well. So if you're not familiar with this, this is definitely a good piece of content to check out to help you really configure your database well. But I found three other pieces of content that would be helpful for understanding this post as well. The first is freezing your tuples off. Part One and this is from the database soup blog. Now this is from 2012, but it's still relevant. And actually I found it easier to read this blog post first in order to help me understand the blog post that just came out and then there's also part two, Freezing Your Tuples Off. Part two again from the Database Soup blog and he covers some additional configuration options. And then of course, if you want to go to the source, there's the PostgreSQL documentation and specifically the area called Preventing Transaction ID wraparound Failures. So it goes over vacuum and auto vacuum and the process of freezing and how that works. So these are a set of content that can help you better understand how freezing works in PostgreSQL and how you might be able to alter its configuration to suit your use case better. [00:06:57] The next post is are your foreign keys indexed? And this is from the Cybertechgresql.com blog. So basically typically you're going to want to index your foreign keys because you're typically going to have queries that are going to be searching by those foreign keys. But another significant reason to have indexes on your foreign keys is when you delete rows or update key columns in your target table, PostgreSQL has to ensure that that foreign key constraint is still satisfied that you're not going to end up with orphaned records. So for example, you don't want to delete an order when you have a bunch of order items that are still connected to it. So definitely a blog post to check out to help you better index your foreign keys in your database. [00:07:40] The next post is PostgreSQL tips documenting the database. So this post is basically talking about a command called Comment that I actually really wasn't that familiar with. So for different database objects in your database, it allows you to set a comment by doing comment on the database object, giving it its name, and then say Is, and then put in a comment or description. And then when you go to describe a particular object, it will show information in the description column here. Now, I'm not sure I'm necessarily going to be using in some of my database systems because some of this information is stored elsewhere. But probably once you have very large number of users using the database and you wanted to use this as the source of truth for describing something, definitely a feature to check out and a blog post that goes through how to use it. [00:08:31] The next post is PG. Friday studying stored procedures in postgres eleven. And this is from the second quadrant.com blog. And basically this goes over of course the stored procedures in PostgreSQL, but this is kind of a light hearted take on things. So someone posted on Twitter this way, where you could waste transaction IDs using procedures basically loop over and just keep using up transaction IDs, which could cause a world of hurt if you read the managing freezing in PostgreSQL blog post, and then it goes into, oh, well, you know, you could do this more efficiently by just assigning this to a variable. And now you can waste transaction IDs four times faster. And then also each subheading is actually from a song lyric I felt so good like anything was possible. I hit cruise control and rubbed my eyes. So bonus points if you know the artist and the song that this is referring to. Now he goes in and tries different things like calling PG sleep when running this procedure and looking at date times and the effect when running it through the procedure. And all of this was kind of to be able to explore how procedures are going to be a little bit different than functions. So if you're looking forward to the procedures that are coming in PostgreSQL Eleven, definitely a blog post to check out. [00:09:49] The next post is dynamic monitoring of PostgreSQL instances using PG top. And this is from the several nines.com blog. So Top is a way to look at the top processes that are going on in the system. Look at CPU usage, memory usage. While Pgtop is dedicated for PostgreSQL and the output looks like this, you get a load average the number of PostgreSQL processes that are running sleeping, CPU states, memory usage, database activity, IO disk, and the different processes of PostgreSQL. So if you're not familiar with Pgtop and want to potentially use it, definitely a blog post to check out the next post is PostgreSQL Extensions for an Enterprise Grade System. Now, what I liked about this is that it basically lists some very popular PostgreSQL extensions and I'm using a number of these. But for your use case, I would say this is a good post to check out to say what else is available, what else are people using, or what extensions are they adding to their database system for what use cases. So I would highly suggest just checking out this blog post to look at some of the extensions that they are suggesting. [00:11:02] And the last piece of content is actually a YouTube channel. It's a follow up to something that was mentioned last week where they had actually posted five sessions. Well, they now have posted another five sessions this week. Now again. This is The Postgres. Open SB 2018 YouTube Channel. So definitely a lot of different content to check out. [00:11:23] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or itunes. Thanks Sam.

Other Episodes

Episode 172

July 04, 2021 00:12:39
Episode Cover

Per-Operation Explain, PG14 Contributions, Foreign Parallel Execution, Incremental View Maintenance | Scaling Postgres 172

In this episode of Scaling Postgres, we discuss tracking per-operation explain analyze, Postgres 14 contribution stats, parallel execution on foreign servers and incremental view...

Listen

Episode 128

August 24, 2020 00:16:56
Episode Cover

Statistics Speed Up, Synchronous Commits, TLS Updates, Avoiding Cursors | Scaling Postgres 128

In this episode of Scaling Postgres, we discuss speeding up performance with statistics, setting synchronous_commit, updates to TLS settings and how to avoid cursors....

Listen

Episode 306

March 10, 2024 00:14:29
Episode Cover

Scalability Limits From SLRU & Lock Manager | Scaling Postgres 306

In this episode of Scaling Postgres, we discuss one configuration change that resulted in an 11,000 times faster query, why Postgres is not using...

Listen