Columnar Tables, Advent of Code, PG 16 Features, Commit Times | Scaling Postgres 246

Episode 246 December 18, 2022 00:10:18
Columnar Tables, Advent of Code, PG 16 Features, Commit Times | Scaling Postgres 246
Scaling Postgres
Columnar Tables, Advent of Code, PG 16 Features, Commit Times | Scaling Postgres 246

Dec 18 2022 | 00:10:18

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss columnar table storage, solving Advent of Code using Postgres, new features coming in Postgres 16 and when Postgres development happens.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/246-columnar-tables-advent-of-code-pg16-features-commit-times/

View Full Transcript

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about columnar tables, advent of code, postgres 16 features and commit times. I'm Kristen Jameson, and this is scaling postgres episode 246. [00:00:24] Alright, hope you, your friends, family and coworkers continue to do well. Since we're at the end of the year, I'm going to be taking a few weeks off. So the next show will be coming in 2023. So I look forward to seeing you and discussing postgres in the new year. So for this week, the first piece of content is Powering PostgreSQL 15 with columnar tables. This is from Procona.com and they are talking about an extension that's called, I believe, Sizes columnar. That is actually the next generation of C store column, store foreign data wrapper. And basically instead of storing data in rows, the data is stored in columns. So all the data of a particular column are stored together. Now, the advantage of this is for online analytical processing because typically you want to take a sum, take account, know a lot about a particular column, and it also enables you to highly compress that data because it's all identical types of data within a given column. And usually you can get some speed and some space savings doing this. The disadvantage of storing it in a column is that it's impossible or at least very difficult to update data or delete data or looking at single rows becomes much more difficult. But what this post does is it covers this feature using the Citize columnar extension and he discusses some cases where you would want to use it. Basically, OLAP is a major component of what you want to do with the data. Insert performance is not necessarily a priority because it takes time to decompress reinsert new data into each column. And you have append only data because you cannot perform update delete operations on this type of a table. But it goes through and discusses more details with regard to that and then actually goes ahead and installs it and starts working with it and shows different examples and use cases of using this extension. So here you could see the typical access method is Heap. That's the standard heap that is row storage. But you also can see the access method for this particular table is columnar. So he tries different queries and different runtimes comparing the two. So if you have an appendonly analytical based use case, you may find this post of interest and this particular extension. So definitely check it out if you want to learn more. [00:02:47] Next piece of content. Advent of Code 2022 days one through five with PostgreSQL this is from Software and booze.com. And every year there is an advent of code challenge. And this particular post addresses days one through five, which includes calorie counting, rock, paper, Scissors, rucksack, organization, camp cleanup, and supply stacks. And as you can see, for each of these days, he has a video of how he did the solution for this, just using PostgreSQL. And he said he did try to avoid using functions, procedures, things of that nature. So it's just using SQL and the base functions provided. Now, related to this, [email protected] has been doing the same challenge and he has days one through five of solving the advent of code challenges. But he did it in separate blog posts. So you can see day one here, day two, three, four and five. And of course all of these will be linked in the description of this episode if you wanted to check them out and see how they approach the advent of code puzzles using Postgres. [00:03:52] Next piece of content waiting for PostgreSQL 16 add Grantable maintain privilege and PG maintain role. This is from Depeche.com, and he's talking about the new features added to 16 that enables you to grant a particular user the ability to do vacuums, analyze, reindexes, refresh, materialized views, clusters and lock table and have no other super user privileges, and also not be able to actually query the data. So he shows an example here where he granted a particular user the PG maintain role, and now he can see that it can do a vacuum, but it can actually not select from that table. So this is a great addition and he discusses this enhancement here. The next enhancement he discussed is nondecimal integer literals. So basically, Postgres can now take different types of integer literals, including Hexadecimal, Octal and Binary integer. So you can see here when he does a select statement using hexadecimal formats, octal and Binary formats, they actually resolve to the same integer, essentially. So check this blog post if you want to learn more about that. And then his final discussion is on adding test scaffolding for soft error reporting from input functions. Now predominantly this affects testing, but they say some user use cases may be present for these types of functions. And he discusses that in this blog post too. Next Post also related to things coming in postgres 16 is pgdump compression specifications. In PostgreSQL 16. This is from Cybertechn postgresql.com. They're saying in 16 with pgdump, you will now be able to determine what compression algorithm you use. So instead of just having none or Gzip, now you can use things like Gzip, LZ Four and Z standard, which I'm super happy about this for doing a PG dump because I find this the fastest, most efficient way to do it. It has parallel compression built right in, meaning you can use multiple CPU cores, which I hope is a configurable option. But like today I use PG dump but then pipe it to Zstander to do my compression. And if it's built into PG dump to be able to do that, that would be awesome. But you can check out this blog post if you want to learn more about that. [00:06:13] Next piece of content. PostgreSQL Commit Times this is from Peter Eisenstrout.org and he looked through the Git repository of postgres to determine hey, when were these commits made, on average, what time of the day over the last nearly 30 years. So you can see examples here in that prior to 2000 it was basically a late evening, early morning hours work being done on postgres. Then up to maybe about 2012, 2010, it started to be an evening occupation working on postgres. But ever since, again between 2010, maybe 2013, people committing to postgres have been doing it in the middle of the day. So basically it's become more of their job responsibility to work on postgres. And then he actually showed only his commits and basically it lines up well with that. When he started committing for his job, it became pretty much the middle of the day, whereas before it was part of his job it was primarily, as you can see, done at night. So pretty interesting insight and you can check out this blog post if you want to learn more. Next Piece of Content Benchmarking multicolumn covering and Hash Indexes in postgres this is from Pganalyze.com and these were two posts that were discussed in last couple of episodes of Scaling Postgres, and here Lucas discusses them as well if you want his perspective. Definitely encourage you to check out his content. [00:07:44] Next Piece of Content just use postgres for everything. This is from Amazingcto.com and this post, as opposed to using Redis for things, mongoo for things, Elastasearch for searching things, should just use postgres as opposed to implementing all of this other technology. And I definitely agree with a lot of what is said here. I've never implemented a separate message queuing system, I've always used postgres's capabilities. I've used plenty of the JSON B functionality in postgres as opposed to other types of non relational databases. I tend to only use postgres for searching as opposed to using Elasticsearch. I'm not so sure about using postgres for Cron jobs because I typically just run it on the operating system, although if you have a hosted database that may become more important. And also I haven't really used what they're describing here in terms of unlocked tables and store procedures to add or force an expiry as opposed to using Redis. So I would probably still tend to use Redis in this case, but this is an alternative. But if you want to learn more, you can definitely check out this blog post. [00:08:47] Next piece of content. What's new in postgres 15? This is from Supabase.com and this is a post that covers all the different features in a pretty well organized way that have been introduced into Postgres 15. So if you want a quick overview of that, definitely welcome to check out this article. [00:09:03] Next Piece of Content Global Unique Index Attached Support and its Potential Deficiency this is from Higo CA and this is another post discussing their globally unique indexes that they're trying to get working for postgres partition tables. So if you want to learn more about their progress on that, you can check out this blog post next piece of content oracle to PostgreSQL Migration Cost Assessment Made Easy this is from CyberTech Postgresql.com. They're talking about a CyberTech migrator they developed for helping you make decisions when you want to migrate from Oracle to Postgres. Well, now they have a cost estimate as a part of it. [00:09:41] And the last piece of content. The PostgreSQL Person of the week is Tushar Hauja. If you're interested in learning more about Tushar and his contributions to Postgres, definitely check out this blog post that does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks.

Other Episodes

Episode 42

December 09, 2018 00:12:14
Episode Cover

Multi-Terabyte Scaling, Kubernetes DBaaS, Encryption, High Availability | Scaling Postgres 42

In this episode of Scaling Postgres, we review articles covering multi-terabyte scaling, building a kubernetes DBaaS, encryption and building high availability. To get the...

Listen

Episode 332

September 08, 2024 00:15:10
Episode Cover

Sometimes It Is Slow? | Scaling Postgres 332

In this episode of Scaling Postgres, we discuss what can happen when queries get slow, backup best practices, Postgres emergencies and the state of...

Listen

Episode 50

February 11, 2019 00:14:23
Episode Cover

Outage from TXIDs, Breaking Scale, fsync Gate, pg_stat_statements | Scaling Postgres 50

In this episode of Scaling Postgres, we review articles covering an outage from running out of TXIDs, breaking scale, that PG fsync issue and...

Listen