generate_series, PG11 Partitioning, GPU Queries | Scaling Postgres 25

Episode 25 | August 13, 2018 | 00:13:14

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we review articles covering graphing with generate_series, partitioning in Postgres 11 and GPUs for queries.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/25-generate_series-pg11-partitioning-gpu-queries/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about graphing with generate_series, better partitioning, GPUs for queries and touring PostgreSQL. I'm Creston Jamison, and this is Scaling Postgres, episode 25. [00:00:23] Well, we've got quite a bit of content this week, so we'll go ahead and jump right in. Our first post of the week is Simple monthly reports in PostgreSQL using generate_series. This is from the Rob Conery blog, and in it he's talking about generating a graph. Basically, he wants to set up a monthly report that shows sales per day, but he wants to see a zero if there are no sales for a particular day. And starting off a month, he doesn't necessarily just want to see the first, he wants to see all 30 days, and if any are zero, show them as zero. Now, to do this, he's going to use a function called generate_series to generate a series of dates. So he first simply looks at generate_series, which generates, essentially, integers. But here he shows how it can also work with days and intervals. And basically this is the query you would need if you wanted to do it for an entire month: start at the first, stop at the end of the month, and step by an interval of one day. He then shows that by using the date_trunc function and specifying a month, you can basically get the first of the month, and you can get the end of the month by using that same function, adding an interval of a month and subtracting one day. So basically that gives you a row per day. He then goes ahead and uses this to create a function in PostgreSQL called dates_in_month, where you can pass in a date and it defaults to now. Now, to use this query, he basically sets up a view to get all of his sales information, because that's what this graph is going to represent.
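As a rough sketch of the pattern he describes (the function name and signature here are assumptions based on the description, not the post's exact code; it needs a live Postgres server to run):

```sql
-- Generate one row per day of the month containing the given date.
CREATE OR REPLACE FUNCTION dates_in_month(the_date date DEFAULT now()::date)
RETURNS TABLE (the_day date) AS $$
  SELECT generate_series(
    date_trunc('month', the_date),                                         -- first of the month
    date_trunc('month', the_date) + interval '1 month' - interval '1 day', -- last of the month
    interval '1 day'                                                       -- one row per day
  )::date;
$$ LANGUAGE sql STABLE;
```

Calling `SELECT * FROM dates_in_month('2018-08-13')` would then return the 31 dates of August 2018, ready to be joined against sales data.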
And then he joins from the dates_in_month function to this sales view, groups it by the date that's generated, and that gives him his graph with every date represented. So I really like this post from the perspective of the detail he went through to be able to generate something you could potentially use if you wanted to generate graphs or tables that include every date of a time series you're trying to retrieve from the database, but maybe you don't have sales on those days, for example. The next post is Talk slides: Partitioning Improvements in PostgreSQL 11. This is from the 2ndQuadrant.com blog, and this was actually from a presentation at PGConf Brazil 2018. [00:02:53] Now, this is the link to the talk slides, so I'm actually going to look at this. And this was by Álvaro Herrera. Now, in this presentation, and this is a PDF of it, he goes over basically a historical review of partitioning in PostgreSQL. He talks about the new features, and some of those include a default partition. So you can define a partition that's a default: if data needs to be inserted but there's no partition it would match, it can be placed in the default partition. [00:03:28] Row migration on update: so if you do an update that's actually going to move a piece of data from one partition to another, it can handle that. In addition to list and range partitioning, it can now support hash partitioning, which is useful for being able to equally populate the partitions that you have set up. And then lastly in this area, he mentions that INSERT ON CONFLICT DO UPDATE is something that works now. This is something that did not work in PostgreSQL 10, and I really like this update because I have some use cases where I could really use it in PostgreSQL 11. There are also improvements in terms of better DDL.
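A couple of those PostgreSQL 11 features can be sketched in DDL like this (the table and partition names are invented for illustration):

```sql
-- New in PG 11: hash partitioning, to spread rows evenly across partitions.
CREATE TABLE events (id bigint, account_id int, created_at timestamptz)
  PARTITION BY HASH (account_id);
CREATE TABLE events_p0 PARTITION OF events
  FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ...plus events_p1..events_p3 with remainders 1 through 3.

-- New in PG 11: a default partition that catches rows matching no other partition.
CREATE TABLE measurements (city_id int, logdate date, peaktemp int)
  PARTITION BY RANGE (logdate);
CREATE TABLE measurements_2018 PARTITION OF measurements
  FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
CREATE TABLE measurements_default PARTITION OF measurements DEFAULT;
```

With the default partition in place, an insert with, say, a 2025 logdate lands in measurements_default instead of raising an error.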
He talks about being able to create an index on the parent table that will then be created on the child tables, some support for unique and primary key constraints, some support for foreign key constraints, as well as some row-level triggers. And of course, he talks about better performance as well, mostly with regard to partition pruning: targeting the appropriate partition when doing queries. So I know it can sometimes be hard to look at a presentation where all you have is the slides, but there's still some useful information in this one, and I suggest you check it out as we get closer to PostgreSQL 11 and we find out all the different new features that are going to be coming online. [00:04:55] The next post is GPU accelerated SQL queries with PostgreSQL and PG-Strom in OpenShift 3.10. And this is from the OpenShift.com blog by Red Hat. [00:05:10] Now, that was quite a title, but basically they're showing you how to use PostgreSQL along with PG-Strom in OpenShift. Now, why I picked this particular article is because I was interested in the GPU-accelerated SQL queries. GPUs are graphics processing units, designed for graphics. However, they tend to have a lot of cores that are specially designed for doing mathematical operations, so there are certain operations in queries that can benefit greatly from them. For example, they're talking about when you're doing averages, counts or square roots in particular queries, it can really accelerate them when you're using a GPU in conjunction with the CPU. And they mention that a CPU is basically very flexible and can do all sorts of calculations, whereas GPUs are specialized for particular types of operations. But even with that, they can show some significant performance improvements when using one GPU and one CPU.
And they're looking at natural joins here against execution time, and as you can see, the GPU kind of gets up to around 2 seconds, but never goes much more than 2 seconds as the natural joins go from one to nine, whereas when you're just dealing with one CPU, it basically continues up in a linear fashion. And they say at some point the GPU execution time will increase, but they never hit it in the tests that they were doing. So in a previous episode of Scaling Postgres, we talked about the potential of using GPUs to process queries faster, and basically PG-Strom is an open source utility to do that. So if this is something that may be of interest to you, if you're doing a lot of data warehousing work, you might want to check out the PG-Strom project. [00:07:05] The next post is Taking a tour of PostgreSQL with Jonathan Katz, and this is episode 42 of the Data Engineering podcast. [00:07:14] Now, this is a podcast, it's about 56 minutes long, and it didn't really have any content directly related to Scaling Postgres, but it was pretty interesting. [00:07:26] It's a whirlwind tour of PostgreSQL: a little bit of its history, a lot of its capabilities, and it talks a little bit about the future of sharding, which I found particularly interesting. So again, not directly related to scaling PostgreSQL, but an interesting piece of content nonetheless if you wanted to check it out. The next post is Fun with SQL: Common table expressions for more readable queries. This is from the Citusdata.com blog, and basically it's part of their Fun with SQL series; we've covered some of these in the past, and this one covers common table expressions, or CTEs. Basically, how I like to think of CTEs is that a CTE is a way to name a subquery. So you could run it as a subquery, SELECT all FROM it, and just paste the query right into that spot.
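A minimal illustration of that "named subquery" idea (the table and column names are assumptions for the example, not from the Citus post):

```sql
-- The CTE gives the aggregation a name, so the outer query reads
-- almost like plain English instead of a nested subquery.
WITH monthly_sales AS (
  SELECT date_trunc('month', created_at) AS month,
         sum(amount)                     AS total
  FROM sales
  GROUP BY 1
)
SELECT month, total
FROM monthly_sales
WHERE total > 1000
ORDER BY month;
```

The same result is possible by pasting the inner SELECT into the FROM clause as a subquery; the CTE just compartmentalizes it under a readable name.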
And it's essentially like a subquery, but it allows much more readable SQL because you can compartmentalize what something is doing and treat it kind of like a function, is how I like to think about it. And they go through a couple of examples of using CTEs, and even putting multiple CTEs together to create a more complex query. So this is a pretty short post, but if you're wanting to look into using CTEs a bit more, it's definitely a blog post to check out. [00:08:51] The next post is PgBouncer and auth pass-through. Now, I chose this post because I think whatever documentation or additional information can come out about PgBouncer is definitely beneficial, and also because this post talks about a need where they wanted to be able to just pass through authentication from PgBouncer to PostgreSQL. Because typically there's a userlist file in PgBouncer that you define all your users and passwords in, and there was a particular user who didn't want to have that password file essentially just sitting on the PgBouncer instance, and wanted to pass through the authentication instead. But in the process, he actually shows you how to get just general authentication up and running with PgBouncer. Now, how he actually does the pass-through is by creating a function that queries the pg_shadow password table in order to accomplish it. So it's a pretty interesting post, and if you want to learn more about how PgBouncer works, I definitely suggest checking it out. [00:09:59] The next two posts are from thebuild.com. They're relatively short, but they have a lot of condensed, excellent information. The first one is Does anyone really know what time it is? And it covers how to use the now() function, the statement_timestamp() function and the clock_timestamp() function. I'll just quote this right here, since it's so short: now() is the time at the start of the transaction.
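The distinction between the three functions can be sketched in a quick session like this (the exact values obviously depend on when you run it, so none are shown):

```sql
BEGIN;
SELECT now();                 -- time the transaction started; constant until COMMIT/ROLLBACK
SELECT pg_sleep(1);           -- wait a second
SELECT statement_timestamp(); -- time the current statement started; differs from now() above
SELECT clock_timestamp();     -- the actual wall-clock time; changes on every single call
COMMIT;
```

Running the two SELECTs after the sleep shows statement_timestamp() moving forward while now() stays pinned to the start of the transaction.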
So when you do a BEGIN and then run a transaction, that's the time now() is going to give you, and it never changes. Quote: it never changes while the current transaction is open. So until COMMIT or ROLLBACK, it will always give the same time. statement_timestamp() is the time the current statement started running; it changes from statement to statement, but is constant within a statement. And lastly, clock_timestamp() changes each time it is called, regardless of context. And then he gives pointers for their use: basically, for a predicate in a WHERE clause, use now() or statement_timestamp(), because these work properly with indexes, since they are constant within the execution of a statement. If you need the time to update within a single transaction, use statement_timestamp(); otherwise use now(). And generally you should only use clock_timestamp(), which gives you the immediate time, inside of a programming language, so something procedural. So definitely a quick post that gives a lot of good information. [00:11:21] The next post, again from thebuild.com, is Three Steps to pg_rewind Happiness. [00:11:27] So first, have a WAL archive. Even though you don't officially need one when using pg_rewind, you're always going to want to have it, so that the server you're rewinding has access to those archive logs if it needs them. Second, be sure you promote the source server before shutting down and doing things with the other server. You're always going to want to do a promote and wait until it's fully promoted. And that leads to step three: wait for the forced checkpoint to complete. So basically, you want to make sure that when the secondary is promoted to being a primary, that forced checkpoint completes. So again, a very short post, but chock-full of useful information. [00:12:10] The last post is Using Kubernetes to deploy PostgreSQL. This is from the Severalnines.com blog, and it goes over very simply how to set up PostgreSQL on Kubernetes.
And they reference a previous post where they talk about deploying PostgreSQL in a Docker container, and this is kind of related to it. What I found interesting is they have a section here talking about persistent storage, because in my mind, that's the most important thing to get right if you're using Kubernetes, as well as how to set up the service and connect to PostgreSQL. So if you're interested in using PostgreSQL with Kubernetes, it's definitely a blog post to check out. [00:12:51] That does it for this episode of Scaling Postgres. You can get links to all the content presented in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 22

July 23, 2018 00:11:43

Concurrency, Listen & Notify, ZFS, pg_rewind | Scaling Postgres 22

In this episode of Scaling Postgres, we review articles covering concurrency, how to use listen & notify, zfs, caution regarding pg_rewind. To get the...


Episode 50

February 11, 2019 00:14:23

Outage from TXIDs, Breaking Scale, fsync Gate, pg_stat_statements | Scaling Postgres 50

In this episode of Scaling Postgres, we review articles covering an outage from running out of TXIDs, breaking scale, that PG fsync issue and...


Episode 207

March 20, 2022 00:17:54

Autovacuum Configuration, Duplicate Key Violations, Pipelining, Tuning max_wal_size | Scaling Postgres 207

In this episode of Scaling Postgres, we discuss how best to configure autovacuum, the dangers of duplicate key violations, how to set up pipelining,...
