Data Loading Speed, View Dependencies, Users & Roles, H/A Clusters | Scaling Postgres 81

Episode 81 September 16, 2019 00:12:44
Data Loading Speed, View Dependencies, Users & Roles, H/A Clusters | Scaling Postgres 81
Scaling Postgres
Data Loading Speed, View Dependencies, Users & Roles, H/A Clusters | Scaling Postgres 81

Sep 16 2019 | 00:12:44

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss data loading speeds, view dependencies, users & roles and high availability clusters.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/81-data-loading-speed-view-dependencies-users-roles-high-availability-clusters/

View Full Transcript

Episode Transcript

[00:00:00] In this episode of Scaling Postgres we talk about data loading, speed, view dependencies, users and roles in high availability clusters. I'm Kristen Jameson, and this is scaling postgres episode 81. [00:00:17] Microsoft. [00:00:22] Alright, I hope you're having a great week. The first article is data loading performance of postgres and timescale DB. And this is from the [email protected] called Stuff About Things, and this is basically a performance investigation looking at comparisons of insert performance between postgres and timescale DB and mentions Simon Riggs at second quadrant prompted him to look at the performance of loading a lot of data between postgres and timescale DB. And I believe it's in reference to this article on the TimescaleDB website, where they see degraded insert performance over time. As you add insert millions of rows into the database and approaching essentially billion rows, they show timescale DB does not degrade significantly, whereas PostgreSQL does. So this blog, Post actually investigates that independently, I guess you might say, to see if this can be seen. And it looks like they use two different instance types, one a high memory, one an R five and one a CPU. High performance one compared version eleven of PostgreSQL and the twelve development version as well as timescale DB 1.22 and 1.3, and looked at the performance of loading 4 billion rows in both standard and partition tables. And what they actually got on the high CPU one was 320,000 rows per second for postgres and 225 rows per second for timescale DB. And this is an ingest rate of about 100 gigabits per hour he indicates here. So didn't really see this drop off as indicated in this. Now this may be a different version, maybe they're using version ten of PostgreSQL. I don't quite remember where and I don't see where they're saying what version this is. But he goes through the test scenario and then shows how he did the configuration and then shows the performance here. So you could see 1.22 of timescale DB is in around the average of 183,000 rows per second, whereas different variations of whether it's a standard table or partitioned into monthly or weekly, both the twelve development version and eleven version are all north of 200,000, approaching as high as 268,000 rows per second. And then looking at the high CPU capability instance here, it looks like he just used version eleven of PostgreSQL standard tables got up to 325,000 inserts per second, whereas the partitions were a bit lower in timescale DB around the 220,000. So I thought this was very interesting that he did not see the same thing. Now again, the tests are different, the methodology may be different, and of course there's more than just ingest rate if you're working with time series data. But if you're evaluating different database systems and wondering do you want to use something like timescale DB, maybe it makes sense to go ahead and do your own benchmarks with PostgreSQL versus other solutions you're evaluating to see if that gives you better performance or not and what the difference is. So definitely interesting blog post to check out. [00:03:38] The next post is tracking view dependencies in PostgreSQL So this is talking about when you have a table you've created and you create a view on it. If you try dropping the table it's going to give you a dependency error because it cannot drop the table because other objects depend on it, the view. And you can't actually alter a table and drop columns from it as well because you can't drop the column because the view depends on it. So if you have the scenario where they're mentioning here, you want to go from an integer to a big int, you actually have to remove the view, make the change and then add the view back. And they mentioned some very good best practices for views here and they mentioned views are good for two things. One, they allow you to have a recurring SQL query or expression in place for easy reuse. And two, they can be used as an interface to extract from the actual table definition so that you can reorganize the tables without having to modify the interface. And they say two patterns here that they see people sometimes use that kind of causes problems if you have views within views within views. So it makes it really hard to understand what's going on or denormalizing the whole database into a worldview and using that for all the queries where sometimes you get things that just don't work as you expect when you're using different where conditions. So basically just keep it simple with one view and don't try to nest your views in general and then they go into how views are stored in PostgreSQL. It's not stored like a function they mentioned here where it stores the text, it actually stores a query parse tree that they mentioned here. So it actually does some work ahead of time at the create view stage. So for example using the search path at the point that the view is created and that objects referred to are used by the object ID. So you can rename objects or columns used in a view definition just as long as you don't actually change what the data type is and things or try to drop it and things of that nature. And they talk about where the dependencies are stored and then they give you an example here to test queries on how you can actually look for dependencies within a view. So if you need to be able to make a change to a table, these queries will help you identify what views are dependent upon them so that you can then adjust those in order to make the changes you need. So if you want more information about views and their dependencies and how they work, this is definitely a great blog post to check out. [00:06:07] The next post is managing PostgreSQL users and roles. And this is from AWS Amazon.com site. Now, they have information related to running PostgreSQL on AWS, but as a general overview of roles and users and essentially using groups which are essentially just roles, it has a pretty easy to understand methodology that I kind of liked. So first they make the clarifications between users, groups and roles and that roles are essentially all of them. And generally roles that are users can basically log in. So for example, you can create a user with its password and it's the equivalent of creating a role as long as it's with login, so it can log in. So it's a role that can log in. That's the only difference between these two statements. That's essentially what a user is. Now, in terms of their recommendation, they also talk about the public schema and public role and things that you would probably want to revoke, like revoke the create on schema public from public as well as revoke all on database schema from public. That's a certain recommendation they have. And then you go into granting permissions from that point. And they mentioned that when you're creating the role for accessing objects, you need to give roles to the database and then the schema and then the individual objects. So in this scenario, when you're wanting a read only role, first you create your role called Read Only. Then you grant connect on the database to this Read Only role. So you grant the database, then you grant usage on the schema, whatever the schema name is, to the Read only role. And at that point you can grant select on the individual tables to the Read Only role. Now, if you don't want to do individual tables, they say you can do all tables. And then if you want to make sure that this happens for future tables created in the schema, you need to alter the default privileges in the schema for this role. And then they go over the same process for doing a read write role to be able to create objects if you want, as well as insert, update, delete, select on the tables. And then once you've done that creating users, you simply create the user and then grant it to a particular role or revoke it from a particular role. So I kind of like this methodology that's laid out here. And if you're wanting to update how you're using users and roles, definitely a blog post to check out. [00:08:33] The next post is how we build a scalable on premises PostgreSQL cluster. This is from Medium and the Enigma blog. They're talking about how they needed to create a PostgreSQL cluster on, I believe, a customer's on premises site. So they wanted to be able to build it up and have it be highly available. So they used these tools from the application, it goes through HAProxy and then they wanted to have a leader or primary database with two replicas and they wanted it to be highly available. So they're using Petrone and Console as a means of leader election. So they go through all the different components of it and how they set it up. Now, they don't have every single command on how they set it up, but they just discussed how they went through and set this up for their use case. So if you're interested in developing a high availability solution, maybe this blog post is of interest. I'll also reference that they did mention at the bottom here, going from a single PostgreSQL server to a cluster is not very straightforward, and that's definitely true. And they've chosen particular tools, but one that may be kind of easy was I covered in a previous episode where there's this extension called PG Auto Failover, and this seems like a pretty simple solution. Now, it's not maybe as robust as the previous solution, but it does give a relatively simple solution for doing some Ha capabilities. So again, if you're considering that, maybe look back at this post that was done in May. [00:10:09] The next post SQL Concepts from A to Z so this was a pretty interesting blog post that basically goes from A to Z and they choose particular things for A to Z. And this isn't PostgreSQL specific, but it's about the SQL language in choosing particular topics to cover for each of the first letters of the alphabet. So if you want to check out a pretty interesting blog post and kind of make sure you know what each of these areas are, a to Z, definitely one to check out. [00:10:40] The next post. Postgres optimization tips. Part One explain and analyze. Now this is a I would call a beginner level blog post. So if you already use Explain and Analyze, I don't think you'll get too much out of this. But if you're still learning query optimization, it's a good example. And they talk through about a query that they're using. In this example, they're using Ruby, it looks like in Active Records. So Ruby on Rails and how they needed to optimize a query. And they used Explain Analyze to kind of see what the query was doing and that it was spending a great deal of time doing a sequential scan and actually looking at and rewriting the query, not necessarily adding a new index. They found a way to get the same information they wanted just by rewriting the query, and it improved dramatically the execution time. So if you want a little bit more about how you could potentially use Explain and Analyze, definitely what post to check out next post is how to deploy and manage PostgreSQL on OpenShift using the Robin operator. Now, this is outside of my area of expertise because I've never used OpenShift, but if you're looking to deploy it on OpenShift, it goes through all the process to do it. So if this is of interest to you, Jeff, definitely a piece of content to check out. [00:11:59] And the last post is waiting for PostGIS three St transform and proge six. Again, not an area of expertise of mine PostGIS, but again, PostGIS Three is coming and the Continue crunchy data here continues to produce blog posts about it. So if this feature is of interest to you, definitely one to check out. [00:12:21] That does it. For this episode of Scaling Postgres, you could get links to all the content mentioned in the show notes. Be sure to head over to Scalingposgrids, where you can sign up to receive weekly notifications of each episode, or you could subscribe via YouTube or itunes. Thanks.

Other Episodes

Episode 0

March 08, 2021 00:17:16
Episode Cover

Citus Open Source, Time Series Performance, Subscripting Updates, target_session_attrs | Scaling Postgres 155

In this episode of Scaling Postgres, we discuss Citus 10 open source, time series performance in native Postgres, using subscripting for updates and new...

Listen

Episode 135

October 11, 2020 00:19:42
Episode Cover

Connection Scalability, Benchmarking Performance, Community Impact, zheap Progress | Scaling Postgres 135

In this episode of Scaling Postgres, we discuss connection scalability, benchmarking Postgres performance, the community impact of an acquisition and the progress of zheap....

Listen

Episode 207

March 20, 2022 00:17:54
Episode Cover

Autovacuum Configuration, Duplicate Key Violations, Pipelining, Tuning max_wal_size | Scaling Postgres 207

In this episode of Scaling Postgres, we discuss how best to configure autovacuum, the dangers of duplicate key violations, how to set up pipelining,...

Listen