Account Configuration, Pricing, System Identifier, HyperLogLog

[00:00:00] In this episode of Scaling Postgres, we talk about account configuration, pricing system identifier and hyperlog log. I'm Kristen Jameson, and this is Scaling. Postgres episode 96 one. [00:00:21] All right, I hope everyone's having a great week. Our first piece of content is my go to postgres configuration for Web services and this is from the tightlycoupled IO blog. And even though he's talking about postgres configuration for web services, basically my interpretation of this is Web application because he talks about some Ruby, some Python. So basically web applications, what's the best way to configure your database for that? Now specifically he mostly focuses on account based information and some configuration settings related to accounts. And I would say this piece of content is a must read if you're a developer. I really liked a lot of the way he laid out things and the way that he's chosen to set up his database. First, he runs through a few assumptions. The first is that you have super user access to the database, although he said he did do an update for services like RDS and you're developing with a web service team. Schema changes are done as part of an admin process. So you're only going to be creating objects, views, tables, sequences, indexes through an admin process and for example, using Rails migrations or Python migrations or this Sqlize migration. Next assumption is that you just have one application postgres database and one schema that happens to be named after the database. But I imagine you could change that if you want to. So basically how he has set it up is that you have one user that owns all the database objects. So the database schema, tables, index is all owned by one user that he happened to call owner. But of course you can make that name whatever you would like. And any of these migrations that happen that change objects in the database are run by this owner. Next you have a role that is for read write users. So they can do select, update, insert, delete, but they can't create objects. Next is you have read only users so they can just do selects, but they can't do anything that the read write users or the owner can do. And essentially what your application connects to the database with is an app user that essentially belongs to the read write role. So again can't create objects. So you need to run your application migrations, if any exist, using the owner user. Now, he does make some configuration settings at the database level as well as at the user level. Defining statement timeouts to ensure statements don't run too long a lock, timeout to avoid lock queues that something is not locked for too long. Idle in transaction session timeout to make sure you don't have long running transactions that are idle in transaction. Defining connection limits. So you're not using all of your connections for your app user, but you have room to log in as owner or a super user to do certain activities and then defining the search path because he is defining a separate schema as opposed to using a public schemas. Now, he lists a fair amount of upsides that you're welcome to read through and add a few downsides but I would agree with him that the upsides outweigh the downsides. And then he goes through the whole process of how he runs it, the exact commands to set up the database as well as the users and all the different permissions and he even includes a gist of it here as well. And then at the tail end he has some considerations about how you do certain things like you need to make certain schema changes. What are some things you can do to handle local development? If you want to list users, you want to add new users using this type of scheme. You need to remove users, update their settings or change other settings. So overall this is a really great blog post and if you're a developer, I highly suggest you take a read through and see if you want to make changes or modify how you tend to set up or configure your database when working with web applications. [00:04:19] The next post is RDS pricing has more than doubled. This is from Rick Branson on Medium and he noticed this very interesting disparity between the EC two cost. This is hourly on demand costs and RDS. So their hosted database service hourly on demand costs. And then these instant types list which generation like the first generation, the price premium for them managing the database was 33%, the second generation it's at 35%, 3rd generation at 39%. But then on fourth generation it jumps up to 75%. And then at the M five this is their most recent generation of general use case instances it's 78%. In addition, I checked postgres and specifically the postgres RDS. It's even more of a premium, getting closer to about 85% for these M five S. Furthermore, looking at the R five S, which are the memory optimized, which you would think people would want a lot of memory for their postgres instance, and if you're using R five S, the price premium is closer to almost 100% for postgres. So this is something I didn't really realize that prices had. Essentially this price premium had gotten larger and larger as they've gone with new instance generations. Hence the title. Why? RDS pricing has more than doubled. So it's definitely something to keep in mind about. Do you want to rely upon a cloud service provider to manage your database? And clearly if have a large enough budget you can do that and not worry about it. But if cost is a concern, managing your own, particularly as these instance sizes grow larger and larger as you scale your application needs, you may want to consider running your own as opposed to relying on a vendor such as Amazon Web Services. So short but definitely interesting blog post is something I really wasn't aware of because I actually use just standard EC two instances and manage my own databases. I've thought about using RDS for smaller postgres instances that I use but I really haven't taken the jump yet. But seeing these price differentials, I think I may just keep managing them myself, even for small little used databases. [00:06:46] The next post is support for PostgreSQL's System Identifier in Barman. So this is from second quadrant, Postgresql.com, and they are talking about barman 20, which is a backup and restore manager for Postgres. And they're talking about the system identifier, which is a unique identifier to each postgres instance. And they've added support for it to make sure that when you're doing backups or restores that the wall files you have match the actual database files so you don't get a mismatch between them. And they say how you can find out where the System Identifier is. You can use the application PG Control data to get it and it's part of what's returned the database System Identifier. Or you can actually do a psql query, querying the PG Control system to get the System Identifier and you can even get it through the replication streaming replication protocol in looking for Identify System and it will return the System ID. And the rest of the post just talks about how Barman has implemented this to again give you those types of protections to make sure you're not trying to restore wall files from another instance with another associated backup. So this was an interesting identifier I was not aware of, so be sure to check out this blog post if you want to learn more about it. [00:08:08] The next post is postgres hyperlog log extension. And this is from Getwisdom IO and they're talking about hyperlog log, which is an extension from Citus that actually enables you to do very fast, distinct counts at an approximation. So it's not 100% accurate. And even some of the things they mention here quote the catch is you get about a 1.5% accuracy configurable, of course, by taking up more space, so it seems very inaccurate. But again, the more space you dedicate to the statistical data structure should be able to give you more accuracy. But it basically lets you do very fast distinct counts over large amounts of data and they give an example of how you can set this up and actually use the Hyperlog log extension as well as some gotchas to be aware of. So if you have a use case dealing with a lot of data needing to get distinct counts, maybe check out the Hyperlog Log extension and this blog post. The next post how to Create, Test and debug an extension written in C for PostgreSQL. So this was an interesting post that basically describes how you can create your own C extension. So it goes through over creating the extension actually testing the extension and creating a test script that will test it and then doing a debug using the Eclipse IDE. So if you're interested in getting started with writing your own extensions for PostgreSQL, definitely a blog post to check out. And this is from the Haiko CA website. [00:09:43] Now a second post from them is how to build and debug PostgreSQL Twelve using latest Eclipse IDE in Ubuntu 18 Four. So if you actually wanted to essentially build from source, this is another blog post to check out and do debugging using the Eclipse IDE. [00:10:00] The next post is PG. Qualstats Two global Indexadvisor. And this is from our juju GitHub IO. And basically he runs Power, which helps with postgres performance optimization and monitoring. And there is an index advisor that's a part of that. And it looks like he's rolling out this feature into something he's calling PG Qual Stats Two that basically does a prediction of what indexes are needed for particular queries that are hitting the database. So he basically has pulled this out of the Power product. So if you're interested in using that, he said it's not released yet, but basically in a testing phase. So if you're looking for a tool that will enable you to look at different queries and statistics to give suggestions as to what indexes you may need, definitely a blog post and a tool to check out. [00:10:54] The next post is finally fixed. Time calculations for parallel plans in Explain dep. [00:11:02] So this is actually a graphical means of presenting an Explain plan. Now, I haven't really used it, but if you want a graphical representation to use it, and apparently there was a time calculation for parallel plans that had an issue that has been resolved, so feel free to check this out if you want to get a more graphical representation of your Explain plans. [00:11:24] The last post is psql editor fighting with Sublime text under windows. So this is basically when you're using psql and wanting to define editors, there's a variable you can set, psql Editor and he's talking about using it under Windows and the best way to set it to be able to use, say, Sublime Text. So if you're interested in that, definitely a blog post to check out. [00:11:47] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show Notes. Be sure to head over to Scalingpostgres.com where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks like that's.

Show Notes

Episode Transcript

Other Episodes

Episode 357

Postgres Best Practices? | Scaling Postgres 357

Episode 315

Hacking On Postgres is Hard! | Scaling Postgres 315

Episode 364

Scaling For High Volume | Scaling Postgres 364

Account Configuration, Pricing, System Identifier, HyperLogLog | Scaling Postgres 96