Non-Relational Data, Contributors, Security Features, High Availability | Scaling Postgres 98

Episode 98 | January 26, 2020 | 00:13:00
Scaling Postgres


Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss storing non-relational data, recognizing contributors, Postgres security features and implementing high availability.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/98-non-relational-data-contributors-security-features-high-availability/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about non-relational data, contributors, security features and high availability. I'm Creston Jamison, and this is Scaling Postgres episode 98.

[00:00:21] All right, I hope you're having a great week. We are on episode 98, approaching episode 100, so if you have any ideas about what you think I should do for the 100th episode, please go ahead and leave a comment in the box below. Or, if you receive this via email as part of our mailing list, go ahead and hit reply on that email and let me know what you think I should do for the hundredth episode.

[00:00:46] To get started for this week, our first piece of content is "Eight Ways RDBMS Postgres Handles Non-Relational Data", from the Enterprisedb.com blog. Now, this is actually a brief synopsis of a presentation, and there's a link to the presentation slides, which I actually like a little better.

[00:01:10] The slides are on Momjian.us, Bruce Momjian's site, so you can see in more detail what was covered. The presentation talks about how Postgres stores non-relational data, meaning something apart from your typical string of text, integer, boolean or the like. He covers these areas: arrays, range types, geometry, XML, JSON, JSONB, row types (which, from my interpretation, are pretty much custom types) and character strings.

With arrays, you can store more than one value within a column or field, which goes against some of the normal forms of a relational database, but Postgres allows you to do this. He goes through different ways you can work with an array: accessing particular elements, unrolling arrays, creating them, aggregating them, things of that nature.

The second area is range types, where you have a start and an end, and ranges can also be unbounded. He shows how to query whether a value exists in a particular range, and mentions that a GiST index is the most efficient index to use with a range type. He also goes over exclusion constraints: for example, to make sure something can't be double-booked, you can add an exclusion constraint to handle that.

The third area is geometry: if you're using PostGIS, you can store geometric shapes within Postgres, and again a GiST index is the best way to index them for that purpose.

Then he talks about XML and the XPath query, showing how you can work with XML that you're storing in the database and convert different values from XML.

Then he goes into JSON, and the JSON data type is different from JSONB: it basically stores the whole JSON structure as it was given and doesn't do a lot of changes to it. He mentions that there are over 100 different JSON functions you can use to work with JSON, and he goes over different ways of loading it, presenting it, accessing it and concatenating it, as well as doing calculations.

Then he gets into JSONB. Think of it as binary JSON: it supports indexing of all keys and values for efficient retrieval of information, and it's stored in a compressed format. But it doesn't preserve the original JSON: it doesn't preserve key order or whitespace, and it retains only the last duplicate key. This is definitely something to keep in mind, depending on how you're going to be using the JSON.
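To make that last point concrete, here's a minimal sketch you can try in psql. It simply casts the same string to each type, with the results shown in comments:

    -- json stores the input text essentially as-is
    SELECT '{"b": 2,  "a": 1, "a": 3}'::json;
    -- returns: {"b": 2,  "a": 1, "a": 3}

    -- jsonb normalizes the input: whitespace is dropped, keys are
    -- reordered, and only the last duplicate key is retained
    SELECT '{"b": 2,  "a": 1, "a": 3}'::jsonb;
    -- returns: {"a": 3, "b": 2}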
He goes over and shows different examples of all of this. And of course, when you're working with JSONB, you would generally want to use a GIN index for search purposes.

Then he talks about row types, which are essentially a way of creating your own custom types. He creates a custom driver's license type that is composed of three different subtypes, as it were: a character, an integer and a date. Then he goes over character strings: how to work with them, how to query them, how to handle case sensitivity and how best to index them, following up with a discussion of full text search, using tsquery and tsvector to handle full text search capabilities. So if you're interested in this content, it's definitely a great presentation to check out.

The next post is "Contributors Team Redux", from Robert Haas at rhaas.blogspot.com. This is a follow-on to a previous blog post that we mentioned on Scaling Postgres, and it's about the difficulty of recognizing contributors to Postgres, and not just Postgres itself but also the libraries and tools associated with it. How do you weigh a contribution, differentiating between a contributor and a major contributor? And what happens if someone is a major contributor but isn't contributing that much today? It covers all the difficulty of giving credit to the people who are really helping make Postgres what it is today. He's also asking for feedback on how to make this better, so it's basically an essay on the struggles of trying to recognize people appropriately and how best to handle it. Go ahead and give it a read, and if you have any thoughts, contribute a comment or reach out to him if you so choose. Since Postgres is a community-built and community-supported database system, it's definitely a blog post to check out.

The next post is actually a series of three posts, parts one, two and three, called "Understanding Security Features in PostgreSQL", from the Highgo.ca blog. He covers different security areas of Postgres, mostly with regard to network security and roles. Part one covers listen_addresses (what addresses Postgres listens on), host-based access (how you grant people the ability to connect to the database system over the network, or even locally), how you can integrate with an LDAP server or PAM, and then roles, users and privileges and how to set those up to grant access to the system; a minimal sketch of the two core settings appears after this section.

Part two is essentially all about SSL/TLS and how public key cryptography works. It doesn't have a lot that is directly relevant to Postgres; it's all about setting up certificates and public key encryption. But it's a good thing to read before part three, which is definitely more PostgreSQL-focused: it talks about how you can prepare your TLS certificates, set up transport layer security on the server and on the client, and gives examples of how you can connect. It even talks a little bit about transparent data encryption, where data is encrypted at rest. Postgres can't handle that yet, but there is a plan to add some features in version 13 and additional features in version 14 to have it ready by then. He talks about what it is and what it does.
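To give a flavor of what part one covers, here's a minimal sketch of the two configuration files involved. The address and subnet are hypothetical, and the right rules depend entirely on your environment:

    # postgresql.conf: which interfaces Postgres listens on
    # (the default is localhost only)
    listen_addresses = '10.0.0.5'       # hypothetical server address

    # pg_hba.conf: host-based access rules; this line requires a TLS
    # connection and SCRAM password authentication from one subnet
    # TYPE    DATABASE  USER  ADDRESS       METHOD
    hostssl   all       all   10.0.0.0/24   scram-sha-256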
So if you're interested in learning more about the security aspects of Postgres, or there are particular areas where you want to increase your knowledge, it's definitely a set of three posts you may want to check out.

The next post is "How to Automate PostgreSQL 12 Replication and Failover with repmgr, Part 1". repmgr is a replication manager developed by 2ndQuadrant, and this blog post goes into a lot of detail about how to set up a high availability system using it, specifically with Postgres 12. They set up three different nodes, in what look like three different AWS Availability Zones: one will be a primary, and two standbys or replicas will follow the primary, with an election process that repmgr uses to determine whether one of the standbys should take over the job of the primary. It goes into a lot of depth on how to set this up and how to configure all the different features to get started. So if you're interested in high availability and want to potentially use repmgr, definitely check out this blog post.

A related blog post, but one relying on Kubernetes and Crunchy Data's PostgreSQL Operator, offers another way of achieving the same end: "Deploy High Availability PostgreSQL Clusters on Kubernetes by Example". They use the same concept of three different nodes, one primary and two replicas, but it's done on Kubernetes, with a load balancer in front of the systems, and the operator coordinates setting everything up so that an election ensures there is only one primary and the other replicas follow it. They use Kubernetes' distributed consensus store to handle this. So it's another way of getting high availability, and if you want to look at it from a Kubernetes angle, definitely check out this blog post from Crunchydata.com.

The next blog post is "jsonb_set_lax", from 2ndquadrant.com, and it's talking about the jsonb_set function. They say they've had some complaints that if the value argument of the function is null, the result will be null. The reason this happens is that the function is declared strict, which means it returns null if any input is null. But some people wanted slightly different behavior, so they introduced the function jsonb_set_lax, which is scheduled for version 13 of Postgres. It adds an additional parameter that lets you choose what to do with a null value: use_json_null (the default), raise_exception, return_target or delete_key. So if you use this function and want this additional functionality, definitely keep an eye out for it in version 13; a brief sketch follows below.
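As a rough illustration, here's a minimal sketch of the difference, based on the behavior described in the post, with expected results shown in comments:

    -- jsonb_set is strict: a NULL value makes the whole result NULL
    SELECT jsonb_set('{"a": 1}', '{a}', NULL);
    -- returns: NULL

    -- jsonb_set_lax adds a null_value_treatment parameter,
    -- which defaults to 'use_json_null'
    SELECT jsonb_set_lax('{"a": 1}', '{a}', NULL);
    -- returns: {"a": null}

    SELECT jsonb_set_lax('{"a": 1}', '{a}', NULL, true, 'delete_key');
    -- returns: {}

    SELECT jsonb_set_lax('{"a": 1}', '{a}', NULL, true, 'return_target');
    -- returns: {"a": 1}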
[00:10:41] The next blog post, also covering something coming in version 13 of Postgres, is "Waiting for PostgreSQL 13 - Allow vacuum command to process indexes in parallel", from Depesz.com. Basically, for the index-vacuuming portion, you can assign multiple workers to vacuum indexes in parallel, and you can either set the number explicitly or rely on max_parallel_maintenance_workers to determine how many workers will work on it. He built out an example with a test table and seven indexes, disabled autovacuum, deleted some of the data, and then ran vacuum and did timing tests. With no parallel vacuum, it ran in about 25 minutes. With two workers, it ran in eleven minutes, so more than twice as fast. With more workers, it ran a little bit slower, at about twelve minutes, and he doesn't know why it was a little longer there. This is something in development, essentially at a beta stage, so maybe it will be improved, but there was definitely a big win going from zero parallel workers to two. So it's definitely a feature to look out for in the upcoming Postgres 13.

[00:12:07] And the last post is "pgwatch2 v1.7.0 released". pgwatch2 is a means of monitoring your PostgreSQL installation, so if you use it, or want to consider using it, they've recently released a new version, adding log parsing, a recommendations engine, real-time stats tracking and some new dashboards. So if you're interested in learning more about it, definitely check out this blog post and the tool itself.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 219

June 13, 2022 00:12:34

Out-Of-Cycle Release, Should You Upgrade, Postgres Survey, Automatic Indexing | Scaling Postgres 219

In this episode of Scaling Postgres, we discuss a Postgres 14 out-of-cycle release next week, whether you should upgrade, the state of Postgres survey...


Episode 102

February 24, 2020 00:12:44

UUID Use Cases, pg_settings, Automated Restore, Parallel Future | Scaling Postgres 102

In this episode of Scaling Postgres, we discuss the use cases for UUIDs, using pg_settings, setting up an automated restore and the future of...


Episode 204

February 27, 2022 00:18:42

Optimizing Trigram Search, Replication Review, Logical Improvements, Timescale Investment | Scaling Postgres 204

In this episode of Scaling Postgres, we discuss optimizing trigram searches, a review of Postgres replication, improvements to logical replication and a significant Timescale...
