DBMS of 2018, Data Anonymization, JIT, Reserved Keywords | Scaling Postgres 45

Episode 45 January 07, 2019 00:07:01
Scaling Postgres

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we review articles covering Postgres as DBMS of 2018, data anonymization, implementing JIT, and reserved keywords.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/45-dmbs-2018-data-anonymization-jit-reserved-keywords/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about DBMS of the Year, data anonymization, JIT and reserved keywords. I'm Creston Jamison, and this is Scaling Postgres, episode 45.

[00:00:21] Alright, we're still a little light on content, but we do have a few more articles this week, although not too many of them are with regard to scaling. The first article is "PostgreSQL is the DBMS of the Year 2018", and this is from the DB-Engines.com site. Basically, like last year, PostgreSQL has been chosen as the database management system of the year, with the runner-up being MongoDB and, in third place, Redis; one is essentially a NoSQL document store and the other is a key-value store. So, another good year for PostgreSQL.

[00:01:00] The next article is "8 Anonymization Strategies with PostgreSQL".

[00:01:06] Now, he's talking about data anonymization, which basically means making your data anonymous, hiding it from view, and he's working on a project called PostgreSQL Anonymizer. The purpose of this is generally for using some data in development, CI, functional testing or analytics, where you're anonymizing the data, trying to obscure or scramble it, but still have some use cases for it. He's talking about a basic people table here that has a name, address, age, salary, phone number, et cetera, and how to go about making this data more anonymous. Now, what I actually found the most beneficial is the sampling here, because there's a PostgreSQL feature called TABLESAMPLE, which enables you to get a percentage sample of your table. So if you have a production database and want to get a minimal sample of it for some sort of testing purpose, you can use this feature. They're using BERNOULLI at 20% to get 20% of the records from the table. He also mentions a tool called pg_sample, which maintains referential integrity while taking these samples. Now, this was something I hadn't really heard about before, but those are definitely a couple of interesting tools. He then goes into different techniques that he believes could be used: suppression, basically converting everything to NULL; random substitution, inserting random characters into the fields; variance, varying the data in a field by some sort of percentage amount; encryption techniques; shuffling the data around; faking and mocking; and partial suppression. Basically, various different techniques to try to obscure the data. At the end, he gives some suggestions on when to use these different types of techniques. Now, I haven't had too much of a call to anonymize data; I basically use entirely fabricated data for testing purposes or things of that nature. But if you're interested in anonymizing the data you have, this is definitely a blog post to check out for some different techniques you could potentially use.
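To make the sampling and suppression ideas concrete, here is a minimal sketch. TABLESAMPLE and BERNOULLI are built-in PostgreSQL syntax; the people table and its columns are hypothetical placeholders modeled on the article's example.

```sql
-- Hypothetical table along the lines of the article's example.
CREATE TABLE people (
    id      serial PRIMARY KEY,
    name    text,
    address text,
    age     int,
    salary  numeric,
    phone   text
);

-- Sampling: pull roughly 20% of rows with the built-in TABLESAMPLE clause.
-- BERNOULLI decides row by row, so the sample size is approximate.
SELECT * FROM people TABLESAMPLE BERNOULLI (20);

-- Suppression: blank out a sensitive column entirely.
UPDATE people SET salary = NULL;

-- Partial suppression: mask every digit of the phone number except the last four.
UPDATE people
SET phone = regexp_replace(phone, '\d(?=\d{4})', 'X', 'g');
```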
[00:03:21] The next post is "An Overview of Just-in-Time Compilation (JIT) for PostgreSQL", and this is from the Severalnines.com blog. Basically, this covers JIT, the new feature in PostgreSQL 11. It talks a little bit about why they're using LLVM and when to potentially use JIT, which is pretty important. It's mostly useful for analytical databases and for speeding up expression evaluation. So if you have to do a lot of calculations in an analytical database, that's probably where JIT is going to give you the most benefit; for an OLTP or transactional workload, maybe not so much. Note that you first have to make sure PostgreSQL is compiled with that support built in, and then you actually have to enable it and potentially adjust the cost settings to be able to use JIT (there's a sketch of this after the transcript). So if you think this is a potential use case for you, it's definitely a blog post to check out, because they go over how to set it up and the different results they get from using it.

[00:04:26] The next blog post is "Keyword Mess", and this is from the Cybertec-postgresql.com blog; basically, it talks about reserved keywords. It starts off with an example: once, he received a mail with the question, "Can you tell me why I can't select the column references?" That's because references is a reserved keyword, and you potentially have to use double quotes around it. I myself have encountered this where, say, a column is named from or to, and you have to use double quotes because those are reserved keywords in SQL statements (see the sketch after the transcript). But this post goes into much more depth, identifying the different sources and how these reserved keywords are set up. So if that is something of interest to you, it's definitely a blog post to check out.

[00:05:13] The next post is "Fun with SQL: Self Joins", and this is from the Citusdata.com blog. Generally, you would use self joins when you have some sort of relationship within a table, like a manager and the person they're managing. In this example, they're using a gift exchange, so somebody needs to get a gift for someone else. The way you would do this query is to join the table to itself and give each reference an alias in your query (sketched below, after the transcript). So if you want to brush up on what self joins are, this is definitely a blog post to check out.

[00:05:45] The last post is "PostgreSQL Replication for Disaster Recovery", and this is from the Severalnines.com blog. It talks about how you could handle disaster recovery with PostgreSQL in terms of having a master database with a replica, and how you can promote that replica in case something happens with the primary database. It also discusses point-in-time recovery and how you can set that up. So if this is something of interest to you, it's definitely a blog post to check out. I also encourage you to check out ScalingPostgres.com, because we have some tutorials I've done that cover doing a backup and point-in-time recovery for your system, setting up streaming replication, and using replication slots. These are video tutorials, so if you're interested in that content, just check out the tutorials at ScalingPostgres.com.

[00:06:38] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to ScalingPostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
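As a companion to the JIT overview above, here is a minimal sketch of checking for and enabling JIT on PostgreSQL 11. The jit and jit_above_cost settings are real server parameters; the query and the people table are the hypothetical placeholders from the earlier sketch, and the cost value is only an illustration.

```sql
-- JIT is only available if the server was built with LLVM support;
-- the pg_config view shows the build's configure options.
SELECT setting FROM pg_config WHERE name = 'CONFIGURE';

-- Enable JIT for the current session.
SET jit = on;

-- Optionally lower the threshold at which the planner considers JIT
-- (the default jit_above_cost is 100000).
SET jit_above_cost = 50000;

-- When JIT was used, EXPLAIN ANALYZE output includes a "JIT:" section.
EXPLAIN (ANALYZE) SELECT sum(salary * 1.05) FROM people;
```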
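For the reserved keyword issue, a quick sketch with hypothetical table and column names showing why the double quotes are needed:

```sql
-- FROM, TO and REFERENCES are reserved words, so columns with these
-- names must be double-quoted when used as identifiers.
CREATE TABLE trips (
    id           serial PRIMARY KEY,
    "from"       text,
    "to"         text,
    "references" text
);

-- Unquoted, this would be a syntax error; quoted, it works fine.
SELECT "from", "to", "references" FROM trips;
```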
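And for the self join discussion, a minimal sketch of the gift exchange idea, joining a table to itself under two aliases; all names here are placeholders rather than the ones used in the Citus post.

```sql
-- Each participant gives a gift to another participant in the same table.
CREATE TABLE participants (
    id       serial PRIMARY KEY,
    name     text,
    gives_to int REFERENCES participants (id)
);

-- Join the table to itself, with an alias for each role.
SELECT giver.name    AS giver,
       receiver.name AS receiver
FROM participants AS giver
JOIN participants AS receiver ON receiver.id = giver.gives_to;
```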

Other Episodes

Episode 198

January 16, 2022 00:14:46

Monitoring Progress, More SQL, Replication Slot Failover, Postgres Contributors | Scaling Postgres 198

In this episode of Scaling Postgres, we discuss how to monitor DML & DDL progress, using more SQL, one way to handle replication slot...


Episode 153

February 21, 2021 00:12:24

Int to BigInt, Resistance is Futile, Optimizer Statistics, Advanced Query Optimization | Scaling Postgres 153

In this episode of Scaling Postgres, we discuss how to convert an integer primary key to a bigint, how container resistance is futile, how...


Episode 244

December 04, 2022 00:14:59

Index Merge vs Composite, Transparent Column Encryption, Trusted Language Extensions | Scaling Postgres 244

In this episode of Scaling Postgres, we discuss merging indexes vs. a composite index, implementing transparent column encryption, developing trusted language extensions, and reviewing...
