Hello Babelfish, Planner Deconstruction, Exist & Not Exist, Fun With SQL | Scaling Postgres 190

Episode 190 November 08, 2021 00:12:32
Hello Babelfish, Planner Deconstruction, Exist & Not Exist, Fun With SQL | Scaling Postgres 190
Scaling Postgres
Hello Babelfish, Planner Deconstruction, Exist & Not Exist, Fun With SQL | Scaling Postgres 190

Nov 08 2021 | 00:12:32

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss the open sourcing of Babelfish, deconstructing the Postgres planner, when to avoid exist & not exist and having fun with SQL.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/190-hello-babelfish-planner-deconstruction-exist-not-exist-fun-with-sql/

View Full Transcript

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about Hello BabbleFish planner deconstruction exist and not exist and fun with SQL. I'm Kristen Jameson, and this is scaling postgres episode 190. [00:00:23] Alright, hope you, your friends, family and coworkers continue to do well. Our first piece of content is announcing open source BabbleFish for PostgreSQL, an accelerator for SQL Server migration. This is from Babelfishpg.org and this is a follow up to announcement where AWS has developed BabbleFish, which is a way for Postgres to basically talk like a Microsoft SQL Server. So they've now open sourced the code and they've created this site to explain more about it. So basically it is line compatible with Microsoft SQL Server. So it supports the SQL syntax of Microsoft SQL Server. It supports the TSQL language. So those all are all the different constructs like store procedures, variables, exceptions, things of that nature, as well as the TDs which is SQL Server's Wire Protocol. So it basically makes Postgres Server act like a Microsoft SQL Server. Now to get started with it, it says you have to add a patch against PostgreSQL, so they have the patch here on GitHub, as well as install different extensions. One for the TSQL language, one for the TDs Wire Protocol, another one for new data types found in TSQL, as well as a specific one for money support. [00:01:47] Now there was also a tool I think called Compass that also lets you do an audit of your Microsoft SQL Server to confirm that you can indeed migrate to using Babelfish on PostgreSQL. But this is a great addition for people who are wanting to potentially migrate to PostgreSQL and you're working with Microsoft SQL Server, so definitely encourage you to check this out. There's also another post by AWS itself that is called Goodbye, Microsoft, SQL Server. Hello BabbleFish. Now this Post talks about how to get this up and working on Aurora because that's the only service that AWS provides. Where BabbleFish is working right now is on Aurora. They don't offer it for their RDS version of Postgres, which is a little bit ironic because the first page of the Post talks about many of our customers are telling us they want to move away from proprietary database vendors to avoid expensive costs and burdensome licensing terms. The irony is that you're moving to Aurora, which is a proprietary database. Now it's Postgres compatible, but it's definitely an open source and you can only get it in one place, which is AWS. So a little bit of an irony with that, but I assume eventually they're going to make this work with their PostgreSQL hosting options as well. But this walks through how to create an Aurora instance and to get it up and working to be able to talk as a Microsoft SQL Server. So if you're interested in that, you can check out these two blog posts. [00:03:18] Next piece of content, how we deconstructed the postgres planner to find indexing opportunities. This is from Pginalyze.com and this is a very interesting post where for their service, they actually went through the PostgreSQL open source code and extracted the planner because they want to get to the point where they can do automatic index recommendations with their tool. And they also open source, I think, a part of that tool where you can give it a scheme and give some index suggestions. So basically, they utilized a tool called libclang, which extracted the source code for Postgres and for the Planner. It was almost a half million lines of postgres source, and they basically got it working as an independent library and built it into their separate tool. So it basically runs the Planner utilizing information about schema and some statistics as well, which we'll talk about to give index recommendations. [00:04:20] So they have an example of a relatively complex query here. They ran it against a Postgres server, and it came out with a particular plan. And then they ran it against their PG plan function that they developed. And the plan estimation was within 1% of the actual production estimate. So that seems pretty close. And basically, they want to use this in order to do index recommendations. And the other thing that they're adding to this, which we noted was missing before, is a measure of selectivity by utilizing the PG statistics table. So with that information plus information about the schema, they should be able to get more accurate index recommendations. So this is definitely something they're actively working on and improving, and it seems like they're making some pretty good improvements to it. So if you want to learn more about this, definitely check out this blog post. [00:05:14] The next piece of content, three cases against if not exists. If exists in postgres DDL. This is from postgres AI. And when you create a table, you can say create it if it doesn't exist. Now he's saying where not to use. This is in migrations, and a migration is basically an application framework sending to the database some means to alter the schema. Like you want to create a new table, you want to add a new index. Well, typically those are done through migrations that send commands to the database to move it to the next version of the schema as far as the application framework is concerned. And usually you can roll those back if there's a problem. The problem with using if exists in this type of migration is that now you're in a relatively unknown state because you don't know if the table existed or not when you ran it. So basically he's saying this is a no no. And I definitely agree. Personally, in my application, I've never used if exists or if not exists when creating tables. Because again, you want to know if there's a failure, and if there's a failure, figure out why. And if you need to roll something back, there's always the rollback functionality in the migration tools. Typically the next one he says not to do is Drop Table if exists. So again, this can cause definite problems if you're doing as part of an application frameworks migration tool set. [00:06:37] Now I use Create a table if it doesn't exist all the time, or Drop a Table if it exists. If I'm doing tests on temporary tables like this is not for an application framework migration, but if I'm working essentially outside of that framework and I'm creating temporary tables to maybe do some other DBA work. I use Drop table if exists or Create Table if not exists frequently, but typically those are for temporary tables that only exist for a span of time for particular work I'm doing. The other area he talks about is create index concurrently if not exists. Now, this is a big no no because you could potentially try creating an index multiple times because if there's an error when creating an index, it leaves it in an invalid state there. So you could be creating multiple invalid indexes if you leave this. So definitely something you wouldn't want to do, but definitely some great recommendations. And if you want to learn more about it, you can definitely check out this blog post, the Next Piece of Content fun with SQL in Postgres Finding Revenue Accrued per Day this is from Crunchydata.com and they had a previous post where they developed a query to show the monthly recurring revenue for like a subscription service. And this one they wanted to look at revenue accrued per day. So as payments came in, increase the running total revenue. And this blog post works through how to do that and the query changes that are necessary and explains how to do it. So if you want to learn more about working with SQL, you can definitely check out this blog post. [00:08:13] The next piece of content is exploring plpgsql part Two implementing a Fourthlike Interpreter So this is from Eatonfield.com and this post is a follow on to a previous one where he takes it to the next level in terms of working with plpgsql and he develops an interpreter. So if you want to learn more about that, you can check out this blog post, the Next Piece of Content the PostgreSQL Timeline Concept this is from Haigu CA and he's talking about when working with wall files. There's the concept of the timeline and there's a timeline ID that's built into the wall file. And whenever you do a point in Time Recovery Restore or whenever you promote a replica, it starts a new timeline and it discusses that. It discusses about the history file that's relevant to tracking the timelines with regard to postgres. So if you want to learn a little bit more about how postgres works in terms of the timeline ID, definitely check out this blog post. [00:09:12] The Next Piece of Content petrone and Etcd in High Availability Environments this is from Crunchydata.com and they're talking about one way to do High Availability is working with the Petrone and Etcd tools and working together, they help you create a cluster of PostgreSQL servers that enable the system to detect errors and promote certain replicas. Now, this talks about particular failures that can happen with this kind of setup because typically you have a Petrone service running on each replica as well as an Etsy D service. And one of the things I talk about because of the heartbeat process that it constantly checks to make sure that the postgres database is up and running, you're going to want to ensure that your network has super High Availability and essentially no downtime. Otherwise you're going to start getting failures in the High Availability and you'll get failovers when you aren't expecting them. The other issue is that disk rights to etcd or any kinds of disk issues can also impact that as well. So they go into detail about things you need to watch out for, such as underresourced file systems, I O, convention, networking delays, et cetera. And also how to diagnose the system in terms of looking at different logs to find the cause of potential High Availability issues. So if you want to learn more, you can check out this blog post. [00:10:32] The Next Piece of Content can't resize your postgres kubernetes Volume no problem. This is from Crunchydata.com and they're talking about how to do volume resizes with regard to Kubernetes and how this works when running postgres, particularly with their postgres operator. So if you want to learn more, you can check out this blog post. [00:10:52] The Next Piece of Content vehicle Routing Optimization with Amazon Aurora PostgreSQL Compatible Edition this is from Amazon.com and they're talking about essentially the traveling salesman problem, where you have a particular salesman or vehicle that needs to go to certain destinations and what is the most efficient route to do that. Now this uses Aurora, but because this uses PostGIS and PG routing as well as OpenStreetMaps, you can use Postgres for this. So anywhere where they're using Aurora, you should be able to use PostgreSQL as well. So if you want to learn more about how to do that, you can check out this post. [00:11:30] The next piece of content, the PostgreSQL Person of the Week is Tatsuru Yamada. If you want to learn more about Tatsuro and his contributions to Postgres, definitely check out this blog post and the Last piece of Content we had another episode of the Rubber Duck Dev show this past Wednesday, and we discussed how much time should you spend planning versus doing in terms of your coding work. So if you're interested in that content, you can check out this episode. Our next episode, Wednesday at 08:00 p.m. Eastern Standard Time, we will be discussing how to start testing your code. If you have no or low test coverage, that does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingposgres.com, where you can sign up to receive weekly notifications of each episode. Or you can subscribe via YouTube or itunes. Thanks. [00:12:29] You subscribe.

Other Episodes

Episode 5

March 26, 2018 00:14:44
Episode Cover

Constraints, User Order, PG Terrible, Logical Upgrades | Scaling Postgres 5

In this episode of Scaling Postgres, we review articles covering database constraints, user defined ordering, how PostgreSQL is terrible, TimeScaleDB and using logical replication...

Listen

Episode 285

October 08, 2023 00:18:02
Episode Cover

Indexing Deep Dive | Scaling Postgres 285

In this episode of Scaling Postgres, we cover a deep dive into indexes from a presentation that includes a decision tree of sorts, how...

Listen

Episode 245

December 12, 2022 00:11:40
Episode Cover

ENUMs vs Check Constraints, Faceting With Roaring Bitmaps, Better Scaling, In DB Business Logic | Scaling Postgres 245

In this episode of Scaling Postgres, we discuss ENUMs vs. check constraints, querying table facets with roaring bitmaps, a better way to handle scaling...

Listen