Episode Transcript
[00:00:00] In this episode of Scaling Postgres we talk about Chaos Order, pub sub, petabyte scale and PG bouncer connections. I'm creston, Jameson. And this is scaling postgres episode 90.
[00:00:20] Alright, I hope you're having a great week. Our first piece of content is that PostgreSQL versions have been released. So twelve point 111.610.9 and some of the nine series have all been released and it's predominantly bug fixes and improvements so not really any security issues. And also they did mention in this coming February PostgreSQL version 9.4 will be End of Life. When you get an opportunity, go ahead and upgrade your version. The next post is chaos. Order randomizing queries to uncover order dependency. And this is from Simplethread.com and basically it's talking about the concept that unless you specify an order, there's no guarantees as to what order in which you're going to get returned data from an SQL query. So for example, this first query we are asking for one user from the users table, there's no guarantee that you're going to get ID One or ID five. You could get ID three, you have no idea. Now in a development environment maybe you're going to get it to see some consistency with regard to this, but there's no guarantee what you're going to be returning. Therefore you should always use an order by. For example if you have an incremental ID such as this, maybe you want to order by the ID or if there's a date and timestamp, maybe you want to order by the date or timestamp that's available. But this blog post goes into a way to actually insert randomness into your tables to be able to sleuth out potential errors that could be caused by not using order by for all your queries. So this imagine a very large code base and you're wanting to say is it resilient to these changes in order by potentially. So he has this PostgreSQL function that actually adds a random column and then you can cluster this table by this random column to change the actual order of the table because clustering actually changes the physical order of the rows on disk. So if you order by this random value you get to see when you do select all from users that the ID is now essentially randomized. Now if you do new inserts and things of that nature you're going to have to reorder it again. But still this is a tool to test and make sure that your applications and really all your queries are using order buys when necessary to ensure the proper orders being returned. So if you're interested in learning more about that, definitely a blog post to check out. The next post is system design hack. Postgres is a great Pub sub and job server. So this is from Layerci.com and it's talking about how they've chosen to use PostgreSQL as their Pub sub server and even potentially a job server. And it says you have a lot of options. So they mentioned Kafka roberta MQ. Redis Pub sub Cloud Solutions But you can use Pub sub with Postgres. And how they've specifically used it is that they have CI workflows, continuous integration workflows they run through and they have a certain status. So an API creates a CI run that has a status of new, and then a worker grabs that new piece of work and changes the status to init and then follows up with the status of running. So they show here where they have created their own type with specified states as an enum and all the different statuses that it can be. New, initializing, initialized running, success error. And then they have a CI jobs table. And on an API call, a new row is inserted into this table. And then a worker claims a job by doing an update, setting the status to initialized. And they're choosing the IDs and using for update skip locked to do it. So they're skipping those rows that are already locked. And then they have a function that actually publishes to the CI Job status channel once the status has been inserted or updated as a part of the trigger on that table. And then they show the client code that actually consumes or monitors this channel that then picks up the job to do the work. And they have a graph here that says an API calls doesn't insert into Postgres that fires the trigger, that notifies the Postgres channel that they've created. And then the innet worker is notified by that channel that it's listening on and it claims the row with an update, setting the status to initializing. And then work continues from there. And then they go into some of the other benefits. So if you're interested in using Postgres as a Pub sub platform, definitely a blog post to check out.
[00:05:07] The next post is architecting Petabyte scale analytics by scaling out postgres on Azure with the Citus extension. This is from the techcommunity Microsoft.com. Now there seems to be more so of a marketing piece describing the Cytus extension, but this kind of goes into the story of the Windows Update platform where they release all of their Windows patches for their operating systems and the analytics that they use to track it. And this was before Cytus became a part of Microsoft, and it describes how they were looking for a solution for their analytics and how they identified the Cytus extension and PostgreSQL as a potential solution for the analytical problems that they were dealing with. So it definitely talks a lot about the business case and kind of how they chose this solution. And of course they eventually purchased them. So clearly it was definitely working out. But it goes into some of the architecture that they use for doing the analytics for the Windows updates along with the Citus cluster. And then they're also talking about building it on Microsoft Azure since it's now supported on azure now. So definitely an interesting piece of content to check out and read over.
[00:06:21] The next post is actually a YouTube video and it's called Discover What's New in Postgres and Advanced Server Twelve. And this is from the Enterprise DB YouTube channel. And this is a webinar that discusses some of the new features in Postgres Twelve. Predominantly these eight listed here partitioning improvements, b tree improvements, multicolumn most common value statistics inline many CTE queries, prepared plan control just in time, compilation checksum control and re index concurrently. And I believe most of these favorite features are from Bruce momgm. So if you want to find out more and get a discussion about it in a video format, definitely a presentation you can check out.
[00:07:06] The next post is PG Bouncer connection Pooling what to do when persistent connectivity is lost. Now. This is from Enterprise DB. It's the second post in the PG Bouncer series where they did an install of it. So this is saying what happens when you do a query again through PG Bouncer and you bring down the connection for a minute. And basically what happens is it hangs. But eventually once the network interface is back up, it continues and does the query. But what happens if you actually restart the database, not just bring down the network? Well then you're going to get a fatal error that mentions the server connection crashed, server closed the connection unexpectedly. Now they say one way to get around this is you can actually change the pool mode of PG Bouncer to transactional. So it only presents this message in the cases where you're using Session Pooling. Now they do say that they have a server reset query that you can adjust if you're wanting to use Session Pooling. But most use cases I've seen of PG Bouncer are that people want to use Transaction pooling, but with a Transaction pooling enabled you'll see that you won't get that error message anymore. So we did a query, did a restart of the server and then the next query there is no error message.
[00:08:23] The next piece of constant is also related to PG Bouncer and it's webinar use SSL with PostgreSQL and PG Bouncer follow up. And this is a webinar given by secondquadron.com and it is pretty comprehensive webinar about an hour in duration that talks about using SSL with PostgreSQL. About the first 20 minutes are general SSL teachings explaining the purpose of SSL and certificates and keys and things of that nature. And it's around the 20 mark that they start getting into some of the PostgreSQL configurations. And then around about the 45 minutes mark they talk about PG Bouncer specifically and configuring that to use with SSL as well. So if you're interested in that type of content, definitely a webinar to check out and you can just register for using the link right here.
[00:09:14] The next post is waiting for PostgreSQL 13. So they're already on working on 13 and it's introduced the force option for the Drop database command. So before, if you try to drop a database and you have a connection active, it won't let you. Personally, I kind of like that as a protection. But now they offer the option to use force to do it. So you can do a Drop database database name with force and it will go ahead and drop the connections and then drop the database. So another feature coming with PostgreSQL version 13 and another feature is waiting for PostgreSQL 13 allows sampling of statements depending on duration. So this enables two new configuration options log min duration sample and log statement sample rate. So the setting logman duration statement has a higher priority when compared to the log min duration sample. And whether that allows you to do as it mentions here is that you can use these new settings to, for example, log, as he says here, log 10% of queries over 100 milliseconds and then all queries above 500 milliseconds. So it gives you more granularity with regard to how you want to sample and track slow queries. So definitely a great feature addition for minimizing log files. And both of these two posts waiting for PostgreSQL 13 are from Dep.
[00:10:42] The next post is OpenStreetMap to PostGIS the basics. This is from Cybertechn Postgresql.com and this blog post describes how to import OpenStreetMap data into PostGIS for the purposes of visualization and analytics. So if you're interested in doing that, definitely a blog post to check out. And lastly, st underscore subdivide all the things this is from the Paul Ramsay blog at Cleverelephant CA and he's talking about the scale of a geospatial data and how tracking all the points of Canada is represented with over a 68,000 point multi polygon. And in terms of analysis, subdividing that using the St subdivide function enables speeding things up. So if you're interested in doing that, definitely a blog post to check out.
[00:11:38] That does it. For this episode of Scale Postgres, you can get links to all the content mentioned in the show Notes. Be sure to head over to Scalingposgres.com where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks.