Episode Transcript
[00:00:00] For maybe the last six months, we've talked about a lot of Postgres acquisitions that have happened. There's been Crunchy Data and Neon and AirDB. And is the reason all of this is happening Apache Iceberg? Well, we'll talk more about that and other topics, of course, but I hope you, your friends, family and co-workers continue to do well. Our first piece of content is "Postgres is the gateway drug". This is from vigi28.dev and it has a subtitle: every data platform now starts at Postgres because that's where the data starts.
[00:00:38] So I thought this was an insightful post because it kind of explains why all these acquisitions are happening. He says, basically, Postgres is leading the pack in the database field now, and it seems like everyone is using it. But for years, data infrastructure companies have been focusing on faster queries, better compression, smarter optimizers. Basically, he says, quote, move your data into our walls, pay us to store it and query it.
[00:01:07] And once it was there, the cost of switching prevented people from migrating off. But he says, quote, Apache Iceberg removed that gravity. So Iceberg is an open table format: Parquet files plus the metadata manifests that he mentions. And that data lives in Iceberg files on essentially S3 or any other cloud storage system, so it's not in anyone's data warehouse.
[00:01:32] So Snowflake, Databricks, or DuckDB can all query it. So he says, quote, the storage layer is commoditized. And now that customers don't have to stay, what are these vendors going to do about it?
[00:01:43] And that explains why all these acquisitions are happening. So, for example, ClickHouse acquired PeerDB to be able to get data from Postgres into their service, and then they launched a managed Postgres service, which they're delivering in conjunction with Ubicloud.
[00:02:02] Databricks acquired Neon, and they also launched Lakebase. And then they acquired Mooncake Labs, who mirror the Postgres WAL into Iceberg and Delta Lake in real time. Then Snowflake acquired Crunchy Data, who now provide their managed Postgres support, and they also introduced a new extension, pg_lake. So he says this quote here: three companies, same bet, own the Postgres layer. And it's interesting, last week I said open source is eating the world. Well, Postgres has basically started to outpace all the other database solutions, and now it seems Apache Iceberg is essentially doing the same thing.
[00:02:43] And he says all these three companies are basically building a Postgres-to-Iceberg bridge through their various acquisitions and introductions.
[00:02:53] So, super interesting insights, and it will be interesting to see what becomes of this in the future, but I definitely encourage you to check out this blog post.
[00:03:01] Next piece of content: "Scale 23X and CloudNativePG: Robust Self-Healing PostgreSQL on Kubernetes". This is from vyruss.org, which is Jimmy Angelakos's blog. He recently attended Scale 23X and gave a presentation on CloudNativePG.
[00:03:21] So I definitely encourage you to watch the video, or you can look at the slides down here. I always like his presentations.

Next piece of content: "Using Patroni to build a highly available Postgres cluster, part 2: Postgres and Patroni". This is from pgedge.com, and Sean is following up on his series about setting up a cluster with Patroni. In the first installment he set up etcd. In this second installment he's setting up Postgres and using Patroni to control it all, so he goes through all the different commands you would need to get that set up. The third installment, which is coming, is actually setting up the interface layer, like HAProxy, in front of it. But he says you could get started today just by using multi-host connection strings, and he shows an example of how to do that. Definitely another great piece of content to check out if you're interested.

Next piece of content: "Scaling Postgres connections with PgBouncer". This is from planetscale.com, and even though this is from PlanetScale, it's a very general post on PgBouncer: what it is (a connection pooler) and why it's important. Basically, it helps you multiplex the potentially thousands of connections you have from your application down to, hopefully, at most hundreds on the Postgres side, because each Postgres connection uses a lot of resources and takes a while to spin up. It's more efficient to put an efficient connection pooler like PgBouncer in front of all of those potentially thousands of application connections. And he describes how they've set it up at PlanetScale.
They basically have a local PgBouncer per node, and your application servers can connect directly to that primary node on port 6432, which is the default for PgBouncer, and immediately talk to your database through PgBouncer. They recommend that for all your application connections; it would only be some administrative or management connections where you'd want to talk directly to the database. Now again, even though they're talking about PlanetScale, each of these practices could be used in your own environment.
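The connection pattern above can be sketched in a few lines. This is an illustration, not PlanetScale's actual setup; the host and database names are placeholders, and only the ports (PgBouncer's default 6432 versus Postgres's default 5432) come from the discussion.

```python
# Sketch of the connection pattern described above: the application
# talks to the local PgBouncer on its default port 6432, while
# administrative sessions go straight to Postgres on 5432.
# Host, database, and user names are hypothetical placeholders.

PGBOUNCER_PORT = 6432  # PgBouncer's default listen port
POSTGRES_PORT = 5432   # direct to Postgres, for admin/management only

def dsn(host, dbname, user, port):
    """Build a simple libpq-style connection URI."""
    return f"postgresql://{user}@{host}:{port}/{dbname}"

# Application traffic goes through the pooler...
app_dsn = dsn("db-primary.internal", "app", "app_user", PGBOUNCER_PORT)
# ...while administrative work bypasses it.
admin_dsn = dsn("db-primary.internal", "app", "admin", POSTGRES_PORT)
```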
[00:05:30] So this is definitely a viable solution I've used for clients. The next option is a dedicated PgBouncer for your primary on a separate instance, which they do support. Generally that connects to the local PgBouncer and then to Postgres, although you are introducing slightly more latency. I didn't see them explain why not just go straight to Postgres, but they're basically going to the local PgBouncer on the instance. You can also do a dedicated replica PgBouncer. The next thing they talk about is that there are three pooling modes for PgBouncer. You have session pooling, which is essentially a one-to-one connection that doesn't really afford you many benefits other than maybe a faster connection time; statement pooling, which hardly anyone uses, because working with transactions would be potentially very difficult; and transaction pooling, which pretty much everyone uses, and which is what PlanetScale's PgBouncer solution offers. He talks all about the different configuration options and how to set up PgBouncer. So, for example, on a small instance where you have 50 max connections on Postgres, you could allow up to 500 client connections on PgBouncer, maybe set a default pool size of 30, max user connections of 30, and max DB connections of 40, leaving 10 Postgres connections reserved for administration.
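The sizing arithmetic above can be sketched as follows. The dictionary keys mirror PgBouncer's ini setting names, and the numbers are the small-instance example from the post, not general recommendations:

```python
# Sketch of the pool-sizing arithmetic from the small-instance example
# above. The keys mirror PgBouncer ini settings; values are the post's
# example numbers, not recommendations.

settings = {
    "max_client_conn": 500,    # app-facing connections PgBouncer accepts
    "default_pool_size": 30,   # server connections per user/db pair
    "max_user_connections": 30,
    "max_db_connections": 40,  # hard cap on connections to Postgres
}

def check_pool_sizing(pg_max_connections, max_db_connections, admin_reserve):
    """The pooler's server-side connections plus an administrative
    reserve must fit under Postgres's max_connections."""
    return max_db_connections + admin_reserve <= pg_max_connections

# 40 pooler connections + 10 reserved for admin = 50 = max_connections
assert check_pool_sizing(50, settings["max_db_connections"], admin_reserve=10)
```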
[00:06:50] Then they have a large server example, and they also talk about how you could go with PgBouncer installed on each application server, which is possible, or even independent PgBouncer instances between your application and Postgres. So this is just a great PgBouncer post explaining all the different ways you can set it up and configure it.
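To see why transaction pooling multiplexes so well, here is a toy model of the idea: many client transactions borrow a server connection only for the duration of one transaction. This is purely an illustration of the concept, not PgBouncer's actual implementation.

```python
# Toy sketch of transaction pooling: a small pool of server connections
# is shared across many client transactions, each checked out only
# while a single transaction runs. Illustration only, not PgBouncer.

from collections import deque

class TransactionPool:
    def __init__(self, size):
        self.free = deque(f"server-conn-{i}" for i in range(size))
        self.in_use = 0
        self.peak_in_use = 0

    def run_transaction(self, work):
        conn = self.free.popleft()        # borrow for one transaction
        self.in_use += 1
        self.peak_in_use = max(self.peak_in_use, self.in_use)
        try:
            return work(conn)
        finally:
            self.in_use -= 1
            self.free.append(conn)        # returned right after commit

pool = TransactionPool(size=5)
# 1000 sequential client transactions never hold more than one server
# connection at a time, so a tiny pool is plenty.
results = [pool.run_transaction(lambda conn: conn) for _ in range(1000)]
print(pool.peak_in_use)  # 1
```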
[00:07:12] Next piece of content: "Waiting for PostgreSQL 19: allow table exclusions in publications via EXCEPT TABLE". So normally when you create a publication, you have two options: you can do FOR ALL TABLES, which means every table gets added to the publication, or you can define each specific table to be added. But with Postgres 19, there's a patch where you can say all tables except for these tables. So it's a convenient way to exclude certain tables when you want to set up a publisher for essentially everything but the largest tables. If you're interested in that, you can check out how this works here.

Next piece of content: there was another episode of Postgres FM last week. This one was on PostGIS.
[00:07:56] So they had Regina Obe and Paul Ramsey on to essentially discuss PostGIS.
[00:08:02] They talked about the history, and about all the different extensions that actually make up PostGIS; they mentioned MobilityDB and pgRouting and numerous other extensions under the PostGIS umbrella. And I think this is a good example of development done right in terms of extensions, because a lot of people, when they talk about Postgres development, say, oh, we need to move these features into core. Well, when you do that, you lose velocity and probably some focus, because Postgres does one new feature release every year, whereas extensions can release new features whenever the community agrees.
[00:08:44] So that definitely gives you a more rapid development capability. I think it's a good example of an extension ecosystem that does it right. Now, I'm not very familiar with geographical information systems, but if you work with them, I'm sure you can get a lot of beneficial content out of this episode. You can listen to the episode up here or watch the YouTube video down here.
[00:09:07] Next piece of content: "An Ultimate Guide to Upgrading Your PostgreSQL Installation from 17 to 18". So he talks about the three main ways to do it. You can do a dump and restore, assuming your database is small enough. You can do pg_upgrade, and again, there are a couple of different modes: you can do it with --link or --swap, which are the fastest ways to do it. Or you can do logical replication, which for a variety of reasons is my preferred one, because it minimizes downtime, you avoid potential collation issues if those could happen, you reduce bloat, etc. There are a lot of reasons I personally like doing it that way. But the post actually shows a practical example of doing an upgrade using pg_upgrade, so it goes through the whole process and all the commands you would need to use.
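For reference, the pg_upgrade invocation at the core of that process can be sketched like this. The binary and data directory paths are hypothetical Debian-style examples, and the command list is assembled but deliberately not executed here:

```python
# Sketch: assembling a pg_upgrade invocation for a 17 -> 18 upgrade.
# Paths are illustrative assumptions (Debian-style layout); the argv
# is built but intentionally not run.

def pg_upgrade_cmd(old_bindir, new_bindir, old_datadir, new_datadir,
                   mode="--link"):
    """Return the argv for pg_upgrade. --link hard-links data files
    instead of copying them; --swap (new in Postgres 18) moves them,
    which is faster still but leaves no way to restart the old
    cluster afterwards."""
    return [
        f"{new_bindir}/pg_upgrade",
        "--old-bindir", old_bindir,
        "--new-bindir", new_bindir,
        "--old-datadir", old_datadir,
        "--new-datadir", new_datadir,
        mode,
    ]

cmd = pg_upgrade_cmd(
    "/usr/lib/postgresql/17/bin", "/usr/lib/postgresql/18/bin",
    "/var/lib/postgresql/17/main", "/var/lib/postgresql/18/main",
)
print(" ".join(cmd))
```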
[00:09:52] Next piece of content: "PGConf India 2026: PostgreSQL query tuning, a foundation every database developer should build on". This is from databaserookies.wordpress.com. Andy basically gives a framework for working through performance or SQL tuning problems, and he has the slides from the presentation he gave at PGConf India, which is definitely where most of the content is. The blog post itself is actually very short, so check that out if you're interested.
[00:10:22] Next piece of content: "Debugging RDS Proxy pinning: how a hidden JIT toggle created thousands of pinned connections". This is from richyan.com, and I've experienced this. Whenever you're moving an application behind AWS RDS Proxy, you definitely have to resolve session pinning issues if they occur. Session pinning happens when some sort of session-based command is run: that connection is then pinned to the client and can no longer be shared with the pool.
[00:10:52] So you definitely want to monitor those and eliminate them to get the most performance out of your application and RDS Proxy. Well, they were having an issue with extremely high CPU usage and relatively high lightweight lock times.
[00:11:07] They were having the out-of-memory killer activate on their database every day or two, with thousands of active connections. And if you actually look at their post here, you can see a peak of 140,000 database connections that were pinned.
[00:11:25] So that's really crazy.
[00:11:28] So they were trying to find out where this was happening, and it basically was happening within SQLAlchemy: apparently asyncpg was temporarily disabling JIT for certain commands.
[00:11:42] So essentially it was using a session command in Postgres, setting jit to off, and this command was pinning the session. So they had to stop it from doing that.
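The general shape of the problem and one common way around it can be sketched as follows. This is an illustration under stated assumptions, not the post's actual fix: RDS Proxy pins sessions when it sees session-state statements like SET, and one way to get the same GUC without an in-session SET is to pass it as a connection startup parameter (libpq's options keyword). Host and database names are placeholders.

```python
# Sketch of the pinning pattern described above. RDS Proxy pins a
# session when it sees session-state commands such as SET, which
# defeats connection multiplexing. Applying the same setting at
# connection startup avoids the in-session SET entirely.
# Host/db names are hypothetical placeholders.

# Problematic: issued after connect, on the pooled session.
pinning_statement = "SET jit = off;"

# Safer: the same GUC as a startup parameter in the DSN
# (options=-c jit=off, URL-encoded), applied before the proxy
# starts multiplexing the session.
startup_dsn = "postgresql://app@proxy.example.com/app?options=-c%20jit%3Doff"

def is_session_set(sql):
    """Crude check for the kind of statement that triggers pinning."""
    return sql.strip().lower().startswith("set ")

assert is_session_set(pinning_statement)
assert not is_session_set("SELECT 1;")
```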
[00:11:53] Once they made that modification, it dropped the number of pinned connections by more than half, they said, and you can see the graph looks much calmer now, although they're still having thousands of connections being pinned, so there's still more work to be done. He does admit they're still working on that, but check it out if you're interested.

Next piece of content: "Automating RDS Postgres to Aurora Postgres migration". This is from the Netflix Tech Blog.
[00:12:23] Now this is a migration from one AWS service to another.
[00:12:27] So some of this knowledge might not benefit everyone, but there's a lot of great information explaining how they went through the process of doing the migration that you might find interesting. So definitely encourage you to check out this blog post.
[00:12:41] Next piece of content: "We skipped the OLAP stack and built our data warehouse in vanilla Postgres". This is from Xata.io, and they basically said they built an analytics warehouse on vanilla Postgres with materialized views, pg_cron, and copy-on-write branches. I think they utilized branches because some of their service does copy-on-write. So they just introduced materialized views and pg_cron to be able to get their data warehouse set up. It looks like they were aggregating four different data series into this data warehouse to be able to answer questions, and they're basically flattening out some JSONB event data with the materialized views. It also looks like they are using pg_duckdb for certain parts of it. If you want to learn more, definitely check this out.

And the last piece of content: "Hacking workshop for April and May 2026". So they're holding a single workshop for these two months, and they're discussing Sawada's talk, "Breaking away from Freeze and Wraparound".
[00:13:44] So if you're interested in attending, definitely sign up using the form link right here. I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content discussed, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show, as well as a full transcript. Thanks. I'll see you next week.