K8s Archives | simplyblock

How to Build Scalable and Reliable PostgreSQL Systems on Kubernetes

This is a guest post by Sanskar Gurdasani, DevOps Engineer, from CloudRaft.

Maintaining highly available and resilient PostgreSQL databases is crucial for business continuity in today’s cloud-native landscape. The Cloud Native PostgreSQL Operator provides robust capabilities for managing PostgreSQL clusters in Kubernetes environments, particularly in handling failover scenarios and implementing disaster recovery strategies.

In this blog post, we’ll explore the key features of the Cloud Native PostgreSQL Operator for managing failover and disaster recovery. We’ll discuss how it ensures high availability, implements automatic failover, and facilitates disaster recovery processes. Additionally, we’ll look at best practices for configuring and managing PostgreSQL clusters using this operator in Kubernetes environments.

Why Run Postgres on Kubernetes?

Running PostgreSQL on Kubernetes offers several advantages for modern, cloud-native applications:

  1. Stateful Workload Readiness: Contrary to old beliefs, Kubernetes is now ready for stateful workloads like databases. A 2021 survey by the Data on Kubernetes Community revealed that 90% of respondents believe Kubernetes is suitable for stateful workloads, with 70% already running databases in production.
  2. Immutable Application Containers: CloudNativePG leverages immutable application containers, enhancing deployment safety and repeatability. This approach aligns with microservice architecture principles and simplifies updates and patching.
  3. Cloud-Native Benefits: Running PostgreSQL on Kubernetes embraces cloud-native principles, fostering a DevOps culture, enabling microservice architectures, and providing robust container orchestration.
  4. Automated Management: Kubernetes operators like CloudNativePG extend Kubernetes controllers to manage complex applications like PostgreSQL, handling deployments, failovers, and other critical operations automatically.
  5. Declarative Configuration: CloudNativePG allows for declarative configuration of PostgreSQL clusters, simplifying change management and enabling Infrastructure as Code practices.
  6. Resource Optimization: Kubernetes provides efficient resource management, allowing for better utilization of infrastructure and easier scaling of database workloads.
  7. High Availability and Disaster Recovery: Kubernetes facilitates the implementation of high availability architectures across availability zones and enables efficient disaster recovery strategies.
  8. Streamlined Operations with Operators: Using operators like CloudNativePG automates all the tasks mentioned above, significantly reducing operational complexity. These operators act as PostgreSQL experts in code form, handling intricate database management tasks such as failovers, backups, and scaling with minimal human intervention. This not only increases reliability but also frees up DBAs and DevOps teams to focus on higher-value activities, ultimately leading to more robust and efficient database operations in Kubernetes environments.

By leveraging Kubernetes for PostgreSQL deployments, organizations can benefit from increased automation, improved scalability, and enhanced resilience for their database infrastructure, with operators like CloudNativePG further simplifying and optimizing these processes.

List of Postgres Operators

Kubernetes operators represent an innovative approach to managing applications within a Kubernetes environment by encapsulating operational knowledge and best practices. These extensions automate the deployment and maintenance of complex applications, such as databases, ensuring smooth operation in a Kubernetes setup.

The Cloud Native PostgreSQL Operator is a prime example of this concept, specifically designed to manage PostgreSQL clusters on Kubernetes. This operator automates various database management tasks, providing a seamless experience for users. Some key features include direct integration with the Kubernetes API server for high availability without relying on external tools, self-healing capabilities through automated failover and replica recreation, and planned switchover of the primary instance to maintain data integrity during maintenance or upgrades.

Additionally, the operator supports:

  • A scalable architecture with the ability to manage multiple instances
  • Declarative management of PostgreSQL configuration and roles
  • Compatibility with Local Persistent Volumes and separate volumes for WAL files
  • Continuous backup to object stores like AWS S3, Azure Blob Storage, and Google Cloud Storage, ensuring data safety and recoverability
  • Full recovery and point-in-time recovery options from existing backups
  • TLS support with client certificate authentication
  • Rolling updates for PostgreSQL minor versions and operator upgrades
  • Synchronous replicas and HA physical replication slots
  • Replica clusters for multi-cluster PostgreSQL deployments
  • Connection pooling through PgBouncer
  • A native, customizable Prometheus metrics exporter
  • LDAP authentication support

By leveraging the Cloud Native PostgreSQL Operator, organizations can streamline their database management processes on Kubernetes, reducing manual intervention and ensuring high availability, scalability, and security in their PostgreSQL deployments. This operator showcases how Kubernetes operators can significantly enhance application management within a cloud-native ecosystem.

Here are the most popular PostgreSQL operators:

  1. CloudNativePG (formerly known as Cloud Native PostgreSQL Operator)
  2. Crunchy Data Postgres Operator (first released in 2017)
  3. Zalando Postgres Operator (first released in 2017)
  4. StackGres (released in 2020)
  5. Percona Operator for PostgreSQL (released in 2021)
  6. Kubegres (released in 2021)
  7. Patroni (not an operator itself, but a Python-based high-availability solution for PostgreSQL that several operators build on)

Understanding Failover in PostgreSQL

Primary-Replica Architecture

In a PostgreSQL cluster, the primary-replica (formerly master-slave) architecture consists of:

  • Primary Node: Handles all write operations and read operations
  • Replica Nodes: Maintain synchronized copies of the primary node’s data
Simplyblock architecture diagram of a PostgreSQL cluster running on Kubernetes with local persistent volumes
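
To see this replication relationship from inside the database, you can query the pg_stat_replication view on the primary; it lists one row per connected replica together with its replication state and WAL positions. The query below uses only standard PostgreSQL system views, so it works regardless of the operator in use.

-- Run this on the primary: one row per attached replica
SELECT client_addr, state, sent_lsn, replay_lsn
FROM pg_stat_replication;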

Automatic Failover Process

When the primary node becomes unavailable, the operator initiates the following process:

  1. Detection: Continuous health monitoring identifies primary node failure
  2. Election: A replica is selected to become the new primary
  3. Promotion: The chosen replica is promoted to primary status
  4. Reconfiguration: Other replicas are reconfigured to follow the new primary
  5. Service Updates: Kubernetes services are updated to point to the new primary
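
A simple way to observe these steps in practice is to watch the cluster pods and the operator status while a failover happens. The commands below are a sketch: the cluster name example matches the demo later in this post, and the cnpg.io/cluster pod label is an assumption based on recent CloudNativePG releases, so adjust it to whatever labels your operator version applies.

# Watch the cluster pods while the primary is replaced
kubectl get pods -l cnpg.io/cluster=example -w

# The operator reports the current primary and the promotion progress
kubectl cnpg status example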

Implementing Disaster Recovery

Backup Strategies

The operator supports multiple backup approaches:

1. Volume Snapshots

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql-cluster
spec:
  instances: 3
  backup:
    volumeSnapshot:
      className: csi-hostpath-snapclass
      enabled: true
      snapshotOwnerReference: true
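
With volume snapshots enabled on the cluster, an on-demand backup can then be requested declaratively. The sketch below assumes a reasonably recent CloudNativePG release (volume snapshot backups require operator support and a CSI driver with snapshot capability); the resource name is a placeholder.

apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: postgresql-cluster-snapshot-backup
spec:
  method: volumeSnapshot
  cluster:
    name: postgresql-cluster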

2. Barman Integration

spec:
  backup:
    barmanObjectStore:
      destinationPath: 's3://backup-bucket/postgres'
      endpointURL: 'https://s3.amazonaws.com'
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY

Disaster Recovery Procedures

  1. Point-in-Time Recovery (PITR)
    • Enables recovery to any specific point in time
    • Uses WAL (Write-Ahead Logging) archives
    • Minimizes data loss
  2. Cross-Region Recovery
    • Maintains backup copies in different geographical regions
    • Enables recovery in case of regional failures
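
As a rough illustration of PITR, CloudNativePG lets you bootstrap a new cluster from the object store backup and stop WAL replay at a chosen timestamp via a recovery target. The sketch below reuses the barmanObjectStore settings from the Barman example above; the cluster name and target time are placeholders, and the exact fields should be checked against the CloudNativePG recovery documentation for your version.

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql-cluster-pitr
spec:
  instances: 3
  storage:
    size: 20Gi
  bootstrap:
    recovery:
      source: origin
      recoveryTarget:
        targetTime: "2024-11-20 10:00:00+00"   # placeholder timestamp
  externalClusters:
    - name: origin
      barmanObjectStore:
        destinationPath: 's3://backup-bucket/postgres'
        endpointURL: 'https://s3.amazonaws.com'
        s3Credentials:
          accessKeyId:
            name: aws-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: aws-creds
            key: ACCESS_SECRET_KEY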

Demo

This section provides a step-by-step guide to setting up a CloudNative PostgreSQL cluster, testing failover, and performing disaster recovery.

Architecture of a PostgreSQL cluster with primary and replica

1. Installation

Method 1: Direct Installation

kubectl apply --server-side -f \
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/main/releases/cnpg-1.24.0.yaml

Method 2: Helm Installation

helm repo add cnpg https://cloudnative-pg.github.io/charts
helm upgrade --install cnpg \
  --namespace cnpg-system \
  --create-namespace \
  cnpg/cloudnative-pg

Verify the Installation

kubectl get deployment -n cnpg-system cnpg-controller-manager

Install CloudNativePG Plugin

CloudNativePG provides a plugin for kubectl to manage a cluster in Kubernetes. You can install the cnpg plugin using a variety of methods.

Via the installation script

curl -sSfL \
  https://github.com/cloudnative-pg/cloudnative-pg/raw/main/hack/install-cnpg-plugin.sh | \
  sudo sh -s -- -b /usr/local/bin

If you already have Krew installed, you can simply run:

kubectl krew install cnpg

2. Create S3 Credentials Secret

First, create an S3 bucket and an IAM user with S3 access. Then, create a Kubernetes secret with the IAM credentials:

kubectl create secret generic s3-creds \
  --from-literal=ACCESS_KEY_ID=your_access_key_id \
  --from-literal=ACCESS_SECRET_KEY=your_secret_access_key
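
The IAM user only needs permissions on the backup bucket. A minimal policy sketch is shown below; the bucket name is a placeholder, and your organization may require tighter scoping (for example, restricting prefixes or adding encryption conditions).

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::your-bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}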

3. Create PostgreSQL Cluster

Create a file named cluster.yaml with the following content:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example
spec:
  backup:
    barmanObjectStore:
      destinationPath: 's3://your-bucket-name/retail-master-db'
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: ACCESS_SECRET_KEY
  instances: 2
  imageName: ghcr.io/clevyr/cloudnativepg-timescale:16-ts2
  postgresql:
    shared_preload_libraries:
      - timescaledb
  bootstrap:
    initdb:
      postInitTemplateSQL:
        - CREATE EXTENSION IF NOT EXISTS timescaledb;
  storage:
    size: 20Gi

Apply the configuration to create the cluster:

kubectl apply -f cluster.yaml

Verify the cluster status:

kubectl cnpg status example

4. Getting Access

Deploying a cluster is one thing; actually accessing it is another matter entirely. CloudNativePG creates three services for every cluster, named after the cluster name. In our case, these are:

kubectl get service
  • example-rw: Always points to the Primary node
  • example-ro: Points to only Replica nodes (round-robin)
  • example-r: Points to any node in the cluster (round-robin)
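
From any pod in the same namespace, applications connect to these services like to any other PostgreSQL host. The example below assumes the default app database and app user that CloudNativePG bootstraps (the password lives in the example-app secret, as shown later in this post).

# Read-write connection through the primary
psql "host=example-rw user=app dbname=app"

# Read-only connection, load-balanced across replicas
psql "host=example-ro user=app dbname=app"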

5. Insert Data

Create a PostgreSQL client pod:

kubectl run pgclient --image=postgres:13 --command -- sleep infinity

Connect to the database:

kubectl exec -ti example-1 -- psql app

Create a table and insert data:

CREATE TABLE stocks_real_time (
  time TIMESTAMPTZ NOT NULL,
  symbol TEXT NOT NULL,
  price DOUBLE PRECISION NULL,
  day_volume INT NULL
);

SELECT create_hypertable('stocks_real_time', by_range('time'));
CREATE INDEX ix_symbol_time ON stocks_real_time (symbol, time DESC);
GRANT ALL PRIVILEGES ON TABLE stocks_real_time TO app;

INSERT INTO stocks_real_time (time, symbol, price, day_volume)
VALUES
  (NOW(), 'AAPL', 150.50, 1000000),
  (NOW(), 'GOOGL', 2800.75, 500000),
  (NOW(), 'MSFT', 300.25, 750000);

6. Failover Test

Force a backup:

kubectl cnpg backup example

Initiate failover by deleting the primary pod:

kubectl delete pod example-1

Monitor the cluster status:

kubectl cnpg status example

Key observations during failover:

  1. Initial status: “Switchover in progress”
  2. After approximately 2 minutes 15 seconds: “Waiting for instances to become active”
  3. After approximately 3 minutes 30 seconds: Complete failover with new primary

Verify data integrity after failover through service:

Retrieve the database password:

kubectl get secret example-app -o \
  jsonpath="{.data.password}" | base64 --decode

Connect to the database using the password:

kubectl exec -it pgclient -- psql -h example-rw -U app

Execute the following SQL queries:

-- Confirm the count matches the number of rows inserted earlier; it should show 3
SELECT COUNT(*) FROM stocks_real_time;

-- Insert new data to test write capability after failover:
INSERT INTO stocks_real_time (time, symbol, price, day_volume)
VALUES (NOW(), 'NFLX', 500.75, 300000);


SELECT * FROM stocks_real_time ORDER BY time DESC LIMIT 1;

Check read-only service:

kubectl exec -it pgclient -- psql -h example-ro -U app

Once connected, execute:

SELECT COUNT(*) FROM stocks_real_time;

Review logs of both pods:

kubectl logs example-1
kubectl logs example-2

Examine the logs for relevant failover information.

Perform a final cluster status check:

kubectl cnpg status example

Confirm both instances are running and roles are as expected.

7. Backup and Restore Test

First, check the current status of your cluster:

kubectl cnpg status example

Note the current state, number of instances, and any important details.

Promote the example-1 node to Primary:

kubectl cnpg promote example example-1

Monitor the promotion process, which typically takes about 3 minutes to complete.

Check the updated status of your cluster, then create a new backup:

kubectl cnpg backup example --backup-name=example-backup-1

Verify the backup status:

kubectl get backups
NAME               AGE   CLUSTER   METHOD              PHASE       ERROR
example-backup-1   38m   example   barmanObjectStore   completed

Delete the Original Cluster then prepare for the recovery test:

kubectl delete cluster example

There are two ways to bootstrap a new cluster from another cluster in CloudNativePG (for further details, please refer to the documentation):

  • Using a recovery object store, that is a backup of another cluster created by Barman Cloud and defined via the barmanObjectStore option in the externalClusters section (recommended)
  • Using an existing Backup object in the same namespace (this was the only option available before version 1.8.0).

Method 1: Recovery from an Object Store

You can recover from a backup created by Barman Cloud and stored on supported object storage. Once you have defined the external cluster, including all the required configuration in the barmanObjectStore section, you must reference it in the .spec.bootstrap.recovery.source option.

Create a file named example-object-restored.yaml with the following content:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-object-restored
spec:
  instances: 2
  imageName: ghcr.io/clevyr/cloudnativepg-timescale:16-ts2
  postgresql:
    shared_preload_libraries:
      - timescaledb
  storage:
    size: 1Gi
  bootstrap:
    recovery:
      source: example
  externalClusters:
    - name: example
      barmanObjectStore:
        destinationPath: 's3://your-bucket-name'
        s3Credentials:
          accessKeyId:
            name: s3-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: s3-creds
            key: ACCESS_SECRET_KEY

Apply the restored cluster configuration:

kubectl apply -f example-object-restored.yaml

Monitor the restored cluster status:

kubectl cnpg status example-object-restored

Retrieve the database password:

kubectl get secret example-object-restored-app \
  -o jsonpath="{.data.password}" | base64 --decode

Connect to the restored database:

kubectl exec -it pgclient -- psql -h example-object-restored-rw -U app

Verify the restored data by executing the following SQL queries:

-- It should show 4 rows
SELECT COUNT(*) FROM stocks_real_time;
SELECT * FROM stocks_real_time;

The successful execution of these steps to recover from an object store confirms the effectiveness of the backup and restore process.

Delete the example-object-restored Cluster then prepare for the backup object restore test:

kubectl delete cluster example-object-restored

Method 2: Recovery from Backup Object

If a Backup resource is already available in the namespace in which the cluster should be created, you can specify its name through .spec.bootstrap.recovery.backup.name.

Create a file named example-restored.yaml:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-restored
spec:
  instances: 2
  imageName: ghcr.io/clevyr/cloudnativepg-timescale:16-ts2
  postgresql:
    shared_preload_libraries:
      - timescaledb
  storage:
    size: 1Gi
  bootstrap:
    recovery:
      backup:
        name: example-backup-1

Apply the restored cluster configuration:

kubectl apply -f example-restored.yaml

Monitor the restored cluster status:

kubectl cnpg status example-restored

Retrieve the database password:

kubectl get secret example-restored-app \
  -o jsonpath="{.data.password}" | base64 --decode

Connect to the restored database:

kubectl exec -it pgclient -- psql -h example-restored-rw -U app

Verify the restored data by executing the following SQL queries:

SELECT COUNT(*) FROM stocks_real_time;
SELECT * FROM stocks_real_time;

The successful execution of these steps confirms the effectiveness of the backup and restore process.

Kubernetes Events and Logs

1. Failover Events

Monitor events using:

# Watch cluster events
kubectl get events --watch | grep postgresql

# Get specific cluster events
kubectl describe cluster example | grep -A 10 Events
  • Key events to monitor:
    • Primary selection process
    • Replica promotion events
    • Connection switching events
    • Replication status changes

2. Backup Status

Monitor backup progress:

# Check backup status
kubectl get backups

# Get detailed backup info
kubectl describe backup example-backup-1
  • Key metrics:
    • Backup duration
    • Backup size
    • Compression ratio
    • Success/failure status

3. Recovery Process

Monitor recovery status:

# Watch recovery progress
kubectl cnpg status example-restored

# Check recovery logs
kubectl logs example-restored-1 -c postgres
  • Important recovery indicators:
    • WAL replay progress
    • Timeline switches
    • Recovery target status
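
Beyond the operator status and logs, PostgreSQL itself can tell you whether an instance is still replaying WAL. These are standard PostgreSQL functions, so they work regardless of the operator in use.

-- True while the instance is still in recovery, false once recovery has finished
SELECT pg_is_in_recovery();

-- Last WAL locations received and replayed, useful to follow replay progress
SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();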

Conclusion

The Cloud Native PostgreSQL Operator significantly simplifies the management of PostgreSQL clusters in Kubernetes environments. By following these practices for failover and disaster recovery, organizations can maintain highly available database systems that recover quickly from failures while minimizing data loss. Remember to regularly test your failover and disaster recovery procedures to ensure they work as expected when needed. Continuous monitoring and proactive maintenance are key to maintaining a robust PostgreSQL infrastructure.

“Everything fails, all the time.” ~ Werner Vogels, CTO, Amazon Web Services

Editorial: If you are looking for distributed, scalable, reliable, and durable storage for your PostgreSQL cluster in Kubernetes, or for any other Kubernetes storage need, simplyblock is the solution you’re looking for.

 

Easy Developer Namespaces with Multi-tenant Kubernetes with Alessandro Vozza from Kubespaces

This interview is part of simplyblock’s Cloud Commute Podcast, available on YouTube, Spotify, iTunes/Apple Podcasts, Pandora, Samsung Podcasts, and our show site.

In this installment of the podcast, we’re joined by Alessandro Vozza (Twitter/X, GitHub), a prominent figure in the Kubernetes and cloud-native community, who talks about his new project, Kubespaces, which aims to simplify Kubernetes deployment by offering a namespace-as-a-service. He highlights the importance of maintaining the full feature set of Kubernetes while ensuring security and isolation for multi-tenant environments. Alessandro’s vision includes leveraging the Kubernetes API to create a seamless, cloud-agnostic deployment experience, ultimately aiming to fulfill the promises of platform engineering and serverless computing. He also discusses future trends in Kubernetes and the significance of environmental sustainability in technology.

EP16: Easy Developer Namespaces with Multi-tenant Kubernetes with Alessandro Vozza from Kubespaces

Chris Engelbert: Hello, everyone. Welcome back to the next episode of simplyblock’s Cloud Commute podcast. Today, I have another incredible guest. I know I say that every time, but he’s really incredible. He’s been around in the Kubernetes space for quite a while. And I think, Alessandro, the best way is just to introduce yourself. Who are you? What have you done in the past, and what are you doing right now?

Alessandro Vozza: Thank you for having me. Well, I’m Alessandro, yes, indeed. I’ve been around for some time in the cloud-native community. I’m Italian, from the south of Italy, and I moved to Amsterdam, where I live currently, about 20 years ago, to get my PhD in chemistry. And then after I finished my PhD, that’s my career. So I went through different phases, always around open source, of course. I’ve been an advocate for open source, and a user of open source since the beginning, since I could lay my hands on a keyboard.

That led me to various places, of course, and various projects. So I started running the DevOps meetup in Amsterdam back in the day, 10, 11 years ago. Then from there, I moved to the OpenStack project and running the OpenStack community. But when I discovered Kubernetes, and what would become the Cloud Native Computing Foundation, I started running the local meetup. And that was kind of a turning point for me. I really embraced the community and embraced the project and started working on the things. So basically what I do is organize the meetup and organize the KCDs, the Kubernetes Community Days in Amsterdam, in Utrecht, around the country. That kind of led me through a natural process to be a CNCF Ambassador, which are people that represent or are so enthusiastic about the way the Cloud Native Computing Foundation works and the community, that are naturally elected to be the face or the ambassadors for the project, for the mission.

At this moment, I still do that. It’s my honor and pleasure to serve the community, to create, to run monthly meetups and KCDs and help other communities thrive as well. So the lessons learned in the Netherlands, in the meetups and in the conferences, we try to spread them as much as possible. We are always available for other communities to help them thrive as well. So that’s been me in a nutshell. So all about community. I always say I’m an average programmer, I’m an average engineer, but where I really shine is to organize these events and to get the people together. I get a kick out of a successful event where people form connections and grow together. So that’s what drives me in my very core.

Chris Engelbert: I like how you put this. You really shine in bringing engagement to the community, helping people to shine themselves, to grow themselves. I think that is a big part of being a developer advocate or in the developer relations space in general. You love this sharing of information, helping other people to get the most out of it.

Alessandro Vozza: Actually, I used to be, or I still do play the bass, electric bass and double bass. And the bass player stays in the back next to the drummer and he creates the conditions so the other members of the band shine. So the guitar player usually stays in front, the bass player is the guy that stays back and is happy to create the foundations and cover the music to really shine. And that’s maybe my nature. So maybe it reflects from the fact that I always love playing the bass and being that guy in a band.

Chris Engelbert: I love that. That’s a great analogy. I never thought about that, but that is just brilliant. And I actually did the same thing in the past, so there may be some truth to that. So we met a few weeks ago in Amsterdam, actually at AWS Summit Amsterdam.

And I invited you because I thought you were still with the previous company, but you’re doing something new right now. So before that, you were with Solo.io, an API gateway, networking, whatever kind of thing. But you’re doing your own thing. So tell us about it.

Alessandro Vozza: Yeah. So it was a great year doing DevRel and so much fun going and speaking about service mesh, which is something that I really believe it’s going to, it’s something that everybody needs, but I know it’s a controversial, but it’s something that I really, you got to believe in it. You know, when you are a developer advocate, when you represent a company or community, the passion is important. You cannot have passion for something you don’t believe in, for something that you don’t completely embrace. And that was great. And we had so much fun for about a year or a bit more. But then I decided that I’m too young to settle, as always, like I’m only 48, come on, I have a good 10 years of engineering work to do. So I decided that I wanted to work on something else, on something mine, more, more mine, more an idea that I had, and I want to see it develop.

Filling a gap in the market and a real need for developers to have a flexible environment, environments to deploy their applications. So fulfilling the promises of platform engineering as a self-service platform to deploy applications. So the idea goes around the namespace. What is a namespace? Of course, it’s what the unit of deployment in Kubernetes really, it’s this magical place where developers can be free and can deploy their application without the control within the guard rails of whatever the system means, the cluster administrator sets.

But developers really love freedom. So developers don’t want to have to interact even with the sysops or sysadmins. In fact, developers love Heroku. So Heroku, I think, is the hallmark of developer experience where you just can deploy whatever you want, all your code, all your applications in a place and it’s automatically exposed and you can manage by yourself everything about your application.

I want to reproduce that. I want to get inspired by that particular developer experience. But because I love Kubernetes, of course, and because I really believe that the Kubernetes APIs are the cornerstone, the golden standards of cloud-native application deployment. So I want to offer the same experience but through the Kubernetes API. So how you do that, and that’s, of course, like this evolving product, me and a bunch of people are still working on, define exactly what does it mean and how it’s going to work. But the idea is that we offer namespace-as-a-service. What really matters to developers is not the clusters, is not the VMs or the networks or all the necessary evil that you need to run namespaces. But what really matters is the namespace, is a place where they can deploy their application. So what if we could offer the best of both worlds, kind of like the promises of serverless computing, right? So you are unburdened by infrastructure. Of course, there is infrastructure somewhere, the cloud is just somebody else’s computer, right? So it’s not magic, but it feels like magic because of the clever arrangement of servers in a way that you don’t see them, but they are still there.

So imagine a clusterless Kubernetes. The experience of Kubernetes, the API really, so all the APIs that you learn to love and embrace without the burden of infrastructure. That’s the core idea.

Chris Engelbert: So that means it’s slightly different from those app platforms like Fargate or what’s the Azure and GCP ones, Cloud Run and whatever. So it’s slightly different, right? Because you’re still having everything Kubernetes offers you. You still have your CRDs or your resource definitions, but you don’t have to manage Kubernetes on its own because it’s basically a hosted platform. Is that correct?

Alessandro Vozza: Yeah. So those platforms, of course, they are meant to run single individual application pods, but they don’t feel like Kubernetes. I don’t understand. For me, because I love it so much, I think developers love to learn also new things. So developers will love to have a Kubernetes cluster where they can do what they like, but without the burden of managing it. But this CloudRun and ACI and Fargate, they are great tools, of course, and you can use them to put together some infrastructure, but they’re still limiting in what you can deploy. So you can deploy this single container, but it’s not a full-fledged Kubernetes cluster. And I think it’s still tripling in a way that you don’t have the full API at your disposal, but you have to go through this extra API layer. It’s a bespoke API, so you got to learn Cloud Run, you got to learn ACI, you got to learn Fargate, but they are not compatible with each other. They are very cloud specific, but a Kubernetes API is cloud agnostic, and that’s what I want to build.

What we seek to build is to have a single place where you can deploy in every cloud, in every region, in some multi-region, multi-cloud, but through the same API layer, which is the pure and simple Kubernetes API.

Chris Engelbert: I can see there’s two groups of people, the ones that say, just hide all the complexity from Kubernetes. And you’re kind of on the other side, I wouldn’t say going all the way, like you want the complexity, but you want the feature set, the possibilities that Kubernetes still offers you without the complexity of operating it. That’s my feeling.

Alessandro Vozza: Yeah, the complexity lies in the operation, in the upgrades, the security, to properly secure a Kubernetes cluster, it takes a PhD almost, so there’s a whole sort of ecosystem dedicated to secure a cluster. But in Kubespaces, we can take care of it, we can make sure that the clusters are secure and compliant, while still offering the freedom to the developers to deploy what they need and they like. I think we underestimate the developers, so they love to tinker with the platform, so they love freedom, they don’t want the burden, even to interact with the operation team.

And so the very proposal here is that you don’t need an operation team, you don’t need a platform engineering team, it’s all part of the platform that we offer. And you don’t even need an account in Azure or AWS, you can select which cloud and which region to deploy to completely seamlessly and without limits.

Chris Engelbert: Okay, so that means you can select, okay, I need a Kubernetes cluster namespace, whatever you want to call it, in Azure, in Frankfurt or in Western Europe, whatever they call it.

Alessandro Vozza: Yeah. Okay, so yeah, it is still a thing, so people don’t want to be in clouds that don’t trust, so if you don’t want to be in Azure, you should not be forced to. So we offer several infrastructure pieces, clusters, even if the word cluster doesn’t even appear anywhere, because it’s by design, we don’t want people to think in terms of clusters, we want people to think in terms of namespaces and specifically tenants, which are just a collection of namespaces, right? So it’s a one namespace is not going to cut it, of course, you want to have multiple to assign to your teams, to group them in environments like that, prod or test, and then assign them to your team, to your teams, so they can deploy and they’re fun with their namespaces and tenants.

Chris Engelbert: Yeah, I think there’s one other thing which is also important when you select a cloud and stuff, you may have other applications or other services already in place, and you just want to make sure that you have the lowest latency, you don’t have to pay for throughput, and stuff like that. Something that I always find complicated with hosted database platforms, to be honest, because you have to have them in the same region somehow.

Alessandro Vozza: Yeah, that’s also a political reason, right? Or commercial reason that prevents you from that.

Chris Engelbert: Fair, fair. There’s supposed to be people that love Microsoft for everything.

Alessandro Vozza: I love Microsoft, of course, been there for seven years. I’m not a fanboy, maybe I am a little, but that’s all right. Everybody, that’s why the world is a beautiful place. Everybody is entitled to his or her opinion, and that’s all right.

Chris Engelbert: I think Microsoft did a great job with the cloud, and in general, a lot of the changes they did over the last couple of decades, like the last two decades, I think there are still the teams like the Office and the Windows team, which are probably very enterprise-y still, but all of the other ones. For me specifically, the Java team at Microsoft, they’re all doing a great job, and they seem to be much easier and much more community driven than the others.

Alessandro Vozza: I was so lucky because I was there, so I saw it with my own eyes, the unfolding of this war machine of Microsoft. There was this tension of beating Amazon at their own game. Seven years ago, we had this mission of really, really demonstrating that Microsoft was serious about open source, about cloud, and it paid off, and they definitely put Microsoft back on the map. I’m proud and very, very grateful to be here. You have been there, Microsoft joining the Linux Foundation, the Cloud Native Computing Foundation really being serious about Cloud Native, and now it works.

Chris Engelbert: I agree. The Post-Balmer era is definitely a different world for Microsoft. All right, let’s get back to Kubespaces, because looking at the time, we’re at 17. You said it’s, I think it’s a shared resource. You see the Kubernetes as a multi-tenant application, so how does isolation work between customers? Because I think that is probably a good question for a lot of security-concerned people.

Alessandro Vozza: Yeah, so of course, in the first incarnation would be a pure play SaaS where you have shared tenants. I mean, it’s an infrastructure share among customers. That’s by design the first iteration. There will be more, probably where we can offer dedicated clusters to specific customers. But in the beginning, it will be based on a mix of technologies between big cluster and Firecracker, which ensure better isolation of your workload. So it is indeed one piece of infrastructure where multiple customers will throw their application, but you won’t be able to see each other. Everybody gets his own API endpoint for Kubernetes API, so you will not be able. RBAC is great, and it works, of course, and it’s an arcane magic thing and it’s arcane knowledge. Of course, to properly do RBAC is quite difficult. So instead of risking to make a mistake in some cluster role or role, and then everybody can see everything, you better have isolation between tenants. And that comes with a popular project like big cluster, which has been already around for five years. So that’s some knowledge there already.

And even an other layer of isolation, things like Kata Container and Firecracker, they provide much better isolation at the container runtime level. So even if you escape from the container, from the jail of the container, you only can see very limited view of the world and you cannot see the rest of the infrastructure. So that’s the idea of isolating workloads between customers. You could find, of course, flaws in it, but we will take care of it and we will have all the monitoring in place to prevent it, it’s a learning experience. We want to prove to ourselves first and to customers that we can do this.

Chris Engelbert: Right. Okay. For the sake of time, a very, very… well, I think because you’re still building this thing out, it may be very interesting for you to talk about that. I think right now it’s most like a one person thing. So if you’re looking for somebody to help with that, now is your time to ask for people.

Alessandro Vozza: Yeah. If the ideas resonate and you want to build a product together, I do need backend engineers, front-end engineers, or just enthusiastic people that believe in the idea. It’s my first shot at building a product or building a startup. Of course, I’ve been building other businesses before, consulting and even a coworking space called Cloud Pirates. But now I want to take a shot at building a product and see how it goes. The idea is sound. There’s some real need in the market. So it’s just a matter of building it, build something that people want. So don’t start from your ideas, but just listen to what people tell you to build and see how it goes. So yeah, I’ll be very happy to talk about it and to accept other people’s ideas.

Chris Engelbert: Perfect. Last question, something I always have to ask people. What do you think will be the next big thing in Kubernetes? Is it the namespace-as-a-service or do you see anything else as well?

Alessandro Vozza: If I knew, of course, in the last KubeCon in Paris, of course, the trends are clear, this AI, this feeding into AI, but also helping AI thrive from Cloud Native. So this dual relationship with the Gen AI and the new trends in computing, which is very important. But of course, if you ask people, there will be WebAssembly on the horizon, not replacing containers, but definitely becoming a thing. So there are trends. And that’s great about this community and this technologies that it’s never boring. So there’s always something new to learn. And I’m personally trying to learn every day. And if it’s not WebAssembly, it’s something else, but trying to stay updated. This is fun. And challenges your convention, your knowledge every day. So this idea from Microsoft that I learned about growth mindset, what you should know now is never enough if you think ahead. And it’s a beautiful thing to see. So it’s something that keeps me every day.

Now I’m learning a lot of on-premise as well. These are also trying to move workloads back to the data centers. There are reasons for it. And one trend is actually one very important one. And I want to shout out to the people in the Netherlands also working on it is green computing or environmental sustainability of software and infrastructure. So within the CNCF, there is the Technical Advisory Group environmental sustainability, which we’re collaborating with. We are running the environmental sustainability week in October. So worldwide events all around getting the software we all love and care to run greener and leaner and less carbon intense. And this is not just our community, but it’s the whole planet involved. Or at least should be concerned for everybody concerned about the future of us. And I mean, I have a few kids, so I have five kids. So it’s something that concerns me a lot to leave a better place than I found it.

Chris Engelbert: I think that is a beautiful last statement, because we’re running out of time. But in case you haven’t seen the first episode of a podcast, that may be something for you because we actually talked to Rich Kenny from Interact and they work on data center sustainability, kind of doing the same thing on a hardware level. Really, really interesting stuff. Thank you very much. It was a pleasure having you. And for the audience, next week, same time, same place. I hope you’re listening again. Thank you.

Alessandro Vozza: Thank you so much for having me. You’re welcome.

How to choose your Kubernetes Postgres Operator?

A Postgres Operator for Kubernetes eases the management of PostgreSQL clusters inside the Kubernetes (k8s) environment. It watches for changes (additions, updates, deletes) to PostgreSQL-related CRDs (custom resource definitions) and applies them to the running clusters accordingly.

Therefore, a Postgres Operator is a critical component of the database setup. In addition to the “simply” management of the Postgres cluster, it often also provides additional integration with external tools like pgpool II or pgbouncer (for connection pooling), pgbackrest or Barman (for backup and restore), as well as Postgres extension management.

Critical components should be chosen wisely. Oftentimes, it’s hard to exchange one tool for another later on. Additionally, depending on your requirements, you may need a tool with commercial support available, while others prefer the vast open source ecosystem. If both, even better.

But how do you choose your Postgres Operator of choice? Let’s start with a quick introduction of the most common options.

Stolon

Stolon is one of the oldest operators available. Originally released in November 2015, it even predates the term Kubernetes Operator.

The project is well known and has over 4.5k stars on GitHub. However, the age shows. Many features aren’t cloud-native, and it doesn’t support CRDs (custom resource definitions), so configuration doesn’t follow Kubernetes conventions. All changes are handled through a command line tool. Apart from that, the last release, 0.17.0, is from September 2021, and there isn’t really any new activity on the repository. You can read that as “extremely stable” or “almost abandoned”. I guess both views on the matter are correct. Anyhow, from my point of view, the lack of activity is concerning for a tool that’s potentially being used for 5-10 years, especially in a fast-paced world like the cloud-native one.

Personally, I don’t recommend using it for new setups. I still wanted to mention it though since it deserves it. It did and does a great job for many people. Still, I left it out of the comparison table further below.

CloudNativePG

CloudNativePG is the new kid on the block, or so you might think. Its official first commit happened in March 2022. However, CloudNativePG was originally developed and sponsored by EDB (EnterpriseDB), one of the oldest companies built around PostgreSQL.

As the name suggests, CloudNativePG is designed to bring the full cloud-native feeling across. Everything is defined and configured using CRDs. No matter if you need to create a new Postgres cluster, want to define backup schedules, configure automatic failover, or scale up and down. It’s all there. Fully integrated into your typical Kubernetes way of doing things. Including a plugin for kubectl.
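
To give a feel for how little is needed, a minimal Cluster resource looks roughly like the sketch below (the name and storage size are placeholders); everything else, from failover to service creation, is handled by the operator.

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-demo
spec:
  instances: 3
  storage:
    size: 10Gi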

On GitHub, CloudNativePG skyrocketed in terms of stargazers and collected over 3.6k stars over the course of its lifetime. And while it’s not officially a CNCF (Cloud Native Computing Foundation) project, the CNCF stands pretty strongly behind it. And not to forget, the feature set is on par with older projects. No need to hide here.

All in all, CloudNativePG is a strong choice for a new setup. Fully in line with your Kubernetes workflows, feature rich, and a strong and active community.

Editor’s note: Btw, we had Jimmy Angelakos as a guest on the Cloud Commute podcast, and he talks about CloudNativePG. You should listen in.

Crunchy Postgres Operator (PGO)

PGO, or the Postgres Operator from Crunchy Data, has been around since March 2017 and is a very beloved choice by a good chunk of people. On GitHub, PGO has over 3.7k stars, an active community, quick response times to issues, and overall a pretty stable timeline activity.

Postgres resources are defined and managed through CRDs, as are Postgres users. Likewise, PGO provides integration with a vast tooling ecosystem, such as patroni (for automatic failover), pgbackrest (for backup management), pgbouncer (connection pooling), and more.

A good number of common extensions are provided out of the box, however, adding additional extensions is a little bit more complicated.

Overall, Crunchy PGO is a solid, production-proven option, also into the future. While not as fresh and hip as CloudNativePG anymore, it checks all the necessary marks, and it does so for many people for many years.

OnGres StackGres

StackGres by OnGres is also fairly new and tries to do a few things differently. While all resources can be managed through CRDs, they can also be managed through a CLI, and even a web interface. Up to the point that you can create a resource through a CRD, change a small property through the CLI, and scale the Postgres cluster using the webui. All management interfaces are completely interchangeable.
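
For a rough idea of the declarative side, a minimal SGCluster sketch is shown below. The field names follow the StackGres v1 CRDs as I recall them, so treat this as an approximation and check the StackGres documentation for the exact schema of your version.

apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: pg-demo
spec:
  instances: 2
  postgres:
    version: '16'
  pods:
    persistentVolume:
      size: '10Gi'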

Same goes for extension management. StackGres has the biggest amount of supported Postgres extensions available, and it’s hard to find an extension which isn’t supported out of the box.

In terms of tooling integration, StackGres supports all the necessary tools to build a highly available, fault tolerant, automatically backed up and restored, scalable cluster.

In comparison to some of the other operators I really like the independent CRD types, giving a better overview of a specific resource instead of bundling all of the complexity of the Postgres and tooling ecosystem configuration in one major CRD with hundreds and thousands of lines of code.

StackGres is my personal favorite, even though it has only accumulated around 900 stars on GitHub so far.

While the project is still young, the team behind it is a group of veteran PostgreSQL folks. They just know what they do. If you prefer a bit of a bigger community, and less of a company-driven project, you’ll be better off with CloudNativePG, but apart from that, StackGres is the way to go.

Editor’s note: We had Álvaro Hernández, the founder and CEO of OnGres, in our Cloud Commute podcast, talking about StackGres and why you should use it. Don’t miss the first-hand information.

AppsCode KubeDB

KubeDB by AppsCode is different. In development since 2017, it’s well known in the community, but it is an all-in commercial product. That said, its commercial support is great and loved.

Additionally, KubeDB isn’t just PostgreSQL. It supports Elasticsearch, MongoDB, Redis, Memcache, MySQL, and more. It’s a real database-in-a-box deployment solution.

All boxes are checked for KubeDB and there isn’t much to say about it other than, if you need a commercially supported operator for Postgres, look no further than KubeDB.

Zalando Postgres Operator

Last but not least, the Postgres Operator from Zalando. Yes, that Zalando, the one that sells shoes and clothes.

Zalando is a big Postgres user and started early with the cloud-native journey. Their operator has been around since 2017 and has a sizable fanbase. On GitHub the operator managed to collect over 4k stars, has a very active community, a stable release cadence, and is a great choice.

In terms of integrations with the tooling ecosystem, it provides less flexibility and is slightly opinionated towards how things are done. It was developed for Zalando’s own infrastructure first and foremost though.

Anyhow, the Zalando operator has been and still is a great choice. I actually used it myself for my previous startup and it just worked.
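
For reference, a minimal manifest for the Zalando operator looks roughly like the sketch below (the team ID, names, and sizes are placeholders; the cluster name is expected to be prefixed with the team ID). Verify the details against the operator’s documentation for your version.

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-pg-demo
spec:
  teamId: "acid"
  numberOfInstances: 2
  volume:
    size: 10Gi
  postgresql:
    version: "16"
  users:
    app_user: []
  databases:
    app_db: app_user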

Which Postgres Kubernetes Operator should I Use?

You already know the answer, it depends. I know we all hate that answer, but it is true. As hinted at in the beginning, if you need commercial support certain options are already out of the scope.

It also depends if you already have another Postgres cluster with an operator running. If it works, is there really a reason to change it or introduce another one for the new cluster?

Anyhow, below is a quick comparison table of features and supported versions that I think are important.

|              | CloudNativePG | Crunchy Postgres for Kubernetes | OnGres StackGres | KubeDB     | Zalando Postgres Operator |
|--------------|---------------|---------------------------------|------------------|------------|---------------------------|
| Tool version | 1.23.1        | 5.5.2                           | 1.10.0           | v2024.4.27 | 1.12.0                    |
| Release date | 2024-04-30    | 2024-05-23                      | 2024-04-29       | 2024-04-30 | 2024-05-31                |
| License      | Apache 2      | Apache 2                        | AGPL3            | Commercial | MIT                       |

Commercial support availability is also part of the comparison; see the per-operator sections above for details.

Supported PostgreSQL Features

|                    | CloudNativePG      | Crunchy Postgres for Kubernetes | OnGres StackGres   | KubeDB                  | Zalando Postgres Operator |
|--------------------|--------------------|---------------------------------|--------------------|-------------------------|---------------------------|
| Supported versions | 12, 13, 14, 15, 16 | 11, 12, 13, 14, 15, 16          | 12, 13, 14, 15, 16 | 9.6, 10, 11, 12, 13, 14 | 11, 12, 13, 14, 15, 16    |

The comparison further covers support for Postgres clusters, streaming replication, and Postgres extensions.

High Availability and Backup Features

The operators are compared across the following high availability and backup features: hot standby, warm standby, automatic failover, continuous archiving, restore from a WAL archive, point-in-time recovery (PITR), manual backups, and scheduled backups.

Kubernetes Specific Features

The Kubernetes-specific features compared are: backups via Kubernetes, custom resources, use of the default PostgreSQL images, CLI access, a web UI, tolerations, and node affinity.

How to choose your Postgres Operator?

The graph below shows the GitHub stars of the above Postgres Operators. What we see is a clear domination of Stolon, PGO, and Zalando’s PG Operator, with CloudNativePG rushing in from behind. StackGres, while around longer than CloudNativePG, doesn’t have the community backing behind it yet. But GitHub stars aren’t everything.

All of the above tools are great options, with the exception of Stolon, which isn’t a bad tool, but I’m concerned about the lack of activity. Make of it what you like.

GitHub stars history for CloudNativePG, Crunchy PGO, StackGres, the Zalando Postgres Operator, KubeDB (docs), and Stolon

Before closing, I want to quickly give some honorable mentions to two further tools.

Percona has an operator for PostgreSQL, but the community is very small right now. Let’s see if they manage to bring it on par with the other tools. If you use other Percona tools, it’s certainly worth giving it a look: Percona Operator for PostgreSQL .

The other one is the External PostgreSQL Server Operator by MoveToKube. It didn’t really fit the topic of this blog post, as it’s less of a Postgres operator and more of a management tool for databases (in Postgres’ relational entity sense) and users. Meaning, it uses CRDs to add, update, and remove databases on external PG servers, as it does for Postgres users. Anyhow, this tool also works with services like Timescale Cloud, Amazon RDS, and many more. Worth mentioning, and maybe you can make use of it in the future.

Production-grade Kubernetes PostgreSQL, Álvaro Hernández

In this episode of the Cloud Commute podcast, Chris Engelbert is joined by Álvaro Hernández Tortosa, a prominent figure in the PostgreSQL community and CEO of OnGres. Álvaro shares his deep insights into running production-grade PostgreSQL on Kubernetes, a complex yet rewarding endeavor. The discussion covers the challenges, best practices, and innovations that make PostgreSQL a powerful database choice in cloud-native environments.

This interview is part of the simplyblock Cloud Commute Podcast, available on YouTube, Spotify, iTunes/Apple Podcasts, Pandora, Samsung Podcasts, and our show site.

Key Takeaways

Q: Should you deploy PostgreSQL in Kubernetes?

Deploying PostgreSQL in Kubernetes is a strategic move for organizations aiming for flexibility and scalability. Álvaro emphasizes that Kubernetes abstracts the underlying infrastructure, allowing PostgreSQL to run consistently across various environments—whether on-premise or in the cloud. This approach not only simplifies deployments but also ensures that the database is resilient and highly available.

Q: What are the main challenges of running PostgreSQL on Kubernetes?

Running PostgreSQL on Kubernetes presents unique challenges, particularly around storage and network performance. Network disks, commonly used in cloud environments, often lag behind local disks in performance, impacting database operations. However, these challenges can be mitigated by carefully choosing storage solutions and configuring Kubernetes to optimize performance. Furthermore, managing PostgreSQL’s ecosystem—such as backups, monitoring, and high availability—requires robust tooling and expertise, which can be streamlined with solutions like StackGres.

Q: Why should you use Kubernetes for PostgreSQL?

Kubernetes offers a powerful platform for running PostgreSQL due to its ability to abstract infrastructure details, automate deployments, and provide built-in scaling capabilities. Kubernetes facilitates the management of complex PostgreSQL environments, making it easier to achieve high availability and resilience without being locked into a specific vendor’s ecosystem.

Q: Can I use PostgreSQL on Kubernetes with PGO?

Yes, you can. Tools like the PostgreSQL Operator (PGO) for Kubernetes simplify the management of PostgreSQL clusters by automating routine tasks such as backups, scaling, and updates. These operators are essential for ensuring that PostgreSQL runs efficiently on Kubernetes while reducing the operational burden on database administrators.

EP 06: Building and operating a production-grade PostgreSQL in Kubernetes

In addition to highlighting the key takeaways, it’s essential to provide deeper context and insights that enrich the listener’s understanding of the episode. By offering this added layer of information, we ensure that when you tune in, you’ll have a clearer grasp of the nuances behind the discussion. This approach enhances your engagement with the content and helps shed light on the reasoning and perspective behind the thoughtful questions posed by our host, Chris Engelbert. Ultimately, this allows for a more immersive and insightful listening experience.

Key Learnings

Q: How does Kubernetes scheduler work with PostgreSQL?

Kubernetes uses its scheduler to manage how and where PostgreSQL instances are deployed, ensuring optimal resource utilization. However, understanding the nuances of Kubernetes’ scheduling can help optimize PostgreSQL performance, especially in environments with fluctuating workloads.
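
One concrete way this plays out is pod anti-affinity: spreading the primary and its replicas across nodes (or zones) so a single node failure cannot take the whole cluster down. The snippet below is plain Kubernetes scheduling configuration as it would appear in a pod template; the app: postgres label is a placeholder, and operators typically expose a higher-level setting that generates something equivalent.

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: postgres
        topologyKey: kubernetes.io/hostname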

simplyblock Insight: Leveraging simplyblock’s solution, users can integrate sophisticated monitoring and management tools with Kubernetes, allowing them to automate the scaling and scheduling of PostgreSQL workloads, thereby ensuring that database resources are efficiently utilized and downtime is minimized.

Q: What is the best experience of running PostgreSQL in Kubernetes?

The best experience comes from utilizing a Kubernetes operator like StackGres, which simplifies the deployment and management of PostgreSQL clusters. StackGres handles critical functions such as backups, monitoring, and high availability out of the box, providing a seamless experience for both seasoned DBAs and those new to PostgreSQL on Kubernetes.
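As a rough illustration of that experience, a basic StackGres cluster is declared with a single short manifest along these lines. The name, instance count, version, and volume size are placeholders, and exact field names may differ between StackGres releases, so treat this as a sketch rather than a copy-paste recipe.

```yaml
# Minimal sketch of a StackGres SGCluster: one primary plus two replicas,
# with backups, monitoring, and connection pooling handled by the operator.
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: demo
spec:
  instances: 3
  postgres:
    version: '16'
  pods:
    persistentVolume:
      size: '10Gi'
```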

simplyblock Insight: By using simplyblock’s Kubernetes-based solutions, you can further enhance your PostgreSQL deployments with features like dynamic scaling and automated failover, ensuring that your database remains resilient and performs optimally under varying loads.

Q: How does disk access latency impact PostgreSQL performance in Kubernetes?

Disk access latency is a significant factor in PostgreSQL performance, especially in Kubernetes environments where network storage is commonly used. While network storage offers flexibility, it typically has higher latency compared to local storage, which can slow down database operations. Optimizing storage configurations in Kubernetes is crucial to minimizing latency and maintaining high performance.
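One common way to trade network storage for locally attached disks in Kubernetes is a StorageClass for statically provisioned local volumes, as sketched below. The class name is a placeholder; the provisioner and binding mode are standard Kubernetes settings for local PersistentVolumes.

```yaml
# StorageClass for locally attached disks (e.g. NVMe). WaitForFirstConsumer
# delays volume binding until the pod is scheduled, so the PostgreSQL pod
# and its volume end up on the same node.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```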

simplyblock Insight: simplyblock’s advanced storage solutions for Kubernetes can help mitigate these latency issues by providing optimized, low-latency storage options tailored specifically for PostgreSQL workloads, ensuring your database runs at peak efficiency.

Q: What are the advantages of clustering in PostgreSQL on Kubernetes?

Clustering PostgreSQL in Kubernetes offers several advantages, including improved fault tolerance, load balancing, and easier scaling. Kubernetes operators like StackGres enable automated clustering, which simplifies the process of setting up and managing a highly available PostgreSQL cluster.

simplyblock Insight: With simplyblock, you can easily deploy clustered PostgreSQL environments that automatically adjust to your workload demands, ensuring continuous availability and optimal performance across all nodes in your cluster.

Additional Nugget of Information

Q: What are the advantages of clustering in Postgres?

A: Clustering in PostgreSQL provides several benefits, including improved performance, high availability, and better fault tolerance. Clustering allows multiple database instances to work together, distributing the load and ensuring that if one node fails, others can take over without downtime. This setup is particularly advantageous for large-scale applications that require high availability and resilience. Clustering also enables better scalability, as you can add more nodes to handle increasing workloads, ensuring consistent performance as demand grows.

Conclusion

Deploying PostgreSQL on Kubernetes offers powerful capabilities but comes with challenges. Álvaro Hernández Tortosa highlights how StackGres simplifies this process, enhancing performance, ensuring high availability, and making PostgreSQL more accessible. With the right tools and insights, you can confidently manage PostgreSQL in a cloud-native environment.

Full Video Transcript

Chris Engelbert: Welcome to this week’s episode of Cloud Commute podcast by simplyblock. Today, I have another incredible guest, a really good friend, Álvaro Hernández from OnGres. He’s very big in the Postgres community. So hello, and welcome, Álvaro.

Álvaro Hernández Tortosa: Thank you very much, first of all, for having me here. It’s an honor.

Chris Engelbert: Maybe just start by introducing yourself, who you are, what you’ve done in the past, how you got here. Well, except me inviting you.

Álvaro Hernández Tortosa: OK, well, I don’t know how to describe myself, but I would say, first of all, I’m a big nerd, big fan of open source. And I’ve been working with Postgres, I don’t know, for more than 20 years, 24 years now. So I’m a big Postgres person. There’s someone out there in the community that says that if you say Postgres three times, I will pop up there. It’s kind of like Superman or Batman or these superheroes. No, I’m not a superhero. But anyway, professionally, I’m the founder and CEO of a company called OnGres. Let’s guess what it means, On Postgres. So it’s pretty obvious what we do. So everything revolves around Postgres, but in reality, I love all kinds of technology. I’ve been working a lot with many other technologies. I know you because of being a Java programmer, which is kind of my hobby. I love programming in my free time, which almost doesn’t exist. But I try to get some from time to time. And everything related to technology in general, I’m also a big fan and supporter of open source. I have contributed and keep contributing a lot to open source. I also founded some open source communities, like for example, I’m a Spaniard. I live in Spain. And I founded Debian Spain, an association like, I don’t know, 20 years ago. More recently, I also founded a foundation, a nonprofit foundation also in Spain called Fundación PostgreSQL. Again, guess what it does? And I try to engage a lot with the open source communities. We, by the way, organized a conference for those who are interested in Postgres in the magnificent island of Ibiza in the Mediterranean Sea in September this year, 9th to 11th September for those who want to join. So yeah, that’s probably a brief intro about myself.

Chris Engelbert: All right. So you are basically the Beetlejuice of Postgres. That’s what you’re saying.

Álvaro Hernández Tortosa: Beetlejuice, right. That’s an even better comparison than the superheroes. You’re absolutely right.

Chris Engelbert: I’m not sure if he is a superhero, but he’s different at least. Anyway, you mentioned OnGres. And I know OnGres isn’t really like the first company. There were quite a few before, I think, El Toro, a database company.

Álvaro Hernández Tortosa: Yes, Toro DB.

Chris Engelbert: Oh, Toro DB. Sorry, close, close, very close. So what is up with that? You’re trying to do a lot of different things and seem to love trying new things, right?

Álvaro Hernández Tortosa: Yes. So I sometimes define myself as a 0.x serial entrepreneur, meaning that I’ve tried several ventures and sold none of them. But I’m still trying. I try to be resilient, and I keep pushing the ideas that I have in the back of my head. So yes, I’ve done several ventures, all of them around certain patterns. So for example, you’re asking about Toro DB. Toro DB is essentially open source software that is meant to replace MongoDB with, you guessed it, Postgres, right? There’s a certain pattern in my professional life. And Toro DB was, I’m speaking in the past because unfortunately it’s no longer a maintained open source project. We moved on to something else, which is OnGres. But the idea of Toro DB was to essentially replicate these documents live from MongoDB and, in the process, in real time, transform them into a set of relational tables that got stored inside of a Postgres database. So it enabled you to do SQL queries on your documents that were in MongoDB. So think of it as a MongoDB replica. You can keep your MongoDB cluster if you want, and then you have all the data in SQL. This was great for analytics. You could get great speed-ups by normalizing data automatically and then doing queries with the power of SQL, which obviously is much broader and richer than MongoDB’s query language, especially for analytics. We got like 100 times faster on most queries. So it was an interesting project.

Chris Engelbert: So that means you basically generated the schema on the fly and then generated the table for that schema specifically? Interesting.

Álvaro Hernández Tortosa: Yeah, it was generating tables and columns on the fly.

OnGres StackGres: Operator for Production-Grade PostgreSQL on Kubernetes

Chris Engelbert: Right. Ok, interesting. So now you’re doing the OnGres thing. And OnGres has, I think, the main product, StackGres, as far as I know. Can you tell a little bit about that?

Álvaro Hernández Tortosa: Yes. So OnGres, as I said, means On Postgres. And one of our goals in OnGres is that we believe that Postgres is a fantastic database. I don’t need to explain that to you, right? But it’s kind of the Linux kernel, if I may use this parallel. It’s a bit bare bones. You need something around it. You need a distribution, right? So Postgres is a little bit the same thing. The core is small, it’s fantastic, it’s very featureful, it’s reliable, it’s trustable. But it needs tools around it. So our vision in OnGres is to develop this ecosystem around this Postgres core, right? And one of the things that we experience during our professional lifetime is that Postgres requires a lot of tools around it. It needs monitoring, it needs backups, it needs high availability, it needs connection pooling.

By the way, do not use Postgres without connection pooling, right? So you need a lot of tools around it. And none of these tools come from the core. You need to look into the ecosystem. And actually, this is good and bad. It’s good because there’s a lot of options. It’s bad because there’s a lot of options. Meaning: which one to choose, which one is good, which one is bad, which one goes well with a given backup solution or monitoring solution, and how you configure them all. So this was a problem that we coined the stack problem. When you really want to run Postgres in production, you need a stack on top of Postgres, right? To orchestrate all these components.

Now, the problem is that we’ve been doing this for a long time for our customers. Typically, we love infrastructure as code, right? And everything was done with Ansible and similar tools: Terraform for the infrastructure and Ansible for orchestrating these components. But the reality is that every environment we looked into was slightly different. So we couldn’t just take our Ansible code, run it, and have this stack. Because the storage is different. Your networking is different. Your entry point: here, one is using virtual IPs. That one is using DNS. That one is using proxies. And then the compute is also somehow different. So it was not reusable. We were doing a lot of copy, paste, modify, something that was not very sustainable. At some point, we started thinking: is there a way in which we can pack this stack into a single deployable unit that we can take essentially anywhere? And the answer was Kubernetes. Kubernetes provides us this abstraction where we can abstract away the compute, the storage, the networking, and code against a programmable API, so that we can indeed create this package. So that’s StackGres.

StackGres is the stack of components you need to run production Postgres, packaged in a way that is uniform across any environment where you want to run it: cloud, on-prem, it doesn’t matter. And it is production ready! It’s packaged at a very, very high level. So basically you barely need, I would say you don’t even need, Postgres knowledge to run a production-ready, enterprise-quality Postgres cluster. And that’s the main goal of StackGres.

Chris Engelbert: Right, right. And as far as I know, I think it’s implemented as a Kubernetes operator, right?

Álvaro Hernández Tortosa: Yes, exactly.

Chris Engelbert: And there’s quite a few other operators as well. But I know that StackGres has some things which are done slightly differently. Can you talk a little bit about that? I don’t know how much you wanna actually make this public right now.

Álvaro Hernández Tortosa: No, actually everything is open source. Our roadmap is open source, our issues are open source. I’m happy to share everything. Well, first of all, what I would say is that the operator pattern is essentially these controllers that take actions on your cluster, plus the CRDs. We gave a lot of thought to these CRDs. I would say that for a lot of operators, the CRDs are kind of a byproduct. An afterthought: “I have my objects and then some script generates the CRDs.” No, we said the CRDs are our user-facing API. The CRDs are our extended API. And the goal of operators is to abstract away and package business logic, right? And expose it with a simple user interface.

So we designed our CRDs to be very, very high level, very amenable to the user, so that, again, you don’t require any Postgres expertise. So if you look at the CRDs, in practical terms the YAMLs, right? The YAMLs that you write to deploy something on StackGres, they should be simple enough that you could explain them to your five-year-old kid, and your five-year-old kid should be able to deploy Postgres as a production-quality cluster, right? And that’s our goal. And if we didn’t fulfill this goal, please raise an issue on our public issue tracker on GitLab, because we definitely have failed if that’s not true. So instead, most operators focus on the usual Postgres user, very knowledgeable, very advanced, with low-level CRDs, and they require Postgres expertise, probably a lot of it. We want to make Postgres more mainstream than ever, right? Postgres increases in popularity every year and it’s being adopted by more and more organizations, but not everybody’s a Postgres expert. We want to make Postgres universally accessible for everyone. So one of the things is that we put a lot of effort into this design. And instead of one big, gigantic CRD, we have multiple. They can actually be connected, like in an ER diagram. So you understand the relationships; you create one and then you can reference it many times, and you don’t need to repeat or reconfigure the configuration each time. Another area where I would say we have tried to do something different is extensions. Postgres extensions are one of the most loved, if not the most loved, feature, right?

And StackGres is the operator that arguably supports the largest number of extensions, over 200 extensions as of now and growing. And we did this because we developed a custom solution, which is also open source, as part of StackGres, where we can load extensions dynamically into the cluster. So we don’t need to build you a fat container image with 200 extensions and a lot of security issues, right? But rather we deploy you a container with no extensions. And then you say, “I want this, this, this and that.” And then they will appear in your cluster automatically. And this is done via simple YAML. So we have a very powerful extension mechanism. And the other thing is that we not only expose the usual CRD YAML interface for interacting with StackGres, which is more than fine and I love it, but it also comes with a fully fledged web console. Not everybody likes the command line or the GitOps approach. We do, but not everybody does. And it’s a fully fledged web console which also supports single sign-on, where you can integrate with your AD, with your OIDC provider, anything that you want. It has detailed, fine-grained permissions based on Kubernetes RBAC. So you can say, “Who can create clusters, who can view configurations, who can do anything?” And last but not least, there’s a REST API. So if you prefer to automate and integrate with another kind of solution, you can also use the REST API and create and manage clusters via the REST API. And these three mechanisms, the YAML files/CRDs, the REST API, and the web console, are fully interchangeable. You can use one for one operation and another for the next; everything goes back to the same state. So you can use whichever one you want.
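For readers who want to see what this dynamic extension mechanism looks like from the user side, the sketch below shows extensions requested directly in the cluster manifest. The extension names are arbitrary examples, and exact field names may vary between StackGres versions.

```yaml
# Sketch of a StackGres cluster requesting extensions declaratively;
# the operator fetches and loads them without rebuilding the container image.
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: demo-extensions
spec:
  instances: 2
  postgres:
    version: '16'
    extensions:
      - name: timescaledb   # example extension
      - name: postgis       # example extension
  pods:
    persistentVolume:
      size: '10Gi'
```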

And lately we have also added sharding. Sharding scales Postgres out with solutions like Citus, but we also support, for interoperability, Postgres with partitioning and Apache ShardingSphere. Our way is to create a cluster of multiple instances: not only one primary and one replica, but a coordinator layer and then the shards, each of which, like the coordinator, has its own replicas. So typically dozens of instances, and you can create them with a simple YAML file and a very high-level description that requires little knowledge and wires everything up for you. So it’s very, very convenient and makes things simple.
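As a rough idea of what that high-level sharding description looks like, the sketch below approximates a Citus-based sharded cluster. The kind, API version, and field names are taken from the StackGres sharded-cluster CRD as the author understands it and may not match your release exactly; the counts and sizes are placeholders.

```yaml
# Approximate sketch of a sharded StackGres cluster: a coordinator layer
# plus several shards, each with its own replica.
apiVersion: stackgres.io/v1alpha1
kind: SGShardedCluster
metadata:
  name: demo-sharded
spec:
  type: citus
  database: demo
  postgres:
    version: '16'
  coordinator:
    instances: 2                 # coordinator primary plus one replica
    pods:
      persistentVolume:
        size: '10Gi'
  shards:
    clusters: 3                  # three shard groups
    instancesPerCluster: 2       # each shard has a primary and a replica
    pods:
      persistentVolume:
        size: '10Gi'
```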

Chris Engelbert: Right. So the plugin mechanism, or the extension mechanism, that was exactly what I was hinting at. That was mind-blowing. I’d never seen anything like that when you showed it last year in Ibiza. The other thing that is always a little bit of a head-scratcher, I think, for a lot of people is when they hear that a Kubernetes operator is actually written in Java. I think Red Hat built the original operator framework, and that framework was a Go library, so it kind of makes sense that Red Hat is doing that. But Java would probably not be the first choice to do that. So how did that happen?

Álvaro Hernández Tortosa: Well, first of all, you’re right. The operator framework is written in Go, and there was nothing other than Go at the time. So we were looking at that, but our team was a team of very, very senior Java programmers, and none of them were Go programmers, right? And I’ve seen, in the Postgres community and other communities, that for people who are more in the DevOps world, switching to Go is a bit more natural, but at the same time they are not senior from a Go programming perspective, right? The same would have happened with our team, right? They could switch from Java to Go, but they would not have been senior in Go, obviously, right? So it would have taken some time to develop those skills. On the other hand, we looked at the technology behind it: what is an operator? An operator is essentially no more than an HTTP server that receives callbacks from Kubernetes, and a client, because it also makes calls to Kubernetes. And HTTP clients and servers can be written in any language. So we looked at the core, how complicated it is, and how much this operator framework really brings to you, and we saw that it was not that much.

And actually, something I just mentioned before: with the framework, the CRDs are kind of generated from your structures, and we really wanted to do it the opposite way. It’s like with a database: you either use an ORM to read the existing database schema that you developed with all your SQL capabilities, or you just create an object and let that generate the database. I prefer the former. So we did the same thing with the CRDs, right? We wanted to design them first. So Java was more than okay to develop a Kubernetes operator, and our team was expert in Java. So by doing it in Java, we were able to be very efficient and deliver a lot of value, a lot of features very, very fast, without having to retrain anyone, learn a new language, or learn new skills. On top of this, there’s sometimes a concern that Java requires a JVM, which is kind of a heavy environment, right? It consumes memory, resources, and disk. But by default, StackGres uses a compilation technology, with a whole project built around it, called GraalVM. And this allows you to generate native images that are indistinguishable from any other Linux binary you can have on your system. And we deploy StackGres with native images. You can also switch to JVM images if you prefer. We expose both, but by default they are native images. So at the end of the day, StackGres is a file of a few megabytes, a Linux binary in a container, and that’s it.

Chris Engelbert: That makes sense. And I like that you basically pointed out that the efficiency of the existing developers was much more important than being cool and moving to a new language just because everyone does. So we talked about the operator quite a bit. What are your general thoughts on databases in the cloud, or specifically in Kubernetes? What are the issues you see, the problems of running a database in such an environment?

Álvaro Hernández Tortosa: Well, it’s a wide topic, right? And I think one of the most interesting topics that we’re seeing lately is a concern about cost and performance. So there’s kind of a trade-off, as usual, right? There’s a trade-off around the convenience: I want to run a database and almost forget about it. And that’s why you switch to a cloud managed service, which is not always true, by the way, because forgetting about it means that nobody’s gonna then tune your database, repack your tables, right? Optimize your queries, analyze whether you have unused indexes. So if you’re very small, that’s more than okay, and you can assume that you don’t need to touch your database. But if you grow over a certain level, you’re gonna need the same DBAs, at least for everything beyond the basic operations of the database, which are monitoring, high availability, and backups. So those are the three main areas that a managed service provides to you.

So there’s convenience, but then there’s an additional cost. And this additional cost is sometimes quite notable, right? It’s typically around an 80% premium, on an (N+1)/N number of instances, because for many cloud services you sometimes need an extra instance, right? And that, multiplied by 1.8, ends up being two point something in the usual case. So you’re overpaying that. So you need to analyze whether this is good for you from this perspective of convenience, or if you want to have something else. On the other hand, almost all cloud services use network disks. And these network disks are very good and have improved performance a lot in the last few years, but they are still far from the performance of a local drive, right? And running databases on local drives has its own challenges, but they can be addressed. And you can really, really move the needle with, I don’t know if that’s the right term to call it, self-hosting, this trend of self-hosting, if we could marry it with the simplicity and the convenience of managed services, right?

With the ability to run in any environment, and to run there at much higher performance, I think that’s an interesting trend right now and a good sweet spot. And Kubernetes, to tie together all the terms that you mentioned in the question, actually is one driver towards this goal, because it gives us infrastructure independence and it supports both network disks and local disks equally well. So it’s kind of an enabler for this pattern, a trend that I see becoming more and more important as of now, and one that we are definitely looking forward to.

Chris Engelbert: Right, I like that you pointed out that there’s ways to address the local storage issues, just shameless plug, we’re actually working on something.

Álvaro Hernández Tortosa: I heard something.

The Biggest Trend in Containers?

Chris Engelbert: Oh, you heard something. (laughing) All right, last question because we’re also running out of time. What do you see as the biggest trend right now in containers, cloud, whatever? What do you think is like the next big thing? And don’t say AI, everyone says that.

Álvaro Hernández Tortosa: Oh, no. Well, you know what? Let me do a shameless plug here, right?

Chris Engelbert: All right. I did one. (laughing)

Álvaro Hernández Tortosa: So there’s a technology we’re working on right now that works for our use case, but will also work for many other use cases, which is what we’re calling dynamic containers. So containers are essentially something that is static, right? You build a container, you have a build with your Dockerfile, whatever you use, right? And then that image is static. It is what it is. It contains the layers that you specified, and that’s all. But if you look at any repository on Docker Hub, right? There are plenty of tags. Take, for example, Postgres. There’s Postgres based on Debian. There’s Postgres based on Alpine. There’s Postgres with this option. Then you want this extension, then you want this other extension. And then there’s a whole variety of images, right? And each of those images needs to be built independently, maintained and updated independently, right? But they’re very orthogonal. Upgrading the Debian base OS has nothing to do with the Postgres layer, has nothing to do with the TimescaleDB extension, has nothing to do with whether I want the debug symbols or not. So we’re working on technology with the goal of being able to, as a user, express any combination of items I want for my container and get that container image without having to build and maintain the image with the specific parameters that I want.

Chris Engelbert: Right, and let me guess, that is how the Postgres extension stuff works.

Álvaro Hernández Tortosa: It is meant to be, to begin with, the solution for the Postgres extensions, but it’s actually quite broad and quite general, right? Like, for example, I was discussing recently with some folks of the OpenTelemetry community, and the OpenTelemetry Collector, which is the router for signals in the OpenTelemetry world, right? It has the same architecture, with around 200 plugins, right? And you don’t want a container image with those 200 plugins, which potentially, because many are from third parties, may have some security vulnerabilities. Or even if there’s just an update, you don’t want to update all of those and restart your containers and all that, right? So why don’t you get a container image with the OpenTelemetry Collector with just this receiver and this exporter, right? So that’s actually probably even more applicable.

Chris Engelbert: Yeah, I think that makes sense, right? I think that is a really good one, especially because with static containers, the original idea was that being static gives you some kind of consistency and some security about how the container looks. But we figured out over time that that is not the best solution. So I’m really looking forward to that becoming probably a more general thing.

Álvaro Hernández Tortosa: To be honest, actually, I call the idea dynamic containers, but in reality, from a user perspective, they’re as static as before. They are dynamic from the registry perspective.

Chris Engelbert: Right, okay, fair enough. All right, thank you very much. It was a pleasure like always talking to you. And for the other ones, I see, hear, or read you next week with my next guest. And thank you to Álvaro, thank you for being here. It was appreciated like always.

Álvaro Hernández Tortosa: Thank you very much.
