Elastic Kubernetes Service Archives | simplyblock

Amazon EKS vs. ECS: Understanding the Differences and Choosing the Right Service

Rob Pankow — Fri, 06 Sep 2024 23:31:01 +0000

Introduction

When it comes to container orchestration on AWS, two primary services come to mind: Amazon Elastic Kubernetes Service (EKS) and Amazon Elastic Container Service (ECS). Both offer robust solutions for deploying, managing, and scaling containerized applications, but each has its own strengths and ideal use cases. Choosing the right service is crucial for optimizing performance, cost, and management efficiency.

Understanding AWS’ Amazon EKS

Overview of AWS EKS

Amazon EKS is AWS’ managed Kubernetes service, which simplifies running Kubernetes on AWS without the need to install and operate your own Kubernetes control plane or worker nodes. Kubernetes, an open-source container orchestration platform, automates the deployment, scaling, and operation of application containers.

Key Features of AWS EKS

Managed Kubernetes Control Plane: AWS handles the control plane management, ensuring high availability and security.

Integration with AWS Services: Seamless integration with other AWS services such as IAM, VPC, and CloudWatch.

Scalability: Supports both horizontal and vertical scaling, making it suitable for varying workload demands.

Security: Provides features like IAM roles for service accounts, enabling granular access control.
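
As a hedged illustration of IAM roles for service accounts: on EKS, a Kubernetes ServiceAccount is linked to an IAM role through the `eks.amazonaws.com/role-arn` annotation. The sketch below only builds such a manifest in Python. The account ID, role name, namespace, and service account name are placeholders chosen for this example, not values from this article.

```python
import json


def service_account_manifest(name, namespace, role_arn):
    """Build a ServiceAccount manifest annotated with an IAM role ARN."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Annotation that EKS uses to map the service account to an IAM role.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


# Placeholder values for illustration only.
manifest = service_account_manifest(
    "s3-reader", "default", "arn:aws:iam::111122223333:role/example-s3-reader"
)
print(json.dumps(manifest, indent=2))
```

Pods that run under this service account can then receive only the permissions of that one role, which is what makes the access control granular.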

Benefits of using AWS EKS

Simplified Kubernetes Management: Reduces the operational burden of managing Kubernetes clusters.

Flexibility: Offers the flexibility to run Kubernetes-native applications and leverage the Kubernetes ecosystem.

High Availability: Ensures your control plane is spread across multiple AWS Availability Zones.

Understanding Amazon ECS

Overview of Amazon ECS

Amazon ECS is AWS’ native container orchestration service that supports Docker containers and allows you to run applications on a managed cluster of Amazon EC2 instances. It provides a highly scalable, high-performance orchestration service deeply integrated with the AWS ecosystem.

Key Features of AWS ECS

Native AWS Integration: Deep integration with AWS services like IAM, CloudWatch, and AWS Fargate.

Task Definitions: Define containers and their configurations through JSON task definition files.

Service Management: Allows you to maintain application availability and enables service discovery.
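
To make the idea of a JSON task definition more concrete, here is a minimal sketch that assembles one in Python and prints it as JSON. The field names follow the ECS task definition format as I understand it; the family name, image, port, and sizing are assumptions made up for this example, and a real definition usually needs more (for instance an execution role and logging configuration).

```python
import json


def task_definition(family, image, container_port, cpu="256", memory="512"):
    """Assemble a minimal Fargate-style ECS task definition as a dict."""
    return {
        "family": family,
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        "cpu": cpu,        # task-level CPU units, given as a string
        "memory": memory,  # task-level memory in MiB, given as a string
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "essential": True,
                "portMappings": [
                    {"containerPort": container_port, "protocol": "tcp"}
                ],
            }
        ],
    }


# Hypothetical web service used purely for illustration.
web_task = task_definition("example-web", "nginx:latest", 80)
print(json.dumps(web_task, indent=2))
```

The printed JSON is the kind of file you would register with ECS before creating a service that runs it.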

Benefits of using AWS ECS

Ease of Use: Simplifies the process of running and managing Docker containers.

Performance: Optimized for performance within the AWS ecosystem.

Cost-Effectiveness: Can be more cost-effective due to its integration with AWS services and straightforward pricing.

Comparing AWS EKS and ECS

Architecture – Kubernetes Vs. Native AWS:

EKS: Provides Kubernetes, an open-source platform, offering flexibility and a wide range of capabilities.

ECS: A native AWS service designed for seamless integration with other AWS offerings, hiding the complexity of managing a Kubernetes-like infrastructure.

Deployment and Management – Complexity and Learning Curve:

EKS: Requires an understanding of Kubernetes concepts, which might be more challenging for some teams.

ECS: Easier to set up and manage, especially for users familiar with AWS.

Performance – Scalability and Efficiency:

EKS: Supports Kubernetes-native scaling solutions.

ECS: Offers native scaling within AWS, including integration with AWS Auto Scaling.

Pricing Models:

EKS: Charges for the Kubernetes control plane and the compute resources (EC2 or Fargate).

ECS: No control plane costs; you pay only for the underlying compute resources.
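
The pricing difference above boils down to simple arithmetic, sketched here in Python. The hourly control plane fee of 0.10 USD per cluster and the 500 USD compute bill are assumptions for illustration only; check the current AWS pricing pages before relying on any number.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_cost(compute_cost, control_plane_hourly=0.0, clusters=1):
    """Monthly bill = compute plus an optional per-cluster control plane fee."""
    return compute_cost + control_plane_hourly * HOURS_PER_MONTH * clusters


# Same assumed compute spend on both services; only EKS adds a control plane fee.
eks_cost = monthly_cost(500.0, control_plane_hourly=0.10)
ecs_cost = monthly_cost(500.0)
print(f"EKS: ${eks_cost:.2f}  ECS: ${ecs_cost:.2f}  difference: ${eks_cost - ecs_cost:.2f}")
```

With identical compute, the gap is a fixed per-cluster amount, so it matters most for small workloads or for setups with many clusters.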

Customization and Configurability:

EKS: Highly customizable through Kubernetes tools and extensions.

ECS: Integrates well with AWS services but offers little flexibility with third-party tools.

Security Features and Compliance:

Both: Offer strong security features like IAM roles and VPC integration.

EKS: Additional security configurations specific to Kubernetes, like network policies.

Use Cases for Amazon EKS

When to Choose Amazon EKS

EKS is ideal for organizations already invested in Kubernetes or those requiring extensive customization and flexibility. It’s suitable for complex applications that benefit from the Kubernetes ecosystem.

Example Scenarios and Applications

Microservices Architectures: Leveraging Kubernetes’ robust orchestration capabilities.

Hybrid Deployments: Integrating on-premises Kubernetes clusters with cloud-based clusters.

Use Cases for Amazon ECS

When to Choose Amazon ECS

ECS is perfect for users seeking simplicity and tight integration with AWS services. It’s a great choice for straightforward containerized applications that don’t require extensive third-party integrations.

Example Scenarios and Applications

Batch Processing: Running large-scale batch processing tasks efficiently.

Web Applications: Deploying and managing web applications with minimal overhead.

Integration with other AWS Services

How EKS Integrates with other AWS Services

EKS integrates seamlessly with services like IAM for access control, CloudWatch for logging and monitoring, and ELB for load balancing.

How ECS Integrates with other AWS Services

ECS offers deep integration with AWS services such as IAM, CloudWatch, and AWS Fargate, providing a cohesive environment for container management.

Developer and Operations Experience

Ease of Use for Developers: EKS might require more setup and configuration due to Kubernetes’ complexity. ECS offers a more straightforward experience, especially for developers familiar with AWS.

Operations and Maintenance Considerations: EKS requires managing Kubernetes updates and configurations, while ECS offloads much of this operational overhead to AWS, simplifying maintenance.

Community and Support

Community Support for EKS: EKS benefits from the extensive Kubernetes community, providing numerous resources, plugins, and tools.

Community Support for ECS: ECS has strong support within the AWS community, with extensive documentation and integration guides.

AWS Support and Documentation: Both services offer comprehensive AWS support and documentation, ensuring users can find the help they need.

Case Studies

Companies using AWS EKS

Snap Inc.: Utilizes EKS for scalable, reliable infrastructure.

Intuit: Leverages EKS for Kubernetes-based application deployments.

Companies using AWS ECS

Samsung: Uses ECS for efficient container management.

GE (General Electric): Employs ECS for scalable, containerized applications.

Conclusion

Choosing between Amazon EKS and Amazon ECS depends on your specific needs and expertise. EKS offers greater flexibility and integration with Kubernetes’ extensive ecosystem, making it ideal for complex applications. ECS provides a simpler, more integrated experience within the AWS ecosystem, suitable for straightforward containerized applications.

Save up to 80% on Amazon EBS Costs

simplyblock can help you reduce your Amazon EBS storage costs by up to 80% through high-performance cloud block storage and seamless integration with local NVMe, EBS, and S3.

Frequently Asked Questions (FAQs)

What are the Main Differences between Amazon EKS and ECS?

AWS EKS uses Kubernetes, providing extensive customization and flexibility, while ECS is a native AWS service offering simpler management and tighter AWS integration.

Which Service is more Cost-effective?

ECS can be more cost-effective due to its straightforward pricing model, whereas EKS involves additional costs for the Kubernetes control plane.

Can I Migrate from ECS to EKS Easily?

Migrating from ECS to EKS can be complex due to the differences in orchestration and management, but AWS provides tools and documentation to facilitate the process.

Is EKS better for Large-scale Applications?

EKS is often better for large-scale applications requiring extensive customization and flexibility, leveraging Kubernetes’ capabilities.

How does AWS Support Differ for EKS and ECS?

Both services offer robust AWS support and documentation, with EKS benefiting from the broader Kubernetes community and ECS from the AWS community.

How can Simplyblock Enhance your AWS EKS or ECS Deployments?

AWS Marketplace storage solutions, such as simplyblock, can help reduce your database costs on AWS by up to 80%. Simplyblock offers high-performance cloud block storage that enhances the performance of your databases and applications. This ensures you get better value and efficiency from your cloud resources.

Simplyblock software provides a seamless bridge between local NVMe disk, Amazon EBS, and Amazon S3, integrating these storage options into a single, cohesive system designed for the ultimate scale and performance of IO-intensive stateful workloads. By combining the high performance of local NVMe storage with the reliability and cost-efficiency of EBS (gp2 and gp3 volumes) and S3 respectively, simplyblock enables enterprises to optimize their storage infrastructure for stateful applications, ensuring scalability, cost savings, and enhanced performance. With simplyblock, you can save up to 80% on your EBS costs on AWS.

Our technology uses NVMe over TCP for minimal access latency, high IOPS/GB, and efficient CPU core utilization, outperforming local NVMe disks and Amazon EBS in cost/performance ratio at scale. Ideal for high-performance Kubernetes environments, simplyblock combines the benefits of local-like latency with the scalability and flexibility necessary for dynamic AWS EKS deployments, ensuring optimal performance for I/O-sensitive workloads like databases. By using erasure coding (a better RAID) instead of replicas, simplyblock minimizes storage overhead while maintaining data safety and fault tolerance. This approach reduces storage costs without compromising reliability.

Simplyblock also includes additional features such as instant snapshots (full and incremental), copy-on-write clones, thin provisioning, compression, encryption, and many more. In short, there are many ways in which simplyblock can help you optimize your cloud costs. Get started using simplyblock right now and see how it can help you. Simplyblock is available on the AWS Marketplace.


How to choose your Kubernetes Postgres Operator?

Chris Engelbert — Thu, 06 Jun 2024 12:09:42 +0000

A Postgres Operator for Kubernetes eases the management of PostgreSQL clusters inside the Kubernetes (k8s) environment. It watches for changes (additions, updates, deletions) to PostgreSQL-related custom resources, whose types are declared through CRDs (custom resource definitions), and applies those changes to the running clusters accordingly.

Therefore, a Postgres Operator is a critical component of the database setup. In addition to the “simple” management of the Postgres cluster, it often also provides integration with external tools like Pgpool-II or PgBouncer (for connection pooling), pgBackRest or Barman (for backup and restore), as well as Postgres extension management.
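
The watch-and-apply behavior described above is usually called a reconcile loop. The following is a deliberately simplified sketch of one reconcile pass in Python, not the code of any particular operator: the `desired` and `observed` dictionaries and the resource names are hypothetical stand-ins for what a real operator would read from the Kubernetes API.

```python
def reconcile(desired, observed):
    """Return the actions needed to move the observed state toward the desired state."""
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name))   # resource was added
        elif observed[name] != spec:
            actions.append(("update", name))   # resource spec changed
    for name in observed:
        if name not in desired:
            actions.append(("delete", name))   # resource was removed
    return actions


# Hypothetical cluster specs: what the user declared vs. what is running.
desired = {"db-a": {"instances": 3}, "db-b": {"instances": 1}}
observed = {"db-a": {"instances": 2}, "db-c": {"instances": 1}}
plan = reconcile(desired, observed)
print(plan)
```

A real operator runs such a pass whenever a watched resource changes and keeps retrying until the observed state matches the declared one.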

Critical components should be chosen wisely. Oftentimes, it’s hard to exchange one tool for another later on. Additionally, depending on your requirements, you may need a tool with commercial support available, while others prefer the vast open source ecosystem. If both, even better.

But how do you choose your Postgres Operator? Let’s start with a quick introduction of the most common options.

Stolon

Stolon is one of the oldest operators available. Originally released in November 2015, it even predates the term Kubernetes Operator.

The project is well known and has over 4.5k stars on GitHub. However, the age shows. Many features aren’t cloud-native, and it doesn’t support CRDs (custom resource definitions), hence configuration doesn’t follow the Kubernetes way. All changes are handled through a command line tool. Apart from that, the last release, 0.17.0, is from September 2021, and there isn’t really any new activity on the repository. You can read that as “extremely stable” or “almost abandoned”. I guess both views on the matter are correct. Anyhow, from my point of view, the lack of activity is concerning for a tool that’s potentially being used for 5-10 years, especially in a fast-paced world like the cloud-native one.

Personally, I don’t recommend using it for new setups. I still wanted to mention it though since it deserves it. It did and does a great job for many people. Still, I left it out of the comparison table further below.

CloudNativePG

CloudNativePG is the new kid on the block, or so you might think. Its official first commit happened in March 2022. However, CloudNativePG was originally developed and sponsored by EDB (EnterpriseDB), one of the oldest companies built around PostgreSQL.

As the name suggests, CloudNativePG is designed to bring the full cloud-native feeling across. Everything is defined and configured using CRDs. No matter if you need to create a new Postgres cluster, want to define backup schedules, configure automatic failover, or scale up and down. It’s all there. Fully integrated into your typical Kubernetes way of doing things. Including a plugin for kubectl.
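
To give a feel for what “defined and configured using CRDs” means in practice, here is a minimal sketch that builds a CloudNativePG `Cluster` resource in Python and prints it as JSON (which `kubectl apply -f` also accepts). The `apiVersion`, `kind`, and the `instances` and `storage.size` fields reflect the CloudNativePG resource format as I understand it; the cluster name and sizes are made up, and the project’s API reference remains the authority.

```python
import json


def cnpg_cluster(name, instances=3, storage_size="1Gi"):
    """Build a minimal CloudNativePG Cluster custom resource as a dict."""
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": instances,            # one primary plus replicas
            "storage": {"size": storage_size},
        },
    }


# Hypothetical cluster name for illustration.
cluster = cnpg_cluster("example-pg")
print(json.dumps(cluster, indent=2))
```

Scaling up or down then means changing `instances` in this one resource and letting the operator do the rest.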

On GitHub, CloudNativePG skyrocketed in terms of stargazers and collected over 3.6k stars over the course of its lifetime. And while it’s not officially a CNCF (Cloud Native Computing Foundation) project, the CNCF stands pretty strongly behind it. And not to forget, the feature set is on par with older projects. No need to hide here.

All in all, CloudNativePG is a strong choice for a new setup. Fully in line with your Kubernetes workflows, feature rich, and a strong and active community.

Editor’s note: Btw, we had Jimmy Angelakos as a guest on the Cloud Commute podcast, and he talks about CloudNativePG. You should listen in.

Crunchy Postgres Operator (PGO)

PGO, or the Postgres Operator from Crunchy Data, has been around since March 2017 and is a beloved choice for a good chunk of people. On GitHub, PGO has over 3.7k stars, an active community, quick response times to issues, and overall pretty stable activity.

Postgres resources are defined and managed through CRDs, as are Postgres users. Likewise, PGO provides integration with a vast tooling ecosystem, such as Patroni (for automatic failover), pgBackRest (for backup management), PgBouncer (connection pooling), and more.

A good number of common extensions are provided out of the box; however, adding additional extensions is a little more complicated.

Overall, Crunchy PGO is a solid, production-proven option, now and into the future. While not as fresh and hip as CloudNativePG anymore, it checks all the necessary boxes, and it has done so for many people for many years.

OnGres StackGres

StackGres by OnGres is also fairly new and tries to do a few things differently. While all resources can be managed through CRDs, they can also be managed through a CLI, and even a web interface. It goes so far that you can create a resource through a CRD, change a small property through the CLI, and scale the Postgres cluster using the web UI. All management interfaces are completely interchangeable.

The same goes for extension management. StackGres has the largest number of supported Postgres extensions available, and it’s hard to find an extension which isn’t supported out of the box.

In terms of tooling integration, StackGres supports all the necessary tools to build a highly available, fault tolerant, automatically backed up and restored, scalable cluster.

In comparison to some of the other operators, I really like the independent CRD types, giving a better overview of a specific resource instead of bundling all of the complexity of the Postgres and tooling ecosystem configuration in one major CRD with hundreds and thousands of lines of code.

StackGres is my personal favorite, even though it has only accumulated around 900 stars on GitHub so far.

While still young, the team behind it is a group of veteran PostgreSQL folks. They know what they’re doing. If you prefer a bit of a bigger community, and less of a company-driven project, you’ll be better off with CloudNativePG, but apart from that, StackGres is the way to go.

Editor’s note: We had Álvaro Hernández, the founder and CEO of OnGres, on our Cloud Commute podcast, talking about StackGres and why you should use it. Don’t miss the first-hand information.

AppsCode KubeDB

KubeDB by AppsCode is different. In development since 2017, it’s well known in the community, but it is an all-in commercial product. That said, commercial support is great and loved.

Additionally, KubeDB isn’t just PostgreSQL. It supports Elasticsearch, MongoDB, Redis, Memcached, MySQL, and more. It’s a real database-in-a-box deployment solution.

All boxes are checked for KubeDB and there isn’t much to say about it other than, if you need a commercially supported operator for Postgres, look no further than KubeDB.

Zalando Postgres Operator

Last but not least, the Postgres Operator from Zalando. Yes, that Zalando, the one that sells shoes and clothes.

Zalando is a big Postgres user and started early with the cloud-native journey. Their operator has been around since 2017 and has a sizable fanbase. On GitHub the operator managed to collect over 4k stars, has a very active community, a stable release cadence, and is a great choice.

In terms of integrations with the tooling ecosystem, it provides less flexibility and is slightly opinionated towards how things are done. It was developed for Zalando’s own infrastructure first and foremost though.

Anyhow, the Zalando operator has been and still is a great choice. I actually used it myself for my previous startup and it just worked.

Which Postgres Kubernetes Operator should I Use?

You already know the answer: it depends. I know we all hate that answer, but it is true. As hinted at in the beginning, if you need commercial support, certain options are already out of scope.

It also depends on whether you already have another Postgres cluster with an operator running. If it works, is there really a reason to change it or introduce another one for the new cluster?

Anyhow, below is a quick comparison table of features and supported versions that I think are important.

	CloudNativePG	Crunchy Postgres for Kubernetes	OnGres StackGres	KubeDB	Zalando Postgres Operator
Tool version	1.23.1	5.5.2	1.10.0	v2024.4.27	1.12.0
Release date	2024-04-30	2024-05-23	2024-04-29	2024-04-30	2024-05-31
License	Apache 2	Apache 2	AGPL3	Commercial	MIT
Commercial support	✔	✔	✔	✔	✘

Supported PostgreSQL Features

	CloudNativePG	Crunchy Postgres for Kubernetes	OnGres StackGres	KubeDB	Zalando Postgres Operator
Supported versions	12, 13, 14, 15, 16	11, 12, 13, 14, 15, 16	12, 13, 14, 15, 16	9.6, 10, 11, 12, 13, 14	11, 12, 13, 14, 15, 16
Postgres Clusters	✔	✔	✔	✔	✔
Streaming replication	✔	✔	✔	✔	✔
Supports Extensions	✔	✔	✔	✔	✔

High Availability and Backup Features

	CloudNativePG	Crunchy Postgres for Kubernetes	OnGres StackGres	KubeDB	Zalando Postgres Operator
Hot Standby	✔	✔	✔	✔	✔
Warm Standby	✔	✔	✔	✔	✔
Automatic Failover	✔	✔	✔	✔	✔
Continuous Archiving	✔	✔	✔	✔	✔
Restore from WAL archive	✔	✔	✔	✔	✔
Supports PITR	✔	✔	✔	✔	✔
Manual backups	✔	✔	✔	✔	✔
Scheduled backups	✔	✔	✔	✔	✔

Kubernetes Specific Features

	CloudNativePG	Crunchy Postgres for Kubernetes	OnGres StackGres	KubeDB	Zalando Postgres Operator
Backups via Kubernetes	✔	✘	✔	✔	✘
Custom resources	✔	✔	✔	✔	✔
Uses default PG images	✘	✔	✔	✘	✘
CLI access	✔	✔	✔	✔	✘
WebUI	✘	✘	✔	✔	✘
Tolerations	✔	✔	✔	✔	✔
Node affinity	✔	✔	✔	✔	✔

How to choose your Postgres Operator?

The graph below shows the GitHub stars of the above Postgres Operators. What we see is a clear domination of Stolon, PGO, and Zalando’s PG Operator, with CloudNativePG rushing in from behind. StackGres, while around longer than CloudNativePG, doesn’t have the community backing behind it yet. But GitHub stars aren’t everything.

All of the above tools are great options, with the exception of Stolon, which isn’t a bad tool, but I’m concerned about the lack of activity. Make of it what you like.

Before closing, I want to quickly give some honorable mentions to two further tools.

Percona has an operator for PostgreSQL, but the community is very small right now. Let’s see if they manage to bring it on par with the other tools. If you use other Percona tools, it’s certainly worth giving it a look: Percona Operator for PostgreSQL.

The other one is the External PostgreSQL Server Operator by MoveToKube. It didn’t really fit the topic of this blog post, as it’s less of a Postgres Operator and more of a database (in Postgres’ relational entity sense) and user management tool. Meaning, it uses CRDs to add, update, and remove databases in external PG servers, as it does for Postgres users. Anyhow, this tool also works with services like Timescale Cloud, Amazon RDS, and many more. It’s worth mentioning, and maybe you can make use of it in the future.


Building a Time Series Database in the Cloud with Steven Sklar from QuestDB (video + interview)

Chris Engelbert — Fri, 12 Apr 2024 12:13:27 +0000

This interview is part of the simplyblock Cloud Commute Podcast, available on YouTube, Spotify, iTunes/Apple Podcasts, Pandora, Samsung Podcasts, and our show site.

In this installment of the podcast, we talked to Steven Sklar (his private blog, X/Twitter) from QuestDB, a company producing a time series database for large IoT, metrics, observability, and other time-component data sets. He talks about how they implemented their database offering, from building their own operator to how storage is handled.

Chris Engelbert: Hello, everyone. Welcome back to another episode of simplyblock’s Cloud Commute Podcast. Today, I’m joined by Steven Sklar from QuestDB. He was recommended by a really good friend and an old coworker who’s also at QuestDB. So hello, Steven, and good to have you.

Steven Sklar: Thank you. It’s really a pleasure to be here, and I’m looking forward to our chat.

Chris Engelbert: All right, cool. So maybe just start with a quick introduction. I mean, we already know your name, and I hope I pronounced that correctly. But what else is there to say about you?

Steven Sklar: Sure. So I kind of have a nontraditional background. I started with a degree in economics and finance and worked on Wall Street for a little bit. I like to say in most of my conference talks on the first slide that my first programming language was actually Excel VBA, which I do still have a soft spot for. And I found myself on a bond trading desk and kind of reached the boundaries of Excel and started working in C# and SQL Server, realized I liked that a lot more than just kind of talking to people on the phone and negotiating over various mortgage bonds and things. So I moved into the IT realm and software development and have been developing software ever since. So I moved on from C# into the Python world, moved on from finance into the startup world, and I’m currently at QuestDB, as you mentioned earlier.

Chris Engelbert: Right. So maybe you can say a few words about QuestDB. What is it? What does it do? And why do people want to use it?

Steven Sklar: Sure. QuestDB is a time series database with a focus on high performance. And I like to think of ease of usability. So we can ingest up to like millions of rows per second on some benchmarks, which is just completely mind-blowing to me. It’s actually written primarily in Java, which doesn’t necessarily go hand in hand with high performance, but we’ve rewritten most of the standard library to avoid memory allocation. So I know it actually truly is high performance. We’ve actually been introducing Rust as well into the code base. You can query the database using plain old SQL. And it really fits into several use cases, like financial tick by tick data and sensor data. I have one going on in my house right now, collecting all of my smart home stuff from Home Assistant. And I mean, yes, I’ve been here for around a year and a half, I want to say. And it’s been a great ride.

Chris Engelbert: Right. So you mentioned time series. And I’m aware of what time series are because I’ve been at a competitor before that. So Jaromir and I went slightly different directions, but we both ended up in the time series world. But the audience may not be perfectly aware of what time series are. You already mentioned tick data from the financial background. You also mentioned Home Assistant and IoT data, which is great because I’m doing the same thing. For me, it’s mostly energy consumption and stuff. But maybe you have some more examples.

Steven Sklar: Sure. Kind of a canonical one is monitoring and metrics. So any kind of data, I think, has a time component. Because it’s– and I think you need a specialized database. A lot of people ask, well, why not just use Postgres or any of the common databases? And you could, but you’re probably not going to scale. And you’re going to hit a point where your queries are just not performing. And time series databases, in many cases, ours in particular as well, I can speak to, are columnar databases. So they store data in a different format than you normally would see in a traditional database. And that makes querying and actually ingesting data from a wide range of sources much more efficient. And you kind of like to think of it as, I don’t want to put myself on the spot and do mental math. But imagine if you have 10,000 devices that are sending information to your database for more than a second. It’s not that big of a deal. But maybe, let’s say, you scale and you end up with a million devices. All of a sudden, you’re dealing with tremendous amounts of data going into your database that you need to manage. And that’s a different problem, I think, than your typical relational database.
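
The mental math hinted at here can be sketched quickly. The example below assumes, purely for illustration, that every device reports a fixed number of metrics once per second; the device counts come from the conversation, while the metric counts and the sampling rate are made-up assumptions.

```python
SECONDS_PER_DAY = 86_400


def rows_per_day(devices, metrics_per_device=1, samples_per_second=1.0):
    """Estimate how many rows land in the database per day."""
    return int(devices * metrics_per_device * samples_per_second * SECONDS_PER_DAY)


small_fleet = rows_per_day(10_000)                            # one metric per device
large_fleet = rows_per_day(1_000_000, metrics_per_device=100)  # many metrics per device
print(f"10k devices:  {small_fleet:,} rows/day")
print(f"1M devices:   {large_fleet:,} rows/day")
```

Going from the small to the large fleet multiplies the daily row count by four orders of magnitude under these assumptions, which is the scale at which a specialized storage model starts to matter.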

Chris Engelbert: Right. And I think you brought up a good example. Most of the time when we talk about devices– as I said, I’m coming from a kind of similar background– it’s not like a device just sends you a single data point. When we talk about connected cars, they actually send thousands to 100,000 data points: position information, all kinds of metrics about the car itself, the electronics, and all that kind of stuff. And that comes down to quite a massive amount of data. So yeah, I agree with you. An actual time series database is super important. You mentioned columnar storage. Maybe you can say a few words about how that is different from, I guess, your Excel sheet.

Steven Sklar: Sure. Well, I guess I don’t know if I can necessarily compare it to my Excel spreadsheet, since that’s its own weird XML format, of course. But columnar data, I guess, is different from, let’s say, tabular data in your typical database. Tabular data is generally stored in the table format, where all of your columns and rows are kind of stored together, versus in a columnar data store, where each column is its own separate file. And that kind of makes it more efficient when you’re working with a time component, because time is generally your index. You’re not really indexing on a lot of things like primary key type things. You’re really just mostly indexing on time, like what happened at this point in time or over this time period. Because of that, we’re able to optimize the storage model to allow faster querying and also ingestion as well. And just for clarity, I’m not a core developer. I’m more of a cloud guy, so I hope I got those details right.

Chris Engelbert: I think you get the gist of it. But for QuestDB, that means it still looks like a tabular kind of database. So you still have your typical tables, but the individual columns are stored separately. Is that correct?

Steven Sklar: Correct.
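
The difference between row-oriented and column-oriented storage can be illustrated with a small sketch. This is a toy in-memory model, not QuestDB’s storage engine: the column names and sample values are invented, and Python lists stand in for the separate per-column files mentioned above.

```python
# Row-oriented view: every record carries all of its columns together.
rows = [
    {"ts": 1, "device": "a", "temp": 20.5},
    {"ts": 2, "device": "b", "temp": 21.0},
    {"ts": 3, "device": "a", "temp": 19.8},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def avg_in_range(columns, value_col, start, end):
    """Average value_col for start <= ts <= end, reading only two columns."""
    picked = [
        value
        for ts, value in zip(columns["ts"], columns[value_col])
        if start <= ts <= end
    ]
    return sum(picked) / len(picked) if picked else None


columns = to_columns(rows)
average = avg_in_range(columns, "temp", 2, 3)
print(columns["ts"], average)
```

The time-range query never touches the `device` column at all, which is the property that makes columnar layouts attractive when time is the main index.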

Chris Engelbert: Ok, cool. So you said you’re a cloud guy. But as far as I know, you can install QuestDB locally, on-prem. You can install it into your own private cloud. I think there is the QuestDB cloud, which is the hosted platform. Well, not I guess. I know that it is. So maybe what is special about that? Does it have special features? Or is that mostly about the convenience of getting the managed database and getting rid of all the work you have to do when you run your own database, which can be complicated.

Steven Sklar: Absolutely. So actually, both. Obviously, you don’t have to manage it, and that’s great. You can leave it to the experts. That’s already worth the price of admission, I think. Additionally, we have the enterprise, the QuestDB enterprise, which has additional features. And all of those features, like role-based authentication and replication that’s coming soon and compression of your data on disk, are all things that you get automatically through the cloud.

Chris Engelbert: Ok, so that means I have to buy QuestDB enterprise when I want to have those features on prem, but I get them on the cloud right away.

Steven Sklar: Correct.

Chris Engelbert: Ok, cool. And correct me if I’m wrong, but I think from a client perspective, it uses the Postgres protocol. So any Postgres client is a QuestDB client, basically.

Steven Sklar: Absolutely, 100%.
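
Because QuestDB speaks the Postgres wire protocol, connecting looks like connecting to Postgres with a different address. The sketch below only builds a standard PostgreSQL connection URI. The port 8812, the user `admin`, the password `quest`, and the database `qdb` are what I recall as QuestDB’s defaults and should be treated as assumptions to verify against the QuestDB documentation.

```python
from urllib.parse import quote


def questdb_dsn(host="localhost", port=8812, user="admin",
                password="quest", database="qdb"):
    """Build a PostgreSQL connection URI that points at a QuestDB instance."""
    # quote() percent-encodes special characters in the credentials.
    return f"postgresql://{quote(user)}:{quote(password)}@{host}:{port}/{database}"


dsn = questdb_dsn()
print(dsn)
# The URI can be passed to a PostgreSQL client library (for example
# psycopg's connect function) exactly as it would be for a Postgres server.
```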

Chris Engelbert: All right, so that means as an application developer, it’s super simple. I’ll basically drop in QuestDB instead of Postgres or anything else. So yeah, let’s talk a little bit about the cloud then. Maybe you can elaborate a little bit on the stack you’re running on. I’m not sure how much you can actually say, but anything you can share will probably be beneficial to everyone.

Steven Sklar: Oh, yeah, no problem. So we run on AWS. We run on Kubernetes. And we also– I guess one thing that I’m particularly proud of is an operator that I wrote to orchestrate all these databases. So our model, which is not necessarily your bread and butter Kubernetes deployment, is actually a single-tenant model. So we have one database per instance. And when you’re running Kubernetes, you kind of think, why do you care about what nodes you’re running on? Shouldn’t all that be abstracted away? And I would agree. We primarily use Kubernetes for its orchestration. But we want to avoid the noisy neighbor problem. We want to make it easy for users to change instances and instance types quickly. We want users to be able to shut down their database. And we still have the volume. So all these things, we could orchestrate them directly through Kubernetes. But we decided to use single-tenant nodes for that.

Chris Engelbert: Right. So let me see. So that means you’re using Kubernetes, as you said, mostly for orchestration, which means it’s more like if the database for some reason goes down or you have to have maintenance or you want to upgrade. It’s more about the convenience of having something managing that instead of doing it manually, right?

Steven Sklar: Exactly. And so I think we really thought, ok– and this is a little bit before my time– but you could always roll your own cluster. But there’s so many things that are baked into Kubernetes these days, like monitoring and logs and metrics and networking and DNS and all these things that I don’t necessarily want to spend all my time on. I want to build a product. And by using Kubernetes and leveraging those components, we were able to build the cloud incredibly quickly, get us up and running, and then now expand upon it in the future. And that’s why, again, I mentioned the operator earlier. That was not originally part of the cloud. The cloud still has, in a more limited capacity, what we call a provisioner. So basically, if you’re interacting with the cloud and you make a new database, you basically send a message to a queue, and that message will be picked up by a provisioner. And previously, that provisioner would say, ok, you want a database. Let’s make a stateful set. Let’s make a persistent volume. Let’s make these networking policies. Let’s do all of these things. If there’s an error, we can roll back. And we have retries. So it’s fairly sophisticated. But we ended up moving towards this operator model, in which, instead of the provisioner managing each of these individual components, it just manages one QuestDB resource. And our operator now handles all of those other little things. So I think that’s much more flexible for us in terms of, A, simplifying the provisioner code, and B, adding new features instead of having to work in this ever-growing web of Python. Now, it’s really just adding a snippet here and there to our reconciliation inside of everything.
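
The earlier provisioner model described here, creating several components in sequence and rolling back on error, can be sketched roughly as follows. This is an illustration of the general pattern only, not QuestDB’s provisioner: the `provision` and `step` helpers and the component names are hypothetical, and the create and undo actions just write to a log instead of calling the Kubernetes API.

```python
def provision(steps):
    """Run (name, create, undo) steps in order; undo completed ones on failure."""
    done = []
    try:
        for name, create, undo in steps:
            create()
            done.append((name, undo))
    except Exception:
        # Roll back in reverse order so dependents are removed first.
        for _name, undo in reversed(done):
            undo()
        return False
    return True


log = []


def step(name, fail=False):
    """Build a hypothetical provisioning step that records what it does."""
    def create():
        if fail:
            raise RuntimeError(f"cannot create {name}")
        log.append(f"create {name}")

    def undo():
        log.append(f"delete {name}")

    return name, create, undo


ok = provision([step("statefulset"), step("volume"), step("network-policy", fail=True)])
print(ok, log)
```

In the operator model, this bookkeeping moves out of the provisioner: it declares one custom resource, and the operator’s reconciliation takes care of the individual components.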

Chris Engelbert: Right. You mentioned that the database is mostly written in Java. Most operators are written in Go. So what about your operator? Is it Java?

Steven Sklar: It’s Go.

Chris Engelbert: That’s fair. To be honest, I think the vast majority is. So you mentioned AWS. But I think that you are mostly talking about QuestDB Cloud, right? I think from a user’s perspective, do I use a helm chart or do I also use the operator to install it myself?

Steven Sklar: Yes. So the operator is actually only limited to the cloud because it’s built specifically to manage our own infrastructure with our own assumptions. We do have a helm chart and an open source image on Docker Hub. So I’ve used that plenty of times more than I can count.

Chris Engelbert: Ok, fair enough. So you basically support all cloud environments, all on-premise. But when you go for QuestDB Cloud, that is AWS, which I think is a fair decision. It is the biggest environment by far. So from a storage engine perspective, how much can you share? Can you talk about some fancy details? Like what kind of storage do you use? Do you use the local NVMe storage attached to the virtual machine or EBS volumes?

Steven Sklar: Yeah. So in our cloud, we have both actually NVMe and EBS. Most customers end up using EBS. And honestly, EBS is simpler to provision. But I do want to actually talk about some cool stuff that we’ve done with compression. Because we actually never implemented our own compression algorithm. We’re running on top of ZFS and using their compression algorithm. And we’ve actually– there’s an issue about data corruption, potentially, using mmap on ZFS, or rather a combination of mmap and traditional sys calls, the pwrite and preads. And what we do is actually identify when we’re running on ZFS and then decide to only use mmap calls to avoid this issue. And I think what we’ve done is pretty cool also on the storage side of orchestrating this whole thing. Because ZFS has its own notion of snapshots, its own notion of replication, its own notion of ZPools. And to simplify things, again, because we’re running this kind of I don’t necessarily want to say antiquated, but we’re running a single-tenant model, which might not be in vogue these days. What we actually do is we create one ZPool per volume and throw our QuestDB on the ZPool, enabling compression. And we’ve written our own CSI storage driver that sits in the middle of Kubernetes and other cloud providers so that we’re able to pass calls onto the cloud providers if, let’s say, we need to create or delete a volume using the cloud provider API. But when it comes to mounting specific ZFS and running ZFS-related commands, we actually take control of that and perform that in our own driver. I don’t know when this is going to be released, but I’m actually talking about this in Atlanta next week.

Chris Engelbert: No. Next week is a little bit early. Currently, I’m doing a couple of recordings, building a little bit of a pipeline. Because of conferences, I will be in Paris for KubeCon next week. So there is a little bit of a delay. No, I don’t know the exact date. I think it’s in three or four weeks. So it’s a little bit out. But I guess your talk may be recorded and public by then. If that is the case, I’m happy if you drop it over and I put it into the show notes. People will love that. So you said when you run on, or when you detect that you run on ZFS, you use mmap. So you basically map the file into memory, you change the memory positions directly, and then you fsync it? Or how does it work? How do I have to think about that?

Steven Sklar: Oh, boy. Ok. This is getting a little out of my– So you always use mmap regardless. But the issue is when you combine mmap with traditional sys calls on ZFS. And so what we do is we basically turn off those other sys calls and only use mmap when we’re writing to our files. In terms of the specifics of when we sync and things like that, I wish I could answer it right off of the bat.
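As a rough illustration of what "only use mmap when writing" means, the sketch below writes to a file purely through a memory mapping and never calls `pwrite`. It is not QuestDB's code (QuestDB is not written in Python); the helper names are made up for this example, and the explicit `flush()` is an assumption, since the interview does not say when QuestDB syncs.

```python
import mmap
import os
import tempfile


def write_via_mmap(path, offset, payload):
    """Write payload at offset through a memory mapping, without calling pwrite."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mapped:  # length 0 maps the whole file
            mapped[offset:offset + len(payload)] = payload
            mapped.flush()  # ask the OS to write the dirty pages back to the file


def read_via_mmap(path, offset, length):
    """Read bytes back through a read-only mapping of the same file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return bytes(mapped[offset:offset + length])


# The file must already have its final size; a mapping cannot grow it.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "r+b") as f:
    f.truncate(4096)

write_via_mmap(path, 128, b"tick")
result = read_via_mmap(path, 128, 4)
os.remove(path)
```

Because every write goes through the mapping, there is no mixing of mapped pages and `pwrite` calls on the same file, which is the combination described as problematic on ZFS.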

Chris Engelbert: That’s totally fine. So just to sneak in a little shameless plug here, we should totally look into getting QuestDB running on simplyblock. I think that could be a really interesting thing. Because you mentioned ZFS, it’s basically ZFS on steroids. ZFS from my perspective, I mean, I’m running a ZFS file server in the basement. It saved me a couple of times with a broken hard disk. It’s just an incredible piece of technology. I agree with that. And it’s interesting because I’ve seen a lot of people running database on ZFS. And ZFS is all about reliability. It’s not necessarily about the highest performance. So it’s interesting you choose ZFS and you say, that’s perfect and works great for us. So because we’re almost running out of time, as I said earlier, 20 minutes is super short. When you look at cloud and databases and the world as a whole, whatever you want to talk about, what do you think is the next big trend or the current big trend? What is coming? What do you think would be really cool?

Steven Sklar: Yeah. So I guess I’m not going to talk about the existential crisis I’m having with Devin and the AI bots because it’s just a little depressing for me right now. But I think one thing that I’ve been seeing over the past few years that I find very interesting is this move away from cloud and back into your own data center. I think having control over your data is something that’s incredibly important to basically everyone now. And I think it’s to find a happy medium as a DevOps engineer between all the wonderful cloud APIs that you can use and going in the server room and kind of hooking things up. There’s probably a happy medium there somewhere that I think is an area that is going to start growing in the future. You see a lot of on-prem Kubernetes type things, Kubernetes on edge maybe. And for me, it presents a lot of interesting challenges because I spent most of my career in startups working on the cloud and understanding the fundamentals of not just the cloud APIs but operating systems and hardware a little bit. And so kind of figuring out where to draw that line in terms of what knowledge is transferable to this new paradigm will be interesting. And I think that’s a new trend that I’ve been focused on at least over the past couple of months.

Chris Engelbert: That is interesting that you mentioned that because it is kind of that. When the cloud became big, everyone wanted to move to the cloud because it was like “cheaper” in air quotes. And I think– well, the next step was serverless because it is yet even cheaper, which we all know is not necessarily true. And I see kind of the same thing. Now people realize that not every workload actually works perfectly or is a great fit for the cloud and people slowly start moving back or at least going back to not necessarily cloud instance but co-located servers or virtual machines, like plain virtual machines and just taking those for the workloads that do not need to be super scalable or super elastic. Well, thank you very much. That was very delightful. It was a pleasure having you.

Steven Sklar: Thank you.

Chris Engelbert: Thank you for being here and for the audience. I hope to– well, not see you, but hear you next time, next week. Thank you very much.

Steven Sklar: Thank you. Take care.


Production-grade Kubernetes PostgreSQL, Álvaro Hernández

Chris Engelbert — Fri, 05 Apr 2024 12:13:27 +0000

In this episode of the Cloud Commute podcast, Chris Engelbert is joined by Álvaro Hernández Tortosa, a prominent figure in the PostgreSQL community and CEO of OnGres. Álvaro shares his deep insights into running production-grade PostgreSQL on Kubernetes, a complex yet rewarding endeavor. The discussion covers the challenges, best practices, and innovations that make PostgreSQL a powerful database choice in cloud-native environments.

This interview is part of the simplyblock Cloud Commute Podcast, available on Youtube, Spotify, iTunes/Apple Podcasts, Pandora, Samsung Podcasts, and our show site.

Key Takeaways

Q: Should you deploy PostgreSQL in Kubernetes?

Deploying PostgreSQL in Kubernetes is a strategic move for organizations aiming for flexibility and scalability. Álvaro emphasizes that Kubernetes abstracts the underlying infrastructure, allowing PostgreSQL to run consistently across various environments—whether on-premise or in the cloud. This approach not only simplifies deployments but also ensures that the database is resilient and highly available.

Q: What are the main challenges of running PostgreSQL on Kubernetes?

Running PostgreSQL on Kubernetes presents unique challenges, particularly around storage and network performance. Network disks, commonly used in cloud environments, often lag behind local disks in performance, impacting database operations. However, these challenges can be mitigated by carefully choosing storage solutions and configuring Kubernetes to optimize performance. Furthermore, managing PostgreSQL’s ecosystem—such as backups, monitoring, and high availability—requires robust tooling and expertise, which can be streamlined with solutions like StackGres.

Q: Why should you use Kubernetes for PostgreSQL?

Kubernetes offers a powerful platform for running PostgreSQL due to its ability to abstract infrastructure details, automate deployments, and provide built-in scaling capabilities. Kubernetes facilitates the management of complex PostgreSQL environments, making it easier to achieve high availability and resilience without being locked into a specific vendor’s ecosystem.

Q: Can I use PostgreSQL on Kubernetes with PGO?

Yes, you can. Tools like the PostgreSQL Operator (PGO) for Kubernetes simplify the management of PostgreSQL clusters by automating routine tasks such as backups, scaling, and updates. These operators are essential for ensuring that PostgreSQL runs efficiently on Kubernetes while reducing the operational burden on database administrators.

In addition to highlighting the key takeaways, it’s essential to provide deeper context and insights that enrich the listener’s understanding of the episode. By offering this added layer of information, we ensure that when you tune in, you’ll have a clearer grasp of the nuances behind the discussion. This approach enhances your engagement with the content and helps shed light on the reasoning and perspective behind the thoughtful questions posed by our host, Chris Engelbert. Ultimately, this allows for a more immersive and insightful listening experience.

Key Learnings

Q: How does the Kubernetes scheduler work with PostgreSQL?

Kubernetes uses its scheduler to manage how and where PostgreSQL instances are deployed, ensuring optimal resource utilization. However, understanding the nuances of Kubernetes’ scheduling can help optimize PostgreSQL performance, especially in environments with fluctuating workloads.

simplyblock Insight: Leveraging simplyblock’s solution, users can integrate sophisticated monitoring and management tools with Kubernetes, allowing them to automate the scaling and scheduling of PostgreSQL workloads, thereby ensuring that database resources are efficiently utilized and downtime is minimized.

Q: What is the best experience of running PostgreSQL in Kubernetes?

The best experience comes from utilizing a Kubernetes operator like StackGres, which simplifies the deployment and management of PostgreSQL clusters. StackGres handles critical functions such as backups, monitoring, and high availability out of the box, providing a seamless experience for both seasoned DBAs and those new to PostgreSQL on Kubernetes.

simplyblock Insight: By using simplyblock’s Kubernetes-based solutions, you can further enhance your PostgreSQL deployments with features like dynamic scaling and automated failover, ensuring that your database remains resilient and performs optimally under varying loads.

Q: How does disk access latency impact PostgreSQL performance in Kubernetes?

Disk access latency is a significant factor in PostgreSQL performance, especially in Kubernetes environments where network storage is commonly used. While network storage offers flexibility, it typically has higher latency compared to local storage, which can slow down database operations. Optimizing storage configurations in Kubernetes is crucial to minimizing latency and maintaining high performance.

simplyblock Insight: simplyblock’s advanced storage solutions for Kubernetes can help mitigate these latency issues by providing optimized, low-latency storage options tailored specifically for PostgreSQL workloads, ensuring your database runs at peak efficiency.

Q: What are the advantages of clustering in PostgreSQL on Kubernetes?

Clustering PostgreSQL in Kubernetes offers several advantages, including improved fault tolerance, load balancing, and easier scaling. Kubernetes operators like StackGres enable automated clustering, which simplifies the process of setting up and managing a highly available PostgreSQL cluster.

simplyblock Insight: With simplyblock, you can easily deploy clustered PostgreSQL environments that automatically adjust to your workload demands, ensuring continuous availability and optimal performance across all nodes in your cluster.

Additional Nugget of Information

Q: What are the advantages of clustering in Postgres?

Clustering in PostgreSQL provides several benefits, including improved performance, high availability, and better fault tolerance. Clustering allows multiple database instances to work together, distributing the load and ensuring that if one node fails, others can take over without downtime. This setup is particularly advantageous for large-scale applications that require high availability and resilience. Clustering also enables better scalability, as you can add more nodes to handle increasing workloads, ensuring consistent performance as demand grows.

Conclusion

Deploying PostgreSQL on Kubernetes offers powerful capabilities but comes with challenges. Álvaro Hernández Tortosa highlights how StackGres simplifies this process, enhancing performance, ensuring high availability, and making PostgreSQL more accessible. With the right tools and insights, you can confidently manage PostgreSQL in a cloud-native environment.

Full Video Transcript

Chris Engelbert: Welcome to this week’s episode of Cloud Commute podcast by simplyblock. Today, I have another incredible guest, a really good friend, Álvaro Hernández from OnGres. He’s very big in the Postgres community. So hello, and welcome, Álvaro.

Álvaro Hernández Tortosa: Thank you very much, first of all, for having me here. It’s an honor.

Chris Engelbert: Maybe just start by introducing yourself, who you are, what you’ve done in the past, how you got here. Well, except me inviting you.

Álvaro Hernández Tortosa: OK, well, I don’t know how to describe myself, but I would say, first of all, I’m a big nerd, big fan of open source. And I’ve been working with Postgres, I don’t know, for more than 20 years, 24 years now. So I’m a big Postgres person. There’s someone out there in the community that says that if you say Postgres three times, I will pop up there. It’s kind of like Superman or Batman or these superheroes. No, I’m not a superhero. But anyway, professionally, I’m the founder and CEO of a company called OnGres. Let’s guess what it means, On Postgres. So it’s pretty obvious what we do. So everything revolves around Postgres, but in reality, I love all kinds of technology. I’ve been working a lot with many other technologies. I know you because of being a Java programmer, which is kind of my hobby. I love programming in my free time, which almost doesn’t exist. But I try to get some from time to time. And everything related to technology in general, I’m also a big fan and supporter of open source. I have contributed and keep contributing a lot to open source. I also founded some open source communities, like for example, I’m a Spaniard. I live in Spain. And I founded Debian Spain, an association like, I don’t know, 20 years ago. More recently, I also founded a foundation, a nonprofit foundation also in Spain called Fundación PostgreSQL. Again, guess what it does? And I try to engage a lot with the open source communities. We, by the way, organized a conference for those who are interested in Postgres in the magnificent island of Ibiza in the Mediterranean Sea in September this year, 9th to 11th September for those who want to join. So yeah, that’s probably a brief intro about myself.

Chris Engelbert: All right. So you are basically the Beetlejuice of Postgres. That’s what you’re saying.

Álvaro Hernández Tortosa: Beetlejuice, right. That’s more appropriate than superheroes. You’re absolutely right.

Chris Engelbert: I’m not sure if he is a superhero, but he’s different at least. Anyway, you mentioned OnGres. And I know OnGres isn’t really like the first company. There were quite a few before, I think, El Toro, a database company.

Álvaro Hernández Tortosa: Yes, Toro DB.

Chris Engelbert: Oh, Toro DB. Sorry, close, close, very close. So what is up with that? You’re trying to do a lot of different things and seem to love trying new things, right?

Álvaro Hernández Tortosa: Yes. So I sometimes define myself as a 0.x serial entrepreneur, meaning that I’ve tried several ventures and sold none of them. But I’m still trying. I like to try to be resilient, and I keep pushing the ideas that I have in the back of my head. So yes, I’ve done several ventures, all of them around certain patterns. So for example, you’re asking about Toro DB. Toro DB is essentially open source software that is meant to replace MongoDB with, you guessed it, Postgres, right? There’s a certain pattern in my professional life. And Toro DB was, I’m speaking in the past because it is unfortunately no longer a maintained open source project. We moved on to something else, which is OnGres. But the idea of Toro DB was to essentially replicate the documents live from MongoDB and in the process, in real time, transform them into a set of relational tables that got stored inside of a Postgres database. So it enabled you to do SQL queries on your documents that were in MongoDB. So think of a MongoDB replica. You can keep your MongoDB cluster if you want, and then you have all the data in SQL. This was great for analytics. You could have great speedups by normalizing data automatically and then doing queries with the power of SQL, which obviously is much broader and richer than the MongoDB query language, especially for analytics. We got like 100 times faster on most queries. So it was an interesting project.

Chris Engelbert: So that means you basically generated the schema on the fly and then generated the table for that schema specifically? Interesting.

Álvaro Hernández Tortosa: Yeah, it was generating tables and columns on the fly.
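To make the idea of generating tables and columns on the fly concrete, here is a small sketch that splits a nested document into relational rows. It is an assumption-laden illustration, not Toro DB's actual algorithm: the `shred` function, the table naming scheme, and the `id`/`parent_id` columns are invented for this example.

```python
from collections import defaultdict


def shred(doc, table, tables, parent_id=None):
    """Split one document into rows: scalars become columns of the current
    table, nested objects and array items become rows in child tables."""
    row_id = len(tables[table]) + 1
    row = {"id": row_id}
    if parent_id is not None:
        row["parent_id"] = parent_id
    tables[table].append(row)
    for key, value in doc.items():
        if isinstance(value, dict):
            shred(value, f"{table}_{key}", tables, row_id)
        elif isinstance(value, list):
            for item in value:
                child = item if isinstance(item, dict) else {"value": item}
                shred(child, f"{table}_{key}", tables, row_id)
        else:
            row[key] = value  # a new key simply becomes a new column
    return tables


tables = shred(
    {"name": "Ada", "address": {"city": "Madrid"}, "tags": ["a", "b"]},
    "users",
    defaultdict(list),
)
```

In this example the single document ends up as one row in `users`, one row in `users_address`, and two rows in `users_tags`, each child row pointing back to its parent, so the data can be joined and aggregated with plain SQL once loaded into a relational database.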

OnGres StackGres: Operator for Production-Grade PostgreSQL on Kubernetes

Chris Engelbert: Right. Ok, interesting. So now you’re doing the OnGres thing. And OnGres has, I think, the main product, StackGres, as far as I know. Can you tell a little bit about that?

Álvaro Hernández Tortosa: Yes. So OnGres, as I said, means On Postgres. And one of our goals in OnGres is that we believe that Postgres is a fantastic database. I don’t need to explain that to you, right? But it’s kind of the Linux kernel, if I may use this parallel. It’s a bit bare bones. You need something around it. You need a distribution, right? So Postgres is a little bit the same thing. The core is small, it’s fantastic, it’s very featureful, it’s reliable, it’s trustable. But it needs tools around it. So our vision in OnGres is to develop this ecosystem around this Postgres core, right? And one of the things that we experience during our professional lifetime is that Postgres requires a lot of tools around it. It needs monitoring, it needs backups, it needs high availability, it needs connection pooling.

By the way, do not use Postgres without connection pooling, right? So you need a lot of tools around it. And none of these tools come from the core. You need to look into the ecosystem. And actually, this is good and bad. It’s good because there’s a lot of options. It’s bad because there’s a lot of options. Meaning which one to choose, which one is good, which one is bad, which one goes with a good backup solution or a good monitoring solution, and how you configure them all. So this was a problem that we coined the stack problem. So when you really want to run Postgres in production, you need the stack on top of Postgres, right? To orchestrate all these components.

Now, the problem is that we’ve been doing this a lot of times for our customers. Typically, we love infrastructure as code, right? And everything was done with Ansible and similar tools: Terraform for infrastructure and Ansible for orchestrating these components. But the reality is that every environment into which we looked was slightly different. And we couldn’t just take our Ansible code and run it. You’ve got this stack. But now the storage is different. Your networking is different. Your entry point. Here, one is using virtual IPs. That one is using DNS. That one is using proxies. And then the compute is also somehow different. And it was not reusable. We were doing a lot of copy, paste, modify, something that was not very sustainable. At some point, we started thinking, is there a way in which we can pack this stack into a single deployable unit that we can take essentially anywhere? And the answer was Kubernetes. Kubernetes provides us this abstraction where we can abstract away this compute, this storage, this networking, and code against a programmable API so that we can indeed create this package. So that’s StackGres.

StackGres is the stack of components you need to run production Postgres, packaged in a way that is uniform across any environment where you want to run it, cloud, on-prem, it doesn’t matter. And it is production ready! It’s packaged at a very, very high level. So basically you barely need, I would say, you don’t need Postgres knowledge to run a production-ready, enterprise-quality Postgres cluster in production. And that’s the main goal of StackGres.

Chris Engelbert: Right, right. And as far as I know, I think it’s implemented as a Kubernetes operator, right?

Álvaro Hernández Tortosa: Yes, exactly.

Chris Engelbert: And there’s quite a few other operators as well. But I know that StackGres has some things which are done slightly differently. Can you talk a little bit about that? I don’t know how much you wanna actually make this public right now.

Álvaro Hernández Tortosa: No, actually everything is open source. Our roadmap is open source, our issues are open source. I’m happy to share everything. Well, first of all, what I would say is that the operator pattern is essentially these controllers that take actions on your cluster, and the CRDs. We gave a lot of thought to these CRDs. I would say that for a lot of operators, CRDs are kind of a byproduct, an afterthought: “I have my objects and then some script generates the CRDs.” No, we said CRDs are our user-facing API. The CRDs are our extended API. And the goal of operators is to abstract away and package business logic, right? And expose it with a simple user interface.

So we designed our CRDs to be very, very high level, very amenable to the user, so that again, you don’t require any Postgres expertise. So if you look at the CRDs, in practical terms, the YAMLs, right? The YAMLs that you write to deploy something on StackGres should be simple enough that you could explain them to your five-year-old kid, and your five-year-old kid should be able to deploy Postgres into a production-quality cluster, right? And that’s our goal. And if we didn’t fulfill this goal, please raise an issue on our public issue tracker on GitLab because we definitely have failed if that’s not true. So instead of focusing on the usual Postgres user, who is very knowledgeable, we focused on a very high level. Most operators focus on low-level CRDs and they require Postgres expertise, probably a lot. We want to make Postgres more mainstream than ever, right? Postgres increases in popularity every year and it’s being adopted by more and more organizations, but not everybody’s a Postgres expert. We want to make Postgres universally accessible for everyone. So one of the things is that we put a lot of effort into this design. And instead of one big, gigantic CRD, we have multiple. They can actually be related to each other like in an ER diagram. So you understand relationships, you create one and then you reference it many times, and you don’t need to restart or reconfigure the configuration files. Another area where I would say we have tried to do something is extensions. Postgres extensions are one of the most loved, if not the most loved feature, right?

And StackGres is the operator that arguably supports the largest number of extensions, over 200 extensions as of now and growing. And we did this because we developed a custom solution, which is also open source by StackGres, where we can load extensions dynamically into the cluster. So we don’t need to build you a fat container with 200 images and a lot of security issues, right? But rather we deploy you a container with no extensions. And then you say, “I want this, this, this and that.” And then they will appear in your cluster automatically. And this is done via simple YAML. So we have a very powerful extension mechanism. And the other thing is that we not only expose the usual CRD YAML interface for interacting with StackGres, which is more than fine and I love it, but it also comes with a fully fledged web console. Not everybody likes the command line or the GitOps approach. We do, but not everybody does. And it’s a fully fledged web console which also supports single sign-on, where you can integrate with your AD, with your OIDC provider, anything that you want. It has detailed fine-grained permissions based on Kubernetes RBAC. So you can say, “Who can create clusters, who can view configurations, who can do anything?” And last but not least, there’s a REST API. So if you prefer to automate and integrate with another kind of solution, you can also use the REST API and create clusters and manage clusters via the REST API. And these three mechanisms, the YAML files (the CRDs), the REST API and the web console, are fully interchangeable. You can use one for one operation and another one for the next; everything goes back to the same state. So you can use any one that you want.

And lately we have also added sharding. So sharding scales out with solutions like Citus, but we also support foreign interoperability, Postgres with partitioning and Apache ShardingSphere. Our way is to create a cluster of multiple instances: not only one primary and one replica, but a coordinator layer and then shards, and the coordinator and each shard have their own replicas. So it is typically dozens of instances, and you can create them with a simple YAML file and a very high-level description. It requires some knowledge, and it wires everything for you. So it’s very, very convenient to make things simple.

Chris Engelbert: Right. So the plugin mechanism or the extension mechanism, that was exactly what I was hinting at. That was mind-blowing. I had never seen anything like that when you showed it last year in Ibiza. The other thing that is always a little bit of a head-scratcher, I think, for a lot of people is when they hear that a Kubernetes operator is actually written in Java. I think Red Hat built the original framework. So it kind of makes sense that Red Hat is doing that, but I think the original framework was a Go library. And Java would probably not be the first choice to do that. So how did that happen?

Álvaro Hernández Tortosa: Well, first, you’re right. The operator framework is written in Go and there was nothing else than Go at the time. So we were looking at that, but we had a team of very, very senior Java programmers and none of them were Go programmers, right? I’ve seen in the Postgres community and in other communities that for people who are more in the DevOps world, switching to Go is a bit more natural, but at the same time, they are not senior from a Go programming perspective, right? The same would have happened with our team, right? They would have switched from Java to Go, and they would not have been senior in Go, obviously, right? So it would have taken some time to develop those skills. On the other hand, we looked at the technology behind it: what is an operator? An operator is no more than essentially an HTTP server that receives callbacks from Kubernetes and a client, because it makes calls to Kubernetes. And HTTP clients and servers can be written in any language. So we looked at the core, how complicated this is and how much this operator framework brings to you. And we saw that it was not that much.

And actually, as I just mentioned before, in that framework the CRDs are kind of generated from your structures, and we really wanted to do it the opposite way. This is like with a database. You either use an ORM to read your existing database schema, which you developed with all your SQL capabilities, or you just create an object and let that generate the database. I prefer the former. So we did the same thing with the CRDs, right? We wanted to develop them ourselves. So Java was more than okay to develop a Kubernetes operator, and our team was expert in Java. So by doing it in Java, we were able to be very efficient and deliver a lot of value, a lot of features, very, very fast without having to retrain anyone, learn a new language, or learn new skills. On top of this, there’s sometimes a concern that Java requires a JVM, which is kind of a heavy environment, right? And consumes memory and resources, and disk. But by default, StackGres uses a compilation technology, with a whole project built around it, called GraalVM. And this allows you to generate native images that are indistinguishable from any other Linux binary you can have on your system. And we deploy StackGres with native images. You can also switch to JVM images if you prefer. We expose both, but by default, they are native images. So at the end of the day, StackGres is a file of several megabytes, a Linux binary in a container, and that’s it.

Chris Engelbert: That makes sense. And I like that you basically pointed out that the efficiency of the existing developers was much more important than being cool and going for a new language just because everyone does. So we talked about the operator quite a bit. What are your general thoughts on databases in the cloud or specifically in Kubernetes? What are the issues you see, the problems running a database in such an environment?

Álvaro Hernández Tortosa: Well, it’s a wide topic, right? And I think one of the most interesting topics that we’re seeing lately is a concern about cost and performance. So there’s kind of a trade off as usual, right? There’s a trade off with the convenience: I want to run a database and almost forget about it. And that’s why you switch to a cloud managed service, which is not always true by the way, because forgetting about it means that nobody’s gonna then vacuum your database, repack your tables, right? Optimize your queries, analyze if you have unused indexes. So if you’re very small, that’s more than okay. You can assume that you don’t need to touch your database. But if you grow over a certain level, you’re gonna need the same DBAs, at least to operate beyond the basic operations of the database, which are monitoring, high availability and backups. So those are the three main areas that a managed service provides to you.

But so there’s convenience, but then there’s an additional cost. And this additional cost sometimes is quite notable, right? So it’s typically around an 80% premium on (N+1)/N instances, because sometimes we need an extra instance for many cloud services, right? And that, multiplied by 1.8, ends up being two point something in the usual case. So you’re overpaying that. So you need to analyze whether this is good for you from this perspective of convenience or if you want to have something else. On the other hand, almost all cloud services use network disks. And these network disks are very good and have improved performance a lot in the last years, but still they are far from the performance of a local drive, right? And running databases with local drives has its own challenges, but they can be addressed. And you can really, really move the needle by kind of, I don’t know if that’s the right term to call it, self-hosting. But there is this trend of self-hosting, and if we could marry the simplicity and the convenience of managed services, right?

With the ability of running on any environment and running on any environment at a much higher performance, I think that’s kind of an interesting trend right now and a good sweet spot. And Kubernetes, to try to marry all the terms that you mentioned in the question, actually is one driver towards this goal because it enables us infrastructure independence and it enables both network disks and local disks and equally the same. And it’s kind of an enabler for this pattern that I see more trends, more trends as of now, more important and one that definitely we are looking forward to.

Chris Engelbert: Right, I like that you pointed out that there’s ways to address the local storage issues, just shameless plug, we’re actually working on something.

Álvaro Hernández Tortosa: I heard something.

The Biggest Trend in Containers?

Chris Engelbert: Oh, you heard something. (laughing) All right, last question because we’re also running out of time. What do you see as the biggest trend right now in containers, cloud, whatever? What do you think is like the next big thing? And don’t say AI, everyone says that.

Álvaro Hernández Tortosa: Oh, no. Well, you know what? Let me do a shameless plug here, right?

Chris Engelbert: All right. I did one. (laughing)

Álvaro Hernández Tortosa: So there’s a technology we’re working on right now that works for our use case, but will work for many use cases also, which is what we’re calling dynamic containers. So containers are essentially something that is static, right? You build a container, you have a build with your Dockerfile, whatever you use, right? And then that image is static. It is what it is. It contains the layers that you specified and that’s all. But if you look at any repository in Docker Hub, right? There’s plenty of tags. You have what, for example, Postgres. There’s Postgres based on Debian. There’s Postgres based on Alpine. There’s Postgres with this option. Then you want this extension, then you want this other extension. And then there’s a whole variety of images, right? And each of those images needs to be built independently, maintained, updated independently, right? But they’re very orthogonal. Like upgrading the Debian base OS has nothing to do with the Postgres layer, has nothing to do with the Timescale extension, has nothing to do with whether I want the debug symbols or not. So we’re working on technology with the goal of being able to, as a user, express any combination of items I want for my container and get that container image without having to rebuild and maintain the image with the specific parameters that I want.

Chris Engelbert: Right, and let me guess, that is how the Postgres extension stuff works.

Álvaro Hernández Tortosa: It is meant to be a solution for the Postgres extensions, but it’s actually quite broad and quite general, right? Like, for example, I was discussing recently with some folks of the OpenTelemetry community, and the OpenTelemetry collector, which is the router for signals in the OpenTelemetry world, right? Has the same architecture, has like around 200 plugins, right? And you don’t want a container image with those 200 plugins, which potentially, because many come from third parties, may have some security vulnerabilities, or even if there’s an update, you don’t want to update all those and restart your containers and all that, right? So why don’t you kind of get a container image with the OpenTelemetry collector with this source and this receiver and this exporter, right? So that’s actually probably more applicable.

Chris Engelbert: Yeah, I think that makes sense, right? I think that is a really good end, especially because the original idea of static containers was that being static gives you some kind of consistency and some security on how the container looks, but we figured out over time that this is not the best solution. So I’m really looking forward to that being probably a more general thing.

Álvaro Hernández Tortosa: To be honest, actually the idea, I call it dynamic containers, but in reality, from a user perspective, they’re the same static as before. They are dynamic from the registry perspective.

Chris Engelbert: Right, okay, fair enough. All right, thank you very much. It was a pleasure like always talking to you. And for the other ones, I see, hear, or read you next week with my next guest. And thank you to Álvaro, thank you for being here. It was appreciated like always.

Álvaro Hernández Tortosa: Thank you very much.

The post Production-grade Kubernetes PostgreSQL, Álvaro Hernández appeared first on simplyblock.

How the CSI (Container Storage Interface) Works

Steven Sklar (Guest Author, QuestDB) — Fri, 29 Mar 2024 12:13:27 +0000

If you work with persistent storage in Kubernetes, maybe you’ve seen articles about how to migrate from in-tree to CSI volumes, but aren’t sure what all the fuss is about? Or perhaps you’re trying to debug a stuck VolumeAttachment that won’t unmount from a node, holding up your important StatefulSet rollout? A clear understanding of what the Container Storage Interface (or CSI for short) is and how it works will give you confidence when dealing with persistent data in Kubernetes, allowing you to answer these questions and more!

Editorial: This blog post is written by a guest author, Steven Sklar from QuestDB. It appeared first on his private blog at sklar.rocks. We appreciate his contributions to the Kubernetes ecosystem and wanted to thank him for letting us repost his article. Steven, you rock! 🔥

The Container Storage Interface is an API specification that enables developers to build custom drivers which handle the provisioning, attaching, and mounting of volumes in containerized workloads. As long as a driver correctly implements the CSI API spec, it can be used in any supported Container Orchestration system, like Kubernetes. This decouples persistent storage development efforts from core cluster management tooling, allowing for the rapid development and iteration of storage drivers across the cloud native ecosystem.

In Kubernetes, the CSI has replaced legacy in-tree volumes with a more flexible means of managing storage mediums. Previously, in order to take advantage of new storage types, one would have had to upgrade an entire cluster’s Kubernetes version to access new PersistentVolume API fields for a new storage type. But now, with the plethora of independent CSI drivers available, you can add any type of underlying storage to your cluster instantly, as long as there’s a driver for it.

But what if existing drivers don’t provide the features that you require and you want to build a new custom driver? Maybe you’re concerned about the ramifications of migrating from in-tree to CSI volumes? Or, you simply want to learn more about how persistent storage works in Kubernetes? Well, you’re in the right place! This article will describe what the CSI is and detail how it’s implemented in Kubernetes.

It’s APIs all the way down

Like many things in the Kubernetes ecosystem, the Container Storage Interface is actually just an API specification. In the container-storage-interface/spec GitHub repo, you can find this spec in 2 different versions:

A protobuf file that defines the API schema in gRPC terms
A markdown file that describes the overall system architecture and goes into detail about each API call

What I’m going to discuss in this section is an abridged version of that markdown file, while borrowing some nice ASCII diagrams from the repo itself!

Architecture

A CSI Driver has 2 components, a Node Plugin and a Controller Plugin. The Controller Plugin is responsible for high-level volume management: creating, deleting, attaching, detaching, snapshotting, and restoring physical (or virtualized) volumes. If you’re using a driver built for a cloud provider, like EBS on AWS, the driver’s Controller Plugin communicates with AWS HTTPS APIs to perform these operations. For other storage types like NFS, ESXi, ZFS, and more, the driver sends these requests to the underlying storage’s API endpoint, in whatever format that API accepts.

Editorial: The same is true for simplyblock. Simplyblock’s CSI driver implements all of the necessary calls described below, making it a perfect drop-in replacement for Amazon EBS. If you want to learn more, read: Why simplyblock.

On the other hand, the Node Plugin is responsible for mounting and provisioning a volume once it’s been attached to a node. These low-level operations usually require privileged access, so the Node Plugin is installed on every node in your cluster’s data plane, wherever a volume could be mounted.

The Node Plugin is also responsible for reporting metrics like disk usage back to the Container Orchestration system (referred to as the “CO” in the spec). As you might have guessed already, I’ll be using Kubernetes as the CO in this post! But what makes the spec so powerful is that it can be used by any container orchestration system, like Nomad for example, as long as it abides by the contract set by the API guidelines.

The specification doc provides a few possible deployment patterns, so let’s start with the most common one.

CO "Master" Host
+-------------------------------------------+
|                                           |
|  +------------+           +------------+  |
|  |     CO     |   gRPC    | Controller |  |
|  |            +----------->   Plugin   |  |
|  +------------+           +------------+  |
|                                           |
+-------------------------------------------+

CO "Node" Host(s)
+-------------------------------------------+
|                                           |
|  +------------+           +------------+  |
|  |     CO     |   gRPC    |    Node    |  |
|  |            +----------->   Plugin   |  |
|  +------------+           +------------+  |
|                                           |
+-------------------------------------------+

Since the Controller Plugin is concerned with higher-level volume operations, it does not need to run on a host in your cluster’s data plane. For example, in AWS, the Controller makes AWS API calls like ec2:CreateVolume, ec2:AttachVolume, or ec2:CreateSnapshot to manage EBS volumes. These functions can be run anywhere, as long as the caller is authenticated with AWS. All the CO needs is to be able to send messages to the plugin over gRPC. So in this architecture, the Controller Plugin is running on a “master” host in the cluster’s control plane.

On the other hand, the Node Plugin must be running on a host in the cluster’s data plane. Once the Controller Plugin has done its job by attaching a volume to a node for a workload to use, the Node Plugin (running on that node) will take over by mounting the volume to a well-known path and optionally formatting it. At this point, the CO is free to use that path as a volume mount when creating a new containerized process; so all data on that mount will be stored on the underlying volume that was attached by the Controller Plugin. It’s important to note that the Container Orchestrator, not the Controller Plugin, is responsible for letting the Node Plugin know that it should perform the mount.

Volume Lifecycle

The spec provides a flowchart of basic volume operations, also in the form of a cool ASCII diagram:

   CreateVolume +------------+ DeleteVolume
 +------------->|  CREATED   +--------------+
 |              +---+----^---+              |
 |       Controller |    | Controller       v
+++         Publish |    | Unpublish       +++
|X|          Volume |    | Volume          | |
+-+             +---v----+---+             +-+
                | NODE_READY |
                +---+----^---+
               Node |    | Node
            Publish |    | Unpublish
             Volume |    | Volume
                +---v----+---+
                | PUBLISHED  |
                +------------+

Mounting a volume is a synchronous process: each step requires the previous one to have run successfully. For example, if a volume does not exist, how could we possibly attach it to a node?

When publishing (mounting) a volume for use by a workload, the Node Plugin first requires that the Controller Plugin has successfully published a volume at a directory that it can access. In practice, this usually means that the Controller Plugin has created the volume and attached it to a node. Now that the volume is attached, it’s time for the Node Plugin to do its job. At this point, the Node Plugin can access the volume at its device path to create a filesystem and mount it to a directory. Once it’s mounted, the volume is considered to be published and it is ready for a containerized process to use. This ends the CSI mounting workflow.

Continuing the AWS example, when the Controller Plugin creates and then publishes a volume, it calls ec2:CreateVolume followed by ec2:AttachVolume. These two API calls allocate the underlying storage by creating an EBS volume and attaching it to a particular instance. Once the volume is attached to the EC2 instance, the Node Plugin is free to format it and create a mount point on its host’s filesystem.

Here is an annotated version of the above volume lifecycle diagram, this time with the AWS calls included in the flow chart.

   CreateVolume +------------+ DeleteVolume
 +------------->|  CREATED   +--------------+
 |              +---+----^---+              |
 |       Controller |    | Controller       v
+++         Publish |    | Unpublish       +++
|X|          Volume |    | Volume          | |
+-+                 |    |                 +-+
                    |    |
   ec2:CreateVolume |    | ec2:DeleteVolume
                    |    |
   ec2:AttachVolume |    | ec2:DetachVolume
                    |    |
                +---v----+---+
                | NODE_READY |
                +---+----^---+
               Node |    | Node
            Publish |    | Unpublish
             Volume |    | Volume
                +---v----+---+
                | PUBLISHED  |
                +------------+

If the Controller Plugin needs to delete a volume, it must first wait for the Node Plugin to safely unmount the volume to preserve data and system integrity. Otherwise, if a volume is forcibly detached from a node before it is unmounted, we could experience bad things like data corruption. Once the volume is safely unpublished (unmounted) by the Node Plugin, the Controller Plugin would then call ec2:DetachVolume to detach it from the node and finally ec2:DeleteVolume to delete it, assuming that you don’t want to reuse the volume elsewhere.

What makes the CSI so powerful is that it does not prescribe how to publish a volume. As long as your driver correctly implements the required API methods defined in the CSI spec, it will be compatible with the CSI and by extension, be usable in COs like Kubernetes and Nomad.

Running CSI Drivers in Kubernetes

What I haven’t entirely made clear yet is why the Controller and Node Plugins are plugins themselves! How does the Container Orchestrator call them, and where do they plug into?

Well, the answer depends on which Container Orchestrator you are using. Since I’m most familiar with Kubernetes, I’ll be using it to demonstrate how a CSI driver interacts with a CO.

Deployment Model

Since the Node Plugin, responsible for low-level volume operations, must be running on every node in your data plane, it is typically installed using a DaemonSet. If you have heterogeneous nodes and only want to deploy the plugin to a subset of them, you can use node selectors, affinities, or anti-affinities to control which nodes receive a Node Plugin Pod. Since the Node Plugin requires root access to modify host volumes and mounts, these Pods will be running in privileged mode. In this mode, the Node Plugin can escape its container’s security context to access the underlying node’s filesystem when performing mounting and provisioning operations. Without these elevated permissions, the Node Plugin could only operate inside of its own containerized namespace without the system-level access that it requires to provision volumes on the node.
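As a rough illustration of the deployment model described above, a Node Plugin DaemonSet could look something like the sketch below. It is not taken from any particular driver: the names, labels, and image reference are placeholders, and real drivers usually ship additional containers and mounts.

```yaml
# Sketch of a Node Plugin DaemonSet. All names and the image are hypothetical.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-csi-node
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: example-csi-node
  template:
    metadata:
      labels:
        app: example-csi-node
    spec:
      # Optional: restrict the plugin to a subset of nodes.
      nodeSelector:
        kubernetes.io/os: linux
      containers:
      - name: csi-node-plugin
        image: registry.example.com/example-csi-driver:1.0.0  # placeholder image
        securityContext:
          privileged: true  # needed to format and mount volumes on the host
        volumeMounts:
        - name: kubelet-dir
          mountPath: /var/lib/kubelet
          # Mounts made inside the container must propagate back to the host.
          mountPropagation: Bidirectional
        - name: device-dir
          mountPath: /dev
      volumes:
      - name: kubelet-dir
        hostPath:
          path: /var/lib/kubelet
          type: Directory
      - name: device-dir
        hostPath:
          path: /dev
          type: Directory
```

The privileged flag and the Bidirectional mount propagation are the two settings that give the plugin the host-level access described above. The nodeSelector is only there to show where node filtering would go.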

The Controller Plugin is usually run in a Deployment because it deals with higher-level primitives like volumes and snapshots, which don’t require filesystem access to every single node in the cluster. Again, let’s think about the AWS example I used earlier. If the Controller Plugin is just making AWS API calls to manage volumes and snapshots, why would it need access to a node’s root filesystem? Most Controller Plugins are stateless and highly available, both of which lend themselves to the Deployment model. The Controller also does not need to be run in a privileged context.

Event-Driven Sidecar Pattern

Now that we know how CSI plugins are deployed in a typical cluster, it’s time to focus on how Kubernetes calls each plugin to perform CSI-related operations. A series of sidecar containers, that are registered with the Kubernetes API server to react to different events across the cluster, are deployed alongside each Controller and Node Plugin. In a way, this is similar to the typical Kubernetes controller pattern, where controllers react to changes in cluster state and attempt to reconcile the current cluster state with the desired one.

There are currently 6 different sidecars that work alongside each CSI driver to perform specific volume-related operations. Each sidecar registers itself with the Kubernetes API server and watches for changes in a specific resource type. Once the sidecar has detected a change that it must act upon, it calls the relevant plugin with one or more API calls from the CSI specification to perform the desired operations.

Controller Plugin Sidecars

Here is a table of the sidecars that run alongside a Controller Plugin:

Sidecar Name	K8s Resources Watched	CSI API Endpoints Called
external-provisioner	PersistentVolumeClaim	CreateVolume, DeleteVolume
external-attacher	VolumeAttachment	Controller(Un)PublishVolume
external-snapshotter	VolumeSnapshot (Content)	CreateSnapshot, DeleteSnapshot
external-resizer	PersistentVolumeClaim	ControllerExpandVolume

How do these sidecars work together? Let’s use an example of a StatefulSet to demonstrate. In this example, we’re dynamically provisioning our PersistentVolumes (PVs) instead of mapping PersistentVolumeClaims (PVCs) to existing PVs. We start at the creation of a new StatefulSet with a VolumeClaimTemplate.

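---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web                # example name
spec:
  serviceName: web         # example headless Service name
  selector:
    matchLabels: {app: web}
  template:
    metadata: {labels: {app: web}}
    spec:
      containers:
      - name: web
        image: nginx       # example workload
        volumeMounts:
        - {name: www, mountPath: /usr/share/nginx/html}
  volumeClaimTemplates:    # one PVC per replica is created from this template
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "my-storage-class"
      resources:
        requests:
          storage: 1Gi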

Creating this StatefulSet will trigger the creation of a new PVC based on the above template. Once the PVC has been created, the Kubernetes API will notify the external-provisioner sidecar that this new resource was created. The external-provisioner will then send a CreateVolume message to its neighbor Controller Plugin over gRPC. From here, the CSI driver’s Controller Plugin takes over by processing the incoming gRPC message and will create a new volume based on its custom logic. In the AWS EBS driver, this would be an ec2:CreateVolume call.
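The storageClassName in the claim template is what ties the PVC to a particular CSI driver: the external-provisioner only acts on claims whose StorageClass names its driver as the provisioner. As a minimal sketch, assuming the AWS EBS CSI driver is installed under its usual driver name ebs.csi.aws.com and that gp3 is the volume type you want, a matching StorageClass could look like this:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-storage-class        # must match storageClassName in the claim template
provisioner: ebs.csi.aws.com    # driver name the external-provisioner is serving
parameters:
  type: gp3                     # assumed EBS volume type
reclaimPolicy: Delete           # deleting the PVC leads to a DeleteVolume call
volumeBindingMode: Immediate    # provision as soon as the PVC exists, as in this walkthrough
```

Many clusters use WaitForFirstConsumer instead of Immediate so that the volume is created only after the Pod has been scheduled. In that case provisioning happens after scheduling rather than before it.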

At this point, the external-provisioner creates a matching PV object for the new volume, and the built-in PersistentVolume controller binds it to the PVC. This allows the StatefulSet’s underlying Pod to be scheduled and assigned to a Node.

Here, the external-attacher sidecar takes over. Once the Pod has been assigned to a node, Kubernetes creates a VolumeAttachment object for the volume. The external-attacher is notified of it and calls the Controller Plugin’s ControllerPublishVolume endpoint, attaching the volume to the StatefulSet Pod’s assigned node. This is the equivalent of ec2:AttachVolume in AWS.
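The table above lists VolumeAttachment as the resource that the external-attacher watches. For orientation, such an object looks roughly like the sketch below. It is created by Kubernetes’ attach/detach controller rather than written by hand, so treat this as what you might see when inspecting the cluster. The object, node, and PV names are made up.

```yaml
# Illustrative only: created by Kubernetes, not applied by users.
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  name: csi-example-attachment          # real names are generated
spec:
  attacher: ebs.csi.aws.com             # CSI driver that must handle the attach
  nodeName: ip-10-0-1-23.ec2.internal   # hypothetical node the Pod was scheduled to
  source:
    persistentVolumeName: pvc-example   # hypothetical PV bound to the PVC
status:
  attached: true                        # set once ControllerPublishVolume has succeeded
```

A VolumeAttachment whose status never reaches attached: true, or one that lingers after its Pod is gone, is usually the first thing to look at when a volume appears stuck.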

At this point, we have an EBS volume that is attached to an EC2 instance, all based on the creation of a StatefulSet, PersistentVolumeClaim, and the work of the AWS EBS CSI Controller Plugin.

Node Plugin Sidecars

There is only one unique sidecar that is deployed alongside the Node Plugin: the node-driver-registrar. This sidecar, running as part of a DaemonSet, registers the Node Plugin with a Node’s kubelet. During the registration process, the Node Plugin will inform the kubelet that it is able to mount volumes using the CSI driver that it is part of. The kubelet itself will then wait until a Pod is scheduled to its corresponding Node, at which point it is then responsible for making the relevant CSI calls (such as NodePublishVolume) to the Node Plugin over gRPC.

Common Sidecars

There is also a livenessprobe sidecar that runs in both the Controller and Node Plugin Pods. It monitors the health of the CSI driver and reports back to the Kubernetes Liveness Probe mechanism.

Communication over Sockets

How do these sidecars communicate with the Controller and Node Plugins? Over gRPC through a shared socket! So each sidecar and plugin contains a volume mount pointing to a single unix socket.

This shared socket highlights the pluggable nature of CSI Drivers. To replace one driver with another, all you have to do is swap the CSI Driver container with another and ensure that it’s listening on the unix socket that the sidecars are sending gRPC messages to. Because all drivers advertise their own capabilities and communicate over the shared CSI API contract, it’s literally a plug-and-play solution.
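The shared socket described in this section can be sketched as a Controller Plugin Deployment in which the driver container and two of its sidecars mount the same emptyDir volume. This is an assumption-laden sketch and not a manifest from a real driver: the driver image, the --endpoint flag on the driver container, and the ServiceAccount name are hypothetical, and the sidecar image tags are only examples. The --csi-address and --leader-election flags are the ones the upstream sidecars accept.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-csi-controller
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-csi-controller
  template:
    metadata:
      labels:
        app: example-csi-controller
    spec:
      # Hypothetical ServiceAccount; it needs RBAC for PVs, PVCs,
      # and VolumeAttachments (not shown here).
      serviceAccountName: example-csi-controller
      containers:
      - name: csi-controller-plugin
        image: registry.example.com/example-csi-driver:1.0.0  # placeholder image
        args:
        - --endpoint=unix:///csi/csi.sock  # hypothetical flag; varies by driver
        volumeMounts:
        - name: socket-dir
          mountPath: /csi
      - name: external-provisioner
        image: registry.k8s.io/sig-storage/csi-provisioner:v4.0.0  # example tag
        args:
        - --csi-address=/csi/csi.sock
        - --leader-election
        volumeMounts:
        - name: socket-dir
          mountPath: /csi
      - name: external-attacher
        image: registry.k8s.io/sig-storage/csi-attacher:v4.5.0  # example tag
        args:
        - --csi-address=/csi/csi.sock
        - --leader-election
        volumeMounts:
        - name: socket-dir
          mountPath: /csi
      volumes:
      - name: socket-dir
        emptyDir: {}  # holds the unix socket shared by all three containers
```

Swapping drivers then comes down to replacing the csi-controller-plugin container, as long as the new one creates its socket at the same path.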

Conclusion

In this article, I only covered the high-level concepts of the Container Storage Interface spec and implementation in Kubernetes. While hopefully it has provided a clearer understanding of what happens once you install a CSI driver, writing one requires significant low-level knowledge of both your nodes’ operating system(s) and the underlying storage mechanism that your driver is implementing. Luckily, CSI drivers exist for a variety of cloud providers and distributed storage solutions, so it’s likely that you can find a CSI driver that already fulfills your requirements. But it always helps to know what’s happening under the hood in case your particular driver is misbehaving.

If this article interests you and you want to learn more about the topic, please let me know! I’m always happy to answer questions about CSI Drivers, Kubernetes Operators, and a myriad of other DevOps-related topics.

The post How the CSI (Container Storage Interface) Works appeared first on simplyblock.

Kubernetes for AI, GKE, and serverless with Abdellfetah Sghiouar from Google (interview)

Chris Engelbert — Fri, 22 Mar 2024 12:13:27 +0000

This interview is part of the simplyblock Cloud Commute Podcast, available on YouTube, Spotify, iTunes/Apple Podcasts, Pandora, Samsung Podcasts, and our show site.

In this installment, we’re talking to Abdel Sghiouar from Google, a company that needs no introduction. Abdel is a Developer Advocate for GKE (Google Kubernetes Engine) and talks to us about Kubernetes, serverless platforms, and the future of containerization. See the key learnings below on Kubernetes for AI, where to find the best Kubernetes tutorials for beginners, and how simplyblock can speed up your AI/ML workload on Kubernetes. The interview transcript follows at the end.

Key Learnings

Can Kubernetes be used for AI / ML?

Kubernetes can be used for AI/ML. There are several benefits for managing AI/ML workloads, including:

Scalability: Kubernetes can easily scale up or down to handle the varying computational demands of AI/ML tasks.
Resource Management: It efficiently manages resources such as CPU, memory, and GPUs, ensuring optimal utilization.
Portability: AI/ML models can be deployed across different environments without modification, thanks to Kubernetes’ container orchestration capabilities.
Automation: Kubernetes automates deployment, scaling, and management of containerized applications, which is beneficial for continuous integration and continuous deployment (CI/CD) pipelines in AI/ML projects.
Flexibility: It supports various AI/ML frameworks and tools, allowing for a versatile development and deployment ecosystem.
Reproducibility: Containers ensure consistent environments for development, testing, and production, enhancing reproducibility of AI/ML experiments.

These features make Kubernetes a powerful platform for deploying, scaling, and managing AI/ML applications. Many AI companies, including those running ChatGPT, utilize Kubernetes because it allows for rapid scaling and performance management of large models and workloads.

Where to Find the best Kubernetes Tutorials for Beginners?

There are several excellent resources where beginners can find comprehensive Kubernetes tutorials, covering various learning styles from video tutorials, in-depth articles, and interactive labs to official documentation:

Kubernetes.io Documentation: The official Kubernetes documentation provides a wealth of information, including beginner-friendly tutorials, concepts, and guides. The “Getting Started” section is particularly useful.
KubeAcademy by VMware: Offers free, high-quality video courses on Kubernetes basics, cluster operations, and application management.
Udemy: Offers a variety of Kubernetes courses, often including hands-on labs and real-world examples. Popular courses include “Kubernetes for the Absolute Beginners” and “Kubernetes Mastery.”
Coursera: Partnered with top universities and organizations to offer courses on Kubernetes. The “Architecting with Google Kubernetes Engine” specialization is a notable example.
edX: Provides courses from institutions like the Linux Foundation and Red Hat. The “Introduction to Kubernetes” course by the Linux Foundation is a good starting point.
YouTube: There are many YouTube channels that offer high-quality tutorials on Kubernetes. Channels like “TechWorld with Nana,” “Kunal Kushwaha,” and “DigitalOcean” provide beginner-friendly content.
Play with Kubernetes: An interactive learning environment provided by Docker, offering hands-on tutorials to practice Kubernetes commands and concepts.
Medium and Dev.to Articles: Both platforms have numerous articles and tutorials written by the community. Searching for “Kubernetes beginner tutorial” can yield many helpful results.

How can Simplyblock Speed up your AI / ML Workload on Kubernetes?

Simplyblock offers various features and tools that can significantly enhance the performance and efficiency of AI/ML workloads on Kubernetes.

High performance, low latency: Simplyblock provides a fully cloud-native storage solution, designed for predictable low latency workloads, such as AI/ML tasks.

Scalability: Kubernetes inherently supports scaling, and simplyblock enhances this capability by:
  Storage Scalability: Simplyblock is designed to seamlessly scale out with the growing amount of data in AI/ML use cases.
  Automatic Rebalancing: When nodes or disks are added to a simplyblock storage cluster, stored data is automatically rebalanced in the background for highest read and write performance, as well as lowest latency, as close to a local disk as possible.

Improved Data Management: AI/ML workloads often involve large datasets. Simplyblock improves data handling by:
  Data Locality: Ensuring that data is processed close to where it is stored to reduce latency and improve performance.
  Persistent Storage Solutions: Providing robust and high-performance storage solutions that can handle the I/O demands of AI/ML workloads.

Monitoring and Optimization: Effective monitoring and optimization tools provided by simplyblock help in maintaining performance and efficiency:
  Performance Monitoring: Offering real-time monitoring of storage usage and storage performance to identify and mitigate issues quickly.
  Cost Optimization: Offering an easy way to reduce the cost of cloud storage without sacrificing performance, latency, or capacity, thereby reducing the overall cost of running AI/ML workloads on Kubernetes.

Enhanced Security: Simplyblock ensures that AI/ML workloads on Kubernetes are secure by:
  Secure Data Handling: Implementing encryption and secure data transmission between the AI/ML workload and the storage cluster.
  Access Controls: Providing granular access controls to manage who can mount, access, and modify AI/ML workloads.

By leveraging the features provided by simplyblock, organizations can significantly speed up their AI/ML workloads on Kubernetes. These enhancements lead to improved performance, better resource utilization, scalability, and overall efficiency, ultimately resulting in faster model development and deployment cycles.

Transcript

Chris Engelbert: Hello everyone, today with me, another very good friend of mine. We have known each other for quite a few years. Abdel, and I don’t try to pronounce your last name. I’m really sorry [laughs]. Maybe you can introduce yourself real quick, like, who are you, what do you do, and where you come from.

Abdel Sghiouar: Sure. Abdel is fine. So yes, Abdel Sghiouar if you want. That’s how typically people pronounce my last name. I’m based out of Stockholm, and I work for Google. I’ve been at the company for 10 years. I do Kubernetes stuff. I do a Kubernetes podcast, and I talk about Kubernetes and containers. I arguably do more talks about why you probably do not need any of these technologies than why you actually need them. Originally from Morocco, that’s the back story. Have you made it to Devoxx Morocco yet?

Chris Engelbert: I was supposed to go this year, but I caught COVID the week before.

Abdel Sghiouar: Oh, yes, I remember.

Chris Engelbert: Yeah, that was really unfortunate. I was so looking forward to that. I was like, no.

Abdel Sghiouar: I keep promoting the conference everywhere I go, and then I forget who made it or who didn’t. Okay, well, maybe 2024 is when it happens.

Chris Engelbert: Yes, next time. I promise. Like last time, I promised last time, and I still failed. All right, you said you work for Google. I think everyone knows Google, but your role is slightly different. You’re not working on a specific… Well, you work on a specific product, but as you said, you’re trying to tell people why not to use it.

Abdel Sghiouar: Which I don’t think my manager would be very happy if he knows that that’s what I do.

Chris Engelbert: All right, tell us about it so I can send it to him.

Abdel Sghiouar: So, yes, I work on GKE. GKE is Google Kubernetes Engine, which is the managed version of Kubernetes that we do on Google Cloud. That’s technically the product I work on. I think that we were having this conversation. I avoid being too salesy, so I don’t really talk about GKE itself unless there are specific circumstances. And of course, Kubernetes itself, as a platform, it’s more specific to an audience. You cannot just go to random front-end conferences and start talking about Kubernetes. No one would understand whatever you’re saying. But, yeah, GKE is my main thing. I do a lot of networking and security within GKE. I actually started this year looking a little bit into storage and provisioning because, surprise, surprise, AI is here. And people need it. And when you are doing wild machine learning workloads, really, on anything, you need storage because these models are super huge. And storage performance is important. So, yeah, so that’s what I do.

Chris Engelbert: I think predictability is also important, right? It’s important for the data model. It’s not like you want it sometimes fast and you want it sometimes slow. You want it very fast. You want it to have the same speed consistently.

Abdel Sghiouar: Yeah, that’s true. It’s actually interesting. I mean, besides the product itself, it’s interesting. A lot of these AI companies that came to the market in the last, I’d say, year, year and a half, they basically just use Kubernetes. I mean, ChatGPT is trained on Kubernetes, right? It sounds like, oh, yeah, I just need nodes with GPU, Kubernetes, boom. And to be honest, it totally makes sense, right? Because you need to have a lot of compute power. You have to scale up and down really fast, depending on how many users you currently have, right? So I think anything like that with Kubernetes makes perfect sense.

Chris Engelbert: Exactly. So you mentioned you also have a podcast. Tell me about it. It’s not like I have listeners that you don’t have, but who knows?

Abdel Sghiouar: Yeah, it’s the Kubernetes podcast by Google. It’s a show that’s been running for almost six years. I think we’re close to six years. We’ve been doing it for two years, me and my co-host. So me and Kaslin Fields from Seattle. It’s a twice-a-month show. We basically invite people from different parts of the cloud-native world, I would say. So we talk to maintainers. We talk to community members. We are looking into bringing some other interesting guests. One of my personal interests is basically the intersection of technology and science. And I don’t know if your audience would know this, but at KubeCon in Barcelona in 2019, one of the keynotes was done by people from CERN who came on stage and showed how they could replicate the Higgs boson experiments on Kubernetes. So that was before I was a host. But we are exploring the idea now of finding organizations whose job is not really technology. They’re doing science, but they’re using Kubernetes to enable science. And we’re talking to them. So there’s quite a lot of interesting things happening. That’s something we are going to be releasing very soon. But yeah, so it’s a one-hour conversation. We try to do some news. We try to figure out who bought who, who acquired who, who removed which version, who changed licenses in the last two weeks since the last episode. And then we just interview the guest. So yeah, that’s the show essentially.

Chris Engelbert: Right. And correct me if I’m wrong, but I think you’re also writing the “What Happened in Kubernetes” newsletter this week.

Abdel Sghiouar: This week in GKE. Yes. Which is not only about GKE. So I have a newsletter on LinkedIn called “This Week in GKE”, which covers a couple of things that happened in GKE in this week, but also covers other cloud stuff.

Chris Engelbert: All right. Fair enough. So you’re allowed to talk about that.

Abdel Sghiouar: Yeah, yeah. It’s actually interesting. It started as a conversation on Twitter. Somebody put a tweet that said this other cloud provider without mentioning a name introduced a feature into their managed Kubernetes. And I replied saying, we had this on GKE for the last two years. It’s a feature called image streaming. It’s basically a feature to speed up the image pull. When you are trying to start a pod, the image has to be pulled very quickly. And then the person replied to me saying, well, you’re not doing a very good job talking about it. I was like, well, challenge accepted.

Chris Engelbert: Fair enough. So I see you’re not talking about other cloud providers. You’re not naming people or naming things. It’s like the person you’re not talking about. What is it? Not Voldemort. Is it Voldemort? I’m really bad at Harry Potter stuff.

Abdel Sghiouar: I’m really bad, too. Maybe I am as bad as you are. But yes.

Chris Engelbert: All right. Let’s dive a little bit deeper. I mean, it’s a cloud podcast. So when you build an application, how would you start? Where is the best start? Obviously, it’s GKE. But–

Abdel Sghiouar: Well, I mean, actually, arguably, no. So this is a very quick question. I mean, specifically, if you’re using any cloud provider, going straight into Kubernetes is probably going to be frustrating because you will have to learn a bunch of things. And what has been happening through the last couple of years is that a lot of these cloud providers are just offering you a CaaS, a container as a service tool, Fargate on AWS. I think it’s called ACS, Azure Container Services on Azure, and Cloud Run on GCP. So you can just basically write code, put it inside the container, and then ship it to us. And then we will give you a URL and a certificate. And we will scale it up and down for you. If you are just writing an app that you need to answer a web request and scale up and down on demand, you don’t need Kubernetes for this. Where things start to be interesting or where people start looking into using Kubernetes, specifically– and of course, we’re talking here about Google Cloud, so GKE, is when you start having or start requiring things that this simple container service platform doesn’t give you. And since we’re in the age of AI, we can talk about GPUs. So if you need a GPU, the promise of a serverless platform is very fast scaling. Very fast scaling and GPUs don’t really go hand in hand. Just bringing up a Linux node and installing Nvidia drivers to have a GPU ready, that takes 15 minutes. I don’t think anybody will be able to sell a serverless platform that scales in 15 minutes. [laughs]

Chris Engelbert: It will be complicated, I guess.

Abdel Sghiouar: It will be complicated, yes. That’s where people go then into Kubernetes, when you need those really specific kinds of configuration, and more fine-tuning knobs that you can turn on and off and experiment to try things out. This is what I like to call the happy path. The happy path is you start with something simple. And as you do in this case, it gets more complicated. You move to something more complex. Of course, that’s not how it always works. And people usually just go head first, dive into GKE.

Chris Engelbert: I’m as big as Google. I need Kubernetes right now.

Abdel Sghiouar: Sure. Knock yourself down, please. Actually, managed Kubernetes makes more money than container service, technically. So whatever.

Chris Engelbert: So the container service, just for people that may not know, I think underneath is basically also something like Kubernetes. It’s just like you’ll never see that. It’s operated by the cloud provider, whatever it is. And you basically just give them an image, run it for me.

Abdel Sghiouar: Exactly. If people want to go dive into how these things are built, there is a project called Knative that was also released by the Kubernetes community a few years ago. And that’s typically what people use to be able to give you this serverless experience in a container format. So it’s Kubernetes with Knative, but everything is managed by the cloud provider. And as you said, we expose just the interface that allows you to say container in, container out.

Chris Engelbert: Fair. So about a decade ago people started really going into first VMs and doing all of that. Then we thought, oh, VMs. So it all started with physical servers are s**t. It’s bad. And it’s so heavy. So let’s use virtual machines. They’re more lightweight. And then we figure out, oh, virtual machines are still very heavy. So let’s do containers. And a couple of years ago, companies started really buying into this whole container thing. I think it basically was when Kubernetes got big. I don’t remember the original Google technology, Kubernetes was basically built after Borg was started.

Abdel Sghiouar: Borg, yes.

Chris Engelbert: Right. So Borg is the piece, it was probably borked. Anyway, we saw this big uptake in migrations to the cloud, and specifically container platforms. Do you think it has slowed down? Do you think it’s still on the rise? Is it pretty much steady? How do you see that right now?

Abdel Sghiouar: It’s a very good question. I think that there are multiple ways you can answer that question. I think that people moving to containers is something that is probably still happening. I think that what’s probably happening is that we don’t talk about it as much. Because– so the way I like to describe this is because Kubernetes as a technology is 10 years old, it’s going to be 10 years old in June, by the way. So June, I think, 5th or something. June 5, 2014 was the first pull request that was pushed, that pushed the first version of Kubernetes. It’s going to be a commodity. It’s becoming a commodity. It’s becoming something that people don’t even have to think about. And even cloud providers are making it such a way that they give you the experience of Kubernetes, which is essentially the API. And the API of Kubernetes itself is pretty cool. It’s a really nice way of expressing intent. As a developer, you just say, this is how my application looks like. Run it for me. I don’t care. And so on the other hand, also this is– also interesting is a lot of programming language frameworks started building into the framework ways of going from code to containers without Docker files. And you are a Java developer, so you know what I’m talking about. Like a Jib, you can just import the Jib plug-in in Maven. And then you just run your Maven, and then blah, you have a container. So you don’t have to think about Docker files. You don’t really have to worry about them too much. You don’t even have to learn them. And so I think that the conversation is more now about cost optimization, optimizing, bin packing, rather than the simple, oh, I want to move from a VM to a container. So the conversation shifts somewhere else because the technology itself is becoming more mainstream, I guess.

Chris Engelbert: So a couple of years ago, people asked me about Docker, just because you mentioned Docker. And they asked me what I think about Docker. And I said, well, it’s the best tool we have right now. I hope it’s not going to stick. That was probably a very mean way of saying, I think the technology, the idea of containerization and the container images is good. But I don’t think Docker is the best way. And you said a lot of tools actually started building their own kinds of interface. They all still use Docker or any of the other imaging tools underneath, but they’re trying to hide all of that from you.

Abdel Sghiouar: Yeah. And you don’t even need Docker in this case, right? I think just very quickly, I’m going to do a shameless plug here. We have an episode on the podcast where we interviewed somebody who is one of the core maintainers of ContainerD. And the episode was not about ContainerD. It was really about the history of containers. And I think it’s very important to go listen to the episode because we talked about the evolution from Docker initially to the Open Container Initiative, the OCI, which is actually a standardization part, to ContainerD, to all the container runtimes that exist on the market today. And I think through that history, you will be able to understand what you’re exactly talking about. We don’t need Docker anymore. You can build containers without even touching Docker. Because I don’t really like Docker personally.

Chris Engelbert: I think we’re not the only ones, to be honest. Anyway, what I wanted to direct my question to is that you also said you have the things like Fargate or all of those serverless technologies that underneath use Kubernetes. But it’s hidden from the user. Is that the same thing?

Abdel Sghiouar: I mean, yes, in the sense that yes, because Kubernetes is, in a sense, becoming a commodity that people shouldn’t actually– people would probably be shocked to hear me say this. I think Kubernetes should probably have never gotten out like this. I mean, the fact that it became a super popular tool is a good thing, because it attracted a lot of interest and a lot of investments. I do not think it’s something that people should learn. But it’s a platform. It’s something you can build on top of. I mean, you need to run an application, go run it on top of a container as a service. Do not learn Kubernetes. You don’t have to, right? Like, Kelsey Hightower put it once in a tweet, which is basically, Kubernetes is a platform to build platforms. That’s what it is.

Chris Engelbert: That makes sense. You don’t even need to understand how it works. And I think that makes perfect sense. From a user’s perspective, the whole idea of Kubernetes was to abstract away whatever you run on. But now, there’s so many APIs in Kubernetes that abstract away Kubernetes that it’s basically possible to do whatever. And I think Microsoft did a really interesting implementation on top of Hyper-V, which uses micro VMs, whatever they call those things.

Abdel Sghiouar: Oh, yes, yes, yes, yes, yes, yes.

Chris Engelbert: The shared kernel kind of stuff, which is kind of the same idea as a container.

Abdel Sghiouar: Yeah, I think it’s based on Kata Containers. I know what you’re talking about.

Chris Engelbert: But it’s interesting, because they still put the Kubernetes APIs on top, or the Kubernetes implementation on top, making it look exactly like a Linux container, which is really cool. Anyway, what do you think is or will be the next big thing for the cloud? What is the trend that is upcoming? You already mentioned AI, so that is out.

Abdel Sghiouar: That’s a very big one, right? It’s actually funny, because I’m in Berlin this week. I am here for a conference, and we were chatting with some of our colleagues. And the joke I was making was next time I go on stage to a big conference, if there are people talking about AI at the same conference, I will go on stage and go, “OK, people talked about AI. Now let’s talk about the things that actually matter. Let’s talk about the thing that people are using and making money from. Let’s stop wishful thinking, right?” I think Kubernetes for AI is big. That’s going to be around. AI is not going to disappear. It’s going to be big. I think we’re in the phase where we’re discovering what people can do with it. So I think it’s a super exciting time to be alive, I think, in my opinion. There’s like a shift in our field that lots of people don’t get to experience. I think the last time such a shift happened in our field was people moving from mainframes to pizza box servers. So we’re living through some interesting times. So anyway, I think that that’s what’s going to happen. So security remains a big problem across the board for everything. Access, security, management, identity, software security. You’re a developer. You know what I’m talking about. People pulling random dependencies from the internet without knowing where they’re coming from. People pulling containers from Docker Hub without knowing who built them or how they were built. Zero ways of establishing trust, like all that stuff. So that’s going to remain a problem, I would say, but it’s going to remain a theme that we’re going to hear about over and over again. And we have to solve this eventually. I think the other thing would be just basically cost saving, because we live in an interesting world where everybody cares about cost saving. So cost optimization, bin packing, making sure you get the most out of your buck that you’re paying to your cloud provider. 
And I think that the cloud native ecosystem enabled a lot of people to go do some super niche solutions. I think we’re going to get to a stage now where all these companies doing super niche solutions will be filtered out in a way that only those that have really, really interesting things that solve real problems, not made up problems, will remain on the markets.

Chris Engelbert: That makes sense. Only the companies that really have a solution that solves something will stay. And I think that ethics will also play a big part in the whole AI idea. Well, in the next decade of AI, I think we need to be really careful what we do. The technology makes big steps, but we also see all of the downsides already. The main industry that is always on the forefront of every new technology already is there, and they’re misusing it for a lot of really stupid things. But it is what it is. Anyway, because we’re already running out of time. 20 Minutes is so super short. Your favorite stack right now?

Abdel Sghiouar: Yeah, you asked me that question before. Still unchanged. Go, as a backend technology, I don’t do a lot of front-end, so I don’t have a favorite stack in that space. Mac for development, Visual Studio Code, VS Code for coding. I started dabbling in IntelliJ recently, and I kind of like it, actually. Because I’m not a Java developer, I never had a need for it, but I’m just experimenting. And so Go for the backend. I think it’s just backend. I only do backend, so Go.

Chris Engelbert: That makes sense. Go and I have a love-hate relationship. I wouldn’t say Go is a perfect language, but it’s super efficient for microservices. When you write microservices, all of the integration of Let’s Encrypt or the ACME protocol, all that kind of stuff, it’s literally you just dump it down and it works. And that was a first for me, coming from the Java world. A lot of people claim that Java is very verbose. I don’t think so. I think Java was always meant to be more readable than writeable, which is, from my perspective, a good thing. And I sometimes think Go did some things, at least, wrong. But it’s much more complicated, because Java is coming from a very different direction. If you want to write something really small, and you hinted at the frameworks like Quarkus and stuff, Go all just has that. They built it with the idea that the standard library should be pretty much everything you need for a microservice.

Abdel Sghiouar: Exactly.

Chris Engelbert: All right. We’re recording that the week before KubeCon. KubeCon Europe, will I see you next week?

Abdel Sghiouar: Yes, I’m going to be at KubeCon. I’m going to speak at Cloud Native Rejekts on the weekend, and I’ll be at KubeCon the whole week.

Chris Engelbert: All right. So if that comes out next Friday, I think, and you hear that, and you’re still at KubeCon, come see us.

Abdel Sghiouar: Yes. I’m going to be at the booth a lot of times, but you will be able to see me. I am a tall, brown person with curly hair. So I don’t know how many people like me will be there, but I speak very loud. So you’ll be able to both hear me and see me.

Chris Engelbert: That’s fair. All right. Thank you, Abdel. Thank you for being here. I really appreciated the session. It was a good chat. I love that. Thank you very much.

Abdel Sghiouar: Thanks for having me, Chris.

Key Takeaways

In this episode of simplyblock’s Cloud Commute Podcast, host Chris Engelbert welcomes Abdellfetah Sghiouar from Google, who talks about his community work for Kubernetes, his Kubernetes podcast, relevant conferences, as well as his role at Google and insights into future cloud trends with the advent of AI.

Abdel Sghiouar is a Google Kubernetes expert with 10 years of experience at the company. He also hosts a podcast where he talks about Kubernetes and containers.

Abdel focuses on networking, security, and now storage within GKE (Google Kubernetes Engine), especially for AI workloads. He emphasizes the importance of predictable latency and high-performance storage, such as simplyblock, for AI, rendering Kubernetes essential for AI as it provides scalable compute power.

His podcast has been running for almost 6 years, featuring guests from the cloud-native ecosystem, and covers cloud-native technologies and industry news. He also writes a LinkedIn newsletter, ‘This Week in GKE,’ covering GKE updates and other cloud topics.

Abdel highlights the shift from physical servers to VMs, and now to containers and discusses the simplification of container usage and the standardization through projects like Knative.

He is of the opinion that Kubernetes has become a commodity and explains that it should serve as a platform for building platforms. He also highlights the importance of user-friendly interfaces and managed services.

Abdel believes AI will continue to grow, with Kubernetes playing a significant role. Security and cost optimization will be critical focuses. He also emphasizes the need for real solutions to genuine problems, rather than niche solutions.

When asked about his favourite tech stack, Abdel names Go for backend, Mac for development, and Visual Studio Code for coding. He also likes IntelliJ and is currently experimenting with it. Chris also appreciates Go’s efficiency for microservices despite some criticisms.

Chris and Abdel also touch upon conferences, particularly Devoxx Morocco and KubeCon.

The post Kubernetes for AI, GKE, and serverless with Abdellfetah Sghiouar from Google (interview) appeared first on simplyblock.

AWS EBS Pricing: A Comprehensive Guide

Chris Engelbert — Wed, 28 Feb 2024 12:13:26 +0000

In the vast landscape of cloud computing, Amazon Elastic Block Store (Amazon EBS) stands out as a crucial component for storage in AWS’ Amazon EKS (Elastic Kubernetes Service), as well as other AWS services.

As businesses increasingly migrate to the cloud, or build newer applications as cloud-native services, understanding the cloud cost becomes essential for cost-effective operations. With Amazon EBS often making up 50% or more of the cloud cost, it is important to grasp the intricacies of Amazon EBS pricing, explore the key concepts, and find the main factors that influence cost, as well as strategies to optimize expenses.

Understanding Amazon EBS

Amazon EBS provides scalable block-level storage volumes for use with Amazon EKS Persistent Volumes, EC2 instances, and other Amazon services. It offers various volume types, each designed for specific use cases, such as General Purpose (SSD), Provisioned IOPS (SSD), and HDD based. The choice of volume type significantly impacts performance and cost, making it vital to align storage configurations with application requirements.

Amazon EBS Pricing Breakdown

AWS pricing is complicated. It requires studying the different regions and available options, as well as good estimates of a service’s own behavior in terms of speed and capacity requirements.

For Amazon EBS, a set of different factors influences availability, performance, capacity, and, most prominently, the cost.

Volume Type and Performance

Different workloads demand different levels of performance. Understanding the nature of your applications and selecting the appropriate volume type is crucial to balance cost and performance. The available volume types will be discussed further down in the blog post.

Volume Size

Amazon EBS volumes come in various sizes, and costs scale with the amount of provisioned storage per volume. Assessing the storage requirements and adjusting volume sizes accordingly to avoid over-provisioning can influence the cost quite significantly.

Snapshot Costs

Creating snapshots for backup and disaster recovery is a common practice. However, snapshot costs can accumulate, since the cost scales with the number and types of snapshots created. There are two types of snapshots: standard, which is the default, and archive, which is cheaper on the storage side but incurs cost when being restored. Implementing a snapshot management strategy to control expenses is crucial.

Throughput and I/O Operations

Throughput and I/O operations may or may not incur additional costs, depending on the selected volume type.

While data transfer is often easy to estimate, the necessary values for throughput and I/O operations per second (also known as IOPS) are much harder to predict. IOPS in particular can make up a fair amount of the spending when running I/O-intensive workloads, such as databases, data warehouses, high-load webservers, or similar.

Be mindful of the amount of data transferred in and out of your EBS volumes, as well as the number of I/O operations performed.

Amazon EBS Volume Types

As mentioned above, Amazon EBS has quite the set of different volume types. Some are designed for specific use cases or to provide a cost-effective alternative, while others are older or newer generations for the same usage scenario.

An in-depth technical description of the different volume types can be found in AWS’ documentation.

Cheap Storage Volumes (st1 / sc1)

The first category is designed for workloads that require large amounts of data storage but don’t need the highest performance characteristics.

Being based upon HDD disks, the access latency is high and transfer speed is fairly low. The volume can be scaled up to 16TiB each though, reaching a high capacity at a cheap price.

Durability is typically given as 99.8% – 99.9%, which corresponds to an annual failure rate of 0.1% – 0.2% per volume. Warm (throughput optimized) and cold volumes are available, relating to the types st1 and sc1 respectively.

General Purpose Volumes (gp2 / gp3)

The second category is, what AWS calls, general purpose. It has the widest applicability and is the default option when looking for an Amazon EBS volume.

When creating volumes, gp2 should be avoided, being the old generation at the same price but with fewer features. That said, gp3 provides higher throughput and IOPS than st1 and sc1 volumes due to being SSD-based storage. Like the HDD-based volumes, durability is in the same range of 99.8% – 99.9%, corresponding to an annual failure rate of 0.1% – 0.2% per volume. The same goes for capacity: volumes can be scaled up to 16TiB each and therefore are perfect for a variety of use cases, such as boot volumes, simple transactional workloads, smaller databases, and similar.

Provisioned IOPS Volumes (io1 / io2)

The third option is high-performance SSD (and NVMe) based volumes.

Amazon EBS Pricing

Prices for Amazon EBS volumes and additional upgrades depend on the region they are created in. For that reason, it is not possible to give an exact explanation of the pricing. There is, however, the chance to give an overview of what features have separate prices, and an example for one specific region.

The base Amazon EBS volume types, ordered by their normal price per GB-month from cheapest to most expensive:

1. HDD-based sc1
2. HDD-based st1
3. SSD-based gp2
4. SSD-based gp3
5. SSD-based io1 and io2

In addition to the base pricing, there are certain capabilities which can be increased for an additional cost, such as I/O operations per second (IOPS) and throughput.

Amazon EBS Pricing example

And this is where it gets a bit more complicated. Every type of volume has its own set of base, and maximum available capabilities. Not all capabilities are available on all volume types though.

In our example, we want to create an Amazon EBS volume of type io2 in the US-EAST region with 10 TB storage capacity. In addition, we want to increase the available IOPS to 80,000 – just to make it complicated. For newer io2 volumes, the throughput scales proportionally with provisioned IOPS up to 4,000 MiB/s, meaning we don’t have to pay extra.

Base price for the io2 volume: The volume’s base cost is 0.125 USD/GB-month. With that, our 10 TB (10,000 GB) volume comes to 1,250 USD per month.

Throughput capability pricing: The throughput of up to 4,000 MiB/s is automatically scaled proportionally to the provisioned IOPS, so all is good here. For other volume types, additional throughput (over the base amount) can be bought.

IOPS capability pricing: The pricing for IOPS can be complicated, as is the case with io2 volumes. These have multiple “discount stages”. The prices are split at 32,000 and 64,000 IOPS.

With that in mind, the IOPS pricing can be broken down into:

0 – 32,000 IOPS: 32,000 * 0.065 USD/IOPS-month = 2,080.00 USD/month

32,001 – 64,000 IOPS: 32,000 * 0.046 USD/IOPS-month = 1,472.00 USD/month

64,001 – 80,000 IOPS: 16,000 * 0.032 USD/IOPS-month = 512.00 USD/month

Cost of the io2 volume: That means, including all cost factors (USD 1,250.00 + USD 2,080.00 + USD 1,472.00 + USD 512.00), the cost builds up to a monthly fee of USD 5,314.00 – for a single volume.
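The tiered calculation can be sketched in a few lines of Python. This is a minimal sketch, not an official calculator: the prices are the example figures quoted in this post for one region, the function names are made up for illustration, and each tier is charged for its full IOPS count (32,000, then 32,000, then the remainder).

```python
# Example prices from this post (one region only); real prices vary by region.
STORAGE_USD_PER_GB_MONTH = 0.125

# (upper bound of the tier in IOPS, USD per IOPS-month).
# None marks the open-ended last tier.
IOPS_TIERS = [
    (32_000, 0.065),
    (64_000, 0.046),
    (None, 0.032),
]


def iops_cost(provisioned_iops: int) -> float:
    """Monthly cost of provisioned IOPS, charged tier by tier."""
    cost = 0.0
    lower = 0  # IOPS already accounted for by cheaper-numbered tiers
    for upper, price in IOPS_TIERS:
        if provisioned_iops <= lower:
            break
        tier_end = provisioned_iops if upper is None else min(provisioned_iops, upper)
        cost += (tier_end - lower) * price
        lower = tier_end
    return round(cost, 2)


def monthly_volume_cost(size_gb: int, provisioned_iops: int) -> float:
    """Storage cost plus tiered IOPS cost for one volume (hypothetical helper)."""
    storage = size_gb * STORAGE_USD_PER_GB_MONTH
    return round(storage + iops_cost(provisioned_iops), 2)
```

Calling `monthly_volume_cost(10_000, 80_000)` evaluates the example volume from this section. Changing the tier table is enough to try another region's prices.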

Strategies to Optimize Amazon EBS Spending

Amazon EBS volumes can be expensive as just shown. Therefore, it is important to keep the following strategies for cost reduction and optimization in mind.

Rightsize your Volumes

Regularly assess your storage requirements and resize volumes accordingly. Downsizing or upsizing volumes based on actual needs can result in significant cost savings. If auto-growing of volumes is enabled, keep the disk growth in check. Log files, or similar, running amok can blow your spend limit in hours.
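To get a feeling for what rightsizing is worth, a rough estimate can be computed from the provisioned size, the space actually used, and the price per GB-month. The sketch below is an assumption-laden illustration: the function name, the 20% default headroom, and the example price are not from AWS. It also only estimates the target size. Since EBS volumes can be grown in place but not shrunk, actually moving to a smaller volume means migrating the data to a new one.

```python
import math


def rightsizing_estimate(provisioned_gb: int, used_gb: int,
                         usd_per_gb_month: float,
                         headroom_percent: int = 20) -> tuple[int, float]:
    """Return (suggested size in GB, estimated monthly saving in USD).

    The suggested size is the used space plus a safety headroom,
    and is never larger than what is already provisioned.
    """
    target_gb = math.ceil(used_gb * (100 + headroom_percent) / 100)
    target_gb = min(target_gb, provisioned_gb)
    saving = (provisioned_gb - target_gb) * usd_per_gb_month
    return target_gb, round(saving, 2)
```

For a 10,000 GB volume with 4,000 GB in use at 0.125 USD/GB-month, `rightsizing_estimate(10_000, 4_000, 0.125)` suggests 4,800 GB and a saving of 650 USD per month on storage alone.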

Utilize Provisioned IOPS Wisely

Provisioned IOPS volumes offer high-performance storage but come at a high cost. Use them judiciously (and not ludicrously) for applications that require consistent and low-latency performance, and consider alternatives for less demanding workloads.

Implement Snapshot Lifecycle Policies

Set up lifecycle policies for snapshots to manage retention periods and reduce unnecessary storage costs. Periodically review and clean up outdated snapshots to optimize storage usage.
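The core of such a policy is a simple rule: keep the newest few snapshots no matter what, and remove everything else once it is older than the retention period. The sketch below shows that rule in plain Python. It does not call any AWS API, and the function name, parameters, and default values are assumptions for illustration. In practice, a managed service such as Amazon Data Lifecycle Manager would apply the policy.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_delete(snapshots: list[tuple[str, datetime]],
                        now: datetime,
                        max_age_days: int = 30,
                        keep_latest: int = 3) -> list[str]:
    """Select snapshot ids that fall outside the retention policy.

    snapshots: (snapshot_id, created_at) pairs.
    The newest `keep_latest` snapshots are always kept, even if they
    are older than `max_age_days`.
    """
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    protected = {snapshot_id for snapshot_id, _ in newest_first[:keep_latest]}
    cutoff = now - timedelta(days=max_age_days)
    return [
        snapshot_id
        for snapshot_id, created_at in newest_first
        if created_at < cutoff and snapshot_id not in protected
    ]
```

Keeping a minimum number of snapshots regardless of age guards against a stalled backup job silently ageing out every restore point.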

Leverage EBS-Optimized Instances

Use EC2 instances that are EBS-optimized for better performance. This ensures that the network traffic between EC2 instances and EBS volumes does not negatively impact overall system performance.

Concluding Thoughts

As businesses continue to leverage AWS services, understanding and optimizing Amazon EBS spending is a key aspect of efficient cloud management. By carefully selecting the right volume types, managing sizes, and implementing cost-saving strategies, organizations can strike a balance between performance and cost-effectiveness in their cloud storage infrastructure. Regular monitoring and adjustment of storage configurations will contribute to a well-optimized and cost-efficient AWS environment.

If this feels too complicated or the requirements are hard to predict, simplyblock offers an easier, more scalable, and future-proof solution. It runs right in your AWS account, providing you with the fastest and easiest way to build your own Amazon EBS alternative for Kubernetes and save 60% or more on storage cost at the same time. Learn here how simplyblock works.

The post AWS EBS Pricing: A Comprehensive Guide appeared first on simplyblock.