Kubernetes Archives | simplyblock
https://www.simplyblock.io/blog/tags/kubernetes/

5 Storage Solutions for Kubernetes in 2025
https://www.simplyblock.io/blog/5-storage-solutions-for-kubernetes-in-2025/ (Mon, 27 Jan 2025)

Selecting your Kubernetes persistent storage may be tough due to the many available options. Not all are geared towards enterprise setups, though. Hence, we would like to briefly introduce 5 storage solutions for Kubernetes that may meet your enterprise’s diverse storage needs.

That said, as Kubernetes adoption keeps growing in 2025, selecting the right storage solution is more important than ever. Enterprise-level features, data encryption, and high availability are at the forefront of the requirements we want to look into, as are the ability to attach volumes to multiple clients simultaneously and built-in data loss protection.

Simplyblock: Enterprise Storage Platform for Kubernetes

Simplyblock™ is an enterprise-grade storage platform designed to cater to high-performance and scalability needs for storage in Kubernetes environments. Simplyblock is fully optimized to take advantage of modern NVMe devices. It utilizes the NVMe/TCP protocol to expose volumes from the storage cluster to clients, providing superior throughput and lower access latency than the alternatives.

Simplyblock is designed as a cloud-native solution and is highly integrated with Kubernetes through its Simplyblock CSI driver. It supports dynamic provisioning, snapshots, clones, volume resizing, fully integrated encryption at rest, and more. One benefit of simplyblock is its use of NVMe over TCP, which is integrated into the Linux and Windows (Server 2025 or later) kernels, meaning no additional drivers are required. This also makes it easy to use simplyblock volumes outside of Kubernetes if you also operate virtual machines and would like to unify your storage. Furthermore, simplyblock volumes support read-write multi-attach. That means they can be attached to multiple pods, VMs, or both at the same time, making it easy to share data.
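
For illustration, here's a minimal sketch of how a Linux host could attach such a volume using the standard nvme-cli tooling; the target address, port, and NQN below are hypothetical placeholders:

# Load the NVMe/TCP module (part of modern Linux kernels, no extra driver needed)
modprobe nvme-tcp

# Discover and connect to a remote NVMe/TCP target
# (IP address, port, and NQN are hypothetical placeholders)
nvme discover -t tcp -a 192.168.1.10 -s 4420
nvme connect -t tcp -a 192.168.1.10 -s 4420 -n nqn.2025-01.io.example:volume-1

# The volume then shows up as a regular local block device, e.g. /dev/nvme1n1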

Its scale-out architecture provides full multi-tenant isolation, meaning that many customers can share the same storage backend. Logical volumes can be encrypted at rest either by an encryption key per tenant or even per logical volume, providing the strongest isolation option.

Deployment-wise, simplyblock offers the best of both worlds: disaggregated and hyperconverged setups. Simplyblock’s storage engine can be deployed on either a set of separate storage nodes, building a disaggregated storage cluster, or on Kubernetes worker nodes to utilize the worker node-local storage. Simplyblock also supports a mixed setup, where workloads can benefit from ultra-low latency with worker node-local storage (node-affinity) and the security and “spill-over” of the infinite storage from the disaggregated cluster.

As the only solution presented here, simplyblock favors erasure coding over replication for high availability and fault tolerance. Erasure coding is quite similar to RAID and uses parity information to achieve data loss protection. Simplyblock distributes this parity information across cluster nodes for higher fault tolerance. Erasure coding is configured similarly to a replication factor, defining how many data chunks and parity chunks make up each stripe. This enables the best trade-off between data protection and storage overhead, enabling secure setups with as little as 50% additional storage requirements.
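
To make the math concrete with an illustrative example (the actual chunk and parity configuration is tunable): in a 4+2 scheme, every four data chunks are protected by two parity chunks, so the cluster tolerates two simultaneous failures while adding only 2/4 = 50% storage overhead. Triple replication with comparable protection would add 200%.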

Furthermore, simplyblock provides a full multi-tier solution that caters to diverse storage needs in a single system. It enables you to utilize ultra-fast flash storage devices such as NVMe alongside slower SATA/SAS SSDs. At the same time, you can manage your long-term (cold) storage with traditional HDDs or QLC-based flash storage (slower but very high capacity).

Simplyblock is a robust choice if you need scalable, high-performance block storage for use cases such as databases, CDNs (Content Delivery Network), analytics solutions, and similar. Furthermore, simplyblock offers high throughput and low access latency. With its use of erasure coding, simplyblock is a great solution for companies seeking cost-effective storage while its ease of use allows organizations to adapt quickly to changing storage demands. For businesses seeking a modern, efficient, and Kubernetes-optimized block storage solution, simplyblock offers a compelling combination of features and performance.

Portworx: Kubernetes Storage and Data Management

Portworx is a cloud-native, software-defined storage platform that is highly integrated with Kubernetes. It is an enterprise-grade, closed-source solution that was acquired and is in active development by Pure Storage. Hence, its integration with the Pure Storage hardware appliances enables a performant, scalable storage option with integrated tiering capabilities.

Portworx integrates with Kubernetes through its native CSI driver and provides important CSI features such as dynamic provisioning, snapshots, clones, resizing, and persistent or ephemeral volumes. Furthermore, Portworx supports data-at-rest encryption and disaster recovery using synchronous and asynchronous cluster replication.

To enable fault tolerance and high availability, Portworx utilizes replicas, storing copies of data on different cluster nodes. This multiplies the required disk space by the replication factor. For the connection between the storage cluster and clients, Portworx provides access via iSCSI, a fairly old protocol that isn’t necessarily optimized for fast flash storage.
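
To put that into numbers: with a replication factor of 3, every terabyte of data consumes three terabytes of raw capacity, a 200% overhead, compared to the roughly 50% overhead achievable with the erasure coding approach described above.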

For connections between Pure’s FlashArray and Portworx, you can use NVMe/TCP or NVMe-RoCE (NVMe with RDMA over Converged Ethernet) — a mouthful, I know.

Encryption at rest is supported with either a unique key per volume or a cluster-wide encryption key. For storage-client communication, iSCSI should be separated into its own VLAN; remember, though, that iSCSI itself isn't encrypted, so encryption in transit isn't guaranteed unless the traffic is pushed through a secured channel.

As mentioned, Portworx distinguishes itself by integrating with Pure Storage appliances, enabling organizations to leverage the performance and reliability of Pure's flash storage systems. While available as a pure software-defined storage solution, Portworx excels in combination with Pure's hardware. This makes it a compelling choice for running critical stateful applications such as databases, high-throughput message queues, and analytics platforms in Kubernetes, especially if you don't fear operating hardware appliances.

Ceph: Open-source, Distributed Storage System

Ceph is a highly scalable and distributed storage solution. Run as a company-backed open-source project, Ceph presents a unified storage platform with support for block, object, and file storage. That makes it a versatile choice for a wide range of Kubernetes applications.

Ceph’s Kubernetes integration is provided through the ceph-csi driver, which brings dynamically provisioned persistent volumes and automatic lifecycle management. CSI features supported by Ceph include snapshotting, cloning, resizing, and encryption.
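
As a minimal sketch of what dynamic RBD provisioning can look like (the clusterID and pool values are placeholders, and the secret-related parameters that ceph-csi additionally requires are omitted for brevity):

# Minimal ceph-csi RBD StorageClass sketch (placeholder values, secrets omitted)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <ceph-cluster-fsid>   # placeholder
  pool: kubernetes                 # placeholder RBD pool
  imageFeatures: layering
reclaimPolicy: Delete
allowVolumeExpansion: true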

The architecture of Ceph is built to be self-healing and self-managing, mainly designed to enable infinite disk space scalability. The provided access latency, while not on the top end, is good enough for many use cases. Workloads like databases, which crave high IOPS and low latency, can feel a bit sluggish, though. Finally, high availability and fault tolerance are implemented through replication between Ceph cluster nodes.

On the security side, Ceph supports encryption at rest via a few different options. I'd recommend the LUKS-based (Linux Unified Key Setup) setup, as it supports all of the different Ceph storage options. The communication between cluster nodes, as well as between storage and client, is not encrypted by default. If you require encryption in transit (and you should), utilize SSH tunnels and SSL termination via HAProxy or similar solutions. It's unfortunate that a storage solution as big as Ceph has no such built-in support. The same goes for multi-tenancy, which can be achieved using RADOS namespaces but isn't an out-of-the-box solution.

Ceph is an excellent choice as your Kubernetes storage when you are looking for an open-source solution with a proven track record of enterprise deployments, infinite storage scalability, and versatile storage types. It is not a good choice if you are looking for high-performance storage with low latency and millions of IOPS.

Moreover, due to its community-driven development and support, Ceph can be operated as a cost-effective and open-source alternative to proprietary storage solutions. Whether you’re deploying in a private data center, a public cloud, or a hybrid environment, Ceph’s adaptability is a great help for managing storage in containerized ecosystems.

Commercial support for Ceph is available from Red Hat.

Longhorn: Cloud-native Block Storage for Kubernetes

Longhorn is an open-source, cloud-native storage solution specifically designed for Kubernetes. It provides block storage that focuses on flexibility and ease of use. Therefore, Longhorn deploys straight into your Kubernetes cluster, providing worker node-local storage as persistent volumes.

As a cloud-native storage solution, Longhorn provides its own CSI driver, highly integrated with Longhorn and Kubernetes. It enables dynamic provisioning and management of persistent volumes, snapshots, clones, and backups. For the latter, users have reported complications with restores, so make sure to test your recovery processes.

For communication between storage and clients, Longhorn uses the iSCSI protocol. A newer version of the storage engine that enables NVMe over TCP is in the works; however, at the time of writing, it isn't yet considered production-ready.

Anyhow, Longhorn provides good access latency and throughput, making it a great solution for mid-size databases and similar workloads. Encryption at rest can be set up but isn't as simple as with some alternatives. High availability and fault tolerance are achieved by replicating data between cluster nodes. That means, as with many other solutions, the required raw storage is multiplied by the replication factor. However, Longhorn supports incremental backups to external storage systems like S3 for easy data protection and fast recoverability in disaster scenarios.
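
To sketch how that replication factor is expressed in practice, here is a minimal Longhorn StorageClass; the values are illustrative:

# Minimal Longhorn StorageClass sketch (illustrative values)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"        # data is kept on three different nodes
  staleReplicaTimeout: "2880"  # minutes before an unavailable replica is considered stale
allowVolumeExpansion: true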

Longhorn is a full open-source project under the Cloud Native Computing Foundation (CNCF). It was originally developed by Rancher and is backed by SUSE. Hence, it’s commercially available with enterprise support as SUSE Storage.

Longhorn is a good choice if you want a lightweight, cloud-native, open-source solution that runs hyper-converged with your cluster workloads. It is usually used for smaller deployments or home labs — widely discussed on Reddit. Generally, it is not considered as robust as Ceph and, hence, is not recommended for mission-critical enterprise production workloads.

NFS: File Sharing Solution for Enterprises with Heterogeneous Environments

NFS (Network File System) is a well-known and widely adopted file-sharing protocol, inside and outside Kubernetes. That said, NFS has a proven track record showing its simplicity, reliability, and ability to provide shared access to persistent storage.

One of the main features of NFS is its ability to simultaneously attach volumes to many containers (and pods) with read-write access. That enables easy sharing of configuration, training data, or similar shared data sets between many instances or applications.

There are quite a few different options for integrating NFS with Kubernetes. The two main ones are the Kubernetes NFS Subdir External Provisioner, which automatically creates an NFS subdirectory whenever a new persistent volume is requested, and the csi-driver-nfs. In addition, many storage solutions provide optimized NFS CSI drivers designed to automatically provision shares on their respective systems. Such storage options include TrueNAS, OpenEBS, Dell EMC, and others.
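
For a statically provisioned share, a plain PV/PVC pair is enough. Here is a minimal sketch, assuming a hypothetical server and export path:

# Statically provisioned NFS volume (server and path are hypothetical)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany            # many pods on many nodes, read-write
  nfs:
    server: nfs.example.internal
    path: /exports/shared
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-nfs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""         # disable dynamic provisioning, bind to the PV above
  resources:
    requests:
      storage: 100Gi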

High availability is one of the elements of NFS that isn't simple, though. To make automatic failover work, additional tools like Corosync or Pacemaker need to be configured. On the client side, automount should be set up to handle automatic failover and reconnection. NFS is an old protocol from a time when those additional steps were commonplace. Today, they feel clunky and out of place, especially compared to available alternatives.

While multi-tenancy isn't strictly supported by NFS, using individual shares per tenant can be a viable approach. However, remember that shares aren't secured in any way. Authentication requires additional setups such as Kerberos. File access permissions alone aren't sufficient for tenant isolation.

Encryption at rest with NFS comes down to the backing storage solution. NFS, as a sharing protocol, doesn’t offer anything by itself. Encryption in transit is supported, either via Kerberos or other means like TLS via stunnel. The implementation details differ per NFS provider, though. You should consult your provider’s manual.

NFS is your Kubernetes storage of choice if you need a simple, scalable, and shared file storage system that integrates seamlessly into your existing infrastructure. In the best case, you already have an NFS server set up and ready to go. Installing the CSI driver and configuring the storage class is all you need. While NFS might be a bottleneck for high-performance systems such as databases, many applications work perfectly fine. Imagine you need to scale out a WordPress-based website. There isn’t an easier way to share the same writable storage to many WP instances. That said, for organizations looking for a mature, battle-tested storage option to deliver shared storage with minimal complexity, NFS is the choice.

Make Your Kubernetes Storage Choice in 2025

Simplyblock storage solution for Kubernetes: cloud-native design, optimized for NVMe/TCP, multi-tier architecture

Selecting the right storage solution for your Kubernetes persistent volume isn’t easy. It is an important choice to ensure performance, scalability, and reliability for your containerized workloads. Solutions like simplyblock™, Portworx, Ceph, Longhorn, and NFS offer a great set of features and are optimized for different use cases.

NFS is loved for its simplicity and easy multi-attach functionality. It is a great choice for all use cases needing shared write access. It’s not a great fit for high throughput and super-low access latency, though.

Ceph, on the other hand, is great if you need infinite scalability and don't shy away from a slightly more complicated setup and operation. Ceph is a robust choice for most use cases, except for high-performance databases and similarly I/O-intensive applications.

Longhorn and Portworx are generally good choices for almost all types of applications. Both solutions provide good access latency and throughput. If you tend to buy hardware appliances, Portworx, in combination with Pure Storage, is the way to go. If you prefer pure software-defined storage and want to utilize storage available in your worker nodes, take a look at Longhorn.

Last but not least, simplyblock is your choice when running IO-hungry databases in or outside Kubernetes. Its use of the NVMe/TCP protocol makes it a perfect choice for pure container storage, as well as mixed environments with containers and virtual machines. Due to its low storage overhead for data protection, simplyblock is a great, cost-effective, and fast storage solution. And a bit of capacity always remains for all other storage needs, meaning a single solution will do it for you.

As Kubernetes evolves, leveraging the proper storage solution will significantly improve your application performance and resiliency. To ensure you make an informed decision for your short and long-term storage needs, consider factors like workload complexity, deployment scale, and data management needs.

Whether you are looking for a robust enterprise solution or a simpler and more straightforward setup, these five options are all strong contenders to meet your Kubernetes storage demands in 2025.

We Built a Tool to Help You Understand Your Real EBS Usage!
https://www.simplyblock.io/blog/ebs-volume-usage-exporter/ (Fri, 17 Jan 2025)

There is one question in life that is really hard to answer: “What is your actual AWS EBS volume usage?”

When talking to customers and users, this question is frequently left open with the note that they’ll check and tell us later. With storage being one of the main cost factors of cloud services such as Amazon’s AWS, this is not what it should be.

But who could blame them? It’s not like AWS is making it obvious to you how much of your storage resources (not only capacity but especially IOPS and throughput) you really use. It might be bad for AWS’ revenue.

We just open-sourced our AWS EBS Volume Usage Exporter on GitHub. Get an accurate view of your EBS usage in EKS.

Why We Built This

We believe that there is no reason to pay more than necessary. However, since it’s so hard to get hard facts on storage use, we tend to overprovision—by a lot.

Hence, we decided to do something about it. Today, we’re excited to share our new open-source tool – the AWS EBS Volume Usage Exporter!

What makes this particularly interesting is that, based on our experience, organizations typically utilize only 20-30% of their provisioned AWS EBS volumes. That means 70-80% of provisioned storage is sitting idle, quietly adding to your monthly AWS bill, making someone happy, just not you.

What Our Tool Does

The EBS Volume Usage Exporter runs in your EKS cluster and collects detailed metrics about your EBS volumes, including:

  • Actual usage patterns
  • IOPS consumption
  • Throughput utilization
  • Available disk space
  • Snapshot information

All this data gets exported into a simple CSV file that you can analyze however you want.

If you like convenience, we’ve also built a handy calculator (that runs entirely in your browser – no data leaves your machine!) to help you quickly understand potential cost savings. Here’s the link to our EBS Volume Usage Calculator. No need to use it, though. The data is easy enough to get basic insight. Our calculator just automates the pricing and potential saving calculation based on current AWS price lists.

Super Simple to Get Started

To get you started quickly, we packaged everything as a Helm chart to make deployment as smooth as possible. You’ll need:

  • An EKS cluster with cluster-admin privileges
  • An S3 bucket
  • Basic AWS permissions

The setup takes just a few minutes – we’ve included all the commands you need in our GitHub repository.

After a successful run, you can simply delete the helm chart deployment and be done with it. The exported data are available in the provided S3 bucket for download.

We Want Your Feedback!

This is just the beginning, and we’d love to hear from you!

Do the calculated numbers match your actual costs?
What other features would you find useful?

We already heard people asking for a tool that can run outside of EKS, and we're looking into it. We would also love to extend support to utilize existing monitoring infrastructure such as Datadog, Dynatrace, or others. Most of the data is already available and should be easy to extract.

For those storage pros out there who can answer the EBS utilization question off the top of your head – we’d love to hear your stories, too!

Share your experiences and help us make this tool even better.

Try It Out!

The AWS EBS Volume Usage Exporter is open source and available now on GitHub. Give it a spin, and let us know what you think!

And hey – if you’re wondering whether you really need this tool, ask yourself: “Do I know exactly how much of my provisioned EBS storage is actually being used right now?”

If there’s even a moment of hesitation, you should check this out!


At simplyblock, we’re passionate about helping organizations optimize their cloud storage. This tool represents our commitment to the open-source community and our mission to eliminate cloud storage waste.

Kubernetes Storage 201: Concepts and Practical Examples
https://www.simplyblock.io/blog/kubernetes-storage-concepts/ (Mon, 23 Dec 2024)

What is Kubernetes Storage?

Kubernetes storage is a sophisticated ecosystem designed to address the complex data management needs of containerized applications. At its core, Kubernetes storage provides a flexible mechanism to manage data across dynamic, distributed computing environments. It allows your containers to store, access, and persist data with unprecedented flexibility.


Storage Types in Kubernetes

Fundamentally, Kubernetes provides two types of storage: ephemeral volumes, which are bound to the container's lifecycle, and persistent volumes, which survive container restarts and termination.

Ephemeral (Non-Persistent) Storage

Ephemeral storage represents the default storage mechanism in Kubernetes. It provides a temporary storage solution, existing only for the duration of a container’s lifecycle. Therefore, when a container is terminated or removed, all data stored in this temporary storage location is permanently deleted.

This type of storage is ideal for transient data that doesn’t require long-term preservation, such as temporary computation results or cache files. Most stateless workloads utilize ephemeral storage for these kinds of temporary data. That said, a “stateless workload” doesn’t necessarily mean no data is stored temporarily. It means there is no issue if this storage disappears from one second to the next.
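
A minimal sketch of ephemeral storage in action is the built-in emptyDir volume, whose contents are removed together with the pod:

apiVersion: v1
kind: Pod
metadata:
  name: scratch-example
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "echo temp > /scratch/result.txt && sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}             # deleted when the pod goes away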

Persistent Storage

Persistent storage is a critical concept in Kubernetes that addresses one of the fundamental challenges of containerized applications: maintaining data integrity and accessibility across dynamic and ephemeral computing environments.

Unlike ephemeral storage, which exists only for the lifetime of a container, persistent storage is not bound to the lifetime of a container. Hence, persistent storage provides a robust mechanism for storing and managing data that must survive container restarts, pod rescheduling, or even complete cluster redesigns. You enable persistent Kubernetes storage through the concepts of Persistent Volumes (PV) as well as Persistent Volume Claims (PVC).

Fundamental Kubernetes Storage Entities

Figure 1: The building blocks of Kubernetes storage (Persistent Volume, Persistent Volume Claim, Container Storage Interface, Volume, StorageClass)

Storage in Kubernetes is built up from multiple entities, depending on how storage is provided and if it is ephemeral or persistent.

Persistent Volumes (PV)

A Persistent Volume (PV) is a slice of storage in the Kubernetes cluster that has been provisioned by an administrator or dynamically created through a StorageClass. Think of a PV as a virtual storage resource that exists independently of any individual pod's lifecycle. Consequently, this abstraction allows for several key capabilities: storage that outlives any individual pod, provisioning controlled by administrators, and dynamic allocation through StorageClasses.

Persistent Volume Claims (PVC): Requesting Storage Resources

Persistent Volume Claims act as a user's request for storage resources. Imagine your PVC as a demand for storage with specific requirements, similar to how a developer requests computing resources.

When a user creates a PVC, Kubernetes attempts to find and bind an appropriate Persistent Volume that meets the specified criteria. If no existing volume is found but a storage class is defined or a cluster-default one is available, the persistent volume will be dynamically allocated.

Key PersistentVolumeClaim Characteristics:

  • Size Specification: Defines a user storage capacity request
  • Access Modes: Defines how the volume can be accessed
    • ReadWriteOnce (RWO): Allows all pods on a single node to mount the volume in read-write mode.
    • ReadWriteOncePod: Allows a single pod to read-write mount the volume on a single node.
    • ReadOnlyMany (ROX): Allows multiple pods on multiple nodes to read the volume. Very practical for a shared configuration state.
    • ReadWriteMany (RWX): Allows multiple pods on multiple nodes to read and write to the volume. Remember, this could be dangerous for databases and other applications that don’t support a shared state.
  • StorageClass: Allows requesting specific types of storage based on performance, redundancy, or other characteristics
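
Putting these characteristics together, a minimal claim might look like the following sketch; the storage class name is illustrative:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce            # single-node, read-write
  storageClassName: fast-ssd   # illustrative class name
  resources:
    requests:
      storage: 20Gi            # requested capacity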

The Container Storage Interface (CSI)

The Container Storage Interface (CSI) represents a pivotal advancement in Kubernetes storage architecture. Before CSI, integrating storage devices with Kubernetes was a complex and often challenging process that required a deep understanding of both storage systems and container orchestration.

The Container Storage Interface introduces a standardized approach to storage integration. Storage providers (commonly referred to as CSI drivers) are so-called out-of-process entities that communicate with Kubernetes via an API. The integration of CSI into the Kubernetes ecosystem provides three major benefits:

  1. CSI provides a vendor-neutral, extensible plugin architecture
  2. CSI simplifies the process of adding new storage systems to Kubernetes
  3. CSI enables third-party storage providers to develop and maintain their own storage plugins without modifying Kubernetes core code

Volumes: The Basic Storage Units

In Kubernetes, volumes are fundamental storage entities that solve the problem of data persistence and sharing between containers. Unlike traditional storage solutions, Kubernetes volumes are not limited to a single type of storage medium. They can represent:

  • Node-local disks or directories (such as emptyDir and hostPath)
  • Network-attached storage (such as NFS shares)
  • Cloud provider block devices
  • Configuration data projected as files (ConfigMaps and Secrets)

Volumes provide a flexible abstraction layer that allows applications to interact with storage resources without being directly coupled to the underlying storage infrastructure.

StorageClasses: Dynamic Storage Provisioning

StorageClasses represent a powerful abstraction that enables dynamic and flexible storage provisioning because they allow cluster administrators to define different types of storage services with varying performance characteristics, such as:

  • High-performance SSD storage
  • Economical magnetic drive storage
  • Geo-redundant cloud storage solutions

When a user requests storage through a PVC, Kubernetes tries to find an existing persistent volume. If none was found, the appropriate StorageClass defines how to automatically provision a suitable storage resource, significantly reducing administrative overhead.
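
As a sketch of what such a class can look like, here is an example using the AWS EBS CSI driver; the parameter values are illustrative:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com       # AWS EBS CSI driver
parameters:
  type: gp3                        # SSD-backed volume type
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete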

Figure 2: Feature comparison of ephemeral and persistent storage

Best Practices for Kubernetes Storage Management

  1. Resource Limitation
    • Implement strict resource quotas
    • Control storage consumption across namespaces
    • Set clear boundaries for storage requests
  2. Configuration Recommendations
    • Always use Persistent Volume Claims in container configurations
    • Maintain a default StorageClass
    • Use meaningful and descriptive names for storage classes
  3. Performance and Security Considerations
    • Implement quality of service (QoS) controls
    • Create isolated storage environments
    • Enable multi-tenancy through namespace segregation

Practical Storage Provisioning Example

While specific implementations vary, here’s a conceptual example of storage provisioning using Helm:

helm install storage-solution storage-provider/csi-driver \
  --set storage.size=100Gi \
  --set storage.type=high-performance \
  --set access.mode=ReadWriteMany

Kubernetes Storage with Simplyblock CSI: Practical Implementation Guide

Simplyblock is a storage platform for stateful workloads such as databases, message queues, data warehouses, file storage, and similar. Therefore, simplyblock provides many features tailored to the use cases, simplifying deployments, improving performance, or enabling features such as instant database clones.

Basic Installation Example

When deploying storage in a Kubernetes environment, organizations need a reliable method to integrate storage solutions seamlessly. The Simplyblock CSI driver installation process begins by adding the Helm repository, which allows teams to easily access and deploy the storage infrastructure. By creating a dedicated namespace called simplyblock-csi, administrators ensure clean isolation of storage-related resources from other cluster components.

The installation command specifies critical configuration parameters that connect the Kubernetes cluster to the storage backend. The unique cluster UUID identifies the specific storage cluster, while the API endpoint provides the connection mechanism. The secret token ensures secure authentication, and the pool name defines the initial storage pool where volumes will be provisioned. This approach allows for a standardized, secure, and easily repeatable storage deployment process.

Here’s an example of installing the Simplyblock CSI driver:

helm repo add simplyblock-csi https://raw.githubusercontent.com/simplyblock-io/simplyblock-csi/master/charts

helm repo update

helm install -n simplyblock-csi --create-namespace \
  simplyblock-csi simplyblock-csi/simplyblock-csi \
  --set csiConfig.simplybk.uuid=[random-cluster-uuid] \
  --set csiConfig.simplybk.ip=[cluster-ip] \
  --set csiSecret.simplybk.secret=[random-cluster-secret] \
  --set logicalVolume.pool_name=[cluster-name]

Advanced Configuration Scenarios

1. Performance-Optimized Storage Configuration

Modern applications often require precise control over storage performance, making custom StorageClasses invaluable.

Firstly, by creating a high-performance storage class, organizations can define exact performance characteristics for different types of workloads. The configuration sets a specific IOPS (Input/Output Operations Per Second) limit of 5000, ensuring that applications receive consistent and predictable storage performance.

Secondly, bandwidth limitations of 500 MB/s prevent any single application from monopolizing storage resources, promoting fair resource allocation. The added encryption layer provides an additional security measure, protecting sensitive data at rest. This approach allows DevOps teams to create storage resources that precisely match application requirements, balancing performance, security, and resource management.

# Example StorageClass configuration
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: high-performance-storage
provisioner: csi.simplyblock.io
parameters:
  qos_rw_iops: "5000"    # High IOPS performance
  qos_rw_mbytes: "500"   # Bandwidth limit
  encryption: "True"      # Enable encryption

2. Multi-Tenant Storage Setup

As a large organization or cloud provider, you require a robust environment and workload separation mechanism. For that reason, teams organize workloads between development, staging, and production environments by creating a dedicated namespace for production applications.

Therefore, the custom storage class for production workloads ensures critical applications have access to dedicated storage resources with specific performance and distribution characteristics.

The distribution configuration with multiple network domain controllers (NDCs) provides enhanced reliability and performance. Indeed, this approach supports complex enterprise requirements by enabling granular control over storage resources, improving security, and ensuring that production workloads receive the highest quality of service.

# Namespace-based storage isolation
apiVersion: v1
kind: Namespace
metadata:
  name: production-apps
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: encrypted-volume
  namespace: production-apps        # assumption: the claim lives in the isolated namespace
  annotations:
    simplybk/secret-name: encrypted-volume-keys
spec:
  storageClassName: encrypted-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi

Multipath Storage Configuration

Network resilience is a critical consideration in enterprise storage solutions. Hence, multipath storage configuration provides redundancy by allowing multiple network paths for storage communication. By enabling multipathing and specifying a default network interface, organizations can create more robust storage infrastructures that can withstand network interruptions.

The caching node creation further enhances performance by providing an intelligent caching layer that can improve read and write operations. Furthermore, this configuration supports load balancing and reduces potential single points of failure in the storage network.

cachingnode:
  create: true
  multipathing: true
  ifname: eth0  # Default network interface

Best Practices for Kubernetes Storage with Simplyblock

  1. Always specify a unique pool name for each storage configuration
  2. Implement encryption for sensitive workloads
  3. Use QoS parameters to control storage performance
  4. Leverage multi-tenancy features for environment isolation
  5. Regularly monitor storage node capacities and performance

Deletion and Cleanup

# Uninstall the CSI driver
helm uninstall "simplyblock-csi" --namespace "simplyblock-csi"

# Remove the namespace
kubectl delete namespace simplyblock-csi

The examples demonstrate the flexibility of Kubernetes storage, showcasing how administrators can fine-tune storage resources to meet specific application requirements while maintaining performance, security, and scalability. Try simplyblock for the most flexible Kubernetes storage solution on the market today.

Scale Up vs Scale Out: System Scalability Strategies
https://www.simplyblock.io/blog/scale-up-vs-scale-out/ (Wed, 11 Dec 2024)

TLDR: Horizontal scalability (scale out) describes a system that scales by adding more resources through parallel systems, whereas vertical scalability (scale up) increases the amount of resources on a single system.

One of the most important questions to answer when designing an application or infrastructure is the architectural approach to system scalability. Traditionally, systems used the scale-up approach, or vertical scalability. Many modern systems, especially in the cloud-native ecosystem, use a scale-out approach, also called horizontal scalability.

Scale-Up vs Scale-Out: Which System Architecture is Right for You?

Understanding the Basics

Understanding the fundamental concepts is essential when discussing system architectures. Hence, let’s briefly overview the two approaches before exploring them in more depth.

  • With Scale Up (Vertical Scalability), you increase resources (typically CPU, memory, and storage) in the existing system to improve performance and capacity.
  • With Scale Out (Horizontal Scalability), you add additional nodes or machines to the existing infrastructure to distribute the workload across multiple systems.

Both architectural approaches have their respective advantages and disadvantages. While scale-up architectures are easier to implement, they hit a hard scaling limit at a certain point. On the other hand, scale-out architectures are more complex to implement but scale almost linearly if done right.

Vertical Scaling (Scale Up) Architectures: The Traditional Approach

Figure 1: Scale-up storage architecture with disks being added to the same machine

Vertical scaling, commonly known as scaling up, involves adding more resources to an existing system to increase its power or capacity.

Think of it as upgrading your personal computer. Instead of buying a second computer, you add more RAM or install a faster processor or larger storage device. In enterprise storage systems, this typically means adding more CPU cores, memory, or storage drives to an existing server. For virtual machines, it usually means increasing the resources assigned to the machine.

To clarify, let’s use a real-world example from the storage industry. A ZFS-based SAN (Storage Area Network) system requires a scale-up system design. Or as Jason Lohrey wrote: «However, ZFS has a significant issue – it can’t scale out. ZFS’s biggest limitation is that it is “scale-up” only.» ZFS, as awesome as it is, is limited to a single machine. Consequently, increasing the storage capacity always means adding larger or more disks to the existing machine. This approach maintains the simplicity of the original architecture while increasing storage capacity and potentially improving performance.

Strengths of Vertical Scaling

Today, many people see the vertical scalability approach as outdated and superfluous. That is, however, not necessarily true. Vertical scaling shines in several scenarios.

First, implementing a scale-up system is generally more straightforward since it doesn’t require changes to your application architectures or complex data distribution logic. When you scale up a transactional database like PostgreSQL or MySQL, you essentially give it more operational resources while maintaining the same operational model.

Secondly, the management overhead is lower. Tasks such as backups, monitoring, and maintenance are straightforward. This simplicity often translates to lower operational costs despite the potentially higher hardware costs.

Here is a quick overview of all the advantages:

  1. Simplicity: It’s straightforward to implement since you’re just adding resources to an existing system
  2. Lower Complexity: Less architectural overhead since you’re working with a single system
  3. Consistent Performance: Lower latency due to all resources being in one place
  4. Software Compatibility: Most traditional software is designed to run on a single system
  5. Lower Initial Costs: Often cheaper for smaller workloads due to simpler licensing and management

Weaknesses and Limitations of Scale-Up Architectures

Like anything in this world, vertical scaling architectures also have drawbacks. The most significant limitation is the so-called physical ceiling. A system is limited by its server chassis’s space capacity or the hardware architecture’s limitation. You can only add as much hardware as those limitations allow. Alternatively, you need to migrate to a bigger base system.

Traditional monolithic applications often face another challenge with vertical scaling: adding more resources doesn’t always translate to linear performance improvements. For example, doubling the CPU cores might yield only a 50% performance increase due to software architecture limitations, especially resource contention.
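
This is Amdahl's law at work: if, say, only 70% of a workload can run in parallel, doubling the cores yields a speedup of at most 1 / (0.3 + 0.7/2) ≈ 1.54, roughly the 50% improvement from the example above, no matter how fast the added hardware is.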

Here is a quick overview of all the disadvantages:

  1. Hardware Limits: The physical ceiling limits how much you can scale up based on maximum hardware specifications
  2. Downtime During Upgrades: Usually requires system shutdown for hardware upgrades
  3. Cost Efficiency: High-end hardware becomes exponentially more expensive
  4. Single Point of Failure: No built-in redundancy
  5. Limited Flexibility: Cannot easily scale back down when demand decreases

When to Scale Up?

After all that, here is when you really want to go with a scale-up architecture:

  • You have traditional monolithic applications
  • You look for an easier way to optimize for performance, not capacity
  • You’re dealing with applications that aren’t designed for distributed computing
  • You need a quick solution for immediate performance issues

Horizontal Scaling (Scale Out) Architectures: The Distributed Approach

Figure 2: Scale-out storage architecture with additional nodes being added to the cluster

The fundamentally different approach is the horizontal scaling or scale-out architecture. Instead of increasing the available resources on the existing system, you add more systems to distribute the load across them. This is actually similar to adding additional workers to an assembly line rather than trying to make one worker more efficient.

Consider a distributed storage system like simplyblock or a distributed database like MongoDB. When you scale out these systems, you add more nodes to the cluster, and the workload gets distributed across all nodes. Each node handles a portion of the data and processing, allowing the system to grow almost limitlessly.

Advantages of Horizontal Scaling

Large-scale deployments and highly distributed systems are the forte of scale-out architectures. As a simple example, most modern web applications utilize load balancers to distribute traffic across multiple application servers. This allows them to handle millions of concurrent requests and users. Similarly, distributed storage systems like simplyblock scale to petabytes of data by adding additional storage nodes.

Another significant advantage is improved high availability and fault tolerance. In a properly designed scale-out system, if one node fails, the system continues operating. While it may degrade to a reduced service level, it will not experience a complete system failure or outage.

To bring this all to a point:

  1. Near-Infinite Scalability: Can continue adding nodes as needed
  2. Better Fault Tolerance: Built-in redundancy through multiple nodes
  3. Cost Effectiveness: Can use commodity hardware
  4. Flexible Resource Allocation: Easy to scale up or down based on demand
  5. High Availability: No single point of failure

The Cost of Distribution: Weakness and Limitations of Horizontal Scalability

The primary challenge when considering scale-out architectures is complexity. Distributed systems must maintain data consistency across system boundaries, cope with network communication and latency, and handle failure recovery. Multiple algorithms have been developed over the years; the most commonly used ones are Raft and Paxos, but that's a different blog post. Anyhow, this complexity typically requires more sophisticated management tools and distributed systems expertise, usually within the team operating the system as well.

The second challenge is the overhead of system coordination. In a distributed system, nodes must synchronize their operations. If not careful, this can introduce latency and even reduce the performance of certain types of operations. Great distributed systems utilize sophisticated algorithms to prevent these issues from happening.

Here is a quick overview of the disadvantages of horizontal scaling:

  1. Increased Complexity: More moving parts to manage
  2. Data Consistency Challenges: Maintaining consistency across nodes can be complex
  3. Higher Initial Setup Costs: Requires more infrastructure and planning
  4. Software Requirements: Applications must be designed for distributed computing
  5. Network Overhead: Communication between nodes adds latency

Kubernetes: A Modern Approach to Scaling

Kubernetes has become the de facto platform for container orchestration. It comes in multiple varieties, in its vanilla form or as the basis for systems like OpenShift or Rancher. Either way, it can be used for both vertical and horizontal scaling capabilities. However, Kubernetes has become a necessity when deploying scale-out services. Let’s look at how different workloads scale in a Kubernetes environment.

Scaling Stateless Workloads

Stateless applications, like web servers or API gateways, are natural candidates for horizontal scaling in Kubernetes. The Horizontal Pod Autoscaler (HPA) provided by Kubernetes automatically adjusts the number of pods based on metrics such as CPU or RAM utilization. Custom metrics as triggers are also possible.
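
A minimal HPA sketch looks like this; the target deployment name and thresholds are illustrative:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend           # illustrative target
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # add pods above 70% average CPU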

Horizontally scaling stateless applications is easy. As the name suggests, stateless applications do not maintain persistent local or shared state data. Each instance or pod is entirely independent and interchangeable. Each request to the service contains all the required information needed for processing.

That said, automatic scaling up and down (meaning starting new instances or shutting some down) is part of the typical lifecycle and can happen at any point in time.

Scaling Stateful Workloads

Stateful workloads, like databases, require more careful consideration.

A common approach for more traditional databases like PostgreSQL or MySQL is to use a primary-replica architecture. In this design, write operations always go to the primary instance, while read operations can be distributed across all replicas.

On the other hand, MongoDB, which uses a distributed database design, can scale out more naturally by adding more shards to the cluster. Their internal cluster design uses a technique called sharding. Data is assigned to horizontally scaling partitions distributed across the cluster nodes. Shard assignment happens either automatically (based on the data) or by providing a specific shard key, enabling data affinity. Adding a shard to the cluster will increase capacity when additional scale is necessary. Data rebalancing happens automatically.

Why we built Simplyblock on a Scale-Out Architecture

Figure 3: Simplyblock’s scale-out architecture with storage pooling via cluster nodes

Stateful workloads, like Postgres or MySQL, can scale out by adding additional read replicas to the cluster. However, every single instance needs storage to store its very own data. Hence, the need for scalable storage arises.

Simplyblock is a cloud-native and distributed storage platform built to deliver scalable performance and virtually infinite capacity for logical devices through horizontal scalability. Unlike traditional storage systems, simplyblock distributes data across all cluster nodes, multiplying the performance and capacity.

Designed as an NVMe-first architecture, simplyblock uses the NVMe over Fabrics protocol family. This extends the reach of the highly scalable NVMe protocol over network fabrics such as TCP, Fibre Channel, and others. Furthermore, it provides built-in support for multi-pathing, enabling seamless failover and load balancing.

The system uses a distributed data placement algorithm to spread data across all available cluster nodes, automatically rebalancing data when nodes are added or removed. When writing data, simplyblock splits the item into multiple, smaller chunks and distributes them. This allows for parallel access during read operations. The data distribution also provides redundancy, with parity information stored on other nodes in the cluster. This protects the data against individual disk and node failures.

Using this architecture, simplyblock provides linear capacity and performance scalability by pooling all available disks and parallelizing access. This enables simplyblock to scale from mere terabytes to multiple petabytes while maintaining performance, consistency, and durability characteristics throughout the cluster-growth process.

Building Future-Proof Infrastructure

To wrap up, when you build out a new system infrastructure or application, consider these facts:

Figure 4: Flowchart showing when to scale up or scale out
  1. Workload characteristics: CPU-intensive workloads might benefit more from vertical scaling. Distributing operations comes with its own overhead; if the operation itself doesn’t offset this overhead, you might see lower performance than with vertical scaling. On the other hand, I/O-heavy workloads might perform better with horizontal scaling. If the access patterns are highly parallelizable, a horizontal architecture will most likely outscale a vertical one.
  2. Growth patterns: Predictable, steady growth might favor scaling up, while rapid growth patterns might necessitate the flexibility of scaling out. This isn’t a hard rule, though. A carefully designed scale-out system will provide a very predictable growth pattern and latency. However, the application isn’t the only element to take into account when designing the system, as there are other components, most prominently the network and network equipment.
  3. Future-Proofing: Scaling out often requires little upfront investment in infrastructure but higher investment in development and expertise. It can, however, provide better long-term cost efficiency for large deployments. That said, buying a scale-out solution is a great idea. With a storage solution like simplyblock, for example, you can start small and add required resources whenever necessary. With traditional storage solutions, you have to go with a higher upfront cost and are limited by the physical ceiling.
  4. Operational Complexity: Scale-up architectures are typically easier to manage, while a stronger DevOps or operations team is required to handle scale-out solutions. That’s why simplyblock’s design is carefully crafted to be fully autonomous and self-healing, with as few hands-on requirements as possible.

The Answer Depends

That means there is no universal answer to whether scaling up or out is better. A consultant would say, “It depends.” Seriously, it does. It depends on your specific requirements, constraints, and goals.

Many successful organizations use a hybrid approach, scaling up individual nodes while also scaling out their overall infrastructure. The key is understanding the trade-offs and choosing the best approach to your needs while keeping future growth in mind. Hence, simplyblock provides the general scale-out architecture for infinite scalability. It also provides a way to utilize storage located in Kubernetes worker nodes as part of the storage cluster to provide the highest possible performance. At the same time, it maintains the option to spill over when local capacity is reached and the high durability and fault tolerance of a fully distributed storage system.

Remember, the best scaling strategy aligns with your business objectives while maintaining performance, reliability, and cost-effectiveness. Whether you scale up, out, or both, ensure your choice supports your long-term infrastructure goals.

Figure 5: Simple definition of scale up vs scale out

NVMe & Kubernetes: Future-Proof Infrastructure https://www.simplyblock.io/blog/nvme-kubernetes-future-proof-infrastructure/ Wed, 27 Nov 2024 13:34:00 +0000 https://www.simplyblock.io/?p=4370 The marriage of NVMe storage and Kubernetes persistent volumes represents a perfect union of high-performance storage and modern container orchestration. As organizations increasingly move performance-critical workloads to Kubernetes, understanding how to leverage NVMe technology becomes crucial for achieving optimal performance and efficiency. The Evolution of Storage in Kubernetes When Kubernetes was created over 10 years […]

The post NVMe & Kubernetes: Future-Proof Infrastructure appeared first on simplyblock.

]]>
The marriage of NVMe storage and Kubernetes persistent volumes represents a perfect union of high-performance storage and modern container orchestration. As organizations increasingly move performance-critical workloads to Kubernetes, understanding how to leverage NVMe technology becomes crucial for achieving optimal performance and efficiency.

The Evolution of Storage in Kubernetes

When Kubernetes was created over 10 years ago, its only purpose was to schedule and orchestrate stateless workloads. Since then, a lot has changed, and Kubernetes is increasingly used for stateful workloads. Not just basic ones, but mission-critical workloads, like a company’s primary databases. The promise of workload orchestration in increasingly complex infrastructures is too significant to ignore.

Anyhow, traditional Kubernetes storage solutions relied upon, and still rely on, old network-attached storage protocols like iSCSI. Released in 2000, iSCSI was built on the SCSI protocol itself, first introduced in the 1980s. Both protocols are inherently designed for spinning disks with much higher seek times and access latencies. By modern standards of low latency and low complexity, they just can’t keep up.

While these solutions worked well for basic containerized applications, they fall short for high-performance workloads like databases, AI/ML training, and real-time analytics. Let’s look at the NVMe standard, particularly NVMe over TCP, which has transformed our thinking about storage in containerized environments, not just Kubernetes.

Why NVMe and Kubernetes Work So Well Together

The beauty of this combination lies in their complementary architectures. The NVMe protocol and command set were designed from the ground up for parallel, low-latency operations–precisely what modern containerized applications demand. When you combine NVMe’s parallelism with Kubernetes’ orchestration capabilities, you get a system that can efficiently distribute I/O-intensive workloads while maintaining microsecond-level latency. Further, comparing NVMe over TCP vs iSCSI, we see significant improvement in terms of IOPS and latency performance when using NVMe/TCP.

Consider a typical database workload on Kubernetes. Traditional storage might introduce latencies of 2-4ms for read operations. With NVMe over TCP, these same operations complete in under 200 microseconds–a 10-20x improvement. This isn’t just about raw speed; it’s about enabling new classes of applications to run effectively in containerized environments.

The Technical Symphony

The integration of NVMe with Kubernetes is particularly elegant through persistent volumes and the Container Storage Interface (CSI). Modern storage orchestrators like simplyblock leverage this interface to provide seamless NVMe storage provisioning while maintaining Kubernetes’ declarative model. This means development teams can request high-performance storage using familiar Kubernetes constructs while the underlying system handles the complexity of NVMe management, providing fully reliable shared storage.
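To make this concrete, the following is a minimal sketch of how such a storage request typically looks in Kubernetes. The provisioner name and parameters are placeholders for illustration, not the API of any specific driver:

# Hypothetical StorageClass for an NVMe/TCP-backed CSI driver.
# Provisioner name and parameters are illustrative placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nvme-tcp-fast
provisioner: csi.example-nvme.io   # replace with your CSI driver's provisioner
parameters:
  protocol: nvme-tcp               # illustrative parameter
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# A PersistentVolumeClaim requesting a volume from that class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nvme-tcp-fast
  resources:
    requests:
      storage: 100Gi

From the development team’s perspective, nothing here differs from requesting any other Kubernetes storage; the NVMe specifics stay hidden behind the StorageClass.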

The NVMe Impact: A Real-World Example

But what does that mean for actual workloads? Our friends over at Percona found in their MongoDB Performance on Kubernetes report that Kubernetes implies no performance penalty. Hence, we can look at the disks’ actual raw performance.

A team of researchers from the University of Southern California, San Jose State University, and Samsung Semiconductor took on the challenge of measuring the implications of NVMe SSDs (over SATA SSD and SATA HDD) for real-world database performance.

The general performance characteristics of their test hardware:

                     NVMe SSD    SATA SSD    SATA HDD
Access latency       113µs       125µs       14,295µs
Maximum IOPS         750,000     70,000      190
Maximum Bandwidth    3GB/s       278MB/s     791KB/s

Table 1: General performance characteristics of the different storage types

Their summary states, “scale-out systems are driving the need for high-performance storage solutions with high available bandwidth and lower access latencies. To address this need, newer standards are being developed exclusively for non-volatile storage devices like SSDs,” and “NVMe’s hardware and software redesign of the storage subsystem translates into real-world benefits.”

They close with direct comparisons claiming an 8x performance improvement of NVMe-based SSDs over a single SATA-based SSD, and still a 5x improvement over a hardware RAID-0 of four SATA-based SSDs.

Transforming Database Operations

Perhaps the most compelling use case for NVMe in Kubernetes is database operations. Typical modern databases process queries significantly faster when storage isn’t the bottleneck. This becomes particularly important in microservices architectures where concurrent database requests and high-load scenarios are the norm.

Traditionally, running stateful services in Kubernetes meant accepting significant performance overhead. With NVMe storage, organizations can now run high-performance databases, caches, and messaging systems with bare-metal-like performance in their Kubernetes clusters.

Dynamic Resource Allocation

One of Kubernetes’ central promises is dynamic resource allocation. That means assigning CPU and memory according to actual application requirements. Furthermore, it also means dynamically allocating storage for stateful workloads. With storage classes, Kubernetes provides the option to assign different types of storage backends to different types of applications. While not strictly necessary, this can be a great application of the “best tool for the job” principle.

That said, for IO-intensive workloads, such as databases, a storage backend providing NVMe storage is essential. NVMe’s ability to handle massive I/O parallelism aligns perfectly with Kubernetes’ scheduling capabilities. Storage resources can be dynamically allocated and deallocated based on workload demands, ensuring optimal resource utilization while maintaining performance guarantees.
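For illustration, a database StatefulSet could request one NVMe-backed volume per replica like this. The storage class name reuses the hypothetical class from the earlier sketch:

# Sketch: a StatefulSet that dynamically provisions one NVMe-backed
# volume per replica via volumeClaimTemplates. "nvme-tcp-fast" is
# the hypothetical StorageClass from the earlier example.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: nvme-tcp-fast
        resources:
          requests:
            storage: 50Gi

Each replica gets its own claim, provisioned and deallocated by the scheduler as pods come and go, which is exactly the dynamic allocation behavior described above.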

Simplified High Availability

The low latency of NVMe over TCP enables new approaches to high availability. Instead of complex database replication schemes, organizations can leverage storage-level replication (or more storage-efficient erasure coding, like in the case of simplyblock) with a negligible performance impact. This significantly simplifies application architecture while improving reliability.

Furthermore, NVMe over TCP utilizes multipathing as an automatic fail-over implementation to protect against network connection issues and sudden connection drops, increasing the high availability of persistent volumes in Kubernetes.

The Physics Behind NVMe Performance

Many teams don’t realize how profoundly storage physics impacts database operations. Traditional storage solutions averaging 2-4ms latency might seem fast, but each transaction requires multiple storage operations: reading data pages, writing to the WAL, updating indexes, and performing one or more fsync() operations. At roughly 3ms per operation, four such sequential operations already add up to about 12ms per transaction, which translates to a hard limit of about 80 consistent transactions per second, even before considering CPU or network overhead. Many teams spend weeks optimizing queries or adding memory when their real bottleneck is fundamental storage latency.

This is where the NVMe and Kubernetes combination truly shines. With NVMe as your Kubernetes persistent volume storage backend, providing sub-200μs latency, the same database operations can theoretically support over 1,200 transactions per second–a 15x improvement. More importantly, this dramatic reduction in storage latency changes how databases behave under load. Connection pools remain healthy longer, buffer cache decisions become more efficient, and query planners can make better optimization choices. With the storage bottleneck removed, databases can finally operate closer to their theoretical performance limits.

Looking Ahead

The combination of NVMe and Kubernetes is just beginning to show its potential. As more organizations move performance-critical workloads to Kubernetes, we’ll likely see new patterns and use cases that fully take advantage of this powerful combination.

Some areas to watch:

  • AI/ML workload optimization through intelligent data placement
  • Real-time analytics platforms leveraging NVMe’s parallel access capabilities
  • Next-generation database architectures built specifically for NVMe on Kubernetes Persistent Volumes

The marriage of NVMe-based storage and Kubernetes Persistent Volumes represents more than just a performance improvement. It’s a fundamental shift in how we think about storage for containerized environments. Organizations that understand and leverage this combination effectively gain a significant competitive advantage through improved performance, reduced complexity, and better resource utilization.

For a deeper dive into implementing NVMe storage in Kubernetes, visit our guide on optimizing Kubernetes storage performance.

The post NVMe & Kubernetes: Future-Proof Infrastructure appeared first on simplyblock.

]]>
Figure: Performance improvement of NVMe over SATA SSD and a SATA SSD RAID (4 disks) with Apache Cassandra
How to Build Scalable and Reliable PostgreSQL Systems on Kubernetes https://www.simplyblock.io/blog/postgresql-on-kubernetes/ Mon, 25 Nov 2024 15:31:37 +0000 https://www.simplyblock.io/?p=4315 This is a guest post by Sanskar Gurdasani, DevOps Engineer, from CloudRaft. Maintaining highly available and resilient PostgreSQL databases is crucial for business continuity in today’s cloud-native landscape. The Cloud Native PostgreSQL Operator provides robust capabilities for managing PostgreSQL clusters in Kubernetes environments, particularly in handling failover scenarios and implementing disaster recovery strategies. In this blog […]

The post How to Build Scalable and Reliable PostgreSQL Systems on Kubernetes appeared first on simplyblock.

]]>
This is a guest post by Sanskar Gurdasani, DevOps Engineer, from CloudRaft.

Maintaining highly available and resilient PostgreSQL databases is crucial for business continuity in today’s cloud-native landscape. The Cloud Native PostgreSQL Operator provides robust capabilities for managing PostgreSQL clusters in Kubernetes environments, particularly in handling failover scenarios and implementing disaster recovery strategies.

In this blog post, we’ll explore the key features of the Cloud Native PostgreSQL Operator for managing failover and disaster recovery. We’ll discuss how it ensures high availability, implements automatic failover, and facilitates disaster recovery processes. Additionally, we’ll look at best practices for configuring and managing PostgreSQL clusters using this operator in Kubernetes environments.

Why Run Postgres on Kubernetes?

Running PostgreSQL on Kubernetes offers several advantages for modern, cloud-native applications:

  1. Stateful Workload Readiness: Contrary to old beliefs, Kubernetes is now ready for stateful workloads like databases. A 2021 survey by the Data on Kubernetes Community revealed that 90% of respondents believe Kubernetes is suitable for stateful workloads, with 70% already running databases in production.
  2. Immutable Application Containers: CloudNativePG leverages immutable application containers, enhancing deployment safety and repeatability. This approach aligns with microservice architecture principles and simplifies updates and patching.
  3. Cloud-Native Benefits: Running PostgreSQL on Kubernetes embraces cloud-native principles, fostering a DevOps culture, enabling microservice architectures, and providing robust container orchestration.
  4. Automated Management: Kubernetes operators like CloudNativePG extend Kubernetes controllers to manage complex applications like PostgreSQL, handling deployments, failovers, and other critical operations automatically.
  5. Declarative Configuration: CloudNativePG allows for declarative configuration of PostgreSQL clusters, simplifying change management and enabling Infrastructure as Code practices.
  6. Resource Optimization: Kubernetes provides efficient resource management, allowing for better utilization of infrastructure and easier scaling of database workloads.
  7. High Availability and Disaster Recovery: Kubernetes facilitates the implementation of high availability architectures across availability zones and enables efficient disaster recovery strategies.
  8. Streamlined Operations with Operators: Using operators like CloudNativePG automates all the tasks mentioned above, significantly reducing operational complexity. These operators act as PostgreSQL experts in code form, handling intricate database management tasks such as failovers, backups, and scaling with minimal human intervention. This not only increases reliability but also frees up DBAs and DevOps teams to focus on higher-value activities, ultimately leading to more robust and efficient database operations in Kubernetes environments.

By leveraging Kubernetes for PostgreSQL deployments, organizations can benefit from increased automation, improved scalability, and enhanced resilience for their database infrastructure, with operators like CloudNativePG further simplifying and optimizing these processes.

List of Postgres Operators

Kubernetes operators represent an innovative approach to managing applications within a Kubernetes environment by encapsulating operational knowledge and best practices. These extensions automate the deployment and maintenance of complex applications, such as databases, ensuring smooth operation in a Kubernetes setup.

The Cloud Native PostgreSQL Operator is a prime example of this concept, specifically designed to manage PostgreSQL clusters on Kubernetes. This operator automates various database management tasks, providing a seamless experience for users. Some key features include direct integration with the Kubernetes API server for high availability without relying on external tools, self-healing capabilities through automated failover and replica recreation, and planned switchover of the primary instance to maintain data integrity during maintenance or upgrades.

Additionally, the operator supports scalable architecture with the ability to manage multiple instances, declarative management of PostgreSQL configuration and roles, and compatibility with Local Persistent Volumes and separate volumes for WAL files. It also offers continuous backup solutions to object stores like AWS S3, Azure Blob Storage, and Google Cloud Storage, ensuring data safety and recoverability. Furthermore, the operator provides full recovery and point-in-time recovery options from existing backups, TLS support with client certificate authentication, rolling updates for PostgreSQL minor versions and operator upgrades, and support for synchronous replicas and HA physical replication slots. It also offers replica clusters for multi-cluster PostgreSQL deployments, connection pooling through PgBouncer, a native customizable Prometheus metrics exporter, and LDAP authentication support.

By leveraging the Cloud Native PostgreSQL Operator, organizations can streamline their database management processes on Kubernetes, reducing manual intervention and ensuring high availability, scalability, and security in their PostgreSQL deployments. This operator showcases how Kubernetes operators can significantly enhance application management within a cloud-native ecosystem.

Here are the most popular PostgreSQL operators:

  1. CloudNativePG (formerly known as Cloud Native PostgreSQL Operator)
  2. Crunchy Data Postgres Operator (first released in 2017)
  3. Zalando Postgres Operator (first released in 2017)
  4. StackGres (released in 2020)
  5. Percona Operator for PostgreSQL (released in 2021)
  6. Kubegres (released in 2021)
  7. Patroni (a template for high-availability PostgreSQL solutions, written in Python)

Understanding Failover in PostgreSQL

Primary-Replica Architecture

In a PostgreSQL cluster, the primary-replica (formerly master-slave) architecture consists of:

  • Primary Node: Handles all write operations and read operations
  • Replica Nodes: Maintain synchronized copies of the primary node’s data
Simplyblock architecture diagram of a PostgreSQL cluster running on Kubernetes with local persistent volumes

Automatic Failover Process

When the primary node becomes unavailable, the operator initiates the following process:

  1. Detection: Continuous health monitoring identifies primary node failure
  2. Election: A replica is selected to become the new primary
  3. Promotion: The chosen replica is promoted to primary status
  4. Reconfiguration: Other replicas are reconfigured to follow the new primary
  5. Service Updates: Kubernetes services are updated to point to the new primary
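The behavior of this process can be tuned declaratively on the Cluster resource itself. The following is a hedged sketch; the field names reflect the CloudNativePG API as we understand it, so verify them against your operator version:

# Sketch: failover-related settings on a CNPG Cluster resource.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example
spec:
  instances: 3
  # Seconds to wait after the primary is detected as unhealthy
  # before failing over (0 = fail over immediately).
  failoverDelay: 0
  # Let the operator switch the primary on its own during rolling
  # updates; the alternative value is "supervised".
  primaryUpdateStrategy: unsupervised
  storage:
    size: 20Gi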

Implementing Disaster Recovery

Backup Strategies

The operator supports multiple backup approaches:

1. Volume Snapshots

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql-cluster
spec:
  instances: 3
  backup:
    volumeSnapshot:
      className: csi-hostpath-snapclass
      enabled: true
      snapshotOwnerReference: true

2. Barman Integration

spec:
  backup:
    barmanObjectStore:
      destinationPath: 's3://backup-bucket/postgres'
      endpointURL: 'https://s3.amazonaws.com'
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY

Disaster Recovery Procedures

  1. Point-in-Time Recovery (PITR)
    • Enables recovery to any specific point in time
    • Uses WAL (Write-Ahead Logging) archives
    • Minimizes data loss (see the configuration sketch after this list)
  2. Cross-Region Recovery
    • Maintains backup copies in different geographical regions
    • Enables recovery in case of regional failures
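As referenced above, here is a hedged sketch of a point-in-time recovery bootstrap. The target time, bucket path, and credential names are illustrative; check the CloudNativePG documentation for the exact recovery options of your version:

# Sketch: bootstrap a new cluster from an object-store backup and
# replay WAL up to a specific timestamp (PITR).
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-pitr
spec:
  instances: 2
  storage:
    size: 20Gi
  bootstrap:
    recovery:
      source: example
      recoveryTarget:
        targetTime: "2024-11-20 10:00:00+00"   # illustrative timestamp
  externalClusters:
    - name: example
      barmanObjectStore:
        destinationPath: 's3://your-bucket-name/retail-master-db'
        s3Credentials:
          accessKeyId:
            name: s3-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: s3-creds
            key: ACCESS_SECRET_KEY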

Demo

This section provides a step-by-step guide to setting up a CloudNative PostgreSQL cluster, testing failover, and performing disaster recovery.

Architecture of a PostgreSQL cluster with primary and replica

1. Installation

Method 1: Direct Installation

kubectl apply --server-side -f \
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/main/releases/cnpg-1.24.0.yaml

Method 2: Helm Installation

helm repo add cnpg https://cloudnative-pg.github.io/charts
helm upgrade --install cnpg \
  --namespace cnpg-system \
  --create-namespace \
  cnpg/cloudnative-pg

Verify the Installation

kubectl get deployment -n cnpg-system cnpg-controller-manager

Install CloudNativePG Plugin

CloudNativePG provides a plugin for kubectl to manage a cluster in Kubernetes. You can install the cnpg plugin using a variety of methods.

Via the installation script

curl -sSfL \
  https://github.com/cloudnative-pg/cloudnative-pg/raw/main/hack/install-cnpg-plugin.sh | \
  sudo sh -s -- -b /usr/local/bin

If you already have Krew installed, you can simply run:

kubectl krew install cnpg

2. Create S3 Credentials Secret

First, create an S3 bucket and an IAM user with S3 access. Then, create a Kubernetes secret with the IAM credentials:

kubectl create secret generic s3-creds \
  --from-literal=ACCESS_KEY_ID=your_access_key_id \
  --from-literal=ACCESS_SECRET_KEY=your_secret_access_key

3. Create PostgreSQL Cluster

Create a file named cluster.yaml with the following content:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example
spec:
  backup:
    barmanObjectStore:
      destinationPath: 's3://your-bucket-name/retail-master-db'
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: ACCESS_SECRET_KEY
  instances: 2
  # PostgreSQL 16 image bundled with the TimescaleDB extension
  imageName: ghcr.io/clevyr/cloudnativepg-timescale:16-ts2
  postgresql:
    shared_preload_libraries:
      - timescaledb  # load TimescaleDB at server start
  bootstrap:
    initdb:
      # enable the extension in the template database so newly
      # created databases inherit it
      postInitTemplateSQL:
        - CREATE EXTENSION IF NOT EXISTS timescaledb;
  storage:
    size: 20Gi

Apply the configuration to create the cluster:

kubectl apply -f cluster.yaml

Verify the cluster status:

kubectl cnpg status example

4. Getting Access

Deploying a cluster is one thing; actually accessing it is another. CloudNativePG creates three services for every cluster, named after the cluster name. In our case, these are:

kubectl get service

  • example-rw: Always points to the primary node
  • example-ro: Points only to replica nodes (round-robin)
  • example-r: Points to any node in the cluster (round-robin)

5. Insert Data

Create a PostgreSQL client pod:

kubectl run pgclient --image=postgres:13 --command -- sleep infinity

Connect to the database:

kubectl exec -ti example-1 -- psql app

Create a table and insert data:

CREATE TABLE stocks_real_time (
  time TIMESTAMPTZ NOT NULL,
  symbol TEXT NOT NULL,
  price DOUBLE PRECISION NULL,
  day_volume INT NULL
);

SELECT create_hypertable('stocks_real_time', by_range('time'));
CREATE INDEX ix_symbol_time ON stocks_real_time (symbol, time DESC);
GRANT ALL PRIVILEGES ON TABLE stocks_real_time TO app;

INSERT INTO stocks_real_time (time, symbol, price, day_volume)
VALUES
  (NOW(), 'AAPL', 150.50, 1000000),
  (NOW(), 'GOOGL', 2800.75, 500000),
  (NOW(), 'MSFT', 300.25, 750000);

6. Failover Test

Force a backup:

kubectl cnpg backup example

Initiate failover by deleting the primary pod:

kubectl delete pod example-1

Monitor the cluster status:

kubectl cnpg status example

Key observations during failover:

  1. Initial status: “Switchover in progress”
  2. After approximately 2 minutes 15 seconds: “Waiting for instances to become active”
  3. After approximately 3 minutes 30 seconds: Complete failover with new primary

Verify data integrity after failover through service:

Retrieve the database password:

kubectl get secret example-app -o \
  jsonpath="{.data.password}" | base64 --decode

Connect to the database using the password:

kubectl exec -it pgclient -- psql -h example-rw -U app

Execute the following SQL queries:

-- Confirm the count matches the number of rows inserted earlier. It will show 3.
SELECT COUNT(*) FROM stocks_real_time;

-- Insert new data to test write capability after failover:
INSERT INTO stocks_real_time (time, symbol, price, day_volume)
VALUES (NOW(), 'NFLX', 500.75, 300000);


SELECT * FROM stocks_real_time ORDER BY time DESC LIMIT 1;

Check read-only service:

kubectl exec -it pgclient -- psql -h example-ro -U app

Once connected, execute:

SELECT COUNT(*) FROM stocks_real_time;

Review logs of both pods:

kubectl logs example-1
kubectl logs example-2

Examine the logs for relevant failover information.

Perform a final cluster status check:

kubectl cnpg status example

Confirm both instances are running and roles are as expected.

7. Backup and Restore Test

First, check the current status of your cluster:

kubectl cnpg status example

Note the current state, number of instances, and any important details.

Promote the example-1 node to Primary:

kubectl cnpg promote example example-1

Monitor the promotion process, which typically takes about 3 minutes to complete.

Check the updated status of your cluster, then create a new backup:

kubectl cnpg backup example --backup-name=example-backup-1

Verify the backup status:

kubectl get backups
NAME               AGE   CLUSTER   METHOD              PHASE       ERROR
example-backup-1   38m   example   barmanObjectStore   completed

Delete the original cluster, then prepare for the recovery test:

kubectl delete cluster example

There are two ways to bootstrap a new cluster from an existing backup in CloudNativePG. For further details, please refer to the documentation:

  • Using a recovery object store, that is, a backup of another cluster created by Barman Cloud and defined via the barmanObjectStore option in the externalClusters section (recommended)
  • Using an existing Backup object in the same namespace (this was the only option available before version 1.8.0)

Method 1: Recovery from an Object Store

You can recover from a backup created by Barman Cloud and stored on supported object storage. Once you have defined the external cluster, including all the required configurations in the barmanObjectStore section, you must reference it in the .spec.bootstrap.recovery.source option.

Create a file named example-object-restored.yaml with the following content:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-object-restored
spec:
  instances: 2
  imageName: ghcr.io/clevyr/cloudnativepg-timescale:16-ts2
  postgresql:
    shared_preload_libraries:
      - timescaledb
  storage:
    size: 1Gi
  bootstrap:
    recovery:
      source: example
  externalClusters:
    - name: example
      barmanObjectStore:
        destinationPath: 's3://your-bucket-name/retail-master-db'
        s3Credentials:
          accessKeyId:
            name: s3-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: s3-creds
            key: ACCESS_SECRET_KEY

Apply the restored cluster configuration:

kubectl apply -f example-object-restored.yaml

Monitor the restored cluster status:

kubectl cnpg status example-object-restored

Retrieve the database password:

kubectl get secret example-object-restored-app \
  -o jsonpath="{.data.password}" | base64 --decode

Connect to the restored database:

kubectl exec -it pgclient -- psql -h example-object-restored-rw -U app

Verify the restored data by executing the following SQL queries:

-- It should show 4
SELECT COUNT(*) FROM stocks_real_time;
SELECT * FROM stocks_real_time;

The successful execution of these steps to recover from an object store confirms the effectiveness of the backup and restore process.

Delete the example-object-restored cluster, then prepare for the backup object restore test:

kubectl delete cluster example-object-restored

Method 2: Recovery from Backup Object

In case a Backup resource is already available in the namespace in which the cluster should be created, you can specify its name through .spec.bootstrap.recovery.backup.name.

Create a file named example-restored.yaml:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-restored
spec:
  instances: 2
  imageName: ghcr.io/clevyr/cloudnativepg-timescale:16-ts2
  postgresql:
    shared_preload_libraries:
      - timescaledb
  storage:
    size: 1Gi
  bootstrap:
    recovery:
      backup:
        name: example-backup-1

Apply the restored cluster configuration:

kubectl apply -f example-restored.yaml

Monitor the restored cluster status:

kubectl cnpg status example-restored

Retrieve the database password:

kubectl get secret example-restored-app \
  -o jsonpath="{.data.password}" | base64 --decode

Connect to the restored database:

kubectl exec -it pgclient -- psql -h example-restored-rw -U app

Verify the restored data by executing the following SQL queries:

SELECT COUNT(*) FROM stocks_real_time;
SELECT * FROM stocks_real_time;

The successful execution of these steps confirms the effectiveness of the backup and restore process.

Kubernetes Events and Logs

1. Failover Events

Monitor events using:

# Watch cluster events
kubectl get events --watch | grep postgresql

# Get specific cluster events
kubectl describe cluster example | grep -A 10 Events

Key events to monitor:
  • Primary selection process
  • Replica promotion events
  • Connection switching events
  • Replication status changes

2. Backup Status

Monitor backup progress:

# Check backup status
kubectl get backups

# Get detailed backup info
kubectl describe backup example-backup-1

Key metrics:
  • Backup duration
  • Backup size
  • Compression ratio
  • Success/failure status

3. Recovery Process

Monitor recovery status:

# Watch recovery progress
kubectl cnpg status example-restored

# Check recovery logs
kubectl logs example-restored-1 -c postgres

Important recovery indicators:
  • WAL replay progress
  • Timeline switches
  • Recovery target status

Conclusion

The Cloud Native PostgreSQL Operator significantly simplifies the management of PostgreSQL clusters in Kubernetes environments. By following these practices for failover and disaster recovery, organizations can maintain highly available database systems that recover quickly from failures while minimizing data loss. Remember to regularly test your failover and disaster recovery procedures to ensure they work as expected when needed. Continuous monitoring and proactive maintenance are key to maintaining a robust PostgreSQL infrastructure.

Everything fails, all the time. ~ Werner Vogels, CTO, Amazon Web Services

Editorial: If you are looking for a distributed, scalable, reliable, and durable storage solution for your PostgreSQL cluster in Kubernetes, or for any other Kubernetes storage need, simplyblock is the solution you’re looking for.

The post How to Build Scalable and Reliable PostgreSQL Systems on Kubernetes appeared first on simplyblock.

]]>
Best Open Source Tools For Kubernetes https://www.simplyblock.io/blog/best-open-source-tools-for-kubernetes/ Thu, 24 Oct 2024 22:59:15 +0000 https://www.simplyblock.io/?p=3743 What are the best open-source tools for your Kubernetes setup? Kubernetes, also known as K8s, is an open source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. Kubernetes builds upon 15 years of experience of running production workloads […]

The post Best Open Source Tools For Kubernetes appeared first on simplyblock.

]]>
What are the best open-source tools for your Kubernetes setup?

Kubernetes, also known as K8s, is an open source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. Kubernetes builds upon 15 years of experience of running production workloads at Google, combined with best-of-breed ideas and practices from the community. As Kubernetes continues to gain popularity, the demand for robust and reliable open-source tools has also increased. Developers and operators are constantly on the lookout for tools that can help them manage their Kubernetes environments more effectively. In this post, we will explore nine must-know open-source tools that can help you optimize your Kubernetes environment.

1. Helm

Helm is often referred to as the package manager for Kubernetes. It simplifies the process of deploying, managing, and versioning Kubernetes applications. With Helm, you can define, install, and upgrade even the most complex Kubernetes applications. By using Helm Charts, you can easily share your application configurations and manage dependencies between them.

2. Prometheus

Prometheus is a leading monitoring and alerting toolkit that’s widely adopted within the Kubernetes community. It collects metrics from your Kubernetes clusters, stores them, and allows you to query and visualize the data. Prometheus is essential for keeping an eye on your infrastructure’s performance and spotting issues before they become critical.
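As a quick taste of how this looks in practice, below is a minimal sketch of a Prometheus scrape configuration using Kubernetes service discovery. The annotation convention shown is a common pattern, not a requirement:

# Sketch: scrape only pods annotated with prometheus.io/scrape: "true".
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods that opt in via annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"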

3. Kubectl

Kubectl is the command-line tool that allows you to interact with your Kubernetes clusters. It is indispensable for managing cluster resources, deploying applications, and troubleshooting issues. Whether you’re scaling your applications or inspecting logs, Kubectl provides the commands you need to get the job done.

4. Kustomize

Kustomize is a configuration management tool that helps you customize Kubernetes objects through a file-based approach. It allows you to manage multiple configurations without duplicating YAML manifests. Kustomize’s native support in Kubectl makes it easy to integrate into your existing workflows.
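For a sense of the file-based approach, here is a small kustomization.yaml sketch. The base path, namespace, and patch file name are illustrative:

# Sketch: layer an environment-specific patch over a shared base
# without duplicating manifests.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                 # shared deployment, service, etc.
namespace: production
commonLabels:
  app.kubernetes.io/env: production
patches:
  - path: replica-count.yaml   # e.g., bumps replicas for production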

5. Argo CD

Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes. It enables you to manage your application deployments through Git repositories, ensuring that your applications are always in sync with your Git-based source of truth. Argo CD offers features like automated sync, rollback, and health status monitoring, making it a powerful tool for CI/CD pipelines.
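To illustrate the GitOps model, here is a sketch of an Argo CD Application manifest. The repository URL and paths are placeholders:

# Sketch: keep a cluster namespace in sync with a Git repository.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/my-app-config.git
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift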

6. Istio

Istio is an open-source service mesh that provides traffic management, security, and observability for microservices. It simplifies the complexity of managing network traffic between services in a Kubernetes cluster. Istio helps ensure that your applications are secure, reliable, and easy to monitor.
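As an example of its traffic management, the following sketch splits traffic for a canary rollout. Host and subset names are illustrative and assume a matching DestinationRule defines the subsets:

# Sketch: send 10% of traffic to a new service version.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service
  http:
    - route:
        - destination:
            host: my-service
            subset: v1
          weight: 90
        - destination:
            host: my-service
            subset: v2
          weight: 10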

7. Fluentd

Fluentd is a versatile log management tool that helps you collect, process, and analyze logs from various sources within your Kubernetes cluster. With Fluentd, you can unify your log data into a single source and easily route it to multiple destinations, making it easier to monitor and troubleshoot your applications.

8. Velero

Velero is a backup and recovery solution for Kubernetes clusters. It allows you to back up your cluster resources and persistent volumes, restore them when needed, and even migrate resources between clusters. Velero is an essential tool for disaster recovery planning in Kubernetes environments.
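As a brief sketch, a daily backup Schedule might look like the following; the namespace, cron expression, and retention value are illustrative:

# Sketch: back up one namespace daily, including persistent volumes.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-app-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"        # every day at 02:00
  template:
    includedNamespaces:
      - my-app
    snapshotVolumes: true
    ttl: 720h                  # keep backups for 30 days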

9. Kubeapps

Kubeapps is a web-based UI for deploying and managing applications on Kubernetes. It provides a simple and intuitive interface for browsing Helm charts, managing your applications, and even configuring role-based access control (RBAC). Kubeapps makes it easier for developers and operators to work with Kubernetes applications.

Key facts about the Kubernetes ecosystem and open source tools for Kubernetes

Why Choose simplyblock for Kubernetes?

While Kubernetes excels at container orchestration, managing persistent storage for stateful workloads remains challenging. This is where simplyblock’s intelligent storage optimization creates unique value:

  • Kubernetes-Native Storage Orchestration: Simplyblock seamlessly integrates with Kubernetes through its CSI driver, providing intelligent storage management for stateful workloads. Unlike traditional storage solutions, simplyblock’s thin provisioning and storage pooling enable dynamic resource allocation without over-provisioning. The platform automatically manages underlying EBS volumes, scaling storage resources based on actual usage while maintaining high performance through NVMe over TCP. This is particularly valuable for organizations running multiple databases or data-intensive applications on Kubernetes.
  • Multi-Tenant Storage Management: Simplyblock enhances Kubernetes’ multi-tenant capabilities through isolated storage pools. Each namespace or tenant can receive dedicated storage resources while sharing the underlying infrastructure efficiently. The platform’s ability to provide per-volume encryption and performance characteristics ensures secure and predictable storage performance across different workloads. This architecture is especially beneficial for organizations running Database-as-a-Service or other multi-tenant applications on Kubernetes.
  • Enterprise-Grade Data Protection: Simplyblock strengthens Kubernetes’ persistent storage reliability through sophisticated backup and disaster recovery features. The platform’s consistent snapshot capability ensures that backups maintain referential integrity across multiple persistent volumes, especially crucial for distributed databases and stateful applications. By leveraging S3 for disaster recovery with near-zero RPO through write-ahead logging, simplyblock provides robust data protection without compromising performance. This approach particularly benefits organizations requiring rapid recovery capabilities while maintaining strict data consistency across their Kubernetes clusters.

How to Optimize Kubernetes Storage with Open-source Tools

This guide explored nine essential open-source tools for Kubernetes, from Helm’s package management to Velero’s backup capabilities. While these tools excel at different aspects – Prometheus for monitoring, Istio for service mesh, and Argo CD for GitOps – proper implementation is crucial. Tools like Fluentd enable log management, while Kubeapps provides UI-based deployments. Each tool offers unique approaches to managing and optimizing Kubernetes deployments.

If you’re looking to further streamline your Kubernetes operations, simplyblock offers comprehensive solutions that integrate seamlessly with these tools, helping you get the most out of your Kubernetes environment.

Ready to take your Kubernetes management to the next level? Contact simplyblock today to learn how we can help you simplify and enhance your Kubernetes journey.

The post Best Open Source Tools For Kubernetes appeared first on simplyblock.

]]>
Best Open Source Tools for AWS Cloud https://www.simplyblock.io/blog/best-open-source-tools-for-aws-cloud/ Thu, 24 Oct 2024 21:25:22 +0000 https://www.simplyblock.io/?p=3727 What are the best open-source tools for your AWS Cloud setup? The AWS Cloud ecosystem is a dynamic and rapidly evolving environment that supports a vast array of services and applications. As organizations increasingly rely on AWS for their cloud computing needs, open-source tools have become invaluable for enhancing AWS operations. These tools provide essential […]

The post Best Open Source Tools for AWS Cloud appeared first on simplyblock.

]]>
What are the best open-source tools for your AWS Cloud setup?

The AWS Cloud ecosystem is a dynamic and rapidly evolving environment that supports a vast array of services and applications. As organizations increasingly rely on AWS for their cloud computing needs, open-source tools have become invaluable for enhancing AWS operations. These tools provide essential capabilities such as infrastructure management, cost optimization, security, and monitoring, ensuring that your AWS environment runs efficiently and securely. As AWS continues to grow in popularity, the demand for effective and reliable open-source tools has surged. Cloud architects, developers, and operations teams are always looking for tools that can help them manage their AWS environments more effectively. In this post, we will explore nine must-know open-source tools that can help you optimize your AWS Cloud experience.

1. Terraform

Terraform is a powerful infrastructure-as-code (IaC) tool that allows you to define and provision your AWS infrastructure using a simple, declarative configuration language. With Terraform, you can version control your infrastructure, automate deployments, and ensure consistency across your environments. It’s a must-have tool for managing complex AWS environments and streamlining cloud operations.

2. Ansible

Ansible is an open-source automation tool that simplifies the process of managing AWS resources. It uses a simple, human-readable language (YAML) to define tasks and configurations, making it easy to automate provisioning, configuration management, and application deployment. Ansible’s extensive AWS modules enable seamless integration with AWS services, helping you automate cloud operations with ease.
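To give a flavor, here is a minimal playbook sketch using the amazon.aws collection. The AMI ID, key pair, and subnet ID are placeholders you would replace with your own:

# Sketch: provision an EC2 instance from a playbook.
- name: Provision a web server on AWS
  hosts: localhost
  connection: local
  tasks:
    - name: Launch EC2 instance
      amazon.aws.ec2_instance:
        name: web-server-01
        instance_type: t3.micro
        image_id: ami-0123456789abcdef0      # placeholder AMI
        key_name: my-keypair                 # placeholder key pair
        vpc_subnet_id: subnet-0123456789abc  # placeholder subnet
        tags:
          Environment: staging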

3. Prometheus

Prometheus is a leading open-source monitoring and alerting toolkit widely used for tracking the performance and health of AWS infrastructure. It collects metrics from your AWS services, stores them, and allows you to visualize and query the data. Prometheus is essential for ensuring that your AWS applications and services are running smoothly and for identifying potential issues before they impact your users.

4. Kubernetes (K8s) on AWS (EKS)

Kubernetes is a powerful container orchestration platform, and when combined with Amazon Elastic Kubernetes Service (EKS), it becomes a robust solution for managing containerized applications on AWS. It automates the deployment, scaling, and operation of application containers, while EKS provides a fully managed Kubernetes control plane, simplifying cluster management. This combination is ideal for deploying, managing, and scaling containerized applications on AWS.

5. AWS CDK (Cloud Development Kit)

The AWS CDK is an open-source software development framework that enables you to define your cloud infrastructure using familiar programming languages such as Python, JavaScript, and TypeScript. CDK simplifies cloud infrastructure management by allowing developers to use code to define and provision AWS resources, resulting in more maintainable and scalable infrastructure-as-code practices.

6. Packer

Packer is an open-source tool that automates the creation of machine images for AWS, including Amazon Machine Images (AMIs). It integrates seamlessly with your existing CI/CD pipelines, enabling you to create consistent, pre-configured images that can be used across your AWS environments. Packer is crucial for ensuring that your infrastructure is consistent, secure, and easy to deploy.

7. Elasticsearch (on Amazon OpenSearch Service)

Elasticsearch is a widely used open-source search and analytics engine that, when paired with Amazon OpenSearch Service (formerly Amazon Elasticsearch Service), provides a scalable and secure way to search, analyze, and visualize data on AWS. It is particularly useful for log and event data analysis, making it easier to monitor and troubleshoot applications running in the cloud.

8. Cloud Custodian

Cloud Custodian is an open-source governance-as-code tool that allows you to manage and automate AWS resource policies. It enables you to define rules for resource provisioning, security, and compliance using simple YAML configurations. Cloud Custodian is invaluable for ensuring that your AWS environments adhere to best practices and regulatory requirements.
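For illustration, a minimal policy sketch follows; the policy name and tag key are examples, not prescriptions:

# Sketch: stop EC2 instances that are missing a required "owner" tag.
policies:
  - name: ec2-missing-owner-tag
    resource: aws.ec2
    filters:
      - "tag:owner": absent
    actions:
      - stop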

9. Grafana

Grafana is an open-source data visualization and monitoring tool that integrates with Prometheus and other data sources to provide comprehensive dashboards for monitoring AWS resources. It also offers powerful visualizations, alerting capabilities, and flexible query options.

Key facts about the AWS Cloud ecosystem and the best open source tools for AWS Cloud

How to Optimize AWS Cloud with Open-source Tools

This guide explored nine essential open-source tools for AWS Cloud, from Terraform’s infrastructure as code to Grafana’s visualization capabilities. While these tools excel at different aspects – Ansible for automation, Prometheus for monitoring, and Kubernetes for container orchestration – proper implementation is crucial. Tools like AWS CDK enable programmatic infrastructure definition, while Cloud Custodian and Packer provide governance and image management capabilities. Each tool offers unique approaches to managing and optimizing AWS resources.

Why Choose simplyblock for AWS Cloud?

While AWS provides robust cloud services, protecting cloud workloads against ransomware and ensuring business continuity across regions is crucial. This is where simplyblock’s specialized protection approach creates unique value:

Cloud Infrastructure Protection

Simplyblock ensures the integrity of your AWS environment by providing immutable backups of critical cloud resources, including EC2 instances, EBS volumes, and RDS databases. Unlike traditional backup solutions, simplyblock’s immutable storage architecture protects your AWS workloads against ransomware attacks while maintaining cross-region availability. The platform integrates seamlessly with AWS’s native services while adding an extra layer of ransomware-proof protection for your critical data.

Zero-Downtime Cloud Recovery

Simplyblock enables rapid recovery of AWS environments by preserving complete infrastructure states, maintaining data consistency across availability zones, and ensuring immediate access to clean backup copies. In the event of a ransomware attack or disaster, organizations can quickly restore their AWS workloads without paying ransoms or experiencing extended downtime. This approach ensures business continuity across your entire AWS infrastructure, from compute resources to storage volumes.

Enterprise-Grade AWS Protection

Simplyblock optimizes AWS protection through efficient management of backup storage, intelligent handling of cross-region replication, and preservation of infrastructure configurations. By leveraging AWS’s global infrastructure while adding immutable protection, simplyblock ensures both data integrity and cost efficiency for your cloud workloads.

If you’re looking to further streamline your AWS operations, simplyblock offers comprehensive solutions that integrate seamlessly with these tools, helping you get the most out of your AWS environment.

Ready to take your AWS management to the next level? Contact simplyblock today to learn how we can help you simplify and enhance your AWS journey.

The post Best Open Source Tools for AWS Cloud appeared first on simplyblock.

]]>
Serverless Compute Need Serverless Storage https://www.simplyblock.io/blog/serverless-compute-need-serverless-storage/ Wed, 23 Oct 2024 11:37:27 +0000 https://www.simplyblock.io/?p=3391 The use of serverless infrastructures is steeply increasing. As the Datadog “State of Serverless 2023” survey shows, more than half of all cloud customers have already adopted a serverless environment on the three big hyperscalers—at least to some extent. The premise of saving cost while automatically and indefinitely scaling (up and down) increases the user […]

The post Serverless Compute Need Serverless Storage appeared first on simplyblock.

]]>
The use of serverless infrastructures is steeply increasing. As the Datadog “State of Serverless 2023” survey shows, more than half of all cloud customers have already adopted a serverless environment on the three big hyperscalers—at least to some extent. The premise of saving cost while automatically and indefinitely scaling (up and down) increases the user base.

Due to this movement, other cloud operators, many database companies (such as Neon and Nile), and infrastructure teams at large enterprises are building serverless environments, either on their premises or in their private cloud platforms.

While there are great options for serverless compute, providing serverless storage to your serverless platform tends to be more challenging. This is often fueled by a lack of understanding of what serverless storage has to provide and its requirements.

What is a Serverless Architecture?

Serverless architecture is a software design pattern that leverages serverless computing resources to build and run applications without managing the underlying architecture. These serverless compute resources are commonly provided by cloud providers such as AWS Lambda, Google Cloud Functions, or Azure Functions and can be dynamically scaled up and down.

Simplified serverless architecture diagram with different clients connecting through the API gateway, a set of serverless functions to execute the business logic and a database as an example for serverless storage.
Simplified serverless architecture with clients and multiple functions

When designing a serverless architecture, you’ll encounter the so-called Function-as-a-Service (FaaS), meaning that the application’s core logic will be implemented in small, stateless functions that respond to events.

That said, typically, several FaaS functions make up the actual application, sending events between them. Since the underlying infrastructure is abstracted away, the functions don’t know how requests or responses are handled, and their implementations are usually built against a cloud-provider-specific API, which effectively means vendor lock-in.

Cloud-vendor-agnostic solutions exist, such as knative, but require at least parts of the team to manage the Kubernetes infrastructure. They can, however, take the burden away from other internal and external development teams.

What is Serverless Compute?

While a serverless architecture describes the application design that runs on top of a serverless compute infrastructure, serverless compute itself describes the cloud computing model in which the cloud provider dynamically manages the allocation and provisioning of server resources.

Simplified serverless platform architecture with edge services (ui, api gateway, event sources), the platform services (event queue, dispatcher) and the workers which run the actual serverless functions.
Simplified serverless platform architecture

It is essential to understand that serverless doesn’t mean “without servers” but “as a user, I don’t have to plan, provision, or manage the infrastructure.”

In essence, the cloud provider (or whoever manages the serverless infrastructure) takes the burden from the developer. Serverless compute environments fully auto-scale, starting or stopping instances of the functions according to the needed capacity. Due to their stateless nature, it’s easy to stop and restart them at any point in time. That means that function instances are often very short-lived.

Popular serverless compute platforms include AWS Lambda, Google Cloud Functions, and Azure Functions. For self-managed operations, there is knative (mentioned before), as well as OpenFaaS and OpenFunction (which seems to have seen less activity recently).

They all enable developers to focus on writing code without managing the underlying infrastructure.

What is a Serverless Storage System?

Serverless storage refers to a cloud storage model where the underlying infrastructure, capacity planning, and scaling are abstracted away from the user. With serverless storage, customers don’t have to worry about provisioning or managing storage servers or volumes. Instead, they can store and retrieve data while the serverless storage handles all the backend infrastructure.

Serverless storage solutions come in different forms and shapes, beginning with an object storage interface, such as Amazon S3 or Google Cloud Storage. Object storage is excellent when storing unstructured data, such as documents or media.

Market diagram with the different options of serverless storage available in the three major hyperscalers (GCP, AWS, Azure).
Serverless storage options are available in GCP, AWS, and Azure

Another option that people love to use for serverless storage is serverless databases. Various options are available, depending on your needs: relational, NoSQL, time-series, and graph databases. This might be the easiest way to go, depending on how you need to access data. Examples of such serverless databases include Amazon Aurora Serverless, Google’s Cloud Datastore, and external companies such as Neon or Nile.

When self-managing your serverless infrastructure with knative or one of the alternative options, you can use Kubernetes CSI storage providers to provide storage into your functions. However, you may add considerable startup time if you choose the wrong CSI driver. I might be biased, but simplyblock is an excellent option with its negligible provisioning and attachment times, as well as features such as multi-attach, where a volume can be attached to multiple functions (for example, to provide a shared set of data).

Why Serverless Architectures?

Most people think of cost-efficiency when it comes to serverless architectures. However, this is only one side of the coin: if your use cases aren’t a good fit for a serverless environment, the expected savings won’t materialize. More on when serverless makes sense later.

In serverless architectures, functions are triggered through an event, either from the outside world (like an HTTP request) or an event initiated by another function. If no function instance is up and running, a new instance will be started. The same goes for situations where all function instances are busy. If function instances idle, they’ll be shut down.

Serverless functions usually use a pay-per-use model. A function’s extremely short lifespan can lead to cost reductions over deployment models like containers and virtual machines, which tend to run longer.

Apart from that, serverless architectures have more benefits. Many point in the same direction as those of microservices architectures, but with the premise that serverless is easier to implement and maintain.

First and foremost, serverless solutions are designed for scalability and elasticity. They quickly and automatically scale up and down depending on the incoming workload. It’s all hands-free.

Another benefit is that development cycles are often shortened. Due to the limited size and functionality of a FaaS, changes are fast to implement and easy to test. Additionally, updating the function is as simple as deploying the new version. All existing function instances finish their current work and shut down. In the meantime, the latest version will be started up. Due to its stateless nature, this is easy to achieve.

What are the Complexities of Serverless Architecture?

Writing serverless solutions has the benefits of fast iteration, simplified deployments, and potential cost savings. However, they also come with their own set of complexities.

Designing real stateless code isn’t easy, at least when we’re not just talking about simple transformation functionality. That’s why a FaaS receives and passes context information along during its events.


What works great for small bits of context is challenging for larger pieces. In this situation, a larger context, or state, can mean many things: simple cross-request information that should be available without transferring it with every request, more involved data such as lookup information to enrich and cross-check, or actual complex data, like when you want to implement a serverless database. And yes, a serverless database needs to store its data somewhere.

That’s where serverless storage comes in, and simply put, this is why all serverless solutions have state storage alternatives.

What is Serverless Storage?

Serverless storage refers to storage solutions that are fully integrated into serverless compute environments without manual intervention. These solutions scale and grow according to user demand and complement the pay-by-use payment model of serverless platforms.

Serverless storage lets you store information across multiple requests or functions. 

As mentioned above, cloud environments offer a wide selection of serverless storage options. However, all of them are vendor-bound and lock you into their services. 

However, when you design your own serverless infrastructure or service, these services don’t help you. It’s up to you to provide the serverless storage. In this case, a cloud-native, serverless-ready storage engine can simplify this task immensely. Whether you want to provide object storage, a serverless database, or file-based storage, an underlying cloud-native block storage solution is the perfect building block underneath. However, this block storage solution needs to be able to scale and grow with your needs, be quick and easy to provision, and support snapshotting, cloning, and attaching to multiple function instances.

Why do Serverless Architectures Require Serverless Storage?

Serverless storage has particular properties designed for serverless environments. It needs to keep up with the specific requirements of serverless architectures, most specifically short lifetimes, extremely fast up and down scaling or restarts, easy use across multiple versions during updates, and easy integration through APIs utilized by the FaaS.

The most significant issues are that it must be used by multiple function instances simultaneously and is quickly available to new instances on other nodes, regardless of whether those are migrated over or used for scaling out. That means that the underlying storage technology must be prepared to handle these tasks easily.

These are just the most significant requirements, but there are more:

  1. Stateless nature: Serverless functions spin up, execute, and terminate due to their stateless nature. Without fast, persistent storage that can be attached or accessed without any additional delay, this fundamental property of serverless functions would become a struggle.
  2. Scalability needs: Serverless compute is built to scale automatically based on user demand. A storage layer needs to seamlessly support the growth and shrinking of serverless infrastructures and handle variations in I/O patterns, meaning that traditional storage systems with fixed capacity limits don’t align well with the requirements of serverless workloads.
  3. Cost efficiency: One reason people engage with serverless compute solutions is cost efficiency. Serverless compute users pay by actual execution time. That means that serverless storage must support similar payment structures and help serverless infrastructure operators efficiently manage and scale their storage capacities and performance characteristics.
  4. Management overhead: Serverless compute environments are designed to eliminate manual server management. Therefore, the storage solution needs to minimize its manual administrative tasks. Allocating and scaling storage requirements must be fully integratable and automated via API calls or fully automatic. Also, the integration must be seamless if multiple storage tiers are available for additional cost savings.
  5. Performance requirements: Serverless functions require fast, if not immediate, access to data when they spin up. Traditional storage solutions introduce delays due to allocation and additional latency, negatively impacting serverless functions’ performance. As functions are paid by runtime, their operational cost increases.
  6. Integration needs: Serverless architectures typically combine many services, as individual functions use different services. That said, the underlying storage solution of a serverless environment needs to support all kinds of services provided to users. Additionally, seamless integration with the management services of the serverless platform is required.

There are quite some requirements. For the alignment of serverless compute and serverless storage, storage solutions need to provide an efficient and manageable layer that seamlessly integrates with the overall management layer of the serverless platform.

Simplyblock for Serverless Storage

When designing a serverless environment, the storage layer must be designed to keep up with the pace. Simplyblock enables serverless infrastructures to provide dynamic and scalable storage.

To achieve this, simplyblock provides several characteristics that perfectly align with serverless principles:

  1. Dynamic resource allocation: Simplyblock’s thin provisioning makes capacity planning irrelevant. Storage is allocated on-demand as data is written, similar to how serverless platforms allocate resources. That means every volume can be arbitrarily large to accommodate unpredictable future growth. Additionally, simplyblock’s logical volumes are resizable, meaning that the volume can be enlarged at any point in the future.
  2. Automatic scaling: Simplyblock’s storage engine can indefinitely grow. To acquire additional backend storage, simplyblock can automatically acquire additional persistent disks (like Amazon EBS volumes) from cloud providers or attach additional storage nodes to its cluster when capacity is about to exceed, handling scaling without user intervention.
  3. Abstraction of infrastructure: Users interact with simplyblock’s virtual drives as if they were ordinary hard disks. This abstracts away the complexity of the underlying storage pooling and backend storage technologies.
  4. Unified interface: Simplyblock exposes a unified storage interface: an NVMe logical device that hides diverging underlying storage technologies behind a familiar disk design. Services that were never designed to talk to object storage or similar technologies, such as PostgreSQL or MySQL, can therefore benefit from them immediately.
  5. Extensibility: Due to its disk-like storage interface, simplyblock is highly extensible in terms of the solutions that can run on top of it. Databases, object storage, file storage, and specialized storage APIs all receive scalable block storage from simplyblock, making it a fitting backend for serverless environments.
  6. Crash-consistent and recoverable: Serverless storage must always be up and running. Simplyblock’s distributed erasure coding (parity information, similar to RAID-5 or RAID-6) provides high availability and fault tolerance at the storage level with a storage overhead far below that of simple replication. Additionally, simplyblock offers storage cluster replication (sync or async), consistent snapshots across multiple logical volumes, and disaster recovery options (a toy parity example follows further below).
  7. Automated management: With features such as automatic storage tiering to cheaper object storage (e.g., Amazon S3), automatic scaling, and erasure coding and backups for data protection, simplyblock eliminates manual management overhead and hands-on tasks. Simplyblock clusters are fully autonomous and manage the underlying storage backend automatically.
  8. Flexible integration: Serverless platforms require storage to be allocated and provisioned seamlessly. Simplyblock achieves this through its API, which can be integrated into the standard provisioning flow of new customer sign-ups. If the infrastructure runs on Kubernetes, integration is even easier via the Kubernetes CSI driver, allowing seamless integration with container-based serverless platforms such as Knative.
  9. Pay-per-use potential: Thanks to automatic scaling, thin provisioning, and seamless resizing and integration, simplyblock enables managed service providers to offer their customers the industry-loved pay-per-use model, perfectly aligning with the most common serverless pricing schemes.
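As referenced in points 1 and 8, provisioning and later resizing a volume from Kubernetes reduces to ordinary PersistentVolumeClaim operations against the CSI driver. The following is a minimal sketch using the official Kubernetes Python client; the StorageClass name simplyblock-csi, the claim name, namespace, and sizes are illustrative assumptions, and the actual class parameters come from your simplyblock CSI driver installation.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a pod
core = client.CoreV1Api()

# Create a thin-provisioned, multi-attach volume; only written data consumes backend capacity.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="tenant-a-data"),          # hypothetical claim name
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],                          # read-write multi-attach
        storage_class_name="simplyblock-csi",                    # hypothetical class name
        resources=client.V1ResourceRequirements(
            requests={"storage": "100Gi"}                        # logical size, thin-provisioned
        ),
    ),
)
core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)

# Later, grow the volume in place (requires allowVolumeExpansion on the StorageClass):
core.patch_namespaced_persistent_volume_claim(
    name="tenant-a-data",
    namespace="default",
    body={"spec": {"resources": {"requests": {"storage": "250Gi"}}}},
)
```

In a serverless platform, the same two calls would typically be issued by the provisioning pipeline on customer sign-up and by an automation loop watching tenant usage, rather than by a human operator.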

Simplyblock is the perfect backend storage for all your serverless storage needs while future-proofing your infrastructure. As data grows and evolves, simplyblock’s flexibility and scalability ensure you can adapt without massive overhauls or migrations.
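To build intuition for the erasure coding mentioned in point 6, here is a deliberately simplified, RAID-5-style XOR parity example in Python. It is a toy illustration of the underlying idea, not simplyblock’s actual distributed implementation: one parity chunk protects several data chunks, so any single missing chunk can be rebuilt while storing far less than a full replica.

```python
def xor_parity(chunks: list[bytes]) -> bytes:
    """Compute a RAID-5-style parity block over equally sized data chunks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def reconstruct(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing chunk by XOR-ing the survivors with the parity."""
    return xor_parity(surviving + [parity])

data = [b"serverless", b"storage!!!", b"simplyblk."]  # three 10-byte chunks
p = xor_parity(data)
# Lose chunk 1 (e.g., a failed node) and rebuild it from the rest:
assert reconstruct([data[0], data[2]], p) == data[1]
```

Here three data chunks are protected by one parity chunk, roughly 33% overhead, whereas keeping a full second copy of every chunk would cost 100%; that gap is what the article means by storage efficiency far below simple replication.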

Remember, simplyblock offers powerful features like thin provisioning, storage pooling, and tiering, helping you to provide a cost-efficient, pay-by-use enabled storage solution. Get started now and find out how easy it is to operate services on top of simplyblock.

The post Serverless Compute Needs Serverless Storage appeared first on simplyblock.

]]>
[Figures: simplified serverless architecture diagram, simplified serverless platform diagram, serverless storage options per hyperscaler]
Developer Platforms at Scale | Elias Schneider https://www.simplyblock.io/blog/developer-platforms-at-scale-elias-schneider/ Tue, 22 Oct 2024 23:13:53 +0000 https://www.simplyblock.io/?p=3383 Introduction:​​ In this episode of Cloud Frontier, Rob Pankow interviews Elias Schneider, founder of Codesphere, about his journey and the evolution of developer platforms at scale. With a background at Google, Elias brings deep expertise in cloud-native development processes. They discuss the challenges of building large-scale developer platforms and why enterprise customers are crucial for […]

The post Developer Platforms at Scale | Elias Schneider appeared first on simplyblock.

]]>
Introduction:

In this episode of Cloud Frontier, Rob Pankow interviews Elias Schneider, founder of Codesphere, about his journey and the evolution of developer platforms at scale. With a background at Google, Elias brings deep expertise in cloud-native development processes. They discuss the challenges of building large-scale developer platforms and why enterprise customers are crucial for scaling such solutions.

This interview is part of the simplyblock Cloud Frontier Podcast, available on YouTube, Spotify, iTunes/Apple Podcasts, and our show site.

Key Takeaways

One major trend is the shift back to on-premise infrastructure from the cloud, driven by rising cloud costs and increased control requirements. Many enterprises are adopting a hybrid approach, keeping some workloads on-prem while utilizing cloud services for scaling and fluctuating demands. This allows businesses to balance cost and performance while managing regulatory concerns.

Q: Why is it important to use managed services in cloud environments?

Managed services in cloud environments allow companies to offload the complexity of infrastructure management. This includes automatic updates, monitoring, and scaling, which reduces the need for dedicated personnel and ensures the infrastructure runs efficiently. Without managed services, companies face increased operational overhead and risk of downtime.

Beyond the key takeaways, we provide context that enriches your understanding of the episode. This added layer of information clarifies the nuances behind the discussion and the reasoning behind the questions posed by our host, Rob Pankow, allowing for a more immersive and insightful listening experience.

Key Learnings

Allowing developers to manage their own cloud environments enables faster iterations and more autonomy. It eliminates the need for constant back-and-forth with DevOps teams, which can slow down development. Developers can directly deploy, test, and scale applications, which leads to more agile development cycles.

Simplyblock Insight: When developers have control over their own environments, the development cycle speeds up significantly. Simplyblock’s orchestration tools simplify the deployment and management process, enabling developers to maintain performance and scalability while reducing the overhead typically associated with infrastructure management.

Q: What are the main challenges companies face with cloud scalability?

One major challenge with cloud scalability is managing the complexity of infrastructure as the number of services and applications grows. Many companies struggle with orchestrating resources efficiently, leading to cost overruns and increased downtime. Additionally, scaling globally while maintaining performance and compliance can be difficult without the right tools.

Simplyblock Insight: Ensuring optimal performance while scaling requires intelligent automation and resource orchestration. Simplyblock helps companies optimize storage and performance across distributed environments, automating resource allocation to reduce costs and prevent performance bottlenecks as businesses scale.

Q: What role does infrastructure sovereignty play in cloud adoption?

Infrastructure sovereignty refers to the ability of a company to maintain control over its infrastructure, especially when operating across public and private clouds. This is particularly important for enterprises facing regulatory constraints or data sovereignty laws that require specific handling of sensitive information.

Simplyblock Insight: With hybrid cloud setups becoming more common, maintaining control over where and how data is stored is crucial. Simplyblock offers solutions that allow businesses to manage data across multiple infrastructures, ensuring compliance with data regulations while optimizing performance and cost-efficiency.

Additional Nugget of Information

As companies scale their cloud operations, hybrid cloud solutions are becoming increasingly popular. A hybrid approach allows businesses to combine the benefits of on-premise infrastructure with cloud services, offering more flexibility, better cost management, and the ability to meet regulatory requirements. This approach enables companies to maintain control over critical workloads while benefiting from the scalability of the cloud.

Conclusion

In this episode, Elias Schneider shares his journey from Google to founding Codesphere, emphasizing the importance of addressing the needs of large enterprises. Codesphere helps companies standardize their development processes, enabling faster deployments and reducing costs. As you think about your company’s cloud strategy, consider how platforms like Codesphere can offer scalability, sovereignty, and streamlined processes.

If you’re in the process of scaling your development or infrastructure, now is the time to explore solutions that empower your developers and improve operational efficiency. Whether you are considering hybrid cloud solutions or simply aiming to enhance your development workflows, the insights from this episode provide valuable guidance.

If you’re eager to learn more about founding early-stage cloud infrastructure startups, entrepreneurship, or taking visionary ideas to market, be sure to tune in to future episodes of the Cloud Frontier Podcast.

Stay updated with expert insights that can help shape the next generation of cloud infrastructure innovations!

The post Developer Platforms at Scale | Elias Schneider appeared first on simplyblock.

]]>