5 Storage Solutions for Kubernetes in 2025

Selecting your Kubernetes persistent storage may be tough due to the many available options. Not all are geared towards enterprise setups, though. Hence, we would like to briefly introduce 5 storage solutions for Kubernetes that may meet your enterprise’s diverse storage needs.

That said, as Kubernetes adoption keeps growing in 2025, selecting the right storage solution is more important than ever. Enterprise-level features, data encryption, and high availability are at the forefront of the requirements we want to look into. The same is true for the ability to attach to multiple clients simultaneously and the built-in data loss protection.

Simplyblock: Enterprise Storage Platform for Kubernetes

Simplyblock™ is an enterprise-grade storage platform designed to cater to high-performance and scalability needs for storage in Kubernetes environments. Simplyblock is fully optimized to take advantage of modern NVMe devices. It utilizes the NVMe/TCP protocol to share its shared volumes between the storage cluster and clients, providing superior throughput and lower access latency than the alternatives.

Simplyblock is designed as a cloud-native solution and integrates tightly with Kubernetes through the simplyblock CSI driver. It supports dynamic provisioning, snapshots, clones, volume resizing, fully integrated encryption at rest, and more. One benefit of simplyblock is its use of NVMe over TCP, which is integrated into the Linux and Windows (Server 2025 or later) kernels, meaning no additional drivers are required. This also makes it easy to use simplyblock volumes outside of Kubernetes if you also operate virtual machines and would like to unify your storage. Furthermore, simplyblock volumes support read-write multi-attach, meaning they can be attached to multiple pods, VMs, or both at the same time, making it easy to share data.
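
To illustrate, here is a minimal sketch of requesting a multi-attach volume through the simplyblock CSI driver. The storage class name is an assumption for illustration; raw block access (volumeMode: Block) is shown because shared read-write access requires a cluster-aware consumer such as a clustered filesystem or a VM platform:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany                       # multi-attach: several pods or VMs at once
  volumeMode: Block                       # raw block device; the consumer must handle shared access
  storageClassName: simplyblock-storage   # assumed storage class name
  resources:
    requests:
      storage: 100Gi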

Its scale-out architecture provides full multi-tenant isolation, meaning that many customers can share the same storage backend. Logical volumes can be encrypted at rest either by an encryption key per tenant or even per logical volume, providing the strongest isolation option.

Deployment-wise, simplyblock offers the best of both worlds: disaggregated and hyperconverged setups. Simplyblock’s storage engine can be deployed on either a set of separate storage nodes, building a disaggregated storage cluster, or on Kubernetes worker nodes to utilize the worker node-local storage. Simplyblock also supports a mixed setup, where workloads can benefit from ultra-low latency with worker node-local storage (node-affinity) and the security and “spill-over” of the infinite storage from the disaggregated cluster.

As the only solution presented here, simplyblock favors erasure coding over replication for high availability and fault tolerance. Erasure coding is quite similar to RAID: it uses parity information to protect against data loss, and simplyblock distributes this parity information across cluster nodes for higher fault tolerance. Similar to a replication factor, the erasure coding scheme is configurable, defining how many data chunks and parity chunks are used per calculation. For example, a 4+2 scheme (four data chunks, two parity chunks) survives the loss of any two chunks at only 50% storage overhead, whereas three-way replication requires 200%. This enables the best trade-off between data protection and storage overhead, enabling secure setups with as little as 50% additional storage requirements.

Furthermore, simplyblock provides a full multi-tier solution that caters to diverse storage needs in a single system. It enables you to utilize ultra-fast flash storage devices such as NVMe alongside slower SATA/SAS SSDs. At the same time, you can manage your long-term (cold) storage with traditional HDDs or QLC-based flash storage (slower but very high capacity).

Simplyblock is a robust choice if you need scalable, high-performance block storage for use cases such as databases, CDNs (Content Delivery Network), analytics solutions, and similar. Furthermore, simplyblock offers high throughput and low access latency. With its use of erasure coding, simplyblock is a great solution for companies seeking cost-effective storage while its ease of use allows organizations to adapt quickly to changing storage demands. For businesses seeking a modern, efficient, and Kubernetes-optimized block storage solution, simplyblock offers a compelling combination of features and performance.

Portworx: Kubernetes Storage and Data Management

Portworx is a cloud-native, software-defined storage platform that is highly integrated with Kubernetes. It is an enterprise-grade, closed-source solution that was acquired and is in active development by Pure Storage. Hence, its integration with the Pure Storage hardware appliances enables a performant, scalable storage option with integrated tiering capabilities.

Portworx integrates with Kubernetes through its native CSI driver and provides important CSI features such as dynamic provisioning, snapshots, clones, resizing, and persistent or ephemeral volumes. Furthermore, Portworx supports data-at-rest encryption and disaster recovery using synchronous and asynchronous cluster replication.

To enable fault tolerance and high availability, Portworx utilizes replicas, storing copies of data on different cluster nodes. This multiplies the required disk space by the replication factor. For the connection between the storage cluster and clients, Portworx provides access via iSCSI, a fairly old protocol that isn’t necessarily optimized for fast flash storage.

For connections between Pure’s FlashArray and Portworx, you can use NVMe/TCP or NVMe-RoCE (NVMe with RDMA over Converged Ethernet) — a mouthful, I know.

Encryption at rest is supported with either a unique key per volume or a cluster-wide encryption key. For storage-client communication, iSCSI should be separated into its own VLAN. Remember, though, that iSCSI itself isn't encrypted, so encryption in transit isn't guaranteed unless the traffic is pushed through a secured channel.

As mentioned, Portworx distinguishes itself by integrating with Pure Storage appliances, enabling organizations to leverage the performance and reliability of Pure's flash storage systems. While Portworx is available as a pure software-defined storage solution, it excels in combination with Pure's hardware. That makes it a compelling choice for running critical stateful applications such as databases, high-throughput message queues, and analytics platforms in Kubernetes, especially if you don't fear operating hardware appliances.

Ceph: Open-source, Distributed Storage System

Ceph is a highly scalable and distributed storage solution. Run as a company-backed open-source project, Ceph presents a unified storage platform with support for block, object, and file storage. That makes it a versatile choice for a wide range of Kubernetes applications.

Ceph’s Kubernetes integration is provided through the ceph-csi driver, which brings dynamically provisioned persistent volumes and automatic lifecycle management. CSI features supported by Ceph include snapshotting, cloning, resizing, and encryption.

The architecture of Ceph is built to be self-healing and self-managing, designed mainly for near-infinite disk space scalability. The provided access latency, while not top of the class, is good enough for many use cases. Workloads like databases, which crave high IOPS and low latency, can feel a bit laggy, though. Finally, high availability and fault tolerance are implemented through replication between Ceph cluster nodes.

On the security side, Ceph supports encryption at rest via a few different options. I'd recommend the LUKS-based (Linux Unified Key Setup) setup, as it supports all of the different Ceph storage options. The communication between cluster nodes, as well as between storage and clients, is not encrypted by default. If you require encryption in transit (and you should), utilize SSH and SSL termination via HAProxy or similar solutions. It's unfortunate that a storage solution as big as Ceph has no such built-in support. The same goes for multi-tenancy, which can be achieved using RADOS namespaces but isn't an out-of-the-box solution.

Ceph is an excellent choice as your Kubernetes storage when you are looking for an open-source solution with a proven track record of enterprise deployments, infinite storage scalability, and versatile storage types. It is not a good choice if you are looking for high-performance storage with low latency and millions of IOPS.

Moreover, due to its community-driven development and support, Ceph can be operated as a cost-effective and open-source alternative to proprietary storage solutions. Whether you’re deploying in a private data center, a public cloud, or a hybrid environment, Ceph’s adaptability is a great help for managing storage in containerized ecosystems.

Commercial support for Ceph is available from Red Hat.

Longhorn: Cloud-native Block Storage for Kubernetes

Longhorn is an open-source, cloud-native storage solution specifically designed for Kubernetes. It provides block storage that focuses on flexibility and ease of use. Therefore, Longhorn deploys straight into your Kubernetes cluster, providing worker node-local storage as persistent volumes.

As a cloud-native storage solution, Longhorn provides its own CSI driver, highly integrated with Longhorn and Kubernetes. It enables dynamic provisioning and management of persistent volumes, snapshots, clones, and backups. For the latter, users have reported complications with restores, so make sure to test your recovery processes.

For communication between storage and clients, Longhorn uses the iSCSI protocol. A newer version of the storage engine, which enables NVMe over TCP, is in the works; at the time of writing, however, it isn't considered production-ready.

Anyhow, Longhorn provides good access latency and throughput, making it a great solution for mid-size databases and similar workloads. Encryption at rest can be set up but isn't as simple as with some alternatives. High availability and fault tolerance are achieved by replicating data between cluster nodes, which means, as with many other solutions, that the required raw storage is multiplied by the replication factor. However, Longhorn supports incremental backups to external storage systems like S3 for easy data protection and fast recoverability in disaster scenarios.

Longhorn is a fully open-source project under the Cloud Native Computing Foundation (CNCF). It was originally developed by Rancher and is backed by SUSE. Hence, it's commercially available with enterprise support as SUSE Storage.

Longhorn is a good choice if you want a lightweight, cloud-native, open-source solution that runs hyper-converged with your cluster workloads. It is typically used for smaller deployments and home labs, a use case widely discussed on Reddit. Generally, it is not considered as robust as Ceph and is therefore not recommended for mission-critical enterprise production workloads.

NFS: File Sharing Solution for Enterprises with Heterogeneous Environments

NFS (Network File System) is a well-known and widely adopted file-sharing protocol, inside and outside Kubernetes. That said, NFS has a proven track record showing its simplicity, reliability, and ability to provide shared access to persistent storage.

One of the main features of NFS is its ability to simultaneously attach volumes to many containers (and pods) with read-write access. That enables easy sharing of configuration, training data, or similar shared data sets between many instances or applications.

There are quite a few options for integrating NFS with Kubernetes. The two main ones are the Kubernetes NFS Subdir External Provisioner, which automatically creates NFS subdirectories when new persistent volumes are requested, and the csi-driver-nfs. In addition, many storage solutions provide optimized NFS CSI drivers designed to automatically provision shares on their respective systems. Such storage options include TrueNAS, OpenEBS, Dell EMC, and others.
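
As a sketch of how little configuration the csi-driver-nfs needs, a storage class pointing at an existing NFS export might look like this (the server and share path are hypothetical placeholders):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-shared
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs.example.internal   # hypothetical NFS server
  share: /exports/k8s            # hypothetical export path
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1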

High availability is one element of NFS that isn't simple, though. To make automatic failover work, additional tools like Corosync or Pacemaker need to be configured. On the client side, automount should be set up to handle automatic failover and reconnection. NFS is an old protocol from a time when those additional steps were commonplace. Today, they feel clunky and out of place, especially compared to the available alternatives.

While multi-tenancy isn’t strictly supported by NFS, using individual shares could be seen as a viable solution. However, remember that shares aren’t secured in any way. Authentication requires additional setups such as Kerberos. File access permissions shouldn’t be used as a sufficient setup for tenant isolation.

Encryption at rest with NFS comes down to the backing storage solution. NFS, as a sharing protocol, doesn’t offer anything by itself. Encryption in transit is supported, either via Kerberos or other means like TLS via stunnel. The implementation details differ per NFS provider, though. You should consult your provider’s manual.

NFS is your Kubernetes storage of choice if you need a simple, scalable, and shared file storage system that integrates seamlessly into your existing infrastructure. In the best case, you already have an NFS server set up and ready to go, and installing the CSI driver and configuring the storage class is all you need. While NFS might be a bottleneck for high-performance systems such as databases, many applications work perfectly fine with it. Imagine you need to scale out a WordPress-based website: there isn't an easier way to share the same writable storage with many WP instances, as the sketch below shows. That said, for organizations looking for a mature, battle-tested option to deliver shared storage with minimal complexity, NFS is the choice.
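
For the WordPress scenario, the shared content volume is just a ReadWriteMany claim against an NFS-backed storage class (a sketch, reusing the hypothetical nfs-shared class from earlier); every WP pod then mounts the same claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wordpress-content
spec:
  accessModes:
    - ReadWriteMany          # many pods mount the same share read-write
  storageClassName: nfs-shared
  resources:
    requests:
      storage: 50Gi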

Make Your Kubernetes Storage Choice in 2025

Simplyblock storage solution for Kubernetes: cloud-native design, optimized for NVMe/TCP, multi-tier architecture

Selecting the right storage solution for your Kubernetes persistent volume isn’t easy. It is an important choice to ensure performance, scalability, and reliability for your containerized workloads. Solutions like simplyblock™, Portworx, Ceph, Longhorn, and NFS offer a great set of features and are optimized for different use cases.

NFS is loved for its simplicity and easy multi-attach functionality. It is a great choice for all use cases needing shared write access. It’s not a great fit for high throughput and super-low access latency, though.

Ceph, on the other hand, is great if you need near-infinite scalability and don't shy away from a slightly more complicated setup and operation. Ceph is a robust choice for most use cases, although not for high-performance databases and similarly IO-intensive applications.

Longhorn and Portworx are generally good choices for almost all types of applications. Both solutions provide good access latency and throughput. If you tend to buy hardware appliances, Portworx, in combination with Pure Storage, is the way to go. If you prefer pure software-defined storage and want to utilize storage available in your worker nodes, take a look at Longhorn.

Last but not least, simplyblock is your choice when running IO-hungry databases in or outside of Kubernetes. Its use of the NVMe/TCP protocol makes it a perfect fit for pure container storage as well as mixed environments with containers and virtual machines. Due to its low storage overhead for data protection, simplyblock is a cost-effective and fast storage solution, with capacity to spare for all other storage needs, meaning a single solution can cover them all.

As Kubernetes evolves, leveraging the proper storage solution will significantly improve your application performance and resiliency. To ensure you make an informed decision for your short and long-term storage needs, consider factors like workload complexity, deployment scale, and data management needs.

Whether you are looking for a robust enterprise solution or a simpler, more straightforward setup, these five options are all strong contenders to meet your Kubernetes storage demands in 2025.

Kubernetes Storage 201: Concepts and Practical Examples

What is Kubernetes Storage?

Kubernetes storage is a sophisticated ecosystem designed to address the complex data management needs of containerized applications. At its core, Kubernetes storage provides a flexible mechanism to manage data across dynamic, distributed computing environments. It allows your containers to store, access, and persist data with unprecedented flexibility.

Storage Types in Kubernetes

Fundamentally, Kubernetes provides two types of storage: ephemeral volumes, which are bound to the container's lifecycle, and persistent volumes, which survive container restarts and termination.

Ephemeral (Non-Persistent) Storage

Ephemeral storage represents the default storage mechanism in Kubernetes. It provides a temporary storage solution, existing only for the duration of a container’s lifecycle. Therefore, when a container is terminated or removed, all data stored in this temporary storage location is permanently deleted.

This type of storage is ideal for transient data that doesn’t require long-term preservation, such as temporary computation results or cache files. Most stateless workloads utilize ephemeral storage for these kinds of temporary data. That said, a “stateless workload” doesn’t necessarily mean no data is stored temporarily. It means there is no issue if this storage disappears from one second to the next.
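
The most common form of ephemeral storage is an emptyDir volume. As a minimal example, the scratch directory below lives exactly as long as the pod does:

apiVersion: v1
kind: Pod
metadata:
  name: cache-example
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: scratch
          mountPath: /tmp/cache    # temporary data, e.g. cache files
  volumes:
    - name: scratch
      emptyDir: {}                 # deleted together with the pod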

Persistent Storage

Persistent storage is a critical concept in Kubernetes that addresses one of the fundamental challenges of containerized applications: maintaining data integrity and accessibility across dynamic and ephemeral computing environments.

Unlike ephemeral storage, persistent storage is not bound to the lifetime of a container. Hence, it provides a robust mechanism for storing and managing data that must survive container restarts, pod rescheduling, or even complete cluster redesigns. Kubernetes enables persistent storage through the concepts of Persistent Volumes (PV) and Persistent Volume Claims (PVC).

Fundamental Kubernetes Storage Entities

Figure 1: The building blocks of Kubernetes storage (Persistent Volume, Persistent Volume Claim, Container Storage Interface, Volume, StorageClass)

Storage in Kubernetes is built up from multiple entities, depending on how storage is provided and whether it is ephemeral or persistent.

Persistent Volumes (PV)

A Persistent Volume (PV) is a slice of storage in the Kubernetes cluster that has been provisioned by an administrator or dynamically created through a StorageClass. Think of a PV as a virtual storage resource that exists independently of any individual pod's lifecycle. Consequently, this abstraction decouples storage provisioning from pod scheduling: a PV can outlive the pods that use it and be rebound as workloads come and go.
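
As an illustration, a statically provisioned PV backed by an NFS export (server and path are placeholder values) could be declared like this:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the data when the claim is deleted
  nfs:
    server: nfs.example.internal          # placeholder NFS server
    path: /exports/data                   # placeholder export path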

Persistent Volume Claims (PVC): Requesting Storage Resources

Persistent Volume Claims act as a user's request for storage resources. Imagine your PVC as a demand for storage with specific requirements, similar to how a developer requests computing resources.

When a user creates a PVC, Kubernetes attempts to find and bind an appropriate Persistent Volume that meets the specified criteria. If no existing volume is found but a storage class is defined or a cluster-default one is available, the persistent volume will be dynamically allocated.

Key PersistentVolumeClaim Characteristics:

  • Size Specification: Defines a user storage capacity request
  • Access Modes: Defines how the volume can be accessed
    • ReadWriteOnce (RWO): Allows all pods on a single node to mount the volume in read-write mode.
    • ReadWriteOncePod: Allows a single pod to read-write mount the volume on a single node.
    • ReadOnlyMany (ROX): Allows multiple pods on multiple nodes to read the volume. Very practical for a shared configuration state.
    • ReadWriteMany (RWX): Allows multiple pods on multiple nodes to read and write to the volume. Remember, this could be dangerous for databases and other applications that don’t support a shared state.
  • StorageClass: Allows requesting specific types of storage based on performance, redundancy, or other characteristics
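
Put together, a minimal claim exercising these fields might look like the following sketch (the fast-ssd storage class is an assumed name, defined in the StorageClass example further below):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
    - ReadWriteOnce            # single-node read-write access
  storageClassName: fast-ssd   # assumed storage class name
  resources:
    requests:
      storage: 20Gi            # size specification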

The Container Storage Interface (CSI)

The Container Storage Interface (CSI) represents a pivotal advancement in Kubernetes storage architecture. Before CSI, integrating storage devices with Kubernetes was a complex and often challenging process that required a deep understanding of both storage systems and container orchestration.

The Container Storage Interface introduces a standardized approach to storage integration. Storage providers (commonly referred to as CSI drivers) are so-called out-of-process entities that communicate with Kubernetes via an API. The integration of CSI into the Kubernetes ecosystem provides three major benefits:

  1. CSI provides a vendor-neutral, extensible plugin architecture
  2. CSI simplifies the process of adding new storage systems to Kubernetes
  3. CSI enables third-party storage providers to develop and maintain their own storage plugins without modifying Kubernetes core code

Volumes: The Basic Storage Units

In Kubernetes, volumes are fundamental storage entities that solve the problem of data persistence and sharing between containers. Unlike traditional storage solutions, Kubernetes volumes are not limited to a single type of storage medium. They can represent node-local disks, network-attached storage, cloud provider block devices, or even Kubernetes objects such as ConfigMaps and Secrets.

Volumes provide a flexible abstraction layer that allows applications to interact with storage resources without being directly coupled to the underlying storage infrastructure.

StorageClasses: Dynamic Storage Provisioning

StorageClasses represent a powerful abstraction that enables dynamic and flexible storage provisioning because they allow cluster administrators to define different types of storage services with varying performance characteristics, such as:

  • High-performance SSD storage
  • Economical magnetic drive storage
  • Geo-redundant cloud storage solutions

When a user requests storage through a PVC, Kubernetes tries to find an existing persistent volume. If none was found, the appropriate StorageClass defines how to automatically provision a suitable storage resource, significantly reducing administrative overhead.
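
A minimal StorageClass for dynamic provisioning might look like this sketch (shown with the AWS EBS CSI driver; any installed CSI provisioner and its parameters can be substituted):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com              # any installed CSI provisioner works here
parameters:
  type: gp3                               # provisioner-specific parameter
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # provision where the pod is scheduled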

Figure 2: Table with features for ephemeral storage and persistent storage

Best Practices for Kubernetes Storage Management

  1. Resource Limitation
    • Implement strict resource quotas (see the ResourceQuota sketch after this list)
    • Control storage consumption across namespaces
    • Set clear boundaries for storage requests
  2. Configuration Recommendations
    • Always use Persistent Volume Claims in container configurations
    • Maintain a default StorageClass
    • Use meaningful and descriptive names for storage classes
  3. Performance and Security Considerations
    • Implement quality of service (QoS) controls
    • Create isolated storage environments
    • Enable multi-tenancy through namespace segregation
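
For the resource limitation practices above, a namespace-scoped ResourceQuota is the standard mechanism. A minimal sketch (namespace and limits are example values):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: team-a                # example namespace
spec:
  hard:
    requests.storage: 500Gi        # total storage requested across all PVCs
    persistentvolumeclaims: "20"   # maximum number of PVCs in the namespace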

Practical Storage Provisioning Example

While specific implementations vary, here’s a conceptual example of storage provisioning using Helm:

helm install storage-solution storage-provider/csi-driver \
  --set storage.size=100Gi \
  --set storage.type=high-performance \
  --set access.mode=ReadWriteMany

Kubernetes Storage with Simplyblock CSI: Practical Implementation Guide

Simplyblock is a storage platform for stateful workloads such as databases, message queues, data warehouses, file storage, and similar. Therefore, simplyblock provides many features tailored to these use cases, simplifying deployments, improving performance, and enabling capabilities such as instant database clones.

Basic Installation Example

When deploying storage in a Kubernetes environment, organizations need a reliable method to integrate storage solutions seamlessly. The Simplyblock CSI driver installation process begins by adding the Helm repository, which allows teams to easily access and deploy the storage infrastructure. By creating a dedicated namespace called simplyblock-csi, administrators ensure clean isolation of storage-related resources from other cluster components.

The installation command specifies critical configuration parameters that connect the Kubernetes cluster to the storage backend. The unique cluster UUID identifies the specific storage cluster, while the API endpoint provides the connection mechanism. The secret token ensures secure authentication, and the pool name defines the initial storage pool where volumes will be provisioned. This approach allows for a standardized, secure, and easily repeatable storage deployment process.

Here’s an example of installing the Simplyblock CSI driver:

helm repo add simplyblock-csi https://raw.githubusercontent.com/simplyblock-io/simplyblock-csi/master/charts

helm repo update

helm install -n simplyblock-csi --create-namespace \
  simplyblock-csi simplyblock-csi/simplyblock-csi \
  --set csiConfig.simplybk.uuid=[random-cluster-uuid] \
  --set csiConfig.simplybk.ip=[cluster-ip] \
  --set csiSecret.simplybk.secret=[random-cluster-secret] \
  --set logicalVolume.pool_name=[cluster-name]

Advanced Configuration Scenarios

1. Performance-Optimized Storage Configuration

Modern applications often require precise control over storage performance, making custom StorageClasses invaluable.

Firstly, by creating a high-performance storage class, organizations can define exact performance characteristics for different types of workloads. The configuration sets a specific IOPS (Input/Output Operations Per Second) limit of 5000, ensuring that applications receive consistent and predictable storage performance.

Secondly, bandwidth limitations of 500 MB/s prevent any single application from monopolizing storage resources, promoting fair resource allocation. The added encryption layer provides an additional security measure, protecting sensitive data at rest. This approach allows DevOps teams to create storage resources that precisely match application requirements, balancing performance, security, and resource management.

# Example StorageClass configuration
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: high-performance-storage
provisioner: csi.simplyblock.io
parameters:
  qos_rw_iops: "5000"    # High IOPS performance
  qos_rw_mbytes: "500"   # Bandwidth limit
  encryption: "True"      # Enable encryption

2. Multi-Tenant Storage Setup

As a large organization or cloud provider, you require a robust environment and workload separation mechanism. For that reason, teams separate workloads into development, staging, and production environments, for example by creating a dedicated namespace for production applications.

Therefore, the custom storage class for production workloads ensures critical applications have access to dedicated storage resources with specific performance and distribution characteristics.

The distribution configuration with multiple network domain controllers (NDCs) provides enhanced reliability and performance. Indeed, this approach supports complex enterprise requirements by enabling granular control over storage resources, improving security, and ensuring that production workloads receive the highest quality of service.

# Namespace-based storage isolation
apiVersion: v1
kind: Namespace
metadata:
  name: production-apps
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: encrypted-volume
  namespace: production-apps
  annotations:
    simplybk/secret-name: encrypted-volume-keys
spec:
  storageClassName: encrypted-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi

Multipath Storage Configuration

Network resilience is a critical consideration in enterprise storage solutions. Hence, multipath storage configuration provides redundancy by allowing multiple network paths for storage communication. By enabling multipathing and specifying a default network interface, organizations can create more robust storage infrastructures that can withstand network interruptions.

The caching node creation further enhances performance by providing an intelligent caching layer that can improve read and write operations. Furthermore, this configuration supports load balancing and reduces potential single points of failure in the storage network.

cachingnode:
  create: true
  multipathing: true
  ifname: eth0  # Default network interface

Best Practices for Kubernetes Storage with Simplyblock

  1. Always specify a unique pool name for each storage configuration
  2. Implement encryption for sensitive workloads
  3. Use QoS parameters to control storage performance
  4. Leverage multi-tenancy features for environment isolation
  5. Regularly monitor storage node capacities and performance

Deletion and Cleanup

# Uninstall the CSI driver
helm uninstall "simplyblock-csi" --namespace "simplyblock-csi"

# Remove the namespace
kubectl delete namespace simplyblock-csi

The examples demonstrate the flexibility of Kubernetes storage, showcasing how administrators can fine-tune storage resources to meet specific application requirements while maintaining performance, security, and scalability. Try simplyblock for the most flexible Kubernetes storage solution on the market today.

Encryption At Rest: A Comprehensive Guide to DARE

TLDR: Data At Rest Encryption (DARE) is the process of encrypting data when stored on a storage medium. The encryption transforms the readable data (plaintext) into an encoded format (ciphertext) that can only be decrypted with knowledge about the correct encryption key.

Today, data is the “new gold” for companies. Data collection happens everywhere and at any time. The global amount of data collected is projected to reach 181 zettabytes (that is, 181 billion terabytes) by the end of 2025, a whopping 23.13% increase over 2024.

That means data protection is becoming increasingly important. Hence, data security has become paramount for organizations of all sizes. One key aspect of data security protocols is data-at-rest encryption, which provides crucial protection for stored data.

Understanding Data-in-Use, Data-in-Transit, and Data-at-Rest

Before we go deep into DARE, let’s quickly discuss the three states of data within a computing system and infrastructure.

Figure 1: The three states of data in encryption (data in use, data in transit, data at rest)

To start, any type of data is created inside an application. While the application holds onto it, it is considered data in use. That means data in use describes information actively being processed, read, or modified by applications or users.

For example, imagine you open a spreadsheet to edit its contents or a database to process a query. This data is considered “in use.” This state is often the most vulnerable as the data must be in an unencrypted form for processing. However, technologies like confidential computing enable encrypted memory to process even these pieces of data in an encrypted manner.

Next, data in transit describes any information moving between locations, whether across the internet, within a private network, or between memory and processors. Email messages being sent, files being downloaded, or database query results traveling between the database server and applications are all examples of data in transit.

Last but not least, data at rest refers to any piece of information stored physically on a digital storage media such as flash storage or hard disk. It also considers storage solutions like cloud storage, offline backups, and file systems as valid digital storage. Hence, data stored in these services is also data at rest. Think of files saved on your laptop’s hard drive or photos stored in cloud storage such as Google Drive, Dropbox, or similar.

Criticality of Data Protection

For organizations, threats to their data are omnipresent. They range from unfortunate human error that deletes important information, through coordinated ransomware attacks that encrypt your data and demand a ransom, to actual data leaks.

Especially with data leaks, most people think about external hackers copying data off of the organization’s network and making it public. However, this isn’t the only way data is leaked. There are many examples of Amazon S3 buckets readable without proper authorization, databases accessible from the outside world, or backup services being accessed by third parties.

Anyhow, organizations face increasing threats to their data security. Any data breach has consequences, which are categorized into four segments:

  1. Financial losses from regulatory fines and legal penalties
  2. Damage to brand reputation and loss of customer trust
  3. Exposure of intellectual property to competitors
  4. Compliance violations with regulations like GDPR, HIPAA, or PCI DSS

While data in transit is commonly protected through TLS (Transport Layer Security), data at rest encryption (or DARE) is often an afterthought. This is typically because the setup isn’t as easy and straightforward as it should be. It’s the “I can still do this afterward” effect: “What could possibly go wrong?”

However, data at rest is the most vulnerable of all. Data in use is often unprotected but very transient; while there is a chance of a leak, it is small. Data at rest, in contrast, persists for extended periods of time, giving attackers more time to plan and execute their attacks. Secondly, persistent data often contains the most valuable pieces of information, such as customer records, financial data, or intellectual property. And lastly, access to persistent storage exposes large amounts of data within a single breach, making it much more interesting to attackers.

Understanding Data At Rest Encryption (DARE)

Simply put, Data At Rest Encryption (DARE) transforms stored data into an unreadable format that can only be read with the appropriate decryption key. This ensures that the information isn’t readable even if unauthorized parties gain access to the storage medium.

That said, the strength of the encryption used is crucial. Many encryption algorithms once considered secure have been broken over time. Encrypting data once and taking for granted that unauthorized parties can never access it is just wrong. Data at rest encryption is an ongoing process, potentially involving re-encrypting information when stronger, more robust encryption algorithms become available.

Available Encryption Types for Data At Rest Encryption

Figure 2: Symmetric encryption (same encryption key) vs asymmetric encryption (private and public key)

Two encryption methods are generally available today for large-scale setups. While we look forward to quantum-safe encryption algorithms (encryption that resists breaking attempts by quantum computers), a widely adopted solution has yet to be established.

Anyhow, the first typical type of encryption is symmetric encryption. It uses the same key for encryption and decryption. The most common symmetric encryption algorithm is AES, the Advanced Encryption Standard.

const cipherText = encrypt(plaintext, encryptionKey);
const decrypted = decrypt(cipherText, encryptionKey);

The second type of encryption is asymmetric encryption. Here, the encryption and decryption routines use different but related keys, generally referred to as the private key and the public key. As their names suggest, the public key can be publicly known, while the private key must be kept secret. Both keys are mathematically connected, generally through a hard-to-solve mathematical problem. Two standard algorithm families are essential: RSA (Rivest–Shamir–Adleman) and elliptic-curve cryptography (with ECDSA, the Elliptic Curve Digital Signature Algorithm, as its best-known signature scheme). While RSA is based on the prime factorization problem, elliptic-curve schemes are based on the discrete logarithm problem. Going into detail about those problems would take more than one additional blog post, though.

const cipherText = encrypt(plaintext, publicKey);
const decrypted = decrypt(cipherText, privateKey);

Encryption in Storage

The use cases for symmetric and asymmetric encryption are different. While considered more secure, asymmetric encryption is slower and typically not used for large data volumes. Symmetric encryption, however, is fast but requires the sharing of the encryption key between the encrypting and decrypting parties, making it less secure.

To encrypt large amounts of data and get the best of both worlds, security and speed, you often see a combination of both approaches. The symmetric encryption key is encrypted with an asymmetric encryption algorithm. After decrypting the symmetric key, it is used to encrypt or decrypt the stored data.
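
The following is a minimal sketch of this envelope-encryption pattern using Node.js’ built-in crypto module. The algorithm choices (RSA-OAEP wrapping a 256-bit AES-GCM data key) and all names are illustrative assumptions, not a description of any particular product’s implementation:

import {
  generateKeyPairSync, publicEncrypt, privateDecrypt,
  randomBytes, createCipheriv, createDecipheriv, constants,
} from "node:crypto";

// Asymmetric key pair; in practice the private key lives in a KMS or HSM.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 3072 });

// 1. Generate a random symmetric data-encryption key (DEK).
const dek = randomBytes(32);   // 256-bit AES key

// 2. Encrypt the bulk data quickly with symmetric AES-256-GCM.
const iv = randomBytes(12);
const cipher = createCipheriv("aes-256-gcm", dek, iv);
const cipherText = Buffer.concat([cipher.update("sensitive data"), cipher.final()]);
const authTag = cipher.getAuthTag();

// 3. Wrap (encrypt) the small DEK with the slow asymmetric algorithm.
const wrappedDek = publicEncrypt(
  { key: publicKey, padding: constants.RSA_PKCS1_OAEP_PADDING },
  dek,
);

// To decrypt: unwrap the DEK with the private key, then decrypt the data.
const unwrappedDek = privateDecrypt(
  { key: privateKey, padding: constants.RSA_PKCS1_OAEP_PADDING },
  wrappedDek,
);
const decipher = createDecipheriv("aes-256-gcm", unwrappedDek, iv);
decipher.setAuthTag(authTag);
const plaintext = Buffer.concat([decipher.update(cipherText), decipher.final()]);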

Simplyblock provides data-at-rest encryption directly through its Kubernetes CSI integration or via CLI and API. Additionally, simplyblock can be configured to use a different encryption key per logical volume for the highest degree of security and multi-tenant isolation.

Key Management Solutions

Next to selecting the encryption algorithm, managing and distributing the necessary keys is crucial. This is where key management solutions (KMS) come in.

Generally, there are two basic types of key management solutions: hardware and software-based.

For hardware-based solutions, you’ll typically utilize an HSM (Hardware Security Module). These HSMs provide a dedicated hardware token (commonly a USB key) to store keys. They offer the highest level of security but need to be physically attached to systems and are more expensive.

Software-based solutions offer a flexible key management alternative. Almost all cloud providers offer their own KMS systems, such as Azure Key Vault, AWS KMS, or Google Cloud KMS. Additionally, third-party key management solutions are available when setting up an on-premises or private cloud environment.

When managing more than a few encrypted volumes, you should implement a key management solution. That’s why simplyblock supports KMS solutions by default.

Implementing Data At Rest Encryption (DARE)

Simply said, to implement data-at-rest encryption in your company, you should have a key management solution ready. If you run in the cloud, I recommend using whatever the cloud provider offers. If you run on-premises or your cloud provider doesn’t provide one, select one of the existing third-party solutions.

DARE with Linux

As the most popular server operating system, Linux offers two options for encrypting data at rest. The first option works on a per-file basis, providing filesystem-level encryption; the second encrypts entire block devices.

Typical solutions for filesystem-level encryption are eCryptfs, a stacked filesystem that stores encryption information in the header of each encrypted file, and EncFS, a user-space filesystem that runs without special permissions. The benefit of eCryptfs is the ability to copy encrypted files to other systems without decrypting them first: as long as the target node has the necessary encryption key in its keyring, the file can be decrypted. Neither solution has seen updates in many years, though, so I’d generally recommend the second approach: block-level encryption.

Block-level encryption transparently encrypts the whole content of a block device (a hard disk, SSD, simplyblock logical volume, etc.). Data is automatically decrypted when read. While VeraCrypt exists, it is mainly a solution for home setups and laptops. For server setups, the most common way to implement block-level encryption is a combination of dm-crypt and LUKS, the Linux Unified Key Setup.

Linux Data At Rest Encryption with dm-crypt and LUKS

The fastest way to encrypt a volume with dm-crypt and LUKS is via the cryptsetup tool.

Debian / Ubuntu / Derivates:

sudo apt install cryptsetup 

RHEL / Rocky Linux / Oracle:

yum install cryptsetup-luks

Now, we need to enable encryption for our block device. In this example, we assume we already have a whole block device or a partition we want to encrypt.

Warning: The following command will delete all data from the given device or partition. Make sure you use the correct device file.

cryptsetup -y luksFormat /dev/xvda

I recommend always running cryptsetup with the -y parameter, which forces it to ask for the passphrase twice. If you misspelled it once, you’ll realize it now, not later.

Now open the encrypted device.

cryptsetup luksOpen /dev/xvda encrypted

This command will ask for the passphrase. The passphrase is not recoverable, so you better remember it.

Afterward, the device is ready to be used as /dev/mapper/encrypted. We can format and mount it.

mkfs.ext4 /dev/mapper/encrypted
mkdir /mnt/encrypted
mount /dev/mapper/encrypted /mnt/encrypted
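
If the encrypted device should be opened automatically at boot, the usual route is a key file registered as an additional LUKS key slot plus an /etc/crypttab entry. A sketch, with the key file path as an assumption (protect that file accordingly):

# Register a key file as an additional LUKS key slot
cryptsetup luksAddKey /dev/xvda /root/keyfile

# /etc/crypttab: mapping name, underlying device, key file ("none" prompts instead), options
encrypted  /dev/xvda  /root/keyfile  luks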

Data At Rest Encryption with Kubernetes

Kubernetes offers ephemeral and persistent storage as volumes. Those volumes can be statically or dynamically provisioned.

For pre-created and statically provisioned volumes, you can follow the above guide on encrypting block devices on Linux and make the already encrypted device available to Kubernetes.

However, Kubernetes doesn’t offer out-of-the-box support for encrypted and dynamically provisioned volumes. Encrypting persistent volumes is not in Kubernetes’s domain. Instead, it delegates this responsibility to its container storage provider, connected through the CSI (Container Storage Interface).

Note that not all CSI drivers support data-at-rest encryption, though! But simplyblock does!

Data At Rest Encryption with Simplyblock

Figure 3: Encryption stack for Linux: dm-crypt+LUKS vs Simplyblock

Due to the importance of DARE, simplyblock enables you to secure your data immediately through data-at-rest encryption. Simplyblock goes above and beyond with its features and provides a fully multi-tenant DARE feature set. That said, in simplyblock, you can encrypt multiple logical volumes (virtual block devices) with the same key or one key per logical volume.

The use cases are different. One key per volume enables the highest level of security and complete isolation, even between volumes. You want this to fully encapsulate applications or teams.

When multiple volumes share the same encryption key, you typically provide one key per customer. This ensures that customers are isolated from each other and that data cannot be accessed by other customers on a shared infrastructure in the case of a configuration failure or similar incident.

To set up simplyblock, you can configure keys manually or utilize a key management solution. In this example, we’ll set up the keys manually. We also assume that simplyblock has already been deployed as your distributed storage platform and that the simplyblock CSI driver is available in Kubernetes.

First, let’s create our Kubernetes StorageClass for simplyblock. Deploy the YAML file via kubectl, just as any other type of resource.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
    name: encrypted-volumes
provisioner: csi.simplyblock.io
parameters:
    encryption: "True"
    csi.storage.k8s.io/fstype: ext4
    ... other parameters
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true

Secondly, we generate the two keys. Note down the output of the two commands.

openssl rand -hex 32   # Key 1
openssl rand -hex 32   # Key 2

Now, we can create our secret and Persistent Volume Claim (PVC).

apiVersion: v1
kind: Secret
metadata:
    name: encrypted-volume-keys
data:
    crypto_key1: <base64-encoded key 1>
    crypto_key2: <base64-encoded key 2>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
    annotations:
        simplybk/secret-name: encrypted-volume-keys
    name: encrypted-volume-claim
spec:
    storageClassName: encrypted-volumes
    accessModes:
        - ReadWriteOnce
    resources:
        requests:
            storage: 200Gi

And we’re done. Whenever we use the persistent volume claim, Kubernetes will delegate to simplyblock and ask for the encrypted volume. If it doesn’t exist yet, simplyblock will create it automatically. Otherwise, it will just provide it to Kubernetes directly. All data written to the logical volume is fully encrypted.
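
As a usage sketch, any pod that mounts the claim gets the transparently encrypted volume; the PostgreSQL image below is just an illustrative workload:

apiVersion: v1
kind: Pod
metadata:
    name: postgres
spec:
    containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
              - name: data
                mountPath: /var/lib/postgresql/data
    volumes:
        - name: data
          persistentVolumeClaim:
              claimName: encrypted-volume-claim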

Best Practices for Securing Data at Rest

Implementing robust data encryption is crucial. In the best case, data should never exist in a decrypted state.

That said, data-in-use encryption is still complicated as of writing. However, solutions such as Edgeless Systems’ Constellation exist and make it possible using hardware memory encryption.

Data-in-transit encryption is commonly used today via TLS. If you don’t use it yet, there is no time to waste. Low-hanging fruit first.

Data-at-rest encryption in Windows, Linux, or Kubernetes doesn’t have to be complicated. Solutions such as simplyblock enable you to secure your data with minimal effort.

However, there are a few more things to remember when implementing DARE effectively.

Data Classification

Organizations should classify their data based on sensitivity levels and regulatory requirements. This classification guides encryption strategies and key management policies. A robust data classification system includes three things:

  • Sensitive data identification: Identify sensitive data through automated discovery tools and manual review processes. For example, personally identifiable information (PII) like social security numbers should be classified as highly sensitive.
  • Classification levels: Establish clear classification levels such as Public, Internal, Confidential, and Restricted. Each level should have defined handling requirements and encryption standards.
  • Automated classification: Implement automated classification tools to scan and categorize data based on content patterns and metadata.

Access Control and Key Management

Encryption is only as strong as the key management and permission control around it. If your keys are leaked, the strongest encryption is useless.

Therefore, it is crucial to implement strong access controls and key rotation policies. Regular key rotation helps minimize the impact of potential key compromises and, I hate to say it, of employees leaving the company.

Monitoring and Auditing

Understanding potential risks early is essential. That’s why you must maintain comprehensive logs of all access to encrypted data and to the encryption keys or key management solution. Regular audits should also be scheduled to look for suspicious activities.

In the best case, multiple teams run independent audits to prevent internal leaks through rogue employees. While it may sound harsh, there are situations in life where people take the wrong path, not necessarily on purpose or because they want to.

Data Minimization

The most secure data is the data that isn’t stored. Hence, you should only store necessary data.

Apart from that, encrypt only what needs protection. While this may sound counterintuitive, it reduces the attack surface and the performance impact of encryption.

Data At Rest Encryption: The Essential Component of Data Management

Data-at-rest encryption (DARE) has become essential for organizations handling sensitive information. The rise in data breaches and increasingly stringent regulatory requirements make it vital to protect stored information. Additionally, with the rise of cloud computing and distributed systems, implementing DARE is more critical than ever.

Simplyblock integrates natively with Kubernetes to provide a seamless approach to implementing data-at-rest encryption in modern containerized environments. With our support for transparent encryption, your organization can secure its data without any application changes. Furthermore, simplyblock utilizes the standard NVMe over TCP protocol which enables us to work natively with Linux. No additional drivers are required. Use a simplyblock logical volume straight from your dedicated or virtual machine, including all encryption features.

Anyhow, for organizations running Kubernetes, whether in public clouds, private clouds, or on-premises, DARE serves as a fundamental security control. By following best practices and using modern tools like Simplyblock, organizations can achieve robust data protection while maintaining system performance and usability.

But remember that DARE is just one component of a comprehensive data security strategy. It should be combined with other security controls, such as access management, network security, and security monitoring, to create a defense-in-depth approach to protecting sensitive information.

That all said, by following the guidelines and implementations detailed in this article, your organization can effectively protect its data at rest while maintaining system functionality and performance.

As threats continue to evolve, having a solid foundation in data encryption becomes increasingly crucial for maintaining data security and regulatory compliance.

What is Data At Rest Encryption?

Data At Rest Encryption (DARE), or encryption at rest, is the encryption process of data when stored on a storage medium. The encryption transforms the readable data (plaintext) into an encoded format (ciphertext) that can only be decrypted with knowledge about the correct encryption key.

What is Data-At-Rest?

Data-at-rest refers to any piece of information written to physical storage media, such as flash storage or hard disk. This also includes storage solutions like cloud storage, offline backups, and file systems as valid digital storage.

What is Data-In-Use?

Data-in-use describes any type of data created inside an application, which is considered data-in-use as long as the application holds onto it. Hence, data-in-use is information actively processed, read, or modified by applications or users.

What is Data-In-Transit?

Data-in-transit describes any information moving between locations, such as data sent across the internet, moved within a private network, or transmitted between memory and processors.

Scale Up vs Scale Out: System Scalability Strategies

TLDR: Horizontal scalability (scale out) describes a system that scales by adding more resources through parallel systems, whereas vertical scalability (scale up) increases the amount of resources on a single system.

One of the most important questions to answer when designing an application or infrastructure is the architectural approach to system scalability. Traditionally, systems used the scale-up approach, or vertical scalability. Many modern systems, especially in the cloud-native ecosystem, use a scale-out approach, also called horizontal scalability.

Scale-Up vs Scale-Out: Which System Architecture is Right for You?

Understanding the Basics

Understanding the fundamental concepts is essential when discussing system architectures. Hence, let’s briefly overview the two approaches before exploring them in more depth.

  • With Scale Up (Vertical Scalability), you increase resources (typically CPU, memory, and storage) in the existing system to improve performance and capacity.
  • With Scale Out (Horizontal Scalability), you add additional nodes or machines to the existing workforce to distribute the workload across multiple systems.

Both architectural approaches have their respective advantages and disadvantages. While scale-up architectures are easier to implement, they become harder to scale beyond a certain point. Scale-out architectures, on the other hand, are more complex to implement but scale almost linearly if done right.

Vertical Scaling (Scale Up) Architectures: The Traditional Approach

Figure 1: Scale-up storage architecture with disks being added to the same machine

Vertical scaling, commonly known as scaling up, involves adding more resources to an existing system to increase its power or capacity.

Think of it as upgrading your personal computer: instead of buying a second computer, you add more RAM, install a faster processor, or fit a larger storage device. In enterprise storage systems, this typically means adding more CPU cores, memory, or storage drives to an existing server. For virtual machines, it usually involves increasing the resources assigned by the host machine.

To clarify, let’s use a real-world example from the storage industry. A ZFS-based SAN (Storage Area Network) system requires a scale-up system design. Or, as Jason Lohrey wrote: «However, ZFS has a significant issue – it can’t scale out. ZFS’s biggest limitation is that it is “scale-up” only.» ZFS, as awesome as it is, is limited to a single machine. Increasing the storage capacity therefore always means adding larger or more disks to that machine. This approach maintains the simplicity of the original architecture while increasing storage capacity and potentially improving performance.

Strengths of Vertical Scaling

Today, many people consider the vertical scalability approach outdated. That is, however, not necessarily true. Vertical scaling shines in several scenarios.

First, implementing a scale-up system is generally more straightforward since it doesn't require changes to your application architecture or complex data distribution logic. When you scale up a transactional database like PostgreSQL or MySQL, you essentially give it more resources while maintaining the same operational model.

Second, the management overhead is lower. Tasks such as backups, monitoring, and maintenance remain straightforward. This simplicity often translates to lower operational costs despite the potentially higher hardware costs.

Here is a quick overview of all the advantages:

  1. Simplicity: It’s straightforward to implement since you’re just adding resources to an existing system
  2. Lower Complexity: Less architectural overhead since you’re working with a single system
  3. Consistent Performance: Lower latency due to all resources being in one place
  4. Software Compatibility: Most traditional software is designed to run on a single system
  5. Lower Initial Costs: Often cheaper for smaller workloads due to simpler licensing and management

Weaknesses and Limitations of Scale-Up Architectures

Like anything in this world, vertical scaling architectures have drawbacks, too. The most significant limitation is the so-called physical ceiling: a system is limited by the space in its server chassis or by the limits of its hardware architecture. You can only add as much hardware as those limitations allow. Beyond that, you need to migrate to a bigger base system.

Traditional monolithic applications often face another challenge with vertical scaling: adding more resources doesn’t always translate to linear performance improvements. For example, doubling the CPU cores might yield only a 50% performance increase due to software architecture limitations, especially resource contention.

Here is a quick overview of all the disadvantages:

  1. Hardware Limits: The physical ceiling limits how much you can scale up based on maximum hardware specifications
  2. Downtime During Upgrades: Usually requires system shutdown for hardware upgrades
  3. Cost Efficiency: High-end hardware becomes exponentially more expensive
  4. Single Point of Failure: No built-in redundancy
  5. Limited Flexibility: Cannot easily scale back down when demand decreases

When to Scale Up?

After all that, here is when you really want to go with a scale-up architecture:

  • You have traditional monolithic applications
  • You're looking for a simple way to optimize for performance, not capacity
  • You’re dealing with applications that aren’t designed for distributed computing
  • You need a quick solution for immediate performance issues

Horizontal Scaling (Scale Out) Architectures: The Distributed Approach

Figure 2: Scale-out storage architecture with additional nodes being added to the cluster

The fundamentally different approach is horizontal scaling, or the scale-out architecture. Instead of increasing the available resources on the existing system, you add more systems to distribute the load across them. It's similar to adding more workers to an assembly line rather than trying to make one worker more efficient.

Consider a distributed storage system like simplyblock or a distributed database like MongoDB. When you scale out these systems, you add more nodes to the cluster, and the workload gets distributed across all nodes. Each node handles a portion of the data and processing, allowing the system to grow almost limitlessly.

Advantages of Horizontal Scaling

Large-scale deployments and highly distributed systems are the forte of scale-out architectures. As a simple example, most modern web applications utilize load balancers that distribute traffic across multiple application servers, allowing them to handle millions of concurrent requests and users. Similarly, distributed storage systems like simplyblock scale to petabytes of data by adding additional storage nodes.

Another significant advantage is improved high availability and fault tolerance. In a properly designed scale-out system, if one node fails, the system continues operating. While service may be degraded, the system avoids a complete failure or outage.

To bring this all to a point:

  1. Near-Infinite Scalability: Can continue adding nodes as needed
  2. Better Fault Tolerance: Built-in redundancy through multiple nodes
  3. Cost Effectiveness: Can use commodity hardware
  4. Flexible Resource Allocation: Easy to scale up or down based on demand
  5. High Availability: No single point of failure

The Cost of Distribution: Weakness and Limitations of Horizontal Scalability

The primary challenge when considering scale-out architectures is complexity. Distributed systems must maintain data consistency across system boundaries, handle network communication and latencies, and recover from failures. Multiple consensus algorithms have been developed over the years; the most commonly used ones are Raft and Paxos, but that's a different blog post. Anyhow, this complexity typically requires more sophisticated management tools and distributed systems expertise, both in the software and in the team operating it.

The second challenge is the overhead of system coordination. In a distributed system, nodes must synchronize their operations. If not designed carefully, this synchronization can introduce latency and even reduce the performance of certain types of operations. Great distributed systems utilize sophisticated algorithms to prevent these issues.

Here is a quick overview of the disadvantages of horizontal scaling:

  1. Increased Complexity: More moving parts to manage
  2. Data Consistency Challenges: Maintaining consistency across nodes can be complex
  3. Higher Initial Setup Costs: Requires more infrastructure and planning
  4. Software Requirements: Applications must be designed for distributed computing
  5. Network Overhead: Communication between nodes adds latency

Kubernetes: A Modern Approach to Scaling

Kubernetes has become the de facto platform for container orchestration. It comes in multiple varieties: in its vanilla form or as the basis of systems like OpenShift or Rancher. Either way, it supports both vertical and horizontal scaling. For deploying scale-out services, however, Kubernetes has become a necessity. Let's look at how different workloads scale in a Kubernetes environment.

Scaling Stateless Workloads

Stateless applications, like web servers or API gateways, are natural candidates for horizontal scaling in Kubernetes. The Horizontal Pod Autoscaler (HPA) provided by Kubernetes automatically adjusts the number of pods based on metrics such as CPU or memory utilization; custom metrics can also serve as triggers.

Horizontally scaling stateless applications is easy. As the name suggests, stateless applications do not maintain persistent local or shared state data. Each instance or pod is entirely independent and interchangeable. Each request to the service contains all the required information needed for processing.

That said, automatically scaling up and down (meaning starting new instances or shutting some down) is part of the typical lifecycle and can happen at any point in time.
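As a minimal sketch, an HPA for such a stateless deployment could look like the following; the deployment name and thresholds are hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend          # hypothetical stateless deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend
  minReplicas: 2              # keep a baseline for availability
  maxReplicas: 20             # upper bound for scaling out
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods above 70% average CPU
```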

Scaling Stateful Workloads

Stateful workloads, like databases, require more careful consideration.

A common approach for more traditional databases like PostgreSQL or MySQL is to use a primary-replica architecture. In this design, write operations always go to the primary instance, while read operations can be distributed across all replicas.

MongoDB, on the other hand, uses a distributed database design and can scale out more naturally by adding more shards to the cluster. Internally, it uses a technique called sharding: data is assigned to horizontally scaling partitions distributed across the cluster nodes. Shard assignment happens either automatically (based on the data) or by providing a specific shard key, enabling data affinity. When additional scale is necessary, adding a shard to the cluster increases capacity, and data rebalancing happens automatically.

Why we built Simplyblock on a Scale-Out Architecture

Figure 3: Simplyblock’s scale-out architecture with storage pooling via cluster nodes

Stateful workloads, like Postgres or MySQL, can scale out by adding additional read replicas to the cluster. However, every single instance needs storage for its very own data. Hence, the need for scalable storage arises.
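In Kubernetes, a StatefulSet expresses exactly this per-instance storage requirement: each replica gets its own persistent volume through volumeClaimTemplates. Here is a minimal sketch; the image, replica count, and storage class name are assumptions, and the actual primary/replica wiring would be handled by a database operator:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres                     # hypothetical database cluster
spec:
  serviceName: postgres
  replicas: 3                        # e.g., one primary plus two read replicas
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:              # one dedicated volume per replica
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: simplyblock-csi   # assumed storage class name
        resources:
          requests:
            storage: 100Gi
```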

Simplyblock is a cloud-native and distributed storage platform built to deliver scalable performance and virtually infinite capacity for logical devices through horizontal scalability. Unlike traditional storage systems, simplyblock distributes data across all cluster nodes, multiplying the performance and capacity.

Designed as an NVMe-first architecture, simplyblock uses the NVMe over Fabrics protocol family. This extends the reach of the highly scalable NVMe protocol over network fabrics such as TCP, Fibre Channel, and others. Furthermore, it provides built-in support for multi-pathing, enabling seamless failover and load balancing.

The system uses a distributed data placement algorithm to spread data across all available cluster nodes, automatically rebalancing data when nodes are added or removed. When writing data, simplyblock splits the item into multiple, smaller chunks and distributes them. This allows for parallel access during read operations. The data distribution also provides redundancy, with parity information stored on other nodes in the cluster. This protects the data against individual disk and node failures.

Using this architecture, simplyblock provides linear capacity and performance scalability by pooling all available disks and parallelizing access. This enables simplyblock to scale from mere terabytes to multiple petabytes while maintaining performance, consistency, and durability characteristics throughout the cluster-growth process.
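To give a feel for why parity-based protection is more storage-efficient than plain replication, here is the generic overhead formula for a k+m erasure-coding scheme (k data chunks, m parity chunks); the concrete values are illustrative assumptions, not simplyblock's actual configuration:

```latex
\[
\text{raw capacity} = \frac{k+m}{k} \times \text{usable data}
\]
\[
\text{e.g., } k = 4,\ m = 2:\quad 1.5\times\ \text{overhead, compared to } 3\times\ \text{for triple replication}
\]
```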

Building Future-Proof Infrastructure

To wrap up, when you build out a new system infrastructure or application, consider these facts:

Figure 4: Flowchart: when to scale up or scale out?
  1. Workload characteristics: CPU-intensive workloads might benefit more from vertical scaling. Distributing operations comes with its own overhead; if the operation itself doesn't offset this overhead, you might see lower performance than with vertical scaling. On the other hand, I/O-heavy workloads might perform better with horizontal scaling. If the access patterns are highly parallelizable, a horizontal architecture will most likely outscale a vertical one.
  2. Growth patterns: Predictable, steady growth might favor scaling up, while rapid growth patterns might necessitate the flexibility of scaling out. This isn’t a hard rule, though. A carefully designed scale-out system will provide a very predictable growth pattern and latency. However, the application isn’t the only element to take into account when designing the system, as there are other components, most prominently the network and network equipment.
  3. Future-Proofing: Scaling out often requires little upfront investment in infrastructure but higher investment in development and expertise. It can, however, provide better long-term cost efficiency for large deployments. That said, buying a scale-out solution is a great idea. With a storage solution like simplyblock, for example, you can start small and add required resources whenever necessary. With traditional storage solutions, you have to go with a higher upfront cost and are limited by the physical ceiling.
  4. Operational Complexity: Scale-up architectures are typically easier to manage, while a stronger DevOps or operations team is required to handle scale-out solutions. That’s why simplyblock’s design is carefully crafted to be fully autonomous and self-healing, with as few hands-on requirements as possible.

The Answer Depends

That means there is no universal answer to whether scaling up or out is better. A consultant would say, “It depends.” Seriously, it does. It depends on your specific requirements, constraints, and goals.

Many successful organizations use a hybrid approach, scaling up individual nodes while also scaling out their overall infrastructure. The key is understanding the trade-offs and choosing the best approach to your needs while keeping future growth in mind. Hence, simplyblock provides the general scale-out architecture for infinite scalability. It also provides a way to utilize storage located in Kubernetes worker nodes as part of the storage cluster to provide the highest possible performance. At the same time, it maintains the option to spill over when local capacity is reached and the high durability and fault tolerance of a fully distributed storage system.

Remember, the best scaling strategy aligns with your business objectives while maintaining performance, reliability, and cost-effectiveness. Whether you scale up, out, or both, ensure your choice supports your long-term infrastructure goals.

Figure 5: Simple definition of scale up vs scale out.

The post Scale Up vs Scale Out: System Scalability Strategies appeared first on simplyblock.

]]>
NVMe & Kubernetes: Future-Proof Infrastructure https://www.simplyblock.io/blog/nvme-kubernetes-future-proof-infrastructure/ Wed, 27 Nov 2024 13:34:00 +0000 https://www.simplyblock.io/?p=4370 The marriage of NVMe storage and Kubernetes persistent volumes represents a perfect union of high-performance storage and modern container orchestration. As organizations increasingly move performance-critical workloads to Kubernetes, understanding how to leverage NVMe technology becomes crucial for achieving optimal performance and efficiency. The Evolution of Storage in Kubernetes When Kubernetes was created over 10 years […]

The post NVMe & Kubernetes: Future-Proof Infrastructure appeared first on simplyblock.

]]>
The marriage of NVMe storage and Kubernetes persistent volumes represents a perfect union of high-performance storage and modern container orchestration. As organizations increasingly move performance-critical workloads to Kubernetes, understanding how to leverage NVMe technology becomes crucial for achieving optimal performance and efficiency.

The Evolution of Storage in Kubernetes

When Kubernetes was created over 10 years ago, its only purpose was to schedule and orchestrate stateless workloads. Since then, a lot has changed, and Kubernetes is increasingly used for stateful workloads. Not just basic ones, but mission-critical workloads, like a company’s primary databases. The promise of workload orchestration in increasingly complex infrastructures is just too significant.

Anyhow, traditional Kubernetes storage solutions relied on, and still rely on, old network-attached storage protocols like iSCSI. Released in 2000, iSCSI was built on the SCSI protocol, first introduced in the 1980s. Hence, both protocols are inherently designed for spinning disks with much higher seek times and access latencies. By modern standards of low latency and low complexity, they just can’t keep up.

While these solutions worked well for basic containerized applications, they fall short for high-performance workloads like databases, AI/ML training, and real-time analytics. Let’s look at the NVMe standard, particularly NVMe over TCP, which has transformed our thinking about storage in containerized environments, not just Kubernetes.

Why NVMe and Kubernetes Work So Well Together

The beauty of this combination lies in their complementary architectures. The NVMe protocol and command set were designed from the ground up for parallel, low-latency operations, precisely what modern containerized applications demand. When you combine NVMe’s parallelism with Kubernetes’ orchestration capabilities, you get a system that can efficiently distribute I/O-intensive workloads while maintaining microsecond-level latency. Comparing NVMe over TCP with iSCSI, we see significant improvements in both IOPS and latency.

Consider a typical database workload on Kubernetes. Traditional storage might introduce latencies of 2-4ms for read operations. With NVMe over TCP, these same operations complete in under 200 microseconds–a 10-20x improvement. This isn’t just about raw speed; it’s about enabling new classes of applications to run effectively in containerized environments.

The Technical Symphony

The integration of NVMe with Kubernetes is particularly elegant through persistent volumes and the Container Storage Interface (CSI). Modern storage orchestrators like simplyblock leverage this interface to provide seamless NVMe storage provisioning while maintaining Kubernetes’ declarative model. This means development teams can request high-performance storage using familiar Kubernetes constructs while the underlying system handles the complexity of NVMe management, providing fully reliable shared storage.
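As a sketch of this declarative model, a cluster administrator defines a StorageClass backed by the CSI driver, and development teams simply claim volumes against it; the provisioner string and names below are assumptions for illustration, not the driver's actual identifier:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nvme-fast
provisioner: csi.example.nvme        # assumed CSI driver name
allowVolumeExpansion: true           # allows resizing claims later
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: nvme-fast        # request the NVMe-backed class
  resources:
    requests:
      storage: 50Gi
```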

The NVMe Impact: A Real-World Example

But what does that mean for actual workloads? Our friends over at Percona found in their MongoDB Performance on Kubernetes report that Kubernetes implies no performance penalty. Hence, we can look at the disks’ actual raw performance.

A team of researchers from the University of Southern California, San Jose State University, and Samsung Semiconductor took on the challenge of measuring the implications of NVMe SSDs (over SATA SSD and SATA HDD) for real-world database performance.

The general performance characteristics of their test hardware:

|                   | NVMe SSD | SATA SSD | SATA HDD |
|-------------------|----------|----------|----------|
| Access latency    | 113µs    | 125µs    | 14,295µs |
| Maximum IOPS      | 750,000  | 70,000   | 190      |
| Maximum bandwidth | 3GB/s    | 278MB/s  | 791KB/s  |

Table 1: General performance characteristics of the different storage types

Their summary states, “scale-out systems are driving the need for high-performance storage solutions with high available bandwidth and lower access latencies. To address this need, newer standards are being developed exclusively for non-volatile storage devices like SSDs,” and “NVMe’s hardware and software redesign of the storage subsystem translates into real-world benefits.”

They close with some direct comparisons that claim an 8x performance improvement of NVMe-based SSDs over a single SATA-based SSD and still a 5x improvement over a (hardware) RAID-0 of four SATA-based SSDs.

Transforming Database Operations

Perhaps the most compelling use case for NVMe in Kubernetes is database operations. Typical modern databases process queries significantly faster when storage isn’t the bottleneck. This becomes particularly important in microservices architectures where concurrent database requests and high-load scenarios are the norm.

Traditionally, running stateful services in Kubernetes meant accepting significant performance overhead. With NVMe storage, organizations can now run high-performance databases, caches, and messaging systems with bare-metal-like performance in their Kubernetes clusters.

Dynamic Resource Allocation

One of Kubernetes’ central promises is dynamic resource allocation. That means assigning CPU and memory according to actual application requirements. Furthermore, it also means dynamically allocating storage for stateful workloads. With storage classes, Kubernetes provides the option to assign different types of storage backends to different types of applications. While not strictly necessary, this can be a great application of the “best tool for the job” principle.

That said, for IO-intensive workloads, such as databases, a storage backend providing NVMe storage is essential. NVMe’s ability to handle massive I/O parallelism aligns perfectly with Kubernetes’ scheduling capabilities. Storage resources can be dynamically allocated and deallocated based on workload demands, ensuring optimal resource utilization while maintaining performance guarantees.

Simplified High Availability

The low latency of NVMe over TCP enables new approaches to high availability. Instead of complex database replication schemes, organizations can leverage storage-level replication (or more storage-efficient erasure coding, like in the case of simplyblock) with a negligible performance impact. This significantly simplifies application architecture while improving reliability.

Furthermore, NVMe over TCP utilizes multipathing as an automatic fail-over implementation to protect against network connection issues and sudden connection drops, increasing the high availability of persistent volumes in Kubernetes.

The Physics Behind NVMe Performance

Many teams don’t realize how profoundly storage physics impacts database operations. Traditional storage solutions averaging 2-4ms latency might seem fast, but this translates to a hard limit of about 80 consistent transactions per second, even before considering CPU or network overhead. Each transaction requires multiple storage operations: reading data pages, writing to WAL, updating indexes, and performing one or more fsync() operations. At 3ms per operation, these quickly stack up into significant delays. Many teams spend weeks optimizing queries or adding memory when their real bottleneck is fundamental storage latency.

This is where the NVMe and Kubernetes combination truly shines. With NVMe as your Kubernetes persistent volume storage backend, providing sub-200μs latency, the same database operations can theoretically support over 1,200 transactions per second–a 15x improvement. More importantly, this dramatic reduction in storage latency changes how databases behave under load. Connection pools remain healthy longer, buffer cache decisions become more efficient, and query planners can make better optimization choices. With the storage bottleneck removed, databases can finally operate closer to their theoretical performance limits.
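The back-of-the-envelope math behind these numbers, assuming roughly four serialized storage operations per transaction (data page read, WAL write, index update, fsync), looks like this:

```latex
\[
\text{TPS}_{\max} \approx \frac{1}{N_{\text{ops}} \cdot t_{\text{latency}}}
\]
\[
t = 3\,\text{ms}: \quad \frac{1}{4 \times 0.003\,\text{s}} \approx 83\ \text{TPS}
\qquad
t = 200\,\mu\text{s}: \quad \frac{1}{4 \times 0.0002\,\text{s}} = 1250\ \text{TPS}
\]
```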

Looking Ahead

The combination of NVMe and Kubernetes is just beginning to show its potential. As more organizations move performance-critical workloads to Kubernetes, we’ll likely see new patterns and use cases that fully take advantage of this powerful combination.

Some areas to watch:

  • AI/ML workload optimization through intelligent data placement
  • Real-time analytics platforms leveraging NVMe’s parallel access capabilities
  • Next-generation database architectures built specifically for NVMe on Kubernetes Persistent Volumes

The marriage of NVMe-based storage and Kubernetes Persistent Volumes represents more than just a performance improvement. It’s a fundamental shift in how we think about storage for containerized environments. Organizations that understand and leverage this combination effectively gain a significant competitive advantage through improved performance, reduced complexity, and better resource utilization.

For a deeper dive into implementing NVMe storage in Kubernetes, visit our guide on optimizing Kubernetes storage performance.

The post NVMe & Kubernetes: Future-Proof Infrastructure appeared first on simplyblock.

]]>
Serverless Compute Need Serverless Storage https://www.simplyblock.io/blog/serverless-compute-need-serverless-storage/ Wed, 23 Oct 2024 11:37:27 +0000 https://www.simplyblock.io/?p=3391 The use of serverless infrastructures is steeply increasing. As the Datadog “State of Serverless 2023” survey shows, more than half of all cloud customers have already adopted a serverless environment on the three big hyperscalers—at least to some extent. The premise of saving cost while automatically and indefinitely scaling (up and down) increases the user […]

The post Serverless Compute Need Serverless Storage appeared first on simplyblock.

]]>
The use of serverless infrastructures is steeply increasing. As the Datadog “State of Serverless 2023” survey shows, more than half of all cloud customers have already adopted a serverless environment on the three big hyperscalers—at least to some extent. The premise of saving cost while automatically and indefinitely scaling (up and down) increases the user base.

Due to this movement, other cloud operators, many database companies (such as Neon and Nile), and infrastructure teams at large enterprises are building serverless environments, either on their premises or in their private cloud platforms.

While there are great options for serverless compute, providing serverless storage to your serverless platform tends to be more challenging. This is often fueled by a lack of understanding of what serverless storage has to provide and its requirements.

What is a Serverless Architecture?

Serverless architecture is a software design pattern that leverages serverless computing resources to build and run applications without managing the underlying architecture. These serverless compute resources are commonly provided by cloud providers such as AWS Lambda, Google Cloud Functions, or Azure Functions and can be dynamically scaled up and down.

Simplified serverless architecture with clients and multiple functions

When designing a serverless architecture, you’ll encounter the so-called Function-as-a-Service (FaaS), meaning that the application’s core logic will be implemented in small, stateless functions that respond to events.

That said, typically, several FaaS functions make up the actual application, sending events between them. Since the underlying infrastructure is abstracted away, the functions don’t know how requests or responses are handled, and their implementations, commonly built against a cloud-provider-specific API, are prone to vendor lock-in.

Cloud-vendor-agnostic solutions exist, such as knative, but require at least parts of the team to manage the Kubernetes infrastructure. They can, however, take the burden away from other internal and external development teams.

What is Serverless Compute?

While a serverless architecture describes the application design that runs on top of a serverless compute infrastructure, serverless compute itself describes the cloud computing model in which the cloud provider dynamically manages the allocation and provisioning of server resources.

Simplified serverless platform architecture

It is essential to understand that serverless doesn’t mean “without servers” but “as a user, I don’t have to plan, provision, or manage the infrastructure.”

In essence, the cloud provider (or whoever manages the serverless infrastructure) takes the burden from the developer. Serverless compute environments fully auto-scale, starting or stopping instances of the functions according to the needed capacity. Due to their stateless nature, it’s easy to stop and restart them at any point in time. That means that function instances are often very short-lived.

Popular serverless compute platforms include AWS Lambda, Google Cloud Functions, and Azure Functions. For self-managed operations, there is knative (mentioned before), as well as OpenFaaS and OpenFunction (which seems to have seen less activity recently).

They all enable developers to focus on writing code without managing the underlying infrastructure.

What is a Serverless Storage System?

Serverless storage refers to a cloud storage model where the underlying infrastructure, capacity planning, and scaling are abstracted away from the user. With serverless storage, customers don’t have to worry about provisioning or managing storage servers or volumes. Instead, they can store and retrieve data while the serverless storage handles all the backend infrastructure.

Serverless storage solutions come in different forms and shapes, beginning with an object storage interface, such as Amazon S3 or Google Cloud Storage. Object storage is excellent when storing unstructured data, such as documents or media.

Serverless storage options available in GCP, AWS, and Azure

Another option that people love to use for serverless storage is serverless databases. Various options are available, depending on your needs: relational, NoSQL, time-series, and graph databases. This might be the easiest way to go, depending on how you need to access data. Examples of such serverless databases include Amazon Aurora Serverless, Google’s Cloud Datastore, and external companies such as Neon or Nile.

When self-managing your serverless infrastructure with knative or one of the alternative options, you can use Kubernetes CSI storage providers to provide storage to your functions. However, you may add considerable startup time if you choose the wrong CSI driver. I might be biased, but simplyblock is an excellent option with its negligible provisioning and attachment times, as well as features such as multi-attach, where a volume can be attached to multiple functions (for example, to provide a shared set of data).
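As a small, hedged sketch, such a shared dataset for function instances could be claimed like this; the claim and class names are hypothetical:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-function-data         # hypothetical shared dataset
spec:
  accessModes: ["ReadWriteMany"]     # multi-attach across function pods
  storageClassName: simplyblock-csi  # assumed storage class name
  resources:
    requests:
      storage: 20Gi
```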

Why Serverless Architectures?

Most people think of cost-efficiency when it comes to serverless architectures. However, this is only one side of the coin, and the savings only materialize if your use cases are a good fit for a serverless environment. More on when serverless makes sense later.

In serverless architectures, functions are triggered through an event, either from the outside world (like an HTTP request) or an event initiated by another function. If no function instance is up and running, a new instance will be started. The same goes for situations where all function instances are busy. If function instances idle, they’ll be shut down.

Serverless functions usually use a pay-per-use model. A function’s extremely short lifespan can lead to cost reductions over deployment models like containers and virtual machines, which tend to run longer.

Apart from that, serverless architectures have more benefits. Many are moving in the same direction as microservices architectures, but with the premise that they are easier to implement and maintain.

First and foremost, serverless solutions are designed for scalability and elasticity. They quickly and automatically scale up and down depending on the incoming workload. It’s all hands-free.

Another benefit is that development cycles are often shortened. Due to the limited size and functionality of a FaaS, changes are fast to implement and easy to test. Additionally, updating the function is as simple as deploying the new version. All existing function instances finish their current work and shut down. In the meantime, the latest version will be started up. Due to its stateless nature, this is easy to achieve.

What are the Complexities of Serverless Architecture?

Writing serverless solutions has the benefits of fast iteration, simplified deployments, and potential cost savings. However, they also come with their own set of complexities.

Writing truly stateless code isn’t easy, at least when we’re not just talking about simple transformation functionality. That’s why a FaaS receives and passes context information along with its events.


What works great for small bits of context becomes challenging for larger pieces. A larger context, or state, can mean many things: simple cross-request information that should be available without transferring it with every request, more involved data such as lookup information to enrich and cross-check, or actual complex data, like when you want to implement a serverless database. And yes, a serverless database needs to store its data somewhere.

That’s where serverless storage comes in, and simply put, this is why all serverless solutions have state storage alternatives.

What is Serverless Storage?

Serverless storage refers to storage solutions that are fully integrated into serverless compute environments without manual intervention. These solutions scale and grow according to user demand and complement the pay-by-use payment model of serverless platforms.

Serverless storage lets you store information across multiple requests or functions. 

As mentioned above, cloud environments offer a wide selection of serverless storage options. However, all of them are vendor-bound and lock you into their services. 

However, when you design your own serverless infrastructure or service, these services don’t help you. It’s up to you to provide the serverless storage. In this case, a cloud-native, serverless-ready storage engine can simplify the task immensely. Whether you want to provide object storage, a serverless database, or file-based storage, an underlying cloud-native block storage solution is the perfect building block underneath. However, this block storage solution needs to scale and grow with your needs, be quick to provision, and support snapshotting, cloning, and attaching to multiple function instances.

Why do Serverless Architectures Require Serverless Storage?

Serverless storage has particular properties designed for serverless environments. It needs to keep up with the specific requirements of serverless architectures, most specifically short lifetimes, extremely fast up and down scaling or restarts, easy use across multiple versions during updates, and easy integration through APIs utilized by the FaaS.

The most significant requirements are that storage must be usable by multiple function instances simultaneously and be quickly available to new instances on other nodes, regardless of whether those are migrated over or used for scaling out. That means the underlying storage technology must be prepared to handle these tasks easily.

These are just the most significant requirements, but there are more:

  1. Stateless nature: Serverless functions spin up, execute, and terminate due to their stateless nature. Without fast, persistent storage that can be attached or accessed without any additional delay, this fundamental property of serverless functions would become a struggle.
  2. Scalability needs: Serverless compute is built to scale automatically based on user demand. A storage layer needs to seamlessly support the growth and shrinking of serverless infrastructures and handle variations in I/O patterns, meaning that traditional storage systems with fixed capacity limits don’t align well with the requirements of serverless workloads.
  3. Cost efficiency: One reason people engage with serverless compute solutions is cost efficiency. Serverless compute users pay by actual execution time. That means that serverless storage must support similar payment structures and help serverless infrastructure operators efficiently manage and scale their storage capacities and performance characteristics.
  4. Management overhead: Serverless compute environments are designed to eliminate manual server management. Therefore, the storage solution needs to minimize its manual administrative tasks. Allocating and scaling storage must be fully automatable via API calls, or happen fully automatically. Also, the integration must be seamless if multiple storage tiers are available for additional cost savings.
  5. Performance requirements: Serverless functions require fast, if not immediate, access to data when they spin up. Traditional storage solutions introduce delays due to allocation and additional latency, negatively impacting serverless functions’ performance. As functions are paid by runtime, their operational cost increases.
  6. Integration needs: Serverless architectures typically combine many services, as individual functions use different services. That said, the underlying storage solution of a serverless environment needs to support all kinds of services provided to users. Additionally, seamless integration with the management services of the serverless platform is required.

That’s quite a list of requirements. To align serverless compute and serverless storage, storage solutions need to provide an efficient and manageable layer that seamlessly integrates with the overall management layer of the serverless platform.

Simplyblock for Serverless Storage

When designing a serverless environment, the storage layer must be designed to keep up with the pace. Simplyblock enables serverless infrastructures to provide dynamic and scalable storage.

To achieve this, simplyblock provides several characteristics that perfectly align with serverless principles:

  1. Dynamic resource allocation: Simplyblock’s thin provisioning makes capacity planning irrelevant. Storage is allocated on-demand as data is written, similar to how serverless platforms allocate resources. That means every volume can be arbitrarily large to accommodate unpredictable future growth. Additionally, simplyblock’s logical volumes are resizable, meaning that the volume can be enlarged at any point in the future.
  2. Automatic scaling: Simplyblock’s storage engine can indefinitely grow. To acquire additional backend storage, simplyblock can automatically acquire additional persistent disks (like Amazon EBS volumes) from cloud providers or attach additional storage nodes to its cluster when capacity is about to exceed, handling scaling without user intervention.
  3. Abstraction of infrastructure: Users interact with simplyblock’s virtual drives like normal hard disks. This abstracts away the complexity of the underlying storage pooling and backend storage technologies.
  4. Unified interface: Simplyblock provides a unified storage interface (NVMe) logical device that abstracts away underlying, diverging storage interfaces behind an easy-to-understand disk design. That enables services not specifically designed to talk to object storages or similar technologies to immediately benefit from them, just like PostgreSQL or MySQL.
  5. Extensibility: Due to its disk-like storage interface, simplyblock is highly extensible in terms of solutions that can be run on top of it. Databases, object storage, file storage, and specific storage APIs, simplyblock provides scalable block storage to all of them, making it the perfect backend solution for serverless environments.
  6. Crash-consistent and recoverable: Serverless storage must always be up and running. Simplyblock’s distributed erasure coding (parity information similar to RAID-5 or 6) enables high availability and fault tolerance on the storage level with a high storage efficiency, way below simple replication. Additionally, simplyblock provides storage cluster replication (sync / async), consistent snapshots across multiple logical volumes, and disaster recovery options.
  7. Automated management: With features like automatic storage tiering to cheaper object storage (such as Amazon S3), automatic scaling, as well as erasure coding and backups for data protection, simplyblock eliminates manual management overhead and hands-on tasks. Simplyblock clusters are fully autonomous and manage the underlying storage backend automatically.
  8. Flexible integration: Serverless platforms require storage to be seamlessly allocated and provisioned. Simplyblock achieves this through its API, which can be integrated into the standard provisioning flow of new customer sign-ups. If the new infrastructure runs on Kubernetes, integration is even easier with the Kubernetes CSI driver, allowing seamless integration with container-based serverless platforms such as knative.
  9. Pay-per-use potential: Due to the automatic scalability, thin provisioning, and seamless resizing and integration, simplyblock enables you to provide your customers with an industry-loved pay-by-use model for managed service providers, perfectly aligning with the most common serverless pricing models.

Simplyblock is the perfect backend storage for all your serverless storage needs while future-proofing your infrastructure. As data grows and evolves, simplyblock’s flexibility and scalability ensure you can adapt without massive overhauls or migrations.

Remember, simplyblock offers powerful features like thin provisioning, storage pooling, and tiering, helping you to provide a cost-efficient, pay-by-use enabled storage solution. Get started now and find out how easy it is to operate services on top of simplyblock.

The post Serverless Compute Need Serverless Storage appeared first on simplyblock.

]]>
AWS Storage Optimization: Avoid EBS Over-provisioning https://www.simplyblock.io/blog/avoid-storage-over-provisioning/ Thu, 10 Oct 2024 07:36:59 +0000 https://www.simplyblock.io/?p=2684 “Cloud is expensive” is an often repeated phrase among IT professionals. What makes the cloud so expensive, though? One element that significantly drives cloud costs is storage over-provisioning and lack of storage optimization. Over-provisioning refers to the eager allocation of more resources than required by a specific workload at the time of allocation. When we […]

The post AWS Storage Optimization: Avoid EBS Over-provisioning appeared first on simplyblock.

]]>
“Cloud is expensive” is an often repeated phrase among IT professionals. What makes the cloud so expensive, though? One element that significantly drives cloud costs is storage over-provisioning and lack of storage optimization. Over-provisioning refers to the eager allocation of more resources than required by a specific workload at the time of allocation.

When we hear about hoarding goods, we often think of so-called preppers preparing for some type of serious event. Many people would laugh about that kind of behavior. However, it is commonplace when we are talking about cloud environments.

In the past, most workloads used their own servers, often barely utilizing any of the machines. That’s why we invented virtualization techniques, first with virtual machines and later with containers. We didn’t like the idea of wasting resources and money.

That didn’t stop when workloads were moved to the cloud, or did it?

What is Over-Provisioning?

As briefly mentioned above, over-provisioning refers to allocating more resources than are needed for a given workload or application. That means we actively request more resources than we need, and we know it. Over-provisioning typically occurs across various infrastructure components: CPU, memory, and storage. Let’s look at some basic examples to understand what that means:

  1. CPU Over-Provisioning: Imagine running a web server on a virtual machine instance (e.g., Amazon EC2) with 16 vCPUs. At the same time, your application only requires four vCPUs for the current load and number of customers. You expect to increase the number of customers in the next year or so. Until then, the excess computing power sits idle, wasting resources and money.
  2. Memory Over-Provisioning: Consider a database server provisioned with 64GB of RAM when the database service commonly only uses 16GB, except during peak loads. The unused memory is essentially paid for but unutilized most of the time.
  3. Storage Over-Provisioning: Consider a Kubernetes cluster with ten instances of the same stateful service (like a database), each requesting a block storage volume (e.g., Amazon EBS) of 100 GB that will only slowly fill up over the course of a year. If each container uses about 20 GB as of now, we over-provisioned 800 GB, and we have to pay for it.

Why is EBS Over-Provisioning an Issue?

EBS over-provisioning isn’t an issue by itself; we lived (almost) happily ever after with it for decades. While over-provisioning seems to be the safe bet to ensure performance and plannability, it comes with a set of drawbacks.

  1. High initial cost: When you overprovision, you pay for resources you don’t use from day one. This can significantly inflate your cloud bill, especially at scale.
  2. Resource waste: Unused resources aren’t just a financial burden. They also waste valuable computing power that could be better allocated elsewhere. Not to mention the environmental effects of over-provisioning, think CO2 footprint.
  3. Hard to estimate upfront: Predicting exact resource needs is challenging, especially for new applications or those with variable workloads. This uncertainty often leads us to very conservative (and excessive) provisioning decisions.
  4. Limitations when resizing: While cloud providers like AWS allow resource resizing, limitations exist. Amazon EBS volumes can only be modified every 6 hours, making it difficult to adjust to changing needs quickly.

On top of those issues, which are all financial impact related, over-provisioning can also directly or indirectly contribute to topics such as:

  • Reduced budget for innovation
  • Complex and hard-to-manage infrastructures
  • Potential compliance issues in regulated industries
  • Decreased infrastructure efficiency

The Solution is Pay-By-Use

Pay-by-use refers to the concept that customers are billed only for what they actually use. Using our earlier example of a 100 GB Amazon EBS volume where only 20 GB is used, we would only be charged for those 20 GB. As a customer, I’d love the pay-by-use option since it simplifies capacity planning and relieves me of the burden of the initial estimate.

So why isn’t everyone just offering pay-by-use models?

The Complexity of Pay-By-Use

Many organizations dream of an actual pay-by-use model, where they only pay for the exact resources consumed. This improves the financial impact, optimizes the overall resource utilization, and brings environmental benefits. However, implementing this is challenging for several reasons:

  1. Technical Complexity: Building a system that can accurately measure and bill for precise resource usage in real time is technically complex.
  2. Performance Concerns: Constant scaling and de-scaling to match exact usage can potentially impact performance and introduce latency.
  3. Unpredictable Costs: While pay-by-use can save money, it can also make costs less predictable, making budgeting challenging.
  4. Legacy Systems: Many existing applications aren’t designed to work with dynamically allocated resources.
  5. Cloud Provider Greed: While this is probably exaggerated, there is still some truth to it. Cloud providers overcommit CPU, RAM, and network bandwidth, which is why they offer both machine types with dedicated resources and ones without (where they tend to over-provision resources, and you might encounter the “noisy neighbor” problem). On the storage side, they thinly provision your storage out of a large, ever-growing storage pool.

Over-Provisioning in AWS

Like most cloud providers, AWS has several components where over-provisioning is typical. The most obvious one is resources around Amazon EC2. However, since many other services are built upon EC2 machines (like Kubernetes clusters), this is the most common entry point to look into optimization.

Amazon EC2 (CPU and Memory)

When looking at Amazon EC2 instances to save some hard-earned money, AWS offers some tools by itself:

  • Use AWS CloudWatch to monitor CPU and memory utilization.
  • Implement auto-scaling groups to adjust instance counts dynamically based on demand.
  • Consider using EC2 Auto Scaling with predictive scaling to anticipate future needs.

In addition, external tools, such as AutoSpotting or Cast.ai, enable you to find over-provisioned VMs and automatically right-size them or exchange them for so-called spot instances. Spot instances are VM instances that are way cheaper but can be taken away from you with only a few seconds’ notice. The idea is that AWS offers these instances at a reduced rate when they can’t be sold for their regular price. That said, if the capacity is required, they’ll take them away from you. Still a great way to save some money.

Last but not least, companies like DoIT work as resellers for hyperscalers like AWS. They have custom rates and offer additional features like bursting beyond your typical requirements. This is a great way to get cheaper VMs and extra services. It’s worth a look.

Amazon EBS Storage Over-Provisioning

One of the most common causes of over-provisioning happens with block storage volumes, such as Amazon EBS. With EBS, the over-provisioning is normally driven by:

  • Pre-allocated Capacity: EBS volumes are provisioned with a fixed size, and you pay for the entire allocated space regardless of usage.
  • Modification Limitations: EBS volumes can only be modified every 6 hours, making rapid adjustments difficult.
  • Performance Considerations: A common belief is that larger volumes perform better, so people feel incentivized to over-provision.

One interesting note, though, is that while customers have to pay for the total allocated size, AWS likely uses technologies such as thin provisioning internally, allowing it to oversell its actual physical storage. Imagine this overselling margin would be on your end and not the hyperscaler.

How Simplyblock Can Help with EBS Storage Over-Provisioning

Simplyblock offers an innovative storage optimization platform to address storage over-provisioning challenges. By providing you with a comprehensive set of technologies, simplyblock enables several features that significantly optimize storage usage and costs.

Thin Provisioning

Thin provisioning is a technique where a storage entity of any capacity can be created without pre-allocating the requested capacity. A thinly provisioned volume only requires as much physical storage as the data consumes at any point in time. This enables overcommitting the underlying storage: ten volumes with a provisioned capacity of 1 TB each, but only 100 GB used per volume, require only around 1 TB of physical storage in total, meaning around 9 TB don’t need to exist, and don’t need to be paid for, unless they are actually used.
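Expressed as a quick worked calculation for the example above:

```latex
\[
\text{provisioned} = 10 \times 1\,\text{TB} = 10\,\text{TB}, \qquad
\text{physical} = 10 \times 100\,\text{GB} = 1\,\text{TB}
\]
\[
\text{overcommit ratio} = \frac{10\,\text{TB}}{1\,\text{TB}} = 10{:}1
\]
```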

Simplyblock’s thin provisioning technology allows you to create logical volumes of any size without pre-allocating the total capacity. You only consume (and pay for) the actual space your data uses. This eliminates the need to over-provision “just in case” and allows for more efficient use of your storage resources. When your actual storage requirements increase, simplyblock automatically allocates additional underlying storage to keep up with your demands.

Two thinly provisioned devices and the underlying physical storage

Copy-on-Write, Snapshots, and Instant Clones

Simplyblock’s storage technology is a fully copy-on-write-enabled system. Copy-on-write is a technique also known as shadowing. Instead of copying data right away when multiple copies are created, copy-on-write only creates a second instance when the data is actually changed. The old version stays around for the other copies that still refer to it, while only the specific copy refers to the changed data.

Copy-on-write enables the instant creation of volume snapshots and clones without duplicating data. This is particularly useful for development and testing environments, where multiple copies of large datasets are often needed. Instead of provisioning full copies of production data, you can create instant, space-efficient clones, which is especially attractive for databases, AI/ML workloads, or analytics data.

Copy-on-write technique explained with two files referring to shared, similar parts and modified, unshared parts
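In Kubernetes, such an instant clone can be requested through the standard volume snapshot API implemented by CSI drivers with snapshot support; the names and classes below are hypothetical:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: prod-db-snap
spec:
  volumeSnapshotClassName: simplyblock-snap   # assumed snapshot class
  source:
    persistentVolumeClaimName: prod-db-data   # existing production volume
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dev-db-clone                 # space-efficient clone for dev/test
spec:
  dataSource:
    name: prod-db-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: ["ReadWriteOnce"]
  storageClassName: simplyblock-csi  # assumed storage class name
  resources:
    requests:
      storage: 100Gi
```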

Transparent Tiering

With most data sets, parts of the data are typically assumed to be “cold,” meaning that the data is very infrequently used, if ever. This is true for any data that needs to be kept available for regulatory reasons or historical manufacturing data (such as process information for car part manufacturing). This data can be moved to slower but much less expensive storage options. Simplyblock automatically moves infrequently accessed data to cheaper storage tiers such as object storage (e.g., Amazon S3 or MinIO) and non-NVMe SSD or HDD pools while keeping hot data on high-performance storage. This tiering is completely transparent to your applications, database, or other workload and helps optimize costs without sacrificing performance. With tiering integrated into the storage layer, application and system developers can focus on business logic rather than storage requirements.

Automatic tiering, transparently moving cold data parts to slower but cheaper storage

Storage Pooling

Storage pooling is a technique in which multiple storage devices or services are used in conjunction. It enables technologies like thin provisioning and data tiering, which were already mentioned above.

By pooling multiple cloud block storage volumes (e.g., Amazon EBS volumes), simplyblock can provide better performance and more flexible scaling. This pooling allows for more granular storage growth, avoiding the need to provision large EBS volumes upfront.

Additionally, simplyblock can leverage directly attached fast SSD storage (NVMe), also called local instance storage, and make it part of the storage pool or use it as an even faster workload-local data cache.

NVMe over Fabrics

NVMe over Fabrics is an industry-standard for remotely attaching block devices to clients. It can be assumed to be the successor of iSCSI and enables the full feature set and performance of NVMe-based SSD storage. Simplyblock uses NVMe over Fabrics (specifically the NVMe/TCP version) to provide high-performance, low-latency access to storage.

This allows consolidating multiple storage locations into a centralized one, enabling even greater savings on storage capacity and compute power.

Pay-By-Use Model Enablement

As stated above, pay-by-use models are a real business advantage, specifically for storage. Implementing a pay-by-use model in the cloud requires taking charge of how storage works. This is complex and requires a lot of engineering effort. This is where simplyblock helps bring a competitive advantage to your doorstep.

With its underlying technology and features such as thin provisioning, simplyblock makes it easier for managed service providers to implement a true pay-by-use model for their customers, giving you the competitive advantage at no extra cost or development effort, all fully transparent to your database or application workload.

AWS Storage Optimization with Simplyblock

By addressing the core issues of EBS over-provisioning, simplyblock helps reduce costs and improves overall storage efficiency and flexibility. For businesses struggling with storage over-provisioning in AWS, simplyblock offers a compelling solution to optimize their infrastructure and better align costs with actual usage.

In conclusion, while over-provisioning remains a significant challenge in AWS environments, particularly with storage, simplyblock paves the way for more efficient, cost-effective cloud storage optimization management. By combining advanced technologies with a deep understanding of cloud storage dynamics, simplyblock enables businesses to achieve the elusive goal of paying only for what they use without sacrificing performance or flexibility.

Take your competitive advantage and get started with simplyblock today.

The post AWS Storage Optimization: Avoid EBS Over-provisioning appeared first on simplyblock.

]]>
Origins of simplyblock and the Evolution of Storage Technologies https://www.simplyblock.io/blog/evolution-of-storage-technologies/ Fri, 20 Sep 2024 21:19:18 +0000 https://www.simplyblock.io/?p=1603 Introduction: In this episode of the simplyblock Cloud Commute Podcast, host Chris Engelbert interviews Michael Schmidt, co-founder of simplyblock. Michael shares insights into the evolution of storage technologies and how simplyblock is pushing boundaries with software-defined storage (SDS) to replace outdated hardware-defined systems. If you’re curious about how cloud storage is transforming through SDS and […]

Introduction:

In this episode of the simplyblock Cloud Commute Podcast, host Chris Engelbert interviews Michael Schmidt, co-founder of simplyblock. Michael shares insights into the evolution of storage technologies and how simplyblock is pushing boundaries with software-defined storage (SDS) to replace outdated hardware-defined systems. If you’re curious about how cloud storage is transforming through SDS and how it’s creating new possibilities for scalability and efficiency, this episode is a must-listen.

This interview is part of the simplyblock Cloud Commute Podcast, available on YouTube, Spotify, iTunes/Apple Podcasts, and our show site.

Key Takeaways

What is simplyblock, and how does it Differ from Traditional Storage Technologies?

Michael Schmidt explained that simplyblock is built on the idea that hardware-defined storage systems are becoming outdated. Traditional storage models, like SANs (Storage Area Networks), are slow-moving, expensive, and difficult to scale in cloud environments. Simplyblock, in contrast, leverages software-defined storage (SDS), making it more flexible, scalable, and hardware-agnostic. The key advantage is that SDS lets organizations operate independently of the hardware lifecycle and seamlessly scale their storage without the limitations of physical systems.

How does simplyblock Offer better Storage Performance for Kubernetes Clusters?

Simplyblock is optimized for Kubernetes environments through its CSI (Container Storage Interface) driver. Michael noted that deploying simplyblock on Kubernetes allows users to take advantage of local disk storage, NVMe devices, or standard gp3 volumes within AWS. This integration simplifies scaling and enhances storage performance with minimal configuration, making it highly adaptable for workloads that require high-speed, reliable storage.

EP30: A Brief History of Simplyblock and the Evolution of Storage Technologies | Michael Schmidt

Beyond the key takeaways, we provide context that enriches your understanding of the episode. This added layer of information sheds light on the reasoning and perspective behind the thoughtful questions posed by our host, Chris Engelbert, and makes for a more immersive and insightful listening experience.

Key Learnings

What are the Advantages of Software-defined Storage Compared to Hardware-defined Storage?

Software-defined storage offers flexibility by decoupling storage from physical hardware. This results in improved scalability, lifecycle management, and cost-effectiveness.

Simplyblock Insight:

Software-defined storage systems like simplyblock allow for hardware-agnostic scalability, enabling businesses to avoid hardware refresh cycles that burden CAPEX and OPEX budgets. SDS also opens up the possibility for greater automation and better integration with existing cloud infrastructures.

What is Thin Provisioning in Cloud Storage?

Thin provisioning allows cloud users to allocate storage without consuming the full provisioned capacity upfront, optimizing resource usage.

Simplyblock Insight:

Thin provisioning has been standard in enterprise storage systems for years, and simplyblock brings this essential feature to the cloud. By offering thin provisioning in its cloud-native architecture, simplyblock ensures that businesses can avoid over-provisioning and reduce storage costs, only paying for the storage they use. This efficiency significantly benefits organizations with unpredictable storage needs.
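
To illustrate the core mechanism, here is a toy Python model of a thinly provisioned volume that allocates physical extents only on first write. Real implementations add metadata, copy-on-write, and space reclamation; the extent size here is merely a plausible order of magnitude.

```python
# Toy model of thin provisioning: physical extents are allocated only when a
# logical block is first written. For brevity, writes are assumed not to
# cross extent boundaries.

EXTENT_SIZE = 4 * 1024 * 1024  # 4 MiB extents (illustrative size)

class ThinVolume:
    def __init__(self, virtual_size: int):
        self.virtual_size = virtual_size          # capacity the client sees
        self.extents: dict[int, bytearray] = {}   # allocated on first write

    def write(self, offset: int, data: bytes) -> None:
        ext = offset // EXTENT_SIZE
        chunk = self.extents.setdefault(ext, bytearray(EXTENT_SIZE))
        pos = offset % EXTENT_SIZE
        chunk[pos:pos + len(data)] = data

    @property
    def allocated(self) -> int:
        return len(self.extents) * EXTENT_SIZE    # physical space consumed

vol = ThinVolume(virtual_size=100 * 1024**3)  # client sees 100 GiB
vol.write(0, b"hello")                        # first write allocates 4 MiB
print(vol.virtual_size, vol.allocated)        # 107374182400 vs 4194304
```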

Additional Nugget of Information

Why are SLAs Important in Software-defined Storage, and how does Simplyblock Ensure Performance Reliability?

Service Level Agreements (SLAs) are crucial in software-defined storage because they guarantee specific performance metrics, such as IOPS (input/output operations per second), latency, and availability. In traditional hardware-defined storage systems, performance metrics were easier to predict due to standardized hardware configurations. However, with software-defined storage, where hardware can vary, SLAs provide customers with a level of assurance that the storage system will meet their needs consistently, regardless of the underlying infrastructure.

Conclusion

Michael Schmidt’s discussion offers a fascinating look at the evolving landscape of cloud storage. It’s clear that simplyblock is addressing key challenges by combining the flexibility of software-defined storage with the power of modern cloud-native architectures. Whether you’re managing large-scale Kubernetes deployments or trying to cut infrastructure costs, simplyblock’s approach to scalability and performance could be just what you need.

If you’re considering how to future-proof your storage solutions or make them more cost-efficient, the insights shared in this episode will be valuable. Be sure to explore the simplyblock platform and stay connected for more episodes of the Cloud Commute Podcast. We’re constantly bringing in experts to discuss the cutting-edge technologies shaping tomorrow’s infrastructure. Don’t miss out!

The post Origins of simplyblock and the Evolution of Storage Technologies appeared first on simplyblock.

9 Best Open Source Tools for Storage Performance Measurement https://www.simplyblock.io/blog/open-source-tools-for-storage-performance-measurment/ Tue, 24 Oct 2023 00:43:00 +0000 https://www.simplyblock.io/?p=3472 What is storage performance measurement? Accurately measuring storage performance is essential for optimizing the efficiency and reliability of your infrastructure. Open-source tools play a critical role in helping administrators and developers assess storage performance by providing insights into latency, throughput, and IOPS (input/output operations per second). These tools are invaluable for identifying bottlenecks, monitoring trends, […]

What is storage performance measurement?

Accurately measuring storage performance is essential for optimizing the efficiency and reliability of your infrastructure. Open-source tools play a critical role in helping administrators and developers assess storage performance by providing insights into latency, throughput, and IOPS (input/output operations per second). These tools are invaluable for identifying bottlenecks, monitoring trends, and ensuring that storage resources meet the demands of applications.

What are the best open-source tools for your storage performance measurement setup?

In this post, we will explore nine must-know open-source tools that can help you evaluate and improve your storage performance.

1. Fio (Flexible I/O Tester)

Fio is one of the most versatile tools available for benchmarking and testing storage performance. It allows you to simulate various I/O workloads to measure disk performance across different environments. With Fio, you can test read/write operations, random/sequential access patterns, and tune block sizes, helping you identify the capabilities and limitations of your storage system.
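
As a quick illustration, the sketch below runs a short fio job from Python and extracts headline numbers from its JSON output. The job parameters are arbitrary examples rather than a recommended benchmark profile, and the JSON field names match recent fio 3.x releases.

```python
# Sketch: run a short fio random-read benchmark and pull headline numbers
# from its JSON output. Assumes fio is installed, /tmp is writable, and fio
# emits clean JSON on stdout (true for recent fio 3.x versions).
import json
import subprocess

result = subprocess.run(
    ["fio",
     "--name=randread",           # job name
     "--filename=/tmp/fio.test",  # test file (created if missing)
     "--rw=randread",             # random reads
     "--bs=4k",                   # 4 KiB blocks
     "--size=256M",               # test file size
     "--runtime=15", "--time_based",
     "--output-format=json"],
    capture_output=True, text=True, check=True,
)

job = json.loads(result.stdout)["jobs"][0]
read = job["read"]
print(f"IOPS: {read['iops']:.0f}")
print(f"bandwidth: {read['bw'] / 1024:.1f} MiB/s")           # fio reports KiB/s
print(f"mean latency: {read['lat_ns']['mean'] / 1000:.1f} µs")
```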

2. Iometer

Originally developed by Intel, Iometer is a comprehensive storage benchmarking tool that supports both Windows and Linux platforms. It provides detailed reports on IOPS, bandwidth, and latency for a wide range of storage devices. Iometer’s ability to simulate multiple worker threads and diverse workloads makes it a popular choice for evaluating storage performance under heavy load.

3. sysstat

Sysstat is a collection of performance monitoring tools for Linux, offering a detailed view of system performance, including storage I/O. It provides metrics like disk utilization, throughput, and request latency. Tools like iostat from the sysstat suite are essential for real-time monitoring and long-term analysis of storage performance, helping administrators identify performance bottlenecks.
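
For example, a reasonably recent sysstat (11.5 or later) can emit iostat samples as JSON, which makes the metrics easy to consume programmatically. The sketch below collects one extended sample; exact field names may vary slightly between sysstat versions.

```python
# Sketch: collect extended iostat samples as JSON and print per-device
# figures. Requires sysstat 11.5+ for the -o JSON flag.
import json
import subprocess

out = subprocess.run(
    ["iostat", "-dx", "-o", "JSON", "1", "2"],  # 2 samples, 1 second apart
    capture_output=True, text=True, check=True,
).stdout

stats = json.loads(out)["sysstat"]["hosts"][0]["statistics"]
for disk in stats[-1]["disk"]:                  # last sample = current rates
    print(disk["disk_device"],
          "r/s:", disk.get("r/s"),              # .get() guards against
          "w/s:", disk.get("w/s"),              # version-specific key names
          "util%:", disk.get("util"))
```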

4. vdbench

Vdbench is an open-source benchmarking tool designed to measure the performance of storage systems with a focus on simulating complex workloads. It offers advanced features such as workload replay and stress testing, making it ideal for evaluating the behavior of storage devices under different I/O conditions. Vdbench is particularly useful for performance validation in virtualized and cloud environments.

5. Bonnie++

Bonnie++ is a simple yet effective tool for testing the performance of hard drives, file systems, and SSDs. It evaluates various parameters, including file read/write speed, random seeks, and file creation/deletion rates. Bonnie++ provides a quick snapshot of how your storage devices are performing, helping you compare different storage setups or tune your system for optimal performance.

6. iozone

Iozone is a robust file system benchmarking tool that measures read/write, random I/O, and throughput performance. It generates reports on file system performance for different record sizes and file sizes, making it ideal for testing both traditional hard drives and modern SSDs. Iozone's plottable, spreadsheet-compatible output and detailed metrics provide valuable insights into file system behavior under various loads.

7. perf

Perf is a performance analysis tool that’s part of the Linux kernel, capable of measuring various aspects of system performance, including storage I/O. With Perf, you can analyze how disk operations impact overall system performance, allowing you to correlate I/O patterns with CPU and memory utilization. This tool is useful for identifying how storage workloads affect the entire system.
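
As a small example, the sketch below uses perf to count block-layer request issues system-wide for five seconds via the block:block_rq_issue tracepoint. It typically needs root privileges (or a relaxed perf_event_paranoid setting).

```python
# Sketch: count block-layer request issues system-wide for five seconds with
# perf's block:block_rq_issue tracepoint. perf prints its summary to stderr.
import subprocess

proc = subprocess.run(
    ["perf", "stat",
     "-e", "block:block_rq_issue",  # fires once per issued block request
     "-a",                          # observe all CPUs
     "--", "sleep", "5"],           # measurement window
    capture_output=True, text=True,
)
print(proc.stderr)  # e.g. "12,345  block:block_rq_issue"
```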

8. Dstat

Dstat is a flexible real-time system performance monitoring tool that provides valuable metrics on disk I/O performance, including read/write speeds and IOPS. It combines features from various tools such as iostat, vmstat, and ifstat into a single interface, making it easier to visualize and track storage performance in real time. Dstat is an essential tool for quick diagnostics and performance tuning.

9. Blkio

Blkio refers to the Linux kernel's block I/O subsystem and its associated instrumentation: the blkio cgroup controller lets you account for and throttle block device I/O per group, while tracing tools like blktrace and blkparse tap the kernel's block-layer tracing to offer deep insights into how individual I/O operations are processed. Together they let you track throughput, latency, and IOPS, making them an excellent resource for troubleshooting and optimizing storage systems at the kernel level.
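
A minimal tracing session with these tools might look like the Python sketch below. The device path is a placeholder; both commands require root and a kernel built with block I/O tracing support.

```python
# Sketch: trace block I/O on one device for five seconds, then decode the
# trace with blkparse. Requires root and CONFIG_BLK_DEV_IO_TRACE in the kernel.
import subprocess

DEVICE = "/dev/nvme0n1"  # placeholder: replace with the device to trace

# -w 5 stops tracing after five seconds; -o names the output trace files.
subprocess.run(["blktrace", "-d", DEVICE, "-w", "5", "-o", "trace"], check=True)

# blkparse turns the per-CPU binary traces into a readable event log and
# prints summary statistics (reads/writes, merges, throughput) at the end.
subprocess.run(["blkparse", "-i", "trace"], check=True)
```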

Why Choose simplyblock for Storage Performance Measurement?

Storage performance measurement requires sophisticated analysis of I/O patterns, latency profiles, and throughput characteristics across different storage layers. This is where simplyblock’s intelligent orchestration creates unique value:

  • Comprehensive I/O Analysis: Simplyblock implements advanced storage performance monitoring with deep insights into I/O behavior. The platform correlates IOPS, latency, and throughput metrics across different storage tiers, analyzing both sequential and random access patterns. It automatically profiles workload characteristics, monitors queue depths, and tracks I/O size distributions to provide a complete understanding of storage performance bottlenecks and optimization opportunities.
  • Intelligent Performance Optimization: Simplyblock manages complex storage performance tuning by implementing adaptive I/O scheduling and intelligent caching strategies. The platform continuously monitors storage device capabilities, automatically adjusts block sizes and queue depths based on workload patterns, and optimizes read-ahead and write-back settings for different storage technologies, from NVMe SSDs to traditional HDDs, ensuring optimal performance across varied workloads.
  • Enterprise-Grade Performance Management: Through Kubernetes integration, simplyblock automates critical performance management tasks. This includes sophisticated I/O throttling mechanisms, quality of service management across multiple tenants, and detailed performance analytics with historical trending. The platform provides comprehensive monitoring of storage latency distributions, IOPS utilization, and throughput patterns while maintaining performance isolation between workloads.

How to Optimize Storage Performance Measurement with Open-source Tools

This guide explored nine essential open-source tools for storage performance measurement, from Fio’s versatile benchmarking capabilities to Blkio’s kernel-level analytics. While these tools excel at different aspects – Iometer for comprehensive testing, sysstat for system-wide monitoring, and vdbench for workload simulation – proper implementation is crucial. Tools like Bonnie++ and iozone enable filesystem-specific testing, while perf and Dstat provide real-time performance insights. Each tool offers unique capabilities for understanding storage behavior across different workload patterns.

If you’re looking to further streamline your storage performance measurement processes, simplyblock offers comprehensive solutions that integrate seamlessly with these tools, helping you get the most out of your storage infrastructure.

Ready to optimize your storage performance? Contact simplyblock today to learn how we can help you enhance your storage infrastructure with high-performance, low-latency solutions tailored to your specific needs.

The post 9 Best Open Source Tools for Storage Performance Measurement appeared first on simplyblock.

Simplyblock for AWS – Whitepaper https://www.simplyblock.io/blog/simplyblock-for-aws-whitepaper/ Wed, 05 Apr 2023 12:06:04 +0000 https://www.simplyblock.io/?p=330 Simplyblock has recognized the limitations of conventional cloud storage services, specifically AWS Elastic Block Storage (EBS) , and has introduced a groundbreaking alternative. This whitepaper presents a comprehensive overview of simplyblock’s solution, highlighting its ability to address the drawbacks of EBS while delivering enhanced performance and cost-effectiveness. By harnessing local NVMe storage attached to virtualized […]

Simplyblock has recognized the limitations of conventional cloud storage services, specifically AWS Elastic Block Store (EBS), and has introduced a groundbreaking alternative. This whitepaper presents a comprehensive overview of simplyblock's solution, highlighting its ability to address the drawbacks of EBS while delivering enhanced performance and cost-effectiveness. By harnessing local NVMe storage attached to virtualized or bare-metal EC2 instances, simplyblock offers a high-performance block storage option that outperforms traditional storage solutions and provides a compelling alternative to Amazon EBS.

The core architecture of simplyblock revolves around cluster-based infrastructure, ensuring remarkable reliability, durability, and availability. With automated health-checking mechanisms and seamless scalability, simplyblock enables smooth operations without compromising performance or data integrity. This unique approach empowers users to optimize resource utilization and achieve significant cost savings, making it an ideal choice for organizations with demanding workloads and stringent performance requirements.

By leveraging simplyblock’s solution, businesses can break free from the limitations of AWS EBS and experience unparalleled performance and efficiency. The flexible deployment options, including virtualized or bare-metal EC2 instances, coupled with seamless integration with AWS infrastructure, provide a hassle-free experience for administrators and developers. With simplyblock’s innovative approach to high-performance block storage, organizations can unlock new possibilities and overcome the limitations of traditional cloud storage services.

Feature & Price Comparison: Simplyblock vs EBS

Get the “Simplyblock for AWS” White Paper Now:

Simplyblock for AWS

Read more on our website: https://www.Simplyblock.io/aws-storage

The post Simplyblock for AWS – Whitepaper appeared first on simplyblock.
