Persistent Volume Archives | simplyblock

Kubernetes Storage 201: Concepts and Practical Examples
https://www.simplyblock.io/blog/kubernetes-storage-concepts/

What is Kubernetes Storage?

Kubernetes storage is a sophisticated ecosystem designed to address the complex data management needs of containerized applications. At its core, Kubernetes storage provides a flexible mechanism to manage data across dynamic, distributed computing environments. It allows your containers to store, access, and persist data with unprecedented flexibility.

Storage Types in Kubernetes

Fundamentally, Kubernetes provides two types of storage: ephemeral volumes, which are bound to the container’s lifecycle, and persistent volumes, which survive container restarts and termination.

Ephemeral (Non-Persistent) Storage

Ephemeral storage represents the default storage mechanism in Kubernetes. It provides a temporary storage solution, existing only for the duration of a container’s lifecycle. Therefore, when a container is terminated or removed, all data stored in this temporary storage location is permanently deleted.

This type of storage is ideal for transient data that doesn’t require long-term preservation, such as temporary computation results or cache files. Most stateless workloads utilize ephemeral storage for these kinds of temporary data. That said, a “stateless workload” doesn’t necessarily mean no data is stored temporarily. It means there is no issue if this storage disappears from one second to the next.
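
As a brief illustration, an ephemeral scratch volume can be declared directly in the pod specification using an emptyDir volume. This is a minimal sketch; the pod name, image, and mount path are placeholders, not part of any specific deployment:

# Example pod with an ephemeral emptyDir volume
apiVersion: v1
kind: Pod
metadata:
  name: cache-worker
spec:
  containers:
    - name: worker
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch      # temporary data lives here
  volumes:
    - name: scratch
      emptyDir: {}                 # deleted together with the pod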

Persistent Storage

Persistent storage is a critical concept in Kubernetes that addresses one of the fundamental challenges of containerized applications: maintaining data integrity and accessibility across dynamic and ephemeral computing environments.

Unlike ephemeral storage, which exists only for the lifetime of a container, persistent storage is not bound to the lifetime of a container. Hence, persistent storage provides a robust mechanism for storing and managing data that must survive container restarts, pod rescheduling, or even complete cluster redesigns. You enable persistent Kubernetes storage through the concepts of Persistent Volumes (PV) as well as Persistent Volume Claims (PVC).

Fundamental Kubernetes Storage Entities

The building blocks of Kubernetes Storage (Persistent Volume, Persistent Volume Claim, Container Storage Interface, Volume, Storage Class)
Figure 1: The building blocks of Kubernetes Storage

Storage in Kubernetes is built up from multiple entities, depending on how storage is provided and if it is ephemeral or persistent.

Persistent Volumes (PV)

A Persistent Volume (PV) is a slice of storage in the Kubernetes cluster that has been provisioned by an administrator or dynamically created through a StorageClass. Think of a PV as a virtual storage resource that exists independently of any individual pod’s lifecycle. Consequently, this abstraction decouples storage provisioning from pod scheduling: a PV can outlive the pods that use it and is retained, deleted, or recycled according to its reclaim policy once its claim is released.

Persistent Volume Claims (PVC): Requesting Storage Resources

Persistent Volume Claims act as a user’s request for storage resources. Imagine your PVC as a demand for storage with specific requirements, similar to how a developer requests computing resources.

When a user creates a PVC, Kubernetes attempts to find and bind an appropriate Persistent Volume that meets the specified criteria. If no existing volume is found but a storage class is defined or a cluster-default one is available, the persistent volume will be provisioned dynamically.

Key PersistentVolumeClaim Characteristics:

  • Size Specification: Defines the storage capacity the user requests
  • Access Modes: Defines how the volume can be accessed
    • ReadWriteOnce (RWO): Allows all pods on a single node to mount the volume in read-write mode.
    • ReadWriteOncePod: Allows a single pod to read-write mount the volume on a single node.
    • ReadOnlyMany (ROX): Allows multiple pods on multiple nodes to read the volume. Very practical for a shared configuration state.
    • ReadWriteMany (RWX): Allows multiple pods on multiple nodes to read and write to the volume. Remember, this could be dangerous for databases and other applications that don’t support shared state.
  • StorageClass: Allows requesting specific types of storage based on performance, redundancy, or other characteristics
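
To make this more concrete, a minimal PersistentVolumeClaim could look like the following sketch; the claim name, storage class name, and size are illustrative placeholders:

# Example PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce              # single-node read-write access
  storageClassName: fast-ssd     # request a specific storage class (placeholder name)
  resources:
    requests:
      storage: 20Gi              # requested capacity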

The Container Storage Interface (CSI)

The Container Storage Interface (CSI) represents a pivotal advancement in Kubernetes storage architecture. Before CSI, integrating storage devices with Kubernetes was a complex and often challenging process that required a deep understanding of both storage systems and container orchestration.

The Container Storage Interface introduces a standardized approach to storage integration. Storage providers (commonly referred to as CSI drivers) are so-called out-of-process entities that communicate with Kubernetes via an API. The integration of CSI into the Kubernetes ecosystem provides three major benefits:

  1. CSI provides a vendor-neutral, extensible plugin architecture
  2. CSI simplifies the process of adding new storage systems to Kubernetes
  3. CSI enables third-party storage providers to develop and maintain their own storage plugins without modifying Kubernetes core code

Volumes: The Basic Storage Units

In Kubernetes, volumes are fundamental storage entities that solve the problem of data persistence and sharing between containers. Unlike traditional storage solutions, Kubernetes volumes are not limited to a single type of storage medium. They can represent local disks on the worker node, network-attached storage, cloud provider block devices, or even configuration data such as ConfigMaps and Secrets.

Volumes provide a flexible abstraction layer that allows applications to interact with storage resources without being directly coupled to the underlying storage infrastructure.

StorageClasses: Dynamic Storage Provisioning

StorageClasses represent a powerful abstraction that enables dynamic and flexible storage provisioning because they allow cluster administrators to define different types of storage services with varying performance characteristics, such as:

  • High-performance SSD storage
  • Economical magnetic drive storage
  • Geo-redundant cloud storage solutions

When a user requests storage through a PVC, Kubernetes first tries to find an existing persistent volume. If none is found, the appropriate StorageClass defines how to automatically provision a suitable storage resource, significantly reducing administrative overhead.
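
As a generic sketch of such a class (the provisioner name and parameters are placeholders and vary by storage backend), a StorageClass for dynamic provisioning might look like this:

# Example StorageClass for dynamic provisioning
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: example.csi.vendor.io       # the CSI driver responsible for provisioning (placeholder)
reclaimPolicy: Delete                    # delete the volume when the PVC is removed
volumeBindingMode: WaitForFirstConsumer  # provision once a consuming pod is scheduled
parameters:
  type: ssd                              # backend-specific parameter (placeholder)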

Table with features for ephemeral storage and persistent storage
Figure 2: Table with features for ephemeral storage and persistent storage

Best Practices for Kubernetes Storage Management

  1. Resource Limitation
    • Implement strict resource quotas (see the ResourceQuota sketch after this list)
    • Control storage consumption across namespaces
    • Set clear boundaries for storage requests
  2. Configuration Recommendations
    • Always use Persistent Volume Claims in container configurations
    • Maintain a default StorageClass
    • Use meaningful and descriptive names for storage classes
  3. Performance and Security Considerations
    • Implement quality of service (QoS) controls
    • Create isolated storage environments
    • Enable multi-tenancy through namespace segregation
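
For the resource-limitation practice above, a namespace-scoped ResourceQuota can cap both the number of claims and the total requested capacity. This is a minimal sketch; the namespace name and values are examples only:

# Example ResourceQuota limiting storage consumption in a namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: team-a
spec:
  hard:
    persistentvolumeclaims: "10"     # at most 10 PVCs in this namespace
    requests.storage: 500Gi          # total requested capacity across all PVCs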

Practical Storage Provisioning Example

While specific implementations vary, here’s a conceptual example of storage provisioning using Helm:

helm install storage-solution storage-provider/csi-driver \
  --set storage.size=100Gi \
  --set storage.type=high-performance \
  --set access.mode=ReadWriteMany

Kubernetes Storage with Simplyblock CSI: Practical Implementation Guide

Simplyblock is a storage platform for stateful workloads such as databases, message queues, data warehouses, file storage, and similar. Therefore, simplyblock provides many features tailored to these use cases, simplifying deployments, improving performance, and enabling capabilities such as instant database clones.

Basic Installation Example

When deploying storage in a Kubernetes environment, organizations need a reliable method to integrate storage solutions seamlessly. The Simplyblock CSI driver installation process begins by adding the Helm repository, which allows teams to easily access and deploy the storage infrastructure. By creating a dedicated namespace called simplyblock-csi, administrators ensure clean isolation of storage-related resources from other cluster components.

The installation command specifies critical configuration parameters that connect the Kubernetes cluster to the storage backend. The unique cluster UUID identifies the specific storage cluster, while the API endpoint provides the connection mechanism. The secret token ensures secure authentication, and the pool name defines the initial storage pool where volumes will be provisioned. This approach allows for a standardized, secure, and easily repeatable storage deployment process.

Here’s an example of installing the Simplyblock CSI driver:

helm repo add simplyblock-csi https://raw.githubusercontent.com/simplyblock-io/simplyblock-csi/master/charts

helm repo update

helm install -n simplyblock-csi --create-namespace \
  simplyblock-csi simplyblock-csi/simplyblock-csi \
  --set csiConfig.simplybk.uuid=[random-cluster-uuid] \
  --set csiConfig.simplybk.ip=[cluster-ip] \
  --set csiSecret.simplybk.secret=[random-cluster-secret] \
  --set logicalVolume.pool_name=[cluster-name]

Advanced Configuration Scenarios

1. Performance-Optimized Storage Configuration

Modern applications often require precise control over storage performance, making custom StorageClasses invaluable.

Firstly, by creating a high-performance storage class, organizations can define exact performance characteristics for different types of workloads. The configuration sets a specific IOPS (Input/Output Operations Per Second) limit of 5000, ensuring that applications receive consistent and predictable storage performance.

Secondly, bandwidth limitations of 500 MB/s prevent any single application from monopolizing storage resources, promoting fair resource allocation. The added encryption layer provides an additional security measure, protecting sensitive data at rest. This approach allows DevOps teams to create storage resources that precisely match application requirements, balancing performance, security, and resource management.

# Example StorageClass configuration
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: high-performance-storage
provisioner: csi.simplyblock.io
parameters:
  qos_rw_iops: "5000"    # High IOPS performance
  qos_rw_mbytes: "500"   # Bandwidth limit
  encryption: "True"      # Enable encryption

2. Multi-Tenant Storage Setup

As a large organization or cloud provider, you require a robust environment and workload separation mechanism. For that reason, teams organize workloads between development, staging, and production environments by creating a dedicated namespace for production applications.

Therefore, the custom storage class for production workloads ensures critical applications have access to dedicated storage resources with specific performance and distribution characteristics.

The distribution configuration with multiple network domain controllers (NDCs) provides enhanced reliability and performance. Indeed, this approach supports complex enterprise requirements by enabling granular control over storage resources, improving security, and ensuring that production workloads receive the highest quality of service.

# Namespace-based storage isolation
apiVersion: v1
kind: Namespace
metadata:
  name: production-apps
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: encrypted-volume
  namespace: production-apps        # place the claim in the isolated namespace
  annotations:
    simplybk/secret-name: encrypted-volume-keys
spec:
  storageClassName: encrypted-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi

Multipath Storage Configuration

Network resilience is a critical consideration in enterprise storage solutions. Hence, multipath storage configuration provides redundancy by allowing multiple network paths for storage communication. By enabling multipathing and specifying a default network interface, organizations can create more robust storage infrastructures that can withstand network interruptions.

The caching node creation further enhances performance by providing an intelligent caching layer that can improve read and write operations. Furthermore, this configuration supports load balancing and reduces potential single points of failure in the storage network.

cachingnode:
  create: true
  multipathing: true
  ifname: eth0  # Default network interface

Best Practices for Kubernetes Storage with Simplyblock

  1. Always specify a unique pool name for each storage configuration
  2. Implement encryption for sensitive workloads
  3. Use QoS parameters to control storage performance
  4. Leverage multi-tenancy features for environment isolation
  5. Regularly monitor storage node capacities and performance

Deletion and Cleanup

# Uninstall the CSI driver
helm uninstall "simplyblock-csi" --namespace "simplyblock-csi"

# Remove the namespace
kubectl delete namespace simplyblock-csi

The examples demonstrate the flexibility of Kubernetes storage, showcasing how administrators can fine-tune storage resources to meet specific application requirements while maintaining performance, security, and scalability. Try simplyblock for the most flexible Kubernetes storage solution on the market today.

NVMe & Kubernetes: Future-Proof Infrastructure
https://www.simplyblock.io/blog/nvme-kubernetes-future-proof-infrastructure/

The marriage of NVMe storage and Kubernetes persistent volumes represents a perfect union of high-performance storage and modern container orchestration. As organizations increasingly move performance-critical workloads to Kubernetes, understanding how to leverage NVMe technology becomes crucial for achieving optimal performance and efficiency.

The Evolution of Storage in Kubernetes

When Kubernetes was created over 10 years ago, its only purpose was to schedule and orchestrate stateless workloads. Since then, a lot has changed, and Kubernetes is increasingly used for stateful workloads, and not just basic ones but mission-critical workloads like a company’s primary databases. The promise of workload orchestration in increasingly complex infrastructures is simply too significant to ignore.

Anyhow, traditional Kubernetes storage solutions relied upon and still rely on old network-attached storage protocols like iSCSI. Released in 2000, iSCSI was built on the SCSI protocol, itself first introduced in the 1980s. As a result, both protocols are inherently designed for spinning disks with much higher seek times and access latencies. Measured against modern expectations of low latency and low complexity, they just can’t keep up.

While these solutions worked well for basic containerized applications, they fall short for high-performance workloads like databases, AI/ML training, and real-time analytics. Let’s look at the NVMe standard, particularly NVMe over TCP, which has transformed our thinking about storage in containerized environments, not just Kubernetes.

Why NVMe and Kubernetes Work So Well Together

The beauty of this combination lies in their complementary architectures. The NVMe protocol and command set were designed from the ground up for parallel, low-latency operations–precisely what modern containerized applications demand. When you combine NVMe’s parallelism with Kubernetes’ orchestration capabilities, you get a system that can efficiently distribute I/O-intensive workloads while maintaining microsecond-level latency. Further, comparing NVMe over TCP vs iSCSI, we see significant improvement in terms of IOPS and latency performance when using NVMe/TCP.

Consider a typical database workload on Kubernetes. Traditional storage might introduce latencies of 2-4ms for read operations. With NVMe over TCP, these same operations complete in under 200 microseconds–a 10-20x improvement. This isn’t just about raw speed; it’s about enabling new classes of applications to run effectively in containerized environments.

The Technical Symphony

The integration of NVMe with Kubernetes is particularly elegant through persistent volumes and the Container Storage Interface (CSI). Modern storage orchestrators like simplyblock leverage this interface to provide seamless NVMe storage provisioning while maintaining Kubernetes’ declarative model. This means development teams can request high-performance storage using familiar Kubernetes constructs while the underlying system handles the complexity of NVMe management, providing fully reliable shared storage.

The NVMe Impact: A Real-World Example

But what does that mean for actual workloads? Our friends over at Percona found in their MongoDB Performance on Kubernetes report that Kubernetes implies no performance penalty. Hence, we can look at the disks’ actual raw performance.

A team of researchers from the University of Southern California, San Jose State University, and Samsung Semiconductor took on the challenge of measuring the implications of NVMe SSDs (over SATA SSD and SATA HDD) for real-world database performance.

The general performance characteristics of their test hardware:

                      NVMe SSD     SATA SSD    SATA HDD
  Access latency      113µs        125µs       14,295µs
  Maximum IOPS        750,000      70,000      190
  Maximum Bandwidth   3GB/s        278MB/s     791KB/s

Table 1: General performance characteristics of the different storage types

Their summary states, “scale-out systems are driving the need for high-performance storage solutions with high available bandwidth and lower access latencies. To address this need, newer standards are being developed exclusively for non-volatile storage devices like SSDs,” and “NVMe’s hardware and software redesign of the storage subsystem translates into real-world benefits.”

They close with some direct comparisons, claiming an 8x performance improvement for NVMe-based SSDs compared to a single SATA-based SSD, and still a 5x improvement over a hardware RAID-0 of four SATA-based SSDs.

Transforming Database Operations

Perhaps the most compelling use case for NVMe in Kubernetes is database operations. Typical modern databases process queries significantly faster when storage isn’t the bottleneck. This becomes particularly important in microservices architectures where concurrent database requests and high-load scenarios are the norm.

Traditionally, running stateful services in Kubernetes meant accepting significant performance overhead. With NVMe storage, organizations can now run high-performance databases, caches, and messaging systems with bare-metal-like performance in their Kubernetes clusters.

Dynamic Resource Allocation

One of Kubernetes’ central promises is dynamic resource allocation. That means assigning CPU and memory according to actual application requirements. Furthermore, it also means dynamically allocating storage for stateful workloads. With storage classes, Kubernetes provides the option to assign different types of storage backends to different types of applications. While not strictly necessary, this can be a great application of the “best tool for the job” principle.

That said, for IO-intensive workloads, such as databases, a storage backend providing NVMe storage is essential. NVMe’s ability to handle massive I/O parallelism aligns perfectly with Kubernetes’ scheduling capabilities. Storage resources can be dynamically allocated and deallocated based on workload demands, ensuring optimal resource utilization while maintaining performance guarantees.
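
As a sketch of how this looks in practice, a database StatefulSet can request NVMe-backed volumes per replica through a volumeClaimTemplate. The storage class name nvme-high-iops is a placeholder for whatever NVMe-backed class the cluster offers, and the workload details are illustrative:

# Example StatefulSet excerpt requesting NVMe-backed volumes per replica
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: nvme-high-iops   # NVMe-backed class (placeholder name)
        resources:
          requests:
            storage: 200Gi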

Simplified High Availability

The low latency of NVMe over TCP enables new approaches to high availability. Instead of complex database replication schemes, organizations can leverage storage-level replication (or more storage-efficient erasure coding, like in the case of simplyblock) with a negligible performance impact. This significantly simplifies application architecture while improving reliability.

Furthermore, NVMe over TCP utilizes multipathing as an automatic fail-over implementation to protect against network connection issues and sudden connection drops, increasing the high availability of persistent volumes in Kubernetes.

The Physics Behind NVMe Performance

Many teams don’t realize how profoundly storage physics impacts database operations. Traditional storage solutions averaging 2-4ms latency might seem fast, but this translates to a hard limit of about 80 consistent transactions per second, even before considering CPU or network overhead. Each transaction requires multiple storage operations: reading data pages, writing to the WAL, updating indexes, and performing one or more fsync() operations. At roughly 3ms per operation and about four serialized operations per transaction, that works out to roughly 1 / (4 × 3 ms) ≈ 83 transactions per second, so these delays quickly stack up. Many teams spend weeks optimizing queries or adding memory when their real bottleneck is fundamental storage latency.

This is where the NVMe and Kubernetes combination truly shines. With NVMe as your Kubernetes persistent volume storage backend, providing sub-200μs latency, the same database operations can theoretically support over 1,200 transactions per second–a 15x improvement. More importantly, this dramatic reduction in storage latency changes how databases behave under load. Connection pools remain healthy longer, buffer cache decisions become more efficient, and query planners can make better optimization choices. With the storage bottleneck removed, databases can finally operate closer to their theoretical performance limits.

Looking Ahead

The combination of NVMe and Kubernetes is just beginning to show its potential. As more organizations move performance-critical workloads to Kubernetes, we’ll likely see new patterns and use cases that fully take advantage of this powerful combination.

Some areas to watch:

  • AI/ML workload optimization through intelligent data placement
  • Real-time analytics platforms leveraging NVMe’s parallel access capabilities
  • Next-generation database architectures built specifically for NVMe on Kubernetes Persistent Volumes

The marriage of NVMe-based storage and Kubernetes Persistent Volumes represents more than just a performance improvement. It’s a fundamental shift in how we think about storage for containerized environments. Organizations that understand and leverage this combination effectively gain a significant competitive advantage through improved performance, reduced complexity, and better resource utilization.

For a deeper dive into implementing NVMe storage in Kubernetes, visit our guide on optimizing Kubernetes storage performance.

Avoiding Storage Lock-in: Block Storage Migration with Simplyblock
https://www.simplyblock.io/blog/avoiding-storage-lock-in-block-storage-migration-with-simplyblock/

Storage, and particularly block storage, is generally easy to migrate. It doesn’t create vendor lock-in, which is very different from most database systems. Therefore, it’s worth briefly outlining where this difference comes from.

Why is Vendor Lock-In Dangerous?

For most companies, data is the most crucial part of their business. Therefore, it is dangerous to forfeit control over how this incredibly important asset is stored, and storage vendor lock-in poses a significant risk to these businesses. When a company becomes overly reliant on a single storage provider or technology, it can find itself trapped in an inflexible situation with far-reaching consequences.

The dangers of vendor lock-in manifest in several ways:

  1. Limited flexibility when requirements change or scalability needs grow.
  2. Potential cost increases when vendors suddenly raise prices, knowing that customers depend on their services and that migration is complicated or costly.
  3. Innovation constraints while other vendors offer more advanced features.
  4. Data migration challenges when moving large amounts of data from one system to another, which can be complex and expensive.
  5. Reduced bargaining power due to limited alternatives and complex migrations.
  6. Business continuity risks if a vendor faces issues or goes out of business.
  7. Compatibility problems caused by proprietary formats or APIs that limit interoperability and compatibility.

These factors can lead to increased operational costs, decreased competitiveness, and potential disruptions to business continuity. As such, it’s crucial for organizations to carefully consider their storage strategies and implement measures to mitigate risks of vendor lock-in.

Migrating Block Storage on Linux

The interfaces provided by a database system are extremely complex compared to block storage. While some of it is standardized in SQL, there are a lot of system specifics in data and administrative interfaces. Migrating from one database system to another—or even upgrading a release—requires entire projects.

On the other hand, the block storage interface on Linux is extremely simple in its essence, it allows you to write, read, or delete (trim) a range of blocks. The NVMe protocol itself is a bit more complicated, but is fully standardized (industry standard, managed by NVM Express, Inc.) and the majority of advanced features are neither required nor used. Most commonly they aren’t even accessible through the Linux block storage interface.

In essence, to migrate block storage under Linux, just follow a few simple steps, which have to be performed volume by volume:

  1. Take your volume offline
  2. Create your new volume of the same size or larger
  3. Copy or replicate the data on block level (under Linux, just use dd)
  4. Potentially resize the filesystem (if necessary)
  5. Verify the results

In some cases, it is even possible to lower or eliminate the downtime and perform online replication. For example, the Linux Volume Manager (LVM) provides a feature to move data between physical volumes for a particular logical volume under the hood and while the volume is online (pvmove).
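
A minimal sketch of these steps on a plain Linux host might look like the following; the device paths and mount point are placeholders, and an ext4 filesystem is assumed:

# 1. Take the volume offline
umount /mnt/data

# 2./3. Copy the data block by block to the new (equal or larger) device
dd if=/dev/old-volume of=/dev/new-volume bs=4M status=progress conv=fsync

# 4. Grow the filesystem if the target device is larger (ext4 example)
e2fsck -f /dev/new-volume
resize2fs /dev/new-volume

# 5. Verify and mount the new volume in place of the old one
mount /dev/new-volume /mnt/data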

When operating in a Kubernetes-based environment, this simple migration is still perfectly available when using file or block storage.

Migration to Simplyblock

Migrating to simplyblock is as easy as it gets. Any software that writes to raw disks will work with a standard block device.
Figure 1: Migrating to simplyblock is easy. Any software that writes to raw disks will work as a standard block device.

Migrating from any block storage to a simplyblock logical volume (virtual NVMe block device) is simple and supported through sbcli (the simplyblock command line interface).

Plain Linux

Within a plain Linux environment, it is possible to use sbcli migrate with a list of block storage volumes as input. The necessary and corresponding simplyblock logical volumes are created first. Those volumes may be of the same size or larger. The source volumes are then unmounted, and volume-level replication takes place. Finally, the source volumes may be deleted, and the replicated volumes are mounted instead.

Kubernetes

To migrate existing PVCs (Persistent Volume Claim) from any underlying storage, we need to first replicate them into simplyblock. Simplyblock’s internal Kubernetes job sbclimigrate can automatically select all PVs (Persistent Volume) of a particular type, storage class, or label. During the migration, PVCs may still be active, meaning that PVs can be mounted, but pods must be stopped.

Simplyblock will then create corresponding volumes and PVCs. Afterwards it will replicate the source’s content over to the new volumes, and deploy them under the same mount points.

Optionally, it is possible to resize the volumes during this process and to automatically delete the originals when the process finishes successfully.

Migration from Simplyblock

Migrating a specific volume away from simplyblock is just as easy. Outside of Kubernetes, using dd is the easiest way with the source and destination volumes being unmounted and just copied.

Inside a Kubernetes environment, the process of migrating block and file storage is straightforward, too.

Individual PVs can simply be backed up after deleting the PVC. Make sure that the lifecycles of the PV and PVC aren’t bound to each other; otherwise, Kubernetes will delete the PV in the process. Afterwards, the PV can be restored to new volumes and eventually re-mounted as a new PVC.

Velero is a tool that greatly helps to simplify this process.
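
Assuming Velero is already installed and configured in the cluster (including a volume snapshotter or file-system backup mechanism for the persistent volume data), a minimal backup-and-restore flow could look like this sketch; the backup and namespace names are placeholders:

# Back up a namespace, including its persistent volume claims
velero backup create pre-migration --include-namespaces my-app

# Later, restore the backup into the (new) cluster or namespace
velero restore create --from-backup pre-migration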

Simplyblock: Storage without Vendor Lock-in

Utilizing block storage brings the best of all worlds: easy migration, compatibility through standardized interfaces, and the freedom to choose the best tool for the job by mixing different block storage options.

Simplyblock embraces the fact that there is no one-fits-all solution and enables full interoperability and compatibility with the default standard interfaces in modern computing systems, such as block storage and the NVMe protocol. Hence, simplyblock’s logical volumes provide an easy migration path from and to simplyblock.

However, simplyblock logical volumes provide additional features that make users want to stay.

Simplyblock volumes are full copy-on-write block storage devices which enable immediate snapshots and clones. Those can be used for fast backups or to implement features such as database branching, enabling fast turnaround times when implementing customer-facing functionality.

Furthermore, multi-tenancy (including encryption keys per volume) and thin provisioning enable storage virtualization with overprovisioning. Making use of the fact that a typical storage utilization is around 30% brings down bundled storage requirements by 70% and provides a great way to optimize for cost efficiency. Additional features such as deduplication can decrease storage needs even further.

All this and more makes simplyblock, the intelligent storage orchestrator, the perfect storage solution for database operators and everyone who operates stateful Kubernetes workloads that require high performance and low latency block or file storage.

What is a Kubernetes Persistent Volume?
https://www.simplyblock.io/blog/what-is-a-kubernetes-persistent-volume/

A persistent volume is a slice of storage provisioned by a Kubernetes administrator that can be attached and mounted to pods. Like everything in Kubernetes, it is a resource inside the cluster, and its lifecycle is either bound to the lifecycle of the pod or can survive pod deletions. It is backed by storage provided through the container storage interface (CSI).

Stateful Workloads and Data Persistence

Simplyblock's architecture showing how simplyblock is used with Kubernetes CSI
Simplyblock CSI Driver for Kubernetes Persistent Volumes

Originally designed to be stateless, containers were supposed to also be ephemeral and lightweight. They were designed to “boot” quickly and be small, maybe a few megabytes in size, sharing much of their host operating system.

This design quickly became a hassle, and people realized that you often have to persist data between container restarts. Some of this storage can be ephemeral (living until the pod ceases to exist) or persistent (which will stay alive indefinitely). This is specifically important for applications like databases or logs, as well as many other types of applications that need to hold serialized session information or similar states.

In general, the bigger the container-based landscape, the higher the chance of having stateful workloads in your deployments, especially with large Kubernetes deployments consisting of hundreds of nodes.

Using the CSI and its storage plugins, it is possible and recommended (at least by us) to separate the storage and compute infrastructure. This disaggregated architecture enables independent scalability and allows users to choose the best tool for the job.

To bind a persistent volume to a Kubernetes container, a so-called persistent volume claim (PVC) is required, which contains the container’s storage requirement definition. That includes the StorageClass, basically what type of backing storage is requested (normally a 1:1 mapping to a specific CSI implementation, like simplyblock’s driver, or a combination of the CSI implementation and some characteristics, such as a performance policy), but also the requested size and the lifecycle binding between the (persistent) volume claim and its persistent volume.

Why are Kubernetes Persistent Volumes Essential?

One could ask why not just directly bind to directories or other storage backends. The whole point of Kubernetes is to abstract away the underlying operating system, or better said, the whole environment outside of Kubernetes. This enables developers and DevOps folks to make specific assumptions about the environment, bringing development and production environments closer together. Especially for developers, this is a big win since debugging is easier than ever before.

To achieve this abstraction, applications, and containers should not make assumptions about the underlying storage (or any other resource). Hence, provisioning storage is part of the actual deployment process.

Managing persistent volume resources this way has quite a few benefits:

  1. Decoupling: PVs decouple the storage provisioning from the pod or container lifecycle, allowing more flexibility and independence in managing storage resources.
  2. Storage Persistence: PVs offer persistent storage, which survives pod terminations and rescheduling.
  3. Dynamic Provisioning: PVs can be dynamically provisioned, enabling automatic on-demand creation and management according to the application requirements.
  4. Portability: While not strictly true for all backing storage types, PVs enable the separation of concerns by providing access to local and remote storage options.
  5. Resource Management: PVs, or better said, their storage class implementations, can provide better resource management by virtualizing the storage and sharing access to the same backing storage, hence reducing storage redundancy and optimizing resource allocation.

Types of Persistent Storage in Kubernetes

Kubernetes enables a wide variety of application deployments. Therefore, it also offers many storage options. While there are many more, we want to focus on the two specific ones that enable Kubernetes persistent storage, speed, and high availability (at least to some extent).

Kubernetes persistent volumes via storage plugins (CSI drivers) are your best bet for persistent storage and offer the widest variety of backing storage. Implementations can either offer access to an independently running storage cluster or appliance (disaggregated) or use the local storage of the Kubernetes worker nodes and offer it as a shared, clustered resource or just as an enhanced version of local mounts (hyperconverged). This type of storage is specifically interesting to persistent volumes since many companies provide enhanced features such as immediate snapshots, clones, thin provisioning, and more.

Editorial: Simplyblock provides the industry-leading cloud-native Kubernetes Enterprise storage to enable IO-intensive and latency-sensitive stateful workloads (like databases) that need increased reliability, speed, and cost-effectiveness. Additionally, simplyblock offers advanced deployment options, which include hyperconverged, disaggregated, and a combination of the two.

HostPath volumes are backed by a directory on the host machine. They stay intact when the pod or container is terminated or even deleted and can be reattached to a new pod or container as long as the new one is scheduled and created on the same Kubernetes worker node. This offers some type of durability but comes with the additional complexity of ensuring containers live on specific workers. While tainting nodes may work, a globally available (in the sense of across the Kubernetes cluster) backing storage will be easier and enable better resource utilization. This type is often used for speed requirements, taking a hit on high availability and fault tolerance.

Editorial: Simplyblock can provide high IOPS and predictable low latency without sacrificing high availability and fault tolerance.

How do Persistent Volumes Work?

Persistent Volumes are implemented using a combination of Kubernetes components and Container Storage Interface (CSI)-backed storage plugins or drivers.

Internally, persistent storage is represented as a combination of persistent volumes (the actual logical storage entity) and persistent volume claims (the “assignment” of a volume to a container request for storage). Both types are built-in Kubernetes resources and can be created, updated, or deleted via the Kubernetes API or declarative manifests.

Using a StorageClass, Kubernetes’ persistent volume claim (PVC) requests a specific backing implementation to provide a persistent volume. The storage class always defines which kind of storage implementation is used. However, a storage plugin may provide multiple storage classes and encode specific performance or other characteristics, such as high availability or levels of fault tolerance. The specific details depend on the storage plugin.

When the persistent volume claim is provided with a matching persistent volume, Kubernetes will mount the persistent volume into the pod, enabling access to the PV from inside the container. Additionally, the PVC may restrict or permit reading or writing data into the storage. These permissions can also enable features like simultaneously attaching the same persistent volume to multiple containers, provided that the backing implementation offers such functionality.
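
To illustrate the mounting step, a pod references the claim by name and mounts it like any other volume. This is a minimal sketch; the pod name, image, claim name, and path are placeholders:

# Example pod mounting an existing PersistentVolumeClaim
apiVersion: v1
kind: Pod
metadata:
  name: database
spec:
  containers:
    - name: postgres
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data        # the PVC created beforehand (placeholder name)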

Last but not least, when the PV’s lifecycle is bound to the PVC, automatic resource management, like dynamic allocation and deallocation, happens when the PVC is created or deleted. Depending on the type of deployment, this can be a real benefit or completely useless. It’s up to you to decide when you want automatic resource management and when not—that’s the beauty of declarative configuration.

Your Stateful Workload

Persistent volumes (PVs) are critical in deploying stateful workloads into Kubernetes environments. By abstracting storage provisioning and management, PVs enable seamless integration of many storage resources, enabling the “use the best tool for the job” rule. That said, not all applications may have the same storage requirements, hence providing multiple storage classes can be required. Kubernetes makes this specifically easy with its storage plugin design and the definition of storage classes. Understanding persistent volumes is essential for Kubernetes administrators and developers looking to optimize storage management within their containerized environments.

Simplyblock offers a flexible, enterprise-ready Kubernetes storage solution that combines high performance, predictable low latency, and high availability, as well as fault tolerance. In addition, simplyblock provides advanced features such as immediate snapshots (copy-on-write), local and remote clones, cluster replication, thin provisioning, storage overcommitment, and zero downtime scalability. Do you want to learn more about simplyblock, or are you ready to use it?
