Blog Archives | simplyblock

How To Reduce Your Cloud Storage Carbon Footprint (February 4, 2025)

Discover practical strategies for data center carbon footprint reduction and storage optimization in cloud infrastructure.
Cloud infrastructure has expanded continuously since the emergence of cloud computing in the early 2000s. With data volumes growing exponentially and cloud costs continuing to rise, reducing your data center's carbon footprint has become crucial for both operational efficiency and sustainability. This short guide explores practical strategies to reduce your cloud infrastructure footprint, focusing on storage optimization.

Understanding the Challenge of Cloud Infrastructure Growth

Cloud infrastructure footprint extends beyond simple storage consumption. It encompasses the entire ecosystem of resources, including compute instances, databases, storage volumes, networking components, and their complex interactions. Nor is this only a problem for companies running on-premises: in the public clouds, the most significant share of the data center carbon footprint still depends on the end user. Overprovisioning, inefficient technologies, and a lack of awareness are some of the problems contributing to the cloud's fast-growing carbon footprint.

One thing that is often overlooked is storage. Unlike compute, storage needs to be available at all times. Traditional approaches to storage provisioning usually lead to significant waste. For instance, when deploying databases or other data-intensive applications, it's common practice to overprovision storage to ensure adequate capacity for future growth. This leaves large amounts of storage capacity unused, unnecessarily driving up costs and carbon emissions.


Environmental Impact of Cloud Data Center Footprint

According to the International Energy Agency, digital infrastructure's environmental impact has reached staggering levels, with data centers consuming approximately 1% of global electricity use. To put this in perspective, a single petabyte of actively used storage in traditional cloud environments has roughly the same annual carbon footprint as 20 round-trip flights from New York to London. Overall, data centers are estimated to contribute roughly as much to carbon emissions as the entire aviation industry.

Optimizing Compute Resources

Let’s first look at compute, which is mentioned most often when discussing data center footprint. Modern cloud optimization platforms like Cast AI have revolutionized compute resource management with the help of ML-based algorithms. Organizations can significantly reduce compute costs while maintaining performance by analyzing workload patterns and automatically adjusting instance types and sizes. Cast AI's customers typically report 50-75% savings on their Kubernetes compute costs through automated instance optimization.

For AWS users, some tools enable organizations to leverage lower-cost spot instances effectively. Modern workload orchestrators can automatically handle spot instance interruptions, making them viable even for production workloads. There are also “second-hand” instance marketplaces, where users can “give back” reserved instances they no longer need. These solutions can have a considerable impact not only on carbon footprint reduction but also on savings.

Modern Approaches to Storage Optimization

Storage optimization has evolved significantly in recent years. Modern solutions like simplyblock have introduced innovative approaches to storage management that dramatically help reduce your cloud footprint while maintaining or even improving performance.

Carbon Footprint Reduction and Cost Savings

Using simplyblock as your data center and cloud storage platform enables you to dramatically reduce your carbon footprint while optimizing your storage cost. Saving the environment doesn’t have to be expensive.


Thin Provisioning to the Rescue

One of the most effective strategies for reducing storage footprint is thin provisioning. Unlike traditional storage allocation, where you must pre-allocate the full volume size, thin provisioning allows you to create volumes of any size while only consuming the space actually used. This approach is particularly compelling for database operations, where storage requirements can be challenging to predict.

For example, a database service provider might need to provision 1TB volumes for each customer instance. With traditional provisioning, this would require allocating the full 1TB upfront, even if the customer initially only uses 100GB. Thin provisioning allows the creation of these 1TB volumes while only consuming the actual space used. This typically results in a 60-70% reduction in actual storage consumption and carbon footprint.
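To make this concrete, here is a minimal sketch of thin provisioning with Linux LVM; the volume group, pool size, and volume names are hypothetical:

# Create a thin pool with 100GB of physical capacity in volume group vg0
lvcreate --type thin-pool -L 100G -n thinpool vg0

# Create a 1TB thin volume; physical blocks are only allocated on write
lvcreate -V 1T --thinpool thinpool -n customer_vol vg0

# Compare provisioned size vs. actual pool usage
lvs vg0

The same principle applies to thin provisioning implemented inside a storage platform: the client sees a full 1TB device, while the backend only stores the blocks that were actually written.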

Sounds like a no-brainer? Well, cloud providers don’t allow you to use thin provisioning out of the box. The same applies to managed database services such as Amazon RDS or Aurora. The cloud providers use thin provisioning to benefit their own operations, but eventually, it still leads to wastage on the database level. Technologies like simplyblock come in handy for those looking to use thin provisioning in public cloud environments.

Intelligent Data Tiering

Not all data requires high-performance, expensive storage. Modern organizations are increasingly adopting various strategies to optimize storage costs while maintaining performance where it matters. This involves tiering data within a particular storage type (e.g., for object storage between S3 and Glacier) or, as in the case of simplyblock’s intelligent tiering, automatically moving data between different storage tiers and services based on access patterns and business requirements.

Take the example of an observability platform: recent metrics and logs require fast access and are kept in high-performance storage, while historical data can be automatically moved to more cost-effective object storage like Amazon S3. Simplyblock’s approach to tiering is particularly innovative, providing transparent tiering that’s entirely invisible to applications while delivering significant cost savings.
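For illustration, here is a hedged sketch of tiering within object storage using an S3 lifecycle rule; the bucket name, prefix, and transition window are hypothetical:

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-observability-archive \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-old-metrics",
      "Status": "Enabled",
      "Filter": {"Prefix": "metrics/"},
      "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}]
    }]
  }'

After 90 days, objects under the metrics/ prefix move to the cheaper Glacier tier automatically, without any application changes.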

Maximizing Storage Efficiency Through Innovative Technologies

Modern storage solutions offer several powerful technologies for reducing data footprint, including:

  • Compression: Reduces data size inline before writing
  • Deduplication: Eliminates redundant copies
  • Copy-on-write: Creates space-efficient snapshots and clones

Compression and deduplication work together to minimize the actual storage space required. Copy-on-write technology enables the efficient creation of database copies for development and testing environments without duplicating data.

For instance, when development teams need database copies for testing, traditional approaches would require creating full copies of production data. With copy-on-write technology, these copies can be created instantly with minimal additional storage overhead, only consuming extra space when data is modified.
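As a sketch of copy-on-write in practice, here is how a ZFS-based system creates an instant, space-efficient database clone (pool and dataset names are hypothetical):

# Snapshot the production dataset instantly; no data is copied
zfs snapshot tank/prod-db@pre-test

# Clone the snapshot into a writable dataset for testing; space is
# only consumed as the clone diverges from the original
zfs clone tank/prod-db@pre-test tank/test-db

Copy-on-write storage backends, including simplyblock's, expose the same capability through instant snapshots and clones of logical volumes.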

Integrating with Modern Infrastructure

Kubernetes has introduced new challenges and opportunities for storage optimization. When running databases and other stateful workloads on Kubernetes, one needs cloud-native storage solutions that can efficiently handle dynamic provisioning and scaling while maintaining performance and reliability.

Through Kubernetes integration via CSI drivers, modern storage solutions can provide automated provisioning, scaling, and optimization of storage resources. This integration enables organizations to maintain efficient storage utilization even in highly dynamic environments.

Quantifying Real-World Storage Impact

Let’s break down what storage consumption really means in environmental terms. For this example, we take 100TB of transactional databases like Postgres or MySQL on AWS using gp3 volumes as baseline storage.

Base Assumptions

  • AWS gp3 volume running 24/7/365
  • Power Usage Effectiveness (PUE) for AWS data centers: 1.15
  • Average data center carbon intensity: 0.35 kg CO2e per kWh
  • Storage redundancy factor: 3x (EBS maintains 3 copies for durability)
  • Average passenger car emissions: 4.6 metric tons CO2e annually
  • Database high availability configuration (3x replication)
  • Development/testing environments (2x copies)

Detailed Calculation

Storage needs:

  • Primary database: 100TB
  • High availability (HA) replication: 200TB
  • Development/testing: 100TB × 2 = 200TB
  • Total storage footprint: 500TB

Power Consumption per TB:

  • Base power per TB = 0.024 kW
  • Daily consumption = 0.024 kW × 24 hours = 0.576 kWh/day
  • Annual consumption = 0.576 kWh × 365 = 210.24 kWh/year
  • With PUE for AWS: 210.24 kWh × 1.15 = 241.78 kWh/year
  • Total annual consumption with 3x redundancy: 241.78 kWh × 3 = 725.34 kWh/year

Carbon emissions: 725.34 kWh/TB × 0.35 kg CO2e/kWh × 500 TB × 1.15 = 145.9 metric tons CO2e annually

The final calculation assumes additional overhead for networking and management of 15%.
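To reproduce the arithmetic end to end, here is a one-liner using bc with the values assumed above:

echo "0.024 * 24 * 365 * 1.15 * 3 * 0.35 * 500 * 1.15 / 1000" | bc -l
# ≈ 145.97 metric tons CO2e per year

The factors are, in order: base power per TB (kW), hours per day, days per year, PUE, EBS redundancy, carbon intensity (kg CO2e/kWh), total TB, and the 15% networking/management overhead.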

Figure 1: Global energy demand by data center type (source: International Energy Agency, https://www.iea.org/data-and-statistics/charts/global-data-centre-energy-demand-by-data-centre-type)

Environmental Impact and Why We Need Carbon Footprint Reduction Strategies

This database setup generates approximately 146 metric tons of CO2e, equivalent to the annual emissions of 32 cars. The potential CO2e reduction from thin provisioning, tiering, copy-on-write, multi-attach, and erasure coding on the storage level comes close to 90%, translating into taking 29 cars off the road. And this example covers just a 100TB primary database, a fraction of what many enterprises store.

But that’s only the beginning of the story. According to research by Gartner, enterprise data is growing at 35-40% annually. At that rate, a company storing 100TB today will need around 500TB in just 5 years if growth continues unchecked. This is why we need to start reducing the cloud’s carbon footprint today.

The Future of Cloud Carbon Footprint Reduction

As cloud infrastructure continues to evolve, the importance of efficient storage management will only grow. While data center operators have taken significant steps to reduce their environmental footprint by adopting green or renewable energy sources, a large portion of the energy used by data centers still comes from fossil fuel-generated electricity. The future of cloud optimization lies in intelligent, automated solutions that dynamically adapt to changing requirements while maintaining optimal resource utilization. Technologies such as AI-driven data placement and advanced compression algorithms will further enhance our ability to minimize storage footprints while maximizing performance.

Reducing your cloud data center footprint through storage optimization isn’t just about cost savings—it’s about environmental responsibility. Organizations can significantly lower their expenses and environmental impact by implementing modern storage optimization strategies and supporting them with efficient compute and network resources.

For more insights on cloud storage optimization and its environmental impact, visit our detailed cloud storage cost optimization guide. Every byte of storage saved contributes to a more sustainable future for cloud computing. Start your optimization journey today with simplyblock and be part of the solution for a more environmentally responsible digital infrastructure.

Bare-Metal Kubernetes: Power of Direct Hardware Access (January 28, 2025)

As of 2025, over 90% of enterprises have adopted Kubernetes. The technology has matured, and the ecosystem around containerization technologies is richer than ever. More platform teams are realizing the tangible benefits of running containerized workloads directly on bare-metal Kubernetes infrastructure, especially when compared to virtualized environments like VMware or OpenStack. This shift is particularly evident in storage performance, where removing virtualization layers leads to significant efficiency and speed gains.

Bare-metal Kubernetes provides faster access to storage than virtualized environments because there are fewer abstraction layers in the data path.

Why Run Kubernetes on Bare Metal?

There are several compelling reasons to consider bare-metal Kubernetes deployments.

Virtualized environments like VMware and OpenStack have traditionally been used to run Kubernetes, but they introduce unnecessary overhead, especially for high-performance workloads.

By running Kubernetes on bare metal, businesses can eliminate the virtualization layer and gain direct access to hardware resources. This results in lower latency, higher throughput, and improved overall performance. It is especially beneficial for I/O-intensive applications, such as databases and real-time systems, where every microsecond of delay counts.

Moreover, the recent VMware acquisition by Broadcom has triggered many enterprises to look for alternatives, and Kubernetes has emerged as a top choice for many former VMware users.

Last but not least, bare-metal Kubernetes supports scalability without the complexity of managing virtual machines and containerized infrastructure. Containerization has revolutionized how applications are built and deployed. However, the complexity of managing containerized infrastructures alongside virtual machines has posed challenges for infrastructure teams. A complete transition to Kubernetes offers a unified approach for stateless and stateful workloads.

The Hidden Cost of Virtualization

When they came out, virtualization platforms like VMware and OpenStack were game-changers. They enabled more efficient use of hardware resources such as CPU and RAM at the price of a small virtualization overhead. However, as the adoption of containerized applications and Kubernetes grows, the inefficiencies of these platforms become more apparent.

Every layer of virtualization introduces performance overhead. Each additional abstraction between the application and the physical hardware consumes CPU cycles, adds memory overhead, and introduces latency, particularly in I/O-heavy workloads. While technologies like paravirtualization enable drivers inside virtual machines to bypass large parts of the OS driver stack, the overhead is still significant. Kubernetes (and containerization in general) is more resource-friendly than traditional hypervisor-based virtualization: containers don't each require an entire operating system and share most host resources, isolating applications through kernel namespaces, cgroups, and syscall filtering.

From an operations perspective, managing both a virtualization layer and a Kubernetes orchestration platform compounds this complexity.

Storage Performance: The Bare-Metal Advantage

One of the key differentiators for bare-metal Kubernetes is its ability to maximize storage performance by eliminating these virtualization layers. Direct hardware access allows NVMe devices to connect directly to containers without complex passthrough configurations. This results in dramatically lower latency and enhanced IOPS. This direct mapping of physical to logical storage resources streamlines the data path and increases performance, particularly for I/O-intensive applications like databases.

Consider a typical storage path in a virtualized environment:

Application → Container → Kubernetes → Virtual Machine → Hypervisor → Physical Storage

Now, compare it to a bare metal environment:

Application → Container → Kubernetes → Physical Storage

By reducing unnecessary layers, organizations see substantial improvements in both latency and throughput, making bare-metal Kubernetes ideal for high-demand environments.

Figure 1: Comparison of system layers for storage access with traditional virtualization, virtualized Kubernetes, and bare-metal Kubernetes

NVMe/TCP: The Networking Revolution for Bare-Metal Kubernetes

For bare-metal Kubernetes deployments, the storage performance benefits are further enhanced with NVMe-over-Fabrics (NVMe-oF), specifically NVMe/TCP, the successor to iSCSI. This enables organizations to leverage high-speed networking protocols, allowing storage devices to communicate with Kubernetes clusters over a network and providing enterprise-grade performance and scalability. Best of all, NVMe/TCP eliminates the traditional limitations of local storage (physical space) and of Fibre Channel or Infiniband solutions (the requirement for specialized network hardware).

Simplyblock is built to optimize storage performance in Kubernetes environments using NVMe-oF and NVMe/TCP. Our architecture allows Kubernetes clusters to efficiently scale and use networked NVMe storage devices without sacrificing the low-latency, high-throughput characteristics of direct hardware access.

By integrating NVMe/TCP, simplyblock ensures that storage access, whether local or over the network, remains consistent and fast, allowing for seamless storage expansion as your Kubernetes environment grows. Whether you’re managing databases, real-time analytics, or microservices, NVMe/TCP provides the robust networking backbone that complements the high-performance nature of bare-metal Kubernetes.

Bridging the Gap with Modern Storage Solutions

While the performance benefits of bare metal deployments are clear, enterprises still require advanced storage capabilities. An enterprise-grade storage solution requires high availability, data protection, and efficient resource utilization without sacrificing scalability. Solutions like simplyblock meet these needs without the added complexity of virtualization.

Simplyblock’s NVMe-first architecture directly integrates into Kubernetes environments, providing user-space access to NVMe devices at scale. Simplyblock simplifies persistent volume provisioning with enterprise features such as instant snapshots, clones, and transparent encryption, all without compromising performance.

Discover how Simplyblock optimizes Kubernetes storage.

Future-Proofing Infrastructure with Bare Metal

With advancements in storage and networking technologies, the advantages of bare-metal Kubernetes continue to grow. New NVMe devices are pushing the performance limits, demanding direct hardware access. On the other hand, innovations such as SmartNICs and DPUs require tight integration with the hardware.

In addition, modern microservices architectures, real-time applications, and high-performance databases all benefit from the predictability and low-latency performance offered by bare metal, something virtualization has always struggled to deliver.

Figure 2: Benefits of Bare-metal Kubernetes

The Cost Equation

We’ve talked a lot about performance benefits, but that’s not the only reason you should consider adopting bare-metal Kubernetes infrastructures. Kubernetes equally helps to achieve optimal hardware utilization while controlling costs.

Removing the virtualization layer reduces infrastructure and licensing costs, while the streamlined architecture lowers operational overhead. Solutions like simplyblock enable organizations to achieve these financial benefits while maintaining enterprise-grade features like high availability and advanced data protection. The benefits of running on a standard Ethernet network stack and using NVMe over TCP further add to the cost benefits.

Nevertheless, transitioning to bare-metal Kubernetes is not an all-or-nothing process. First, enterprises can begin with I/O-intensive workloads that would benefit the most from performance gains and gradually scale over time. With many Kubernetes operators for databases available, running databases on Kubernetes has never been easier.

Second, enterprises can keep specific applications that may not run well in containers inside virtual machines. Kubernetes extensions such as KubeVirt help manage virtual machine resources using the same API as containers, simplifying the deployment architecture. This gradual path further supports the growing adoption of stateful workloads on Kubernetes.

The Future of Kubernetes is Bare Metal

Organizations need infrastructure that taps directly into hardware capabilities while offering robust enterprise features to remain competitive in today’s fast-moving and cloud-native landscape. Bare-metal Kubernetes, combined with cutting-edge storage solutions like simplyblock, represents the ideal balance of performance, scalability, and functionality.

Ready to optimize your Kubernetes deployment with bare-metal storage and networked NVMe solutions? Contact Simplyblock today to learn how our solutions can drive your infrastructure’s performance and reduce your costs.

NVMe over TCP vs iSCSI: Evolution of Network Storage (January 8, 2025)

TLDR: In a direct comparison of NVMe over TCP vs iSCSI, we see that NVMe over TCP outranks iSCSI in all categories with IOPS improvements of up to 50% (and more) and latency improvements by up to 34%.

When data grows, storage needs to grow, too. That’s when remotely attached SAN (Storage Area Network) systems come in. So far, these were commonly connected through one of three protocols: Fibre Channel, Infiniband, or iSCSI, with the latter on the “low end” of the spectrum since it needs no special hardware to operate. NVMe over Fabrics (NVMe-oF), and specifically NVMe over TCP (NVMe/TCP) as the successor of iSCSI, is on the rise to replace these legacy protocols and bring immediate improvements in latency, throughput, and IOPS.

iSCSI: The Quick History Lesson

Figure 1: Nokia 3310, released September 2000 (Source: Wikipedia)

iSCSI is a protocol that connects remote storage solutions (commonly hardware storage appliances) to storage clients. The latter are typically servers without (or with minimal) local storage, as well as virtual machines. In recent years, we have also seen iSCSI being used as a backend for container storage.

iSCSI stands for Internet Small Computer System Interface and encapsulates the standard SCSI commands within TCP/IP packets. That means iSCSI works over commodity Ethernet networks, removing the need for specialized hardware such as dedicated network cards (NICs) and switches.

The iSCSI standard was first released in early 2000. A world that was very different from today. Do you remember what a phone looked like in 2000?

That said, while there was access to the first flash-based systems, prices were still outrageous, and storage systems were designed with spinning disks in mind. Remember that. We’ll come back to it later.

What is SCSI?

SCSI, or, you guessed it, the Small Computer System Interface, is a set of standards for connecting and transferring data between computers and peripheral devices. Originally developed in the 1980s, SCSI has been a foundational technology for data storage interfaces, supporting various device types, primarily hard drives, optical drives, and scanners.

While SCSI kept improving and gaining new commands over the years, its foundation is still rooted in the early 1980s. Nevertheless, many standards still use the SCSI command set, including SATA (home computers), SAS (servers), and iSCSI.

What is NVMe?

Non-Volatile Memory Express (NVMe) is a modern PCI Express-based (PCIe) storage interface. With the original specification dating back to 2011, NVMe is engineered specifically for solid-state drives (SSDs) connected via the PCIe bus. NVMe devices are directly connected to the CPU and other devices on the bus, which increases throughput and reduces latency. As a result, NVMe dramatically cuts access times and increases input/output operations per second (IOPS) compared to traditional storage interfaces.

As part of the NVMe standard, additional specifications are developed, such as the transport specification, which defines how NVMe commands are transported (e.g., via the PCI Express bus, but also via networking protocols like TCP/IP).

The Fundamental Difference between Spinning Disks and NVMe

Traditional spinning hard disk drives (HDDs) rely on physically spinning platters and movable read/write heads to write or access data. When data is requested, the head must physically move to the correct location on the platter stack, resulting in significant access latencies of 10-14 milliseconds.

Flash storage, including NVMe devices, eliminates the mechanical parts, utilizing NAND flash chips instead. NAND stores data purely electronically and achieves access latencies as low as 20 microseconds (and even lower on super high-end gear). That makes flash hundreds of times faster than its HDD counterparts.

For a long time, flash storage had the massive disadvantage of limited storage capacity. However, this disadvantage slowly fades away with companies introducing higher-capacity devices. For example, Toshiba just announced a 180TB flash storage device.

Cost, the second significant disadvantage, also keeps falling with improvements in development and production. Technologies like QLC NAND offer incredible storage density for an affordable price.

Anyhow, why am I bringing up the mechanical vs electrical storage principle? The reason is simple: the access latency. SCSI and iSCSI were never designed for super low access latency devices because they didn’t really exist at the time of their development. And, while some adjustments were made to the protocol over the years, their fundamental design is outdated and can’t be changed for backward compatibility reasons.

NVMe over Fabrics: Flash Storage on the Network

NVMe over Fabrics (also known as NVMe-oF) is an extension to the NVMe base specification. It allows NVMe storage to be accessed over a network fabric while maintaining the low-latency, high-performance characteristics of local NVMe devices.

NVMe over Fabrics itself is a collection of multiple sub-specifications defining several transport layer protocols:

  • NVMe over TCP: NVMe/TCP utilizes the common internet standard protocol TCP/IP. It deploys on commodity Ethernet networks and can run parallel to existing network traffic. That makes NVMe over TCP the modern successor to iSCSI, taking over where iSCSI left off, and the perfect solution for public cloud-based storage solutions that typically only provide TCP/IP networking (see the connection sketch after this list).
  • NVMe over Fibre Channel: NVMe/FC builds upon the existing Fibre Channel network fabric. It tunnels NVMe commands through Fibre Channel packets and enables reusing available Fibre Channel hardware. I wouldn’t recommend it for new deployments due to the high entry cost of Fibre Channel equipment.
  • NVMe over Infiniband: Like NVMe over Fibre Channel, NVMe/IB utilizes existing Infiniband networks to tunnel the NVMe protocol. If you have existing Infiniband equipment, NVMe over Infiniband might be the way to go. For new deployments, the initial entry cost is too high.
  • NVMe over RoCE: NVMe over RDMA over Converged Ethernet is a transport layer that uses an Ethernet fabric for remote direct memory access (RDMA). To use NVMe over RoCE, you need RDMA-capable NICs. RoCE comes in two versions: RoCEv1, which is a layer-2 protocol and not routable, and RoCEv2, which uses UDP/IP and can be routed across complex networks. NVMe over RoCE doesn’t scale as easily as NVMe over TCP but provides even lower latencies.
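
As a minimal sketch of how simple NVMe over TCP is to consume on Linux, here is how an initiator discovers and attaches a remote volume with nvme-cli; the IP address, port, and NQN are hypothetical placeholders:

# Load the NVMe/TCP initiator module (included in mainline Linux)
modprobe nvme-tcp

# Discover subsystems exported by a remote target
nvme discover -t tcp -a 192.168.0.50 -s 4420

# Connect to a discovered subsystem by its NQN
nvme connect -t tcp -a 192.168.0.50 -s 4420 -n nqn.2023-01.io.example:subsys1

# The remote namespace now appears as a local block device
nvme list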

NVMe over TCP vs iSCSI: The Comparison

When comparing NVMe over TCP vs iSCSI, we see considerable improvements in all three primary metrics: latency, throughput, and IOPS.

Figure 2: Medium queue-depth workload at 4KB blocksize I/O (Source: Blockbridge)

The folks over at Blockbridge ran an extensive comparison of the two technologies, which shows that NVMe over TCP outperformed iSCSI, regardless of the benchmark.

I’ll provide the most critical benchmarks here, but I recommend you read through the full benchmark article right after finishing here.

Anyhow, let’s dive a little deeper into the actual facts on the NVMe over TCP vs iSCSI benchmark.

Editor’s Note: Our Developer Advocate, Chris Engelbert, recently gave a talk at SREcon in Dublin about the performance of NVMe over TCP versus iSCSI, which led to this blog post. Find the full presentation: NVMe/TCP makes iSCSI look like Fortran.

Benchmarking Network Storage

Evaluating storage performance involves comparing four major performance indicators.

  1. IOPS: Number of input/output operations processed per second
  2. Latency: Time required to complete a single input/output operation
  3. Throughput: Total data transferred per unit of time
  4. Protocol Overhead: Additional processing required by the communication protocol

Editor’s note: For latency, throughput, and IOPS, we have an exhaustive blog post talking deeper about the necessities, their relationships, and how to calculate them.

Comprehensive performance testing involves simulated workloads that mirror real-world scenarios. To simplify this process, benchmarks use tools like FIO (Flexible I/O Tester) to generate consistent, reproducible test data and results across different storage configurations and systems.
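As a sketch, a typical FIO random-read test at 4KiB block size might look like the following; the device path and parameter values are illustrative, not the exact configuration used in the benchmarks below:

# Random 4KiB reads, queue depth 128, bypassing the page cache
fio --name=randread-test --filename=/dev/nvme1n1 --direct=1 \
    --ioengine=libaio --rw=randread --bs=4k --iodepth=128 \
    --numjobs=4 --runtime=60 --time_based --group_reporting

FIO reports IOPS, bandwidth, and latency percentiles per run, which makes results directly comparable between an iSCSI-attached and an NVMe/TCP-attached device.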

IOPS Improvements of NVMe over TCP vs iSCSI

When running IOPS-intensive applications, the number of available IOPS in a storage system is critical. IOPS-intensive applications include databases, analytics platforms, asset servers, and similar systems.

Improving IOPS simply by exchanging the storage network protocol is an immediate win, for the database and for us.

Using NVMe over TCP instead of iSCSI shows a dramatic increase in IOPS, especially for smaller block sizes. At 512 bytes block size, Blockbridge found an average 35.4% increase in IOPS. At a more common 4KiB block size, the average increase was 34.8%.

That means the same hardware can provide over one-third more IOPS using NVMe over TCP vs iSCSI at no additional cost.

Figure 3: Average IOPS improvement of NVMe over TCP vs iSCSI by blocksize (Source: Blockbridge)

Latency Improvements of NVMe over TCP vs iSCSI

While IOPS-hungry use cases, such as compaction events in databases (e.g., Cassandra), benefit from the immense increase in IOPS, latency-sensitive applications love low access latencies. Latency is the primary factor that drives people to choose local NVMe storage over remotely attached storage, despite knowing many or all of its drawbacks.

Latency-sensitive applications range from high-frequency trading systems, where milliseconds translate into hard money, through telecommunication systems, where latency can introduce synchronization issues, to cybersecurity and threat detection solutions that need to react as fast as possible.

Therefore, decreasing latency is a significant benefit for many industries and solutions. Apart from that, a lower access latency always speeds up data access, even if your system isn’t necessarily latency-sensitive. You will feel the difference.

Blockbridge found the most significant access latency reduction at a block size of 16KiB with a queue depth of 128 (which I/O-demanding solutions can easily hit). The average latency for iSCSI was 5,871μs, compared to 5,089μs for NVMe over TCP. That is a 782μs (~13%) decrease in access latency, just by exchanging the storage protocol.

Figure 4: Average access latency comparison, NVMe over TCP vs iSCSI, for 4, 8, 16 KiB (Source: Blockbridge)

Throughput Improvement of NVMe over TCP vs iSCSI

As the third primary metric of storage performance, throughput describes how much data is actually pumped from the disk into your workload.

Throughput is the major factor for applications such as video encoding or streaming platforms, large analytical systems, and game servers streaming massive worlds into memory. Furthermore, there are also time-series storage, data lakes, and historian databases.

Throughput-heavy systems benefit from higher throughput to get the “job done faster.” Oftentimes, increasing the throughput isn’t easy. You’re either bound by the throughput provided by the disk or, in the case of a network-attached system, the network bandwidth. To achieve high throughput and capacity, remote network storage utilizes high bandwidth networking or specialized networking systems such as Fibre Channel or Infiniband.

Blockbridge ran their tests on a dual-port 100Gbit/s network card, limited by the PCI Express x16 Gen3 bus to a maximum throughput of around 126Gbit/s. Newer PCIe standards achieve much higher throughput. Hence, NVMe devices and NICs aren’t bound by the “limiting” factor of the PCIe bus anymore.

With a 16KiB block size and a queue depth of 32, their benchmark saw a whopping 2.3Gbit/s increase in performance for NVMe over TCP vs iSCSI. The throughput increased from 10.387Gbit/s on iSCSI to 12.665Gbit/s, an easy 20% on top, again using the same hardware. That’s how you save money.

Figure 5: Average throughput of NVMe over TCP vs iSCSI for different queue depths of 1, 2, 4, 8, 16, 32, 64, 128 (Source: Blockbridge)

The Compelling Case for NVMe over TCP

We’ve seen that NVMe over TCP has significant performance advantages over iSCSI in all three primary storage performance metrics. Nevertheless, there are more advantages to NVMe over TCP vs iSCSI.

  • Standard Ethernet: NVMe over TCP’s most significant advantage is its ability to operate over standard Ethernet networks. Unlike specialized networking technologies (Infiniband, Fibre Channel), NVMe/TCP requires no additional hardware investments or complex configuration, making it remarkably accessible for organizations of all sizes.
  • Performance Characteristics: NVMe over TCP delivers exceptional performance by minimizing protocol overhead and leveraging the efficiency of NVMe’s design. It can achieve latencies comparable to local storage while providing the flexibility of network-attached resources. Modern implementations can sustain throughput rates exceeding traditional storage protocols by significant margins.
  • Ease of Deployment: NVMe over TCP integrates seamlessly with Linux and Windows (Server 2025 and later) since the necessary drivers are already part of the kernel. That makes NVMe/TCP straightforward to implement and manage. Seamless compatibility reduces the learning curve and integration challenges typically associated with new storage technologies.

Choosing Between NVMe over TCP and iSCSI

Deciding between two technologies isn’t always easy. In the case of NVMe over TCP vs iSCSI, however, it isn’t that hard. The use cases for a new iSCSI deployment are very sparse. From my perspective, the only valid one is the integration of pre-existing legacy systems that don’t yet support NVMe over TCP.

That’s why simplyblock, as an NVMe over TCP first solution, still provides iSCSI if you really need it. We offer it exactly for the reason that migrations don’t happen from today to tomorrow. Still, you want to leverage the benefits of newer technologies, such as NVMe over TCP, wherever possible. With simplyblock, logical volumes can easily be provisioned as NVMe over TCP or iSCSI devices. You can even switch over from iSCSI to NVMe over TCP later on.

In any case, you should go with NVMe over TCP when:

  • You operate high-performance computing environments
  • You have modern data centers with significant bandwidth
  • You deploy workloads requiring low-latency, high IOPS, or throughput storage access
  • You find yourself in scenarios that demand scalable, flexible storage solutions
  • You are in any other situation where you need remotely attached storage

You should stay on iSCSI (or slowly migrate away) when:

  • You have legacy infrastructure with limited upgrade paths

You see, there aren’t a lot of reasons. Given that, it’s just a matter of selecting your new storage solution. Personally, these days, I would always recommend software-defined storage solutions such as simplyblock, but I’m biased. Anyhow, an SDS provides the best of both worlds: commodity storage hardware (with the option to go all in with your 96-bay storage server) and performance.

Simplyblock: Embracing Versatility

Simplyblock demonstrates forward-thinking storage design by supporting both NVMe over TCP and iSCSI, providing customers with the best performance when available and the chance to migrate slowly in the case of existing legacy clients.

Furthermore, simplyblock offers features known from traditional SAN storage systems or “filesystems” such as ZFS. This includes a full copy-on-write backend with instant snapshots and clones. It includes synchronous and asynchronous replication between storage clusters. Finally, simplyblock is your modern storage solution, providing storage to dedicated hosts, virtual machines, and containers. Regardless of the client, simplyblock offers the most seamless integration with your existing and upcoming environments.

The Future of NVMe over TCP

As enterprise and cloud computing continue to evolve, NVMe over TCP stands as the technology of choice for remotely attached storage. Firstly, it combines simplicity, performance, and broad compatibility. Secondly, it provides a cost-efficient and scalable solution utilizing commodity network gear.

The protocol’s ongoing development (last specification update May 2024) and increasing adoption show continued improvements in efficiency, reduced latency, and enhanced scalability.

NVMe over TCP represents a significant step forward in storage networking technology. Combining the raw performance of NVMe with the ubiquity of Ethernet networking offers a compelling solution for modern computing environments. While iSCSI remains relevant for specific use cases and during migration phases, NVMe over TCP represents the future and should be adopted as soon as possible.

We, at simplyblock, are happy to be part of this important step in the history of storage.

Questions and Answers

Is NVMe over TCP better than iSCSI?

Yes, NVMe over TCP is superior to iSCSI in almost any way. NVMe over TCP provides lower protocol overhead, better throughput, lower latency, and higher IOPS compared to iSCSI. It is recommended that iSCSI not be used for newly designed infrastructures and that old infrastructures be migrated wherever possible.

How much faster is NVMe over TCP compared to iSCSI?

NVMe over TCP is superior in all primary storage metrics, meaning IOPS, latency, and throughput. NVMe over TCP shows up to 35% higher IOPS, 25% lower latency, and 20% increased throughput compared to iSCSI using the same network fabric and storage.

What is NVMe over TCP?

NVMe/TCP is a storage networking protocol that utilizes the common internet standard protocol TCP/IP as its transport layer. It is deployed through standard Ethernet fabrics and can be run parallel to existing network traffic, while separation through VLANs or physically separated networks is recommended. NVMe over TCP is considered the successor of the iSCSI protocol.

What is iSCSI?

iSCSI is a storage networking protocol that utilizes the common internet standard protocol TCP/IP as its transport layer. It connects remote storage solutions (commonly hardware storage appliances) to storage clients through a standard Ethernet fabric. iSCSI was initially standardized in 2000. Many companies replace iSCSI with the superior NVMe over TCP protocol.

What is SCSI?

SCSI (Small Computer System Interface) is a command set that connects computers and peripheral devices and transfers data between them. Initially developed in the 1980s, SCSI has been a foundational technology for data storage interfaces, supporting various device types such as hard drives, optical drives, and scanners.

What is NVMe?

NVMe (Non-Volatile Memory Express) is a specification that defines the connection and transmission of data between storage devices and computers. The initial specification was released in 2011. NVMe is designed specifically for solid-state drives (SSDs) connected via the PCIe bus. NVMe devices offer improved latency and performance compared to older standards such as SCSI, SATA, and SAS.

Kubernetes Storage 201: Concepts and Practical Examples (December 23, 2024)

What is Kubernetes Storage?

Kubernetes storage is a sophisticated ecosystem designed to address the complex data management needs of containerized applications. At its core, Kubernetes storage provides a flexible mechanism to manage data across dynamic, distributed computing environments. It allows your containers to store, access, and persist data with unprecedented flexibility.


Storage Types in Kubernetes

Fundamentally, Kubernetes provides two types of storage: ephemeral volumes, which are bound to the container’s lifecycle, and persistent volumes, which survive container restarts and termination.

Ephemeral (Non-Persistent) Storage

Ephemeral storage represents the default storage mechanism in Kubernetes. It provides a temporary storage solution, existing only for the duration of a container’s lifecycle. Therefore, when a container is terminated or removed, all data stored in this temporary storage location is permanently deleted.

This type of storage is ideal for transient data that doesn’t require long-term preservation, such as temporary computation results or cache files. Most stateless workloads utilize ephemeral storage for these kinds of temporary data. That said, a “stateless workload” doesn’t necessarily mean no data is stored temporarily. It means there is no issue if this storage disappears from one second to the next.
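As a minimal sketch, an emptyDir volume provides exactly this kind of pod-scoped scratch space; the pod name, image, and mount path are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: cache-pod
spec:
  containers:
    - name: app
      image: nginx:1.27
      volumeMounts:
        - name: scratch
          mountPath: /tmp/cache
  volumes:
    - name: scratch
      emptyDir: {}   # deleted together with the pod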

Persistent Storage

Persistent storage is a critical concept in Kubernetes that addresses one of the fundamental challenges of containerized applications: maintaining data integrity and accessibility across dynamic and ephemeral computing environments.

Unlike ephemeral storage, which exists only for the lifetime of a container, persistent storage is not bound to the lifetime of a container. Hence, persistent storage provides a robust mechanism for storing and managing data that must survive container restarts, pod rescheduling, or even complete cluster redesigns. You enable persistent Kubernetes storage through the concepts of Persistent Volumes (PV) as well as Persistent Volume Claims (PVC).

Fundamental Kubernetes Storage Entities

Figure 1: The building blocks of Kubernetes Storage (Persistent Volume, Persistent Volume Claim, Container Storage Interface, Volume, StorageClass)

Storage in Kubernetes is built up from multiple entities, depending on how storage is provided and if it is ephemeral or persistent.

Persistent Volumes (PV)

A Persistent Volume (PV) is a slice of storage in the Kubernetes cluster that has been provisioned by an administrator or dynamically created through a StorageClass. Think of a PV as a virtual storage resource that exists independently of any individual pod’s lifecycle. Consequently, this abstraction allows for several key capabilities:

  • A lifecycle that is decoupled from the pods that use the volume
  • Provisioning either statically by an administrator or dynamically via a StorageClass
  • Reclaim policies (Retain or Delete) that control what happens to the data once the claim is released

Persistent Volume Claims (PVC): Requesting Storage Resources

Persistent Volume Claims act as a user’s request for storage resources. Imagine your PVC as a demand for storage with specific requirements, similar to how a developer requests computing resources.

When a user creates a PVC, Kubernetes attempts to find and bind an appropriate Persistent Volume that meets the specified criteria. If no existing volume is found but a storage class is defined or a cluster-default one is available, the persistent volume will be dynamically allocated.

Key PersistentVolumeClaim Characteristics:

  • Size Specification: Defines a user storage capacity request
  • Access Modes: Defines how the volume can be accessed
    • ReadWriteOnce (RWO): Allows all pods on a single node to mount the volume in read-write mode.
    • ReadWriteOncePod: Allows a single pod to read-write mount the volume on a single node.
    • ReadOnlyMany (ROX): Allows multiple pods on multiple nodes to read the volume. Very practical for a shared configuration state.
    • ReadWriteMany (RWX): Allows multiple pods on multiple nodes to read and write to the volume. Remember, this could be dangerous for databases and other applications that don’t support a shared state.
  • StorageClass: Allows requesting specific types of storage based on performance, redundancy, or other characteristics
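
A hedged example of a claim combining these characteristics; the claim name, storage class, and size are illustrative:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce            # single-node read-write
  storageClassName: high-performance-storage
  resources:
    requests:
      storage: 100Gi           # requested capacity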

The Container Storage Interface (CSI)

The Container Storage Interface (CSI) represents a pivotal advancement in Kubernetes storage architecture. Before CSI, integrating storage devices with Kubernetes was a complex and often challenging process that required a deep understanding of both storage systems and container orchestration.

The Container Storage Interface introduces a standardized approach to storage integration. Storage providers ship so-called CSI drivers, out-of-process entities that communicate with Kubernetes via a well-defined API. The integration of CSI into the Kubernetes ecosystem provides three major benefits:

  1. CSI provides a vendor-neutral, extensible plugin architecture
  2. CSI simplifies the process of adding new storage systems to Kubernetes
  3. CSI enables third-party storage providers to develop and maintain their own storage plugins without modifying Kubernetes core code

Volumes: The Basic Storage Units

In Kubernetes, volumes are fundamental storage entities that solve the problem of data persistence and sharing between containers. Unlike traditional storage solutions, Kubernetes volumes are not limited to a single type of storage medium. They can represent:

  • Local node storage (e.g., emptyDir or hostPath)
  • Network-attached block and file storage (e.g., NVMe over TCP or NFS via CSI drivers)
  • Cloud provider disks (e.g., Amazon EBS)
  • Kubernetes API objects such as ConfigMaps and Secrets

Volumes provide a flexible abstraction layer that allows applications to interact with storage resources without being directly coupled to the underlying storage infrastructure.

StorageClasses: Dynamic Storage Provisioning

StorageClasses represent a powerful abstraction that enables dynamic and flexible storage provisioning because they allow cluster administrators to define different types of storage services with varying performance characteristics, such as:

  • High-performance SSD storage
  • Economical magnetic drive storage
  • Geo-redundant cloud storage solutions

When a user requests storage through a PVC, Kubernetes first tries to find an existing persistent volume. If none is found, the appropriate StorageClass defines how to automatically provision a suitable storage resource, significantly reducing administrative overhead.
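A hedged sketch of such a StorageClass definition; the name, provisioner, and parameters are illustrative (here assuming the AWS EBS CSI driver):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com       # CSI driver responsible for provisioning
parameters:
  type: gp3                        # driver-specific performance class
reclaimPolicy: Delete              # delete the backing volume with the claim
volumeBindingMode: WaitForFirstConsumer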

Figure 2: Table with features for ephemeral storage and persistent storage

Best Practices for Kubernetes Storage Management

  1. Resource Limitation
    • Implement strict resource quotas
    • Control storage consumption across namespaces
    • Set clear boundaries for storage requests
  2. Configuration Recommendations
    • Always use Persistent Volume Claims in container configurations
    • Maintain a default StorageClass
    • Use meaningful and descriptive names for storage classes
  3. Performance and Security Considerations
    • Implement quality of service (QoS) controls
    • Create isolated storage environments
    • Enable multi-tenancy through namespace segregation

Practical Storage Provisioning Example

While specific implementations vary, here’s a conceptual example of storage provisioning using Helm:

helm install storage-solution storage-provider/csi-driver \
  --set storage.size=100Gi \
  --set storage.type=high-performance \
  --set access.mode=ReadWriteMany

Kubernetes Storage with Simplyblock CSI: Practical Implementation Guide

Simplyblock is a storage platform for stateful workloads such as databases, message queues, data warehouses, file storage, and similar. Therefore, simplyblock provides many features tailored to the use cases, simplifying deployments, improving performance, or enabling features such as instant database clones.

Basic Installation Example

When deploying storage in a Kubernetes environment, organizations need a reliable method to integrate storage solutions seamlessly. The Simplyblock CSI driver installation process begins by adding the Helm repository, which allows teams to easily access and deploy the storage infrastructure. By creating a dedicated namespace called simplyblock-csi, administrators ensure clean isolation of storage-related resources from other cluster components.

The installation command specifies critical configuration parameters that connect the Kubernetes cluster to the storage backend. The unique cluster UUID identifies the specific storage cluster, while the API endpoint provides the connection mechanism. The secret token ensures secure authentication, and the pool name defines the initial storage pool where volumes will be provisioned. This approach allows for a standardized, secure, and easily repeatable storage deployment process.

Here’s an example of installing the Simplyblock CSI driver:

helm repo add simplyblock-csi https://raw.githubusercontent.com/simplyblock-io/simplyblock-csi/master/charts

helm repo update

helm install -n simplyblock-csi --create-namespace \
  simplyblock-csi simplyblock-csi/simplyblock-csi \
  --set csiConfig.simplybk.uuid=[random-cluster-uuid] \
  --set csiConfig.simplybk.ip=[cluster-ip] \
  --set csiSecret.simplybk.secret=[random-cluster-secret] \
  --set logicalVolume.pool_name=[cluster-name]

Advanced Configuration Scenarios

1. Performance-Optimized Storage Configuration

Modern applications often require precise control over storage performance, making custom StorageClasses invaluable.

Firstly, by creating a high-performance storage class, organizations can define exact performance characteristics for different types of workloads. The configuration sets a specific IOPS (Input/Output Operations Per Second) limit of 5000, ensuring that applications receive consistent and predictable storage performance.

Secondly, bandwidth limitations of 500 MB/s prevent any single application from monopolizing storage resources, promoting fair resource allocation. The added encryption layer provides an additional security measure, protecting sensitive data at rest. This approach allows DevOps teams to create storage resources that precisely match application requirements, balancing performance, security, and resource management.

# Example StorageClass configuration
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: high-performance-storage
provisioner: csi.simplyblock.io
parameters:
  qos_rw_iops: "5000"    # High IOPS performance
  qos_rw_mbytes: "500"   # Bandwidth limit
  encryption: "True"      # Enable encryption

2. Multi-Tenant Storage Setup

As a large organization or cloud provider, you require a robust environment and workload separation mechanism. For that reason, teams organize workloads between development, staging, and production environments by creating a dedicated namespace for production applications.

Therefore, the custom storage class for production workloads ensures critical applications have access to dedicated storage resources with specific performance and distribution characteristics.

The distribution configuration with multiple network domain controllers (NDCs) provides enhanced reliability and performance. Indeed, this approach supports complex enterprise requirements by enabling granular control over storage resources, improving security, and ensuring that production workloads receive the highest quality of service.

# Namespace-based storage isolation
apiVersion: v1
kind: Namespace
metadata:
  name: production-apps

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: encrypted-volume
  annotations:
    simplybk/secret-name: encrypted-volume-keys
spec:
  storageClassName: encrypted-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi

Multipath Storage Configuration

Network resilience is a critical consideration in enterprise storage solutions. Hence, multipath storage configuration provides redundancy by allowing multiple network paths for storage communication. By enabling multipathing and specifying a default network interface, organizations can create more robust storage infrastructures that can withstand network interruptions.

The caching node creation further enhances performance by providing an intelligent caching layer that can improve read and write operations. Furthermore, this configuration supports load balancing and reduces potential single points of failure in the storage network.

cachingnode:
  create: true
  multipathing: true
  ifname: eth0  # Default network interface

Best Practices for Kubernetes Storage with Simplyblock

  1. Always specify a unique pool name for each storage configuration
  2. Implement encryption for sensitive workloads
  3. Use QoS parameters to control storage performance
  4. Leverage multi-tenancy features for environment isolation
  5. Regularly monitor storage node capacities and performance

Deletion and Cleanup

# Uninstall the CSI driver
helm uninstall "simplyblock-csi" --namespace "simplyblock-csi"

# Remove the namespace
kubectl delete namespace simplyblock-csi
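
Before removing the namespace, it is worth verifying that no persistent volumes provisioned by the driver are still in use, as removing the driver underneath bound volumes can leave data unreachable:

# Check for leftover volumes and claims before cleanup
kubectl get pv
kubectl get pvc --all-namespaces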

The examples demonstrate the flexibility of Kubernetes storage, showcasing how administrators can fine-tune storage resources to meet specific application requirements while maintaining performance, security, and scalability. Try simplyblock for the most flexible Kubernetes storage solution on the market today.

The post Kubernetes Storage 201: Concepts and Practical Examples appeared first on simplyblock.

]]>
Encryption At Rest: A Comprehensive Guide to DARE https://www.simplyblock.io/blog/encryption-at-rest-dare/ Tue, 17 Dec 2024 10:22:44 +0000 https://www.simplyblock.io/?p=4645 TLDR: Data At Rest Encryption (DARE) is the process of encrypting data when stored on a storage medium. The encryption transforms the readable data (plaintext) into an encoded format (ciphertext) that can only be decrypted with knowledge about the correct encryption key. Today, data is the “new gold” for companies. Data collection happens everywhere and […]

The post Encryption At Rest: A Comprehensive Guide to DARE appeared first on simplyblock.

]]>
TLDR: Data At Rest Encryption (DARE) is the process of encrypting data when stored on a storage medium. The encryption transforms the readable data (plaintext) into an encoded format (ciphertext) that can only be decrypted with knowledge about the correct encryption key.

Today, data is the “new gold” for companies. Data collection happens everywhere and at any time. The global amount of data collected is projected to reach 181 Zettabytes (that is 181 billion Terabytes) by the end of 2025. A whopping 23.13% increase over 2024.

That means data protection is becoming increasingly important. Hence, data security has become paramount for organizations of all sizes. One key aspect of data security protocols is data-at-rest encryption, which provides crucial protection for stored data.

Data At Rest Encryption or Encryption At Rest

Understanding the Data-in-Use, Data-in-Transit, and Data-at-Rest

Before we go deep into DARE, let’s quickly discuss the three states of data within a computing system and infrastructure.

Data at rest encryption, data in transit encryption, data in use encryption
Figure 1: The three states of data in encryption

To start, any type of data is created inside an application. While the application holds onto it, it is considered data in use. That means data in use describes information actively being processed, read, or modified by applications or users.

For example, imagine you open a spreadsheet to edit its contents or a database to process a query. This data is considered “in use.” This state is often the most vulnerable, as the data must be in unencrypted form for processing. However, technologies like confidential computing use encrypted memory, making it possible to process even this data in encrypted form.

Next, data in transit describes any information moving between locations: across the internet, within a private network, or between memory and processors. Email messages being sent, files being downloaded, or database query results traveling between the database server and an application are all examples of data in transit.

Last but not least, data at rest refers to any piece of information stored physically on a digital storage media such as flash storage or hard disk. It also considers storage solutions like cloud storage, offline backups, and file systems as valid digital storage. Hence, data stored in these services is also data at rest. Think of files saved on your laptop’s hard drive or photos stored in cloud storage such as Google Drive, Dropbox, or similar.

Criticality of Data Protection

For organizations, threats to their data are omnipresent. They range from unfortunate human error deleting important information, to coordinated ransomware attacks that encrypt your data and demand a ransom, to actual data leaks.

Especially with data leaks, most people think about external hackers copying data off of the organization’s network and making it public. However, this isn’t the only way data is leaked to the public. There are many examples of Amazon S3 buckets left open without proper authorization, databases reachable from the outside world, or backup services accessed by unauthorized parties.

Anyhow, organizations face increasing threats to their data security. Any data breach has consequences, which are categorized into four segments:

  1. Financial losses from regulatory fines and legal penalties
  2. Damage to brand reputation and loss of customer trust
  3. Exposure of intellectual property to competitors
  4. Compliance violations with regulations like GDPR, HIPAA, or PCI DSS

While data in transit is commonly protected through TLS (Transport Layer Security), data at rest encryption (or DARE) is often an afterthought. This is typically because the setup isn’t as easy and straightforward as it should be. It’s this “I can still do this afterward” effect: “What could possibly go wrong?”

However, data at rest is the most vulnerable of all. Data in use is often unprotected but very transient; while there is a chance of a leak, it is small. Data at rest, by contrast, persists for extended periods of time, giving attackers more time to plan and execute their attacks. Secondly, persistent data often contains the most valuable pieces of information, such as customer records, financial data, or intellectual property. And lastly, access to persistent storage exposes large amounts of data in a single breach, making it so much more interesting to attackers.

Understanding Data At Rest Encryption (DARE)

Simply spoken, Data At Rest Encryption (DARE) transforms stored data into an unreadable format that can only be read with the appropriate encryption or decryption key. This ensures that the information isn’t readable even if unauthorized parties gain access to the storage medium.

That said, the strength of the encryption used is crucial. Many encryption algorithms we have considered secure have been broken over time. Encrypting data once and taking for granted that unauthorized parties can’t access it is just wrong. Data at rest encryption is an ongoing process, potentially involving re-encrypting information when more potent, robust encryption algorithms are available.

Available Encryption Types for Data At Rest Encryption

Symmetric encryption (same encryption key) vs asymmetric encryption (private and public key)
Figure 2: Symmetric encryption (same encryption key) vs asymmetric encryption (private and public key)

Two encryption methods are generally available today for large-scale setups. While we look forward to quantum-safe encryption algorithms, a widely adopted solution has yet to be developed. Quantum-safe means that the encryption will resist breaking attempts from quantum computers.

Anyhow, the first typical type of encryption is the Symmetric Encryption. It uses the same key for encryption and decryption. The most common symmetric encryption algorithm is AES, the Advanced Encryption Standard.

const cipherText = encrypt(plaintext, encryptionKey);
const plaintext = decrypt(cipherText, encryptionKey);
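
As a concrete illustration, here is a minimal sketch of symmetric encryption with AES-256-GCM using Node.js’ built-in crypto module (variable names and the sample plaintext are illustrative):

import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

const encryptionKey = randomBytes(32);  // 256-bit symmetric key
const iv = randomBytes(12);             // unique nonce per encryption

const cipher = createCipheriv("aes-256-gcm", encryptionKey, iv);
const cipherText = Buffer.concat([cipher.update("secret data", "utf8"), cipher.final()]);
const authTag = cipher.getAuthTag();    // GCM integrity tag

const decipher = createDecipheriv("aes-256-gcm", encryptionKey, iv);
decipher.setAuthTag(authTag);
const plaintext = Buffer.concat([decipher.update(cipherText), decipher.final()]).toString("utf8");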

The second type of encryption is Asymmetric Encryption. Here, the encryption and decryption routines use different but related keys, generally referred to as the Private Key and Public Key. As their names suggest, the public key can be publicly known, while the private key must be kept private. Both keys are mathematically connected and generally based on a hard-to-solve mathematical problem. Two standard algorithm families are essential: RSA (Rivest–Shamir–Adleman) and elliptic curve cryptography, with ECDSA (the Elliptic Curve Digital Signature Algorithm) for signatures and ECDH-based schemes for key exchange and encryption. While RSA is based on the prime factorization problem, elliptic curve schemes are based on the discrete logarithm problem. Going into detail about those problems would take more than one additional blog post, though.

const cipherText = encrypt(plaintext, publicKey);
const plaintext = decrypt(cipherText, privateKey);

Encryption in Storage

The use cases for symmetric and asymmetric encryption are different. While considered more secure, asymmetric encryption is slower and typically not used for large data volumes. Symmetric encryption, however, is fast but requires the sharing of the encryption key between the encrypting and decrypting parties, making it less secure.

To encrypt large amounts of data and get the best of both worlds, security and speed, you often see a combination of both approaches. The symmetric encryption key is encrypted with an asymmetric encryption algorithm. After decrypting the symmetric key, it is used to encrypt or decrypt the stored data.
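
A minimal sketch of this envelope encryption pattern, again using Node.js’ crypto module (key sizes and names are assumptions):

import { generateKeyPairSync, publicEncrypt, privateDecrypt, randomBytes } from "node:crypto";

// The asymmetric key pair protects the symmetric data key
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

const dataKey = randomBytes(32);                      // symmetric bulk-encryption key
const wrappedKey = publicEncrypt(publicKey, dataKey); // safe to store next to the data

// Later, before reading or writing the stored data:
const unwrappedKey = privateDecrypt(privateKey, wrappedKey);
// unwrappedKey now encrypts/decrypts the bulk data, e.g., via AES-256-GCM
// as shown in the symmetric example above.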

Simplyblock provides data-at-rest encryption directly through its Kubernetes CSI integration or via CLI and API. Additionally, simplyblock can be configured to use a different encryption key per logical volume for the highest degree of security and multi-tenant isolation.

Key Management Solutions

Next to selecting the encryption algorithm, managing and distributing the necessary keys is crucial. This is where key management solutions (KMS) come in.

Generally, there are two basic types of key management solutions: hardware and software-based.

For hardware-based solutions, you’ll typically utilize an HSM (Hardware Security Module). These HSMs provide a dedicated hardware token (commonly a USB key) to store keys. They offer the highest level of security but need to be physically attached to systems and are more expensive.

Software-based solutions offer a flexible key management alternative. Almost all cloud providers offer their own KMS systems, such as Azure Key Vault, AWS KMS, or Google Cloud KMS. Additionally, third-party key management solutions are available when setting up an on-premises or private cloud environment.

When managing more than a few encrypted volumes, you should implement a key management solution. That’s why simplyblock supports KMS solutions by default.

Implementing Data At Rest Encryption (DARE)

Simply said, you should have a key management solution ready to implement data-at-rest encryption in your company. If you run in the cloud, I recommend using whatever the cloud provider offers. If you run on-premises or the cloud provider doesn’t provide one, select from the existing third-party solutions.

DARE with Linux

As the most popular server operating system, Linux has two options to choose from when you want to encrypt data. The first option works based on files, providing a filesystem-level encryption.

Typical solutions for filesystem-level encryption are eCryptfs, a stacked filesystem that stores encryption information in the header of each encrypted file, and EncFS, a user-space filesystem that runs without special permissions. The benefit of eCryptfs is the ability to copy encrypted files to other systems without decrypting them first; as long as the target node has the necessary encryption key in its keyring, the file can be decrypted. However, both solutions haven’t seen updates in many years, so I’d generally recommend the second approach: block-level encryption.

Block-level encryption transparently encrypts the whole content of a block device (meaning, hard disk, SSD, simplyblock logical volume, etc.). Data is automatically decrypted when read. While there is VeraCrypt, it is mainly a solution for home setups and laptops. For server setups, the most common way to implement block-level encryption is a combination of dm-crypt and LUKS, the Linux Unified Key Setup.

Linux Data At Rest Encryption with dm-crypt and LUKS

The fastest way to encrypt a volume with dm-crypt and LUKS is via the cryptsetup tools.

Debian / Ubuntu / Derivates:

sudo apt install cryptsetup 

RHEL / Rocky Linux / Oracle:

yum install cryptsetup-luks

Now, we need to enable encryption for our block device. In this example, we assume we already have a whole block device or a partition we want to encrypt.

Warning: The following command will delete all data from the given device or partition. Make sure you use the correct device file.

cryptsetup -y luksFormat /dev/xvda

I recommend always running cryptsetup with the -y parameter, which forces it to ask for the passphrase twice. If you misspelled it once, you’ll realize it now, not later.

Now open the encrypted device.

cryptsetup luksOpen /dev/xvda encrypted

This command will ask for the passphrase. The passphrase is not recoverable, so you’d better remember it.

Afterward, the device is ready to be used as /dev/mapper/encrypted. We can format and mount it.

mkfs.ext4 /dev/mapper/encrypted
mkdir /mnt/encrypted
mount /dev/mapper/encrypted /mnt/encrypted
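
When you are done, unmount and close the encrypted device again:

umount /mnt/encrypted
cryptsetup luksClose encrypted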

Data At Rest Encryption with Kubernetes

Kubernetes offers ephemeral and persistent storage as volumes. Those volumes can be statically or dynamically provisioned.

For pre-created and statically provisioned volumes, you can follow the above guide on encrypting block devices on Linux and make the already encrypted device available to Kubernetes.

However, Kubernetes doesn’t offer out-of-the-box support for encrypted and dynamically provisioned volumes. Encrypting persistent volumes is not in Kubernetes’s domain. Instead, it delegates this responsibility to its container storage provider, connected through the CSI (Container Storage Interface).

Note that not all CSI drivers support the data-at-rest encryption, though! But simplyblock does!

Data At Rest Encryption with Simplyblock

DARE stack for Linux: dm-crypt+LUKS vs Simplyblock
Figure 3: Encryption stack for Linux: dm-crypt+LUKS vs Simplyblock

Due to the importance of DARE, simplyblock enables you to secure your data immediately through data-at-rest encryption. Simplyblock goes above and beyond with its features and provides a fully multi-tenant DARE feature set. That said, in simplyblock, you can encrypt multiple logical volumes (virtual block devices) with the same key or one key per logical volume.

The use cases are different. One key per volume enables the highest level of security and complete isolation, even between volumes. You want this to encapsulate applications or teams fully.

When multiple volumes share the same encryption key, the typical pattern is one key per customer. This ensures that customers are isolated against each other and that data cannot be accessed by other customers on a shared infrastructure in the case of a configuration failure or similar incident.

To set up simplyblock, you can configure keys manually or utilize a key management solution. In this example, we’ll set up the key manually. We also assume that simplyblock has already been deployed as your distributed storage platform and that the simplyblock CSI driver is available in Kubernetes.

First, let’s create our Kubernetes StorageClass for simplyblock. Deploy the YAML file via kubectl, just as any other type of resource.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
    name: encrypted-volumes
provisioner: csi.simplyblock.io
parameters:
    encryption: "True"
    csi.storage.k8s.io/fstype: ext4
    # ... other parameters
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true

Secondly, we generate the two keys and note down the results of both commands. Since Kubernetes Secrets expect base64-encoded values in their data fields, encode each generated key with base64 before placing it in the manifest below.

openssl rand -hex 32   # Key 1
openssl rand -hex 32   # Key 2

Now, we can create our secret and Persistent Volume Claim (PVC).

apiVersion: v1
kind: Secret
metadata:
    name: encrypted-volume-keys
data:
    crypto_key1: <base64-encoded key 1>
    crypto_key2: <base64-encoded key 2>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
    annotations:
        simplybk/secret-name: encrypted-volume-keys
    name: encrypted-volume-claim
spec:
    storageClassName: encrypted-volumes
    accessModes:
        - ReadWriteOnce
    resources:
        requests:
            storage: 200Gi

And we’re done. Whenever we use the persistent volume claim, Kubernetes will delegate to simplyblock and ask for the encrypted volume. If it doesn’t exist yet, simplyblock will create it automatically. Otherwise, it will just provide it to Kubernetes directly. All data written to the logical volume is fully encrypted.
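
For illustration, a minimal pod mounting the claim could look like this (pod name and image are arbitrary assumptions):

apiVersion: v1
kind: Pod
metadata:
    name: encrypted-volume-test
spec:
    containers:
        - name: app
          image: nginx
          volumeMounts:
              - name: data
                mountPath: /data
    volumes:
        - name: data
          persistentVolumeClaim:
              claimName: encrypted-volume-claim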

Best Practices for Securing Data at Rest

Implementing robust data encryption is crucial. In the best case, data should never exist in a decrypted state.

That said, data-in-use encryption is still complicated as of writing. However, solutions such as Edgeless Systems’ Constellation exist and make it possible using hardware memory encryption.

Data-in-transit encryption is commonly used today via TLS. If you don’t use it yet, there is no time to waste. Low-hanging fruits first.

Data-at-rest encryption in Windows, Linux, or Kubernetes doesn’t have to be complicated. Solutions such as simplyblock enable you to secure your data with minimal effort.

However, there are a few more things to remember when implementing DARE effectively.

Data Classification

Organizations should classify their data based on sensitivity levels and regulatory requirements. This classification guides encryption strategies and key management policies. A robust data classification system includes three things:

  • Sensitive data identification: Identify sensitive data through automated discovery tools and manual review processes. For example, personally identifiable information (PII) like social security numbers should be classified as highly sensitive.
  • Classification levels: Establish clear classification levels such as Public, Internal, Confidential, and Restricted. Each level should have defined handling requirements and encryption standards.
  • Automated classification: Implement automated classification tools to scan and categorize data based on content patterns and metadata.

Access Control and Key Management

Encryption is only as strong as the key management and permission control around it. If your keys are leaked, the strongest encryption is useless.

Therefore, it is crucial to implement strong access controls and key rotation policies. Regular key rotation minimizes the impact of potential key compromises and, I hate to say it, of employees leaving the company.

Monitoring and Auditing

Understanding potential risks early is essential. That’s why organizations must maintain comprehensive logs of all access to encrypted data and to the encryption keys or key management solution. Also, regular audits should be scheduled to look for suspicious activities.

In the best case, multiple teams run independent audits to prevent internal leaks through dubious employees. While it may sound harsh, there are situations in life where people take the wrong path. Not necessarily on purpose or because they want to.

Data Minimization

The most secure data is the data that isn’t stored. Hence, you should only store necessary data.

Apart from that, encrypt only what needs protection. While this may sound counterintuitive, it reduces the attack surface and the performance impact of encryption.

Data At Rest Encryption: The Essential Component of Data Management

Data-at-rest encryption (DARE) has become essential for organizations handling sensitive information. The rise in data breaches and increasingly stringent regulatory requirements make it vital to protect stored information. Additionally, with the rise of cloud computing and distributed systems, implementing DARE is more critical than ever.

Simplyblock integrates natively with Kubernetes to provide a seamless approach to implementing data-at-rest encryption in modern containerized environments. With our support for transparent encryption, your organization can secure its data without any application changes. Furthermore, simplyblock utilizes the standard NVMe over TCP protocol which enables us to work natively with Linux. No additional drivers are required. Use a simplyblock logical volume straight from your dedicated or virtual machine, including all encryption features.

Anyhow, for organizations running Kubernetes, whether in public clouds, private clouds, or on-premises, DARE serves as a fundamental security control. By following best practices and using modern tools like Simplyblock, organizations can achieve robust data protection while maintaining system performance and usability.

But remember that DARE is just one component of a comprehensive data security strategy. It should be combined with other security controls, such as access management, network security, and security monitoring, to create a defense-in-depth approach to protecting sensitive information.

That all said, by following the guidelines and implementations detailed in this article, your organization can effectively protect its data at rest while maintaining system functionality and performance.

As threats continue to evolve, having a solid foundation in data encryption becomes increasingly crucial for maintaining data security and regulatory compliance.

What is Data At Rest Encryption?

Data At Rest Encryption (DARE), or encryption at rest, is the encryption process of data when stored on a storage medium. The encryption transforms the readable data (plaintext) into an encoded format (ciphertext) that can only be decrypted with knowledge about the correct encryption key.

What is Data-At-Rest?

Data-at-rest refers to any piece of information written to physical storage media, such as flash storage or hard disk. This also includes storage solutions like cloud storage, offline backups, and file systems as valid digital storage.

What is Data-In-Use?

Data-in-use describes information actively being processed, read, or modified by applications or users. Any data created inside an application is considered in use for as long as the application holds onto it.

What is Data-In-Transit?

Data-in-transit describes any information moving between locations, such as data sent across the internet, moved within a private network, or transmitted between memory and processors.

The post Encryption At Rest: A Comprehensive Guide to DARE appeared first on simplyblock.

]]>
Scale Up vs Scale Out: System Scalability Strategies https://www.simplyblock.io/blog/scale-up-vs-scale-out/ Wed, 11 Dec 2024 10:00:40 +0000 https://www.simplyblock.io/?p=4595 TLDR: Horizontal scalability (scale out) describes a system that scales by adding more resources through parallel systems, whereas vertical scalability (scale up) increases the amount of resources on a single system. One of the most important questions to answer when designing an application or infrastructure is the architecture approach to system scalability. Traditionally, systems used […]

The post Scale Up vs Scale Out: System Scalability Strategies appeared first on simplyblock.

]]>
TLDR: Horizontal scalability (scale out) describes a system that scales by adding more resources through parallel systems, whereas vertical scalability (scale up) increases the amount of resources on a single system.

One of the most important questions to answer when designing an application or infrastructure is the architectural approach to system scalability. Traditionally, systems used the scale-up approach, or vertical scalability. Many modern systems, especially in the cloud-native ecosystem, use a scale-out approach, also called horizontal scalability.

Scale-Up vs Scale-Out: Which System Architecture is Right for You?

Understanding the Basics

Understanding the fundamental concepts is essential when discussing system architectures. Hence, let’s briefly overview the two approaches before exploring them in more depth.

  • With Scale Up (Vertical Scalability), you increase resources (typically CPU, memory, and storage) in the existing system to improve performance and capacity.
  • With Scale Out (Horizontal Scalability), you add additional nodes or machines to the existing workforce to distribute the workload across multiple systems.

Both architectural approaches have their respective advantages and disadvantages. While scale-up architectures are easier to implement, they become hard or impossible to scale beyond a certain point. Scale-out architectures, on the other hand, are more complex to implement but scale almost linearly if done right.

Vertical Scaling (Scale Up) Architectures: The Traditional Approach

Scale Up Storage Architecture with disks being added to the same machine.
Figure 1: Scale-up storage architecture with disks being added to the same machine

Vertical scaling, commonly known as scaling up, involves adding more resources to an existing system to increase its power or capacity.

Think of it as upgrading your personal computer. Instead of buying a second computer, you add more RAM or install a faster processor or larger storage device. In enterprise storage systems, this typically means adding more CPU cores, memory, or storage drives to an existing server. For virtual machines, it usually means increasing the resources the host assigns to the VM.

To clarify, let’s use a real-world example from the storage industry. A ZFS-based SAN (Storage Area Network) system requires a scale-up design. Or as Jason Lohrey wrote: «However, ZFS has a significant issue – it can’t scale out. ZFS’s biggest limitation is that it is “scale-up” only.» ZFS, as awesome as it is, is limited to a single machine. That said, increasing the storage capacity always means adding larger or more disks to the existing machine. This approach maintains the simplicity of the original architecture while increasing storage capacity and potentially improving performance.

Strengths of Vertical Scaling

Today, many people see the vertical scalability approach as outdated and superfluous. That is, however, not necessarily true. Vertical scaling shines in several scenarios.

First, implementing a scale-up system is generally more straightforward since it doesn’t require changes to your application architectures or complex data distribution logic. When you scale up a transactional database like PostgreSQL or MySQL, you essentially give it more operational resources while maintaining the same operational model.

Secondly, the management overhead is lower. Tasks such as backups, monitoring, and maintenance are straightforward. This simplicity often translates to lower operational costs despite the potentially higher hardware costs.

Here is a quick overview of all the advantages:

  1. Simplicity: It’s straightforward to implement since you’re just adding resources to an existing system
  2. Lower Complexity: Less architectural overhead since you’re working with a single system
  3. Consistent Performance: Lower latency due to all resources being in one place
  4. Software Compatibility: Most traditional software is designed to run on a single system
  5. Lower Initial Costs: Often cheaper for smaller workloads due to simpler licensing and management

Weaknesses and Limitations of Scale-Up Architectures

Like anything in this world, vertical scaling architectures also have drawbacks. The most significant limitation is the so-called physical ceiling: a system is limited by the space in its server chassis and by the limits of the hardware architecture. You can only add as much hardware as those limitations allow. Beyond that, you need to migrate to a bigger base system.

Traditional monolithic applications often face another challenge with vertical scaling: adding more resources doesn’t always translate to linear performance improvements. For example, doubling the CPU cores might yield only a 50% performance increase due to software architecture limitations, especially resource contention.

Here is a quick overview of all the disadvantages:

  1. Hardware Limits: The physical ceiling limits how much you can scale up based on maximum hardware specifications
  2. Downtime During Upgrades: Usually requires system shutdown for hardware upgrades
  3. Cost Efficiency: High-end hardware becomes exponentially more expensive
  4. Single Point of Failure: No built-in redundancy
  5. Limited Flexibility: Cannot easily scale back down when demand decreases

When to Scale Up?

After all that, here is when you really want to go with a scale-up architecture:

  • You have traditional monolithic applications
  • You look for an easier way to optimize for performance, not capacity
  • You’re dealing with applications that aren’t designed for distributed computing
  • You need a quick solution for immediate performance issues

Horizontal Scaling (Scale Out) Architectures: The Distributed Approach

Scale-out storage architecture with additional nodes being added to the cluster
Figure 2: Scale-out storage architecture with additional nodes being added to the cluster

The fundamentally different approach is the horizontal scaling or scale-out architecture. Instead of increasing the available resources on the existing system, you add more systems to distribute the load across them. This is similar to adding more workers to an assembly line rather than trying to make one worker more efficient.

Consider a distributed storage system like simplyblock or a distributed database like MongoDB. When you scale out these systems, you add more nodes to the cluster, and the workload gets distributed across all nodes. Each node handles a portion of the data and processing, allowing the system to grow almost limitlessly.

Advantages of Horizontal Scaling

Large-scale deployments and highly distributed systems are the forte of scale-out architectures. As a simple example, most modern web applications utilize load balancers that distribute traffic across multiple application servers, allowing them to handle millions of concurrent requests and users. Similarly, distributed storage systems like simplyblock scale to petabytes of data by adding additional storage nodes.

Secondly, another significant advantage is improved high availability and fault tolerance. In a properly designed scale-out system, if one node fails, the system continues operating. While it may degrade to a reduced service, it will not experience a complete system failure or outage.

To bring this all to a point:

  1. Near-Infinite Scalability: Can continue adding nodes as needed
  2. Better Fault Tolerance: Built-in redundancy through multiple nodes
  3. Cost Effectiveness: Can use commodity hardware
  4. Flexible Resource Allocation: Easy to scale up or down based on demand
  5. High Availability: No single point of failure

The Cost of Distribution: Weakness and Limitations of Horizontal Scalability

The primary challenge when considering scale-out architectures is complexity. Distributed systems must maintain data consistency across system boundaries, cope with network communication and latency, and recover from partial failures. Multiple algorithms have been developed over the years to address these problems; the most commonly used ones are Raft and Paxos, but that’s a different blog post. Anyhow, this complexity typically requires more sophisticated management tools and distributed systems expertise, usually also within the team operating the system.

The second challenge is the overhead of system coordination. In a distributed system, nodes must synchronize their operations. If not careful, this can introduce latency and even reduce the performance of certain types of operations. Great distributed systems utilize sophisticated algorithms to prevent these issues from happening.

Here is a quick overview of the disadvantages of horizontal scaling:

  1. Increased Complexity: More moving parts to manage
  2. Data Consistency Challenges: Maintaining consistency across nodes can be complex
  3. Higher Initial Setup Costs: Requires more infrastructure and planning
  4. Software Requirements: Applications must be designed for distributed computing
  5. Network Overhead: Communication between nodes adds latency

Kubernetes: A Modern Approach to Scaling

Kubernetes has become the de facto platform for container orchestration. It comes in multiple varieties, in its vanilla form or as the basis for systems like OpenShift or Rancher. Either way, it can be used for both vertical and horizontal scaling capabilities. However, Kubernetes has become a necessity when deploying scale-out services. Let’s look at how different workloads scale in a Kubernetes environment.

Scaling Stateless Workloads

Stateless applications, like web servers or API gateways, are natural candidates for horizontal scaling in Kubernetes. The Horizontal Pod Autoscaler (HPA) provided by Kubernetes automatically adjusts the number of pods based on metrics such as CPU or RAM utilization. Custom metrics as triggers are also possible.

Horizontally scaling stateless applications is easy. As the name suggests, stateless applications do not maintain persistent local or shared state data. Each instance or pod is entirely independent and interchangeable. Each request to the service contains all the required information needed for processing.

That said, automatically scaling up and down (in the meaning of starting new instances or shutting some down) is part of the typical lifecycle and can happen at any point in time.
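
As a brief illustration, a minimal HPA manifest could look like this (the deployment name, replica bounds, and CPU threshold are assumptions):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70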

Scaling Stateful Workloads

Stateful workloads, like databases, require more careful consideration.

A common approach for more traditional databases like PostgreSQL or MySQL is to use a primary-replica architecture. In this design, write operations always go to the primary instance, while read operations can be distributed across all replicas.

On the other hand, MongoDB, which uses a distributed database design, can scale out more naturally by adding more shards to the cluster. Their internal cluster design uses a technique called sharding. Data is assigned to horizontally scaling partitions distributed across the cluster nodes. Shard assignment happens either automatically (based on the data) or by providing a specific shard key, enabling data affinity. Adding a shard to the cluster will increase capacity when additional scale is necessary. Data rebalancing happens automatically.
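
For illustration, enabling sharding in the MongoDB shell takes just two commands (database, collection, and shard key below are hypothetical):

// Enable sharding for a database, then shard a collection by a hashed key
sh.enableSharding("shop")
sh.shardCollection("shop.orders", { customerId: "hashed" })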

Why we built Simplyblock on a Scale-Out Architecture

Simplyblock's scale-out architecture with storage pooling via cluster nodes.
Figure 3: Simplyblock’s scale-out architecture with storage pooling via cluster nodes

Stateful workloads, like Postgres or MySQL, can scale out by adding additional read replicas to the cluster. However, every single instance needs storage for its own data. Hence, the need for scalable storage arises.

Simplyblock is a cloud-native and distributed storage platform built to deliver scalable performance and virtually infinite capacity for logical devices through horizontal scalability. Unlike traditional storage systems, simplyblock distributes data across all cluster nodes, multiplying the performance and capacity.

Designed as an NVMe-first architecture, simplyblock uses the NVMe over Fabrics protocol family. This extends the reach of the highly scalable NVMe protocol over network fabrics such as TCP, Fibre Channel, and others. Furthermore, it provides built-in support for multipathing, enabling seamless failover and load balancing.

The system uses a distributed data placement algorithm to spread data across all available cluster nodes, automatically rebalancing data when nodes are added or removed. When writing data, simplyblock splits the item into multiple, smaller chunks and distributes them. This allows for parallel access during read operations. The data distribution also provides redundancy, with parity information stored on other nodes in the cluster. This protects the data against individual disk and node failures.

Using this architecture, simplyblock provides linear capacity and performance scalability by pooling all available disks and parallelizing access. This enables simplyblock to scale from mere terabytes to multiple petabytes while maintaining performance, consistency, and durability characteristics throughout the cluster-growth process.

Building Future-Proof Infrastructure

To wrap up, when you build out a new system infrastructure or application, consider these facts:

Flowchart when to scale-up or scale-out?
Figure 4: Flowchart when to scale-up or scale-out?
  1. Workload characteristics: CPU-intensive workloads might benefit more from vertical scaling. Distributing operations comes with its own overhead. If the operation itself doesn’t offset this overhead, you might see lower performance than with vertical scaling. On the other hand, I/O-heavy workloads might perform better with horizontal scaling. If the access patterns are highly parallelizable, a horizontal architecture will most likely outscale a vertical one.
  2. Growth patterns: Predictable, steady growth might favor scaling up, while rapid growth patterns might necessitate the flexibility of scaling out. This isn’t a hard rule, though. A carefully designed scale-out system will provide a very predictable growth pattern and latency. However, the application isn’t the only element to take into account when designing the system, as there are other components, most prominently the network and network equipment.
  3. Future-Proofing: Scaling out often requires little upfront investment in infrastructure but higher investment in development and expertise. It can, however, provide better long-term cost efficiency for large deployments. That said, buying a scale-out solution is a great idea. With a storage solution like simplyblock, for example, you can start small and add required resources whenever necessary. With traditional storage solutions, you have to go with a higher upfront cost and are limited by the physical ceiling.
  4. Operational Complexity: Scale-up architectures are typically easier to manage, while a stronger DevOps or operations team is required to handle scale-out solutions. That’s why simplyblock’s design is carefully crafted to be fully autonomous and self-healing, with as few hands-on requirements as possible.

The Answer Depends

That means there is no universal answer to whether scaling up or out is better. A consultant would say, “It depends.” Seriously, it does. It depends on your specific requirements, constraints, and goals.

Many successful organizations use a hybrid approach, scaling up individual nodes while also scaling out their overall infrastructure. The key is understanding the trade-offs and choosing the best approach to your needs while keeping future growth in mind. Hence, simplyblock provides the general scale-out architecture for infinite scalability. It also provides a way to utilize storage located in Kubernetes worker nodes as part of the storage cluster to provide the highest possible performance. At the same time, it maintains the option to spill over when local capacity is reached and the high durability and fault tolerance of a fully distributed storage system.

Remember, the best scaling strategy aligns with your business objectives while maintaining performance, reliability, and cost-effectiveness. Whether you scale up, out, or both, ensure your choice supports your long-term infrastructure goals.

Simple definition of scale up vs scale out.
Figure 5: Simple definition of scale up vs scale out.

The post Scale Up vs Scale Out: System Scalability Strategies appeared first on simplyblock.

]]>
NVMe & Kubernetes: Future-Proof Infrastructure https://www.simplyblock.io/blog/nvme-kubernetes-future-proof-infrastructure/ Wed, 27 Nov 2024 13:34:00 +0000 https://www.simplyblock.io/?p=4370 The marriage of NVMe storage and Kubernetes persistent volumes represents a perfect union of high-performance storage and modern container orchestration. As organizations increasingly move performance-critical workloads to Kubernetes, understanding how to leverage NVMe technology becomes crucial for achieving optimal performance and efficiency. The Evolution of Storage in Kubernetes When Kubernetes was created over 10 years […]

The post NVMe & Kubernetes: Future-Proof Infrastructure appeared first on simplyblock.

]]>
The marriage of NVMe storage and Kubernetes persistent volumes represents a perfect union of high-performance storage and modern container orchestration. As organizations increasingly move performance-critical workloads to Kubernetes, understanding how to leverage NVMe technology becomes crucial for achieving optimal performance and efficiency.

The Evolution of Storage in Kubernetes

When Kubernetes was created over 10 years ago, its only purpose was to schedule and orchestrate stateless workloads. Since then, a lot has changed, and Kubernetes is increasingly used for stateful workloads. Not just basic ones but mission-critical workloads, like a company’s primary databases. The promise of workload orchestration in infrastructures with growing complexity is too significant.

Anyhow, traditional Kubernetes storage solutions relied upon and still rely on old network-attached storage protocols like iSCSI. Released in 2000, iSCSI was built on the SCSI protocol itself, first introduced in the 1980s. Hence, both protocols are inherently designed for spinning disks with much higher seek times and access latencies. According to our modern understanding of low latency and low complexity, they just can’t keep up.

While these solutions worked well for basic containerized applications, they fall short for high-performance workloads like databases, AI/ML training, and real-time analytics. Let’s look at the NVMe standard, particularly NVMe over TCP, which has transformed our thinking about storage in containerized environments, not just Kubernetes.

Why NVMe and Kubernetes Work So Well Together

The beauty of this combination lies in their complementary architectures. The NVMe protocol and command set were designed from the ground up for parallel, low-latency operations–precisely what modern containerized applications demand. When you combine NVMe’s parallelism with Kubernetes’ orchestration capabilities, you get a system that can efficiently distribute I/O-intensive workloads while maintaining microsecond-level latency. Further, comparing NVMe over TCP vs iSCSI, we see significant improvement in terms of IOPS and latency performance when using NVMe/TCP.

Consider a typical database workload on Kubernetes. Traditional storage might introduce latencies of 2-4ms for read operations. With NVMe over TCP, these same operations complete in under 200 microseconds–a 10-20x improvement. This isn’t just about raw speed; it’s about enabling new classes of applications to run effectively in containerized environments.

The Technical Symphony

The integration of NVMe with Kubernetes is particularly elegant through persistent volumes and the Container Storage Interface (CSI). Modern storage orchestrators like simplyblock leverage this interface to provide seamless NVMe storage provisioning while maintaining Kubernetes’ declarative model. This means development teams can request high-performance storage using familiar Kubernetes constructs while the underlying system handles the complexity of NVMe management, providing fully reliable shared storage.

The NVMe Impact: A Real-World Example

But what does that mean for actual workloads? Our friends over at Percona found in their MongoDB Performance on Kubernetes report that Kubernetes implies no performance penalty. Hence, we can look at the disks’ actual raw performance.

A team of researchers from the University of Southern California, San Jose State University, and Samsung Semiconductor took on the challenge of measuring the implications of NVMe SSDs (over SATA SSD and SATA HDD) for real-world database performance.

The general performance characteristics of their test hardware:

                    NVMe SSD     SATA SSD    SATA HDD
Access latency      113µs        125µs       14,295µs
Maximum IOPS        750,000      70,000      190
Maximum Bandwidth   3 GB/s       278 MB/s    791 KB/s
Table 1: General performance characteristics of the different storage types

Their summary states, “scale-out systems are driving the need for high-performance storage solutions with high available bandwidth and lower access latencies. To address this need, newer standards are being developed exclusively for non-volatile storage devices like SSDs,” and “NVMe’s hardware and software redesign of the storage subsystem translates into real-world benefits.”

They’re closing with some direct comparisons that claim an 8x performance improvement of NVMe-based SSDs compared to a single SATA-based SSD and still a 5x improvement over a [Hardware] RAID-0 of four SATA-based SSDs.

Transforming Database Operations

Perhaps the most compelling use case for NVMe in Kubernetes is database operations. Typical modern databases process queries significantly faster when storage isn’t the bottleneck. This becomes particularly important in microservices architectures where concurrent database requests and high-load scenarios are the norm.

Traditionally, running stateful services in Kubernetes meant accepting significant performance overhead. With NVMe storage, organizations can now run high-performance databases, caches, and messaging systems with bare-metal-like performance in their Kubernetes clusters.

Dynamic Resource Allocation

One of Kubernetes’ central promises is dynamic resource allocation. That means assigning CPU and memory according to actual application requirements. Furthermore, it also means dynamically allocating storage for stateful workloads. With storage classes, Kubernetes provides the option to assign different types of storage backends to different types of applications. While not strictly necessary, this can be a great application of the “best tool for the job” principle.

That said, for IO-intensive workloads, such as databases, a storage backend providing NVMe storage is essential. NVMe’s ability to handle massive I/O parallelism aligns perfectly with Kubernetes’ scheduling capabilities. Storage resources can be dynamically allocated and deallocated based on workload demands, ensuring optimal resource utilization while maintaining performance guarantees.

Simplified High Availability

The low latency of NVMe over TCP enables new approaches to high availability. Instead of complex database replication schemes, organizations can leverage storage-level replication (or more storage-efficient erasure coding, like in the case of simplyblock) with a negligible performance impact. This significantly simplifies application architecture while improving reliability.

Furthermore, NVMe over TCP utilizes multipathing as an automatic fail-over implementation to protect against network connection issues and sudden connection drops, increasing the high availability of persistent volumes in Kubernetes.

The Physics Behind NVMe Performance

Many teams don’t realize how profoundly storage physics impacts database operations. Traditional storage solutions averaging 2-4ms latency might seem fast, but this translates to a hard limit of about 80 consistent transactions per second, even before considering CPU or network overhead. Each transaction requires multiple storage operations: reading data pages, writing to WAL, updating indexes, and performing one or more fsync() operations. At 3ms per operation, these quickly stack up into significant delays. Many teams spend weeks optimizing queries or adding memory when their real bottleneck is fundamental storage latency.
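
A rough back-of-the-envelope calculation makes this concrete. Assuming four storage operations per transaction:

4 ops × 3 ms   = 12 ms per transaction  → ~83 transactions/second
4 ops × 0.2 ms = 0.8 ms per transaction → ~1,250 transactions/second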

This is where the NVMe and Kubernetes combination truly shines. With NVMe as your Kubernetes persistent volume storage backend, providing sub-200μs latency, the same database operations can theoretically support over 1,200 transactions per second–a 15x improvement. More importantly, this dramatic reduction in storage latency changes how databases behave under load. Connection pools remain healthy longer, buffer cache decisions become more efficient, and query planners can make better optimization choices. With the storage bottleneck removed, databases can finally operate closer to their theoretical performance limits.

Looking Ahead

The combination of NVMe and Kubernetes is just beginning to show its potential. As more organizations move performance-critical workloads to Kubernetes, we’ll likely see new patterns and use cases that fully take advantage of this powerful combination.

Some areas to watch:

  • AI/ML workload optimization through intelligent data placement
  • Real-time analytics platforms leveraging NVMe’s parallel access capabilities
  • Next-generation database architectures built specifically for NVMe on Kubernetes Persistent Volumes

The marriage of NVMe-based storage and Kubernetes Persistent Volumes represents more than just a performance improvement. It’s a fundamental shift in how we think about storage for containerized environments. Organizations that understand and leverage this combination effectively gain a significant competitive advantage through improved performance, reduced complexity, and better resource utilization.

For a deeper dive into implementing NVMe storage in Kubernetes, visit our guide on optimizing Kubernetes storage performance.

The post NVMe & Kubernetes: Future-Proof Infrastructure appeared first on simplyblock.

]]>
How to Build Scalable and Reliable PostgreSQL Systems on Kubernetes https://www.simplyblock.io/blog/postgresql-on-kubernetes/ Mon, 25 Nov 2024 15:31:37 +0000 https://www.simplyblock.io/?p=4315 This is a guest post by Sanskar Gurdasani, DevOps Engineer, from CloudRaft. Maintaining highly available and resilient PostgreSQL databases is crucial for business continuity in today’s cloud-native landscape. The Cloud Native PostgreSQL Operator provides robust capabilities for managing PostgreSQL clusters in Kubernetes environments, particularly in handling failover scenarios and implementing disaster recovery strategies. In this blog […]

The post How to Build Scalable and Reliable PostgreSQL Systems on Kubernetes appeared first on simplyblock.

]]>
This is a guest post by Sanskar Gurdasani, DevOps Engineer, from CloudRaft.

Maintaining highly available and resilient PostgreSQL databases is crucial for business continuity in today’s cloud-native landscape. The Cloud Native PostgreSQL Operator provides robust capabilities for managing PostgreSQL clusters in Kubernetes environments, particularly in handling failover scenarios and implementing disaster recovery strategies.

In this blog post, we’ll explore the key features of the Cloud Native PostgreSQL Operator for managing failover and disaster recovery. We’ll discuss how it ensures high availability, implements automatic failover, and facilitates disaster recovery processes. Additionally, we’ll look at best practices for configuring and managing PostgreSQL clusters using this operator in Kubernetes environments.

Why Run Postgres on Kubernetes?

Running PostgreSQL on Kubernetes offers several advantages for modern, cloud-native applications:

  1. Stateful Workload Readiness: Contrary to old beliefs, Kubernetes is now ready for stateful workloads like databases. A 2021 survey by the Data on Kubernetes Community revealed that 90% of respondents believe Kubernetes is suitable for stateful workloads, with 70% already running databases in production.
  2. Immutable Application Containers: CloudNativePG leverages immutable application containers, enhancing deployment safety and repeatability. This approach aligns with microservice architecture principles and simplifies updates and patching.
  3. Cloud-Native Benefits: Running PostgreSQL on Kubernetes embraces cloud-native principles, fostering a DevOps culture, enabling microservice architectures, and providing robust container orchestration.
  4. Automated Management: Kubernetes operators like CloudNativePG extend Kubernetes controllers to manage complex applications like PostgreSQL, handling deployments, failovers, and other critical operations automatically.
  5. Declarative Configuration: CloudNativePG allows for declarative configuration of PostgreSQL clusters, simplifying change management and enabling Infrastructure as Code practices.
  6. Resource Optimization: Kubernetes provides efficient resource management, allowing for better utilization of infrastructure and easier scaling of database workloads.
  7. High Availability and Disaster Recovery: Kubernetes facilitates the implementation of high availability architectures across availability zones and enables efficient disaster recovery strategies.
  8. Streamlined Operations with Operators: Using operators like CloudNativePG automates all the tasks mentioned above, significantly reducing operational complexity. These operators act as PostgreSQL experts in code form, handling intricate database management tasks such as failovers, backups, and scaling with minimal human intervention. This not only increases reliability but also frees up DBAs and DevOps teams to focus on higher-value activities, ultimately leading to more robust and efficient database operations in Kubernetes environments.

By leveraging Kubernetes for PostgreSQL deployments, organizations can benefit from increased automation, improved scalability, and enhanced resilience for their database infrastructure, with operators like CloudNativePG further simplifying and optimizing these processes.

List of Postgres Operators

Kubernetes operators represent an innovative approach to managing applications within a Kubernetes environment by encapsulating operational knowledge and best practices. These extensions automate the deployment and maintenance of complex applications, such as databases, ensuring smooth operation in a Kubernetes setup.

The Cloud Native PostgreSQL Operator is a prime example of this concept, specifically designed to manage PostgreSQL clusters on Kubernetes. This operator automates various database management tasks, providing a seamless experience for users. Some key features include direct integration with the Kubernetes API server for high availability without relying on external tools, self-healing capabilities through automated failover and replica recreation, and planned switchover of the primary instance to maintain data integrity during maintenance or upgrades.

Additionally, the operator supports scalable architecture with the ability to manage multiple instances, declarative management of PostgreSQL configuration and roles, and compatibility with Local Persistent Volumes and separate volumes for WAL files. It also offers continuous backup solutions to object stores like AWS S3, Azure Blob Storage, and Google Cloud Storage, ensuring data safety and recoverability. Furthermore, the operator provides full recovery and point-in-time recovery options from existing backups, TLS support with client certificate authentication, rolling updates for PostgreSQL minor versions and operator upgrades, and support for synchronous replicas and HA physical replication slots. It also offers replica clusters for multi-cluster PostgreSQL deployments, connection pooling through PgBouncer, a native customizable Prometheus metrics exporter, and LDAP authentication support.

By leveraging the Cloud Native PostgreSQL Operator, organizations can streamline their database management processes on Kubernetes, reducing manual intervention and ensuring high availability, scalability, and security in their PostgreSQL deployments. This operator showcases how Kubernetes operators can significantly enhance application management within a cloud-native ecosystem.

Here are the most popular PostgreSQL operators:

  1. CloudNativePG (formerly known as Cloud Native PostgreSQL Operator)
  2. Crunchy Data Postgres Operator (first released in 2017)
  3. Zalando Postgres Operator (first released in 2017)
  4. StackGres (released in 2020)
  5. Percona Operator for PostgreSQL (released in 2021)
  6. Kubegres (released in 2021)
  7. Patroni (a template for HA PostgreSQL solutions, written in Python)

Understanding Failover in PostgreSQL

Primary-Replica Architecture

In a PostgreSQL cluster, the primary-replica (formerly master-slave) architecture consists of:

  • Primary Node: Handles all write operations and can also serve read operations
  • Replica Nodes: Maintain synchronized copies of the primary node’s data

Simplyblock architecture diagram of a PostgreSQL cluster running on Kubernetes with local persistent volumes

Automatic Failover Process

When the primary node becomes unavailable, the operator initiates the following process:

  1. Detection: Continuous health monitoring identifies primary node failure
  2. Election: A replica is selected to become the new primary
  3. Promotion: The chosen replica is promoted to primary status
  4. Reconfiguration: Other replicas are reconfigured to follow the new primary
  5. Service Updates: Kubernetes services are updated to point to the new primary

Implementing Disaster Recovery

Backup Strategies

The operator supports multiple backup approaches:

1. Volume Snapshots

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql-cluster
spec:
  instances: 3
  backup:
    volumeSnapshot:
      className: csi-hostpath-snapclass
      enabled: true
      snapshotOwnerReference: true
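
Once the snapshot class is configured, an on-demand backup can be requested declaratively through the operator’s Backup resource. The following is a minimal, hedged sketch; the backup name is an arbitrary placeholder, while method selects the snapshot path configured above:

apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: postgresql-cluster-snap-1   # arbitrary example name
spec:
  method: volumeSnapshot            # take the backup via CSI volume snapshots
  cluster:
    name: postgresql-cluster        # references the Cluster defined above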

2. Barman Integration

spec:
  backup:
    barmanObjectStore:
      destinationPath: 's3://backup-bucket/postgres'
      endpointURL: 'https://s3.amazonaws.com'
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY

Disaster Recovery Procedures

  1. Point-in-Time Recovery (PITR)
    • Enables recovery to any specific point in time
    • Uses WAL (Write-Ahead Logging) archives
    • Minimizes data loss (a configuration sketch follows this list)
  2. Cross-Region Recovery
    • Maintains backup copies in different geographical regions
    • Enables recovery in case of regional failures
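
To make the PITR option above concrete, here is a hedged sketch of a recovery cluster that replays WAL from a Barman object store up to a target time. The timestamp and source name are placeholders rather than values from a real deployment:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql-cluster-pitr
spec:
  instances: 3
  storage:
    size: 20Gi
  bootstrap:
    recovery:
      source: postgresql-cluster
      recoveryTarget:
        # Hypothetical point in time; WAL is replayed up to here
        targetTime: '2024-10-01 12:00:00+00'
  externalClusters:
    - name: postgresql-cluster
      barmanObjectStore:
        destinationPath: 's3://backup-bucket/postgres'
        endpointURL: 'https://s3.amazonaws.com'
        s3Credentials:
          accessKeyId:
            name: aws-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: aws-creds
            key: ACCESS_SECRET_KEY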

Demo

This section provides a step-by-step guide to setting up a CloudNative PostgreSQL cluster, testing failover, and performing disaster recovery.

Architecture of a PostgreSQL cluster with primary and replica

1. Installation

Method 1: Direct Installation

kubectl apply --server-side -f \
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/main/releases/cnpg-1.24.0.yaml

Method 2: Helm Installation

helm repo add cnpg https://cloudnative-pg.github.io/charts
helm upgrade --install cnpg \
  --namespace cnpg-system \
  --create-namespace \
  cnpg/cloudnative-pg

Verify the Installation

kubectl get deployment -n cnpg-system cnpg-controller-manager

Install CloudNativePG Plugin

CloudNativePG provides a plugin for kubectl to manage a cluster in Kubernetes. You can install the cnpg plugin using a variety of methods.

Via the installation script

curl -sSfL \
  https://github.com/cloudnative-pg/cloudnative-pg/raw/main/hack/install-cnpg-plugin.sh | \
  sudo sh -s -- -b /usr/local/bin

If you already have Krew installed, you can simply run:

kubectl krew install cnpg

2. Create S3 Credentials Secret

First, create an S3 bucket and an IAM user with S3 access. Then, create a Kubernetes secret with the IAM credentials:

kubectl create secret generic s3-creds \
  --from-literal=ACCESS_KEY_ID=your_access_key_id \
  --from-literal=ACCESS_SECRET_KEY=your_secret_access_key

3. Create PostgreSQL Cluster

Create a file named cluster.yaml with the following content:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example
spec:
  backup:
    barmanObjectStore:
      destinationPath: 's3://your-bucket-name/retail-master-db'
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: ACCESS_SECRET_KEY
  instances: 2
  imageName: ghcr.io/clevyr/cloudnativepg-timescale:16-ts2
  postgresql:
    shared_preload_libraries:
      - timescaledb
  bootstrap:
    initdb:
      postInitTemplateSQL:
        - CREATE EXTENSION IF NOT EXISTS timescaledb;
  storage:
    size: 20Gi

Apply the configuration to create the cluster:

kubectl apply -f cluster.yaml

Verify the cluster status:

kubectl cnpg status example

4. Getting Access

Deploying a cluster is one thing; actually accessing it is another. CloudNativePG creates three services for every cluster, named after the cluster name. In our case, these are:

kubectl get service

  • example-rw: Always points to the Primary node
  • example-ro: Points only to Replica nodes (round-robin)
  • example-r: Points to any node in the cluster (round-robin)

5. Insert Data

Create a PostgreSQL client pod:

kubectl run pgclient --image=postgres:13 --command -- sleep infinity

Connect to the database:

kubectl exec -ti example-1 -- psql app

Create a table and insert data:

CREATE TABLE stocks_real_time (
  time TIMESTAMPTZ NOT NULL,
  symbol TEXT NOT NULL,
  price DOUBLE PRECISION NULL,
  day_volume INT NULL
);

SELECT create_hypertable('stocks_real_time', by_range('time'));
CREATE INDEX ix_symbol_time ON stocks_real_time (symbol, time DESC);
GRANT ALL PRIVILEGES ON TABLE stocks_real_time TO app;

INSERT INTO stocks_real_time (time, symbol, price, day_volume)
VALUES
  (NOW(), 'AAPL', 150.50, 1000000),
  (NOW(), 'GOOGL', 2800.75, 500000),
  (NOW(), 'MSFT', 300.25, 750000);

6. Failover Test

Force a backup:

kubectl cnpg backup example

Initiate failover by deleting the primary pod:

kubectl delete pod example-1

Monitor the cluster status:

kubectl cnpg status example

Key observations during failover:

  1. Initial status: “Switchover in progress”
  2. After approximately 2 minutes 15 seconds: “Waiting for instances to become active”
  3. After approximately 3 minutes 30 seconds: Complete failover with new primary
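
To follow the transition live instead of re-running the status command by hand, you can wrap it in the standard watch utility (assuming it is available on your workstation):

watch -n 5 kubectl cnpg status example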

Verify data integrity after failover through service:

Retrieve the database password:

kubectl get secret example-app -o \
  jsonpath="{.data.password}" | base64 --decode

Connect to the database using the password:

kubectl exec -it pgclient -- psql -h example-rw -U app

Execute the following SQL queries:

-- Confirm the count matches the number of rows inserted earlier; it should show 3
SELECT COUNT(*) FROM stocks_real_time;

-- Insert new data to test write capability after failover
INSERT INTO stocks_real_time (time, symbol, price, day_volume)
VALUES (NOW(), 'NFLX', 500.75, 300000);

SELECT * FROM stocks_real_time ORDER BY time DESC LIMIT 1;

Check read-only service:

kubectl exec -it pgclient -- psql -h example-ro -U app

Once connected, execute:

SELECT COUNT(*) FROM stocks_real_time;

Review logs of both pods:

kubectl logs example-1
kubectl logs example-2

Examine the logs for relevant failover information.

Perform a final cluster status check:

kubectl cnpg status example

Confirm both instances are running and roles are as expected.

7. Backup and Restore Test

First, check the current status of your cluster:

kubectl cnpg status example

Note the current state, number of instances, and any important details.

Promote the example-1 node to Primary:

kubectl cnpg promote example example-1

Monitor the promotion process, which typically takes about 3 minutes to complete.

Check the updated status of your cluster, then create a new backup:

kubectl cnpg backup example --backup-name=example-backup-1

Verify the backup status:

kubectl get backups
NAME               AGE   CLUSTER   METHOD              PHASE       ERROR
example-backup-1   38m   example   barmanObjectStore   completed

Delete the original cluster, then prepare for the recovery test:

kubectl delete cluster example

There are two methods to perform a cluster recovery bootstrap from another cluster in CloudNativePG (for further details, please refer to the documentation):

  • Using a recovery object store, that is a backup of another cluster created by Barman Cloud and defined via the barmanObjectStore option in the externalClusters section (recommended)
  • Using an existing Backup object in the same namespace (this was the only option available before version 1.8.0).

Method 1: Recovery from an Object Store

You can recover from a backup created by Barman Cloud and stored on supported object storage. Once you have defined the external cluster, including all the required configurations in the barmanObjectStore section, you must reference it in the .spec.recovery.source option.

Create a file named example-object-restored.yaml with the following content:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-object-restored
spec:
  instances: 2
  imageName: ghcr.io/clevyr/cloudnativepg-timescale:16-ts2
  postgresql:
    shared_preload_libraries:
      - timescaledb
  storage:
    size: 1Gi
  bootstrap:
    recovery:
      source: example
  externalClusters:
    - name: example
      barmanObjectStore:
        destinationPath: 's3://your-bucket-name'
        s3Credentials:
          accessKeyId:
            name: s3-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: s3-creds
            key: ACCESS_SECRET_KEY

Apply the restored cluster configuration:

kubectl apply -f example-object-restored.yaml

Monitor the restored cluster status:

kubectl cnpg status example-object-restored

Retrieve the database password:

kubectl get secret example-object-restored-app \
  -o jsonpath="{.data.password}" | base64 --decode

Connect to the restored database:

kubectl exec -it pgclient -- psql -h example-object-restored-rw -U app

Verify the restored data by executing the following SQL queries:

-- It should show 4 rows now
SELECT COUNT(*) FROM stocks_real_time;
SELECT * FROM stocks_real_time;

The successful execution of these steps to recover from an object store confirms the effectiveness of the backup and restore process.

Delete the example-object-restored cluster, then prepare for the backup object restore test:

kubectl delete cluster example-object-restored

Method 2: Recovery from Backup Object

In case a Backup resource is already available in the namespace in which the cluster should be created, you can specify its name through .spec.bootstrap.recovery.backup.name.

Create a file named example-restored.yaml:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-restored
spec:
  instances: 2
  imageName: ghcr.io/clevyr/cloudnativepg-timescale:16-ts2
  postgresql:
    shared_preload_libraries:
      - timescaledb
  storage:
    size: 1Gi
  bootstrap:
    recovery:
      backup:
        name: example-backup-1

Apply the restored cluster configuration:

kubectl apply -f example-restored.yaml

Monitor the restored cluster status:

kubectl cnpg status example-restored

Retrieve the database password:

kubectl get secret example-restored-app \
  -o jsonpath="{.data.password}" | base64 --decode

Connect to the restored database:

kubectl exec -it pgclient -- psql -h example-restored-rw -U app

Verify the restored data by executing the following SQL queries:

SELECT COUNT(*) FROM stocks_real_time;
SELECT * FROM stocks_real_time;

The successful execution of these steps confirms the effectiveness of the backup and restore process.

Kubernetes Events and Logs

1. Failover Events

Monitor events using:

# Watch cluster events
kubectl get events --watch | grep postgresql

# Get specific cluster events
kubectl describe cluster example | grep -A 10 Events

Key events to monitor:

  • Primary selection process
  • Replica promotion events
  • Connection switching events
  • Replication status changes

2. Backup Status

Monitor backup progress:

# Check backup status
kubectl get backups

# Get detailed backup info
kubectl describe backup example-backup-1

Key metrics:

  • Backup duration
  • Backup size
  • Compression ratio
  • Success/failure status

3. Recovery Process

Monitor recovery status:

# Watch recovery progress
kubectl cnpg status example-restored

# Check recovery logs
kubectl logs example-restored-1 -c postgres

Important recovery indicators (see the example after this list):

  • WAL replay progress
  • Timeline switches
  • Recovery target status
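
To spot-check these indicators directly, you can query PostgreSQL’s built-in recovery functions. A hedged example, assuming example-restored-2 currently acts as a replica:

# pg_is_in_recovery() returns true on a replica; the LSN shows WAL replay progress
kubectl exec -ti example-restored-2 -- \
  psql -c 'SELECT pg_is_in_recovery(), pg_last_wal_replay_lsn();'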

Conclusion

The Cloud Native PostgreSQL Operator significantly simplifies the management of PostgreSQL clusters in Kubernetes environments. By following these practices for failover and disaster recovery, organizations can maintain highly available database systems that recover quickly from failures while minimizing data loss. Remember to regularly test your failover and disaster recovery procedures to ensure they work as expected when needed. Continuous monitoring and proactive maintenance are key to maintaining a robust PostgreSQL infrastructure.

Everything fails, all the time. ~ Werner Vogels, CTO, Amazon Web Services

Editorial: And if you are looking for a distributed, scalable, reliable, and durable storage for your PostgreSQL cluster in Kubernetes, or for any other Kubernetes storage need, simplyblock is the solution you’re looking for.

 

The post How to Build Scalable and Reliable PostgreSQL Systems on Kubernetes appeared first on simplyblock.

Serverless Compute Need Serverless Storage https://www.simplyblock.io/blog/serverless-compute-need-serverless-storage/ Wed, 23 Oct 2024 11:37:27 +0000 https://www.simplyblock.io/?p=3391

The use of serverless infrastructures is steeply increasing. As the Datadog “State of Serverless 2023” survey shows, more than half of all cloud customers have already adopted a serverless environment on the three big hyperscalers—at least to some extent. The promise of saving cost while automatically and indefinitely scaling (up and down) keeps attracting new users.

Due to this movement, other cloud operators, many database companies (such as Neon and Nile), and infrastructure teams at large enterprises are building serverless environments, either on their premises or in their private cloud platforms.

While there are great options for serverless compute, providing serverless storage to your serverless platform tends to be more challenging. This is often fueled by a lack of understanding of what serverless storage has to provide and its requirements.

What is a Serverless Architecture?

Serverless architecture is a software design pattern that leverages serverless computing resources to build and run applications without managing the underlying architecture. These serverless compute resources are commonly provided by cloud providers such as AWS Lambda, Google Cloud Functions, or Azure Functions and can be dynamically scaled up and down.

Simplified serverless architecture with clients and multiple functions

When designing a serverless architecture, you’ll encounter the so-called Function-as-a-Service (FaaS), meaning that the application’s core logic will be implemented in small, stateless functions that respond to events.

That said, typically, several FaaS make up the actual application, sending events between them. Since the underlying infrastructure is abstracted away, the functions don’t know how requests or responses are handled. Their implementations are usually built against a cloud-provider-specific API, which effectively bakes vendor lock-in into the application.

Cloud-vendor-agnostic solutions exist, such as knative, but require at least parts of the team to manage the Kubernetes infrastructure. They can, however, take the burden away from other internal and external development teams.
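
To give an idea of what this looks like, a function deployment on knative boils down to a short Service manifest; the platform then handles routing, autoscaling, and scale-to-zero. A minimal sketch, with a placeholder container image:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-fn
spec:
  template:
    spec:
      containers:
        # Placeholder image; any container serving HTTP works here
        - image: ghcr.io/example/hello-fn:latest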

What is Serverless Compute?

While a serverless architecture describes the application design that runs on top of a serverless compute infrastructure, serverless compute itself describes the cloud computing model in which the cloud provider dynamically manages the allocation and provisioning of server resources.

Simplified serverless platform architecture

It is essential to understand that serverless doesn’t mean “without servers” but “as a user, I don’t have to plan, provision, or manage the infrastructure.”

In essence, the cloud provider (or whoever manages the serverless infrastructure) takes the burden from the developer. Serverless compute environments fully auto-scale, starting or stopping instances of the functions according to the needed capacity. Due to their stateless nature, it’s easy to stop and restart them at any point in time. That means that function instances are often very short-lived.

Popular serverless compute platforms include AWS Lambda, Google Cloud Functions, and Azure Functions. For self-managed operations, there is knative (mentioned before), as well as OpenFaaS and OpenFunction (which has seen less activity recently).

They all enable developers to focus on writing code without managing the underlying infrastructure.

What is a Serverless Storage System?

Serverless storage refers to a cloud storage model where the underlying infrastructure, capacity planning, and scaling are abstracted away from the user. With serverless storage, customers don’t have to worry about provisioning or managing storage servers or volumes. Instead, they can store and retrieve data while the serverless storage handles all the backend infrastructure.

Serverless storage solutions come in different forms and shapes, beginning with an object storage interface, such as Amazon S3 or Google Cloud Storage. Object storage is excellent when storing unstructured data, such as documents or media.

Serverless storage options are available in GCP, AWS, and Azure

Another option that people love to use for serverless storage is serverless databases. Various options are available, depending on your needs: relational, NoSQL, time-series, and graph databases. This might be the easiest way to go, depending on how you need to access data. Examples of such serverless databases include Amazon Aurora Serverless, Google’s Cloud Datastore, and external companies such as Neon or Nile.

When self-managing your serverless infrastructure with knative or one of the alternative options, you can use Kubernetes CSI storage providers to provide storage to your functions. However, you may add considerable startup time if you choose the wrong CSI driver. I might be biased, but simplyblock is an excellent option with its negligible provisioning and attachment times, as well as features such as multi-attach, where a volume can be attached to multiple functions (for example, to provide a shared set of data).
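
As a sketch of how such storage is wired into a Kubernetes-based serverless platform: a StorageClass references the CSI driver, and functions mount a PersistentVolumeClaim created from it. The provisioner string below is a hypothetical placeholder, not the actual simplyblock driver identifier:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: serverless-block
provisioner: csi.example.io        # hypothetical CSI driver name
allowVolumeExpansion: true         # volumes can be grown later on demand
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-function-data
spec:
  storageClassName: serverless-block
  accessModes:
    - ReadWriteMany                # multi-attach: several function pods share it
  resources:
    requests:
      storage: 10Gi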

Why Serverless Architectures?

Most people think of cost-efficiency when it comes to serverless architectures. However, this is only one side of the coin, and the savings only materialize if your use cases are a good fit for a serverless environment. More on when serverless makes sense later.

In serverless architectures, functions are triggered through an event, either from the outside world (like an HTTP request) or an event initiated by another function. If no function instance is up and running, a new instance will be started. The same goes for situations where all function instances are busy. If function instances idle, they’ll be shut down.

Serverless functions usually use a pay-per-use model. A function’s extremely short lifespan can lead to cost reductions over deployment models like containers and virtual machines, which tend to run longer.

Apart from that, serverless architectures have more benefits. Many point in the same direction as those of microservices architectures, but with the premise of being easier to implement and maintain.

First and foremost, serverless solutions are designed for scalability and elasticity. They quickly and automatically scale up and down depending on the incoming workload. It’s all hands-free.

Another benefit is that development cycles are often shortened. Due to the limited size and functionality of a FaaS, changes are fast to implement and easy to test. Additionally, updating the function is as simple as deploying the new version. All existing function instances finish their current work and shut down. In the meantime, the latest version will be started up. Due to its stateless nature, this is easy to achieve.

What are the Complexities of Serverless Architecture?

Writing serverless solutions has the benefits of fast iteration, simplified deployments, and potential cost savings. However, they also come with their own set of complexities.

Designing real stateless code isn’t easy, at least when we’re not just talking about simple transformation functionality. That’s why a FaaS receives and passes context information along during its events.


What works great for small bits of context is challenging for larger pieces. In this situation, a larger context, or state, can mean many things: simple cross-request information that should be available without transferring it with every request, more involved data such as lookup information used to enrich and cross-check, and, all the way up the scale, actual complex data, like when you want to implement a serverless database. And yes, a serverless database needs to store its data somewhere.

That’s where serverless storage comes in, and simply put, this is why all serverless solutions have state storage alternatives.

What is Serverless Storage?

Serverless storage refers to storage solutions that are fully integrated into serverless compute environments without manual intervention. These solutions scale and grow according to user demand and complement the pay-by-use payment model of serverless platforms.

Serverless storage lets you store information across multiple requests or functions. 

As mentioned above, cloud environments offer a wide selection of serverless storage options. However, all of them are vendor-bound and lock you into their services. 

However, when you design your own serverless infrastructure or service, these managed offerings don’t help you. It’s up to you to provide the serverless storage. In this case, a cloud-native storage engine built with serverless in mind can simplify this task immensely. Whether you want to provide object storage, a serverless database, or file-based storage, an underlying cloud-native block storage solution is the perfect building block underneath. However, this block storage solution needs to scale and grow with your needs, be quick to provision, and support snapshotting, cloning, and attaching to multiple function instances.

Why do Serverless Architectures Require Serverless Storage?

Serverless storage has particular properties designed for serverless environments. It needs to keep up with the specific requirements of serverless architectures: most notably short lifetimes, extremely fast scaling up and down or restarts, easy use across multiple versions during updates, and easy integration through the APIs utilized by the FaaS.

The most significant requirements are that the storage must be usable by multiple function instances simultaneously and be quickly available to new instances on other nodes, regardless of whether those are migrated over or used for scaling out. That means the underlying storage technology must be prepared to handle these tasks easily.

These are just the most significant requirements, but there are more:

  1. Stateless nature: Serverless functions spin up, execute, and terminate due to their stateless nature. Without fast, persistent storage that can be attached or accessed without any additional delay, this fundamental property of serverless functions would become a struggle.
  2. Scalability needs: Serverless compute is built to scale automatically based on user demand. A storage layer needs to seamlessly support the growth and shrinking of serverless infrastructures and handle variations in I/O patterns, meaning that traditional storage systems with fixed capacity limits don’t align well with the requirements of serverless workloads.
  3. Cost efficiency: One reason people engage with serverless compute solutions is cost efficiency. Serverless compute users pay by actual execution time. That means that serverless storage must support similar payment structures and help serverless infrastructure operators efficiently manage and scale their storage capacities and performance characteristics.
  4. Management overhead: Serverless compute environments are designed to eliminate manual server management. Therefore, the storage solution needs to minimize its manual administrative tasks. Allocating and scaling storage must be fully automatable via API calls, if not entirely automatic. Also, the integration must be seamless if multiple storage tiers are available for additional cost savings.
  5. Performance requirements: Serverless functions require fast, if not immediate, access to data when they spin up. Traditional storage solutions introduce delays due to allocation and additional latency, negatively impacting serverless functions’ performance. As functions are paid by runtime, their operational cost increases.
  6. Integration needs: Serverless architectures typically combine many services, as individual functions use different services. That said, the underlying storage solution of a serverless environment needs to support all kinds of services provided to users. Additionally, seamless integration with the management services of the serverless platform is required.

That is quite a list of requirements. To align serverless compute and serverless storage, storage solutions need to provide an efficient and manageable layer that seamlessly integrates with the overall management layer of the serverless platform.

Simplyblock for Serverless Storage

When designing a serverless environment, the storage layer must be designed to keep up with the pace. Simplyblock enables serverless infrastructures to provide dynamic and scalable storage.

To achieve this, simplyblock provides several characteristics that perfectly align with serverless principles:

  1. Dynamic resource allocation: Simplyblock’s thin provisioning makes capacity planning irrelevant. Storage is allocated on-demand as data is written, similar to how serverless platforms allocate resources. That means every volume can be arbitrarily large to accommodate unpredictable future growth. Additionally, simplyblock’s logical volumes are resizable, meaning that the volume can be enlarged at any point in the future (see the sketch after this list).
  2. Automatic scaling: Simplyblock’s storage engine can indefinitely grow. To acquire additional backend storage, simplyblock can automatically acquire additional persistent disks (like Amazon EBS volumes) from cloud providers or attach additional storage nodes to its cluster when capacity is about to exceed, handling scaling without user intervention.
  3. Abstraction of infrastructure: Users interact with simplyblock’s virtual drives like normal hard disks. This abstracts away the complexity of the underlying storage pooling and backend storage technologies.
  4. Unified interface: Simplyblock provides a unified storage interface (NVMe) logical device that abstracts away underlying, diverging storage interfaces behind an easy-to-understand disk design. That enables services not specifically designed to talk to object storages or similar technologies to immediately benefit from them, just like PostgreSQL or MySQL.
  5. Extensibility: Due to its disk-like storage interface, simplyblock is highly extensible in terms of solutions that can be run on top of it. Databases, object storage, file storage, and specific storage APIs, simplyblock provides scalable block storage to all of them, making it the perfect backend solution for serverless environments.
  6. Crash-consistent and recoverable: Serverless storage must always be up and running. Simplyblock’s distributed erasure coding (parity information similar to RAID-5 or 6) enables high availability and fault tolerance on the storage level with a high storage efficiency, way below simple replication. Additionally, simplyblock provides storage cluster replication (sync / async), consistent snapshots across multiple logical volumes, and disaster recovery options.
  7. Automated management: With features like automatic storage tiering to cheaper object storage (such as Amazon S3), automatic scaling, as well as erasure coding and backups for data protection, simplyblock eliminates manual management overhead and hands-on tasks. Simplyblock clusters are fully autonomous and manage the underlying storage backend automatically.
  8. Flexible integration: Serverless platforms require storage to be seamlessly allocated and provisioned. Simplyblock achieves this through its API, which can be integrated into the standard provisioning flow of new customer sign-ups. If the new infrastructure runs on Kubernetes, integration is even easier with the Kubernetes CSI driver, allowing seamless integration with container-based serverless platforms such as knative.
  9. Pay-per-use potential: Due to the automatic scalability, thin provisioning, and seamless resizing and integration, simplyblock enables you to provide your customers with an industry-loved pay-by-use model for managed service providers, perfectly aligning with the most common serverless pricing models.
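
To illustrate the on-demand resizing mentioned in point 1: with volume expansion enabled on the StorageClass, growing a volume is a single declarative change to the claim. A hedged sketch, reusing the placeholder claim name from the earlier example:

kubectl patch pvc shared-function-data \
  --patch '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'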

Simplyblock is the perfect backend storage for all your serverless storage needs while future-proofing your infrastructure. As data grows and evolves, simplyblock’s flexibility and scalability ensure you can adapt without massive overhauls or migrations.

Remember, simplyblock offers powerful features like thin provisioning, storage pooling, and tiering, helping you to provide a cost-efficient, pay-by-use enabled storage solution. Get started now and find out how easy it is to operate services on top of simplyblock.

The post Serverless Compute Need Serverless Storage appeared first on simplyblock.

NVMe Storage for Database Optimization: Lessons from Tech Giants https://www.simplyblock.io/blog/nvme-database-optimization/ Thu, 17 Oct 2024 13:27:59 +0000 https://www.simplyblock.io/?p=3304 Leveraging NVMe-based storage for databases brings a whole new set of capabilities and performance optimization opportunities. In this blog, we explore how you can adopt NVMe storage for your database workloads, with case studies from tech giants such as Pinterest and Discord.

Database Scalability Challenges in the Age of NVMe

In 2024, data-driven organizations increasingly recognize the crucial importance of adopting NVMe storage solutions to stay competitive. With NVMe adoption still below 30%, there’s significant room for growth as companies seek to optimize their database performance and storage efficiency. We’ve looked at how major tech companies have tackled database optimization and scalability challenges, often turning to self-hosted database solutions and NVMe storage.

While it’s interesting to see what Netflix or Pinterest engineers are investing their efforts into, it is also essential to ask yourself how your organization is adopting new technologies. As companies grow and their data needs expand, traditional database setups often struggle to keep up. Let’s look at some examples of how some of the major tech players have addressed these challenges.

Pinterest’s Journey to Horizontal Database Scalability with TiDB

Pinterest, which handles billions of pins and user interactions, faced significant challenges with its HBase setup as it scaled. As their business grew, HBase struggled to keep up with evolving needs, prompting a search for a more scalable database solution. They eventually decided to go with TiDB as it provided the best performance under load.

Selection Process:

  • Evaluated multiple options, including RocksDB, ShardDB, Vitess, VoltDB, Phoenix, Spanner, CosmosDB, Aurora, TiDB, YugabyteDB, and DB-X.
  • Narrowed down to TiDB, YugabyteDB, and DB-X for final testing.

Evaluation:

  • Conducted shadow traffic testing with production workloads.
  • TiDB performed well after tuning, providing sustained performance under load.

TiDB Adoption:

  • Deployed 20+ TiDB clusters in production.
  • Stores over 200+ TB of data across 400+ nodes.
  • Primarily uses TiDB 2.1 in production, with plans to migrate to 3.0.

Key Benefits:

  • Improved query performance, with 2-10x improvements in p99 latency.
  • More predictable performance with fewer spikes.
  • Reduced infrastructure costs by about 50%.
  • Enabled new product use cases due to improved database performance.

Challenges and Learnings:

  • Encountered issues like TiCDC throughput limitations and slow data movement during backups.
  • Worked closely with PingCAP to address these issues and improve the product.

Future Plans:

  • Exploring multi-region setups.
  • Considering removing Envoy as a proxy to the SQL layer for better connection control.
  • Exploring migrating to Graviton instance types for a better price-performance ratio and EBS for faster data movement (and, in turn, shorter MTTR on node failures).

Uber’s Approach to Scaling Datastores with NVMe

Uber, facing exponential growth in active users and ride volumes, needed a robust solution for the challenges around their datastore, “Docstore”.

Hosting Environment and Limitations:

  • Initially on AWS, later migrated to hybrid cloud and on-premises infrastructure
  • Uber’s massive scale and need for customization exceeded the capabilities of managed database services

Uber’s Solution: Schemaless and MySQL with NVMe

  • Schemaless: A custom solution built on top of MySQL
  • Sharding: Implemented application-level sharding for horizontal scalability
  • Replication: Used MySQL replication for high availability
  • NVMe storage: Leveraged NVMe disks for improved I/O performance

Results:

  • Able to handle over 100 billion queries per day
  • Significantly reduced latency for read and write operations
  • Improved operational simplicity compared to Cassandra

Discord’s Storage Evolution and NVMe Adoption

Discord, facing rapid growth in user base and message volume, needed a scalable and performant storage solution.

Hosting Environment and Limitations:

  • Google Cloud Platform (GCP)
  • Discord’s specific performance requirements and need for customization led them to self-manage their database infrastructure

Discord’s storage evolution:

  1. MongoDB: Initially used for its flexibility, but faced scalability issues
  2. Cassandra: Adopted for better scalability but encountered performance and maintenance challenges
  3. ScyllaDB: Finally settled on ScyllaDB for its performance and compatibility with Cassandra

Discord also created a solution called “superdisk”, with RAID0 on top of the local SSDs and RAID1 between the Persistent Disk and the RAID0 array. This let them configure the database with a disk drive offering low-latency reads while still benefiting from the best properties of Persistent Disks. One can think of it as a “simplyblock v0.1”.

Figure 1: Discord’s “superdisk” architecture
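
Conceptually, such a “superdisk” can be approximated with Linux software RAID. The following is a rough, hedged sketch of the idea rather than Discord’s actual configuration; all device names are placeholders:

# Stripe the local NVMe SSDs into a fast RAID0 array
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1

# Mirror the RAID0 array with the durable persistent disk; marking the
# persistent disk write-mostly steers reads to the fast local leg
mdadm --create /dev/md1 --level=1 --raid-devices=2 \
  /dev/md0 --write-mostly /dev/sdb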

Key improvements with ScyllaDB:

  • Reduced P99 latencies from 40-125ms to 15ms for read operations
  • Improved write performance, with P99 latencies dropping from 5-70ms to a consistent 5ms
  • Better resource utilization, allowing Discord to reduce their cluster size from 177 Cassandra nodes to just 72 ScyllaDB nodes

Summary of Case Studies

In the table below, we can see a summary of the key initiatives taken by these tech giants and their respective outcomes. Notably, all of the companies were self-hosting their databases (on Kubernetes or on bare-metal servers) and leveraged local SSDs (NVMe) for improved read/write performance and lower latency. At the same time, however, they all had to work around the data protection and scalability limitations of local disks. Discord, for example, uses RAID to mirror the disk, which causes significant storage overhead. Such an approach also doesn’t offer a logical management layer (i.e., “storage/disk virtualization”). In the next paragraphs, let’s explore how simplyblock adds even more performance, scalability, and resource efficiency to such setups.

Company   | Database | Hosting environment                                | Key initiative
----------|----------|----------------------------------------------------|----------------
Pinterest | TiDB     | AWS EC2 & Kubernetes, local NVMe disk              | Improved performance & scalability
Uber      | MySQL    | Bare-metal, NVMe storage                           | Reduced read/write latency, improved scalability
Discord   | ScyllaDB | Google Cloud, local NVMe disk with RAID mirroring  | Reduced latency, improved performance and resource utilization

The Role of Intelligent Storage Optimization in NVMe-Based Systems

While these case studies demonstrate the power of NVMe and optimized database solutions, there’s still room for improvement. This is where intelligent storage optimization solutions like simplyblock are spearheading market changes.

Simplyblock vs. Local NVMe SSD: Enhancing Database Scalability

While local NVMe disks offer impressive performance, simplyblock provides several critical advantages for database scalability. Simplyblock builds a persistent layer out of local NVMe disks, which means it is not just a cache and not just ephemeral storage. Let’s explore the benefits of simplyblock over local NVMe disks:

  1. Scalability: Unlike local NVMe storage, simplyblock offers dynamic scalability, allowing storage to grow or shrink as needed. Simplyblock can scale performance and capacity beyond the local node’s disk size, significantly improving tail latency.
  2. Reliability: Data on local NVMe is lost if an instance is stopped or terminated. Simplyblock provides advanced data protection that survives instance outages.
  3. High Availability: Local NVMe loses data availability during the node outage. Simplyblock ensures storage remains fully available even if a compute instance fails.
  4. Data Protection Efficiency: Simplyblock uses erasure coding (parity information) instead of triple replication, reducing network load and improving effective-to-raw storage ratios by about 150% (for a given amount of NVMe disk, there is 150% more usable storage with simplyblock). As a worked example: under triple replication, 300 TB of raw NVMe yields 100 TB of usable capacity, while a hypothetical 10+2 erasure-coded scheme yields 250 TB from the same raw capacity.
  5. Predictable Performance: As IOPS demand increases, local NVMe access latency rises, often causing a significant increase in tail latencies (p99 latency). Simplyblock maintains constant access latencies at scale, improving both median and p99 access latency. Simplyblock also allows for much faster writes at high IOPS, as it doesn’t use the NVMe layer as a write-through cache; hence, its performance isn’t dependent on a backing persistent storage layer (e.g., S3).
  6. Maintainability: Upgrading compute instances impacts local NVMe storage. With simplyblock, compute instances can be maintained without affecting storage.
  7. Data Services: Simplyblock provides advanced data services like snapshots, cloning, resizing, and compression without significant overhead on CPU performance or access latency.
  8. Intelligent Tiering: Simplyblock automatically moves infrequently accessed data to cheaper S3 storage, a feature unavailable with local NVMe.
  9. Thin Provisioning: This allows for more efficient use of storage resources, reducing overprovisioning common in cloud environments.
  10. Multi-attach Capability: Simplyblock enables multiple nodes to access the same volume, which is useful for high-availability setups without data duplication. Additionally, multi-attach can decrease the complexity of volume management and data synchronization.

Technical Deep Dive: Simplyblock’s Architecture

Simplyblock’s architecture is designed to maximize the benefits of NVMe while addressing common cloud storage challenges:

  1. NVMe-oF (NVMe over Fabrics) Interface: Exposes storage as NVMe volumes, allowing for seamless integration with existing systems while providing the low-latency benefits of NVMe (see the example after this list).
  2. Distributed Data Plane: Uses a statistical placement algorithm to distribute data across nodes, balancing performance and reliability.
  3. Logical Volume Management: Supports thin provisioning, instant resizing, and copy-on-write clones, providing flexibility for database operations.
  4. Asynchronous Replication: Utilizes a block-storage-level write-ahead log (WAL) that’s asynchronously replicated to object storage, enabling disaster recovery with near-zero RPO (Recovery Point Objective).
  5. CSI Driver: Provides seamless integration with Kubernetes, allowing for dynamic provisioning and lifecycle management of volumes.

Below is a short overview of simplyblock’s high-level architecture in the context of PostgreSQL, MySQL, or Redis instances hosted in Kubernetes. Simplyblock creates a clustered shared pool out of local NVMe storage attached to Kubernetes compute worker nodes (storage is persistent, protected by erasure coding), serving database instances with the performance of local disk but with an option to scale out into other nodes (which can be either other compute nodes or separate, disaggregated, storage nodes). Further, the “colder” data is tiered into cheaper storage pools, such as HDD pools or object storage.

Figure 2: Simplified simplyblock architecture

Applying Simplyblock to Real-World Scenarios

Let’s explore how simplyblock could enhance the setups of the companies we’ve discussed:

Pinterest and TiDB with simplyblock

While TiDB solved Pinterest’s scalability issues, and they are exploring Graviton instances and EBS for a better price-performance ratio and faster data movement, simplyblock could potentially offer additional benefits:

  1. Price/Performance Enhancement: Simplyblock’s storage orchestration could complement Pinterest’s move to Graviton instances, potentially amplifying the price-performance benefits. By intelligently managing storage across different tiers (including EBS and local NVMe), simplyblock could help optimize storage costs while maintaining or even improving performance.
  2. MTTR Improvement & Faster Data Movements: In line with Pinterest’s goal of faster data movement and reduced Mean Time To Recovery (MTTR), simplyblock’s advanced data management capabilities could further accelerate these processes. Its efficient data protection with erasure coding and multi-attach capabilities helps with smooth failovers or node failures without performance degradation. If a node fails, simplyblock can quickly and autonomously rebuild the data on another node using parity information provided by erasure coding, eliminating downtime.
  3. Better Scalability through Disaggregation: Simplyblock’s architecture allows for the disaggregation of storage and compute, which aligns well with Pinterest’s exploration of different instance types and storage options. This separation would provide Pinterest with greater flexibility in scaling their storage and compute resources independently, potentially leading to more efficient resource utilization and easier capacity planning.

Figure 3: Simplyblock’s multi-attach functionality visualized

Uber’s Schemaless

While Uber’s custom Schemaless solution on MySQL with NVMe storage is highly optimized, simplyblock could still offer benefits:

  1. Unified Storage Interface: Simplyblock could provide a consistent interface across Uber’s diverse storage needs, simplifying operations.
  2. Intelligent Data Placement: For Uber’s time-series data (like ride information), simplyblock’s tiering could automatically optimize data placement based on age and access patterns.
  3. Enhanced Disaster Recovery: Simplyblock’s asynchronous replication to S3 could complement Uber’s existing replication strategies, potentially improving RPO.

Discord and ScyllaDB

Discord’s move to ScyllaDB already provided significant performance improvements, but simplyblock could further enhance their setup:

  1. NVMe Resource Pooling: By pooling NVMe resources across nodes, simplyblock would allow Discord to further reduce their node count while maintaining performance.
  2. Cost-Efficient Scaling: For Discord’s rapidly growing data needs, simplyblock’s intelligent tiering could help manage costs as data volumes expand.
  3. Simplified Cloning for Testing: Simplyblock’s instant cloning feature could be valuable for Discord’s development and testing processes. It allows for quick replication of production data without additional storage overhead.

What’s next in the NVMe Storage Landscape?

The case studies from Pinterest, Uber, and Discord highlight the importance of continuous innovation in database and storage technologies. These companies have pushed beyond the limitations of managed services like Amazon RDS to create custom, high-performance solutions often built on NVMe storage.

However, the introduction of intelligent storage optimization solutions like simplyblock represents the next frontier in this evolution. By providing an innovative layer of abstraction over diverse storage types, implementing smart data placement strategies, and offering features like thin provisioning and instant cloning alongside tight integration with Kubernetes, simplyblock spearheads market changes in how companies approach storage optimization.

As data continues to grow exponentially and performance demands increase, the ability to intelligently manage and optimize NVMe storage will become ever more critical. Solutions that can seamlessly integrate with existing infrastructure while providing advanced features for performance, cost optimization, and disaster recovery will be key to helping companies navigate the challenges of the data-driven future.

The trend towards NVMe adoption, coupled with intelligent storage solutions like simplyblock is set to reshape the database infrastructure landscape. Companies that embrace these technologies early will be well-positioned to handle the data challenges of tomorrow, gaining a significant competitive advantage in their respective markets.

The post NVMe Storage for Database Optimization: Lessons from Tech Giants appeared first on simplyblock.
