NVMe Archives | simplyblock
https://www.simplyblock.io/blog/tags/nvme/

NVMe over TCP vs iSCSI: Evolution of Network Storage
https://www.simplyblock.io/blog/nvme-over-tcp-vs-iscsi/ | Wed, 08 Jan 2025

TLDR: In a direct comparison of NVMe over TCP vs iSCSI, we see that NVMe over TCP outranks iSCSI in all categories with IOPS improvements of up to 50% (and more) and latency improvements by up to 34%.

When data grows, storage needs to grow, too. That’s when remotely attached SAN (Storage Area Network) systems come in. So far, these have commonly been connected through one of three protocols: Fibre Channel, Infiniband, or iSCSI, with iSCSI sitting on the “low end” of the spectrum, as it doesn’t require special hardware to operate. NVMe over Fabrics (NVMe-oF), and specifically NVMe over TCP (NVMe/TCP) as the successor of iSCSI, is on the rise to replace these legacy protocols and bring immediate improvements in latency, throughput, and IOPS.

iSCSI: The Quick History Lesson

Figure 1: Nokia 3310, released September 2000 (Source: Wikipedia)

iSCSI is a protocol that connects remote storage solutions (commonly hardware storage appliances) to storage clients. The latter are typically servers without (or with minimal) local storage, as well as virtual machines. In recent years, we have also seen iSCSI being used as a backend for container storage.

iSCSI stands for Internet Small Computer System Interface and encapsulates the standard SCSI commands within TCP/IP packets. That means iSCSI works over commodity Ethernet networks, removing the need for specialized hardware such as dedicated network cards (NICs) and switches.

The iSCSI standard was first released in early 2000. A world that was very different from today. Do you remember what a phone looked like in 2000?

That said, while there was access to the first flash-based systems, prices were still outrageous, and storage systems were designed with spinning disks in mind. Remember that. We’ll come back to it later.

What is SCSI?

SCSI, or, you guessed it, the Small Computer System Interface, is a set of standards for connecting and transferring data between computers and peripheral devices. Originally developed in the 1980s, SCSI has been a foundational technology for data storage interfaces, supporting various device types, primarily hard drives, optical drives, and scanners.

While SCSI kept improving and adding new commands to keep up with newer storage technologies, its foundation is still rooted in the early 1980s. Nevertheless, many standards still use the SCSI command set, including SATA (home computers), SAS (servers), and iSCSI.

What is NVMe?

Non-Volatile Memory Express (NVMe) is a modern PCI Express-based (PCIe) storage interface. With the original specification dating back to 2011, NVMe is engineered specifically for solid-state drives (SSDs) connected via the PCIe bus. NVMe devices are directly attached to the CPU and other devices on the bus, which increases throughput and decreases latency. Compared to traditional storage interfaces, NVMe dramatically reduces latency and increases input/output operations per second (IOPS).

As part of the NVMe standard, additional specifications are developed, such as the transport specification, which defines how NVMe commands are transported (e.g., via the PCI Express bus, but also via networking protocols like TCP/IP).

The Fundamental Difference of Spinning Disks and NVMe

Traditional spinning hard disk drives (HDDs) rely on physical spinning platters and movable read/write heads to write or access data. When data is requested, the read/write head must be physically moved to the correct location on the platter stack, resulting in significant access latencies of 10 to 14 milliseconds.

Flash storage, including NVMe devices, eliminates the mechanical parts, utilizing NAND flash chips instead. NAND stores data purely electronically and achieves access latencies as low as 20 microseconds (and even lower on super high-end gear). That makes flash several hundred times faster than its HDD counterparts.

For a long time, flash storage had the massive disadvantage of limited storage capacity. However, this disadvantage slowly fades away with companies introducing higher-capacity devices. For example, Toshiba just announced a 180TB flash storage device.

Cost, the second significant disadvantage, also keeps falling with improvements in development and production. Technologies like QLC NAND offer incredible storage density for an affordable price.

Anyhow, why am I bringing up the mechanical vs electrical storage principle? The reason is simple: the access latency. SCSI and iSCSI were never designed for super low access latency devices because they didn’t really exist at the time of their development. And, while some adjustments were made to the protocol over the years, their fundamental design is outdated and can’t be changed for backward compatibility reasons.

NVMe over Fabrics: Flash Storage on the Network

NVMe over Fabrics (also known as NVMe-oF) is an extension to the NVMe base specification. It allows NVMe storage to be accessed over a network fabric while maintaining the low-latency, high-performance characteristics of local NVMe devices.

NVMe over Fabrics itself is a collection of multiple sub-specifications, defining multiple transport layer protocols.

  • NVMe over TCP: NVMe/TCP utilizes the common internet standard protocol TCP/IP. It deploys on commodity Ethernet networks and can run in parallel with existing network traffic. That makes NVMe over TCP the modern successor to iSCSI, taking over where iSCSI left off. Therefore, NVMe over TCP is the perfect solution for public cloud-based storage solutions that typically only provide TCP/IP networking. A minimal Linux connection example follows this list.
  • NVMe over Fibre Channel: NVMe/FC builds upon the existing Fibre Channel network fabric. It tunnels NVMe commands through Fibre Channel packets and enables reusing available Fibre Channel hardware. I wouldn’t recommend it for new deployments due to the high entry cost of Fibre Channel equipment.
  • NVMe over Infiniband: Like NVMe over Fibre Channel, NVMe/IB utilizes existing Infiniband networks to tunnel the NVMe protocol. If you have existing Infiniband equipment, NVMe over Infiniband might be your way. For new deployments, the initial entry cost is too high.
  • NVMe over RoCE: NVMe over RDMA over Converged Ethernet (RoCE) is a transport layer that uses an Ethernet fabric for remote direct memory access (RDMA). To use NVMe over RoCE, you need RDMA-capable NICs. RoCE comes in two versions: RoCEv1, which is a layer-2 protocol and not routable, and RoCEv2, which uses UDP/IP and can be routed across complex networks. NVMe over RoCE doesn’t scale as easily as NVMe over TCP but provides even lower latencies.
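To illustrate how low the barrier to entry is, here is a minimal sketch of attaching an NVMe/TCP volume on a recent Linux distribution using the standard nvme-cli tool. The IP address, port, NQN, and resulting device name are placeholders; substitute the values of your own target or storage cluster.

# Load the NVMe/TCP initiator module (included in modern Linux kernels)
sudo modprobe nvme_tcp

# Discover the subsystems exported by a target (address and port are examples)
sudo nvme discover -t tcp -a 192.168.10.50 -s 4420

# Connect to a discovered subsystem by its NQN (example NQN, adjust to your target)
sudo nvme connect -t tcp -a 192.168.10.50 -s 4420 -n nqn.2025-01.io.example:subsystem-01

# The remote namespace now shows up as a regular local block device, e.g. /dev/nvme1n1
sudo nvme list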

NVMe over TCP vs iSCSI: The Comparison

When comparing NVMe over TCP vs iSCSI, we see considerable improvements in all three primary metrics: latency, throughput, and IOPS.

Figure 2: Medium queue-depth workload at 4KB blocksize I/O (Source: Blockbridge)

The folks over at Blockbridge ran an extensive comparison of the two technologies, which shows that NVMe over TCP outperformed iSCSI, regardless of the benchmark.

I’ll provide the most critical benchmarks here, but I recommend you read through the full benchmark article right after finishing here.

Anyhow, let’s dive a little deeper into the actual facts on the NVMe over TCP vs iSCSI benchmark.

Editor’s Note: Our Developer Advocate, Chris Engelbert, recently gave a talk at SREcon in Dublin comparing the performance of NVMe over TCP and iSCSI, which led to this blog post. Find the full presentation, “NVMe/TCP makes iSCSI look like Fortran”.

Benchmarking Network Storage

Evaluating storage performance involves comparing four major performance indicators.

  1. IOPS: Number of input/output operations processed per second
  2. Latency: Time required to complete a single input/output operation
  3. Throughput: Total data transferred per unit of time
  4. Protocol Overhead: Additional processing required by the communication protocol

Editor’s note: For latency, throughput, and IOPS, we have an exhaustive blog post that goes deeper into the basics, their relationships, and how to calculate them.

Comprehensive performance testing involves simulated workloads that mirror real-world scenarios. To simplify this process, benchmarks use tools like FIO (Flexible I/O Tester) to generate consistent, reproducible test data and results across different storage configurations and systems.
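For orientation, the following is a minimal FIO invocation for a 4KiB random-read test against a block device. The device path and job parameters are examples and should be adapted to your environment; note that write tests against a raw device are destructive.

# 4KiB random reads, 60 seconds, queue depth 32, four parallel jobs
sudo fio --name=randread-test --filename=/dev/nvme1n1 --rw=randread --bs=4k \
  --ioengine=libaio --direct=1 --iodepth=32 --numjobs=4 \
  --runtime=60 --time_based --group_reporting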

IOPS Improvements of NVMe over TCP vs iSCSI

When running IOPS-intensive applications, the number of available IOPS in a storage system is critical. IOPS-intensive applications include systems such as databases, analytics platforms, asset servers, and similar solutions.

Improving IOPS simply by exchanging the storage network protocol is an immediate win for the database and for us.

Using NVMe over TCP instead of iSCSI shows a dramatic increase in IOPS, especially for smaller block sizes. At 512 bytes block size, Blockbridge found an average 35.4% increase in IOPS. At a more common 4KiB block size, the average increase was 34.8%.

That means the same hardware can provide over one-third more IOPS using NVMe over TCP vs iSCSI at no additional cost.

Figure 3: Average IOPS improvement of NVMe over TCP vs iSCSI by blocksize (Source: Blockbridge)

Latency Improvements of NVMe over TCP vs iSCSI

While IOPS-hungry use cases, such as compaction events in databases (Cassandra), benefit from the immense increase in IOPS, latency-sensitive applications love low access latencies. Latency is the primary factor that causes people to choose local NVMe storage over remotely attached storage, despite knowing about many or all of its drawbacks.

Latency-sensitive applications range from high-frequency trading systems, where milliseconds translate into hard money, through telecommunication systems, where latency can introduce issues with system synchronization, to cybersecurity and threat detection solutions that need to react as fast as possible.

Therefore, decreasing latency is a significant benefit for many industries and solutions. Apart from that, a lower access latency always speeds up data access, even if your system isn’t necessarily latency-sensitive. You will feel the difference.

Blockbridge found the most significant benefit in access latency reduction with a block size of 16KiB and a queue depth of 128 (which can easily be hit with I/O-demanding solutions). The average latency for iSCSI was 5,871μs, compared to 5,089μs for NVMe over TCP. A 782μs (~13%) decrease in access latency, just by exchanging the storage protocol.

Figure 4: Average access latency comparison, NVMe over TCP vs iSCSI, for 4, 8, 16 KiB (Source: Blockbridge)

Throughput Improvement of NVMe over TCP vs iSCSI

As the third primary metric of storage performance, throughput describes how much data is actually pumped from the disk into your workload.

Throughput is the major factor for applications such as video encoding or streaming platforms, large analytical systems, and game servers streaming massive worlds into memory. Furthermore, there are also time-series storage, data lakes, and historian databases.

Throughput-heavy systems benefit from higher throughput to get the “job done faster.” Oftentimes, increasing the throughput isn’t easy. You’re either bound by the throughput provided by the disk or, in the case of a network-attached system, the network bandwidth. To achieve high throughput and capacity, remote network storage utilizes high bandwidth networking or specialized networking systems such as Fibre Channel or Infiniband.

Blockbridge ran their tests on a dual-port 100Gbit/s network card, limited by the PCI Express x16 Gen3 bus to a maximum throughput of around 126Gbit/s. Newer PCIe standards achieve much higher throughput. Hence, NVMe devices and NICs aren’t bound by the “limiting” factor of the PCIe bus anymore.
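For reference, that ceiling follows directly from the PCIe arithmetic: PCIe Gen3 transfers 8 GT/s per lane with 128b/130b encoding, so an x16 link delivers roughly 16 × 8 Gbit/s × 128/130 ≈ 126 Gbit/s of usable bandwidth per direction, which is exactly the limit observed in the benchmark setup.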

With a 16KiB block size and a queue depth of 32, their benchmark saw a whopping 2.3Gbit/s increase in performance for NVMe over TCP vs iSCSI. The throughput increased from 10.387Gbit/s on iSCSI to 12.665Gbit/s, an easy 20% on top, again using the same hardware. That’s how you save money.

Figure 5: Average throughput of NVMe over TCP vs iSCSI for different queue depths of 1, 2, 4, 8, 16, 32, 64, 128 (Source: Blockbridge)

The Compelling Case for NVMe over TCP

We’ve seen that NVMe over TCP has significant performance advantages over iSCSI in all three primary storage performance metrics. Nevertheless, there are more advantages to NVMe over TCP vs iSCSI.

  • Standard Ethernet: NVMe over TCP’s most significant advantage is its ability to operate over standard Ethernet networks. Unlike specialized networking technologies (Infiniband, Fibre Channel), NVMe/TCP requires no additional hardware investments or complex configuration, making it remarkably accessible for organizations of all sizes.
  • Performance Characteristics: NVMe over TCP delivers exceptional performance by minimizing protocol overhead and leveraging the efficiency of NVMe’s design. It can achieve latencies comparable to local storage while providing the flexibility of network-attached resources. Modern implementations can sustain throughput rates exceeding traditional storage protocols by significant margins.
  • Ease of Deployment: NVMe over TCP integrates seamlessly with Linux and Windows (Server 2025 and later) since the necessary drivers are already part of the kernel. That makes NVMe/TCP straightforward to implement and manage. Seamless compatibility reduces the learning curve and integration challenges typically associated with new storage technologies.

Choosing Between NVMe over TCP and iSCSI

Deciding between two technologies isn’t always easy. In the case of NVMe over TCP vs iSCSI, however, it isn’t that hard. The use cases for new iSCSI deployments are very sparse. From my perspective, the only valid use case is the integration of pre-existing legacy systems that don’t yet support NVMe over TCP.

That’s why simplyblock, as an NVMe over TCP-first solution, still provides iSCSI if you really need it. We offer it precisely because migrations don’t happen overnight. Still, you want to leverage the benefits of newer technologies, such as NVMe over TCP, wherever possible. With simplyblock, logical volumes can easily be provisioned as NVMe over TCP or iSCSI devices. You can even switch over from iSCSI to NVMe over TCP later on.

In any case, you should go with NVMe over TCP when:

  • You operate high-performance computing environments
  • You have modern data centers with significant bandwidth
  • You deploy workloads requiring low-latency, high IOPS, or throughput storage access
  • You find yourself in scenarios that demand scalable, flexible storage solutions
  • You are in any other situation where you need remotely attached storage

You should stay on iSCSI (or slowly migrate away) when:

  • You have legacy infrastructure with limited upgrade paths

You see, there aren’t a lot of reasons. Given that, it’s just a matter of selecting your new storage solution. Personally, these days, I would always recommend software-defined storage solutions such as simplyblock, but I’m biased. Anyhow, an SDS provides the best of both worlds: commodity storage hardware (with the option to go all in with your 96-bay storage server) and performance.

Simplyblock: Embracing Versatility

Simplyblock demonstrates forward-thinking storage design by supporting both NVMe over TCP and iSCSI, providing customers with the best performance when available and the chance to migrate slowly in the case of existing legacy clients.

Furthermore, simplyblock offers features known from traditional SAN storage systems or “filesystems” such as ZFS. This includes a full copy-on-write backend with instant snapshots and clones. It includes synchronous and asynchronous replication between storage clusters. Finally, simplyblock is your modern storage solution, providing storage to dedicated hosts, virtual machines, and containers. Regardless of the client, simplyblock offers the most seamless integration with your existing and upcoming environments.

The Future of NVMe over TCP

As enterprise and cloud computing continue to evolve, NVMe over TCP stands as the technology of choice for remotely attached storage. Firstly, it combines simplicity, performance, and broad compatibility. Secondly, it provides a cost-efficient and scalable solution utilizing commodity network gear.

The protocol’s ongoing development (last specification update May 2024) and increasing adoption show continued improvements in efficiency, reduced latency, and enhanced scalability.

NVMe over TCP represents a significant step forward in storage networking technology. Furthermore, combining the raw performance of NVMe with the ubiquity of Ethernet networking offers a compelling solution for modern computing environments. While iSCSI remains relevant for specific use cases and during migration phases, NVMe over TCP represents the future and should be adopted as soon as possible.

We, at simplyblock, are happy to be part of this important step in the history of storage.

Questions and Answers

Is NVMe over TCP better than iSCSI?

Yes, NVMe over TCP is superior to iSCSI in almost any way. NVMe over TCP provides lower protocol overhead, better throughput, lower latency, and higher IOPS compared to iSCSI. It is recommended that iSCSI not be used for newly designed infrastructures and that old infrastructures be migrated wherever possible.

How much faster is NVMe over TCP compared to iSCSI?

NVMe over TCP is superior in all primary storage metrics, meaning IOPS, latency, and throughput. NVMe over TCP shows up to 35% higher IOPS, 25% lower latency, and 20% increased throughput compared to iSCSI using the same network fabric and storage.

What is NVMe over TCP?

NVMe/TCP is a storage networking protocol that utilizes the common internet standard protocol TCP/IP as its transport layer. It is deployed through standard Ethernet fabrics and can run in parallel with existing network traffic, although separation through VLANs or physically separated networks is recommended. NVMe over TCP is considered the successor of the iSCSI protocol.

What is iSCSI?

iSCSI is a storage networking protocol that utilizes the common internet standard protocol TCP/IP as its transport layer. It connects remote storage solutions (commonly hardware storage appliances) to storage clients through a standard Ethernet fabric. iSCSI was initially standardized in 2000. Many companies replace iSCSI with the superior NVMe over TCP protocol.

What is SCSI?

SCSI (Small Computer System Interface) is a command set that connects computers and peripheral devices and transfers data between them. Initially developed in the 1980s, SCSI has been a foundational technology for data storage interfaces, supporting various device types such as hard drives, optical drives, and scanners.

What is NVMe?

NVMe (Non-Volatile Memory Express) is a specification that defines the connection and transmission of data between storage devices and computers. The initial specification was released in 2011. NVMe is designed specifically for solid-state drives (SSDs) connected via the PCIe bus. NVMe devices offer improved latency and performance compared to older standards such as SCSI, SATA, and SAS.

Encryption At Rest: A Comprehensive Guide to DARE
https://www.simplyblock.io/blog/encryption-at-rest-dare/ | Tue, 17 Dec 2024

TLDR: Data At Rest Encryption (DARE) is the process of encrypting data when stored on a storage medium. The encryption transforms the readable data (plaintext) into an encoded format (ciphertext) that can only be decrypted with knowledge about the correct encryption key.

Today, data is the “new gold” for companies. Data collection happens everywhere and at any time. The global amount of data collected is projected to reach 181 Zettabytes (that is 181 billion Terabytes) by the end of 2025. A whopping 23.13% increase over 2024.

That means data protection is becoming increasingly important. Hence, data security has become paramount for organizations of all sizes. One key aspect of data security protocols is data-at-rest encryption, which provides crucial protection for stored data.


Understanding Data-in-Use, Data-in-Transit, and Data-at-Rest

Before we go deep into DARE, let’s quickly discuss the three states of data within a computing system and infrastructure.

Figure 1: The three states of data in encryption

To start, any type of data is created inside an application. While the application holds onto it, it is considered data in use. That means data in use describes information actively being processed, read, or modified by applications or users.

For example, imagine you open a spreadsheet to edit its contents or a database to process a query. This data is considered “in use.” This state is often the most vulnerable as the data must be in an unencrypted form for processing. However, technologies like confidential computing enable encrypted memory to process even these pieces of data in an encrypted manner.

Next, data in transit describes any information moving between locations, whether across the internet, within a private network, or between memory and processors. Email messages being sent, files being downloaded, or database query results traveling between the database server and applications are all examples of data in transit.

Last but not least, data at rest refers to any piece of information stored physically on a digital storage medium such as flash storage or a hard disk. Storage solutions like cloud storage, offline backups, and file systems also count as digital storage; data stored in these services is therefore data at rest, too. Think of files saved on your laptop’s hard drive or photos stored in cloud storage such as Google Drive, Dropbox, or similar.

Criticality of Data Protection

For organizations, threats to their data are omnipresent. They range from unfortunate human error deleting important information, through coordinated ransomware attacks that encrypt your data and demand a ransom, to actual data leaks.

Especially with data leaks, most people think about external hackers copying data off the organization’s network and making it public. However, this isn’t the only way data is leaked. There are many examples of Amazon S3 buckets without proper access controls, databases accessible from the outside world, or backup services being accessed by unauthorized parties.

Anyhow, organizations face increasing threats to their data security. Any data breach has consequences, which are categorized into four segments:

  1. Financial losses from regulatory fines and legal penalties
  2. Damage to brand reputation and loss of customer trust
  3. Exposure of intellectual property to competitors
  4. Compliance violations with regulations like GDPR, HIPAA, or PCI DSS

While data in transit is commonly protected through TLS (Transport Layer Security), data at rest encryption (or DARE) is often an afterthought. This is typically because the setup isn’t as easy and straightforward as it should be. It’s this “I can still do this afterward” effect: “What could possibly go wrong?”

However, data at rest is the most vulnerable of all. Data in use is often unprotected but very transient. While there is a chance of a leak, it is small. Data at rest, however, persists for extended periods of time, giving hackers more time to plan and execute their attacks. Secondly, persistent data often contains the most valuable pieces of information, such as customer records, financial data, or intellectual property. And lastly, access to persistent storage exposes large amounts of data within a single breach, making it so much more interesting to attackers.

Understanding Data At Rest Encryption (DARE)

Simply spoken, Data At Rest Encryption (DARE) transforms stored data into an unreadable format that can only be read with the appropriate encryption or decryption key. This ensures that the information isn’t readable even if unauthorized parties gain access to the storage medium.

That said, the strength of the encryption used is crucial. Many encryption algorithms we have considered secure have been broken over time. Encrypting data once and taking for granted that unauthorized parties can’t access it is just wrong. Data at rest encryption is an ongoing process, potentially involving re-encrypting information when more potent, robust encryption algorithms are available.

Available Encryption Types for Data At Rest Encryption

Figure 2: Symmetric encryption (same encryption key) vs asymmetric encryption (private and public key)

Two encryption methods are generally available today for large-scale setups. While we look forward to quantum-safe encryption algorithms, a widely adopted solution has yet to be developed. Quantum-safe means that the encryption will resist breaking attempts from quantum computers.

Anyhow, the first common type of encryption is Symmetric Encryption. It uses the same key for encryption and decryption. The most common symmetric encryption algorithm is AES, the Advanced Encryption Standard.

const cipherText = encrypt(plaintext, encryptionKey);
const plaintext = decrypt(cipherText, encryptionKey);
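For a hands-on impression, a symmetric round trip can be sketched with the OpenSSL command line. The file names are placeholders, and the passphrase-derived key stands in for a properly managed encryption key.

# Encrypt a file with AES-256 (key derived from a passphrase via PBKDF2)
openssl enc -aes-256-cbc -pbkdf2 -salt -in secret.txt -out secret.txt.enc

# Decrypt it again with the same passphrase
openssl enc -d -aes-256-cbc -pbkdf2 -in secret.txt.enc -out secret.txt.dec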

The second type of encryption is Asymmetric Encryption. Here, the encryption and decryption routines use different but related keys, generally referred to as the Private Key and the Public Key. As their names suggest, the public key can be publicly known, while the private key must be kept private. Both keys are mathematically connected and generally based on a hard-to-solve mathematical problem. Two standard algorithms are essential: RSA (the Rivest–Shamir–Adleman cryptosystem) and ECDSA (the Elliptic Curve Digital Signature Algorithm), the latter being used for digital signatures rather than encryption. While RSA is based on the prime factorization problem, ECDSA is based on the discrete logarithm problem. Going into detail about those problems would take more than just one additional blog post, though.

const cipherText = encrypt(plaintext, publicKey);
const plaintext = decrypt(cipherText, privateKey);

Encryption in Storage

The use cases for symmetric and asymmetric encryption are different. While considered more secure, asymmetric encryption is slower and typically not used for large data volumes. Symmetric encryption, however, is fast but requires the sharing of the encryption key between the encrypting and decrypting parties, making it less secure.

To encrypt large amounts of data and get the best of both worlds, security and speed, you often see a combination of both approaches. The symmetric encryption key is encrypted with an asymmetric encryption algorithm. After decrypting the symmetric key, it is used to encrypt or decrypt the stored data.
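To make the combination concrete, here is a minimal sketch of this envelope pattern using the OpenSSL command line. All file names are placeholders, and a production setup would store the wrapped key in a key management solution rather than next to the data.

# Generate an RSA key pair once (the asymmetric part)
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:4096 -out private.pem
openssl pkey -in private.pem -pubout -out public.pem

# Generate a random 256-bit data key (the symmetric part)
openssl rand -out data.key 32

# Encrypt the bulk data with the symmetric data key
openssl enc -aes-256-cbc -pbkdf2 -salt -in data.bin -out data.bin.enc -pass file:./data.key

# Wrap (encrypt) the data key with the public key and remove the plaintext copy
openssl pkeyutl -encrypt -pubin -inkey public.pem -in data.key -out data.key.enc
shred -u data.key

# To read the data later: unwrap the data key with the private key, then decrypt the data
openssl pkeyutl -decrypt -inkey private.pem -in data.key.enc -out data.key
openssl enc -d -aes-256-cbc -pbkdf2 -in data.bin.enc -out data.bin -pass file:./data.key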

Simplyblock provides data-at-rest encryption directly through its Kubernetes CSI integration or via CLI and API. Additionally, simplyblock can be configured to use a different encryption key per logical volume for the highest degree of security and multi-tenant isolation.

Key Management Solutions

Next to selecting the encryption algorithm, managing and distributing the necessary keys is crucial. This is where key management solutions (KMS) come in.

Generally, there are two basic types of key management solutions: hardware and software-based.

For hardware-based solutions, you’ll typically utilize an HSM (Hardware Security Module). These HSMs provide a dedicated hardware token (commonly a USB key) to store keys. They offer the highest level of security but need to be physically attached to systems and are more expensive.

Software-based solutions offer a flexible key management alternative. Almost all cloud providers offer their own KMS systems, such as Azure Key Vault, AWS KMS, or Google Cloud KMS. Additionally, third-party key management solutions are available when setting up an on-premises or private cloud environment.

When managing more than a few encrypted volumes, you should implement a key management solution. That’s why simplyblock supports KMS solutions by default.
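As an illustration of how a cloud KMS fits into the envelope pattern described above, the AWS CLI can hand out a data key whose plaintext is used for encryption in memory while only the encrypted copy is persisted; the key alias below is a placeholder.

# Request a fresh 256-bit data key from AWS KMS; the response contains the plaintext key
# (use it for encryption, keep it only in memory) and an encrypted copy (CiphertextBlob) to store
aws kms generate-data-key --key-id alias/my-storage-key --key-spec AES_256

The stored encrypted copy can later be sent back to KMS (via aws kms decrypt) to recover the plaintext data key when the data needs to be read.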

Implementing Data At Rest Encryption (DARE)

Simply said, you should have a key management solution ready to implement data-at-rest encryption in your company. If you run in the cloud, I recommend using whatever the cloud provider offers. If you run on-premises or the cloud provider doesn’t provide one, select from the existing third-party solutions.

DARE with Linux

As the most popular server operating system, Linux has two options to choose from when you want to encrypt data. The first option works on a per-file basis, providing filesystem-level encryption.

A typical solution for filesystem-level encryption is eCryptfs, a stacked filesystem that stores the encryption information in the header of each encrypted file. The benefit of eCryptfs is the ability to copy encrypted files to other systems without the need to decrypt them first. As long as the target node has the necessary encryption key in its keyring, the file can be decrypted. The alternative is EncFS, a user-space filesystem that runs without special permissions. Both solutions haven’t seen updates in many years, though, and I’d generally recommend the second option: block-level encryption.

Block-level encryption transparently encrypts the whole content of a block device (meaning a hard disk, an SSD, a simplyblock logical volume, etc.). Data is automatically decrypted when read. While VeraCrypt exists, it is mainly a solution for home setups and laptops. For server setups, the most common way to implement block-level encryption is the combination of dm-crypt and LUKS, the Linux Unified Key Setup.

Linux Data At Rest Encryption with dm-crypt and LUKS

The fastest way to encrypt a volume with dm-crypt and LUKS is via the cryptsetup tools.

Debian / Ubuntu / Derivates:

sudo apt install cryptsetup 

RHEL / Rocky Linux / Oracle:

yum install cryptsetup-luks

Now, we need to enable encryption for our block device. In this example, we assume we already have a whole block device or a partition we want to encrypt.

Warning: The following command will delete all data from the given device or partition. Make sure you use the correct device file.

cryptsetup -y luksFormat /dev/xvda

I recommend always running cryptsetup with the -y parameter, which forces it to ask for the passphrase twice. If you misspelled it once, you’ll realize it now, not later.

Now open the encrypted device.

cryptsetup luksOpen /dev/xvda encrypted

This command will ask for the passphrase. The passphrase is not recoverable, so you better remember it.

Afterward, the device is ready to be used as /dev/mapper/encrypted. We can format and mount it.

mkfs.ext4 /dev/mapper/encrypted
mkdir /mnt/encrypted
mount /dev/mapper/encrypted /mnt/encrypted
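When the volume is no longer needed (for example, before detaching the disk), the reverse order applies; this quick sketch assumes the mount point and mapping name used above.

umount /mnt/encrypted
cryptsetup luksClose encrypted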

Data At Rest Encryption with Kubernetes

Kubernetes offers ephemeral and persistent storage as volumes. Those volumes can be statically or dynamically provisioned.

For pre-created and statically provisioned volumes, you can follow the above guide on encrypting block devices on Linux and make the already encrypted device available to Kubernetes.

However, Kubernetes doesn’t offer out-of-the-box support for encrypted and dynamically provisioned volumes. Encrypting persistent volumes is not in Kubernetes’s domain. Instead, it delegates this responsibility to its container storage provider, connected through the CSI (Container Storage Interface).

Note that not all CSI drivers support data-at-rest encryption, though! But simplyblock does!

Data At Rest Encryption with Simplyblock

Figure 3: Encryption stack for Linux: dm-crypt+LUKS vs Simplyblock

Due to the importance of DARE, simplyblock enables you to secure your data immediately through data-at-rest encryption. Simplyblock goes above and beyond with its features and provides a fully multi-tenant DARE feature set. That said, in simplyblock, you can encrypt multiple logical volumes (virtual block devices) with the same key or one key per logical volume.

The use cases are different. One key per volume enables the highest level of security and complete isolation, even between volumes. You want this to encapsulate applications or teams fully.

Sharing the same encryption key across multiple volumes is useful when you want to provide one key per customer. This ensures that customers are isolated from each other and prevents data from being accessible by other customers on a shared infrastructure in the case of a configuration failure or similar incident.

To set up simplyblock, you can configure keys manually or utilize a key management solution. In this example, we’ll set up the keys manually. We also assume that simplyblock has already been deployed as your distributed storage platform and that the simplyblock CSI driver is available in Kubernetes.

First, let’s create our Kubernetes StorageClass for simplyblock. Deploy the YAML file via kubectl, just as any other type of resource.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
    name: encrypted-volumes
provisioner: csi.simplyblock.io
parameters:
    encryption: "True"
    csi.storage.k8s.io/fstype: ext4
    # ... other parameters
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true

Secondly, we generate the two keys. Note down the results of the two commands.

openssl rand -hex 32   # Key 1
openssl rand -hex 32   # Key 2
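One detail worth remembering when filling in the manifest below: Kubernetes requires the values under a Secret’s data section to be base64-encoded, so encode the noted-down keys first (the placeholders stand for the output of the two commands above).

echo -n "<output of key 1>" | base64
echo -n "<output of key 2>" | base64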

Now, we can create our secret and Persistent Volume Claim (PVC).

apiVersion: v1
kind: Secret
metadata:
    name: encrypted-volume-keys
data:
    crypto_key1: 
    crypto_key2: 
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
    annotations:
        simplybk/secret-name: encrypted-volume-keys
    name: encrypted-volume-claim
spec:
    storageClassName: encrypted-volumes
    accessModes:
        - ReadWriteOnce
    resources:
        requests:
            storage: 200Gi

And we’re done. Whenever we use the persistent volume claim, Kubernetes will delegate to simplyblock and ask for the encrypted volume. If it doesn’t exist yet, simplyblock will create it automatically. Otherwise, it will just provide it to Kubernetes directly. All data written to the logical volume is fully encrypted.
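As a quick sanity check (the file names are placeholders for wherever you saved the manifests above), apply the resources and confirm that the claim binds:

kubectl apply -f encrypted-storageclass.yaml -f encrypted-volume.yaml
kubectl get pvc encrypted-volume-claim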

Best Practices for Securing Data at Rest

Implementing robust data encryption is crucial. In the best case, data should never exist in a decrypted state.

That said, data-in-use encryption is still complicated as of writing. However, solutions such as Edgeless Systems’ Constellation exist and make it possible using hardware memory encryption.

Data-in-transit encryption is commonly used today via TLS. If you don’t use it yet, there is no time to waste. Low-hanging fruits first.

Data-at-rest encryption in Windows, Linux, or Kubernetes doesn’t have to be complicated. Solutions such as simplyblock enable you to secure your data with minimal effort.

However, there are a few more things to remember when implementing DARE effectively.

Data Classification

Organizations should classify their data based on sensitivity levels and regulatory requirements. This classification guides encryption strategies and key management policies. A robust data classification system includes three things:

  • Sensitive data identification: Identify sensitive data through automated discovery tools and manual review processes. For example, personally identifiable information (PII) like social security numbers should be classified as highly sensitive.
  • Classification levels: Establish clear classification levels such as Public, Internal, Confidential, and Restricted. Each level should have defined handling requirements and encryption standards.
  • Automated classification: Implement automated classification tools to scan and categorize data based on content patterns and metadata.

Access Control and Key Management

Encryption is only as strong as the key management and permission control around it. If your keys are leaked, the strongest encryption is useless.

Therefore, it is crucial to implement strong access controls and key rotation policies. Additionally, regular key rotation helps minimize the impact of potential key compromises and, I hate to say it, employees leaving the company.
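On the dm-crypt/LUKS setup shown earlier, rotating a passphrase is a matter of adding a new key slot before removing the old one (note that this rotates the unlock passphrase, not the underlying volume master key).

# Add a new passphrase to a free key slot
cryptsetup luksAddKey /dev/xvda

# Check which key slots are in use
cryptsetup luksDump /dev/xvda

# Remove the old passphrase once the new one has been verified
cryptsetup luksRemoveKey /dev/xvda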

Monitoring and Auditing

Understanding potential risks early is essential. That’s why you must maintain comprehensive logs of all access to encrypted data and to the encryption keys or key management solutions. Regular audits should also be scheduled to detect suspicious activities.

In the best case, multiple teams run independent audits to prevent internal leaks through dubious employees. While it may sound harsh, there are situations in life where people take the wrong path. Not necessarily on purpose or because they want to.

Data Minimization

The most secure data is the data that isn’t stored. Hence, you should only store necessary data.

Apart from that, encrypt only what needs protection. While this sounds counterproductive, it reduces the attack surface and the performance impact of encryption.

Data At Rest Encryption: The Essential Component of Data Management

Data-at-rest encryption (DARE) has become essential for organizations handling sensitive information. The rise in data breaches and increasingly stringent regulatory requirements make it vital to protect stored information. Additionally, with the rise of cloud computing and distributed systems, implementing DARE is more critical than ever.

Simplyblock integrates natively with Kubernetes to provide a seamless approach to implementing data-at-rest encryption in modern containerized environments. With our support for transparent encryption, your organization can secure its data without any application changes. Furthermore, simplyblock utilizes the standard NVMe over TCP protocol which enables us to work natively with Linux. No additional drivers are required. Use a simplyblock logical volume straight from your dedicated or virtual machine, including all encryption features.

Anyhow, for organizations running Kubernetes, whether in public clouds, private clouds, or on-premises, DARE serves as a fundamental security control. By following best practices and using modern tools like Simplyblock, organizations can achieve robust data protection while maintaining system performance and usability.

But remember that DARE is just one component of a comprehensive data security strategy. It should be combined with other security controls, such as access management, network security, and security monitoring, to create a defense-in-depth approach to protecting sensitive information.

That all said, by following the guidelines and implementations detailed in this article, your organization can effectively protect its data at rest while maintaining system functionality and performance.

As threats continue to evolve, having a solid foundation in data encryption becomes increasingly crucial for maintaining data security and regulatory compliance.

What is Data At Rest Encryption?

Data At Rest Encryption (DARE), or encryption at rest, is the encryption process of data when stored on a storage medium. The encryption transforms the readable data (plaintext) into an encoded format (ciphertext) that can only be decrypted with knowledge about the correct encryption key.

What is Data-At-Rest?

Data-at-rest refers to any piece of information written to physical storage media, such as flash storage or hard disk. This also includes storage solutions like cloud storage, offline backups, and file systems as valid digital storage.

What is Data-In-Use?

Data-in-use describes any type of data created inside an application, which is considered data-in-use as long as the application holds onto it. Hence, data-in-use is information actively processed, read, or modified by applications or users.

What is Data-In-Transit?

Data-in-transit describes any information moving between locations, such as data sent across the internet, moved within a private network, or transmitted between memory and processors.

NVMe & Kubernetes: Future-Proof Infrastructure
https://www.simplyblock.io/blog/nvme-kubernetes-future-proof-infrastructure/ | Wed, 27 Nov 2024

The marriage of NVMe storage and Kubernetes persistent volumes represents a perfect union of high-performance storage and modern container orchestration. As organizations increasingly move performance-critical workloads to Kubernetes, understanding how to leverage NVMe technology becomes crucial for achieving optimal performance and efficiency.

The Evolution of Storage in Kubernetes

When Kubernetes was created over 10 years ago, its only purpose was to schedule and orchestrate stateless workloads. Since then, a lot has changed, and Kubernetes is increasingly used for stateful workloads. Not just basic ones but mission-critical workloads, like a company’s primary databases. The promise of workload orchestration in infrastructures with growing complexity is too significant.

Anyhow, traditional Kubernetes storage solutions relied, and still rely, on old network-attached storage protocols like iSCSI. Released in 2000, iSCSI was built on the SCSI protocol, which was first introduced in the 1980s. Hence, both protocols are inherently designed for spinning disks with much higher seek times and access latencies. Measured against our modern expectations of low latency and low complexity, they just can’t keep up.

While these solutions worked well for basic containerized applications, they fall short for high-performance workloads like databases, AI/ML training, and real-time analytics. Let’s look at the NVMe standard, particularly NVMe over TCP, which has transformed our thinking about storage in containerized environments, not just Kubernetes.

Why NVMe and Kubernetes Work So Well Together

The beauty of this combination lies in their complementary architectures. The NVMe protocol and command set were designed from the ground up for parallel, low-latency operations–precisely what modern containerized applications demand. When you combine NVMe’s parallelism with Kubernetes’ orchestration capabilities, you get a system that can efficiently distribute I/O-intensive workloads while maintaining microsecond-level latency. Further, comparing NVMe over TCP vs iSCSI, we see significant improvement in terms of IOPS and latency performance when using NVMe/TCP.

Consider a typical database workload on Kubernetes. Traditional storage might introduce latencies of 2-4ms for read operations. With NVMe over TCP, these same operations complete in under 200 microseconds–a 10-20x improvement. This isn’t just about raw speed; it’s about enabling new classes of applications to run effectively in containerized environments.

The Technical Symphony

The integration of NVMe with Kubernetes is particularly elegant through persistent volumes and the Container Storage Interface (CSI). Modern storage orchestrators like simplyblock leverage this interface to provide seamless NVMe storage provisioning while maintaining Kubernetes’ declarative model. This means development teams can request high-performance storage using familiar Kubernetes constructs while the underlying system handles the complexity of NVMe management, providing fully reliable shared storage.

The NVMe Impact: A Real-World Example

But what does that mean for actual workloads? Our friends over at Percona found in their MongoDB Performance on Kubernetes report that Kubernetes implies no performance penalty. Hence, we can look at the disks’ actual raw performance.

A team of researchers from the University of Southern California, San Jose State University, and Samsung Semiconductor took on the challenge of measuring the implications of NVMe SSDs (over SATA SSD and SATA HDD) for real-world database performance.

The general performance characteristics of their test hardware:

                  | NVMe SSD | SATA SSD | SATA HDD
Access latency    | 113µs    | 125µs    | 14,295µs
Maximum IOPS      | 750,000  | 70,000   | 190
Maximum Bandwidth | 3GB/s    | 278MB/s  | 791KB/s
Table 1: General performance characteristics of the different storage types

Their summary states: “scale-out systems are driving the need for high-performance storage solutions with high available bandwidth and lower access latencies. To address this need, newer standards are being developed exclusively for non-volatile storage devices like SSDs,” and “NVMe’s hardware and software redesign of the storage subsystem translates into real-world benefits.”

They’re closing with some direct comparisons that claim an 8x performance improvement of NVMe-based SSDs compared to a single SATA-based SSD and still a 5x improvement over a [Hardware] RAID-0 of four SATA-based SSDs.

Transforming Database Operations

Perhaps the most compelling use case for NVMe in Kubernetes is database operations. Typical modern databases process queries significantly faster when storage isn’t the bottleneck. This becomes particularly important in microservices architectures where concurrent database requests and high-load scenarios are the norm.

Traditionally, running stateful services in Kubernetes meant accepting significant performance overhead. With NVMe storage, organizations can now run high-performance databases, caches, and messaging systems with bare-metal-like performance in their Kubernetes clusters.

Dynamic Resource Allocation

One of Kubernetes’ central promises is dynamic resource allocation. That means assigning CPU and memory according to actual application requirements. Furthermore, it also means dynamically allocating storage for stateful workloads. With storage classes, Kubernetes provides the option to assign different types of storage backends to different types of applications. While not strictly necessary, this can be a great application of the “best tool for the job” principle.

That said, for IO-intensive workloads, such as databases, a storage backend providing NVMe storage is essential. NVMe’s ability to handle massive I/O parallelism aligns perfectly with Kubernetes’ scheduling capabilities. Storage resources can be dynamically allocated and deallocated based on workload demands, ensuring optimal resource utilization while maintaining performance guarantees.

Simplified High Availability

The low latency of NVMe over TCP enables new approaches to high availability. Instead of complex database replication schemes, organizations can leverage storage-level replication (or more storage-efficient erasure coding, like in the case of simplyblock) with a negligible performance impact. This significantly simplifies application architecture while improving reliability.

Furthermore, NVMe over TCP utilizes multipathing as an automatic fail-over implementation to protect against network connection issues and sudden connection drops, increasing the high availability of persistent volumes in Kubernetes.

The Physics Behind NVMe Performance

Many teams don’t realize how profoundly storage physics impacts database operations. Traditional storage solutions averaging 2-4ms latency might seem fast, but this translates to a hard limit of about 80 consistent transactions per second, even before considering CPU or network overhead. Each transaction requires multiple storage operations: reading data pages, writing to WAL, updating indexes, and performing one or more fsync() operations. At 3ms per operation, these quickly stack up into significant delays. Many teams spend weeks optimizing queries or adding memory when their real bottleneck is fundamental storage latency.

This is where the NVMe and Kubernetes combination truly shines. With NVMe as your Kubernetes persistent volume storage backend, providing sub-200μs latency, the same database operations can theoretically support over 1,200 transactions per second–a 15x improvement. More importantly, this dramatic reduction in storage latency changes how databases behave under load. Connection pools remain healthy longer, buffer cache decisions become more efficient, and query planners can make better optimization choices. With the storage bottleneck removed, databases can finally operate closer to their theoretical performance limits.
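To make the arithmetic behind those figures explicit (assuming roughly four serialized storage operations per transaction, as sketched above): at 3ms per operation, a transaction spends about 4 × 3ms = 12ms waiting on storage, which caps it at roughly 80 transactions per second; at 200µs per operation, the same transaction waits only 4 × 0.2ms = 0.8ms, allowing on the order of 1,200 or more transactions per second.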

Looking Ahead

The combination of NVMe and Kubernetes is just beginning to show its potential. As more organizations move performance-critical workloads to Kubernetes, we’ll likely see new patterns and use cases that fully take advantage of this powerful combination.

Some areas to watch:

  • AI/ML workload optimization through intelligent data placement
  • Real-time analytics platforms leveraging NVMe’s parallel access capabilities
  • Next-generation database architectures built specifically for NVMe on Kubernetes Persistent Volumes

The marriage of NVMe-based storage and Kubernetes Persistent Volumes represents more than just a performance improvement. It’s a fundamental shift in how we think about storage for containerized environments. Organizations that understand and leverage this combination effectively gain a significant competitive advantage through improved performance, reduced complexity, and better resource utilization.

For a deeper dive into implementing NVMe storage in Kubernetes, visit our guide on optimizing Kubernetes storage performance.

Local NVMe Storage on AWS – Pros and Cons
https://www.simplyblock.io/blog/local-nvme-storage-aws/ | Thu, 03 Oct 2024

What is the Best Storage Solution on AWS?

The debate over the optimal storage solution has been ongoing. Local instance storage on AWS (i.e., ephemeral NVMe disks attached to EC2 instances) brings remarkable cost-performance ratios. It offers 20 times better performance and 10 times lower access latency than EBS. It’s a powerhouse for quick, ephemeral storage needs. In simple words, a local NVMe disk is very fast and relatively cheap, but neither scalable nor persistent.

Recently, Vantage posted an article titled “Don’t use EBS for Cloud Native Services“. We agree with this problem statement; however, we also strongly believe that there is a better solution than using local NVMe SSD storage on AWS as an alternative to EBS. Comparing local NVMe to EBS is not comparing apples to apples, but more like apples to oranges.

The Local Instance NVMe Storage Advantage

Local storage on AWS excels in speed and cost-efficiency, delivering performance that’s 20 times better and latency that’s 10 times lower compared to EBS. For certain workloads with temporary storage needs, it’s a clear winner. But, let’s acknowledge the reasons why data centers have traditionally separated storage and compute.

Overcoming Traditional Challenges of Local Storage

Challenges | Local Storage | simplyblock
Scalability | Limited capacity, unable to resize dynamically | Dynamic scalability with simplyblock
Reliability | Data loss if instance is stopped or terminated | Advanced data protection, data survives instance outage
High Availability | Inconsistent access in case of compute instance outage | Access to storage remains fully available in case of a compute instance outage
Data Protection Efficiency | N/A | Use of erasure coding instead of three replicas to reduce load on the network and effective-to-raw storage ratios by a factor of about 2.5x
Predictability/Consistency | Access latency increases with rising IOPS demand | Constant access latencies with simplyblock
Maintainability | Impact on storage during compute instance upgrades | Upgrading and maintaining compute instances without impact on storage
Data Services Offloading | N/A | No impact on local CPU, performance, and access latency for data services such as volume snapshots, copy-on-write cloning, instant volume resizing, erasure coding, encryption, and data compression
Intelligent Storage Tiering | N/A | Automatically moves infrequently accessed data chunks from more expensive, fast storage to cheap S3 buckets

Simplyblock provides an innovative approach that marries the cost and performance advantages of local instance storage with the benefits of pooled cloud storage. It offers the best of both worlds: high-speed, low-latency performance close to that of local storage, coupled with the robustness and flexibility of pooled cloud storage.

Why Choose simplyblock on AWS?

  1. Performance and Cost Efficiency: Enjoy the benefits of local storage without compromising on scalability, reliability, and high availability.
  2. Data Protection: simplyblock employs advanced data protection mechanisms, ensuring that your data survives any instance outage.
  3. Seamless Operations: Upgrade and maintain compute instances without impacting storage, ensuring continuous operations.
  4. Data Services Galore: Unlock the potential of various data services without affecting local CPU performance.

While local instance storage has its merits, the future lies in a harmonious blend of the speed of local storage and the resilience of cloud-pooled storage. With simplyblock, we transcend the limitations of local NVMe disk, providing you with a storage solution that’s not just powerful but also versatile, scalable, and intelligently designed for the complexities of the cloud era.

The post Local NVMe Storage on AWS – Pros and Cons appeared first on simplyblock.

Origins of simplyblock and the Evolution of Storage Technologies https://www.simplyblock.io/blog/evolution-of-storage-technologies/ Fri, 20 Sep 2024 21:19:18 +0000 https://www.simplyblock.io/?p=1603 Introduction: In this episode of the simplyblock Cloud Commute Podcast, host Chris Engelbert interviews Michael Schmidt, co-founder of simplyblock. Michael shares insights into the evolution of storage technologies and how simplyblock is pushing boundaries with software-defined storage (SDS) to replace outdated hardware-defined systems. If you’re curious about how cloud storage is transforming through SDS and […]

The post Origins of simplyblock and the Evolution of Storage Technologies appeared first on simplyblock.

Introduction:

In this episode of the simplyblock Cloud Commute Podcast, host Chris Engelbert interviews Michael Schmidt, co-founder of simplyblock. Michael shares insights into the evolution of storage technologies and how simplyblock is pushing boundaries with software-defined storage (SDS) to replace outdated hardware-defined systems. If you’re curious about how cloud storage is transforming through SDS and how it’s creating new possibilities for scalability and efficiency, this episode is a must-listen.

This interview is part of the simplyblock Cloud Commute Podcast, available on YouTube, Spotify, iTunes/Apple Podcasts, and our show site.

Key Takeaways

What is simplyblock, and how does it Differ from Traditional Storage Technologies?

Michael Schmidt explained that simplyblock is built on the idea that hardware-defined storage systems are becoming outdated. The traditional storage models, like SAN (Storage Area Networks), are slow-moving, expensive, and difficult to scale in cloud environments. Simplyblock, in contrast, leverages software-defined storage (SDS), making it more flexible, scalable, and hardware-agnostic. The key advantage is that SDS allows organizations to operate independently of the hardware lifecycle and seamlessly scale their storage without the limitations of physical systems.

How does simplyblock Offer better Storage Performance for Kubernetes Clusters?

Simplyblock is optimized for Kubernetes environments by integrating a CSI (Container Storage Interface) driver. Michael noted that deploying simplyblock on Kubernetes allows users to take advantage of local disk storage, NVMe devices, or standard GP3 volumes within AWS. This integration simplifies scaling and enhances storage performance with minimal configuration, making it highly adaptable for workloads that require high-speed, reliable storage.

EP30: A Brief History of Simplyblock and Evolution of Storage technologies | Michael Schmidt

In addition to highlighting the key takeaways, it’s essential to provide context that enriches the listener’s understanding of the episode. By offering this added layer of information, we ensure that when you tune in, you’ll have a clearer grasp of the nuances behind the discussion. This approach helps shed light on the reasoning and perspective behind the thoughtful questions posed by our host, Chris Engelbert. Ultimately, this allows for a more immersive and insightful listening experience.

Key Learnings

What are the Advantages of Software-defined Storage Compared to Hardware-defined Storage?

Software-defined storage offers flexibility by decoupling storage from physical hardware. This results in improved scalability, lifecycle management, and cost-effectiveness.

Simplyblock Insight:

Software-defined storage systems like simplyblock allow for hardware-agnostic scalability, enabling businesses to avoid hardware refresh cycles that burden CAPEX and OPEX budgets. SDS also opens up the possibility for greater automation and better integration with existing cloud infrastructures.

What is Thin Provisioning in Cloud Storage?

Thin provisioning allows cloud users to allocate storage without consuming the full provisioned capacity upfront, optimizing resource usage.

Simplyblock Insight:

Thin provisioning has been standard in enterprise storage systems for years, and simplyblock brings this essential feature to the cloud. By offering thin provisioning in its cloud-native architecture, simplyblock ensures that businesses can avoid over-provisioning and reduce storage costs, only paying for the storage they use. This efficiency significantly benefits organizations with unpredictable storage needs.
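
To make the concept concrete, here is a minimal, purely illustrative sketch of thin provisioning (a toy model, not simplyblock’s implementation): the volume advertises a large logical size, but backing blocks are only allocated on first write.

```python
# Conceptual sketch of thin provisioning: blocks are allocated on first write,
# so physical consumption grows with actual usage, not with provisioned size.

BLOCK_SIZE = 4096  # bytes per block (illustrative)

class ThinVolume:
    def __init__(self, logical_size_blocks):
        self.logical_size_blocks = logical_size_blocks
        self.allocated = {}  # logical block address -> data

    def write(self, lba, data):
        if not 0 <= lba < self.logical_size_blocks:
            raise IndexError("LBA out of range")
        # Physical space is consumed only now, at first write to this block.
        self.allocated[lba] = data

    def read(self, lba):
        # Unwritten blocks read back as zeroes, without consuming space.
        return self.allocated.get(lba, b"\x00" * BLOCK_SIZE)

    @property
    def used_bytes(self):
        return len(self.allocated) * BLOCK_SIZE

vol = ThinVolume(logical_size_blocks=25_000_000)   # ~100 GB provisioned
vol.write(0, b"hello".ljust(BLOCK_SIZE, b"\x00"))
print(vol.used_bytes)                               # only 4096 bytes actually consumed
```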

Additional Nugget of Information

Why are SLAs Important in Software-defined Storage, and how does Simplyblock Ensure Performance Reliability?

Service Level Agreements (SLAs) are crucial in software-defined storage because they guarantee specific performance metrics, such as IOPS (input/output operations per second), latency, and availability. In traditional hardware-defined storage systems, performance metrics were easier to predict due to standardized hardware configurations. However, with software-defined storage, where hardware can vary, SLAs provide customers with a level of assurance that the storage system will meet their needs consistently, regardless of the underlying infrastructure.

Conclusion

Michael Schmidt’s discussion offers a fascinating look at the evolving landscape of cloud storage. It’s clear that simplyblock is addressing key challenges by combining the flexibility of software-defined storage with the power of modern cloud-native architectures. Whether you’re managing large-scale Kubernetes deployments or trying to cut infrastructure costs, simplyblock’s approach to scalability and performance could be just what you need.

If you’re considering how to future-proof your storage solutions or make them more cost-efficient, the insights shared in this episode will be valuable. Be sure to explore the simplyblock platform and stay connected for more episodes of the Cloud Commute Podcast. We’re constantly bringing in experts to discuss the cutting-edge technologies shaping tomorrow’s infrastructure. Don’t miss out!

The post Origins of simplyblock and the Evolution of Storage Technologies appeared first on simplyblock.

Disaster Recovery with Simplyblock in AWS https://www.simplyblock.io/blog/disaster-recovery-with-simplyblock-in-aws/ Fri, 06 Sep 2024 23:41:03 +0000 https://www.simplyblock.io/?p=1656 When disaster strikes, a great recovery strategy is required. Oftentimes, deficiencies are only discovered when it’s already too late. Simplyblock provides comprehensive disaster recovery support for databases, file storages, and whole infrastructures, enabling the restore from ground up in a different availability zone with minimal RTO (Recovery Time Objective) and near-zero RPO (Recovery Point Objective). […]

The post Disaster Recovery with Simplyblock in AWS appeared first on simplyblock.

When disaster strikes, a great recovery strategy is required. Oftentimes, deficiencies are only discovered when it’s already too late. Simplyblock provides comprehensive disaster recovery support for databases, file storages, and whole infrastructures, enabling the restore from ground up in a different availability zone with minimal RTO (Recovery Time Objective) and near-zero RPO (Recovery Point Objective).

Amazon EBS, Amazon S3, and Local Instance Storage

AWS’ cloud block storage (Amazon EBS) is a great product, providing a multitude of different volume types depending on your performance requirements (random IOPS, access latency). However, the provided durability is limited. Depending on the EBS volume type, AWS provides a durability indicator between 99.8% and 99.999%. The bigger issue, though: in case of a disaster in your availability zone (AZ), the storage becomes unavailable in its entirety and, depending on the type of disaster, data may actually be lost (partially or in full).

The durability is even worse with local instance storage. Local instance storage consists of NVMe disks physically located in the virtual machine host that runs your workload. As a consequence, all data stored on local instance storage is immediately lost once the instance is stopped, or when a failure occurs on the physical host.

Amazon S3 storage, on the other hand, is considered to be extremely durable, offering 99.999999999% durability. In addition, it is replicated across availability zones. Therefore, the probability of data loss through any kind of disaster is close to zero. To our knowledge, and as of the time of writing, it has never actually happened. In terms of durability, Amazon S3 is king. We do, however, trade durability for latency.

Data Protection for Amazon EBS

As shown, all persistent (meaning non-ephemeral) data stored in Amazon EBS requires additional means of protection. The most common way to protect your data is to take a snapshot of your EBS volume and back it up to Amazon S3.

Those S3 backups have a number of important drawbacks though:

  • A snapshot-based backup always implies data loss of some kind. Data written between the last backup and the time of the failure is irrecoverably lost; no restore procedure will be able to recover it. For low-velocity data (data which rarely changes), such as media files or archived documents, that may be a minor issue. For other kinds of data, such as transactional systems, the data loss can be catastrophic.
  • Backups of different systems aren’t consistent with each other. The backup of one database may not fit the backup of another database or a file repository. After restoration, the systems may therefore hold inconsistent data states and will not integrate correctly. Bringing a collection of systems with backups taken at different times back into a working state can be a massive manual effort; sometimes it is even impossible.
  • Backup management is a significant effort. To free up disk space, snapshots have to be removed from EBS after moving them to S3. Backups have to be configured with retention policies, the backup operations must be monitored, and backups have to be tested regularly to make sure they can be restored successfully.

Last but not least, human error in backup management may lead to missing or corrupted backups.

Data Protection for Amazon EBS with Simplyblock

Simplyblock provides a smart solution to the consistent recovery of hot data after a major incident or even a zone-level disaster.

First and foremost, simplyblock logical volumes store data synchronously into the hot-tier storage backend. In addition, data is also written into an asynchronously replicated write-ahead log (WAL). Writing this log is optimized for high throughput to secondary (low-IOPS) storage such as S3 or HDD pools (e.g., the Amazon EBS st1 volume type). Last but not least, the WAL is efficiently compacted at regular intervals to limit storage growth and optimize recovery times.

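The write path described above can be pictured with a small, conceptual sketch (a simplification of the idea, not simplyblock’s actual code): a write is acknowledged after the synchronous hot-tier update, while a copy of the change is appended to a write-ahead log that a background thread ships in batches to slower, durable storage.

```python
# Conceptual sketch: synchronous hot-tier write plus asynchronous WAL shipping.
import queue
import threading
import time

class HotTier:
    """Stands in for the low-latency NVMe-backed hot storage tier."""
    def __init__(self):
        self.blocks = {}
    def write(self, lba, data):
        self.blocks[lba] = data

class WalShipper:
    """Collects WAL entries and ships them in batches to slow, durable storage (e.g. S3)."""
    def __init__(self, ship):
        self.entries = queue.Queue()
        self.ship = ship
        threading.Thread(target=self._run, daemon=True).start()
    def append(self, entry):
        self.entries.put(entry)          # returns immediately, off the write's critical path
    def _run(self):
        while True:
            batch = [self.entries.get()]
            while not self.entries.empty():
                batch.append(self.entries.get())
            self.ship(batch)             # one large sequential upload, optimized for throughput

hot = HotTier()
wal = WalShipper(ship=lambda batch: print(f"shipped {len(batch)} WAL entries"))

def write(lba, data):
    hot.write(lba, data)                 # synchronous: this defines the access latency
    wal.append((lba, data))              # asynchronous: bounds potential data loss to the ship interval
    return "ack"

write(1, b"hello")
write(2, b"world")
time.sleep(0.1)                          # give the background shipper a moment
```
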
Simplyblock’s logical volumes inherently support snapshots. Due to the copy-on-write nature of simplyblock, snapshots are taken immediately and, together with the WAL, asynchronously replicated to S3.

Data recovery, on the other hand, restores all live volumes and snapshots in a fully consistent manner. The asynchronicity of the replication limits data loss to a few hundred milliseconds.

Disaster Recovery with Near-Zero RPO

The solution stores all “hot” data either in distributed instance storage or within gp3 pools, providing the necessary online performance of storage. At the same time, all data is also asynchronously replicated into S3.

In case of a loss of the entire infrastructure in an availability zone (including the gp3 volumes and local instance storage) it is possible to consistently bootstrap the entire environment in a new AZ.

Simplyblock architecture with consistent disaster recovery across different services

If a customer uses simplyblock to store not only the databases but also bootstrap and deployment information (like ArgoCD configuration, Terraform data, or similar), a recovery operation can consistently restore the entire infrastructure from the ground up. Using this strategy, infrastructures backed by simplyblock can be recovered consistently and fully automatically, with near-zero RPO and a low RTO.

For this purpose, the “primary” simplyblock storage pod, which contains all data required for bootstrapping, has to be restarted in a new zone and connected to the control plane. Afterwards, all storage is consistently accessible.

First, infrastructure templates and configurations for the environment are retrieved, after which the deployment scripts are run and the infrastructure is redeployed. In this process, databases, documents, and other file stores can already be connected to their corresponding volumes, which contain all of the data in a crash-consistent manner.

At a later stage, “secondary” storage plane pods can be restarted within the new availability zone and data will be recovered.

The recovery time depends largely on the amount of data and the instance network bandwidth. The read time from S3 is highly optimized using large, parallel reads, wherever possible to pre-fetch hot data as quickly as possible.

Conclusion

All that said, simplyblock, the intelligent storage orchestrator, provides a powerful, feature-rich, crash-consistent, and yet performant storage solution.

Built upon well-known storage solutions, such as local instance storage, Amazon EBS, and Amazon S3, simplyblock combines the ultra low latency access of NVMe volumes (pooled or unpooled) with the extreme durability of Amazon S3. Simplyblock’s write-ahead log and disaster recovery support enables the lowest RPO and minimal downtime, even in case of the loss of a full availability zone.

Get started with simplyblock today and learn all about the other amazing features simplyblock brings right to you.

The post Disaster Recovery with Simplyblock in AWS appeared first on simplyblock.

How We Built Our Distributed Data Placement Algorithm https://www.simplyblock.io/blog/how-we-build-our-distributed-data-placement-storage-algorithm/ Wed, 22 May 2024 12:11:23 +0000 https://www.simplyblock.io/?p=268 Modern cloud applications demand more from their storage than ever before – ultra-low latency, predictable performance, and bulletproof reliability. Simplyblock’s software-defined storage cluster technology, built upon its distributed data placement algorithm, reimagines how we utilize NVMe devices in public cloud environments. This article deep dives into how we’ve improved upon traditional distributed data placement algorithms […]

The post How We Built Our Distributed Data Placement Algorithm appeared first on simplyblock.

Modern cloud applications demand more from their storage than ever before – ultra-low latency, predictable performance, and bulletproof reliability. Simplyblock’s software-defined storage cluster technology, built upon its distributed data placement algorithm, reimagines how we utilize NVMe devices in public cloud environments.

This article deep dives into how we’ve improved upon traditional distributed data placement algorithms to create a high-performance I/O processing environment that meets modern enterprise storage requirements.

Design Of Simplyblock’s Storage Cluster

Simplyblock storage cluster technology is designed to utilize NVMe storage devices in public cloud environments for use cases that require predictable and ultra-low access latency (sub-millisecond) and the highest performance density (high IOPS per GiB).

To combine high performance with a high degree of data durability, high availability, and fault tolerance, as well as zero downtime scalability, the known distributed data placement algorithms had to be improved, re-combined, and implemented into a high-performance IO processing environment.

Our innovative approach combines:

  • Predictable, ultra-low latency performance (<1ms)
  • Maximum IOPS density optimization
  • Enterprise-grade durability and availability
  • Zero-downtime scalability
  • Advanced failure domain management

Modern Storage Requirements

Use cases such as high-load databases, time-series databases with high-velocity data, Artificial Intelligence (AI), Machine Learning (ML), and many others require fast and predictable storage solutions.

Anyhow, performance isn’t everything. The fastest storage is writing to /dev/null, but only if you don’t need data durability. That said, the main goals for a modern storage solution are:

  • High Performance Density, meaning a high amount of IOPS per Gigabyte (at an affordable price).
  • Predictable, low Latency, especially for use cases that require consistent response times.
  • High degree of Data Durability, to distribute the data across failure domains, enabling it to survive multiple failure scenarios.
  • High Availability and Fault Tolerance, for the data to remain accessible in case of node outage. Clusters are automatically re-balanced in the case of element failures.
  • Zero Downtime Scalability, meaning that clusters can grow in real-time and online and are automatically re-balanced.

Distributed Data Placement

Data placement in storage clusters commonly uses pseudo-randomization. Additionally, features such as weighted distribution of storage across the cluster (based on the capacity and performance of available data buckets) are introduced to handle failure domains and cluster rebalancing – for scaling, downsizing, or removal of failed elements – at minimal cost. A prominent example of such an algorithm is CRUSH (Controlled, Scalable, Decentralized Placement of Replicated Data), which is used in Ceph, an open-source software-defined storage platform designed to provide object storage, block storage, and file storage in a unified system.

Simplyblock uses a different algorithm to achieve the following characteristics for its distributed data placement feature:

  • High storage efficiency (raw-to-effective storage ratio) with minimal performance overhead. Instead of using three data replicas, which is the standard mechanism to protect data from storage device failure in software-defined storage clusters, simplyblock uses erasure coding algorithms with a raw-to-effective ratio of about 1.33 (instead of 3).
  • Very low access latency below 100 microseconds for read and write. Possible write amplification below 2.
  • Ultra-high IOPS density with more than 200,000 IOPS per CPU core.
  • Performant re-distribution of storage in the cluster in the case of cluster scaling and removal of failed storage devices. Simplyblock’s algorithm re-distributes only an amount of data close to the theoretical minimum required to rebalance the cluster.
  • Support for volume high-availability based on the NVMe industry standard. Support for simple failure domains as they are available in cloud environments (device, node, rack, availability zone).
  • Performance efficiency aims to address the typical performance bottlenecks in cloud environments.

Implementing a storage solution to keep up with current trends required us to think out of the box. The technical design consists of several elements.

Low-level I/O Processing Pipeline

On the lower level, simplyblock uses a fixed-size page mapping algorithm implemented in a virtual block device (a virtual block device implements a filter or transformation step in the IO processing pipeline).

For that purpose, IO is organized into “pages” (2^m blocks, with m in the range of 8 to 12). Cross-page IO has to be split before processing. This is done in the mid-level processing pipeline. We’ll get to that in a second.

This algorithm can place data received via IO-write requests from multiple virtual block devices on a single physical block device. Each virtual block device has its own logical block address space though. The algorithm is designed to read, write, and unmap (deallocate) data with minimal write amplification for metadata updates (about 2%) and minimal increase in latency (on average in the sub-microseconds range). Furthermore, it is optimized for sudden power cuts (crash-consistent) by storing all metadata inline of the storage blocks on the underlying device.

Like all block device IO requests, each request contains an LBA (logical block address) and a length (in blocks). The 64-bit LBA is internally organized into a 24-bit VUID (a cluster-wide unique identifier of the logical volume) and a 39-bit virtual LBA. The starting LBA of the page on the physical device is identified by the key (VUID, LPA), where LPA is the logical page address (LBA / (2^m)), and the address offset within the page is determined by (LBA modulo 2^m).
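
To make the addressing scheme concrete, the short sketch below applies the bit widths quoted above (24-bit VUID, 39-bit virtual LBA) and the page math (LPA = LBA / 2^m, offset = LBA modulo 2^m). The helper names are ours, purely for illustration, and not simplyblock’s API.

```python
# Illustrative decomposition of a 64-bit LBA into (VUID, LPA, in-page offset).
PAGE_SHIFT = 12                 # m, with m in the range 8 to 12; here 2^12 blocks per page
VLBA_BITS = 39                  # width of the virtual LBA inside the 64-bit LBA
VUID_BITS = 24                  # cluster-wide unique identifier of the logical volume

def page_key_and_offset(lba64):
    vuid = (lba64 >> VLBA_BITS) & ((1 << VUID_BITS) - 1)
    virtual_lba = lba64 & ((1 << VLBA_BITS) - 1)
    lpa = virtual_lba >> PAGE_SHIFT                 # logical page address = LBA / 2^m
    offset = virtual_lba & ((1 << PAGE_SHIFT) - 1)  # address offset = LBA modulo 2^m
    return (vuid, lpa), offset                      # (VUID, LPA) is the page lookup key

lba = (42 << VLBA_BITS) | 123_456   # virtual LBA 123456 on the volume with VUID 42
print(page_key_and_offset(lba))     # ((42, 30), 576)
```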

The IO processing services of this virtual device work entirely asynchronously on CPU-pinned threads with entirely private data (no synchronization mechanisms between IO threads required).

They are placed on top of an entirely asynchronous NVMe driver, which receives IO and submits responses via IO-queue pairs sitting at the bottom of the IO processing stack.

Mid-level IO Processing Pipeline

On top of the low-level mapping device, a virtual block device, which implements distributed data placement, has access to the entire cluster topology. This topology is maintained by a centralized multi-tenant control plane, which knows about the state of each node and device in the attached clusters and manages changes to cluster topology (adding or removing devices and nodes).

It uses multiple mechanisms to determine the calculated and factual location of each data page and then issues asynchronous IO to this mapping device in the cluster locally (NVMe) or remotely using NVMe over Fabrics (NVMe-oF):

  1. All IO is received and forwarded on private IO threads, with no inter-thread communication, and entirely asynchronously, both inbound and outbound.
  2. First, IO is split at page boundaries so that single requests can be processed within a single page.
  3. Data is then striped into n chunks, and (double) parity is calculated from the chunks. Double parity is calculated using the RDP algorithm. The data is, therefore, organized in 2-dimensional arrays. n equals 1, 2, 4, or 8. This way, 4 KiB blocks can be mapped into 512-byte device blocks, and expensive partial stripe writes can be avoided.
  4. To determine a primary target for each combination of (VUID, page, chunk-index), a flat list of devices is fed into the “list bucket” algorithm (see …) with (VUID, page, chunk-index) being the key.
  5. Each of the data and parity chunks in a stripe has to be placed on a different device. In addition, further failure domains, such as nodes and racks, can be considered for placement anti-affinity rules. In case of a collision, the algorithm repeats recursively with an adjusted chunk-index (chunk-index + i x p, where p is the next prime number larger than the maximum chunk index and i is the iteration); a simplified sketch of this selection and retry loop follows after this list.
  6. In case a selected device is (currently) not available, the algorithm repeats recursively to find an alternative and also stores the temporary placement data for each chunk in the IO. This temporary placement data is now also journaled as metadata. Metadata journaling is an important and complex part of the algorithm. It is described separately below.
  7. On read, the process is reversed: the chunks to read from a determined placement location are determined by the same algorithm.
  8. In case of single or dual device failure at read, the missing data will be reconstructed on the fly from parity chunks.
  9. The algorithm pushes any information on IO errors straight to the control plane, and the control plane may update the cluster map (status of nodes and devices) and push the updated cluster map back to all nodes and virtual devices.
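
Steps 4 and 5 can be approximated with a short sketch. Note the simplifications: a plain hash stands in for the weighted “list bucket” selection, and only device-level anti-affinity is handled; the retry rule follows the prime-step formula from step 5. None of the names below are simplyblock’s actual code.

```python
# Simplified sketch of primary-target selection with device-level anti-affinity retries.
import hashlib

def next_prime_above(n):
    def is_prime(k):
        return k > 1 and all(k % d for d in range(2, int(k ** 0.5) + 1))
    k = n + 1
    while not is_prime(k):
        k += 1
    return k

def pick_device(devices, vuid, page, chunk_index):
    # Placeholder for the weighted "list bucket" selection: hash the key, pick a device.
    key = f"{vuid}:{page}:{chunk_index}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return devices[digest % len(devices)]

def place_stripe(devices, vuid, page, num_chunks):
    """Place each data/parity chunk of a stripe on a distinct device."""
    p = next_prime_above(num_chunks - 1)   # next prime larger than the maximum chunk index
    placement, used = {}, set()
    for chunk in range(num_chunks):
        i, index = 0, chunk
        device = pick_device(devices, vuid, page, index)
        while device in used:              # collision: retry with the adjusted chunk index
            i += 1
            index = chunk + i * p
            device = pick_device(devices, vuid, page, index)
        placement[chunk] = device
        used.add(device)
    return placement

devices = [f"dev-{n}" for n in range(12)]
print(place_stripe(devices, vuid=42, page=7, num_chunks=6))   # 4 data chunks + double parity
```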

Top-level IO Processing Pipeline

On top of the stack of virtual block devices, simplyblock includes multiple optional virtual block devices, including a snapshot device – the device can take instant snapshots of volumes, supports snapshot chains, and instant (copy-on-write) cloning of volumes. Additionally, there is a virtual block device layer, which supports synchronous and asynchronous replication of block storage volumes across availability zones.

The highest virtual block device in the stack is then published to the fabric as a separate NVMe-oF volume with its own unique NVMe identifier (NQN) via the control plane.

High-Availability Support

The algorithm supports highly available volumes based on NVMe multipathing and ANA (asynchronous namespace access). This means that a transparent fail-over of IO for a single volume in case of a node outage is realized without having to add any additional software to clients.

Due to the features of the high-, mid-, and low-level IO pipeline, this is easy to realize: identical stacks of virtual block devices with an identical VUID are created on multiple nodes and published to the fabric using “asynchronous namespace access” (which prefers one volume over others and essentially implements an active/passive/passive mechanism).

Metadata Journaling

Metadata Journaling persists non-primary placement locations, which are needed to locate data in the cluster. It has the following important features:

  • It has to persist every change in location for the block range addressed in an IO request to be consistent in “sudden power-off” situations (node outages)
  • It has to minimize write amplification – this is achieved by smartly “batching” multiple metadata write requests into single IO operations in the form of “high-priority” IO
  • It has to be fast – not delaying data IO – this is achieved by high-priority NVMe queues
  • It has to be reliable – this is achieved by replicating metadata writes to three nodes using NVMe over Fabrics remote connections
  • Its storage footprint has to be small and remain constant over time; it cannot grow forever with new IO – this is achieved by introducing a regular compression mechanism, which replaces the transactional journal with a “snapshot” of placement metadata at a certain moment

Data Migrations

Data Migrations run as background processes, which take care of the movement of data in cases of failed devices (rebuild and re-distribution of data in the cluster), cluster scaling (to reduce the load on utilized elements and rebalance the cluster), and temporary element outage (to migrate data back to its primary locations). Running data migrations keeps a cluster in a state of transition and has to be coordinated to not conflict with any ongoing IO.

Conclusion

Building an architecture for a fast, scalable, fault-tolerant distributed storage solution isn’t easy. To be fair, I don’t think anyone expected that. Distributed systems are always complicated, and a lot of brain power goes into their design.

Simplyblock separates itself by rethinking the data placement in distributed storage environments. Part of it is the fundamentally different way of using erasure coding for parity information. We don’t just use it on a single node, between the local drives; simplyblock uses erasure coding throughout the cluster, distributing parity information from each disk onto another disk on another node, hence increasing the fault tolerance.

To test simplyblock, get started right away. If you want to learn more about the features simplyblock offers you, see our feature overview.

The post How We Built Our Distributed Data Placement Algorithm appeared first on simplyblock.

What is Block Storage? https://www.simplyblock.io/blog/what-is-block-storage/ Wed, 15 May 2024 12:11:36 +0000 https://www.simplyblock.io/?p=272 The term Block storage describes a technology that controls how data is stored on storage devices. The name is derived from the fact that block storage splits any kind of information, such as files or raw disk content, into chunks (or blocks) of equal size. Block storage is the most versatile type of storage, as […]

The post What is Block Storage? appeared first on simplyblock.

The term Block storage describes a technology that controls how data is stored on storage devices. The name is derived from the fact that block storage splits any kind of information, such as files or raw disk content, into chunks (or blocks) of equal size.

Block storage is the most versatile type of storage, as it is the underlying structure of other storage options, such as file or object storage. It is also the most known type of storage since most typical storage media (HDD, SSD, NVMe, …) are exposed to the system as block storage devices.

How does Block Storage Work?

Block storage devices are split into a number of independent blocks. Each block has a logical block address (LBA) which uniquely identifies it. Furthermore, the blocks are all the same size for the same block device, and typically only one piece of information can be stored within a single block (as it is the smallest addressable unit).

File gets split into chunks and stored in blocks.

When an application wants to write a file, it is first determined whether the file fits into a single block. If it does, the operation is easy: find a free (unused) block and write the file to it.

If the file is larger than a single block, it is split into multiple parts, with each part being written to a separate free block. The order or consecutive positioning of these blocks is not guaranteed.

Anyhow, after the file is written to one or more blocks, the block address(es) are written to a lookup table. The lookup table is provided through the filesystem that was installed onto the block device and varies depending on the filesystem in use. If you’ve ever heard the term Inode in Linux, that’s part of the lookup mechanism.

When reading the file, the blocks and their read order are looked up in the lookup table based on the filename, and the block storage reads the requested blocks back into memory, where the file is pieced together in the right order.

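The write and read flow described above can be modelled with a small toy example (not tied to any real filesystem): a file is split into block-sized chunks, each chunk is written to a free block, and a lookup table records which blocks belong to the file and in which order.

```python
# Toy model of block storage plus a filesystem-style lookup table.
BLOCK_SIZE = 4096

class BlockDevice:
    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks          # addressed by LBA
    def free_block(self):
        return self.blocks.index(None)             # first unused block

class SimpleFilesystem:
    def __init__(self, device):
        self.device = device
        self.table = {}                            # filename -> ordered list of LBAs

    def write_file(self, name, data):
        lbas = []
        for i in range(0, len(data), BLOCK_SIZE):  # split file into block-sized chunks
            lba = self.device.free_block()
            self.device.blocks[lba] = data[i:i + BLOCK_SIZE]
            lbas.append(lba)                       # blocks need not be consecutive
        self.table[name] = lbas                    # the lookup table (inode-like role)

    def read_file(self, name):
        # Read the blocks back in the recorded order and piece the file together.
        return b"".join(self.device.blocks[lba] for lba in self.table[name])

fs = SimpleFilesystem(BlockDevice(num_blocks=1024))
fs.write_file("report.txt", b"x" * 10_000)         # spans three blocks
assert fs.read_file("report.txt") == b"x" * 10_000
```
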
Block Storage use Cases

Due to the unique characteristics of block storage, it can be used for any kind of use case. Typical simple use cases include general computer storage, such as virtual hard drives for virtual machines, used to store and boot the operating system.

Where block storage really shines, though, is when high performance is required, or when IO-intensive, latency-sensitive, and mission-critical workloads, such as relational or transactional databases, time-series databases, or container storage, need fast storage. In these cases, the common claim is: the faster, the better.

Database Workloads

A transactional workload is a series of changes from different users. That means that the database receives reads and writes from various users over time. Related modifications need to be applied atomically (meaning they happen at once or not at all), which is known as a transaction. A common example of transactional workloads are banking systems, where multiple (money) transactions happen in parallel.

Due to the nature of block storage, where each block is an independent unit, databases can optimally read and write data, either with a filesystem in between, or taking on the role of managing the block assignment themselves. With a growing data set, the underlying physical storage can be split into multiple devices, or even multiple storage nodes. The logical view of a block storage device stays intact.
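
One generic way to split the physical storage while keeping the logical view intact is striping (shown here only as an illustration of the idea, not as simplyblock’s placement algorithm): consecutive ranges of logical blocks are distributed round-robin across devices, and a small mapping function translates a logical address into a device and a local address.

```python
# Generic striping sketch: one logical block address space over several devices.
STRIPE_BLOCKS = 256  # blocks per stripe before moving on to the next device

def locate(logical_lba, num_devices):
    """Map a logical LBA to (device_index, lba_on_that_device)."""
    stripe = logical_lba // STRIPE_BLOCKS
    device = stripe % num_devices
    lba_on_device = (stripe // num_devices) * STRIPE_BLOCKS + logical_lba % STRIPE_BLOCKS
    return device, lba_on_device

# Consecutive stripes land on different devices, spreading the IO load:
for lba in (0, 256, 512, 768, 1024):
    print(lba, "->", locate(lba, num_devices=4))
```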

Cloud and Container Workloads

Virtual machines and containers are designed to be a flexible way to place workloads on machines, isolated from each other. This flexibility requires storage which is just as flexible and can easily be grown in size and migrated to other locations (servers, data centers, or operating environments). While alternative storage technologies are available, none of them is as flexible as pure block storage devices.

Other High Velocity Data Workloads

Workloads with high data velocity, meaning rapidly changing data, oftentimes within seconds, need storage solutions that can keep up with the speed of writes and reads. Typical use cases of such workloads include Big Data Analytics, but also real-time use cases, such as GPS tracking data (Uber, DHL, etc). In these cases, direct addressable block access improves read and write performance by removing additional, non-standard access layers.

Block Storage vs File Storage

File level storage, or file storage refers to storage options that work purely on a file level. File storage is commonly associated with local file systems such as NTFS, ext4, or network file systems such as SMB (the Windows file sharing protocol), or NFS.

From a user’s perspective, file storages are easy to use and navigate, since their design replicates how we operate with local file systems. They present directories and files, and mimic the hierarchical nesting of those. File storages often provide access control and permissions on a per-file basis.

While easy to use, the way those storages are implemented introduces a single access path, hence the performance can be impacted compared to block storage, especially in situations with many concurrent accesses. It also means that interoperability may be decreased over a pure block storage device since not every file system implementation is available on every operating system.

Typically, a file storage is backed by a block storage device in combination with a file system. This file system is either used locally, or made available remotely through one of the available network file systems.

Block Storage vs Object Storage

Object storage, sometimes also known as blob storage, is a storage approach which stores information in blobs or objects (which explains the origin of its name). Each object has a variable amount of metadata attached to it, and is globally uniquely identifiable.

The object identities are commonly collected and managed by the application that stores or reads the file. These identities are commonly represented by URIs, due to the fact that most object storages (these days) are based on HTTP services, such as AWS S3 or Azure Blob Storage. This means that typical file access patterns aren’t available and that object storages most often require application changes. The S3 protocol is currently a de facto standard across many object storage implementations. Yet not all vendors (especially other cloud providers) implement it, meaning that implementations aren’t necessarily compatible or interchangeable.

Object storages, while versatile, impact the performance and accessibility of files. The additional protocol overhead and the access patterns make them a great fit for unstructured, static files, such as images, video data, backup files, and similar, but not a good fit for frequently accessed or updated data.

Block Storage as a Service

In summary, block storage, file storage, and object storage each offer distinct advantages and are suited to different use cases. While block storage excels in performance-critical applications, file storage is ideal for shared file access, and object storage provides scalable storage for unstructured data.

Anyhow, each of those storage types is available via one or more storage-as-a-service offerings. While some may be compatible inside their own category, others are not. Being able to interchange implementations, or to change cloud providers when necessary, may be a requirement. Incompatible protocols, especially in the case of object storages, paired with the performance impact compared to block storage, make basic block storage still the tool of choice for most use cases.

Simplyblock offers a highly distributed, NVMe optimized, block storage solution. It combines the performance and capacity of many storage devices throughout the attached cluster nodes and enables the creation of logical block devices of various sizes and performance characteristics. Imagine virtualization, but for your storage. To build your own Amazon EBS like storage solution today, you can get started right away. An overview of all the features, such as snapshots, copy-on-write clones, online scalability, and much more, can be found on our feature page.

The post What is Block Storage? appeared first on simplyblock.

What is NVMe Storage? https://www.simplyblock.io/blog/what-is-nvme-storage/ Wed, 08 May 2024 12:12:00 +0000 https://www.simplyblock.io/?p=276 NVMe, or Non-Volatile Memory Express, is a modern access and storage protocol for flash-based solid-state storage. Designed for low overhead, latency, and response times, it aims for the highest achievable throughput. With NVMe over TCP, NVMe has its own successor to the familiar iSCSI. While commonly found in home computers and laptops (M.2 factor), it […]

The post What is NVMe Storage? appeared first on simplyblock.

NVMe, or Non-Volatile Memory Express, is a modern access and storage protocol for flash-based solid-state storage. Designed for low overhead, latency, and response times, it aims for the highest achievable throughput. With NVMe over TCP, NVMe has its own successor to the familiar iSCSI.

While commonly found in home computers and laptops (M.2 factor), it is designed from the ground up for all types of commodity and enterprise workloads. It guarantees fast load times and response times, even in demanding application scenarios.

The main intention in developing the NVMe storage protocol was to transfer data through the PCIe (PCI Express) bus. Thanks to the protocol’s low overhead, more use cases have since been found through NVMe specification extensions managed by the NVM Express group. Those extensions include additional transport layers, such as Fibre Channel, Infiniband, and TCP (collectively known as NVMe-oF or NVMe over Fabrics).

How does NVMe Work?

Traditionally, computers used SATA or SAS (and before that, ATA, IDE, SCSI, …) as their main protocols for data transfers from the disk to the rest of the system. Those protocols were all developed when spinning disks were the prevalent type of high-capacity storage media.

NVMe, on the other hand, was developed as a standard protocol to communicate with modern solid-state drives (SSDs). Unlike traditional protocols, NVMe fully takes advantage of SSDs’ capabilities. It also provides support for much lower latency due to the missing repositioning of read-write heads and rotating spindles.

The main reason for developing the NVMe protocol was that SSDs were starting to experience throughput limitations due to the traditional protocols SAS and SATA.

Anyhow, NVMe communicates through a high-speed Peripheral Component Interconnect Express bus (better known as PCIe). The logic for NVMe resides inside the controller chip on the storage adapter board, which is physically located inside the NVMe-capable device. This board is often co-located with controllers for other features, such as wear leveling. When accessing or writing data, the NVMe controller talks directly to the CPU through the PCIe bus.

The NVMe standard defines registers (basically special memory locations) to control the protocol, a command set of possible operations to be executed, and additional features to improve performance for specific operations.

What are the Benefits of NVMe Storage?

Compared to traditional storage protocols, NVMe has much lower overhead and is better optimized for high-speed and low-latency data access.

Additionally, the PCI Express bus can transfer data faster than SATA or SAS links. That means NVMe-based SSDs provide a latency of a few microseconds, compared to roughly 40-100 microseconds for SATA-based ones.

Furthermore, NVMe storage comes in many different packages, depending on the use case. Many people know the M.2 form factor from home use; however, it is limited in bandwidth due to the fewer PCIe lanes available on consumer-grade CPUs. Enterprise NVMe form factors, such as U.2, provide more and higher-capacity uplinks. These enterprise types are specifically designed to sustain high throughput for demanding data center workloads, such as high-load databases or ML/AI applications.

Last but not least, NVMe commands can be streamlined, queued, and multipathed for more efficient parsing and execution. Due to the non-rotational nature of solid-state drives, multiple operations can be executed in parallel. This makes NVMe a perfect candidate for tunneling the protocol over high-speed communication links.

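The queueing model can be pictured as pairs of submission and completion queues, typically one pair per CPU core. The sketch below is a heavily simplified illustration of the idea; real NVMe queues are ring buffers in host memory driven by doorbell registers, not Python objects.

```python
# Heavily simplified picture of an NVMe submission/completion queue pair.
from collections import deque

class QueuePair:
    def __init__(self, qid):
        self.qid = qid
        self.submission = deque()   # host places commands here
        self.completion = deque()   # controller posts results here

    def submit(self, opcode, lba, length):
        self.submission.append({"opcode": opcode, "lba": lba, "len": length})

    def controller_process(self):
        # The controller drains commands and may complete them out of order;
        # flash has no seek penalty, so commands from many queues run in parallel.
        while self.submission:
            cmd = self.submission.popleft()
            self.completion.append({"cmd": cmd, "status": "success"})

# One queue pair per CPU core avoids cross-core locking on the IO path.
queues = [QueuePair(qid) for qid in range(4)]
queues[0].submit("read", lba=2048, length=8)
queues[0].controller_process()
print(queues[0].completion.popleft())
```
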
What is NVMe over Fabrics (NVMe-oF)?

NVMe over Fabrics is a tunneling mechanism for access to remote NVMe devices. It improves the performance of remote access to solid-state drives compared to traditional tunneling protocols such as iSCSI.

NVMe over Fabrics is directly supported by the NVMe driver stacks of common operating systems, such as Linux and Windows (Server), and doesn’t require additional software on the client side.

At the time of writing, the NVM Express group has standardized the tunneling of NVMe commands through the NVMe-friendly protocols Fibre Channel, Infiniband, and Ethernet, or more precisely, over TCP.

NVMe over Fibre Channel (NVMe/FC)

NVMe over Fibre Channel is a high-speed transport that connects NVMe storage solutions to client devices. Fibre Channel, initially designed to transport SCSI commands, needed to translate NVMe commands into SCSI commands and back to communicate with newer solid-state hardware. To mitigate that overhead, the Fibre Channel protocol was enhanced to natively support the transport of NVMe commands. Today, it supports native, in-order transfers between NVMe storage devices across the network.

Because Fibre Channel is its own networking stack, cloud providers (at least to our knowledge) don’t offer support for NVMe/FC.

NVMe over TCP (NVMe/TCP)

NVMe over TCP provides an alternative way of transferring NVMe communication through a network. In the case of NVMe/TCP, the underlying network layer is the TCP/IP protocol, hence an Ethernet-based network. That increases the availability and commodity of such a transport layer beyond separate and expensive enterprise networks running Fibre Channel.

NVMe/TCP is rising to become the next protocol for mainstream enterprise storage, offering the best combination of performance, ease of deployment, and cost efficiency.

Due to its reliance on TCP/IP, NVMe/TCP can be utilized without additional modifications in all standard Ethernet network gear, such as NICs, switches, and copper or fiber transports. It also works across virtual private networks, making it extremely interesting in cloud, private cloud, and on-premises environments, specifically with public clouds with limited network connectivity options.
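
In practice, attaching a remote NVMe/TCP volume on a Linux client usually comes down to loading the nvme-tcp module and running nvme connect from the nvme-cli package. The sketch below simply wraps those commands from Python; it assumes nvme-cli is installed and run as root, and the address, port, and NQN are placeholders to replace with your target’s values.

```python
# Sketch: attaching a remote NVMe/TCP volume on Linux using the nvme-cli tool.
# Requires root privileges, the nvme-cli package, and the nvme-tcp kernel module.
import subprocess

TARGET_ADDR = "192.0.2.10"                         # placeholder target IP
TARGET_PORT = "4420"                               # common NVMe/TCP port
TARGET_NQN = "nqn.2024-01.io.example:volume-1"     # placeholder NQN

def run(cmd):
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["modprobe", "nvme-tcp"])                      # load the NVMe/TCP initiator
run(["nvme", "connect",
     "--transport", "tcp",
     "--traddr", TARGET_ADDR,
     "--trsvcid", TARGET_PORT,
     "--nqn", TARGET_NQN])
run(["nvme", "list"])                              # the volume now shows up as /dev/nvmeXnY
```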

NVMe over RDMA (NVMe/RDMA)

A special version of NVMe over Fabrics is NVMe over RDMA (or NVMe/RDMA). It implements a direct communication channel between a storage controller and a remote memory region (RDMA = Remote Direct Memory Access). This lowers the CPU overhead for remote access to storage (and other peripheral devices). To achieve that, NVMe/RDMA bypasses the kernel stack, hence it mitigates the memory copying between the driver stack, the kernel stack, and the application memory.

NVMe over RDMA has two sub-protocols: NVMe over Infiniband and NVMe over RoCE (Remote Direct Memory Access over Converged Ethernet). Some cloud providers offer NVMe over RDMA access through their virtual networks.

How does NVMe/TCP Compare to ISCSI?

NVMe over TCP provides performance and latency benefits over the older iSCSI protocol. The improvements include about 25% lower protocol overhead, meaning more actual data can be transferred with every TCP/IP packet, increasing the protocol’s throughput.

Furthermore, NVMe/TCP enables native transfer of the NVMe protocol, removing multiple translation layers between the older SCSI protocol (which is used in iSCSI, hence the name) and NVMe.

That said, the difference is measurable. Blockbridge Networks, a provider of all-flash storage hardware, benchmarked both protocols and found a general access latency improvement of up to 20% and an IOPS improvement of up to 35% when using NVMe/TCP instead of iSCSI to access remote storage.

Use Cases for NVMe Storage?

NVMe storage’s benefits and ability to be tunneled through different types of networks (including virtual private networks in cloud environments through NVMe/TCP) open up a wide range of high-performance, latency-sensitive, or IOPS-hungry use cases.

Typical use cases include:

  • Relational databases with high load or high-velocity data
  • Time-series databases for IoT or observability data
  • Big Data
  • Data Warehouses
  • Analytical databases
  • Artificial Intelligence (AI) and Machine Learning (ML)
  • Blockchain storage and other Crypto use cases
  • Large-scale data center storage solutions
  • Graphics editing storage servers

The Future is NVMe Storage

No matter how we look at it, the amount of data we need to transfer (quickly) from and to storage devices won’t shrink. NVMe is the current gold standard for high-performance and low-latency storage. Making NVMe available throughout a network and accessing the data remotely is becoming increasingly popular over the still prevalent iSCSI protocol. The benefits are evident wherever NVMe-oF is deployed.

The storage solution by simplyblock is designed around the idea that NVMe is the better way to access your data. Built from the ground up to support NVMe throughout the stack, it combines NVMe solid-state drives into a massive storage pool. It creates logical volumes, with data spread across all connected storage devices and simplyblock cluster nodes. Simplyblock provides these logical volumes as NVMe over TCP devices, which are directly accessible from Linux and Windows. Additional features such as copy-on-write clones, thin provisioning, compression, encryption, and more are included.

If you want to learn more about simplyblock, read our feature deep dive. If you want to test it out, get started right away.

The post What is NVMe Storage? appeared first on simplyblock.

9 Best Open Source Tools for Apache Kafka https://www.simplyblock.io/blog/best-open-source-tools-apache-kafka/ Mon, 23 Oct 2023 13:40:00 +0000 https://www.simplyblock.io/?p=3408 What is Apache Kafka? Apache Kafka has become the backbone of many modern data pipelines, offering distributed event streaming for building real-time data applications. As Kafka’s adoption continues to grow, the ecosystem around it has flourished with open-source tools that enhance its usability, performance, and management. These tools are indispensable for ensuring that Kafka clusters […]

The post 9 Best Open Source Tools for Apache Kafka appeared first on simplyblock.

What is Apache Kafka?

Apache Kafka has become the backbone of many modern data pipelines, offering distributed event streaming for building real-time data applications. As Kafka’s adoption continues to grow, the ecosystem around it has flourished with open-source tools that enhance its usability, performance, and management. These tools are indispensable for ensuring that Kafka clusters operate efficiently, securely, and reliably.

What are the best open-source tools for your Apache Kafka setup?

In this post, we will explore nine essential open-source tools that can help you manage, monitor, and optimize your Kafka environment.

1. Kafka Manager

Kafka Manager, developed by Yahoo, is an open-source web-based tool that simplifies the management of Apache Kafka clusters. It allows administrators to monitor broker health, topic partitions, and consumer groups. Kafka Manager makes it easy to manage Kafka brokers, rebalance partitions, and perform administrative tasks with minimal effort.

2. Confluent Control Center

Although part of Confluent’s paid offering, the Confluent Control Center provides a free version for managing Kafka clusters. It offers a rich user interface to monitor cluster health, topic throughput, and consumer lag, while ensuring compliance with security policies. For small-scale Kafka deployments, the free version is an ideal tool for keeping your Kafka environment running smoothly.

3. Prometheus & Grafana

Prometheus is a popular monitoring and alerting toolkit that pairs with Grafana for visualizing metrics. By integrating Prometheus with Kafka, you can gather essential metrics such as broker performance, message throughput, and partition states. Grafana provides real-time dashboards that help you visualize Kafka’s operational health and detect performance bottlenecks early.

4. Kafdrop

Kafdrop is an open-source UI for exploring Kafka topics, consumers, and brokers. It provides a visual representation of topic partitions, offsets, and consumer lag, making it easier to manage and troubleshoot your Kafka environment. Kafdrop’s simplicity makes it an excellent tool for developers and administrators new to Kafka.

5. MirrorMaker 2

MirrorMaker 2 is an open-source Kafka tool designed for replicating data across multiple Kafka clusters. It supports geo-replication and allows organizations to build disaster recovery strategies for their Kafka infrastructure. MirrorMaker 2’s flexible architecture enables both active-active and active-passive replication scenarios, ensuring high availability and fault tolerance.

6. Kafka Connect

Kafka Connect is an integral part of Kafka’s ecosystem, providing a scalable and reliable way to integrate Kafka with other systems. It offers numerous open-source connectors for databases, file systems, and other data platforms. With Kafka Connect, you can easily set up real-time data pipelines without needing custom code, making it easier to integrate with various systems.

7. Schema Registry

Confluent’s Schema Registry is a must-have for managing data formats in Kafka. It ensures that producers and consumers adhere to a predefined schema, preventing data inconsistencies. By enforcing schema validation at runtime, the Schema Registry helps avoid breaking changes in your Kafka applications while ensuring compatibility across systems.

8. Burrow

Burrow is an open-source tool developed by LinkedIn for monitoring Kafka consumer lag. It provides a detailed view of how much a consumer lags behind the latest message in a Kafka topic. Burrow’s ability to monitor consumer offsets ensures that you are alerted when consumers are falling behind, allowing for timely intervention to prevent data loss or delays.
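
The metric Burrow watches boils down to a simple per-partition difference: how far the consumer group’s committed offset trails the partition’s log-end offset. A minimal sketch of that calculation (independent of Burrow’s own implementation):

```python
# Consumer lag per partition = log-end offset - committed offset.
def consumer_lag(log_end_offsets, committed_offsets):
    """Both arguments map partition id -> offset; returns partition id -> lag."""
    return {
        partition: max(log_end_offsets[partition] - committed_offsets.get(partition, 0), 0)
        for partition in log_end_offsets
    }

log_end = {0: 1_500, 1: 1_480, 2: 1_520}       # latest offsets in the topic
committed = {0: 1_500, 1: 1_100, 2: 1_515}     # offsets the consumer group has committed
print(consumer_lag(log_end, committed))        # {0: 0, 1: 380, 2: 5} -> partition 1 is falling behind
```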

9. Kafka Streams

Kafka Streams is a lightweight, client-side library for building real-time streaming applications on top of Kafka. It enables the processing of data within Kafka topics using event-driven architectures. With Kafka Streams, you can build complex event processing pipelines directly inside Kafka without the need for external systems, making it an essential tool for real-time analytics and data transformation.

Apache Kafka

Why Choose simplyblock for Apache Kafka?

Apache Kafka’s distributed architecture requires careful management of brokers, topics, and partitions to maintain optimal performance and reliability. This is where simplyblock’s intelligent orchestration creates unique value:

  • Intelligent Broker Management: Simplyblock implements sophisticated broker optimization strategies for Kafka clusters. The platform manages partition leadership distribution, handles broker scaling and rebalancing, and optimizes producer/consumer configurations for maximum throughput. It automatically tunes critical parameters like batch sizes, compression settings, and replication factors based on workload patterns and resource utilization.
  • Performance-Optimized Message Handling: Simplyblock manages the complex aspects of Kafka’s message delivery system by implementing intelligent partition assignment strategies and optimizing consumer group configurations. The platform handles log segment management, maintains optimal retention policies, and provides automated cleanup processes while ensuring zero-copy message delivery and efficient disk space utilization across the cluster.
  • Enterprise-Grade Kafka Operations: Through Kubernetes integration, simplyblock automates critical Kafka operational requirements. This includes managing ZooKeeper ensemble coordination, implementing sophisticated fault tolerance mechanisms, and handling topic configurations at scale. The platform provides comprehensive monitoring of key metrics like producer/consumer lag, partition health, and broker performance while maintaining security configurations and access controls across the cluster.

How to Optimize Apache Kafka with Open-source Tools

This guide explored nine essential open-source tools for Apache Kafka, from Kafka Manager’s cluster management to Kafka Streams’ real-time processing capabilities. While these tools excel at different aspects – MirrorMaker 2 for replication, Schema Registry for data consistency, and Burrow for consumer monitoring – proper implementation is crucial. Tools like Prometheus and Grafana enable comprehensive monitoring, while Kafka Connect simplifies data integration. Each tool addresses specific operational needs in the Kafka ecosystem.

If you’re looking to further streamline your Apache Kafka operations, simplyblock offers comprehensive solutions that integrate seamlessly with these tools, helping you get the most out of your Kafka environment.

Ready to optimize your Kafka message streaming? Contact simplyblock today to discover how we can help you enhance your data streaming, performance, and scalability.

The post 9 Best Open Source Tools for Apache Kafka appeared first on simplyblock.
