Encryption At Rest: A Comprehensive Guide to DARE
https://www.simplyblock.io/blog/encryption-at-rest-dare/ | Tue, 17 Dec 2024

TLDR: Data At Rest Encryption (DARE) is the process of encrypting data when stored on a storage medium. The encryption transforms the readable data (plaintext) into an encoded format (ciphertext) that can only be decrypted with knowledge about the correct encryption key.

Today, data is the “new gold” for companies. Data collection happens everywhere and at any time. The global amount of data collected is projected to reach 181 zettabytes (that is, 181 billion terabytes) by the end of 2025, a whopping 23.13% increase over 2024.

That means data protection is becoming increasingly important. Hence, data security has become paramount for organizations of all sizes. One key aspect of data security protocols is data-at-rest encryption, which provides crucial protection for stored data.


Understanding Data-in-Use, Data-in-Transit, and Data-at-Rest

Before we go deep into DARE, let’s quickly discuss the three states of data within a computing system and infrastructure.

Figure 1: The three states of data in encryption

To start, any type of data is created inside an application. While the application holds onto it, it is considered data in use. That means data in use describes information actively being processed, read, or modified by applications or users.

For example, imagine you open a spreadsheet to edit its contents or a database to process a query. This data is considered “in use.” This state is often the most vulnerable because the data usually must be in an unencrypted form for processing. However, technologies like confidential computing use encrypted memory, allowing even this data to be processed while staying protected.

Next, data in transit describes any information moving between locations, whether across the internet, within a private network, or between memory and processors. Email messages being sent, files being downloaded, or database query results traveling between the database server and an application are all examples of data in transit.

Last but not least, data at rest refers to any piece of information stored physically on a digital storage medium such as flash storage or a hard disk. Storage solutions like cloud storage, offline backups, and file systems count as digital storage, too, so data stored in these services is also data at rest. Think of files saved on your laptop's hard drive or photos stored in cloud storage such as Google Drive or Dropbox.

Criticality of Data Protection

For organizations, threats to their data are omnipresent. They range from unfortunate human error that deletes important information, to coordinated ransomware attacks that encrypt your data and demand a ransom, to actual data leaks.

Especially with data leaks, most people think about external hackers copying data off of the organization's network and making it public. However, this isn't the only way data is leaked. There are many examples of Amazon S3 buckets left without proper authorization, databases accessible from the outside world, or backup services being accessed by unauthorized parties.

Anyhow, organizations face increasing threats to their data security. Any data breach has consequences, which are categorized into four segments:

  1. Financial losses from regulatory fines and legal penalties
  2. Damage to brand reputation and loss of customer trust
  3. Exposure of intellectual property to competitors
  4. Compliance violations with regulations like GDPR, HIPAA, or PCI DSS

While data in transit is commonly protected through TLS (Transport Layer Security), data at rest encryption (or DARE) is often an afterthought. This is typically because the setup isn’t as easy and straightforward as it should be. It’s this “I can still do this afterward” effect: “What could possibly go wrong?”

However, data at rest is the most vulnerable of all. First, while data in use is often unprotected, it is very transient, so the chance of a leak is small; data at rest, in contrast, persists for extended periods of time, giving attackers more time to plan and execute their attacks. Second, persistent data often contains the most valuable pieces of information, such as customer records, financial data, or intellectual property. And lastly, access to persistent storage exposes large amounts of data within a single breach, making it so much more interesting to attackers.

Understanding Data At Rest Encryption (DARE)

Simply spoken, Data At Rest Encryption (DARE) transforms stored data into an unreadable format that can only be read with the appropriate decryption key. This ensures that the information isn’t readable even if unauthorized parties gain access to the storage medium.

That said, the strength of the encryption used is crucial. Many encryption algorithms once considered secure have been broken over time. Encrypting data once and assuming unauthorized parties can never access it is a mistake. Data at rest encryption is an ongoing process, potentially involving re-encrypting information when stronger, more robust encryption algorithms become available.

Available Encryption Types for Data At Rest Encryption

Figure 2: Symmetric encryption (same encryption key) vs asymmetric encryption (private and public key)

Two encryption methods are generally available today for large-scale setups. While we look forward to quantum-safe encryption algorithms, a widely adopted solution has yet to be developed. Quantum-safe means that the encryption will resist breaking attempts from quantum computers.

Anyhow, the first common type of encryption is symmetric encryption. It uses the same key for encryption and decryption. The most common symmetric encryption algorithm is AES, the Advanced Encryption Standard.

const cipherText = encrypt(plaintext, encryptionKey);
const plaintext = decrypt(cipherText, encryptionKey);
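
To make this concrete, here is a minimal, hedged sketch of symmetric encryption on the command line with OpenSSL's AES-256 mode. The file names and the key file are purely illustrative.

# Generate a random 256-bit key, hex-encoded, and keep it in a file
openssl rand -hex 32 > aes.key

# Encrypt: plaintext -> ciphertext with AES-256-CBC, deriving the working key via PBKDF2
openssl enc -aes-256-cbc -pbkdf2 -salt -in data.txt -out data.enc -pass file:./aes.key

# Decrypt: ciphertext -> plaintext, using the same key file
openssl enc -d -aes-256-cbc -pbkdf2 -in data.enc -out data.txt -pass file:./aes.key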

The second type of encryption is asymmetric encryption. Here, the encryption and decryption routines use different but related keys, generally referred to as the private key and the public key. As their names suggest, the public key can be publicly known, while the private key must be kept private. Both keys are mathematically connected and generally based on a hard-to-solve mathematical problem. Two standard algorithms are essential in this space: RSA (the Rivest–Shamir–Adleman cryptosystem) and ECDSA (the Elliptic Curve Digital Signature Algorithm, which covers signing rather than encryption). While RSA is based on the prime factorization problem, ECDSA is based on the elliptic curve discrete logarithm problem. Going into detail about those problems would fill more than one additional blog post, though.

const cipherText = encrypt(plaintext, publicKey);
const plaintext = decrypt(cipherText, privateKey);
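
For completeness, here is a similarly hedged sketch of the asymmetric flow using OpenSSL's RSA tooling; the key and file names are placeholders.

# Generate an RSA private key and derive the matching public key
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out private.pem
openssl rsa -in private.pem -pubout -out public.pem

# Encrypt a (small) file with the public key; decrypt it with the private key
openssl pkeyutl -encrypt -pubin -inkey public.pem -in secret.txt -out secret.enc
openssl pkeyutl -decrypt -inkey private.pem -in secret.enc -out secret.txt

Note that RSA can only encrypt payloads smaller than its key size, which is one more reason the combined approach described below is used for bulk data.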

Encryption in Storage

The use cases for symmetric and asymmetric encryption are different. While considered more secure, asymmetric encryption is slower and typically not used for large data volumes. Symmetric encryption, however, is fast but requires the sharing of the encryption key between the encrypting and decrypting parties, making it less secure.

To encrypt large amounts of data and get the best of both worlds, security and speed, you often see a combination of both approaches. The symmetric encryption key is encrypted with an asymmetric encryption algorithm. After decrypting the symmetric key, it is used to encrypt or decrypt the stored data.
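
The following hedged sketch shows this combined (envelope) approach with the same OpenSSL tools from the earlier examples: the bulk data is encrypted symmetrically, and only the small symmetric key is wrapped asymmetrically. All file names are illustrative.

# 1. Generate a random symmetric data key (hex-encoded)
openssl rand -hex 32 > data.key

# 2. Encrypt the large payload quickly with the symmetric data key
openssl enc -aes-256-cbc -pbkdf2 -salt -in volume.img -out volume.img.enc -pass file:./data.key

# 3. Wrap (encrypt) the small data key with the asymmetric public key
openssl pkeyutl -encrypt -pubin -inkey public.pem -in data.key -out data.key.enc

# To read the data later: unwrap the data key with the private key, then decrypt the payload
openssl pkeyutl -decrypt -inkey private.pem -in data.key.enc -out data.key
openssl enc -d -aes-256-cbc -pbkdf2 -in volume.img.enc -out volume.img -pass file:./data.key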

Simplyblock provides data-at-rest encryption directly through its Kubernetes CSI integration or via CLI and API. Additionally, simplyblock can be configured to use a different encryption key per logical volume for the highest degree of security and multi-tenant isolation.

Key Management Solutions

Next to selecting the encryption algorithm, managing and distributing the necessary keys is crucial. This is where key management solutions (KMS) come in.

Generally, there are two basic types of key management solutions: hardware and software-based.

For hardware-based solutions, you’ll typically utilize an HSM (Hardware Security Module). HSMs provide dedicated, tamper-resistant hardware to store keys, ranging from USB-attached tokens to PCIe cards and network-attached appliances. They offer the highest level of security but need to be physically attached or connected to your systems and are more expensive.

Software-based solutions offer a flexible key management alternative. Almost all cloud providers offer their own KMS systems, such as Azure Key Vault, AWS KMS, or Google Cloud KMS. Additionally, third-party key management solutions are available when setting up an on-premises or private cloud environment.
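
As a hedged illustration of how a cloud KMS typically supports the envelope pattern described above, the AWS CLI can generate a data key whose plaintext is used locally for encryption and then discarded, while only the wrapped copy is persisted. The key alias and file name below are placeholders, not values from this article.

# Ask KMS for a fresh 256-bit data key under a KMS key
aws kms generate-data-key --key-id alias/storage-demo --key-spec AES_256

# The JSON response contains:
#   Plaintext      - base64-encoded key material; use it locally to encrypt data, then discard it
#   CiphertextBlob - the wrapped (encrypted) key; store it alongside the encrypted data
# Later, only KMS can unwrap the stored key again:
aws kms decrypt --ciphertext-blob fileb://wrapped-data-key.bin --output text --query Plaintext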

When managing more than a few encrypted volumes, you should implement a key management solution. That’s why simplyblock supports KMS solutions by default.

Implementing Data At Rest Encryption (DARE)

Simply said, you should have a key management solution ready to implement data-at-rest encryption in your company. If you run in the cloud, I recommend using whatever the cloud provider offers. If you run on-premises or the cloud provider doesn’t provide one, select from the existing third-party solutions.

DARE with Linux

As the most popular server operating system, Linux has two options to choose from when you want to encrypt data. The first option works on a per-file basis, providing filesystem-level encryption.

A typical solution for filesystem-level encryption is eCryptfs, a stacked filesystem that stores encryption information in the header of each encrypted file. The benefit of eCryptfs is the ability to copy encrypted files to other systems without the need to decrypt them first. As long as the target node has the necessary encryption key in its keyring, the file can be decrypted. The alternative is EncFS, a user-space filesystem that runs without special permissions. Neither solution has seen updates in many years, though, which is why I’d generally recommend the second option: block-level encryption.

Block-level encryption transparently encrypts the whole content of a block device (meaning, hard disk, SSD, simplyblock logical volume, etc.). Data is automatically decrypted when read. While there is VeraCrypt, it is mainly a solution for home setups and laptops. For server setups, the most common way to implement block-level encryption is a combination of dm-crypt and LUKS, the Linux Unified Key Setup.

Linux Data At Rest Encryption with dm-crypt and LUKS

The fastest way to encrypt a volume with dm-crypt and LUKS is via the cryptsetup tools.

Debian / Ubuntu / Derivates:

sudo apt install cryptsetup 

RHEL / Rocky Linux / Oracle:

yum install cryptsetup-luks

Now, we need to enable encryption for our block device. In this example, we assume we already have a whole block device or a partition we want to encrypt.

Warning: The following command will delete all data from the given device or partition. Make sure you use the correct device file.

cryptsetup -y luksFormat /dev/xvda

I recommend always running cryptsetup with the -y parameter, which forces it to ask for the passphrase twice. If you misspelled it once, you’ll realize it now, not later.

Now open the encrypted device.

cryptsetup luksOpen /dev/xvda encrypted

This command will ask for the passphrase. The passphrase is not recoverable, so you better remember it.

Afterward, the device is ready to be used as /dev/mapper/encrypted. We can format and mount it.

mkfs.ext4 /dev/mapper/encrypted
mkdir /mnt/encrypted
mount /dev/mapper/encrypted /mnt/encrypted
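
When you are done with the volume, the same steps apply in reverse; a minimal sketch:

# Unmount the filesystem and close (lock) the encrypted device again
umount /mnt/encrypted
cryptsetup luksClose encrypted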

Data At Rest Encryption with Kubernetes

Kubernetes offers ephemeral and persistent storage as volumes. Those volumes can be statically or dynamically provisioned.

For pre-created and statically provisioned volumes, you can follow the above guide on encrypting block devices on Linux and make the already encrypted device available to Kubernetes.

However, Kubernetes doesn’t offer out-of-the-box support for encrypted and dynamically provisioned volumes. Encrypting persistent volumes is not in Kubernetes’s domain. Instead, it delegates this responsibility to its container storage provider, connected through the CSI (Container Storage Interface).

Note that not all CSI drivers support data-at-rest encryption, though! Simplyblock does!

Data At Rest Encryption with Simplyblock

Figure 3: Encryption stack for Linux: dm-crypt+LUKS vs simplyblock

Due to the importance of DARE, simplyblock enables you to secure your data immediately through data-at-rest encryption. Simplyblock goes above and beyond with its features and provides a fully multi-tenant DARE feature set. That said, in simplyblock, you can encrypt multiple logical volumes (virtual block devices) with the same key or one key per logical volume.

The use cases differ. One key per volume enables the highest level of security and complete isolation, even between volumes. You want this when applications or teams must be fully encapsulated from each other.

When multiple volumes share the same encryption key, you typically provide one key per customer. This isolates customers from each other and prevents data from becoming accessible to other customers on a shared infrastructure in the case of a configuration failure or similar incident.

To set up simplyblock, you can configure keys manually or utilize a key management solution. In this example, we’ll set up the keys manually. We also assume that simplyblock has already been deployed as your distributed storage platform and that the simplyblock CSI driver is available in Kubernetes.

First, let’s create our Kubernetes StorageClass for simplyblock. Deploy the YAML file via kubectl, just like any other type of resource.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
    name: encrypted-volumes
provisioner: csi.simplyblock.io
parameters:
    encryption: "True"
    csi.storage.k8s.io/fstype: ext4
    ... other parameters
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
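
Assuming the manifest above is saved as encrypted-storageclass.yaml (the file name is arbitrary), it is applied like any other resource:

kubectl apply -f encrypted-storageclass.yaml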

Second, we generate the two keys. Note down the output of the two commands.

openssl rand -hex 32   # Key 1
openssl rand -hex 32   # Key 2
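
Instead of writing the Secret manifest shown below by hand, a simple alternative is to let kubectl create it, which also takes care of the base64 encoding Kubernetes requires for Secret values; the placeholders stand for the two generated keys.

kubectl create secret generic encrypted-volume-keys \
  --from-literal=crypto_key1=<output of the first command> \
  --from-literal=crypto_key2=<output of the second command>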

Now, we can create our secret and Persistent Volume Claim (PVC).

apiVersion: v1
kind: Secret
metadata:
    name: encrypted-volume-keys
data:
    crypto_key1: 
    crypto_key2: 
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
    annotations:
        simplybk/secret-name: encrypted-volume-keys
    name: encrypted-volume-claim
spec:
    storageClassName: encrypted-volumes
    accessModes:
        - ReadWriteOnce
    resources:
        requests:
            storage: 200Gi

And we’re done. Whenever we use the persistent volume claim, Kubernetes will delegate to simplyblock and ask for the encrypted volume. If it doesn’t exist yet, simplyblock will create it automatically. Otherwise, it will just provide it to Kubernetes directly. All data written to the logical volume is fully encrypted.

Best Practices for Securing Data at Rest

Implementing robust data encryption is crucial. In the best case, data should never exist in a decrypted state.

That said, data-in-use encryption is still complicated as of this writing. However, solutions such as Edgeless Systems’ Constellation exist and make it possible using hardware memory encryption.

Data-in-transit encryption is commonly applied today via TLS. If you don’t use it yet, there is no time to waste. Low-hanging fruit first.

Data-at-rest encryption in Windows, Linux, or Kubernetes doesn’t have to be complicated. Solutions such as simplyblock enable you to secure your data with minimal effort.

However, there are a few more things to remember when implementing DARE effectively.

Data Classification

Organizations should classify their data based on sensitivity levels and regulatory requirements. This classification guides encryption strategies and key management policies. A robust data classification system includes three things:

  • Sensitive data identification: Identify sensitive data through automated discovery tools and manual review processes. For example, personally identifiable information (PII) like social security numbers should be classified as highly sensitive.
  • Classification levels: Establish clear classification levels such as Public, Internal, Confidential, and Restricted. Each level should have defined handling requirements and encryption standards.
  • Automated classification: Implement automated classification tools to scan and categorize data based on content patterns and metadata.

Access Control and Key Management

Encryption is only as strong as the key management and permission control around it. If your keys are leaked, the strongest encryption is useless.

Therefore, it is crucial to implement strong access controls and key rotation policies. Regular key rotation helps minimize the impact of potential key compromises and, I hate to say it, of employees leaving the company.

Monitoring and Auditing

Understanding potential risks early is essential. That’s why you must maintain comprehensive logs of all access to encrypted data and to the encryption keys or key management solutions. Regular audits should also be scheduled to look for suspicious activities.

In the best case, multiple teams run independent audits to prevent internal leaks through dubious employees. While it may sound harsh, there are situations in life where people take the wrong path. Not necessarily on purpose or because they want to.

Data Minimization

The most secure data is the data that isn’t stored. Hence, you should only store necessary data.

Apart from that, encrypt only what needs protection. While this sounds counterintuitive, it reduces the attack surface and the performance impact of encryption.

Data At Rest Encryption: The Essential Component of Data Management

Data-at-rest encryption (DARE) has become essential for organizations handling sensitive information. The rise in data breaches and increasingly stringent regulatory requirements make it vital to protect stored information. Additionally, with the rise of cloud computing and distributed systems, implementing DARE is more critical than ever.

Simplyblock integrates natively with Kubernetes to provide a seamless approach to implementing data-at-rest encryption in modern containerized environments. With our support for transparent encryption, your organization can secure its data without any application changes. Furthermore, simplyblock utilizes the standard NVMe over TCP protocol, which enables it to work natively with Linux. No additional drivers are required. Use a simplyblock logical volume straight from your dedicated or virtual machine, including all encryption features.

Anyhow, for organizations running Kubernetes, whether in public clouds, private clouds, or on-premises, DARE serves as a fundamental security control. By following best practices and using modern tools like Simplyblock, organizations can achieve robust data protection while maintaining system performance and usability.

But remember that DARE is just one component of a comprehensive data security strategy. It should be combined with other security controls, such as access management, network security, and security monitoring, to create a defense-in-depth approach to protecting sensitive information.

That all said, by following the guidelines and implementations detailed in this article, your organization can effectively protect its data at rest while maintaining system functionality and performance.

As threats continue to evolve, having a solid foundation in data encryption becomes increasingly crucial for maintaining data security and regulatory compliance.

What is Data At Rest Encryption?

Data At Rest Encryption (DARE), or encryption at rest, is the process of encrypting data when it is stored on a storage medium. The encryption transforms the readable data (plaintext) into an encoded format (ciphertext) that can only be decrypted with knowledge of the correct encryption key.

What is Data-At-Rest?

Data-at-rest refers to any piece of information written to physical storage media, such as flash storage or hard disks. This also includes storage solutions like cloud storage, offline backups, and file systems.

What is Data-In-Use?

Data-in-use describes any data held by a running application for as long as the application holds onto it. Hence, data-in-use is information actively processed, read, or modified by applications or users.

What is Data-In-Transit?

Data-in-transit describes any information moving between locations, such as data sent across the internet, moved within a private network, or transmitted between memory and processors.

Scale Up vs Scale Out: System Scalability Strategies
https://www.simplyblock.io/blog/scale-up-vs-scale-out/ | Wed, 11 Dec 2024

TLDR: Horizontal scalability (scale out) describes a system that scales by adding more resources through parallel systems, whereas vertical scalability (scale up) increases the amount of resources on a single system.

One of the most important questions to answer when designing an application or infrastructure is the architectural approach to system scalability. Traditionally, systems used the scale-up approach, or vertical scalability. Many modern systems, especially in the cloud-native ecosystem, use a scale-out approach, also called horizontal scalability.

Scale-Up vs Scale-Out: Which System Architecture is Right for You?

Understanding the Basics

Understanding the fundamental concepts is essential when discussing system architectures. Hence, let’s briefly overview the two approaches before exploring them in more depth.

  • With Scale Up (Vertical Scalability), you increase resources (typically CPU, memory, and storage) in the existing system to improve performance and capacity.
  • With Scale Out (Horizontal Scalability), you add additional nodes or machines to the existing infrastructure to distribute the workload across multiple systems.

Both architectural approaches have their respective advantages and disadvantages. While scale-up architectures are easier to implement, they become hard to scale beyond a certain point. On the other hand, scale-out architectures are more complex to implement but scale almost linearly if done right.

Vertical Scaling (Scale Up) Architectures: The Traditional Approach

Figure 1: Scale-up storage architecture with disks being added to the same machine

Vertical scaling, commonly known as scaling up, involves adding more resources to an existing system to increase its power or capacity.

Think of it as upgrading your personal computer. Instead of buying a second computer, you add more RAM or install a faster processor or larger storage device. In enterprise storage systems, this typically means adding more CPU cores, memory, or storage drives to an existing server. For virtual machines, it usually means increasing the resources assigned to the virtual machine on its host.

To clarify, let’s use a real-world example from the storage industry. A ZFS-based SAN (Storage Area Network) system requires a scale-up design. Or, as Jason Lohrey wrote: «However, ZFS has a significant issue – it can’t scale out. ZFS’s biggest limitation is that it is “scale-up” only.» ZFS, as awesome as it is, is limited to a single machine. That said, increasing the storage capacity always means adding larger or more disks to the existing machine. This approach maintains the simplicity of the original architecture while increasing storage capacity and potentially improving performance.

Strengths of Vertical Scaling

Today, many people see the vertical scalability approach as outdated and superfluous. That is, however, not necessarily true. Vertical scaling shines in several scenarios.

First, implementing a scale-up system is generally more straightforward since it doesn’t require changes to your application architectures or complex data distribution logic. When you scale up a transactional database like PostgreSQL or MySQL, you essentially give it more operational resources while maintaining the same operational model.

Secondly, the management overhead is lower. Tasks such as backups, monitoring, and maintenance are straightforward. This simplicity often translates to lower operational costs despite the potentially higher hardware costs.

Here is a quick overview of all the advantages:

  1. Simplicity: It’s straightforward to implement since you’re just adding resources to an existing system
  2. Lower Complexity: Less architectural overhead since you’re working with a single system
  3. Consistent Performance: Lower latency due to all resources being in one place
  4. Software Compatibility: Most traditional software is designed to run on a single system
  5. Lower Initial Costs: Often cheaper for smaller workloads due to simpler licensing and management

Weaknesses and Limitations of Scale-Up Architectures

Like anything in this world, vertical scaling architectures also have drawbacks. The most significant limitation is the so-called physical ceiling. A system is limited by the physical space in its server chassis and by the limits of its hardware architecture. You can only add as much hardware as those limitations allow. Beyond that, you need to migrate to a bigger base system.

Traditional monolithic applications often face another challenge with vertical scaling: adding more resources doesn’t always translate to linear performance improvements. For example, doubling the CPU cores might yield only a 50% performance increase due to software architecture limitations, especially resource contention.

Here is a quick overview of all the disadvantages:

  1. Hardware Limits: The physical ceiling limits how much you can scale up based on maximum hardware specifications
  2. Downtime During Upgrades: Usually requires system shutdown for hardware upgrades
  3. Cost Efficiency: High-end hardware becomes exponentially more expensive
  4. Single Point of Failure: No built-in redundancy
  5. Limited Flexibility: Cannot easily scale back down when demand decreases

When to Scale Up?

After all that, here is when you really want to go with a scale-up architecture:

  • You have traditional monolithic applications
  • You look for an easier way to optimize for performance, not capacity
  • You’re dealing with applications that aren’t designed for distributed computing
  • You need a quick solution for immediate performance issues

Horizontal Scaling (Scale Out) Architectures: The Distributed Approach

Figure 2: Scale-out storage architecture with additional nodes being added to the cluster

The fundamentally different approach is the horizontal scaling or scale-out architecture. Instead of increasing the available resources on the existing system, you add more systems to distribute the load across them. This is actually similar to adding additional workers to an assembly line rather than trying to make one worker more efficient.

Consider a distributed storage system like simplyblock or a distributed database like MongoDB. When you scale out these systems, you add more nodes to the cluster, and the workload gets distributed across all nodes. Each node handles a portion of the data and processing, allowing the system to grow almost limitlessly.

Advantages of Horizontal Scaling

Large-scale deployments and highly distributed systems are the forte of scale-out architectures. As a simple example, most modern web applications utilize load balancers. They distribute the traffic across multiple application servers. This allows us to handle millions of concurrent requests and users. Similarly, distributed storage systems like simplyblock scale to petabytes of data by adding additional storage nodes.

Secondly, another significant advantage is improved high availability and fault tolerance. In a properly designed scale-out system, if one node fails, the system continues operating. While it may degrade to a reduced service, it will not experience a complete system failure or outage.

To bring this all to a point:

  1. Near-Infinite Scalability: Can continue adding nodes as needed
  2. Better Fault Tolerance: Built-in redundancy through multiple nodes
  3. Cost Effectiveness: Can use commodity hardware
  4. Flexible Resource Allocation: Easy to scale up or down based on demand
  5. High Availability: No single point of failure

The Cost of Distribution: Weakness and Limitations of Horizontal Scalability

The primary challenge when considering scale-out architectures is complexity. Distributed systems must maintain data consistency across system boundaries, handle network communication and latency, and manage failure recovery. Multiple consensus algorithms have been developed over the years; the most commonly used ones are Raft and Paxos, but that’s a different blog post. Anyhow, this complexity typically requires more sophisticated management tools and distributed systems expertise, usually also within the team operating the system.

The second challenge is the overhead of system coordination. In a distributed system, nodes must synchronize their operations. If not done carefully, this can introduce latency and even reduce the performance of certain types of operations. Great distributed systems utilize sophisticated algorithms to prevent these issues from happening.

Here is a quick overview of the disadvantages of horizontal scaling:

  1. Increased Complexity: More moving parts to manage
  2. Data Consistency Challenges: Maintaining consistency across nodes can be complex
  3. Higher Initial Setup Costs: Requires more infrastructure and planning
  4. Software Requirements: Applications must be designed for distributed computing
  5. Network Overhead: Communication between nodes adds latency

Kubernetes: A Modern Approach to Scaling

Kubernetes has become the de facto platform for container orchestration. It comes in multiple varieties: in its vanilla form or as the basis for systems like OpenShift or Rancher. Either way, it can be used for both vertical and horizontal scaling; for scale-out services in particular, it has become almost a necessity. Let’s look at how different workloads scale in a Kubernetes environment.

Scaling Stateless Workloads

Stateless applications, like web servers or API gateways, are natural candidates for horizontal scaling in Kubernetes. The Horizontal Pod Autoscaler (HPA) provided by Kubernetes automatically adjusts the number of pods based on metrics such as CPU or RAM utilization. Custom metrics as triggers are also possible.
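
As a minimal, hedged example (the deployment name and thresholds are made up), the imperative kubectl command below creates such an autoscaler for a stateless web deployment:

# Keep between 2 and 10 replicas of the 'web' deployment, targeting 70% average CPU utilization
kubectl autoscale deployment web --min=2 --max=10 --cpu-percent=70

# Inspect the resulting HorizontalPodAutoscaler
kubectl get hpa web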

Horizontally scaling stateless applications is easy. As the name suggests, stateless applications do not maintain persistent local or shared state data. Each instance or pod is entirely independent and interchangeable. Each request to the service contains all the required information needed for processing.

That said, automatically scaling out and back in (meaning starting new instances or shutting some down) is part of the typical lifecycle and can happen at any point in time.

Scaling Stateful Workloads

Stateful workloads, like databases, require more careful consideration.

A common approach for more traditional databases like PostgreSQL or MySQL is to use a primary-replica architecture. In this design, write operations always go to the primary instance, while read operations can be distributed across all replicas.

On the other hand, MongoDB, which uses a distributed database design, can scale out more naturally by adding more shards to the cluster. Their internal cluster design uses a technique called sharding. Data is assigned to horizontally scaling partitions distributed across the cluster nodes. Shard assignment happens either automatically (based on the data) or by providing a specific shard key, enabling data affinity. Adding a shard to the cluster will increase capacity when additional scale is necessary. Data rebalancing happens automatically.
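
As a hedged illustration (the database, collection, and shard key are invented for this sketch), enabling sharding in MongoDB boils down to two administrative commands, here wrapped in mongosh calls against a mongos router:

# Enable sharding for a database, then shard a collection on a hashed key
mongosh --eval 'sh.enableSharding("appdb")'
mongosh --eval 'sh.shardCollection("appdb.messages", { userId: "hashed" })'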

Why we built Simplyblock on a Scale-Out Architecture

Figure 3: Simplyblock’s scale-out architecture with storage pooling via cluster nodes

Stateful workloads, like Postgres or MySQL, can scale out by adding additional read replicas to the cluster. However, every single instance needs storage for its own data. Hence, the need for scalable storage arises.

Simplyblock is a cloud-native and distributed storage platform built to deliver scalable performance and virtually infinite capacity for logical devices through horizontal scalability. Unlike traditional storage systems, simplyblock distributes data across all cluster nodes, multiplying the performance and capacity.

Designed with an NVMe-first architecture, simplyblock uses the NVMe over Fabrics protocol family. This extends the reach of the highly scalable NVMe protocol over network fabrics such as TCP, Fibre Channel, and others. Furthermore, it provides built-in support for multi-pathing, enabling seamless failover and load balancing.

The system uses a distributed data placement algorithm to spread data across all available cluster nodes, automatically rebalancing data when nodes are added or removed. When writing data, simplyblock splits the item into multiple, smaller chunks and distributes them. This allows for parallel access during read operations. The data distribution also provides redundancy, with parity information stored on other nodes in the cluster. This protects the data against individual disk and node failures.

Using this architecture, simplyblock provides linear capacity and performance scalability by pooling all available disks and parallelizing access. This enables simplyblock to scale from mere terabytes to multiple petabytes while maintaining performance, consistency, and durability characteristics throughout the cluster-growth process.

Building Future-Proof Infrastructure

To wrap up, when you build out a new system infrastructure or application, consider these facts:

Figure 4: Flowchart: when to scale up or scale out?
  1. Workload characteristics: CPU-intensive workloads might benefit more from vertical scaling. Distributing operations comes with its own overhead; if the operation itself doesn’t offset this overhead, you might see lower performance than with vertical scaling. On the other hand, I/O-heavy workloads might perform better with horizontal scaling. If the access patterns are highly parallelizable, a horizontal architecture will most likely outscale a vertical one.
  2. Growth patterns: Predictable, steady growth might favor scaling up, while rapid growth patterns might necessitate the flexibility of scaling out. This isn’t a hard rule, though. A carefully designed scale-out system will provide a very predictable growth pattern and latency. However, the application isn’t the only element to take into account when designing the system, as there are other components, most prominently the network and network equipment.
  3. Future-Proofing: Scaling out often requires little upfront investment in infrastructure but higher investment in development and expertise. It can, however, provide better long-term cost efficiency for large deployments. That said, buying a scale-out solution is a great idea. With a storage solution like simplyblock, for example, you can start small and add required resources whenever necessary. With traditional storage solutions, you have to go with a higher upfront cost and are limited by the physical ceiling.
  4. Operational Complexity: Scale-up architectures are typically easier to manage, while a stronger DevOps or operations team is required to handle scale-out solutions. That’s why simplyblock’s design is carefully crafted to be fully autonomous and self-healing, with as few hands-on requirements as possible.

The Answer Depends

That means there is no universal answer to whether scaling up or out is better. A consultant would say, “It depends.” Seriously, it does. It depends on your specific requirements, constraints, and goals.

Many successful organizations use a hybrid approach, scaling up individual nodes while also scaling out their overall infrastructure. The key is understanding the trade-offs and choosing the best approach to your needs while keeping future growth in mind. Hence, simplyblock provides the general scale-out architecture for infinite scalability. It also provides a way to utilize storage located in Kubernetes worker nodes as part of the storage cluster to provide the highest possible performance. At the same time, it maintains the option to spill over when local capacity is reached and the high durability and fault tolerance of a fully distributed storage system.

Remember, the best scaling strategy aligns with your business objectives while maintaining performance, reliability, and cost-effectiveness. Whether you scale up, out, or both, ensure your choice supports your long-term infrastructure goals.

Figure 5: Simple definition of scale up vs scale out.

NVMe Storage for Database Optimization: Lessons from Tech Giants
https://www.simplyblock.io/blog/nvme-database-optimization/ | Thu, 17 Oct 2024

Leveraging NVMe-based storage for databases brings a whole new set of capabilities and performance optimization opportunities. In this blog, we explore how you can adopt NVMe storage for your database workloads, with case studies from tech giants such as Pinterest and Discord.

Database Scalability Challenges in the Age of NVMe

In 2024, data-driven organizations increasingly recognize the crucial importance of adopting NVMe storage solutions to stay competitive. With NVMe adoption still below 30%, there’s significant room for growth as companies seek to optimize their database performance and storage efficiency. We’ve looked at how major tech companies have tackled database optimization and scalability challenges, often turning to self-hosted database solutions and NVMe storage.

While it’s interesting to see what Netflix or Pinterest engineers are investing their efforts into, it is also essential to ask yourself how your organization is adopting new technologies. As companies grow and their data needs expand, traditional database setups often struggle to keep up. Let’s look at some examples of how some of the major tech players have addressed these challenges.

Pinterest’s Journey to Horizontal Database Scalability with TiDB

Pinterest, which handles billions of pins and user interactions, faced significant challenges with its HBase setup as it scaled. As their business grew, HBase struggled to keep up with evolving needs, prompting a search for a more scalable database solution. They eventually decided to go with TiDB as it provided the best performance under load.

Selection Process:

  • Evaluated multiple options, including RocksDB, ShardDB, Vitess, VoltDB, Phoenix, Spanner, CosmosDB, Aurora, TiDB, YugabyteDB, and DB-X.
  • Narrowed down to TiDB, YugabyteDB, and DB-X for final testing.

Evaluation:

  • Conducted shadow traffic testing with production workloads.
  • TiDB performed well after tuning, providing sustained performance under load.

TiDB Adoption:

  • Deployed 20+ TiDB clusters in production.
  • Stores over 200+ TB of data across 400+ nodes.
  • Primarily uses TiDB 2.1 in production, with plans to migrate to 3.0.

Key Benefits:

  • Improved query performance, with 2-10x improvements in p99 latency.
  • More predictable performance with fewer spikes.
  • Reduced infrastructure costs by about 50%.
  • Enabled new product use cases due to improved database performance.

Challenges and Learnings:

  • Encountered issues like TiCDC throughput limitations and slow data movement during backups.
  • Worked closely with PingCAP to address these issues and improve the product.

Future Plans:

  • Exploring multi-region setups.
  • Considering removing Envoy as a proxy to the SQL layer for better connection control.
  • Exploring migrating to Graviton instance types for a better price-performance ratio and EBS for faster data movement (and, in turn, shorter MTTR on node failures).

Uber’s Approach to Scaling Datastores with NVMe

Uber, facing exponential growth in active users and ride volumes, needed a robust solution for their datastore “Docstore” challenges.

Hosting Environment and Limitations:

  • Initially on AWS, later migrated to hybrid cloud and on-premises infrastructure
  • Uber’s massive scale and need for customization exceeded the capabilities of managed database services

Uber’s Solution: Schemaless and MySQL with NVMe

  • Schemaless: A custom solution built on top of MySQL
  • Sharding: Implemented application-level sharding for horizontal scalability
  • Replication: Used MySQL replication for high availability
  • NVMe storage: Leveraged NVMe disks for improved I/O performance

Results:

  • Able to handle over 100 billion queries per day
  • Significantly reduced latency for read and write operations
  • Improved operational simplicity compared to Cassandra

Discord’s Storage Evolution and NVMe Adoption

Discord, facing rapid growth in user base and message volume, needed a scalable and performant storage solution.

Hosting Environment and Limitations:

  • Google Cloud Platform (GCP)
  • Discord’s specific performance requirements and need for customization led them to self-manage their database infrastructure

Discord’s storage evolution:

  1. MongoDB: Initially used for its flexibility, but faced scalability issues
  2. Cassandra: Adopted for better scalability but encountered performance and maintenance challenges
  3. ScyllaDB: Finally settled on ScyllaDB for its performance and compatibility with Cassandra

Discord also created a solution called “superdisk,” with a RAID0 array on top of the local SSDs and a RAID1 array between the Persistent Disk and the RAID0 array. This let them configure the database with a disk drive that offers low-latency reads while still benefiting from the best properties of Persistent Disks. One can think of it as a “simplyblock v0.1”.

Figure 1: Discord’s “superdisk” architecture
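
For intuition only, here is a rough sketch of how such a layered “superdisk” could be assembled on Linux with mdadm. The device names are illustrative, and this is not Discord's actual configuration.

# RAID0 stripe across two local NVMe SSDs for fast, low-latency I/O
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1

# RAID1 mirror of the stripe with a persistent (network-attached) disk marked write-mostly,
# so reads are served from the local stripe while writes also land on durable storage
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/md0 --write-mostly /dev/sdb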

Key improvements with ScyllaDB:

  • Reduced P99 latencies from 40-125ms to 15ms for read operations
  • Improved write performance, with P99 latencies dropping from 5-70ms to a consistent 5ms
  • Better resource utilization, allowing Discord to reduce their cluster size from 177 Cassandra nodes to just 72 ScyllaDB nodes

Summary of Case Studies

In the table below, we can see a summary of the key initiatives taken by these tech giants and their respective outcomes. Notably, all of the companies were self-hosting their databases (on Kubernetes or on bare-metal servers) and leveraged local SSDs (NVMe) for improved read/write performance and lower latency. At the same time, however, they all had to work around the data protection and scalability limitations of local disks. Discord, for example, uses RAID to mirror the disk, which causes significant storage overhead. Such an approach also doesn’t offer a logical management layer (i.e., “storage/disk virtualization”). In the next paragraphs, let’s explore how simplyblock adds even more performance, scalability, and resource efficiency to such setups.

Company   | Database | Hosting environment                                | Key Initiative
Pinterest | TiDB     | AWS EC2 & Kubernetes, local NVMe disk              | Improved performance & scalability
Uber      | MySQL    | Bare-metal, NVMe storage                           | Reduced read/write latency, improved scalability
Discord   | ScyllaDB | Google Cloud, local NVMe disk with RAID mirroring  | Reduced latency, improved performance and resource utilization

The Role of Intelligent Storage Optimization in NVMe-Based Systems

While these case studies demonstrate the power of NVMe and optimized database solutions, there’s still room for improvement. This is where intelligent storage optimization solutions like simplyblock are spearheading market changes.

Simplyblock vs. Local NVMe SSD: Enhancing Database Scalability

While local NVMe disks offer impressive performance, simplyblock provides several critical advantages for database scalability. Simplyblock builds a persistent layer out of local NVMe disks, which means it is not just a cache and not just ephemeral storage. Let’s explore the benefits of simplyblock over local NVMe disks:

  1. Scalability: Unlike local NVMe storage, simplyblock offers dynamic scalability, allowing storage to grow or shrink as needed. Simplyblock can scale performance and capacity beyond the local node’s disk size, significantly improving tail latency.
  2. Reliability: Data on local NVMe is lost if an instance is stopped or terminated. Simplyblock provides advanced data protection that survives instance outages.
  3. High Availability: Local NVMe loses data availability during the node outage. Simplyblock ensures storage remains fully available even if a compute instance fails.
  4. Data Protection Efficiency: Simplyblock uses erasure coding (parity information) instead of triple replication, reducing network load and improving effective-to-raw storage ratios by about 150% (for a given amount of NVMe disk, there is 150% more usable storage with simplyblock).
  5. Predictable Performance: As IOPS demand increases, local NVMe access latency rises, often causing a significant increase in tail latencies (p99 latency). Simplyblock maintains constant access latencies at scale, improving both median and p99 access latency. Simplyblock also allows for much faster writes at high IOPS because it doesn’t use the NVMe layer as a write-through cache; hence, its performance isn’t dependent on a backing persistent storage layer (e.g., S3).
  6. Maintainability: Upgrading compute instances impacts local NVMe storage. With simplyblock, compute instances can be maintained without affecting storage.
  7. Data Services: Simplyblock provides advanced data services like snapshots, cloning, resizing, and compression without significant overhead on CPU performance or access latency.
  8. Intelligent Tiering: Simplyblock automatically moves infrequently accessed data to cheaper S3 storage, a feature unavailable with local NVMe.
  9. Thin Provisioning: This allows for more efficient use of storage resources, reducing overprovisioning common in cloud environments.
  10. Multi-attach Capability: Simplyblock enables multiple nodes to access the same volume, which is useful for high-availability setups without data duplication. Additionally, multi-attach can decrease the complexity of volume management and data synchronization.

Technical Deep Dive: Simplyblock’s Architecture

Simplyblock’s architecture is designed to maximize the benefits of NVMe while addressing common cloud storage challenges:

  1. NVMe-oF (NVMe over Fabrics) Interface: Exposes storage as NVMe volumes, allowing for seamless integration with existing systems while providing the low-latency benefits of NVMe.
  2. Distributed Data Plane: Uses a statistical placement algorithm to distribute data across nodes, balancing performance and reliability.
  3. Logical Volume Management: Supports thin provisioning, instant resizing, and copy-on-write clones, providing flexibility for database operations.
  4. Asynchronous Replication: Utilizes a block-storage-level write-ahead log (WAL) that’s asynchronously replicated to object storage, enabling disaster recovery with near-zero RPO (Recovery Point Objective).
  5. CSI Driver: Provides seamless integration with Kubernetes, allowing for dynamic provisioning and lifecycle management of volumes.

Below is a short overview of simplyblock’s high-level architecture in the context of PostgreSQL, MySQL, or Redis instances hosted in Kubernetes. Simplyblock creates a clustered shared pool out of local NVMe storage attached to Kubernetes compute worker nodes (storage is persistent, protected by erasure coding), serving database instances with the performance of local disk but with an option to scale out into other nodes (which can be either other compute nodes or separate, disaggregated, storage nodes). Further, the “colder” data is tiered into cheaper storage pools, such as HDD pools or object storage.

Figure 2: Simplified simplyblock architecture

Applying Simplyblock to Real-World Scenarios

Let’s explore how simplyblock could enhance the setups of the companies we’ve discussed:

Pinterest and TiDB with simplyblock

While TiDB solved Pinterest’s scalability issues, and they are exploring Graviton instances and EBS for a better price-performance ratio and faster data movement, simplyblock could potentially offer additional benefits:

  1. Price/Performance Enhancement: Simplyblock’s storage orchestration could complement Pinterest’s move to Graviton instances, potentially amplifying the price-performance benefits. By intelligently managing storage across different tiers (including EBS and local NVMe), simplyblock could help optimize storage costs while maintaining or even improving performance.
  2. MTTR Improvement & Faster Data Movements: In line with Pinterest’s goal of faster data movement and reduced Mean Time To Recovery (MTTR), simplyblock’s advanced data management capabilities could further accelerate these processes. Its efficient data protection with erasure coding and multi-attach capabilities helps with smooth failovers or node failures without performance degradation. If a node fails, simplyblock can quickly and autonomously rebuild the data on another node using parity information provided by erasure coding, eliminating downtime.
  3. Better Scalability through Disaggregation: Simplyblock’s architecture allows for the disaggregation of storage and compute, which aligns well with Pinterest’s exploration of different instance types and storage options. This separation would provide Pinterest with greater flexibility in scaling their storage and compute resources independently, potentially leading to more efficient resource utilization and easier capacity planning.
Figure 3: Simplyblock’s multi-attach functionality visualized

Uber’s Schemaless

While Uber’s custom Schemaless solution on MySQL with NVMe storage is highly optimized, simplyblock could still offer benefits:

  1. Unified Storage Interface: Simplyblock could provide a consistent interface across Uber’s diverse storage needs, simplifying operations.
  2. Intelligent Data Placement: For Uber’s time-series data (like ride information), simplyblock’s tiering could automatically optimize data placement based on age and access patterns.
  3. Enhanced Disaster Recovery: Simplyblock’s asynchronous replication to S3 could complement Uber’s existing replication strategies, potentially improving RPO.

Discord and ScyllaDB

Discord’s move to ScyllaDB already provided significant performance improvements, but simplyblock could further enhance their setup:

  1. NVMe Resource Pooling: By pooling NVMe resources across nodes, simplyblock would allow Discord to further reduce their node count while maintaining performance.
  2. Cost-Efficient Scaling: For Discord’s rapidly growing data needs, simplyblock’s intelligent tiering could help manage costs as data volumes expand.
  3. Simplified Cloning for Testing: Simplyblock’s instant cloning feature could be valuable for Discord’s development and testing processes. It allows for quick replication of production data without additional storage overhead.

What’s next in the NVMe Storage Landscape?

The case studies from Pinterest, Uber, and Discord highlight the importance of continuous innovation in database and storage technologies. These companies have pushed beyond the limitations of managed services like Amazon RDS to create custom, high-performance solutions often built on NVMe storage.

However, the introduction of intelligent storage optimization solutions like simplyblock represents the next frontier in this evolution. By providing an innovative layer of abstraction over diverse storage types, implementing smart data placement strategies, and offering features like thin provisioning and instant cloning alongside tight integration with Kubernetes, simplyblock spearheads market changes in how companies approach storage optimization.

As data continues to grow exponentially and performance demands increase, the ability to intelligently manage and optimize NVMe storage will become ever more critical. Solutions that can seamlessly integrate with existing infrastructure while providing advanced features for performance, cost optimization, and disaster recovery will be key to helping companies navigate the challenges of the data-driven future.

The trend towards NVMe adoption, coupled with intelligent storage solutions like simplyblock is set to reshape the database infrastructure landscape. Companies that embrace these technologies early will be well-positioned to handle the data challenges of tomorrow, gaining a significant competitive advantage in their respective markets.

AWS Storage Optimization: Avoid EBS Over-provisioning
https://www.simplyblock.io/blog/avoid-storage-over-provisioning/ | Thu, 10 Oct 2024

“Cloud is expensive” is an often repeated phrase among IT professionals. What makes the cloud so expensive, though? One element that significantly drives cloud costs is storage over-provisioning and lack of storage optimization. Over-provisioning refers to the eager allocation of more resources than required by a specific workload at the time of allocation.

When we hear about hoarding goods, we often think of so-called preppers preparing for some type of serious event. Many people would laugh about that kind of behavior. However, it is commonplace when we are talking about cloud environments.

In the past, most workloads used their own servers, often barely utilizing any of the machines. That’s why we invented virtualization techniques, first with virtual machines and later with containers. We didn’t like the idea of wasting resources and money.

That didn’t stop when workloads were moved to the cloud, or did it?

What is Over-Provisioning?

As briefly mentioned above, over-provisioning refers to allocating more resources than are needed for a given workload or application. That means we actively request more resources than we need, and we know it. Over-provisioning typically occurs across various infrastructure components: CPU, memory, and storage. Let’s look at some basic examples to understand what that means:

  1. CPU Over-Provisioning: Imagine running a web server on a virtual machine instance (e.g., Amazon EC2) with 16 vCPUs. At the same time, your application only requires four vCPUs for the current load and number of customers. You expect to increase the number of customers in the next year or so. Until then, the excess computing power sits idle, wasting resources and money.
  2. Memory Over-Provisioning: Consider a database server provisioned with 64GB of RAM when the database service commonly only uses 16GB, except during peak loads. The unused memory is essentially paid for but unutilized most of the time.
  3. Storage Over-Provisioning: Consider a Kubernetes cluster with ten instances of the same stateful service (like a database), each requesting a block storage volume (e.g., Amazon EBS) of 100 GB that will only slowly fill up over the course of a year. If each container currently uses about 20 GB, we have over-provisioned 800 GB in total, and we have to pay for it (a quick cost sketch follows this list).
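To put a rough number on the storage example above, here is a back-of-the-envelope sketch. It assumes a gp3 list price of about $0.08 per GB-month (us-east-1 at the time of writing); your region and volume type may differ.

```python
# Back-of-the-envelope cost of the EBS over-provisioning example above.
# Assumption: gp3 list price of ~0.08 USD per GB-month (us-east-1); adjust for your region.
GP3_PRICE_PER_GB_MONTH = 0.08

volumes = 10          # stateful service replicas, one EBS volume each
provisioned_gb = 100  # requested per volume
used_gb = 20          # actually written per volume

provisioned_total = volumes * provisioned_gb   # 1,000 GB paid for
used_total = volumes * used_gb                 # 200 GB actually needed
wasted_gb = provisioned_total - used_total     # 800 GB sitting idle

print(f"Provisioned: {provisioned_total} GB, used: {used_total} GB, idle: {wasted_gb} GB")
print(f"Monthly cost of idle capacity: ${wasted_gb * GP3_PRICE_PER_GB_MONTH:.2f}")
print(f"Yearly cost of idle capacity:  ${wasted_gb * GP3_PRICE_PER_GB_MONTH * 12:.2f}")
```

Even this small example leaves several hundred dollars per year on the table; multiply that by the number of clusters and environments you run.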

Why is EBS Over-Provisioning an Issue?

EBS over-provisioning isn’t an issue by itself, and we lived (almost) happily ever after with it for decades. While over-provisioning seems to be the safe bet to ensure performance and predictability, it comes with a set of drawbacks.

  1. High initial cost: When you overprovision, you pay for resources you don’t use from day one. This can significantly inflate your cloud bill, especially at scale.
  2. Resource waste: Unused resources aren’t just a financial burden. They also waste valuable computing power that could be better allocated elsewhere. Not to mention the environmental effects of over-provisioning, think CO2 footprint.
  3. Hard to estimate upfront: Predicting exact resource needs is challenging, especially for new applications or those with variable workloads. This uncertainty often leads us to very conservative (and excessive) provisioning decisions.
  4. Limitations when resizing: While cloud providers like AWS allow resource resizing, limitations exist. Amazon EBS volumes can only be modified every 6 hours, making it difficult to adjust to changing needs quickly.

On top of those issues, which are all financial impact related, over-provisioning can also directly or indirectly contribute to topics such as:

  • Reduced budget for innovation
  • Complex and hard-to-manage infrastructures
  • Potential compliance issues in regulated industries
  • Decreased infrastructure efficiency

The Solution is Pay-By-Use

Pay-by-use refers to the concept that customers are billed only for what they actually use. That said, using our earlier example of a 100 GB Amazon EBS volume where only 20 GB is used, we would only be charged for those 20 GB. As a customer, I’d love the pay-by-use option since it makes it easy and relieves me of the burden of the initial estimate.

So why isn’t everyone just offering pay-by-use models?

The Complexity of Pay-By-Use

Many organizations dream of an actual pay-by-use model, where they only pay for the exact resources consumed. This improves the financial impact, optimizes the overall resource utilization, and brings environmental benefits. However, implementing this is challenging for several reasons:

  1. Technical Complexity: Building a system that can accurately measure and bill for precise resource usage in real time is technically complex.
  2. Performance Concerns: Constant scaling and de-scaling to match exact usage can potentially impact performance and introduce latency.
  3. Unpredictable Costs: While pay-by-use can save money, it can also make costs less predictable, making budgeting challenging.
  4. Legacy Systems: Many existing applications aren’t designed to work with dynamically allocated resources.
  5. Cloud Provider Greed: While this is probably exaggerated, there is still some truth to it. Cloud providers overcommit CPU, RAM, and network bandwidth, which is why they offer both machine types with dedicated resources and ones without (where they tend to overcommit resources, and you might encounter the “noisy neighbor” problem). On the storage side, they thinly provision your storage out of a large, ever-growing storage pool.

Over-Provisioning in AWS

Like most cloud providers, AWS has several components where over-provisioning is typical. The most obvious one is resources around Amazon EC2. However, since many other services are built upon EC2 machines (like Kubernetes clusters), this is the most common entry point to look into optimization.

Amazon EC2 (CPU and Memory)

When looking at Amazon EC2 instances to save some hard-earned money, AWS offers some tools by itself:

  • Use AWS CloudWatch to monitor CPU and memory utilization (a minimal query sketch follows this list).
  • Implement auto-scaling groups to adjust instance counts dynamically based on demand.
  • Consider using EC2 Auto Scaling with predictive scaling to anticipate future needs.
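As referenced in the CloudWatch item above, a first pass at spotting over-provisioned instances only takes a few lines of boto3. This is a minimal sketch, assuming AWS credentials are already configured; the 20% threshold and 14-day window are arbitrary example values, and memory metrics would additionally require the CloudWatch agent.

```python
from datetime import datetime, timedelta, timezone
import boto3

# Minimal sketch: flag running EC2 instances whose average CPU utilization
# stayed below a threshold over the last 14 days. The threshold is an example
# value, not a recommendation.
THRESHOLD_PCT = 20
LOOKBACK_DAYS = 14

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(days=LOOKBACK_DAYS)

reservations = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]

for reservation in reservations:
    for instance in reservation["Instances"]:
        instance_id = instance["InstanceId"]
        datapoints = cloudwatch.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
            StartTime=start,
            EndTime=end,
            Period=86400,          # one datapoint per day
            Statistics=["Average"],
        )["Datapoints"]
        if not datapoints:
            continue
        avg_cpu = sum(p["Average"] for p in datapoints) / len(datapoints)
        if avg_cpu < THRESHOLD_PCT:
            print(f"{instance_id} ({instance['InstanceType']}): "
                  f"avg CPU {avg_cpu:.1f}% over {LOOKBACK_DAYS} days - candidate for downsizing")
```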

In addition, external tools such as AutoSpotting or Cast.ai can automatically find over-provisioned VMs and resize them accordingly, or exchange them for so-called spot instances. Spot instances are VM instances that are much cheaper but can be reclaimed by AWS with only a two-minute notice. The idea is that AWS offers these instances at a reduced rate when they can’t be sold at their regular price. That said, if the capacity is needed elsewhere, AWS will take the instances back. Still, spot instances are a great way to save some money.

Last but not least, companies like DoIT work as resellers for hyperscalers like AWS. They have custom rates and offer additional features like bursting beyond your typical requirements. This is a great way to get cheaper VMs and extra services. It’s worth a look.

Amazon EBS Storage Over-Provisioning

One of the most common causes of over-provisioning happens with block storage volumes, such as Amazon EBS. With EBS, the over-provisioning is normally driven by:

  • Pre-allocated Capacity: EBS volumes are provisioned with a fixed size, and you pay for the entire allocated space regardless of usage (a quick inventory sketch follows this list).
  • Modification Limitations: EBS volumes can only be modified every 6 hours, making rapid adjustments difficult.
  • Performance Considerations: A common belief is that larger volumes perform better, so people feel incentivized to over-provision.
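For the pre-allocated capacity point above, a quick inventory shows how much EBS capacity is provisioned in a region. This is a minimal sketch assuming configured boto3 credentials; the cost estimate uses an assumed gp3 rate, and keep in mind that actual usage inside a volume is invisible to the EBS API, since AWS only knows the provisioned size.

```python
import boto3

# Minimal sketch: sum up provisioned EBS capacity per volume type in one region
# and estimate its monthly cost. Price assumption: ~0.08 USD per GB-month (gp3,
# us-east-1); other volume types and regions differ.
PRICE_PER_GB_MONTH = 0.08

ec2 = boto3.client("ec2")
totals = {}
paginator = ec2.get_paginator("describe_volumes")
for page in paginator.paginate():
    for volume in page["Volumes"]:
        totals[volume["VolumeType"]] = totals.get(volume["VolumeType"], 0) + volume["Size"]

for volume_type, size_gb in sorted(totals.items()):
    print(f"{volume_type}: {size_gb} GiB provisioned, "
          f"~${size_gb * PRICE_PER_GB_MONTH:.2f}/month at the assumed gp3 rate")
```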

One interesting note, though, is that while customers have to pay for the total allocated size, AWS likely uses technologies such as thin provisioning internally, allowing it to oversell its actual physical storage. Imagine this overselling margin being on your end instead of the hyperscaler’s.

How Simplyblock Can Help with EBS Storage Over-Provisioning

Simplyblock offers an innovative storage optimization platform to address storage over-provisioning challenges. By providing you with a comprehensive set of technologies, simplyblock enables several features that significantly optimize storage usage and costs.

Thin Provisioning

Thin provisioning is a technique where a storage entity of any capacity is created without pre-allocating the requested capacity. A thinly provisioned volume only requires as much physical storage as its data consumes at any point in time. This enables overcommitting the underlying storage: ten volumes provisioned with 1 TB each, but each using only 100 GB, require just about 1 TB of physical storage in total, meaning roughly 9 TB don’t have to be bought or paid for until they are actually used.

Simplyblock’s thin provisioning technology allows you to create logical volumes of any size without pre-allocating the total capacity. You only consume (and pay for) the actual space your data uses. This eliminates the need to over-provision “just in case” and allows for more efficient use of your storage resources. When your actual storage requirements increase, simplyblock automatically allocates additional underlying storage to keep up with your demands.

Two thinly provisioned devices and the underlying physical storage
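The accounting behind thin provisioning is easy to illustrate. The following toy sketch (not simplyblock’s actual implementation) tracks logical versus physically backed capacity; physical blocks are only allocated when data is written, mirroring the ten-volume example above.

```python
# Toy illustration of thin provisioning accounting (not simplyblock's implementation):
# logical capacity is promised up front, physical blocks are allocated only on write.
BLOCK_SIZE_GB = 1

class ThinVolume:
    def __init__(self, logical_capacity_gb):
        self.logical_capacity_gb = logical_capacity_gb
        self.allocated_blocks = set()   # physically backed 1 GB blocks

    def write(self, offset_gb):
        if offset_gb >= self.logical_capacity_gb:
            raise ValueError("write beyond provisioned capacity")
        self.allocated_blocks.add(offset_gb // BLOCK_SIZE_GB)

    @property
    def physical_gb(self):
        return len(self.allocated_blocks) * BLOCK_SIZE_GB

# Ten volumes provisioned with 1 TB each, but only 100 GB written per volume.
volumes = [ThinVolume(1024) for _ in range(10)]
for vol in volumes:
    for gb in range(100):
        vol.write(gb)

logical = sum(v.logical_capacity_gb for v in volumes)
physical = sum(v.physical_gb for v in volumes)
print(f"Provisioned: {logical} GB, physically backed: {physical} GB, saved: {logical - physical} GB")
```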

Copy-on-Write, Snapshots, and Instant Clones

Simplyblock’s storage technology is a fully copy-on-write-enabled system. Copy-on-write is a technique also known as shadowing. Instead of copying data right away when multiple copies are created, copy-on-write only creates a second physical instance when the data is actually changed. The old version stays around as long as other copies still refer to it, while only the changed copy points to the new data. Copy-on-write enables the instant creation of volume snapshots and clones without duplicating data. This is particularly useful for development and testing environments, where multiple copies of large datasets are often needed. Instead of provisioning full copies of production data, you can create instant, space-efficient clones, which is specifically attractive for databases, AI / ML workloads, or analytics data.

Copy-on-write technique explained with two files referring to shared, similar parts and modified, unshared parts
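The behavior described above can be illustrated with a toy block map. This is a conceptual sketch, not simplyblock’s on-disk format: a clone initially shares all blocks with its parent and only gets private copies of blocks that are modified.

```python
# Conceptual copy-on-write sketch (not simplyblock's on-disk format):
# a clone shares its parent's blocks until a block is written, at which point
# only that block is copied and re-pointed.
class CowVolume:
    def __init__(self, blocks):
        # block index -> data; shared references model shared physical blocks
        self.blocks = blocks

    def clone(self):
        # Instant clone: copy only the block map, not the data itself.
        return CowVolume(dict(self.blocks))

    def write(self, index, data):
        # Writing replaces the reference for this block only; all other blocks
        # remain shared with the parent / other clones.
        self.blocks[index] = data

parent = CowVolume({0: b"alpha", 1: b"beta", 2: b"gamma"})
clone = parent.clone()
clone.write(1, b"BETA-modified")

print(parent.blocks[1])  # b'beta'          -> parent unchanged
print(clone.blocks[1])   # b'BETA-modified' -> only block 1 diverged
shared = sum(1 for i in parent.blocks if parent.blocks[i] is clone.blocks[i])
print(f"Blocks still shared: {shared} of {len(parent.blocks)}")
```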

Transparent Tiering

With most data sets, parts of the data are typically assumed to be “cold,” meaning that the data is very infrequently used, if ever. This is true for any data that needs to be kept available for regulatory reasons or historical manufacturing data (such as process information for car part manufacturing). This data can be moved to slower but much less expensive storage options. Simplyblock automatically moves infrequently accessed data to cheaper storage tiers such as object storage (e.g., Amazon S3 or MinIO) and non-NVMe SSD or HDD pools while keeping hot data on high-performance storage. This tiering is completely transparent to your applications, database, or other workload and helps optimize costs without sacrificing performance. With tiering integrated into the storage layer, application and system developers can focus on business logic rather than storage requirements.

Automatic tiering, transparently moving cold data parts to slower but cheaper storage
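Conceptually, a tiering decision boils down to a placement policy based on access recency or frequency. The following simplified sketch demotes chunks that have not been read for a configurable number of days and promotes cold data that becomes hot again; the thresholds and tier names are illustrative assumptions, not simplyblock’s actual policy.

```python
from datetime import datetime, timedelta, timezone

# Simplified tiering-policy sketch: demote data chunks that have not been
# accessed within COLD_AFTER_DAYS from the hot NVMe tier to an object-storage
# tier. Thresholds and tier names are illustrative assumptions.
COLD_AFTER_DAYS = 30

def plan_tiering(chunks, now=None):
    """chunks: list of dicts with 'id', 'tier', and 'last_access' (datetime)."""
    now = now or datetime.now(timezone.utc)
    moves = []
    for chunk in chunks:
        idle = now - chunk["last_access"]
        if chunk["tier"] == "nvme" and idle > timedelta(days=COLD_AFTER_DAYS):
            moves.append((chunk["id"], "nvme", "s3"))
        elif chunk["tier"] == "s3" and idle < timedelta(days=1):
            # Recently re-accessed cold data gets promoted back to the hot tier.
            moves.append((chunk["id"], "s3", "nvme"))
    return moves

now = datetime.now(timezone.utc)
chunks = [
    {"id": "chunk-001", "tier": "nvme", "last_access": now - timedelta(days=90)},
    {"id": "chunk-002", "tier": "nvme", "last_access": now - timedelta(hours=2)},
    {"id": "chunk-003", "tier": "s3",   "last_access": now - timedelta(hours=1)},
]
for chunk_id, src, dst in plan_tiering(chunks, now):
    print(f"move {chunk_id}: {src} -> {dst}")
```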

Storage Pooling

Storage pooling is a technique in which multiple storage devices or services are used in conjunction. It enables technologies like thin provisioning and data tiering, which were already mentioned above.

By pooling multiple cloud block storage volumes (e.g., Amazon EBS volumes), simplyblock can provide better performance and more flexible scaling. This pooling allows for more granular storage growth, avoiding the need to provision large EBS volumes upfront.

Additionally, simplyblock can leverage directly attached fast SSD storage (NVMe), also called local instance storage, and make it part of the storage pool or use it as an even faster workload-local data cache.

NVMe over Fabrics

NVMe over Fabrics is an industry-standard for remotely attaching block devices to clients. It can be assumed to be the successor of iSCSI and enables the full feature set and performance of NVMe-based SSD storage. Simplyblock uses NVMe over Fabrics (specifically the NVMe/TCP version) to provide high-performance, low-latency access to storage.

This enables the consolidation of multiple storage locations into a centralized one, enabling even greater savings on storage capacity and compute power.
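For illustration, attaching an NVMe/TCP-exported volume on the initiator side typically goes through the standard nvme-cli tooling. The following sketch shows what such an attach could look like; the target address, port, and subsystem NQN are placeholder values, and the exact parameters for a simplyblock cluster come from its control plane.

```python
import subprocess

# Hedged sketch of attaching an NVMe/TCP namespace with the standard nvme-cli
# tool. The address, service port (4420 is the conventional NVMe/TCP port),
# and subsystem NQN below are placeholders; real values come from the storage
# cluster's control plane.
TARGET_ADDR = "10.0.0.10"                          # placeholder address
TARGET_PORT = "4420"                               # conventional NVMe/TCP port
SUBSYS_NQN = "nqn.2024-01.io.example:demo-volume"  # placeholder NQN

subprocess.run(
    ["nvme", "connect",
     "-t", "tcp",        # transport
     "-a", TARGET_ADDR,  # target address
     "-s", TARGET_PORT,  # target service id (port)
     "-n", SUBSYS_NQN],  # subsystem NQN to attach
    check=True,
)

# The new block device (e.g. /dev/nvme1n1) then shows up like any local NVMe disk.
subprocess.run(["nvme", "list"], check=True)
```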

Pay-By-Use Model Enablement

As stated above, pay-by-use models are a real business advantage, specifically for storage. Implementing a pay-by-use model in the cloud requires taking charge of how storage works. This is complex and requires a lot of engineering effort. This is where simplyblock helps bring a competitive advantage to your doorstep.

With its underlying technology and features such as thin provisioning, simplyblock makes it easier for managed service providers to implement a true pay-by-use model for their customers, giving you the competitive advantage at no extra cost or development effort, all fully transparent to your database or application workload.

AWS Storage Optimization with Simplyblock

By addressing the core issues of EBS over-provisioning, simplyblock helps reduce costs and improves overall storage efficiency and flexibility. For businesses struggling with storage over-provisioning in AWS, simplyblock offers a compelling solution to optimize their infrastructure and better align costs with actual usage.

In conclusion, while over-provisioning remains a significant challenge in AWS environments, particularly with storage, simplyblock paves the way for more efficient, cost-effective cloud storage optimization management. By combining advanced technologies with a deep understanding of cloud storage dynamics, simplyblock enables businesses to achieve the elusive goal of paying only for what they use without sacrificing performance or flexibility.

Take your competitive advantage and get started with simplyblock today.

The post AWS Storage Optimization: Avoid EBS Over-provisioning appeared first on simplyblock.

]]>
Simplyblock as alternative to Ceph: A Comprehensive Comparison https://www.simplyblock.io/blog/simplyblock-vs-ceph/ Wed, 05 Jun 2024 12:13:26 +0000 https://www.simplyblock.io/?p=326 Introduction When it comes to software-defined storage solutions, simplyblock and Ceph are two options worth considering. Let’s explore the features and differences between these two platforms to help you make an informed decision for your organization. Simplyblock: a Next-Generation Storage Solution Simplyblock is a high-performance, software-defined cluster storage solution built on NVMe and NVMe-oF technologies. […]

The post Simplyblock as alternative to Ceph: A Comprehensive Comparison appeared first on simplyblock.

]]>

Introduction

When it comes to software-defined storage solutions, simplyblock and Ceph are two options worth considering. Let’s explore the features and differences between these two platforms to help you make an informed decision for your organization.

Simplyblock: a Next-Generation Storage Solution

Simplyblock is a high-performance, software-defined cluster storage solution built on NVMe and NVMe-oF technologies. It offers block, file, and object storage interfaces and runs on standard x86 server hardware without limitations.

One of the key strengths of simplyblock is its support for multiple thin storage interfaces, including NVMe-oF and iSCSI for block storage, POSIX-compatible file storage, and an S3-compatible object store. These interfaces are provided as thin, container-based solutions that offload functionality into the storage cluster. Although the current stable release focuses on block storage, future releases will expand the supported interfaces.

Additionally, simplyblock is designed to seamlessly operate in cloud-native environments, enabling hybrid and multi-cloud scenarios. It has already been integrated with Amazon EC2 and EKS, as well as other platforms like OpenStack, OpenNebula, Proxmox, and Kubernetes (via CSI).

The resource efficiency of simplyblock is outstanding, as evidenced by its high IOPS/CPU-core ratio, efficient raw to effective storage rates, and low CPU consumption for compression and deduplication. This efficiency enables the cluster to achieve ultra-high scalability and ultra-low access latency. Simplyblock can be deployed in both hyper-converged and disaggregated setups, offering flexibility to meet various infrastructure requirements.

Ceph: Unified and Feature-Rich Storage

Ceph is a highly scalable, software-defined storage solution that provides unified block, file, and object storage interfaces. It is widely used in private cloud environments, especially in combination with OpenStack, where it has gained popularity as the go-to storage solution.

Ceph utilizes its own storage driver (rbd) that is integrated into the Linux Kernel and can also be used on other platforms as a third-party driver. It enables seamless connectivity between hosts and the Ceph cluster. In addition to OpenStack, Ceph offers deep integrations with Kubernetes through a separate CSI driver, as well as other platforms.

One of the notable strengths of Ceph is its compatibility with various storage types and interfaces, including HDD, SSD, SATA, SAS, and NVMe. It can run on standard x86 server hardware, allowing for flexibility in combining different models, capacities, and vendors to meet specific storage needs.

At the core of Ceph lies the CRUSH algorithm, which is a powerful and decentralized data placement mechanism. This algorithm ensures efficient and flexible distribution of data across the cluster, contributing to Ceph’s scalability and fault tolerance capabilities.

Commonalities between Simplyblock and Ceph

Both simplyblock and Ceph offer unified storage clusters and support integration with platforms like OpenStack and Kubernetes. They provide essential data services such as snapshots, copy-on-write, disaster recovery (cross-site data mirroring), compression, encryption, and de-duplication of storage pools. Both solutions follow a similar hierarchy and offer multi-tenancy capabilities. They can be managed via CLI and provide management APIs. Additionally, they are deployable in hyper-converged or disaggregated models and operate on standard TCP/IP infrastructure, requiring neither FC nor RDMA networking.

Both simplyblock and Ceph require a management node cluster. In simplyblock, this cluster is automatically deployed as a cloud-native Docker Swarm, providing a hassle-free setup. On the other hand, Ceph offers multiple automated deployment options, including Ansible playbooks.

One of the key strengths shared by both solutions is their highly scalable and fully distributed storage cluster architecture. Data placement is policy-based, utilizing failure domains, without relying on a centralized directory. This approach ensures exceptional scalability and enables linear performance scaling, allowing clusters to grow into the multi-petabyte range. Here are some advantages of this cluster technology:

  • Manual balancing of storage utilization or performance demand across systems is unnecessary. By adding nodes, the capacity and performance of the cluster grow linearly, while the cluster algorithms optimally balance capacity and load across all nodes.
  • In the event of a device or node failure, the cluster automatically recovers to a state of full redundancy. There is no need to create RAID groups, manage hot spares, or perform manual recovery tasks.
  • The impact of a device failure on a reasonably sized cluster is minimal, as the cluster seamlessly adjusts to the removal of a device or node by rebalancing load and capacity.
  • Hardware repair, maintenance, and upgrades of devices and nodes are straightforward and safe, with no interruption to operations or significant performance degradation.

Differences between Simplyblock and Ceph:

  1. Cost/Performance and Performance Density: Simplyblock outperforms Ceph in terms of cost/performance and performance density. Ceph’s average access latencies rarely drop below 3 ms, and it delivers an average of about 2,000 IOPS per CPU core without compression, encryption, deduplication, or other data services turned on. That limits the performance density to about 100,000 IOPS per storage node on a physical server with 48 CPU cores, consuming about 500 watts of power and 1 or 2 rack units. Simplyblock consistently delivers average access latencies well below 100 microseconds and can scale up to approximately 10 million IOPS per storage node (a quick back-of-the-envelope check of these figures follows this list). This significant improvement in performance density allows for higher storage capacity and efficiency while maintaining low latency.
  2. Raw to Effective Capacity: Ceph’s default setup with three replicas results in a raw-to-effective storage ratio of 3:1. While erasure coding is possible, it increases access latency and CPU load. On the other hand, simplyblock natively supports distributed striping of data with n+1 and n+2 redundancy levels, enabling a data durability of 99.99999% and data availability of 99.9999% with a raw-to-effective storage ratio of 0.75. Additionally, simplyblock offers efficient compression and deduplication data services, achieving a raw-to-effective storage ratio of 1:2.5 (for 1 GB of raw storage, 2.5 GB of effective storage are available).
  3. Hardware and Co-Location Costs: Increasing storage capacity density while maintaining performance demand can significantly improve profitability by reducing hardware and co-location costs. Simplyblock’s optimized density of storage capacity allows for more capacity on the same hardware, leading to cost savings for cloud service providers. It also reduces the need for expensive instances with local NVMe storage in hyperscaler clouds.
  4. Host Integration: While Ceph uses its own rbd storage driver, simplyblock exposes storage via NVMe-oF (and iSCSI for compatibility). The entire simplyblock cluster appears as a single NVMe-oF subsystem, and the scheduling of primary and secondary NVMe-oF target endpoints to initiator hosts is fully automated. This avoids entry point bottlenecks into the cluster, and the open standard of NVMe-oF ensures ongoing driver enhancement and maintenance across different operating systems and virtualization platforms.
  5. Configuration: Ceph offers extensive configuration options, including replication and erasure coding with customizable schemas, multiple back-storage management options, and flexible deployment schemas. In contrast, simplyblock has limited configuration flexibility, with most decisions automated and deployment options simplified. Users can choose between single node, disaggregated, or HCI cluster deployments, allocate devices to clusters, and select cluster-level security options. Configuration options for pools and logical volumes are limited, focusing on QoS parameters and data services.
  6. Deployments: Simplyblock stands out with its native support for cloud-native environments. It seamlessly runs on AWS infrastructure, leveraging AWS resources efficiently and providing a cost-effective alternative to AWS-native EBS storage. This native integration simplifies deployment and ensures optimal performance within cloud environments.
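To make the density and capacity comparison above concrete, here is the back-of-the-envelope check referenced in point 1. It simply replays the figures quoted in this post; none of the numbers are independent measurements.

```python
# Back-of-the-envelope check of the figures quoted above (illustration only).

# Performance density (point 1): IOPS per CPU core times cores per node.
ceph_iops_per_core = 2_000
cores_per_node = 48
print(f"Ceph node estimate: {ceph_iops_per_core * cores_per_node:,} IOPS")  # ~96,000, i.e. roughly 100k

simplyblock_node_iops = 10_000_000  # figure quoted above
print(f"Simplyblock node figure: {simplyblock_node_iops:,} IOPS")

# Raw-to-effective capacity (point 2) for 100 TB of raw storage.
raw_tb = 100
ceph_effective = raw_tb / 3           # 3 replicas -> 3:1 raw-to-effective
striping_effective = raw_tb * 0.75    # n+1/n+2 striping figure quoted above
with_reduction = raw_tb * 2.5         # 1:2.5 with compression/deduplication quoted above

print(f"Ceph 3x replication:      {ceph_effective:.1f} TB effective")
print(f"Striped n+1/n+2 (quoted): {striping_effective:.1f} TB effective")
print(f"With compression/dedup:   {with_reduction:.1f} TB effective (quoted 1:2.5)")
```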

Conclusions

By considering these advantages, it becomes clear that simplyblock offers superior cost/performance, performance density, scalability, and cloud-native support compared to Ceph. With its optimized resource utilization and advanced features, simplyblock is a compelling choice for organizations and cloud builders seeking a high-performance, software-defined storage solution.

Read more about simplyblock storage as an alternative to Ceph storage on our website: https://www.simplyblock.io/ceph-alternative

The post Simplyblock as alternative to Ceph: A Comprehensive Comparison appeared first on simplyblock.

]]>
Simplyblock for AWS – Performance Test https://www.simplyblock.io/blog/simplyblock-for-aws-some-performance-test-results/ Wed, 05 Apr 2023 12:06:31 +0000 https://www.simplyblock.io/?p=332 Introduction At simplyblock.io we have recently conducted a performance test on our NVMe-oF storage solution on AWS. The results showcase remarkable performance capabilities, solidifying its position as a reliable choice for organizations and service providers with data-intensive workloads. In this article, we delve into the impressive test results and explore the benefits of simplyblock’s NVMe-oF […]

The post Simplyblock for AWS – Performance Test appeared first on simplyblock.

]]>

Introduction

At simplyblock.io we have recently conducted a performance test on our NVMe-oF storage solution on AWS. The results showcase remarkable performance capabilities, solidifying its position as a reliable choice for organizations and service providers with data-intensive workloads. In this article, we delve into the impressive test results and explore the benefits of simplyblock’s NVMe-oF storage solution.

Impressive Performance Metrics

The performance test revealed outstanding results, with simplyblock’s NVMe-oF storage solution reaching up to 200,000 IOPS on a single node with a single host. This level of input/output operations per second (IOPS) ensures efficient processing of demanding workloads such as machine learning, analytics, genomics, and database operations. This performance level is achieved by harnessing advanced NVMe-oF technology and leveraging the high-speed, low-latency capabilities of NVMe storage devices.

Linear Scalability and Flexibility

One notable feature of simplyblock’s solution is its linear scalability. By adding more nodes, organizations can further enhance their storage performance, catering to evolving business needs. This scalability empowers businesses to effortlessly handle growing workloads and ensure optimal performance without encountering bottlenecks.

Low Latency and Rapid Data Access

The performance test results also showcased an average access latency of approximately 10 microseconds, highlighting the solution’s ability to deliver data rapidly. With such low latency, critical applications can process data with minimal delay, enabling organizations to extract insights and make real-time decisions more effectively.

Data Protection and Reliability

Simplyblock prioritizes data protection and reliability in its NVMe-oF storage solution. Built with redundancy features such as RAID, the solution safeguards against hardware failures, ensuring that data remains intact and accessible even in the face of unforeseen events. This level of reliability provides peace of mind to organizations handling sensitive and critical data.

Conclusion

Simplyblock’s NVMe-oF storage solution has proven its mettle in the performance test, showcasing exceptional performance, scalability, and reliability. With up to 200,000 IOPS, low latency, and linear scalability, organizations can confidently deploy data-intensive applications and workloads without compromising on performance. Moreover, the built-in data protection features offer an additional layer of security and reliability.

As businesses continue to grapple with ever-increasing data volumes and demanding workloads, simplyblock’s NVMe-oF storage solution emerges as a powerful and efficient solution, enabling organizations and service providers to unlock the full potential of their data-driven operations.

Simplyblock can serve as an alternative to Amazon EBS storage, as it is fully compatible with the AWS platform and can be natively deployed on EC2 instances. The technology can also be deployed in other cloud environments, including OpenStack, OpenNebula, Proxmox, or VMware, serving as an alternative to storage solutions such as Ceph or vSAN.

The post Simplyblock for AWS – Performance Test appeared first on simplyblock.

]]>
Simplyblock for AWS – Whitepaper https://www.simplyblock.io/blog/simplyblock-for-aws-whitepaper/ Wed, 05 Apr 2023 12:06:04 +0000 https://www.simplyblock.io/?p=330 Simplyblock has recognized the limitations of conventional cloud storage services, specifically AWS Elastic Block Storage (EBS) , and has introduced a groundbreaking alternative. This whitepaper presents a comprehensive overview of simplyblock’s solution, highlighting its ability to address the drawbacks of EBS while delivering enhanced performance and cost-effectiveness. By harnessing local NVMe storage attached to virtualized […]

The post Simplyblock for AWS – Whitepaper appeared first on simplyblock.

]]>
Simplyblock has recognized the limitations of conventional cloud storage services, specifically AWS Elastic Block Storage (EBS), and has introduced a groundbreaking alternative. This whitepaper presents a comprehensive overview of simplyblock’s solution, highlighting its ability to address the drawbacks of EBS while delivering enhanced performance and cost-effectiveness. By harnessing local NVMe storage attached to virtualized or bare-metal EC2 instances, simplyblock offers a high-performance block storage option that outperforms traditional storage solutions, making it a compelling alternative to Amazon EBS.

The core architecture of simplyblock revolves around cluster-based infrastructure, ensuring remarkable reliability, durability, and availability. With automated health-checking mechanisms and seamless scalability, simplyblock enables smooth operations without compromising performance or data integrity. This unique approach empowers users to optimize resource utilization and achieve significant cost savings, making it an ideal choice for organizations with demanding workloads and stringent performance requirements.

By leveraging simplyblock’s solution, businesses can break free from the limitations of AWS EBS and experience unparalleled performance and efficiency. The flexible deployment options, including virtualized or bare-metal EC2 instances, coupled with seamless integration with AWS infrastructure, provide a hassle-free experience for administrators and developers. With simplyblock’s innovative approach to high-performance block storage, organizations can unlock new possibilities and overcome the limitations of traditional cloud storage services.

Feature & Price Comparison: Simplyblock vs EBS

Get the “Simplyblock for AWS” White Paper Now:


Read more on our website: https://www.Simplyblock.io/aws-storage

The post Simplyblock for AWS – Whitepaper appeared first on simplyblock.

]]>
Simplyblock for AWS (demo videos) https://www.simplyblock.io/blog/simplyblock-for-aws-demo/ Wed, 05 Apr 2023 12:05:33 +0000 https://www.simplyblock.io/?p=328 Introduction We have recently conducted a demo showcasing capabilities of simplyblock’s high-perfromance block storage on the AWS platform. The demo offers a glimpse into the seamless integration, exceptional performance, and cost-effectiveness that simplyblock brings to AWS users. In this article, we delve into the highlights of the simplyblock demo and explore how it can revolutionize […]

The post Simplyblock for AWS (demo videos) appeared first on simplyblock.

]]>
Introduction

We have recently conducted a demo showcasing the capabilities of simplyblock’s high-performance block storage on the AWS platform. The demo offers a glimpse into the seamless integration, exceptional performance, and cost-effectiveness that simplyblock brings to AWS users. In this article, we delve into the highlights of the simplyblock demo and explore how it can revolutionize storage infrastructure on AWS.

Unleashing Unmatched Performance

During the simplyblock demo, AWS users can witness firsthand the unparalleled performance of simplyblock’s block storage solution. Leveraging the power of local NVMe storage attached to EC2 instances, simplyblock achieved impressive IOPS rates, ensuring lightning-fast data access and processing. The demo showcases the seamless scalability and linear performance scaling of simplyblock’s cluster-based architecture, allowing organizations to handle demanding workloads with ease.

Unrivaled Cost Efficiency

One of the key takeaways from the simplyblock demo is its cost-effectiveness. By offering a highly efficient alternative to traditional AWS storage solutions, simplyblock demonstrated its ability to significantly reduce storage costs for AWS users. The demo highlights how simplyblock’s solution leverages resource pooling, thin provisioning, and effective compaction technology to optimize storage utilization and drive substantial cost savings. With simplyblock, organizations can achieve superior performance while staying within budgetary constraints.

Conclusion

The simplyblock demo for AWS showcases the game-changing potential of our block storage solution. With its unmatched performance, seamless scalability, and impressive cost efficiency, simplyblock offers AWS users a powerful storage alternative that can revolutionize their infrastructure. By harnessing the performance capabilities of local NVMe storage and leveraging innovative storage technologies, simplyblock opens up new possibilities for organizations seeking optimal performance, scalability, and cost-effectiveness in their AWS environments. With simplyblock, AWS users can unlock the true potential of their storage infrastructure and propel their business forward into a new era of performance and efficiency.

Read more about our block storage for AWS on our website: https://www.Simplyblock.io/aws-storage

The post Simplyblock for AWS (demo videos) appeared first on simplyblock.

]]>