Comprehensive Guide Archives | simplyblock
https://www.simplyblock.io/blog/tags/comprehensive-guide/

How to benefit from AWS Enterprise Discount Program (EDP)
https://www.simplyblock.io/blog/aws-enterprise-discount-program-edp/ | Thu, 13 Jun 2024

What is the AWS Enterprise Discount Program (EDP)?

The AWS Enterprise Discount Program (EDP) is a discount initiative designed for organizations spending at least $1 million per year on AWS cloud services and committed to extensive and long-term usage of Amazon Web Services (AWS). The program helps businesses optimize their cloud spending while expanding their operations on AWS. By entering into an EDP agreement, enterprises can secure significant cost savings and enhanced value from their AWS investments, which is particularly advantageous during economic downturns.

How does the AWS Enterprise Discount Program (EDP) Work?

The AWS Enterprise Discount Program operates on a tiered discount system based on an organization’s annual AWS spending commitment, usually starting at $1 million per year. Key features of the program include:

  • Customizable Discounts: Discounts are negotiated based on the total committed spend and the commitment duration, typically ranging from 1 to 5 years. Greater commitments yield higher discounts.
  • Broad Coverage: Discounts apply to nearly all AWS services and regions, ensuring consistent savings across the AWS ecosystem.
  • Marketplace Offerings: Purchases made through the AWS Marketplace can count for up to 25% of your EDP spend commitment.
  • Scalability: As AWS usage grows, the program allows organizations to benefit from increased discounts, promoting a sustainable and cost-effective cloud strategy.

What is EDP in AWS?

In AWS, EDP stands for Enterprise Discount Program: a contractual agreement between AWS and enterprises that guarantees significant discounts in exchange for a minimum level of AWS spending over a specified period. This program helps reduce cloud costs and encourages deeper engagement with the AWS ecosystem, fostering long-term partnerships and more efficient cloud usage.

How to Negotiate AWS EDP?

When negotiating an AWS Enterprise Agreement, consider these strategies to maximize benefits:

  1. Understand Your Usage Patterns: Analyze your current and projected AWS usage to accurately determine your commitment levels.
  2. Leverage Historical Spend: Use your historical AWS spend data to negotiate better discount rates.
  3. Seek Flexibility: Aim for terms that allow flexibility in service usage and scalability.
  4. Engage AWS Account Managers: Collaborate with AWS account managers to understand all available options and potential incentives.
  5. Evaluate Support and Training: Include provisions for enhanced support and training services in the agreement.

How to Join AWS EDP?

To join the AWS Enterprise Discount Program, follow these steps:

  1. Assess Eligibility: Ensure your organization meets the minimum annual spend requirement, typically around $1 million.
  2. Contact AWS Sales: Reach out to your AWS account manager or the AWS sales team to express interest in the program.
  3. Prepare for Negotiations: Gather your usage data and financial projections to negotiate the best possible terms.
  4. Sign Agreement: Finalize and sign the EDP agreement, detailing the committed spend and discount structure.
  5. Monitor and Optimize: Regularly review your AWS usage and costs to ensure you are maximizing the benefits of the EDP.

Understanding AWS Marketplace with AWS EDP

To maximize the benefits of the AWS Enterprise Discount Program, it’s crucial to understand your AWS Marketplace usage. Determine which Independent Software Vendors (ISVs) you are currently purchasing from and explore opportunities to route these purchases through the AWS Marketplace. Purchases made via the AWS Marketplace can contribute to your total commitment under the EDP, with a cap of 25%. This can be a strategic way to ensure your software investments also help you meet your EDP commitments.
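To make the 25% cap concrete, here is a small, purely illustrative Python sketch. The commitment and spend figures are made-up examples, and actual EDP terms are individually negotiated:

# Illustrative only: how much AWS Marketplace spend can count toward
# an EDP commitment, given the 25% cap described above.
def marketplace_credit(commitment_usd: float, marketplace_spend_usd: float) -> float:
    cap = 0.25 * commitment_usd  # at most 25% of the total commitment
    return min(marketplace_spend_usd, cap)

# Example: a 1.2M USD annual commitment with 400k USD routed through the
# Marketplace. Only 300k USD counts toward the commitment (the cap is hit).
print(marketplace_credit(1_200_000, 400_000))  # 300000.0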

Can I Join EDP as a Startup?

For startups, joining the AWS Enterprise Discount Program (EDP) might not be feasible due to the high minimum spend requirement, typically around $1 million annually. However, there are other ways to maximize savings on AWS:

  1. AWS Credits: Startups can benefit from AWS credits through programs like the AWS Activate program. These credits can significantly reduce your cloud costs during the early stages of growth. For example, AWS Activate provides up to $100,000 in credits for eligible startups.
  2. Marketplace Solutions: Utilize the AWS Marketplace to purchase software solutions that can contribute to your overall AWS spend. For example, AWS Marketplace offerings such as simplyblock can help you significantly reduce spending on AWS storage services while scaling your operations.

By leveraging these alternatives, startups can achieve substantial savings and optimize their AWS spending without needing to meet the high thresholds required for the EDP.

What’s the Difference between an EDP and a PPA?

EDP (Enterprise Discount Program) offers custom discounts based on high-volume, long-term AWS usage commitments, providing scalable savings across most AWS services. In contrast, a PPA (Private Pricing Agreement) is a more flexible, negotiated contract tailored to specific needs, often used for unique pricing arrangements and custom terms that might not fit the broader structure of an EDP. While both aim to reduce cloud costs, an EDP is typically for larger, ongoing commitments, whereas a PPA can address more specific, immediate requirements.

Other AWS Programs and Discounts

AWS offers various pricing models to help organizations achieve cost savings based on usage frequency, volume, and commitment duration. Here are some common ones:

  • Spot Instances: You use spare AWS capacity at a much lower price, but AWS can reclaim this capacity when it’s needed. Best for flexible workloads.
  • Reserved Instances: You commit to AWS usage for a longer period (1-3 years) and get a significant discount in return. Best for predictable workloads.
  • Savings Plans: Similar to Reserved Instances, but more flexible. You commit to a certain amount of AWS service usage and get a discount.
  • Vantage Autopilot: Provides automated optimization of AWS costs by dynamically adjusting instances and resources based on usage patterns, helping organizations reduce their AWS bills without manual intervention. Vantage Autopilot can be used alongside simplyblock to further reduce storage cost through lower underlying EC2 instance costs (simplyblock deploys onto EC2 instances with local NVMe storage, pooling the resources into a scalable, enterprise-grade storage system).

How can Simplyblock be used with AWS EDP?

simplyblock can be a game-changer for your AWS Enterprise Discount Program (EDP). It offers high-performance cloud block storage that not only enhances the performance of your databases and applications but also brings cost efficiency. Most importantly, spending on simplyblock through the AWS Marketplace can contribute towards the 25% marketplace allowance of your AWS EDP commitment. This means you can leverage simplyblock’s services while also fulfilling your commitment to AWS. It’s a win-win situation for AWS users seeking performance, scalability, and cost-effectiveness.

Simplyblock uses NVMe over TCP for minimal access latency, high IOPS/GB, and efficient CPU core utilization, surpassing local NVMe disks and Amazon EBS in cost/performance ratio at scale. Ideal for high-performance Kubernetes environments, simplyblock combines the benefits of local-like latency with the scalability and flexibility necessary for dynamic AWS EKS deployments , ensuring optimal performance for I/O-sensitive workloads like databases. Using erasure coding (a better RAID) instead of replicas helps to minimize storage overhead without sacrificing data safety and fault tolerance.

With additional features such as instant snapshots (full and incremental), copy-on-write clones, thin provisioning, compression, encryption, and many more, simplyblock meets your requirements before you set them. Get started using simplyblock right now or learn more about our feature set.

IOPS vs Throughput vs Latency – Storage Performance Metrics
https://www.simplyblock.io/blog/iops-throughput-latency-explained/ | Wed, 24 Apr 2024

IOPS, throughput, and latency are interrelated metrics that provide insight into the read, write, and access performance of storage entities and network interconnects.

Measuring the performance of a storage solution isn’t hard, but understanding the measured values is. IOPS, throughput, and latency are related to each other, but how? In this blog post, I try to explain how they are related and what you should know to really get the most out of your performance test.

IOPS, throughput, and latency are all very important metrics. HDDs (spinning disks), SSDs (meaning SATA- or SAS-connected ones), and NVMe devices (connected through PCIe one way or the other) have very different performance profiles.

Simplyblock’s storage engine uses NVMe disks and the NVMe protocol to provide its virtual (or logical) storage volumes to the outside world. Enough of simplyblock, though, let’s get into the performance metrics!

What is IOPS?

IOPS defines how many read and/or write operations (random or sequential) can be executed per second.

IOPS (spoken as eye-ops) means input/output operations per second. An IOPS basically says how many actual read or write operations a storage device can perform (and sustain) in a single second.

Reads are operations that access information stored on the storage device, while writes add new or update existing information on the disk.

As a good rule of thumb, higher IOPS means better performance. However, it is important to understand how IOPS relates to throughput and block size. A measured or calculated amount of IOPS for one specific block size doesn’t necessarily translate to the same amount of IOPS with another block size. We’ll come back to block size later.

How to Calculate IOPS?

Calculating IOPS is a fairly simple formula. We take the number of operations (read and write combined) and divide it by the number of seconds we measured:

IOPS = (ReadOperations + WriteOperations) / TimeInSeconds

If we only measured throughput, we can derive the same value from a single measurement of read and write throughput, divided by the block size used:

IOPS = (ReadThroughput + WriteThroughput) / BlockSize

With spinning disks it is a bit more complicated since we have additional seek times. Since spinning disks aren’t used for high-performance storage anymore, I’ll leave it out.
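Putting both formulas, plus the inverse throughput formula from the next section, into a small Python sketch (the example numbers are made up):

# The two IOPS formulas from above, plus the inverse throughput formula
# covered in the next section. All values use bytes and seconds.
def iops_from_operations(read_ops: int, write_ops: int, seconds: float) -> float:
    # IOPS = (ReadOperations + WriteOperations) / TimeInSeconds
    return (read_ops + write_ops) / seconds

def iops_from_throughput(read_bps: float, write_bps: float, block_size: int) -> float:
    # IOPS = (ReadThroughput + WriteThroughput) / BlockSize
    return (read_bps + write_bps) / block_size

def throughput_from_iops(read_iops: float, write_iops: float, block_size: int) -> float:
    # Throughput = (Read IOPS + Write IOPS) x BlockSize
    return (read_iops + write_iops) * block_size

# Example: 300 MB/s reads plus 100 MB/s writes at an 4KB block size.
print(iops_from_throughput(300e6, 100e6, 4000))  # 100000.0 IOPS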

Anyhow, while this sounds easy enough, write operations sometimes require additional read operations to succeed. That can happen in a RAID setup (or similar technologies like erasure coding), where multiple pieces of information are “mathematically” combined into recovery information (often called parity data). Meaning, we take two (or more) pieces of input information, combine them through some mathematical formula, and write the result to different disks.

As a result, we can lose one of the inputs but use the calculation result and the other input to reverse the mathematical operation and recover the lost information. Anyway, the result is that we have 1 additional read and 1 additional write for every write operation:

  • Write: New information to write
  • Read: Second information input for mathematical calculation
  • Write: Recovery calculation result

For reading data from such a setup, as long as the information to be read isn’t lost, it’s a single-read operation. In the case of recovering the information due to disk failure (or any other issue), there will be two read operations (parity and the second information input), as well as the reversed mathematical operation.

When calculating how many IOPS storage can sustain, we need to make sure we take those cases into account (the sketch below illustrates the write amplification). And finally, some storage solutions may be slightly asymmetric when it comes to writing and reading, meaning they aren’t equally fast. That said, to find the max IOPS value, we may have to test with a 60:40 read-write ratio or any other one. This is a plain trial-and-error situation. It’s always best to test with your specific use case and read-write ratio.
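As a back-of-the-envelope illustration of the parity case above, a small sketch (assuming every logical write costs one extra read and one extra write on the backing disks, as described):

# Every logical write in the parity scheme above turns into 3 backend
# operations: read the second input, write the new data, and write the
# recalculated parity. Purely illustrative.
def effective_write_iops(backend_iops: float) -> float:
    return backend_iops / 3

print(effective_write_iops(300_000))  # 100000.0 logical writes per second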

What is Throughput?

Throughput is the measurement of how much data can be transferred (read or written) in a given amount of time; for storage solutions, it’s typically measured in bytes per second. Just be careful, since there are metrics using bits per second, too, especially when your storage is connected via a network link (like Ethernet or Fibre Channel).

A higher throughput normally means better performance, as in faster data transfers. In comparison to IOPS, throughput tells us a specific amount of data that can be read or written in a given amount of time. This metric can be used directly to understand if a storage solution is “fast enough.” It will not tell us whether we can sustain a certain pattern of read or write operations, though.

How to Calculate Throughput?

To calculate the throughput without measuring it, we take the number of IOPS (read and write) and multiply it by the configured block size.

Throughput = (Read IOPS + Write IOPS) x BlockSize

We differentiate between read and write IOPS because the maximum amount of IOPS for either direction may not necessarily equal the other direction. See the How to Calculate IOPS section for more information.

IOPS vs Throughput

IOPS and throughput are related to each other. That means both values are required to give a full picture of storage performance.

Throughput gives us a feeling of how much actual data we can read or write in a given amount of time. If we know that we have to write a 1GB file in less than 10 seconds, we know what write throughput is required.

On the other hand, if we know that our database makes many small read operations, we want to know the potential read IOPS limit of our storage.

Anyhow, there is more to performance than those two metrics: latency.

Latency is defined as the time needed for a data packet to make a full round-trip, while throughput is the quantity of data that can be sent and received within a given unit of time.
Figure 1: How are latency and throughput defined?

What is Latency?

Latency, in storage systems, describes the amount of time a storage entity needs to process a single data request. With spinning disks, that included seek time (the time to position the read-write heads and wait for the platter to rotate to the correct position), but it also includes computation time inside the disk controller.

The latter still holds true for flash storage devices. Due to features such as wear leveling, the controller has to calculate where a piece of information is stored at the current point in time.

That said, latency directly affects how many IOPS can be performed, hence the throughput. Thanks to flash technology, the latency these days is fairly consistent (per device) and doesn’t depend on the physical location anymore. With spinning disks, the latency was different depending on the current and new position of the heads and if the information was stored on the inside or outside of the platter. Luckily, we can ignore this information today. Measuring the latency once per disk should be enough.

IOPS, throughput, and latency are directly related to each other but provide different information.

While IOPS tells us how many actual operations we can perform per second, throughput tells us how much data can be transferred at the same time. Latency, however, tells us how long we have to wait for this operation to be performed (or finished, depending on what type of latency you’re looking for).

There is no one best metric, though. When you want to understand the performance characteristics of a storage entity, all 3 metrics are equally important. Though, for a specific question, one may carry more weight than the others.

Reads and Writes

So far, we have always talked about reading and writing. We also mentioned that their ratio may impact the overall performance. In addition, we briefly talked about RAID and erasure coding (and now I’ll just throw mirroring and replication into the mix).

Diverging read and write speeds mostly come from hardware implications, such as flash cells needing to be erased before being written. That, inherently, makes writes slower.

For other situations, such as RAID and similar, additional reads or writes have to be executed so that the effective write IOPS value may be different from the physically possible write IOPS one.

Generally, it’s a valid approach to try to find the absolute maximum read and write values (in terms of throughput and IOPS) by trying out different read-write ratios in the same test. I’d recommend starting with a 50:50 ratio, then going 60:40 or 40:60, seeing which one has the higher result, and then making smaller steps (increments or decrements of 1 or 5) until you get the highest numbers. A sketch of this search follows below.

This doesn’t necessarily reflect your real-world pattern, though, because depending on the workload you run, you may have a different ratio requirement.
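Here is a minimal sketch of that ratio search, as promised above. The measure_mixed_iops helper is hypothetical; in practice it would wrap a benchmark run (with fio, for example) and return the total IOPS for a given read percentage:

# Coarse-to-fine search for the read-write ratio with the highest total
# IOPS. measure_mixed_iops is a hypothetical stand-in for your benchmark.
def measure_mixed_iops(read_pct: int) -> float:
    raise NotImplementedError("wrap your benchmark tool here")

def find_best_ratio(start: int = 50, step: int = 10) -> int:
    best, best_iops = start, measure_mixed_iops(start)
    while step >= 1:
        for candidate in (best - step, best + step):
            if 0 <= candidate <= 100:
                iops = measure_mixed_iops(candidate)
                if iops > best_iops:
                    best, best_iops = candidate, iops
        step //= 2  # refine the step size: 10 -> 5 -> 2 -> 1
    return best  # read percentage with the highest measured IOPS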

Why do I have more Reads than Writes?

Workloads like static asset web servers (found at CDN providers) generally have a much higher read rate than write rate. Their whole purpose is to cache data from other systems and serve it as fast and as many times as possible. In this case, the storage should provide the fastest read throughput, as well as the highest read IOPS.

While those systems need to store and update existing data in the cache, the extreme imbalance of reads over writes makes write performance (almost) negligible.

Another example of such a system may be an analytics database, which is filled with updated data irregularly but is used for analytical purposes most of the time. That means that while a lot of data is probably written in a short amount of time (ingress burst), the data is mostly read for running queries and analytics requests.

Why do I have more Writes than Reads?

Data lakes, or IoT databases, often see far more writes than reads. In both situations, we want to optimize the storage for faster writes.

Data lakes commonly collect data for later analytics. The collected data is often pre-evaluated and used as a training set for algorithms such as fraud detection.

With IoT databases, the amount of data being delivered from IoT devices is often much higher than what is necessary to present to the user. For that reason, many databases optimized for IoT data (or time series data in general) have features to pre-aggregate data for dashboards, hence increasing the write rates even higher. Anyhow, many such databases are optimized to hold the active data set (the data commonly presented to users) in RAM to prevent it from being read from disk, decreasing the typical read rates further.

Why do I have Writes Only?

Finally, there are systems that observe pretty much only writes. These include mirrors or hot standby systems. In this case, changes on a primary system are replicated to the mirror and written to disk. While the system is “waiting” to become the primary, there are basically no read operations happening.

Reads and Writes Distribution

Depending on your workload, your read-write ratio will change. Before trying to find out if a storage solution fits your needs, you should figure out what your workload’s real-world distribution looks like. Only afterward is there a way to really make an educated guess.

Generally speaking, the read-write distribution will change the outcome of your tests. While a storage solution may offer a massive number of IOPS, the question is more whether it can sustain those IOPS. At what block size can I expect them? And how will those IOPS values change when I change the block size?

Different workloads have different requirements. Just keep that in mind.

Why Block Size Matters

Finally, block size. We have talked about it over and over again. The block size defines how much data (in bytes) is read or written per operation. That said, 1,000 operations (IOPS) at 8KB each means that we manage to read 8,000KB (or 8MB) per second. We immediately see our throughput.

Does that mean that with 64KB block size, I get 8 times the speed? It depends. We all hate this answer, but it does.

It mostly depends on how the storage solution is built and how it is connected. Say you have a 1Gbit/s network interconnect to the storage solution; that gets you roughly 125MB/s through the wire. 64KB x 1000 IOPS results in 64MB/s, so it fits. If I increase the block size to 128KB, the network bandwidth becomes the limiting factor.

Anyhow, the network isn’t your only limiting factor. Transferring larger blocks needs more time, meaning your potential maximum IOPS will drop. Though, what I didn’t tell you before: when running your benchmark and trying to find the perfect read-write ratio, you can also play with the block size values. There is a good chance that the default block size isn’t the perfect one.
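The bandwidth check from the example above, in a few lines of Python:

# Does a target IOPS number at a given block size fit through the
# network link? Uses the same numbers as the example above.
def fits_link(iops: float, block_size_bytes: int, link_gbit_s: float) -> bool:
    required = iops * block_size_bytes   # bytes per second needed
    available = link_gbit_s * 1e9 / 8    # 1 Gbit/s is roughly 125 MB/s
    return required <= available

print(fits_link(1000, 64_000, 1.0))   # True: 64 MB/s fits into ~125 MB/s
print(fits_link(1000, 128_000, 1.0))  # False: 128 MB/s exceeds the link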

Conclusion

That was a lot of information. Your two main metrics are IOPS and throughput. Latency should be measured but can be considered stable for most flash-based systems today. Block size is something to experiment with. It will directly impact both IOPS and throughput, but only up to the point where you hit another limiting factor (most likely some bandwidth limitation).

Up to that point, you can try and increase the block size for best performance. Remember, you may also have to make changes to your workload configuration or implementation to make use of the increased block size.

The following comparison is supposed to help a bit in terms of judging both IOPS and throughput.

IOPS
  • Measurement: Input/output operations per second.
  • Meaning: The number of input/output (read or write) operations performed per second.
  • Difficulty: Measurement requires specific software, and measurements depend directly on the selected block size. For measurements, I’d recommend the command line tool fio.
  • Where it helps: Good measurement for many small, random operations.
  • Where it doesn’t help: Doesn’t say much about the amount of data that can be transferred. Also very much dependent on the chosen block size.

Throughput
  • Measurement: (Mega-)bytes per second.
  • Meaning: The amount of data (in bytes) that can be transferred through the storage connection per second.
  • Difficulty: Easy to measure. Most operating systems have built-in tools for that or provide tools in the repositories (such as iostat). A specialized tool such as fio can help with benchmarking.
  • Where it helps: Directly tells you whether a storage solution is “fast enough” for a given amount of data.
  • Where it doesn’t help: If you have a lot of random I/O, throughput will not be a good measurement due to the influences of latency and queuing of storage requests.

That said, both metrics, IOPS and throughput, are useful for assessing the performance of a storage system. They shouldn’t be used in isolation, though.

Simplyblock provides a disaggregated storage solution for latency-sensitive and high-IO workloads in the cloud. To achieve that, simplyblock creates a unified storage pool of NVMe disks connected to the simplyblock cluster nodes (virtual machines) and offers logical volumes through NVMe over TCP. Those logical volumes are distributed across all connected cluster nodes and disks and secured using erasure coding, bringing the highest performance without sacrificing fault tolerance. Learn more about simplyblock now, or get started right away.

How the CSI (Container Storage Interface) Works
https://www.simplyblock.io/blog/how-the-csi-container-storage-interface-works/ | Fri, 29 Mar 2024

If you work with persistent storage in Kubernetes, maybe you’ve seen articles about how to migrate from in-tree to CSI volumes, but aren’t sure what all the fuss is about? Or perhaps you’re trying to debug a stuck VolumeAttachment that won’t unmount from a node, holding up your important StatefulSet rollout? A clear understanding of what the Container Storage Interface (or CSI for short) is and how it works will give you confidence when dealing with persistent data in Kubernetes, allowing you to answer these questions and more!

Editorial: This blog post is written by a guest author, Steven Sklar from QuestDB. It appeared first on his private blog at sklar.rocks. We appreciate his contributions to the Kubernetes ecosystem and wanted to thank him for letting us repost his article. Steven, you rock! 🔥

The Container Storage Interface is an API specification that enables developers to build custom drivers which handle the provisioning, attaching, and mounting of volumes in containerized workloads. As long as a driver correctly implements the CSI API spec, it can be used in any supported Container Orchestration system, like Kubernetes. This decouples persistent storage development efforts from core cluster management tooling, allowing for the rapid development and iteration of storage drivers across the cloud native ecosystem.

In Kubernetes, the CSI has replaced legacy in-tree volumes with a more flexible means of managing storage mediums. Previously, in order to take advantage of new storage types, one would have had to upgrade an entire cluster’s Kubernetes version to access new PersistentVolume API fields for a new storage type. But now, with the plethora of independent CSI drivers available, you can add any type of underlying storage to your cluster instantly, as long as there’s a driver for it.

But what if existing drivers don’t provide the features that you require and you want to build a new custom driver? Maybe you’re concerned about the ramifications of migrating from in-tree to CSI volumes? Or, you simply want to learn more about how persistent storage works in Kubernetes? Well, you’re in the right place! This article will describe what the CSI is and detail how it’s implemented in Kubernetes.

It’s APIs all the way down

Like many things in the Kubernetes ecosystem, the Container Storage Interface is actually just an API specification. In the container-storage-interface/spec GitHub repo, you can find this spec in 2 different versions:

  1. A protobuf file that defines the API schema in gRPC terms
  2. A markdown file that describes the overall system architecture and goes into detail about each API call

What I’m going to discuss in this section is an abridged version of that markdown file, while borrowing some nice ASCII diagrams from the repo itself!

Architecture

A CSI Driver has 2 components, a Node Plugin and a Controller Plugin. The Controller Plugin is responsible for high-level volume management: creating, deleting, attaching, detaching, snapshotting, and restoring physical (or virtualized) volumes. If you’re using a driver built for a cloud provider, like EBS on AWS, the driver’s Controller Plugin communicates with AWS HTTPS APIs to perform these operations. For other storage types like NFS, iSCSI, ZFS, and more, the driver sends these requests to the underlying storage’s API endpoint, in whatever format that API accepts.

Editorial: The same is true for simplyblock. Simplyblock’s CSI driver implements all necessary, and following described calls, making it a perfect drop-in replacement for Amazon EBS. If you want to learn more read: Why simplyblock.

On the other hand, the Node Plugin is responsible for mounting and provisioning a volume once it’s been attached to a node. These low-level operations usually require privileged access, so the Node Plugin is installed on every node in your cluster’s data plane, wherever a volume could be mounted.

The Node Plugin is also responsible for reporting metrics like disk usage back to the Container Orchestration system (referred to as the “CO” in the spec). As you might have guessed already, I’ll be using Kubernetes as the CO in this post! But what makes the spec so powerful is that it can be used by any container orchestration system, like Nomad for example, as long as it abides by the contract set by the API guidelines.

The specification doc provides a few possible deployment patterns, so let’s start with the most common one.

CO "Master" Host
+-------------------------------------------+
|                                           |
|  +------------+           +------------+  |
|  |     CO     |   gRPC    | Controller |  |
|  |            +----------->   Plugin   |  |
|  +------------+           +------------+  |
|                                           |
+-------------------------------------------+

CO "Node" Host(s)
+-------------------------------------------+
|                                           |
|  +------------+           +------------+  |
|  |     CO     |   gRPC    |    Node    |  |
|  |            +----------->   Plugin   |  |
|  +------------+           +------------+  |
|                                           |
+-------------------------------------------+ 

Since the Controller Plugin is concerned with higher-level volume operations, it does not need to run on a host in your cluster’s data plane. For example, in AWS, the Controller makes AWS API calls like ec2:CreateVolume, ec2:AttachVolume, or ec2:CreateSnapshot to manage EBS volumes. These functions can be run anywhere, as long as the caller is authenticated with AWS. All the CO needs is to be able to send messages to the plugin over gRPC. So in this architecture, the Controller Plugin is running on a “master” host in the cluster’s control plane.

On the other hand, the Node Plugin must be running on a host in the cluster’s data plane. Once the Controller Plugin has done its job by attaching a volume to a node for a workload to use, the Node Plugin (running on that node) will take over by mounting the volume to a well-known path and optionally formatting it. At this point, the CO is free to use that path as a volume mount when creating a new containerized process; so all data on that mount will be stored on the underlying volume that was attached by the Controller Plugin. It’s important to note that the Container Orchestrator, not the Controller Plugin, is responsible for letting the Node Plugin know that it should perform the mount.

Volume Lifecycle

The spec provides a flowchart of basic volume operations, also in the form of a cool ASCII diagram:

   CreateVolume +------------+ DeleteVolume
 +------------->|  CREATED   +--------------+
 |              +---+----^---+              |
 |       Controller |    | Controller       v
+++         Publish |    | Unpublish       +++
|X|          Volume |    | Volume          | |
+-+             +---v----+---+             +-+
                | NODE_READY |
                +---+----^---+
               Node |    | Node
            Publish |    | Unpublish
             Volume |    | Volume
                +---v----+---+
                | PUBLISHED  |
                +------------+

Mounting a volume is a synchronous process: each step requires the previous one to have run successfully. For example, if a volume does not exist, how could we possibly attach it to a node?

When publishing (mounting) a volume for use by a workload, the Node Plugin first requires that the Controller Plugin has successfully published a volume at a directory that it can access. In practice, this usually means that the Controller Plugin has created the volume and attached it to a node. Now that the volume is attached, it’s time for the Node Plugin to do its job. At this point, the Node Plugin can access the volume at its device path to create a filesystem and mount it to a directory. Once it’s mounted, the volume is considered to be published and it is ready for a containerized process to use. This ends the CSI mounting workflow.

Continuing the AWS example, when the Controller Plugin publishes a volume, it calls ec2:CreateVolume followed by ec2:AttachVolume. These two API calls allocate the underlying storage by creating an EBS volume and attaching it to a particular instance. Once the volume is attached to the EC2 instance, the Node Plugin is free to format it and create a mount point on its host’s filesystem.

Here is an annotated version of the above volume lifecycle diagram, this time with the AWS calls included in the flow chart.

   CreateVolume +------------+ DeleteVolume
 +------------->|  CREATED   +--------------+
 |              +---+----^---+              |
 |       Controller |    | Controller       v
+++         Publish |    | Unpublish       +++
|X|          Volume |    | Volume          | |
+-+                 |    |                 +-+
                    |    |
  (ec2:AttachVolume)|    |(ec2:DetachVolume)
                    |    |
                +---v----+---+
                | NODE_READY |
                +---+----^---+
               Node |    | Node
            Publish |    | Unpublish
             Volume |    | Volume
                +---v----+---+
                | PUBLISHED  |
                +------------+

If a Controller wants to delete a volume, it must first wait for the Node Plugin to safely unmount the volume to preserve data and system integrity. Otherwise, if a volume is forcibly detached from a node before unmounting it, we could experience bad things like data corruption. Once the volume is safely unpublished (unmounted) by the Node Plugin, the Controller Plugin would then call ec2:DetachVolume to detach it from the node and finally ec2:DeleteVolume to delete it, assuming that you don’t want to reuse the volume elsewhere.

What makes the CSI so powerful is that it does not prescribe how to publish a volume. As long as your driver correctly implements the required API methods defined in the CSI spec, it will be compatible with the CSI and by extension, be usable in COs like Kubernetes and Nomad.
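To make this a little more tangible, here is a heavily abridged sketch of what a Controller Plugin’s entry points could look like in Python. It assumes you have generated gRPC bindings (csi_pb2, csi_pb2_grpc) from the csi.proto file in the spec repo; the backend_create and backend_delete helpers are hypothetical placeholders for your storage system’s own API:

# A minimal Controller Plugin sketch. Assumes csi_pb2/csi_pb2_grpc were
# generated from the CSI protobuf spec; backend_* calls are placeholders.
from concurrent import futures
import grpc
import csi_pb2        # assumed: generated from csi.proto
import csi_pb2_grpc   # assumed: generated from csi.proto

def backend_create(name: str, size_bytes: int) -> str:
    raise NotImplementedError("call your storage system's API here")

def backend_delete(volume_id: str) -> None:
    raise NotImplementedError("call your storage system's API here")

class MyControllerPlugin(csi_pb2_grpc.ControllerServicer):
    def CreateVolume(self, request, context):
        # Allocate storage in the backend and report the volume to the CO.
        size = request.capacity_range.required_bytes
        volume_id = backend_create(request.name, size)
        return csi_pb2.CreateVolumeResponse(
            volume=csi_pb2.Volume(volume_id=volume_id, capacity_bytes=size)
        )

    def DeleteVolume(self, request, context):
        backend_delete(request.volume_id)
        return csi_pb2.DeleteVolumeResponse()

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    csi_pb2_grpc.add_ControllerServicer_to_server(MyControllerPlugin(), server)
    # Sidecars reach the plugin over a shared unix socket, not over TCP.
    server.add_insecure_port("unix:///csi/csi.sock")
    server.start()
    server.wait_for_termination()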

Running CSI Drivers in Kubernetes

What I haven’t entirely made clear yet is why the Controller and Node Plugins are plugins themselves! How does the Container Orchestrator call them, and where do they plug into?

Well, the answer depends on which Container Orchestrator you are using. Since I’m most familiar with Kubernetes, I’ll be using it to demonstrate how a CSI driver interacts with a CO.

Deployment Model

Since the Node Plugin, responsible for low-level volume operations, must be running on every node in your data plane, it is typically installed using a DaemonSet. If you have heterogeneous nodes and only want to deploy the plugin to a subset of them, you can use node selectors, affinities, or anti-affinities to control which nodes receive a Node Plugin Pod. Since the Node Plugin requires root access to modify host volumes and mounts, these Pods will be running in privileged mode. In this mode, the Node Plugin can escape its container’s security context to access the underlying node’s filesystem when performing mounting and provisioning operations. Without these elevated permissions, the Node Plugin could only operate inside of its own containerized namespace without the system-level access that it requires to provision volumes on the node.

The Controller Plugin is usually run in a Deployment because it deals with higher-level primitives like volumes and snapshots, which don’t require filesystem access to every single node in the cluster. Again, let’s think about the AWS example I used earlier. If the Controller Plugin is just making AWS API calls to manage volumes and snapshots, why would it need access to a node’s root filesystem? Most Controller Plugins are stateless and highly available, both of which lend themselves to the Deployment model. The Controller also does not need to be run in a privileged context.

Event-Driven Sidecar Pattern

Now that we know how CSI plugins are deployed in a typical cluster, it’s time to focus on how Kubernetes calls each plugin to perform CSI-related operations. A series of sidecar containers, registered with the Kubernetes API server to react to different events across the cluster, is deployed alongside each Controller and Node Plugin. In a way, this is similar to the typical Kubernetes controller pattern, where controllers react to changes in cluster state and attempt to reconcile the current cluster state with the desired one.

There are currently 6 different sidecars that work alongside each CSI driver to perform specific volume-related operations. Each sidecar registers itself with the Kubernetes API server and watches for changes in a specific resource type. Once the sidecar has detected a change that it must act upon, it calls the relevant plugin with one or more API calls from the CSI specification to perform the desired operations.

Controller Plugin Sidecars

Here is a table of the sidecars that run alongside a Controller Plugin:

Sidecar Name           K8s Resources Watched      CSI API Endpoints Called
external-provisioner   PersistentVolumeClaim      CreateVolume, DeleteVolume
external-attacher      VolumeAttachment           Controller(Un)PublishVolume
external-snapshotter   VolumeSnapshot (Content)   CreateSnapshot, DeleteSnapshot
external-resizer       PersistentVolumeClaim      ControllerExpandVolume

How do these sidecars work together? Let’s use an example of a StatefulSet to demonstrate. In this example, we’re dynamically provisioning our PersistentVolumes (PVs) instead of mapping PersistentVolumeClaims (PVCs) to existing PVs. We start at the creation of a new StatefulSet with a VolumeClaimTemplate.

---
apiVersion: apps/v1
kind: StatefulSet
spec:
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "my-storage-class"
      resources:
        requests:
         storage: 1Gi

Creating this StatefulSet will trigger the creation of a new PVC based on the above template. Once the PVC has been created, the Kubernetes API will notify the external-provisioner sidecar that this new resource was created. The external-provisioner will then send a CreateVolume message to its neighbor Controller Plugin over gRPC. From here, the CSI driver’s Controller Plugin takes over by processing the incoming gRPC message and will create a new volume based on its custom logic. In the AWS EBS driver, this would be an ec2:CreateVolume call.

At this point, the control flow moves to the built-in PersistentVolume controller, which will create a matching PV and bind it to the PVC. This allows the StatefulSet’s underlying Pod to be scheduled and assigned to a Node.

Here, the external-attacher sidecar takes over. It will be notified of the new PV and call the Controller Plugin’s ControllerPublishVolume endpoint, attaching the volume to the StatefulSet’s assigned node. This would be the equivalent of ec2:AttachVolume in AWS.

At this point, we have an EBS volume that is mounted to an EC2 instance, all based on the creation of a StatefulSet, PersistentVolumeClaim, and the work of the AWS EBS CSI Controller Plugin.

Node Plugin Sidecars

There is only one unique sidecar that is deployed alongside the Node Plugin: the node-driver-registrar. This sidecar, running as part of a DaemonSet, registers the Node Plugin with a Node’s kubelet. During the registration process, the Node Plugin will inform the kubelet that it is able to mount volumes using the CSI driver that it is part of. The kubelet itself will then wait until a Pod is scheduled to its corresponding Node, at which point it is responsible for making the relevant CSI calls (NodePublishVolume) to the Node Plugin over gRPC.

Common Sidecars

There is also a livenessprobe sidecar that runs in both the Controller and Node Plugin Pods. It monitors the health of the CSI driver and reports back to the Kubernetes Liveness Probe mechanism.

Communication over Sockets

How do these sidecars communicate with the Controller and Node Plugins? Over gRPC, through a shared socket! Each sidecar and plugin contains a volume mount pointing to a single unix socket.

Figure: CSI Controller Deployment

This diagram highlights the pluggable nature of CSI Drivers. To replace one driver with another, all you have to do is swap the CSI Driver container for another and ensure that it’s listening on the unix socket that the sidecars are sending gRPC messages to. Because all drivers advertise their own different capabilities and communicate over the shared CSI API contract, it’s literally a plug-and-play solution.

Conclusion

In this article, I only covered the high-level concepts of the Container Storage Interface spec and implementation in Kubernetes. While hopefully it has provided a clearer understanding of what happens once you install a CSI driver, writing one requires significant low-level knowledge of both your nodes’ operating system(s) and the underlying storage mechanism that your driver is implementing. Luckily, CSI drivers exist for a variety of cloud providers and distributed storage solutions, so it’s likely that you can find a CSI driver that already fulfills your requirements. But it always helps to know what’s happening under the hood in case your particular driver is misbehaving.

If this article interests you and you want to learn more about the topic, please let me know! I’m always happy to answer questions about CSI Drivers, Kubernetes Operators, and a myriad of other DevOps-related topics.

Kubernetes CSI: Container Attached Storage
https://www.simplyblock.io/blog/kubernetes-csi-container-attached-storage-and-container-storage-interface/ | Wed, 27 Mar 2024

Containerized services must be stateless, a doctrine that was widely used in the early days of containerization, which came hand-in-hand with microservices. While it makes elasticity easy, these days, we containerize many types of services, such as databases, which cannot be stateless—at least, without losing their meaning. This is where the Kubernetes container storage interface (CSI) comes in.

Docker, initially released in 2013, brought containerized applications to the vast majority of users (outside of the Solaris and BSD world), making them a commodity to the masses. Kubernetes, however, eased the process of orchestrating complex container-based systems. Both systems enable data storage options, ephemeral (temporary) or persistent. Let’s dive into concepts of container-attached storage and Kubernetes CSI.

What is Container Attached Storage (CAS)?

When containerized services need disk storage, whether ephemeral or persistent, container-attached storage (or CAS) provides the requested “virtual disk” to the container.

Figure: High-level architecture diagram for container-attached storage with Kubernetes

The CAS resources are managed alongside other container resources and are directly attached to the container’s own lifecycle. That means that storage resources are automatically provisioned and potentially de-provisioned. To achieve this functionality, the management of container-attached storage resources isn’t provided by the host operating system but directly integrated into the container runtime environment, hence systems such as Kubernetes, Docker, and others.

Since the storage resource is attached to the container, it isn’t used by the host operating system or other containers. Detaching storage and compute resources provides one of the building blocks of loosely coupled services, which small and independent development teams can easily manage.

From my perspective, five main principles are important to CAS:

  1. Native: Storage resources are a first-class citizen of containerized environments. Therefore, the overall container runtime environment seamlessly integrates with and fully manages it.
  2. Dynamic: Storage resources are (normally) coupled to their container’s lifecycle. This allows for on-demand provisioning of storage volumes whose size and performance profile are tailored to the applications’ needs. The dynamic nature and automatic resource management prevent manual intervention of volumes and devices.
  3. Decoupled: Storage resources are decoupled from the underlying infrastructure, meaning the container doesn’t know (or care) where the provided storage comes from. That makes it easy to provide different storage options, like high-performance or highly resilient storage, to different containers. For super-high-performance but ephemeral storage, even RAM disks would be an option.
  4. Efficient: By eliminating the need for traditional storage, e.g., local storage, it is easy to optimize resource utilization using special storage clusters, thin provisioning, and over-commitment. It also makes it easy to provide multi-regional backups and enables immediate re-attachment in case the container needs to be rescheduled on another cluster node.
  5. Agnostic: The storage provider can be easily exchanged due to the decoupling of storage resources and container runtime. This prevents vendor lock-in or provides the option to utilize multiple different storage options, depending on the needs of specific applications. A database running in a container will have very different storage requirements from a normal REST API service.

Given the five features above, we have the chance to provide each and every container with the exact storage option necessary. Some may need only ephemeral storage. Hence, temporary storage can be discarded when the container itself stops, while others need persistent storage, which either lives until the container is deleted or, in specific cases, will even survive this to be reattached to a new container (for example, in the case of container migration).

What is a Container Storage Interface (Kubernetes CSI)?

Like everything in Kubernetes, the container-attached storage functionality is provided by a set of microservices orchestrated by Kubernetes itself, making it modular by design. That said, services internally provided and extended by vendors make up the container storage interface (or Kubernetes CSI). Together, they create a well-defined interface for any type of storage option to be plugged into Kubernetes.

The container storage interface defines a standard set of functionalities, some mandatory and some optional, to be implemented by the Kubernetes CSI drivers. Those drivers are commonly provided by the different vendors of storage systems.

Hence, the CSI drivers build bridges between Kubernetes and the actual storage implementation, which can be physical, software-defined, or fully virtual (like an implementation sending all data “stored” to /dev/null). On the other hand, it allows vendors to implement their storage solution as efficiently as possible, providing a minimal set of operations towards provisioning and general management. That way, vendors can choose how to implement storage, with the two main categories being hyperconverged (compute and storage sharing the same cluster nodes) and disaggregated (the actual storage environment fully separated from the Kubernetes workloads using it), bringing a clear separation of storage and compute resources.

Figure: Internal architecture of the container storage interface (CSI), from the storage backend over the CSI driver to the client pod utilizing the persistent volume.

Just like Kubernetes, the container storage interface is developed as a collaborative effort inside the Cloud Native Computing Foundation (better known as CNCF) by members from all sides of the industry, vendors, and users.

The main goal of Kubernetes CSI is to deliver on the premise of being fully vendor-neutral. In addition, it enables parallel deployment of multiple different drivers, offering storage classes for each of them. This provides us, as users, with the ability to choose the best storage technology for each container, even in the same Kubernetes cluster.

As mentioned, the Kubernetes CSI driver interface provides a standard set of storage (or volume) operations. These include creation or provisioning, resizing, snapshotting, cloning, and volume deletion. The operations can either be performed directly or through Kubernetes resources such as custom resource definitions (CRDs), integrating into the consistent approach to managing container resources. A small, user-facing example follows below.
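From the user’s side, all of this is driven through ordinary Kubernetes resources. As a hedged sketch, here is how a CSI-provisioned volume could be requested with the official Kubernetes Python client; the storage class name is an assumption standing in for whatever your CSI driver registers, and depending on your client version the resources type may be named V1VolumeResourceRequirements instead:

# Requesting a CSI-backed volume is just creating a PVC against a
# StorageClass served by a CSI driver. "my-csi-storage-class" is a
# placeholder for your driver's StorageClass.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="my-csi-storage-class",  # assumption
        resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
    ),
)
client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)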

Editor’s note: We also have a deep dive into how the Kubernetes Container Storage Interface works.

Kubernetes and Stateful Workloads

For many people, containerized workloads should be fully stateless; in the past, this was the most commonly used mantra. With the rise of orchestration platforms such as Kubernetes, it became more typical to deploy stateful workloads too, often due to the simplified deployment. Orchestrators offer features like automatic elasticity, restarting containers after crashes, and automatic migration of containers for rolling upgrades, as well as many more typical operational procedures. Having these built into an orchestration platform takes away a lot of the burden, hence people started to deploy more and more databases.

Databases aren’t the only stateful workloads, though. Other applications and services may also require storage of some kind of state, sometimes as a local cache, using ephemeral storage, and sometimes in a more persistent fashion, as databases.

Benjamin Wootton (then working for Contino, now at Ensemble) wrote a great blog post about the difference between stateless and stateful containers and why the latter is needed. You should read it, but only after this one.

Your Kubernetes Storage with Simplyblock

The container storage interface in Kubernetes serves as the bridge between Kubernetes and external storage systems. It provides a standardized and modular approach to provisioning and managing container-attached storage resources.

By decoupling storage functionality from the Kubernetes core, Kubernetes CSI promotes interoperability, flexibility, and extensibility. This enables organizations to seamlessly leverage a wide range of storage solutions in their Kubernetes environments, tailoring the storage to the needs of each container individually.

With the evolving ecosystem and changing Kubernetes workloads towards databases and other IO-intensive or low-latency applications, storage becomes increasingly important. Simplyblock is your distributed, disaggregated, high-performance, predictable, low-latency, and resilient storage solution. Simplyblock is tightly integrated with Kubernetes through the CSI driver and available as a StorageClass. It enables storage virtualization with overcommitment, thin provisioning, NVMe over TCP access, copy-on-write snapshots, and many more features.

If you want to learn more about simplyblock, read “Why simplyblock?” If you want to get started, we believe in simple pricing.

AWS EBS Pricing: A Comprehensive Guide
https://www.simplyblock.io/blog/aws-ebs-pricing-a-comprehensive-guide/ | Wed, 28 Feb 2024

In the vast landscape of cloud computing, Amazon Elastic Block Store (Amazon EBS) stands out as a crucial component for storage in AWS’ Amazon EKS (Elastic Kubernetes Service), as well as other AWS services.

As businesses increasingly migrate to the cloud, or build newer applications as cloud-native services, understanding the cloud cost becomes essential for cost-effective operations. With Amazon EBS often making up 50% or more of the cloud cost, it is important to grasp the intricacies of Amazon EBS pricing, explore the key concepts, and understand the main factors that influence cost, as well as strategies to optimize expenses.

Understanding Amazon EBS

Amazon EBS provides scalable block-level storage volumes for use with Amazon EKS Persistent Volumes, EC2 instances, and other Amazon services. It offers various volume types, each designed for specific use cases, such as General Purpose (SSD), Provisioned IOPS (SSD), and HDD-based volumes. The choice of volume type significantly impacts performance and cost, making it vital to align storage configurations with application requirements.

Amazon EBS Pricing Breakdown

AWS pricing is complicated and requires studying the different regions and available options, as well as making some good estimations of a service’s own behavior in terms of speed and capacity requirements.

With Amazon EBS, a set of different factors influences availability, performance, capacity, and most prominently the cost.

Volume Type and Performance

Different workloads demand different levels of performance. Understanding the nature of your applications and selecting the appropriate volume type is crucial to balance cost and performance. The available volume types will be discussed further down in the blog post.

Volume Size

Amazon EBS volumes come in various sizes, and costs scale with the amount of provisioned storage per volume. Assessing the actual storage requirements and adjusting volume sizes accordingly to avoid over-provisioning can influence the cost quite significantly.

Snapshot Costs

Creating snapshots for backup and disaster recovery is a common practice. However, snapshot costs can accumulate as the frequency and volume of snapshots increase; the cost scales with the number and types of snapshots created. Additionally, there are two types of snapshots: standard, which is the default, and archive, which is cheaper on the storage side but incurs cost when being restored. Implementing a snapshot management strategy to control expenses is crucial.

Throughput and I/O Operations

Throughput and I/O operations may or may not incur additional costs, depending on the selected volume type.

While data transfer is often easy to estimate, the necessary values for throughput and I/O operations per second (also known as IOPS) are much harder. IOPS especially can make up a fair amount of the spending when running I/O-intensive workloads, such as databases, data warehouses, high-load web servers, or similar.

Be mindful of the amount of data transferred in and out of your EBS volumes, as well as the number of I/O operations performed.

Amazon EBS Volume Types

As mentioned above, Amazon EBS has quite the set of different volume types. Some are designed for specific use cases or to provide a cost-effective alternative, while others are older or newer generations for the same usage scenario.

An in-depth technical description of the different volume types can be found on AWS’ documentation .

Cheap Storage Volumes (st1 / sc1)

The first category is designed for storage volumes that require large amounts of data storage which, at the same time, doesn’t need to provide the highest performance characteristics.

Being based upon HDD disks, the access latency is high and transfer speed is fairly low. The volume can be scaled up to 16TiB each though, reaching a high capacity at a cheap price.

Durability is typically given as 99.8% – 99.9%, which corresponds to an annual failure rate of 0.1% – 0.2% for the stored data. Warm (throughput-optimized) and cold volumes are available, relating to the types st1 and sc1 respectively.

General Purpose Volumes (gp2 / gp3)

The second category is, what AWS calls, general purpose. It has the widest applicability and is the default option when looking for an Amazon EBS volume.

When creating volumes, gp2 should be avoided, being the old generation at the same price but with fewer features. That said, gp3 provides higher throughput and IOPS than st1 and sc1 volumes due to being SSD-based storage. Like the HDD-based volumes, durability is in the same range of 99.8% – 99.9% (an annual failure rate of 0.1% – 0.2%). Likewise with capacity: volumes can be scaled up to 16TiB each and are therefore perfect for a variety of use cases, such as boot volumes, simple transactional workloads, smaller databases, and similar.
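Since gp2 should be avoided in favor of gp3, existing gp2 volumes can be migrated in place. A hedged sketch using boto3 (the region and filter values are illustrative assumptions; test before running against production volumes):

# Find gp2 volumes and request an in-place migration to gp3.
# Region and filter values are illustrative assumptions.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volumes = ec2.describe_volumes(
    Filters=[{"Name": "volume-type", "Values": ["gp2"]}]
)["Volumes"]

for vol in volumes:
    print(f"Migrating {vol['VolumeId']} ({vol['Size']} GiB) to gp3")
    ec2.modify_volume(VolumeId=vol["VolumeId"], VolumeType="gp3")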

Provisioned IOPS Volumes (io1 / io2)

The third option is high-performance SSD-based (and NVMe-based) volumes with separately provisioned IOPS; the pricing example below is based on an io2 volume.

Amazon EBS Pricing

Prices for Amazon EBS volumes and additional upgrades depend on the region they are created in. For that reason, it is not possible to give one exact price breakdown. It is possible, however, to give an overview of which features have separate prices, along with an example for one specific region.

The base Amazon EBS volume types, ordered by normal price (per GB-month) from cheapest to most expensive:

  1. HDD-based sc1
  2. HDD-based st1
  3. SSD-based gp2
  4. SSD-based gp3
  5. SSD-based io1 and io2

In addition to the base pricing, there are certain capabilities or aspects which can be increased for an additional cost, such as I/O operations per second (IOPS) and throughput.

Amazon EBS Pricing example

Figure: Amazon EBS cost breakdown of an io2 volume

And this is where it gets a bit more complicated. Every type of volume has its own set of base and maximum available capabilities. Not all capabilities are available on all volume types, though.

In our example, we want to create an Amazon EBS volume of type io2 in the US-EAST with 10 TB storage capacity. In addition we want to increase the available IOPS to 80,000 – just to make it complicated. For newer io2 volumes, the throughput scales proportionally with provisioned IOPS up to 4,000 MiB/s, meaning we don’t have to pay extra.

Base price for the io2 volume: The volume’s base cost is 0.125 USD/GB-month. That said, our 10 TB volume comes up to 1,250 USD per month.

Throughput capability pricing: The throughput of up to 4,000 MiB/s is automatically scaled proportionally to the provisioned IOPS, so all is good here. For other volume types, additional throughput (over the base amount) can be bought.

IOPS capability pricing: The pricing for IOPS is where io2 volumes get complicated: they have multiple “discount stages”, with price breaks at 32,000 and 64,000 IOPS.

With that in mind, the IOPS pricing can be broken down into:

  • 0 – 32,000 IOPS x 0.065 USD/IOPS-month = 2,080.00 USD/month
  • 32,001 – 64,000 IOPS (32,000 IOPS) x 0.046 USD/IOPS-month = 1,472.00 USD/month
  • 64,001 – 80,000 IOPS (16,000 IOPS) x 0.032 USD/IOPS-month = 512.00 USD/month

Cost of the io2 volume: That means, including all cost factors (USD 1,250.00 + USD 2,080.00 + USD 1,472.00 + USD 512.00), the cost adds up to a monthly fee of USD 5,314.00 – for a single volume.
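The same tiered calculation, reproduced as a small Python sketch (prices are the io2 US-EAST figures used in the example above):

# Monthly cost of an io2 volume: per-GB storage price plus tiered IOPS
# pricing, using the example prices from the text above.
GB_MONTH_PRICE = 0.125
IOPS_TIERS = [             # (tier ceiling in IOPS, USD per IOPS-month)
    (32_000, 0.065),
    (64_000, 0.046),
    (float("inf"), 0.032),
]

def io2_monthly_cost(size_gb: int, provisioned_iops: int) -> float:
    cost = size_gb * GB_MONTH_PRICE
    tier_floor = 0
    for ceiling, price in IOPS_TIERS:
        iops_in_tier = max(0, min(provisioned_iops, ceiling) - tier_floor)
        cost += iops_in_tier * price
        tier_floor = ceiling
    return cost

print(io2_monthly_cost(10_000, 80_000))  # 5314.0 USD per month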

Strategies to Optimize Amazon EBS Spending

Amazon EBS volumes can be expensive as just shown. Therefore, it is important to keep the following strategies for cost reduction and optimization in mind.

Rightsize your Volumes

Regularly assess your storage requirements and resize volumes accordingly. Downsizing or upsizing volumes based on actual needs can result in significant cost savings. If auto-growing of volumes is enabled, keep the disk growth in check. Log files, or similar, running amok can blow your spend limit in hours.

Utilize Provisioned IOPS Wisely

Provisioned IOPS volumes offer high-performance storage but come at a high cost. Use them judiciously (and not ludicrously) for applications that require consistent and low-latency performance, and consider alternatives for less demanding workloads.

Implement Snapshot Lifecycle Policies

Set up lifecycle policies for snapshots to manage retention periods and reduce unnecessary storage costs. Periodically review and clean up outdated snapshots to optimize storage usage.
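One way to automate this is Amazon Data Lifecycle Manager (DLM). A hedged boto3 sketch creating a daily-snapshot policy with a 7-day retention; the role ARN, tags, and schedule are placeholder assumptions to adjust for your environment:

# Create a DLM lifecycle policy: daily snapshots of volumes tagged
# backup=daily, keeping the last 7. All identifiers are placeholders.
import boto3

dlm = boto3.client("dlm", region_name="us-east-1")

dlm.create_lifecycle_policy(
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
    Description="Daily EBS snapshots, keep the last 7",
    State="ENABLED",
    PolicyDetails={
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "backup", "Value": "daily"}],
        "Schedules": [{
            "Name": "daily-keep-7",
            "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
            "RetainRule": {"Count": 7},
        }],
    },
)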

Leverage EBS-Optimized Instances

Use EC2 instances that are EBS-optimized for better performance. This ensures that the network traffic between EC2 instances and EBS volumes does not negatively impact overall system performance.

Conclusive Thoughts

As businesses continue to leverage AWS services, understanding and optimizing Amazon EBS spending is a key aspect of efficient cloud management. By carefully selecting the right volume types, managing sizes, and implementing cost-saving strategies, organizations can strike a balance between performance and cost-effectiveness in their cloud storage infrastructure. Regular monitoring and adjustment of storage configurations will contribute to a well-optimized and cost-efficient AWS environment.

If this feels too complicated or the requirements are hard to predict, simplyblock offers an easier, more scalable, and future-proof solution. Running right in your AWS account, simplyblock provides you with the fastest and easiest way to build your own Amazon EBS alternative for Kubernetes, saving 60% and more on storage cost at the same time. Learn here how simplyblock works.
