Scale Up vs Scale Out: System Scalability Strategies

TLDR: Horizontal scalability (scale out) describes a system that scales by adding more resources through parallel systems, whereas vertical scalability (scale up) increases the amount of resources on a single system.

One of the most important questions to answer when designing an application or infrastructure is the architectural approach to system scalability. Traditionally, systems used the scale-up approach, also known as vertical scalability. Many modern systems, especially in the cloud-native ecosystem, use a scale-out approach instead, also called horizontal scalability.

Scale-Up vs Scale-Out: Which System Architecture is Right for You?

Understanding the Basics

Understanding the fundamental concepts is essential when discussing system architectures. Hence, let’s briefly overview the two approaches before exploring them in more depth.

  • With Scale Up (Vertical Scalability), you increase resources (typically CPU, memory, and storage) in the existing system to improve performance and capacity.
  • With Scale Out (Horizontal Scalability), you add additional nodes or machines to the existing workforce to distribute the workload across multiple systems.

Both architectural approaches have their respective advantages and disadvantages. While scale-up architectures are easier to implement, they are harder to scale at a certain point. On the other hand, scale-out architectures are more complex to implement but scale almost linearly if done right.

Vertical Scaling (Scale Up) Architectures: The Traditional Approach

Figure 1: Scale-up storage architecture with disks being added to the same machine

Vertical scaling, commonly known as scaling up, involves adding more resources to an existing system to increase its power or capacity.

Think of it as upgrading your personal computer. Instead of buying a second computer, you add more RAM or install a faster processor or a larger storage device. In enterprise storage systems, this typically means adding more CPU cores, memory, or storage drives to an existing server. For virtual machines, it usually means increasing the resources the host machine assigns to them.

To clarify, let’s use a real-world example from the storage industry. A ZFS-based SAN (Storage Area Network) system requires a scale-up design. Or, as Jason Lohrey wrote: “However, ZFS has a significant issue – it can’t scale out. ZFS’s biggest limitation is that it is ‘scale-up’ only.” ZFS, as awesome as it is, is limited to a single machine. Consequently, increasing storage capacity always means adding larger or more disks to that existing machine. This approach maintains the simplicity of the original architecture while increasing storage capacity and potentially improving performance.

Strengths of Vertical Scaling

Today, many people see the vertical scalability approach as outdated and superfluous. That is, however, not necessarily true. Vertical scaling shines in several scenarios.

First, implementing a scale-up system is generally more straightforward since it doesn’t require changes to your application architecture or complex data distribution logic. When you scale up a transactional database like PostgreSQL or MySQL, you essentially give it more resources while maintaining the same operational model.

Secondly, the management overhead is lower. Tasks such as backups, monitoring, and maintenance are straightforward. This simplicity often translates to lower operational costs despite the potentially higher hardware costs.

Here is a quick overview of all the advantages:

  1. Simplicity: It’s straightforward to implement since you’re just adding resources to an existing system
  2. Lower Complexity: Less architectural overhead since you’re working with a single system
  3. Consistent Performance: Lower latency due to all resources being in one place
  4. Software Compatibility: Most traditional software is designed to run on a single system
  5. Lower Initial Costs: Often cheaper for smaller workloads due to simpler licensing and management

Weaknesses and Limitations of Scale-Up Architectures

Like anything in this world, vertical scaling architectures also have drawbacks. The most significant limitation is the so-called physical ceiling. A system is limited by its server chassis’s space capacity or the hardware architecture’s limitation. You can only add as much hardware as those limitations allow. Alternatively, you need to migrate to a bigger base system.

Traditional monolithic applications often face another challenge with vertical scaling: adding more resources doesn’t always translate to linear performance improvements. For example, doubling the CPU cores might yield only a 50% performance increase due to software architecture limitations, especially resource contention.
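
One way to make this effect concrete (the post doesn’t name it, but Amdahl’s law is the standard model here): if only a fraction p of a workload can execute in parallel, the maximum speedup on n cores is

    S(n) = 1 / ((1 - p) + p / n)

With p ≈ 2/3, going from one core to two yields S(2) = 1 / (1/3 + 1/3) = 1.5, which is exactly the “double the cores, get 50% more performance” scenario described above.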

Here is a quick overview of all the disadvantages:

  1. Hardware Limits: The physical ceiling limits how much you can scale up based on maximum hardware specifications
  2. Downtime During Upgrades: Usually requires system shutdown for hardware upgrades
  3. Cost Efficiency: High-end hardware becomes exponentially more expensive
  4. Single Point of Failure: No built-in redundancy
  5. Limited Flexibility: Cannot easily scale back down when demand decreases

When to Scale Up?

After all that, here is when you really want to go with a scale-up architecture:

  • You have traditional monolithic applications
  • You look for an easier way to optimize for performance, not capacity
  • You’re dealing with applications that aren’t designed for distributed computing
  • You need a quick solution for immediate performance issues

Horizontal Scaling (Scale Out) Architectures: The Distributed Approach

Figure 2: Scale-out storage architecture with additional nodes being added to the cluster

Horizontal scaling, or scale-out architecture, takes a fundamentally different approach. Instead of increasing the available resources of an existing system, you add more systems and distribute the load across them. This is similar to adding more workers to an assembly line rather than trying to make one worker more efficient.

Consider a distributed storage system like simplyblock or a distributed database like MongoDB. When you scale out these systems, you add more nodes to the cluster, and the workload gets distributed across all nodes. Each node handles a portion of the data and processing, allowing the system to grow almost limitlessly.

Advantages of Horizontal Scaling

Large-scale deployments and highly distributed systems are the forte of scale-out architectures. As a simple example, most modern web applications utilize load balancers. They distribute the traffic across multiple application servers. This allows us to handle millions of concurrent requests and users. Similarly, distributed storage systems like simplyblock scale to petabytes of data by adding additional storage nodes.

Secondly, another significant advantage is improved high availability and fault tolerance. In a properly designed scale-out system, if one node fails, the system continues operating. While it may degrade to a reduced service, it will not experience a complete system failure or outage.

To bring this all to a point:

  1. Near-Infinite Scalability: Can continue adding nodes as needed
  2. Better Fault Tolerance: Built-in redundancy through multiple nodes
  3. Cost Effectiveness: Can use commodity hardware
  4. Flexible Resource Allocation: Easy to scale up or down based on demand
  5. High Availability: No single point of failure

The Cost of Distribution: Weakness and Limitations of Horizontal Scalability

The primary challenge with scale-out architectures is complexity. Distributed systems must maintain data consistency across system boundaries, handle network communication and latency, and manage failure recovery. Multiple consensus algorithms have been developed over the years; the most commonly used are Raft and Paxos, but that’s a different blog post. Either way, this complexity typically requires more sophisticated management tools and distributed-systems expertise, usually also within the team operating the system.

The second challenge is the overhead of system coordination. In a distributed system, nodes must synchronize their operations. If not careful, this can introduce latency and even reduce the performance of certain types of operations. Great distributed systems utilize sophisticated algorithms to prevent these issues from happening.

Here is a quick overview of the disadvantages of horizontal scaling:

  1. Increased Complexity: More moving parts to manage
  2. Data Consistency Challenges: Maintaining consistency across nodes can be complex
  3. Higher Initial Setup Costs: Requires more infrastructure and planning
  4. Software Requirements: Applications must be designed for distributed computing
  5. Network Overhead: Communication between nodes adds latency

Kubernetes: A Modern Approach to Scaling

Kubernetes has become the de facto platform for container orchestration. It comes in multiple varieties, whether in its vanilla form or as the basis for systems like OpenShift or Rancher. Either way, it supports both vertical and horizontal scaling, and it has become a necessity when deploying scale-out services. Let’s look at how different workloads scale in a Kubernetes environment.

Scaling Stateless Workloads

Stateless applications, like web servers or API gateways, are natural candidates for horizontal scaling in Kubernetes. The Horizontal Pod Autoscaler (HPA) provided by Kubernetes automatically adjusts the number of pods based on metrics such as CPU or RAM utilization. Custom metrics as triggers are also possible.
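
To make this rule concrete, here is a minimal Go sketch of the replica calculation the Kubernetes documentation gives for the HPA: desired = ceil(currentReplicas × currentMetric / targetMetric). It illustrates the formula only; it is not the actual HPA source code:

    package main

    import (
        "fmt"
        "math"
    )

    // desiredReplicas applies the documented HPA scaling rule:
    // desired = ceil(current * currentMetric / targetMetric).
    func desiredReplicas(current int, currentUtil, targetUtil float64) int {
        return int(math.Ceil(float64(current) * currentUtil / targetUtil))
    }

    func main() {
        // 4 pods averaging 90% CPU against a 60% utilization target -> 6 pods.
        fmt.Println(desiredReplicas(4, 90, 60)) // prints 6
    }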

Horizontally scaling stateless applications is easy. As the name suggests, stateless applications do not maintain persistent local or shared state data. Each instance or pod is entirely independent and interchangeable. Each request to the service contains all the required information needed for processing.

That said, automatic scaling up and down (in the sense of starting new instances or shutting them down) is part of the typical lifecycle and can happen at any point in time.

Scaling Stateful Workloads

Stateful workloads, like databases, require more careful consideration.

A common approach for more traditional databases like PostgreSQL or MySQL is to use a primary-replica architecture. In this design, write operations always go to the primary instance, while read operations can be distributed across all replicas.
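
As a minimal sketch of that read/write split (the node names and the round-robin policy are illustrative assumptions; real database proxies handle failover and replication lag far more robustly):

    package main

    import "fmt"

    // Cluster routes writes to the primary and fans reads out over replicas.
    type Cluster struct {
        primary  string
        replicas []string
        next     int
    }

    // Write operations always go to the primary instance.
    func (c *Cluster) RouteWrite() string { return c.primary }

    // Read operations round-robin across the read replicas.
    func (c *Cluster) RouteRead() string {
        r := c.replicas[c.next%len(c.replicas)]
        c.next++
        return r
    }

    func main() {
        c := &Cluster{primary: "pg-0", replicas: []string{"pg-1", "pg-2"}}
        fmt.Println(c.RouteWrite(), c.RouteRead(), c.RouteRead()) // pg-0 pg-1 pg-2
    }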

On the other hand, MongoDB, which uses a distributed database design, can scale out more naturally by adding more shards to the cluster. Their internal cluster design uses a technique called sharding. Data is assigned to horizontally scaling partitions distributed across the cluster nodes. Shard assignment happens either automatically (based on the data) or by providing a specific shard key, enabling data affinity. Adding a shard to the cluster will increase capacity when additional scale is necessary. Data rebalancing happens automatically.
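
The core idea of hashed shard assignment fits in a few lines of Go. This shows the general technique only; MongoDB’s actual implementation assigns ranges of hashed key values to chunks and manages them through its config servers:

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // shardFor maps a shard key to one of numShards partitions by hashing it.
    func shardFor(shardKey string, numShards int) int {
        h := fnv.New32a()
        h.Write([]byte(shardKey))
        return int(h.Sum32() % uint32(numShards))
    }

    func main() {
        for _, key := range []string{"user:1001", "user:1002", "user:1003"} {
            fmt.Printf("%s -> shard %d\n", key, shardFor(key, 4))
        }
    }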

Why We Built Simplyblock on a Scale-Out Architecture

Figure 3: Simplyblock’s scale-out architecture with storage pooling via cluster nodes

Stateful workloads, like PostgreSQL or MySQL, can scale out by adding additional read replicas to the cluster. However, every single instance needs storage for its own data. Hence, the need for scalable storage arises.

Simplyblock is a cloud-native and distributed storage platform built to deliver scalable performance and virtually infinite capacity for logical devices through horizontal scalability. Unlike traditional storage systems, simplyblock distributes data across all cluster nodes, multiplying the performance and capacity.

Designed with an NVMe-first architecture, simplyblock uses the NVMe over Fabrics protocol family. This extends the reach of the highly scalable NVMe protocol over network fabrics such as TCP, Fibre Channel, and others. Furthermore, it provides built-in support for multi-pathing, enabling seamless failover and load balancing.

The system uses a distributed data placement algorithm to spread data across all available cluster nodes, automatically rebalancing data when nodes are added or removed. When writing data, simplyblock splits the item into multiple, smaller chunks and distributes them. This allows for parallel access during read operations. The data distribution also provides redundancy, with parity information stored on other nodes in the cluster. This protects the data against individual disk and node failures.
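
The splitting-and-distribution idea can be sketched as follows. This is a simplified illustration of chunked, round-robin placement; simplyblock’s actual algorithm also computes parity and rebalances on cluster changes, and all names here are invented for the example:

    package main

    import "fmt"

    // placeChunks splits a write into fixed-size chunks and spreads them
    // across nodes round-robin, enabling parallel reads later.
    func placeChunks(data []byte, chunkSize, numNodes int) map[int][][]byte {
        placement := make(map[int][][]byte)
        node := 0
        for i := 0; i < len(data); i += chunkSize {
            end := i + chunkSize
            if end > len(data) {
                end = len(data)
            }
            placement[node%numNodes] = append(placement[node%numNodes], data[i:end])
            node++
        }
        return placement
    }

    func main() {
        // A 40-byte item split into 16-byte chunks lands on 3 different nodes.
        for node, chunks := range placeChunks(make([]byte, 40), 16, 3) {
            fmt.Printf("node %d holds %d chunk(s)\n", node, len(chunks))
        }
    }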

Using this architecture, simplyblock provides linear capacity and performance scalability by pooling all available disks and parallelizing access. This enables simplyblock to scale from mere terabytes to multiple petabytes while maintaining performance, consistency, and durability characteristics throughout the cluster-growth process.

Building Future-Proof Infrastructure

To wrap up, when you build out a new system infrastructure or application, consider these facts:

Figure 4: Flowchart when to scale-up or scale-out?
  1. Workload characteristics: CPU-intensive workloads might benefit more from vertical scaling. Distributing operations comes with its own overhead; if the gain from an individual operation doesn’t offset this overhead, you might see lower performance than with vertical scaling. On the other hand, I/O-heavy workloads might perform better with horizontal scaling. If the access patterns are highly parallelizable, a horizontal architecture will most likely outscale a vertical one.
  2. Growth patterns: Predictable, steady growth might favor scaling up, while rapid growth patterns might necessitate the flexibility of scaling out. This isn’t a hard rule, though. A carefully designed scale-out system will provide a very predictable growth pattern and latency. However, the application isn’t the only element to take into account when designing the system, as there are other components, most prominently the network and network equipment.
  3. Future-Proofing: Scaling out often requires little upfront investment in infrastructure but higher investment in development and expertise. It can, however, provide better long-term cost efficiency for large deployments. That said, buying a scale-out solution is a great idea. With a storage solution like simplyblock, for example, you can start small and add required resources whenever necessary. With traditional storage solutions, you have to go with a higher upfront cost and are limited by the physical ceiling.
  4. Operational Complexity: Scale-up architectures are typically easier to manage, while a stronger DevOps or operations team is required to handle scale-out solutions. That’s why simplyblock’s design is carefully crafted to be fully autonomous and self-healing, with as few hands-on requirements as possible.

The Answer Depends

That means there is no universal answer to whether scaling up or out is better. A consultant would say, “It depends.” Seriously, it does. It depends on your specific requirements, constraints, and goals.

Many successful organizations use a hybrid approach, scaling up individual nodes while also scaling out their overall infrastructure. The key is understanding the trade-offs and choosing the best approach to your needs while keeping future growth in mind. Hence, simplyblock provides the general scale-out architecture for infinite scalability. It also provides a way to utilize storage located in Kubernetes worker nodes as part of the storage cluster to provide the highest possible performance. At the same time, it maintains the option to spill over when local capacity is reached and the high durability and fault tolerance of a fully distributed storage system.

Remember, the best scaling strategy aligns with your business objectives while maintaining performance, reliability, and cost-effectiveness. Whether you scale up, out, or both, ensure your choice supports your long-term infrastructure goals.

Figure 5: Simple definition of scale up vs scale out.

Say Goodbye to High Data Access-Latency Cloud Storage with Simplyblock’s Local NVMe SSD Caching

When using persistent storage on cloud providers, there are two options: locally attached (like AWS’ Instance Store or GCP Local SSD) or remote (like Amazon EBS or Google Persistent Disks). Both have pros and cons. Imagine a world where you could get the best of both options with persistent storage and local SSD caching.

Discord dreamed of something like it a little while ago but couldn’t find a solution, so they started implementing their own. But not too fast. Let’s get the background out of the way first.

The Challenge: Balancing Performance and Cost

Anyone running a high-velocity database in the cloud constantly struggles to balance high performance against cloud spending. It’s not only you and me; it’s everyone. Discord faced the same challenge, storing more than 4 billion messages from millions of users per day.

My immediate first reaction is local storage. It has low latency and high throughput. But it comes, first and foremost, at a high price, and it brings problems around backup and scalability.

Persistent cloud disks, on the other hand, offer great scalability and are automatically replicated. However, due to the introduced network latency, they typically have lower throughput and higher access latency, which quickly becomes the main bottleneck in a system. You can work around this, but at a hefty price.

Simplyblock’s Local SSD Caching Solution

At simplyblock, we love a good challenge. Hence, we took it upon ourselves to implement the best of both worlds: a super-scalable, low-latency software-defined storage engine with zero-downtime resource scalability and the option to add local SSD caching for the lowest latency possible.

This feature helps to significantly boost the read performance of simplyblock’s logical SSD volumes by caching data locally on directly attached SSD devices. But what does this mean in practice, and how does it benefit users?

How does Local SSD Caching Work?

Figure 1: Simplyblock’s storage and local SSD caching architecture

At its core, simplyblock’s caching functionality leverages the blazing-fast speeds of local NVMe SSDs to create a high-performance and transparent caching layer. Here’s our simplified process:

  1. When data is read from the logical volume, it’s stored in the local SSD cache (read-behind).
  2. Subsequent read requests for the same data are served directly from the cache, bypassing access to the main storage volume.
  3. This caching mechanism dramatically reduces latency and increases throughput for frequently accessed data.
  4. When data is written to the logical volume, the information is passed through to the underlying backend storage and stored locally in the cache (write-through).
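
A minimal Go sketch of this read-behind / write-through flow, with maps standing in for the local SSD cache and the backend volume (the real caching layer operates on block devices, not key-value maps):

    package main

    import "fmt"

    // CachedVolume mimics a logical volume with a local read cache.
    type CachedVolume struct {
        cache   map[string][]byte // stand-in for the local NVMe cache
        backend map[string][]byte // stand-in for the backend storage
    }

    func (v *CachedVolume) Read(key string) []byte {
        if data, ok := v.cache[key]; ok {
            return data // cache hit: served locally, no network round-trip
        }
        data := v.backend[key] // cache miss: fetch from the backend...
        v.cache[key] = data    // ...and populate the cache (read-behind)
        return data
    }

    func (v *CachedVolume) Write(key string, data []byte) {
        v.backend[key] = data // persist to the backend first (write-through)
        v.cache[key] = data   // then keep a local copy for future reads
    }

    func main() {
        v := &CachedVolume{cache: map[string][]byte{}, backend: map[string][]byte{}}
        v.Write("block-7", []byte("payload"))
        fmt.Println(string(v.Read("block-7"))) // hit: served from the cache
    }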

The beauty of this approach lies in its transparency, simplicity, and effectiveness. By caching data close to your workload, simplyblock eliminates the performance bottlenecks associated with network-attached storage, all while maintaining the benefits of centralized storage management.

Additionally, the caching layer is built into the solution and fully transparent to your application, meaning there is no need to change the application. As a developer, my heart could not jump higher.

What are the Benefits of Local SSD Caching?

In addition, simplyblock’s local SSD caching brings quite a few more benefits to the table:

  1. Dramatically Improved Read Performance: By serving frequently accessed data from local SSDs, latency is significantly reduced, and throughput is increased.
  2. Cost-Effective Performance Boost: Users can achieve high-performance storage without the need to invest in expensive, all-flash storage for their entire dataset.
  3. Scalability: The caching functionality can be easily scaled to meet growing performance demands without overhauling the entire storage infrastructure.
  4. Flexibility: Simplyblock’s solution provides logical (virtual) block storage devices, which means it works seamlessly with existing setups and provides a performance boost without disrupting current deployment processes.
  5. Optimized Resource Utilization: By caching hot data locally, network traffic to the main storage volume is reduced, optimizing overall resource utilization.
  6. Transparency: Making caching an integral part of simplyblock’s storage architecture, it is fully transparent to users. Hence, it works with any existing workload, such as databases, analytics tools, etc.

Real-World Validation: Discord’s Super-Disks

As mentioned before, in 2021 / 2022, Discord faced a similar challenge. They needed to scale persistent storage and get the benefits of internal storage data protection (in their case, GCP Persistent Disk with automatic replication). Still, they found that remotely attached disks were “too slow.”

Specifically, Discord, known for its popular communication platform, encountered performance issues using Google Cloud Platform’s Persistent Disks. Much like simplyblock’s caching functionality, their workaround involved using local SSDs to cache data and dramatically improve performance.

In his blog post from 2022, Glen Oakley, Senior Software Engineer at Discord, explained the reason to implement it in-house: “No such disk exists, at least not within the ecosystem of common cloud providers.” He also noted: “By using local SSDs as a cache, we can get the best of both worlds: the durability and ease-of-use of network block storage, with performance that rivals local disks.”

This is what simplyblock is. But it’s always good to have a real-world example from a major tech player to underscore the validity and importance of local caching solutions like simplyblock’s. Clearly, the challenge of balancing performance and cost-effectiveness in cloud storage is a widespread concern, and innovative, standardized caching solutions are emerging as a powerful answer. Who knows how many teams have already built their own workaround? Even during my time at Hazelcast, I kept telling people, “Don’t roll your own caching; it comes back to bite you,” as caching is commonly more complicated than just storing stuff.

Anyhow, I totally recommend you read Glen’s full blog post at Discord’s blog site.

Simplyblock’s Architecture: A Closer Look at Local SSD Caching

While the concept behind simplyblock’s local SSD caching is straightforward, the implementation stands out in more than one regard and is designed for maximum storage optimization. Let’s dive deeper into simplyblock.

Figure 2: Simplyblock’s Intelligent Caching Solution

SPDK-Based

Simplyblock’s persistent storage engine is built upon the Storage Performance Development Kit (SPDK) but extends it with data distribution, cluster management, erasure coding, and more. SPDK is a set of tools and libraries designed to write high-performance, scalable, user-mode storage applications. By integrating SPDK, simplyblock ensures that its storage solution is effective, highly efficient, and scalable.

Flexible Configuration

One of the standout features of simplyblock’s implementation is its flexibility. You can easily configure logical volumes, their performance profiles, and caching properties to suit your specific needs. This includes setting up custom cache sizes, choosing between different caching algorithms, and fine-tuning other parameters to optimize performance for your unique workloads.
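
As a purely hypothetical illustration of such knobs (these field names and values are invented for this sketch and are not simplyblock’s actual configuration API), a volume definition could look like this:

    package main

    import "fmt"

    // VolumeConfig gathers the kinds of per-volume settings described above.
    type VolumeConfig struct {
        Name           string
        SizeGB         int
        MaxIOPS        int    // performance profile
        CacheSizeGB    int    // local SSD cache reserved for this volume
        EvictionPolicy string // e.g. "lru"; an illustrative value
    }

    func main() {
        cfg := VolumeConfig{
            Name: "pg-data", SizeGB: 512, MaxIOPS: 50000,
            CacheSizeGB: 64, EvictionPolicy: "lru",
        }
        fmt.Printf("%+v\n", cfg)
    }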

Seamless Integration

Simplyblock has designed its caching functionality to be as seamless and transparent to you as possible. This means you can implement this performance-boosting feature without disrupting your current infrastructure or requiring significant changes to your workload.

Intelligent Cache Management

Behind the scenes, simplyblock’s caching solution employs algorithms to manage the data and caches effectively. This includes:

  • Intelligent data placement to ensure the most frequently accessed data is always readily available
  • Efficient cache eviction policies to make room for new data when the cache fills up
  • Consistency mechanisms to ensure data integrity between the cache and the main storage volume (write-through and read-behind)
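
The post doesn’t specify which eviction policy simplyblock uses, so as a representative example, here is a minimal least-recently-used (LRU) eviction sketch in Go:

    package main

    import (
        "container/list"
        "fmt"
    )

    // lruCache tracks recency and evicts the coldest key once full.
    type lruCache struct {
        capacity int
        order    *list.List // front = most recently used
        items    map[string]*list.Element
    }

    func newLRU(capacity int) *lruCache {
        return &lruCache{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
    }

    // Touch records an access, evicting the least recently used key if needed.
    func (c *lruCache) Touch(key string) {
        if el, ok := c.items[key]; ok {
            c.order.MoveToFront(el) // refresh recency on every access
            return
        }
        if c.order.Len() == c.capacity { // full: evict the coldest entry
            oldest := c.order.Back()
            delete(c.items, oldest.Value.(string))
            c.order.Remove(oldest)
        }
        c.items[key] = c.order.PushFront(key)
    }

    func main() {
        c := newLRU(2)
        c.Touch("a"); c.Touch("b"); c.Touch("a"); c.Touch("c") // "b" is evicted
        _, stillCached := c.items["b"]
        fmt.Println(stillCached) // prints false
    }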

The Bigger Picture: Transforming Storage Performance

Simplyblock’s local SSD caching functionality represents more than just a performance boost – it is a paradigm shift in how we approach storage in the cloud era. By bridging the gap between high-performance local storage and the flexibility of network-attached volumes, simplyblock is paving the way for a new generation of storage solutions.

This innovation has far-reaching implications across various industries and use cases:

  • High-Performance Computing: Research institutions and scientific organizations can accelerate data-intensive computations without breaking the bank on all-flash arrays.
  • Databases: High-velocity databases that benefit from low access latency and high throughput to quickly read data from disk for large reports or real-time data analytics.
  • Observability: Append-only or append-mostly storage engines and analytics platforms, running large aggregations and anomaly detection algorithms.
  • AI / Machine Learning: Running the latest LLMs (Large Language Models) or other AI workloads and training sets requires large amounts of data and high-performance storage, but also consistent latency for predictable runtime performance.
  • Financial Services: Banks and fintech companies can speed up transaction processing and analytics workloads, improving customer experiences and decision-making capabilities.
  • Media and Entertainment: Content delivery networks can cache frequently accessed media files locally, reducing latency and improving streaming quality.
  • E-commerce: Online retailers can boost the performance of their product catalogs and recommendation engines, leading to faster page loads and improved customer satisfaction.
  • Your High-Performance Workload: Whatever your data-intensive, low-latency sensitive, and high-performance use case looks like, simplyblock’s persistent storage solution can help you with a fully cloud-native solution.

Looking Ahead: The Future of Persistent Storage Performance

As data volumes continue to grow exponentially year-over-year and workloads, such as databases, become increasingly demanding, solutions like simplyblock’s local SSD caching will play a crucial role in shaping the future of storage performance. We can expect to see further innovations in this space, including:

  • More sophisticated caching algorithms leveraging machine learning to predict and preload data
  • Tighter integration between caching solutions and emerging technologies like persistent memory
  • Expansion of caching concepts to other areas of the data center, such as network and compute resources

Figure 3: Amount of data being generated annually (in Zettabyte), extrapolated 2018-2025, source Statista

Conclusion: Free the Full Performance Potential of Your Data

In an era where data is the lifeblood of our businesses, the ability to access and process information quickly can make or break an organization’s success. Simplyblock’s local SSD caching functionality represents a significant leap forward in our ability to harness the full performance potential of our data.

By offering a solution that combines the performance of local SSDs with the flexibility and scalability of network-attached persistent storage, simplyblock empowers businesses to achieve unprecedented storage performance without sacrificing cost-effectiveness or ease of use and management.

As we look to the future, innovations like simplyblock’s caching functionality will play a pivotal role in shaping the next generation of data storage and processing technologies. For organizations looking to stay ahead of the curve and unlock the full potential of their data, embracing these cutting-edge solutions is not just an option—it’s a necessity.

Learn more about simplyblock and its innovative storage solutions.

What is Software-Defined Storage (SDS)?

Software-defined (block) storage solutions, or SDS, decouple the software storage layer from the underlying hardware. This allows for centralized management and automation of storage resources through the software abstraction layer and enables performant and simplified deployments of block, file, and object storage.

Unlike traditional storage solutions, which typically rely heavily on proprietary hardware, software-defined storage leverages commodity hardware and virtualization technologies. Software-defined storage enables companies to deploy, operate, and scale storage resources with greater flexibility and cost efficiency. Simplyblock is a prime example of SDS, enabling unmatched deployment flexibility with the reliability of traditional SAN systems.

How Software-Defined Storage Works

Software-defined storage is, first and foremost, software that abstracts data management and the visible storage layer from the underlying hardware. This enables a high degree of flexibility when choosing storage hardware and provides the potential to build a storage solution that perfectly fits one’s needs in terms of performance, capacity, and scalability.

Software-defined storage has multiple facets. It is sometimes bundled as a full operating system (often based on Linux or FreeBSD) or as a software layer installed on a common OS (most commonly Linux) installation. In either case, the physical hardware is managed by a general-purpose operating system, while the storage management is delivered in software.

To run software-defined storage, a suitable hardware or virtualization platform needs to be selected. Depending on the SDS solution, virtual cloud hosts (e.g., Amazon EC2 instances, Google Compute Engine VMs, or similar), on-premises virtual machines such as VMware VMs, or physical, dedicated storage servers can be used. Either way, the “physical” layer provides the actual storage capacity.

Software-Defined Storage is Not…

While software-defined storage is often used as a synonym for storage virtualization, that isn’t actually true. Storage virtualization describes the capability to combine and pool multiple local or remote storage devices into a single, large storage pool. Many SDS solutions are also storage virtualization solutions to some extent, hence the mix-up of the two terms. However, building an SDS solution without the storage-pooling option is perfectly possible.

Software-defined storage is also not a SaaS (Software as a Service) or IaaS (Infrastructure as a Service) solution. While it can be provided as a hosted and managed platform, it is more often not and is operated by the customer directly. That comes down to multiple factors, such as data privacy concerns or regulatory requirements, as well as specific configuration requirements.

Last but not least, software-defined storage isn’t necessarily a NAS (Network Attached Storage) or SAN (Storage Area Network). Since an SDS isn’t required to be built from a cluster of storage nodes, or even a set of storage drives, there is no requirement to pool them into a single storage space. In addition, an SDS solution isn’t necessarily connected to the consuming host machine through a network interconnect. That said, while neither SAN nor NAS is a mandatory ingredient of SDS, just like storage virtualization, they are often part of an SDS solution for a broader set of use cases and increased flexibility.

Before and After: Software-Defined Storage vs Traditional Storage

Traditional enterprise storage setups are often based on proprietary hardware, meaning that multiple different storage systems are collected over time. These systems are often incompatible with each other, making it much harder to scale them or migrate between different solutions. That means that, more often than not, the setup gets stuck in time while new machines or generations are acquired for new use cases.

Traditional storage setup with separated storage solutions.

That leads to imbalanced use of the available storage resources. While some are at their capacity limit, others idle with plenty of unused free space. Migration between vendors or hardware generations is often complicated.

On the other hand, thanks to software-defined storage solutions, we have much more flexibility in terms of setups. Most SDS solutions feature storage virtualization (as mentioned above), which enables pooling the available storage resources and providing slices of them to the individual use cases.

Software Defined Storage solution with storage virtualization and combined performance and capacity.

These slices (e.g., logical block storage or any other pattern of storage type) can differ in capacity, performance characteristics, and even storage type. Depending on the software-defined storage in use, one or more of the typical storage types (file storage, block storage, and blob/object storage) may be available to workloads.

Due to the nature of storage pooling, migrations between the underlying, abstracted hardware are easy and normally (automatically) handled by the SDS. The same is true for scalability. If free available storage gets sparse, additional storage hardware can be added. Depending on the solution in place, this can be a seamless online operation or require downtime.
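
A minimal sketch of the pooling idea, with invented names and without thin provisioning, rebalancing, or failure handling:

    package main

    import "fmt"

    // Pool combines physical devices into one capacity pool from which
    // logical volumes are carved out.
    type Pool struct {
        capacityGB int
        allocated  int
        volumes    map[string]int
    }

    func (p *Pool) AddDevice(sizeGB int) { p.capacityGB += sizeGB }

    func (p *Pool) CreateVolume(name string, sizeGB int) error {
        if p.allocated+sizeGB > p.capacityGB {
            return fmt.Errorf("pool exhausted: only %d GB free", p.capacityGB-p.allocated)
        }
        p.allocated += sizeGB
        p.volumes[name] = sizeGB
        return nil
    }

    func main() {
        p := &Pool{volumes: map[string]int{}}
        p.AddDevice(1024) // pool two 1 TB disks...
        p.AddDevice(1024)
        fmt.Println(p.CreateVolume("db-data", 1536)) // ...and span both: <nil>
    }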

Benefits of Software-Defined Storage

With all that said, software-defined storage has some clear advantages over traditional, hardware-based storage options.

  1. The unified storage layer enables flexibility and easy migration. From a consumer’s perspective, the logical devices look the same, no matter where and how they are stored on the abstracted hardware.
  2. The typically integrated storage pooling enables a great degree of scalability. Starting small and adding hardware to the storage pool later allows for cost-effective storage usage without wasting unused capacity.
  3. Choosing your own hardware enables you to build storage systems that meet the requirements in terms of performance, reliability, and capacity. There is no vendor lock-in and no reliance on proprietary hardware.
  4. Overall, a typical software-defined storage solution enables the most cost-effective way to store data through optimized hardware configurations, storage pooling (storage virtualization), features like thin provisioning, and more.

Hyper-Converged Storage

Hyper-converged storage is a deployment pattern in which the storage solution is installed in the same cluster as the application. This consolidates storage, compute, and networking resources into a single integrated system.

This architecture co-locates storage with compute within a single cluster environment (most commonly Kubernetes). This simplifies management but often limits scalability and performance due to resource sharing with other use cases.

Hyper-converged storage solutions typically utilize distributed architectures and instance-local flash storage to deliver high throughput and low latency.

Disaggregated Storage

Disaggregated storage is an architecture in which storage resources are separated from compute resources, allowing them to be managed and scaled independently.

Unlike traditional storage systems where storage is tightly integrated with compute within individual servers or nodes, disaggregated storage pools storage resources separately from compute resources across a network.

Disaggregated storage enables easier scalability since storage resources and compute resources are distinct concerns and clusters. That means a storage cluster can be scaled up even if no additional compute resources are required. Many databases will grow over time, increasing the storage needs without additional compute power requirements.

Get the Most Out of Your Storage with Simplyblock

Simplyblock is the next generation of software-defined block storage, enabling storage requirements for the most demanding workloads. Pooled storage and our distributed data placement algorithm enable high IOPS per Gigabyte density, low, predictable latency, and high throughput. Using erasure coding (a better RAID) instead of replicas helps minimize storage overhead without sacrificing data safety and fault tolerance.
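
To illustrate the overhead difference (the post doesn’t state simplyblock’s exact coding parameters, so the 4+2 scheme below is only a common example):

    3-way replication: 2 extra copies per data chunk     -> 200% capacity overhead
    4+2 erasure code:  2 parity chunks per 4 data chunks -> 50% capacity overhead

Both layouts survive the loss of two devices, but the erasure-coded one needs only a quarter of the extra capacity.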

Additional features include instant snapshots (full and incremental), copy-on-write clones, thin provisioning, compression, encryption, and many more. Simplyblock’s software-defined block storage meets your requirements before you set them. Get started using simplyblock right now, or learn more about our feature set.
