Cloud Knowledge Archives | simplyblock
https://www.simplyblock.io/blog/categories/cloud-knowledge/

Serverless Compute Need Serverless Storage
https://www.simplyblock.io/blog/serverless-compute-need-serverless-storage/
Wed, 23 Oct 2024 11:37:27 +0000

The use of serverless infrastructures is steeply increasing. As the Datadog “State of Serverless 2023” survey shows, more than half of all cloud customers have already adopted a serverless environment on the three big hyperscalers—at least to some extent. The promise of saving cost while scaling automatically and without limits (up and down) keeps growing the user base.

Due to this movement, other cloud operators, many database companies (such as Neon and Nile), and infrastructure teams at large enterprises are building serverless environments, either on their premises or in their private cloud platforms.

While there are great options for serverless compute, providing serverless storage to your serverless platform tends to be more challenging. This is often fueled by a lack of understanding of what serverless storage has to provide and its requirements.

What is a Serverless Architecture?

Serverless architecture is a software design pattern that leverages serverless computing resources to build and run applications without managing the underlying architecture. These serverless compute resources are commonly provided by cloud providers such as AWS Lambda, Google Cloud Functions, or Azure Functions and can be dynamically scaled up and down.

Simplified serverless architecture: clients connect through an API gateway to a set of serverless functions that execute the business logic, with a database as an example of serverless storage.

When designing a serverless architecture, you’ll encounter the so-called Function-as-a-Service (FaaS), meaning that the application’s core logic will be implemented in small, stateless functions that respond to events.

That said, an application is typically made up of several such functions, which send events between them. Since the underlying infrastructure is abstracted away, the functions don’t know how requests or responses are handled, and because their implementations are built against a cloud-provider-specific API, they are effectively subject to vendor lock-in.
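The event-passing between stateless functions can be sketched in a few lines. This is a simplified, self-contained illustration; the function names, event fields, and the inline "dispatcher" are made up for the example, not any provider's actual API.

```python
import json

def resize_image_handler(event, context=None):
    """Stateless function: derives its entire output from the incoming event."""
    width = event["width"] // 2
    height = event["height"] // 2
    # Instead of keeping state, emit a follow-up event for the next function.
    return {"type": "image.resized", "width": width, "height": height}

def notify_handler(event, context=None):
    """Consumes the event produced by the previous function."""
    return {"message": f"Resized to {event['width']}x{event['height']}"}

# Simulated event flow between two functions (normally a platform
# dispatcher routes these events; here we call them directly).
first = resize_image_handler({"type": "image.uploaded", "width": 800, "height": 600})
second = notify_handler(first)
print(json.dumps(second))
```

Because each handler depends only on its input event, the platform is free to start, stop, or relocate instances between any two invocations.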

Cloud-vendor-agnostic solutions exist, such as knative, but they require at least part of the team to manage the underlying Kubernetes infrastructure. They can, however, take that burden away from other internal and external development teams.

What is Serverless Compute?

While a serverless architecture describes the application design that runs on top of a serverless compute infrastructure, serverless compute itself describes the cloud computing model in which the cloud provider dynamically manages the allocation and provisioning of server resources.

Simplified serverless platform architecture: edge services (UI, API gateway, event sources), platform services (event queue, dispatcher), and the workers that run the actual serverless functions.

It is essential to understand that serverless doesn’t mean “without servers” but “as a user, I don’t have to plan, provision, or manage the infrastructure.”

In essence, the cloud provider (or whoever manages the serverless infrastructure) takes the burden from the developer. Serverless compute environments fully auto-scale, starting or stopping instances of the functions according to the needed capacity. Due to their stateless nature, it’s easy to stop and restart them at any point in time. That means that function instances are often very short-lived.

Popular serverless compute platforms include AWS Lambda, Google Cloud Functions, and Azure Functions. For self-managed operations, there is knative (mentioned before), as well as OpenFaaS and OpenFunction (though the latter has seen less activity recently).

They all enable developers to focus on writing code without managing the underlying infrastructure.

What is a Serverless Storage System?

Serverless storage refers to a cloud storage model where the underlying infrastructure, capacity planning, and scaling are abstracted away from the user. With serverless storage, customers don’t have to worry about provisioning or managing storage servers or volumes. Instead, they can store and retrieve data while the serverless storage handles all the backend infrastructure.

Serverless storage solutions come in different forms and shapes, beginning with an object storage interface, such as Amazon S3 or Google Cloud Storage. Object storage is excellent when storing unstructured data, such as documents or media.

Serverless storage options available on the three major hyperscalers (GCP, AWS, and Azure).

Another option that people love to use for serverless storage is serverless databases. Various options are available, depending on your needs: relational, NoSQL, time-series, and graph databases. This might be the easiest way to go, depending on how you need to access data. Examples of such serverless databases include Amazon Aurora Serverless, Google’s Cloud Datastore, and external companies such as Neon or Nile.

When self-managing your serverless infrastructure with knative or one of the alternatives, you can use Kubernetes CSI storage providers to supply storage to your functions. However, choosing the wrong CSI driver may add considerable startup time. I might be biased, but simplyblock is an excellent option with its negligible provisioning and attachment times, as well as features such as multi-attach, where a volume can be attached to multiple functions (for example, to provide a shared set of data).

Why Serverless Architectures?

Most people think of cost-efficiency first when it comes to serverless architectures. However, this is only one side of the coin, and the savings only hold if your use cases are a good fit for a serverless environment. More on when serverless makes sense later.

In serverless architectures, functions are triggered through an event, either from the outside world (like an HTTP request) or an event initiated by another function. If no function instance is up and running, a new instance will be started. The same goes for situations where all function instances are busy. If function instances idle, they’ll be shut down.

Serverless functions usually use a pay-per-use model. A function’s extremely short lifespan can lead to cost reductions over deployment models like containers and virtual machines, which tend to run longer.
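The pay-per-use effect can be put into rough numbers. The prices below are placeholders chosen for illustration, not actual cloud list prices, and the workload figures are assumed.

```python
# Hypothetical prices for illustration only -- not actual cloud list prices.
PRICE_PER_GB_SECOND = 0.0000166667   # serverless: billed per GB-second of runtime
PRICE_PER_VM_HOUR = 0.0416           # always-on VM: billed whether busy or idle

requests_per_day = 50_000
avg_runtime_s = 0.2                  # each function invocation runs ~200 ms
memory_gb = 0.5                      # memory allocated per invocation

# Serverless bills only for actual execution time...
serverless_daily = requests_per_day * avg_runtime_s * memory_gb * PRICE_PER_GB_SECOND
# ...while a VM bills for all 24 hours, idle or not.
vm_daily = 24 * PRICE_PER_VM_HOUR

print(f"serverless: ${serverless_daily:.2f}/day, VM: ${vm_daily:.2f}/day")
```

At low or bursty utilization the short-lived, pay-per-use model wins; for a constantly saturated workload the comparison can flip, which is part of the "good fit" question above.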

Apart from that, serverless architectures have more benefits. Many mirror those of microservices architectures, but with the promise of being easier to implement and maintain.

First and foremost, serverless solutions are designed for scalability and elasticity. They quickly and automatically scale up and down depending on the incoming workload. It’s all hands-free.

Another benefit is that development cycles are often shortened. Due to the limited size and functionality of a FaaS, changes are fast to implement and easy to test. Additionally, updating the function is as simple as deploying the new version. All existing function instances finish their current work and shut down. In the meantime, the latest version will be started up. Due to its stateless nature, this is easy to achieve.

What are the Complexities of Serverless Architecture?

Writing serverless solutions has the benefits of fast iteration, simplified deployments, and potential cost savings. However, they also come with their own set of complexities.

Designing real stateless code isn’t easy, at least when we’re not just talking about simple transformation functionality. That’s why a FaaS receives and passes context information along during its events.


What works great for small bits of context becomes challenging for larger pieces. A larger context, or state, can mean many things: simple cross-request information that should be available without transferring it with every request, more involved data such as lookup information used to enrich and cross-check requests, or genuinely complex data, as when you implement a serverless database. And yes, a serverless database needs to store its data somewhere.

That’s where serverless storage comes in, and simply put, this is why all serverless solutions have state storage alternatives.

What is Serverless Storage?

Serverless storage refers to storage solutions that are fully integrated into serverless compute environments without manual intervention. These solutions scale and grow according to user demand and complement the pay-by-use payment model of serverless platforms.

Serverless storage lets you store information across multiple requests or functions. 

As mentioned above, cloud environments offer a wide selection of serverless storage options. However, all of them are vendor-bound and lock you into their services. 

However, when you design your own serverless infrastructure or service, these services don’t help you. It’s up to you to provide the serverless storage. In this case, a cloud-native storage engine with serverless support can simplify this task immensely. Whether you want to provide object storage, a serverless database, or file-based storage, an underlying cloud-native block storage solution is the perfect building block underneath. However, this block storage solution needs to scale and grow with your needs, be quick and easy to provision, and support snapshotting, cloning, and attaching to multiple function instances.

Why do Serverless Architectures Require Serverless Storage?

Serverless storage has particular properties designed for serverless environments. It needs to keep up with the specific requirements of serverless architectures, most notably short lifetimes, extremely fast scaling up, down, and restarting, easy use across multiple versions during updates, and easy integration through the APIs utilized by the FaaS.

Most importantly, it must be usable by multiple function instances simultaneously and must be quickly available to new instances on other nodes, regardless of whether those are migrated over or spun up for scaling out. That means the underlying storage technology must be prepared to handle these tasks with ease.

These are just the most significant requirements, but there are more:

  1. Stateless nature: Serverless functions spin up, execute, and terminate due to their stateless nature. Without fast, persistent storage that can be attached or accessed without any additional delay, this fundamental property of serverless functions would become a struggle.
  2. Scalability needs: Serverless compute is built to scale automatically based on user demand. A storage layer needs to seamlessly support the growth and shrinking of serverless infrastructures and handle variations in I/O patterns, meaning that traditional storage systems with fixed capacity limits don’t align well with the requirements of serverless workloads.
  3. Cost efficiency: One reason people engage with serverless compute solutions is cost efficiency. Serverless compute users pay by actual execution time. That means that serverless storage must support similar payment structures and help serverless infrastructure operators efficiently manage and scale their storage capacities and performance characteristics.
  4. Management overhead: Serverless compute environments are designed to eliminate manual server management. Therefore, the storage solution needs to minimize manual administrative tasks. Allocating and scaling storage must be fully automatable via API calls, if not fully automatic. Also, if multiple storage tiers are available for additional cost savings, their integration must be seamless.
  5. Performance requirements: Serverless functions require fast, if not immediate, access to data when they spin up. Traditional storage solutions introduce delays due to allocation and additional latency, negatively impacting serverless functions’ performance. As functions are paid by runtime, their operational cost increases.
  6. Integration needs: Serverless architectures typically combine many services, as individual functions use different services. That said, the underlying storage solution of a serverless environment needs to support all kinds of services provided to users. Additionally, seamless integration with the management services of the serverless platform is required.

Those are quite a few requirements. To align serverless compute and serverless storage, storage solutions need to provide an efficient and manageable layer that integrates seamlessly with the overall management layer of the serverless platform.

Simplyblock for Serverless Storage

When designing a serverless environment, the storage layer must be designed to keep up with the pace. Simplyblock enables serverless infrastructures to provide dynamic and scalable storage.

To achieve this, simplyblock provides several characteristics that perfectly align with serverless principles:

  1. Dynamic resource allocation: Simplyblock’s thin provisioning makes capacity planning irrelevant. Storage is allocated on-demand as data is written, similar to how serverless platforms allocate resources. That means every volume can be arbitrarily large to accommodate unpredictable future growth. Additionally, simplyblock’s logical volumes are resizable, meaning that the volume can be enlarged at any point in the future.
  2. Automatic scaling: Simplyblock’s storage engine can indefinitely grow. To acquire additional backend storage, simplyblock can automatically acquire additional persistent disks (like Amazon EBS volumes) from cloud providers or attach additional storage nodes to its cluster when capacity is about to exceed, handling scaling without user intervention.
  3. Abstraction of infrastructure: Users interact with simplyblock’s virtual drives like normal hard disks. This abstracts away the complexity of the underlying storage pooling and backend storage technologies.
  4. Unified interface: Simplyblock provides a unified storage interface, an NVMe logical device, that abstracts away underlying, diverging storage interfaces behind an easy-to-understand disk design. This enables services not specifically designed to talk to object storage or similar technologies, such as PostgreSQL or MySQL, to benefit from them immediately.
  5. Extensibility: Due to its disk-like storage interface, simplyblock is highly extensible in terms of solutions that can be run on top of it. Databases, object storage, file storage, and specific storage APIs, simplyblock provides scalable block storage to all of them, making it the perfect backend solution for serverless environments.
  6. Crash-consistent and recoverable: Serverless storage must always be up and running. Simplyblock’s distributed erasure coding (parity information similar to RAID-5 or 6) enables high availability and fault tolerance on the storage level, with high storage efficiency and an overhead far below simple replication. Additionally, simplyblock provides storage cluster replication (sync / async), consistent snapshots across multiple logical volumes, and disaster recovery options.
  7. Automated management: With features like automatic storage tiering to cheaper object storage (such as Amazon S3), automatic scaling, as well as erasure coding and backups for data protection, simplyblock eliminates manual management overhead and hands-on tasks. Simplyblock clusters are fully autonomous and manage the underlying storage backend automatically.
  8. Flexible integration: Serverless platforms require storage to be seamlessly allocated and provisioned. Simplyblock achieves this through its API, which can be integrated into the standard provisioning flow of new customer sign-ups. If the new infrastructure runs on Kubernetes, integration is even easier with the Kubernetes CSI driver, allowing seamless integration with container-based serverless platforms such as knative.
  9. Pay-per-use potential: Due to the automatic scalability, thin provisioning, and seamless resizing and integration, simplyblock enables you to provide your customers with an industry-loved pay-by-use model for managed service providers, perfectly aligning with the most common serverless pricing models.
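The efficiency claim about erasure coding versus replication (point 6) comes down to simple arithmetic. The sketch below compares the raw capacity needed per byte of usable data; the chunk counts are illustrative examples, not simplyblock's actual data layout.

```python
def storage_overhead(data_chunks: int, parity_chunks: int) -> float:
    """Raw capacity required per byte of usable data for k+m erasure coding."""
    return (data_chunks + parity_chunks) / data_chunks

# RAID-5-like: 4 data chunks + 1 parity chunk -> survives one failure
ec_4_1 = storage_overhead(4, 1)   # 1.25x raw per usable byte
# RAID-6-like: 4 data chunks + 2 parity chunks -> survives two failures
ec_4_2 = storage_overhead(4, 2)   # 1.5x raw per usable byte
# Triple replication also survives two failures, but stores every byte 3 times
replication_3x = 3.0

print(f"EC 4+1: {ec_4_1}x, EC 4+2: {ec_4_2}x, 3x replication: {replication_3x}x")
```

Both the 4+2 scheme and triple replication tolerate two simultaneous failures, yet the erasure-coded layout needs half the raw capacity.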

Simplyblock is the perfect backend storage for all your serverless storage needs while future-proofing your infrastructure. As data grows and evolves, simplyblock’s flexibility and scalability ensure you can adapt without massive overhauls or migrations.

Remember, simplyblock offers powerful features like thin provisioning, storage pooling, and tiering, helping you to provide a cost-efficient, pay-by-use enabled storage solution. Get started now and find out how easy it is to operate services on top of simplyblock.

AWS Storage Optimization: Avoid EBS Over-provisioning
https://www.simplyblock.io/blog/avoid-storage-over-provisioning/
Thu, 10 Oct 2024 07:36:59 +0000

“Cloud is expensive” is an often repeated phrase among IT professionals. What makes the cloud so expensive, though? One element that significantly drives cloud costs is storage over-provisioning and lack of storage optimization. Over-provisioning refers to the eager allocation of more resources than required by a specific workload at the time of allocation.

When we hear about hoarding goods, we often think of so-called preppers preparing for some type of serious event. Many people would laugh about that kind of behavior. However, it is commonplace when we are talking about cloud environments.

In the past, most workloads used their own servers, often barely utilizing any of the machines. That’s why we invented virtualization techniques, first with virtual machines and later with containers. We didn’t like the idea of wasting resources and money.

That didn’t stop when workloads were moved to the cloud, or did it?

What is Over-Provisioning?

As briefly mentioned above, over-provisioning refers to allocating more resources than are needed for a given workload or application. That means we actively request more resources than we need, and we know it. Over-provisioning typically occurs across various infrastructure components: CPU, memory, and storage. Let’s look at some basic examples to understand what that means:

  1. CPU Over-Provisioning: Imagine running a web server on a virtual machine instance (e.g., Amazon EC2) with 16 vCPUs. At the same time, your application only requires four vCPUs for the current load and number of customers. You expect to increase the number of customers in the next year or so. Until then, the excess computing power sits idle, wasting resources and money.
  2. Memory Over-Provisioning: Consider a database server provisioned with 64GB of RAM when the database service commonly only uses 16GB, except during peak loads. The unused memory is essentially paid for but unutilized most of the time.
  3. Storage Over-Provisioning: Consider a Kubernetes cluster with ten instances of the same stateful service (like a database), each requesting a 100 GB block storage volume (e.g., Amazon EBS) that will only slowly fill up over the course of a year. If each container currently uses about 20 GB, we have over-provisioned 800 GB, and we have to pay for it.

Why is EBS Over-Provisioning an Issue?

EBS over-provisioning isn’t an issue by itself; we lived (almost) happily ever after with it for decades. While over-provisioning seems to be the safe bet to ensure performance and plannability, it comes with a set of drawbacks.

  1. High initial cost: When you overprovision, you pay for resources you don’t use from day one. This can significantly inflate your cloud bill, especially at scale.
  2. Resource waste: Unused resources aren’t just a financial burden. They also waste valuable computing power that could be better allocated elsewhere. Not to mention the environmental effects of over-provisioning, think CO2 footprint.
  3. Hard to estimate upfront: Predicting exact resource needs is challenging, especially for new applications or those with variable workloads. This uncertainty often leads us to very conservative (and excessive) provisioning decisions.
  4. Limitations when resizing: While cloud providers like AWS allow resource resizing, limitations exist. Amazon EBS volumes can only be modified once every 6 hours, making it difficult to adjust to changing needs quickly.

On top of those issues, which are all financial impact related, over-provisioning can also directly or indirectly contribute to topics such as:

  • Reduced budget for innovation
  • Complex and hard-to-manage infrastructures
  • Potential compliance issues in regulated industries
  • Decreased infrastructure efficiency

The Solution is Pay-By-Use

Pay-by-use refers to the concept that customers are billed only for what they actually use. That said, using our earlier example of a 100 GB Amazon EBS volume where only 20 GB is used, we would only be charged for those 20 GB. As a customer, I’d love the pay-by-use option since it makes it easy and relieves me of the burden of the initial estimate.
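The billing difference in the example above is easy to quantify. The per-GB price below is a placeholder for illustration, not an actual AWS rate.

```python
PRICE_PER_GB_MONTH = 0.08   # placeholder rate, not an actual EBS price

provisioned_gb = 100        # size requested for the EBS volume
used_gb = 20                # data actually stored on it

pay_provisioned = provisioned_gb * PRICE_PER_GB_MONTH   # billed for allocation
pay_by_use = used_gb * PRICE_PER_GB_MONTH               # billed for actual data

print(f"provisioned: ${pay_provisioned:.2f}/month, pay-by-use: ${pay_by_use:.2f}/month")
```

For this single volume, four fifths of the bill is pure over-provisioning cost; multiply that across hundreds of volumes and the appeal of pay-by-use is obvious.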

So why isn’t everyone just offering pay-by-use models?

The Complexity of Pay-By-Use

Many organizations dream of an actual pay-by-use model, where they only pay for the exact resources consumed. This improves the financial impact, optimizes the overall resource utilization, and brings environmental benefits. However, implementing this is challenging for several reasons:

  1. Technical Complexity: Building a system that can accurately measure and bill for precise resource usage in real time is technically complex.
  2. Performance Concerns: Constant scaling and de-scaling to match exact usage can potentially impact performance and introduce latency.
  3. Unpredictable Costs: While pay-by-use can save money, it can also make costs less predictable, making budgeting challenging.
  4. Legacy Systems: Many existing applications aren’t designed to work with dynamically allocated resources.
  5. Cloud Provider Greed: While this claim is probably exaggerated, there is still some truth to it. Cloud providers overcommit CPU, RAM, and network bandwidth, which is why they offer both machine types with dedicated resources and ones without (where they tend to over-provision resources, and you might encounter the “noisy neighbor” problem). On the storage side, they thinly provision your storage out of a large, ever-growing storage pool.

Over-Provisioning in AWS

Like most cloud providers, AWS has several components where over-provisioning is typical. The most obvious one is resources around Amazon EC2. However, since many other services are built upon EC2 machines (like Kubernetes clusters), this is the most common entry point to look into optimization.

Amazon EC2 (CPU and Memory)

When looking at Amazon EC2 instances to save some hard-earned money, AWS offers some tools by itself:

  • Use AWS CloudWatch to monitor CPU and memory utilization.
  • Implement auto-scaling groups to adjust instance counts dynamically based on demand.
  • Consider using EC2 Auto Scaling with predictive scaling to anticipate future needs.

In addition, external tools such as AutoSpotting or Cast.ai can automatically find over-provisioned VMs and adjust them accordingly, or exchange them for so-called spot instances. Spot instances are VM instances that are much cheaper but can be taken away from you with only a few seconds’ notice. The idea is that AWS offers these instances at a reduced rate when they can’t be sold for their regular price. That said, if the capacity is required, they’ll be taken away from you. Still, they are a great way to save some money.

Last but not least, companies like DoIT work as resellers for hyperscalers like AWS. They have custom rates and offer additional features like bursting beyond your typical requirements. This is a great way to get cheaper VMs and extra services. It’s worth a look.

Amazon EBS Storage Over-Provisioning

One of the most common causes of over-provisioning happens with block storage volumes, such as Amazon EBS. With EBS, the over-provisioning is normally driven by:

  • Pre-allocated Capacity: EBS volumes are provisioned with a fixed size, and you pay for the entire allocated space regardless of usage.
  • Modification Limitations: EBS volumes can only be modified once every 6 hours, making rapid adjustments difficult.
  • Performance Considerations: A common belief is that larger volumes perform better, so people feel incentivized to over-provision.

One interesting note, though: while customers have to pay for the total allocated size, AWS likely uses technologies such as thin provisioning internally, allowing it to oversell its actual physical storage. Imagine this overselling margin were on your end and not the hyperscaler’s.

How Simplyblock Can Help with EBS Storage Over-Provisioning

Simplyblock offers an innovative storage optimization platform to address storage over-provisioning challenges. By providing you with a comprehensive set of technologies, simplyblock enables several features that significantly optimize storage usage and costs.

Thin Provisioning

Thin provisioning is a technique where a storage entity of any capacity can be created without pre-allocating the requested capacity. A thinly provisioned volume only requires as much physical storage as its data actually consumes at any point in time. This enables overcommitting the underlying storage: ten volumes with a provisioned capacity of 1 TB each, but only 100 GB used per volume, require only around 1 TB of physical storage in total, meaning roughly 9 TB of storage doesn’t need to exist, or be paid for, unless it is actually used.

Simplyblock’s thin provisioning technology allows you to create logical volumes of any size without pre-allocating the total capacity. You only consume (and pay for) the actual space your data uses. This eliminates the need to over-provision “just in case” and allows for more efficient use of your storage resources. When your actual storage requirements increase, simplyblock automatically allocates additional underlying storage to keep up with your demands.

Two thinly provisioned devices and the underlying physical storage
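The accounting above can be illustrated with a toy model. This is a simplified sketch using the numbers from the example, not simplyblock's actual implementation.

```python
class ThinPool:
    """Toy model of a thin-provisioned pool: logical capacity is promised,
    physical capacity is only consumed as data is written."""

    def __init__(self):
        self.volumes = {}                       # name -> [logical_gb, written_gb]

    def create_volume(self, name, logical_gb):
        self.volumes[name] = [logical_gb, 0]    # nothing physically allocated yet

    def write(self, name, gb):
        vol = self.volumes[name]
        vol[1] = min(vol[0], vol[1] + gb)       # physical use grows only with writes

    def logical_gb(self):
        return sum(logical for logical, _ in self.volumes.values())

    def physical_gb(self):
        return sum(written for _, written in self.volumes.values())

pool = ThinPool()
for i in range(10):
    pool.create_volume(f"vol{i}", 1024)         # ten 1 TB volumes promised
    pool.write(f"vol{i}", 100)                  # but only 100 GB written to each

print(pool.logical_gb(), pool.physical_gb())    # 10240 GB promised vs 1000 GB real
```

The gap between the two numbers is exactly the capacity a thick-provisioned setup would have paid for up front.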

Copy-on-Write, Snapshots, and Instant Clones

Simplyblock’s storage technology is a fully copy-on-write-enabled system. Copy-on-write is a technique also known as shadowing. Instead of copying data right away when multiple copies are created, copy-on-write will only create a second instance when the data is actually changed. This means the old version is still around since other copies still refer to it, while only one specific copy refers to the changed data. Copy-on-write enables the instant creation of volume snapshots and clones without duplicating data. This is particularly useful for development and testing environments, where multiple copies of large datasets are often needed. Instead of provisioning full copies of production data, you can create instant, space-efficient clones specifically attractive for databases, AI / ML workloads, or analytics data.

Copy-on-write technique explained with two files referring to shared, similar parts and modified, unshared parts
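The block-sharing behavior of copy-on-write can be sketched as a small simulation. This toy model is for illustration only; real storage engines track blocks with reference counts and metadata trees rather than Python dicts.

```python
class CowStore:
    """Toy copy-on-write store: a clone shares blocks with its parent until
    one side writes, at which point only the changed block diverges."""

    def __init__(self):
        self.blocks = {}        # block_id -> data
        self.next_id = 0
        self.volumes = {}       # volume name -> list of block_ids

    def _new_block(self, data):
        self.blocks[self.next_id] = data
        self.next_id += 1
        return self.next_id - 1

    def create(self, name, data_blocks):
        self.volumes[name] = [self._new_block(d) for d in data_blocks]

    def clone(self, src, dst):
        # Instant and space-free: the clone just references the same blocks.
        self.volumes[dst] = list(self.volumes[src])

    def write(self, name, index, data):
        # Copy-on-write: allocate a private block only for the changed index.
        self.volumes[name][index] = self._new_block(data)

store = CowStore()
store.create("prod", ["a", "b", "c"])
store.clone("prod", "test")        # instant clone of the production volume
store.write("test", 1, "B")        # only block 1 diverges on the clone

shared = set(store.volumes["prod"]) & set(store.volumes["test"])
print(len(store.blocks), len(shared))   # 4 physical blocks, 2 still shared
```

Even after the write, two of the clone's three blocks are still the parent's blocks, which is why cloning a large production dataset costs almost nothing until data actually changes.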

Transparent Tiering

With most data sets, parts of the data are typically assumed to be “cold,” meaning the data is used very infrequently, if ever. This is true for any data that must be kept available for regulatory reasons, or for historical data such as manufacturing process information (for example, for car part production). Such data can be moved to slower but much less expensive storage options. Simplyblock automatically moves infrequently accessed data to cheaper storage tiers, such as object storage (e.g., Amazon S3 or MinIO) and non-NVMe SSD or HDD pools, while keeping hot data on high-performance storage. This tiering is completely transparent to your applications, databases, or other workloads and helps optimize costs without sacrificing performance. With tiering integrated into the storage layer, application and system developers can focus on business logic rather than storage requirements.

Automatic tiering, transparently moving cold data parts to slower but cheaper storage
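At its core, a tiering decision classifies data by access recency. The sketch below shows the idea; the 30-day threshold and tier names are assumptions for illustration, not simplyblock's actual policy.

```python
import time

# Assumption for illustration: data untouched for 30 days counts as "cold".
HOT_THRESHOLD_S = 30 * 24 * 3600

def assign_tier(last_access_ts, now=None):
    """Place a data chunk on fast NVMe or cheap object storage by access recency."""
    now = now if now is not None else time.time()
    return "nvme" if now - last_access_ts < HOT_THRESHOLD_S else "object-storage"

now = time.time()
print(assign_tier(now - 3600, now))            # accessed an hour ago -> hot tier
print(assign_tier(now - 90 * 24 * 3600, now))  # idle for 90 days -> cold tier
```

A real implementation would run this continuously over block-level access statistics and migrate data asynchronously, so the application never notices which tier a block lives on.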

Storage Pooling

Storage pooling is a technique in which multiple storage devices or services are used in conjunction. It enables technologies like thin provisioning and data tiering, which were already mentioned above.

By pooling multiple cloud block storage volumes (e.g., Amazon EBS volumes), simplyblock can provide better performance and more flexible scaling. This pooling allows for more granular storage growth, avoiding the need to provision large EBS volumes upfront.

Additionally, simplyblock can leverage directly attached fast SSD storage (NVMe), also called local instance storage, and make it part of the storage pool or use it as an even faster workload-local data cache.
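The granular-growth idea can be sketched as follows: instead of one large upfront volume, the pool extends itself with small backing disks whenever free headroom runs low. Disk size and headroom ratio are arbitrary assumptions for the example.

```python
def grow_pool(pool_disks_gb, used_gb, disk_size_gb=100, headroom=0.2):
    """Append fixed-size backing disks until the pool keeps at least
    `headroom` (a fraction of total capacity) free above current usage."""
    disks = list(pool_disks_gb)
    while sum(disks) - used_gb < headroom * sum(disks):
        disks.append(disk_size_gb)      # acquire one more small backing disk
    return disks

# Start with two 100 GB disks; as data grows to 250 GB, the pool extends itself.
disks = grow_pool([100, 100], used_gb=250)
print(len(disks), sum(disks))
```

The pool ends up with 400 GB across four small disks, rather than a 1 TB volume provisioned on day one "just in case".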

NVMe over Fabrics

NVMe over Fabrics is an industry standard for remotely attaching block devices to clients. It can be considered the successor of iSCSI and enables the full feature set and performance of NVMe-based SSD storage. Simplyblock uses NVMe over Fabrics (specifically the NVMe/TCP variant) to provide high-performance, low-latency access to storage.

This enables the consolidation of multiple storage locations into a centralized one, enabling even greater savings on storage capacity and compute power.

Pay-By-Use Model Enablement

As stated above, pay-by-use models are a real business advantage, specifically for storage. Implementing a pay-by-use model in the cloud requires taking charge of how storage works. This is complex and requires a lot of engineering effort. This is where simplyblock helps bring a competitive advantage to your doorstep.

With its underlying technology and features such as thin provisioning, simplyblock makes it easier for managed service providers to implement a true pay-by-use model for their customers, giving you the competitive advantage at no extra cost or development effort, all fully transparent to your database or application workload.

AWS Storage Optimization with Simplyblock

By addressing the core issues of EBS over-provisioning, simplyblock helps reduce costs and improves overall storage efficiency and flexibility. For businesses struggling with storage over-provisioning in AWS, simplyblock offers a compelling solution to optimize their infrastructure and better align costs with actual usage.

In conclusion, while over-provisioning remains a significant challenge in AWS environments, particularly with storage, simplyblock paves the way for more efficient, cost-effective cloud storage management. By combining advanced technologies with a deep understanding of cloud storage dynamics, simplyblock enables businesses to achieve the elusive goal of paying only for what they use, without sacrificing performance or flexibility.

Take your competitive advantage and get started with simplyblock today.

The post AWS Storage Optimization: Avoid EBS Over-provisioning appeared first on simplyblock.

]]>
Two thinly provisioned devices and the underlying physical storage
Copy-on-write technique explained with two files referring to shared, similar parts and modified, unshared parts
Automatic tiering, transparently moving cold data parts to slower but cheaper storage
Why would you run PostgreSQL in Kubernetes, and how? https://www.simplyblock.io/blog/why-would-you-run-postgresql-in-kubernetes-and-how/ Wed, 02 Oct 2024 13:12:26 +0000 https://www.simplyblock.io/?p=2370 Running PostgreSQL in Kubernetes When you need a PostgreSQL service in the cloud, there are two common ways to achieve this. The initial thought is going for one of the many hosted databases, such as Amazon RDS or Aurora, Google’s CloudSQL, Azure Database for Postgres, and others. An alternative way is to self-host a database. […]

The post Why would you run PostgreSQL in Kubernetes, and how? appeared first on simplyblock.

]]>
Running PostgreSQL in Kubernetes

When you need a PostgreSQL service in the cloud, there are two common ways to achieve this. The initial thought is going for one of the many hosted databases, such as Amazon RDS or Aurora, Google’s CloudSQL, Azure Database for Postgres, and others. An alternative way is to self-host the database, something that was far more common in the era of virtual machines but fell out of favor with containerization. Why? Many believe containers (and Kubernetes specifically) aren’t a good fit for running databases. I firmly believe that cloud databases, while seemingly convenient at first sight, aren’t a great way to scale and that the assumed benefits are not what you think they are. Now, let’s explore strategies for running PostgreSQL effectively in Kubernetes.

Many people still think running a database in Kubernetes is a bad idea. To understand their reasoning, I did the only meaningful thing: I asked X (formerly Twitter) why you should not run a database in Kubernetes. With the important addition of “asking for a friend.” Never forget that bit. You can thank me later 🤣

The answers were very different. Some expected, some not.

K8s is not designed with Databases in Mind!

When Kubernetes was created, it was designed as an orchestration layer for stateless workloads, such as web servers and stateless microservices. That said, it initially wasn’t intended for workloads like databases or any other workload that needs to hold any state across restarts or migration.

So while this answer had some initial merit, it isn’t true today. People from the DoK (Data on Kubernetes) Community and the SIG Storage (Special Interest Group), which is responsible for the CSI (Container Storage Interface) driver interface, as well as the community as a whole, made a tremendous effort to bring stateful workloads to the Kubernetes world.

Never run Stateful Workloads in Kubernetes!

From my perspective, this one is directly related to the claim that Kubernetes isn’t made for stateful workloads. As mentioned before, this was true in the past. However, these days, it isn’t much of a problem. There are a few things to be careful about, but we’ll discuss some later.

Persistent Data will kill you! Too slow!

When containers became popular in the Linux world, primarily due to the rise of Docker, storage was commonly implemented through overlay filesystems. These filesystems had to do quite the magic to combine the read-only container image with some potential (ephemeral) read-write storage. Doing anything IO-heavy on those filesystems was a pain. I’ve built Embedded Linux kernels inside Docker, and while it was convenient to have the build environment set up automatically, IO speed was awful.

These days, though, the CSI driver interface enables direct mounting of all kinds of storage into the container. Raw block storage, file storage, FUSE filesystems, and others are readily available and often offer immediate access to functionality such as snapshotting, backups, resizing, and more. We’ll dive a bit deeper into storage later in this blog post.

Nobody understands Kubernetes!

This is my favorite one, especially since I’m all against the claim that Kubernetes is easy. If you have never used Kubernetes before, a database isn’t the way to start. Not … at … all. Just don’t do it.

What’s the Benefit? Databases don’t need Autoscaling!

That one was fascinating. Unfortunately, nobody from this group responded to the question about their commonly administered database size. It would’ve been interesting. Obviously, there are perfect use cases for a database to be scaled—maybe not storage-wise but certainly compute-wise.

The simplest example is an online shop handling the Americas only. It’ll mostly go idle overnight. The database compute could be scaled down close to zero, whereas, during the day, you have to scale it up again.
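A quick back-of-envelope calculation shows why this matters. The hourly rate, busy window, and scale-down factor below are hypothetical figures chosen purely for illustration:

```python
def monthly_db_compute_cost(rate_per_hour: float,
                            busy_hours_per_day: float,
                            idle_scale: float = 0.1,
                            days: int = 30) -> float:
    # Busy hours run the database at full size; idle hours run it
    # scaled down to a fraction (idle_scale) of the full hourly rate.
    busy = busy_hours_per_day * rate_per_hour
    idle = (24 - busy_hours_per_day) * rate_per_hour * idle_scale
    return days * (busy + idle)

# Hypothetical: USD 2/hour instance, busy 14 hours/day, 10% size overnight.
always_on = monthly_db_compute_cost(2.0, 24)  # never scaled down
scaled = monthly_db_compute_cost(2.0, 14)     # scaled down overnight
print(always_on, scaled)  # 1440.0 vs 900.0 per month
```

Even this simple model recovers more than a third of the monthly compute bill.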

Databases and Applications should be separated!

I couldn’t agree more. That’s what node groups are for. It probably goes back to the fact that “nobody understands Kubernetes,” so you wouldn’t know about this feature.

Simply speaking, node groups are groups of Kubernetes worker nodes, commonly grouped by hardware specifications. You can tag and taint those nodes to control which workloads are scheduled onto them. This is super useful!

Not another Layer of Indirection / Abstraction!

Last but not least is the abstraction layer argument. And this is undoubtedly a valid one. If everything works, the world couldn’t be better, but if something goes wrong, good luck finding the root cause. And it only worsens the more abstraction you add, such as service meshes or others. Abstraction layers are two sides of the same coin, always.

Why run PostgreSQL in Kubernetes?

If there are so many reasons not to run my database in Kubernetes, why do I still think it’s not only a good idea but should be the standard?

No Vendor Lock-in

First and foremost, I firmly believe that vendor lock-in is dangerous. While many cloud databases offer standard protocols (such as Postgres or MySQL compatible), their internal behavior or implementation isn’t. That means that over time, your application will be bound to a specific database’s behavior, making it an actual migration whenever you need to move your application or, worse, make it cross-cloud or hybrid-compatible.

Kubernetes abstracts away almost all elements of the underlying infrastructure, offering a unified interface. This makes it easy to move workloads and deployments from AWS to Google, from Azure to on-premise, from everywhere to anywhere.

Unified Deployment Architecture

Furthermore, the deployment landscape will look similar—there will be no special handling by cloud providers or hyperscalers. You have an ingress controller, a CSI driver for storage, and the Cert Manager to provide certificates—it’s all the same.

This simplifies development, simplifies deployment, and, ultimately, decreases the time to market for new products or features and the actual development cost.

Automation

Last, the most crucial factor is that Kubernetes is an orchestration platform. As such, it is all about automating deployments, provisioning, operation, and more.

Kubernetes comes with loads of features that simplify daily operations. These include managing the TLS certificates and up- and down-scaling services, ensuring multiple instances are distributed across the Kubernetes cluster as evenly as possible, restarting failed services, and the list could go on forever. Basically, anything you’d build for your infrastructure to make it work with minimal manual intervention, Kubernetes has your back.

Best Practices when running PostgreSQL on Kubernetes

With those things out of the way, we should be ready to understand what we should do to make our Postgres on K8s experience successful.

While many of the following thoughts aren’t exclusively related to running PostgreSQL on Kubernetes, there are often small bits and pieces that we should be aware of or that make our lives easier than implementing them separately.

That said, let’s dive in.

Enable Security Features

Let’s get the elephant out of the room first. Use security. Use it wherever possible, meaning you want TLS encryption between your database server and your services or clients. But that’s not all. If you use a remote or distributed storage technology, make sure all traffic from and to the storage system is also encrypted.

Kubernetes has excellent support for TLS using Cert Manager. It can create self-signed certificates or sign them using existing certificate authorities (either internal or external, such as Let’s Encrypt).

You should also ensure that your stored data is encrypted as much as possible. At least enable data-at-rest encryption. You must make sure that your underlying storage solution supports meaningful encryption options for your business. What I mean by that is that a serverless or shared infrastructure might need an encryption key per mounted volume (for strong isolation between customers). At the same time, a dedicated installation can be much simpler using a single encryption key for the whole machine.

You may also want to consider extended Kubernetes distributions such as Edgeless Systems’ Constellation, which supports fully encrypted memory regions based on hardware support in CPUs and GPUs. It’s probably the highest level of isolation you can get. If you need that level of confidence, this is where you get it. I talked to Moritz from Edgeless Systems in an early episode of my Cloud Commute podcast. You should watch it. It’s really interesting technology!

Backups and Recovery

At conferences, I love to ask the audience questions. One of them is, “Who creates regular backups?” Most commonly, the whole room has their hands up. If you add a second question about storing backups off-site (different data center, S3, whatever), about 25% to 30% of the hands already go down. That, in itself, is bad enough.

Adding a third question on regularly testing their backups by playing them back, most hands are down. It always hurts my soul. We all know we should do it, but testing backups at a regular cadence isn’t easy. Let’s face it: It’s tough to restore a backup, especially if it requires multiple systems to be restored in tandem.

Kubernetes can make this process less painful. When I was at my own startup just a few years ago, we tested our backups once a week. You’d say this is extensive? Maybe it was, but it was pretty painless to do. In our case, we specifically restored our PostgreSQL + Timescale database. For the day-to-day operations, we used a 3-node Postgres cluster: one primary and two replicas.

Running a Backup-Restore every week, thanks to Kubernetes

Every week (no, not Fridays 🤣), we kicked off a third replica. Patroni (an HA manager for Postgres) managed the cluster and restored the last full backup. Afterward, it would replay as much of the Write-Ahead Log (WAL) as was available in our Minio/S3 bucket and have the new replica join the cluster. Now, here was the exciting part: would the node be able to join, replay the remaining WAL, and become a full-blown cluster member? If yes, the world was all happy. If not, we’d stop everything else and try to figure out what happened. Let me add that it didn’t fail very often, but we always had a good feeling that the backup worked.

The story above contains one more interesting bit. It uses continuous backups, sometimes also referred to as point-in-time recovery (PITR). If your database supports it (PostgreSQL does), make use of it! If not, a solution like simplyblock may be of help. Simplyblock implements PITR on a block storage level, meaning that it supports all applications on top that implement a consistent write pattern (which are hopefully all databases).

Don’t roll your own Backup Tool

Finally, use existing and tested backup tools. Do not roll your own. You want your backup tool to be industry-proven. A backup is one of the most critical elements of your setup, so don’t take it lightly. Or would you let just anybody build your house?

However, when you have to backup and restore multiple databases or stateful services at once for a consistent but split data set, you need to look into a solution that is more than just a database backup. In this case, simplyblock may be a good solution. Simplyblock can snapshot and backup multiple logical volumes at the same time, creating a consistent view of the world at that point in time and enabling a consistent restore across all services.

Do you use Extensions?

While not all databases are as extensible as PostgreSQL, quite a few have an extension mechanism.

If you need extensions that aren’t part of the standard database container images, remember that you have to build your own image layers. Depending on your company, that can be a challenge. Many companies want signed and certified container images, sometimes for regulatory reasons.

If you have that need, talk to whoever is responsible for compliance (SOC2, PCI DSS, ISO 27000 series, …) as early as possible. You’ll need the time. Compliance is crucial but also a complication for us as engineers or DevOps folks.

In general, I’d recommend that you try to stay with the default images as long as possible. Maybe your database has the option to side-load extensions from volume mounts. That way, you can get extensions validated and certified independently of the actual database image.

For PostgreSQL specifically, OnGres’ StackGres has a magic solution that spins up virtual image layers at runtime. They work on this technology independently from StackGres, so we might see this idea come to other solutions as well.

Think about Updates of your PostgreSQL and Kubernetes

Updates and upgrades are one of the most hated topics around. We all have been in a situation where an update went off the rails or failed in more than one way. Still, they are crucial.

Sometimes, updates bring new features that we need, and sometimes, they bring performance improvement, but they’ll always bring bug fixes. Just because our database isn’t publicly accessible (it isn’t, is it? 🤨) doesn’t mean we don’t have to ensure that essential updates (often critical security bug fixes) are applied. If you don’t believe me, you’d be surprised by how many data breaches or cyber-attacks come from employees. And I’m not talking about accidental leaks or social engineering.

Depending on your database, Kubernetes will not make your life easier. This is especially true for PostgreSQL whenever you have to run pg_upgrade. For those not deep into PG, pg_upgrade will upgrade the database data files from one Postgres version to another. For that to happen, it needs the current and the new Postgres installation, as well as double the storage since it’s not an in-place upgrade but rather a copy-everything-to-the-new-place upgrade.

While not every Postgres update requires you to run pg_upgrade, the ones that do hurt a lot. I bet there are similar issues with other databases.

The development cycles of Kubernetes are fast. It is a fast-moving target that adds new functionality, promotes functionality, and deprecates or changes functionality while still in Alpha or Beta. That’s why many cloud providers and hyperscalers only support the last 3 to 5 versions “for free.” Some providers, e.g., AWS, have implemented an extended support scheme that provides an additional 12 months of support for older Kubernetes versions for only six times the price. For that price difference, maybe hire someone to ensure that your clusters are updated.

Find the right Storage Provider

When you think back to the beginning of the blog post, people were arguing that Kubernetes isn’t made for stateful workloads and that storage is slow.

To prove them wrong, select a storage provider (with a CSI driver) that best fits your needs. Databases love high IOPS and low latency, at least most of them. Hence, you should run benchmarks with your data set, your specific data access patterns and queries, and your storage provider of choice.

Try snapshotting and rolling back (if supported), try taking a backup and restoring it, try resizing and re-attaching, try a failover, and measure how long the volume will be blocked before you can re-attach it to another container. All of these elements aren’t even about speed, but your Recovery Point Objective (RPO) and Recovery Time Objective (RTO). They need to fit your requirements. If they exceed them, that’s awesome, but if you find out you’ll have to breach them due to your storage solution, you’re in a bad spot. Migrations aren’t always fun, and I mean, never.
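The failover measurement mentioned above is easy to automate. Below is a minimal sketch: `probe` is any caller-supplied reachability check you would write yourself (for example, attempting a small read against the device), so the helper name and approach are assumptions, not a vendor API.

```python
import time

def measure_downtime(probe, interval: float = 0.1, timeout: float = 300.0) -> float:
    # Poll until the volume responds again and report how long that took.
    # `probe` is any callable returning True once the volume is reachable
    # (e.g., a small read against the re-attached device).
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if probe():
            return time.monotonic() - start
        time.sleep(interval)
    raise TimeoutError("volume did not come back within the timeout")
```

Trigger the failover, start the measurement, and compare the result against your RTO budget.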

Last but not least, consider how the data is stored. Remember to check for data-at-rest encryption options. Potentially, you’ll need cross-data center or cross-availability zone replication. There are many things to think about upfront. Know your requirements.

How to find the best Storage Provider?

To help select a meaningful Kubernetes storage provider, I created a tool available at https://storageclass.info/csidrivers. It is an extensive collection of CSI driver implementations searchable by features and characteristics. The list contains over 150 providers. If you see any mistakes, please feel free to open a pull request. Most of the data is extracted manually by looking at the source code.

storageclass.info tool to search CSI drivers by features

Requests, Limits, and Quotas

This one is important for any containerization solution. Most databases are designed with the belief that they can utilize all resources available, and so is PostgreSQL.

That should be the case for any database with a good amount of load. I’d always recommend having your central databases run on their own worker nodes, meaning that apart from the database and the essential Kubernetes services, nothing else should be running on the node. Give the database the freedom to do its job.

If you run multiple smaller databases, for example, in a shared hosting environment or free tier service, sharing Kubernetes worker nodes is most probably fine. In this case, make sure you set the requests, limits, and quotas correctly. Feel free to overcommit, but keep the noisy neighbor problem in the back of your head when designing the overcommitment.

Apart from that, there isn’t much to say, and a full explanation of how to configure these is out of the scope of this blog post. There is, however, a great beginner write-up by Ashen Gunawardena on Kubernetes resource configuration.

One side note, though, is that most databases (including PostgreSQL) love huge pages. Remember that huge pages must be enabled twice on the host operating system (and I recommend reserving memory to get a large continuous chunk) and in the database deployment descriptor. Again, Nickolay Ihalainen already has an excellent write-up. While this article is about Huge Pages with Kubernetes and PostgreSQL, much of the basics are the same for other databases, too.

Make your Database Resilient

One of the main reasons to run your database in the cloud is increased availability and the chance to scale up and down according to your current needs.

Many databases provide tools for high availability with their distributions. For others, it is part of their ecosystems, just as it is for PostgreSQL. Like with backup tools, I’d strongly discourage you from building your own cluster manager. If you feel like you have to, collect a good list of reasons. Do not just jump in. High availability is one of the key features. We don’t want it to fail.

Another resiliency consideration is automatic failover. What happens when a node in my database cluster dies? How will my client failover to a new primary or the newly elected leader?

For PostgreSQL, you want to look at the “obvious choices” such as Patroni, repmgr, and pg_auto_failover. There are more, but those seem to be the ones to use, with Patroni most probably leading the pack.

Connection Pool: Proxying Database Connections

In most cases, a database proxy will transparently handle those issues for your application. They typically handle features such as retrying and transparent failover. In addition, they often handle load balancing (if the database supports it).

This most commonly works by the proxy accepting and terminating the database connection, which itself has a set of open connections to the underlying database nodes. Now the proxy will forward the query to one of the database instances (in case of primary-secondary database setups, it’ll also make sure to send mutating operations to the primary database), wait for the result, and return it. If an operation fails because the underlying database instance is gone, the proxy can retry it against a new leader or other instance.
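The routing and retry behavior described above can be captured in a toy model. This is a deliberately simplified sketch, not how PgBouncer or PgPool-II actually work internally (real proxies use connection pooling and proper leader discovery rather than the naive promotion shown here):

```python
import random

class ToyDatabaseProxy:
    # Toy model of a database proxy: writes go to the primary, reads are
    # load-balanced across all nodes, and a failed call triggers a single
    # retry after promoting a replica (simplified failover logic).
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def execute(self, query: str):
        is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
        target = self.primary if is_write else random.choice(self.replicas or [self.primary])
        try:
            return target(query)
        except ConnectionError:
            # Failover: promote a replica to primary and retry the query once.
            if self.replicas:
                self.primary = self.replicas.pop(0)
            return self.primary(query)
```

The application only ever talks to `execute()`; the failover stays invisible to it, which is exactly the value a proxy provides.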

In PostgreSQL, you want to look into tools such as PgBouncer, PgPool-II, and PgCat, with PgBouncer being the most famous choice.

Observability and Monitoring

In the beginning, we established the idea that additional abstraction doesn’t always make things easier. Especially if something goes wrong, more abstraction layers make it harder to get to the bottom of the problem.

That is why I strongly recommend using an observability tool, not just a monitoring tool. There are a bunch of great observability tools available. Some of them are Datadog, Instana, Dynatrace, Grafana, Sumo Logic, Dash0 (from the original creators of Instana), and many more. Make sure they support your application and database stack as comprehensively as possible.

A great observability tool that understands the layers and can trace through them is priceless when something goes wrong. They often help to pinpoint the actual root cause and help understand how applications, services, databases, and abstraction layers work together.

Use a Kubernetes Operator

Ok, that was a lot, and I promise we’re almost done. So far, you might wonder how I can claim that any of this is easier than just running bare-metal or on virtual machines. That’s where Kubernetes Operators enter the stage.

Kubernetes Operators are active components inside your Kubernetes environment that deploy, monitor, and operate your database (or other service) for you. They ask for your specifications, like “give me a 3-node PostgreSQL cluster” and set it all up. Usually, including high availability, backup, failover, connection proxy, security, storage, and whatnot.

Operators make your life easy. Think of them as your database operations person or administrator.

For most databases, one or more Kubernetes Operators are available. I’ve written about how to select a PostgreSQL Kubernetes Operator for your setup. For other databases, look at their official documentation or search the Operator Hub.

Anyhow, if you run a database in Kubernetes, make sure you have an Operator at hand. Running a database is more complicated than the initial deployment. I’d even claim that day-to-day operations are more important than deployment.

Actually, for PostgreSQL (and other databases will follow), the Data on Kubernetes Community started a project to create a comparable feature list of PostgreSQL Kubernetes Operators. So far, there isn’t a searchable website yet (as there is for storage providers), but maybe somebody wants to take that on.

PostgreSQL in Kubernetes is not Cloud SQL

If you read to this point, thank you and congratulations. I know there is a lot of stuff here, but I doubt it’s actually complete. I bet if I’d dig deeper, I would find many more pieces to the game.

As I mentioned before, I strongly believe that if you have Kubernetes experience, your initial thought should be to run your database on Kubernetes, taking in all of the benefits of automation, orchestration, and operation.

One thing we shouldn’t forget, though: running Postgres on Kubernetes won’t turn it into Cloud SQL, as Kelsey Hightower once said. However, using a cloud database will also not free you of the burden of understanding query patterns, cleaning up the database, configuring the correct indexes, or all the other elements of managing a database. They literally only take away the operations, and there you have to trust that they do the right thing.

Running your database on Kubernetes won’t turn it into Cloud SQL

Anyhow, being slightly biased, I also believe that your database should use simplyblock’s database storage orchestration. We unify access to pooled Amazon EBS volumes, local instance storage, and Amazon S3, using a virtual block storage device that looks like any regular NVMe/SSD hard disk. Simplyblock enables automatic resizing of the storage pool, hence overcommitting the storage backends, snapshots, instant copy-on-write clones, S3-backed cross-availability zone backups, and many more. I recommend you try it out and see all the benefits for yourself.

The post Why would you run PostgreSQL in Kubernetes, and how? appeared first on simplyblock.

]]>
Simplyblock for AWS: Environments with many gp2 or gp3 Volumes https://www.simplyblock.io/blog/aws-environments-with-many-ebs-volumes/ Thu, 19 Sep 2024 21:49:02 +0000 https://www.simplyblock.io/?p=1609 When operating your stateful workloads in Amazon EC2 and Amazon EKS, data is commonly stored on Amazon’s EBS volumes. AWS supports a set of different volume types which offer different performance requirements. The most commonly used ones are gp2 and gp3 volumes, providing a good combination of performance, capacity, and cost efficiency. So why would […]

The post Simplyblock for AWS: Environments with many gp2 or gp3 Volumes appeared first on simplyblock.

]]>
When operating your stateful workloads in Amazon EC2 and Amazon EKS, data is commonly stored on Amazon’s EBS volumes. AWS supports a set of different volume types which offer different performance characteristics. The most commonly used ones are gp2 and gp3 volumes, providing a good combination of performance, capacity, and cost efficiency. So why would someone need an alternative?

For environments with high-performance requirements such as transactional databases, where low-latency access and optimized storage costs are key, alternative solutions are essential. This is where simplyblock steps in, offering a new way to manage storage that addresses common pain points in traditional EBS or local NVMe disk usage—such as limited scalability, complex resizing processes, and the cost of underutilized storage capacity.

What is Simplyblock?

Simplyblock is known for providing top performance based on distributed (clustered) NVMe instance storage at low cost with great data availability and durability. Simplyblock provides storage to Linux instances and Kubernetes environments via the NVMe block storage and NVMe over Fabrics (using TCP/IP as the underlying transport layer) protocols and the simplyblock CSI Driver.

Simplyblock’s storage orchestration technology is fast. The service provides access latency between 100 us and 500 us, depending on the IO access pattern and deployment topology. That means that simplyblock’s access latency is comparable to, or even lower than on Amazon EBS io2 volumes, which typically provide between 200 us to 300 us.

To make sure we only provide storage which will keep up, we test simplyblock extensively. With simplyblock, you can easily achieve more than 1 million IOPS at a 4KiB block size on a single EC2 compute instance. This is several times higher than the most scalable Amazon EBS volumes, io2 Block Express. On the other hand, simplyblock’s cost of capacity is comparable to io2. However, with simplyblock, IOPS come for free, at absolutely no extra charge. Therefore, depending on the capacity-to-IOPS ratio of io2 volumes, it is possible to achieve cost advantages of up to 10x.
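To see why the capacity-to-IOPS ratio drives the difference, here is a rough cost sketch. The list prices below are illustrative assumptions for io2's first pricing tier; verify against current AWS pricing before relying on them:

```python
# Illustrative io2 list prices (assumptions; check current AWS pricing):
IO2_USD_PER_GIB_MONTH = 0.125
IO2_USD_PER_IOPS_MONTH = 0.065  # first pricing tier; higher tiers are cheaper

def io2_monthly_cost(gib: float, provisioned_iops: int) -> float:
    # On io2, capacity and provisioned IOPS are billed separately,
    # so IOPS-heavy workloads quickly dominate the bill.
    return gib * IO2_USD_PER_GIB_MONTH + provisioned_iops * IO2_USD_PER_IOPS_MONTH

low_iops = io2_monthly_cost(1024, 3_000)    # capacity-dominated bill
high_iops = io2_monthly_cost(1024, 30_000)  # IOPS-dominated bill
print(low_iops, high_iops)
```

With a storage product that charges only for capacity, the second line of that bill simply disappears, which is where the multi-x advantage for IOPS-hungry workloads comes from.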

For customers requiring very low storage access latency and high IOPS per TiB, simplyblock provides the best cost efficiency available today.

Why Simplyblock over Simple Amazon EBS?

Many customers are generally satisfied with the performance of their gp3 EBS volumes. Access latency of 6 to 10 ms is fine, and they never have to go beyond the included 3,000 IOPS (on gp2 and gp3). They should still care about simplyblock, because there is more. Much more.

Simplyblock provides multiple angles to save on storage: true thin provisioning, storage tiering, multi-attach, and snapshot storage!

Benefits of Thin Provisioning

With gp3, customers have to pay for provisioned rather than utilized capacity (~USD 80 per TiB provisioned). According to our research, the average utilization of Amazon EBS gp3 volumes is only at ~30%. This means that customers are actually paying more than three times the price per TiB of utilized storage: at utilization below one-third, the effective price comes to over USD 250 per utilized TiB. The higher the utilization, the closer a customer gets to the projected USD 80 per TiB.

In addition to the price inefficiency, customers also have to manage the resizing of gp3 volumes when utilization reaches the current capacity limit. However, resizing has its own limitations in EBS: it is only possible once every six hours. To mitigate potential issues during that time, volumes are commonly doubled in size.

On the other hand, simplyblock provides thin provisioned logical volumes. This means that you can provision your volumes nearly without any restriction in size. Think of growable partitions that are sliced out of the storage pool. Logical volumes can also be over-provisioned, meaning, you can set the requested storage capacity to exceed the storage pool’s current size. There is no charge for the over-provisioned capacity as long as you do not use it.

A thinly provisioned logical volume requires only the amount of storage actually used

That said, simplyblock thinly provisions NVMe volumes from a storage pool which is either made up of distributed local instance storage or gp3 volumes. The underlying pool is resized before it runs out of storage capacity.

These features enable you to save massively on storage while also simplifying your operations. No more manual or script-based resizing! No more custom alerts before running out of storage.

Benefits of Storage Tiering

But if you feel there should be even more potential to save on storage, you are absolutely right!

The total data stored on a single EBS volume has very different access patterns. Let’s explore together what the average database setup looks like. The typical corporate transactional database will easily qualify as “hot” storage. It is commonly stored on SSD-based EBS volumes. Nobody would think of putting this database on slow file storage stored on HDD or Amazon S3.

Simplyblock tiers infrequently used data blocks automatically to cheaper storage backends.

In reality, however, data that belongs to a database is never homogeneous when it comes to performance requirements. There is, for example, the so-called database transaction log, often referred to as write-ahead log (WAL) or simply a database journal. The WAL is quite sensitive to access latency and requires a high IOPS rate for writes. On the other hand, the log is relatively small compared to the entire dataset in the database.

Furthermore, some other data files store tablespaces and index spaces. Many of them are read so frequently that they are always kept in memory. They do not depend on storage performance. Others are accessed less frequently, meaning they have to be loaded from storage every time they’re accessed. They require solid storage performance on read.

Last but not least, there are large tables which are commonly used for archiving or document storage. They are written or read infrequently and typically in large IO sizes (batches). While throughput speed is relevant for accessing this data, access latency is not.

To support all of the above use cases, simplyblock supports automatic tiering. Our tiering places less frequently accessed data on either Amazon EBS (st1) or Amazon S3, called warm storage. The tiering implementation is optimized for throughput, so large amounts of data can be written or read in parallel. Simplyblock automatically identifies individual segments of data that qualify for tiering, moves them to secondary storage, and cleans them up on the “hot” tier only after tiering has succeeded. This reduces the storage demand in the hot pool.

The AWS cost ratio between hot and warm storage is about 5:1, cutting cost to about 20% for tiered data. Tiering is completely transparent to you and data is automatically read from tiered storage when requested.

Based on our observations, we often see that up to 75% of all stored data can be tiered to warm storage. This creates another massive potential in storage costs savings.
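As a quick sanity check on those numbers, here is the blended cost under the 5:1 price ratio with 75% of data tiered. The 100 TiB dataset split is a made-up example, and costs are in normalized units, not real AWS prices:

```python
# Blended storage cost with tiering, using the ratios from the text:
# warm storage costs ~20% of hot (5:1), and 75% of the data is tiered.
# Sizes describe a made-up 100 TiB dataset; costs are normalized units.

hot_tib = 25        # data that must stay on the hot tier
warm_tib = 75       # data tiered to warm storage
hot_cost = 100      # normalized cost units per TiB on the hot tier
warm_cost = 20      # 5:1 ratio => warm costs 20% of hot

without_tiering = (hot_tib + warm_tib) * hot_cost
with_tiering = hot_tib * hot_cost + warm_tib * warm_cost

print(without_tiering, with_tiering)  # 10000 4000
```

In other words, tiering 75% of the data cuts the storage bill to 40% of the untiered cost in this scenario.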

How to Prevent Data Duplication

But there is yet more to come.

AWS gp3 volumes do not allow multi-attach, meaning the same volume cannot be attached to multiple virtual machines or containers at the same time. Furthermore, their reliability is relatively low (indicated at 99.8% to 99.9%) compared to Amazon S3.

That means neither a loss of availability nor a loss of data can be ruled out in case of an incident.

Therefore, additional steps need to be taken to increase the availability of the storage-consuming service, as well as the reliability of the storage itself. The common measure is storage replication (RAID-1 or application-level replication). However, this adds operational complexity, consumes network bandwidth, and doubles the storage demand and, with it, the storage cost.

Simplyblock removes the need for storage replication. First, the same thinly provisioned volume can be attached to more than one Amazon EC2 instance (or container), and second, the reliability of each individual volume is higher (99.9999%) due to the internal use of erasure coding (parity data) to protect the data.

Multi-attach helps to cut the storage cost by half.
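To see where the halving comes from, compare the raw capacity needed under both schemes. The erasure coding parameters below (k=4 data chunks plus m=1 parity chunk) are illustrative assumptions, not simplyblock’s actual layout:

```python
# Raw capacity required to protect 10 TiB of logical data under two schemes.

logical_tib = 10

# RAID-1 / application-level replication: every byte is stored twice.
replicated_raw_tib = logical_tib * 2

# Erasure coding with k data chunks and m parity chunks (assumed k=4, m=1):
# the overhead factor is (k + m) / k instead of 2x.
k, m = 4, 1
erasure_coded_raw_tib = logical_tib * (k + m) / k

print(replicated_raw_tib, erasure_coded_raw_tib)  # 20 12.5
```

Replication needs 20 TiB of raw storage for 10 TiB of data; the erasure-coded pool gets away with 12.5 TiB under these assumed parameters.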

The Cost of Backup

Last but not least: backups. Yes, there is even more to save.

A snapshot taken from an Amazon EBS volume is stored in an S3-like storage service. However, AWS charges about 3.5 times more per TiB than for the same capacity stored directly on S3.

Snapshots taken from simplyblock logical volumes, however, are stored in a standard Amazon S3 bucket and billed at standard S3 pricing, giving you yet another nice cost reduction.
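The saving is easy to quantify with the ratio quoted above. The 20 TiB snapshot size is a made-up example, and the prices are normalized units, not actual AWS list prices:

```python
# Snapshot cost comparison using the ~3.5x price ratio from the text.

snapshot_tib = 20
s3_cost_per_tib = 1.0              # normalized S3 Standard price
ebs_snapshot_cost_per_tib = 3.5    # ~3.5x S3, per the text

ebs_snapshot_bill = snapshot_tib * ebs_snapshot_cost_per_tib
s3_snapshot_bill = snapshot_tib * s3_cost_per_tib

print(ebs_snapshot_bill, s3_snapshot_bill)  # 70.0 20.0
```

The same snapshot data costs roughly 3.5x more as EBS snapshots than as objects in a standard S3 bucket.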

Near-Zero RPO Disaster Recovery

There is one more feature, though, that we really want to talk about. Disaster recovery is an optional feature that comes with a near-zero RPO and can be deployed without any redundancy on either the block storage or the compute layer between zones. Additionally, no data transfers between zones are needed.

Simplyblock employs asynchronous replication to store any change on the storage pool to an S3 bucket. This enables a fully crash-consistent and near-real-time option for disaster recovery. You can bootstrap and restart your entire environment after a disaster. This works in the same or a different availability zone and without having to take care of backup management yourself.

And if something does happen, whether an accidental deletion or a successful ransomware attack that encrypted your data, simplyblock is here to help. Our asynchronous replication journal provides full point-in-time recovery functionality on the block storage layer. No need for your service or database to support it; just rewind the storage to any point in time in the past.

The journal also uses write and deletion protection on its S3 bucket, making it resilient to ransomware attacks. That said, simplyblock provides a sophisticated solution for disaster recovery and cybersecurity breaches without the need for manual backup management.
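The rewind idea behind journal-based point-in-time recovery can be sketched generically in a few lines. This is a conceptual toy, not simplyblock’s implementation; the journal format and block addresses are invented for illustration:

```python
# Toy model of point-in-time recovery from an append-only write journal.
# Each entry records (timestamp, block_address, data); rewinding to time T
# replays only the entries written at or before T.

journal = [
    (100, 0, b"v1"),  # block 0 written at t=100
    (200, 1, b"aa"),  # block 1 written at t=200
    (300, 0, b"v2"),  # block 0 overwritten at t=300 (e.g. by ransomware)
]

def rewind(journal, point_in_time):
    """Rebuild the block state as of `point_in_time`."""
    blocks = {}
    for ts, addr, data in journal:
        if ts <= point_in_time:
            blocks[addr] = data
    return blocks

print(rewind(journal, 250))  # {0: b'v1', 1: b'aa'}
print(rewind(journal, 350))  # {0: b'v2', 1: b'aa'}
```

Because the journal is append-only and its bucket is write- and deletion-protected, an attacker who encrypts the live blocks cannot erase the history needed for the rewind.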

Simplyblock is Storage Optimization – just for you

Simplyblock provides a number of advantages for environments that utilize a large number of Amazon EBS gp2 or gp3 volumes. Thin provisioning enables you to consolidate unused storage capacity and minimize the spend. Due to automatic pool enlargement (adding EBS volumes or storage nodes to the pool), you’ll never run out of storage space while always provisioning only the minimum required.

Together with automatic tiering, you can move infrequently used data blocks to warm or even cold storage, fully transparently to the application. The same is true for our disaster recovery. Built into the storage layer, every application can benefit from point-in-time recovery, removing almost all RPO (Recovery Point Objective) concerns from your whole infrastructure. And with consistent snapshots across volumes, you can perform a full-blown infrastructure recovery in case of an availability zone outage, right from the ground up.

With simplyblock you get more features than mentioned here. Get started right away and learn about our other features and benefits.

The post Simplyblock for AWS: Environments with many gp2 or gp3 Volumes appeared first on simplyblock.

KubeCon + CloudNativeCon NA 2024: Your Salt Lake City Guide https://www.simplyblock.io/blog/kubecon-2024-salt-lake-city-guide/ Wed, 18 Sep 2024 21:59:00 +0000 https://www.simplyblock.io/?p=1615 In preparation for KubeCon + CloudNativeCon, we at simplyblock figured it was essential to create a guide beyond the conference halls, highlighting the best spots to truly experience Salt Lake City. As a remote only company, we believe this short escapade, with the chance to get the team together, shouldn’t just be about work but […]

The post KubeCon + CloudNativeCon NA 2024: Your Salt Lake City Guide appeared first on simplyblock.


In preparation for KubeCon + CloudNativeCon, we at simplyblock figured it was essential to create a guide beyond the conference halls, highlighting the best spots to truly experience Salt Lake City. As a remote-only company, we believe this short escapade, with the chance to get the team together, shouldn’t just be about work but also about enjoying the local culture, food, and fun. But we’re not the gatekeeping type over here, so we want to share this guide with you in the hope of inspiring you to make the most of your time there. After all, work hard, play hard—that’s our motto, and we hope you’ll join us in embracing both during the conference.

Salt Lake City’s TRAX system is your go-to for easily navigating the city. The straightforward routes and convenient stops make it a breeze to get around, much like finding the right Kubernetes pod. Grab a day pass for unlimited travel, and you’ll be ready to explore the city without any detours. To make your journey even smoother, check out this link for all the details on how to ride TRAX, including tips on routes, stops, and purchasing passes.

Here’s a link to the Google Maps location for all the places to make it more convenient.

By the way, you might see us at one of these places, so don’t be shy about greeting us. The worst we can do is improve your storage efficiency. Alright, I’ll stop pitching, read on and make your plans!

🏛️ Landmark Highlights

Salt Lake City is rich in history and architecture, with several landmarks to explore. Don’t miss the Cathedral of the Madeleine (1) (maps), a beautiful example of Romanesque and Gothic Revival design. Inside, admire vibrant stained glass, intricate woodwork, and serene murals. Whether attending a service or simply enjoying the peaceful atmosphere, it’s a must-visit spot in the city.

Also make time for the Utah State Capitol (2) (maps), a stunning architectural landmark perched on Capitol Hill. This neoclassical building offers free guided tours and breathtaking views of Salt Lake City and the surrounding mountains. Inside, explore the rich history of Utah through various exhibits, and enjoy the peaceful ambiance of the beautifully landscaped grounds, perfect for a leisurely stroll.

❄️ Winter Thrills

Salt Lake City isn’t just a hub for tech enthusiasts during KubeCon + CloudNativeCon—it’s also a vibrant winter wonderland that offers something for everyone. After you’ve tackled all the conference sessions, why not unwind by ice skating at the Gallivan Center? (3) (Maps) This charming downtown rink is a perfect way to enjoy the festive atmosphere—just remember to bundle up, because even tech pros feel the chill!

For a more serene experience, visit Temple Square (4) (Maps), which turns into a dazzling holiday light display in the winter—think of it as a peaceful stroll through a winter wonderland that’ll definitely brighten your day.

🍽️ Food Adventures Await

Heading to KubeCon + CloudNativeCon in Salt Lake City and eager to dive into the local food scene? We’ve got you covered with some of the city’s top dining spots, each offering a unique taste of Salt Lake’s diverse culinary landscape. For authentic Mexican cuisine, head to Red Iguana (5) (Maps), famous for its rich and flavorful moles that will keep you coming back for more. If you’re in the mood for a contemporary twist on American classics, The Copper Onion (6) (Maps) serves up bold, locally sourced dishes like succulent pork belly and house-made pastas. For sushi lovers, Takashi (7) (Maps) is a must-visit, offering fresh, innovative rolls and sashimi that rival those of coastal cities.

HSL (8) (Maps) provides a chic yet cozy atmosphere with modern American dishes like roasted chicken with seasonal veggies, perfect for unwinding after a day of sessions. And when you need a coffee break, The Rose Establishment (9) (Maps) is your go-to spot for quality coffee and artisanal pastries—a cozy hideaway to recharge between conference activities.

🍻 Don’t Forget to Enjoy the Night

Looking for the perfect spot to unwind after a day at KubeCon + CloudNativeCon in Salt Lake City? Here are some top picks for a fun and relaxing evening. Start with The Red Door (10) (Maps), a cozy lounge with a speakeasy vibe, where you can sip on craft cocktails and decompress after all the tech talk.

If you’re in the mood for some nostalgia, head to Quarters Arcade Bar (11) (Maps), where you can relive your childhood with classic arcade games while enjoying a drink. For craft beer enthusiasts, Beer Bar (12) (Maps) offers a wide selection of local brews in a laid-back atmosphere, making it an ideal spot to kick back with colleagues.

Whiskey Street (13) (Maps) is the place for those who appreciate a good whiskey. Its extensive selection, lively atmosphere, and perfect blend of elegance and comfort make it a great spot to enjoy expertly crafted cocktails and delicious food.

The image is a detailed map of downtown Salt Lake City, highlighting key landmarks, restaurants, hotels, and event centers, particularly around the area for KubeCon 2024. The KubeCon location is marked prominently near the center of the map, surrounded by notable places such as the Utah State Capitol, Temple Square, and various dining and lodging options. Numbered points of interest, such as hotels, eateries, and coffee shops, are clearly indicated throughout the map. Roads and streets are labeled, showing easy access routes around the city and the main event venue.

⛷ Salt Lake Ski Resorts

Salt Lake City is the ultimate destination for skiing in November, thanks to its unique combination of weather, terrain, and resort accessibility. Nestled at the base of the Wasatch Mountains (worth visiting in and of themselves), Salt Lake benefits from early-season snowfalls that blanket the region’s world-renowned ski resorts, often making them operational as early as mid-November. The area’s terrain is diverse, ranging from steep and challenging slopes for experts to wide, groomed runs perfect for beginners, ensuring a tailored experience for every skier.

Moreover, Salt Lake City is just a short drive from over 10 major ski resorts, making it incredibly convenient for visitors to access top-tier slopes without long travel times. An efficient transport system, in the form of the ski bus, can take you to all the different ski areas. If you don’t want to travel by bus, there are multiple options here to pick from. You might also want to check out the bundled pricing (with additional perks) for ski resorts.

This may be subjective, but according to our highly adept (world-class) skiing experts at simplyblock, here is the list of the top 5 resorts you may want to visit:

These two are more for experienced skiers and sit in an area called Little Cottonwood; you can get a bundled pass for both here:

  • Alta Ski Area (Maps): Alta Ski Area, one of the first ski areas in the U.S., is renowned for its steep terrain and deep powder, averaging 546 inches of snow annually. Covering 2,614 acres with a 2,538-foot vertical drop, 45% of the terrain caters to beginner and intermediate skiers. Six lifts provide access to its varied slopes, making it a top destination for experienced skiers. You can find multiple lodges in the Alta ski area; our favorite pick is the Snowpine Lodge (Maps). You can view the prices here.
  • Snowbird Resort (Maps): Snowbird’s top elevation is 11,000 feet at Hidden Peak, with a 7,760-foot base at Baby Thunder. The resort’s tram ascends 2,900 vertical feet in just 7 minutes. It gets 500 inches of dry Utah powder annually. SKIING Magazine has ranked Alta and Snowbird the No. 1 resort in the United States. You can view the prices here.

These two are more for fun and for beginners, located in an area called Big Cottonwood:

  • Brighton Resort (Maps): Brighton Resort offers 1,050 acres of skiable terrain with a 1,745-foot vertical drop. Receiving 500 inches of snow annually, Brighton is family-friendly and accessible, with 66 runs and lifts that include five quads, a triple, and a magic carpet. Night skiing is available on 200 acres, making it a well-rounded destination just 35 miles from Salt Lake City. You can view the prices here.
  • Solitude Resort (Maps): Solitude Mountain Resort spans 1,200 acres with a 2,494-foot vertical drop and receives 500 inches of snow each year. With 87 runs catering to all skill levels, the resort offers a peaceful skiing experience. Its eight lifts, including four high-speed quads, provide quick access to varied terrain, from groomed runs to powder-filled glades, making Solitude an ideal escape for skiers of all levels. You can view the prices here.

And a bonus tubing and skiing location, the home of the 2002 Winter Olympics:

  • Soldier Hollow Nordic Center (Maps): Nestled in Wasatch Mountain State Park, Soldier Hollow is famous for its Olympic legacy and offers over 20 miles of cross-country skiing trails through scenic landscapes. It’s a winter paradise for both athletes and the public, with activities like cross-country skiing, snowshoeing, and even public biathlon courses. A highlight is the snow tubing lanes, the longest in Utah, stretching over 1,200 feet. The day lodge provides ski rentals and food, making it an ideal spot for a winter retreat and family fun. We think the Zermatt Utah Resort (Maps) is relatively close and reasonable.

You can find the locations of all of the ski resorts below and on this map list:

The image is a map of the Salt Lake City region, highlighting nearby ski resorts and recreational areas. The map shows Salt Lake City at the top-left, with major highways and roads leading to various mountain areas. Key ski resorts and lodges, such as Snowbird, Solitude Resort Lodging, and Brighton Resort, are labeled in the mountainous areas east and southeast of the city. Other notable locations include Zermatt Utah Resort & Spa and Soldier Hollow Nordic Center, both positioned near recreational and natural areas. The map also outlines nearby cities like Park City and Summit Park, indicating the proximity of these resorts to urban areas.

📜 Closing Note

If you’re attending KubeCon + CloudNativeCon NA 2024 in Salt Lake City, we’re excited to let you know that simplyblock will be there too!

Simplyblock offers cutting-edge storage orchestration solutions tailored for IO-intensive stateful workloads in Kubernetes, including databases and analytics. A single system seamlessly connects local NVMe disks, GP3 volumes, and S3, making it easier to manage storage capacity and performance. With smart NVMe caching, thin provisioning, storage tiering, and volume pooling, we enhance database performance while reducing costs—all without requiring changes to your existing AWS infrastructure.

We invite you to visit the simplyblock booth at the event! Swing by to learn more about how we can optimize your storage solutions and pick up some exclusive freebies. We can’t wait to meet you and discuss how we can help improve your Kubernetes experience. See you there!

The post KubeCon + CloudNativeCon NA 2024: Your Salt Lake City Guide appeared first on simplyblock.

Best Open Source Tools For Kubernetes https://www.simplyblock.io/blog/9-best-open-source-tools-for-kubernetes/ Tue, 17 Sep 2024 22:44:08 +0000 https://www.simplyblock.io/?p=1627 The Kubernetes ecosystem is vibrant and ever-expanding, driven by a community of developers who are committed to enhancing the way we manage and deploy applications. Open-source tools have become an essential part of this ecosystem, offering a wide range of functionalities that streamline Kubernetes operations. These tools are crucial for automating tasks, improving efficiency, and […]

The post Best Open Source Tools For Kubernetes appeared first on simplyblock.

]]>
Kubernetes Ecosystem facts

The Kubernetes ecosystem is vibrant and ever-expanding, driven by a community of developers who are committed to enhancing the way we manage and deploy applications. Open-source tools have become an essential part of this ecosystem, offering a wide range of functionalities that streamline Kubernetes operations. These tools are crucial for automating tasks, improving efficiency, and ensuring that your Kubernetes clusters run smoothly.

As Kubernetes continues to gain popularity, the demand for robust and reliable open-source tools has also increased. Developers and operators are constantly on the lookout for tools that can help them manage their Kubernetes environments more effectively. In this post, we will explore nine must-know open-source tools that can help you optimize your Kubernetes environment.

1. Helm

Helm is often referred to as the package manager for Kubernetes. It simplifies the process of deploying, managing, and versioning Kubernetes applications. With Helm, you can define, install, and upgrade even the most complex Kubernetes applications. By using Helm Charts, you can easily share your application configurations and manage dependencies between them.

2. Prometheus

Prometheus is a leading monitoring and alerting toolkit that’s widely adopted within the Kubernetes community. It collects metrics from your Kubernetes clusters, stores them, and allows you to query and visualize the data. Prometheus is essential for keeping an eye on your infrastructure’s performance and spotting issues before they become critical.

3. Kubectl

Kubectl is the command-line tool that allows you to interact with your Kubernetes clusters. It is indispensable for managing cluster resources, deploying applications, and troubleshooting issues. Whether you’re scaling your applications or inspecting logs, Kubectl provides the commands you need to get the job done.

4. Kustomize

Kustomize is a configuration management tool that helps you customize Kubernetes objects through a file-based approach. It allows you to manage multiple configurations without duplicating YAML manifests. Kustomize’s native support in Kubectl makes it easy to integrate into your existing workflows.

5. Argo CD

Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes. It enables you to manage your application deployments through Git repositories, ensuring that your applications are always in sync with your Git-based source of truth. Argo CD offers features like automated sync, rollback, and health status monitoring, making it a powerful tool for CI/CD pipelines.

6. Istio

Istio is an open-source service mesh that provides traffic management, security, and observability for microservices. It simplifies the complexity of managing network traffic between services in a Kubernetes cluster. Istio helps ensure that your applications are secure, reliable, and easy to monitor.

7. Fluentd

Fluentd is a versatile log management tool that helps you collect, process, and analyze logs from various sources within your Kubernetes cluster. With Fluentd, you can unify your log data into a single source and easily route it to multiple destinations, making it easier to monitor and troubleshoot your applications.

8. Velero

Velero is a backup and recovery solution for Kubernetes clusters. It allows you to back up your cluster resources and persistent volumes, restore them when needed, and even migrate resources between clusters. Velero is an essential tool for disaster recovery planning in Kubernetes environments.

9. Kubeapps

Kubeapps is a web-based UI for deploying and managing applications on Kubernetes. It provides a simple and intuitive interface for browsing Helm charts, managing your applications, and even configuring role-based access control (RBAC). Kubeapps makes it easier for developers and operators to work with Kubernetes applications.

Conclusion

These nine open-source tools are integral to optimizing and managing your Kubernetes environment. Each of them addresses a specific aspect of Kubernetes operations, from monitoring and logging to configuration management and deployment. By integrating these tools into your Kubernetes workflow, you can enhance your cluster’s efficiency, reliability, and security.

However, there is more. Simplyblock offers a wide range of benefits to many of the above tools, either by enhancing their capability with high performance and low latency storage options, or by directly integrating with them.

Simplyblock is the intelligent storage orchestrator for Kubernetes. We provide the Kubernetes community with easy-to-use virtual NVMe block devices by combining the power of Amazon EBS and Amazon S3, as well as local instance storage. Seamlessly integrated into Kubernetes as a StorageClass (CSI), simplyblock enables Kubernetes workloads that require high IOPS and ultra-low latency. Deployed directly into your AWS account, simplyblock takes full responsibility for your data and storage infrastructure, scaling and growing dynamically to meet your storage demands at any point in time.

Why Choose Simplyblock for Kubernetes?

Choosing simplyblock for your Kubernetes workloads comes with several compelling benefits to optimize your workload performance, scalability, and cost-efficiency. Elastic block storage powered by simplyblock is designed for IO-intensive and predictable low latency workloads.

  • Increase Cost-Efficiency: Optimize resource scaling to exactly meet your current requirements and reduce the overall cloud spend. Grow as needed, not upfront.
  • Maximize Reliability and Speed: Get the best of both worlds with ultra low latency of local instance storage combined with the reliability of Amazon EBS and Amazon S3.
  • Enhance Security: Get an immediate mitigation strategy for availability zone, and even region, outages using simplyblock’s S3 journaling and Point in Time Recovery (PITR) for any application.

If you’re looking to further streamline your Kubernetes operations, simplyblock offers comprehensive solutions that integrate seamlessly with these tools, helping you get the most out of your Kubernetes environment.

Ready to take your Kubernetes management to the next level? Contact simplyblock today to learn how we can help you simplify and enhance your Kubernetes journey.

The post Best Open Source Tools For Kubernetes appeared first on simplyblock.

RDS vs. EKS: The True Cost of Database Management https://www.simplyblock.io/blog/rds-vs-eks/ Thu, 12 Sep 2024 23:21:23 +0000 https://www.simplyblock.io/?p=1641 Databases can make up a significant portion of the costs for a variety of businesses and enterprises, and in particular for SaaS, Fintech, or E-commerce & Retail verticals. Choosing the right database management solution can make or break your business margins. But have you ever wondered about the true cost of your database management? Is […]

The post RDS vs. EKS: The True Cost of Database Management appeared first on simplyblock.

Databases can make up a significant portion of the costs for a variety of businesses and enterprises, and in particular for SaaS, Fintech, or E-commerce & Retail verticals. Choosing the right database management solution can make or break your business margins. But have you ever wondered about the true cost of your database management? Is your current solution really as cost-effective as you think? Let’s dive deep into the world of database management and uncover the hidden expenses that might be eating away at your bottom line.

The Database Dilemma: Managed Services or Self-Managed?

The first crucial decision comes when choosing the operating model for your databases: should you opt for managed services like AWS RDS or take the reins yourself with a self-managed solution on Kubernetes? It’s not just about the upfront costs – there’s a whole iceberg of expenses lurking beneath the surface.

The Allure of Managed Services

At first glance, managed services like AWS RDS seem to be a no-brainer. They promise hassle-free management, automatic updates, and round-the-clock support. But is it really as rosy as it seems?

The Visible Costs

  1. Subscription Fees: You’re paying for the convenience, and it doesn’t come cheap.
  2. Storage Costs: Every gigabyte counts, and it adds up quickly.
  3. Data Transfer Fees: Moving data in and out? Be prepared to open your wallet.

The Hidden Expenses

  1. Overprovisioning: Are you paying for more than you actually use?
  2. Personnel Costs: Using RDS and assuming you don’t need to understand databases anymore? Surprise! You still need a team to configure the database and tune it to your requirements.
  3. Performance Limitations: When you hit a ceiling, scaling up can be costly.
  4. Vendor Lock-in: Switching providers will cost you time and money.
  5. Data Migration: Moving data between services can cost a fortune.
  6. Backup and Storage: Those “convenient” backups aren’t free. In addition, AWS RDS does not let you plug in any storage solution other than AWS-native EBS volumes, which can get quite expensive if your database is IO-intensive.

The Power of Self-Managed Kubernetes Databases

On the flip side, managing your databases on Kubernetes might seem daunting at first. But let’s break it down and see where you could be saving big.

Initial Investment

  1. Learning Curve: Yes, there’s an upfront cost in time and training. You need engineers on your team who are comfortable with Kubernetes or Amazon EKS.
  2. Setup and Configuration: Getting things right takes effort, but it pays off.

Long-term Savings

  1. Flexibility: Scale up or down as needed, without overpaying.
  2. Multi-Cloud Freedom: Avoid vendor lock-in and negotiate better rates.
  3. Resource Optimization: Use your hardware efficiently across workloads.
  4. Resource Sharing: Kubernetes lets you efficiently allocate resources.
  5. Open-Source Tools: Leverage free, powerful tools for monitoring and management.
  6. Customization: Tailor your setup to your exact needs, no compromise.

Where are the Savings Coming from when using Kubernetes for your Database Management?

In a self-managed Kubernetes environment, you have greater control over resource allocation, leading to improved utilization and efficiency. Here’s why:

a) Dynamic Resource Allocation: Kubernetes allows for fine-grained control over CPU and memory allocation. You can set resource limits and requests at the pod level, ensuring databases only use what they need. Example: During off-peak hours, you can automatically scale down resources, whereas in managed services you often pay for fixed resources 24/7.

b) Bin Packing: The Kubernetes scheduler efficiently packs containers onto nodes, maximizing resource usage. This means you can run more workloads on the same hardware, reducing overall infrastructure costs. Example: You might be able to run both your database and application containers on the same node, optimizing server usage.

c) Avoiding Overprovisioning: With managed services, you often need to provision for peak load at all times. In Kubernetes, you can use Horizontal Pod Autoscaling to add resources only when needed. Example: During a traffic spike, you can automatically add more database replicas, then scale down when the spike ends.

d) Resource Quotas: Kubernetes allows setting resource quotas at the namespace level, preventing any single team or application from monopolizing cluster resources. This leads to more efficient resource sharing across your organization.
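The overprovisioning point can be put into numbers with a quick back-of-the-envelope calculation; the replica counts and hourly load profile are invented for the example:

```python
# Replica-hours paid per day: fixed peak-sized provisioning vs. autoscaling.

peak_replicas = 8                      # capacity needed at peak
hourly_demand = [2] * 18 + [8] * 6     # 18 quiet hours, 6 peak hours

fixed_replica_hours = peak_replicas * 24    # managed service sized for peak, 24/7
autoscaled_replica_hours = sum(hourly_demand)

print(fixed_replica_hours, autoscaled_replica_hours)  # 192 84
```

In this made-up profile, autoscaling pays for less than half the replica-hours of always-on peak provisioning.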

Self-managed Kubernetes databases can also significantly reduce data transfer costs compared to managed services. Here’s how:

a) Co-location of Services : In Kubernetes, you can deploy your databases and application services in the same cluster. This reduces or eliminates data transfer between zones or regions, which is often charged in managed services. Example: If your app and database are in the same Kubernetes cluster, inter-service communication doesn’t incur data transfer fees.

b) Efficient Data Replication : Kubernetes allows for more control over how and when data is replicated. You can optimize replication strategies to reduce unnecessary data movement. Example: You might replicate data during off-peak hours or use differential backups to minimize data transfer.

c) Avoid Provider Lock-in : Managed services often charge for data egress, especially when moving to another provider. With self-managed databases, you have the flexibility to choose the most cost-effective data transfer methods. Example: You could use direct connectivity options or content delivery networks to reduce data transfer costs between regions or clouds.

d) Optimized Backup Strategies : Self-managed solutions allow for more control over backup processes. You can implement incremental backups or use deduplication techniques to reduce the amount of data transferred for backups. Example: Instead of full daily backups (common in managed services), you might do weekly full backups with daily incrementals, significantly reducing data transfer.

e) Multi-Cloud Flexibility : Self-managed Kubernetes databases allow you to strategically place data closer to where it’s consumed. This can reduce long-distance data transfer costs, which are often higher. Example: You could have a primary database in one cloud and read replicas in another, optimizing for both performance and cost.
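To put a rough number on the incremental-backup example in (d) above, here is a back-of-the-envelope sketch. The database size and the daily change rate are assumptions for illustration, not measured figures:

```python
# Weekly backup transfer: daily full backups vs. one weekly full plus
# six daily incrementals. Figures are illustrative assumptions.
db_size_gb = 3000        # 3 TB database
daily_change = 0.02      # assume ~2% of the data changes per day

daily_fulls_gb = 7 * db_size_gb                                # a full backup every day
full_plus_incr_gb = db_size_gb + 6 * db_size_gb * daily_change # weekly full + incrementals

reduction = 1 - full_plus_incr_gb / daily_fulls_gb
print(f"{daily_fulls_gb} GB vs {full_plus_incr_gb:.0f} GB per week "
      f"({reduction:.0%} less backup transfer)")
```

With these assumptions the switch cuts weekly backup transfer by more than 80%; the exact figure depends entirely on your change rate.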

By leveraging these strategies in a self-managed Kubernetes environment, organizations can significantly optimize their resource usage and reduce data transfer costs, leading to substantial savings compared to typical managed database services.

Breaking down the Numbers: a Cost Comparison between PostgreSQL on RDS vs EKS

Let’s get down to brass tacks. How do the costs really stack up? We’ve crunched the numbers for a small PostgreSQL database, comparing the managed RDS service with self-hosting on Kubernetes. For Kubernetes, we use EC2 instances with local NVMe disks, managed by EKS, with simplyblock as the storage orchestration layer.

Scenario: 3TB Postgres Database with High Availability (3 nodes) and Single AZ Deployment

Managed Service (AWS RDS) using three db.m4.2xlarge on Demand with gp3 Volumes

Available resources

  • vCPU: 8 per instance
  • Memory: 32 GiB per instance
  • Storage: 3TB
  • IOPS: 20,000 per volume
  • Storage latency: 1-2 milliseconds

Costs

Monthly Total Cost: $2,511.18
3-Year Total: $2,511.18 x 36 months = $90,402

Editorial: See the pricing calculator for Amazon RDS for PostgreSQL

Self-Managed on Kubernetes (EKS) using three i3en.xlarge Instances on Demand

Available resources

  • vCPU: 12 (total across the 3 nodes)
  • Memory: 96 GiB (total)
  • Storage: 3.75TB usable (7.5TB raw storage with an assumed 50% data protection overhead for simplyblock)
  • IOPS: 200,000 per volume (10x more than with RDS)
  • Storage latency: below 200 microseconds (local NVMe disks orchestrated by simplyblock)

Costs

  • Monthly instance cost: $989.88
  • Monthly storage orchestration cost (e.g. simplyblock): $90 (3TB x $30/TB)
  • Monthly EKS cost: $219 ($73 per cluster x 3)

Monthly Total Cost: $1,298.88
3-Year Total: $1,298.88 x 36 months = $46,759
Base Savings: $90,402 – $46,759 = $43,643 (48% over 3 years)
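The totals above are simple arithmetic and can be reproduced in a few lines. The prices are the example figures from this comparison, not live AWS pricing:

```python
# Reproduce the 3-year RDS vs. EKS cost comparison above.
MONTHS = 36

rds_monthly = 2511.18                 # 3x db.m4.2xlarge + gp3 storage
eks_monthly = 989.88 + 90.0 + 219.0   # instances + storage orchestration + EKS

rds_total = rds_monthly * MONTHS
eks_total = eks_monthly * MONTHS
savings = rds_total - eks_total

print(f"RDS: ${rds_total:,.0f}  EKS: ${eks_total:,.0f}  "
      f"savings: ${savings:,.0f} ({savings / rds_total:.0%})")
```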

That’s a whopping 48% savings over three years! But wait, there’s more to consider. We have made some simplifying assumptions to estimate the additional benefits of self-hosting and to showcase the real savings potential. While the actual efficiencies vary from company to company, this should give a good understanding of where the hidden benefits lie.

Additional Benefits of Self-Hosting (Estimated Annual Savings)

  1. Resource optimization/sharing : Assumption: 20% better resource utilization (assuming existing Kubernetes clusters) Estimated Annual Saving: 20% x $989.88 x 12 = $2,375
  2. Reduced Data Transfer Costs : Assumption: 50% reduction in data transfer fees Estimated Annual Saving: $2,000
  3. Flexible Scaling : Avoid over-provisioning during non-peak times Estimated Annual Saving: $3,000
  4. Multi-Cloud Strategy : Ability to negotiate better rates across providers Estimated Annual Saving: $5,000
  5. Open-Source Tools : Reduced licensing costs for management tools Estimated Annual Saving: $4,000

Disaster Recovery Insights

  • RTO (Recovery Time Objective) Improvement : Self-managed: Potential for 40% faster recovery Estimated value: $10,000 per hour of downtime prevented
  • RPO (Recovery Point Objective) Enhancement : Self-managed: Achieve near-zero data loss Estimated annual value: $20,000 in potential data loss prevention

Total Estimated Annual Benefit of Self-Hosting

Self-hosting pays off. Here is the summary of benefits:

  • Base Savings: $14,548/year (the $43,643 infrastructure savings above, spread over 3 years)
  • Additional Benefits: $16,375/year (the five items above combined)
  • Disaster Recovery Improvement: $30,000/year (conservative estimate)

Total Estimated Annual Additional Benefit: $60,923

Total Estimated Additional Benefits over 3 Years: $182,768

Note: These figures are estimates and can vary based on specific use cases, implementation efficiency, and negotiated rates with cloud providers.

Beyond the Dollar Signs: the Real Value Proposition

Money talks, but it’s not the only factor in play. Let’s look at the broader picture.

Performance and Scalability

With self-managed Kubernetes databases, you’re in the driver’s seat. Need to scale up for a traffic spike? Done. Want to optimize for a specific workload? You’ve got the power.

Security and Compliance

Think managed services have the upper hand in security? Think again. With self-managed solutions, you have granular control over your security measures. Plus, you’re not sharing infrastructure with unknown entities.

Innovation and Agility

In the fast-paced tech world, agility is king. Self-managed solutions on Kubernetes allow you to adopt cutting-edge technologies and practices without waiting for your provider to catch up.

Is the Database on Kubernetes for Everyone?

Definitely not. While self-managed databases on Kubernetes offer significant benefits in terms of cost savings, flexibility, and control, they’re not a one-size-fits-all solution. Here’s why:

  • Expertise: Managing databases on Kubernetes demands a high level of expertise in both database administration and Kubernetes orchestration. Not all organizations have this skill set readily available. Self-management means taking on responsibilities like security patching, performance tuning, and disaster recovery planning. For smaller teams or those with limited DevOps resources, this can be overwhelming.
  • Scale of operations : For simple applications with predictable, low-to-moderate database requirements, the advanced features and flexibility of Kubernetes might be overkill. Managed services could be more cost-effective in these scenarios. The same applies to very small operations or early-stage startups – the cost benefits of self-managed databases on Kubernetes might not outweigh the added complexity and resource requirements.

While database management on Kubernetes offers compelling advantages, organizations must carefully assess their specific needs, resources, and constraints before making the switch. For many, especially larger enterprises or those with complex, dynamic database requirements, the benefits can be substantial. However, others might find that managed services better suit their current needs and capabilities.

Bonus: Simplyblock

There is one more bonus benefit that you get when running your databases in Kubernetes: you can add simplyblock as your storage orchestration layer behind a single CSI driver that automatically and intelligently serves the storage service of your choice. Do you need a fast NVMe cache for some hot transactional data with random IO but don’t want to keep it hot forever? We’ve got you covered!
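In practice this looks like an ordinary StorageClass and PersistentVolumeClaim. The sketch below is generic; the provisioner name and the qosClass parameter are placeholders, not confirmed simplyblock identifiers, so check the simplyblock CSI driver documentation for the actual values:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: simplyblock-nvme          # illustrative name
provisioner: csi.simplyblock.io   # placeholder; see the simplyblock CSI docs
parameters:
  qosClass: high-iops             # hypothetical parameter for hot, random-IO data
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: simplyblock-nvme
  resources:
    requests:
      storage: 100Gi
```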

Simplyblock is an innovative cloud-native storage product, which runs on AWS, as well as other major cloud platforms. Simplyblock virtualizes, optimizes, and orchestrates existing cloud storage services (such as Amazon EBS or Amazon S3) behind an NVMe storage interface and a Kubernetes CSI driver. As such, it provides storage for compute instances (VMs) and containers. We have optimized it for IO-heavy database workloads, including OLTP relational databases, graph databases, non-relational document databases, analytical databases, fast key-value stores, vector databases, and similar solutions.

This optimization has been built from the ground up to orchestrate a wide range of database storage needs, such as reliable and fast (high write-IOPS) storage for write-ahead logs and support for ultra-low latency, as well as high IOPS for random read operations. Simplyblock is highly configurable to optimally serve the different database query engines.

Some of the key benefits of using simplyblock alongside your stateful Kubernetes workloads are:

  • Cost Reduction, Margin Increase: Thin provisioning, compression, deduplication of hot-standby nodes, and storage virtualization with multiple tenants increases storage usage while enabling gradual storage increase.
  • Easy Scalability of Storage: Single node databases require highly scalable storage (IOPS, throughput, capacity) since data cannot be distributed to scale. Simplyblock pools either Amazon EBS volumes or local instance storage from EC2 virtual machines and provides a scalable and cost effective storage solution for single node databases.
  • Enables Database Branching Features: Using instant snapshots and clones, databases can be quickly branched out and provided to customers. Due to copy-on-write, the storage usage doesn’t increase unless the data is changed on either the primary or branch. Customers could be charged for “additional storage” though.
  • Enhances Security: Using an S3-based streaming of a recovery journal, the database can be quickly recovered from full AZ and even region outages. It also provides protection against typical ransomware attacks where data gets encrypted by enabling Point-in-Time-Recovery down to a few hundred milliseconds granularity.
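The thin-provisioning effect in the first bullet is easy to illustrate with assumed numbers; the 30% utilization figure below is an assumption for illustration, not a simplyblock measurement:

```python
# Thick vs. thin provisioning for 10 tenants, each provisioned 1 TB
# but actually writing only ~30% of it (assumed utilization).
tenants = 10
provisioned_tb = 1.0
utilization = 0.30   # assumption

thick_tb = tenants * provisioned_tb                # capacity bought up front
thin_tb = tenants * provisioned_tb * utilization   # capacity actually backed

print(f"thick: {thick_tb:.0f} TB, thin: {thin_tb:.0f} TB "
      f"({1 - thin_tb / thick_tb:.0%} less physical capacity)")
```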

Conclusion: the True Cost Revealed

When it comes to database management, the true cost goes far beyond the monthly bill. By choosing a self-managed Kubernetes solution, you’re not just saving money – you’re investing in flexibility, performance, and future-readiness. The savings and benefits will always be use-case- and company-specific, but the general conclusion remains unchanged. While operating databases in Kubernetes is not for everyone, for those who have the privilege of such a choice, it should be a no-brainer.

Is managing databases on Kubernetes complex?

While there is a learning curve, modern tools and platforms like simplyblock significantly simplify the process, often making it more straightforward than dealing with the limitations of managed services. Moreover, the knowledge acquired in the process can be reused across deployments in different clouds.

How can I ensure high availability with self-managed databases?

Kubernetes offers robust features for high availability, including automatic failover and load balancing. With proper configuration, you can achieve even higher availability than many managed services offer, meeting virtually any SLA, and you remain in full control of those SLAs.

How difficult is it to migrate from a managed database service to Kubernetes?

While migration requires careful planning, tools and services exist to streamline the process. Many companies find that the long-term benefits far outweigh the short-term effort of migration.

How does simplyblock handle database backups and point-in-time recovery in Kubernetes?

Simplyblock provides automated, space-efficient backup solutions that integrate seamlessly with Kubernetes. Our point-in-time recovery feature allows you to restore your database to any specific moment, offering protection against data loss and ransomware attacks.

Does simplyblock offer support for multiple database types?

Yes, simplyblock supports a wide range of database types including relational databases like PostgreSQL and MySQL, as well as NoSQL databases like MongoDB and Cassandra. Check out our “Supported Technologies” page for a full list of supported databases and their specific features.

The post RDS vs. EKS: The True Cost of Database Management appeared first on simplyblock.

AWS Migration: How to Migrate into the Cloud? Data Storage Perspective. https://www.simplyblock.io/blog/aws-migration-how-to-migrate-into-the-cloud/ Thu, 12 Sep 2024 23:17:55 +0000 https://www.simplyblock.io/?p=1637 Migrating to the cloud can be daunting, but it becomes a manageable and rewarding process with the right approach and understanding of the storage perspective. Amazon Web Services (AWS) offers a comprehensive suite of tools and services to facilitate your migration journey, ensuring your data is securely and efficiently transitioned to the cloud. In this […]

The post AWS Migration: How to Migrate into the Cloud? Data Storage Perspective. appeared first on simplyblock.

Migrating to the cloud can be daunting, but it becomes a manageable and rewarding process with the right approach and understanding of the storage perspective. Amazon Web Services (AWS) offers a comprehensive suite of tools and services to facilitate your migration journey, ensuring your data is securely and efficiently transitioned to the cloud. In this guide, we’ll walk you through the essential steps and considerations for migrating to AWS from a storage perspective.

Why Migrate to AWS?

Migrating to AWS offers numerous benefits, including scalability, cost savings, improved performance, and enhanced security. AWS’s extensive range of storage solutions caters to diverse needs, from simple object storage to high-performance block storage. By leveraging AWS’s robust infrastructure, businesses can focus on innovation and growth without worrying about underlying IT challenges.

Understanding AWS Storage Options

Before diving into the migration process, it’s crucial to understand the various storage options AWS offers:

  • Amazon S3 (Simple Storage Service) Amazon S3 is an object storage service that provides scalability, data availability, security, and performance. It’s ideal for storing and retrieving data at any time.
  • Amazon EBS (Elastic Block Store) Amazon EBS provides block storage for EC2 instances. It’s suitable for applications requiring low-latency data access and offers different volume types optimized for performance and cost.
  • Amazon EFS (Elastic File System) Amazon EFS is designed to be highly scalable and elastic. It provides scalable file storage for use with AWS Cloud services and on-premises resources.
  • Amazon Glacier Amazon Glacier is a secure, durable, and extremely low-cost cloud storage service for data archiving and long-term backup. It’s ideal for data that is infrequently accessed.

AWS provides several migration tools, such as AWS DataSync and AWS Snowball, to ensure a smooth and efficient data migration process. Based on your data volume and migration requirements, choose the right tool.

How is data stored in AWS? Each AWS storage service stores its data separately. That means AWS storage services are not synchronized, and your data may end up duplicated several times across services. Coordination between AWS storage services can be handled by orchestration tools such as simplyblock.

Steps for Migrating to AWS

1. Assess your Current Environment

Begin by evaluating your current storage infrastructure. Identify the types of data you store, how often it’s accessed, and any compliance requirements. This assessment will help you choose the right AWS storage services for your needs.

2. Plan your Migration Strategy

Develop a comprehensive migration plan that outlines the steps, timelines, and resources required. Decide whether you’ll use a lift-and-shift approach, re-architecting, or a hybrid strategy.

3. Choose the right AWS Storage Services

Based on your assessment, select the appropriate AWS storage services. For instance, Amazon S3 can be used for object storage, EBS for block storage, and EFS for scalable file storage.

4. Set up the AWS Environment

Set up your AWS environment, including creating an AWS account, configuring Identity and Access Management (IAM) roles, and setting up Virtual Private Clouds (VPCs).

5. Use AWS Migration Tools

AWS offers several tools to assist with migration, such as:

  • AWS Storage Gateway, which bridges your on-premises data and AWS Cloud storage
  • AWS DataSync, which automates moving data between on-premises storage and AWS
  • AWS Snowball, which physically transports large amounts of data to AWS

6. Migrate Data

Start migrating your data using the chosen AWS tools and services. Ensure data integrity and security during the transfer process. Test the migrated data to verify its accuracy and completeness.

7. Optimize Storage Performance

After migration, monitor and optimize your storage performance. Use AWS CloudWatch to track performance metrics and make necessary adjustments to enhance efficiency.

8. Ensure Data Security and Compliance

AWS provides various security features to protect your data, including encryption, access controls, and monitoring. Ensure your data meets regulatory compliance requirements.

9. Validate and Test

Conduct thorough testing to validate that your applications function correctly in the new environment. Ensure that data access and performance meet your expectations.

10. Decommission Legacy Systems

Once you’ve confirmed your data’s successful migration and testing, you can decommission your legacy storage systems. Ensure all data has been securely transferred and backed up before decommissioning.

Common Challenges in AWS Migration

1. Data Transfer Speed

Large data transfers can take time. Use tools like AWS Snowball for faster data transfer.

2. Data Compatibility

Ensure your data formats are compatible with AWS storage services. Consider data transformation if necessary.

3. Security Concerns

Data security is paramount. Utilize AWS security features such as encryption and IAM roles.

4. Cost Management

Monitor and manage your AWS storage costs. Use AWS Cost Explorer and set up budget alerts.

Benefits of AWS Storage Solutions

  1. Scalability: AWS storage solutions scale according to your needs, ensuring you never run out of space.
  2. Cost-Effectiveness: Pay only for the storage you actually use and leverage different storage tiers to optimize costs.
  3. Reliability: AWS guarantees high availability and durability for your data.
  4. Security: Robust security features protect your data against unauthorized access and threats.
  5. Flexibility: Choose from various storage options for different workloads and applications.

Conclusion

Migrating to AWS from a storage perspective involves careful planning, execution, and optimization. By understanding the various AWS storage options and following a structured migration process, you can ensure a smooth transition to the cloud. AWS’s comprehensive suite of tools and services simplifies the migration journey, allowing you to focus on leveraging the cloud’s benefits for your business.

FAQs

What is the best AWS Storage Service for Archiving Data?

Amazon Glacier is ideal for archiving data due to its low cost and high durability.

How can I Ensure Data Security during Migration to AWS?

Utilize AWS encryption, access controls, and compliance features to secure your data during migration.

What tools can I use to migrate data to AWS?

AWS offers several tools to facilitate data migration, including AWS Storage Gateway, AWS DataSync, and AWS Snowball.

How do I Optimize Storage Costs in AWS?

Monitor usage with AWS Cost Explorer, choose appropriate storage tiers, and use lifecycle policies to manage data.
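One concrete lever is an S3 lifecycle configuration. A sketch of a lifecycle rule that transitions older objects to Glacier and eventually expires them (the rule name, prefix, and day counts are illustrative):

```json
{
  "Rules": [
    {
      "ID": "archive-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "backups/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```

Applied with `aws s3api put-bucket-lifecycle-configuration`, this moves objects under `backups/` to Glacier after 30 days and deletes them after a year.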

Can I Migrate my On-premises Database to AWS?

AWS provides services like AWS Database Migration Service (DMS) to help you migrate databases to the cloud.

How Simplyblock can be used with AWS Migration

Migrating to AWS can be a complex process, but using simplyblock can significantly simplify this journey while optimizing your costs, too.

Simplyblock software provides a seamless bridge between local NVMe disk, Amazon EBS, and Amazon S3, integrating these storage options into a cohesive system designed for the ultimate scale and performance of IO-intensive stateful workloads. By combining the high performance of local NVMe storage with the reliability and cost-efficiency of EBS (gp2 and gp3 volumes) and S3, respectively, simplyblock enables enterprises to optimize their storage infrastructure for stateful applications, ensuring scalability, cost savings, and enhanced performance. With simplyblock, you can save up to 80% of your AWS database storage costs.

Our technology uses NVMe over TCP for minimal access latency, high IOPS/GB, and efficient CPU core utilization, outperforming local NVMe disks and Amazon EBS in cost/performance ratio at scale. Ideal for high-performance Kubernetes environments, simplyblock combines the benefits of local-like latency with the scalability and flexibility necessary for dynamic AWS EKS deployments, ensuring optimal performance for I/O-sensitive workloads like databases. Using erasure coding (a better RAID) instead of replicas, simplyblock minimizes storage overhead while maintaining data safety and fault tolerance. This approach reduces storage costs without compromising reliability.

Simplyblock also includes additional features such as instant snapshots (full and incremental), copy-on-write clones, thin provisioning, compression, encryption, and many more – in short, there are many ways in which simplyblock can help you optimize your cloud costs. Get started using simplyblock right now and see how simplyblock can simplify and optimize your AWS migration. Simplyblock is available on AWS Marketplace.


What is the AWS Workload Migration Program and how simplyblock can help you with cloud migration? https://www.simplyblock.io/blog/what-is-the-aws-workload-migration-program-and-how-simplyblock-can-help-you-with-cloud-migration/ Thu, 12 Sep 2024 23:13:24 +0000 https://www.simplyblock.io/?p=1633 What is the AWS Workload Migration Program? The AWS Workload Migration Program is a comprehensive framework designed to help organizations migrate their workloads to the AWS cloud efficiently and effectively. It encompasses a range of tools, best practices, and services that streamline the migration process. Key Features of the AWS Workload Migration Program Benefits of […]

The post What is the AWS Workload Migration Program and how simplyblock can help you with cloud migration? appeared first on simplyblock.

What is the AWS Workload Migration Program?

The AWS Workload Migration Program is a comprehensive framework designed to help organizations migrate their workloads to the AWS cloud efficiently and effectively. It encompasses a range of tools, best practices, and services that streamline the migration process.

Key Features of the AWS Workload Migration Program

  1. Comprehensive Migration Strategy: The program offers a step-by-step migration strategy tailored to meet the specific needs of different workloads and industries.
  2. Robust Tools and Services: AWS provides a suite of robust tools and services, including AWS Migration Hub, AWS Application Migration Service, and AWS Database Migration Service, to facilitate smooth and secure migrations.

Benefits of using AWS Workload Migration Program

  1. Reduced Migration Time: With pre-defined best practices and automated tools, the migration process is significantly faster, reducing downtime and disruption.
  2. Minimized Risks: The program includes risk management strategies to ensure data integrity and security throughout the migration process.

Steps Involved in the AWS Workload Migration Program

  1. Assessment Phase: Assess your current workloads to understand their requirements and dependencies, and define clear objectives for what you want to achieve with the migration, such as improved performance, cost savings, or scalability.
  2. Planning Phase: Develop a detailed migration plan that outlines the steps, timelines, and resources required, and establish success criteria to measure the effectiveness of the migration against your business goals.
  3. Migration Phase: Execute the migration using AWS tools and services, ensuring minimal disruption to your operations. Minimize downtime with strategies such as live data replication and phased cutovers.
  4. Optimization Phase: After migration, optimize your workloads for performance and cost-efficiency using AWS and simplyblock tools, and monitor them continuously to ensure they run optimally and to identify areas for improvement.

Challenges in Cloud Migration

  1. Common Migration Hurdles: Ensuring the security of data during and after migration is a top priority and a common challenge, and making sure applications and systems are compatible with the new cloud environment can be complex.
  2. Overcoming Migration Challenges: Leveraging the right tools, such as AWS Migration Hub and simplyblock’s storage solutions, helps overcome these challenges, and working with experienced cloud migration experts provides the guidance needed to navigate complex migrations successfully.

Simplyblock and Cloud Migration

Introduction to Simplyblock

Simplyblock offers advanced AWS storage orchestration solutions designed to enhance the performance and reliability of cloud workloads. Simplyblock integrates seamlessly with AWS, making it easy to use their advanced storage solutions in conjunction with AWS services.

Key Benefits of using Simplyblock for Cloud Migration

  1. Enhanced Performance: simplyblock’s advanced storage solutions deliver superior performance, reducing latency and increasing IOPS for your workloads, and offer storage tiering, thin provisioning, and multi-attach, capabilities that are not commonly available in the cloud but are standard in private cloud data centers.
  2. Improved Cost Efficiency: simplyblock helps you optimize storage costs while maintaining high performance, making cloud migration more cost-effective. You don’t have to pay more for storage in the cloud than you would for a SAN system in a private cloud.
  3. Increased Reliability: simplyblock’s storage solutions offer high durability and reliability, ensuring your data is secure and available when you need it. You can tune data durability to your needs: simplyblock offers full flexibility in how storage is orchestrated and provides various disaster recovery and cybersecurity protection options.

Best Practices for Cloud Migration with Simplyblock

Pre-Migration Preparations

Assessing Storage Needs: Evaluate your storage requirements to choose the right simplyblock solutions for your migration. Data Backup Strategies: Implement robust data backup strategies to protect your data during the migration process.

Migration Execution

Using simplyblock Tools: Leverage simplyblock’s tools to streamline the migration process and ensure a smooth transition. Monitoring Progress: Continuously monitor the migration to identify and address any issues promptly.

Post-Migration Tips

Optimizing Performance: Optimize your workloads post-migration to ensure they are running at peak performance. Ensuring Data Security: Maintain stringent security measures to protect your data in the cloud environment.

Simplyblock integrates seamlessly with AWS, providing robust storage solutions that complement the AWS Workload Migration Program. Optimize your cloud journey with simplyblock.

Frequently Asked Questions (FAQs)

What is the AWS Workload Migration Program?

The AWS Workload Migration Program is a comprehensive framework designed to help organizations migrate their workloads to the AWS cloud efficiently and effectively.

How does Simplyblock Integrate with AWS?

Simplyblock integrates seamlessly with AWS, providing advanced storage solutions that enhance performance and reliability during and after migration.

What are the Key Benefits of using Simplyblock for Cloud Migration?

Using simplyblock for cloud migration offers enhanced performance, improved cost efficiency, and increased reliability, ensuring a smooth transition to the cloud.

How can Simplyblock Improve the Performance of Migrated Workloads?

Simplyblock helps by lowering access latency and providing a high density of IOPS/GB, ensuring efficient data handling and superior performance for migrated workloads.

What are some Common Challenges in Cloud Migration and how does Simplyblock Address Them?

Common challenges in cloud migration include data security concerns and compatibility issues. Simplyblock addresses these challenges with robust security features, seamless AWS integration, and advanced storage solutions.

How Simplyblock can be used with Workload Migration Program

When migrating workloads to AWS, simplyblock can significantly optimize your storage infrastructure and reduce costs.

simplyblock is a cloud storage orchestration platform that reduces AWS database storage costs by 50-75%. It offers a single interface to various storage services, combining the high performance of local NVMe disks with the durability of S3 storage. Savings are mostly achieved by:

  1. Data reduction: Eliminating storage that you provision and pay for but do not use (thin provisioning)
  2. Intelligent tiering: Optimizing data placement for cost and performance between various storage tiers (NVMe, EBS, S3, Glacier, etc.)
  3. Data efficiency features: Reducing data duplication on storage via multi-attach and deduplication

All services are accessible via a single logical interface (Kubernetes CSI or NVMe), fully abstracting cloud storage complexity from the database.
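The effect of intelligent tiering (item 2) can be sketched with a blended-cost estimate. All prices and the hot/warm/cold split below are rough illustrative assumptions, not AWS or simplyblock quotes:

```python
# Blended monthly storage cost for 3 TB split across tiers,
# compared with keeping everything on EBS. Figures are assumptions.
total_gb = 3000
split = {"nvme": 0.10, "ebs": 0.30, "s3": 0.60}    # fraction of data per tier
price = {"nvme": 0.08, "ebs": 0.08, "s3": 0.023}   # assumed $/GB-month

tiered = sum(total_gb * split[t] * price[t] for t in split)
all_ebs = total_gb * price["ebs"]

print(f"tiered: ${tiered:.2f}/month vs all-EBS: ${all_ebs:.2f}/month")
```

Even with most data on cheap object storage, the hot tier keeps latency-sensitive working sets on NVMe; the blended rate is what the single logical interface exposes to the database.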

Our technology employs NVMe over TCP to deliver minimal access latency, high IOPS/GB, and efficient CPU core utilization, outperforming both local NVMe disks and Amazon EBS in cost/performance ratio at scale. It is particularly well-suited for high-performance Kubernetes environments, combining the low latency of local storage with the scalability and flexibility necessary for dynamic AWS EKS deployments. This ensures optimal performance for I/O-sensitive workloads like databases. Simplyblock also uses erasure coding (a more efficient alternative to RAID) to reduce storage overhead while maintaining data safety and fault tolerance, further lowering storage costs without compromising reliability.

Simplyblock offers features such as instant snapshots (full and incremental), copy-on-write clones, thin provisioning, compression, and encryption. These capabilities provide various ways to optimize your cloud costs. Start using simplyblock today and experience how it can enhance your AWS migration strategy. Simplyblock is available on AWS Marketplace.

The post What is the AWS Workload Migration Program and how simplyblock can help you with cloud migration? appeared first on simplyblock.

]]>
Ransomware Attack Recovery with Simplyblock https://www.simplyblock.io/blog/ransomware-attack-recovery-with-simplyblock/ Tue, 10 Sep 2024 23:26:57 +0000 https://www.simplyblock.io/?p=1645 In 2023, the number of victims of Ransomware attacks more than doubled, with 2024 off to an even stronger start. A Ransomware attack encrypts your local data. Additionally, the attackers demand a ransom be paid. Therefore, data is copied to remote locations to increase pressure on companies to pay the ransom. This increases the risk […]

The post Ransomware Attack Recovery with Simplyblock appeared first on simplyblock.

]]>
In 2023, the number of victims of ransomware attacks more than doubled, with 2024 off to an even stronger start. A ransomware attack encrypts your local data, and the attackers demand a ransom for its release. Increasingly, the data is also copied to remote locations beforehand to put additional pressure on companies to pay. This raises the risk of the data being leaked to the internet even if the ransom is paid. Strong ransomware protection and mitigation are now more important than ever.

Simplyblock provides sophisticated block storage-level Ransomware protection and mitigation. Together with recovery options, simplyblock enables Point-in-Time Recovery (PITR) for any service or solution storing data.

What is Ransomware?

Ransomware is a type of malicious software (also known as malware) designed to block access to a computer system and/or encrypt data until a ransom is paid to the attacker. Cybercriminals typically carry out this type of attack by demanding payment, often in cryptocurrency, in exchange for providing a decryption key to restore access to the data or system.

Statistics show a significant rise in ransomware cyber attacks: ransomware cases more than doubled in 2023, and the amount of ransom paid reached more than a billion dollars—and these are only official numbers. Many organizations prefer not to report breaches and payments, as those are illegal in many jurisdictions.

Number of quarterly Ransomware victims between Q1 2021 and Q1 2024

The Danger of Ransomware Increases

Attack tools have also grown significantly in number and sophistication. They are becoming commoditized and easy to use, drastically reducing the skill cybercriminals need to deploy them.

There are many best practices and tools to protect against successful attacks. However, little can be done once an account, particularly a privileged one, has been compromised. Even if the breach is detected, it is most often too late. Attackers may only need minutes to encrypt important data.

Storage, particularly backups, serves as a last line of defense. After a successful attack, they provide a means to recover. However, there are certain downsides to using backups to recover from a successful attack:

  • The latest backup does not contain all of the data: Data written between the last backup and the time of the attack is irrecoverably lost. Even the loss of one hour of data written to a database can be critical for many enterprises.
  • Backups are not consistent with each other: The backup of one database may not fit the backup of another database or a file repository, so the systems will not be able to integrate correctly after restoration.
  • The latest backups may already contain encrypted data. It may be necessary to go back in time to find an older backup that is still “clean.” This backup, if available at all, may be linked to substantial data loss.
  • Backups must be protected from writes and delete operations; otherwise, they can be destroyed or damaged by attackers. Attackers may also damage the backup inventory management system, making it hard or impossible to locate specific backups.
  • Human error in Backup Management may lead to missing backups.

Simplyblock for Ransomware Protection and Mitigation

Simplyblock provides a smart solution to recover data after a ransomware attack, complementing classical backups.

In addition to writing data to hot-tier storage, simplyblock creates an asynchronously replicated write-ahead log (WAL) of all data written. This log is optimized for high throughput to secondary (low IOPS) storage, such as Amazon S3 or HDD pools like AWS's throughput-optimized EBS st1 volumes. If this secondary storage supports write and deletion protection for pre-defined retention periods, as S3 does, it is possible to “rewind” the storage to the point immediately before the attack. This enables data recovery with a near-zero RPO (Recovery Point Objective).
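Simplyblock's on-disk log format is not public, so the following is only a conceptual sketch of the rewind idea: replay every logged write up to, but not including, the attack timestamp to reconstruct the pre-attack block state.

```python
# Conceptual sketch of rewinding a write-ahead log (WAL) to a point in
# time. This is NOT simplyblock's actual format; it only illustrates the
# idea of replaying logged writes up to a chosen recovery point.

def rewind(wal, recovery_point):
    """Rebuild block contents from WAL entries written before recovery_point.

    wal: iterable of (timestamp, block_number, data) tuples, in write order.
    """
    blocks = {}
    for timestamp, block_number, data in wal:
        if timestamp >= recovery_point:
            break  # everything from here on may already be encrypted
        blocks[block_number] = data
    return blocks

wal = [
    (100, 0, b"bootsector"),
    (200, 7, b"customer-db"),
    (300, 7, b"ENCRYPTED!!"),  # ransomware starts overwriting at t=300
]
print(rewind(wal, recovery_point=300))
# → {0: b'bootsector', 7: b'customer-db'}
```

Because the log is append-only and the secondary storage is write- and delete-protected, the attacker cannot tamper with entries written before the breach, which is what makes the rewind trustworthy.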

A recovery mechanism like this is particularly useful in combination with databases. Before an attack can encrypt a database's files, the database system typically has to be stopped, because the data and WAL files are locked while the database is running. This forced shutdown makes it possible to automatically identify a consistent recovery point with no data loss.

Timeline of a Ransomware attack

In the future, simplyblock plans to enhance this functionality further: a multi-stage attack detection mechanism will be integrated into the storage, deletion protection will only be lifted once a historical time window has been cleared of attack activity, and attack launch points will be identified automatically and precisely to locate recovery points.

Furthermore, simplyblock will support partial restores of recovery points, so that different services' data on the same logical volume can be restored from individual points in time. This matters because encryption of one service's data might have started earlier or later than another's, so the point in time to rewind to differs per service.

Conclusion

Simplyblock provides a complementary recovery solution to classical backups. Backups support long-term storage of full recovery snapshots. In contrast, write-ahead log-based recovery is specifically designed for near-zero RPO recovery right after a Ransomware attack starts and enables quick and easy recovery for data protection.

While many databases and data-storing services, such as PostgreSQL, provide Point-in-Time Recovery themselves, their WAL segments can only be shipped out of the system once they are closed. The RPO therefore comes down to the size of a WAL segment, whereas with simplyblock, due to its copy-on-write nature, the RPO can be as small as one committed write.
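To put numbers on that RPO difference, the sketch below compares worst-case data loss under segment-based WAL shipping with per-write replication. The 16 MB segment size matches PostgreSQL's default; the write rate and the 4 KiB write size are assumed figures for illustration only.

```python
# Worst-case data loss (RPO) for segment-based WAL shipping versus
# per-write replication. 16 MB is PostgreSQL's default WAL segment size;
# the write rate and write size are assumed figures.

SEGMENT_BYTES = 16 * 1024 * 1024   # PostgreSQL default WAL segment size
write_rate = 4 * 1024 * 1024       # bytes of WAL generated per second (assumed)

# Segment shipping: a segment can only be archived once it is closed, so
# up to one whole segment of recent writes can be lost.
rpo_segment_seconds = SEGMENT_BYTES / write_rate

# Per-write replication: only the single in-flight write can be lost.
rpo_per_write_seconds = 4096 / write_rate  # one 4 KiB write (assumed)

print(rpo_segment_seconds, rpo_per_write_seconds)
```

Under these assumed figures, segment shipping risks several seconds of committed writes, while per-write replication risks under a millisecond, which is the gap between "near-zero" and "segment-sized" RPO.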

Learn more about simplyblock and its other features like thin-provisioning, immediate clones and branches, encryption, compression, deduplication, and more. Or just get started right away and find the best Ransomware attack protection and mitigation to date.

The post Ransomware Attack Recovery with Simplyblock appeared first on simplyblock.

]]>