This is the first entry in what will be an ongoing series of posts related to various components and service offerings within Microsoft Azure. The landscape of cloud computing is immense and can be overwhelming to those unfamiliar with the concepts. Through the Azure Facts series, I hope to digest smaller fragments of the overall landscape and briefly examine how they can be utilized. The first topic in the Azure Facts series will focus on Blob Storage within the Azure Storage Account.
What’s a blob?
Before I get too far into describing blob storage, I want to make sure everyone understands the concept of a blob. The term blob is an acronym for Binary Large Object and has been around for a lot longer than the term “cloud computing”. The blob data type was created and commonly used in older computer systems, especially early forms of relational database management systems, when there was a need to store data that was larger than the maximum allowed string column length. Simply speaking, a blob is a big chunk of unstructured binary data like an image, a document, a video or an audio clip.
Cloud storage is easy
Not a week goes by that I do not hear about a company that recently moved assets into the cloud and is facing what it perceives as “buyer’s remorse”. The CFO finally got the first set of invoices from their cloud provider and the cost is well beyond what they expected. Most of the time, they were promised massive cost savings in migrating their assets to the cloud in some form of “lift and shift” operation. From the standpoint of technical implementation, it is extremely simple to get data into the cloud. For many enterprises, this is often the first pitfall that leads to cost overruns. When reviewing the storage options of the major cloud providers, all of them offer various choices at strategically placed price points. Not all storage is the same and the decisions that are made when provisioning cloud storage can have a major impact on the bottom line. Let’s look at storing our blob data in Microsoft Azure and see why, although cloud storage may be easy, there are several factors to consider before moving anything into it.
Note: If you do not have an Azure account subscription, you can get started by signing up for a free Azure account.
Azure Storage account
The core component required to facilitate storage in Azure is called the Storage Account. During the creation of the storage account, there are several parameters that need to be set based on how the storage will be used. At this point, it is critical to know what assets will be stored in the cloud and how they will be accessed. Let’s review the storage account creation process to unwrap some of those details.
Once we log into our Microsoft Azure Account via the Azure Portal, we are presented with the whole suite of services along the left pane of the user interface. By choosing Storage Accounts and then the “Add” option in the top navigation, we are presented with a new page of information with the following options:
Subscription – the account that will be billed for the service. In some cases, a subscription may have free credits to burn before receiving a charge, but everything starts with being assigned to some logical unit to charge cloud resource consumption.
Resource Group – the logical container to organize Azure artifacts. A resource group is not a physical thing, but a way to group together various cloud resources like storage, virtual machines, databases, etc. as a logical unit. One way to think about using resource groups is by creating one for each application that is deployed into the cloud. Doing so makes it easy to browse to the resource group in the Azure portal and see all the components that make up that application in one view. Resource groups also simplify maintenance, since we can delete a resource group and the delete operation will dispose of everything inside of it, which can be very efficient for demos, testing environments or even disaster recovery.
Storage Account Name – a unique name across all Azure storage accounts. One of the first things to understand about Azure Storage Accounts is that they expose all their functions via RESTful APIs. The account name that is chosen must be unique because it is used as the first segment in the host name that is created. Ultimately, we will be able to access the blob storage from anywhere in the world via a URL that will look like:
Location – the Azure data center region where the storage account is hosted. At the time of writing, Azure services are available from 54 regions spread across the globe. The location attribute enables us to choose the region that best serves our needs. Typically, we will want to pick the region that is closest to us to minimize latency. Another item to consider is what data is being stored in the cloud and the laws of the country/region where the data will be stored. There are a lot of legal and financial ramifications that could factor into this decision depending on the type of business being conducted or the type of data that is stored. If there is any uncertainty about the laws surrounding data residency or if an organization must comply with specific industry regulations, the Azure Guidance in the Trust Center documentation can provide additional insight.
Performance – Standard or Premium – This is the first parameter in the list where money comes into play depending on the selection that is made. The Standard performance tier is backed by magnetic drives whereas the Premium performance tier is backed by solid state drives. If the requirement is to store massive amounts of data for the purposes of back-up and archive, the Standard performance tier would be a reasonable choice since speed is not the primary concern. If the requirement is to create storage for assets that will be used as part of a high-performing web application that requires consistent, low-latency performance, the Premium performance tier is the better choice. The Standard tier is cheaper than the Premium tier, but it is less performant.
Account Kind – Storage V1 (not recommended), Storage V2 (recommended), Blob only (special) – The V1 and V2 general-purpose account types contain multiple storage services within them: Blob Storage, File Storage, Queue Storage and Table Storage. We will review the other storage services in other posts. For now, we are just focusing on the Blob storage service. There is also a Blob only storage option if the other storage services are not required in a storage account, but there are a few trade-offs that must be considered related to the types of blobs that can be stored. The General Purpose V1 account type should not be selected as it will be deprecated soon. Microsoft recommends using the general Storage V2 option for maximum flexibility.
Replication – Local, Zone, Geo, Read Access Geo – By default, data in an Azure storage account is replicated locally to ensure availability through hardware interruptions within a given data center. As expected, each successive selection in the list ensures a higher level of availability by replicating the data beyond the local data center, either to another zone in the same region or to an Azure data center in another region entirely, protecting against a catastrophic data center loss. Along with the higher level of resiliency, the Geo and Read Access Geo-Replication options incur additional costs to account for the data transfer that occurs when data is replicated into a data center in another region.
Access Tier – Cool, Hot – The cool option offers lower prices for storage based on the assumption that the data will be accessed less frequently. Conversely, the hot option has a higher storage price and is optimized for frequent data access. The main point to consider here is that if data is put in the cool tier, it should not be accessed very frequently. Otherwise, the resulting total storage cost may be more expensive than if the data were stored in the higher priced hot tier with lower data access charges.
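To make the Storage Account Name point above concrete, here is a minimal Python sketch of how an account name ends up in the host name of a blob URL. It assumes the commonly documented naming rule (3 to 24 characters, lowercase letters and digits only) and the public-cloud endpoint suffix blob.core.windows.net; the account, container and blob names are hypothetical, and other Azure clouds or custom domains will use different host names.

```python
import re
from urllib.parse import quote

# Assumed naming rule: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name: str) -> bool:
    """Return True if the name satisfies the assumed storage account rule."""
    return _ACCOUNT_NAME.fullmatch(name) is not None


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the public-cloud URL for a blob (illustrative only)."""
    if not is_valid_account_name(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    # The account name is the first segment of the host name.
    host = f"{account}.blob.core.windows.net"
    # Percent-encode the blob name but keep '/' so virtual folders survive.
    return f"https://{host}/{container}/{quote(blob_name)}"


if __name__ == "__main__":
    # 'azurefactsdemo', 'images' and 'logo.png' are hypothetical names.
    print(blob_url("azurefactsdemo", "images", "logo.png"))
```

Because the account name becomes part of a globally resolvable host name, a name that is already taken by any other Azure customer will be rejected at creation time, even if it passes the format check above.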
Estimate storage cost
The key concept to understand when estimating storage cost is that there is more to the price than just a flat fee for storage at rest regardless of the storage media. The total storage cost can only be estimated by factoring in:
Data Storage Costs (price per unit) + Operations (read/write price per operation) + Data transfer prices (price per unit) = Total Storage Cost
Reviewing the pricing details in the Azure documentation, we see that each tier has various prices associated with the core storage cost, read and write operations, create container operations, data retrieval and more. Some of the price points are set based on the number of transactions performed against the storage account and others are set based on the size of the data that is accessed. While this whole concept may be mind-boggling to the uninitiated, Microsoft does provide a tool to help visualize these charges with the Azure Calculator. It is very useful to plug the parameters of your cloud usage into the calculator to get an estimate of the monthly storage cost. The numbers here only provide a rough estimate, but they allow us to see the various parameters and adjust our assumptions before migrating data into cloud storage.
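The formula above can be sketched in a few lines of Python. Every price in this example is a made-up placeholder chosen only to show the shape of the calculation, not an actual Azure rate, and the TierPrices fields are hypothetical names; per-GB retrieval charges are grouped with the data transfer term here for simplicity. Use the Azure Calculator for real numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPrices:
    """Hypothetical unit prices for one access tier (NOT real Azure rates)."""
    storage_per_gb: float    # per GB stored per month
    write_per_10k: float     # per 10,000 write operations
    read_per_10k: float      # per 10,000 read operations
    retrieval_per_gb: float  # per GB read back from the tier
    egress_per_gb: float     # per GB transferred out of the region


def monthly_cost(p: TierPrices, stored_gb: float, writes: int, reads: int,
                 retrieved_gb: float, egress_gb: float) -> float:
    """Data storage + operations + data transfer = total storage cost."""
    storage = stored_gb * p.storage_per_gb
    operations = (writes / 10_000) * p.write_per_10k \
        + (reads / 10_000) * p.read_per_10k
    transfer = retrieved_gb * p.retrieval_per_gb + egress_gb * p.egress_per_gb
    return storage + operations + transfer


# Placeholder prices: cool is cheaper at rest, more expensive to access.
HOT = TierPrices(0.02, 0.05, 0.004, 0.00, 0.08)
COOL = TierPrices(0.01, 0.10, 0.010, 0.01, 0.08)

if __name__ == "__main__":
    # Same 1,000 GB of data under two access patterns.
    rarely_read = dict(stored_gb=1000, writes=10_000, reads=10_000,
                       retrieved_gb=10, egress_gb=10)
    heavily_read = dict(stored_gb=1000, writes=10_000, reads=50_000_000,
                        retrieved_gb=5000, egress_gb=0)
    for label, usage in (("rarely read", rarely_read),
                         ("heavily read", heavily_read)):
        hot, cool = monthly_cost(HOT, **usage), monthly_cost(COOL, **usage)
        print(f"{label}: hot={hot:.2f} cool={cool:.2f}")
```

With these placeholder prices the cool tier wins for the rarely read data set and loses for the heavily read one, which is the access tier trade-off described earlier: the cheaper storage rate only pays off when access charges stay small.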
After a quick review of the options above, a critical takeaway is that it is perfectly acceptable and recommended to create different storage accounts for different uses. It does not make a lot of sense to create one storage account with premium access that serves as the backing store for your critical, high-performing website while using that same storage account to hold backed-up versions of technical documentation that are rarely accessed. The common mistake that I see organizations make is to create this one storage account and then carve it up with various blob containers. A storage account should not be equated to the single, on-premises SAN disk array that is shared across an organization for all its storage needs. A storage account should be used to store data that is similar, and the creation of multiple storage accounts is normal practice, not the exception.
Work smarter, not harder
Microsoft recently announced a new feature that allows users to manage the lifecycle of assets stored in the cloud. Most of us can admit that there are files on our computers that were saved with the best intentions many years ago and have not been touched since. In many cases, we may have accessed the file quite frequently for a few days or even months, but after a while it became less relevant. As we saw with the storage account options earlier in this article, assets that we use frequently start in the hot tier (more expensive), migrate over time to the cool tier (less expensive) and then ultimately move into a long-term archive (really cheap) or are even deleted (the cheapest of all) if they are no longer needed. This “lifecycle” is precisely what the latest offering is meant to address. By defining policies on storage accounts which are executed daily, we can automate this transition of data between various states and potentially see even greater cost savings, since the policy can manage assets at scale more efficiently than a human administrator. To read more about this feature, go to the Azure documentation for Managing the Azure Blob Storage Lifecycle.
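As a rough illustration of what such a policy can look like, the Python sketch below builds a single hot-to-cool-to-archive-to-delete rule and prints it as JSON. The field names follow the lifecycle rule schema as I understand it from the Azure documentation and should be verified there before use; the rule name, the prefix and the day thresholds are hypothetical.

```python
import json


def build_lifecycle_policy(prefix: str, cool_after_days: int,
                           archive_after_days: int,
                           delete_after_days: int) -> dict:
    """Build one lifecycle rule (schema assumed; check the Azure docs)."""
    # Each stage must come strictly later than the one before it.
    if not 0 <= cool_after_days < archive_after_days < delete_after_days:
        raise ValueError("thresholds must satisfy cool < archive < delete")

    def age(days: int) -> dict:
        # Condition: days elapsed since the blob was last modified.
        return {"daysAfterModificationGreaterThan": days}

    return {
        "rules": [{
            "name": "age-out-old-blobs",  # hypothetical rule name
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": [prefix],
                },
                "actions": {
                    "baseBlob": {
                        "tierToCool": age(cool_after_days),
                        "tierToArchive": age(archive_after_days),
                        "delete": age(delete_after_days),
                    }
                },
            },
        }]
    }


if __name__ == "__main__":
    # 'backups/docs' is a hypothetical container/prefix.
    policy = build_lifecycle_policy("backups/docs", 30, 90, 365)
    print(json.dumps(policy, indent=2))
```

Keeping the thresholds as parameters makes it easy to try different schedules against the cost estimate before committing a policy to a storage account.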
This has been a very brief glimpse into how storage costs can be estimated and managed. Ultimately, cloud storage is easy to use and extremely powerful when operated at a massive scale, but we must be more vigilant in determining how the storage is provisioned and utilized. When it comes down to spending your organization’s money, ignorance is not bliss.
In the next post, we will create a storage account and look at the variety of techniques that can be used to move data into and around the cloud.
Azure Account Sign-up
Azure Trust Center
Blob Storage Pricing
Storage Lifecycle Management