As part of Azure Storage Accounts, lifecycle management policies allow you to automatically manage the data lifecycle of your blobs. These policies are particularly useful for optimising costs by transitioning blob data to the appropriate access tier, or by deleting data when it’s no longer needed. With rule-based policies, you can transition data between the hot, cool, cold and archive access tiers after a set period of inactivity. For example, if a blob was created or last modified 30 days ago, a rule can move it to cold storage automatically. Whether you’re optimising costs by moving data or ensuring compliance, Azure Storage lifecycle management simplifies and automates the process. To learn more about pricing for the different storage tiers, visit the following link Azure Storage Blobs Pricing.
The purpose of this post is to explain the options available when configuring a lifecycle management rule in a storage account, in particular the options shown in the image below. The image shows a number of rule scopes, including options to apply a rule to all blobs in a storage account or to limit which blobs are processed based on filters.
Let’s start by exploring the first option,
Apply rule to all blobs in your storage account
This option applies the lifecycle management rule to all blobs in the storage account. For example, if your storage account includes 50 containers, and you require all blobs last modified more than 30 days ago to be moved to the cool tier, every blob across the 50 containers that meets the condition will be moved to cool storage.
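Behind the portal, a lifecycle policy is stored as a JSON document. Here is a minimal sketch of the rule just described, written as a Python dict that mirrors the JSON you would see in the portal’s Code view tab; the rule name is only an example.

```python
# Minimal sketch of a lifecycle rule applied to every block blob in the
# storage account: move blobs to the cool tier 30 days after last modification.
lifecycle_policy = {
    "rules": [
        {
            "enabled": True,
            "name": "move-to-cool-after-30-days",  # placeholder rule name
            "type": "Lifecycle",
            "definition": {
                "actions": {
                    "baseBlob": {
                        "tierToCool": {"daysAfterModificationGreaterThan": 30}
                    }
                },
                "filters": {
                    # No prefix filter, so the rule targets all block blobs
                    # in every container of the storage account.
                    "blobTypes": ["blockBlob"]
                }
            }
        }
    ]
}
```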
That’s great, but what if you wanted to target only a certain container inside your storage account? This is where the next option helps.
Limit blobs with filters
When you select this option, an additional tab named Filter set appears.
Filters limit rule actions to a subset of blobs within the storage account. Instead of processing all blobs in a storage account, a filter set allows you to be more specific. In the example above, I specify that I only want this rule to apply to blobs stored in a container named images; all other containers will be ignored. I could also add a blob prefix, for example images/pic. In this case the rule still processes blobs inside the container named images, however, it will only apply to blobs whose names start with pic, such as pic1, pic2, pic3 and so on.
Therefore, as per my configuration shown in the image below, if any blobs in the container images were last modified more than 30 days ago, they will automatically be moved to cool storage. Because I have specified that this rule should only process blobs in the container images, no other containers will be processed by this rule.
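In the policy JSON, the filter set above simply becomes a prefixMatch entry in the rule definition. A sketch, again as a Python dict:

```python
# The same rule definition, now limited with a filter set. Only blobs in the
# "images" container whose names start with "pic" are processed.
rule_definition = {
    "actions": {
        "baseBlob": {
            "tierToCool": {"daysAfterModificationGreaterThan": 30}
        }
    },
    "filters": {
        "blobTypes": ["blockBlob"],
        "prefixMatch": ["images/pic"]  # use ["images"] to target the whole container
    }
}
```

If you prefer not to click through the portal, the same JSON can also be applied with the Azure CLI (az storage account management-policy create --policy @policy.json, together with your resource group and account name) or with the storage management SDK.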
Back to the details tab, there are a few more options to cover.
Blob Type:
Pick the type of blobs you want to automate the lifecycle for. We have two options, with a short sketch of both after their descriptions below:
Block blobs: designed for general-purpose storage and allow you to store different types of unstructured data such as images, videos, PDFs, Word documents, text files and so on. They are ideal for serving images or documents directly to a browser, streaming video and audio, storing data for backup and restore, disaster recovery, and archiving or storing files for distributed access.
Append blobs: designed for scenarios where data needs to be added to a blob in chunks without altering the existing content. Each new piece of data is appended to the end of the blob. This type of blob is optimised for fast append operations, making it ideal for use cases such as logging and auditing, where you want to add data to an existing blob without changing its current contents.
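To make the distinction concrete, here’s a short sketch using the azure-storage-blob Python SDK. The connection string, container and blob names are placeholders I’ve chosen for illustration, and the container is assumed to already exist.

```python
from azure.storage.blob import BlobServiceClient

# Placeholder connection details; replace with your own storage account values.
service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("demo")

# Block blob: written as a whole (or in blocks) and replaced on each upload.
# Well suited to images, documents, backups and other general-purpose files.
block_blob = container.get_blob_client("report.pdf")
block_blob.upload_blob(b"<file contents>", overwrite=True)

# Append blob: new data is always added to the end of the blob, without
# touching the existing content - a good fit for logs and audit trails.
append_blob = container.get_blob_client("audit.log")
append_blob.create_append_blob()
append_blob.append_block(b"2024-05-01 09:00:00 user signed in\n")
append_blob.append_block(b"2024-05-01 09:05:00 user uploaded a file\n")
```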
Next, we have the option Blob index match. Let’s explore this option.
If configured, this option limits the rule to blobs that carry matching index tag keys and values. These tags may have been added manually or automatically by an application. As datasets get larger, finding a specific object in a sea of data can be difficult. Blob index tags provide data management and discovery capabilities by using key-value index tag attributes. Consider a scenario where you have millions of blobs in your storage account, accessed by many different applications. You want to find all related data from a single project. You aren’t sure what’s in scope, as the data can be spread across multiple containers with different naming conventions. However, your applications upload all data with tags based on their project. Instead of searching through millions of blobs and comparing names and properties, you can use Project = cloudbuild as your discovery criteria. Blob index will filter all containers across your entire storage account to quickly find and return the set of blobs tagged with Project = cloudbuild. But where are these index tags added? Let’s take a look.
Right-click a blob and click Properties. In my case I have accessed testfile.txt. Scroll down and that’s where blob index tags can be applied. The blob index match filter in a lifecycle management rule can then move blobs carrying those index tags from one storage tier to another, or even delete them if needed.
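Index tags can also be set and queried programmatically. Here’s a hedged sketch with the azure-storage-blob Python SDK, reusing testfile.txt and the Project = cloudbuild tag from the scenario above; the container name images is my assumption.

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")

# Tag the blob; an application could do the same at upload time.
blob = service.get_blob_client(container="images", blob="testfile.txt")
blob.set_blob_tags({"Project": "cloudbuild"})

# Find every blob across the whole storage account tagged Project = cloudbuild,
# regardless of which container it lives in.
for match in service.find_blobs_by_tags("\"Project\"='cloudbuild'"):
    print(match.name)
```

In the lifecycle policy JSON, the equivalent filter is a blobIndexMatch entry, for example {"name": "Project", "op": "==", "value": "cloudbuild"}.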
Blob subtype:
Next we have blob subtype, where base blobs is selected by default. Let’s explore the three options, but before we do, we first need to understand what snapshots and versions are.
Snapshots: a snapshot is a read-only copy of a blob taken at a point in time. For example, if you access a container and click a blob in your storage account, you’ll find the option to take a snapshot of that blob. The blob subtype of snapshots above refers to these blob snapshots. All snapshots share the base blob’s URI; the only distinction between the base blob and a snapshot is the appended DateTime value. When you create a snapshot of a blob, the blob’s system properties are copied to the snapshot with the same values.
You have the option to restore a blob back to a snapshot you created previously. The snapshot option available in a lifecycle management rule refers to these snapshots.
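For reference, here’s a small sketch of taking and reading a snapshot with the azure-storage-blob Python SDK; the container name is again my assumption.

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="images", blob="testfile.txt")

# Take a point-in-time, read-only snapshot of the blob.
snapshot = blob.create_snapshot()
print(snapshot["snapshot"])  # the DateTime value appended to the base blob's URI

# Read the snapshot (rather than the current blob) by passing that DateTime value.
snapshot_client = service.get_blob_client(
    container="images", blob="testfile.txt", snapshot=snapshot["snapshot"]
)
print(snapshot_client.download_blob().readall())
```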
Versions: you can enable blob storage versioning to automatically maintain previous versions of an object. When blob versioning is enabled, you can restore an earlier version of a blob to recover your data if it’s erroneously modified or deleted. Versioning must be enabled on the storage account. The image below shows the data protection tab when creating a storage account and the option to enable versioning under the tracking section.
Once enabled, if I were to make a change to a blob, the previous versions would be maintained.
Here is my original blob named testfile.txt, which includes one line of data containing the words ‘test file’.
I click on the versions tab and there are no previous versions, because I have not made a change to the text file yet.
If I were to make a change to the file, a version of the original file would be stored. However, before I do this, I need to enable versioning on my existing storage account:
- Access the storage account
- From the left pane, scroll down to Data protection, located under the Data management section.
- Click the option Enable versioning for blobs; in my case, I only want to keep 7 days of versions.
Back to my blob testfile.txt, I make a change by adding another line of text and click save.
Click the versions tab and we see that the change has created a new version of the file. As part of a lifecycle management rule, we are able to automatically move previous versions of files from one tier to another.
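For completeness, here’s a sketch of listing those versions with the azure-storage-blob Python SDK, assuming the same testfile.txt blob and container as earlier; versioning must already be enabled as described above.

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("images")

# List testfile.txt together with its previous versions. Each item carries a
# version_id, and the current version is flagged by is_current_version.
for item in container.list_blobs(name_starts_with="testfile.txt", include=["versions"]):
    print(item.name, item.version_id, item.is_current_version)
```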
Base blobs: this is the current, most recent blob itself, as opposed to its snapshots or previous versions. A short sketch of how all three subtypes map to policy actions follows below.
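Tying the three subtypes back to the policy itself: each one corresponds to its own action set in the rule definition. A sketch of the actions section, again as a Python dict:

```python
# How the three blob subtypes map to actions in the policy JSON. Base blob
# actions key off days since last modification, while snapshot and version
# actions key off days since the snapshot or version was created.
actions = {
    "baseBlob": {"tierToCool": {"daysAfterModificationGreaterThan": 30}},
    "snapshot": {"tierToCool": {"daysAfterCreationGreaterThan": 30}},
    "version":  {"tierToCool": {"daysAfterCreationGreaterThan": 30}},
}
```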
Note:
The platform runs the lifecycle policy once a day. When you configure or edit a lifecycle policy, it can take up to 24 hours for changes to go into effect and for the first execution to start. The time taken for policy actions to complete depends on the number of blobs evaluated and processed.
If you disable a policy, then no new policy runs will be scheduled, but if a run is already in progress, that run will continue until it completes and you’re billed for any actions that are required to complete the run. Source: Azure Storage | Microsoft Learn
I hope you found this post useful. See you at the next one.