Backup encrypted data from Linux servers directly to Azure Blob Storage

What is Azure Storage?

Azure Storage is scalable, programmable storage with a high SLA (99.95%) and no administration required. You can store and process hundreds of terabytes of data, or just the small amount required for a small website; in both scenarios you pay only for the data you actually store.

Azure Storage uses an auto-partitioning system that automatically load-balances your data based on traffic. This means that as the demands on your application grow, Azure Storage automatically allocates the appropriate resources to meet them.

Azure Storage supports clients using a diverse set of operating systems (including Windows and Linux) and a variety of programming languages (including .NET, Java, Python and C++) for convenient development. Azure Storage also exposes data resources via simple REST APIs, which are available to any client capable of sending and receiving data via HTTP/HTTPS.
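
As a small illustration of the REST interface: assuming a blob in a container that has been made public (more on that below), it can be fetched with plain HTTP and no SDK at all. The account, container and blob names here are placeholders:

# Download a blob from a public container (placeholder names)
curl -o backup.tar.gpg https://storageaccountname.blob.core.windows.net/containername/backup.tar.gpg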

Azure Storage is divided into four services:

  • Blob storage stores file data. A blob can be any type of text or binary data.
  • Table storage stores structured datasets. Table storage is a NoSQL key-attribute data store.
  • Queue storage provides reliable messaging for workflow processing and for communication between components of cloud services.
  • File storage offers shared storage using the standard SMB protocol. Azure virtual machines, cloud services and on-premises servers can share file data. On-premises applications can also access file data in a share via the File service REST API.

Every blob is stored in a container. Containers also provide a useful way to assign security policies to groups of objects. A storage account can contain any number of containers, and a container can contain any number of blobs, up to the 500 TB capacity limit of the storage account.
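
This hierarchy (account, container, blob) is visible in the URL of every blob. Using the placeholder names from the how-to below, a blob address looks like this:

https://storageaccountname.blob.core.windows.net/containername/blobname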

Blob Storage

Blob storage offers three types of blobs: block blobs, append blobs, and page blobs (disks). Block blobs are optimized for streaming and storing cloud objects, and are a good choice for storing documents, media files, backups etc. Append blobs are similar to block blobs, but are optimized for append operations. An append blob can be updated only by adding a new block to the end. Append blobs are a good choice for scenarios such as logging, where new data needs to be written only to the end of the blob.

Page blobs are optimized for representing IaaS disks and supporting random writes, and may be up to 1 TB in size. The network-attached IaaS disks of an Azure virtual machine are VHDs stored as page blobs.

Access to Blobs

By default, only the storage account owner can access resources in the storage account. For the security of your data, every request made against resources in your account must be authenticated. Authentication relies on a Shared Key model. Blobs can also be configured for anonymous access.

Your storage account is assigned two private access keys on creation that are used for authentication. Having two keys means you can regenerate them regularly, one at a time, without your application losing access, which is a common key-management practice.
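
With the classic Azure CLI used in the how-to below, keys can be regenerated one at a time, so the application can keep using the other key meanwhile. The exact switches may differ between CLI versions, so treat this as a sketch and check azure storage account keys renew --help:

# Regenerate only the primary key; the secondary key keeps working
azure storage account keys renew --primary storageaccountname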

If you do need to allow users controlled access to your storage resources, then you can create a shared access signature. A shared access signature (SAS) is a token that can be appended to a URL that enables delegated access to a storage resource. Anyone who possesses the token can access the resource it points to with the permissions it specifies, for the period of time that it is valid.
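
A SAS is simply appended to the blob URL as a set of query parameters (service version, resource type, permissions, expiry time and the signature itself). The values below are placeholders for illustration only:

https://storageaccountname.blob.core.windows.net/containername/blobname?sv=2015-04-05&sr=b&sp=r&se=2016-01-01T00%3A00%3A00Z&sig=<signature>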

Finally, you can specify that a container and its blobs, or a specific blob, are available for public access. When you indicate that a container or blob is public, anyone can read it anonymously; no authentication is required. Public containers and blobs are useful for exposing resources such as media and documents that are hosted on websites. To decrease network latency for a global audience, you can cache blob data used by websites with the Azure CDN.
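
Assuming the same classic Azure CLI as in the how-to below, a container's permission level can be switched after creation; -p Blob allows anonymous reads of individual blobs without allowing the container to be listed (again, check --help for your CLI version):

azure storage container set -a storageaccountname -k storageaccountkey -p Blob containername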

Replication for Durability and High Availability

The data in your Microsoft Azure storage account is always replicated to ensure durability and high availability, meeting the Azure Storage SLA even in the face of transient hardware failures. When you create a storage account, you must select one of the following replication options:

Locally redundant storage (LRS). Locally redundant storage maintains three copies of your data. LRS is replicated three times within a single facility in a single region. LRS protects your data from normal hardware failures, but not from the failure of a single facility.

LRS is offered at a discount. For maximum durability, we recommend that you use geo-redundant storage, described below.

Zone-redundant storage (ZRS). Zone-redundant storage maintains three copies of your data. ZRS is replicated three times across two to three facilities, either within a single region or across two regions, providing higher durability than LRS. ZRS ensures that your data is durable within a single region.

ZRS provides a higher level of durability than LRS; however, for maximum durability, we recommend that you use geo-redundant storage, described below.

Geo-redundant storage (GRS). Geo-redundant storage is enabled for your storage account by default when you create it. GRS maintains six copies of your data. With GRS, your data is replicated three times within the primary region, and is also replicated three times in a secondary region hundreds of miles away from the primary region, providing the highest level of durability. In the event of a failure at the primary region, Azure Storage will failover to the secondary region. GRS ensures that your data is durable in two separate regions.

Read access geo-redundant storage (RA-GRS). Read access geo-redundant storage replicates your data to a secondary geographic location, and also provides read access to your data in the secondary location. Read-access geo-redundant storage allows you to access your data from either the primary or the secondary location, in the event that one location becomes unavailable.
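
The replication option is selected with the --type switch of the account-creation command used in the how-to below; for example, to create a geo-redundant account instead of a locally redundant one:

azure storage account create -l "North Europe" --type GRS storageaccountname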

More information: https://azure.microsoft.com/en-us/documentation/articles/storage-introduction/

Pricing: https://azure.microsoft.com/en-us/pricing/details/storage/

Duplicity: open source backup tool

What is it?

Duplicity backs up directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server.

The duplicity package also includes the rdiffdir utility. Rdiffdir is an extension of librsync's rdiff to directories - it can be used to produce signatures and deltas of directories as well as regular files. These signatures and deltas are in GNU tar format.

Current development status

Duplicity is fairly mature software. Like any software it may still have a few bugs, but it works for normal usage and is in use today for large personal and corporate backups.

More information: http://duplicity.nongnu.org/

How-to: back up data from Linux servers to Azure Blob Storage

We need two elements to build a complete backup environment:

1) An Azure Storage account and a container for backups.
2) A properly configured duplicity installation and a cron job to run backups periodically.

We will configure the Azure Storage account and container using the Azure CLI tool:

azure storage account create -l "North Europe" --type LRS storageaccountname

Next we need to list the account keys:

azure storage account keys list storageaccountname

Finally, let's create a container for the backups:

azure storage container create -a storageaccountname -k storageaccountkey -p Off containername
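
To double-check that everything is in place, you can list the containers in the account:

azure storage container list -a storageaccountname -k storageaccountkey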

So we have: a storage account name, a storage account key (primary or secondary) and a container name. Let's move to the Linux server.

We need two tools to run duplicity backups directly to Azure Blob Storage. The first one is... duplicity, and the second one is the Azure SDK for Python (which should be installed globally if all users are going to use it with their own Azure Storage accounts).

Let's install duplicity (on Ubuntu):

sudo add-apt-repository ppa:duplicity-team/ppa
sudo apt-get update
sudo apt-get install duplicity

For tarball go to: http://duplicity.nongnu.org/

And now we need to install the Azure SDK for Python. You will need pip to do this. How you install pip depends on your Linux distribution, but in most cases you will find it in the distribution repositories. Here is how it works on Ubuntu (14.04 LTS):

sudo apt-get install python-pip

and with pip in place we can proceed to install the Azure SDK:

sudo pip install azure

It will be installed globally in /usr/local/lib/python2.7/dist-packages/.
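
A quick sanity check that the SDK is importable (duplicity's Azure backend imports the azure module under the hood):

python -c "import azure" && echo "Azure SDK installed"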

Duplicity and Azure Blob Storage

OK - we have duplicity and we have a storage account. Now we need to make them work together. We will create a full backup with duplicity (you can learn about the other modes from the duplicity manual).

If we want to use Azure Storage as the duplicity backend, we need to set a few environment variables (or prefix the duplicity command with them):

export AZURE_ACCOUNT_NAME=storageaccountname
export AZURE_ACCOUNT_KEY=storageaccountkey
export PASSPHRASE=passphrase_for_GPG_encryption

With these environment variables set we just need to run duplicity with source and destination:

duplicity full /source/dir azure://containername
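
A full backup uploads everything every time. For day-to-day use you will usually want incrementals instead, and duplicity performs an incremental backup by default when it finds a previous backup chain in the target container:

duplicity /source/dir azure://containername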

If you don't want to, or can't, set environment variables, you can prefix the duplicity command with AZURE_ACCOUNT_NAME, AZURE_ACCOUNT_KEY and PASSPHRASE:

AZURE_ACCOUNT_NAME=storageaccountname AZURE_ACCOUNT_KEY=storageaccountkey PASSPHRASE=passphrase_for_GPG_encryption duplicity full /source/dir azure://containername

Finally - we need to set up a cron job to run duplicity periodically. The way you edit the crontab depends on your distribution and habits (or company policy). On Ubuntu it is:

crontab -e

Then you need to add a cron job to your crontab:

5 0 * * * /path/to/backup_script.sh

This will run the backup script every day, 5 minutes after midnight. And here is an example script:

#!/bin/bash

export AZURE_ACCOUNT_NAME=storageaccountname
export AZURE_ACCOUNT_KEY=storageaccountkey
export PASSPHRASE=passphrase_for_GPG_encryption

duplicity full /source/dir azure://containername

# Clear the secrets from the environment when we are done
unset PASSPHRASE
unset AZURE_ACCOUNT_NAME
unset AZURE_ACCOUNT_KEY

Remember to make this script executable.
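
For example:

chmod +x /path/to/backup_script.sh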

And that's all. Your data is chunked into volumes smaller than 32 MB, encrypted with GnuPG on your server before upload and pushed to Azure Blob Storage. No administration, high SLA. Perfection.
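
If you want to confirm what actually landed in the container, duplicity can summarize the backup chains it finds there (run it with the same AZURE_* variables and PASSPHRASE set as for the backup itself):

duplicity collection-status azure://containername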

Restore

Restoring data from a duplicity backup is very simple. For example, suppose we accidentally delete /home/me and want to restore it the way it was at the time of the last backup:

duplicity azure://containername /home/me

Duplicity enters restore mode because the URL comes before the local directory. If we wanted to restore just the file "Mail/article" in /home/me as it was three days ago into /home/me/restored_file:

duplicity -t 3D --file-to-restore Mail/article azure://containername /home/me/restored_file
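
And if you don't remember the exact path to restore, you can first list the files recorded in the most recent backup:

duplicity list-current-files azure://containername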

You can find more examples in the duplicity manual page.
