Scott Lakso's blog
Backup, Restore and Recovery Considerations in Virtual Environments
It is no secret that businesses large and small are rapidly adopting server virtualization in their data centers, and most indications are that this trend will continue. When architecting virtual infrastructures, one of the first questions that businesses face is “What should I do for backup and recovery in a virtual environment?”
The most common approach, at least when starting out, is to ignore the fact that servers are now running on Virtual Machines (VMs) and back up the servers through the guest Operating Systems (OSs), just as you would when the OS is running on a physical server. While this approach will work, it does have some drawbacks. It typically requires you to load a backup agent on the guest OS in order to back up that server. If the server is running an application such as Exchange, SQL or SharePoint, then you need to load a separate agent for each application. Some backup applications also require separate agents to back up the Windows System State or services database.
When you load backup agents on a physical server, these agents are processes running on the OS, which require CPU resources. Depending upon the agents, each agent might use less than 1% of the CPU resources or more than 15%. Regardless of the resources required by the agents, this CPU overhead usually goes unnoticed on a physical server.
However, in a virtual environment, you could easily have 10 VMs running on a single physical host. Each VM might have several agents on the server to accommodate backing up the file system, services database and applications. Assuming a very conservative average of two agents per VM (each using 1% of the host’s CPU cycles), in a virtual environment, you would be wasting 20% of your available CPU resources on backup agents that don’t do anything during normal business hours.
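The arithmetic behind that 20% figure can be written out as a quick sketch. The function and parameter names below are made up for illustration, and the sketch assumes, as the paragraph above does, that each agent's cost is measured as a percentage of the host's CPU and that the costs simply add up.

```python
def agent_overhead_pct(vms_per_host: int, agents_per_vm: int,
                       host_cpu_pct_per_agent: float) -> float:
    """Return the percentage of host CPU consumed by backup agents.

    Assumes every agent costs the same share of the host's CPU and that
    the costs are additive (a simplification, not a measurement).
    """
    return vms_per_host * agents_per_vm * host_cpu_pct_per_agent


# Figures from the paragraph above: 10 VMs, 2 agents per VM, 1% each.
overhead = agent_overhead_pct(vms_per_host=10, agents_per_vm=2,
                              host_cpu_pct_per_agent=1.0)
print(f"{overhead:.0f}% of host CPU spent on backup agents")  # prints 20%
```

Plugging in your own VM density and agent counts shows how quickly the overhead grows as you consolidate more servers onto each host.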
Once businesses realize the overhead in terms of wasted CPU resources, as well as the man-hours required to manage all those agents, they typically look for a solution that will allow them to back up their VMs from the physical host side. VMware has the largest server virtualization market share, so most of the major backup applications now support backing up VMs from the VMware host side.
Backing up VMs from the host side has advantages over backing up servers from the guest OS side. First, there is no need to load or manage agents on each of the guest OSs. This saves on both CPU resources and management overhead.
The next advantage is that it is typically much faster to back up and restore VMs from the host side, since you are backing up and restoring a single large VMDK file rather than thousands of small individual OS, application and data files. In a Disaster Recovery (DR) situation, where a VM’s OS becomes corrupted and you need to restore from a backup, it is very easy to point and click, and restore that system to another VM. The disadvantage is that many backup applications don’t support individual file restores from these backups. If an end user deletes a single file, you need to restore the entire VM, find the file and give it to the end user, then delete the VM.
When moving to a virtual infrastructure, it is a good time to evaluate your current backup application and to see if it meets all your needs. If you determine that you need to invest in a new backup solution, you will want to choose one that will meet all your needs, now and in the future. You should look for a solution that will allow you to restore the entire VM in a DR situation or to restore applications and databases like Exchange and SQL without having to restore the entire VM. You should also consider a solution that allows you to restore individual Exchange messages or individual SharePoint items, without having to restore the entire database.
Finally, you should seriously consider a backup and recovery solution that supports both physical servers and virtual environments, and that supports more than just VMware. While VMware may have the lion’s share of the virtualization market today, it is starting to face significant competition from other sources such as Microsoft Hyper-V, XenServer and Parallels, to name a few. Whenever a technology vendor thinks that a customer has no alternatives and is locked into its solution, it has very little incentive to reduce the cost of that solution. Bringing in an alternative virtualization solution may give VMware an incentive to reduce its price. But you shouldn’t have to invest time and money in a new backup solution just because you want to try an alternative to VMware.
Comment on “Adding Disk Backup Intelligence to Primary Storage”
George Crump, in his blog at Storage-Switzerland.com, has some important advice about disk backup intelligence:
“Consolidation is everywhere in the data center, except in data protection. This process often uses separate servers, networks and most importantly separate storage devices. A whole market is dedicated to the re-purchase of storage capacity so it can be backed up. The problem with this strategy is that it’s redundant, not very flexible, and not very cost effective. With multi-tier disk types and the capacity optimization capabilities of primary storage becoming commonplace, adding disk backup intelligence to these systems through backup virtualization should be a top consideration for IT managers when selecting a new platform.”
You can read George’s entire post here: Adding_Disk_Backup_Intelligence_To_Primary_Storage
I couldn’t agree more when George suggests that adding a secondary tier of storage to your existing primary storage system has significant advantages over purchasing a separate backup disk storage system. To a certain extent, virtually all disk systems provide some level of tiered storage, even if that is as simple as providing a read/write cache to buffer data as it is being written to disk or being read frequently. But most enterprise disk vendors also offer additional tiers of storage within a single subsystem that may include Flash storage, FC, SAS or SATA. Adding lower-performance, lower-cost drives to an existing primary storage system is not only cost effective; in most cases it will also provide better data protection for your backup data.
George goes on to suggest that users or vendors can add intelligence by integrating a backup virtualization engine like Tributary Systems Storage Director™ to address the fact that tiered disk systems lack a sense of ‘disk backup intelligence’. Since the title of his article is “Adding Disk Backup Intelligence to Primary Storage,” I will suggest that all disk vendors develop the intelligence to move data from a high-performance disk pool to a lower-performance disk pool based upon commands from backup applications.
Most backup solutions available today support disk-to-disk backup, even if the disk is really just a buffer before the backup data is eventually moved to tape. With most disk-to-disk backup applications today, the data being backed up is copied from the primary storage to a backup server, then written from the backup server to the backup disk pool. This is true whether using SAN or NAS connectivity, and regardless of whether the backup disk pool is in the same disk storage system or a separate one.
If there were a common standard for backup applications to copy data from high-cost disk pools to low-cost disk pools within the primary disk system without traversing the SAN or IP network, this would significantly reduce network traffic and infrastructure requirements, improve backup reliability and hopefully speed up the backup process. Back when most backup applications were writing directly to tape, NDMP was developed to move data from a storage system directly to tape without sending it to a backup server and having the backup server write it to tape. I suggest that the industry needs a similar standard, and the disk intelligence to go with it, to do the same thing within tiered disk solutions.
Plan your work and work your plan

Plan your work and work your plan. If you want to be efficient and successful, this is a good motto for all your activities, but it is particularly important when it comes to data protection planning.
Documenting a good plan is paramount to achieving positive results, but before you can plan, you must first understand exactly what results you want to achieve.
A few years ago, I remember helping my son build a homecoming float for a high school parade. I was there because adult supervision was required by the school. The students quickly let me know that they were in charge of building the float and my opinions and suggestions were neither required nor desired.
They had a fairly well documented plan to build a grandiose float with their school’s mascot dropping the hammer on an opponent’s mascot. They documented their plan to build this moving mascot and actually executed the plan fairly well. They finished the float early the day before the parade. Their float really was awesome and they were sure to win the prize for best float.
WRONG. The day of the homecoming parade arrived and the students went to get their float and bring it to the parade, only it was too big to fit through the garage door. They had to tear it apart and duct tape it back together at the last minute just to get it into the parade. Of course it was not nearly as awesome looking as the original construction, and they did not win the prize for best float.
So if you are going to plan your work and work your plan, make sure you know what your objectives are. In this case, the students planned to build the most awesome float they could imagine, which they successfully accomplished. Unfortunately, the real objective they should have planned for was to have the most awesome float in the parade.
Plan for data protection
In the case of data protection, it is very easy to concentrate on the ways to protect your data, when in reality, the focus of your planning should start with “How do I access or retrieve my data in the event of hardware error, user error, a virus, a natural disaster or some other unforeseen event?”
When you start planning for data protection, you should start by planning what data you need to be able to access, how quickly you need to have access, and what unplanned events you are trying to protect your data against.
Once you have identified the events you want to be protected from, then you can plan for the best ways to protect against those events.
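One way to capture those planning questions is a small record per data set, which can then be regrouped by event. This is only an illustrative sketch: the class, its field names, and the sample data sets and hours are all hypothetical placeholders, not recommendations.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProtectionRequirement:
    """One planning entry: what data, how fast, and against which events."""
    dataset: str                  # what data you need to be able to access
    max_hours_to_access: float    # how quickly you need to have access
    events: Tuple[str, ...]       # unplanned events to protect against


def datasets_by_event(
    requirements: List[ProtectionRequirement],
) -> Dict[str, List[str]]:
    """Group data sets under each event they must be protected against."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for req in requirements:
        for event in req.events:
            grouped[event].append(req.dataset)
    return dict(grouped)


# Hypothetical entries, for illustration only.
requirements = [
    ProtectionRequirement("Exchange mailboxes", 4, ("hardware error", "user error")),
    ProtectionRequirement("File shares", 24, ("user error", "virus")),
]
plan = datasets_by_event(requirements)
for event, datasets in plan.items():
    print(f"{event}: {', '.join(datasets)}")
```

Listing the data sets under each event makes it easier to choose a protection method event by event, rather than starting from the backup technology.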
Image: http://www.flickr.com/photos/orangeacid/204163841/sizes/m/in/photostream/


