Monday, September 05, 2005

 

Minimizing Downtime with Disk Image Restores

Increasing productivity and reducing costs have become the mantra of I.T. and network managers since the Internet bubble burst. No longer is money flowing to new technologies just because it's there; today's I.T. managers are more pragmatic and wary of investments that won't deliver a fast return or whose return cannot be quantified.
All too often, I.T. managers fail to consider how unintended expenditures can result from the use of traditional technologies that work "well enough" that they haven't been replaced.

Many of these legacy technologies fall under the general categories of disaster recovery and business continuity. Preparing in advance for interruptions in your business operations is critical to surviving them, but some companies today still consider legacy technologies to be "sufficient."

Today's I.T. managers are bombarded by a complex array of technologies that promise to provide various levels of backup and data security. The problem is that, in some cases, the resulting backup is crippled because it lacks some of the information the I.T. manager requires for a bare-metal restore.

The less complete the backup of a server's disks, the longer it will take to restore the system should a disaster befall you. An I.T. department could ultimately spend considerable extra time, and therefore money, duplicating work it has already done simply because it lacked the necessary disk imaging software.



Traditionally, enterprise-class backup has been a file-based backup to tape. Tape is relatively reliable and inexpensive, but it's slow and serial, takes up quite a bit of space and requires far more maintenance than disk storage. In addition, while tape can be used for disk imaging, its speed and capacity make it better suited to off-line archiving of file-based backups than to online, image-based backups.

Today's disk-to-disk backup strategies provide far superior performance, but at a cost. If the server is using SCSI, iSCSI or Fibre Channel storage devices, the hardware infrastructure cost can be significant. An IDE-based array is far less expensive, but it has limited usefulness in a large enterprise, where IDE is generally relegated to desktop systems.

However, a new generation of IDE drives -- those that spin at 15,000 RPM -- could change that. A less-expensive network-attached storage server could significantly improve the return on investment for disk imaging-based storage subsystems.

Bad Things Happen to Nice Computers

Backups are the life blood of any enterprise. They need to be portable and be part of an overall disaster recovery/business continuity process. Understanding how to make backups portable, so that they can be stored either offsite, in a vault, or simply physically away from the server, is a basic task of any I.T. manager.



Online or near-line storage of recently-archived data remains quite common, particularly in a hierarchical storage management (HSM) environment. Using some of that online or near-line storage for housing disk images of live server disks can significantly enhance your ability to recover from a disaster by significantly reducing the time it takes to access and restore necessary applications, the OS, patches, updates and, of course, data.

By far the most common types of disasters hitting enterprises today are various types of malware -- viruses, Trojans and other malicious code deliberately or accidentally introduced via e-mail, downloads or user software that has not been vetted by the I.T. department before being loaded onto workstations.

If your antivirus software doesn't catch the malware -- and all too often new strains of old viruses slip through even regularly updated antivirus programs -- you're in for potentially serious system problems. In a best-case scenario, only one system is affected, and then not seriously.

However, that best case can easily turn into a worst case if that one system turns out to be your mail server, web server, SQL server or other mission-critical system.

Potential lost revenue is not calculated only from lost sales or other direct transaction-based operations; it also includes the lost productivity of all affected employees, the lost goodwill of customers and potential customers who view a downed server as a lack of appropriate I.T. oversight, and other factors.



All I.T. managers should have a checklist of items that they use to ensure that, if a problem occurs, they will be able to continue their business operations with the least amount of downtime. High on that list should be a plan for a bare-metal restore of affected servers.

An exact image of your server disks stored on a remote network or a removable drive will provide you with the fastest bare-metal restore possible. Remember that you're not just recovering the user data files and a clean install of the operating system, but also all of the OS security patches and updates, the applications, the applications' security patches and updates, as well as numerous configuration files and other custom programming.

On top of that, remember to factor in the time it takes to collect all of the server applications, serial numbers, updates, patches and such. Depending on the organization of the I.T. department, this task conceivably could take more time than the software installation and configuration itself.

File vs. Folder vs. Partition

Of course, if the loss is localized or limited, you might only need to restore a single file or folder. Here again, time can be of the essence, depending on the severity of the corruption and the file or files corrupted. If the damaged files are operating system files, this could significantly impair the I.T. manager's ability to get the system back up and running quickly.

In such a case, it is useful to be able to boot the server independently of the installed OS. By bypassing the system OS, you can restore the damaged file without resorting to a major reinstallation of the full OS.

Acronis True Image Enterprise Server, for example, uses a Linux-based emergency rescue disk. Should a Windows Server 2003 system fail due to a corrupted file, the I.T. manager can boot the individual server from the rescue disk, restore the specific files that have been damaged, then reboot the server as if nothing had happened at all.

In fact, if the I.T. manager doesn't know which specific files were damaged, an entire directory can be restored just as easily as restoring a single file.

The process is incredibly simple. After booting from the emergency rescue disk, the I.T. manager can mount the image of the affected system as a virtual drive. The interface is a standard Windows XP Explorer-like graphical user interface. The requisite folder is identified and, using a simple drag-and-drop, copied from the image back onto the damaged drive. The virtual image is then unmounted -- a one-click function -- and the system is rebooted back to the original OS.

The time it takes to restore the damaged files is literally minutes, not hours or days. In fact, the time it takes to restore an entire disk drive from an image can be measured in minutes.

When talking about return on investment, it's useful to have some sort of measure on which to base the number. Extensive analyses have been performed to calculate downtime costs to organizations when their servers fail. However, there is another important calculation of downtime that often gets overlooked.

There is a significant productivity difference between disk imaging software that images live servers and programs that require the I.T. manager to boot the server to DOS first. This becomes very acute at the workstation level. Here's why: Let's assume that a company has 2,080 employees, each of whom images a workstation once per week, and that it takes one hour to create the image. Let's also assume that the server is imaged once per week for a full backup, with incremental images made nightly.

If the workstations have to be booted to DOS in order to be backed up, that means that every week the company will have 2,080 hours of nonproductive employee time. That's the equivalent of one employee's work year (40 hours a week for 52 weeks).

Over the course of one calendar year, the company will end up paying the equivalent of 52 employee years of work that wasn't done. That's roughly the same as adding 52 additional employees to the payroll (minus payroll taxes and other load) -- or 2.5 percent of the company payroll expense.
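The arithmetic behind these figures can be checked with a short sketch. The function name and its parameters are illustrative only, and the 40-hour week is an assumption (it is what makes 2,080 hours equal one work year).

```python
# Assumption: one employee work year is 40 hours a week for 52 weeks.
HOURS_PER_WORK_YEAR = 40 * 52  # 2,080 hours


def imaging_downtime(employees, hours_per_image, images_per_week=1, weeks_per_year=52):
    """Return (lost hours per week, lost employee-years per year, share of payroll)."""
    weekly_hours = employees * hours_per_image * images_per_week
    yearly_hours = weekly_hours * weeks_per_year
    employee_years = yearly_hours / HOURS_PER_WORK_YEAR
    # Share of payroll, assuming every employee costs roughly the same.
    payroll_share = employee_years / employees
    return weekly_hours, employee_years, payroll_share


weekly, years, share = imaging_downtime(employees=2080, hours_per_image=1)
print(f"{weekly} hours/week, {years:.0f} employee-years/year, {share:.1%} of payroll")
```

Changing hours_per_image or the head count shows how quickly the cost scales: halving the imaging time halves every figure.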

And that calculation only addresses workstations that have to boot to DOS in order to be imaged. If the server has to be booted to DOS as well, that complicates the equation even more.

Incremental Backups

Ensuring that you always have a current version of your server disk is critical to any disaster recovery plan. However, imaging a server disk daily can be time-consuming. As an alternative, you might consider creating a master image weekly and incremental images on a daily basis.

Incremental images capture only those sectors of a disk that have changed since the previous image. In the vast majority of enterprises, the operating system and applications are kept on separate partitions from the user data. By scheduling incremental images nightly on each partition, you can keep an exact copy of your server disks current while minimizing the time it takes to image the system.
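To make the idea of imaging only changed sectors concrete, here is a minimal sketch that compares two snapshots block by block using hashes. It is not how any particular product works: the 4,096-byte block size, the function names and the in-memory byte strings standing in for disks are all assumptions made for illustration, and the sketch assumes the disk does not shrink between images.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical granularity; real tools choose their own


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of a disk snapshot."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return {block index: block bytes} for blocks that differ from the old snapshot."""
    old_hashes = block_hashes(old, block_size)
    changed = {}
    for index, digest in enumerate(block_hashes(new, block_size)):
        if index >= len(old_hashes) or old_hashes[index] != digest:
            start = index * block_size
            changed[index] = new[start:start + block_size]
    return changed


def apply_increment(base, increment, block_size=BLOCK_SIZE):
    """Rebuild the newer snapshot from the base image plus one increment."""
    blocks = [base[i:i + block_size] for i in range(0, len(base), block_size)]
    for index, block in sorted(increment.items()):
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)  # the disk grew since the base image
    return b"".join(blocks)


# A four-block "disk" in which a single byte of the second block changes.
full = bytes(4 * BLOCK_SIZE)
modified = bytearray(full)
modified[BLOCK_SIZE + 10] = 0xFF
modified = bytes(modified)

delta = changed_blocks(full, modified)
restored = apply_increment(full, delta)
print(sorted(delta), restored == modified)
```

Only one of the four blocks ends up in the increment, which is why a nightly incremental image of a mostly static partition takes far less time than a full image.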

Normally, the server image will be stored on a networked drive. Configure the incremental image to be stored in the same directory as the primary image. If no full image is found in the target directory, a quality imaging package such as Acronis True Image Enterprise Server will create a full image, regardless of the instructions programmed into the setup.

This is because an incremental image uses the last full image, and any intervening incremental images, as the basis for the latest incremental image. If none is found, the software must assume that there is no base image. If your imaging software does not make this assumption, you could end up with a partial and completely useless incremental image.
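The fallback rule described above can be sketched in a few lines. This is only an illustration of the decision, not the logic of any real imaging product: the ".full.img" naming convention, the function name and the example file name are all hypothetical.

```python
import tempfile
from pathlib import Path

FULL_SUFFIX = ".full.img"  # hypothetical naming convention for full images


def choose_backup_type(target_dir, requested="incremental"):
    """Fall back to a full image when the target directory holds no base image."""
    has_full = any(Path(target_dir).glob("*" + FULL_SUFFIX))
    if requested == "incremental" and not has_full:
        return "full"  # an increment with no base would be useless
    return requested


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp)
    first = choose_backup_type(target)           # empty directory: no base yet
    (target / "server01.full.img").touch()       # pretend a full image now exists
    second = choose_backup_type(target)          # base found: increment is safe

print(first, second)
```

A script that drives scheduled backups could run a check like this before each job, so that a moved or deleted base image results in a fresh full image rather than an orphaned increment.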

Change Management

Anyone who has ever tried to upgrade an operating system, patch an application or install "software upgrades" understands the need to have a complete image of a working hard disk. This fact was driven home to many users recently when they tried to upgrade to Windows XP Service Pack 2.

Sometimes an upgrade will crash an application, damage a network connection or cause other unanticipated problems. With an image of the hard disk in a known, working state, no upgrade, virus or other software change can permanently trash a system. Restoring a known, good image will dramatically decrease potential downtime due to problem software installations and upgrades that simply don't work.

Disk Cloning

Efficient disk imaging software can provide another important function in the I.T. department: disk cloning. In many situations, such as providing a standard laptop environment to a sales force or deploying multiple servers, the I.T. manager wants to keep the operating environments identical. A base image can provide that.

In the case of a mobile workforce, the base image might include the company product database, a contact manager, a standard office suite and a preconfigured network setup. By using a standard setup, the I.T. manager can save considerable time when deploying systems to new sales staff. In the case of a new machine, an image with all of the necessary information can be laid down and the machine can be sent to the new employee almost immediately.

If the laptop in question had been used by a previous salesperson, that person's database can be uploaded to the corporate database. Laying a new image over the existing drive not only provides the new salesperson with a fresh install, but also eliminates the possibility of a disgruntled former employee setting off a hidden virus.

It removes any software changes a prior employee might have made and overwrites any private data that might not be appropriate for the new employee.

Disk cloning also works with networked workstations. An image stored on a server can be deployed to multiple desktops. This eliminates the need for I.T. personnel to physically touch every new system being deployed. A multicast image can configure multiple systems simultaneously; conversely, you can also capture images from multiple servers or workstations.

Where server deployment is required, laying down a fresh install of the operating system, all necessary patches and upgrades, all configuration files and the like could save an I.T. engineer hours of work. A clean, tested disk image means that a standardized server with a known, good configuration can be ready to deploy in a single day.

There is no need to start testing all network configuration information from scratch -- a disk image that includes a preconfigured network configuration (sans the IP address, of course) can eliminate a lot of redundant work.

Conclusion

Disk imaging plays an important part in not only disaster recovery and bare-metal restores, but also in disk deployment and change management. Being prepared for a disaster before it happens will go a long way in saving considerable amounts of time and money.

It's not enough today to just have data backups; a full image of the server disks can save literally days of installation, patching and configuration time. And when you're not recovering from a disaster, you can be sure you'll be deploying new software, managing software upgrades and changes and spending a lot of time managing your Windows-based desktops.

As any I.T. manager worth his or her salt will tell you, managing server and desktop software is not unlike juggling -- you try to keep a dozen balls in the air at all times and hope they don't fall on the floor. When they do, disaster strikes and you've got to be ready.

Source : http://www.cio-today.com/news/Minimizing-Downtime-with-Disk-Images/story.xhtml?story_id=0010002CKD92