Best Practices for Backup Exec Deduplication

How Backup Exec Deduplication Works:
Deduplication works by dividing data into 128K segments and then storing the segments in a deduplication storage folder, along with a database that tracks the segments. Data is not stored again when a backup encounters a segment that is already stored in the deduplication storage folder. So, if you back up the same unchanged file over and over again, it is stored only one time in the deduplication storage folder.
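As a rough illustration of the idea (not Backup Exec's actual implementation), fixed-size segment deduplication can be sketched in a few lines of Python. The 128K segment size matches the description above; the SHA-256 fingerprints and the in-memory dictionary standing in for the storage folder and its tracking database are simplifications:

```python
import hashlib

SEGMENT_SIZE = 128 * 1024  # 128K segments, as described above

def backup(data: bytes, store: dict) -> int:
    """Store only segments not already present; return bytes written."""
    written = 0
    for i in range(0, len(data), SEGMENT_SIZE):
        segment = data[i:i + SEGMENT_SIZE]
        key = hashlib.sha256(segment).hexdigest()  # segment fingerprint
        if key not in store:        # duplicate segments are skipped
            store[key] = segment
            written += len(segment)
    return written

store = {}
payload = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(3))
first = backup(payload, store)   # three new segments stored
second = backup(payload, store)  # unchanged file: nothing stored again
```

Backing up the same unchanged data a second time writes nothing new: `first` is 393216 bytes, while `second` is 0.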

Where the Backup Exec Deduplication Option Works Best
Deduplication only happens when the Deduplication Option detects blocks of data that are in fact the same. Operating system files deduplicate well. They are the same across multiple systems and do not change often.

Deduplication works well in the following scenarios:
• With Windows and Linux file system data
• Where the same file is backed up multiple times
• Where the percentage of data that changes is small

Where Other Backup Exec Options Work Best
Deduplication does not work well if data changes frequently or if the Deduplication Option cannot detect the duplicated blocks of data. For example, when new data is inserted at the beginning of a large file (such as a VMDK), the blocks of data that follow are shifted so that none of them match, and the file does not deduplicate.
This segment shift works against the Deduplication Option whenever a non-file-system backup is sent to the deduplication storage folder. These backups appear as one very large stream to the deduplication storage folder, so adding data early in the stream causes the rest of it to deduplicate poorly, if at all (for example, after Exchange database maintenance).
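The shift problem is easy to demonstrate. In this sketch (again using simple fixed-size chunking and SHA-256 fingerprints, not the actual product logic), inserting a single byte at the front of a stream moves every segment boundary, so none of the segment fingerprints match:

```python
import hashlib

SEG = 128 * 1024

def fingerprints(data: bytes) -> set:
    """SHA-256 fingerprint of each fixed-size segment."""
    return {hashlib.sha256(data[i:i + SEG]).hexdigest()
            for i in range(0, len(data), SEG)}

stream = b"".join(bytes([i]) * SEG for i in range(1, 5))
shifted = b"\xff" + stream  # one byte inserted at the front

# every segment boundary moved, so no fingerprint matches
common = fingerprints(stream) & fingerprints(shifted)
```

Even though the two streams differ by a single byte, `common` is empty: not one of the shifted segments deduplicates against the original.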
The good news is that in these cases, some Backup Exec agents can avoid backing up duplicate data by using traditional differential and incremental backup techniques. For example, when backing up VMware or Hyper-V virtual machines, you will achieve significantly better deduplication rates by installing the Backup Exec Agent for Windows Systems in each virtual machine and backing those machines up as though they were physical machines. Doing so allows the Deduplication Option to read each of the files and folders within the virtual machine and deduplicate those individual files. (NOTE: The Agent for VMware Virtual Machines and the Agent for Microsoft Hyper-V licenses allow unlimited use of the Agent for Windows Systems within the same host machine.)
Expectations for the Deduplication Option
Deduplication is data-dependent. That is, the amount of deduplication that you are going to get out of a particular data set depends on what is in the data set. Data that is all unique is not going to benefit from deduplication. Data that contains many copies of the same data will benefit from deduplication.
If there is a terabyte of source data that doesn’t have any duplicate information in it, the deduplication storage folder is going to need a terabyte of space to store it.

A deduplication storage folder has significant memory and disk space requirements. Make sure to review the requirements for the Deduplication Option before implementing it. While the option may initially work on a system that does not meet these requirements, as time goes by and the deduplication storage folder fills up, a lack of memory and disk space will cause problems.

A deduplication storage folder is significantly more complex than a backup-to-disk folder. Detecting duplicate data, tracking it in a database, and managing the interconnected links in the deduplication folder all add up to significant memory and CPU usage. Memory, processing, and time are traded for reduced storage space. This trade-off needs to be considered when choosing a deduplication storage folder over a backup-to-disk folder.

Avamar Backup Job fails with error code 10007

I noticed a recent NDMP backup failed with error code 10007; here is the job log information:

2010-12-01 08:00:45 avndmp Error : Snapup of "ndmp-volume-name" aborted due to 'Error during NDMP session'.
2010-12-01 08:00:45 avndmp Info : NDMP session result: avtar returned:176 'Fatal signal' ndmp returned:157 'Miscellaneous error'
2010-12-01 08:00:45 avndmp Info : Final summary generated subwork 1, cancelled/aborted 1, snapview 0, exitcode 157
2010-12-01 08:00:45 avndmp FATAL : Fatal signal 11 in pid 21946
2010/12/01-08:00:45.23164 [avndmp_ctl_sup] FATAL ERROR: Fatal signal 11

I'm still investigating and will update this post when I find the root cause.

How to shutdown Avamar

The following is the procedure to shut down the Avamar GSAN:
1. Log on to the system as user admin.

2. Load the ssh keys
ssh-agent bash
ssh-add .ssh/admin_key

3. Verify that hfscheck, garbage collection (GC), and checkpoints (CP) are not running:
ps -eaf | egrep "gc_cron|cp_cron|hfscheck_cron"

If hfscheck is still running, run "hfscheck_kill" as user admin to kill it.
If GC is still running, you will need to let it finish before continuing.
If a CP is running, you will also need to let it finish.

4. Take a checkpoint (as user dpn)
su - dpn
ssh-agent bash
ssh-add .ssh/dpnid
cp_cron --duplog
exit
exit  (you should now be back at the admin prompt)

5. Stop the EMS and MCS

suspend_crons

dpnctl stop ems

dpnctl stop mcs

6. Stop the GSAN
shutdown.dpn

7. Verify Avamar is shut down

dpnctl status

The following output shows that Avamar is down:

dpnctl: INFO: gsan status: down
dpnctl: INFO: MCS status: down.
dpnctl: INFO: EMS status: down.
dpnctl: INFO: Scheduler status: down.
dpnctl: INFO: Maintenance operations status: suspended.
dpnctl: INFO: Unattended startup status: disabled.
dpnctl: INFO: [see log file “/usr/local/avamar/var/log/dpnctl.log”]

Now you can safely power off the hardware.
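As a quick sketch, the `dpnctl status` output above could also be checked programmatically before powering off. This hypothetical helper simply scans each "status:" line for a safe state (the sample below is the exact output shown above):

```python
def all_down(status_output: str) -> bool:
    """True when every 'status:' line in dpnctl status output
    reports down, suspended, or disabled."""
    for line in status_output.splitlines():
        if "status:" in line and not any(
                word in line for word in ("down", "suspended", "disabled")):
            return False
    return True

sample = """dpnctl: INFO: gsan status: down
dpnctl: INFO: MCS status: down.
dpnctl: INFO: EMS status: down.
dpnctl: INFO: Scheduler status: down.
dpnctl: INFO: Maintenance operations status: suspended.
dpnctl: INFO: Unattended startup status: disabled."""
```

With the sample output, `all_down(sample)` returns True; if any component still reported "up", it would return False.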

How to check node capacity across an EMC Avamar grid

I recently came across an issue where Avamar garbage collection was not running. When I ran status.dpn on the grid, I got the following message for the garbage collection status:

Last GC: finished Wed Aug 25 01:01:04 2010 after 00m 50s >> recovered 0.00 KB (MSG_ERR_DISKFULL)

The total grid utilization is currently at 87%, and I also saw some unacknowledged events with the following information:

Code: 4202 Message: failed garbage collection with error MSG_ERR_DISKFULL

This error is a direct result of the garbage collection run limit being reached or exceeded due to excessive checkpoint overhead. To check the capacity of every node, use the following commands (on a single-node system, you will not need the mapall command):

su - admin
ssh-agent bash
ssh-add ~admin/.ssh/admin_key

Enter the passphrase for the admin key. (If you don't know what it is, you should not be doing this.) Then run:

mapall --noerror 'df -h'

This shows the filesystem for each node, including the size, space used, and space available. Then run:

avmaint nodelist | grep percent-full

This gives a cleaner view of the numbers that really matter. Pay attention to each node's "abs-percent-full" value.
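As a small sketch, the `df -h` portion of that check could be scripted to flag volumes near full. This assumes the standard six-column `df` layout; in real mapall output each node's section is prefixed with lines you would skip, and the threshold below is just an illustrative choice:

```python
def near_full(df_output: str, threshold: int = 85):
    """Return (mountpoint, use%) pairs at or above threshold,
    parsed from standard `df -h` style output."""
    flagged = []
    for line in df_output.splitlines():
        parts = line.split()
        # a data row has >= 6 columns and a numeric Use% field
        if len(parts) >= 6 and parts[4].endswith("%") and parts[4][:-1].isdigit():
            pct = int(parts[4][:-1])
            if pct >= threshold:
                flagged.append((parts[5], pct))
    return flagged

sample = """Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       1.8T  1.6T  150G  92% /data01
/dev/sdc1       1.8T  1.1T  650G  64% /data02"""
```

Against the sample output, `near_full(sample)` flags only /data01 at 92%.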



In most cases you should contact EMC support to resolve this issue; however, in some cases, running an HFS check or a checkpoint validation on your oldest checkpoint might free up enough overhead to get you back on track.

Backup Exec 2010 R2 has been released

Backup Exec 2010 R2 is the second release of Backup Exec 2010 and is generally available to all users starting today, August 2nd, 2010.

This release features improvements in usability, licensing and renewal management, deduplication, and virtualization, and extends platform and application support. Here are some of the most important features introduced in this release:

• Enhanced installation and backup wizards reduce the time and complexity of setting up your backups.
• A backup recommendation tool identifies potential gaps in your backup strategy and recommends the agent(s) required to ensure complete data protection.
• An integrated RSS feed and Renewal Assistant help keep you informed and current.
• NEW! Support for SharePoint 2010, Exchange 2010 SP1, Microsoft SQL Server 2008 R2, Enterprise Vault 9, Mac OS X 10.6, NDMP NetApp ONTAP 8.0, EMC DART 6.0, and OST support for Data Domain DDOS 4.8 with Boost technology.

And now for some good news: customers who are current on maintenance or support contracts for previous versions of their licensed Backup Exec software can upgrade to the appropriate Backup Exec 2010 R2 licenses at no additional charge.

com.avamar.asn.NetworkException: Unable to connect to a login server

If you are having issues logging into Avamar and you see the message "com.avamar.asn.NetworkException: Unable to connect to a login server", the issue is usually environmental. Avamar 5.x relies heavily on DNS, and improperly configured DNS will cause this error.

If you are using Windows AD for DNS, make sure the DNS records are set up in the same domain that Avamar was set up in. If they are and you are still having this issue, remove duplicate Avamar records from subdomains. This should resolve the problem.

If the DNS server is Unix/Linux-based, case sensitivity is usually the culprit.

Why is my Avamar LCD Display Orange?

One of your EMC Avamar nodes is blinking orange; what do you do? The nodes themselves are rebranded Dell 710s (Gen3), and all have an LCD screen showing general status. The Dell servers also have a system event log (SEL) stored in NVRAM on the motherboard. Certain hardware events are logged there or in the alert log. In either case, this causes the LCD display to change from a pretty blue to orange, and the display shows the text of the last message written to the SEL. If the display is flashing blue, someone pressed the information button on the LCD display to identify which server you are working on in the rack. Press it again to stop the flashing, and it will turn orange if errors are present in the SEL.

Dell uses OpenManage to monitor the hardware. You can use some OpenManage CLI commands to view the events and also to clear the event and alert logs, which will change the display from orange back to pretty blue. Viewing the logs will show you what is causing the problem: perhaps a disk has failed, there are single-bit memory errors, the RAID controller battery is bad, or a power supply has failed. Avamar support will only send a power supply, a disk drive, or an entire replacement node to fix a problem. Sometimes a problem is intermittent, and clearing the logs is preferable to replacing the node. Here are some of the commands.

Log on as root and type the following commands:

View the SEL log

# omreport system esmlog

View the alert log

# omreport system alert

Clear the SEL log

# omconfig system esmlog action=clear

Clear the alert log

# omconfig system alertlog action=clear

View the disk status

# omreport storage pdisk controller=0

View the server setup

# omreport system summary

Avamar Agent now standard on Iomega ix-12 array

EMC Avamar agents are usually not found running directly on storage arrays, until now. At EMC World, Iomega gave a sneak preview of its ix12 array with a built-in Avamar agent. The Avamar agent will allow SMB remote offices to use Avamar's deduplication features without requiring additional hardware.

Hello BURA world!

This is my first blog post, just breaking this thing in! You will see more Backup, Recovery, and Archiving posts as I get rolling!