I recently came across an issue where Avamar garbage collection (GC) was not running. When I ran status.dpn on the grid, I got the following message for the garbage collection status:

Last GC: finished Wed Aug 25 01:01:04 2010 after 00m 50s >> recovered 0.00 KB (MSG_ERR_DISKFULL)

The total grid utilization was at 87%, and I also saw some Unacknowledged Events with the following information:



Code: 4202 Message: failed garbage collection with error MSG_ERR_DISKFULL



This error is a direct result of the garbage collection run limit being reached or exceeded due to excessive checkpoint overhead. To verify this and check the capacity of every node, use the following commands. If this is a single-node system, you do not need mapall and can run df -h directly.



su - admin

ssh-agent bash

ssh-add ~admin/.ssh/admin_key



Enter the passphrase for the admin key. (If you don't know what it is, then you should not be doing this.) Then run:



mapall --noerror 'df -h'



This should list the filesystems on each node, including their size, used space, and available space. Then run:



avmaint nodelist | grep percent-full



This will give you a cleaner output of the numbers that really matter. Pay attention to each node’s “abs-percent-full” value.
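
If you have many nodes, scanning the output of the two commands above by eye gets tedious. The following is a minimal sketch of two filters that flag only the values above a threshold. Both function names (full_filesystems, over_threshold) and the default threshold of 85 are hypothetical and chosen only for illustration; they are not Avamar's actual GC limit. The second filter also assumes that avmaint nodelist prints the value as an attribute of the form abs-percent-full="NN.N", so check that against your own output before relying on it.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `df -h` style output on stdin and prints the
# mount point and Use% of every filesystem at or above a threshold.
# Assumes the usual six-column layout with Use% in column 5; header lines
# and any other lines without a numeric percentage there are skipped.
full_filesystems() {
    local limit="${1:-85}"   # illustrative default, not an Avamar limit
    awk -v limit="$limit" '
        $5 ~ /^[0-9]+%$/ {
            use = $5
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $6, $5
        }'
}

# Hypothetical helper: reads `avmaint nodelist` output on stdin and prints
# every abs-percent-full value at or above a threshold.
# Assumes the value appears as abs-percent-full="NN.N" (an assumption).
over_threshold() {
    local limit="${1:-85}"   # illustrative default, not an Avamar limit
    grep -o 'abs-percent-full="[0-9.]*"' \
        | awk -F'"' -v limit="$limit" '$2 + 0 >= limit + 0 { print $2 }'
}

# Intended usage on the utility node (not executed by this script):
#   mapall --noerror 'df -h' | full_filesystems 85
#   avmaint nodelist | over_threshold 85
```

Both functions only read from standard input, so you can test them against saved output before pointing them at a live grid.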



In most cases you should contact EMC support to resolve this issue. However, in some cases running an HFS check or checkpoint validation on your oldest checkpoint might free up enough overhead to get you back on track.