Fatal device errors? (FATALERR, VOLINV)

The Question is:

We are running a 25 node mixed VAX/ALPHA cluster under OpenVMS 6.2-1H3. Daily
 backups of selected disks are performed as a batch job at midnight to a single
 DAT drive on one of the boot servers. The backup has recently begun to fail
 with the following message:
	%BACKUP-E-FATALERR fatal error on logical_drive_name:save_set_name
	 -SYSTEM-F-VOLINV, volume is not software enabled.
An operator request for QUIT or CONTINUE is then issued. Responding with
 CONTINUE simply re-issues the above message.
This seems to occur randomly on any disk. There are no device errors showing in
 the log. The batch file uses a SHOW DEVICE command on a severe error to show
 that all appears well with the disk in question.
Subsequently the tape drive fails to recognize any inserted media, and a reboot
 is required to reset the drive.
Any ideas on what's going on here?

The Answer is :

  Without the explicit error message text and the particular DCL commands
  used, it is not clear whether the error is on the tape or on the disk(s);
  the remarks refer both to various disks and to a specific DAT (DDS) tape
  drive.  The OpenVMS Wizard will assume that the errors are arising on
  access to the disk drives and not on the tape drive, though a tape drive
  that then locks up is rather unusual.
  This could be hardware or software.  Please first ensure that the
  systems are appropriately tuned; that there are no errors logged on
  any devices (disk, tape, system, network, etc.); that the unnamed
  disks and unnamed tapes are supported devices and are at the current
  firmware revision; that the (assumed) SCSI bus is correctly configured
  and terminated, with the cabling and the enclosures (well) below the
  length limits for the particular SCSI controller; and that the OpenVMS
  system has the relevant and mandatory ECO kits applied.  Then contact
  the Compaq Customer Support Center.
  Upon contact with the CSC, expect to be asked for the above information
  -- including the specific text of the errors, the specific devices, and
  the specific DCL commands used -- as well as the ECOs that have been
  applied to the systems.  Also expect to be asked what, if anything, has
  changed recently that might have precipitated these (disk or tape or
  cluster) errors.
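
  As a rough sketch of how the device error information mentioned above
  might be collected before calling the CSC, the following DCL commands
  could be used.  The device names (DKA100:, MKA500:) and the output file
  name are placeholders, not taken from the question; substitute the
  actual disk and tape devices involved.

```dcl
$! Sketch only: DKA100:, MKA500: and DEVICE_ERRORS.LIS are hypothetical names.
$! Show the error counts for all devices that have logged errors.
$ SHOW ERROR
$! Show full status (mount state, error count, owner) for a disk and the tape.
$ SHOW DEVICE/FULL DKA100:
$ SHOW DEVICE/FULL MKA500:
$! Report recent disk and tape entries from the system error log.
$ ANALYZE/ERROR_LOG/SINCE=YESTERDAY/INCLUDE=(DISKS,TAPES)/OUTPUT=DEVICE_ERRORS.LIS
```

  Capturing this output immediately after a failed backup, and before any
  reboot, is likely to be more useful than collecting it later.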

answer written or last revised on ( 13-NOV-2000 )

