Using Mount Verification for Recovery
Without mount verification, a write lock or offline error causes a volume to be dismounted immediately. All outstanding I/O to the volume is canceled, and all open files on the volume are closed. Any data not yet written to the volume is lost.
You can also use mount verification to switch paths on multipath Fibre Channel or SCSI disk or tape devices. See "How OpenVMS Performs Multipath Failover" in Guidelines for OpenVMS Cluster Configurations.
Understanding Mount Verification
When the system or a user attempts to access a device after
it has gone off line, mount verification is initiated. Usually a
device goes off line as the result of a hardware or user error.
Once a device is off line, the hardware (and for some disks, the
software) marks the disk or tape as "invalid," and
I/O requests for that device fail.
As long as mount verification is enabled, the following operations occur:
%%%%%%%%%%% OPCOM, <dd-mmm-yyyy hh:mm:ss.cc> %%%%%%%%%%%
Device <device-name> is offline. Mount verification in progress.
When a device goes off line or is write-locked, mount verification sends two messages:
The second message is a form of insurance in cases in which OPCOM is unavailable. For example, if the system disk undergoes mount verification or if OPCOM is not present on a system, you at least receive the messages with the %SYSTEM-I-MOUNTVER prefix. Under normal circumstances, the operator terminal receives both messages, with the %SYSTEM-I-MOUNTVER message arriving first.
These messages notify you of the problem, and allow you to correct the problem and recover the operation. When a pending mount verification is canceled by timing out, OPCOM prints a message in the following format:
%%%%%%%%%%% OPCOM, <dd-mmm-yyyy hh:mm:ss.cc> %%%%%%%%%%%
Mount verification aborted for device <device-name>.
After a mount verification times out, all pending and future I/O requests to the volume fail. You must dismount and remount the disk before users can access it again.
Mount verification caused by a write-lock error does not time out.
Suppose, for example, that a volume is mounted on a drive with write-lock off, and someone toggles the WRITE LOCK switch. If mount verification is enabled for the volume, the volume enters mount verification, and all I/O operations to the volume are suspended until you recover the operation, as explained in Recovering from Write-Lock Errors.
At mount time, if the system detects that the caches were not written back the last time the volume was used, the system automatically rebuilds the file information by scanning the contents of the volume. However, files being written at the time of the improper dismount might be partially or entirely lost. See Using the Analyze/Disk_Structure Utility to Check and Repair Disks for details about analyzing and repairing these problems.
With the mount verification feature of disk and tape handling, users are generally unaware that a mounted disk or tape has gone off line and returned on line, or in some other way has become unreachable and then restored.
Using Mount Verification
The following sections explain how to perform these tasks:
Task | Section |
---|---|
Enable and disable mount verification | Enabling Mount Verification |
Control timeout periods for mount verification | Controlling Timeout Periods for Mount Verification |
Recover from offline errors | Recovering from Offline Errors |
Recover from write-lock errors | Recovering from Write-Lock Errors |
Cancel mount verification using the DISMOUNT command | Canceling Mount Verification |
Control the number of mount verification messages | Controlling Mount Verification Messages |
Enabling Mount Verification
Mount verification is enabled by default when you mount a
disk or tape. To disable mount verification, you must specify /NOMOUNT_VERIFICATION
when you mount a disk or tape.
Note that this feature applies to standard mounted tapes, foreign mounted tapes, and Files-11 disks.
Controlling Timeout Periods for Mount Verification
You can control how long (in seconds) a pending mount verification
is allowed to run before it is automatically canceled. The MVTIMEOUT
system parameter sets this limit for disks, and the TAPE_MVTIMEOUT
system parameter sets it for tapes.
The default time limit for tapes is 600 seconds (10 minutes); for disks, it is 3600 seconds (1 hour). (Refer to the HP OpenVMS System Management Utilities Reference Manual for more information about system parameters.)
Always set either parameter to a reasonable value for the typical operations at your site. Note that resetting the value of the parameter does not affect a mount verification that is currently in progress.
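The timeout rules above can be summarized in a short Python sketch. It is only an illustrative model, not how OpenVMS itself is implemented. The function name, the `device_class` and `cause` strings, and the choice to treat an elapsed time equal to the limit as expired are all assumptions made for the example. Only the two default values and the rule that write-lock mount verification does not time out come from this section.

```python
# Default limits quoted in this section, in seconds.
DEFAULT_MVTIMEOUT = 3600       # disks (1 hour)
DEFAULT_TAPE_MVTIMEOUT = 600   # tapes (10 minutes)


def mount_verification_expired(device_class, elapsed_seconds, cause="offline",
                               mvtimeout=DEFAULT_MVTIMEOUT,
                               tape_mvtimeout=DEFAULT_TAPE_MVTIMEOUT):
    """Return True if a pending mount verification would have timed out.

    device_class: "disk" or "tape" (hypothetical labels for this sketch).
    cause: "offline" or "write-lock" (hypothetical labels for this sketch).
    """
    # Mount verification caused by a write-lock error does not time out.
    if cause == "write-lock":
        return False
    if device_class == "disk":
        limit = mvtimeout
    elif device_class == "tape":
        limit = tape_mvtimeout
    else:
        raise ValueError(f"unknown device class: {device_class!r}")
    # Assumption: reaching the limit exactly counts as expired.
    return elapsed_seconds >= limit


# A tape that has been off line for 15 minutes exceeds the default tape limit,
# while a disk off line for the same time is still within the default disk limit.
print(mount_verification_expired("tape", 15 * 60))
print(mount_verification_expired("disk", 15 * 60))
```

A model like this can help when choosing site values: compare the limit against how long a typical recovery, such as powering a drive back up, takes at your site.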
Recovering from Offline Errors
When a mounted disk or tape volume goes off line while mount
verification is enabled, you can try to recover, or you can terminate
the mount request. The following options are available:
If you successfully put the device back on line, the mount verification software that polls the disk or tape drive begins verification in the following sequence of steps:
%%%%%%%%%%% OPCOM, <dd-mmm-yyyy hh:mm:ss.cc> %%%%%%%%%%%
Device <device-name> contains the wrong volume. Mount verification in progress.
%%%%%%%%%%% OPCOM, <dd-mmm-yyyy hh:mm:ss.cc> %%%%%%%%%%%
Mount verification completed for device <device-name>.
%%%%%%%%%%% OPCOM, 28-MAY-2000 11:54:54.12 %%%%%%%%%%%
Device DUA0: is offline. Mount verification in progress.
%%%%%%%%%%% OPCOM, 28-MAY-2000 11:57:34.22 %%%%%%%%%%%
Mount verification completed for device DUA0:.
In this example, the message from OPCOM informs the operator that device DUA0: went off line and mount verification was initiated. The operator finds that the drive was accidentally powered down and successfully powers it up again.
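When reviewing operator logs, it can be useful to know how long a device spent in mount verification, for example to compare it with the configured timeout. The following Python sketch extracts the timestamps from the two OPCOM banners in the DUA0: example and computes the elapsed time. The function names are hypothetical, and the banner layout is assumed to match the examples shown in this section. Real log files may contain other variations.

```python
import re
from datetime import datetime

# Locale-independent lookup for the upper-case month abbreviations in OPCOM banners.
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# Banner layout taken from the examples in this section, for instance:
#   %%%%%%%%%%% OPCOM, 28-MAY-2000 11:54:54.12 %%%%%%%%%%%
# The comma after OPCOM is optional because some examples omit it.
BANNER = re.compile(
    r"OPCOM,?\s+(\d{1,2})-([A-Z]{3})-(\d{4})\s+(\d{2}):(\d{2}):(\d{2})\.(\d{2})")


def opcom_timestamp(line):
    """Return the timestamp in an OPCOM banner line as a datetime."""
    match = BANNER.search(line)
    if match is None:
        raise ValueError(f"no OPCOM timestamp in: {line!r}")
    day, month, year, hour, minute, second, centi = match.groups()
    return datetime(int(year), MONTHS[month], int(day),
                    int(hour), int(minute), int(second),
                    int(centi) * 10_000)  # hundredths of a second -> microseconds


def outage_seconds(offline_banner, completed_banner):
    """Seconds between the 'is offline' banner and the 'completed' banner."""
    delta = opcom_timestamp(completed_banner) - opcom_timestamp(offline_banner)
    return delta.total_seconds()


offline = "%%%%%%%%%%% OPCOM, 28-MAY-2000 11:54:54.12 %%%%%%%%%%%"
completed = "%%%%%%%%%%% OPCOM, 28-MAY-2000 11:57:34.22 %%%%%%%%%%%"
print(outage_seconds(offline, completed))
```

For the example above, the two banners are about 160 seconds (2 minutes 40 seconds) apart, well within the default disk limit of 3600 seconds.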
Recovering from Write-Lock Errors
Devices become write-locked when a hardware or user error
occurs while a disk or a tape volume is mounted for a write operation.
For example, if a disk is write-locked or a tape is missing a write
ring, the hardware generates an error. As soon as the software discovers
that the disk or tape is write-locked (for example, when an I/O
operation fails with a write-lock error), mount verification begins.
OPCOM issues a message in the following format to the operators enabled for DISKS and DEVICES or TAPES and DEVICES, announcing the unavailability of the disk or tape:
%%%%%%%%%%% OPCOM, <dd-mmm-yyyy hh:mm:ss.cc> %%%%%%%%%%%
Device <device-name> has been write-locked. Mount verification in progress.
You can either recover the operation or terminate mount verification. Your options include the following:
Once the mount verification software determines that the volume is in a write-enabled state, I/O operations to the tape or disk resume with no further messages.
Canceling Mount Verification
You can cancel a mount verification request in one of the
following ways:
The following section describes the first method, using the DISMOUNT command, in more detail. See Canceling Mount Verification for details about using the last method, IPC, to cancel mount verification.
To dismount a volume:
%%%%%%%%%%% OPCOM, <dd-mmm-yyyy hh:mm:ss.cc> %%%%%%%%%%%
Mount verification aborted for device <device-name>.
If you do not have access to the volume, you receive an error message. You can try again if you can find an appropriate process to use. If your process hangs, the system file ACP is hung, and you cannot use this technique to cancel mount verification.
Controlling Mount Verification Messages
In a Storage Area Network (SAN), mount verification takes place for a variety of reasons, including:
Mount verification now suppresses the messages that were previously displayed for mount verification events from which devices immediately recovered. These messages unduly alarmed some customers.
The number of messages logged to the operator's log is now controlled by two system parameters:
MVSUPMSG_NUM, which specifies a number of mount verification messages
MVSUPMSG_INTVL, which specifies a duration in seconds
If the number of mount verification messages that have been suppressed for a given device meets or exceeds the number specified by MVSUPMSG_NUM within the time specified by MVSUPMSG_INTVL, then an OPCOM message is displayed, as shown in the following examples:
%SYSTEM-I-MOUNTVER, $1$DGA9999: 5 Mount verification messages have been suppressed in past 51 seconds.
%%%%%%%%%%% OPCOM 18-MAY-2003 13:50:09.72 %%%%%%%%%%%
$1$DGA9999: 5 Mount verification messages have been suppressed in past 51 seconds.

%SYSTEM-I-MOUNTVER, $1$DGA9999: 5 Mount verification messages have been suppressed in past 3 seconds.
%%%%%%%%%%% OPCOM 18-MAY-2003 13:50:13.17 %%%%%%%%%%%
$1$DGA9999: 5 Mount verification messages have been suppressed in past 3 seconds.
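The threshold rule can be modeled with a small Python sketch. This is a simplified illustration of the rule as stated above, not the actual OpenVMS logic. The class name, the per-device bookkeeping, and the decision to restart the count after a summary is issued or after the interval elapses are assumptions. The values 5 and 3600 are illustrative settings for MVSUPMSG_NUM and MVSUPMSG_INTVL, chosen to reproduce the first example message.

```python
class SuppressedMessageCounter:
    """Hypothetical model of the MVSUPMSG_NUM / MVSUPMSG_INTVL threshold."""

    def __init__(self, mvsupmsg_num, mvsupmsg_intvl):
        self.num = mvsupmsg_num      # message-count threshold
        self.intvl = mvsupmsg_intvl  # window length in seconds
        self._windows = {}           # device name -> (window start, count)

    def suppress(self, device, now):
        """Record one suppressed message at time `now` (in seconds).

        Returns summary text when the threshold is reached within the
        interval; otherwise returns None.
        """
        start, count = self._windows.get(device, (now, 0))
        # Assumption: once the interval has elapsed, counting starts over.
        if now - start > self.intvl:
            start, count = now, 0
        count += 1
        if count >= self.num:
            # Assumption: the count restarts after a summary is issued.
            self._windows.pop(device, None)
            return (f"{device}: {count} Mount verification messages "
                    f"have been suppressed in past {int(now - start)} seconds.")
        self._windows[device] = (start, count)
        return None


counter = SuppressedMessageCounter(mvsupmsg_num=5, mvsupmsg_intvl=3600)
summary = None
for second in (0, 10, 20, 30, 51):
    summary = counter.suppress("$1$DGA9999", second)
print(summary)
```

In this model, the first four suppressed messages produce no output, and the fifth produces summary text in the same form as the first example above.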
Customers who prefer the prior behavior, or who want to increase or decrease the number of messages that are logged, can adjust the system parameter settings.
For more information about these new system parameters, refer to the HP OpenVMS System Management Utilities Reference Manual.