6.3 Merge Operations
The purpose of a merge operation is to compare data on shadow set
members and to ensure that inconsistencies are resolved. A merge
operation is initiated if either of the following events occurs:
- A system failure results in the possibility of incomplete writes.
For example, if a write request is made to a shadow set but the
system fails before a completion status is returned from all the shadow
set members, it is possible that:
- All members might contain the new data.
- All members might contain the old data.
- Some members might contain new data and others might contain old
data.
The exact timing of the failure during the original write request
determines which of these three outcomes occurs. When the system
recovers, however, it is essential that corresponding LBNs on each
shadow set member contain the same data (old or new). Thus,
the issue here is not one of data availability, but rather of
reconciling potential differences among shadow set members. Once the
data on all disks is made identical, application data can be
reconciled, if necessary, either by the user reentering the data or by
database recovery and application journaling techniques.
- If a shadow set enters mount verification with outstanding write
I/O in its internal queue, and the problem is not corrected before
mount verification times out, the systems on which the timeout occurred
cause the shadow set to require a merge operation. For example, if the
shadow set were mounted on eight nodes and mount verification timed out
on two of them, those two nodes would each cause the shadow set to
require a merge operation. (The timeout is governed by the MVTIMEOUT
system parameter; see the sketch following this list.)
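The length of time that mount verification continues before timing out
is controlled by the MVTIMEOUT system parameter, a value in seconds.
The following SYSGEN session is only a sketch showing how you might
display the current value on one node; the appropriate setting depends
on your configuration.
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> SHOW MVTIMEOUT
SYSGEN> EXIT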
The merge operation is managed by one of the OpenVMS systems that has
the shadow set mounted. The members of a shadow set are physically
compared to each other to ensure that they contain the same data. This
is done by performing a block-by-block comparison of the entire volume.
As the merge proceeds, any blocks that are different are made the same
(either old or new) by means of a copy operation. Because the
shadowing software does not know which member contains newer data, any
full member can be the source member of the merge
operation.
The shadowing software always selects one member as the logical master
for any merge operation across the OpenVMS Cluster. Any difference in
data is resolved by propagating the data from the merge master to all
the other members.
The system responsible for the merge operation on a given shadow set
updates the merge fence for that shadow set after a range of LBNs is
reconciled. The fence advances across the disk, separating the merged
and unmerged portions of the shadow set.
Application read I/O requests to the merged side of the fence can be
satisfied by any source member of the shadow set. Application read I/O
requests to the unmerged side of the fence are also satisfied by any
source member of the shadow set; however, any potential data
differences---discovered by doing a data compare operation---are
corrected on all members of the shadow set before returning
the data to the user or application that requested it.
This method of dynamic correction of data inconsistencies during read
requests allows a shadow set member to fail at any point during the
merge operation without impacting data availability.
Volume Shadowing for OpenVMS supports both assisted and unassisted
merge operations in the same cluster. Whenever you create a shadow set,
add members to an existing shadow set, or boot a system, the shadowing
software reevaluates each device in the changed configuration to
determine whether it is capable of supporting the merge assist.
6.3.1 Unassisted Merge Operations
For systems running software earlier than OpenVMS Version 5.5-2, the
merge operation is performed by the system and is known as an
unassisted merge operation.
To ensure minimal impact on user I/O requests, volume shadowing
implements a mechanism that causes the merge operation to give priority
to user and application I/O requests.
The shadow server process performs merge operations as a background
process, ensuring that when failures occur, they minimally impact user
I/O. A side effect of this is that unassisted merge operations can
often take an extended period of time to complete, depending on user
I/O rates. Also, if another node fails before a merge completes, the
current merge is abandoned and a new one is initiated from the
beginning.
Note that data availability and integrity are fully preserved during
merge operations regardless of their duration. All shadow set members
contain equally valid data.
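The number of copy and merge operations that a node performs in
parallel is limited by the SHADOW_MAX_COPY system parameter, so the
duration of an unassisted merge is also influenced by that setting. As
a sketch only, the current value can be displayed with SYSGEN as
follows; choose a value appropriate to your configuration before
changing it.
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> SHOW SHADOW_MAX_COPY
SYSGEN> EXIT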
6.3.2 Assisted Merge Operations
Starting with OpenVMS Version 5.5-2, the merge operation includes
enhancements for shadow set members that are configured on controllers
that implement assisted merge capabilities. The
assisted merge operation is also referred to as a
minimerge. The minimerge feature significantly reduces
the amount of time needed to perform merge operations. Usually, the
minimerge completes in a few minutes.
HSC and HSJ controllers support minimerge. Support for minimerge on HSG
controllers is planned.
By using information about write operations that were logged in
controller memory, the minimerge is able to merge only those areas of
the shadow set where write activity was known to have been in progress.
This avoids the need for the total read and compare scans required by
unassisted merge operations, thus reducing consumption of system I/O
resources.
Controller-based write logs contain information about exactly which
LBNs in the shadow set had write I/O requests outstanding (from a
failed node). The node that performs the assisted merge operation uses
the write logs to merge those LBNs that may be inconsistent across the
shadow set. No controller-based write logs are maintained for a
single-member shadow set, or if only one OpenVMS system has the shadow
set mounted.
Note
The shadowing software does not automatically enable a minimerge on a
system disk because of the requirement to consolidate crash dump files
on a nonsystem disk.
Dump off system disk (DOSD) is supported on both OpenVMS VAX and
OpenVMS Alpha, starting with OpenVMS VAX Version 6.2 and OpenVMS Alpha
Version 7.1. If DOSD is enabled, the system disk can be minimerged.
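The procedure for enabling DOSD is described in the system management
documentation; as a rough sketch only, and assuming that the DUMPSTYLE
system parameter is the relevant setting (with a bit that directs the
crash dump off the system disk), the current value can be displayed
with SYSGEN:
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> SHOW DUMPSTYLE
SYSGEN> EXIT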
The minimerge operation is enabled on nodes running OpenVMS Version
5.5-2 or later. Volume shadowing automatically enables the minimerge
if the controllers involved in accessing the physical members of the
shadow set support it. See the Volume Shadowing for OpenVMS
Software Product Description (SPD 27.29.xx) for a
list of supported controllers. Note that minimerge operations are
possible even when shadow set members are connected to different
controllers. This is because write log entries are maintained on a
per-controller basis for each shadow set member.
Volume Shadowing for OpenVMS automatically disables minimerges if:
- The shadow set is mounted on a cluster node that is running an
OpenVMS release earlier than Version 5.5-2.
- A shadow set member is mounted on a controller running a version
of firmware that does not support minimerge.
- A shadow set member is mounted on a controller that has
performance assists disabled.
- Any node in the cluster that has the shadow set mounted is running
a version of Volume Shadowing that has minimerge disabled.
- The shadow set is mounted on a standalone system. (Minimerge
operations are not enabled on standalone systems.)
- The shadow set is mounted on only one node in the OpenVMS Cluster.
The following transient conditions can also cause a minimerge operation
to be disabled:
- An unassisted merge operation is already in progress when a node
fails.
In this situation, the shadowing software cannot
interrupt the unassisted merge operation with a minimerge.
- Not enough write log entries are available in the controllers.
The number of write log entries available is
determined by controller capacity. The shadowing software dynamically
determines when there are enough entries to maintain write I/O
information successfully. If the number of available write log entries
is too low, shadowing temporarily disables logging for that shadow set,
and it returns existing available entries on this and every node in the
cluster. After some time has passed, shadowing will attempt to reenable
write logging on this shadow set. A controller retains a write log
entry for each write I/O request until that entry is deleted by
shadowing, or the controller is restarted. A multiple-unit
controller shares its write log entries among multiple disks. This pool
of write log entries is managed by the shadowing software. If a
controller runs out of write log entries, shadowing disables minimerges
and will perform an unassisted merge operation, should a node leave the
cluster without first dismounting the shadow set. Note that write log
exhaustion does not typically occur with disks on which the write logs
are not shared.
- The controller write logs become inaccessible for one of the
following reasons, making a minimerge operation impossible:
- A controller failure causes write logs to be lost or deleted.
- A device that is dual-ported to multiple controllers fails over to
its secondary controller. (If the secondary controller is capable of
maintaining write logs, minimerge operations are reestablished
quickly.)
6.4 Controlling HSC Assisted Copy and Minimerge Operations
This section describes how to control assisted copy and minimerge
operations on an HSC controller. It is not possible to control these
operations on an HSJ controller.
To disable both the merge and copy performance assists on the HSC
controller, follow these steps on each HSC controller for which you
want to disable the assists:
- Press Ctrl/C to get to the HSC prompt.
- When the HSC> prompt appears on the terminal screen, enter the
following commands:
HSC> RUN SETSHO
SETSHO> SET SERVER DISK/NOHOST_BASED_SHADOWING
SETSHO-I Your settings require an IMMEDIATE reboot on exit.
SETSHO> EXIT
SETSHO-Q Rebooting HSC. Press RETURN to continue, CTRL/Y to abort:
After you issue these commands, the HSC controller automatically
reboots.
To reenable the assists, follow the same procedure on your HSC
controller, but use the /HOST_BASED_SHADOWING qualifier on the SET
SERVER DISK command.
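Assuming the same SETSHO dialogue shown in the disable example, the
sequence to reenable the assists looks like the following sketch; the
informational messages are reproduced from the disable example and may
differ slightly on your controller:
HSC> RUN SETSHO
SETSHO> SET SERVER DISK/HOST_BASED_SHADOWING
SETSHO-I Your settings require an IMMEDIATE reboot on exit.
SETSHO> EXIT
SETSHO-Q Rebooting HSC. Press RETURN to continue, CTRL/Y to abort: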
Use the HSC command SHOW ALL to see whether the assists are enabled or
disabled. The following example shows a portion of the SHOW ALL display
that indicates the shadowing assists status:
HSC> SHOW ALL
.
.
.
5-Jun-1997 16:42:51.40 Boot: 21-Feb-1997 13:07:19.47 Up: 2490:26
Version: V860 System ID: %X000011708247 Name: HSJNOT
Front Panel: Secure HSC Type: HSC90
.
.
.
Disk Server Options:
Disk Caching: Disabled
Host Based Shadowing Assists: Enabled
Variant Protocol: Enabled
Disk Drive Controller Timeout: 2 seconds
Maximum Sectors per Track: 74 sectors
Disk Copy Data connection limit: 4 Active: 0
.
.
.
6.5 What Happens to a Shadow Set When a System Fails?
When a system, controller, or disk failure occurs, the shadowing
software maintains data availability by performing the appropriate
copy, merge, or minimerge operation. The following subsections describe
the course of action taken when failures occur, which depends on the
event and on whether the shadow set is in a steady state or a transient
state.
Transitions from Steady State
When a shadow set is in a steady state, the following transitions can
occur:
- If you mount a new disk into a steady state shadow set, the
shadowing software performs a copy operation to make the new disk a
full shadow set source member.
- If a failure occurs on a standalone system (that is, the system
crashes) while a steady state shadow set is mounted, the shadow set SCB
reflects that the shadow set was incorrectly dismounted. When the
system is rebooted and the set is remounted, a copy operation is not
necessary, but a merge operation is necessary and is initiated.
- If a failure occurs in a cluster, the shadow set is merged by a
remaining node that has the shadow set mounted:
- If performance assists are enabled, and the controller-based write
logs are available, the shadowing software performs a minimerge.
- If performance assists are not enabled, the shadowing software
performs a merge operation.
Once the transition completes, the disks contain identical information
and the shadow set returns to a steady state.
Transitions During Copy and Minicopy Operations
The following list describes the transitions that can occur to a shadow
set that is undergoing a copy or minicopy operation. The transitions
apply to both forms of copy operations except where noted:
- If you mount an additional disk into the shadow set that is
already undergoing a copy operation, the shadowing software finishes
the original copy operation before it begins another copy operation on
the newly mounted disk.
- When a shadow set on a standalone system is undergoing a copy
operation and the system fails, the copy operation aborts and the
shadow set is left with the original members. For a standalone system,
there is no recourse except to reboot the system and reinitiate the
shadow set copy operation with a MOUNT command.
- When a shadow set is mounted on more than one node in the cluster
and is undergoing a copy operation, if the node performing the copy
operation dismounts the virtual unit, another node in the cluster that
has that shadow set mounted will continue the copy operation
automatically.
If a shadow set is undergoing a minicopy operation
when this occurs, the minicopy will not continue. Instead, a full copy
will continue from the point where the minicopy stopped, and all the
remaining blocks will be copied.
- If a shadow set is mounted on more than one node in the cluster
and is undergoing a copy operation, should the node performing the copy
operation fail, another node in the cluster that has that shadow set
mounted will continue the copy operation automatically.
When a node failure occurs during a shadow set copy operation, merge
behavior depends on whether or not the shadowing performance assists
are enabled.
- If minimerge is enabled and can be performed, the shadowing
software interrupts the copy operation to perform a minimerge and then
resumes the copy operation.
- If the minimerge is not enabled, the shadowing software marks the
set as needing a merge operation and finishes the copy operation before
beginning the merge operation.
Transitions During Minimerge Operations
When a shadow set is undergoing a minimerge operation, the following
transitions can occur:
- If a new member is mounted into a shadow set when a minimerge
operation is in progress, the minimerge is completed before the copy
operation is started.
- If another system failure occurs before a pending minimerge has
completed, the action taken depends on whether or not the shadowing
performance assists are enabled and if the controller-based write logs
are available.
- If performance assists are enabled and if the controller-based
write logs are available for the last node failure, the shadowing
software restarts the minimerge from the beginning and adds new LBNs to
the write log file based on the entries taken from the nodes that
failed.
- If performance assists are disabled, the shadowing software reverts
to a merge operation. The performance assists might be disabled if the
controller runs out of write logs or if a failover occurs from a
controller with write logs to one that does not.
Transitions During Merge Operations
The following list describes the transitions that can occur to the
shadow set that is undergoing a merge operation when performance
assists are not available:
- If you add a new disk to a shadow set that is undergoing a merge
operation, the shadowing software interrupts the merge operation to
perform a copy operation. The merge operation resumes when the copy
operation is completed.
- If a node failure occurs when the shadow set is performing a merge
operation, the shadowing software abandons the current merge operation
and starts a new merge operation.
6.6 Examples of Copy and Merge Operations
Example 6-1 shows what happens when you create a shadow set by
mounting two disk volumes that have never been a part of a shadow set.
Because neither disk volume has been a part of a shadow set, the Mount
utility (MOUNT) assumes that the first disk named in the MOUNT command
is the source member. When the Mount utility checks the volume labels
on the disks, it discovers that they are different from each other, and
the utility automatically performs a copy operation.
In this example, DSA0 is the virtual unit name, $1$DUA8 and $1$DUA89
are the names of the disk volumes, and SHADOWDISK is the volume label.
Example 6-1 Copy Operation: Creating a New Shadow Set
$ MOUNT DSA0: /SHADOW=($1$DUA8:,$1$DUA89:) SHADOWDISK
%MOUNT-I-MOUNTED, SHADOWDISK mounted on _DSA0:
%MOUNT-I-SHDWMEMSUCC, _$1$DUA8: (FUSS) is now a valid member
of the shadow set
%MOUNT-I-SHDWMEMCOPY, _$1$DUA89: (FUSS) added to the shadow
set with a copy operation
$ SHOW DEVICE DSA0:
Device Device Error Volume Free Trans Mnt
Name Status Count Label Blocks Count Cnt
DSA0: Mounted 0 SHADOWDISK 890937 1 1
$1$DUA8: (FUSS) ShadowSetMember 0 (member of DSA0:)
$1$DUA89: (FUSS) ShadowCopying 0 (copy trgt DSA0: 1% copied)
The SHOW DEVICE display in Example 6-1 shows the shadow set during
the copy operation (transient state). Because the SCB information on
$1$DUA8 and $1$DUA89 indicates that these devices have never been part
of a shadow set, the shadowing software uses the first device named in
the command line ($1$DUA8) as the source of the copy operation. The
device status "ShadowSetMember" indicates that the $1$DUA8
device is a source shadow set member, and "ShadowCopying"
indicates that the physical device $1$DUA89 is the target of a copy
operation.
Suppose you want to add a new member to an existing shadow set, and the
device you add is a previous member of this same shadow set. In this
case, the volume label of the new member matches that of the current
shadow set members, but the new member's MOUNT generation number is out
of date compared with those of the current members. Thus, the Mount
utility automatically performs a copy operation on that member.
Example 6-2 shows the format of the MOUNT command and MOUNT status
messages returned when you add the $3$DIA12 device to the shadow set
represented by the DSA9999 virtual unit. Notice that you do not need to
list the member units currently in the shadow set on the MOUNT command
line.
Example 6-2 Copy Operation: Adding a Member to an Existing Shadow Set
$ MOUNT /SYSTEM DSA9999: /SHADOW=$3$DIA12: AXP_SYS_071
%MOUNT-I-MOUNTED, AXP_SYS_071 mounted on _DSA9999:
%MOUNT-I-SHDWMEMCOPY, _$3$DIA12: (SHAD03) added to the shadow
set with a copy operation
$ SHOW DEVICE DSA9999:
Device Device Error Volume Free Trans Mnt
Name Status Count Label Blocks Count Cnt
DSA9999: Mounted 0 AXP_SYS_071 70610 1 1
$3$DIA7: (BGFUSS) ShadowSetMember 0 (member of DSA9999:)
$3$DIA5: (SHAD03) ShadowSetMember 0 (member of DSA9999:)
$3$DIA12: (SHAD03) ShadowCopying 0 (copy trgt DSA9999: 0% copied)
Example 6-3 shows what happens when a three-member shadow set is
dissolved on one node and then is immediately remounted on another
node. When the Mount utility checks the volume information on each
member, it finds that the volume information is consistent across the
shadow set. Thus, a copy operation is not necessary when the shadow set
is mounted.
In Example 6-3, DSA10 is the virtual unit and $3$DUA10, $3$DUA11, and
$3$DUA12 are the member volumes. The first part of the example displays
the output from a SHOW DEVICE command, which shows that the shadow set
is mounted and in a steady state. Then the user dismounts the DSA10
shadow set and immediately remounts it.
Example 6-3 No Copy Operation: Rebuilding a Shadow Set
$ SHOW DEVICE D
Device Device Error Volume Free Trans Mnt
Name Status Count Label Blocks Count Cnt
DSA10: Mounted 0 VAX_SYS_071 292971 1 1
$3$DUA10: (MYNODE) ShadowSetMember 0 (member of DSA10:)
$3$DUA11: (MYNODE) ShadowSetMember 0 (member of DSA10:)
$3$DUA12: (MYNODE) ShadowSetMember 0 (member of DSA10:)
$ DISMOUNT /NOUNLOAD DSA10:
%%%%%%%%%%% OPCOM 24-MAR-1997 20:26:41.40 %%%%%%%%%%%
$3$DUA10: (MYNODE) has been removed from shadow set.
%%%%%%%%%%% OPCOM 24-MAR-1997 20:26:41.69 %%%%%%%%%%%
$3$DUA11: (MYNODE) has been removed from shadow set.
%%%%%%%%%%% OPCOM 24-MAR-1997 20:26:41.69 %%%%%%%%%%%
$3$DUA12: (MYNODE) has been removed from shadow set.
%%%%%%%%%%% OPCOM 24-MAR-1997 20:26:41.69 %%%%%%%%%%%
$ MOUNT /SYSTEM DSA10: /SHADOW=($3$DUA10:, $3$DUA11:, $3$DUA12:) VAX_SYS_071
%MOUNT-I-MOUNTED, VAX_SYS_071 mounted on _DSA10:
%MOUNT-I-SHDWMEMSUCC, _$3$DUA10: (MYNODE) is now a valid member of
the shadow set
%MOUNT-I-SHDWMEMSUCC, _$3$DUA11: (MYNODE) is now a valid member of
the shadow set
%MOUNT-I-SHDWMEMSUCC, _$3$DUA12: (MYNODE) is now a valid member of
the shadow set
$
When a system fails, the volume information is left in a state that
shows that each shadow set member was not properly dismounted. If you
issue the MOUNT command again after the node reboots, the shadowing
software automatically performs a merge operation on the shadow set.
Example 6-4 shows the output from the SHOW DEVICE command at the time
of the merge operation.
Example 6-4 Merge Operation: Rebuilding a Shadow Set
$ SHOW DEVICE DSA42:
Device Device Error Volume Free Trans Mnt
Name Status Count Label Blocks Count Cnt
DSA42: Mounted 0 ATHRUZ 565997 1 1
$4$DUA2: (MYNODE) ShadowMergeMbr 0 (merging DSA42: 0% merged)
$4$DUA42: (YRNODE) ShadowMergeMbr 0 (merging DSA42: 0% merged)