The purpose of either a full merge or a minimerge
operation is to compare data on shadow set members and to ensure that
all of them contain identical data on every logical block (each block
is identified by its logical block number [LBN]). A full merge or
minimerge operation is initiated if either of the following events
occurs:
A system failure results
in the possibility of incomplete writes.
For example, if a write request is made to a shadow set but the system
fails before a completion status is returned from all the shadow set
members, it is possible that:
All members might contain
the new data.
All members might contain
the old data.
Some members might contain
new data and others might contain old data.
The exact timing of the failure during the original
write request defines which of these three scenarios results. When
the system recovers, Volume Shadowing for OpenVMS ensures that corresponding
LBNs on each shadow set member contain the same data (old or new). It is the responsibility of the application to
determine if the data is consistent from its point of view. The volume
might contain the data from the last write request or it might not,
depending on when the failure occurred. The application should be
designed to function properly in both cases.
If a shadow set enters
mount verification with outstanding write I/O in the driver’s
internal queue, and the problem is not corrected before mount verification
times out, the systems on which the timeout occurred require other
systems that have the shadow set mounted to put the shadow set into
a merge transient state.
For example, if
the shadow set were mounted on eight systems and mount verification
timed out on two of them, those two systems would each check their
internal queue for write I/O. If either system found outstanding write
I/O, the shadow set would enter a merge transient state.
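The timeout rule in the example above can be sketched as a small Python check. This is a hypothetical model, not the shadowing driver: `NodeView`, its fields, and `needs_merge_transient` are illustrative names, and queued write I/O is represented simply as a list of LBNs.

```python
from dataclasses import dataclass, field


@dataclass
class NodeView:
    """One system's view of a mounted shadow set (hypothetical model)."""
    name: str
    mount_verify_timed_out: bool = False
    # Write I/O still queued in this node's driver, modeled as a list of LBNs.
    pending_writes: list = field(default_factory=list)


def needs_merge_transient(nodes):
    """Return True if the shadow set must enter a merge transient state.

    Only nodes whose mount verification timed out inspect their internal
    queue; if any of them still holds outstanding write I/O, the shadow
    set is put into a merge transient state.
    """
    return any(n.mount_verify_timed_out and bool(n.pending_writes) for n in nodes)


# Eight systems have the set mounted; mount verification times out on two.
nodes = [NodeView(f"NODE{i}") for i in range(1, 9)]
nodes[0].mount_verify_timed_out = True
nodes[1].mount_verify_timed_out = True
print(needs_merge_transient(nodes))   # False: no queued writes yet
nodes[1].pending_writes.append(1024)
print(needs_merge_transient(nodes))   # True
```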
The merge operation is managed by one of the OpenVMS
systems that has the shadow set mounted. The members of a shadow set
are physically compared to each other to ensure that they contain
the same data. This is done by performing a block-by-block comparison
of the entire volume. As the merge proceeds, any blocks that differ
are made the same (all old data or all new data) by means
of a copy operation. Because the shadowing software does not know
which member contains newer data, any full member can be the source member of the merge operation.
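As a minimal sketch, assuming each member is an in-memory list indexed by LBN, a full merge compares every block and copies from whichever full member was chosen as the source. The function name `full_merge` and the block contents are hypothetical; the example also shows that, after an interrupted write, the merged result is all old or all new depending on which member is the source.

```python
def full_merge(members, source):
    """Block-by-block full merge in a simplified in-memory model.

    members[m][lbn] is the content of logical block `lbn` on member `m`.
    Every LBN on the volume is compared; any member whose block differs
    from the source member's block receives a copy from the source.
    Returns the number of blocks copied.
    """
    copies = 0
    for lbn in range(len(members[source])):
        wanted = members[source][lbn]
        for m, disk in enumerate(members):
            if m != source and disk[lbn] != wanted:
                disk[lbn] = wanted   # copy operation from the source member
                copies += 1
    return copies


# A write to LBN 2 reached member 0 before the system failed,
# but not members 1 and 2.
shadow_set = [["a", "b", "new", "d"],
              ["a", "b", "old", "d"],
              ["a", "b", "old", "d"]]
print(full_merge(shadow_set, source=1))   # 1 (every member now holds 'old' at LBN 2)
```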
A full merge operation can be a very lengthy procedure.
During the operation, application I/O continues but at a slower rate.
A minimerge operation can be significantly faster.
By using information about write operations that were logged in volatile
controller storage, the minimerge is able to merge only those areas
of the shadow set where write activity was known to have occurred.
This avoids the need for the entire volume scan that is required by
full merge operations, thus reducing consumption of system I/O resources.
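The saving can be illustrated with a rough count of block reads. In this sketch, `minimerge`, `full_merge_reads`, and the set of logged LBNs are hypothetical stand-ins for the logged write information; counting one read per block per member is a simplification, and real I/O costs (transfer sizes, caching) are not modeled.

```python
def minimerge(members, source, logged_lbns):
    """Merge only the LBNs recorded as having write activity.

    Returns (blocks_read, blocks_copied). One read per block per member
    is counted, purely for comparison with a full merge.
    """
    reads = copies = 0
    for lbn in sorted(logged_lbns):
        wanted = members[source][lbn]
        reads += 1
        for m, disk in enumerate(members):
            if m == source:
                continue
            reads += 1
            if disk[lbn] != wanted:
                disk[lbn] = wanted   # copy from the source member
                copies += 1
    return reads, copies


def full_merge_reads(members):
    """A full merge reads every block on every member."""
    return len(members) * len(members[0])


members = [["x"] * 1000 for _ in range(3)]
members[0][7] = "new"                    # write reached member 0 only
print(full_merge_reads(members))         # 3000
print(minimerge(members, 0, {7, 8}))     # (6, 2)
```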
For any merge operation, the shadowing software always selects one
member as the logical master, and that choice holds across the entire
OpenVMS Cluster. Any difference in data is resolved by propagating
the data from the merge master to all the other members.
The system responsible for performing the merge operation
on a given shadow set updates the merge fence for that shadow set after each range of LBNs is reconciled. This fence
“proceeds” across the disk and separates the merged
and unmerged portions of the shadow set.
Application read I/O requests to the merged side
of the fence can be satisfied by any source member of the shadow set.
Application read I/O requests to the unmerged side of the fence are
also satisfied by any source member of the shadow set; however, any
potential data differences, discovered by performing a data compare operation, are
corrected on all members of the shadow set before the data is returned to the user or application that requested it.
This method of dynamic correction of data inconsistencies
during read requests allows a shadow set member to fail at any point
during the merge operation without impacting data availability.
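The interplay between the merge fence and application reads can be modeled roughly as follows. `FencedMerge`, the per-step range size `chunk`, and the choice to resolve read-time differences from the merge master are assumptions made for illustration; the sketch shows the fence advancing after each reconciled range, and a read on the unmerged side correcting all members before it returns data.

```python
class FencedMerge:
    """Toy model of a merge that advances a merge fence across the volume.

    LBNs below `fence` are merged; LBNs at or above it are not yet merged.
    Differences are always resolved from the merge master (`master`).
    """

    def __init__(self, members, master, chunk=4):
        self.members = members
        self.master = master
        self.chunk = chunk          # LBNs reconciled per merge step (assumed)
        self.fence = 0
        self.size = len(members[master])

    def _reconcile(self, lbn):
        # Propagate the merge master's block to every member.
        wanted = self.members[self.master][lbn]
        for disk in self.members:
            disk[lbn] = wanted

    def merge_step(self):
        """Reconcile the next range of LBNs, then move the fence past it.

        Returns True while unmerged LBNs remain.
        """
        end = min(self.fence + self.chunk, self.size)
        for lbn in range(self.fence, end):
            self._reconcile(lbn)
        self.fence = end
        return self.fence < self.size

    def read(self, lbn, member):
        """Application read satisfied by any source member."""
        if lbn >= self.fence:
            # Unmerged side: compare the block on all members and correct
            # any difference everywhere before returning the data.
            if len({disk[lbn] for disk in self.members}) > 1:
                self._reconcile(lbn)
        return self.members[member][lbn]


disks = [["a"] * 10, ["a"] * 10]
disks[0][6] = "new"                 # unmerged difference at LBN 6
merge = FencedMerge(disks, master=0)
merge.merge_step()                  # fence moves to LBN 4
print(merge.read(6, member=1))      # new (corrected on both members first)
while merge.merge_step():
    pass
print(merge.fence, disks[0] == disks[1])   # 10 True
```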
Volume Shadowing for OpenVMS supports both assisted
and unassisted merge operations in the same cluster. Whenever you
create a shadow set, add members to an existing shadow set, or boot
a system, the shadowing software reevaluates each device in the changed
configuration to determine whether it is capable of supporting the
merge assist.
Unassisted Merge Operations
For
systems running software earlier than OpenVMS Version 5.5-2,
the merge operation is performed by the system and is known as an unassisted merge operation.
To ensure minimal impact on user I/O requests,
volume shadowing implements a mechanism that causes the merge operation
to give priority to user and application I/O requests.
The shadow server process performs merge operations
as a background process, so that when failures occur they have minimal
impact on user I/O. A side effect of this is that unassisted merge operations
can often take an extended period of time to complete, depending on
user I/O rates. Also, if another node fails before a merge completes,
the current merge is abandoned and a new one is initiated from the
beginning.
Note that data availability and integrity are
fully preserved during merge operations regardless of their duration.
All shadow set members contain equally valid data.
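A toy timing model makes the trade-off concrete: because user I/O is served first, the merge advances only with leftover capacity, and a node failure sends it back to LBN 0. The function and its parameters (`capacity`, `user_io`, `node_failure_at`) are hypothetical; the numbers show relative behavior only, not real OpenVMS merge times.

```python
def unassisted_merge_ticks(num_blocks, capacity, user_io, node_failure_at=None):
    """Toy model: ticks needed for an unassisted merge to finish.

    Each tick the I/O path can perform `capacity` operations. User and
    application I/O is served first; the merge uses what is left, one
    block per operation. If another node fails at tick `node_failure_at`,
    the current merge is abandoned and restarts from LBN 0.
    """
    merge_share = capacity - user_io
    if merge_share <= 0:
        raise ValueError("user I/O leaves no room for the merge to progress")
    position, tick = 0, 0
    while position < num_blocks:
        tick += 1
        if tick == node_failure_at:
            position = 0          # merge abandoned; a new one starts over
        position += merge_share
    return tick


print(unassisted_merge_ticks(1000, capacity=100, user_io=0))    # 10
print(unassisted_merge_ticks(1000, capacity=100, user_io=80))   # 50
print(unassisted_merge_ticks(1000, capacity=100, user_io=80,
                             node_failure_at=25))               # 74
```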
Assisted Merge Operations (Alpha)
Starting with OpenVMS
Version 5.5-2, the merge operation includes enhancements for
shadow set members that are configured on controllers that implement assisted merge capabilities. The assisted merge operation
is also referred to as a minimerge. The minimerge
feature significantly reduces the amount of time needed to perform
merge operations. Usually, the minimerge completes in a few minutes.
HSC and HSJ controllers support minimerge. Host-based minimerge is
supported on OpenVMS Alpha Version 7.3-2 and on OpenVMS Version 8.2
for OpenVMS Integrity servers and for OpenVMS Alpha. For more information,
see Chapter 8.
By using information about write operations that
were logged in controller memory, the minimerge is able to merge only
those areas of the shadow set where write activity was known to have
been in progress. This avoids the need for the total read and compare
scans required by unassisted merge operations, thus reducing consumption
of system I/O resources.
Controller-based write logs contain information
about exactly which LBNs in the shadow set had write I/O requests
outstanding (from a failed node). The node that performs the assisted
merge operation uses the write logs to merge those LBNs that may be
inconsistent across the shadow set. No controller-based write logs
are maintained for a one-member shadow set. No controller-based write
logs are maintained if only one OpenVMS system has the shadow set
mounted.
The minimerge operation is enabled on nodes running
OpenVMS Version 5.5-2 or later. Volume shadowing automatically
enables the minimerge if the controllers involved in accessing the
physical members of the shadow set support it. See the HP Volume Shadowing
for OpenVMS Software Product Description (SPD
27.29.xx) for a list of supported controllers.
Note that minimerge operations are possible even when shadow set members
are connected to different controllers, because write log
entries are maintained on a per-controller basis for each shadow set
member.
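The following sketch models per-controller write logs for a shadow set whose members sit on two controllers. The entry layout (a set of LBNs per shadow set and node), the names `ControllerWriteLog`, `write_logging_active`, and `lbns_to_minimerge`, and the labels `DSA1`, `NODE1`, `CTRL_A`, and `CTRL_B` are hypothetical; the point is that the node performing the minimerge can combine the failed node's entries from every controller, and that no logging occurs for a one-member set or a single-system mount.

```python
from collections import defaultdict


class ControllerWriteLog:
    """Hypothetical model of one controller's write log.

    Entries are kept per (shadow set, node) and record LBNs with write
    I/O outstanding; an entry stays until shadowing deletes it.
    """

    def __init__(self, name):
        self.name = name
        self.entries = defaultdict(set)

    def log_write(self, shadow_set, node, lbn):
        self.entries[(shadow_set, node)].add(lbn)

    def delete_entry(self, shadow_set, node, lbn):
        self.entries[(shadow_set, node)].discard(lbn)


def write_logging_active(member_count, mounting_nodes):
    """No write logs for a one-member set or a set mounted on one system."""
    return member_count > 1 and mounting_nodes > 1


def lbns_to_minimerge(controllers, shadow_set, failed_node):
    """Union of the failed node's logged LBNs across every controller
    that serves a member of the shadow set."""
    lbns = set()
    for ctrl in controllers:
        lbns |= ctrl.entries.get((shadow_set, failed_node), set())
    return lbns


# Two members of DSA1 sit on different controllers.
ctrl_a, ctrl_b = ControllerWriteLog("CTRL_A"), ControllerWriteLog("CTRL_B")
for ctrl in (ctrl_a, ctrl_b):
    ctrl.log_write("DSA1", "NODE1", 100)
    ctrl.log_write("DSA1", "NODE1", 200)
ctrl_a.delete_entry("DSA1", "NODE1", 100)   # for illustration: deleted on CTRL_A only
# NODE1 fails now: LBN 100 is still logged on CTRL_B, LBN 200 on both.
print(sorted(lbns_to_minimerge([ctrl_a, ctrl_b], "DSA1", "NODE1")))   # [100, 200]
```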
Volume
Shadowing for OpenVMS automatically disables minimerges if:
The shadow set is mounted
on a cluster node that is running an OpenVMS release earlier than
Version 5.5-2.
A shadow set member is
mounted on a controller running a version of firmware that does not
support minimerge.
A shadow set member is
mounted on a controller that has performance assists disabled.
Any node in the cluster that has the shadow set mounted is running
a version of Volume Shadowing that has minimerge disabled.
The shadow set is mounted
on a standalone system. (Minimerge operations are not enabled on standalone
systems.)
The shadow set is mounted
on only one node in the OpenVMS Cluster.
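These conditions can be read as a single predicate: minimerge stays enabled only if none of them holds. The sketch below encodes them with hypothetical `Node` and `Controller` records and represents OpenVMS versions as tuples, so `(5, 5, 2)` stands for Version 5.5-2; it illustrates the rules and is not the shadowing software's actual check.

```python
from dataclasses import dataclass

MINIMERGE_MIN_VERSION = (5, 5, 2)   # OpenVMS Version 5.5-2, as a tuple


@dataclass
class Node:
    version: tuple                       # e.g. (7, 3, 2) for Version 7.3-2
    shadowing_minimerge_disabled: bool = False


@dataclass
class Controller:
    firmware_supports_minimerge: bool = True
    performance_assists_enabled: bool = True


def minimerge_enabled(nodes_with_set_mounted, member_controllers, standalone=False):
    """Return False if any disabling condition applies, else True."""
    # Standalone system, or the set is mounted on only one node.
    if standalone or len(nodes_with_set_mounted) < 2:
        return False
    # A mounting node runs a release earlier than 5.5-2, or has minimerge disabled.
    for node in nodes_with_set_mounted:
        if node.version < MINIMERGE_MIN_VERSION or node.shadowing_minimerge_disabled:
            return False
    # A member's controller lacks minimerge firmware or has assists disabled.
    for ctrl in member_controllers:
        if not (ctrl.firmware_supports_minimerge and ctrl.performance_assists_enabled):
            return False
    return True


node1, node2 = Node((7, 3, 2)), Node((8, 2, 0))
print(minimerge_enabled([node1, node2], [Controller(), Controller()]))   # True
print(minimerge_enabled([node1], [Controller()]))                        # False
print(minimerge_enabled([node1, Node((5, 5, 0))], [Controller()]))       # False
```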
The following transient conditions can also cause
a minimerge operation to be disabled:
An unassisted merge
operation is already in progress when a node fails.
In this situation, the shadowing software cannot interrupt
the unassisted merge operation with a minimerge.
Not enough write
log entries are available in the controllers.
The number of write log entries available is determined by controller
capacity. The shadowing software dynamically determines when there
are enough entries to maintain write I/O information successfully.
If the number of available write log entries is too low, shadowing
temporarily disables logging for that shadow set and returns the existing
available entries on this node and every other node in the cluster. After some
time has passed, shadowing attempts to re-enable write logging on
this shadow set.
A controller retains a write log entry for each
write I/O request until that entry is deleted by shadowing, or the
controller is restarted.
A multiple-unit controller shares its write log
entries among multiple disks. This pool of write log entries is managed
by the shadowing software. If a controller runs out of write log entries,
shadowing disables minimerges; if a node then leaves the cluster without
first dismounting the shadow set, shadowing performs an unassisted merge
operation. Note that write log exhaustion does not typically occur with
disks on which the write logs are not shared.
When the controller write
logs become inaccessible for one of the following reasons, a minimerge
operation is not possible.
Controller failure causes
write logs to be lost or deleted.
A device that is dual-ported
to multiple controllers fails over to its secondary controller.
(If the secondary controller is capable of maintaining write logs,
the minimerge operations are reestablished quickly.)
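The write log pool behavior described above can be approximated with a small simulation. `WriteLogPool`, the `low_water` threshold, and the `retry_after` delay are hypothetical, since the text states only that shadowing decides dynamically when entries are sufficient and retries after some time has passed; the model uses a single pool rather than one per node, and treats inaccessible controller write logs as forcing an unassisted merge.

```python
class WriteLogPool:
    """Toy model of the write log entries shared by a multiple-unit controller."""

    def __init__(self, capacity, low_water, retry_after):
        self.free = capacity
        self.low_water = low_water        # hypothetical "too low" threshold
        self.retry_after = retry_after    # hypothetical delay before retrying
        self.held = {}            # shadow set -> entries currently in use
        self.disabled_until = {}  # shadow set -> time when logging is retried

    def logging_enabled(self, shadow_set, now):
        return now >= self.disabled_until.get(shadow_set, 0)

    def log_write(self, shadow_set, now):
        """Take an entry for a write; return True if the write was logged."""
        if not self.logging_enabled(shadow_set, now):
            return False
        if self.free <= self.low_water:
            # Too few entries left: disable logging for this shadow set,
            # give back its entries, and retry logging later.
            self.free += self.held.pop(shadow_set, 0)
            self.disabled_until[shadow_set] = now + self.retry_after
            return False
        self.free -= 1
        self.held[shadow_set] = self.held.get(shadow_set, 0) + 1
        return True

    def merge_after_node_loss(self, shadow_set, now, logs_accessible=True):
        """Merge type if a node leaves without dismounting the shadow set."""
        if logs_accessible and self.logging_enabled(shadow_set, now):
            return "minimerge"
        return "unassisted merge"


pool = WriteLogPool(capacity=5, low_water=2, retry_after=10)
pool.log_write("DSA1", now=0)
pool.log_write("DSA1", now=0)
pool.log_write("DSA2", now=0)                     # pool is now at its low-water mark
print(pool.log_write("DSA1", now=1))              # False: logging disabled for DSA1
print(pool.merge_after_node_loss("DSA1", now=5))  # unassisted merge
print(pool.merge_after_node_loss("DSA2", now=5))  # minimerge
print(pool.merge_after_node_loss("DSA2", now=5, logs_accessible=False))  # unassisted merge
```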