Volume Shadowing for OpenVMS
Chapter 9 Performance Information for Volume Shadowing
Volume Shadowing for OpenVMS is designed primarily to be a data
availability product and not a performance product. Because performance
and data availability cannot be completely separated from each other,
this chapter discusses the performance effects that can result on
systems using Volume Shadowing for OpenVMS.
Several factors affect the performance of a shadow set, including the
following:
- I/O access path (local versus remote)
- Size of I/O requests
- Data access patterns (random or sequential)
- Read-to-write ratio
- Shadow set configuration
- State of a shadow set (steady or transient)
- Whether or not you use the shadowing copy and merge performance
assists (see Section 9.2.2)
- Whether or not you use the minicopy operation (see Chapter 7)
- Other work loads on the system that share common resources (CPUs,
disks, controllers, interconnects)
- Striping (RAID) implementation
The following sections examine how the state of a shadow set and its
configuration can affect resource utilization and performance. Section
9.3 also provides some guidelines for controlling the use of system
resources. Because there is no significant difference between the
performance of a nonshadowed disk and that of a one-member shadow set,
all discussions that follow apply to multiple-member shadow sets.
9.1 Performance During Steady State
A shadow set is in a steady state when all of its members are
consistent and no copy operation or merge operation is in progress.
Overall, the performance of a shadow set in a steady state compares
favorably with that of a nonshadowed disk. Read and write I/O requests
processed by a shadow set use very little extra CPU time compared with
a nonshadowed disk. A shadow set can often process read requests more
efficiently than a nonshadowed disk because it can use its additional
members to respond to multiple read requests simultaneously.
For a shadow set in a steady state, the shadowing software handles read
and write operations in the following manner:
- Write I/O requests are issued concurrently to all members of the
shadow set. Because each member must be updated before the I/O request
is considered complete, the overall completion time for a write
operation is determined by the member unit with the longest access time
from the node issuing the write request. Depending on how the shadow
set is configured and the access paths to the individual member units,
you might observe a slight increase in the time it takes to complete
write I/O requests. Steady-state performance is generally better for a
locally connected member because its access path is shorter and more
direct than the access path to a served member. For example, you might
notice degraded write performance on a shadow set whose members are
each locally connected to a separate node, so that each node must
reach the other members through an MSCP server across a network link.
- Read I/O requests are issued to only one member unit. Volume
Shadowing for OpenVMS attempts to access the member unit that can
provide the best completion time. In terms of I/O throughput, a
two-member shadow set can satisfy nearly twice as many read requests as
a nonshadowed disk (and even more throughput is possible with a
three-member shadow set). The shadow set can use the additional disk
read heads to respond to multiple read requests at the same time. Thus,
a steady-state shadow set can provide better performance when an
application or user reads data from the disk. However, these gains
occur mainly when read requests arrive at the shadow set in batches,
so that there are as many queued read requests as there are member
units.
Even though a shadow set in a steady state has the potential for better
read performance, the primary purpose of volume shadowing is to provide
data availability. It is not appropriate to use volume shadowing as a
means to increase the read I/O throughput of your applications (by
explicitly increasing the I/O work load), because the same level of
performance cannot be expected when copy or merge operations must take
place to add new members or preserve data consistency, or when members
are removed from the shadow set. Section 9.2 discusses performance
considerations when the shadow set is in a transient state.
9.2 Performance During Copy and Merge Operations
A shadow set is in a transient state when some or all of its members
are undergoing a copy or merge operation. During merge operations,
Volume Shadowing for OpenVMS ensures consistency by reading the data
from one member and making sure it is the same as the data contained in
the same logical block numbers (LBNs) on the other members of the
shadow set. If the data is different, the shadowing software updates
the LBN on all members before completing the I/O request. For copy
operations, the shadowing software reads data from a source member and
writes the data to the same LBN on target members.
While it performs a merge or copy operation, the shadowing software
continues to process application and user I/O requests. The I/O
processing required by a copy or merge operation can reduce
performance compared with that of the same shadow set in a steady
state. However, if your shadow set members are configured on
controllers that support assisted copy and assisted merge operations,
you can significantly improve the speed with which a shadow set
performs a copy or merge operation. Volume Shadowing for OpenVMS
supports both assisted and unassisted merge and copy operations.
The following list describes how the performance of a shadow set might
be affected while an unassisted merge or copy operation is in progress.
See Chapter 6 for a description of assisted copy and merge
operations.
- Copy operations
A copy operation is started on a two- or three-member shadow set either
when you mount the shadow set to create it or when you add a new member
to an existing shadow set. During a copy operation, members that are
targets of the operation cannot provide data availability until the
operation completes. Therefore, the shadowing software performs the
copy operation as quickly as possible to make the shadow set fully
available. During a copy operation, the shadowing software gives equal
priority to user and application I/O requests and to I/O requests
necessary to complete the copy operation.
The performance of a shadow set during a copy operation is reduced
because:
- The shadowing software must follow special protocols for user read
and write I/O requests during a copy operation.
- Copy operation I/O requests are large in size and have the same
priority as user and application I/O requests.
In addition, other system resources are utilized during a copy
operation. Depending on the access path to the individual shadow set
members, these resources could include the disk controller,
interconnects, interconnect adapters, and systems. Because you
explicitly start copy operations when you mount a new shadow set or add
members to an existing set, you can control when the shadowing software
performs a copy operation. Therefore, you can minimize the effect on
users and applications in the system by limiting the number of copy
operations that occur at the same time. For example, create new sets or
add new members during periods of low system activity, and do not mount
several sets at the same time. You can further minimize the effect on
users and applications in the system by using the minicopy operation,
introduced with OpenVMS Version 7.3. The minicopy operation can
significantly reduce the time it takes to return a former member to the
shadow set. By using write bitmaps, the minicopy operation copies only
the data that changed while the member was dismounted. For more
information, see Chapter 7.
- Merge operations
In contrast to copy operations, merge operations are not under the
control of a user or program. The shadowing software automatically
initiates a merge operation on a shadow set when a node on which the
shadow set is mounted fails. As with a copy operation, the volume
shadowing software ensures that all I/O requests to the shadow set
follow appropriate protocols to ensure data consistency. However, when
a shadow set is undergoing a merge operation, full data availability
exists in the sense that individual members of the set are valid data
sources and are accessible by applications and users on the system.
Therefore, it is not urgent for the shadowing software to finish the
merge operation, especially when the system is being heavily used.
Because of this major distinction from a copy operation, the shadowing
software implicitly places a higher priority on user activity to the
shadow set. Volume Shadowing for OpenVMS does this by detecting and
evaluating system load, and then dynamically controlling or throttling
the merge operation so that other I/O activity can proceed without
interference. Because the merge throttle slows merge operations when
application and user I/O activity is heavy, a merge operation can take
longer than a copy operation. When heavy work loads are detected, the
merge throttle allows application and user activity to continue
unimpeded by the merge operation.
On the other hand, the read performance of a shadow set during a
merge operation is reduced because the shadowing software must perform
data integrity checks on all members for every read request. The volume
shadowing software reads the data from the same LBN on all members of
the shadow set, compares the data, and repairs any inconsistencies
before returning the read data to the user.
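The copy-related guidance above can be sketched in DCL as follows: add a member during a quiet period, and, for a member that will be out of the set only temporarily, use the minicopy policy so that its return copies only changed blocks. This is a minimal sketch, not a prescribed procedure. The virtual unit DSA42:, the member $1$DGA30:, and the volume label DATA42 are hypothetical names, and the exact /POLICY=MINICOPY behavior (including the requirement for OpenVMS Version 7.3 or later) should be checked against Chapter 7 and the MOUNT and DISMOUNT documentation for your version.

```dcl
$ ! Hypothetical names: DSA42: is the virtual unit, DATA42 the volume label.
$ ! Adding a member starts a full copy operation, so schedule it for a
$ ! period of low system activity and avoid mounting several sets at once.
$ MOUNT/SYSTEM DSA42: /SHADOW=$1$DGA30: DATA42
$ !
$ ! Later, remove the member with the minicopy policy so that a write
$ ! bitmap records the blocks written while it is out of the set.
$ DISMOUNT/POLICY=MINICOPY $1$DGA30:
$ !
$ ! Returning the member with the minicopy policy copies only the
$ ! blocks recorded in the write bitmap instead of the whole volume.
$ MOUNT/SYSTEM DSA42: /SHADOW=$1$DGA30: /POLICY=MINICOPY DATA42
```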
9.2.1 Improving Performance of Unassisted Merge Operations
During an unassisted shadow set merge operation, the read performance
is reduced. You can increase or decrease the merge rate by changing the
default value of the merge multiplication factor. Two logical names are
available to vary the unassisted merge multiplication factor: one
applies to all shadow sets mounted on a node, and one applies only to
the named shadow set:
- SHAD$MERGE_DELAY_FACTOR applies to all shadow sets mounted on a
node unless you also use SHAD$MERGE_DELAY_FACTOR_DSAnnnn.
- SHAD$MERGE_DELAY_FACTOR_DSAnnnn applies to each shadow set
specified by its virtual unit name, DSAnnnn.
You can set SHAD$MERGE_DELAY_FACTOR to a certain value for all shadow
sets mounted on a node, and then you can use the
SHAD$MERGE_DELAY_FACTOR_DSAnnnn to apply different values to
one or more shadow sets on the node.
If you increase the setting for either logical name, you increase the
merge rate and decrease the rate of application and user I/O.
Conversely, if you decrease the setting for either, you decrease the
merge rate and increase the rate of application and user I/O.
Note
Decreasing the merge multiplication factor to less than 200 decreases
the merge rate exponentially. Therefore, if you choose to decrease it,
do so in very small increments.
You can change the multiplication factor while a merge is in progress.
The shadowing software reevaluates these two logical names after every
1000 merge I/Os, so a changed value takes effect within 1000 merge
I/Os.
To change the default values so that they will remain in effect between
system boots, you must define these logical names in the system table on
each node in the cluster. The valid range for the multiplication factor
is 100 to 100,000. Any value outside this range causes the factor to
default to 200. At the start of a shadow set merge, the
%SHADOW_SERVER-I-SSRVINIMRG message displays the factor in effect after
the word Factor. This value is the default of 200 unless you previously
specified a value in the range of 100 to 100,000.
The following example shows the use of
SHAD$MERGE_DELAY_FACTOR_DSAnnnn:
$ DEFINE SHAD$MERGE_DELAY_FACTOR_DSA42 1000
This command assigns a multiplication factor of 1000 to shadow set
DSA42. This higher multiplication factor will increase the merge rate
only for the shadow set identified by DSA42 when an unassisted merge is
required.
The following example shows the use of SHAD$MERGE_DELAY_FACTOR:
$ DEFINE SHAD$MERGE_DELAY_FACTOR 800
This command assigns a multiplication factor of 800 to all shadow sets
mounted on the node, except for any that may have been assigned a
different multiplication factor with the preceding command.
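The text calls for these logical names to be defined in the system table on each node. A sketch of such definitions, using the values from the two examples above, might look like the following. Placing them in SYS$MANAGER:SYLOGICALS.COM, the standard site-specific startup procedure for system logical names, is an assumption about how to make them survive a reboot, not a requirement stated in this chapter. DEFINE/SYSTEM requires SYSNAM or SYSPRV privilege.

```dcl
$ ! Excerpt for SYS$MANAGER:SYLOGICALS.COM on each cluster node.
$ ! /SYSTEM places each name in the system logical name table.
$ ! Valid factors are 100 to 100,000; values outside that range revert to 200.
$ DEFINE/SYSTEM SHAD$MERGE_DELAY_FACTOR 800
$ DEFINE/SYSTEM SHAD$MERGE_DELAY_FACTOR_DSA42 1000
$ !
$ ! Interactively, display the definitions currently in the system table:
$ SHOW LOGICAL/SYSTEM SHAD$MERGE_DELAY_FACTOR*
```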
Note
Increasing the values excessively may cause application performance
problems when merges are occurring. When setting values, system
managers must balance site-specific application needs with their merge
requirements.
9.2.2 Improving Performance for Merge and Copy Operations
There are two types of performance assists: the merge assist and the
copy assist.
The merge assist improves performance by using information that is
maintained in controller-based write logs to merge only the data that
is inconsistent across a shadow set. When a merge operation is assisted
by the write logs, it is referred to as a minimerge.
The copy assist reduces system resource usage and copy times by
enabling a direct disk-to-disk transfer of data without going through
host node memory.
Assisted merge operations are usually too short to be noticeable.
Improved performance is also possible during an assisted copy operation
because it consumes fewer CPU and interconnect resources.
Although the primary purpose of the performance assists is to reduce
the system resources required to perform a copy or merge operation, in
some circumstances you may also observe improved read and write I/O
performance.
Volume Shadowing for OpenVMS supports both assisted and unassisted
shadow sets in the same OpenVMS Cluster configuration. Whenever you
create a shadow set, add members to an existing shadow set, or boot a
system, the shadowing software reevaluates each device in the changed
configuration to determine whether it is capable of supporting either
the copy assist or the minimerge. Enhanced performance is possible only
as long as all shadow set members are configured on controllers that
support performance assist capabilities. If any shadow set member is
connected to a controller without these capabilities, the shadowing
software disables the performance assist for the shadow set.
When the correct revision levels of software are installed, the copy
assist and minimerge are enabled by default, and are fully managed by
the shadowing software.
9.2.3 Effects on Performance
The copy assist and minimerge are designed to reduce the time needed to
do copy and merge operations. In fact, you may notice significant time
reductions on systems that have little or no user I/O occurring during
the assisted copy or merge operation. Data availability is also
improved because assisted copy operations quickly make data consistent
across the shadow set.
Minimerge Performance Improvements
The minimerge feature provides a significant reduction in the time
needed to perform merge operations. By using controller-based write
logs, it is possible to avoid the total volume scan required by earlier
merge algorithms and to merge only those areas of the shadow set where
write activity was known to be in progress at the time the node or
nodes failed.
Unassisted merge operations often take several hours, depending on user
I/O rates. Minimerge operations typically complete in a few minutes and
are usually undetectable by users.
The exact time taken to complete a minimerge depends on the amount of
outstanding write activity to the shadow set when the merge process is
initiated, and on the number of shadow set members undergoing a
minimerge simultaneously. Even under the heaviest write activity, a
minimerge will complete within several minutes. Additionally, minimerge
operations consume minimal compute and I/O bandwidth.
Copy Assist Performance Improvements
Copy times vary with the configuration and are generally longer on
systems that are also handling user I/O. Performance benefits are
achieved when the source and target disks are on different HSJ internal
buses.
9.3 Guidelines for Managing Shadow Set Performance
Sections 9.1 and 9.2 describe the performance effects on a shadow set
in a steady state and while a copy or merge operation is in progress. In
general, performance during steady state compares favorably with that
of a nonshadowed disk. Performance is affected when a copy or a merge
operation is in progress to a shadow set. In the case of copy
operations, you control when the operations are performed. Merge
operations, however, are not started by user or program actions. They
are started automatically when a system fails, or when a shadow set on a
system with outstanding application write I/O enters mount verification
and times out. In these cases, the shadowing software reduces the
utilization of system resources and the effects on user activity by
throttling itself dynamically. Minimerge operations consume few
resources and complete rapidly with little or no effect on user
activity.
The actual resources that are utilized during a copy or merge operation
depend on the access path to the member units of a shadow set, which in
turn depends on the way the shadow set is configured. By far the most
heavily consumed resources during both operations are adapter and
interconnect I/O bandwidth.
You can control resource utilization by setting the SHADOW_MAX_COPY
system parameter on each system to a value appropriate for the type of
system and its adapters. SHADOW_MAX_COPY is a dynamic
system parameter that controls the number of concurrent copy or merge
threads that can be active on a single system. If the number of copy
threads that start up on a particular system is more than the value of
the SHADOW_MAX_COPY parameter on that system, only the number of
threads specified by SHADOW_MAX_COPY will be allowed to proceed. The
other copy threads are stalled until one of the active copy threads
completes.
For example, assume that the SHADOW_MAX_COPY parameter is set to 3. If
you mount four shadow sets that all need a copy operation, only three
of the copy operations can proceed; the fourth copy operation must wait
until one of the first three operations completes. Because copy
operations use I/O bandwidth, this parameter provides a way to limit
the number of concurrent copy operations and avoid saturating
interconnects or adapters in the system. The value of SHADOW_MAX_COPY
can range from 0 to 200. The default value depends on the OpenVMS
version. Chapter 3 explains how to set the SHADOW_MAX_COPY parameter.
Once you arrive at a good value for the parameter on a node, also
record it in that node's MODPARAMS.DAT file so that the value is
preserved when you run AUTOGEN.
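Chapter 3 remains the authoritative procedure. As a minimal sketch, the following fragment changes the active value of the dynamic SHADOW_MAX_COPY parameter with SYSGEN and then records the same value in MODPARAMS.DAT so that AUTOGEN keeps it. The value 3 is taken from the example above and is not a recommendation; the commands are typically run from a suitably privileged account such as SYSTEM.

```dcl
$ ! Change the running value. SHADOW_MAX_COPY is dynamic, so
$ ! WRITE ACTIVE takes effect without a reboot.
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SET SHADOW_MAX_COPY 3
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT
$ !
$ ! Then add this line to SYS$SYSTEM:MODPARAMS.DAT on the same node so
$ ! that the next AUTOGEN run preserves the value:
$ !     SHADOW_MAX_COPY = 3
```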
In addition to setting the SHADOW_MAX_COPY parameter, the following
list provides some general guidelines to control resource utilization
and the effects on system performance when shadow sets are in transient
states.
- Create or add members to shadow sets when your system is lightly
loaded.
- The amount of data that a system can transfer during copy
operations varies depending on the type of disks, interconnect,
controller, the number of units in the shadow set, and the shadow set
configuration on the system. For example, approximately 5% to 15% of
the Ethernet or CI bandwidth might be consumed for each copy operation
(for disks typically configured in CI or Ethernet environments).
- When you create unassisted three-member shadow sets consisting of
one source member and two target devices, add both target devices at
the same time in a single MOUNT command rather than in two separate
MOUNT commands. Adding all members at once optimizes the copy
operation by starting a single copy thread that reads from the source
member once and performs write I/O requests to the target members in
parallel.
- For satellite nodes in a mixed-interconnect or local area OpenVMS
Cluster system, set the system parameter SHADOW_MAX_COPY to a value of
0 for nodes that do not have local disks as shadow set members.
- Do not use the MOUNT/CLUSTER command to mount every
shadow set across the cluster unless all nodes must access the set.
Instead, use the MOUNT/SYSTEM command to mount each shadow set only on
those nodes that need to access it. Because a shadow set goes into a
merge state only when a node that has the set mounted fails, limiting
the number of nodes that mount a shadow set reduces the chances of a
merge, especially when a node has no need to access the set.
- Because a copy operation can occur only on nodes that have the
shadow set mounted, create and mount shadow sets on the nodes that are
local (have direct access) to the shadow set members. This allows the
copy threads to run on these nodes, resulting in faster copy operations
with fewer resources utilized.
- If you have shadow sets configured across nodes that are accessed
through the MSCP server, you might need to increase the value of the
MSCP_BUFFER system parameter in order to avoid fragmentation of
application I/O. Be aware that each shadow set copy or merge
operation normally consumes 127 buffers.
- Dual-pathed and dual-ported shadowed disks in an OpenVMS Cluster
system can provide additional coverage against the failure of nodes
that are directly connected to shadowed disks. This type of
configuration provides higher data availability with reasonable
performance characteristics.
- Use the preferred path option to ensure dual-ported drives are
accessed via the same controller so that the shadowing software will
perform assisted copy operations.
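Several of the guidelines in the list above map directly to DCL. The following fragment is a sketch with hypothetical names ($1$DGA10: as the source member, $1$DGA20: and $1$DGA30: as targets, DSA42: as the virtual unit, and DATA42 as the volume label). No MSCP_BUFFER value is shown because a suitable value is site-specific.

```dcl
$ ! Unassisted three-member set: name the source and both targets in one
$ ! MOUNT command so that a single copy thread reads the source once and
$ ! writes both targets in parallel.
$ ! /SYSTEM (rather than /CLUSTER) mounts the set only on this node;
$ ! repeat the command only on the other nodes that need access.
$ MOUNT/SYSTEM DSA42: /SHADOW=($1$DGA10:,$1$DGA20:,$1$DGA30:) DATA42
$ !
$ ! On a satellite node with no local disks as shadow set members, add
$ ! this line to SYS$SYSTEM:MODPARAMS.DAT so that no copy or merge
$ ! threads run there:
$ !     SHADOW_MAX_COPY = 0
```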