Guidelines for OpenVMS Cluster Configurations
9.3.2 Six-Satellite OpenVMS Cluster with Two Boot Nodes
Figure 9-5 shows six satellites and two boot servers connected by
Ethernet. Boot server 1 and boot server 2 perform MSCP server dynamic
load balancing: they arbitrate and share the workload between them, and
if one node stops functioning, the other takes over. MSCP dynamic load
balancing requires shared access to storage.
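MSCP serving and its load-balancing behavior are controlled by system
parameters such as MSCP_LOAD and MSCP_SERVE_ALL. As a minimal check,
assuming suitable privileges on each boot server, you can display the
current settings with SYSGEN; interpreting the values for a specific
site is covered in HP OpenVMS Cluster Systems:
    $ RUN SYS$SYSTEM:SYSGEN
    SYSGEN> SHOW MSCP_LOAD        ! nonzero means the MSCP server is loaded at boot
    SYSGEN> SHOW MSCP_SERVE_ALL   ! controls which locally accessible disks are served
    SYSGEN> EXIT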
Figure 9-5 Six-Satellite LAN OpenVMS Cluster with Two Boot
Nodes
The advantages and disadvantages of the configuration shown in
Figure 9-5 include:
Advantages
- The MSCP server is enabled, which supports adding satellites and
allows access to more storage.
- Two boot servers perform MSCP dynamic load balancing.
Disadvantage
- The Ethernet is a potential bottleneck and a single point of
failure.
If the LAN in Figure 9-5 became an OpenVMS Cluster bottleneck, this
could lead to a configuration like the one shown in Figure 9-6.
9.3.3 Twelve-Satellite LAN OpenVMS Cluster with Two LAN Segments
Figure 9-6 shows 12 satellites and 2 boot servers connected by two
Ethernet segments. These two Ethernet segments are also joined by a LAN
bridge. Because each satellite has dual paths to storage, this
configuration also features MSCP dynamic load balancing.
Figure 9-6 Twelve-Satellite OpenVMS Cluster with Two LAN
Segments
The advantages and disadvantages of the configuration shown in
Figure 9-6 include:
Advantages
- The MSCP server is enabled, which supports adding satellites and
allows access to more storage.
- Two boot servers perform MSCP dynamic load balancing. From the
perspective of a satellite on the Ethernet LAN, the dual paths to the
Alpha and Integrity server nodes provide the advantage of MSCP load
balancing.
- Two LAN segments provide twice the amount of LAN capacity.
Disadvantages
- This OpenVMS Cluster configuration is limited in the number of
satellites that it can support.
- The single HSG controller is a potential bottleneck and a single
point of failure.
If the OpenVMS Cluster in Figure 9-6 needed to grow beyond its
current limits, this could lead to a configuration like the one shown
in Figure 9-7.
9.3.4 Forty-Five Satellite OpenVMS Cluster with Intersite Link
Figure 9-7 shows a large, 51-node OpenVMS Cluster that includes 45
satellite nodes. The three boot servers, Integrity server 1, Integrity
server 2, and Integrity server 3, share three disks: a common disk, a
page and swap disk, and a system disk. The intersite link is connected
to routers and has three LAN segments attached. Each segment has 15
workstation satellites as well as its own boot node.
Figure 9-7 Forty-Five Satellite OpenVMS Cluster with Intersite
Link
The advantages and disadvantages of the configuration shown in
Figure 9-7 include:
Advantages
- Decreased boot time, especially for an OpenVMS Cluster with such a
high node count.
Reference: For information about
booting an OpenVMS Cluster like the one in Figure 9-7, see
Section 10.2.4.
- The MSCP server is enabled, which allows satellites to access more storage.
- Each boot server has its own page and swap disk, which reduces I/O
activity on the system disks.
- All of the environment files for the entire OpenVMS Cluster are on
the common disk. This frees the satellite boot servers to serve only
root information to the satellites.
Reference: For
more information about common disks and page and swap disks, see
Section 10.2.
Disadvantages
- The satellite boot servers on the Ethernet LAN segments can boot
satellites only on their own segments.
9.3.5 High-Powered Workstation OpenVMS Cluster (1995 Technology)
Figure 9-8 shows an OpenVMS Cluster configuration that provides high
performance and high availability on the FDDI ring.
Figure 9-8 High-Powered Workstation Server Configuration, 1995
In Figure 9-8, several Alpha workstations, each with its own system
disk, are connected to the FDDI ring. Putting Alpha workstations on the
FDDI provides high performance because each workstation has direct
access to its system disk. In addition, the FDDI bandwidth is higher
than that of the Ethernet. Because Alpha workstations have FDDI
adapters, putting these workstations on an FDDI is a useful alternative
for critical workstation requirements. FDDI is 10 times faster than
Ethernet, and Alpha workstations have processing capacity that can take
advantage of FDDI's speed. (The speed of Fast Ethernet matches that of
FDDI, and Gigabit Ethernet is 10 times faster than Fast Ethernet and
FDDI.)
9.3.6 High-Powered Workstation OpenVMS Cluster (2004 Technology)
Figure 9-9 shows an OpenVMS Cluster configuration that provides high
performance and high availability using Gigabit Ethernet for the LAN
and Fibre Channel for storage.
Figure 9-9 High-Powered Workstation Server Configuration, 2004
In Figure 9-9, several Alpha workstations, each with its own system
disk, are connected to the Gigabit Ethernet LAN. Putting Alpha
workstations on the Gigabit Ethernet LAN provides high performance
because each workstation has direct access to its system disk. In
addition, the Gigabit Ethernet bandwidth is 10 times higher than that
of the FDDI. Alpha workstations have processing capacity that can take
advantage of Gigabit Ethernet's speed.
9.3.7 Guidelines for OpenVMS Clusters with Satellites
The following are guidelines for setting up an OpenVMS Cluster with
satellites:
- Extra memory is required for satellites of large LAN configurations
because each node must maintain a connection to every other node.
- Configure the network to eliminate bottlenecks (that is, allocate
sufficient bandwidth within the network cloud and on server
connections).
- Maximize resources with MSCP dynamic load balancing, as shown in
Figure 9-5 and Figure 9-6.
- For good performance, keep the number of nodes that require MSCP
serving to a minimum.
Reference: See Section 9.5.1 for more
information about MSCP overhead.
- To save time, ensure that the booting sequence is efficient,
particularly when the OpenVMS Cluster is large or has multiple
segments. See Section 10.2.4 for more information about how to reduce
LAN and system disk activity and how to boot separate groups of nodes
in sequence.
- Use multiple LAN adapters per host, and connect them to independent
LAN paths. This enables simultaneous two-way communication between
nodes and allows traffic to multiple nodes to be spread over the
available LANs. In addition, multiple LAN adapters increase failover
capabilities; one way to verify the available paths is shown in the
example after this list.
9.3.8 Extended LAN Configuration Guidelines
You can use bridges and switches between LAN segments to form an
extended LAN. This can increase availability, distance, and aggregate
bandwidth as compared with a single LAN. However, an extended LAN can
increase delay and can reduce bandwidth on some paths. Factors such as
packet loss, queuing delays, and packet size can also affect network
performance. Table 9-3 provides guidelines for ensuring adequate
LAN performance when dealing with such factors.
Table 9-3 Extended LAN Configuration Guidelines
Factor: Propagation delay
Guidelines: The amount of time it takes a packet to traverse the LAN
depends on the distance it travels and the number of times it is
relayed from one link to another through a switch or bridge. If
responsiveness is critical, then you must control these factors.
For high-performance applications, limit the number of switches between
nodes to two. For situations in which high performance is not required,
you can use up to seven switches or bridges between nodes.
Factor: Queuing delay
Guidelines: Queuing occurs when the instantaneous arrival rate at
switches or bridges and host adapters exceeds the service rate. You can
control queuing by:
- Reducing the number of switches or bridges between nodes that
communicate frequently.
- Using only high-performance switches or bridges and adapters.
- Reducing traffic bursts in the LAN. In some cases, for example, you
can tune applications by combining small I/Os so that a single packet
is produced rather than a burst of small ones.
- Reducing LAN segment and host processor utilization levels by using
faster processors and faster LANs, and by using switches or bridges for
traffic isolation.
Factor: Packet loss
Guidelines: Packets that are not delivered by the LAN require
retransmission, which wastes system and network resources, increases
delay, and reduces bandwidth. Bridges and adapters discard packets when
they become congested. You can reduce packet loss by controlling
queuing, as previously described.
Packets are also discarded when they become damaged in transit. You
can control this problem by observing LAN hardware configuration rules,
removing sources of electrical interference, and ensuring that all
hardware is operating correctly.
The retransmission timeout rate, which is a symptom of packet loss,
must be less than 1 timeout in 1000 transmissions for OpenVMS Cluster
traffic from one node to another. LAN paths that are used for
high-performance applications should have a significantly lower rate.
Monitor the occurrence of retransmission timeouts in the OpenVMS
Cluster.
Reference: For information about monitoring the occurrence of
retransmission timeouts, see HP OpenVMS Cluster Systems.
Factor: Switch or bridge recovery delay
Guidelines: Choose switches or bridges with fast self-test time and
adjust them for fast automatic reconfiguration. This includes adjusting
spanning tree parameters to match network requirements.
Reference: Refer to HP OpenVMS Cluster Systems for more information
about LAN bridge failover.
Factor: Bandwidth
Guidelines: All LAN paths used for OpenVMS Cluster communication must
operate with a nominal bandwidth of at least 10 Mb/s. The average LAN
segment utilization should not exceed 60% for any 10-second interval.
For Gigabit Ethernet and 10 Gigabit Ethernet configurations, enable
jumbo frames where possible.
Factor: Traffic isolation
Guidelines: Use switches or bridges to isolate and localize the traffic
between nodes that communicate with each other frequently. For example,
use switches or bridges to separate the OpenVMS Cluster from the rest
of the LAN and to separate nodes within an OpenVMS Cluster that
communicate frequently from the rest of the OpenVMS Cluster.
Provide independent paths through the LAN between critical systems
that have multiple adapters.
Factor: Packet size
Guidelines: Ensure that the LAN path supports a data field of at least
4474 bytes end to end. For Gigabit Ethernet devices using jumbo frames,
set NISCS_MAX_PKTSZ to 8192 bytes (see the example following this
table).
Some failures cause traffic to switch from a LAN path that supports a
large packet size to a path that supports only smaller packets. It is
possible to implement automatic detection and recovery from these kinds
of failures.
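As a sketch of the packet size guideline, the following MODPARAMS.DAT
entry enables large SCS packets on a Gigabit Ethernet path. It assumes
that jumbo frames are also enabled on the LAN adapters and on every
switch in the path:
    ! SYS$SYSTEM:MODPARAMS.DAT addition for jumbo-frame LAN paths
    NISCS_MAX_PKTSZ = 8192        ! large SCS packets for Gigabit Ethernet
After adding the line, run AUTOGEN (for example,
@SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK) and reboot the node
so that the new value takes effect.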
9.3.9 System Parameters for OpenVMS Clusters
In an OpenVMS Cluster with satellites and servers, specific system
parameters can help you manage your OpenVMS Cluster more efficiently.
Table 9-4 gives suggested values for these system parameters.
Table 9-4 OpenVMS Cluster System Parameters
System Parameter: LOCKDIRWT
Value for Satellites: 0
Value for Servers: 1-4. The setting of LOCKDIRWT influences a node's
willingness to serve as a resource directory node and also may be used
to determine mastership of resource trees. In general, a setting
greater than 1 is determined after careful examination of a cluster
node's specific workload and application mix and is beyond the scope of
this document.
System Parameter: SHADOW_MAX_COPY
Value for Satellites: 0
Value for Servers: 4, where a significantly higher setting may be
appropriate for your environment
System Parameter: MSCP_LOAD
Value for Satellites: 0
Value for Servers: 1
System Parameter: NPAGEDYN
Value for Satellites: Higher than for standalone node
Value for Servers: Higher than for satellite node
System Parameter: PAGEDYN
Value for Satellites: Higher than for standalone node
Value for Servers: Higher than for satellite node
System Parameter: VOTES
Value for Satellites: 0
Value for Servers: 1
System Parameter: EXPECTED_VOTES
Value for Satellites: Sum of OpenVMS Cluster votes
Value for Servers: Sum of OpenVMS Cluster votes
System Parameter: RECNXINTERVL [1]
Value for Satellites: Equal on all nodes
Value for Servers: Equal on all nodes
[1] Correlate with bridge timers and LAN utilization.
Reference: For more information about these
parameters, see HP OpenVMS Cluster Systems and HP Volume Shadowing for OpenVMS.
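The following MODPARAMS.DAT excerpt is a minimal sketch of how the
server-side values in Table 9-4 might be expressed. The vote counts are
assumptions for a cluster with three voting boot servers, and leaving
NPAGEDYN and PAGEDYN to AUTOGEN feedback is one common approach rather
than a firm recommendation:
    ! SYS$SYSTEM:MODPARAMS.DAT excerpt for a boot server (illustrative values)
    MSCP_LOAD = 1                 ! load the MSCP server
    LOCKDIRWT = 1                 ! participate as a resource directory node
    SHADOW_MAX_COPY = 4           ! concurrent shadow copy operations
    VOTES = 1
    EXPECTED_VOTES = 3            ! sum of votes; assumes three voting boot servers
    RECNXINTERVL = 20             ! keep equal on all nodes; correlate with bridge timers
    ! NPAGEDYN and PAGEDYN: let AUTOGEN feedback size these upward
Run AUTOGEN (for example, @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS FEEDBACK)
after editing the file.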
9.4 Scalability in a Cluster over IP
Cluster over IP allows a maximum of 96 nodes to be connected across
geographical locations, with support for storage. IP cluster
communication can replace an extended LAN configuration: routers take
the place of LAN switches and bridges, thus overcoming the
disadvantages of those LAN components. The routers can connect two or
more logical subnets, which do not necessarily map one-to-one to the
physical interfaces of the router.
9.4.1 Multiple node IP based Cluster System
Figure 9-10 shows an IP based cluster system with multiple nodes. The
nodes can be located in different geographical locations, thus enabling
disaster tolerance and high availability.
Figure 9-10 Multiple node IP based Cluster System
Advantages
- Cluster communication over IP supports 10 Gigabit Ethernet, which
provides a throughput of 10 Gb/s
- Easy to configure
- All nodes can access the other nodes and can have shared direct
access to storage
9.4.2 Guidelines for Configuring IP based Cluster
The following are guidelines for setting up a cluster that uses IP
cluster communication:
- Requires the IP unicast address for remote node discovery.
- Requires an IP multicast address, which is administratively scoped
and is computed dynamically from the cluster group number. See HP
OpenVMS Cluster Systems for information about cluster configuration.
- Requires the IP address of the local machine, along with its network
mask.
- Requires the local LAN adapter on which the IP address is configured;
this adapter is used for SCS (System Communications Services). A sketch
of the resulting configuration data follows this list.
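For illustration only, the following sketch shows the kind of IP
cluster configuration data that CLUSTER_CONFIG_LAN.COM records. The
file name (SYS$SYSTEM:PE$IP_CONFIG.DAT), the keywords, and the
addresses shown here are assumptions based on OpenVMS Version 8.4
behavior; verify the exact format against HP OpenVMS Cluster Systems:
    ! SYS$SYSTEM:PE$IP_CONFIG.DAT (sketch generated by CLUSTER_CONFIG_LAN.COM)
    multicast_address=239.242.7.193    ! administratively scoped; derived from the cluster group number
    ttl=32                             ! hop limit for cluster multicast traffic
    udp_port=49152                     ! UDP port used for SCS over IP
    unicast=10.0.1.2                   ! unicast addresses of remote nodes
    unicast=10.0.2.3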
9.5 Scaling for I/Os
The ability to scale I/Os is an important factor in the growth of your
OpenVMS Cluster. Adding more components to your OpenVMS Cluster
requires high I/O throughput so that additional components do not
create bottlenecks and decrease the performance of the entire OpenVMS
Cluster. Many factors can affect I/O throughput:
- Direct access or MSCP served access to storage
- Settings of the MSCP_BUFFER and MSCP_CREDITS system parameters
- File system technologies, such as Files-11
- Disk technologies, such as magnetic disks, solid-state disks, and
DECram
- Read/write ratio
- I/O size
- Caches and cache "hit" rate
- "Hot file" management
- RAID striping and host-based striping
- Volume shadowing
These factors can affect I/O scalability either singly or in
combination. The following sections explain these factors and suggest
ways to maximize I/O throughput and scalability without having to
change your application.
Additional factors that affect I/O throughput are types of
interconnects and types of storage subsystems.
Reference: For more information about interconnects,
see Chapter 4. For more information about types of storage
subsystems, see Chapter 5. For more information about MSCP_BUFFER
and MSCP_CREDITS, see HP OpenVMS Cluster Systems.
9.5.1 MSCP Served Access to Storage
MSCP server capability provides a major benefit to OpenVMS Clusters: it
enables communication between nodes and storage that are not directly
connected to each other. However, MSCP served I/O does incur overhead.
Figure 9-11 is a simplification of how packets require extra handling
by the serving system.
Figure 9-11 Comparison of Direct and MSCP Served Access
In Figure 9-11, an MSCP served packet requires an extra
"stop" at another system before reaching its destination.
When the MSCP served packet reaches the system associated with the
target storage, the packet is handled as if for direct access.
In an OpenVMS Cluster that requires a large amount of MSCP serving, I/O
performance is not as efficient and scalability is decreased. The total
I/O throughput is approximately 20% less when I/O is MSCP served than
when it has direct access. Design your configuration so that a few
large nodes are serving many satellites rather than satellites serving
their local storage to the entire OpenVMS Cluster.
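To see how much MSCP serving a node is actually doing, the standard DCL
displays below are a reasonable starting point; the output depends on
your device configuration:
    $ SHOW DEVICE/SERVED          ! disks this node is serving via the MSCP server
    $ SHOW DEVICE/SERVED/COUNT    ! counts of MSCP I/O requests for served disks
    $ MONITOR MSCP_SERVER         ! ongoing MSCP server request and fragmentation rates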
9.5.2 Disk Technologies
In recent years, the ability of CPUs to process information has far
outstripped the ability of I/O subsystems to feed processors with data.
The result is an increasing percentage of processor time spent waiting
for I/O operations to complete.
Solid-state disks (SSDs), DECram, and RAID level 0 bridge this gap
between processing speed and magnetic-disk access speed. Performance of
magnetic disks is limited by seek and rotational latencies, while SSDs
and DECram use memory, which provides nearly instant access.
RAID level 0 is the technique of spreading (or "striping") a
single file across several disk volumes. The objective is to reduce or
eliminate a bottleneck at a single disk by partitioning heavily
accessed files into stripe sets and storing them on multiple devices.
This technique increases parallelism across many disks for a single I/O.
Table 9-5 summarizes disk technologies and their features.
Table 9-5 Disk Technology Summary
Disk Technology: Magnetic disk
Characteristics: Slowest access time. Inexpensive. Available on
multiple interconnects.
Disk Technology: Solid-state disk
Characteristics: Fastest access of any I/O subsystem device. Highest
throughput for write-intensive files. Available on multiple
interconnects.
Disk Technology: DECram
Characteristics: Highest throughput for small to medium I/O requests.
Volatile storage; appropriate for temporary read-only files. Available
on any Alpha or VAX system.
Disk Technology: RAID level 0
Characteristics: Available on HSD, HSJ, and HSG controllers.
Note: Shared, direct access to a solid-state disk or
to DECram is the fastest alternative for scaling I/Os.
9.5.3 Read/Write Ratio
The read/write ratio of your applications is a key factor in scaling
I/O to shadow sets. MSCP writes to a shadow set are duplicated on the
interconnect.
Therefore, an application that has 100% (100/0) read activity may
benefit from volume shadowing because shadowing causes multiple paths
to be used for the I/O activity. An application with a 50/50 ratio will
cause more interconnect utilization because write activity requires
that an I/O be sent to each shadow member. Delays may be caused by the
time required to complete the slowest I/O.
To determine I/O read/write ratios, use the DCL command MONITOR IO.
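For example, the following commands give a live view of the I/O load;
MONITOR IO and MONITOR DISK are standard DCL, and the 5-second interval
is simply an illustrative choice:
    $ MONITOR IO/INTERVAL=5            ! system-wide direct and buffered I/O rates
    $ MONITOR DISK/ITEM=QUEUE_LENGTH   ! per-disk I/O queue lengths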
9.5.4 I/O Size
Each I/O packet incurs processor and memory overhead, so grouping I/Os
together in one packet decreases overhead for all I/O activity. You can
achieve higher throughput if your application is designed to use bigger
packets. Smaller packets incur greater overhead.
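As one example of increasing I/O size without changing application
code, RMS sequential transfers can be grouped into larger multiblock
I/Os. The counts below are illustrative; add /SYSTEM to change the
system-wide defaults rather than the process defaults:
    $ SET RMS_DEFAULT/BLOCK_COUNT=32/BUFFER_COUNT=4   ! larger multiblock transfers
    $ SHOW RMS_DEFAULT                                ! confirm the new defaults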
9.5.5 Caches
Caching is the technique of storing recently or frequently used data in
an area where it can be accessed more easily: in memory, in a
controller, or in a disk. Caching complements solid-state disks,
DECram, and RAID. Applications automatically benefit from the
advantages of caching without any special coding. Caching reduces
current and potential I/O bottlenecks within OpenVMS Cluster systems by
reducing the number of I/Os between components.
Table 9-6 describes the three types of caching.
Table 9-6 Types of Caching
Caching Type: Host based
Description: Cache that is resident in the host system's memory and
services I/Os from the host.
Caching Type: Controller based
Description: Cache that is resident in the storage controller and
services data for all hosts.
Caching Type: Disk
Description: Cache that is resident in a disk.
Host-based disk caching provides different benefits from
controller-based and disk-based caching. In host-based disk caching,
the cache itself is not shareable among nodes. Controller-based and
disk-based caching are shareable because they are located in the
controller or disk, either of which is shareable.
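For host-based caching, the Extended File Cache (XFC) on Alpha and
Integrity systems is the usual example. A quick way to see whether it
is active and how well it is performing is shown below; the level of
detail displayed varies by OpenVMS version:
    $ SHOW MEMORY/CACHE        ! summary of the host-based file cache (XFC)
    $ SHOW MEMORY/CACHE/FULL   ! detailed hit rates and cache memory usage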