HP OpenVMS Cluster Systems
9.5.3 Defining the Satellite System to the Boot Server
Integrity server satellite systems boot via the PXE protocol. On
OpenVMS, PXE is handled by BOOTP from the TCPIP product. If more than
one Integrity server system in your cluster is a boot server, be sure
the BOOTP database is on a common disk. See the TCPIP documentation for
information on configuring TCPIP components. TCPIP must be installed,
configured, and running before you attempt to define a satellite system.
On an Integrity server system that is a boot server, log in to the
system manager's account or another suitably privileged account and
execute the command procedure SYS$MANAGER:CLUSTER_CONFIG_LAN.COM.
(CLUSTER_CONFIG.COM, which configures satellite nodes using DECnet,
does not support Integrity server systems; it will, however,
automatically invoke CLUSTER_CONFIG_LAN for Integrity server systems.)
CLUSTER_CONFIG_LAN is a menu-driven command procedure designed to help
you configure satellite systems. The menus are context-sensitive and
may vary depending on architecture and installed products. If you are
unfamiliar with the procedure, refer to the System Management
documentation for a more extensive overview of CLUSTER_CONFIG_LAN.
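For example, from a suitably privileged account on the boot server (the
prompts that follow depend on your configuration and installed products):
$ @SYS$MANAGER:CLUSTER_CONFIG_LAN.COM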
The essential information required to add an Integrity server satellite
includes the node's SCS node name, SCS system ID, and hardware address.
In addition, you will need to know the satellite's IP address, network
mask, and possibly gateway addresses. If you are unfamiliar with these
concepts, please refer to the TCPIP documentation. The procedure will
create a system root for the satellite.
CLUSTER_CONFIG_LAN should perform all steps required to make the
satellite system bootable. If you choose local paging and swapping
files, you will be prompted to boot the satellite system into the
cluster so that the files may be created. If not, paging and swapping
files will be created on the served system disk and you may boot the
satellites at your convenience.
9.5.4 Booting the Satellite
If you have previously added an option to the boot menu, select that
option. If you have not, see your hardware documentation for the steps
required to boot from a network adapter. Be sure to set the environment
variable VMS_FLAGS to include the memory disk boot flag (0x200000). The
system reports boot progress as follows: a system message when
VMS_LOADER is obtained from the network, followed by one period
character written to the console device for every file downloaded to
start the boot sequence, and finally a message indicating that IPB (the
primary bootstrap image) has been loaded.
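As a hedged illustration, on many Integrity consoles the variable can be
set from the EFI Shell before initiating the network boot; the
"0,200000" value (system root 0, flags in hexadecimal) is shown only as
an example and is an assumption about your configuration:
Shell> set vms_flags "0,200000"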
The following example shows typical console output during a satellite boot:
Loading.: Satellite Boot EIA0 Mac(00-13-21-5b-86-48)
Running LoadFile()
CLIENT MAC ADDR: 00 13 21 5B 86 48
CLIENT IP: 16.116.43.79 MASK: 255.255.248.0 DHCP IP: 0.240.0.0
TSize.Running LoadFile()
Starting: Satellite Boot EIA0 Mac(00-13-21-5b-86-48)
Loading memory disk from IP 16.116.43.78
............................................................................
Loading file: $13$DKA0:[SYS10.SYSCOMMON.SYSEXE]IPB.EXE from IP 16.116.43.78
%IPB-I-SATSYSDIS, Satellite boot from system device $13$DKA0:
HP OpenVMS Industry Standard 64 Operating System, Version V8.3
© Copyright 1976-2006 Hewlett-Packard Development Company, L.P.
Upon first full boot, the satellite system will run AUTOGEN and reboot.
9.5.5 Additional Tasks on the Satellite System
If you have not done so previously, create the dump file for DOSD (dump
off system disk) at this time. Edit the SYS$STARTUP:SYCONFIG.COM file
and add commands to mount the DOSD device. For the error log buffers to
be recovered, the DOSD device must be mounted in SYCONFIG.COM.
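A minimal SYCONFIG.COM sketch; the device name and volume label shown
are placeholders, not values from your configuration:
$ ! Mount the DOSD device early so that the error log buffers can be recovered
$ MOUNT/SYSTEM/NOASSIST $1$DGA300: DOSD_DISK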
9.6 Booting Satellites with IP interconnect (Integrity servers, Alpha)
For Alpha satellite nodes, the satellite node and its boot server must
exist in the same LAN segment. When selecting the interface to be used
for satellite booting, assume that the satellite node does not have any
disk running OpenVMS connected to it. If you are adding Alpha systems
as satellite nodes, you can obtain the required information from the
">>>" prompt by executing the following command:
P00>>>show device
dga5245.1003.0.3.0 $1$DGA5245 COMPAQ HSV110 (C)COMPAQ 3028
dga5245.1004.0.3.0 $1$DGA5245 COMPAQ HSV110 (C)COMPAQ 3028
dga5890.1001.0.3.0 $1$DGA5890 COMPAQ HSV110 (C)COMPAQ 3028
dga5890.1002.0.3.0 $1$DGA5890 COMPAQ HSV110 (C)COMPAQ 3028
dka0.0.0.2004.0 DKA0 COMPAQ BD03685A24 HPB7
dka100.1.0.2004.0 DKA100 COMPAQ BD01864552 3B08
dka200.2.0.2004.0 DKA200 COMPAQ BD00911934 3B00
dqa0.0.0.15.0 DQA0 HL-DT-ST CD-ROM GCR-8480 2.11
dva0.0.0.1000.0 DVA0
eia0.0.0.2005.0 EIA0 00-06-2B-03-2D-7D
pga0.0.0.3.0 PGA0 WWN 1000-0000-c92a-78e9
pka0.7.0.2004.0 PKA0 SCSI Bus ID 7
pkb0.6.0.2.0 PKB0 SCSI Bus ID 6 5.57
P00>>>
From the output, EIA0 is the LAN interface on which the IP address will
be configured and used for the cluster configuration.
Note
The Alpha console uses the MOP protocol for network load of satellite
systems. Since the MOP protocol is non-routable, the satellite boot
server or servers and all satellites booting from them must reside in
the same LAN. In addition, the boot server must have at least one LAN
device enabled for cluster communications to permit the Alpha satellite
nodes to access the system disk.
On Integrity server systems, the interface name starts with either EI
or EW. If it is the first interface, it will be EIA0 or EWA0. Note the
MAC address of the interface that you want to use from the Shell
prompt. To obtain the interface information on Integrity servers,
execute the following command at the EFI Shell:
Shell> lanaddress
LAN Address Information
LAN Address Path
----------------- ----------------------------------------
Mac(00306E4A133F) Acpi(HWP0002,0)/Pci(3|0)/Mac(00306E4A133F))
*Mac(00306E4A02F9) Acpi(HWP0002,100)/Pci(2|0)/Mac(00306E4A02F9))
Shell>
Assuming that the active interface is EIA0, configure the satellite
with EIA0. If it does not boot with EIA0, try EWA0 instead.
For more information about configuring a satellite node, see
Section 8.2.3.4.
9.7 System-Disk Throughput
Achieving sufficient system-disk throughput requires some combination
of the techniques described in the following sections.
9.7.1 Avoiding Disk Rebuilds
The OpenVMS file system maintains a cache of preallocated file headers
and disk blocks. When a disk is not properly dismounted, such as when a
system fails, this preallocated space becomes temporarily unavailable.
When the disk is mounted again, OpenVMS scans the disk to recover that
space. This is called a disk rebuild.
A large OpenVMS Cluster system must ensure sufficient capacity to boot
nodes in a reasonable amount of time. To minimize the impact of disk
rebuilds at boot time, consider making the following changes:
Action: Use the DCL command MOUNT/NOREBUILD for all user disks, at
least on the satellite nodes. Enter this command into the startup
procedures that mount user disks.
Result: It is undesirable to have a satellite node rebuild a disk, yet
this is likely to happen if a satellite is the first node to reboot
after it or another node fails.

Action: Set the system parameter ACP_REBLDSYSD to 0, at least for the
satellite nodes.
Result: This prevents a rebuild operation on the system disk when it is
mounted implicitly by OpenVMS early in the boot process.

Action: Avoid a disk rebuild during prime working hours by using the
SET VOLUME/REBUILD command during times when the system is not heavily
used. Once the computer is running, you can run a batch job or a
command procedure to execute the SET VOLUME/REBUILD command for each
disk drive, as shown in the sketch at the end of this section.
Result: User response times can be degraded during a disk rebuild
operation because most I/O activity on that disk is blocked. Because
the SET VOLUME/REBUILD command determines whether a rebuild is needed,
the job can execute the command for every disk. This job can be run
during off hours, preferably on one of the more powerful nodes.
Caution: In large OpenVMS Cluster systems, large
amounts of disk space can be preallocated to caches. If many nodes
abruptly leave the cluster (for example, during a power failure), this
space becomes temporarily unavailable. If your system usually runs with
nearly full disks, do not disable rebuilds on the server nodes at boot
time.
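The following command-procedure sketch illustrates the first and third
actions in the table above; the device names are placeholders, not
values from your configuration:
$ ! Startup procedure fragment: mount user disks without an immediate rebuild
$ MOUNT/SYSTEM/NOREBUILD $1$DGA100: USERDISK1
$ ! Off-hours batch job or command procedure: rebuild only where needed
$ SET VOLUME/REBUILD $1$DGA100:
$ SET VOLUME/REBUILD $1$DGA101: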
9.7.2 Offloading Work
In addition to the system disk throughput issues during an entire
OpenVMS Cluster boot, access to particular system files even during
steady-state operations (such as logging in, starting up applications,
or issuing a PRINT command) can affect response times.
You can identify hot system files using a performance
or monitoring tool (such as those listed in Section 1.5.2), and use the
techniques in the following table to reduce hot file I/O activity on
system disks:
Potential hot files: Page and swap files.
Methods to help: When you run CLUSTER_CONFIG_LAN.COM or
CLUSTER_CONFIG.COM to add computers, specify the sizes and locations of
page and swap files so that the files are relocated as follows:
- Move page and swap files for computers off system disks.
- Set up page and swap files for satellites on the satellites' local
disks, if such disks are available.

Potential hot files: These high-activity files:
- SYSUAF.DAT
- NETPROXY.DAT
- RIGHTSLIST.DAT
- ACCOUNTNG.DAT
- VMSMAIL_PROFILE.DATA
- QMAN$MASTER.DAT
- Layered product and other application files
Methods to help: Move these files off the system disk using any of the
following methods:
- Specify new locations for the files according to the instructions in
Chapter 5.
- Use caching in the HSC subsystem or in RF or RZ disks to improve the
effective system-disk throughput.
- Add a solid-state disk to your configuration. These devices have
lower latencies and can handle a higher request rate than a regular
magnetic disk. A solid-state disk can be used as a system disk or to
hold system files.
- Use DECram software to create RAMdisks on MOP servers to hold copies
of selected hot read-only files to improve boot times. A RAMdisk is an
area of main memory within a system that is set aside to store data,
but it is accessed as if it were a disk.
Moving these files from the system disk to a separate disk eliminates
most of the write activity to the system disk. This raises the
read/write ratio and, if you are using Volume Shadowing for OpenVMS,
maximizes the performance of shadowing on the system disk.
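As a hedged sketch of the first method, the following SYLOGICALS.COM
additions redirect several of these files with system logical names;
the device and directory are placeholders, and Chapter 5 remains the
authoritative procedure:
$ ! Illustrative SYLOGICALS.COM additions; the disk and directory are placeholders
$ DEFINE/SYSTEM/EXEC SYSUAF          $1$DGA200:[VMS$COMMON.SYSEXE]SYSUAF.DAT
$ DEFINE/SYSTEM/EXEC NETPROXY        $1$DGA200:[VMS$COMMON.SYSEXE]NETPROXY.DAT
$ DEFINE/SYSTEM/EXEC RIGHTSLIST      $1$DGA200:[VMS$COMMON.SYSEXE]RIGHTSLIST.DAT
$ DEFINE/SYSTEM/EXEC VMSMAIL_PROFILE $1$DGA200:[VMS$COMMON.SYSEXE]VMSMAIL_PROFILE.DATA
$ DEFINE/SYSTEM/EXEC QMAN$MASTER     $1$DGA200:[VMS$COMMON.SYSEXE]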
9.7.3 Configuring Multiple System Disks
Depending on the number of computers to be included in a large cluster
and the work being done, you must evaluate the tradeoffs involved in
configuring a single system disk or multiple system disks.
While a single system disk is easier to manage, a large cluster often
requires more system disk I/O capacity than a single system disk can
provide. To achieve satisfactory performance, multiple system disks may
be needed. However, you should recognize the increased system
management efforts involved in maintaining multiple system disks.
Consider the following when determining the need for multiple system
disks:
- Concurrent user activity
In clusters with many satellites, the
amount and type of user activity on those satellites influence
system-disk load and, therefore, the number of satellites that can be
supported by a single system disk. For example:
IF many users are active or run multiple applications simultaneously,
THEN the load on the system disk can be significant; multiple system
disks may be required.
Comments: Some OpenVMS Cluster systems may need to be configured on the
assumption that all users are constantly active. Such working
conditions may require a larger, more expensive OpenVMS Cluster system
that handles peak loads without performance degradation.

IF few users are active simultaneously,
THEN a single system disk might support a large number of satellites.
Comments: For most configurations, the probability is low that most
users are active simultaneously. A smaller and less expensive OpenVMS
Cluster system can be configured for these typical working conditions
but may suffer some performance degradation during peak load periods.

IF most users run a single application for extended periods,
THEN a single system disk might support a large number of satellites if
significant numbers of I/O requests can be directed to application data
disks.
Comments: Because each workstation user in an OpenVMS Cluster system
has a dedicated computer, a user who runs large compute-bound jobs on
that dedicated computer does not significantly affect users of other
computers in the OpenVMS Cluster system. For clustered workstations,
the critical shared resource is a disk server. Thus, if a workstation
user runs an I/O-intensive job, its effect on other workstations
sharing the same disk server might be noticeable.
- Concurrent booting activity
One of the few times when all
OpenVMS Cluster computers are simultaneously active is during a cluster
reboot. All satellites are waiting to reload the operating system, and
as soon as a boot server is available, they begin to boot in parallel.
This booting activity places a significant I/O load on the boot server,
system disk, and interconnect. Note: You can
reduce overall cluster boot time by configuring multiple system disks
and by distributing system roots for computers evenly across those
disks. This technique has the advantage of increasing overall system
disk I/O capacity, but it has the disadvantage of requiring additional
system management effort. For example, installation of layered products
or upgrades of the OpenVMS operating system must be repeated once for
each system disk.
- System management
Because system management workload increases in direct proportion to
the number of separate system disks that must be maintained, add only
as many system disks as are needed to provide the required level of
performance.
Volume Shadowing for OpenVMS is an alternative to creating multiple
system disks. Volume shadowing increases the read I/O capacity of a
single system disk and minimizes the number of separate system disks
that have to be maintained because installations or updates need only
be applied once to a volume-shadowed system disk. For clusters with
substantial system disk I/O requirements, you can use multiple system
disks, each configured as a shadow set.
Cloning the system disk is a way to manage multiple system disks. To
clone the system disk:
- Create a system disk (or shadow set) with roots for all OpenVMS
Cluster nodes.
- Use this as a master copy, and perform all software upgrades on
this system disk.
- Back up the master copy to the other disks to create the cloned
system disks (a sketch follows this list).
- Change the volume names so they are unique.
- If you have not moved system files off the system disk, you must
have the SYLOGICALS.COM startup file point to system files on the
master system disk.
- Before an upgrade, be sure to save any changes you need from the
cloned disks since the last upgrade, such as MODPARAMS.DAT and AUTOGEN
feedback data, accounting files for billing, and password history.
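A hedged sketch of the backup-and-relabel steps; the device names and
label are placeholders, and it assumes the clone target is mounted
foreign and the master disk is not being modified during the copy:
$ MOUNT/FOREIGN $1$DGA20:
$ BACKUP/IMAGE $1$DGA10: $1$DGA20:       ! copy the master system disk to the clone
$ DISMOUNT $1$DGA20:
$ MOUNT/OVERRIDE=IDENTIFICATION $1$DGA20:
$ SET VOLUME/LABEL=CLONESYS1 $1$DGA20:   ! give the clone a unique volume name
$ DISMOUNT $1$DGA20: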
9.8 Conserving System Disk Space
The essential files for a satellite root take up very little space, so
that more than 96 roots can easily fit on a single system disk.
However, if you use separate dump files for each satellite node or put
page and swap files for all the satellite nodes on the system disk, you
quickly run out of disk space.
9.8.1 Techniques
To avoid running out of disk space, set up common dump files for all
the satellites or for groups of satellite nodes. For debugging
purposes, it is best to have separate dump files for each MOP and disk
server. Also, you can use local disks on satellite nodes to hold page
and swap files, instead of putting them on the system disk. In
addition, move page and swap files for MOP and disk servers off the
system disk.
Reference: See Section 10.7 to plan a strategy for
managing dump files.
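A minimal sketch of creating a dump file that satellites share from the
common system root (a node uses SYS$SPECIFIC:[SYSEXE]SYSDUMP.DMP if it
exists, otherwise the copy in SYS$COMMON); the size shown is a
placeholder, and Section 10.7 covers sizing:
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> CREATE SYS$COMMON:[SYSEXE]SYSDUMP.DMP/SIZE=200000   ! size is illustrative
SYSGEN> EXIT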
9.9 Adjusting System Parameters
As an OpenVMS Cluster system grows, certain data structures within
OpenVMS need to grow in order to accommodate the larger number of
nodes. If growth is not possible (for example, because of a shortage of
nonpaged pool), intermittent problems that are difficult to diagnose
can result. HP recommends that you have a separate network for cluster
communication. This helps avoid interference between user data and
cluster traffic and is suitable for environments with high
intra-cluster traffic.
You should run AUTOGEN with FEEDBACK frequently as a cluster grows, so
that settings for many parameters can be adjusted. Refer to
Section 8.7 for more information about running AUTOGEN.
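For example, a typical feedback run that collects data and applies the
resulting parameter changes might look like this (choose start and end
phases appropriate for your site):
$ @SYS$UPDATE:AUTOGEN SAVPARAMS SETPARAMS FEEDBACK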
In addition to running AUTOGEN with FEEDBACK, you should check and
manually adjust the following parameters:
- SCSRESPCNT
- CLUSTER_CREDITS
SCS connections are now allocated and expanded only as needed, up to a
limit of 65,000.
9.9.1 The SCSRESPCNT Parameter
Description: The SCSRESPCNT parameter controls the
number of response descriptor table (RDT) entries available for system
use. An RDT entry is required for every in-progress message exchange
between two nodes.
Symptoms of entry shortages: A shortage of entries
affects performance, since message transmissions must be delayed until
a free entry is available.
How to determine a shortage of RDT entries: Use the
SDA utility as follows to check each system for requests that waited
because there were not enough free RDTs.
$ ANALYZE/SYSTEM
SDA> READ SYS$SYSTEM:SCSDEF
%SDA-I-READSYM, reading symbol table SYS$COMMON:[SYSEXE]SCSDEF.STB;1
SDA> EXAM @SCS$GL_RDT + RDT$L_QRDT_CNT
8044DF74: 00000000 "...."
SDA>
How to resolve shortages: If the SDA EXAMINE command
displays a nonzero value, RDT waits have occurred. If you find a count
that tends to increase over time under normal operations, increase
SCSRESPCNT.
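One way to do this is through MODPARAMS.DAT and AUTOGEN; the increment
shown is only an example, and ADD_ is the standard AUTOGEN convention
for adding to the current value:
! SYS$SYSTEM:MODPARAMS.DAT -- illustrative entry only
ADD_SCSRESPCNT = 300    ! example increment; then rerun AUTOGEN as shown in Section 9.9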
9.9.2 The CLUSTER_CREDITS Parameter
Description: The CLUSTER_CREDITS parameter specifies
the number of per-connection buffers a node allocates to receiving
VMS$VAXcluster communications. This system parameter is not dynamic;
that is, if you change the value, you must reboot the node on which you
changed it.
Default: The default value is 10. The default value
may be insufficient for a cluster that has very high locking rates.
Symptoms of cluster credit problem: A shortage of
credits affects performance, since message transmissions are delayed
until free credits are available. These are visible as credit waits in
the SHOW CLUSTER display.
How to determine whether credit waits exist: Use the
SHOW CLUSTER utility as follows:
- Run SHOW CLUSTER/CONTINUOUS.
- Type REMOVE SYSTEM/TYPE=HS.
- Type ADD LOC_PROC, CR_WAIT.
- Type SET CR_WAIT/WIDTH=10.
- Check to see whether the number of CR_WAITS (credit waits) logged
against the VMS$VAXcluster connection for any remote node is
incrementing regularly. Ideally, credit waits should not occur.
However, occasional waits under very heavy load conditions are
acceptable.
How to resolve incrementing credit waits:
If the number of CR_WAITS is incrementing more than once per minute,
perform the following steps:
- Increase the CLUSTER_CREDITS parameter by five on the node against
which the credit waits are being logged. The parameter should be
modified on the remote node, not on the node that is running SHOW
CLUSTER.
- Reboot the node.
Note that it is not necessary for the CLUSTER_CREDITS parameter to be
the same on every node.
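A hedged sketch, assuming the node is still at the default of 10 and
applying the increase-by-five guidance above; because the parameter is
not dynamic, the remote node must then be rebooted:
! SYS$SYSTEM:MODPARAMS.DAT on the remote node -- illustrative entry only
CLUSTER_CREDITS = 15    ! default 10 plus 5; run AUTOGEN and reboot that node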
9.10 Minimize Network Instability
Network instability also affects OpenVMS Cluster operations.
Table 9-8 lists techniques to minimize typical network problems.
Table 9-8 Techniques to Minimize Network Problems
Technique: Adjust the RECNXINTERVAL parameter.
Recommendation: The RECNXINTERVAL system parameter specifies the number
of seconds the OpenVMS Cluster system waits when it loses contact with
a node before removing the node from the configuration. Many large
OpenVMS Cluster configurations operate with the RECNXINTERVAL parameter
set to 40 seconds (the default value is 20 seconds). An example of
adjusting the parameter follows this table.
Raising the value of RECNXINTERVAL can result in longer perceived
application pauses, especially when the node leaves the OpenVMS Cluster
system abnormally. The pause is caused by the connection manager
waiting for the number of seconds specified by RECNXINTERVAL.

Technique: Protect the network.
Recommendation: For clusters connected on the LAN interconnect, treat
the LAN as if it were part of the OpenVMS Cluster system. For example,
do not allow an environment in which a random user can disconnect a
ThinWire segment to attach a new PC while 20 satellites hang.
For clusters running on IP interconnect, ensure that the IP network is
protected using VPN-type security.

Technique: Choose your hardware and configuration carefully.
Recommendation: Certain hardware is not suitable for use in a large
OpenVMS Cluster system.
- Some network components can appear to work well with light loads but
are unable to operate properly under high traffic conditions. Improper
operation can result in lost or corrupted packets that require
retransmission. This reduces performance and can affect the stability
of the OpenVMS Cluster configuration.
- Beware of bridges that cannot filter and forward at full line rates
and of repeaters that do not handle congested conditions well.
- Refer to Guidelines for OpenVMS Cluster Configurations to determine
appropriate OpenVMS Cluster configurations and capabilities.

Technique: Use the LAVC$FAILURE_ANALYSIS facility.
Recommendation: See Section D.5 for assistance in isolating network
faults.
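Because RECNXINTERVAL is a dynamic parameter, it can be adjusted on a
running system with SYSGEN and preserved through MODPARAMS.DAT; a
minimal sketch, using the 40-second value cited above as an example:
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SET RECNXINTERVAL 40
SYSGEN> WRITE ACTIVE              ! takes effect immediately
SYSGEN> EXIT
$ ! Also add RECNXINTERVAL = 40 to SYS$SYSTEM:MODPARAMS.DAT so AUTOGEN preserves it.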