
HP OpenVMS Cluster Systems



9.11 DECnet Cluster Alias

You should define a cluster alias name for the OpenVMS Cluster to ensure that remote access will be successful when at least one OpenVMS Cluster member is available to process the client program's requests.

The cluster alias acts as a single network node identifier for an OpenVMS Cluster system. Computers in the cluster can use the alias for communications with other computers in a DECnet network. Note that it is possible for nodes running DECnet for OpenVMS to have a unique and separate cluster alias from nodes running DECnet--Plus. In addition, clusters running DECnet--Plus can have one cluster alias for VAX, one for Alpha, and another for both.

Note: A single cluster alias can include nodes running either DECnet for OpenVMS or DECnet--Plus, but not both. Also, an OpenVMS Cluster running both DECnet for OpenVMS and DECnet--Plus requires multiple system disks (one for each).

Reference: See Chapter 4 for more information about setting up and using a cluster alias in an OpenVMS Cluster system.
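For nodes running DECnet for OpenVMS (Phase IV), the alias is typically established with NCP commands similar to the following sketch; the alias name SOLAR and the address 2.1 are hypothetical, and Chapter 4 describes the complete procedure:

$ RUN SYS$SYSTEM:NCP
NCP> DEFINE NODE 2.1 NAME SOLAR             ! Hypothetical alias address and name
NCP> DEFINE EXECUTOR ALIAS NODE SOLAR       ! Make this node a member of the alias
NCP> DEFINE EXECUTOR ALIAS INCOMING ENABLED ! Accept incoming connections to the alias
NCP> EXIT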


Chapter 10
Maintaining an OpenVMS Cluster System

Once your cluster is up and running, you can implement routine, site-specific maintenance operations---for example, backing up disks or adding user accounts, performing software upgrades and installations, running AUTOGEN with the feedback option on a regular basis, and monitoring the system for performance.

You should also maintain records of current configuration data, especially any changes to hardware or software components. If you are managing a cluster that includes satellite nodes, it is important to monitor LAN activity.

From time to time, conditions may occur that require the following special maintenance operations:

  • Restoring cluster quorum after an unexpected computer failure
  • Executing conditional shutdown operations
  • Performing security functions in LAN and mixed-interconnect clusters

10.1 Backing Up Data and Files

As a part of the regular system management procedure, you should copy operating system files, application software files, and associated files to an alternate device using the OpenVMS Backup utility.

Some backup operations are the same in an OpenVMS Cluster as they are on a single OpenVMS system; examples include an incremental backup of a disk while it is in use and the backup of a nonshared disk.
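As a sketch of such operations (the device names DUA1: and MUA0:, the save-set names, and the tape label are hypothetical), you might run commands like the following:

$ ! Full image backup of a nonshared disk to a tape save set
$ BACKUP/IMAGE/RECORD DUA1: MUA0:FULL.BCK/SAVE_SET/REWIND/LABEL=WKLY01
$ ! Incremental backup of files modified since the last /RECORD backup
$ BACKUP/RECORD/SINCE=BACKUP DUA1:[*...]*.*;* MUA0:INCR.BCK/SAVE_SET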

Backup tools for use in a cluster include those listed in Table 10-1.

Table 10-1 Backup Methods
Tool Usage
Online backup Use from a running system to back up:
  • The system's local disks
  • Cluster-shareable disks other than system disks
  • The system disk or disks

Caution: Files open for writing at the time of the backup procedure may not be backed up correctly.

Menu-driven If you have access to the OpenVMS Alpha distribution CD-ROM, back up your system using the menu system provided on that disc. This menu system, which is displayed automatically when you boot the CD-ROM, allows you to:
  • Enter a DCL environment, from which you can perform backup and restore operations on the system disk (instead of using standalone BACKUP).
  • Install or upgrade the operating system and layered products, using the POLYCENTER Software Installation utility.

Reference: For more detailed information about using the menu-driven procedure, see the OpenVMS Upgrade and Installation Manual and the HP OpenVMS System Manager's Manual.

Plan to perform the backup process regularly, according to a schedule that is consistent with application and user needs. This may require creative scheduling so that you can coordinate backups with times when user and application system requirements are low.
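One common approach is to run the backup from a batch job submitted for an off-peak time. In this sketch, the command procedure name NIGHTLY_BACKUP.COM and the time are hypothetical:

$ SUBMIT/QUEUE=SYS$BATCH/AFTER="TOMORROW+02:00" SYS$MANAGER:NIGHTLY_BACKUP.COM

Such a procedure typically resubmits itself so that the backup recurs each night.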

Reference: See the HP OpenVMS System Management Utilities Reference Manual: A--L for complete information about the OpenVMS Backup utility.

10.2 Updating the OpenVMS Operating System

When updating the OpenVMS operating system, follow the steps in Table 10-2.

Table 10-2 Upgrading the OpenVMS Operating System
Step Action
1 Back up the system disk.
2 Perform the update procedure once for each system disk.
3 Install any mandatory updates.
4 Run AUTOGEN on each node that boots from that system disk.
5 Run the user environment test package (UETP) to test the installation.
6 Use the OpenVMS Backup utility to make a copy of the new system volume.

Reference: See the appropriate OpenVMS upgrade and installation manual for complete instructions.
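For example, steps 4 and 5 of Table 10-2 might be carried out on each node with commands like the following sketch; choose the AUTOGEN phases and execution mode appropriate to your site:

$ @SYS$UPDATE:AUTOGEN GETDATA REBOOT FEEDBACK   ! Step 4: run AUTOGEN and reboot
$ @SYS$TEST:UETP                                ! Step 5: run UETP after the reboot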

10.2.1 Rolling Upgrades

The OpenVMS operating system allows an OpenVMS Cluster system running on multiple system disks to continue to provide service while the system software is being upgraded. This process is called a rolling upgrade because each node is upgraded and rebooted in turn, until all the nodes have been upgraded.

If you must first migrate your system from running on one system disk to running on two or more system disks, follow these steps:

Step Action
1 Follow the procedures in Section 8.5 to create a duplicate disk.
2 Follow the instructions in Section 5.8 for information about coordinating system files.

These sections help you add a system disk and prepare a common user environment on multiple system disks to make the shared system files such as the queue database, rightslists, proxies, mail, and other files available across the OpenVMS Cluster system.

10.3 LAN Network Failure Analysis

The OpenVMS operating system provides a sample program to help you analyze OpenVMS Cluster network failures on the LAN. You can edit and use the SYS$EXAMPLES:LAVC$FAILURE_ANALYSIS.MAR program to detect and isolate failed network components. Using the network failure analysis program can help reduce the time required to detect and isolate a failed network component, thereby providing a significant increase in cluster availability.

Reference: For a description of the network failure analysis program, refer to Appendix D.

10.4 Recording Configuration Data

To maintain an OpenVMS Cluster system effectively, you must keep accurate records about the current status of all hardware and software components and about any changes made to those components. Changes to cluster components can have a significant effect on the operation of the entire cluster. If a failure occurs, you may need to consult your records to aid problem diagnosis.

Maintaining current records for your configuration is necessary both for routine operations and for eventual troubleshooting activities.

10.4.1 Record Information

At a minimum, your configuration records should include the following information:

  • A diagram of your physical cluster configuration. (Appendix D includes a discussion of keeping a LAN configuration diagram.)
  • SCSNODE and SCSSYSTEMID parameter values for all computers.
  • VOTES and EXPECTED_VOTES parameter values.
  • DECnet names and addresses for all computers.
  • Current values for cluster-related system parameters, especially ALLOCLASS and TAPE_ALLOCLASS values for HSC subsystems and computers.
    Reference: Cluster system parameters are described in Appendix A.
  • Names and locations of default bootstrap command procedures for all computers connected with the CI.
  • Names of cluster disk and tape devices.
  • In LAN and mixed-interconnect clusters, LAN hardware addresses for satellites.
  • Names of LAN adapters.
  • Names of LAN segments or rings.
  • Names of LAN bridges and switches and port settings.
  • Names of wiring concentrators or of DELNI or DEMPR adapters.
  • Serial numbers of all hardware components.
  • Changes to any hardware or software components (including site-specific command procedures), along with dates and times when changes were made.
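One way to capture many of these parameter values when you update your records is to display them with SYSGEN, as in this sketch:

$ MCR SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SHOW SCSNODE
SYSGEN> SHOW SCSSYSTEMID
SYSGEN> SHOW VOTES
SYSGEN> SHOW EXPECTED_VOTES
SYSGEN> SHOW ALLOCLASS
SYSGEN> SHOW TAPE_ALLOCLASS
SYSGEN> EXIT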

10.4.2 Satellite Network Data

The first time you execute CLUSTER_CONFIG.COM to add a satellite, the procedure creates the file NETNODE_UPDATE.COM in the boot server's SYS$SPECIFIC:[SYSMGR] directory. (For a common-environment cluster, you must rename this file to the SYS$COMMON:[SYSMGR] directory, as described in Section 5.8.2.) This file, which is updated each time you add or remove a satellite or change its Ethernet hardware address, contains all essential network configuration data for the satellite.

If an unexpected condition at your site causes configuration data to be lost, you can use NETNODE_UPDATE.COM to restore it. You can also read the file when you need to obtain data about individual satellites. Note that you may want to edit the file occasionally to remove obsolete entries.

Example 10-1 shows the contents of the file after satellites EUROPA and GANYMD have been added to the cluster.

Example 10-1 Sample NETNODE_UPDATE.COM File

$ RUN SYS$SYSTEM:NCP 
    define node EUROPA address 2.21 
    define node EUROPA hardware address 08-00-2B-03-51-75 
    define node EUROPA load assist agent sys$share:niscs_laa.exe 
    define node EUROPA load assist parameter $1$DGA11:<SYS10.> 
    define node EUROPA tertiary loader sys$system:tertiary_vmb.exe 
    define node GANYMD address 2.22 
    define node GANYMD hardware address 08-00-2B-03-58-14 
    define node GANYMD load assist agent sys$share:niscs_laa.exe 
    define node GANYMD load assist parameter $1$DGA11:<SYS11.> 
    define node GANYMD tertiary loader sys$system:tertiary_vmb.exe 

Reference: See the DECnet--Plus documentation for equivalent NCL command information.

10.5 Controlling OPCOM Messages

When a satellite joins the cluster, the Operator Communications Manager (OPCOM) has the following default states:

  • For all systems in an OpenVMS Cluster configuration except workstations:
    • OPA0: is enabled for all message classes.
    • The log file SYS$MANAGER:OPERATOR.LOG is opened for all classes.
  • For workstations in an OpenVMS Cluster configuration, even though the OPCOM process is running:
    • OPA0: is not enabled.
    • No log file is opened.

10.5.1 Overriding OPCOM Defaults

Table 10-3 describes the system logical names that you can define in the command procedure SYS$MANAGER:SYLOGICALS.COM to override the OPCOM default states.

Table 10-3 OPCOM System Logical Names
System Logical Name Function
OPC$OPA0_ENABLE If defined to be true, OPA0: is enabled as an operator console. If defined to be false, OPA0: is not enabled as an operator console. DCL considers any string beginning with T or Y, or any odd integer, to be true; all other values are false.
OPC$OPA0_CLASSES Defines the operator classes to be enabled on OPA0:. The logical name can be a search list of the allowed classes, a list of classes, or a combination of the two. For example:
$ DEFINE/SYSTEM OPC$OPA0_CLASSES CENTRAL,DISKS,TAPES

$ DEFINE/SYSTEM OPC$OPA0_CLASSES "CENTRAL,DISKS,TAPES"
$ DEFINE/SYSTEM OPC$OPA0_CLASSES "CENTRAL,DISKS",TAPES

You can define OPC$OPA0_CLASSES even if OPC$OPA0_ENABLE is not defined. In this case, the classes are used for any operator consoles that are enabled, but the default is used to determine whether to enable the operator console.

OPC$LOGFILE_ENABLE If defined to be true, an operator log file is opened. If defined to be false, no log file is opened.
OPC$LOGFILE_CLASSES Defines the operator classes to be enabled for the log file. The logical name can be a search list of the allowed classes, a comma-separated list, or a combination of the two. You can define this system logical even when the OPC$LOGFILE_ENABLE system logical is not defined. In this case, the classes are used for any log files that are open, but the default is used to determine whether to open the log file.
OPC$LOGFILE_NAME Supplies information that is used in conjunction with the default name SYS$MANAGER:OPERATOR.LOG to define the name of the log file. If the log file is directed to a disk other than the system disk, you should include commands to mount that disk in the SYLOGICALS.COM command procedure.
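For example, the following lines could be added to SYS$MANAGER:SYLOGICALS.COM to enable OPA0: and an operator log file on a workstation; this is a sketch, and the disk and directory DSA2:[OPERATOR] are hypothetical:

$ DEFINE/SYSTEM/EXECUTIVE_MODE OPC$OPA0_ENABLE TRUE
$ DEFINE/SYSTEM/EXECUTIVE_MODE OPC$OPA0_CLASSES "CENTRAL,SECURITY,DISKS,TAPES"
$ DEFINE/SYSTEM/EXECUTIVE_MODE OPC$LOGFILE_ENABLE TRUE
$ DEFINE/SYSTEM/EXECUTIVE_MODE OPC$LOGFILE_NAME DSA2:[OPERATOR]OPERATOR.LOG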

10.5.2 Example

The following example shows how to use the OPC$OPA0_CLASSES system logical to define the operator classes to be enabled. Because SECURITY is omitted from the list, the command prevents SECURITY class messages from being displayed on OPA0:.


$ DEFINE/SYSTEM OPC$OPA0_CLASSES CENTRAL,PRINTER,TAPES,DISKS,DEVICES, -
_$ CARDS,NETWORK,CLUSTER,LICENSE,OPER1,OPER2,OPER3,OPER4,OPER5, -
_$ OPER6,OPER7,OPER8,OPER9,OPER10,OPER11,OPER12

In large clusters, state transitions (computers joining or leaving the cluster) generate many multiline OPCOM messages on a boot server's console device. You can avoid such messages by including the DCL command REPLY/DISABLE=CLUSTER in the appropriate site-specific startup command file or by entering the command interactively from the system manager's account.

10.6 Shutting Down a Cluster

The SHUTDOWN command of the SYSMAN utility provides five options for shutting down OpenVMS Cluster computers:
  • NONE (the default)
  • REMOVE_NODE
  • CLUSTER_SHUTDOWN
  • REBOOT_CHECK
  • SAVE_FEEDBACK

These options are described in the following sections.

10.6.1 The NONE Option

If you select the default SHUTDOWN option NONE, the shutdown procedure performs the normal operations for shutting down a standalone computer. If you want to shut down a computer that you expect will rejoin the cluster shortly, you can specify the default option NONE. In that case, cluster quorum is not adjusted because the operating system assumes that the computer will soon rejoin the cluster.

In response to the "Shutdown options [NONE]:" prompt, you can specify the DISABLE_AUTOSTART=n option, where n is the number of minutes before autostart queues are disabled in the shutdown sequence. For more information about this option, see Section 7.13.
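For instance, the response at the prompt might look like this sketch, where the 10-minute value is hypothetical:

Shutdown options [NONE]: DISABLE_AUTOSTART=10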

10.6.2 The REMOVE_NODE Option

If you want to shut down a computer that you expect will not rejoin the cluster for an extended period, use the REMOVE_NODE option. For example, a computer may be waiting for new hardware, or you may decide that you want to use a computer for standalone operation indefinitely.

When you use the REMOVE_NODE option, the active quorum in the remainder of the cluster is adjusted downward to reflect the fact that the removed computer's votes no longer contribute to the quorum value. The shutdown procedure readjusts the quorum by issuing the SET CLUSTER/EXPECTED_VOTES command, which is subject to the usual constraints described in Section 10.11.

Note: The system manager is still responsible for changing the EXPECTED_VOTES system parameter on the remaining OpenVMS Cluster computers to reflect the new configuration.
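One way to make that change on each remaining computer is with SYSGEN, as in this sketch (the value 3 is hypothetical); also update EXPECTED_VOTES in MODPARAMS.DAT so that a later AUTOGEN run does not undo the change:

$ MCR SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SET EXPECTED_VOTES 3
SYSGEN> WRITE CURRENT
SYSGEN> EXIT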

10.6.3 The CLUSTER_SHUTDOWN Option

When you choose the CLUSTER_SHUTDOWN option, the computer completes all shutdown activities up to the point where the computer would leave the cluster in a normal shutdown situation. At this point the computer waits until all other nodes in the cluster have reached the same point. When all nodes have completed their shutdown activities, the entire cluster dissolves in one synchronized operation. The advantage is that individual nodes do not complete shutdown independently, and thus do not trigger state transitions or potentially leave the cluster without quorum.

When performing a CLUSTER_SHUTDOWN, you must specify this option on every OpenVMS Cluster computer. If any computer is not included, clusterwide shutdown cannot occur.
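A sketch of one way to do this from a single session is to use SYSMAN with a clusterwide environment; the qualifiers shown are illustrative, so confirm them against the SYSMAN documentation:

$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER
SYSMAN> SHUTDOWN NODE/CLUSTER_SHUTDOWN/MINUTES_TO_SHUTDOWN=5
SYSMAN> EXIT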

10.6.4 The REBOOT_CHECK Option

When you choose the REBOOT_CHECK option, the shutdown procedure checks for the existence of basic system files that are needed to reboot the computer successfully and notifies you if any files are missing. You should replace such files before proceeding. If all files are present, the following informational message appears:


%SHUTDOWN-I-CHECKOK, Basic reboot consistency check completed.

Note: You can use the REBOOT_CHECK option separately or in conjunction with either the REMOVE_NODE or the CLUSTER_SHUTDOWN option. If you choose REBOOT_CHECK with one of the other options, you must specify the options in the form of a comma-separated list.
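For example, the response at the shutdown options prompt might be:

Shutdown options [NONE]: REMOVE_NODE,REBOOT_CHECK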

10.6.5 The SAVE_FEEDBACK Option

Use the SAVE_FEEDBACK option to enable the AUTOGEN feedback operation.

Note: Select this option only when a computer has been running long enough to reflect your typical work load.

Reference: For detailed information about AUTOGEN feedback, see the HP OpenVMS System Manager's Manual.

10.6.6 Shutting Down TCP/IP

Where clusters use IP as the interconnect, shutting down the TCP/IP connection results in loss of connection between the node and the existing members of the cluster. As a result, the cluster can hang waiting for quorum, leading to a CLUEXIT crash. Therefore, ensure that all software applications are closed before shutting down TCP/IP.

Shut down TCP/IP as shown:


$ @SYS$MANAGER:TCPIP$CONFIG
Checking TCP/IP Services for OpenVMS configuration database files. 
                                          
        HP TCP/IP Services for OpenVMS Configuration Menu 
 
        Configuration options: 
 
                 1  -  Core environment 
                 2  -  Client components 
                 3  -  Server components 
                 4  -  Optional components 
                 5  -  Shutdown HP TCP/IP Services for OpenVMS 
                 6  -  Startup HP TCP/IP Services for OpenVMS 
                 7  -  Run tests 
                 A  -  Configure options 1 - 4 
                [E] -  Exit configuration procedure 
 
Enter configuration option: 5 
Begin Shutdown... 
 
  TCPIP$SHUTDOWN has detected the presence of IPCI configuration 
  file: SYS$SYSROOT:[SYSEXE]TCPIP$CLUSTER.DAT; 
 
  If you are using TCP/IP as your only cluster communication 
  channel, then stopping TCP/IP will cause this system to 
  CLUEXIT.  Remote systems may also CLUEXIT. 
 
Non-interactive.  Continuing with TCP/IP shutdown ... 

10.7 Dump Files

Whether your OpenVMS Cluster system uses a single common system disk or multiple system disks, you should plan a strategy to manage dump files.

10.7.1 Controlling Size and Creation

Dump-file management is especially important for large clusters with a single system disk. For example, on an OpenVMS Alpha computer with 1 GB of memory, AUTOGEN creates a dump file in excess of 350,000 blocks.

In the event of a software-detected system failure, each computer normally writes the contents of memory as a compressed selective dump file on its system disk for analysis. AUTOGEN calculates the size of the file based on the size of physical memory and the number of processes. If system disk space is limited (as is probably the case if a single system disk is used for a large cluster), you may want to specify that no dump file be created for satellites.

You can control dump-file size and creation for each computer by specifying appropriate values for the AUTOGEN symbols DUMPSTYLE and DUMPFILE in the computer's MODPARAMS.DAT file. For example, specify dump files as shown in Table 10-4.

Table 10-4 AUTOGEN Dump-File Symbols
Value Specified Result
DUMPSTYLE = 9 Compressed selective dump file created (default)
DUMPFILE = 0 No dump file created
DUMPFILE = n Dump file of size n created
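For example, a satellite's MODPARAMS.DAT might contain entries like the following sketch to suppress its local dump file:

! Hypothetical satellite entries in MODPARAMS.DAT
DUMPSTYLE = 9   ! Compressed selective dump (the default style)
DUMPFILE = 0    ! Do not create a dump file on this satellite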

Refer to the HP OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems for more information on dump files and Dump Off System Disk (DOSD).

Caution: Although you can configure computers without dump files, the lack of a dump file can make it difficult or impossible to determine the cause of a system failure.

The recommended method for controlling dump-file size and location is to use AUTOGEN and MODPARAMS.DAT. If necessary, however, you can use the SYSGEN utility directly. The following example shows how to use SYSGEN to modify the system dump-file size on a large-memory system:


$ MCR SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SET DUMPSTYLE 9
SYSGEN> WRITE CURRENT
SYSGEN> CREATE SYS$SYSTEM:SYSDUMP.DMP/SIZE=350000
SYSGEN> EXIT
$ @SHUTDOWN

The dump-file size of 350,000 blocks is sufficient to cover about 1 GB of memory. This size is usually large enough to encompass the information needed to analyze a system failure.

After the system reboots, you can purge SYSDUMP.DMP.

