
HP OpenVMS Systems Documentation


HP OpenVMS Cluster Systems



Table A-2 lists system parameters that should not require adjustment at any time. These parameters are provided for use in system debugging. HP recommends that you do not change these parameters unless you are advised to do so by your HP support representative. Incorrect adjustment of these parameters can result in cluster failures.

Table A-2 Cluster System Parameters Reserved for OpenVMS Use Only (Integrity servers and Alpha)
Parameter Description
MC_SERVICES_P1 (dynamic) The value of this parameter must be the same on all nodes connected by MEMORY CHANNEL.
MC_SERVICES_P5 (dynamic) This parameter must remain at the default value of 8000000. This parameter value must be the same on all nodes connected by MEMORY CHANNEL.
MC_SERVICES_P8 (static) This parameter must remain at the default value of 0. This parameter value must be the same on all nodes connected by MEMORY CHANNEL.
MPDEV_D1 A multipath system parameter.
PE4 The PE4 SYSGEN parameter can be used to tune important operating parameters of PEDRIVER. The PE4 value consists of the following fields:
Parameter              PE4 Bits   Default   Units
Listen Timeout         <7:0>      8         Seconds
HELLO Interval         <15:8>     30        0.1 Sec (100ms)
CC Ticks/Second        <23:16>    50
Piggyback Ack Delay    <31:24>    10        0.01 Sec (10ms)

HP recommends retaining the default values for these parameters. Change them only under the guidance of HP support.

PRCPOLINTERVAL Specifies, in seconds, the polling interval used to look for SCS applications, such as the connection manager and MSCP disks, on other computers. Each computer is polled, at most, once each interval.

This parameter trades polling overhead against quick recognition of new computers or servers as they appear.

SCSMAXMSG Specifies the maximum number of bytes of system application data in one sequenced message. The amount of physical memory consumed by one message is SCSMAXMSG plus the overhead for buffer management.

If an SCS port is not configured on your system, this parameter is ignored.

SCSMAXDG Specifies the maximum number of bytes of application data in one datagram.

If an SCS port is not configured on your system, this parameter is ignored.

SCSFLOWCUSH Specifies the lower limit for receive buffers at which point SCS starts to notify the remote SCS of new receive buffers. For each connection, SCS tracks the number of receive buffers available. SCS communicates this number to the SCS at the remote end of the connection. However, SCS does not need to do this for each new receive buffer added. Instead, SCS notifies the remote SCS of new receive buffers if the number of receive buffers falls as low as the SCSFLOWCUSH value.

If an SCS port is not configured on your system, this parameter is ignored.
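The PE4 row of Table A-2 packs four 8-bit fields into a single longword. The following Python sketch shows how the field values combine into one PE4 value and how to split a value back into its fields. It is not an HP tool, and the function and field names are hypothetical. It illustrates only the bit arithmetic from the table; it says nothing about what SYSGEN displays for PE4, and any real change should still be made with HP support guidance.

```python
# Hypothetical helper names; the field layout is taken from the PE4 rows of Table A-2.
PE4_FIELDS = {
    # name: (lowest bit, width in bits)
    "listen_timeout": (0, 8),        # seconds
    "hello_interval": (8, 8),        # units of 0.1 second
    "cc_ticks_per_second": (16, 8),
    "piggyback_ack_delay": (24, 8),  # units of 0.01 second
}


def encode_pe4(**fields: int) -> int:
    """Pack the given field values into one 32-bit PE4 value."""
    value = 0
    for name, field_value in fields.items():
        low, width = PE4_FIELDS[name]
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name}={field_value} does not fit in {width} bits")
        value |= field_value << low
    return value


def decode_pe4(value: int) -> dict:
    """Split a PE4 value back into its four fields."""
    return {
        name: (value >> low) & ((1 << width) - 1)
        for name, (low, width) in PE4_FIELDS.items()
    }


if __name__ == "__main__":
    # Default field values listed in Table A-2.
    pe4 = encode_pe4(listen_timeout=8, hello_interval=30,
                     cc_ticks_per_second=50, piggyback_ack_delay=10)
    print(f"{pe4:#010x}")                           # -> 0x0a321e08
    print(decode_pe4(pe4)["hello_interval"] / 10)   # HELLO interval in seconds -> 3.0
```

Because each field is only 8 bits wide, the HELLO interval, for example, cannot exceed 255 units (25.5 seconds).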

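The SCSFLOWCUSH description amounts to credit-based flow control: the remote SCS may send only into receive buffers it knows about, and new buffers are announced in batches rather than one at a time. The Python sketch below is a simplified, hypothetical model of that rule for a single connection, not SCS code. The class and attribute names are invented for illustration: `advertised` is the count the remote side believes is available, `pending` holds buffers added locally but not yet announced, and `cushion` plays the role of SCSFLOWCUSH.

```python
class FlowControlModel:
    """Toy model of one SCS connection's receive-buffer accounting (hypothetical)."""

    def __init__(self, cushion: int, initial_buffers: int) -> None:
        self.cushion = cushion              # plays the role of SCSFLOWCUSH
        self.advertised = initial_buffers   # buffers the remote SCS knows about
        self.pending = 0                    # buffers added but not yet announced
        self.notifications = 0              # updates sent to the remote SCS

    def _maybe_notify(self) -> None:
        # Announce new buffers only once the remote's count is down to the cushion.
        if self.advertised <= self.cushion and self.pending > 0:
            self.advertised += self.pending
            self.pending = 0
            self.notifications += 1

    def add_receive_buffer(self) -> None:
        self.pending += 1
        self._maybe_notify()

    def receive_message(self) -> None:
        if self.advertised == 0:
            raise RuntimeError("remote SCS has no receive buffer to send into")
        self.advertised -= 1
        self._maybe_notify()


if __name__ == "__main__":
    conn = FlowControlModel(cushion=2, initial_buffers=5)
    for _ in range(3):
        conn.add_receive_buffer()   # 5 > 2, so nothing is announced yet
    for _ in range(3):
        conn.receive_message()      # count drops to 4, 3, then 2: the third triggers an update
    print(conn.advertised, conn.pending, conn.notifications)  # -> 5 0 1
```

A larger cushion announces buffers earlier, which means more update messages but less risk that the remote side runs out of buffers to send into.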

Appendix B
Building Common Files

This appendix provides guidelines for building a common user authorization file (UAF) from computer-specific files. It also describes merging RIGHTSLIST.DAT files.

For more detailed information about how to set up a computer-specific authorization file, see the descriptions in the HP OpenVMS Guide to System Security.

B.1 Building a Common SYSUAF.DAT File

To build a common SYSUAF.DAT file, follow the steps in Table B-1.

Table B-1 Building a Common SYSUAF.DAT File
Step Action
1 Print a listing of SYSUAF.DAT on each computer. To produce the listing, run AUTHORIZE and enter the AUTHORIZE command LIST as follows:
$ SET DEF SYS$SYSTEM
$ RUN AUTHORIZE
UAF> LIST/FULL [*,*]

AUTHORIZE writes the listing to the file SYSUAF.LIS, which you can then print.
2 Use the listings to compare the accounts from each computer. On the listings, mark any necessary changes. For example:
  • Delete any accounts that you no longer need.
  • Make sure that UICs are set appropriately:
    • User UICs

      Check each user account in the cluster to see whether it should have a unique user identification code (UIC). For example, OpenVMS Cluster member VENUS may have a user account JONES that has the same UIC as user account SMITH on computer MARS. When computers VENUS and MARS are joined to form a cluster, accounts JONES and SMITH will exist in the cluster environment with the same UIC. If the UICs of these accounts are not differentiated, each user will have the same access rights to various objects in the cluster. In this case, you should assign each account a unique UIC.

    • Group UICs

      Make sure that accounts that perform the same type of work have the same group UIC. Accounts in a single-computer environment probably follow this convention. However, there may be groups of users on each computer that will perform the same work in the cluster but that have group UICs unique to their local computer. As a rule, the group UIC for any given work category should be the same on each computer in the cluster. For example, data entry accounts on VENUS should have the same group UIC as data entry accounts on MARS.

    Note: If you change the UIC for a particular user, you should also change the owner UICs for that user's existing files and directories. You can use the DCL commands SET FILE and SET DIRECTORY to make these changes. These commands are described in detail in the HP OpenVMS DCL Dictionary.

3 Choose the SYSUAF.DAT file from one of the computers to be the master SYSUAF.DAT.

Note: See A Comparison of System Management on OpenVMS AXP and OpenVMS VAX [1] for information about setting SYSUAF process limits and quotas on an Alpha computer.

4 Merge the SYSUAF.DAT files from the other computers into the master SYSUAF.DAT by running the Convert utility (CONVERT) on the computer that owns the master SYSUAF.DAT. (See the OpenVMS Record Management Utilities Reference Manual for a description of CONVERT.) For CONVERT to merge the files, each SYSUAF.DAT file must be accessible to the computer that is running CONVERT.

Syntax: To merge the UAFs into the master SYSUAF.DAT file, specify the CONVERT command in the following format:

CONVERT/MERGE SYSUAF1,SYSUAF2,...,SYSUAFn MASTER_SYSUAF

Note that if a given user name appears in more than one source file, only the first occurrence of that name appears in the merged file.

Example: The following command sequence merges the records of two input files into the existing SYSUAF.DAT file:

$ SET DEFAULT SYS$SYSTEM
$ CONVERT/MERGE [SYS1.SYSEXE]SYSUAF.DAT, -
_$ [SYS2.SYSEXE]SYSUAF.DAT SYSUAF.DAT

The CONVERT command in this example adds the records from the files [SYS1.SYSEXE]SYSUAF.DAT and [SYS2.SYSEXE]SYSUAF.DAT to the file SYSUAF.DAT on the local computer.

After you run CONVERT, you have a master SYSUAF.DAT that contains records from the other SYSUAF.DAT files.

5 Use AUTHORIZE to modify the accounts in the master SYSUAF.DAT according to the changes you marked on the initial listings of the SYSUAF.DAT files from each computer.
6 Place the master SYSUAF.DAT file in SYS$COMMON:[SYSEXE].
7 Remove all node-specific SYSUAF.DAT files.

[1] This manual has been archived but is available in PostScript and DECW$BOOK (Bookreader) formats on the OpenVMS Documentation CD-ROM.
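Step 2 of Table B-1 is a manual comparison of printed listings, but the two UIC checks it describes are mechanical. The Python sketch below assumes you have already extracted (user name, UIC) pairs from each computer's AUTHORIZE listing. The parsing step is not shown, and the function names, the sample accounts, and the mapping of users to work categories are all hypothetical. The sketch reports UICs that are shared by different user names (the JONES and SMITH case) and work categories whose accounts use more than one group number.

```python
from collections import defaultdict


def uic_group(uic: str) -> str:
    """Return the group part of a UIC written as "[group,member]"."""
    return uic.strip("[]").split(",")[0]


def find_shared_uics(listings: dict) -> dict:
    """Map each UIC used by more than one user name to its (node, user) owners."""
    owners = defaultdict(set)
    for node, accounts in listings.items():
        for username, uic in accounts:
            owners[uic].add((node, username))
    # The same user name with the same UIC on several nodes is not a conflict.
    return {
        uic: sorted(who)
        for uic, who in owners.items()
        if len({username for _, username in who}) > 1
    }


def find_mixed_groups(listings: dict, category_of: dict) -> dict:
    """Map each work category to its group numbers when they are not all the same."""
    groups = defaultdict(set)
    for accounts in listings.values():
        for username, uic in accounts:
            category = category_of.get(username)
            if category is not None:
                groups[category].add(uic_group(uic))
    return {cat: sorted(g) for cat, g in groups.items() if len(g) > 1}


if __name__ == "__main__":
    # Hypothetical listings modeled on the VENUS and MARS example in step 2.
    listings = {
        "VENUS": [("JONES", "[200,10]"), ("ENTRY1", "[300,1]")],
        "MARS": [("SMITH", "[200,10]"), ("ENTRY2", "[310,1]")],
    }
    print(find_shared_uics(listings))
    # -> {'[200,10]': [('MARS', 'SMITH'), ('VENUS', 'JONES')]}
    print(find_mixed_groups(listings, {"ENTRY1": "data entry", "ENTRY2": "data entry"}))
    # -> {'data entry': ['300', '310']}
```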

B.2 Merging RIGHTSLIST.DAT Files

If you need to merge RIGHTSLIST.DAT files, you can use a command sequence like the following:


$ ACTIVE_RIGHTSLIST = F$PARSE("RIGHTSLIST","SYS$SYSTEM:.DAT")
$ CONVERT/SHARE/STAT 'ACTIVE_RIGHTSLIST' RIGHTSLIST.NEW
$ CONVERT/MERGE/STAT/EXCEPTION=RIGHTSLIST_DUPLICATES.DAT  -
_$ [SYS1.SYSEXE]RIGHTSLIST.DAT, [SYS2.SYSEXE]RIGHTSLIST.DAT RIGHTSLIST.NEW
$ DUMP/RECORD RIGHTSLIST_DUPLICATES.DAT
$ CONVERT/NOSORT/FAST/STAT RIGHTSLIST.NEW 'ACTIVE_RIGHTSLIST'

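The commands in this example merge the RIGHTSLIST.DAT files from two OpenVMS Cluster computers into the active RIGHTSLIST.DAT file. The sequence first copies the active rights database, whose file specification F$PARSE returns (by default, SYS$SYSTEM:RIGHTSLIST.DAT), to RIGHTSLIST.NEW in the current default directory. It then merges the two input files into RIGHTSLIST.NEW, displays any records that could not be merged (the /EXCEPTION qualifier writes them to RIGHTSLIST_DUPLICATES.DAT), and finally converts RIGHTSLIST.NEW back into the active rights database. For detailed information about creating and maintaining RIGHTSLIST.DAT files, see the security guide for your system.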
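Both merges in this appendix behave like keyed inserts: a record whose key is already present is not added a second time. For SYSUAF.DAT, this is why only the first occurrence of a user name survives. For RIGHTSLIST.DAT, the rejected records are captured by /EXCEPTION= in RIGHTSLIST_DUPLICATES.DAT. The Python sketch below models that outcome using one unique key per record. It is a simplified illustration, not how CONVERT is implemented (real RIGHTSLIST.DAT records carry more than one key), and the function name and sample records are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, Tuple


def keyed_merge(master: Iterable, *sources: Iterable,
                key: Callable[[object], Hashable]) -> Tuple[List, List]:
    """Merge sources into master in order; duplicates of an existing key become exceptions."""
    merged = {}
    exceptions = []
    for record in master:
        merged.setdefault(key(record), record)
    for source in sources:
        for record in source:
            k = key(record)
            if k in merged:
                exceptions.append(record)  # comparable to a record sent to /EXCEPTION=
            else:
                merged[k] = record         # first occurrence wins
    return list(merged.values()), exceptions


if __name__ == "__main__":
    # Hypothetical (user name, UIC) records; the user name is the key.
    master = [("SYSTEM", "[1,4]")]
    sys1 = [("JONES", "[200,10]")]
    sys2 = [("JONES", "[200,12]"), ("SMITH", "[200,11]")]
    merged, duplicates = keyed_merge(master, sys1, sys2, key=lambda r: r[0])
    print(merged)      # SYSTEM, the JONES record from sys1, and SMITH
    print(duplicates)  # the JONES record from sys2
```

Because the first occurrence wins, the order of the input files on the CONVERT command line determines which of two conflicting records is kept. Review the rejected records before discarding them.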


Appendix C
Cluster Troubleshooting

C.1 Diagnosing Computer Failures

This appendix contains information to help you perform troubleshooting operations for the following:

  • Failures of computers to boot or to join the cluster
  • Cluster hangs
  • CLUEXIT bugchecks
  • Port device problems

C.1.1 Preliminary Checklist

Before you initiate diagnostic procedures, be sure to verify that these conditions are met:

  • All cluster hardware components are correctly connected and checked for proper operation.
  • OpenVMS Cluster computers and mass storage devices are configured according to requirements specified in the OpenVMS Cluster Software Software Product Description (SPD 29.78.xx).
  • When you add a satellite to a cluster, verify that the LAN is configured according to the requirements specified in the OpenVMS Cluster Software SPD, and that the network is correctly configured and started, following the procedures described in Chapter 4.

If, after performing preliminary checks and taking appropriate corrective action, you find that a computer still fails to boot or to join the cluster, you can follow the procedures in Sections C.2 through C.3 to attempt recovery.

C.1.2 Sequence of Booting Events

To perform diagnostic and recovery procedures effectively, you must understand the events that occur when a computer boots and attempts to join the cluster. This section outlines those events and shows typical messages displayed at the console.

Note that events vary, depending on whether a computer is the first to boot in a new cluster or whether it is booting in an active cluster. Note also that some events (such as loading the cluster database containing the password and group number) occur only in OpenVMS Cluster systems on a LAN or IP.

The normal sequence of events is shown in Table C-1.

Table C-1 Sequence of Booting Events
Step Action
1 The computer boots. If the computer is a satellite, a message like the following shows the name and LAN address of the MOP server that has downline loaded the satellite. At this point, the satellite has completed communication with the MOP server and further communication continues with the system disk server, using OpenVMS Cluster communications.
%VAXcluster-I-SYSLOAD, system loaded from Node X...

For any booting computer, the OpenVMS "banner message" is displayed in the following format:
operating-system Version n.n dd-mmm-yyyy hh:mm.ss

2 The computer attempts to form or join the cluster, and the following message appears:
waiting to form or join an OpenVMS Cluster system

If the computer is a member of an OpenVMS Cluster based on the LAN, the cluster security database (containing the cluster password and group number) is loaded. Optionally, the MSCP server and the TMSCP server can also be loaded:

%VAXcluster-I-LOADSECDB, loading the cluster security database

%MSCPLOAD-I-LOADMSCP, loading the MSCP disk server
%TMSCPLOAD-I-LOADTMSCP, loading the TMSCP tape server

If the computer is a member of an OpenVMS Cluster based on IP, the IP configuration file is also loaded along with the cluster security database, the MSCP server and the TMSCP server:

%VMScluster-I-LOADIPCICFG, loading the IP cluster configuration file

%VMScluster-S-LOADEDIPCICFG, Successfully loaded IP cluster configuration file

For IP-based cluster communication, the IP interface and TCP/IP services are enabled, the multicast and unicast addresses are added to the list for IP bus WE0, and a Hello packet is sent:

%PEA0, Configuration data for IP clusters found

%PEA0, IP Multicast enabled for cluster communication, Multicast address, 224.0.0.3
%PEA0, Cluster communication enabled on IP interface, WE0
%PEA0, Successfully initialized with TCP/IP services
%PEA0, Remote node Address, 16.138.185.68, added to unicast list of IP bus, WE0
%PEA0, Remote node Address, 15.146.235.222, added to unicast list of IP bus, WE0
%PEA0, Remote node Address, 15.146.239.192, added to unicast list of IP bus, WE0
%PEA0, Hello sent on IP bus WE0
%PEA0, Cluster communication successfully initialized on IP interface , WE0
3 If the computer discovers a cluster, it attempts to join it, and the connection manager displays one or more messages in the following format:
%CNXMAN, Sending VAXcluster membership request to system X...

Otherwise, the connection manager forms the cluster when it has enough votes to establish quorum (that is, when enough voting computers have booted).

4 As the booting computer joins the cluster, the connection manager displays a message in the following format:
%CNXMAN, now a VAXcluster member -- system X...

Note that if quorum is lost while the computer is booting, or if a computer is unable to join the cluster within 2 minutes of booting, the connection manager displays messages like the following:

%CNXMAN, Discovered system X...

%CNXMAN, Deleting CSB for system X...
%CNXMAN, Established "connection" to quorum disk
%CNXMAN, Have connection to system X...
%CNXMAN, Have "connection" to quorum disk

The last two messages show any connections that have already been formed.

5 If the cluster includes a quorum disk, you may also see messages like the following:
%CNXMAN, Using remote access method for quorum disk

%CNXMAN, Using local access method for quorum disk

The first message indicates that the connection manager is unable to access the quorum disk directly, either because the disk is unavailable or because it is accessed through the MSCP server. Another computer in the cluster that can access the disk directly must verify that a reliable connection to the disk exists.

The second message indicates that the connection manager can access the quorum disk directly and can supply information about the status of the disk to computers that cannot access the disk directly.

Note: The connection manager may not see the quorum disk initially because the disk may not yet be configured. In that case, the connection manager first uses remote access, then switches to local access.

6 Once the computer has joined the cluster, normal startup procedures execute. One of the first functions is to start the OPCOM process:
%%%%%%%%%%% OPCOM 15-JAN-1994 16:33:55.33 %%%%%%%%%%%

Logfile has been initialized by operator _X...$OPA0:
Logfile is SYS$SYSROOT:[SYSMGR]OPERATOR.LOG;17
%%%%%%%%%%% OPCOM 15-JAN-1994 16:33:56.43 %%%%%%%%%%%
16:32:32.93 Node X... (csid 0002000E) is now a VAXcluster member
7 As other computers join the cluster, OPCOM displays messages like the following:
%%%%% OPCOM 15-JAN-1994 16:34:25.23 %%%%% (from node X...)

16:34:24.42 Node X... (csid 000100F3)
received VAXcluster membership request from node X...

As startup procedures continue, various messages report startup events.

Hint: For troubleshooting purposes, you can add messages to your site-specific startup procedures that announce each phase of the startup process, for example, mounting disks or starting queues.
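Step 3 of Table C-1 states that, when no cluster is found, the connection manager forms a new cluster only once enough voting computers have booted to establish quorum. The Python sketch below illustrates that check. It assumes the usual OpenVMS calculation, quorum = (EXPECTED_VOTES + 2) / 2 rounded down, which is not stated in this section. It also ignores quorum disk votes and any later recalculation of quorum by the connection manager. The function names and node names are hypothetical.

```python
from typing import Sequence


def quorum(expected_votes: int) -> int:
    """Assumed OpenVMS quorum formula: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def can_form_cluster(votes_present: Sequence[int], expected_votes: int) -> bool:
    """True once the votes of the booted computers reach quorum."""
    return sum(votes_present) >= quorum(expected_votes)


if __name__ == "__main__":
    expected = 3  # three voting computers, one vote each
    booted = []
    for member in ("NODE1", "NODE2", "NODE3"):  # hypothetical node names
        booted.append(1)
        print(member, "booted; quorum reached:", can_form_cluster(booted, expected))
```

With EXPECTED_VOTES set to 3, quorum is 2, so the first computer to boot waits and the cluster forms when the second voting computer boots.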

C.2 Satellite Fails to Boot

To boot successfully, a satellite must communicate with a MOP server over the LAN or IP. You can use the DECnet event logging feature to verify this communication. Perform the following procedure:
Step Action
1 Log in as system manager on the MOP server.
2 If event logging for management-layer events is not already enabled, enter the following NCP commands to enable it:
NCP> SET LOGGING MONITOR EVENT 0.*

NCP> SET LOGGING MONITOR STATE ON
3 Enter the following DCL command to enable the terminal to receive DECnet messages reporting downline load events:
$ REPLY/ENABLE=NETWORK

4 Boot the satellite. If the satellite and the MOP server can communicate and all boot parameters are correctly set, messages like the following are displayed at the MOP server's terminal:
DECnet event 0.3, automatic line service

From node 2.4 (URANUS), 15-JAN-1994 09:42:15.12
Circuit QNA-0, Load, Requested, Node = 2.42 (OBERON)
File = SYS$SYSDEVICE:<SYS10.>, Operating system
Ethernet address = 08-00-2B-07-AC-03
DECnet event 0.3, automatic line service
From node 2.4 (URANUS), 15-JAN-1994 09:42:16.76
Circuit QNA-0, Load, Successful, Node = 2.44 (ARIEL)
File = SYS$SYSDEVICE:<SYS11.>, Operating system
Ethernet address = 08-00-2B-07-AC-13
WHEN... THEN...
The satellite cannot communicate with the MOP server (VAX or Alpha). No message for that satellite appears. There may be a problem with a LAN cable connection or adapter service.
The satellite's data in the DECnet database is incorrectly specified (for example, if the hardware address is incorrect). A message like the following displays the correct address and indicates that a load was requested:
 DECnet event 0.7, aborted service request
From node 2.4 (URANUS) 15-JAN-1994
Circuit QNA-0, Line open error
Ethernet address=08-00-2B-03-29-99

Note the absence of the node name, node address, and system root.
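The WHEN...THEN table above is a small decision procedure over the event messages that appear at the MOP server's terminal. The Python sketch below applies it to captured event text for one satellite hardware address. The message strings come from the examples in this section, the function name is hypothetical, and real event output may differ in layout. A "no message" result can also mean the text was captured before the satellite was booted.

```python
import re

# Matches both "Ethernet address = ..." and "Ethernet address=..." forms shown above.
ADDRESS = re.compile(r"Ethernet address\s*=\s*((?:[0-9A-F]{2}-){5}[0-9A-F]{2})", re.IGNORECASE)

SAMPLE_LOG = """DECnet event 0.3, automatic line service
From node 2.4 (URANUS), 15-JAN-1994 09:42:15.12
Circuit QNA-0, Load, Requested, Node = 2.42 (OBERON)
File = SYS$SYSDEVICE:<SYS10.>, Operating system
Ethernet address = 08-00-2B-07-AC-03
DECnet event 0.3, automatic line service
From node 2.4 (URANUS), 15-JAN-1994 09:42:16.76
Circuit QNA-0, Load, Successful, Node = 2.44 (ARIEL)
File = SYS$SYSDEVICE:<SYS11.>, Operating system
Ethernet address = 08-00-2B-07-AC-13
DECnet event 0.7, aborted service request
From node 2.4 (URANUS) 15-JAN-1994
Circuit QNA-0, Line open error
Ethernet address=08-00-2B-03-29-99
"""


def _event_address(event: str):
    match = ADDRESS.search(event)
    return match.group(1).upper() if match else None


def classify_load_events(log_text: str, hardware_address: str) -> str:
    """Summarize the downline-load events logged for one satellite."""
    wanted = hardware_address.upper()
    # Split before each event header so that every chunk holds one event.
    events = [e for e in re.split(r"(?=DECnet event )", log_text) if e.strip()]
    mine = [e for e in events if _event_address(e) == wanted]
    if not mine:
        return "no message: check LAN cable connection or adapter service"
    if any("aborted service" in e for e in mine):
        return "aborted: check the satellite's data in the DECnet database"
    if any("Load, Successful" in e for e in mine):
        return "loaded"
    return "load requested, not yet successful"


if __name__ == "__main__":
    for address in ("08-00-2B-07-AC-13", "08-00-2B-03-29-99", "08-00-2B-00-00-01"):
        print(address, "->", classify_load_events(SAMPLE_LOG, address))
```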

Sections C.2.2 through C.2.5 provide more information about troubleshooting satellite boots and often recommend verifying that system parameters are set correctly.

C.2.1 Displaying Connection Messages

To enable the display of connection messages during a conversational boot, perform the following steps:

Step Action
1 Enable conversational booting by setting the satellite's NISCS_CONV_BOOT system parameter to 1. On Alpha systems, update the ALPHAVMSSYS.PAR file; on Integrity server systems, update the IA64VMSSYS.PAR file. In both cases, the file is in the satellite's system root on the disk server.
2 Perform a conversational boot.

On Integrity servers and Alpha systems, enter the following command at the console:

>>> b -flags 0,1

On VAX systems, set bit <0> in register R5. For example, on a VAXstation 3100 system, enter the following command on the console:

>>> B/1

3 Observe connection messages.

Display connection messages during a satellite boot to determine which system in a large cluster is serving the system disk to a cluster satellite during the boot process. If booting problems occur, you can use this display to help isolate the problem with the system that is currently serving the system disk. Then, if your server system has multiple LAN adapters, you can isolate specific LAN adapters.

4 Isolate LAN adapters.

Isolate a LAN adapter by methodically rebooting with only one adapter connected. That is, disconnect all but one of the LAN adapters on the server system and reboot the satellite. If the satellite boots when it is connected to the system disk server, then follow the same procedure using a different LAN adapter. Continue these steps until you have located the bad adapter.

Reference: See also Appendix C for help with troubleshooting satellite booting problems.
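The adapter isolation in step 4 is a one-at-a-time search. The Python sketch below expresses that loop. The callback `boots_through` is a hypothetical stand-in for the manual work: disconnect every other LAN adapter on the server, reboot the satellite, and report whether it booted. The adapter names in the example are also hypothetical.

```python
from typing import Callable, Optional, Sequence


def find_bad_adapter(adapters: Sequence[str],
                     boots_through: Callable[[str], bool]) -> Optional[str]:
    """Try each server LAN adapter alone; return the first one the satellite cannot boot through."""
    for adapter in adapters:
        # Hypothetical manual step: only this adapter is connected during the reboot.
        if not boots_through(adapter):
            return adapter
    return None  # the satellite booted through every adapter


if __name__ == "__main__":
    adapters = ["EWA0", "EWB0", "EWC0"]  # hypothetical adapter names
    print(find_bad_adapter(adapters, lambda a: a != "EWB0"))  # -> EWB0
```

Testing adapters in a fixed order and stopping at the first failure follows the procedure in step 4. If more than one adapter might be faulty, continue through the remaining adapters after replacing or disconnecting the first one found.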

