
The OpenVMS Frequently Asked Questions (FAQ)



15.6.1 OpenVMS Cluster Communications Protocol Details?

The following sections contain information on the OpenVMS System Communications Services (SCS) Protocol. Cluster terminology is available in Section 15.6.1.2.1.

15.6.1.1 OpenVMS Cluster (SCS) over DECnet? Over IP?

The OpenVMS Cluster environment operates over various network protocols, but the core of clustering uses the System Communications Services (SCS) protocols, and SCS-specific network datagrams. Direct (full) connectivity is assumed.

An OpenVMS Cluster does not operate over DECnet, nor over IP.

No SCS protocol routers are available.

Many folks have suggested operating SCS over DECnet or IP over the years, but SCS is too far down in the layers, and any such project would entail a major or complete rewrite of SCS and of the DECnet or IP drivers. Further, the current DECnet and IP implementations have large tracts of code that operate at the application level, while SCS must operate in the rather more primitive contexts of the system and particularly the bootstrap---to get SCS to operate over a DECnet or IP connection would require relocating major portions of the DECnet or IP stack into the kernel. (And it is not clear that the result would even meet the bandwidth and latency expectations.)

The usual approach for multi-site OpenVMS Cluster configurations involves FDDI, Memory Channel (MC2), or a point-to-point remote bridge, brouter, or switch. The connection must be transparent, and it must operate at 10 megabits per second or better (Ethernet speed), with latency characteristics similar to or better than that of Ethernet. Various sites use FDDI, MC2, ATM, or point-to-point T3 links.

15.6.1.2 Configuring Cluster SCS for path load balancing?

This section discusses OpenVMS Cluster communications, cluster terminology, related utilities, and command and control interfaces.

15.6.1.2.1 Cluster Terminology?

SCS: Systems Communication Services. The protocol used to communicate between VMScluster systems, and between OpenVMS systems and SCS-based storage controllers. (SCSI-based storage controllers do not use SCS.)

PORT: A communications device, such as DSSI, CI, Ethernet or FDDI. Each CI or DSSI bus is a different local port, named PAA0, PAB0, PAC0 etc. All Ethernet and FDDI busses make up a single PEA0 port.

VIRTUAL CIRCUIT: A reliable communications path established between a pair of ports. Each port in a VMScluster establishes a virtual circuit with every other port in that cluster.

All systems and storage controllers establish "Virtual Circuits" to enable communications between all available pairs of ports.

SYSAP: A "system application" that communicates using SCS. Each SYSAP communicates with a particular remote SYSAP. Example SYSAPs include:

VMS$DISK_CL_DRIVER connects to MSCP$DISK
The disk class driver is on every VMScluster system. MSCP$DISK is on all disk controllers and on all VMScluster systems that have the SYSGEN parameter MSCP_LOAD set to 1.

VMS$TAPE_CL_DRIVER connects to MSCP$TAPE
The tape class driver is on every VMScluster system. MSCP$TAPE is on all tape controllers and on all VMScluster systems that have the SYSGEN parameter TMSCP_LOAD set to 1.

VMS$VAXCLUSTER connects to VMS$VAXCLUSTER
This SYSAP contains the connection manager, which manages cluster connectivity, runs the cluster state transition algorithm, and implements the cluster quorum algorithm. This SYSAP also handles lock traffic, and various other cluster communications functions.

SCS$DIR_LOOKUP connects to SCS$DIRECTORY
This SYSAP is used to find SYSAPs on remote systems.

MSCP and TMSCP
The Mass Storage Control Protocol and the Tape MSCP servers are SYSAPs that provide access to disk and tape storage, typically operating over SCS protocols. MSCP and TMSCP SYSAPs exist within OpenVMS (for OpenVMS hosts serving disks and tapes), within CI- and DSSI-based storage controllers, and within host-based MSCP- or TMSCP storage controllers. MSCP and TMSCP can be used to serve MSCP and TMSCP storage devices, and can also be used to serve SCSI and other non-MSCP/non-TMSCP storage devices.

SCS CONNECTION: A SYSAP on one node establishes an SCS connection to its counterpart on another node. This connection will be on ONE AND ONLY ONE of the available virtual circuits.

15.6.1.2.2 Cluster Communications Control?

When there are multiple virtual circuits between two OpenVMS systems it is possible for the VMS$VAXCLUSTER to VMS$VAXCLUSTER connection to use any one of these circuits. All lock traffic between the two systems will then travel on the selected virtual circuit.
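
One way to see the virtual circuits that exist, and which circuit each SCS connection is using, is the SHOW CLUSTER utility in continuous mode. The following is a minimal sketch; CIRCUITS and CONNECTIONS are standard SHOW CLUSTER report classes (CIRCUITS lists the virtual circuits and their ports, CONNECTIONS lists the SYSAP connections riding on them), though the exact fields displayed vary with the OpenVMS version.


$ SHOW CLUSTER/CONTINUOUS
Command > ADD CIRCUITS
Command > ADD CONNECTIONS
Command > EXIT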

Each port has a "LOAD CLASS" associated with it. This load class helps to determine which virtual circuit a connection will use. If one port has a higher load class than all others then this port will be used. If two or more ports have equally high load classes then the connection will use the first of these that it finds. Prior to enhancements found in V7.3-1 and later, the load class is static and normally all CI and DSSI ports have a load class of 14(hex), while the Ethernet and FDDI ports will have a load class of A(hex). With V7.3-1 and later, the load class values are dynamic.

For instance, if you have multiple DSSI busses and an FDDI, the VMS$VAXCLUSTER connection will choose the DSSI bus, as the bus holding the system disk will always be the first DSSI bus discovered when the OpenVMS system boots.

To force all lock traffic off the DSSI and on to the FDDI, for instance, an adjustment to the load class value is required, or the DSSI SCS port must be disabled.

In addition to the load class mechanisms, you can also use the "preferred path" mechanisms of MSCP and TMSCP services. This allows you to control the SCS connections used for serving remote disk and tape storage. The preferred path mechanism is most commonly used to explicitly spread cluster I/O activity over hosts and/or storage controllers serving disk or tape storage in parallel. This can be particularly useful if your hosts or storage controllers individually lack the necessary I/O bandwidth for the current I/O load, and must thus aggregate bandwidth to serve the cluster I/O load.

For related tools, see various utilities including LAVC$STOP_BUS and LAVC$START_BUS, and see DCL commands including SET PREFERRED_PATH.

15.6.1.2.3 Cluster Communications Control Tools and Utilities?

In most OpenVMS versions, you can use the tools:

  • SYS$EXAMPLES:LAVC$STOP_BUS
  • SYS$EXAMPLES:LAVC$START_BUS

These tools permit you to disable or enable all SCS traffic on the specified paths.
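
These are MACRO-32 source files that must be assembled and linked before use. The following is only a rough sketch: the build commands, the use of the LIB.MLB macro library, and the way the LAN device name (EWA0 here, purely illustrative) is passed on the command line are all assumptions, so check the comments at the top of the .MAR files for the authoritative build and invocation instructions.


$ ! Assumed build-and-run sequence; see the comments in the .MAR sources
$ MACRO SYS$EXAMPLES:LAVC$STOP_BUS + SYS$LIBRARY:LIB.MLB/LIBRARY
$ LINK LAVC$STOP_BUS
$ STOP_BUS :== $SYS$DISK:[]LAVC$STOP_BUS.EXE
$ STOP_BUS EWA0     ! hypothetical LAN device; SCS traffic stops on this path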

You can also use a preferred path mechanism that tells the local MSCP disk class driver (DUDRIVER) which path to a disk should be used. Generally, this is used with dual-pathed disks, forcing I/O traffic through one of the controllers instead of the other. This can be used to implement a crude form of I/O load balancing at the disk I/O level.

Prior to V7.2, the preferred path feature uses the tool:

  • SYS$EXAMPLES:PREFER.MAR

In OpenVMS V7.2 and later, you can use the following DCL command:


$ SET PREFERRED_PATH

The preferred path mechanism neither disables nor otherwise affects SCS operations on the non-preferred path.
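
As a hedged illustration only: the device name, the serving host name, and the /HOST qualifier below are assumptions, so verify the exact parameters and qualifiers with HELP SET PREFERRED_PATH on your OpenVMS version before use.


$ ! Hypothetical example; verify the qualifiers with HELP SET PREFERRED_PATH
$ SET PREFERRED_PATH $2$DUA100: /HOST=HSC001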

With OpenVMS V7.3 and later, please see the SCACP utility for control over cluster communications, SCS virtual circuit control, port selection, and related.
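
As a brief sketch of what SCACP offers, the SHOW commands below display the local SCS ports (with their load class values) and the virtual circuits to remote ports; the available commands and qualifiers vary by OpenVMS version, so see the SCACP HELP and the System Management Utilities documentation for specifics.


$ RUN SYS$SYSTEM:SCACP
SCACP> SHOW PORT
SCACP> SHOW CIRCUIT
SCACP> EXIT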

15.6.2 Cluster System Parameter Settings?

The following sections contain details of configuring cluster-related system parameters.

15.6.2.1 What is the correct value for EXPECTED_VOTES in a VMScluster?

The VMScluster connection manager uses the concept of votes and quorum to prevent disk and memory data corruptions---when sufficient votes are present for quorum, then access to resources is permitted. When sufficient votes are not present, user activity will be blocked. The act of blocking user activity is called a "quorum hang", and is better thought of as a "user data integrity interlock". This mechanism is designed to prevent a partitioned VMScluster, and the resultant massive disk data corruptions. The quorum mechanism is expressly intended to prevent your data from becoming severely corrupted.

On each OpenVMS node in a VMScluster, one sets two values in SYSGEN: VOTES, and EXPECTED_VOTES. The former is how many votes the node contributes to the VMScluster. The latter is the total number of votes expected when the full VMScluster is bootstrapped.
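
For example, with three voting nodes each contributing one vote, EXPECTED_VOTES is 3 and the connection manager computes the quorum value as (EXPECTED_VOTES + 2) / 2 = 2 (using integer division), so any two of the three nodes can form or continue the cluster, while a single node by itself will hang rather than risk a partition.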

Some sites erroneously attempt to set EXPECTED_VOTES too low, believing that this will allow the cluster to form and run when only a subset of the voting nodes is present. It does not. Further, an erroneous setting of EXPECTED_VOTES is automatically corrected once VMScluster connections to other nodes are established; user data is at risk of severe corruption during the earliest and most vulnerable portion of the system bootstrap, before those connections have been established.

One can operate a VMScluster with one, two, or many voting nodes. With anything other than a two-node configuration, keeping a subset of the nodes active when some nodes fail is easily arranged. With a two-node configuration, one must use a primary-secondary configuration (where the primary has all the votes), a peer configuration (where, if either node is down, the other hangs), or (preferably) a shared quorum disk.

Use of a quorum disk does slow down VMScluster transitions somewhat (adding a third voting node that contributes the vote(s) otherwise assigned to the quorum disk makes for faster transitions), but a quorum disk does mean that either node in a two-node VMScluster configuration can operate when the other node is down.

Note

The quorum disk must not be a host-based shadowed disk, though it can be protected with controller-based RAID. Because host-based volume shadowing depends on the lock manager, the lock manager depends on the connection manager, and the connection manager depends on quorum, it is not technically feasible (nor even particularly reliable) to permit host-based volume shadowing to protect the quorum disk.

If you choose to use a quorum disk, a QUORUM.DAT file is created automatically the first time OpenVMS boots with a quorum disk specified; more precisely, the file is created during a boot in which OpenVMS can achieve quorum without also needing the votes from the quorum disk.

In a two-node VMScluster with a shared storage interconnect, typically each node has one vote, and the quorum disk also has one vote. EXPECTED_VOTES is set to three.
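
A minimal MODPARAMS.DAT sketch for such a two-node configuration follows. VOTES, EXPECTED_VOTES, DISK_QUORUM, and QDSKVOTES are the relevant system parameters; the quorum disk name $1$DGA10 is purely illustrative.


! Entries for SYS$SYSTEM:MODPARAMS.DAT on each node; then run AUTOGEN
VOTES = 1
EXPECTED_VOTES = 3
DISK_QUORUM = "$1$DGA10"   ! illustrative quorum disk device name
QDSKVOTES = 1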

Using a quorum disk on a non-shared interconnect is unnecessary---the use of a quorum disk does not provide any value, and the votes assigned to the quorum disk should be assigned to the OpenVMS host serving access to the disk.

For information on quorum hangs, see the OpenVMS documentation. For information on changing the EXPECTED_VOTES value on a running system, see the SET CLUSTER/EXPECTED_VOTES command, and see the documentation for the AMDS and Availability Manager tools. Also of potential interest is the OpenVMS system console documentation for the processor-specific console commands used to trigger the IPC (Interrupt Priority Level %x0C; IPL C) handler. (IPC is not available on OpenVMS I64 V8.2.) AMDS, Availability Manager, and the IPC handler can each be used to clear a quorum hang. Use of AMDS and Availability Manager is generally recommended over IPC, particularly because IPC can cause CLUEXIT bugchecks if the system remains halted beyond the cluster sanity timer limits, and because some Alpha consoles and most (all?) Integrity consoles do not permit a restart after a halt.
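
For instance, after permanently removing a voting node from the configuration, the running value can be lowered with a command along these lines (a sketch; if the value is omitted, OpenVMS calculates a value from the votes of the current members):


$ SET CLUSTER/EXPECTED_VOTES=2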

The quorum scheme is a set of "blade guards" deliberately implemented by OpenVMS Engineering to provide data integrity---remove these blade guards at your peril. OpenVMS Engineering did not implement the quorum mechanism to make a system manager's life more difficult--- the quorum mechanism was specifically implemented to keep your data from getting scrambled.

15.6.2.1.1 Why no shadowing for a Quorum Disk?

Stated simply, Host-Based Volume Shadowing uses the Distributed Lock Manager (DLM) to coordinate changes to membership of a shadowset (e.g. removing a member). The DLM depends in turn on the Connection Manager enforcing the Quorum Scheme and deciding which node(s) (and quorum disk) are participating in the cluster, and telling the DLM when it needs to do things like a lock database rebuild operation. So you can't introduce a dependency of the Connection Manager on Shadowing to try to pick proper shadowset member(s) to use as the Quorum Disk when Shadowing itself is using the DLM and thus indirectly depending on the Connection Manager to keep the cluster membership straight---it's a circular dependency.

So in practice, folks simply depend on controller-based mirroring (or controller-based RAID) to protect the Quorum Disk against disk failures (and dual-redundant controllers to protect against most cases of controller and interconnect failures). Since this disk unit appears to be a single disk up at the VMS level, there's no chance of ambiguity.

15.6.2.2 Explain disk (or tape) allocation class settings?

The allocation class mechanism provides the system manager with a way to configure and resolve served and direct paths to storage devices within a cluster. Any served device that provides multiple paths should be configured using a non-zero allocation class, either at the MSCP (or TMSCP) storage controllers, at the port (for port allocation classes), or at the OpenVMS MSCP (or TMSCP) server. All controllers or servers providing a path to the same device should have the same allocation class (at the port, controller, or server level).

Each disk (or tape) unit number used within a non-zero disk (or tape) allocation class must be unique, regardless of the particular device prefix. For the purposes of multi-path device path determination, any disk (or tape) device with the same unit number and the same disk (or tape) allocation class configuration is assumed to be the same device.

If you are reconfiguring disk device allocation classes, you will want to avoid the use of allocation class one ($1$) until/unless you have Fibre Channel storage configured. (Fibre Channel storage specifically requires the use of allocation class $1$; e.g., $1$DGA0:.)
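
As a short sketch of how node-level allocation classes are typically set: ALLOCLASS and TAPE_ALLOCLASS are the relevant system parameters, and the value 7 and the resulting device names are purely illustrative.


! Entries for SYS$SYSTEM:MODPARAMS.DAT on each node serving the devices; then run AUTOGEN
ALLOCLASS = 7          ! disk allocation class; devices then appear as $7$DKA100:, etc.
TAPE_ALLOCLASS = 7     ! tape allocation class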

15.6.2.2.1 How to configure allocation classes and Multi-Path SCSI?

The HSZ allocation class is applied to devices, starting with OpenVMS V7.2. It is considered a port allocation class (PAC), and all device names with a PAC have their controller letter forced to "A". (You might infer from the text in the "Guidelines for OpenVMS Cluster Configurations" that this is something you have to do yourself, though OpenVMS will thoughtfully handle this renaming for you.)

You can force the device names back to DKB by setting the HSZ allocation class to zero, and setting the PKB PAC to -1. This will use the host allocation class, and will leave the controller letter alone (that is, the DK controller letter will be the same as the SCSI port (PK) controller). Note that this won't work if the HSZ is configured in multibus failover mode. In this case, OpenVMS requires that you use an allocation class for the HSZ.

When your configuration gets even moderately complex, you must pay careful attention to how you assign the three kinds of allocation class: node, port and HSZ/HSJ, as otherwise you could wind up with device naming conflicts that can be painful to resolve.

The displayable path information is for SCSI multi-path, and permits the multi-path software to distinguish between different paths to the same device. If you have two paths to $1$DKA100, for example by having two KZPBA controllers and two SCSI buses to the HSZ, you would have two UCBs in a multi-path set. The path information is used by the multi-path software to distinguish between these two UCBs.

The displayable path information describes the path; in this case, the SCSI port. If port is PKB, that's the path name you get. The device name is no longer completely tied to the port name; the device name now depends on the various allocation class settings of the controller, SCSI port or node.

The reason the device name's controller letter is forced to "A" when you use PACs is because a shared SCSI bus may be configured via different ports on the various nodes connected to the bus. The port may be PKB on one node, and PKC on the other. Rather obviously, you will want to have the shared devices use the same device names on all nodes. To establish this, you will assign the same PAC on each node, and OpenVMS will force the controller letter to be the same on each node. Simply choosing "A" was easier and more deterministic than negotiating the controller letter between the nodes, and also parallels the solution used for this situation when DSSI or SDI/STI storage was used.

To enable port allocation classes, see the SYSBOOT command SET/BOOT, and see the DEVICE_NAMING system parameter.

This information is also described in the Cluster Systems and Guidelines for OpenVMS Cluster Configurations manuals.

15.6.3 Tell me about SET HOST/DUP and SET HOST/HSC

The OpenVMS DCL commands SET HOST/DUP and SET HOST/HSC are used to connect to storage controllers via the Diagnostics and Utility Protocol (DUP). These commands require that the FYDRIVER device driver be connected. This device driver connection is typically performed by adding the following command(s) into the system startup command procedure:

On OpenVMS Alpha:


$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> IO CONNECT FYA0/NOADAPTER/DRIVER=SYS$FYDRIVER

On OpenVMS VAX:


$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> CONNECT FYA0/NOADAPTER
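
With FYDRIVER connected, a typical DUP session to a DSSI ISE or storage controller then looks roughly like this; the node name R2D2 is hypothetical, DIRECT lists the DUP tasks available on the target, and PARAMS connects to the parameter utility.


$ SET HOST/DUP/SERVER=MSCP$DUP/TASK=DIRECT R2D2
$ SET HOST/DUP/SERVER=MSCP$DUP/TASK=PARAMS R2D2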

Alternatives to the DCL SET HOST/DUP command include the console SET HOST command available on various mid- to recent-vintage VAX consoles:

Access to Parameters on an Embedded DSSI controller:


SET HOST/DUP/DSSI[/BUS:{0:1}] dssi_node_number PARAMS

Access to Directory of tools on an Embedded DSSI controller:


SET HOST/DUP/DSSI[/BUS:{0:1}] dssi_node_number DIRECT

Access to Parameters on a KFQSA DSSI controller:


SHOW UQSSP ! to get port_controller_number
SET HOST/DUP/UQSSP port_controller_number PARAMS

These console commands are available on most MicroVAX and VAXstation 3xxx series systems, and most (all?) VAX 4xxx series systems. For further information, see the system documentation and---on most VAX systems---see the console HELP text.

EK-410AB-MG, _DSSI VAXcluster Installation and Troubleshooting_, is a good resource for setting up a DSSI VMScluster on OpenVMS VAX nodes. (This manual predates coverage of OpenVMS Alpha systems, but gives good coverage to all hardware and software aspects of setting up a DSSI-based VMScluster---and most of the concepts covered are directly applicable to OpenVMS Alpha systems. This manual specifically covers the hardware, which is something not covered by the standard OpenVMS VMScluster documentation.)

Also see Section 15.3.3, and for the SCS name of the OpenVMS host see Section 5.7.

15.6.4 How do I rename a DSSI disk (or tape?)

If you want to renumber or rename DSSI disks or DSSI tapes, it's easy---if you know the secret incantation...

From OpenVMS:


$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> CONNECT FYA0/NOADAPTER
SYSGEN> ^Z
$ SET HOST/DUP/SERV=MSCP$DUP/TASK=PARAMS <DSSI-NODE-NAME>
...
PARAMS> STAT CONF
<The software version is normally near the top of the display.>
PARAMS> EXIT
...

From the console on most 3000- and 4000-class VAX systems... (Obviously, the system must be halted for these commands.)

Integrated DSSI:


SET HOST/DUP/DSSI[/BUS:[0:1]] dssi_node_number PARAMS

KFQSA:


SET HOST/DUP/UQSSP port_controller_number PARAMS

For information on how to get into the PARAMS subsystem, also see the HELP at the console prompt for the SET HOST syntax, or see the HELP on SET HOST/DUP (once you have connected FYDRIVER under OpenVMS).

Once you are in the PARAMS subsystem, you can use the FORCEUNI option to force the use of the UNITNUM value and then set a unique UNITNUM inside each DSSI ISE---this causes each DSSI ISE to use the specified unit number rather than the DSSI node number as the unit number. Other parameters of interest are NODENAME and ALLCLASS, the node name and the (disk or tape) cluster allocation class.
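
A hedged sketch of such a PARAMS session follows. The unit number, allocation class, and node name values are illustrative; the sense of FORCEUNI (zero is assumed here to mean "use UNITNUM rather than the DSSI node ID") and the WRITE save step should be verified against your ISE documentation before changing anything.


PARAMS> SET UNITNUM 10
PARAMS> SET FORCEUNI 0
PARAMS> SET ALLCLASS 1
PARAMS> SET NODENAME R2D2
PARAMS> WRITE
<The ISE typically must be restarted for the new settings to take effect.>
PARAMS> EXIT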

Ensure that all disk unit numbers used within an OpenVMS Cluster disk allocation class are unique, and all tape unit numbers used within an OpenVMS Cluster tape allocation class are also unique. For details on the SCS name of the OpenVMS host, see Section 5.7. For details of SET HOST/DUP, see Section 15.6.3.

15.6.5 Where can I get Fibre Channel Storage (SAN) information?

15.6.6 Which files must be shared in an OpenVMS Cluster?

The following files are expected to be common across all nodes in a cluster environment. Though SYSUAF is very often common, it can instead be carefully coordinated, with matching UIC values and matching binary identifier values across all copies. (The most common reason for multiple SYSUAF files is to allow different quotas on different nodes. In any event, the binary UIC values and the binary identifier values must be coordinated across all SYSUAF files, and must match the RIGHTSLIST file.) In addition to the files (and, in some cases, directories) shown in Table 15-1, please review the VMScluster documentation and the System Management documentation.

Table 15-1 Cluster Common Shared Files

Filename                  Default Specification
SYSUAF                    SYS$SYSTEM:.DAT
SYSUAFALT                 SYS$SYSTEM:.DAT
SYSALF                    SYS$SYSTEM:.DAT
RIGHTSLIST                SYS$SYSTEM:.DAT
NETPROXY                  SYS$SYSTEM:.DAT
NET$PROXY                 SYS$SYSTEM:.DAT
NETOBJECT                 SYS$SYSTEM:.DAT
NETNODE_REMOTE            SYS$SYSTEM:.DAT
QMAN$MASTER               SYS$SYSTEM:; this is a set of related files
LMF$LICENSE               SYS$SYSTEM:.LDB
VMSMAIL_PROFILE           SYS$SYSTEM:.DATA
VMS$OBJECTS               SYS$SYSTEM:.DAT
VMS$AUDIT_SERVER          SYS$MANAGER:.DAT
VMS$PASSWORD_HISTORY      SYS$SYSTEM:.DATA
NETNODE_UPDATE            SYS$MANAGER:.COM
VMS$PASSWORD_POLICY       SYS$LIBRARY:.EXE
LAN$NODE_DATABASE         SYS$SYSTEM:.DAT
VMS$CLASS_SCHEDULE        SYS$SYSTEM:.DATA
SYS$REGISTRY              SYS$SYSTEM:; this is a set of related files

In addition to the documentation, also see the current version of the file SYS$STARTUP:SYLOGICALS.TEMPLATE. Specifically, please see the most recent version of this file available, starting on or after OpenVMS V7.2.
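
The usual approach, illustrated in SYLOGICALS.TEMPLATE, is to point the executive-mode system logical names for these files at a cluster-common disk and directory. A brief sketch follows, where CLUSTER_COMMON: is a hypothetical logical name for that shared location; the logical names themselves (SYSUAF, RIGHTSLIST, and so on) are the standard names OpenVMS uses to locate the files.


$ ! Typically placed in the site SYLOGICALS.COM on each node
$ DEFINE/SYSTEM/EXEC SYSUAF          CLUSTER_COMMON:SYSUAF.DAT
$ DEFINE/SYSTEM/EXEC RIGHTSLIST      CLUSTER_COMMON:RIGHTSLIST.DAT
$ DEFINE/SYSTEM/EXEC NETPROXY        CLUSTER_COMMON:NETPROXY.DAT
$ DEFINE/SYSTEM/EXEC VMSMAIL_PROFILE CLUSTER_COMMON:VMSMAIL_PROFILE.DATA
$ DEFINE/SYSTEM/EXEC QMAN$MASTER     CLUSTER_COMMON: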

A failure to have common or (in the case of multiple SYSUAF files) synchronized files can cause problems with batch operations, with the SUBMIT/USER command, with general operations involving the cluster alias, and with various SYSMAN and related operations. Object protections and defaults will also not necessarily be consistent. This can lead to system security problems, including unintended access denials and unintended object accesses, should the files, and particularly the binary identifier values, become skewed.

