Guidelines for OpenVMS Cluster Configurations
A.6.4 Step 4: Show and Set SCSI Console Parameters
When creating a SCSI OpenVMS Cluster system, you need to verify the
settings of the console environment parameters shown in Table A-6
and, if necessary, reset their values according to your configuration
requirements.
Table A-6 provides a brief description of SCSI console parameters.
Refer to your system-specific documentation for complete information
about setting these and other system parameters.
Note
The console environment parameters vary, depending on the host adapter
type. Refer to the Installation and User's Guide for your adapter.
|
Table A-6 SCSI Environment Parameters
Parameter | Description
bootdef_dev device_name | Specifies the default boot device to the system.
boot_osflags root_number,bootflag | The boot_osflags variable contains information that is used by the operating system to determine optional aspects of a system bootstrap (for example, conversational bootstrap).
pk*0_disconnect | Allows the target to disconnect from the SCSI bus while the target acts on a command. When this parameter is set to 1, the target is allowed to disconnect from the SCSI bus while processing a command. When the parameter is set to 0, the target retains control of the SCSI bus while acting on a command.
pk*0_fast | Enables SCSI adapters to perform in fast SCSI mode. When this parameter is set to 1, the default speed is set to fast mode; when the parameter is 0, the default speed is standard mode.
pk*0_host_id | Sets the SCSI device ID of host adapters to a value between 0 and 7.
scsi_poll | Enables console polling on all SCSI interconnects when the system is halted.
control_scsi_term | Enables and disables the terminator on the integral SCSI interconnect at the system bulkhead (for some systems).
Note
If you need to modify any parameters, first change the parameter (using
the appropriate console SET command). Then enter a console INIT command
or press the Reset button to make the change effective.
|
Examples
Before setting boot parameters, display the current settings of these
parameters, as shown in the following examples:
-
>>>SHOW *BOOT*
boot_osflags 10,0
boot_reset OFF
bootdef_dev dka200.2.0.6.0
>>>
|
The first number in the boot_osflags parameter specifies the system
root. (In this example, the first number is 10.) The boot_reset
parameter controls the boot process. The default boot device is the
device from which the OpenVMS operating system is loaded. Refer to the
documentation for your specific system for additional booting
information. Note that you can identify multiple boot devices to
the system. By doing so, you cause the system to search for a bootable
device from the list of devices that you specify. The system then
automatically boots from the first device on which it finds bootable
system software. In addition, you can override the default boot device
by specifying an alternative device name on the boot command line.
Typically, the default boot flags suit your environment. You can
override the default boot flags by specifying boot flags dynamically on
the boot command line with the -flags option.
-
>>>SHOW *PK*
pka0_disconnect 1
pka0_fast 1
pka0_host_id 7
|
The pk*0_disconnect parameter determines whether or not a target is
allowed to disconnect from the SCSI bus while it acts on a command. On
a multihost SCSI bus, the pk*0_disconnect parameter must be
set to 1, so that disconnects can occur. The pk*0_fast parameter
controls whether fast SCSI devices on a SCSI controller perform in
standard or fast mode. When the parameter is set to 0, the default
speed is set to standard mode; when the pk*0_fast parameter is set to
1, the default speed is set to fast SCSI mode. In this example, devices
on SCSI controller pka0 are set to fast SCSI mode. This means that both
standard and fast SCSI devices connected to this controller will
automatically perform at the appropriate speed for the device (that is,
in either fast or standard mode). The pk*0_host_id parameter
assigns a bus node ID for the specified host adapter. In this example,
pka0 is assigned a SCSI device ID of 7.
-
>>>SHOW *POLL*
scsi_poll ON
|
The scsi_poll parameter enables or disables polling of SCSI devices
while the system is in console mode. Set polling ON or OFF depending on
the needs and environment of your site. When polling is enabled, the
output of the SHOW DEVICE command is always up to date. However,
because polling can consume SCSI bus bandwidth (proportional to the
number of unused SCSI IDs), you might want to disable polling if one
system on a multihost SCSI bus will be in console mode for an extended
time. Polling must be disabled during any hot-plugging operations. For
information about hot plugging in a SCSI OpenVMS Cluster environment,
see Section A.7.6.
-
>>>SHOW *TERM*
control_scsi_term external
|
The control_scsi_term parameter is used on some systems (such as the
AlphaStation 400) to enable or disable the SCSI terminator next to the
external connector. Set the control_scsi_term parameter to external if
a cable is attached to the bulkhead. Otherwise, set the parameter to
internal.
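The checks described for the pk* parameters can be sketched in a few lines of Python. This is only an illustration, not part of any OpenVMS or console tooling: the function names parse_console_show and check_multihost_settings are hypothetical, and the sketch assumes that the console output has been captured as text and that the adapter letter in pk*0 is a single lowercase letter.

```python
import re

def parse_console_show(text):
    """Parse 'name value' lines from captured console SHOW output into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and console prompt lines such as '>>>SHOW *PK*'.
        if not line or line.startswith(">>>"):
            continue
        name, _, value = line.partition(" ")
        params[name] = value.strip()
    return params

def check_multihost_settings(params):
    """Return a list of problems for a host on a multihost SCSI bus."""
    problems = []
    for name, value in params.items():
        # Assumption: the adapter letter is a single lowercase letter (pka0, pkb0, ...).
        if re.fullmatch(r"pk[a-z]0_disconnect", name) and value != "1":
            problems.append(f"{name} must be 1 on a multihost SCSI bus")
        if re.fullmatch(r"pk[a-z]0_host_id", name):
            if not value.isdigit() or not 0 <= int(value) <= 7:
                problems.append(f"{name} must be between 0 and 7")
    return problems

# Output copied from the SHOW *PK* example in this section.
sample = """>>>SHOW *PK*
pka0_disconnect 1
pka0_fast 1
pka0_host_id 7
"""
params = parse_console_show(sample)
problems = check_multihost_settings(params)
print(params)
print(problems)
```

For the example output in this section, the sketch reports no problems, because pka0_disconnect is 1 and pka0_host_id is within the range 0 to 7.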
A.6.5 Step 5: Install the OpenVMS Operating System
Refer to the OpenVMS Alpha or VAX upgrade and installation manual for
information about installing the OpenVMS operating system. Perform the
installation once for each system disk in the OpenVMS Cluster system.
In most configurations, there is a single system disk. Therefore, you
need to perform this step once, using any system.
During the installation, when you are asked if the system is to be a
cluster member, answer Yes. Then, complete the installation according
to the guidelines provided in HP OpenVMS Cluster Systems.
A.6.6 Step 6: Configure Additional Systems
Use the CLUSTER_CONFIG command procedure to configure additional
systems. Execute this procedure once for the second host that you have
configured on the SCSI bus. (See Section A.7.1 for more information.)
A.7 Supplementary Information
The following sections provide supplementary technical detail and
concepts about SCSI OpenVMS Cluster systems.
A.7.1 Running the OpenVMS Cluster Configuration Command Procedure
You execute either the CLUSTER_CONFIG.COM or the CLUSTER_CONFIG_LAN.COM
command procedure to set up and configure nodes in your OpenVMS Cluster
system. Your choice of command procedure depends on whether you use
DECnet or the LANCP utility for booting. CLUSTER_CONFIG.COM uses
DECnet; CLUSTER_CONFIG_LAN.COM uses the LANCP utility. (For information
about using both procedures, see HP OpenVMS Cluster Systems.)
Typically, the first computer is set up as an OpenVMS Cluster system
during the initial OpenVMS installation procedure (see Section A.6.5).
The CLUSTER_CONFIG procedure is then used to configure additional
nodes. However, if you originally installed OpenVMS without enabling
clustering, the first time you run CLUSTER_CONFIG, the procedure
converts the standalone system to a cluster system.
To configure additional nodes in a SCSI cluster, execute
CLUSTER_CONFIG.COM for each additional node. Table A-7 describes
the steps to configure additional SCSI nodes.
Table A-7 Steps for Installing Additional Nodes
Step | Procedure
1 | From the first node, run the CLUSTER_CONFIG.COM procedure and select the default option [1] for ADD.
2 | Answer Yes when CLUSTER_CONFIG.COM asks whether you want to proceed.
3 | Supply the DECnet name and address of the node that you are adding to the existing single-node cluster.
4 | Confirm that this will be a node with a shared SCSI interconnect.
5 | Answer No when the procedure asks whether this node will be a satellite.
6 | Configure the node to be a disk server if it will serve disks to other cluster members.
7 | Place the new node's system root on the default device offered.
8 | Select a system root for the new node. The first node uses SYS0. Take the default (SYS10 for the first additional node), or choose your own root numbering scheme. You can choose from SYS1 to SYSn, where n is hexadecimal FFFF.
9 | Select the default disk allocation class so that the new node in the cluster uses the same ALLOCLASS as the first node.
10 | Confirm whether or not there is a quorum disk.
11 | Answer the questions about the sizes of the page file and swap file.
12 | When CLUSTER_CONFIG.COM completes, boot the new node from the new system root. For example, for SYSFF on disk DKA200, enter the command BOOT -FL FF,0 DKA200. In this command, -FL indicates boot flags, FF is the new system root, and 0 means there are no special boot requirements, such as conversational boot.
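As an illustration of how the BOOT command in step 12 is put together, the following Python sketch builds the command string from a system root, a boot flag value, and a device name. The function boot_command is hypothetical and is not part of CLUSTER_CONFIG.COM or the console. Writing the boot flag value in hexadecimal is an assumption of this sketch; the table only shows the value 0.

```python
def boot_command(system_root, device, boot_flags=0):
    """Build a console BOOT command line of the form shown in step 12.

    system_root is an integer; the root name is written in hexadecimal
    (for example, 0xFF for SYSFF). The allowed range follows step 8,
    plus SYS0, which the first node uses.
    """
    if not 0 <= system_root <= 0xFFFF:
        raise ValueError("system root must be between 0 and FFFF (hexadecimal)")
    # Assumption: the boot flag value is also written in hexadecimal.
    return f"BOOT -FL {system_root:X},{boot_flags:X} {device}"

# The example from step 12: root SYSFF on disk DKA200, no special boot flags.
print(boot_command(0xFF, "DKA200"))
```

With the values from step 12, the sketch produces BOOT -FL FF,0 DKA200.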
You can run the CLUSTER_CONFIG.COM procedure to set up an additional
node in a SCSI cluster, as shown in Example A-2.
Example A-2 Adding a Node to a SCSI Cluster |
$ @SYS$MANAGER:CLUSTER_CONFIG
Cluster Configuration Procedure
Use CLUSTER_CONFIG.COM to set up or change an OpenVMS Cluster configuration.
To ensure that you have the required privileges, invoke this procedure
from the system manager's account.
Enter ? for help at any prompt.
1. ADD a node to a cluster.
2. REMOVE a node from the cluster.
3. CHANGE a cluster member's characteristics.
4. CREATE a duplicate system disk for CLU21.
5. EXIT from this procedure.
Enter choice [1]:
The ADD function adds a new node to a cluster.
If the node being added is a voting member, EXPECTED_VOTES in
every cluster member's MODPARAMS.DAT must be adjusted, and the
cluster must be rebooted.
WARNING - If this cluster is running with multiple system disks and
if common system files will be used, please, do not
proceed unless you have defined appropriate logical
names for cluster common files in SYLOGICALS.COM.
For instructions, refer to the OpenVMS Cluster Systems
manual.
Do you want to continue [N]? y
If the new node is a satellite, the network databases on CLU21 are
updated. The network databases on all other cluster members must be
updated.
For instructions, refer to the OpenVMS Cluster Systems manual.
What is the node's DECnet node name? SATURN
What is the node's DECnet node address? 7.77
Is SATURN to be a clustered node with a shared SCSI bus (Y/N)? y
Will SATURN be a satellite [Y]? N
Will SATURN be a boot server [Y]?
This procedure will now ask you for the device name of SATURN's system root.
The default device name (DISK$BIG_X5T5:) is the logical volume name of
SYS$SYSDEVICE:.
What is the device name for SATURN's system root [DISK$BIG_X5T5:]?
What is the name of SATURN's system root [SYS10]? SYS2
Creating directory tree SYS2 ...
System root SYS2 created
NOTE:
All nodes on the same SCSI bus must be members of the same cluster
and must all have the same non-zero disk allocation class or each
will have a different name for the same disk and data corruption
will result.
Enter a value for SATURN's ALLOCLASS parameter [7]:
Does this cluster contain a quorum disk [N]?
Updating network database...
Size of pagefile for SATURN [10000 blocks]?
.
.
.
|
A.7.2 Error Reports and OPCOM Messages in Multihost SCSI Environments
Certain common operations, such as booting or shutting down a host on a
multihost SCSI bus, can cause other hosts on the SCSI bus to experience
errors. In addition, certain errors that are unusual in a single-host
SCSI configuration may occur more frequently on a multihost SCSI bus.
These errors are transient errors that OpenVMS detects, reports, and
recovers from without losing data or affecting applications that are
running. This section describes the conditions that generate these
errors and the messages that are displayed on the operator console and
entered into the error log.
A.7.2.1 SCSI Bus Resets
When a host connected to a SCSI bus first starts, either by being
turned on or by rebooting, it does not know the state of the SCSI bus
and the devices on it. The ANSI SCSI standard provides a method called
BUS RESET to force the bus and its devices into a known state. A host
typically asserts a RESET signal one or more times on each of its SCSI
buses when it first starts up and when it shuts down. While this is a
normal action on the part of the host asserting RESET, other hosts
consider this RESET signal an error because RESET requires that the
hosts abort and restart all I/O operations that are in progress.
A host may also reset the bus in the midst of normal operation if it
detects a problem that it cannot correct in any other way. These kinds
of resets are uncommon, but they occur most frequently when something
on the bus is disturbed. For example, an attempt to hot plug a SCSI
device while the device is still active (see Section A.7.6) or halting
one of the hosts with Ctrl/P can cause a condition that forces one or
more hosts to issue a bus reset.
A.7.2.2 SCSI Timeouts
When a host exchanges data with a device on the SCSI bus, there are
several different points where the host must wait for the device or the
SCSI adapter to react. In an OpenVMS system, the host is allowed to do
other work while it is waiting, but a timer is started to make sure
that it does not wait too long. If the timer expires without a response
from the SCSI device or adapter, this is called a timeout.
There are three kinds of timeouts:
- Disconnect timeout---The device accepted a command from the host
and disconnected from the bus while it processed the command but never
reconnected to the bus to finish the transaction. This error happens
most frequently when the bus is very busy. See Section A.7.5 for more
information. The disconnect timeout period varies with the device, but
for most disks, it is about 20 seconds.
- Selection timeout---The host tried to send a command to a device on
the SCSI bus, but the device did not respond. This condition might
happen if the device did not exist or if it were removed from the bus
or powered down. (This failure is not more likely with a
multi-initiator system; it is mentioned here for completeness.) The
selection timeout period is about 0.25 seconds.
- Interrupt timeout---The host expected the adapter to respond for
any other reason, but it did not respond. This error is usually an
indication of a busy SCSI bus. It is more common if you have initiator
unit numbers set low (0 or 1) rather than high (6 or 7). The interrupt
timeout period is about 4 seconds.
Timeout errors are not inevitable on SCSI OpenVMS Cluster systems.
However, they are more frequent on SCSI buses with heavy traffic and
those with two initiators. They do not necessarily indicate a hardware
or software problem. If they are logged frequently, you should consider
ways to reduce the load on the SCSI bus (for example, adding an
additional bus).
A.7.2.3 Mount Verify
Mount verify is a condition declared by a host about a device. The host
declares this condition in response to a number of possible transient
errors, including bus resets and timeouts. When a device is in the
mount verify state, the host suspends normal I/O to it until the host
can determine that the correct device is there, and that the device is
accessible. Mount verify processing then retries outstanding I/Os in a
way that ensures that the correct data is written or read. Application
programs are unaware that a mount verify condition has occurred as long
as the mount verify completes.
If the host cannot access the correct device within a certain amount of
time, it declares a mount verify timeout, and application programs are
notified that the device is unavailable. Manual intervention is
required to restore a device to service after the host has declared a
mount verify timeout. A mount verify timeout usually means that the
error is not transient. The system manager can choose the timeout
period for mount verify; the default is one hour.
A.7.2.4 Shadow Volume Processing
Shadow volume processing is a process similar to mount verify, but it
is for shadow set members. An error on one member of a shadow set
places the set into the volume processing state, which blocks I/O while
OpenVMS attempts to regain access to the member. If access is regained
before shadow volume processing times out, then the outstanding I/Os
are reissued and the shadow set returns to normal operation. If a
timeout occurs, then the failed member is removed from the set. The
system manager can select one timeout value for the system disk shadow
set, and one for application shadow sets. The default value for both
timeouts is 20 seconds.
Note
The SCSI disconnect timeout and the default shadow volume processing
timeout are the same. If the SCSI bus is heavily utilized so that
disconnect timeouts may occur, it may be desirable to increase the
value of the shadow volume processing timeout. (A recommended value is
60 seconds.) This may prevent shadow set members from being expelled
when they experience disconnect timeout errors.
|
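The relationship between the SCSI timeouts and the shadow volume processing timeout can be made concrete with a small Python sketch. The timeout periods are the approximate values quoted in Section A.7.2.2 and in the preceding note. The helper names shadow_timeout_at_risk and suggested_shadow_timeout are hypothetical, and the sketch does not read or set any system parameter.

```python
# Approximate SCSI timeout periods quoted in Section A.7.2.2, in seconds.
SCSI_TIMEOUTS_S = {
    "disconnect": 20.0,  # varies with the device; about 20 seconds for most disks
    "selection": 0.25,
    "interrupt": 4.0,
}

# Value recommended in the note for heavily utilized SCSI buses.
RECOMMENDED_SHADOW_TIMEOUT_S = 60

def shadow_timeout_at_risk(shadow_timeout_s):
    """Return True if a disconnect timeout could outlast shadow volume processing."""
    return shadow_timeout_s <= SCSI_TIMEOUTS_S["disconnect"]

def suggested_shadow_timeout(shadow_timeout_s, bus_heavily_used):
    """Hypothetical helper: suggest a shadow volume processing timeout in seconds."""
    if bus_heavily_used and shadow_timeout_at_risk(shadow_timeout_s):
        return RECOMMENDED_SHADOW_TIMEOUT_S
    return shadow_timeout_s

print(suggested_shadow_timeout(20, bus_heavily_used=True))
print(suggested_shadow_timeout(20, bus_heavily_used=False))
```

The sketch suggests a larger value only when the bus is heavily used and the current shadow volume processing timeout does not exceed the disconnect timeout, which is the situation the note describes.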
A.7.2.5 Expected OPCOM Messages in Multihost SCSI Environments
When a bus reset occurs, an OPCOM message is displayed as each mounted
disk enters and exits mount verification or shadow volume processing.
When an I/O to a drive experiences a timeout error, an OPCOM message is
displayed as that drive enters and exits mount verification or shadow
volume processing.
If a quorum disk on the shared SCSI bus experiences either of these
errors, then additional OPCOM messages may appear, indicating that the
connection to the quorum disk has been lost and regained.
A.7.2.6 Error Log Basics
In the OpenVMS system, the Error Log utility allows device drivers to
save information about unusual conditions that they encounter. In the
past, most of these unusual conditions have happened as a result of
errors such as hardware failures, software failures, or transient
conditions (for example, loose cables).
If you type the DCL command SHOW ERROR, the system displays a summary
of the errors that have been logged since the last time the system
booted. For example:
$ SHOW ERROR
Device Error Count
SALT$PKB0: 6
$1$DKB500: 10
PEA0: 1
SALT$PKA0: 9
$1$DKA0: 0
|
In this case, 6 errors have been logged against host SALT's SCSI port B
(PKB0), 10 have been logged against disk $1$DKB500, and so forth.
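If the SHOW ERROR output is captured as text (for example, in a log file), the per-device counts can be extracted with a short script. The following Python sketch assumes the two-column layout shown in the example above; the function name parse_show_error is hypothetical.

```python
def parse_show_error(text):
    """Return a dict mapping device names to error counts."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Data lines look like 'SALT$PKB0: 6'; the command line and the
        # column header line do not match this shape and are skipped.
        if len(parts) == 2 and parts[0].endswith(":") and parts[1].isdigit():
            counts[parts[0].rstrip(":")] = int(parts[1])
    return counts

# Output copied from the SHOW ERROR example in this section.
sample = """$ SHOW ERROR
Device Error Count
SALT$PKB0: 6
$1$DKB500: 10
PEA0: 1
SALT$PKA0: 9
$1$DKA0: 0
"""
counts = parse_show_error(sample)
# Keep only the devices that have logged at least one error.
nonzero = {device: n for device, n in counts.items() if n > 0}
print(nonzero)
```

Filtering out devices with a count of zero leaves the devices whose error log entries are worth examining with ANALYZE/ERROR.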
To see the details of these errors, you can use the command
ANALYZE/ERROR/SINCE=dd-mmm-yyyy:hh:mm:ss at the DCL prompt.
The output from this command displays a list of error log entries with
information similar to the following:
******************************* ENTRY 2337. *******************************
ERROR SEQUENCE 6. LOGGED ON: CPU_TYPE 00000002
DATE/TIME 29-MAY-1995 16:31:19.79 SYS_TYPE 0000000D
<identification information>
ERROR TYPE 03
COMMAND TRANSMISSION FAILURE
SCSI ID 01
SCSI ID = 1.
SCSI LUN 00
SCSI LUN = 0.
SCSI SUBLUN 00
SCSI SUBLUN = 0.
PORT STATUS 00000E32
%SYSTEM-E-RETRY, RETRY OPERATION
<additional information>
|
For this discussion, the key elements are the ERROR TYPE and, in some
instances, the PORT STATUS fields. In this example, the error type is
03, COMMAND TRANSMISSION FAILURE, and the port status is 00000E32,
SYSTEM-E-RETRY.
A.7.2.7 Error Log Entries in Multihost SCSI Environments
The error log entries listed in this section are likely to be logged in
a multihost SCSI configuration, and you usually do not need to be
concerned about them. You should, however, examine any error log
entries for messages other than those listed in this section.
- ERROR TYPE 0007, BUS RESET DETECTED
Occurs when the other system asserts the SCSI bus reset signal. This
happens when:
  - A system's power-up self-test runs.
  - A console INIT command is executed.
  - The EISA Configuration Utility (ECU) is run.
  - The console BOOT command is executed (in this case, several resets
    occur).
  - System shutdown completes.
  - The system detects a problem with an adapter or a SCSI bus (for
    example, an interrupt timeout).
This error causes all mounted disks to enter mount verification.
- ERROR TYPE 05, EXTENDED SENSE DATA RECEIVED
When a SCSI bus is reset, an initiator must get "sense data" from each
device. When the initiator gets this data, an EXTENDED SENSE DATA
RECEIVED error is logged. This is expected behavior.
- ERROR TYPE 03, COMMAND TRANSMISSION FAILURE
PORT STATUS E32, SYSTEM-E-RETRY
Occasionally, one host may send a command to a disk while the disk is
exchanging error information with the other host. Many disks respond
with a SCSI "BUSY" code. The OpenVMS system responds to a SCSI BUSY
code by logging this error and retrying the operation. You are most
likely to see this error when the bus has been reset recently. This
error does not always happen near resets, but when it does, the error
is expected and unavoidable.
- ERROR TYPE 204, TIMEOUT
An interrupt timeout has occurred (see Section A.7.2.2). The disk is
put into mount verify when this error occurs.
- ERROR TYPE 104, TIMEOUT
A selection timeout has occurred (see Section A.7.2.2). The disk is put
into mount verify when this error occurs.
|
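The list above can be used as a simple filter when reviewing error log entries. The following Python sketch marks an entry as expected only if its error type and description match one of the listed entries. The function is_expected and the way fields are passed to it are assumptions made for illustration; the sketch does not parse ANALYZE/ERROR output itself. Error type and port status values are compared as text with leading zeros removed, because the listing shows them with varying widths.

```python
# Entries listed in this section as expected on a multihost SCSI bus.
EXPECTED_ENTRIES = {
    ("7", "BUS RESET DETECTED"),
    ("5", "EXTENDED SENSE DATA RECEIVED"),
    ("3", "COMMAND TRANSMISSION FAILURE"),
    ("204", "TIMEOUT"),
    ("104", "TIMEOUT"),
}

def normalize(code):
    """Strip leading zeros so that '0007' and '07' compare equal."""
    return code.strip().upper().lstrip("0") or "0"

def is_expected(error_type, description, port_status=None):
    """Return True if the entry matches one of the expected entries."""
    key = (normalize(error_type), description.strip().upper())
    if key not in EXPECTED_ENTRIES:
        return False
    if key[0] == "3":
        # COMMAND TRANSMISSION FAILURE is listed only with port status E32.
        return port_status is not None and normalize(port_status) == "E32"
    return True

# The entry from the ANALYZE/ERROR example in Section A.7.2.6.
print(is_expected("03", "COMMAND TRANSMISSION FAILURE", "00000E32"))
```

Any entry for which the sketch returns False falls outside the list in this section and should be examined.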