

HP OpenVMS Cluster Systems



8.6.8 Rebooting Satellites Configured with OpenVMS on a Local Disk (Alpha only)

Satellite nodes can be set up to reboot automatically when recovering from system failures or power failures.

Reboot behavior varies from system to system. Many systems provide a console variable that allows you to specify which device to boot from by default. However, some systems have predefined boot "sniffers" that automatically detect a bootable device. The following table describes the rebooting conditions.

IF...
  Your system does not allow you to specify the boot device for automatic reboot (that is, it has a boot sniffer).
AND...
  An operating system is installed on the system's local disk.
THEN...
  That disk will be booted in preference to requesting a satellite MOP load. To avoid this, you should take one of the measures in the following list before allowing any operation that causes an automatic reboot---for example, executing SYS$SYSTEM:SHUTDOWN.COM with the REBOOT option or using CLUSTER_CONFIG.COM to add that satellite to the cluster:
  • Rename the directory file ddcu:[000000]SYS0.DIR on the local disk to ddcu:[000000]SYSx.DIR (where SYSx is a root other than SYS0, SYSE, or SYSF). Then enter the DCL command SET FILE/REMOVE as follows to remove the old directory entry for the boot image SYSBOOT.EXE:
    $ RENAME DUA0:[000000]SYS0.DIR DUA0:[000000]SYS1.DIR
    
    $ SET FILE/REMOVE DUA0:[SYSEXE]SYSBOOT.EXE

  • Disable the local disk. For instructions, refer to your computer-specific installation and operations guide. Note that this option is not available if the satellite's local disk is being used for paging and swapping.

8.7 Running AUTOGEN with Feedback

AUTOGEN includes a mechanism called feedback. This mechanism examines data collected during normal system operations, and it adjusts system parameters on the basis of the collected data whenever you run AUTOGEN with the feedback option. For example, the system records each instance of a disk server waiting for buffer space to process a disk request. Based on this information, AUTOGEN can size the disk server's buffer pool automatically to ensure that sufficient space is allocated.

Execute SYS$UPDATE:AUTOGEN.COM manually as described in the HP OpenVMS System Manager's Manual.
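
For example, the following command line collects current feedback data and sets new parameter values in one pass. This is a minimal sketch; confirm the start and end phases appropriate to your site in that manual.


$ @SYS$UPDATE:AUTOGEN SAVPARAMS SETPARAMS FEEDBACK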

8.7.1 Advantages

To ensure that computers are configured adequately when they first join the cluster, you can run AUTOGEN with feedback automatically as part of the initial boot sequence. Although this step adds an additional reboot before the computer can be used, the computer's performance can be substantially improved.

HP strongly recommends that you use the feedback option. Without feedback, it is difficult for AUTOGEN to anticipate patterns of resource usage, particularly in complex configurations. Factors such as the number of computers and disks in the cluster and the types of applications being run require adjustment of system parameters for optimal performance.

HP also recommends using AUTOGEN with feedback rather than the SYSGEN utility to modify system parameters, because AUTOGEN:

  • Uses parameter changes in MODPARAMS.DAT and AGEN$ files; an illustrative MODPARAMS.DAT fragment follows this list. (Changes recorded in MODPARAMS.DAT are not lost during updates to the OpenVMS operating system.)
  • Reconfigures other system parameters to reflect changes.
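
For illustration only, the following SYS$SYSTEM:MODPARAMS.DAT fragment shows the MIN_ and ADD_ conventions that AUTOGEN honors. The parameter names are standard AUTOGEN controls, but the values are made-up examples, not recommendations.


! Site-specific entries in SYS$SYSTEM:MODPARAMS.DAT (illustrative values only)
MIN_GBLPAGES = 20000      ! do not let AUTOGEN set GBLPAGES below this value
ADD_NPAGEDYN = 100000     ! add this amount to AUTOGEN's calculated nonpaged pool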

8.7.2 Initial Values

When a computer is first added to an OpenVMS Cluster, system parameters that control the computer's system resources are normally adjusted in several steps, as follows:

  1. The cluster configuration command procedure (CLUSTER_CONFIG_LAN.COM or CLUSTER_CONFIG.COM) sets initial parameters that are adequate to boot the computer in a minimum environment.
  2. When the computer boots, AUTOGEN runs automatically to size the static operating system (without using any dynamic feedback data), and the computer reboots into the OpenVMS Cluster environment.
  3. After the newly added computer has been subjected to typical use for a day or more, you should run AUTOGEN with feedback manually to adjust parameters for the OpenVMS Cluster environment.
  4. At regular intervals, and whenever a major change occurs in the cluster configuration or production environment, you should run AUTOGEN with feedback manually to readjust parameters for the changes.

Because the first AUTOGEN operation (initiated by either CLUSTER_CONFIG_LAN.COM or CLUSTER_CONFIG.COM) is performed both in the minimum environment and without feedback, a newly added computer may be inadequately configured to run in the OpenVMS Cluster environment. For this reason, you might want to implement additional configuration measures like those described in Section 8.7.3 and Section 8.7.4.

8.7.3 Obtaining Reasonable Feedback

When a computer first boots into an OpenVMS Cluster, much of the computer's resource utilization is determined by the current OpenVMS Cluster configuration. Factors such as the number of computers, the number of disk servers, and the number of disks available or mounted contribute to a fixed minimum resource requirement. Because this minimum does not change with continued use of the computer, feedback information about the required resources is immediately valid.

Other feedback information, however, such as that influenced by normal user activity, is not immediately available, because the only "user" has been the system startup process. If AUTOGEN were run with feedback at this point, some system values might be set too low.

By running a simulated user load at the end of the first production boot, you can ensure that AUTOGEN has reasonable feedback information. The User Environment Test Package (UETP) supplied with your operating system contains a test that simulates such a load. You can run this test (the UETP LOAD phase) as part of the initial production boot, and then run AUTOGEN with feedback before a user is allowed to log in.

To implement this technique, you can create a command file like that in step 1 of the procedure in Section 8.7.4, and submit the file to the computer's local batch queue from the cluster common SYSTARTUP procedure. Your command file conditionally runs the UETP LOAD phase and then reboots the computer with AUTOGEN feedback.

8.7.4 Creating a Command File to Run AUTOGEN

As shown in the following sample file, UETP lets you specify a typical user load to be run on the computer when it first joins the cluster. The UETP run generates data that AUTOGEN uses to set appropriate system parameter values for the computer when rebooting it with feedback. Note, however, that the default setting for the UETP user load assumes that the computer is used as a timesharing system. This calculation can produce system parameter values that might be excessive for a single-user workstation, especially if the workstation has large memory resources. Therefore, you might want to modify the default user load setting, as shown in the sample file.

Follow these steps:

  1. Create a command file like the following:


    $! 
    $!   ***** SYS$COMMON:[SYSMGR]UETP_AUTOGEN.COM ***** 
    $! 
    $! For initial boot only, run UETP LOAD phase and 
    $! reboot with AUTOGEN feedback. 
    $! 
    $ SET NOON 
    $ SET PROCESS/PRIVILEGES=ALL 
    $! 
    $! Run UETP to simulate a user load for a satellite 
    $! with 8 simultaneously active user processes. For a  
    $! CI connected computer, allow UETP to calculate the load. 
    $! 
    $ LOADS = "8" 
    $ IF F$GETDVI("PAA0:","EXISTS") THEN LOADS = "" 
    $ @UETP LOAD 1 'loads' 
    $! 
    $! Create a marker file to prevent resubmission of 
    $! UETP_AUTOGEN.COM at subsequent reboots. 
    $! 
    $ CREATE SYS$SPECIFIC:[SYSMGR]UETP_AUTOGEN.DONE 
    $! 
    $! Reboot with AUTOGEN to set SYSGEN values. 
    $! 
    $ @SYS$UPDATE:AUTOGEN SAVPARAMS REBOOT FEEDBACK   
    $! 
    $ EXIT 
    
  2. Edit the cluster common SYSTARTUP file and add the following commands at the end of the file. These commands assume that queues have been started and that a batch queue is running on the newly added computer; they submit UETP_AUTOGEN.COM to the computer's local batch queue.


    $! 
    $ NODE = F$GETSYI("NODE") 
    $ IF F$SEARCH ("SYS$SPECIFIC:[SYSMGR]UETP_AUTOGEN.DONE") .EQS. "" 
    $ THEN 
    $ SUBMIT /NOPRINT /NOTIFY /USERNAME=SYSTEST - 
    _$ /QUEUE='NODE'_BATCH SYS$MANAGER:UETP_AUTOGEN 
                           
    $ WAIT_FOR_UETP: 
    $  WRITE SYS$OUTPUT "Waiting for UETP and AUTOGEN... ''F$TIME()'" 
    $  WAIT 00:05:00.00             ! Wait 5 minutes 
    $  GOTO WAIT_FOR_UETP 
    $ ENDIF 
    $! 
    

    Note: UETP must be run under the user name SYSTEST.
  3. Execute CLUSTER_CONFIG_LAN.COM or CLUSTER_CONFIG.COM to add the computer.

When you boot the computer, it runs UETP_AUTOGEN.COM to simulate the user load you have specified, and it then reboots with AUTOGEN feedback to set appropriate system parameter values.


Chapter 9
Building Large OpenVMS Cluster Systems

This chapter provides guidelines for building OpenVMS Cluster systems that include many computers---approximately 20 or more---and describes procedures that you might find helpful. (Refer to the OpenVMS Cluster Software Software Product Description (SPD) for configuration limitations.) Typically, such OpenVMS Cluster systems include a large number of satellites.

Note that the recommendations in this chapter also can prove beneficial in some clusters with fewer than 20 computers. Areas of discussion include:

  • Booting
  • Availability of MOP and disk servers
  • Multiple system disks
  • Shared resource availability
  • Hot system files
  • System disk space
  • System parameters
  • Network problems
  • Cluster alias

9.1 Setting Up the Cluster

When building a new large cluster, you must be prepared to run AUTOGEN and reboot the cluster several times during the installation. The parameters that AUTOGEN sets for the first computers added to the cluster will probably be inadequate when additional computers are added. Readjustment of parameters is critical for boot and disk servers.

One solution to this problem is to run the UETP_AUTOGEN.COM command procedure (described in Section 8.7.4) to reboot computers at regular intervals as new computers or storage interconnects are added. For example, each time there is a 10% increase in the number of computers, storage, or interconnects, you should run UETP_AUTOGEN.COM. For best results, the last time you run the procedure should be as close as possible to the final OpenVMS Cluster environment.

To set up a new, large OpenVMS Cluster, follow these steps:

Step Task
1 Configure boot and disk servers using the CLUSTER_CONFIG_LAN.COM or the CLUSTER_CONFIG.COM command procedure (described in Chapter 8).
2 Install all layered products and site-specific applications required for the OpenVMS Cluster environment, or as many as possible.
3 Prepare the cluster startup procedures so that they are as close as possible to those that will be used in the final OpenVMS Cluster environment.
4 Add a small number of satellites (perhaps two or three) using the cluster configuration command procedure.
5 Reboot the cluster to verify that the startup procedures work as expected.
6 After you have verified that startup procedures work, run UETP_AUTOGEN.COM on every computer's local batch queue to reboot the cluster again and to set initial production environment values. When the cluster has rebooted, all computers should have reasonable parameter settings. However, check the settings to be sure.
7 Add additional satellites to double their number. Then rerun UETP_AUTOGEN on each computer's local batch queue to reboot the cluster, and set values appropriately to accommodate the newly added satellites.
8 Repeat the previous step until all satellites have been added.
9 When all satellites have been added, run UETP_AUTOGEN a final time on each computer's local batch queue to reboot the cluster and to set new values for the production environment.

For best performance, do not run UETP_AUTOGEN on every computer simultaneously, because the procedure simulates a user load that is probably more demanding than that for the final production environment. A better method is to run UETP_AUTOGEN on several satellites (those with the least recently adjusted parameters) while adding new computers. This technique increases efficiency because little is gained when a satellite reruns AUTOGEN shortly after joining the cluster.

For example, if the entire cluster is rebooted after 30 satellites have been added, few adjustments are made to system parameter values for the 28th satellite added, because only two satellites have joined the cluster since that satellite ran UETP_AUTOGEN as part of its initial configuration.
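
As noted above, UETP_AUTOGEN.COM is submitted to the local batch queues of selected satellites. A sketch of such a staggered submission follows; the node names are hypothetical, and the command mirrors the SUBMIT command used in Section 8.7.4.


$! Illustrative only: rerun UETP_AUTOGEN on selected satellites
$! (SATL25 and SATL26 are hypothetical node names).
$ SUBMIT /NOPRINT /NOTIFY /USERNAME=SYSTEST -
_$ /QUEUE=SATL25_BATCH SYS$MANAGER:UETP_AUTOGEN
$ SUBMIT /NOPRINT /NOTIFY /USERNAME=SYSTEST -
_$ /QUEUE=SATL26_BATCH SYS$MANAGER:UETP_AUTOGEN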

9.2 General Booting Considerations

Two general booting considerations, concurrent booting and minimizing boot time, are described in this section.

9.2.1 Concurrent Booting

Concurrent booting occurs after a power failure or site failure, when all the nodes are rebooted at the same time. Rebooting every node simultaneously places a significant I/O load on the interconnects and also generates network activity from the SCS traffic required for synchronization. All satellites wait for the boot server to become available to reload the operating system; as soon as it is available, they begin to boot in parallel, which increases the elapsed time before users can log in.

9.2.2 Minimizing Boot Time

A large cluster needs to be carefully configured so that there is sufficient capacity to boot the desired number of nodes in the desired amount of time. For example, 96 satellites rebooting at once could induce an I/O bottleneck that stretches OpenVMS Cluster reboot times into hours. The following list provides a few methods to minimize boot times.

  • Careful configuration techniques
    Guidelines for OpenVMS Cluster Configurations contains data on configurations and the capacity of the computers, system disks, and interconnects involved.
  • Adequate system disk throughput
    Achieving enough system disk throughput typically requires a combination of techniques. Refer to Section 9.7 for complete information.
  • Sufficient network bandwidth
    A single Gigabit Ethernet segment is unlikely to have sufficient bandwidth to meet the needs of a large OpenVMS Cluster. Likewise, a single Gigabit Ethernet adapter may become a bottleneck, especially for a disk server, because heavy application activity generates high SCS traffic. Configuring additional adapters for SCS helps overcome such bandwidth limitations.
    Sufficient network bandwidth can also be provided using some of the techniques listed in step 1 of Table 9-2.
  • Installation of only the required layered products and devices.

9.2.3 General Booting Considerations for Cluster over IP

OpenVMS Cluster systems can use the TCP/IP stack to communicate with other nodes in the cluster and to carry SCS traffic. Before TCP/IP can be used for cluster communication, the node must be configured for it; for details on how to configure a node to use OpenVMS Cluster over IP, see Section 8.2.3.1. When this feature is enabled, the OpenVMS executive loads the TCP/IP execlets early in the boot sequence so that the node can exchange SCS messages with the existing nodes of the cluster. The feature also uses configuration files that are loaded during boot time, so ensure that these files are generated correctly during configuration. The following are some considerations for booting:

  • Ensure that the node has TCP/IP connectivity with other nodes of the cluster.
  • Ensure that the IP multicast address used for the cluster can be passed between the routers.
  • If IP unicast is used, ensure that the new node's IP address is present in the PE$IP_CONFIG.DAT file on all existing nodes of the cluster. (The MC SCACP RELOAD command can be used to load the new IP address, as shown in the sketch after this list.)
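
The following DCL fragment sketches these checks. The IP address shown is hypothetical, and the TCPIP PING command assumes HP TCP/IP Services for OpenVMS is in use.


$! Illustrative only: verify IP connectivity to an existing cluster member
$ TCPIP PING 10.10.1.5        ! hypothetical address of an existing node
$!
$! After the new node's address has been added to PE$IP_CONFIG.DAT on
$! the existing members, reload the file on each of them:
$ MC SCACP RELOAD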

9.3 Booting Satellites

OpenVMS Cluster satellite nodes use a single LAN adapter for the initial stages of booting. If a satellite is configured with multiple LAN adapters, the system manager can specify with the console BOOT command which adapter to use for the initial stages of booting. Once the system is running, the OpenVMS Cluster uses all available LAN adapters. This flexibility allows you to work around broken adapters or network problems.

For Alpha and Integrity cluster satellites, the network boot device cannot be a prospective member of a LAN Failover Set. For example, if you create a LAN Failover Set LLA, consisting of EWA and EWB, to be active when the system boots, you cannot boot the system as a satellite over the LAN devices EWA or EWB.

The procedures and utilities for configuring and booting satellite nodes vary between Integrity servers and Alpha systems.

9.3.1 Differences between Alpha and Integrity server Satellites

Table 9-1 lists the differences between Alpha and Integrity server satellites.

Table 9-1 Differences Between Alpha and Integrity server Satellites
Boot Protocol
  Alpha: MOP
  Integrity servers: PXE (BOOTP/DHCP/TFTP)
Crash Dumps
  Alpha: May crash to the remote system disk or to a local disk via Dump Off the System Disk (DOSD)
  Integrity servers: Requires DOSD; crashing to the remote disk is not possible
Error Log Buffers
  Alpha: Always written to the remote system disk
  Integrity servers: Written to the same disk as DOSD
File Protections
  Alpha: No different than a standard system disk
  Integrity servers: Requires that all loadable execlets are W:RE (the default case) and that certain files have ACL access via the VMS$SATELLITE_ACCESS identifier

9.4 Configuring and Booting Satellite Nodes (Alpha)

Complete the items in Table 9-2 before proceeding with satellite booting.

Table 9-2 Checklist for Satellite Booting
Step Action
1 Configure disk server LAN adapters.

Because disk-serving activity in an OpenVMS Cluster system can generate a substantial amount of I/O traffic on the LAN, boot and disk servers should use the highest-bandwidth LAN adapters in the cluster. The servers can also use multiple LAN adapters in a single system to distribute the load across the LAN adapters.

The following list suggests ways to provide sufficient network bandwidth:

  • Select network adapters with sufficient bandwidth.
  • Use switches to segregate traffic and to provide increased total bandwidth.
  • Use multiple LAN adapters on MOP and disk servers.
  • Use switches or higher speed LANs, fanning out to slower LAN segments.
  • Use multiple independent networks.
  • Provide sufficient MOP and disk server CPU capacity by selecting a computer with sufficient power and by configuring multiple server nodes to share the load.
2 If the MOP server node and system-disk server node are not already configured as cluster members, follow the directions in Section 8.4 for using the cluster configuration command procedure to configure each of the Alpha nodes. Include multiple boot and disk servers to enhance availability and distribute I/O traffic over several cluster nodes.
3 Configure additional memory for disk serving.
4 Run the cluster configuration procedure on the Alpha node for each satellite you want to boot into the OpenVMS Cluster (see the sketch following this table).
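
As referenced in steps 2 and 4, the cluster configuration procedure is invoked on the server node as shown in the following sketch. The command assumes a LAN-based configuration; the menu choices that follow depend on your configuration.


$ @SYS$MANAGER:CLUSTER_CONFIG_LAN.COM
$! Choose the option to add a node when the procedure prompts for a function.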

9.4.1 Booting from a Single LAN Adapter

To boot a satellite, enter the following command:


>>> BOOT LAN-adapter-device-name 

In this command, LAN-adapter-device-name can be any valid LAN adapter name, for example, EZA0 or XQB0.

If you need to perform a conversational boot, enter the following command at the Alpha system console prompt (>>>):


>>> b -flags 0,1 eza0

In this example, -flags stands for the flags command line qualifier, which takes two values:

  • System root number
    The "0" tells the console to boot from the system root [SYS0]. This is ignored when booting satellite nodes because the system root comes from the network database of the boot node.
  • Conversational boot flag
    The "1" indicates that the boot should be conversational.

The argument eza0 is the LAN adapter to be used for booting.

Finally, notice that a load file is not specified in this boot command line. For satellite booting, the load file is part of the node description in the DECnet or LANCP database.
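
If you want to confirm what will be served to a satellite, you can display the satellite's entry in the LANCP node database, as in the following sketch. The node name is hypothetical, and this assumes the satellite was configured for LANCP (rather than DECnet) downline loading.


$ RUN SYS$SYSTEM:LANCP
LANCP> LIST NODE SATL25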

If the boot fails:

  • If the configuration permits and the network database is properly set up, reenter the boot command using another LAN adapter (see Section 9.4.4).
  • See Section C.2.5 for information about troubleshooting satellite booting problems.

