HP OpenVMS Systems Documentation

HP Availability Manager User's Guide

4.3.4.2 LAN Device Transmit Data Page

The LAN Device Transmit Data page, shown in Figure 4-22, displays LAN device transmit data.

Figure 4-22 LAN Device Transmit Data Page

Table 4-22 describes the data displayed in Figure 4-22.

**Table 4-22 LAN Device Transmit Data**
Data	Description
Messages Sent	Number of packets sent by this LAN device, including multicast packets.
Bytes Sent	Number of bytes in packets sent by this LAN device, including multicast packets.
Multicast Msgs Sent	Number of multicast packets sent by this LAN device.
Multicast Bytes Sent	Number of multicast bytes in packets sent by this LAN device.
Outstanding I/O Count	Number of transmit requests being processed by LAN driver.

4.3.4.3 LAN Device Receive Data Page

The LAN Device Receive Data page, shown in Figure 4-23, displays LAN device receive data.

Figure 4-23 LAN Device Receive Data Page

Table 4-23 describes the data displayed in Figure 4-23.

**Table 4-23 LAN Device Receive Data**
Data	Description
Messages Rcvd	Number of packets received by this LAN device, including multicast packets.
Bytes Received	Number of bytes in packets received by this LAN device, including multicast packets.
Multicast Msgs Rcvd	Number of multicast NISCA packets received by this LAN device.
Multicast Bytes Rcvd	Number of multicast bytes received by this LAN device.

4.3.4.4 LAN Device Events Data Page

The LAN Device Events Data page, shown in Figure 4-24, displays LAN device events data.

Figure 4-24 LAN Device Events Data Page

Table 4-24 describes the data displayed in Figure 4-24.

**Table 4-24 LAN Device Events Data**
Data	Description
Port Usable	Number of times the LAN device became usable.
Port Unusable	Number of times the LAN device became unusable.
Address Change	Number of times the LAN device's LAN address changed.
Restart Failures	Number of times the LAN device failed to restart.
Last Event	Event type of the last LAN device event (for example, LAN address change, an error, and so on).
Time of Last Event	Time the last event occurred.

4.3.4.5 LAN Device Errors Data Page

The LAN Device Errors Data page, shown in Figure 4-25, displays LAN device errors data.

Figure 4-25 LAN Device Errors Data Page

Table 4-25 describes the data displayed in Figure 4-25.

**Table 4-25 LAN Device Errors Data**
Data	Description
Bad SCSSYSTEM ID	Number of packets received with the wrong SCSSYSTEM ID.
MC Msgs Directed to TR Layer	Number of multicast packets directed to the NISCA Transport layer.
Short CC Messages Received	Number of packets received that were too short to contain a NISCA channel control header.
Short DX Messages Received	Number of packets received that were too short to contain a NISCA DX header.
CH Allocation Failures	Number of times the system failed to allocate memory for use as a channel structure in response to a packet received by this LAN device.
VC Allocation Failures	Number of times the system failed to allocate memory for use as a VC structure in response to a packet received by this LAN device.
Wrong Port	Number of packets addressed to the wrong NISCA address.
Port Disabled	Number of packets discarded because the LAN device was disabled.
H/W Transmit Errors	Number of local hardware transmit errors.
Hello Transmit Errors	Number of transmit errors during HELLOs.
Last Transmit Error Reason	Reason for last transmit error.
Time of Last Transmit Error	Date and time of the last transmit error.

Chapter 5
Getting Information About Events

Note

Before you start this chapter, be sure to read the explanations of data collection, events, thresholds, and occurrences in Chapter 1.

The Availability Manager indicates resource availability problems in the Event pane (Figure 5-1) of the main Application window (see Figure 1-1).

Figure 5-1 OpenVMS Event Pane

The Event pane helps you identify system problems. In many cases, you can apply fixes to correct these problems as well, as explained in Chapter 6.

The Availability Manager displays a warning message in the Event pane whenever it detects a resource availability problem. If logging is enabled (the default), the Availability Manager also logs each event in the Events Log file, which you can display or print. (See Section 5.2 for the location of this file and a cautionary note about it.)

Occurrence Counters

During data collection, any time data meets or exceeds the threshold for an event, an occurrence counter is incremented. When the incremented value matches the value in the Occurrence box on the Event Customization page (Figure 1-6), the event is posted in the Event pane of the Application window (see Figure 1-1).

Note that some events are triggered when data is lower than the threshold; other events are triggered when data is higher than the threshold.

If, at any time during data collection, the data does not meet or exceed the threshold, the occurrence counter is set to 0, and the event is removed from the Event pane. Figure 5-2 depicts this sequence.

Figure 5-2 Testing for Events
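
The occurrence-counter sequence described above can be sketched in a few lines of Python. This is an illustrative model only, not the Availability Manager's implementation: the class name `EventCheck` and its fields are hypothetical, and the sketch assumes an event stays posted for as long as its condition continues to hold.

```python
from dataclasses import dataclass


@dataclass
class EventCheck:
    """Tracks one event's occurrence counter (hypothetical model, not product code)."""
    threshold: float
    occurrence: int                 # value from the Occurrence box
    trigger_when_high: bool = True  # some events trigger on low values instead
    counter: int = 0
    posted: bool = False

    def condition_met(self, value: float) -> bool:
        # "Meets or exceeds" in the direction that applies to this event.
        if self.trigger_when_high:
            return value >= self.threshold
        return value <= self.threshold

    def collect(self, value: float) -> bool:
        """Process one data collection; return True while the event is posted."""
        if self.condition_met(value):
            self.counter += 1
            if self.counter >= self.occurrence:
                self.posted = True
        else:
            # Condition cleared: reset the counter and remove the event.
            self.counter = 0
            self.posted = False
        return self.posted


check = EventCheck(threshold=15, occurrence=2)
states = [check.collect(v) for v in (20, 3, 16, 18, 4)]
print(states)  # [False, False, False, True, False]
```

In this run, the first high reading (20) does not post the event because the next collection (3) resets the counter; only the two consecutive readings of 16 and 18 reach the Occurrence value of 2.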

5.1 Event Information That Is Displayed in the Event Pane

The Availability Manager can display events for all nodes that are currently in communication with the Data Analyzer. When an event of a certain severity occurs, the Availability Manager adds the event to a list in the Event pane.

The length of time an event is displayed depends on the severity of the event. Less severe events are displayed for a short period of time (30 seconds); more severe events are displayed until you explicitly remove the event from the Event pane (explained in Section 5.1.2).

5.1.1 Data in the Event Pane

Table 5-1 identifies the data items displayed in the Event pane.

**Table 5-1 Event Pane Data**
Data Item	Description
Node	Name of the node causing the event
Group	Group of the node causing the event
Date	Date the event occurred
Time	Time that an event was detected
Sev	Severity: a value from 0 to 100
Event	Alphanumeric identifier of the type of event
Description	Short description of the resource availability problem

Appendix B contains tables of events that are displayed in the Event pane. In addition, these tables contain an explanation of each event and the recommended remedial action.

5.1.2 Event Pane Menu Options

When you right-click a node name or data item in the Event pane, the Availability Manager displays a popup menu with the following options:

Menu Option	Description
Display	Displays the Node Summary page associated with that event.
Remove	Removes an event from the display.
Freeze/Unfreeze	Freezes a value in the display until you "unfreeze" it; a snowflake icon is displayed to the left of an event that is frozen.
Customize	Allows you to customize events.

5.2 Criteria for Posting and Displaying an Event

The Availability Manager uses the following criteria to determine whether to post an event and display it in the Event pane:

Data collection posts an event if the event condition exists for the number of data collections specified in the Occurrence value on the Event Customization page (Figure 5-3).

Figure 5-3 Sample Event Customization Page

The sample Event Customization page indicates an Occurrence value of 2. This means that if the DSKERR event exceeds its threshold of 15 for two consecutive data collections, the DSKERR event is posted in the Event pane.

When an event is posted, data is collected at the Event interval shown on the Data Collection Customization page (Figure 5-4).

Figure 5-4 OpenVMS Data Collection Customization Page

On the Data Collection Customization page, for example, the Event interval for Disk Status is every 15 seconds.

The data value displayed in the Node pane that is associated with the event turns red when an event is posted (see Figure 5-5).

Figure 5-5 OpenVMS Node Pane

When an event is posted, it is added to the Events Log file by default:

On OpenVMS systems, the Events Log file is:
AMDS$AM_LOG:ANALYZEREVENTS.LOG
A new version of this file is created each time you access the Availability Manager.
On Windows systems, the Events Log file is:
AnalyzerEvents.log
This file, which is in the installation directory, is overwritten each time you access the Availability Manager.

The following example shows a partial events log file:

VAXJET 01-22-2004 11:24:50.67 0  CFGDON  VAXJET configuration done
DBGAVC 01-22-2004 11:25:12.41 0  CFGDON  DBGAVC configuration done
AFFS5  01-22-2004 11:25:13.23 0  CFGDON  AFFS5 configuration done
DBGAVC 01-22-2004 11:25:18.31 80 LCKCNT  DBGAVC possible contention for resource REG$MASTER_LOCK
VAXJET 01-22-2004 11:25:27.47 40 LOBIOQ  VAXJET LES$ACP_V30 has used most of
                                         its BIOLM process quota
PEROIT 01-22-2004 11:25:27.16 0  CFGDON  PEROIT configuration done
KOINE  01-22-2004 11:25:33.05 99 NOSWFL  KOINE has no swap file
MAWK   01-22-2004 11:26:20.15 99 FXTIMO  MAWK Fix timeout for FID to Filename Fix
MAWK   01-22-2004 11:26:24.48 60 HIDIOR  MAWK direct I/O rate is high
REDSQL 01-22-2004 11:26:30.61 10 PRPGFL  REDSQL _FTA2: high page fault rate
REDSQL 01-22-2004 11:26:31.18 60 PRPIOR  REDSQL _FTA7: paging I/O rate is high
MAWK   01-22-2004 11:26:24.48 60 HIDIOR  MAWK direct I/O rate is high
AFFS52 01-22-2004 11:25:33.64 60 DSKMNV  AFFS52 $4$DUA320(OMTV4) disk mount verify in progress
VAXJET 01-22-2004 11:38:46.23 90 DPGERR  VAXJET error executing driver program, ...
REDSQL 01-22-2004 11:39:18.73 60 PRCPWT  REDSQL _FTA2: waiting in PWAIT
REDSQL 01-22-2004 11:44:37.19 75 PRCCUR  REDSQL _FTA7: has a high CPU rate
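
If you want to filter or summarize a log like this one, a small parser can help. The following Python sketch assumes the column layout inferred from the sample above (node, date as MM-DD-YYYY, time with hundredths of a second, severity, event identifier, description) and treats indented lines as continuations of the previous description. The names `LogEvent` and `parse_events` are hypothetical, and the layout is not a documented file format.

```python
import re
from datetime import datetime
from typing import NamedTuple


class LogEvent(NamedTuple):
    node: str
    when: datetime
    severity: int
    event: str
    description: str


# Layout inferred from the sample: node, date, time, severity, event ID, text.
LINE_RE = re.compile(
    r"^(\S+)\s+(\d{2}-\d{2}-\d{4})\s+(\d{2}:\d{2}:\d{2}\.\d{2})\s+(\d+)\s+(\S+)\s+(.*)$"
)


def parse_events(text: str) -> list[LogEvent]:
    events: list[LogEvent] = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            node, date, time, sev, event, desc = match.groups()
            when = datetime.strptime(f"{date} {time}", "%m-%d-%Y %H:%M:%S.%f")
            events.append(LogEvent(node, when, int(sev), event, desc.strip()))
        elif line.strip() and events:
            # Indented continuation line: append it to the previous description.
            last = events[-1]
            events[-1] = last._replace(
                description=f"{last.description} {line.strip()}"
            )
    return events


sample = """\
VAXJET 01-22-2004 11:25:27.47 40 LOBIOQ  VAXJET LES$ACP_V30 has used most of
                                         its BIOLM process quota
KOINE  01-22-2004 11:25:33.05 99 NOSWFL  KOINE has no swap file
"""

# Show only the more severe events (the cutoff of 60 is an arbitrary example).
for ev in parse_events(sample):
    if ev.severity >= 60:
        print(ev.node, ev.event, ev.description)
```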

Caution About Events Logs

If you collect data on many nodes, running the Availability Manager for a long period of time can result in a large events log. For example, in a run that monitors more than 50 nodes with most of the background data collection enabled, the events log can grow by up to 30 MB per day. At this rate, systems with small disks might fill up the disk on which the events log resides.
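
As a rough planning aid, you can estimate how long a disk will last at a given growth rate. The Python sketch below uses the 30 MB per day figure from the example above as its default; the function name and the 600 MB of free space are hypothetical values chosen for illustration.

```python
def days_until_full(free_mb: float, growth_mb_per_day: float = 30.0) -> float:
    """Days before the events log consumes the given free space."""
    if growth_mb_per_day <= 0:
        raise ValueError("growth rate must be positive")
    return free_mb / growth_mb_per_day


print(days_until_full(600))  # 20.0
```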

Closing the Availability Manager application lets you access the events log for tasks such as archiving. Starting the Availability Manager again starts a new events log.

5.3 Displaying Additional Event Information

For more detailed information about a specific event, double-click any event data item in the Event pane. The Availability Manager first displays a data page that most closely corresponds to the cause of the event. You can choose other tabs for additional detailed information.

For a description of data pages and the information they contain, see Chapter 3.

Chapter 6
Performing Fixes on OpenVMS Nodes

You can perform fixes on OpenVMS nodes to resolve resource availability problems and improve system availability.

This chapter discusses the following topics:

Understanding fixes
Performing fixes

Caution

Performing certain fixes can have serious repercussions, including possible system failure. Therefore, only experienced system managers should perform fixes.

6.1 Understanding Fixes

When you suspect or detect a resource availability problem, in many cases you can use the Availability Manager to analyze the problem and to perform a fix to improve the situation.

Availability Manager fixes fall into these categories:

Node fixes
Process fixes
Cluster interconnect fixes

You can access fixes, by category, from the pages listed in Table 6-1.

**Table 6-1 Accessing Availability Manager Fixes**
Fix Category and Name	Available from This Page
Node fixes: Crash Node, Adjust Quorum	Node Summary, CPU, Memory, I/O
Process fixes: General process fixes: Delete Process, Exit Image, Suspend Process, Resume Process, Process Priority; Process memory fixes: Purge Working Set (WS), Adjust Working Set (WS); Process limits fixes: Direct I/O, Buffered I/O, AST, Open file, Lock, Timer, Subprocess, I/O Byte, Pagefile Quota	All of the process fixes are available from the following pages: Memory, I/O, CPU Process, Single Process
Cluster interconnect fixes:	These fixes are available from the following lines of data on the Cluster Summary page (Figure 4-7):
-- Port Adjust Priority	Right-click a data item on the local port data display line to display a menu containing the Adjust Priority option.
-- Circuit Adjust Priority	Right-click a data item on the circuits data display line to display a menu containing the Adjust Priority option.
LAN Virtual Circuit Summary: Maximum Transmit Window Size, Maximum Receive Window Size, Checksumming, Compression	Right-click a data item in the LAN Virtual Circuit Summary category to display a menu. Then click the Fixes... menu item.
LAN Path (Channel) Summary: Adjust Priority, Hops, Maximum Packet Size	Right-click a data item in the LAN Path (Channel) Summary category to display a menu. Then click the VC LAN Fix... menu item.
LAN Device Details: Adjust Priority, Maximum Buffer Size, Start Device, Stop Device	Right-click a data item in the LAN Path (Channel) Summary category to display a menu. Then click the LAN Device Details menu item to display pages containing Fix options.

Table 6-2 summarizes various problems, recommended fixes, and the expected results of fixes.

**Table 6-2 Summary of Problems and Matching Fixes**
Problem	Fix	Result
Node resource hanging cluster	Crash Node	Node fails with operator-requested shutdown.
Cluster hung	Adjust Quorum	Quorum for cluster is adjusted.
Process looping, intruder	Delete Process	Process no longer exists.
Endless process loop in same PC range	Exit Image	Exits from current image.
Runaway process, unwelcome intruder	Suspend Process	Process is suspended from execution.
Process previously suspended	Resume Process	Process resumes from the point at which it was suspended.
Runaway process or process that is overconsuming	Process Priority	Base priority changes to selected setting.
Low node memory	Purge Working Set (WS)	Frees memory on node; page faulting might occur for process affected.
Working set too high or low	Adjust Working Set (WS)	Removes unused pages from working set; page faulting might occur.
Process has reached a quota limit and has entered RWAIT state	Adjust Process Limits	Process limit is increased, which in many cases frees the process to continue execution.
Process has exhausted its pagefile quota	Adjust Pagefile Quota	Pagefile quota limit of the process is adjusted.

Most process fixes correspond to an OpenVMS system service call, as shown in the following table:

Process Fix	System Service Call
Delete Process	$DELPRC
Exit Image	$FORCEX
Suspend Process	$SUSPND
Resume Process	$RESUME
Process Priority	$SETPRI
Purge Working Set (WS)	$PURGWS
Adjust Working Set (WS)	$ADJWSL
Adjust process limits of the following: Direct I/O (DIO), Buffered I/O (BIO), Asynchronous system trap (AST), Open file (FIL), Lock queue (ENQ), Timer queue entry (TQE), Subprocess (PRC), I/O byte (BYT)	None

Note

Each fix that uses a system service call requires that the process execute the system service. A hung process will have the fix queued to it, where the fix will remain until the process is operational again.
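
The table and note above can be modeled as a simple lookup, which is handy if you script reports about which fixes depend on the target process being able to run. The Python sketch below only restates the mapping from the table; the names `FIX_TO_SERVICE` and `runs_in_target_process` are hypothetical and are not part of the Availability Manager.

```python
from typing import Optional

# Process fix name -> OpenVMS system service, as listed in the table above.
# None marks the process-limit fixes, which have no corresponding service call.
FIX_TO_SERVICE: dict[str, Optional[str]] = {
    "Delete Process": "$DELPRC",
    "Exit Image": "$FORCEX",
    "Suspend Process": "$SUSPND",
    "Resume Process": "$RESUME",
    "Process Priority": "$SETPRI",
    "Purge Working Set (WS)": "$PURGWS",
    "Adjust Working Set (WS)": "$ADJWSL",
    "Adjust Process Limits": None,
}


def runs_in_target_process(fix: str) -> bool:
    """True if the fix uses a system service that the target process must execute.

    Per the note above, such a fix stays queued to a hung process until the
    process is operational again.
    """
    return FIX_TO_SERVICE[fix] is not None


for name in ("Exit Image", "Adjust Process Limits"):
    print(name, FIX_TO_SERVICE[name], runs_in_target_process(name))
```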

Be aware of the following facts before you perform a fix:

You must have write access to perform a fix. To perform LAN fixes, you must have control access.
You cannot undo many fixes. For example, after using the Crash Node fix, the node must be rebooted (either by the node if the node reboots automatically, or by a person performing a manual boot).
Do not apply the Exit Image, Delete Process, or Suspend Process fix to system processes. Doing so might require you to reboot the node.
Whenever you exit an image, you cannot return to that image.
You cannot delete processes that have exceeded their job or process quota.
The Availability Manager ignores fixes applied to the SWAPPER process.

How to Perform Fixes

Standard OpenVMS privileges restrict users' write access. When you run the Data Analyzer, you must have the CMKRNL privilege to send a write (fix) instruction to a node with a problem.

The following options are displayed at the bottom of all fix pages:

Option	Description
OK	Applies the fix and then exits the page. Any message associated with the fix is displayed in the Event pane.
Cancel	Cancels the fix.
Apply	Applies the fix and does not exit the page. Any message associated with the fix is displayed in the Return Status section of the page and in the Event pane.

The following sections explain how to perform node fixes and process fixes.