HP OpenVMS Cluster Systems
C.10.4 Logged Message Entries
Logged-message entries are made when the LAN port receives a response
that contains either data that the port driver cannot interpret or an
error code in the status field of the response.
C.10.5 Error-Log Entry Descriptions
This section describes error-log entries for the CI and LAN ports. Each
entry shown is followed by a brief description of what the associated
port driver (for example, PADRIVER, PBDRIVER, PEDRIVER) does, and the
suggested action a system manager should take. In cases where you are
advised to contact your HP support representative and save crash
dumps, it is important to capture the crash dumps as soon as possible
after the error. For CI entries, note that path A and path 0 are the
same path, and that path B and path 1 are the same path.
Table C-6 lists error-log messages.
Table C-6 Port Messages for All Devices
Message |
Result |
User Action |
BIIC FAILURE
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
Contact your HP support representative.
|
11/750 CPU MICROCODE NOT ADEQUATE FOR PORT
|
The port driver sets the port off line with no retries attempted. In
addition, if this port is needed because the computer is booted from an
HSC subsystem or is participating in a cluster, the computer bugchecks
with a UCODEREV code bugcheck.
|
Read the appropriate section in the current OpenVMS Cluster Software
SPD for information on required computer microcode revisions. Contact
your HP support representative, if necessary.
|
PORT MICROCODE REV NOT CURRENT, BUT SUPPORTED
|
The port driver detected that the microcode is not at the current
level, but the port driver will continue normally. This error is logged
as a warning only.
|
Contact your HP support representative when it is convenient to have
the microcode updated.
|
DATAGRAM FREE QUEUE INSERT FAILURE
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
Contact your HP support representative. This error is caused by a
failure to obtain access to an interlocked queue. Possible sources of
the problem are CI hardware failures, or memory, SBI (11/780), CMI
(11/750), or BI (8200, 8300, and 8800) contention.
|
DATAGRAM FREE QUEUE REMOVE FAILURE
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
Contact your HP support representative. This error is caused by a
failure to obtain access to an interlocked queue. Possible sources of
the problem are CI hardware failures, or memory, SBI (11/780), CMI
(11/750), or BI (8200, 8300, and 8800) contention.
|
FAILED TO LOCATE PORT MICROCODE IMAGE
|
The port driver marks the device off line and makes no retries.
|
Make sure the console volume contains the microcode file CI780.BIN (for
the CI780, CI750, or CIBCI) or the microcode file CIBCA.BIN (for the
CIBCA-AA). Then reboot the computer.
|
HIGH PRIORITY COMMAND QUEUE INSERT FAILURE
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
Contact your HP support representative. This error is caused by a
failure to obtain access to an interlocked queue. Possible sources of
the problem are CI hardware failures, or memory, SBI (11/780), CMI
(11/750), or BI (8200, 8300, and 8800) contention.
|
MSCP ERROR LOGGING DATAGRAM RECEIVED
|
On receipt of an error message from the HSC subsystem, the port driver
logs the error and takes no other action. You should disable the
sending of HSC informational error-log datagrams with the appropriate
HSC console command because such datagrams take considerable space in
the error-log data file.
|
Error-log datagrams are useful to read only if they are not captured on
the HSC console for some reason (for example, if the HSC console ran
out of paper). This logged information duplicates messages logged on
the HSC console.
|
INAPPROPRIATE SCA CONTROL MESSAGE
|
The port driver closes the port-to-port virtual circuit to the remote
port.
|
Contact your HP support representative. Save the error logs and the
crash dumps from the local and remote computers.
|
INSUFFICIENT NON-PAGED POOL FOR INITIALIZATION
|
The port driver marks the device off line and makes no retries.
|
Reboot the computer with a larger value for NPAGEDYN or NPAGEVIR.
|
LOW PRIORITY CMD QUEUE INSERT FAILURE
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
Contact your HP support representative. This error is caused by a
failure to obtain access to an interlocked queue. Possible sources of
the problem are CI hardware failures, or memory, SBI (11/780), CMI
(11/750), or BI (8200, 8300, and 8800) contention.
|
MESSAGE FREE QUEUE INSERT FAILURE
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
Contact your HP support representative. This error is caused by a
failure to obtain access to an interlocked queue. Possible sources of
the problem are CI hardware failures, or memory, SBI (11/780), CMI
(11/750), or BI (8200, 8300, and 8800) contention.
|
MESSAGE FREE QUEUE REMOVE FAILURE
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
Contact your HP support representative. This error is caused by a
failure to obtain access to an interlocked queue. Possible sources of
the problem are CI hardware failures, or memory, SBI (11/780), CMI
(11/750), or BI (8200, 8300, and 8800) contention.
|
MICRO-CODE VERIFICATION ERROR
|
The port driver detected an error while reading the microcode that it
just loaded into the port. The driver attempts to reinitialize the
port; after 50 failed attempts, it marks the device off line.
|
Contact your HP support representative.
|
NO PATH-BLOCK DURING VIRTUAL CIRCUIT CLOSE
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
Contact your HP support representative. Save the error log and a crash
dump from the local computer.
|
NO TRANSITION FROM UNINITIALIZED TO DISABLED
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
Contact your HP support representative.
|
PORT ERROR BIT(S) SET
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
A maintenance timer expiration bit may mean that the
PASTIMOUT system parameter is set too low and should be increased,
especially if the local computer is running privileged user-written
software. For all other bits, call your HP support representative.
|
PORT HAS CLOSED VIRTUAL CIRCUIT
|
The port driver closed the virtual circuit that the local port opened
to the remote port.
|
Check the PPD$B_STATUS field of the error-log entry for the reason the
virtual circuit was closed. This error is normal if the remote computer
failed or was shut down. For PEDRIVER, ignore the PPD$B_OPC field
value; it is an unknown opcode.
If PEDRIVER logs a large number of these errors, there may be a
problem either with the LAN or with a remote system, or nonpaged pool
may be insufficient on the local system.
|
PORT POWER DOWN
|
The port driver halts port operations and then waits for power to
return to the port hardware.
|
Restore power to the port hardware.
|
PORT POWER UP
|
The port driver reinitializes the port and restarts port operations.
|
No action needed.
|
RECEIVED CONNECT WITHOUT PATH-BLOCK
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
Contact your HP support representative. Save the error log and a crash
dump from the local computer.
|
REMOTE SYSTEM CONFLICTS WITH KNOWN SYSTEM
|
The configuration poller discovered a remote computer with SCSSYSTEMID
and/or SCSNODE equal to that of another computer to which a virtual
circuit is already open.
|
Shut down the new computer as soon as possible. Reboot it with a unique
SCSSYSTEMID and SCSNODE. Do not leave the new computer up any longer
than necessary. If you are running a cluster, and two computers with
conflicting identity are polling when any other virtual circuit failure
takes place in the cluster, then computers in the cluster may shut down
with a CLUEXIT bugcheck.
|
RESPONSE QUEUE REMOVE FAILURE
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
Contact your HP support representative. This error is caused by a
failure to obtain access to an interlocked queue. Possible sources of
the problem are CI hardware failures, or memory, SBI (11/780), CMI
(11/750), or BI (8200, 8300, and 8800) contention.
|
SCSSYSTEMID MUST BE SET TO NON-ZERO VALUE
|
The port driver sets the port off line without attempting any retries.
|
Reboot the computer with a conversational boot and set the SCSSYSTEMID
to the correct value. At the same time, check that SCSNODE has been set
to the correct nonblank value.
|
SOFTWARE IS CLOSING VIRTUAL CIRCUIT
|
The port driver closes the virtual circuit to the remote port.
|
Check error-log entries for the cause of the virtual circuit closure.
Faulty transmission or reception on both paths, for example, causes
this error and may be detected from the one or two previous error-log
entries noting bad paths to this remote computer.
|
SOFTWARE SHUTTING DOWN PORT
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
Check other error-log entries for the possible cause of the port
reinitialization failure.
|
UNEXPECTED INTERRUPT
|
The port driver attempts to reinitialize the port; after 50 failed
attempts, it marks the device off line.
|
Contact your HP support representative.
|
UNRECOGNIZED SCA PACKET
|
The port driver closes the virtual circuit to the remote port. If the
virtual circuit is already closed, the port driver inhibits datagram
reception from the remote port.
|
Contact your HP support representative. Save the error-log file that
contains this entry and the crash dumps from both the local and remote
computers.
|
VIRTUAL CIRCUIT TIMEOUT
|
The port driver closes the virtual circuit that the local CI port
opened to the remote port. This closure occurs if the remote computer
is running CI microcode Version 7 or later, and if the remote computer
has failed to respond to any messages sent by the local computer.
|
This error is normal if the remote computer has halted, failed, or was
shut down. This error may mean that the local computer's TIMVCFAIL
system parameter is set too low, especially if the remote computer is
running privileged user-written software.
|
INSUFFICIENT NON-PAGED POOL FOR VIRTUAL CIRCUITS
|
The port driver closes virtual circuits because of insufficient pool.
|
Enter the DCL command SHOW MEMORY to determine pool requirements, and
then adjust the appropriate system parameters.
|
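Many entries in Table C-6 share the same driver response: reinitialize the port and, after 50 failed attempts, mark the device off line. The following Python sketch models only that retry limit. The function name and the `try_init` callback are hypothetical illustrations, not part of any port driver interface.

```python
from typing import Callable

MAX_REINIT_ATTEMPTS = 50  # retry limit stated in Table C-6


def reinitialize_port(try_init: Callable[[], bool],
                      max_attempts: int = MAX_REINIT_ATTEMPTS) -> str:
    """Model of the retry rule: return "online" as soon as one attempt
    succeeds, or "offline" once max_attempts attempts have all failed.

    try_init is a hypothetical callback standing in for one
    reinitialization attempt; it returns True on success.
    """
    for _ in range(max_attempts):
        if try_init():
            return "online"
    return "offline"
```

A port that never recovers ends up off line after exactly 50 calls to the callback, which is the point at which the error-log entries above advise contacting support or checking other entries.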
The descriptions in Table C-7 apply only to LAN devices.
Table C-7 Port Messages for LAN Devices
Message |
Completion Status |
Explanation |
User Action |
FATAL ERROR DETECTED BY DATALINK
|
First longword SS$_NORMAL (00000001), second longword (00001201)
|
The LAN driver stopped the local area OpenVMS Cluster protocol on the
device. This completion status is returned when the SYS$LAVC_STOP_BUS
routine completes successfully. The SYS$LAVC_STOP_BUS routine is called
either from within the LAVC$STOP_BUS.MAR program found in SYS$EXAMPLES
or from a user-written program. The local area OpenVMS Cluster protocol
remains stopped on the specified device until the SYS$LAVC_START_BUS
routine executes successfully. The SYS$LAVC_START_BUS routine is called
from within the LAVC$START_BUS.MAR program found in SYS$EXAMPLES or
from a user-written program.
|
If the protocol on the device was stopped inadvertently, then restart
the protocol by assembling and executing the LAVC$START_BUS program
found in SYS$EXAMPLES. Otherwise, this error message can be safely
ignored.
Reference: See Appendix D for an explanation of the
local area OpenVMS Cluster sample programs.
|
|
First longword is any value other than (00000001), second longword
(00001201)
|
The LAN driver has shut down the device because of a fatal error and is
returning all outstanding transmits with SS$_OPINCOMPL. The LAN device
is restarted automatically.
|
Infrequent occurrences of this error are typically not a problem. If
the error occurs frequently or is accompanied by loss or
reestablishment of connections to remote computers, there may be a
hardware problem. Check for the proper LAN adapter revision level or
contact your HP support representative.
|
|
First longword (undefined), second longword (00001200)
|
The LAN driver has restarted the device successfully after a fatal
error. This error-log message is usually preceded by a FATAL ERROR
DETECTED BY DATALINK error-log message whose first completion status
longword is anything other than 00000001 and whose second completion
status longword is 00001201.
|
No action needed.
|
TRANSMIT ERROR FROM DATALINK
|
SS$_OPINCOMPL (000002D4)
|
The LAN driver is in the process of restarting the data link because an
error forced the driver to shut down the controller and all users (see
FATAL ERROR DETECTED BY DATALINK).
|
|
|
SS$_DEVREQERR (00000334)
|
The LAN controller tried to transmit the packet 16 times and failed
because of defers and collisions. This condition indicates that LAN
traffic is heavy.
|
|
|
SS$_DISCONNECT (0000204C)
|
There was a loss of carrier during or after the transmit. This includes
transmit attempts when the link is down.
|
The port emulator automatically recovers from any of these errors, but
many such errors indicate either that the LAN controller is faulty or
that the LAN is overloaded. If you suspect either of these conditions,
contact your HP support representative.
|
INVALID CLUSTER PASSWORD RECEIVED
|
|
A computer is trying to join the cluster using the correct cluster
group number for this cluster but an invalid password. The port
emulator discards the message. The probable cause is that another
cluster on the LAN is using the same cluster group number.
|
Provide all clusters on the same LAN with unique cluster group numbers.
|
NISCS PROTOCOL VERSION MISMATCH RECEIVED
|
|
A computer is trying to join the cluster using a version of the cluster
LAN protocol that is incompatible with the one in use on this cluster.
|
Install a version of the operating system that uses a compatible
protocol, or change the cluster group number so that the computer joins
a different cluster.
|
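The completion-status values in Table C-7 can be read mechanically. The following Python sketch maps the longword values listed in the table to short descriptions. The function names and the wording of the returned strings are illustrative assumptions; only the numeric values and status names come from the table.

```python
SS_NORMAL = 0x00000001  # SS$_NORMAL, as listed in Table C-7

# Completion status values for TRANSMIT ERROR FROM DATALINK (Table C-7).
TRANSMIT_STATUS = {
    0x000002D4: ("SS$_OPINCOMPL", "data link is restarting after a fatal error"),
    0x00000334: ("SS$_DEVREQERR", "16 transmit attempts failed (defers and collisions)"),
    0x0000204C: ("SS$_DISCONNECT", "loss of carrier during or after the transmit"),
}


def describe_fatal_error(first: int, second: int) -> str:
    """Interpret the two completion-status longwords of a
    FATAL ERROR DETECTED BY DATALINK entry."""
    if second == 0x00001200:
        # The first longword is undefined in this case, so it is ignored.
        return "device restarted successfully after a fatal error"
    if second == 0x00001201:
        if first == SS_NORMAL:
            return "cluster protocol stopped by SYS$LAVC_STOP_BUS"
        return "device shut down by a fatal error; restart is automatic"
    return "status not listed in Table C-7"


def describe_transmit_error(status: int) -> str:
    """Interpret the completion status of a TRANSMIT ERROR FROM DATALINK entry."""
    name, meaning = TRANSMIT_STATUS.get(
        status, ("unknown", "status not listed in Table C-7"))
    return f"{name}: {meaning}"
```

For example, `describe_fatal_error(0x1, 0x1201)` identifies the deliberate protocol stop, while any other first longword paired with 00001201 indicates the fatal-error shutdown.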
C.11 OPA0 Error-Message Logging and Broadcasting
Port drivers detect certain error conditions and attempt to log them.
The port driver attempts both OPA0 error broadcasting and standard
error logging under any of the following circumstances:
- The system disk has not yet been mounted.
- The system disk is undergoing mount verification.
- During mount verification, the system disk drive contains the wrong
volume.
- Mount verification for the system disk has timed out.
- The local computer is participating in a cluster, and quorum has
been lost.
Note the implicit assumption that the system and error-logging devices
are one and the same.
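The circumstances listed above can be summarized as a single predicate. The following Python sketch is a model only: the `SystemDiskState` class and its field names are hypothetical and do not correspond to any OpenVMS data structure.

```python
from dataclasses import dataclass


@dataclass
class SystemDiskState:
    """Hypothetical snapshot of the conditions listed above."""
    mounted: bool = True
    in_mount_verification: bool = False
    wrong_volume: bool = False            # wrong volume found during mount verification
    mount_verification_timed_out: bool = False
    in_cluster: bool = False
    quorum_lost: bool = False


def error_log_device_inaccessible(state: SystemDiskState) -> bool:
    """Return True if any listed circumstance holds, that is, when the
    port driver attempts OPA0 broadcasting as well as standard logging."""
    return (
        not state.mounted
        or state.in_mount_verification
        or state.wrong_volume
        or state.mount_verification_timed_out
        or (state.in_cluster and state.quorum_lost)
    )
```

The quorum condition applies only to a computer that is participating in a cluster, which is why the sketch combines the last two fields.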
The following table describes error-logging methods and their
reliability.
Method |
Reliability |
Comments |
Standard error logging to an error-logging device.
|
Under some circumstances, attempts to log errors to the error-logging
device can fail. Such failures can occur because the error-logging
device is not accessible when attempts are made to log the error
condition.
|
Because of the central role that the port device plays in clusters, the
loss of error-logged information in such cases makes it difficult to
diagnose and fix problems.
|
Broadcasting selected information about the error condition to OPA0.
(This is in addition to the port driver's attempt to log the error
condition to the error-logging device.)
|
This method of reporting errors is not entirely reliable, because some
error conditions may not be reported due to the way OPA0 error
broadcasting is performed. This situation occurs whenever a second
error condition is detected before the port driver has been able to
broadcast the first error condition to OPA0. In such a case, only the
first error condition is reported to OPA0, because that condition is
deemed to be the more important one.
|
This second, redundant method of error logging captures at least some
of the information about port-device error conditions that would
otherwise be lost.
|
Note: Certain error conditions are always broadcast to
OPA0, regardless of whether the error-logging device is accessible. In
general, these are errors that cause the port to shut down either
permanently or temporarily.
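The limitation described in the table (a second error detected before the first has been broadcast is not reported to OPA0) behaves like a single-slot buffer. The following Python sketch models that behavior under that assumption. The class and method names are hypothetical, and the actual driver mechanism may differ.

```python
from typing import List, Optional


class Opa0Broadcaster:
    """Single-slot model: at most one error condition can be pending."""

    def __init__(self) -> None:
        self._pending: Optional[str] = None
        self.console: List[str] = []  # stands in for OPA0 output

    def report(self, message: str) -> bool:
        """Queue a message for broadcast. Return False if an earlier
        message is still pending, in which case this one is dropped."""
        if self._pending is not None:
            return False
        self._pending = message
        return True

    def flush(self) -> None:
        """Broadcast the pending message, if any, and free the slot."""
        if self._pending is not None:
            self.console.append(self._pending)
            self._pending = None
```

In this model, two calls to `report` without an intervening `flush` leave only the first message on the console, which matches the statement that the first error condition is deemed the more important one.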
C.11.1 OPA0 Error Messages
One OPA0 error message for each error condition is always logged. The
text of each error message is similar to the text in the summary
displayed by formatting the corresponding standard error-log entry
using the Error Log utility. (See Section C.10.5 for a list of Error Log
utility summary messages and their explanations.)
Table C-8 lists the OPA0 error messages. The table is divided into
units by error type. Many of the OPA0 error messages contain some
optional information, such as the remote port number, CI packet
information (flags, port operation code, response status, and port
number fields), or specific CI port registers. The codes in the second
column specify whether the message is always logged on OPA0 (Logged) or
is logged only when the system device is inaccessible (Inaccessible).
Table C-8 OPA0 Messages
Error Message |
Logged or Inaccessible |
Software Errors During Initialization |
%PEA0, Configuration data for IP cluster not found
|
Logged
|
%Pxxn, Insufficient Non-Paged Pool for Initialization
|
Logged
|
%Pxxn, Failed to Locate Port Micro-code Image
|
Logged
|
%Pxxn, SCSSYSTEMID has NOT been set to a Non-Zero Value
|
Logged
|
Hardware Errors |
%Pxxn, BIIC failure---BICSR/BER/CNF xxxxxx/xxxxxx/xxxxxx
|
Logged
|
%Pxxn, Micro-code Verification Error
|
Logged
|
%Pxxn, Port Transition Failure---CNF/PMC/PSR xxxxxx/xxxxxx/xxxxxx
|
Logged
|
%Pxxn, Port Error Bit(s) Set---CNF/PMC/PSR xxxxxx/xxxxxx/xxxxxx
|
Logged
|
%Pxxn, Port Power Down
|
Logged
|
%Pxxn, Port Power Up
|
Logged
|
%Pxxn, Unexpected Interrupt---CNF/PMC/PSR xxxxxx/xxxxxx/xxxxxx
|
Logged
|
Queue Interlock Failures |
%Pxxn, Message Free Queue Remove Failure
|
Logged
|
%Pxxn, Datagram Free Queue Remove Failure
|
Logged
|
%Pxxn, Response Queue Remove Failure
|
Logged
|
%Pxxn, High Priority Command Queue Insert Failure
|
Logged
|
%Pxxn, Low Priority Command Queue Insert Failure
|
Logged
|
%Pxxn, Message Free Queue Insert Failure
|
Logged
|
%Pxxn, Datagram Free Queue Insert Failure
|
Logged
|
Cable Change-of-State Notification |
%Pxxn, Path #0. Has gone from GOOD to BAD---REMOTE PORT[1] xxx
|
Inaccessible
|
%Pxxn, Path #1. Has gone from GOOD to BAD---REMOTE PORT[1] xxx
|
Inaccessible
|
%Pxxn, Path #0. Has gone from BAD to GOOD---REMOTE PORT[1] xxx
|
Inaccessible
|
%Pxxn, Path #1. Has gone from BAD to GOOD---REMOTE PORT[1] xxx
|
Inaccessible
|
%Pxxn, Cables have gone from UNCROSSED to CROSSED---REMOTE PORT[1] xxx
|
Inaccessible
|
%Pxxn, Cables have gone from CROSSED to UNCROSSED---REMOTE PORT[1] xxx
|
Inaccessible
|
%Pxxn, Path #0. Loopback has gone from GOOD to BAD---REMOTE PORT[1] xxx
|
Logged
|
%Pxxn, Path #1. Loopback has gone from GOOD to BAD---REMOTE PORT[1] xxx
|
Logged
|
%Pxxn, Path #0. Loopback has gone from BAD to GOOD---REMOTE PORT[1] xxx
|
Logged
|
%Pxxn, Path #1. Loopback has gone from BAD to GOOD---REMOTE PORT[1] xxx
|
Logged
|
%Pxxn, Path #0. Has become working but CROSSED to Path #1.--- REMOTE PORT[1] xxx
|
Inaccessible
|
%Pxxn, Path #1. Has become working but CROSSED to Path #0.--- REMOTE PORT[1] xxx
|
Inaccessible
|
[1] If the port driver can identify the remote SCS node name of
the affected computer, the driver replaces the "REMOTE PORT
xxx" text with "REMOTE SYSTEM X...", where X...
is the value of the system parameter SCSNODE on the remote computer. If
the remote SCS node name is not available, the port driver uses the
existing message format.
Key to CI Port Registers:
CNF---configuration register
PMC---port maintenance and control register
PSR---port status register
See also the CI hardware documentation for a detailed description
of the CI port registers.
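Several hardware-error messages in Table C-8 carry register values in the form "CNF/PMC/PSR xxxxxx/xxxxxx/xxxxxx". The following Python sketch splits such a line into a device name, message text, and named register values. It rests on assumptions that the table does not confirm: that the values are hexadecimal, that the device name is "P" followed by two letters and a unit number, and that one or more hyphens separate the text from the register list. The sample line used with it is hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed layout: %<device>, <text>---<NAME/NAME/...> <value/value/...>
_PATTERN = re.compile(
    r"^%(?P<device>P[A-Z]{2}\d+),\s*(?P<text>.*?)\s*-+\s*"
    r"(?P<names>[A-Z]+(?:/[A-Z]+)*)\s+"
    r"(?P<values>[0-9A-Fa-f]+(?:/[0-9A-Fa-f]+)*)\s*$"
)


def parse_register_message(line: str) -> Optional[Dict[str, object]]:
    """Split an OPA0 message that carries port register values.

    Returns None if the line does not match the assumed layout or if
    the number of register names and values differ.
    """
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    names = match.group("names").split("/")
    values = match.group("values").split("/")
    if len(names) != len(values):
        return None
    return {
        "device": match.group("device"),
        "text": match.group("text"),
        # Values are interpreted as hexadecimal (an assumption).
        "registers": {n: int(v, 16) for n, v in zip(names, values)},
    }
```

Messages without a register list, such as "%Pxxn, Port Power Down", do not match the pattern and yield `None`, so callers can fall back to treating the line as plain text.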