This article describes the HP Services tool,
Revision and Configuration Management (RCM), which collects detailed system
configuration information from HP systems at customer sites. The data is stored
on the RCM server at HP and is used to create configuration, change,
comparison, and analysis reports that the customer and HP account team can
access through the Electronic Site Management Guide (eSMG). Customers with valid service
contracts access their own information using encrypted (HTTPS) connections.
RCM is available for OpenVMS VAX and Alpha systems from Version
6.2 through Version 7.3-2, as well as for HP Tru64 UNIX, Windows NT/2000 and HP-UX
systems. This paper describes the RCM architecture and focuses on the design of
the RCM OpenVMS collector. It describes the main features of the RCM reports,
such as recommendations for critical patches as well as detailed information
including the following:
- Disk and tape devices on the system
- Installed software and patches
- Firmware revision levels; hardware part numbers and revision levels
- SAN controller details and topology map
- Enterprise Virtual Array (EVA) configuration reports.
By collecting system configuration information on a regular schedule, RCM change
reports make it possible for the customer and HP Service personnel to quickly
diagnose problems that configuration changes might have introduced.
RCM, which was developed by the HP Mission Critical and Proactive
Services group, has been deployed on thousands of OpenVMS systems worldwide and
is a key part of the HP Services portfolio.
Figure 1 illustrates the overall RCM architecture. The numbered steps below correspond to the numbers in the figure.
1. The RCM data collector is installed on a system
at a customer site and gathers selected information from the system software.
2. The collected information is transported to the
RCM server at HP using DSNlink, email, or FTP. DSNlink is currently being
retired as a transport mechanism; future releases of RCM will use the
HP Instant Support Enterprise Edition (ISEE) to transport RCM data securely
from the customer site to HP.
3. At the RCM server, the parser-loader software processes
the collected information, verifying that the information conforms to the
necessary standard (system type, revision, format, and so on) before loading it
into the Configuration Snapshot Repository (CSR) database. This database is
used as the source of information for various RCM reports. The Configuration
Reference Database (CRDT), an additional database that is maintained on the RCM
server, contains product reference information such as current hardware
revisions.
4. The Electronic Site Management Guide (eSMG)
serves as a repository for RCM reports, system healthcheck reports,
availability statistics, contact information, and any additional site-related
documents that the customer wants to store.
5. Customers and HP Technical Account Managers
(TAMs) can access eSMG through a web-based user interface to request reports
for specific customer systems. These reports can be used to track historical
changes or to make comparisons with a reference system.
Figure 1 RCM Architecture
The Electronic Site Management Guide
incorporates security features to ensure that customer information stored in
the eSMG remains confidential. Each customer's information is accessible only
by authorized members of the HP Mission-Critical and HP support team in charge
of that customer. Customer access to the eSMG is restricted to customers who have
valid service contracts.
Authorized customers can access their data from
the Internet through a secured port using the SSL 3.0 protocol and a 128-bit key.
All information sent and received over the Internet is encrypted. Customers
access their personal information using a personal username and password that
an HP representative provides. By default, Internet access to customer
information is set to read-only mode.
Initially, only the Technical Account
Manager assigned to a customer and the eSMG
administrator have access to customer data. Granting access to the customer
information to additional HP Services employees requires the approval of the
TAM.
Access IDs
All data stored in the
RCM Server is secure. The RCM server and eSMG use access identifiers to
identify data collections and to control access to these collections. An
identifier is assigned to a system when the RCM collector is configured. As a
result, every RCM data collection has an associated access identifier, which
represents the system, or group of systems, from which data was collected.
All RCM reports associated with that
access identifier are available in eSMG within one hour of the time data is
sent from a customer's system.
The RCM OpenVMS collector consists of a number of Digital
Command Language (DCL) command procedures as well as several OpenVMS VAX and
Alpha binary executables that parts of the collection require. One node
initiates the collection process for all the selected nodes. This node manages
the overall collection process and sends the collected data to the RCM server
at HP.
The RCM OpenVMS data
collector is available as a free download from the HP software depot site and
is installed using the POLYCENTER Software Installation Utility. You use the DCL command PRODUCT INSTALL RCM
to install the kit. By default, RCM will be installed on the system disk, but
you can install it on another disk by specifying the /DESTINATION=device_name:[directory_name]
qualifier.
To de-install RCM, you use the DCL command PRODUCT REMOVE RCM.
Cluster Installation
You usually need to install RCM on only one node in a
cluster, and it can collect data from all nodes. The only requirement is that
each node must have read and write access to the disk on which RCM is
installed.
The RCM_START.COM
procedure initiates the collection process. Execute this procedure on one node
in a cluster.
When you run RCM_START
for the first time, as part of the installation procedure, it prompts for
configuration information that is written to a configuration file in the
RCM$DATA directory. By default, the configuration file is named <nodename>.CFG.
The RCM_START
procedure also asks you to select the nodes from which to collect data, and it
tries to obtain the serial numbers of the selected systems. On VAX systems and
some older Alpha models, it is not possible to obtain the serial number from
the operating system; therefore, RCM prompts for the serial number. Because RCM
and eSMG use the serial number as part of the identifying information, it is
important to enter the number correctly. On newer Alpha systems, the serial
number is obtained from the system using the output from the SHOW CPU/FULL
command, which is included in the RCM_START procedure.
The RCM configuration
file specifies all aspects of the RCM collection including the following:
- Selected nodes and serial numbers
- Collection frequency
- Date and time of next collection
- Transport option
- SAN switch data collection
- EVA data collection
- Run RCM on reboot
- Number of data collections to archive on the system
You can edit the
configuration file using an editor or the built-in editing feature of
RCM_START.
Example 1 shows a sample RCM configuration file.
Example 1 RCM Configuration File
The RCM_START
procedure starts a detached process, which runs the RCM_COLLECT.COM procedure,
on each selected node in the cluster. The process name of each detached process
is "RCM_COLLECT". The data for each system is written to a text file in the
RCM$DATA directory with a naming convention of
RCMO-<nodename>-yyyymmdd-hhmmss.TXT.
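Because the node name and collection time are encoded in the file name, a script that post-processes RCM data files can recover both without opening the file. The following is a minimal Python sketch of that idea, not part of the RCM kit. The function name is hypothetical, and the sketch assumes that the node name contains no hyphen and that an OpenVMS version suffix such as ;1 may follow the file type.

```python
import re
from datetime import datetime

# RCMO-<nodename>-yyyymmdd-hhmmss.TXT, optionally followed by an
# OpenVMS file version such as ";1" (an assumption of this sketch).
_NAME_RE = re.compile(
    r"^RCMO-(?P<node>[^-]+)-(?P<date>\d{8})-(?P<time>\d{6})\.TXT(?:;\d+)?$",
    re.IGNORECASE,
)

def parse_data_file_name(name):
    """Return (nodename, collection datetime), or None if the name does not match."""
    match = _NAME_RE.match(name.strip())
    if match is None:
        return None
    try:
        stamp = datetime.strptime(
            match.group("date") + match.group("time"), "%Y%m%d%H%M%S"
        )
    except ValueError:
        # Digits were present but do not form a valid date or time.
        return None
    return match.group("node").upper(), stamp
```

For example, parse_data_file_name("RCMO-ALPHA1-20040427-025232.TXT") yields the node name ALPHA1 and the timestamp 2004-04-27 02:52:32.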
The RCM_COLLECT
process on the node that initiated the RCM collection is responsible for
managing the overall collection. When
all the other collection processes have completed, the initial node's collect
process adds each system's RCM data files to a ZIP archive, which is then
transported to HP using the selected transport option.
If email is selected
as the transport option, the ZIP file is first converted to a text file using
the uuencode utility included in the RCM kit; it is then emailed to HP using
VMSMAIL. If no transport option is selected, the user should send the data to
the RCM server at HP using the most convenient option, such as sending the ZIP
file as an attachment to an email or using FTP to send the data. The size of an
RCM collection ZIP file depends on many factors, such as the number of nodes
and devices and whether SAN and EVA collection is enabled. The size of the
archive is usually less than 1 MB.
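The archive-then-encode step can be illustrated outside OpenVMS. The following Python sketch packs data files into a ZIP archive and converts the archive to uuencoded text suitable for a mail body. It only illustrates the technique: the RCM collector uses its own ZIP and uuencode utilities, and the function names and the archive name RCM.ZIP used here are hypothetical.

```python
import binascii
import io
import zipfile

def build_zip(files):
    """Pack {file name: bytes} into an in-memory ZIP archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()

def uuencode(data, name, mode="644"):
    """Return uuencoded text for data, 45 input bytes per body line."""
    lines = ["begin %s %s" % (mode, name)]
    for offset in range(0, len(data), 45):
        chunk = data[offset:offset + 45]
        # backtick=True encodes zero as "`" instead of a space, so mail
        # software that trims trailing blanks cannot damage a line.
        encoded = binascii.b2a_uu(chunk, backtick=True)
        lines.append(encoded.decode("ascii").rstrip("\n"))
    lines.append("`")  # zero-length line marks the end of the data
    lines.append("end")
    return "\n".join(lines) + "\n"

def uudecode(text):
    """Reverse uuencode(): skip the begin line and the two trailer lines."""
    body = text.splitlines()[1:-2]
    return b"".join(binascii.a2b_uu(line) for line in body)
```

Decoding the text on the receiving side returns the original ZIP bytes, which is why a binary archive can travel through a text-only mail transport.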
The RCM collected data
can also be sent to another system at the customer site, using either FTP or
email, by setting the LOCAL SITE option to Y and entering the transport details.
You can use this method to centralize all site RCM data in one location. You might also need
this method to move data from a system without an Internet connection to a
gateway node for later transport to HP.
Scheduling Collections of Data
The required
collection schedule for RCM data is a configuration option. The default setting
is to collect RCM data once a month, but you can choose daily, weekly, or
quarterly collections instead. If you do not need a regular schedule, you can
configure RCM to collect information only on demand. After you set up a
schedule, a detached process is started, and the process hibernates until the
next scheduled time. During the collection process, the date for the next
collection is updated, and another detached process is started.
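The date arithmetic behind such a schedule can be sketched as follows. This Python example is an illustration only: the article does not describe how RCM computes the next date, so the frequency keywords and the month-end clamping rule shown here are assumptions.

```python
import calendar
from datetime import datetime, timedelta

def add_months(moment, months):
    """Shift a datetime by whole months, clamping the day in shorter months."""
    index = moment.year * 12 + (moment.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))

def next_collection(last, frequency):
    """Return the next collection time for a hypothetical frequency keyword."""
    frequency = frequency.lower()
    if frequency == "daily":
        return last + timedelta(days=1)
    if frequency == "weekly":
        return last + timedelta(weeks=1)
    if frequency == "monthly":
        return add_months(last, 1)
    if frequency == "quarterly":
        return add_months(last, 3)
    raise ValueError("unsupported frequency: %r" % frequency)
```

Under these assumptions, a monthly collection that ran on 31 January 2004 is next due on 29 February 2004, because the day is clamped to the last day of the shorter month.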
RCM adds an entry to
the SYSMAN startup database so that when a system reboots, the collection
schedule is preserved. To see the RCM entry, enter the following command:
$ MCR SYSMAN STARTUP SHOW FILE RCM/FULL
This command shows the
entry RCM$STARTUP_nodename.COM enabled on the node with RCM installed.
The RCM$STARTUP_nodename.COM procedure in SYS$STARTUP defines the
required RCM logicals and resets the collection schedule when the system
reboots. Note that only the system on which RCM was installed runs the RCM
procedure at startup.
Monitoring and Stopping an RCM Collection
The duration of an RCM
collection depends on the system type and configuration; however, it usually
takes less than fifteen minutes. You can monitor the progress of a collection
by executing the RCM$DIR:RCM_STATUS.COM procedure. This procedure shows the
current status of the collection on each system and the scheduled date and time
of the next collection.
You can use the
RCM$DIR:RCM_STOP procedure at any time to stop the collection process and to
cancel future collections. Running RCM_START again resets the scheduled
collections.
The RCM_COLLECT
process runs on each selected system and collects detailed system
information. Even for customers without
a service contract, the raw data file is a useful source of information for the
system manager.
The collected data
file is written as a series of tag-delimited sections as follows:
--- START RCM <section name> ---
--- END RCM <section name> ---
Some examples of
sections of the raw data file are shown in Example 2.
--- START RCM OPERATING SYSTEM ---
Operating System = VMS V7.3-1
Console Version = V6.6-1111
Palcode Version = 1.98-2
--- END RCM OPERATING SYSTEM ---
--- START RCM NODE ---
Node Host Name = ALPHA1
Node System Type = AlphaServer GS160 6/1224
Node Domain Name =
--- END RCM NODE ---
Example 2 Raw Data Sections
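Because every section is bracketed by START and END tags, the raw data file is straightforward to process with a script. The following Python sketch, which is not part of RCM, splits a raw data file into sections and then splits the name = value lines of a section into a dictionary. The function names are hypothetical, and the sketch assumes that section names are unique within a file.

```python
import re

_START = re.compile(r"^--- START RCM (?P<name>.+?) ---$")
_END = re.compile(r"^--- END RCM (?P<name>.+?) ---$")

def parse_sections(text):
    """Map each section name to the list of lines between its START and END tags."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        start = _START.match(line)
        if start:
            current = start.group("name")
            sections[current] = []
        elif _END.match(line):
            current = None
        elif current is not None:
            sections[current].append(line)
    return sections

def parse_pairs(lines):
    """Split 'name = value' lines into a dict; lines without '=' are skipped."""
    pairs = {}
    for line in lines:
        if "=" in line:
            name, _, value = line.partition("=")
            pairs[name.strip()] = value.strip()
    return pairs
```

Applied to the sections in Example 2, parse_sections returns the section names OPERATING SYSTEM and NODE, and parse_pairs on the NODE section returns an empty string for the blank Node Domain Name entry.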
The following sections describe the types of data that RCM collects.
Configuration Tree Data (FRU)
A configuration tree is a memory structure containing
system hardware resource configuration and associated Field Replaceable Unit
(FRU) information on AlphaServer systems. The configuration tree, which is
built by the console firmware, is a permanent data structure in system memory;
it is also written to the binary error log. RCM uses two separate products,
DECevent and WEBES, to read the FRU data:
- DECevent
On AlphaServer systems with FRU Version 4.0 (models 1200, 4X00, 8X00, GS60,
GS140), DECevent is required to translate the FRU table. DECevent is a hardware
fault-management diagnostic tool that reads hardware configuration information
from an error log. RCM uses the OpenVMS Analyze/System command CLUE FRU to
generate a file with a dummy binary error log record for DECevent to
analyze. Note that no Product
Authorization Kit (PAK) is required for the error log translation feature of
DECevent.
- WEBES
The HP Service Tool WEBES is required on systems with FRU Version 5.0 to enable
RCM to collect additional hardware and firmware information from the
configuration tree. Some of the AlphaServer systems supporting FRU Version 5.0
are models DS10, DS10L, DS20, DS20E, DS25, ES40, ES45, GS80, GS160, GS320, and
GS1280. RCM determines if a system is a FRU Version 5.0 system by checking the
value that F$GETSYI("SYSTYPE") returns. Alpha systems with a system type higher
than 33 are FRU Version 5.0 systems.
Note that the "Desta Director" process
does not need to be running for RCM Version 4.2 to collect this data.
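The system type rule stated above can be written as a one-line check. The following Python sketch is illustrative only; the function and constant names are hypothetical, and the input is assumed to be the numeric value that F$GETSYI("SYSTYPE") returns on an Alpha system, passed either as an integer or as a string.

```python
# Threshold taken from the text: Alpha system types higher than 33
# are FRU Version 5.0 systems.
FRU_V5_MIN_SYSTYPE = 34

def is_fru_v5(systype):
    """Return True if an Alpha SYSTYPE value indicates a FRU Version 5.0 system."""
    return int(systype) >= FRU_V5_MIN_SYSTYPE
```

A collector script could use such a check to decide whether WEBES is needed for the configuration tree data.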
The collected
configuration tree data is written to the RCM data file as a series of
name=value pairs as shown in the short extract in Example 3.
Clue Configuration Data
The collector uses the
Analyze/System SDA command CLUE CONFIG to obtain detailed system, memory, and
device configuration information.
Clue SCSI Data
RCM uses the
Analyze/System SDA command CLUE SCSI/SUMMARY to collect SCSI configuration data,
including all ports, targets, and connections with attached devices. Device
types and hardware revision information are included.
Installed Software Product Information
RCM uses the DCL
command PRODUCT SHOW PRODUCT/FULL to find which products and patches have been
installed using the DCL command PRODUCT INSTALL. RCM also searches the
VMSINSTAL.HISTORY file to find any products installed using the older
installation tool VMSINSTAL.
License Information
RCM uses the OpenVMS
DCL command SHOW LICENSE/UNIT_REQUIREMENT to collect details of licenses loaded
on the system.
Device Information
To collect detailed
information about all disk and tape devices attached to the system, RCM uses
the DCL command SHOW DEVICE/FULL and the lexical F$GETDVI with various item
codes.
Network Configuration Information
Detailed network
configuration information is usually collected by displaying hardware and IP
addresses. Customers who do not want to
display this information can skip this section by setting the [SUPPRESS IP ADDRESSES] option in the
configuration file to Y.
Hard and Soft Partition Information
RCM can collect
details of the hard and soft partitions for AlphaServer systems. Hard
partitioning is a physical separation of computing resources by
hardware-enforced access barriers. No
resource sharing exists between hard partitions. Soft partitioning is a
separation of computing resources by software-controlled access barriers. Read
and write access across a soft partition boundary is controlled by the
operating system. OpenVMS Galaxy is an implementation of soft partitioning.
Hard and soft partition information, which RCM and eSMG require to properly identify
a system, is found in configuration tree data.
If a system does not support partitions, the hard and soft partition
values are set to -1. Example 4 shows
the hard and soft partition information in the raw data file for a typical
system:
Example 4 Collected Hard/Soft Partition IDs
Example 5 shows a
section of an RCM configuration report with platform partition information.
Hard Partition Id | Soft Partition Id | Instance Name | Instance OS | Date of Last Collection
0 | 0 | ALPHA1 | VMS V7.3-1 | 2004-04-27 02:52:32
0 | 1 | ALPHA2 | VMS V7.3-1 | 2004-04-27 02:58:31
0 | 2 | ALPHA3 | VMS V7.3-1 | 2004-04-27 03:01:26
Example 5 Platform Partition Table from an RCM Report
GALAXY and RAD Data
For systems with
OpenVMS Galaxy software, additional information is collected using the
following F$GETSYI Galaxy item codes:
galaxy_platform, galaxy_member, galaxy_id, community_id, partition_id
RCM uses the
Analyze/System SDA command SHOW GALAXY to capture the state of all the
instances in the Galaxy configuration and collects
Resource Affinity Domain (RAD) data using the following F$GETSYI item codes:
rad_max_rads, rad_cpus, rad_memsize and rad_shmemsize
The output from the
following command, which shows how much memory is mapped in each RAD, is also
collected:
$ MCR SYS$TEST:RADCHECK.EXE -allprocs
ProPatch Data
RCM collects input
data for the separate tool ProPatch and writes it to a .LIF file. The RCM
server automatically submits the LIF file to the ProPatch server, which returns
a report that recommends installing specific patches. This functionality is
derived from the former Digital Equipment Corporation tools DASC and LIFE.
The LIF file consists
of a listing of all the executable images in several system directories such as
SYS$SYSTEM, SYS$LOADABLE_IMAGES and SYS$SHARE. Each entry in the LIF file shows
the image name, link time, and version identification. This information is
extracted from the image header.
The ProPatch server
uses the image link times to determine if any patches are required. See Example 6 and Example
7 for portions of a typical .LIF file and the
corresponding ProPatch report, which recommends installing a critical patch
based on the link time of a system executable file. In this example, ProPatch
recommends installing the critical patch VMS731_XFC-V0200 because it has
detected that the system has an old version of SYS$SHARE:ALPHA_XF$SDA.EXE.
Example 6 Section of a .LIF File
Example 7 Portion of a ProPatch Report
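The link-time comparison can be pictured with a small sketch. The following Python example is a hypothetical illustration, not the ProPatch algorithm: the article states only that image link times are used, so the rule shown here (recommend a patch when the installed image was linked before the patched image) and all dates in the usage note are assumptions.

```python
from datetime import datetime

def recommend_patches(installed, patched):
    """Return (patch name, image name) pairs for images older than their patch.

    installed: {image name: link time of the image on the system}
    patched:   {image name: (patch name, link time of the patched image)}
    """
    recommendations = []
    for image, (patch, patched_time) in sorted(patched.items()):
        link_time = installed.get(image)
        # Images that are not on the system cannot trigger a recommendation.
        if link_time is not None and link_time < patched_time:
            recommendations.append((patch, image))
    return recommendations
```

With an installed link time for SYS$SHARE:ALPHA_XF$SDA.EXE that is earlier than the link time shipped in VMS731_XFC-V0200, this sketch would report that patch, which mirrors the recommendation described above.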
Storage Controller Information
RCM collects
information from storage controllers using the HP StorageWorks Command Scripter
utility, which is bundled with the RCM collector. This is a command-line tool
that can safely interrogate a storage controller. Earlier versions of RCM used
HSZTERM, which is no longer supported.
The Command Scripter
command-line interface (CLI) command '-j subsysdata' is used to find
available HSJ devices, and the '-f subsysdata' command is used to obtain
a list of available HSZ and HSG storage controllers.
The following Command Scripter
CLI commands are used for HSJ devices:
SHOW THIS_CONTROLLER FULL
SHOW OTHER_CONTROLLER FULL
SHOW STORAGESETS FULL
SHOW FAILEDSET FULL
SHOW SPARESETS FULL
SHOW UNITS FULL
SHOW DEVICES FULL
For HSG and HSZ
devices, the following commands are used:
SHOW THIS_CONTROLLER FULL
SHOW ASSOCIATIONS FULL
SHOW CONCATSETS FULL
SHOW CONNECTIONS FULL
SHOW REMOTE_COPY FULL
SHOW STORAGESETS FULL
SHOW UNITS FULL
SHOW DEVICES FULL
SHOW OTHER_CONTROLLER FULL
SHOW EMU
If the Command Console
LUN (CCL) is enabled at the console, a CCL device is used to access a HSG80
controller. Otherwise, one of the attached devices is used to access the
controller. The CCL devices have device names such as $1$GGAnnnn.
The firmware revision
of the storage controllers is in the output of the SHOW THIS_CONTROLLER command,
as shown in Example 8.
--- START RCM HSG ---
SANworks Command Scripter V1.0B Build 076
SHOW THIS_CONTROLLER FULL
Controller:
HSG80 ZG11305292 Software V87F-1, Hardware E12
NODE_ID = 5000-1FE1-0011-8F20
ALLOCATION_CLASS = 0
SCSI_VERSION = SCSI-3
Configured for MULTIBUS_FAILOVER with ZG11305742
In dual-redundant configuration
Example 8 SHOW THIS_CONTROLLER Output
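The controller identification line in this output has a regular shape, so the model, serial number, and software and hardware revisions can be extracted with a pattern match. The following Python sketch assumes the line layout shown in Example 8; the function name is hypothetical, and other controller firmware versions might format the line differently.

```python
import re

# Matches a line such as:
#   HSG80 ZG11305292 Software V87F-1, Hardware E12
_CONTROLLER_RE = re.compile(
    r"^\s*(?P<model>HS[A-Z]\d+)\s+(?P<serial>\S+)\s+"
    r"Software\s+(?P<software>[^,\s]+),\s*Hardware\s+(?P<hardware>\S+)\s*$"
)

def parse_controller_line(lines):
    """Return the fields of the first controller identification line, or None."""
    for line in lines:
        match = _CONTROLLER_RE.match(line)
        if match:
            return match.groupdict()
    return None
```

For the output in Example 8, the sketch returns the model HSG80, serial number ZG11305292, software revision V87F-1, and hardware revision E12.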
To obtain the firmware revision of Fibre Channel Adapters (for example,
KGPSA), use the following Analyze/System command:
SDA> FC SHOW DEVICE FGxx:
Example 9 shows
typical output of this command in the raw data file.
--- START RCM SDA FC SHOW DEV FG ---
FGA0: operational firmware revision DS3.81A4
port_name(adapter_id) = 1000-0000-C92B-0098, node_name(host_id) = 2000-0000-C92B-0098
FGB0: operational firmware revision DS3.81A4
port_name(adapter_id) = 1000-0000-C92A-FF07, node_name(host_id) = 2000-0000-C92A-FF07
--- END RCM SDA FC SHOW DEV FG ---
Example 9 KGPSA Firmware Revision
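A report generator needs only the device name and firmware revision from this section. The following Python sketch pulls those two fields from lines shaped like the ones in Example 9. The function name is hypothetical, and the pattern assumes the wording "operational firmware revision" shown in the example.

```python
import re

# Matches a line such as:  FGB0: operational firmware revision DS3.81A4
_FIRMWARE_RE = re.compile(
    r"^\s*(?P<device>FG\w+):\s+operational firmware revision\s+(?P<revision>\S+)"
)

def adapter_firmware(lines):
    """Map each Fibre Channel adapter device name to its firmware revision."""
    revisions = {}
    for line in lines:
        match = _FIRMWARE_RE.match(line)
        if match:
            revisions[match.group("device")] = match.group("revision")
    return revisions
```

Lines that carry only port and node names do not match the pattern and are ignored.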
SAN Switch Data Collection
You can configure RCM
to collect configuration information from HP-supported switches that are used
in Storage Area Networks. To collect this data, RCM requires the IP address of
each SAN switch as well as its SNMP community string. The RCM collector issues
a passive request for data to each designated SAN switch using the SNMP
protocol.
This option is
available only on systems running the HP TCPIP product because the OpenVMS RCM
collector uses the TCPIP utility SYS$SYSTEM:TCPIP$SNMP_REQUEST.EXE to collect
the SAN Management Information Base (MIB) data shown in Example 10.
SAN configuration
reports are generated from the collected MIB data and are made available to
customers in the eSMG. Excerpts from sample reports are in the last sections of
this paper.
Example 10 SAN Switch MIB Data
EVA Data
RCM V5.0 can collect
configuration information from
HSV110-based Enterprise Virtual Array Storage Systems. To enable EVA
data collection, the RCM configuration file must have the EVA DATA COLLECTION
option set to Y.
RCM uses the Storage
System Scripting Utility (SSSU) to communicate with Command View EVA (also
called the Storage Element Manager), which runs on the SAN Appliance Manager. RCM
uses this connection to find out
which EVA storage systems, or cells, the appliance manager is responsible for
and to collect revision and
configuration information for a particular cell. Each cell represents, at a
logical level, all the components that make up the EVA storage system,
including cabinets, power supplies, disks, and controllers.
To communicate with
the Element Manager, the RCM collector requires the IP addresses and valid
access details for each SAN Appliance Manager. This information is stored in
the file RCM$DATA:EVA.INI using the format shown in Example 11.
Supported versions of
the Element Manager are Version 2 and Version 3. SSSU Version 2 is
required for Element Manager Version 2;
SSSU Version 3 is required for Element Manager Version 3. Both versions of the
OpenVMS Alpha SSSU utility are included in the RCM collector kit. It is
important to know the versions of Element Manager running on each appliance
manager so that you use the correct version of SSSU.
Example 11 EVA.INI with Details of a V3 Element Manager
SYSGEN Parameter Information
To collect OpenVMS
System Generation Utility (SYSGEN) parameters, RCM uses the SYSGEN commands
SHOW/ALL and SHOW/SPECIAL. The contents of the file SYS$SYSTEM:MODPARAMS.DAT
are also included in the RCM collected data.
The RCM Configuration
Report does not show all the SYSGEN parameters; however, the Change Report
shows any differences in the SYSGEN parameters between any two collections.
This can be useful in diagnosing problems that might be due to changed SYSGEN
parameters. To see all the parameters, you can view the raw data.
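The comparison that underlies a change report can be sketched in a few lines. The following Python example is an illustration rather than the RCM server's implementation: it assumes that the SYSGEN parameters from two collections have already been parsed into dictionaries of parameter name to value, and the function name and values are hypothetical.

```python
def diff_parameters(old, new):
    """Return {name: (old value, new value)} for parameters that differ.

    A parameter present in only one collection is reported with None
    for the side on which it is missing.
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        before = old.get(name)
        after = new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes
```

Comparing two collections in which only GBLPAGES changed returns a single entry for GBLPAGES with its old and new values, which is the kind of difference the Change Report highlights.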
Configuration Report
The following figures
show excerpts from a sample RCM Configuration Report (many sections have been
omitted from these examples). The data for the various tables is collected from
many parts of the raw data. Hardware part numbers and revision information are
derived from configuration tree data.
Figure 2 Sections of an RCM Configuration Report
Figure 3 Additional Sample Tables from Configuration Reports
Figure 4 gives some examples of HSJ and HSG Storage
controller information from a configuration report. Several columns and rows
have been removed for readability.
Figure 4 HSx Storage Controller Tables from Configuration Reports
ProPatch Report
Figure 5 shows a partial ProPatch Report.
Figure 5 Sample ProPatch Report
Change Report
Figure 6 shows part of
a Change Report illustrating SYSGEN parameter changes.
Figure 6 Portion of a Change Report
SAN Reports
A typical SAN report
contains the following information:
- Configuration information for individual fabric servers
- Configuration information for fabric storage
- Configuration information for SAN switches, inter-switch links, and switch ports
- SAN topology maps
Some examples from a
typical SAN report are shown in the following figures:
Figure 7 Part of a SAN Report
Figure 8 Additional Details from a SAN Report
Figure 9 Physical FABRIC Diagram from a SAN Report
RCM is an extremely useful tool for mission-critical
customers and HP technical account managers in managing OpenVMS systems. The
automated collection process ensures that the configuration information in eSMG
is always up-to-date. RCM helps diagnose problems that can arise due to
configuration changes, and it maintains a historical record of all changes to a
system.