|
» |
|
|
|
|
|
|
|
|
Kostas G. Gavrielidis, Master Technologist HP Services
|
|
|
|
|
|
|
|
Several layered products and utilities exist for monitoring
Oracle Rdb databases on OpenVMS platforms. In this article, we review products available
from other companies and propose an HP internally-developed solution developed
and deployed in customer production environments.
There are several Rdb monitoring and alarming utilities and
tools available today:
- RMU/SHOW STATISTICS Utility
The RMU/SHOW STATISTICS utility, one of the most
powerful and useful tools available to database administrators (DBAs), analyzes performance characteristics of
a database. However, the ability to analyze performance problems is only one aspect
of a performance analysis tool. The DBA also needs to detect problems in a timely manner, then analyze the problems and
immediately perform corrective
actions.
- PATROL Knowledge Module for Rdb
PATROL monitors and manages the resources in the environment using information from
special files known as Knowledge Modules (KM) that are loaded in the console. If
PATROL detects a problem with a particular computer or application that it is
monitoring, these modules provide information for PATROL to attempt to fix the
problem. If the problem escalates or requires human intervention, PATROL
displays a warning or alarm condition for every resource affected by the
problem.
- Oracle Rdb Trace and Rdb Expert
Oracle Trace is a layered product that gathers and reports event-based data from
OpenVMS layered products and application programs that contain its service
routine calls. Oracle Trace provides Oracle Expert for Rdb along with data for
optimizing existing Rdb databases. Event-based data can be collected from products
that contain Oracle Trace service routine calls. In addition, Oracle Trace
routine calls can be added to other user-developed applications to collect datafrom them. This process of adding Oracle Trace service
routine calls to an application is called instrumenting
the application.
- Internally-Developed Solutions
Internally-developed solutions consist of the RMU/SHOW
STATISTICS Utility along with alarming and notification features based on rules customized by the
database administrator.
This article is written for those who implement enterprise
solutions that involve HP customer's Oracle Rdb and OpenVMS production
environment configurations.
|
|
|
|
|
|
Database objectives include providing monitoring and
alarming capabilities that support overall Enterprise Management (EM)
activities. These objectives maximize the availability of applications and
support the monitoring and alarming requirements that meet the following
objectives:
- Develop software scripts, as necessary, to facilitate the monitoring and alarming of
important databases by focusing on specific areas previously identified by the
customer. These include (but are not limited to) the following areas:
- database status (i.e., up or down)
- elapsed transaction processing time
- transaction rate
- file system fragmentation
- record locking
- record level fragmentation
- critical transaction rate
- lock rate
- transaction duration
- number of fetches
- number of stores
- number of erases
- Document all developed software and the installation and deployment processes that be
performed in the customer production environment.
|
|
|
|
|
|
The event statistic name clause identifies the particular valid statistic
field for which the event will be enabled or disabled. Note that some statistic
names are valid only when certain database attributes are enabled, such as
global buffers or record caching. The names of a particular statistic field can
be found on the individual screen of interest. When statistic field names
contain multiple words, such as process
attaches, the statistic name must be either single- or double-quoted;
failure to quote the statistic name may result in a syntax error.
Certain statistic fields have leading blanks. This white
space is considered part of the statistic name and is necessary to identify a
unique statistic. However, the use of leading blanks is often difficult to
discern in the configuration file. Therefore, the underscore character (_) or dash character (-) can be used in place of spaces
in statistic names that have leading spaces. For example, the statistic field
name " file extend" can also be specified as "_ _
file_extend" or "- - file-extend". This method improves the readability of difficult
statistic field names.
Most general events are defined using the summary statistics screens. However, it
is sometimes necessary to define an event on a specific table or index, or even
a particular partition of a table. The AREA attribute allows you to specify this
type of drill-down event and indicate
the name of a particular storage area. When this clause is specified, the
statistic field selected must be from the IO
Statistics (by file) or Locking
Statistics (by file) screens. The identified statistic name can be also
qualified with the LAREA attribute to specify the name of a particular logical area, such
as a table, btree index, hash index, or blob. When this clause is specified,
the statistic field selected must be from the Logical Area screens. Further, if
the selected logical area is partitioned across multiple storage areas, the
AREA clause can also be used to identify a specific partition against which to
define the event.
The following table explains the semantics for specifying the AREA
and LAREA clauses to qualify a statistic field name:
AREA |
LAREA |
Description |
No |
No |
General Statistic Field |
Yes |
No |
Storage Area Statistic Field |
No |
Yes |
Logical Area Statistic Field, all partitions |
Yes |
Yes |
Logical Area Statistic Field, single partition |
The AREA and LAREA clauses are attributes and must follow the Threshold Name component of the event
definition.
|
|
|
|
|
|
The optional event attribute
list clause provides additional characteristics for enabled event thresholds. In
general, these attributes are ignored when disabling an event. Any or all of
the event attributes can be specified for each event name within the same EVENT_DESCRIPTION variable definition. The attribute list clauses are as follows:
- AREAstorage_area_name−
Defines the name of a particular storage area. When this clause is specified,
the statistic field selected must be from the "IO Statistics (by
file)" or "Locking Statistics (by file)" screens, unless the LAREA
clause is also specified. This clause is not ignored when disabling the event.
- LAREA logical_area_name− Defines the name of a
particular logical area, such as a table, btree index, hash index, or blob.
When this clause is specified, the statistic field selected must be from the
"Logical Area" screens. This clause is not ignored when disabling the
event.
- INITIAL value− Defines the initial value of the "current" event threshold. The default value is zero (0) for MAX_XXX
thresholds and "very big number" for MIN_XXX thresholds.
The default value guarantees that at least one event will be signaled, thereby
initializing the new "current" threshold value.
- EVERY value− Defines the value by which the initial threshold will be incremented or
decremented when an event is signaled. If this value is the default value zero
(0) for any event except the MIN_CUR_RATE and MAX_CUR_RATE
events, then the event will be signaled only once.
- LIMIT value− Defines the maximum number of times the event can be signaled. If the value is
the default value zero (0), events can be signaled indefinitely providing that
the EVERY clause is specified with a non-zero value.
- SKIP value− Defines the number of event notifications to ignore before performing an actual notification. This clause is
extremely useful for the MIN_CUR_RATE
and MAX_CUR_RATE
events, as the thresholds for these events are not reset upon being signaled.
The default value zero (0) ensures that all events are notified.
- NOTIFY oper_class_list− Defines the quoted comma-separated list of operators
to be notified for all events defined on the specified statistic. Valid
operator keywords are CENTRAL, DISKS, CLUSTER, SECURITY and OPER1 through OPER12.
- INVOKE program_name− Defines the
user-supplied program to be invoked for all events defined on the specified
statistic. On OpenVMS, the program name is specified as a DCL process global
symbol known to the RMU/SHOW
STATISTICS utility.
|
|
|
|
|
|
Very long configuration file lines can be continued on the next line by terminating
the line with a back-slash (\). As the last character of a line, this
continuation character is used to indicate that the configuration entry
is continued on the next line exactly as if it were entered as a single line.
Lines can be continued practically indefinitely, up to 2048 characters, even
within quoted values. The following example demonstrates how to define a
multi-line event description:
The continuation character is not limited to the EVENT_DESCRIPTION
variable; it can be used for any configuration variable.
Comments can be embedded in
continued lines if they start at the beginning of the next line. The following
example demonstrates two event descriptions containing embedded comments. The
comment in the second event description takes precedence over the line
continuation character.
|
|
|
|
|
|
For an event to be active, you
must specify either one or both of the NOTIFY
or INVOKE attribute clauses.
When using the INVOKE attribute
clause, the program must be specified by defining a process-global symbol
pointing to the DCL command procedure or image to be invoked. The INVOKE program and NOTIFY operator classes apply to all
events defined for the statistic field. Therefore, these clauses need to be defined only once per statistic field, no matter how
many events thresholds are defined for that statistic. By specifying multiple
programs or operator classes, only the last-specified attribute is used.
Once an event has been signaled, it will only be
re-signaled if the EVERY attribute clause was specified with a non-zero value. The current
threshold value, originally initialized to the INITIAL value, will be advanced for MAX_XXX thresholds and declined for MIN_XXX thresholds. The exceptions to
this rule are the current rate
thresholds MIN_CUR_RATE and MAX_CUR_RATE, which are never advanced nor declined. The MIN_XXX thresholds disable themselves
once the INITIAL value reaches zero (0), while the MAX_XXX thresholds never disable themselves.
Once an event has been disabled, it can be re-enabled only
by importing a new configuration file or by manually using the Statistics Event Information screen, Re-enable all disabled events
configuration sub-menu option. Individual events cannot be re-enabled on line.
|
|
|
|
|
|
The user-defined events are
analyzed by the RMU/SHOW STATISTICS
utility at the specified screen refresh rate. The default screen refresh rate
of 3-seconds is ideal for most databases. However, using a 1-second refresh rate will produce a finer
granularity event signaling mechanism. Multiple events defined for the same
statistic field may cause the specified program to be invoked multiple times (once
for each affected event).
As the RMU/SHOW STATISTICS utility identifies a statistic field
whose current value or average value is changing, it examines any defined event
thresholds established for that statistic field. This manner of examination
minimizes the impact of event analysis, since the analysis is performed as part
of the normal statistics collection process.
When the utility determines that
a specified event threshold has been exceeded, an event is signaled. The
signaling of the event means that any specified programs will be invoked and
any specified operators will be notified. The event notification occurs
immediately.
If you defined a program that
will be invoked when an event is signaled, the program will be invoked with
eight parameters. Some of
the parameters contain multiple words that must be quoted if the parameters are
passed to other utilities.
The parameters passed to invoked programs are as follows:
- P1 − This parameter is the date and time the event occurred. This parameter contains embedded blanks.
- P2 − This parameter is the statistic field name. This parameter may contain embedded blanks.
- P3 − This parameter is the event name.
- P4 − This parameter is the current event numeric value, expressed to the nearest tenth.
- P5 − This parameter is the word above or below.
- P6 − This parameter is the current event threshold value.
- P7 − This parameter is the event occurrence count.
- P8 − This parameter is the optional physical area and/or logical area name for the statistic field.
The P8 parameter is either null (blank) or contains the
name of the affected storage area and/or logical area. The following example
contains a log file sample output where an event for a partitioned logical area
was signaled. Note that when both the storage area and logical area names are
specified, they are separated by a period (.).
|
|
|
|
|
|
The current runtime status of
the user-defined events can be examined using the new Statistics Event Information screen, located in the Database Parameters sub-menu. Note that
you do not have to be viewing this
screen to signal events. Note also that the
physical area and logical area identifiers are only displayed in Full mode.
|
|
|
|
|
|
Nothing demonstrates a new
feature better than a real-life example explained in step-by-step detail. For
the purposes of this example, suppose that the DBA wants to be sent email
whenever a database freeze occurs. A
database freeze occurs when an
application process on the database prematurely terminates (i.e., "dies"). Such
an event results in all application activity being temporarily suspended until
the recovery operation for the terminated process has been completed. This is a
very significant and serious runtime event that should be immediately detected.
Using events to notify the DBA
when a process terminates prematurely is very easy to accomplish. The following
steps describe how this can be achieved using the RMU/SHOW STATISTICS utility User-Defined Events:
- Identify the operation. Because you are going to define a new event, specify the ENABLEoperation keyword.
- Identify the statistic name to
which the event will be assigned. Use the "process failures"
statistic from the "Recovery Statistics" screen, which is located in
the "AIJ Information" sub-menu. This statistic is available even if
you are not using after-image journaling.
- Identify the threshold name to
use. Use the MAX_CUR_TOTALthreshold, since this represents the current number of
processes that have failed.
- Identify the event attributes to use. This is probably the hardest part of defining an
event. You want to be alerted to anyprocess failure, so you must set theINITIAL attribute to zero (0). Since you want to be
notified on each and every process failure, set theEVERY attribute to one (1) and theLIMITattribute to zero (0).
- Define how you will be alerted
about the event. Since you want to be sent mail, use theINVOKEclause. Invoking a program on
OpenVMS requires that you define a "DCL process-global symbol" to
identify the actual DCL script, as is demonstrated by the following example:
- Write the program to be
invoked. Since you want to be sent mail with a clear description of the event actually
signaled, use the simple DCL script in the following example:
- Combine all of this information into the configuration file entry. The following example contains
the final event description as you would enter it in the configuration file:
- Invoke the RMU/SHOW STATISTICS utility as a server, using
the configuration file. Be sure to use the /CLUSTER command qualifier if you want to be notified of
cluster-wide events. The following example demonstrates the command-line to
perform this operation:
Because applications increasingly require 24*7 availability, the rate at which DBAs are expected to
react to potential downtime
increases accordingly. The RMU/SHOW STATISTICS utility User-Defined
Events provides the means by which DBAs can be automatically alerted when
such critical situations arise, therefore enabling timely
corrective actions.
|
|
|
|
|
|
Configuration Variable −
A symbolic name that defines a value that can be used to define other variables'
values, or can be used by the RMU/SHOW
STATISTICS utility.
Drill-Down Event− An event defined on a specific
storage area, logical area, or logical area partition statistic.
Event− The identification of a particular
statistic value on which the RMU/SHOW
STATISTICS utility
is to perform some user-defined action.
Oracle Rdb− A high-end client/server
relational database for Alpha AXP and VAX.
Statistic Name− The valid statistic field for
which the event is to be enabled or disabled.
Summary Event− A general-purpose event defined
on a "summary" statistic field.
Threshold Name− The columns in the numeric version
of the statistics screens for the specified statistic to be enabled or
disabled.
|
|
|
|
|
|
PATROL is a systems, applications, and event management tool for database and system
administrators. It provides an object-oriented graphical workspace where you
can view the status of every vital resource in the distributed environment that
it is managing. PATROL monitors and manages the resources in the environment
using information obtained
from special files called Knowledge Modules (KM) that are loaded in the console. If PATROL detects a problem with a particular computer or
application that it is monitoring, then these modules provide "knowledge" information that
PATROL will use to attempt to fix the problem.
If the problem escalates or requires human intervention, PATROL displays every
resource affected by the problem in a warning or alarm condition. Table 1(see Appendix) includes a list of all the Rdb KM parameters that can be
monitored. PATROL is made up of three major components: the PATROL Console, the PATROL Agent,
and the Knowledge Modules.
|
|
|
|
|
|
Oracle Trace tabular reports identify occurrences such as
transaction with the higher virtual memory usage or the 95th percentile disk I/O
for each transaction. For transaction processing, Oracle trace gathers
information for DEC ACMS events to provide task-level performance information.
|
|
|
|
|
|
Oracle Trace allows the instrumentation of routines that
use the cross-facility capability. For example, using the cross-facility,
Oracle Trace associates the ACMS Procedure Call event with its related Oracle
Rdb transactions and requests in order to provide a greater understanding of
the total resources the ACMS Procedure Call uses.
|
|
|
|
|
|
This article described the monitoring and alarming capabilities of existing
utilities and products for the Oracle Rdb database engine on OpenVMS platforms. It presented the MSE Rdb_Mon utility as one of the internally-developed
alternatives for this technology sector.
|
|
|
|
|
|
» http://www.oracle.com/technology/products/rdb/index.html- Oracle Rdb
» http://documents.bmc.com/supportu/documents/55/16/5516/5516.pdf - PATROL Knowledge
Module for Rdb User Guide
Kostas G. Gavrielidis works in HP
Services Customer Support and has been with HP for more than 20 years. Currently,
and for the last 10 years, he is involved with the MSE proactive consulting
projects for our customer production Database Management systems, and works on
the analysis and performance improvements for SAP R/3, Oracle, Rdb, Ingres,
SYBASE, SQL Server on UNIX, OpenVMS, and Windows platforms.
|
|
|
|
|
|
Table 1: Rdb KM parameters that can be monitored
Parameter |
Description |
RDB_aij_reads |
Displays the number of read QIOs issued to the database .AIJ file (if after-image journaling is enabled). |
RDB_aij_writes |
Displays the total number of QIOs issued to the database after-image journal file (if after-image journaling is enabled). |
RDB_attaches |
Displays the number of current attaches to the database. |
RDB_blasts |
Monitors the number of blocking AST's delivered to Rdb by the OpenVMS lock manager. |
RDB_buf_unmark |
This parameter is incremented each time a modified buffer is written back to disk. Its value is equal to the sum of the 14 fields:
transaction, pool overflow, blocking AST, lock quota, lock conflict, user unbind, batch rollback, new area mode, larea change,
incr backup, no aij access, truncate snaps, checkpoint, and aij_backup. |
RDB_check_pts |
Displays the current number of checkpoints per minute. |
RDB_df_reads |
Displays the number of read QIOs issued to the database storage area for a single-file and multifile databases and snapshot files. |
RDB_df_writes |
Displays the number of write QIOs issued to the database storage area for a single-file and multifile databases and snapshot files. |
RDB_dup_nd_ins |
Displays the number of duplicate index keys inserted into the database's indexes. There should be a one-to-one correspondence to the
number of duplicate records being stored in the table. |
RDB_fetch_read |
Displays the number of synchronous data page requests to the PIO subsystem where only read privileges are being requested for the page. |
RDB_fetch_upd |
Displays the number of data page requests to the PIO subsystem where update and read privileges are being requested for the page. |
RDB_free_global |
This parameter displays the current percentage of free global buffers. |
RDB_hash_del |
Displays the number of hash key deletions from the database's hashed indexes. It includes unique key deletions and duplicate key deletions. |
RDB_hash_dup_ins |
Displays the number of duplicate hash key insertions in the database's hashed indexes. |
RDB_hash_ins |
Displays the number of hash key insertions in the database's hashed indexes. It includes unique key insertions and duplicate key insertions. |
RDB_lck_conf_unmask |
This parameter is incremented by 1 for each modified buffer that is written back to the disk to reduce the possibility of a deadlock
when Rdb discovers a lock conflict. |
RDB_lock_dem |
Displays the number of $ENQ lock requests to demote an existing lock to a lower lock mode. These requests always succeed. |
RDB_lock_req |
Displays the number of lock requests to new locks. Whether the lock request succeeds or fails, it is included in this count. |
RDB_overflow_unmark |
This parameter is incremented by 1 for each modified buffer that is written back to disk as a result of a request to read in a
new page from disk. |
RDB_recoveries |
Displays the current number of detached recovery (DBR) processes acting on this database. |
RDB_rt_nd_rem |
Displays the number of index entries removed from a root node because of deletion of entries within lower-level nodes. If an index
consists of only one node, removals from this node are not included in this field; but are included in the leaf removals field. |
RDB_rt_nd_ins |
Displays the number of index entries inserted into the root index node. The number of insertions should be small except when you load
a database. If an index consists of only one node, insertions into this node are not included in this field; but are included in the leaf insertion field. |
RDB_rt_reads |
Displays the number of read QIOs issued to the database root (.RDB) file. Rdb reads the .RDB file when a new user attaches to the
database and when an .RDB file control block needs to be updated because of database activity on another OpenVMS cluster node. |
RDB_rt_writes |
Displays the number of write QIOs issued to the database root (.RDB) file. Rdb writes to the .RDB file when a user issues a COMMIT
or ROLLBACK statements. Other events also cause updates to the .RDB file. |
RDB_ruj_reads |
Displays the number of read QIOs issued to the database recovery unit journal (.RUJ) file. This operation reads before-image records
from the .RUJ file to roll back a verb or a transaction. |
RDB_ruj_writes |
Displays the number of write QIOs issued to the database recovery unit journal (.RUJ) file. This operation writes before-image records
to the .RUJ file in case a verb or a transaction must be rolled back. Before-image must be written to the RUJ file before the corresponding database page can be written back to the database. |
RDB_trans_cnt |
Displays the number of completed database transactions. It is the count of the COMMIT and ROLLBACK statements that have executed. |
RDB_txn_unmark |
This parameter is incremented by 1 for each modified buffer that is written back to disk as a result of a COMMIT or ROLLBACK statement. |
RDBMON_attaches |
Displays the number of current attaches to all databases on this system. |
RDBMON_databases |
Displays the number of open databases on this system. |
RDBMON_recoveries |
Displays the current number of detached database recovery (DBR) processes on this system. |
RMU_stats |
This parameter is the collector for all Rdb parameters. |
|
|
» Send feedback to about this article
|