To help test and demonstrate the capabilities of Row Caches holding a very large amount of cached data, a test database with one billion (10⁹; 1,000,000,000) rows and associated indexes in Row Caches was created. This database was configured with 10 tables, each horizontally partitioned across 5 storage areas. Each table consists of 10 quadword (64-bit integer) columns. One pseudo-ranked sorted index on a single column was created for each table.
100,000,000 data rows of random content were then loaded into each
table. Each index required approximately 1.2 million index nodes of 2000 bytes each (an index depth of 5 levels).
For each table, two caches were configured: one for the rows and one for
the index nodes. To simulate an environment provisioned for future database growth, the caches for the database rows were configured with 101,000,000 “slots” per table and the caches for the index nodes with 1,515,000 “slots” per index. The caches for the indexes were deliberately somewhat over-sized relative to the caches for the data rows, simulating a typical production application in which the DBA allows room for modified index key values: changes to index keys can cause index nodes to split, creating additional index nodes. The caches for the indexes were also configured with the Row Cache “no replacement” attribute, which enables an optimization that helps avoid cache slot contention for heavily accessed database objects.
Taking advantage of the “Snapshots in Row Cache” feature of Oracle Rdb
Release 7.1.2, each cache was created with 50,000 snapshot “slots” to allow
read-write transactions to store snapshot records for modified rows directly
into cache (avoiding disk I/O and database page locking). By eliminating snapshot and database I/O when
modifying cached records, the application should more fully utilize the system
CPU resources, leading to reduced transaction duration. All caches were created specifying the
“SHARED MEMORY IS PROCESS RESIDENT” attribute to allow use of OpenVMS shared page
tables and granularity hint regions (reducing physical memory consumption and
improving performance).
The total amount of physical memory utilized for
Row Caches for this configuration was approximately 202GB (representing both
overhead and usable data storage).
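As a rough cross-check of that figure, the arithmetic below reconstructs the usable storage implied by the slot counts above. This is a sketch only: the 80-byte row size follows from the ten quadword columns, snapshot slots are assumed to be sized like the objects they shadow, and the per-slot overhead implied by the difference is internal to Rdb.

```c
#include <stdio.h>

/* Rough Row Cache sizing arithmetic for the test configuration.
   Slot counts and object sizes are from the text; snapshot-slot
   sizing and the implied overhead are illustrative assumptions. */
int main(void)
{
    const double GB = 1e9;

    /* 10 tables x 101,000,000 row slots; 10 quadwords = 80 bytes/row */
    double row_bytes  = 10.0 * 101000000.0 * 80.0;

    /* 10 indexes x 1,515,000 node slots; 2000 bytes per node */
    double node_bytes = 10.0 * 1515000.0 * 2000.0;

    /* 50,000 snapshot slots in each of the 20 caches, assumed sized
       like the cached object (80-byte rows or 2000-byte nodes) */
    double snap_bytes = 10.0 * 50000.0 * 80.0
                      + 10.0 * 50000.0 * 2000.0;

    printf("row storage:      %6.1f GB\n", row_bytes  / GB);
    printf("index storage:    %6.1f GB\n", node_bytes / GB);
    printf("snapshot storage: %6.1f GB\n", snap_bytes / GB);
    printf("usable total:     %6.1f GB\n",
           (row_bytes + node_bytes + snap_bytes) / GB);
    /* ~112 GB usable; the ~202GB total reported in the text implies
       the balance is per-slot and per-cache overhead. */
    return 0;
}
```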
Data content for each table was generated by a
program that wrote records of random values into an OpenVMS mailbox; the
mailbox was read by an RMU /LOAD session that populated the table.
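That pipeline is straightforward to sketch. The program below is illustrative only (the record layout, names, and argument handling are assumptions, not the actual test code): it writes fixed-length records of ten random quadwords to a file or device named on the command line; on OpenVMS that name could be a mailbox device being read by a concurrent RMU /LOAD session.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Generate fixed-length records of 10 random 64-bit integers and
   write them to the file or device named in argv[1].  On OpenVMS the
   name could be a mailbox device read by an RMU /LOAD session. */
int main(int argc, char *argv[])
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s <output-device> <record-count>\n",
                argv[0]);
        return EXIT_FAILURE;
    }

    FILE *out = fopen(argv[1], "wb");
    if (out == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }

    long count = atol(argv[2]);
    srand((unsigned)time(NULL));

    for (long i = 0; i < count; i++) {
        long long rec[10];                  /* 10 quadword columns */
        for (int c = 0; c < 10; c++)
            rec[c] = ((long long)rand() << 32) | (unsigned)rand();
        if (fwrite(rec, sizeof rec, 1, out) != 1) {
            perror("fwrite");
            fclose(out);
            return EXIT_FAILURE;
        }
    }

    fclose(out);
    return EXIT_SUCCESS;
}
```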
After populating the caches with the data row and
index node content, a synthetic workload was run. This workload represented database server processes that would
continuously execute database update transactions, with each transaction
choosing a table at random and selecting and modifying a random row in the table. Due to the initial data distribution
pattern, approximately one transaction in a thousand would find the specified
row non-existent and would insert the row into the database. Thus, every transaction either updates an
existing database row or, occasionally, inserts a new row into the database.
The workload program was written in C using the
Oracle Rdb SQL precompiler. To limit the test program development effort, and to help ensure a reasonable percentage of non-database activity on the system, the SQL update and insert statements were generated at run time using the dynamic SQL interface. The technique of using entirely dynamic SQL for the workload may be revisited in the future, as the CPU cost of completely interpreting and compiling the SQL statement for each operation may place an excessive burden on the performance of this test and limit overall throughput. The Oracle Rdb SQL language provides the ability to compile a statement once at run time and later execute it repeatedly; this approach would be expected to save some CPU time.
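A minimal sketch of one workload transaction, under the dynamic SQL approach just described, might look as follows. The table and column names (TAB0..TAB9, K, C1) are invented for illustration, and the SQLCA and host-variable details follow general embedded-SQL convention rather than verified Rdb precompiler code.

```c
#include <stdio.h>
#include <stdlib.h>

EXEC SQL INCLUDE SQLCA;

EXEC SQL BEGIN DECLARE SECTION;
static char stmt[256];              /* host variable for dynamic SQL */
EXEC SQL END DECLARE SECTION;

void do_transaction(void)
{
    int  table = rand() % 10;          /* choose a table at random */
    long key   = rand() % 100000000L;  /* choose a row at random   */

    EXEC SQL SET TRANSACTION READ WRITE;

    sprintf(stmt, "UPDATE TAB%d SET C1 = C1 + 1 WHERE K = %ld",
            table, key);
    EXEC SQL EXECUTE IMMEDIATE :stmt;  /* interpreted and compiled
                                          on every transaction      */

    if (sqlca.sqlcode == 100) {        /* no row matched: insert    */
        sprintf(stmt, "INSERT INTO TAB%d (K, C1) VALUES (%ld, 1)",
                table, key);
        EXEC SQL EXECUTE IMMEDIATE :stmt;
    }

    EXEC SQL COMMIT;
}
```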
To ensure high levels of performance for writing to
the after-image journal (AIJ), the database was configured to use the optional
AIJ Log Server (ALS) process. The ALS
process removes the task of writing to the AIJ from the user processes by
performing all AIJ write operations for a database. The ALS is optimized to avoid journal locking operations and performs “double-buffered” asynchronous I/O operations to the after-image journal to reduce effective I/O latency.
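The double-buffering technique itself is generic. The sketch below illustrates the concept in portable C using POSIX asynchronous I/O (an illustration only: the actual ALS uses OpenVMS system services, and the file name and sizes here are invented). While one buffer is being written, the other is being filled, so the writer rarely stalls waiting for the disk.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BUFSZ 65536

/* Double-buffered sequential journal writing: fill one buffer while
   the previous one is still being written asynchronously. */
int main(void)
{
    static char buf[2][BUFSZ];
    struct aiocb cb[2];
    int fd = open("journal.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    memset(cb, 0, sizeof cb);
    off_t offset = 0;

    for (int block = 0; block < 64; block++) {
        int cur = block & 1;

        /* Wait for the previous write using this buffer to finish. */
        if (block >= 2) {
            const struct aiocb *list[1] = { &cb[cur] };
            while (aio_error(&cb[cur]) == EINPROGRESS)
                aio_suspend(list, 1, NULL);
            if (aio_return(&cb[cur]) < 0) { perror("aio"); return 1; }
        }

        /* Fill the buffer (here: dummy journal content). */
        memset(buf[cur], 'A' + (block % 26), BUFSZ);

        /* Start the asynchronous write; the next iteration fills
           the other buffer while this one is in flight. */
        cb[cur].aio_fildes = fd;
        cb[cur].aio_buf    = buf[cur];
        cb[cur].aio_nbytes = BUFSZ;
        cb[cur].aio_offset = offset;
        cb[cur].aio_sigevent.sigev_notify = SIGEV_NONE;
        if (aio_write(&cb[cur]) < 0) { perror("aio_write"); return 1; }
        offset += BUFSZ;
    }

    /* Drain both outstanding writes before closing. */
    for (int i = 0; i < 2; i++) {
        const struct aiocb *list[1] = { &cb[i] };
        while (aio_error(&cb[i]) == EINPROGRESS)
            aio_suspend(list, 1, NULL);
        aio_return(&cb[i]);
    }
    close(fd);
    return 0;
}
```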
As the test application would be modifying or
inserting records for every transaction, a large number of locking operations
were expected during the test. To
provide for efficient system scaling, OpenVMS was configured to use the
optional dedicated CPU lock manager.
This feature provides significant performance increases for heavy lock
operation loads when many CPUs are present in the system by eliminating
contention for locking data structures.
A single CPU is dedicated to locking operations for all processes on the
system with very low overhead.
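The architecture can be modeled in miniature, as sketched below. This is purely illustrative: the queue, names, and pthread framing are inventions, and OpenVMS implements the feature inside the kernel with a dedicated CPU servicing a request queue. The point of the model is that only the server thread ever touches the lock table, so contention for the locking data structures disappears.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Miniature model of a dedicated lock manager: requesters enqueue
   lock requests; a single server thread services them, so the lock
   table itself is accessed by one CPU only. */
typedef struct request {
    int             resource;          /* which lock to operate on  */
    int             delta;             /* +1 acquire, -1 release    */
    bool            done;
    pthread_cond_t  done_cv;
    struct request *next;
} request_t;

static pthread_mutex_t q_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cv  = PTHREAD_COND_INITIALIZER;
static request_t      *q_head;

static int lock_table[1024];           /* touched ONLY by the server */

static void *lock_server(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_mtx);
        while (q_head == NULL)
            pthread_cond_wait(&q_cv, &q_mtx);
        request_t *req = q_head;
        q_head = req->next;

        /* Single-threaded access: no contention on the lock table. */
        lock_table[req->resource] += req->delta;

        req->done = true;
        pthread_cond_signal(&req->done_cv);
        pthread_mutex_unlock(&q_mtx);
    }
    return NULL;
}

static void lock_request(int resource, int delta)
{
    request_t req = { resource, delta, false };
    pthread_cond_init(&req.done_cv, NULL);
    pthread_mutex_lock(&q_mtx);
    req.next = q_head;                 /* LIFO queue keeps demo short */
    q_head = &req;
    pthread_cond_signal(&q_cv);
    while (!req.done)
        pthread_cond_wait(&req.done_cv, &q_mtx);
    pthread_mutex_unlock(&q_mtx);
    pthread_cond_destroy(&req.done_cv);
}

int main(void)
{
    pthread_t srv;
    pthread_create(&srv, NULL, lock_server, NULL);
    lock_request(42, +1);              /* acquire */
    lock_request(42, -1);              /* release */
    printf("lock operations serviced by the dedicated thread\n");
    return 0;
}
```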
System Configuration
The test system utilized a Hewlett-Packard AlphaServer GS1280
running 32 Alpha EV7 processors at 1150 MHz with 256GB of physical memory. OpenVMS Alpha Version 7.3-2 base level X9Z0
was installed and disk volumes were configured as RAID-5 arrays in an EVA12000
storage server accessed through KGPSA controllers. As disk I/O loads were expected to remain quite low during this
test (ultimately less than 5000 I/O write operations per second), the
configuration and absolute performance of the storage subsystem were not a significant concern, and little analysis or tuning was performed or required.
Initial Test Results – More than 13,000 Database Transactions Per Second
To simulate a large multi-client database server environment, multiple copies
of the workload were run at the same time.
Instances of the program were added (to a total of 60) until the system
was effectively CPU bound (over 98% utilization). All database server workload instances solely executed read-write
transactions and updated or inserted a single row per transaction.
At this workload level, the peak database transaction rate was 13,228
transactions per second with significantly less than 1 disk I/O request per
transaction. System-wide disk I/O rate
was approximately 3000 per second and the system-wide locking operation rate
(new lock requests, promotions, demotions and releases performed by the OpenVMS
lock manager) was approximately 150,000 per second.
Second Test
In an attempt to increase the transaction rate, a second experiment was
run with database snapshots disabled and the Oracle Rdb “commit to journal”
feature enabled. Avoiding writing snapshots to the Row Caches should reduce CPU consumption (in a typical production system this would often not be a viable configuration, because it causes read-only transactions to be promoted to read-write transactions, and some operations, such as online database backup, are prohibited while snapshots are disabled). The “commit to journal”
feature reduces database root-file I/O by allowing transactions to allocate
transaction sequence numbers from the database in groups of up to 1024 at a
time, rather than one per transaction.
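The benefit of group allocation is easy to see in miniature. In the sketch below, the real allocator (which lives in the database root file, where each update costs an I/O) is modeled as an atomic shared-memory counter; one shared update buys up to 1024 local allocations, so the group size here comes from the text while the framing is illustrative.

```c
#include <stdatomic.h>
#include <stdio.h>

#define TSN_GROUP 1024   /* up to 1024 TSNs allocated per root update */

/* Shared, persistent allocator: in the real system this lives in the
   database root file and each update to it costs an I/O. */
static _Atomic long long next_tsn_block = 0;

/* Per-process cache of a reserved TSN range. */
static long long local_tsn, local_limit;

static long long allocate_tsn(void)
{
    if (local_tsn == local_limit) {
        /* One "expensive" shared update buys 1024 local allocations. */
        local_tsn   = atomic_fetch_add(&next_tsn_block, TSN_GROUP);
        local_limit = local_tsn + TSN_GROUP;
    }
    return local_tsn++;
}

int main(void)
{
    for (int i = 0; i < 5; i++)
        printf("transaction got TSN %lld\n", allocate_tsn());
    return 0;
}
```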
The database was also configured to allow Oracle
Rdb to use the OpenVMS “Fast I/O” feature.
Setting the “buffer object enabled” attribute for database file, root
file, after-image journal file, and recovery-unit journal file I/O operations
enables this feature within Oracle Rdb.
The “Fast I/O” feature provides a streamlined I/O interface that can
reduce CPU usage by simplifying the I/O path and performing buffer locking and
probing once (typically at application startup), rather than for each I/O
request.
To enable multiple statistics global sections to be automatically
created by Oracle Rdb, OpenVMS was also configured to enable RAD support on the
GS1280. By using many statistics global
sections, memory contention for statistics counters maintained by Oracle Rdb is
reduced. When a database is opened on a system configured with RAD support, Oracle Rdb creates a statistics section in each RAD. As user
processes attach to the database, they select a statistics global section to be
used based on the process’s “home” RAD.
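The pattern is a general one for cache-coherent NUMA machines like the GS1280: shard hot counters so that each process updates memory belonging to its own domain, and sum the shards only when reporting. A minimal single-process sketch of the idea follows; the names and the 64-byte cache-line padding are illustrative, not Rdb internals.

```c
#include <stdio.h>

#define NRADS 8

/* One statistics shard per RAD; padding keeps shards on separate
   cache lines so updates from different RADs never contend. */
struct stats_shard {
    long long transactions;
    char      pad[64 - sizeof(long long)];
};

static struct stats_shard stats[NRADS];

/* Each process bumps only the shard for its home RAD. */
static void count_transaction(int home_rad)
{
    stats[home_rad].transactions++;
}

/* Reporting tools (like RMU /SHOW STATISTICS) sum the shards. */
static long long total_transactions(void)
{
    long long sum = 0;
    for (int r = 0; r < NRADS; r++)
        sum += stats[r].transactions;
    return sum;
}

int main(void)
{
    count_transaction(0);
    count_transaction(3);
    count_transaction(3);
    printf("total = %lld\n", total_transactions());
    return 0;
}
```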
Second Test Results – One Million Database Transactions Per Minute
With the reduced overhead due to the elimination of database snapshots,
the “commit to journal” feature, and the “Fast I/O” feature, the same test workload peaked at approximately 17,000 database transactions per second. This was considered a significant data point
in that it represents just over 1 million database transactions per minute for
this particular workload.
Results Analysis and Updates
The performance indicators in this test generated considerable interest among both the HP and Oracle engineering staffs. Subsequent lab testing revealed some areas in the Rdb engine that could be further optimized. In particular, some
alignment faults (cases where the instruction stream was expecting to operate
on naturally aligned data cells but caused a fault when the data was not
aligned) were eliminated from the code.
The application
itself was also enhanced to pre-compile all possible update statements during
program startup. Then at run time, the correct pre-compiled statement was
executed during each transaction. This
technique avoided having to compile the dynamic SQL statement for each update,
saving CPU cycles.
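Using the same illustrative naming as the earlier sketch, the enhancement looks roughly like this. The host-variable statement-identifier style is modeled on Rdb's dynamic SQL interface, but the exact syntax should be treated as an assumption rather than verified precompiler code.

```c
#include <stdio.h>

EXEC SQL INCLUDE SQLCA;

EXEC SQL BEGIN DECLARE SECTION;
static char stmt_text[256];
static long stmt_id;           /* statement identifier from PREPARE */
static long key_value;
EXEC SQL END DECLARE SECTION;

static long prepared[10];      /* one prepared statement per table  */

/* At startup: compile every possible UPDATE statement exactly once. */
void prepare_all(void)
{
    for (int t = 0; t < 10; t++) {
        sprintf(stmt_text,
                "UPDATE TAB%d SET C1 = C1 + 1 WHERE K = ?", t);
        EXEC SQL PREPARE :stmt_id FROM :stmt_text;
        prepared[t] = stmt_id;
    }
}

/* Per transaction: execute the already-compiled statement. */
void do_transaction(int table, long key)
{
    stmt_id   = prepared[table];
    key_value = key;
    EXEC SQL SET TRANSACTION READ WRITE;
    EXEC SQL EXECUTE :stmt_id USING :key_value;  /* no SQL compile  */
    /* (insert-if-absent handling as in the earlier sketch) */
    EXEC SQL COMMIT;
}
```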
Third Test
Each release of Oracle Rdb version 7.1 is currently shipping as two
variants: one compiled for all Alpha processors and one compiled for Alpha EV56
and later processors. The EV56 variant
includes code compiled to utilize the Alpha byte-word instructions. As an internal performance experiment, the
Rdb code was compiled explicitly for the EV67 and later Alpha processors. This configuration allowed the language compilers to produce an Alpha instruction sequence optimal for the EV67/EV68/EV7 processors, both in the use of available instructions and in the scheduling of instructions for these quad-issue processors.
A 32-processor GS1280 with 128GB of memory was configured and made available for a second week of testing in January 2004.
OpenVMS V7.3-2 was installed with the experimental compilation of Oracle
Rdb along with the enhanced workload program.
The OpenVMS RAD support feature was not enabled for this test.
A number of experiments were run with the updated configuration. Areas of interest included measuring
performance with varied numbers of CPUs to determine how effectively the system
scaled from 8 to 32 processors, the effect of the Rdb AIJ Log Server (RDMALS) process, the impact of Record Cache Server (RDMRCS) process checkpoint operations, and so on.
Third Test Results – 1,791,480 Database Transactions Per Minute
A sustained measured database transaction rate of 1,791,480 for an interval
greater than one minute was demonstrated (representing 29,858 transactions per
second). Rates of over 30,000
transactions per second were sustained over 15 second intervals.
Further Analysis
In addition to the headline transaction rates, the ability to run these workloads on a large multi-processor system provided the opportunity for Oracle Rdb Engineering and Hewlett-Packard OpenVMS Engineering to measure and review various performance indicators. The results of such analysis help provide direction for further investigation into performance enhancements for future releases of both OpenVMS and Oracle Rdb.
Observations and Recommendations
Using caches as large as those in the
tests requires some operational considerations. First, it can take quite a while for the Row Cache Server (RCS)
process to create and map (and, by extension, initialize) these large global
sections. The Database Recovery (DBR) process, likewise, can run for a longer time because it must scan all cache slots to make sure that the failed process had not reserved any of them. It may be possible to reduce these effects, to some extent, in future releases of Oracle Rdb.
In the same way, the RMU /SHOW STATISTICS utility can take a long time
to accumulate statistics information for very large caches. Some of the displays require that all cache
slots be examined (to determine the amount of space used in a cache, or to
count the number of reserved rows, for example). This can require a significant amount of CPU time and the actual
screen refresh interval will be quite slow.
When closing, backing up, or analyzing
a database, the RCS process must write all modified rows back to the physical
database. When using huge caches with
many modified rows in cache, the RCS may take quite a while to do this
writing. Planning ahead for a shutdown
or backup may allow you to perform an RMU /SERVER RECORD_CACHE CHECKPOINT
command prior to the scheduled shutdown or backup time. This allows the RCS to get a “head start”
writing modified rows back to the database.