
HP OpenVMS Systems

ask the wizard

RMS Indexed File Performance Tuning?


The Question is:

 
I have created an indexed file as following:
 
- variable-length records, maximum 100 bytes
- single primary key (with NO duplicate) of 19 bytes of string,
  starting at beginning of the record
- the nature of primary key is a string of timestamp in format
  of YYYYMMDDHHMMSSCCCCC.
- initially the file is empty (in the future production environment
  it will be created daily during start-of-day processing)
 
I used EDIT/FDL to design the file, then created it and tested it
with SYS$PUT calls in a small test program. To eliminate any
cumulative effect, an empty file is created from scratch from the
FDL file for each test.
 
The following setup was prepared for this test:
- the entire OpenVMS system is used solely by one user for
  this test.
- pre-allocation is used to ensure the file is large enough
  at creation time.
 
The performance results are shown below.

- 200 records written in 1 second, average 200 msg per sec
- 400 records written in 2 seconds, average 200 msg per sec
- 800 records written in 5 seconds, average 160 msg per sec
- 1600 records written in 9 seconds, average 177 msg per sec
- 3200 records written in 20 seconds, average 160 msg per sec
- 6400 records written in 42 seconds, average 152 msg per sec
- 12800 records written in 81 seconds, average 158 msg per sec
- 25600 records written in 166 seconds, average 154 msg per sec
 
In real life, data will be received from a socket at up to 200
messages per second. Over the 4 trading hours, there are 2.88 million
messages to be written, so I have to ensure that SYS$PUT can sustain
at least 200 records per second throughout.
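The requirement can be checked with simple arithmetic: 200 messages per second for 4 hours is 2.88 million $PUTs, and each $PUT, including any I/O it triggers, must average no more than 5 ms. The sketch below is plain C with the measurements copied from the table above; the function names are illustrative only and nothing here calls RMS.

```c
#include <assert.h>
#include <stddef.h>

/* Measurements from the question: records written and elapsed whole seconds. */
static const long run_records[] = {200, 400, 800, 1600, 3200, 6400, 12800, 25600};
static const long run_seconds[] = {1, 2, 5, 9, 20, 42, 81, 166};
#define N_RUNS (sizeof run_records / sizeof run_records[0])

/* Total messages arriving at a constant rate over a session of whole hours. */
long messages_required(long rate_per_sec, long hours)
{
    assert(rate_per_sec > 0 && hours > 0);
    return rate_per_sec * hours * 3600L;
}

/* Average time available per $PUT, in microseconds, to sustain rate_per_sec. */
long put_budget_usec(long rate_per_sec)
{
    assert(rate_per_sec > 0);
    return 1000000L / rate_per_sec;
}

/* Observed records per second for one measured run. */
double observed_rate(long records, long seconds)
{
    assert(seconds > 0);
    return (double)records / (double)seconds;
}

/* Number of measured runs whose average rate falls below the target. */
int runs_below_target(double target_per_sec)
{
    int below = 0;
    for (size_t i = 0; i < N_RUNS; i++)
        if (observed_rate(run_records[i], run_seconds[i]) < target_per_sec)
            below++;
    return below;
}
```

With these figures, `messages_required(200, 4)` is 2,880,000, `put_budget_usec(200)` is 5,000, and `runs_below_target(200.0)` is 6: every run of 800 records or more fell short. Because the elapsed times were recorded in whole seconds, the short runs are only rough.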
 
I understand that the ascending pattern of the primary keys causes
RMS to frequently rebuild the index and split buckets (please correct
me if I am using the terminology incorrectly).
 
How can I avoid the degradation in SYS$PUT performance?
 
The following changes have been tried, but no improvement was seen:
 
- changing data_key_compression, data_record_compression and
  key_compression from YES to NO (I do not actually know whether they
  should be YES or NO; I just tried)
- changing the SYS$OPEN sharing options from SHRUPD|SHRDEL|SHRPUT|SHRGET
  to only SHRGET (since another program is expected to read records by
  primary key)
- enlarging the bucket size (from 3 to 10, then to 30, and finally to
  the maximum of 63)
 
What else can I consider? Process quotas? Deferred write? Enlarging
the index bucket size (and if so, how)?
 
Many thanks for your time and assistance
 
 
 
 
 
 


The Answer is :

 
  Please contact HP consulting services, as this certainly appears
  to be a non-trivial application environment.  A full RMS
  performance discussion is well beyond the assistance that can
  reasonably be offered here in Ask The Wizard.
 
  A text-based time value is certainly a reasonable key, and it
  will compress to about eight characters.  That said, the
  OpenVMS Wizard would more likely use a quadword time value as
  the key.
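  A quadword time value is an 8-byte count of 100-nanosecond ticks since 17-NOV-1858, so it is smaller than the 19-character string and sorts in time order. On OpenVMS you would normally obtain it directly (for example from SYS$GETTIM); the portable sketch below only shows how such a value relates to a Unix-style UTC time and how it lies in memory. It assumes the key is declared as an 8-byte unsigned binary key (bin8 in FDL terms) so that RMS orders the keys numerically; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds from the OpenVMS base time (17-NOV-1858 00:00) to the Unix epoch
   (1-JAN-1970 00:00): 40587 days of 86400 seconds each. */
#define VMS_TO_UNIX_EPOCH_SECONDS 3506716800ULL
#define TICKS_PER_SECOND 10000000ULL  /* OpenVMS time counts 100-ns ticks */

/* Convert a UTC time given as Unix seconds plus nanoseconds into a 64-bit
   OpenVMS-style absolute time (100-ns ticks since the 1858 base date). */
uint64_t vms_quadword_time(uint64_t unix_seconds, uint32_t nanoseconds)
{
    assert(nanoseconds < 1000000000u);
    return (unix_seconds + VMS_TO_UNIX_EPOCH_SECONDS) * TICKS_PER_SECOND
           + nanoseconds / 100u;
}

/* Lay the value out least-significant byte first, which is how a native
   quadword is stored in memory on VAX, Alpha and Itanium. */
void store_key_le(uint64_t value, unsigned char key[8])
{
    for (int i = 0; i < 8; i++) {
        key[i] = (unsigned char)(value & 0xFF);
        value >>= 8;
    }
}
```

For example, the Unix epoch maps to 35,067,168,000,000,000 ticks, and a later instant always yields a larger value, which is what keeps the keys ascending.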
 
  Beware the time change for daylight saving time: the keys
  can, and often should, be in UTC or similar, and thus the TDF
  (Time Differential Factor) information is often needed.
  Also consider using UTC-format time itself and the associated
  system service and RTL routines, particularly if you do not
  wish to run the system time in UTC.
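  If the text key is kept, generating it from UTC avoids the local hour that repeats when daylight saving time ends, which would otherwise produce out-of-order or duplicate keys. A minimal sketch in standard C follows; format_utc_key is a hypothetical helper, and the meaning of the trailing CCCCC digits (taken here as the fraction of a second in 10-microsecond units) is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Build the 19-character YYYYMMDDHHMMSSCCCCC key from a UTC time.
   ASSUMPTION: CCCCC is the fraction of the second in units of
   10 microseconds (0..99999); adjust if the field means something else.
   gmtime() is used instead of localtime() so that keys stay ascending
   across a daylight-saving change. Returns 0 on success, -1 on error
   (out must not be used after an error). */
int format_utc_key(time_t t, long frac_10us, char out[20])
{
    struct tm *utc;
    int n;

    if (frac_10us < 0 || frac_10us > 99999)
        return -1;
    utc = gmtime(&t);   /* not thread-safe; prefer gmtime_r where available */
    if (utc == NULL)
        return -1;
    n = snprintf(out, 20, "%04d%02d%02d%02d%02d%02d%05ld",
                 utc->tm_year + 1900, utc->tm_mon + 1, utc->tm_mday,
                 utc->tm_hour, utc->tm_min, utc->tm_sec, frac_10us);
    return n == 19 ? 0 : -1;
}
```

For example, one second before midnight UTC on 1-JAN-1970 with a fraction of 12345 produces "1970010123595912345".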
 
  When using EDIT/FDL, you must enter the final size of the file,
  not its initial size.
 
  You will want to set the file allocation and file extension
  sizes appropriately for file activity.  Often an extension
  size of 500 blocks is reasonable, though you will want to
  investigate pre-sizing the file as appropriate.
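  To see why pre-sizing matters for a file that starts empty each day, one can estimate the final data size and count how many extend operations a fixed extension quantity would need. The sketch below covers the data buckets only, and the per-bucket and per-record overhead figures are placeholders supplied by the caller (an assumption, not the real prolog 3 layout); EDIT/FDL's own calculations should be preferred.

```c
#include <assert.h>

#define BLOCK_BYTES 512L   /* OpenVMS disk block size */

/* Records that fit in one data bucket.
   ASSUMPTION: bucket_overhead and record_overhead are rough placeholder
   byte counts for the bucket header and per-record control information. */
long records_per_bucket(long bucket_blocks, long record_bytes,
                        long bucket_overhead, long record_overhead)
{
    assert(bucket_blocks > 0 && record_bytes > 0);
    return (bucket_blocks * BLOCK_BYTES - bucket_overhead)
           / (record_bytes + record_overhead);
}

/* Blocks needed for the data buckets only (index buckets not included). */
long data_blocks(long n_records, long bucket_blocks, long per_bucket)
{
    assert(per_bucket > 0);
    long buckets = (n_records + per_bucket - 1) / per_bucket;  /* round up */
    return buckets * bucket_blocks;
}

/* Extend operations needed to grow from the initial allocation to the
   final size using a fixed extension quantity. */
long extends_needed(long initial_blocks, long final_blocks, long extend_blocks)
{
    assert(extend_blocks > 0);
    if (final_blocks <= initial_blocks)
        return 0;
    return (final_blocks - initial_blocks + extend_blocks - 1) / extend_blocks;
}
```

With 100-byte records, 3-block buckets and placeholder overheads of 15 and 11 bytes, a bucket holds 13 records, 2.88 million records need about 664,617 blocks of data buckets, and an empty file growing by 500 blocks at a time would be extended 1,330 times during the trading day.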
 
  Be sure to select a non-default number of buffers.  The number
  of buffers is based on the index depth, and the index depth
  of the empty prototype file will not be realistic.  You will
  likely see a value around 4, but you probably want to use 10
  to 20.  See RAB$B_MBF (the multibuffer count) or the
  SET RMS_DEFAULT/INDEXED/BUFFER_COUNT command.
 
  When measuring your performance during testing, make sure to
  insert the keys in ascending order, as production will.  With
  small buckets, you should see a second index level at about
  1,000 records and a third at about 100,000 records.
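  The 1,000 and 100,000 thresholds follow if, say, a small data bucket holds about 10 records and an index bucket holds about 100 entries; both figures are assumptions chosen to match the statement above. The sketch below counts index levels for completely filled buckets, as produced by ascending inserts, and ignores key compression and fill factors. An estimate like this is also a starting point when choosing the buffer count.

```c
#include <assert.h>

/* Estimate the number of index levels above the data buckets.
   ASSUMPTIONS: every data bucket holds records_per_data_bucket records,
   every index bucket holds index_fanout entries, and all buckets are full
   (as they are when keys arrive in ascending order). */
int index_levels(long n_records, long records_per_data_bucket, long index_fanout)
{
    assert(records_per_data_bucket > 0 && index_fanout > 1);
    long entries = (n_records + records_per_data_bucket - 1)
                   / records_per_data_bucket;   /* data buckets to index */
    int levels = 1;                  /* the root index bucket always exists */
    while (entries > index_fanout) { /* too many entries for one bucket */
        entries = (entries + index_fanout - 1) / index_fanout;
        levels++;
    }
    return levels;
}
```

Under the same assumptions, `index_levels(2880000, 10, 100)` gives 3 levels for a full trading day. The rms_tools spreadsheet described next is a more careful predictor.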
 
  You can use the rms_tools Freeware spreadsheet as a tool to
  predict the index level.  See:
 
    http://h71000.www7.hp.com/freeware/freeware50/rms_tools/
 
  You will also want to consider what other caches are active,
  as a controller or block cache may not be caching appropriate
  data.  (It is possible that you are exceeding the cache, and
  causing blocks to be flushed.)
 
  The OpenVMS Wizard will assume few or no read I/O operations.
  You will want to consider the write I/O activity, using tools
  such as SET FILE/STATISTICS and MONITOR RMS.  Also see the
  ANALYZE/SYSTEM command SHOW PROCESS/RMS=FSB, or use the
  RMS_STATS tool from the Freeware area mentioned earlier.
 
  You will want to consider whether you can coalesce multiple
  record operations into one I/O, weighing the risk of data loss
  during a failure against the cost of the I/O.  Larger I/Os tend
  to favor larger bucket sizes, while smaller and more frequent
  I/Os tend to favor smaller buckets.  If you can coalesce records,
  enable deferred writes and flush every 100 ms or so (assuming
  10 I/Os per second), or flush based on the number of records
  accumulated.  Or let RMS manage the buffers, writing them when
  they overflow.
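  The "flush every 100 ms or after N records" rule can be expressed as a small policy object. In a real program the flush would be a SYS$FLUSH call on a stream opened with deferred write (the DFW file-processing option, FAB$V_DFW); here the flushes are only counted, and the structure and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical batching policy for deferred writes: decides when to flush. */
struct flush_policy {
    long max_interval_ms;  /* e.g. 100 ms, roughly 10 flushes per second */
    long max_pending;      /* flush early once this many records are waiting */
    long last_flush_ms;    /* time of the previous flush */
    long pending;          /* records written since the previous flush */
    long flushes;          /* flushes requested so far */
};

void policy_init(struct flush_policy *p, long interval_ms,
                 long max_pending, long now_ms)
{
    assert(interval_ms > 0 && max_pending > 0);
    p->max_interval_ms = interval_ms;
    p->max_pending = max_pending;
    p->last_flush_ms = now_ms;
    p->pending = 0;
    p->flushes = 0;
}

/* Note one $PUT at time now_ms; returns 1 if the caller should flush now. */
int policy_after_put(struct flush_policy *p, long now_ms)
{
    p->pending++;
    if (p->pending >= p->max_pending ||
        now_ms - p->last_flush_ms >= p->max_interval_ms) {
        p->pending = 0;
        p->last_flush_ms = now_ms;
        p->flushes++;
        return 1;
    }
    return 0;
}
```

At 200 puts per second (one every 5 ms) with a 100 ms interval, this requests 9 flushes in the first second, each covering about 20 records instead of 20 separate writes. Because the policy only runs when a record arrives, a quiet socket could leave records unflushed; a periodic timer (for example one set with SYS$SETIMR) would close that gap.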
 
  If you cannot group records, so that every $PUT must become a
  write I/O operation, you will want to select smaller buckets,
  as RMS always writes entire buckets.  (Too small, however, and
  the number of index writes will increase.)  Typical (small)
  bucket sizes should hold between six and roughly thirty
  records, and the index buckets should be sized for thirty to
  a thousand data buckets per index bucket.
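  To apply the six-to-thirty rule for 100-byte records, one can scan the possible bucket sizes and keep those whose capacity falls in that range. The overhead figures passed in are placeholders (an assumption, not the exact RMS bucket layout), and the function names are hypothetical.

```c
#include <assert.h>

#define BLOCK_BYTES 512L      /* OpenVMS disk block size */
#define MAX_BUCKET_BLOCKS 63  /* largest bucket size, as tried in the question */

/* ASSUMPTION: bucket_overhead and record_overhead are placeholder byte
   counts for the bucket header and per-record control information. */
long records_per_bucket(long bucket_blocks, long record_bytes,
                        long bucket_overhead, long record_overhead)
{
    assert(bucket_blocks > 0 && record_bytes > 0);
    return (bucket_blocks * BLOCK_BYTES - bucket_overhead)
           / (record_bytes + record_overhead);
}

/* Find the smallest and largest bucket sizes (in blocks) whose capacity lies
   between min_records and max_records. Capacity grows with bucket size, so
   the qualifying sizes form one contiguous range. Returns 1 if any size
   qualifies, otherwise 0. */
int bucket_size_range(long record_bytes, long bucket_overhead,
                      long record_overhead, long min_records, long max_records,
                      int *smallest, int *largest)
{
    *smallest = 0;
    *largest = 0;
    for (int blocks = 1; blocks <= MAX_BUCKET_BLOCKS; blocks++) {
        long n = records_per_bucket(blocks, record_bytes,
                                    bucket_overhead, record_overhead);
        if (n >= min_records && n <= max_records) {
            if (*smallest == 0)
                *smallest = blocks;
            *largest = blocks;
        }
    }
    return *smallest != 0;
}
```

With placeholder overheads of 15 and 11 bytes, 2- to 6-block buckets fall in the six-to-thirty range. A 63-block bucket would hold about 290 records, and if every $PUT is written through, each one rewrites 32,256 bytes rather than 1,024 for a 2-block bucket.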
 
  As for the bucket splits, if your data is input in ascending
  order as you indicate, there will be no bucket splits.  The
  records are written to the end, and the indices will extend.
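  The point about ascending input can be illustrated with a toy model; it is not RMS's actual algorithm. Buckets hold at most CAP sorted keys. When the last bucket is full and the new key is higher than everything in it, a fresh bucket is started and no records move; any other overflow is handled as a 50/50 split that moves half the records. CAP, MAX_BUCKETS and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define CAP 10            /* records per bucket in the toy model */
#define MAX_BUCKETS 4096

struct bucket { int n; long keys[CAP]; };

static struct bucket buckets[MAX_BUCKETS];
static int n_buckets;
static long records_moved;   /* records relocated by 50/50 splits */

void model_reset(void)
{
    n_buckets = 1;
    buckets[0].n = 0;
    records_moved = 0;
}

/* Insert key at its sorted position in b (caller guarantees space). */
static void bucket_insert(struct bucket *b, long key)
{
    int i = b->n;
    while (i > 0 && b->keys[i - 1] > key) {
        b->keys[i] = b->keys[i - 1];
        i--;
    }
    b->keys[i] = key;
    b->n++;
}

void model_insert(long key)
{
    int t = 0;
    /* target: first bucket whose highest key is >= key, else the last one */
    while (t < n_buckets - 1 && buckets[t].keys[buckets[t].n - 1] < key)
        t++;
    struct bucket *b = &buckets[t];
    if (b->n == CAP) {
        assert(n_buckets < MAX_BUCKETS);
        /* open a slot for a new bucket immediately after bucket t */
        memmove(&buckets[t + 2], &buckets[t + 1],
                (size_t)(n_buckets - t - 1) * sizeof buckets[0]);
        n_buckets++;
        struct bucket *nb = &buckets[t + 1];
        if (t == n_buckets - 2 && key > b->keys[CAP - 1]) {
            /* appending past the end of the file: nothing is moved */
            nb->n = 0;
            bucket_insert(nb, key);
            return;
        }
        /* otherwise a 50/50 split: the upper half moves to the new bucket */
        int half = CAP / 2;
        nb->n = CAP - half;
        memcpy(nb->keys, &b->keys[half], (size_t)nb->n * sizeof b->keys[0]);
        b->n = half;
        records_moved += nb->n;
        if (key > b->keys[half - 1])
            b = nb;
    }
    bucket_insert(b, key);
}
```

Inserting keys 1 through 1000 in ascending order leaves 100 full buckets and moves no records, while inserting the same keys in descending order forces a split on almost every overflow.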
 
  As for your performance questions in general, you will want
  to measure and explain all RMS activity.  Use the available
  tools.  Understand all of the components of the I/O path,
  including the block and controller caches, the interconnect
  speeds, and the spiral transfer rates.  For instance, run
  the application and determine what RMS has buffered after
  10, then 1,000, then 100,000 records.  Is this what you
  expected to be buffered?  (Use ANALYZE/SYSTEM with the
  SHOW PROCESS/RMS=(RAB,BOBSUM) command.)
 
  Key compression is normally enabled, though index compression
  is normally disabled, as an uncompressed index permits binary
  searches.
 
  Any file sharing will enable full locking.
 
  Also consider a hardware upgrade, as faster processors and
  particularly Fibre Channel I/O can dramatically improve file
  system throughput.  (Tuning is an on-going and expensive
  process, as well.)
 
  As for how to enlarge the index buckets: create a file with
  multiple AREAs using EDIT/FDL or other tools, and create an
  area with a larger bucket size for the index.
 
 

answer written or last revised on ( 11-APR-2003 )
