
HP OpenVMS Systems Documentation


HP COBOL
User Manual



Each random access request begins by comparing a key value to the root bucket's entries. It seeks the first root bucket entry whose key value equals or exceeds the value of the access request key. (This search is always successful, because the root bucket's highest key value is the highest possible value that the key field can contain.) Once that key value is located, the bucket pointer is used to bring the target bucket on the next lower level into memory. This process is repeated for each level of the index.

One bucket is searched at each level of the index until a target bucket is reached at the data level. The data record's location is then determined so that a record can be retrieved or a new record written.

A data level bucket may not be large enough to contain a new record. In this case, the I/O system inserts a new bucket in the chain, moving enough records from the old bucket to preserve the key value sequence. This is known as a bucket split.

Data bucket splits can cause index bucket splits.

15.7.3.1 Optimizing Indexed File I/O

I/O optimization of an indexed file depends on five concepts:

  • Records---The size and format of the data records can affect the disk space used by the file.
  • Keys---The number of keys and existence of duplicate key values can affect disk space and processing time.
  • Buckets---Bucket size can affect disk space and processing time. Index depth and file activity can affect bucket size.
  • Index depth---The depth of the index can affect bucket size and processing time.
  • File size---The length of files affects space and access time.

Records

Variable-length records can save file space: you need write only the primary record key data item (plus alternate keys, if any) for each record. In contrast, fixed-length records require that all records be equal in length.

For example, assume that you are designing an employee master file. A variable-length record file lets you write a long record for a senior employee with a large amount of historical data, and a short record for a new employee with less historical data.

In the following example of a variable-length record description, integer 10 of the RECORD VARYING clause represents the length of the primary record key, while integer 80 describes the length of the longest record in A-FILE:


FILE-CONTROL.
    SELECT A-FILE ASSIGN TO "AMAST"
           ORGANIZATION IS INDEXED
           ACCESS MODE IS DYNAMIC
           RECORD KEY IS A-KEY.
DATA DIVISION.
FILE SECTION.
FD  A-FILE
    RECORD VARYING FROM 10 TO 80 CHARACTERS.
01  A-REC.
    03  A-KEY           PIC X(10).
    03  A-REST-OF-REC   PIC X(70).
        .
        .
        .

Buckets must contain enough room for record insertion, or bucket splitting occurs. The I/O system handles a split by creating a new data bucket, moving some records from the original bucket to the new one, and inserting a pointer to the new bucket into the lowest-level index bucket. If the lowest-level index bucket overflows, the I/O system splits it in similar fashion, and so on up to the top (root) level.

In an indexed file, the I/O system also maintains chains of forward pointers through the buckets.

For each record moved, a 7-byte pointer to the new record location remains in the original bucket. Thus, bucket splits can accumulate overhead and possibly reduce usable space so much that the original bucket can no longer receive records.

Record deletions can also accumulate storage overhead. However, most of the space is available for reuse.

There are several ways to minimize overhead accumulation. First, determine or estimate the frequency of certain operations. For example, if you expect to add or delete 100 records of a 100,000-record file, your database is stable enough to allow some wasted space for record additions and deletions. However, if you expect frequent additions and deletions, try to:

  • Choose a bucket size that allows for overhead accumulation, if possible. Avoid bucket sizes that are an exact or near multiple of your record size. See the Bucket Size section below.
  • Optimize record insertion by using the RMS DEFINE Utility (refer to the OpenVMS Record Management Utilities Reference Manual) to define the file with fill numbers; use the APPLY FILL-SIZE clause when loading the file.

Alternate Keys

Each alternate key requires the creation and maintenance of a separate index structure. The more keys you define, the longer each WRITE, REWRITE, and DELETE operation takes. (The throughput of READ operations is not affected by multiple keys.)

If your application requires alternate keys, you can minimize I/O processing time if you avoid duplicate alternate keys. Duplicate keys can create long record pointer arrays, which fill bucket space and increase access time.

Bucket Size

Bucket size selection can influence indexed file performance.

To the system, bucket size is an integral number of physical blocks, each 512 bytes long. Thus, a bucket size of 1 specifies a 512-byte bucket, while a bucket size of 2 specifies a 1024-byte bucket, and so on.

The HP COBOL compiler passes bucket size values to the I/O system based on what you specify in the BLOCK CONTAINS clause. In this case, you express bucket size in terms of records or characters.

If you specify bucket size in records, the bucket can contain more records than you specify, but never fewer. For example, assume that your file contains fixed-length, 100-byte records, and you want each bucket to contain five records, as follows:


BLOCK CONTAINS 5 RECORDS

This appears to define a bucket as a 512-byte block, containing five records of 100 bytes each. However, the compiler adds I/O system record and bucket overhead to each bucket, as follows:

Bucket overhead = 15 bytes per bucket
Record overhead = 7 bytes per record (fixed-length)
                  9 bytes per record (variable-length)

Thus, in this example, the bucket size calculation is:

(5 records * (100 + 7) bytes) + 15 bytes = 550 bytes
Because blocks are 512 bytes long, and buckets are always an integral number of blocks, the smallest bucket size possible (the system default) in this case is two blocks. The system, however, puts in as many records as fit into each bucket. Thus, the bucket actually contains nine records, not five.
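The arithmetic above can be verified with a short calculation. The sketch below (plain Python, illustrative only, not part of the manual) computes the bytes needed for five 100-byte fixed-length records plus overhead, rounds up to whole 512-byte blocks, and then counts how many records actually fit in the resulting bucket:

```python
BLOCK = 512            # bytes per physical block
BUCKET_OVERHEAD = 15   # bytes per bucket
REC_OVERHEAD = 7       # bytes per fixed-length record

record_size = 100
requested_records = 5

# Space the requested records need, including all overhead:
needed = BUCKET_OVERHEAD + requested_records * (record_size + REC_OVERHEAD)

# Buckets are a whole number of blocks, so round up:
blocks = -(-needed // BLOCK)                      # ceiling division -> 2 blocks

# The system then fills the bucket with as many records as fit:
usable = blocks * BLOCK - BUCKET_OVERHEAD
records_per_bucket = usable // (record_size + REC_OVERHEAD)

print(needed, blocks, records_per_bucket)         # 550 2 9
```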

The CHARACTERS option of the BLOCK CONTAINS clause lets you specify bucket size more directly. For example:


BLOCK CONTAINS 2048 CHARACTERS

This specifies a bucket size of four 512-byte blocks. The number of characters you specify should be a multiple of 512; if it is not, the I/O system rounds it up to the next higher multiple of 512.
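That rounding rule is easy to express directly. A small sketch (Python, illustrative only; the function name is invented here) converting a BLOCK CONTAINS n CHARACTERS value into whole 512-byte blocks:

```python
def bucket_blocks(characters: int) -> int:
    """Round a BLOCK CONTAINS n CHARACTERS value up to whole 512-byte blocks."""
    return -(-characters // 512)   # ceiling division

print(bucket_blocks(2048))   # 4 blocks, as in the example above
print(bucket_blocks(2049))   # not a multiple of 512, so rounded up to 5 blocks
```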

Index Depth

The length of data records, key fields, and buckets in the file determines the depth of the index. Index depth, in turn, determines the number of disk accesses needed to retrieve a record. The smaller the index depth, the better the performance. In general, an index depth of 3 or 4 gives satisfactory performance. If your calculated index depth is greater than 4, you should consider redesigning the file.

You can optimize your file's index depth after you have determined file, record, and key size. Calculating index depth is an iterative process, with bucket size as the variable. Keep in mind that the highest level (root level) can contain only one bucket.

If much data is added over time to an indexed file, you should reorganize the file periodically to restore its indexes to their optimal levels.

Following is detailed information on calculating file size, and an example of index depth calculation:

File Size

When you calculate file size:

  • Every bucket in an indexed file contains 15 bytes of overhead.
  • Every bucket in an indexed file contains records. Only record type and size differ.
  • Data records are only in level 0 buckets of the primary index.
  • Index records are in level 1 and higher-numbered buckets.
  • If you use alternate keys, secondary index data records (SIDRs) are only in level 0 buckets of alternate indexes.

Use these calculations to determine data and index record size:

  • Data records:
    Fixed-length record size = actual record size + 7

    Variable-length record size = actual record size + 9
  • Index records:
    Record size = key size + 3

If a file has more than 65,536 blocks, the 3-byte index record overhead could increase to 5 bytes.

Use these calculations to determine SIDR record length:

  • No duplicates allowed:
    Record size = key size + 9
  • Duplicates allowed:
    Record size = key size + 8 + 5 * (number of duplicate records)
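The record-size formulas above can be collected into a few helpers. This is an illustrative sketch (the function names are invented here, not part of RMS or HP COBOL), using the overhead figures given in this section:

```python
def data_record_size(actual_size: int, fixed_length: bool = True) -> int:
    """Data record on disk: actual size plus 7 (fixed) or 9 (variable) bytes."""
    return actual_size + (7 if fixed_length else 9)

def index_record_size(key_size: int) -> int:
    """Index record: key size plus 3 bytes (may grow to 5 for very large files)."""
    return key_size + 3

def sidr_record_size(key_size: int, duplicates_allowed: bool = False,
                     duplicate_count: int = 0) -> int:
    """Secondary index data record (SIDR) length."""
    if not duplicates_allowed:
        return key_size + 9
    return key_size + 8 + 5 * duplicate_count

print(data_record_size(200))    # 207 bytes for a 200-byte fixed-length record
print(index_record_size(20))    # 23 bytes for a 20-character key
print(sidr_record_size(8))      # 17 bytes, no duplicates allowed
```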

Note

Bucket packing efficiency determines how well bucket space is used. A packing efficiency of 1 means the buckets of an index are full. A packing efficiency of .5 means that, on average, the buckets are half full.

Consider an indexed file with these attributes:

  • 100,000 fixed-length records of 200 characters each
  • Primary key = 20 characters
  • Alternate key = 8 characters, no duplicates allowed
  • Bucket size = 3 (an arbitrary value)
  • No fill number

Primary key index level calculations:

In the following calculations, some results are to be rounded up, and some truncated.
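The calculation can be sketched as follows (plain Python, illustrative only, using just the overhead figures given in this section; a bucket size of 3 means 3 x 512 = 1536 bytes). Records-per-bucket divisions are truncated, and bucket counts are rounded up; the loop then counts index levels until a single root bucket covers the whole file:

```python
import math

BLOCK = 512
BUCKET_OVERHEAD = 15

records = 100_000
data_record = 200 + 7          # fixed-length record + 7 bytes of overhead
index_record = 20 + 3          # primary key + 3 bytes of overhead

usable = 3 * BLOCK - BUCKET_OVERHEAD                    # 1521 usable bytes

data_per_bucket = usable // data_record                 # truncated -> 7
level0_buckets = math.ceil(records / data_per_bucket)   # rounded up -> 14286

index_per_bucket = usable // index_record               # truncated -> 66

# Walk up the index until one (root) bucket covers everything:
depth, buckets = 0, level0_buckets
while buckets > 1:
    buckets = math.ceil(buckets / index_per_bucket)
    depth += 1

print(data_per_bucket, level0_buckets, index_per_bucket, depth)  # 7 14286 66 3
```

Under these assumptions the primary index has a depth of 3, within the satisfactory range described above.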


