[an error occurred while processing this directive]

HP OpenVMS Systems Documentation

Content starts here

HP OpenVMS Utility Routines Manual


Previous Contents Index


Chapter 7
Data Compression/Expansion (DCX) Routines

The set of routines described in this chapter comprises the Data Compression/Expansion (DCX) facility. There is no DCL-level interface to this facility, nor is there a DCX utility.

7.1 Introduction to DCX Routines

Using the DCX routines described in this chapter, you can decrease the size of text, binary data, images, and any other type of data. Compressed data uses less space, but there is a trade-off in terms of access time to the data. Compressed data must first be expanded to its original state before it is usable. Thus, infrequently accessed data makes a good candidate for data compression.

The DCX facility provides routines that analyze and compress data records and expand the compressed records to their original state. In this process, no information is lost. A data record that has been compressed and then expanded is in the same state as it was before it was compressed.

Most collections of data can be reduced in size by DCX. However, there is no guarantee that the size of an individual data record will always be smaller after compression; in fact, some may grow larger.

The DCX facility allows for the independent analysis, compression, and expansion of more than one stream of data records at the same time. This capability is provided by means of a "context variable," which is an argument in each DCX routine. Most applications have no need for this capability; for these applications, there is a single context variable.

Some of the DCX routines make calls to various Run-Time Library (RTL) routines, for example, LIB$GET_VM. If any of these RTL routines fails, a return status code indicating the cause of the failure is returned. In such a case, you must refer to the documentation of the appropriate RTL routine to determine the cause of the failure. The status codes documented in this chapter are primarily DCX status codes.

Note also that the application program should declare referenced constants and return status symbols as external symbols; these symbols are resolved upon linking with the utility shareable image.

7.1.1 Compression Routines

Compressing a file with the DCX routines involves the following steps:

  1. Initialize an analysis work area---Use the DCX$ANALYZE_INIT routine to initialize a work area for analyzing the records. The first (and, typically, the only) argument passed to DCX$ANALYZE_INIT is an integer variable for storing the context value. The DCX facility assigns a value to the context variable and associates the value with the created work area. Each time you want to analyze a record in that area, specify the associated context variable. You can analyze two or more files at once by creating a different work area for each file, giving each area a different context variable, and analyzing the records of each file in the appropriate work area.
  2. Analyze the records in the file---Use the DCX$ANALYZE_DATA routine to pass each record in the file to an analysis work area. During analysis, the DCX facility gathers information that DCX$MAKE_MAP uses to create the compression/expansion function for the file. To ensure that the first byte of each record is passed to the DCX facility rather than being interpreted as a carriage control, specify CARRIAGECONTROL = NONE when you open the file to be compressed.
  3. Create the compression/expansion function---Use the DCX$MAKE_MAP routine to create the compression/expansion function. You pass DCX$MAKE_MAP a context variable, and DCX$MAKE_MAP uses the information stored in the associated work area to compute a compression/expansion function for the records being compressed. If DCX$MAKE_MAP returns a status value of DCX$_AGAIN, repeat Steps 2 and 3 until DCX$MAKE_MAP returns a status of DCX$_NORMAL, indicating that a compression/expansion function has been created.
    In Example 7-1, the integer function GET_MAP analyzes each record in the file to be compressed and invokes DCX$MAKE_MAP to create the compression/expansion function. The function value of GET_MAP is the return status of DCX$MAKE_MAP, and the address and length of the compression/expansion function are returned in the GET_MAP argument list. The main program, COMPRESS_FILES, invokes the GET_MAP function, examines its function value, and, if necessary, invokes the GET_MAP function again (see the ANALYZE DATA program section).
  4. Clean up the analysis work area---Use the DCX$ANALYZE_DONE routine to delete a work area. Identify the work area to be deleted by passing DCX$ANALYZE_DONE routine a context variable.
  5. Save the compression/expansion function---You cannot expand compressed records without the compression/expansion function. Therefore, before compressing the records, write the compression/expansion function to the file that will contain the compressed records.
    If your programming language cannot use an address directly, pass the address of the compression/expansion function to a subprogram (WRITE_MAP in Example 7-1). Pass the subprogram the length of the compression/expansion function as well.
    In the subprogram, declare the dummy argument corresponding to the function address as a one-dimensional, adjustable, byte array. Declare the dummy argument corresponding to the function length as an integer, and use it to dimension the adjustable array. Write the function length and the array containing the function to the file that is to contain the compressed records. (The length must be stored so that you can read the function from the file using unformatted I/O; see Section 7.1.2.)
  6. Compress each record---Use the DCX$COMPRESS_INIT routine to initialize a compression work area. Specify a context variable for the compression area just as for the analysis area.
    Use the DCX$COMPRESS_DATA routine to compress each record. As you compress each record, use unformatted I/O to write the compressed record to the file containing the compression/expansion function. For each record, write the length of the record and the substring containing the record. See the COMPRESS DATA section in Example 7-1. (The length is stored with the substring so that you can read the compressed record from the file using unformatted I/O; see Section 7.1.2.)
  7. Use DCX$COMPRESS_DONE to delete the work area created by DCX$COMPRESS_INIT. Identify the work area to be deleted by passing DCX$COMPRESS_DATA a context variable. Use LIB$FREE_VM to free the virtual memory that DCX$MAKE_MAP used for the compression/expansion function.

7.1.2 Expansion Routines

Expanding a file with the DCX routines involves the following steps:

  1. Read the compression/expansion function---When reading the compression/expansion function from the compressed file, do not make any assumptions about the function's size. The best practice is to read the length of the function from the compressed file and then invoke the LIB$GET_VM routine to get the necessary amount of storage for the function. The LIB$GET_VM routine returns the address of the first byte of the storage area.
    If your programming language cannot use an address directly, pass the address of the storage area to a subprogram. Pass the subprogram the length of the compression/expansion function as well.
    In the subprogram, declare the dummy argument corresponding to the storage address as a one-dimensional, adjustable, byte array. Declare the dummy argument corresponding to the function length as an integer and use it to dimension the adjustable array. Read the compression/expansion function from the compressed file into the dummy array. Because the compression/expansion function is stored in the subprogram, do not return to the main program until you have expanded all of the compressed records.
  2. Initialize an expansion work area---Use the DCX$EXPAND_INIT routine to initialize a work area for expanding the records. The first argument passed to DCX$EXPAND_INIT is an integer variable to contain a context value (see step 1 in Section 7.1.1). The second argument is the address of the compression/expansion function.
  3. Expand the records---Use the DCX$EXPAND_DATA routine to expand each record.
  4. Clean up the work area---Use the DCX$EXPAND_DONE routine to delete an expansion work area. Identify the work area to be deleted by passing DCX$EXPAND_DONE a context variable.

7.2 Using the DCX Routines: Examples

Example 7-1 shows how to use the callable DCX routines to compress a file in a HP Fortran program.

Example 7-2 expands a compressed file. The first record of the compressed file is an integer containing the number of bytes in the compression/expansion function. The second record is the compression/expansion function. The remainder of the file contains the compressed records. Each compressed record is stored as two records: an integer containing the length of the record and a substring containing the record.

Example 7-1 Compressing a File in a HP Fortran Program

PROGRAM COMPRESS_FILES
! COMPRESSION OF FILES

! status variable
INTEGER STATUS,
2       IOSTAT,
2       IO_OK,
2       STATUS_OK
PARAMETER (IO_OK = 0)
PARAMETER (STATUS_OK = 1)
INCLUDE '($FORDEF)'
EXTERNAL DCX$_AGAIN

! context variable
INTEGER CONTEXT
! compression/expansion function
INTEGER MAP,
2       MAP_LEN

! normal file name, length, and logical unit number
CHARACTER*256 NORM_NAME
INTEGER*2 NORM_LEN
INTEGER NORM_LUN
! compressed file name, length, and logical unit number
CHARACTER*256 COMP_NAME
INTEGER*2 COMP_LEN
INTEGER COMP_LUN

! Logical end-of-file
LOGICAL EOF
! record buffers;  32764 is maximum record size
CHARACTER*32764 RECORD,
2               RECORD2
INTEGER RECORD_LEN,
2       RECORD2_LEN

! user routine
INTEGER GET_MAP,
2       WRITE_MAP

! Library procedures
INTEGER DCX$ANALYZE_INIT,
2       DCX$ANALYZE_DONE,
2       DCX$COMPRESS_INIT,
2       DCX$COMPRESS_DATA,
2       DCX$COMPRESS_DONE,
2       LIB$GET_INPUT,
2       LIB$GET_LUN,
2       LIB$FREE_VM

! get name of file to be compressed and open it
STATUS = LIB$GET_INPUT (NORM_NAME,
2                       'File to compress: ',
2                       NORM_LEN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
STATUS = LIB$GET_LUN (NORM_LUN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
OPEN (UNIT = NORM_LUN,
2     FILE = NORM_NAME(1:NORM_LEN),
2     CARRIAGECONTROL = 'NONE',
2     STATUS = 'OLD')




! ************
! ANALYZE DATA
! ************
! initialize work area
STATUS = DCX$ANALYZE_INIT (CONTEXT)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
! get compression/expansion function (map)
STATUS = GET_MAP (NORM_LUN,
2                 CONTEXT,
2                 MAP,
2                 MAP_LEN)
DO WHILE (STATUS .EQ. %LOC(DCX$_AGAIN))
  ! go back to beginning of file
  REWIND (UNIT = NORM_LUN)
  ! try map again
  STATUS = GET_MAP (NORM_LUN,
2                   CONTEXT,
2                   MAP,
2                   MAP_LEN)
  END DO
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
! clean up work area
STATUS = DCX$ANALYZE_DONE (CONTEXT)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))

! *************
! COMPRESS DATA
! *************
! go back to beginning of file to be compressed
REWIND (UNIT = NORM_LUN)
! open file to hold compressed records
STATUS = LIB$GET_LUN (COMP_LUN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
STATUS = LIB$GET_INPUT (COMP_NAME,
2                       'File for compressed records: ',
2                       COMP_LEN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
OPEN (UNIT = COMP_LUN,
2     FILE = COMP_NAME(1:COMP_LEN),
2     STATUS = 'NEW',
2     FORM = 'UNFORMATTED')

! initialize work area
STATUS = DCX$COMPRESS_INIT (CONTEXT,
2                           MAP)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
! write compression/expansion function to new file
CALL WRITE_MAP (COMP_LUN,
2               %VAL(MAP),
2               MAP_LEN)

! read record from file to be compressed
EOF = .FALSE.
READ (UNIT = NORM_LUN,
2     FMT = '(Q,A)',
2     IOSTAT = IOSTAT) RECORD_LEN,
2                      RECORD(1:RECORD_LEN)
IF (IOSTAT .NE. IO_OK) THEN
  CALL ERRSNS (,,,,STATUS)
  IF (STATUS .NE. FOR$_ENDDURREA) THEN
     CALL LIB$SIGNAL (%VAL(STATUS))
     ELSE
     EOF = .TRUE.
     STATUS = STATUS_OK
     END IF
  END IF

DO WHILE (.NOT. EOF)
  ! compress the record
  STATUS = DCX$COMPRESS_DATA (CONTEXT,
2                             RECORD(1:RECORD_LEN),
2                             RECORD2,
2                             RECORD2_LEN)
  IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
  ! write compressed record to new file
  WRITE (UNIT = COMP_LUN) RECORD2_LEN
  WRITE (UNIT = COMP_LUN) RECORD2 (1:RECORD2_LEN)
  ! read from file to be compressed
  READ (UNIT = NORM_LUN,
2       FMT = '(Q,A)',
2       IOSTAT = IOSTAT) RECORD_LEN,
2                        RECORD (1:RECORD_LEN)
  IF (IOSTAT .NE. IO_OK) THEN
    CALL ERRSNS (,,,,STATUS)
    IF (STATUS .NE. FOR$_ENDDURREA) THEN
       CALL LIB$SIGNAL (%VAL(STATUS))
       ELSE
       EOF = .TRUE.
       STATUS = STATUS_OK
       END IF
    END IF
  END DO

! close files and clean up work area
CLOSE (NORM_LUN)
CLOSE (COMP_LUN)
STATUS = LIB$FREE_VM (MAP_LEN,
2                     MAP)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
STATUS = DCX$COMPRESS_DONE (CONTEXT)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))

END

INTEGER FUNCTION GET_MAP (LUN,      ! passed
2                         CONTEXT,  ! passed
2                         MAP,      ! returned
2                         MAP_LEN)  ! returned
! Analyzes records in file opened on logical
! unit LUN and then attempts to create a
! compression/expansion function using
! DCX$MAKE_MAP.

! dummy arguments
! context variable
INTEGER CONTEXT
! logical unit number
INTEGER LUN
! compression/expansion function
INTEGER MAP,
2       MAP_LEN

! status variable
INTEGER STATUS,
2       IOSTAT,
2       IO_OK,
2       STATUS_OK
PARAMETER (IO_OK = 0)
PARAMETER (STATUS_OK = 1)
INCLUDE '($FORDEF)'

! Logical end-of-file
LOGICAL EOF
! record buffer;  32764 is the maximum record size
CHARACTER*32764 RECORD
INTEGER RECORD_LEN

! library procedures
INTEGER DCX$ANALYZE_DATA,
2       DCX$MAKE_MAP

! analyze records
EOF = .FALSE.
READ (UNIT = LUN,
2     FMT = '(Q,A)',
2     IOSTAT = IOSTAT) RECORD_LEN,RECORD
IF (IOSTAT .NE. IO_OK) THEN
  CALL ERRSNS (,,,,STATUS)
  IF (STATUS .NE. FOR$_ENDDURREA) THEN
     CALL LIB$SIGNAL (%VAL(STATUS))
     ELSE
     EOF = .TRUE.
     STATUS = STATUS_OK
     END IF
  END IF

DO WHILE (.NOT. EOF)
  STATUS = DCX$ANALYZE_DATA (CONTEXT,
2                            RECORD(1:RECORD_LEN))
  IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
  READ (UNIT = LUN,
2       FMT = '(Q,A)',
2       IOSTAT = IOSTAT) RECORD_LEN,RECORD
  IF (IOSTAT .NE. IO_OK) THEN
    CALL ERRSNS (,,,,STATUS)
    IF (STATUS .NE. FOR$_ENDDURREA) THEN
       CALL LIB$SIGNAL (%VAL(STATUS))
       ELSE
       EOF = .TRUE.
       STATUS = STATUS_OK
       END IF
    END IF
  END DO

STATUS = DCX$MAKE_MAP (CONTEXT,
2                      MAP,
2                      MAP_LEN)
GET_MAP = STATUS

END

SUBROUTINE WRITE_MAP (LUN,     ! passed
2                     MAP,     ! passed
2                     MAP_LEN) ! passed
IMPLICIT INTEGER(A-Z)
! write compression/expansion function
! to file of compressed data

! dummy arguments
INTEGER LUN,        ! logical unit of file
2       MAP_LEN     ! length of function
BYTE MAP (MAP_LEN)  ! compression/expansion function

! write map length
WRITE (UNIT = LUN) MAP_LEN
! write map
WRITE (UNIT = LUN) MAP

END

Example 7-2 shows how to expand a compressed file in a HP Fortran program.

Example 7-2 Expanding a Compressed File in a HP Fortran Program

PROGRAM EXPAND_FILES
IMPLICIT INTEGER(A-Z)
! EXPANSION OF COMPRESSED FILES

! file names, lengths, and logical unit numbers
CHARACTER*256 OLD_FILE,
2             NEW_FILE
INTEGER*2 OLD_LEN,
2         NEW_LEN
INTEGER OLD_LUN,
2       NEW_LUN

! length of compression/expansion function
INTEGER MAP,
2       MAP_LEN

! user routine
EXTERNAL EXPAND_DATA

! library procedures
INTEGER LIB$GET_LUN,
2       LIB$GET_INPUT,
2       LIB$GET_VM,
2       LIB$FREE_VM

! open file to expand
STATUS = LIB$GET_LUN (OLD_LUN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
STATUS = LIB$GET_INPUT (OLD_FILE,
2                       'File to expand: ',
2                       OLD_LEN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
OPEN (UNIT = OLD_LUN,
2     STATUS = 'OLD',
2     FILE = OLD_FILE(1:OLD_LEN),
2     FORM = 'UNFORMATTED')
! open file to hold expanded data
STATUS = LIB$GET_LUN (NEW_LUN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
STATUS = LIB$GET_INPUT (NEW_FILE,
2                       'File to hold expanded data: ',
2                       NEW_LEN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
OPEN (UNIT = NEW_LUN,
2     STATUS = 'NEW',
2     CARRIAGECONTROL = 'LIST',
2     FILE = NEW_FILE(1:NEW_LEN))

! expand file
! get length of compression/expansion function
READ (UNIT = OLD_LUN) MAP_LEN
STATUS = LIB$GET_VM (MAP_LEN,
2                    MAP)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
! expand records
CALL EXPAND_DATA (%VAL(MAP),
2                 MAP_LEN,     ! length of function
2                 OLD_LUN,     ! compressed data file
2                 NEW_LUN)     ! expanded data file
! delete virtual memory used for function
STATUS = LIB$FREE_VM (MAP_LEN,
2                     MAP)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
END

SUBROUTINE EXPAND_DATA (MAP,     ! passed
2                       MAP_LEN, ! passed
2                       OLD_LUN, ! passed
2                       NEW_LUN) ! passed
! expand data program

! dummy arguments
INTEGER MAP_LEN,   ! length of expansion function
2       OLD_LUN,   ! logical unit of compressed file
2       NEW_LUN    ! logical unit of expanded file
BYTE MAP(MAP_LEN)  ! array containing the function

! status variables
INTEGER STATUS,
2       IOSTAT,
2       IO_OK,
2       STATUS_OK
PARAMETER (IO_OK = 0)
PARAMETER (STATUS_OK = 1)
INCLUDE '($FORDEF)'

! context variable
INTEGER CONTEXT

! logical end_of_file
LOGICAL EOF
! record buffers
CHARACTER*32764 RECORD,
2               RECORD2
INTEGER RECORD_LEN,
2       RECORD2_LEN

! library procedures
INTEGER DCX$EXPAND_INIT,
2       DCX$EXPAND_DATA,
2       DCX$EXPAND_DONE

! read data compression/expansion function
READ (UNIT = OLD_LUN) MAP
! initialize work area
STATUS = DCX$EXPAND_INIT (CONTEXT,
2                         %LOC(MAP(1)))
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
! expand records
EOF = .FALSE.
! read length of compressed record
READ (UNIT = OLD_LUN,
2     IOSTAT = IOSTAT) RECORD_LEN
IF (IOSTAT .NE. IO_OK) THEN
  CALL ERRSNS (,,,,STATUS)
  IF (STATUS .NE. FOR$_ENDDURREA) THEN
    CALL LIB$SIGNAL (%VAL(STATUS))
    ELSE
    EOF = .TRUE.
    STATUS = STATUS_OK
    END IF
  END IF
DO WHILE (.NOT. EOF)
  ! read compressed record
  READ (UNIT = OLD_LUN) RECORD (1:RECORD_LEN)
  ! expand record
  STATUS = DCX$EXPAND_DATA (CONTEXT,
2                           RECORD(1:RECORD_LEN),
2                           RECORD2,
2                           RECORD2_LEN)
  IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
  ! write expanded record to new file
  WRITE (UNIT = NEW_LUN,
2        FMT = '(A)') RECORD2(1:RECORD2_LEN)
  ! read length of compressed record
  READ (UNIT = OLD_LUN,
2       IOSTAT = IOSTAT) RECORD_LEN
  IF (IOSTAT .NE. IO_OK) THEN
    CALL ERRSNS (,,,,STATUS)
    IF (STATUS .NE. FOR$_ENDDURREA) THEN
      CALL LIB$SIGNAL (%VAL(STATUS))
      ELSE
      EOF = .TRUE.
      STATUS = STATUS_OK
      END IF
    END IF
  END DO
! clean up work area
STATUS = DCX$EXPAND_DONE (CONTEXT)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
END

7.3 DCX Routines

This section describes the individual DCX routines.


DCX$ANALYZE_DATA

The DCX$ANALYZE_DATA routine performs statistical analysis on a data record. The results of the analysis are accumulated internally in the context area and are used by the DCX$MAKE_MAP routine to compute the mapping function.

Format

DCX$ANALYZE_DATA context ,record


RETURNS


OpenVMS usage: cond_value
type: longword (unsigned)
access: write only
mechanism: by value

Longword condition value. Most utility routines return a condition value in R0. Condition values that this routine can return are listed under Condition Values Returned.


Arguments

context


OpenVMS usage: context
type: longword (unsigned)
access: read only
mechanism: by reference

Value identifying the data stream that DCX$ANALYZE_DATA analyzes. The context argument is the address of a longword containing this value. DCX$ANALYZE_INIT initializes this value; you should not modify it. You can define multiple context arguments to identify multiple data streams that are processed simultaneously.

record


OpenVMS usage: char_string
type: character string
access: read only
mechanism: by descriptor

Record to be analyzed. DCX$ANALYZE_DATA reads the record argument, which is the address of a descriptor for the record string. The maximum length of the record string is 65,535 characters.


Description

The DCX$ANALYZE_DATA routine performs statistical analysis on a single data record. This routine is called once for each data record to be analyzed.

During analysis, the DCX facility gathers information that DCX$MAKE_MAP uses to create the compression/expansion function for the file. After the data records have been analyzed, call the DCX$MAKE_MAP routine. Upon receiving the DCX$_AGAIN status code from DCX$MAKE_MAP, you must again analyze the same data records (in the same order) using DCX$ANALYZE_DATA and then call DCX$MAKE_MAP again. On the second iteration, DCX$MAKE_MAP returns the DCX$_NORMAL status code, and the data analysis is complete.


Condition Values Returned

DCX$_INVCTX Error. The context variable is invalid, or the context area is invalid or corrupted. This may be caused by a failure to call the appropriate routine to initialize the context variable or by an application program error.
DCX$_NORMAL Normal successful completion.

This routine also returns any condition values returned by LIB$ANALYZE_SDESC_R2.


Previous Next Contents Index