|
HP OpenVMS Utility Routines Manual
Chapter 7 Data Compression/Expansion (DCX) Routines
The set of routines described in this chapter comprises the Data
Compression/Expansion (DCX) facility. There is no DCL-level interface
to this facility, nor is there a DCX utility.
7.1 Introduction to DCX Routines
Using the DCX routines described in this chapter, you can decrease the
size of text, binary data, images, and any other type of data.
Compressed data uses less space, but there is a trade-off in terms of
access time to the data. Compressed data must first be expanded to its
original state before it is usable. Thus, infrequently accessed data
makes a good candidate for data compression.
The DCX facility provides routines that analyze and compress data
records and expand the compressed records to their original state. In
this process, no information is lost. A data record that has been
compressed and then expanded is in the same state as it was before it
was compressed.
Most collections of data can be reduced in size by DCX. However, there
is no guarantee that the size of an individual data record will always
be smaller after compression; in fact, some may grow larger.
The DCX facility allows for the independent analysis, compression, and
expansion of more than one stream of data records at the same time.
This capability is provided by means of a "context variable,"
which is an argument in each DCX routine. Most applications have no
need for this capability; for these applications, there is a single
context variable.
Some of the DCX routines make calls to various Run-Time Library (RTL)
routines, for example, LIB$GET_VM. If any of these RTL routines fails,
a return status code indicating the cause of the failure is returned.
In such a case, you must refer to the documentation of the appropriate
RTL routine to determine the cause of the failure. The status codes
documented in this chapter are primarily DCX status codes.
Note also that the application program should declare referenced
constants and return status symbols as external symbols; these symbols
are resolved upon linking with the utility shareable image.
7.1.1 Compression Routines
Compressing a file with the DCX routines involves the following steps:
- Initialize an analysis work area---Use the DCX$ANALYZE_INIT routine
to initialize a work area for analyzing the records. The first (and,
typically, the only) argument passed to DCX$ANALYZE_INIT is an integer
variable for storing the context value. The DCX facility assigns a
value to the context variable and associates the value with the created
work area. Each time you want to analyze a record in that area, specify
the associated context variable. You can analyze two or more files at
once by creating a different work area for each file, giving each area
a different context variable, and analyzing the records of each file in
the appropriate work area.
- Analyze the records in the file---Use the DCX$ANALYZE_DATA routine
to pass each record in the file to an analysis work area. During
analysis, the DCX facility gathers information that DCX$MAKE_MAP uses
to create the compression/expansion function for the file. To ensure
that the first byte of each record is passed to the DCX facility rather
than being interpreted as a carriage control, specify CARRIAGECONTROL =
NONE when you open the file to be compressed.
- Create the compression/expansion function---Use the DCX$MAKE_MAP
routine to create the compression/expansion function. You pass
DCX$MAKE_MAP a context variable, and DCX$MAKE_MAP uses the information
stored in the associated work area to compute a compression/expansion
function for the records being compressed. If DCX$MAKE_MAP returns a
status value of DCX$_AGAIN, repeat Steps 2 and 3 until DCX$MAKE_MAP
returns a status of DCX$_NORMAL, indicating that a
compression/expansion function has been created.
In Example 7-1,
the integer function GET_MAP analyzes each record in the file to be
compressed and invokes DCX$MAKE_MAP to create the compression/expansion
function. The function value of GET_MAP is the return status of
DCX$MAKE_MAP, and the address and length of the compression/expansion
function are returned in the GET_MAP argument list. The main program,
COMPRESS_FILES, invokes the GET_MAP function, examines its function
value, and, if necessary, invokes the GET_MAP function again (see the
ANALYZE DATA program section).
- Clean up the analysis work area---Use the DCX$ANALYZE_DONE routine
to delete a work area. Identify the work area to be deleted by passing
DCX$ANALYZE_DONE routine a context variable.
- Save the compression/expansion function---You cannot expand
compressed records without the compression/expansion function.
Therefore, before compressing the records, write the
compression/expansion function to the file that will contain the
compressed records.
If your programming language cannot use an
address directly, pass the address of the compression/expansion
function to a subprogram (WRITE_MAP in Example 7-1). Pass the
subprogram the length of the compression/expansion function as well.
In the subprogram, declare the dummy argument corresponding to the
function address as a one-dimensional, adjustable, byte array. Declare
the dummy argument corresponding to the function length as an integer,
and use it to dimension the adjustable array. Write the function length
and the array containing the function to the file that is to contain
the compressed records. (The length must be stored so that you can read
the function from the file using unformatted I/O; see Section 7.1.2.)
- Compress each record---Use the DCX$COMPRESS_INIT routine to
initialize a compression work area. Specify a context variable for the
compression area just as for the analysis area.
Use the
DCX$COMPRESS_DATA routine to compress each record. As you compress each
record, use unformatted I/O to write the compressed record to the file
containing the compression/expansion function. For each record, write
the length of the record and the substring containing the record. See
the COMPRESS DATA section in Example 7-1. (The length is stored with
the substring so that you can read the compressed record from the file
using unformatted I/O; see Section 7.1.2.)
- Use DCX$COMPRESS_DONE to delete the work area created by
DCX$COMPRESS_INIT. Identify the work area to be deleted by passing
DCX$COMPRESS_DATA a context variable. Use LIB$FREE_VM to free the
virtual memory that DCX$MAKE_MAP used for the compression/expansion
function.
7.1.2 Expansion Routines
Expanding a file with the DCX routines involves the following steps:
- Read the compression/expansion function---When reading the
compression/expansion function from the compressed file, do not make
any assumptions about the function's size. The best practice is to read
the length of the function from the compressed file and then invoke the
LIB$GET_VM routine to get the necessary amount of storage for the
function. The LIB$GET_VM routine returns the address of the first byte
of the storage area.
If your programming language cannot use an
address directly, pass the address of the storage area to a subprogram.
Pass the subprogram the length of the compression/expansion function as
well. In the subprogram, declare the dummy argument corresponding
to the storage address as a one-dimensional, adjustable, byte array.
Declare the dummy argument corresponding to the function length as an
integer and use it to dimension the adjustable array. Read the
compression/expansion function from the compressed file into the dummy
array. Because the compression/expansion function is stored in the
subprogram, do not return to the main program until you have expanded
all of the compressed records.
- Initialize an expansion work area---Use the DCX$EXPAND_INIT routine
to initialize a work area for expanding the records. The first argument
passed to DCX$EXPAND_INIT is an integer variable to contain a context
value (see step 1 in Section 7.1.1). The second argument is the address
of the compression/expansion function.
- Expand the records---Use the DCX$EXPAND_DATA routine to expand each
record.
- Clean up the work area---Use the DCX$EXPAND_DONE routine to delete
an expansion work area. Identify the work area to be deleted by passing
DCX$EXPAND_DONE a context variable.
7.2 Using the DCX Routines: Examples
Example 7-1 shows how to use the callable DCX routines to compress a
file in a HP Fortran program.
Example 7-2 expands a compressed file. The first record of the
compressed file is an integer containing the number of bytes in the
compression/expansion function. The second record is the
compression/expansion function. The remainder of the file contains the
compressed records. Each compressed record is stored as two records: an
integer containing the length of the record and a substring containing
the record.
Example 7-1 Compressing a File in a HP
Fortran Program |
PROGRAM COMPRESS_FILES
! COMPRESSION OF FILES
! status variable
INTEGER STATUS,
2 IOSTAT,
2 IO_OK,
2 STATUS_OK
PARAMETER (IO_OK = 0)
PARAMETER (STATUS_OK = 1)
INCLUDE '($FORDEF)'
EXTERNAL DCX$_AGAIN
! context variable
INTEGER CONTEXT
! compression/expansion function
INTEGER MAP,
2 MAP_LEN
! normal file name, length, and logical unit number
CHARACTER*256 NORM_NAME
INTEGER*2 NORM_LEN
INTEGER NORM_LUN
! compressed file name, length, and logical unit number
CHARACTER*256 COMP_NAME
INTEGER*2 COMP_LEN
INTEGER COMP_LUN
|
! Logical end-of-file
LOGICAL EOF
! record buffers; 32764 is maximum record size
CHARACTER*32764 RECORD,
2 RECORD2
INTEGER RECORD_LEN,
2 RECORD2_LEN
! user routine
INTEGER GET_MAP,
2 WRITE_MAP
|
! Library procedures
INTEGER DCX$ANALYZE_INIT,
2 DCX$ANALYZE_DONE,
2 DCX$COMPRESS_INIT,
2 DCX$COMPRESS_DATA,
2 DCX$COMPRESS_DONE,
2 LIB$GET_INPUT,
2 LIB$GET_LUN,
2 LIB$FREE_VM
! get name of file to be compressed and open it
STATUS = LIB$GET_INPUT (NORM_NAME,
2 'File to compress: ',
2 NORM_LEN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
STATUS = LIB$GET_LUN (NORM_LUN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
OPEN (UNIT = NORM_LUN,
2 FILE = NORM_NAME(1:NORM_LEN),
2 CARRIAGECONTROL = 'NONE',
2 STATUS = 'OLD')
! ************
! ANALYZE DATA
! ************
! initialize work area
STATUS = DCX$ANALYZE_INIT (CONTEXT)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
! get compression/expansion function (map)
STATUS = GET_MAP (NORM_LUN,
2 CONTEXT,
2 MAP,
2 MAP_LEN)
DO WHILE (STATUS .EQ. %LOC(DCX$_AGAIN))
! go back to beginning of file
REWIND (UNIT = NORM_LUN)
! try map again
STATUS = GET_MAP (NORM_LUN,
2 CONTEXT,
2 MAP,
2 MAP_LEN)
END DO
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
! clean up work area
STATUS = DCX$ANALYZE_DONE (CONTEXT)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
|
! *************
! COMPRESS DATA
! *************
! go back to beginning of file to be compressed
REWIND (UNIT = NORM_LUN)
! open file to hold compressed records
STATUS = LIB$GET_LUN (COMP_LUN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
STATUS = LIB$GET_INPUT (COMP_NAME,
2 'File for compressed records: ',
2 COMP_LEN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
OPEN (UNIT = COMP_LUN,
2 FILE = COMP_NAME(1:COMP_LEN),
2 STATUS = 'NEW',
2 FORM = 'UNFORMATTED')
! initialize work area
STATUS = DCX$COMPRESS_INIT (CONTEXT,
2 MAP)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
! write compression/expansion function to new file
CALL WRITE_MAP (COMP_LUN,
2 %VAL(MAP),
2 MAP_LEN)
! read record from file to be compressed
EOF = .FALSE.
READ (UNIT = NORM_LUN,
2 FMT = '(Q,A)',
2 IOSTAT = IOSTAT) RECORD_LEN,
2 RECORD(1:RECORD_LEN)
IF (IOSTAT .NE. IO_OK) THEN
CALL ERRSNS (,,,,STATUS)
IF (STATUS .NE. FOR$_ENDDURREA) THEN
CALL LIB$SIGNAL (%VAL(STATUS))
ELSE
EOF = .TRUE.
STATUS = STATUS_OK
END IF
END IF
|
DO WHILE (.NOT. EOF)
! compress the record
STATUS = DCX$COMPRESS_DATA (CONTEXT,
2 RECORD(1:RECORD_LEN),
2 RECORD2,
2 RECORD2_LEN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
! write compressed record to new file
WRITE (UNIT = COMP_LUN) RECORD2_LEN
WRITE (UNIT = COMP_LUN) RECORD2 (1:RECORD2_LEN)
! read from file to be compressed
READ (UNIT = NORM_LUN,
2 FMT = '(Q,A)',
2 IOSTAT = IOSTAT) RECORD_LEN,
2 RECORD (1:RECORD_LEN)
IF (IOSTAT .NE. IO_OK) THEN
CALL ERRSNS (,,,,STATUS)
IF (STATUS .NE. FOR$_ENDDURREA) THEN
CALL LIB$SIGNAL (%VAL(STATUS))
ELSE
EOF = .TRUE.
STATUS = STATUS_OK
END IF
END IF
END DO
! close files and clean up work area
CLOSE (NORM_LUN)
CLOSE (COMP_LUN)
STATUS = LIB$FREE_VM (MAP_LEN,
2 MAP)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
STATUS = DCX$COMPRESS_DONE (CONTEXT)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
END
INTEGER FUNCTION GET_MAP (LUN, ! passed
2 CONTEXT, ! passed
2 MAP, ! returned
2 MAP_LEN) ! returned
! Analyzes records in file opened on logical
! unit LUN and then attempts to create a
! compression/expansion function using
! DCX$MAKE_MAP.
|
! dummy arguments
! context variable
INTEGER CONTEXT
! logical unit number
INTEGER LUN
! compression/expansion function
INTEGER MAP,
2 MAP_LEN
! status variable
INTEGER STATUS,
2 IOSTAT,
2 IO_OK,
2 STATUS_OK
PARAMETER (IO_OK = 0)
PARAMETER (STATUS_OK = 1)
INCLUDE '($FORDEF)'
! Logical end-of-file
LOGICAL EOF
! record buffer; 32764 is the maximum record size
CHARACTER*32764 RECORD
INTEGER RECORD_LEN
! library procedures
INTEGER DCX$ANALYZE_DATA,
2 DCX$MAKE_MAP
! analyze records
EOF = .FALSE.
READ (UNIT = LUN,
2 FMT = '(Q,A)',
2 IOSTAT = IOSTAT) RECORD_LEN,RECORD
IF (IOSTAT .NE. IO_OK) THEN
CALL ERRSNS (,,,,STATUS)
IF (STATUS .NE. FOR$_ENDDURREA) THEN
CALL LIB$SIGNAL (%VAL(STATUS))
ELSE
EOF = .TRUE.
STATUS = STATUS_OK
END IF
END IF
|
DO WHILE (.NOT. EOF)
STATUS = DCX$ANALYZE_DATA (CONTEXT,
2 RECORD(1:RECORD_LEN))
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
READ (UNIT = LUN,
2 FMT = '(Q,A)',
2 IOSTAT = IOSTAT) RECORD_LEN,RECORD
IF (IOSTAT .NE. IO_OK) THEN
CALL ERRSNS (,,,,STATUS)
IF (STATUS .NE. FOR$_ENDDURREA) THEN
CALL LIB$SIGNAL (%VAL(STATUS))
ELSE
EOF = .TRUE.
STATUS = STATUS_OK
END IF
END IF
END DO
STATUS = DCX$MAKE_MAP (CONTEXT,
2 MAP,
2 MAP_LEN)
GET_MAP = STATUS
END
SUBROUTINE WRITE_MAP (LUN, ! passed
2 MAP, ! passed
2 MAP_LEN) ! passed
IMPLICIT INTEGER(A-Z)
! write compression/expansion function
! to file of compressed data
! dummy arguments
INTEGER LUN, ! logical unit of file
2 MAP_LEN ! length of function
BYTE MAP (MAP_LEN) ! compression/expansion function
! write map length
WRITE (UNIT = LUN) MAP_LEN
! write map
WRITE (UNIT = LUN) MAP
END
|
Example 7-2 shows how to expand a compressed file in a HP Fortran
program.
Example 7-2 Expanding a Compressed File in a
HP Fortran Program |
PROGRAM EXPAND_FILES
IMPLICIT INTEGER(A-Z)
! EXPANSION OF COMPRESSED FILES
! file names, lengths, and logical unit numbers
CHARACTER*256 OLD_FILE,
2 NEW_FILE
INTEGER*2 OLD_LEN,
2 NEW_LEN
INTEGER OLD_LUN,
2 NEW_LUN
! length of compression/expansion function
INTEGER MAP,
2 MAP_LEN
! user routine
EXTERNAL EXPAND_DATA
! library procedures
INTEGER LIB$GET_LUN,
2 LIB$GET_INPUT,
2 LIB$GET_VM,
2 LIB$FREE_VM
! open file to expand
STATUS = LIB$GET_LUN (OLD_LUN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
STATUS = LIB$GET_INPUT (OLD_FILE,
2 'File to expand: ',
2 OLD_LEN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
OPEN (UNIT = OLD_LUN,
2 STATUS = 'OLD',
2 FILE = OLD_FILE(1:OLD_LEN),
2 FORM = 'UNFORMATTED')
! open file to hold expanded data
STATUS = LIB$GET_LUN (NEW_LUN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
STATUS = LIB$GET_INPUT (NEW_FILE,
2 'File to hold expanded data: ',
2 NEW_LEN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
OPEN (UNIT = NEW_LUN,
2 STATUS = 'NEW',
2 CARRIAGECONTROL = 'LIST',
2 FILE = NEW_FILE(1:NEW_LEN))
|
! expand file
! get length of compression/expansion function
READ (UNIT = OLD_LUN) MAP_LEN
STATUS = LIB$GET_VM (MAP_LEN,
2 MAP)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
! expand records
CALL EXPAND_DATA (%VAL(MAP),
2 MAP_LEN, ! length of function
2 OLD_LUN, ! compressed data file
2 NEW_LUN) ! expanded data file
! delete virtual memory used for function
STATUS = LIB$FREE_VM (MAP_LEN,
2 MAP)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
END
SUBROUTINE EXPAND_DATA (MAP, ! passed
2 MAP_LEN, ! passed
2 OLD_LUN, ! passed
2 NEW_LUN) ! passed
! expand data program
! dummy arguments
INTEGER MAP_LEN, ! length of expansion function
2 OLD_LUN, ! logical unit of compressed file
2 NEW_LUN ! logical unit of expanded file
BYTE MAP(MAP_LEN) ! array containing the function
! status variables
INTEGER STATUS,
2 IOSTAT,
2 IO_OK,
2 STATUS_OK
PARAMETER (IO_OK = 0)
PARAMETER (STATUS_OK = 1)
INCLUDE '($FORDEF)'
! context variable
INTEGER CONTEXT
! logical end_of_file
LOGICAL EOF
! record buffers
CHARACTER*32764 RECORD,
2 RECORD2
INTEGER RECORD_LEN,
2 RECORD2_LEN
|
! library procedures
INTEGER DCX$EXPAND_INIT,
2 DCX$EXPAND_DATA,
2 DCX$EXPAND_DONE
! read data compression/expansion function
READ (UNIT = OLD_LUN) MAP
! initialize work area
STATUS = DCX$EXPAND_INIT (CONTEXT,
2 %LOC(MAP(1)))
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
! expand records
EOF = .FALSE.
! read length of compressed record
READ (UNIT = OLD_LUN,
2 IOSTAT = IOSTAT) RECORD_LEN
IF (IOSTAT .NE. IO_OK) THEN
CALL ERRSNS (,,,,STATUS)
IF (STATUS .NE. FOR$_ENDDURREA) THEN
CALL LIB$SIGNAL (%VAL(STATUS))
ELSE
EOF = .TRUE.
STATUS = STATUS_OK
END IF
END IF
DO WHILE (.NOT. EOF)
! read compressed record
READ (UNIT = OLD_LUN) RECORD (1:RECORD_LEN)
! expand record
STATUS = DCX$EXPAND_DATA (CONTEXT,
2 RECORD(1:RECORD_LEN),
2 RECORD2,
2 RECORD2_LEN)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
! write expanded record to new file
WRITE (UNIT = NEW_LUN,
2 FMT = '(A)') RECORD2(1:RECORD2_LEN)
! read length of compressed record
READ (UNIT = OLD_LUN,
2 IOSTAT = IOSTAT) RECORD_LEN
IF (IOSTAT .NE. IO_OK) THEN
CALL ERRSNS (,,,,STATUS)
IF (STATUS .NE. FOR$_ENDDURREA) THEN
CALL LIB$SIGNAL (%VAL(STATUS))
ELSE
EOF = .TRUE.
STATUS = STATUS_OK
END IF
END IF
END DO
! clean up work area
STATUS = DCX$EXPAND_DONE (CONTEXT)
IF (.NOT. STATUS) CALL LIB$SIGNAL (%VAL(STATUS))
END
|
7.3 DCX Routines
This section describes the individual DCX routines.
DCX$ANALYZE_DATA
The DCX$ANALYZE_DATA routine performs statistical analysis on a data
record. The results of the analysis are accumulated internally in the
context area and are used by the DCX$MAKE_MAP routine to compute the
mapping function.
Format
DCX$ANALYZE_DATA context ,record
RETURNS
OpenVMS usage: |
cond_value |
type: |
longword (unsigned) |
access: |
write only |
mechanism: |
by value |
Longword condition value. Most utility routines return a condition
value in R0. Condition values that this routine can return are listed
under Condition Values Returned.
Arguments
context
OpenVMS usage: |
context |
type: |
longword (unsigned) |
access: |
read only |
mechanism: |
by reference |
Value identifying the data stream that DCX$ANALYZE_DATA analyzes. The
context argument is the address of a longword
containing this value. DCX$ANALYZE_INIT initializes this value; you
should not modify it. You can define multiple context
arguments to identify multiple data streams that are processed
simultaneously.
record
OpenVMS usage: |
char_string |
type: |
character string |
access: |
read only |
mechanism: |
by descriptor |
Record to be analyzed. DCX$ANALYZE_DATA reads the
record argument, which is the address of a descriptor
for the record string. The maximum length of the record string is
65,535 characters.
Description
The DCX$ANALYZE_DATA routine performs statistical analysis on a single
data record. This routine is called once for each data record to be
analyzed.
During analysis, the DCX facility gathers information that DCX$MAKE_MAP
uses to create the compression/expansion function for the file. After
the data records have been analyzed, call the DCX$MAKE_MAP routine.
Upon receiving the DCX$_AGAIN status code from DCX$MAKE_MAP, you must
again analyze the same data records (in the same order) using
DCX$ANALYZE_DATA and then call DCX$MAKE_MAP again. On the second
iteration, DCX$MAKE_MAP returns the DCX$_NORMAL status code, and the
data analysis is complete.
Condition Values Returned
DCX$_INVCTX
|
Error. The context variable is invalid, or the context area is invalid
or corrupted. This may be caused by a failure to call the appropriate
routine to initialize the context variable or by an application program
error.
|
DCX$_NORMAL
|
Normal successful completion.
|
This routine also returns any condition values returned by
LIB$ANALYZE_SDESC_R2.
|