OpenVMS User's Manual
9.2 Sorting Files
To sort files, use the DCL command SORT. Specify the names of the files
to be sorted, separated by commas, followed by the name of the ordered
output file to be created.
Optionally, you can specify a key for each field on which you want to
sort. Each key includes the following information:
- Starting position of the key field in a record (required)
- Size of the key (required)
- Data type of the key
- Order in which the records are sorted
- Priority of the key
If you do not specify any keys, Sort assumes there is only one key and
that this key field:
- Begins in the first position of a record
- Includes the entire record
- Contains character data
- Is sorted in ascending order
The following two examples use the default key.
- In this example, the file NAMES.LST is sorted in ascending order:
$ SORT NAMES.LST BYNAME.LST
|
This command creates the ordered output file BYNAME.LST, as shown
in Figure 9-1.
Figure 9-1 List Sorted in Ascending Order
- In this example, the files NAMES.LST and NAMES2.LST are sorted
into the ordered output file BYNAME.LST. Sort treats the files as if
they were one large file:
$ SORT NAMES.LST,NAMES2.LST BYNAME.LST
|
See Section 9.9 for a complete list of SORT qualifiers.
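The default-key behavior described above can be modeled outside of DCL. The following Python sketch is an illustration only, not part of the Sort/Merge utility; the function name `sort_default` and the file contents are hypothetical. It treats each whole record as one ascending character key and treats several input files as one large file:

```python
def sort_default(*files):
    """Model SORT with no /KEY: whole record, character data, ascending."""
    # Several input files are handled as if they were one large file.
    records = [rec for one_file in files for rec in one_file]
    # Comparing the encoded bytes gives ASCII order for ASCII text.
    return sorted(records, key=lambda rec: rec.encode("ascii"))


names = ["MILLER", "ADAMS", "SMITH"]   # hypothetical contents of NAMES.LST
names2 = ["JONES", "BAKER"]            # hypothetical contents of NAMES2.LST

byname = sort_default(names, names2)
print(byname)   # -> ['ADAMS', 'BAKER', 'JONES', 'MILLER', 'SMITH']
```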
9.2.1 Defining a Key
Use the /KEY qualifier to define a key. When specifying multiple keys,
use a separate /KEY qualifier for each key.
Table 9-2 describes the five elements that make up a key.
Table 9-2 /KEY Qualifier Values
Key Element |
Value |
Description |
Key position
|
POSITION:n
|
The position of the first byte of the key field within the record. The
first byte in a record is position 1. POSITION:n is required.
|
Key size
|
SIZE:n
|
The length of the key field. SIZE:n is required except for
floating-point data.
The data type you specify for the key determines what values are
acceptable when specifying size. The following table lists the possible
values for each type of data and the units used to specify the size of
the key.
Data | Valid Range | Units
Character | 1 through 32,767 | Characters
Binary | 1, 2, 4, 8, or 16 (For the high-performance Sort/Merge utility, the size of a binary data type key must be 1, 2, 4, or 8 bytes. Support of a 16-byte binary key is deferred to a future OpenVMS Alpha release.) | Bytes
Decimal | 1 through 31 | Digits
Floating-point | No value is necessary. |
For decimal data, if the decimal sign is stored in a separate byte, that
byte is not counted toward the size of the data.
If you specify a key that extends beyond the end of a record, Sort
treats the missing characters as null characters.
|
Data type
|
CHARACTER
|
Character data. CHARACTER is the default data type.
|
|
BINARY
|
Binary data.
SIGNED --- Signed binary or decimal data. SIGNED is the default for
binary and decimal data.
UNSIGNED --- Unsigned binary or decimal data.
|
|
F_FLOATING
|
F_FLOATING format data.
|
|
D_FLOATING
|
D_FLOATING format data.
|
|
G_FLOATING
|
G_FLOATING format data.
|
|
H_FLOATING
|
On VAX systems, H_FLOATING format data. (Not currently supported by the
high-performance Sort/Merge utility.)
|
|
S_FLOATING
|
On Alpha systems, IEEE S_FLOATING format data.
|
|
T_FLOATING
|
On Alpha systems, IEEE T_FLOATING format data.
|
|
DECIMAL
|
Decimal data.
TRAILING_SIGN --- Trailing sign decimal data. TRAILING_SIGN is the
default sign position for decimal data.
LEADING_SIGN --- Leading sign decimal data. The leading sign must
be in the first position of the field, and the field must be padded on
the left with zeros.
OVERPUNCHED_SIGN --- Overpunched decimal data. OVERPUNCHED_SIGN is
the default sign representation for decimal data.
SEPARATE_SIGN --- Separate sign decimal data.
|
|
ZONED
|
Zoned decimal data. (Not currently supported by the high-performance
Sort/Merge utility.)
|
|
PACKED_DECIMAL
|
Packed decimal data.
|
Sort order
|
ASCENDING
|
Orders the sorting operation in ascending alphabetical or numerical
order. ASCENDING is the default order.
|
|
DESCENDING
|
Orders the sorting operation in descending alphabetical or numerical
order.
|
Key priority
|
NUMBER:n
|
Specifies the priority of the key. Use NUMBER:n when you do not list
multiple keys in the order of their priority. Valid values are 1
through 255.
|
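The size rules in Table 9-2 can be expressed as a small check. The following Python sketch is an illustration under stated assumptions, not the validation that Sort itself performs: the function name `check_key` is hypothetical, only the CHARACTER, BINARY, DECIMAL, and floating-point rows of the size table are modeled, and the limits are copied from the table above.

```python
# Size limits copied from the key size table; other data types are not modeled.
SIZE_LIMITS = {
    "CHARACTER": range(1, 32768),   # 1 through 32,767 characters
    "BINARY": (1, 2, 4, 8, 16),     # bytes
    "DECIMAL": range(1, 32),        # 1 through 31 digits
}
FLOATING_TYPES = {"F_FLOATING", "D_FLOATING", "G_FLOATING",
                  "H_FLOATING", "S_FLOATING", "T_FLOATING"}


def check_key(position, size=None, data_type="CHARACTER", number=None):
    """Return a list of problems found in one key description (empty if none)."""
    problems = []
    if position < 1:
        problems.append("POSITION must be 1 or greater")
    if data_type in FLOATING_TYPES:
        pass  # SIZE is not required for floating-point keys
    elif data_type in SIZE_LIMITS:
        if size is None:
            problems.append("SIZE is required for " + data_type)
        elif size not in SIZE_LIMITS[data_type]:
            problems.append("SIZE %d is not valid for %s" % (size, data_type))
    else:
        problems.append("data type not covered by this sketch: " + data_type)
    if number is not None and not 1 <= number <= 255:
        problems.append("NUMBER must be from 1 to 255")
    return problems


print(check_key(5, 4, "DECIMAL"))             # -> []
print(check_key(29, data_type="F_FLOATING"))  # -> []
print(check_key(1, 3, "BINARY"))              # -> ['SIZE 3 is not valid for BINARY']
```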
If the data in the key fields is not character data, you must specify
the data type. The following data types are recognized by the
Sort/Merge utility:
- BINARY, [SIGNED]
- BINARY, UNSIGNED
- CHARACTER
- DECIMAL, LEADING_SIGN, SEPARATE_SIGN, [SIGNED]
- DECIMAL, LEADING_SIGN, [OVERPUNCHED_SIGN, SIGNED]
- DECIMAL [,SIGNED, TRAILING_SIGN, OVERPUNCHED_SIGN]
- DECIMAL, [TRAILING_SIGN], SEPARATE_SIGN, [SIGNED]
- DECIMAL, UNSIGNED
- D_FLOATING
- F_FLOATING
- G_FLOATING
- H_FLOATING
- S_FLOATING, IEEE (Alpha systems only)
- T_FLOATING, IEEE (Alpha systems only)
- PACKED_DECIMAL
- ZONED
The items in brackets are defaults and need not be specified.
Note
For decimal string data, the Sort/Merge utility reports an invalid
digit in the input string differently for VAX and Alpha systems. On VAX
systems, you receive a message that the invalid digit (or reserved
operand) is converted to a valid decimal string for comparison
purposes. On Alpha systems, Sort/Merge performs the same conversion but
does not display a message. In both cases, the data from the input file
is written to the output file without change.
|
In Figure 9-2, each record in the file EMPLOYEE.LST consists of
three fields: (1) a department name, (2) an account
number, and (3) an employee name.
Figure 9-2 Record Fields in a List
The following examples illustrate how to sort the records in
EMPLOYEE.LST both with and without a key field:
- In this example, EMPLOYEE.LST is sorted by account number, using
the /KEY qualifier to describe the account number field:
$ SORT/KEY=(POSITION:5,SIZE:4,DECIMAL) EMPLOYEE.LST BILLING1.LST
|
This command specifies that the key field (the account number)
starts in position 5, is 4 characters long, contains decimal data, and
should be sorted in ascending order (the default). Figure 9-3 shows
the results of this Sort operation.
Figure 9-3 Sorting by Key Field
- This example shows how to sort the file EMPLOYEE.LST without
specifying a key field:
$ SORT EMPLOYEE.LST BYDEPT.LST
|
Because no key is specified, Sort assumes the default
characteristics. Figure 9-4 shows the result of this Sort operation.
Figure 9-4 Sorting with Default Key Records
Sort treats each record in EMPLOYEE.LST as one key of character
data. In this example, each record includes a department name, an
account number, and an employee name. If Sort finds a duplicate
department name, it sorts the names by account number. If it then finds
a duplicate account number, it sorts by employee name. Note that the
account number is part of the record. Unless you specify otherwise, it
is treated as character data.
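The difference between a key sort and a default sort can be sketched in Python. This is a model for illustration, not the utility's implementation. The record layout (department in positions 1 to 3, account number in positions 5 to 8, employee name starting at position 10) is inferred from the /KEY values used in this section; the record contents and the function names `key_field` and `sort_by_key` are hypothetical. The decimal conversion assumes unsigned digit strings.

```python
def key_field(record, position, size):
    """Extract a key field; POSITION is 1-based, as in /KEY."""
    start = position - 1
    # A key that runs past the end of the record is padded with nulls.
    return record[start:start + size].ljust(size, "\0")


def sort_by_key(records, position, size, decimal=False):
    def key(rec):
        field = key_field(rec, position, size)
        return int(field) if decimal else field   # assumes unsigned digits
    return sorted(records, key=key)


employee = [                      # hypothetical records
    "ENG 1042 SMITH, JANE",
    "ADM 2000 JONES, BOB",
    "ENG 0977 ADAMS, ANN",
]

billing1 = sort_by_key(employee, 5, 4, decimal=True)   # like /KEY=(POSITION:5,SIZE:4,DECIMAL)
bydept = sorted(employee)                              # like SORT with no key

print(billing1[0])   # -> ENG 0977 ADAMS, ANN
print(bydept[0])     # -> ADM 2000 JONES, BOB
```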
9.2.2 Multiple Key Fields
You can sort with more than one key (up to a limit of 255 keys). You
can specify multiple keys in order of their priority with the primary
key first, the secondary key next, and so on. Alternatively, you can
specify a key's priority using NUMBER:n. Each key can be
ascending or descending.
In the following example, the file EMPLOYEE.LST is sorted by the
employee name key first and then, where names are identical, by
the account number:
$ SORT /KEY=(POSITION:10,SIZE:15,CHARACTER) -
_$ /KEY=(POSITION:5,SIZE:4,DECIMAL) EMPLOYEE.LST BILLING2.LST
|
Figure 9-5 shows the results of this Sort operation.
Figure 9-5 Sorting with Multiple Key Fields
In the following example, records are sorted first by the department
name in descending order, then by the employee name in ascending order:
$ SORT/KEY=(POSITION:1,SIZE:3,DESCENDING) -
_$ /KEY=(POSITION:10,SIZE:15) -
_$ EMPLOYEE.LST BILLING3.LST
|
Figure 9-6 shows the results of this Sort operation.
Figure 9-6 Sorting with Multiple Key Fields (Ascending and
Descending Order)
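Key priority and mixed ascending and descending order can be modeled with repeated stable sorts, applied from the lowest-priority key to the highest. The Python sketch below is one way to reproduce the ordering; it makes no claim about how Sort works internally. The records and the function name `sort_multi` are hypothetical, keys are compared as character data only, and a key without a `number` entry is assumed to take its priority from its position in the list.

```python
def sort_multi(records, keys):
    """keys: dicts with 'position', 'size', optional 'descending' and 'number'."""
    # Priority comes from NUMBER:n if given, otherwise from the listed order.
    ordered = sorted(enumerate(keys),
                     key=lambda item: item[1].get("number", item[0] + 1))
    result = list(records)
    # Sort by the lowest-priority key first; each later pass is stable,
    # so earlier orderings survive among records with equal keys.
    for _, k in reversed(ordered):
        start = k["position"] - 1
        result.sort(key=lambda rec, s=start, n=k["size"]: rec[s:s + n],
                    reverse=k.get("descending", False))
    return result


employee = [                      # hypothetical records
    "ENG 1042 SMITH, JANE",
    "ADM 2000 JONES, BOB",
    "ENG 0977 ADAMS, ANN",
    "ADM 0310 BAKER, SUE",
]

# Department descending, then employee name ascending.
billing3 = sort_multi(employee, [
    {"position": 1, "size": 3, "descending": True},
    {"position": 10, "size": 15},
])
for rec in billing3:
    print(rec)
```

Listing the same keys in the opposite order with explicit priorities (`"number": 2` for the name key and `"number": 1` for the department key) produces the same result in this model.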
9.2.3 Identical Key Fields
By default, Sort/Merge keeps records with identical key fields but does
not necessarily maintain the same order in which they appeared in the
input file. To control the way in which records with identical keys are
sorted, specify one of the following qualifiers:
- /STABLE
Maintains the input order of records with identical
keys. If you use this qualifier when sorting multiple input files, on
output, records with equal keys in the first file precede those from
the second file and so on.
- /NODUPLICATES
Retains only one copy of records with identical
keys. If you want to specify which duplicate record to keep, invoke
Sort at the program level and specify an equal-key routine.
The /STABLE and /NODUPLICATES qualifiers are incompatible. You cannot
specify both qualifiers on the same command line.
In the following example, records with duplicate account numbers are
eliminated from the file EMPLOYEE.LST:
$ SORT /KEY=(POSITION:5,SIZE:4)/NODUPLICATES EMPLOYEE.LST BUDGET.LST
|
Figure 9-7 shows the results of this Sort operation.
Figure 9-7 Sorting with Identical Key Fields
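The effect of /STABLE and /NODUPLICATES can be sketched in Python, whose built-in sort is stable. This is a model only. The records and function names are hypothetical, and the choice to keep the first record of each group of duplicates is an assumption of the sketch; Sort does not promise which duplicate it keeps unless you supply an equal-key routine.

```python
def key_of(record, position, size):
    return record[position - 1:position - 1 + size]


def sort_stable(files, position, size):
    """Model /STABLE: equal keys keep input order, first file first."""
    records = [rec for one_file in files for rec in one_file]
    return sorted(records, key=lambda rec: key_of(rec, position, size))


def sort_noduplicates(files, position, size):
    """Model /NODUPLICATES: one record per key (here, the first one seen)."""
    kept, seen = [], set()
    for rec in sort_stable(files, position, size):
        k = key_of(rec, position, size)
        if k not in seen:
            seen.add(k)
            kept.append(rec)
    return kept


file1 = ["ENG 1042 SMITH, JANE", "ADM 0310 BAKER, SUE"]   # hypothetical
file2 = ["OPS 1042 JONES, BOB", "ENG 0977 ADAMS, ANN"]    # hypothetical

stable = sort_stable([file1, file2], 5, 4)
budget = sort_noduplicates([file1, file2], 5, 4)
print(len(stable), len(budget))   # -> 4 3
```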
9.2.4 Noncharacter Data
If you sort records that contain items other than character data,
specify the data type of each key. In addition, take care in
calculating starting positions and sizes because the items being
compared can occupy more than 1 byte.
If you are sorting a file that contains 20 characters followed by 3
floating-point numbers in F_floating format, the positions are as
follows:
- The character data occupies positions 1 to 20 (20 characters).
- The first F_floating number occupies positions 21 to 24.
- The second F_floating number occupies positions 25 to 28.
- The third F_floating number occupies positions 29 to 32.
To sort the file by the third floating-point number, specify the key
field as follows:
$ SORT/KEY=(POSITION:29,F_FLOATING) STATS.RAW STATS.SOR
|
You do not need to specify the size of the floating-point number
because it is fixed at four bytes.
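The position arithmetic above generalizes to any fixed layout: each field starts one byte after the previous field ends. The following Python sketch (the function name `field_positions` is hypothetical) computes the 1-based start and end positions from a list of field sizes in bytes:

```python
def field_positions(sizes):
    """Return (first, last) 1-based byte positions for consecutive fields."""
    positions = []
    first = 1
    for size in sizes:
        positions.append((first, first + size - 1))
        first += size
    return positions


# 20 characters followed by three 4-byte F_floating numbers.
layout = field_positions([20, 4, 4, 4])
print(layout)   # -> [(1, 20), (21, 24), (25, 28), (29, 32)]
```

The start of the last pair, 29, is the value used for POSITION in the command above.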
9.2.5 Output File Organization
By default, Sort produces an output file with the same file
organization as that of the first input file. To specify a different
output file organization, include one of the following qualifiers after
the output file specification on the Sort command line:
- /FORMAT (record format)
When you use this output qualifier,
you can define the file record format, length, and block size.
- /INDEXED_SEQUENTIAL
Using this qualifier, you can define the
output to have indexed sequential file organization.
  If you specify indexed sequential as the output file organization, you
  must also do the following:
  - Before you perform the Sort operation, create an empty file to be
  used as the output file. Sort requires an output file that already
  exists and is empty.
  - Include the /OVERLAY qualifier after the name of the output file on
  the SORT command line. The /OVERLAY qualifier indicates the existing
  file is to be overlaid with the sorted records of the input file.
- /RELATIVE
Using this qualifier, you can define the output to
have relative file organization.
- /SEQUENTIAL
Using this qualifier, you can define the output
to have sequential file organization.
In the following example, a sequential file is produced after the
indexed sequential file EMPLOYEE.LST is sorted:
$ SORT/KEY=(POSITION:10,SIZE:15) -
_$ EMPLOYEE.LST BYNAME.LST/SEQUENTIAL
|
9.2.6 Sorting Process
Sort arranges files using one of four internal processes: record, tag,
address, or index. (The high-performance Sort/Merge utility supports
only the record process. Implementation of tag, address, and index
processes is deferred to a future OpenVMS Alpha release.) The process
you specify can affect the efficiency of the Sort operation. Refer to
Section 9.8 for information about optimizing a Sort or Merge
operation.
The following table describes the four types of process. Use the
/PROCESS=type qualifier to specify the sort process.
Sort Process |
type |
Description |
Record
|
RECORD
|
Keeps records intact while sorting and produces an output file
consisting of complete records. Record is the default sorting process.
|
Tag
|
TAG
|
Sorts the key fields only and then rereads the input file to produce an
output file of complete records. The net result is the same as for a
complete record sort.
A tag sort is useful if disk space is low because it typically uses
less work file space during the sorting. In most cases, a tag sort is
slower than a record sort because it requires extra time to reread the
input file.
|
Address
|
ADDRESS
|
Sorts the key fields only and produces an output file that is an index
of record file addresses (RFAs) in binary format.
An address sort is faster than a record sort but you must write a
program to associate the record addresses with the records of the input
file.
|
Index
|
INDEX
|
Sorts the key fields only and produces an output file of keys and RFAs
(in binary format).
As with an address sort, an index sort is faster than a record
sort, but you must write a program to associate the record addresses
with the records of the input file.
|
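The four processes differ mainly in what is carried through the sort and what is written out. The Python sketch below models that difference only. Byte offsets into a newline-terminated text stand in for RFAs, which is an assumption of the sketch (real address and index output is binary), and the records and function names are hypothetical.

```python
def load(records):
    """Return (address, record) pairs; byte offsets stand in for RFAs."""
    pairs, offset = [], 0
    for rec in records:
        pairs.append((offset, rec))
        offset += len(rec) + 1        # +1 for a newline terminator
    return pairs


def key_of(rec, position, size):
    return rec[position - 1:position - 1 + size]


def record_sort(pairs, position, size):
    """RECORD: whole records move through the sort."""
    ordered = sorted(pairs, key=lambda p: key_of(p[1], position, size))
    return [rec for _, rec in ordered]


def index_sort(pairs, position, size):
    """INDEX: output is keys plus addresses."""
    return sorted((key_of(rec, position, size), addr) for addr, rec in pairs)


def address_sort(pairs, position, size):
    """ADDRESS: output is addresses only."""
    return [addr for _, addr in index_sort(pairs, position, size)]


def tag_sort(pairs, position, size):
    """TAG: sort keys, then reread the input to emit complete records."""
    by_address = dict(pairs)          # stands in for rereading the file
    return [by_address[a] for a in address_sort(pairs, position, size)]


pairs = load([                        # hypothetical records
    "ENG 1042 SMITH, JANE",
    "ADM 2000 JONES, BOB",
    "ENG 0977 ADAMS, ANN",
])
print(address_sort(pairs, 5, 4))      # -> [41, 0, 21]
print(tag_sort(pairs, 5, 4) == record_sort(pairs, 5, 4))   # -> True
```

In this model, the caller of `address_sort` or `index_sort` still has to map each address back to a record, which mirrors the requirement to write a program for that step.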
9.3 Specifying a Collating Sequence
Characters are sorted according to a collating
sequence. For files that contain character data, you can use
the /COLLATING_SEQUENCE=sequence qualifier to specify the
collating sequence. The following table describes the collating
sequence options:
Collating Sequence |
sequence |
Description |
ASCII
|
ASCII
|
The default collating sequence for character data. The ASCII sequence
orders numbers (0 to 9) first, then uppercase letters (A to Z), and
then lowercase letters (a to z).
|
EBCDIC
|
EBCDIC
|
Generates an output file that is ordered in EBCDIC sequence. The data
remains in the ASCII representation. The EBCDIC sequence orders
lowercase letters (a to z) first, then uppercase letters (A to Z), and
then numbers (0 to 9).
|
DEC Multinational character set
|
MULTINATIONAL
|
The multinational collating sequence collates characters according to
the DEC Multinational character set (refer to Appendix A). In the
MULTINATIONAL character sequence, characters are ordered according to
the following rules:
- All diacritical forms of a character are given the collating value
of the character (Á, Ä, À collate as A).
- Lowercase characters are given the collating value of their
uppercase equivalents (a collates as A, ä collates as Ä).
- If two strings compare as equal, tie-breaking is performed. The
strings are compared to detect differences due to diacritical marks,
ignored characters, or characters that collate as equal although they
are actually different. If strings still compare as equal, another
comparison is done based on the numeric codes of the characters. In
this final comparison, lowercase characters are ordered before
uppercase.
|
National character set (NCS)
|
Collating sequence name
|
The named collating sequence must be defined in an NCS library. For
more information, see the OpenVMS National Character Set Utility Manual.
(The high-performance Sort/Merge utility does not support the
National Character Set (NCS) collating sequences. Support for NCS
collating sequences is deferred to a future OpenVMS Alpha release.)
|
User-defined sequence
|
(sequence-string)
|
Specifies a user-defined collating sequence. User-defined collating
sequences are supported only through specification files and not
through the command line interface.
(The high-performance Sort/Merge utility does not support
user-defined collating sequences. Support for user-defined collating
sequences is deferred to a future OpenVMS Alpha release.)
|
|
|
Define a collating sequence by specifying a string of single or double
characters or ranges of single characters. (A double character is any
set of two single characters collated as if they were one character.
For example, "CH" can be defined to collate as "C".) This string should
be enclosed in parentheses.
You can also represent characters by their corresponding octal,
decimal, or hexadecimal values using the radix operators: %O, %D, %X.
You must observe the following rules when defining your collating
sequence:
- Enclose characters in quotation marks ("").
- Separate each character and character range with a comma (,), and
enclose the entire list in parentheses.
- Give all the characters appearing in the character keys in the Sort
or Merge operation a collating value. Any character not given a
collating value will be ignored unless the FOLD or MODIFICATION options
are specified.
- Do not define a character more than once.
- Do not specify the null character by using quotation marks ("").
Instead, use a radix operator such as %X0.
- Specify quotation marks by enclosing them within another set of
quotation marks ("" "") or by using a radix operator.
The following string defines a collating sequence in which the
double character LL collates as a single character between L and M.
("A"-"L","LL","M"-"Z")
|
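The effect of a double character in a user-defined sequence can be modeled as follows. This Python sketch is illustrative and is not the utility's algorithm: the function names are hypothetical, double characters are matched greedily from left to right (an assumption), characters with no collating value are ignored, and tie-breaking is not modeled.

```python
def build_sequence(elements):
    """elements: single characters, double characters, or (first, last) ranges."""
    weights = {}
    for elem in elements:
        if isinstance(elem, tuple):
            first, last = elem
            chars = [chr(code) for code in range(ord(first), ord(last) + 1)]
        else:
            chars = [elem]
        for ch in chars:
            if ch in weights:
                raise ValueError("character defined more than once: " + ch)
            weights[ch] = len(weights)    # later elements collate higher
    return weights


def collate_key(text, weights):
    key, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in weights:   # double character, e.g. "LL"
            key.append(weights[pair])
            i += 2
        else:
            if text[i] in weights:               # undefined characters are ignored
                key.append(weights[text[i]])
            i += 1
    return key


# Models ("A"-"L","LL","M"-"Z")
seq = build_sequence([("A", "L"), "LL", ("M", "Z")])
words = ["LLAMA", "LUNA", "MESA", "LAGO"]
print(sorted(words, key=lambda w: collate_key(w, seq)))
# -> ['LAGO', 'LUNA', 'LLAMA', 'MESA']
```

With the plain ASCII sequence, LLAMA would sort before LUNA; here LL collates after every string that begins with a single L.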
Note
Exercise caution when using the multinational collating sequence to
sort or merge files for further processing. Sequence-checking
procedures in most programming languages compare the numeric codes of
characters. Normal sequence checking does not work because the
multinational sequence is based on actual graphic characters, not the
codes representing those characters.
|
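The ASCII, EBCDIC, and multinational orderings described in the table above, and the sequence-checking problem described in the Note, can be illustrated with a short Python model. The sketch rests on several assumptions: code page 037 stands in for the EBCDIC translation, Unicode decomposition stands in for the DEC Multinational rules, and multinational tie-breaking is omitted. All function names and sample strings are hypothetical.

```python
import unicodedata


def ascii_key(text):
    return text.encode("ascii")


def ebcdic_key(text):
    # Assumption: code page 037 stands in for the EBCDIC sequence.
    return text.encode("cp037")


def multinational_key(text):
    # Simplified: drop diacritical marks and fold case; no tie-breaking.
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.upper()


def in_code_order(items):
    """A naive sequence check that compares character codes."""
    return all(a <= b for a, b in zip(items, items[1:]))


codes = ["b2", "B2", "22"]
by_ascii = sorted(codes, key=ascii_key)      # digits, uppercase, lowercase
by_ebcdic = sorted(codes, key=ebcdic_key)    # lowercase, uppercase, digits

words = ["Zoe", "\u00e9lan", "Eden", "apple"]   # "\u00e9" is e with an acute accent
by_multinational = sorted(words, key=multinational_key)

print(by_ascii)                          # -> ['22', 'B2', 'b2']
print(by_ebcdic)                         # -> ['b2', 'B2', '22']
print(in_code_order(by_multinational))   # -> False
```

The last line shows the point of the Note: output that is correctly ordered in the multinational sequence fails a check that compares character codes.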
The following examples demonstrate the creation of user-defined
collating sequences for use in specification files. See Section 9.7
for information about specification files.
-
/COLLATING_SEQUENCE=(SEQUENCE=ASCII,IGNORE=("-"," "))
|
This /COLLATING_SEQUENCE qualifier with an IGNORE option specified
results in the following fields being compared as equal before
tie-breaking:
252-3412
252 3412
2523412
|
-
/COLLATING_SEQUENCE=(SEQUENCE=("A"-"L","LL","M"-"R","RR","S"-"Z"))
|
This /COLLATING_SEQUENCE qualifier defines a sequence in which the
double character LL collates as a single character between L and M, and
the double character RR collates as a single character between R and S.
These double characters would otherwise appear in their usual
alphabetical order. By default, this user-defined sequence does not
define any other characters, such as lowercase a to z.
|
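The IGNORE option in the first example can be modeled by removing the ignored characters before comparison. The Python sketch below is illustrative only; the function name `ignoring_key` is hypothetical, and the tie-breaking that Sort performs afterward is not modeled.

```python
def ignoring_key(text, ignore=("-", " ")):
    """Build a comparison key with the ignored characters removed."""
    return "".join(ch for ch in text if ch not in ignore)


fields = ["252-3412", "252 3412", "2523412"]
keys = {ignoring_key(f) for f in fields}
print(keys)   # -> {'2523412'}

# Fields that differ in their digits still sort by those digits.
mixed = ["252-3413", "252 3412", "2523411"]
print(sorted(mixed, key=ignoring_key))
# -> ['2523411', '252 3412', '252-3413']
```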