HP OpenVMS/Hangul RTL Korean Processing (HSY$)
Manual
HSYSHR provides the following features and capabilities:
- HSYSHR performs a wide range of general multi-byte processing
operations. You can call the HSY$ routines instead of writing your own
code to perform the operation.
- Routines in HSYSHR follow the OpenVMS Procedure Calling Standard.
This allows you to call any HSY$ routine from any programming language
supported in OpenVMS/Hangul, which increases program flexibility.
- Because all routines are shared, they take up less of a process's
virtual address space.
- When new versions of HSYSHR are installed, you do not need to
revise your calling program, and generally do not need to relink.
Routines in HSYSHR execute entirely in the mode of the caller and are
intended to be called in user mode. To link an application that
contains explicit calls to HSYSHR, use the following LINK command:
$ LINK program, SYS$LIBRARY:HSYIMGLIB.OLB/LIBRARY
Chapter 2 MULTI-BYTE CHARACTER CONCEPTS
This chapter describes some important multi-byte character concepts
that are used throughout the documentation.
The DEC Hangul character set is implemented as a multi-byte character
set containing Korean characters, punctuation marks, and various kinds
of symbols. Each multi-byte character is a two-byte character with
the most significant bit of the first byte always set. The
OpenVMS/Hangul operating system adopts the DEC Hangul character set,
and all Korean characters are represented as multi-byte
characters from that character set. For a detailed discussion of the
DEC Hangul character set, refer to the OpenVMS/Hangul User
Guide.
In HSYSHR, most routines process data character by character, in
contrast to conventional byte-by-byte processing. Some routines require
the input character pointer to point at a proper character boundary
in the user buffer. "Pointing at a proper character boundary" means
that the character pointer must not point to the second byte
of a multi-byte character.
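The boundary rule can be illustrated with a small Python model. This is only a sketch and not part of HSYSHR: the names char_starts and on_boundary are hypothetical, and the model assumes that any byte with its most significant bit set begins a two-byte character whenever another byte follows it.

```python
def char_starts(buf):
    """Return the byte offsets at which a character begins in buf."""
    starts = []
    i = 0
    while i < len(buf):
        starts.append(i)
        # Assumption: a byte with the MSB set leads a two-byte character
        # whenever at least one more byte follows it.
        i += 2 if buf[i] & 0x80 and i + 1 < len(buf) else 1
    return starts


def on_boundary(buf, offset):
    """True if offset points at the first byte of a character."""
    return offset in char_starts(buf)


# "A", the two-byte character B0A1 (hex), then "B".
sample = b"A\xb0\xa1B"
assert char_starts(sample) == [0, 1, 3]
assert not on_boundary(sample, 2)   # offset 2 is the second byte of B0A1
```

Offset 2 is the kind of position that routines requiring a proper character boundary must not be given.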
The DEC Hangul character set includes a set of two-byte ASCII
characters. To distinguish them from the conventional one-byte 7-bit
ASCII characters, the terms "full form" and "half form" are
used. Full form characters are two-byte ASCII characters, whereas
half form characters are one-byte 7-bit ASCII characters.
The conversion routines in HSYSHR convert between full form and half
form characters. In applications where character matching must treat
full form and half form characters as equivalent, you can call the
searching routines in HSYSHR and specify the conversion flag argument.
Note that uppercasing and lowercasing can both be applied to full form
characters.
In HSYSHR, a multi-byte character passed as a single character
argument is represented differently from one inside a character string
argument. A single character argument uses an unsigned longword integer
representation, whereas characters in a string argument use the normal
character string representation. For example, the two-byte character
B0A1 (hex) is represented differently in the following two cases.
Single character argument: (VMS Usage - longword_unsigned)
      +--+--+--+--+
      |00|00|B0|A1|
      +--+--+--+--+
       H         L
In a string argument: (VMS Usage - char_string)
      --+--+--+-    -+--+
   .... |A1|B0|....|  | start of string
      --+--+--+-    -+--+
       H               L
The read routines in HSYSHR read a character from a buffer in character
string format and return it in unsigned longword format. The write
routines take a character in unsigned longword format and write it to
the buffer in character string format.
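The two representations can be sketched in Python as follows. This is an illustrative model, not HSYSHR code: the helper names are hypothetical, and it assumes that the diagrams draw higher addresses on the left (the H and L labels), so that byte B0 comes first in the string. The little-endian layout of a longword in memory on OpenVMS systems is shown with struct.pack.

```python
import struct


def string_to_longword(two_bytes):
    """Model of a read routine: string bytes -> unsigned longword value."""
    # The first byte of the string becomes the higher-order byte.
    return int.from_bytes(two_bytes, "big")


def longword_to_string(ch):
    """Model of a write routine: unsigned longword value -> string bytes."""
    return ch.to_bytes(2, "big") if ch > 0xFF else bytes([ch])


ch = string_to_longword(b"\xb0\xa1")
assert ch == 0x0000B0A1
assert longword_to_string(ch) == b"\xb0\xa1"

# In memory, the little-endian longword stores its low-order byte first,
# so its byte order differs from the order of the same character in a string.
assert struct.pack("<I", ch) == b"\xa1\xb0\x00\x00"
```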
HSY$ Reference Section
This section provides detailed discussions of the routines provided in
the Korean Processing Run Time Library HSYSHR.
HSY$CH_MOVE
HSY$CH_MOVE moves a substring from a specified source buffer to a
specified destination buffer.
Format
HSY$CH_MOVE len,src,dst
Arguments
len
VMS usage:  longword_signed
type:       longword integer (signed)
access:     read only
mechanism:  by value
The length in bytes of the substring to be moved.
src
VMS usage:  longword_unsigned
type:       longword integer (unsigned)
access:     read only
mechanism:  by value
The address of the starting position of the source buffer.
dst
VMS usage:  longword_unsigned
type:       longword integer (unsigned)
access:     read only
mechanism:  by value
The address of the starting position of the destination buffer.
Description
This routine is not multi-byte aware: it moves bytes, not characters.
If len does not fall on a proper multi-byte character boundary, for
example when it splits a two-byte character, only half of that
multi-byte character is moved into the last position of the destination
string.
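The effect of a len value that splits a character can be shown with a byte-wise copy in Python. This is a sketch of the described behavior only: ch_move here is a hypothetical stand-in that takes Python buffers rather than addresses.

```python
def ch_move(length, src, dst):
    """Copy the first `length` bytes of src into dst, byte by byte.

    Characters are not inspected, mirroring the byte-oriented behavior
    described for HSY$CH_MOVE.
    """
    dst[0:length] = src[0:length]


src = b"\xb0\xa1\xb0\xa2"        # two two-byte characters
dst = bytearray(4)               # zero-filled destination buffer

ch_move(3, src, dst)             # 3 bytes: one and a half characters
assert bytes(dst) == b"\xb0\xa1\xb0\x00"
```

The destination now ends with the first byte of a two-byte character and nothing after it, so callers are responsible for passing a len that ends on a character boundary.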
HSY$DX_TRIM
HSY$DX_TRIM trims trailing one-byte and multi-byte spaces and TAB
characters.
Format
HSY$DX_TRIM dst,src,[len]
RETURNS
VMS usage:  cond_value
type:       longword (unsigned)
access:     write only
mechanism:  by value
Arguments
dst
VMS usage:  char_string
type:       character string
access:     write only
mechanism:  by descriptor
The destination string to store the trimmed string.
src
VMS usage:  char_string
type:       character string
access:     read only
mechanism:  by descriptor
The source string that is to be trimmed.
len
VMS usage:  word_signed
type:       word integer (signed)
access:     write only
mechanism:  by reference
The length in bytes of the trimmed string. If this optional argument is
not supplied, no length information of the trimmed string will be
returned to the caller.
Description
dst and src can contain one-byte and
multi-byte characters.
CONDITION VALUES RETURNED
LIB$_INVSTRDES  Invalid string descriptor. A string descriptor has an
                invalid value in its DSC$B_CLASS field.
LIB$_STRTRU     Procedure successfully completed. String truncated.
LIB$_FATERRLIB  Fatal internal error. An internal consistency check has
                failed.
LIB$_INSVIRMEM  Insufficient virtual memory.
SS$_NORMAL      Procedure successfully completed.
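The trimming rule can be modeled in Python as shown below. This is a sketch under stated assumptions, not the HSYSHR implementation: trimmed_length is a hypothetical name, the two-byte space is assumed to have the code A1A1 (hex), and a byte with its most significant bit set is assumed to begin a two-byte character whenever another byte follows it.

```python
FULL_SPACE = b"\xa1\xa1"              # assumed code of the two-byte space
BLANKS = {b" ", b"\t", FULL_SPACE}    # characters removed from the end


def trimmed_length(buf):
    """Length in bytes of buf after trailing blanks are dropped."""
    end = 0      # offset just past the last non-blank character seen
    i = 0
    while i < len(buf):
        # Step character by character so that every comparison is made
        # against a whole character, never against half of one.
        width = 2 if buf[i] & 0x80 and i + 1 < len(buf) else 1
        if buf[i:i + width] not in BLANKS:
            end = i + width
        i += width
    return end


assert trimmed_length(b"AB \t") == 2
assert trimmed_length(b"\xb0\xa1\xa1\xa1 ") == 2   # B0A1, two-byte space, space
assert trimmed_length(b" \t") == 0
```

The value returned corresponds to what the optional len argument would report for the trimmed string in this model.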
HSY$DX_TRUNC
HSY$DX_TRUNC truncates the input string to the specified length.
Format
HSY$DX_TRUNC dst,src,offset,[len]
RETURNS
VMS usage:  cond_value
type:       longword (unsigned)
access:     write only
mechanism:  by value
Arguments
dst
VMS usage:  char_string
type:       character string
access:     write only
mechanism:  by descriptor
The specified destination string to store the truncated string.
src
VMS usage:  char_string
type:       character string
access:     read only
mechanism:  by descriptor
The specified source string to be truncated.
offset
VMS usage:  word_signed
type:       word integer (signed)
access:     read only
mechanism:  by reference
The offset in bytes from the starting position of the source string
which indicates the position of the first character just after the
truncated string. Note that this offset may not be on the proper
character boundary, e.g. it may point to the second byte of a two-byte
character.
len
VMS usage:  word_signed
type:       word integer (signed)
access:     write only
mechanism:  by reference
The length in bytes of the truncated string. If this optional argument
is not supplied, no length information of the truncated string will be
returned to the caller.
Description
The value returned in len is not necessarily equal to the value
specified in offset, because offset may not point at the first byte of
a multi-byte character. In either case, the character indicated by
offset is treated as the first character that follows the truncated
string.
CONDITION VALUES RETURNED
LIB$_INVSTRDES  Invalid string descriptor. A string descriptor has an
                invalid value in its DSC$B_CLASS field.
LIB$_STRTRU     Procedure successfully completed. Truncated string is
                further truncated due to insufficient space allocated
                in the destination string buffer.
LIB$_FATERRLIB  Fatal internal error. An internal consistency check has
                failed.
LIB$_INSVIRMEM  Insufficient virtual memory.
SS$_NORMAL      Procedure successfully completed.
HSY$TRIM
HSY$TRIM trims trailing one-byte and multi-byte spaces and TAB
characters.
Format
HSY$TRIM str,len
RETURNS
VMS usage:  longword_signed
type:       longword integer (signed)
access:     write only
mechanism:  by value
The offset in bytes from the starting position of the input string
which indicates the position of the terminating character of the
trimmed string. If the terminating character is a multi-byte character,
the returned offset will be pointing to the first byte of the
multi-byte character.
Arguments
str
VMS usage:  longword_unsigned
type:       longword integer (unsigned)
access:     read only
mechanism:  by value
The address of the starting position of the input string to be trimmed.
len
VMS usage:  longword_signed
type:       longword integer (signed)
access:     read only
mechanism:  by value
The length in bytes of the input string.
Description
str can contain one-byte and multi-byte characters.
HSY$TRUNC
HSY$TRUNC returns the position of the first character that follows the
truncated string.
Format
HSY$TRUNC str,len,offset
RETURNS
VMS usage:  longword_signed
type:       longword integer (signed)
access:     write only
mechanism:  by value
The offset in bytes of the first character that follows the truncated
string. If this character is a multi-byte character, the offset points
at the first byte of the multi-byte character.
Arguments
str
VMS usage:  longword_unsigned
type:       longword integer (unsigned)
access:     read only
mechanism:  by value
The address of the starting position of the input string.
len
VMS usage:  longword_signed
type:       longword integer (signed)
access:     read only
mechanism:  by value
The length in bytes of the input string.
offset
VMS usage:  longword_signed
type:       longword integer (signed)
access:     read only
mechanism:  by value
The offset in bytes of the character that follows the truncated string.
It need not be on a proper character boundary; for example, it can
point to the second byte of a two-byte character.
Description
str can contain one-byte and multi-byte characters.
This routine helps you position offset on a proper character boundary.
Its function is similar to that of routine HSY$CH_CURR, but with a
different parameter interface.
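The boundary adjustment described for HSY$TRUNC can be modeled in Python. This is a sketch, not the library's code: trunc_offset is a hypothetical name that takes a Python bytes object instead of an address and a length. It assumes that a byte with its most significant bit set begins a two-byte character whenever another byte follows it, and that an offset past the end of the string is clamped to the string length.

```python
def trunc_offset(buf, offset):
    """Return the offset of the first byte of the character containing
    `offset`, that is, the first character after the truncated string."""
    i = 0
    while i < len(buf):
        width = 2 if buf[i] & 0x80 and i + 1 < len(buf) else 1
        if i + width > offset:
            return i          # offset falls inside the character at i
        i += width
    return len(buf)           # assumption: clamp offsets past the end


sample = b"A\xb0\xa1B"        # "A", two-byte character B0A1, "B"
assert trunc_offset(sample, 1) == 1   # already on a boundary
assert trunc_offset(sample, 2) == 1   # second byte of B0A1: moved back
assert trunc_offset(sample, 3) == 3
```

With offset 2, the whole two-byte character is excluded from the truncated string, so the truncated string is 1 byte long rather than 2.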
HSY$CH_GCHAR
HSY$CH_GCHAR reads the current character.
Format
HSY$CH_GCHAR cur,end
RETURNS
VMS usage:  longword_unsigned
type:       longword integer (unsigned)
access:     write only
mechanism:  by value
The current character.
Arguments
cur
VMS usage:  longword_unsigned
type:       longword integer (unsigned)
access:     read only
mechanism:  by value
The address of the current character. This address must be on a proper
character boundary; for example, it must not point to the second byte
of a two-byte character.
end
VMS usage:  longword_unsigned
type:       longword integer (unsigned)
access:     read only
mechanism:  by value
The address one byte past the end of the string, as illustrated
below:
      +---+---+---+---+
   .. |   |   |   |   |
      +---+---+---+---+
           string       ^
                        end
Description
This routine reads a character with end-of-buffer checking. FFFF (hex)
is returned on an attempt to read past the end of the buffer. If the
current character is a one-byte 7-bit control character or a one-byte
8-bit character (for example, an 8-bit character followed by a 7-bit
control character), that one-byte character is returned. The current
pointer is not updated, because cur is passed by value.
HSY$CH_GNEXT
HSY$CH_GNEXT reads the current character and advances the current
pointer to the next character.
Format
HSY$CH_GNEXT cur,end
RETURNS
VMS usage:  longword_unsigned
type:       longword integer (unsigned)
access:     write only
mechanism:  by value
The current character.
Arguments
cur
VMS usage:  longword_unsigned
type:       longword integer (unsigned)
access:     modify
mechanism:  by reference
The address of the current character. This address must be on a proper
character boundary; for example, it must not point to the second byte
of a two-byte character.
end
VMS usage:  longword_unsigned
type:       longword integer (unsigned)
access:     read only
mechanism:  by value
The address one byte past the end of the string, as illustrated
below:
      +---+---+---+---+
   .. |   |   |   |   |
      +---+---+---+---+
           string       ^
                        end
Description
This routine reads a character with end-of-buffer checking. FFFF (hex)
is returned on an attempt to read past the end of the buffer. If the
current character is a one-byte 7-bit control character or a one-byte
8-bit character (for example, an 8-bit character followed by a 7-bit
control character), that one-byte character is returned.
The current pointer is updated: after the read, cur points to the next
character position, on a proper character boundary. This routine is
useful for reading successive characters.
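Successive reading can be modeled in Python as follows. This is a behavioral sketch, not HSYSHR itself: the function names are hypothetical, offsets into a bytes object stand in for addresses, and because Python has no by-reference integers the model of HSY$CH_GNEXT returns the updated position alongside the character. It assumes that an 8-bit byte is read as a one-byte character only when it is followed by a 7-bit control character or is the last byte before end, and that the position is left unchanged once the end of the buffer is reached.

```python
END_OF_BUFFER = 0xFFFF


def _width(buf, cur, end):
    """Width in bytes of the character at cur (0 at end of buffer)."""
    if cur >= end:
        return 0
    if buf[cur] & 0x80 and cur + 1 < end:
        nxt = buf[cur + 1]
        # 7-bit control characters are 00-1F (hex) and 7F (hex).
        if not (nxt < 0x20 or nxt == 0x7F):
            return 2
    return 1


def ch_gchar(buf, cur, end):
    """Model of HSY$CH_GCHAR: read the character at cur; cur is untouched."""
    width = _width(buf, cur, end)
    if width == 0:
        return END_OF_BUFFER
    if width == 2:
        return (buf[cur] << 8) | buf[cur + 1]
    return buf[cur]


def ch_gnext(buf, cur, end):
    """Model of HSY$CH_GNEXT: read the character, then advance cur."""
    return ch_gchar(buf, cur, end), cur + _width(buf, cur, end)


def read_all(buf):
    """Read every character in buf with successive ch_gnext calls."""
    cur, end, chars = 0, len(buf), []
    while True:
        ch, cur = ch_gnext(buf, cur, end)
        if ch == END_OF_BUFFER:
            return chars
        chars.append(ch)


assert read_all(b"A\xb0\xa1\n") == [0x41, 0xB0A1, 0x0A]
assert read_all(b"\xb0\n") == [0xB0, 0x0A]   # 8-bit byte before a control
```

Each two-byte character comes back as a single longword-style value such as B0A1 (hex), and the loop ends when FFFF (hex) is returned.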
HSY$CH_NEXTG
HSY$CH_NEXTG reads the next character, skipping the current character.
Format
HSY$CH_NEXTG cur,end
RETURNS
VMS usage:  longword_unsigned
type:       longword integer (unsigned)
access:     write only
mechanism:  by value
The next character.
Arguments
cur
VMS usage:  longword_unsigned
type:       longword integer (unsigned)
access:     modify
mechanism:  by reference
The address of the current character. This address must be on a proper
character boundary; for example, it must not point to the second byte
of a two-byte character.
end
VMS usage:  longword_unsigned
type:       longword integer (unsigned)
access:     read only
mechanism:  by value
The address one byte past the end of the string, as illustrated
below:
      +---+---+---+---+
   .. |   |   |   |   |
      +---+---+---+---+
           string       ^
                        end
Description
This routine reads the next character, skipping the current character.
FFFF (hex) is returned on an attempt to read past the end of the
buffer. If the next character is a one-byte 7-bit control character or
a one-byte 8-bit character (for example, an 8-bit character followed by
a 7-bit control character), that one-byte character is returned.
The current pointer is updated: after the read, cur points to the next
character position, on a proper character boundary.