HP OpenVMS Systems Documentation

HP OpenVMS/Hanzi RTL Chinese Processing (HSY$) Manual


1.2 Features of HSYSHR

HSYSHR provides the following features and capabilities:
  • HSYSHR performs a wide range of general multi-byte processing operations. You can call the HSY$ routines instead of writing your own code to perform these operations.
  • Routines in HSYSHR follow the OpenVMS Procedure Calling Standard. This allows you to call any HSY$ routine from any programming language supported in OpenVMS/Hanzi, thus increasing program flexibility.
  • Because all routines are shared, they take up less of a process's virtual address space.
  • When a new version of HSYSHR is installed, you do not need to revise your calling program, and generally do not need to relink.

1.3 Linking with HSYSHR

Routines in HSYSHR execute entirely in the mode of the caller and are intended to be called in user mode. To link an application that contains explicit calls to HSYSHR routines, use the following link command:
$ LINK program, SYS$LIBRARY:HSYIMGLIB.OLB/LIBRARY


Chapter 2
MULTI-BYTE CHARACTER CONCEPTS

This chapter describes some important multi-byte character concepts that are used throughout the documentation.

2.1 What is a Multi-byte Character?

The DEC Hanzi character set is implemented as a multi-byte character set containing Chinese characters, punctuation marks and various kinds of symbols. Each multi-byte character is a two-byte character whose first byte always has its most significant bit set. The OpenVMS/Hanzi operating system adopts the DEC Hanzi character set, and Chinese characters are represented as multi-byte characters from that set. For a detailed discussion of the DEC Hanzi character set, refer to the OpenVMS/Hanzi User Guide.
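
The rule above (a first byte with its most significant bit set starts a two-byte character) is enough to count characters rather than bytes. The following C sketch is an illustration only and is not part of HSYSHR; the names char_len and count_chars are hypothetical, and it makes the simplifying assumption that every byte with the most significant bit set starts a two-byte character whenever a second byte is available.

```c
#include <stddef.h>

/* Length in bytes of the character that starts at p, where n (>= 1)
   bytes remain in the buffer.
   ASSUMPTION: a byte with its most significant bit set starts a
   two-byte character whenever a second byte is available. */
size_t char_len(const unsigned char *p, size_t n)
{
    if (n >= 2 && (p[0] & 0x80) != 0)
        return 2;
    return 1;
}

/* Count characters (not bytes) in buf[0..len). */
size_t count_chars(const unsigned char *buf, size_t len)
{
    size_t i = 0, count = 0;

    while (i < len) {
        i += char_len(buf + i, len - i);  /* step over one whole character */
        count++;
    }
    return count;
}
```

For a six-byte buffer holding 'A', B0 A1, 'B', B0 A2 (hex), count_chars reports four characters.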

2.2 Proper Character Boundary

In HSYSHR, most routines process text character by character rather than byte by byte, as conventional routines do. Some routines require the input character pointer to point at a proper character boundary in the user buffer. "Pointing at a proper character boundary" means that the character pointer must not point to the second byte of a multi-byte character.
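
Because the second byte of a two-byte character can also have its most significant bit set (as in B0A1 hex), a single byte cannot be classified in isolation; one way to test a boundary is to scan from the start of the buffer. The C sketch below illustrates that idea. It is not an HSYSHR routine, the name is_char_boundary is hypothetical, and it assumes that a byte with the most significant bit set plus the byte after it always form one character.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when offset is the first byte of a character in
   buf[0..len), or equals len (one past the last character).
   ASSUMPTION: an MSB-set byte and the byte after it form one
   two-byte character. */
bool is_char_boundary(const unsigned char *buf, size_t len, size_t offset)
{
    size_t i = 0;

    /* Walk whole characters from the start until offset is reached
       or stepped over. */
    while (i < offset && i < len) {
        if ((buf[i] & 0x80) != 0 && i + 1 < len)
            i += 2;
        else
            i += 1;
    }
    return i == offset;
}
```

The scan costs time proportional to the offset, which is the price of not being able to recognise a second byte by its value alone.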

2.3 Full Form and Half Form Character

In the DEC Hanzi character set, there is a set of two-byte ASCII characters. To distinguish them from the conventional one-byte 7-bit ASCII characters, the terms "full form" and "half form" characters are used. Full form characters refer to two-byte ASCII characters whereas half form characters refer to one-byte 7-bit ASCII characters. Conversion services between full form and half form characters are provided by the conversion routines in HSYSHR. In some applications where character matching requires treating the full form and half form characters alike, the user can call the searching routines in HSYSHR and specify the conversion flag argument. Note that uppercasing and lowercasing can both be applied to these full form characters.
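
As an illustration of what a full form / half form conversion involves, the C sketch below maps between the two forms using the unsigned longword representation of a character. It is not the HSYSHR conversion routine. The function names are hypothetical, and the code table is an assumption: it supposes that the full form characters occupy the GB 2312 row 3 positions A3A1 to A3FE (hex), parallel to the one-byte codes 21 to 7E (hex). Individual positions may not be exact ASCII counterparts, so consult the OpenVMS/Hanzi User Guide for the actual table.

```c
#include <stdint.h>

/* ASSUMPTION (not confirmed by this manual): full form characters sit
   at 0xA3A1..0xA3FE, parallel to the one-byte codes 0x21..0x7E. */
#define FULL_FORM_FIRST 0xA3A1u
#define FULL_FORM_LAST  0xA3FEu

/* Full form -> half form; any other character is returned unchanged. */
uint32_t to_half_form(uint32_t ch)
{
    if (ch >= FULL_FORM_FIRST && ch <= FULL_FORM_LAST)
        return (ch & 0xFFu) - 0x80u;   /* second byte minus 0x80 */
    return ch;
}

/* Half form -> full form; any other character is returned unchanged. */
uint32_t to_full_form(uint32_t ch)
{
    if (ch >= 0x21u && ch <= 0x7Eu)
        return 0xA300u | (ch + 0x80u);
    return ch;
}
```

A matching loop that treats both forms alike could compare to_half_form(a) with to_half_form(b) instead of comparing a with b directly.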

2.4 Multi-byte Character Unsigned Longword Representation

In HSYSHR, the representation of a multi-byte character in a single character argument differs from its representation in a character string argument. A single character argument uses an unsigned longword integer representation, whereas characters in a string argument use the normal character string representation. For example, the two-byte character B0A1 (hex) is represented differently in the following two cases.

Single character argument: (VMS Usage - longword_unsigned)



         +--+--+--+--+
         |00|00|B0|A1|
         +--+--+--+--+
         H           L

In a string argument: (VMS Usage - char_string)



             --+--+--+-   +--+
         ....  |A1|B0|....|  | start of string
             --+--+--+-   +--+
         H                   L

The read routines in HSYSHR read a character from a buffer in character string format and return it in unsigned longword format. The write routines take a character in unsigned longword format and write it to the buffer in character string format.
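
The relationship between the two representations can be sketched in C as follows. This is an illustration of the layouts shown above and not the HSYSHR read and write routines; the function names mb_to_longword and longword_to_mb are hypothetical.

```c
#include <stdint.h>

/* String format -> unsigned longword format.
   In a string, the first byte of the character is at the lower address;
   in the longword, it occupies the more significant of the two low bytes. */
uint32_t mb_to_longword(const unsigned char *p)
{
    return ((uint32_t)p[0] << 8) | (uint32_t)p[1];
}

/* Unsigned longword format -> string format. */
void longword_to_mb(uint32_t ch, unsigned char *p)
{
    p[0] = (unsigned char)((ch >> 8) & 0xFFu);  /* first byte  */
    p[1] = (unsigned char)(ch & 0xFFu);         /* second byte */
}
```

With the string bytes B0, A1 (hex) in that order, mb_to_longword yields 0000B0A1 (hex), matching the single character argument shown above.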


HSY$ Reference Section

This section provides detailed discussions of the routines provided in the Chinese Processing Run Time Library HSYSHR.

HSY$CH_MOVE

HSY$CH_MOVE moves a substring from a specified source buffer to a specified destination buffer.

Format

HSY$CH_MOVE len,src,dst


Arguments

len


VMS usage: longword_signed
type: longword integer (signed)
access: read only
mechanism: by value

The length in bytes of the substring to be moved.

src


VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value

The address of the starting position of the source buffer.

dst


VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value

The address of the starting position of the destination buffer.

Description

This routine is multi-byte insensitive: it moves bytes without regard to character boundaries. If len does not fall on a proper multi-byte character boundary (for example, it indicates the second byte of a two-byte character), only half of that multi-byte character is moved, and it becomes the last byte of the destination string.
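
The effect of a length that splits a character can be sketched in C. The code below is a stand-in for the behaviour described, not a call to HSY$CH_MOVE; both function names are hypothetical, and the use of memmove is an assumption, since this manual does not say how overlapping buffers are handled.

```c
#include <string.h>

/* Byte-wise move standing in for the behaviour described above.
   ASSUMPTION: memmove semantics; overlap handling is not documented. */
void move_bytes(long len, const unsigned char *src, unsigned char *dst)
{
    if (len > 0)
        memmove(dst, src, (size_t)len);
}

/* Returns 1 when the first len bytes of src end in the middle of a
   two-byte character (a lead byte whose second byte is cut off). */
int ends_mid_character(const unsigned char *src, long len)
{
    long i = 0;

    while (i < len) {
        if ((src[i] & 0x80) != 0) {
            if (i + 1 >= len)
                return 1;       /* lead byte is the last byte moved */
            i += 2;
        } else {
            i += 1;
        }
    }
    return 0;
}
```

A caller that must not split characters could check ends_mid_character first and shorten len by one byte when it returns 1.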


HSY$DX_TRIM

HSY$DX_TRIM trims trailing one-byte and multi-byte spaces and TAB characters.

Format

HSY$DX_TRIM dst,src,[len]


RETURNS

VMS usage: cond_value
type: longword (unsigned)
access: write only
mechanism: by value

Arguments

dst


VMS usage: char_string
type: character string
access: write only
mechanism: by descriptor

The destination string to store the trimmed string.

src


VMS usage: char_string
type: character string
access: read only
mechanism: by descriptor

The source string to be trimmed.

len


VMS usage: word_signed
type: word integer (signed)
access: write only
mechanism: by reference

The length in bytes of the trimmed string. If this optional argument is not supplied, the length of the trimmed string is not returned to the caller.

Description

dst and src can contain one-byte and multi-byte characters.

CONDITION VALUES RETURNED

LIB$_INVSTRDES Invalid string descriptor. A string descriptor has an invalid value in its DSC$B_CLASS field.
LIB$_STRTRU Procedure successfully completed. String truncated.
LIB$_FATERRLIB Fatal internal error. An internal consistency check has failed.
LIB$_INSVIRMEM Insufficient virtual memory.
SS$_NORMAL Procedure successfully completed.
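
To make the trimming rule concrete, the C sketch below computes the length in bytes that a string would have after trailing one-byte spaces, TAB characters and multi-byte spaces are removed. It models the described result only and does not call HSY$DX_TRIM or use string descriptors. The name trimmed_length is hypothetical, and the code A1A1 (hex) for the multi-byte space is an assumption taken from the GB 2312 layout, not from this manual.

```c
#include <stddef.h>

/* ASSUMPTION: the multi-byte (full form) space is 0xA1 0xA1. */
#define MB_SPACE_BYTE1 0xA1
#define MB_SPACE_BYTE2 0xA1

/* Length in bytes of s[0..len) with trailing blanks removed. */
size_t trimmed_length(const unsigned char *s, size_t len)
{
    size_t i = 0, keep = 0;

    /* Scan forward one character at a time so that a two-byte
       character is never split. */
    while (i < len) {
        size_t n = ((s[i] & 0x80) != 0 && i + 1 < len) ? 2 : 1;
        int blank;

        if (n == 1)
            blank = (s[i] == ' ' || s[i] == '\t');
        else
            blank = (s[i] == MB_SPACE_BYTE1 && s[i + 1] == MB_SPACE_BYTE2);

        i += n;
        if (!blank)
            keep = i;   /* end of the last non-blank character so far */
    }
    return keep;
}
```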


HSY$DX_TRUNC

HSY$DX_TRUNC truncates the input string to the specified length.

Format

HSY$DX_TRUNC dst,src,offset,[len]


RETURNS

VMS usage: cond_value
type: longword (unsigned)
access: write only
mechanism: by value

Arguments

dst


VMS usage: char_string
type: character string
access: write only
mechanism: by descriptor

The specified destination string to store the truncated string.

src


VMS usage: char_string
type: character string
access: read only
mechanism: by descriptor

The specified source string to be truncated.

offset


VMS usage: word_signed
type: word integer (signed)
access: read only
mechanism: by reference

The offset in bytes, from the starting position of the source string, of the first character immediately after the truncated string. Note that this offset need not be on a proper character boundary; for example, it may point to the second byte of a two-byte character.

len


VMS usage: word_signed
type: word integer (signed)
access: write only
mechanism: by reference

The length in bytes of the truncated string. If this optional argument is not supplied, the length of the truncated string is not returned to the caller.

Description

The value returned in len is not necessarily equal to the value specified in offset, because offset may not point at the first byte of a multi-byte character. In any case, the character indicated by offset is treated as the first character that follows the truncated string.

CONDITION VALUES RETURNED

LIB$_INVSTRDES Invalid string descriptor. A string descriptor has an invalid value in its DSC$B_CLASS field.
LIB$_STRTRU Procedure successfully completed. Truncated string is further truncated due to insufficient space allocated in the destination string buffer.
LIB$_FATERRLIB Fatal internal error. An internal consistency check has failed.
LIB$_INSVIRMEM Insufficient virtual memory.
SS$_NORMAL Procedure successfully completed.


HSY$TRIM

HSY$TRIM trims trailing one-byte and multi-byte spaces and TAB characters.

Format

HSY$TRIM str,len


RETURNS

VMS usage: longword_signed
type: longword integer (signed)
access: write only
mechanism: by value

The offset in bytes from the starting position of the input string which indicates the position of the terminating character of the trimmed string. If the terminating character is a multi-byte character, the returned offset will be pointing to the first byte of the multi-byte character.


Arguments

str


VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value

The address of the starting position of the input string to be trimmed.

len


VMS usage: longword_signed
type: longword integer (signed)
access: read only
mechanism: by value

The length in bytes of the input string.

Description

str can contain one-byte and multi-byte characters.


HSY$TRUNC

HSY$TRUNC returns the position of the first character that follows the truncated string.

Format

HSY$TRUNC str,len,offset


RETURNS

VMS usage: longword_signed
type: longword integer (signed)
access: write only
mechanism: by value

The offset in bytes of the first character that immediately follows the truncated string. If this character is a multi-byte character, the offset points at the first byte of the multi-byte character.


Arguments

str


VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value

The address of the starting position of the input string.

len


VMS usage: longword_signed
type: longword integer (signed)
access: read only
mechanism: by value

The length in bytes of the input string.

offset


VMS usage: longword_signed
type: longword integer (signed)
access: read only
mechanism: by value

The offset in bytes of the character that immediately follows the truncated string. It need not be on a proper character boundary; for example, it can point to the second byte of a two-byte character.

Description

str can contain one-byte and multi-byte characters. This routine helps you position offset on a proper character boundary. Its function is similar to that of routine HSY$CH_CURR, but it has a different parameter interface.
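
The adjustment described here can be modelled in C as shown below. This is a sketch of the documented result and not the HSY$TRUNC implementation; the name align_offset is hypothetical, and the handling of offsets outside the string (clamped to 0 or to len) is an assumption.

```c
/* Returns the offset of the first byte of the character that contains
   offset, so the result is always on a proper character boundary.
   ASSUMPTIONS: an MSB-set byte and the byte after it form one
   character; an offset below 0 yields 0 and one at or past len
   yields len. */
long align_offset(const unsigned char *str, long len, long offset)
{
    long i = 0;

    while (i < len) {
        long n = ((str[i] & 0x80) != 0 && i + 1 < len) ? 2 : 1;

        if (offset < i + n)
            return i;           /* offset lies inside this character */
        i += n;
    }
    return len;
}
```

For the bytes 'A', B0, A1, 'B' (hex values for the two-byte character), an offset of 2 points at the second byte of B0A1 and is moved back to 1.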


HSY$CH_GCHAR

HSY$CH_GCHAR reads the current character.

Format

HSY$CH_GCHAR cur,end


RETURNS

VMS usage: longword_unsigned
type: longword integer (unsigned)
access: write only
mechanism: by value

The current character.


Arguments

cur


VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value

The address of the current character. Note that this address must be on a proper character boundary; for example, it must not point to the second byte of a two-byte character.

end


VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value

The address one byte past the end of the string, as illustrated below:


   +---+---+---+---+
.. |   |   |   |   |
   +---+---+---+---+
string                ^
                     end

Description

This routine reads a character with end-of-buffer checking. FFFF (hex) is returned on an attempt to read past the end of the buffer. If the current character is a one-byte 7-bit control character or a one-byte 8-bit character (for example, an 8-bit character followed by a 7-bit control character), that one-byte character is returned. The current pointer is not updated, because cur is passed by value.
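
The read rules above can be modelled in C as follows. This is a sketch of the described behaviour, not the HSY$CH_GCHAR implementation; the names gchar and is_c0_control are hypothetical. Two points are assumptions: a 7-bit control character is taken to be 00 to 1F or 7F (hex), and an 8-bit byte that is the last byte in the buffer is returned as a one-byte character.

```c
#include <stdint.h>

#define END_OF_BUFFER 0xFFFFu   /* value documented for reads past the end */

/* ASSUMPTION: 7-bit control characters are 0x00..0x1F and 0x7F. */
static int is_c0_control(unsigned char b)
{
    return b < 0x20 || b == 0x7F;
}

/* Reads the character at cur without advancing; end is one byte past
   the end of the string. The result is in unsigned longword format. */
uint32_t gchar(const unsigned char *cur, const unsigned char *end)
{
    if (cur >= end)
        return END_OF_BUFFER;

    /* Two-byte character: lead byte with MSB set, followed by a byte
       that is not a 7-bit control character. */
    if ((cur[0] & 0x80) != 0 && cur + 1 < end && !is_c0_control(cur[1]))
        return ((uint32_t)cur[0] << 8) | cur[1];

    return cur[0];              /* one-byte 7-bit or 8-bit character */
}
```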


HSY$CH_GNEXT

HSY$CH_GNEXT reads the current character and advances the current pointer to the next character.

Format

HSY$CH_GNEXT cur,end


RETURNS

VMS usage: longword_unsigned
type: longword integer (unsigned)
access: write only
mechanism: by value

The current character.


Arguments

cur


VMS usage: longword_unsigned
type: longword integer (unsigned)
access: modify
mechanism: by reference

The address of the current character. Note that this address must be on a proper character boundary; for example, it must not point to the second byte of a two-byte character.

end


VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value

The address one byte past the end of the string, as illustrated below:


   +---+---+---+---+
.. |   |   |   |   |
   +---+---+---+---+
string                ^
                     end

Description

This routine reads a character with end-of-buffer checking. FFFF (hex) is returned on an attempt to read past the end of the buffer. If the current character is a one-byte 7-bit control character or a one-byte 8-bit character (for example, an 8-bit character followed by a 7-bit control character), that one-byte character is returned. The current pointer is updated: after the read, cur points to the next character position, on a proper character boundary. This routine is useful for reading characters successively.
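
Successive reading with a pointer that advances can be sketched in C as below. The code models the described behaviour and does not call HSY$CH_GNEXT; the names gnext and read_all are hypothetical, and treating 00 to 1F and 7F (hex) as the 7-bit control characters is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define END_OF_BUFFER 0xFFFFu   /* value documented for reads past the end */

/* Reads the character at *cur, then advances *cur to the next
   character boundary. end is one byte past the end of the string. */
uint32_t gnext(const unsigned char **cur, const unsigned char *end)
{
    const unsigned char *p = *cur;

    if (p >= end)
        return END_OF_BUFFER;

    /* ASSUMPTION: a second byte is valid unless it is 0x00..0x1F or 0x7F. */
    if ((p[0] & 0x80) != 0 && p + 1 < end && p[1] >= 0x20 && p[1] != 0x7F) {
        *cur = p + 2;
        return ((uint32_t)p[0] << 8) | p[1];
    }
    *cur = p + 1;
    return p[0];
}

/* Reads up to max characters from buf[0..len) into out (longword
   format) and returns the number of characters stored. */
size_t read_all(const unsigned char *buf, size_t len,
                uint32_t *out, size_t max)
{
    const unsigned char *cur = buf;
    const unsigned char *end = buf + len;
    size_t n = 0;
    uint32_t ch;

    while (n < max && (ch = gnext(&cur, end)) != END_OF_BUFFER)
        out[n++] = ch;
    return n;
}
```

The loop in read_all shows the intended usage pattern: the caller never adjusts the pointer itself, so it cannot land in the middle of a two-byte character.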


HSY$CH_NEXTG

HSY$CH_NEXTG reads the next character, skipping the current character.

Format

HSY$CH_NEXTG cur,end


RETURNS

VMS usage: longword_unsigned
type: longword integer (unsigned)
access: write only
mechanism: by value

The next character.


Arguments

cur


VMS usage: longword_unsigned
type: longword integer (unsigned)
access: modify
mechanism: by reference

The address of the current character. Note that this address must be on a proper character boundary; for example, it must not point to the second byte of a two-byte character.

end


VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value

The address one byte past the end of the string, as illustrated below:


   +---+---+---+---+
.. |   |   |   |   |
   +---+---+---+---+
string                ^
                     end

Description

This routine reads the next character, skipping the current character. FFFF (hex) is returned on an attempt to read past the end of the buffer. If the next character is a one-byte 7-bit control character or a one-byte 8-bit character (for example, an 8-bit character followed by a 7-bit control character), that one-byte character is returned. The current pointer is updated: after the read, cur points to the next character position, on a proper character boundary.
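
One reading of this description is a "skip, then read" operation, sketched in C below. The code is a model and not the HSY$CH_NEXTG implementation; the names nextg and mb_len are hypothetical. It assumes that cur is left addressing the character that was just read, and that 00 to 1F and 7F (hex) are the 7-bit control characters.

```c
#include <stdint.h>

#define END_OF_BUFFER 0xFFFFu   /* value documented for reads past the end */

/* Bytes in the character at p: 2 for an MSB-set byte followed by a
   byte that is not a 7-bit control character, otherwise 1. */
static int mb_len(const unsigned char *p, const unsigned char *end)
{
    if ((p[0] & 0x80) != 0 && p + 1 < end && p[1] >= 0x20 && p[1] != 0x7F)
        return 2;
    return 1;
}

/* Skips the character at *cur, then reads the character that follows.
   ASSUMPTION: *cur is left addressing the character just read. */
uint32_t nextg(const unsigned char **cur, const unsigned char *end)
{
    const unsigned char *p = *cur;

    if (p >= end)
        return END_OF_BUFFER;

    p += mb_len(p, end);        /* skip the current character */
    *cur = p;

    if (p >= end)
        return END_OF_BUFFER;   /* nothing follows the skipped character */

    if (mb_len(p, end) == 2)
        return ((uint32_t)p[0] << 8) | p[1];
    return p[0];
}
```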

