HP OpenVMS RTL Library (LIB$) Manual
1.2 Alphabet of LIB$T[ABLE_]PARSE
The LIB$T[ABLE_]PARSE alphabet consists of a set of symbol types
defined in Table lib-9. This alphabet includes strings made up of
elements of the ASCII character set. It provides all the basic building
blocks needed for constructing a grammar using the ASCII character set.
The alphabet also includes symbol types that represent the more complex
constructions found in programming and command language grammar.
Use the symbol types that make up the LIB$T[ABLE_]PARSE alphabet to
define a vocabulary and grammar for your language. For each transition
you define, you specify one of the alphabet symbol types.
LIB$T[ABLE_]PARSE compares the characters at the beginning of the
remaining input string with the symbol type of each of the possible
transitions. If LIB$T[ABLE_]PARSE finds a match, it enters the state
specified by that transition.
Table lib-9 The Alphabet of LIB$T[ABLE_]PARSE
Symbol Type | Characters Matched
'x'
|
The particular ASCII character. In a state table, it is expressed by
enclosing the character in single quotation marks. The character can be
any member of the 8-bit ASCII code set. LIB$T[ABLE_]PARSE does not
consider uppercase and lowercase alphabetic characters and codes with
different values in bit 7 to be equivalent.
|
TPA$_ANY
|
Any single character.
|
TPA$_ALPHA
|
Any alphabetic character, which includes the DEC multinational
character set.
|
TPA$_DIGIT
|
Any numeric character, that is, 0 through 9.
|
TPA$_STRING
|
Any string of one or more alphanumeric characters, that is, uppercase
or lowercase A through Z, and the numeric characters 0 through 9. The
string can be any length. It is bounded on the right by the first
nonalphanumeric character or by the end of the string.
|
TPA$_SYMBOL
|
Any string of one or more characters of the standard OpenVMS
symbol constituent set, that is, uppercase and lowercase A through Z
and all DEC multinational characters, in addition to the dollar sign
($) and the underscore (_). The string is bounded on the right by some
character not in the symbol constituent set (usually a blank) or by the
end of the string.
|
'keyword'
|
The string of characters enclosed in single quotation marks. A keyword
can consist of one or more characters of the OpenVMS symbol constituent
set, that is, uppercase and lowercase A through Z, the numeric
characters 0 through 9, the dollar sign ($), and the underscore (_).
Uppercase and lowercase alphabetics are treated as different characters.
A state table can contain up to 220 keywords. The keyword is
bounded on the right by a character not in the symbol constituent set
or by the end of the string.
Keywords that are one character in length are expressed in the form
'x*' to distinguish them from the single-character symbol ('x'). They
must be differentiated because they do not behave the same way. For
example, in the input string AB+C, the single character 'A' would match
the first character of this string, whereas the keyword 'A*' would not,
because B in the string is in the symbol constituent set.
|
TPA$_BLANK
|
Any string of one or more blanks and/or tabs.
|
TPA$_OCTAL
|
Any octal number (that is, any string of one or more numeric characters
0 through 7) whose magnitude is less than 2^32 for a 32-bit argument
block or less than 2^64 for a 64-bit argument block.
|
TPA$_DECIMAL
|
Any decimal number (that is, any string of one or more numeric
characters 0 through 9) whose magnitude is less than 2^32 for a 32-bit
argument block or less than 2^64 for a 64-bit argument block.
|
TPA$_HEX
|
Any hexadecimal number (that is, any string of one or more numeric
characters 0 through 9, A through F) whose magnitude is less than 2^32
for a 32-bit argument block or less than 2^64 for a 64-bit argument
block.
|
(Alpha and I64 specific) TPA$_OCTAL_64
|
Any octal number (that is, any string of one or more numeric characters
0 through 7) whose magnitude is less than 2^64.
|
(Alpha and I64 specific) TPA$_DECIMAL_64
|
Any decimal number (that is, any string of one or more numeric
characters 0 through 9) whose magnitude is less than 2^64.
|
(Alpha and I64 specific) TPA$_HEX_64
|
Any hexadecimal number (that is, any string of one or more numeric
characters 0 through 9, A through F) whose magnitude is less than 2^64.
|
TPA$_FILESPEC
|
Any string that constitutes a valid OpenVMS file specification. The
string is bounded on the right by the first character that either is
not a file specification constituent character or would cause the
string to violate the syntax rules of a file specification.
|
TPA$_NODE
|
Matches a full node specification including the double colon (::).
|
TPA$_NODE_ACS
|
Matches a primary node specification including the access control
string, if any, but not the double colon (::).
|
TPA$_NODE_PRIMARY
|
Matches a primary node specification excluding both the access control
string, if any, and the double colon (::).
|
TPA$_UIC
|
Any string that constitutes a valid OpenVMS numerical UIC
specification, bounded by square brackets or angle brackets. The binary
value of the UIC, converted in octal radix, is placed in the argument
block. The wildcard character (*) is permitted in the group and/or
member fields; its presence results in that field being set to its
largest possible value in the binary representation.
|
TPA$_IDENT
|
Any string that constitutes a valid OpenVMS identifier. Identifiers may
be given as numerical UICs according to the rules for TPA$_UIC, or as
alphabetic identifier names that appear in the system's rights
database. The binary value of the identifier, converted in either octal
or hexadecimal radix or by lookup in the system rights database, is
placed in the argument block. Identifiers can be entered in any of the
following forms:
[n,m] <n,m>
[name1,name2] <name1,name2>
[name] <name>
name
%Xhex-value
You can use a wildcard (*) in place of any occurrence of a number or
name in an identifier form.
|
TPA$_LAMBDA
|
The empty string (always matches). As it executes the transition,
LIB$T[ABLE_]PARSE does not remove any characters from the input string.
LAMBDA transitions are useful in getting action routines called under
otherwise awkward circumstances, providing unconditional GOTOs to link
portions of a state table together, and providing default actions in
certain cases.
|
TPA$_EOS
|
The end of the input string.
|
state label
|
The label of a state that functions as a subexpression. A subexpression
is analogous to a subroutine within the state table.
The subexpression facility permits complex syntactic constructs
that appear in many places in grammar to appear only once in the state
table. It also permits a degree of nondeterministic or pushdown parsing
with a parser that is otherwise deterministic and finite-state. See
Section 3.5 for detailed information about subexpressions and
examples of their use.
|
Note
By default, LIB$T[ABLE_]PARSE treats blanks (defined to be either
spaces or tabs) as though they belong to no symbol type constituent
set. Effectively, this makes the blank a separator. LIB$T[ABLE_]PARSE
begins its next comparison with the first nonblank character following
the blanks. To have LIB$T[ABLE_]PARSE evaluate a blank as it would any
other character in the input string, set the TPA$V_BLANKS flag in the
argument block. Section 3.2 provides an example of the use of
this flag.
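The difference between the single-character symbol 'x' and the one-character keyword 'x*', together with the default blank handling described in the note, can be modeled in a few lines of portable C. This is only an illustrative sketch, not the RTL implementation: the function names and the blanks_significant parameter (standing in for TPA$V_BLANKS) are hypothetical, and the DEC multinational characters are left out of the constituent set.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Symbol constituent set as modeled here: A-Z, a-z, 0-9, '$', '_'.
   (DEC multinational characters are omitted from this sketch.) */
static bool is_symbol_char(char c)
{
    return isalnum((unsigned char)c) || c == '$' || c == '_';
}

/* Skip leading spaces and tabs unless blanks are significant
   (the analogue of setting TPA$V_BLANKS). */
static const char *skip_blanks(const char *s, bool blanks_significant)
{
    if (!blanks_significant)
        while (*s == ' ' || *s == '\t')
            s++;
    return s;
}

/* 'x': matches exactly one character. Returns the number of characters
   consumed (including any skipped blanks), or 0 for no match. */
size_t match_char(const char *input, char x, bool blanks_significant)
{
    const char *s = skip_blanks(input, blanks_significant);
    return (*s == x) ? (size_t)(s - input) + 1 : 0;
}

/* 'keyword' (or 'x*' for one character): matches only when the keyword
   is followed by a non-constituent character or the end of the string. */
size_t match_keyword(const char *input, const char *kw, bool blanks_significant)
{
    const char *s = skip_blanks(input, blanks_significant);
    size_t n = strlen(kw);
    if (strncmp(s, kw, n) != 0 || is_symbol_char(s[n]))
        return 0;
    return (size_t)(s - input) + n;
}
```

With the input AB+C, match_char(input, 'A', false) succeeds, while match_keyword(input, "A", false) fails because B belongs to the constituent set. With blanks_significant set, leading blanks are no longer skipped and must be matched explicitly.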
1.3 State Tables
This section describes state table generation and the macros used to
construct state tables. Section 2 explains how to use these
macros.
The state table must be set up using either MACRO or BLISS. Everything
else, including any action routines, can be coded in the language of
your choice. Simply compile the state table separately, then link it
with your program.
The body of the state table consists of one or more states, each of
which defines one or more transitions to the same or other states. The
order of the states and the order of the transitions for each state are
important:
- If a transition does not specify a target state, LIB$T[ABLE_]PARSE
transitions to the next state after the current state in the state
table.
- For a given state, LIB$T[ABLE_]PARSE evaluates the input string
against the transitions in the order in which they are defined and
executes the first transition it matches.
- If a state defines more than one transition with symbol types that
match overlapping sets of tokens, the order of transition definitions
within the state is significant. For example, the characters 123
followed by a comma (,) could match TPA$_DECIMAL, TPA$_OCTAL,
TPA$_STRING, or one of several other symbol types.
- It is best to order transitions in order of increasing generality
of their symbol types. For example, the TPA$_SYMBOL symbol type matches
all keyword strings. In general, LIB$T[ABLE_]PARSE never executes a
keyword transition that follows a TPA$_SYMBOL transition. The symbol
types, in order of increasing generality, are as follows:
'keyword'
'x'
TPA$_EOS
TPA$_ALPHA
TPA$_DIGIT
TPA$_BLANK
TPA$_OCTAL
TPA$_OCTAL_64 (Alpha and I64 only)
TPA$_DECIMAL
TPA$_DECIMAL_64 (Alpha and I64 only)
TPA$_HEX
TPA$_HEX_64 (Alpha and I64 only)
TPA$_STRING
TPA$_SYMBOL
TPA$_UIC
TPA$_IDENT
TPA$_NODE_PRIMARY
TPA$_NODE_ACS
TPA$_NODE
TPA$_FILESPEC
TPA$_ANY
TPA$_LAMBDA
Note
The list of symbol types does not include subexpression calls, because
the generality of these calls depends on the symbol types recognized
within the subexpression. If you use action routines to reject certain
transitions, you can change where a symbol type effectively falls in
this order. In any case, LIB$T[ABLE_]PARSE executes the first
transition listed in a state that matches the leftmost portion of the
remaining input string and that your action routines do not reject.
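The effect of transition order can be illustrated with a small C model of the first-match rule. It is a sketch under simplifying assumptions: only four symbol types are modeled, magnitude limits are ignored, and the names sym_type, match_len, and first_match are hypothetical and not part of the RTL.

```c
#include <ctype.h>
#include <stddef.h>

/* A few symbol types, modeled loosely on TPA$_OCTAL, TPA$_DECIMAL,
   TPA$_STRING, and TPA$_ANY. */
typedef enum { T_OCTAL, T_DECIMAL, T_STRING, T_ANY } sym_type;

/* Length of the leading run of characters accepted by a symbol type
   (0 means no match). Magnitude limits are ignored in this model. */
static size_t match_len(sym_type t, const char *s)
{
    size_t n = 0;
    switch (t) {
    case T_OCTAL:   while (s[n] >= '0' && s[n] <= '7') n++; break;
    case T_DECIMAL: while (isdigit((unsigned char)s[n])) n++; break;
    case T_STRING:  while (isalnum((unsigned char)s[n])) n++; break;
    case T_ANY:     n = (s[0] != '\0'); break;
    }
    return n;
}

/* Return the index of the first transition that matches, or -1.
   Later transitions are never consulted once one matches. */
int first_match(const sym_type *trans, int count, const char *input)
{
    for (int i = 0; i < count; i++)
        if (match_len(trans[i], input) > 0)
            return i;
    return -1;
}
```

For the input 123 followed by a comma, every modeled type matches, so the transition listed first is always the one taken. Placing T_ANY first would shadow every transition after it, which is why the more general types belong at the end of a state.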
1.3.1 MACRO State Table Generation Macro Calls
The OpenVMS system MACRO library contains a set of assembler macros
that allow convenient and readable coding of a LIB$T[ABLE_]PARSE state
table. These macros generate symbol definitions and tables. They do not
produce any executable code or routine calls.
There are four MACRO state table generation macros:
- $INIT_STATE---Initializes the LIB$T[ABLE_]PARSE macros and declares
the beginning of a state table (see Section 1.3.1.1)
- $STATE---Defines a state (see Section 1.3.1.2)
- $TRAN---Defines a state transition (see Section 1.3.1.3)
- $END_STATE---Ends the state table (see Section 1.3.1.4)
A state table begins with a call to $INIT_STATE and ends with a call to
$END_STATE. Within the state table, define each state by a call to
$STATE immediately followed by as many calls to $TRAN as you need to
define the transitions from that state.
1.3.1.1 $INIT_STATE---Initializes the LIB$T[ABLE_]PARSE Macros
The $INIT_STATE macro declares the beginning of a state table. It
initializes the internals of the table generator macros and declares
the locations of the state table and the keyword table:
- The state table is the structure containing the definitions of the
states and the transitions between them. LIB$T[ABLE_]PARSE builds the
state table as it processes the $STATE and $TRAN macros you use to
define the table.
- The keyword table contains the text of the keywords used in the
state table. LIB$T[ABLE_]PARSE builds the keyword table as it processes
the calls to $TRAN for each state.
Section 4 provides specific information on the allocation
and binary representations of the state table and the keyword table.
This information may be useful in debugging your program.
$INIT_STATE state-table,key-table
state-table
The name assigned to the state table. LIB$T[ABLE_]PARSE equates this
label to the start of the first state in the state table.
key-table
The name assigned to the keyword table. LIB$T[ABLE_]PARSE equates this
label to the start of the keyword table.
You must supply both the address of the state table and the address of
the keyword table in the call to LIB$T[ABLE_]PARSE to perform a parse.
The $INIT_STATE macro can appear more than once in a program. Each
occurrence defines a separate state table. No part of any state table
can refer to part of any other state table.
1.3.1.2 $STATE---Defines a State
The $STATE macro declares the beginning of a state.
$STATE [label]
label
An optional label for the state. LIB$T[ABLE_]PARSE equates the label,
if present, to the starting address of the state.
1.3.1.3 $TRAN---Defines a State Transition
The $TRAN macro defines a transition from the state in which it is
defined to some other (or to the same) state. The arguments of the
macro define, among other things, the symbol type that causes the
transition to be executed, the state to which to transfer, and the
action routine to call, if any. The transition defined by a $TRAN macro
belongs to the state defined by the last preceding $STATE macro.
$TRAN type [,label] [,action] [,mask] [,msk-adr] [,argument]
type
The symbol type, taken from the LIB$T[ABLE_]PARSE alphabet, that is
recognized by this transition. The transition is taken if the
characters from the beginning of the remaining input string match the
specified symbol type.
If the transition calls a subexpression to determine a match, the
symbol type syntax includes the state label of the subexpression to be
called. It is indicated with the MACRO expression !label. See
Section 3.5 for information about subexpressions.
label
The optional target state of this transition. If present, it must be
the label assigned to some state in the state table. If no
label argument is present, LIB$T[ABLE_]PARSE transfers
control to the state immediately following the current state in the
state table.
LIB$T[ABLE_]PARSE defines two expressions you can also specify as the
target state in the label argument:
- TPA$_EXIT --- The parsing operation in progress terminates with a
success status.
- TPA$_FAIL --- The parsing operation stops with a failure status, as
if a syntax error had occurred.
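How the label argument drives control flow (fall through to the next state when it is omitted, or stop at TPA$_EXIT or TPA$_FAIL) can be sketched as a table-driven loop in C. This is a simplified model, not the RTL routine: it supports only single-character, end-of-string, and lambda transitions, and the names NEXT_STATE, EXIT_OK, EXIT_FAIL, tran, state, and run are hypothetical. Treating a state in which no transition matches as a failure is an assumption of the model.

```c
#include <stddef.h>

/* Stand-ins for an omitted label, TPA$_EXIT, and TPA$_FAIL. */
enum { NEXT_STATE = -1, EXIT_OK = -2, EXIT_FAIL = -3 };

/* Transition kinds modeled here: 'x', TPA$_EOS, TPA$_LAMBDA. */
enum { M_CHAR, M_EOS, M_LAMBDA };

typedef struct { int kind; char ch; int target; } tran;
typedef struct { const tran *trans; int count; } state;

/* Run the table over the input. Returns 1 if EXIT_OK is reached,
   0 on EXIT_FAIL or when no transition in a state matches. */
int run(const state *states, int nstates, const char *input)
{
    int cur = 0;
    while (cur >= 0 && cur < nstates) {
        const state *st = &states[cur];
        int target = EXIT_FAIL;          /* assumed: no match is a syntax error */
        for (int i = 0; i < st->count; i++) {
            const tran *t = &st->trans[i];
            int hit = (t->kind == M_LAMBDA) ||
                      (t->kind == M_EOS && *input == '\0') ||
                      (t->kind == M_CHAR && *input != '\0' && *input == t->ch);
            if (hit) {
                if (t->kind == M_CHAR)
                    input++;             /* consume the matched character */
                /* An omitted label means "the state that follows this one". */
                target = (t->target == NEXT_STATE) ? cur + 1 : t->target;
                break;                   /* first matching transition wins */
            }
        }
        if (target == EXIT_OK)
            return 1;
        if (target == EXIT_FAIL)
            return 0;
        cur = target;
    }
    return 0;                            /* ran off the end of the table */
}
```

A three-state table that accepts A followed by one or more B characters needs explicit labels only for the loop back to the third state and for the exit. The first two transitions rely on falling through to the next state.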
action
The optional address of a user-supplied action routine. If this
argument is present, LIB$T[ABLE_]PARSE calls the named action routine
before it executes the transition. Section 3.1 describes the
calling sequence of action routines and the information available to
them.
Because the action routine address is self-relative, it cannot be in a
shared image separate from the state table.
mask
An optional 32-bit mask value used with the msk-adr argument.
When LIB$T[ABLE_]PARSE executes the transition, it performs an
inclusive OR operation using the mask value and the contents of
msk-adr and stores the result in msk-adr.
You can associate one or more bits in mask with a particular
transition and set those bits. When LIB$T[ABLE_]PARSE returns, you can
check the bits in msk-adr to determine which transitions were
executed. You can also use an action routine to check the bit and
ensure that a transition is executed only once.
If the mask argument is present, the
msk-adr argument must also be present.
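The mask mechanism amounts to an inclusive OR into a caller-owned longword. The following C sketch models that step and the "only once" check an action routine might make. The bit names OWNER_SEEN and PROT_SEEN and both function names are hypothetical. Because the action routine is called before the transition executes, the check sees the longword as it was before the OR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, one per transition of interest. */
#define OWNER_SEEN 0x1u
#define PROT_SEEN  0x2u

/* Models what the parser does when a transition that carries
   (mask, msk-adr) is executed: OR the mask into the longword. */
void apply_mask(uint32_t *msk_adr, uint32_t mask)
{
    *msk_adr |= mask;
}

/* Models an action-routine check made before the transition executes:
   accept the transition only if its bit has not been set yet. */
bool accept_once(const uint32_t *msk_adr, uint32_t mask)
{
    return (*msk_adr & mask) == 0;
}
```

After the parse, testing the longword against OWNER_SEEN or PROT_SEEN shows which of the two transitions were executed, provided the longword was zero before the parse began.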
msk-adr
The msk-adr argument provides two mutually exclusive capabilities,
depending on whether the mask argument is present:
- If mask is present, msk-adr is the address of a longword associated
with the preceding mask argument. LIB$T[ABLE_]PARSE performs the
inclusive OR operation on the contents of this address and the mask
argument and stores the result in msk-adr. Initialize the contents of
msk-adr to zero before calling LIB$T[ABLE_]PARSE.
- If mask is not present, you can use msk-adr to specify the address
of a location where LIB$T[ABLE_]PARSE stores information about the
matching token. No OR operation is performed. This capability lets a
program extract the most commonly needed information from the input
string without using action routines.
The kind of information that LIB$T[ABLE_]PARSE stores in the location
you specify as the msk-adr argument depends on the symbol type
specified for the type argument and on the argument block, as follows:
- If the symbol type is TPA$_DECIMAL, TPA$_OCTAL, or TPA$_HEX,
LIB$T[ABLE_]PARSE stores the binary representation of the matching
number as an unsigned longword for a 32-bit argument block and as an
unsigned quadword for a 64-bit argument block.
- If the symbol type is TPA$_DECIMAL_64, TPA$_OCTAL_64, or
TPA$_HEX_64, LIB$T[ABLE_]PARSE stores the binary representation of the
matching number as an unsigned quadword for both 32-bit and 64-bit
argument blocks.
- If the symbol type is 'x', TPA$_ANY, TPA$_ALPHA, or
TPA$_DIGIT, LIB$T[ABLE_]PARSE stores the 8-bit matching character as an
unsigned byte.
- If the symbol is of any other type, you must specify
msk-adr as the address of a 32-bit or 64-bit string
descriptor, as appropriate, that you allocate in your program.
LIB$T[ABLE_]PARSE assumes a 32-bit or 64-bit descriptor if the argument
block with which you called it is 32-bit or 64-bit, respectively.
For a 32-bit descriptor, LIB$T[ABLE_]PARSE stores the length of the
token in the first 32 bits (longword) of the descriptor. It stores a
pointer to the token in the second longword. This pointer is the
address of the token in the input string. For a 64-bit descriptor,
LIB$T[ABLE_]PARSE stores the length of the token in the second quadword
of the descriptor and stores the address of the token in the input
string in the third quadword. On entry, LIB$T[ABLE_]PARSE writes the
fields of the first quadword as follows:
DSC64$B_CLASS = DSC64$K_CLASS_S
DSC64$B_DTYPE = DSC64$K_DTYPE_T
DSC64$L_MBMO = -1
DSC64$W_MBO = 1
Using msk-adr makes your parsing program nonmodular.
The resulting program, which contains this state table, includes code
that is not position independent.
Because the address specified by msk-adr is
self-relative, it cannot be in a shared image separate from the state
table.
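For the numeric symbol types, the value stored through msk-adr is the binary form of the matched digits. The C sketch below models the 32-bit case for a decimal token: it accepts a run of digits only if the value stays below 2^32 and then stores it as an unsigned longword. The function name match_decimal32 is hypothetical, and the model makes no claim about how the RTL itself detects overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Match a decimal number whose value is below 2^32 and store it.
   Returns the number of digits consumed, or 0 for no match. */
size_t match_decimal32(const char *s, uint32_t *out)
{
    uint64_t v = 0;     /* 64-bit accumulator so overflow can be detected */
    size_t n = 0;
    while (s[n] >= '0' && s[n] <= '9') {
        v = v * 10 + (uint64_t)(s[n] - '0');
        if (v > UINT32_MAX)
            return 0;   /* magnitude is not less than 2^32: no match */
        n++;
    }
    if (n > 0)
        *out = (uint32_t)v;   /* stored as an unsigned longword */
    return n;
}
```

The token ends at the first non-digit, so for the input 123 followed by a comma the function consumes three characters and stores 123.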
argument
An optional 32-bit value that LIB$T[ABLE_]PARSE passes to the action
routine without interpretation. This argument can be an identifier
number, an address, or any other information your action routine needs.
It allows a single action routine to serve many transitions for which
similar, but slightly varying, actions must be performed.
Because LIB$T[ABLE_]PARSE does not know the form or meaning of
argument, the value is stored in its absolute form. If
you use argument to pass an address, you must store
the address in its absolute form rather than as a self-relative
pointer. In this case the resulting program, which contains this state
table, is nonmodular. That is, it includes code that is not position
independent.
1.3.1.4 $END_STATE---Ends the State Table
The $END_STATE macro declares the end of the state table. It is
mandatory, in order to permit the orderly cleanup of the
LIB$T[ABLE_]PARSE macro system. The $END_STATE macro has no arguments.
You code it as follows:
$END_STATE
1.3.2 BLISS State Table Generation Macro Calls
The SYS$LIBRARY:TPAMAC.L32 and SYS$LIBRARY:TPAMAC.L64 files each
contain a set of BLISS macros that allow convenient and readable coding
of LIB$T[ABLE_]PARSE state tables in BLISS.
There are two BLISS state table generation macros:
- $INIT_STATE---Initializes the macros (see Section 1.3.2.1)
- $STATE---Defines a state and its transitions (see Section 1.3.2.2)
To make the macros available to the program, include the following
declaration in the module containing the state tables:
LIBRARY 'SYS$LIBRARY:TPAMAC';
The BLISS compiler you use, BLISS-32 or BLISS-64, chooses the
corresponding SYS$LIBRARY:TPAMAC file.
The BLISS table generation macros contain no BEGIN or END statements.
This allows $STATE macros to refer to each other. They generate all
storage with OWN declarations. This means that the macros modify PSECT
declarations for OWN and GLOBAL storage. Thus if other data
declarations follow the state table declarations, they may not have the
correct attributes. You cannot simply surround the state table with
BEGIN/END, because this constitutes an expression. No declarations of
any kind, including ROUTINE declarations, can follow an expression.
Use one of the following techniques to include a LIB$T[ABLE_]PARSE
state table in a BLISS module:
- Follow the state table with explicit redeclarations of the OWN and
GLOBAL PSECTs. Example 3 illustrates this technique.
- Place the state table in a separate module. The high-level language
examples in the next section use this technique.
- Place the state table between BEGIN and END statements after the
declarations within a routine body.
- Place the state table between BEGIN and END statements at the end
of a module.
In all cases you must define all action routines, masks, addresses, and
arguments with suitable declarations (which can be FORWARD or
EXTERNAL). The LIB$T[ABLE_]PARSE macros handle the necessary FORWARD
declarations for forward references to labels within the state table.
1.3.2.1 $INIT_STATE---Initializes the LIB$T[ABLE_]PARSE Macros
The $INIT_STATE macro initializes the LIB$T[ABLE_]PARSE macro system in
the same manner it does for MACRO.
$INIT_STATE (state-table, key-table);
state-table
The name assigned to the state table. LIB$T[ABLE_]PARSE equates this
label to the start of the first state in the state table.
key-table
The name assigned to the keyword table. LIB$T[ABLE_]PARSE equates this
label to the start of the keyword table.
Both names are declared as global vectors of length zero. As with the
MACRO state table generation macros, you can invoke $INIT_STATE more
than once to declare several state tables within a single module.
1.3.2.2 $STATE---Declares a State and Its Transitions
In BLISS, you use the $STATE macro to declare a state in its entirety,
including its transitions.
$STATE ([label],
( transition ),
( transition ),
( transition )
.
.
.
);
label
Optional address of the start of the state. The compiler declares
label as a local vector of length zero. Note that the
comma following the optional label is mandatory.
transition
Each transition appears within parentheses in the same form as the
transition argument list for the MACRO $TRAN macro.
type [,label] [,action] [,mask] [,msk-adr] [,argument]
The arguments of each transition are expressed in exactly the same
format as in the MACRO macros, with the exception of the subexpression
symbol type. In BLISS, this symbol type has the form (label).
Note that the transitions are not specified as keyword macros.
Therefore, you must use commas to indicate arguments you have skipped.
1.4 LIB$T[ABLE_]PARSE Argument Block
LIB$T[ABLE_]PARSE finds the input string through the argument block.
This argument block is the impure database upon which LIB$T[ABLE_]PARSE
operates. That is, it is a set of variable data that can be written as
well as read. It contains information about the string to be parsed,
option flags for LIB$T[ABLE_]PARSE, and data about the current token.
If LIB$T[ABLE_]PARSE calls an action routine, it passes the argument
block to the action routine. This gives the action routine efficient
access to the relevant data.
1.4.1 Choosing an Argument Block
LIB$T[ABLE_]PARSE provides an argument block for 32-bit operations on
VAX, Alpha, and I64 systems. It also provides an argument block for
64-bit operations on Alpha and I64 systems.
1.4.1.1 32-Bit Argument Block
The 32-bit LIB$T[ABLE_]PARSE argument block accommodates longword
addresses and values as well as input tokens whose binary
representations require no more than 32 bits.
On Alpha and I64 systems, the LIB$T[ABLE_]PARSE 32-bit argument block
can also accommodate a numeric input token whose binary representation
requires up to 64 bits.
LIB$T[ABLE_]PARSE defines the first 9 longwords of the 32-bit argument
block as shown in Figure lib-20. You must pass an argument block of at
least this length as the first argument to LIB$T[ABLE_]PARSE. You can
add fields to the end of the argument block as a means of passing
user-defined data to action routines.
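Extending the argument block with user data can be pictured as a C structure whose first nine longwords are reserved for the parser, followed by fields of your own. This is a layout sketch only: the defined fields are shown as an opaque array because their names and meanings come from Figure lib-20, and the type name my_arg_block, the trailing fields, and the count_option routine are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEFINED_LONGWORDS 9   /* the first 9 longwords are defined by the parser */

typedef struct {
    /* Parser-defined portion; see Figure lib-20 for the real layout. */
    uint32_t defined[DEFINED_LONGWORDS];
    /* User-defined data appended for action routines (hypothetical). */
    uint32_t option_count;
    uint32_t owner_uic;
} my_arg_block;

/* Models an action routine: it receives the same block that was passed
   to the parser and can therefore reach the appended fields. */
void count_option(void *arg_block)
{
    my_arg_block *blk = arg_block;
    blk->option_count++;
}
```

Because the user fields begin at byte offset 36, immediately after the nine defined longwords, the parser-defined portion keeps its expected layout while action routines share state through the tail of the block.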