HP OpenVMS Systems Documentation

HP Pascal for OpenVMS
User Manual


3.1.17 Software Pipelining (OpenVMS I64 and OpenVMS Alpha systems)

Software pipelining and additional software dependency analysis are enabled using the /OPTIMIZE=LEVEL=5 command-line option, which in certain cases improves run-time performance. /OPTIMIZE=LEVEL=5 is not the default; /OPTIMIZE=LEVEL=4 remains the default.

As compared to regular loop unrolling (enabled at optimization level 3 or above), software pipelining uses instruction scheduling to eliminate instruction stalls within loops, rearranging instructions between different unrolled loop iterations to improve performance.

For instance, if software dependency analysis of data flow reveals that certain calculations can be done before or after a given iteration of the unrolled loop, software pipelining reschedules those instructions ahead of or behind that iteration, at places where their execution can prevent instruction stalls or otherwise improve performance.

For this version of HP Pascal, loops chosen for software pipelining:

  • Are always innermost loops (those executed the most).
  • Do not contain branches or procedure calls.

By modifying the unrolled loop and inserting instructions as needed before or after it, software pipelining generally improves run-time performance. The exception is a loop that contains a large number of instructions with many existing overlapped operations. In such cases, software pipelining may not have enough registers available to improve execution performance, and using optimization level 5 instead of optimization level 4 may not improve run-time performance.

To determine whether using optimization level 5 benefits your particular program, time program execution for the same program compiled at level 4 and 5. For programs that contain loops that exhaust available registers, longer execution times may result with optimization level 5.

In cases where performance does not improve, consider compiling using /OPTIMIZE=(UNROLL=1,LEVEL=5) to (possibly) improve the effects of software pipelining.
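As a rough illustration of the criteria above, the following sketch contains a loop of the kind that could be chosen for software pipelining: it is an innermost loop and its body contains no branches or procedure calls. The program name, the array length, and the data values are assumptions made for this example, and whether the compiler actually pipelines the loop remains its own decision.

```pascal
PROGRAM Pipeline_Sketch( OUTPUT );

CONST
   n = 1000;                        { Hypothetical array length }

TYPE
   vector = ARRAY[1..n] OF REAL;

VAR
   x, y  : vector;
   alpha : REAL;
   i     : INTEGER;

BEGIN
{ Fill the arrays with sample data. }
alpha := 2.0;
FOR i := 1 TO n DO
   BEGIN
   x[i] := i;
   y[i] := 1.0;
   END;

{ Candidate loop: innermost, no IF, CASE, or GOTO, and no routine calls. }
FOR i := 1 TO n DO
   y[i] := y[i] + alpha * x[i];

WRITELN( y[1]:8:2, y[n]:10:2 );
END.
```

To judge the effect on a program like this, compile it once with /OPTIMIZE=LEVEL=4 and once with /OPTIMIZE=LEVEL=5, then time both executables on the same input.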

3.1.18 Processor Selection and Tuning (OpenVMS Alpha systems)

HP Pascal provides support for generating code for specific Alpha processors and for tuning code for a preferred processor. The supported Alpha processors are EV4, EV5, EV56, EV6, EV7, EV67, and EV68.

The EV4 and EV5 processors are basically identical, with the only difference in the preferred instruction scheduling phase. The EV56 processor added byte and word opcodes. The EV6 processor added a SQRT instruction, instructions to move data directly between floating and integer registers, and a few other instructions. The EV7 processor is similar to the EV6 processor with differences only in the instruction scheduling phase.

The default architecture (see the /ARCHITECTURE qualifier) is for the EV4 processor. The architecture setting restricts the compiler to instructions that exist on the specified processor; in effect, it names the earliest Alpha processor that will execute the code. If you run the code on a processor earlier than the one specified, you might get invalid opcode errors, or OpenVMS might attempt to emulate the instructions at a severe performance penalty.

The default tuning (see the /OPTIMIZE=TUNE qualifier) is "generic," which tunes the code for an average Alpha processor. You can achieve better performance if you allow the compiler to tune the code for a specific processor.

Specifying an explicit /ARCHITECTURE setting also defaults the /OPTIMIZE=TUNE setting to the same processor.

For example, specifying /ARCHITECTURE=EV56/OPTIMIZE=TUNE=EV7 tells the compiler that the generated code must be able to run on an EV56 system, but that it should be tuned for best performance on an EV7 system. In these situations, the compiler can actually generate multiple code sequences, one using only EV56 instructions and the other using EV7 instructions, together with the AMASK instruction to select the faster sequence dynamically, based on the system executing the program.

Since most Alpha systems are EV56 or later, you might see a significant improvement by specifying /ARCHITECTURE=EV56 on the command line.

3.1.19 Compiling for Optimal Performance

The following command lines produce the fastest code from the compiler. Depending on the system, use one of the following:

For OpenVMS I64 systems, use:


PASCAL /NOZERO_HEAP /OPT=LEVEL=4 /NOCHECK

For OpenVMS Alpha systems, use:


PASCAL /NOZERO_HEAP /MATH_LIBRARY=FAST /OPT=LEVEL=4 /NOCHECK /ARCH=HOST /ASSUME=NOACCURACY_SENSITIVE

For OpenVMS VAX systems, use:


PASCAL /OPTIMIZE /NOCHECK

In all cases, you may also want to use the performance flagger to identify data types that could be modified for additional performance.

3.2 Programming Considerations

The language elements that you use in a source program directly affect the compiler's ability to optimize the resulting object program. Therefore, you should be aware of the following ways in which you can assist compiler optimization and obtain a more efficient program:

  • Define constant identifiers to represent values that do not change during your program. The use of constant identifiers generally makes a program easier to read, understand, and later modify. In addition, the resulting object code is more efficient because symbolic constants are evaluated only once, at compile time, while variables must be reevaluated whenever they are assigned new values.
  • Whenever possible, use the structured control statements CASE, FOR, IF-THEN-ELSE, REPEAT, WHILE, and WITH rather than the GOTO statement. You can use the GOTO statement to exit from a loop, but careless use of it interferes with both optimization and the straightforward analysis of program flow.
  • Enclose in parentheses any subexpression that occurs frequently in your program. The compiler checks whether any assignments have affected the subexpression's value since its last occurrence. If the value has not changed, the compiler recognizes that a subexpression enclosed in parentheses has already been evaluated and does not repeat the evaluation. For example:


    x := SIN( u + (b - c) );
    y := COS( v + (b - c) );
    

    The compiler evaluates the subexpression ( b - c ) as a result of performing the SIN function. When it is encountered again, the compiler checks to see whether new values have been assigned to either b or c since they were last used. If their values have not changed, the compiler does not reevaluate ( b - c ).
  • Once your program has been completely debugged, disable all checking with [CHECK(NONE)] or with the appropriate compilation switch. Recall that HP Pascal enables bounds and declaration checking by default. When no checking code is generated, more optimizations can occur, and the program executes faster.
    Integer overflow checking is disabled by default. If you are sure that your program is not in danger of integer overflow, you should not enable overflow checking. Because overflow checking precludes certain optimizations, you can achieve a more efficient program by leaving it disabled.
  • When a variable is accessed by a program block other than the one in which it was declared, the variable should have static rather than automatic allocation. An automatically allocated variable has a varying location in memory; accessing it in another block is time-consuming and less efficient than accessing a static variable.
  • On OpenVMS VAX systems, avoid using the same temporary variable many times in the course of a program. Instead, use a new variable every time your program needs a temporary variable. Because variables stored in registers are the easiest to access, your program is most efficient when as many variables as possible can be allocated in registers. If you use several different temporary variables, the lifetime of each one is greatly reduced; thus, there is a greater chance that storage for them can be allocated in registers rather than at memory locations.
  • When creating schema records (or records with nonstatic fields), place the fields with run-time size at the end of the record. The generated code has to compute the offset of all record fields after a field with run-time size, and this change minimizes the overhead.
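Two of the guidelines above, constant identifiers and the placement of run-time-sized fields, can be sketched together. This is a minimal illustration, not a prescribed layout: the identifiers max_items, tax_rate, and batch are hypothetical, and the schema declaration assumes the Extended Pascal style of discriminated schema types described in the Language Reference Manual.

```pascal
PROGRAM Layout_Sketch( OUTPUT );

CONST
   max_items = 100;                 { Evaluated once, at compile time }
   tax_rate  = 0.07;                { Hypothetical value for illustration }

TYPE
   { Schema record: the field whose size depends on the discriminant
     comes last, so no field follows a run-time-sized field. }
   batch( size : INTEGER ) = RECORD
      count : INTEGER;
      total : REAL;
      items : ARRAY[1..size] OF REAL;
      END;

VAR
   b : batch( max_items );
   i : INTEGER;

BEGIN
b.count := max_items;
b.total := 0.0;
FOR i := 1 TO b.count DO
   BEGIN
   b.items[i] := i * ( 1.0 + tax_rate );
   b.total    := b.total + b.items[i];
   END;
WRITELN( 'Total: ', b.total:10:2 );
END.
```

Had items been declared before count and total, the generated code would have to compute the offsets of those two fields at run time.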

For More Information:

  • On HP Pascal language elements and on attributes (HP Pascal for OpenVMS Language Reference Manual)
  • On compilation switches (Chapter 1)

3.3 Implementation-Dependent Behavior

The Pascal language has several implementation-dependent behaviors that a program must not rely upon. A program that depends on these behaviors to work correctly is illegal and is not portable to other platforms or other compiler versions.

Refer to the HP Pascal for OpenVMS Language Reference Manual for a list of the implementation-dependent behaviors.

For More Information:

  • On attributes and on static and automatic variables (HP Pascal for OpenVMS Language Reference Manual)
  • On compilation switches (Chapter 1)

3.3.1 Subexpression Evaluation Order

The compiler can evaluate subexpressions in any order and may even choose not to evaluate some of them. Consider the following subexpressions that involve a function with side effects:


IF f( a ) AND f( b ) THEN ...

This IF statement contains two calls to function f, one with parameter a and one with parameter b. If f has side effects, the compiler does not guarantee the order in which the side effects will be produced. In fact, if one call to f returns FALSE, the other call to f might never be executed, and the side effects that result from that call would never be produced. The same caution applies to arithmetic expressions. For example:


q := f( a ) + f( a );

The Pascal standard allows a compiler to optimize the code as follows:


q := 2 * f( a );

If the compiler does so, and function f has side effects, the side effects would occur only once because the compiler has generated code that evaluates f( a ) only once.

If you wish to ensure left-to-right evaluation with short circuiting, use the AND_THEN and OR_ELSE Boolean operators.
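The following sketch shows how AND_THEN and OR_ELSE make the evaluation order explicit. The function f and the counter calls are hypothetical names introduced for this example; f records each invocation as its side effect.

```pascal
PROGRAM Short_Circuit_Sketch( OUTPUT );

VAR
   calls : INTEGER;                 { Counts how often f is entered }

{ Hypothetical function with a side effect: it increments calls. }
FUNCTION f( n : INTEGER ) : BOOLEAN;
   BEGIN
   calls := calls + 1;
   f := n > 0;
   END;

BEGIN
calls := 0;

{ AND_THEN evaluates the left operand first and skips the right
  operand when the left one is FALSE. }
IF f( -1 ) AND_THEN f( 2 ) THEN
   WRITELN( 'Both calls returned TRUE' );
WRITELN( 'Calls after AND_THEN: ', calls:1 );

{ OR_ELSE evaluates the left operand first and skips the right
  operand when the left one is TRUE. }
IF f( 1 ) OR_ELSE f( 2 ) THEN
   WRITELN( 'At least one call returned TRUE' );
WRITELN( 'Calls after OR_ELSE: ', calls:1 );
END.
```

Because the right operand is skipped in both IF statements, f is entered once per statement. With plain AND and OR, the number and order of calls would be left to the compiler.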

For More Information:

  • On the order of expression evaluation, see the description of the NOOPTIMIZE attribute (HP Pascal for OpenVMS Language Reference Manual)

3.3.2 MAXINT and MAXINT64 Predeclared Constants

The smallest possible value of the INTEGER type is represented by the predeclared constant -MAXINT. The largest possible value of the INTEGER type is represented by the predeclared constant MAXINT. However, the Itanium, Alpha, and VAX architectures support an additional integer value, which is (-MAXINT - 1). If your program contains a subexpression with this value, the program's evaluation might result in an integer overflow trap. Therefore, a computation involving the value (-MAXINT - 1) might not produce the expected result. To evaluate expressions that include (-MAXINT - 1), you should disable either optimization or integer overflow checking.

Similarly, on OpenVMS I64 and OpenVMS Alpha systems, (-MAXINT64 - 1) might not produce the expected results.

3.3.3 Pointer References

The compiler assumes that the value of a pointer variable is either the constant identifier NIL or a reference to a variable allocated in heap storage by the NEW procedure. A variable allocated in heap storage is not declared in a VAR section and has no identifier of its own; you can refer to it only by the name of a pointer variable followed by a circumflex (^). Consider the following example:


VAR
   x : INTEGER;
   p : ^INTEGER;
{In the executable section:}
NEW( p );
p^ := 0;
x  := 0;
IF p^ = x THEN  p^ := p^ + 1;

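In this example, the compiler can assume that p^ and x never occupy the same storage, because x has its own identifier and p can refer only to heap storage. If a pointer variable in your program must refer to a variable with an explicit name, that variable must be declared VOLATILE or READONLY. The compiler makes no assumptions about the value of volatile variables and therefore performs no optimizations on them.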

Use of the ADDRESS function, which creates a pointer to a variable, can result in a warning message because of optimization characteristics. By passing a static or automatic variable that is neither READONLY nor VOLATILE as the parameter to the ADDRESS function, you indicate to the compiler that the variable was not allocated by NEW but was declared with its own identifier. Because this contradicts the compiler's assumptions about pointers, the compiler issues a warning message. You can also use IADDRESS, which behaves like the ADDRESS function except that it returns an INTEGER_ADDRESS value and does not generate any warning messages. Use IADDRESS with caution.

Similarly, when the parameter to ADDRESS is a formal VAR parameter or a component of a formal VAR parameter, the compiler issues a warning message that not all dynamic variables allocated by NEW may be passed to the function.
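A minimal sketch of the recommended pattern follows: the named variable is declared VOLATILE before its address is taken with ADDRESS. The program and variable names are illustrative only, and the sketch assumes that the pointer returned by ADDRESS can be assigned to a variable of type ^INTEGER.

```pascal
PROGRAM Address_Sketch( OUTPUT );

VAR
   x : [VOLATILE] INTEGER;          { Named variable that a pointer will reference }
   p : ^INTEGER;

BEGIN
x := 0;
p := ADDRESS( x );                  { p now refers to x rather than to heap storage }
p^ := p^ + 1;                       { Updates x through the pointer }
WRITELN( x:1 );                     { x is VOLATILE, so it is fetched from memory here }
END.
```

Without the VOLATILE attribute on x, the call to ADDRESS would be expected to draw the warning described above, and the compiler could keep a stale copy of x across the assignment through p^.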

For More Information:

  • On attributes and on predeclared routines (HP Pascal for OpenVMS Language Reference Manual)

3.3.4 Variant Records

Because all the variants of a record variable are stored in the same memory location, a program can use several different field identifiers to refer to the same storage space. However, only one variant is valid at a given time; all other variants are undefined. You must store a value in a field of a particular variant before you attempt to use it. For example:


VAR
   x : INTEGER;
   a : RECORD
      CASE t : BOOLEAN OF
         TRUE   : ( b : INTEGER );
         FALSE  : ( c : REAL );
      END;
{In the executable section:}
x := a.b + 5;
a.c := 3.0;
x := a.b + 5;

Record a has two variants, b and c, which are located at the same storage address. When the assignment a.c := 3.0 is executed, the value of a.b becomes undefined because TRUE is no longer the currently valid variant. When the statement x := a.b + 5 is executed for the second time, the value of a.b is unknown. The compiler may choose not to evaluate a.b a second time because it has retained the field's previous value. To eliminate any misinterpretations caused by this assumption, variable a should be associated with the VOLATILE attribute. The compiler makes no assumptions about the value of VOLATILE objects.
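The following sketch reworks the example so that each variant is stored before it is read and the record carries the VOLATILE attribute. It is one possible arrangement under those assumptions; the stored values are arbitrary.

```pascal
PROGRAM Variant_Sketch( OUTPUT );

VAR
   x : INTEGER;
   a : [VOLATILE] RECORD
      CASE t : BOOLEAN OF
         TRUE  : ( b : INTEGER );
         FALSE : ( c : REAL );
      END;

BEGIN
{ Select the TRUE variant and store into it before reading a.b. }
a.t := TRUE;
a.b := 10;
x   := a.b + 5;

{ Switch to the FALSE variant; a.b is undefined from here on. }
a.t := FALSE;
a.c := 3.0;

{ Switch back and store a.b again before the next use. }
a.t := TRUE;
a.b := 20;
x   := a.b + 5;

WRITELN( x:1 );
END.
```

Setting the tag field t before each store also documents which variant is currently valid.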

For More Information:

  • On variant records or on the VOLATILE attribute (HP Pascal for OpenVMS Language Reference Manual)

3.3.5 Atomicity, Granularity, Volatility, and Write Ordering

When data is shared by multiple code streams (either multiple processes, multiple threads, or asynchronous events such as AST routines or condition handlers), you need to be aware of certain issues to guarantee correct sharing of data.

You must inform the compiler that the data being shared may change in an asynchronous fashion. By default, the compiler assumes that data is modified only by assignment statements, routine calls, and so on. If the data is being changed in a way that the compiler does not know about, you must use the VOLATILE attribute to tell the compiler that it must fetch the data from memory in an atomic fashion at each reference and store the data back into memory in an atomic fashion at each assignment.

To accomplish atomic access on OpenVMS I64 systems for volatile objects 64 bits or smaller, fetches and stores are done with the normal ldn and stn instructions.

To accomplish atomic access on the Alpha for volatile objects smaller than 32 bits, fetches and stores are done with the LDx_L/STx_C instruction sequence. This pair of instructions ensures that the volatile data is accessed in an atomic fashion. Without the VOLATILE attribute, you will not get this special instruction sequence, and the data might become corrupted if two writers are trying to store to the shared data at the same time. Items of 32 bits or 64 bits are accessed with single longword and quadword instructions and do not use the LDx_L/STx_C sequence. Newer Alpha systems include byte and word instructions. See the /ARCHITECTURE qualifier for more information. Only aligned data objects are guaranteed to be accessed atomically. Larger objects that are manipulated with run-time routines are not atomic, as those routines may be interrupted.

Granularity is a term used on Alpha machines to describe the situation where two threads update nearby data at the same time. Because the compiler on older Alpha systems must fetch the surrounding longword or quadword, modify it, and store it back, the two threads could overwrite each other's data. For these situations, either move the nearby data to separate quadwords or use the /GRANULARITY qualifier to tell the compiler that you want longword or byte granularity at the expense of additional LDx_L/STx_C sequences. (See the /ARCHITECTURE qualifier for more information on the byte and word instructions available on newer Alpha systems.)

To accomplish atomic access on the VAX for volatile objects 32 bits or smaller, fetches and stores are done with the normal MOVB/MOVW/MOVL/INSV/EXTV instructions. In a single-CPU environment, the alignment of the objects is not relevant. However, in a multiple-CPU SMP system, the data being accessed must reside in a single 32-bit longword; otherwise, the underlying memory system may return incorrect data if two CPUs are updating the same longword at the same time. Larger objects that are manipulated with the MOVC3/MOVC5 instructions are not atomic, as those instructions may be interrupted.

Besides atomic accesses, many programs want to perform atomic operations on shared data. To facilitate this, HP Pascal provides the following built-in routines:

  • ADD_INTERLOCKED(expr,variable)
    This routine adds the expression to the aligned word variable and returns -1 if the new value is negative, 0 if it is zero, or 1 if it is positive. On OpenVMS I64 systems, it uses the cmpxchg instruction. On OpenVMS Alpha systems, it uses the LDx_L/STx_C instructions. On OpenVMS VAX systems, it generates the VAX ADAWI instruction.
  • CLEAR_INTERLOCKED(Boolean-variable)
    SET_INTERLOCKED(Boolean-variable)
    These routines clear or set a Boolean variable, respectively, and return the original value. On OpenVMS I64 systems, they use the cmpxchg instruction. On OpenVMS Alpha systems, they use the LDx_L/STx_C instructions. On OpenVMS VAX systems, they generate the BBCCI and BBSSI instructions, respectively.
  • ADD_ATOMIC(expr,variable)
    AND_ATOMIC(expr,variable)
    OR_ATOMIC(expr,variable)
    These routines atomically add/and/or the value of the expression with the variable and return the original value. On OpenVMS I64 systems, they use the cmpxchg instruction. On OpenVMS Alpha systems, they use the LDx_L/STx_C instructions. These routines are not available on OpenVMS VAX systems.

On the VAX, write operations to independent memory locations are completed in the order of the instructions. However, on Alpha and Itanium, the architectures do not guarantee that independent writes will complete in the order in which they were issued. Both architectures provide a special instruction to serialize write operations. HP Pascal provides the BARRIER built-in routine on these systems to generate the MB instruction on Alpha systems and the mf instruction on Itanium systems in order to preserve write ordering.

If your code uses a higher-level synchronization scheme to guard critical regions (such as a lock manager or a semaphore package), then using the VOLATILE attribute, the /GRANULARITY qualifier, and the INTERLOCKED/ATOMIC built-ins may not be necessary, because you have already ensured that only a single reader or writer is in the critical section at a time.
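As a hedged sketch of how these pieces might fit together on OpenVMS I64 or OpenVMS Alpha systems, the following program increments a shared counter with ADD_ATOMIC and uses BARRIER to order two writes. The variable names are hypothetical, the program is single-threaded for brevity, and the parameterless call form of BARRIER is an assumption; consult the Language Reference Manual for the exact declarations and any alignment requirements.

```pascal
PROGRAM Atomic_Sketch( OUTPUT );

VAR
   counter  : [VOLATILE] INTEGER;   { Shared counter (hypothetical) }
   payload  : [VOLATILE] INTEGER;   { Shared data published to a reader }
   ready    : [VOLATILE] BOOLEAN;   { Flag that a reader would poll }
   previous : INTEGER;

BEGIN
counter := 0;
ready   := FALSE;

{ ADD_ATOMIC returns the value the variable held before the addition. }
previous := ADD_ATOMIC( 1, counter );

{ Publish the payload first, then the flag. BARRIER keeps the two
  writes from completing out of order on Alpha and Itanium. }
payload := 42;
BARRIER;
ready   := TRUE;

WRITELN( 'Previous: ', previous:1, '  Current: ', counter:1 );
END.
```

In a real program, a second code stream would test ready and read payload only after finding it TRUE. On OpenVMS VAX systems, neither ADD_ATOMIC nor BARRIER is provided, so this sketch does not apply there.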

3.3.6 Debugging Considerations

Some of the effects of optimized programs on debugging are as follows:

  • Use of registers
    When the compiler determines that the value of an expression does not change between two given occurrences, it may save the value in a register. In such a case, it does not recompute the value for the next occurrence, but assumes that the value saved in the register is valid. If, while debugging the program, you attempt to change the value of the variable in the expression, then the value of that variable is changed, but the corresponding value stored in the register is not. When execution continues, the value in the register may be used instead of the changed value in the expression, causing unexpected results.
    When the value of a variable is being held in a register, its value in memory is generally invalid; therefore, a spurious value may be displayed if you try to examine a variable under these circumstances.
  • Coding order
    Some of the compiler optimizations cause code to be generated in an order different from the way it appears in the source. Sometimes code is eliminated altogether. This causes unexpected behavior when you try to step by line, use source display features, or examine or deposit variables.
  • Use of condition codes (OpenVMS VAX systems)
    This optimization technique takes advantage of the way in which the VAX processor condition codes are set. For example, consider the following source code:


    x := x + 2.5;
    IF x < 0  THEN ...
    

    Rather than test the new value of x to determine whether to branch, the optimized object code bases its decision on the condition code settings after 2.5 is added to x. If you attempt to set a debugging breakpoint at the second line and deposit a different value into x, you cannot achieve the intended result because the condition codes no longer reflect the value of x. In other words, the decision to branch is being made without regard to the deposited value of the variable.
  • Inline code expansion on user-declared routines
    There is no stack frame for an inline user-declared routine and no debugger symbol table information for the expanded routine. Debugging the execution of an inline user-declared routine is difficult and is not recommended.

To prevent conflicts between optimization and debugging, you should always compile your program with a compilation switch that deactivates optimization until it is thoroughly debugged. Then you can recompile the program (which by default is optimized) to produce efficient code.
