HP OpenVMS Systems Documentation
HP Pascal for OpenVMS
Software pipelining and additional software dependency analysis are enabled using the /OPTIMIZE=LEVEL=5 command-line option, which in certain cases improves run-time performance. /OPTIMIZE=LEVEL=5 is not the default; /OPTIMIZE=LEVEL=4 remains the default.
As compared to regular loop unrolling (enabled at optimization level 3 or above), software pipelining uses instruction scheduling to eliminate instruction stalls within loops, rearranging instructions between different unrolled loop iterations to improve performance.
For instance, if software dependency analysis of data flow reveals that certain calculations can be done before or after a given iteration of the unrolled loop, software pipelining reschedules those instructions ahead of or behind that iteration, at places where their execution can prevent instruction stalls or otherwise improve performance.
For this version of HP Pascal, loops chosen for software pipelining:
By modifying the unrolled loop and inserting instructions as needed before or after it, software pipelining generally improves run-time performance. The exception is loops that contain a large number of instructions with many existing overlapped operations. In these cases, software pipelining may not have enough registers available to improve execution performance, and using optimization level 5 instead of optimization level 4 may not improve run-time performance.
To determine whether using optimization level 5 benefits your particular program, time program execution for the same program compiled at level 4 and 5. For programs that contain loops that exhaust available registers, longer execution times may result with optimization level 5.
In cases where performance does not improve, consider compiling with /OPTIMIZE=(UNROLL=1,LEVEL=5), which may improve the effects of software pipelining.
3.1.18 Processor Selection and Tuning (OpenVMS Alpha systems)
HP Pascal provides support for generating code for specific Alpha processors and for tuning code for a preferred processor. The supported Alpha processors are EV4, EV5, EV56, EV6, EV7, EV67, and EV68.
The EV4 and EV5 processors are essentially identical; they differ only in the preferred instruction scheduling. The EV56 processor added byte and word opcodes. The EV6 processor added a SQRT instruction, instructions to move data directly between floating and integer registers, and a few other instructions. The EV7 processor is similar to the EV6 processor and differs only in instruction scheduling.
The default architecture (see the /ARCHITECTURE qualifier) is for the EV4 processor. This restricts the compiler to instructions that exist on the EV4 processor. The architecture setting essentially tells the compiler the earliest Alpha processor that will execute the code. If you run the code on an Alpha processor earlier than the one specified, you might get invalid opcode errors, or OpenVMS might attempt to emulate the instructions at a severe performance penalty.
The default tuning (see the /OPTIMIZE=TUNE qualifier) is "generic," which tunes the code for an average Alpha processor. You can achieve better performance if you allow the compiler to tune the code for a specific processor.
Specifying an explicit /ARCHITECTURE setting also defaults the /OPTIMIZE=TUNE setting to the same processor.
For example, specifying /ARCHITECTURE=EV56/OPTIMIZE=TUNE=EV7 tells the compiler that the generated code must be able to run on an EV56 system, but that it should be tuned for best performance on an EV7 system. In these situations, the compiler can actually generate multiple code sequences, one using only EV56 instructions and the other using EV7 instructions, and use the AMASK instruction to dynamically execute the faster sequence based on the system executing the program.
Since most Alpha systems are EV56 or later, you might see a significant improvement by specifying /ARCHITECTURE=EV56 on the command line.
3.1.19 Compiling for Optimal Performance
The following command lines produce the fastest code from the compiler. Use the one that matches your system:
For OpenVMS I64 systems, use:
    PASCAL /NOZERO_HEAP /OPT=LEVEL=4 /NOCHECK
For OpenVMS Alpha systems, use:
    PASCAL /NOZERO_HEAP /MATH_LIBRARY=FAST /OPT=LEVEL=4 /NOCHECK /ARCH=HOST /ASSUME=NOACCURACY_SENSITIVE
For OpenVMS VAX systems, use:
    PASCAL /OPTIMIZE /NOCHECK
In both cases, you may also want to use the performance flagger to identify datatypes that could be modified for additional performance.
The language elements that you use in a source program directly affect the compiler's ability to optimize the resulting object program. Therefore, you should be aware of the following ways in which you can assist compiler optimization and obtain a more efficient program:
    x := SIN( u + (b - c) );
    y := COS( v + (b - c) );
The Pascal language has several implementation-dependent behaviors that a program must not rely upon. Relying on these behaviors for correct behavior is illegal and is not portable to other platforms or other compiler versions.
Refer to the HP Pascal for OpenVMS Language Reference Manual for a list of the implementation-dependent behaviors.
The compiler can evaluate subexpressions in any order and may even choose not to evaluate some of them. Consider the following subexpressions that involve a function with side effects:
    IF f( a ) AND f( b ) THEN ...
This IF statement contains two designators for function f. If f has side effects, the compiler does not guarantee the order in which the side effects will be produced. In fact, if one call to f returns FALSE, the other call to f might never be executed, and the side effects that result from that call would never be produced. For example:
    q := f( a ) + f( a );
The Pascal standard allows a compiler to optimize the code as follows:
    q := 2 * f( a );
If the compiler does so, and function f has side effects, the side effects would occur only once because the compiler has generated code that evaluates f( a ) only once.
If you wish to ensure left-to-right evaluation with short circuiting, use the AND_THEN and OR_ELSE Boolean operators.
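The following is a minimal sketch of short-circuit evaluation with AND_THEN. The program name, the Node type, and the variable p are hypothetical and chosen only for illustration. The right operand dereferences p, so it must not be evaluated when p is NIL.

```pascal
PROGRAM Short_Circuit( OUTPUT );

TYPE
  Node_Ptr = ^Node;
  Node = RECORD
    Value : INTEGER;
    Next  : Node_Ptr;
  END;

VAR
  p : Node_Ptr;   { hypothetical list head, used only for illustration }

BEGIN
  p := NIL;

  { With plain AND, the compiler may evaluate p^.Value even when p is NIL. }
  { AND_THEN evaluates the left operand first and skips the right operand  }
  { when the left operand is FALSE, so the dereference is never reached.   }
  IF (p <> NIL) AND_THEN (p^.Value > 0) THEN
    WRITELN( 'positive' )
  ELSE
    WRITELN( 'empty or not positive' );
END.
```

OR_ELSE works the same way for disjunctions: the right operand is skipped when the left operand is TRUE.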
The smallest possible value of the INTEGER type is represented by the predeclared constant -MAXINT. The largest possible value of the INTEGER type is represented by the predeclared constant MAXINT. However, the Itanium, Alpha, and VAX architectures support an additional integer value, which is (-MAXINT - 1). If your program contains a subexpression with this value, the program's evaluation might result in an integer overflow trap. Therefore, a computation involving the value (-MAXINT - 1) might not produce the expected result. To evaluate expressions that include (-MAXINT - 1), you should disable either optimization or integer overflow checking.
Similarly, on OpenVMS I64 and OpenVMS Alpha systems, (-MAXINT64 - 1) might not produce the expected results.
3.3.3 Pointer References
The compiler assumes that the value of a pointer variable is either the constant identifier NIL or a reference to a variable allocated in heap storage by the NEW procedure. A variable allocated in heap storage is not declared in a VAR section and has no identifier of its own; you can refer to it only by the name of a pointer variable followed by a circumflex (^). Consider the following example:
    VAR
       x : INTEGER;
       p : ^INTEGER;

    {In the executable section:}

    NEW( p );
    p^ := 0;
    x := 0;
    IF p^ = x THEN
       p^ := p^ + 1;
If a pointer variable in your program must refer to a variable with an explicit name, that variable must be declared VOLATILE or READONLY. The compiler makes no assumptions about the value of volatile variables and therefore performs no optimizations on them.
Use of the ADDRESS function, which creates a pointer to a variable, can result in a warning message because of optimization characteristics. By passing a static or automatic variable that is neither READONLY nor VOLATILE as the parameter to the ADDRESS function, you indicate to the compiler that the variable was not allocated by NEW but was declared with its own identifier. Because this contradicts the compiler's assumptions, the compiler issues a warning message. You can also use IADDRESS, which works like the ADDRESS function except that it returns an INTEGER_ADDRESS value and does not generate any warning messages. Use caution when using IADDRESS.
Similarly, when the parameter to ADDRESS is a formal VAR parameter or a component of a formal VAR parameter, the compiler issues a warning message that not all dynamic variables allocated by NEW may be passed to the function.
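As a minimal sketch of the rule described above, the following program points p at a named variable that is declared VOLATILE. It assumes the HP Pascal attribute syntax in which the attribute list precedes the type. The program and variable names are illustrative only.

```pascal
PROGRAM Pointer_To_Named( OUTPUT );

VAR
  x : [VOLATILE] INTEGER;   { VOLATILE because x is also reached through p }
  p : ^INTEGER;

BEGIN
  x := 0;
  p := ADDRESS( x );        { p refers to the named variable x, not to heap storage }
  p^ := p^ + 1;             { modifies x through the pointer }
  WRITELN( x );             { x is fetched from memory again because it is VOLATILE }
END.
```

If x were declared without VOLATILE (or READONLY), the call to ADDRESS would be expected to draw the warning described above, and the compiler would be free to assume that the assignment through p^ leaves x unchanged.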
Because all the variants of a record variable are stored in the same memory location, a program can use several different field identifiers to refer to the same storage space. However, only one variant is valid at a given time; all other variants are undefined. You must store a value in a field of a particular variant before you attempt to use it. For example:
    VAR
       x : INTEGER;
       a : RECORD
           CASE t : BOOLEAN OF
              TRUE  : ( b : INTEGER );
              FALSE : ( c : REAL );
           END;

    {In the executable section:}

    x := a.b + 5;
    a.c := 3.0;
    x := a.b + 5;
Record a has two variants, b and c, which are located at the same storage address. When the assignment a.c := 3.0 is executed, the value of a.b becomes undefined because TRUE is no longer the currently valid variant. When the statement x := a.b + 5 is executed for the second time, the value of a.b is unknown. The compiler may choose not to evaluate a.b a second time because it has retained the field's previous value. To eliminate any misinterpretations caused by this assumption, variable a should be associated with the VOLATILE attribute. The compiler makes no assumptions about the value of VOLATILE objects.
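The sketch below shows one way to follow both recommendations: the record is declared VOLATILE, and a value is stored in a variant field before that field is read. It assumes the HP Pascal attribute syntax in which the attribute list precedes the type. The program name and the stored values are illustrative only.

```pascal
PROGRAM Variant_Volatile( OUTPUT );

VAR
  x : INTEGER;
  a : [VOLATILE] RECORD
        CASE t : BOOLEAN OF
          TRUE  : ( b : INTEGER );
          FALSE : ( c : REAL );
      END;

BEGIN
  a.t := TRUE;
  a.b := 10;
  x := a.b + 5;      { b is the currently valid variant }

  a.t := FALSE;
  a.c := 3.0;        { c is now valid; b is undefined }

  a.t := TRUE;
  a.b := 10;         { store into b again before reading it }
  x := a.b + 5;      { a is VOLATILE, so a.b is fetched from memory here }
  WRITELN( x );
END.
```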
When data is shared by multiple code streams (either multiple processes, multiple threads, or asynchronous events such as AST routines or condition handlers), you need to be aware of certain issues to guarantee correct sharing of data.
You must inform the compiler that the data being shared may change in an asynchronous fashion. By default, the compiler assumes that data is modified only by assignment statements, routine calls, and similar constructs that it can see. If the data is being changed in a way that the compiler does not know about, you must use the VOLATILE attribute. This attribute tells the compiler that it must fetch the data from memory in an atomic fashion at each reference and store the data back into memory in an atomic fashion at each assignment.
To accomplish atomic access on OpenVMS I64 systems for volatile objects 64 bits or smaller, fetches and stores are done with the normal ldn and stn instructions.
To accomplish atomic access on the Alpha for volatile objects smaller than 32 bits, fetches and stores are done with the LDx_L/STx_C instruction sequence. This pair of instructions ensures that the volatile data is accessed in an atomic fashion. Without the VOLATILE attribute, you will not get this special instruction sequence, and the data might become corrupted if two writers are trying to store to the shared data at the same time. Items of 32 bits or 64 bits are accessed with single longword and quadword instructions and do not use the LDx_L/STx_C sequence. Newer Alpha systems include byte and word instructions. See the /ARCHITECTURE qualifier for more information. Only aligned data objects are guaranteed to be accessed atomically. Larger objects that are manipulated with run-time routines are not atomic, as those routines may be interrupted.
Granularity is a term used on Alpha systems to describe the situation where two threads update nearby data at the same time. Because the compiler on older Alpha processors must fetch the surrounding longword or quadword, modify it, and store it back, the two threads could overwrite each other's data. In these situations, either move the nearby data to separate quadwords or use the /GRANULARITY qualifier to tell the compiler that you want longword or byte granularity at the expense of additional LDx_L/STx_C sequences. (See the /ARCHITECTURE qualifier for more information on the byte and word instructions available on newer Alpha systems.)
To accomplish atomic access on the VAX for volatile objects 32 bits or smaller, fetches and stores are done with the normal MOVB/MOVW/MOVL/INSV/EXTV instructions. In a single-CPU environment, the alignment of the objects is not relevant. However, in a multiple-CPU SMP system, the data being accessed must reside in a single 32-bit longword; otherwise, the underlying memory system may return incorrect data if two CPUs are updating the same longword at the same time. Larger objects that are manipulated with the MOVC3/MOVC5 instructions are not atomic, as those instructions may be interrupted.
Besides atomic accesses, many programs want to perform atomic operations on shared data. To facilitate this, HP Pascal provides the following built-in routines:
On the VAX, write operations to independent memory locations are completed in the order of the instructions. However, on Alpha and Itanium, the architectures do not guarantee that independent writes will complete in the order in which they were issued. Both architectures provide a special instruction to serialize write operations. HP Pascal provides the BARRIER built-in routine on these systems to generate the MB instruction on Alpha systems and the mf instruction on Itanium systems in order to preserve write ordering.
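The following minimal sketch shows where a BARRIER call would sit between two independent writes on OpenVMS Alpha or I64 systems. It assumes that BARRIER is called as a parameterless procedure. The program and variable names are hypothetical, and a real program would have a second code stream (another thread, an AST routine, and so on) that reads Data_Ready before reading Shared_Data.

```pascal
PROGRAM Write_Ordering( OUTPUT );

VAR
  Shared_Data : [VOLATILE] INTEGER;   { payload read by another code stream }
  Data_Ready  : [VOLATILE] BOOLEAN;   { flag that publishes the payload }

BEGIN
  Data_Ready := FALSE;

  Shared_Data := 42;    { 1. write the payload }
  BARRIER;              { 2. MB on Alpha, mf on Itanium: serialize the writes }
  Data_Ready := TRUE;   { 3. publish the flag only after the payload write }

  WRITELN( Shared_Data );
END.
```

Without the BARRIER call, the architecture does not guarantee that a reader that sees Data_Ready as TRUE also sees the new value of Shared_Data.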
If your code uses a higher-level synchronization scheme to guard critical regions (such as a lock manager or a semaphore package), then using the VOLATILE attribute, the /GRANULARITY qualifier, and the INTERLOCKED/ATOMIC built-ins may not be necessary; you have already ensured that there is only a single reader or writer in the critical section.
3.3.6 Debugging Considerations
Some of the effects of optimized programs on debugging are as follows:
    x := x + 2.5;
    IF x < 0 THEN ...
To prevent conflicts between optimization and debugging, you should always compile your program with a compilation switch that deactivates optimization until it is thoroughly debugged. Then you can recompile the program (which by default is optimized) to produce efficient code.