HP OpenVMS Systems

ask the wizard

Floating Point Traps (Divide by Zero; HPARITH)


The Question is:

 
Analysing exception information for HPARITH
 
We are using mathematical models written in Fortran on OpenVMS AXP. To be able
to continue execution after an exception, we have implemented a condition
handler using LIB$ESTABLISH and LIB$REVERT. In the condition handler we check
the actual exception (e.g. SIGNAL_ARRAY(2) .EQ. SS$_HPARITH) and, if it is
valid for continuation (in the routine where we established the handler), we
write a specific message via SYS$GETMSG and SYS$FAOL and then unwind the stack
to the establishing routine:
SYS$UNWIND( %REF(MECHANISM_ARRAY.CHF$IS_MCH_DEPTH) , ).
 
The message we got is as follows:
  Message: %SYSTEM-F-HPARITH, high performance arithmetic trap,Imask=00000000,
 Fmask=00002000, summary=04, PC=000000000004CD80, PS=0000001B
 
Checking the summary bits, we found: "Division by Zero: An attempt was made to
perform a floating divide operation with a divisor of 0."
 
Using the linker map file and the exception PC, we identified the routine
causing the exception:
 
Part of Linker-Map-File:
Psect Name      Module Name          Base     End      Length                  Align    Attributes
----------      -----------          ----     ---      ------                  -----    ----------
$CODE$                               00030000 00069327 00039328 (     234280.) OCTA  4  PIC,CON,REL,LCL,  SHR,  EXE,NOWRT,NOVEC,  MOD
                ...
                SCM_ENERGY_BALANCE   0004C7E0 0004DDDB 000015FC (       5628.) OCTA  4
 
We now want to know the statement causing the exception (as given by the
traceback handler) or the variable (by using the floating register write mask
as shown below).
 
Assuming 64-bit addresses, we can work out the offset of the exception PC from
the base address of the routine we found, and from that a PC offset relative
to the $CODE$ base address, but this does not seem to be exact (because the
compiler generates machine code we cannot predict?). We assume that the
exception is caused by TETA = 0.0.
 
Part of List-File:
	  10931             TETA
	  10932      >      = RHO*CP_TOT (ACT_STRIP, I_LEN, I_THICK)
	  10933      >      + RHO*(H_AUSTENITE(ACT_STRIP, I_LEN, I_THICK)-H_FERRITE(ACT_STRIP, I_LEN, I_THICK))
	  10934      >      * DP_DTEMP_TOT (ACT_STRIP, I_LEN, I_THICK)
 
 
	  10936             TETA_L = LAMBDA(ACT_STRIP, I_LEN, I_THICK)/TETA
	  10937             TETA_V = RHO*CP_TOT (ACT_STRIP, I_LEN, I_THICK)/TETA
	  10938             TETA_P = RHO*( H_AUSTENITE (ACT_STRIP, I_LEN, I_THICK)
	  10939      >                   - H_FERRITE (ACT_STRIP, I_LEN, I_THICK))/TETA
 
 
If we then use the register addresses shown in the list file
 
    Address   Type  Name
      **      R*4   TETA
REG-00000023  R*4   TETA_L
REG-00000024  R*4   TETA_V
REG-00000026  R*4   TETA_P
 
we are NOT able to see any relation to the floating register write mask.
 
 
The questions now are:
- Is our approach to finding the variable or statement causing the exception
  correct?
- Why are we not able to do the last step?
- Is there any other way to find the variable or statement causing the
  exception?
 
 
Thanks in advance
Herbert
 
PS
At present we switch off LIB$ESTABLISH and LIB$REVERT via a mail message to
get a traceback when an exception occurs (but the program then terminates).

The Answer is :

 
  On Alpha, the processing of arithmetic exceptions is delayed for
  performance reasons, so the exception PC does not directly identify
  which instruction caused the exception.  This is why additional
  information is captured.  In this example, "Fmask=00002000" has only
  bit 13 set (hexadecimal 2000), which means that the divide instruction
  that incurred the exception wrote its result into register F13.
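
  As an illustration of reading the write masks, here is a minimal
  Python sketch that lists the registers named by a mask.  The helper
  name set_bits is hypothetical; the only inputs are the Imask and Fmask
  values quoted in the message above, and the sketch assumes that bit n
  of a mask corresponds to register number n.

```python
def set_bits(mask):
    """Return the positions of the bits set in mask, lowest bit first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Values copied from the HPARITH message in the question.
imask = int("00000000", 16)   # integer register write mask
fmask = int("00002000", 16)   # floating register write mask

# Assumption: bit n of the mask stands for register Rn or Fn.
print(["R%d" % n for n in set_bits(imask)])   # []
print(["F%d" % n for n in set_bits(fmask)])   # ['F13']
```

  An empty integer list and a single floating register, as here, is the
  pattern to expect for one floating divide.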
 
  If you wish to track this manually, you should compile your code
  with /LIST/MACHINE_CODE qualifiers to determine the actual sequence
  of instructions generated.
 
  You are already following the right procedure to find where in the
  code to look, but you now need to look at the instructions executed
  prior to the place where the exception was reported.  The exception
  for a divide might be delayed quite a few cycles, depending on which
  Alpha model you have, so you might have to examine instructions for
  some distance prior to the exception.  Look specifically for a
  "DIVx Fa,Fb,Fc" instruction where Fc is F13.  When that instruction
  was executed, Fb contained zero.  (It does indeed seem likely that
  Fb represents the variable TETA.)
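
  The search described above can be sketched in Python as follows.
  This is only an illustration under stated assumptions: the listing
  line format ("hex offset, opcode, operands") is a simplified,
  hypothetical stand-in for real /MACHINE_CODE output, the sample lines
  are made up, and the helper names pc_offset and candidate_divides are
  not part of any tool.  The base address and PC are the ones quoted in
  the question.

```python
import re

ROUTINE_BASE = 0x0004C7E0   # base of SCM_ENERGY_BALANCE, from the linker map
EXCEPTION_PC = 0x0004CD80   # PC from the HPARITH message


def pc_offset(exception_pc, routine_base):
    """Offset of the reported PC from the start of the routine."""
    return exception_pc - routine_base


# Hypothetical line format: "<hex offset>  DIVx  Fa, Fb, Fc"
DIV_RE = re.compile(
    r"^\s*([0-9A-Fa-f]+)\s+(DIV[A-Z/]*)\s+(F\d+),\s*(F\d+),\s*(F\d+)"
)


def candidate_divides(listing_lines, dest_reg, before_offset):
    """Divides that write dest_reg before before_offset, nearest first."""
    hits = []
    for line in listing_lines:
        m = DIV_RE.match(line)
        if m is None:
            continue                      # not a divide instruction
        offset = int(m.group(1), 16)
        opcode, fa, fb, fc = m.group(2, 3, 4, 5)
        if fc.upper() == dest_reg and offset < before_offset:
            hits.append((offset, opcode, fa, fb, fc))
    return hits[::-1]                     # closest to the exception PC first


# Made-up sample lines, for illustration only.
SAMPLE_LISTING = [
    "0560  LDS   F13, 8(R1)",
    "0578  DIVS  F11, F12, F13",
    "0590  DIVS  F11, F12, F14",
    "05B0  DIVS  F10, F12, F13",
]

offset = pc_offset(EXCEPTION_PC, ROUTINE_BASE)
print(hex(offset))                        # 0x5a0
for hit in candidate_divides(SAMPLE_LISTING, "F13", offset):
    print(hit)
```

  In each hit the second operand (Fb) is the divisor, so that is the
  register to trace back to a source variable.  Divides located after
  the reported PC are excluded because the trap is delivered after the
  faulting instruction, not before it.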
 
  If you wish finer granularity on exceptions, the Alpha architecture
  requires you to use a construct known as a trap barrier (TRAPB) or an
  exception barrier (EXCB), particularly if you need to identify the
  specific failing instruction.  On Alpha, floating point traps can be
  delivered at any time up to the next TRAPB (or CALL_PAL, which
  implicitly includes TRAPB) operation, and thus the exception can
  usually be localized only to the program unit.
 
  If you deem it necessary and appropriate, you can explicitly request
  the compiler option /SYNCHRONOUS_EXCEPTIONS, and thus cause the compiler
  to insert TRAPB instructions.  The presence of the TRAPB instructions
  will ensure that any arithmetic exception is delivered immediately
  after the instruction that caused it.  Use of this technique will
  reduce the performance of your application program, however.
 
  For details on traps, exceptions, and on floating point, you will want
  to acquire the Alpha Architecture Reference Manual.  (Copies of this
  manual and of hardware-related documentation are available for
  downloading, please see the OpenVMS FAQ for pointers.)
 

  
     
     answer written or last revised on ( 6-NOV-2001 )