HP OpenVMS Systems

ask the wizard

Debugging memory allocation problems?


The Question is:

 
I have written a C program using DEC C version 6.0.  In this C program I
make a call to getenv.  The pointer that returns from the getenv overwrites a
memory area that I have previously allocated using malloc.  Am I doing
something wrong here?  This program has run for years on a VAX machine
using DEC C version 4.0.
 
Thanks

The Answer is :

 
  Your program would initially appear to contain what is often referred to
  as a programming bug -- that a particular program has run for years only
  indicates that there are no overt problems.  (OpenVMS Engineering has
  recently identified a bug that has been latent for over twenty years.)
  Problems in the memory allocation and deallocation routines provided
  by OpenVMS are certainly possible, though the level of use that these
  routines generally see also makes these bugs a rather rare species.
 
  The OpenVMS Wizard would initially suspect that a previous memory
  allocation is underrunning or overrunning the allocated storage, or
  that the application is writing to memory that has been deallocated.
  Various classes of stack errors can also arise, where too much or too
  little information is written to the variables stored on the stack,
  causing corruptions of other nearby variables.  Alternatives include
  various asynchronous writes, particularly where access is unsynchronized
  or where the target variable is allocated in an inactive scope.
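 
  As an illustration of the overrun case, one common pattern (purely
  hypothetical here, since the program in question was not shown) is
  copying the string returned by getenv into a malloc'd buffer that was
  sized without room for the terminating NUL.  The C standard also permits
  a later getenv call to overwrite the string returned by an earlier one,
  so keeping a private copy is prudent.  A minimal sketch, with dup_string
  and copy_env as made-up helper names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of s, or NULL if malloc fails. */
char *dup_string(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s);
    /* len + 1 leaves room for the terminating NUL.  Using malloc(len)
       here would overrun the allocation by one byte on the copy. */
    copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, s, len + 1);
    return copy;
}

/* Hypothetical helper: private copy of an environment value, or NULL
   if the variable is not defined or memory is exhausted. */
char *copy_env(const char *name)
{
    const char *value = getenv(name);

    return (value != NULL) ? dup_string(value) : NULL;
}
```

  The caller owns the returned buffer and releases it with free.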
 
  The C malloc and free calls are very heavily used within OpenVMS code
  and within layered products and customer application code.  Accordingly,
  any problems that might be lurking within the run-time library (RTL)
  code that implements these calls would typically be expected to have
  very obvious and very widespread effects.
 
  The Compaq C RTL ships with OpenVMS and not the compiler, and OpenVMS
  ECO kits for the RTL are available.   That said, the OpenVMS Wizard
  would NOT assume that an ECO will repair a memory management problem.
  The OpenVMS Wizard WOULD assume that there is an application bug here.
  Additionally, use of current compilers is also recommended, but this
  again is far from a certain way to repair a memory management problem.
 
  The following is a typical scheme for managing a pool of variable-sized
  memory, and is similar to what is used within various of the OpenVMS
  system memory pools.  The fixed overhead is eight bytes assuming longword
  (32-bit) addressing and sixteen bytes assuming quadword (64-bit)
  addressing, and the variable overhead -- assuming quadword alignment,
  which is a common assumption -- is somewhere between zero and seven bytes.
 
        +--------+--------+--------+--------+
        | forward pointer (offset or abs)   |\
        +--------+--------+--------+--------+ > fixed overhead
        | size of this block in bytes       |/
        *========*========*========*========*
        # userdata (assume 9 bytes)         #\
        *                                   * \
        #                                   #  > variable size user data
        *========*========*========*        * /
        | wasted space             #        #/
        +                          *========*
        |                                   |> wasted_bytes
        +--------+--------+--------+--------+
 
        average bytes wasted:
 
                ((sizeof( alignment ) - 1) / 2) + sizeof(fixed overhead)
 
  For simplicity of the allocation (and, given the performance benefits
  of natural alignment), the size of the fixed overhead area is typically
  the same as the alignment specified for the user data area.
 
  Since the offsets and sizes involved are typically powers of 2, the
  waste at the end of each chunk of pool is likely less than wasted_bytes,
  on average.
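 
  The rounding implied by the diagram can be sketched in C.  The function
  names and the FIXED_OVERHEAD constant are illustrative and are not
  OpenVMS routines; the sketch assumes longword addressing (eight bytes
  of fixed overhead) and a power-of-two alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Forward pointer plus size field, longword (32-bit) addressing. */
#define FIXED_OVERHEAD 8u

/* Round a request up to the next multiple of a power-of-two alignment. */
size_t pool_round_up(size_t request, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (request + alignment - 1) & ~(alignment - 1);
}

/* Padding that follows the user data within the block. */
size_t pool_wasted_bytes(size_t request, size_t alignment)
{
    return pool_round_up(request, alignment) - request;
}

/* Total pool consumed by one allocation, header included. */
size_t pool_block_bytes(size_t request, size_t alignment)
{
    return FIXED_OVERHEAD + pool_round_up(request, alignment);
}

/* Sum of the padding over one full cycle of request sizes,
   1 through alignment; dividing by alignment gives the average. */
size_t pool_padding_over_cycle(size_t alignment)
{
    size_t request, sum = 0;

    for (request = 1; request <= alignment; request++)
        sum += pool_wasted_bytes(request, alignment);
    return sum;
}
```

  With quadword alignment, a 9-byte request is padded to 16 bytes (7
  wasted) and consumes 24 bytes of pool.  Over request sizes 1 through 8
  the padding totals 28 bytes, an average of 3.5, which matches the
  (alignment - 1) / 2 term of the formula.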
 
  Of course, any code that cares about the alignment of the buffers or about
  the pointer organization within the areas adjoining the memory returned to
  the caller -- or cares directly or indirectly about the wasted bytes -- can
  be suspect when porting the code to other platforms.  For an example of
  code that indirectly cares about the wasted bytes, consider code that
  (erroneously) writes data into the wasted space.  So long as there is
  sufficient extra space "wasted" in the allocation, the code will operate
  and the application will have a latent bug.  Should the code be ported or
  should the allocation scheme change or -- the most common variation that
  exposes the bug -- should the allocation size change and allocate less
  (or no) wasted space -- the latent bug will be exposed.  (And since the
  code "worked fine" on the other platform, the application code itself
  "clearly cannot be at fault".)
 
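  Why such a bug stays latent can be modelled with a few lines of C.
  This is only an arithmetic model under the padding scheme described
  above, with hypothetical function names; it performs no actual overrun.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the user area after rounding up to a power-of-two alignment. */
size_t padded_size(size_t request, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (request + alignment - 1) & ~(alignment - 1);
}

/* Returns 1 if writing `written` bytes into a block requested as
   `request` bytes stays inside the block's padding (the bug stays
   latent), or 0 if the write reaches past the padding into whatever
   follows the block (the bug is exposed). */
int overrun_is_latent(size_t request, size_t written, size_t alignment)
{
    return written <= padded_size(request, alignment);
}
```

  With quadword alignment, an off-by-one write into a 9-byte request
  lands in the padding and goes unnoticed, while the same off-by-one
  write into a 16-byte request has no padding to land in.
 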
  The OpenVMS Wizard would certainly like to see ANSI C memory management
  extensions -- in particular, one that allows a user to specify the required
  alignment in pool, and that allows a programmer to define one or more
  fixed-size lookaside lists for performance.  Extensions for debugging,
  such as the add-on tool "purify", would also be useful.  (The OpenVMS C
  and LIBRTL run-time libraries do adapt to the allocation pattern of the
  application, with a selection of lookaside lists of fixed-size blocks and
  such.  And OpenVMS does provide the Heap Analyzer, a component of the
  OpenVMS Debugger which permits a programmer to debug the memory allocation
  pool.)
 
  As for the technique variously known as "fenceposts", the implementation
  of fenceposts in the allocation headers allows (easier) detection of the
  typical memory management corruptions -- this also generally requires
  a core set of memory allocation and deallocation routines be implemented
  and used.  (This common memory management code is a Good Thing, of course.)
  This "fencepost" technique does "waste" memory when enabled, but any
  programmer that has ever chased a memory corruption bug will understand
  why this memory use is valuable.
 
  Here is what the layout of the typical memory packet looks like, when
  fenceposts are implemented.
 
        +--------+--------+--------+--------+
        | forward pointer (offset or abs)   |\
        +--------+--------+--------+--------+ > fixed overhead
        | size of this block in bytes       |/
        +--------+--------+--------+--------+
        | fencepost -- used to detect pool  |
        + corruption due to underwriting    +
        | or overwriting allocated areas.   |
        *========*========*========*========*
        # userdata (assume 9 bytes)         #\
        *                                   * \
        #                                   #  > variable size user data
        *========*========*========*        * /
        | variable fencepost       #        #/
        +                          *========*
        |                                   |> wasted_bytes
        +--------+--------+--------+--------+
        | the fencepost should be filled    |
        + with known value(s) with as few   +
        | zero or space bytes as possible   |
        +--------+--------+--------+--------+
 
  The allocation and deallocation routines fill in the fencepost and
  read out and compare the values with the expected values, of course.
  The OpenVMS Wizard tends to use the allocation address and the
  size of the packet as the values written into the fencepost, and
  particularly a bit inversion (or an XOR with a known value) of these
  values -- the bit inversion or the XOR avoids having zero bytes in
  the fencepost, as zeros are a byte value commonly found in data overruns
  and data underruns.
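 
  A minimal sketch of such fenceposts, layered on malloc and free, might
  look like the following.  All of the names are hypothetical.  The
  forward pointer is omitted because malloc manages its own lists, the
  leading fencepost holds the inverted user address, and the trailing
  fencepost holds the inverted size.  The sketch assumes the header size
  preserves the alignment the caller needs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header placed immediately before the user data.  The fence sits
   next to the user data so that an underrun damages it first. */
typedef struct fence_header {
    size_t    size;   /* user-requested size in bytes */
    uintptr_t fence;  /* leading fencepost: ~(user address) */
} fence_header;

/* Allocate size bytes with a fencepost on either side. */
void *fence_alloc(size_t size)
{
    fence_header *h;
    unsigned char *user;
    uintptr_t tail = ~(uintptr_t)size;   /* trailing fencepost value */

    if (size > SIZE_MAX - sizeof(fence_header) - sizeof tail)
        return NULL;
    h = malloc(sizeof *h + size + sizeof tail);
    if (h == NULL)
        return NULL;
    user = (unsigned char *)(h + 1);
    h->size = size;
    h->fence = ~(uintptr_t)user;
    /* The trailing fence may be unaligned, so copy it bytewise. */
    memcpy(user + size, &tail, sizeof tail);
    return user;
}

/* Returns 0 if both fenceposts are intact, -1 if the leading one is
   damaged (underrun), 1 if the trailing one is damaged (overrun). */
int fence_check(const void *p)
{
    const unsigned char *user = p;
    const fence_header *h;
    uintptr_t tail;

    assert(p != NULL);
    h = (const fence_header *)p - 1;
    if (h->fence != ~(uintptr_t)user)
        return -1;
    memcpy(&tail, user + h->size, sizeof tail);
    return (tail != ~(uintptr_t)h->size) ? 1 : 0;
}

/* Check the fenceposts, then free.  A damaged block is reported and
   deliberately left allocated so it can be examined in the debugger. */
int fence_free(void *p)
{
    int status;

    if (p == NULL)
        return 0;
    status = fence_check(p);
    if (status == 0)
        free((fence_header *)p - 1);
    return status;
}
```

  Calling fence_check at interesting points in the application, and not
  only at deallocation, narrows down where a corruption occurred.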
 
  Also useful in the common code is the use of the lib$get_vm and
  lib$free_vm calls -- why?  Because zones can be tailored with far
  more control than with malloc and free, and because this can also
  provide direct access to the lib$show_vm and lib$stat_vm calls,
  calls which are helpful when examining and debugging and tuning
  memory allocation.  (Calls to malloc and free are obviously best
  and most appropriate for code that needs to be portable, of course.)
 
  Please see topics (2624), (3115), (6870) and (1661) for related
  discussions.
 

  
     
     answer written or last revised on ( 27-MAR-2002 )