
HP OpenVMS Systems

ask the wizard

Bitlocks, interlocks, synchronization?


The Question is:

 
Dear Wizard
 
In article wiz_2681 you wrote:
 
"The OpenVMS Wizard will leave the question
of how to reliably deal with bitlocks,
and how to spin appropriately, for another
topic!"
 
We are still waiting :)
 
  Thank you.
 


The Answer is :

 
  Synchronization techniques are described in detail in the OpenVMS
  Programming Concepts manual, and in the hardware platform or
  hardware architecture documentation.
 
  Bitlocks and interlocked instructions are the most lightweight
  of all synchronization techniques available on OpenVMS systems,
  and the interlocked capabilities are used as the basis of many
  other more complex (and more flexible) synchronization mechanisms.
  Interlocked instructions permit only one accessor to perform the
  specified function at a time, and are thus useful for protecting
  critical code and critical data from uncoordinated (shared) access.
 
  VAX systems have hardware-implemented bitlocks and hardware bitlock
  (and interlocked queue) instructions including BBCCI, BBSSI, REMQHI,
  REMQTI, INSQHI, and INSQTI.
 
  For compatibility with existing applications, the OpenVMS Alpha and
  OpenVMS Itanium systems have simulations of the VAX interlocked
  instructions.  Further, OpenVMS provides run-time library calls
  (eg: lib$bbcci), and various compilers can offer "built-ins" --
  language extensions -- targeting application synchronization.
  (These calls could potentially be implemented on OpenVMS Alpha, for
  instance, using the Alpha LDx_L and STx_C mechanisms -- please
  see the OpenVMS source listings media for details.)
 
  Use only interlocked instructions when modifying the bitlock.  Do
  not mix interlocked and non-interlocked access to the interlock.
  (The interlocks correctly manage the processor caches.  The
  non-interlocked access can run afoul of the processor cache and
  may not see the correct data.  For details on memory barriers and
  shared memory, see the OpenVMS Programming Concepts Manual and
  please see topic (2681).)
 
  The interlocked instructions can typically lock ranges of memory
  rather larger than the target bit (or target queue entry), and
  bitlocks located within the same "interlock grain" can thus
  encounter contention.  This contention will not disrupt the correct operation
  of the bitlocks, but it can slow the access to the bitlocks.  (The
  particular span of interlocking is implementation-specific, and can
  range from a naturally aligned longword (VAX) to a naturally-aligned
  quadword (Alpha) to all of system memory, and can potentially be
  discontiguous.  For additional related information, please see
  topics (8149) and (7383).)
 
  Except for high-IPL kernel-mode (driver) code that must necessarily
  block other system activity, application code should not spin on a
  bitlock -- spinning is a term for repeatedly checking the state of
  the bitlock.  Spinning causes system performance overhead because of
  the loop and because of the interlocks used to access the bitlock,
  and the act of spinning can reduce the ability of other accessors to
  access the bitlock.  Spinning is also sensitive to non-uniform
  memory access (NUMA) organization, with accessors local to the
  bitlock receiving substantially preferential access to it.
  (The kernel-mode OpenVMS spinlock primitives were explicitly modified
  to account for this particular characteristic of NUMA.)
 
  When spinning is required, spinning is normally best performed with a
  combination of interlocked and non-interlocked operations.  The
  interlocked operations are used to acquire and to release the bitlock,
  while the non-interlocked (read) operations are used to poll for the
  potential to access the requested bitlock -- this design avoids the
  contention on the interlock primitives that can arise if the application
  spins using the interlocked operations.  (As was mentioned earlier,
  the granularity of the interlock primitives can entail locking an
  entirely implementation-specific and potentially non-contiguous chunk
  of memory involving from between a longword and all of physical memory,
  inclusive.)
 
  The usual approach for an application waiting on a bitlock is to use
  a $resched, $hiber/$schdwk or other similar system service call to
  "back off" from the bitlock; to avoid repeated sequential access to
  the bitlock.  Backing off from the bitlock permits other accessors
  to access the bitlock, and reduces the general system overhead
  resulting from the spinning.  (Application designs involving use of
  the distributed lock manager can also assist here.)
 
  The usual approach for applications communicating via shared memory
  involves two or more queues of data structures -- a queue of structures
  that are free (often fixed-length), and a queue of structures that are
  pending work processing.  A process writing data dequeues a free packet,
  fills it in, and then appends the packet to the pending work queue.
  This approach avoids contention and the potential for corruptions
  when multiple accessors are referencing the shared memory data
  structures in parallel.
 
  Also please see the OpenVMS documentation of the distributed lock
  manager.  This documentation is included in the OpenVMS Programming
  Concepts Manual and in the system service reference materials for
  $enq[w] and $deq[w].  There are many features available to clients
  of the lock manager -- distributed operations, asynchronous grant
  and asynchronous blocking notifications, shared and exclusive access,
  queued access -- that must otherwise be manually implemented on top
  of bitlocks or other more primitive synchronization techniques.
 
  Also please see topics (1661), (2681), (6099), (7383) and (8149).
 
  Attached is an example of interlocked queue operations.
 
 
#pragma  module	qdemo	"V2.0"
#pragma builtins
 
/*
** Copyright 2001 Compaq Computer Corporation
**
*/
 
/*
**++
**  FACILITY:  Examples
**
**  MODULE DESCRIPTION:
**
**      This routine contains a demonstration of the OpenVMS self-relative
**	interlocked RTL queue routines lib$remqhi() and lib$insqti(), and
**      the equivalent Compaq C compiler builtin functions, and provides
**      a demonstration of the OpenVMS Compaq C memory management routines.
**
**  AUTHORS:
**
**      Stephen Hoffman
**
**  CREATION DATE:  21-Jan-1990
**
**  DESIGN ISSUES:
**
**      NA
**
**  MODIFICATION HISTORY:
**
**      9-Aug-2001  Hoffman
**                  Compaq C updates, added builtin calls.
**
**--
*/
 
/*
**  $! queue demo build procedure...
**  $ cc/decc/debug/noopt qdemo
**  $ link qdemo/debug
**  $!
*/
 
/*
**
**  INCLUDE FILES
**
*/
 
#include <builtins.h>
#include <lib$routines.h>
#include <libdef.h>
#include <ssdef.h>
#include <stdio.h>
#include <stdlib.h>
#include <stsdef.h>
 
int main( void )
    {
    unsigned long int retstat;
    unsigned long int i;
    struct queueblock
	{
	unsigned long int *flink;
	unsigned long int *blink;
	unsigned long int dd;
	} *qb;
    /*
    **	Allocate the (zeroed) queue header now.
    **
    **	The interlocked queue forward and backward links located in
    **	the queue header (of self-relative queues) must be initialized
    **	to zero prior to usage.  calloc() performs this for us.  Blocks
    **	allocated and inserted in the queue subsequently need not have
    **  their links zeroed.
    **
    **	NB: On VMS, the calloc() and malloc() routines acquire memory
    **	that is quadword (or better) aligned.  The VAX hardware queue
    **	instructions (and thus the queue routines) require a minimum
    **	of quadword alignment.
    */
    struct queueblock *header = calloc(1, sizeof( struct queueblock ));
    struct queueblock *qtmp = 0;
 
    printf( "qdemo.c -- queue demonstration\n" );
    printf( "\nRTL calls...\n\n" );
 
    /*
    **  dynamically allocate the memory for each block, place a value
    **  in the block and insert the block onto the tail of the queue.
    */
    for ( i = 0; i < 10; i++ )
	{
	qtmp = calloc(1,sizeof( struct queueblock ));
	qtmp->dd = i;
	printf( "inserting item: %lu\n", qtmp->dd );
	retstat = lib$insqti( qtmp, header );
 
	}
 
    /*
    **	Remove queue entries until there are no more.
    */
    retstat = SS$_NORMAL;
    while ( $VMS_STATUS_SUCCESS( retstat ) )
	{
	retstat = lib$remqhi( header, &qtmp );
	if ( $VMS_STATUS_SUCCESS( retstat ) )
	    {
	    printf( "removing item: %lu\n", qtmp->dd );
	    free( qtmp );
	    }
	}
 
    if ( retstat != LIB$_QUEWASEMP )
	printf( "unexpected status %lx received\n", retstat );
    else
	printf( "expected completion status received\n" );
 
    printf( "\nbuiltin calls...\n\n" );
 
    /*
    **  dynamically allocate the memory for each block, place a value
    **  in the block and insert the block onto the tail of the queue.
    */
    for ( i = 0; i < 10; i++ )
	{
	qtmp = calloc(1,sizeof( struct queueblock ));
	qtmp->dd = i;
	printf( "inserting item: %lu\n", qtmp->dd );
	retstat = _INSQTI( qtmp, header );
	}
 
 
    /*
    **	Remove queue entries until there are no more.
    */
    retstat = _remqi_removed_more;
    while (( retstat == _remqi_removed_more ) ||
	   ( retstat == _remqi_removed_empty ))
	{
	retstat = _REMQHI( header, &qtmp );
	if (( retstat == _remqi_removed_more ) ||
	   ( retstat == _remqi_removed_empty ))
	    {
	    printf( "removing item: %lu\n", qtmp->dd );
	    free( qtmp );
	    }
	}
 
    switch ( retstat )
      {
      case _remqi_removed_empty:
	printf( "unexpected status _remqi_removed_empty received\n" );
        break;
      case _remqi_removed_more:
	printf( "unexpected status _remqi_removed_more received\n" );
        break;
      case _remqi_not_removed:
	printf( "unexpected status _remqi_not_removed received\n" );
        break;
      case _remqi_empty:
	printf( "expected status _remqi_empty received\n" );
        break;
      }
 
    printf( "\nDone...\n" );
    return SS$_NORMAL;
    }
 
 

answer written or last revised on ( 8-DEC-2002 )
