Fast I/O is a set of three system services that
were developed as a $QIO alternative built for speed. These services
are not a $QIO replacement; $QIO is unchanged, and $QIO interoperation
with these services is fully supported. Rather, the services substitute
for a subset of $QIO operations: high-volume read/write I/O requests.
The Fast I/O services support 64-bit addresses
for data transfers to and from disk and tape devices.
While Fast I/O services are available on OpenVMS
VAX, the performance advantage applies only to OpenVMS Alpha and Integrity
servers. OpenVMS VAX has a run-time library (RTL) compatibility package
that translates the Fast I/O service requests to $QIO system service
requests, so one set of source code can be used on VAX, Alpha, and
Integrity server systems.
10.1.1 Fast I/O Benefits
The performance benefits of Fast I/O result from
streamlining high-volume I/O requests. The Fast I/O system service
interfaces are optimized to avoid the overhead of general-purpose
services. For example, I/O request packets (IRPs) are now permanently
allocated and used repeatedly for I/O rather than allocated and deallocated
anew for each I/O.
The greatest benefits stem from having user data
buffers and user I/O status structures permanently locked down and
mapped using system space. This allows Fast I/O to do the following:
For direct I/O, avoid
per-I/O buffer lockdown or unlocking.
For buffered I/O, avoid
allocation and deallocation of a separate system buffer, because the
user buffer is always addressable.
Complete Fast I/O operations
at IPL 8, thereby avoiding the interrupt chaining usually required
by the more general-purpose $QIO system service. For each I/O, this
eliminates the IPL 4 IOPOST interrupt and a kernel AST.
In total, Fast I/O services eliminate four spinlock
acquisitions per I/O (two for the MMG spinlock and two for the SCHED
spinlock). The reduction in CPU cost per I/O is 20 percent for uniprocessor
systems and 10 percent for multiprocessor systems.
10.1.2 Using Buffer Objects
The lockdown of user-process data structures is
accomplished by buffer objects. A “buffer object” is
process memory whose physical pages have been locked in memory and
double-mapped into system space. After creating a buffer object, the
process remains fully pageable and swappable, and it retains normal
virtual memory access to its pages in the buffer object.
If the buffer object contains process data structures
to be passed to an OpenVMS system service, the OpenVMS system can
use the buffer object to avoid any probing, lockdown, and unlocking
overhead associated with these process data structures. Additionally,
double-mapping into system space allows the OpenVMS system direct
access to the process memory from system context.
To date, only the $QIO system service and the
Fast I/O services have been changed to accept buffer objects. For
example, a buffer object allows a programmer to eliminate I/O memory
management overhead. On each I/O, each page of a user data buffer
is probed and then locked down on I/O initiation and unlocked on I/O
completion. Instead of incurring this overhead for each I/O, it can
be done once at buffer object creation time. Subsequent I/O operations
involving the buffer object can completely avoid this memory management
overhead.
Two system services create and delete buffer
objects, respectively, and both can be called from any access
mode. To create a buffer object, call the $CREATE_BUFOBJ system service.
This service expects as input an existing process memory
range and returns a buffer handle for the buffer object. The buffer
handle is an opaque identifier used to identify the buffer object
on future I/O requests. The $DELETE_BUFOBJ system service is used
to delete the buffer object and accepts as input the buffer handle.
Although image rundown deletes all existing buffer objects, it is
good form for the application to clean up properly.
A 64-bit equivalent version of the $CREATE_BUFOBJ
system service ($CREATE_BUFOBJ_64) can be used to create buffer objects
from the new 64-bit P2 or S2 regions. The $DELETE_BUFOBJ system service
can be used to delete 32-bit or 64-bit buffer objects.
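As an illustration, the following C sketch creates a buffer object over
a static buffer and deletes it at shutdown. It assumes the C bindings
declared in SYS$LIBRARY header file <starlet.h>; the two-longword buffer
handle and the loose pointer types used for the address-range arguments
are simplifications to be checked against the service descriptions in
the HP OpenVMS System Services Reference Manual.

    #include <ssdef.h>
    #include <starlet.h>

    static char data_buf[65536];        /* process memory to become a buffer object */
    static unsigned int buf_handle[2];  /* opaque buffer handle (2 longwords assumed) */

    int create_io_buffer(void)
    {
        /* Input address range: first and last byte of the buffer. */
        void *inadr[2] = { data_buf, data_buf + sizeof data_buf - 1 };
        void *retadr[2];

        /* Lock the pages and double-map them into system space; the
           handle identifies this buffer object on later Fast I/Os. */
        return sys$create_bufobj(inadr, retadr, 0, 0, buf_handle);
    }

    void delete_io_buffer(void)
    {
        /* Image rundown would also delete it, but explicit cleanup is good form. */
        (void) sys$delete_bufobj(buf_handle);
    }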
Buffer objects require system management. Because
buffer objects tie up physical memory, extensive use of buffer objects
requires system management planning. All the bytes of memory in the
buffer object are deducted from a systemwide system parameter called
MAXBOBMEM (maximum buffer object memory). System managers must set
this parameter correctly for the application loads that run on their
systems.
The MAXBOBMEM parameter defaults to 100 Alpha
pages, but for applications with large buffer pools it will likely
be set much larger. To prevent user-mode code from tying up excessive
physical memory, user-mode callers of $CREATE_BUFOBJ must have a new
system identifier, VMS$BUFFER_OBJECT_USER, assigned. This new identifier
is automatically created in an OpenVMS Version 7.0 upgrade if the
file SYS$SYSTEM:RIGHTSLIST.DAT is present. The system manager can
assign this identifier with the DCL command SET ACL to a protected
subsystem or application that creates buffer objects from user mode.
It may also be appropriate to grant the identifier to a particular
user with the Authorize utility command GRANT/IDENTIFIER (for example,
to a programmer who is working on a development system).
There is currently a restriction on the type of
process memory that can be used for buffer objects. Global section
memory cannot be made into a buffer object.
10.1.3 Differences Between Fast I/O Services and $QIO
The precise definition of high-volume I/O operations
optimized by Fast I/O services is important. I/O that does not comply
with this definition either is not possible with the Fast I/O services
or is not optimized. The characteristics of the high-volume I/O optimized
by Fast I/O services can be seen by contrasting the operation of the
Fast I/O system services with that of the $QIO system service, as follows:
The $QIO system service
I/O status block (IOSB) is replaced by an I/O status area (IOSA) that
is larger and quadword aligned. The transfer byte count returned in
IOSA is 64 bits, and the field is aligned on a quadword boundary.
Unlike the IOSB, which is optional, the IOSA is required.
User data buffers must
be aligned to a 512-byte boundary.
All user process structures
passed to the Fast I/O system services must reside in buffer objects.
This includes the user data buffer and the IOSA.
Only transfers that are
multiples of 512 bytes are supported.
Only the following function
codes are supported: IO$_READVBLK, IO$_READLBLK, IO$_WRITEVBLK, and
IO$_WRITELBLK.
Only I/O to disk and tape
devices is optimized for performance.
No event flags are used
with Fast I/O services. If application code must use an event flag
in relation to a specific I/O, then the Event No Flag EFN (EFN$C_ENF)
can be used. This event flag is a no-overhead EFN that can be used
in situations when an EFN is required by a system service interface
but has no meaning to an application.
For
example, because Fast I/O services do not use EFNs, the application
cannot pass the $SYNCH system service a valid EFN associated with the
I/O in order to synchronize on its completion. To resolve this, the
application can call the $SYNCH system service passing two arguments:
EFN$C_ENF and the address of the appropriate IOSA. Specifying EFN$C_ENF
signifies to $SYNCH that no EFN is involved in the synchronization
of the I/O. $SYNCH returns once the IOSA has been written with a status
and byte count. The IOSA is thus the central point of synchronization
for a given Fast I/O (and is the only way to determine whether the
asynchronous I/O is complete); see the sketch following this list.
To minimize argument-passing
overhead in these services, the $QIO parameters P3 through
P6 are replaced by a single argument that is passed directly by the
Fast I/O system services to device drivers. For disk-like devices,
this argument is the media address (VBN or LBN) of the transfer. For
drivers with complex parameters, this argument is the address of a
descriptor or of a buffer specific to the device and function.
Segmented transfers are
supported by Fast I/O but are not fully optimized. There are two major
causes of segmented transfers. The first is disk fragmentation. While
this can be an issue, it is assumed that sites seeking maximum performance
have eliminated the overhead of segmenting I/O due to fragmentation.
A second cause of segmenting is issuing an I/O
that exceeds the port's maximum limit for a single transfer.
Transfers beyond the port maximum limit are segmented into several
smaller transfers. Some ports limit transfers to 64KB. If the application
limits its transfers to less than 64KB, this type of segmentation
should not be a concern.
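For instance, the following minimal C sketch waits on a Fast I/O
completion this way. It assumes the $SYNCH binding from <starlet.h>,
EFN$C_ENF from <efndef.h>, and an IOSA typedef from <iosadef.h>; the
status field name shown is an assumption to verify against that header.

    #include <efndef.h>
    #include <iosadef.h>
    #include <ssdef.h>
    #include <starlet.h>

    /* Wait for one Fast I/O to complete using its IOSA; no event flag. */
    int wait_fast_io(IOSA *iosa)
    {
        /* EFN$C_ENF tells $SYNCH that no event flag is involved; it
           returns once the IOSA holds a final status and byte count. */
        int status = sys$synch(EFN$C_ENF, (struct _iosb *) iosa);
        if (!(status & 1))
            return status;
        return iosa->iosa$l_status;   /* final I/O status (field name assumed) */
    }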
10.1.4 Using Fast I/O Services
The three Fast I/O system services are:
$IO_SETUP—Sets
up an I/O
$IO_PERFORM[W]—Performs
an I/O request
$IO_CLEANUP—Cleans
up an I/O request
10.1.4.1 Using Fandles
A key concept behind the operation of the Fast
I/O services is the file handle or fandle. A fandle is an opaque token that represents a “setup”
I/O. A fandle is needed for each I/O outstanding from a process.
All possible setup, probing, and validation of
arguments is performed off the mainline code path during application
startup with calls to the $IO_SETUP system service. The I/O function,
the AST address, the buffer object for the data buffer, and the IOSA
buffer object are specified on input to $IO_SETUP service, and a fandle
representing this setup is returned to the application.
To perform an I/O, the $IO_PERFORM system service
is called, specifying the fandle, the channel, the data buffer address,
the IOSA address, the length of the transfer, and the media address
(VBN or LBN) of the transfer.
If the asynchronous version of this system service,
$IO_PERFORM, is used to issue the I/O, then the application can wait
for I/O completion using a $SYNCH specifying EFN$C_ENF and the appropriate
IOSA. The synchronous form of the system service, $IO_PERFORMW, is
used to issue an I/O and wait for it to complete. Optimum performance
comes when the application uses AST completion; that is, the application
does not issue an explicit wait for I/O completion.
To clean up a fandle, the fandle can be passed
to the $IO_CLEANUP system service.
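A minimal C sketch of this flow follows, assuming the bindings in
<starlet.h> and the argument orders given in the HP OpenVMS System
Services Reference Manual ($IO_SETUP: function code, data and IOSA
buffer objects, AST address, flags, returned fandle; $IO_PERFORM[W]:
fandle, channel, IOSA, buffer, length, media address); the 64-bit
fandle type is an assumption to verify there.

    #include <iodef.h>
    #include <iosadef.h>
    #include <ssdef.h>
    #include <starlet.h>

    /* Write one virtual block synchronously through a fandle. The channel
       comes from a prior file open; both the data buffer and the IOSA
       must reside inside previously created buffer objects. */
    int write_vblk(unsigned short chan, unsigned int *buf_handle,
                   unsigned int *iosa_handle, void *buf, IOSA *iosa,
                   unsigned int vbn)
    {
        unsigned long long fandle;    /* opaque token; 64 bits assumed */
        int status;

        /* Setup is normally hoisted to application startup. */
        status = sys$io_setup(IO$_WRITEVBLK, buf_handle, iosa_handle,
                              0 /* astadr */, 0 /* flags */, &fandle);
        if (!(status & 1)) return status;

        /* Transfer length must be a multiple of 512; the last argument
           replaces $QIO P3-P6 and carries the VBN for disk devices. */
        status = sys$io_performw(fandle, chan, iosa, buf, 512, vbn);

        (void) sys$io_cleanup(fandle);
        return status;
    }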
10.1.4.2 Modifying Existing Applications
Modifying an application to use the Fast I/O services
requires a few source-code changes. For example:
A programmer adds code
to create buffer objects for the IOSAs and data buffers.
The programmer changes
the application to use the Fast I/O services. Not all $QIOs need to
be converted. Only high-volume read/write I/O requests should be changed.
A simple example is a “database writer”
program, which writes modified pages back to the database. Suppose
the writer can handle up to 16 simultaneous writes. At application
startup, the programmer would add code to create 16 fandles with 16
$IO_SETUP system service calls.
In the main processing
loop within the database writer program, the programmer replaces the
$QIO calls with $IO_PERFORM calls. Each $IO_PERFORM call uses one
of the 16 available fandles. While the I/O is in progress, the selected
fandle is unavailable for use with other I/O requests. The database
writer would typically use AST completion, recycling the fandle, data
buffer, and IOSA once the completion AST arrives (as sketched after
this list).
If the database writer routine cannot return until all
dirty buffers are written (that is, it must wait for all I/O completions),
then $IO_PERFORMW can be used. Alternatively, $IO_PERFORM calls can
be followed by $SYNCH system service calls passing the EFN$C_ENF argument
to await I/O completions.
The database writer runs faster and scales better
because I/O requests now use less CPU time.
When the application exits,
an $IO_CLEANUP system service call is done for each fandle returned
by a prior $IO_SETUP system service call. Then the buffer objects
are deleted. Image rundown performs fandle and buffer object cleanup
on behalf of the application, but it is good form for the application
to clean up properly.
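The startup step of such a database writer might look like the
following C sketch, again assuming the <starlet.h> bindings; the pool
size, array names, and AST routine are illustrative, not from the manual.

    #include <iodef.h>
    #include <ssdef.h>
    #include <starlet.h>

    #define N_WRITES 16

    static unsigned long long write_fandles[N_WRITES]; /* one per concurrent write */

    void write_done_ast(void *astprm);  /* ASTPRM delivered is the IOSA address */

    /* Called once at startup: 16 $IO_SETUP calls yield 16 write fandles. */
    int init_writer(unsigned int *buf_handle, unsigned int *iosa_handle)
    {
        for (int i = 0; i < N_WRITES; i++) {
            int status = sys$io_setup(IO$_WRITEVBLK, buf_handle, iosa_handle,
                                      write_done_ast, 0, &write_fandles[i]);
            if (!(status & 1))
                return status;          /* propagate the failure status */
        }
        return SS$_NORMAL;
    }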
10.1.4.3 I/O Status Area (IOSA)
The central point of synchronization for a given
Fast I/O is its IOSA. The IOSA replaces the $QIO system service's
IOSB argument. The IOSA is larger than the IOSB, and its byte count
field is 64 bits and quadword aligned. Unlike the $QIO system
service, Fast I/O services require the caller to supply an IOSA and
require the IOSA to be part of a buffer object.
The IOSA context field can be used in place of
the $QIO system service ASTPRM argument. The $QIO ASTPRM argument
is typically used to pass a pointer back to the application on the
completion AST to locate the user context needed for resuming a stalled
user-thread; however, for the $IO_PERFORM system service, the ASTPRM
on the completion AST is always the IOSA. Because there is no user-settable
ASTPRM, an application can store a pointer to the user-thread context
for this I/O in the IOSA context field and retrieve the pointer from
the IOSA in the completion AST.
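For example, a completion AST might recover its context as in the
following sketch; the IOSA typedef and the context field name are
assumptions to verify against <iosadef.h>.

    #include <iosadef.h>

    typedef struct { int thread_id; /* per-I/O user-thread context */ } user_ctx;

    /* Completion AST: $IO_PERFORM always delivers the IOSA address as ASTPRM. */
    void io_done_ast(void *astprm)
    {
        IOSA *iosa = (IOSA *) astprm;
        /* Retrieve the pointer stored in the context field before the
           I/O was issued (field name assumed). */
        user_ctx *ctx = (user_ctx *) iosa->iosa$ph_context;
        /* ... resume the stalled user thread described by ctx ... */
    }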
10.1.4.4 $IO_SETUP
The $IO_SETUP system service performs the setup
of an I/O and returns a unique identifier for this setup I/O, called
a fandle, to be used on future I/Os. The $IO_SETUP arguments used
to create a given fandle remain fixed throughout the life of the fandle.
This has implications for the number of fandles needed in an application.
For example, a single fandle can be used only for reads or only for
writes. If an application module has up to 16 simultaneous reads or
writes pending, then potentially 32 fandles are needed to avoid any
$IO_SETUP calls during mainline processing.
The $IO_SETUP system service supports an expedite
flag, which is available to boost the priority of an I/O among the
other I/O requests that have been handed off to the controller. Unrestrained
use of this argument is useless, because if all I/O is expedited,
nothing is expedited. Note that this flag requires the use of ALTPRI
and PHY_IO privilege.
10.1.4.5 $IO_PERFORM[W]
The $IO_PERFORM[W] system service accepts a fandle
and five other variable I/O parameters for the high-performance I/O
operation. The fandle remains in use by the application until
$IO_PERFORMW returns or, if $IO_PERFORM is used, until a completion
AST arrives.
The CHAN argument supplies the data
channel returned to the application by a previous file operation.
This argument allows the application the flexibility of using the
same fandle for different open files on successive I/Os; however,
if the fandle is used repeatedly for the same file or channel, then
an internal optimization in $IO_PERFORM is applied.
Note that $IO_PERFORM was designed to have no
more than six arguments to take advantage of the HP OpenVMS
Calling Standard, which specifies that calls with up to
six arguments can be passed entirely in registers.
10.1.4.6 $IO_CLEANUP
A fandle can be cleaned up by passing the fandle
to the $IO_CLEANUP system service.
10.1.4.7 Fast I/O FDT Routine (ACP_STD$FASTIO_BLOCK)
Because $IO_PERFORM supports only four function
codes, this system service does not use the generalized function decision
table (FDT) dispatching that is contained in the $QIO system service.
Instead, $IO_PERFORM uses a single vector in the driver dispatch table
called DDT$PS_FAST_FDT for the four supported functions. The DDT$PS_FAST_FDT
field is an FDT routine vector that indicates whether the device driver
called by $IO_PERFORM is set up to handle Fast I/O operations. A nonzero
value for this field indicates that the device driver supports Fast
I/O operations and that the I/O can be fully optimized.
If the DDT$PS_FAST_FDT field is zero, then the
driver is not set up to handle Fast I/O operations. The $IO_PERFORM
system service tolerates such device drivers, but the I/O is only
slightly optimized in this circumstance.
The OpenVMS disk and tape drivers that ship as
part of OpenVMS Version 7.0 have added the following line to their
driver dispatch table (DDTAB) macro:
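    FAST_FDT=ACP_STD$FASTIO_BLOCK,-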
This line initializes the DDT$PS_FAST_FDT field
to the address of the standard Fast I/O FDT routine, ACP_STD$FASTIO_BLOCK.
If you have a disk or tape device driver that
can handle Fast I/O operations, you can add this DDTAB macro line
to your driver. If you cannot use the standard Fast I/O FDT routine,
ACP_STD$FASTIO_BLOCK, you can develop your own based on the model
presented in this routine.
10.1.5 Additional Information
See the HP OpenVMS System
Services Reference Manual for additional information about
the following Fast I/O system services:
$CREATE_BUFOBJ
$DELETE_BUFOBJ
$CREATE_BUFOBJ_64
$IO_SETUP
$IO_PERFORM
$IO_CLEANUP
To see a sample program that demonstrates the
use of buffer objects and the Fast I/O system services, see the IO_PERFORM.C
program in the SYS$EXAMPLES directory.