Fast Path is an optional feature designed to improve
I/O performance. Three factors throttle performance for OpenVMS
on SMP systems:
Time spent by a CPU waiting for memory to be faulted into its cache.
Contention for the SCS/IOLOCK8 spinlock.
Contention for the primary CPU on which all I/O completion is processed.
Fast Path addresses these factors as follows:
Select a secondary CPU for a given device or port, and cause all I/O for that device to originate and complete on that CPU. This offloads the primary CPU and reduces cache faults.
Replace dependence upon the SCS/IOLOCK8 spinlock by providing a port-specific spinlock whenever possible.
For the most common I/O requests, preallocate resources and provide an optimized path through the mainline code.
Using Fast Path features does not require source-code
changes. It does require major changes to device drivers, so it has
been implemented only for the newer high-performance devices. These
currently service many CI, Fibre Channel, parallel SCSI, and LAN devices.
Table 10-1 lists
the supported ports for each OpenVMS Alpha version.
Table 10-1 Supported Ports for Each Version of OpenVMS Alpha and Integrity servers

Version   Supported Ports
7.3-2     SMART Array 53xx, many LAN devices
7.3-1     KZPEA
7.3       CIXCD, CIPCA, KGPSA, KZPBA
7.1       CIXCD, CIPCA
7.0       CIXCD

Prior to OpenVMS Alpha Version 7.3-1, all hardware
interrupts took place on the primary CPU. Interrupts from Fast Path-enabled
devices had to be redirected from the primary CPU to
a "preferred" CPU. However, this redirection
still involved the primary CPU and also incurred interprocessor overhead.
Starting with OpenVMS Alpha Version 7.3-1, hardware
interrupts that are targeted for a "preferred"
CPU go directly to that CPU, thereby
eliminating any I/O processing on the primary CPU. This major Fast
Path enhancement is known as distributed interrupts.
NOTE: This feature is available on Fibre Channel, CI,
and some SCSI ports on AlphaServer DS20, ES40/45, and GS series systems.
For more information about Fibre Channel, SCSI,
and CI configurations, see Guidelines for
OpenVMS Cluster Configurations.
10.2.1 Using Fast Path Features
Preferred CPU Selection
All Fast Path ports are assignable to CPUs. You
can set a system parameter specifying the set of CPUs that are allowed
to serve as preferred CPUs. This set is called the set of allowable CPUs.
At any point in time, the set of CPUs that can currently have ports
assigned to them, called the set of usable CPUs, is the intersection
of the set of allowable CPUs and the current set of running CPUs.
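
The intersection described above can be sketched as a bitwise AND of two CPU bit masks. This is only an illustration of the set arithmetic; the function names and the 4-CPU configuration are hypothetical and are not part of OpenVMS.

```python
def usable_cpus(allowable_mask, running_mask):
    """Return the usable-CPU set as a bit mask (bit n set means CPU n)."""
    return allowable_mask & running_mask


def cpu_ids(mask):
    """Expand a CPU bit mask into an ascending list of CPU IDs."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Hypothetical system: CPUs 0-3 are running, CPUs 1-3 are allowable.
print(cpu_ids(usable_cpus(0b1110, 0b1111)))  # prints [1, 2, 3]
```

If a CPU in the allowable set is stopped, its bit drops out of the running mask and therefore out of the usable set.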
Each Fast Path port is initially assigned to a
CPU by the FASTPATH_SERVER process
that runs at port initialization time. This process executes an automatic
assignment algorithm that spreads Fast Path ports evenly among the
usable CPUs. The FASTPATH_SERVER process also runs whenever a secondary
CPU is started, and whenever the set of system parameters specifying
the allowable CPUs is changed.
If the primary CPU is in the set of allowable
CPUs, the initial distribution is biased against the primary CPU in
that a port will only be assigned to the primary after ports have
been assigned to each of the other usable CPUs.
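
The bias against the primary CPU can be illustrated with a simple round-robin sketch. The text states only that ports are spread evenly and that the primary is used last, so the round-robin order, the function name, and the port names below are assumptions made for illustration, not the actual FASTPATH_SERVER algorithm.

```python
def distribute_ports(ports, usable, primary=0):
    """Spread ports round-robin over usable CPUs, visiting the primary last."""
    # Secondary CPUs come first, in ascending order (an assumed ordering).
    order = sorted(cpu for cpu in usable if cpu != primary)
    if primary in usable:
        # The primary receives a port only after every other CPU has one.
        order.append(primary)
    if not order:
        raise ValueError("no usable CPUs")
    return {port: order[i % len(order)] for i, port in enumerate(ports)}


# Hypothetical ports on a system whose usable CPUs are 0 (primary), 1, and 2.
print(distribute_ports(["PNA0", "PNB0", "PGA0"], {0, 1, 2}))
# prints {'PNA0': 1, 'PNB0': 2, 'PGA0': 0}
```

With only two ports in this configuration, the primary CPU would receive none.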
To identify a device or port's current preferred
CPU, you can use either $GETDVI or the SHOW DEVICE/FULL command. To
identify the Fast Path ports currently assigned to a CPU, you use
the SHOW CPU /FULL command.
You can directly assign a Fast Path port to a
CPU, or request the system to automatically select the port's
preferred CPU from a specific set of CPUs. To do this, you either
issue a $QIO or use the SET DEVICE/PREFERRED_CPU command. This also
sets the port's User Preferred CPU to be the selected CPU.
You can clear the port's User Preferred CPU
either by issuing a $QIO or by using the SET DEVICE/NOPREFERRED_CPU
DCL command.
You can redistribute the system assignable Fast
Path ports across a subset of the set of usable CPUs by calling the
$IO_FASTPATH system service.
Optimizing Application Performance
Processes running on a port's preferred CPU
have an inherent advantage when issuing I/O to a port in that the
overhead to assign the I/O to the preferred CPU can be avoided. An
application process can use the $PROCESS_AFFINITY system service to
assign itself to the preferred CPU of the device to which the majority
of its I/O is sent.
With proper attention to assignment, a process's
execution need never leave the preferred CPU. This presents a scalable
process and I/O scheme for maximizing multiprocessor system operation.
Like most RISC systems, Alpha system performance is highly dependent
on the performance of CPU memory caches. Process assignment and preferred
CPU assignment are two keys to minimizing the memory stalls in the
application and in the operating system, thereby maximizing multiprocessor
system throughput.
10.2.2 Managing Fast Path
This section describes how to manage Fast Path.
10.2.2.1 Fast Path System Parameters
There are three Fast Path system parameters:
FAST_PATH
FAST_PATH_PORTS
IO_PREFER_CPUS
These parameters can be used to control Fast Path
as follows:
FAST_PATH
FAST_PATH is a static
system parameter that enables (1) or disables (0) the Fast Path performance
features for all Fast Path-capable ports.
Fast Path is enabled by default.
FAST_PATH_PORTS
FAST_PATH_PORTS is a 32-bit
mask. Once Fast Path has been enabled by setting FAST_PATH to 1, FAST_PATH_PORTS
can be used to selectively disable Fast Path for some specific adapter
types.
The value of the FAST_PATH_PORTS
system parameter is the sum of the values of the bits that have been
set. Table 10-2 describes the
bit mask:
Table 10-2 FAST_PATH_PORTS Bit Masks

Bit  Mask      Description
0    00000001  0 = Fast Path is ENABLED for KZPBA ports when FAST_PATH is set to 1.
               1 = Fast Path is DISABLED for KZPBA ports.
1    00000002  0 = Fast Path is ENABLED for KGPSA ports when FAST_PATH is set to 1.
               1 = Fast Path is DISABLED for KGPSA ports.
2    00000004  0 = Fast Path is ENABLED for KZPEA ports when FAST_PATH is set to 1.
               1 = Fast Path is DISABLED for KZPEA ports.
3    00000008  0 = Fast Path is ENABLED for LAN ports when FAST_PATH is set to 1.
               1 = Fast Path is DISABLED for LAN ports.
4    00000010  0 = Fast Path is ENABLED for KZPDC ports when FAST_PATH is set to 1.
               1 = Fast Path is DISABLED for KZPDC ports.

The remaining bits are reserved for possible future
adapter types.
The default setting for FAST_PATH_PORTS is 0;
therefore, all supported ports are enabled.
Note that CI drivers are not controlled by FAST_PATH_PORTS.
Fast Path for CI is enabled and disabled exclusively by the FAST_PATH
system parameter.
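
As a worked example of how the FAST_PATH_PORTS value is formed, the following sketch combines the bit positions from Table 10-2. The dictionary and function names are hypothetical helpers for the arithmetic only; they do not set the system parameter.

```python
# Bit positions taken from Table 10-2.
FAST_PATH_PORTS_BITS = {"KZPBA": 0, "KGPSA": 1, "KZPEA": 2, "LAN": 3, "KZPDC": 4}


def fast_path_ports_value(disabled):
    """Return the parameter value that disables the named adapter types."""
    value = 0
    for adapter in disabled:
        value |= 1 << FAST_PATH_PORTS_BITS[adapter]
    return value


def disabled_adapters(value):
    """List the adapter types whose disable bit is set in the value."""
    return [name for name, bit in FAST_PATH_PORTS_BITS.items() if (value >> bit) & 1]


# Disabling Fast Path for KGPSA and LAN ports: 2 + 8 = 10 (hexadecimal A).
print(fast_path_ports_value(["KGPSA", "LAN"]))  # prints 10
print(disabled_adapters(10))                    # prints ['KGPSA', 'LAN']
```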
IO_PREFER_CPUS
IO_PREFER_CPUS is a dynamic
system parameter that controls the set of CPUs available for use as
Fast Path preferred CPUs.
IO_PREFER_CPUS
is a CPU bit mask specifying the CPUs that are allowed to serve as
preferred CPUs and thus can be assigned a Fast Path port. CPUs whose
bit is set in the IO_PREFER_CPUS bit mask are enabled for Fast Path
port assignment. IO_PREFER_CPUS defaults to -1, which specifies that
all CPUs are allowed to be assigned Fast Path ports.
You may want to disable the primary CPU from serving
as a preferred CPU by clearing its bit in IO_PREFER_CPUS. This reserves
the primary for use by non-Fast Path I/O operations.
Changing the value of IO_PREFER_CPUS causes the
FASTPATH_SERVER process to execute the automatic assignment algorithm
that spreads Fast Path ports evenly among the new set of usable CPUs.
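
Clearing the primary CPU's bit in IO_PREFER_CPUS is a simple mask operation, sketched below. The 32-bit width and the choice of CPU 0 as the primary are assumptions for illustration, and the helper name is hypothetical; the sketch computes a value only and does not modify the parameter.

```python
MASK_WIDTH = 32  # assumed width for illustration; not stated for this parameter


def exclude_cpu(mask, cpu_id, width=MASK_WIDTH):
    """Return the mask with one CPU's bit cleared, as an unsigned value."""
    all_ones = (1 << width) - 1
    # Normalize -1 (all CPUs) to an unsigned mask before clearing the bit.
    return (mask & all_ones) & ~(1 << cpu_id) & all_ones


# Start from the default of -1 (all CPUs) and exclude CPU 0,
# assumed here to be the primary.
print(hex(exclude_cpu(-1, 0)))  # prints 0xfffffffe
```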
10.2.2.2 Identifying and Setting a Port's Preferred CPU
Following are the commands used to identify and
set a preferred CPU for a port.
DCL SHOW DEVICE/FULL
or $GETDVI DVI$_PREFERRED_CPU
To identify the preferred
CPU for any Fast Path-capable device when Fast Path is enabled, use
the DCL command SHOW DEVICE/FULL. For a port or disk device, it displays
whether the device supports Fast Path, the current preferred CPU ID,
and, if set, the User Preferred CPU ID.
Alternatively, the $GETDVI system service or the
DCL F$GETDVI lexical function returns the preferred CPU for a given
device or file. The $GETDVI system service item code is DVI$_PREFERRED_CPU,
and the F$GETDVI item code string argument is PREFERRED_CPU. The return
argument is a 32-bit CPU bit mask with a bit set indicating the preferred
CPU. A return argument containing a bit mask of zero indicates that
no preferred CPU exists, either because Fast Path is disabled or the
device is not a Fast Path-capable device. The return argument serves
as a CPU bit mask input argument to the $PROCESS_AFFINITY system service.
The argument can be used to assign an application process to the optimal
preferred CPU.
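
A returned bit mask can be translated into a CPU ID as in the following sketch. It assumes the mask value has already been obtained (for example, from $GETDVI) and models only the decoding step; the function name is hypothetical.

```python
def preferred_cpu_id(mask):
    """Decode a preferred-CPU bit mask into a CPU ID.

    Returns None for a zero mask, which indicates that no preferred CPU exists.
    """
    if mask == 0:
        return None
    if mask & (mask - 1):
        # More than one bit set: not a valid single preferred CPU.
        raise ValueError("expected exactly one bit set")
    return mask.bit_length() - 1


print(preferred_cpu_id(0b1000))  # prints 3
print(preferred_cpu_id(0))       # prints None
```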
For an application seeking optimal Fast Path benefits,
you can code each application process to identify and run on the preferred
CPU where the majority of the process's I/O activity occurs.
A high-availability feature of OpenVMS Cluster
Systems is that dual-pathed devices automatically fail over to a secondary
path if the primary path becomes inoperable. Because a Fast Path
device could fail over to another path or port, and thereby to another
preferred CPU, an application should occasionally reissue the $GETDVI
in a timer thread to check that process assignment is optimal.
DCL SHOW CPU /FULL
You can use this DCL command
to identify whether a CPU is enabled for use as a preferred CPU, and
the current set of ports assigned to that CPU.
DCL SET DEVICE /PREFERRED_CPU
and /NOPREFERRED_CPU
These commands allow you
to specify a CPU or a set of candidate CPUs from which the operating
system chooses the CPU to assign to the Fast Path port. The chosen
CPU is called the preferred CPU for this Fast Path port. The Fast
Path port's interrupt I/O completion processing and I/O initiation
processing are performed on this preferred CPU.
In addition to selecting the preferred CPU, the User Preferred CPU
is set for this port. Setting the User Preferred CPU prevents the
port from being reassigned to another CPU unless the User Preferred
CPU is being stopped. The qualifier can be negated. When the /NOPREFERRED_CPUS
qualifier is specified, the User Preferred CPU is cleared for the
port, but it still remains a Fast Path port, and the current preferred
CPU is not changed.
If both /PREFERRED_CPUS and /NOPREFERRED_CPUS
are specified on the same command line, /NOPREFERRED_CPUS is ignored.
You can change the assignment
of a Fast Path port to a CPU by issuing a $QIO IO$_SETPRFPATH (Set
Preferred Path) to the port device, for example, PNA0. The IO$M_PREFERRED_CPU
modifier must be set, and the $QIO argument P1 must be set to either
0 or the address of a 32-bit CPU bit mask with a bit set indicating
the new preferred CPU. On return from the I/O, the port and its associated
devices are all assigned to a new preferred CPU. Note that explicitly
setting the preferred CPU overrides any default assignment of Fast
Path ports to CPUs. This interface allows you the flexibility to load
balance I/O activity over multiple CPUs in an SMP system. This is
important because I/O activity can change over the course of a day
or week.
The $QIO passes in either a set
containing one or more candidate CPUs, or 0 as a wildcard value indicating
the set of usable CPUs. If the candidate set contains only one CPU,
you are explicitly designating the new preferred CPU. If the candidate
set contains multiple CPUs, you are requesting use of the automatic
preferred CPU assignment algorithm to select a suitable CPU from the
candidate set.
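
The three cases for the candidate set can be summarized in a short sketch. It models only the interpretation of the mask value described above, not the $QIO call itself; the function name and the returned labels are hypothetical.

```python
def classify_candidate_set(mask):
    """Describe how a candidate CPU mask would be interpreted."""
    if mask == 0:
        return "wildcard"   # 0 stands for the whole set of usable CPUs
    if (mask & (mask - 1)) == 0:
        return "explicit"   # exactly one bit: caller names the preferred CPU
    return "automatic"      # several bits: the system picks from the set


print(classify_candidate_set(0))        # prints wildcard
print(classify_candidate_set(0b0100))   # prints explicit
print(classify_candidate_set(0b0110))   # prints automatic
```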
Including the IO$M_SYS_ASSIGNABLE modifier inhibits
setting the selected CPU as the device's User Preferred CPU.
The $QIO or the SET DEVICE/PREFERRED_CPU command
makes a best effort to assign the port to a CPU. However, it is possible
for this request to return failure for the following reasons:
There is no intersection
between the candidate set and the node's set of usable CPUs.
There is resource contention.
If, after a reasonable effort, the request is unable to acquire a key
system resource, the request fails. Key resources include the Fast
Path spinlock, the CPU mutex, and a CPU transition lock.
If the $QIO or SET DEVICE/PREFERRED_CPU returns
failure, you should consider retrying either immediately or after
a short delay. It is possible that a large number of ports were being
reassigned, and the request failed due to resource contention.
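
The retry advice can be sketched as a small loop. Here `request` is a hypothetical callable that wraps the $QIO or SET DEVICE/PREFERRED_CPU request and returns True on success; the attempt count and delay are arbitrary illustrative values, not recommended settings.

```python
import time


def assign_with_retry(request, attempts=3, delay_seconds=0.5, sleep=time.sleep):
    """Call request() until it succeeds; return the attempt number or None."""
    for attempt in range(1, attempts + 1):
        if request():
            return attempt
        if attempt < attempts:
            # Give competing port reassignments a chance to finish.
            sleep(delay_seconds)
    return None


# Simulated request that fails once because of contention, then succeeds.
outcomes = iter([False, True])
print(assign_with_retry(lambda: next(outcomes), delay_seconds=0))  # prints 2
```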
$IO_FASTPATH
The $IO_FASTPATH system
service performs operations on the set of Fast Path devices and CPUs
enabled for Fast Path use. The $IO_FASTPATHW system service completes
synchronously. That is, it returns after the operation is complete.
The FP$K_BALANCE_PORTS function code specifies
that the system service is to distribute the set of system assignable
Fast Path ports across the intersection of a caller-supplied set of
candidate CPUs and the set of usable CPUs.
10.2.3 Fast Path Restrictions
Fast Path restrictions include the following:
Only high-volume I/Os
are optimized.
Fast Path streamlines the
operation of high-volume I/O. I/O that does not meet the definition
of high-volume is not optimized.
A high-volume Fast Path I/O is a read or write
operation to a Fast Path device without special I/O modifiers issued
at a time when necessary resources have been pre-allocated and there
are no circumstances restricting I/O operations.
Send-credits resource must be managed for DSA controllers.
Applications seeking maximum performance must ensure the availability of sufficient
I/O resources.
The only I/O resource that a Fast Path user needs
to be concerned about is send credits. Send credits are extended by
DSA controllers to host systems and represent the maximum number of
I/Os that can be outstanding at any given point in time. If an application
sends an unlimited number of simultaneous I/Os to a controller, it
is likely that some I/O will back up waiting for send credits.
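
The effect of a fixed send-credit limit can be shown with simple arithmetic. This is a toy model and not a description of how credits are actually accounted for; the function name and the numbers are hypothetical.

```python
def ios_waiting_for_credits(simultaneous_ios, send_credits):
    """Number of I/Os that exceed the credit limit and must wait."""
    if simultaneous_ios < 0 or send_credits < 0:
        raise ValueError("counts must be non-negative")
    return max(0, simultaneous_ios - send_credits)


# Hypothetical numbers: 40 simultaneous I/Os against a controller
# that has extended 32 send credits to this host.
print(ios_waiting_for_credits(40, 32))  # prints 8
```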
You can tell whether the send-credit limit is
being exceeded by using the DCL command SHOW CLUSTER/CONTINUOUS, followed
by an ADD CONNECTIONS, CR_WAIT command. Rapidly increasing credit-wait
counts for the disk-class driver connections (a LOC_PROC_NAME name
of VMS$DISK_CL_DRVR) are a sign that an application may be incurring
send-credit waits.
Some controllers, like the HSC and HSJ, allow the number of send
credits to vary, which can help ensure sufficient send credits; however,
not all controllers have this flexibility, and different controllers
have different send-credit limits. The best workaround is to know
your application access patterns and look for send-credit waits.
If the number of send credits is being exhausted
on one node, then add another controller to spread the load over multiple
controllers. An alternative is to rework the application to load balance
controller activity throughout the cluster, spreading a given controller's
disk load over multiple nodes and allowing an application to exceed
the send credits allotted to one node.
10.2.4 Special Considerations for Fast Path on Multi-RAD Systems
On systems supporting multiple resource affinity
domains (RADs), the best performance for Fast Path ports is usually
obtained by setting the Fast Path preferred CPU assignment to a CPU
within the same RAD as the port.
The FASTPATH_SERVER restricts its distribution
of ports accordingly whenever possible. If a port is in
a RAD that has no available Fast Path CPUs, the system sets the preferred
CPU to the primary CPU.
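
The RAD-local preference and the fallback to the primary CPU can be sketched as follows. The text does not say which CPU is chosen within a RAD, so picking the lowest-numbered local CPU is an assumption, as are the function name and the example topology.

```python
def preferred_cpu_for_port(port_rad, cpu_rads, usable, primary=0):
    """Pick a usable CPU in the port's RAD, else fall back to the primary."""
    local = sorted(cpu for cpu in usable if cpu_rads.get(cpu) == port_rad)
    # Lowest-numbered local CPU is an illustrative choice only.
    return local[0] if local else primary


# Hypothetical topology: CPUs 0-1 in RAD 0, CPUs 2-3 in RAD 1.
cpu_rads = {0: 0, 1: 0, 2: 1, 3: 1}
print(preferred_cpu_for_port(1, cpu_rads, {0, 1, 2, 3}))  # prints 2
print(preferred_cpu_for_port(1, cpu_rads, {0, 1}))        # prints 0
```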
Because you can override this assignment by the
methods described in this chapter, care should be taken that reassignment
does not sacrifice the performance improvements provided by localizing
activity to a single RAD.