This article explains what alignment faults are, describes how they affect
application performance, presents ways to detect them on a running system,
and offers a few ideas for fixing them.
AlphaServer and Intel® Itanium® 2 processors provide fast access to naturally
aligned data. To be naturally aligned, a word datum must be on a word boundary,
a longword datum must be on a longword boundary, and a quadword datum must be
on a quadword boundary.
When an attempt is made to load or store a
quadword, longword, or word to or from a memory location that does not have a
naturally aligned address, the processor transfers control to a special routine
(PALcode on AlphaServer systems and an operating system routine on Intel® Itanium® 2
systems) that executes a series of instructions to perform the unaligned
access. This step of executing a special routine to access unaligned
data is referred to as an alignment fault.
The following diagram illustrates the difference between aligned and unaligned memory access:
In the first row, we access a longword
starting at address 0, which is naturally aligned, so all is well. In the second
row, we attempt to access a longword starting at address 10. This address is not
naturally aligned (10 divided by 4 does not yield a remainder of 0), so an alignment
fault will occur. In the third row, we attempt to read a quadword
starting at address 16, which is naturally aligned (16 divided by 8 yields a
remainder of 0), so all is well. In the fourth row, we attempt to access a
quadword starting at address 28. Address 28 is not quadword aligned (28 divided
by 8 yields a remainder of 4), so an alignment fault will occur.
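The rule applied to the four rows above can be written as a one-line check. The following is a minimal sketch in portable C, not OpenVMS-specific code; the function name and the size constants are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Datum sizes in bytes, using the OpenVMS names from the text. */
enum { WORD_SIZE = 2, LONGWORD_SIZE = 4, QUADWORD_SIZE = 8 };

/* Hypothetical helper: an address is naturally aligned for a datum when
   it is an exact multiple of the datum's size. The mask test assumes
   the size is a power of two, which holds for 2, 4, and 8. */
static bool is_naturally_aligned(uintptr_t address, size_t size)
{
    return (address & (size - 1)) == 0;
}
```

With this helper, addresses 0 (longword) and 16 (quadword) pass the check, while addresses 10 (longword) and 28 (quadword) fail it, matching the four rows described above.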
Okay... I understand alignment faults, but why should I care?
When the compiler can detect misaligned data,
what would normally take three instructions on an AlphaServer system will take
fifteen. As not all of these instructions access memory, the aggregate
degradation in performance is an instruction stream that is three times slower.
When the compiler cannot correct the problem, a run time alignment fault is
incurred. The alignment handler is about ten to twenty times slower than
accessing naturally aligned data.
The behavior of an Intel® Itanium® 2 system is similar to the AlphaServer, except
that alignment faults are hundreds to thousands of times slower than accessing
naturally aligned data, because alignment faults are handled by the operating system
itself instead of PALcode (firmware). Resolving alignment faults also has a
system-wide impact, because the handler must acquire a spinlock
(MMG) and incurs the associated MP synchronization time.
Let's take a look at a small example. The
following program allocates 1 GB of virtual memory in P2 space and randomly
increments 50,000,000 quadwords.
$ ty aligned.c
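// Header names inferred from the identifiers used below
#include <stdlib.h>        // abs, random
#include <far_pointers.h>  // VOID_PQ, INT64_PQ
#include <lib$routines.h>  // lib$init_timer, lib$get_vm_64, lib$free_vm_64, lib$signal, lib$show_timer
#include <stsdef.h>        // $VMS_STATUS_SUCCESS
#define random_key(upper_bound) (abs (random () % upper_bound))
void main()
{
__int64 NumberOfBytes = 1000000000; // 1GB using marketing bytes (lib$get_vm_64 takes a quadword count)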
int status;
VOID_PQ MappedVA;
INT64_PQ RandomVA;
lib$init_timer(); // initialize timer
//
// Allocate 1GB from P2 space
//
status = lib$get_vm_64 (&NumberOfBytes, &MappedVA);
if (!$VMS_STATUS_SUCCESS(status))
{
lib$signal (status);
return;
}
RandomVA = MappedVA;
for (int i=0; i<50000000; i++)
{
// Increment a random Quadword
RandomVA [random_key((100000000/8) -1)] ++ ;
}
//
// Free VM
//
status = lib$free_vm_64 (&NumberOfBytes, &MappedVA);
if (!$VMS_STATUS_SUCCESS(status))
{
lib$signal (status);
return;
}
lib$show_timer();
}
$! Run the program – rx2600 1.3 GHZ
$ cc/pointer=long aligned
$ link aligned
$ r aligned
ELAPSED: 0 00:00:18.97 CPU: 0:00:18.97 BUFIO: 0 DIRIO: 0 FAULTS: 713808
$
Incrementing 50,000,000 random quadwords on a 1.3 GHz Integrity rx2600 Server took 18.97 seconds.
Now, let's force the above program to
increment 50,000,000 quadwords using unaligned pointers:
$ ty not_aligned.c
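// Header names inferred from the identifiers used below
#include <stdlib.h>        // abs, random
#include <far_pointers.h>  // VOID_PQ, INT64_PQ
#include <lib$routines.h>  // lib$init_timer, lib$get_vm_64, lib$free_vm_64, lib$signal, lib$show_timer
#include <stsdef.h>        // $VMS_STATUS_SUCCESS
#define random_key(upper_bound) (abs (random () % upper_bound))
void main()
{
__int64 NumberOfBytes = 1000000000; // 1GB using marketing bytes (lib$get_vm_64 takes a quadword count)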
int status;
VOID_PQ MappedVA;
INT64_PQ RandomVA;
lib$init_timer(); // initialize timer
//
// Allocate 1GB from P2 space
//
status = lib$get_vm_64 (&NumberOfBytes, &MappedVA);
if (!$VMS_STATUS_SUCCESS(status))
{
lib$signal (status);
return;
}
//
// Force the pointer to become unaligned
//
RandomVA = (INT64_PQ)((char *) MappedVA + 1);
for (int i=0; i<50000000; i++)
{
// Increment a random Quadword
RandomVA [random_key((100000000/8) -1)] ++ ;
}
//
// Free VM
//
status = lib$free_vm_64 (&NumberOfBytes, &MappedVA);
if (!$VMS_STATUS_SUCCESS(status))
{
lib$signal (status);
return;
}
lib$show_timer();
}
$ cc/pointer=long not_aligned.c
$ link not_aligned
$ r not_aligned
ELAPSED: 0 00:03:45.62 CPU: 0:03:45.53 BUFIO: 0 DIRIO: 0 FAULTS: 200027
$
The same 1.3 GHz Integrity rx2600 Server increments 50,000,000 unaligned quadwords
in 3 minutes and 45 seconds.
For our small test program, performance
degrades by a factor of nearly 12 (225.62 seconds versus 18.97 seconds)
when accessing unaligned data.
OpenVMS V8.3 introduced a new class for the Monitor utility. The
ALIGN class monitors alignment faults occurring throughout the system
and breaks the output down by access mode.
The following display was generated while
running the NOT_ALIGNED program:
$ monitor align/int=1
OpenVMS Monitor Utility
ALIGNMENT FAULT STATISTICS
on node IT13
21-NOV-2006 01:50:13.26
CUR AVE MIN MAX
Kernel Fault Rate 0.00 0.44 0.00 4.00
Exec Fault Rate 0.00 0.00 0.00 0.00
Super Fault Rate 0.00 0.00 0.00 0.00
User Fault Rate 445492.00 220809.67 0.00 445492.00
Total Fault Rate 445492.00 220810.12 0.00 445492.00
Our test program generates more than 445,000
alignment faults per second, all in user mode.
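As a rough cross-check of this rate, the sketch below assumes that every increment faults twice, once on the quadword load and once on the store. That is an assumption, not something MONITOR reports. It then divides by the elapsed time of the NOT_ALIGNED run. The function name is made up for illustration.

```c
/* Hypothetical helper: estimate alignment faults per second from the
   number of increments, the assumed number of faults per increment,
   and the elapsed run time in seconds. */
static double estimated_fault_rate(double increments,
                                   double faults_per_increment,
                                   double elapsed_seconds)
{
    return increments * faults_per_increment / elapsed_seconds;
}
```

Calling estimated_fault_rate(50000000, 2, 225.62) gives roughly 443,000 faults per second, which is in line with the current rate shown in the MONITOR display.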
MONITOR ALIGN provides a high-level overview
of the alignment faults currently occurring on the system. It helps you
determine whether the system is suffering from alignment faults, but it does
not provide any information about which process or program generated them.
Different tools should be used to determine what is generating the faults.
Note that MONITOR ALIGN is currently available on Intel® Itanium® 2 systems only.
Once you determine that your system is prone
to alignment fault issues, the next step is to determine where the faults are
coming from. The FLT extension in SDA is a very powerful tool for detecting and
logging alignment faults. For each alignment fault that occurs while logging is
enabled, it logs the time the fault occurred, the CPU that encountered the fault,
the unaligned virtual address, the access mode, and the process ID. This information allows the developer to
determine the exact location in the application that generated the alignment fault.
The FLT extension is available on both AlphaServer and Intel® Itanium® 2 systems.
Here are a few examples demonstrating the use of FLT:
$ ana/sys
OpenVMS system analyzer
Load the SDA extension
SDA> flt load
FLT$DEBUG load status = 00000001
Start tracing ...
SDA> flt start trace
Tracing started...
Look at the summary display
SDA> flt show trace/sum
Fault Trace Information: (at 21-NOV-2006 02:07:21.87, trace time 00:00:00.190015)
---------------------------------------------------------------------------------
Exception PC Count Exception PC Module Offset
----------------- ------------ -------------------------------------- ----------------------------------
00000000.000103D1 39384 SYS$K_VERSION_16+00391
00000000.000103E1 39383 SYS$K_VERSION_16+003A1
Two program counters are displayed, 103D1 and 103E1, and each generated
more than 39,000 faults. Let's find our culprit. Instead of looking at the summary output,
we can look at individual entries in the trace buffer for more information:
SDA> flt show trace
Unaligned Data Fault Trace Information:
---------------------------------------
Timestamp CPU Exception PC Unaligned VA Access EPID Trace Buffer
---------------------- --- ----------------------------------------------- ----------------- ------ -------- -----------------
21-NOV 02:08:22.002794 00 00000000.000103E1 SYS$K_VERSION_16+003A1 00000000.840BECF9 User 2160057F FFFFFFFF.7E4E86C0
21-NOV 02:08:22.002791 00 00000000.000103D1 SYS$K_VERSION_16+00391 00000000.840BECF9 User 2160057F FFFFFFFF.7E4E8658
21-NOV 02:08:22.002789 00 00000000.000103E1 SYS$K_VERSION_16+003A1 00000000.84617049 User 2160057F FFFFFFFF.7E4E85F0
21-NOV 02:08:22.002786 00 00000000.000103D1 SYS$K_VERSION_16+00391 00000000.84617049 User 2160057F FFFFFFFF.7E4E8588
21-NOV 02:08:22.002784 00 00000000.000103E1 SYS$K_VERSION_16+003A1 00000000.8252A0E1 User 2160057F FFFFFFFF.7E4E8520
21-NOV 02:08:22.002781 00 00000000.000103D1 SYS$K_VERSION_16+00391 00000000.8252A0E1 User 2160057F FFFFFFFF.7E4E84B8
21-NOV 02:08:22.002779 00 00000000.000103E1 SYS$K_VERSION_16+003A1 00000000.850E3241 User 2160057F FFFFFFFF.7E4E8450
21-NOV 02:08:22.002776 00 00000000.000103D1 SYS$K_VERSION_16+00391 00000000.850E3241 User 2160057F FFFFFFFF.7E4E83E8
21-NOV 02:08:22.002774 00 00000000.000103E1 SYS$K_VERSION_16+003A1 00000000.84CD53D1 User 2160057F FFFFFFFF.7E4E8380
21-NOV 02:08:22.002771 00 00000000.000103D1 SYS$K_VERSION_16+00391 00000000.84CD53D1 User 2160057F FFFFFFFF.7E4E8318
.......
All the entries point to the process with ID 2160057F. Let's look at the
process to find out what image it is executing:
SDA> set proc/id=2160057F
SDA> show proc/image
Process index: 017F Name: Faulty Extended PID: 2160057F
--------------------------------------------------------------------
Process activated images
------------------------
Image Name Type IMCB GP
--------------------------------------- ------------ -------- -----------------
NOT_ALIGNED MAIN 7FE89290 00000000.00240000
DCL MRGD SHR 7FE88BD0 00000000.7B0D8000
LIBRTL GLBL SHR 7FE8BC10 00000000.7B546000
LIBOTS GLBL SHR 7FE8A690 00000000.7B560000
CMA$TIS_SHR GLBL SHR 7FE88010 00000000.7B73C000
DPML$SHR GLBL SHR 7FE88270 00000000.7B904000
DECC$SHR GLBL SHR 7FE883A0 00000000.7BB10000
SYS$PUBLIC_VECTORS GLBL 7FE886C0 FFFFFFFF.8CA00400
SYS$BASE_IMAGE GLBL 7FE88920 FFFFFFFF.8CA24E00
Total images = 9 Pages allocated = 322
SDA> map 0103E1
Image Base End Image Offset
NOT_ALIGNED
Code 00000000.00010000 00000000.0001059F 00000000.000103E1
SDA>
We found that all the alignment faults are generated by process "Faulty" executing
the NOT_ALIGNED image. The next step would be to look at the listing and determine
the offending code at offset 103E1.
Before we look at the listing, note that the FLT extension can interpret the location of the faulting PC if
the image contains traceback information and resides in system space. Let's install
NOT_ALIGNED.EXE as a resident image, which forces the image to be copied into system space:
SDA> flt stop trace
SDA> spawn instal add/resi SYS$SYSDEVICE:[PELEG]NOT_ALIGNED
SDA> flt start trace
Tracing started...
SDA> flt show trace/summ
Fault Trace Information: (at 21-NOV-2006 02:13:23.77, trace time 00:00:00.190637)
---------------------------------------------------------------------------------
Exception PC Count Exception PC Module Offset
----------------- ------------ -------------------------------------- ----------------------------------
FFFFF802.11EFE3D1 39384 NOT_ALIGNED+103D1 NOT_ALIGNED 000103D1
NOT_ALIGNED + 000003D1 / main + 000002D1
FFFFF802.11EFE3E1 39383 NOT_ALIGNED+103E1 NOT_ALIGNED 000103E1
NOT_ALIGNED + 000003E1 / main + 000002E1
SDA>
We start tracing again, and now the summary display shows the exact location in the image that
generated the fault. In our example, this is main+2D1 and main+2E1 in NOT_ALIGNED.EXE.
Let's look at the relevant portion of the listing in NOT_ALIGNED.LIS:
001000000046 0240 (pr6) break.m 1048577
00C7080121C0 0241 setf.sig f7 = r9
018402242200 0242 cmp4.lt pr8, pr0 = i, r34 ;; // pr8, pr0 = r33, r34 // 023707
}
{ .mfi
00C708006180 0250 setf.sig f6 = r3 // 023711
000008000000 0251 nop.f 0
000008000000 0252 nop.i 0 ;;
}
{ .mfi
000008000000 0260 nop.m 0
0000E000E240 0261 fcvt.xf f9 = f7
000008000000 0262 nop.i 0
}
{ .mfi
000008000000 0270 nop.m 0
0000E000C200 0271 fcvt.xf f8 = f6
000008000000 0272 nop.i 0 ;;
}
{ .mfi
000008000000 0280 nop.m 0
000630910280 0281 frcpa.s1 f10, pr6 = f8, f9
000008000000 0282 nop.i 0 ;;
}
{ .mfi
000008000000 0290 nop.m 0
018448A021C6 0291 (pr6) fnma.s1 f7 = f10, f9, f1
000008000000 0292 nop.i 0 ;;
}
{ .mfi
000008000000 02A0 nop.m 0
010438A142C6 02A1 (pr6) fma.s1 f11 = f10, f7, f10
000008000000 02A2 nop.i 0
}
{ .mfi
000008000000 02B0 nop.m 0
010438700186 02B1 (pr6) fma.s1 f6 = f7, f7, f0
000008000000 02B2 nop.i 0 ;;
}
{ .mfi
000008000000 02C0 nop.m 0
0104508001C6 02C1 (pr6) fma.s1 f7 = f8, f10, f0
000008000000 02C2 nop.i 0 ;;
}
{ .mfi
000008000000 02D0 nop.m 0
010430B16286 02D1 (pr6) fma.s1 f10 = f11, f6, f11
000008000000 02D2 nop.i 0 ;;
}
{ .mfi
000008000000 02E0 nop.m 0
0184389102C6 02E1 (pr6) fnma.s1 f11 = f9, f7, f8
000008000000 02E2 nop.i 0
}
{ .mfi
000008000000 02F0 nop.m 0
018448A02186 02F1 (pr6) fnma.s1 f6 = f10, f9, f1
000008000000 02F2 nop.i 0 ;;
}
{ .mfi
main + 2D1 and main + 2E1 point to line number 23711 in the source:
1 23707 for (int i=0; i<50000000; i++)
2 23708 {
2 23709
2 23710 // Increment a random Quadword
2 23711 RandomVA [random_key((100000000/8) -1)] ++ ;
2 23712
1 23713 }
The next logical step would be to fix the program to avoid unaligned memory access.
The symbolic debugger can also be used to
detect alignment faults. The
SET BREAK/UNALIGN command causes the
debugger to break each time an alignment fault occurs. The faulting virtual address, the current PC,
and the source line that generated the fault are displayed:
$ run/debug not_aligned
OpenVMS I64 Debug64 Version V8.3-009
%DEBUG-I-INITIAL, Language: C, Module: NOT_ALIGNED
%DEBUG-I-NOTATMAIN, Type GO to reach MAIN program
DBG> set break/unaligned
DBG>
* SRC: module NOT_ALIGNED -scroll-source********************************************************************************************
23703: // Force the pointer to become unaligned
23704: //
23705: RandomVA = MappedVA + 1;
23706:
23707: for (int i=0; i<50000000; i++)
23708: {
23709:
23710: // Increment a random Quadword
->3711: RandomVA [random_key((100000000/8) -1)] ++ ;
23712:
23713: }
23714:
23715: //
23716: // Free VM
23717: //
23718: status = lib$free_vm_64 (&NumberOfBytes, &MappedVA);
23719:
* OUT -output******************************************************************************************
Unaligned data access: virtual address = 0000000081E0E7E1, PC = 00000000000103E2
break on unaligned data trap preceding NOT_ALIGNED\main\%LINE 23711+402
23711: RandomVA [random_key((100000000/8) -1)] ++ ;
DBG>
NOTE: SET BREAK/UNALIGNED cannot be used while the FLT utility is in use. When FLT is
running, attempting to use the debugger to report alignment faults fails with the
following error:
DBG> set break/unalign
%SYSTEM-E-AFR_ENABLED, alignment fault reporting already enabled
-FOR-W-NOMSG, Message number 00189E80
DBG>
Aligning the data is the best solution for
avoiding alignment faults.
Today's compilers are smart enough to detect
alignment problems most of the time and add code to access the data
through multiple loads, shifts, and masks.
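To give a feel for what "multiple loads, shifts, and masks" means, here is a conceptual sketch in portable C. It is not the actual instruction sequence an Alpha or Itanium compiler emits. It assumes a little-endian byte order, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: read a 64-bit little-endian value from any
   address using only single-byte loads, which never fault on alignment. */
static uint64_t load_u64_bytes(const unsigned char *p)
{
    uint64_t value = 0;
    /* Start with the most significant byte and shift it into place. */
    for (int i = 7; i >= 0; i--) {
        value = (value << 8) | p[i];
    }
    return value;
}

/* Hypothetical helper: write a 64-bit value to any address, least
   significant byte first, using only single-byte stores. */
static void store_u64_bytes(unsigned char *p, uint64_t value)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (unsigned char)(value & 0xFF);  /* mask off the low byte */
        value >>= 8;                           /* move on to the next byte */
    }
}
```

Eight byte accesses plus shifts cost more than one aligned quadword access, which is why code generated for possibly unaligned data is slower than code for data known to be aligned.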
Sometimes it is not possible or not practical
to align the data, for example when transferring data between
systems or when reading from or writing to a fixed record layout in a file.
Make sure fields within data structures are
naturally aligned. Some compilers, such as C and C++, do this by default. In MACRO,
use .align [quad|long]. In SDL, use basealign [quad|long].
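As a minimal sketch of how a C compiler keeps fields aligned by default, consider the two layouts below. The struct and field names are invented for illustration, and the exact sizes depend on the compiler and platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields declared in arbitrary order: the compiler inserts padding
   before 'counter' and 'length' so each field stays aligned. */
struct record_padded {
    char    flag;
    int64_t counter;
    char    kind;
    int32_t length;
};

/* Same fields ordered from largest to smallest: each field already
   falls on a suitable boundary, so less padding is typically needed. */
struct record_ordered {
    int64_t counter;
    int32_t length;
    char    flag;
    char    kind;
};
```

In both layouts, offsetof reports an offset for each field that is a multiple of the field's alignment. Ordering fields from largest to smallest usually just makes the structure smaller.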
Programming languages may support declaration
modifiers that cause the compiler to generate predicated code that tests for
unaligned data and operates on it in a way that avoids alignment faults.
Language support includes:
- __unaligned (C)
- .set_registers unaligned=<Rx> (Macro)
- align(x) (Bliss32/Bliss64)
- aligned(x) (Pascal)
Using these options will eliminate the alignment
faults. However, code accessing aligned data will be slower than normal.
Remember: the extra code generated when
you hint to the compiler that data may be unaligned will perform
much better than hitting an alignment fault.
Let's modify the NOT_ALIGNED program to
declare that the pointer for the random data is unaligned:
$ ty not_aligned.c
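// Header names inferred from the identifiers used below
#include <stdlib.h>        // abs, random
#include <far_pointers.h>  // VOID_PQ, INT64_PQ
#include <lib$routines.h>  // lib$init_timer, lib$get_vm_64, lib$free_vm_64, lib$signal, lib$show_timer
#include <stsdef.h>        // $VMS_STATUS_SUCCESS
#define random_key(upper_bound) (abs (random () % upper_bound))
void main()
{
__int64 NumberOfBytes = 1000000000; // 1GB using marketing bytes (lib$get_vm_64 takes a quadword count)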
int status;
VOID_PQ MappedVA;
INT64_PQ RandomVA;
lib$init_timer(); // initialize timer
//
// Allocate 1GB from P2 space
//
status = lib$get_vm_64 (&NumberOfBytes, &MappedVA);
if (!$VMS_STATUS_SUCCESS(status))
{
lib$signal (status);
return;
}
//
// Force the pointer to become unaligned
//
RandomVA = (INT64_PQ)((char *) MappedVA + 1);
for (int i=0; i<50000000; i++)
{
// Increment a random Quadword – pointer now declared unaligned
__int64 __unaligned *MyData = &RandomVA [random_key((100000000/8) -1)];
*MyData = *MyData + 1;
}
//
// Free VM
//
status = lib$free_vm_64 (&NumberOfBytes, &MappedVA);
if (!$VMS_STATUS_SUCCESS(status))
{
lib$signal (status);
return;
}
lib$show_timer();
}
$ cc/pointer=long not_aligned.c
$ link not_aligned
$ r not_aligned
ELAPSED: 0 00:00:20.74 CPU: 0:00:20.67 BUFIO: 0 DIRIO: 0 FAULTS: 703741
$
Now our program completes in 20.74 seconds. This is a big
improvement over the 3 minutes and 45 seconds it took when the
compiler was not expecting unaligned data.
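The __unaligned qualifier is specific to the compiler used here. Where it is not available, a common portable substitute is to copy the bytes with memcpy and let the compiler choose a safe access sequence. This is a swapped-in technique, not what the program above does. The sketch below is plain standard C, and the function name is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: increment a 64-bit counter stored at an address
   that may not be naturally aligned. memcpy makes no alignment
   assumptions, so the value is moved into and out of an aligned local. */
static void increment_unaligned_i64(void *address)
{
    int64_t value;
    memcpy(&value, address, sizeof value);   /* unaligned-safe load  */
    value += 1;
    memcpy(address, &value, sizeof value);   /* unaligned-safe store */
}
```

A call such as increment_unaligned_i64((char *) buffer + 1) updates the eight bytes starting at that odd address without relying on the pointer being quadword aligned.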