
dcpiprofileme(1)

NAME

dcpiprofileme - Uses HP DCPI to collect and view ProfileMe data

COLLECTING PROFILEME SAMPLES

On an EV67 (or later) system, tell DCPI to gather ProfileMe data via the command:

  dcpid -event pm <profile db dir>

This causes DCPI to collect ProfileMe samples. The data for each sample is decomposed into named bit and counter values.

BIT NAMES AND THEIR MEANINGS

retired
The instruction retired, that is, it was not in the shadow of any trap. However, it may have caused a mispredict trap.

taken
The conditional branch was taken. This bit is undefined for samples of instructions other than conditional branches, and for conditional branches that mispredict.

cbrmispredict
The conditional branch was mispredicted. This bit is clear for instructions other than conditional branches.

valid
The instruction retired and did not cause a trap.

nyp
Stands for Not Yet Prefetched. Indicates that when the fetcher asked for the fetch block containing the instruction, the instruction was not in the icache and the prefetcher had not yet initiated an off-chip request for the instruction.

If nyp is set, the instruction's fetch block definitely caused an icache miss stall.

If nyp is clear, the instruction's fetch block may still have caused an icache miss stall: the prefetcher may have made an off-chip request for the instruction, but the instruction may not have arrived by the time the fetcher needed it.

ldstorder
Indicates that a replay trap was caused by one of the following:
  • load store order

    a younger load issuing before an older store to the same physical address

  • troll order

    a younger load issuing before an older store where the dcache indexes for the physical addresses match but the higher order address bits are different

  • simultaneous load and store

    a load and a store to the same physical address issuing simultaneously

In all three cases, the younger instruction causes a replay trap.

Untested.

map_stall
The instruction stalled after it was fetched and before it was mapped. Such stalls are caused by a shortage of physical registers, integer issue queue space, floating-point issue queue space, or inums. There are 80 inums used to track instructions that are in flight.

early_kill
The instruction was killed early in the pipeline (before it entered an issue queue).

late_kill
The instruction was killed late in the pipeline.

COUNTER NAMES AND THEIR MEANINGS

retdelay
A lower bound on the number of cycles that the instruction's inum delayed the advance of the retire pointer. Large values indicate a probable performance problem. For example, the retdelay of the first instruction that uses the result of a load that misses out to memory might have a value of 100.

inflight
For instructions that retired without trapping (retired^notrap), this is -3 plus the number of cycles elapsed from when the instruction exited the fetch stage until the instruction retired (that is, approximately the number of cycles that the instruction was inflight).

TRAP BIT NAMES AND THEIR MEANINGS

Exactly one trap bit is set in any given ProfileMe sample.

notrap
None of the traps listed below occurred.

mispredict
The instruction caused a JSR/RET/JMP/JMP_COROUTINE or conditional branch mispredict.

replays
The instruction caused a replay trap.

unaligntrap
The instruction caused an unaligned load or store.

dtbmiss
The instruction caused a DTB single miss.

dtb2miss3
The instruction caused a DTB double miss. (3-level page tables)

dtb2miss4
The instruction caused a DTB double miss. (4-level page tables)

itbmiss
The instruction caused an Instruction TLB miss. Most other bit and counter values will be those for the first instruction in the ITB miss handler.

arithtrap
The instruction caused an arithmetic trap.

fpdisabledtrap
The instruction caused a floating point disabled trap.

MT_FPCRtrap

dfaulttrap
The instruction caused a Dstream fault because the virtual page is inaccessible or because the virtual address is malformed, that is, the virtual address is not properly sign-extended.

iacvtrap
The instruction caused an istream access violation. Most other bit and counter values will be those for the first instruction in the IACV fault handler.

OPCDECtrap
The instruction caused an opcdec trap.

interrupt
The instruction was pre-empted by an interrupt. Most other bit and counter values will be those for the first instruction in the PAL code that handles interrupts.

mchktrap

Note: trap can be used as a synonym for \!notrap.
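
The rule that exactly one trap bit is set per sample can be checked mechanically. The following is a minimal Python sketch, not part of DCPI: it assumes each sample is available as a dict mapping the bit names on this page to 0 or 1, which is a hypothetical representation chosen for illustration. The function names are likewise made up.

```python
# Trap bit names exactly as listed on this page.
TRAP_BITS = (
    "notrap", "mispredict", "replays", "unaligntrap", "dtbmiss",
    "dtb2miss3", "dtb2miss4", "itbmiss", "arithtrap", "fpdisabledtrap",
    "MT_FPCRtrap", "dfaulttrap", "iacvtrap", "OPCDECtrap", "interrupt",
    "mchktrap",
)


def trap_bit(sample):
    """Return the name of the single trap bit set in a sample.

    `sample` is a hypothetical dict of bit name -> 0 or 1; bits that are
    absent are treated as clear.
    """
    set_bits = [name for name in TRAP_BITS if sample.get(name, 0) == 1]
    if len(set_bits) != 1:
        raise ValueError(f"expected exactly one trap bit, got {set_bits}")
    return set_bits[0]


def trapped(sample):
    """True when the sample belongs to `trap`, the synonym for !notrap."""
    return trap_bit(sample) != "notrap"
```

With this representation, trapped({"retired": 1, "notrap": 1}) is False and trapped({"dtbmiss": 1}) is True.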

VIEWING PROFILEME DATA

Use dcpiprof(1) to find out how many samples with particular bit values landed in each image or procedure of a program. Use dcpilist(1) to find out how many landed on a particular instruction.

The dcpi tools use the following syntax to name sets of samples:

 sample_set ::= bit_value
             | sample_set ^ bit_value
             | any

 bit_value ::= <Bit Name>
             | ! <Bit Name>
             | <Trap Bit Name>
             | ! <Trap Bit Name>

/ may be used instead of ! to indicate negation (because ! must usually be escaped on the command line).

Example sample sets:

retired^notrap
names all samples where the retired bit and the notrap bit are both set, that is, samples where the instruction retired and didn't cause a trap.

taken^!mispredict
names all samples where the taken bit is set and the mispredict bit is clear.

Each bit_value is a constraint on the set of samples included in the set: if the bit_value contains `!', the set includes only samples whose value for the bit is 0. If the bit_value has no `!', the set includes only samples whose value for the bit is 1. The sample set contains all samples that satisfy the constraints. The special sample set any includes all samples.
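
The constraint semantics can be illustrated with a small Python sketch. It is not how the dcpi tools are implemented; it assumes samples are dicts of bit name to 0 or 1 (a hypothetical representation), and the helper names parse_sample_set, in_sample_set, and count_samples are invented for this example. It covers only the grammar given above (bit values joined by ^, plus any), and accepts / as well as ! for negation.

```python
def parse_sample_set(expr):
    """Turn 'a^!b' into [('a', 1), ('b', 0)]; 'any' yields no constraints."""
    expr = expr.strip()
    if expr == "any":
        return []
    constraints = []
    for term in expr.split("^"):
        term = term.strip()
        if term.startswith(("!", "/")):
            # Negated bit_value: the bit must be 0.
            constraints.append((term[1:].strip(), 0))
        else:
            # Plain bit_value: the bit must be 1.
            constraints.append((term, 1))
    return constraints


def in_sample_set(sample, constraints):
    """A sample is in the set only if it satisfies every constraint."""
    return all(sample.get(name, 0) == want for name, want in constraints)


def count_samples(samples, expr):
    """Number of samples that belong to the named sample set."""
    constraints = parse_sample_set(expr)
    return sum(1 for s in samples if in_sample_set(s, constraints))


# Made-up samples, for illustration only.
samples = [
    {"retired": 1, "notrap": 1, "taken": 1, "mispredict": 0},
    {"retired": 1, "notrap": 0, "taken": 0, "mispredict": 1},
    {"retired": 0, "notrap": 0, "taken": 0, "dtbmiss": 1},
]
```

Here count_samples(samples, "retired^notrap") and count_samples(samples, "taken^!mispredict") each select only the first sample, while "any" selects all three.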

A sample set may be used as an event-type to determine how many samples in the set come from a particular image, procedure, or instruction.

To view the counter data, one appends ":CounterName" to the end of a sample_set. This denotes the total of the counter's values over each sample in the set.
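
As an illustration of what a sample_set:CounterName total denotes, here is a minimal Python sketch under stated assumptions: samples are dicts holding bit values (0 or 1) and counter values under the names used on this page, and the helper name counter_total is hypothetical. It handles a single ":" suffix only and is not the tools' implementation.

```python
def counter_total(samples, spec):
    """Sum a counter over the samples in a sample set.

    `spec` has the form 'sample_set:CounterName', for example
    'retired:retdelay'. Negation may be written with ! or /.
    """
    sample_set, _, counter = spec.partition(":")
    constraints = []
    if sample_set.strip() != "any":
        for term in sample_set.split("^"):
            term = term.strip()
            if term.startswith(("!", "/")):
                constraints.append((term[1:].strip(), 0))
            else:
                constraints.append((term, 1))
    total = 0
    for sample in samples:
        # Only samples that satisfy every constraint contribute.
        if all(sample.get(name, 0) == want for name, want in constraints):
            total += sample.get(counter, 0)
    return total


# Made-up samples, for illustration only.
samples = [
    {"retired": 1, "notrap": 1, "retdelay": 4},
    {"retired": 1, "notrap": 1, "retdelay": 100},
    {"retired": 0, "notrap": 0, "retdelay": 7},
]
```

For these samples, counter_total(samples, "retired:retdelay") is 104 and counter_total(samples, "any:retdelay") is 111.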

EXAMPLE USAGE

dcpiprof -sp \!notrap -pm \!notrap a.out
Lists, for each procedure in a.out and in descending order, the number of samples in which an instruction in the procedure caused some kind of trap. (Note the use of `\' to keep the shell from interpreting `!'. Note also that `/' can be used on the command line instead of `\!' to simplify typing.)

dcpiprof -sp retired:retdelay -pm retired+trap^\!dtbmiss
Lists, in descending order for all images, the total of the retire delay count for samples of instructions that retired, along with the number of samples for retired instructions and the number of samples in which the instruction trapped and the trap was not a dtbmiss.

dcpilist -pm retired main a.out
Lists, for each instruction in procedure main of a.out, the number of samples where the instruction retired.

dcpilist -pm \!notrap main a.out
Lists, for each instruction in procedure main of a.out, the number of samples where the instruction caused some kind of trap.

dcpilist -pm \!notrap+retired main a.out
Gets the data for the previous two examples with a single command. dcpiprof also supports the use of + to display two or more sample sets with one command.

dcpiprof -pm retired:retdelay a.out
Lists, by procedure, the total of the retire delay count for each sample of an instruction that retired.

dcpiprof -pm default+retired:retdelay::retired a.out
Lists, by procedure, the default information plus a column showing the average retire-delay per retired instruction in the procedure.
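
The average described in the last example is, arithmetically, the total retdelay over retired samples divided by the number of retired samples. The Python sketch below works that out on made-up data. Reading :: as "divide the left-hand total by the right-hand count" is an interpretation of the description above, and the sample representation (dicts with a "procedure" key) and the function name are hypothetical.

```python
from collections import defaultdict


def average_retdelay_by_procedure(samples):
    """Average retire delay per retired sample, grouped by procedure."""
    totals = defaultdict(int)   # sum of retdelay over retired samples
    counts = defaultdict(int)   # number of retired samples
    for sample in samples:
        if sample.get("retired", 0) == 1:
            totals[sample["procedure"]] += sample.get("retdelay", 0)
            counts[sample["procedure"]] += 1
    # Procedures with no retired samples have no defined average.
    return {proc: totals[proc] / counts[proc] for proc in counts}


# Made-up samples, for illustration only.
samples = [
    {"procedure": "main", "retired": 1, "retdelay": 4},
    {"procedure": "main", "retired": 1, "retdelay": 100},
    {"procedure": "main", "retired": 0, "retdelay": 7},
    {"procedure": "helper", "retired": 1, "retdelay": 2},
]
```

For these samples the result is {"main": 52.0, "helper": 2.0}: the non-retired sample in main contributes to neither the total nor the count.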

PROFILEME LIMITATIONS

Because retdelay is merely a lower bound, there is no way to account for all cycles using only ProfileMe data. The retire delay always excludes stall cycles prior to when the profiled instruction was fetched. This makes it impossible to measure the length of icache miss stalls.

When a profiled instruction is killed early in the pipeline (early_kill is set), the PC reported by the hardware may be wrong, and all counter values and bits other than valid, early_kill, notrap, and map_stall may be wrong.

Note that the unreliable data is restricted to instructions that were killed, and this data can be excluded by requiring \!early_kill.
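
When post-processing samples outside the dcpi tools, the same restriction can be applied by hand. The following Python sketch assumes samples are dicts keyed by the bit and counter names on this page, plus a hypothetical "pc" key; both helper names are invented. It is an illustration of the rule above, not DCPI code.

```python
# Fields this page describes as still reliable when early_kill is set
# (using the notrap trap bit name as defined earlier on this page).
TRUSTED_WHEN_EARLY_KILLED = ("valid", "early_kill", "notrap", "map_stall")


def reliable_fields(sample):
    """Return only the fields of a sample that can be trusted."""
    if sample.get("early_kill", 0) == 1:
        # The PC, the counters, and all other bits may be wrong.
        return {k: v for k, v in sample.items()
                if k in TRUSTED_WHEN_EARLY_KILLED}
    return dict(sample)


def drop_early_killed(samples):
    """Equivalent in spirit to adding !early_kill to a sample set."""
    return [s for s in samples if s.get("early_kill", 0) == 0]
```

For example, reliable_fields({"pc": 4096, "early_kill": 1, "map_stall": 1, "retdelay": 9}) keeps only early_kill and map_stall.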

The taken bit is UNDEFINED for instructions other than conditional branches or for conditional branches that mispredict.

SEE ALSO

dcpi(1), dcpi2ps(1), dcpicat(1), dcpictl(1), dcpid(1), dcpidiff(1), dcpiformat(4), dcpilist(1), dcpiprof(1), dcpitopstalls(1), dcpiwhatcg(1)  

For more information, see the HP Digital Continuous Profiling Infrastructure project home page (http://h30097.www3.hp.com/dcpi).



Last modified: April 8, 2004