Developer’s Manual March, 2003 B-39

Intel 80200 Processor based on Intel XScale™ Microarchitecture

Optimization Guide

B.5.2 Scheduling Data Processing Instructions

Most Intel 80200 processor data processing instructions have a result latency of 1 cycle, which means that an instruction can use the result of the immediately preceding data processing instruction without stalling. However, the result latency is 2 cycles if the current instruction uses that result as the operand of a shift by immediate. As a result, the mov instruction in the following code segment incurs a 1-cycle stall:

    sub r6, r7, r8
    add r1, r2, r3
    mov r4, r1, LSL #2

The code above can be rearranged as follows to remove the 1 cycle stall:

    add r1, r2, r3
    sub r6, r7, r8
    mov r4, r1, LSL #2
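The effect of the reordering can be checked with a small model. The sketch below is a simplified illustration and not a cycle-accurate description of the 80200: it assumes an in-order, single-issue pipeline, handles data processing instructions only, and applies just the two result latencies stated above. The function name count_stalls and the tuple encoding of instructions are hypothetical and chosen for this example.

```python
# Simplified, hypothetical model: in-order, single issue, data processing
# instructions only. It is not a cycle-accurate 80200 simulator.

def count_stalls(program):
    """Count result-latency stall cycles for a list of instructions.

    Each instruction is (dest, plain_sources, shifted_source), where
    shifted_source is the register consumed by a shift-by-immediate
    operand, or None if the instruction has no such operand.
    """
    issued_at = {}  # register -> cycle in which its producer issued
    cycle = 0
    stalls = 0
    for dest, plain_sources, shifted_source in program:
        earliest = cycle
        for reg in plain_sources:
            if reg in issued_at:
                # Result latency of 1 cycle for an ordinary operand.
                earliest = max(earliest, issued_at[reg] + 1)
        if shifted_source in issued_at:
            # Result latency of 2 cycles when the value feeds a shift by immediate.
            earliest = max(earliest, issued_at[shifted_source] + 2)
        stalls += earliest - cycle
        issued_at[dest] = earliest
        cycle = earliest + 1
    return stalls


original = [
    ("r6", ["r7", "r8"], None),  # sub r6, r7, r8
    ("r1", ["r2", "r3"], None),  # add r1, r2, r3
    ("r4", [], "r1"),            # mov r4, r1, LSL #2
]
# Same instructions with add moved ahead of sub.
rearranged = [original[1], original[0], original[2]]

print(count_stalls(original))    # stall cycles before reordering
print(count_stalls(rearranged))  # stall cycles after reordering
```

Under these assumptions the model reports one stall cycle for the original order and none for the rearranged order, because the independent sub fills the slot between add and the shift that consumes r1.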

All data processing instructions incur a 2-cycle issue penalty and a 2-cycle result penalty when the shifter operand is a shift or rotate by a register, or when the shifter operand is RRX. Because the next instruction always waits out this issue penalty, reordering cannot hide the stall; the only way to avoid it is to rewrite the assembler instruction. Consider the following segment of code:

    mov r3, #10
    mul r4, r2, r3
    add r5, r6, r2, LSL r3
    sub r7, r8, r2

The sub instruction incurs a 1-cycle stall because of the issue latency of the add instruction, whose shifter operand is a shift by a register. Since r3 holds the constant 10, the stall can be avoided by encoding the shift amount as an immediate:

    mov r3, #10
    mul r4, r2, r3
    add r5, r6, r2, LSL #10
    sub r7, r8, r2
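The issue-latency rule can be summarized in a few lines. The sketch below models only that rule, under the assumption that an instruction with a 2-cycle issue latency delays whatever follows it by one cycle; it ignores result latencies and the timing of mul. The names issue_latency and extra_issue_cycles and the shifter-kind labels are hypothetical.

```python
# Hypothetical helper: only the issue-latency rule from the text is modeled.

def issue_latency(shifter_kind):
    """Issue latency in cycles of a data processing instruction.

    shifter_kind is one of "none", "imm_shift", "reg_shift", or "rrx".
    """
    return 2 if shifter_kind in ("reg_shift", "rrx") else 1


def extra_issue_cycles(shifter_kinds):
    """Cycles beyond one per instruction that the sequence needs to issue."""
    return sum(issue_latency(kind) - 1 for kind in shifter_kinds)


before = ["reg_shift", "none"]  # add r5, r6, r2, LSL r3 ; sub r7, r8, r2
after = ["imm_shift", "none"]   # add r5, r6, r2, LSL #10 ; sub r7, r8, r2

print(extra_issue_cycles(before))  # extra cycles with the shift by register
print(extra_issue_cycles(after))   # extra cycles with the shift by immediate
```

The rewrite is possible here only because the shift amount is known when the code is written; if r3 held a value computed at run time, the shift by register and its issue penalty would remain.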
