Intel ARCHITECTURE IA-32 Manuals

Operating instructions and user manuals for Intel ARCHITECTURE IA-32 computer accessories.
We provide 1 PDF manual for Intel ARCHITECTURE IA-32 for free download, by document type: User manual


Table of Contents

IA-32 Intel® Architecture Optimization Reference Manual
Contents
Examples

Introduction
Tuning Your Application
About This Manual
Related Documentation
Notational Conventions

IA-32 Intel® Architecture Processor Family Overview
SIMD Technology
Summary of SIMD Technologies
Streaming SIMD Extensions 2
Streaming SIMD Extensions 3
Intel NetBurst Microarchitecture
The Front End
The Out-of-order Core
Retirement
Front End Pipeline Detail
Execution Trace Cache
Branch Prediction
Execution Core Detail
Data Prefetch
Intel Pentium M Processor Microarchitecture
Data Prefetching
Out-of-Order Core
Microarchitecture of Intel Core™ Solo and Core™ Duo Processors
Shared Resources
Front End Pipeline
Multi-Core Processors
Load and Store Operations

General Optimization
Optimize Memory Access
Enable Vectorization
Performance Tools
VTune™ Performance Analyzer
Processor Perspectives
Spin-Wait and Idle Loops
Static Prediction
Inlining, Calls and Returns
Branch Type Selection
Loop Unrolling
Memory Accesses
Store Forwarding
Alignment
Figure 2-2
Example 2-14
Data Layout Optimizations
Stack Alignment
Aliasing Cases in the Pentium 4 and Intel Xeon Processors
Mixing Code and Data
Write Combining
Locality Enhancement
Minimizing Bus Latency
Cacheability Instructions
Floating-point Modes
Memory Operands
Floating-Point Stalls
Instruction Selection
Complex Instructions
Use of the lea Instruction
Flag Register Accesses
Integer Divide
Partial Register Stall
REP Prefix and Data Movement
Clearing Registers
Compares
Floating Point/SIMD Operands
Prolog Sequences
Instruction Scheduling
Spill Scheduling
Vectorization
Miscellaneous
User/Source Coding Rules
Tuning Suggestions

Coding for SIMD Architectures
Identifying Hot Spots
Coding Techniques
Coding Methodologies
Assembly
Intrinsics
Automatic Vectorization
Stack and Data Alignment
Compiler-Supported Alignment
Improving Memory Utilization
SoA Data Structure
Strip Mining
Example 3-19 Strip Mined Code
Loop Blocking
Example 3-20 Loop Blocking
Tuning the Final Application

Optimizing for SIMD Integer
Using the EMMS Instruction
Data Alignment
Signed Unpack
Non-Interleaved Unpack
Extract Word
Insert Word
Figure 4-6 pinsrw Instruction
Move Byte Mask to Integer
Generating Constants
Building Blocks
Absolute Value
Highly Efficient Clipping
Packed Multiply High Unsigned
Packed Average (Byte/Word)
Packed 32*32 Multiply
Packed 64-bit Add/Subtract
128-bit Shifts
Memory Optimizations
Partial Memory Accesses

Optimizing for SIMD Floating-point Applications
Planning Considerations
Scalar Floating-point Code
Data Swizzling
Example 5-3 Swizzling Data
Data Deswizzling
Horizontal ADD Using SSE
SSE3 and Complex Arithmetics

Optimizing Cache Usage
Hardware Prefetching of Data
Prefetch
Cacheability Control
Fencing
Streaming Non-temporal Stores
Write-Combining
Streaming Store Usage Models
The lfence Instruction
The mfence Instruction
The clflush Instruction
Software-controlled Prefetch
Hardware Prefetch
Constant Stride
Non-Adjacent Passes Loops
Cache Management
Video Encoder
Video Decoder

Multi-Core and Hyper-Threading Technology
Performance and Usage Models
Single Thread
Multi-Thread on MP
Multitasking Environment
Functional Decomposition
Optimization Guidelines
Thread Synchronization
Optimization with Spin-Locks
Example 7-5
System Bus Optimization
Conserve Bus Bandwidth
Memory Optimization
Shared-Memory Optimization
Per-thread Stack Offset
Per-instance Stack Offset
Front-end Optimization

64-bit Mode Coding Guidelines
64-Bit Arithmetic
Using Software Prefetch

Power Optimization for Mobile Usages
Mobile Usage Scenarios
ACPI C-States
Reducing Amount of Work
Enabling Intel Enhanced Deeper Sleep
Multi-Core Considerations

Application Performance
Compilers
Code Optimization Options
Vectorizer Switch Options
Multithreading with OpenMP*
VTune™ Performance Analyzer
Sampling
Event-based Sampling
Workload Characterization
Call Graph
Performance Libraries
Benefits Summary
Optimizations with the Intel Compiler
Enhanced Debugger (EDB)
Threading Tools
Thread Profiler
Software College

Using Performance Monitoring
Bogus, Non-bogus, Retire
Bus Ratio
Counting Clocks
Non-Halted Clockticks
Non-Sleep Clockticks
Time Stamp Counter
Microarchitecture Notes
Usage Notes on Bus Activities
Tags for replay_event
Tags for front_end_event
Tags for execution_event
Parallel Counting
Ratio Interpretation
Notes on Selected Events

Overview
Definitions
Latency and Throughput
Table Footnotes

Stack Alignment
Stack Frame Optimizations
Inlined Assembly and ebx

Mathematics of Prefetch Scheduling Distance
Mathematical Model for PSD
No Preloading or Prefetch
Compute Bound

Intel Sales Offices




