Wednesday, 14 March 2012

Part 1 PowerPC 603 Microprocessor Overview


This section describes the features of the 603, provides a block diagram showing the major functional units,
and gives an overview of how the 603 operates.
The 603 is the first low-power implementation of the PowerPC microprocessor family of reduced instruction
set computer (RISC) microprocessors. The 603 implements the 32-bit portion of the PowerPC architecture,
which provides 32-bit effective addresses, integer data types of 8, 16, and 32 bits, and floating-point data
types of 32 and 64 bits. For 64-bit PowerPC microprocessors, the PowerPC architecture provides 64-bit
integer data types, 64-bit addressing, and other features required to complete the 64-bit architecture.
The 603 provides four software-controllable power-saving modes. Three of the modes (the doze, nap, and
sleep modes) are static in nature, and progressively reduce the amount of power dissipated by the processor.
The fourth is a dynamic power management mode that causes the functional units in the 603 to
automatically enter a low-power mode when they are idle, without affecting operational
performance, software execution, or any external hardware.
The 603 is a superscalar processor capable of issuing and retiring as many as three instructions per clock.
Instructions can execute out of order for increased performance; however, the 603 makes completion appear
sequential.
The 603 integrates five execution units—an integer unit (IU), a floating-point unit (FPU), a branch
processing unit (BPU), a load/store unit (LSU), and a system register unit (SRU). The ability to execute five
instructions in parallel and the use of simple instructions with rapid execution times yield high efficiency
and throughput for 603-based systems. Most integer instructions execute in one clock cycle. The FPU is
pipelined so a single-precision multiply-add instruction can be issued every clock cycle.
The 603 provides independent on-chip, 8-Kbyte, two-way set-associative, physically addressed caches for
instructions and data and on-chip instruction and data memory management units (MMUs). The MMUs
contain 64-entry, two-way set-associative, data and instruction translation lookaside buffers (DTLB and
ITLB) that provide support for demand-paged virtual memory address translation and variable-sized block
translation. The TLBs and caches use a least recently used (LRU) replacement algorithm. The 603 also
supports block address translation through the use of two independent instruction and data block address
translation (IBAT and DBAT) arrays of four entries each. Effective addresses are compared simultaneously
with all four entries in the BAT array during block translation. In accordance with the PowerPC architecture,
if an effective address hits in both the TLB and BAT array, the BAT translation takes priority.
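The priority rule above can be sketched as a toy model. This is not the 603's hardware logic: the `BatEntry` fields, the dictionary standing in for the TLB, and the function names are all hypothetical, and segment lookup and protection checks are left out.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SHIFT = 12  # 4-Kbyte pages


@dataclass
class BatEntry:
    """Hypothetical, simplified BAT entry describing one translated block."""
    ea_base: int     # effective base address of the block
    pa_base: int     # physical base address of the block
    block_size: int  # block length in bytes (128 Kbytes to 256 Mbytes)


def bat_lookup(bats, ea) -> Optional[int]:
    # Hardware compares all four entries simultaneously; a loop stands in for that.
    for entry in bats:
        if entry.ea_base <= ea < entry.ea_base + entry.block_size:
            return entry.pa_base + (ea - entry.ea_base)
    return None


def translate(ea, bats, tlb):
    """Return (physical_address, source).

    `tlb` is a plain dict from effective page number to physical page
    number, an assumption made to keep the sketch short.
    """
    pa = bat_lookup(bats, ea)
    if pa is not None:
        return pa, "BAT"  # a BAT hit wins even if the TLB also hits
    ppn = tlb.get(ea >> PAGE_SHIFT)
    if ppn is not None:
        return (ppn << PAGE_SHIFT) | (ea & ((1 << PAGE_SHIFT) - 1)), "TLB"
    return None, "miss"
```

In this sketch an address covered by both a BAT entry and a TLB entry is always reported with source "BAT", which mirrors the priority stated above.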
The 603 has a selectable 32- or 64-bit data bus and a 32-bit address bus. The 603 interface protocol allows
multiple masters to compete for system resources through a central external arbiter. The 603 provides a
three-state coherency protocol that supports the exclusive, modified, and invalid cache states. This protocol
is a compatible subset of the MESI (modified/exclusive/shared/invalid) four-state protocol and operates
coherently in systems that contain four-state caches. The 603 supports single-beat and burst data transfers
for memory accesses; it also supports both memory-mapped I/O and direct-store interface addressing.
The 603 uses an advanced, 3.3-V CMOS process technology and maintains full interface compatibility with
TTL devices.
1.1 PowerPC 603 Microprocessor Features
This section describes details of the 603’s implementation of the PowerPC architecture. Major features of
the 603 are as follows:
• High-performance, superscalar microprocessor
— As many as three instructions issued and retired per clock
— As many as five instructions in execution per clock
PowerPC 603 RISC Microprocessor Technical Summary 3
— Single-cycle execution for most instructions
— Pipelined FPU for all single-precision and most double-precision operations
• Five independent execution units and two register files
— BPU featuring static branch prediction
— A 32-bit IU
— Fully IEEE 754-compliant FPU for both single- and double-precision operations
— LSU for data transfer between data cache and GPRs and FPRs
— SRU that executes condition register (CR) and special-purpose register (SPR) instructions
— Thirty-two GPRs for integer operands
— Thirty-two FPRs for single- or double-precision operands
• High instruction and data throughput
— Zero-cycle branch capability (branch folding)
— Programmable static branch prediction on unresolved conditional branches
— Instruction fetch unit capable of fetching two instructions per clock from the instruction cache
— A six-entry instruction queue that provides look-ahead capability
— Independent pipelines with feed-forwarding that reduces data dependencies in hardware
— 8-Kbyte data cache—two-way set-associative, physically addressed; LRU replacement
algorithm
— 8-Kbyte instruction cache—two-way set-associative, physically addressed; LRU replacement
algorithm
— Cache write-back or write-through operation programmable on a per page or per block basis
— BPU that performs CR look-ahead operations
— Address translation facilities for 4-Kbyte page size, variable block size, and 256-Mbyte
segment size
— A 64-entry, two-way set-associative ITLB
— A 64-entry, two-way set-associative DTLB
— Four-entry data and instruction BAT arrays providing 128-Kbyte to 256-Mbyte blocks
— Software table search operations and updates supported through fast trap mechanism
— 52-bit virtual address; 32-bit physical address
• Facilities for enhanced system performance
— A 32- or 64-bit split-transaction external data bus with burst transfers
— Support for one-level address pipelining and out-of-order bus transactions
— Bus extensions for direct-store interface operations
• Integrated power management
— Low-power 3.3-volt design
— Internal processor/bus clock multiplier that provides 1/1, 2/1, 3/1, and 4/1 ratios
— Three power saving modes: doze, nap, and sleep
— Automatic dynamic power reduction when internal functional units are idle
• In-system testability and debugging features through JTAG boundary-scan capability
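The address and cache figures in the list above imply some simple arithmetic, sketched here. The field names and helper functions are hypothetical, and the 32-byte cache line size is an assumption that this section does not state.

```python
EA_BITS = 32                       # 32-bit effective address
VA_BITS = 52                       # 52-bit virtual address
PAGE_BYTES = 4 * 1024              # 4-Kbyte page
SEGMENT_BYTES = 256 * 1024 * 1024  # 256-Mbyte segment


def log2_exact(n):
    """Return log2(n) for a power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


OFFSET_BITS = log2_exact(PAGE_BYTES)               # byte offset within a page
SEGMENT_OFFSET_BITS = log2_exact(SEGMENT_BYTES)    # byte offset within a segment
PAGE_INDEX_BITS = SEGMENT_OFFSET_BITS - OFFSET_BITS
SEGMENT_SELECT_BITS = EA_BITS - SEGMENT_OFFSET_BITS
VSID_BITS = VA_BITS - SEGMENT_OFFSET_BITS          # virtual segment ID width


def split_ea(ea):
    """Split a 32-bit effective address into (segment, page index, offset)."""
    offset = ea & (PAGE_BYTES - 1)
    page_index = (ea >> OFFSET_BITS) & ((1 << PAGE_INDEX_BITS) - 1)
    segment = ea >> SEGMENT_OFFSET_BITS
    return segment, page_index, offset


TLB_SETS = 64 // 2  # 64 entries, two-way set-associative


def cache_sets(cache_bytes=8 * 1024, ways=2, line_bytes=32):
    # line_bytes=32 is an assumption; only the size and associativity are given above.
    return cache_bytes // (ways * line_bytes)
```

Under these figures an effective address carries a 4-bit segment selector, a 16-bit page index, and a 12-bit byte offset, and the remaining 24 bits of the 52-bit virtual address come from the segment's virtual segment ID.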
1.2 Block Diagram
Figure 1 provides a block diagram of the 603 that illustrates how the execution units—IU, FPU, BPU, LSU,
and SRU—operate independently and in parallel.
The 603 provides address translation and protection facilities, including an ITLB, DTLB, and instruction
and data BAT arrays. Instruction fetching and issuing are handled in the instruction unit. Translation of
addresses for cache or external memory accesses is handled by the MMUs. Both units are discussed in
more detail in Sections 1.3, “Instruction Unit,” and 1.5.1, “Memory Management Units (MMUs).”
1.3 Instruction Unit
As shown in Figure 1, the 603 instruction unit, which contains a fetch unit, instruction queue, dispatch unit,
and BPU, provides centralized control of instruction flow to the execution units. The instruction unit
determines the address of the next instruction to be fetched based on information from the sequential fetcher
and from the BPU.
The instruction unit fetches the instructions from the instruction cache into the instruction queue. The BPU
extracts branch instructions from the fetcher and uses static branch prediction on unresolved conditional
branches to allow the instruction unit to fetch instructions from a predicted target instruction stream while
a conditional branch is evaluated. The BPU folds out branch instructions for unconditional branches or
conditional branches unaffected by instructions in progress in the execution pipeline.
Instructions issued beyond a predicted branch do not complete execution until the branch is resolved,
preserving the programming model of sequential execution. If any of these instructions are to be executed
in the BPU, they are decoded but not issued. Instructions to be executed by the FPU, IU, LSU, and SRU are
issued and allowed to complete up to the register write-back stage. Write-back is allowed when a correctly
predicted branch is resolved, and instruction execution continues without interruption along the predicted
path.
If branch prediction is incorrect, the instruction unit flushes all predicted path instructions, and instructions
are issued from the correct path.
Figure 1. PowerPC 603 Microprocessor Block Diagram
[Diagram not reproduced. Labeled blocks: instruction unit (sequential fetcher, instruction queue, dispatch unit, and branch processing unit with CTR, CR, and LR); integer unit with XER; floating-point unit with FPSCR; load/store unit; system register unit; completion unit; GPR file and GP rename registers; FPR file and FP rename registers; I MMU and D MMU, each with SRs, a TLB, and a BAT array; 8-Kbyte I cache and D cache with tags; processor bus interface with touch load buffer and copyback buffer; 32-bit address bus and 32-/64-bit data bus; power dissipation control; time base counter/decrementer; clock multiplier; JTAG/COP interface.]
1.3.1 Instruction Queue and Dispatch Unit
The instruction queue (IQ), shown in Figure 1, holds as many as six instructions and loads up to two
instructions from the instruction unit during a single cycle. The instruction fetch unit continuously loads as
many instructions as space in the IQ allows. Instructions are dispatched to their respective execution units
from the dispatch unit at a maximum rate of two instructions per cycle. Dispatching is facilitated to the IU,
FPU, LSU, and SRU by the provision of a reservation station at each unit. The dispatch unit performs source
and destination register dependency checking, determines dispatch serializations, and inhibits subsequent
instruction dispatching as required.
For a more detailed overview of instruction dispatch, see Section 3.7, “Instruction Timing.”
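The fetch and dispatch rates described in this section can be illustrated with a small cycle-by-cycle model. This is a sketch under stated assumptions, not a timing model of the 603: the function name, the order of dispatch before fetch within a cycle, and the `dispatch_stalled` parameter are all hypothetical.

```python
from collections import deque

IQ_CAPACITY = 6     # the IQ holds as many as six instructions
FETCH_WIDTH = 2     # up to two instructions loaded per cycle
DISPATCH_WIDTH = 2  # up to two instructions dispatched per cycle


def simulate(n_cycles, dispatch_stalled=frozenset()):
    """Return (occupancy per cycle, instructions dispatched in order).

    Instructions are numbered 0, 1, 2, ... in fetch order. Cycles listed
    in `dispatch_stalled` dispatch nothing, standing in for any
    serialization or dependency stall.
    """
    iq = deque()
    next_instr = 0
    dispatched = []
    occupancy = []
    for cycle in range(n_cycles):
        # Assumed ordering: dispatch from the head first, then refill.
        if cycle not in dispatch_stalled:
            for _ in range(min(DISPATCH_WIDTH, len(iq))):
                dispatched.append(iq.popleft())
        free = IQ_CAPACITY - len(iq)
        for _ in range(min(FETCH_WIDTH, free)):
            iq.append(next_instr)
            next_instr += 1
        occupancy.append(len(iq))
    return occupancy, dispatched
```

With dispatch stalled in cycles 1 through 3, the queue in this model fills to its six-entry limit and fetching pauses until dispatch resumes.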
1.3.2 Branch Processing Unit (BPU)
The BPU receives branch instructions from the fetch unit and performs CR look-ahead operations on
conditional branches to resolve them early, achieving the effect of a zero-cycle branch in many cases.
The BPU uses a bit in the instruction encoding to predict the direction of the conditional branch. Therefore,
when an unresolved conditional branch instruction is encountered, the 603 fetches instructions from the
predicted target stream until the conditional branch is resolved.
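A minimal sketch of static prediction for a conditional branch follows. The text above says only that a bit in the instruction encoding is used; the specific rule coded here (the low-order bit of the BO field, often called the y bit, reversing a default of "backward taken, forward not taken") is taken from the PowerPC architecture's definition of the `bc` instruction and should be read as an assumption about this section. The function name is hypothetical.

```python
def predict_taken(bo, displacement):
    """Predict the direction of an unresolved `bc` conditional branch.

    bo           -- 5-bit BO field; its low-order bit is the prediction hint
    displacement -- signed branch displacement in bytes

    Encodings of BO that branch unconditionally are not modeled here.
    """
    y = bo & 1
    default_taken = displacement < 0  # backward branches default to taken
    return default_taken != bool(y)   # y = 1 reverses the default
```

For example, a loop-closing branch with a negative displacement and the hint bit clear is predicted taken, so fetching continues from the top of the loop while the condition is still being resolved.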
The BPU contains an adder to compute branch target addresses and three user-control registers—the link
register (LR), the count register (CTR), and the CR. The BPU calculates the return pointer for subroutine
calls and saves it into the LR for certain types of branch instructions. The LR also contains the branch target
address for the Branch Conditional to Link Register (bclrx) instruction. The CTR contains the branch target
address for the Branch Conditional to Count Register (bcctrx) instruction. The contents of the LR and CTR
can be copied to or from any GPR. Because the BPU uses dedicated registers rather than GPRs or FPRs,
execution of branch instructions is largely independent from execution of integer and floating-point
instructions.
1.4 Independent Execution Units
The PowerPC architecture’s support for independent execution units allows implementation of processors
with out-of-order instruction execution. For example, because branch instructions do not depend on GPRs
or FPRs, branches can often be resolved early, eliminating stalls caused by taken branches.
In addition to the BPU, the 603 provides four other execution units and a completion unit, which are
described in the following sections.
1.4.1 Integer Unit (IU)
The IU executes all integer instructions. The IU executes one integer instruction at a time, performing
computations with its arithmetic logic unit (ALU), multiplier, divider, and integer exception register (XER).
Most integer instructions are single-cycle instructions. Thirty-two general-purpose registers are provided to
support integer operations. Stalls due to contention for GPRs are minimized by the automatic allocation of
rename registers. The 603 writes the contents of the rename registers to the appropriate GPR when integer
instructions are retired by the completion unit.
1.4.2 Floating-Point Unit (FPU)
The FPU contains a single-precision multiply-add array and the floating-point status and control register
(FPSCR). The multiply-add array allows the 603 to efficiently implement multiply and multiply-add
operations. The FPU is pipelined so that single-precision instructions and double-precision instructions can
be issued back-to-back. Thirty-two floating-point registers are provided to support floating-point operations.
Stalls due to contention for FPRs are minimized by the automatic allocation of rename registers. The 603
writes the contents of the rename registers to the appropriate FPR when floating-point instructions are
retired by the completion unit.
The 603 supports all IEEE 754 floating-point data types (normalized, denormalized, NaN, zero, and infinity)
in hardware, eliminating the latency incurred by software exception routines. (The term ‘exception’ is also
referred to as ‘interrupt’ in the architecture specification.)
1.4.3 Load/Store Unit (LSU)
The LSU executes all load and store instructions and provides the data transfer interface between the GPRs,
FPRs, and the cache/memory subsystem. The LSU calculates effective addresses, performs data alignment,
and provides sequencing for load/store string and multiple instructions.
Load and store instructions are issued and translated in program order; however, the actual memory accesses
can occur out of order. Synchronizing instructions are provided to enforce strict ordering.
Cacheable loads, when free of data dependencies, execute in a speculative manner with a maximum
throughput of one per cycle and a two-cycle total latency. Data returned from the cache is held in a rename
register until the completion logic commits the value to a GPR or FPR. Stores cannot be executed
speculatively and are held in the store queue until the completion logic signals that the store operation is to
be completed to memory. The time required to perform the actual load or store operation varies depending
on whether the operation involves the cache, system memory, or an I/O device.
1.4.4 System Register Unit (SRU)
The SRU executes various system-level instructions, including condition register logical operations and
move to/from special-purpose register instructions. In order to maintain system state, most instructions
executed by the SRU are completion-serialized; that is, the instruction is held for execution in the SRU until
all prior instructions issued have completed. Results from completion-serialized instructions executed by
the SRU are not available or forwarded for subsequent instructions until the instruction completes.
1.4.5 Completion Unit
The completion unit tracks instructions from dispatch through execution, and then retires, or “completes,”
them in program order. Completing an instruction commits the 603 to any architectural register changes
caused by that instruction. In-order completion ensures the correct architectural state when the 603 must
recover from a mispredicted branch or any exception.
Instruction state and other information required for completion are kept in a first-in-first-out (FIFO) queue of
five completion buffers. A single completion buffer is allocated for each instruction once it enters the
dispatch unit. An available completion buffer is a required resource for instruction dispatch; if no
completion buffers are available, instruction dispatch stalls. A maximum of two instructions per cycle are
completed in order from the queue.
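The completion behavior described above can be sketched as a small queue model. It assumes nothing beyond the figures in this section (five buffers, at most two completions per cycle, in program order); the class and method names are hypothetical, and exceptions and branch recovery are not modeled.

```python
from collections import deque

COMPLETION_BUFFERS = 5  # FIFO of five completion buffers
RETIRE_WIDTH = 2        # at most two instructions completed per cycle


class CompletionQueue:
    def __init__(self):
        self._fifo = deque()  # entries are [tag, finished] in program order

    def can_dispatch(self):
        return len(self._fifo) < COMPLETION_BUFFERS

    def dispatch(self, tag):
        """Allocate a buffer; return False if dispatch must stall."""
        if not self.can_dispatch():
            return False
        self._fifo.append([tag, False])
        return True

    def finish(self, tag):
        """Mark an instruction as having finished execution (any order)."""
        for entry in self._fifo:
            if entry[0] == tag:
                entry[1] = True
                return
        raise KeyError(tag)

    def retire(self):
        """Complete up to RETIRE_WIDTH finished instructions from the head."""
        retired = []
        while self._fifo and self._fifo[0][1] and len(retired) < RETIRE_WIDTH:
            retired.append(self._fifo.popleft()[0])
        return retired
```

In this model an instruction that finishes early still waits behind any older unfinished instruction, which is what keeps the architectural state sequential.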
1.5 Memory Subsystem Support
The 603 provides support for cache and memory management through dual instruction and data memory
management units. The 603 also provides dual 8-Kbyte instruction and data caches, and an efficient
processor bus interface to facilitate access to main memory and other bus subsystems. The memory
subsystem support functions are described in the following subsections.
1.5.1 Memory Management Units (MMUs)
The 603’s MMUs support up to 4 petabytes (2^52 bytes) of virtual memory and 4 gigabytes (2^32 bytes) of physical
memory (referred to as real memory in the architecture specification) for instruction and data. The MMUs
also control access privileges for these spaces on block and page granularities. Referenced and changed
