## ARCHITECTURE OF COMPUTER SYSTEMS LECTURE 11 - OUT-OF-ORDER ISSUE, REGISTER RENAMING & BRANCH PREDICTION

## LAST TIME IN

- Pipelining is complicated by multiple and/or variable latency functional units
- Out-of-order and/or pipelined execution requires tracking of dependencies
  - RAW
  - WAR
  - WAW
- Dynamic issue logic can support out-of-order execution to improve performance
  - Last time, looked at simple scoreboard to track out-of-order completion
- Hardware register renaming can further improve performance by removing hazards.
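
The three dependence types above can be checked mechanically from register names alone. The following is a minimal Python sketch, not taken from the slides: the `Instr` record and the `hazards` function are hypothetical names introduced for illustration, and each instruction is assumed to write exactly one destination register.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical minimal instruction record: one destination, several sources."""
    dest: str
    srcs: Tuple[str, ...]


def hazards(first: Instr, second: Instr) -> Set[str]:
    """Return the data hazards that `second` has on the earlier instruction `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # second reads a register that first writes
    if second.dest in first.srcs:
        found.add("WAR")  # second overwrites a register that first still reads
    if second.dest == first.dest:
        found.add("WAW")  # both instructions write the same register
    return found


# Illustrative pair: DIVD f6, f6, f4 followed later by ADDD f6, f8, f2.
i1 = Instr(dest="f6", srcs=("f6", "f4"))
i6 = Instr(dest="f6", srcs=("f8", "f2"))
print(sorted(hazards(i1, i6)))  # ['WAR', 'WAW']
```

Only the RAW case reflects a true flow of data. The WAR and WAW cases arise from reuse of the register name `f6`, which is why renaming can remove them.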

### REGISTER RENAMING



- Decode does register renaming and adds instructions to the issue-stage instruction reorder buffer (ROB)
  - ⇒ renaming makes WAR or WAW hazards impossible
- Any instruction in ROB whose RAW hazards have been satisfied can be issued.
  - ⇒ Out-of-order or dataflow execution



- Instruction template (i.e., tag t) is allocated by the Decode stage, which also associates tag with register in regfile
- When an instruction completes, its tag is deallocated

#### REORDER BUFFER



#### ROB managed circularly

- "exec" bit is set when instruction begins execution
- When an instruction completes its "use" bit is marked free
- ptr<sub>2</sub> is incremented only if the "use" bit is marked free

#### Instruction slot is candidate for execution when:

- It holds a valid instruction ("use" bit is set)
- It has not already started execution ("exec" bit is clear)
- Both operands are available (p1 and p2 are set)
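
The three conditions above amount to a simple predicate over the status bits of a slot. The sketch below is an illustration only: `Slot`, `is_candidate`, and `pick_oldest_candidate` are hypothetical names, and choosing the oldest ready slot first is an assumption, since the slides do not specify a selection policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    """Hypothetical ROB slot holding only the status bits named above."""
    use: bool = False           # slot holds a valid instruction
    exec_started: bool = False  # the "exec" bit: execution has begun
    p1: bool = False            # first operand is available
    p2: bool = False            # second operand is available


def is_candidate(slot: Slot) -> bool:
    """A slot may issue if it is valid, not yet executing, and has both operands."""
    return slot.use and not slot.exec_started and slot.p1 and slot.p2


def pick_oldest_candidate(slots: List[Slot], ptr2: int) -> Optional[int]:
    """Scan circularly from ptr2 (the oldest slot); return the first candidate index."""
    n = len(slots)
    for offset in range(n):
        i = (ptr2 + offset) % n
        if is_candidate(slots[i]):
            return i
    return None


slots = [
    Slot(use=True, exec_started=True, p1=True, p2=True),   # already executing
    Slot(use=True, exec_started=False, p1=True, p2=False),  # waiting on operand 2
    Slot(use=True, exec_started=False, p1=True, p2=True),   # ready to issue
    Slot(),                                                  # free slot
]
print(pick_oldest_candidate(slots, ptr2=0))  # 2
```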



### EFFECTIVENESS?

Renaming and out-of-order execution were first implemented in 1969 in the IBM 360/91 but did not show up in subsequent models until the mid-nineties.

Why?

#### Reasons

- 1. Effective on a very small class of programs
- 2. Memory latency a much bigger problem
- 3. Exceptions not precise!

One more problem needed to be solved: *control transfers*

# PRECISE INTERRUPTS

It must appear as if an interrupt is taken between two instructions (say  $I_i$  and  $I_{i+1}$ )

- the effect of all instructions up to and including I<sub>i</sub> is totally complete
- no effect of any instruction after I<sub>i</sub> has taken place

The interrupt handler either aborts the program or restarts it at  $I_{i+1}$ .

# EFFECT ON INTERRUPTS

**OUT-OF-ORDER COMPLETION** 

```
I1  DIVD   f6,  f6, f4
I2  LD     f2,  45(r3)
I3  MULTD  f0,  f2, f4
I4  DIVD   f8,  f6, f2
I5  SUBD   f10, f0, f6
I6  ADDD   f6,  f8, f2
```

Out-of-order completion (an underlined number marks the point where that instruction completes):

1 2 <u>2</u> 3 <u>1</u> 4 <u>3</u> 5 <u>5</u> <u>4</u> 6 <u>6</u>

Consider interrupts (e.g., restore f2)

Precise interrupts are difficult to implement at high speed

- want to start execution of later instructions before exception checks have finished on earlier instructions

#### EXCEPTION HANDLING (IN-ORDER FIVE-STAGE PIPELINE)

[Figure: five-stage pipeline (instruction memory, decode, execute, data memory, writeback) showing per-stage exception sources (PC address exception, illegal opcode, overflow, data address exceptions), asynchronous interrupts, exception flags carried to the commit point, the Cause and EPC registers, kill signals for the F, D, and E stages and for writeback, and handler PC selection.]

- Hold exception flags in pipeline until commit point (M stage)
- Exceptions in earlier pipe stages override later exceptions
- Inject external interrupts at commit point (override others)
- If exception at commit: update Cause and EPC registers, kill all stages, inject handler PC into fetch stage
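
The priority rules above can be written as a small selection function evaluated at the commit point. This is a sketch under stated assumptions, not the slides' hardware: the stage names in `STAGES`, the `flags` dictionary (stage name mapped to a cause string), and the function name `select_exception` are all hypothetical.

```python
from typing import Dict, Optional

# Assumed stage order up to the commit point: fetch, decode, execute, memory.
STAGES = ["F", "D", "E", "M"]


def select_exception(flags: Dict[str, str], external_interrupt: bool) -> Optional[str]:
    """Pick the single cause to report for the instruction at the commit point.

    `flags` maps a stage name to the exception raised in that stage, as carried
    down the pipeline with the instruction. Returns None if nothing is pending.
    """
    if external_interrupt:
        return "external interrupt"  # injected at commit, overrides all others
    for stage in STAGES:             # the earliest pipeline stage wins
        if stage in flags:
            return flags[stage]
    return None


# An instruction flagged in both decode and execute reports the decode exception.
print(select_exception({"E": "overflow", "D": "illegal opcode"}, False))
```

When the function returns a cause, the remaining actions from the list apply: write Cause and EPC, kill all stages, and redirect fetch to the handler PC.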




#### IN-ORDER COMMIT FOR PRECISE EXCEPTIONS



- Instructions fetched and decoded into instruction reorder buffer in-order
- Execution is out-of-order ( ⇒ out-of-order completion)
- Commit (write-back to architectural state, i.e., regfile & memory) is in-order

Temporary storage needed to hold results before commit (shadow registers and store buffers)

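### EXTENSIONS FOR PRECISE EXCEPTIONS

[Figure: reorder buffer with added pd, dest, data, and cause fields; ptr<sub>2</sub> points to the next entry to commit and ptr<sub>1</sub> to the next available entry.]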

- add <pd, dest, data, cause> fields in the instruction template
- commit instructions to reg file and memory in program order ⇒ buffers can be maintained circularly
- on exception, clear reorder buffer by resetting ptr<sub>1</sub>=ptr<sub>2</sub> (stores must wait for commit before updating memory)
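
A circular reorder buffer with in-order commit can be sketched as follows. This is a minimal illustration, assuming dictionary-based entries and a plain dictionary as the register file; the class and method names are hypothetical, and stores, memory, and operand tracking are left out.

```python
class ReorderBuffer:
    """Circular ROB sketch with <pd, dest, data, cause> fields per entry."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.ptr1 = 0   # next available entry
        self.ptr2 = 0   # next entry to commit
        self.count = 0  # number of occupied entries

    def allocate(self, dest):
        """Decode stage: take the next entry in program order; return its tag."""
        if self.count == self.size:
            return None  # buffer full, decode must stall
        tag = self.ptr1
        self.slots[tag] = {"pd": False, "dest": dest, "data": None, "cause": None}
        self.ptr1 = (self.ptr1 + 1) % self.size
        self.count += 1
        return tag

    def complete(self, tag, data=None, cause=None):
        """Execution finished (possibly out of order): record result or exception."""
        self.slots[tag].update(pd=True, data=data, cause=cause)

    def commit(self, regfile):
        """Commit finished entries in program order; return an exception cause or None."""
        while self.count and self.slots[self.ptr2]["pd"]:
            entry = self.slots[self.ptr2]
            if entry["cause"] is not None:
                # Exception: clear the buffer by resetting ptr1 = ptr2.
                self.slots = [None] * self.size
                self.ptr1 = self.ptr2
                self.count = 0
                return entry["cause"]
            regfile[entry["dest"]] = entry["data"]  # update architectural state
            self.slots[self.ptr2] = None
            self.ptr2 = (self.ptr2 + 1) % self.size
            self.count -= 1
        return None


rob, regs = ReorderBuffer(4), {}
t0, t1 = rob.allocate("f6"), rob.allocate("f2")
rob.complete(t1, data=2.0)   # the younger instruction finishes first
rob.commit(regs)             # nothing commits: the oldest entry is not done
rob.complete(t0, data=1.0)
rob.commit(regs)             # both entries now commit, in program order
print(regs)                  # {'f6': 1.0, 'f2': 2.0}
```

Because results only reach `regs` inside `commit`, an exception on an older instruction discards every younger result without any architectural state to undo.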

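[Figure: register file (now holds only committed state) next to the reorder buffer, whose entries hold Ins#, use, exec, op, p1, src1, p2, src2, pd, dest, and data; the load unit, store unit, and functional units return <t, result> to the reorder buffer, and commit updates the register file.]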

Register file does not contain renaming tags any more.

How does the decode stage find the tag of a source register?

Search the "dest" field in the reorder buffer
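
The search has to return the most recent in-flight writer of the register, so it runs from the youngest entry to the oldest. A minimal sketch, assuming the reorder buffer is presented as a list of `(tag, dest)` pairs in program order; `find_tag` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def find_tag(rob_entries: List[Tuple[int, str]], src_reg: str) -> Optional[int]:
    """Return the tag of the youngest in-flight instruction that writes src_reg.

    `rob_entries` is ordered oldest first. None means no in-flight writer exists,
    so the committed value in the register file is the one to read.
    """
    for tag, dest in reversed(rob_entries):
        if dest == src_reg:
            return tag
    return None


entries = [(0, "f6"), (1, "f2"), (2, "f6")]
print(find_tag(entries, "f6"))  # 2
print(find_tag(entries, "f4"))  # None
```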

## RENAMING TABLE



Renaming table is a cache to speed up register name look up. It needs to be cleared after each exception taken.

When else are valid bits cleared?

Control transfers

## CONTROL FLOW PENALTY

Modern processors may have > 10 pipeline stages between next PC calculation and branch resolution!

How much work is lost if pipeline doesn't follow correct instruction flow?

~ Loop length x pipeline width
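
As a rough worked example of this estimate, the sketch below multiplies the two quantities. The specific numbers (10 stages, 4-wide) are illustrative assumptions, not measurements of any particular processor, and `wasted_slots` is a hypothetical helper name.

```python
def wasted_slots(loop_length_stages: int, pipeline_width: int) -> int:
    """Approximate instruction slots lost per misprediction: loop length x width."""
    return loop_length_stages * pipeline_width


# Assumed example: 10 stages between next-PC calculation and branch resolution,
# with 4 instructions entering the pipeline per cycle.
print(wasted_slots(10, 4))  # 40
```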



### CS152 ADMINISTRIVIA

- Quiz 2, Tuesday March 5
  - Caches and virtual memory: L6-L9, PS 2, Lab 2, readings

## MISPREDICT RECOVERY

#### In-order execution machines:

- Assume no instruction issued after branch can write-back before branch resolves
- Kill all instructions in pipeline behind mispredicted branch

#### Out-of-order execution?

- Multiple instructions following branch in program order can complete before branch resolves

#### IN-ORDER COMMIT FOR PRECISE EXCEPTIONS



- Instructions fetched and decoded into instruction reorder buffer in-order
- Execution is out-of-order ( ⇒ out-of-order completion)
- Commit (write-back to architectural state, i.e., regfile & memory) is in-order

Temporary storage needed in ROB to hold results before commit

# BRANCH MISPREDICTION IN PIPELINE



- Can have multiple unresolved branches in ROB
- Can resolve branches out-of-order by killing all the instructions in ROB that follow a mispredicted branch



Take snapshot of register rename table at each predicted branch, recover earlier snapshot if branch mispredicted
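
Snapshot-based recovery can be sketched as below. This is an illustration under assumptions: branches are identified by integers that increase in program order, the rename table is a plain dictionary, and the names `RenameTable`, `snapshot`, `rename`, and `resolve` are hypothetical.

```python
class RenameTable:
    """Rename table sketch with one saved copy per unresolved predicted branch."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)  # architectural register -> tag
        self.snapshots = {}           # branch id -> saved copy of the mapping

    def snapshot(self, branch_id):
        """Called when a predicted branch is renamed."""
        self.snapshots[branch_id] = dict(self.mapping)

    def rename(self, arch_reg, tag):
        """Record a new (possibly speculative) mapping for a destination register."""
        self.mapping[arch_reg] = tag

    def resolve(self, branch_id, mispredicted):
        """Called when a branch resolves; restore the saved table on a mispredict."""
        saved = self.snapshots.pop(branch_id)
        if mispredicted:
            self.mapping = saved
            # Branches younger than this one were on the wrong path (assumes
            # ids increase in program order), so their snapshots are dropped.
            for younger in [b for b in self.snapshots if b > branch_id]:
                del self.snapshots[younger]


rt = RenameTable({"x1": "P8", "x3": "P7"})
rt.snapshot(1)
rt.rename("x1", "P0")   # speculative, after branch 1
rt.snapshot(2)
rt.rename("x3", "P1")   # speculative, after branch 2
rt.resolve(1, mispredicted=True)
print(rt.mapping)       # {'x1': 'P8', 'x3': 'P7'}
```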

### "DATA-IN-ROB" DESIGN



- On dispatch into ROB, ready sources can be in regfile or in ROB dest (copied into src1/src2 if ready before dispatch)
- On completion, write to dest field and broadcast to src fields.
- On issue, read from ROB src fields
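
The dispatch and completion steps above can be sketched as follows. This is a simplified illustration, not the slides' design: the dictionary layout of an entry, the `producer` map (architectural register to the tag of its latest in-flight writer), and the function names are assumptions, and commit is not modeled.

```python
def dispatch(rob, regfile, tag, dest, srcs, producer):
    """Enter an instruction into the ROB, copying any operands that are ready."""
    operands = []
    for reg in srcs:
        t = producer.get(reg)
        if t is None:                 # no in-flight writer: read the regfile
            operands.append({"ready": True, "value": regfile[reg], "tag": None})
        elif rob[t]["done"]:          # writer finished: copy its ROB dest field
            operands.append({"ready": True, "value": rob[t]["data"], "tag": None})
        else:                         # writer still executing: wait on its tag
            operands.append({"ready": False, "value": None, "tag": t})
    rob[tag] = {"dest": dest, "src": operands, "done": False, "data": None}
    producer[dest] = tag


def complete(rob, tag, value):
    """Write the dest field, then broadcast the value to waiting src fields."""
    rob[tag]["data"] = value
    rob[tag]["done"] = True
    for entry in rob.values():
        for operand in entry["src"]:
            if not operand["ready"] and operand["tag"] == tag:
                operand.update(ready=True, value=value, tag=None)


regfile = {"x1": 0, "x3": 100}
rob, producer = {}, {}
dispatch(rob, regfile, 0, "x1", ["x3"], producer)  # ld x1, 0(x3)
dispatch(rob, regfile, 1, "x3", ["x1"], producer)  # addi x3, x1, #4
complete(rob, 0, 42)                               # the load returns 42
print(rob[1]["src"][0])  # {'ready': True, 'value': 42, 'tag': None}
```

On issue, an instruction reads only its own `src` fields, which is the reason operand values are copied into the ROB in this design.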

# DATA MOVEMENT IN DATA-IN-ROB DESIGN



## REGISTER FILE

- Rename all architectural registers into a single physical register file during decode; no register values are read
- Functional units read and write from single unified register file holding committed and temporary registers in execute
- Commit only updates mapping of architectural register to physical register, no data movement



### Pipeline Design with Physical Regfile



## LIFETIME OF PHYSICAL REGISTERS

- Physical regfile holds committed and speculative values
- Physical registers decoupled from ROB entries (no data in ROB)



    ld   P1, (Px)
    addi P2, P1, #4
    sub  P3, Py, Pz
    add  P4, P2, P3
    ld   P5, (P1)
    add  P6, P5, P4
    sd   P6, (P1)
    ld   P7, (Pw)

When can we reuse a physical register?

When next write of same architectural register commits
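
In terms of bookkeeping, this rule means that committing an instruction frees the physical register that previously held its destination architectural register, not the one it just wrote. A minimal sketch with illustrative values; `commit_entry` and the dictionary fields are hypothetical names.

```python
from collections import deque


def commit_entry(entry, free_list):
    """Commit one instruction: the previous mapping of Rd (LPRd) becomes reusable.

    No older instruction can still need LPRd at this point, because every
    instruction before this one in program order has already committed.
    """
    free_list.append(entry["LPRd"])


free_list = deque(["P2", "P4"])
# Illustrative entry: a write to x3 that replaced mapping P7 with P1.
commit_entry({"Rd": "x3", "LPRd": "P7", "PRd": "P1"}, free_list)
print(list(free_list))  # ['P2', 'P4', 'P7']
```

The new register (`P1` here) stays allocated until a later write to `x3` commits in turn.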

## PHYSICAL REGISTER MANAGEMENT





Free List: P0, P1, P3, P2, P4, ...

Instruction sequence:

    ld   x1, 0(x3)
    addi x3, x1, #4
    sub  x6, x7, x6
    add  x3, x3, x6
    ld   x6, 0(x1)

ROB (initially empty):

| use | ex | op | p1 | PR1 | p2 | PR2 | Rd | LPRd | PRd |
|-----|----|----|----|-----|----|-----|----|------|-----|
|     |    |    |    |     |    |     |    |      |     |

(LPRd requires third read port on Rename Table for each instruction)
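
The renaming step for this instruction sequence can be traced with a short script. This is a sketch under assumptions: the initial rename table (x1→P8, x3→P7, x6→P5, x7→P6) is inferred from the PR and LPRd columns of this lecture's ROB tables rather than stated explicitly, and the variable names and dictionary layout are hypothetical. Presence bits and execution are not modeled.

```python
from collections import deque

# Assumed initial state, chosen to be consistent with the lecture's ROB tables.
rename_table = {"x1": "P8", "x3": "P7", "x6": "P5", "x7": "P6"}
free_list = deque(["P0", "P1", "P3", "P2", "P4"])

# (opcode, destination register, source registers)
program = [
    ("ld",   "x1", ["x3"]),
    ("addi", "x3", ["x1"]),
    ("sub",  "x6", ["x7", "x6"]),
    ("add",  "x3", ["x3", "x6"]),
    ("ld",   "x6", ["x1"]),
]

rob = []
for op, rd, srcs in program:
    prs = [rename_table[s] for s in srcs]  # look up sources before renaming Rd
    lprd = rename_table[rd]                # previous mapping: the third read port
    prd = free_list.popleft()              # allocate a fresh physical register
    rename_table[rd] = prd
    rob.append({"op": op, "PR": prs, "Rd": rd, "LPRd": lprd, "PRd": prd})

for entry in rob:
    print(entry["op"], entry["PR"], entry["Rd"], entry["LPRd"], entry["PRd"])
```

Reading the sources before updating the destination mapping matters for instructions such as `sub x6, x7, x6`, where the source `x6` must refer to the old physical register.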

# PHYSICAL REGISTER MANAGEMENT



    → ld   x1, 0(x3)
      addi x3, x1, #4
      sub  x6, x7, x6
      add  x3, x3, x6
      ld   x6, 0(x1)

| use | ex | op | p1 | PR1 | p2 | PR2 | Rd | LPRd | PRd |
|-----|----|----|----|-----|----|-----|----|------|-----|
| X   |    | ld | p  | P7  |    |     | x1 | P8   | P0  |



      ld   x1, 0(x3)
    → addi x3, x1, #4
      sub  x6, x7, x6
      add  x3, x3, x6
      ld   x6, 0(x1)

| use | ex | op   | p1 | PR1 | p2 | PR2 | Rd | LPRd | PRd |
|-----|----|------|----|-----|----|-----|----|------|-----|
| X   |    | ld   | p  | P7  |    |     | x1 | P8   | P0  |
| X   |    | addi |    | P0  |    |     | x3 | P7   | P1  |









      ld   x1, 0(x3)
      addi x3, x1, #4
    → sub  x6, x7, x6
      add  x3, x3, x6
      ld   x6, 0(x1)

| use | ex | op   | p1 | PR1 | p2 | PR2 | Rd | LPRd | PRd |
|-----|----|------|----|-----|----|-----|----|------|-----|
| X   |    | ld   | p  | P7  |    |     | x1 | P8   | P0  |
| X   |    | addi |    | P0  |    |     | x3 | P7   | P1  |
| X   |    | sub  | p  | P6  | p  | P5  | x6 | P5   | P3  |

## PHYSICAL REGISTER MANAGEMENT



      ld   x1, 0(x3)
      addi x3, x1, #4
      sub  x6, x7, x6
    → add  x3, x3, x6
      ld   x6, 0(x1)

| use | ex | op   | p1 | PR1 | p2 | PR2 | Rd | LPRd | PRd |
|-----|----|------|----|-----|----|-----|----|------|-----|
| X   |    | ld   | p  | P7  |    |     | x1 | P8   | P0  |
| X   |    | addi |    | P0  |    |     | x3 | P7   | P1  |
| X   |    | sub  | p  | P6  | p  | P5  | x6 | P5   | P3  |
| X   |    | add  |    | P1  |    | P3  | x3 | P1   | P2  |



      ld   x1, 0(x3)
      addi x3, x1, #4
      sub  x6, x7, x6
      add  x3, x3, x6
    → ld   x6, 0(x1)

| use | ex | op   | p1 | PR1 | p2 | PR2 | Rd | LPRd | PRd |
|-----|----|------|----|-----|----|-----|----|------|-----|
| X   |    | ld   | p  | P7  |    |     | x1 | P8   | P0  |
| X   |    | addi |    | P0  |    |     | x3 | P7   | P1  |
| X   |    | sub  | p  | P6  | p  | P5  | x6 | P5   | P3  |
| X   |    | add  |    | P1  |    | P3  | x3 | P1   | P2  |
| X   |    | ld   |    | P0  |    |     | x6 | P3   | P4  |

## PHYSICAL REGISTER MANAGEMENT

    ld   x1, 0(x3)
    addi x3, x1, #4
    sub  x6, x7, x6
    add  x3, x3, x6
    ld   x6, 0(x1)

ROB

| use | ex | op   | p1 | PR1 | p2 | PR2 | Rd | LPRd | PRd |
|-----|----|------|----|-----|----|-----|----|------|-----|
| X   | X  | ld   | p  | P7  |    |     | x1 | P8   | P0  |
| X   |    | addi | p  | P0  |    |     | x3 | P7   | P1  |
| X   |    | sub  | p  | P6  | p  | P5  | x6 | P5   | P3  |
| X   |    | add  |    | P1  |    | P3  | x3 | P1   | P2  |
| X   |    | ld   | p  | P0  |    |     | x6 | P3   | P4  |

Execute & Commit





    ld   x1, 0(x3)
    addi x3, x1, #4
    sub  x6, x7, x6
    add  x3, x3, x6
    ld   x6, 0(x1)

ROB

| use | ex | op   | p1 | PR1 | p2 | PR2 | Rd | LPRd | PRd |
|-----|----|------|----|-----|----|-----|----|------|-----|
| X   | X  | ld   | p  | P7  |    |     | x1 | P8   | P0  |
| X   | X  | addi | p  | P0  |    |     | x3 | P7   | P1  |
| X   |    | sub  | p  | P6  | p  | P5  | x6 | P5   | P3  |
| X   |    | add  | p  | P1  |    | P3  | x3 | P1   | P2  |
| X   |    | ld   | p  | P0  |    |     | x6 | P3   | P4  |

Execute & Commit

### ACKNOWLEDGEMENTS

- These slides contain material developed and copyright by:
  - Arvind (MIT)
  - Krste Asanovic (MIT/UCB)
  - Joel Emer (Intel/MIT)
  - James Hoe (CMU)
  - John Kubiatowicz (UCB)
  - David Patterson (UCB)
- MIT material derived from course 6.823
- UCB material derived from course CS252