#### **Scoreboard Limitations**



- No forwarding: operands are read from the register file
- Structural hazards stall at issue
- WAW hazards stall at issue
- WAR hazards stall at write-back
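The three hazard types above can be told apart from the destination and source registers of two instructions. The small Python sketch below classifies the hazards a younger instruction has with an older one. `Instr` and `classify_hazards` are hypothetical helpers for illustration, not part of any real scoreboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """A hypothetical three-operand instruction: dest <- op(srcs)."""
    op: str
    dest: str
    srcs: tuple[str, ...]


def classify_hazards(older: Instr, younger: Instr) -> set[str]:
    """Return the hazards the younger instruction has with the older one."""
    hazards = set()
    if older.dest in younger.srcs:
        hazards.add("RAW")  # younger reads what older writes (true dependence)
    if younger.dest in older.srcs:
        hazards.add("WAR")  # younger overwrites a source older may not have read yet
    if younger.dest == older.dest:
        hazards.add("WAW")  # both write the same register; write order matters
    return hazards


div = Instr("DIV.D", "F10", ("F0", "F6"))
add = Instr("ADD.D", "F6", ("F8", "F2"))
print(classify_hazards(div, add))  # {'WAR'}
```

A scoreboard has to stall on the WAR and WAW cases as listed above. Only RAW reflects real data flow, which is why renaming can remove the other two.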



IBM 360/91: ~3 years after CDC 6600

- Had very few registers
  - 4 floating-point registers in the IBM 360 vs 8 in the CDC 6600
  - Resulted in frequent data dependencies
  - → Needed a way to efficiently resolve WAR & WAW dependencies to maximize the opportunity for instruction reordering
- Had longer memory & functional unit latencies

 $\rightarrow$  Needed to find independent instructions in the presence of long-latency stalls

Solution: Tomasulo's Algorithm for improved dynamic scheduling

#### Tomasulo's Algorithm: key ideas



- Controls and buffers distributed with functional units (scoreboard centralizes this functionality)
  - Called reservation stations
  - Prevents front-end blocking due to a structural hazard
- Register names replaced by pointers to reservation station entries: register renaming
  - Register renaming avoids WAR & WAW hazards by renaming all destination registers
    - Older readers no longer endangered by younger writers (avoids WAR hazard)
    - Newly issued readers always get the value from the most recent (in program order) writer (avoids WAW hazard)
- Common data bus (CDB) broadcasts results to all waiting reservation stations, store buffers and the register file
  - Provides **forwarding** functionality



- Register renaming accomplished through reservation stations (RS) containing:
  - The instruction
  - Operand values (when available)
  - RS number(s) of instruction(s) providing the operand values











# Example:

| Instruction      |   | Reservation station entry  |
|------------------|---|----------------------------|
| LD r0, 0(r7)     | → | RS1: LD RS1, 0, 0x1000     |
| LD r1, 8(r7)     | → | RS2: LD RS2, 8, 0x1000     |
| MUL.D r4, r0, r1 | → | RS3: MUL.D RS3, RS1, RS2   |
| ADD.D r1, r0, r3 | → | RS4: ADD.D RS4, RS1, 0x16  |

# WAW dependence avoided through renaming!

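Q: Which r1 should be written into the register file?

A: Only the last writer (ADD.D $\rightarrow$ RS4). The rename table maps r1 to RS4, so the LD result (RS2) is forwarded to MUL.D but never written to r1. This ensures that the register file holds the correct value even if the instructions complete out of order.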
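The renaming in this example can be reproduced with a short Python sketch. It assumes, as the entries above suggest, that r7 holds 0x1000 and r3 holds 0x16, and that no instruction writes back while these four are being issued (otherwise its rename-table entry would already have been cleared). The `rename` function and the tuple layout of each entry are hypothetical.

```python
def rename(program, regfile):
    """Issue instructions in program order, renaming destinations to RS tags.

    program: list of (op, dest, srcs); each src is a register name (str)
             or an immediate such as a load offset (int).
    regfile: values of registers that no in-flight instruction will write.
    Returns (entries, rename_table).
    """
    rename_table = {}   # register -> tag of its most recent producer
    entries = []        # (tag, op, operands), one per reservation station
    for n, (op, dest, srcs) in enumerate(program, start=1):
        tag = f"RS{n}"
        operands = []
        for s in srcs:
            if isinstance(s, int):
                operands.append(s)                # immediate: used as-is
            elif s in rename_table:
                operands.append(rename_table[s])  # pending: wait for producer's tag
            else:
                operands.append(regfile[s])       # ready: copy value from register file
        entries.append((tag, op, operands))
        rename_table[dest] = tag                  # later readers of dest see this tag
    return entries, rename_table


program = [
    ("LD", "r0", [0, "r7"]),
    ("LD", "r1", [8, "r7"]),
    ("MUL.D", "r4", ["r0", "r1"]),
    ("ADD.D", "r1", ["r0", "r3"]),
]
entries, rename_table = rename(program, {"r3": 0x16, "r7": 0x1000})
for tag, op, operands in entries:
    shown = ", ".join(o if isinstance(o, str) else hex(o) for o in operands)
    print(f"{tag}: {op} {tag}, {shown}")
print(rename_table)
```

The final rename table maps r1 to RS4 rather than RS2, which is exactly why only the ADD.D result reaches r1, while MUL.D still receives the LD value through the RS2 tag.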



- As each instruction is issued to an RS:
  - Available values are fetched (from register file) and buffered at the instruction's RS
  - Dataflow (RAW) dependencies are resolved by replacing source register specifiers with the tags of the RSs producing those values
  - A result status register (or rename table) maps each architectural register to the most recent RS producing its value



- Handles RAW with proper stalls and eliminates WAR and WAW through register renaming
- Step 1: Issue
  - Get next instruction from the fetch queue and issue it to the reservation stations if there is a free reservation station
  - Read operands from register file if available or rename operands if pending (resolve WAR, WAW)
- Step 2: Execute
  - Monitor the CDB for operand(s). Once available, store into all reservation stations waiting for it
  - Execute instruction when both operands are ready in the reservation station (RAW)
- Step 3: Write result
  - Put the result on CDB and write it into the register file (if last producer) and all reservation stations waiting on it (RAW)



# IBM S/360 model 91 used Tomasulo's Algorithm

- Dynamic O-O-O execution
- Tags (RS #'s) used to name flow dependencies
- 5 reservation stations
- 6 load buffers
- Issue instructions to reservation stations, load buffers and store buffers
- Instructions wait in reservation stations or store buffers until all their operands are collected
- Functional units broadcast result and tag on the Common Data Bus (CDB) for all reservation stations, store buffers and FP register file



Reservation stations associated with functional units: simplifies scheduling & management of structural hazards



- Op: Operation to be performed
- Qj, Qk: Reservation stations producing the source operands (0 if the operand is already in Vj/Vk)
- Vj, Vk: Values of source operands
- Busy: indicates whether reservation station is busy
- Register result status Qi: indicates which RS will write each register, if one exists. Blank otherwise.



#### Instruction Issue:

Get next instruction from head of the issue queue

If reservation station RS is available then:

```
For each p in { j, k } representing operand register u:
    If Reg[u].Qi == 0 then { RS.Vp = Reg[u].value; RS.Qp = 0 }   // value ready now
    If Reg[u].Qi != 0 then RS.Qp = Reg[u].Qi                      // value not yet ready
RS.Busy = 1                                                        // reserve this RS
RS.Op = instruction opcode                                         // set the operation
Reg[d].Qi = RS.id                                                  // rename destination register d to this RS
```
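The issue pseudocode translates fairly directly into Python. In this sketch the `RS` and `RegStatus` classes mirror the fields listed earlier (Op, Qj/Qk, Vj/Vk, Busy, Qi), with tag 0 meaning "value ready". The class and function names, integer tags and passing the destination register explicitly are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class RegStatus:
    value: float = 0.0
    qi: int = 0  # tag of the RS that will write this register; 0 = value in register file


@dataclass
class RS:
    tag: int            # this station's identifier (non-zero)
    busy: bool = False
    op: str = ""
    vj: float = 0.0     # operand values, valid only when the matching Q is 0
    vk: float = 0.0
    qj: int = 0         # producer tags; 0 = operand already in vj / vk
    qk: int = 0


def read_operand(reg: dict[str, RegStatus], u: str) -> tuple[float, int]:
    """Return (value, 0) if register u is ready, else (0.0, producer tag)."""
    if reg[u].qi == 0:
        return reg[u].value, 0        # value ready now
    return 0.0, reg[u].qi             # value not yet ready: wait on this tag


def issue(rs: RS, op: str, dest: str, src_j: str, src_k: str,
          reg: dict[str, RegStatus]) -> None:
    """Issue one instruction into the free reservation station rs."""
    if rs.busy:
        raise RuntimeError("structural hazard: no free RS, issue must stall")
    # Read sources before renaming dest, so e.g. ADDD F2, F2, F4 sees the old F2.
    rs.vj, rs.qj = read_operand(reg, src_j)
    rs.vk, rs.qk = read_operand(reg, src_k)
    rs.busy = True                    # reserve this RS
    rs.op = op                        # set the operation
    reg[dest].qi = rs.tag             # rename: later readers of dest wait on rs


# Hypothetical state: F2 is still being loaded by the station with tag 2,
# while F4 already holds 2.5 in the register file.
reg = {r: RegStatus() for r in ("F0", "F2", "F4")}
reg["F2"].qi = 2
reg["F4"].value = 2.5
mult1 = RS(tag=4)
issue(mult1, "MULTD", "F0", "F2", "F4", reg)
print(mult1)
```

After issue, the station waits on tag 2 for its first operand, already holds 2.5 as its second, and F0 now points at tag 4. Reading both sources before updating the destination's Qi keeps an instruction from waiting on itself.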

#### Execution:

Wait until (RS.Qj == 0) and (RS.Qk == 0), and whilst waiting:
 For each p in { j, k }
 If CDB.tag == RS.Qp then { RS.Vp = CDB.value; RS.Qp = 0 }
When (RS.Qj == 0) and (RS.Qk == 0), perform operation in RS.Op

#### Write Result:

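When the CDB is free, broadcast CDB = { tag = RS.id, value = RS.result } and clear RS.Busy. Every reservation station waiting on RS.id captures the value, and every register whose Qi equals RS.id is written and has its Qi cleared.

Inf3 Computer Architecture - 2016-2017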
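The execution and write-result steps meet on the CDB: one broadcast wakes every waiting reservation station and updates the register file. The Python sketch below assumes the same convention as the pseudocode (0 means "operand ready") and uses station names as tags. `Station`, `broadcast` and the dictionaries are hypothetical. The register file is written only if a register's Qi still names the broadcasting station, which is how the most recent writer wins in the WAW case.

```python
from dataclasses import dataclass, field


@dataclass
class Station:
    name: str                      # the station's tag, e.g. "Mult1"
    busy: bool = True
    v: dict = field(default_factory=lambda: {"j": None, "k": None})
    q: dict = field(default_factory=lambda: {"j": 0, "k": 0})  # 0 = operand ready

    def ready(self) -> bool:
        """Execution may start once neither operand still waits on a producer."""
        return self.q["j"] == 0 and self.q["k"] == 0


def broadcast(src, value, stations, reg_value, reg_qi):
    """Write result: put {tag = src.name, value} on the CDB, then free src."""
    for rs in stations:                      # every waiting RS snoops the CDB
        for p in ("j", "k"):
            if rs.busy and rs.q[p] == src.name:
                rs.v[p], rs.q[p] = value, 0  # capture operand, stop waiting
    for r, qi in reg_qi.items():
        if qi == src.name:                   # only if src is still the latest producer
            reg_value[r], reg_qi[r] = value, 0
    src.busy = False


# Hypothetical values: MULTD (Mult1) and SUBD (Add1) both wait on F2 from Load2,
# and F6 has already been renamed to Add2 by the later ADDD.
mult1 = Station("Mult1", v={"j": None, "k": 2.5}, q={"j": "Load2", "k": 0})
add1 = Station("Add1", v={"j": 1.0, "k": None}, q={"j": 0, "k": "Load2"})
reg_value = {"F2": 0.0, "F6": 0.0}
reg_qi = {"F2": "Load2", "F6": "Add2"}

broadcast(Station("Load2"), 4.0, [mult1, add1], reg_value, reg_qi)
broadcast(Station("Load1"), 7.0, [mult1, add1], reg_value, reg_qi)  # stale F6 writer
print(mult1.ready(), add1.ready(), reg_value, reg_qi)
```

The single Load2 broadcast releases both waiting stations at once. The later Load1 broadcast leaves F6 untouched because F6 has already been renamed to Add2.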



- LDs: 2 cycles
- ADDs and SUBDs: 2 cycles
- MULTDs: 10 cycles
- DIVDs: 40 cycles
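With these latencies, the Issue / Exec Comp / Write Result columns in the tables that follow can be reproduced by a small cycle-count model. This Python sketch assumes:

- one instruction is issued per cycle, in program order;
- execution starts the cycle after issue or after the last operand is broadcast, whichever is later;
- the result is written one cycle after completion;
- each source waits only on its most recent earlier writer (renaming).

It does not model CDB contention or running out of reservation stations, neither of which occurs in this example. `LATENCY` and `schedule` are hypothetical names.

```python
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}


def schedule(program):
    """Return (issue, exec_complete, write_result) cycles for each instruction."""
    last_writer = {}   # register -> index of its most recent earlier producer
    timing = []
    for i, (op, dest, srcs) in enumerate(program):
        issue = i + 1  # one instruction issued per cycle, in order
        # RAW only: wait until each source's producer has broadcast on the CDB
        ready = max((timing[last_writer[s]][2] for s in srcs if s in last_writer),
                    default=0)
        start = max(issue, ready) + 1
        complete = start + LATENCY[op] - 1
        timing.append((issue, complete, complete + 1))
        last_writer[dest] = i  # renaming: later readers see only this producer
    return timing


program = [
    ("LD", "F6", ("R2",)),
    ("LD", "F2", ("R3",)),
    ("MULTD", "F0", ("F2", "F4")),
    ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
]
for (op, dest, _), (iss, comp, wr) in zip(program, schedule(program)):
    print(f"{op:5} {dest:3}  issue {iss:2}  complete {comp:2}  write {wr:2}")
```

Because DIVD's F6 source maps to the first LD, the later ADDD that also writes F6 never delays it. ADDD completes in cycle 10, while DIVD completes in cycle 56.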











Instruction status:

| Instruction | Dest | j   | k  | Issue | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|--------------|
| LD          | F6   | 34+ | R2 | 1     |           |              |
| LD          | F2   | 45+ | R3 | 2     |           |              |
| MULTD       | F0   | F2  | F4 |       |           |              |
| SUBD        | F8   | F6  | F2 |       |           |              |
| DIVD        | F10  | F0  | F6 |       |           |              |
| ADDD        | F6   | F8  | F2 |       |           |              |

Load buffers:

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | Yes  | 34+R2   |
| Load2 | Yes  | 45+R3   |
| Load3 | No   |         |

Reservation stations:

| Time | Name  | Busy | Op | Vj | Vk | Qj | Qk |
|------|-------|------|----|----|----|----|----|
|      | Add1  | No   |    |    |    |    |    |
|      | Add2  | No   |    |    |    |    |    |
|      | Add3  | No   |    |    |    |    |    |
|      | Mult1 | No   |    |    |    |    |    |
|      | Mult2 | No   |    |    |    |    |    |

Register result status:

| Clock | F0 | F2    | F4 | F6    | F8 | F10 | F12 | … | F30 |
|-------|----|-------|----|-------|----|-----|-----|---|-----|
| 2     |    | Load2 |    | Load1 |    |     |     |   |     |





Register result status:

| Clock | F0    | F2    | F4 | F6    | F8 | F10 | F12 | … | F30 |
|-------|-------|-------|----|-------|----|-----|-----|---|-----|
| 3     | Mult1 | Load2 |    | Load1 |    |     |     |   |     |





Register result status:

| Clock | F0    | F2    | F4 | F6    | F8   | F10 | F12 | … | F30 |
|-------|-------|-------|----|-------|------|-----|-----|---|-----|
| 4     | Mult1 | Load2 |    | M(A1) | Add1 |     |     |   |     |









Register result status:

| Clock | F0    | F2    | F4 | F6   | F8   | F10   | F12 | … | F30 |
|-------|-------|-------|----|------|------|-------|-----|---|-----|
| 6     | Mult1 | M(A2) |    | Add2 | Add1 | Mult2 |     |   |     |









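Register result status:

| Clock | F0    | F2    | F4 | F6   | F8    | F10   | F12 | … | F30 |
|-------|-------|-------|----|------|-------|-------|-----|---|-----|
| 8     | Mult1 | M(A2) |    | Add2 | (M-M) | Mult2 |     |   |     |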





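Register result status:

| Clock | F0    | F2    | F4 | F6   | F8    | F10   | F12 | … | F30 |
|-------|-------|-------|----|------|-------|-------|-----|---|-----|
| 9     | Mult1 | M(A2) |    | Add2 | (M-M) | Mult2 |     |   |     |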





Register result status:

| Clock | F0    | F2    | F4 | F6   | F8    | F10   | F12 | … | F30 |
|-------|-------|-------|----|------|-------|-------|-----|---|-----|
| 10    | Mult1 | M(A2) |    | Add2 | (M-M) | Mult2 |     |   |     |



Instruction status:

| Instruction | Dest | j   | k  | Issue | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 3         | 4            |
| LD          | F2   | 45+ | R3 | 2     | 4         | 5            |
| MULTD       | F0   | F2  | F4 | 3     |           |              |
| SUBD        | F8   | F6  | F2 | 4     | 7         | 8            |
| DIVD        | F10  | F0  | F6 | 5     |           |              |
| ADDD        | F6   | F8  | F2 | 6     | 10        | 11           |

Load buffers:

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

Reservation stations:

| Time | Name  | Busy | Op    | Vj    | Vk    | Qj    | Qk |
|------|-------|------|-------|-------|-------|-------|----|
|      | Add1  | No   |       |       |       |       |    |
|      | Add2  | No   |       |       |       |       |    |
|      | Add3  | No   |       |       |       |       |    |
| 4    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | M(A1) | Mult1 |    |

Register result status:

| Clock | F0    | F2    | F4 | F6      | F8    | F10   | F12 | … | F30 |
|-------|-------|-------|----|---------|-------|-------|-----|---|-----|
| 11    | Mult1 | M(A2) |    | (M-M+M) | (M-M) | Mult2 |     |   |     |





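Register result status:

| Clock | F0    | F2    | F4 | F6      | F8    | F10   | F12 | … | F30 |
|-------|-------|-------|----|---------|-------|-------|-----|---|-----|
|       | Mult1 | M(A2) |    | (M-M+M) | (M-M) | Mult2 |     |   |     |
|       | Mult1 | M(A2) |    | (M-M+M) | (M-M) | Mult2 |     |   |     |
| 15    | Mult1 | M(A2) |    | (M-M+M) | (M-M) | Mult2 |     |   |     |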



Instruction status:

| Instruction | Dest | j   | k  | Issue | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 3         | 4            |
| LD          | F2   | 45+ | R3 | 2     | 4         | 5            |
| MULTD       | F0   | F2  | F4 | 3     | 15        | 16           |
| SUBD        | F8   | F6  | F2 | 4     | 7         | 8            |
| DIVD        | F10  | F0  | F6 | 5     |           |              |
| ADDD        | F6   | F8  | F2 | 6     | 10        | 11           |

Load buffers:

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

Reservation stations:

| Time | Name  | Busy | Op   | Vj   | Vk    | Qj | Qk |
|------|-------|------|------|------|-------|----|----|
|      | Add1  | No   |      |      |       |    |    |
|      | Add2  | No   |      |      |       |    |    |
|      | Add3  | No   |      |      |       |    |    |
|      | Mult1 | No   |      |      |       |    |    |
| 40   | Mult2 | Yes  | DIVD | M*F4 | M(A1) |    |    |

Register result status:

| Clock | F0   | F2    | F4 | F6      | F8    | F10   | F12 | … | F30 |
|-------|------|-------|----|---------|-------|-------|-----|---|-----|
| 16    | M*F4 | M(A2) |    | (M-M+M) | (M-M) | Mult2 |     |   |     |



Instruction status:

| Instruction | Dest | j   | k  | Issue | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 3         | 4            |
| LD          | F2   | 45+ | R3 | 2     | 4         | 5            |
| MULTD       | F0   | F2  | F4 | 3     | 15        | 16           |
| SUBD        | F8   | F6  | F2 | 4     | 7         | 8            |
| DIVD        | F10  | F0  | F6 | 5     |           |              |
| ADDD        | F6   | F8  | F2 | 6     | 10        | 11           |

Load buffers:

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

Reservation stations:

| Time | Name  | Busy | Op   | Vj   | Vk    | Qj | Qk |
|------|-------|------|------|------|-------|----|----|
|      | Add1  | No   |      |      |       |    |    |
|      | Add2  | No   |      |      |       |    |    |
|      | Add3  | No   |      |      |       |    |    |
|      | Mult1 | No   |      |      |       |    |    |
| 1    | Mult2 | Yes  | DIVD | M*F4 | M(A1) |    |    |

Register result status:

| Clock | F0   | F2    | F4 | F6      | F8    | F10   | F12 | … | F30 |
|-------|------|-------|----|---------|-------|-------|-----|---|-----|
| 55    | M*F4 | M(A2) |    | (M-M+M) | (M-M) | Mult2 |     |   |     |



Instruction status:

| Instruction | Dest | j   | k  | Issue | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 3         | 4            |
| LD          | F2   | 45+ | R3 | 2     | 4         | 5            |
| MULTD       | F0   | F2  | F4 | 3     | 15        | 16           |
| SUBD        | F8   | F6  | F2 | 4     | 7         | 8            |
| DIVD        | F10  | F0  | F6 | 5     | 56        |              |
| ADDD        | F6   | F8  | F2 | 6     | 10        | 11           |

Load buffers:

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

Reservation stations:

| Time | Name  | Busy | Op   | Vj   | Vk    | Qj | Qk |
|------|-------|------|------|------|-------|----|----|
|      | Add1  | No   |      |      |       |    |    |
|      | Add2  | No   |      |      |       |    |    |
|      | Add3  | No   |      |      |       |    |    |
|      | Mult1 | No   |      |      |       |    |    |
| 0    | Mult2 | Yes  | DIVD | M*F4 | M(A1) |    |    |

Register result status:

| Clock | F0   | F2    | F4 | F6      | F8    | F10   | F12 | … | F30 |
|-------|------|-------|----|---------|-------|-------|-----|---|-----|
| 56    | M*F4 | M(A2) |    | (M-M+M) | (M-M) | Mult2 |     |   |     |



Instruction status:

| Instruction | Dest | j   | k  | Issue | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 3         | 4            |
| LD          | F2   | 45+ | R3 | 2     | 4         | 5            |
| MULTD       | F0   | F2  | F4 | 3     | 15        | 16           |
| SUBD        | F8   | F6  | F2 | 4     | 7         | 8            |
| DIVD        | F10  | F0  | F6 | 5     | 56        | 57           |
| ADDD        | F6   | F8  | F2 | 6     | 10        | 11           |

Load buffers:

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

Reservation stations:

| Time | Name  | Busy | Op   | Vj   | Vk    | Qj | Qk |
|------|-------|------|------|------|-------|----|----|
|      | Add1  | No   |      |      |       |    |    |
|      | Add2  | No   |      |      |       |    |    |
|      | Add3  | No   |      |      |       |    |    |
|      | Mult1 | No   |      |      |       |    |    |
|      | Mult2 | Yes  | DIVD | M*F4 | M(A1) |    |    |

Register result status:

| Clock | F0   | F2    | F4 | F6      | F8    | F10    | F12 | … | F30 |
|-------|------|-------|----|---------|-------|--------|-----|---|-----|
| 57    | M*F4 | M(A2) |    | (M-M+M) | (M-M) | Result |     |   |     |



- Register renaming:
  - $Q_j$  and  $Q_k$  can come from any reservation station independent of the register file  $\rightarrow$  in fact we could have many more reservation stations than registers
  - $V_j$  and  $V_k$  store the actual <u>value</u> to be used
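- Parallel release: as soon as a producer broadcasts its result, every instruction waiting on it is released at once (both SUB.D and MUL.D get F2 from Load2)
- No need to wait on WAR and WAW: ADD.D, which writes F6, issues in cycle 6 and executes as soon as SUB.D produces F8. It writes F6 in cycle 11, long before DIV.D starts executing in cycle 17, yet DIV.D is unaffected because it captured its F6 operand (M(A1)) in its reservation station at issue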