

- Pipelining: issue one instruction every cycle (CPI $\rightarrow$ 1)
- Compiler scheduling (static scheduling) reduces the impact of dependences
  - Increases compiler complexity, especially for global scheduling (across basic blocks)
  - Limited information at compile time (branch outcomes, memory addresses, cache misses)
  - Not portable to different pipeline implementations
- Hardware scheduling so far: in-order instruction execution
  - Instructions after a stalled instruction must wait even if independent



- Example:
  - DIV.D F0,F2,F4 ; F0 = F2/F4
  - ADD.D F10,F0,F8 ; F10 = F0+F8
  - SUB.D F12,F8,F14 ; F12 = F8-F14
  - DIV.D is a long-latency operation
  - ADD.D depends on DIV.D but SUB.D does not
- Solution: out-of-order execution
  - Detect that ADD.D depends on DIV.D and stall it
  - Detect that SUB.D is not dependent and execute it
  - Now SUB.D executes before ADD.D even though it comes after it in program order
  - Hardware must be able to look ahead of blocked instructions
  - Multiple functional units can be effectively used
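A rough way to see the benefit is to compute when each instruction of the example could start executing. The Python sketch below is illustrative only: `start_cycles` is a hypothetical helper, it borrows the latencies used later in these notes (FP divide 40 cycles, FP add 2 cycles), and it assumes a new instruction becomes eligible each cycle, that every instruction has its own functional unit, and that only RAW dependences matter.

```python
# Each entry: (opcode, destination, sources, latency in cycles).
PROGRAM = [
    ("DIV.D", "F0", ("F2", "F4"), 40),
    ("ADD.D", "F10", ("F0", "F8"), 2),
    ("SUB.D", "F12", ("F8", "F14"), 2),
]


def start_cycles(program, in_order):
    """Cycle in which each instruction begins executing.

    Assumptions: instruction i is eligible no earlier than cycle i, each
    instruction has its own functional unit, and only RAW is modelled.
    """
    ready = {}    # register -> cycle in which its new value becomes available
    starts = []
    for i, (_op, dest, srcs, latency) in enumerate(program):
        # Wait for eligibility and for every source operand (RAW).
        start = max([i] + [ready.get(s, 0) for s in srcs])
        if in_order and starts:
            # In-order execution: may not start before an older instruction starts.
            start = max(start, starts[-1])
        starts.append(start)
        ready[dest] = start + latency
    return starts


print("in-order:    ", start_cycles(PROGRAM, in_order=True))   # [0, 40, 40]
print("out-of-order:", start_cycles(PROGRAM, in_order=False))  # [0, 40, 2]
```

In the in-order case SUB.D is held behind the stalled ADD.D until cycle 40; out of order it starts at cycle 2, while DIV.D is still running.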



- Instruction fetch: fetch instruction from memory
- Instruction issue: decode instruction, check for structural hazards, and send to execution units
- Instruction execution: once dependences are cleared, read the operand registers and execute the instruction
- Instruction completion (also called retire or commit): finish the instruction and update processor state
- Possible combinations include:
  - In-order issue, execution, and completion
  - In-order issue, out-of-order execution, and in-order completion
  - Out-of-order issue, execution, and completion



- Read after Write RAW (Flow, True)
  - MUL R3, R1, R2
  - DADD R5, R3, R4
- Write after Read WAR (Anti, Name)
  - MUL R3, R1, R2
  - DADD R1, R5, R6
- Write after Write WAW (Output, Name)
  - MUL R3, R1, R2
  - DADD R3, R4, R5



- Handles all RAW, WAR, and WAW hazards with the necessary stalls, but lets independent instructions proceed
- Step 1: Issue (part of the original ID stage)
  - Issue an instruction to a functional unit only if the unit is free and no earlier active instruction writes the same destination register (WAW)
- Step 2: Read operands (part of the original ID stage)
  - Wait until no earlier instruction still has to write a source register, then read the operands from the register file (RAW)
- Step 3: Execute (original EXE stage)
  - Execute the instruction and notify the scoreboard when done
- Step 4: Write result (original WB stage)
  - Before writing the register file, wait until earlier instructions that need the old value of the destination register have read their operands (WAR)



- Instruction status: which of the four steps (Issue, Read operands, Execute, Write result) the instruction is in
- Functional unit status:
  - Busy: whether the functional unit is in use
  - Op: operation to perform (e.g., add, sub)
  - $F_i$: destination register
  - $F_j$, $F_k$: source registers
  - $Q_j$, $Q_k$: functional units producing $F_j$ and $F_k$
  - $R_j$, $R_k$: flags indicating that $F_j$ and $F_k$ are ready but not yet read; set to "No" after the operands are read
- Register result status: for each register, the functional unit that will write it next; this reservation is also how WAW hazards are detected

# Scoreboarding Pipeline Control



| Hazard checked | Instruction status | Wait until | Bookkeeping |
|---|---|---|---|
| WAW (and structural) | Issue | not Busy(FU) and not Result(`D`) | Busy(FU) ← Yes; Op(FU) ← op;<br>Fi(FU) ← `D`; Fj(FU) ← `S1`; Fk(FU) ← `S2`;<br>Qj ← Result(`S1`); Qk ← Result(`S2`);<br>Rj ← not Qj; Rk ← not Qk; Result(`D`) ← FU |
| RAW | Read operands | Rj and Rk | Rj ← No; Rk ← No |
| | Execution complete | Functional unit done | |
| WAR | Write result | ∀f ((Fj(f) ≠ Fi(FU) or Rj(f) = No) and<br>(Fk(f) ≠ Fi(FU) or Rk(f) = No)) | ∀f (if Qj(f) = FU then Rj(f) ← Yes);<br>∀f (if Qk(f) = FU then Rk(f) ← Yes);<br>Result(Fi(FU)) ← 0; Busy(FU) ← No |
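The bookkeeping in this table maps almost line for line onto code. The sketch below is a hypothetical Python model (`FU`, `Scoreboard`, and the method names are illustrative, not from the notes): each call performs one step for one unit and cycle timing is left out. One liberty is taken: when a result is written, the waiting unit's Qj/Qk is also cleared, so a later reuse of the same unit cannot set Rj/Rk again; the table does not show this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FU:
    """One row of the functional unit status table."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source register 1
    fk: Optional[str] = None   # source register 2
    qj: Optional[str] = None   # unit producing Fj (None: value is in the register file)
    qk: Optional[str] = None   # unit producing Fk
    rj: bool = False           # Fj ready and not yet read
    rk: bool = False           # Fk ready and not yet read


class Scoreboard:
    def __init__(self, unit_names: List[str]):
        self.fus: Dict[str, FU] = {n: FU() for n in unit_names}
        self.result: Dict[str, str] = {}   # register result status: register -> writing unit

    def issue(self, name: str, op: str, d: str, s1: Optional[str], s2: Optional[str]) -> bool:
        fu = self.fus[name]
        if fu.busy or d in self.result:    # structural hazard or WAW: stall
            return False
        fu.busy, fu.op, fu.fi, fu.fj, fu.fk = True, op, d, s1, s2
        fu.qj, fu.qk = self.result.get(s1), self.result.get(s2)
        fu.rj, fu.rk = fu.qj is None, fu.qk is None
        self.result[d] = name
        return True

    def read_operands(self, name: str) -> bool:
        fu = self.fus[name]
        if not (fu.rj and fu.rk):          # RAW: a source is still being produced
            return False
        fu.rj = fu.rk = False
        return True

    def write_result(self, name: str) -> bool:
        """Assumes the unit has finished executing."""
        fu = self.fus[name]
        for f in self.fus.values():        # WAR: a unit still has to read the old Fi
            if (f.fj == fu.fi and f.rj) or (f.fk == fu.fi and f.rk):
                return False
        for f in self.fus.values():        # wake up units waiting for this result
            if f.qj == name:
                f.qj, f.rj = None, True
            if f.qk == name:
                f.qk, f.rk = None, True
        del self.result[fu.fi]             # Result(Fi(FU)) <- 0
        self.fus[name] = FU()              # Busy(FU) <- No (row cleared)
        return True


sb = Scoreboard(["Integer", "Mult1", "Mult2", "Add", "Divide"])
sb.issue("Integer", "Load", "F2", None, "R3")      # L.D F2,45(R3)
sb.read_operands("Integer")
sb.issue("Mult1", "Mult", "F0", "F2", "F4")        # MUL.D F0,F2,F4: Qj = 'Integer', Rj = False
print(sb.read_operands("Mult1"))                    # False: F2 not yet written (RAW)
print(sb.issue("Mult2", "Mult", "F0", "F2", "F4"))  # False: F0 already reserved (WAW)
```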



Instruction sequence:

| Instruction | Operands |
|-------------|-------------|
| L.D | F6, 34(R2) |
| L.D | F2, 45(R3) |
| MUL.D | F0, F2, F4 |
| SUB.D | F8, F6, F2 |
| DIV.D | F10, F0, F6 |
| ADD.D | F6, F8, F2 |

- Latencies:
  - Integer  $\rightarrow$  1 cycle
  - FP add  $\rightarrow$  2 cycles
  - FP multiply  $\rightarrow$  10 cycles
  - FP divide  $\rightarrow$  40 cycles
- Functional units: 1 integer unit (also used for loads and stores), 1 FP adder, 2 FP multipliers, 1 FP divider
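The cycle numbers of this example can be reproduced with a small cycle-level model. The Python sketch below is hypothetical (`Instr`, `simulate`, `before`, and `pending` are illustrative names, not the CDC 6600 hardware) and rests on stated assumptions: in-order issue of at most one instruction per cycle; an event in cycle t becomes visible to other instructions only from cycle t + 1 (so a unit freed by a write can be reissued the next cycle, and a register written in cycle t can be read in t + 1); and a load's only register source is its base register.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Latencies and functional units from the example (loads use the integer unit).
LATENCY = {"int": 1, "add": 2, "mult": 10, "div": 40}
UNITS = {"int": ["Integer"], "add": ["Add"], "mult": ["Mult1", "Mult2"], "div": ["Divide"]}

# (text, unit class, destination, register sources)
PROGRAM = [
    ("L.D F6,34(R2)", "int", "F6", ("R2",)),
    ("L.D F2,45(R3)", "int", "F2", ("R3",)),
    ("MUL.D F0,F2,F4", "mult", "F0", ("F2", "F4")),
    ("SUB.D F8,F6,F2", "add", "F8", ("F6", "F2")),
    ("DIV.D F10,F0,F6", "div", "F10", ("F0", "F6")),
    ("ADD.D F6,F8,F2", "add", "F6", ("F8", "F2")),
]


@dataclass(eq=False)
class Instr:
    text: str
    kind: str
    dest: str
    srcs: Tuple[str, ...]
    unit: Optional[str] = None
    producers: Dict[str, "Instr"] = field(default_factory=dict)  # plays the role of Qj/Qk
    issue: Optional[int] = None
    read: Optional[int] = None
    complete: Optional[int] = None
    write: Optional[int] = None


def before(cycle: Optional[int], t: int) -> bool:
    """An event is visible in cycle t only if it happened in an earlier cycle."""
    return cycle is not None and cycle < t


def pending(j: Instr, t: int) -> bool:
    """Issued and not yet written: still holds its unit and its Result(dest) entry."""
    return j.issue is not None and not before(j.write, t)


def simulate(program):
    prog = [Instr(*p) for p in program]
    nxt, t = 0, 0
    while any(i.write is None for i in prog):
        t += 1
        # Issue (in order, one per cycle): stall on a busy unit or a pending writer of dest (WAW).
        if nxt < len(prog):
            ins = prog[nxt]
            busy = {j.unit for j in prog if pending(j, t)}
            free = [u for u in UNITS[ins.kind] if u not in busy]
            if free and not any(j.dest == ins.dest and pending(j, t) for j in prog):
                ins.producers = {s: j for s in ins.srcs for j in prog if j.dest == s and pending(j, t)}
                ins.issue, ins.unit = t, free[0]
                nxt += 1
        for ins in prog:
            if before(ins.issue, t) and ins.read is None:
                # Read operands: every producer must already have written (RAW).
                if all(before(p.write, t) for p in ins.producers.values()):
                    ins.read, ins.complete = t, t + LATENCY[ins.kind]
            elif before(ins.complete, t) and ins.write is None:
                # Write result (WAR): block while an issued instruction has not yet read
                # the old value of dest and that source is not waiting on a producer.
                war = any(
                    f is not ins and before(f.issue, t) and not before(f.read, t)
                    and ins.dest in f.srcs
                    and (ins.dest not in f.producers or before(f.producers[ins.dest].write, t))
                    for f in prog
                )
                if not war:
                    ins.write = t
    return prog


for i in simulate(PROGRAM):
    print(f"{i.text:<16} issue={i.issue:>2} read={i.read:>2} complete={i.complete:>2} write={i.write:>2}")
```

Under these assumptions the model yields the same Issue, Read operands, Execution complete, and Write result cycles as the final instruction-status table of this example; for instance, ADD.D completes at 16 but cannot write F6 until 22, because DIV.D reads F6 only at 21.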



Clock 7: MULT can't read its operands (F2) because LD #2 hasn't finished.

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | |
| MULTD F0 | F2 | F4 | 6 | | | |
| SUBD F8 | F6 | F2 | 7 | | | |
| DIVD F10 | F0 | F6 | | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | Yes | Load | F2 | | R3 | | | | No |
| | Mult1 | Yes | Mult | F0 | F2 | F4 | Integer | | No | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Subd | F8 | F6 | F2 | | Integer | Yes | No |
| | Divide | No | | | | | | | | |

Register result status:

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|
| 7 | Mult1 | Integer | | | Add | | | | |







Clock 9: Now MULT and SUBD can both read F2.

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD F0 | F2 | F4 | 6 | 9 | | |
| SUBD F8 | F6 | F2 | 7 | 9 | | |
| DIVD F10 | F0 | F6 | 8 | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 10 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | No | No |
| | Mult2 | No | | | | | | | | |
| 2 | Add | Yes | Sub | F8 | F6 | F2 | | | No | No |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status:

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|
| 9 | Mult1 | | | | Add | Divide | | | |



![](_page_24_Picture_1.jpeg)

![](_page_24_Figure_2.jpeg)

![](_page_25_Picture_1.jpeg)

![](_page_25_Figure_2.jpeg)

![](_page_26_Picture_1.jpeg)

![](_page_26_Figure_2.jpeg)

![](_page_27_Picture_1.jpeg)

Scoreboard state after all six instructions have written their results:

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD F0 | F2 | F4 | 6 | 9 | 19 | 20 |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD F10 | F0 | F6 | 8 | 21 | 61 | 62 |
| ADDD F6 | F8 | F2 | 13 | 14 | 16 | 22 |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| 0 | Divide | No | | | | | | | | |

Register result status (all entries empty):

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|

![](_page_28_Picture_1.jpeg)

- Dynamically schedules instructions
- Forces instructions to wait on RAW, WAR, WAW dependences and structural hazards
- First used in the CDC 6600 in 1964 and yielded performance improvements of 1.7 to 2.5 times
- The scoreboard's hardware cost (size) is equivalent to that of one functional unit

![](_page_29_Picture_1.jpeg)

- No forwarding: operands are read from the register file
- Structural hazards stall at issue
- WAW hazards stall at issue
- WAR hazards stall at write result

<u>Next lecture:</u> Dynamic scheduling using Tomasulo's algorithm

- Avoids WAW & WAR via register renaming
- Supports forwarding using a centralized result bus
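Register renaming can be illustrated on the scoreboard example. In the hypothetical Python sketch below (`rename` and the physical names `P0`, `P1`, and so on are made up for illustration), every destination gets a fresh name and each source reads the most recent mapping. It shows only the general renaming idea; Tomasulo's algorithm implements renaming with reservation-station tags rather than a pool of physical names.

```python
from itertools import count
from typing import Dict, List, Tuple

Instr = Tuple[str, str, Tuple[str, ...]]  # (opcode, destination, sources)


def rename(program: List[Instr]) -> List[Instr]:
    fresh = (f"P{n}" for n in count())   # hypothetical physical register names
    mapping: Dict[str, str] = {}          # architectural register -> latest physical name
    renamed = []
    for op, dest, srcs in program:
        # Rename sources first, so an instruction that reads and writes the same
        # register still reads the old value. Unmapped names come from the register file.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        mapping[dest] = next(fresh)
        renamed.append((op, mapping[dest], new_srcs))
    return renamed


EXAMPLE = [
    ("L.D", "F6", ("R2",)),
    ("L.D", "F2", ("R3",)),
    ("MUL.D", "F0", ("F2", "F4")),
    ("SUB.D", "F8", ("F6", "F2")),
    ("DIV.D", "F10", ("F0", "F6")),
    ("ADD.D", "F6", ("F8", "F2")),
]

for op, dest, srcs in rename(EXAMPLE):
    print(op, dest, srcs)
```

After renaming, ADD.D writes `P5` while DIV.D still reads `P0` (the value loaded into F6), so the WAR hazard that delayed ADD.D's write in the scoreboard example disappears; only the RAW dependences remain.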