# CS 61C: Great Ideas in Computer Architecture

What's Next and Course Review

Instructors: Krste Asanović and Randy H. Katz

http://inst.eecs.Berkeley.edu/~cs61c/fa17

# Agenda

- FireBox: A Hardware Building Block for the 2020 WSC
- Course Review
- Project 3 Performance Competition
- Course On-line Evaluations


11/30/17, Fall 2017 -- Lecture #26

# Warehouse-Scale Computers (WSCs)

- Computing migrating to two extremes:
  - Mobile and the "Swarm" (Internet of Things)
  - The "Cloud"
- Most mobile/swarm apps supported by cloud compute
- All data backed up in cloud
- Ongoing demand for ever more powerful WSCs

#### **Three WSC Generations**

- 1. ~2000: Commercial Off-The-Shelf (COTS) computers, switches, & racks
- 2. ~2010: Custom computers, switches, & racks, but built from COTS chips
- 3. ~2020: Custom computers, switches, & racks using custom chips
- Moving from horizontal Linux/x86 model to vertical integration WSC\_OS/WSC\_SoC (System-on-Chip) model
- Increasing impact of open-source model across generations













# WSC: Moore's Law

- Old CW (conventional wisdom): Moore's Law; with each 18-month technology generation, transistor performance/energy improves and cost/transistor decreases
- New CW: generations slowing to 3 years, then 5+ years; transistor performance/energy improves only slightly

2020: Moore's Law has ended for logic, SRAM, & DRAM (Maybe 3D Flash & new NVM continues?)





### Why Custom Chips in 2020?

- Without transistor scaling, improvements in system capability have to come from above the transistor level
  - More specialized hardware
- WSCs proliferate @ \$100M/WSC
  - Economically sound to divert some \$ if it yields more cost-performance-energy effective chips
- Good news: when scaling stops, custom chip costs drop
  - Amortize investments in capital equipment, CAD tools, libraries, training, ... over decades vs. 18 months
- New HW description languages supporting parameterized generators improve productivity and reduce design cost
  - E.g., Stanford Genesis2; Berkeley's Chisel, based on Scala

# Berkeley RISC-V ISA

- A new completely open ISA
  - Already runs GCC, Linux, glibc, LLVM, ...
  - RV32, RV64, and RV128 variants defined for 32b, 64b, and 128b address spaces
- Base ISA only 40 integer instructions, but supports compiler, linker, OS, etc.
- Extensions provide full general-purpose ISA, including IEEE 754-2008 floating-point
- Comparable ISA-level metrics to other RISCs
- Designed for extension, customization
- Eight 64-bit silicon prototype implementations completed at Berkeley so far (45nm, 28nm)
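To make the "everything is a number" view of the base ISA concrete, here is a minimal Python sketch that packs and unpacks a RISC-V R-type instruction word. The helper names `encode_r_type` and `decode_r_type` are hypothetical, chosen for illustration; the field layout and the `add`/`sub` encodings follow the published RV32I base encoding.

```python
# RISC-V R-type field layout (bit positions, high to low):
#   funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]

OP_OPCODE = 0b0110011  # opcode shared by register-register integer operations


def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode=OP_OPCODE):
    """Pack R-type fields into one 32-bit instruction word (hypothetical helper)."""
    for name, reg in (("rs2", rs2), ("rs1", rs1), ("rd", rd)):
        if not 0 <= reg < 32:
            raise ValueError(f"{name} must be in 0..31, got {reg}")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def decode_r_type(word):
    """Unpack a 32-bit word into its R-type fields (hypothetical helper)."""
    return {
        "funct7": (word >> 25) & 0x7F,
        "rs2": (word >> 20) & 0x1F,
        "rs1": (word >> 15) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rd": (word >> 7) & 0x1F,
        "opcode": word & 0x7F,
    }


# add x3, x1, x2: funct7 = 0 and funct3 = 0 select ADD
add_word = encode_r_type(funct7=0, rs2=2, rs1=1, funct3=0, rd=3)
print(hex(add_word))  # 0x2081b3
```

Changing only `funct7` to `0b0100000` turns the same word into `sub x3, x1, x2`, which shows how few bits separate related instructions in the base encoding.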






#### **FireBox Big Bets**

- Reduce OpEx, manage units of 1,000+ sockets
- Support huge in-memory (NVM) databases directly
- Massive network bandwidth to simplify software
- Re-engineered software/processor/NIC/network for low-overhead messaging between cores, low-latency high-bandwidth bulk memory access
- Data always encrypted on fiber and in bulk storage
- Custom SoC with hardware to support above features
- Open-source hardware generator to allow customization within WSC SoC template

#### **FireBox SoC Highlights**

- ~100 (homogeneous) cores per SoC
  - Simplify resource management, software model
- Each core has vector processor++ (>> SIMD)
  - "General-purpose specialization"
- Uses RISC-V instruction set
  - Open source, virtualizable, modern 64-bit RISC ISA
  - GCC/LLVM compilers, runs Linux
- Cache coherent on-chip so only need one OS per SoC
  - Core/outer caches can be split into local/global scratchpad/cache to improve tail tolerance
- Compress/Encrypt engine to reduce size for storage and transmission, yet always encrypted outside node
- Implemented as parameterized Chisel chip generator
  - Easy to add custom application accelerators, tune architectural parameters

#### **FireBox Hardware Highlights**

- 8-32 DRAM chips on interposer for high BW
  - 32Gb chips give 32-128GB DRAM capacity/node
  - 500GB/s DRAM bandwidth
- Message Passing is RPC: can return/throw exceptions
  - ≈20ns overhead for send or receive, including SW
- ≈100ns latency to access Bulk Memory: ≈2X DRAM latency
- Error Detection/Correction on Bulk Memory
- No Disks in Standard Box; special Disk Boxes instead
  - Disk Boxes for Cold Storage
- ≈50KW/box
  - ≈35KW for 1000 sockets: 20W for socket cores, 10W for socket I/O, 5W for local DRAM
  - ≈15KW for Bulk NVRAM + Crossbar switch
  - 10<sup>-12</sup> joule/bit transfer => Terabit/sec/Watt
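The capacity and power figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the DRAM chip size reads as 32 Gb per chip (which makes 8-32 chips come out to 32-128 GB per node).

```python
GBIT_PER_CHIP = 32   # assumption: 32 Gb per DRAM chip
BITS_PER_BYTE = 8


def dram_capacity_gb(chips):
    """DRAM capacity per node in GB for a given number of chips."""
    return chips * GBIT_PER_CHIP // BITS_PER_BYTE


def socket_power_w(cores_w=20, io_w=10, dram_w=5):
    """Per-socket power: cores + I/O + local DRAM, in watts."""
    return cores_w + io_w + dram_w


def box_power_kw(sockets=1000, bulk_kw=15):
    """Whole-box power: all sockets plus bulk NVRAM and crossbar, in kW."""
    return sockets * socket_power_w() / 1000 + bulk_kw


def terabits_per_second_per_watt(picojoules_per_bit=1.0):
    """1 W = 1 J/s = 1e12 pJ/s, so at 1 pJ/bit one watt moves 1e12 bits/s."""
    return 1.0 / picojoules_per_bit


print(dram_capacity_gb(8), dram_capacity_gb(32))  # 32 128
print(box_power_kw())                             # 50.0
print(terabits_per_second_per_watt())             # 1.0
```

Under these assumptions the per-socket budget (35 W) times 1000 sockets gives the ≈35KW line, and adding ≈15KW for bulk memory and the switch reproduces the ≈50KW/box total.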

#### Revised FireBox Vision, 2017

- Not too many mispredicts: we were surprisingly mostly on track
- By 2015, we realized that flash was going to dominate, so bulk memory will be DRAM+Flash for the foreseeable future
  - Other NVM technology very slow to market, unclear value proposition
  - Flash arrays became huge business
- Custom hardware in datacenter happened faster than expected
  - Microsoft Catapult, Brainwave; Google TPU/TPU2; Amazon F1 instances
- RISC-V took off far faster than expected
- Monolithic photonics becoming credible
- From special-purpose FPGA boards, to F1 to run WSC simulations
- Services as unit of work in datacenter still/more popular
- Security still a big problem




















# Six Great Ideas in Computer Architecture

- 1. Design for Moore's Law (Multicore, Parallelism, OpenMP, Project #3.1)
- 2. Abstraction to Simplify Design (Everything a number, Machine/Assembler Language, C, Project #1; Logic Gates, Datapaths, Project #2)
- 3. Make the Common Case Fast (RISC Architecture, Project #2)
- 4. Dependability via Redundancy (ECC, RAID)
- 5. Memory Hierarchy (Locality, Consistency, False Sharing, Project #3.1)
- 6. Performance via Parallelism/Pipelining/Prediction (the five kinds of parallelism, Project #3.1, #3.2, #4)
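As a small illustration of dependability via redundancy, the sketch below shows single-parity recovery in the style used by parity-based RAID levels: the parity block is the byte-wise XOR of the data blocks, so any one lost block can be rebuilt from the survivors. The function names and sample data are hypothetical, and real arrays add striping, checksums, and failure detection that are omitted here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


data = [b"CS61", b"C is", b"fun!"]   # three equal-length data blocks
parity = xor_blocks(data)            # redundant block stored alongside them

lost_index = 1                       # pretend the second block's disk failed
survivors = data[:lost_index] + data[lost_index + 1:]
print(recover_missing(survivors, parity))  # b'C is'
```

The recovery works because XOR is its own inverse: XOR-ing the parity with every surviving block cancels them out and leaves the missing one.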

### The Five Kinds of Parallelism

- 1. Request Level Parallelism (Warehouse Scale Computers)
- 2. Instruction Level Parallelism (Pipelining, CPI > 1, Project #2)
- 3. (Fine Grain) Data Level Parallelism (AVX SIMD instructions, Project #3)
- 4. (Coarse Grain) Data/Task Level Parallelism (Big Data Analytics, MapReduce/Spark, Project #4)
- 5. Thread Level Parallelism (Multicore Machines, OpenMP, Project #3)
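The coarse-grain data-parallel pattern (item 4) can be sketched as a map step over independent chunks followed by a reduce step that merges partial results. This is a toy word count using Python's standard thread pool, not Spark or the course project code; `map_count` and `word_count` are hypothetical names, and in standard CPython builds threads illustrate the structure rather than deliver a CPU-bound speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_count(chunk):
    """Map step: count the words in one chunk of lines."""
    return Counter(word for line in chunk for word in line.split())


def word_count(lines, workers=4):
    """Split lines into chunks, count each chunk independently, then merge."""
    size = max(1, (len(lines) + workers - 1) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(map_count, chunks):
            total.update(partial)  # reduce step: merge partial counts
    return total


print(word_count(["a b a", "b c", "a"], workers=2))
```

Because each chunk is counted without touching shared state, the same shape scales from threads on one machine to workers spread across a warehouse-scale computer.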





















#### Administrivia (2/3)

2 Final Review Sessions

| Led by | Time                   | Location   | Style                          |
|--------|------------------------|------------|--------------------------------|
| Tutors | Saturday Dec 2, 11-1pm | Cory 540AB | OH, small group                |
| TAs    | Friday Dec 8, 5-8pm    | VLSB 2050  | Lecture style, problem-solving |

- Labs (including Lab 13): the Spark lab is due any day this week; the VM lab is due any day next week
- Last Guerrilla Session is next Tuesday, 7-9 PM @ Cory 293
  - Will review the most difficult topics



#### CS 61C In The News!

WESTERN DIGITAL TO ACCELERATE THE FUTURE OF NEXT-GENERATION COMPUTING ARCHITECTURES FOR BIG DATA AND FAST DATA ENVIRONMENTS

Press release excerpt: Western Digital Corp. (NASDAQ: WDC) announced at the 7th RISC-V Workshop its intent to pursue RISC-V-based compute architectures for increasingly diverse Big Data workloads.






# What Next?

- EECS151 (spring/fall) if you liked digital systems design
- CS152 (spring) if you liked computer architecture
- CS162 (spring/fall) operating systems and system programming
- CS168 (fall) computer networks

# And, in Conclusion ...

- As the field changes, CS 61C had to change too!
- It is still about the software-hardware interface – Programming for performance!
  - Parallelism: Task-, Thread-, Instruction-, and Data-level; MapReduce, OpenMP, C, AVX intrinsics
  - Understanding the memory hierarchy and its impact on application performance
- Interviewers ask what you did this semester!


#### Course On-line Evaluations

- HKN Evaluations today and Electronic Course Evaluations until end of RRR Week! See https://course-evaluations.berkeley.edu