[Fa17] CS294-73

Software Engineering for Scientific Computing
UC Berkeley

Instructor

Prof. Phillip Colella
colella@eecs.berkeley.edu

Office Hour:
3-4:30 pm, Tuesday & Thursday, 727 Soda Hall

Teaching Assistant (GSI)

``Max'' Chiyu Jiang
chiyu.jiang@berkeley.edu

Office Hour:
9-10:30 am, Monday & Wednesday, Aclove-283E, Soda Hall

Course Description

This course will focus on software development skills and tools that cross most disciplines, and how they are applied towards larger software development goals. The course will be taught in the context of several dominant algorithmic patterns found in scientific computing: dense and sparse linear algebra, structured and unstructured grid methods, particle methods, and fast Fourier transforms (FFTs). The course will be taught using C++, although no prior experience with the language will be assumed. Specific topics will include: working in a typical Unix environment, Makefiles, compilers, revision control systems, debuggers, profilers. Fundamentals of C++: Functions, pointers, references; scoping; classes, encapsulation, inheritance; templates; memory management. Fundamentals of data structures and algorithms as they arise in scientific computing. Asymptotic analysis of algorithmic complexity. Correctness debugging and performance debugging. Engineering larger systems: factoring, reuse, dependencies, composition, use of external libraries.

Course Format

Three hours faculty-led instruction per week, and 4 hours GSI-led, synchronous office hours per week. GSIs will also go over homework assignments, prepare students for their homework assignments and post answer guides to homework assignments after they are submitted. Outside class work should average about 9 hours a week.

Reading List and Resources

Code Complete: A Practical Handbook of Software Construction
http://www.amazon.com/Code-Complete-Practical-Handbook-Construction/dp/0735619670
O’Reilly GNU Make
Stroustrup C++ Language Reference
http://docstore.mik.ua/orelly/unix/lrnunix/
http://www.ee.surrey.ac.uk/Teaching/Unix/

Grading

There will be 5 homework (programming) assignments, adding up to 55% of the final grade. The final project is worth 45% of the grade. There will be no final exam.

Prerequisites

The students will have had to have taken an advanced undergraduate course in an area in which simulation is commonly used and at least one of the algorithmic patterns listed above arises; or the consent of the instructor. This course is not open to graduate students in computer science.

Course requirements

Each student to submit all homework assignments, discussion postings for grading, and complete a final project. A laptop, workstation, or access to a UNIX style system is required, as is installation of an Emacs editor.

Learning objectives for this course

A. Develop skills and tools that cross most disciplines, and how they are applied towards larger software development goals.
B. Understanding of dominant algorithmic patterns found in scientific computing: dense and sparse linear algebra, structured and unstructured grid methods, particle methods, fast Fourier transforms (FFT).
C. Students will learn fundamentals of data structures as they arise in scientific computing, asymptotic analysis of algorithmic complexity.
D. Students will be taught C++, a widely-used programming language in scientific computing; how to develop code in a typical Unix environment; procedures for correctness debugging and performance debugging, and the engineering larger systems.

Class Schedule

Week Content Lecture Slides
1 Course introduction. Algorithmic patterns for scientific computing.
Compiled vs interpreted programming environments.
C/C++, compilers, objects, libraries, executables.
Getting your tools installed: g++ / clang, gmake, git, gdb/lldb.
Lecture1
2 C++ key language features: functions, pointers, pass-by-reference, pass-by-value, return value.
Command-line options, argv, argc. Object-oriented programming;
classes, encapsulation, inheritance. Templates. Memory management.
The heap and the stack. Data basics; managing scope; globals;
class basics; encapsulation; rational default behavior
. Build management using GMake. Revision control using git.
Coding standards (including the coding standard for this course).
The various uses of comments. Intro to Doxygen.
Lecture2
Lecture3
3 The structured grid pattern: class RectMDArray. data+methods.
providing an algebra. Stencil operations.
C++ standard template library (STL): STL pointers, arrays. Debugging. Compiler errors, warnings;
printf debugging, gdb/lldb. Calling functions from the debugger. Memory leaks.
Lecture4
Lecture5
4 Sparse linear algebra pattern. Class Vector, class SparseMatrix;
public interface, private interface, friends (or not).
Unstructured grid pattern: finite element class library.
More on GMake. STL vector. I/O, visualization for unstructured grids.
Lecture6
Lecture7
5 Introduction to asymptotic analysis and algorithmic complexity.
More C++ skills. Building interfaces using inheritance. Streams for I/O.
Lecture8
Lecture9
6 FFT and dense linear algebra patterns.
Introduction to recursion.
Using third-party packages (FFTW, LAPACK, BLAS).
Lecture10
Lecture11
7 Introduction to profiling and simple optimization:
instrumenting profilers, sampling profilers. Call tree. hot-spots and bottlenecks.
The harder optimizations: vector instructions, data locality, auto-tuning, and why you don’t want to do these yourself if you can help it (“buy or build”).
Lecture12
Lecture13
8 Particle pattern. N-body (global and local), particle-in-cell, particle-particle particle-mesh algorithms.
Fast discrete convolution using FFTs and Hockney’s method.
STL container classes: vector, lists, maps. Discussion of the first two programming assignments.
Lecture14
Lecture15
9 Managing project complexity and team development.
Revision control systems revisited. Headers, prototypes.
More on Doxygen. Unit testing, test targets. Making libraries.
triangular dependencies, insulation. Project scope and suggestions.
Final project process: submission to class git repo;
intermediate design / documentation milestones and deadlines.
Dictionaries, keeping track of the explosion of parameters using hash tables.
Lecture16
Lecture17
10 Structured-grid methods revisited. Multigrid for structured-grid discretizations of Poisson’s equation.
Multilevel recursion. Performance analysis and high-performance implementations.
11 Mixed-language programming. Interoperating with C, Fortran, Java, MatLab, Python.
Q&A session for final projects. Discussion of programming assignments 3 and 4.
Lecture20
12 A tour through language and compilation: assembly code and machine instructions;
compilation; AST. The illusion of sequential execution.
Case study: Framework design for structured-grid adaptive mesh refinement.
13 Case study: VisIt - design of a high-performance visualization tool.
Q&A session for final projects. Discussion of programming assignment 5.
14 And all the world is parallel: introduction to programming models, memory models, execution models, and how they interact with the algorithmic patterns.
Where to go for more (CS267). Case study: HPGMG - a high-performance multigrid benchmark.

Homeworks

Assignment Due Date
Programming assignment 1 (doc):
Forward-Euler diffusion problem on a structured grid.
I/O, visualization for structured grids.
Sep 21, 2017
23:59
Programming assignment 2 (doc):
Solving Poisson’s equation on an unstructured grid using point Jacobi.
Oct 06, 2017
23:59
Programming assignment 3 (doc):
Comparing naive implementations of FFT, DGEMM to those written by the pros.
Oct 20, 2017
23:59
Programming assignment 4 (doc):
PIC and PPPM in two dimensions.
Nov 01, 2017
23:59
Programming assignment 5 (doc):
Improving the performance of Multigrid.
Nov 13, 2017
23:59

Final Project

Milestone Due Date
Milestone 1:
Identification of the teams, and of the topic for the project. A mathematical description of the problem being solved. This can be in the form of a research article, combined with a concise summary.
Oct 30, 2017
23:59
Milestone 2:
A complete mathematical specification of the discretization methods and / or numerical algorithms to be used to solve the problem. A specification of what will constitute a computational solution to this problem.
Nov 13, 2017
23:59
Milestone 3:
A top-level design of the software used to solve this problem, in the form of header files for the major classes, the mapping of those classes to the algorithm specification given above, and the specification of a testing process for each class.
Nov 27, 2017
23:59
Final Project Report Due Dec 15, 2017
23:59