Project 3 Extra Credit Details...

TA: Andrew Gearhart


  1. You may implement any optimization that correctly calculates A’A!!!!
  2. Matrices: The test matrices will be M x N with 100000 <= M <= 200000 and 10 <= N <= 1000. I’m going to try some strange sizes within these bounds, so make sure your fringe cases work correctly!
  3. All submitted code will be run on the machines in 200SD for evaluation purposes. Thus, you may use EC2 to develop code (EC2 guidelines), but please realize that the lab machines have a slower clock rate. (The EC2 machines have the same microarchitecture, so the relative performance improvement over your Part 2 submissions should be similar. Also, you shouldn’t have to use different block sizes for the two types of machines.)
  4. Grading:
     1. Extra credit will be awarded for 3 categories of submissions:
        1. Top 3 serial codes and the top 3 parallel codes (on 8 threads).
           1. Note: Codes within 100 MFlop/s of each other will be considered “tied”, and both will be awarded extra credit.
        2. Codes that achieve over a 20% improvement over the performance of your sgemm-sse.cpp or sgemm-openmp.cpp submissions for Part 2.
        3. Partial credit for thorough implementations of advanced concepts that might not achieve speedup (e.g., you correctly implement Strassen, but it runs slower). Realize that obtaining extra credit via this option requires a significant amount of work and effort.
  5. Submissions:
     1. Codes: sgemm-ec-serial.cpp and/or sgemm-ec-parallel.cpp
     2. Report: ec-report.pdf
        1. Please submit a pdf or doc file!!!! No .docx/.xlsx!!! I use Ubuntu, so printing out these Office files is quite frustrating.
        2. Plot the speedup of sgemm-ec-serial.cpp vs. sgemm-sse.cpp for increasing matrix size. For parallel code submissions, plot weak scaling and strong scaling of sgemm-ec-parallel.cpp and sgemm-openmp.cpp (two lines on the same plot). If the plot requires a fixed problem size, use A = 100000x500. Please plot speedup as the ratio of two performance values!!!!!
        3. Create a list of implemented optimizations, and evaluate the effectiveness of each in a couple of sentences.
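
The correctness and speedup requirements above can be sanity-checked with a few small helpers. What follows is only a rough sketch, not the official grading harness. The names reference_ata, max_abs_diff, mflops, and speedup are hypothetical. The column-major layout and the flop count of 2*M*N*N (every multiply and add counted, symmetry of A’A not exploited) are assumptions, so adjust them to match whatever your Part 2 code and benchmark used.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive reference for C = A'A. ASSUMPTION: A is M x N, stored column-major
// (element (k, i) lives at A[k + i*M]); C is N x N, also column-major.
void reference_ata(int M, int N, const std::vector<float>& A, std::vector<float>& C) {
    assert(A.size() == static_cast<std::size_t>(M) * N);
    C.assign(static_cast<std::size_t>(N) * N, 0.0f);
    for (int j = 0; j < N; ++j) {
        for (int i = 0; i < N; ++i) {
            double sum = 0.0;  // accumulate in double to reduce rounding noise
            for (int k = 0; k < M; ++k) {
                sum += static_cast<double>(A[k + static_cast<std::size_t>(i) * M]) *
                       A[k + static_cast<std::size_t>(j) * M];
            }
            C[i + static_cast<std::size_t>(j) * N] = static_cast<float>(sum);
        }
    }
}

// Largest elementwise difference between two results of equal size.
double max_abs_diff(const std::vector<float>& X, const std::vector<float>& Y) {
    assert(X.size() == Y.size());
    double worst = 0.0;
    for (std::size_t t = 0; t < X.size(); ++t) {
        worst = std::fmax(worst, std::fabs(static_cast<double>(X[t]) - Y[t]));
    }
    return worst;
}

// Performance in Mflop/s. ASSUMPTION: 2*M*N*N flops per call.
double mflops(int M, int N, double seconds) {
    return 2.0 * M * N * N / seconds / 1e6;
}

// Speedup as the ratio of two performance values (new over baseline).
double speedup(double new_mflops, double base_mflops) {
    return new_mflops / base_mflops;
}
```

One way to use this is to run your optimized code and reference_ata on the same input for a handful of awkward sizes (for example, N values that are not multiples of your block size), compare the two results with max_abs_diff against a tolerance you choose, and then report speedup(...) from the measured Mflop/s of the two codes. The reference is slow on full-size inputs, so a smaller M is probably more practical for quick checks.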