High-Performance Matrix Computations --- 2015
- Summer semester 2015.
- CAMPUS #: 15ss-24886.
- Lectures begin: Tuesday, April 14.
- Lectures & Exercises: Tuesdays and Thursdays, 5.15pm. Rogowski 115, AICES seminar room (Schinkelstrasse 2)
- Office hours: Tuesdays, 11am-1pm. AICES R432 (Rogowski Building - Schinkelstrasse 2)
- 14.04 - Introduction. [Notes] [GER]
- 16.04 - Timers. Pipelining. Memory hierarchy, prefetching. [File]
- 21.04 - Locality. Time, performance, TPP, GEMM. [Notes]
- 23.04 - BLAS, scalability. [Notes]
- 28.04 - Storage by Rows & Cols. Caching & cache thrashing. [File] [File]
- 30.04 - Efficiency; turbo vs. heating. [BLAS reference] [File]
- 05.05 - BLAS interface. Tensors & GEMM. [Homework #1]; Due: Friday, May 15th, 1pm.
- 07.05 - Blocked vs. unblocked algorithms. Cholesky factorization. [File]
- 12.05 - Partitioned Matrix Expression, Cholesky variants. [Notes]
- 19.05 - How to optimize GEMM. [rvdgWIKI]
- 21.05 - #flops vs BLAS-level; multithreading (part 1) [File], [# FLOPS]
- 02.06 - review HW1; Least Squares
- 09.06 - ELAPS 1/2. [ELAPS on GitHub]
- 11.06 - ELAPS 2/2 [SandyBridge_MKL.cfg] [cluster batch system] [Homework #2]; Due: Saturday, June 20th, 11.59pm.
- 16.06 - Algorithms by blocks [Paper].
- 18.06 - Roofline Model [Paper]. Eigensolvers (intro).
- 23.06 - Bisection & Inverse Iteration [Section 2.3.1]
- 25.06 - The symmetric eigenproblem
- 30.06 - HW2 review [Archive].
- 02.07 - MRRR, sequential [Section 2.3.2]
- 07.07 - Final project [PDF] [file]
- 09.07 - MRRR, parallelism [Talk]
- 14.07 - Computing Petaflops over Teraflops of data [Paper]
- 16.07 - Semester review
- July 27, 28, 29, 30, 31
- August 3, 4, 5
- October 2, 5
Prerequisites
- Basic knowledge of numerical linear algebra.
- Principles of algorithms and programming.
- Familiarity with Matlab and C.
Overview
The course centers on developing efficient numerical algorithms through a synergy between mathematics and computer architecture.
We will cover the following topics:
- processor architecture (CPU, memory system, interconnect)
- floating-point operations
- matrix-matrix product, BLAS
- method of multiple relatively robust representations (MRRR, also written MR3)
- algorithms by blocks
- shared-memory vs. distributed-memory paradigms
- synchronization vs. communication
Exams
First come, first served.