HighPerformance Matrix Computations  2011
Prerequisites
Basic knowledge of numerical linear algebra.
Principles of algorithms and programming.
Familiarity with Matlab and C.
Overview
This course is research and programming oriented.

Summer semester 2011.

CAMPUS #: 11ss24886
Prof. Paolo Bientinesi & Prof. Martin Buecker

Lectures
Mondays, 17.1518.45, AICES R432 (Rogowski Building  Schinkelstrasse 2)
Tuesdays, 17.0018.30, AICES R432 (Rogowski Building  Schinkelstrasse 2)

Office hours
Tuesdays, 11am1pm, AICES R432 (Rogowski Building  Schinkelstrasse 2)

First meeting: Tuesday, April 5th, 5pm.
Lectures begin: Monday, April 11th, 5.15pm.

Classes

April 5th
Introduction (Paolo & Martin).

April 11th & 12th
ALU, memory hierarchy, pipelining, caching, cost metrics, performance.
Examples: DGER, DGEMM.
Reference: "Computer organization and design: the hardware/software interface", by
David A. Patterson and John L. Hennessy.

April 18th & 19th
BLAS, HPL.

April 26th
Parallel performance, scalability.

May 2nd, 3rd
LAPACK;
collective communication, 2D matrix distribution, analysis of MV; Strassen.

May 9th
Intro to iterative methods. (Martin)

May 10th
Linear systems vs. eigenproblems.
Question (BLAS 3): Can you use BLAS3 routines if your matrices are stored by rows?
Program (LAPACK): Write a program that computes the
eigenvalues W and eigenvectors Z of a symmetric matrix A
using the Divide & Conquer method. Check the correctness of
your program by computing the residual A Z  Z W and the
orthogonality Z^T Z  I. Link your program against an
optimized BLAS (GotoBLAS, MKL, ...), and if possible compare
the timings with the reference (unoptimized) BLAS.

May 16th & 17th
Generalized, standard, tridiagonal eigenproblems;
Gerschgorin theorems;
reductions & backtransformations.
Program (Mathematica/Matlab): Write a program that mimics the proof of the second Gerschgorin's theorem.

May 23th & 24th
Blocked reduction to tridiagonal form;
bisection, Sturm's count.

May 30th & 31th
GPU programming. (Christian Lessig)
Material:
Lecture 1
Lecture 2
[PDF]
On the design of display processors.
T. H. Myer and I. E. Sutherland.
Commun. ACM 11(6) 1968.

June 6th & 7th
The algorithm of Multiple Relatively Robust Representations (MRRR).
Parallelization for distributed and shared memory architectures.

June 18th & 19th
June 25th & 26th
July 4th & 5th
Sparse matrixvector multiplication. Graphs & hypergraphs. (Martin)

July 11th (final class)
Multithreaded BLAS vs. algorithms by blocks.