High-Performance Matrix Computations --- 2012-13



    Prerequisites

    Basic knowledge of numerical linear algebra.
    Principles of algorithms and programming.
    Familiarity with Matlab and C.

    Overview

    The course centers on developing efficient numerical algorithms through the interplay between mathematics and computer architecture.
    We will cover all of the following topics. Please also visit [HPMC 2011].

    processor architecture (CPU, memory system, interconnect)
    floating-point operations
    roofline model
    vectorization
    matrix-matrix product, BLAS
    factorizations
    method of multiple relatively robust representations (MRRR, also written MR3)
    blocked algorithms
    algorithms by blocks
    dynamic scheduling
    data parallelism
    shared memory vs. distributed memory paradigm
    synchronization vs. communication



  • Summer semester 2013.

  • CAMPUS #: 13ss-24886.

  • Lectures begin: Tuesday, April 9, 5pm.

  • Lectures & Exercises:
    Tuesday, Thursday: 17.00-18.30. Rogowski 115 - AICES seminar room (Schinkelstrasse 2)

  • Office hours: Tuesdays, 11am-1pm. AICES R432 (Rogowski Building - Schinkelstrasse 2)

    • Schedule

    • April, Tuesday 9; introduction [Intro]

    • April, Thursday 11; computer architecture [lecture 1]

    • April, Tuesday 16; performance [lecture 2] [timer]

    • April, Thursday 18; ger vs. gemm [lecture 3] [Mathematica notebook]
      [Assignment #1]

    • April, Tuesday 23; BLAS, storage, assignment review [BLAS reference] [column vs. row]

      [Assignment #1 again] Deadline: April, Monday 29th, midnight.
      Target: RZ's cluster, Harpertown nodes.
      To access a Harpertown node: log in to cluster-linux-xeon.rz.rwth-aachen.de
      To submit batch jobs (optional): add #BSUB -R "select[model==Harpertown]" to your job script.

    • April, Thursday 25;

    • April, Tuesday 30; GEMM, blocked vs. unblocked algorithms. Partitioned Matrix Expression (PME). [lecture 4]

    • May, Thursday 2; blocked vs. unblocked, part 2. Cholesky factorization.

    • May, Tuesday 7; what's behind GEMM. [99% of peak] → How To Optimize Gemm

    • May, Tuesday 14; Locality, modularity. Matrix factorizations.

    • May, Thursday 16; GPU part 1. NVIDIA Fermi architecture, CUDA: Execution Model, Programming Model. [material] [CUDA cheat sheet]

    • May, Tuesday 28; GPU part 2. CUDA: Global Memory, Shared Memory. [material] [GPUs on RWTH's cluster]

    • June, Tuesday 4; GPU part 3. CUDA Optimization: Streams, async. execution, occupancy. [material]

    • June, Thursday 6; GPU part 4. NVIDIA Kepler Architecture, CUBLAS, MAGMA, OpenACC. [material]

    • June, Thursday 13; algorithms by blocks. [Uppsala]

    • June, Tuesday 18; GPU part 5/5. Introduction to OpenCL. [material]

    • June, Thursday 20; reduction to tridiagonal form. [material]
      Assignment #3: implement the unblocked and blocked reduction to tridiagonal form in Matlab.
      Deadline: Monday, July 1st, midnight.

    • June, Tuesday 25; tridiagonal eigenproblem. Intro and eigenvalues.

    • June, Thursday 27; tridiagonal eigenproblem.

    • July, Tuesday 2; MRRR, part 1.

    • July, Thursday 4; MRRR, part 2. [material]

    • July, Tuesday 9; project assignment. [projects]

    • July, Thursday 11; collective communications [paper].

    • July, Tuesday 16; matrix distributions.

    • July, Thursday 18; projects discussion.


    • Exam dates - by appointment


    • before Friday, July 26;

    • between August 15 and August 30;

    • between October 2 and October 13;