ReLAPACK

The Recursive LAPACK collection

Author Elmar Peise
GitHub http://github.com/HPAC/ReLAPACK

The ReLAPACK library provides recursive implementations of a collection of blocked LAPACK routines. These implementations yield the same performance as optimally tuned blocked algorithms, but do not require any tuning themselves. They provide a performance boost not only over vanilla LAPACK but also over highly optimized codes. ReLAPACK's routines provide the same interface and features as LAPACK's blocked counterparts and can thus be used in existing applications with little effort.
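To illustrate what "recursive" means here, the following is a minimal sketch of a recursive inversion of a lower triangular matrix, the operation behind dtrtri. It is not ReLAPACK's code: the function names `invert_lower` and `matmul` are hypothetical, the matrix is stored as a plain list of rows, and the products are computed naively, whereas an optimized library would work in place and hand the large products to tuned kernels. The sketch assumes a square matrix with a nonzero diagonal.

```python
def matmul(A, B):
    """Naive product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def invert_lower(L):
    """Return the inverse of a lower triangular matrix L (list of rows).

    Assumes L is square with a nonzero diagonal. The matrix is split as
        [ L11   0  ]
        [ L21  L22 ]
    and its inverse is
        [ inv(L11)                      0     ]
        [ -inv(L22) * L21 * inv(L11)  inv(L22) ]
    """
    n = len(L)
    if n == 1:
        # Base case: a 1x1 block is inverted directly.
        return [[1.0 / L[0][0]]]

    n1 = n // 2  # split roughly in half; no block size to tune
    L11 = [row[:n1] for row in L[:n1]]
    L21 = [row[:n1] for row in L[n1:]]
    L22 = [row[n1:] for row in L[n1:]]

    # Recurse on the two diagonal blocks.
    X11 = invert_lower(L11)
    X22 = invert_lower(L22)

    # Off-diagonal block: X21 = -inv(L22) * L21 * inv(L11).
    X21 = [[-v for v in row] for row in matmul(matmul(X22, L21), X11)]

    # Reassemble the full inverse; the upper-right block is zero.
    top = [row + [0.0] * (n - n1) for row in X11]
    bottom = [r21 + r22 for r21, r22 in zip(X21, X22)]
    return top + bottom


if __name__ == "__main__":
    L = [[2.0, 0.0, 0.0],
         [1.0, 4.0, 0.0],
         [3.0, 5.0, 8.0]]
    for row in invert_lower(L):
        print(row)
```

The only structural choice in this sketch is where to split, and halving the matrix needs no machine-specific parameter. A blocked algorithm would instead sweep over the matrix in panels of a fixed block size, which is the quantity that has to be tuned.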

For further details, see the README on GitHub and our paper on ReLAPACK:

Journal Article

  1. Recursive Algorithms for Dense Linear Algebra: The ReLAPACK Collection
    ACM Transactions on Mathematical Software (TOMS), March 2017.
    Accepted.
    @article{Peise2017:728,
        author      = "Elmar Peise and Paolo Bientinesi",
        title       = "Recursive Algorithms for Dense Linear Algebra: The ReLAPACK Collection",
        journal     = "ACM Transactions on Mathematical Software (TOMS)",
        year        = 2017,
        month       = mar,
        note        = "Accepted",
        institution = "AICES, RWTH Aachen University",
        url         = "http://arxiv.org/pdf/1602.06763v1"
    }
    To exploit both memory locality and the full performance potential of highly tuned kernels, dense linear algebra libraries such as LAPACK commonly implement operations as blocked algorithms. However, to achieve next-to-optimal performance with such algorithms, significant tuning is required. On the other hand, recursive algorithms are virtually tuning free, and yet attain similar performance. In this paper, we first analyze and compare blocked and recursive algorithms in terms of performance, and then introduce ReLAPACK, an open-source library of recursive algorithms to seamlessly replace most of LAPACK's blocked algorithms. In many scenarios, ReLAPACK clearly outperforms reference LAPACK, and even improves upon the performance of optimized libraries.

Performance example

MKL (version 11.3) vs. ReLAPACK on 1 core of an Intel Xeon E5-2560 v3 (Haswell)

Double-precision inversion of a lower triangular matrix (dtrtri):