The Recursive LAPACK collection
The ReLAPACK library provides recursive implementations of a collection of blocked LAPACK routines. These implementations match the performance of optimally tuned blocked algorithms, but do not require any tuning themselves. They provide a performance boost not only over vanilla LAPACK but also over highly optimized codes. ReLAPACK's routines provide the same interface and features as LAPACK's blocked counterparts and can thus be used in existing applications with little effort.
For further details, see the README on GitHub and our paper on ReLAPACK:
- Recursive Algorithms for Dense Linear Algebra: The ReLAPACK Collection. ACM Transactions on Mathematical Software (TOMS), February 2016.
  Abstract: To exploit both memory locality and the full performance potential of highly tuned kernels, dense linear algebra libraries such as LAPACK commonly implement operations as blocked algorithms. However, to achieve next-to-optimal performance with such algorithms, significant tuning is required. On the other hand, recursive algorithms are virtually tuning free, and yet attain similar performance. In this paper, we first analyze and compare blocked and recursive algorithms in terms of performance, and then introduce ReLAPACK, an open-source library of recursive algorithms to seamlessly replace most of LAPACK's blocked algorithms. In many scenarios, ReLAPACK clearly outperforms reference LAPACK, and even improves upon the performance of optimized libraries.
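
To make the recursive idea concrete, here is a minimal sketch of an in-place recursive inversion of a lower triangular matrix. It is not ReLAPACK's code: the function names `rec_trtri_lower` and `trtri_lower` are hypothetical, and the block updates are written as plain loops, whereas a tuned library would presumably hand them to optimized BLAS kernels. The sketch splits L into blocks [L11 0; L21 L22], inverts L11 and L22 recursively, and then overwrites L21 with -inv(L22) * L21 * inv(L11). Note that no block size appears anywhere, which is why there is nothing to tune.

```c
#include <stddef.h>

/* Hypothetical helper (not a ReLAPACK routine): recursively inverts the
 * lower triangular n-by-n matrix A in place. A is stored column-major with
 * leading dimension ld. All diagonal entries are assumed to be nonzero. */
void rec_trtri_lower(int n, double *A, int ld)
{
    if (n == 1) {               /* base case: 1-by-1 block */
        A[0] = 1.0 / A[0];
        return;
    }

    /* Split A = [A11 0; A21 A22] roughly in half. */
    int n1 = n / 2, n2 = n - n1;
    double *A11 = A;
    double *A21 = A + n1;
    double *A22 = A + n1 + (size_t)n1 * ld;

    rec_trtri_lower(n1, A11, ld);   /* A11 := inv(A11) */
    rec_trtri_lower(n2, A22, ld);   /* A22 := inv(A22) */

    /* A21 := A21 * inv(A11). Column j of the result only needs the
     * original columns j..n1-1, so ascending j is safe in place. */
    for (int j = 0; j < n1; j++)
        for (int i = 0; i < n2; i++) {
            double s = 0.0;
            for (int k = j; k < n1; k++)
                s += A21[i + (size_t)k * ld] * A11[k + (size_t)j * ld];
            A21[i + (size_t)j * ld] = s;
        }

    /* A21 := -inv(A22) * A21. Row i of the result only needs rows 0..i,
     * so descending i is safe in place. */
    for (int j = 0; j < n1; j++)
        for (int i = n2 - 1; i >= 0; i--) {
            double s = 0.0;
            for (int k = 0; k <= i; k++)
                s += A22[i + (size_t)k * ld] * A21[k + (size_t)j * ld];
            A21[i + (size_t)j * ld] = -s;
        }
}

/* Hypothetical wrapper: returns 0 on success, -1 for invalid arguments,
 * or k (1-based) if the k-th diagonal entry is exactly zero. The matrix
 * is left untouched whenever a nonzero value is returned. */
int trtri_lower(int n, double *A, int ld)
{
    if (n < 0 || ld < n || (n > 0 && A == NULL))
        return -1;
    for (int k = 0; k < n; k++)
        if (A[k + (size_t)k * ld] == 0.0)
            return k + 1;
    if (n > 0)
        rec_trtri_lower(n, A, ld);
    return 0;
}
```

A caller passes the matrix in column-major order, as with LAPACK: `trtri_lower(3, A, 3)` inverts a 3-by-3 lower triangular matrix stored in `double A[9]`. Entries above the diagonal are never read or written.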
MKL (version 11.3) vs. ReLAPACK on 1 core of an Intel Xeon E5-2560 v3 (Haswell)
Double-precision inversion of a lower triangular matrix (dtrtri):