# Publications - Matthias Petschow

### Journal Articles

**Improved Accuracy and Parallelism for MRRR-based Eigensolvers**SIAM Journal of Scientific Computing, Volume 36(2), pp. 240-263, April 2014.@article{Petschow2014:60, author = "Matthias Petschow and {Enrique S.} Quintana-Orti and Paolo Bientinesi", title = "Improved Accuracy and Parallelism for MRRR-based Eigensolvers", journal = "SIAM Journal of Scientific Computing", year = 2014, volume = 36, number = 2, pages = "240--263", month = apr, url = "http://arxiv.org/pdf/1304.1864v3" }

abstractwebPDFbibtexThe real symmetric tridiagonal eigenproblem is of outstanding importance in numerical computations; it arises frequently as part of eigensolvers for standard and generalized dense Hermitian eigenproblems that are based on a reduction to real tridiagonal form. For its solution, the algorithm of Multiple Relatively Robust Representations (MRRR) is among the fastest methods. Although fast, the solvers based on MRRR do not deliver the same accuracy as competing methods like Divide & Conquer or the QR algorithm. In this paper, we demonstrate that the use of mixed precisions leads to improved accuracy of MRRR-based eigensolvers with limited or no performance penalty. As a result, we obtain eigensolvers that are not only equally or more accurate than the best available methods, but also - under most circumstances - faster and more scalable than the competition.**High-Performance Solvers for Dense Hermitian Eigenproblems**SIAM Journal on Scientific Computing, Volume 35(1), pp. C1-C22, January 2013.@article{Petschow2013:208, author = "Matthias Petschow and Elmar Peise and Paolo Bientinesi", title = "High-Performance Solvers for Dense Hermitian Eigenproblems", journal = "SIAM Journal on Scientific Computing", year = 2013, volume = 35, number = 1, pages = "C1-C22", month = jan, publisher = "Society for Industrial and Applied Mathematics", url = "http://arxiv.org/pdf/1205.2107.pdf" }

abstractwebPDFbibtexWe introduce a new collection of solvers - subsequently called EleMRRR - for large-scale dense Hermitian eigenproblems. EleMRRR solves various types of problems: generalized, standard, and tridiagonal eigenproblems. Among these, the last is of particular importance as it is a solver on its own right, as well as the computational kernel for the first two; we present a fast and scalable tridiagonal solver based on the Algorithm of Multiple Relatively Robust Representations - referred to as PMRRR. Like the other EleMRRR solvers, PMRRR is part of the freely available Elemental library, and is designed to fully support both message-passing (MPI) and multithreading parallelism (SMP). As a result, the solvers can equally be used in pure MPI or in hybrid MPI-SMP fashion. We conducted a thorough performance study of EleMRRR and ScaLAPACK's solvers on two supercomputers. Such a study, performed with up to 8,192 cores, provides precise guidelines to assemble the fastest solver within the ScaLAPACK framework; it also indicates that EleMRRR outperforms even the fastest solvers built from ScaLAPACK's components.**MR3-SMP: A Symmetric Tridiagonal Eigensolver for Multi-Core Architectures**Parallel Computing, Volume 37(12), December 2011.@article{Petschow2011:254, author = "Matthias Petschow and Paolo Bientinesi", title = "MR3-SMP: A Symmetric Tridiagonal Eigensolver for Multi-Core Architectures", journal = "Parallel Computing", year = 2011, volume = 37, number = 12, month = dec, url = "http://hpac.rwth-aachen.de/~pauldj/pubs/MR3-SMP-TR.pdf" }

abstractwebPDFbibtexThe computation of eigenvalues and eigenvectors of symmetric tridiagonal matrices arises frequently in applications; often as one of the steps in the solution of Hermitian and symmetric eigenproblems. While several accurate and efficient methods for the tridiagonal eigenproblem exist, their corresponding implementations usually target uni-processors or large distributed memory systems. Our new eigensolver MR3-SMP is instead specifically designed for multi-core and many-core general purpose processors, which today have effectively replaced uni-processors. We show that in most cases MR3-SMP is faster and achieves better speedups than state-of-the-art eigensolvers for uni-processors and distributed-memory systems.**Condensed Forms for the Symmetric Eigenvalue Problem on Multi-threaded Architectures**Concurrency and Computation: Practice and Experience, Volume 23(7), May 2011.@article{Bientinesi2011:754, author = "Paolo Bientinesi and Francisco Igual and Daniel Kressner and Matthias Petschow and {Enrique S.} Quintana-Orti", title = "Condensed Forms for the Symmetric Eigenvalue Problem on Multi-threaded Architectures", journal = "Concurrency and Computation: Practice and Experience", year = 2011, volume = 23, number = 7, month = may, url = "http://hpac.rwth-aachen.de/~pauldj/pubs/red-multi-TR.pdf" }

abstractwebPDFbibtexWe investigate the performance of the routines in LAPACK and the Successive Band Reduction (SBR) toolbox for the reduction of a dense matrix to tridiagonal form, a crucial preprocessing stage in the solution of the symmetric eigenvalue problem, on general-purpose multi-core processors. In response to the advances of hardware accelerators, we also modify the code in the SBR toolbox to accelerate the computation by off-loading a significant part of the operations to a graphics processor (GPU). The performance results illustrate the parallelism and scalability of these algorithms on current high-performance multi-core and many-core architectures.**Effects of Pupil Discretization and Littrow Illumination in the Simulation of Bright-Field Defect Detection**Optics Letters, Volume 34(12), pp. 1840-1842, 2009.@article{Rafler2009:968, author = "Stephan Rafler and Matthias Petschow and Uwe Seifert and Karsten Frenner and Jens Goeckeritz and Wolfgang Osten", title = "Effects of Pupil Discretization and Littrow Illumination in the Simulation of Bright-Field Defect Detection", journal = "Optics Letters", year = 2009, volume = 34, number = 12, pages = "1840--1842" }

abstractwebbibtexThe simulation of optical defect detection on wafers by microscopy requires several approximations. For the simulation with Fourier modal methods, the number of modes is limited by memory and time consumption. Furthermore, the illumination pupil has to be divided into discrete incidence angles for plane waves. We present a way to save the computation of most of these incidence angles. It works only for certain configurations of grating period and illumination wavelength but is an accurate and robust approximation in these cases. We present sample calculations for one- and two-dimensional periodic structures with defects.

### Peer Reviewed Conference Publications

**An Example of Symmetry Exploitation for Energy-related Eigencomputations**INTERNATIONAL CONFERENCE OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING 2009: (ICCMSE 2009), AIP Conference Proceedings, Volume 1504, pp. 1134-1137, 2012.@inproceedings{Petschow2012:860, author = "Matthias Petschow and Edoardo {Di Napoli} and Paolo Bientinesi", title = "An Example of Symmetry Exploitation for Energy-related Eigencomputations", booktitle = "INTERNATIONAL CONFERENCE OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING 2009: (ICCMSE 2009)", year = 2012, volume = 1504, series = "AIP Conference Proceedings", pages = "1134--1137", url = "http://link.aip.org/link/doi/10.1063/1.4772126" }

abstractwebPDFbibtexOne of the most used approaches in simulating materials is the tight-binding approximation. When using this method in a material simulation, it is necessary to compute the eigenvalues and eigenvectors of the Hamiltonian describing the system. In general, the system possesses few explicit symmetries. Due to them, the problem has many degenerate eigenvalues. The ambiguity in choosing a orthonormal basis of the invariant subspaces, associated with degenerate eigenvalues, will result in eigenvectors which are not invariant under the action of the symmetry operators in matrix form. A meaningful computation of the eigenvectors needs to take those symmetries into account. A natural choice is a set of eigenvectors, which simultaneously diagonalizes the Hamiltonian and the symmetry matrices. This is possible because all the matrices commute with each other. The simultaneous eigenvectors and the corresponding eigenvalues will be in a parametrized form in terms of the lattice momentum components. This functional dependence of the eigenvalues is the dispersion relation and describes the band structure of a material. Therefore it is important to find this functional dependence in any numerical computation related to material properties.**The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors**Proceedings of the PARA 2010, Reykjavik, Iceland, June 6-9, 2010, Lecture Notes in Computer Science, Volume 7133, pp. 152-161, Springer, 2010.@inproceedings{Petschow2010:754, author = "Matthias Petschow and Paolo Bientinesi", title = "The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors", year = 2010, editor = "Jonasson, Kristjan", volume = 7133, series = "Lecture Notes in Computer Science", pages = "152--161", publisher = "Springer", url = "http://hpac.rwth-aachen.de/aices/preprint/documents/AICES-2010-09-04.pdf" }

abstractwebPDFbibtexThe algorithm of Multiple Relatively Robust Representations (MRRR or MR3) computes k eigenvalues and eigenvectors of a symmetric tridiagonal matrix in O(nk) arithmetic operations. Large problems can be effectively tackled with existing distributed-memory parallel implementations of MRRR; small and medium size problems can instead make use of LAPACK's routine xSTEMR. However, xSTEMR is optimized for single-core CPUs, and does not take advantage of today's multi-core and future many-core architectures. In this paper we discuss some of the issues and trade-offs arising in the design of MR3-SMP, an algorithm for multi-core CPUs and SMP systems. Experiments on application matrices indicate that MR3-SMP is both faster and obtains better speedups than all the tridiagonal eigensolvers included in LAPACK and Intel's Math Kernel Library (MKL).**Investigation of Methods to Set Up the Normal Vector Field for the Differential Method**Proc. SPIE, Volume 6995, 2008.@inproceedings{Rafler2008:190, author = "Stephan Rafler and Peter Goetz and Matthias Petschow and Thomas Schuster and Wolfgang Osten", title = "Investigation of Methods to Set Up the Normal Vector Field for the Differential Method", booktitle = "Proc. SPIE", year = 2008, volume = 6995 }

abstractwebbibtexIn Fourier modal methods like the RCWA and the Differential Method the Li-rules for products in truncated Fourier space have to be obeyed in order to achieve good convergence of the results with respect to the mode number. The Li- rules have to be applied differently for parts of the field that are tangential and orthogonal to material boundaries. This is achieved in the Differential Method by including a field of vectors in the calculation that are normal to the material boundaries. The same can be done laterally in each layer of an RCWA calculation of a 2-D periodic structure. It turns out that discontinuities in the normal vector field can disturb the computation especially when metallic materials are dominant in the structure which would make the usefulness of the normal vector method questionable. So it is of great importance to investigate how normal vector fields can be established with as few discontinuities as possible. We present various methods for the 2-D RCWA and the 1-D and 2-D Differential Method and compare the respective convergence behaviors. Especially we emphasize methods that are automatic and require as few user input as possible.

### Theses

**MRRR-based Eigensolvers for Multi-Core Processors and Supercomputers**RWTH Aachen, January 2014.

PhD thesis.@phdthesis{Petschow2014:748, author = "Matthias Petschow", title = "MRRR-based Eigensolvers for Multi-Core Processors and Supercomputers", school = "RWTH Aachen", year = 2014, month = jan, note = "PhD thesis", url = "http://arxiv.org/pdf/1401.4950v1.pdf" }

abstractwebPDFbibtexThe real symmetric tridiagonal eigenproblem is of outstanding importance in numerical computations; it arises frequently as part of eigensolvers for standard and generalized dense Hermitian eigenproblems that are based on a reduction to tridiagonal form. For its solution, the algorithm of Multiple Relatively Robust Representations (MRRR or MR3 in short) - introduced in the late 1990s - is among the fastest methods. To compute k eigenpairs of a real n-by-n tridiagonal T, MRRR only requires O(kn) arithmetic operations; in contrast, all the other practical methods require O(k^2 n) or O(n^3) operations in the worst case. This thesis centers around the performance and accuracy of MRRR.**Differential Method for Arbitrary Anisotropic Periodic Media**University Stuttgart, 2008.

Diploma thesis (in German).

### Technical Report

**Improved Orthogonality for Dense Hermitian Eigensolvers based on the MRRR algorithm**Aachen Institute for Computational Engineering Science, RWTH Aachen, September 2012.

Technical Report AICES-2012/09-1.@techreport{Petschow2012:754, author = "Matthias Petschow and {Enrique S.} Quintana-Orti and Paolo Bientinesi", title = "Improved Orthogonality for Dense Hermitian Eigensolvers based on the MRRR algorithm", institution = "Aachen Institute for Computational Engineering Science, RWTH Aachen", year = 2012, month = sep, note = "Technical Report AICES-2012/09-1", url = "http://www.aices.rwth-aachen.de:8080/%7Epetschow/pubs/mixed.pdf" }

abstractPDFbibtexThe dense Hermitian eigenproblem is of outstanding importance in numerical computations and a number of excellent algorithms for this problem exist. One of the fastest method is the MRRR algorithm, which is based on a reduction to real tridiagonal form. This approach, although fast, does not deliver the same accuracy (orthogonality) as competing methods like the Divide-and-Conquer or the QR algorithm. In this paper, we demonstrate how the use of mixed precisions in MRRR-based eigensolvers leads to improved orthogonality. At the same time, when compared to the classical use of the MRRR algorithm, our approach comes with no or only limited performance penalty, increases the robustness, and improves scalability.