Recent Talks

Concurrent Alternating Least Squares and Jackknife Resampling for Canonical Polyadic Decompositions
Lars Karlsson and Paolo Bientinesi
ISC High Performance, Hamburg, Germany, June 2023.
Speaker: Lars Karlsson.
In the domains of matrix and tensor computations, the most typical approach to speeding up a workflow is to optimize the underlying building blocks, i.e., operations such as the matrix product and LU factorization (for matrices), or tensor contractions and the MTTKRP (for tensors). While undeniably useful and effective, this approach is inherently limited by the rigid interface and boundaries of each individual building block, which prevent optimizations that span multiple operations. Inspired by a workflow we observed in chemometrics, namely that of repeatedly fitting the same tensor to many different models, we consider the problem of concurrently computing multiple CP decompositions. We recently published CALS (Concurrent ALS), a method that simultaneously computes multiple CP decompositions of the same tensor using Alternating Least Squares. The arithmetic intensity of the computation is increased by fusing independent MTTKRP operations. When the rank is small, each individual ALS computation is inherently memory-bound, but CALS makes the whole set of computations compute-bound, thus enabling the use of efficient kernels, including offloading to accelerators. We also adapted the idea to support jackknife resampling, a technique used to estimate the uncertainties in the parameters of CP decompositions. In jackknife resampling, the tensors being decomposed are nearly, but not exactly, the same. Nevertheless, the idea of concurrent ALS applies, resulting in significant speedups for the entire workflow.
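To illustrate the MTTKRP fusion idea in the CALS abstract, here is a minimal NumPy sketch. It is not the CALS implementation, and all function names are hypothetical. For a third-order tensor X and one CP model with factors B and C, the mode-1 MTTKRP is M = X_(1) (C ⊙ B), where ⊙ is the Khatri-Rao (column-wise Kronecker) product. For several models of the same tensor, the Khatri-Rao products can be concatenated column-wise so that a single matrix product reads X_(1) once for all models. With small ranks, each separate product performs roughly 2·I·J·K·R flops per pass over the I·J·K entries of X, while the fused product performs roughly 2·I·J·K·(R_1 + R_2 + ...) flops for that same single pass.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r is kron(C[:, r], B[:, r])."""
    K, R = C.shape
    J, R2 = B.shape
    assert R == R2, "factor matrices must have the same number of columns"
    # (K, 1, R) * (1, J, R) -> (K, J, R) -> (K*J, R); row index is k*J + j
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, R)

def unfold_mode1(X):
    """Mode-1 unfolding X_(1) of shape (I, J*K), column index k*J + j,
    matching the row order produced by khatri_rao(C, B)."""
    I, J, K = X.shape
    # X[i, j, k] lands in column k*J + j after moving mode 3 before mode 2
    return X.transpose(0, 2, 1).reshape(I, K * J)

def mttkrp_mode1(X, B, C):
    """Mode-1 MTTKRP for a single CP model: X_(1) (C ⊙ B), shape (I, R)."""
    return unfold_mode1(X) @ khatri_rao(C, B)

def fused_mttkrp_mode1(X, models):
    """Mode-1 MTTKRPs for several CP models (B_k, C_k) of the same tensor X,
    computed with one matrix product over the concatenated Khatri-Rao
    products, then split back into one (I, R_k) block per model."""
    KR = np.hstack([khatri_rao(C, B) for (B, C) in models])
    M = unfold_mode1(X) @ KR  # one pass over X_(1) for all models
    splits = np.cumsum([B.shape[1] for (B, _) in models])[:-1].tolist()
    return np.split(M, splits, axis=1)

# Usage: three low-rank models of one random tensor
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 15, 10))
models = [(rng.standard_normal((15, r)), rng.standard_normal((10, r)))
          for r in (2, 3, 4)]
for (B, C), M in zip(models, fused_mttkrp_mode1(X, models)):
    # each fused block equals the separately computed MTTKRP
    assert np.allclose(M, mttkrp_mode1(X, B, C))
```

The other modes follow the same pattern with the corresponding unfolding and Khatri-Rao product; the sketch only shows why fusing raises the work done per byte of tensor data read.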
Current state of programming languages for linear algebra computations
Paolo Bientinesi
TU Delft, DCSE High Performance Computing Symposium, Delft, The Netherlands, June 2023.
High-Performance Matrix Computations: We Need More Than Fast Libraries
Paolo Bientinesi
SIAM Conference on Computational Science and Engineering, Amsterdam, The Netherlands, February 2023.
Matrix computations: Going beyond libraries
Paolo Bientinesi
eSSENCE, Swedish e-Science Academy, Umeå, Sweden, October 2022.
The fragmented landscape of tensor computations
Paolo Bientinesi
Chalmers University, 4th Workshop on Scientific Computing in Sweden (SwedComp22), Göteborg, Sweden, October 2022.
High-performance matrix computations: It’s not all about libraries
Paolo Bientinesi
RWTH Aachen University, EU Regional School, Aachen, Germany, May 2022.
Software for tensor computations: What is happening???
Paolo Bientinesi
Dagstuhl Seminar 22101, Tensor Computations: Applications and Optimization, Dagstuhl, Germany, March 2022.
The MTTKRP, a closer look
Christos Psarras
Dagstuhl Seminar 22101, Tensor Computations: Applications and Optimization, March 2022.
Parallel Algorithms --- Introduction to High Performance Computing
Paolo Bientinesi
PDC summer school on High-Performance Computing, KTH, Stockholm, August 2021.
High-Performance Tensor Computations: Where Do We Stand?
Paolo Bientinesi
SIAM Conference on Computational Science and Engineering, Dallas (via Zoom), March 2021.
Since the introduction of the BLAS-1 library 40+ years ago, the entire domain of matrix computations has evolved around well-defined layers and a few "container" libraries that include state-of-the-art algorithms and implementations for a specific class of problems and/or a specific type of parallelism; these libraries served, and still serve, the needs of a vast ecosystem of applications. In stark contrast, the domain of tensor computations still lacks a set of building blocks, and many similar libraries are developed independently in different application domains. This situation inevitably leads to redundancy and to suboptimal results. Furthermore, the software landscape for tensor computations is fragmented in terms of features, programming languages, and computing platforms, to the point that comparing new and existing algorithms is exceedingly difficult. In this talk we survey the software for high-performance tensor computations and make suggestions for an initial set of computational building blocks.