# Publications - Edoardo Di Napoli

### Submitted Papers

**ChASE: Chebyshev Accelerated Subspace iteration Eigensolver for sequences of Hermitian eigenvalue problems** May 2018.

Submitted to ACM TOMS.

Solving dense Hermitian eigenproblems arranged in a sequence with direct solvers fails to take advantage of those spectral properties which are pertinent to the entire sequence, and not just to the single problem. When such features take the form of correlations between the eigenvectors of consecutive problems, as is the case in many real-world applications, the potential benefit of exploiting them can be substantial. We present ChASE, a modern algorithm and library based on subspace iteration with polynomial acceleration. Novel to ChASE is the computation of the spectral estimates that enter in the filter and an optimization of the polynomial degree which further reduces the necessary FLOPs. ChASE is written in C++ using modern software engineering concepts that favor simple integration into application codes and straightforward portability across heterogeneous platforms. When solving sequences of Hermitian eigenproblems for a portion of their extremal spectrum, ChASE greatly benefits from the sequence's spectral properties and outperforms direct solvers in many scenarios. The library ships with two distinct parallelization schemes, supports execution over distributed GPUs, and is easily extensible to other parallel computing architectures.

**Perfect spike detection via time reversal** 2017.

Submitted to Frontiers in Neuroscience.

Spiking neuronal networks are usually simulated with three main simulation schemes: the classical time-driven and event-driven schemes, and the more recent hybrid scheme. All three schemes evolve the state of a neuron through a series of checkpoints: equally spaced in the first scheme and determined neuron-wise by spike events in the latter two. The time-driven and the hybrid scheme determine whether the membrane potential of a neuron crosses a threshold at the end of the time interval between consecutive checkpoints. Threshold crossing can, however, occur within the interval even if this test is negative. Spikes can therefore be missed. The present work derives, implements, and benchmarks a method for perfect retrospective spike detection. This method can be applied to neuron models with affine or linear subthreshold dynamics. The idea behind the method is to propagate the threshold with a time-inverted dynamics, testing whether the threshold crosses the neuron state to be evolved, rather than vice versa. Algebraically this translates into a set of inequalities necessary and sufficient for threshold crossing. This test is slower than the imperfect one, but faster than alternative perfect tests based on bisection or root-finding methods. Comparison confirms earlier results that the imperfect test rarely misses spikes (less than a fraction $1/10^8$ of missed spikes) in biologically relevant settings. This study offers an alternative geometric point of view on neuronal dynamics.

**Non-linear Least-Squares optimization of rational filters for the solution of interior eigenvalue problems** 2017.

Submitted to SIAM SIMAX.

Rational filter functions can be used to improve convergence of contour-based eigensolvers, a popular family of algorithms for the solution of the interior eigenvalue problem. We present a framework for the optimization of rational filters based on a non-convex weighted Least-Squares scheme. When used in combination with the FEAST library, our filters outperform existing ones on a large and representative set of benchmark problems. This work provides a detailed description of: (1) a setup of the optimization process that exploits symmetries of the filter function for Hermitian eigenproblems, (2) a formulation of the gradient descent and Levenberg-Marquardt algorithms that exploits the symmetries, (3) a method to select the starting position for the optimization algorithms that reliably produces effective filters, (4) a constrained optimization scheme that produces filter functions with specific properties that may be beneficial to the performance of the eigensolver that employs them.

### Journal Articles

**High-performance functional renormalization group calculations for interacting fermions** Computer Physics Communications, Volume 213, pp. 100-110, April 2017.

@article{Lichtenstein2017:788, author = "Julian Lichtenstein and David {Sanchez de la Pena} and Daniel Rohe and Edoardo {Di Napoli} and Carsten Honerkamp and {Stefan A.} Maier", title = "High-performance functional renormalization group calculations for interacting fermions", journal = "Computer Physics Communications", year = 2017, volume = 213, pages = "100-110", month = apr, url = "https://arxiv.org/pdf/1604.06296v2.pdf" }

We derive a novel computational scheme for functional Renormalization Group (fRG) calculations for interacting fermions on 2D lattices. The scheme is based on the exchange parametrization fRG for the two-fermion interaction, with additional insertions of truncated partitions of unity. These insertions decouple the fermionic propagators from the exchange propagators and lead to a separation of the underlying equations. We demonstrate that this separation is numerically advantageous and may pave the way for refined, large-scale computational investigations even in the case of complex multiband systems. Furthermore, on the basis of speedup data gained from our implementation, it is shown that this new variant facilitates efficient calculations on a large number of multi-core CPUs. We apply the scheme to the t,t′ Hubbard model on a square lattice to analyze the convergence of the results with the bond length of the truncation of the partition of unity. In most parameter areas, a fast convergence can be observed. Finally, we compare to previous results in order to relate our approach to other fRG studies.

**High-performance generation of the Hamiltonian and Overlap matrices in FLAPW methods** Computer Physics Communications, Volume 211, pp. 61 - 72, February 2017.

High Performance Computing for Advanced Modeling and Simulation of Materials.

@article{Di_Napoli2017:318, author = "Edoardo {Di Napoli} and Elmar Peise and Markus Hrywniak and Paolo Bientinesi", title = "High-performance generation of the Hamiltonian and Overlap matrices in FLAPW methods", journal = "Computer Physics Communications", year = 2017, volume = 211, pages = "61 - 72", month = feb, note = "High Performance Computing for Advanced Modeling and Simulation of Materials", url = "http://arxiv.org/pdf/1602.06589v2" }

One of the greatest efforts of computational scientists is to translate the mathematical model describing a class of physical phenomena into large and complex codes. Many of these codes face the difficulty of implementing the mathematical operations in the model in terms of low level optimized kernels offering both performance and portability. Legacy codes suffer from the additional curse of rigid design choices based on outdated performance metrics (e.g. minimization of memory footprint). Using a representative code from the Materials Science community, we propose a methodology to restructure the most expensive operations in terms of an optimized combination of dense linear algebra kernels. The resulting algorithm guarantees an increased performance and an extended life span of this code enabling larger scale simulations.

**New methodology for determining the electronic thermal conductivity of metals via direct non-equilibrium ab initio molecular dynamics** Physical Review B, pp. 075149, 25 August 2016.

@article{Yue2016:504, author = "Sheng-Ying Yue and Xiaoliang Zhang and Stephen Stackhouse and Guangzhao Qin and Edoardo {Di Napoli} and Ming Hu", title = "New methodology for determining the electronic thermal conductivity of metals via direct non-equilibrium ab initio molecular dynamics", journal = "Physical Review B", year = 2016, pages = 75149, month = aug, url = "http://arxiv.org/pdf/1603.07755v2.pdf" }

Many physical properties of metals can be understood in terms of the free electron model, as proven by the Wiedemann-Franz law. According to this model, electronic thermal conductivity (κ_el) can be inferred from the Boltzmann transport equation (BTE). However, the BTE does not perform well for some complex metals, such as Cu. Moreover, the BTE cannot clearly describe the origin of the thermal energy carried by electrons or how this energy is transported in metals. The charge distribution of conduction electrons in metals is known to reflect the electrostatic potential (EP) of the ion cores. Based on this premise, we develop a new methodology for evaluating κ_el by combining the free electron model and non-equilibrium ab initio molecular dynamics (NEAIMD) simulations. We demonstrate that the kinetic energy of thermally excited electrons originates from the energy of the spatial electrostatic potential oscillation (EPO), which is induced by the thermal motion of ion cores. This method directly predicts the κ_el of pure metals with a high degree of accuracy.

**Efficient estimation of eigenvalue counts in an interval** Numerical Linear Algebra with Applications, Volume 23(4), pp. 674-692, July 2016.

@article{Di_Napoli2016:838, author = "Edoardo {Di Napoli} and Eric Polizzi and Yousef Saad", title = "Efficient estimation of eigenvalue counts in an interval", journal = "Numerical Linear Algebra with Applications", year = 2016, volume = 23, number = 4, pages = "674-692", month = jul }

Estimating the number of eigenvalues located in a given interval of a large sparse Hermitian matrix is an important problem in certain applications and it is a prerequisite of eigensolvers based on a divide-and-conquer paradigm. Often an exact count is not necessary and methods based on stochastic estimates can be utilized to yield rough approximations. This paper examines a number of techniques tailored to this specific task. It reviews standard approaches and explores new ones based on polynomial and rational approximation filtering combined with stochastic procedures.

**An Optimized and Scalable Eigensolver for Sequences of Eigenvalue Problems** Concurrency and Computation: Practice and Experience, Volume 27(4), pp. 905, 25 March 2015.

@article{Berljafa2015:768, author = "Mario Berljafa and Daniel Wortmann and Edoardo {Di Napoli}", title = "An Optimized and Scalable Eigensolver for Sequences of Eigenvalue Problems", journal = "Concurrency and Computation: Practice and Experience", year = 2015, volume = 27, number = 4, pages = 905, month = mar, url = "http://arxiv.org/pdf/1404.4161" }

In many scientific applications the solution of non-linear differential equations is obtained through the set-up and solution of a number of successive eigenproblems. These eigenproblems can be regarded as a sequence whenever the solution of one problem fosters the initialization of the next. In addition, some eigenproblem sequences show a connection between the solutions of adjacent eigenproblems. Whenever it is possible to unravel the existence of such a connection, the eigenproblem sequence is said to be correlated. When faced with a sequence of correlated eigenproblems, the current strategy amounts to solving each eigenproblem in isolation. We propose a novel approach which exploits such correlation through the use of an eigensolver based on subspace iteration and accelerated with Chebyshev polynomials (ChFSI). The resulting eigensolver is optimized by minimizing the number of matvec multiplications and parallelized using the Elemental library framework. Numerical results show that ChFSI achieves excellent scalability and is competitive with current dense linear algebra parallel eigensolvers.

**Towards an Efficient Use of the BLAS Library for Multilinear Tensor Contractions** Applied Mathematics and Computation, Volume 235, pp. 454-468, May 2014.

@article{Di_Napoli2014:210, author = "Edoardo {Di Napoli} and Diego Fabregat-Traver and Gregorio Quintana-Orti and Paolo Bientinesi", title = "Towards an Efficient Use of the BLAS Library for Multilinear Tensor Contractions", journal = "Applied Mathematics and Computation", year = 2014, volume = 235, pages = "454--468", month = may, publisher = "Elsevier", url = "http://arxiv.org/pdf/1307.2100" }

Mathematical operators whose transformation rules constitute the building blocks of a multi-linear algebra are widely used in physics and engineering applications where they are very often represented as tensors. In the last century, thanks to the advances in tensor calculus, it was possible to uncover new research fields and make remarkable progress in the existing ones, from electromagnetism to the dynamics of fluids and from the mechanics of rigid bodies to quantum mechanics of many atoms. By now, the formal mathematical and geometrical properties of tensors are well defined and understood; conversely, in the context of scientific and high-performance computing, many tensor-related problems are still open. In this paper, we address the problem of efficiently computing contractions between two tensors of arbitrary dimension by using kernels from the highly optimized BLAS library. In particular, we establish precise conditions to determine if and when GEMM, the kernel for matrix products, can be used. Such conditions take into consideration both the nature of the operation and the storage scheme of the tensors, and induce a classification of the contractions into three groups. For each group, we provide a recipe to guide the users towards the most effective use of BLAS.

**Block Iterative Eigensolvers for Sequences of Correlated Eigenvalue Problems** Computer Physics Communications, Volume 184, pp. 2478 - 2488, November 2013.

@article{Di_Napoli2013:904, author = "Edoardo {Di Napoli} and Mario Berljafa", title = "Block Iterative Eigensolvers for Sequences of Correlated Eigenvalue Problems", journal = "Computer Physics Communications", year = 2013, volume = 184, pages = "2478 - 2488", month = nov, url = "http://arxiv.org/pdf/1206.3768v2" }

In Density Functional Theory simulations based on the LAPW method, each self-consistent cycle comprises dozens of large dense generalized eigenproblems. In contrast to real-space methods, eigenpairs solving for problems at distinct cycles have either been believed to be independent or at most very loosely connected. In a recent study [7], it was proposed to revert this point of view and consider simulations as made of dozens of sequences of eigenvalue problems; each sequence groups together eigenproblems with equal k-vectors and an increasing outer-iteration cycle index l. From this different standpoint it was possible to demonstrate that, contrary to belief, successive eigenproblems in a sequence are strongly correlated with one another. In particular, by tracking the evolution of subspace angles between eigenvectors of successive eigenproblems, it was shown that these angles decrease noticeably after the first few iterations and become close to collinear: the closer to convergence the stronger the correlation becomes. This last result suggests that we can manipulate the eigenvectors, solving for a specific eigenproblem in a sequence, as an approximate solution for the following eigenproblem. In this work we present results that are in line with this intuition. First, we provide numerical examples where opportunely selected block iterative solvers benefit from the reuse of eigenvectors by achieving a substantial speed-up. We then develop a C language version of one of these algorithms and run a series of tests specifically focused on performance and scalability. All the numerical tests are carried out employing sequences of eigenproblems extracted from simulations of solid-state physics crystals. The results presented here could eventually open the way to a widespread use of block iterative solvers in ab initio electronic structure codes based on the LAPW approach.

**Dissecting the FEAST Algorithm for Generalized Eigenproblems** Journal of Computational and Applied Mathematics, Volume 244, pp. 1-9, May 2013.

@article{Kraemer2013:188, author = "Lukas Kraemer and Edoardo {Di Napoli} and {Martin } Galgon and Bruno Lang and Paolo Bientinesi", title = "Dissecting the FEAST Algorithm for Generalized Eigenproblems", journal = "Journal of Computational and Applied Mathematics", year = 2013, volume = 244, pages = "1--9", month = may, url = "http://arxiv.org/abs/1204.1726" }

We analyze the FEAST method for computing selected eigenvalues and eigenvectors of large sparse matrix pencils. After establishing the close connection between FEAST and the well-known Rayleigh-Ritz method, we identify several critical issues that influence convergence and accuracy of the solver: the choice of the starting vector space, the stopping criterion, how the inner linear systems impact the quality of the solution, and the use of FEAST for computing eigenpairs from multiple intervals. We complement the study with numerical examples, and hint at possible improvements to overcome the existing problems.

**Correlations in Sequences of Generalized Eigenproblems Arising in Density Functional Theory** Computer Physics Communications (CPC), Volume 183(8), pp. 1674-1682, August 2012.

@article{Di_Napoli2012:160, author = "Edoardo {Di Napoli} and Stefan Bluegel and Paolo Bientinesi", title = "Correlations in Sequences of Generalized Eigenproblems Arising in Density Functional Theory", journal = "Computer Physics Communications (CPC)", year = 2012, volume = 183, number = 8, pages = "1674-1682", month = aug, url = "http://arxiv.org/pdf/1108.2594v1" }

Density Functional Theory (DFT) is one of the most used *ab initio* theoretical frameworks in materials science. It derives the ground state properties of multi-atomic ensembles directly from the computation of their one-particle density $n(\mathbf{r})$. In DFT-based simulations the solution is calculated through a chain of successive self-consistent cycles; in each cycle a series of coupled equations (Kohn-Sham) translates to a large number of generalized eigenvalue problems whose eigenpairs are the principal means for expressing $n(\mathbf{r})$. A simulation ends when $n(\mathbf{r})$ has converged to the solution within the required numerical accuracy. This usually happens after several cycles, resulting in a process calling for the solution of many sequences of eigenproblems. In this paper, the authors report evidence showing unexpected correlations between adjacent eigenproblems within each sequence and suggest the investigation of an alternative computational approach: information extracted from the simulation at one step of the sequence is used to compute the solution at the next step. The implications are multiple: from increasing the performance of material simulations, to the development of a mixed direct-iterative solver, to modifying the mathematical foundations of the DFT computational paradigm in use, thus opening the way to the investigation of new materials.

**Solving Dense Generalized Eigenproblems on Multi-Threaded Architectures** Applied Mathematics and Computation, Volume 218(22), pp. 11279-11289, July 2012.

@article{Aliaga2012:420, author = "Jose' Aliaga and Paolo Bientinesi and Davor Davidovic and Edoardo {Di Napoli} and Francisco Igual and {Enrique S.} Quintana-Orti", title = "Solving Dense Generalized Eigenproblems on Multi-Threaded Architectures", journal = "Applied Mathematics and Computation", year = 2012, volume = 218, number = 22, pages = "11279-11289", month = jul, url = "http://arxiv.org/pdf/1111.6374v1" }

We compare two approaches to compute a portion of the spectrum of dense symmetric definite generalized eigenproblems: one is based on the reduction to tridiagonal form, and the other on the Krylov-subspace iteration. Two large-scale applications, arising in molecular dynamics and materials science, are employed to investigate the contributions of the application, architecture, and parallelism of the method to the performance of the solvers. The experimental results on a state-of-the-art 8-core platform, equipped with a graphics processing unit (GPU), reveal that in real applications, iterative Krylov-subspace methods can be a competitive approach also for the solution of dense problems.

**Quantum Deconstruction of 5D SQCD** Journal of High Energy Physics, Volume 2007(3), pp. 092, 2007.

ArXiv:hep-th/0611085.

@article{Di_Napoli2007:504, author = "Edoardo {Di Napoli} and Vadim Kaplunovsky", title = "Quantum Deconstruction of 5D SQCD ", journal = "Journal of High Energy Physics", year = 2007, volume = 2007, number = 3, pages = 92, note = "ArXiv:hep-th/0611085" }

We deconstruct the fifth dimension of 5D SQCD with general numbers of colors and flavors and general 5D Chern-Simons level; the latter is adjusted by adding extra quarks to the 4D quiver. We use deconstruction as a non-stringy UV completion of the quantum 5D theory; to prove its usefulness, we compute quantum corrections to the SQCD_5 prepotential. We also explore the moduli/parameter space of the deconstructed SQCD_5 and show that for |K_CS| < N_F/2 it continues to negative values of 1/(g_5)^2. In many cases there are flop transitions connecting SQCD_5 to exotic 5D theories such as E0, and we present several examples of such transitions. We compare deconstruction to brane-web engineering of the same SQCD_5 and show that the phase diagram is the same in both cases; indeed, the two UV completions are in the same universality class, although they are not dual to each other. Hence, the phase structure of an SQCD_5 (and presumably any other 5D gauge theory) is inherently five-dimensional and does not depend on a UV completion.

**Multi-parametric Quantum Algebras and the Cosmological Constant.** Advances in High Energy Physics, Volume 2007, pp. 13458, 2007.

ArXiv: hep-th/0511147.

@article{Krishnan2007:300, author = "Chethan Krishnan and Edoardo {Di Napoli}", title = "Multi-parametric Quantum Algebras and the Cosmological Constant. ", journal = "Advances in High Energy Physics", year = 2007, volume = 2007, pages = 13458, note = "ArXiv: hep-th/0511147" }

With a view towards applications for de Sitter, we construct the multi-parametric $q$-deformation of the $so(5,\mathbb{C})$ algebra using the Faddeev-Reshetikhin-Takhtadzhyan (FRT) formalism.

**Anomaly Cancellation and Conformality in Quiver Gauge Theories** Physics Letters B, Volume 638(4), pp. 374, 2006.

ArXiv:hep-th/0603065.

@article{Di_Napoli2006:710, author = "Edoardo {Di Napoli} and Paul Frampton", title = "Anomaly Cancellation and Conformality in Quiver Gauge Theories ", journal = "Physics Letters B", year = 2006, volume = 638, number = 4, pages = 374, note = "ArXiv:hep-th/0603065" }

Abelian quiver gauge theories provide nonsupersymmetric candidates for the conformality approach to physics beyond the standard model. Written as ${\cal N}=0$, $U(N)^n$ gauge theories, however, they have mixed $U(1)_p U(1)_q^2$ and $U(1)_p SU(N)_q^2$ triangle anomalies. It is shown how to construct explicitly a compensatory term $\Delta{\cal L}_{comp}$ which restores gauge invariance of ${\cal L}_{eff} = {\cal L} + \Delta {\cal L}_{comp}$ under $U(N)^n$. It can lead to a negative contribution to the U(1) $\beta$-function and hence to one-loop conformality at high energy for all dimensionless couplings.

**Can Quantum de Sitter Space Have Finite Entropy?** Classical and Quantum Gravity, Volume 24(13), pp. 3457, 2006.

ArXiv:hep-th/0602002.

@article{Krishnan2006:98, author = "Chethan Krishnan and Edoardo {Di Napoli}", title = "Can Quantum de Sitter Space Have Finite Entropy? ", journal = "Classical and Quantum Gravity", year = 2006, volume = 24, number = 13, pages = 3457, note = "ArXiv:hep-th/0602002" }

If one tries to view de Sitter as a true (as opposed to a meta-stable) vacuum, there is a tension between the finiteness of its entropy and the infinite-dimensionality of its Hilbert space. We investigate the viability of one proposal to reconcile this tension using $q$-deformation. After defining a differential geometry on the quantum de Sitter space, we try to constrain the value of the deformation parameter by imposing the condition that in the undeformed limit, we want the real form of the (inherently complex) quantum group to reduce to the usual SO(4,1) of de Sitter. We find that this forces $q$ to be a real number. Since it is known that quantum groups have finite-dimensional representations only for $q=$ root of unity, this suggests that standard $q$-deformations cannot give rise to finite dimensional Hilbert spaces, ruling out finite entropy for $q$-deformed de Sitter.

**Unitary matrix model of a chiral [SU(N)]^K gauge theory** Journal of High Energy Physics, Volume 2005(10), pp. 074, 2005.

ArXiv:hep-th/0508192.

@article{Di_Napoli2005:918, author = "Edoardo {Di Napoli} and Vadim Kaplunovsky", title = "Unitary matrix model of a chiral [SU(N)]^K gauge theory ", journal = "Journal of High Energy Physics", year = 2005, volume = 2005, number = 10, pages = 74, note = "ArXiv:hep-th/0508192" }

We build a matrix model of a chiral [SU(N)]^K gauge theory (SQCD_5 deconstructed down to 4D) using random unitary matrices to model chiral bifundamental fields (N,bar N) (without (bar N,N)). We verify the duality by matching the loop equation of the matrix model to the anomaly equations of the gauge theory. Then we evaluate the matrix model's free energy and use it to derive the effective superpotential for the gaugino condensates.

**Chiral rings of deconstructive [SU(n(c))]^N quivers** Journal of High Energy Physics, Volume 2004(6), pp. 060, 2004.

ArXiv:hep-th/0406122.

@article{Di_Napoli2004:128, author = "Edoardo {Di Napoli} and Vadim Kaplunovsky and Jacob Sonnenschein", title = "Chiral rings of deconstructive [SU(n(c))]^N quivers ", journal = "Journal of High Energy Physics", year = 2004, volume = 2004, number = 6, pages = 60, note = "ArXiv:hep-th/0406122" }

Dimensional deconstruction of 5D SQCD with general n_c, n_f and k_CS gives rise to 4D N=1 gauge theories with large quivers of SU(n_c) gauge factors. We construct the chiral rings of such [SU(n_c)]^N theories, off-shell and on-shell. Our results are broadly similar to the chiral rings of single U(n_c) theories with both adjoint and fundamental matter, but there are also some noteworthy differences such as nonlocal meson-like operators where the quark and antiquark fields belong to different nodes of the quiver. And because our gauge groups are SU(n_c) rather than U(n_c), our chiral rings also contain a whole zoo of baryonic and antibaryonic operators.

### Book Chapter

**Numerical methods for the eigenvalue problem in electronic structure computations** Computing Solids: Models, ab-initio methods and supercomputing, Key Technologies, Volume 74, pp. D3.1-28, Forschungszentrum Juelich GmbH, March 2014.

Lecture Notes of the 45th IFF Spring School.

@inbook{Di_Napoli2014:348, author = "Edoardo {Di Napoli}", title = "Numerical methods for the eigenvalue problem in electronic structure computations", pages = "D3.1-28", publisher = "Forschungszentrum Juelich GmbH", year = 2014, volume = 74, series = "Key Technologies", month = mar, note = "Lecture Notes of the 45th IFF Spring School", booktitle = "Computing Solids: Models, ab-initio methods and supercomputing", institution = "Juelich Supercomputing Centre" }

### Peer Reviewed Conference Publications

**Hybrid CPU-GPU generation of the Hamiltonian and Overlap matrices in FLAPW methods** Proceedings of the JARA-HPC Symposium, Lecture Notes in Computer Science, Volume 10164, pp. 200-211, Springer, 2017.

@inproceedings{Fabregat-Traver2017:4, author = "Diego Fabregat-Traver and Davor Davidovic and Markus Höhnerbach and Edoardo {Di Napoli}", title = " Hybrid CPU-GPU generation of the Hamiltonian and Overlap matrices in FLAPW methods", year = 2017, volume = 10164, series = "Lecture Notes in Computer Science", pages = "200--211", publisher = "Springer", url = "https://arxiv.org/pdf/1611.00606v1" }

In this paper we focus on the integration of high-performance numerical libraries in ab initio codes and the portability of performance and scalability. The target of our work is FLEUR, a software for electronic structure calculations developed in the Forschungszentrum Jülich over the course of two decades. The presented work follows up on a previous effort to modernize legacy code by re-engineering and rewriting it in terms of highly optimized libraries. We illustrate how this initial effort to get efficient and portable shared-memory code enables fast porting of the code to emerging heterogeneous architectures. More specifically, we port the code to nodes equipped with multiple GPUs. We divide our study into two parts. First, we show considerable speedups attained by minor and relatively straightforward code changes to off-load parts of the computation to the GPUs. Then, we identify further possible improvements to achieve even higher performance and scalability. On a system consisting of 16 cores and 2 GPUs, we observe speedups of up to 5x with respect to our optimized shared-memory code, which in turn means between 7.5x and 12.5x speedup with respect to the original FLEUR code.

**Parallel adaptive integration in high-performance functional Renormalization Group computations** Jülich Aachen Research Alliance High-Performance Computing Symposium 2016, Lecture Notes in Computer Science, Springer-Verlag, 2017.

@inproceedings{Lichtenstein2017:360, author = "Julian Lichtenstein and Jan Winkelmann and David {Sanchez de la Pena} and Toni Vidovic and Edoardo {Di Napoli}", title = "Parallel adaptive integration in high-performance functional Renormalization Group computations", booktitle = "Jülich Aachen Research Alliance High-Performance Computing Symposium 2016", year = 2017, editor = "E. Di Napoli et. al.", series = "Lecture Notes in Computer Science", publisher = "Springer-Verlag", url = "https://arxiv.org/pdf/1610.09991v1.pdf" }

The conceptual framework provided by the functional Renormalization Group (fRG) has become a formidable tool to study correlated electron systems on lattices which, in turn, has provided great insight into complex many-body phenomena, such as high-temperature superconductivity or topological states of matter. In this work we present one of the latest realizations of fRG which makes use of an adaptive numerical quadrature scheme specifically tailored to the described fRG scheme. The final result is an increase in performance thanks to improved parallelism and scalability.

**A Parallel and Scalable Iterative Solver for Sequences of Dense Eigenproblems Arising in FLAPW** Lecture Notes in Computer Science, 2013.

Accepted.

@inproceedings{Berljafa2013:300, author = "Mario Berljafa and Edoardo {Di Napoli}", title = "A Parallel and Scalable Iterative Solver for Sequences of Dense Eigenproblems Arising in FLAPW ", booktitle = "Lecture Notes in Computer Science", year = 2013, note = "Accepted" }

In one of the most important methods in Density Functional Theory, the Full-Potential Linearized Augmented Plane Wave (FLAPW) method, dense generalized eigenproblems are organized in long sequences. Moreover each eigenproblem is strongly correlated to the next one in the sequence. We propose a novel approach which exploits such correlation through the use of an eigensolver based on subspace iteration and accelerated with Chebyshev polynomials. The resulting solver, parallelized using the Elemental library framework, achieves excellent scalability and is competitive with current dense parallel eigensolvers.

**An Example of Symmetry Exploitation for Energy-related Eigencomputations** INTERNATIONAL CONFERENCE OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING 2009: (ICCMSE 2009), AIP Conference Proceedings, Volume 1504, pp. 1134-1137, 2012.

@inproceedings{Petschow2012:860, author = "Matthias Petschow and Edoardo {Di Napoli} and Paolo Bientinesi", title = "An Example of Symmetry Exploitation for Energy-related Eigencomputations", booktitle = "INTERNATIONAL CONFERENCE OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING 2009: (ICCMSE 2009)", year = 2012, volume = 1504, series = "AIP Conference Proceedings", pages = "1134--1137", url = "http://link.aip.org/link/doi/10.1063/1.4772126" }

One of the most widely used approaches in simulating materials is the tight-binding approximation. When using this method in a material simulation, it is necessary to compute the eigenvalues and eigenvectors of the Hamiltonian describing the system. In general, the system possesses a few explicit symmetries, and because of them the problem has many degenerate eigenvalues. The ambiguity in choosing an orthonormal basis of the invariant subspaces associated with degenerate eigenvalues will result in eigenvectors which are not invariant under the action of the symmetry operators in matrix form. A meaningful computation of the eigenvectors needs to take those symmetries into account. A natural choice is a set of eigenvectors that simultaneously diagonalizes the Hamiltonian and the symmetry matrices. This is possible because all the matrices commute with each other. The simultaneous eigenvectors and the corresponding eigenvalues will be in a parametrized form in terms of the lattice momentum components. This functional dependence of the eigenvalues is the dispersion relation and describes the band structure of a material. Therefore it is important to find this functional dependence in any numerical computation related to material properties.

**Matrix Structure Exploitation in Generalized Eigenproblems Arising in Density Functional Theory** ICNAAM 2010: International Conference of Numerical Analysis and Applied Mathematics 2010, AIP Conference Proceedings, Volume 1281, pp. 937-940, American Institute of Physics, 2010.

@inproceedings{Di_Napoli2010:830, author = "Edoardo {Di Napoli} and Paolo Bientinesi", title = "Matrix Structure Exploitation in Generalized Eigenproblems Arising in Density Functional Theory", booktitle = "ICNAAM 2010: International Conference of Numerical Analysis and Applied Mathematics 2010", year = 2010, volume = 1281, series = "AIP Conference Proceedings", pages = "937--940", publisher = "American Institute of Physics", url = "http://hpac.rwth-aachen.de/~pauldj/pubs/ICNAAM-Edo-TR.pdf" }

In this short paper, the authors report a new computational approach in the context of Density Functional Theory (DFT). It is shown how it is possible to speed up the self-consistent cycle (iteration) characterizing one of the most well-known DFT implementations: FLAPW. Generating the Hamiltonian and overlap matrices and solving the associated generalized eigenproblems A x = λ B x constitute the two most time-consuming fractions of each iteration. Two promising directions, implementing the new methodology, are presented that will ultimately improve the performance of the generalized eigensolver and save computational time.

### Theses

**Quiver gauge theories, chiral rings and random matrix models** Ph.D. Dissertation, The University of Texas at Austin, 2005.

@phdthesis{Di_Napoli2005:340, author = "Edoardo {Di Napoli}", title = "Quiver gauge theories, chiral rings and random matrix models", school = "The University of Texas at Austin", year = 2005, type = "Ph.D. Dissertation", address = "1, University Station, 78712 Austin, Texas, USA" }

Dimensional deconstruction of 5D SQCD with general n_c, n_f and k_CS gives rise to 4D N = 1 gauge theories with large quivers of SU(n_c) gauge factors. We first describe the spectrum of the model in the deconstructive limit and show its properties. We then construct the chiral rings of such theories, off-shell and on-shell. Anomaly equations for the various resolvents allowed by the model permit us to calculate all the relevant chiral operators. The results are broadly similar to the chiral rings of single U(n_c) theories with both adjoint and fundamental matter, but there are also some noteworthy differences such as nonlocal meson-like operators where the quark and antiquark fields belong to different nodes of the quiver. And because the analyzed gauge groups are SU(n_c) rather than U(n_c), our chiral rings also contain a whole collection of baryonic and antibaryonic operators. We then investigate the random matrix model corresponding to such a chiral ring. We find that bifundamental chiral operators correspond to unitary matrices. We derive the loop equations and show that they are in perfect agreement with the anomaly equations of the gauge model. An exact expression for the free energy is found in the large N (rank of the matrix) limit. A formula for the effective superpotential is derived and some examples are illustrated.

**Su una proposta di Modello Standard alternativo** Laurea Thesis, I Università di Roma "La Sapienza", 1996.

@mastersthesis{Di_Napoli1996:554, author = "Edoardo {Di Napoli}", title = "Su una proposta di Modello Standard alternativo", school = {I Universit{\`a} di Roma ``La Sapienza''}, year = 1996, type = "Laurea Thesis", address = "Piazzale Aldo Moro, Roma, Italy" }

This model aims to replace the traditional electroweak sector of the Standard Model, in which masses are "generated" by the spontaneous symmetry breaking of a scalar doublet inserted explicitly into the Lagrangian, with a model in which the symmetry breaking is dynamical and the Higgs-like scalar particles arise as a composite state of fermions. The analogy with BCS theory is close, although the mathematical treatment is decidedly different, making extensive use of functional methods that are widespread in relativistic field theories. The starting point is the seminal work of Nambu and Jona-Lasinio (NJL), which gave rise to the model bearing their names. The mechanism of dynamical symmetry breaking, extensively developed in later work, has its origins in this model. Building on the close analogy with BCS theory, NJL apply the quasi-particle method devised by Bogoliubov to relativistic field physics. Through a simple chirally invariant four-fermion (4-F) interaction they derive a gap equation, in the Hartree-Fock approximation, for an order parameter that is nothing other than the dynamical mass of the fermions. The model does not stop there: it explicitly exhibits the rich structure of the infinitely many ground states and the existence of massive dynamical bound states. The only problem NJL encounter is the unavoidable logarithmic dependence of the analytic results on the cut-off, which paradoxically enriches the structure of the theory but cannot be absorbed into a redefinition of the coupling constant, since the whole framework is non-renormalizable. The first part of this work makes extensive use of the fundamental ideas of this model, together with the techniques developed by Gross and Neveu for the non-perturbative 1/N expansion and the formalism for composite operators introduced by Jackiw, Cornwall and Tomboulis. The results were compared with the recent article by Bardeen, Hill and Lindner. To obtain a scheme as general as possible, a self-consistent method was followed that makes extensive use of functional techniques applied to the effective action.

### Technical Report

**Solving Dense Generalized Eigenproblems on Multi-Threaded Architectures** Aachen Institute for Computational Engineering Science, RWTH Aachen, November 2011.

Technical Report AICES-2011/11-3.

@techreport{Aliaga2011:348, author = "Jos{\'e} Aliaga and Paolo Bientinesi and Davor Davidovic and Edoardo {Di Napoli} and Francisco Igual and {Enrique S.} Quintana-Orti", title = "Solving Dense Generalized Eigenproblems on Multi-Threaded Architectures", institution = "Aachen Institute for Computational Engineering Science, RWTH Aachen", year = 2011, month = nov, note = "Technical Report AICES-2011/11-3" }
