My Eigen-Journey

This blog post is about eigenvalues. If you are not on good terms with the eigen-family, this is a good time to reconcile. Happy Australian reconciliation week!

Eigenvalues are special mathematical properties that square matrices have. The eigenvalues of a square matrix $A$ say something about the extent of deformation that the sides of a unit square would incur relative to each other if you multiply its corner vectors by $A$. An eigenvalue $\lambda$ of the square matrix $A$ and its corresponding eigenvector $x$ satisfy the following mathematical relation:

$$A x = \lambda x$$

Generalized eigenvalues and eigenvectors are properties of square matrix pairs $A$ and $B$, generalizing the above definition to:

$$A x = \lambda B x$$

Simple eigenvalues and eigenvectors satisfy the generalized condition with $B = I$, where $I$ is the identity matrix.
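If you would like to play with these definitions, here is a minimal Julia sketch (separate from my GSoC code) using the dense solvers in the LinearAlgebra standard library; the matrices are made up for illustration:

```julia
using LinearAlgebra

# A symmetric matrix and its eigenvalues: A*x = λ*x
A = Symmetric([2.0 1.0; 1.0 2.0])
F = eigen(A)
@show F.values                        # the eigenvalues, 1.0 and 3.0
x = F.vectors[:, 1]
@show A * x ≈ F.values[1] * x         # the defining relation holds

# A generalized eigenvalue problem: A*x = λ*B*x
B = Symmetric([2.0 0.0; 0.0 1.0])     # positive definite
G = eigen(A, B)
@show G.values
y = G.vectors[:, 1]
@show A * y ≈ G.values[1] * (B * y)
```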

Motivation from Structural Mechanics

Given a certain loaded structure, one might want to ask: what is the biggest number that I can multiply the current loads by without causing the structure to fall apart? Generalized eigenvalues help us answer this question.

One straightforward way to answer the above question is to use some finite element software such as ABAQUS or ANSYS and compute the maximum von Mises stress $\sigma_{\max}$ that the current load configuration results in; the answer would then be $\sigma_y / \sigma_{\max}$, where $\sigma_y$ is the yield stress of the material. This is because multiplying the loads by a certain factor roughly multiplies the stress by the same factor, given certain conventional assumptions. While this answer is perfectly reasonable for a good number of structures, there is a problem with it, and it is not in the linear proportionality of stress and strain. The problem is that strain is fundamentally a nonlinear function of the displacements.
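For example, if the maximum von Mises stress under the current loads comes out to 100 MPa and the yield stress of the material is 250 MPa, that load multiplier would be roughly 2.5.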

Rise of the quadratic terms

Strain is a measure of deformation. For 1D structures such as a spring, strain is often treated as a linear function of how much the tip of the spring has moved. More specifically, it is defined as the ratio of the displacement of the tip to the length of the unloaded spring. This, however, is only true under the assumption of small strain. More correctly and generally for 2D and 3D structures, the strain, also known as the Green-Lagrange strain, is defined using the following formula:

$$E_{ij} = \frac{1}{2}\left(\frac{\partial u_i}{\partial X_j} + \frac{\partial u_j}{\partial X_i} + \frac{\partial u_k}{\partial X_i}\frac{\partial u_k}{\partial X_j}\right)$$

where $u_i$ is the displacement along the axis $i$ of the point originally located at the position vector $X$ when the structure was undeformed, and $\frac{\partial u_i}{\partial X_j}$ is the rate of change of the function $u_i$ as the point moves along the axis $j$. The above definition uses tensor (summation) notation such that:

$$\frac{\partial u_k}{\partial X_i}\frac{\partial u_k}{\partial X_j} = \sum_{k=1}^{n} \frac{\partial u_k}{\partial X_i}\frac{\partial u_k}{\partial X_j}$$

where $n$ is the dimension of the problem.
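To connect this back to the 1D spring: in one dimension the Green-Lagrange strain reduces to

$$E_{11} = \frac{du}{dX} + \frac{1}{2}\left(\frac{du}{dX}\right)^2,$$

and the quadratic term is exactly the part that gets dropped in the linear treatment of the spring above.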

The above more correct definition of strain is nonlinear in the displacement function's derivatives. However, typical linear stress analysis assumes the so-called infinitesimal strain theory, which lets us drop the quadratic term from the above definition. This quadratic term can, however, come back and bite us in the neck, leaving a mark that says Buckling.

Buckling happens when there exists a zero-energy deformation mode at a certain stress level. If a zero-energy deformation exists at the current state of the system, it is possible that, subject to a finger flick, the system may deform significantly along the zero-energy deformation mode vectors. This can result in the phenomenon known as buckling, and the system at this state is an unstable system.

Predicting when these zero-energy deformation modes are likely to show up is a challenging task. However, for the cases where we have a single load (or multiple treated as one), it is simple to answer the question: what is the biggest number that I can multiply the current load by without reaching an unstable state?

To cut a long story short, taking the above quadratic terms into account and using certain assumptions, the smallest value of $\lambda$ that satisfies the equation below for some nonzero vector $v$ is an estimate of the answer to the above question:

$$(K + \lambda K_\sigma)\, v = 0$$

where $K$ is a symmetric positive definite (SPD) matrix derived from the material of the structure and its shape discretization, and $K_\sigma$ is a symmetric matrix derived from the shape discretization and the stress distribution obtained using stress analysis.

The equation above maps closely to the generalized eigenvalue problem $A x = \lambda B x$ where $A = K$ and $B = -K_\sigma$. The rest of this article will therefore be dedicated to finding the minimum (or maximum) eigenvalues of the system $A x = \lambda B x$.
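As a toy illustration of this mapping, here is a hedged Julia sketch: the 2x2 matrices below are made-up stand-ins for the assembled finite element matrices, and a dense solver is used in place of an iterative one:

```julia
using LinearAlgebra

# Hypothetical tiny matrices standing in for the assembled FE matrices.
K  = Symmetric([4.0 1.0; 1.0 3.0])      # SPD stiffness from material and discretization
Kσ = Symmetric([-1.0 0.2; 0.2 -0.5])    # symmetric stress stiffness for the current load

# (K + λ Kσ) v = 0  is equivalent to  K v = λ (-Kσ) v, a generalized
# eigenproblem with A = K and B = -Kσ (B happens to be positive definite here).
A = K
B = Symmetric(-Matrix(Kσ))
F = eigen(A, B)
λ_critical = minimum(F.values)          # estimated load multiplier at buckling
@show λ_critical
```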

The locally optimal block preconditioned conjugate gradient (LOBPCG) method

From now on, I will only be talking about symmetric matrices $A$ and $B$. I will also assume that $B$ is positive definite. If $B$ is not positive definite but $A$ is, then one can find the maximum eigenvalue of the system $B x = \mu A x$ instead, then invert the eigenvalue. This will give us the minimum eigenvalue of $A x = \lambda B x$.

One way to find the minimum (maximum) eigenvalues of the system $A x = \lambda B x$ is to minimize (maximize) the so-called Rayleigh quotient on the $B$-ellipsoid. The Rayleigh quotient is defined as:

$$\rho(x) = \frac{x^T A x}{x^T B x}$$

A vector $x$ lies on the $B$-ellipsoid if it satisfies the condition $x^T B x = 1$. One might also be interested in finding several of the smallest (largest) eigenvalues and their corresponding eigenvectors, in which case the condition $X^T B X = I$ needs to be satisfied, where the columns of $X$ are the eigenvectors corresponding to the smallest (largest) eigenvalues. This condition means that any 2 eigenvectors $x_i$ and $x_j$ are $B$-orthogonal, that is $x_i^T B x_j = 0$. It also means that each eigenvector is $B$-normalized, that is $x_i^T B x_i = 1$.
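To make these conditions concrete, here is a small Julia sketch, again with made-up matrices and the dense solver rather than LOBPCG, that evaluates the Rayleigh quotient and checks the $B$-orthonormality of the computed eigenvectors:

```julia
using LinearAlgebra

A = Symmetric([4.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 2.0])
B = Symmetric([2.0 0.0 0.0; 0.0 1.0 0.0; 0.0 0.0 1.0])   # SPD

# Rayleigh quotient ρ(x) = (xᵀ A x) / (xᵀ B x)
rayleigh(x) = dot(x, A * x) / dot(x, B * x)

F = eigen(A, B)               # dense generalized eigensolver, B-orthonormal vectors
X = F.vectors

@show rayleigh(X[:, 1]) ≈ F.values[1]        # ρ at an eigenvector equals its eigenvalue
@show X' * B * X ≈ Matrix{Float64}(I, 3, 3)  # Xᵀ B X = I: B-orthogonal and B-normalized
```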

The LOBPCG algorithm uses an enhanced conjugate gradient (CG) method. The details of the algorithm will be left for another blog post. In the rest of this post, I will talk about my implementation challenges and progress as part of the Google Summer of Code (GSoC) with the NumFOCUS organization.

LOBPCG.jl

As part of my GSoC project, I am expected to implement the LOBPCG algorithm in the package IterativeSolvers.jl. To make it easier to modify stuff and upload unfinished, buggy code showcasing my progress, I created the LOBPCG.jl repository to host my code.

Which algorithm is the LOBPCG?

One challenge I had initially was to nail down one version of the algorithm to label it "LOBPCG". For that, and following a suggestion from my mentor Harmen Stoppels, I found the Python implementation of the algorithm by its inventor and a collaborator. The code was BSD-licensed, and since my translation to Julia would be considered "derived work", I had to add an acknowledgement at the top of the file giving credit to the original authors and including all the necessary disclaimers that are part of the BSD license.

Python -> Julia -> Fast Julia

After reading the Python implementation, I translated it to Julia very naively, allocating a lot of memory unnecessarily, using type-unstable code and repeating code all over the place. It was a very close translation of the Python version, so it had a lot of the Python features that make it, well…slow! After making sure the translation was working, I refactored the code a bit to identify key elements of the program; it was a big mess with over 30 matrices interacting. This helped me identify the patterns in the code, which enabled my third refactoring pass. That third pass was the main one. I used Julia's callable structs feature, which lets me group variables by the functions using them (pretty much the inverse of object-oriented programming). This made it possible to write clean, type-stable and modular code. I was careful to use views and in-place operations such as A_mul_B!, pre-allocating all the necessary memory at the very beginning and simply reusing this memory later. The end result was a mostly in-place Julian version of the LOBPCG algorithm that doesn't work! I had to fix a ton of bugs, often self-reflecting, and fantasizing about a world with no programming bugs, really none at all!!!
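To give a flavour of the callable-structs idea, here is a toy sketch; the ResidualUpdater type below is purely illustrative and not part of the actual LOBPCG code:

```julia
using LinearAlgebra

# Toy callable struct that owns its pre-allocated work buffer, so repeated
# calls reuse memory instead of allocating (the real solver is more involved).
struct ResidualUpdater{TA, TB, T}
    A::TA
    B::TB
    buffer::Vector{T}    # scratch space allocated once, up front
end

# Make the struct callable: compute r = A*x - λ*B*x fully in place.
function (ru::ResidualUpdater)(r::AbstractVector, x::AbstractVector, λ::Real)
    mul!(ru.buffer, ru.B, x)      # buffer = B*x  (A_mul_B!(buffer, B, x) on Julia 0.6)
    mul!(r, ru.A, x)              # r = A*x
    @. r -= λ * ru.buffer         # r = A*x - λ*B*x without temporaries
    return r
end

A = [2.0 1.0; 1.0 2.0]
B = Matrix(1.0I, 2, 2)
ru = ResidualUpdater(A, B, zeros(2))
r = zeros(2)
@show ru(r, [1.0, 1.0], 3.0)      # ([1,1], 3) is an eigenpair, so the residual is ≈ 0
```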

After a long, patient session of tracking down all the bugs, I was finally happy to see my program working. At that point the program was fast and mostly in-place. I then did a fourth refactoring pass removing most repetitive patterns in the code, which resulted in the current version on Github. Preliminary tests show that the LOBPCG algorithm is more than 10x faster than the eigs function of Julia for generalized eigenvalue problems, but 5-10x slower for simple eigenvalue problems. Something is fishy though, since it seems that eigs tries to factorize B when solving the generalized eigenvalue problem. I will look further into this with Harmen.

Next steps

The next step will be to move my code to IterativeSolvers.jl and devise more elaborate tests and benchmarks for my code, including different versions, in the future.

