Science Lounge

GSoC - Divorcing that PR

In the last blog post, I talked about the LOBPCG algorithm and my GSoC project’s first 2 weeks. The 3rd and 4th weeks of my GSoC project were a long rally to get the LOBPCG PR out of the window.

Result and tracing

At the start of the 3rd week, I made a PR to IterativeSolvers.jl to finalize that week’s milestone. Soon after, I had a discussion with my mentor, Harmen Stoppels, and we agreed to implement a result and tracing structs approach similar to that of Optim.jl. The idea is to make it possible to return all the useful information that may be required by the user out of a LOBPCG run without having to litter the return statement with too many outputs. Had I chosen to return many outputs, the user would have had to remember the order these outputs are following which would be unnecessary mental workload. These useful information include things like: the number of iterations, tolerance, which eigenvectors converged, etc.

Another limitation of the implementation then was that a very basic trace was used which only allowed the storing of residual norms. While residual norms are important info to trace, one might also be interested in the trace of Ritz values. To that end, I implemented a LOBPCGState to store important information of every iteration. These are then stacked together if the log keyword argument was set to true. If the log keyword argument is set to false, the trace field of the LOBPCGResults struct is set to an empty LOBPCGTrace to maintain type stability.

Tests-based workflow

After adding the above features, I realized that not all the LOBPCG code was tested. So I sat down and wrote some more tests for the constraints, fixed some errors and tried to increase the coverage as much as possible. I must say this is the first time I pay close attention to rigorous test coverage of my programs. Usually I would just make sure my program is running in all cases using notebooks and call it a day. However to convince Harmen and other spectators that the functions are working correctly in all scenarios, I had to be systematic about my tests making sure the results on my machine are reproducible on Travis. Coming up with and getting used to a workflow for debugging the erroring and failing tests took me a while, but then I think I found a good workflow. The tests I made were using different number types, for generalized and simple eigenvalue problems, for single and multiple eigenvalues, with and without constraints, with and without preconditioners, etc. The cases were almost too many to try out sequentially in a notebook. The @testset macro which admits a for loop in the header was my friend though, enabling me to define different combinations of the above test cases using relatively few lines of code. However, if one test case wouldn’t work and I try to fix it, re-running all the tests again to check if they pass or not would take a few minutes. This is unreasonable. So as soon as I found a test case erroring or failing, I would extract the relevant code from the test/lobpcg.jl file and start playing with it in a separate Jupyter notebook. With the help of Revise.jl, I can then productively fix the problem with the code/test. So my final workflow was as follows:

  1. Use VSCode to develop the files/package.
  2. Run and debug basic manual test cases using a Jupyter notebook with the help of Revise.jl. This ensures that basic functionality is working.
  3. Use VSCode to develop rigorous tests with maximal code coverage.
  4. Use the command line to run test/runtests.jl.
  5. Isolate the first erorring/failing test in a Jupyter notebook.
  6. Use Jupyter, Revise.jl and VSCode to debug and fix the code and/or test case.
  7. Move on to the next erroring/failing test until all are passed.
  8. Commit the desired changes with the help of VSCode. Worthy of mention is the (Un)Stage Selected Range option of VSCode that I often use to select specific lines to stage in a commit. This makes it easier to separate changes made in the same coding session by their context.

Decoupling the block size and number of eigenvalues

After all tests had passed and the PR was ready to merge, Harmen made one valid point about the possibility of separating nev which is the number of eigenvalues/eigenvectors desired and the block size k of the algorithm. He suggested to make it possible to find the k of the nev eigenvectors at a time. This may be useful in cases where not all the Ritz vectors converge at a similar rate. It may therefore be more efficient to use a smaller block size k < nev sequentially to find all the nev eigenvectors, k at a time, than to simply have k == nev. Supporting this functionality required adding an efficient Constraint updating mechanism. While the main idea seemed easy, that is just finding k eigenvectors in every iteration and adding them to the constraints to find the next batch of eigenvectors, the devil was indeed in the details. Efficiently updating the Constraint and handling corner cases where nev is not a multiple of k as well creating and updating the LOBPCGResults struct were the main challenges. After some hours of struggling with this, it was finally ready to merge!

My Eigen-Journey

LOBPCG_blog

This blog post is about eigenvalues. If you are not on good terms with the eigen-family, this is a good time to reconcile. Happy Australian reconciliation week!

Eigenvalues are special mathematical properties that square matrices have. The eigenvalues of a matrix say something about the extent of deformation the sides of a unit square of corners and would incur relative to each other, if you multiply by the matrix which could be for example. An eigenvalue of the square matrix and its corresponding eigenvector satisfy the following mathematical relation:

Generalized eigenvalues and eigenvectors are properties of square matrix pairs and , generalizing the above definition to:

Simple eigenvalues and eigenvectors satsify the generalized condition where is the identity matrix.

Motivation from Structural Mechanics

Given a certain loaded structure, one might want to ask: what is the biggest number that I can multiply the current loads by without causing the structure to fall apart? Generalized eigenvalues help us answer this question.

One straightforward way to answer the above question is to use some finite element software such as ABAQUS or ANSYS and compute the maximum von Mises stress that the current load configuration results in, then the answer would be where is the yield stress of the material. This is because multiplying the loads by a certain factor roughly multiplies the stress by the same factor given certain conventional assumptions. While this answer is perfectly reasonable for a good number of structures, there is a problem with it, and it is not in the linear proportionality of stress and strain. The problem is that elastic strain is fundamentally nonlinear.

Rise of the quadratic terms

Strain is a measure of deformation. For 1D structures such as a spring, strain is often treated as a linear function of how much the tip of the spring has moved. More specifically, it is defined as the ratio of the displacement of the tip and the length of the unloaded spring. This however is only true under the assumption of small strain. More correctly and generally for 2D and 3D structures, the strain, also known as the Green-Lagrange strain, is defined using the following formula:

where is the unit displacement along the axis, of the point originally located at the position vector , when the structure was undeformed. is the rate of change of the function as the point moves along the axis. The above definition uses tensor notation such that:

where is the dimension of the problem.

The above more correct definition of strain is nonlinear in the displacement function’s derivatives. However typical linear stress analysis assumes the so-called infinitesimal strain theory which lets us drop the quadratic term from the above definition. This quadratic term can however come back and bite us in the neck leaving a mark that says Buckling.

Buckling happens when there exists a zero-energy deformation mode at a certain stress level. If a zero-energy deformation exists at the current state of the system, it is possible that subject to a finger flick, that the system may deform significantly along the zero-energy deformation mode vectors. Such can result in the phenonmenon known as buckling, and the system at this state is an unstable system.

Predicting when these zero-energy deformation modes are likely to show up is a challenging task. However, for the cases where we have a single load (or multiple treated as one), it is simple to answer the question: what is the biggest number that I can multiply the current load by without reaching an unstable state?

To cut a long story short, taking the above quadratic terms into account and using certain assumptions, the smallest value for that satisfies the equation below for some is an estimate of the answer to the above question:

where is a symmetric positive definite (SPD) matrix derived from the material of the structure and its shape discretization, and is a symmetric matrix derived from the shape discretization and stress distribution obtained using stress analysis.

The equation above maps closely to the generalized eigenvalue problem where and . The rest of this article will therefore be dedicated to finding the minimum (or maximum) eigenvalues of the system .

The locally optimal block preconditioned conjugate gradient (LOBPCG) method

From now on, I will only be talking about symmetric matrices and . I will also assume that is positive definite. If is not positive definite but is, then one can find the maximum eigenvalue of the system instead, then invert the eigenvalue. This will give us the minimum eigenvalue of .

One way to find the minimum (maximum) eigenvalues of the system is to minimize (maximize) the so-called Rayleigh quotient on the -ellipsoid. The Rayleigh quotient is defined as:

A vector lies on the -ellipsoid if it satisfies the condition . One might also be interested in finding the smallest (largest) eigenvalues and their corresponding eigenvectors in which case the condition needs to be satisfied, where the columns of are the eigenvectors corresponding to the smallest (largest) eigenvalues. This condition means that any 2 eigenvectors and are -orthogonal, that is . It also means that each eigenvector is -normalized, that is .

The LOBPCG algorithm uses an enhanced conjugate gradient (CG) method. The details of the algorithm will be left for another blog post. In the rest of this post, I will talk about my implementation challenges and progress as part of the Google Summer of Code (GSoC) with the NumFOCUS organization.

LOBPCG.jl

As part of my GSoC project, I am expected to implement the LOBPCG algorithm in the package IterativeSolvers.jl. To make it easier to modify stuff and upload unfinished bugged code showcasing my progress, I created the LOBPCG.jl repository to host my code.

Which algorithm is the LOBPCG?

One challenge I had initially was to nail down one version of the algorithm to label it “LOBPCG”. For that and following a suggestion from my mentor Harmen Stoppels, I found the Python implementation of the algorithm by its inventor and a collaborator. The code was BSD licensed and since my translation to Julia would be considered “derived work”, I had to make an acknowledgement at the top of the file giving credit to the original authors and putting all necessary disclaimers that are part of the BSD license.

Python -> Julia -> Fast Julia

After reading the Python implementation, I translated it to Julia very naively, allocating a lot of memory unnecessarily, using type unstable code and using repetitive codes all over the place. It was a very close translation of the Python version so it had a lot of the Python features that makes it, well…slow! After making sure the translation is working, I refactored the code a bit to identify key elements of the program, it was a big mess with over 30 matrices interacting. This helped me identify the patterns in the code, which enabled my third refactoring pass. That third pass was the main pass. I used Julia’s callable structs feature which lets me group variables by the functions using them (pretty much the inverse of object-oriented programming). This made it possible to write clean, type-stable and modular code. I was careful to use views and in-place operations such as A_mul_B!, pre-allocating all the necessary memory at the very beginning and simply resusing this memory later. The end result was a mostly in-place Julian version of the LOBPCG algorithm that doesn’t work! I had to fix a ton of bugs, often self-reflecting, and fantasizing about a world with no programming bugs, really none at all!!!

After a long patient tracking down of all the bugs, I was finally happy to see my program working. At that point the program was fast and mostly in-place. I then did a fourth refactoring pass removing most repetitive patterns in the code which resulted in the current version on Github. Preliminary tests show that the LOBPCG algorithm is about >10x faster than the eigs function of Julia for generalized eigenvalues, but is 5-10x slower for simple eigenvalues. Something is fishy though, since it seems that eigs tries to factorize B when solving the generalized eigenvalue problem. I will look further into this with Harmen.

Next steps

The next step will be to move my code IterativeSolvers.jl and devise more elaborate tests to benchmark my code including different versions in the future.