LSMR further updates (#111)

* bring LSMR implementation of #46 up to date * formatting * use VectorInterface and update tests * add docs for LSMR * add newline at end of `lsmr.jl` to pass formatting check * update LSMR implementation * fix docs, tests and formatting * actually add docs page --------- Co-authored-by: victor <[email protected]>
Jutho · Jan 6, 2025 · 478db80 · 478db80
1 parent 4d2a06f
commit 478db80
Show file tree

Hide file tree

Showing 15 changed files with 418 additions and 36 deletions.
diff --git a/Project.toml b/Project.toml
@@ -27,7 +27,7 @@ Printf = "1"
 Random = "1"
 Test = "1"
 TestExtras = "0.2,0.3"
-VectorInterface = "0.4"
+VectorInterface = "0.4,0.5"
 Zygote = "0.6"
 julia = "1.6"
 

diff --git a/docs/make.jl b/docs/make.jl
@@ -7,6 +7,7 @@ makedocs(; modules=[KrylovKit],
          pages=["Home" => "index.md",
                 "Manual" => ["man/intro.md",
                              "man/linear.md",
+                             "man/leastsquares.md",
                              "man/eig.md",
                              "man/svd.md",
                              "man/matfun.md",

diff --git a/docs/src/index.md b/docs/src/index.md
@@ -10,6 +10,7 @@ objects with vector like behavior (see below) as vectors.
 
 The high level interface of KrylovKit is provided by the following functions:
 *   [`linsolve`](@ref): solve linear systems `A*x = b`
+*   [`lssolve`](@ref): solve least square problems `A*x ≈ b`
 *   [`eigsolve`](@ref): find a few eigenvalues and corresponding eigenvectors of an
     eigenvalue problem `A*x = λ x`
 *   [`geneigsolve`](@ref): find a few eigenvalues and corresponding vectors of a
@@ -113,12 +114,12 @@ Here follows a wish list / to-do list for the future. Any help is welcomed and a
 
 *   More algorithms, including biorthogonal methods:
     -   for `linsolve`: L-GMRES, MINRES, BiCG, IDR(s), ...
+    -   for `lssolve`: LSQR, ...
     -   for `eigsolve`: BiLanczos, Jacobi-Davidson JDQR/JDQZ, subspace iteration (?), ...
     -   for `geneigsolve`: trace minimization, ...
 *   Support both in-place / mutating and out-of-place functions as linear maps
 *   Reuse memory for storing vectors when restarting algorithms (related to previous)
 *   Support non-BLAS scalar types using GeneralLinearAlgebra.jl and GeneralSchur.jl
-*   Least square problems
 *   Nonlinear eigenvalue problems
 *   Preconditioners
 *   Refined Ritz vectors, Harmonic Ritz values and vectors

diff --git a/docs/src/man/algorithms.md b/docs/src/man/algorithms.md
@@ -24,6 +24,7 @@ KrylovKit.MINRES
 GMRES
 KrylovKit.BiCG
 BiCGStab
+LSMR
 ```
 ## Specific algorithms for generalized eigenvalue problems
 ```@docs

diff --git a/docs/src/man/leastsquares.md b/docs/src/man/leastsquares.md
@@ -0,0 +1,10 @@
+# Least squares problems
+
+Least square problems take the form of finding `x` that minimises `norm(b - A*x)` where
+`A` should be a linear map. As opposed to linear systems, the input and output of the linear
+map do not need to be the same, so that `x` (input) and `b` (output) can live in different
+vector spaces. Such problems can be solved using the function `lssolve`:
+
+```@docs
+lssolve
+```
diff --git a/src/KrylovKit.jl b/src/KrylovKit.jl
@@ -28,7 +28,7 @@ using Random
 using PackageExtensionCompat
 const IndexRange = AbstractRange{Int}
 
-export linsolve, reallinsolve
+export linsolve, reallinsolve, lssolve
 export eigsolve, geneigsolve, realeigsolve, schursolve, svdsolve
 export exponentiate, expintegrator
 export orthogonalize, orthogonalize!!, orthonormalize, orthonormalize!!
@@ -37,7 +37,7 @@ export initialize, initialize!, expand!, shrink!
 export ClassicalGramSchmidt, ClassicalGramSchmidt2, ClassicalGramSchmidtIR
 export ModifiedGramSchmidt, ModifiedGramSchmidt2, ModifiedGramSchmidtIR
 export LanczosIterator, ArnoldiIterator, GKLIterator
-export CG, GMRES, BiCGStab, Lanczos, Arnoldi, GKL, GolubYe
+export CG, GMRES, BiCGStab, Lanczos, Arnoldi, GKL, GolubYe, LSMR
 export KrylovDefaults, EigSorter
 export RecursiveVec, InnerProductVec
 
@@ -236,6 +236,10 @@ include("linsolve/cg.jl")
 include("linsolve/gmres.jl")
 include("linsolve/bicgstab.jl")
 
+# lssolve
+include("lssolve/lssolve.jl")
+include("lssolve/lsmr.jl")
+
 # eigsolve and svdsolve
 include("eigsolve/eigsolve.jl")
 include("eigsolve/lanczos.jl")

diff --git a/src/algorithms.jl b/src/algorithms.jl
@@ -84,16 +84,16 @@ abstract type KrylovAlgorithm end
 
 # General purpose; good for linear systems, eigensystems and matrix functions
 """
-    Lanczos(; krylovdim = KrylovDefaults.krylovdim, maxiter = KrylovDefaults.maxiter,
-        tol = KrylovDefaults.tol, orth = KrylovDefaults.orth, eager = false, verbosity = 0)
+    Lanczos(; krylovdim=KrylovDefaults.krylovdim, maxiter=KrylovDefaults.maxiter,
+        tol=KrylovDefaults.tol, orth=KrylovDefaults.orth, eager=false, verbosity=0)
 
 Represents the Lanczos algorithm for building the Krylov subspace; assumes the linear
 operator is real symmetric or complex Hermitian. Can be used in `eigsolve` and
 `exponentiate`. The corresponding algorithms will build a Krylov subspace of size at most
 `krylovdim`, which will be repeated at most `maxiter` times and will stop when the norm of
 the residual of the Lanczos factorization is smaller than `tol`. The orthogonalizer `orth`
 will be used to orthogonalize the different Krylov vectors. Eager mode, as selected by
-`eager = true`, means that the algorithm that uses this Lanczos process (e.g. `eigsolve`)
+`eager=true`, means that the algorithm that uses this Lanczos process (e.g. `eigsolve`)
 can try to finish its computation before the total Krylov subspace of dimension `krylovdim`
 is constructed. Default verbosity level `verbosity` is zero, meaning that no output will be
 printed.
@@ -121,8 +121,8 @@ function Lanczos(;
 end
 
 """
-    GKL(; krylovdim = KrylovDefaults.krylovdim, maxiter = KrylovDefaults.maxiter,
-        tol = KrylovDefaults.tol, orth = KrylovDefaults.orth, verbosity = 0)
+    GKL(; krylovdim=KrylovDefaults.krylovdim, maxiter=KrylovDefaults.maxiter,
+        tol=KrylovDefaults.tol, orth=KrylovDefaults.orth, verbosity=0)
 
 Represents the Golub-Kahan-Lanczos bidiagonalization algorithm for sequentially building a
 Krylov-like factorization of a general matrix or linear operator with a bidiagonal reduced
@@ -153,15 +153,15 @@ function GKL(;
 end
 
 """
-    Arnoldi(; krylovdim = KrylovDefaults.krylovdim, maxiter = KrylovDefaults.maxiter,
-        tol = KrylovDefaults.tol, orth = KrylovDefaults.orth, eager = false, verbosity = 0)
+    Arnoldi(; krylovdim=KrylovDefaults.krylovdim, maxiter=KrylovDefaults.maxiter,
+        tol=KrylovDefaults.tol, orth=KrylovDefaults.orth, eager=false, verbosity=0)
 
 Represents the Arnoldi algorithm for building the Krylov subspace for a general matrix or
 linear operator. Can be used in `eigsolve` and `exponentiate`. The corresponding algorithms
 will build a Krylov subspace of size at most `krylovdim`, which will be repeated at most
 `maxiter` times and will stop when the norm of the residual of the Arnoldi factorization is
 smaller than `tol`. The orthogonalizer `orth` will be used to orthogonalize the different
-Krylov vectors. Eager mode, as selected by `eager = true`, means that the algorithm that
+Krylov vectors. Eager mode, as selected by `eager=true`, means that the algorithm that
 uses this Arnoldi process (e.g. `eigsolve`) can try to finish its computation before the
 total Krylov subspace of dimension `krylovdim` is constructed. Default verbosity level
 `verbosity` is zero, meaning that no output will be printed.
@@ -190,16 +190,16 @@ function Arnoldi(;
 end
 
 """
-    GolubYe(; krylovdim = KrylovDefaults.krylovdim, maxiter = KrylovDefaults.maxiter,
-        tol = KrylovDefaults.tol, orth = KrylovDefaults.orth, verbosity = 0)
+    GolubYe(; krylovdim=KrylovDefaults.krylovdim, maxiter=KrylovDefaults.maxiter,
+        tol=KrylovDefaults.tol, orth=KrylovDefaults.orth, verbosity=0)
 
 Represents the Golub-Ye algorithm for solving hermitian (symmetric) generalized eigenvalue
 problems `A x = λ B x` with positive definite `B`, without the need for inverting `B`.
 Builds a Krylov subspace of size `krylovdim` starting from an estimate `x` by acting with
 `(A - ρ(x) B)`, where `ρ(x) = dot(x, A*x)/dot(x, B*x)`, and employing the Lanczos
 algorithm. This process is repeated at most `maxiter` times. In every iteration `k>1`, the
 subspace will also be expanded to size `krylovdim+1` by adding ``x_k - x_{k-1}``, which is
-known as the LOPCG correction and was suggested by Money and Ye. With `krylovdim = 2`, this
+known as the LOPCG correction and was suggested by Money and Ye. With `krylovdim=2`, this
 algorithm becomes equivalent to `LOPCG`.
 """
 struct GolubYe{O<:Orthogonalizer,S<:Real} <: KrylovAlgorithm
@@ -222,7 +222,7 @@ end
 abstract type LinearSolver <: KrylovAlgorithm end
 
 """
-    CG(; maxiter = KrylovDefaults.maxiter, tol = KrylovDefaults.tol)
+    CG(; maxiter=KrylovDefaults.maxiter, tol=KrylovDefaults.tol, verbosity=0)
 
 Construct an instance of the conjugate gradient algorithm with specified parameters, which
 can be passed to `linsolve` in order to iteratively solve a linear system with a positive
@@ -231,7 +231,7 @@ will search for the optimal `x` in a Krylov subspace of maximal size `maxiter`,
 `norm(A*x - b) < tol`. Default verbosity level `verbosity` is zero, meaning that no output
 will be printed.
 
-See also: [`linsolve`](@ref), [`MINRES`](@ref), [`GMRES`](@ref), [`BiCG`](@ref),
+See also: [`linsolve`](@ref), [`MINRES`](@ref), [`GMRES`](@ref), [`BiCG`](@ref), [`LSMR`](@ref),
 [`BiCGStab`](@ref)
 """
 struct CG{S<:Real} <: LinearSolver
@@ -247,8 +247,8 @@ function CG(;
 end
 
 """
-    GMRES(; krylovdim = KrylovDefaults.krylovdim, maxiter = KrylovDefaults.maxiter,
-        tol = KrylovDefaults.tol, orth::Orthogonalizer = KrylovDefaults.orth)
+    GMRES(; krylovdim=KrylovDefaults.krylovdim, maxiter=KrylovDefaults.maxiter,
+        tol=KrylovDefaults.tol, orth::Orthogonalizer=KrylovDefaults.orth)
 
 Construct an instance of the GMRES algorithm with specified parameters, which can be passed
 to `linsolve` in order to iteratively solve a linear system. The `GMRES` method will search
@@ -262,7 +262,7 @@ to as the restart parameter, and `maxiter` is the number of outer iterations, i.
 cycles. The total iteration count, i.e. the number of expansion steps, is roughly
 `krylovdim` times the number of iterations.
 
-See also: [`linsolve`](@ref), [`BiCG`](@ref), [`BiCGStab`](@ref), [`CG`](@ref),
+See also: [`linsolve`](@ref), [`BiCG`](@ref), [`BiCGStab`](@ref), [`CG`](@ref), [`LSMR`](@ref),
 [`MINRES`](@ref)
 """
 struct GMRES{O<:Orthogonalizer,S<:Real} <: LinearSolver
@@ -283,7 +283,7 @@ end
 
 # TODO
 """
-    MINRES(; maxiter = KrylovDefaults.maxiter, tol = KrylovDefaults.tol)
+    MINRES(; maxiter=KrylovDefaults.maxiter, tol=KrylovDefaults.tol)
 
     !!! warning "Not implemented yet"
 
@@ -295,7 +295,7 @@ end
     orthogonalizer `orth`. Default verbosity level `verbosity` is zero, meaning that no
     output will be printed.
 
-See also: [`linsolve`](@ref), [`CG`](@ref), [`GMRES`](@ref), [`BiCG`](@ref),
+See also: [`linsolve`](@ref), [`CG`](@ref), [`GMRES`](@ref), [`BiCG`](@ref), [`LSMR`](@ref),
 [`BiCGStab`](@ref)
 """
 struct MINRES{S<:Real} <: LinearSolver
@@ -311,7 +311,7 @@ function MINRES(;
 end
 
 """
-    BiCG(; maxiter = KrylovDefaults.maxiter, tol = KrylovDefaults.tol)
+    BiCG(; maxiter=KrylovDefaults.maxiter, tol=KrylovDefaults.tol)
 
     !!! warning "Not implemented yet"
 
@@ -322,7 +322,7 @@ end
     b) < tol`. Default verbosity level `verbosity` is zero, meaning that no output will be
     printed.
 
-See also: [`linsolve`](@ref), [`GMRES`](@ref), [`CG`](@ref), [`BiCGStab`](@ref),
+See also: [`linsolve`](@ref), [`GMRES`](@ref), [`CG`](@ref), [`BiCGStab`](@ref), [`LSMR`](@ref),
 [`MINRES`](@ref)
 """
 struct BiCG{S<:Real} <: LinearSolver
@@ -338,15 +338,15 @@ function BiCG(;
 end
 
 """
-    BiCGStab(; maxiter = KrylovDefaults.maxiter, tol = KrylovDefaults.tol)
+    BiCGStab(; maxiter=KrylovDefaults.maxiter, tol=KrylovDefaults.tol)
 
     Construct an instance of the Biconjugate gradient algorithm with specified parameters,
     which can be passed to `linsolve` in order to iteratively solve a linear system general
     linear map. The `BiCGStab` method will search for the optimal `x` in a Krylov subspace
     of maximal size `maxiter`, or stop when `norm(A*x - b) < tol`. Default verbosity level
     `verbosity` is zero, meaning that no output will be printed.
 
-See also: [`linsolve`](@ref), [`GMRES`](@ref), [`CG`](@ref), [`BiCG`](@ref),
+See also: [`linsolve`](@ref), [`GMRES`](@ref), [`CG`](@ref), [`BiCG`](@ref), [`LSMR`](@ref),
 [`MINRES`](@ref)
 """
 struct BiCGStab{S<:Real} <: LinearSolver
@@ -361,6 +361,36 @@ function BiCGStab(;
     return BiCGStab(maxiter, tol, verbosity)
 end
 
+# Solving least squares problems
+abstract type LeastSquaresSolver <: KrylovAlgorithm end
+"""
+LSMR(; maxiter=KrylovDefaults.maxiter, tol=KrylovDefaults.tol, verbosity=0)
+
+Represents the LSMR algorithm, which minimizes ``\\|Ax - b\\|^2 + \\|λx\\|^2`` in the Euclidean norm.
+If multiple solutions exists the minimum norm solution is returned.
+The method is based on the Golub-Kahan bidiagonalization process. It is
+algebraically equivalent to applying MINRES to the normal equations
+``(A^*A + λ^2I)x = A^*b``, but has better numerical properties,
+especially if ``A`` is ill-conditioned.
+
+The `LSMR` method will search for the optimal ``x`` in a Krylov subspace of maximal size 
+`maxiter`, or stop when ``norm(A'*(A*x - b) + λ^2 * x) < tol``. Default verbosity level
+`verbosity` is zero, meaning that no output will be printed.
+
+See also: [`lssolve`](@ref)
+"""
+struct LSMR{S<:Real} <: LeastSquaresSolver
+    maxiter::Int
+    tol::S
+    verbosity::Int
+end
+function LSMR(;
+              maxiter::Integer=KrylovDefaults.maxiter,
+              tol::Real=KrylovDefaults.tol,
+              verbosity::Int=0)
+    return LSMR(maxiter, tol, verbosity)
+end
+
 # Solving eigenvalue systems specifically
 abstract type EigenSolver <: KrylovAlgorithm end
 

diff --git a/src/eigsolve/arnoldi.jl b/src/eigsolve/arnoldi.jl
@@ -58,7 +58,7 @@ currently available.
 The return value is always of the form `T, vecs, vals, info = schursolve(...)` with
 
   - `T`: a `Matrix` containing the partial Schur decomposition of the linear map, i.e. it's
-    elements are given by `T[i,j] = dot(vecs[i], f(vecs[j]))`. It is of Schur form, i.e.
+    elements are given by `T[i,j] = inner(vecs[i], f(vecs[j]))`. It is of Schur form, i.e.
     upper triangular in case of complex arithmetic, and block upper triangular (with at most
     2x2 blocks) in case of real arithmetic.
   - `vecs`: a `Vector` of corresponding Schur vectors, of the same length as `vals`. Note
@@ -67,7 +67,7 @@ The return value is always of the form `T, vecs, vals, info = schursolve(...)` w
     objects that are typically similar to the starting guess `x₀`, up to a possibly
     different `eltype`. When the linear map is a simple `AbstractMatrix`, `vecs` will be
     `Vector{Vector{<:Number}}`. Schur vectors are by definition orthogonal, i.e.
-    `dot(vecs[i],vecs[j]) = I[i,j]`. Note that Schur vectors are real if the problem (i.e.
+    `inner(vecs[i],vecs[j]) = I[i,j]`. Note that Schur vectors are real if the problem (i.e.
     the linear map and the initial guess) are real.
   - `vals`: a `Vector` of eigenvalues, i.e. the diagonal elements of `T` in case of complex
     arithmetic, or extracted from the diagonal blocks in case of real arithmetic. Note that

diff --git a/src/linsolve/cg.jl b/src/linsolve/cg.jl
@@ -19,7 +19,14 @@ function linsolve(operator, b, x₀, alg::CG, a₀::Real=0, a₁::Real=1; alg_rr
     numiter = 0
 
     # Check for early return
-    normr < tol && return (x, ConvergenceInfo(1, r, normr, numiter, numops))
+    if normr < tol
+        if alg.verbosity > 0
+            @info """CG linsolve converged without any iterations:
+             *  norm of residual = $normr
+             *  number of operations = 1"""
+        end
+        return (x, ConvergenceInfo(1, r, normr, numiter, numops))
+    end
 
     # First iteration
     ρ = normr^2
@@ -34,6 +41,14 @@ function linsolve(operator, b, x₀, alg::CG, a₀::Real=0, a₁::Real=1; alg_rr
     β = ρ / ρold
     numops += 1
     numiter += 1
+    if normr < tol
+        if alg.verbosity > 0
+            @info """CG linsolve converged at iteration $numiter:
+             *  norm of residual = $normr
+             *  number of operations = $numops"""
+        end
+        return (x, ConvergenceInfo(1, r, normr, numiter, numops))
+    end
     if alg.verbosity > 1
         msg = "CG linsolve in iter $numiter: "
         msg *= "normres = "
@@ -62,6 +77,8 @@ function linsolve(operator, b, x₀, alg::CG, a₀::Real=0, a₁::Real=1; alg_rr
             ρ = normr^2
             β = ρ / ρold
         end
+        numops += 1
+        numiter += 1
         if normr < tol
             if alg.verbosity > 0
                 @info """CG linsolve converged at iteration $numiter:
@@ -70,8 +87,6 @@ function linsolve(operator, b, x₀, alg::CG, a₀::Real=0, a₁::Real=1; alg_rr
             end
             return (x, ConvergenceInfo(1, r, normr, numiter, numops))
         end
-        numops += 1
-        numiter += 1
         if alg.verbosity > 1
             msg = "CG linsolve in iter $numiter: "
             msg *= "normres = "

diff --git a/src/linsolve/gmres.jl b/src/linsolve/gmres.jl
@@ -32,10 +32,11 @@ function linsolve(operator, b, x₀, alg::GMRES, a₀::Number=0, a₁::Number=1;
     gs = Vector{Givens{T}}(undef, krylovdim)
     R = fill(zero(T), (krylovdim, krylovdim))
     numiter = 0
-    numops = 1 # operator has been applied once to determine T
+    numops = 1 # operator has been applied once to determine T and r
 
     iter = ArnoldiIterator(operator, r, alg.orth)
-    fact = initialize(iter)
+    fact = initialize(iter; verbosity=alg.verbosity - 2)
+    sizehint!(fact, alg.krylovdim)
     numops += 1 # start applies operator once
 
     while numiter < maxiter # restart loop
@@ -56,7 +57,7 @@ function linsolve(operator, b, x₀, alg::GMRES, a₀::Number=0, a₁::Number=1;
         end
 
         while (β > tol && length(fact) < krylovdim) # inner arnoldi loop
-            fact = expand!(iter, fact)
+            fact = expand!(iter, fact; verbosity=alg.verbosity - 2)
             numops += 1 # expand! applies the operator once
             k = length(fact)
             H = rayleighquotient(fact)
@@ -133,7 +134,7 @@ function linsolve(operator, b, x₀, alg::GMRES, a₀::Number=0, a₁::Number=1;
 
         # Restart Arnoldi factorization with new r
         iter = ArnoldiIterator(operator, r, alg.orth)
-        fact = initialize!(iter, fact)
+        fact = initialize!(iter, fact; verbosity=alg.verbosity - 2)
     end
 
     if alg.verbosity > 0

diff --git a/src/linsolve/linsolve.jl b/src/linsolve/linsolve.jl
@@ -85,7 +85,7 @@ efficiently. Check the documentation for more information on the possible values
 
 The final (expert) method, without default values and keyword arguments, is the one that is
 finally called, and can also be used directly. Here, one specifies the algorithm explicitly.
-Currently, only [`CG`](@ref), [`GMRES`](@ref) and [`BiCGStab`](@ref) are implemented, where
+Currently, only [`CG`](@ref), [`GMRES`](@ref), [`BiCGStab`](@ref) and [`LSMR`](@ref) are implemented, where
 `CG` is chosen if `isposdef == true` and `GMRES` is chosen otherwise. Note that in standard
 `GMRES` terminology, our parameter `krylovdim` is referred to as the *restart* parameter,
 and our `maxiter` parameter counts the number of outer iterations, i.e. restart cycles. In