
Match Ginkgo's parameters/performance with a fast set of parameters from AMGX #1770

Open
rbourgeois33 opened this issue Jan 17, 2025 · 1 comment


rbourgeois33 commented Jan 17, 2025

Hi,

I am currently evaluating portable, GPU-enabled sparse linear algebra libraries. My goal is to solve a sparse linear system with 2,500,000 unknowns from our CFD code (TRUST) on a single NVIDIA A5000 GPU.

The method I wish to use is the conjugate gradient method preconditioned by classical (Ruge-Stüben) AMG with a Jacobi smoother.

For now, the best performance I could obtain with Ginkgo is not satisfactory when compared to AMGX:

| Library | Iterations | Setup time (s) | Execution time (s) | Time per iteration (ms) |
|---|---|---|---|---|
| Ginkgo | 44 | 1.56643 | 0.280997 | 6.3863 |
| AMGX | 6 | 0.297411 | 0.049706 | 8.2844 |

This table suggests that Ginkgo's underperformance does not stem from inefficient GPU utilization: the time per iteration is very close for both libraries (Ginkgo's iterations are even faster than AMGX's). Rather, the default parameters or algorithms used by Ginkgo seem to lead to much slower convergence (44 iterations vs. 6).

Therefore, I would like to use the same solver parameters for both libraries, but it is unclear to me how to configure some options through the Ginkgo API. Below are parameters selectable in AMGX that I could not figure out how to set in Ginkgo. I apologize in advance if my understanding of either library is wrong (I am new to AMG methods) or if this information is already available in Ginkgo's documentation.

  1. Selector/parallel coarsening algorithm: PMIS (Parallel Modified Independent Set)
    It seems that only the "parallel graph matching (PGM)" algorithm is available in Ginkgo, while only PMIS is available for Classical AMG in AMGX. Could this explain the performance mismatch? Is PMIS actually available in Ginkgo?

  2. Interpolator: D2
    From AMGX's reference manual: "Choose the method for setting interpolation weights. Allowable options are D1 and D2. [...] D2 uses an interpolatory set consisting of vertices that are indirectly connected as well ('distance 2' along the graph)."
    This seems crucial for the coarsening process. If I select D1 in AMGX, the solution does not converge. I was unable to find how to choose the interpolator in Ginkgo from the provided examples (e.g., multigrid-preconditioned-solver-customized.cpp). Is this an option I can configure when setting up PGM?

  3. Coarse solver: DENSE_LU_SOLVER and dense_lu_num_rows=2
    From the source code, I can see that a direct LU solver is selected by default when building a multigrid solver. How can I explicitly set it as the coarse solver? Is there an equivalent to dense_lu_num_rows in Ginkgo, or other selectable parameters?

  4. Cycle iterations: 2
    This refers to "the number of CG iterations per outer iteration." I believe AMGX allows applying two CG iterations with only one AMG preconditioning step, but I am not entirely sure. Setting this to 1 in AMGX does not seem to affect performance. Is this selectable in Ginkgo?

Thank you in advance for your help,
Cheers
Rémi

yhmtsai (Member) commented Jan 19, 2025

  1. Yes, the convergence is affected by the coarsening method. Unfortunately, we only support PGM at the moment.
    We also have a fixed coarsening method, but it requires the coarse selection as input. We have also added support for custom coarsening methods (#1659), so you can bring your own multigrid level and fit it into Ginkgo's structure.
  2. No, we do not provide interpolator selection.
  3. Ginkgo gives you more control over that:
```cpp
using mg = gko::solver::Multigrid;

mg::build()
    .with_coarsest_solver(linop_factory_1, linop_factory_2, ...)
    .with_solver_selector(
        [](gko::size_type level, const gko::LinOp* coarsest_matrix)
            -> gko::size_type {
            if (coarsest_matrix->get_size()[0] > 1024) {
                return 0;  // larger than 1024 rows: select the first solver
            }
            return 1;  // otherwise, select the second one
        })
    ...
```

You can give more than one option and use the level index or a property of the matrix to select the coarsest solver.
4. I thought it only affects the CG and CGF cycles (to me, it sounds like the K-cycle). If it is the K-cycle, it runs CG-like iterations internally within the multigrid cycle, not in the outer CG. If you do not select the CG/CGF cycle, it does not affect the solver setup.
