implement batched serial getrf #2331

Open: wants to merge 10 commits into develop from implement-batched-serial-getrf

@yasahi-hpc (Contributor) commented Sep 10, 2024

This PR implements the batched serial getrf function (LU factorization with partial pivoting).

The following files are added:

  1. KokkosBatched_Getrf_Serial_Impl.hpp: internal interfaces and implementation details
  2. KokkosBatched_Getrf.hpp: public APIs
  3. Test_Batched_SerialGetrf.hpp: unit tests for the above

Detailed description

It computes an LU factorization of a real general M-by-N matrix A using partial pivoting with row interchanges.
The arguments have the following shapes.

  • A: (batch_count, m, n)
    On entry, the M-by-N matrix to be factored. On exit, the factors L and U from the factorization
    A = P * L * U; the unit diagonal elements of L are not stored.
  • IPIV: (batch_count, min(m, n))
    The pivot indices; for 0 <= i < min(M,N), row i of the matrix was interchanged with row IPIV(i).
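For reference, here is a minimal sketch of what the factorization above computes, assuming a plain row-major std::vector<double> instead of a Kokkos View and 0-based pivots. It is the textbook unblocked right-looking variant, not the recursive (CPU) or stack-based (GPU) algorithm used in this PR, and the name getrf_unblocked is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical unblocked LU with partial pivoting (textbook right-looking variant).
// a is m x n, row-major; on exit it holds U (upper part) and L (strictly lower part,
// unit diagonal not stored). ipiv gets min(m, n) entries: row i was interchanged
// with row ipiv[i] (0-based). Returns 0, or k + 1 if U(k, k) is exactly zero.
int getrf_unblocked(int m, int n, std::vector<double>& a, std::vector<int>& ipiv) {
  auto at = [&](int i, int j) -> double& {
    return a[static_cast<std::size_t>(i) * n + j];
  };
  const int kmax = std::min(m, n);
  ipiv.assign(kmax, 0);
  int info = 0;
  for (int k = 0; k < kmax; ++k) {
    // Pivot search: first row i >= k with the largest |A(i, k)|.
    int p = k;
    for (int i = k + 1; i < m; ++i)
      if (std::abs(at(i, k)) > std::abs(at(p, k))) p = i;
    ipiv[k] = p;
    // Interchange full rows k and p (including the already computed L part).
    if (p != k)
      for (int j = 0; j < n; ++j) std::swap(at(k, j), at(p, j));
    if (at(k, k) == 0.0) {
      if (info == 0) info = k + 1;
      continue;  // zero pivot: skip elimination for this column, keep going
    }
    // Store multipliers (column k of L) and update the trailing submatrix.
    for (int i = k + 1; i < m; ++i) {
      at(i, k) /= at(k, k);
      for (int j = k + 1; j < n; ++j) at(i, j) -= at(i, k) * at(k, j);
    }
  }
  return info;
}
```

For A = [[2, 1], [4, 3]] this yields LU = [[4, 3], [0.5, -0.5]] and ipiv = [1, 1].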

Parallelization over the batch dimension can be written as follows. This is efficient only when
A is stored in LayoutLeft on GPUs and in LayoutRight on CPUs.

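// Parallelize over the batch dimension: each index k factorizes one matrix serially.
const int batch_count = static_cast<int>(_a.extent(0));
Kokkos::parallel_for("getrf",
    Kokkos::RangePolicy<execution_space>(0, batch_count),
    KOKKOS_LAMBDA(const int k) {
        auto aa   = Kokkos::subview(_a, k, Kokkos::ALL(), Kokkos::ALL());
        auto ipiv = Kokkos::subview(_ipiv, k, Kokkos::ALL());

        KokkosBatched::SerialGetrf<AlgoTagType>::invoke(aa, ipiv);
    });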
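The layout requirement follows from the index arithmetic. The sketch below assumes unpadded Views and the Kokkos stride conventions (LayoutLeft: leftmost index contiguous; LayoutRight: rightmost index contiguous), and computes the linear offset of element (k, i, j) in an array of extents (batch_count, m, n). The helpers offset_left and offset_right are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Offset of (k, i, j) in a LayoutLeft array of extents (B, m, n):
// the batch index k has stride 1, so each (i, j) forms a contiguous run over k.
std::size_t offset_left(std::size_t k, std::size_t i, std::size_t j,
                        std::size_t B, std::size_t m) {
  return k + B * (i + m * j);
}

// Offset of (k, i, j) in a LayoutRight array of extents (B, m, n):
// the column index j has stride 1, so each matrix k is one contiguous block.
std::size_t offset_right(std::size_t k, std::size_t i, std::size_t j,
                         std::size_t m, std::size_t n) {
  return j + n * (i + m * k);
}
```

With LayoutLeft, threads k and k + 1 touching the same (i, j) access adjacent addresses, so GPU loads coalesce. With LayoutRight, matrix k occupies the contiguous range [k*m*n, (k+1)*m*n), which suits a CPU thread that works through one whole matrix.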

Tests

  1. Generate a random matrix A and factorize it into LU with pivot indices ipiv. Reconstruct L and U from the packed LU. Then permute the rows of A by ipiv and confirm that L * U equals the permuted A.
  2. A simple, small analytical test: choose A as follows and confirm that LU == A and that piv is as expected.
A = [[1. 0. 0. 0.],
     [0. 1. 0. 0.],
     [0. 0. 1. 0.],
     [0. 0. 0. 1.]]
LU = [[1. 0. 0. 0.],
      [0. 1. 0. 0.],
      [0. 0. 1. 0.],
      [0. 0. 0. 1.]]
piv = [0 1 2 3]
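Test 1 amounts to checking that the row-permuted A equals L * U. A minimal sketch of that check, assuming row-major std::vector storage, 0-based ipiv applied in order i = 0, 1, ..., and a hypothetical function name check_lu (the actual unit test works on Kokkos Views):

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical check that P * A == L * U for a packed m x n LU factor (row-major).
// a is taken by value because it is permuted in place.
bool check_lu(int m, int n, std::vector<double> a, const std::vector<double>& lu,
              const std::vector<int>& ipiv, double tol) {
  const int kmax = m < n ? m : n;
  // 1. Apply the row interchanges recorded in ipiv to A, in order.
  for (int i = 0; i < kmax; ++i)
    if (ipiv[i] != i)
      for (int j = 0; j < n; ++j) std::swap(a[i * n + j], a[ipiv[i] * n + j]);
  // 2. Form (L * U)(i, j) from the packed factor:
  //    L(i, p) = 1 if p == i, lu(i, p) if p < i, else 0; U(p, j) = lu(p, j) if p <= j.
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) {
      double s = 0.0;
      for (int p = 0; p < kmax; ++p) {
        const double l = (p == i) ? 1.0 : (p < i ? lu[i * n + p] : 0.0);
        const double u = (p <= j) ? lu[p * n + j] : 0.0;
        s += l * u;
      }
      // 3. Compare with the permuted A entry.
      if (std::abs(s - a[i * n + j]) > tol) return false;
    }
  return true;
}
```

For example, check_lu(2, 2, {2, 1, 4, 3}, {4, 3, 0.5, -0.5}, {1, 1}, 1e-12) returns true, and the 4 x 4 identity with piv = [0 1 2 3] passes as well.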

Important remarks (edited 30/Oct)

  1. We have different implementations for CPUs and GPUs. On CPUs, we use the original recursive implementation. On GPUs, we emulate the recursive calls with a static stack, because device-level recursion did not work correctly.
  2. We need to add the helper functions Laswp and Iamax. These are also used by getrs and gbtrf.
  3. In the current implementation, the maximum size of A is 4096 x 4096 (4096 = 2^12) because of the fixed stack size. To remove this limit, users would need to provide a temporary buffer to serve as the stack.
  4. Initially, we called the CPU version whenever the View was accessible from the host. This did not work correctly when the function was executed on GPUs with USM. We now call the recursive version on the host and the stack-based version on the device by using KOKKOS_IF_ON_HOST and KOKKOS_IF_ON_DEVICE.
  5. Supporting complex matrices requires complex versions of trsm and gemm, which must be implemented first.
  6. Using gesv with dynamic_pivoting (getrf + getrs) shows comparable or better performance than gesv with static_pivoting for most matrix sizes on NVIDIA and AMD GPUs.
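Remark 2 refers to Laswp and Iamax. The following sketch shows what such helpers usually compute, following the semantics of BLAS i?amax and LAPACK ?laswp but with 0-based indices, raw row-major storage, and hypothetical signatures (these are not the KokkosBatched interfaces added in this PR). In getrf, Iamax selects the pivot row within a column; in getrs, Laswp applies the recorded interchanges to the right-hand side before the triangular solves.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical Iamax: index of the first entry with the largest absolute value
// among x[0], x[stride], ..., x[(len - 1) * stride]. Returns -1 if len <= 0.
int iamax(const double* x, int len, int stride = 1) {
  if (len <= 0) return -1;
  int best = 0;
  double best_abs = std::abs(x[0]);
  for (int i = 1; i < len; ++i) {
    const double v = std::abs(x[i * stride]);
    if (v > best_abs) {  // strict '>' keeps the first index on ties
      best = i;
      best_abs = v;
    }
  }
  return best;
}

// Hypothetical Laswp: for i = k1, ..., k2 - 1 (in this order), swap row i with
// row ipiv[i] of a row-major matrix that has ncols columns.
void laswp(std::vector<double>& a, int ncols, const std::vector<int>& ipiv,
           int k1, int k2) {
  for (int i = k1; i < k2; ++i) {
    const int p = ipiv[i];
    if (p == i) continue;
    for (int j = 0; j < ncols; ++j)
      std::swap(a[static_cast<std::size_t>(i) * ncols + j],
                a[static_cast<std::size_t>(p) * ncols + j]);
  }
}
```

Because the swaps are applied sequentially, the order of ipiv matters: laswp with ipiv = [2, 2] on the rows (r0, r1, r2) produces (r2, r0, r1).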

@yasahi-hpc marked this pull request as draft September 10, 2024 07:44
@kokkos-devops-admin

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

Yuuichi Asahi added 9 commits October 30, 2024 12:27
Signed-off-by: Yuuichi Asahi <[email protected]>
@yasahi-hpc force-pushed the implement-batched-serial-getrf branch from 689fba1 to 2ebf151 October 30, 2024 03:28
Signed-off-by: Yuuichi Asahi <[email protected]>
@yasahi-hpc marked this pull request as ready for review October 30, 2024 05:43