Venado optimizations #755

mewall · 2025-01-14T16:54:26Z

Modify the bml_transpose() Fortran API to match the C API
o Add bml_transpose_new()
o Change the tests to use the new API
Add methods to get the pointer for MAGMA arrays, for use in Fortran OpenACC and OpenMP offload
o Write a Fortran wrapper for the existing bml_get_data_ptr_dense()
o Add new bml_get_ld_dense()
Add bml_set_N_dense() to change the size of a bml array that has already been allocated
o This avoids unnecessary allocations and leads to substantial speedups
o Unsafe method, exposed in Fortran for dense matrices only
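For readers wondering why a leading-dimension getter accompanies the data pointer: when a matrix is stored with padded columns, element (i, j) lives at an offset that depends on the leading dimension rather than on N. The following is a minimal sketch of that indexing, not bml code. It assumes column-major storage and padding to a multiple of 32, and it uses host memory as a stand-in for a device array; `round_up`, `elem` and `fill_and_trace` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: round n up to a multiple of align.
 * The padding rule itself is an assumption made for illustration. */
static size_t round_up(size_t n, size_t align)
{
    return ((n + align - 1) / align) * align;
}

/* Address of element (i, j) in a column-major array whose
 * consecutive columns are ld elements apart (ld >= number of rows). */
static double *elem(double *data, size_t ld, size_t i, size_t j)
{
    return &data[i + j * ld];
}

/* Fill the logical n x n block with A(i, j) = 10 * i + j and
 * return its trace. Padding rows (i >= n) are never touched. */
static double fill_and_trace(double *data, size_t n, size_t ld)
{
    double trace = 0.0;

    assert(ld >= n);
    for (size_t j = 0; j < n; j++)
    {
        for (size_t i = 0; i < n; i++)
        {
            *elem(data, ld, i, j) = (double) (10 * i + j);
        }
    }
    for (size_t i = 0; i < n; i++)
    {
        trace += *elem(data, ld, i, i);
    }
    return trace;
}
```

A caller that indexed this buffer with N instead of ld would read the wrong elements as soon as ld differs from N, which is presumably why both values need to be available to offload code.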

nicolasbock

LGTM

scripts/build_bml_cray.sh

jeanlucf22 · 2025-01-14T17:59:54Z

src/C-interface/dense/bml_setters_dense.c

+#include "magma_v2.h"
+#endif
+
+void bml_set_N_dense(


I don't see a use case for setting N after construction. Is it to avoid constructing a new matrix?

@jeanlucf22 Yes. This was addressed in the commit comment, but the bullets needed fixing for clarity (now corrected). The purpose is to avoid repeated memory allocations, which became a bottleneck in MD simulations.

Looks dangerous to me: modifying N without modifying anything else will leave matrices in an inconsistent state!

As I mention in the comments, it is unsafe. It's possible to change N so that the matrix will exceed its original allocation. It's exposed only as bml_set_N_dense(), for experts. It works. Can you explain how the matrix will be inconsistent? It's needed for an application, so we need to figure out how to make this method available in some form.

If needed we can protect against exceeding initial size by adding an extra variable to the struct keeping track of the size of the originally allocated matrix. But it's less intrusive to leave the struct alone...

I believe the performance issue comes from the allocation of the struct members domain and domain2:

bml/src/C-interface/dense/bml_types_dense.h

Line 30 in ef2ee26

bml_domain_t *domain;

I looked into it at some point, but the fix I tried was breaking qmd-progress.
These two struct members could be static or initialized as needed.
Increasing N would obviously lead to a memory allocation too small to hold an NxN matrix.

What is the use case? Is N decreasing and you want to reuse the allocation? Whatever it is, I think looking into domain and domain2 is a safer and better alternative.

The issue is needing to use a MAGMA working array of differing size on different MD iterations, where max N is known. The problem is that creating a new array each time is slow, due to the GPU allocation. The solution is to allocate using max N once and to resize the array using bml_set_N_dense() as needed. How can I use domain and domain2 to do this? I'm not familiar with those.

I understand now.

I was not suggesting to use domain and domain2. I was just saying their allocation may be the main culprit when it comes to allocation time for a dense matrix. Maybe an issue to deal with another time.

Another suggestion: how about a function resizeNoAlloc(int n) that just changes N if n<=N, and otherwise changes N and reallocates memory? An extra struct member keeping track of the allocated memory size would be good in that case.
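As a rough illustration of this suggestion, here is a minimal sketch of a resize that only reallocates when the requested size exceeds what was originally allocated. It is not the bml implementation: `dense_sketch_t`, `N_allocated` and the function names are hypothetical, host `calloc` stands in for the GPU allocation, and the matrix contents are treated as scratch data that need not survive a reallocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dense matrix; not the bml struct. */
typedef struct
{
    int N;              /* current logical dimension */
    int N_allocated;    /* dimension the buffer was sized for */
    double *matrix;     /* N_allocated * N_allocated elements */
} dense_sketch_t;

/* Allocate an n x n buffer. Returns 0 on success, -1 on failure. */
static int dense_sketch_init(dense_sketch_t *A, int n)
{
    assert(A != NULL);
    if (n <= 0)
    {
        return -1;
    }
    A->matrix = calloc((size_t) n * (size_t) n, sizeof(double));
    if (A->matrix == NULL)
    {
        return -1;
    }
    A->N = n;
    A->N_allocated = n;
    return 0;
}

/* Change the logical size. Returns 0 if the existing buffer was
 * reused, 1 if a larger buffer had to be allocated (old contents
 * are discarded), and -1 on error (the matrix is left unchanged). */
static int resize_no_alloc(dense_sketch_t *A, int n)
{
    assert(A != NULL);
    if (n <= 0)
    {
        return -1;
    }
    if (n <= A->N_allocated)
    {
        A->N = n;               /* cheap path: no allocation */
        return 0;
    }
    double *grown = calloc((size_t) n * (size_t) n, sizeof(double));
    if (grown == NULL)
    {
        return -1;
    }
    free(A->matrix);
    A->matrix = grown;
    A->N = n;
    A->N_allocated = n;
    return 1;
}

/* Release the buffer and reset the bookkeeping. */
static void dense_sketch_free(dense_sketch_t *A)
{
    assert(A != NULL);
    free(A->matrix);
    A->matrix = NULL;
    A->N = 0;
    A->N_allocated = 0;
}
```

In the MD use case described above, the caller would initialize once with the maximum N and call the resize on each iteration, so the slow path would never be taken; the capacity check only guards against a request that exceeds the original allocation.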

scripts/build_chicoma_hackathon.sh

mewall · 2025-01-28T00:41:44Z

Yes, I think such a function could be good. The dense case is essentially done, even for the MAGMA build; we'd just need to add the allocated size to the struct and add a check to make it work. We still need to figure out the other matrix types, if they'll be supported. Meanwhile, what do you think about merging the current function?

mewall added 10 commits January 14, 2025 09:30

Add build script for venado hackathon

e827603

Add preliminary method to set matrix size

096d3fe

Add build scripts for hackathon

eac6576

Add bml_transpose_inplace Fortran subroutine

e7a8fc0

Modify Fortran bml_transpose API to match the C interface

b1dcbb1

Modify tests to use new transpose API

7ded60f

Venado build modifications

87d02dd

Move build scripts to scripts/ dir

d766123

Expose dense matrix pointer using bml_get_ptr_dense

84be6e9

New introspection methods

559bc8e

o Back out bml_get_ptr_dense() from bml_getters
o Write fortran wrapper for existing bml_get_data_ptr_dense() method
o Write new bml_get_ld_dense() to enable magma matrix pointer use

mewall requested review from nicolasbock, cnegre, suemni, jeanlucf22, tokshgithub and jmohdyusof as code owners January 14, 2025 16:54

suemni approved these changes Jan 14, 2025

nicolasbock approved these changes Jan 14, 2025

nicolasbock enabled auto-merge January 14, 2025 18:45

jeanlucf22 requested changes Jan 14, 2025

Remove LANL-specific build scripts

fc03d39

jeanlucf22 reviewed Jan 27, 2025

scripts/build_chicoma_hackathon.sh (outdated)

Remove an additional LANL specific build script

0fc96d2
