Venado optimizations #755
base: master
o Back out bml_get_ptr_dense() from bml_getters
o Write fortran wrapper for existing bml_get_data_ptr_dense() method
o Write new bml_get_ld_dense() to enable magma matrix pointer use
LGTM
#include "magma_v2.h"
#endif

void bml_set_N_dense(
I don't see a use case for setting N after construction. Is it to avoid constructing a new matrix?
@jeanlucf22 Yes. This was addressed in the commit comment but the bullets needed fixing to clarify (now corrected). This is to avoid repeated memory allocations which became a bottleneck in MD simulations.
Looks dangerous to me: modifying N without modifying anything else will leave matrices in an inconsistent state!
As I mention in the comments, it is unsafe. It's possible to change N so that the matrix will exceed its original allocation. It's exposed only as bml_set_N_dense(), for experts. It works. Can you explain how the matrix will be inconsistent? It's needed for an application, so we need to figure out how to make this method available in some form.
If needed we can protect against exceeding initial size by adding an extra variable to the struct keeping track of the size of the originally allocated matrix. But it's less intrusive to leave the struct alone...
I believe the performance issue comes from the allocation of the struct members domain and domain2:
bml_domain_t *domain;
I looked into it at some point, but the fix I tried was breaking qmd-progress.
These two struct members could be static or initialized as needed.
Increasing N would obviously lead to a memory allocation too small to hold an NxN matrix.
What is the use case? Is N decreasing and you want to reuse the allocation? Whatever it is, I think looking into domain and domain2 is a safer and better alternative.
The issue is needing to use a MAGMA working array of differing size on different MD iterations, where max N is known. The problem is that creating a new array each time is slow, due to the GPU allocation. The solution is to allocate using max N once and to resize the array using bml_set_N_dense() as needed. How can I use domain and domain2 to do this? I'm not familiar with those.
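The allocate-once pattern described in this comment can be sketched in plain C. This is only an illustration under stated assumptions: `work_matrix_t`, `work_matrix_new()`, `work_matrix_set_N()`, and `run_steps()` are hypothetical names, host `calloc` stands in for the GPU allocation, and nothing here is bml's actual struct or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dense work matrix; NOT bml's actual struct. */
typedef struct {
    int N;        /* logical dimension currently in use */
    double *data; /* storage sized once for the largest N */
} work_matrix_t;

/* Counts calls to work_matrix_new(), for illustration only. */
static int alloc_count = 0;

/* Allocate storage once, sized for the largest dimension expected. */
work_matrix_t *work_matrix_new(int max_N)
{
    assert(max_N > 0);
    work_matrix_t *A = malloc(sizeof *A);
    if (A == NULL)
        return NULL;
    A->data = calloc((size_t) max_N * (size_t) max_N, sizeof *A->data);
    if (A->data == NULL) {
        free(A);
        return NULL;
    }
    A->N = max_N;
    alloc_count++;
    return A;
}

/* Unchecked setter in the spirit of the one under review: the caller
 * must keep N within the original allocation. */
void work_matrix_set_N(work_matrix_t *A, int N)
{
    A->N = N;
}

void work_matrix_free(work_matrix_t *A)
{
    free(A->data);
    free(A);
}

/* Run one step per entry of sizes, reusing a single buffer throughout.
 * Returns the number of matrix allocations performed, or -1 on failure. */
int run_steps(const int *sizes, int n_steps, int max_N)
{
    alloc_count = 0;
    work_matrix_t *A = work_matrix_new(max_N);
    if (A == NULL)
        return -1;
    for (int s = 0; s < n_steps; s++) {
        work_matrix_set_N(A, sizes[s]);
        size_t used = (size_t) A->N * (size_t) A->N;
        for (size_t i = 0; i < used; i++)
            A->data[i] = 0.0; /* placeholder for the real per-step work */
    }
    work_matrix_free(A);
    return alloc_count;
}
```

However many steps run, the allocation happens once, provided every entry of `sizes` is at most `max_N`. The setter does not check that, which is the safety concern raised earlier in the thread.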
I understand now.
I was not suggesting to use domain and domain2. I was just saying their allocation may be the main culprit when it comes to allocation time for a dense matrix. Maybe an issue to deal with another time.
Another suggestion: how about a function resizeNoAlloc(int n) that just changes N if n<=N, and otherwise changes N and reallocates memory? An extra struct member keeping track of the allocated memory size would be good in that case.
Yes, I think such a function could be good. The dense case is essentially done, even for the MAGMA build: we’d just need to add the allocated size to the struct and add a check to make it work. We still need to figure out the other matrix types, if they’ll be supported.
Meanwhile, what do you think about merging the current function?
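A minimal sketch of the resizeNoAlloc idea from this thread, assuming a struct that carries the extra allocated-size member. The names `dense_sketch_t`, `N_allocated`, and `resize_no_alloc()` are hypothetical, and host `realloc` stands in for whatever allocator a GPU build would need; this is not bml's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical dense matrix; NOT bml's actual struct. N_allocated is the
 * extra member suggested in the thread. */
typedef struct {
    int N;           /* logical dimension currently in use */
    int N_allocated; /* dimension the storage was allocated for */
    double *matrix;  /* holds at least N_allocated * N_allocated entries */
} dense_sketch_t;

/* Set the logical dimension to n. Storage is reallocated only when n
 * exceeds the allocated dimension. Returns 0 on success and -1 on invalid
 * input or allocation failure, in which case A is left unchanged. Entries
 * are not rearranged, so the buffer should be treated as scratch space
 * after a resize. */
int resize_no_alloc(dense_sketch_t *A, int n)
{
    if (A == NULL || n <= 0)
        return -1;
    if (n > A->N_allocated) {
        double *p = realloc(A->matrix, (size_t) n * (size_t) n * sizeof *p);
        if (p == NULL)
            return -1;
        A->matrix = p;
        A->N_allocated = n;
    }
    A->N = n;
    assert(A->N <= A->N_allocated);
    return 0;
}
```

The sketch compares n against the allocated dimension rather than the current N, so a matrix that has been shrunk can grow back to its original size without touching the allocator.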
o Add bml_transpose_new()
o Change the tests to use the new API
o Write fortran wrapper for existing bml_get_data_ptr_dense()
o Add new bml_get_ld_dense()
o This avoids unnecessary allocations and leads to substantial speedups
o Unsafe method that's exposed in fortran for dense matrices only