-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a valid Neoverse N1 target. #623
base: master
Are you sure you want to change the base?
Conversation
|
||
// Initialize level-3 blocksize objects with architecture-specific values. | ||
// s d c z | ||
bli_blksz_init_easy( &blkszs[ BLIS_MR ], 8, 6, -1, -1 ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, that's great to see neoverse n1 tuning. Can I ask you how you came up with these blocksize values ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! To be honest I just wanted the compiler to generate tuned neoverse-n1
code with this patch so blocksize values were taken from thunderx2
. If BLIS has a standard procedure to generate those value I am all up for it, please just let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I value what you did, however i don't have the answer for this.
@devinamatthews any pointer you could share ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@egaudry Do you think the fine tuning is essential to merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a clear interface and arch detection makes sense indeed, however without proper tuning, mergers/reviewers might not see this as a priority.
Just guessing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jeff Diamond has better tuning parameters for N1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeffhammond Thanks for commenting. Could you please point me to Jeff Diamond so I could ask him if he is able to share his parameters please?
"The establishment" here. @everton1984 thanks for your work but @egaudry is pretty much right; it is best to have specifically-tuned block sizes and/or kernels with performance numbers before creating a new sub-configuration. Otherwise it is just easier to use the thunderx2 subconfig directly. I'll ask Jeff Diamond on the status of the tuned N1 parameters since that code may still be in the clutches of Oracle's lawyers. |
@devinamatthews Thanks for answering. No problem it makes sense, I can generate the parameters just wanted to know before trying something ad-hoc if there is a particularly defined procedure to obtain them. |
The block sizes can, to some extent, be determined analytically, see https://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf. A basic non-analytical strategy is:
Final note: The block sizes must satisfy MC%MR == 0 and NC%NR == 0. If possibly it doesn't hurt to have all three cache block sizes as multiples of both MR and NR unless this choice is too restrictive. It may also help to avoid large powers of 2. |
@devinamatthews Thanks a lot! Let me find the correct parameters then. |
This PR adds a valid Arm Neoverse N1 compilation target using Armv8 kernels. It creates the appropriate registry information and can autodetect a N1 cpu.