
Index Expression Experiment #19

Open · 2 of 3 tasks
rachitnigam opened this issue Oct 19, 2019 · 6 comments


rachitnigam commented Oct 19, 2019

Experiment

Figure out if index expression analysis can catastrophically hurt DSE.

  • Two instances of the GeMM kernel (an illustrative sketch of both variants follows this list).
    • A 1D GeMM kernel that takes two extra parameters, x and out[1]. In the innermost loop it computes out[0] += x ^ addr, where addr is the matrix address, in addition to the normal GeMM computation.
    • A 1D GeMM kernel that takes the same two extra parameters, x and out[1], but also indexes into the arrays with M[x ^ addr], where addr is the normal indexing expression, while still computing out[0] += x ^ addr.
    • Both programs are run with a partitioning factor of 16, single-ported memories, and unrolling factors [2, 4, 8] for each of the two innermost loops (18 configurations in total).
    • See if the second program uses more resources and is slower than the first.
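
For concreteness, here is a minimal C sketch of what the two kernel variants might look like. The function names, the matrix dimension N, which arrays get the obscured index, and the exact loop nest are assumptions for illustration, not the actual benchmark source.

```c
// Illustrative sketch only: names, the dimension N, and the loop nest are
// assumptions, not the benchmark code itself.
#define N 16

// Variant 1: normal GeMM indexing, plus the extra accumulation
// out[0] += x ^ addr in the innermost loop.
void gemm_extra_compute(int A[N * N], int B[N * N], int C[N * N],
                        int x, int out[1]) {
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      for (int k = 0; k < N; k++) {
        int addr = i * N + k;               // normal matrix address
        out[0] += x ^ addr;                 // extra computation only
        C[i * N + j] += A[addr] * B[k * N + j];
      }
    }
  }
}

// Variant 2: the same extra accumulation, but the matrix is also read
// through the opaque expression x ^ addr. (Assumption: only A's index is
// obscured here, and x is chosen so that x ^ addr stays in bounds.)
void gemm_opaque_index(int A[N * N], int B[N * N], int C[N * N],
                       int x, int out[1]) {
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      for (int k = 0; k < N; k++) {
        int addr = i * N + k;
        out[0] += x ^ addr;
        C[i * N + j] += A[x ^ addr] * B[k * N + j];  // index depends on x
      }
    }
  }
}
```

The two variants perform the same amount of arithmetic; they differ only in whether the memory index is the normal address or an address obscured by XOR with x, which is the difference the experiment isolates.
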

rachitnigam commented:

RFC from @sampsyo and @tissue3.


tissue3 commented Oct 20, 2019

Sorry I still don't know what DSE is.

rachitnigam commented:

Design space exploration


sampsyo commented Oct 22, 2019

Again, sounds just about perfect!

rachitnigam commented:

@sampsyo comments on the current heatmaps (permalink):

Wow; pretty weird outlier in the execution time results, huh? But it’s again odd that the execution time is so stable among the other points, even as the unrolling and partitioning changes…

The resource usage indeed goes up as expected, but the runtime does not go down. One possible hypothesis is that the benchmark is memory bound: the data transfer cost dominates the total runtime of the GeMM kernel. Figure out a way to validate this.
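
One back-of-envelope way to probe the memory-bound hypothesis is to compare a rough transfer-time estimate against a rough compute-time estimate. Every number in the sketch below (matrix size, bandwidth, throughput) is a placeholder, not a measurement.

```c
#include <stdio.h>

// Back-of-envelope memory-bound check. N, the bandwidth, and the compute
// throughput are placeholder assumptions; substitute measured values.
int main(void) {
  const double N = 16;                    // assumed matrix dimension
  const double bytes = 3 * N * N * 4;     // A, B, C as 32-bit words
  const double ops = 2 * N * N * N;       // multiplies and adds in GeMM
  const double bandwidth = 1e9;           // assumed transfer rate, bytes/s
  const double throughput = 1e8;          // assumed kernel rate, ops/s

  const double transfer_s = bytes / bandwidth;
  const double compute_s = ops / throughput;
  printf("transfer ~%.3e s, compute ~%.3e s\n", transfer_s, compute_s);
  printf("memory bound? %s\n", transfer_s > compute_s ? "plausible" : "unlikely");
  return 0;
}
```

If the transfer estimate dominates, making the kernel itself faster through unrolling and partitioning would barely move the total runtime, which would be consistent with the flat execution times in the heatmaps.
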

rachitnigam commented:

Also, note that unlike the misaligned-partition-and-unroll experiment, where the unrolling and partitioning factors increase together and the runtime changes more predictably, this experiment uses single-ported memories.
