-
Notifications
You must be signed in to change notification settings - Fork 505
TurbSim: OpenMP parallelization in factorization and FFT #3020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
- Very cludgy mess. Seems to work locally though.
f090d87 to
1637f4a
Compare
The TRH_in in the SHARED was causing memory issues. Removing that makes things identical now.
No need for the inner loop.
…se THREADPRIVATE and COPYIN to create the necessary copies of TRH array in CalcFourierCoeffs_IEC instead of doing it manually
…hen not using OpenMP or explicitly setting the BLAS vendor. This should reduce the chances of slowdowns from calling the MKL with many small calculations trying to distribute it over many threads.
andrew-platt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deslaughter's changes are great - a much better method than what I proposed.
| CONTAINS | ||
| SUBROUTINE Cleanup() | ||
|
|
||
| IF ( ALLOCATED( Dist ) ) DEALLOCATE( Dist ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this subroutine, we are allocating memory, but with removing the cleanup() reoutine, we are not explicitly deallocating it anymore. I know this routine is called only once, so maybe isn't a big concern, but I've seen this cause memory leaks or allocation errors because the array is already allocated when the routine gets called a second time. Thoughts?
Needs some review and testing.
Feature or improvement description
TurbSim has been single-threaded forever. This PR adds parallelization through OpenMP and should in theory run faster as a result.
If OpenMP is enabled at compile time, parallelization for MKL will be automatically turned off. We saw this the MKL parallelization is an issue in #3018 when the MKL would try to parallelize some tiny matrix manipulation incurring massive overhead costs.
Still need to extend to the following routines:
CalcFourierCoeffs_APICalcFourierCoeffs_GeneralCalcFourierCoeffs_NoneRelated issue, if one exists
#3018
Impacted areas of the software
TurbSim only
Additional supporting information
APRIVATE(<allocatable_array>)in a!$OMP PARALLEL DOpragma doesn't always allocate the copy of that array. This leads to segfaults. I have messy workarounds for this. There may be a better way to do it, but I have no idea if the ROCM Flang compiler will be able to handle anything more advanced.THREADPRIVATEwithCOPYINworks better than the original solutionTest results, if applicable
No test results should change.