1.4.0 minor release
The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This
release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem,
which enables execution on Intel GPUs and CPUs. The only Ginkgo features that
have not been ported yet are some preconditioners.
Ginkgo's mixed-precision support is greatly enhanced thanks to:
- The new Accessor concept, which allows writing kernels featuring on-the-fly
memory compression, among other features. The accessor can be used as
header-only; see the accessor BLAS benchmarks repository for a usage example.
- All LinOps now transparently support mixed-precision execution. By default,
this is done through a temporary copy, which may have a performance impact but
already allows mixed-precision research.
Native mixed-precision ELL kernels, which avoid this cost, are also implemented.
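The accessor idea can be illustrated with a minimal sketch. It is hypothetical and does not reproduce Ginkgo's accessor API: the names compressing_accessor, read, write and dot are made up for illustration. Values live in memory as float (the storage type), while every load and store converts on the fly to and from double (the arithmetic type), so a kernel written once against the wrapper computes in double while moving half the bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of the accessor concept (not Ginkgo's API).
// StorageType is what lives in memory, ArithmeticType is what the kernel
// computes with; conversion happens on every load and store.
template <typename ArithmeticType, typename StorageType>
class compressing_accessor {
public:
    explicit compressing_accessor(std::vector<StorageType>& storage)
        : storage_(storage)
    {}

    // Load: widen the stored value to the arithmetic precision.
    ArithmeticType read(std::size_t i) const
    {
        return static_cast<ArithmeticType>(storage_[i]);
    }

    // Store: round the arithmetic value to the storage precision.
    void write(std::size_t i, ArithmeticType value)
    {
        storage_[i] = static_cast<StorageType>(value);
    }

    std::size_t size() const { return storage_.size(); }

private:
    std::vector<StorageType>& storage_;
};

// A dot-product "kernel" written once against the accessor; accumulation
// is carried out in ArithmeticType regardless of the storage type.
template <typename ArithmeticType, typename StorageType>
ArithmeticType dot(const compressing_accessor<ArithmeticType, StorageType>& x,
                   const compressing_accessor<ArithmeticType, StorageType>& y)
{
    assert(x.size() == y.size());
    ArithmeticType sum{};
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x.read(i) * y.read(i);
    }
    return sum;
}
```

Because the kernel only talks to the wrapper, changing StorageType (for example to double, or to a custom compressed format) leaves the kernel code untouched.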
The accessor is also leveraged in a new CB-GMRES solver, which can improve
performance by compressing the Krylov basis vectors. Many other features have
been added to Ginkgo, such as reordering support, a new IDR solver, an
Incomplete Cholesky preconditioner, matrix assembly support (only on CPUs for
now), machine topology information, and more!
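The compressed-basis idea behind CB-GMRES can be sketched as a single Arnoldi orthogonalization step. This is a hypothetical simplification, not Ginkgo's implementation: the names compressed_vector and orthogonalize_and_append are illustrative, the basis is assumed to be stored in float, and the dot products and updates are assumed to run in double (modified Gram-Schmidt).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the CB-GMRES idea (not Ginkgo's implementation):
// Krylov basis vectors are kept in float, halving their memory traffic
// compared to double, while orthogonalization arithmetic is done in double.
using compressed_vector = std::vector<float>;

// Orthogonalize w against the stored basis with modified Gram-Schmidt and
// return the Hessenberg column h[0..k], where h[k] is the norm of the
// orthogonalized w. The normalized vector is appended in compressed form.
std::vector<double> orthogonalize_and_append(
    std::vector<compressed_vector>& basis, std::vector<double> w)
{
    std::vector<double> h;
    for (const auto& v : basis) {
        assert(v.size() == w.size());
        double hij = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) {
            hij += static_cast<double>(v[i]) * w[i];  // widen on read
        }
        for (std::size_t i = 0; i < w.size(); ++i) {
            w[i] -= hij * static_cast<double>(v[i]);
        }
        h.push_back(hij);
    }
    double norm = 0.0;
    for (double wi : w) {
        norm += wi * wi;
    }
    norm = std::sqrt(norm);
    // A zero norm (breakdown) is not handled in this sketch.
    assert(norm > 0.0);
    h.push_back(norm);
    compressed_vector next(w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        next[i] = static_cast<float>(w[i] / norm);  // narrow on write
    }
    basis.push_back(next);
    return h;
}
```

The rounding error from storing the basis in float perturbs orthogonality slightly. Accepting that perturbation in exchange for lower memory traffic is the trade-off compressed-basis variants explore.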
Supported systems and requirements:
- For all platforms, CMake 3.13+
- C++14 compliant compiler
- Linux and MacOS
  - gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  - clang: 3.9+
  - Intel compiler: 2018+
  - Apple LLVM: 8.0+
  - CUDA module: CUDA 9.0+
  - HIP module: ROCm 3.5+
  - DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to dpcpp.
- Windows
  - MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  - Microsoft Visual Studio: VS 2019
  - CUDA module: CUDA 9.0+, Microsoft Visual Studio
  - OpenMP module: MinGW or Cygwin.
Algorithm and important feature additions:
- Add a new DPC++ Executor for SYCL execution and other base utilities
#648, #661, #757, #832
- Port matrix formats, solvers and related kernels to DPC++. For some kernels,
also make use of a shared kernel implementation for all executors (except
Reference). #710, #799, #779, #733, #844, #843, #789, #845, #849, #855, #856
- Add accessors which allow multi-precision kernels, among other things.
#643, #708
- Add support for mixed precision operations through apply in all LinOps. #677
- Add incomplete Cholesky factorizations and preconditioners as well as some
improvements to ILU. #672, #837, #846
- Add an AMGX implementation and kernels on all devices but DPC++.
#528, #695, #860
- Add a new mixed-precision capability solver, Compressed Basis GMRES
(CB-GMRES). #693, #763
- Add the IDR(s) solver. #620
- Add a new fixed-size block CSR matrix format (for the Reference executor).
#671, #730
- Add native mixed-precision support to the ELL format. #717, #780
- Add Reverse Cuthill-McKee reordering #500, #649
- Add matrix assembly support on CPUs. #644
- Extend ISAI from triangular to general and SPD matrices. #690
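Reverse Cuthill-McKee reordering, listed above, permutes a sparse matrix with a symmetric sparsity pattern to reduce its bandwidth. The sketch below is a simplified, hypothetical version, not Ginkgo's implementation: the function name reverse_cuthill_mckee is illustrative, the pattern is assumed to be given as adjacency lists, and each component starts from a minimum-degree node rather than the pseudo-peripheral node that production codes usually search for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical, simplified Reverse Cuthill-McKee sketch (not Ginkgo's code).
// adj holds the symmetric sparsity pattern as adjacency lists. Each connected
// component is traversed breadth-first from a minimum-degree start node,
// enqueueing unvisited neighbours in order of increasing degree; the
// resulting Cuthill-McKee order is then reversed.
std::vector<int> reverse_cuthill_mckee(const std::vector<std::vector<int>>& adj)
{
    const int n = static_cast<int>(adj.size());
    auto degree = [&](int v) { return adj[v].size(); };
    auto by_degree_less = [&](int a, int b) { return degree(a) < degree(b); };

    std::vector<bool> visited(n, false);
    std::vector<int> order;
    order.reserve(n);

    // Candidate start nodes, lowest degree first.
    std::vector<int> starts(n);
    for (int v = 0; v < n; ++v) {
        starts[v] = v;
    }
    std::stable_sort(starts.begin(), starts.end(), by_degree_less);

    for (int start : starts) {
        if (visited[start]) {
            continue;  // already part of a traversed component
        }
        std::queue<int> q;
        q.push(start);
        visited[start] = true;
        while (!q.empty()) {
            const int v = q.front();
            q.pop();
            order.push_back(v);
            std::vector<int> next;
            for (int u : adj[v]) {
                if (!visited[u]) {
                    visited[u] = true;
                    next.push_back(u);
                }
            }
            std::stable_sort(next.begin(), next.end(), by_degree_less);
            for (int u : next) {
                q.push(u);
            }
        }
    }
    std::reverse(order.begin(), order.end());
    return order;  // order[i] = original index placed at position i
}
```

For the path 0-2-1, given as adjacency lists {{2}, {2}, {0, 1}}, the sketch returns the ordering 1, 2, 0, which turns the bandwidth-2 pattern into a tridiagonal one.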
Other additions:
- Add the possibility to apply real matrices to complex vectors.
#655, #658
- Add functions to compute the element-wise absolute value of a matrix format. #636
- Add symmetric permutation and improve existing permutations.
#684, #657, #663
- Add a MachineTopology class with HWLOC support #554, #697
- Add an implicit residual norm criterion. #702, #818, #850
- Generalize the row-major accessor to more than 2 dimensions and add a new
"block column-major" accessor. #707
- Add a heat equation example. #698, #706
- Add ccache support in CMake and CI. #725, #739
- Allow tuning and benchmarking variables non-intrusively. #692
- Add triangular solver benchmark #664
- Add benchmarks for BLAS operations #772, #829
- Add support for different precisions and consistent index types in benchmarks.
#675, #828
- Add a GitHub bot system to facilitate development and PR management.
#667, #674, #689, #853
- Add Intel (DPC++) CI support and enable CI on HPC systems. #736, #751, #781
- Add SSH debugging for GitHub Actions CI. #749
- Add pipeline segmentation for better CI speed. #737
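A symmetric permutation, as added above, applies the same permutation P to rows and columns, B = P A P^T. The sketch below assumes the convention B(i, j) = A(perm[i], perm[j]) and uses a hypothetical minimal CSR struct (csr, symmetric_permute); neither is Ginkgo's matrix::Csr or its permutation interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CSR container for illustration (not Ginkgo's matrix::Csr).
struct csr {
    int n;                      // square matrix dimension
    std::vector<int> row_ptrs;  // size n + 1
    std::vector<int> col_idxs;
    std::vector<double> values;
};

// Symmetric permutation B = P A P^T with the assumed convention
// B(i, j) = A(perm[i], perm[j]): row i of B is row perm[i] of A, and each
// column index c of A is relabelled to its new position inv[c].
// Column indices within a row are not re-sorted in this sketch.
csr symmetric_permute(const csr& a, const std::vector<int>& perm)
{
    assert(static_cast<int>(perm.size()) == a.n);
    std::vector<int> inv(a.n);  // inv[old index] = new index
    for (int i = 0; i < a.n; ++i) {
        inv[perm[i]] = i;
    }
    csr b{a.n, {0}, {}, {}};
    for (int i = 0; i < a.n; ++i) {
        const int old_row = perm[i];
        for (int k = a.row_ptrs[old_row]; k < a.row_ptrs[old_row + 1]; ++k) {
            b.col_idxs.push_back(inv[a.col_idxs[k]]);
            b.values.push_back(a.values[k]);
        }
        b.row_ptrs.push_back(static_cast<int>(b.col_idxs.size()));
    }
    return b;
}
```

Permuting rows and columns together keeps the diagonal on the diagonal, which is why this form suits reorderings such as RCM, whereas a row-only permutation would not.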
Changes:
- Add a Scalar Jacobi specialization and kernels. #808, #834, #854
- Add implicit residual log for solvers and benchmarks. #714
- Change handling of the conjugate in the dense dot product. #755
- Improved Dense stride handling. #774
- Multiple performance improvements to the OpenMP kernels, including COO,
an exclusive prefix sum, and more. #703, #765, #740
- Allow specialization of submatrix and other dense creation functions in solvers. #718
- Improved Identity constructor and treatment of rectangular matrices. #646
- Allow CUDA/HIP executors to select allocation mode. #758
- Check if executors share the same memory. #670
- Improve test install and smoke testing support. #721
- Update the JOSS paper citation and add publications in the documentation.
#629, #724
- Improve the version output. #806
- Add some utilities for dim and span. #821
- Improved solver and preconditioner benchmarks. #660
- Improve benchmark timing and output. #669, #791, #801, #812
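The exclusive prefix sum mentioned among the OpenMP improvements computes out[i] = in[0] + ... + in[i-1], with out[0] = 0. A typical use in sparse codes is turning per-row nonzero counts into CSR row pointers. The sketch below is sequential and hypothetical (the name exclusive_prefix_sum is illustrative); it does not reproduce Ginkgo's parallel OpenMP kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential, in-place exclusive prefix sum (hypothetical helper).
// After the call, values[i] holds the sum of the original values[0..i-1];
// the returned total is what a CSR builder would store as row_ptrs[n].
template <typename T>
T exclusive_prefix_sum(std::vector<T>& values)
{
    T running{};
    for (auto& v : values) {
        const T current = v;
        v = running;  // sum of all preceding entries
        running += current;
    }
    return running;
}
```

C++17 offers std::exclusive_scan in <numeric> for the same operation, but a hand-written loop like this stays within the C++14 requirement listed above. A parallel version typically scans per-thread blocks and then adds each block's offset.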
Fixes:
- Sorting fix for the Jacobi preconditioner. #659
- Also log the first residual norm in CGS #735
- Fix BiCG and HIP CSR to work with complex matrices. #651
- Fix Coo SpMV on strided vectors. #807
- Fix segfault of extract_diagonal, add short-and-fat test. #769
- Fix device_reset issue by moving counter/mutex to device. #810
- Fix EnableLogging superclass. #841
- Support ROCm 4.1.x and breaking HIP_PLATFORM changes. #726
- Decreased test size for a few device tests. #742
- Fix multiple issues with our CMake HIP and RPATH setup.
#712, #745, #709
- Clean up our CMake installation step. #713
- Various simplifications and fixes to the Windows CMake setup. #720, #785
- Simplify third-party integration. #786
- Improve Ginkgo device arch flags management. #696
- Other fixes and improvements to the CMake setup.
#685, #792, #705, #836
- Clarification of dense norm documentation #784
- Various development tools fixes and improvements #738, #830, #840
- Make multiple operators/constructors explicit. #650, #761
- Fix some issues, memory leaks and warnings found by MSVC.
#666, #731
- Improved solver memory estimates and consistent iteration counts #691
- Various logger improvements and fixes #728, #743, #754
- Fix for ForwardIterator requirements in iterator_factory. #665
- Various benchmark fixes. #647, #673, #722
- Various CI fixes and improvements. #642, #641, #795, #783, #793, #852