Commit 87f840a

DOCS: Update NEWS for 1.4 (#1119)
* DOCS: Update NEWS for 1.4
* DOCS: update authors
1 parent 050e806 commit 87f840a

2 files changed: 118 additions, 25 deletions

AUTHORS

Lines changed: 36 additions & 17 deletions
@@ -1,17 +1,36 @@
-Alex Margolin [email protected]
-Anatoly Vildemanov [email protected]
-Boris Karasev [email protected]
-Ching-Hsiang Chu [email protected]
-Devendar Bureddy [email protected]
-Ferrol Aderholdt [email protected]
-Geoffroy Vallee [email protected]
-Hessam Mirsadeghi [email protected]
-
-Manjunath Gorentla Venkata [email protected]
-Mike Dubman [email protected]
-Pavel Shamis (Pasha) [email protected]
-Sergey Lebedev [email protected]
-Valentin Petrov [email protected]
-
-Artem Ryabov [email protected]
-Shimmy Balsam [email protected]
+Alex Margolin [email protected]
+Alexey Rivkin [email protected]
+Anatoly Vildemanov [email protected]
+Andrii Bilokur [email protected]
+
+Artem Ryabov [email protected]
+Boris Karasev [email protected]
+Brad Settlemyer [email protected]
+Ching-Hsiang Chu [email protected]
+Devendar Bureddy [email protected]
+Edgar Gabriel [email protected]
+Evgeny Keidar [email protected]
+Ferrol Aderholdt [email protected]
+Geoffroy Vallee [email protected]
+Hessam Mirsadeghi [email protected]
+Ilya Kryukov [email protected]
+
+
+Mamzi Bayatpour [email protected]
+Manjunath Gorentla Venkata [email protected]
+Masaki Kozuki [email protected]
+Mike Dubman [email protected]
+Nilesh M Negi [email protected]
+Nick Sarkauskas [email protected]
+Pavel Shamis (Pasha) [email protected]
+Pedram Alizadeh [email protected]
+Rob Bradford [email protected]
+Sam Nordmann [email protected]
+Sergey Lebedev [email protected]
+Shimmy Balsam [email protected]
+Sourav Chakraborty [email protected]
+Taekyung Heo [email protected]
+Tommy Janjusic [email protected]
+Valentin Petrov [email protected]
+
+Yael Yacobovich [email protected]

NEWS

Lines changed: 82 additions & 8 deletions
@@ -6,6 +6,80 @@
 
 ## Current
 
+## 1.4.0 (TBD)
+
+## New Features and Enhancements
+
+### Core
+- Implemented asymmetric memory support {PR #1000}
+- Enhanced error handling and resource cleanup {PR #960, #951}
+- Improved service team handling {PR #1046}
+- Fixed triggered post for zero size collectives {PR #960}
+
+### CL/HIER
+- Added allgatherv support {PR #1111}
+- Implemented node subgroup unpacking {PR #1103}
+- Added reduce to supported collectives {PR #997}
+- Fixed integer overflow in alltoall {PR #944}
+
+### TL/UCP
+- Split single and multithreaded send/receive operations {PR #1109}
+- Added knomial allgather with CUDA memory support {PR #1095}
+- Implemented reduce SRG knomial algorithm {PR #1058}
+- Added radix selection to knomial operations {PR #1072}
+- Added sliding window allreduce implementation {PR #958}
+- Added knomial allgatherv support {PR #1008}
+- Added sparbit algorithm for allgather {PR #940}
+- Extended broadcast active set support for size > 2 {PR #926}
+- Added knomial algorithm for reduce-scatter {PR #970}
+
+### TL/MLX5
+- Added multicast-based zero-copy broadcast {PR #1087}
+- Implemented mcast multi-group support {PR #1060}
+- Added non-blocking CUDA memory copy support {PR #1040}
+- Added device memory multicast broadcast {PR #989}
+- Enhanced mcast allgather staging-based algorithm {PR #994}
+- Improved one-sided mcast reliability initialization {PR #980}
+- Various performance optimizations in alltoall {PR #1067}
+- Fixed fences in all-to-all WQEs {PR #1069}
+- Added context option to disable all-to-all operations {PR #1062}
+- Improved error handling and device checks {PR #1102}
+- Disabled mcast for thread multiple mode {PR #961}
+
+### TL/SHARP
+- Added support for allgather operation {PR #1081}
+- Enabled reduce-scatter with SAT support {PR #1084}
+- Added SHARP multi-channel support {PR #1049}
+- Fixed service team OOB handling {PR #1001}
+- Improved internal OOB usage {PR #986}
+
+### CUDA
+- Added linear broadcast implementation {PR #948}
+- Batch CUDA stream memory operations, reduced CPU and GPU execution overhead {PR #1093}
+- Enhanced error handling for CUDA context operations {PR #1025}
+- Fixed context cleanup in CUDA operations {PR #954}
+
+### Build and Test
+- Added support for specific GPU architectures with ROCM {PR #987}
+- Added UCC pkg-config support {PR #1036}
+- Fixed build compatibility with NVC compiler {PR #1052}
+- Enhanced config parser functionality {PR #1092}
+- Enhanced ASAN/LSAN memory leak detection {PR #1074}
+- Added error checking and exit handling in gtests {PR #1083}
+
+### Documentation
+- Updated README with UCC publication information {PR #1028}
+- Added DOCA_UROM documentation {PR #999}
+- Fixed Doxygen documentation issues {PR #1038}
+- Enhanced code style consistency {PR #1020}
+
+### CL/DOCA_UROM
+- Implemented new DOCA UROM plugin {PR #978}
+- Added support for offloading collective operations to DPUs
+- Implemented allreduce collective
+
+## 1.3.0 (April 18th, 2024)
+
 ## New Features and Enhancements
 
 ### CL/HIER
@@ -207,7 +281,7 @@
 - Added support for multithreaded context progress
 - Added support for nonblocking team destroy
 
-#### CL
+#### CL
 
 - Added support for hierarchical collectives
 - Added support for hierarchical allreduce collective operation
@@ -219,12 +293,12 @@
 
 ##### UCP
 
-- Added Bcast SAG algorithm for large messages
-- Added Knomial based reduce algorithm
+- Added Bcast SAG algorithm for large messages
+- Added Knomial based reduce algorithm
 - Making allgather and alltoall agree with the API
 - Added SRA knomial allreduce algorithm
 - Added pairwise alltoall and alltoallv algorithms
-- Added allgather and allgatherv ring algorithms
+- Added allgather and allgatherv ring algorithms
 - Added support for collective operations based on one-sided semantics
 - Added support for alltoall with one-sided transfer semantics
 - Bug fixes
@@ -237,7 +311,7 @@
 scatter, bcast, allgather and allgatherv
 
 #### Tests
-- Updated tests to test the newly added algorithms and operations
+- Updated tests to test the newly added algorithms and operations
 
 
 ## 0.1.0 (TBD)
@@ -256,12 +330,12 @@
 - Added support for configuring UCC library and contexts
 
 
-#### CL
+#### CL
 
 - Added support for collectives, while the source and destination is either in
-CPU or device (GPU)
+CPU or device (GPU)
 - Added support for UCC_THREAD_MULTIPLE
-- Added support for CUDA stream-based collectives
+- Added support for CUDA stream-based collectives
 
 
 #### TL
