NVIDIA NCCL 2.28 Revolutionizes GPU Communication with New Device API

Rebeca Moen
Nov 10, 2025 23:56

NVIDIA’s NCCL 2.28 release introduces a device API that lets CUDA kernels perform communication directly, so communication and computation can be fused on multi-GPU and multi-node systems.





NVIDIA has released version 2.28 of the NVIDIA Collective Communications Library (NCCL). The update focuses on fusing communication with computation, with the aim of raising throughput, reducing latency, and improving GPU utilization across multi-GPU and multi-node systems, according to NVIDIA.

Key Features of NCCL 2.28

NCCL 2.28 brings several new features, including GPU-initiated networking, device APIs for communication-compute fusion, and copy-engine-based collectives. These features are intended to help developers build efficient, scalable distributed applications. The release also includes expanded APIs, improved tooling, and cleaner integration paths, which make it easier to develop custom communication kernels.

Device API and Copy Engine Collectives

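The new device API lets developers write custom device kernels that perform communication from inside NVIDIA CUDA kernels, removing the need for host-initiated operations. Because the kernel no longer has to return to the host to start a transfer, synchronization overhead drops, which increases throughput and reduces latency. Three operation modes are introduced, each covering a different communication scenario: Load/Store Accessible (LSA), for peers reachable through direct memory loads and stores; Multimem, which uses the hardware multicast support of NVLink SHARP; and GPU-Initiated Networking (GIN), for communication over the network started by the GPU.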

In addition, the copy-engine-based collectives enable efficient NVLink transfers by offloading communication from the streaming multiprocessors (SMs) to dedicated copy hardware. Because the SMs are no longer occupied with data movement, resource contention is reduced and communication can run at the same time as computation.

NCCL Inspector for Enhanced Profiling

The NCCL Inspector, a new profiling tool, provides always-on observability and analysis of NCCL communication patterns. It logs detailed performance data and metadata, allowing developers to analyze and debug collective operations efficiently. The Inspector plugin tracks each NCCL communicator individually, so performance patterns can be compared across different communication contexts.

Developer Experience Improvements

NCCL 2.28 improves the developer experience with new APIs for operations such as AllToAll, Gather, and Scatter. It also introduces flexible configuration management through an environment plugin API, which supports programmatic version matching and storage-agnostic configuration. Additionally, the release supports CMake for Linux builds, which simplifies integration into larger build pipelines.
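
For readers unfamiliar with these three collectives, the data movement they perform can be modeled in a few lines of plain Python. This is only a sketch of the general semantics, with each rank represented as a list; the function names scatter, gather, and all_to_all are illustrative and are not the NCCL API, and the chunk layout (equal-sized chunks in rank order) is an assumption of this model.

```python
from typing import List


def _chunk_size(total: int, nranks: int) -> int:
    """Return the per-rank chunk size, requiring an even split."""
    if nranks <= 0 or total % nranks != 0:
        raise ValueError("buffer length must be a multiple of the rank count")
    return total // nranks


def scatter(root_buf: List[int], nranks: int) -> List[List[int]]:
    """Root holds nranks chunks; rank r receives chunk r."""
    count = _chunk_size(len(root_buf), nranks)
    return [root_buf[r * count:(r + 1) * count] for r in range(nranks)]


def gather(send_bufs: List[List[int]]) -> List[int]:
    """Every rank sends one chunk; root receives them in rank order."""
    out: List[int] = []
    for buf in send_bufs:
        out.extend(buf)
    return out


def all_to_all(send_bufs: List[List[int]]) -> List[List[int]]:
    """Rank src sends its chunk dst to rank dst, for every (src, dst) pair."""
    nranks = len(send_bufs)
    count = _chunk_size(len(send_bufs[0]), nranks)
    recv_bufs: List[List[int]] = []
    for dst in range(nranks):
        buf: List[int] = []
        for src in range(nranks):
            buf.extend(send_bufs[src][dst * count:(dst + 1) * count])
        recv_bufs.append(buf)
    return recv_bufs


if __name__ == "__main__":
    # Three simulated ranks, one element per destination.
    sends = [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
    print(all_to_all(sends))
    print(scatter([1, 2, 3, 4], nranks=2))
    print(gather([[1, 2], [3, 4]]))
```

In this model AllToAll behaves like a transpose of the per-rank chunk matrix, while Scatter and Gather are inverses of each other with respect to the root's buffer.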

For further details on NCCL 2.28 and its features, visit the official NVIDIA blog.


