NVIDIA NCCL 2.28 Revolutionizes GPU Communication with New Device API

[ad_1]



Rebeca Moen
Nov 10, 2025 23:56

NVIDIA’s latest NCCL 2.28 release introduces a device API, enhancing communication and computation fusion for GPU networks, boosting performance and efficiency.





The NVIDIA Collective Communications Library (NCCL) has introduced its latest version, NCCL 2.28, a significant leap forward in GPU communication technology. This update focuses on the fusion of communication and computation, aiming to enhance throughput, reduce latency, and maximize GPU utilization across multi-GPU and multi-node systems, according to NVIDIA.

Key Features of NCCL 2.28

NCCL 2.28 brings several new features, including GPU-initiated networking, device APIs for communication-compute fusion, and copy-engine-based collectives. These innovations are designed to empower developers to create efficient, scalable distributed applications. The release also includes expanded APIs, improved tooling, and cleaner integration paths, facilitating the development of custom communication kernels.

Device API and Copy Engine Collectives

The new device API allows for the development of custom device kernels that integrate communication within NVIDIA CUDA kernels, removing the need for host-initiated operations. This integration reduces synchronization overhead, thus increasing throughput and reducing latency. Three operation modes are introduced: Load/Store Accessible (LSA), Multimem, and GPU Initiated Networking (GIN), each supporting different communication scenarios.

Moreover, the copy engine-based collectives enable efficient NVLink transfers by offloading communication tasks from streaming multiprocessors (SMs) to dedicated hardware. This approach minimizes resource contention, allowing simultaneous execution of communication and computation tasks.

NCCL Inspector for Enhanced Profiling

The NCCL Inspector, a new profiling tool, provides always-on observability and analysis of NCCL communication patterns. It offers detailed performance and metadata logging, allowing developers to analyze and debug collective operations efficiently. The plugin tracks each NCCL communicator individually, offering insights into performance patterns across different communication contexts.

Developer Experience Improvements

NCCL 2.28 enhances the developer experience with new APIs for operations like AllToAll, Gather, and Scatter. It introduces flexible configuration management through an environment plugin API, facilitating programmatic version matching and configuration storage agnostic setups. Additionally, the release supports CMake for Linux builds, streamlining integration into larger build pipelines.

For further details on NCCL 2.28 and its features, visit the official NVIDIA blog.

Image source: Shutterstock


[ad_2]

Source link

Santosh

Share
Published by
Santosh

Recent Posts

शेयर बाजार ने इन 4 वजहों से भरी उड़ान…2 घंटे में ही करीब 2% की धुआंधार तेजी – why are stock markets rising today sensex and nifty 4 big reasons including trump tariff pause

[ad_1] भारतीय शेयर बाजारों में शुक्रवार (11 अप्रैल) को जबरदस्त तेजी देखने को मिली। सेंसेक्स…

3 months ago

BTC Price Prediction: Bitcoin Eyes $100,000 Target by Year-End Despite Current Consolidation

[ad_1] Joerg Hiller Dec 13, 2025 13:56 BTC price prediction suggests…

3 months ago

Glassnode Unveils Latest Insights in The Bitcoin Vector #33

[ad_1] Lawrence Jengar Dec 10, 2025 12:37 Glassnode releases The Bitcoin…

3 months ago

जेफरीज के अनुसार 2026 में देखने योग्य शीर्ष उपभोक्ता वित्त स्टॉक्स

[ad_1] जेफरीज के अनुसार 2026 में देखने योग्य शीर्ष उपभोक्ता वित्त स्टॉक्स [ad_2] Source link

3 months ago

ARB Price Prediction: Targeting $0.24-$0.31 Recovery Despite Near-Term Weakness Through January 2025

[ad_1] Felix Pinkston Dec 10, 2025 12:39 ARB price prediction shows…

3 months ago

This website uses cookies.