Accelerating CUDA C++ applications with multiple GPUs
Roderic Hill Building: 422
By participating in this workshop, you’ll:
- Use concurrent CUDA streams to overlap memory transfers with GPU computation
- Utilize all available GPUs on a single node to scale workloads across them
- Combine the use of copy/compute overlap with multiple GPUs
- Rely on the NVIDIA Nsight™ Systems profiler timeline to observe improvement opportunities and the impact of the techniques covered in the workshop
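The copy/compute overlap technique in the objectives above can be sketched roughly as follows, assuming a toy kernel (`scale`) and illustrative chunk/stream counts; this is not the workshop's exercise code. The idea is that each stream copies a chunk to the device, launches a kernel on it, and copies it back, so transfers in one stream overlap with computation in the others:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Toy kernel (illustrative): doubles each element using a grid-stride loop.
__global__ void scale(float *data, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        data[i] *= 2.0f;
}

int main() {
    const int N = 1 << 24;
    const int kStreams = 4;          // assumed stream count, for illustration
    const int chunk = N / kStreams;  // assumes N divides evenly

    float *h, *d;
    cudaMallocHost(&h, N * sizeof(float));  // pinned host memory: required for
                                            // truly asynchronous copies
    cudaMalloc(&d, N * sizeof(float));
    for (int i = 0; i < N; ++i) h[i] = 1.0f;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Issue copy-in, kernel, and copy-out per chunk in its own stream;
    // the hardware overlaps one stream's transfers with another's compute.
    for (int s = 0; s < kStreams; ++s) {
        const int off = s * chunk;
        cudaMemcpyAsync(d + off, h + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<256, 256, 0, streams[s]>>>(d + off, chunk);
        cudaMemcpyAsync(h + off, d + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    printf("h[0] = %f\n", h[0]);

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h);
    cudaFree(d);
    return 0;
}
```

Profiling this with Nsight Systems would show the per-stream memcpy and kernel rows interleaving on the timeline, which is the visual evidence of overlap the workshop teaches you to look for.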
Prerequisites:
- Professional experience programming CUDA C/C++ applications, including the use of the nvcc compiler, kernel launches, grid-stride loops, host-to-device and device-to-host memory transfers, and CUDA error handling
- Familiarity with the Linux command line
- Experience using makefiles to compile C/C++ code
Certificate: Upon successful completion of the assessment, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.
Hardware Requirements: Desktop or laptop computer capable of running the latest version of Chrome or Firefox. Each participant will be provided with dedicated access to a fully configured, GPU-accelerated server in the cloud.
Attendees will receive food coupons redeemable for lunch at all Taste Imperial outlets.
The course is free of charge, but if you register and then do not attend, or cancel with insufficient notice, you will be charged as outlined in the POD website T&C.