Category: CUDA
How to debug Async Kernels or APIs in CUDA
Summary: In this post, I will introduce how to debug async kernels and async APIs in CUDA. Async operations do not block CPU code. When we check the return value of a function call, it may be SUCCESS even though there is a bug such as "illegal memory access". On the other hand, when we find the…
Sync and Async in CUDA
Summary: In this post, I will introduce the sync and async behaviors in CUDA. Conclusion: The following is handy code for testing the behaviors of the CPU and streams. Details: There are two aspects, kernels and streams. 1. Kernels. Some of my conclusions are: all kernel launches return immediately, no matter whether we use the default stream or…
Profile Applications in CUDA
Summary: In this post, I will introduce how to use the tool nvprof to profile your CUDA applications. Details: When you want to optimize your application, it is good practice to dig deeper and see how much time each kernel and each CUDA runtime API call takes. Intuition: It is not good to use any…
Install CUDA 10.1 and Driver 418
Summary: In this post, I will introduce how to install the newest CUDA and the corresponding NVIDIA driver on Ubuntu 16.04. Details: I want to use CUDA for neural network inference. But after I compiled the executables and ran them, I was told that the driver is not compatible with this version of CUDA. I have a GTX 1060 and…