{"id":635,"date":"2019-12-19T16:14:48","date_gmt":"2019-12-20T00:14:48","guid":{"rendered":"http:\/\/35.243.195.209\/?p=635"},"modified":"2019-12-19T16:14:48","modified_gmt":"2019-12-20T00:14:48","slug":"how-to-debug-async-kernels-or-apis-in-cuda","status":"publish","type":"post","link":"https:\/\/nanzhou.cc\/index.php\/2019\/12\/19\/how-to-debug-async-kernels-or-apis-in-cuda\/","title":{"rendered":"How to debug Async Kernels or APIs in CUDA"},"content":{"rendered":"<h2>Summary<\/h2>\n<p>In this post, I will introduce how to debug async kernels or async APIs in CUDA.  <\/p>\n<p>The async operations will not block CPU codes. When we check the return type of the functions calls, it may be SUCCESS but there are bugs like &quot;illegal memory access&quot;. On the other hand, when we find the return type is FAILED, it may mean that previous operations failed. But we do not know which operation has bugs. <\/p>\n<h2>Conclusion<\/h2>\n<p>Insert the following codes into the suspicious positions:<\/p>\n<pre><code class=\"language-c++\">{\n    cudaMemoryTest();\n    kernel_A&lt;&lt;&lt;...&gt;&gt;&gt;;\n\n    ......\n\n    cudaMemoryTest();\n    kernel_B&lt;&lt;&lt;...&gt;&gt;&gt;;\n\n    ......\n}<\/code><\/pre>\n<p>The macros are:<\/p>\n<pre><code class=\"language-c++\">#define cudaSafeCall(call)  \\\n        do {\\\n            cudaError_t err = call;\\\n            if (cudaSuccess != err) \\\n            {\\\n                std::cerr &lt;&lt; &quot;CUDA error in &quot; &lt;&lt; __FILE__ &lt;&lt; &quot;(&quot; &lt;&lt; __LINE__ &lt;&lt; &quot;): &quot; \\\n                    &lt;&lt; cudaGetErrorString(err);\\\n                exit(EXIT_FAILURE);\\\n            }\\\n        } while(0)\n\nvoid cudaMemoryTest()\n{\n    const unsigned int N = 1048576;\n    const unsigned int bytes = N * sizeof(int);\n    int *h_a = (int*)malloc(bytes);\n    int *d_a;\n    cudaSafeCall(cudaMalloc((int**)&amp;d_a, bytes));\n\n    memset(h_a, 0, bytes);\n    cudaSafeCall(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));\n    cudaSafeCall(cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost));\n}<\/code><\/pre>\n<h2>Details<\/h2>\n<p>The above codes work since there are specific behaviors for the streams. In short, the function <code>cudaMemoryTest()<\/code> uses the default stream. The default stream will block all previous operations in all other streams (except for non-blocking streams). See <a href=\"http:\/\/35.243.195.209\/index.php\/2019\/12\/19\/sync-and-async-in-cuda\/\">my another post<\/a> for details.<\/p>\n<p>So when we call the function <code>cudaMemoryTest()<\/code>, it will block until all previous async codes finish. So if there are bugs, the function <code>cudaMemoryTest()<\/code> will fail. Thus, we can easily locate the bugs. <\/p>\n<h2>Reference<\/h2>\n<p><a href=\"https:\/\/nanxiao.me\/en\/an-empirical-method-of-debugging-illegal-memory-access-bug-in-cuda-programming\/\">This post from a hardcore engineer<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Summary In this post, I will introduce how to debug async kernels or async APIs in CUDA. The async operations will not block CPU codes. When we check the return type of the functions calls, it may be SUCCESS but there are bugs like &quot;illegal memory access&quot;. On the other hand, when we find the&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[30,29,31],"tags":[],"class_list":["post-635","post","type-post","status-publish","format-standard","hentry","category-cuda","category-gpu","category-parallel-computation"],"_links":{"self":[{"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/posts\/635","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/comments?post=635"}],"version-history":[{"count":1,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/posts\/635\/revisions"}],"predecessor-version":[{"id":636,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/posts\/635\/revisions\/636"}],"wp:attachment":[{"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/media?parent=635"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/categories?post=635"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/tags?post=635"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}