Essays on programming languages, computer science, information technologies and all.

Thursday, February 28, 2013

OpenCL - pinned memory

Transfers from pageable memory probably require constant intervention from the CPU side, or perhaps DMA cannot be used to its full extent. I don't know the exact cause, but NVIDIA recommends pinned (page-locked) memory for transfer-critical operations and says it can reach about 5 GB/s. Recall that the last post reported a copy throughput of around 3.3 GB/s.

Below is the code snippet modified to use pinned memory. The host buffers are allocated with the CL_MEM_ALLOC_HOST_PTR flag, and raw pointers to them are obtained with the mapping function enqueueMapBuffer. These mapped raw pointers are then used for the buffer copy operations. It is a bit odd to pass a raw pointer rather than the host buffer handle, but I guess that keeps the function interface unchanged.

...
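// Host-side buffers allocated by the runtime; CL_MEM_ALLOC_HOST_PTR asks for host-accessible (pinned) memory.
cl::Buffer hostSrc( task.GetContext(), CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, width*height );
cl::Buffer hostDst( task.GetContext(), CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, width*height );

// Blocking maps (CL_TRUE) return raw pointers into the pinned buffers.
uint8_t *hostSrcPtr =
  static_cast<uint8_t*>(
    queue.enqueueMapBuffer( hostSrc, CL_TRUE, CL_MAP_WRITE, 0, width*height ) );
// Map hostDst here (the original snippet mapped hostSrc a second time by mistake).
uint8_t *hostDstPtr =
  static_cast<uint8_t*>(
    queue.enqueueMapBuffer( hostDst, CL_TRUE, CL_MAP_READ, 0, width*height ) );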

...
queue.enqueueWriteBuffer( devSrc, CL_FALSE, 0, width*height, hostSrcPtr, &ve0, &steps[1] );
...
queue.enqueueNDRangeKernel( kernelPitch0, cl::NullRange, globalws, localws, &ve1, &steps[2] );
...
queue.enqueueReadBuffer( devDst, CL_FALSE, 0, width*height, hostDstPtr, &ve2, &steps[3]  );


The result is as expected. The achieved memory copy throughput is around 6.1 to 6.3 GB/s, which brings the overall throughput to 1.4 GB/s. This is the first time it has gone over 1 GB/s. Hooray!

Entering test case "TestPitch0Pinned"
Step 1 : start 0 ns, end 647168 ns, duration 647168 ns, 6136.32 MB/s
Step 2 : start 879072 ns, end 2166112 ns, duration 1287040 ns, 3085.55 MB/s
Step 3 : start 2196800 ns, end 2824128 ns, duration 627328 ns, 6330.39 MB/s
Total : duration 2824128 ns, 1406.18 MB/s
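
For reference, here is a minimal sketch of how MB/s figures like the ones in this log can be derived from a pair of nanosecond timestamps. The start and end values would presumably come from the profiling info of each event in steps (for example CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END). The helper names throughputMBps and overallMBps are made up for illustration, and taking 1 MB as 10^6 bytes is an assumption, since the post does not state the buffer size or the unit convention.

```cpp
#include <cstdint>

// Hypothetical helper: throughput in MB/s for `bytes` transferred between
// two timestamps given in nanoseconds.
// Assumption: 1 MB = 10^6 bytes, so MB/s = bytes * 1000 / duration_ns.
double throughputMBps(std::uint64_t bytes,
                      std::uint64_t startNs,
                      std::uint64_t endNs) {
    if (endNs <= startNs) {
        return 0.0;  // empty or invalid interval, nothing to report
    }
    const double durationNs = static_cast<double>(endNs - startNs);
    return static_cast<double>(bytes) * 1e3 / durationNs;
}

// Hypothetical helper: overall throughput measured from the start of the
// first step to the end of the last one, so idle gaps between steps count.
double overallMBps(std::uint64_t bytes,
                   std::uint64_t firstStartNs,
                   std::uint64_t lastEndNs) {
    return throughputMBps(bytes, firstStartNs, lastEndNs);
}
```

With a hypothetical 4,000,000-byte buffer copied in 1,000,000 ns, throughputMBps returns 4000 MB/s. Note that in the log above the total duration equals the end time of step 3, so the gaps between the steps also lower the overall figure, not only the kernel time.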


Reference: NVIDIA OpenCL Best Practices Guide, Section 3.1.1, Pinned Memory
