Essays on programming languages, computer science, information technologies and all.

Monday, December 15, 2014

An attempt at parallel execution in OpenCL

To get the full strength of heterogeneous computing, a system has to use every bit of its computing power and every bit of its available bandwidth. One important step on that road is parallel execution, and one form of it is parallelism between the kernel and the bus. In other words, while the GPU is running a kernel, the bus should already be transferring the next data to process.

First of all, the GPU hardware has to be built so that, while a kernel is running and accessing GPU memory, the bus connecting the GPU to the PC's main memory (e.g. PCIe) can transfer the data for the next kernel run. I found that this is not easy to figure out from the datasheets of GPU boards.
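To see what is at stake, here is a rough timing estimate in plain C. It is only a sketch under assumptions of my own: the bus and the GPU are treated as two fully independent units, every image has the same transfer time and the same kernel time, and the function names `serialized_ms` and `overlapped_ms` are made up for this illustration. It is not a measurement of any real board.

```c
#include <assert.h>

/* Larger of two durations. */
static double max2(double a, double b) { return a > b ? a : b; }

/* Total time for n images when each upload and each kernel run
 * strictly one after another (no overlap at all). */
double serialized_ms(int n, double t_xfer, double t_kernel) {
    assert(n >= 0);
    return n * (t_xfer + t_kernel);
}

/* Total time for n images when the upload of image i+1 overlaps the
 * kernel of image i. ASSUMPTION: bus and GPU are independent units.
 * The first upload and the last kernel cannot be hidden; in between,
 * the slower of the two stages sets the pace. */
double overlapped_ms(int n, double t_xfer, double t_kernel) {
    assert(n >= 0);
    if (n == 0) {
        return 0.0;
    }
    return t_xfer + (n - 1) * max2(t_xfer, t_kernel) + t_kernel;
}
```

With 4 images, a 10 ms transfer and a 10 ms kernel, `serialized_ms` gives 80 and `overlapped_ms` gives 50. The gain is largest when the two times are equal, where it approaches a factor of two as the number of images grows.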

I tried this on my laptop, an HP Pavilion e119wm with an AMD A10 and an AMD Radeon HD 8650G, with AMD APP 2.9 installed. I chose the kernel and the data size so that the kernel time and the data transfer time are equal. See the screenshot from the AMD Profiler below.


Then I tried an out-of-order queue, using events so that the data transfer for the next image starts together with the kernel execution. Somehow, though, the transfer was still pushed to the end.
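The event wiring I had in mind can be modeled in plain C as below. This is not OpenCL code and not my actual test program, only a sketch of the dependencies. In real OpenCL the queue would be created with `CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE`, and each `wait_list` here would be the `event_wait_list` argument of `clEnqueueWriteBuffer` or `clEnqueueNDRangeKernel`. The struct, the function names and the two-unit hardware model are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: the bus and the GPU are independent execution units. */
enum engine { ENGINE_BUS = 0, ENGINE_GPU = 1 };

struct command {
    enum engine engine;   /* unit that executes the command           */
    double duration_ms;   /* how long the command takes               */
    int wait_list[2];     /* indices of commands to wait on, -1: none */
    double start_ms;      /* filled in by schedule()                  */
    double end_ms;        /* filled in by schedule()                  */
};

/* Start every command as soon as its unit is free and all commands in
 * its wait list have finished. Each wait list entry must point to an
 * earlier command. Returns the time at which the last command ends. */
double schedule(struct command *cmds, size_t count) {
    double engine_free[2] = {0.0, 0.0};
    double makespan = 0.0;
    for (size_t i = 0; i < count; i++) {
        double ready = engine_free[cmds[i].engine];
        for (int w = 0; w < 2; w++) {
            int dep = cmds[i].wait_list[w];
            if (dep >= 0) {
                assert((size_t)dep < i);
                if (cmds[dep].end_ms > ready) {
                    ready = cmds[dep].end_ms;
                }
            }
        }
        cmds[i].start_ms = ready;
        cmds[i].end_ms = ready + cmds[i].duration_ms;
        engine_free[cmds[i].engine] = cmds[i].end_ms;
        if (cmds[i].end_ms > makespan) {
            makespan = cmds[i].end_ms;
        }
    }
    return makespan;
}

/* cmds[2*i] is the upload of image i, cmds[2*i+1] is its kernel.
 * cmds must have room for 2 * images entries. */
void build_pipeline(struct command *cmds, int images,
                    double t_xfer, double t_kernel) {
    for (int i = 0; i < images; i++) {
        struct command *upload = &cmds[2 * i];
        struct command *kernel = &cmds[2 * i + 1];

        upload->engine = ENGINE_BUS;
        upload->duration_ms = t_xfer;
        /* The next upload waits only on the previous upload, so it can
         * start at the same moment as the previous image's kernel. */
        upload->wait_list[0] = (i > 0) ? 2 * (i - 1) : -1;
        upload->wait_list[1] = -1;
        upload->start_ms = upload->end_ms = 0.0;

        kernel->engine = ENGINE_GPU;
        kernel->duration_ms = t_kernel;
        kernel->wait_list[0] = 2 * i;                          /* own upload      */
        kernel->wait_list[1] = (i > 0) ? 2 * (i - 1) + 1 : -1; /* previous kernel */
        kernel->start_ms = kernel->end_ms = 0.0;
    }
}
```

For 3 images with a 10 ms transfer and a 10 ms kernel, this model starts the second upload at 10 ms, together with the first kernel, and finishes everything at 40 ms instead of the serialized 60 ms. That is the timeline I hoped to see in the profiler; the model says nothing about what a particular driver actually does.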


Then I tried creating two in-order queues. Execution was still serialized somehow, as in the screenshot below.

Searching the internet, I found that the OpenCL implementation in AMD APP does not support out-of-order queues, but reports on parallel execution across multiple queues were not easy to find. I will update this post when I have more news.