Computing Memoir: Next Move

Sunday, February 17, 2013

CUDA

Using GT 640. PTX instructions. VLIW-aware kernel programming.

SSE

Is there any benefit to using stores that bypass the cache (non-temporal stores)?
Calculate throughput.

EB = (Br + Bw) / T, where EB: effective bandwidth, Br: bytes read, Bw: bytes written, T: elapsed time
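As a quick worked example of the formula above, here is a minimal Python sketch. The function name, the image size, and the 1 ms timing are made-up illustration values, not measurements from the GT 640 or from any SSE code.

```python
def effective_bandwidth_gbps(bytes_read, bytes_written, seconds):
    """Return EB = (Br + Bw) / T in GB/s, taking 1 GB = 1e9 bytes."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_read + bytes_written) / seconds / 1e9


# Hypothetical case: copy a 2048 x 2048 image of 4-byte pixels in 1 ms.
n_bytes = 2048 * 2048 * 4      # bytes read; a copy writes the same amount
eb = effective_bandwidth_gbps(n_bytes, n_bytes, 1e-3)
print(f"{eb:.2f} GB/s")
```

The same function applies to a CUDA kernel, an OpenCL kernel, or an SSE loop, as long as T is the measured time for the kernel or loop alone.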

HW

Study the hardware - caches, memory banks and so on

OpenCL

Using NVIDIA OpenCL with GT 640. Compare results with CUDA. Using buffers. Using images.
Using NVIDIA OpenCL with GT 640. Using vector types.
Using Intel OpenCL. Compare results with SSE code.
Run kernels on the GPU (GT 640) and the CPU (G2120) at the same time.
Buy a 3rd generation processor (i3-3220) to access the HD 2500 GPU.

Inspection algorithm

Image preprocessing

Geometric distortion correction - radial, in area cameras
Shading correction - area and line scan. How to define the compensation?
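One possible answer to the compensation question is ordinary flat-field correction: take a dark frame and a frame of a uniform target, then derive a per-pixel gain. The sketch below is only that common approach under stated assumptions, not a settled design; the function names and the tiny test frames are hypothetical.

```python
def shading_gain(flat, dark):
    """Per-pixel gain from a flat (uniform target) frame and a dark frame."""
    diff = [[f - d for f, d in zip(frow, drow)]
            for frow, drow in zip(flat, dark)]
    values = [v for row in diff for v in row]
    mean = sum(values) / len(values)
    # Pixels with no response (flat <= dark) keep a neutral gain of 1.0.
    return [[mean / v if v > 0 else 1.0 for v in row] for row in diff]


def correct(raw, dark, gain, max_value=255):
    """Apply corrected = (raw - dark) * gain, clamped to [0, max_value]."""
    return [[min(max_value, max(0, round((r - d) * g)))
             for r, d, g in zip(rrow, drow, grow)]
            for rrow, drow, grow in zip(raw, dark, gain)]


# Hypothetical frames: the centre column is twice as bright as the edges.
dark = [[10, 10, 10], [10, 10, 10]]
flat = [[110, 210, 110], [110, 210, 110]]
gain = shading_gain(flat, dark)
corrected_flat = correct(flat, dark, gain)
print(corrected_flat)
```

For a line scan camera the same code can be fed one-row frames, so the gain becomes a per-column table that is applied to every scanned line.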

periodic pattern inspection - 2 points (horizontal, vertical), 4 points, 8 points, many points in horizontal and vertical
threshold - constant, adaptive threshold, ...
binary image handling
morphology - open, close, kernel size, iteration count, iteration order
segmentation - contour based, region based, bug follower, area, bounding box calculation
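To make a few of the items above concrete, here is a rough Python sketch of the simplest combination: a 2-point horizontal periodic comparison with a constant threshold, followed by region-based segmentation that reports area and bounding box. The rule "flag a pixel only if it differs from both neighbours one pitch away" is my assumption of how the 2-point comparison would work; the function names and the test image are hypothetical.

```python
def periodic_defect_map(image, pitch, threshold):
    """Flag a pixel when it differs from BOTH the pixel one pitch to the
    left and the pixel one pitch to the right by more than threshold.
    The first and last `pitch` columns are not inspected."""
    h, w = len(image), len(image[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(pitch, w - pitch):
            p = image[y][x]
            if (abs(p - image[y][x - pitch]) > threshold and
                    abs(p - image[y][x + pitch]) > threshold):
                mask[y][x] = 1
    return mask


def label_regions(mask):
    """4-connected flood fill; returns area and bbox (x0, y0, x1, y1)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            stack = [(x, y)]
            seen[y][x] = True
            area, x0, y0, x1, y1 = 0, x, y, x, y
            while stack:
                cx, cy = stack.pop()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            regions.append({"area": area, "bbox": (x0, y0, x1, y1)})
    return regions


# Hypothetical image: pattern repeats every 4 pixels, with a 2 x 2 defect.
image = [[10, 50, 90, 50] * 4 for _ in range(6)]
for y in (2, 3):
    for x in (6, 7):
        image[y][x] = 200
regions = label_regions(periodic_defect_map(image, pitch=4, threshold=30))
print(regions)
```

Requiring a mismatch against both neighbours keeps the good pixels one pitch away from a defect from being flagged as well. The 4-point and 8-point variants would add vertical and diagonal neighbours to the same test.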

Distributed system

Group communication to manage multiple inspection processors
Join/Leave the group
Send a packet to all group members quickly and reliably using broadcast or multicast.
A designated member can receive the burst of REPs from all the group members.
A processor can join more than one group
Streaming images to a member - or designated member
REQ, REP, ACK, AYA (Are you alive?), IAA (I am alive), TA (Try again), AU (Address unknown)
Make an API that is flexible and easy to use? Allow different types of REQ? Refer to other network APIs?
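As a starting point for the API question, the packet types listed above could be given a small fixed header. The sketch below is only one possible layout: the numeric type codes, the field widths, and the group id and sequence number fields are all my assumptions, not a decided wire format.

```python
import struct
from enum import IntEnum


class MsgType(IntEnum):
    """Packet types from the list above; the numeric codes are arbitrary."""
    REQ = 1
    REP = 2
    ACK = 3
    AYA = 4   # Are you alive?
    IAA = 5   # I am alive
    TA = 6    # Try again
    AU = 7    # Address unknown


# Hypothetical header, network byte order: type (1 byte), group id
# (2 bytes), sequence number (4 bytes), payload length (2 bytes).
HEADER = struct.Struct("!BHIH")


def pack_message(msg_type, group_id, seq, payload=b""):
    """Build one packet: header followed by the raw payload bytes."""
    return HEADER.pack(msg_type, group_id, seq, len(payload)) + payload


def unpack_message(data):
    """Parse one packet; raises ValueError if the payload is cut short."""
    msg_type, group_id, seq, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return MsgType(msg_type), group_id, seq, payload


packet = pack_message(MsgType.AYA, group_id=3, seq=42)
decoded = unpack_message(packet)
print(decoded)
```

Different kinds of REQ could be carried either as extra type codes or as a sub-type in the first payload byte. The group id field is what would let one processor belong to more than one group.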
