I tested the performance of Unity's DirectCompute feature, which I planned to use for 3D convex hull calculation.
In short, there seems to be a performance issue when reading data back from the GPU to the CPU with Unity's DirectCompute feature.
The GPU is really fast when it is used in a way that aligns well with the rendering pipeline. But the convex hull calculation has to use it differently. The point-plane test has to be run many times (300~500 on average for our excavator vertex set) on a subset of the points, against planes newly generated in each loop iteration. This loop iteration is the problem here. Because the next iteration needs the point-plane test result of the previous one, the result has to be sent from the GPU to the CPU in every iteration, and that transfer is extremely slow.
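To make the dependency between iterations concrete, here is a minimal CPU-only sketch in Python. It is not the Unity code: the plane representation (a, b, c, d), the function names, and the make_plane callback are all assumptions made for illustration. What it shows is that each new plane is built from the points that survived the previous test, which is why the result has to come back to the CPU before the next pass can start.

```python
def signed_distance(point, plane):
    """Signed distance from a point to the plane a*x + b*y + c*z + d = 0.

    Assumes (a, b, c) is a unit normal; otherwise the value is only
    proportional to the distance, which is still enough for a sign test.
    """
    a, b, c, d = plane
    x, y, z = point
    return a * x + b * y + c * z + d


def points_in_front(points, plane, eps=1e-9):
    """Point-plane test: keep the points strictly on the positive side."""
    return [p for p in points if signed_distance(p, plane) > eps]


def run_dependent_loop(points, make_plane, max_iterations=500):
    """Repeat the point-plane test, building each plane from the last result.

    make_plane is a hypothetical callback standing in for the hull step
    that generates new planes. In a GPU version, the line that updates
    `subset` is where the readback to the CPU would sit.
    """
    subset = list(points)
    history = []
    while subset and len(history) < max_iterations:
        plane = make_plane(subset)          # depends on the previous result
        subset = points_in_front(subset, plane)
        history.append(len(subset))
    return history


def plane_through_mean_x(subset):
    """Toy plane generator: the plane x = mean x of the current subset."""
    mean_x = sum(p[0] for p in subset) / len(subset)
    return (1.0, 0.0, 0.0, -mean_x)


if __name__ == "__main__":
    pts = [(float(i), 0.0, 0.0) for i in range(8)]
    print(run_dependent_loop(pts, plane_through_mean_x))
```

The toy plane generator only exists to make the loop runnable. The point is the shape of the loop: the test itself is trivially parallel, but the passes cannot overlap because each one waits on the previous result.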
As I mentioned in the previous post, I predicted this problem, but the slowdown is far worse than I expected. I also posted a related question on the Unity Answers forum, but there are no answers yet.
The paper I read (CudaHull) takes an approach similar to my plan, but it seems to suffer much less slowdown in GPU-to-CPU data transfer. So I suspect the problem is specific to Unity or to DirectCompute...
Here is the first test result: 100,000 vertices against a tetrahedron. The tetrahedron changes every frame, and every vertex is tested for whether it can be seen from each face of the tetrahedron. As you can see, the GPU is a lot faster than the CPU when there is no loop.
Test on 100,000 vertices.
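For reference, the per-vertex visibility test against a tetrahedron can be sketched like this in plain Python. This is only an illustration of the idea, not the compute shader used in the test. The helper names are hypothetical, and I am assuming "can be seen from a face" means the vertex lies strictly on the outward side of that face's plane.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def tetra_faces(verts):
    """Return (outward_normal, point_on_face) for the 4 faces.

    Face i is the triangle opposite verts[i]. The normal is flipped when
    it points toward the opposite vertex, so it always points outward.
    """
    faces = []
    for i in range(4):
        a, b, c = [verts[j] for j in range(4) if j != i]
        n = cross(sub(b, a), sub(c, a))
        if dot(n, sub(verts[i], a)) > 0:
            n = tuple(-x for x in n)
        faces.append((n, a))
    return faces


def visible_faces(point, faces):
    """Indices of the faces whose outward side contains the point."""
    return [i for i, (n, a) in enumerate(faces)
            if dot(n, sub(point, a)) > 0]


if __name__ == "__main__":
    tetra = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    faces = tetra_faces(tetra)
    for p in [(0.1, 0.1, 0.1), (1, 1, 1), (-1, 0.2, 0.2)]:
        print(p, visible_faces(p, faces))
```

Each vertex is tested independently of all the others, which is why this case maps so well onto the GPU when no result has to be read back between passes.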
The following two clips show 100,000 vertices with 100 and 300 loops. In each loop, a subset of the points is tested and the results are read back to the CPU with the GetData() function. You can see how long it takes to send data from the GPU to the CPU.