At the moment I'm spending some of my spare time on CUDA programming.
Since I spend most of my time with Scala, it would be nice to exploit the graphics card from that language as well. Some very smart people are doing exactly this with ScalaCL, which I will definitely take a closer look at in the near future.
Anyhow, I started with the "hello world" of GPU programming, vector addition, and my first experiments look promising. If your problem consists of subproblems that are independent of each other and lend themselves to parallel programming, you can achieve very good results. Well, who would have thought that. ;-)
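The post does not include its code, so here is a minimal sketch of the usual CUDA vector-addition kernel plus a host-side helper. The names `vectorAdd` and `vectorAddOnGpu` are hypothetical, the test data is made up, and error handling is kept deliberately minimal. Each GPU thread handles exactly one element, which is why this problem parallelizes so well: no element depends on any other.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Kernel: each thread computes one element, c[i] = a[i] + b[i].
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the data
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: adds two test vectors of length n on the GPU using
// threadsPerBlock threads per block; returns true if the result matches the CPU.
bool vectorAddOnGpu(int n, int threadsPerBlock) {
    std::vector<float> a(n), b(n), c(n);
    for (int i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 2.0f * static_cast<float>(i);
    }

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    // Round up so that every element gets a thread.
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);
    bool ok = (cudaGetLastError() == cudaSuccess);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    ok = ok && (cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    // Compare against the same additions done on the CPU.
    for (int i = 0; ok && i < n; ++i) {
        ok = (c[i] == a[i] + b[i]);
    }
    return ok;
}
```

Called from a `main` compiled with `nvcc`, for example as `vectorAddOnGpu(1 << 20, 256)`. The bounds check `i < n` matters whenever `n` is not a multiple of the block size.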
[Figure: runtimes for an example CUDA program]
The figure above shows runtime measurements of a simple CUDA program. The first line shows the runtime for 1024 threads, the second for 512 and the last for 256 threads. Interestingly, the difference between 512 and 256 threads is large compared to the rather small difference between 1024 and 512 threads.
My point is that you can expect a considerable performance gain if you harness the power of the GPU and tune your calculations to it. As always, you have to make the relevant settings, such as the number of threads, configurable, but, even more importantly, you have to actually use them to balance the system appropriately.
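One way to make the thread count a parameter and measure its effect is sketched below. It assumes that "threads" in the measurements above means threads per block, which the post does not state, and it times the kernel with CUDA events; how the original runtimes were measured is not known. `timeVectorAdd` and `sweepBlockSizes` are hypothetical names, and on older GPUs that allow fewer than 1024 threads per block, that setting simply fails and is skipped.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same element-wise kernel as in a plain vector addition.
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Times a single launch with the given block size using CUDA events.
// Returns the elapsed GPU time in milliseconds, or a negative value on error.
float timeVectorAdd(const float* dA, const float* dB, float* dC,
                    int n, int threadsPerBlock) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    cudaEventRecord(start);
    vectorAdd<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    if (cudaGetLastError() == cudaSuccess) {
        cudaEventElapsedTime(&ms, start, stop);
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Hypothetical driver: sweeps the thread counts mentioned in the post and
// prints one runtime per setting. Returns how many measurements succeeded.
int sweepBlockSizes(int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemset(dA, 0, bytes);
    cudaMemset(dB, 0, bytes);

    // Warm-up launch so that one-time initialization is not measured.
    timeVectorAdd(dA, dB, dC, n, 256);

    const int blockSizes[] = {1024, 512, 256};
    int succeeded = 0;
    for (int threadsPerBlock : blockSizes) {
        float ms = timeVectorAdd(dA, dB, dC, n, threadsPerBlock);
        if (ms >= 0.0f) {
            std::printf("%4d threads per block: %.3f ms\n", threadsPerBlock, ms);
            ++succeeded;
        }
    }

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return succeeded;
}
```

A single launch gives a noisy number; averaging several launches per setting makes a comparison like the one in the figure more trustworthy.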
Of course, those guys have already done GPU Programming 101.