how to use my existing .cpp code to write cuda code [duplicate] - c++

I hv code in c++ and wanted to use it along with cuda.Can anyone please help me? Should I provide my code?? Actually I tried doing so but I need some starting code to proceed for my code.I know how to do simple square program (using cuda and c++)for windows(visual studio) .Is it sufficient to do the things for my program?

The following are both good places to start. CUDA by Example is a good tutorial that gets you up and running pretty fast. Programming Massively Parallel Processors includes more background, e.g. chapters on the history of GPU architecture, and generally more depth.
CUDA by Example: An Introduction to General-Purpose GPU Programming
Programming Massively Parallel Processors: A Hands-on Approach
These both talk about CUDA 3.x so you'll want to look at the new features in CUDA 4.x at some point.
Thrust is definitely worth a look if your problem maps onto it well (see comment above). It's an STL-like library of containers, iterators and algorithms that implements data-parallel algorithms on top of CUDA.
Here are two tutorials on getting started with CUDA and Visual C++ 2010:
http://www.ademiller.com/blogs/tech/2011/03/using-cuda-and-thrust-with-visual-studio-2010/
http://blog.cuvilib.com/2011/02/24/how-to-run-cuda-in-visual-studio-2010/
There's also a post on the NVIDIA forum:
http://forums.nvidia.com/index.php?showtopic=184539
Asking very general how do I get started on ... on Stack Overflow generally isn't the best approach. Typically the best reply you'll get is "go read a book or the manual". It's much better to ask specific questions here. Please don't create duplicate questions, it isn't helpful.

It's a non-trivial task to convert a program from straight C(++) to CUDA. As far as I know, it is possible to use C++ like stuff within CUDA (esp. with the announced CUDA 4.0), but I think it's easier to start with only C stuff (i.e. structs, pointers, elementary data types).
Start by reading the CUDA programming guide and by examining the examples coming with the CUDA SDK or available here. I personally found the vector addition sample quite enlightening. It can be found over here.
I can not tell you how to write your globals and shareds for your specific program, but after reading the introductory material, you will have at least a vague idea of how to do.
The problem is that it is (as far as I know) not possible to tell a generic way of transforming pure C(++) into code suitable for CUDA. But here are some corner stones for you:
Central idea for CUDA: Loops can be transformed into different threads executed multiple times in parallel on the GPU.
Therefore, the single iterations optimally are independent of other iterations.
For optimal execution, the single execution branches of the threads should be (almost) the same, i.e. the single threads sould do almost the same.

You can have multiple .cpp and .cu files in your project. Unless you want your .cu files to contain only device code, it should be fairly easy.
For your .cu files you specify a header file, containing host functions in it. Then, include that header file in other .cu or .cpp files. The linker will do the rest. It is nothing different than having multiple plain C++ .cpp files in your project.
I assume you already have CUDA rule files for your Visual Studio.

Related

Parallel computation with unreal engine 4

I am currently researching the usability of Unreal Engine for a computational intensive project and Google have not been terribly helpful.
I need to do some heavy computation, in the background of the game loop, for the project, which is currently implemented using OpenMP. A fluid simulation.
Unreal Engine on windows compiles and builds using visual studio and should as such have support for what the visual studio compiler has, but it is unclear to me if you can make UE compile with the proper settings for OpenMP.
Alternatively, I could easily make this work with Intels threading building blocks TBB, which is apparently already used in UE... though perhaps only the memory allocator part?
In general I guess my question is what support there is for cross platform heavily parallelized computation with UE. If you need to do a large number of identical computations running on all available cores, how do you do that in UE?
I am not interested in manually creating some number of threads, run them on blocks code, wait for completion by joining and then move on. I would much prefer a parallel_for loop for my purpose.
Further more, I wonder how easy it is to use other frameworks such as CUDA or OpenCL from within a UE "script".
I come from Unity where I made c# wrappers around native dlls, written in c++, for high performance, but I am getting a bit tired of this wrapping native in c#, so I would much prefer a c++ based engine... but if that itself is limited in what I can do in my c++ code, then it is not much help... I assume it can all work, when I learn how, but I am a bit confused by not being able to find any info about any of the above questions. Anything with the word "parallel" in it generally lead to blueprints.
There's a ParallelFor() in UE, defined here: "Engine\Source\Runtime\Core\Public\Async\ParallelFor.h"
Signature is void ParallelFor(int32 Num, TFunctionRef<void(int32)> Body, bool bForceSingleThread = false)
Example use can be found easily in the source codes.

OpenCV vs IVT - For Beginners

I'm looking to getting into image/video processing and was searching for a good library to start with. I've heard of two, OpenCV and IVT. I'd like to hear your opinion about which one is better to start with, what are the advantages/disadvantages of both and which one is better for possible commercial use later on.
Both offer very similar functionality - about 95% of it overlaps.
Both are BSD(ish) licensed and are widely used in commercial packages.
IVT is a cleaner more modern C++ design, but the new c++ bindings to opencv work well. Opencv has a few more 'C' type macros but it also means it's usable from C. Opencv is also very well supported by python and other languages, don't know about IVT.
There is also CImg. It requires only a single header file and uses C++ type templates so you write code like result = image.blur().sharpen().edge() almost like Mathematica!
It doesn't have the same depth of functionality, especially in things like recognition and machine learning, but is definitely easier to use. It's GPL/LGPL so usable commercially with care.
OpenCV is much more widely used, so has a bigger set of users who might answer questions - but it also has a MUCH bigger set of beginners asking questions !
The decider for me is that openCV is moving to support (almost) all the functionality in CUDA (ie on a parallel GPU) which is fantastic for anything needing realtime video processing.
Other than this I couldn't comment on the performance I didn't really benchmark IVT enough. OpenCV does use custom SSE2 assembler for a lot of the operations and uses TBB to parallel the rest if you have a multicore/hyperthreaded CPU.
I am a beginner like you and I personally say OpenCV. My first learning experience with OpenCV was more effortless than IVT because documentation was so neat and clear, also there are common beginner books, many tutorials and example projects for OpenCV.

Explanation of CUDA C and C++

Can anyone give me a good explanation as to the nature of CUDA C and C++? As I understand it, CUDA is supposed to be C with NVIDIA's GPU libraries. As of right now CUDA C supports some C++ features but not others.
What is NVIDIA's plan? Are they going to build upon C and add their own libraries (e.g. Thrust vs. STL) that parallel those of C++? Are they eventually going to support all of C++? Is it bad to use C++ headers in a .cu file?
CUDA C is a programming language with C syntax. Conceptually it is quite different from C.
The problem it is trying to solve is coding multiple (similar) instruction streams for multiple processors.
CUDA offers more than Single Instruction Multiple Data (SIMD) vector processing, but data streams >> instruction streams, or there is much less benefit.
CUDA gives some mechanisms to do that, and hides some of the complexity.
CUDA is not optimised for multiple diverse instruction streams like a multi-core x86.
CUDA is not limited to a single instruction stream like x86 vector instructions, or limited to specific data types like x86 vector instructions.
CUDA supports 'loops' which can be executed in parallel. This is its most critical feature. The CUDA system will partition the execution of 'loops', and run the 'loop' body simultaneously across an array of identical processors, while providing some of the illusion of a normal sequential loop (specifically CUDA manages the loop "index"). The developer needs to be aware of the GPU machine structure to write 'loops' effectively, but almost all of the management is handled by the CUDA run-time. The effect is hundreds (or even thousands) of 'loops' complete in the same time as one 'loop'.
CUDA supports what looks like if branches. Only processors running code which match the if test can be active, so a subset of processors will be active for each 'branch' of the if test. As an example this if... else if ... else ..., has three branches. Each processor will execute only one branch, and be 're-synched' ready to move on with the rest of the processors when the if is complete. It may be that some of the branch conditions are not matched by any processor. So there is no need to execute that branch (for that example, three branches is the worst case). Then only one or two branches are executed sequentially, completing the whole if more quickly.
There is no 'magic'. The programmer must be aware that the code will be run on a CUDA device, and write code consciously for it.
CUDA does not take old C/C++ code and auto-magically run the computation across an array of processors. CUDA can compile and run ordinary C and much of C++ sequentially, but there is very little (nothing?) to be gained by that because it will run sequentially, and more slowly than a modern CPU. This means the code in some libraries is not (yet) a good match with CUDA capabilities. A CUDA program could operate on multi-kByte bit-vectors simultaneously. CUDA isn't able to auto-magically convert existing sequential C/C++ library code into something which would do that.
CUDA does provides a relatively straightforward way to write code, using familiar C/C++ syntax, adds a few extra concepts, and generates code which will run across an array of processors. It has the potential to give much more than 10x speedup vs e.g. multi-core x86.
Edit - Plans: I do not work for NVIDIA
For the very best performance CUDA wants information at compile time.
So template mechanisms are the most useful because it gives the developer a way to say things at compile time, which the CUDA compiler could use. As a simple example, if a matrix is defined (instantiated) at compile time to be 2D and 4 x 8, then the CUDA compiler can work with that to organise the program across the processors. If that size is dynamic, and changes while the program is running, it is much harder for the compiler or run-time system to do a very efficient job.
EDIT:
CUDA has class and function templates.
I apologise if people read this as saying CUDA does not. I agree I was not clear.
I believe the CUDA GPU-side implementation of templates is not complete w.r.t. C++.
User harrism has commented that my answer is misleading. harrism works for NVIDIA, so I will wait for advice. Hopefully this is already clearer.
The hardest stuff to do efficiently across multiple processors is dynamic branching down many alternate paths because that effectively serialises the code; in the worst case only one processor can execute at a time, which wastes the benefit of a GPU. So virtual functions seem to be very hard to do well.
There are some very smart whole-program-analysis tools which can deduce much more type information than the developer might understand. Existing tools might deduce enough to eliminate virtual functions, and hence move analysis of branching to compile time. There are also techniques for instrumenting program execution which feeds directly back into recompilation of programs which might reach better branching decisions.
AFAIK (modulo feedback) the CUDA compiler is not yet state-of-the-art in these areas.
(IMHO it is worth a few days for anyone interested, with a CUDA or OpenCL-capable system, to investigate them, and do some experiments. I also think, for people interested in these areas, it is well worth the effort to experiment with Haskell, and have a look at Data Parallel Haskell)
CUDA is a platform (architecture, programming model, assembly virtual machine, compilation tools, etc.), not just a single programming language. CUDA C is just one of a number of language systems built on this platform (CUDA C, C++, CUDA Fortran, PyCUDA, are others.)
CUDA C++
Currently CUDA C++ supports the subset of C++ described in Appendix D ("C/C++ Language Support") of the CUDA C Programming Guide.
To name a few:
Classes
__device__ member functions (including constructors and destructors)
Inheritance / derived classes
virtual functions
class and function templates
operators and overloading
functor classes
Edit: As of CUDA 7.0, CUDA C++ includes support for most language features of the C++11 standard in __device__ code (code that runs on the GPU), including auto, lambda expressions, range-based for loops, initializer lists, static assert, and more.
Examples and specific limitations are also detailed in the same appendix linked above. As a very mature example of C++ usage with CUDA, I recommend checking out Thrust.
Future Plans
(Disclosure: I work for NVIDIA.)
I can't be explicit about future releases and timing, but I can illustrate the trend that almost every release of CUDA has added additional language features to get CUDA C++ support to its current (In my opinion very useful) state. We plan to continue this trend in improving support for C++, but naturally we prioritize features that are useful and performant on a massively parallel computational architecture (GPU).
Not realized by many, CUDA is actually two new programming languages, both derived from C++. One is for writing code that runs on GPUs and is a subset of C++. Its function is similar to HLSL (DirectX) or Cg (OpenGL) but with more features and compatibility with C++. Various GPGPU/SIMT/performance-related concerns apply to it that I need not mention. The other is the so-called "Runtime API," which is hardly an "API" in the traditional sense. The Runtime API is used to write code that runs on the host CPU. It is a superset of C++ and makes it much easier to link to and launch GPU code. It requires the NVCC pre-compiler which then calls the platform's C++ compiler. By contrast, the Driver API (and OpenCL) is a pure, standard C library, and is much more verbose to use (while offering few additional features).
Creating a new host-side programming language was a bold move on NVIDIA's part. It makes getting started with CUDA easier and writing code more elegant. However, truly brilliant was not marketing it as a new language.
Sometimes you hear that CUDA would be C and C++, but I don't think it is, for the simple reason that this impossible. To cite from their programming guide:
For the host code, nvcc supports whatever part of the C++ ISO/IEC
14882:2003 specification the host c++ compiler supports.
For the device code, nvcc supports the features illustrated in Section
D.1 with some restrictions described in Section D.2; it does not
support run time type information (RTTI), exception handling, and the
C++ Standard Library.
As I can see, it only refers to C++, and only supports C where this happens to be in the intersection of C and C++. So better think of it as C++ with extensions for the device part rather than C. That avoids you a lot of headaches if you are used to C.
What is NVIDIA's plan?
I believe the general trend is that CUDA and OpenCL are regarded as too low level techniques for many applications. Right now, Nvidia is investing heavily into OpenACC which could roughly be described as OpenMP for GPUs. It follows a declarative approach and tackles the problem of GPU parallelization at a much higher level. So that is my totally subjective impression of what Nvidia's plan is.

(Re)Starting with C++ (for scientific computing)

I have a fair hang of programming in various languages. I have been implementing my codes for research using MATLAB (during the past few months) and for the first time really noticed the difference in execution speed of MATLAB v$ C. (As much as I love the blazingly fast prototyping capabilities).
I am looking to pickup C++ and start using it in my research. I am aware of OOP and have programmed fair bit of Java (relatively long back) and C++ (even longer back). I would like to really get deep into C++ now and hence need suggestions for resources on the same:
What C++ things I need to pick up (STLs and. ) to really make good use of C++?
What is a good tutorial/manual to get started with?
What are the numerical/scientific libraries for C++? GSL? Is there a equivalent (features) of Scipy/Numpy for C++?
I shall be programming on Linux, so I shall be using g++ .
Any pointers to previous SO questions also appreciated.
You'll want to get to grips with parallel programming as quickly as possible. For message-passing I like this book by Karniadakis and Kirby. Of the books on OpenMP, for distributed-memory programming, this one is the best.
If you can get access to them, then Intel's Threading Building Blocks, Maths Kernel Library, and Integrated Performance Primitives are good to have. If not, there are plenty of open source alternatives, start looking at Netlib.
Oh, I almost forgot BOOST, which is a must.
In regards to numerical stuff like Numpy, you should have a look at both:
Blitz++ http://www.oonumerics.org/blitz/
and
Jama/TNT http://math.nist.gov/tnt/download.html
On the library side, check out Armadillo. It almost gives you the full extent of MATLAB's array manipulation syntax and uses LAPACK and BLAS (ATLAS) under the hood.
This tutorial absolutely rocks, but you may not want to tackle it initially.
http://www.parashift.com/c++-faq/
Make sure to read up on the STL (standard template library) and other stuff, using sites like:
http://cplusplus.com/
And, check out the Boost library:
http://www.boost.org/
To make really good use of C++, you need to learn at least the STL, that alone will save you lots of time, but as parashift mentions, C++ OOP is only programming with objects, if you don't use dynamic bindings.
TRNG is a parallel random number generation library. It allows you to create multiple independent streams and was designed for use on clusters.

Matlab vs. Visual C++? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
Improve this question
I'm doing a Windows Application the uses lots of charts. Its practically a dataviewer. I started doing Matlab, because its easier, but I realized it's too slow. I wanted to change to another language. Somebody recommended me Visual C++ or Java. But Im not sure. What language should I use??
In my opinion the speed gain from going to another "faster" language is not as much as refining your algorithm.
The "problem" with MATLAB is that it allows you to do some nasty things, such as resizing your matrix in a tight loop. You should really try to pinpoint your bottlenecks using the following command:
profile on
... run your program
profile off
profile report
This will give you nice information about which function takes how long to execute and which line creates the biggest bottleneck. You can also see how many times a function is called and a M-Lint Code Check Report is included.
These measurements and hints can show you the bottlenecks of your algorithm. if your sure there isn't a way to reduce the callcount/speed of a function using a smarter algorithm. Such as do I really need that big 2d matrix where a smart vector would be large enough, or if I found a artifact, why would I still continue searching for artifacts. You could write the functions you're experiencing the most performance problems with in c/c++ and use it as a function in matlab. You can get a big speedup out of correctly choosing which functions to implement in c/c++. There is an amount of overhead with calling a c/c++ function from MATLAB, or more correctly there is a overhead in c/c++ to get the data from MATLAB, so a function which is called 10000 times will not be the best to implement in c/c++, you'd be better of with the function higher up the callstack.
It depends on what your requirements are.
The advantage of using matlab is that it's strong in numerical calculations. If you don't need that, then there is no advantage to using matlab. In this case, all those languages are okay, and many others (Python, C#, ...) as well. It depends on which language you are most comfortable with.
If you do want the advantages of matlab then:
Try optimizing in matlab. Most optimization techniques are language independent.
There are tools to translate matlab to C automatically. You can then try to compile with all optimizations on. I seriously doubt this will help much, however, especially considering the GUI part.
First and foremost, as other answers have mentioned, you need to profile your code to find out where the bottleneck is. I would check out Doug Hull's blog at The MathWorks, specifically this entry about using the profiler. This will help you find out where all the work is being done in your code.
If the source of the slowdown is associated with data processing, there may be a number of ways to speed things up (vectorizing, writing a mex file, etc.).
If the source of the slowdown is your GUI, this may be even easier to solve. There are a number of blog posts, both from Doug and other MathWorkers, which I've seen that deal with GUI design. There have also been a few questions on SO dealing with it (here's one). If you're dealing with displaying very large data sets, this submission from Jiro Doke on The MathWorks File Exchange may help speed things up.
It's hard to give you more specific advice since I don't know how you are designing your GUI, but if that turns out to be the bottleneck in displaying your data there are many resources to turn to for improving its speed before you go through the hassle of switching to a whole other language.
Don't forget that you can create functions in C++ that can be called from Matlab. And TADA, you have access to both environments !
I would use C#. It is easier than C++ and integrates well with the Windows platform. Just find a free graphing library for it and you're good to go.
There are plenty of other options depending on your preference of language. Eg. Qt with Python or C++.
As far as I know, the most common methodology is to first do the proof of concept or just the main algorithm on Matlab, because of its ease of use and convenience for math calculations, and after that to translate it to a "real" programming language in order to improve the performance. Usually C or C++ act as the "real" language, but in your case, aiming to do a Windows application, perhaps C# will be the best option.
I found that GUI programming in MATLAB can get really nasty if your application gets more complex. BTW MATLAB can also be called from Java easily (and vice versa, current versions basically provide an interactive Java console).
Just as a side note, if you still need the math power of Matlab, you may want to check out Scilab. It's open-source and free, and it has examples of how it can be integrated with other C# or C++. I have created projects on which Scilab was running in the background to perform all the data math operations; and displaying them with C#'s ZedGraph library. Worked like magic!
I suggest you using Java and the JFreeChart (http://www.jfree.org/jfreechart/) library. I found very easy (and fast) developing applications with a lot of charts of different typologies. If you don't need particularly fast performances, you can use Java. I suppose that there are similar libraries for C#, but I'm not sure.
An alternative to Matlab and Scilab is another free software: Octave.
I don't know about Scilab, but Octave syntax is nearly the same as matlab so you can import code with minimal effort.
If you need fancy toolboxes though, Scilab and Octave might let you down, so check this.
You can execute Octave functions in a C++ program:
http://en.wikipedia.org/wiki/GNU_Octave
I do not think that you can call your own m-files functions from your C++ program though. In the past, the Matlab compiler would let users run matlab programs without installing Matlab, but not without installing a huge library (250 MB if I remember correctly). Nevermind if your Matlab program took 20 kB, you had to distribute the huge library.
Please someone edit/comment on the situation today!
It has been a while since I used the GUI "ability" of Matlab, but back then (2005) I found it awful. Ugly, hard to use, very hard to maintain, dependent on user settings of windows parameters.
Please comment or edit on that too, they may have made progress!
If they have not, I believe that Matlab is NOT the way to go for a program that you want to deliver to anyone.
If you can use Visual Studio for doing your GUI, do that. I second the earlier opinions: go with what you're comfortable with.
If you need the Matlab functions, go with what you're comfortable with, that supports Matlab libraries.
First of all Visual C++ is not a language is an IDE for developing applications.
Second... Which languages do you know? You can have several options. Take a look to:
C++ + Qt (Mine preferred option, powerful and easy to understand)
C# + .NET or WPF
Java
If you can tell more information we could find a language that matches your needs.