Explanation of CUDA C and C++

Explanation of CUDA C and C++ - c++

Can anyone give me a good explanation as to the nature of CUDA C and C++? As I understand it, CUDA is supposed to be C with NVIDIA's GPU libraries. As of right now CUDA C supports some C++ features but not others.
What is NVIDIA's plan? Are they going to build upon C and add their own libraries (e.g. Thrust vs. STL) that parallel those of C++? Are they eventually going to support all of C++? Is it bad to use C++ headers in a .cu file?

CUDA C is a programming language with C syntax. Conceptually it is quite different from C.
The problem it is trying to solve is coding multiple (similar) instruction streams for multiple processors.
CUDA offers more than Single Instruction Multiple Data (SIMD) vector processing, but data streams >> instruction streams, or there is much less benefit.
CUDA gives some mechanisms to do that, and hides some of the complexity.
CUDA is not optimised for multiple diverse instruction streams like a multi-core x86.
CUDA is not limited to a single instruction stream like x86 vector instructions, or limited to specific data types like x86 vector instructions.
CUDA supports 'loops' which can be executed in parallel. This is its most critical feature. The CUDA system will partition the execution of 'loops', and run the 'loop' body simultaneously across an array of identical processors, while providing some of the illusion of a normal sequential loop (specifically CUDA manages the loop "index"). The developer needs to be aware of the GPU machine structure to write 'loops' effectively, but almost all of the management is handled by the CUDA run-time. The effect is hundreds (or even thousands) of 'loops' complete in the same time as one 'loop'.
CUDA supports what looks like if branches. Only processors running code which match the if test can be active, so a subset of processors will be active for each 'branch' of the if test. As an example this if... else if ... else ..., has three branches. Each processor will execute only one branch, and be 're-synched' ready to move on with the rest of the processors when the if is complete. It may be that some of the branch conditions are not matched by any processor. So there is no need to execute that branch (for that example, three branches is the worst case). Then only one or two branches are executed sequentially, completing the whole if more quickly.
There is no 'magic'. The programmer must be aware that the code will be run on a CUDA device, and write code consciously for it.
CUDA does not take old C/C++ code and auto-magically run the computation across an array of processors. CUDA can compile and run ordinary C and much of C++ sequentially, but there is very little (nothing?) to be gained by that because it will run sequentially, and more slowly than a modern CPU. This means the code in some libraries is not (yet) a good match with CUDA capabilities. A CUDA program could operate on multi-kByte bit-vectors simultaneously. CUDA isn't able to auto-magically convert existing sequential C/C++ library code into something which would do that.
CUDA does provides a relatively straightforward way to write code, using familiar C/C++ syntax, adds a few extra concepts, and generates code which will run across an array of processors. It has the potential to give much more than 10x speedup vs e.g. multi-core x86.
Edit - Plans: I do not work for NVIDIA
For the very best performance CUDA wants information at compile time.
So template mechanisms are the most useful because it gives the developer a way to say things at compile time, which the CUDA compiler could use. As a simple example, if a matrix is defined (instantiated) at compile time to be 2D and 4 x 8, then the CUDA compiler can work with that to organise the program across the processors. If that size is dynamic, and changes while the program is running, it is much harder for the compiler or run-time system to do a very efficient job.
EDIT:
CUDA has class and function templates.
I apologise if people read this as saying CUDA does not. I agree I was not clear.
I believe the CUDA GPU-side implementation of templates is not complete w.r.t. C++.
User harrism has commented that my answer is misleading. harrism works for NVIDIA, so I will wait for advice. Hopefully this is already clearer.
The hardest stuff to do efficiently across multiple processors is dynamic branching down many alternate paths because that effectively serialises the code; in the worst case only one processor can execute at a time, which wastes the benefit of a GPU. So virtual functions seem to be very hard to do well.
There are some very smart whole-program-analysis tools which can deduce much more type information than the developer might understand. Existing tools might deduce enough to eliminate virtual functions, and hence move analysis of branching to compile time. There are also techniques for instrumenting program execution which feeds directly back into recompilation of programs which might reach better branching decisions.
AFAIK (modulo feedback) the CUDA compiler is not yet state-of-the-art in these areas.
(IMHO it is worth a few days for anyone interested, with a CUDA or OpenCL-capable system, to investigate them, and do some experiments. I also think, for people interested in these areas, it is well worth the effort to experiment with Haskell, and have a look at Data Parallel Haskell)

CUDA is a platform (architecture, programming model, assembly virtual machine, compilation tools, etc.), not just a single programming language. CUDA C is just one of a number of language systems built on this platform (CUDA C, C++, CUDA Fortran, PyCUDA, are others.)
CUDA C++
Currently CUDA C++ supports the subset of C++ described in Appendix D ("C/C++ Language Support") of the CUDA C Programming Guide.
To name a few:
Classes
__device__ member functions (including constructors and destructors)
Inheritance / derived classes
virtual functions
class and function templates
operators and overloading
functor classes
Edit: As of CUDA 7.0, CUDA C++ includes support for most language features of the C++11 standard in __device__ code (code that runs on the GPU), including auto, lambda expressions, range-based for loops, initializer lists, static assert, and more.
Examples and specific limitations are also detailed in the same appendix linked above. As a very mature example of C++ usage with CUDA, I recommend checking out Thrust.
Future Plans
(Disclosure: I work for NVIDIA.)
I can't be explicit about future releases and timing, but I can illustrate the trend that almost every release of CUDA has added additional language features to get CUDA C++ support to its current (In my opinion very useful) state. We plan to continue this trend in improving support for C++, but naturally we prioritize features that are useful and performant on a massively parallel computational architecture (GPU).

Not realized by many, CUDA is actually two new programming languages, both derived from C++. One is for writing code that runs on GPUs and is a subset of C++. Its function is similar to HLSL (DirectX) or Cg (OpenGL) but with more features and compatibility with C++. Various GPGPU/SIMT/performance-related concerns apply to it that I need not mention. The other is the so-called "Runtime API," which is hardly an "API" in the traditional sense. The Runtime API is used to write code that runs on the host CPU. It is a superset of C++ and makes it much easier to link to and launch GPU code. It requires the NVCC pre-compiler which then calls the platform's C++ compiler. By contrast, the Driver API (and OpenCL) is a pure, standard C library, and is much more verbose to use (while offering few additional features).
Creating a new host-side programming language was a bold move on NVIDIA's part. It makes getting started with CUDA easier and writing code more elegant. However, truly brilliant was not marketing it as a new language.

Sometimes you hear that CUDA would be C and C++, but I don't think it is, for the simple reason that this impossible. To cite from their programming guide:
For the host code, nvcc supports whatever part of the C++ ISO/IEC
14882:2003 specification the host c++ compiler supports.
For the device code, nvcc supports the features illustrated in Section
D.1 with some restrictions described in Section D.2; it does not
support run time type information (RTTI), exception handling, and the
C++ Standard Library.
As I can see, it only refers to C++, and only supports C where this happens to be in the intersection of C and C++. So better think of it as C++ with extensions for the device part rather than C. That avoids you a lot of headaches if you are used to C.

What is NVIDIA's plan?
I believe the general trend is that CUDA and OpenCL are regarded as too low level techniques for many applications. Right now, Nvidia is investing heavily into OpenACC which could roughly be described as OpenMP for GPUs. It follows a declarative approach and tackles the problem of GPU parallelization at a much higher level. So that is my totally subjective impression of what Nvidia's plan is.

Related

Using libraries like boost in cuda device code

I am learning cuda at the moment and I am wondering if it is possible to use functions from different libraries and api's like boost in cuda device code.
Note: I tried using std::cout and that did not work I got printf working after changing the code generation to compute_20,sm_20. I'm using Visual Studio 2010. Cuda 5.0. GPU Nvidia GTX 570. NSIght is installed.

Here's an answer. And here's the CUDA documentation about language support. Boost won't make it for sure.
Since the purpose of using CUDA is to speed up kernels in your code, you'll typically want to limit the language complexities used, because of adding overhead. This will mean that you'll typically stay very close to plain C, with just a few sprinkles of C++ if that's really handy.
Constructs in for example Boost can result in large amounts of assembly code (and C++ in general has been criticised for this and is a reason to not use certain constructs in real-time software). This is all fine for most applications, but not for kernels you'll want to run on a GPU, where every instruction counts.
For CUDA (or OpenCL), people typically write intense algorithms that work on data in arrays. For example special image processing. You only use these techniques to do the calculation intensive tasks of you application. Then you have a 'regular' program that interacts with the user/network/database which creates these CUDA tasks (i.e. selects the data and parameters) and gives starts them. Here are CUDA samples.

Boost uses the expression templates technique not to loose performance while enabling a simpler syntax.
BlueBird and Newton are libraries using metaprogramming, similarly to Boost, enabling CUDA computations.
ArrayFire is another library using Just in Time compilation and exploiting the CUDA language underneath.
Finally, Thrust, as suggested by Njuffa, is a template library enabling CUDA calculations (but not using metaprogramming, see Evaluating expressions consisting of elementwise matrix operations in Thrust).

Kernel development and C++ [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
From what I know, even though the common OS have parts written in other languages, the kernel is entirely written in C.
I want to know if it's feasible to write a Kernel in C++ and if not, what would be the drawbacks.

There are plenty of examples of well-used operating systems (or parts of them) implemented in C++ - IOKit - the device driver subsystem of MacOSX and IOS is implemented in EC++. Then there's the eCOS RTOS - where the kernel is implemented in C++, even making use of templates.
Operating systems are traditionally awash with examples of OO concepts implemented the hard way in C. In the linux device model kobject is effectively the base-class for driver and device objects, complete with DIY v-tables and some funky arrangements implemented in macros for up and down-casting.
The Windows NT kernel has an even more deeply rooted inheritance hierarchy of kernel objects. And for all of the neigh-sayers complaining about the suitability of exception handling in kernel code, exactly such a mechanism is provided.
Traditionally, the arguments against using C++ in kernel code have been:
Portability: availability of C++ compilers for all intended target platforms. This is not really an issue any more
Cost of C++ language mechanisms such as RTTI and exceptions. Clearly if they were to be used, the standard implementation isn't suitable and a kernel-specific variant needs using. This is generally the driver behind the use of EC++
Robustness of C++ APIs, and particularly the Fragile base-class problem
Undoubtedly, the use of exceptions and RAII paradigm would vastly improve kernel code quality - you only have to look at source code for BSD or linux to see the alternative - enormous amounts of error handling code implemented with gotos.

This is covered explicitly in the OSDev Wiki.
Basically, you either have to implement runtime support for certain things (like RTTI, exceptions), or refrain from using them (leaving only a subset of C++ to be used).
Other than that, C++ is the more complex language, so you need to have a bit more competent developers that won't screw it up. Linus Torvalds hating C++ being purely coincidental, of course.

To address Torvalds' concerns and others mentioned elsewhere here:
In hard-RT systems written in C++, STL/RTTI/exceptions are not used and that same principal can be applied to the much more lenient Linux kernel. Other concerns about "OOP memory model" or "polymorphism overhead" basically show programmers that never really checked what happens at the assembly level or the memory structure. C++ is as efficient, and due to optimized compilers many times more efficient than a C programmer writing lookup tables badly since he doesn't have virtual functions at hand.
In the hands of an average programmer C++ doesn't add any additional assembly code vs a C written piece of code. Having read the asm translation of most C++ constructs and mechanisms, I'd say that the compiler even has more room to optimize vs C and can create even leaner code at times. So as far as performance it's pretty easy to use C++ as efficiently as C, while still utilizing the power of OOP in C++.
So the answer is that it's not related to facts, and basically revolves around prejudice and not really knowing what code CPP creates. I personally enjoy C almost as much as C++ and I don't mind it, but there is no rational against layering an object oriented design above Linux, or in the Kernel itself, it would've done Linux a lot of good.

You can write an OS kernel in more or less any language you like.
There are a few reasons to prefer C, however.
It is a simple language! There's very little magic. You can reason about the machinecode the compiler will generate from your source code without too much difficulty.
It tends to be quite fast.
There's not much of a required runtime; there's minimal effort needed to port that to a new system.
There are lots of decent compilers available that target many many different CPU and system architectures.
By contrast, C++ is potentially a very complex language which involves an awful lot of magic being done to translate your increasingly high-level OOP code into machine code. It is harder to reason about the generated machine code, and when you need to start debugging your panicky kernel or flaky device driver the complexities of your OOP abstractions will start becoming extremely irritating... especially if you have to do it via user-unfriendly debug ports into the target system.
Incidentally, Linus is not the only OS developer to have strong opinions on systems programming languages; Theo de Raadt of OpenBSD has made a few choice quotes on the matter too.

The feasibility of writing a kernel in C++ can be easily established: it has already been done. EKA2 is the kernel of Symbian OS, which has been written in C++.
However, some restrictions to the usage of certain C++ features apply in the Symbian environment.

While there is something "honest" about (ANSI) C, there is also something "honest", in a different way, about C++.
C++'s syntactic support for abstracting objects is very worthwhile, no matter what the application space. The more tools available for misnomer mitigation, the better ... and classes are such a tool.
If some part of an existing C++ compiler does not play well with kernel-level realities, then whittle up a modified version of the compiler that does it the "right" way, and use that.
As far as programmer caliber and code quality, one can write either hideous or sublime code in either C or C++. I don't think it is right to discriminate against people who can actually code OOP well by disallowing it at the kernel level.
That said, and even as a seasoned programmer, I miss the old days of writing in assembler. I like 'em both ... C++ and ASM ... as long as I can use Emacs and source level debuggers (:-).

Revision after many years:
Looking back, I'd say the biggest problem is actually with the tons of high level features in C++, that are either hidden or outside the control of the programmer. The standard doesn't enforce any particular way of implementing things, even if most implementations follow common sanity, there are many good reasons to be 100% explicit and have full control over how things are implemented in a OS kernel.
This allows (as long as you know what you are doing) to reduce memory footprint, optimize data layout based on access patterns rather than OOP paradigms, thus improve cache-friendliness and performance, and avoid potential bugs that might come hidden in the tons of high level features of C++.
Note that even tho far more simple, even C is too unpredictable in some cases, which is one of the reasons there is also a lot of platform specific assembly in the kernel code.

Google's new-coming operating system Fuchsia is based on the kernel called Zircon, which is written mostly in C++, with some parts in assembly language[1] [2]. Plus, the rest of the OS is also written mostly in C++[3]. I think modern C++ gives programmers many reasons to use it as a general programming environment for huge codebases. It has lots of great features, and new features are added regularly. I think this is the main motive behind Google's decision. I think C++ could easily be the future of system programming.

One of the big benefits of C is it's readability. If you have a lot of code, which is more readable:
foo.do_something();
or:
my_class_do_something(&foo);
The C version is explicit about which type foo is every time foo is used. In C++ you have lots and lots of ambiguous "magic" going on behind the scenes. So readability is much worse if you are just looking at some small piece of code.

Why are drivers and firmwares almost always written in C or ASM and not C++?

I am just curious why drivers and firmwares almost always are written in C or Assembly, and not C++?
I have heard that there is a technical reason for this.
Does anyone know this?
Lots of love,
Louise

Because, most of the time, the operating system (or a "run-time library") provides the stdlib functionality required by C++.
In C and ASM you can create bare executables, which contain no external dependencies.
However, since windows does support the C++ stdlib, most Windows drivers are written in (a limited subset of) C++.
Also when firmware is written ASM it is usually because either (A) the platform it is executing on does not have a C++ compiler or (B) there are extreme speed or size constraints.
Note that (B) hasn't generally been an issue since the early 2000's.

Code in the kernel runs in a very different environment than in user space. There is no process separation, so errors are a lot harder to recover from; exceptions are pretty much out of the question. There are different memory allocators, so it can be harder to get new and delete to work properly in a kernel context. There is less of the standard library available, making it a lot harder to use a language like C++ effectively.
Windows allows the use of a very limited subset of C++ in kernel drivers; essentially, those things which could be trivially translated to C, such as variable declarations in places besides the beginning of blocks. They recommend against use of new and delete, and do not have support for RTTI or most of the C++ standard library.
Mac OS X use I/O Kit, which is a framework based on a limited subset of C++, though as far as I can tell more complete than that allowed on Windows. It is essentially C++ without exceptions and RTTI.
Most Unix-like operating systems (Linux, the BSDs) are written in C, and I think that no one has ever really seen the benefit of adding C++ support to the kernel, given that C++ in the kernel is generally so limited.

1) "Because it's always been that way" - this actually explains more than you think - given that the APIs on pretty much all current systems were originally written to a C or ASM based model, and given that a lot of prior code exists in C and ASM, it's often easier to 'go with the flow' than to figure out how to take advantage of C++.
2) Environment - To use all of C++'s features, you need quite a runtime environment, some of which is just a pain to provide to a driver. It's easier to do if you limit your feature set, but among other things, memory management can get very interesting in C++, if you don't have much of a heap. Exceptions are also very interesting to consider in this environment, as is RTTI.
3) "I can't see what it does". It is possible for any reasonably skilled programmer to look at a line of C and have a good idea of what happens at a machine code level to implement that line. Obviously optimization changes that somewhat, but for the most part, you can tell what's going on. In C++, given operator overloading, constructors, destructors, exception, etc, it gets really hard to have any idea of what's going to happen on a given line of code. When writing device drivers, this can be deadly, because you often MUST know whether you are going to interact with the memory manager, or if the line of code affects (or depends on) interrupt levels or masking.
It is entirely possible to write device drivers under Windows using C++ - I've done it myself. The caveat is that you have to be careful about which C++ features you use, and where you use them from.

Except for wider tool support and hardware portability, I don't think there's a compelling reason to limit yourself to C anymore. I often see complicated hand-coded stuff done in C that can be more naturally done in C++:
The grouping into "modules" of functions (non-general purpose) that work only on the same data structure (often called "object") -> Use C++ classes.
Use of a "handle" pointer so that module functions can work with "instances" of data structures -> Use C++ classes.
File scope static functions that are not part of a module's API -> C++ private member functions, anonymous namespaces, or "detail" namespaces.
Use of function-like macros -> C++ templates and inline/constexpr functions
Different runtime behavior depending on a type ID with either hand-made vtable ("descriptor") or dispatched with a switch statement -> C++ polymorphism
Error-prone pointer arithmetic for marshalling/demarshalling data from/to a communications port, or use of non-portable structures -> C++ stream concept (not necessarily std::iostream)
Prefixing the hell out of everything to avoid name clashes: C++ namespaces
Macros as compile-time constants -> C++11 constexpr constants
Forgetting to close resources before handles go out of scope -> C++ RAII
None of the C++ features described above cost more than the hand-written C implementations. I'm probably missing some more. I think the inertia of C in this area has more to do with C being mostly used.
Of course, you may not be able to use STL liberally (or at all) in a constrained environment, but that doesn't mean you can't use C++ as a "better C".

The comments I run into as why a shop is using C for an embedded system versus C++ are:
C++ produces code bloat
C++ exceptions take up too much
room.
C++ polymorphism and virtual tables
use too much memory or execution
time.
The people in the shop don't know
the C++ language.
The only valid reason may be the last. I've seen C language programs that incorporate OOP, function objects and virtual functions. It gets very ugly very fast and bloats the code.
Exception handling in C, when implemented correctly, takes up a lot of room. I would say about the same as C++. The benefit to C++ exceptions: they are in the language and programmers don't have to redesign the wheel.
The reason I prefer C++ to C in embedded systems is that C++ is a stronger typed language. More issues can be found in compile time which reduces development time. Also, C++ is an easier language to implement Object Oriented concepts than C.
Most of the reasons against C++ are around design concepts rather than the actual language.

The biggest reason C is used instead of say extremely guarded Java is that it is very easy to keep sight of what memory is used for a given operation. C is very addressing oriented. Of key concern in writing kernel code is avoiding referencing memory that might cause a page fault at an inconvenient moment.
C++ can be used but only if the run-time is specially adapted to reference only internal tables in fixed memory (not pageable) when the run-time machinery is invoked implicitly eg using a vtable when calling virtual functions. This special adaptation does not come "out of the box" most of the time.
Integrating C with a platform is much easier to do as it is easy to strip C of its standard library and keep control of memory accesses utterly explicit. So what with it also being a well-known language it is often the choice of kernel tools designers.
Edit: Removed reference to new and delete calls (this was wrong/misleading); replaced with more general "run-time machinery" phrase.

The reason that C, not C++ is used is NOT:
Because C++ is slower
Or because the c-runtime is already present.
It IS because C++ uses exceptions.
Most implementations of C++ language exceptions are unusable in driver code because drivers are invoked when the OS is responding to hardware interrupts. During a hardware interrupt, driver code is NOT allowed to use exceptions as that would/could cause recursive interrupts. Also, the stack space available to code while in the context of an interrupt is typically very small (and non growable as a consequence of the no exceptions rule).
You can of course use new(std::nothrow), but because exceptions in c++ are now ubiqutious, that means you cannot rely on any library code to use std::nothrow semantics.
It IS also because C++ gave up a few features of C :-
In drivers, code placement is important. Device drivers need to be able to respond to interrupts. Interrupt code MUST be placed in code segments that are "non paged", or permanently mapped into memory, as, if the code was in paged memory, it might be paged out when called upon, which will cause an exception, which is banned.
In C compilers that are used for driver development, there are #pragma directives that can control which type of memory functions end up on.
As non paged pool is a very limited resource, you do NOT want to mark your entire driver as non paged: C++ however generates a lot of implicit code. Default constructors for example. There is no way to bracket C++ implicitly generated code to control its placement, and because conversion operators are automatically called there is no way for code audits to guarantee that there are no side effects calling out to paged code.
So, to summarise :- The reason C, not C++ is used for driver development, is because drivers written in C++ would either consume unreasonable amounts of non-paged memory, or crash the OS kernel.

C is very close to a machine independent assembly language. Most OS-type programming is down at the "bare metal" level. With C, the code you read is the actual code. C++ can hide things that C cannot.
This is just my opinion, but I've spent a lot of time in my life debugging device drivers and OS related things. Often by looking at assembly language. Keep it simple at the low level and let the application level get fancy.

Windows drivers are written in C++.
Linux drivers are written in c because the kernel is written in c.

Probably because c is still often faster, smaller when compiled, and more consistent in compilation between different OS versions, and with fewer dependencies. Also, as c++ is really built on c, the question is do you need what it provides?
There is probably something to the fact that people that write drivers and firmware are usually used to working at the OS level (or lower) which is in c, and therefore are used to using c for this type of problem.

The reason that drivers and firmwares are mostly written in C or ASM is, there is no dependency on the actual runtime libraries. If you were to imagine this imaginary driver written in C here
#include <stdio.h>
#define OS_VER 5.10
#define DRIVER_VER "1.2.3"
int drivermain(driverstructinfo **dsi){
if ((*dsi)->version > OS_VER){
(*dsi)->InitDriver();
printf("FooBar Driver Loaded\n");
printf("Version: %s", DRIVER_VER);
(*dsi)->Dispatch = fooDispatch;
}else{
(*dsi)->Exit(0);
}
}
void fooDispatch(driverstructinfo *dsi){
printf("Dispatched %d\n", dsi->GetDispatchId());
}
Notice that the runtime library support would have to be pulled in and linked in during compile/link, it would not work as the runtime environment (that is when the operating system is during a load/initialize phase) is not fully set up and hence there would be no clue on how to printf, and would probably sound the death knell of the operating system (a kernel panic for Linux, a Blue Screen for Windows) as there is no reference on how to execute the function.
Put it another way, with a driver, that driver code has privilege to execute code along with the kernel code which would be sharing the same space, ring0 is the ultimate code execution privilege (all instructions allowed), ring3 is where the front end of the operating system runs in (limited execution privilege), in other words, a ring3 code cannot have a instruction that is reserved for ring0, the kernel will kill the code by trapping it as if to say 'Hey, you have no privilege to tread up ring0's domain'.
The other reason why it is written in assembler, is mainly for code size and raw native speed, this could be the case of say, a serial port driver, where input/output is 'critical' to the function in relation to timing, latency, buffering.
Most device drivers (in the case of Windows), would have a special compiler toolchain (WinDDK) which can use C code but has no linkage to the normal standard C's runtime libraries.
There is one toolkit that can enable you to build a driver within Visual Studio, VisualDDK. By all means, building a driver is not for the faint of heart, you will get stress induced activity by staring at blue screens, kernel panics and wonder why, debugging drivers and so on.
The debugging side is harder, ring0 code are not easily accessible by ring3 code as the doors to it are shut, it is through the kernel trap door (for want of a better word) and if asked politely, the door still stays shut while the kernel delegates the task to a handler residing on ring0, execute it, whatever results are returned, are passed back out to ring3 code and the door still stays shut firmly. That is the analogy concept of how userland code can execute privileged code on ring0.
Furthermore, this privileged code, can easily trample over the kernel's memory space and corrupt something hence the kernel panic/bluescreens...
Hope this helps.

Perhaps because a driver doesn't require object oriented features, while the fact that C still has somewhat more mature compilers would make a difference.

There are many style of programming such as procedural, functional, object oriented etc. Object oriented programming is more suited for modeling real world.
I would use object-oriented for device drivers if it suites it. But, most of the time when you programming device drivers, you would not need the advantages provided by c++ such as, abstraction, polymorphism, code reuse etc.

Well, IOKit drivers for MacOSX are written in C++ subset (no exceptions, templates, multiple inheritance). And there is even a possibility to write linux kernel modules in haskell.)
Otherwise, C, being a portable assembly language, perfectly catches the von Neumann architecture and computation model, allowing for direct control over all it's peculiarities and drawbacks (such as the "von Neumann bottleneck"). C does exactly what it was designed for and catches it's target abstraction model completely and flawlessly (well except for implicit assumption in single control flow which could have been generalized to cover the reality of hardware threads) and this is why i think it is a beautiful language.) Restricting the expressive power of the language to such basics eliminates most of the unpredictable transformation details when different computational models are being applied to this de-facto standard. In other words, C makes you stick to basics and allows pretty much direct control over what you are doing, for example when modeling behavior commonality with virtual functions you control exactly how the function pointer tables get stored and used when comparing to C++'s implicit vtbl allocation and management. This is in fact helpful when considering caches.
Having said that, object-based paradigm is very useful for representing physical objects and their dependencies. Adding inheritance we get object-oriented paradigm which in turn is very useful to represent physical objects' structure and behavior hierarchy. Nothing stops anyone from using it and expressing it in C again allowing full control over exactly how your objects will be created, stored, destroyed and copied. In fact that is the approach taken in linux device model. They got "objects" to represent devices, object implementation hierarchy to model power management dependancies and hacked-up inheritance functionality to represent device families, all done in C.

because from system level, drivers need to control every bits of every bytes of the memory, other higher language cannot do that, or cannot do that natively, only C/Asm achieve~

fpga: choosing c++ to program fpga

I keep hearing mostly from electrical engineers that C is used for fpga work.
What about C++? Are there any disadvantages to using C++? I would think that the parallelism desired when programming for hardware would be better served by C++ more than C, no?
Also what do I use after that to make compatible c++ with the hardware?

I'm pretty sure that FPGAs are programmed either in VHDL or Verilog.
http://en.wikipedia.org/wiki/Vhdl
http://en.wikipedia.org/wiki/Verilog
I know that Altera also offers some C to HDL translators. I doubt that they are usable for anything but tiny designs though.

By far and away the easiest way to program an FPGA is via LabView's FPGA module. However this also ties you into their hardware and software. Not a cheap solution, but certainly the fastest way to get your program in hardware without having to learn anything but LabVIEW.

You can use C or C++ to program FPGAs but it requires some very expensive Highlevel Synthesis software. CatapultC is a product from Mentor Graphics which allows you to write your algorithm in untimed C++. It then synthesizes that C++ into RTL VHDL or Verilog. But CatapultC sells for well over $100,000 so it's definitely not for hobbyists. There's another product called ImpulseC which allows you to write C code which then gets synthesized into RTL, but I'm pretty sure that it only handles C not C++. ImpulseC is about $2000.
For hobbyists, you're probably best off sticking with VHDL or Verilog to describe your design and then using the free tools from Xilinx or Altera to synthesize that code and program the FPGA.

There is a big difference between compiling for CPUs and compiling for FPGAs. "Normal" compilers generate binary program code. The special FPGA compilers generate "hardware". There are compilers out there that turn some C-like code into "hardware". But it's not exactly C. It may be a C derivative extented with integer types of arbitrary bit lengths and is probably restricted to iteration and non-recursive function calls.
I am a big fan of C++ but even I see that many parts of it are just not appropriate for FPGAs: virtual functions, RTTI, exceptions. At least that's my impression. I didn't test those C-like FPGA compilers myself but a buddy of mine worked with them and it's supposedly a PITA.

They're probably using C to interface with the FPGA. When working with one in a design class, we used Verilog to program the FPGA and C in the attached Linux board. In that case, they're likely using C as it's easier to bang out a small program in C than in C++.

You're probably talking about SystemC, which is a set of C++ classes and macros used mainly for (transaction-level) modeling, not for synthesis. The high-level model can then be used as a golden reference to verifiy the register transfer level (RTL) description, which is typically coded in VHDL or Verilog.

as some of you I'm a fan of C++. I think it would be great to use SystemC to work with fpga's. So looking for that I found the next page. Maybe it could interest some of you.
http://www.es.ele.tue.nl/~ljozwiak/education/5JJ70p/blocks/4/sc2fpgaflow.html

What you are referring to is "behavioral synthesis", a compilation technique that allows to take sequential code as input (C, SystemC, C++) and generate automatically a FSM+Datapath pair in VHDL or Verilog, that can then be synthesized using regular Xilinx or Altera synthesizers.
To date, there are many "behavioral synthesizers" :
CatapultC from Mentor Graphics allows you to use a big subset of C and also C++
Cynthesizer from Forte Design System (systemC based) [edit 2015: now Cadence]
for FPGA, ImpulseC seems quite mature
[edit 2015] for Xilinx FPGA : Vivado-HLS
Hope this helps

Two that I can think of off the top of my head: C++ is much more complicated to write compilers (in this case HDL translators) for and has too many features that just would not be useful in such low level programming as fpga programming calls for.

Like others have said most FPGA's are designed using VHDL or Verilog. I have also seen PALASM used several years ago for small designs. The design is a logic description that is converted to settings that configure the FPGA. Verilog is based on c so knowing c will help with learning verilog however FPGAs are by nature parallel so even though the syntax might look similar not much else translates.

They might be talking about a programming language like Handel-C, which is some kind of C dialect, targeted for hardware programming. Handel-C can be directly or indirectly compiled to HDL, which in turn creates an FPGA configuration (i.e., the "program" on an FPGA).
Although VHDL and Verilog are much harder to learn, I suggest you start right away. When you're doing FPGA-related stuff, you're usually interested in efficiency. Handel-C will most likely make less efficient code than the code that you can write by hand (in VHDL or Verilog).
Edit: There's no C++ variant of Handel-C or related compilers I have ever head of.

You could use a soft processor running on the FPGA and code in C from there (NIOS from Altera, Microblaze from Xilinx etc).
SOC FPGA using OpenCL to interface with the FPGA from the ARM processor.

You can use C/C++ to program FPGAs. Xilinx has SDSoC, an eclipse based tool to program your FPGAs using C/C++. This basically builds hardware accelerator for part of code flagged to be synthesized. This assumes ARM+FPGA based zynq devices, host code is running on ARM
There is another flow SDAccel for PC based solution. With host code on X86 & accelerator on FPGA connected to your PCIe slot.

FPGA code describes hardware. What that means is you design a digital system and then use code to "describe" the hardware to the compiler. This is mainly done in VHDL or Verilog.
Modern compilers provide inerfaces between software and hardware platforms to let you write C code that can be run on FPGAs. Its not really code that will be uploaded to the FPGA but merely lets the compiler make code for the FPGA based on your C code. For an optimal system, there will need to be a Hardware Engineer who can make changes to the compiler generated code.
Here's a nice article on the issue: https://www.eetimes.com/c-language-techniques-for-fpga-acceleration-of-embedded-software/#
There are some programs like Intel Monitor Program which let you run C code on FPGAs but what it does is create a 'Computer' using the Microprocessor already on Chip to let you see how the code works. Its more of a learning tool.

Not sure if this has been said yet (I didn't read all the responses) but it is a possibility that what you are hearing is a running software in a soft core... For instance Altera has Nios II which can run a linux kernel (uCLinux) which will allow some pre-loaded C programs to run on that soft core which in turn communicates with the FPGA. So an FPGA will still be programmed with VHDL/Verilog for the hardware side, then whatever data it holds could be accessible to a small app running on the OS in the soft core. I'm sure C++ would be allowed so long as uCLinux/whatever kernel is running supports the language.
~doddy

You usually get a larger standard library with C++ and with C, and C is closer to the way that the hardware operates, i.e. easier for electrical engineers.

Using C++ in an embedded environment

Today I got into a very interesting conversation with a coworker, of which one subject got me thinking and googling this evening. Using C++ (as opposed to C) in an embedded environment. Looking around, there seems to be some good trades for and against the features C++ provides, but others Meyers clearly support it. So, I was wondering who would be able to shed some light on this topic and what the general consensus of the community was.

C++ for embedded platforms is perfectly fine - as long as you treat it as a better C. I love the fact that the language is slightly more structured. You can still do all the things that you want to do with C. Just remember to stick to an embedded C library like Newlib or uClibc.
I particularly like the abstraction that we can build using C++, particularly for I/O devices. So, we can have a class for UART and a class for GPIO and what nots. It is cleaner than having a bunch of functions (IMHO).

The fear of C++ among embedded developers is largely a thing of the past, when C++ compilers were not as good as C compilers (optimizations and code quality wise).
This applies especially to modern platforms with 32 bit architectures.
But, C is certainly still the preferred choice for more confined environments (as is assembler for 8 bit or 4 bit targets).
So, it really boils down to the resources your target platform provides, and how much of these resources you are likely to actually require, i.e. if you can afford the 'luxury' of doing embedded development in C++ (or even Java for that matter), because you know that you'll hardly have any issues regarding memory or CPU constraints.
Nowadays, many modern embedded platforms (think gaming consoles, mobile phones, PDAs etc), have really become very capable targets, with RISC architectures, several MB of RAM, and 3D hardware acceleration.
It would be a poor decision, to program such platforms using just C or even assembler out of uninformed performance considerations, on the other hand programming a 16 bit PIC in C++ would probably also be a controversial decision.
So, it's really a matter of asking yourself how much of the power, you'll actually need and how much you can afford to sacrifice, in order to improve the development experience (high level language, faster development, less tedious/redundant tasks).

It sort of depends on the particular nature of your embedded system and which features of C++ you use. The language itself doesn't necessarily generate bulkier code than C.
For example, if memory is your tightest constraint, you can just use C++ like "C with classes" -- that is, only using direct member functions, disabling RTTI, and not having any virtual functions or templates. That will fit in pretty much the same space as the equivalent C code, since you've no type information, vtables, or redundant functions to clutter things up.
I've found that templates are the biggest thing to avoid when memory is really tight, since you get one copy of each template function for each type it's specialized on, and that can rapidly bloat code segment.
In the console video games industry (which is sort of the beefy end of the embedded world) C++ is king. Our constraints are hard limits on memory (512mb on current generation) and realtime performance. Generally virtual functions and templates are used, but not exceptions, since they bloat the stack and are too perf-costly. In fact, one major manufacturer's compiler doesn't even support exceptions at all.

In my previous company all embedded code was written in a small subset of C code due to security (SIL-2) and memory reasons. By introducing a richer language like C++ in that particular scenario would have maybe cause more trouble than benefits.
In all due respect to C++ (which is a language I really love) but I think C - in our particular scenario - was the better choice.
I bet in some cases C++ is just fine to use for embedded applications but it really depends on the application - there is a difference if your program is controlling a nuclear plant or administrating an address book on your cell phone.

I don't know about "general consensus", only the company I work for (which does a lot of development for mobile phones, car navigation systems, DPFs, etc.).
The main drawback I've encountered to using C++ on embedded platforms as opposed to C is that it isn't quite as portable - there are many more cases of compilers that don't adhere to the standard which can cause problems if you need to build your code with more than 1 compiler or outright have bugs in the implementation. Then there are environments where C++ code simply won't run - BREW's issues with relocatable code and its "native OOP" don't play so well with "regular" C++ classes and inheritance.
In the end, though, if you're only targeting 1 platform, I'd say use whatever you think is "better" (faster, less bugs, better design) for your development - in most cases the issues can be worked around quite easily.

Depends what kind of embedded development you are doing. I've done embedded development with both C++, C, and Assembly on various platforms, you can even use Java to write applications on smart phones.
For instance on a smart phone like device that's running Windows CE 5, almost all of the code is C++, including in the operating system. Only small bits are written in C or assembly.
On the other hand I've written code for an MSP430 microcontroller, which was in C, and I probably would have done that in C++ had the compiler been more reliable and standards compliant.
Also I seem to recall a university lecturer of mine talking about writing embedded code in Forth or something. So really any language can do.

Now a days it will all boil down to the C++ runtime support of the platform. You're likely to find a way to compile C++ code down to almost any embedded platform with GCC, but if you can't find a suitable C++ runtime for the platform your efforts will be futile, unless you write your own C++ runtime.

One of the few things I tend to agree with Linus is his opinion about C++ http://thread.gmane.org/gmane.comp.version-control.git/57643/focus=57918
Besides this, if you really really want to use C++ you might want to have a look at http://www.caravan.net/ec2plus/ which describes Embedded C++, or better to say you should not use in C++ for embedded systems.

The big thing keeping us with using C++ for a long time was the VxWorks support for it, which truly sucked. That supposidly has gotten better on VxWorks 6 (yes, it's been out a while... good 'ole vendor lock-in and lack of company vision has kept us stuck on VxWorks 5.5).
So for us it's mostly a question of the environment. After that, C++ can obviously be just as good as C... it's a matter of people understanding what their tool does and how to use it. C++ may make it easier to write incredibly inefficient code, but that doesn't mean we have to succomb to it.

I am currently fighting a problem with exceptions in an embedded Linux application. We are trying to port software written for a different platform that seemed to support exceptions well, but the new compiler tools (a port of gcc) reports errors when creating the eh_frame. I was against using exceptions for this tool, but the developer reassured me that modern compilers would support it well.
My opinion is that there are some advantages to C++, but I would stay away from exceptions and the standard template library. We haven't had problems using virtual functions.

C++ is suitable for microcontrollers and devices without an OS. You just have to know the architecture of the system and be conscious of time and space constrains, especially when doing mission critical programming.
With C++ you can do abstraction which often leads to an increased footprint in the code. You do not want this when programming for a resource-limited machine such as an 8-bit MCU.
Generally, avoid:
Dynamic memory allocation because it represents uncertainty in timing
Overloading
RTTI because the memory cost is large
Exceptions because of the execution speed lowering
Be cautious with virtual functions as they have a resource cost of a vtable per class and one pointer to the vtable per object. Also, use const in place of #define.
As you move up to 16 and 32-bit MCUs, with 10s or 100s of MB RAM, heavier features like the ones mentioned above may be used.
So to round up, C++ is useful for embedded systems. A main benefit is that OOP can be useful when you want to abstract aspects of the microcontroller, for example UART or state machines. But you may want to avoid certain features all of the time and some of the features some of the time, depending on the target you are programming for.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js