To write a bootloader in C or C++? - c++

I am writing a program, more specifically a bootloader, for an embedded system. I am going to use a C library to interact with some of the hardware components and I have the choice of writing it either in C or C++. Is there any reason I should choose one over the other? I do not need the object oriented features of C++ but it does have a stronger type system. Could it have other language features that would make the program more robust? I know some people avoid C++ because it can (but not always) generate large firmware images.

This isn't a particularly straightforward question to answer. It depends on a number of factors including:
How you prefer to layout your code.
Whether there's a C++ compiler available for your target (and any other targets you may wish to use the bootloader on).
How critical the code size is for your application (we're talking about 10% extra maybe, not MB as suggested by another answer).
Personally, I really like classes as a way of laying out my code. Even when writing C code, I'll tend to keep everything in modular files with file-scope static functions "simulating" member functions and (a few) file-scope static variables to "simulate" member variables. Having said that, most of my existing embedded projects (all of which are relatively small scale, up to a maximum of 128kB flash including bootloader, but usually less) have tended to be written in C. Now that I have a C++ compiler though, I'm certainly considering moving to C++.
There are considerable benefits to C++ from simply using references, overloading and templates, even if you don't go as far as classes. Certainly, I'd stop short of using a lot of more advanced features, including the use of dynamic memory allocation (new). Then again, I'd avoid dynamic memory allocation (malloc etc) in embedded C as well if possible.
If you have a C++ compiler (even if it's only g++), it is worth running your code through it just for the additional type checking so that you can reduce the number of problems in your code. The C++ compiler can pick up on a few things that even static analysis tools won't spot.
For a good discussion on many invalid reasons people reject C++, see Dan Saks' article on Embedded.com.

For a boot-loader the obvious choice is C, especially on an embedded system. The generated code will need to be close to the metal, and very easy to debug, likely by dropping into assembly, which quickly becomes difficult without care in C++. Also C tool-chains are far more ubiquitous than C++ tool-chains, allowing your boot-loader to be used on more platforms. Lastly, generated binaries are typically smaller, and use less memory when written C style.

If you don't need to use Object Orientation, use C. Simple choice there. Its simpler and easier, whilst accomplishing the same task.
Some die hards will disagree, but OO is what makes C++ > C, and vice versa in a lot of circumstances.

I would use C unless there is a specific reason to use C++. For a Bootloader you are not really going to need OO.
Use the simplest tool that will accomplish the job.

Write programs in C is not the same as writing it in C++. If you know how to do it only in C++, then your choice is C++. For writing bootloader it will be better to minimize code, so you probably will have to disable standard C++ library. If you know how to write in C then you should use C — it is more common choice for such kind of tasks.

Most of the previous answers assume that your bootloader is small and simple which is typically the case; however, if it becomes more complex (i.e. you need to be able to load from an Ethernet port, a USB port, or a serial port...you need to validate the code that is being loaded before you wipe out your existing code, etc.) you may want to consider C++.
I have also found that the bootloader and the application typically share some amount of common code so you may also want to consider using the same language as your application to facilitate the code sharing.

The C language is substantially easier to parse than C++. This means a program that is both valid C and valid C++ will compile faster as a C program. Probably not a major concern, but it is just another reason why C++ is probably overkill.

Go with C++ and objchoose what language features you need. You still have full control of the output object code as long as you understand the C++ abstractions that you're using.
Use of OO can still run well if you avoid the use of virtual functions. Avoid immutable object types that require a lot of copying in order to pass values, like std::string. But, you can still use features like templates without any real impact on runtime performance.

Use C with µClibc. It will make your code simpler and reduce its footprint. Can be found in: www.uclibc.org.

Related

Are there any downsides in using C++ for network daemons?

I've been writing a number of network daemons in different languages over the past years, and now I'm about to start a new project which requires a new custom implementation of a properitary network protocol.
The said protocol is pretty simple - some basic JSON formatted messages which are transmitted in some basic frame wrapping to have clients know that a message arrived completely and is ready to be parsed.
The daemon will need to handle a number of connections (about 200 at the same time) and do some management of them and pass messages along, like in a chat room.
In the past I've been using mostly C++ to write my daemons. Often with the Qt4 framework (the network parts, not the GUI parts!), because that's what I also used for the rest of the projects and it was simple to do and very portable. This usually worked just fine, and I didn't have much trouble.
Being a Linux administrator for a good while now, I noticed that most of the network daemons in the wild are written in plain C (of course some are written in other languages, too, but I get the feeling that > 80% of the daemons are written in plain C).
Now I wonder why that is.
Is this due to a pure historic UNIX background (like KISS) or for plain portability or reduction of bloat? What are the reasons to not use C++ or any "higher level" languages for things like daemons?
Thanks in advance!
Update 1:
For me using C++ usually is more convenient because of the fact that I have objects which have getter and setter methods and such. Plain C's "context" objects can be a real pain at some point - especially when you are used to object oriented programming.
Yes, I'm aware that C++ is a superset of C, and that C code is basically C++ you can compile any C code with a C++ compiler. But that's not the point. ;)
Update 2:
I'm aware that nowadays it might make more sense to use a high level (scripting) language like Python, node.js or similar. I did that in the past, and I know of the benefits of doing that (at least I hope I do ;) - but this question is just about C and C++.
I for one can't think of any technical reason to chose C over C++. Not one that I can't instantly think of a counterpoint for anyway.
Edit in reply to edit: I would seriously discourage you from considering, "...C code is basically C++." Although you can technically compile any C program with a C++ compiler (in as far as you don't use any feature in C that's newer than what C++ has adopted) I really try to discourage anyone from writing C like code in C++ or considering C++ as "C with objects."
In response to C being standard in Linux, only in as far as C developers keep saying it :p C++ is as much part of any standard in Linux as C is and there's a huge variety of C++ programs made on Linux. If you're writing a Linux driver, you need to be doing it in C. Beyond that...I know RMS likes to say you're more likely to find a C compiler than a C++ one but that hasn't actually been true for quite a long time now. You'll find both or neither on almost all installations.
In response to maintainability - I of course disagree.
Like I said, I can't think of one that can't instantly be refuted. Visa-versa too really.
The resistance to C++ for the development for daemon code stem from a few sources:
C++ has a reputation for being hard to avoid memory leaks. And memory leaks are a no no in any long running software. This is to a degree untrue - the problem is developers with a C background tend to use C idioms in C++, and that is very leaky. Using the available C++ features like vectors and smart pointers can produce leak free code.
As a converse, the smart pointer template classes, while they hide resource allocation and deallocation from the programmer, do a lot of it under the covers. In fact C++ generally has a lot of implicit allocation as a result of copy constructors and so on. As a result the C++ heap can become fragmented over time and daemon processes will eventually fail with an out of memory error even though there is sufficient RAM. This can be ameliorated by the use of modern heap managers that are more fragmenttation resistant, but they do this by consuming more resource up front.
while this doesn't apply to usermode daemon code, kernel mode developers avoid C++, again because of the implicit code C++ generates, and the exceptions C++ libraries use to handle errors. Most c++ compilers implement c++ exceptions in terms of hardware exceptions, and lots of kernel mode code is executed in environments where exceptions are not allowed to be thrown. Also, all the implicit code generated by c++, being implicit, cannot be wrapped in #pragma directives to guarantee its placement in pageable, or non pageable memory.
As a result, C++ is not possible for kernel development on any platform at all, and generally shunned by daemon developers too. Even if one's code is written using the proper smart memory management classes and does not leak - keeping on top of potential memory fragmentation issues makes languages where memory allocation is explicit a preferred choice.
I would recommend whichever you feel more comfortable with. If you are more comfortable with C++, your code is going to be cleaner, and run more efficiently, as you'll be more used to it, if you know what I mean.
The same applies on a larger scale to something like a Python vs Perl discussion. Whichever you are more comfortable with will probably produce better code, because you'll have experience.
I think the reason is that ANSI C is the standard programming language in Linux. It is important to follow this standard whenever people want to share their code with others etc. But it is not a requirement if you just want to write something for yourself.
You personally can use C or C++ and the result will be identical. I think you should choose C++ if you know it well and can exploit some special object oriented features of it in your code. Don't look too much to other people here, if you are good in C++ just go and write your daemon in C++. I would personally write it in C++ as well.
You're right. The reason for not using C++ is KISS, particularly if you ever intend for someone else to maintain your code down the road. Most folks that I know of learned to write daemons from existing source or reading books by Stevens. Pretty much that means your examples will be in C. C++ is just fine, I've written daemons in it myself, but I think if you expect it to be maintained and you don't know who the maintainer might be down the road it shows better foresight to write in C.
Boost makes it incredibly easy to write single threaded, or multi-threaded and highly scalable, networking daemons with the asio library.
I would recommend using C++, with a reservation on using exception handling and dynamic RTTI. These features may have run time performance cost implications and may not be supported well across platforms.
C++ is more modular and maintainable so if you can avoid these features go ahead and use it for your project.
Both C and C++ are perfectly suited for the task of writing daemons.
Besides that, nowadays, you should consider also scripting languages as Perl or Python. Performance is usually just good enough and you will be able to write applications more robust and in less time.
BTW, take a look at ACE, a framework for writting portable network applications in C++.

C or C++ to write a compiler?

I want to write a compiler for a custom markup language, I want to get optimum performance and I also want to have a good scalable design.
Multi-paradigm programming language (C++) is more suitable to implement modern design patterns, but I think that will degrade performance a little bit (think of RTTI for example) which more or less might make C a better choice.
I wonder what is the best language (C, C++ or even objective C) if someone wants to create a modern compiler (in the sense of complying to modern software engineering principles as a software) that is fast, efficient, and well designed.
The "expensive" features of C++ (e.g., exceptions, virtual functions, RTTI) simply don't exist in C. By the time you simulate them in C, you're likely to end up with something at least as expensive as it is in C++, but less known, less documented, etc. (let's face it: compiler writers aren't stupid -- while it's possible you can implement a feature "better" than them, it's not really particularly likely).
In the other direction, templates (for one example) often make it relatively easy to write code that is considerably faster than is practical in C. Just for one obvious example, C++ code using std::sort will often be two to three times as fast as equivalent C code using qsort.
Bottom line: the only reason for a C++ program to be slower than an equivalent written in C is if you've decided (for whatever reason) to write slower code. Common reasons are simplicity and readability -- and in most cases, those are more important than execution speed. Nonetheless, using C++ doesn't necessarily carry any speed penalty. It's completely up to you to decide whether to do something that might run more slowly.
C++ adheres to a "pay only for what you use" policy. You are not going to see performance hits due to the language choice; the performance of your application will be purely dependent upon your implementation.
Have you considered OCaml? Functional languages are well-suited for compiler writing. Pattern matching is an extremely useful construct, and the lack of side effects will make parallelization easy.
OCaml can be compiled to native code, and its performance is comparable to C and C++. Its standard library is somewhat lacking, but you don't really much else to write a compiler.
F# is a very similar language if you prefer a .NET environment.
People who write compilers in C as their basic language usually have the good sense to use tools for certain parts of it.
Specifically, go find out about lex and yacc (in their free implementations, flex and bison).
This advice almost certainly applies to any other language you choose, be it C++, Java or whatever.
I dont have any links but from what i hear and from experience C/C++ is a poor language to write a compiler with. First of all, do you really honestly need it to be scalable? Or scalable at this stage? Especially for a markup language? your not compiling 60+ mb of source so i dont think you actually need it to be scalable.
Anyways for my programming language i used bison for the parser (reading bison+flex is a must, try to avoid all conflicts my language has none). Then i use both C and C++ for the code. C because bison uses C and i just call a simple C function which creates and fill in a struct to create an abstract syntax tree. Then when its done it calls my C++ code that runs through the AST and generate the binary.
Standard ML is suppose to be really good with creating a language. If you dont use that a functional language is a good choice because it fits with the mindset (parsing may be left to right but your function calls wont be in that order). So i recommend that if you dont use bison (or know how to call it using C/C++ and bison).
Note: I tried writing a compiler twice. The first time in C without bison the 2nd time with bison. Theres no question that it would have taken me exponentially longer due to the fact that bison finds the conflicts for me and i am not doomed in debug land (i would probably in fact try to figure out a way to report conflicts before i write the code which is exactly what bison does)
Forget what programming language you use & also given that you have huge memory support in these modern computer era you could write good & fast programs using interpreted language and also very bad & slow running programs using C/C++ (compiled languages) & vice versa.
What is important is to use right data structures and algorithms & follow the style/patterns of the programming language you use to implement it. Remember that some one said "OO is not a panacea" & to the other extent some one else also said "show your data structures and I will code up the algorithm for the problem you are trying to solve".

Why are drivers and firmwares almost always written in C or ASM and not C++?

I am just curious why drivers and firmwares almost always are written in C or Assembly, and not C++?
I have heard that there is a technical reason for this.
Does anyone know this?
Lots of love,
Louise
Because, most of the time, the operating system (or a "run-time library") provides the stdlib functionality required by C++.
In C and ASM you can create bare executables, which contain no external dependencies.
However, since windows does support the C++ stdlib, most Windows drivers are written in (a limited subset of) C++.
Also when firmware is written ASM it is usually because either (A) the platform it is executing on does not have a C++ compiler or (B) there are extreme speed or size constraints.
Note that (B) hasn't generally been an issue since the early 2000's.
Code in the kernel runs in a very different environment than in user space. There is no process separation, so errors are a lot harder to recover from; exceptions are pretty much out of the question. There are different memory allocators, so it can be harder to get new and delete to work properly in a kernel context. There is less of the standard library available, making it a lot harder to use a language like C++ effectively.
Windows allows the use of a very limited subset of C++ in kernel drivers; essentially, those things which could be trivially translated to C, such as variable declarations in places besides the beginning of blocks. They recommend against use of new and delete, and do not have support for RTTI or most of the C++ standard library.
Mac OS X use I/O Kit, which is a framework based on a limited subset of C++, though as far as I can tell more complete than that allowed on Windows. It is essentially C++ without exceptions and RTTI.
Most Unix-like operating systems (Linux, the BSDs) are written in C, and I think that no one has ever really seen the benefit of adding C++ support to the kernel, given that C++ in the kernel is generally so limited.
1) "Because it's always been that way" - this actually explains more than you think - given that the APIs on pretty much all current systems were originally written to a C or ASM based model, and given that a lot of prior code exists in C and ASM, it's often easier to 'go with the flow' than to figure out how to take advantage of C++.
2) Environment - To use all of C++'s features, you need quite a runtime environment, some of which is just a pain to provide to a driver. It's easier to do if you limit your feature set, but among other things, memory management can get very interesting in C++, if you don't have much of a heap. Exceptions are also very interesting to consider in this environment, as is RTTI.
3) "I can't see what it does". It is possible for any reasonably skilled programmer to look at a line of C and have a good idea of what happens at a machine code level to implement that line. Obviously optimization changes that somewhat, but for the most part, you can tell what's going on. In C++, given operator overloading, constructors, destructors, exception, etc, it gets really hard to have any idea of what's going to happen on a given line of code. When writing device drivers, this can be deadly, because you often MUST know whether you are going to interact with the memory manager, or if the line of code affects (or depends on) interrupt levels or masking.
It is entirely possible to write device drivers under Windows using C++ - I've done it myself. The caveat is that you have to be careful about which C++ features you use, and where you use them from.
Except for wider tool support and hardware portability, I don't think there's a compelling reason to limit yourself to C anymore. I often see complicated hand-coded stuff done in C that can be more naturally done in C++:
The grouping into "modules" of functions (non-general purpose) that work only on the same data structure (often called "object") -> Use C++ classes.
Use of a "handle" pointer so that module functions can work with "instances" of data structures -> Use C++ classes.
File scope static functions that are not part of a module's API -> C++ private member functions, anonymous namespaces, or "detail" namespaces.
Use of function-like macros -> C++ templates and inline/constexpr functions
Different runtime behavior depending on a type ID with either hand-made vtable ("descriptor") or dispatched with a switch statement -> C++ polymorphism
Error-prone pointer arithmetic for marshalling/demarshalling data from/to a communications port, or use of non-portable structures -> C++ stream concept (not necessarily std::iostream)
Prefixing the hell out of everything to avoid name clashes: C++ namespaces
Macros as compile-time constants -> C++11 constexpr constants
Forgetting to close resources before handles go out of scope -> C++ RAII
None of the C++ features described above cost more than the hand-written C implementations. I'm probably missing some more. I think the inertia of C in this area has more to do with C being mostly used.
Of course, you may not be able to use STL liberally (or at all) in a constrained environment, but that doesn't mean you can't use C++ as a "better C".
The comments I run into as why a shop is using C for an embedded system versus C++ are:
C++ produces code bloat
C++ exceptions take up too much
room.
C++ polymorphism and virtual tables
use too much memory or execution
time.
The people in the shop don't know
the C++ language.
The only valid reason may be the last. I've seen C language programs that incorporate OOP, function objects and virtual functions. It gets very ugly very fast and bloats the code.
Exception handling in C, when implemented correctly, takes up a lot of room. I would say about the same as C++. The benefit to C++ exceptions: they are in the language and programmers don't have to redesign the wheel.
The reason I prefer C++ to C in embedded systems is that C++ is a stronger typed language. More issues can be found in compile time which reduces development time. Also, C++ is an easier language to implement Object Oriented concepts than C.
Most of the reasons against C++ are around design concepts rather than the actual language.
The biggest reason C is used instead of say extremely guarded Java is that it is very easy to keep sight of what memory is used for a given operation. C is very addressing oriented. Of key concern in writing kernel code is avoiding referencing memory that might cause a page fault at an inconvenient moment.
C++ can be used but only if the run-time is specially adapted to reference only internal tables in fixed memory (not pageable) when the run-time machinery is invoked implicitly eg using a vtable when calling virtual functions. This special adaptation does not come "out of the box" most of the time.
Integrating C with a platform is much easier to do as it is easy to strip C of its standard library and keep control of memory accesses utterly explicit. So what with it also being a well-known language it is often the choice of kernel tools designers.
Edit: Removed reference to new and delete calls (this was wrong/misleading); replaced with more general "run-time machinery" phrase.
The reason that C, not C++ is used is NOT:
Because C++ is slower
Or because the c-runtime is already present.
It IS because C++ uses exceptions.
Most implementations of C++ language exceptions are unusable in driver code because drivers are invoked when the OS is responding to hardware interrupts. During a hardware interrupt, driver code is NOT allowed to use exceptions as that would/could cause recursive interrupts. Also, the stack space available to code while in the context of an interrupt is typically very small (and non growable as a consequence of the no exceptions rule).
You can of course use new(std::nothrow), but because exceptions in c++ are now ubiqutious, that means you cannot rely on any library code to use std::nothrow semantics.
It IS also because C++ gave up a few features of C :-
In drivers, code placement is important. Device drivers need to be able to respond to interrupts. Interrupt code MUST be placed in code segments that are "non paged", or permanently mapped into memory, as, if the code was in paged memory, it might be paged out when called upon, which will cause an exception, which is banned.
In C compilers that are used for driver development, there are #pragma directives that can control which type of memory functions end up on.
As non paged pool is a very limited resource, you do NOT want to mark your entire driver as non paged: C++ however generates a lot of implicit code. Default constructors for example. There is no way to bracket C++ implicitly generated code to control its placement, and because conversion operators are automatically called there is no way for code audits to guarantee that there are no side effects calling out to paged code.
So, to summarise :- The reason C, not C++ is used for driver development, is because drivers written in C++ would either consume unreasonable amounts of non-paged memory, or crash the OS kernel.
C is very close to a machine independent assembly language. Most OS-type programming is down at the "bare metal" level. With C, the code you read is the actual code. C++ can hide things that C cannot.
This is just my opinion, but I've spent a lot of time in my life debugging device drivers and OS related things. Often by looking at assembly language. Keep it simple at the low level and let the application level get fancy.
Windows drivers are written in C++.
Linux drivers are written in c because the kernel is written in c.
Probably because c is still often faster, smaller when compiled, and more consistent in compilation between different OS versions, and with fewer dependencies. Also, as c++ is really built on c, the question is do you need what it provides?
There is probably something to the fact that people that write drivers and firmware are usually used to working at the OS level (or lower) which is in c, and therefore are used to using c for this type of problem.
The reason that drivers and firmwares are mostly written in C or ASM is, there is no dependency on the actual runtime libraries. If you were to imagine this imaginary driver written in C here
#include <stdio.h>
#define OS_VER 5.10
#define DRIVER_VER "1.2.3"
int drivermain(driverstructinfo **dsi){
if ((*dsi)->version > OS_VER){
(*dsi)->InitDriver();
printf("FooBar Driver Loaded\n");
printf("Version: %s", DRIVER_VER);
(*dsi)->Dispatch = fooDispatch;
}else{
(*dsi)->Exit(0);
}
}
void fooDispatch(driverstructinfo *dsi){
printf("Dispatched %d\n", dsi->GetDispatchId());
}
Notice that the runtime library support would have to be pulled in and linked in during compile/link, it would not work as the runtime environment (that is when the operating system is during a load/initialize phase) is not fully set up and hence there would be no clue on how to printf, and would probably sound the death knell of the operating system (a kernel panic for Linux, a Blue Screen for Windows) as there is no reference on how to execute the function.
Put it another way, with a driver, that driver code has privilege to execute code along with the kernel code which would be sharing the same space, ring0 is the ultimate code execution privilege (all instructions allowed), ring3 is where the front end of the operating system runs in (limited execution privilege), in other words, a ring3 code cannot have a instruction that is reserved for ring0, the kernel will kill the code by trapping it as if to say 'Hey, you have no privilege to tread up ring0's domain'.
The other reason why it is written in assembler, is mainly for code size and raw native speed, this could be the case of say, a serial port driver, where input/output is 'critical' to the function in relation to timing, latency, buffering.
Most device drivers (in the case of Windows), would have a special compiler toolchain (WinDDK) which can use C code but has no linkage to the normal standard C's runtime libraries.
There is one toolkit that can enable you to build a driver within Visual Studio, VisualDDK. By all means, building a driver is not for the faint of heart, you will get stress induced activity by staring at blue screens, kernel panics and wonder why, debugging drivers and so on.
The debugging side is harder, ring0 code are not easily accessible by ring3 code as the doors to it are shut, it is through the kernel trap door (for want of a better word) and if asked politely, the door still stays shut while the kernel delegates the task to a handler residing on ring0, execute it, whatever results are returned, are passed back out to ring3 code and the door still stays shut firmly. That is the analogy concept of how userland code can execute privileged code on ring0.
Furthermore, this privileged code, can easily trample over the kernel's memory space and corrupt something hence the kernel panic/bluescreens...
Hope this helps.
Perhaps because a driver doesn't require object oriented features, while the fact that C still has somewhat more mature compilers would make a difference.
There are many style of programming such as procedural, functional, object oriented etc. Object oriented programming is more suited for modeling real world.
I would use object-oriented for device drivers if it suites it. But, most of the time when you programming device drivers, you would not need the advantages provided by c++ such as, abstraction, polymorphism, code reuse etc.
Well, IOKit drivers for MacOSX are written in C++ subset (no exceptions, templates, multiple inheritance). And there is even a possibility to write linux kernel modules in haskell.)
Otherwise, C, being a portable assembly language, perfectly catches the von Neumann architecture and computation model, allowing for direct control over all it's peculiarities and drawbacks (such as the "von Neumann bottleneck"). C does exactly what it was designed for and catches it's target abstraction model completely and flawlessly (well except for implicit assumption in single control flow which could have been generalized to cover the reality of hardware threads) and this is why i think it is a beautiful language.) Restricting the expressive power of the language to such basics eliminates most of the unpredictable transformation details when different computational models are being applied to this de-facto standard. In other words, C makes you stick to basics and allows pretty much direct control over what you are doing, for example when modeling behavior commonality with virtual functions you control exactly how the function pointer tables get stored and used when comparing to C++'s implicit vtbl allocation and management. This is in fact helpful when considering caches.
Having said that, object-based paradigm is very useful for representing physical objects and their dependencies. Adding inheritance we get object-oriented paradigm which in turn is very useful to represent physical objects' structure and behavior hierarchy. Nothing stops anyone from using it and expressing it in C again allowing full control over exactly how your objects will be created, stored, destroyed and copied. In fact that is the approach taken in linux device model. They got "objects" to represent devices, object implementation hierarchy to model power management dependancies and hacked-up inheritance functionality to represent device families, all done in C.
because from system level, drivers need to control every bits of every bytes of the memory, other higher language cannot do that, or cannot do that natively, only C/Asm achieve~

How to design a C / C++ library to be usable in many client languages? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm planning to code a library that should be usable by a large number of people in on a wide spectrum of platforms. What do I have to consider to design it right? To make this questions more specific, there are four "subquestions" at the end.
Choice of language
Considering all the known requirements and details, I concluded that a library written in C or C++ was the way to go. I think the primary usage of my library will be in programs written in C, C++ and Java SE, but I can also think of reasons to use it from Java ME, PHP, .NET, Objective C, Python, Ruby, bash scrips, etc... Maybe I cannot target all of them, but if it's possible, I'll do it.
Requirements
It would be to much to describe the full purpose of my library here, but there are some aspects that might be important to this question:
The library itself will start out small, but definitely will grow to enormous complexity, so it is not an option to maintain several versions in parallel.
Most of the complexity will be hidden inside the library, though
The library will construct an object graph that is used heavily inside. Some clients of the library will only be interested in specific attributes of specific objects, while other clients must traverse the object graph in some way
Clients may change the objects, and the library must be notified thereof
The library may change the objects, and the client must be notified thereof, if it already has a handle to that object
The library must be multi-threaded, because it will maintain network connections to several other hosts
While some requests to the library may be handled synchronously, many of them will take too long and must be processed in the background, and notify the client on success (or failure)
Of course, answers are welcome no matter if they address my specific requirements, or if they answer the question in a general way that matters to a wider audience!
My assumptions, so far
So here are some of my assumptions and conclusions, which I gathered in the past months:
Internally I can use whatever I want, e.g. C++ with operator overloading, multiple inheritance, template meta programming... as long as there is a portable compiler which handles it (think of gcc / g++)
But my interface has to be a clean C interface that does not involve name mangling
Also, I think my interface should only consist of functions, with basic/primitive data types (and maybe pointers) passed as parameters and return values
If I use pointers, I think I should only use them to pass them back to the library, not to operate directly on the referenced memory
For usage in a C++ application, I might also offer an object oriented interface (Which is also prone to name mangling, so the App must either use the same compiler, or include the library in source form)
Is this also true for usage in C# ?
For usage in Java SE / Java EE, the Java native interface (JNI) applies. I have some basic knowledge about it, but I should definitely double check it.
Not all client languages handle multithreading well, so there should be a single thread talking to the client
For usage on Java ME, there is no such thing as JNI, but I might go with Nested VM
For usage in Bash scripts, there must be an executable with a command line interface
For the other client languages, I have no idea
For most client languages, it would be nice to have kind of an adapter interface written in that language. I think there are tools to automatically generate this for Java and some others
For object oriented languages, it might be possible to create an object oriented adapter which hides the fact that the interface to the library is function based - but I don't know if its worth the effort
Possible subquestions
is this possible with manageable effort, or is it just too much portability?
are there any good books / websites about this kind of design criteria?
are any of my assumptions wrong?
which open source libraries are worth studying to learn from their design / interface / souce?
meta: This question is rather long, do you see any way to split it into several smaller ones? (If you reply to this, do it as a comment, not as an answer)
Mostly correct. Straight procedural interface is the best. (which is not entirely the same as C btw(**), but close enough)
I interface DLLs a lot(*), both open source and commercial, so here are some points that I remember from daily practice, note that these are more recommended areas to research, and not cardinal truths:
Watch out for decoration and similar "minor" mangling schemes, specially if you use a MS compiler. Most notably the stdcall convention sometimes leads to decoration generation for VB's sake (decoration is stuff like #6 after the function symbol name)
Not all compilers can actually layout all kinds of structures:
so avoid overusing unions.
avoid bitpacking
and preferably pack the records for 32-bit x86. While theoretically slower, at least all compilers can access packed records afaik, and the official alignment requirements have changed over time as the architecture evolved
On Windows use stdcall. This is the default for Windows DLLs. Avoid fastcall, it is not entirely standarized (specially how small records are passed)
Some tips to make automated header translation easier:
macros are hard to autoconvert due to their untypeness. Avoid them, use functions
Define separate types for each pointer types, and don't use composite types (xtype **) in function declarations.
follow the "define before use" mantra as much as possible, this will avoid users that translate headers to rearrange them if their language in general requires defining before use, and makes it easier for one-pass parsers to translate them. Or if they need context info to auto translate.
Don't expose more than necessary. Leave handle types opague if possible. It will only cause versioning troubles later.
Do not return structured types like records/structs or arrays as returntype of functions.
always have a version check function (easier to make a distinction).
be careful with enums and boolean. Other languages might have slightly different assumptions. You can use them, but document well how they behave and how large they are. Also think ahead, and make sure that enums don't become larger if you add a few fields, break the interface. (e.g. on Delphi/pascal by default booleans are 0 or 1, and other values are undefined. There are special types for C-like booleans (byte,16-bit or 32-bit word size, though they were originally introduced for COM, not C interfacing))
I prefer stringtypes that are pointer to char + length as separate field (COM also does this). Preferably not having to rely on zero terminated. This is not just because of security (overflow) reasons, but also because it is easier/cheaper to interface them to Delphi native types that way.
Memory always create the API in a way that encourages a total separation of memory management. IOW don't assume anything about memory management. This means that all structures in your lib are allocated via your own memory manager, and if a function passes a struct to you, copy it instead of storing a pointer made with the "clients" memory management. Because you will sooner or later accidentally call free or realloc on it :-)
(implementation language, not interface), be reluctant to change the coprocessor exception mask. Some languages change this as part of conforming to their standards floating point error(exception-)handling.
Always pair a callbacks with an user configurable context. This can be used by the user to give the the callback state without defining global variables. (like e.g. an object instance)
be careful with the coprocessor status word. It might be changed by others and break your code, and if you change it, other code might stop working. The status word is generally not saved/restored as part of calling conventions. At least not in practice.
don't use C style varargs parameters. Not all languages allow variable number of parameters in an unsafe way
(*) Delphi programmer by day, a job that involves interfacing a lot of hardware and thus translating vendor SDK headers. By night Free Pascal developer, in charge of, among others, the Windows headers.
(**)
This is because what "C" means binary is still dependant on the used C compiler, specially if there is no real universal system ABI. Think of stuff like:
C adding an underscore prefix on some binary formats (a.out, Coff?)
sometimes different C compilers have different opinions on what to do with small structures passed by value. Officially they shouldn't support it at all afaik, but most do.
structure packing sometimes varies, as do details of calling conventions (like skipping
integer registers or not if a parameter is registerable in a FPU register)
===== automated header conversions ====
While I don't know SWIG that well, I know and use some delphi specific header tools( h2pas, Darth/headconv etc).
However I never use them in fully automatic mode, since more often then not the output sucks. Comments change line or are stripped, and formatting is not retained.
I usually make a small script (in Pascal, but you can use anything with decent string support) that splits a header up, and then try a tool on relatively homogeneous parts (e.g. only structures, or only defines etc).
Then I check if I like the automated conversion output, and either use it, or try to make a specific converter myself. Since it is for a subset (like only structures) it is often way easier than making a complete header converter. Of course it depends a bit what my target is. (nice, readable headers or quick and dirty). At each step I might do a few substitutions (with sed or an editor).
The most complicated scheme I did for Winapi commctrl and ActiveX/comctl headers. There I combined IDL and the C header (IDL for the interfaces, which are a bunch of unparsable macros in C, the C header for the rest), and managed to get the macros typed for about 80% (by propogating the typecasts in sendmessage macros back to the macro declaration, with reasonable (wparam,lparam,lresult) defaults)
The semi automated way has the disadvantage that the order of declarations is different (e.g. first constants, then structures then function declarations), which sometimes makes maintenance a pain. I therefore always keep the original headers/sdk to compare with.
The Jedi winapi conversion project might have more info, they translated about half of the windows headers to Delphi, and thus have enormous experience.
I don't know but if it's for Windows then you might try either a straight C-like API (similar to the WINAPI), or packaging your code as a COM component: because I'd guess that programming languages might want to be able to invoke the Windows API, and/or use COM objects.
Regarding automatic wrapper generation, consider using SWIG. For Java, it will do all the JNI work. Also, it is able to translate complex OO-C++-interfaces properly (provided you follow some basic guidelines, i.e. no nested classes, no over-use of templates, plus the ones mentioned by Marco van de Voort).
Think C, nothing else. C is one of the most popular programming languages. It is widely used on many different software platforms, and there are few computer architectures for which a C compiler does not exist. All popular high-level languages provide an interface to C. That makes your library accessible from almost all platforms in existence. Don't worry too much about providing an Object Oriented interface. Once you have the library done in C, OOP, functional or any other style interface can be created in appropriate client languages. No other systems programming language will give you C's flexibility and potability.
NestedVM I think is going to be slower than pure Java because of the array bounds checking on the int[][] that represents the MIPS virtual machine memory. It is such a good concept but might not perform well enough right now (until phone manufacturers add NestedVM support (if they do!), most stuff is going to be SLOW for now, n'est-ce pas)? Whilst it may be able to unpack JPEGs without error, speed is of no small concern! :)
Nothing else in what you've written sticks out, which isn't to say that it's right or wrong! The principles sound (mainly just listening to choice of words and language to be honest) like roughly standard best practice but I haven't thought through the details of everything you've said. As you said yourself, this really ought to be several questions. But of course doing this kind of thing is not automatically easy just because you're fixed on perhaps a slightly different architecture to the last code base you've worked on...! ;)
My thoughts:
All your comments on C interface compatibility sound sensible to me, pretty much best practice except you don't seem to properly address memory management policy - some sentences a bit ambiguous/vague/wrong-sounding. The design of the memory management will be to a large extent determined by the access patterns made in your application, rather than the functionality per se. I suiggest you study others' attempts at making portable interfaces like the standard ANSI C API, Unix API, Win32 API, Cocoa, J2SE, etc carefully.
If it was me, I'd write the library in a carefully chosen subset of the common elements of regular Java and Davlik virtual machine Java and also write my own custom parser that translates the code to C for platforms that support C, which would of course be most of them. I would suggest that if you restrict yourself to data types of various size ints, bools, Strings, Dictionaries and Arrays and make careful use of them that will help in cross-platform issues without affecting performance much most of the time.
your assumptions seem ok, but i see trouble ahead, much of which you have already spotted in your assumptions.
As you said, you can't really export c++ classes and methods, you will need to provide a function based c interface. What ever facade you build around that, it will remain a function based interface at heart.
The basic problem i see with that is that people choose a specific language and its runtime because their way of thinking (functional or object oriented) or the problem they address (web programming, database,...) corresponds to that language in some way or other.
A library implemented in c will probably never feel like the libraries they are used to, unless they program in c themselves.
Personally, I would always prefer a library that "feels like python" when I use python, and one that feels like java when I do Java EE, even though I know c and c++.
So your effort might be of little actual use (other than your gain in experience), because people will probably want to stick with their mindset, and rather re-implement the functionality than use a library that does the job, but does not fit.
I also fear the desired portability will seriously hamper development. Just think of the infinite build settings needed, and tests for that. I have worked on a project that tried to maintain compatibility for 5 operating systems (all posix-like, but still) and about 10 compilers, the builds were a nightmare to test and maintain.
Give it an XML interface, whether passed as a parameter and return value or as files through a command-line invocation. This may not seem as direct as a normal function interface, but is the most practical way to access an executable from, e.g., Java.

Why are many VMs written in C when they look like they have C++ features?

I noticed some not so old VM languages like Lua, NekoVM, and Potion written in C.
It looked like they were reimplementing many C++ features.
Is there a benefit to writing them in C rather than C++?
I know something about Lua.
Lua is written in pure ANSI Standard C and compiles on any ANSI platform with no errors and no warnings. Thus Lua runs on almost any platform in the world, including things like Canon PowerShot cameras. It's a lot harder to get C++ to run on weird little embedded platforms.
Lua is a high-performance VM, and because C cannot express method calls (which might be virtual or might not) and operator overloading, it is much easier to predict the performance of C code just by looking at the code. C++, especially with the template library, makes it a little too easy to burn resources without being aware of it. (A full implementation of Lua including not only VM but libraries fits in 145K of x86 object code. The whole language fits even in a tiny 256K cache, which you find at L2 on Intel i7 and L1 on older chips. Unless you really know what you're doing, it's much harder to write C++ that compiles to something this small.)
These are two good reasons to write a VM in C.
It looked like they were reimplementing many C++ features.
Are you suggesting it's easier to implement polymorphism in C++ rather than C? I think you are greatly mistaken.
If you write a VM in C++, you wouldn't implement polymorphism in terms of C++'s polymorphism. You'd roll your own virtual table which maps function names to pointers, or something like that.
People are used to C. I have to admit that I'm more likely to write C for my own projects, even though I've been writing C++ since cfront 1.0.
If you want complete control over things, C is a little easier.
One obvious answer is interoperability. Any time language X has to call functions defined in language Y, you usually make sure that either X or Y is C (the language C, that is)
C++ doesn't define an ABI, so calling C++ code from another language is a bit tricky to do portably. But calling C code is almost trivial. That means that at least part of your VM is probably going to have to be written in C, and then why not be consistent and write the entire thing in C?
Another advantage of C is that it's simple. Everyone can read it, and there are plenty of programmers to help you write it. C++ is, for good and bad, much more of an experts language. You can do a lot of impressive things in C++, and it can save you a lot of work, but there are also fewer programmers who are really good at it.
It's much harder to be "good" at C++, and until one is good at it they will have a lot of bugs and problems. Now, especially when working on large projects with many people, the chance that one of them won't be good enough is much bigger, so coding the project in C is often less risky. There are also portability issues - C code is much easier to port across compilers than C++.
Lua also has many features that are very easy to implement in Lisp, so why doesn't it take that as a basis? The point is that C is little more than glorified assembler code with only a thin layer of abstraction. It is like a somewhat polished blank slate, on which you can build your higher level abstractions. C++ is such a building. Lua is a different building, and if it had to use C++ abstractions, it would have to bend its intent around the existing C++ structure. Starting from the blank slate instead gives you the freedom to build it like you want.
In many cases, code in C could be much faster than C++. For instance most of the functions in the stdio.c library are faster than iostream. scanf is faster than cin, printf is faster than cout etc.
and VMs demand high performance, so C code makes perfect sense, although the programs would most probably take longer to develop.
C++ is implemented in C. I suspect everyone was following the C++ approach.
Even though modern C++ compilers skip (or conceal) the explicit C++ to C translation as a discrete step, the C++ language has peculiarities that stem from the underlying C implementation.
Two examples.
Pointers in addition to references is entirely because of C. References are sufficient, and that's the way Java, Python and Ruby all work.
The classes are not first-class objects that exist at run-time because the class is only a way to define the attributes and method functions in the underlying C code. Class objects exist at run-time in Java, Python and Ruby, and can be manipulated.
Just a side note, you should look into CLR (Rotor-incarnation) and Java sources and you will note it is far more C++-as-C rather than modern or good C++. So it has a parallel there and it is a side-effect of abstracting for toys and making it average-performance happy for the crowd in managed languages.
It also helps avoid pitfalls of naive C++ usage. Exceptions and all other sort of things (bits David at boost consulting kicked off and more while we build sequencers and audio sampling before he even had a job :) are an issue too..
Python integration is another matter, and has a messy history in boost for example.. But for primitive data types and interfaces/interop and machine abstraction, well it is quite clear nothing beats C. No compiler issues either, and it still bootstraps many things before you get to anything as influential as it is/was/will be.
Stepanov recognised this achievement when he nailed STL, and Bjarne nailed it with templates.. Those are the three things always worth thinking about, as you don't have a decent incarnation of them in popular managed languages, not to that expressivness and power. All of that more than 20 years later, which is remarkable and all still bootstrap via C/C++.. Legacy of goodness (but I'm not defending 'dark age' C code c1982-2000, just the idea, yuo can misuse anything ).