Static source code analysis with LLVM - c++

I recently discovered the LLVM (Low Level Virtual Machine) project, and from what I have heard it can be used to perform static analysis on source code. I would like to know whether it is possible to extract the different function calls made through function pointers (finding the caller function and the callee function) in a program.
I couldn't find this kind of information on the website, so it would be really helpful if you could tell me whether such a library already exists in LLVM, or point me in the right direction on how to build it myself (existing source code, references, tutorials, examples...).
EDIT:
With my analysis I actually want to extract caller/callee function calls. In the case of a function pointer, I would like to return the set of possible callees. Both caller and callee must be defined in the source code (this excludes third-party functions in a library).

I think that Clang (the analyzer that is part of LLVM) is geared towards the detection of bugs, which means that the analyzer tries to compute possible values of some expressions (to reduce false positives) but it sometimes gives up (in this case, emitting no alarm to avoid a deluge of false positives).
If your program is C only, I recommend you take a look at the Value Analysis in Frama-C. It computes supersets of possible values for any l-value at each point of the program, under some hypotheses that are explained at length here. Complexity in the analyzed program only means that the returned supersets are more approximated, but they still contain all the possible run-time values (as long as you remain within the aforementioned hypotheses).
EDIT: if you are interested in possible values of function pointers for the purpose of slicing the analyzed program, you should definitely take a look at the existing dependencies and slicing computations in Frama-C. The website doesn't have any nice example for slicing; here is one from a discussion on the mailing list.

You should take a look at Elsa. It is relatively easy to extend and lets you parse an AST fairly easily. It handles all of the parsing, lexing and AST generation and then lets you traverse the tree using the Visitor pattern.
class CallGraphGenerator : public ASTVisitor
{
    //...
    virtual bool visitFunction(Function *func);
    virtual bool visitExpression(Expression *expr);
};
You can then detect function declarations, and probably detect function pointer usage. Finally you could check the function pointers' declarations and generate a list of the declared functions that could have been called using that pointer.

In our project, we perform static source code analysis by converting LLVM bytecode into C code with the help of the llc program that ships with LLVM. Then we analyze the C code with CIL (C Intermediate Language); for the C language, a lot of tools are available. The pitfall is that the code generated by llc is awful and suffers from a great loss of precision. But still, it's one way to go.
Edit: in fact, I wouldn't recommend that anyone go this route. But still, just for the record...

I think your question is flawed. The title says "Static source code analysis". Yet your underlying reason appears to be the construction of (part of) a call graph, including calls through a function pointer. The essence of function pointers is that you cannot know their values at compile time, i.e. at the point where you do static source code analysis. Consider this bit of code:
void (*pFoo)() = GetFoo();
pFoo();
Static code analysis cannot tell you what GetFoo() returns at runtime, although it might tell you that the result is subsequently used for a function call.
Now, what values could GetFoo() possibly return? You simply can't say this in general (equivalent to solving the halting problem). You will be able to guess some trivial cases. The guessable percentage will of course go up depending on how much effort you are willing to invest.
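To make the "guessable cases" concrete: below is a minimal, untested sketch against the LLVM C++ API (assuming the program has already been compiled to bitcode, e.g. with clang -c -emit-llvm). For every indirect call it lists the defined, address-taken functions whose type matches, which is a crude but conservative candidate set, not a precise answer.
#include <memory>
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"

int main(int argc, char **argv) {
    if (argc < 2)
        return 1;
    llvm::LLVMContext Ctx;
    llvm::SMDiagnostic Err;
    std::unique_ptr<llvm::Module> M = llvm::parseIRFile(argv[1], Err, Ctx);
    if (!M) {
        Err.print(argv[0], llvm::errs());
        return 1;
    }
    for (llvm::Function &F : *M)
        for (llvm::BasicBlock &BB : F)
            for (llvm::Instruction &I : BB) {
                auto *Call = llvm::dyn_cast<llvm::CallBase>(&I);
                if (!Call || !Call->isIndirectCall())
                    continue;
                llvm::outs() << "indirect call in " << F.getName() << ", possible callees:\n";
                // Crude over-approximation: any defined, address-taken function
                // with a matching type could be the target.
                for (llvm::Function &Candidate : *M)
                    if (!Candidate.isDeclaration() && Candidate.hasAddressTaken() &&
                        Candidate.getFunctionType() == Call->getFunctionType())
                        llvm::outs() << "    " << Candidate.getName() << "\n";
            }
    return 0;
}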

The DMS Software Reengineering Toolkit provides various types of control, data flow, and global points-to analyzers for large systems of C code, and constructs call graphs using that global points-to analysis (with the appropriate conservative assumptions). More discussion and examples of the analyses can be found at the web site.
DMS has been tested on monolithic systems of C code with 25 million lines. (The call graph for this monster had 250,000 functions in it).
Engineering all this machinery from basic C ASTs and symbol tables is a huge amount of work; been there, done that. You don't want to do this yourself if you have something else to do with your life, like implement other applications.

Related

Are variable identifiers totally needless, at the end of the day?

I've spent a good amount of time studying the theory of computation and compiler design; I'm not done yet, but I feel comfortable with the concepts. On the other hand, I have a very shallow knowledge of assembly and machine code, and I always have the desire/need to connect the two sides (the HLL and LLL representations of the code), as I'm learning C++ while paying great attention to performance and optimization discussions.
C++ is a statically typed language:
My question is: when our variables are written as expressions in the statements of the code, do all these variables (and other entities with identifiers) become, at runtime, mere addressing instructions: addresses of positions in virtual memory for statics and globals, and stack-relative addresses for local variables?
I mean, after a successful compilation, including semantic and syntactic verification, isn't it sensible to treat data at runtime as guaranteed entities of target memory bytes, without any reference to identifiers or any checking, with the symbol table no longer needed?
If my question appears to be the kind of question that stems from a lack of learning effort (which I hope it doesn't), please just tell me so, and tell me where to read. If that is the case, it's honestly because I'm concentrating on C++ these days and haven't yet had the chance to gain a sound knowledge of low-level languages; I apologize for that in advance.
You're spot on. Once compiled to machine code, there is no longer any notion of a variable identifier (or variable type, for that matter). It's just bytes at a certain location. Which location was determined by the compiler (when compiling) based on the variable name, or by the linker (when linking) in the case of global variables.
Of course, it can be useful to retain information such as identifiers, for debugging purposes. This is precisely what "compilation with debug information" means: when you do that, the compiler will somehow embed the (redundant) identifiers into the generated code such that a debugger can access them. Or put them in a separate file alongside; the details of that depend on the format of the debugging information.
Yes, mostly. There are a few details that will make identifiers remain more than just addresses or stack offsets.
First, we have RTTI in C++, which means that at runtime the names of at least the types may still be available. For example:
const std::type_info &info = typeid(*ptr_interface);
std::cout << info.name() << std::endl;
would print the name of whatever type *ptr_interface is of.
Second, due to the way a program is linked, the symbols from the object files may still be present in the executing image. The Linux kernel, for example, makes use of this: it can produce a backtrace of the stack including the function names. It also uses knowledge of function names in order to be able to load and link modules. Similar functionality exists in the GNU C library, which, when linked appropriately, can retrieve function names in stack traces.
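A minimal sketch of that glibc facility (Linux-specific; build with -rdynamic so the symbol names remain visible to the resolver):
#include <execinfo.h>
#include <cstdio>
#include <cstdlib>

void report_backtrace() {
    void *frames[16];
    int count = backtrace(frames, 16);               // capture raw return addresses
    char **names = backtrace_symbols(frames, count); // resolve them against the symbol table
    for (int i = 0; i < count; ++i)
        std::printf("%s\n", names[i]);               // e.g. "./a.out(_Z16report_backtracev+0x1f) [0x...]"
    std::free(names);                                // backtrace_symbols allocates the array with malloc
}

int main() {
    report_backtrace();
}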
In normal cases, though, the code will not be affected by the original names of the variables (but the compiler will of course emit code suitable for the type each variable has).

Computing stack demand for C++; How to get readable symbol table?

Assume that you have not only the executable file, but also the source code files.
My problem is to calculate the correct stack size of a running process, counting only local variables, return addresses, and passed arguments. I was trying to use VMMap, developed by Microsoft, because it can report allocated memory in the system by category, such as the stack. However, the stack category also contains guard pages, paging file(s), and so on. Thus, the stack size from VMMap was overestimated.
I would like to change my approach to the problem. I will walk the stack to build the actual call tree using StackWalk64 from the Windows API and get the symbol table from either the executable or the source code. But there is a problem: the symbol table from an executable format such as ELF is not human-readable.
Now I am planning to use Doxygen, which is an open-source project, together with a compiler's lexer. Because Doxygen only provides a function list with return types and function arguments, I don't know about local variables. So I also need a lexer to build a complete symbol table as a pre-processing step. But that is kind of complicated, and I am not sure it is the best solution.
Is there a better way to solve this?
What OP wants to do is statically compute worst-case stack depth.
This is very hard to do. To accomplish it, he needs:
The source code for every function (as raw material used to derive following facts)
The "symbol table" (all declarations) for every function
A call graph of the application code across compilation units
A deep knowledge of what his particular compiler does to generate and optimize code
The "symbol table" needs to be compiler-accurate so that the estimation process knows which declared data (conceptually) goes into the stack. OP will need what amounts to a full compiler front-end for his particular dialect of C++. [OP mistakenly thinks a "lexer" will give him a symbol table. It will not].
Now consider constructing the global call graph. The basics seem simple: if function "bar" contains "foo(x)", then "bar" calls "foo". But there are many complications: overloading, virtual functions, indirect calls, and casts. Overloading is presumably resolved by name and type resolution; this gets messy in the face of templates (consider SFINAE). Virtual functions and indirect calls force one to build a conservative points-to analyzer; an ugly enough cast may force the assumption of a call to any argument-compatible function. Points-to analyzers come in varying degrees of precision; low precision may produce a call graph with many bogus (conservative) edges which will throw off the size estimation.
The compiler won't provide this global call graph since it only operates on single compilation units.
Finally, we can consider constructing a stack size estimate. Here one needs to understand the sizes and alignments used to represent each declared data type, and how the particular compiler of interest allocates local variables to the stack. Generally, sequential coding blocks { .... } { ... } overlap stack locations. One also needs to understand how expressions are evaluated and how arguments are passed, as these impact stack usage. Finally, one needs to understand how the compiler allocates registers and what optimizations the compiler can/did(?) apply, as such optimizations will affect expression stack usage as well as how many local variables are actually allocated to the stack. This is an awful lot to know, and the only trustworthy source of knowledge is the compiler itself. Rather than trying to replicate all this machinery, it is likely better to either get the compiler to provide its actual stack size allocation per function (I believe GCC will do this), or give up on getting a precise result and conservatively estimate the stack demand by assuming every declared local consumes stack space (in this case it is unclear what one should do to estimate expression stack usage; one might assume that every variable and intermediate expression result takes stack space according to its type).
With stack space estimates per function, and a call graph, a simple analysis of the call graph can produce stack demands per call chain from the root. The max of these is the stack estimate needed. (Note: this assumes that each function uses its full stack demand for every call; that's clearly a conservative estimate). This part is pretty easy.
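By way of illustration, a small sketch of that last step, assuming per-function frame sizes are already known (for instance from GCC's -fstack-usage output) and the call graph is acyclic; recursion would need a cycle check and an explicit bound:
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-function facts gathered beforehand.
struct FunctionInfo {
    std::size_t frameBytes;            // stack frame size of this function
    std::vector<std::string> callees;  // functions it may call
};

// Worst-case stack demand of any call chain rooted at `name`, conservatively
// assuming every function uses its full frame on every call.
std::size_t worstCaseStack(const std::unordered_map<std::string, FunctionInfo> &graph,
                           const std::string &name) {
    auto it = graph.find(name);
    if (it == graph.end())
        return 0;  // unknown/external function: no estimate available
    std::size_t deepestCallee = 0;
    for (const std::string &callee : it->second.callees)
        deepestCallee = std::max(deepestCallee, worstCaseStack(graph, callee));
    return it->second.frameBytes + deepestCallee;
}

int main() {
    std::unordered_map<std::string, FunctionInfo> graph;
    graph["main"] = FunctionInfo{48, {"f", "g"}};
    graph["f"]    = FunctionInfo{96, {}};
    graph["g"]    = FunctionInfo{32, {}};
    // Worst chain is main -> f: 48 + 96 = 144 bytes.
    return worstCaseStack(graph, "main") == 144 ? 0 : 1;
}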
Overall this is a complex analysis. Ideally you can get the compiler to provide stack size estimates and basic call-graph facts. Building the points-to analysis is difficult and gets very hard as the size of the application gets big.
Arguably you can bend GCC to help provide the compilation-unit-level data. Clang is presumably designed to be bent to provide that same data. Neither offers specific support for global points-to analysis, to my knowledge. It is unclear whether GCC and Clang handle the Windows dialects of C++; they may.
Our DMS Software Reengineering Toolkit and its C++ front end is designed to provide symbol tables, the result of name resolution (e.g., resolving overloads) and can easily extract local call facts. It handles both GCC and MS dialects of C++. DMS also provides support for building global points-to analyses and a global call graph; while we have not used this specifically for C++, we have used it to process C applications of some 16 million lines.
All this difficulty explains why people often punt and try to see how big the stack is using dynamic analysis. If OP wants a static analyzer, he needs to be prepared to invest significant effort to get it.

Recover C++ class declaration from ELF binary generated by GCC 2.95

Is it possible (that is, could it easily be accomplished with commodity tools) to reconstruct a C++ class declaration from a .so file (non-debug, non-x86) — to the point that member functions could be called successfully as if the original .h file were available for that class?
For example, by trial and error I found that this code works when 64K are allocated for instance-local storage. Allocating 1K or 8K leads to SEGFAULT, although I never saw offsets higher than 0x0650 in disassembly and it is highly unlikely that this class actually stores much data by itself.
class TS57 {
private:
    char abLocalStorage[65536]; // Large enough to fit all possible instance data.
public:
    TS57(int iSomething);
    ~TS57(void);
    int SaveAsMap(long (*)(long, long, long, long, long), char const*, char const*, char const*);
};
And what if I needed more complex classes and usage scenarios? What if allocating 64K per instance were too wasteful? Are there simple tools (like nm or objdump) that may give insight into a .so library's type information? I found some papers on SecondWrite — an “executable analysis and rewriting framework” which does “recovery of object oriented features from C++ binaries” — but, despite several references in newsgroups, could not even conclude whether it is production-ready software, just a proof of concept, or a private tool not for general use.
FYI: I am asking this purely out of curiosity. I already found a C-style wrapper for that function that not only instantiates the object and calls its method, but also conveniently does additional housekeeping. Moreover, I am not enthusiastic about writing for G++ 2.95 — as this was the version the library was compiled with, and I could not find a switch to get the same name-mangling scheme in my usual G++ 3.3. Therefore, I am not interested in workarounds: I wonder whether a direct solution exists — in case it is needed in the future.
The short answer is "no, a machine can't do that".
The longer answer is still roughly the same, but I'll try to explain why:
Some information about functions would be available from nm libmystuff.so|c++filt, which will demangle the names. But that will only show functions that have a public name, and it will most likely still be a bit ambiguous as to what the data-types actually mean.
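If you want to do that demangling programmatically rather than piping through c++filt, a small sketch using the GCC/Itanium ABI helper (the mangled name here is just an illustrative example):
#include <cxxabi.h>
#include <cstdlib>
#include <iostream>

int main() {
    const char *mangled = "_Z3foov";  // as nm might print it; corresponds to "foo()"
    int status = 0;
    char *demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status == 0 && demangled)
        std::cout << demangled << '\n';  // prints "foo()"
    std::free(demangled);                // the buffer is malloc'd by __cxa_demangle
}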
During compilation, a lot of "semantic information"[1] is lost. Functions are inlined, loops are transformed (loops made with for, do-while, and while, and even goto in some cases, are made to look almost identical), conditions are compiled out or rearranged, variable names are lost, much of the type information and the enum names are completely lost, and so on. The private and public fields of classes are lost as well.
The compiler will also do "clever" transformations on the code to replace complex instructions with less complex ones (int x; ... x = x * 5 may become lea eax, [eax*4 + eax] or similar). [This one is pretty simple - try figuring out "backwards" how the compiler computed a population count (the number of bits set in a binary number) or a cosine once it has been inlined...]
A human, that knows what the code is MEANT to do, and good knowledge of the machine code of the target processor, MAY be able to reverse engineer the code and break out functions that have been inlined. But it's still hard to tell the difference between:
void Foo::func()
{
    this->x++;
}
and
void func(Foo* p)
{
    p->x++;
}
These two functions should become exactly identical machine-code, and if the function does not have a name in the symbol table, there is no way to tell which it is.
[1] Information about the "meaning" of the code.

Is there a reason why not to use link-time optimization (LTO)?

GCC, MSVC, LLVM, and probably other toolchains have support for link-time (whole program) optimization to allow optimization of calls among compilation units.
Is there a reason not to enable this option when compiling production software?
I assume that by "production software" you mean software that you ship to the customers / goes into production. The answers at Why not always use compiler optimization? (kindly pointed out by Mankarse) mostly apply to situations in which you want to debug your code (so the software is still in the development phase -- not in production).
6 years have passed since I wrote this answer, and an update is necessary. Back in 2014, the issues were:
Link time optimization occasionally introduced subtle bugs, see for example Link-time optimization for the kernel. I assume this is less of an issue as of 2020. Safeguard against these kinds of compiler and linker bugs: Have appropriate tests to check the correctness of your software that you are about to ship.
Increased compile time. There are claims that the situation has significantly improved since 2014, for example thanks to slim objects.
Large memory usage. This post claims that the situation has drastically improved in recent years, thanks to partitioning.
As of 2020, I would try to use LTO by default on any of my projects.
This recent question raises another possible (but rather specific) case in which LTO may have undesirable effects: if the code in question is instrumented for timing, and separate compilation units have been used to try to preserve the relative ordering of the instrumented and instrumenting statements, then LTO has a good chance of destroying the necessary ordering.
I did say it was specific.
If you have well-written code, it should only be advantageous. You may hit a compiler/linker bug, but this goes for all types of optimisation; it is rare.
Biggest downside is it drastically increases link time.
Apart from this, consider a typical example from an embedded system:
void function1(void) { /*Do something*/} //located at address 0x1000
void function2(void) { /*Do something*/} //located at address 0x1100
void function3(void) { /*Do something*/} //located at address 0x1200
With predefined addresses, functions can be called through pointers to those fixed addresses, like below:
((void (*)(void))0x1000)(); // expected to call function1
((void (*)(void))0x1100)(); // expected to call function2
((void (*)(void))0x1200)(); // expected to call function3
LTO can lead to unexpected behavior.
Updated:
In automotive embedded SW development, multiple parts of the SW are compiled and flashed onto separate sections.
Boot-loader, application(s), and application-configuration are independently flashable units. The boot-loader has special capabilities to update the application and the application-configuration. At every power-on cycle, the boot-loader checks the SW application's and application-configuration's compatibility and consistency via hard-coded locations for SW versions, CRCs, and many more parameters. Linker definition files are used to hard-code the locations of those variables and of some functions.
Given that the code is implemented correctly, then link time optimization should not have any impact on the functionality. However, there are scenarios where not 100% correct code will typically just work without link time optimization, but with link time optimization the incorrect code will stop working. There are similar situations when switching to higher optimization levels, like, from -O2 to -O3 with gcc.
That is, depending on your specific context (like, age of the code base, size of the code base, depth of tests, are you starting your project or are you close to final release, ...) you would have to judge the risk of such a change.
One scenario where link-time-optimization can lead to unexpected behavior for wrong code is the following:
Imagine you have two source files, read.c and client.c, which you compile into separate object files. In the file read.c there is a function read that does nothing other than reading from a specific memory address. The content at this address, however, should be marked volatile, but unfortunately that was forgotten. From client.c the function read is called several times from the same function. Since read only performs a single read from the address and there is no optimization beyond the boundaries of the read function, read will always, when called, access the respective memory location. Consequently, every time read is called from client.c, the code in client.c gets a freshly read value from the address, just as if volatile had been used.
Now, with link-time optimization, the tiny function read from read.c is likely to be inlined wherever it is called from client.c. Due to the missing volatile, the compiler will now realize that the code reads several times from the same address, and may therefore optimize away the memory accesses. Consequently, the code starts to behave differently.
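A condensed sketch of that scenario (the register address and names are made up for illustration):
/* read.c: the forgotten volatile is the bug. */
unsigned read(void)
{
    unsigned *reg = (unsigned *)0x40000000; /* should be: volatile unsigned * */
    return *reg;
}

/* client.c */
unsigned read(void);

void wait_until_ready(void)
{
    /* Without LTO, each call re-reads the hardware register. With LTO, read()
       may be inlined here, the non-volatile loads merged, and the loop can
       spin forever on a stale value. */
    while (read() == 0)
        ;
}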
Rather than mandating that all implementations support the semantics necessary to accomplish all tasks, the Standard allows implementations intended to be suitable for various tasks to extend the language by defining semantics in corner cases beyond those mandated by the C Standard, in ways that would be useful for those tasks.
An extremely popular extension of this form is to specify that cross-module function calls will be processed in a fashion consistent with the platform's Application Binary Interface without regard for whether the C Standard would require such treatment.
Thus, if one makes a cross-module call to a function like:
uint32_t read_uint32_bits(void *p)
{
return *(uint32_t*)p;
}
the generated code would read the bit pattern in a 32-bit chunk of storage at address p, and interpret it as a uint32_t value using the platform's native 32-bit integer format, without regard for how that chunk of storage came to hold that bit pattern. Likewise, if a compiler were given something like:
uint32_t read_uint32_bits(void *p);
uint32_t f1bits, f2bits;
void test(void)
{
    float f;
    f = 1.0f;
    f1bits = read_uint32_bits(&f);
    f = 2.0f;
    f2bits = read_uint32_bits(&f);
}
the compiler would reserve storage for f on the stack, store the bit pattern for 1.0f to that storage, call read_uint32_bits and store the returned value, store the bit pattern for 2.0f to that storage, call read_uint32_bits and store that returned value.
The Standard provides no syntax to indicate that the called function might read the storage whose address it receives using type uint32_t, nor to indicate that the pointer the function was given might have been written using type float, because implementations intended for low-level programming had already extended the language to support such semantics without using special syntax.
Unfortunately, adding in Link Time Optimization will break any code that relies upon that popular extension. Some people may view such code as broken, but if one recognizes the Spirit of C principle "Don't prevent programmers from doing what needs to be done", the Standard's failure to mandate support for a popular extension cannot be viewed as intending to deprecate its usage if the Standard fails to provide any reasonable alternative.
LTO could also reveal edge-case bugs in code-signing algorithms. Consider a code-signing algorithm based on certain expectations about the TEXT portion of some object or module. Now LTO optimizes the TEXT portion away, or inlines stuff into it in a way the code-signing algorithm was not designed to handle. Worst case scenario, it only affects one particular distribution pipeline but not another, due to a subtle difference in which encryption algorithm was used on each pipeline. Good luck figuring out why the app won't launch when distributed from pipeline A but not B.
LTO support is buggy, and LTO-related issues have the lowest priority for compiler developers. For example: mingw-w64-x86_64-gcc-10.2.0-5 works fine with LTO, while mingw-w64-x86_64-gcc-10.2.0-6 segfaults with a bogus address. We only just noticed that our Windows CI stopped working.
Please refer to the following issue as an example.

Moving from C++ to C

After a few years coding in C++, I was recently offered a job coding in C, in the embedded field.
Putting aside the question of whether it's right or wrong to dismiss C++ in the embedded field, there are some features/idioms in C++ I would miss a lot. Just to name a few:
Generic, type-safe data structures (using templates).
RAII. Especially in functions with multiple return points, e.g. not having to remember to release the mutex on each return point.
Destructors in general. I.e. you write a d'tor once for MyClass, then if a MyClass instance is a member of MyOtherClass, MyOtherClass doesn't have to explicitly deinitialize the MyClass instance - its d'tor is called automatically.
Namespaces.
What are your experiences moving from C++ to C?
What C substitutes did you find for your favorite C++ features/idioms? Did you discover any C features you wish C++ had?
Working on an embedded project, I tried working in all C once, and just couldn't stand it. It was just so verbose that it made it hard to read anything. Also, I liked the optimized-for-embedded containers I had written, which had to turn into much less safe and harder to fix #define blocks.
Code that in C++ looked like:
if(uart[0]->Send(pktQueue.Top(), sizeof(Packet)))
    pktQueue.Dequeue(1);
turns into:
if(UART_uchar_SendBlock(uart[0], Queue_Packet_Top(pktQueue), sizeof(Packet)))
    Queue_Packet_Dequeue(pktQueue, 1);
which many people will probably say is fine but gets ridiculous if you have to do more than a couple "method" calls in a line. Two lines of C++ would turn into five of C (due to 80-char line length limits). Both would generate the same code, so it's not like the target processor cared!
One time (back in 1995), I tried writing a lot of C for a multiprocessor data-processing program. The kind where each processor has its own memory and program. The vendor-supplied compiler was a C compiler (some kind of HighC derivative), their libraries were closed source so I couldn't use GCC to build, and their APIs were designed with the mindset that your programs would primarily be the initialize/process/terminate variety, so inter-processor communication was rudimentary at best.
I got about a month in before I gave up, found a copy of cfront, and hacked it into the makefiles so I could use C++. Cfront didn't even support templates, but the C++ code was much, much clearer.
Generic, type-safe data structures (using templates).
The closest thing C has to templates is to declare a header file with a lot of code that looks like:
TYPE * Queue_##TYPE##_Top(Queue_##TYPE##* const this)
{ /* ... */ }
then pull it in with something like:
#define TYPE Packet
#include "Queue.h"
#undef TYPE
Note that this won't work for compound types (e.g. no queues of unsigned char) unless you make a typedef first.
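A sketch of that typedef workaround:
typedef unsigned char uchar;  /* give the compound type a single-token name first */

#define TYPE uchar
#include "Queue.h"
#undef TYPE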
Oh, and remember, if this code isn't actually used anywhere, then you don't even know if it's syntactically correct.
EDIT: One more thing: you'll need to manually manage instantiation of code. If your "template" code isn't all inline functions, then you'll have to put in some control to make sure that things get instantiated only once so your linker doesn't spit out a pile of "multiple instances of Foo" errors.
To do this, you'll have to put the non-inlined stuff in an "implementation" section in your header file:
#ifdef implementation_##TYPE
/* Non-inlines, "static members", global definitions, etc. go here. */
#endif
And then, in one place in all your code per template variant, you have to:
#define TYPE Packet
#define implementation_Packet
#include "Queue.h"
#undef TYPE
Also, this implementation section needs to be outside the standard #ifndef/#define/#endif litany, because you may include the template header file in another header file, but need to instantiate afterward in a .c file.
Yep, it gets ugly fast. Which is why most C programmers don't even try.
RAII.
Especially in functions with multiple return points, e.g. not having to remember to release the mutex on each return point.
Well, forget your pretty code and get used to all your return points (except the end of the function) being gotos:
TYPE * Queue_##TYPE##_Top(Queue_##TYPE##* const this)
{
    TYPE * result;
    Mutex_Lock(this->lock);
    if(this->head == this->tail)
    {
        result = 0;
        goto Queue_##TYPE##_Top_exit;
    }
    /* Figure out `result` for real, then fall through to... */
Queue_##TYPE##_Top_exit:
    Mutex_Unlock(this->lock);
    return result;
}
Destructors in general.
I.e. you write a d'tor once for MyClass, then if a MyClass instance is a member of MyOtherClass, MyOtherClass doesn't have to explicitly deinitialize the MyClass instance - its d'tor is called automatically.
Object construction has to be explicitly handled the same way.
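A small sketch of what that explicit handling looks like (reusing the MyClass/MyOtherClass names from the question; the members are made up):
#include <stdlib.h>

struct MyClass      { int *buffer; };
struct MyOtherClass { struct MyClass member; };

void MyClass_Init(struct MyClass *self)   { self->buffer = (int *)malloc(16 * sizeof(int)); }
void MyClass_Deinit(struct MyClass *self) { free(self->buffer); }

/* The containing type must remember to initialize and deinitialize its member
   explicitly; nothing runs automatically the way a C++ constructor/destructor would. */
void MyOtherClass_Init(struct MyOtherClass *self)   { MyClass_Init(&self->member); }
void MyOtherClass_Deinit(struct MyOtherClass *self) { MyClass_Deinit(&self->member); }

int main(void)
{
    struct MyOtherClass obj;
    MyOtherClass_Init(&obj);
    /* ... use obj ... */
    MyOtherClass_Deinit(&obj);
    return 0;
}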
Namespaces.
That's actually a simple one to fix: just tack a prefix onto every symbol. This is the primary cause of the source bloat that I talked about earlier (since classes are implicit namespaces). The C folks have been living with this, well, forever, and probably won't see what the big deal is.
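For illustration, the prefixing idiom next to its C++ equivalent (the names are invented):
/* C style: the "namespace" lives in every identifier. */
struct mylib_queue;
void mylib_queue_push(struct mylib_queue *q, int value);

// C++ style: the prefix is written once.
namespace mylib {
    struct Queue;
    void push(Queue &q, int value);
}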
YMMV
I moved from C++ to C for a different reason (some sort of allergic reaction ;) and there are only a few things that I miss and some things that I gained. If you can stick to C99, there are constructs that let you program quite nicely and safely, in particular:
designated initializers (possibly combined with macros) make initialization of simple classes as painless as constructors
compound literals for temporary variables
for-scope variables may help you do scope-bound resource management, in particular to ensure the unlocking of mutexes or the freeing of arrays, even with early function returns
__VA_ARGS__ macros can be used to have default arguments to functions and to do code unrolling
inline functions and macros that combine well to replace (sort of) overloaded functions
The difference between C and C++ is the predictability of the code's behavior.
It is easier to predict with great accuracy what your code will do in C; in C++ it might become a bit more difficult to come up with an exact prediction.
The predictability in C gives you better control of what your code is doing, but that also means you have to do more stuff.
In C++ you can write less code to get the same thing done, but (at least for me) I occasionally have trouble knowing how the object code is laid out in memory and what its expected behavior is.
Nothing like the STL exists for C.
There are libraries available which provide similar functionality, but it isn't built in any more.
I think that would be one of my biggest problems... knowing which tool I could solve the problem with, but not having that tool available in the language I have to use.
In my line of work - which is embedded, by the way - I am constantly switching back & forth between C and C++.
When I'm in C, I miss from C++:
templates (including but not limited to STL containers). I use them for things like special counters, buffer pools, etc. (built up my own library of class templates & function templates that I use in different embedded projects)
very powerful standard library
destructors, which of course make RAII possible (mutexes, interrupt disable, tracing, etc.)
access specifiers, to better enforce who can use (not see) what
I use inheritance on larger projects, and C++'s built-in support for it is much cleaner & nicer than the C "hack" of embedding the base class as the first member (not to mention automatic invocation of constructors, init. lists, etc.) but the items listed above are the ones I miss the most.
Also, probably only about a third of the embedded C++ projects I work on use exceptions, so I've become accustomed to living without them, so I don't miss them too much when I move back to C.
On the flip side, when I move back to a C project with a significant number of developers, there are whole classes of C++ problems that I'm used to explaining to people which go away. Mostly problems due to the complexity of C++, and people who think they know what's going on, but they're really at the "C with classes" part of the C++ confidence curve.
Given the choice, I'd prefer using C++ on a project, but only if the team is pretty solid on the language. Also of course assuming it's not an 8K μC project where I'm effectively writing "C" anyway.
Couple of observations
Unless you plan to use your C++ compiler to build your C (which is possible if you stick to a well-defined subset of C++), you will soon discover things that your compiler allows in C that would be a compile error in C++.
No more cryptic template errors (yay!)
No (language supported) object oriented programming
Pretty much the same reasons I have for using C++ or a mix of C/C++ rather than pure C. I can live without namespaces, but I use them all the time if the coding standard allows it. The reason is that you can write much more compact code in C++. This is very useful for me; I write servers in C++ which tend to crash now and then. At that point it helps a lot if the code you are looking at is short and concise. For example, consider the following code:
uint32_t
ScoreList::FindHighScore(
    uint32_t p_PlayerId)
{
    MutexLock lock(m_Lock);
    uint32_t highScore = 0;
    for(int i = 0; i < m_Players.Size(); i++)
    {
        Player& player = m_Players[i];
        if(player.m_Score > highScore)
            highScore = player.m_Score;
    }
    return highScore;
}
In C that looks like:
uint32_t
ScoreList_getHighScore(
    ScoreList* p_ScoreList)
{
    uint32_t highScore = 0;
    Mutex_Lock(p_ScoreList->m_Lock);
    for(int i = 0; i < Array_GetSize(p_ScoreList->m_Players); i++)
    {
        Player* player = p_ScoreList->m_Players[i];
        if(player->m_Score > highScore)
            highScore = player->m_Score;
    }
    Mutex_UnLock(p_ScoreList->m_Lock);
    return highScore;
}
Not a world of difference. One more line of code, but that tends to add up. Normally you try your best to keep it clean and lean, but sometimes you have to do something more complex. And in those situations you value your line count. One more line is one more thing to look at when you try to figure out why your broadcast network suddenly stops delivering messages.
Anyway I find that C++ allows me to do more complex things in a safe fashion.
Yes! I have experienced both of these languages, and what I found is that C++ is the friendlier language. It offers more features. It is fair to say that C++ is a superset of the C language, as it provides additional features like polymorphism, inheritance, operator and function overloading, and user-defined data types, which are not really supported in C. Thousands of lines of code can be reduced to a few with the help of object-oriented programming; that's the main reason for moving from C to C++.
I think the main reason why C++ has a harder time being accepted in the embedded environment is the lack of engineers who understand how to use C++ properly.
Yes, the same reasoning can be applied to C as well, but luckily there aren't that many pitfalls in C with which you can shoot yourself in the foot. With C++, on the other hand, you need to know when not to use certain features.
All in all, I like C++. I use it in the O/S services layer, drivers, management code, etc.
But if your team doesn't have enough experience with it, it's gonna be a tough challenge.
I had experience with both. When the rest of the team wasn't ready for it, it was a total disaster. On the other hand, it was good experience.
Certainly, the desire to escape complex/messy syntax is understandable. Sometimes C can appear to be the solution. However, C++ is where the industry support is, including tooling and libraries, so that is hard to work around.
C++ has so many features today including lambdas.
A good approach is to leverage C++ itself to make your code simpler. Objects are good for isolating things under the hood so that at a higher level, the code is simpler. The core guidelines recommend concrete (simple) objects, so that approach can help.
The level of complexity is under the engineer's control. If multiple inheritance (MI) is useful in a scenario and one prefers that option, then one may use MI.
Alternatively, one can define interfaces, inherit from the interface(s), and contain implementing objects (composition/aggregation) and expose the objects through the interface using inline wrappers. The inline wrappers compile down to nothing, i.e., compile down to simple use of the internal (contained) object, yet the container object appears to have that functionality as if multiple inheritance was used.
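A short sketch of that wrapper pattern (the interface and class names are invented for illustration):
#include <iostream>
#include <string>

struct IPrinter {
    virtual void print(const std::string &text) = 0;
    virtual ~IPrinter() = default;
};

class ConsolePrinter {
public:
    void print(const std::string &text) { std::cout << text << '\n'; }
};

class Report : public IPrinter {
public:
    // Inline wrapper: exposes the contained object's functionality through the
    // interface and compiles down to a direct call into m_printer.
    void print(const std::string &text) override { m_printer.print(text); }
private:
    ConsolePrinter m_printer;  // composition instead of multiple inheritance
};

int main() {
    Report r;
    r.print("hello");  // forwarded through the inline wrapper
}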
C++ also has namespaces, so one should leverage namespaces even if coding in a C-like style.
One can use the language itself to create simpler patterns and the STL is full of examples: array, vector, map, queue, string, unique_ptr,... And one can control (to a reasonable extent) how complex their code is.
So, going back to C is not the way, nor is it necessary. One may use C++ in a C-like way, or use C++ multiple inheritance, or use any option in-between.