new-Operator increases Arduino sketch size drastically - why? - c++

While restructuring a part of my code into a class I chose to change a statically sized array into a dynamic array and was shocked by the size of my sketch, which increased by ~579%!
There is some discussion going on about whether to use new or malloc(), but I did not find any hint about this massive increase in sketch size.
So, if anybody would like to explain where this huge increase is coming from, that would be great!
Further, if anybody knows similar pitfalls, it would be very nice of you to share ;)
Here is a demo to check for yourselves:
void setup() {
  // put your setup code here, to run once:
  #define BUFLEN (8 * sizeof(char))
  #define BUFVAL '5'
  #define BUFARR {BUFVAL,BUFVAL,BUFVAL,BUFVAL,BUFVAL,BUFVAL,BUFVAL,BUFVAL,0}
  #define MODE 2
  int i = 0;
  Serial.begin(115200);
#if (MODE == 0)
  // 10,772 bytes for an Arduino Due on IDE 1.5.7 -> 2% of total
  char empty_page[BUFLEN+1] = BUFARR;
#elif (MODE == 1)
  // 12,772 bytes for an Arduino Due on IDE 1.5.7 -> 2% of total, ~18.5% increase
  char *empty_page = (char *)malloc(BUFLEN+1);
  memset(empty_page, BUFVAL, BUFLEN);
  empty_page[BUFLEN] = '\0'; // NUL-terminate (BUFLEN is the last valid index)
#elif (MODE == 2)
  // 73,152 bytes for an Arduino Due on IDE 1.5.7 -> 13% of total, ~579% increase
  char *empty_page = new char[BUFLEN+1]BUFARR;
#endif
  Serial.println("Result: ");
  for (i = 0; i < BUFLEN; i++) {
    Serial.print(empty_page[i]);
  }
  Serial.println("");
#if (MODE == 1)
  free(empty_page);
#elif (MODE == 2)
  delete[] empty_page;
#endif
}

void loop() {
  // put your main code here, to run repeatedly:
}
To check this code without Arduino: http://ideone.com/e.js/bMVi0d
EDIT:
My understanding is that new makes the toolchain link in some large C++ support code in order to handle it. On the other hand, the verbose compiler output of the IDE is identical in all three modes.
I am trying to minimize my sketches; anybody else with this goal would surely be interested in constructs like `new` that you need to avoid in order to get a smaller sketch. This seems to be a general Arduino IDE thing, so there should be a more general explanation for it.

The new operator is essentially a type-safe version of malloc, meant to reduce errors in C++. You can see from the code here that new actually just calls malloc with a few bells and whistles added on. As for when to use new vs malloc, one great discussion can be found here, where the main verdict is that almost all C++ programs should use new. That said, you do not need the extra bells and whistles for a char array, since there is no constructor to call for primitive types (calling constructors is one of the main jobs of the new operator; primitive types have no constructors at all, so a compiler may even lose time searching for one). If memory is of dire concern, malloc is a perfectly acceptable solution. Your code with malloc looks perfectly fine and should work better for your purposes.
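If the bloat really does come from the library's operator new dragging in exception-handling support, one commonly cited embedded workaround is to supply your own minimal allocation functions that forward straight to malloc/free. This is a hedged sketch of that technique, untested on the Due toolchain, not an official fix:
#include <stdlib.h>  // malloc/free, size_t

// Minimal global allocation functions that forward to malloc/free.
// Defining these in the sketch keeps the linker from pulling in the
// library versions, which on some toolchains reference exception and
// terminate-handler machinery. Note: unlike the standard operator new,
// these return NULL on failure instead of throwing std::bad_alloc.
void* operator new(size_t size)    { return malloc(size); }
void* operator new[](size_t size)  { return malloc(size); }
void  operator delete(void* ptr)   { free(ptr); }
void  operator delete[](void* ptr) { free(ptr); }
After adding these, comparing the MODE 2 sketch size before and after should show whether operator new was responsible for the increase.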

You're confusing a few parts. The "IDE" is the Integrated Development Environment: a GUI which acts as an integrated front end for a few tools, including the compiler and linker. This looks like a linker problem.
In particular, it looks like very poor toolchain behaviour: about 60 kB is dragged in where nothing extra should be. Type safety is handled by the compiler at compile time, and since all the type checks pass here, the compiler could simply have emitted a malloc-style allocation instead of pulling in the full new[] machinery.
You should understand that the Arduino is a cheap product and the tooling isn't exactly state of the art.

Related

VOLUME_BITMAP_BUFFER - Accessing the BYTE Buffer?

As the title suggests, I'm doing some low-level volume access (as close to C code as possible) to make a disk-clone program. I want to set all the free clusters to zero, to allow simple compression and keep the image size small.
I've been beating my head off a wall forever trying to figure out why I can't get the FSCTL_GET_VOLUME_BITMAP function working properly, so if possible please don't link me to any external reading; I've probably already been there, and it's either been C#, invalid links, or missing the explanation I am looking for.
I want to understand the buffer itself more than I need the actual code.
The simplest way I can ask: what is the proper way to read from an array with a length of [1] in C/C++, like the one used by VOLUME_BITMAP_BUFFER?
The only way I can even assign anything to it is by recreating it with my own huge buffer, and I still end up with errors, even after locking the volume in Recovery mode. On a side note, I do get all the permissions needed to access the raw disk.
I know I'm likely missing some fundamental basic in C++ that would allow me to read from the memory it's stored in, but I just can't figure it out without getting errors.
In case I happen to be looping through the bytes wrong, which could be causing my error, I added how I was doing it, although that still leaves me with the buffer question.
I know you can call multiple times, but I have to assume it's not 8 bytes at a time.
Something like this (pardon my code, I typed this on my phone so it likely has errors); I tried adding any relevant cause of failure just in case, but the buffer is the real question.
#define BYTE_MASK 0x80
#define BITS_PER_BYTE 8

void foo() {
    const int BUFFER_SIZE = 268435456;
    struct {
        LARGE_INTEGER StartingLcn;
        LARGE_INTEGER BitmapSize;
        BYTE Buffer[BUFFER_SIZE];
    } volBuff;
    // I want to use VOLUME_BITMAP_BUFFER
    /* Part of a larger loop checking for errors and more data
    BYTE Mask = 1;
    BOOL b = DeviceIoControl(vol, FSCTL_GET_VOLUME_BITMAP, &lcnStart,
                             sizeof(STARTING_LCN_INPUT_BUFFER),
                             &volBuff, sizeof(volBuff), &dwRet);
    */
    for (x = 0; x < (bmpLen / BITS_PER_BYTE); ) {
        if ((volBuff.Buffer[x] & Mask) != 0) {
            NotFree++;
        } else {
            FreeSpc++;
        }
        // I did try not dividing the size
        if (Mask == BYTE_MASK) {
            Mask = 1;
            x++;
        } else {
            Mask = (Mask << 1);
        }
    }
    return;
}
I've literally put an entire project on hold for I don't even know how long just being stubborn at this point...and I can't find any answer that actually explains the buffer, forget the rest of the process.
If someone wants to be more thorough I won't complain after all my attempts, but the buffer is driving me crazy at this point.
Any help would be greatly appreciated.
I can safely assume the one answer I was given:
""...array with a length of [1]..." there is no way in Standard C++ of accessing the additional bytes. You can either: (a) pray that your compiler can do this (via an extension) or (b) write a C module (where this is well defined) that you can call from C++." - Richard Critton
was as correct an answer as I can expect after my extensive attempts to make this work any other way, especially since I was only able to make my own array work using standard C and not C++ directly.
I wanted to put a close on this since my computer is out of use for a bit.
If the problem continues after I dig through some examples for defragmenting in C that I FINALLY came across I'll ask a more direct question with full code to support it.
That answer was enough to remove the wall I had hit and get me thinking again. I thank you for that.
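For anyone landing here later, the usual pattern for these [1]-sized Windows structures is to allocate one raw block big enough for the header plus the payload and view it through the struct. A rough sketch (not the OP's exact code; CountClusters and the 4 MB payload size are made up for illustration, while the structures and the FSCTL code come from winioctl.h):
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>
#include <stdlib.h>

/* 'vol' is assumed to be a \\.\X: volume handle opened for DeviceIoControl. */
void CountClusters(HANDLE vol)
{
    DWORD bufSize = sizeof(VOLUME_BITMAP_BUFFER) + 4 * 1024 * 1024; /* header + 4 MB of bitmap */
    VOLUME_BITMAP_BUFFER *pBitmap = (VOLUME_BITMAP_BUFFER *)malloc(bufSize);

    STARTING_LCN_INPUT_BUFFER startLcn;
    startLcn.StartingLcn.QuadPart = 0;        /* begin at the first cluster */

    DWORD dwRet = 0;
    BOOL ok = DeviceIoControl(vol, FSCTL_GET_VOLUME_BITMAP,
                              &startLcn, sizeof(startLcn),
                              pBitmap, bufSize, &dwRet, NULL);
    if (ok || GetLastError() == ERROR_MORE_DATA) {
        /* One bit per cluster, LSB first: bit 0 of Buffer[0] is StartingLcn. */
        LONGLONG bits = pBitmap->BitmapSize.QuadPart;
        LONGLONG returned = ((LONGLONG)dwRet - FIELD_OFFSET(VOLUME_BITMAP_BUFFER, Buffer)) * 8;
        if (bits > returned) bits = returned; /* only part arrived on ERROR_MORE_DATA */

        LONGLONG inUse = 0, freeC = 0;
        for (LONGLONG i = 0; i < bits; ++i) {
            if ((pBitmap->Buffer[i / 8] >> (i % 8)) & 1) ++inUse; else ++freeC;
        }
        printf("in use: %lld, free: %lld\n", inUse, freeC);
    }
    free(pBitmap);
}
On ERROR_MORE_DATA you would loop, calling again with StartingLcn advanced past the bits already processed.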

Different Unallocated Memory Behaviour Between Visual Studio Versions

I'm having a weird situation: I'm trying to integrate a 10+ year old PCI camera device SDK into my camera management software. The manufacturer is no longer in business and I have no chance of getting official help. So here I am, looking for some help with my ugly problem.
SDK comes with Visual Studio 6.0 samples. One of the include files has a structure ending with a one byte array like below;
typedef struct AVData {
...
BYTE audioVideoData[1];
}AVDATA, *PAVDATA;
But this one-byte array receives whole video frames, and weirdly enough, it works fine when built with Visual Studio 6.0. If I try it with Visual Studio 2005/2008/2010, I start getting memory access violation errors, which actually makes sense since it shouldn't be possible to store data past a fixed-size array, no? But the same code runs fine under VS 6.0?! It's probably caused by compiler or C++ runtime differences, but I'm not very experienced on this subject, so it's hard for me to tell the exact reason.
I tried changing the size to an expected maximum number of bytes like below;
typedef struct AVData {
...
BYTE audioVideoData[20000];
}AVDATA, *PAVDATA;
This helped it get working, but from time to time I get memory access violation problems when trying to destroy the decoder object of the library.
There is definitely something wrong with this. I don't have the source code of the SDK, only the DLL, Lib and header files. My questions are:
1) Is it really legal to write past the end of a fixed-size array in Visual Studio 6.0?
2) Is there any possible way (a compiler option, etc.) to make the same code work with newer VS versions / C++ runtimes?
3) Since my workaround of editing the header file works up to a point but still has problems, do you know a better way to get around this problem?
IIRC it's an old trick to create a struct that is variable in size.
consider
struct s {
    int len;
    char name[1];
};
the 'name' member can now be of variable length if the appropriate allocation is done, and it will be laid out sequentially in memory:
char* foo = "abc";
int len = strlen(foo);
struct s* p = malloc( sizeof(struct s) + len );  /* name[1] already holds one byte for the terminator */
p->len = len;
strcpy(p->name, foo);
I think the above should work fine in newer versions of Visual Studio as well; maybe it is a matter of packing. Have you done #pragma pack(1) to get structs on byte boundaries? I know that VS6 had that as the default.
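For what it's worth, C99 later standardized this trick as a "flexible array member" (no [1] needed), which makes the access well-defined in C; C++ has no standard equivalent, though compilers commonly accept the [1] form as an extension. A sketch, with make_s as an illustrative helper:
#include <stdlib.h>
#include <string.h>

struct s {
    int  len;
    char name[];              /* C99 flexible array member: no [1] hack needed */
};

struct s *make_s(const char *src)
{
    size_t len = strlen(src);
    struct s *p = malloc(sizeof(struct s) + len + 1);  /* header + string + NUL */
    if (p) {
        p->len = (int)len;
        memcpy(p->name, src, len + 1);                 /* copies the terminator too */
    }
    return p;
}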
A one-element array in a C structure like this often means that the size is unknown until runtime. (For a Windows example, see BITMAPINFO.)
Usually, there will be some other information (possibly in the struct) that tells you how large the buffer needs to be. You would never allocate one of these directly, but instead allocate the right size block of memory, then cast it:
int size = /* calculate frame size somehow */;
AVDATA *data = (AVDATA *)malloc(sizeof(AVDATA) + size);
// use data->audioVideoData
The code almost certainly exhibits undefined behaviour in some way, and there is no way to fix this except to fix the interface or source code of the SDK. As the manufacturer is no longer in business, this is impossible.

Getting The Size of a C++ Function

I was reading this question because I'm trying to find the size of a function in a C++ program. It is hinted that there may be a way that is platform-specific. My target platform is Windows.
The method I currently have in my head is the following:
1. Obtain a pointer to the function
2. Increment the pointer (and a counter) until I reach the machine code value for ret
3. The counter will be the size of the function?
Edit1: To clarify what I mean by 'size' I mean the number of bytes (machine code) that make up the function.
Edit2: There have been a few comments asking why, or what I plan to do with this. The honest answer is that I have no intention, and I can't really see the benefits of knowing a function's length pre-compile-time (although I'm sure there are some).
This seems like a valid method to me, will this work?
Wow, I use function size counting all the time and it has lots and lots of uses. Is it reliable? No way. Is it standard C++? No way. But that's why you need to check it in the disassembler to make sure it worked, every time that you release a new version. Compiler flags can mess up the ordering.
static void funcIwantToCount()
{
    // do stuff
}

static void funcToDelimitMyOtherFunc()
{
    __asm _emit 0xCC
    __asm _emit 0xCC
    __asm _emit 0xCC
    __asm _emit 0xCC
}

int getlength( void *funcaddress )
{
    // Scan forward until we hit the four 0xCC bytes emitted by the delimiter.
    int length = 0;
    while ( *(UINT32 *)((unsigned char *)funcaddress + length) != 0xCCCCCCCC )
        ++length;
    return length;
}
It seems to work better with static functions. Global optimizations can kill it.
P.S. I hate people asking why you want to do this and saying it's impossible, etc. Stop asking these questions, please. It makes you sound stupid. Programmers are often asked to do non-standard things, because new products almost always push the limits of what's available. If they don't, your product is probably a rehash of what's already been done. Boring!!!
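Usage is then simply (assuming, as above, that the delimiter really is linked straight after the measured function, which is worth verifying in the disassembly):
// Hypothetical usage of the helpers above.
int size = getlength((void *)funcIwantToCount);
printf("funcIwantToCount is %d bytes\n", size);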
No, this will not work:
There is no guarantee that your function only contains a single ret instruction.
Even if it only does contain a single ret, you can't just look at the individual bytes - because the corresponding value could appear as simply a value, rather than an instruction.
The first problem can possibly be worked around if you restrict your coding style to, say, only have a single point of return in your function, but the other basically requires a disassembler so you can tell the individual instructions apart.
It is possible to obtain all blocks of a function, but it is an unnatural question to ask what the 'size' of a function is. Optimized code will rearrange code blocks in the order of execution and will move seldom-used blocks (exception paths) into outer parts of the module. For more details, see Profile-Guided Optimizations, for example, for how Visual C++ achieves this in link-time code generation. So a function can start at address 0x00001000, branch at 0x00001100 into a jump to 0x20001000 followed by a ret, and have some exception handling code at 0x20001000. At 0x00001110 another function starts. What is the 'size' of your function? It does span from 0x00001000 to past 0x20001000, but it 'owns' only a few blocks in that span. So your question should be unasked.
There are other valid questions in this context, like the total number of instructions a function has (which can be determined from the program symbol database and from the image), and more importantly, the number of instructions in the frequently executed code path inside the function. All of these are questions normally asked in the context of performance measurement, and there are tools that instrument code and can give very detailed answers.
Chasing pointers in memory and searching for ret will get you nowhere, I'm afraid. Modern code is way, way, way more complex than that.
This won't work... what if there's a jump, a dummy ret, and then the target of the jump? Your code will be fooled.
In general, it's impossible to do this with 100% accuracy because you have to predict all code paths, which is like solving the halting problem. You can get "pretty good" accuracy if you implement your own disassembler, but no solution will be nearly as easy as you imagine.
A "trick" would be to find out which function's code is after the function that you're looking for, which would give pretty good results assuming certain (dangerous) assumptions. But then you'd have to know what function comes after your function, which, after optimizations, is pretty hard to figure out.
Edit 1:
What if the function doesn't even end with a ret instruction at all? It could very well just jmp back to its caller (though it's unlikely).
Edit 2:
Don't forget that x86, at least, has variable-length instructions...
Update:
For those saying that flow analysis isn't the same as solving the halting problem:
Consider what happens when you have code like:
foo:
....
jmp foo
You will have to follow the jump each time to figure out the end of the function, and you cannot ignore it past the first time because you don't know whether or not you're dealing with self-modifying code. (You could have inline assembly in your C++ code that modifies itself, for instance.) It could very well extend to some other place of memory, so your analyzer will (or should) end in an infinite loop, unless you tolerate false negatives.
Isn't that like the halting problem?
I'm posting this to say two things:
1) Most of the answers given here are really bad and will break easily. If you use the C function pointer (using the function name), in a debug build of your executable, and possibly in other circumstances, it may point to a JMP shim that does not contain the function body itself. Here's an example. If I do the following for the function I define below:
FARPROC pfn = (FARPROC)some_function_with_possibility_to_get_its_size_at_runtime;
the pfn I get (for example: 0x7FF724241893) will point not to the function body but to a single JMP instruction (an incremental-linking thunk).
Additionally, a compiler can nest several of those shims, or branch your function code so that it will have multiple epilogs, or ret instructions. Heck, it may not even use a ret instruction. Then, there's no guarantee that functions themselves will be compiled and linked in the order you define them in the source code.
You can do all that stuff in assembly language, but not in C or C++.
2) So, the above was the bad news. The good news is that the answer to the original question is yes, there is a way (or a hack) to get the exact function size, but it comes with the following limitations:
It works in 64-bit executables on Windows only.
It is obviously Microsoft specific and is not portable.
You have to do this at run-time.
The concept is simple: utilize the way SEH is implemented in x64 Windows binaries. The compiler adds details of each function into the PE32+ header (into the IMAGE_DIRECTORY_ENTRY_EXCEPTION directory of the optional header) that you can use to obtain the exact function size. (In case you're wondering, this information is used for catching, handling and unwinding of exceptions in __try/__except/__finally blocks.)
Here's a quick example:
//You will have to call this when your app initializes and then
//cache the size somewhere in the global variable because it will not
//change after the executable image is built.
size_t fn_size; //Will receive function size in bytes, or 0 if error
some_function_with_possibility_to_get_its_size_at_runtime(&fn_size);
and then:
#include <Windows.h>
//The function itself has to be defined for two types of a call:
// 1) when you call it just to get its size, and
// 2) for its normal operation
bool some_function_with_possibility_to_get_its_size_at_runtime(size_t* p_getSizeOnly = NULL)
{
    //This input parameter will define what we want to do:
    if(!p_getSizeOnly)
    {
        //Do this function's normal work
        //...
        return true;
    }
    else
    {
        //Get this function size
        //INFO: Works only in 64-bit builds on Windows!
        size_t nFnSz = 0;

        //One of the reasons why we have to do this at run-time is
        //so that we can get the address of a byte inside
        //the function body... we'll get it as this thread context:
        CONTEXT context = {0};
        RtlCaptureContext(&context);

        DWORD64 ImgBase = 0;
        RUNTIME_FUNCTION* pRTFn = RtlLookupFunctionEntry(context.Rip, &ImgBase, NULL);
        if(pRTFn)
        {
            nFnSz = pRTFn->EndAddress - pRTFn->BeginAddress;
        }

        *p_getSizeOnly = nFnSz;
        return false;
    }
}
This can work in very limited scenarios. I use it in part of a code injection utility I wrote. I don't remember where I found the information, but I have the following (C++ in VS2005):
#pragma runtime_checks("", off)
static DWORD WINAPI InjectionProc(LPVOID lpvParameter)
{
    // do something
    return 0;
}

static DWORD WINAPI InjectionProcEnd()
{
    return 0;
}
#pragma runtime_checks("", on)
And then in some other function I have:
size_t cbInjectionProc = (size_t)InjectionProcEnd - (size_t)InjectionProc;
You have to turn off some optimizations and declare the functions as static to get this to work; I don't recall the specifics. I don't know if this is an exact byte count, but it is close enough. The size is only that of the immediate function; it doesn't include any other functions that may be called by that function. Aside from extreme edge cases like this, "the size of a function" is meaningless and useless.
The real solution to this is to dig into your compiler's documentation. The ARM compiler we use can be made to produce an assembly dump (code.dis), from which it's fairly trivial to subtract the offsets between a given mangled function label and the next mangled function label.
I'm not certain which tools you will need for this with a Windows target, however. It looks like the tools listed in the answer to this question might be what you're looking for.
Also note that I (working in the embedded space) assumed you were talking about post-compile-analysis. It still might be possible to examine these intermediate files programmatically as part of a build provided that:
The target function is in a different object
The build system has been taught the dependencies
You know for sure that the compiler will build these object files
Note that I'm not sure entirely WHY you want to know this information. I've needed it in the past to be sure that I can fit a particular chunk of code in a very particular place in memory. I have to admit I'm curious what purpose this would have on a more general desktop-OS target.
In C++, there is no notion of function size. In addition to everything else mentioned, preprocessor macros also make for an indeterminate size. If you want to count instruction words, you can't do that in C++, because the function doesn't exist until it has been compiled.
What do you mean "size of a function"?
If you mean a function pointer, then it is always just 4 bytes on 32-bit systems.
If you mean the size of the code, then you should disassemble the generated code and find the entry point and the nearest ret instruction. One way to do it is to read the instruction pointer register at the beginning and at the end of your function.
If you want to figure out the number of instructions executed in the average case for your function, you can use profilers and divide the number of retired instructions by the number of calls.
I think it will work for Windows programs created with MSVC; as for branches, the ret seems to always come at the end (even if there are branches that return early, they do a jne to go to the end).
However, you will need some kind of disassembler library to figure out the current instruction length, as x86 instructions are variable-length. If you don't do this, you'll run into false positives.
I would not be surprised if there are cases this doesn't catch.
There are no facilities in Standard C++ to obtain the size or length of a function.
See my answer here: Is it possible to load a function into some allocated memory and run it from there?
In general, knowing the size of a function is used in embedded systems when copying executable code from a read-only source (or a slow memory device, such as a serial Flash) into RAM. Desktop and other operating systems load functions into memory using other techniques, such as dynamic or shared libraries.
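As a very rough sketch of that embedded scenario (all symbol names here are hypothetical: a real port needs linker-script support for the bracketing symbols, cache/pipeline flushing, and position-independent or fixed-address code):
#include <string.h>

/* Assumed to be provided by the linker script, bracketing the function's
   code image in flash/ROM. */
extern unsigned char __fast_func_start[];
extern unsigned char __fast_func_end[];

static unsigned char ram_buf[1024];   /* must live in executable RAM */

void (*run_from_ram)(void);

void load_fast_func(void)
{
    size_t len = (size_t)(__fast_func_end - __fast_func_start);
    memcpy(ram_buf, __fast_func_start, len);   /* copy the code into RAM */
    /* Converting a data pointer to a function pointer is
       implementation-defined, which is exactly why this only works on
       platforms that bless it. */
    run_from_ram = (void (*)(void))(void *)ram_buf;
}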
Just set PAGE_EXECUTE_READWRITE on the address where your function is. Then read every byte. When you hit the byte 0xCC, it means that the end of the function is at actual_reading_address - 1.
Using GCC, not so hard at all, though note that &&label (the address-of-label extension) is only valid inside the function that defines the label, so the function has to hand its end address out itself:
void *do_something(void) {
    printf("%s!", "Hello your name is Cemetech");
    return &&do_something_END;   /* GCC extension: address of the label below */
do_something_END: ;              /* empty statement so the label is legal */
}
...
printf("size of function do_something: %i",
       (int)((char *)do_something() - (char *)do_something));
The code below gets the exact function block size; it works fine in my test.
#pragma runtime_checks is used to disable _RTC_CheckEsp in debug mode:
#pragma runtime_checks("", off)
DWORD __stdcall loadDll(char* pDllFullPath)
{
    OutputDebugStringA(pDllFullPath);
    //OutputDebugStringA("loadDll...................\r\n");
    return 0;
    //return test(pDllFullPath);
}
#pragma runtime_checks("", restore)

DWORD __stdcall getFuncSize_loadDll()
{
    DWORD maxSize = (PBYTE)getFuncSize_loadDll - (PBYTE)loadDll; // upper bound
    PBYTE pTail = (PBYTE)getFuncSize_loadDll - 1;
    // Scan backwards for the final return:
    //   0xC3       : ret
    //   0xC2 xx xx : ret n (two immediate bytes)
    while (*pTail != 0xC2 && *pTail != 0xC3) --pTail;
    if (*pTail == 0xC2)
        pTail += 3;   // include the two immediate bytes of 'ret n'
    else
        pTail += 1;   // include the 'ret' byte itself
    return pTail - (PBYTE)loadDll;
};
The non-portable, but API-based and correctly working approach is to use program database readers - like dbghelp.dll on Windows or readelf on Linux. The usage of those is only possible if debug info is enabled/present along with the program. Here's an example on how it works on Windows:
SYMBOL_INFO symbol = {};
symbol.SizeOfStruct = sizeof(SYMBOL_INFO);
// Implies that the module is loaded into _dbg_session_handle, see ::SymInitialize & ::SymLoadModule64
::SymFromAddr(_dbg_session_handle, address, 0, &symbol);
You will get the size of the function in symbol.Size, but you may also need additional logic to identify whether the given address is actually a function, a shim placed there by the incremental linker, or a DLL call thunk (the same thing).
I guess something similar can be done via readelf on Linux, but maybe you'll have to come up with a library on top of its source code...
You must bear in mind that although a disassembly-based approach is possible, you'll basically have to analyze a directed graph whose endpoints are ret, halt and jmp instructions (provided you have incremental linking enabled and can read the jmp table, so you can tell whether a jmp you're facing in the function is internal to it, i.e. missing from the image's jmp table, or external, i.e. present in that table; such jmps frequently occur as part of tail-call optimization on x64, as far as I know), plus any calls that are known not to return (like an exception-generating helper), etc.
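Fleshed out slightly, the dbghelp route might look like the sketch below (assuming a PDB is present and dbghelp.lib is linked; GetFunctionSize is an illustrative wrapper, not a dbghelp API):
#include <windows.h>
#include <dbghelp.h>   // link with dbghelp.lib

size_t GetFunctionSize(void *fn)
{
    HANDLE proc = GetCurrentProcess();
    if (!SymInitialize(proc, NULL, TRUE))   // TRUE: load symbols for all modules
        return 0;

    // SYMBOL_INFO carries a variable-length name; reserve room for it.
    char buf[sizeof(SYMBOL_INFO) + MAX_SYM_NAME] = {};
    SYMBOL_INFO *sym = (SYMBOL_INFO *)buf;
    sym->SizeOfStruct = sizeof(SYMBOL_INFO);
    sym->MaxNameLen   = MAX_SYM_NAME;

    size_t size = 0;
    if (SymFromAddr(proc, (DWORD64)fn, NULL, sym))
        size = (size_t)sym->Size;           // 0 if the PDB lacks size info

    SymCleanup(proc);
    return size;
}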
It's an old question but still...
For Windows x64, functions all have a function table, which contains the offset and the size of the function. https://learn.microsoft.com/en-us/windows/win32/debug/pe-format . This function table is used for unwinding when an exception is thrown.
That said, this doesn't contain information like inlining, and all the other issues that people already noted...
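Sketching that out, using the same RtlLookupFunctionEntry machinery as the SEH answer above but looked up directly from a function's address (the usual thunk/shim and split-function caveats still apply):
#include <windows.h>

// x64-only: look the function up in the PE exception directory.
size_t FunctionSizeFromUnwindInfo(void *fn)
{
    DWORD64 imageBase = 0;
    RUNTIME_FUNCTION *rf = RtlLookupFunctionEntry((DWORD64)fn, &imageBase, NULL);
    if (!rf)
        return 0;   // e.g. a leaf function with no unwind info
    return rf->EndAddress - rf->BeginAddress;   // both are RVAs
}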
int GetFuncSizeX86(unsigned char* Func)
{
    if (!Func)
    {
        printf("x86Helper : Function Ptr NULL\n");
        return 0;
    }
    for (int count = 0; ; count++)
    {
        if (Func[count] == 0xC3) // ret
        {
            unsigned char prevInstruc = Func[count - 1];
            if (Func[count + 1] == 0xCC   // int3 padding after the ret
                || prevInstruc == 0x5D    // pop ebp
                || prevInstruc == 0x5B    // pop ebx
                || prevInstruc == 0x5E    // pop esi
                || prevInstruc == 0x5F    // pop edi
                || prevInstruc == 0xCC    // int3
                || prevInstruc == 0xC9)   // leave
                return count + 1;         // include the ret byte itself
        }
    }
}
You could use this assuming you are on x86 or x86_64.

How much footprint does C++ exception handling add

This issue is important especially for embedded development. Exception handling adds some footprint to generated binary output. On the other hand, without exceptions the errors need to be handled some other way, which requires additional code, which eventually also increases binary size.
I'm interested in your experiences, especially:
What is average footprint added by your compiler for the exception handling (if you have such measurements)?
Is the exception handling really more expensive (many say that), in terms of binary output size, than other error handling strategies?
What error handling strategy would you suggest for embedded development?
Please take my questions only as guidance. Any input is welcome.
Addendum: Does anyone have a concrete method/script/tool that, for a specific C++ object/executable, will show the percentage of the loaded memory footprint that is occupied by compiler-generated code and data structures dedicated to exception handling?
When an exception occurs there will be a time overhead which depends on how you implement your exception handling. But, anecdotally, any event severe enough to warrant an exception will take just as much time to handle using any other method. Why not use the highly supported, language-based method of dealing with such problems?
The GNU C++ compiler uses the zero-cost model by default, i.e. there is no time overhead when exceptions don't occur.
Since information about exception-handling code and the offsets of local objects can be computed once at compile time, such information can be kept in a single place associated with each function, but not in each ARI. You essentially remove exception overhead from each ARI and thus avoid the extra time to push them onto the stack. This approach is called the zero-cost model of exception handling, and the optimized storage mentioned earlier is known as the shadow stack. - Bruce Eckel, Thinking in C++ Volume 2
The size overhead isn't easily quantifiable, but Eckel states an average of 5 to 15 percent. This will depend on the size of your exception-handling code relative to the size of your application code. If your program is small, exceptions will be a large part of the binary. If you are using a zero-cost model, exceptions will take more space to remove the time overhead; so if you care about space and not time, don't use zero-cost compilation.
My opinion is that most embedded systems have plenty of memory, to the extent that if your system has a C++ compiler you have enough space to include exceptions. The PC/104 computer that my project uses has several GB of secondary memory and 512 MB of main memory, hence no space problem for exceptions; our microcontrollers, though, are programmed in C. My heuristic is "if there is a mainstream C++ compiler for it, use exceptions, otherwise use C".
Measuring things, part 2. I have now got two programs. The first is in C and is compiled with gcc -O2:
#include <stdio.h>
#include <time.h>

#define BIG 1000000

int f( int n ) {
    int r = 0, i = 0;
    for ( i = 0; i < 1000; i++ ) {
        r += i;
        if ( n == BIG - 1 ) {
            return -1;
        }
    }
    return r;
}

int main() {
    clock_t start = clock();
    int i = 0, z = 0;
    for ( i = 0; i < BIG; i++ ) {
        if ( (z = f(i)) == -1 ) {
            break;
        }
    }
    double t = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf( "%f\n", t );
    printf( "%d\n", z );
}
The second is C++, with exception handling, compiled with g++ -O2:
#include <stdio.h>
#include <time.h>

#define BIG 1000000

int f( int n ) {
    int r = 0, i = 0;
    for ( i = 0; i < 1000; i++ ) {
        r += i;
        if ( n == BIG - 1 ) {
            throw -1;
        }
    }
    return r;
}

int main() {
    clock_t start = clock();
    int i = 0, z = 0;
    for ( i = 0; i < BIG; i++ ) {
        try {
            z += f(i);
        }
        catch( ... ) {
            break;
        }
    }
    double t = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf( "%f\n", t );
    printf( "%d\n", z );
}
I think these answer all the criticisms made of my last post.
Result: execution times give the C version a 0.5% edge over the C++ version with exceptions, not the 10% that others have talked about (but not demonstrated).
I'd be very grateful if others could try compiling and running the code (it should only take a few minutes) in order to check that I have not made a horrible and obvious mistake anywhere. This is known as "the scientific method"!
I work in a low-latency environment (sub-300 microseconds for my application in the production "chain"). Exception handling, in my experience, adds 5-25% execution time, depending on how much you do!
We don't generally care about binary bloat, but if you get too much bloat then you thrash like crazy, so you need to be careful.
Just keep the binary reasonable (depends on your setup).
I do pretty extensive profiling of my systems.
Other nasty areas:
Logging
Persisting (we just don't do this one, or if we do it's in parallel)
I guess it'd depend on the hardware and toolchain port for that specific platform.
I don't have the figures. However, for most embedded development, I have seen people chucking out two things (for the VxWorks/GCC toolchain):
Templates
RTTI
Exception handling does make use of both in most cases, so there is a tendency to throw it out as well.
In those cases where we really want to get close to the metal, setjmp/longjmp are used (roughly as sketched below). Note that this probably isn't the best or most powerful solution, but then that's what we use.
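For reference, the setjmp/longjmp pattern looks roughly like this; keep in mind that it will not run C++ destructors while unwinding:
#include <setjmp.h>
#include <stdio.h>

static jmp_buf err_jmp;

static void deep_function(void)
{
    /* ... something goes badly wrong ... */
    longjmp(err_jmp, 42);          /* unwind straight back to the setjmp */
}

int main(void)
{
    int code = setjmp(err_jmp);    /* returns 0 first time, 42 after longjmp */
    if (code != 0) {
        printf("recovered from error %d\n", code);
        return 1;
    }
    deep_function();
    return 0;
}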
You can run simple tests on your desktop with two versions of a benchmarking suite with/without exception handling and get the data that you can rely on most.
Another thing about embedded development: templates are avoided like the plague, since they cause too much bloat. Exceptions bring templates and RTTI along with them, as explained by Johann Gerell in the comments (I assumed this was well understood).
Again, this is just what we do. What is it with all the downvoting?
One thing to consider: If you're working in an embedded environment, you want to get the application as small as possible. The Microsoft C Runtime adds quite a bit of overhead to programs. By removing the C runtime as a requirement, I was able to get a simple program to be a 2KB exe file instead of a 70-something kilobyte file, and that's with all the optimizations for size turned on.
C++ exception handling requires compiler support, which is provided by the C runtime. The specifics are shrouded in mystery and are not documented at all. By avoiding C++ exceptions I could cut out the entire C runtime library.
You might argue to just dynamically link, but in my case that wasn't practical.
Another concern is that C++ exceptions need limited RTTI (runtime type information) at least on MSVC, which means that the type names of your exceptions are stored in the executable. Space-wise, it's not an issue, but it just 'feels' cleaner to me to not have this information in the file.
It's easy to see the impact on binary size, just turn off RTTI and exceptions in your compiler. You'll get complaints about dynamic_cast<>, if you're using it... but we generally avoid using code that depends on dynamic_cast<> in our environments.
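For the record, the relevant switches are along these lines (spellings vary by toolchain version, so check your docs):
# GCC / Clang: build once with, once without, and compare with size(1)
g++ -Os -fno-exceptions -fno-rtti -o prog main.cpp

# MSVC: the trailing '-' disables; /GR- turns off RTTI
cl /O1 /EHs-c- /GR- main.cpp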
We've always found it to be a win to turn off exception handling and RTTI in terms of binary size. I've seen many different error handling methods in the absence of exception handling. The most popular seems to be passing failure codes up the callstack. In our current project we use setjmp/longjmp but I'd advise against this in a C++ project as they won't run destructors when exiting a scope in many implementations. If I'm honest I think this was a poor choice made by the original architects of the code, especially considering that our project is C++.
In my opinion exception handling is not something that's generally acceptable for embedded development.
Neither GCC nor Microsoft have "zero-overhead" exception handling. Both compilers insert prologue and epilogue statements into each function that track the scope of execution. This leads to a measurable increase in performance and memory footprint.
The performance difference is something like 10% in my experience, which for my area of work (realtime graphics) is a huge amount. The memory overhead was far less but still significant - I can't remember the figure off-hand but with GCC/MSVC it's easy to compile your program both ways and measure the difference.
I've seen some people talk about exception handling as an "only if you use it" cost. Based on what I've observed this just isn't true. When you enable exception handling it affects all code, whether a code path can throw exceptions or not (which makes total sense when you consider how a compiler works).
I would also stay away from RTTI for embedded development, although we do use it in debug builds to sanity check downcasting results.
Define 'embedded'. On an 8-bit processor I would certainly not work with exceptions (and I would certainly not work with C++ on an 8-bit processor). If you're working with a PC104-type board that is powerful enough to have been someone's desktop a few years back, then you might get away with it. But I have to ask: why are there exceptions at all? Usually in embedded applications anything like an exception occurring is unthinkable; why didn't that problem get sorted out in testing?
For instance, is this in a medical device? Sloppy software in medical devices has killed people. It is unacceptable for anything unplanned to occur, period. All failure modes must be accounted for and, as Joel Spolsky said, exceptions are like GOTO statements except you don't know where they're called from. So when you handle your exception, what failed, and what state is your device in? Because of your exception, is your radiation therapy machine stuck at FULL, cooking someone alive (this has happened IRL)? At just what point did the exception happen in your 10,000+ lines of code? Sure, you may be able to cut that down to perhaps 100 lines of code, but do you know the significance of each of those lines causing an exception?
Without more information I would say do NOT plan for exceptions in your embedded system. If you add them then be prepared to plan the failure modes of EVERY LINE OF CODE that could cause an exception. If you're making a medical device then people die if you don't. If you're making a portable DVD player, well, you've made a bad portable DVD player. Which is it?

OSX lacks memalign

I'm working on a project in C and it requires memalign(). Really, posix_memalign() would do as well, but darwin/OSX lacks both of them.
What is a good solution to shoehorn in memalign? I don't understand the licensing for POSIX C code if I were to rip off memalign.c and put it in my project; I don't want any viral-type licensing LGPL-ing my whole project.
Mac OS X appears to be 16-byte mem aligned.
Quote from the website:
"I had a hard time finding a definitive statement on MacOS X memory alignment so I did my own tests. On 10.4/intel, both stack and heap memory is 16 byte aligned. So people porting software can stop looking for memalign() and posix_memalign(). It's not needed."
Update: OSX now has posix_memalign().
Late to the party, but newer versions of OSX do have posix_memalign(). You might want this when aligning to page boundaries. For example:
#include <stdlib.h>
#include <unistd.h>   /* for sysconf() */

char *buffer;
long pagesize;
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) handle_error("sysconf");
if (posix_memalign((void **)&buffer, pagesize, 4 * pagesize) != 0) {
    handle_error("posix_memalign");
}
One thing to note is that, unlike memalign(), posix_memalign() takes a pointer to your buffer pointer (void **) as an argument and returns an integer error code rather than the allocation itself.
Should be easy enough to do yourself, no? Something like the following (not tested):
#include <stdint.h>   /* uintptr_t */
#include <stdlib.h>

/* align must be a power of two */
void *aligned_malloc( size_t size, int align )
{
    /* Over-allocate: worst case we skip up to 'align' bytes, plus room
       for stashing the original pointer just before the aligned block. */
    void *mem = malloc( size + align + sizeof(void*) );
    if ( mem == NULL ) return NULL;
    char *amem = ((char*)mem) + sizeof(void*);
    amem += align - ((uintptr_t)amem & (align - 1));
    ((void**)amem)[-1] = mem;   /* remember the block malloc gave us */
    return amem;
}

void aligned_free( void *mem )
{
    free( ((void**)mem)[-1] );
}
(thanks Jonathan Leffler)
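Usage would be along these lines:
#include <stdio.h>

int main(void)
{
    void *p = aligned_malloc(1024, 64);   /* 1 KB block, 64-byte aligned */
    printf("%p\n", p);                    /* low 6 bits of the address should be 0 */
    aligned_free(p);
    return 0;
}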
Edit:
Regarding ripping off another memalign implementation: the problem with that is not licensing. Rather, you'd run into the difficulty that any good memalign implementation will be an integral part of the heap-manager codebase, not simply layered on top of malloc/free. So you'd have serious trouble transplanting it to a different heap manager, especially when you have no access to its internals.
Why does the software you are porting need memalign() or posix_memalign()? Does it use it for alignments bigger than the 16-byte alignments referenced by austirg?
I see Mike F posted some code - it looks relatively neat, though I think the while loop may be sub-optimal (if the alignment required is 1KB, it could iterate quite a few times).
Doesn't:
amem += align - ((uintptr_t)amem & (align - 1));
get there in one operation?
Yes, Mac OS X does have 16-byte memory alignment in the ABI.
You should not need to use memalign(). If your memory alignment requirements are a factor of 16, then I would not implement it and would maybe just add an assert.
From the Mac OS X man pages:
"The malloc(), calloc(), valloc(), realloc(), and reallocf() functions allocate memory. The allocated memory is aligned such that it can be used for any data type, including AltiVec- and SSE-related types. The free() function frees allocations that were created via the preceding allocation functions."
If you need an arbitrarily aligned malloc, check out x264's malloc (common/common.c in the git repository), which has a custom memalign for systems without malloc.h. It's extremely trivial code, to the point where I would not even consider it copyrightable, but you should easily be able to implement your own after seeing it.
Of course, if you only need 16-byte alignment, as stated above, it's already guaranteed by the OS X ABI.
It might be worthwhile to suggest using Doug Lea's malloc in your code.
Thanks for the help, guys... helped in my case (OpenCascade src/Image/Image_PixMap.cxx, OSX10.5.8 PPC)
Combined with the answers above, this might save someone some digging around, or instill hope in anyone not particularly familiar with malloc, etc.:
The rather large project I'm building only had one reference to posix_memalign, and it turns out it was the result of a bunch of preprocessor conditions that didn't include OSX but DID include BORLANDC, which confirms what others suggested about it being safe to use malloc in some cases:
#if defined(_MSC_VER)
    return (TypePtr)_aligned_malloc(theBytesCount, theAlign);
#elif (defined(__GNUC__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 1)
    return (TypePtr)_mm_malloc(theBytesCount, theAlign);
#elif defined(__BORLANDC__)
    return (TypePtr)malloc(theBytesCount);
#else
    void* aPtr;
    if (posix_memalign(&aPtr, theAlign, theBytesCount))
    {
        aPtr = NULL;
    }
    return (TypePtr)aPtr;
#endif
So, it could be as simple as just using malloc, as suggested by others.
e.g. here, moving the __BORLANDC__ condition above __GNUC__ and adding __APPLE__:
#elif (defined(__BORLANDC__) || defined(__APPLE__)) // now above __GNUC__
NOTE: I did NOT check that BORLANDC uses 16-byte alignment like someone above stated OS X does. Nor did I verify that PPC OS X does. However, this usage suggests that this alignment isn't particularly important. (Here's hoping it works, and that it could be that easy for you searchers, as well!)