IAR C/C++ comparison operator behaviour - c++

I have the following define:
#define DEVICE_ID ((uint8_t)0x3f)
and I have the following function:
void LIS3DSH_Init(LIS3DSH_InitTypeDef* LIS3DSH_InitStruct)
{
// uint8_t ctrl=0x00;
uint8_t ident=0x00;
LIS3DSH_LowLevel_Init();
LIS3DSH_Read(&ident,
LIS3DSH_WHOAMI_REG_ADDR,
1);
if(DEVICE_ID==ident)
{
// LIS3DSH detected
}
else
{
// LIS3DSH not detected
failureHandler();
}
} // LIS3DSH_Init
Now, if I step through this function, the ident variable gets the value 0x3f after the LIS3DSH_Read() call, which is OK. My question is: why the hell does the if clause jump to failureHandler()? The values of DEVICE_ID and ident are the same (both are 0x3f), so it should not jump to failureHandler(). I am working on a LIS3DSH accelerometer library using IAR C/C++ and the STM32F4 Discovery board.

You should add casts, i.e. change if(DEVICE_ID==ident) to if( (uint8_t)DEVICE_ID == (uint8_t)ident).
This has been an issue for me in the past.
And yes, declare ident as volatile and, for debugging purposes, try adding a delay before the comparison via a for loop with __no_operation(); inside it. Note that there are two underscores in front of that, not one (it is the intrinsic NOP instruction), and that a single NOP takes roughly 29 ns on a 168 MHz board, measured with a scope.
Also, since you have IAR, you might as well pop open the "assembly" view and look at which registers and/or constants are actually being compared. Open the "register" view as well, so you can see the register values themselves.
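For what it's worth, on a conforming compiler the casts should not change the result: in DEVICE_ID == ident both operands are uint8_t, and both are promoted to int before the comparison, so 0x3f compares equal to 0x3f either way. The host-side sketch below (not IAR- or STM32-specific) checks that promotion at compile time and runs both forms of the comparison; read_whoami_stub is a hypothetical stand-in for LIS3DSH_Read().

```cpp
#include <cassert>
#include <stdint.h>
#include <type_traits>
#include <utility>

#define DEVICE_ID ((uint8_t)0x3f)

// Unary + applies the integral promotions: a uint8_t operand becomes an int.
static_assert(std::is_same<decltype(+std::declval<uint8_t>()), int>::value,
              "uint8_t operands are promoted to int");

// Hypothetical stand-in for LIS3DSH_Read(): pretends WHO_AM_I returned 0x3f.
static void read_whoami_stub(uint8_t* out)
{
    *out = 0x3f;
}

// The comparison exactly as written in the question.
bool id_matches_plain()
{
    uint8_t ident = 0x00;
    read_whoami_stub(&ident);
    return DEVICE_ID == ident;
}

// The same comparison with the explicit casts and a volatile copy suggested above.
bool id_matches_cast()
{
    uint8_t raw = 0x00;
    read_whoami_stub(&raw);
    volatile uint8_t ident = raw;
    return (uint8_t)DEVICE_ID == (uint8_t)ident;
}
```

If both functions return true on the host but the target still appears to take the else branch, that points towards the debugger view rather than the comparison itself.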

Is the function failureHandler() actually executed?
If not, the debugger view is just confusing due to compiler optimizations. It is likely that the optimized code uses a common "return" sequence for both the good case and the failure case, so the debugger stops at the "exit" of the failure path even in the good case.

C++ self erasing code [duplicate]

I was reading this question because I'm trying to find the size of a function in a C++ program. It is hinted at that there may be a way that is platform specific. My target platform is Windows.
The method I currently have in my head is the following:
1. Obtain a pointer to the function
2. Increment the Pointer (& counter) until I reach the machine code value for ret
3. The counter will be the size of the function?
Edit1: To clarify what I mean by 'size' I mean the number of bytes (machine code) that make up the function.
Edit2: There have been a few comments asking why I want this or what I plan to do with it. The honest answer is that I have no particular use in mind, and I can't really see the benefits of knowing a function's length pre-compile time (although I'm sure there are some).
This seems like a valid method to me, will this work?
Wow, I use function size counting all the time and it has lots and lots of uses. Is it reliable? No way. Is it standard c++? No way. But that's why you need to check it in the disassembler to make sure it worked, every time that you release a new version. Compiler flags can mess up the ordering.
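#include <cstring>   // std::memcmp
#include <cstddef>   // std::size_t

// MSVC, 32-bit x86 only: inline __asm is not available in x64 builds.
static void funcIwantToCount()
{
    // do stuff
}

// Hoped to be placed right after funcIwantToCount(); its body emits four
// int3 bytes (0xCC) that act as an end marker.
static void funcToDelimitMyOtherFunc()
{
    __asm _emit 0xCC
    __asm _emit 0xCC
    __asm _emit 0xCC
    __asm _emit 0xCC
}

// Counts bytes from funcaddress up to the first run of four 0xCC bytes.
// memcmp avoids the unaligned, type-punned UINT32 read of the original.
// It can stop early at int3 padding or at an immediate such as the 0xCCCCCCCC
// used by MSVC's debug-build stack fill, so check the result in the disassembler.
std::size_t getlength(const void* funcaddress)
{
    static const unsigned char marker[4] = {0xCC, 0xCC, 0xCC, 0xCC};
    const unsigned char* p = static_cast<const unsigned char*>(funcaddress);
    std::size_t length = 0;
    while (std::memcmp(p + length, marker, sizeof marker) != 0)
        ++length;
    return length;
}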
It seems to work better with static functions. Global optimizations can kill it.
P.S. I hate people asking why you want to do this and saying it's impossible, etc. Stop asking these questions, please. It makes you sound stupid. Programmers are often asked to do non-standard things, because new products almost always push the limits of what's available. If they don't, your product is probably a rehash of what's already been done. Boring!!!
No, this will not work:
There is no guarantee that your function only contains a single ret instruction.
Even if it only does contain a single ret, you can't just look at the individual bytes - because the corresponding value could appear as simply a value, rather than an instruction.
The first problem can possibly be worked around if you restrict your coding style to, say, only have a single point of return in your function, but the other basically requires a disassembler so you can tell the individual instructions apart.
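To make the second point concrete: on x86, mov eax, 0xC3 encodes as B8 C3 00 00 00, so a byte-by-byte search for the ret opcode 0xC3 stops inside the immediate operand. The sketch below runs the naive search over a hand-assembled byte array (no real function is read), so it is portable and only illustrates the failure mode.

```cpp
#include <cassert>
#include <cstddef>

// Hand-assembled x86 bytes for:
//   mov eax, 0xC3   ; B8 C3 00 00 00  (the immediate happens to contain 0xC3)
//   ret             ; C3
static const unsigned char kCode[] = {0xB8, 0xC3, 0x00, 0x00, 0x00, 0xC3};

// Naive approach from the question: index of the first 0xC3 byte, plus one for the ret.
// Returns 0 if no 0xC3 byte occurs within len bytes.
std::size_t naive_ret_scan(const unsigned char* code, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
        if (code[i] == 0xC3)
            return i + 1;
    return 0;
}

std::size_t naive_length_of_example() { return naive_ret_scan(kCode, sizeof kCode); }
```

The naive scan reports 2 bytes, while the two instructions actually occupy 6.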
It is possible to obtain all blocks of a function, but it is unnatural to ask what the 'size' of a function is. Optimized code rearranges code blocks in execution order and moves seldom-used blocks (exception paths) into outer parts of the module; see Profile-Guided Optimizations for an example of how Visual C++ achieves this with link-time code generation. So a function can start at address 0x00001000, branch at 0x00001100 into a jump and a ret placed at 0x20001000, and have some exception-handling code at 0x20001000 as well, while another function starts at 0x00001110. What is the 'size' of your function? It spans from 0x00001000 to beyond 0x20001000, but it 'owns' only a few blocks in that span. So your question should be unasked.
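The difference between the bytes a function owns and the address range it spans can be put into numbers with the addresses from this example. In the sketch below, CodeBlock is a hypothetical type, the hot block is taken to run from 0x00001000 up to where the next function starts (0x00001110), and the 0x40-byte size of the cold block at 0x20001000 is an assumed value for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One contiguous run of machine code that belongs to a function.
struct CodeBlock {
    std::uint64_t start; // address of the first byte
    std::uint64_t size;  // number of bytes
};

// Bytes the function actually owns: the sum of its block sizes.
std::uint64_t owned_bytes(const std::vector<CodeBlock>& blocks)
{
    std::uint64_t total = 0;
    for (const CodeBlock& b : blocks) total += b.size;
    return total;
}

// Address range the function touches, from its lowest block start to its
// highest block end. Assumes blocks is non-empty.
std::uint64_t span_bytes(const std::vector<CodeBlock>& blocks)
{
    std::uint64_t lo = blocks.front().start;
    std::uint64_t hi = blocks.front().start + blocks.front().size;
    for (const CodeBlock& b : blocks) {
        lo = std::min(lo, b.start);
        hi = std::max(hi, b.start + b.size);
    }
    return hi - lo;
}
```

With blocks {0x00001000, 0x110} and {0x20001000, 0x40}, the function owns 0x150 bytes but spans 0x20000040 bytes, most of which belong to other functions.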
There are other valid questions in this context, like the total number of instructions a function has (which can be determined from the program symbol database and from the image) and, more importantly, the number of instructions in the frequently executed code path inside the function. These questions are normally asked in the context of performance measurement, and there are tools that instrument code and can give very detailed answers.
Chasing pointers in memory and searching for ret will get you nowhere I'm afraid. Modern code is way way way more complex than that.
This won't work... what if there's a jump, a dummy ret, and then the target of the jump? Your code will be fooled.
In general, it's impossible to do this with 100% accuracy because you have to predict all code paths, which is like solving the halting problem. You can get "pretty good" accuracy if you implement your own disassembler, but no solution will be nearly as easy as you imagine.
A "trick" would be to find out which function's code is after the function that you're looking for, which would give pretty good results assuming certain (dangerous) assumptions. But then you'd have to know what function comes after your function, which, after optimizations, is pretty hard to figure out.
Edit 1:
What if the function doesn't even end with a ret instruction at all? It could very well just jmp back to its caller (though it's unlikely).
Edit 2:
Don't forget that x86, at least, has variable-length instructions...
Update:
For those saying that flow analysis isn't the same as solving the halting problem:
Consider what happens when you have code like:
foo:
....
jmp foo
You will have to follow the jump each time to figure out the end of the function, and you cannot ignore it past the first time because you don't know whether or not you're dealing with self-modifying code. (You could have inline assembly in your C++ code that modifies itself, for instance.) It could very well extend to some other place of memory, so your analyzer will (or should) end in an infinite loop, unless you tolerate false negatives.
Isn't that like the halting problem?
I'm posting this to say two things:
1) Most of the answers given here are really bad and will break easily. If you use the C function pointer (using the function name), in a debug build of your executable, and possibly in other circumstances, it may point to a JMP shim that will not have the function body itself. Here's an example. If I do the following for the function I defined below:
FARPROC pfn = (FARPROC)some_function_with_possibility_to_get_its_size_at_runtime;
the pfn I get (for example: 0x7FF724241893) does not point to the function body at all, but to a shim that is just a JMP instruction.
Additionally, a compiler can nest several of those shims, or branch your function code so that it will have multiple epilogs, or ret instructions. Heck, it may not even use a ret instruction. Then, there's no guarantee that functions themselves will be compiled and linked in the order you define them in the source code.
You can do all that stuff in assembly language, but not in C or C++.
2) So that above was the bad news. The good news is that the answer to the original question is, yes, there's a way (or a hack) to get the exact function size, but it comes with the following limitations:
It works in 64-bit executables on Windows only.
It is obviously Microsoft specific and is not portable.
You have to do this at run-time.
The concept is simple: use the way SEH is implemented in x64 Windows binaries. The compiler adds details of each function into the PE32+ header (into the IMAGE_DIRECTORY_ENTRY_EXCEPTION directory of the optional header) that you can use to obtain the exact function size. (In case you're wondering, this information is used for catching, handling and unwinding of exceptions in the __try/__except/__finally blocks.)
Here's a quick example:
//You will have to call this when your app initializes and then
//cache the size somewhere in the global variable because it will not
//change after the executable image is built.
size_t fn_size; //Will receive function size in bytes, or 0 if error
some_function_with_possibility_to_get_its_size_at_runtime(&fn_size);
and then:
#include <Windows.h>
//The function itself has to be defined for two types of a call:
// 1) when you call it just to get its size, and
// 2) for its normal operation
bool some_function_with_possibility_to_get_its_size_at_runtime(size_t* p_getSizeOnly = NULL)
{
//This input parameter will define what we want to do:
if(!p_getSizeOnly)
{
//Do this function's normal work
//...
return true;
}
else
{
//Get this function size
//INFO: Works only in 64-bit builds on Windows!
size_t nFnSz = 0;
//One of the reasons why we have to do this at run-time is
//so that we can get the address of a byte inside
//the function body... we'll get it as this thread context:
CONTEXT context = {0};
RtlCaptureContext(&context);
DWORD64 ImgBase = 0;
RUNTIME_FUNCTION* pRTFn = RtlLookupFunctionEntry(context.Rip, &ImgBase, NULL);
if(pRTFn)
{
nFnSz = pRTFn->EndAddress - pRTFn->BeginAddress;
}
*p_getSizeOnly = nFnSz;
return false;
}
}
This can work in very limited scenarios. I use it in part of a code injection utility I wrote. I don't remember where I found the information, but I have the following (C++ in VS2005):
#pragma runtime_checks("", off)
static DWORD WINAPI InjectionProc(LPVOID lpvParameter)
{
// do something
return 0;
}
static DWORD WINAPI InjectionProcEnd()
{
return 0;
}
#pragma runtime_checks("", on)
And then in some other function I have:
size_t cbInjectionProc = (size_t)InjectionProcEnd - (size_t)InjectionProc;
You have to turn off some optimizations and declare the functions as static to get this to work; I don't recall the specifics. I don't know if this is an exact byte count, but it is close enough. The size is only that of the immediate function; it doesn't include any other functions that may be called by that function. Aside from extreme edge cases like this, "the size of a function" is meaningless and useless.
The real solution to this is to dig into your compiler's documentation. The ARM compiler we use can be made to produce an assembly dump (code.dis), from which it's fairly trivial to subtract the offsets between a given mangled function label and the next mangled function label.
I'm not certain which tools you will need for this with a windows target, however. It looks like the tools listed in the answer to this question might be what you're looking for.
Also note that I (working in the embedded space) assumed you were talking about post-compile-analysis. It still might be possible to examine these intermediate files programmatically as part of a build provided that:
The target function is in a different object
The build system has been taught the dependencies
You know for sure that the compiler will build these object files
Note that I'm not sure entirely WHY you want to know this information. I've needed it in the past to be sure that I can fit a particular chunk of code in a very particular place in memory. I have to admit I'm curious what purpose this would have on a more general desktop-OS target.
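The label-subtraction idea from this answer can be sketched as follows, assuming you have already extracted (name, start address) pairs from the assembly dump, map file, or symbol table. That parsing step is tool-specific and not shown; Label and size_by_next_label are hypothetical names, and the sample symbols and addresses in the usage are made up. The result is an upper bound, since it includes any alignment padding before the next label.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// A symbol as it might be read from a listing or map file (parsing not shown).
struct Label {
    std::string name;
    std::uint64_t address;
};

// Distance from `name` to the next label at a higher address, or 0 if the name
// is unknown or is the last label (no upper neighbour to subtract from).
std::uint64_t size_by_next_label(std::vector<Label> labels, const std::string& name)
{
    std::sort(labels.begin(), labels.end(),
              [](const Label& a, const Label& b) { return a.address < b.address; });
    for (std::size_t i = 0; i + 1 < labels.size(); ++i) {
        if (labels[i].name == name)
            return labels[i + 1].address - labels[i].address;
    }
    return 0;
}
```

For example, with _Z3foov at 0x8000, _Z3barv at 0x8040 and _Z3bazv at 0x80A0, foo() is reported as 0x40 bytes and bar() as 0x60 bytes, while baz() gets 0 because nothing follows it in the list.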
In C++, there is no notion of function size. In addition to everything else mentioned, preprocessor macros also make for an indeterminate size. If you want to count the number of instruction words, you can't do that in C++, because they don't exist until the code has been compiled.
What do you mean by "size of a function"?
If you mean a function pointer, then it is typically just 4 bytes on 32-bit systems.
If you mean the size of the code, then you should disassemble the generated code and find the entry point and the closest ret. One way to do it is to read the instruction pointer register at the beginning and at the end of your function.
If you want to figure out the number of instructions executed in the average case for your function, you can use profilers and divide the number of retired instructions by the number of calls.
I think it will work on Windows programs created with MSVC, since, as far as branches go, the 'ret' always seems to come at the end (even if there are branches that return early, they do a jne to go to the end).
However you will need some kind of disassembler library to figure the current opcode length as they are variable length for x86. If you don't do this you'll run into false positives.
I would not be surprised if there are cases this doesn't catch.
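The instruction-length point can be illustrated with a deliberately tiny decoder that only knows a handful of one-byte opcodes plus mov eax, imm32; a real tool would use a full disassembler library instead. Walking instruction boundaries rather than raw bytes skips the 0xC3 that sits inside the immediate. The opcode table here is an illustrative subset, not a general x86 decoder, and the hypothetical length_until_ret gives up (returns 0) on anything it does not recognise.

```cpp
#include <cassert>
#include <cstddef>

// Length of an instruction that starts with `opcode`, for a tiny subset of x86,
// or 0 if the opcode is not in the subset.
static std::size_t toy_insn_length(unsigned char opcode)
{
    switch (opcode) {
    case 0x55: // push ebp / push rbp
    case 0x5D: // pop ebp / pop rbp
    case 0x90: // nop
    case 0xC3: // ret
    case 0xCC: // int3
        return 1;
    case 0xB8: // mov eax, imm32
        return 5;
    default:
        return 0;
    }
}

// Walks instruction by instruction and returns the offset just past the first ret,
// or 0 if an unknown opcode is met or the buffer ends first.
std::size_t length_until_ret(const unsigned char* code, std::size_t len)
{
    std::size_t pos = 0;
    while (pos < len) {
        std::size_t n = toy_insn_length(code[pos]);
        if (n == 0 || pos + n > len) return 0;
        if (code[pos] == 0xC3) return pos + 1;
        pos += n;
    }
    return 0;
}
```

For the bytes 55 B8 C3 00 00 00 5D C3 CC (push, mov eax 0xC3, pop, ret, int3 padding), this reports 8 bytes, whereas a raw byte search would stop at offset 2.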
There are no facilities in Standard C++ to obtain the size or length of a function.
See my answer here: Is it possible to load a function into some allocated memory and run it from there?
In general, knowing the size of a function is used in embedded systems when copying executable code from a read-only source (or a slow memory device, such as a serial Flash) into RAM. Desktop and other operating systems load functions into memory using other techniques, such as dynamic or shared libraries.
Just set PAGE_EXECUTE_READWRITE (e.g. with VirtualProtect) at the address where you got your function. Then read every byte. When you reach a 0xCC byte, the end of the function is at actual_reading_address - 1.
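Using GCC, not so hard at all. Labels have function scope, so the label's address (GCC "labels as values" extension) has to be taken inside the function itself; optimizations may move code around the label, so treat the result as approximate:
#include <stdio.h>
#include <stddef.h>

static ptrdiff_t do_something_size;

void do_something(void) {
    printf("%s!", "Hello your name is Cemetech");
do_something_END:
    /* Offset of the end label from the function's entry point. */
    do_something_size = (char *)&&do_something_END - (char *)do_something;
}
...
do_something();
printf("size of function do_something: %td", do_something_size);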
The code below gets the accurate function block size; it works fine in my tests.
#pragma runtime_checks("", off) disables the _RTC_CheckEsp check in debug builds:
#pragma runtime_checks("", off)
DWORD __stdcall loadDll(char* pDllFullPath)
{
OutputDebugStringA(pDllFullPath);
//OutputDebugStringA("loadDll...................\r\n");
return 0;
//return test(pDllFullPath);
}
#pragma runtime_checks("", restore)
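// Assumes the linker places getFuncSize_loadDll directly after loadDll (32-bit x86).
DWORD __stdcall getFuncSize_loadDll()
{
    PBYTE pHead = (PBYTE)loadDll;
    // Scan backwards from the start of the next function, skipping padding,
    // until the last ret opcode of loadDll is found.
    PBYTE pTail = (PBYTE)getFuncSize_loadDll - 1;
    while (pTail > pHead && *pTail != 0xC2 && *pTail != 0xC3) --pTail;
    if (*pTail == 0xC2)
        pTail += 3; // 0xC2 iw: "ret imm16" (e.g. ret 4) is 3 bytes long
    else
        pTail += 1; // 0xC3: plain "ret" is 1 byte long
    return (DWORD)(pTail - pHead);
}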
The non-portable, but API-based and correctly working approach is to use program database readers - like dbghelp.dll on Windows or readelf on Linux. The usage of those is only possible if debug info is enabled/present along with the program. Here's an example on how it works on Windows:
SYMBOL_INFO symbol = { };
symbol.SizeOfStruct = sizeof(SYMBOL_INFO);
// Implies, that the module is loaded into _dbg_session_handle, see ::SymInitialize & ::SymLoadModule64
::SymFromAddr(_dbg_session_handle, address, 0, &symbol);
You will get the size of the function in symbol.Size, but you may also need additional logic identifying whether the address given is actually a function, a shim placed there by the incremental linker, or a DLL call thunk (same thing).
I guess something similar can be done via readelf on Linux, but you may have to build a library on top of its source code...
You must bear in mind that although disassembly-based approach is possible, you'll basically have to analyze a directed graph with endpoints in ret, halt, jmp (PROVIDED you have incremental linking enabled and you're able to read jmp-table to identify whether the jmp you're facing in function is internal to that function (missing in image's jmp-table) or external (present in that table; such jmps frequently occur as part of tail-call optimization on x64, as I know)), any calls that are meant to be nonret (like an exception generating helper), etc.
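The graph view described here can be sketched as a worklist traversal over basic blocks: start at the entry block, follow successor edges, and stop at blocks that end in ret or another terminator with no successors. The Block type and reachable_code_size are hypothetical; building the graph from real machine code is exactly the hard disassembly step described above. The sketch also assumes external jumps (tail calls) have already been removed from succs and that the code does not modify itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A basic block: its size in bytes and the indices of blocks control can flow to.
// A block ending in ret (or a no-return call) has no successors.
struct Block {
    std::uint32_t size;
    std::vector<std::size_t> succs;
};

// Sums the sizes of all blocks reachable from `entry`. The visited set makes
// loops (e.g. a jmp back to the entry) terminate instead of being followed forever.
std::uint64_t reachable_code_size(const std::vector<Block>& blocks, std::size_t entry)
{
    std::vector<bool> visited(blocks.size(), false);
    std::vector<std::size_t> work{entry};
    std::uint64_t total = 0;
    while (!work.empty()) {
        std::size_t b = work.back();
        work.pop_back();
        if (visited[b]) continue;
        visited[b] = true;
        total += blocks[b].size;
        for (std::size_t s : blocks[b].succs) work.push_back(s);
    }
    return total;
}
```

In a graph where block 0 (10 bytes) branches to a ret block 1 (4 bytes) and to block 2 (6 bytes) that jumps back to block 0, with an unrelated block 3 (100 bytes) elsewhere, the function's code size comes out as 20 bytes.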
It's an old question but still...
For Windows x64, functions all have a function table, which contains the offset and the size of the function. https://learn.microsoft.com/en-us/windows/win32/debug/pe-format . This function table is used for unwinding when an exception is thrown.
That said, this doesn't contain information like inlining, and all the other issues that people already noted...
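#include <cstdio>
#include <cstddef>

// Heuristic only: returns the offset just past the first 0xC3 (ret) byte that looks
// like a real epilogue, or 0 if none is found within maxLen bytes.
std::size_t GetFuncSizeX86(const unsigned char* Func, std::size_t maxLen)
{
    if (!Func)
    {
        std::printf("x86Helper : Function Ptr NULL\n");
        return 0;
    }
    for (std::size_t count = 1; count < maxLen; count++)
    {
        if (Func[count] != 0xC3)
            continue;
        unsigned char prevInstruc = Func[count - 1];                         // byte before the ret
        unsigned char nextByte = (count + 1 < maxLen) ? Func[count + 1] : 0; // byte after the ret
        if (nextByte == 0xCC        // int3 padding after the ret
            || prevInstruc == 0x5D  // pop ebp
            || prevInstruc == 0x5B  // pop ebx
            || prevInstruc == 0x5E  // pop esi
            || prevInstruc == 0x5F  // pop edi
            || prevInstruc == 0xCC  // int3
            || prevInstruc == 0xC9) // leave
            return count + 1;       // include the ret byte itself
    }
    return 0;
}
You could use this assuming you are on x86 or x86_64, passing an upper bound for the scan as maxLen.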

Is there a good reason in C++ to refer to a property without changing it?

I'm using someone else's C++ app as a guide for my own C# application and I've come across a strange pattern:
array[index].property;
No assignment. Nothing evaluated.
I assume this is a vestigial line, left over from when an assignment would have been done here.
But I'm not sure. And seeing as I'm having problems, I'm starting to turn over even the most unlikely rocks.
My question then: does that line do anything? Does it perhaps perform a function akin to touch?
Update - real code
The real code is:
devconfig[fbbloop-1].wAlphaMax;
Where devconfig is an array of the following struct:
typedef struct tagBIRDDEVICECONFIG
{
BYTE byStatus; // device status (see bird device status bits, above)
BYTE byID; // device ID code (see bird device ID's, above)
WORD wSoftwareRev; // software revision of device
BYTE byError; // error code flagged by device
BYTE bySetup; // setup information (see bird device setup bits, above)
BYTE byDataFormat; // data format (see bird data formats, above)
BYTE byReportRate; // rate of data reporting, in units of frames
WORD wScaling; // full scale measurement, in inches
BYTE byHemisphere; // hemisphere of operation (see bird hemisphere codes, above)
BYTE byDeviceNum; // bird number
BYTE byXmtrType; // transmitter type (see bird transmitter type bits, above)
WORD wAlphaMin[7]; // filter constants (see Birdnet3 Protocol pp.26-27 for values)
WORD wAlphaMax[7]; // filter constants (see Birdnet3 Protocol pp.26-27 for values)
WORD wVM[7]; // filter constants (see Birdnet3 Protocol pp.26-27 for values)
BIRDANGLES anglesReferenceFrame; // reference frame of bird readings
BIRDANGLES anglesAngleAlign; // alignment of bird readings
}
BIRDDEVICECONFIG;
The code you included fits a pattern that I call "This programmer is a goof rocket". The code either does nothing or there is a side effect on which the programmer is depending. In both cases, "This programmer is a goof rocket" is a good description of a programmer who codes like that.
The case of "oh, the previous coder just made a mistake" is merely a nice variant of "This programmer is a goof rocket".
The original developer most likely ran into a bug with the compiler not recompiling some changed files.
This happens fairly often with Visual Studio, less often with more recent versions, when you change some central code and use minimal rebuilds.
Essentially, the compiler will not notice some change and will link with old or incorrect object code. The result is some strange behavior that doesn't make sense and is nearly impossible to debug, e.g. a function returning to the wrong address or jumping into the middle of some other function for no apparent reason.
The solution to this is to do a full rebuild, but if the developer doesn't immediately know that this is a mistake on the compiler's side, they might think that the issue is in the code and try to fix it.
Often, introducing a code change that has no actual effect on the functionality of the code will fix the issue. It's also fairly common to see a line that's just 0; used to achieve the same effect. Again, if the developer doesn't know that the issue was caused by the compiler, they might think that this line of code that does nothing is important.
It doesn't do much if the container is indeed an array and index is within array bounds. If the container is a map, a record will be created for the index key.
No, unless the operator [] was overloaded. Even so, it is a weird thing to do. Probably it is a leftover from something else.
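The difference between a plain array and a map is easy to check. For the built-in array in the question, the statement is a discarded-value expression with no effect (compilers typically warn about it, e.g. GCC/Clang's -Wunused-value); for std::map, operator[] inserts a value-initialized element when the key is missing. In this minimal sketch, Config is a cut-down hypothetical stand-in for BIRDDEVICECONFIG, and touch_array/touch_map are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Cut-down stand-in for BIRDDEVICECONFIG: only the member the question touches.
struct Config {
    unsigned short wAlphaMax[7];
};

// Built-in array access: nothing is read or written.
void touch_array(Config (&devconfig)[4], std::size_t i)
{
    devconfig[i].wAlphaMax; // no effect; a compiler may warn here
}

// std::map access: operator[] inserts a value-initialized Config if the key is absent.
std::size_t touch_map(std::map<int, Config>& devconfig, int key)
{
    devconfig[key];
    return devconfig.size();
}
```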
No, there's no function to that line. It's a primitive access, so no funky overloading shenanigans going on here. It could be covering up for some compiler bug or error, and it wouldn't be the first time I've seen nonsense lines have to be left in for that reason, but it certainly doesn't serve an actual purpose in the execution of the program.
From your update it looks like the code is just accessing the address of the first item of an array. You can safely remove the line.
Often when you see nonsensical code like that, it was created because programmer wanted to supress some pedantic warning from either the compiler or some static analysis tool.
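A common case of referencing something purely to silence a warning is an unused parameter or variable. The Windows headers provide UNREFERENCED_PARAMETER(P), which winnt.h defines essentially as (P); the portable spelling is a cast to void. In this minimal sketch, handler is a hypothetical callback whose signature is assumed to be fixed by some API, so the unused context parameter cannot simply be removed.

```cpp
#include <cassert>

// Hypothetical callback: `context` must stay in the parameter list even though
// this implementation ignores it.
int handler(int code, void* context)
{
    (void)context; // silences -Wunused-parameter / MSVC C4100 without generating code
    return code * 2;
}
```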
If you have access to the version control system, you could go back in time and see under what conditions it was created...

Getting The Size of a C++ Function

I was reading this question because I'm trying to find the size of a function in a C++ program, It is hinted at that there may be a way that is platform specific. My targeted platform is windows
The method I currently have in my head is the following:
1. Obtain a pointer to the function
2. Increment the Pointer (& counter) until I reach the machine code value for ret
3. The counter will be the size of the function?
Edit1: To clarify what I mean by 'size' I mean the number of bytes (machine code) that make up the function.
Edit2: There have been a few comments asking why or what do I plan to do with this. The honest answer is I have no intention, and I can't really see the benefits of knowing a functions length pre-compile time. (although I'm sure there are some)
This seems like a valid method to me, will this work?
Wow, I use function size counting all the time and it has lots and lots of uses. Is it reliable? No way. Is it standard c++? No way. But that's why you need to check it in the disassembler to make sure it worked, every time that you release a new version. Compiler flags can mess up the ordering.
static void funcIwantToCount()
{
// do stuff
}
static void funcToDelimitMyOtherFunc()
{
__asm _emit 0xCC
__asm _emit 0xCC
__asm _emit 0xCC
__asm _emit 0xCC
}
int getlength( void *funcaddress )
{
int length = 0;
for(length = 0; *((UINT32 *)(&((unsigned char *)funcaddress)[length])) != 0xCCCCCCCC; ++length);
return length;
}
It seems to work better with static functions. Global optimizations can kill it.
P.S. I hate people, asking why you want to do this and it's impossible, etc. Stop asking these questions, please. Makes you sound stupid. Programmers are often asked to do non-standard things, because new products almost always push the limits of what's availble. If they don't, your product is probably a rehash of what's already been done. Boring!!!
No, this will not work:
There is no guarantee that your function only contains a single ret instruction.
Even if it only does contain a single ret, you can't just look at the individual bytes - because the corresponding value could appear as simply a value, rather than an instruction.
The first problem can possibly be worked around if you restrict your coding style to, say, only have a single point of return in your function, but the other basically requires a disassembler so you can tell the individual instructions apart.
It is possible to obtain all blocks of a function, but is an unnatural question to ask what is the 'size' of a function. Optimized code will rearrange code blocks in the order of execution and will move seldom used blocks (exception paths) into outer parts of the module. For more details, see Profile-Guided Optimizations for example how Visual C++ achieves this in link time code generation. So a function can start at address 0x00001000, branch at 0x00001100 into a jump at 0x20001000 and a ret, and have some exception handling code 0x20001000. At 0x00001110 another function starts. What is the 'size' of your function? It does span from 0x00001000 to +0x20001000, but it 'owns' only few blocks in that span. So your question should be unasked.
There are other valid questions in this context, like the total number of instructions a function has (can be determined from the program symbol database and from the image), and more importantly, what is the number of instructions in the frequent executed code path inside the function. All these are questions normally asked in the context of performance measurement and there are tools that instrument code and can give very detailed answers.
Chasing pointers in memory and searching for ret will get you nowhere I'm afraid. Modern code is way way way more complex than that.
This won't work... what if there's a jump, a dummy ret, and then the target of the jump? Your code will be fooled.
In general, it's impossible to do this with 100% accuracy because you have to predict all code paths, which is like solving the halting problem. You can get "pretty good" accuracy if you implement your own disassembler, but no solution will be nearly as easy as you imagine.
A "trick" would be to find out which function's code is after the function that you're looking for, which would give pretty good results assuming certain (dangerous) assumptions. But then you'd have to know what function comes after your function, which, after optimizations, is pretty hard to figure out.
Edit 1:
What if the function doesn't even end with a ret instruction at all? It could very well just jmp back to its caller (though it's unlikely).
Edit 2:
Don't forget that x86, at least, has variable-length instructions...
Update:
For those saying that flow analysis isn't the same as solving the halting problem:
Consider what happens when you have code like:
foo:
....
jmp foo
You will have to follow the jump each time to figure out the end of the function, and you cannot ignore it past the first time because you don't know whether or not you're dealing with self-modifying code. (You could have inline assembly in your C++ code that modifies itself, for instance.) It could very well extend to some other place of memory, so your analyzer will (or should) end in an infinite loop, unless you tolerate false negatives.
Isn't that like the halting problem?
I'm posting this to say two things:
1) Most of the answers given here are really bad and will break easily. If you use the C function pointer (using the function name), in a debug build of your executable, and possibly in other circumstances, it may point to a JMP shim that will not have the function body itself. Here's an example. If I do the following for the function I defined below:
FARPROC pfn = (FARPROC)some_function_with_possibility_to_get_its_size_at_runtime;
the pfn I get (for example: 0x7FF724241893) will point to this, which is just a JMP instruction:
Additionally, a compiler can nest several of those shims, or branch your function code so that it will have multiple epilogs, or ret instructions. Heck, it may not even use a ret instruction. Then, there's no guarantee that functions themselves will be compiled and linked in the order you define them in the source code.
You can do all that stuff in assembly language, but not in C or C++.
2) So that above was the bad news. The good news is that the answer to the original question is, yes, there's a way (or a hack) to get the exact function size, but it comes with the following limitations:
It works in 64-bit executables on Windows only.
It is obviously Microsoft specific and is not portable.
You have to do this at run-time.
The concept is simple -- utilize the way SEH is implemented in x64 Windows binaries. Compiler adds details of each function into the PE32+ header (into the IMAGE_DIRECTORY_ENTRY_EXCEPTION directory of the optional header) that you can use to obtain the exact function size. (In case you're wondering, this information is used for catching, handling and unwinding of exceptions in the __try/__except/__finally blocks.)
Here's a quick example:
//You will have to call this when your app initializes and then
//cache the size somewhere in the global variable because it will not
//change after the executable image is built.
size_t fn_size; //Will receive function size in bytes, or 0 if error
some_function_with_possibility_to_get_its_size_at_runtime(&fn_size);
and then:
#include <Windows.h>
//The function itself has to be defined for two types of a call:
// 1) when you call it just to get its size, and
// 2) for its normal operation
bool some_function_with_possibility_to_get_its_size_at_runtime(size_t* p_getSizeOnly = NULL)
{
//This input parameter will define what we want to do:
if(!p_getSizeOnly)
{
//Do this function's normal work
//...
return true;
}
else
{
//Get this function size
//INFO: Works only in 64-bit builds on Windows!
size_t nFnSz = 0;
//One of the reasons why we have to do this at run-time is
//so that we can get the address of a byte inside
//the function body... we'll get it as this thread context:
CONTEXT context = {0};
RtlCaptureContext(&context);
DWORD64 ImgBase = 0;
RUNTIME_FUNCTION* pRTFn = RtlLookupFunctionEntry(context.Rip, &ImgBase, NULL);
if(pRTFn)
{
nFnSz = pRTFn->EndAddress - pRTFn->BeginAddress;
}
*p_getSizeOnly = nFnSz;
return false;
}
}
This can work in very limited scenarios. I use it in part of a code injection utility I wrote. I don't remember where I found the information, but I have the following (C++ in VS2005):
#pragma runtime_checks("", off)
static DWORD WINAPI InjectionProc(LPVOID lpvParameter)
{
// do something
return 0;
}
static DWORD WINAPI InjectionProcEnd()
{
return 0;
}
#pragma runtime_checks("", on)
And then in some other function I have:
size_t cbInjectionProc = (size_t)InjectionProcEnd - (size_t)InjectionProc;
You have to turn off some optimizations and declare the functions as static to get this to work; I don't recall the specifics. I don't know if this is an exact byte count, but it is close enough. The size is only that of the immediate function; it doesn't include any other functions that may be called by that function. Aside from extreme edge cases like this, "the size of a function" is meaningless and useless.
The real solution to this is to dig into your compiler's documentation. The ARM compiler we use can be made to produce an assembly dump (code.dis), from which it's fairly trivial to subtract the offsets between a given mangled function label and the next mangled function label.
I'm not certain which tools you will need for this with a windows target, however. It looks like the tools listed in the answer to this question might be what you're looking for.
Also note that I (working in the embedded space) assumed you were talking about post-compile-analysis. It still might be possible to examine these intermediate files programmatically as part of a build provided that:
The target function is in a different object
The build system has been taught the dependencies
You know for sure that the compiler will build these object files
Note that I'm not entirely sure WHY you want to know this information. I've needed it in the past to be sure that I can fit a particular chunk of code in a very particular place in memory. I have to admit I'm curious what purpose this would have on a more general desktop-OS target.
In C++, there is no notion of function size. In addition to everything else mentioned, preprocessor macros also make for an indeterminate size. If you want to count the number of instruction words, you can't do that in C++, because they don't exist until the code has been compiled.
What do you mean by "size of a function"?
If you mean a function pointer, then it is just 4 bytes on 32-bit systems.
If you mean the size of the code, then you should disassemble the generated code and find the entry point and the nearest ret instruction. One way to do it is to read the instruction pointer register at the beginning and at the end of your function.
If you want to figure out the average number of instructions executed per call of your function, you can use profilers and divide the number of retired instructions by the number of calls.
I think it will work on Windows programs built with MSVC: with respect to branches, the 'ret' always seems to come at the end (even if there are branches that return early, it does a jne to go to the end).
However you will need some kind of disassembler library to figure the current opcode length as they are variable length for x86. If you don't do this you'll run into false positives.
I would not be surprised if there are cases this doesn't catch.
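The false-positive problem is easy to demonstrate. The bytes B8 C3 00 00 00 C3 encode `mov eax, 0xC3` followed by `ret`, so a plain byte scan stops inside the immediate operand. The sketch below contrasts a naive scan with one that steps by instruction length. The three-opcode `toyInstructionLength` is a hypothetical stand-in for a real disassembler library and is not usable on real code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive approach: treat the first 0xC3 byte as the end of the function.
// Returns the size including the ret byte, or 0 if none is found.
std::size_t naiveSizeByFirstRet(const std::vector<std::uint8_t>& code) {
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (code[i] == 0xC3) return i + 1;
    }
    return 0;
}

// Toy length decoder covering only three x86 opcodes (hypothetical stand-in):
//   0x90        nop           (1 byte)
//   0xB8 imm32  mov eax, imm  (5 bytes)
//   0xC3        ret           (1 byte)
std::size_t toyInstructionLength(std::uint8_t opcode) {
    switch (opcode) {
        case 0x90: return 1;
        case 0xB8: return 5;
        case 0xC3: return 1;
        default:   return 0; // unknown to this toy decoder
    }
}

// Walks instruction by instruction, so a 0xC3 inside an immediate is skipped.
std::size_t sizeByDecodedRet(const std::vector<std::uint8_t>& code) {
    std::size_t i = 0;
    while (i < code.size()) {
        if (code[i] == 0xC3) return i + 1;
        std::size_t len = toyInstructionLength(code[i]);
        if (len == 0) return 0; // cannot continue without knowing the length
        i += len;
    }
    return 0;
}
```

On that byte sequence the naive scan reports 2 bytes, while the length-aware walk reports the correct 6.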
There are no facilities in Standard C++ to obtain the size or length of a function.
See my answer here: Is it possible to load a function into some allocated memory and run it from there?
In general, knowing the size of a function is used in embedded systems when copying executable code from a read-only source (or a slow memory device, such as a serial Flash) into RAM. Desktop and other operating systems load functions into memory using other techniques, such as dynamic or shared libraries.
Just set PAGE_EXECUTE_READWRITE at the address where you got your function, then read every byte. When you reach the byte 0xCC, the end of the function is actual_reading_address - 1.
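Here is a minimal sketch of that padding scan over a byte buffer. Like the answer, it assumes the compiler or linker fills the gap after a function with 0xCC (int3). Because 0xCC can also appear inside an instruction's operand, the hypothetical `minRun` parameter lets you require several consecutive int3 bytes before treating them as padding. `minRun == 1` is the rule described above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the number of bytes before the first run of at least `minRun`
// consecutive 0xCC (int3) bytes, or code.size() if no such run exists.
// A larger minRun makes a stray 0xCC inside an operand less likely to
// end the scan early (but a function followed by a single 0xCC is then missed).
std::size_t sizeBeforeInt3Padding(const std::vector<std::uint8_t>& code, std::size_t minRun) {
    assert(minRun > 0);
    std::size_t run = 0;
    for (std::size_t i = 0; i < code.size(); ++i) {
        run = (code[i] == 0xCC) ? run + 1 : 0;
        if (run == minRun) return i + 1 - minRun; // offset where the run starts
    }
    return code.size();
}
```

Take the bytes B8 CC 00 00 00 C3 CC CC CC, which are `mov eax, 0xCC`, then `ret`, then padding. With `minRun = 1` the scan stops at offset 1. With `minRun = 2` it gives the expected 6.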
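Using GCC, not so hard at all (this relies on GCC's labels-as-values extension):
#include <stdio.h>
#include <stdint.h>
void do_something(void) {
printf("%s!", "Hello your name is Cemetech");
// &&label is only valid inside the function that defines the label, so the size is computed here.
// The result is approximate: it excludes the epilogue, and GCC may move code around the label when optimizing.
printf("size of function do_something: %i\n", (int)((uintptr_t)&&do_something_END - (uintptr_t)do_something));
do_something_END: ; // a label must be followed by a statement (before C23 / C++23)
}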
The code below gets the accurate function block size; it works fine in my tests.
#pragma runtime_checks("", off) disables _RTC_CheckEsp in debug mode.
#pragma runtime_checks("", off)
DWORD __stdcall loadDll(char* pDllFullPath)
{
OutputDebugStringA(pDllFullPath);
//OutputDebugStringA("loadDll...................\r\n");
return 0;
//return test(pDllFullPath);
}
#pragma runtime_checks("", restore)
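DWORD __stdcall getFuncSize_loadDll()
{
// Assumes the linker placed getFuncSize_loadDll directly after loadDll (e.g. no incremental linking).
PBYTE pStart=(PBYTE)loadDll;
PBYTE pTail=(PBYTE)getFuncSize_loadDll-1;
// Scan backwards over the padding for the last return instruction:
//   0xC3       : ret
//   0xC2 04 00 : ret 4 (ret imm16)
while(pTail > pStart && *pTail != 0xC2 && *pTail != 0xC3) --pTail;
if (*pTail==0xC2)
pTail +=3; // step past ret and its 16-bit immediate
else
pTail +=1; // step past the one-byte ret
return (DWORD)(pTail-pStart);
}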
The non-portable, but API-based and correctly working approach is to use program database readers - like dbghelp.dll on Windows or readelf on Linux. The usage of those is only possible if debug info is enabled/present along with the program. Here's an example on how it works on Windows:
SYMBOL_INFO symbol = { };
symbol.SizeOfStruct = sizeof(SYMBOL_INFO);
// Implies, that the module is loaded into _dbg_session_handle, see ::SymInitialize & ::SymLoadModule64
::SymFromAddr(_dbg_session_handle, address, 0, &symbol);
You will get the size of the function in symbol.Size, but you may also need additional logic to identify whether the given address is actually a function, a shim placed there by the incremental linker, or a DLL call thunk (same thing).
I guess something similar can be done via readelf on Linux, but you may have to build a library on top of its source code...
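On Linux, the size that `readelf -s` shows comes from the `st_size` field of each ELF symbol, so reading the symbol table yourself is fairly short. Below is a sketch. It assumes a 64-bit, unstripped ELF file in host byte order and uses the structures from glibc's `<elf.h>`. `elfFunctionSize` is a hypothetical name, and the result is only whatever size the toolchain recorded:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <elf.h>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Returns st_size for function symbol `name` in the .symtab of a 64-bit ELF
// file (the "Size" column of `readelf -s`), or 0 if the file can't be read,
// isn't ELF64, has no .symtab (e.g. it was stripped), or lacks the symbol.
std::uint64_t elfFunctionSize(const std::string& path, const std::string& name) {
    std::ifstream file(path, std::ios::binary);
    std::vector<char> image((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
    if (image.size() < sizeof(Elf64_Ehdr)) return 0;

    Elf64_Ehdr ehdr;
    std::memcpy(&ehdr, image.data(), sizeof ehdr);
    if (std::memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 || ehdr.e_ident[EI_CLASS] != ELFCLASS64) return 0;

    // Copy the section headers out of the image (memcpy avoids misaligned reads).
    std::vector<Elf64_Shdr> sections(ehdr.e_shnum);
    for (std::size_t i = 0; i < sections.size(); ++i) {
        std::uint64_t off = ehdr.e_shoff + i * sizeof(Elf64_Shdr);
        if (off + sizeof(Elf64_Shdr) > image.size()) return 0;
        std::memcpy(&sections[i], image.data() + off, sizeof(Elf64_Shdr));
    }

    for (const Elf64_Shdr& symtab : sections) {
        if (symtab.sh_type != SHT_SYMTAB || symtab.sh_link >= sections.size()) continue;
        const Elf64_Shdr& strtab = sections[symtab.sh_link]; // holds the symbol names
        for (std::uint64_t i = 0; i < symtab.sh_size / sizeof(Elf64_Sym); ++i) {
            std::uint64_t off = symtab.sh_offset + i * sizeof(Elf64_Sym);
            if (off + sizeof(Elf64_Sym) > image.size()) return 0;
            Elf64_Sym sym;
            std::memcpy(&sym, image.data() + off, sizeof sym);
            if (ELF64_ST_TYPE(sym.st_info) != STT_FUNC) continue;

            std::uint64_t nameOff = strtab.sh_offset + sym.st_name;
            if (nameOff >= image.size()) continue;
            const char* start = image.data() + nameOff;
            const void* nul = std::memchr(start, '\0', image.size() - nameOff);
            if (nul && name == std::string(start, static_cast<const char*>(nul))) return sym.st_size;
        }
    }
    return 0;
}
```

For example, `elfFunctionSize("/proc/self/exe", "main")` reports the size of the running program's `main`. C++ functions are stored under their mangled names.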
You must bear in mind that although a disassembly-based approach is possible, you'll basically have to analyze a directed graph whose endpoints are ret, hlt, jmp, and any calls that are meant never to return (like an exception-generating helper). For each jmp you also need to know whether it is internal to the function or leaves it. PROVIDED you have incremental linking enabled and are able to read the image's jmp table, an internal jmp is one missing from that table and an external one is present in it. As far as I know, such external jmps frequently occur as part of tail-call optimization on x64.
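Here is a toy model of that graph walk. It assumes the bytes have already been decoded into a hypothetical `Insn` list; real x86 decoding needs a disassembler library. The `externalTargets` set stands in for the jmp-table lookup described above. A jmp into that set is treated as leaving the function (e.g. a tail call), and every other edge is followed:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical pre-decoded instruction stream.
enum class Kind { Plain, CondJump, Jump, Ret, NoReturnCall };

struct Insn {
    std::uint64_t addr;
    std::uint32_t len;
    Kind kind;
    std::uint64_t target = 0; // branch target for CondJump / Jump
};

// Returns one past the last reachable byte of the function starting at `entry`.
// A Jump whose target is in `externalTargets` leaves the function, like Ret.
std::uint64_t functionEnd(const std::vector<Insn>& insns, std::uint64_t entry,
                          const std::set<std::uint64_t>& externalTargets) {
    std::map<std::uint64_t, const Insn*> byAddr;
    for (const Insn& in : insns) byAddr[in.addr] = &in;

    std::vector<std::uint64_t> worklist{entry};
    std::set<std::uint64_t> visited;
    std::uint64_t end = entry;
    while (!worklist.empty()) {
        std::uint64_t pc = worklist.back();
        worklist.pop_back();
        auto it = byAddr.find(pc);
        if (it == byAddr.end() || !visited.insert(pc).second) continue;
        const Insn& in = *it->second;
        end = std::max(end, in.addr + in.len);
        switch (in.kind) {
            case Kind::Plain:
                worklist.push_back(in.addr + in.len); // fall through to the next instruction
                break;
            case Kind::CondJump:
                worklist.push_back(in.addr + in.len); // not taken
                worklist.push_back(in.target);        // taken
                break;
            case Kind::Jump:
                if (!externalTargets.count(in.target)) worklist.push_back(in.target);
                break;
            case Kind::Ret:
            case Kind::NoReturnCall:
                break; // control does not continue past these
        }
    }
    return end;
}
```

The result is an end address, so any unreached gap inside the range (such as alignment padding between an early ret and a later block) is still counted.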
It's an old question but still...
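On Windows x64, functions have entries in a function table (the .pdata section), each recording the function's start and end offsets, and therefore its size: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format . This function table is used for unwinding when an exception is thrown. Leaf functions that neither call other functions nor adjust the stack may have no entry.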
That said, this doesn't contain information like inlining, and all the other issues that people already noted...
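#include <cstdio>
// Returns the size of an x86/x86-64 function including its final ret,
// or 0 if Func is null or no matching ret is found within maxScan bytes.
int GetFuncSizeX86(unsigned char* Func, int maxScan = 0x10000)
{
if (!Func)
{
printf("x86Helper : Function Ptr NULL\n");
return 0;
}
for (int count = 0; count < maxScan; count++)
{
if (Func[count] == 0xC3) // ret
{
unsigned char prevInstruc = count > 0 ? Func[count - 1] : 0;
if (Func[count + 1] == 0xCC // int3 padding follows
|| prevInstruc == 0x5D // pop ebp
|| prevInstruc == 0x5B // pop ebx
|| prevInstruc == 0x5E // pop esi
|| prevInstruc == 0x5F // pop edi
|| prevInstruc == 0xCC // int3
|| prevInstruc == 0xC9) // leave
return count + 1; // include the ret byte itself
}
}
return 0;
}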
You could use this, assuming you are on x86 or x86-64.

Why is this code slower even if the function is inlined?

I have a method like this:
bool MyFunction(int& i)
{
switch(m_step)
{
case 1:
if (AComplexCondition)
{
i = m_i;
return true;
}
case 2:
// some code
case 3:
// some code
}
}
Since there are lots of case statements (more than 3) and the function is becoming large, I tried to extract the code in case 1 and put it in an inline function like this:
inline bool funct(int& i)
{
if (AComplexCondition)
{
i = m_i;
return true;
}
return false;
}
bool MyFunction(int& i)
{
switch(m_step)
{
case 1:
if (funct(i))
{
return true;
}
case 2:
// some code
case 3:
// some code
}
}
It seems this code is significantly slower than the original. I checked with -Winline and the function is inlined. Why is this code slower? I thought it would be equivalent. The only difference I see is there is one more conditional check in the second version, but I thought the compiler should be able to optimize it away. Right?
edit:
Some people suggested that I use gdb to step over every assembly instruction in both versions to see the differences. I did this.
The first version looks like this:
mov
callq (Call to AComplexCondition())
test
je (doesn't jump)
mov (i = m_i)
movl (m_step = 1)
The second version, which is a bit slower, seems simpler:
movl (m_step = 1)
callq (Call to AComplexCondition())
test
je (doesn't jump)
mov (i = m_i)
xchg %ax,%ax (This is a nop I think)
These two versions seem to do the same thing, so I still don't know why the second version is slower.
Just step through it. Plant a breakpoint, go into the disassembly view, and start stepping.
All mysteries will vanish.
This is very hard to track down. One problem could be code bloat pushing the majority of the loop out of the (small) CPU cache... but that doesn't entirely make sense either, now that I think of it.
What I suggest doing:
Isolate the code and condition as much as possible while still being able to observe the slowdown.
Then, go profile it. Does the profile make sense? Now (assuming you're up for the adventure), disassemble the code and look at what g++ is doing differently. Report those results back here.
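One way to start isolating is a tiny harness that times both versions separately. This sketch rests on heavy assumptions. `Stepper` is a hypothetical stand-in for the question's class. `AComplexCondition` is replaced by a cheap made-up predicate, and the other cases are dropped. It can therefore only show whether the extraction by itself changes anything:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the question's class. AComplexCondition is a cheap
// made-up predicate (true 3 times out of 4); the other cases are dropped.
struct Stepper {
    int m_step = 1;
    int m_i = 42;
    std::uint32_t counter = 0;

    bool AComplexCondition() { return (++counter & 3u) != 0; }

    // Version 1: the condition is written directly inside the switch.
    bool original(int& i) {
        switch (m_step) {
        case 1:
            if (AComplexCondition()) { i = m_i; return true; }
            return false;
        default:
            return false;
        }
    }

    // Version 2: the same condition extracted into an inline helper.
    inline bool funct(int& i) {
        if (AComplexCondition()) { i = m_i; return true; }
        return false;
    }
    bool extracted(int& i) {
        switch (m_step) {
        case 1:
            if (funct(i)) return true;
            return false;
        default:
            return false;
        }
    }
};

using Version = bool (Stepper::*)(int&);

// Calls `version` `iterations` times on a fresh Stepper, stores the elapsed
// wall-clock time in `seconds`, and returns how many calls returned true
// (using the result helps keep the compiler from discarding the loop).
long runVersion(Version version, int iterations, double& seconds) {
    Stepper s;
    int out = 0;
    long hits = 0;
    auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < iterations; ++n) hits += (s.*version)(out) ? 1 : 0;
    seconds = std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
    return hits;
}
```

Build it with the same optimization flags as the real program and repeat the runs. A single wall-clock measurement of a loop this small is noisy.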
GMan is correct: inline doesn't guarantee that your function will be inlined. It is a hint to the compiler that it might be a good idea. If the compiler doesn't think it is wise to inline the function, you now pay the overhead of a function call, which at the very least means two jumps being executed (the call and the return). The function's instructions are then stored at a non-sequential location rather than right after the call site, so execution moves to that new location, completes the function, and moves back to just after the call.
Without seeing the ComplexCondition part, it's hard to say. If that condition is sufficiently complex, the compiler won't be able to pipeline it properly and it will interfere with the branch prediction in the chip. Just a possibility.
Does the assembler tell you anything about what's happening? It might be easier to look at the disassembly than to have us guess, although I go along with iaimtomisbehave's jmp idea generally.
This is a good question. Let us know what you find. I do have a few thoughts mostly stemming from the compiler no longer being able to break up the code you have inlined, but no guaranteed answer.
statement order. It makes sense that the compiler would put this statement, with its complex code, last. That means the other cases would be evaluated first and it would never get checked unless necessary. If you simplify the statement, it might not do this, meaning your crazy conditional gets fully evaluated every time.
creating extra cases. In some circumstances it should be possible to pull some of the conditionals out of the if statement and make an extra case statement. That could eliminate some checking.
pipelining defeated. Even if it inlines, it won't be able to break up the code inside the actual inlined function at all. This is the basic issue behind all three of these, but with pipelining it obviously causes problems, since for pipelining you want to start executing before you get to the check itself.

Gentle introduction to JIT and dynamic compilation / code generation

The deceptively simple foundation of dynamic code generation within a C/C++ framework has already been covered in another question. Are there any gentle introductions to the topic with code examples?
My eyes are starting to bleed staring at highly intricate open source JIT compilers when my needs are much more modest.
Are there good texts on the subject that don't assume a doctorate in computer science? I'm looking for well worn patterns, things to watch out for, performance considerations, etc. Electronic or tree-based resources can be equally valuable. You can assume a working knowledge of (not just x86) assembly language.
Well, a pattern I've used in emulators goes something like this:
#include <map>
typedef void (*code_ptr)();
// entry_point, generate_code_block() and update_instruction_pointer() come from the rest of the emulator.
unsigned long instruction_pointer = entry_point;
std::map<unsigned long, code_ptr> code_map;
void execute_block() {
code_ptr f;
std::map<unsigned long, code_ptr>::iterator it = code_map.find(instruction_pointer);
if(it != code_map.end()) {
f = it->second;
} else {
f = generate_code_block();
code_map[instruction_pointer] = f;
}
f();
instruction_pointer = update_instruction_pointer();
}
void execute() {
while(true) {
execute_block();
}
}
This is a simplification, but the idea is there. Basically, every time the engine is asked to execute a "basic block" (usually everything up to the next flow-control op, or a whole function if possible), it looks it up to see if it has already been created. If so, it executes it; otherwise it creates it, adds it, and then executes it.
Rinse, repeat :)
As for the code generation, that gets a little complicated, but the idea is to emit a proper "function" which does the work of your basic block in the context of your VM.
EDIT: note that I haven't demonstrated any optimizations either, but you asked for a "gentle introduction"
EDIT 2: I forgot to mention one of the most immediately productive speedups you can implement with this pattern. Basically, if you never remove a block from your tree (you can work around it if you do, but it is way simpler if you never do), then you can "chain" blocks together to avoid lookups. Here's the concept. Whenever you return from f() and are about to do the "update_instruction_pointer", if the block you just executed ended in a call or an unconditional jump, or didn't end in flow control at all, then you can "fix up" its ret instruction with a direct jmp to the next block it'll execute (because it'll always be the same one), provided you have already emitted that block. This way you spend more and more time executing in the VM and less and less in the "execute_block" function.
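Here is a runnable model of the chaining idea. It assumes blocks are C++ callables rather than emitted machine code, so "patching the ret into a jmp" becomes caching a pointer to the successor block. `Block`, `Engine` and `generate` are hypothetical names. The `lookups` counter just shows how many trips through the map are saved:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>

struct Block {
    std::function<std::uint64_t()> run; // executes the block, returns the next guest address
    bool fixedSuccessor = false;        // ends in a call/unconditional jump or no flow control
    Block* chained = nullptr;           // set once the successor is known ("fixed up")
};

struct Engine {
    std::map<std::uint64_t, std::unique_ptr<Block>> cache;          // blocks are never removed
    std::function<std::unique_ptr<Block>(std::uint64_t)> generate;  // hypothetical code generator
    int lookups = 0; // trips through the cache lookup, for comparison

    Block* lookup(std::uint64_t pc) {
        ++lookups;
        auto it = cache.find(pc);
        if (it == cache.end()) it = cache.emplace(pc, generate(pc)).first;
        return it->second.get();
    }

    // Runs `steps` blocks starting at guest address `pc`; returns the address reached.
    std::uint64_t execute(std::uint64_t pc, int steps) {
        Block* previous = nullptr;
        for (int n = 0; n < steps; ++n) {
            Block* current = (previous && previous->chained) ? previous->chained : lookup(pc);
            if (previous && previous->fixedSuccessor && !previous->chained)
                previous->chained = current; // the "fixup": skip the lookup next time
            pc = current->run();
            previous = current;
        }
        return pc;
    }
};
```

With two blocks that jump to each other and never change successor, 100 steps need only 3 lookups instead of 100. Because blocks are never removed from the map, the cached raw pointers stay valid.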
I'm not aware of any sources specifically related to JITs, but I imagine that it's pretty much like a normal compiler, only simpler if you aren't worried about performance.
The easiest way is to start with a VM interpreter. Then, for each VM instruction, generate the assembly code that the interpreter would have executed.
To go beyond that, I imagine that you would parse the VM byte codes and convert them into some sort of suitable intermediate form (three address code? SSA?) and then optimize and generate code as in any other compiler.
For a stack-based VM, it may help to keep track of the "current" stack depth as you translate the byte codes into intermediate form, and treat each stack location as a variable. For example, if you think that the current stack depth is 4, and you see a "push" instruction, you might generate an assignment to "stack_variable_5" and increment a compile-time stack counter, or something like that. An "add" when the stack depth is 5 might generate the code "stack_variable_4 = stack_variable_4+stack_variable_5" and decrement the compile-time stack counter.
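Here is a minimal sketch of that bookkeeping for a hypothetical two-instruction bytecode (`push` and `add`), using the 1-based `stack_variable_N` naming above. The depth is tracked only at translation time and never exists at run time:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical bytecode with just two instructions.
struct Op {
    enum Kind { Push, Add } kind;
    std::string operand; // the pushed value; unused for Add
};

// Name of the variable that stands for stack slot `depth` (1-based).
std::string slot(int depth) { return "stack_variable_" + std::to_string(depth); }

// Translates stack bytecode into three-address assignments by tracking the
// stack depth at translation time and treating each stack slot as a variable.
std::vector<std::string> toThreeAddress(const std::vector<Op>& code) {
    std::vector<std::string> out;
    int depth = 0; // compile-time stack counter
    for (const Op& op : code) {
        if (op.kind == Op::Push) {
            ++depth;
            out.push_back(slot(depth) + " = " + op.operand);
        } else { // Add: combine the top two slots into the lower one
            assert(depth >= 2 && "add needs two operands");
            out.push_back(slot(depth - 1) + " = " + slot(depth - 1) + " + " + slot(depth));
            --depth;
        }
    }
    return out;
}
```

Translating `push X`, `push Y`, `add` yields `stack_variable_1 = X`, `stack_variable_2 = Y`, then `stack_variable_1 = stack_variable_1 + stack_variable_2`.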
It is also possible to translate stack based code into syntax trees. Maintain a compile-time stack. Every "push" instruction causes a representation of the thing being pushed to be stored on the stack. Operators create syntax tree nodes that include their operands. For example, "X Y +" might cause the stack to contain "var(X)", then "var(X) var(Y)" and then the plus pops both var references off and pushes "plus(var(X), var(Y))".
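The same compile-time stack can build trees instead. Here is a sketch assuming a hypothetical postfix token format in which `+` and `*` are the only operators and every other token is a variable:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

struct Node {
    std::string label;                          // "var(X)" for leaves, operator name otherwise
    std::vector<std::shared_ptr<Node>> children;
};

// Renders a tree as text, e.g. plus(var(X), var(Y)).
std::string show(const Node& n) {
    if (n.children.empty()) return n.label;
    std::string s = n.label + "(";
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i) s += ", ";
        s += show(*n.children[i]);
    }
    return s + ")";
}

// Translates postfix tokens such as "X Y +" into a tree using a compile-time stack.
std::shared_ptr<Node> buildTree(const std::string& postfix) {
    std::vector<std::shared_ptr<Node>> stack;
    std::istringstream in(postfix);
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "*") {
            assert(stack.size() >= 2 && "operator needs two operands");
            auto rhs = stack.back(); stack.pop_back();
            auto lhs = stack.back(); stack.pop_back();
            stack.push_back(std::make_shared<Node>(Node{tok == "+" ? "plus" : "times", {lhs, rhs}}));
        } else {
            stack.push_back(std::make_shared<Node>(Node{"var(" + tok + ")", {}}));
        }
    }
    assert(stack.size() == 1 && "expression should leave exactly one tree");
    return stack.back();
}
```

`X Y +` produces `plus(var(X), var(Y))`, matching the example above, and `X Y Z * +` produces `plus(var(X), times(var(Y), var(Z)))`.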
Get yourself a copy of Joel Pobar's book on Rotor (when it's out), and delve through the source to the SSCLI. Beware, insanity lies within :)