HP-UX Itanium Compare and Swap

HP-UX Itanium Compare and Swap - c++

I am developing C/C++ cross-platform code, and the last platform is Itanium based HP-UX. Relevant machine an processor information can be found at the end of the question.
I need to implement or find an atomic compare and swap for the machine and compiler specifications given below.
I have found a few possibilities for solutions, but haven't been able to find how to use them.
The first possible solution is through the use of _Asm_cmpxchg (documentation here). I'm unable to find what header to include for this or how to get it to compile.
The second possible solution is to write my own inline assembly with the direct use of the cmpxchg and cmpxchg8b commands, but I haven't been able to find how to correctly do this either. I've found various resources, most of which are directly writing assembly, not for the processor architecture I require, or don't show a specific enough example.
I found more documentation about cmpxchg and cmpxchg8 instructions (as well as tzcnt and lzcnt which are two that are nice to have, but not necessary) here. If you are viewing in google chrome, abosulte page values are 234 for cmpxchg and 236 for cmpxchg8.
Limitations: I am unable to use a third party library due to constraints beyond my control.
Result of uname -smr: HP-UX B.11.31 ia64
Processor Model: Intel(R) Itanium(R) Processor 9340
Compiler -v: aCC: HP C/aC++ B3910B A.06.28
Update: I was able to get _Asm_cmpxchg to compile, but it doesn't seem to work (the value remains unchanged). For parameters, I passed _SZ_W for the _Asm_sz, _SEM_ACQ for _Asm_sem, _LDHINT_NONE for _Asm_ldhint, a pointer to the original 32 bit integer value for r3, and the desired new value for r2. I'm guessing at the meaning of the parameters, given that documentation is very lackluster.

I ended up finding the solution on my own, using option 1. Below is the sample code to get it to work:
bool compare_and_swap(unsigned int* var, unsigned int oldval, unsigned int newval)
{
// Move the old value into register _AREG_CCV because this is the register
// that var will be compared against
_Asm_mov_to_ar(_AREG_CCV, oldval);
// Do the compare and swap
return oldval == _Asm_cmpxchg(
_SZ_W /* 4 byte word */,
_SEM_ACQ /* acquire the semaphore */,
var,
newval,
_LDHINT_NONE /* locality hint */);
}

Related

GCC Assembly "+t"

I'm currently testing some inline assembly in C++ on an old compiler (GCC circa 2004) and I wanted to perform the square root function on a floating point number. After trying and searching for a successful method, I came across the following code
float r3(float n){
__asm__("fsqrt" : "+t" (n));
return n;
};
which worked. The issue is, even though I understand the assembly instructions used, I'm unable to find any particular documentation as to what the "+t" flag means on the n variable. I'm under the genuine idea that it seems to be a manner by which to treat the variable n as both the input and output variable but I was unable to find any information on it. So, what exactly is the "t" flag and how does it work here?

+
Means that this operand is both read and written by the instruction.
(From here)
t
Top of 80387 floating-point stack (%st(0)).
(From here)

+ means you are reading and writing the register.
t means the value is on the top of the 80387 floating point stack.
References:
GCC manual, Extended Asm has general information about constraints - search for "constraints"
GCC manual, Machine Constraints has information about the specific constraints supported on each architecture - search for "x86 family"

How can I utilize the 'red' and 'atom' PTX instructions in CUDA C++ code?

The CUDA PTX Guide describes the instructions 'atom' and 'red', which perform atomic and non-atomic reductions. This is news to me (at least with respect to non-atomic reductions)... I remember learning how to do reductions with SHFL a while back. Are these instructions reflected or wrapped somehow in CUDA runtime APIs? Or some other way accessible with C++ code without actually writing PTX code?

Are these instructions reflected or wrapped somehow in CUDA runtime APIs? Or some other way accessible with C++ code without actually writing PTX code?
Most of these instructions are reflected in atomic operations (built-in intrinsics) described in the programming guide. If you compile any of those atomic intrinsics, you will find atom or red instructions emitted by the compiler at the PTX or SASS level in your generated code.
The red instruction type will generally be used when you don't explicitly use the return value from from one of the atomic intrinsics. If you use the return value explicitly, then the compiler usually emits the atom instruction.
Thus, it should be clear that this instruction by itself does not perform a complete classical parallel reduction, but certainly could be used to implement one if you wanted to depend on atomic hardware (and associated limitations) for your reduction operations. This is generally not the fastest possible implementation for parallel reductions.
If you want direct access to these instructions, the usual advice would be to use inline PTX where desired.
As requested, to elaborate using atomicAdd() as an example:
If I perform the following:
atomicAdd(&x, data);
perhaps because I am using it for a typical atomic-based reduction into the device variable x, then the compiler would emit a red (PTX) or RED (SASS) instruction taking the necessary arguments (the pointer to x and the variable data, i.e. 2 logical registers).
If I perform the following:
int offset = atomicAdd(&buffer_ptr, buffer_size);
perhaps because I am using it not for a typical reduction but instead to reserve a space (buffer_size) in a buffer shared amongst various threads in the grid, which has an offset index (buffer_ptr) to the next available space in the shared buffer, then the compiler would emit a atom (PTX) or ATOM (SASS) instruction, including 3 arguments (offset, &buffer_ptr, and buffer_size, in registers).
The red form can be issued by the thread/warp which may then continue and not normally stall due to this instruction issue which will normally have no dependencies for subsequent instructions. The atom form OTOH will imply modification of one of its 3 arguments (one of 3 logical registers). Therefore subsequent use of the data in that register (i.e. the return value of the intrinsic, i.e. offset in this case) can result in a thread/warp stall, until the return value is actually returned by the atomic hardware.

C++ self erasing code [duplicate]

I was reading this question because I'm trying to find the size of a function in a C++ program, It is hinted at that there may be a way that is platform specific. My targeted platform is windows
The method I currently have in my head is the following:
1. Obtain a pointer to the function
2. Increment the Pointer (& counter) until I reach the machine code value for ret
3. The counter will be the size of the function?
Edit1: To clarify what I mean by 'size' I mean the number of bytes (machine code) that make up the function.
Edit2: There have been a few comments asking why or what do I plan to do with this. The honest answer is I have no intention, and I can't really see the benefits of knowing a functions length pre-compile time. (although I'm sure there are some)
This seems like a valid method to me, will this work?

Wow, I use function size counting all the time and it has lots and lots of uses. Is it reliable? No way. Is it standard c++? No way. But that's why you need to check it in the disassembler to make sure it worked, every time that you release a new version. Compiler flags can mess up the ordering.
static void funcIwantToCount()
{
// do stuff
}
static void funcToDelimitMyOtherFunc()
{
__asm _emit 0xCC
__asm _emit 0xCC
__asm _emit 0xCC
__asm _emit 0xCC
}
int getlength( void *funcaddress )
{
int length = 0;
for(length = 0; *((UINT32 *)(&((unsigned char *)funcaddress)[length])) != 0xCCCCCCCC; ++length);
return length;
}
It seems to work better with static functions. Global optimizations can kill it.
P.S. I hate people, asking why you want to do this and it's impossible, etc. Stop asking these questions, please. Makes you sound stupid. Programmers are often asked to do non-standard things, because new products almost always push the limits of what's availble. If they don't, your product is probably a rehash of what's already been done. Boring!!!

No, this will not work:
There is no guarantee that your function only contains a single ret instruction.
Even if it only does contain a single ret, you can't just look at the individual bytes - because the corresponding value could appear as simply a value, rather than an instruction.
The first problem can possibly be worked around if you restrict your coding style to, say, only have a single point of return in your function, but the other basically requires a disassembler so you can tell the individual instructions apart.

It is possible to obtain all blocks of a function, but is an unnatural question to ask what is the 'size' of a function. Optimized code will rearrange code blocks in the order of execution and will move seldom used blocks (exception paths) into outer parts of the module. For more details, see Profile-Guided Optimizations for example how Visual C++ achieves this in link time code generation. So a function can start at address 0x00001000, branch at 0x00001100 into a jump at 0x20001000 and a ret, and have some exception handling code 0x20001000. At 0x00001110 another function starts. What is the 'size' of your function? It does span from 0x00001000 to +0x20001000, but it 'owns' only few blocks in that span. So your question should be unasked.
There are other valid questions in this context, like the total number of instructions a function has (can be determined from the program symbol database and from the image), and more importantly, what is the number of instructions in the frequent executed code path inside the function. All these are questions normally asked in the context of performance measurement and there are tools that instrument code and can give very detailed answers.
Chasing pointers in memory and searching for ret will get you nowhere I'm afraid. Modern code is way way way more complex than that.

This won't work... what if there's a jump, a dummy ret, and then the target of the jump? Your code will be fooled.
In general, it's impossible to do this with 100% accuracy because you have to predict all code paths, which is like solving the halting problem. You can get "pretty good" accuracy if you implement your own disassembler, but no solution will be nearly as easy as you imagine.
A "trick" would be to find out which function's code is after the function that you're looking for, which would give pretty good results assuming certain (dangerous) assumptions. But then you'd have to know what function comes after your function, which, after optimizations, is pretty hard to figure out.
Edit 1:
What if the function doesn't even end with a ret instruction at all? It could very well just jmp back to its caller (though it's unlikely).
Edit 2:
Don't forget that x86, at least, has variable-length instructions...
Update:
For those saying that flow analysis isn't the same as solving the halting problem:
Consider what happens when you have code like:
foo:
....
jmp foo
You will have to follow the jump each time to figure out the end of the function, and you cannot ignore it past the first time because you don't know whether or not you're dealing with self-modifying code. (You could have inline assembly in your C++ code that modifies itself, for instance.) It could very well extend to some other place of memory, so your analyzer will (or should) end in an infinite loop, unless you tolerate false negatives.
Isn't that like the halting problem?

I'm posting this to say two things:
1) Most of the answers given here are really bad and will break easily. If you use the C function pointer (using the function name), in a debug build of your executable, and possibly in other circumstances, it may point to a JMP shim that will not have the function body itself. Here's an example. If I do the following for the function I defined below:
FARPROC pfn = (FARPROC)some_function_with_possibility_to_get_its_size_at_runtime;
the pfn I get (for example: 0x7FF724241893) will point to this, which is just a JMP instruction:
Additionally, a compiler can nest several of those shims, or branch your function code so that it will have multiple epilogs, or ret instructions. Heck, it may not even use a ret instruction. Then, there's no guarantee that functions themselves will be compiled and linked in the order you define them in the source code.
You can do all that stuff in assembly language, but not in C or C++.
2) So that above was the bad news. The good news is that the answer to the original question is, yes, there's a way (or a hack) to get the exact function size, but it comes with the following limitations:
It works in 64-bit executables on Windows only.
It is obviously Microsoft specific and is not portable.
You have to do this at run-time.
The concept is simple -- utilize the way SEH is implemented in x64 Windows binaries. Compiler adds details of each function into the PE32+ header (into the IMAGE_DIRECTORY_ENTRY_EXCEPTION directory of the optional header) that you can use to obtain the exact function size. (In case you're wondering, this information is used for catching, handling and unwinding of exceptions in the __try/__except/__finally blocks.)
Here's a quick example:
//You will have to call this when your app initializes and then
//cache the size somewhere in the global variable because it will not
//change after the executable image is built.
size_t fn_size; //Will receive function size in bytes, or 0 if error
some_function_with_possibility_to_get_its_size_at_runtime(&fn_size);
and then:
#include <Windows.h>
//The function itself has to be defined for two types of a call:
// 1) when you call it just to get its size, and
// 2) for its normal operation
bool some_function_with_possibility_to_get_its_size_at_runtime(size_t* p_getSizeOnly = NULL)
{
//This input parameter will define what we want to do:
if(!p_getSizeOnly)
{
//Do this function's normal work
//...
return true;
}
else
{
//Get this function size
//INFO: Works only in 64-bit builds on Windows!
size_t nFnSz = 0;
//One of the reasons why we have to do this at run-time is
//so that we can get the address of a byte inside
//the function body... we'll get it as this thread context:
CONTEXT context = {0};
RtlCaptureContext(&context);
DWORD64 ImgBase = 0;
RUNTIME_FUNCTION* pRTFn = RtlLookupFunctionEntry(context.Rip, &ImgBase, NULL);
if(pRTFn)
{
nFnSz = pRTFn->EndAddress - pRTFn->BeginAddress;
}
*p_getSizeOnly = nFnSz;
return false;
}
}

This can work in very limited scenarios. I use it in part of a code injection utility I wrote. I don't remember where I found the information, but I have the following (C++ in VS2005):
#pragma runtime_checks("", off)
static DWORD WINAPI InjectionProc(LPVOID lpvParameter)
{
// do something
return 0;
}
static DWORD WINAPI InjectionProcEnd()
{
return 0;
}
#pragma runtime_checks("", on)
And then in some other function I have:
size_t cbInjectionProc = (size_t)InjectionProcEnd - (size_t)InjectionProc;
You have to turn off some optimizations and declare the functions as static to get this to work; I don't recall the specifics. I don't know if this is an exact byte count, but it is close enough. The size is only that of the immediate function; it doesn't include any other functions that may be called by that function. Aside from extreme edge cases like this, "the size of a function" is meaningless and useless.

The real solution to this is to dig into your compiler's documentation. The ARM compiler we use can be made to produce an assembly dump (code.dis), from which it's fairly trivial to subtract the offsets between a given mangled function label and the next mangled function label.
I'm not certain which tools you will need for this with a windows target, however. It looks like the tools listed in the answer to this question might be what you're looking for.
Also note that I (working in the embedded space) assumed you were talking about post-compile-analysis. It still might be possible to examine these intermediate files programmatically as part of a build provided that:
The target function is in a different object
The build system has been taught the dependencies
You know for sure that the compiler will build these object files
Note that I'm not sure entirely WHY you want to know this information. I've needed it in the past to be sure that I can fit a particular chunk of code in a very particular place in memory. I have to admit I'm curious what purpose this would have on a more general desktop-OS target.

In C++, the there is no notion of function size. In addition to everything else mentioned, preprocessor macros also make for an indeterminate size. If you want to count number of instruction words, you can't do that in C++, because it doesn't exist until it's been compiled.

What do you mean "size of a function"?
If you mean a function pointer than it is always just 4 bytes for 32bits systems.
If you mean the size of the code than you should just disassemble generated code and find the entry point and closest ret call. One way to do it is to read the instruction pointer register at the beginning and at the end of your function.
If you want to figure out the number of instructions called in the average case for your function you can use profilers and divide the number of retired instructions on the number of calls.

I think it will work on windows programs created with msvc, as for branches the 'ret' seems to always come at the end (even if there are branches that return early it does a jne to go the end).
However you will need some kind of disassembler library to figure the current opcode length as they are variable length for x86. If you don't do this you'll run into false positives.
I would not be surprised if there are cases this doesn't catch.

There is no facilities in Standard C++ to obtain the size or length of a function.
See my answer here: Is it possible to load a function into some allocated memory and run it from there?
In general, knowing the size of a function is used in embedded systems when copying executable code from a read-only source (or a slow memory device, such as a serial Flash) into RAM. Desktop and other operating systems load functions into memory using other techniques, such as dynamic or shared libraries.

Just set PAGE_EXECUTE_READWRITE at the address where you got your function. Then read every byte. When you got byte "0xCC" it means that the end of function is actual_reading_address - 1.

Using GCC, not so hard at all.
void do_something(void) {
printf("%s!", "Hello your name is Cemetech");
do_something_END:
}
...
printf("size of function do_something: %i", (int)(&&do_something_END - (int)do_something));

below code the get the accurate function block size, it works fine with my test
runtime_checks disable _RTC_CheckEsp in debug mode
#pragma runtime_checks("", off)
DWORD __stdcall loadDll(char* pDllFullPath)
{
OutputDebugStringA(pDllFullPath);
//OutputDebugStringA("loadDll...................\r\n");
return 0;
//return test(pDllFullPath);
}
#pragma runtime_checks("", restore)
DWORD __stdcall getFuncSize_loadDll()
{
DWORD maxSize=(PBYTE)getFuncSize_loadDll-(PBYTE)loadDll;
PBYTE pTail=(PBYTE)getFuncSize_loadDll-1;
while(*pTail != 0xC2 && *pTail != 0xC3) --pTail;
if (*pTail==0xC2)
{ //0xC3 : ret
//0xC2 04 00 : ret 4
pTail +=3;
}
return pTail-(PBYTE)loadDll;
};

The non-portable, but API-based and correctly working approach is to use program database readers - like dbghelp.dll on Windows or readelf on Linux. The usage of those is only possible if debug info is enabled/present along with the program. Here's an example on how it works on Windows:
SYMBOL_INFO symbol = { };
symbol.SizeOfStruct = sizeof(SYMBOL_INFO);
// Implies, that the module is loaded into _dbg_session_handle, see ::SymInitialize & ::SymLoadModule64
::SymFromAddr(_dbg_session_handle, address, 0, &symbol);
You will get the size of the function in symbol.Size, but you may also need additional logic identifying whether the address given is a actually a function, a shim placed there by incremental linker or a DLL call thunk (same thing).
I guess somewhat similar can be done via readelf on Linux, but maybe you'll have to come up with the library on top of its sourcecode...
You must bear in mind that although disassembly-based approach is possible, you'll basically have to analyze a directed graph with endpoints in ret, halt, jmp (PROVIDED you have incremental linking enabled and you're able to read jmp-table to identify whether the jmp you're facing in function is internal to that function (missing in image's jmp-table) or external (present in that table; such jmps frequently occur as part of tail-call optimization on x64, as I know)), any calls that are meant to be nonret (like an exception generating helper), etc.

It's an old question but still...
For Windows x64, functions all have a function table, which contains the offset and the size of the function. https://learn.microsoft.com/en-us/windows/win32/debug/pe-format . This function table is used for unwinding when an exception is thrown.
That said, this doesn't contain information like inlining, and all the other issues that people already noted...

int GetFuncSizeX86(unsigned char* Func)
{
if (!Func)
{
printf("x86Helper : Function Ptr NULL\n");
return 0;
}
for (int count = 0; ; count++)
{
if (Func[count] == 0xC3)
{
unsigned char prevInstruc = *(Func - 1);
if (Func[1] == 0xCC // int3
|| prevInstruc == 0x5D// pop ebp
|| prevInstruc == 0x5B// pop ebx
|| prevInstruc == 0x5E// pop esi
|| prevInstruc == 0x5F// pop edi
|| prevInstruc == 0xCC// int3
|| prevInstruc == 0xC9)// leave
return count++;
}
}
}
you could use this assumming you are in x86 or x86_64

Get SSE version without __asm on x64

I'm trying to build slightly modified versions of some functions of the VS2010 CRT library, all is well except for the parts where it tries to access a global variable which presumably holds the instruction set architecture version (ISA):
if (__isa_available > __ISA_AVAILABLE_SSE2)
{
// ...
}
else if (__isa_available == __ISA_AVAILABLE_SSE2)
{
// ...
}
The values it should hold I found in an assembly file
__ISA_AVAILABLE_X86 equ 0
__ISA_AVAILABLE_SSE2 equ 1
__ISA_AVAILABLE_SSE42 equ 2
__ISA_AVAILABLE_AVX equ 3
How and where __isa_available is assigned a value is nowhere to be found (I've tried a find-in-files in all my directories...)
MSDN refers to the CPUID example to determine the instruction set. The problem with that is it uses __asm blocks and those are not allowed in my x64 build.
Does anyone knows how to quickly assign the correct value to __isa_available?

Microsoft decided to stop the support of inline assembly. But they introduced a new format. You can find more information about CPUID in the new format here (with example).
The advantage of intrinsics over inline assembly is that they are compatible with both x86 and x64 without additional code and are easier to use.

VC++ has an intrinsic that allows you to use CPUID without inline ASM:
__cpuid in intrin.h
On that same website is an extensive code sample, too.

Getting The Size of a C++ Function

I was reading this question because I'm trying to find the size of a function in a C++ program, It is hinted at that there may be a way that is platform specific. My targeted platform is windows
The method I currently have in my head is the following:
1. Obtain a pointer to the function
2. Increment the Pointer (& counter) until I reach the machine code value for ret
3. The counter will be the size of the function?
Edit1: To clarify what I mean by 'size' I mean the number of bytes (machine code) that make up the function.
Edit2: There have been a few comments asking why or what do I plan to do with this. The honest answer is I have no intention, and I can't really see the benefits of knowing a functions length pre-compile time. (although I'm sure there are some)
This seems like a valid method to me, will this work?

Wow, I use function size counting all the time and it has lots and lots of uses. Is it reliable? No way. Is it standard c++? No way. But that's why you need to check it in the disassembler to make sure it worked, every time that you release a new version. Compiler flags can mess up the ordering.
static void funcIwantToCount()
{
// do stuff
}
static void funcToDelimitMyOtherFunc()
{
__asm _emit 0xCC
__asm _emit 0xCC
__asm _emit 0xCC
__asm _emit 0xCC
}
int getlength( void *funcaddress )
{
int length = 0;
for(length = 0; *((UINT32 *)(&((unsigned char *)funcaddress)[length])) != 0xCCCCCCCC; ++length);
return length;
}
It seems to work better with static functions. Global optimizations can kill it.
P.S. I hate people, asking why you want to do this and it's impossible, etc. Stop asking these questions, please. Makes you sound stupid. Programmers are often asked to do non-standard things, because new products almost always push the limits of what's availble. If they don't, your product is probably a rehash of what's already been done. Boring!!!

No, this will not work:
There is no guarantee that your function only contains a single ret instruction.
Even if it only does contain a single ret, you can't just look at the individual bytes - because the corresponding value could appear as simply a value, rather than an instruction.
The first problem can possibly be worked around if you restrict your coding style to, say, only have a single point of return in your function, but the other basically requires a disassembler so you can tell the individual instructions apart.

It is possible to obtain all blocks of a function, but is an unnatural question to ask what is the 'size' of a function. Optimized code will rearrange code blocks in the order of execution and will move seldom used blocks (exception paths) into outer parts of the module. For more details, see Profile-Guided Optimizations for example how Visual C++ achieves this in link time code generation. So a function can start at address 0x00001000, branch at 0x00001100 into a jump at 0x20001000 and a ret, and have some exception handling code 0x20001000. At 0x00001110 another function starts. What is the 'size' of your function? It does span from 0x00001000 to +0x20001000, but it 'owns' only few blocks in that span. So your question should be unasked.
There are other valid questions in this context, like the total number of instructions a function has (can be determined from the program symbol database and from the image), and more importantly, what is the number of instructions in the frequent executed code path inside the function. All these are questions normally asked in the context of performance measurement and there are tools that instrument code and can give very detailed answers.
Chasing pointers in memory and searching for ret will get you nowhere I'm afraid. Modern code is way way way more complex than that.

This won't work... what if there's a jump, a dummy ret, and then the target of the jump? Your code will be fooled.
In general, it's impossible to do this with 100% accuracy because you have to predict all code paths, which is like solving the halting problem. You can get "pretty good" accuracy if you implement your own disassembler, but no solution will be nearly as easy as you imagine.
A "trick" would be to find out which function's code is after the function that you're looking for, which would give pretty good results assuming certain (dangerous) assumptions. But then you'd have to know what function comes after your function, which, after optimizations, is pretty hard to figure out.
Edit 1:
What if the function doesn't even end with a ret instruction at all? It could very well just jmp back to its caller (though it's unlikely).
Edit 2:
Don't forget that x86, at least, has variable-length instructions...
Update:
For those saying that flow analysis isn't the same as solving the halting problem:
Consider what happens when you have code like:
foo:
....
jmp foo
You will have to follow the jump each time to figure out the end of the function, and you cannot ignore it past the first time because you don't know whether or not you're dealing with self-modifying code. (You could have inline assembly in your C++ code that modifies itself, for instance.) It could very well extend to some other place of memory, so your analyzer will (or should) end in an infinite loop, unless you tolerate false negatives.
Isn't that like the halting problem?

I'm posting this to say two things:
1) Most of the answers given here are really bad and will break easily. If you use the C function pointer (using the function name), in a debug build of your executable, and possibly in other circumstances, it may point to a JMP shim that will not have the function body itself. Here's an example. If I do the following for the function I defined below:
FARPROC pfn = (FARPROC)some_function_with_possibility_to_get_its_size_at_runtime;
the pfn I get (for example: 0x7FF724241893) will point to this, which is just a JMP instruction:
Additionally, a compiler can nest several of those shims, or branch your function code so that it will have multiple epilogs, or ret instructions. Heck, it may not even use a ret instruction. Then, there's no guarantee that functions themselves will be compiled and linked in the order you define them in the source code.
You can do all that stuff in assembly language, but not in C or C++.
2) So that above was the bad news. The good news is that the answer to the original question is, yes, there's a way (or a hack) to get the exact function size, but it comes with the following limitations:
It works in 64-bit executables on Windows only.
It is obviously Microsoft specific and is not portable.
You have to do this at run-time.
The concept is simple -- utilize the way SEH is implemented in x64 Windows binaries. Compiler adds details of each function into the PE32+ header (into the IMAGE_DIRECTORY_ENTRY_EXCEPTION directory of the optional header) that you can use to obtain the exact function size. (In case you're wondering, this information is used for catching, handling and unwinding of exceptions in the __try/__except/__finally blocks.)
Here's a quick example:
//You will have to call this when your app initializes and then
//cache the size somewhere in the global variable because it will not
//change after the executable image is built.
size_t fn_size; //Will receive function size in bytes, or 0 if error
some_function_with_possibility_to_get_its_size_at_runtime(&fn_size);
and then:
#include <Windows.h>
//The function itself has to be defined for two types of a call:
// 1) when you call it just to get its size, and
// 2) for its normal operation
bool some_function_with_possibility_to_get_its_size_at_runtime(size_t* p_getSizeOnly = NULL)
{
//This input parameter will define what we want to do:
if(!p_getSizeOnly)
{
//Do this function's normal work
//...
return true;
}
else
{
//Get this function size
//INFO: Works only in 64-bit builds on Windows!
size_t nFnSz = 0;
//One of the reasons why we have to do this at run-time is
//so that we can get the address of a byte inside
//the function body... we'll get it as this thread context:
CONTEXT context = {0};
RtlCaptureContext(&context);
DWORD64 ImgBase = 0;
RUNTIME_FUNCTION* pRTFn = RtlLookupFunctionEntry(context.Rip, &ImgBase, NULL);
if(pRTFn)
{
nFnSz = pRTFn->EndAddress - pRTFn->BeginAddress;
}
*p_getSizeOnly = nFnSz;
return false;
}
}

This can work in very limited scenarios. I use it in part of a code injection utility I wrote. I don't remember where I found the information, but I have the following (C++ in VS2005):
#pragma runtime_checks("", off)
static DWORD WINAPI InjectionProc(LPVOID lpvParameter)
{
// do something
return 0;
}
static DWORD WINAPI InjectionProcEnd()
{
return 0;
}
#pragma runtime_checks("", on)
And then in some other function I have:
size_t cbInjectionProc = (size_t)InjectionProcEnd - (size_t)InjectionProc;
You have to turn off some optimizations and declare the functions as static to get this to work; I don't recall the specifics. I don't know if this is an exact byte count, but it is close enough. The size is only that of the immediate function; it doesn't include any other functions that may be called by that function. Aside from extreme edge cases like this, "the size of a function" is meaningless and useless.

The real solution to this is to dig into your compiler's documentation. The ARM compiler we use can be made to produce an assembly dump (code.dis), from which it's fairly trivial to subtract the offsets between a given mangled function label and the next mangled function label.
I'm not certain which tools you will need for this with a windows target, however. It looks like the tools listed in the answer to this question might be what you're looking for.
Also note that I (working in the embedded space) assumed you were talking about post-compile-analysis. It still might be possible to examine these intermediate files programmatically as part of a build provided that:
The target function is in a different object
The build system has been taught the dependencies
You know for sure that the compiler will build these object files
Note that I'm not sure entirely WHY you want to know this information. I've needed it in the past to be sure that I can fit a particular chunk of code in a very particular place in memory. I have to admit I'm curious what purpose this would have on a more general desktop-OS target.

In C++, the there is no notion of function size. In addition to everything else mentioned, preprocessor macros also make for an indeterminate size. If you want to count number of instruction words, you can't do that in C++, because it doesn't exist until it's been compiled.

What do you mean "size of a function"?
If you mean a function pointer than it is always just 4 bytes for 32bits systems.
If you mean the size of the code than you should just disassemble generated code and find the entry point and closest ret call. One way to do it is to read the instruction pointer register at the beginning and at the end of your function.
If you want to figure out the number of instructions called in the average case for your function you can use profilers and divide the number of retired instructions on the number of calls.

I think it will work on windows programs created with msvc, as for branches the 'ret' seems to always come at the end (even if there are branches that return early it does a jne to go the end).
However you will need some kind of disassembler library to figure the current opcode length as they are variable length for x86. If you don't do this you'll run into false positives.
I would not be surprised if there are cases this doesn't catch.

There is no facilities in Standard C++ to obtain the size or length of a function.
See my answer here: Is it possible to load a function into some allocated memory and run it from there?
In general, knowing the size of a function is used in embedded systems when copying executable code from a read-only source (or a slow memory device, such as a serial Flash) into RAM. Desktop and other operating systems load functions into memory using other techniques, such as dynamic or shared libraries.

Just set PAGE_EXECUTE_READWRITE at the address where you got your function. Then read every byte. When you got byte "0xCC" it means that the end of function is actual_reading_address - 1.

Using GCC, not so hard at all.
void do_something(void) {
printf("%s!", "Hello your name is Cemetech");
do_something_END:
}
...
printf("size of function do_something: %i", (int)(&&do_something_END - (int)do_something));

below code the get the accurate function block size, it works fine with my test
runtime_checks disable _RTC_CheckEsp in debug mode
#pragma runtime_checks("", off)
DWORD __stdcall loadDll(char* pDllFullPath)
{
OutputDebugStringA(pDllFullPath);
//OutputDebugStringA("loadDll...................\r\n");
return 0;
//return test(pDllFullPath);
}
#pragma runtime_checks("", restore)
DWORD __stdcall getFuncSize_loadDll()
{
DWORD maxSize=(PBYTE)getFuncSize_loadDll-(PBYTE)loadDll;
PBYTE pTail=(PBYTE)getFuncSize_loadDll-1;
while(*pTail != 0xC2 && *pTail != 0xC3) --pTail;
if (*pTail==0xC2)
{ //0xC3 : ret
//0xC2 04 00 : ret 4
pTail +=3;
}
return pTail-(PBYTE)loadDll;
};

The non-portable, but API-based and correctly working approach is to use program database readers - like dbghelp.dll on Windows or readelf on Linux. The usage of those is only possible if debug info is enabled/present along with the program. Here's an example on how it works on Windows:
SYMBOL_INFO symbol = { };
symbol.SizeOfStruct = sizeof(SYMBOL_INFO);
// Implies, that the module is loaded into _dbg_session_handle, see ::SymInitialize & ::SymLoadModule64
::SymFromAddr(_dbg_session_handle, address, 0, &symbol);
You will get the size of the function in symbol.Size, but you may also need additional logic identifying whether the address given is a actually a function, a shim placed there by incremental linker or a DLL call thunk (same thing).
I guess somewhat similar can be done via readelf on Linux, but maybe you'll have to come up with the library on top of its sourcecode...
You must bear in mind that although disassembly-based approach is possible, you'll basically have to analyze a directed graph with endpoints in ret, halt, jmp (PROVIDED you have incremental linking enabled and you're able to read jmp-table to identify whether the jmp you're facing in function is internal to that function (missing in image's jmp-table) or external (present in that table; such jmps frequently occur as part of tail-call optimization on x64, as I know)), any calls that are meant to be nonret (like an exception generating helper), etc.

It's an old question but still...
For Windows x64, functions all have a function table, which contains the offset and the size of the function. https://learn.microsoft.com/en-us/windows/win32/debug/pe-format . This function table is used for unwinding when an exception is thrown.
That said, this doesn't contain information like inlining, and all the other issues that people already noted...

int GetFuncSizeX86(unsigned char* Func)
{
if (!Func)
{
printf("x86Helper : Function Ptr NULL\n");
return 0;
}
for (int count = 0; ; count++)
{
if (Func[count] == 0xC3)
{
unsigned char prevInstruc = *(Func - 1);
if (Func[1] == 0xCC // int3
|| prevInstruc == 0x5D// pop ebp
|| prevInstruc == 0x5B// pop ebx
|| prevInstruc == 0x5E// pop esi
|| prevInstruc == 0x5F// pop edi
|| prevInstruc == 0xCC// int3
|| prevInstruc == 0xC9)// leave
return count++;
}
}
}
you could use this assumming you are in x86 or x86_64

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js