Is it possible to see builtin function definition? - c++

This is pure curiosity. I'm using a cmath function (cos, acos, whatever).
After looking at the cmath header, I see this kind of thing:
inline _GLIBCXX_CONSTEXPR float
acos(float __x)
{ return __builtin_acosf(__x); }
There is that "__builtin_" prefix, and I can't find any source for it with Google.
I'm interested in seeing the source code of how a math function is implemented. I guess they use Taylor series and the like, but I wanted to see how they actually do it. Is it hidden for proprietary reasons, or can it be found?

GCC __builtin_ functions are not actual functions: they are not declared and implemented anywhere as such. Instead, the compiler recognizes them and maps them to some implementation. See builtins.h, builtins.c and builtins.def in the GCC source tree. It is rather hard to track down where the actual implementations reside, but it seems that these mathematical functions are taken from the libc implementation you are using. For example, digging through the glibc source code, one can find at least a couple of implementations of __ieee754_acosf (which seems to then be aliased and wrapped by other functions): one in C, e_acosf.c, and one in x86 assembly, e_acosf.S (backup links, since these are unofficial GitHub mirrors: e_acosf.c, e_acosf.S). You could say that readability was not their first priority. You can find similar code in the musl libc source tree: acosf.c in C and acosf.s in x86 assembly.

Related

How to put inline functions in the C++ source file?

How can I force the inlining of a function, but define it in a C++ file?
This is a question that's been asked in the past, for example here: Moving inline methods from a header file to a .cpp files
The answers there, in short, go as follows: "inline used to mean [remove function call overhead at the expense of .text size], now it means [relax ODR], so don't use inline for anything that's not ODR related, the compiler knows better".
I'm aware of that, however in my somewhat exotic case, I don't care about performance.
I'm programming an embedded device and, should someone break through the other layers of security, I want to make it as obnoxious as possible to reverse engineer this part of the code, and one thing this implies is that I don't want function calls (that aren't called numerous times anyway) to expose the function boundaries, which are natural delimitations of pieces of code that achieve something on their own.
However, I would also like to keep my code orderly and not have code in my header files.
I see that I can use __attribute__((always_inline)) to force inlining, but then I get warnings if those functions aren't also declared inline: warning: always_inline function might not be inlinable [-Wattributes]
Suppressing the attributes warning is an option, but I'd rather take it only once I'm sure there is no clean way to do this.
Hence the question: how can I have a forcibly inlined function whose declaration is in a header but whose definition is in a source file, without suppressing all attributes warnings? Is that impossible?
Inlining can only be requested, sometimes a bit forcefully. But you can never guarantee that the function WILL actually be inlined, for reasons that are sometimes quite obscure.
Here is what the MSVC documentation says:
The compiler treats the inline expansion options and keywords as suggestions. There's no guarantee that functions will be inlined. You can't force the compiler to inline a particular function, even with the __forceinline keyword. When compiling with /clr, the compiler won't inline a function if there are security attributes applied to the function.
The ISO C++ FAQ says:
No matter how you designate a function as inline, it is a request that the compiler is allowed to ignore: the compiler might inline-expand some, all, or none of the places where you call a function designated as inline.
The GCC documentation is a bit less crystal-clear about non-inlinable functions, but such cases exist anyway.
The only "real" way to force inlining is quite ugly, since it relies on doing the inlining before compilation... Yes, old-style preprocessor macros. The Evil Itself. Or a dirty hack with an #include replacing the function call (and inserting C++ code instead)... That may be a bit safer than a macro with regard to double evaluation, but the other side effects can be even worse, since it must rely on "global" variables to work.
Is it worth the pain? Probably not. In particular not for "obfuscation", because it won't be as "secure" as you think it will be. Yes, an explicit function call is easier to trace. But it won't change anything: reverse engineering doesn't depend on that. In fact, obfuscation is almost never a good (or even working...) solution. I used to think it was... a long, very long time ago. I proved to myself that it was nearly useless, on my own "secured" code. Breaking the code took me much less time than it took me to "protect" it...

Which functions are affected by -fno-math-errno?

I have been excited by this post: https://stackoverflow.com/a/57674631/2492801 and I consider using -fno-math-errno. But I would like to be sure that I do not harm the behaviour of the software I am working on.
Therefore I have checked the (rather large) codebase to see where errno is being used and I wanted to decide whether these usages interfere with -fno-math-errno. But how to do that? The documentation says:
-fno-math-errno
Do not set errno after calling math functions that are executed with a single instruction, e.g., sqrt...
But how can I know which math functions are executed with a single instruction? Is this documented somewhere? Where?
It seems as if the codebase I use relies on errno especially when calling strtol and when working with streams. I guess that strtol is not executed with a single instruction. Is it considered to be a math function at all? How can I be sure?
You can find the list of functions affected by -fno-math-errno in GCC's builtins.def (search for "ERRNO"). It seems that only some functions from the math.h header (cos, sin, exp, etc.) are affected. The treatment of other standard functions that use errno (strtol, etc.) does not change under this flag.

Function with bool return value, only set 1 byte of the entire register

I have the following piece of code, which is part of an API (cdecl). In MSVC++, sizeof(bool) is 1 byte, but since the size of bool is implementation-defined, programs built with another compiler, or whose author declared the function signature incorrectly, may treat bool as wider than 1 byte. On their side, the check below may then appear to return true.
virtual bool isValid()
{
return false;
// ^ code above in asm: xor al, al
}
To avoid this, I put an inline asm xor eax, eax before the return, but it feels a bit hacky, and of course it will not work on x64 due to the lack of inline assembler support.
Using #define bool int would work, but it is not what I want: I have structs with bool members, and redefining bool would corrupt their layout.
Is there anything like an intrinsic that can zero the eax/rax register, or anything else that can solve this problem?
There's nothing that will do what you're asking for. Your problem needs a much different solution.
First, any code that "incorrectly define[s the] function signature" is broken and needs to be fixed. Working around it in other code is never the solution.
Next, your problem is much bigger than just bool being implementation-defined: the C++ standard leaves a whole host of things implementation-defined. So much so that two different C++ compilers rarely have compatible ABIs. If your code provides C++ interfaces for use by code compiled by other people, you'll probably need to produce separately compiled binaries, whether in the form of object files, static libraries, DLLs or executables, for each different compiler you want to support. In fact, you may need to provide separate binaries for each version of each compiler.
There are two C++ compilers that try to be compatible with the Microsoft C++ ABI. The first is Intel's C++ compiler and the second is the Windows port of clang. The clang implementation is notably still a work in progress. You may still need to create separate versions for each version of the Microsoft C/C++ runtime libraries your code is compiled with.
You can potentially reduce the number of different binaries that you need to distribute by providing a pure C interface to your code. A pure C interface means using only C data types and only functions declared as extern "C". While things like classes, member functions, templates, RTTI and exceptions can be used in your implementation, they can't be used as part of your public interface. An exception is COM-like interfaces: classes with nothing but public pure virtual functions. Since C compilers for Windows all use essentially the same C ABI and support COM interfaces, compatibility problems are less likely. However, the bool type (actually the _Bool type in C) is probably not safe to use, since it's a relatively recent addition to the C language. Use int in your C interfaces instead.
Note that because of C/C++ runtime differences, even if all you want to do is distribute compiled binaries for use with Microsoft's Visual C++ compiler, you may still need to distribute versions for each version of the compiler. That's because each version comes with a different runtime implementation, whose data structures have incompatible internal layouts. You can't pass an STL container created in a function compiled by one version of Visual C++ to a function compiled with a different version. You can't allocate memory with malloc in an executable and free it in a DLL if the executable and the DLL use different versions of the C runtime.
Unfortunately, unless you're willing to restrict your users to one particular compiler, the easy solution you're looking for may not exist. Note that such a restriction is a common solution used by programs that provide plugin support: plugins need to be compiled with the same version of the same compiler that compiled the executable.

Is there a portable wrapper for C++ type_info that standardizes type name string format?

The format of the output of type_info::name() is implementation specific.
namespace N { struct A; }
const N::A *a;
typeid(a).name(); // returns e.g. "const struct N::A" but compiler-specific
Has anyone written a wrapper that returns dependable, predictable type information that is the same across compilers? Multiple templated functions would allow the user to get specific information about a type. So I might be able to use:
MyTypeInfo::name(a); // returns "const struct N::A *"
MyTypeInfo::base(a); // returns "A"
MyTypeInfo::pointer(a); // returns "*"
MyTypeInfo::nameSpace(a); // returns "N"
MyTypeInfo::cv(a); // returns "const"
These functions are just examples; someone with better knowledge of the C++ type system could probably design a better API. The one I'm interested in is base(). All functions would raise an exception if RTTI was disabled or an unsupported compiler was detected.
This seems like the sort of thing that Boost might implement, but I can't find it in there anywhere. Is there a portable library that does this?
There are some limitations on doing such things in C++, so you probably won't find exactly what you want in the near future. The meta-information about types that the compiler inserts into the compiled code is also specific to the RTL used by the compiler, so it would be difficult for a third-party library to do a good job without relying on undocumented features of each specific compiler, which might break in later versions.
The Qt framework has, to my knowledge, the nearest thing to what you intended. But they do that completely independent from RTTI. Instead, they have their own "compiler" that parses the source code and generates additional source modules with the meta-information. Then, you compile+link these modules along with your program and use their API to get the information. Take a look at http://doc.qt.nokia.com/latest/metaobjects.html
Jeremy Pack (from Boost Extension plugin framework) appears to have written such a thing:
http://blog.redshoelace.com/2009/06/resource-management-across-dll.html
3. RTTI does not always function as expected across DLL boundaries. Check out the type_info classes to see how I deal with that.
So you could have a look there.
PS. I remembered because I once fixed a bug in that area; this might still add information so here's the link: https://stackoverflow.com/a/5838527/85371
GCC has __cxa_demangle https://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html
If there are such extensions for all compilers you target, you could use them to write a portable function with macros to detect the compiler.

Static source code analysis with LLVM

I recently discovered the LLVM (Low Level Virtual Machine) project, and from what I have heard it can be used to perform static analysis on source code. I would like to know if it is possible to extract the different function calls made through function pointers (finding the caller function and the callee function) in a program.
I could not find this kind of information on the website, so it would be really helpful if you could tell me whether such a library already exists in LLVM, or point me in the right direction on how to build it myself (existing source code, references, tutorials, examples...).
EDIT:
With my analysis I actually want to extract caller/callee pairs for each function call. In the case of a function pointer, I would like to return a set of possible callees. Both caller and callee must be defined in the source code (this does not include third-party functions in a library).
I think that Clang (the analyzer that is part of LLVM) is geared towards the detection of bugs, which means that the analyzer tries to compute possible values of some expressions (to reduce false positives) but it sometimes gives up (in this case, emitting no alarm to avoid a deluge of false positives).
If your program is C only, I recommend you take a look at the Value Analysis in Frama-C. It computes supersets of possible values for any l-value at each point of the program, under some hypotheses that are explained at length here. Complexity in the analyzed program only means that the returned supersets are more approximated, but they still contain all the possible run-time values (as long as you remain within the aforementioned hypotheses).
EDIT: if you are interested in possible values of function pointers for the purpose of slicing the analyzed program, you should definitely take a look at the existing dependencies and slicing computations in Frama-C. The website doesn't have any nice example for slicing, here is one from a discussion on the mailing-list
You should take a look at Elsa. It is relatively easy to extend and lets you parse an AST fairly easily. It handles all of the parsing, lexing and AST generation and then lets you traverse the tree using the Visitor pattern.
class CallGraphGenerator : public ASTVisitor
{
    //...
    virtual bool visitFunction(Function *func);
    virtual bool visitExpression(Expression *expr);
};
You can then detect function declarations, and probably detect function pointer usage. Finally you could check the function pointers' declarations and generate a list of the declared functions that could have been called using that pointer.
In our project, we perform static source code analysis by converting LLVM bytecode into C code with the help of the llc program that is shipped with LLVM. Then we analyze the C code with CIL (C Intermediate Language), but many other tools are available for the C language. The pitfall is that the code generated by llc is AWFUL and suffers from a great loss of precision. But still, it's one way to go.
Edit: in fact, I wouldn't recommend anyone go this way. But still, just for the record...
I think your question is flawed. The title says "Static source code analysis". Yet your underlying reason appears to be the construction of (part of) a call graph including calls through a function pointer. The essence of function pointers is that you cannot know their values at compile time, i.e. at the point where you do static source code analysis. Consider this bit of code:
void (*pFoo)() = GetFoo();
pFoo();
Static code analysis cannot tell you what GetFoo() returns at runtime, although it might tell you that the result is subsequently used for a function call.
Now, what values could GetFoo() possibly return? You simply can't say this in general (equivalent to solving the halting problem). You will be able to guess some trivial cases. The guessable percentage will of course go up depending on how much effort you are willing to invest.
The DMS Software Reengineering Toolkit provides various types of control, data flow, and global points-to analyzers for large systems of C code, and constructs call graphs using that global points-to analysis (with the appropriate conservative assumptions). More discussion and examples of the analyses can be found at the web site.
DMS has been tested on monolithic systems of C code with 25 million lines. (The call graph for this monster had 250,000 functions in it).
Engineering all this machinery from basic C ASTs and symbol tables is a huge amount of work; been there, done that. You don't want to do this yourself if you have something else to do with your life, like implement other applications.