I'm fairly new to C++ and am really interested in learning more. I have been reading quite a bit and recently discovered the init/fini ELF sections.
I started to wonder if and how one would use the init section to prepopulate objects that would be used at runtime. Say, for example, you wanted
to add performance measurements to your code, recording the time, filename, line number, and maybe some ID (a monotonically increasing int, for example) or name.
You would place, for example:
PROBE(0,"EventProcessing",__FILE__,__LINE__)
...... //process event
PROBE(1,"EventProcessing",__FILE__,__LINE__)
......//different processing on same event
PROBE(2,"EventProcessing",__FILE__,__LINE__)
The PROBE could be some macro that populates a struct containing this data (maybe in an array/list, etc., using the id as an index).
Would it be possible to have code in the init section that could prepopulate all of this data for each PROBE (except for the time, of course), so only the time would need to be retrieved/copied at runtime?
As far as I know, __attribute__((constructor)) cannot be applied to member functions?
My initial idea was to create some kind of linked list with each node pointing to each probe, and code in the init section could iterate it, populating the id, file, line, etc. But that idea assumed I could use a member function that could run in the "init" section, and that does not seem possible. Any tips appreciated!
As far as I understand it, you do not actually need an ELF constructor here. Instead, you could emit descriptors for your probes using extended asm statements (emitting data instead of code). This also involves switching to a dedicated ELF section for the probe descriptors, say __probes.
The linker will concatenate all the probes into an array and generate the special symbols __start___probes and __stop___probes, which you can use from your program to access these probes. See the last paragraph in Input Section Example.
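For illustration, here is a minimal sketch of that layout, assuming GCC or Clang on an ELF target. It uses a static struct with a section attribute instead of an extended asm statement (the data ends up in the section the same way); the struct layout, the two-argument PROBE, and the dump_probes helper are just placeholders:

#include <stdio.h>

/* Each PROBE drops one descriptor into the "__probes" section. */
struct probe_desc {
    int         id;
    const char *name;
    const char *file;
    int         line;
};

#define PROBE(id, name) \
    static const struct probe_desc probe_##id \
        __attribute__((section("__probes"), used)) = { id, name, __FILE__, __LINE__ }

/* The linker provides these because the section name is a valid C identifier. */
extern const struct probe_desc __start___probes[];
extern const struct probe_desc __stop___probes[];

static void dump_probes(void)
{
    for (const struct probe_desc *p = __start___probes; p != __stop___probes; ++p)
        printf("probe %d: %s (%s:%d)\n", p->id, p->name, p->file, p->line);
}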
Systemtap implements something quite similar for its userspace probes:
User Space Probe Implementation
Adding User Space Probing to an Application (heapsort example)
Similar constructs are also used within the Linux kernel for its self-patching mechanism.
There's a pretty simple way to have code run on module load time: Use the constructor of a global variable:
struct RunMeSomeCode
{
    RunMeSomeCode()
    {
        // your code goes here
    }
} do_it;
The .init/.fini sections basically exist to implement global constructors/destructors as part of the ABI on some platforms. Other platforms may use different mechanisms such as _start and _init functions or .init_array/.fini_array and .preinit_array. There are lots of subtle differences between all these methods, and which one to use for what is a question that can really only be answered by the documentation of your target platform. Not all platforms use ELF to begin with…
The main point to understand is that things like the .init/.fini sections in an ELF binary happen way below the level of C++ as a language. A C++ compiler may use these things to implement certain behavior on a certain target platform. On a different platform, a C++ compiler will probably have to use different mechanisms to implement that same behavior. Many compilers will give you tools in the form of language extensions like __attribute__ or #pragma directives to control such platform-specific details. But those generally only make sense and will only work with that particular compiler on that particular platform.
You don't need a member function (which gets a this pointer passed as an arg); instead you can simply create constructor-like functions that reference a global array, like
#define PROBE(id, stuff, more_stuff) \
    __attribute__((constructor)) void probeinit##id() \
    { probes[id] = {id, stuff, 0 /* to be written later */, more_stuff}; }
The trick is having this macro work in the middle of another function. GNU C allows nested functions (C++ does not), but IDK if you can make them constructors.
You don't want to declare a static int dummy##id = something, because then you're adding overhead to the function you profile (gcc has to emit a thread-safe run-once locking mechanism for function-local statics).
Really what you'd like is some kind of separate pass over the source that identifies all the PROBE macros and collects up their args to declare
struct probe global_probes[] = {
{0, "EventName", 0 /*placeholder*/, filename, linenum},
{1, "EventName", 0 /*placeholder*/, filename, linenum},
...
};
I'm not confident you can make that happen with CPP macros; I don't think it's possible to #define PROBE such that every time it expands, it redefines another macro to tack on more stuff.
But you could easily do that with an awk/perl/python / your fave scripting language program that scans your program and constructs a .c that declares an array with static storage.
Or better (for a single-threaded program): keep the runtime timestamps in one array and the names and other metadata in a separate array, so the cache footprint of the probes is smaller. For a multi-threaded program, stores to the same cache line from different threads cause false sharing, which creates cache-line ping-pong.
So you'd have #define PROBE(id, evname, blah blah) do { probe_times[id] = now(); } while(0)
and leave the handling of the later args to your separate preprocessing pass.
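A rough sketch of that split layout (all names are illustrative, probe_meta would be emitted by the separate script pass, and now() stands in for whatever cheap timestamp source you use):

#include <stdint.h>

struct probe_meta {
    int         id;
    const char *event;
    const char *file;
    int         line;
};

extern const struct probe_meta probe_meta[];   /* generated metadata: cold, read-only  */
extern uint64_t probe_times[];                 /* one timestamp slot per probe id: hot */

/* The hot path only ever stores into probe_times[]. */
#define PROBE(id, evname, ...) do { probe_times[id] = now(); } while (0)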
Related
Well, this might be a very weird question, but my curiosity has struck pretty hard on this. So here it goes...
NOTE: Let's take the language C into consideration here.
As programmers we usually define a user-defined data type (say, a struct) in the source code with an appropriate name.
Suppose I have a program in which I have a structure defined as:
struct Animal {
    char *name;
    int lifeSpan;
};
And I have also started executing this program.
Now, my question here is:
What if I want to define a new structure called "Plant", just like "Animal" above, without writing its definition in the source code itself (which is obviously impossible currently), but rather from a user input string (or a file input) at runtime?
Let's say my program takes an input string from a text file named file1.txt whose content is:
struct Plant {
    char *name;
    int lifeSpan;
};
What I want now is to have a new structure named "Plant" in my program, which is already executing. The program should read the file content, create a structure as written in the file, and attach it to itself on the go.
I have checked out a solution for C++ in the discussion Declaring a data type dynamically in C++, but it doesn't seem to have a very convincing answer.
The solution I am looking for is at the compiler-linker-loader level rather than in the language itself. I would be very pleased and thankful if anyone is willing to share their ideas on this.
What you're asking about is basically "can we implement C as a scripting language?", since this is the only way code can be executed after compilation.
I'm aware that people have been writing (mostly in the comments) that it's possible in other languages but isn't possible in C, since C is a compiled language (hence data types should be defined during compile time).
However, to the best of my knowledge it's actually possible (and might not be as hard as one would imagine).
There are many possible approaches (machine code emulation (a VM), JIT compilation, etc.).
One approach would use a C compiler to compile the C script as an external dynamic library (.dll on Windows, .so on Linux, etc.) and then "load" the compiled library and execute the code (this is pretty much the JIT compilation approach, for lazy people).
EDIT:
As mentioned in the comments, by using this approach, the new type is loaded as part of an external library.
The original code won't know about this new type, only the new code (or library) will be "aware" of this new type and able to properly use it.
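A rough sketch of that compile-and-load flow on a POSIX system (the compiler command, the /tmp path, and the describe_plant symbol are all illustrative; the host usually needs -ldl at link time):

#include <dlfcn.h>
#include <cstdlib>
#include <cstdio>

int main()
{
    // file1.c contains the new struct plus functions that know how to use it
    if (std::system("cc -shared -fPIC -o /tmp/plant.so file1.c") != 0)
        return 1;

    void *lib = dlopen("/tmp/plant.so", RTLD_NOW);
    if (!lib) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    // the loaded code exposes plain functions; the host never sees the type itself
    void (*describe)(void) = (void (*)(void)) dlsym(lib, "describe_plant");
    if (describe) describe();

    dlclose(lib);
}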
On the other hand, I'm not sure why you're insisting on the need to use static types and a compiler-linker-loader level solution.
The language itself (the C language) can manage this task dynamically (during execution time).
Consider Ruby MRI, for example. The Ruby language supports dynamic types that can be defined during runtime...
...However, this is implemented in C and it's possible to use the code from within C to define new modules and classes. These aren't static types that can be tested during compilation (type creation and identification is performed during runtime).
This is a perfect example showing that C (as a language) can dynamically define "types".
However, this is also a poor example, because Ruby's approach is slow. A custom approach can be far faster, since it would avoid the huge overhead related to functionality you might not need (such as inheritance).
So, this is a bit of an open question. But let's say that I have a large application which globally overrides the various new and delete operators so that they use home-brewed jemalloc-style arenas and custom alignments.
All fine and good, but I have been running into segfault issues because other C++-based DLLs and their dependencies also use the overloaded allocators when they shouldn't (namely LLVM), bringing the little custom allocator to its knees (running out of memory, among other stresses).
Testing workarounds, I have wrapped (and moved) those global operators into a class, and I made all base classes inherit from it. And well, that works for classes, but not for base types. That's the problem.
Given that C++ doesn't allow useful things like having separate allocators per namespace, or limiting the new operator per executable module, what is the best way of emulating this in base data types, where I can't directly subclass an int?
The obvious way is wrapping them in a custom template, but the problem is performance. Do I have to emulate all the array and indexing operations under a second layer just so that I can malloc from a different place, without having to change the rest of the functional code? Is there a better way?
P.S.: I have also been thinking about using special global new/delete operators with extra parameters, while leaving the standard ones alone. Thus ensuring that I am (well, my executable module is) the only one calling those global functions. It should be a simple search-and-replace.
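A rough sketch of that extra-parameter idea (arena_alloc / arena_free stand in for the custom arena; all names are illustrative):

#include <cstddef>
#include <cstdlib>
#include <new>

/* Stand-ins for the custom arena; here they just forward to malloc/free. */
inline void *arena_alloc(std::size_t n)       { return std::malloc(n); }
inline void  arena_free(void *p) noexcept     { std::free(p); }

struct arena_tag_t {};
constexpr arena_tag_t in_arena{};

/* Extra-parameter overloads: only code that spells `new (in_arena) T` uses the
 * arena, so other modules' plain `new` is left completely alone. */
void *operator new(std::size_t n, arena_tag_t)       { return arena_alloc(n); }
void  operator delete(void *p, arena_tag_t) noexcept { arena_free(p); }  /* used only if a ctor throws */

/* Usage:   Widget *w = new (in_arena) Widget;
 * Caveat:  a plain `delete w` still goes through the ordinary global
 *          operator delete, so destruction has to be paired by hand
 *          (or wrapped in a smart pointer with a custom deleter). */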
Well, quick update. What I did in the end to 'solve' this conundrum was to detect whether the code that called the overridden global allocators comes from the main executable module, and conditionally redirect all the external new / delete calls to their corresponding malloc / free, while still using the custom arena allocator for our own internal code.
How? After doing some R&D I found that this could be done by using the _ReturnAddress() built-in on MSVC and __builtin_extract_return_addr(__builtin_return_address(0)) on GCC/Clang; and I can say that it seems to work fine so far in production software.
Now, when some C++ code from our address space wants some memory we can see where it comes from.
But, how do we find out if that address is part of some other module in our process space or our own? We might need to find out both the base and end addresses of the main program, cache them at startup as globals, and check that the return address is within bounds.
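The dispatch itself ends up being short. A rough sketch (base_addr / base_end are the globals filled in by the per-platform code below, arena_alloc stands in for the custom arena, and bad_alloc handling is omitted):

#include <stdint.h>
#include <cstdlib>
#include <new>

extern uintptr_t base_addr, base_end;   // cached at startup, see below
void *arena_alloc(std::size_t n);       // placeholder for the custom arena

#if defined(_MSC_VER)
  #include <intrin.h>
  #define CALLER_ADDRESS() ((uintptr_t) _ReturnAddress())
#else
  #define CALLER_ADDRESS() \
      ((uintptr_t) __builtin_extract_return_addr(__builtin_return_address(0)))
#endif

void *operator new(std::size_t n)       // error handling omitted for brevity
{
    uintptr_t caller = CALLER_ADDRESS();
    if (caller >= base_addr && caller < base_end)
        return arena_alloc(n);          // call came from our own executable
    return std::malloc(n);              // foreign module: hand it plain malloc
}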
All for extremely little overhead. But our second problem is that retrieving the base address is different on every platform. After some research I found that things were more straightforward than expected:
In Windows/Win32 we can simply do this:
#include <windows.h>
#include <psapi.h>

/* base_addr / base_end are the process-wide globals described above */
inline void __initialize_base_address()
{
    MODULEINFO minfo;
    GetModuleInformation(GetCurrentProcess(), GetModuleHandle(NULL), &minfo, sizeof(minfo));
    base_addr = (uintptr_t) minfo.lpBaseOfDll;
    base_end  = (uintptr_t) minfo.lpBaseOfDll + minfo.SizeOfImage;
}
In Linux there are a thousand ways of doing this, including linker globals and some hacky (verbose and unreliable) ways of walking the process module table. I was looking at the linker map output and noticed that the _init and _fini functions always seem to wrap the rest of the .text section symbols. Sometimes it's hard to get to the simplest solution that works everywhere:
#include <link.h>
#include <dlfcn.h>

/* _init and _fini bracket the main executable's .text section */
inline void __initialize_base_address()
{
    void *handle = dlopen(0, RTLD_NOW);
    base_addr = (uintptr_t) dlsym(handle, "_init");
    base_end  = (uintptr_t) dlsym(handle, "_fini");
    dlclose(handle);
}
On macOS things are even less documented, and I had to cobble together my own solution using the Darwin kernel open-source code and some obscure low-level tools as reference. Keep in mind that _NSGetMachExecuteHeader() is just a wrapper for the internal _mh_execute_header linker global. If you need to do anything about parsing the Mach-O format and its structures, then getsect.h is the way to go:
#include <mach-o/getsect.h>
#include <mach-o/ldsyms.h>
#include <crt_externs.h>

/* getsectiondata() gives us the extent of the main image's __TEXT,__text */
inline void __initialize_base_address()
{
    size_t size;
    void *ptr = getsectiondata(&_mh_execute_header, SEG_TEXT, SECT_TEXT, &size);
    base_addr = (uintptr_t) _NSGetMachExecuteHeader();
    base_end  = (uintptr_t) ptr + size;
}
Another thing to keep in mind is that this some-other-cpp-module-is-using-our-internal-allocator-that-globally-overrides-new-causing-weird-bugs issue seems to be a problem on Linux and maybe macOS. I didn't have this issue on Windows, probably because no conflicting DLLs were loaded in the process (it being mostly C API-based), or maybe because the platform uses a different C++ runtime for each module.
The main issue I had was caused by Mesa3D, which uses LLVM (pure C++ in and out) for many of their GLSL shader compilers and liked to gobble up big chunks of my small custom-tailored memory arena uninvited.
Rewriting a legacy program that is structurally dependent on these allocators was out of the question due to its sheer size and complexity, so this turned out to be the best way of making things work as expected.
It's only a few lines of optional, sneaky, extra per-platform code.
I want to place a function void loadableSW(void) at a specific location: 0x3FF802. In another function, residentMain(), I will jump to this location using a pointer to function. How do I declare the function loadableSW to accomplish this? I have attached the skeleton of residentMain for clarity.
Update: The target hardware is a TMS320C620x DSP. Since this is an aerospace project, deterministic behaviour is a desirable design objective. Ideally, they would like to know what portion of memory contains what at a particular time. The solution, as I just got to know, is to define a section in memory in the linker file. The section shall start at 0x3FF802 (the location where the function is to be placed). Since the size of the loadableSW function is known, the size of the memory section can also be determined. Then the directive #pragma CODE_SECTION ("function_name", "section_name") can place that function in the specified section.
Since pragma directives are not permissible in test scripts, I am wondering if there is any other way to do this without using any linker directives.
Besides I am curious. Is there any placement syntax for functions in C++? I know there is one for objects, but functions?
void residentMain (void)
{
    void (*loadable_p) (void) = (void (*) (void)) 0x3FF802;
    int hardwareOK = 0;
    /* Code to check hardware integrity. hardwareOK = 1 if success */
    if (hardwareOK)
    {
        loadable_p (); /* Jump to loadable software */
    }
    else
    {
        dspHalt ();
    }
}
I'm not sure about your OS/toolchain/IDE, but the following answer should work:
How to specify a memory location at which function will get stored?
There is just one way I know of and it is shown in the first answer.
UPDATE
How to define sections in gcc:
variables:
http://mcuoneclipse.com/2012/11/01/defining-variables-at-absolute-addresses-with-gcc/
methods (section ("section-name")): http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Function-Attributes.html#Function%20Attributes
How to place a function at a particular address in C?
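As a rough illustration of the gcc route from those links (the section name, address, and linker-script syntax are all illustrative, and a matching linker-script entry is still required):

/* GCC-style sketch: put the function into its own section... */
__attribute__((section(".loadable")))
void loadableSW(void)
{
    /* loadable software */
}

/* ...and in the linker script, pin that section to the fixed address:
 *
 *   .loadable 0x3FF802 : { *(.loadable) }
 */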
Since pragma directives are not permissible in test scripts, I am wondering if there is any other way to do this without using any linker directives.
If your target supports PC-relative addressing and you can ensure the code is purely position-independent, then you can use a memcpy() to relocate the routine.
How to run code from RAM... has some hints on this. If you cannot generate PC-relative/relocatable code, then you absolutely cannot do this without the help of the linker. That is the definition of a linker/loader: to fix up addresses.
That leads to a different concept: do not fully link your code. Instead, defer the address fixup until loading. Then you must write a loader to place the code at run-time; but from your aerospace project comment, I think that complexity and analysis are also important, so I don't believe you would accept that. You would also need double the storage, etc.
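If the target does allow relocatable code, the memcpy() route might look roughly like this (the size bound, the address macro, and the cache handling are all illustrative and target-specific):

#include <string.h>

extern void loadableSW(void);           /* built as PC-relative / relocatable code */

#define LOADABLE_SIZE  0x200            /* illustrative upper bound on the code size */
#define TARGET_ADDR    ((void *) 0x3FF802)

void install_and_run_loadable(void)
{
    /* Copy the routine's image to the fixed address and call it there.
     * Only valid if the code is genuinely position-independent, the target
     * region is executable RAM, and any instruction cache is flushed. */
    memcpy(TARGET_ADDR, (const void *) loadableSW, LOADABLE_SIZE);
    void (*loadable_p)(void) = (void (*)(void)) TARGET_ADDR;
    loadable_p();
}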
Dearest stack exchange,
I'm programming an MRI scanner. I won't go into too much background, but I'm fairly constrained in how much code I've got access to, and the way things have been set up is...suboptimal. I have a situation as follows:
There is a big library, written in C++. It ultimately does "transcoding" (in the worst possible way), writing out FPGA assembly that DoesThings. It provides a set of functions to "userland" that are translated into (through a mix of preprocessor macros and black magic) long strings of 16 bit and 32 bit words. The way this is done is prone to buffer overflows, and generally to falling over.*
The FPGA assembly is then strung out over a glorified serial link to the relevant electronics, which executes it (doing the scan) and returns the data for processing.
Programmers are expected to use the functions provided by the library to do their thing, in C (not C++) functions that are linked against the standard library. Unfortunately, in my case, I need to extend the library.
There's a fairly complicated chain of preprocessor substitution and tokenization, calling, and (in general) stuff happening between you writing doSomething() in your code, and the relevant library function actually executing it. I think I've got it figured out to some extent, but it basically means that I've got no real idea about the scope of anything...
In short, my problem is:
In the middle of a method, in a deep dark corner of many thousands of lines of code in a big blob I have little control over, with god-knows-what variable scoping going on, I need to:
Extend this method to take a function pointer (to a userland function) as an argument, but
Let this userland function, written after the library has been compiled, have access to variables that are local both to the scope of the method where it appears and to the (C) function from which it is called.
This seems like an absolute mire of memory management, and I thought I'd ask here for the "best practice" in these situations, as it's likely that there are lots of subtle issues I might run into -- and that others might have lots of relevant wisdom to impart. Debugging the system is a nightmare, and I've not really got any support from the scanner's manufacturer on this.
A brief sketch of how I plan to proceed is as follows:
In the .cpp library:
/* In something::something() */

/* declare a pointer to a function */
void (*fp)(int*, int, int, ...);

/* by default, the pointer points to a placeholder at compile time */
fp = &doNothing;

...

/* At the appropriate time, point the pointer to the userland function,
   whose address is supplied as an argument to something(): */
fp = userFuncPtr;

/* Declare memory for the user function to plonk data into */
i_arr_coefficients = (int) malloc(SOMETHING_SENSIBLE);

/* Create a pointer to that array for the userland function */
i_ptr_array = &i_arr_coefficients[0];

/* define a struct of pointers to local variables for the userland function to use */
ptrStrct = createPtrStruct();

/* Call the user's function: */
fp(i_ptr_array, ptrStrct, ...);

CarryOnWithSomethingElse();
The point of the placeholder function is to keep things ticking over if the user function isn't linked in. I get that this could be replaced with a #define, but the compiler's cleverness or stupidity might result in odd (to my ignorant mind, at least) behaviour.
In the userland function, we'd have something like:
void doUsefulThings(i_ptr_array, ptrStrct, localVariableAddresses, ...) {
    double a = *ptrStrct.a;
    double b = *ptrStrct.b;
    double c = *localVariableAddresses.c;
    double d = doMaths(a, b, c);
    /* I.e. do maths using all of these numbers we've got from the different sources */
    storeData(i_ptr_array, d);
    /* And put the results of that maths where the C++ method can see it */
}
...
something(&doUsefulThings(i_ptr_array, ptrStrct, localVariableAddresses, ...), ...);
...
If this is as clear as mud please tell me! Thank you very much for your help. And, by the way, I sincerely wish someone would make an open hardware/source MRI system.
*As an aside, this is the primary justification the manufacturer uses to discourage us from modifying the big library in the first place!
You have full access to the C code. You have limited access to the C++ library code. The C code defines the doUsefulThings function. From the C code you call the something function (a C++ method) with a function pointer to doUsefulThings as the argument. Now control goes to the C++ library. Here the various arguments are allocated memory and initialized. Then doUsefulThings is called with those arguments. Here control transfers back to the C code. In short, the main program (C) calls the library (C++) and the library calls back into the C function.
One of the requirements is that the userland function should have access to local variables from the C code where it is called. When you call something, you are only giving it the address of doUsefulThings. There is no parameter/argument of something that captures the address of those local variables, so doUsefulThings does not have access to them.
The malloc statement returns a pointer; this has not been handled properly (probably you were just trying to give us an overview). You must take care to free it somewhere.
Since this is a mixture of C and C++ code, it is difficult to use RAII (to take care of allocated memory), perfect forwarding (to avoid copying variables), lambda functions (to access local variables), etc. Under the circumstances, your approach seems to be the way to go.
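To illustrate the missing-parameter point above, here is a sketch (all names are made up): bundle whatever locals the callback needs into a context struct and pass its address through the library call, so the userland function can reach them.

#include <stdlib.h>

struct probe_context {
    int    *coeffs;        /* the malloc'd coefficient array             */
    double *a, *b;         /* pointers to locals inside the library call */
};

typedef void (*user_fn)(struct probe_context *ctx);

/* stand-in for the library method */
void something(user_fn fp)
{
    double a = 0.0, b = 0.0;
    int *coeffs = (int *) malloc(64 * sizeof *coeffs);

    struct probe_context ctx = { coeffs, &a, &b };
    if (fp)
        fp(&ctx);          /* the userland code sees the locals through ctx */

    free(coeffs);          /* the library keeps ownership of the buffer     */
}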
I have the following requirement:
Adding text at the entry and exit point of any function.
Not altering the source code, besides inserting from above (so no pre-processor or anything).
For example:
void fn(param-list)
{
    ENTRY_TEXT (param-list)
    //some code
    EXIT_TEXT
}
But it should work not only in such a simple case; it also has to cope with pre-processor directives!
Example:
void fn(param-list)
#ifdef __WIN__
{
    ENTRY_TEXT (param-list)
    //some windows code
    EXIT_TEXT
}
#else
{
    ENTRY_TEXT (param-list)
    //some any-os code
    if (condition)
    {
        return; //should become EXIT_TEXT
    }
    EXIT_TEXT
}
#endif
So my question is: Is there a proper way doing this?
I already tried some work with parsers used by compilers but since they all rely on running a pre-processor before parsing, they are useless to me.
Also, some of the token-generating parsers, which do not need a pre-processor, are somewhat useless because they generate a memory mapping of tokens, which then leads to a completely new source code instead of just inserting the text.
One thing I am working on is trying it with FLEX (or JFlex); if this is a valid option, I would appreciate some input on it. ;-)
EDIT:
To clarify a little bit: the purpose is to allow something like a stack trace.
I want to trace every function call, and in order to follow the call hierarchy, I need to place a macro at the entry point and at the exit point of each function.
This builds a function-call trace. :-)
EDIT2: Compiler-specific options are not quite suitable since we have many different compilers to use, many of which are probably not well supported by any tools out there.
Unfortunately, your idea is not only impractical (C++ is complex to parse), it's also doomed to fail.
The main issue you have is that exceptions will bypass your EXIT_TEXT macro entirely.
You have several solutions.
As has been noted, the first solution would be to use a platform-dependent way of computing the stack trace. It can be somewhat imprecise, especially because of inlining: i.e., when small functions are inlined into their callers, they do not appear in the stack trace because no function call was generated at the assembly level. On the other hand, it's widely available, does not require any surgery on the code, and does not affect performance.
A second solution would be to only introduce something on entry and use RAII to do the exit work. Much better than your scheme, as it automatically deals with multiple returns and exceptions, but it suffers from the same issue: how to perform the insertion automatically. For this you will probably want to operate at the AST level and modify the AST to introduce your little gem. You could do it with Clang (look up the C++11 migration tool for examples of rewrites at large) or with gcc (using plugins).
Finally, you also have manual annotations. While it may seem underpowered (and a lot of work), I would highlight that you are not leaving the logging to a tool... I see 3 advantages to doing it manually: you can avoid introducing this overhead in performance-sensitive parts, you can retain only a "summary" of big arguments, and you can customize the summary based on what's interesting for the current function.
I would suggest using LLVM libraries & Clang to get started.
You could also leverage the C++ language itself to simplify the process: if you insert a small object that is constructed on function scope entry, you can rely on the fact that it will be destroyed on exit. That should massively simplify recording the 'exit' of the function.
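A small sketch of that scope-guard idea (names are illustrative): the constructor logs entry, the destructor logs exit, so every return path and any exception still produce the exit record.

#include <cstdio>

struct ScopeTracer {
    const char *fn;
    explicit ScopeTracer(const char *f) : fn(f) { std::printf("enter %s\n", fn); }
    ~ScopeTracer()                              { std::printf("exit  %s\n", fn); }
};

#define TRACE_SCOPE() ScopeTracer scope_tracer_(__func__)

void fn()
{
    TRACE_SCOPE();
    // ...function body; early returns and exceptions are covered automatically
}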
This does not really answer your question; however, for your initial need, you may use the backtrace() function from execinfo.h (if you are using GCC with glibc).
How to generate a stacktrace when my gcc C++ app crashes
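A minimal glibc example of capturing a trace with backtrace() (compile with -rdynamic so backtrace_symbols() can resolve function names):

#include <execinfo.h>
#include <cstdio>
#include <cstdlib>

void print_trace(void)
{
    void *frames[64];
    int n = backtrace(frames, 64);              // capture raw return addresses
    char **symbols = backtrace_symbols(frames, n);
    for (int i = 0; i < n; ++i)
        std::printf("%s\n", symbols[i]);
    std::free(symbols);                          // backtrace_symbols() mallocs the array
}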