Determine size of functions/stub/namespace in memory - c++

I have a couple of functions in a namespace called stub.
I have to determine the exact start address of the namespace and its end address, or at least the size of the namespace in memory (so I can copy these functions into another process).
This worked perfectly in Visual C++ 2008: I added a
void stub_end() { }
at the end of the namespace and used
size_t size = reinterpret_cast<ULONG_PTR>(stub_end) - reinterpret_cast<ULONG_PTR>(stub_start);
to determine the size of the stub.
It worked because Visual C++ 2008 preserved the order of functions as they appear in the .cpp file; however, that no longer seems to be the case in Visual C++ 2010.
How can I find out the size of the functions or the whole namespace/stub by using pragma directives, compiler/linker facilities or similar?

With the new push in security these days (heap randomization, layout randomization, etc.), I think this is going to be much more difficult. You may end up having to copy each function individually.

You can try placing each function in a different section, using the VC++ equivalent of GCC's __attribute__((section("name"))) (http://www.delorie.com/gnu/docs/gcc/gcc_62.html), and then use your technique; alternatively, you could place each function in a different source file.
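For illustration, a rough sketch of what that might look like (the section names here are invented, and on the Microsoft side the closest equivalent I know of is the code_seg pragma):

#if defined(_MSC_VER)
// MSVC: everything defined between push and pop lands in the ".stubtex" code section
#pragma code_seg(push, ".stubtex")
void stub_start() { }
void stub_end() { }
#pragma code_seg(pop)
#else
// GCC: place each function in a named section explicitly
__attribute__((section(".stub_text"))) void stub_start() { }
__attribute__((section(".stub_text"))) void stub_end() { }
#endif

You would still need the linker to keep those sections together (or to report the section size) for the subtraction trick to remain meaningful.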

The C++ language provides no guarantees for finding the addresses or sizes of namespaces. That said, you can venture into assembly language and linker directives.
Many assembly languages have an opcode or mnemonic for placing code at specific addresses. This allows a label to be set up to indicate the start of a memory area. Some linkers have variables for obtaining segment starting addresses and sizes. These would be user defined logical addresses.
In summary, use your assembly and linker tools to define public symbols for the namespace start and length or optionally the end of the segment. In your C++ program, access these labels as extern.
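As a rough sketch of that idea with the GNU toolchain (the section and symbol names are invented; other linkers have analogous mechanisms):

/* Linker script fragment (conceptual):
   .stub : { __stub_start = .; *(.stub_text) __stub_end = .; }           */

#include <cstddef>

extern "C" char __stub_start[];   // defined by the linker, not by any .cpp file
extern "C" char __stub_end[];

std::size_t stubSize()
{
    return static_cast<std::size_t>(__stub_end - __stub_start);
}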

ELF INIT section code to prepopulate objects used at runtime

I'm fairly new to C++ and am really interested in learning more. I have been reading quite a bit and recently discovered the init/fini ELF sections.
I started to wonder if and how one could use the init section to prepopulate objects that will be used at runtime. Say, for example, you wanted
to add performance measurements to your code, recording the time, filename, line number, and maybe some ID (a monotonically increasing int, for example) or name.
You would place for example:
PROBE(0,"EventProcessing",__FILE__,__LINE__)
...... //process event
PROBE(1,"EventProcessing",__FILE__,__LINE__)
......//different processing on same event
PROBE(2,"EventProcessing",__FILE__,__LINE__)
The PROBE could be some macro that populates a struct containing this data (maybe in an array/list, etc., using the id as an index).
Would it be possible to have code in the init section that could prepopulate all of this data for each PROBE (except for the time of course), so only the time would need to be retrieved/copied at runtime?
As far as I know, __attribute__((constructor)) cannot be applied to member functions?
My initial idea was to create some kind of
linked list with each node pointing to each probe, so that code in the init section could iterate over it, populating the id, file, line, etc. But
that idea assumed I could use a member function that would run in the "init" section, and that does not seem possible. Any tips appreciated!
As far as I understand it, you do not actually need an ELF constructor here. Instead, you could emit descriptors for your probes using extended asm statements (using data, instead of code). This also involves switching to a dedicated ELF section for the probe descriptors, say __probes.
The linker will concatenate all the probes into an array and generate the special symbols __start___probes and __stop___probes, which you can use from your program to access these probes. See the last paragraph in Input Section Example.
Systemtap implements something quite similar for its userspace probes:
User Space Probe Implementation
Adding User Space Probing to an Application (heapsort example)
Similar constructs are also used within the Linux kernel for its self-patching mechanism.
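A minimal sketch of the section idea, assuming GCC and the GNU linker (for simplicity it places a static descriptor in the __probes section directly instead of hand-writing the asm; the probe layout is invented):

#include <cstdio>

struct probe { int id; const char *name; const char *file; int line; };

// Each expansion emits one descriptor into the __probes section at compile time;
// nothing runs at the probe site except what you add yourself.
#define PROBE(id, name) \
    do { \
        static const probe probe_##id \
            __attribute__((section("__probes"), used)) = { id, name, __FILE__, __LINE__ }; \
    } while (0)

// The GNU linker defines these automatically for any section whose name is a
// valid C identifier.
extern "C" const probe __start___probes[];
extern "C" const probe __stop___probes[];

static void dump_probes()
{
    for (const probe *p = __start___probes; p != __stop___probes; ++p)
        std::printf("%d %s %s:%d\n", p->id, p->name, p->file, p->line);
}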
There's a pretty simple way to have code run on module load time: Use the constructor of a global variable:
struct RunMeSomeCode
{
RunMeSomeCode()
{
// your code goes here
}
} do_it;
The .init/.fini sections basically exist to implement global constructors/destructors as part of the ABI on some platforms. Other platforms may use different mechanisms such as _start and _init functions or .init_array/.fini_array and .preinit_array. There are lots of subtle differences between all these methods, and which one to use for what is a question that can really only be answered by the documentation of your target platform. Not all platforms use ELF to begin with…
The main point to understand is that things like the .init/.fini sections in an ELF binary happen way below the level of C++ as a language. A C++ compiler may use these things to implement certain behavior on a certain target platform. On a different platform, a C++ compiler will probably have to use different mechanisms to implement that same behavior. Many compilers will give you tools in the form of language extensions like __attribute__ or #pragmas to control such platform-specific details. But those generally only make sense and will only work with that particular compiler on that particular platform.
You don't need a member function (which gets a this pointer passed as an arg); instead you can simply create constructor-like functions that reference a global array, like
#define PROBE(id, stuff, more_stuff) \
__attribute__((constructor)) void \
probeinit##id(){ probes[id] = {id, stuff, 0/*to be written later*/, more_stuff}; }
The trick is having this macro work in the middle of another function. GNU C allows nested functions (C only; g++ does not support them), but IDK if you can make them constructors.
You don't want to declare a static int dummy##id = something because then you're adding overhead to the function you profile. (gcc has to emit a thread-safe run-once locking mechanism for the initialization.)
Really what you'd like is some kind of separate pass over the source that identifies all the PROBE macros and collects up their args to declare
struct probe global_probes[] = {
{0, "EventName", 0 /*placeholder*/, filename, linenum},
{1, "EventName", 0 /*placeholder*/, filename, linenum},
...
};
I'm not confident you can make that happen with CPP macros; I don't think it's possible to #define PROBE such that every time it expands, it redefines another macro to tack on more stuff.
But you could easily do that with an awk/perl/python / your fave scripting language program that scans your program and constructs a .c that declares an array with static storage.
Or better (for a single-threaded program): keep the runtime timestamps in one array, and the names and other metadata in a separate array, so the cache footprint of the probes is smaller. For a multi-threaded program, stores to the same cache line from different threads cause false sharing and cache-line ping-pong.
So you'd have #define PROBE(id, evname, blah blah) do { probe_times[id] = now(); }while(0)
and leave the handling of the later args to your separate preprocessing.
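For illustration, a sketch of that split layout (all names are invented; single-threaded, and the metadata table is assumed to come from the external script that scans the sources):

#include <cstdint>

struct probe_meta {                  // cold data, written once by the generator
    int id;
    const char *event;
    const char *file;
    int line;
};

extern const probe_meta probe_metadata[];   // defined in the generated file
extern std::uint64_t probe_times[];         // hot data: the only runtime writes

std::uint64_t now();                         // wraps rdtsc / clock_gettime, etc.

#define PROBE(id) do { probe_times[(id)] = now(); } while (0)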

C++ Passing std::string by reference to function in dll

I have a problem passing a std::string by reference to a function in a DLL.
This is function call:
CAFC AFCArchive;
std::string sSSS = std::string("data\\gtasa.afc");
AFCER_PRINT_RET(AFCArchive.OpenArchive(sSSS.c_str()));
//AFCER_PRINT_RET(AFCArchive.OpenArchive(sSSS));
//AFCER_PRINT_RET(AFCArchive.OpenArchive("data\\gtasa.afc"));
This is function header:
#define AFCLIBDLL_API __declspec(dllimport)
AFCLIBDLL_API EAFCErrors CAFC::OpenArchive(std::string const &_sFileName);
I stepped through the call in the debugger and looked at the value of _sFileName inside the function.
Inside the function, _sFileName holds a garbage value (for example, t4gs..\n\t).
I tried to detect heap corruption, but no error is reported.
The DLL was compiled with debug settings, and the .exe program was compiled in debug mode too.
What's wrong? Help!
P.S. I used Visual Studio 2013. WinApp.
EDIT
I have changed the function header to this code:
AFCLIBDLL_API EAFCErrors CAFC::CreateArchive(char const *const _pArchiveName)
{
std::string _sArchiveName(_pArchiveName);
...
I really don't know how to fix this bug...
About the heap: it is allocated in the virtual memory of our process, right? In that case the virtual memory is shared, so the heap should be common to both the EXE and the DLL.
The issue has little to do with STL, and everything to do with passing objects across application boundaries.
1) The DLL and the EXE must be compiled with the same project settings. You must do this so that the struct alignment and packing are the same, the members and member functions do not have different behavior, and, even more subtly, the low-level implementation of references and reference parameters is exactly the same.
2) The DLL and the EXE must use the same runtime heap. To do this, you must use the DLL version of the runtime library.
You would have encountered the same problem if you created a class that does similar things (in terms of memory management) as std::string.
Probably the reason for the memory corruption is that the object in question (std::string in this case) allocates and manages dynamically allocated memory. If the application uses one heap, and the DLL uses another heap, how is that going to work if you instantiated the std::string in say, the DLL, but the application is resizing the string (meaning a memory allocation could occur)?
C++ classes like std::string can be used across module boundaries, but doing so places significant constraints on the modules. Simply put, both modules must use the same instance of the runtime.
So, for instance, if you compile one module with VS2013, then you must do so for the other module. What's more, you must link to the dynamic runtime rather than statically linking the runtime. The latter results in distinct runtime instances in each module.
And it looks like you are exporting member functions. That also requires a common shared runtime. And you should use __declspec(dllexport) on the entire class rather than individual members.
If you control both modules, then it is easy enough to meet these requirements. If you wish to let other parties produce one or other of the modules, then you are imposing a significant constraint on those other parties. If that is a problem, then consider using more portable interop. For example, instead of std::string use const char*.
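For illustration, the kind of boundary that suggests might look like this (the macro and names are invented; the whole class is exported, and a plain const char* overload is provided for callers that cannot share the runtime):

// afclib.h (hypothetical; EAFCErrors is the enum declared elsewhere in the project)
#ifdef AFCLIB_EXPORTS
#define AFCLIBDLL_API __declspec(dllexport)
#else
#define AFCLIBDLL_API __declspec(dllimport)
#endif

#include <string>

class AFCLIBDLL_API CAFC
{
public:
    // Safe only when the EXE and the DLL link the same dynamic CRT/STL.
    EAFCErrors OpenArchive(std::string const &_sFileName);

    // Portable across runtimes: no C++ objects cross the module boundary.
    EAFCErrors OpenArchive(char const *_pFileName);
};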
Now, it's possible that you are already using a single shared instance of the dynamic runtime. In which case the error will be more prosaic. Perhaps the calling conventions do not match. Given the sparse level of detail in your question, it's hard to say anything with certainty.
I encountered a similar problem.
I resolved it by synchronizing the Configuration Properties -> C/C++ settings.
If you want debug mode:
Set _DEBUG definition in Preprocessor Definitions in both projects.
Set /MDd in Code Generation -> Runtime Library in both projects.
If you want release mode:
Remove _DEBUG definition in Preprocessor Definitions in both projects.
Set /MD in Code Generation -> Runtime Library in both projects.
Both projects I mean exe and dll project.
This works for me, especially when I don't want to change any of the DLL's settings but only adjust my project to match them.

How to place a function at a particular address in C?

I want to place a function void loadableSW (void) at a specific location: 0x3FF802. In another function, residentMain(), I will jump to this location using a pointer to function. How do I declare the function
loadableSW to accomplish this? I have attached the skeleton of residentMain for clarity.
Update: The target hardware is a TMS320C620x DSP. Since this is an aerospace project, deterministic
behaviour is a desirable design objective. Ideally, they would like to know what portion of memory contains what at any particular time. The solution, as I have just learned, is to define a memory section in the linker file. The section shall start at 0x3FF802 (the location where the function is to be placed). Since the size of the loadableSW function is known, the size of the memory section can also be determined. Then the directive #pragma CODE_SECTION(function_name, "section_name") can place that function in the specified section.
Since pragma directives are not permissible in test scripts, I am wondering if there is any other way to do this without using any linker directives.
Besides, I am curious: is there any placement syntax for functions in C++? I know there is one for objects (placement new), but what about functions?
void residentMain (void)
{
void (*loadable_p) (void) = (void (*) (void)) 0x3FF802;
int hardwareOK = 0;
/*Code to check hardware integrity. hardwareOK = 1 if success*/
if (hardwareOK)
{
loadable_p (); /*Jump to Loadable Software*/
}
else
{
dspHalt ();
}
}
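A sketch of the linker-section approach from the update, assuming the TI C6000 tools (the section and memory-range names are invented; the exact linker command file syntax should be checked against the toolchain documentation):

/* C source: put loadableSW into its own output section */
#pragma CODE_SECTION(loadableSW, ".loadable_sw")
void loadableSW(void)
{
    /* loadable software goes here */
}

/* Linker command file (conceptual):
 *   MEMORY   { LOADSW : origin = 0x3FF802, length = 0x0800 }
 *   SECTIONS { .loadable_sw : > LOADSW }
 * This pins the section, and therefore the function, at 0x3FF802.
 */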
I'm not sure about your OS/toolchain/IDE, but the following answer should work:
How to specify a memory location at which function will get stored?
There is just one way I know of and it is shown in the first answer.
UPDATE
How to define sections in gcc:
variables:
http://mcuoneclipse.com/2012/11/01/defining-variables-at-absolute-addresses-with-gcc/
methods (section ("section-name")): http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Function-Attributes.html#Function%20Attributes
How to place a function at a particular address in C?
Since pragma directives are not permissible in test scripts, I am wondering if there is any other way to do this without using any linker directives.
If your target supports PC-relative addressing and you can ensure that the generated code is purely PC-relative, then you can use a memcpy() to relocate the routine.
How to run code from RAM... has some hints on this. If you cannot generate PC-relative/relocatable code, then you absolutely cannot do this without the help of the linker. That is the definition of a linker/loader: to fix up addresses.
That can take you to a different concept: do not fully link your code. Instead, defer the address fixup until loading. Then you must write a loader to place the code at run time; but from your aerospace project comment, I think that complexity and analysis are also important, so I don't believe you would accept that. You also need double the storage, etc.
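To make the memcpy() idea concrete, a heavily simplified sketch (it assumes the routine is pure PC-relative code, that its size is known from the map file, and that the destination address is writable and executable; all of the constants are invented):

#include <cstring>
#include <cstdint>
#include <cstddef>

void loadableSW(void);                           // compiled as position-independent code
const std::uintptr_t kTarget = 0x3FF802;         // destination address
const std::size_t    kSwSize = 0x200;            // size taken from the linker map (assumption)

void installLoadableSW(void)
{
    // Only valid if loadableSW contains no absolute references.
    std::memcpy(reinterpret_cast<void *>(kTarget),
                reinterpret_cast<const void *>(&loadableSW),
                kSwSize);
}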

memory address of a member variable of argument objects changes when dll function is called

class SomeClass
{
//some members
MemberClass one_of_the_mem_;
};
I have a function foo(SomeClass *object) within a DLL; it is being called from an exe.
Problem
The address of one_of_the_mem_ changes while the DLL call is being dispatched.
Details:
before the call is made (from exe):
&(this->one_of_the_mem_) - `0x00e913d0`
after - in the dll itself:
&(this->one_of_the_mem_) - `0x00e913dc`
The address of object remains constant. It is only the member whose address shifts by 0xc every time.
I would like some pointers on how to troubleshoot this problem.
Code :
Code from Exe
stat = module->init ( this,
object_a,
&object_b,
object_c,
con_dir
);
Code in DLL
Status_C ModuleClass( SomeClass *object, int index, Config *conf, const char* name)
{
_ASSERT(0); //DEBUGGING HOOK
...
Update 1:
I compared the Offsets of members following Michael's instruction and they are the same in both cases.
Update 2:
I found a way to dump the class layout and noticed a difference in size; I have to figure out why that is happening, though.
Linked is the question I found for dumping the class layout.
Update 3:
Final update: Solved the problem, thanks largely to Michael Burr.
It turned out that one of the builds was using 32-bit time (_USE_32BIT_TIME_T was defined in it) and the other was using 64-bit time, so the two builds generated different layouts for the object; attached is the difference file.
Your DLL was probably compiled with different set of compiler options (or maybe even a slightly different header file) and the class layout is different as a result.
For example, if one was built using debug flags and other wasn't or even if different compiler versions were used. For example, the libraries used by different compiler versions might have subtle differences and if your class incorporates a type defined by the library you could have different layouts.
As a concrete example, with Microsoft's compiler iterators and containers are sensitive to release/debug, _SECURE_SCL on/off , and _HAS_ITERATOR_DEBUGGING on/off setting (at least up though VS 2008 - VS 2010 may have changed some of this to a certain extent). See http://connect.microsoft.com/VisualStudio/feedback/details/352699/secure-scl-is-broken-in-release-builds for some details.
These kinds of issues make using C++ classes across DLL boundaries a bit more fragile than using straight C interfaces. They can occur in C structures as well, but it seems like C++ libraries have these differences more often (I think that's the nature of having richer functionality).
Another layout-changing issue that occurs every now and then is having a different structure packing option in effect in the different compiles. One thing that can 'hide' this is that pragmas are often used in headers to set structure packing to a certain value, and sometimes you may come across a header that does this without changing it back to the default (or more correctly the previous setting). If you have such a header, it's easy to have it included in the build for one module, but not another.
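One cheap way to catch this class of problem early (purely illustrative; SomeClass and one_of_the_mem_ are the names from the question, assumed to be defined in the shared header) is to have both modules compute and compare the layout numbers they each see for the shared type:

#include <cstddef>

// In the shared header: both the EXE and the DLL compile their own copy.
inline void getLayout(std::size_t &size, std::size_t &offset)
{
    size = sizeof(SomeClass);
    offset = offsetof(SomeClass, one_of_the_mem_);
}

// At startup, the EXE calls its own copy and an exported DLL copy; if packing,
// _USE_32BIT_TIME_T, debug-iterator settings, etc. differ between the modules,
// the values will not match and you can fail fast instead of corrupting memory.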
That sounds a bit weird; you should show more code. It shouldn't 'move' if it is being passed by reference; it sounds more like a copy of it is being made and the member function is being called on that copy.
Perhaps the DLL was compiled against a different version of the header than the one you are referencing. Check and make sure the header file is for the same version as the DLL.
Recompile the library if you can.

How to use CUDA constant memory in a programmer pleasant way?

I'm working on a number crunching app using the CUDA framework. I have some static data that should be accessible to all threads, so I've put it in constant memory like this:
__device__ __constant__ CaseParams deviceCaseParams;
I use the call cudaMemcpyToSymbol to transfer these params from the host to the device:
void copyMetaData(CaseParams* caseParams)
{
cudaMemcpyToSymbol("deviceCaseParams", caseParams, sizeof(CaseParams));
}
which works.
Anyways, it seems (by trial and error, and also from reading posts on the net) that for some sick reason, the declaration of deviceCaseParams and the copy operation on it (the call to cudaMemcpyToSymbol) must be in the same file. At the moment I have these two in a .cu file, but I really want to have the parameter struct in a .cuh file so that any implementation could see it if it wants to. That means that I also have to have the copyMetaData function in a header file, but this messes up linking (symbol already defined) since both .cpp and .cu files include this header (and thus both the MS C++ compiler and nvcc compile it).
Does anyone have any advice on design here?
Update: See the comments
With an up-to-date CUDA (e.g. 3.2) you should be able to do the memcpy from within a different translation unit if you're looking up the symbol at runtime (i.e. by passing a string as the first arg to cudaMemcpyToSymbol as you are in your example).
Also, with Fermi-class devices you can just malloc the memory (cudaMalloc), copy to the device memory, and then pass the argument as a const pointer. The compiler will recognise if you are accessing the data uniformly across the warps and if so will use the constant cache. See the CUDA Programming Guide for more info. Note: you would need to compile with -arch=sm_20.
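By way of illustration, a sketch of that second approach (the kernel name and the 'scale' field are invented; CaseParams is the struct from the question; error checking omitted):

__global__ void crunch(const CaseParams *params, float *out)
{
    // Uniform reads of *params across a warp can be served by the constant
    // cache on sm_20+ devices.
    out[threadIdx.x] *= params->scale;            // 'scale' is a hypothetical field
}

void launchCrunch(const CaseParams *hostParams, float *devOut, int n)
{
    CaseParams *devParams = 0;
    cudaMalloc((void **)&devParams, sizeof(CaseParams));
    cudaMemcpy(devParams, hostParams, sizeof(CaseParams), cudaMemcpyHostToDevice);
    crunch<<<1, n>>>(devParams, devOut);
    cudaDeviceSynchronize();
    cudaFree(devParams);
}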
If you're using pre-Fermi CUDA, you will have found out by now that this problem doesn't just apply to constant memory, it applies to anything you want on the CUDA side of things. The only two ways I have found around this are to either:
Write everything CUDA in a single file (.cu), or
If you need to break out code into separate files, restrict yourself to headers which your single .cu file then includes.
If you need to share code between CUDA and C/C++, or have some common code you share between projects, option 2 is the only choice. It seems very unnatural to start with, but it solves the problem. You still get to structure your code, just not in a typically C like way. The main overhead is that every time you do a build you compile everything. The plus side of this (which I think is possibly why it works this way) is that the CUDA compiler has access to all the source code in one hit which is good for optimisation.