C and C++ Code Interoperability - Data Passing Issues - c++

The situation is as follows. There is a system which is written completely in C. This C program spawns a new thread to start a data processing engine written in C++. Hence the system runs two threads (the main thread and the data processing engine thread). I have written a function in C which takes a C struct and passes it to the data processing thread so that a C++ function can access it. While doing so, I observe that the values of certain fields (such as an unsigned int) in the C struct change when they are accessed on the C++ side, and I am not sure why. At the same time, if I pass around a primitive data type like an int, the value does not change. It would be great if someone could explain why it behaves like this. The following is the code that I wrote.
`
/* C++ Function */
void DataProcessor::HandleDataRecv(custom_struct* cs)
{
/*Accesses the fields in the structure cs - an unsigned int field. The value of
field here is different from the value when accessed through the C function below.
*/
}
/*C Function */
void forwardData(custom_struct* cs)
{
dataProcessor->HandleDataRecv(cs); //Here dataProcessor is a reference to the object
//of the C++ class.
}
`
Also, these two functions are in different source files (one with a .c extension and the other with .cc).

I'd check that both sides lay out the struct in the same way:
Print sizeof(custom_struct) in both languages.
Create an instance of custom_struct in both languages and print the offset of each member variable.
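A minimal sketch of that check, assuming a made-up custom_struct (the question does not show the real declaration, so the three fields here are hypothetical). The same function can be built on the C side with <stddef.h> and <stdio.h>; if the two builds print different numbers, the layouts disagree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for the real custom_struct, which is not shown in the question.
struct custom_struct {
    char tag;
    unsigned int count;
    double value;
};

// Print the total size and the offset of every member.
// Run this in both the C and the C++ build and compare the output line by line.
void print_layout(const char* side) {
    assert(side != nullptr);
    std::printf("[%s] sizeof(custom_struct) = %zu\n", side, sizeof(custom_struct));
    std::printf("[%s] offsetof(tag)   = %zu\n", side, offsetof(custom_struct, tag));
    std::printf("[%s] offsetof(count) = %zu\n", side, offsetof(custom_struct, count));
    std::printf("[%s] offsetof(value) = %zu\n", side, offsetof(custom_struct, value));
}
```

Calling print_layout("C++") from the .cc file and the equivalent function from the .c file gives two listings that should match exactly.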

My wild guess would be that Michael Andresson is right: structure alignment might be the issue.
Try to compile both the C and the C++ files with
-fpack-struct=4
(or some other number in place of 4). This way, the struct is aligned the same in every case.
If we could see the struct declaration, it would probably be clearer. The struct does not contain any #ifdef with C++-specific code like a constructor, does it? Also, check for #pragma pack directives, which manipulate data alignment.

Maybe on one side the struct has 'empty bytes' added to make the variables align on 32 bit boundaries for speed (so a CPU register can point to the variable directly).
And on the other side the struct may be packed to conserve space.
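To illustrate the padded-versus-packed mismatch described above, here is a small sketch. The two struct names are invented for the example and stand for "the same declaration seen with different packing settings"; #pragma pack is assumed to be available (GCC, Clang and MSVC all accept it).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Default layout: the compiler may insert padding after `tag`.
struct padded_view {
    char tag;
    unsigned int count;
};

// Same members, but packed to 1-byte alignment: no padding at all.
#pragma pack(push, 1)
struct packed_view {
    char tag;
    unsigned int count;
};
#pragma pack(pop)

// Write `value` using the padded layout, then read the same bytes back
// using the packed layout, as a mismatched reader would.
unsigned int read_count_through_packed_view(unsigned int value) {
    padded_view written;
    std::memset(&written, 0, sizeof written);  // make padding bytes well defined
    written.tag = 'x';
    written.count = value;

    packed_view seen;
    assert(sizeof seen <= sizeof written);
    std::memcpy(&seen, &written, sizeof seen);
    return seen.count;  // differs from `value` whenever the offsets differ
}
```

On a typical platform where unsigned int is 4-byte aligned, the padded struct places count at offset 4 and the packed one at offset 1, so the value read back is not the value written, which is the kind of symptom the question describes.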

(CORRECTION) With minor exceptions, C++ is a superset of C (meaning C89), so I'm confused about what is going on. I can only assume it has something to do with how you are passing or typing your variables, and/or the systems they are running on. Unless I am very mistaken, it should technically have nothing to do with C/C++ interoperability.
Some more details would help.

ELF INIT section code to prepopulate objects used at runtime

I'm fairly new to c++ and am really interested in learning more. Have been reading quite a bit. Recently discovered the init/fini elf sections.
I started to wonder if & how one would use the init section to prepopulate objects that would be used at runtime. Say for example you wanted
to add performance measurements to your code, recording the time, filename, linenumber, and maybe some ID (monotonic increasing int for ex) or name.
You would place for example:
PROBE(0,"EventProcessing",__FILE__,__LINE__)
...... //process event
PROBE(1,"EventProcessing",__FILE__,__LINE__)
......//different processing on same event
PROBE(2,"EventProcessing",__FILE__,__LINE__)
The PROBE could be some macro that populates a struct containing this data (maybe on an array/list, etc using the id as an indexer).
Would it be possible to have code in the init section that could prepopulate all of this data for each PROBE (except for the time of course), so only the time would need to be retrieved/copied at runtime?
As far as I know, __attribute__((constructor)) cannot be applied to member functions?
My initial idea was to create some kind of linked list with each node pointing to a probe, so that code in the init section could iterate over it, populating the id, file, line, etc. But that idea assumed I could use a member function that runs in the "init" section, and that does not seem possible. Any tips appreciated!
As far as I understand it, you do not actually need an ELF constructor here. Instead, you could emit descriptors for your probes using extended asm statements (emitting data instead of code). This also involves switching to a dedicated ELF section for the probe descriptors, say __probes.
The linker will concatenate all the probes into an array and generate the special symbols __start___probes and __stop___probes, which you can use from your program to access these probes. See the last paragraph in Input Section Example.
Systemtap implements something quite similar for its userspace probes:
User Space Probe Implementation
Adding User Space Probing to an Application (heapsort example)
Similar constructs are also used within the Linux kernel for its self-patching mechanism.
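A rough sketch of the dedicated-section idea, with one substitution stated plainly: instead of extended asm it uses the GCC/Clang section attribute on a static descriptor. It assumes an ELF target linked with GNU ld or lld (which generate the __start_/__stop_ symbols for sections whose names are valid C identifiers). ProbeDesc, PROBE, process_event and probe_count are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical probe descriptor; everything except the timestamp is known at build time.
struct ProbeDesc {
    int id;
    const char* name;
    const char* file;
    int line;
};

// Emit one descriptor per PROBE site into the ELF section "__probes".
// `used` keeps the otherwise unreferenced object; the explicit `aligned`
// stops the compiler from over-aligning entries, so they stay contiguous.
#define PROBE(id_, name_)                                            \
    do {                                                             \
        static const ProbeDesc desc                                  \
            __attribute__((section("__probes"), used,                \
                           aligned(alignof(ProbeDesc)))) =           \
            {id_, name_, __FILE__, __LINE__};                        \
        (void)desc;                                                  \
    } while (0)

// Generated by the linker: first descriptor and one past the last.
extern "C" {
extern const ProbeDesc __start___probes[];
extern const ProbeDesc __stop___probes[];
}

void process_event() {
    PROBE(0, "EventProcessing");
    // ... process event ...
    PROBE(1, "EventProcessing");
}

// Number of descriptors the linker collected across all object files.
std::size_t probe_count() {
    assert(__stop___probes >= __start___probes);
    return static_cast<std::size_t>(__stop___probes - __start___probes);
}
```

The descriptors are plain initialized data, so no code runs at load time; a runtime table of timestamps indexed by id could sit alongside them. The order of entries within the section is up to the linker and should not be relied on.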
There's a pretty simple way to have code run on module load time: Use the constructor of a global variable:
struct RunMeSomeCode
{
RunMeSomeCode()
{
// your code goes here
}
} do_it;
The .init/.fini sections basically exist to implement global constructors/destructors as part of the ABI on some platforms. Other platforms may use different mechanisms such as _start and _init functions or .init_array/.fini_array and .preinit_array. There are lots of subtle differences between all these methods, and which one to use for what is a question that can really only be answered by the documentation of your target platform. Not all platforms use ELF to begin with…
The main point to understand is that things like the .init/.fini sections in an ELF binary happen way below the level of C++ as a language. A C++ compiler may use these things to implement certain behavior on a certain target platform. On a different platform, a C++ compiler will probably have to use different mechanisms to implement that same behavior. Many compilers will give you tools in the form of language extensions like __attribute__ or #pragmas to control such platform-specific details. But those generally only make sense and will only work with that particular compiler on that particular platform.
You don't need a member function (which gets a this pointer passed as an arg); instead you can simply create constructor-like functions that reference a global array, like
#define PROBE(id, stuff, more_stuff) \
__attribute__((constructor)) void \
probeinit##id(){ probes[id] = {id, stuff, 0/*to be written later*/, more_stuff}; }
The trick is having this macro work in the middle of another function. GNU C / C++ allows nested functions, but IDK if you can make them constructors.
You don't want to declare a static int dummy##id = something because then you're adding overhead to the function you profile. (gcc has to emit a thread-safe run-once locking mechanism.)
Really what you'd like is some kind of separate pass over the source that identifies all the PROBE macros and collects up their args to declare
struct probe global_probes[] = {
{0, "EventName", 0 /*placeholder*/, filename, linenum},
{1, "EventName", 0 /*placeholder*/, filename, linenum},
...
};
I'm not confident you can make that happen with CPP macros; I don't think it's possible to #define PROBE such that every time it expands, it redefines another macro to tack on more stuff.
But you could easily do that with an awk/perl/python / your fave scripting language program that scans your program and constructs a .c that declares an array with static storage.
Or better (for a single-threaded program): keep the runtime timestamps in one array, and the names and stuff in a separate array. So the cache footprint of the probes is smaller. For a multi-threaded program, stores to the same cache line from different threads is called false sharing, and creates cache-line ping-pong.
So you'd have #define PROBE(id, evname, blah blah) do { probe_times[id] = now(); }while(0)
and leave the handling of the later args to your separate preprocessing.
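A minimal sketch of that split, under stated assumptions: the probe_info table stands in for what the separate script would generate (the file name and line numbers below are invented placeholders), now_ns() is a hypothetical helper built on std::chrono::steady_clock, and the whole thing is single-threaded.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Cold metadata, one row per probe. In the scheme above this table would be
// produced by the separate preprocessing script; these rows are placeholders.
struct ProbeInfo {
    int id;
    const char* event;
    const char* file;
    int line;
};

static const ProbeInfo probe_info[] = {
    {0, "EventProcessing", "events.cc", 10},
    {1, "EventProcessing", "events.cc", 12},
    {2, "EventProcessing", "events.cc", 14},
};
constexpr std::size_t kProbeCount = sizeof(probe_info) / sizeof(probe_info[0]);

// Hot data: only the timestamps are touched at runtime, so they sit together.
static std::int64_t probe_times[kProbeCount];

inline std::int64_t now_ns() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
}

// The runtime cost of a probe is one clock read and one store.
#define PROBE(id) do { probe_times[(id)] = now_ns(); } while (0)

void process_event() {
    PROBE(0);
    // ... process event ...
    PROBE(1);
    // ... different processing on the same event ...
    PROBE(2);
}

// Elapsed time between two probes, looked up after the fact.
std::int64_t elapsed_ns(std::size_t from, std::size_t to) {
    assert(from < kProbeCount && to < kProbeCount);
    return probe_times[to] - probe_times[from];
}
```

For a multi-threaded program each thread would need its own probe_times array (padded to a cache line) to avoid the false sharing mentioned above.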

Passing function pointers as an API interface to a compiled library

Dearest stack exchange,
I'm programming an MRI scanner. I won't go into too much background, but I'm fairly constrained in how much code I've got access to, and the way things have been set up is...suboptimal. I have a situation as follows:
There is a big library, written in C++. It ultimately does "transcoding" (in the worst possible way), writing out FPGA assembly that DoesThings. It provides a set of functions to "userland" that are translated into (through a mix of preprocessor macros and black magic) long strings of 16 bit and 32 bit words. The way this is done is prone to buffer overflows, and generally to falling over.*
The FPGA assembly is then strung out over a glorified serial link to the relevant electronics, which executes it (doing the scan), and returning the data back again for processing.
Programmers are expected to use the functions provided by the library to do their thing, in C (not C++) functions that are linked against the standard library. Unfortunately, in my case, I need to extend the library.
There's a fairly complicated chain of preprocessor substitution and tokenization, calling, and (in general) stuff happening between you writing doSomething() in your code, and the relevant library function actually executing it. I think I've got it figured out to some extent, but it basically means that I've got no real idea about the scope of anything...
In short, my problem is:
In the middle of a method, in a deep dark corner of many thousands of lines of code in a big blob I have little control over, with god-knows-what variable scoping going on, I need to:
Extend this method to take a function pointer (to a userland function) as an argument, but
Let this userland function, written after the library has been compiled, have access to variables that are local to both the scope of the method where it appears, as well as variables in the (C) function where it is called.
This seems like an absolute mire of memory management, and I thought I'd ask here for the "best practice" in these situations, as it's likely that there are lots of subtle issues I might run into -- and that others might have lots of relevant wisdom to impart. Debugging the system is a nightmare, and I've not really got any support from the scanner's manufacturer on this.
A brief sketch of how I plan to proceed is as follows:
In the .cpp library:
/* In something::something() */
/* declare a pointer to a function */
void (*fp)(int*, int, int, ...);
/* by default, the pointer points to a placeholder at compile time*/
fp = &doNothing(...);
...
/* At the appropriate time, point the pointer to the userland function, whose address is supplied as an argument to something(): */
fp= userFuncPtr;
/* Declare memory for the user function to plonk data into */
i_arr_coefficients = (int) malloc(SOMETHING_SENSIBLE);
/* Create a pointer to that array for the userland function */
i_ptr_array=&i_arr_coefficients[0];
/* define a struct of pointers to local variables for the userland function to use*/
ptrStrct=createPtrStruct();
/* Call the user's function: */
fp(i_ptr_array,ptrStrct, ...);
CarryOnWithSomethingElse();
The point of the placeholder function is to keep things ticking over if the user function isn't linked in. I get that this could be replaced with a #define, but the compiler's cleverness or stupidity might result in odd (to my ignorant mind, at least) behaviour.
In the userland function, we'd have something like:
void doUsefulThings(i_ptr_array, ptrStrct, localVariableAddresses, ...) {
double a=*ptrStrct.a;
double b=*ptrStrct.b;
double c=*localVariableAddresses.c;
double d=doMaths(a, b, c);
/* I.e. do maths using all of these numbers we've got from the different sources */
storeData(i_ptr_array, d);
/* And put the results of that maths where the C++ method can see it */
}
...
something(&doUsefulThings(i_ptr_array, ptrStrct, localVariableAddresses, ...), ...);
...
If this is as clear as mud please tell me! Thank you very much for your help. And, by the way, I sincerely wish someone would make an open hardware/source MRI system.
*As an aside, this is the primary justification the manufacturer uses to discourage us from modifying the big library in the first place!
You have full access to the C code and limited access to the C++ library code. The C code defines the "doUsefulThings" function. From the C code you call "something" (a C++ class/function) with a function pointer to "doUsefulThings" as the argument. Control then passes to the C++ library, where memory for the various arguments is allocated and initialized, and "doUsefulThings" is called with those arguments. At that point control transfers back to the C code. In short, the main program (C) calls the library (C++), and the library calls the C function.
One of the requirements is that the userland function should have access to local variables of the C code from which it is called. When you call "something", you only pass the address of "doUsefulThings". No parameter of "something" captures the addresses of those local variables, so "doUsefulThings" does not have access to them.
malloc returns a pointer, and the sketch does not handle it properly (probably you were only trying to give us an overview). You must also take care to free this memory somewhere.
Since this is a mixture of C and C++ code, it is difficult to use RAII (to take care of allocated memory), perfect forwarding (to avoid copying variables), lambda functions (to access local variables) etc. Under the circumstances, your approach seems to be the way to go.
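One common way to close the gap mentioned above (the callback cannot see the caller's locals) is to pass an opaque context pointer alongside the function pointer. The sketch below is only an illustration of that pattern: LibraryLocals, CallerLocals, UserCallback and the simplified signature of something() are all invented here and do not reflect the real scanner library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of the library-side locals the callback may read.
struct LibraryLocals {
    const double* a;
    const double* b;
};

// Callback type: output buffer, its length, library locals, and an opaque
// pointer that the library hands back to the user untouched.
using UserCallback = void (*)(int* out, std::size_t n,
                              const LibraryLocals* lib, void* user_ctx);

// Placeholder used when no user function is supplied.
void doNothing(int*, std::size_t, const LibraryLocals*, void*) {}

// Library side (simplified): owns the buffer, so nothing needs a manual free.
int something(UserCallback cb, void* user_ctx) {
    double a = 1.5, b = 2.0;              // locals of the library method
    LibraryLocals locals{&a, &b};
    std::vector<int> coefficients(4, 0);  // memory for the user function to fill
    if (cb == nullptr) cb = doNothing;
    cb(coefficients.data(), coefficients.size(), &locals, user_ctx);
    return coefficients[0];
}

// User side: locals of the calling function travel through user_ctx.
struct CallerLocals {
    double c;
};

void doUsefulThings(int* out, std::size_t n,
                    const LibraryLocals* lib, void* user_ctx) {
    assert(lib != nullptr && user_ctx != nullptr);
    const CallerLocals* caller = static_cast<const CallerLocals*>(user_ctx);
    double d = *lib->a + *lib->b + caller->c;  // maths over both sets of locals
    if (n > 0) out[0] = static_cast<int>(d);
}
```

A caller would write CallerLocals mine{3.5}; something(doUsefulThings, &mine);. The pointers in both structs are only valid for the duration of the call, so the callback must not store them.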

Which tool can list writing access to a specific variable in C?

Unfortunately I'm not even sure what this sort of static analysis is called. It's not really control flow analysis because I'm not looking for function calls, and I don't really need data flow analysis because I don't care about the actual values.
I just need a tool that lists the locations (file, function) where write access to a specific variable takes place. I don't even care if that list contains lines that are unreachable. I could imagine that writing a simple parser would suffice for this task, but I'm certain that there must be a tool out there that does this simple analysis.
As a poor student I would appreciate free or, better yet, open source tools, and if someone could tell me what this type of static analysis is actually called, I would be equally grateful!
EDIT: I forgot to mention there's no pointer arithmetic in the code base.
Why don't you make the variable const and then note down all the errors where your compiler bans write access?
Note: This won't catch errors where the memory underlying the variable is written to in some erroneous manner such as a buffer overrun.
EDIT: For example:
const int a = 1;
a = 2;
a = 3;
My compiler produces:
1>MyProg.c(46): error C3892: 'a' : you cannot assign to a variable that is const
1>MyProg.c(47): error C3892: 'a' : you cannot assign to a variable that is const
Do you mean something like this?
This works for C programs that you have made the effort to analyze with Frama-C's value analysis. It is Open Source and the dependency information is also available programmatically. As static analyzers go, it is rather on the “precise” side of the spectrum. It will work better if your target is embedded C code.
I am not sure such a tool could be written. Pointers can be used to change arbitrary data in memory without having any reference to other variables pointing to that data. Think about functions like memset(), which change whole blocks of memory.
If you are not interested in this kind of mutation, you would still have to take transitive pointers into account. In C, you can have any number of pointers pointing to the same data, and you would have to analyze where copies of these pointers are made. And then these copies can be copied again, ...
So even in the "simple" case it would require quite a large amount of code analysis.
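To make the aliasing point concrete, here is a small sketch with invented names (sensor_value and the three functions are hypothetical). Only the first function mentions the variable by name on the left of an assignment; the other two modify it just the same.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

int sensor_value = 0;  // the variable whose writes we would like to list

// Easy to find: the variable name appears as the assignment target.
void set_directly() { sensor_value = 1; }

// Harder: the write goes through a pointer that may or may not alias it.
void set_through(int* p) {
    assert(p != nullptr);
    *p = 2;
}

// Harder still: a block write that never names any variable.
void clear_block(void* p, std::size_t n) { std::memset(p, 0, n); }
```

A purely textual search for "sensor_value =" finds only set_directly(); a tool has to track where &sensor_value flows to report the other two.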

C/C++ difference and

I have vendor-specific code for an ADC and other peripherals, and I am using it to understand the flow.
The files have the extension .cpp, but the statements in them look like C rather than C++,
i.e. printf() is used instead of cout;
there is no use of namespace std... and there were other things which made me sure that it is C code.
(Pardon me, but whatever I ask the vendor, the response comes quite late.)
So I took it to be complete C code. But while working through it I came to a point where a class is defined, and I am really puzzled now, since I have not seen or heard of anyone using a class in C:
C4DSPBlast cBlast;
cBlast.GetBlastInfo();
Here cBlast is declared as C4DSPBlast cBlast;
and the following code shows that C4DSPBlast is a class. While debugging, I found that I am getting an error at exactly this statement: cBlast.GetBlastInfo();
Since I don't know about classes in C, I am posting it here, as I am not getting any further in debugging.
class C4DSPBlast
{
public:
//! empty constructor.
C4DSPBlast(void);
//! empty destructor.
~C4DSPBlast(void);
//! Get BLAST information from the hardware(firmware).
/*!
* Read the BLAST information from the PCI memory attached to the hardware device. This function populates internal class members with this information.
* #return CBLAST_IO_ERROR_BLAST_INFO_RD, CBLAST_NO_DEV_TYPE or CBLAST_SUCCESS if no errors.
*/
int GetBlastInfo(void);
//! m_valBLASTRegister the standard BLAST information register.
union { BLASTReg m_BLASTRegister; unsigned long m_val0; } m_valBLASTRegister;
//! m_valBLASTRegisterExt the extended BLAST information register.
union { BLASTReg m_BLASTRegisterExt; unsigned long m_val1; } m_valBLASTRegisterExt;
//! The whole BLAST information populated by GetBlastInfo() as a C data structure.
struct BOARD m_cBoard;
};
The code is C++. Compile it as C++ and the errors will disappear.
C and C++ are different languages. The common convention today is that if you give the compiler a file with the extension .c, it will compile it as C. If you give it a .cpp or .cxx file (the exact list depends on the compiler), it will process it as C++. This works even if you put a mixture of C and C++ files on the same command line.
If you pick an arbitrary C file, rename it to .cpp, and give it to the compiler, there is a 99% chance that it will compile. The C++ standard describes a list of incompatibilities with C, but these are rather rare things.
Most likely you are looking at a file that was created as C and then started a new life as C++.
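Two of the better-known incompatibilities can be shown in a few lines. This is only an illustrative sketch; the function names are made up, and the block is meant to be compiled as C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// In C++ a character literal has type char, so this returns 1.
// The same expression in C has type int and yields sizeof(int).
std::size_t char_literal_size() { return sizeof('a'); }

// C accepts `int* p = malloc(...)` because void* converts implicitly.
// C++ rejects that line; the conversion has to be written out.
int* make_buffer(std::size_t n) {
    int* p = static_cast<int*>(std::malloc(n * sizeof(int)));
    assert(p != nullptr);
    return p;  // caller releases it with std::free
}
```

Code that looks like C but sits in a .cpp file, as in the question, typically already carries such casts, which is one hint that it has been compiled as C++ all along.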

Why is this a memory copying error - Insure++ false positive?

I've been trying to run Insure++ with some scientific code and it reports many errors, although to be fair it officially does not support K&R C and I don't know what having a lot of K&R functions has done to its evaluation process. The C and C++ code it is testing is being run in a DLL invoked from a WPF application.
One error report that puzzles me is the following. I'm confident the code is safe (it does work), but I am trying to work out why Insure++ thinks it is an error. I'd be interested if anyone has an insight into why this might be an error condition.
[MacImagePlot.c:984] **READ_OVERFLOW**
SetCursorQD(*GetCursorQD(watchCursor));
Reading overflows memory: GetCursorQD(watchCursor)
bbbbb
| 4 | 4 |
rrrrr
Reading (r) : 0x5639d164 thru 0x5639d167 (4 bytes)
From block (b) : 0x5639d160 thru 0x5639d163 (4 bytes)
gWatchCursor, declared at WPFMacGraphics.cpp, 418
for some very simple code.
typedef int Cursor;
typedef Cursor* CursPtr;
typedef CursPtr* CursHandle;
CursHandle GetCursorQD (short cursorID);
void SetCursorQD (const Cursor *crsr);
enum {
....
watchCursor = 4
};
// file globals
Cursor gWatchCursor=watchCursor;
CursPtr gWatchCursorPtr = &gWatchCursor;
CursHandle GetCursorQD (short cursorID)
{
if (cursorID==watchCursor) // this is actually the only case ever called
return &gWatchCursorPtr;
return 0;
}
I'm not familiar at all with the tools you're talking about, but have you verified that your GetCursorQD function is returning the pointer you expect and not NULL/0?
Perhaps something wonky happened with your enum definition for watchCursor (such as it being declared differently elsewhere, or it picking up a local variable instead of the enum).
I hate to say it but I suspect your problem is going to be the lack of some arcane function modifiers needed to ensure that data on the stack isn't getting munged when crossing the DLL boundary. I'd suggest writing a simple app that replicates the code but does it all in one module and see if Insure++ still detects an error. If it doesn't, get ready to wade through __declspec documentation.
I assume that the following line is the problem:
if (cursorID==watchCursor)
cursorID is defined as short (usually 2 bytes)
watchCursor is part of an enum and thus of type int (4 bytes on a 32-bit OS)
This actually is not a problem. The compiler converts both operands to a common type before comparing (the short is promoted to int), so the comparison is correct.
In my experience, all static (as well as runtime) code analysis tools report many false positives (I tried some of them). They help, of course, but it takes quite a while to separate false positives from real bugs.
Like Soapbox, I am not familiar with Insure++.
But looking at the code, it is admittedly a bit confusing... so
That typedef makes CursHandle effectively a pointer to pointer to int...
CursHandle is a pointer to CursPtr
CursPtr is a pointer to Cursor
Cursor is typedef'd to type int
yet in GetCursorQD, you are using a 'double address of' int? The reason I say 'double address' is that the function returns the address of gWatchCursorPtr (&gWatchCursorPtr) as a CursHandle, and gWatchCursorPtr is in turn a global variable holding the address of gWatchCursor (&gWatchCursor), which is of type Cursor.
Your definition of the return type for the function does not seem to match up with the global variable's type despite the typedefs... that's what my thinking is...
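One way to test the typedef concern above is to let the compiler check it. The sketch below reuses the declarations from the question (with the enum reduced to the single enumerator shown); the static_asserts and the read_cursor helper are additions for illustration. In this reduced form the types do line up: &gWatchCursorPtr has exactly the type CursHandle.

```cpp
#include <cassert>
#include <type_traits>

typedef int Cursor;
typedef Cursor* CursPtr;
typedef CursPtr* CursHandle;

enum { watchCursor = 4 };  // other enumerators from the question omitted

Cursor gWatchCursor = watchCursor;
CursPtr gWatchCursorPtr = &gWatchCursor;

// Compile-time checks of the typedef chain.
static_assert(std::is_same<CursHandle, int**>::value,
              "CursHandle is a pointer to pointer to int");
static_assert(std::is_same<decltype(&gWatchCursorPtr), CursHandle>::value,
              "&gWatchCursorPtr has the declared return type");

CursHandle GetCursorQD(short cursorID) {
    if (cursorID == watchCursor)  // short is promoted to int for the comparison
        return &gWatchCursorPtr;
    return nullptr;
}

// Follow the handle down to the int it ultimately refers to.
Cursor read_cursor(short cursorID) {
    CursHandle h = GetCursorQD(cursorID);
    assert(h != nullptr && *h != nullptr);
    return **h;
}
```

Since the handle dereferences to a pointer-sized object and then to a 4-byte int, any overflow report would have to come from how the real globals are declared or seen across modules, which this reduced sketch cannot show.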
Hope this helps,
Best regards,
Tom.