Are object and void pointer exchangeable in extern C declarations? - c++

I have an extern "C" interface to some dynamically loaded code. For that code some objects are opaque handles, expressed as void*. The interface looks like this:
extern "C" {
void* get_foo() { Foo* foo = /* create and remember object */; return foo; }
int foo_get_bar(void* foo) { return ((Foo*)foo)->bar(); }
}
Such patterns are pretty much the standard way to interact with C++ objects across a C API, AFAIK. But I wonder if I can omit that pointer cast?
The caller of the code is generated and only linked against the interface above (it has its own function declarations). So it effectively generates code like this:
int foo_get_bar(void*);
void* get_foo();
int do_something() { return foo_get_bar(get_foo()); }
We can assume that the caller uses the code correctly (i.e., it does not pass wrong pointers). But of course it has no notion of a Foo pointer.
Now could I change the interface to the (simpler) variant:
extern "C" {
int foo_get_bar(Foo* foo) { return foo->bar(); }
}
Or is there a subtle difference between void* and Foo* at the linker level (could e.g., the sizes not match up?).
edit: I might have caused some confusion with the caller code in C. There is no C code. There is no C compiler and thus no C type checker involved. The caller code is generated and it uses the C-API. So to use an opaque struct I would need to know how the C-API represents that pointer after compilation.

I found the question a little confusing, but if I understand you correctly this is fine. The size of a pointer to any object type is the same on every mainstream modern platform (we have, thankfully, left behind us the insanity of __near, __far, and __middling).
So I might introduce some notion of type-safety like this in the header file declaring get_foo():
// Foo.h
struct Foo;
extern "C"
{
Foo* get_foo ();
int foo_get_bar (Foo* foo);
}
And in the implementation of class Foo, which is presumably C++, we might have:
// foo.cpp
struct Foo
{
int get_bar () { return 42; }
};
Foo* get_foo () { return new Foo (); }
int foo_get_bar (Foo *foo) { return foo->get_bar (); }
Sneaky, but legal. You just have to declare Foo as a struct rather than a class in foo.cpp, which means that if you want things private or protected you have to explicitly say so.
If that's not palatable, then you can do this instead in foo.h:
#ifdef __cplusplus
class Foo;
#else
struct Foo;
#endif
And then declare Foo as a class rather than a struct in foo.cpp. Choose your poison.

In practice the pointer sizes/representations shouldn't change on common platforms x86 and ARM, so you may get away with using void *.
In principle however, the standard doesn't guarantee that 2 pointers of different referenced types will have the same pointer representation and even width, except for the cases mentioned in
C11 6.2.5p28:
that a pointer to void will have the same representation and alignment requirement as a pointer to a character type
pointers to qualified or unqualified versions of compatible types will have the same representation and alignment requirements
all pointers to structures will have the same representation and alignment requirements among themselves
all pointers to unions will have the same representation and alignment requirements among themselves
Therefore the cleanest way to represent a pointer to a C++ class in C is to use a pointer to an incomplete struct:
typedef struct Foo Foo;
int foo_get_bar(Foo *);
Foo *get_foo(void);
Another advantage is that the compiler will complain if you try to use a Bar * by mistake.
P.S.: do note that get_foo() does not mean the same as get_foo(void) in C.
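Size and alignment (though not full representation) can at least be checked at compile time. A minimal sketch, assuming C++11; the static instance inside get_foo and the demo_call function are illustrative stand-ins, not part of the original interface:

```cpp
#include <cassert>

struct Foo;  // incomplete here, exactly as a C client would see it

// Fail the build on any platform where the two pointer types differ
// in size or alignment. This does not prove identical representation.
static_assert(sizeof(Foo*) == sizeof(void*), "Foo* and void* differ in size");
static_assert(alignof(Foo*) == alignof(void*), "Foo* and void* differ in alignment");

struct Foo {
    int bar() const { return 42; }
};

extern "C" Foo* get_foo() {
    static Foo instance;  // illustrative storage for the "created and remembered" object
    return &instance;
}

extern "C" int foo_get_bar(Foo* foo) { return foo->bar(); }

// Stand-in for the generated caller (hypothetical name).
int demo_call() { return foo_get_bar(get_foo()); }
```

If either static_assert fires, the void* declarations used by the generated caller can no longer be assumed to match the Foo* definitions.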

What is the correct typedef for an opaque C pointer to a C++ class?

There are dozens upon dozens of SO questions and blog posts that describe wrapping a C++ class with a C API. Example Wrapping C++ class API for C consumption
Most of these answers and blogposts go for something like this:
typedef void* CMyClass;
But others say that this is bad because it provides no type safety. They propose various variations of opaque structs, without any explanation. I could just copy the above snippet and move on with my life (which I will do in the meantime), but I'd like to know once and for all
Which form is the best?
Which guarantees does it provide over void*?
How does it work?
Use struct MyType in C++.
Use typedef struct MyType* pMyType; as your common handle.
Your "C" APIs should compile in both C and C++ (with extern "C" wrappers in C++ to get correct linkage). And you'll get close to max type safety.
Now, struct MyHandle{void* private_ptr;}; is another option: this avoids exposing the name of the C++ type to C. And so long as you isolate direct interaction with private_ptr to a handful of functions, it will be as type safe everywhere else.
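The struct MyHandle option might look like the following. This is only a sketch: MyType's contents and the my_create, my_name_length and my_destroy functions are hypothetical names chosen for illustration, and unwrap is the one place that touches private_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

extern "C" {
// The only type the C side ever sees.
struct MyHandle { void* private_ptr; };

MyHandle my_create(const char* name);
std::size_t my_name_length(MyHandle h);
void my_destroy(MyHandle h);
}

namespace {
// Hypothetical C++ type hidden behind the handle.
class MyType {
public:
    explicit MyType(const char* name) : name_(name) {}
    std::size_t name_length() const { return name_.size(); }
private:
    std::string name_;
};

// The single function that interprets private_ptr.
MyType* unwrap(MyHandle h) { return static_cast<MyType*>(h.private_ptr); }
}  // namespace

extern "C" MyHandle my_create(const char* name) {
    MyHandle h;
    h.private_ptr = new MyType(name);
    return h;
}

extern "C" std::size_t my_name_length(MyHandle h) { return unwrap(h)->name_length(); }

extern "C" void my_destroy(MyHandle h) { delete unwrap(h); }
```

Because MyHandle is a distinct struct type, passing some other handle struct or a bare pointer by mistake is rejected by both C and C++ compilers.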
The problem with void * is that it gives you no protection from accidentally assigning an incompatible pointer.
typedef void *CMyClass;
int i = 1;
CMyClass c = &i; // No complaints
If you instead typedef to some unique opaque type the compiler will help you.
typedef struct MyClass *CMyClass;
int i = 1;
CMyClass c = &i; // BOOM!
I think in C this is not an error but Clang 6.0 warns me with (even without any warnings enabled)
warning: incompatible pointer types initializing 'CMyClass' (aka 'struct MyClass *') with an expression of type 'int *'
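In C++ the same mismatch is a hard error rather than a warning, and the difference between the two typedefs can be shown with type traits. A small sketch, assuming C++11 (VoidHandle and the two constants are names made up for this example):

```cpp
#include <cassert>
#include <type_traits>

typedef void* VoidHandle;          // the "typedef void *CMyClass" variant
typedef struct MyClass* CMyClass;  // the opaque-struct variant

// int* converts implicitly to void*, so mistakes go unnoticed...
static_assert(std::is_convertible<int*, VoidHandle>::value,
              "int* to void* is an implicit conversion");
// ...but not to a pointer to an unrelated (even incomplete) struct.
static_assert(!std::is_convertible<int*, CMyClass>::value,
              "int* to MyClass* must be rejected");

constexpr bool void_handle_accepts_int_ptr =
    std::is_convertible<int*, VoidHandle>::value;
constexpr bool typed_handle_accepts_int_ptr =
    std::is_convertible<int*, CMyClass>::value;
```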
The first thing is that void * is indeed not a good choice because it makes the API more error-prone by silently accepting any unrelated pointer. So the better idea would be to add a forward declaration of some struct and accept a pointer to that struct:
#ifdef __cplusplus
extern "C"
{
#endif
struct CMyClassTag;
typedef struct CMyClassTag CMyClass;
void CMyClass_Work(CMyClass * p_self);
#ifdef __cplusplus
}
#endif
The next step is to explicitly tell the user that this pointer is opaque and is not supposed to be dereferenced, by hiding the pointer as an unnecessary implementation detail:
typedef struct CMyClassTag * CMyClassHandle;
void CMyClass_Work(CMyClassHandle h_my_class);
Additionally, rather than relying on the user to utilize this interface correctly, you can make a real handle type rather than an opaque pointer. This could be done in several ways, but the main idea is to pass some obscure integer identifier and perform the mapping from it to the real pointer on the library side at runtime:
typedef uintptr_t CMyClassHandle;
void CMyClass_Work(CMyClassHandle h_my_class);
// impl
void CMyClass_Work(CMyClassHandle h_my_class)
{
auto it{s_instances_map.find(h_my_class)};
if(s_instances_map.end() != it)
{
auto & self{it->second};
// ...
}
}
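A fuller sketch of the handle-map idea follows. Several things here are assumptions made for illustration: CMyClass_Create and CMyClass_Destroy do not appear above, CMyClass_Work returns an int status instead of void, and the map is not protected against concurrent access.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

typedef std::uintptr_t CMyClassHandle;

namespace {
// Hypothetical implementation class.
struct MyClass {
    int work_count = 0;
    void Work() { ++work_count; }
};

std::unordered_map<CMyClassHandle, std::unique_ptr<MyClass>> s_instances_map;
CMyClassHandle s_next_handle = 1;  // 0 is never handed out, so it is always invalid
}  // namespace

extern "C" CMyClassHandle CMyClass_Create() {
    const CMyClassHandle h = s_next_handle++;
    s_instances_map.emplace(h, std::unique_ptr<MyClass>(new MyClass()));
    return h;
}

// Returns 0 on success, -1 if the handle is unknown or already destroyed.
extern "C" int CMyClass_Work(CMyClassHandle h_my_class) {
    auto it = s_instances_map.find(h_my_class);
    if (s_instances_map.end() == it) return -1;
    it->second->Work();
    return 0;
}

// Returns 0 if an instance was destroyed, -1 if the handle was not live.
extern "C" int CMyClass_Destroy(CMyClassHandle h_my_class) {
    return s_instances_map.erase(h_my_class) == 1 ? 0 : -1;
}
```

The gain over an opaque pointer is that a stale or made-up handle yields an error code instead of a dereference of a dangling pointer.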

What is the correct way of interop'ing with C flexible array members from C++?

I know that flexible array member is not part of the C++11 standard.
So what is the correct way of interoperating, from C++11, with C code that returns, or accepts as an argument, structs with a flexible array member?
Should I write a shim that maps the flexible array member from the C struct to a pointer in C++?
As far as I am aware, standard C++ won't even accept the declaration of a struct with a flexible array member. With that being the case, I see no alternative but to write wrapper functions (in C), unless the structure type containing the FAM can be opaque to your C++ code. I'm uncertain whether a wrapper is the kind of shim you had in mind.
Before we go further, however, I should point out that the problem is substantially different if your C functions accept and return pointers to structures with a flexible array member than if they pass and return the actual structures. I'll assume that they do work with pointers to these structures, for otherwise there seems no point to having the FAM in the first place.
I guess that given a C declaration such as
struct foo {
int count;
my_type fam[];
};
I would represent the same data in C++ as
struct cpp_foo {
int count;
my_type *fam;
};
, which of course can be handled by C, as well. Be aware that you cannot successfully cast between these, because arrays are not pointers.
Given a C function
struct foo *do_something(struct foo *input);
the needed wrapper might then look like this:
struct cpp_foo *do_something_wrap(struct cpp_foo *input) {
struct cpp_foo *cpp_output = NULL;
// Prepare the input structure
size_t fam_size = input->count * sizeof(*input->fam);
struct foo *temp = malloc(sizeof(*temp) + fam_size);
if (!temp) {
// handle allocation error ...
} else {
struct foo *output;
temp->count = input->count;
memcpy(temp->fam, input->fam, fam_size);
// call the function
output = do_something(temp);
if (output) {
// Create a copy of the output in C++ flavor
cpp_output = malloc(sizeof(*cpp_output));
if (!cpp_output) {
// handle allocation error
} else {
fam_size = output->count * sizeof(output->fam[0]);
cpp_output->count = output->count;
cpp_output->fam = malloc(fam_size);
if (!cpp_output->fam) {
// handle allocation error
} else {
memcpy(cpp_output->fam, output->fam, fam_size);
}
// Supposing that we are responsible for the output object ...
free(output);
}
} // else cpp_output is already NULL
free(temp);
}
return cpp_output;
}
Naturally, if you have several functions to wrap then you probably want to write reusable conversion functions to simplify it.
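On the C++ side, the two malloc'ed blocks returned by such a wrapper can be tied to one owner. A sketch under some assumptions: my_type is taken to be int, and make_cpp_foo is a hypothetical stand-in for do_something_wrap so that the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

typedef int my_type;  // hypothetical element type

struct cpp_foo {
    int count;
    my_type* fam;
};

// Releases both allocations made by the wrapper.
struct cpp_foo_deleter {
    void operator()(cpp_foo* p) const {
        if (p) {
            std::free(p->fam);  // free(NULL) is a no-op
            std::free(p);
        }
    }
};
typedef std::unique_ptr<cpp_foo, cpp_foo_deleter> cpp_foo_ptr;

// Stand-in for do_something_wrap(): builds a malloc'ed cpp_foo holding 0..count-1.
cpp_foo_ptr make_cpp_foo(int count) {
    cpp_foo_ptr result(static_cast<cpp_foo*>(std::malloc(sizeof(cpp_foo))));
    if (!result) return cpp_foo_ptr();
    result->count = 0;
    result->fam = static_cast<my_type*>(
        std::malloc(sizeof(my_type) * static_cast<std::size_t>(count)));
    if (!result->fam) return cpp_foo_ptr();  // result's deleter frees the struct
    result->count = count;
    for (int i = 0; i < count; ++i) result->fam[i] = i;
    return result;
}
```

With a real wrapper, the call site would be something like cpp_foo_ptr out(do_something_wrap(&in)); and no explicit free would be needed.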
There's a trick used by Windows by setting the flexible array member to have size 1 (because the Win32 API has been developed long before the feature ever went into C99, let alone C++)
struct foo {
int count;
my_type fam[1];
};
If you're allowed to change the C version then use the same struct in both C and C++. If you can't change the C version, you'll need to redefine the struct in C++. You still have to change the C++ code whenever the C struct is modified, but at least it'll compile fine.
See also
Are flexible array members valid in C++?
Why do some structures end with an array of size 1?
As flexible array members cannot be exposed to C++ (my_type fam[]; is not a valid C++ member), we'll have to define our own type.
Luckily C linkage functions don't have symbols that depend on their arguments. So we can either modify the definition of foo within shared headers, or define our own and not include their headers.
This is a struct that is likely to be compatible layout-wise. Note that you should never declare these on the stack in C++:
struct foo {
int count;
#ifndef __cplusplus
my_type fam[];
#else
my_type do_not_use_fam_placeholder;
my_type* fam() {
return &do_not_use_fam_placeholder;
}
my_type const* fam() const {
return &do_not_use_fam_placeholder;
}
#endif
};
This relies upon the binary layout of the foo structure in C to be the prefix members, followed by the flexible array member's elements, and no additional packing or alignment be done. It also requires that the flexible array member never be empty.
I would use this+1 but that runs into alignment issues if there is padding between count and fam.
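The padding concern can be made explicit with offsetof. The sketch below assumes my_type is double (purely for illustration) and shows only the C++ view of the struct; whether the C compiler places fam at the same offset still has to be verified against the C side.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

typedef double my_type;  // hypothetical element type

struct foo {
    int count;
    my_type do_not_use_fam_placeholder;
    my_type* fam() { return &do_not_use_fam_placeholder; }
    my_type const* fam() const { return &do_not_use_fam_placeholder; }
};

// offsetof is only guaranteed to work for standard-layout types.
static_assert(std::is_standard_layout<foo>::value, "foo must be standard-layout");

// Offset of the first array element, including any padding after count.
constexpr std::size_t foo_fam_offset = offsetof(foo, do_not_use_fam_placeholder);

// Minimum number of bytes for a foo carrying `count` elements (count >= 1).
constexpr std::size_t foo_bytes(std::size_t count) {
    return foo_fam_offset + count * sizeof(my_type);
}
```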
Use of memcpy or memmove or the like on foo is not advised. In general, creating a foo on the C++ side isn't a good idea. If you have to, you could do something like this:
struct foo_header {
int count;
};
foo* create_foo_in_cpp(int count) {
std::size_t bytes = sizeof(foo)+sizeof(my_type)*(count-1);
foo* r = (foo*)malloc(bytes);
::new((void*)r) foo_header{count};
for (int i = 0; i < count; ++i)
::new((void*)(r->fam()+i)) my_type();
return r;
}
which constructs every object in question in C++. C++'s object existence rules are stricter than C's; merely taking some POD memory and interpreting it as a POD is not a valid action in C++, while it is in C. The placement-new expressions above will be optimized to no-ops at runtime, but are required by C++ to declare that the memory in question should be treated as objects of that type under a strict reading of the standard.
Now, there are some standard issues (defects) with manually per-element constructing the elements of an array, and layout-compatibility between arrays and elements, so you'll have to trust somewhat that the ABI of the C++ compiler and the C code is compatible (or check it).
In general, all interop between C and C++ is undefined by the C++ standard (other than some parts of the standard C library which C++ incorporates; even here, there is no mandate that C++ use the same C library). You must understand how your particular implementations of C and C++ interoperate.

Allocate incomplete type on stack

I am working on wrapping a C++ library in C. The C++ library is a library for a database server. It uses a wrapper class for passing around serialized data. I can't use that class directly in C, so I defined a struct that can be used in C code like this:
In include/c-wrapper/c-wrapper.h (this is the wrapper that clients of my C wrapper are including)
extern "C" {
typedef struct Hazelcast_Data_t Hazelcast_Data_t;
Hazelcast_Data_t *stringToData(char *str);
void freeData(Hazelcast_Data_t *d);
}
In impl.cpp
extern "C" struct Hazelcast_Data_t {
hazelcast::client::serialization::pimpl::Data data; // this is the C++ class
};
Hazelcast_Data_t *stringToData(char *str) {
Data d = serializer.serialize(str);
Hazelcast_Data_t *dataStruct = new Hazelcast_Data_t();
dataStruct->data = d;
return dataStruct;
}
...
Now this works, the client of my C library only sees typedef struct Hazelcast_Data_t Hazelcast_Data_t;. The problem is, that the aforementioned type cannot be allocated on the stack, like if I would like to provide an API like this:
// this is what I want to achieve, but Hazelcast_Data_t is an incomplete type
#include <include/c-wrapper/c-wrapper.h>
int main() {
char *str = "BLA";
Hazelcast_Data_t d;
stringToData(str, &d);
}
The compiler will throw an error that Hazelcast_Data_t is an incomplete type. I would still like to provide an API that allows to pass a stack-allocated reference of Hazelcast_Data_t to the serialization function, but because Hazelcast_Data_t has a pointer to the C++ class, this seems pretty much impossible. Having the option to pass a stack allocated reference however would greatly simplify the code for the client of my C library (no need to free the newed structure).
Is it somehow doable to redefine Hazelcast_Data_t type so that it can be used in C and still be allocated on the stack?
Most of the hacks you're thinking of for doing this invoke undefined behaviour, since C will not call the C++ constructor for the contained object when the struct is created, and not call the C++ destructor when the struct goes out of scope. To make it work you need the struct to contain a buffer of the right size and new into that buffer in an init function, and call the destructor on that buffer when done. This means the code looks like this (assuming that nothing throws - in which case you need to add exception handling and translation...)
struct wrapper {
char buffer[SIZE_OF_CXX_CLASS];
};
void wrapper_init(struct wrapper* w) {
new (w->buffer) Wrapped();
}
void wrapper_destroy(struct wrapper* w) {
((Wrapped*)w->buffer)->~Wrapped();
}
{
struct wrapper wrapped;
wrapper_init(&wrapped);
// ... use it ...
wrapper_destroy(&wrapped);
}
If you forget to call wrapper_init everything goes into undefined behaviour land. If you forget to call wrapper_destroy I think you get UB too.
But since this forces your caller to call the init and destroy functions there's very little gain over using a pointer. I'd go so far as to claim that the use of a struct rather than a pointer suggests to API users that initialisation should be trivial, and destruction unnecessary. I.e. as an API user I'd expect to be able to do
{
struct wrapper wrapped = WRAPPER_INIT; // Trivial initialisation macro
// .. use it ..
// No need to do anything it is a trivial object.
}
In the cases where this is not possible (like yours) I'd stick with the usual allocate it on the heap idiom
{
struct wrapper* wrapped = wrapper_create();
// ... use it ...
wrapper_destroy(wrapped);
}
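If something can throw, the exception has to be translated before it crosses the C boundary. A sketch of one way to do that: the Wrapped class body, the wrapper_status codes, wrapper_size and the out-parameter form of wrapper_create are all assumptions made for this example, not the API shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical C++ class whose constructor may throw.
class Wrapped {
public:
    explicit Wrapped(const char* text) {
        if (!text) throw std::invalid_argument("text must not be null");
        text_ = text;
    }
    std::size_t size() const { return text_.size(); }
private:
    std::string text_;
};

// Complete only on the C++ side; C clients see just "struct wrapper;".
struct wrapper {
    explicit wrapper(const char* text) : impl(text) {}
    Wrapped impl;
};

enum wrapper_status { WRAPPER_OK = 0, WRAPPER_NO_MEMORY = 1, WRAPPER_ERROR = 2 };

// Every exception is turned into a status code; *out stays null on failure.
extern "C" wrapper_status wrapper_create(const char* text, wrapper** out) {
    *out = nullptr;
    try {
        *out = new wrapper(text);
        return WRAPPER_OK;
    } catch (const std::bad_alloc&) {
        return WRAPPER_NO_MEMORY;
    } catch (...) {
        return WRAPPER_ERROR;
    }
}

extern "C" std::size_t wrapper_size(const wrapper* w) { return w->impl.size(); }

extern "C" void wrapper_destroy(wrapper* w) { delete w; }
```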
You need to provide a definition of the struct in your header file so that clients know how much space to allocate on the stack. But that becomes tricky when the underlying representation is a C++ class which can't be exposed through the extern "C" interface.
The solution is a pointer to the C++ class rather than the actual class. As pointers are the same size this will work in the C client, even when it has no knowledge of C++.
Thus in the header
typedef struct Hazelcast_Data_t {
void *data;
} Hazelcast_Data_t;
And in the C++ file you can use static_cast to access the C++ class via this pointer.
Make a wrapper struct that simply contains an array large and aligned enough to contain your C++ type. Placement-new your C++ type in it.
You will probably have to build a small C++ executable that would generate a C header file with SIZEOF_HAZELCAST_T and ALIGNOF_HAZELCAST_T appropriately defined.
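A sketch of that storage-based approach. The Data class here is a hypothetical stand-in for the real serialization class, the two macro values are placeholders for what the generated header would contain, and dataSize is an invented accessor. stringToData and freeData are given the signatures the question asks for, which differ from the heap-based versions.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical stand-in for the real serialization class.
class Data {
public:
    explicit Data(const char* s) : bytes_(s) {}
    std::size_t size() const { return bytes_.size(); }
private:
    std::string bytes_;
};

// Placeholder values; a real build would take these from the generated header.
#define SIZEOF_HAZELCAST_T 64
#define ALIGNOF_HAZELCAST_T 16

// C-visible definition: raw storage only (a C11 header would use _Alignas).
struct Hazelcast_Data_t {
    alignas(ALIGNOF_HAZELCAST_T) unsigned char storage[SIZEOF_HAZELCAST_T];
};

// Catch a stale generated header at compile time.
static_assert(sizeof(Data) <= SIZEOF_HAZELCAST_T, "storage too small for Data");
static_assert(alignof(Data) <= ALIGNOF_HAZELCAST_T, "storage under-aligned for Data");

namespace {
Data* as_data(Hazelcast_Data_t* d) { return reinterpret_cast<Data*>(d->storage); }
}  // namespace

// Constructs the C++ object inside the caller's stack storage.
extern "C" void stringToData(const char* str, Hazelcast_Data_t* out) {
    ::new (static_cast<void*>(out->storage)) Data(str);
}

extern "C" std::size_t dataSize(Hazelcast_Data_t* d) { return as_data(d)->size(); }

// Runs the destructor only; the storage itself belongs to the caller.
extern "C" void freeData(Hazelcast_Data_t* d) { as_data(d)->~Data(); }
```

The caller still has to pair every stringToData with a freeData. Strictly conforming C++17 code would additionally pass the cast pointer through std::launder.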

Can I get away with this C++ downcasting fib?

I have a C library that has types like this:
typedef struct {
// ...
} mytype;
mytype *mytype_new() {
mytype *t = malloc(sizeof(*t));
// [Initialize t]
return t;
}
void mytype_dosomething(mytype *t, int arg);
I want to provide C++ wrappers to provide a better syntax. However, I want to avoid the complication of having a separately-allocated wrapper object. I have a relatively complicated graph of objects whose memory-management is already more complicated than I would like (objects are refcounted in such a way that all reachable objects are kept alive). Also the C library will be calling back into C++ with pointers to this object and the cost of a new wrapper object to be constructed for each C->C++ callback (since C doesn't know about the wrappers) is unacceptable to me.
My general scheme is to do:
class MyType : public mytype {
public:
static MyType* New() { return (MyType*)mytype_new(); }
void DoSomething(int arg) { mytype_dosomething(this, arg); }
};
This will give C++ programmers nicer syntax:
// C Usage:
mytype *t = mytype_new();
mytype_dosomething(t, arg);
// C++ Usage:
MyType *t = MyType::New();
t->DoSomething(arg);
The fib is that I'm downcasting a mytype* (which was allocated with malloc()) to a MyType*, which is a lie. But if MyType has no members and no virtual functions, it seems like I should be able to depend on sizeof(mytype) == sizeof(MyType), and besides MyType has no actual data to which the compiler could generate any kind of reference.
So even though this probably violates the C++ standard, I'm tempted to think that I can get away with this, even across a wide array of compilers and platforms.
My questions are:
Is it possible that, by some streak of luck, this does not actually violate the C++ standard?
Can anyone think of any kind of real-world, practical problem I could run into by using a scheme like this?
EDIT: @James McNellis asks a good question of why I can't define MyType as:
class MyType {
public:
MyType() { mytype_init(this); }
private:
mytype t;
};
The reason is that I have C callbacks that will call back into C++ with a mytype*, and I want to be able convert this directly into a MyType* without having to copy.
You're downcasting a mytype* to a MyType*, which is legal C++. But here it's problematic since the mytype* pointer doesn't actually point to a MyType. It actually points to a mytype. Thus, if you downcast it to a MyType* and attempt to access its members, it'll almost certainly not work. Even if there are no data members or virtual functions now, you might add some in the future, and it's still a huge code smell.
Even if it doesn't violate the C++ standard (which I think it does), I would still be a bit suspicious about the code. Typically if you're wrapping a C library the "modern C++ way" is through the RAII idiom:
class MyType
{
public:
// Constructor
MyType() : myType(::mytype_new()) {}
// Destructor
~MyType() { ::mytype_delete(myType); /* or something similar to this */ }
mytype* GetRawMyType() { return myType; }
const mytype* GetConstRawMyType() const { return myType; }
void DoSomething(int arg) { ::mytype_dosomething(myType, arg); }
private:
// MyType is not copyable.
MyType(const MyType&);
MyType& operator=(const MyType&);
mytype* myType;
};
// Usage example:
{
MyType t; // constructor called here
t.DoSomething(123);
} // destructor called when scope ends
Is it possible that, by some streak of luck, this does not actually violate the C++ standard?
I'm not advocating this style, but as MyType and mytype are both PODs, I believe the cast does not violate the Standard. I believe MyType and mytype are layout-compatible (2003 version, Section 9.2, clause 14: "Two POD-struct ... types are layout-compatible if they have the same number of nonstatic data members, and corresponding nonstatic data members (in order) have layout-compatible types (3.9)."), and as such can be cast around without trouble.
EDIT: I had to test things, and it turns out I'm wrong. This is not Standard, as the base class makes MyType non-POD. The following doesn't compile:
#include <cstdio>
namespace {
extern "C" struct Foo {
int i;
};
extern "C" int do_foo(Foo* f)
{
return 5 + f->i;
}
struct Bar : Foo {
int foo_it_up()
{
return do_foo(this);
}
};
}
int main()
{
Bar f = { 5 };
std::printf("%d\n", f.foo_it_up());
}
Visual C++ gives the error message that "Types with a base are not aggregate." Since "Types with a base are not aggregate," then the passage I quoted simply doesn't apply.
I believe that you're still safe in that most compilers will make MyType layout-compatible with mytype. The cast will "work," but it's not Standard.
I think it would be much safer and elegant to have a mytype* data member of MyType, and initialize it in the constructor of MyType rather than having a New() method (which, by the way, has to be static if you do want to have it).
It does violate the C++ standard; however, it should work on most (all that I know of) compilers.
You're relying on a specific implementation detail here (that the compiler doesn't care what the actual object is, just what type you gave it), but I don't think any compiler has a different implementation detail. Be sure to check it on every compiler you use; it might catch you unprepared.
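For the callback case, one option that avoids the downcast is a pointer-sized, non-owning wrapper that is built on the spot from the mytype*. This is a sketch only: the mytype contents, the function bodies, mytype_delete, Total and on_event are hypothetical stand-ins for the real C library and callback.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-in for the C library (hypothetical bodies; mytype_delete is assumed to exist).
extern "C" {
typedef struct { int total; } mytype;
mytype* mytype_new() {
    mytype* t = static_cast<mytype*>(std::malloc(sizeof(mytype)));
    if (t) t->total = 0;
    return t;
}
void mytype_dosomething(mytype* t, int arg) { t->total += arg; }
void mytype_delete(mytype* t) { std::free(t); }
}

// Non-owning view: no inheritance, no downcast, no extra allocation.
class MyTypeRef {
public:
    explicit MyTypeRef(mytype* raw) : raw_(raw) {}
    void DoSomething(int arg) { mytype_dosomething(raw_, arg); }
    int Total() const { return raw_->total; }
    mytype* raw() const { return raw_; }
private:
    mytype* raw_;
};

// Documents the intent that wrapping costs no more than passing the pointer.
static_assert(sizeof(MyTypeRef) == sizeof(mytype*), "wrapper should be pointer-sized");

// A C callback receives a mytype* and wraps it by value.
extern "C" void on_event(mytype* t) { MyTypeRef(t).DoSomething(10); }
```

Ownership stays with whatever refcounting scheme the C library already uses; the wrapper only supplies the nicer syntax.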

Why should non-POD types in C++ be opaque to C clients?

gcc 4.4.4 c89
I was just reading a discussion at DevX about calling C++ code from C since I have to do something similar. I am just wondering what user Vijayan meant by "make sure that non POD types in C++ are opaque to C clients."
Many thanks for any suggestions,
C can only deal with POD types.
Consequently, you cannot pass objects of non-POD types to C programs (by value). Also, if you pass pointers of non-POD types to C programs, they can't interact with the objects pointed to.
POD = Plain old data structure = C structs, no virtual methods, etc. You need to write wrapper functions for C to access non-POD types (i.e., classes).
More on POD:
http://en.wikipedia.org/wiki/Plain_old_data_structure
For a type to be opaque means you can't look inside it: it's a "black box" that can be passed around but not inspected or manipulated directly by the C code. You typically refer to the object using either heap-allocated memory and void*s, or using functions to determine the necessary length and buffers.
For example, a C++ object might contain a std::string, but the layout of a std::string is not specified in the C++ Standard, so you can't write C code that directly reads from or writes to the string (at least, not without having a total understanding of the std::string layout, manually revalidated every time the compiler/STL is updated).
So, to allow C code to access the object, you might write C-callable functions such as:
#ifdef __cplusplus
extern "C" {
#endif
void* object_new();
const char* object_get_string(void* p_object);
void object_set_string(void* p_object, const char* s);
void object_delete(void* p_object);
#ifdef __cplusplus
}
#endif
With C++ implementation ala:
class Object { public: std::string string_; ... };
void* object_new() { return new Object; }
const char* object_get_string(void* p) { return ((Object*)p)->string_.c_str(); }
...
Here, the object_XXX functions provide the C code with a safe way to use the Object.
Making the type opaque means, as per the line in the link:
typedef struct base base ; /* opaque */
makes the name of the handle available to C code, but not the definition of the type. This means that the C code cannot access any members directly, but has to go through the interface functions.
Note that you do not have to cast to a generic pointer, i.e. void*, although doing so is one option, as per 9dan's answer.
Note that such a style of interface is in my experience a very nice way to manage encapsulation even in pure C code, just as in the standard C streams library.
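A cast-free version of such an interface could look roughly like this. It is a sketch: the object name and its four functions are chosen to mirror the void* example elsewhere in this thread, and in a real project the extern "C" declarations would live in a header shared with the C code.

```cpp
#include <cassert>
#include <cstring>
#include <string>

extern "C" {
typedef struct object object; /* opaque to C clients */
object* object_new(void);
const char* object_get_string(const object* p_object);
void object_set_string(object* p_object, const char* s);
void object_delete(object* p_object);
}

// Only the C++ translation unit ever sees the definition.
struct object {
    std::string string_;
};

extern "C" object* object_new(void) { return new object; }

// The returned pointer stays valid until the next set or delete on this object.
extern "C" const char* object_get_string(const object* p_object) {
    return p_object->string_.c_str();
}

extern "C" void object_set_string(object* p_object, const char* s) {
    p_object->string_ = s;
}

extern "C" void object_delete(object* p_object) { delete p_object; }
```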
Making a type opaque to clients means nothing special. The C stream file I/O API (FILE* f = fopen) is the typical example that presents an opaque handle to clients.
C cannot handle non-POD types, so you must hide the C++ implementation from C clients but provide access functions.
Example:
C++ Implementation
class MyLibrary {
public:
MyLibrary();
~MyLibrary();
int DoSomething();
...
};
Declaration for C clients
typedef void* OPAQUEHANDLE;
extern OPAQUEHANDLE MyLibrary_OpenLibrary();
extern void MyLibrary_CloseLibrary(OPAQUEHANDLE h);
extern int MyLibrary_DoSometing(OPAQUEHANDLE h);
Implementation for C clients (in .cpp file)
extern "C" OPAQUEHANDLE MyLibrary_OpenLibrary()
{
return new MyLibrary;
}
extern "C" void MyLibrary_CloseLibrary(OPAQUEHANDLE h)
{
delete (MyLibrary*) h;
}
extern "C" int MyLibrary_DoSometing(OPAQUEHANDLE h)
{
return ((MyLibrary*)h)->DoSomething();
}