Is there a standard procedure for passing classes by value? In other words, if I do this:
struct Test
{
int a;
double b;
}
void DoSomething(Test t)
{
std::cout << t.a << std::endl;
std::cout << t.b << std::endl;
}
//...
Test myObj;
myObj.a = 5;
myObj.b = 10.5;
DoSomething(myObj);
Assuming standard packing and layout, does the standard provide any guarantees that the class will be sent and received in a consistent manner regardless of compiler?
Because I anticipate questions along the lines of "why do you want to do this?" or "this feels like an XY problem", here's (lengthy) context. I'm attempting to pass a class object back-and-forth between an EXE and a DLL compiled with different compilers, and it appears that the class object is not being read from the correct starting address. This problem evaporates if I pass the object by reference, however. (Bonus question - why would passing by reference work when passing by value would not? I was under the impression passing by value would copy the object and pass a reference to the copy. Clearly I'm misunderstanding something here.)
In general, ABI between different C++ compilers can vary in any way they see fit. The C++ standard does not mandate a given ABI.
However, C ABIs are extremely stable. One way to deal with this problem is to have header-only functions that translate your code into extern "C" functions, which are exported from your DLL. Inside your DLL the extern "C" functions then call a more conventional C++ interface.
For a concrete example,
struct Test;
// DLL exported:
extern "C" void Private_DoSomething_Exported( Test* );
// Interface:
namespace Interface {
inline void DoSomething( Test t ) { return Private_DoSomething_Exported(&t); }
};
// implementation. Using Test&& to make it clear that the reference is not an export parameter:
namespace Implementation {
void DoSomething( Test&&t ) {
std::cout << t.a << std::endl;
std::cout << t.b << std::endl;
}
}
void Private_DoSomething_Exported( Test* t ) {
Assert(t);
Implementation::DoSomething(std::move(*t));
}
This places the "most compatible" ABI (a pure "C" ABI) at the point where you export functions from a DLL. The client code calls Interface::DoSomething, which inline in the client code calls the "C" ABI (which doesn't even know the layout of the object), which then calls a C++ Implementation::DoSomething.
This is still not proof against every issue, because the layout of even POD structs could vary based on compilers (as a practical example, some compilers treat long as 32 bit on 64 bit machines, others treat long as 64 bits on 64 bit machines). Packing can also vary.
To reduce that impact, you'll first want to only use fixed size types from the C header files. You'll also want to examine the packing docs of both compilers.
Related
I'm trying to make some sort of C++ "bridge" to connect an unmanaged C++ dll on one end (without modifying their code) to a C# Wrapper which uses DllImport for various imports.
I was able to pass a C# string to my bridge using char pointers, but the receiving Dll needs to receive std::string, so I tried with std::string(foo); with no luck, it always gets transformed into weird characters.
The structure is the following :
C# Wrapper
[DllImport(#"Bridge.dll", CallingConvention = CallingConvention.Cdecl)]
public static extern void initDetector(string foo, int something = 0);
C++ Bridge
extern "C" __declspec(dllexport) void initCppClass(char* foo, int something)
{
std::string bar = std::string(foo);
std::cout << bar << std::endl; //Returns "foo"
instance = new CppClass(bar, something);
}
C++ Imported DLL (not allowed to change code here)
CppClass::CppClass(std::string foo, int something)
{
std::cout << foo << std::endl; //Returns garbage
}
Note that this constructor is for demonstration purposes only, as I cannot disclose the original code.
I originally tried passing the char* directly to the constructor but that didn't work either. Is there something I'm missing here ?
I think the problem is different string encoding.
Try adding CharSet = CharSet.Ansi in C#, like this:
[DllImport(#"Bridge.dll", CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
However, please read the documentation of that C++ dll API. A lot of C++ code, especially if that’s cross-platform code, expect UTF8-encoded strings. If that’s your case, instead change the bridge to
extern "C" __declspec(dllexport) void initCppClass(const wchar_t* foo, int something)
And write code to convert the string from UTF16-encoded C pointer into UTF8-encoded std::string, see this answer for an example.
Update: another possible reason is different STL, or different CRT. When you pass std::string or any other STL objects objects across DLL boundaries, you have to use same compiler & same version of it, same build settings for (e.g. in VC++, std::strings memory layout differs between debug and release builds), and also both DLLs must link to CRT dynamically., e.g. Multi-threaded DLL (/MD)
In short: it is a smart pointers in C question. Reason: embedded programming and need to ensure that if complex algorithm is used, then proper deallocation occurs with little effort on the developer side.
My favorite feature of C++ is ability to execute a proper deallocation of object allocated on stack and that goes out of scope. GO language defer provides same functionality and it is a bit closer in spirit to C.
GO defer would be the desired way of doing things in C. Is there a practical way to add such functionality?
The goal of doing so is simplification of tracking when and where object goes out of scope. Here is a quick example:
struct MyDataType *data = malloc(sizeof(struct MyDataType));
defer(data, deallocator);
if (condition) {
// dallocator(data) is called automatically
return;
}
// do something
if (irrelevant) {
struct DT *localScope = malloc(...);
defer(localScope, deallocator);
// deallocator(localScope) is called when we exit this scope
}
struct OtherType *data2 = malloc(...);
defer(data2, deallocator);
if (someOtherCondition) {
// dallocator(data) and deallocator(data2) are called in the order added
return;
}
In other languages I could create an anonymous function inside the code block, assign it to the variable and execute manually in front of every return. This would be at least a partial solution. In GO language defer functions can be chained. Manual chaining with anonymous functions in C is error prone and impractical.
Thank you
In C++, I've seen "stack based classes" that follow the RAII pattern. You could make a general purpose Defer class (or struct) that can take any arbitrary function or lambda.
For example:
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
using std::cout;
using std::endl;
using std::function;
using std::string;
struct Defer {
function<void()> action;
Defer(function<void()> doLater) : action{doLater} {}
~Defer() {
action();
}
};
void Subroutine(int i) {
Defer defer1([]() { cout << "Phase 1 done." << endl; });
if (i == 1) return;
char const* p = new char[100];
Defer defer2([p]() { delete[] p; cout << "Phase 2 done, and p deallocated." << endl; });
if (i == 2) return;
string s = "something";
Defer defer3([&s]() { s = ""; cout << "Phase 3 done, and s set to empty string." << endl; });
}
int main() {
cout << "Call Subroutine(1)." << endl;
Subroutine(1);
cout << "Call Subroutine(2)." << endl;
Subroutine(2);
cout << "Call Subroutine(3)." << endl;
Subroutine(3);
return EXIT_SUCCESS;
}
Many different answers, but a few interesting details was not said.
Of course destructors of C++ are very strong and should be used very often. Sometime some smart pointers could help you. But the mechanism, that is the most resemble to defer is ON_BLOCK_EXIT/ON_BLOCK_EXIT_OBJ (see http://http://www.drdobbs.com/cpp/generic-change-the-way-you-write-excepti/184403758 ). Do not forgot to read about ByRef.
One big difference between C++ and go is when deffered is called. In C++ when your program leaving scope, where is was created. But in go when your program leaving function. That means, this code won't work at all:
for i:=0; i < 10; i++ {
mutex.Lock()
defer mutex.Unlock()
/* do something under the mutex */
}
Of course C does not pretends that is object oriented and therefore there are no destructors at all. It help a lot of readability of code, because you know that your program at line X do only what is written in that line. In contrast of C++ where each closing curly bracket could cause calling of dozens destructors.
In C you can use hated statement goto. Don't use it for anything else, but it is practical to have cleanup label at the end of function call goto cleanup from many places. Bit more complicated is when more than one resource you want do release, than you need more that one cleanup. Than your function finish with
cleanup_file:
fclose(f);
cleanup_mutex:
pthread_mutex_unlock(mutex);
return ret;
}
C does not have destructors (unless you think of the GCC specific variable attribute cleanup, which is weird and rarely used; notice also that the GCC function attribute destructor is not what other languages, C++ notably, call destructor). C++ have them. And C & C++ are very different languages.
In C++11, you might define your class, having a std::vector or std::function-s, initialized using a std::initialized_list of lambda expressions (and perhaps dynamically augmented by some push_back). Then its destructor could mimic Go's defer-ed statements. But this is not idiomatic.
Go have defer statements and they are idiomatic in Go.
I recommend sticking to the idioms of your programming languages.
(In other words: don't think in Go while coding in C++)
You could also embed some interpreter (e.g. Lua or Guile) in your application. You might also learn more about garbage collection techniques and concepts and use them in your software (in other words, design your application with its specific GC).
Reason: embedded programming and need to ensure that if complex algorithm is used, then proper deallocation occurs with little effort on the developer side.
You might use arena-based allocation techniques, and de-allocate the arena when suitable... When you think about that, it is similar to copying GC techniques.
Maybe you dream of some homoiconic language with a powerful macro system suitable for meta-programming. Then look into Common Lisp.
I just implemented a very simple thing like defer in golang several days ago.
The only one behaviour different from golang is my defer will not be executed when you throw an exception but does not catch it at all. Another difference is this cannot accept a function with multiple arguments like in golang, but we can deal it with lambda capturing local variables.
The implementations are here.
class _Defer {
std::function<void()> __callback;
public:
_Defer(_Defer &&);
~_Defer();
template <typename T>
_Defer(T &&);
};
_Defer::_Defer(_Defer &&__that)
: __callback{std::forward<std::function<void()>>(__that.__callback)} {
}
template <typename T>
_Defer::_Defer(T &&__callback)
: __callback{
static_cast<std::function<void()>>(std::forward<T>(__callback))
} {
static_assert(std::is_convertible<T, std::function<void()>>::value,
"Cannot be convert to std::function<void()>.");
}
_Defer::~_Defer() {
this->__callback();
}
And then I defined some macros to make my defer like a keyword in C++ (just for fun)
#define __defer_concatenate(__lhs, __rhs) \
__lhs##__rhs
#define __defer_declarator(__id) \
if (0); /* You may forgot a `;' or deferred outside of a scope. */ \
_Defer __defer_concatenate(__defer, __id) =
#define defer \
__defer_declarator(__LINE__)
The if (0); is used to prevent defer a function out of a scope. And then we can use defer like in golang.
#include <iostream>
void foo() {
std::cout << "foo" << std::endl;
}
int main() {
defer []() {
std::cout << "bar" << std::endl;
};
defer foo;
}
This will print
foo
bar
to screen.
GO defer would be the desired way of doing things in C. Is there a practical way to add such functionality?
The goal of doing so is simplification of tracking when and where object goes out of scope.
C does not have any built-in mechanism for automatically invoking any kind of behavior at the end of an object's lifetime. The object itself ceases to exist, and any memory it occupied is available for re-use, but there is no associated hook for executing code.
For some kinds of objects, that is entirely satisfactory by itself -- those whose values do not refer to other objects with allocated storage duration that need to be cleaned up as well. In particular, if struct MyDataType in your example is such a type, then you get automatic cleanup for free by declaring instances as automatic variables instead of allocating them dynamically:
void foo(void) {
// not a pointer:
struct MyDataType data /* = initializer */;
// ...
/* The memory (directly) reserved for 'data' is released */
}
For objects that require attention at the end of their lifetime, it is generally a matter of code style and convention to ensure that you know when to clean up. It helps, for example, to declare all of your variables at the top of the innermost block containing them, though C itself does not require this. It can also help to structure your code so that for each object that requires custom cleanup, all code paths that may execute during its lifetime converge at the end of that lifetime.
Myself, as a matter of personal best practices, I always try to write any cleanup code needed for a given object as soon as I write its declaration.
In other languages I could create an anonymous function inside the code block, assign it to the variable and execute manually in front of every return. This would be at least a partial solution. In GO language defer functions can be chained. Manual chaining with anonymous functions in C is error prone and impractical
C has neither anonymous functions nor nested ones. It often does make sense, however, to write (named) cleanup functions for data types that require cleanup. These are analogous to C++ destructors, but you must call them manually.
The bottom line is that many C++ paradigms such as smart pointers, and coding practices that depend on them, simply do not work in C. You need different approaches, and they exist, but converting a large body of existing C++ code to idiomatic C is a distinctly non-trivial undertaking.
For those using C, I’ve built a preprocessor in C (open source, Apache license) that inserts the deferred code at the end of each block:
https://sentido-labs.com/en/library/#cedro
GitHub: https://github.com/Sentido-Labs/cedro/
It includes a C utility that wraps the compiler (works out-of-the-box with GCC and clang, configurable) so you can use it as drop-in replacement for cc, called cedrocc, and if you decide to get rid of it, running cedro on a C source file will produce plain C. (see the examples in the manual)
The alternatives I know about are listed in the “Related work” part of the documentation:
Apart from the already mentioned «A defer mechanism for C», there are macros that use a for loop as for (allocation and initialization; condition; release) { actions } [a] or other techniques [b].
[a] “P99 Scope-bound resource management with for-statements” from the same author (2010), “Would it be possible to create a scoped_lock implementation in C?” (2016), ”C compatible scoped locks“ (2021), “Modern C and What We Can Learn From It - Luca Sas [ ACCU 2021 ] 00:17:18”, 2021
[b] “Would it be possible to create a scoped_lock implementation in C?” (2016), “libdefer: Go-style defer for C” (2016), “A Defer statement for C” (2020), “Go-like defer for C that works with most optimization flag combinations under GCC/Clang” (2021)
Compilers like GCC and clang have non-standard features to do this like the __cleanup__ variable attribute.
This implementation avoids dynamic allocation and most limitations of other implementations shown here
#include<type_traits>
#include<utility>
template<typename F>
struct deferred
{
std::decay_t<F> f;
template<typename G>
deferred(G&& g) : f{std::forward<G>(g)} {}
~deferred() { f(); }
};
template<typename G>
deferred(G&&) -> deferred<G>;
#define CAT_(x, y) x##y
#define CAT(x, y) CAT_(x, y)
#define ANONYMOUS_VAR(x) CAT(x, __LINE__)
#define DEFER deferred ANONYMOUS_VAR(defer_variable) = [&]
And use it like
#include<iostream>
int main()
{
DEFER {
std::cout << "world!\n";
};
std::cout << "Hello ";
}
Now, whether to allow exceptions in DEFER is a design choice bordering on philosophy, and I'll leave it to Andrei to fill in the details.
Note all such deferring functionalities in C++ necessarily has to be bound to the scope at which it is declared, as opposed to Go's which binds to the function at which it is declared.
Consider the following code:
file_1.hpp:
typedef void (*func_ptr)(void);
func_ptr file1_get_function(void);
file1.cpp:
// file_1.cpp
#include "file_1.hpp"
static void some_func(void)
{
do_stuff();
}
func_ptr file1_get_function(void)
{
return some_func;
}
file2.cpp
#include "file1.hpp"
void file2_func(void)
{
func_ptr function_pointer_to_file1 = file1_get_function();
function_pointer_to_file1();
}
While I believe the above example is technically possible - to call a function with internal linkage only via a function pointer, is it bad practice to do so? Could there be some funky compiler optimizations that take place (auto inline, for instance) that would make this situation problematic?
There's no problem, this is fine. In fact , IMHO, it is a good practice which lets your function be called without polluting the space of externally visible symbols.
It would also be appropriate to use this technique in the context of a function lookup table, e.g. a calculator which passes in a string representing an operator name, and expects back a function pointer to the function for doing that operation.
The compiler/linker isn't allowed to make optimizations which break correct code and this is correct code.
Historical note: back in C89, externally visible symbols had to be unique on the first 6 characters; this was relaxed in C99 and also commonly by compiler extension.
In order for this to work, you have to expose some portion of it as external and that's the clue most compilers will need.
Is there a chance that there's a broken compiler out there that will make mincemeat of this strange practice because they didn't foresee someone doing it? I can't answer that.
I can only think of false reasons to want to do this though: Finger print hiding, which fails because you have to expose it in the function pointer decl, unless you are planning to cast your way around things, in which case the question is "how badly is this going to hurt".
The other reason would be facading callbacks - you have some super-sensitive static local function in module m and you now want to expose the functionality in another module for callback purposes, but you want to audit that so you want a facade:
static void voodoo_function() {
}
fnptr get_voodoo_function(const char* file, int line) {
// you tagged the question as C++, so C++ io it is.
std::cout << "requested voodoo function from " << file << ":" << line << "\n";
return voodoo_function;
}
...
// question tagged as c++, so I'm using c++ syntax
auto* fn = get_voodoo_function(__FILE__, __LINE__);
but that's not really helping much, you really want a wrapper around execution of the function.
At the end of the day, there is a much simpler way to expose a function pointer. Provide an accessor function.
static void voodoo_function() {}
void do_voodoo_function() {
// provide external access to voodoo
voodoo_function();
}
Because here you provide the compiler with an optimization opportunity - when you link, if you specify whole program optimization, it can detect that this is a facade that it can eliminate, because you let it worry about function pointers.
But is there a really compelling reason not just to remove the static from infront of voodoo_function other than not exposing the internal name for it? And if so, why is the internal name so precious that you would go to these lengths to hide that?
static void ban_account_if_user_is_ugly() {
...;
}
fnptr do_that_thing() {
ban_account_if_user_is_ugly();
}
vs
void do_that_thing() { // ban account if user is ugly
...
}
--- EDIT ---
Conversion. Your function pointer is int(*)(int) but your static function is unsigned int(*)(unsigned int) and you don't want to have to cast it.
Again: Just providing a facade function would solve the problem, and it will transform into a function pointer later. Converting it to a function pointer by hand can only be a stumbling block for the compiler's whole program optimization.
But if you're casting, lets consider this:
// v1
fnptr get_fn_ptr() {
// brute force cast because otherwise it's 'hassle'
return (fnptr)(static_fn);
}
int facade_fn(int i) {
auto ui = static_cast<unsigned int>(i);
auto result = static_fn(ui);
return static_cast<int>(result);
}
Ok unsigned to signed, not a big deal. And then someone comes along and changes what fnptr needs to be to void(int, float);. One of the above becomes a weird runtime crash and one becomes a compile error.
I'm working on a project that delivers statistics to the user. I created a class called Dog,
And it has several functions. Speak, woof, run, fetch, etc.
I want to have a function that spits out how many times each function has been called. I'm also interested in the constructor calls and destructor calls as well.
I have a header file which defines all the functions, then a separate .cc file that implements them. My question is, is there a way to keep track of how many times each function is called?
I have a function called print that will fetch the "statistics" and then output them to standard output. I was considering using static integers as part of the class itself, declaring several integers to keep track of those things. I know the compiler will create a copy of the integer and initialize it to a minimum value, and then I'll increment the integers in the .cc functions.
I also thought about having static integers as a global variable in the .cc. Which way is easier? Or is there a better way to do this?
Any help is greatly appreciated!
Using static member variables is the way to go. However, the compiler will not "create a copy of the integer and initialize it to a minimum value"; you'll have to provide a definition for each one in the .cc file and initialize it to 0 there. (Things are a bit different if you're using C++11, but the basic idea is the same.)
There's no reason to use static global variables instead of static members.
foo.h:
class Foo {
static int countCtor_;
static int countDtor_;
static int countprint_:
Foo();
~Foo();
static void print();
};
foo.cc:
#include <iostream>
#include "foo.h"
int Foo::countCtor_ = 0;
int Foo::countDtor_ = 0;
int Foo::countprint_ = 0;
Foo::Foo() {
++countCtor_;
// Something here
}
Foo::~Foo() {
++countDtor_;
// Something here
}
void Foo::print() {
++countprint_;
std::cout << "Ctor: " << countCtor_ << "\n"
<< "Dtor: " << countDtor_ << "\n"
<< "print: " << countprint_ << "\n";
}
But if you've got a lot of functions, the repetition involved is a bit annoying—it's very easy to accidentally do ++countBar_ when you meant ++countBaz_ (especially if you copy and paste the boilerplate), so you may want something a bit fancier, such as a static map and a macro that increments counts[__FUNC__], so you can just use the exact same line in each function. Like this:
foo.h:
#include <map>
class Foo {
static std::map<const char*, int> counts_;
Foo();
~Foo();
void print();
};
foo.cc:
#include <iostream>
#include "foo.h"
std::map<const char *, int> Foo::counts_;
#define INC_COUNT_() do { ++counts_[__FUNC__]; } while (0)
Foo::Foo() {
INC_COUNT_();
// Something here
}
Foo::~Foo() {
INC_COUNT_();
// Something here
}
void Foo::print() {
INC_COUNT_();
for (std::map<const char *, int>::const_iterator it = counts_.begin();
it != counts_.end(); ++it) {
std::cout << it->first << ": " << it->second << "\n";
}
}
In the example code above, __FUNC__ is a placeholder. Unfortunately, there is no standard-compliant value you can use in its place. Most compilers have some subset of __func__, __FUNC__, __FUNCTION__, __FUNCSIG__, and __PRETTY_FUNCTION__. However, none of those are standard in C++03. C++11 does standardize __func__, but only as an "implementation-defined string", which isn't guaranteed to be useful, or even unique. On top of that, the values will be different on different compilers. Also, some of them may be macros rather than identifiers, to make things more fun.
If you want truly portable code, in C++11, you can use something like string(__func__) + ":" + STRINGIZE(__LINE__)—this will be somewhat ugly, but at least each function will have a unique name. And in C++03, there is no equivalent. If you just need "portable enough", consult the documentation for every compiler you use, or rely on something like autoconf.
Is there any reason you can't use standard profiling tools that will count these calls for you? Something like gprof?
Otherwise static integers would be the way to go.
Assuming you want these statistics tracked all the time in your program, you could use an unordered_map of your function names:
std::unordered_map<const char *, unsigned> stats;
void foo () {
// use __FUNCDNAME__ for MSVC
++stats[__PRETTY_FUNCTION__];
//...
}
The use of compiler specific function name specifiers is purposefully there to get the decorated function names. This is so that overloaded function names get counted as separate functions.
This technique allows you to add new functions easily without thinking about anything else, but there is a small additional cost if there are hash collisions (which can be remedied somewhat by sizing the stats map to be larger). There is no hash computed on the string, since the key is a pointer type, it just uses the pointer value itself as the hash.
If this is just one-off code for profiling, then you should first try to use the code profiling tools available on your platform.
You can put static locals inside the methods themselves, that seems cleaner since these variables aren't logically connected to the class so there's no reason to make them members.
Additionaly, you could have a macro to simplify the work. I normally don't recommend using macros, but this seems like an appropriate use:
#define DEFINE_COUNTER \
static int noCalls = 0; \
noCalls++;
void foo()
{
DEFINE_COUNTER
}
Use a library that implements the Observer Pattern or Method Call Interception. You can choose one from this list, or use something like Vitamin.
I have a core file I am examining. And I am just stumped at what can be the possible causes for this. Here is the behavoir:
extern sampleclas* someobj;
void func()
{
someobj->MemFuncCall("This is a sample str");
}
My crash is inside MemFuncCall. But when I examine core file, someobj has an address, say abc(this address is properly initialized and not corrupted) , which is different from this pointer in the function stacktrace: sampleclass::MemFuncCall(this=xyz, "This is a sample str")
I was assuming that this pointer will always be the same as address for someobj i.e. abc should always be equal to xyz.
What are the possible cases where these 2 addresses can be different???
Fyi, This app is single threaded.
Ship optimizations can make things appear very strange in a debugger. Recompile in debug mode (optimizations off), and repo.
Another possible explaination is if the calling convention (or definition in general) is wrong for MemFuncCall (there is a disagreement between the header you compiled with and when MemFuncCall was compiled). You have to try hard to get this wrong, though.
It is possible. Maybe some kind of buffer overrun? Maybe the calling convention (or definition in general) is wrong for MemFuncCall (there is a mismatch between the header you compiled with and when MemFuncCall was compiled).
Hard to say.
But since this is single threaded I would try following technique. Usually memory layout in apps is the same between reruns of application. So start your application under debugger, stop it immediately and put two memory breakpoints on addresses 0xabc and 0xxyz. You have good chance of hitting breakpoints once someone is modifying this memory. Maybe than stack traces will help?
In the case of multiple inheritance the this pointer can be different from the pointer to the "real" object:
struct A {
int a;
void fa() { std::cout << "A::this=" << this << std::endl; }
};
struct B {
int b;
void fb() { std::cout << "B::this=" << this << std::endl; }
};
struct C : A, B {
};
int main() {
C obj;
obj.fa();
obj.fb();
}
Here inside obj.fa(), this will point to the A part of obj while inside fb() it will point to the B part. So the this pointers can be different in different methods, and also different from &obj.
As a reason for the crash a possibility would be that someobj was deleted before and the pointer is no longer valid.
since the pointer someobj is defined externally it could the that there is some inconsistencies between your compilation units. try to clean everything and rebuild the project
One thing I can think of is a corruption of vtable.