Is it possible to avoid the GC for delegates?
I am building a task system. I have N threads, each with a local task queue. A task queue is basically just an Array!Fiber tasks. Because it is discouraged to send fibers to a different thread, I send a closure/delegate to a thread, create the fiber from that delegate and put it in the array tasks.
Now the delegates that I am sending are delegates that capture variables.
// Some pseudocode
auto f = /* some function */;
auto cell = Cell(...);
auto del = () {
    auto res = f();
    cell.write(res);
};
send(del);
Now cell is heap allocated and synchronized with an atomic counter. I can then check if the atomic counter from cell has reached 0, if it did I can safely read from it.
The problem is that delegates which capture variables allocate those variables on the GC heap. Right now I only allocate a pointer, so it is probably not a huge problem, but I would still like to avoid the GC.
How would I do this?
You might already know all this, but this is a bit of a FAQ so I'm going to write a few details.
First, let's understand what a delegate is. Just as a slice is a C data pointer paired with a length, a delegate is a C data pointer paired with a function pointer. These are passed together to functions expecting them, as if it were defined as:
struct d_delegate {
void* ptr; // yes, it is actually typed void*!
T* funcptr; // this is actually a function pointer
};
(Note that the fact that there is just one data ptr in there is the reason behind some compiler errors when you try to take a nested delegate inside a class method!)
That void* is what points to the data and like with a slice, it can come from a variety of places:
Object obj = new Object();
string delegate() dg = &obj.toString;
At this point, dg.ptr points to obj, which happens to be a garbage collected class object, but only because I newed it above.
struct MyStruct {
string doSomething() { return "hi"; }
}
MyStruct obj;
string delegate() dg = &obj.doSomething;
In this case, obj lives on the stack due to how I allocated it above, so the dg.ptr also points to that temporary object.
Whether something is a delegate or not says nothing about the memory allocation scheme used for it - this is arguably dangerous because a passed delegate to you might point to a temporary object that will disappear before you're finished with it! (That's the main reason why GC is used by the way, to help prevent such use-after-free bugs.)
So, if delegates can come from any object, why are they assumed to be GC so much? Well, the automatically generated closure can copy local variables to a GC segment when the compiler thinks the lifetime of the delegate is longer than the outer function.
void some_function(void delegate() dg);
void foo() {
int a;
void nested() {
a++;
}
some_function(&nested);
}
Here, the compiler will copy the variable a to a GC segment because it assumes some_function will keep a copy of it and wants to prevent use-after-free bugs (which are a pain to debug as it frequently leads to memory corruption!) as well as memory leaks.
However, if you promise the compiler that you'll do it right yourself by using the scope keyword on the delegate definition, it will trust you and leave the locals right where they are:
void some_function(scope void delegate() dg);
Keeping the rest the same, it will no longer allocate a copy. Doing it on the function definition side is the best because then you, as the function author, can ensure you don't actually keep a copy.
On the usage side though, you can also label it scope:
void foo() {
int a;
void nested() {
a++;
}
// this shouldn't allocate either
scope void delegate() dg = &nested;
some_function(dg); // dg is already a delegate, so no & here
}
So, the only time memory is automatically allocated by the GC is when local variables are used by a nested function which has its address taken without the scope keyword.
Note that the () => whatever and () { return foo; } syntaxes are just shorthand for a named nested function with its address being automatically taken, so they work the same way as the above. dg = {a++;}; is the same as dg = &nested; above.
Thus, the key takeaway from this for you is that if you want to manually allocate a delegate, you just need to manually allocate an object and make a delegate from one of its methods instead of automatically capturing variables! But, you need to keep track of the lifetime and free it properly. That's the tricky part.
So for your example:
auto del = () {
auto res = f();
cell.write(res);
};
you might translate that into:
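import core.stdc.stdlib : malloc, free;

struct Helper {
    Cell cell; // the captured variable becomes an explicit member
    void del() {
        auto res = f(); // assumes f is reachable here, e.g. a module-level function
        cell.write(res);
    }
}
auto helper = cast(Helper*) malloc(Helper.sizeof);
helper.cell = cell; // copy the local explicitly
send(&helper.del);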
Then, on the receiving side, don't forget to free(dg.ptr); when you're done so you don't leak it.
Or, better yet, if you can change send to just actually take Helper objects, you don't need to allocate it at all, you can just pass it by value.
It also occurs to me that you could pack some other data into that pointer to pass it in-place, but that would be ABI hacking and possibly undefined behavior. Try it if you wanna play though :)
Related
I have a not-so-ideal situation where a class returns handle references to objects that shouldn't be accessed after the parent objects' lifetimes. What is the best way to alter the pattern below to aid defensive coding?
// 'A<T>' is a thin factory-style utility class with asynchronous consumers.
template <typename T>
struct A {
A() : h_(*(new T())) { /* ... */ }
~A() { /* h_ deleted or ownership passed elsewhere */ }
// What's the best way to indicate that these handles shouldn't be used after
// the destructions of the A instances?
T &handle() { return h_; }
private:
T &h_;
};
struct B { /* ... */ };
int main() {
B *b1{nullptr};
{
A<B> a;
// Is there a good way to trigger detection that the reference is bound to
// a variable which will outlive its 'valid' local lifetime?
b1 = &a.handle();
B &b2(a.handle()); // this is reasonable though
b1->ok_action();
b2.also_alright();
}
b1->uh_oh();
}
I know you can't truly prevent a user of C++ from doing most unsafe things, but if I could at least generate warnings on trivial accidental uses like this, it would be the bulk of what I'd like to achieve.
I'm taking the liberty of making a few assumptions about your situation:
The handles point to dynamically allocated objects generated by A at the user's discretion.
The handles will be passed around where A is out of scope, thus A cannot be used as a mandatory gateway.
The data to which the handles point must be destroyed when A is destroyed, thus automatic garbage collection cannot be applied to the handles.
So far, compile-time safety checking does not seem to be possible. You want coding mistakes to manifest themselves at runtime through some kind of exception mechanism rather than spontaneous crashes.
With that in mind here's a possible solution:
Within A's constructor, allocate some kind of signal object S which is set when A is destroyed. Handle S using a shared_ptr. Have A::handle return a custom handle class H which contains a B handle, and a shared_ptr to S. Create a dereference operator within H which verifies that A is still valid (S is not set), or throws an exception. When all handles expire, S will be destroyed automatically.
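A minimal sketch of that idea follows. The names CheckedHandle, makeStaleHandle and staleUseIsDetected are made up for illustration, and a plain bool stands in for the signal object S. It is single-threaded only: with asynchronous consumers the flag would at least need to be atomic, and even then a check cannot stop A from being destroyed while a handle is in the middle of being used.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

struct B {
    int value = 42;
};

// Hypothetical checked handle: the raw target plus a shared "alive" flag (S).
template <typename T>
class CheckedHandle {
public:
    CheckedHandle(T* target, std::shared_ptr<bool> alive)
        : target_(target), alive_(std::move(alive)) {}

    T* operator->() const { return &get(); }
    T& operator*() const { return get(); }

private:
    T& get() const {
        if (!*alive_) {
            throw std::logic_error("handle used after its owner was destroyed");
        }
        return *target_;
    }

    T* target_;
    std::shared_ptr<bool> alive_;  // outlives A; freed when the last handle goes
};

template <typename T>
class A {
public:
    A() : object_(new T()), alive_(std::make_shared<bool>(true)) {}
    ~A() { *alive_ = false; }  // set the signal; object_ is destroyed afterwards
    A(const A&) = delete;
    A& operator=(const A&) = delete;

    CheckedHandle<T> handle() { return CheckedHandle<T>(object_.get(), alive_); }

private:
    std::unique_ptr<T> object_;
    std::shared_ptr<bool> alive_;
};

// Returns a handle whose owner is destroyed as soon as this function returns.
CheckedHandle<B> makeStaleHandle() {
    A<B> a;
    return a.handle();
}

// True if dereferencing the stale handle throws instead of touching freed memory.
bool staleUseIsDetected() {
    CheckedHandle<B> stale = makeStaleHandle();
    try {
        int v = stale->value;  // the check throws before the dead object is read
        (void)v;
        return false;
    } catch (const std::logic_error&) {
        return true;
    }
}
```

The misuse from the question still compiles, but it now fails with an exception at the point of use rather than corrupting memory.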
You want object A to produce another object of class B, let somebody use it, then ensure that B is destroyed before A?
Rather than return an instance of B, would it be possible to define a method on A which obtains the B and then passes it to some kind of delegate (virtual method, functor, lambda function)? This way the user function is nested inside a call to a method on the A object, so it's logically impossible for A to be destroyed before the user code has finished whatever it is doing.
For instance:
class A
{
public:
template <typename Functor>
void DoSomethingWithAnInstanceOfB(const char* whichB, Functor f)
{
B& bref = InternalLookupB(whichB);
f(bref);
}
};
This looks up the correct B instance and then passes it to an arbitrary functor. The functor can do whatever it wants, but it necessarily must return before DoSomethingWithAnInstanceOfB() will return, therefore guaranteeing that A's lifetime is at least as long as B.
This is not a question about why you would write code like this, but more as a question about how a method is executed in relation to the object it is tied to.
If I have a struct like:
struct F
{
// some member variables
void doSomething(std::vector<F>& vec)
{
// do some stuff
vec.push_back(F());
// do some more stuff
}
};
And I use it like this:
std::vector<F> vec(10);
vec[0].doSomething(vec);
What happens if the push_back(...) in doSomething(...) causes the vector to expand? This means that vec[0] would be copied then deleted in the middle of executing its method. This would be no good.
Could someone explain what exactly happens here?
Does the program instantly crash? Does the method just try to operate on data that doesn't exist?
Does the method operate "orphaned" of its object until it runs into a problem like changing the object's state?
I'm interested in how a method call is related to the associated object.
Yes, it's bad. It's possible for your object to be copied (or moved in C++11, if the distinction is relevant to your code) while you are inside doSomething(). So after the push_back() returns, the this pointer may no longer point to the location of your object. For the specific case of vector::push_back(), it's possible that the memory pointed to by this has been freed and the data copied to a new array somewhere else. For other containers (list, for example) that leave their elements in place, this is (probably) not going to cause problems at all.
In practice, it's unlikely that your code is going to crash immediately. The most likely circumstance is a write to free memory and a silent corruption of the state of your F object. You can use tools like valgrind to detect this kind of behavior.
But basically you have the right idea: don't do this, it's not safe.
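One way to sidestep the problem, sketched here under the assumption that the work can live in a free function, is to refer to the element by index and look it up again after the push_back. The function name appendAndTouch and the member value are hypothetical. The capacity comparison is only there to show when a reallocation took place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct F {
    int value = 0;
};

// Hypothetical helper: works on vec[index] without holding a pointer,
// reference or 'this' across the push_back. Returns true if the vector
// reallocated its storage.
bool appendAndTouch(std::vector<F>& vec, std::size_t index) {
    const std::size_t oldCapacity = vec.capacity();
    vec.push_back(F());  // may move every element to new storage
    const bool reallocated = vec.capacity() != oldCapacity;
    vec[index].value += 1;  // fresh lookup, valid whether or not storage moved
    return reallocated;
}
```

Calling vec.reserve() up front also avoids the reallocation, but only as long as the reserved capacity is never exceeded.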
Could someone explain what exactly happens here?
Yes. If you access the object after a push_back, resize or insert has reallocated the vector's contents, it's undefined behavior, meaning what actually happens is up to your compiler, your OS, what the "do some more stuff" part does, and maybe a number of other factors like the phase of the moon, air humidity in some distant location... you name it ;-)
In short, this is (indirectly, via the std::vector implementation) calling the destructor of the object itself, so the lifetime of the object has ended. Further, the memory previously occupied by the object has been released by the vector's allocator. Therefore the use of the object's non-static members results in undefined behavior, because the this pointer passed to the function no longer points to an object. You can, however, access or call static members of the class:
struct F
{
static int i;
static int foo();
double d;
void bar();
// some member variables
void doSomething(std::vector<F>& vec)
{
vec.push_back(F());
int n = foo(); //OK
i += n; //OK
std::cout << d << '\n'; //UB - will most likely crash with access violation
bar(); //UB - what actually happens depends on the
// implementation of bar
}
};
I am trying to write a function that will check if an object exists:
bool UnloadingBay::isEmpty() {
bool isEmpty = true;
if(this->unloadingShip != NULL) {
isEmpty = false;
}
return isEmpty;
}
I am pretty new to C++ and not sure if my Java background is confusing something, but the compiler gives an error:
UnloadingBay.cpp:36: error: no match for ‘operator!=’ in ‘((UnloadingBay*)this)->UnloadingBay::unloadingShip != 0’
I can't seem to figure out why it doesn't work.
Here is the declaration for class UnloadingBay:
class UnloadingBay {
private:
Ship unloadingShip;
public:
UnloadingBay();
~UnloadingBay();
void unloadContainer(Container container);
void loadContainer(Container container);
void dockShip(Ship ship);
void undockShip(Ship ship);
bool isEmpty();
};
It sounds like you may need a primer on the concept of a "variable" in C++.
In C++ every variable's lifetime is tied to its enclosing scope. The simplest example of this is a function's local variables:
void foo() // foo scope begins
{
UnloadingShip anUnloadingShip; // constructed with default constructor
// do stuff without fear!
anUnloadingShip.Unload();
} // foo scope ends, anything associated with it guaranteed to go away
In the above code "anUnloadingShip" is default constructed when the function foo is entered (ie its scope is entered). No "new" required. When the encompassing scope goes away (in this case when foo exits), your user-defined destructor is automatically called to clean up the UnloadingShip. The associated memory is automatically cleaned up.
When the encompassing scope is a C++ class (that is to say a member variable):
class UnloadingBay
{
int foo;
UnloadingShip unloadingShip;
};
the lifetime is tied to the instances of the class, so when our function creates an "UnloadingBay"
void bar2()
{
UnloadingBay aBay; /*no new required, default constructor called,
which calls UnloadingShip's constructor for
its member unloadingShip*/
// do stuff!
} /*destructor fires, which in turn triggers the members' destructors*/
the members of aBay are constructed and live as long as "aBay" lives.
This is all figured out at compile time. There is no run-time reference counting preventing destruction. No considerations are made for anything else that might refer to or point to that variable. The compiler analyzes the functions we wrote to determine the scope, and therefore lifetime, of the variables. The compiler sees where a variable's scope ends and anything needed to clean up that variable will get inserted at compile time.
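To make the ordering concrete, here is a small sketch that records construction and destruction events. Ship, Bay, eventLog and useBay are illustrative names, not the classes from the question.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Collects construction and destruction events in the order they happen.
std::vector<std::string>& eventLog() {
    static std::vector<std::string> log;
    return log;
}

struct Ship {
    Ship() { eventLog().push_back("Ship constructed"); }
    ~Ship() { eventLog().push_back("Ship destroyed"); }
};

struct Bay {
    Ship ship;  // a member object, not a pointer
    Bay() { eventLog().push_back("Bay constructed"); }
    ~Bay() { eventLog().push_back("Bay destroyed"); }
};

void useBay() {
    Bay bay;  // constructs the member ship first, then runs Bay's constructor body
    eventLog().push_back("using bay");
}  // Bay's destructor body runs first, then the member ship is destroyed
```

After one call to useBay(), the log reads: Ship constructed, Bay constructed, using bay, Bay destroyed, Ship destroyed.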
"new", "NULL", (don't forget "delete") in C++ come into play with pointers. Pointers are a type of variable that holds a memory address of some object. Programmers use the value "NULL" to indicate that a pointer doesn't hold an address (ie it doesn't point to anything). If you aren't using pointers, you don't need to think about NULL.
Until you've mastered how variables in C++ go in and out of scope, avoid pointers. It's another topic entirely.
Good luck!
I'm assuming unloadingShip is an object and not a pointer so the value could never be NULL.
ie.
SomeClass unloadingShip
versus
SomeClass *unloadingShip
Well, you don't have to write so much code to check if a pointer is NULL or not. The method could be a lot simpler:
bool UnloadingBay::isEmpty() const {
return unloadingShip == NULL;
}
Plus, it should be marked as "const" because it does not modify the state of the object and can be called on constant instances as well.
In your case, "unloadingShip" is an object of class "Ship" which is not dynamically allocated (except when the whole class "UnloadingBay" is allocated dynamically). Thus, checking whether it equals NULL doesn't make sense, because it is not a pointer.
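If the bay really does need a "no ship docked" state, one option is to hold the ship through a smart pointer, so that a null check becomes meaningful. The sketch below is a simplified stand-in for the class in the question: Ship is reduced to a single field, and undockShip takes no argument here.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Ship {
    int containers = 0;
};

class UnloadingBay {
public:
    // True when no ship is currently docked.
    bool isEmpty() const { return unloadingShip == nullptr; }

    // Stores a heap copy of the ship; the bay owns it from now on.
    void dockShip(Ship ship) { unloadingShip.reset(new Ship(std::move(ship))); }

    // Destroys the docked ship, if any, and returns to the empty state.
    void undockShip() { unloadingShip.reset(); }

private:
    std::unique_ptr<Ship> unloadingShip;  // null means "no ship docked"
};
```

The unique_ptr deletes the ship automatically when the bay is destroyed, so no manual delete is needed.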
For checking, if an object exists, you can consider going this way:
create a pointer to your object:
someClass *myObj = NULL; // Make it null
and now where you pass this pointer, you can check:
if(!myObj) // only true while the pointer is still null
myObj = new someClass();
and then in case you want to delete, you can do this:
if(myObj)
{
delete myObj;
myObj = NULL;
}
so in this way, you can have a good control on checking whether your object exists, before deleting it or before creating a new one.
Hope this helps!
I'm working with a class for which the new operator has been made private, so that the only way to get an instance is to write
Foo foo = Foo()
Writing
Foo* foo = new Foo()
does not work.
But because I really want a pointer to it, I simulate that with the following :
Foo* foo = (Foo*)malloc(sizeof(Foo));
*foo = Foo();
so that I can test whether the pointer is null to know whether it has already been initialized.
It looks like it works, from empirical tests, but is it possible that not enough space had been allocated by malloc ? Or that something else gets funny ?
--- edit ---
I didn't mention the context because I was not actually sure why the new operator was disabled. This class is part of a constraint programming library (Gecode), and I thought it might be disabled in order to enforce the documented way of specifying a model.
I didn't know about the Concrete Data Type idiom, which looks like a more plausible reason.
That allocation scheme may be fine when specifying a standard model, in which everything is specified as CDTs in the Space-derived class, but in my case these instances are each created by specific classes and then passed by reference to the constructor of the class that represents the model.
About the reason i'm not using the
Foo f;
Foo *pf = &f;
it would be like doing case 1 below, which throws a "returning reference to local variable" warning
int& f() { int a=5; return a; } // case 1
int& f() { int a=5; int* ap=&a; return *ap; } // case 2
int& f() { int* ap=(int*)malloc(sizeof(int)); *ap=5; return *ap; } // case 3
This warning disappears when adding a pointer in case 2, but I guess that is only because the compiler loses track of where the reference comes from.
So the only option left is case 3. (Additionally, ap is a member of a class that is initialized only once, when f is called, and is null otherwise, and f is the only function returning a reference to it. That way, I am sure that ap will not lose its meaning because of the compiler optimizing it away. Could that happen?)
But I guess this reaches far too much beyond the scope of the original question now...
Don't use malloc with C++ classes. malloc is different from new in the very important respect that new calls the class' constructor, but malloc does not.
You can get a pointer in a couple ways, but first ask yourself why? Are you trying to dynamically allocate the object? Are you trying to pass pointers around to other functions?
If you're passing pointers around, you may be better off passing references instead:
void DoSomething(Foo& my_foo)
{
my_foo.do_it();
}
If you really need a pointer (maybe because you can't change the implementation of DoSomething), then you can simply take the pointer to an automatic:
Foo foo;
DoSomething(&foo);
If you need to dynamically allocate the Foo object, things get a little trickier. Someone made the new operation private for a reason. Probably a very good reason. There may be a factory method on Foo like:
class Foo
{
public:
static Foo* MakeFoo();
private:
};
..in which case you should call that. Otherwise you're going to have to edit the implementation of Foo itself, and that might not be easy or a good thing to do.
Be careful about breaking the Concrete Data Type idiom.
You are trying to circumvent the fact that the new operator has been made private, i.e. the Concrete Data Type idiom/pattern. The new operator was probably made private for specific reasons, e.g. another part of the design may depend on this restriction. Trying to get around this to dynamically allocate an instance of the class is trying to circumvent the design and may cause other problems or other unexpected behavior. I wouldn't suggest trying to circumvent this without studying the code thoroughly to ensure you understand the impact to other parts of the class/code.
Concrete Data Type
http://users.rcn.com/jcoplien/Patterns/C++Idioms/EuroPLoP98.html#ConcreteDataType
Solutions
...
Objects that represent abstractions that live "inside" the program, closely tied to the computational model, the implementation, or the programming language, should be declared as local (automatic or static) instances or as member instances. Collection classes (string, list, set) are examples of this kind of abstraction (though they may use heap data, they themselves are not heap objects). They are concrete data types--they aren't "abstract," but are as concrete as int and double.
class ScopedLock
{
private:
static void * operator new (std::size_t size); // Disallow dynamic allocation
static void * operator new (std::size_t size, void * mem); // Disallow placement new as well.
};
int main (void)
{
ScopedLock s; // Allowed
ScopedLock * sl = new ScopedLock (); // Standard new and nothrow new are not allowed.
void * buf = ::operator new (sizeof (ScopedLock));
ScopedLock * s2 = new(buf) ScopedLock; // Placement new is also not allowed
}
ScopedLock object can't be allocated dynamically with standard uses of new operator, nothrow new, and the placement new.
The funny thing that would happen results from the constructor not being called for *foo. It will only work if it is a POD (simple built-in types for members + no constructor). Otherwise, when using assignment, it may not work out right, if the left-hand side is not already a valid instance of the class.
It seems, you can still validly allocate an instance on the heap with
Foo* p = ::new Foo;
To restrict how a class instance can be created, you will probably be better off declaring the constructor(s) private and only allow factory functions call them.
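As a rough sketch of that last suggestion, the class below keeps its constructor private and exposes a single factory function. This Foo and its create function are hypothetical and are not the library class from the question.

```cpp
#include <cassert>
#include <memory>

class Foo {
public:
    // The only way to obtain a Foo: the factory decides how it is allocated.
    static std::unique_ptr<Foo> create(int value) {
        return std::unique_ptr<Foo>(new Foo(value));
    }

    // Copying is disabled so that no instance can appear outside the factory.
    Foo(const Foo&) = delete;
    Foo& operator=(const Foo&) = delete;

    int value() const { return value_; }

private:
    explicit Foo(int value) : value_(value) {}  // not callable from outside

    int value_;
};
```

A caller writes `std::unique_ptr<Foo> p = Foo::create(7);`, and a null p can then serve as the "not yet initialized" state the question was after.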
Wrap it:
struct FooHolder {
Foo foo;
operator Foo*() { return &foo; }
};
I don't have a full understanding of the underlying code. If everything else is OK, the code above is correct: malloc() will allocate enough space and nothing funny will happen. But avoid strange code and keep it straightforward:
Foo f;
Foo *pf = &f;
I'm not new to programming, but after working in Java I'm coming back to C++ and am a little confused about class variables that aren't pointers. Given the following code:
#include <iostream>
#include <map>
using namespace std;
class Foo {
public:
Foo() {
bars[0] = new Bar;
bars[0]->id = 5;
}
~Foo() { }
struct Bar {
int id;
};
void set_bars(map<int,Bar*>& b) {
bars = b;
}
void hello() {
cout << bars[0]->id << endl;
}
protected:
map<int,Bar*> bars;
};
int main() {
Foo foo;
foo.hello();
map<int,Foo::Bar*> testbars;
testbars[0] = new Foo::Bar;
testbars[0]->id = 10;
foo.set_bars(testbars);
foo.hello();
return(0);
}
I get the expected output of 5 & 10. However, my lack of understanding about references and pointers and such in C++ makes me wonder if this will actually work in the wild, or if once testbars goes out of scope it will barf. Of course, here, testbars will not go out of scope before the program ends, but what if it were created in another class function as a function variable? Anyway, I guess my main question is: would it be better/safer for me to create the bars class variable as a pointer to the map?
Anyway, I guess my main question is: would it be better/safer for me to create the bars class variable as a pointer to the map?
No. C++ is nothing like Java in this and many other respects. If you find yourself using pointers and allocating new'd objects to them a lot, you are probably doing something wrong. To learn the right way to do things, I suggest getting hold of a copy of Accelerated C++ by Koenig & Moo.
The member variable bars is a separate instance of a "dictionary"-like/associative array class. So when it is assigned to in set_bars, the contents of the parameter b are copied into bars. So there is no need to worry about the relative lifetimes of foo and testbars, as they are independent "value-like" entities.
You have more of a problem with the lifetimes of the Bar objects, which are currently never going to be deleted. If you add code somewhere to delete them, then you will introduce a further problem because you are copying the addresses of Bar objects (rather than the objects themselves), so you have the same object pointed to by two different maps. Once the object is deleted, the other map will continue to refer to it. This is the kind of thing that you should avoid like the plague in C++! Naked pointers to objects allocated with new are a disaster waiting to happen.
References (declared with &) are not different from pointers with regard to object lifetimes. To allow you to refer to the same object from two places, you can use either pointers or references, but this will still leave you with the problem of deallocation.
You can get some way toward solving the deallocation problem by using a class like shared_ptr, which should be included with any up-to-date C++ environment (std::shared_ptr since C++11, std::tr1::shared_ptr before that). But then you may hit problems with cyclical pointer networks (A points to B and B points to A, for example), which will not be automatically cleaned up.
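Here is a short sketch of what the shared_ptr variant might look like for the maps in the question, assuming a C++11 compiler. BarMap, first_id and idAfterSourceMapIsGone are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <memory>

struct Bar {
    int id = 0;
};

typedef std::map<int, std::shared_ptr<Bar>> BarMap;

class Foo {
public:
    // Copies the map; the shared_ptr copies share ownership of the same Bars.
    void set_bars(const BarMap& b) { bars = b; }

    int first_id() const { return bars.at(0)->id; }

private:
    BarMap bars;
};

// The Bar stays alive after the map that created it has gone out of scope.
int idAfterSourceMapIsGone() {
    Foo foo;
    {
        BarMap testbars;
        testbars[0] = std::make_shared<Bar>();
        testbars[0]->id = 10;
        foo.set_bars(testbars);
        assert(testbars[0].use_count() == 2);  // owned by testbars and by foo
    }  // testbars is destroyed here; foo's copy keeps the Bar alive
    return foo.first_id();
}
```

Each Bar is deleted exactly once, when the last map that refers to it is destroyed, so neither map is left holding a dangling pointer.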
For every new you need a corresponding delete.
If you try to reference the memory after you call delete, wherever that is, then the behavior is undefined and the program may indeed "barf".
If you don't then you will be fine, it's that simple.
You should design your classes so that ownership of memory is explicit, and that you KNOW that for every allocation you are doing an equal deallocation.
Never assume another class/container will delete memory you allocated.
Hope this helps.
In the code below you can pass in a map of Bars and will still be able to modify those Bars from outside the class, but only until you call set_bars again, because that deletes the Bars the class was holding.
It is better when a single object is responsible for both creating and deleting the Bars, which is not the case in your code.
If you want, you can use boost::shared_ptr<Bar> instead of Bar*. That gives more Java-like behavior.
class Foo {
public:
Foo() {
bars[0] = new Bar;
bars[0]->id = 5;
}
~Foo() { freeBarsMemory(); }
struct Bar {
int id;
};
typedef std::map<int,Bar*> BarsList;
void set_bars(const BarsList& b) {
freeBarsMemory();
bars = b;
}
void hello() {
std::cout << bars[0]->id << std::endl;
}
protected:
BarsList bars;
void freeBarsMemory()
{
BarsList::const_iterator it = bars.begin();
BarsList::const_iterator end = bars.end();
for (; it != end; ++it)
delete it->second;
bars.clear();
}
};
I'm not new to programming, but after working in Java I'm coming back to C++ and am a little confused about class variables that aren't pointers.
The confusion appears to come from a combination of data that is on the heap and data that is not necessarily on the heap. This is a common cause of confusion.
In the code you posted, bars is not a pointer. Since it's in class scope, it will exist until the object containing it (foo) is destroyed. In this case foo was created on the stack, so it will be destroyed when it falls out of scope, regardless of how deeply nested that scope is. And when foo is destroyed, subobjects of foo (whether they are base classes or member objects) will have their destructors run at that exact moment in a well-defined order.
This is an extremely powerful aspect of C++. Imagine a class with a 10-line constructor that opens a network connection, allocates memory on the heap, and writes data to a file. Imagine that the class's destructor undoes all of that (closes the network connection, deallocates the memory on the heap, closes the file, etc.). Now imagine that creating an object of this class fails halfway through the constructor (say, the network connection is down). How can the program know which lines of the destructor will undo the parts of the constructor that succeeded? There is no general way to know this, so the destructor of that object is not run.
Now imagine a class that contains ten objects, and the constructor for each of those objects does one thing that must be rolled back (opens a network connection, allocates memory on the heap, writes data to a file, etc.) and the destructor for each of those objects includes the code necessary to roll back the action (closes the network connection, deallocates objects, closes the file, etc.). If only five objects are successfully created then only those five need to be destroyed, and their destructors will run at that exact moment in time.
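The sketch below illustrates that rollback with three member objects, the second of which can be made to fail. Step, Session, rollbackLog and tryOpenSession are hypothetical names, and the "resources" are only log entries.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Collects acquire and release events in the order they happen.
std::vector<std::string>& rollbackLog() {
    static std::vector<std::string> log;
    return log;
}

// One resource per object: acquired in the constructor, released in the destructor.
struct Step {
    std::string name;

    Step(const std::string& n, bool fail) : name(n) {
        if (fail) throw std::runtime_error(name + " failed");
        rollbackLog().push_back("acquired " + name);
    }
    ~Step() { rollbackLog().push_back("released " + name); }
};

struct Session {
    Step memory;
    Step network;
    Step file;

    explicit Session(bool networkDown)
        : memory("memory", false),
          network("network", networkDown),
          file("file", false) {}
    ~Session() { rollbackLog().push_back("Session destroyed"); }
};

// Returns false if construction failed partway through.
bool tryOpenSession(bool networkDown) {
    try {
        Session session(networkDown);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

With networkDown set to true, only memory was fully constructed, so the log contains just "acquired memory" followed by "released memory". Session's own destructor never runs, and file is never constructed at all.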
If foo had been created on the heap (via new) then it would only be destroyed when calling delete. In general it's much easier to use objects on the stack unless there is some reason for the object to outlast the scope it was created in.
Which brings me to Foo::bars. Foo::bars is a map that refers to objects on the heap. Well, it holds pointers that, in this code example, refer to objects allocated on the heap (pointers can also refer to objects allocated on the stack). In the example you posted the objects these pointers refer to are never deleted, and because these objects are on the heap you're getting a (small) memory leak (which the operating system cleans up on program exit). Standard containers such as std::map do not delete the objects that their stored pointers refer to when they are destroyed. Boost has a few solutions to this problem. In your case it would probably be easiest to simply not allocate these objects on the heap:
#include <iostream>
#include <map>
using std::map;
using std::cout;
using std::endl;
class Foo {
public:
Foo() {
// normally you wouldn't use the parentheses on the next line
// but we're creating an object without a name, so we need them
bars[0] = Bar();
bars[0].id = 5;
}
~Foo() { }
struct Bar {
int id;
};
void set_bars(map<int,Bar>& b) {
bars = b;
}
void hello() {
cout << bars[0].id << endl;
}
protected:
map<int,Bar> bars;
};
int main() {
Foo foo;
foo.hello();
map<int,Foo::Bar> testbars;
// create another nameless object
testbars[0] = Foo::Bar();
testbars[0].id = 10;
foo.set_bars(testbars);
foo.hello();
return 0;
}