Can Tail Call Optimization and RAII Co-Exist?

I can't think of a true RAII language that also has tail call optimization in the specs, but I know many C++ implementations can do it as an implementation-specific optimization.
This poses a question for those implementations that do: given that destructors are invoked at the end of an automatic variable's scope and not by a separate garbage-collection routine, doesn't that violate TCO's constraint that the recursive call must be the last instruction of the function?
For example:
#include <iostream>

class test_object {
public:
    test_object() { std::cout << "Constructing...\n"; }
    ~test_object() { std::cout << "Destructing...\n"; }
};

void test_function(int count);

int main()
{
    test_function(999);
}

void test_function(int count)
{
    if (!count) return;
    test_object obj;
    test_function(count - 1);
}
"Constructing..." would be written 999 times and then "Destructing..." another 999 times. Ultimately, 999 test_object instances would be automatically allocated before the unwind. But assuming an implementation had TCO, would 1000 stack frames exist or just 1?
Does the destructor call after the recursive call collide with the de facto requirements for implementing TCO?

Taken at face value, it would certainly seem like RAII works against TCO. However, remember that there are a number of ways in which the compiler can "get away with it", so to speak.
The first and most obvious case is a trivial destructor: if the destructor is the compiler-generated default and all the sub-objects have trivial destructors too, then the destructor is effectively non-existent (it is always optimized away), and TCO can be performed as usual.
Then, the destructor could be inlined (its code is placed directly in the function rather than being invoked as a call). In that case, it boils down to having some "clean-up" code after the return statement. The compiler is allowed to re-order operations if it can determine that the end result is the same (the "as-if" rule), and it will generally do so when the re-ordering leads to better code; I would assume TCO is one of the considerations applied by most compilers (i.e., if the compiler can re-order things so that the code becomes suitable for TCO, it will do it).
And for the rest of the cases, where the compiler cannot be "smart enough" to do it on its own, it becomes the responsibility of the programmer. The presence of the automatic destructor call does make it a bit harder for the programmer to spot the TCO-inhibiting clean-up code after the tail call, but it doesn't change the programmer's ability to make the function a candidate for TCO. For example:
void nonRAII_recursion(int a) {
    if (a == 0) return;       // base case: stop the recursion
    int* arr = new int[a];
    // do some stuff with array "arr"
    delete[] arr;
    nonRAII_recursion(--a);   // tail-call
}
Now, a naive RAII_recursion implementation might be:
void RAII_recursion(int a) {
    if (a == 0) return;       // base case: stop the recursion
    std::vector<int> arr(a);
    // do some stuff with vector "arr"
    RAII_recursion(--a);      // tail-call
} // arr gets destroyed here, not good for TCO.
But a wise programmer can still see that this won't work (unless the vector destructor is inlined, which is likely in this case), and can rectify the situation easily:
void RAII_recursion(int a) {
    if (a == 0) return;       // base case: stop the recursion
    {
        std::vector<int> arr(a);
        // do some stuff with vector "arr"
    } // arr gets destroyed here
    RAII_recursion(--a);      // tail-call
}
And I'm pretty sure you could demonstrate that there are essentially no cases where this kind of trick could not be used to ensure that TCO can be applied. So, RAII merely makes it a bit harder to see if TCO can be applied. But I think programmers that are wise enough to design TCO-capable recursive calls are also wise enough to see those "hidden" destructor calls that would need to be forced to occur before the tail-call.
ADDED NOTE: Look at it this way: the destructor hides away some automatic clean-up code. If you need the clean-up code (i.e., a non-trivial destructor), you will need it whether you use RAII or not (e.g., a C-style array or whatever). And then, if you want TCO to be possible, the clean-up must be doable before the tail call (with or without RAII), and if it is, then it is always possible to force the RAII objects to be destroyed before the tail call (e.g., by putting them inside an extra scope).

If the compiler performs TCO, then the order in which destructors are called changes relative to the case where it doesn't perform TCO.
If the compiler can prove that this reordering doesn't matter (e.g. if the destructor is trivial) then as per the as-if rule it can perform TCO. However, in your example the compiler can't prove that and will not do TCO.
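Applying the extra-scope trick from the earlier answer to the question's own example, a TCO-friendly version might look like this (a sketch; whether the compiler actually performs TCO is still implementation-specific):
void test_function(int count)
{
    if (!count) return;
    {
        test_object obj;
        // use obj here
    } // obj is destructed here, before the recursive call
    test_function(count - 1); // now this really is the last operation
}
Note that this also changes the observable output: each "Destructing..." now appears before the next "Constructing..." rather than all of them after the unwind.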


Can you call the destructor without calling the constructor?

I've been trying not to initialize memory when I don't need to, and am using malloc arrays to do so:
This is what I've run:
#include <cstdlib>   // malloc, free
#include <iostream>

struct test
{
    int num = 3;
    test() { std::cout << "Init\n"; }
    ~test() { std::cout << "Destroyed: " << num << "\n"; }
};

int main()
{
    test* array = (test*)malloc(3 * sizeof(test));
    for (int i = 0; i < 3; i += 1)
    {
        std::cout << array[i].num << "\n";
        array[i].num = i;
        //new (array + i) test;  // placement new is not being used
        std::cout << array[i].num << "\n";
    }
    for (int i = 0; i < 3; i += 1)
    {
        (array + i)->~test();
    }
    free(array);
    return 0;
}
Which outputs:
0 -> 0
0 -> 1
0 -> 2
Destroyed: 0
Destroyed: 1
Destroyed: 2
Despite not having constructed the array elements. Is this "healthy"? That is to say, can I simply treat the destructor as "just a function"?
(besides the fact that the destructor has implicit knowledge of where the data members are located relative to the pointer I specified)
Just to specify: I'm not looking for warnings on the proper usage of C++. I would simply like to know if there are things I should be wary of when using this no-constructor method.
(footnote: the reason I don't want to use constructors is that, many times, memory simply does not need to be initialized, and doing so is slow)
No, this is undefined behaviour. An object's lifetime starts after the call to a constructor is completed, hence if a constructor is never called, the object technically never exists.
This likely "seems" to behave correctly in your example because your struct is trivial (int::~int is a no-op).
Also keep in mind that destructors only destroy the given object; the memory allocated via malloc still needs to be freed separately (as your example does with free).
Edit: You might want to look at this question as well, as this is an extremely similar situation, simply using stack allocation instead of malloc. This gives some of the actual quotes from the standard around object lifetime and construction.
I'll add this as well: in the case where you don't use placement new and it clearly is required (e.g. struct contains some container class or a vtable, etc.) you are going to run into real trouble. In this case, omitting the placement-new call is almost certainly going to gain you 0 performance benefit for very fragile code - either way, it's just not a good idea.
Yes, the destructor is nothing more than a function. You can call it at any time. However, calling it without a matching constructor is a bad idea.
So the rule is: if you did not initialize memory as a specific type, you may not interpret and use that memory as an object of that type; doing so is undefined behavior (with char and unsigned char as exceptions).
Let us do a line by line analysis of your code.
test* array = (test*)malloc(3 * sizeof(test));
This line initializes the pointer variable array with an address provided by the system. Note that the pointed-to memory is not initialized as any kind of type. This means you should not treat that memory as any object (not even as a scalar like int, let alone your test class type).
Later, you wrote:
std::cout << array[i].num << "\n";
This uses the memory as a test object, which violates the rule stated above, leading to undefined behavior.
And later:
(array + i)->~test();
You used the memory as a test object again! Calling the destructor also uses the object! This is also UB.
In your case you are lucky that nothing harmful happens and you get something reasonable. However, the consequences of UB depend entirely on your compiler's implementation; it could even decide to format your disk and still be standard-conforming.
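For completeness, here is a minimal sketch of the same program with the constructor/destructor pairing restored via placement new, which is the well-defined counterpart of the code in the question:
#include <cstdlib>  // std::malloc, std::free
#include <iostream>
#include <new>      // placement new

struct test
{
    int num = 3;
    test() { std::cout << "Init\n"; }
    ~test() { std::cout << "Destroyed: " << num << "\n"; }
};

int main()
{
    test* array = static_cast<test*>(std::malloc(3 * sizeof(test)));
    for (int i = 0; i < 3; i += 1)
    {
        new (array + i) test;   // begin the object's lifetime ("Init", num == 3)
        array[i].num = i;       // now a well-defined access
    }
    for (int i = 0; i < 3; i += 1)
    {
        (array + i)->~test();   // end the lifetime before releasing the storage
    }
    std::free(array);
}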
That is to say, can I simply treat the destructor as "just a function"?
No. While it is like other functions in many ways, there are some special features of the destructor. These boil down to a pattern similar to manual memory management. Just as memory allocation and deallocation need to come in pairs, so do construction and destruction. If you skip one, skip the other. If you call one, call the other. If you insist upon manual memory management, the tools for construction and destruction are placement new and explicitly calling the destructor. (Code that uses new and delete combine allocation and construction into one step, while destruction and deallocation are combined into the other.)
Do not skip the constructor for an object that will be used. This is undefined behavior. Furthermore, the less trivial the constructor, the more likely that something will go wildly wrong if you skip it. That is, as you save more, you break more. Skipping the constructor for a used object is not a way to be more efficient — it is a way to write broken code. Inefficient, correct code trumps efficient code that does not work.
One bit of discouragement: this sort of low-level management can become a big investment of time. Only go this route if there is a realistic chance of a performance payback. Do not complicate your code with optimizations simply for the sake of optimizing. Also consider simpler alternatives that might get similar results with less code overhead. Perhaps a constructor that performs no initializations other than somehow flagging the object as not initialized? (Details and feasibility depend on the class involved, hence extend outside the scope of this question.)
One bit of encouragement: If you think about the standard library, you should realize that your goal is achievable. I would present vector::reserve as an example of something that can allocate memory without initializing it.
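A rough illustration of that reserve behaviour, reusing the test struct from the question (the function name is just for illustration): no constructors run until elements are actually inserted.
#include <vector>

void reserve_demo()
{
    std::vector<test> v;
    v.reserve(1000);      // allocates raw storage for 1000 elements,
                          // but constructs none of them: no "Init" is printed
    v.push_back(test());  // only now is a single element constructed
}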
You currently have UB, as you access a field of a non-existent object.
You might leave the field uninitialized by making the constructor a no-op; the compiler can then easily skip the initialization, for example:
struct test
{
    int num; // no "= 3" default initializer
    test() { std::cout << "Init\n"; } // num is not initialized
    ~test() { std::cout << "Destroyed: " << num << "\n"; }
};
For readability, you should probably wrap it in a dedicated class, something like:
struct uninitialized_tag {};

struct uninitializable_int
{
    uninitializable_int(uninitialized_tag) {} // no initialization
    uninitializable_int(int num) : num(num) {}
    int num;
};

Exception Safety: Benefits of a nothrow swap

If a type has a swap function which cannot fail, this can make it easier for other functions to provide the strong exception safety guarantee. This is because we can first do all of the function's work which may fail "off to the side", and then commit the work using non-throwing swaps.
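For concreteness, the pattern being described is the familiar copy-and-swap assignment; a sketch, using a hypothetical Widget class:
#include <utility> // std::swap

class Widget
{
    int* data; // some owned resource; hypothetical member
public:
    Widget(const Widget& other);   // copies the resource; may throw
    ~Widget();                     // releases the resource
    Widget& operator=(const Widget& other)
    {
        Widget tmp(other);   // everything that might fail happens here, "off to the side"
        swap(tmp);           // commit the result with a swap that cannot fail
        return *this;
    }                        // tmp's destructor releases this object's old state
    void swap(Widget& other) // no-throw
    {
        std::swap(data, other.data);
    }
};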
However, are there any other benefits of guaranteeing that swap will never fail?
For example, is there a situation in which the presence of a no-fail swap makes it easier for another function to provide the basic guarantee?
Let's say I do this:
class C {
    T a, b; // invariant: a > b
    void swap(C& other);
};
It seems that there's no way to implement C::swap() with basic guarantee if T::swap(T&) might throw. I'd have to add a level of indirection and store T* instead of T.
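A sketch of that extra level of indirection (hypothetical; construction, ownership and error handling omitted, T as in the example above):
#include <algorithm> // std::swap

class C {
    T* a; // heap-allocated so that swapping is only a pointer exchange
    T* b; // invariant: *a > *b
public:
    void swap(C& other) // cannot fail
    {
        std::swap(a, other.a); // swapping raw pointers never throws, so the
        std::swap(b, other.b); // invariant is never seen in a half-swapped state
    }
};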

Is it possible to adapt unique_ptr for use in plain c?

Is it possible to adapt unique_ptr for plain c?
Perhaps if there is a way of simulating calls to a home made "constructor/destructor" when calling malloc/free?
Is it doable? Or is it just a silly idea?
The whole point of a "smart pointer" is to do certain tasks automatically on destruction. Since C doesn't have destructors there's no way to accomplish this except with explicit function calls - but that's how you deallocate memory in C already.
You could possibly make a list of pointers that need to be freed and do them all at the same time with a single function call.
No, not in plain C. You can do something similar with the cleanup GCC attribute (it is not standard):
#include <stdio.h>

void scope_leaving(int* p)
{
    printf("Leaving scope.\n");
    // this is essentially your "destructor"
}

int main(int argc, char* argv[])
{
    printf("Before x is declared.\n");
    {
        int x __attribute__((cleanup (scope_leaving)));
        x = 42;
    }
    printf("Scope was left.\n");
}
As you would expect, the output is this:
Before x is declared.
Leaving scope.
Scope was left.
With this, you can implement a pointer that emulates the RAII semantics of unique_ptr, maybe using a macro for easier declaration. There is more to unique_ptr than that, but I'm not sure from your question whether you want other aspects besides RAII.
On second reading, there's nothing that prevents your C structures from starting with a void (*)() pointer. If you have a custom_malloc(size_t size, void (*deleter)()) which sets that pointer, your custom_free(void*) can subsequently call the deleter. This is similar to a virtual destructor in C++. However, the second part of std::unique_ptr is a delete'd copy constructor. You can't do that in C.
In order to implement unique_ptr, you need two things:
1. an assignment operator which transfers ownership;
2. a destructor for when the unique_ptr goes out of scope.
In order to implement (1), since C does not allow you to have function overloading, you need to "discipline" yourself and always call a custom assignment function or macro. C cannot force you to always have ownership transferred.
As for (2), GCC provides an extension: you are allowed to attach a "cleanup" function, which is called once a variable goes out of scope. This can be used to implement a destructor for your unique_ptr.

Making swap faster, easier to use and exception-safe

I could not sleep last night and started thinking about std::swap. Here is the familiar C++98 version:
template <typename T>
void swap(T& a, T& b)
{
    T c(a);
    a = b;
    b = c;
}
If a user-defined class Foo uses external resources, this is inefficient. The common idiom is to provide a method void Foo::swap(Foo& other) and a specialization of std::swap<Foo>. Note that this does not work with class templates, since you cannot partially specialize a function template, and overloading names in the std namespace is illegal. The solution is to write a template function in one's own namespace and rely on argument-dependent lookup to find it. This depends critically on the client following the "using std::swap" idiom instead of calling std::swap directly. Very brittle.
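For reference, the idiom being relied upon looks like this (a sketch; some_algorithm is just a placeholder name):
#include <utility> // std::swap

template <typename T>
void some_algorithm(T& a, T& b)
{
    using std::swap; // make std::swap visible as the fallback
    swap(a, b);      // unqualified call: ADL finds a Foo-specific swap in Foo's
                     // namespace if one exists, otherwise std::swap is used
}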
In C++0x, if Foo has a user-defined move constructor and a move assignment operator, providing a custom swap method and a std::swap<Foo> specialization has little to no performance benefit, because the C++0x version of std::swap uses efficient moves instead of copies:
#include <utility>

template <typename T>
void swap(T& a, T& b)
{
    T c(std::move(a));
    a = std::move(b);
    b = std::move(c);
}
Not having to fiddle with swap anymore already takes a lot of burden away from the programmer.
Current compilers do not generate move constructors and move assignment operators automatically yet, but as far as I know, this will change. The only problem left then is exception-safety, because in general, move operations are allowed to throw, and this opens up a whole can of worms. The question "What exactly is the state of a moved-from object?" complicates things further.
Then I was thinking, what exactly are the semantics of std::swap in C++0x if everything goes fine? What is the state of the objects before and after the swap? Typically, swapping via move operations does not touch external resources, only the "flat" object representations themselves.
So why not simply write a swap template that does exactly that: swap the object representations?
#include <cstring>

template <typename T>
void swap(T& a, T& b)
{
    unsigned char c[sizeof(T)];
    std::memcpy( c, &a, sizeof(T));
    std::memcpy(&a, &b, sizeof(T));
    std::memcpy(&b, c, sizeof(T));
}
This is as efficient as it gets: it simply blasts through raw memory. It does not require any intervention from the user: no special swap methods or move operations have to be defined. This means that it even works in C++98 (which does not have rvalue references, mind you). But even more importantly, we can now forget about the exception-safety issues, because memcpy never throws.
I can see two potential problems with this approach:
First, not all objects are meant to be swapped. If a class designer hides the copy constructor or the copy assignment operator, trying to swap objects of the class should fail at compile-time. We can simply introduce some dead code that checks whether copying and assignment are legal on the type:
template <typename T>
void swap(T& a, T& b)
{
    if (false) // dead code, never executed
    {
        T c(a); // copy-constructible?
        a = b;  // assignable?
    }
    unsigned char c[sizeof(T)];
    std::memcpy( c, &a, sizeof(T));
    std::memcpy(&a, &b, sizeof(T));
    std::memcpy(&b, c, sizeof(T));
}
Any decent compiler can trivially get rid of the dead code. (There are probably better ways to check the "swap conformance", but that is not the point. What matters is that it's possible).
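For example, one such alternative, assuming the C++11 <type_traits> facilities, would be to state the requirement directly instead of relying on dead code:
#include <type_traits>

template <typename T>
void swap(T& a, T& b)
{
    static_assert(std::is_copy_constructible<T>::value, "T must be copy-constructible");
    static_assert(std::is_copy_assignable<T>::value,    "T must be copy-assignable");
    // ... memcpy-based implementation as above ...
}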
Second, some types might perform "unusual" actions in the copy constructor and copy assignment operator. For example, they might notify observers of their change. I deem this a minor issue, because such kinds of objects probably should not have provided copy operations in the first place.
Please let me know what you think of this approach to swapping. Would it work in practice? Would you use it? Can you identify library types where this would break? Do you see additional problems? Discuss!
So why not simply write a swap template that does exactly that: swap the object representations?
There's many ways in which an object, once being constructed, can break when you copy the bytes it resides in. In fact, one could come up with a seemingly endless number of cases where this would not do the right thing - even though in practice it might work in 98% of all cases.
That's because the underlying problem to all this is that, other than in C, in C++ we must not treat objects as if they are mere raw bytes. That's why we have construction and destruction, after all: to turn raw storage into objects and objects back into raw storage. Once a constructor has run, the memory where the object resides is more than only raw storage. If you treat it as if it weren't, you will break some types.
However, essentially, moving objects shouldn't perform that much worse than your idea, because, once you start to recursively inline the calls to std::move(), you usually ultimately arrive at where built-ins are moved. (And if there's more to moving for some types, you'd better not fiddle with the memory of those yourself!) Granted, moving memory en bloc is usually faster than single moves (and it's unlikely that a compiler might find out that it could optimize the individual moves to one all-encompassing std::memcpy()), but that's the price we pay for the abstraction opaque objects offer us. And it's quite small, especially when you compare it to the copying we used to do.
You could, however, have an optimized swap() using std::memcpy() for aggregate types.
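Something along these lines, as a sketch (fast_swap is a hypothetical name; std::is_trivially_copyable is the C++11 spelling of roughly that constraint, and tag dispatch ensures only the chosen branch is instantiated):
#include <cstring>
#include <type_traits>
#include <utility>

template <typename T>
void fast_swap_impl(T& a, T& b, std::true_type /* trivially copyable */)
{
    unsigned char c[sizeof(T)];
    std::memcpy( c, &a, sizeof(T));
    std::memcpy(&a, &b, sizeof(T));
    std::memcpy(&b, c, sizeof(T));
}

template <typename T>
void fast_swap_impl(T& a, T& b, std::false_type)
{
    std::swap(a, b); // fall back to the normal move-based swap
}

template <typename T>
void fast_swap(T& a, T& b)
{
    fast_swap_impl(a, b, std::is_trivially_copyable<T>());
}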
This will break class instances that have pointers to their own members. For example:
class SomeClassWithBuffer {
private:
    enum {
        BUFSIZE = 4096,
    };
    char buffer[BUFSIZE];
    char *currentPos; // meant to point to the current position in the buffer
public:
    SomeClassWithBuffer();
    SomeClassWithBuffer(const SomeClassWithBuffer &that);
};

SomeClassWithBuffer::SomeClassWithBuffer():
    currentPos(buffer)
{
}

SomeClassWithBuffer::SomeClassWithBuffer(const SomeClassWithBuffer &that)
{
    memcpy(buffer, that.buffer, BUFSIZE);
    currentPos = buffer + (that.currentPos - that.buffer);
}
Now, if you just do memcpy(), where would currentPos point? To the old location, obviously. This will lead to very funny bugs where each instance actually uses another's buffer.
Some types can be swapped but cannot be copied. Unique smart pointers are probably the best example. Checking for copyability and assignability is wrong.
If T isn't a POD type, using memcpy to copy/move is undefined behavior.
The common idiom is to provide a method void Foo::swap(Foo& other) and a specialization of std::swap<Foo>. Note that this does not work with class templates, …
A better idiom is a non-member swap and requiring users to call swap unqualified, so ADL applies. This also works with templates:
#include <utility> // std::swap

struct NonTemplate {};
void swap(NonTemplate&, NonTemplate&);

template<class T>
struct Template {
    T each, data, member; // stand-ins for the class's actual data members
    friend void swap(Template &a, Template &b) {
        using std::swap;
#define S(N) swap(a.N, b.N);
        S(each)
        S(data)
        S(member)
#undef S
    }
};
The key is the using declaration for std::swap as a fallback. The friendship for Template's swap is nice for simplifying the definition; the swap for NonTemplate might also be a friend, but that's an implementation detail.
I deem this a minor issue, because such kinds of objects probably should not have provided copy operations in the first place.
That is, quite simply, a load of wrong. Classes that notify observers and classes that shouldn't be copied are completely unrelated. How about shared_ptr? It obviously should be copyable, but it also obviously notifies an observer: the reference count. Now, it's true that in this case the reference count is the same after the swap, but that's definitely not true for all types, and it's especially not true if multi-threading is involved, or in the case of a regular copy instead of a swap, etc. This is especially wrong for classes that can be moved or swapped but not copied.
because in general, move operations are allowed to throw
They are most assuredly not. It is virtually impossible to guarantee strong exception safety in pretty much any circumstance involving moves when the move might throw. The C++0x definition of the Standard library, from memory, explicitly states any type usable in any Standard container must not throw when moving.
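For context, what C++11 eventually standardized is slightly weaker: containers such as vector use std::move_if_noexcept, so elements are moved during reallocation only if their move constructor is declared noexcept, and are copied otherwise, preserving the strong guarantee. A sketch of how that distinction is detected (Safe and Risky are hypothetical types):
#include <type_traits>

struct Safe
{
    Safe() {}
    Safe(const Safe&);
    Safe(Safe&&) noexcept;   // declared non-throwing: vector will move it
};

struct Risky
{
    Risky() {}
    Risky(const Risky&);
    Risky(Risky&&);          // might throw: vector falls back to copying
};

static_assert(std::is_nothrow_move_constructible<Safe>::value,
              "Safe is moved on reallocation");
static_assert(!std::is_nothrow_move_constructible<Risky>::value,
              "Risky would be copied instead");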
This is as efficient as it gets
That is also wrong. You're assuming that the move of any object is purely its member variables, but it might not be all of them. I might have an implementation-based cache, and I might decide that within my class I should not move this cache. As an implementation detail, it is entirely within my rights not to move any member variables that I deem unnecessary to move. You, however, want to move all of them.
Now, it's true that your sample code should be valid for a lot of classes. However, it is definitely not valid for many classes that are completely legitimate, and more importantly, the move-based swap is going to compile down to that operation anyway whenever it can be reduced to that. This is breaking perfectly good classes for absolutely no benefit.
Your swap version will cause havoc if someone uses it with polymorphic types.
Consider:
Base *b_ptr = new Base(); // Base and Derived contain definitions
Base *d_ptr = new Derived(); // of a virtual function called vfunc()
yourmemcpyswap( *b_ptr, *d_ptr );
b_ptr->vfunc(); //now calls Derived::vfunc, while it should call Base::vfunc
d_ptr->vfunc(); //now calls Base::vfunc while it should call Derived::vfunc
//...
This is wrong, because now *b_ptr contains the vtable pointer of the Derived type, so Derived::vfunc is invoked on an object which isn't of type Derived.
The normal std::swap only swaps the data members of Base, so this is OK with std::swap

Idiomatic use of auto_ptr to transfer ownership to a container

I'm refreshing my C++ knowledge after not having used it in anger for a number of years. In writing some code to implement some data structure for practice, I wanted to make sure that my code was exception safe. So I've tried to use std::auto_ptrs in what I think is an appropriate way. Simplifying somewhat, this is what I have:
class Tree
{
public:
    ~Tree() { /* delete all Node*s in the tree */ }
    void insert(const string& to_insert);
    ...
private:
    struct Node {
        ...
        vector<Node*> m_children;
    };
    Node* m_root;
};

template<typename T>
void push_back(vector<T*>& v, auto_ptr<T> x)
{
    v.push_back(x.get());
    x.release();
}

void Tree::insert(const string& to_insert)
{
    Node* n = ...; // find where to insert the new node
    ...
    push_back(n->m_children, auto_ptr<Node>(new Node(to_insert)));
    ...
}
So I'm wrapping the function that would put the pointer into the container, vector::push_back, and relying on the by-value auto_ptr argument to
ensure that the Node* is deleted if the vector resize fails.
Is this an idiomatic use of auto_ptr to save a bit of boilerplate in my
Tree::insert? Any improvements you can suggest? Otherwise I'd have to have
something like:
Node* n = ...; // find where to insert the new node
auto_ptr<Node> new_node(new Node(to_insert));
n->m_children.push_back(new_node.get());
new_node.release();
which kind of clutters up what would have been a single line of code if I wasn't
worrying about exception safety and a memory leak.
(Actually I was wondering if I could post my whole code sample (about 300 lines) and ask people to critique it for idiomatic C++ usage in general, but I'm not sure whether that kind of question is appropriate on stackoverflow.)
It is not idiomatic to write your own container: it is rather exceptional, and for the most part useful only for learning how to write containers. At any rate, it is most certainly not idiomatic to use std::auto_ptr with standard containers. In fact, it's wrong, because copies of std::auto_ptr aren't equivalent: only one auto_ptr owns a pointee at any given time.
As for idiomatic use of std::auto_ptr, you should always name your auto_ptr on construction:
int wtv() { /* ... */ }
void trp(std::auto_ptr<int> p, int i) { /* ... */ }

void safe() {
    std::auto_ptr<int> p(new int(12));
    trp(p, wtv());
}

void danger() {
    trp(std::auto_ptr<int>(new int(12)), wtv());
}
Because the C++ standard allows function arguments to be evaluated in any order, the call to trp() in danger() is unsafe: the compiler may allocate the integer, then create the auto_ptr, and finally call wtv(); or it may allocate the integer, call wtv(), and only then create the auto_ptr. If wtv() throws an exception, danger() may or may not leak.
In the case of safe(), however, because the auto_ptr is constructed beforehand, RAII guarantees it will clean up properly whether or not wtv() throws an exception.
Yes it is.
For example, see the interface of the Boost.Pointer Container library. The various pointer containers all feature an insert function taking an auto_ptr which semantically guarantees that they take ownership (they also have the raw pointer version but hey :p).
There are, however, other ways to achieve what you're doing with regard to exception safety, because it's only internal here. To understand them, you need to understand what could go wrong (i.e., throw) and then reorder your instructions so that the operations that may throw are done before the side effects occur.
For example, taking from your post:
auto_ptr<Node> new_node(new Node(to_insert)); // 1
n->m_children.push_back(new_node.get()); // 2
new_node.release(); // 3
Let's check each line:
1. The constructor may throw (for example if the copy constructor of the type throws); in that case, however, you are guaranteed that new will perform the cleanup for you.
2. The call to push_back may throw a std::bad_alloc if memory is exhausted. That's the only possible error, as copying a pointer is a no-throw operation.
3. This is guaranteed not to throw.
If you look closely, you'll notice that you would not have to worry if you could somehow have 2 executed before 1. It is in fact possible:
n->m_children.reserve(n->m_children.size() + 1);
n->m_children.push_back(new Node(to_insert));
The call to reserve may throw (bad_alloc), but if it completes normally you are then guaranteed that no reallocation will occur until size becomes equal to capacity and you try another insertion.
The call to new may fail if the constructor throws, in which case new will perform the cleanup; if it completes, you're left with a pointer that is immediately inserted into the vector, which is guaranteed not to throw because of the line above.
Thus, the use of auto_ptr can be replaced here. It was nonetheless a good idea, though as has been noted you should refrain from performing the RAII initialization within a function call's argument list.
I like the idea of declaring ownership of a pointer. This is one of the great features in C++0x: std::unique_ptr. However, std::auto_ptr is so hard to understand, and so lethal if even slightly misused, that I would suggest avoiding it entirely.
http://www.gotw.ca/publications/using_auto_ptr_effectively.htm
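For comparison, here is a rough sketch of what the insert becomes with std::unique_ptr, assuming m_children is changed to a vector<unique_ptr<Node>> (names follow the question; make_unique is C++14):
#include <memory>
#include <string>
#include <vector>

void Tree::insert(const std::string& to_insert)
{
    Node* n = ...; // find where to insert the new node, as before
    // If push_back throws while growing the vector, the temporary unique_ptr
    // still owns the new node and deletes it: no leak and no manual release().
    n->m_children.push_back(std::make_unique<Node>(to_insert));
}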