I'm currently writing an own string implementation in C++. (Just for exercise).
However, I currently have this copy-constructor:
// "obj" has the same type of *this, it's just another string object
string_base<T>(const string_base<T> &obj)
: len(obj.length()), cap(obj.capacity()) {
raw_data = new T[cap];
for (unsigned i = 0; i < cap; i++)
raw_data[i] = obj.data()[i];
raw_data[len] = 0x00;
}
and I wanted to increase performance a little bit. So I came on the idea using memcpy() to just copy obj into *this.
Just like that:
// "obj" has the same type of *this, it's just another string object
string_base<T>(const string_base<T> &obj) {
memcpy(this, &obj, sizeof(string_base<T>));
}
Is it safe to overwrite the data of *this like that? Or may this produce any problems?
Thanks in advance!
No, it's not safe. From cppreference.com:
If the objects are not TriviallyCopyable, the behavior of memcpy is not specified and may be undefined.
Your class is not TriviallyCopyable, since its copy constructor is user-provided.
Moreover, your copy constructor would make only shallow copies (which might be fine if you wanted, e.g., copy-on-write mechanism applied with your strings).
It will produce problems. All references and pointers will be just copied, even the pointer to raw_data, which will be the same as the source object.
As a requisite in order to use memcpy, your class should:
Be Trivially Copyable
Have no references or pointers unless: static pointers or not owning the pointed data. Unless you know what you are doing such as implementing a copy-on-write mechanism or when this behaviour is otherwise intended and managed.
As others have said, in order for memcpy to work correctly, the the objects being copied must be trivially copyable. For an arbitrary type like T in a template, you can't be sure of that. Sure, you can check for it, but it's much easier to let somebody else do the checking. Instead of writing that loop and tweaking it, use std::copy_n. It will use memcpy when that's appropriate, and element-by-element copying when it isn't. So change
raw_data = new T[cap];
for (unsigned i = 0; i < cap; i++)
raw_data[i] = obj.data()[i];
to
raw_data = new T[cap];
std::copy_n(obj.data(), cap, raw_data);
This also has the slight advantage of not evaluating obj.data() on every pass through the loop, which is an optimization that your compiler might or might not apply.
It seems pretty obvious from the limited fragment that raw_data is a member, and a pointer to a new[]'ed array of T. If you memcpy the object, you memcpy this pointer. You don't copy the array.
Look at your destructor. It probably calls delete[], unconditionally. It has no idea how many copies exist. That means it's calling delete[] too often. This is fixable: you'd need something similar to shared_ptr. That's not at all trivial; you have to worry about thread safety of that share count. And obviously you can't just memcpy the object, as that would not update the share count.
Related
The reason for me to ask this is I need to store std::function in a vector, and the in-house vector we have in company basically is doing realloc if it needs more memory. (Basically just memcpy, no copy/move operator involves)
This means all the element we can put in our container need to be trivially-copyable.
Here is some code to demonstrate the problematic copy I had:
void* func1Buffer = malloc(sizeof(std::function<void(int)>));
std::function<void(int)>* func1p = new (func1Buffer) std::function<void(int)>();
std::function<void(int)>* func2p = nullptr;
*func1p = [](int) {};
char func2Buffer[sizeof(*func1p)];
memcpy(&func2Buffer, func1p, sizeof(*func1p));
func2p = (std::function<void(int)>*)(func2Buffer);
// func2p is still valid here
(*func2p)(10);
free(func1Buffer);
// func2p is now invalid, even without std::function<void(int)> desctructor get triggered
(*func2p)(10);
I understand we should support copy/move of the element in order to store std::function safely.
But I am still very curious about what is the direct cause of invalid std::function copy above.
----------------------------------------------------UpdateLine----------------------------------------------------
Updated the code sample.
I have found the direct reason for this failure, by debugging our in-house vector more.
The trivially copied std::function has some dependency on original object memory, delete the original memory will trash the badly copied std::function even without the destruction of the original object.
Thanks for everyone's answer to this post. It's all valuable input. :)
The problem is how std::function has to be implemented: it has to manage the lifetime of whatever object it's holding onto. So when you write:
{
std::function<Sig> f = X{};
}
we must invoke the destructor of X when f goes out of scope. Moreover, std::function will [potentially] allocate memory to hold that X so the destructor of f must also [potentially] free that memory.
Now consider what happens when we try to do:
char buffer[100000]; // something big
{
std::function<void()> f = X{};
memcpy(buffer, &f, sizeof(f));
}
(*reinterpret_cast<std::function<void()>*>(buffer))();
At the point we're calling the function "stored" at buffer, the X object has already been destroyed and the memory holding it has been [potentially] freed. Regardless of whether X were TriviallyCopyable, we don't have an X anymore. We have the artist formerly known as an X.
Because it's incumbent upon std::function to manage its own objects, it cannot be TriviallyCopyable even if we added the requirement that all callables it managed were TriviallyCopyable.
To work in your realloc_vector, you need either need something like function_ref (or std::function<>*) (that is, a type that simply doesn't own any resources), or you need to implement your own version of function that (a) keeps its own storage as a member to avoid allocating memory and (b) is only constructible with TriviallyCopyable callables so that it itself becomes trivially copyable. Whichever solution is better depends on the what your program is actually doing.
But I am still very curious about what is the direct cause of invalid
std::function copy above.
std::function cannot be TriviallyCopyable (or conditionally TriviallyCopyable) because as a generic callable object wrapper it cannot assume that the stored callable is TriviallyCopyable.
Consider implementing your own version of std::function that only supports TriviallyCopyable callable objects (using a fixed buffer for storage), or use a vector of function pointers if applicable in your situation.
To be trivially copyable is something that is inherently related to a given type, not to an object.
Consider the following example:
#include<type_traits>
#include<functional>
int main() {
auto l = [](){};
static_assert(not std::is_trivially_copyable<decltype(l)>::value, "!");
std::function<void(void)> f;
bool copyable = std::is_trivially_copyable<decltype(f)>::value;
f = l;
// do something based on the
// fact that f is trivially copyable
}
How could you enforce the property once you have assigned to the function the lambda, that is not trivially copyable?
What you are looking for would be a runtime machinery that gets a decision based on the actual object assigned to the function.
This is not how std::is_trivially_copyable works.
Therefore the compiler has to make a decision at compile-time regarding the given specialization for the std::function. For it's a generic container for callable objects and you can assign it trivially copyable objects as well as objects that aren't trivially copyable, the rest goes without saying.
A std::function might allocate memory for captured variables. As with any other class which allocates memory, it's not trivially copyable.
So, I have been working on my assignment for quite a while but stumbled upon a problem that I couldn't solve by myself.
My task was to create a class CompositeShape which would be capable of holding shared_ptr pointers of a base class Shape in a unique_ptr array so that we would then have an array of derived from Shape classes. The CompositeShape by itself is derived from Shape as well.
So, even though the code below works fine and even provides a strong exception guarantee, my teacher says that this particular function can be optimized.
void kosnitskiy::CompositeShape::add(const std::shared_ptr<Shape> &src)
{
if (src == nullptr)
{
throw std::invalid_argument("Attempt to add an empty pointer exception");
}
std::unique_ptr<std::shared_ptr<Shape>[]> shapes(new std::shared_ptr<Shape>[count_ + 1]);
for (int i = 0; i < count_; i++)
{
shapes[i] = std::move(shapes_[i]);
}
shapes[count_] = src;
shapes_ = std::move(shapes);
count_ += 1;
}
My first reaction was to change the array expanding algorithm to something similar to a vector one so that we wouldn't be forced to create a new array every time a new element is being added, but the teacher said, that despite the fact it's a quite good idea, he talks about the different type of improvement. The one, which wouldn't change the class design. So I assume there is a flaw somewhere in a function itself. I have already changed the assignment construction, used in a for loop, from shapes[i] = shapes_[i] to a one using the std::move instead, since I figured that move assignment operator would be way more efficient than a copy assignment one, but I'm pretty much out of ideas now.
I'm not allowed to make any class design changes. I didn't use vector because it was specified by the teacher that we can't use any standard containers. I didn't use weak_ptr for the same reason as well: we were told only to use unique_ptr and shared_ptr pointers. Non-smart pointers are blocked as well
Thank you very much for your help in advance
My guess is, that teacher's remark might about passing src by value and moving it into the array.
Another option would be to use std::move with std::begin/end instead of a raw for loop.
But both of those seem to be micro-optimizations with the array growing by 1 and being reallocated with each addition and are out of context.
The reason for me to ask this is I need to store std::function in a vector, and the in-house vector we have in company basically is doing realloc if it needs more memory. (Basically just memcpy, no copy/move operator involves)
This means all the element we can put in our container need to be trivially-copyable.
Here is some code to demonstrate the problematic copy I had:
void* func1Buffer = malloc(sizeof(std::function<void(int)>));
std::function<void(int)>* func1p = new (func1Buffer) std::function<void(int)>();
std::function<void(int)>* func2p = nullptr;
*func1p = [](int) {};
char func2Buffer[sizeof(*func1p)];
memcpy(&func2Buffer, func1p, sizeof(*func1p));
func2p = (std::function<void(int)>*)(func2Buffer);
// func2p is still valid here
(*func2p)(10);
free(func1Buffer);
// func2p is now invalid, even without std::function<void(int)> desctructor get triggered
(*func2p)(10);
I understand we should support copy/move of the element in order to store std::function safely.
But I am still very curious about what is the direct cause of invalid std::function copy above.
----------------------------------------------------UpdateLine----------------------------------------------------
Updated the code sample.
I have found the direct reason for this failure, by debugging our in-house vector more.
The trivially copied std::function has some dependency on original object memory, delete the original memory will trash the badly copied std::function even without the destruction of the original object.
Thanks for everyone's answer to this post. It's all valuable input. :)
The problem is how std::function has to be implemented: it has to manage the lifetime of whatever object it's holding onto. So when you write:
{
std::function<Sig> f = X{};
}
we must invoke the destructor of X when f goes out of scope. Moreover, std::function will [potentially] allocate memory to hold that X so the destructor of f must also [potentially] free that memory.
Now consider what happens when we try to do:
char buffer[100000]; // something big
{
std::function<void()> f = X{};
memcpy(buffer, &f, sizeof(f));
}
(*reinterpret_cast<std::function<void()>*>(buffer))();
At the point we're calling the function "stored" at buffer, the X object has already been destroyed and the memory holding it has been [potentially] freed. Regardless of whether X were TriviallyCopyable, we don't have an X anymore. We have the artist formerly known as an X.
Because it's incumbent upon std::function to manage its own objects, it cannot be TriviallyCopyable even if we added the requirement that all callables it managed were TriviallyCopyable.
To work in your realloc_vector, you need either need something like function_ref (or std::function<>*) (that is, a type that simply doesn't own any resources), or you need to implement your own version of function that (a) keeps its own storage as a member to avoid allocating memory and (b) is only constructible with TriviallyCopyable callables so that it itself becomes trivially copyable. Whichever solution is better depends on the what your program is actually doing.
But I am still very curious about what is the direct cause of invalid
std::function copy above.
std::function cannot be TriviallyCopyable (or conditionally TriviallyCopyable) because as a generic callable object wrapper it cannot assume that the stored callable is TriviallyCopyable.
Consider implementing your own version of std::function that only supports TriviallyCopyable callable objects (using a fixed buffer for storage), or use a vector of function pointers if applicable in your situation.
To be trivially copyable is something that is inherently related to a given type, not to an object.
Consider the following example:
#include<type_traits>
#include<functional>
int main() {
auto l = [](){};
static_assert(not std::is_trivially_copyable<decltype(l)>::value, "!");
std::function<void(void)> f;
bool copyable = std::is_trivially_copyable<decltype(f)>::value;
f = l;
// do something based on the
// fact that f is trivially copyable
}
How could you enforce the property once you have assigned to the function the lambda, that is not trivially copyable?
What you are looking for would be a runtime machinery that gets a decision based on the actual object assigned to the function.
This is not how std::is_trivially_copyable works.
Therefore the compiler has to make a decision at compile-time regarding the given specialization for the std::function. For it's a generic container for callable objects and you can assign it trivially copyable objects as well as objects that aren't trivially copyable, the rest goes without saying.
A std::function might allocate memory for captured variables. As with any other class which allocates memory, it's not trivially copyable.
Suppose I have a class which manages a pointer to an internal buffer:
class Foo
{
public:
Foo();
...
private:
std::vector<unsigned char> m_buffer;
unsigned char* m_pointer;
};
Foo::Foo()
{
m_buffer.resize(100);
m_pointer = &m_buffer[0];
}
Now, suppose I also have correctly implemented rule-of-3 stuff including a copy constructor which copies the internal buffer, and then reassigns the pointer to the new copy of the internal buffer:
Foo::Foo(const Foo& f)
{
m_buffer = f.m_buffer;
m_pointer = &m_buffer[0];
}
If I also implement move semantics, is it safe to just copy the pointer and move the buffer?
Foo::Foo(Foo&& f) : m_buffer(std::move(f.m_buffer)), m_pointer(f.m_pointer)
{ }
In practice, I know this should work, because the std::vector move constructor is just moving the internal pointer - it's not actually reallocating anything so m_pointer still points to a valid address. However, I'm not sure if the standard guarantees this behavior. Does std::vector move semantics guarantee that no reallocation will occur, and thus all pointers/iterators to the vector are valid?
I'd do &m_buffer[0] again, simply so that you don't have to ask these questions. It's clearly not obviously intuitive, so don't do it. And, in doing so, you have nothing to lose whatsoever. Win-win.
Foo::Foo(Foo&& f)
: m_buffer(std::move(f.m_buffer))
, m_pointer(&m_buffer[0])
{}
I'm comfortable with it mostly because m_pointer is a view into the member m_buffer, rather than strictly a member in its own right.
Which does all sort of beg the question... why is it there? Can't you expose a member function to give you &m_buffer[0]?
I'll not comment the OP's code. All I'm doing is aswering this question:
Does std::vector move semantics guarantee that no reallocation will occur, and thus all pointers/iterators to the vector are valid?
Yes for the move constructor. It has constant complexity (as specified by 23.2.1/4, table 96 and note B) and for this reason the implementation has no choice other than stealing the memory from the original vector (so no memory reallocation occurs) and emptying the original vector.
No for the move assignment operator. The standard requires only linear complexity (as specified in the same paragraph and table mentioned above) because sometimes a reallocation is required. However, in some cirsunstances, it might have constant complexity (and no reallocation is performed) but it depends on the allocator. (You can read the excelent exposition on moved vectors by Howard Hinnant here.)
A better way to do this may be:
class Foo
{
std::vector<unsigned char> m_buffer;
size_t m_index;
unsigned char* get_pointer() { return &m_buffer[m_index];
};
ie rather than store a pointer to a vector element, store the index of it. That way it will be immune to copying/resizing of the vectors backing store.
The case of move construction is guaranteed to move the buffer from one container to the other, so from the point of view of the newly created object, the operation is fine.
On the other hand, you should be careful with this kind of code, as the donor object is left with a empty vector and a pointer referring to the vector in a different object. This means that after being moved from your object is in a fragile state that might cause issues if anyone accesses the interface and even more importantly if the destructor tries to use the pointer.
While in general there won't be any use of your object after being moved from (the assumption being that to be bound by an rvalue-reference it must be an rvalue), the fact is that you can move out of an lvalue by casting or by using std::move (which is basically a cast), in which case code might actually attempt to use your object.
I am writing my own string class for really just for learning and cementing some knowledge. I have everything working except I want to have a constructor that uses move semantics with an std::string.
Within my constructor I need to copy and null out the std::string data pointers and other things, it needs to be left in an empty but valid state, without deleting the data the string points to, how do I do this?
So far I have this
class String
{
private:
char* mpData;
unsigned int mLength;
public:
String( std::string&& str)
:mpData(nullptr), mLength(0)
{
// need to copy the memory pointer from std::string to this->mpData
// need to null out the std::string memory pointer
//str.clear(); // can't use clear because it deletes the memory
}
~String()
{
delete[] mpData;
mLength = 0;
}
There is no way to do this. The implementation of std::string is implementation-defined. Every implementation is different.
Further, there is no guarantee that the string will be contained in a dynamically allocated array. Some std::string implementations perform a small string optimization, where small strings are stored inside of the std::string object itself.
The below implementation accomplishes what was requested, but at some risk.
Notes about this approach:
It uses std::string to manage the allocated memory. In my view, layering the allocation like this is a good idea because it reduces the number of things that a single class is trying to accomplish (but due to the use of a pointer, this class still has potential bugs associated with compiler-generated copy operations).
I did away with the delete operation since that is now performed automatically by the allocation object.
It will invoke so-called undefined behavior if mpData is used to modify the underlying data. It is undefined, as indicated here, because the standard says it is undefined. I wonder, though, if there are real-world implementations for which const char * std::string::data() behaves differently than T * std::vector::data() -- through which such modifications would be perfectly legal. It may be possible that modifications via data() would not be reflected in subsequent accesses to allocation, but based on the discussion in this question, it seems very unlikely that such modifications would result in unpredictable behavior assuming that no further changes are made via the allocation object.
Is it truly optimized for move semantics? That may be implementation defined. It may also depend on the actual value of the incoming string. As I noted in my other answer, the move constructor provides a mechanism for optimization -- but it doesn't guarantee that an optimization will occur.
class String
{
private:
char* mpData;
unsigned int mLength;
std::string allocation;
public:
String( std::string&& str)
: mpData(const_cast<char*>(str.data())) // cast used to invoke UB
, mLength(str.length())
, allocation(std::move(str)) // this is where the magic happens
{}
};
I am interpreting the question as "can I make the move constructor result in correct behavior" and not "can I make the move constructor optimally fast".
If the question is strictly, "is there a portable way to steal the internal memory from std::string", then the answer is "no, because there is no 'transfer memory ownership' operation provided in the public API".
The following quote from this explanation of move semantics provides a good summary of "move constructors"...
C++0x introduces a new mechanism called "rvalue reference" which,
among other things, allows us to detect rvalue arguments via function
overloading. All we have to do is write a constructor with an rvalue
reference parameter. Inside that constructor we can do anything we
want with the source, as long as we leave it in some valid state.
Based on this description, it seems to me that you can implement the "move semantics" constructor (or "move constructor") without being obligated to actually steal the internal data buffers.
An example implementation:
String( std::string&& str)
:mpData(new char[str.length()]), mLength(str.length())
{
for ( int i=0; i<mLength; i++ ) mpData[i] = str[i];
}
As I understand it, the point of move semantics is that you can be more efficient if you want to. Since the incoming object is transient, its contents do not need to be preserved -- so it is legal to steal them, but it is not mandatory. Maybe, there is no point to implementing this if you aren't transferring ownership of some heap-based object, but it seems like it should be legal. Perhaps it is useful as a stepping stone -- you can steal as much as is useful, even if that isn't the entire contents.
By the way, there is a closely related question here in which the same kind of non-standard string is being built and includes a move constructor for std::string. The internals of the class are different however, and it is suggested that std::string may have built-in support for move semantics internally (std::string -> std::string).