I have an exam soon, and glancing through my notes, I see that the teacher defines a shallow copy as a bit by bit copy. I know about shallow and deep copies, yet I have no idea what "bit by bit copy" is supposed to mean. Isn't all computer data stored as bits? Could this definition imply that a shallow copy uses some kind of bitstream when copying the data? Does anybody know about this "bit by bit" terminology? Thanks
Say you have two variables MyObj a, b;. If a = b performs a shallow copy, then the bits in the variable a will now be the same as the bits in the variable b. In particular, if MyObj contains any pointers or references, they are the same in both a and b. The objects which are pointed or referred to are not copied.
Bit by bit copy / Shallow copy
Take for example a pointer pointing to a chunk of data:
int* my_ints = new int[1000];
my_ints points to the start of an area of memory which spans a thousand ints.
When you do
int* his_ints = my_ints;
the value of my_ints is copied to his_ints, i.e. the bits of my_ints are copied to his_ints. This means that his_ints points to the start of the same area of memory that my_ints points to. Therefore by doing
his_ints[0] = 42;
my_ints[0] will also be 42 because they both point to the same data. That is most probably what your professor is referring to as "bit by bit" copying, which is also commonly called a "shallow copy". This is mostly encountered when copying pointers and references (you can't technically copy references, but you can bind a reference to a variable bound to another reference).
Deep copy
Now, you may not want the bit by bit copy behavior, for example if you want a copy that you can modify without modifying the source. For this, you do a deep copy.
int* my_ints = new int[1000];
int* his_ints = new int[1000];
std::copy(my_ints, my_ints + 1000, his_ints);
std::copy there copies the ints in the area of memory pointed to by my_ints into the area of memory pointed to by his_ints. Now if you do
my_ints[0] = 42;
his_ints[0] = 90;
my_ints[0] and his_ints[0] will now have different values, as they now point to their respective and different areas of memory.
How does this matter in C++?
When you write a C++ class, you should define its constructors properly. The constructor relevant to this topic is the copy constructor (note: this also applies to the copy assignment operator). If you let the compiler generate the default copy constructor, it simply does a shallow copy of each member, including pointers that may point to areas of memory.
class Example {
public:
int* data;
Example() {
data = new int[1000];
}
Example(const Example&) = default; // Using C++11 default specifier
~Example() {
delete[] data;
}
};
// Example usage
{
Example x;
Example y = x; // Shallow copies x into y
assert(x.data == y.data);
// y's destructor will be called, doing delete[] data.
// x's destructor will be called, doing delete[] data;
// This is problematic, as you are doing delete[] twice on the same data.
}
To solve the problem, you must perform a deep copy of the data. For this to be done, you must define the copy constructor yourself.
Example(const Example& rhs) {
data = new int[1000];
std::copy(rhs.data, rhs.data + 1000, data); // copy from rhs into the new buffer
}
For more information and a better explanation, please see "What is The Rule of Three?".
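To tie the pieces together, here is a minimal sketch of the class with a copy constructor, copy assignment operator, and destructor all defined. The name DeepExample, the kSize constant, and the helper copies_are_independent are made up for this illustration, and the buffer size is assumed to be fixed at 1000 as in the snippets above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical variant of the Example class; kSize and the copy
// assignment operator are additions for illustration.
class DeepExample {
public:
    static constexpr std::size_t kSize = 1000;
    int* data;

    DeepExample() : data(new int[kSize]()) {}  // value-initialized to zeros

    // Copy constructor: allocate a fresh buffer, then copy FROM rhs INTO it.
    DeepExample(const DeepExample& rhs) : data(new int[kSize]) {
        std::copy(rhs.data, rhs.data + kSize, data);
    }

    // Copy assignment: both buffers have the same size, so reuse ours.
    DeepExample& operator=(const DeepExample& rhs) {
        if (this != &rhs) {
            std::copy(rhs.data, rhs.data + kSize, data);
        }
        return *this;
    }

    ~DeepExample() { delete[] data; }
};

// Returns true if a copy owns storage separate from its source.
bool copies_are_independent() {
    DeepExample x;
    x.data[0] = 42;
    DeepExample y = x;  // deep copy through the copy constructor
    y.data[0] = 90;     // must not affect x
    return x.data != y.data && x.data[0] == 42 && y.data[0] == 90;
}
```

Since each object now owns its own buffer, both destructors can run without deleting the same memory twice.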
I would define a bit-by-bit copy as the transfer of the information allocated to an object as an unstructured block of memory. In the case of simple structs this is easy to imagine.
What are the contents of the source struct? Are they initialized? What are its relationships to other objects? All unimportant.
In some sense, a bit-by-bit copy is like a shallow copy in that like a shallow copy a bit-by-bit copy will not duplicate related objects, but that's because it doesn't even consider object relationships.
For example, C++ defines a trivial copy constructor as follows:
A trivial copy constructor is a constructor that creates a bytewise copy of the object representation of the argument, and performs no other action. Objects with trivial copy constructors can be copied by copying their object representations manually, e.g. with std::memmove. All data types compatible with the C language (POD types) are trivially copyable.
In contrast, shallow copy and its counter-part deep copy exist as a concept precisely because of the question of object relationships.
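As a rough illustration of the distinction, the standard type trait std::is_trivially_copyable reports whether a bytewise copy is a valid way to copy a type. The structs Point and Owning and the function bytewise_copy below are hypothetical names invented for this sketch.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical POD-style struct: no relationships to other objects.
struct Point {
    int x;
    int y;
};

// Hypothetical struct that owns another object and copies it deeply.
struct Owning {
    int* p;
    Owning() : p(new int(0)) {}
    Owning(const Owning& other) : p(new int(*other.p)) {}
    Owning& operator=(const Owning& other) { *p = *other.p; return *this; }
    ~Owning() { delete p; }
};

static_assert(std::is_trivially_copyable<Point>::value,
              "Point may be copied as raw bytes");
static_assert(!std::is_trivially_copyable<Owning>::value,
              "Owning has a user-provided copy constructor");

// Copies src by copying its object representation, byte for byte.
Point bytewise_copy(const Point& src) {
    Point dst;
    std::memcpy(&dst, &src, sizeof(Point));
    return dst;
}
```

For Point, the memcpy is equivalent to ordinary copying. For Owning, a bytewise copy would bypass the copy constructor and leave two objects deleting the same int.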
Why do we need to define the assignment operator or copy constructor ourselves when the system supplies default ones?
Kindly explain this with examples.
You need to provide them when you need different behaviour than what the default, compiler provided, functions implement.
The default copy constructor copies all members by value. The default assignment operator assigns all members by value.
In a lot of cases this is just what you want so there is no need to provide your own implementations. But that is not always the case.
Consider this contrived example:
struct Broken {
Broken() : i(42), p(new int) { }
~Broken() { delete p; }
int i;
int* p;
};
It will get the default compiler generated copy ctor and operator= but in this case they don't do what you want - consider this:
int main()
{
Broken b1;
Broken b2 = b1;
} // you'll probably crash here with a double free
What happens is that b1 stores 42 in its i member, then allocates a new int on the heap and stores the address in its p member (let's say the address is 0x1234).
Then we construct b2 from b1 and the default copy constructor happily assigns b2.i to be 42 - this is fine. It also assigns b2.p to have the value 0x1234 - this is not what you want. Now both objects hold a pointer to the same memory and both their destructors will attempt to delete it.
Thus, when b2 goes out of scope at the end of main it disposes of the memory - so far so good - but then b1 also goes out of scope and tries to release the already freed memory and your program is now broken.
In this case you would want to provide your own operator= and copy ctor that don't naively copy the value of the pointer, but instead make a deep copy of whatever the pointer points to and store it in a new chunk of memory that is distinct from the original, so that both objects have their own unique copy.
There are many other examples but this was the simplest I could come up with.
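For what it's worth, a repaired version might look roughly like the sketch below. The name Fixed and the helper fixed_copies_do_not_share are invented for illustration; the idea is just the deep copy described above.

```cpp
#include <cassert>

// Hypothetical repaired version of Broken.
struct Fixed {
    Fixed() : i(42), p(new int(0)) {}

    // Copy constructor: allocate a new int holding a copy of the pointee.
    Fixed(const Fixed& other) : i(other.i), p(new int(*other.p)) {}

    // Copy assignment: each object already owns an int, so copy the value
    // rather than the address.
    Fixed& operator=(const Fixed& other) {
        i = other.i;
        *p = *other.p;
        return *this;
    }

    ~Fixed() { delete p; }

    int i;
    int* p;
};

// Returns true if copies made both ways have their own pointee.
bool fixed_copies_do_not_share() {
    Fixed b1;
    *b1.p = 7;
    Fixed b2 = b1;  // copy constructor
    Fixed b3;
    b3 = b1;        // copy assignment
    *b2.p = 8;      // must not change b1 or b3
    return b1.p != b2.p && b1.p != b3.p
        && *b1.p == 7 && *b2.p == 8 && *b3.p == 7;
}
```

Each destructor now deletes a different int, so the double free goes away.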
I was looking to understand how vector is implemented in C++. There was a previous question that asked this, and so I took a look at it, and I have a small question. Assuming the implementation in the linked question is correct, let's look at this code:
int main(){
Vector<int> test2 = test_Vector();
cout << test2[0] << endl;
return 0;
}
// below is NOT the STL vector object, but the one in the linked question,
// in which the asker tries to implement STL vector himself/herself
Vector<int> test_Vector(){
Vector<int> test;
test.push_back(5);
return test;
}
As I understand it, the test Vector object is created locally, so when the test_Vector method returns, the local object goes out of scope, thereby calling the destructor and delete-ing the dynamic array. Since the code actually works and 5 is printed, I guess I'm wrong. What's the right explanation?
You are right, but you're missing one important thing.
Because you're returning a Vector<int>, you should think of it as being copied. This would normally invoke the copy constructor, which copies test into a new instance of Vector<int>. The copy constructor is implemented in the linked question as:
template<class T>
Vector<T>::Vector(const Vector<T> & v)
{
my_size = v.my_size;
my_capacity = v.my_capacity;
buffer = new T[my_size];
for (int i = 0; i < my_size; i++)
buffer[i] = v.buffer[i];
}
Note that the copy constructor might not be invoked due to Return Value Optimization. Compilers are allowed to optimize the copy away in many cases, and the C++ standard allows for the fact that this optimization may change program behaviour.
Whether the object is copied, or RVO is applied, you should end up with the same thing. The optimization should not ruin your object provided you follow normal object-oriented practices.
You should always think of function return values being passed by value (ie copied) regardless of type, and then consider that your compiler is probably doing RVO. It's important not to forget the Rule of Three (or Four, or Five).
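If you want to see this on your own compiler, a small sketch like the following counts copy constructor calls. Tracked, make_tracked, and copies_for_one_return are hypothetical names; the count you get depends on the compiler, the language standard, and the optimization settings, so no particular number is claimed here.

```cpp
#include <cassert>

// Hypothetical type that counts how often its copy constructor runs.
struct Tracked {
    static int copies;
    int value;

    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
};

int Tracked::copies = 0;

// Returns a local object by value; the copy may or may not be elided.
Tracked make_tracked() {
    Tracked local(5);
    return local;
}

// Returns the number of copy constructor calls seen for one call.
int copies_for_one_return() {
    Tracked::copies = 0;
    Tracked result = make_tracked();
    assert(result.value == 5);  // holds with or without elision
    return Tracked::copies;
}
```

Whatever the count turns out to be, result.value is 5, which is the point: a correct copy constructor makes the program behave the same whether or not the copy is optimized away.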
He provided a public copy constructor (technically, he didn't make the copy constructor private), so the standard C++ logic kicks in and makes a copy of the object to return. The new object is local to main. Inside the copy constructor, new memory is allocated and the data is copied over. If he hadn't provided a copy constructor, the program would have accessed invalid memory (through the pointer whose memory has already been freed).
When test is returned, its copy constructor is invoked (in theory), which allocates a new chunk of memory and copies the contents of test. The original test on test_Vector's stack is destructed (in theory), and test2 is initialized from the returned temporary. Because Vector<int> test2 = test_Vector(); is an initialization rather than an assignment, this invokes the copy constructor again (in theory), not the assignment operator, which once more allocates memory and copies the data from the temporary. The temporary object returned from test_Vector then has its destructor called (in theory).
As you can see, without compiler optimizations this is hell :) The "in theory"s can be skipped if you don't do anything too baroque and the compiler is smart and your situation is this simple. See this for some updates to the story as of C++11.
I have a class called Directory with certain members followed by a copy constructor.
class Directory{
private:
char * Name;
int Telephone_Number;
char * Address;
public:
Directory (Directory & b)
{
Name = new char [10]; // Just assume that the name will not be greater than 10 characters
Address = new char [30]; //Same here
strcpy (Name, b.Name);
Telephone_Number = b.Telephone_Number;
strcpy (Address, b.Address);
}
};
I wanted to know if my copy constructor would perform deep copy or shallow copy. I understand that it is deep copying Address and Name because new memory is being allocated for them, but what about Telephone_Number?
Is my code doing shallow copying or deep copying? Could anyone explain copy constructors in general to me?
Telephone_Number is declared as an int in the class and copied by value (no references or pointers or anything), so this constructor is doing a deep copy.
Lots of information is available about copy constructors on Wikipedia or in any good C++ book, you should read something like that first and then see if you have any other specific questions.
It is probably worth the reading time, there are important rules that govern how copy constructors are used when initializing and assigning objects that you should understand.
Obviously it is deep copying Address and Name because new memory is being allocated for them, but what about Telephone_Number?
An int, like the telephone number, is always deep copied.
Is my current code doing shallow copy or deep?
Deep
Could anyone explain copy constructors in general and copy constructors specifically to me?
A compiler-synthesized copy constructor would just generate a member-by-member copy, i.e. a pointer value is copied, not its contents. A compiler-synthesized copy constructor is always shallow. Whenever a deep copy is required, the class author has to write it (as you did).
Also, the signature of the copy constructor should be corrected to pass the original object (here, b) by const reference to prevent it from accidental modification:
Directory (Directory const & b)
                     ^^^^^
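Putting those points together, a fuller version of the class could look something like this sketch. The three-argument constructor, the destructor, the accessors, and the private helper duplicate are not in the question; they are assumptions added so the example compiles on its own. Buffers are sized with strlen instead of the fixed 10 and 30, and copy assignment is deleted to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical completion of the Directory class from the question.
class Directory {
private:
    char* Name;
    int Telephone_Number;
    char* Address;

    // Allocates a new buffer and copies a C string into it.
    static char* duplicate(const char* s) {
        char* copy = new char[std::strlen(s) + 1];  // +1 for the terminator
        std::strcpy(copy, s);
        return copy;
    }

public:
    Directory(const char* name, int number, const char* address)
        : Name(duplicate(name)),
          Telephone_Number(number),
          Address(duplicate(address)) {}

    // Copy constructor taking a const reference: pointers get new
    // buffers, the int is simply copied by value.
    Directory(const Directory& b)
        : Name(duplicate(b.Name)),
          Telephone_Number(b.Telephone_Number),
          Address(duplicate(b.Address)) {}

    Directory& operator=(const Directory&) = delete;  // out of scope here

    ~Directory() {
        delete[] Name;
        delete[] Address;
    }

    const char* name() const { return Name; }
    const char* address() const { return Address; }
    int number() const { return Telephone_Number; }
};
```

With the const reference in place, the constructor can also copy from const objects and temporaries, which the original signature would reject.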
That's a deep copy. Your int is copied by value and you're not copying a reference or pointer.
It is a deep copy. Integers are always copied.
As the others have said, it is a deep copy.
In general when you use a copy constructor, the idea is to get a deep copy. Meaning that the copy does not share data with the original.
So let's look at the following:
Directory* B = new Directory(*A);
A->DoSomething();
B->DoSomething();
When I read the above code, I expect that A->DoSomething() only affects the data in A. Likewise with B->DoSomething().
However, there can be exceptions to the deep copy rule. Maybe the data is large, so it might be inefficient to make a copy:
class LargeData {
int* PointerToLargeArray;
};
In the case above, you could implement the copy constructor by just copying pointers, rather than the data:
LargeData(LargeData& b)
{
PointerToLargeArray = b.PointerToLargeArray; // b is a reference, so use . rather than ->
}
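If you do go this way with a raw pointer, somebody still has to decide who deletes the array. One possible way to keep the sharing but avoid that question is std::shared_ptr, which is a different technique from the raw pointer copy above. The class SharedLargeData and its members are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical variant of LargeData that shares its array on copy.
class SharedLargeData {
public:
    explicit SharedLargeData(std::size_t n)
        : values_(std::make_shared<std::vector<int>>(n, 0)) {}

    // The compiler-generated copy operations copy the shared_ptr, so
    // copies share one array, freed when the last owner is destroyed.

    int get(std::size_t i) const { return (*values_)[i]; }
    void set(std::size_t i, int v) { (*values_)[i] = v; }
    long owners() const { return values_.use_count(); }

private:
    std::shared_ptr<std::vector<int>> values_;
};

// Returns true if a copy sees writes made through the original's twin.
bool shared_copy_aliases_data() {
    SharedLargeData a(1000);
    SharedLargeData b = a;  // cheap: no element is copied
    b.set(0, 42);           // visible through a as well
    return a.get(0) == 42 && a.owners() == 2;
}
```

The copy is still shallow in the sense that both objects see the same elements, so a change made through one is visible through the other.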
Name = new char [10];
Address = new char [30]; //Same here
strcpy (Name, b.Name);
strcpy (Address, b.Address);
Name and Address only hold the memory addresses of the actual strings. So when we copy Name and Address, we have to copy the content they point to. If we do just a shallow copy, as below,
Name = b.Name;
Address = b.Address;
we are just pointing to data that was created by someone else and may also be deleted by that someone. If they delete the content, we will still be pointing at it, assuming it is available. The pointer is now dangling (it points to freed memory).
This is why we use copy constructor.
Telephone_Number = b.Telephone_Number;
Telephone_Number holds the actual value, so a shallow copy is enough to copy it.
To summarize: Name and Address hold memory addresses, and a shallow copy would copy just those addresses, not the strings they point to. Telephone_Number holds an integer value, and a shallow copy copies that value.
So the copy constructor is correctly doing a deep copy.
#include<iostream>
class A{
public :
int a;
};
int main(){
A obj;
obj.a = 5;
A b(obj);
b.a = 6;
std::cout<<obj.a;
return 0;
}
Why is the output 5? By default, the copy constructor in C++ makes a shallow copy. Doesn't a shallow copy mean a reference?
Or am I missing something?
shallow copy means reference ?? or m i missing something ?
You are missing something. Shallow copy means copy. It copies all the members of the object from one to another. It is NOT a reference. The copy created is completely independent of the original
See this excellent tutorial for the difference between shallow and deep copy.
b is a completely separate object from obj. It has its own a independent of obj's.
It sounds like what you have in mind is a reference:
A& b = obj;
After this, both b and obj refer to the same object. Changes made through b would be visible through obj and vice versa.
shallow copy means reference ?? or m i missing something ?
Yes, you're missing something.
Shallow copy doesn't mean reference. Shallow copy means copying the members : if a member is a pointer, then it copies the address, not the content the pointer is pointing to. That means, the pointers in the original object and the so-called copied object point to the same content in memory. That is called shallow copy. Deep copy, on the other hand, doesn't copy the address, it creates a new pointer (in the new object), allocates memory for it, and then copies the content the original pointer points to.
In your case, shallow copy and deep copy make no difference because there are no pointer member(s) in the class. Every member is copied (as usual), and since no member is a pointer, each copied member is a different member in memory. That is, the original object and the copied object are completely different objects in memory. There is absolutely nothing that the two objects share with each other. So when you modify one, it doesn't at all change anything in the other.
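A small sketch may make the difference concrete. The struct Mixed and the function memberwise_copy_behaviour are made-up names; the pointer here is deliberately non-owning, so no destructor is involved.

```cpp
#include <cassert>

// Hypothetical struct mixing a plain value with a raw, non-owning pointer.
struct Mixed {
    int value;
    int* link;  // not owned, so no destructor is needed here
};

// Returns true if the default copy behaves as described above.
bool memberwise_copy_behaviour() {
    int shared_target = 1;
    Mixed original{5, &shared_target};
    Mixed copy(original);  // compiler-generated memberwise copy

    copy.value = 6;  // changes only the copy's own int
    *copy.link = 2;  // changes the int both pointers refer to

    return original.value == 5          // plain member stayed independent
        && original.link == copy.link   // the address was copied
        && *original.link == 2;         // so the pointee is shared
}
```

The int member behaves like the a in your class, while the pointer member shows where shallow and deep copies start to differ.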
Yes, the default copy constructor is a shallow copy. See more here
But b is completely disjoint from obj, so the two objects are not directly related.
A b(obj) copies obj's members into the newly created object b. It is a shallow copy, but A has no pointer members, so b gets its own a and nothing is shared. What you're probably thinking about is a reference:
A& b = obj;
b.a = 6;
std::cout << obj.a; // 6
I'm making a very dumb mistake just wrapping a pointer to some new'ed memory in a simple class.
class Matrix
{
public:
Matrix(int w,int h) : width(w),height(h)
{
data = new unsigned char[width*height];
}
~Matrix() { delete data; }
Matrix& Matrix::operator=(const Matrix&p)
{
width = p.width;
height = p.height;
data= p.data;
return *this;
}
int width,height;
unsigned char *data;
}
.........
// main code
std::vector<Matrix> some_data;
for (int i=0;i<N;i++) {
some_data.push_back(Matrix(100,100)); // all Matrix.data pointers are the same
}
When I fill the vector with instances of the class, the internal data pointers all end up pointing to the same memory ?
1. You're missing the copy constructor.
2. Your assignment operator should not just copy the pointer, because that leaves multiple Matrix objects with the same data pointer, which means that pointer will be deleted multiple times. Instead, you should create a deep copy of the matrix. See this question about the copy-and-swap idiom, in which GMan gives a thorough explanation of how to write an efficient, exception-safe operator= function.
3. You need to use delete[] in your destructor, not delete.
Whenever you write one of a copy-constructor, copy-assignment operator, or destructor, you should do all three. These are The Big Three, and the previous rule is The Rule of Three.
Right now, your copy-constructor doesn't do a deep copy. I also recommend you use the copy-and-swap idiom whenever you implement The Big Three. As it stands, your operator= is incorrect.
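As a rough sketch of what copy-and-swap could look like for this class (the private helper count and the friend swap function are additions for illustration, and this is one common way to write it rather than the only one):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical rewrite of the Matrix class using copy-and-swap.
class Matrix {
public:
    Matrix(int w, int h)
        : width(w), height(h), data(new unsigned char[count(w, h)]()) {}

    // Copy constructor: allocate a separate buffer and copy the elements.
    Matrix(const Matrix& other)
        : width(other.width), height(other.height),
          data(new unsigned char[count(other.width, other.height)]) {
        std::copy(other.data, other.data + count(width, height), data);
    }

    // Copy assignment: the by-value parameter is already a deep copy.
    // Swap with it and let its destructor free our old buffer.
    Matrix& operator=(Matrix other) {
        swap(*this, other);
        return *this;
    }

    ~Matrix() { delete[] data; }  // delete[] matches new[]

    friend void swap(Matrix& a, Matrix& b) noexcept {
        using std::swap;
        swap(a.width, b.width);
        swap(a.height, b.height);
        swap(a.data, b.data);
    }

    int width, height;
    unsigned char* data;

private:
    // Number of elements in a w-by-h matrix.
    static std::size_t count(int w, int h) {
        return static_cast<std::size_t>(w) * static_cast<std::size_t>(h);
    }
};
```

Because all allocation happens in the copy constructor, operator= cannot leave the left-hand side half-modified if new throws, and the old buffer is no longer leaked.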
Perhaps it's a learning exercise, but you should always give classes a single responsibility. Right now, yours has two: managing a memory resource, and being a Matrix. You should separate these so that you have one class that handles the resource, and another that uses said class to use the resource.
That utility class will need to implement The Big Three, but the user class will actually need not implement any of them, because the implicitly generated ones will be handled properly thanks to the utility class.
Of course, such a class already exists as std::vector.
You missed the copy constructor.
Matrix(const Matrix& other) : width(other.width),height(other.height)
{
data = new unsigned char[width*height];
std::copy(other.data, other.data + width*height, data);
}
Edit: And your destructor is wrong. You need to use delete[] instead of delete. Also, your assignment operator is just copying the address of the already allocated array and isn't doing a deep copy.
Your missing copy ctor has already been pointed out. When you fix that, you'll still have a major problem though: your assignment operator is doing a shallow copy, which will give undefined behavior (deleting the same data twice). You need either a deep copy (i.e., in your operator= allocate new space, copy existing contents to new space) or else use something like reference counting to ensure the data gets deleted only once, when the last reference to it is destroyed.
Edit: at the risk of editorializing, what you've posted is basically a poster-child for why you should use a standard container instead of writing your own. If you want a rectangular matrix, consider writing it as a wrapper around a vector.
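A minimal sketch of such a wrapper, assuming row-major storage and an at(x, y) accessor (VectorMatrix, at, and elements_are_independent are names invented here, not anything from the question):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Matrix-like wrapper around std::vector.
class VectorMatrix {
public:
    VectorMatrix(std::size_t w, std::size_t h)
        : width_(w), height_(h), data_(w * h, 0) {}

    // No destructor, copy constructor, or operator= is declared: the
    // compiler-generated ones copy the vector, which copies its elements.

    unsigned char& at(std::size_t x, std::size_t y) { return data_[y * width_ + x]; }
    unsigned char at(std::size_t x, std::size_t y) const { return data_[y * width_ + x]; }

    std::size_t width() const { return width_; }
    std::size_t height() const { return height_; }

private:
    std::size_t width_, height_;
    std::vector<unsigned char> data_;
};

// Fills a vector the way the question's loop does, then checks that
// writing to one element does not show up in another.
bool elements_are_independent() {
    std::vector<VectorMatrix> some_data;
    for (int i = 0; i < 3; ++i) {
        some_data.push_back(VectorMatrix(100, 100));
    }
    some_data[0].at(0, 0) = 1;
    return some_data[0].at(0, 0) == 1 && some_data[1].at(0, 0) == 0;
}
```

There is no new or delete anywhere, so there is nothing to get wrong in a destructor or assignment operator.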
You're using new[], but you aren't using delete[]. That's a really bad idea.
And your assignment operator makes two instances refer to the same allocated memory - both of which will try to deallocate it! Oh, and you're leaking the left side's old memory during assignment.
And, yes, you're missing a copy constructor, too. That's what the Rule of Three is about.
The problem is you are creating a temporary with Matrix(100,100) that gets destructed after it is shallow copied into the vector. Then on the next iteration it is constructed again and the same memory is allocated for the next temporary object.
To fix this:
some_data.push_back(new Matrix(100,100)); // some_data must then be declared as std::vector<Matrix*>
You will also have to add some code to delete the objects in the vector when you are done.
EDIT: Also fix the stuff mentioned in the other answers. That's important, too. But if you change your copy constructor and assignment operators to perform deep copies, then don't 'new' the objects when filling the vector or it will leak memory.