#include <iostream>
using namespace std;
class Empty{
char omg[0];
};
int main()
{
Empty em1, em2;
Empty set[100];
cout << sizeof(Empty) << " " << sizeof(em1) << " " << sizeof(em2) << endl;
cout << (long*)&em1 << " " << (long*)&em2 << endl;
cout << "total numbers of element is: " << sizeof(set)/sizeof(*set) << endl;
return 0;
}
Its output is:
0 0 0
0xbff36ad0 0xbff36ac8
numbers of elements is: 4
The results are so surprising.
As shown above, Empty is a class, the size of it and its objects are all 0, why?
Maybe I guess, because a empty class's size is 1, and when the class is not empty, its size is decided by is members, but here its member is special, it is a Arrays of Length Zero, and this array's size is 0, so the size of class and objects are all 0.
It's just my guess. As the program running, we can see that two objects both have address, and the address is different.
Here is my question: if object of 0 size can be implemented, Why the C++ standard states that empty objects have sizeof() = 1, it is for "To ensure that the addresses of two different objects will be different"Why is the size of an empty class not zero? , but now, we do have different address as the output,how does this happen?
Further more, no matter what the size of the array set is, the last line output is always 4, why?
Thanks :)
PS: I run this program on MacOS, and the compiler is Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
I'll take a stab since no one more experienced has:
As shown above, Empty is a class, the size of it and its objects are all 0, why?
Zero-sized arrays are prohibited by the standard, therefore as far as the standard is concerned sizeof(Empty) is a meaningless expression, you are already in the realm of undefined behaviour.
Here is my question: if object of 0 size can be implemented, [...] Why is the size of an empty class not zero? , but now, we do have different address as the output,how does this happen?
As above, an object of size 0 cannot exist in a valid standard c++ program (with the exception of base class subobjects).
Your compiler allows this as an extension to the standard, and as long as you use this extension within the scope it was intended for (i.e. as a pre-flexible array member hack) you shouldn't have any problems, although your code is not portable. Your example above however is not how zero-sized arrays are meant to be used (not to mention there are better constructs in c++ for handling these situations anyway).
Your compiler is intelligent enough to provide separate addresses for em1 and em2, but you should find that all elements of set have in fact the same address.
Further more, no matter what the size of the array set is, the last line output is always 4, why?
Since your compiler considers sizeof(Empty) and arrays of Empty to be zero, you are dividing by zero, which is undefined behavior. You might find your program crashes if you disable optimizations, with GCC for instance your program crashes with -O0 but not with -O1.
Related
Why does the new[] operator in C++ actually create an array of length + 1? For example, see this code:
#include <iostream>
int main()
{
std::cout << "Enter a positive integer: ";
int length;
std::cin >> length;
int *array = new int[length]; // use array new. Note that length does not need to be constant!
//int *array;
std::cout << "I just allocated an array of integers of length " << length << '\n';
for (int n = 0; n<=length+1; n++)
{
array[n] = 1; // set element n to value 1
}
std::cout << "array[0] " << array[0] << '\n';
std::cout << "array[length-1] " << array[length-1] << '\n';
std::cout << "array[length] " << array[length] << '\n';
std::cout << "array[length+1] " << array[length+1] << '\n';
delete[] array; // use array delete to deallocate array
array = 0; // use nullptr instead of 0 in C++11
return 0;
}
We dynamically create an array of length "length" but we are able to assign a value at the index length+1. If we try to do length+2, we get an error.
Why is this? Why does C++ make the length = length + 1?
It doesn’t. You’re allowed to calculate the address array + n, for the purpose of checking that another address is less than it. Trying to access the element array[n] is undefined behavior, which means the program becomes meaningless and the compiler is allowed to do anything whatsoever. Literally anything; one old version of GCC, if it saw a #pragma directive, started a roguelike game on the terminal. (Thanks, Revolver_Ocelot, for reminding me: that was technically implementation-defined behavior, a different category.) Even calculating the address array + n + 1 is undefined behavior.
Because it can do anything, the particular compiler you tried that on decided to let you shoot yourself in the foot. If, for example, the next two words after the array were the header of another block in the heap, you might get a memory-corruption bug. Or maybe a compiler stored the array at the top of your memory space, the address &array[n+1] is aNULL` pointer, and trying to dereference it causes a segmentation fault. Or maybe the next page of memory is not readable or writable and trying to access it crashes the program with a protection fault. Or maybe the implementation bounds-checks your array accesses at runtime and crashes the program. Maybe the runtime stuck a canary value after the array and checks later to see if it was overwritten. Or maybe it happens, by accident, to work.
In practice, you really want the compiler to catch those bugs for you instead of trying to track down the bugs that buffer overruns cause later. It would be better to use a std::vector than a dynamic array. If you must use an array, you want to check that all your accesses are in-bounds yourself, because you cannot rely on the compiler to do that for you and skipping them is a major cause of bugs.
If you write or read beyond the end of an array or other object you create with new, your program's behaviour is no longer defined by the C++ standard.
Anything can happen and the compiler and program remain standard compliant.
The most likely thing to happen in this case is you are corrupting memory in the heap. In a small program this "seems to work" as the section of the heap ypu use isn't being used by any other code, in a larger one you will crash or behave randomly elsewhere in a seemingoy unrelated bit of code.
But arbitrary things could happen. The compiler could prove a branch leads to access beyond tue end of an array and dead-code eliminate paths that lead to it (UB that time travels), or it could hit a protected memory region and crash, or it could corrupt heap management data and cause a future new/delete to crash, or nasal demons, or whatever else.
At the for loop you are assigning elements beyond the bounds of the loop and remember that C++ does not do bounds checking.
So when you initialize the array you are initializing beyond the bounds of the array (Say the user enters 3 for length you are initializing 1 to array[0] through array[5] because the condition is n <= length + 1;
The behavior of the array is unpredictable when you go beyond its bounds, but most likely your program will crash. In this case you are going 2 elements beyonds its bounds because you have used = in the condition and length + 1.
There is no requirement that the new [] operator allocate more memory than requested.
What is happening is that your code is running past the end of the allocated array. It therefore has undefined behaviour.
Undefined behaviour means that the C++ standard imposes no requirements on what happens. Therefore, your implementation (compiler and standard library, in this case) will be equally correct if your program SEEMS to work properly (as it does in your case), produces a run time error, trashes your system drive, or anything else.
In practice, all that is happening is that your code is writing to memory, and later reading from that memory, past the end of the allocated memory block. What happens depends on what is actually in that memory location. In your case, whatever happens to be in that memory location is able to be modified (in the loop) or read (in order to print to std::cout).
Conclusion: the explanation is not that new[] over-allocates. It is that your code has undefined behaviour, so can seem to work anyway.
I thought I knew how to deal with memory management in c++ but this confused me:
Consider the following code:
struct A {
int i;
};
int main(int argc, char* argv[]) {
A a{ 5 }; //Constructs an A object on the stack
A* b = new A{ 7 }; //Constructs an A object on the heap and stores a pointer to it in b
A* c = new A[] { //Construct an array of A objects on the heap and stores a pointer to it in c
{ 3 },
{ 4 },
{ 5 },
{ 6 }
};
std::cout << "a: " << a.i << "\n"; //Prints 'a: 5'
std::cout << "b: " << b->i << "\n"; //Prints 'b: 7'
std::cout << "c: " << c[0].i << "; " << c[1].i << "; " << c[2].i << "; " << c[3].i << "\n";
//Prints 'c: -33686019; -1414812757; -1414812757; -1414812757'
delete b;
delete[] c;
return 0;
}
I don't understand why the last print-out of c prints those weird numbers. If I add a constructor to A like so:
struct A {
A(int i) : i{i} {}
int i;
};
Then the output of the last print-out becomes:
'c: 3; 4; 5; 6'
as it should be. But now delete[] c; will give me a runtime error (not an exception it seems) that says MyGame.exe has triggered a breakpoint. (I'm working in VS2013).
Furthermore, if I change the line A* c = new A[] { to A* c = new A[4] { the error disappears and everything works as expected.
So my questions are:
Why the weird numbers? Won't the A objects in the array get properly constructed somehow if I don't define a constructor?
And why do I need to specify the array size explicitly even though it will compile and link just fine without? Initializing arrays on the stack this way does not give me a runtime error (I tested it to be sure).
This is an error:
A* c = new A[] { {3}, {4}, {5}, {6} };
You must put the dimension inside the []. With new the array dimension cannot be deduced from the initializer list.
Putting 4 in here makes your code work correctly for me.
Your compiler apparently has an "extension" that treats new A[] as new A[1].
If you compile in standard mode (with gcc or clang, -std=c++14 -pedantic), which is always a good idea, the compiler will tell you about things like this. Treat warnings as errors unless you are really sure they are not errors :)
Why the weird numbers?
Because no memory was allocated to back them. The pointer is pointing at Crom knows what. That structure should not compile.
Won't the A objects in the array get properly constructed somehow if I don't define a constructor?
Without a constructor all of the members will be initialized to their defaults. int's and most Plain Old Datatypes have no defined default value. In a typical implementation they get whatever value happens to already be in their allocated memory block. If a member object is of a type that doesn't default constructor and is unable to make one, you get a compiler error.
And why do I need to specify the array size explicitly even though it will compile and link just fine without?
It shouldn't compile, mismatch between the size of the array (unspecified and an error unto itself) and the number of elements in the initializer list, so the compiler has a bug. Linker is not involved at this point.
Initializing arrays on the stack this way does not give me a runtime error (I tested it to be sure).
In the static version the compiler can count the number of elements in initialization list. Why the dynamic version with new can't, gotta say I have no good answer. You'd think it would be a simple bit of counting that initializer list, so there's something deeper preventing it. The folk who debated and then approved the standard either never considered allocating a dynamic array that way or couldn't find a good way to make it work in all cases. Same reason variable length arrays still aren't in the standard.
"And why do I need to specify the array size explicitly even though it will compile and link just fine without? It shouldn't compile, ...." To be clear: If I add the constructor to A and run it, it runs just fine up until the delete[] statement. Only then it crashes but cout << c[0] works as 'expected'
This is because you are unlucky. That constructor is writing into memory that your program owns, but didn't allocate to c. Printing those values works, but whatever was supposed to be in memory at that point has been overwritten. This will probably cause your program to crash sooner or later. This time it's later.
My suspicions, and this is guesswork based on specific because you've ventured far into the realms of the undefined, are the crash on delete[] is because
A* c = new A[]
Allocated A[1] and assigned it to c rather than failing to compile. c has one A to work with. The initializer list tries to stuff in 4 and writes 3 into c[0] and the 4,5, and 6 over the heap control information that delete needs to put the data back. All looks good until delete tries to use that overwritten information.
Oh and this:"Without a constructor all of the members will be initialized to their defaults. int's and most Plain Old Datatypes have no defined default value.". For structs a user defined ctor seems optional because you can initialize a struct by providing arguments corresponding to its data fields.
A struct has a much more permissive attitude toward data encapsulation than a class and defaults to public access where a class defaults to private. I've never tried it, but I'm betting that you can use the same struct trick to init all the public members of a class.
OK. Just tried it. Works in GCC 4.8.1. Not going to make that claim in general without looking it up in the standard. Got to get a copy of it.
I wrote a simple code as follows:
void show(const int a[], unsigned elements);
int main()
{
show(new int[]{1, 2, 3, 45}, 4); //does not work
}
void show(const int a[], unsigned elements)
{
cout << "{ ";
for (int i = 0; i < elements; i++)
{
cout << a[i];
if (i != elements - 1)
cout << ",";
cout << " ";
}
cout << "}";
}
It should just output { 1, 2, 3, 45 }. If I include a size in the brackets
show(new int[4]{1, 2, 3, 45}, 4);
then it works. So naturally I would assume that if I write the new this way I have to specify the size (although I thought that giving it an initialization list would imply the size). But, the odd thing is that when set a breakpoint at the show function call and I run it step by step through the debugger, the program outputs everything correctly and terminates at the end of main like it should. If I don't use the debugger, it either crashes after outputting a '{' or it outputs the whole thing "{ 1, 2, 3, 45 }" and an assertion failure " Program: ... "Expression: _CrtIsValidHeapPointer(pUserData) ... "
I'm curious to know why it is behaving this way. Also, I am using Visual Studio on Windows 8.
EDIT: I am using namepsace std. Please don't comment about using namespaces or about how to better write this code. I'm solely interested in the cause of this issue.
EDIT Responding to additional question in comment.
To be quick, yes it would "still" be a pointer, and yes it compiles with clang and gcc when you add the 4.
There are a couple things going on, however, and my initial answer was a simplification. The problem is that your expression is not well-formed to begin with, so it's not clear what it should evaluate to or what the type should be. Consider
If type is an array type, all dimensions other than the first must be specified as positive integral constant expression (until C++14)converted constant expression of type std::size_t (since C++14), but the first dimension may be any expression convertible to std::size_t.
Source: http://en.cppreference.com/w/cpp/language/new
As it says, either way there must be an expression in the brackets. This makes it difficult to say whether the expression would still evaluate to a pointer. A well-formed new expression would indeed evaluate to a pointer, no matter how many dimensions it has, even if it has zero. When I say pointer here, I strictly mean the representation, not the type.
The point is that the type, at least "inside" new, is different depending on how many dimensions you have. So, whether you do
new int
new int[6]
new int[12][14]
the representation is the same (a pointer), but the type new sees is different in each case. The compiler is able to respond to the different types in new (think by analogy with function overloading). In particular, when the type is an array type, it is possible to initialize the new memory with the braced initializer list containing multiple elements.
My best guess is, since VS was accepting the brackets without an expression, it was allocating memory for either a single int or int[0]. In the former case, it was wrongly allowing you to brace initialize it as if it was an array type, and in the latter case the allocated memory was not enough anyway. Your main then wrote over a heap guard that is there to catch this sort of thing in debug mode. When this was checked at the end of main or at program termination, you saw the symptoms. The flakiness in the output was either due to different heap layouts or due to buffering in the output stream.
Original answer
Your new expression, if it was well-formed, would have scalar type, meaning that the result is a "single value". That single value is a pointer to an integer, specifically to the one at the beginning of the array you are trying to create. That is how "dynamic arrays" are represented in C++. The type system does not "know" their size.
You are trying to initialize this single pointer value with an initializer list of 4 values. This shouldn't work. I am not sure that this should compile at all. It certainly didn't compile with clang or gcc, and I'm surprised that it worked in Visual Studio.
This question already has answers here:
What happens if I define a 0-size array in C/C++?
(8 answers)
Closed 1 year ago.
Today I incidentally defined a two dimensional array with the size of one dimension being 0, however my compiler did not complain. I found the following which states that this is legal, at least in the case of gcc:
6.17 Arrays of Length Zero
However, I have two questions on this usage:
First, is this considered as good programming practice? If so, then when should we use it in real world?
Second, the array I defined was two dimensional, with 0 size for one dimension. Is this the same as the one dimensional case? For example,
int s[0]
int s[0][100]
int s[100][0]
Are they all the same in the memory and for the compiler?
EDIT: Reply to Greg: The compiler I am using is gcc 4.4.5. My intention for this problem is not compiler-dependent, however if there are any compiler specific quirks that would be helpful too:)
Thanks in advance!
In C++ it is illegal to declare an array of zero length. As such it is not normally considered a good practice as you are tying your code to a particular compiler extension. Many uses of dynamically sized arrays are better replaced with a container class such as std::vector.
ISO/IEC 14882:2003 8.3.4/1:
If the constant-expression (5.19) is present, it shall be an integral constant expression and its value shall be greater than zero.
However, you can dynamically allocate an array of zero length with new[].
ISO/IEC 14882:2003 5.3.4/6:
The expression in a direct-new-declarator shall have integral or enumeration type (3.9.1) with a non-negative value.
I ran this program at ideone.com
#include <iostream>
int main()
{
int a[0];
int b[0][100];
int c[100][0];
std::cout << "sizeof(a) = " << sizeof(a) << std::endl;
std::cout << "sizeof(b) = " << sizeof(b) << std::endl;
std::cout << "sizeof(c) = " << sizeof(c) << std::endl;
return 0;
}
It gave the size of all the variables as 0.
sizeof(a) = 0
sizeof(b) = 0
sizeof(c) = 0
So in the above example, no memory is allocated for a, b or c.
Compiling your example with gcc, all three of them have sizeof 0, so I would assume that all of them are treated equally by the compiler.
Your link explains everything. They are used as last field in a struct when the length of struct is not known at compile time. If you try using them on stack or in a middle of other declarations you will end up overwriting next elements.
In this example, I create a vector with one integer in it and then I erase that integer from the vector. The size of the vector decreases, but the integer is still there! Why is the integer still there? How is it possible for a vector of size 0 to contain elements?
#include <vector>
#include <iostream>
using namespace std;
int main(int agrc, char* argv[])
{
vector<int> v;
v.push_back(450);
cout << "Before" << endl;
cout << "Size: " << v.size() << endl;
cout << "First element: " << (*v.begin()) << endl;
v.erase(v.begin());
cout << "After" << endl;
cout << "Size: " << v.size() << endl;
cout << "First element: " << *(v.begin()) << endl;
return(0);
}
output:
Before
Size: 1
First element: 450
After
Size: 0
First element: 450
You are invoking undefined behavior by dereferencing an invalid memory location. Normally, the heap manager will not immediately free the memory deleted using delete for efficiency purposes. However, that doesn't mean that you can access that memory location, heap manager can use this memory location for other purposes whenever it likes. So your program will behave unpredictably if you dereference a invalid memory location.
IIRC a vector doesn't release space unless specifically told to, so you're seeing an item which is still in its memory but not being tracked by the vector. This is part of the reason why you're supposed to check the size first (the other being that if you never assigned anything, you'll be dereferencing a garbage pointer).
To start, don't count on it being this way across all systems. How a vector works internally is completely implementation-dependent. By dereferencing an invalid memory location, you're circumventing the behavior that has been outlined in the documentation.
That is to say, you can only count on behavior working that is outlined in the STL docs.
The reason you can still access that memory location is because that particular implementation you are using doesn't immediately delete memory, but keeps it around for awhile(probably for performance purposes). Another implementation could very well delete that memory immediately if the author so desired.
It is just that the vector has not freed the memory, but kept it around for future use.
This is what we call "undefined behaviour" There is no guarantee that it will work next time and it may easily crash the program on a future attempt. Don't do it.
What are your compiler options? I get a crash with the usual
options, with both of the compilers I regularly use (g++ and
VC++). In the case of g++, you have to set some additional
options (-D_GLIBCXX_DEBUG, I think) for this behavior; as far as
I can tell, it's the default for VC++. (My command for VC++ was
just "cl /EHs bounds.cc".)
As others have said, it's undefined behavior, but with a good
compiler, it will be defined to cause the program to crash.