I thought I knew how to deal with memory management in C++, but this confused me:
Consider the following code:
#include <iostream>
struct A {
int i;
};
int main(int argc, char* argv[]) {
A a{ 5 }; //Constructs an A object on the stack
A* b = new A{ 7 }; //Constructs an A object on the heap and stores a pointer to it in b
A* c = new A[] { //Constructs an array of A objects on the heap and stores a pointer to it in c
{ 3 },
{ 4 },
{ 5 },
{ 6 }
};
std::cout << "a: " << a.i << "\n"; //Prints 'a: 5'
std::cout << "b: " << b->i << "\n"; //Prints 'b: 7'
std::cout << "c: " << c[0].i << "; " << c[1].i << "; " << c[2].i << "; " << c[3].i << "\n";
//Prints 'c: -33686019; -1414812757; -1414812757; -1414812757'
delete b;
delete[] c;
return 0;
}
I don't understand why the last print-out of c prints those weird numbers. If I add a constructor to A like so:
struct A {
A(int i) : i{i} {}
int i;
};
Then the output of the last print-out becomes:
'c: 3; 4; 5; 6'
as it should be. But now delete[] c; gives me a runtime error (not an exception, it seems) that says "MyGame.exe has triggered a breakpoint." (I'm working in VS2013.)
Furthermore, if I change the line A* c = new A[] { to A* c = new A[4] {, the error disappears and everything works as expected.
So my questions are:
Why the weird numbers? Won't the A objects in the array get properly constructed somehow if I don't define a constructor?
And why do I need to specify the array size explicitly even though it will compile and link just fine without? Initializing arrays on the stack this way does not give me a runtime error (I tested it to be sure).
This is an error:
A* c = new A[] { {3}, {4}, {5}, {6} };
You must put the dimension inside the []. With new the array dimension cannot be deduced from the initializer list.
Putting 4 in here makes your code work correctly for me.
Your compiler apparently has an "extension" that treats new A[] as new A[1].
If you compile in standard mode (with gcc or clang, -std=c++14 -pedantic), which is always a good idea, the compiler will tell you about things like this. Treat warnings as errors unless you are really sure they are not errors :)
Why the weird numbers?
Because no memory was allocated to back them. The pointer is pointing at Crom knows what. That construct should not compile.
Won't the A objects in the array get properly constructed somehow if I don't define a constructor?
Without a constructor, all of the members will be initialized to their defaults. ints and most Plain Old Datatypes have no defined default value; in a typical implementation they get whatever value happens to already be in their allocated memory block. If a member object is of a type that has no default constructor and the compiler is unable to generate one, you get a compiler error.
And why do I need to specify the array size explicitly even though it will compile and link just fine without?
It shouldn't compile: there is a mismatch between the size of the array (unspecified, which is an error unto itself) and the number of elements in the initializer list, so the compiler has a bug. The linker is not involved at this point.
Initializing arrays on the stack this way does not give me a runtime error (I tested it to be sure).
In the static version the compiler can count the number of elements in the initializer list. Why the dynamic version with new can't, I have to say I have no good answer. You'd think it would be a simple bit of counting of the initializer list, so there's something deeper preventing it. The folk who debated and then approved the standard either never considered allocating a dynamic array that way or couldn't find a good way to make it work in all cases. The same reason variable-length arrays still aren't in the standard.
"And why do I need to specify the array size explicitly even though it will compile and link just fine without? It shouldn't compile, ...." To be clear: If I add the constructor to A and run it, it runs just fine up until the delete[] statement. Only then it crashes but cout << c[0] works as 'expected'
This is because you are unlucky. That constructor is writing into memory that your program owns, but didn't allocate to c. Printing those values works, but whatever was supposed to be in memory at that point has been overwritten. This will probably cause your program to crash sooner or later. This time it's later.
My suspicion (and this is guesswork, because you've ventured far into the realm of the undefined) is that the crash on delete[] happens because
A* c = new A[]
allocated A[1] and assigned it to c rather than failing to compile. c has one A to work with. The initializer list tries to stuff in four, writing 3 into c[0] and the 4, 5, and 6 over the heap control information that delete[] needs to give the memory back. All looks good until delete[] tries to use that overwritten information.
Oh and this:"Without a constructor all of the members will be initialized to their defaults. int's and most Plain Old Datatypes have no defined default value.". For structs a user defined ctor seems optional because you can initialize a struct by providing arguments corresponding to its data fields.
A struct has a much more permissive attitude toward data encapsulation than a class and defaults to public access where a class defaults to private. I've never tried it, but I'm betting that you can use the same struct trick to init all the public members of a class.
OK. Just tried it. Works in GCC 4.8.1. Not going to make that claim in general without looking it up in the standard. Got to get a copy of it.
Related
Consider the code below. Here I try to create an array that is supposed to hold pointers to objects of type Person. I wanted its size to be 3, so I put a 3 inside the [ ]. However, this 3 seems to do nothing. So I'm wondering: what is the correct way of declaring the array? As you can see from the line below, I can put the address of a person in position 23 (index 22) of the array, which I think is a bit weird since that memory is not reserved.
#include <iostream>
class Person {
//some code
};
int main() {
Person person1;
Person* array_of_person[3];
array_of_person[22] = &person1;
for (int i = 0; i < 10; i++) {
std::cout << array_of_person[i] << "hey im out of bounds " << std::endl;
}
}
However, this 3 seems to do nothing.
The 3 means: You declared an array of size 3.
The rest of your code is undefined behaviour for accessing this array out-of-bounds. I presume you expected to get some error or something. This is not how C++ works. If you do something wrong, wrong things will happen. When your code has undefined behaviour the compiler is not mandated to issue an error. As the name suggests it is undefined what your code does.
If you want some feedback use a vector and its at method, as in:
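#include <iostream>
#include <stdexcept>
#include <vector>
class Person {
//some code
};
int main() {
Person person1;
std::vector<Person> array_of_person(3);
try {
array_of_person.at(22) = person1; // throws std::out_of_range
} catch (const std::out_of_range& e) {
std::cout << "caught: " << e.what() << std::endl;
}
for (int i = 0; i < 10; i++) {
try {
array_of_person.at(i); // throws std::out_of_range starting from index 3
std::cout << i << " is in bounds" << std::endl;
} catch (const std::out_of_range&) {
std::cout << i << " hey im out of bounds" << std::endl;
}
}
}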
It's not clear why you used pointers; don't use them when they're not necessary.
The array is declared correctly.
Out-of-bounds access is not always detected by C++, especially if there is nothing else after the array. If you had some other variables declared after it, they would probably be trashed. Memory is allocated in pages, which are typically 4096 bytes.
C++ doesn't do runtime checking on array bounds. That's up to you. (And it's also why there are array classes in the standard library.)
So you're free to stuff something into array_of_person[22], but you're stepping on random memory somewhere. You have no idea what you stepped on, but nothing is going to stop you.
But you changed the value of some random data.
Sure, the memory is not reserved, but it still exists. C-style arrays don't have bounds checks like the ones in Python or other languages.
You need to be careful what you access.
Arrays are like pointers in C++.
array_of_person[22] is the same as *(array_of_person+22).
It takes the pointer to the 0th element of array_of_person and skips 22 elements ahead.
EDIT:
As mentioned in the comments: neither the existence of that memory nor its contents is guaranteed. You are most likely either going to segfault or corrupt some data. C++ doesn't guarantee anything here except which address you are trying to access.
So, as of now, it seems to be impossible to actually modify a "const" value in C++ (tested in VS 2017).
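#include <iostream>
using namespace std;
int main() {
const int a = 5;
// Method 1 (alternative): int* ptr = (int*)&a;
// Method 2 (alternative): *((int*)(&a)) = 6;
int* ptr = const_cast<int*>(&a); // Method 3
*ptr = 55; // attempt to modify a
cout << a << "\t" << &a << endl;
cout << *ptr << "\t" << ptr << endl;
}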
Result:
5 SOMEMEMORYADDRESS
55 SOMEMEMORYADDRESS
Anyone got any idea what else can be tried to achieve the effect? Really curious how it is possible to have 1 memory address (at least according to the console) with 2 values.
Please note: there are topics like this for older C++ versions (and they used to work in the past, but they don't anymore).
Really curious how it is possible to have 1 memory address (at least according to the console) with 2 values.
It's because you invoked undefined behavior. The C++ standard, from C++98, has expressly forbidden you from modifying an object that is declared const. And the standard has a catch-all statement such that if you do anything which causes modification of a const object, you get undefined behavior.
Because modifying an object declared const is UB, the compiler is free to assume that this object will never be modified. So, since the compiler can see that a is const and it is initialized to 5, it is 100% valid for the compiler to replace, at compile time, everything which refers to this object with 5. So when you do cout << a, the compiler is free to not bother accessing memory; it can just do cout << 5.
If you did something to modify the memory behind a, that's UB, so the compiler doesn't have to care about what happens in that case.
they used to work in the past - but they don't, anymore
No, they never "worked". They merely just so happened to do what you thought they should. But C++ never guaranteed that compilers would behave in this way, so you have no right to complain about compilers changing that behavior now.
I wrote some simple code as follows:
#include <iostream>
using namespace std;
void show(const int a[], unsigned elements);
int main()
{
show(new int[]{1, 2, 3, 45}, 4); //does not work
}
void show(const int a[], unsigned elements)
{
cout << "{ ";
for (int i = 0; i < elements; i++)
{
cout << a[i];
if (i != elements - 1)
cout << ",";
cout << " ";
}
cout << "}";
}
It should just output { 1, 2, 3, 45 }. If I include a size in the brackets
show(new int[4]{1, 2, 3, 45}, 4);
then it works. So naturally I would assume that if I write the new this way I have to specify the size (although I thought that giving it an initializer list would imply the size). But the odd thing is that when I set a breakpoint at the show function call and run it step by step through the debugger, the program outputs everything correctly and terminates at the end of main like it should. If I don't use the debugger, it either crashes after outputting a '{' or it outputs the whole thing, "{ 1, 2, 3, 45 }", followed by an assertion failure: "Program: ... Expression: _CrtIsValidHeapPointer(pUserData) ..."
I'm curious to know why it is behaving this way. Also, I am using Visual Studio on Windows 8.
EDIT: I am using namespace std. Please don't comment about using namespaces or about how to better write this code. I'm solely interested in the cause of this issue.
EDIT: Responding to an additional question in a comment.
To be quick, yes it would "still" be a pointer, and yes it compiles with clang and gcc when you add the 4.
There are a couple things going on, however, and my initial answer was a simplification. The problem is that your expression is not well-formed to begin with, so it's not clear what it should evaluate to or what the type should be. Consider
If type is an array type, all dimensions other than the first must be specified as positive integral constant expression (until C++14) / converted constant expression of type std::size_t (since C++14), but the first dimension may be any expression convertible to std::size_t.
Source: http://en.cppreference.com/w/cpp/language/new
As it says, either way there must be an expression in the brackets. This makes it difficult to say whether the expression would still evaluate to a pointer. A well-formed new expression would indeed evaluate to a pointer, no matter how many dimensions it has, even if it has zero. When I say pointer here, I strictly mean the representation, not the type.
The point is that the type, at least "inside" new, is different depending on how many dimensions you have. So, whether you do
new int
new int[6]
new int[12][14]
the representation is the same (a pointer), but the type new sees is different in each case. The compiler is able to respond to the different types in new (think by analogy with function overloading). In particular, when the type is an array type, it is possible to initialize the new memory with the braced initializer list containing multiple elements.
My best guess is, since VS was accepting the brackets without an expression, it was allocating memory for either a single int or int[0]. In the former case, it was wrongly allowing you to brace initialize it as if it was an array type, and in the latter case the allocated memory was not enough anyway. Your main then wrote over a heap guard that is there to catch this sort of thing in debug mode. When this was checked at the end of main or at program termination, you saw the symptoms. The flakiness in the output was either due to different heap layouts or due to buffering in the output stream.
Original answer
Your new expression, if it was well-formed, would have scalar type, meaning that the result is a "single value". That single value is a pointer to an integer, specifically to the one at the beginning of the array you are trying to create. That is how "dynamic arrays" are represented in C++. The type system does not "know" their size.
You are trying to initialize this single pointer value with an initializer list of 4 values. This shouldn't work. I am not sure that this should compile at all. It certainly didn't compile with clang or gcc, and I'm surprised that it worked in Visual Studio.
#include <iostream>
using namespace std;
class Empty{
char omg[0];
};
int main()
{
Empty em1, em2;
Empty set[100];
cout << sizeof(Empty) << " " << sizeof(em1) << " " << sizeof(em2) << endl;
cout << (long*)&em1 << " " << (long*)&em2 << endl;
cout << "total numbers of element is: " << sizeof(set)/sizeof(*set) << endl;
return 0;
}
Its output is:
0 0 0
0xbff36ad0 0xbff36ac8
numbers of elements is: 4
The results are so surprising.
As shown above, Empty is a class, the size of it and its objects are all 0, why?
Maybe, I guess, it is because an empty class's size is 1, and when the class is not empty, its size is decided by its members; but here its member is special, an array of length zero, and this array's size is 0, so the sizes of the class and of its objects are all 0.
It's just my guess. As the program running, we can see that two objects both have address, and the address is different.
Here is my question: if an object of size 0 can be implemented, why does the C++ standard state that empty objects have sizeof() = 1, "to ensure that the addresses of two different objects will be different" (see "Why is the size of an empty class not zero?")? Here we do get different addresses in the output; how does this happen?
Furthermore, no matter what the size of the array set is, the last line's output is always 4. Why?
Thanks :)
PS: I run this program on MacOS, and the compiler is Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
I'll take a stab since no one more experienced has:
As shown above, Empty is a class, the size of it and its objects are all 0, why?
Zero-sized arrays are prohibited by the standard, therefore as far as the standard is concerned sizeof(Empty) is a meaningless expression, you are already in the realm of undefined behaviour.
Here is my question: if object of 0 size can be implemented, [...] Why is the size of an empty class not zero? , but now, we do have different address as the output,how does this happen?
As above, an object of size 0 cannot exist in a valid standard C++ program (with the exception of base class subobjects).
Your compiler allows this as an extension to the standard, and as long as you use this extension within the scope it was intended for (i.e. as a pre-flexible array member hack) you shouldn't have any problems, although your code is not portable. Your example above, however, is not how zero-sized arrays are meant to be used (not to mention there are better constructs in C++ for handling these situations anyway).
Your compiler is intelligent enough to provide separate addresses for em1 and em2, but you should find that all elements of set have in fact the same address.
Further more, no matter what the size of the array set is, the last line output is always 4, why?
Since your compiler considers sizeof(Empty) and arrays of Empty to be zero, you are dividing by zero, which is undefined behavior. You might find your program crashes if you disable optimizations, with GCC for instance your program crashes with -O0 but not with -O1.
I'm trying to write a simple program to show how variables can be manipulated indirectly on the stack. In the code below everything works as planned: even though the address of a is passed in, I can indirectly change the value of c. However, if I delete the last line of code (or any of the last three), this no longer applies. Do those lines somehow force the compiler to put my 3 variables sequentially onto the stack? My expectation was that this would always be the case.
#include <iostream>
using namespace std;
void someFunction(int* intPtr)
{
// write some code to break main's critical output
int* cptr = intPtr - 2;
*cptr = 0;
}
int main()
{
int a = 1;
int b = 2;
int c = 3;
someFunction(&a);
cout << a << endl;
cout << b << endl;
cout << "Critical value is (must be 3): " << c << endl;
cout << &a << endl;
cout << &b << endl;
cout << &c << endl; //when commented out, critical value is 3
}
Your code causes undefined behaviour. You can't pass a pointer to an int and then just subtract an arbitrary amount from it and expect it to point to something meaningful. The compiler can put a, b, and c wherever it likes in whatever order it likes. There is no guaranteed relationship of any kind between them, so you can't assume someFunction will do anything meaningful.
The compiler can place those wherever and in whatever order it likes in the current stack frame; it may even optimize them out if not used. Make the compiler do what you want by using arrays, where pointer arithmetic is safe:
int main()
{
int myVars[3] = {1,2,3};
//In C++, one could use references (which can't be reseated) for convenience,
//which should be optimized/eliminated pretty well.
//But I would never ever use them for pointer arithmetic.
int& a = myVars[0];
int& b = myVars[1];
int& c = myVars[2];
}
What you do is undefined behaviour, so anything may happen. But what is probably going on is that when you don't take the address of c (by commenting out cout << &c << endl;), the compiler may optimize away the variable c. It then substitutes cout << c with cout << 3.
As many have answered, your code is wrong since triggering undefined behavior, see also this answer to a similar question.
In your original code the optimizing compiler could place a, b and c in registers, overlap their stack locations, etc.
There are however legitimate reasons for wanting to know what are the location of local variables on the stack (precise garbage collection, introspection and reflection, ...).
The correct way would then be to pack these variables in a struct (or a class) and to have some way to access that structure (for example, by linking them in a list).
So your code might start with
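void do_something_with(void* frame); /* hypothetical: e.g. links the frame into a list */
void fun (void)
{
struct {
int a;
int b;
int c;
} _frame;
#define a _frame.a
#define b _frame.b
#define c _frame.c
do_something_with(&_frame); // e.g. link it
/* ... rest of fun, using a, b and c ... */
#undef a
#undef b
#undef c
}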
You could also use array members (perhaps even flexible or zero-length arrays for housekeeping routines), and #define a _frame.v[0] etc...
Actually, a good optimizing compiler could optimize that nearly as well as your original code.
The type of _frame could also be defined outside the fun function, and you'd generate housekeeping functions for inspecting, or garbage collecting, that _frame.
Don't forget to unlink the frame at end of the routine. Making the frame an object with a proper constructor and destructor definitely helps. The constructor would link the frame and the destructor would unlink it.
For two examples where such techniques are used (both because a precise garbage collector is needed), see my qish garbage collector and the (generated C++) code of MELT (a domain specific language to extend GCC). See also the (generated) C code of Chicken Scheme or Ocaml runtime conventions (and its <caml/memory.h> header).
In practice, such an approach is much more welcome for generated C or C++ code (precisely because you will also generate the housekeeping code). If writing them manually, consider at least fancy macros (and templates) to help you. See e.g. gcc/melt-runtime.h
I actually believe that this is a deficiency in C. There should be some language features (and compiler implementations) to introspect the stack and to (portably) backtrace on it.