This is my first question on this forum; sorry for my bad English.
I have a question about pointers and dynamic memory in C++.
For example, this code:
#include <iostream>
using namespace std;
int main(int argc, char const *argv[])
{
    int *a = new int;
    for (int i = 0; i < 5; i++)
        cout << a++ << endl;
    return 0;
}
Output:
0x11d4c20
0x11d4c24
0x11d4c28
0x11d4c2c
0x11d4c30
My question is: why can I move beyond that 'single' block of memory that I created with new?
What is a pointing to?
The same happens with new int[], even if I specify the size:
#include <iostream>
using namespace std;
int main(int argc, char const *argv[])
{
    int *a = new int[2];
    for (int i = 0; i < 5; i++)
        cout << a++ << endl;
    return 0;
}
Output:
0x2518c20
0x2518c24
0x2518c28
0x2518c2c
0x2518c30
Again, what is happening?
What is a pointing to?
Does all of this mean I'm violating memory?
a is an int*, not an int. What you are printing is actually the pointer, i.e. the memory address of the pointed-to object. Use the dereference operator * whenever you want to read or modify the pointed-to value, i.e.
cout << (*a)++ << endl;
NB: Likewise, you can get a pointer to an int using the address-of operator &, not to be confused with a reference (e.g. the int& type).
This may print 0 1 2 3 4. I say "may" because you are not initializing the int created in dynamic memory, so reading from *a (the dereferenced a) is undefined behavior, which means your program may misbehave. You have to change the line that uses new:
int *a = new int();
This initializes *a to 0, and now 0 1 2 3 4 will be printed as expected.
Note that int *a = new int[2]; creates a dynamic array of 2 entries, which means *(a + 1) can be used as well (as if it were a regular array). It does not initialize *a to 2.
Do remember to call delete a; when you are done using it, and call it on the original pointer returned by new, not on one you have incremented. In a real application you could get a memory leak if you don't, i.e. your program would keep holding memory it no longer needs. Caution: when you delete a dynamically allocated array (i.e. new int[2]), you need to use delete[] a; instead, or you will trigger undefined behavior.
You may also use a std::unique_ptr (or a std::shared_ptr), available since C++11, as an alternative to this kind of memory allocation; note that std::make_unique was only added in C++14:
#include <memory>
// ...
std::unique_ptr<int> a = std::make_unique<int>(0);
With this approach you do not need to delete a, because the unique_ptr does it for you when it is itself destroyed (here, when it goes out of scope).
Edit: Bonus:
0x2518c20
0x2518c24
0x2518c28
Why is the number incremented by 4 if you just used ++?
Applying ++ to a pointer of type T* advances the address by sizeof(T) bytes, here sizeof(int) (4 on your platform), not by 1. This also explains why, as stated above, you can use *(a + 1) if you used new int[2].
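I think it is because a points to an int, and the size of an int is 4 bytes on your platform (sizeof(int) == 4). After executing a++, a points to the next int. Try char *a and a++ to confirm: the address then advances by 1 (cast the pointer to void* when printing it, since cout prints a char* as a C string).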
It is legal to point at an object or the spot right after an object. A new int[2] creates two adjacent objects (in an array) and returns a pointer to the first one.
So yes, what you did above is not permitted. The behaviour of adding 5 to a pointer to a single object is not defined by the C++ standard; the compiler can generate assembly that does anything at all.
As it happens, on a flat memory architecture, pointers are basically unsigned integers. And incrementing a pointer by 1 is just incrementing it by the size of the pointed-to object.
So what often happens is you just get pointers to whatever happens to be there.
Now this is not guaranteed and should not be relied upon. Many of the rules of C++ that make actions undefined permit certain optimizations to occur over the "naive" mapping you might think the compiler does. For example, pointers to short can never point to an int and change its value in a defined way, which means if a function has both an int and short pointer it can assume that a write to the short does not modify the int.
The naive "write to the two words in an int" method of using shorts can work, then stop working for no apparent reason, because the behaviour was undefined and the compiler was free to optimize assuming it could not happen.
In short, your actions are not legal, the behaviour you got is not surprising, but you can never rely on it.
Pointers are not just unsigned integers, even if that is what your compiler implements them with. They are an abstraction. Their behaviour is determined not by their implementation, but by what the standard permits you to do with them. When you act in ways the standard does not permit, the behaviour you get is undefined by the standard. Compilers can, and have been known to, exploit that fact to assume undefined behaviour cannot and does not occur. The program could behave unexpectedly on lines of code before the undefined behaviour, as the compiler reorders and optimizes based on the assumption that your code has well-defined behaviour.
On common implementations, pointers in C++ (and C) are just memory addresses (32- or 64-bit numbers), and in practice increasing or decreasing them does not touch memory by itself. Strictly speaking, though, the standard only defines arithmetic that stays within an array or one past its end. You would definitely be violating memory if you tried to read or write the address pointed to by a after going through the for loop.
As for what it is pointing to, most likely it's just some more space allocated to you by new (which typically calls malloc underneath), because the allocator tends to give more space than you ask for, though this behaviour is not guaranteed. It might also point to other heap data or just to unassigned memory.
Related
I've been studying C++ for a couple of months now and just recently decided to look more deeply into the logic of pointers and arrays. What I've been taught at uni is pretty basic: pointers contain the address of a variable. When an array is created, basically a pointer to its first element is created.
So I started experimenting a bit (and reached a conclusion that I need confirmation for). First of all, I created
int arr[10];
int* ptr = &arr[5];
And as you would imagine
cout << ptr[3];
gave me arr[8], the element at index 8 of the array. Next I tried
int num = 6;
int* ptr2 = &num;
cout << ptr2[5];
cout << ptr2 + 5;
which, to my great delight (no irony), returned the same addresses, even though num wasn't an array.
The conclusion I reached: an array is not something special in C++; it's just a pointer to the first element (as I already wrote). More importantly: can I think of every pointer as an object of a class variable*? Is operator[] just overloaded in the class int*? For example, something along the lines of:
int operator[] (int index){
    return *(arrayFirstaddress + index);
}
What was interesting to me in these experiments is that operator [] works for EVERY pointer. (So it's exactly like overloading an operator for all instances of the said class)
Of course, I can be as wrong as possible. I couldn't find much information on the web, since I didn't know how to word my question, so I decided to ask here.
It would be extremely helpful if you explained to me if I'm right/wrong/very wrong and why.
You can find the definition of subscripting, i.e. of an expression like ptr2[5], in the C++ standard, e.g. in this online C++ draft standard:
5.2.1 Subscripting [expr.sub]
(1) ... The expression E1[E2] is identical (by definition) to
*((E1)+(E2))
So your "discovery" sounds correct, although your examples seem to have some bugs (e.g. ptr2[5] does not yield an address but an int value, whereas ptr2+5 is an address and not an int value; I suppose you meant &ptr2[5]).
Further, your code is not a proof of this discovery, as it is based on undefined behaviour. It may yield something that supports your "discovery", but your discovery could still be invalid, and it could also do the opposite (really!).
The reason why it is undefined behaviour is that even pointer arithmetic like ptr2+5 is undefined behaviour if the result is outside the range of the allocated memory block ptr2 points to (which is definitely the case in your example):
5.7 Additive operators
(6) ... Unless both pointers point to elements of the same array
object, or one past the last element of the array object, the behavior
is undefined.
Different compilers, different optimization settings, and even slight modifications anywhere in your program may let the compiler do other things here.
An array in C++ is a collection of objects. A pointer is a variable that can store the address of something. The two are not the same thing.
Unfortunately, your sample
int num = 6;
int* ptr2 = &num;
cout << ptr2[5];
cout << ptr2 + 5;
exhibits undefined behaviour, both in the evaluation of ptr2[5] and ptr2 + 5. Pointer expressions are special - arithmetic involving pointers only has defined behaviour if the pointer being acted on (ptr2 in this case) and the result (ptr2 + 5) are within the same object. Or one past the end (although dereferencing a "one past the end" pointer - trying to access the value it points at - also gives undefined behaviour).
Semantically, *(ptr + n) and ptr[n] are equivalent (i.e. they have the same meaning) if ptr is a pointer and n is an integral value. So if evaluating ptr + n gives undefined behaviour, so does evaluating ptr[n]. Similarly, &ptr[n] and ptr + n are equivalent.
In expressions, depending on context, the name of an array is converted to a pointer, and that pointer is equal to the address of that array's first element. So, given
int x[5];
int *p;
// the following all have the same effect
p = x + 2;
p = &x[0] + 2;
p = &x[2];
That does not mean an array is a pointer though.
I once encountered the following code:
int s =10;
int *p=&s;
cout << p[3] << endl;
And I can't understand why I am able to access p[3], which doesn't exist (only p exists, and it is a single pointer, yet I can still access p[3] as if it were an array I never created).
Is it a compiler bug, a feature, or am I missing some C++ basics that cover this?
Thank you
Why does C++ consider pointer and array of pointers as same thing?
It doesn't. You're asking why it treats pointers and arrays as the same.
The [] operator is just an abbreviated form of pointer arithmetic: a[b] is equivalent to *(a + b). Array names can decay into pointers, and then pointer arithmetic is applied. It's the programmer's job to make sure they don't go out of bounds. The compiler can't possibly stop you from shooting yourself in the foot.
Also, claiming to be able to "access" it is a strong assertion. That is UB, and is most likely going to either read the wrong memory or get a segfault.
No, it's not a compiler bug, it's a very useful feature... but let's not get ahead of ourselves here: the consequence of your code is called Undefined Behaviour.
So, what's the feature? In most expressions, a naked array is converted ("decays") to a pointer to its first element; the exceptions are contexts where the array does not decay (see What is array decaying?).
Consider this code:
int s = 10;
int* array = new int[12];
int *p;
p = array;          // p refers to the first element
int* x = p + 7;     // points to array[7]; the compiler never checks bounds
int* y = p + 700;   // ditto... this is obviously undefined
p = &s;             // p now points to s
int* xx = p + 3;    // but s is a single element, so Undefined Behaviour
Once an array is decayed, it's simply a pointer... And a pointer can be incremented, decremented, dereferenced, advanced, assigned or reassigned.
So,
cout << p[7] << endl;
compiles as valid C++, but it is not necessarily correct (with p pointing to s, it is undefined behaviour).
It's the responsibility of the programmer to know whether a pointer points to a single element or an array. But thanks to static analyzers and the C++ Core Guidelines (https://github.com/isocpp/CppCoreGuidelines), things are changing for the better.
Also see What are all the common undefined behaviours that a C++ programmer should know about?
From here, section array-to-pointer decay:
There is an implicit conversion from lvalues and rvalues of array type to rvalues of pointer type: it constructs a pointer to the first element of an array. This conversion is used whenever arrays appear in context where arrays are not expected, but pointers are
Inherited from C, C++ allows you to treat any pointer like the first element of an array starting at that address.
That's in part because arrays are passed to functions as pointers, so for that to make sense you need to be able to treat a pointer as an array.
It also enables some quite neat and very efficient code in various circumstances.
The upshot is that p[3] is a valid construct in this context.
Obviously however it has undefined behaviour because p isn't pointing to an array! Unfortunately the language rules (and compiler) aren't smart enough to work that out.
C is a very low-level language and doesn't enforce nice things like range checking, either during compilation or at run time.
I read Are negative array indexes allowed in C? and found it interesting that negative values can be used as array indices. I tried it again with the C++11 unique_ptr and it works there as well! Of course, the deleter must be replaced with something that can delete the original array. Here is what it looks like:
#include <iostream>
#include <memory>
int main()
{
    const int min = -23; // the smallest valid index
    const int max = -21; // the highest valid index
    const auto deleter = [min](char* p)
    {
        delete[] (p + min);
    };
    std::unique_ptr<char[], decltype(deleter)> up(new char[max - min + 1] - min, deleter);
    // this works as expected
    up[-23] = 'h'; up[-22] = 'i'; up[-21] = 0;
    std::cout << (up.get() - 23) << '\n'; // outputs: hi
}
I'm wondering if there is a very, very small chance that there is a memory leak. The address of the memory created on the heap (new char[max-min+1]) could overflow when adding 23 to it and become a null pointer. Subtracting 23 still yields the array's original address, but the unique_ptr may recognize it as a null pointer. The unique_ptr may not delete it because it's null.
So, is there a chance that the previous code will leak memory or does the smart pointer behave in a way which makes it safe?
Note: I wouldn't actually use this in actual code; I'm just interested in how it would behave.
Edit: icepack brings up an interesting point, namely that only two kinds of pointer values are allowed in pointer arithmetic:
§5.7 [expr.add] p5
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
As such, the new char[N] - min of your code already invokes UB.
Now, on most implementations, this will not cause problems. The destructor of std::unique_ptr, however, will (pre-edit answer from here on out):
§20.7.1.2.2 [unique.ptr.single.dtor] p2
Effects: If get() == nullptr there are no effects. Otherwise get_deleter()(get()).
So yes, there is a chance that you will leak memory here if it indeed maps to whatever value represents the null pointer value (most likely 0, but not necessarily). And yes, I know this is the one for single objects, but the array one behaves exactly the same:
§20.7.1.3 [unique.ptr.runtime] p2
Descriptions are provided below only for member functions that have behavior different from the primary template.
And there is no description for the destructor.
new char[max-min+1] doesn't allocate memory on the stack but on the heap; that's how the standard operator new behaves. The expression max-min+1 is evaluated by the compiler and results in 3, so this expression allocates 3 bytes on the heap. No problem here.
However, subtracting min (i.e. adding 23) yields a pointer 23 bytes past the beginning of the memory returned by new, and since you allocated only 3 bytes, it points to a location you don't own. Anything that follows results in undefined behavior.
Consider this code:
int *p = new int;
cout << sizeof(*p);
delete p;
As expected, the result is 4. Now, consider this other code:
int *p = new int[10];
cout << sizeof(*p);
delete[] p;
I expected to get 40 (the size of the allocated array), but the result is still 4.
Now, suppose I have a function int *foo() that returns a pointer to a structure created with new or with new[] (but I don't know which one):
int *p = foo();
My question is, is there a way (or hack) to know if p points to a single integer or an array of integers?
Please keep in mind that this is just a theoretical question. I won't be writing real code in this fashion.
No, there is no way of doing that. But you know the difference, because the code you wrote called new or new[].
By the way, the reason that:
cout << sizeof(*p);
gives you 4 in both cases is because p is a pointer to an int, the expression *p means the thing pointed to by such a pointer (i.e. an int) and the size of an int on your platform is 4. This is all evaluated at compile time, so even if new[] did return a special value, sizeof would not be able to use it.
No, because what you get back is just an address, and *p has type int (that's why sizeof(*p) gives 4 in both cases). You created it, so you're expected to know what it is.
In both examples the type of p is the same: int *. sizeof operates on the type, not the data. It's computed at compile time.
You have a couple of choices. You can keep track of the array size yourself, or you can venture into using one of the containers in the standard library such as vector< int >. These containers will track the size (e.g. vector< int >::size()) for you.
sizeof(x) returns the amount of memory needed to contain x as declared.
There is no dynamic aspect to this at all.
sizeof (*foo) where foo is a bar * will always be the same as sizeof(bar)
No, there isn't any way.
Obligatory question: Why do you need to know?
If it's "because I need to know whether to say delete [] or delete", then just use arrays all the time, if for some obscure reason you can't figure out which one you used in your own code.
Having a function that can return a pointer to a single item or an array is a bad design decision. You can always return a pointer to an array of size 1:
return new int[1];
First, sizeof(*p) always yields the size of an int, so it always returns 4 (on your platform).
Now, how can you know whether p is pointing to int or int[] ?
There is no standard way to do it. However, you can hack the platform to find out. For example, with certain compilers (on Linux, in my case), if you print p[-1], p[-2], ..., p[-4] etc., you will see a particular pattern in the values at these locations. However, this is just a hack (reading those locations is undefined behaviour) and you cannot always rely on it.
#include<iostream>
using namespace std;
int main()
{
    int *p,*c;
    p=(int*)10;
    c=(int*)20;
    cout<<(int)p<<(int)c;
}
Somebody asked me "What is wrong with the above code?" and I couldn't figure it out. Someone please help me.
The fact that int and pointer data types are not required to have the same number of bits, according to the C++ standard, is one thing - that means you could lose precision.
In addition, casting an int to an int pointer then back again is silly. Why not just leave it as an int?
I actually did try to compile this under gcc and it worked fine but that's probably more by accident than good design.
Some wanted a quote from the C++ standard (I'd have put this in the comments of that answer if the format of comments wasn't so restricted), here are two from the 1999 one:
5.2.10/3
The mapping performed by reinterpret_cast is implementation defined.
5.2.10/5
A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type will have its original value; mappings between pointers and integers are otherwise implementation-defined.
And I see nothing mandating that such an implementation-defined mapping must give a valid representation for every input. In other words, an implementation on an architecture with address registers can very well trap when executing
p = (int*)10;
if the mapping does not give a representation that is valid at that time (yes, what counts as a valid representation for a pointer may depend on time; for instance, delete may invalidate the representation of the deleted pointer).
Assuming I'm right about what this is supposed to be, it should look like this:
#include <iostream>
using namespace std;

int main()
{
    int *p, *c;
    // Something that creates whatever p and c point to goes here; a trivial example:
    int pValue, cValue;
    p = &pValue;
    c = &cValue;
    // The & operator retrieves the memory addresses of pValue and cValue.
    *p = 10;
    *c = 20;
    cout << *p << *c;
}
In order to assign or retrieve a value to a variable referenced by a pointer, you need to dereference it.
What your code is doing is casting 10 to a pointer to int, i.e. treating 10 as the memory address where an int resides.
The addresses p and c may be larger than an int.
The problem is that on some platforms you need
p = (int*) (long) 10;
See GLIB documentation on type conversion macros.
And for those who see no use for this kind of expression: it lets you return data inside the value of a function that returns a pointer. There are real-world cases where this idiom is better than allocating a new integer on the heap and returning it, which would mean poor performance, memory fragmentation, and ugly code.
You're assigning values (10 and 20) to the pointers, which is obviously a problem if you try to read the data at those addresses. Casting the pointer to an integer is also really ugly. And your main function does not have a return statement (harmless in C++, where main implicitly returns 0, but worth writing explicitly). Those are just a few things.
There is more or less everything wrong with it:
int *p,*c;
p=(int*)10;
c=(int*)20;
Afterwards, p points to memory address 10.
Afterwards, c points to memory address 20.
This doesn't look very intentional.
And I suppose that the whole program will simply crash.