C++: Write to/read from invalid/out of bound array index? - c++

First of all, I am a beginner when it comes to C++ programming. Yesterday I encountered something rather strange. I was trying to determine the length of an array via a pointer pointing towards it. Since sizeof didn't work I did a little Google search and ended up on this website where I found the answer that it was not possible. Instead I should put an out of bound value at the last index of the array and increment a counter until this index is reached. Because I didn't want to overwrite the information that was contained at the last index, I tried putting the out of bound value one index after the last one. I expected it to fail, but for some reason it didn't.
I thought that I had made a mistake somewhere else and that the array was longer than I had declared it to be, so I made the following test:
int a[4];
a[20] = 42;
std::cout << a[20];
The output is 42 without any errors. Why does this work? This should not be valid at all, right? What's even more interesting is the fact that this works with any primitive type array. However, once I use a std::string the program instantly exits with 1.
Any ideas?

The memory that lies 20 * sizeof(int) bytes past the beginning of your array just happens not to be used by anything that matters. Or the memory belongs to your process, so you can mess with it and either break something for yourself or, by lucky coincidence, break nothing.
Bottom line, don't do that :)

I think what you need to understand is the following:
when you create a[4], the compiler allocates memory for 4 integers, and a refers to the address of the first one: a == &a[0] (equivalently, *a == a[0]).
when you read or write, the compiler doesn't check whether you are within the bounds (it is not required to, and once the array has decayed to a pointer it no longer has this information). It just goes to the address of the requested cell of the array in the following way: a[X] == *(a + X), which is the address of a plus sizeof(int) * X bytes.
in C++ it's the programmer's responsibility to check the bounds when accessing an array.
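To make the "programmer checks the bounds" point concrete, here is a minimal sketch. It swaps the raw array for a std::vector, because vector::at() is bounds-checked and throws std::out_of_range, while operator[] is not checked. The helper name try_read is made up for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copies values[index] into `out` and returns true,
// or returns false (leaving `out` untouched) when index is out of range.
bool try_read(const std::vector<int>& values, std::size_t index, int& out) {
    try {
        out = values.at(index);  // at() checks the index; operator[] would not
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With a vector of 4 elements, try_read(v, 3, out) succeeds, while try_read(v, 20, out) reports failure instead of silently touching unrelated memory.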

Related

Visual Studio Variable Type Asterisk

Using Visual Studio, I am running into some trouble with an int type variable and a float type variable. They are both stored in their own arrays. When I go to print them out they come out as memory location gibberish. When I debug I noticed that the correct value is displayed next to the memory gibberish in the watch area. I also noticed that under type, the variable types have an * (asterisk) next to them. Could anybody offer information as to why this would happen? Thanks in advance.
Watch area looks like this...
Name Value Type
score 0x002ff5c8 {96.0000000} float *
studentID 0x002ff698 {9317} int *
I recommend reading into the tutorial above and perhaps an introductory book depending on how interested you are in pursuing learning c++. The type is a pointer type to int and float data. Here is a small example that answers the question (how to print out these values):
float* a = new float(5.8);
the pointer is established, this points to a memory location where a float with the value 5.8 is stored.
std::cout << *a;
The asterisk before a is called dereferencing a pointer, this is how the data is accessed, you may want to check to make sure you have a valid pointer or your program can crash.
delete a;
Delete the allocated memory once it will no longer be used (see EDIT 2). This frees the space that was given to store the float a points to (failing to do so causes a memory leak).
EDIT 1:
Consider that the pointer may point to a contiguous array of floats or int (which is much more likely than just one), then you will have to know the size of the array you are reading to access the elements. In this case, you will use the operator [] to access the members, let's say we have an array b,
float* b = new float[2] {0.0,1.0};
to print its members you would have to access each element
std::cout << b[0] << ' ' << b[1];
the delete operator looks like this for arrays
delete[] b;
EDIT 2:
Whenever you use new to dynamically allocate memory, think about the scope of the variable, when the scope is over delete the pointer. User is correct, you do not want to delete a pointer which may be used later, nor is it necessary to call delete to pointers obtained from references.
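One way to tie the delete to the end of the scope, sketched here as an assumption rather than something the answer itself proposes, is to hand the result of new to a std::unique_ptr. The function name sum_pair is invented for illustration.

```cpp
#include <memory>

// Sums two floats that live in a heap-allocated array.
float sum_pair(float first, float second) {
    // unique_ptr<float[]> calls delete[] automatically when `values`
    // goes out of scope, so there is no manual delete[] to forget.
    std::unique_ptr<float[]> values(new float[2]{first, second});
    return values[0] + values[1];
}
```

The array is freed when sum_pair returns, on every path out of the function.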
First off you are using the debugger. This is awesome. The sheer number of SO questions that could be solved in five minutes with a debugger is staggering. You are already far, far ahead of the game than a lot of the time-wasting sad sacks who can't be bothered to use the expletive deleted tools that came with the compiler.
Second, some important reading because it explains part of what is going on: What is array decaying?
Now to break down what the debugger is showing you
score 0x002ff5c8 {96.0000000} float *
score: Obviously the variable's name
float *: score is a variable of type float *, a pointer to a float.
0x002ff5c8: This is the data value of score. Pointers are a reference to a location in memory. Rather than being data, they point to data. So a pointer is a variable that contains where to find, the address of, another variable. 002ff5c8 is the hexadecimal location in memory where you will find what score points to.
{96.0000000}: score points to a floating point value that has been set to 96 (possibly plus or minus some fuzziness because not all numbers can be exactly represented with floating point)
So the crazy number 0x002ff5c8 tells the program where to find score's data, and this data happens to be 96.
Note the debugger only shows you the first value in the array of data that is at score, which brings us back to array decaying. Odds are good that the program has knowledge of how much data is pointed at by score. Could be one float. Could be a million. You have to carry the length of a block of an array around with it once the array has decayed.
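A small sketch of "carrying the length around", under the assumption that the data starts out as a real array. The names array_length and sum_scores are hypothetical.

```cpp
#include <cstddef>

// The array is passed by reference here, so its size N is still part of
// the type and the compiler can deduce it.
template <typename T, std::size_t N>
std::size_t array_length(const T (&)[N]) {
    return N;
}

// After decay only a pointer is left, so the caller has to pass the
// element count explicitly.
float sum_scores(const float* scores, std::size_t count) {
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        total += scores[i];
    }
    return total;
}
```

Calling sum_scores(score, array_length(score)) works while score is still an array; once only a float * is left, array_length no longer compiles and the count has to come from somewhere else.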

How do I take the address of one past the end of an array if the last address is 0xFFFFFFFF?

If it is legal to take the address one past the end of an array, how would I do this if the last element of array's address is 0xFFFFFFFF?
How would this code work:
for (vector<char>::iterator it = vector_.begin(); it != vector_.end(); ++it)
{
}
Edit:
I read here that it is legal before making this question: May I take the address of the one-past-the-end element of an array?
If this situation is a problem for a particular architecture (it may or may not be), then the compiler and runtime can be expected to arrange that allocated arrays never end at 0xFFFFFFFF. If they were to fail to do this, and something breaks when an array does end there, then they would not conform to the C++ standard.
Accessing out of the array boundaries is undefined behavior. You shouldn't be surprised if a demon flies out of your nose (or something like that)
What might actually happen would be an overflow in the address which could lead to you reading address zero and hence segmentation fault.
If you always stay within the array range, and the last ++it moves the iterator one past the end and you only compare it against vector_.end(), then you are not really accessing anything and there should not be a problem.
I think there is a good argument for suggesting that a conformant C implementation cannot allow an array to end at (e.g.) 0xFFFFFFFF.
Let p be a pointer to one-element-off-the-end-of-the-array: if buffer is declared as char buffer[BUFFSIZE], then p = buffer+BUFFSIZE, or p = &buffer[BUFFSIZE]. (The latter means the same thing, and its validity was made explicit in the C99 standard document.)
We then expect the ordinary rules of pointer comparison to work, since the initialization of p was an ordinary bit of pointer arithmetic. (You cannot compare arbitrary pointers in standard C, but you can compare them if they are both based in a single array, memory buffer, or struct.) But if buffer ended at 0xFFFFFFFF, then p would be 0x00000000, and we would have the unlikely situation that p < buffer!
This would break a lot of existing code which assumes that, in valid pointer arithmetic done relative to an array base, the intuitive address-ordering property holds.
It's not legal to access one past the end of an array
that code doesn't actually access that address.
and you will never get an address like that on a real system for your objects.
The difference is between dereferencing that element and taking its address. In your example the element past the end won't be dereferenced, and so the code is valid. Although this was not really clear in the early days of C++, it is clear now. Also the value you pass to subscript does not really matter.
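The distinction can be sketched with plain pointers. In this illustrative helper (count_char is a made-up name), the one-past-the-end address is formed and compared against, but never dereferenced.

```cpp
#include <cstddef>

// Counts occurrences of `target` in the half-open range [first, last).
// `last` may be the one-past-the-end address of an array: it is only
// compared against, never dereferenced.
std::size_t count_char(const char* first, const char* last, char target) {
    std::size_t count = 0;
    for (const char* it = first; it != last; ++it) {
        if (*it == target) {  // only addresses before `last` are read
            ++count;
        }
    }
    return count;
}
```

For char buffer[4], the call count_char(buffer, buffer + 4, 'a') is fine, because buffer + 4 is only used as a stopping point.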
Sometimes the best thing you can do about corner cases is forbid them. I saw this class of problem with some bit field extraction instructions of the NS32032 in which the hardware would load 32 bits starting at the byte address and extract from that datum. So even single-bit fields anywhere in the last 3 bytes of mapped memory would fail. The solution was to never allow the last 4 bytes of memory to be available for allocation.
Quite a few architectures that would be affected by this solve the problem by reserving offset 0xFFFFFFFF (and a bit more) for the OS.

Vector Ranges in C++

Another quick question here, I have this code:
string sa[6] = {
"Fort Sumter", "Manassas", "Perryville",
"Vicksburg", "Meridian", "Chancellorsville" };
vector<string> svec(sa, sa+6);
for (vector<string>::iterator iter = svec.begin(); iter != svec.end(); iter++)
{
std::cout << *iter << std::endl;
}
Why is it that when I do svec(sa, sa+7), the code works but it prints out an empty line after the last word and when I do sa+8 instead it crashes? Because the string array is only 6 elements big, shouldn't it crash at sa+7 also?
Thanks.
Accessing past the end of a vector is undefined behavior. Anything could happen. You might have nasal demons.
You have an array of only six elements. When you try to access the supposed "seventh" element, you get undefined behavior. Technically, that means anything can happen, but that doesn't seem to me like a very helpful explanation, so let's take a closer look.
That array occupies memory, and when you accessed the element beyond the end, you were reading whatever value happened to occupy that memory. It's possible that that address doesn't belong to your process, but it probably does, and so reading the sizeof(string) bytes that reside in that space usually won't fault by itself.
Your program read from it and, since it was reading it through a string array, it treated that memory as though it were a real string object. (Your program can't tell the difference. It doesn't know any better. It's just trying to carry out your instructions.) Apparently, whatever data happened to be there looked enough like a real string object that your program was able to treat it like one, at least long enough to make a copy of it in the vector and then print its (empty) value. It worked this time, but that doesn't mean it will work every time.
There was no such luck with the data in the "eighth" position of the array. It did not look enough like a valid string object. A string object usually contains a pointer to the character data, along with a length. Maybe the area of the object that would normally represent that pointer didn't contain a valid address for your program. Or maybe the part that represented the length field contained a value far larger than what was available at the address in the pointer.
Your application does not crash because there is some standard that specifies that it should crash. Crashing is just random (undefined) behaviour. You will not always get a crash when you exceed the bounds of an array as you have found out.
Essentially, anything could happen such as printing a blank line, crashing or even as just posted - have demons fly out of your nose.
C++ doesn't do range-checking of arrays.
Reading beyond the end of an array is what's called "undefined" behaviour: i.e. it's not guaranteed to throw an exception, it's not guaranteed to not throw an exception, and it's not guaranteed to have consistent behaviour from one run to the next.
If people say that C++ is an "unsafe" language, this is part of what they mean by that. C++ doesn't check the range at run-time, because doing that at run-time takes extra CPU instructions, and part of the design philosophy of C++ is to make it no slower than C.
Your compiler might have been able to warn you at compile-time (are you using the compiler command-line options to give you the maximum possible number of warnings?), though that too isn't guaranteed/required by the language.
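One way to avoid the hand-written sa+6 (and the temptation to try sa+7) is to let the library work out the range. This is a sketch, assuming C++11 or later for std::begin and std::end; the function name make_battles is invented.

```cpp
#include <iterator>
#include <string>
#include <vector>

std::vector<std::string> make_battles() {
    const std::string sa[] = {
        "Fort Sumter", "Manassas", "Perryville",
        "Vicksburg", "Meridian", "Chancellorsville"};
    // std::begin/std::end derive the range from the array type, so there
    // is no hard-coded element count that could run past the end.
    return std::vector<std::string>(std::begin(sa), std::end(sa));
}
```

The resulting vector holds exactly the elements of the array, however many initializers are listed.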

What may cause losing object at the other end of a pointer in c++?

EDIT: I have found the error: I did not initialize an array with a size. question can be closed.
I have a class V, and another class N. An object of N will have an array of pointers to objects of class V (say V **vList). So, N has a function like
V **getList();
Now in some function of other classes or simply a driver function, if I say V **theList = (N)n.getList(); Q1: theList would be pointing at the 1st element of the array? Given that the size of array is known, can I loop through with index i and say V *oneV = *vList[i]? Please correct me if what I'm doing above is wrong.
I have been using the debugger to trace through my program as it runs. What I found was that after using V *oneV = vList[i], the values of the pointers in the array, vList, were the same as when they were created, but if I follow a pointer to where it is pointing, the object is gone. I'm guessing that might be the reason why I am getting a seg fault or bus error. Could it be the case? WHY did I 'lose' the object at the other end of a pointer? What did I do wrong?
and yes, I am working on a school assignment, that's why I do not want to print out my codes, I want to finish it myself, but I need help finding a problem. I think I still need explanation on array of pointers. Thank you
Q1 is right. For the second part, V *oneV = vList[i] would be the correct syntax. In your syntax you are dereferencing one more time (treating an object of type V as a pointer to such an object) which obviously is crashing your code.
EDIT:
Since you are using the correct syntax, the reason of segfaults would depend on your memory management of the objects of type V. If you have inserted addresses of objects created on the stack (automatic vars, not by new or malloc) inside a function and are trying to access them outside of it, then the pointers would be dangling and your code will crash.
Class N has to manage the number of elements in a list somehow. The usual approaches are to make a public function which returns the number of elements in the array, or to provide an iterator function which loops over all the list's elements.
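Since the question has no code, the following is only a guess at the shape of the classes. It sketches an N that owns its V objects on the heap (so they do not dangle when the creating function returns) and exposes the element count. The members id, add, size and get are all invented for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for the question's class V; the `id` member is made up.
struct V {
    int id;
};

// Stand-in for class N: it owns its V objects and knows how many it holds.
class N {
public:
    void add(int id) {
        std::unique_ptr<V> item(new V{id});  // heap object outlives this call
        items_.push_back(std::move(item));
    }
    std::size_t size() const { return items_.size(); }
    // Returns nullptr instead of reading past the end of the list.
    const V* get(std::size_t i) const {
        return i < items_.size() ? items_[i].get() : nullptr;
    }
private:
    std::vector<std::unique_ptr<V>> items_;
};
```

A caller can then loop with for (std::size_t i = 0; i < n.size(); ++i) and never has to guess where the list ends.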
The elements of an array with N elements are stored at array[0] through array[N-1]. You're accessing one past the end of the array.
First rule out the initial ones:
you are initializing correctly (new instead of automatic/local variables)
you are accessing the elements correctly (not like in the typo you posted in the question - based on your comment)
you are using the right size
If you go through all the normal ones and everything is OK, then make sure to pay special attention to your loops, size calculations, and anything else that could be causing you to write to unintended addresses.
It is possible to write garbage at unintended locations and then get the error in unexpected places. The worst case I saw like that was some file descriptor variables being corrupted because of an array gone wrong right before those variables. It broke in file-related functions, which seemed very crazy.
"theList would be pointing at the 1st element of the array? Given that the size of array is known, can I loop through with index i and say V *oneV = *vList[i]?"
Yes, that is correct.
"I'm guessing that might be the reason why I am getting seg fault or bus error. Could it be the case?"
Yes, if you have an invalid pointer and try to dereference it you'll get a segfault.
"WHY did I 'lose' the object at the other end of a pointer? What did I do wrong?"
That is difficult to predict without seeing the actual code. Most probable causes are that either you are not filling the V** correctly or after putting a V* pointer inside V** array you are deleting that object from some other place. BTW, I am assuming that you are allocating memory using new, is this assumption correct?

What is the point of pointer types in C++?

Let's say I have some pointers called:
char * pChar;
int * pInt;
I know they both simply hold memory addresses that point to some other location, and that the types declare how big the pointed-to memory location is. So for example, a char might be the size of a byte on a system, while an int may be 4 bytes. So when I do:
pChar++; // I am actually incrementing the address pointed to by pChar by 1 byte;
pInt++; // I am actually incrementing the address pointed to by pInt by 4 bytes;
But what if I do this:
pChar+2; // increment the address pointed to by pChar by 2 bytes?
pInt+2; // increment the address pointed to by pInt by 2 bytes? what happens to the other two bytes?
Thanks.. Would appreciate any clarification here.. Is the pointer type simply for the ++ operation?
EDIT: So avp answered my question fittingly, but I have a follow up question, what happens when I do:
memcpy(pChar,pInt,2);
Will it copy 2 bytes? or 4 bytes? Will I have an access violation?
EDIT: The answer, according to Ryan Fox, is 2 bytes, because both pointers are converted to void*. Thanks! CLOSED!
EDIT: Just so that future searchers may find this.. Another piece of info I discovered..
memcpy(pChar+5,pInt+5,2);
doesn't copy 2 bytes from the location 5 bytes past pInt to the location 5 bytes past pChar. What happens is that 2 bytes are copied to the location 5 bytes past pChar from the location 4*5 = 20 bytes past pInt. No wonder I got access violations, I was trying to read from somewhere I wasn't supposed to be reading. :)
"++" is just another name for X = X + 1;
For pointers it doesn't matter if you increment by 1 or by N.
Either way, the address advances by sizeof(type)*N bytes. In the case of 1 that is just sizeof(type).
So, when you increment by 2 (your second case):
for char is 2*sizeof(char)=2*1=2 bytes,
for int will be 2*sizeof(int)=2*4=8 bytes.
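The scaling can be checked directly. This is a small sketch (byte_step is a made-up name) that measures the distance in bytes between p and p + n for a given element type, without assuming any particular sizeof(int).

```cpp
#include <cstddef>

// Returns how many bytes separate p and p + n for element type T.
template <typename T>
std::ptrdiff_t byte_step(std::ptrdiff_t n) {
    T storage[8] = {};  // assumption: n stays within 0..8 so p + n is valid
    T* p = storage;
    // Casting to char* turns the element distance into a byte distance.
    return reinterpret_cast<char*>(p + n) - reinterpret_cast<char*>(p);
}
```

byte_step<char>(2) gives 2, and byte_step<int>(2) gives 2 * sizeof(int), which is 8 on systems with a 4-byte int.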
Ahh, now I understand. You should have asked - "What is the point of pointers having types?"
There are two points, actually:
Pointer arithmetics;
Dereferencing (getting the value back that is stored in the address that the pointer is pointing to).
Both would be impossible without knowing the type of the pointer.
Added: Read the documentation of memcpy. The last argument is number of bytes, because memcpy has no idea what the type of the pointer is. Both arguments to it are void pointers.
Added 2: Access violation - it depends. If you aren't going outside of the memory that you have allocated for these pointers, there will be no access violation. The copy operation will copy everything byte-by-byte and you will get your results just like you expect them (although it might not make much sense).
If you are going outside your allocated memory bounds then you might get an access violation, but you might as well just cross over into the memory that was allocated for another variable. It's pretty much impossible to tell what gets where when your program is executed, so doing this will lead to quite unpredictable results.
There are three main advantages of pointers:
You can pass arguments to function "by reference". This used to be more of an issue in C, which didn't have real references like C++, but it's still very useful in many cases, like when you have to cooperate with external libraries. Also notice, that passing by reference is not only useful when you want the function to modify the variable you're passing. It's also very good for passing large data structures as parameters.
For building all kinds of nifty dynamic data structures like trees, linked lists, etc. This would be impossible without pointers.
For being able to re-allocate arrays to bigger/smaller ones as needed.
P.S. I understand that the question was about why pointers are good, using the arithmetics only as an example, right?
Pointer arithmetic doesn't work precisely that way. Your first example is correct, the second not so much.
pChar+2; // increment the address pointed to by pChar by 2 bytes
pInt+2; // increment the address pointed to by pInt by 8 bytes
For this part:
memcpy(pChar+5,pInt+5,2);
First, "+" is evaluated, then, typecast.
So in bytes:
pChar+5 here "5" is 5 bytes,
pInt+5 here "5" is 5 ints, so 5 * 4 = 20 bytes.
Then everything is cast to void* and two bytes copied.
If instead of "5" you use counter, like here:
for (int i = 0; i<100; i++)
memcpy(pChar+i, pInt+i, 2);
Then for pChar you will be overwriting one copied byte (the second) with the next copy command. And for pInt you will be jumping 4 bytes each step (which is ok for array of ints though).
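To see the "+ first, then conversion to void*" order in action, here is a self-checking sketch. The buffers and the function name copies_from_fifth_int are invented for illustration, and both buffers are large enough that nothing is read or written out of bounds.

```cpp
#include <cstring>

// Returns true when memcpy(dst + 5, src + 5, 2) copied exactly the first
// two bytes of src[5] into dst[5] and dst[6].
bool copies_from_fifth_int() {
    int src[8] = {0, 1, 2, 3, 4, 0x1234, 6, 7};
    char dst[16] = {};
    // src + 5 advances by 5 ints (5 * sizeof(int) bytes);
    // dst + 5 advances by 5 bytes.
    std::memcpy(dst + 5, src + 5, 2);
    const char* src_bytes = reinterpret_cast<const char*>(src);
    return std::memcmp(dst + 5, src_bytes + 5 * sizeof(int), 2) == 0
        && dst[4] == 0 && dst[7] == 0;  // neighbours were left untouched
}
```

The comparison uses the byte offset 5 * sizeof(int) on the source side, which is the same address memcpy received as src + 5.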
I would have said that the point of pointer types in C++ is to account for vtable offsets.