Why does a C array have extra bytes at its tail? [duplicate] - c++

This question already has answers here:
What is the purpose of allocating a specific amount of memory for arrays in C++?
(5 answers)
Closed 5 years ago.
I observed that a C array may have some extra bytes at its tail.
Here is my code:
int a = 5;
int test[] = {1,2,3,4};
int b = 5;
test[-1] = 11;
test[4] = 11;
cout << b << endl; // 11
cout << a << endl; // 5
The comments show the result of running it.
The value of b changes when I assign to test[-1], but when I assign to test[4], the value of a does not change.
I used gdb to check the addresses and found that:
In g++ 6.4.0, the address of a minus the address of test[4] is 8 bytes.
In clang++ 3.8.1, the address of a minus the address of test[4] is 4 bytes.
So I am curious: why does the array have some extra bytes at its tail?
Thanks to @Peter A. Schneider for explaining the question.
It is certainly UB, but this is just experimental code, not a discussion of practical code.
Generally, variables on the runtime stack are close together. b is adjacent to test, so why is a not adjacent to test+3? That is the key of the problem.

test[-1] = 11;
test[4] = 11;
This is undefined behavior (meaning anything could have happened). In your case you changed the value of b because b and test happen to be adjacent in memory. But you shouldn't rely on that, because the same code may crash your program or produce erroneous results.
You have UB because accessing an array index out of bounds is undefined behavior.
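As a minimal sketch of avoiding the out-of-bounds write, assuming the array can be a std::array: the helper try_read below is hypothetical (it is not from the question) and uses at(), which checks the index instead of silently touching a neighbouring variable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: reads test[idx] with a bounds check.
// Returns false instead of touching memory outside the array.
inline bool try_read(const std::array<int, 4>& test, std::size_t idx, int& out) {
    try {
        out = test.at(idx);  // at() throws std::out_of_range when idx >= 4
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Usage sketch: index 3 is the last valid element, index 4 is rejected.
inline void bounds_example() {
    std::array<int, 4> test{{1, 2, 3, 4}};
    int value = 0;
    assert(try_read(test, 3, value) && value == 4);
    assert(!try_read(test, 4, value));
}
```

A negative index such as -1 cannot even be expressed here without a conversion, because at() takes an unsigned index.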

Related

Palindrome mystery: Why an array of size 3 ends up being printed with 5 elements?

#include <iostream>
#include <cstring>
using namespace std;
int main(){
char a[] = "abc";
char b[2];
for(int i = 0,k = 2;i < 3;i++,k--){
b[k] = a[i];
cout << i << " " << k << endl;
}
if(strcmp(a,b) == 0){
cout << "palindrome";
}else{
cout << "no palindrome" << endl;
}
cout << "a: " << a << endl;
cout << "b: " << b << endl;
return 0;
}
output:
0 2
1 1
2 0
no palindrome
a: abc
b: cbabc
I don't understand why b array ends up with 5 elements, when the array holds only 3. Additionally, the loop loops only 3 times and this is the output I get.... A mystery.
You have an out-of-bounds array access and also need to be conscious of null-terminating your strings!
Specifically, char b[2]; gives you an array with exactly 2 chars, so only b[0] and b[1] are valid. You also need to account for the null character that should terminate all C-style strings. So to hold "cba" for example you need 4 elements. You can also see this if you print sizeof(a) (should be 4: 'a', 'b', 'c', '\0').
Basically, your program elicits undefined behavior (UB). The simple fix is to make b bigger (the same size as a, which is 4 in this case) and null-terminate it. The more complete answer is to manage your array lengths more carefully and look at the safer "n" versions of the C string functions, such as strncmp.
Edit: to be complete, you have 2 sources of UB. The first is in the line b[k] = a[i] when k == 2, because you have only allocated b[0] and b[1]. The second is the call to strcmp: b has not been null-terminated, and strcmp, which does not know the array bounds, will happily read past them.
b is not terminated by a null character (\0), so any string operation on it (like strcmp, or even just printing it with cout) runs on until it happens to hit such a character somewhere in memory. In other words, you are witnessing undefined behavior.
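The fix described in the answers above could look roughly like the following sketch. The helpers reverse_into and is_palindrome are hypothetical names, and the 64-byte scratch buffer is an arbitrary assumption; the point is only that the destination has room for the terminator and actually receives one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: writes the reverse of src into dst and null-terminates it.
// Returns false if dst cannot hold strlen(src) characters plus the '\0'.
inline bool reverse_into(const char* src, char* dst, std::size_t dst_size) {
    const std::size_t len = std::strlen(src);
    if (dst_size < len + 1) return false;
    for (std::size_t i = 0; i < len; ++i) {
        dst[len - 1 - i] = src[i];
    }
    dst[len] = '\0';  // the step missing from the original program
    return true;
}

// Hypothetical helper: compares s with its reverse.
inline bool is_palindrome(const char* s) {
    char reversed[64];  // arbitrary capacity chosen for this sketch
    if (!reverse_into(s, reversed, sizeof reversed)) return false;
    return std::strcmp(s, reversed) == 0;
}

// Usage sketch: "abc" needs a 4-byte destination, a 2-byte one is refused.
inline void palindrome_example() {
    char b[4];
    assert(reverse_into("abc", b, sizeof b));
    assert(std::strcmp(b, "cba") == 0);
    char too_small[2];
    assert(!reverse_into("abc", too_small, sizeof too_small));
}
```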
Strictly speaking you have undefined behaviour and any observed behaviour (wrong or seemingly partially correct) is explained by that.
For details and solutions see the other answers.
End of answer.
Now let's speculate about why, in your environment, you might end up with specifically the output you observe.
Assume that the memory for your arrays
char a[] = "abc";
char b[2];
is laid out in a frequently seen arrangement, with b directly in front of a:
b[0] non-initialised
b[1] non-initialised
a[0] = 'a'
a[1] = 'b'
a[2] = 'c'
a[3] = '\0'
Note the four (not three) elements of a and the terminator 0.
Your loop, right in the first iteration, attempts to write to the non-existing b[2].
This is already what causes undefined behaviour. Clean discussion ends here.
Let's continue speculating.
Your loop unintentionally writes one place beyond the existing b[1] and ends up clobbering a[0]. By chance it writes the value which happens to be already there, so no change there.
Your loop continues to write, now to existing entries of b.
The speculated result is
b[0] = 'c'
b[1] = 'b'
a[0] = 'a' = 'a'
a[1] = 'b'
a[2] = 'c'
a[3] = '\0'
and the loop ends.
Then you try to output a and b.
This is done by outputting all characters found consecutively from the start of the arrays, until a terminator 0 is found.
For a this (luckily in case of the "a") is "abc\0", all from a.
For b this is "cb" from b, followed (on the search for a 0) by "abc\0" from a.
Note that the seemingly correct "a" already is incorrectly from a, not from b.
When debugging this, you can check the addresses around the end of b.
In gdb:
(gdb) p &b[1]
$8 = 0x7fffffffdfe3 "\377abc"
See? b is not null-terminated: you told the compiler to reserve only 2 chars for b, and my compiler did not initialize them. When you ask the debugger for the address of the last character of b, b[1], it prints not only the address but also the char* string that starts there. Because b has no terminator, that string runs past the end of b. Suspiciously, it ends with 'a' 'b' 'c' '\0'. Let's check the address of a[0]:
(gdb) p &a[0]
$9 = 0x7fffffffdfe4 "abc"
See? The byte immediately after b[1] is a[0], so a is contiguous with b. You are making two mistakes here:
You are not properly initializing b.
b reserves only 2 slots of memory. If you want to check palindromes of a fixed size of 3 characters, you should reserve 4 slots, as you did for the null-terminated string "abc".
Try changing b declaration from:
char b[2];
To:
char b[] = "xyz";
Your loop then overwrites the first three characters of b with the reverse of a and leaves the terminator in place, so the program does what you intend.

Strangely For loop counter variable gets reduced by .get()

Consider the following piece of code. This function reads some integers and strings from a file.
const int vardo_ilgis = 10;
void skaityti(int &n, int &m, int &tiriama, avys A[])
{
ifstream fd("test.txt");
fd >> n >> m >> tiriama;
fd.ignore(80, '\n');
char vard[vardo_ilgis]; // <---
for(int i = 1; i <= n; i++)
{
cout << i << ' ';
fd.get(vard, vardo_ilgis+1); // <---
cout << i << endl;
A[i].vardas = vard;
getline(fd, A[i].DNR);
}
fd.close();
}
and input:
4 6
4
Baltukas TAGCTT
Bailioji ATGCAA
Doli AGGCTC
Smarkuolis AATGAA
In this case, the variable 'vard' has length vardo_ilgis = 10, but fd.get is told to read into vardo_ilgis+1 = 11 bytes (more than the buffer can hold). I'm not asking how to fix the problem, because it's obvious that you should not read more than a variable can store.
However, I really want to understand the reason for this behaviour: the loop counter gets decreased by fd.get. Why and how can this even happen? This is the output of this little piece of code:
1 0
1 0
1 0
1 0
1 1
2 2
3 3
4 4
Why did you use +1 ??
fd.get(vard, vardo_ilgis+1);
Overrunning that buffer corrupts some memory. In a simple unoptimized build, that corrupted memory could be the loop index.
the loop count variable gets decreased by fd.get. Why and how even can this happen?
Once you know why you have caused undefined behavior, many people say you aren't supposed to inquire into the details of that undefined behavior. I disagree. By understanding the details, you can improve your ability to diagnose other situations where you don't know what undefined behavior you might have invoked.
All your local variables are stored together, so overrunning one will tend to clobber another.
You describe the variable being "decreased" when in fact it was set to zero. The fact that it was 1 before being zeroed didn't affect its being zeroed. The undefined behavior happened to be equivalent to i&=~255; which for values under 256 is equal to i=0;. It is more accidental that you could see it as i--;
Hopefully it is clear why i stopped being zeroed once you ran out of input.
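To make the i&=~255 observation concrete, here is a small sketch. It assumes, as the answer above speculates, that the stray terminating '\0' lands exactly on the least significant byte of i; the function name clear_low_byte is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the accident: the '\0' written one past the end of
// vard replaces the least significant byte of the loop counter.
inline std::uint32_t clear_low_byte(std::uint32_t i) {
    return i & ~std::uint32_t{255};
}

// Usage sketch: every counter value below 256 becomes 0, which for i == 1
// merely looks like a decrement.
inline void clobber_example() {
    assert(clear_low_byte(1) == 0);
    assert(clear_low_byte(4) == 0);
    assert(clear_low_byte(300) == 256);  // only the low byte is lost
}
```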
fd.get(vard, vardo_ilgis+1); writes past the end of the buffer.
In your case, the area where you write (and where you should not) is probably the same memory area where i is stored.
But what matters most is that you end up with the infamous undefined behaviour, which means anything could happen. What actually happens is platform, compiler and even context specific, so it cannot be predicted in general.
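For the overrun itself, one possible repair is to give the buffer room for the terminator that get() always appends. This is a sketch under assumptions: Record is a hypothetical stand-in for the question's avys type, and read_record is not the author's function.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

const int vardo_ilgis = 10;

// Hypothetical stand-in for the question's avys struct.
struct Record {
    std::string vardas;
    std::string DNR;
};

// Reads one fixed-width name followed by the rest of the line.
inline bool read_record(std::istream& in, Record& r) {
    char vard[vardo_ilgis + 1];  // 10 characters plus the terminating '\0'
    // get() stores at most sizeof vard - 1 characters and then appends '\0'.
    if (!in.get(vard, sizeof vard)) return false;
    r.vardas = vard;
    return static_cast<bool>(std::getline(in, r.DNR));
}

// Usage sketch with one line of the question's input.
inline void read_example() {
    std::istringstream in("Smarkuolis AATGAA\n");
    Record r;
    assert(read_record(in, r));
    assert(r.vardas == "Smarkuolis");
}
```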

Why does sizeof behave differently for a double pointer and a double array? [duplicate]

This question already has answers here:
Pointer array and sizeof confusion
(5 answers)
Closed 8 years ago.
I wrote some code, a fragment of which is shown below. I don't understand why it does not print 800 for the pointer variable p.
double *p = new double [100];
double q[10];
printf("Sizeof(p) = %zu\n", sizeof(p)); // prints 4
printf("Sizeof(q) = %zu\n", sizeof(q)); // prints 80
I understand why it prints 80 for q (8 bytes/double * 10) but why not 800 for p? An associated question would be, how does the compiler know how much space to deallocate when it encounters the delete for p?
delete [] p;
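Because p is a pointer, and sizeof(p) is the size of the pointer itself, which is 4 bytes on your platform. If you wanted the size of the element p points to, you would say:
sizeof(*p);
That yields the size of one double (8), not of the 100-element allocation; sizeof cannot recover the size of a new[] allocation from the pointer.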
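The three cases can be checked at compile time. The sketch below also shows std::vector as one way to keep the element count available at run time; the helper bytes_of is a hypothetical name, not part of the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: payload size of a vector, computed from its run-time size.
inline std::size_t bytes_of(const std::vector<double>& v) {
    return v.size() * sizeof(double);
}

inline void sizeof_example() {
    double* p = new double[100];
    double q[10];
    // sizeof looks only at the static type of its operand.
    static_assert(sizeof(q) == 10 * sizeof(double), "array: the whole object");
    static_assert(sizeof(p) == sizeof(double*), "pointer: just the address");
    static_assert(sizeof(*p) == sizeof(double), "one pointed-to element");
    delete[] p;

    std::vector<double> v(100);
    assert(bytes_of(v) == 100 * sizeof(double));
}
```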

Strlen returns undefined behaviour with C++ [duplicate]

This question already has answers here:
Strlen returns unreasonable number
(4 answers)
Closed 8 years ago.
int main ()
{
char* tab=new char[14] ;
cout << " length with sizeof: "<<sizeof(tab)<<endl;
cout << " length with strlen: "<<strlen(tab)<<endl;
system(" pause");
return 0;
}
I got the output:
length with sizeof: 4
length with strlen: 30
I expected the result of sizeof, but not what strlen returns!
To those who will hurry to mark this as a duplicate question:
I want to say that it is not one at all. I know about compile time versus run time and many other things concerning strlen and sizeof, but I cannot find an explanation for this result.
Thank you in advance for your help.
Since you allocate a char array but do not initialize it, strlen() counts from the start of tab to the first NUL character it happens to find, which may lie beyond the 14 allocated bytes. So the result depends on the contents of your program's heap.
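As a sketch of how to get a predictable strlen result, the buffer can be value-initialized so that every byte starts out as '\0'. The function names below are hypothetical, and std::unique_ptr is used here only to release the memory automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// The trailing () value-initializes the array: all 14 bytes are '\0'.
inline std::size_t length_of_zeroed_buffer() {
    std::unique_ptr<char[]> tab(new char[14]());
    return std::strlen(tab.get());
}

// Copies at most 13 characters so that tab[13] stays the terminator.
inline std::size_t length_after_copy(const char* text) {
    std::unique_ptr<char[]> tab(new char[14]());
    std::strncpy(tab.get(), text, 13);
    return std::strlen(tab.get());
}

// Usage sketch.
inline void strlen_example() {
    assert(length_of_zeroed_buffer() == 0);
    assert(length_after_copy("hello") == 5);
}
```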

Manual memory management in C++ [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
Summary of the code: it stores over 16 million uint8_t values in a 3D array as pointers to those uint8_t.
The code works, but why did I save only 4 KB by using uint8_t as opposed to int? When I run this same code with ints it uses 330,488 K, but with uint8_t it uses 330,484 K. I know most of that is the pointers, but (assuming each int used minimum space) shouldn't decreasing the size of each of the 16 million ints from 2 bytes to 1 byte have saved more than 4 K? I'm thinking it should have saved closer to 16 MB, right?
By "run the same code with ints" I mean I literally do a find and replace of uint8_t with int and then recompile.
uint8_t**** num3d;
num3d = new uint8_t***[256];
for(int i=0;i<256;i++){
num3d[i] = new uint8_t**[256];
for(int j=0;j<256;j++){
num3d[i][j] = new uint8_t*[256];
}
}
// Initialize
uint8_t *B;
for(int lx = 0;lx<256;lx++){
for(int ly= 0;ly<256;ly++){
for(int lz=0;lz<256;lz++){
if(ly == 0 || lx == 0 || lz == 0 || ly == 255 || lx == 255 || lz == 255){
B = new uint8_t(2);
num3d[lx][ly][lz] = B;
continue;
}
if(ly < 60){
B = new uint8_t(1);
num3d[lx][ly][lz] = B;
continue;
}
B = new uint8_t(0);
num3d[lx][ly][lz] = B;
} // inner inner loop
} // inner loop
} // outer loop
Answer to question 1)... This loop goes on forever:
for (uint8_t i=0;i<256;i++)
Indeed, the range of numbers representable by a uint8_t is 0...255, so i<256 is always true. Don't use uint8_t here!
It seems to me that since your program allocates inside this loop, it will end up eating all memory, so question 2) doesn't really make sense.
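The wrap-around behind that endless loop can be sketched as follows. The function names are hypothetical; the loop uses an int counter and converts to uint8_t only for the stored value.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// uint8_t arithmetic is modulo 256, so 255 + 1 comes back as 0.
inline std::uint8_t next_value(std::uint8_t i) {
    return static_cast<std::uint8_t>(i + 1);
}

// A counter that can actually reach 256 lets the loop terminate.
inline int count_iterations() {
    int iterations = 0;
    for (int i = 0; i < 256; ++i) {
        std::uint8_t value = static_cast<std::uint8_t>(i);  // always fits: 0..255
        (void)value;
        ++iterations;
    }
    return iterations;
}

// Usage sketch.
inline void wrap_example() {
    assert(std::numeric_limits<std::uint8_t>::max() == 255);
    assert(next_value(255) == 0);
    assert(count_iterations() == 256);
}
```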
"My question is what is it about int that allows it to work using full 32 bit ints and how would I replicate what the program already does with ints for use with 8 bit ints. I know they must have included memory management into normal ints that isn't included with uint8_t."
Well, int is at least 16 bits; 32 bits isn't even guaranteed. But ignoring that, the fact is that each integral type has a certain range. std::numeric_limits<int> or std::numeric_limits<uint8_t> will tell you the respective ranges. Obviously you can't use an 8 bit number to count from 0 to 256. You can only count to 255.
Also, there's no memory management at all for int and other simple types like uint8_t. The compiler just says "The integer with name Foo is stored in these bytes" and that's it. No management needed. There are a few minor variations, e.g. an int member of a struct is stored "in these bytes of the struct", etcetera.
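To illustrate "stored in these bytes" for the 3D case, here is a sketch that keeps the uint8_t values themselves in one contiguous block instead of behind individual pointers. Grid3D is a hypothetical class, not the asker's design; with side 256 its payload would be 256 * 256 * 256 = 16,777,216 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat 3D grid: one byte per cell, no per-cell allocation.
class Grid3D {
public:
    explicit Grid3D(std::size_t side)
        : side_(side), cells_(side * side * side, 0) {}

    // Maps (x, y, z) to a single index into the contiguous block.
    std::uint8_t& at(std::size_t x, std::size_t y, std::size_t z) {
        assert(x < side_ && y < side_ && z < side_);
        return cells_[(x * side_ + y) * side_ + z];
    }

    // Bytes used by the cell values themselves.
    std::size_t bytes() const { return cells_.size() * sizeof(std::uint8_t); }

private:
    std::size_t side_;
    std::vector<std::uint8_t> cells_;
};
```

A small usage example: Grid3D g(4); g.at(1, 2, 3) = 2; stores the value directly in the grid, and g.bytes() reports 64.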