Consider this code:
#include <string.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    const char *task = "foo";
    int key = 0;
    int arr[] = {};
    if (!strcmp(task, "foo")) {
        key++;
    }
    arr[key] = 2;
    key++;
    printf("key: %d\n", key);
}
The final value of key is 3. It should be 2.
If I change "foo" to "foo1" on the first occurrence, the final value of key is 1, as expected.
If I change arr[key] = 2 to arr[key] = 1, or remove that line, the final value of key is 2, as expected.
Why is this?
You have undefined behavior in your code, so anything can happen.
int arr[] = {};
declares a zero-sized array, which is not standard. Since its size is 0, accessing any element and setting its value is undefined behavior, and once you have undefined behavior there is no longer any way to reason about how the program works.
When you write this:
int arr[] = {}; you are telling the compiler to allocate an array whose size equals the number of elements specified between {}. Since there are no elements, the compiler allocates no storage for it. All of this happens at compile time.
Then with arr[key] = 2; you are modifying memory that you don't own. You will either get a segmentation fault, or the code will silently carry on and corrupt memory that your program is using elsewhere.
So ideally you should put some elements between {} if you know at compile time that you will never need more than that, or use dynamic allocation at run time.
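For example, a minimal fix, assuming two slots are all this logic will ever need, is to give the array a real size:
#include <string.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    const char *task = "foo";
    int key = 0;
    int arr[2] = {0};          /* real storage for indices 0 and 1 */
    if (!strcmp(task, "foo")) {
        key++;
    }
    arr[key] = 2;              /* writes inside the array instead of over other locals */
    key++;
    printf("key: %d\n", key);  /* prints 2 */
}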
This is undefined behavior, so it might produce different results on a different compiler, or for any number of other reasons.
The array you define has zero length, which the standard does not even allow. But even if the compiler accepted it as a one-element array, you are writing to element [1], which is out of bounds either way. That's what causes the undefined behavior.
For code like the following (full demo):
#include <iostream>

struct A
{
    int a;
    char ch[1];
};

int main()
{
    volatile A *test = new A;
    test->a = 1;
    test->ch[0] = 'a';
    test->ch[1] = 'b';
    test->ch[2] = 'c';
    test->ch[3] = '\0';
    std::cout << sizeof(*test) << std::endl
              << test->ch[0] << std::endl;
}
I need to ignore the compilation warning like
warning: array subscript 1 is above array bounds of 'volatile char [1]' [-Warray-bounds]
which is raised by the GCC 8.2 compiler:
g++ -O2 -Warray-bounds=2 main.cpp
One way to silence this warning is to use a pointer to manipulate the four characters, like this:
#include <iostream>

struct A
{
    int a;
    char ch[1];
};

int main()
{
    volatile A *test = new A;
    test->a = 1;
    // Use pointer to avoid the warning
    volatile char *ptr = test->ch;
    *ptr = 'a';
    *(ptr + 1) = 'b';
    *(ptr + 2) = 'c';
    *(ptr + 3) = '\0';
    std::cout << sizeof(*test) << std::endl
              << test->ch[0] << std::endl;
}
But I cannot figure out why using a pointer instead of an array subscript works. Is it because pointers have no boundary checking on what they point to? Can anyone explain that?
Thanks.
Background:
Due to padding and alignment of memory for the struct, even though ch[1] through ch[3] in struct A are outside the declared array boundary, they do not overflow the allocation from a memory point of view, so we want to ignore this warning.
Why don't we just declare ch as ch[4] in struct A to avoid this warning?
Answer:
struct A in our app code is generated by another script at compile time. The design rule for structs in our app is that if we do not know the length of an array, we declare it with one member, place it at the end of the struct, and use another member such as int a in struct A to control the array length.
C++ does not work the way you think it does. You are triggering undefined behavior. When your code triggers undefined behavior, the C++ standard places no requirements on its behavior. A version of GCC attempted to launch a video game when a certain kind of undefined behavior was encountered. Anthony Williams also knows of at least one case where a particular instance of undefined behavior caused someone's monitor to catch fire (C++ Concurrency in Action, page 106). Your code may appear to be working at this very time and in this situation, but that is just one manifestation of undefined behavior, and you cannot count on it. See Undefined, unspecified and implementation-defined behavior.
The correct way to suppress this warning is to write correct C++ code with well-defined behavior. In your case, declaring ch as char ch[4]; solves the problem.
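For example, assuming four bytes really is the most this struct ever needs to hold:

struct A
{
    int a;
    char ch[4];   // room for 'a', 'b', 'c' and the terminating '\0'
};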
The standard specifies this as undefined behavior in [expr.add]/4:
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
- If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
- Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]) [78], the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 ≤ i + j ≤ n, and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 ≤ i − j ≤ n.
- Otherwise, the behavior is undefined.

[78] An object that is not an array element is considered to belong to a single-element array for this purpose; see [expr.unary.op]. A pointer past the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n for this purpose; see [basic.compound].
I want to avoid the warning like
warning: array subscript 1 is above array bounds of 'volatile char [1]' [-Warray-bounds]
Well, it is probably better to fix the warning, not just avoid it.
The warning is actually telling you something: what you are doing is undefined behavior. Undefined behavior is really bad (it allows your program to do literally anything!) and should be fixed.
Let's look at your struct again:
struct A
{
    int a;
    char ch[1];
};
In C++, your array has only one element in it. The standard only guarantees array elements of 0 through N-1, where N is the size of the array:
[dcl.array]
...If the value of the constant expression is N, the array has N elements numbered 0 to N-1...
So ch only has elements 0 through 1-1, that is, elements 0 through 0, which is just element 0. That means accessing ch[1], ch[2], or ch[3] overruns the buffer, which is undefined behavior.
Due to padding and alignment of memory for the struct, even though ch[1] through ch[3] in struct A are outside the declared array boundary, they do not overflow the allocation from a memory point of view, so we want to ignore this warning.
Umm, if you say so. The example you gave only allocated 1 A, so as far as we know, there is still only space for the 1 character. If you do allocate more than 1 A at a time in your real program, then I suppose this is possible. But that's still probably not a good thing to do. Especially since you might run into int a of the next A if you're not careful.
A solution to ignore this warning is to use a pointer... But I cannot figure out why that works. Is it because pointers have no boundary checking on what they point to?
Probably. That would be my guess too. Pointers can point to anything (including destroyed data or even nothing at all!), so the compiler probably won't check it for you. The compiler may not even have a way of knowing whether the memory you point to is valid or not (or may just not care), and thus may not even be able to warn you, much less actually do so. Its only choice is to trust you, so I'm guessing that's why there's no warning.
Why don't we just declare ch as ch[4] in struct A to avoid this warning?
Side issue: std::string is probably a better choice here if you don't know ahead of time how many characters you want to store, assuming it's different for every instance of A. Anyway, moving on:
Why don't we just declare ch as ch[4] in struct A to avoid this warning?
Answer:
struct A in our app code is generated by another script at compile time. The design rule for structs in our app is that if we do not know the length of an array, we declare it with one member, place it at the end of the struct, and use another member such as int a in struct A to control the array length.
I'm not sure I understand your design principle completely, but it sounds like std::vector might be a better option. The size is then tracked automatically by the std::vector, and you know that everything is stored in ch. Accessing it would look something like:
myVec[i].ch[0]
I don't know all the constraints of your situation, but it sounds like a better solution than walking the line around undefined behavior. But that's just me.
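As a rough sketch of that idea (A2 and myVec are hypothetical names, and ch becomes a std::vector<char>):

#include <iostream>
#include <vector>

// Hypothetical reworking of A: the vector tracks its own length, so no
// separate length field and no out-of-bounds writes are needed.
struct A2 {
    std::vector<char> ch;
};

int main() {
    std::vector<A2> myVec(1);                      // one element, like 'new A' in the question
    myVec[0].ch = {'a', 'b', 'c', '\0'};
    std::cout << myVec[0].ch.size() << std::endl   // 4
              << myVec[0].ch[0] << std::endl;      // a
}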
Finally, if you are still really interested in ignoring our advice, you do have the option of turning the warning off, but again, I'd advise against it. It'd be better to fix A if you can, or adopt a better usage strategy if you can't.
There really is no clean way to work with this in C++, and as far as I recall the type (a dynamically sized struct) isn't actually well-formed in C++. But you can work with it because compilers still try to preserve compatibility with C, so it works in practice.
You can't have a value of the struct, only references or pointers to it, and they must be allocated by malloc() and released by free(); you can't use new and delete. Below I show a way that only lets you allocate pointers to variable-sized structs given the desired payload size. This is the tricky bit: sizeof(Buf) may be 16 (and not 8) because Buf::buf must have a unique address, which is why the allocation below uses offsetof(Buf, buf) instead of sizeof(Buf). So here we go:
#include <cstddef>
#include <cstdint>
#include <stdlib.h>
#include <new>
#include <iostream>
#include <memory>

struct Buf {
    size_t size {0};
    char buf[];

    [[nodiscard]]
    static Buf * alloc(size_t size) {
        void *mem = malloc(offsetof(Buf, buf) + size);
        if (!mem) throw std::bad_alloc();
        return std::construct_at(reinterpret_cast<Buf*>(mem), AllocGuard{}, size);
    }

private:
    class AllocGuard {};

public:
    Buf(AllocGuard, size_t size_) noexcept : size(size_) {}
};

int main() {
    Buf *buf = Buf::alloc(13);
    std::cout << "buffer has size " << buf->size << std::endl;
}
You should delete or implement the copy/move constructors and assignment operators as desired. Another good idea would be to return a std::unique_ptr or std::shared_ptr with a deleter that calls free() instead of returning a naked pointer, but I leave that as an exercise for the reader.
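For reference, one possible sketch of that exercise (FreeDeleter, BufPtr and make_buf are names invented here), reusing the Buf definition from above:

struct FreeDeleter {
    void operator()(Buf *p) const noexcept {
        // Buf is trivially destructible, so releasing the malloc'd block is enough.
        free(p);
    }
};

using BufPtr = std::unique_ptr<Buf, FreeDeleter>;

BufPtr make_buf(size_t size) {
    return BufPtr(Buf::alloc(size));   // ownership passes to the smart pointer
}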
Simple code
#include <iostream>
using namespace std;

struct foo {
    int bar;
};

struct foo tab[2];
int sum = 0;

int main()
{
    tab[2].bar = 3; // this changes the value of 'sum'!
    cout << sum << endl;
    return 0;
}
results in 3 instead of 0. It is unbelievable, so probably I am missing something. What have I done wrong?
Arrays start at 0, so tab[2] would be the third element, but you only allocated 2 of them.
In this case, sum is in the memory directly after tab, so when you go to where the third tab would be, you're actually in the memory for sum.
Notice that you access tab[2] which is an overflow (its size is 2 so valid indices are 0 and 1).
So tab[2] accesses the memory address of sum.
When you declare your variable
struct foo tab[2];
tab[2] does not exist.
You can only do
tab[0].bar = 3
tab[1].bar = 3
because array indices start at 0 and end at arraySize - 1.
If you look closely, tab has a length of 2. By accessing index 2, you are accessing memory outside of tab, which in this case happens to be where sum lives.
This is why you end up changing sum.
First of all, turn on compiler warnings! If you'd allow the compiler to help you, then it would very likely point out the exact error in this line:
tab[2].bar = 3; //this change 'sum' value!
Depending on which compiler you use, the warning may be as follows:
warning: array subscript is above array bounds
struct foo tab[2]; has two elements, with indices 0 and 1; you try to access a non-existent third element. This results in undefined behaviour of your program. Whatever result you got, it was just random; your program could also randomly crash.
Note that your code is also half C and half C++, which is not good. You don't need to write struct foo when you want to refer to the foo type; it's enough to write foo. Instead of a raw array, std::array<foo, 2> can be used. And using namespace std; should not be used by default.
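A short sketch of that suggestion, reusing the names from the question:

#include <array>
#include <iostream>

struct foo {
    int bar;
};

std::array<foo, 2> tab;   // still exactly two elements, indices 0 and 1
int sum = 0;

int main()
{
    tab[1].bar = 3;            // last valid element
    // tab.at(2).bar = 3;      // would throw std::out_of_range instead of silently corrupting 'sum'
    std::cout << sum << std::endl;   // prints 0
    return 0;
}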
In the code below I try to access the '-1'th element of an array, yet I don't get any runtime error.
#include <stdio.h>

int A[10] = {0};

int main() {
    A[-1] += 12;
    printf("%d", A[-1]);
    return 0;
}
When I run the code, it outputs 12, which means it is adding 12 to the non-existent A[-1]. Until today, whenever I tried to access an out-of-bounds element, I got a runtime error. I had never tried it in such a simple program before.
Can anyone explain why my code runs successfully?
I ran it on my computer and also on ideone; in both cases it ran successfully.
You see, when you allocate a variable like this, it lands on the stack. To put it simply, the stack holds small packages of information about the local variables in each function you call. The runtime is able to check whether you exceed the bounds of the allocated stack, but not whether you write data to an invalid place within it. The stack may look like the following:
[4 bytes - some ptr][4 bytes - A's first element][4 bytes - A's second element] ...
When you try to assign to the -1th element of the array, you actually write to the four bytes preceding the array (four bytes, because it's an int array). You overwrite some data held on the stack, but that is still within the process's valid memory, so there are no complaints from the system.
Try running this code in release mode in Visual Studio:
#include <stdio.h>

int main(int argc, char * argv[])
{
    // NEVER DO IT ON PURPOSE!
    int i = 0;
    int A[5];

    A[-1] = 42;

    printf("%d\n", i);
    getchar();
    return 0;
}
Edit: in response to comments.
I missed the fact that A is global. It won't be held on the stack but instead (most probably) in a data segment of the binary module; however, the rest of the explanation stands: A[-1] is still within the process's memory, so the assignment won't raise an access violation. However, such an assignment will overwrite something that lies just before A (possibly a pointer or another part of the binary module), resulting in undefined behavior.
Note that my example may or may not work, depending on the compiler (or compiler mode). For example, in debug mode the program prints 0; I guess the memory manager inserts some sentry data around stack variables to catch errors like buffer over/underruns.
C and C++ do not have any bounds checking. It is part of the language design, to let programs execute faster.
If you want bounds checking, use another language that has it. Java, perhaps?
When your code executes anyway, you are just lucky.
In C++ (and C), arrays don't check for out-of-range indices; they're not classes.
In C++11, however, you could use std::array<int,10> and its at() function:
std::array<int,10> arr;
arr.at(-1) = 100; //it throws std::out_of_range exception
Or you can use std::vector<int> and at() member function.
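For completeness, a small self-contained sketch of that behaviour:

#include <array>
#include <iostream>
#include <stdexcept>

int main() {
    std::array<int, 10> A{};       // zero-initialized, like the original array
    try {
        A.at(-1) = 100;            // -1 converts to a huge unsigned index, so at() throws
    } catch (const std::out_of_range &e) {
        std::cout << "out of range: " << e.what() << std::endl;
    }
    return 0;
}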
According to the correct answer in Static array vs. dynamic array in C++, static arrays have fixed sizes.
However, this compiles and runs just fine:
#include <iostream>
using namespace std;

int main(int argc, char** argv) {
    int myArray[2];
    myArray[0] = 0;
    myArray[1] = 1;
    cout << myArray[0] << endl;
    cout << myArray[1] << endl;

    myArray[4];
    myArray[2] = 2;
    myArray[3] = 3;
    cout << myArray[2] << endl;
    cout << myArray[3] << endl;

    return 0;
}
Does this mean a static array can be resized?
You're not actually enlarging the array. Let's see your code in detail:
int myArray[2];
myArray[0] = 0;
myArray[1] = 1;
You create an array of two positions, with indexes from 0 to 1. So far, so good.
myArray[4];
You're accessing the fifth element in the array (an element which surely does not exist in the array). This is undefined behaviour: anything can happen. You're not doing anything with that element, but that is not important.
myArray[2] = 2;
myArray[3] = 3;
Now you are writing to the third and fourth elements and changing their values. Again, this is undefined behaviour. You are changing memory locations near the created array, but "nothing else": the array itself remains the same.
Actually, you could check the size of the array by doing:
std::cout << sizeof( myArray ) / sizeof( int ) << std::endl;
You'll see that the size of the array has not changed. By the way, this trick only works in the same function in which the array is declared; as soon as you pass the array around, it decays into a pointer.
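To illustrate that, a small sketch (printSize is a made-up helper):

#include <iostream>

// The parameter is really an int*: the array has decayed, so sizeof gives
// the size of a pointer, not of the original array.
void printSize(int *arr) {
    std::cout << sizeof(arr) << std::endl;                    // e.g. 8 on a 64-bit system
}

int main() {
    int myArray[2];
    std::cout << sizeof(myArray) / sizeof(int) << std::endl;  // 2: the real element count
    printSize(myArray);                                       // decays to a pointer here
    return 0;
}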
In C++, array boundaries are not checked; that is mainly why you did not receive any error or warning. But again, accessing elements beyond the array limit is undefined behaviour: an error that may not show up immediately (which looks convenient, but is actually not). The program can even appear to finish without problems.
No, not a chance in hell. All you've done is illegally access it outside its bounds. The fact that this happens not to throw an error for you is utterly irrelevant. It is thoroughly UB.
First, this is not a static array; it is an array with automatic storage duration.
Next, the
myArray[4];
is not a new declaration; it is a discarded read of element 4 of the previously declared 2-element array, which is undefined behavior.
Assignments that follow
myArray[2] = 2;
myArray[3] = 3;
write to memory that was never allocated for the array, which is undefined behavior as well.
I have the following program:
// simple array memory test
#include <iostream>
using namespace std;

void someFunc(float*, int, int);

int main() {
    int convert = 2;

    float *arr = new float[17];
    for (int i = 0; i < 17; i++) {
        arr[i] = 1.0;
    }

    someFunc(arr, 17, convert);

    for (int i = 0; i < 17; i++) {
        cout << arr[i] << endl;
    }

    return 0;
}

void someFunc(float *arr, int num, int flag) {
    if (flag) {
        delete [] arr;
    }
}
When I load this into gdb and set a breakpoint at the float *arr ... line, I step through the program and observe the following:
Printing the array arr after it has been initialized gives me 1 printed 17 times.
Inside someFunc too, printing arr before the delete gives the same output as above.
Upon going back into main, when I print arr, I get 0 as the first value followed by sixteen 1s.
My questions:
1. Once the array has been deleted in someFunc, how am I still able to access arr without a segfault in someFunc or main?
2. The code snippet above is a test version of another piece of code that runs in a bigger program. I observe the same behaviour in both places (the first value is 0 but all the others are unchanged). If this is some unexplained memory error, how am I observing the same thing in different places?
3. Some explanations to fill the gaps in my understanding are most welcome.
A segfault occurs when you access a memory address that isn't mapped into the process. Calling delete [] releases memory back to the memory allocator, but usually not to the OS.
The contents of the memory after calling delete [] are an implementation detail that varies across compilers, libraries, OSes and especially debug-vs-release builds. Debug memory allocators, for instance, will often fill the memory with some tell-tale signature like 0xdeadbeef.
Dereferencing a pointer after it has been deleted is undefined behavior, which means that anything can happen.
Once the array has been deleted, any access to it is undefined behavior.
There's no guarantee that you'll get a segment violation; in fact, typically you won't. But there's no guarantee of what you will get; in larger programs, modifying the contents of the array could easily result in memory corruption elsewhere.
delete hands the memory back to the memory allocator, but does not necessarily clear its contents (it should not, as that would cause overhead for nothing). So the old values are retained in memory, and in your case you are accessing that same memory, so it prints whatever is still there. Note that accessing it is still undefined behaviour; what you actually observe depends on the memory manager.
Re 1: You can't. If you want to access arr later, don't delete it.
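A minimal sketch of that advice, assuming it is acceptable to let a std::vector own the storage instead of a raw new[]/delete[]; nothing is freed behind main's back, so reading the data afterwards stays well-defined:

#include <iostream>
#include <vector>

// Hypothetical rework of the question's test: the vector owns its storage,
// so someFunc cannot accidentally free memory that main still needs.
void someFunc(std::vector<float> &arr, int flag) {
    if (flag) {
        arr[0] = 0.0f;   // modify the caller's data, but never free it
    }
}

int main() {
    std::vector<float> arr(17, 1.0f);   // 17 elements, all 1.0

    someFunc(arr, 2);

    for (float v : arr) {               // perfectly valid: the vector is still alive
        std::cout << v << std::endl;
    }
    return 0;
}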
C++ doesn't check array boundaries. You only get a segfault if you access memory that your process is not allowed to touch.