Vector out-of-bounds access: why such behavior?

I am aware that out-of-bounds access of a std::vector in C++ with operator[] results in undefined behavior, so I should not expect anything meaningful from doing that. However, I'm curious about what is actually happening under the hood.
Consider the following piece of code:
#include <iostream>
#include <vector>

int main() {
    {
        std::cerr << "Started\n";
        std::vector<int> v(2);
        std::cerr << "Successfully initialized vector\n";
        v[-1] = 10000; // Note: if accessing v[3], nothing bad seems to happen
        std::cerr << "Successfully accessed element -1\n";
    }
    std::cerr << "Successfully destructed the vector\n";
}
When compiled on GNU/Linux with g++ (GCC) 11.2.0, running this code produces the following output:
Started
Successfully initialized vector
Successfully accessed element -1
double free or corruption (out)
Aborted (core dumped)
Why could that have happened? Why does it cause the destructor to fail? Why does it produce this particular error message?
I would understand it if I were using some structure that stored the array together with itself on the stack: I would then accidentally access some of its internal data that lies right before v[0] and could have broken something. But as far as I know, the underlying array of std::vector is stored on the heap, so the data that I access should not even belong to it, should it? Also, because my last output attempt comes right after exiting the block with only the vector declared in it, I don't see what could have been called there other than its destructor, so the vector seems to be somehow affected by my action...

Here is a hypothetical account of what happened: the UB caused an arbitrary piece of memory to be overwritten. This is called memory corruption.
That arbitrary piece of memory happened to lie right before the dynamic memory that the vector allocated, and it happened to contain an "information header" that describes the allocation. When the destructor was called, there was an attempt to deallocate the memory. The global allocator detected that the corrupted header was inconsistent, produced the diagnostic message, and terminated the program.
This is what the source code of the global memory allocator on your system may look like: https://code.woboq.org/userspace/glibc/malloc/malloc.c.html#4326 The link leads specifically to the line that produces the error message.
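To make the mechanism concrete, here is a minimal sketch that reproduces the same class of failure without std::vector. (This is still UB, and the exact diagnostic depends on your glibc version; the point is only that writing just before a heap block lands in the allocator's bookkeeping.)

#include <cstdlib>

int main() {
    // a malloc-backed block, like the one std::vector allocates internally
    int* p = static_cast<int*>(std::malloc(2 * sizeof(int)));
    p[-1] = 10000;  // UB: clobbers the allocator's chunk header, just like v[-1]
    std::free(p);   // glibc's consistency checks may now abort the program
}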

Related

Considering makecontext(), what is uc_stack.ss_size good for?

Prior to calling makecontext, why do we need to set the stack size ss_size?
I just had a unit test case for a makecontext/swapcontext snippet, and it failed with SIGSEGV. What happened was that the stack size was too small, and unrelated memory (which happened to be some unique pointers) got corrupted; the segfault was then reported on those unrelated pointers. If I had had, e.g., some string there instead, the memory corruption would have gone unnoticed.
I would have expected SIGSEGV to be raised immediately when the stack size ss_size does not suffice, but considering the memory corruption described above, I conclude it's impossible to recover from SIGSEGV here. That brings me back to the question: why do we need to set the stack size in the first place, when it is not used to signal overflows? What is it used for?
EDIT:
Well, it's all about makecontext(3). These functions are still being used for green threads, coroutines, etc. In my opinion there is no real replacement for them for these tasks, not even in C++.
ss_size, defined in sigaltstack(2), is needed for uc_stack in ucontext_t, defined in getcontext(3).
Following is a minimal verifiable example that demonstrates the memory corruption described above by "painting" the memory.
#include <iostream>
#include <ucontext.h>
#include <memory>
#include <cstring>
#include <cstddef>
#include <cerrno>

ucontext_t caller, callee;

void cb(void) {
    // paint the stack with 2
    char tmp[7000];
    std::memset(tmp, 2, 7000);
    // Note: the stack was specified as 6000 bytes in size,
    // so this should not be allowed.
    // Furthermore, no signal is raised here; I expected SIGSEGV
    // to be raised when this call stack exceeds ss_size.
    // Doesn't that make ss_size useless?
}

int main() {
    std::memset(&caller, 0, sizeof(caller));
    std::memset(&callee, 0, sizeof(callee));
    // create a stack and paint it 0
    std::unique_ptr<std::byte[]> stack(new std::byte[10000]());
    std::memset(stack.get(), 0, 10000); // paint the stack 0
    // make the context
    // Note: the stack is specified as [2000, 8000),
    // so [0, 2000) and [8000, 10000) should not be touched.
    if (getcontext(&callee) == -1) { std::cout << errno << ":" << std::strerror(errno) << std::endl; return 1; }
    callee.uc_link = &caller;
    callee.uc_stack.ss_sp = stack.get() + 2000;
    callee.uc_stack.ss_size = 6000; // what is this line good for, what is it guarding?
    makecontext(&callee, cb, 0);
    // swap to the callee
    if (swapcontext(&caller, &callee) == -1) { std::cout << errno << ":" << std::strerror(errno) << std::endl; return 1; }
    // print the color - should be 0;
    // if it is 2, the memory was corrupted by the callee
    std::cout << int(stack[996]) << std::endl;
    std::cout << int(stack[997]) << std::endl;
    std::cout << int(stack[998]) << std::endl;
    std::cout << int(stack[999]) << std::endl;
    return 0;
}
Once again, what I don't understand is why we need to set the stack size ss_size, because it looks like it is not used to guard against memory corruption or anything else. It looks like it is just there to be there, without any use. But I can't believe that it has no use. So what is it "guarding" / good for?
Well, I don't want to bring more confusion into this. The goal is to get away from a fixed-size function call stack, either by being able to recover via an installed SIGSEGV signal handler, which looks like mission impossible due to the memory corruption described above, or by having a growable stack, e.g. using mmap(2) with the MAP_GROWSDOWN flag, but that looks broken and is therefore not an option.
callee.uc_stack.ss_size = 6000; // what is this line good for, what is it guarding?
This line sets the stack size (as you can read in man sigaltstack). From reading makecontext in glibc: ss_size is used to determine the end of the stack, which is where glibc sets up the data of the new context. Because the stack on some machines "grows toward numerically lower addresses" (as it does on the x86 architecture), makecontext needs/wants to place its data at the end of the stack. So it needs to determine the end of the stack, and that is what ss_size is used for.
Setting ss_size to some value does not mean that overflowing the stack will make the operating system send your process a signal notifying it that it tried to access a restricted memory area. The implementation of *context isn't (and, well, shouldn't be) designed to make the address ss_sp + ss_size (+ 1) kernel-protected memory, such that writing to that address would trigger a segmentation fault. These are still all normal variables. As always when writing to an unknown memory location, for example when overflowing an array, the invalid address may just happen to be inside your process's address space; according to the kernel, the process is then writing inside its own address space, and everything is fine. That is exactly what happens here: your cb function writes inside the new std::byte[10000] memory, and from the kernel's perspective there is nothing wrong with that.
You could most probably allocate new std::byte[6000] instead and run your process under valgrind, gdb, or other tools to inspect the malicious writes.
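If you want the hardware protection you expected, you can build it yourself. The following is a minimal sketch (POSIX-specific; how it would be wired into your example is an assumption) that mmaps the context stack with a PROT_NONE guard page at its low end, so overflowing the downward-growing stack raises SIGSEGV immediately instead of silently corrupting neighbouring memory:

#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>

int main() {
    const long page = sysconf(_SC_PAGESIZE);
    const std::size_t usable = 16 * page; // the stack you would hand to makecontext
    // allocate one extra page at the low end to act as the guard
    char* base = static_cast<char*>(mmap(nullptr, usable + page,
                                         PROT_READ | PROT_WRITE,
                                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    if (base == MAP_FAILED) { std::perror("mmap"); return 1; }
    // revoke all access to the lowest page: any write into it
    // (i.e. any stack overflow) now faults immediately
    if (mprotect(base, page, PROT_NONE) != 0) { std::perror("mprotect"); return 1; }
    // you would then set, e.g.:
    //   callee.uc_stack.ss_sp   = base + page;
    //   callee.uc_stack.ss_size = usable;
    std::printf("guard page at %p, usable stack from %p\n",
                static_cast<void*>(base), static_cast<void*>(base + page));
    munmap(base, usable + page);
    return 0;
}

This is essentially the guard the kernel provides for your main thread's stack; the *context API gives you no such guard for free, which is why the overflow in your example goes unnoticed.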

Is allocating specific memory for a void pointer undefined behaviour?

I've met a situation that I think is undefined behavior: there is a structure that has several members, one of which is a void pointer (it is not my code and it is not public; I suppose the void pointer is there to make it more generic). At some point, some char memory is allocated through this pointer:
void fooTest(ThatStructure* someStrPtr) {
    try {
        someStrPtr->voidPointer = new char[someStrPtr->someVal + someStrPtr->someOtherVal];
    } catch (std::bad_alloc& ba) {
        std::cerr << ba.what() << std::endl;
    }
    // ...
}
and at some point it crashes at the allocation part (operator new) with a segmentation fault (a few times it works; this function is called in several places, for several cases). I've seen this in the debugger.
I also know that on Windows (my machine runs Linux) there is also a segmentation fault at the beginning (I suppose in the first call of the function that allocates the memory).
Moreover, if I add a print of the values:
std::cout << someStrPtr->someVal << " " << someStrPtr->someOtherVal << std::endl;
before the try block, it runs through to the end. I added this print to see if there was some other problem with the structure pointer, but the values are printed and are neither 0 nor negative.
I've seen these topics: topic1, topic2, topic3, and I am thinking that there is some UB linked to the void pointer. Can anyone help me pin down the issue here so I can solve it? Thanks.
No, that in itself is not undefined behavior. In general, when code "crashes at the allocation part", it's because something earlier messed up the heap, typically by writing past one end of an allocated block or releasing the same block more than once. In short: the bug isn't in this code.
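For illustration, here is a minimal sketch of that failure mode (the sizes are hypothetical, and whether or where it crashes depends on the allocator; building with -fsanitize=address points at the real culprit deterministically):

#include <cstring>

int main() {
    char* buf = new char[16];
    std::memset(buf, 0, 32);    // BUG: writes 16 bytes past the end of the block,
                                // corrupting the allocator's bookkeeping
    char* other = new char[16]; // a later, unrelated allocation may crash here...
    delete[] other;
    delete[] buf;               // ...or the crash may surface at deallocation
    return 0;
}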
A void pointer is a perfectly fine thing to use in C/C++, and you can usually cast it from/to other pointer types.
When you get a segfault during the allocation, it means some of the parameters involved are themselves invalid:
Is someStrPtr valid?
Are someStrPtr->someVal and someStrPtr->someOtherVal valid?
Are the printed values what you were expecting?
Also, if this is a multithreaded application, make sure that no other thread is accessing those variables (especially between your print and the allocation statement). That kind of bug is really difficult to catch.

C++: delete this; return x;

Hey, I am curious about some C++ behaviour, as the code I am working on would benefit greatly in terms of simplicity if this behaviour is consistent. Basically, the idea is for a specific function inside my object A to perform a complex calculation returning a float, but just before returning the float, occasionally, to call delete this.
1. Here is a code example of the functionality I am trying to verify is consistent:
#include <iostream>
using namespace std;

struct A
{
    float test(float a) { delete this; return a; }
};

int main()
{
    A* a = new A();
    cout << a->test(1.f) << endl;
    cout << "deleted?" << endl;
    cout << a->test(1.f) << endl;
}
the output becomes:
1.0
deleted?
*** Error in `./test': double free or corruption (fasttop): 0x0105d008 *** Aborted (core dumped)
I think this means the object was deleted correctly (what is left in memory? An uncallable skeleton of A? A typed pointer? A null pointer?), but I am not sure whether I am right about that. If so, is this behaviour going to be consistent (my functions will only be returning native types, i.e. floats)?
2. Additionally, I am curious as to why this doesn't seem to work:
struct A
{
    float test(float a) { delete this; return a; }
};

int main()
{
    A a;
    cout << a.test(1.f) << endl;
}
This compiles but aborts with the following error before returning anything:
*** Error in `./test': free(): invalid pointer: 0xbe9e4c64 *** Aborted (core dumped)
NOTE: Please don't reply with a long list of explanations as to why this is bad coding/etiquette or whatever; I don't care, I am simply interested in the possibilities.
It is safe for a member function to call delete this; if you know that the object was allocated using scalar new and that nothing else will use the object afterward.
In your first example, after the first call to a->test(1.f), a becomes a "dangling pointer". You invoke Undefined Behavior when you dereference it to call test a second time.
In your second example, the delete this; statement is Undefined Behavior because the object was not created using new.
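For completeness, here is a minimal sketch of the classic legitimate pattern built on delete this, an intrusive reference count (the names are illustrative, not taken from your code):

#include <iostream>

struct RefCounted {
    int refs = 1;
    void addRef() { ++refs; }
    void release() {
        if (--refs == 0)
            delete this; // safe: *this came from scalar new and
                         // nothing touches the object afterwards
    }
private:
    ~RefCounted() { std::cout << "destroyed\n"; } // private: forbids stack instances
};

int main() {
    RefCounted* r = new RefCounted;
    r->addRef();
    r->release(); // refs: 2 -> 1, object stays alive
    r->release(); // refs: 1 -> 0, object deletes itself
    return 0;
}

The private destructor makes the "allocated with scalar new" precondition hard to violate: a stack instance like the one in your second example would no longer compile.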
The behavior is undefined, but in a typical modern implementation the practical "possibilities" of accessing deallocated memory include (but are not limited to):
1. delete releases the memory at the run-time library (RTL) level but does not return it to the OS. That is, OS-level memory protection is not engaged, and the OS continues to see that memory as allocated. However, internal RTL data stored in freed memory blocks clobbers your data. The result: access through the pointer does not cause your code to crash, but the data looks meaningless (clobbered).
2. Same as 1, but the internal RTL data happens not to overlap your critical data. The code does not crash and continues to work "as if" everything were "fine".
3. delete releases the memory to the OS. OS-level memory protection is engaged. Any attempt to access it through the pointer causes an immediate crash.
Your examples proceed in accordance with the second scenario, i.e. the data stored in the object appears to remain untouched even after you free the memory.
The crashes you observe in your code happen because the RTL detects a double-free attempt (or an attempt to free non-dynamic memory, as in the second example), which is kinda beside the point in the context of your question.
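Here is a small sketch of scenario 2 (still UB, of course; what gets printed depends entirely on the implementation):

#include <iostream>

struct A { float x = 42.0f; };

int main() {
    A* a = new A;
    delete a;
    // UB: reading through a dangling pointer. With a typical run-time
    // library the page is still mapped, so this prints *something*:
    // possibly 42, possibly garbage left behind by the allocator.
    std::cout << a->x << '\n';
    return 0;
}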

How big can a globally declared data structure be?

I have a global vector that I load data into, which is then read from later in my program.
If I have, say, 1000000 elements pushed back into this vector, will it cause any problems such as those created by overflowing the stack? How much memory space is available in the global scope?
As per C++11 section 23, unless your type provides a specialised allocator, sequence containers such as vector will use std::allocator, which gets its memory using new. In other words, they use the dynamic memory allocation functions ("from the heap", in layman's parlance).
So, provided you follow the rules, there's no way to corrupt the stack using that container, as might be the case if you did something like:
void function(void) {
    int xyzzy[999999999]; // roughly 4 GB of automatic storage
    // ...
}
That's not to say you can't run out of memory; the heap isn't infinite in size, as shown by the following code:
#include <iostream>
#include <vector>

int main(void) {
    std::vector<const char*>* v = new std::vector<const char*>();
    long count = 0;
    while (1) {
        try {
            v->push_back("xyzzy");
            count++;
        } catch (std::exception& e) {
            std::cout << e.what() << '\n';
            break;
        }
    }
    std::cout << count << " pushbacks done.\n";
    return 0;
}
which outputs (on my system):
std::bad_alloc
134217728 pushbacks done.
But getting an exception because you've run out of memory is a far cry from the corruption caused by a stack overflow or by running out of static-storage-duration ("global") space.
Question:
If I have say, 1000000 elements pushed back into this vector, will it cause any problems such as those created by overflowing the stack?
No, it won't. When you create an std::vector, the memory for the data is allocated from the heap, not from the memory reserved for global data.
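A small sketch that makes this concrete: the vector object itself, which lives in static storage, stays tiny no matter how many elements you push; only its heap allocation grows.

#include <iostream>
#include <vector>

std::vector<int> g; // static storage holds only the small control block

int main() {
    g.resize(1000000);
    std::cout << "sizeof(g)  = " << sizeof(g) << " bytes\n"; // typically 24 on 64-bit
    std::cout << "heap usage >= " << g.capacity() * sizeof(int) << " bytes\n";
    return 0;
}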
Question:
How much memory space is a available in the global scope?
I don't have an answer to that. It might be irrelevant given the answer to the first question.
You might find this answer to another SO post relevant.

Why are we able to access unallocated memory in a class?

I am sorry if I may not have phrased the question correctly, but in the following code:
#include <iostream>
using namespace std;

int main() {
    char* a = new char[5];
    a = "2222"; // note: assigning a string literal to char* is ill-formed in modern C++
    a[7] = 'f'; // Error thrown here
    cout << a;
}
If we try to access a[7] in the program, we get an error because we haven't been assigned a[7].
But if I do the same thing in a class :
#include <cstring>
#include <iostream>
using namespace std;

class str
{
public:
    char* a;
    str(char* s) {
        a = new char[5];
        strcpy(a, s);
    }
};

int main()
{
    str s("ssss");
    s.a[4] = 'f'; s.a[5] = 'f'; s.a[6] = 'f'; s.a[7] = 'f';
    cout << s.a << endl;
    return 0;
}
The code works, printing "ssssffff" (followed by whatever happens to be in memory up to the next zero byte).
How are we able to access a[4] through a[7] in this code, when we have only allocated char[5] to a, while we were not able to do so in the first program?
In your first case, you have an error:
int main()
{
    char* a = new char[5]; // declare a dynamic char array of size 5
    a = "2222";            // reseat the pointer to the string literal "2222" - MEMORY LEAK HERE
    a[7] = 'f';            // accessing the array out of bounds!
    // ...
}
You are creating a memory leak and then asking why undefined behavior is undefined.
Your second example is asking, again, why undefined behavior is undefined.
As others have said, it's undefined behavior. When you write to memory outside the bounds of the memory allocated for the pointer, several things can happen:
You overwrite an allocated, but unused and so far unimportant, location.
You overwrite a memory location that stores something important for your program, which will lead to errors because you've corrupted your own memory at that point.
You overwrite a memory location that you aren't allowed to access (something outside your program's memory space), and the OS freaks out, causing an error like "Access Violation" or something similar.
For your specific examples, where the memory is allocated depends on how the variable is defined and on what other memory has to be allocated for your program to run. This may affect the probability of getting one error or another, or of not getting an error at all. BUT, whether or not you see an error, you shouldn't access memory locations outside your allocated memory space, because, as others have said, it's undefined behavior and you will get non-deterministic behavior mixed with errors.
int main() {
    char* a = new char[5];
    a = "2222";
    a[7] = 'f'; // Error thrown here
    cout << a;
}
If we try to access a[7] in the program, we get an error because we haven't been assigned a[7].
No, you get a memory error from accessing memory that is write-protected: a is pointing to the read-only memory of "2222", and by chance the location two bytes after the end of that string is ALSO write-protected. If you used the same strcpy that you use in the class str, the memory access would overwrite some "random" data after the allocated memory, which quite possibly would NOT fail in the same way.
It is indeed invalid (undefined behaviour) to access memory outside of the memory you have allocated. The compiler, the C and C++ runtime library, and the OS that your code is built with and runs on top of are not guaranteed to detect all such things (because it can be quite time-consuming to check every single operation that accesses memory). But it is guaranteed to be "wrong" to access memory outside of what has been allocated - it just isn't always detected.
As mentioned in other answers, accessing memory past the end of an array is undefined behavior, i.e. you don't know what will happen. If you are lucky, the program crashes; if not, the program continues as if nothing was wrong.
C and C++ do not perform bounds checks on (simple) arrays for performance reasons.
The syntax a[7] simply means: go to memory position X + 7 * sizeof(a[0]), where X is the address at which the storage of a starts, and read/write. If you read/write within the memory that you have reserved, everything is fine; if outside, nobody knows what happens (see the answer from @reblace).
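If you do want bounds checking, the standard containers provide an opt-in checked accessor: at() throws std::out_of_range instead of silently writing out of bounds. A minimal sketch:

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<char> a(5);
    try {
        a.at(7) = 'f'; // checked access: throws instead of corrupting memory
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
    return 0;
}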