I've met a situation that I think is undefined behaviour: there is a structure with several members, one of which is a void pointer (it is not my code and it is not public; I suppose the void pointer is there to make it more generic). At some point this pointer is assigned some newly allocated char memory:
void fooTest(ThatStructure * someStrPtr) {
    try {
        someStrPtr->voidPointer = new char[someStrPtr->someVal + someStrPtr->someOtherVal];
    } catch (std::bad_alloc& ba) {
        std::cerr << ba.what() << std::endl;
    }
    // ...
}
At some point it crashes at the allocation part (operator new) with a segmentation fault (a few times it works; the function is called several times, in several cases). I've seen this while debugging.
I also know that on Windows (my machine runs Linux) there is also a segmentation fault right at the beginning (I suppose in the first call of the function that allocates the memory).
Moreover, if I add a print of the values:
std::cout << someStrPtr->someVal << " " << someStrPtr->someOtherVal << std::endl;
before the try block, it runs through to the end. I added this print to see whether there was some other problem with the structure pointer, but the values are printed and are not 0 or negative.
I've seen these topics: topic1, topic2, topic3, and I am thinking that there is some UB linked to the void pointer. Can anyone help me pinpoint the issue here so I can solve it? Thanks.
No, that in itself is not undefined behavior. In general, when code "crashes at the allocation part", it's because something earlier messed up the heap, typically by writing past one end of an allocated block or releasing the same block more than once. In short: the bug isn't in this code.
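To illustrate the point, here is a minimal sketch (not the asker's code) of the kind of earlier bug this answer describes: the overrun itself runs without complaint, and the damage only surfaces when the allocator is used again, far from the offending line. The crash is not guaranteed, since the behaviour is undefined; tools such as Valgrind or AddressSanitizer will report the overrun at the line that actually causes it.
#include <iostream>

int main() {
    char* block = new char[16];
    for (int i = 0; i < 32; ++i)   // BUG: writes 16 bytes past the end of the block
        block[i] = 'x';

    // If anything crashes, it is typically here or at a later delete[],
    // far away from the loop that actually did the damage.
    char* later = new char[64];
    std::cout << static_cast<void*>(later) << std::endl;

    delete[] later;
    delete[] block;
}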
A void pointer is a perfectly fine thing to use in C/C++, and you can usually cast it to/from other pointer types.
When you get a seg-fault during initialization, it usually means that some of the parameters used are themselves invalid, for example:
Is someStrPtr valid?
Are someStrPtr->someVal and someStrPtr->someOtherVal valid?
Are the printed values what you were expecting?
Also, if this is a multithreaded application, make sure that no other thread is accessing those variables (especially between your print and the initialization statement). That is the kind of thing that is really difficult to catch.
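As a rough sketch of those checks (the member types here are assumptions; only the names come from the question), something like this before the allocation makes an invalid pointer or bogus sizes fail loudly instead of mysteriously:
#include <cassert>
#include <cstddef>

// Stand-in definition; the real (non-public) structure is not shown in the question.
struct ThatStructure {
    int someVal;
    int someOtherVal;
    void* voidPointer;
};

void fooTest(ThatStructure* someStrPtr) {
    assert(someStrPtr != nullptr);        // is the structure pointer valid?
    assert(someStrPtr->someVal >= 0);     // are the sizes plausible?
    assert(someStrPtr->someOtherVal >= 0);

    std::size_t n = static_cast<std::size_t>(someStrPtr->someVal) + someStrPtr->someOtherVal;
    someStrPtr->voidPointer = new char[n];
}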
Related
I am aware that out of boundary access of an std::vector in C++ with the operator[] results in undefined behavior. So, I should not expect anything meaningful doing that. However, I'm curious about what is actually happening there under the hood.
Consider the following piece of code:
#include <iostream>
#include <vector>

int main() {
    {
        std::cerr << "Started\n";
        std::vector<int> v(2);
        std::cerr << "Successfully initialized vector\n";
        v[-1] = 10000; // Note: if accessing v[3], nothing bad seems to happen
        std::cerr << "Successfully accessed element -1\n";
    }
    std::cerr << "Successfully destructed the vector\n";
}
When compiled on GNU/Linux with g++ (GCC) 11.2.0, running this code produces the following output:
Started
Successfully initialized vector
Successfully accessed element -1
double free or corruption (out)
Aborted (core dumped)
Why could that have happened? Why does it cause the destructor to fail? Why does it produce such an error message?
I would understand it if I were using some structure that stored the array together with it on the stack: I would then accidentally access some of its internal data that lies right before v[0] and could have broken something. But as far as I know, the underlying array of std::vector is stored on the heap, so the data that I access should not even belong to it, should it? Also, because my last output attempt comes right after exiting the block with only the vector declared in it, I don't see what else except its destructor could have been called, so the vector seems to be somehow affected by my action...
A hypothetical chain of events that could explain this: the UB caused an arbitrary piece of memory to be overwritten. This is called memory corruption.
That overwritten arbitrary piece of memory happened to be right before the dynamic memory that the vector allocated. The arbitrary piece of memory right before the allocation happened to contain an "information header" that describes the allocation. When the destructor was called, there was an attempt to deallocate the memory. The global allocator detected that the corrupted information was inconsistent, produced the diagnostic message and terminated the program.
This is what the source code of the global memory allocator on your system may look like: https://code.woboq.org/userspace/glibc/malloc/malloc.c.html#4326 The link leads specifically to the line that produces the error message.
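Not part of the original answer, but worth noting: std::vector::at() performs the bounds check that operator[] omits, so the same mistake turns into a catchable exception instead of silently corrupting the allocator's bookkeeping. A minimal sketch:
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> v(2);
    try {
        v.at(-1) = 10000;   // -1 converts to a huge size_t, which is >= v.size()
    } catch (const std::out_of_range& e) {
        std::cerr << "caught: " << e.what() << '\n';
    }
}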
Okay, so I was experimenting with pointers in C++, as I have started to code again after almost a year and a half of a break (school stuff). So please spare me if this seems naive to you; I am just shaking off the rust.
Anyway, I was screwing around with pointers in VS 2019, and I noticed something that started to bug me.
This is the code that i wrote:
#include <iostream>

int main() {
    int* i = new int[4];
    *i = 323;
    *(i + 1) = 453;
    std::cout << *i << std::endl;
    std::cout << *(i + 1) << std::endl;
    delete i;
}
Something seems odd, right? Don't worry, that delete is intentional, and is kind of the point of this question. Now I expected it to do some memory mishaps, since this is not the way we delete an array on the heap, BUT to my surprise it did not (I did this in both Debug and Release and had the same observation).
Allocating the array:
First modification:
Second modification:
Now I was expecting some sort of mishap at delete, since I did not delete it the way it is supposed to be deleted. BUT:
Deleting incorrectly:
Now on another run I actually used the delete operator correctly and it did the same thing:
Now I got the same results when I tried to allocate with malloc and use delete to free it.
So my question is: why does my code not mess up? I mean if this is how it works, I could just use delete (pointer) to wipe the entire array.
The gist of my question is "What does new operator do under the hood?"
What happens on allocating with “new []” and deleting with just “delete”
The behaviour of the program is undefined.
Now I expected it to do some memory mishaps
Your expectation is misguided. Undefined behaviour does not guarantee mishaps in memory or otherwise. Nothing about the behaviour of the program is guaranteed.
I mean if this is how it works
This is how you observed it to "work". It doesn't mean that it will necessarily always work like that. Welcome to undefined behaviour.
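For contrast, a minimal sketch of the alternatives with defined behaviour: pair new[] with delete[], or let a standard type own the array so no manual delete is needed at all (std::vector is usually the simplest choice):
#include <memory>
#include <vector>

int main() {
    int* i = new int[4];
    i[0] = 323;
    i[1] = 453;
    delete[] i;                            // matches new[]

    auto p = std::make_unique<int[]>(4);   // C++14: freed automatically with delete[]
    p[0] = 323;

    std::vector<int> v(4);                 // usually the simplest choice
    v[0] = 323;
}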
Why does the new[] operator in C++ actually create an array of length + 1? For example, see this code:
#include <iostream>

int main()
{
    std::cout << "Enter a positive integer: ";
    int length;
    std::cin >> length;

    int *array = new int[length]; // use array new. Note that length does not need to be constant!
    //int *array;
    std::cout << "I just allocated an array of integers of length " << length << '\n';

    for (int n = 0; n <= length + 1; n++)
    {
        array[n] = 1; // set element n to value 1
    }

    std::cout << "array[0] " << array[0] << '\n';
    std::cout << "array[length-1] " << array[length-1] << '\n';
    std::cout << "array[length] " << array[length] << '\n';
    std::cout << "array[length+1] " << array[length+1] << '\n';

    delete[] array; // use array delete to deallocate array
    array = 0; // use nullptr instead of 0 in C++11
    return 0;
}
We dynamically create an array of length "length" but we are able to assign a value at the index length+1. If we try to do length+2, we get an error.
Why is this? Why does C++ make the length = length + 1?
It doesn’t. You’re allowed to calculate the address array + length (one past the end), for the purpose of checking that another address is less than it. Trying to access the element array[length] is undefined behavior, which means the program becomes meaningless and the compiler is allowed to do anything whatsoever. Literally anything; one old version of GCC, if it saw a #pragma directive, started a roguelike game on the terminal. (Thanks, Revolver_Ocelot, for reminding me: that was technically implementation-defined behavior, a different category.) Even calculating the address array + length + 1 is undefined behavior.
Because it can do anything, the particular compiler you tried that on decided to let you shoot yourself in the foot. If, for example, the next two words after the array were the header of another block in the heap, you might get a memory-corruption bug. Or maybe the compiler stored the array at the top of your memory space, the address &array[length+1] is a null pointer, and trying to dereference it causes a segmentation fault. Or maybe the next page of memory is not readable or writable and trying to access it crashes the program with a protection fault. Or maybe the implementation bounds-checks your array accesses at runtime and crashes the program. Maybe the runtime stuck a canary value after the array and checks later to see if it was overwritten. Or maybe it happens, by accident, to work.
In practice, you really want the compiler to catch those bugs for you instead of trying to track down the bugs that buffer overruns cause later. It would be better to use a std::vector than a dynamic array. If you must use an array, you want to check that all your accesses are in-bounds yourself, because you cannot rely on the compiler to do that for you and skipping them is a major cause of bugs.
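As a rough sketch of that advice (not part of the original program), here is the same code written with std::vector and at(), so an out-of-range index is reported instead of silently overrunning the buffer:
#include <iostream>
#include <vector>

int main()
{
    std::cout << "Enter a positive integer: ";
    int length;
    std::cin >> length;

    std::vector<int> array(length);
    for (int n = 0; n < length; n++)       // valid indices are 0 .. length-1
    {
        array.at(n) = 1;
    }

    std::cout << "array[0] " << array.at(0) << '\n';
    std::cout << "array[length-1] " << array.at(length - 1) << '\n';
    // array.at(length) would throw std::out_of_range instead of invoking UB
    return 0;
}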
If you write or read beyond the end of an array or other object you create with new, your program's behaviour is no longer defined by the C++ standard.
Anything can happen and the compiler and program remain standard compliant.
The most likely thing to happen in this case is that you are corrupting memory in the heap. In a small program this "seems to work" because the section of the heap you use isn't being used by any other code; in a larger one you will crash or behave randomly elsewhere, in a seemingly unrelated bit of code.
But arbitrary things could happen. The compiler could prove that a branch leads to an access beyond the end of an array and dead-code-eliminate the paths that lead to it (UB that time travels), or it could hit a protected memory region and crash, or it could corrupt heap-management data and cause a future new/delete to crash, or nasal demons, or whatever else.
In the for loop you are assigning to elements beyond the bounds of the array, and remember that C++ does not do bounds checking.
So when you initialize the array you are writing beyond its bounds (say the user enters 3 for length: you are assigning 1 to array[0] through array[4], because the condition is n <= length + 1).
The behavior is unpredictable when you go beyond the array's bounds, but most likely your program will crash. In this case you are going 2 elements beyond the last valid index, because you have used <= in the condition together with length + 1.
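For completeness, a minimal sketch of the corrected loop (with a fixed length standing in for the user's input): the valid indices of new int[length] are 0 through length - 1, so the condition is n < length:
#include <iostream>

int main()
{
    int length = 3;                  // stand-in for the value read from the user
    int *array = new int[length];
    for (int n = 0; n < length; n++)
    {
        array[n] = 1;                // stays within array[0] .. array[length-1]
    }
    std::cout << array[0] << " " << array[length - 1] << '\n';
    delete[] array;
    return 0;
}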
There is no requirement that the new [] operator allocate more memory than requested.
What is happening is that your code is running past the end of the allocated array. It therefore has undefined behaviour.
Undefined behaviour means that the C++ standard imposes no requirements on what happens. Therefore, your implementation (compiler and standard library, in this case) will be equally correct if your program SEEMS to work properly (as it does in your case), produces a run time error, trashes your system drive, or anything else.
In practice, all that is happening is that your code is writing to memory, and later reading from that memory, past the end of the allocated memory block. What happens depends on what is actually in that memory location. In your case, whatever happens to be in that memory location is able to be modified (in the loop) or read (in order to print to std::cout).
Conclusion: the explanation is not that new[] over-allocates. It is that your code has undefined behaviour, so can seem to work anyway.
#include <iostream>
using namespace std;

int main()
{
    int *p;
    double *q;
    cout << p << " " << q << endl;
    p++;
    q++;
    cout << p << " " << q << endl;
    //*p = 5; // should be wrong!
}
This program prints
0x7ffe6c0591a0 0
0x7ffe6c0591a4 0x8
Why does p point to some random address and q to zero? Also, when I uncomment the line *p = 5, shouldn't it throw an error? It still works fine:
Output with that line uncommented:
0x7ffc909a2f70 0
0x7ffc909a2f74 0x8
What can explain this weird behaviour?
When local (auto) variables of basic type (int, double, pointers to them, etc) are uninitialised, any operation that accesses their value yields undefined behaviour.
Printing a variable accesses its value, so both statements with cout << ... give undefined behaviour. Incrementing a variable also accesses its value (it is not possible to produce the result of incrementing without accessing the previous value), so both increment operators give undefined behaviour. Dereferencing an uninitialised pointer (as in *p) gives undefined behaviour, as does assigning a value through it (*p = 5).
So every statement you have shown after the definitions of p and q gives undefined behaviour.
Undefined behaviour means there are no constraints on what is permitted to happen - or, more simply, that anything can happen. That allows any result from "appear to do nothing" to "crash" to "reformat your hard drive".
The particular output you are getting therefore doesn't really matter. You may get completely different behaviour when the code is built with a different compiler, or even during a different phase of the moon.
In terms of a partial explanation of what you are seeing .... The variables p and q will probably receive values corresponding to whatever happens to be in memory at the location where they are created - and therefore to whatever some code (within an operating system driver, within your program, even within some other program) happened to write at that location previously. But that is only one of many possible explanations.
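A minimal sketch of the same experiment with the pointers initialised (the arrays are made up for the example), so every access is well-defined and the increments visibly move each pointer by sizeof(int) and sizeof(double) bytes respectively:
#include <iostream>
using namespace std;

int main()
{
    int arr_i[2] = {5, 6};
    double arr_d[2] = {1.5, 2.5};

    int *p = arr_i;        // now points at a real int
    double *q = arr_d;     // now points at a real double

    cout << p << " " << q << endl;
    p++;                   // advances by sizeof(int) bytes
    q++;                   // advances by sizeof(double) bytes
    cout << p << " " << q << endl;

    *p = 5;                // fine now: p points at arr_i[1]
    cout << *p << endl;
}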
As you have not initialised the variables - the code resorts to undefined behaviour. So anything can happen including what you have experienced.
C++ gives you more than enough rope to hang yourself. Compile with all the warnings switched on to avoid some of the perils.
When you do not initialise a variable, its value is indeterminate; what you actually observe depends on the compiler, the build settings, and whatever happens to be in that memory.
About ++: if you create a pointer to a class A and apply ++ to it, the pointer will be incremented by sizeof(A) bytes.
And about your last question: *p = 5 is a good instance of undefined behaviour, because you did not allocate any memory for p to point at.
Hey, I am curious about some C++ behaviour, as the code I am working on would benefit greatly from this in terms of simplicity if this behaviour is consistent. Basically, the idea is for a specific function inside my object A to compute a complex calculation returning a float, but just before returning the float, occasionally, to call delete this.
1
Here is a code example of the functionality I am trying to verify is consistent.
#include <iostream>
#include <stdio.h>
#include <cstdlib>
using namespace std;

struct A
{
    float test(float a) { delete this; return a; }
};

int main()
{
    A *a = new A();
    cout << a->test(1.f) << endl;
    cout << "deleted?" << endl;
    cout << a->test(1.f) << endl;
}
The output becomes:
1.0
deleted?
*** Error in `./test': double free or corruption (fasttop): 0x0105d008 *** Aborted (core dumped)
I think this means the object was deleted correctly (what is left in memory? an uncallable skeleton of A? a typed pointer? a null pointer?), but I am not sure whether I am right about that. If so, is this behaviour going to be consistent (my functions will only be returning native types, i.e. floats)?
2
Additionally I am curious as to why this doesn't seem to work:
struct A
{
    float test(float a) { delete this; return a; }
};

int main()
{
    A a;
    cout << a.test(1.f) << endl;
}
This compiles but aborts with the following error before returning anything:
*** Error in `./test': free(): invalid pointer: 0xbe9e4c64 *** Aborted (core dumped)
NOTE: Please don't reply with a long list of explanations as to why this is bad coding/etiquette or whatever; I don't care, I am simply interested in the possibilities.
It is safe for a member function to call delete this; if you know that the object was allocated using scalar new and that nothing else will use the object afterward.
In your first example, after the first call to a->test(1.f), a becomes a "dangling pointer". You invoke Undefined Behavior when you dereference it to call test a second time.
In your second example, the delete this; statement is Undefined Behavior because the object was not created using new.
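A minimal sketch of the usage pattern this answer describes as safe: the object comes from scalar new, test() is the last thing that ever touches it, and the caller never uses the pointer again afterwards (nulling it out is just a defensive habit, not a requirement):
#include <iostream>
using namespace std;

struct A
{
    float test(float a) { delete this; return a; }
};

int main()
{
    A *a = new A();
    cout << a->test(1.f) << endl;   // last use: the object no longer exists after this call
    a = nullptr;                    // guard against accidental reuse
    cout << "deleted" << endl;
}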
The behavior is undefined, but in a typical modern implementation the practical "possibilities" of accessing deallocated memory include (but are not limited to):
delete releases memory at the run-time library (RTL) level, but does not return it to the OS. I.e. OS-level memory protection is not engaged and the OS continues to see that memory as allocated. However, internal RTL data stored in freed memory blocks clobbers your data. The result: access through the pointer does not cause your code to crash, but the data looks meaningless (clobbered).
Same as 1, but the internal RTL data happens not to overlap your critical data. The code does not crash and continues to work "as if" everything were "fine".
delete releases memory back to the OS. OS-level memory protection is engaged. Any attempt to access through the pointer causes an immediate crash.
Your examples proceed in accordance with the second scenario, i.e. the data stored in the object appears to remain untouched even after you free the memory.
The crashes you observe in your code happen because the RTL detects a double-free attempt (or an attempt to free non-dynamic memory, as in the second example), which is somewhat beside the point in the context of your question.