I have the following code:
char *func(char *a)
{
    char b[1000];
    strcpy(b, a);
    return b;
}
(I know the code is bad, because I return the address of a local array that is destroyed when the function exits.) My question is: what will be deleted/overwritten if I pass in "a" an array of 2000 chars, while "b" is only a 1000-char array? I read this question somewhere, and they said that from this code you can work out what will be overwritten.
It seems you are not familiar with the idea of a stack. When control enters a function, the function gets a region of the stack, and all of its local variables are allocated there. When the function returns, the stack pointer is restored to its original value. So b is on the stack and 1000 bytes are reserved for it. When the program returns from func, nothing is actually deleted or overwritten until some other function uses that area of the stack. You can try accessing b just after you come out of the function and it will often appear to work (though doing so is still undefined behavior). But suppose that after calling func you call another function func1 that has some local variables of its own: updating those variables will overwrite the contents of the memory that b points to.
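A minimal sketch of what that answer describes (func1 and msg are illustrative names, and reading the dangling pointer is undefined behavior, so none of this is guaranteed by the language; it merely tends to happen on typical stack-based implementations):

#include <cstring>

char *func(char *a) {
    char b[1000];
    std::strcpy(b, a);
    return b;             // dangling pointer once func returns
}

void func1() {
    char locals[1000];                        // likely reuses the stack area b occupied
    std::memset(locals, 'X', sizeof(locals)); // writing here clobbers that area
}

int main() {
    char msg[] = "hello";
    char *p = func(msg);
    // p may still appear to point at "hello" here, and may appear clobbered
    // after func1() runs, but either read through p would be undefined behavior.
    func1();
    (void)p;
    return 0;
}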
what will be deleted/overwritten if I pass in "a" an array of 2000 chars, while "b" is only a 1000-char array?
The behaviour is undefined. From the standard's perspective, nothing is guaranteed. Some memory could be overwritten, or it might not be. In practice, it's likely that some memory would be overwritten.
Correct me if I'm wrong: this overflow will overwrite the value of the pointer, right?
It could. It might not. The behaviour is undefined.
What will happen depends on the compiler, the version of the compiler, the cpu architecture, the compilation options, how the rest of the program is defined and possibly other factors.
In a typical implementation, the return value fits in a register and is computed as a fixed offset from the stack pointer, so in practice it is unlikely to be stored on the stack, in which case it would not be affected by the overflow. There are much more dangerous potential side effects, such as the function returning to a completely different place than where it was called from.
If the C string in a that you pass is bigger than the space you allocated for b (1000 bytes), strcpy will happily write past the end of b and on through whatever sits on the stack beyond it. A C string carries no length information: strcpy(b, a) keeps copying bytes into b until it finds a \0 in a.
On func's stack frame, the compiler reserves 1000 bytes for b, and somewhere nearby it stores the return address to whatever called func. If a overflows b, you can write over that return address, and when func returns you will jump to some essentially random address, with horrible results. Each compiler is free to place the return address wherever it likes; maybe it sits at the top of the frame. But even in that scenario you will be writing over things you should not be writing over. If you're lucky you'll get an access violation.
To protect against this you can use strncpy(b, a, sizeof(b) - 1) and put a \0 at the end of b. Better yet, check strlen(a) and handle the error sanely if strlen(a) > sizeof(b) - 1.
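As a rough sketch of that suggestion (copy_checked is an illustrative name, not from the original code), assuming the caller supplies the destination buffer so nothing dangling is returned:

#include <cstring>

// Copy at most sizeof(b)-1 bytes, always null-terminate, and reject
// inputs that would not fit instead of overflowing.
bool copy_checked(char (&b)[1000], const char *a) {
    if (std::strlen(a) > sizeof(b) - 1)
        return false;                    // handle the oversized input sanely
    std::strncpy(b, a, sizeof(b) - 1);
    b[sizeof(b) - 1] = '\0';             // strncpy may omit the terminator
    return true;
}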
This is exactly the "buffer overrun" technique that attackers used to break Windows security restrictions. You call a Windows function that expects a C string but does not check the length of the string passed in. The attacker's input guesses at the size of the input buffer, and eventually it guesses correctly and overwrites the function's return address so that it points back into the string that was passed in. The remainder of the input string contains machine-code instructions, which then run under the security permissions of the Windows function and can do whatever they want.
Microsoft closed this particular loophole long ago, but it remains a good lesson: check the length of the C strings you accept as input parameters.
When you define a variable without initialization, on either the stack or the free store, it usually holds a garbage value, since assigning it some default value (e.g. 0) would just be a waste of time.
Examples:
int foo;                // uninitialized: foo may contain any value
int* fooptr = new int;  // uninitialized: *fooptr may contain any value
This however doesn't answer the question of where the garbage values come from.
The usual explanation is that new or malloc or whatever you use to get dynamically allocated memory doesn't initialize the memory to any particular value, as stated above, and the garbage values are just leftovers from whatever code used the same memory before.
So I put this explanation to the test:
#include <iostream>

int main()
{
    int* ptr = new int[10]{0};  // allocate memory and initialize everything to 0
    for (int i = 0; i < 10; ++i)
    {
        std::cout << *(ptr + i) << " " << ptr + i << std::endl;
    }
    delete[] ptr;

    ptr = new int[10];          // allocate memory without initialization
    for (int i = 0; i < 10; ++i)
    {
        std::cout << *(ptr + i) << " " << ptr + i << std::endl;
    }
    delete[] ptr;
}
Output:
0 0x1291a60
0 0x1291a64
0 0x1291a68
0 0x1291a6c
0 0x1291a70
0 0x1291a74
0 0x1291a78
0 0x1291a7c
0 0x1291a80
0 0x1291a84
19471096 0x1291a60
19464384 0x1291a64
0 0x1291a68
0 0x1291a6c
0 0x1291a70
0 0x1291a74
0 0x1291a78
0 0x1291a7c
0 0x1291a80
0 0x1291a84
In this code sample I allocate memory for 10 ints twice. The first time I initialize every value to 0. I then call delete[] on the pointer and proceed to immediately allocate memory for 10 ints again, this time without initialization.
Yes, I know that the results of using an uninitialized variable are undefined, but I want to focus on the garbage values for now.
The output shows that the first two ints now contain garbage values at the same memory locations as before.
If we take the explanation for garbage values into consideration, this leaves me with only one conclusion: between deleting the pointer and allocating the memory again, something must have tampered with the values in those memory locations.
But isn't the free store reserved for new and delete?
What could have tampered those values?
Edit:
I removed the std::cout since a comment pointed it out.
I use the compiler that ships with Eclipse 2022-06 (MinGW GCC), with default flags, on Windows 10.
One of the things you need to understand about heap allocations is that a small control block is usually allocated alongside the block you request with new. The values in the control block tell the allocator how much space is being freed when delete is called.
When a block is deleted, the first part of the buffer is often overwritten by such a control block. If you look at the two values your program printed as hex, you will notice they look like addresses in the same general region of memory. The first looks like a pointer to the next allocated location, while the second looks like a pointer to the start of the heap block.
Edit: one of the main reasons to put this kind of control block into a recently deallocated buffer is that it supports memory coalescing. That two-int signature effectively records how much memory can be reclaimed if that space is reused, and it signals that the block is empty by pointing to the start of the frame.
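As a purely illustrative model of that idea (this is not the actual layout used by MinGW's allocator), a free-list allocator might overlay a node like the following on the first bytes of a freed block, which would explain why exactly the first couple of ints look like heap addresses:

#include <cstddef>
#include <iostream>

// Hypothetical bookkeeping node; real allocators differ in the details.
struct FreeNode {
    FreeNode*   next;   // link to another free block (prints like a heap address)
    std::size_t size;   // bytes available if this block is reused or coalesced
};

int main() {
    // Two pointer-sized fields cover the first 8 to 16 bytes of a freed block,
    // which matches "only the first couple of ints contain garbage".
    std::cout << "a node like this overlays the first " << sizeof(FreeNode)
              << " bytes of a freed block\n";
}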
When you define a variable without initialization, on either the stack or the free store, it usually holds a garbage value, since assigning it some default value (e.g. 0) would just be a waste of time.
No. The initial value of a variable that is not initialized is always garbage. All garbage. This is inherent in "not initialized". The language semantics do not specify what the value of the variable is, and reading that value produces undefined behavior. If you do read it and it seems to make sense to you -- it is all zeroes, for example, or it looks like the value that some now-dead variable might have held -- that is meaningless.
This however doesn't answer the question of where the garbage values come from.
At the level of the language semantics, that question is nonsensical. "Garbage values" aren't a thing in themselves. The term describes values you cannot safely rely on, precisely because the language does not specify where they come from or how they are determined.
The usual explanation to that is that new or malloc or whatever you use to get dynamically allocated memory don't initialize the memory [so the] values are just leftover from whatever program used the same memory prior.
That's an explanation derived from typical C and C++ implementation details. Read again: implementation details. These are what you are asking about, and unless your objective is to learn about writing C and / or C++ compilers or implementations of their standard libraries, it is not a particularly useful area to probe. The specifics vary across implementations and sometimes between versions of the same implementation, and if your programs do anything that exposes them to these details then those programs are wrong.
I know that the results of using an uninitialized variable are undefined, but I want to focus on the garbage values for now.
No, apparently you do not know that the results of using the value of an uninitialized variable are undefined. If you did, you would not present the results of your program as if they were somehow meaningful.
You also seem not to understand the term "garbage value", for in addition to thinking that the results of your program are meaningful, you appear to think that some of the values it outputs are not garbage.
I have this fragment of code in C++:
char x[50];
cout << x << endl;
which outputs some random symbols.
So my first question: what is the reason behind this output? Shouldn't it be spaces, or at least the same symbols every time?
The reason I am concerned with this is that I am writing a program in CUDA and doing some character manipulation inside a __global__ function, so using std::string gives a "calling host function is not allowed" error.
But if I use a "big enough" char array (each chunk of text I operate on differs in size, so it will not always fill the array completely), it sometimes isn't fully filled and I'm left with junk hanging at the end of the text.
So my second question: is there any way to avoid this?
what is the reason behind this output?
The values in an automatic variable are indeterminate. The standard doesn't specify them, so they might be spaces as you suggested, or they might be random content.
[...] sometimes not fully filled and I left with junk [...]
Strings in C are null-terminated, so any routine dedicated to printing a string will loop until it encounters a null byte. In uninitialized memory such a null byte occurs at a random position (or not at all), and the weird trailing characters are a result of that.
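In effect, any C-string printing routine boils down to something like this sketch (illustrative only, not the actual library code):

#include <cstdio>

// Keep emitting characters until a null byte is found. In uninitialized
// memory that null byte may be far away, or missing entirely.
void print_cstring(const char* s) {
    while (*s != '\0')
        std::putchar(*s++);
}

int main() {
    print_cstring("hello");
    std::putchar('\n');
}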
is there any way to avoid this?
Yes. Initialize it.
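For example, a minimal sketch of what "initialize it" means for the array in the question:

#include <iostream>

int main() {
    char x[50] = {};                 // value-initializes every element to '\0'
    std::cout << x << std::endl;     // prints an empty line instead of junk
}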
(will assume x86 in this post)
what is the reason behind this output?
Here's roughly what happens, in assembly, when you do char x[50];:
SUB ESP, 0x34 ; reserve 52 bytes
Essentially, the stack pointer is moved down by 0x34 bytes (typically rounded up for alignment). That space on the stack then becomes x. There's no cleaning, no changes or pushes or pops, just that space becoming x. Anything that was there before (abandoned parameters, return addresses, variables from previous function calls) will be in x.
Here's roughly what happens when you do new char[50]:
1. Control gets passed to the allocator
2. The allocator looks for any heap of sufficient size (read: an already allocated but uncommitted heap)
3. If 2 fails, the allocator makes a new heap
4. The allocator takes the heap (either the found or allocated one) and commits it
5. The address of that heap is returned to your code where it is used as a char*
The same as with the stack, you get whatever data is there. Some programs or systems have allocators that zero out heaps when they are allocated or committed, others may zero only when allocating but not when committing, and some may not zero at all. Depending on the allocator, you may get clean memory or you may get reused, dirty memory. This is why the values here can be non-zero and aren't predictable.
is there any way to avoid this?
In the case of heap memory, you can overload the new and delete operators in C++ and always zero newly allocated memory; a minimal sketch of that follows below. As for memory on the stack, you just have to live with zeroing it out every time:
ZeroMemory(myArray, sizeof(myArray));
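For the heap half of that suggestion, here is a minimal sketch of replacing the global allocation functions so that every new-ed block comes back zeroed. This is one possible implementation under that assumption, not the only way to overload these operators:

#include <cstdlib>
#include <cstring>
#include <new>

// Zero every block handed out by the global allocation functions.
void* operator new(std::size_t size) {
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();
    std::memset(p, 0, size);          // scrub the freshly allocated block
    return p;
}

void* operator new[](std::size_t size) {
    return operator new(size);        // reuse the zeroing single-object form
}

void operator delete(void* p) noexcept   { std::free(p); }
void operator delete[](void* p) noexcept { std::free(p); }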
Alternatively, for both methods, you could stay away from naked arrays and use std::vector or other wrappers that take care of initialization for you. You'll still want to make sure to initialize integers and other numeric or pointer data-types, though.
No, there is no way to avoid it automatically. C++ does not initialize automatic variables of built-in types (such as the arrays of built-in types in your case); you need to initialize them yourself.
Why are you having issues with this code?
char x[50];
cout << new char[50] << endl;
cout << x << endl;
You're leaking memory with the new char[50] that has no corresponding delete[].
Also, uninitialized memory is indeterminate, as others have said, and in most cases you get garbage within that memory block. A better approach is to initialize it:
char x[50] = {};
char* y = new char[50]();
Then just remember to call delete[] on y later to free the memory. Yes, the OS will reclaim it when the process exits, but relying on that is not the way to write good programs.
For example, we have
int* p;
Could this pointer happen to be initialized to 0, that is, initialized by the operating system, given that we never change the value of the pointer ourselves?
Here's the tricky part: no valid program can figure this out. Reading p is Undefined Behavior, and anything may happen including returning nullptr even though p doesn't actually contain nullptr (!)
If you wonder how that's possible, p may be put in a register on first write. Trying to read p before that would give rather random results.
Assumption: you are talking about the possibility that the value returned by malloc or new happens to be 0 at some point.
In this case, I believe the answer is no. The pointer will hold a virtual address. Being something allocated dynamically, it will get an address belonging to the heap, which never starts at address 0.
The virtual memory space of your process is divided into sections: text, data, BSS, the heap (where all dynamically allocated objects go), the stack and the kernel space. The classic diagram of this layout is drawn for a 32-bit OS, but for 64-bit the picture is similar.
You can write a small program that reads some addresses from the different sections and see for yourself what you can and cannot access; a sketch follows below.
The heap (the place where your pointer will point) grows after the text, data and BSS segments, so it will never be 0.
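A small program along those lines might look like this (the names are illustrative, and the exact addresses you see depend on your OS and on ASLR):

#include <cstdio>
#include <cstdlib>

int global_var;                      // data/BSS segment

static void text_marker() {}         // lives in the text (code) segment

int main() {
    static int static_var;           // data/BSS segment
    int stack_var;                   // stack
    int* heap_var = static_cast<int*>(std::malloc(sizeof(int)));  // heap

    std::printf("text  : %p\n", reinterpret_cast<void*>(&text_marker));
    std::printf("data  : %p\n", static_cast<void*>(&global_var));
    std::printf("static: %p\n", static_cast<void*>(&static_var));
    std::printf("heap  : %p\n", static_cast<void*>(heap_var));
    std::printf("stack : %p\n", static_cast<void*>(&stack_var));

    std::free(heap_var);
}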
Declaring the variable as global or static gets it automatically initialized to 0, because zero-initialized (static) storage is set up before the program starts.
I can fix this manually, so it isn't an urgent question, but I thought it was really strange.
Here is the entirety of my code before the weird thing happens:
int main(int argc, char** arg) {
    int memory[100];
    int loadCounter = 0;
    bool getInput = true;

    print_memory(memory);
and then some other unrelated stuff.
print_memory just prints the array, which I assumed would be initialized to all zeros, but instead the first few numbers are:
+1606636544 +32767 +1606418432 +32767 +1856227894 +1212071026 +1790564758 +813168429 +0000 +0000
(The plus signs and the filler zeros are just formatting, since all the numbers are supposed to be from 0-1000 once the array is filled. The rest of the list is zeros.)
It also isn't a memory leak, because I tried initializing a different array variable and on the first run it also gave me a ton of weird numbers. Why is this happening?
Since you asked "What do C++ arrays init to?", the answer is: they are initialized to whatever happens to be in the memory they are allocated at the time they come into scope.
I.e. they are not initialized.
Do note that some compilers will initialize stack variables to zero in debug builds; this can lead to nasty, randomly occurring issues once you start doing release builds.
The array you are using is stack allocated:
int memory[100];
When that particular function scope exits (in this case main) or returns, the memory is reclaimed and does not leak; this is how stack-allocated memory works. Here you allocated 100 integers (32 bits each on my compiler) on the stack, as opposed to on the heap. A heap allocation is just somewhere else in memory, hopefully far away from the stack. Heap-allocated memory can leak; low-level plain-old-data allocated on the stack (like in your code) won't.
The reason you got random values in your function is probably that you didn't initialize the data in the memory array of integers. In release mode, the application or the C runtime (on Windows at least) will not initialize that memory to a known base value, so what is in the array is whatever was left over from the last time the stack used that memory. It could be a few milliseconds old (most likely), a few seconds old (less likely), or a few minutes old (far less likely). Either way, it's garbage and should be avoided at all costs.
The problem is we don't know what is in your function print_memory, but if that function doesn't alter the memory in any way, that would explain why you are getting seemingly random values. You need to initialize those values before using them. I like to declare my stack-based buffers like this:
int memory[100] = {0};
That's a shortcut that tells the compiler to fill the entire array with zeros.
It works for strings and any other basic data type too:
char MyName[100] = {0};
float NoMoney[100] = {0};
I'm not sure what compiler you are using, but if you are using a Microsoft compiler with Visual Studio you should be just fine.
In addition to other answers, consider this: What is an array?
In managed languages, such as Java or C#, you work with high-level abstractions. C and C++ don't provide such abstractions (I mean hardware abstractions, not language abstractions like OO features). They are designed to work close to the metal; that is, the language uses the hardware (memory, in this case) directly, without abstractions.
That means that when you declare a local variable, int a for example, what the compiler does is say "OK, I'm going to interpret the chunk of memory [A, A + sizeof(int)] as an integer, which I'll call a" (where A is the offset between the beginning of that chunk and the start address of the function's stack frame).
As you can see, the compiler only "assigns" memory segments to variables. It does not do any "magic", like "creating" variables. You have to understand that your code is executed on a machine, and the machine has only memory and a CPU. There is no magic.
So what is the value of a variable when the function starts executing? Whatever value is represented by the data already in that variable's chunk of memory. Usually that data makes no sense from our current point of view (it could be part of the data previously used by a string, for example), so when you access the variable you get strange values. That's what we call "garbage": data written earlier that has no meaning in our context.
The same applies to an array: an array is only a bigger chunk of memory, with enough space to fit all the values of the array: [A, A + (length of the array) * sizeof(type of array elements)]. So, as in the single-variable case, the memory contains garbage.
Commonly you want to initialize an array with a set of values during its declaration. You can achieve that with an initializer list:
int array[] = {1,2,3,4};
In that case, the compiler adds code to the function to initialize the array's chunk of memory with those values.
Sidenote: Non-POD types and static storage
The things explained above apply only to POD types such as basic types and arrays of basic types. With non-POD types like classes, the compiler adds calls to the constructors of the variables, which are designed to initialize the values (attributes) of a class instance.
In addition, even if you use POD types, if a variable has static storage duration the compiler initializes its memory to a default value, because static variables are allocated at program start.
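A minimal illustration of that difference under the usual rules (zero-initialized static storage versus indeterminate automatic storage):

#include <iostream>

static int zeroed[4];                // static storage duration: zero-initialized

int main() {
    int indeterminate[4];            // automatic storage: values are indeterminate
    int initialized[4] = {};         // explicit initializer: all zeros

    std::cout << zeroed[0] << ' ' << initialized[0] << '\n';   // prints "0 0"
    // Reading indeterminate[0] here would be undefined behavior, so we don't.
    (void)indeterminate;
}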
Local variables on the stack are not initialized in C/C++. C/C++ is designed to be fast, so it doesn't zero the stack on function calls.
Before main() runs, the language runtime sets up the environment. Exactly what it does you'd have to discover by breaking at the load module's entry point and watching the stack pointer, but at any rate the stack space you get on entering main is not guaranteed to be clean.
Anything that needs clean stack, malloc, or new space gets to clean it itself, and plenty of things don't. C[++] isn't in the business of doing unnecessary work. In C++ a class object can have non-trivial constructors that run implicitly and guarantee the object is set up for use, but arrays and plain scalars don't have constructors; if you want an initial value you have to write an initializer.
I have two questions regarding array:
First one is regarding following code:
int a[30]; //1
a[40]=1; //2
Why isn't line 2 giving a segfault? It should, because the array has been allocated space for only 30 ints, and any dereference outside its allocated space should give a segfault.
Second: assuming the above code works, is there any chance that a[40] will get overwritten later, since it doesn't fall within the reserved range of the array?
Thanks in advance.
That's undefined behavior - it may crash, it may silently corrupt data, it may produce no observable results, anything. Don't do it.
In your example the likely explanation is that the array is stack-allocated, so there's a wide range of writable addresses around the array and there are no immediate observable results. However, depending on which direction the stack grows on your system (toward larger or smaller addresses), this might overwrite the return address and temporaries of functions further up the call stack, which will crash your program or make it misbehave when it tries to return from the function.
For performance reasons, C does not check array bounds each time you access an element. You can also access elements via raw pointers, in which case there is no way to validate the access anyway.
A segfault happens only if you touch memory outside what is allocated to your process.
For the 2nd question: yes, a[40] can be overwritten, since that memory belongs to your process and is possibly used by other variables.
It depends on where the system has allocated that array; if by chance position 40 lands in memory reserved by the operating system, then you will get a segfault.
Your application will crash only if you do something illegal with respect to the rest of your system: if you try to access a virtual memory address that your program doesn't own, the hardware will notice, inform your operating system, and the OS will kill your application with a segmentation fault: you accessed a memory segment you were not supposed to.
However, if you access a more or less random memory address (which is what you did: a[40] is certainly outside your array a, but it could point anywhere), you may well hit a valid memory cell, which is what happened to you.
This is still an error: you'll likely overwrite some memory area your program owns, risking breakage elsewhere in your program, but the system cannot know whether you accessed it on purpose or by mistake, so it won't kill you.
Programs written in managed languages (i.e. programs that run in a protected environment that checks everything) would notice your erroneous memory access, but C is not a managed language: you're free to do whatever you want (as long as you don't create problems for the rest of the system).
The reason line 2 works and doesn't throw a segfault is that in C/C++ an array access is just pointer arithmetic: the array name decays to a pointer to its first element, and the subscript tells your program how many elements past that address to look. So if a lives at, say, address 1004, then
printf("%p", (void*)a);
// prints out "1004"
and
printf("%p", (void*)&a[0]);
// prints out "1004"
print the same value.
However,
printf("%p", (void*)&a[40]);
// prints out "1164"
yields the address sizeof(int) * 40 bytes past the address of a. Nothing in that calculation checks whether the result is still inside the array.
Yes, it will eventually be overwritten.
If you malloc the space, an out-of-bounds write is somewhat more likely to be caught (or at least I believe so), but when using a stack array without allocating space you'll be able to overwrite memory for a while. It will crash eventually, possibly when you hit a memory block reserved for something else (it's hard to say exactly what's going on under the hood).
Funny thing is that, IIRC, efence won't catch this either :D.