I often make the mistake of not initializing the values of an array.
In theory, the array should then contain garbage values.
In practice, however, many of the values turn out to be zero,
so the program happily passes small test cases.
This makes debugging difficult.
Can you tell me why this is happening?
Uninitialized values usually appear to be zero in simple test cases because modern operating systems blank memory before handing it to a process, as a security precaution. This won't hold once your program has been running for a while, so don't depend on it. It applies to both automatic (stack) variables and heap allocations; for stack variables it is actually worse, because the variable can take on a bit pattern it could never hold normally, potentially crashing your program outright. On the Itanium processor, for example, merely assigning an uninitialized integer variable to another variable could crash with a memory fault.
Or try it in DOS: the zeros won't be there, because DOS doesn't blank memory.
On the other hand, static and global allocations are guaranteed by the standard to be zeroed if you don't initialize them.
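For example, both variables below are guaranteed to start at zero even though no initializer is written (a small self-contained illustration, not taken from the questioner's code):

#include <cstdio>

int global_counter;                  // static storage duration: zero-initialized per the standard

int main() {
    static int call_count;           // also static storage duration, also guaranteed to start at 0
    std::printf("%d %d\n", global_counter, call_count);   // prints "0 0"
    return 0;
}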
If you want to be warned about uninitialised memory, clang++ supports MemorySanitizer: add -fsanitize=memory to your compiler flags and you'll get runtime errors when you read uninitialised memory. (g++ does not implement MemorySanitizer; its closest equivalents are compile-time warnings such as -Wmaybe-uninitialized.)
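A minimal sketch of how that looks in practice, assuming a Clang toolchain where MemorySanitizer is available (the file name is just a placeholder):

// msan_demo.cpp -- build with: clang++ -g -fsanitize=memory msan_demo.cpp
#include <cstdio>

int main() {
    int values[4];                   // deliberately left uninitialised
    if (values[2] > 0)               // MSan reports a use-of-uninitialized-value here at run time
        std::printf("positive\n");
    return 0;
}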
I have a very basic question about NULL pointers in C. Consider a hypothetical 64-bit system with very limited memory, say 4 KB, and a large number of integer pointers all set to NULL, such that their total size exceeds the available memory. Will such a program compile and execute?
Assume that the program doesn't have to do anything meaningful; it just declares a bunch of null integer pointers (of the sort int *x = NULL) and terminates.
Even though you did this:
int *x = NULL;
memory is still allocated for storing the pointer x itself (despite the NULL on the right-hand side). If x is an automatic variable, that memory is allocated on the stack.
If you had used malloc on the right hand side you would additionally have claimed memory from the heap.
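A tiny sketch of that distinction (the sizes printed are typical for a 64-bit system, not guaranteed):

#include <cstdio>
#include <cstdlib>

int main() {
    int *x = NULL;                                        // costs only the pointer itself, on the stack
    int *y = static_cast<int *>(std::malloc(sizeof *y));  // the pointer, plus sizeof(int) claimed from the heap
    std::printf("pointer: %zu bytes, pointee: %zu bytes\n", sizeof x, sizeof *y);   // typically "8" and "4"
    std::free(y);
    return 0;
}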
Now, if you create enough such pointers to exceed the available stack memory, you will get a stack overflow at run time - but if you never use those pointers, they may well be optimized away.
If you declare but don't use a variable which has no side effects the compiler will optimize it out of existence. So no, this is not a way to go out of memory.
If you don't have optimizations turned on, you could create enough variables on the stack to cause a stack overflow. You could also just create a really big array on the stack.
That said, it's quite easy to run out of memory, and you don't need to do it with copious quantities of int pointers. No matter how you manage to run out of memory, it won't stop you from compiling the program successfully.
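As a sketch of how little it takes once optimization is out of the picture, the deliberately oversized array below exceeds the typical 1-8 MB default stack limit, so on most systems the program dies with a stack overflow reported as a segmentation fault; the exact threshold is system-dependent:

int main() {
    volatile char big[64 * 1024 * 1024];   // ~64 MB of automatic storage, far beyond typical stack limits
    big[0] = 1;                            // volatile plus a real write, so it cannot be optimized away
    return big[0];
}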
I have some code that creates a dynamically allocated array with
int *Array = new int[size];
From what I understand, Array should be a pointer to the first item of Array in memory. When using gdb, I can call x Array to examine the value at the first memory location, x Array+1 to examine the second, etc. I expect to have junk values left over from whatever application was using those spots in memory prior to mine. However, using x Array returns 0x00000000 for all those spots. What am I doing wrong? Is my code initializing all of the values of the Array to zero?
EDIT: For the record, I ask because my program is an attempt to implement this: http://eli.thegreenplace.net/2008/08/23/initializing-an-array-in-constant-time/. I want to make sure that my algorithm isn't incrementing through the array to initialize every element to 0.
In most modern OSes, the OS gives zeroed pages to applications rather than letting information leak between unrelated processes; that matters for security, among other things. Back in the old DOS days things were a bit more casual. Today, with memory-protected OSes, the OS generally gives you zeros to start with.
So, if this new happens early in your program, you're likely to get zeros. You'd be crazy to rely on that though; it's undefined behavior if you do.
If you keep allocating, filling, and freeing memory, eventually new will return memory that isn't zeroed. Rather, it'll contain remnants of your process' own earlier scribblings.
And there's no guarantee that any particular call to new, even at the beginning of your program, will return memory filled with zeros. You're just likely to see that for calls to new early in your program. Don't let that mislead you.
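A small, hedged illustration of that reuse; whether the allocator actually hands back the same block is entirely implementation-dependent, so the program only compares addresses and never reads the indeterminate contents:

#include <cstdint>
#include <iostream>

int main() {
    int *first = new int[1000];
    for (int i = 0; i < 1000; ++i) first[i] = 42;                       // scribble on the block
    std::uintptr_t old_addr = reinterpret_cast<std::uintptr_t>(first);  // remember where it was
    delete[] first;

    int *second = new int[1000];                                        // often, but not always, the same block
    std::uintptr_t new_addr = reinterpret_cast<std::uintptr_t>(second);
    std::cout << std::boolalpha << "same block reused: " << (old_addr == new_addr) << '\n';
    // If it was reused, second[] now starts out holding the old 42s -- but reading
    // those indeterminate values would be undefined behavior, so we don't.
    delete[] second;
    return 0;
}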
I expect to have junk values left over from whatever application was using those spots
It's certainly possible but by no means guaranteed. Particularly in debug builds, you're just as likely to have the runtime zero out that memory (or fill it with some recognisable bit pattern) instead, to help you debug things if you use the memory incorrectly.
And, really, "those spots" is a rather loose term, given virtual addressing.
The important thing is that, no, your code is not setting all those values to zero.
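For completeness: if you did want every element guaranteed to be zero (which would defeat the constant-time-initialization exercise), value-initialization spells that out explicitly:

#include <cstddef>
#include <iostream>

int main() {
    const std::size_t size = 8;
    int *zeroed = new int[size]();   // the trailing () value-initializes: every element is 0
    int *junk   = new int[size];     // no initializer: every element is indeterminate
    std::cout << zeroed[0] << '\n';  // guaranteed to print 0
    // std::cout << junk[0] << '\n'; // indeterminate -- don't rely on it
    delete[] junk;
    delete[] zeroed;
    return 0;
}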
I have two questions regarding arrays:
The first one is regarding the following code:
int a[30]; //1
a[40]=1; //2
why isn't line 2 giving a segfault? It should, because the array has been allocated
only 30 ints' worth of space, and any dereference outside that allocated space should give a segfault.
Second: assuming that the above code works, is there any chance that a[40] will later get overwritten, since it doesn't fall within the reserved range of the array?
Thanks in advance.
That's undefined behavior - it may crash, it may silently corrupt data, it may produce no observable results, anything. Don't do it.
In your example the likely explanation is that the array is stack-allocated, so there's a wide range of addresses around it that are writable, and hence no immediately observable results. However, depending on which direction the stack grows on your system (toward larger or smaller addresses), this might overwrite the return address or temporaries of functions up the call stack, and then your program will crash or misbehave when it tries to return from the function.
For performance reasons, C does not check array bounds on each access. You can also access elements through raw pointers, in which case there would be no way to validate the access anyway.
A segfault happens only if you touch memory outside what is allocated to your process.
For the second question: yes, a[40] can be overwritten, since that memory belongs to your process and may well be used by other variables.
It depends on where the system has allocated that array; if by chance position 40 lands in memory reserved by the operating system, you will receive a segfault.
Your application will crash only if you do something illegal for the rest of your system: if you try to access a virtual memory address that your program doesn't own, the hardware will notice, inform the operating system, and the OS will kill your application with a segmentation fault: you accessed a memory segment you were not supposed to.
However, if you access a random memory address (which is what you did: a[40] is certainly outside your array a, but where it lands is anyone's guess), you may hit a perfectly valid memory cell (which is what happened to you).
That is still an error: you will likely overwrite some memory area your program owns, risking breakage elsewhere in your program, but the system cannot tell whether you accessed it on purpose or by mistake, and it won't kill your process.
Programs written in managed languages (i.e., programs that run in a protected environment that checks every access) would notice your erroneous memory access, but C is not a managed language: you're free to do whatever you want (as long as you don't create problems for the rest of the system).
The reason line 2 works and doesn't give a segfault is that in C/C++ an array subscript is just pointer arithmetic: the array a starts at some memory address, e.g. 1004, and the subscript merely tells the program how many elements past that address to look. No bounds check is involved. (Note that a is an array, not a pointer, but in an expression like a[40] it decays to a pointer to its first element.)
This means that
printf("%p", (void *)a);
// prints the address of the array, e.g. 1004
and
printf("%p", (void *)&a[0]);
// prints the same address, 1004
should print the same value.
However,
printf("%p", (void *)&a[40]);
// prints 1164, i.e. 1004 + sizeof(int) * 40
yields the address sizeof(int) * 40 bytes past the start of a, regardless of whether that memory belongs to the array.
Yes, it will eventually be overwritten.
If you overrun a malloc'd block you may eventually get a segfault too (though heap overruns often just corrupt the allocator's bookkeeping first), but when overrunning a plain array on the stack you'll be able to overwrite memory for a while. It will crash eventually, perhaps when the overwrite hits a return address or a memory block reserved for something else (exactly what happens under the hood varies).
Funny thing is that, IIRC, efence won't catch this either :D.
How are garbage values generated in C and C++? Does the compiler use some random number generation technique to generate them?
If you mean the values of uninitialized variables, those aren't generated. They're just whatever garbage happened to be in that memory location.
int *foo = new int;
std::cout << *foo << std::endl;
The call to new returned a pointer to some address in memory. That bit of RAM has always existed; there is no way to know what was stored there before. If it was just requested from the OS, it'll likely be 0 (the OS erases memory blocks before handing them out, for security reasons). If it was previously used by your program, who knows.
Actually, the results of using an uninitialized variable are undefined. You might get back an unpredictable number, your program may crash, or worse.
Even if you know that it's safe to run the above on your platform, you shouldn't rely on it giving a random value. It will appear random, but is probably quite a bit more predictable and controllable than you'd like. Plus, the distribution will be nowhere near uniform.
If by "garbage values" you mean the values of uninitialized variables, it doesn't -- the value of an uninitialized variable is undefined by the standard. The garbage value you're thinking of is actually just whatever happened to be stored in that memory right before the storage for the variable was allocated from the stack or heap.
That said, some compilers offer debugging aids that will fill uninitialized variables with some well-known "magic number" to help you catch errors of this sort. For example, Microsoft Visual C++ fills uninitialized stack variables with 0xCCCCCCCC. These are specific to the compiler and usually require that you compile with debugging turned on.
The memory your program happens to use comes up in some state controlled by how the electrons flowed at power-up, funny properties of the silicon, what was in that memory before, and sometimes cosmic rays.
It isn't random, but it is extremely hard to predict.
Why does C/C++ treat these out-of-bounds array indexes differently?
#include <stdio.h>

int main()
{
    int a[10];
    a[3] = 4;
    a[11] = 3;      // does not give a segmentation fault
    a[25] = 4;      // does not give a segmentation fault
    a[20000] = 3;   // gives a segmentation fault
    return 0;
}
I understand that in the case of a[11] or a[25] it's accessing memory already allocated to the process or thread, and that a[20000] goes outside the stack bounds.
Why doesn't the compiler or linker give an error? Aren't they aware of the array size? If not, how does sizeof(a) work correctly?
The problem is that C/C++ doesn't actually do any boundary checking with regards to arrays. It depends on the OS to ensure that you are accessing valid memory.
In this particular case, you are declaring a stack-based array. Depending on the particular implementation, accessing outside the bounds of the array will simply access another part of the already allocated stack space (most OSes and threading implementations reserve a certain portion of memory for the stack). As long as you just happen to be playing around inside that pre-allocated stack space, it will not crash (note I did not say work).
What's happening on the last line is that you have now accessed beyond the part of memory that is allocated for the stack. As a result you are indexing into a part of memory that is not allocated to your process or is allocated in a read only fashion. The OS sees this and sends a seg fault to the process.
This is one of the reasons that C/C++ is so dangerous when it comes to boundary checking.
The segfault is not an intended action of your C program that would tell you that an index is out of bounds. Rather, it is an unintended consequence of undefined behavior.
In C and C++, if you declare an array like
type name[size];
You are only allowed to access elements with indexes from 0 up to size-1. Anything outside that range causes undefined behavior. If the index is only slightly out of range, you will most probably just read or write your own program's memory. If it is far out of range, your program will most probably be killed by the operating system. But you can't know; anything can happen.
Why does C allow that? Well, the basic gist of C and C++ is to not provide features if they cost performance. C and C++ have been used for ages for highly performance-critical systems. C has been used as an implementation language for kernels and programs where access out of array bounds can be useful to get fast access to objects that lie adjacent in memory. Having the compiler forbid this would be for naught.
Why doesn't it warn about that? Well, you can raise the warning levels and hope for the compiler's mercy. This is called quality of implementation (QoI): if a compiler uses open behavior (like undefined behavior) to do something good, it has a good quality of implementation in that regard. For example:
[js#HOST2 cpp]$ gcc -Wall -O2 main.c
main.c: In function 'main':
main.c:3: warning: array subscript is above array bounds
[js#HOST2 cpp]$
If it instead formatted your hard disk upon seeing the array accessed out of bounds - which would be legal for it - its quality of implementation would be rather poor. I enjoyed reading about this in the ANSI C Rationale document.
You generally only get a segmentation fault if you try to access memory your process doesn't own.
What you're seeing in the case of a[11] (and a[10], by the way) is memory that your process does own but that doesn't belong to the a[] array. a[20000] is so far from a[] that it's probably outside your memory altogether.
Changing a[11] is far more insidious, as it silently affects a different variable (or the stack frame, which may cause a different segmentation fault when your function returns).
C isn't doing this. The OS's virtual memory subsystem is.
In the case where you are only slightly out of bounds, you are addressing memory that is allocated for your program (on the call stack in this case). In the case where you are far out of bounds, you are addressing memory not given over to your program, and the OS throws a segmentation fault.
On some systems there is also an OS-enforced concept of "writeable" memory, and you might be trying to write to memory that you own but that is marked unwriteable.
Just to add to what other people are saying: you cannot rely on the program simply crashing in these cases; there is no guarantee of what will happen if you attempt to access a memory location beyond the "bounds of the array." It's just the same as if you did something like:
int *p;
p = (int *)135;   /* 135 is an arbitrary address the program doesn't own */
*p = 14;
That is just random; this might work. It might not. Don't do it. Code to prevent these sorts of problems.
As litb mentioned, some compilers can detect some out-of-bounds array accesses at compile time. But bounds checking at compile time won't catch everything:
int a[10];
int i = some_complicated_function();
printf("%d\n", a[i]);
To detect this, runtime checks would have to be used, and they're avoided in C because of their performance impact. Even with knowledge of a's array size at compile time, i.e. sizeof(a), it can't protect against that without inserting a runtime check.
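A sketch of what such a check looks like when written by hand, since the compiler will not insert it for you (the hard-coded index 25 stands in for a value computed at run time):

#include <cstddef>
#include <cstdio>

int main() {
    int a[10] = {0};
    int i = 25;                                    // imagine this came from some_complicated_function()
    const std::size_t n = sizeof a / sizeof a[0];  // 10: the size is known for a true array
    if (i >= 0 && static_cast<std::size_t>(i) < n)
        std::printf("%d\n", a[i]);
    else
        std::printf("index %d is out of bounds\n", i);   // the price: a comparison on every access
    return 0;
}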
As I understand the question and comments, you understand why bad things can happen when you access memory out of bounds, but you're wondering why your particular compiler didn't warn you.
Compilers are allowed to warn you, and many do at the highest warning levels. However the standard is written to allow people to run compilers for all sorts of devices, and compilers with all sorts of features so the standard requires the least it can while guaranteeing people can do useful work.
There are a few times the standard requires that a certain coding style will generate a diagnostic. There are several other times where the standard does not require a diagnostic. Even when a diagnostic is required I'm not aware of any place where the standard says what the exact wording should be.
But you're not completely out in the cold here. If your compiler doesn't warn you, Lint may. Additionally, there are a number of tools to detect such problems (at run time) for arrays on the heap, one of the more famous being Electric Fence (or DUMA). But even Electric Fence doesn't guarantee it will catch all overrun errors.
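As a more modern companion to Electric Fence, AddressSanitizer (available in both GCC and Clang) catches many heap overruns at run time; a hedged sketch, with a placeholder file name:

// asan_demo.cpp -- build with: g++ -g -fsanitize=address asan_demo.cpp   (clang++ works too)
#include <iostream>

int main() {
    int *data = new int[10];
    data[10] = 3;                    // one past the end: ASan typically aborts with a heap-buffer-overflow report
    std::cout << data[0] << '\n';
    delete[] data;
    return 0;
}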
That's not a C issue, it's an operating system issue. Your program has been granted a certain memory space, and anything you do inside of that is fine. The segmentation fault only happens when you access memory outside of your process space.
Not all operating systems have separate address spaces for each process, in which case you can corrupt the state of another process or of the operating system with no warning.
The C philosophy is to always trust the programmer. Also, not checking bounds allows the program to run faster.
As JaredPar said, C/C++ doesn't perform range checking on plain arrays. If your program accesses a memory location outside the allocated array, it may crash, or it may not, because it is merely overwriting some other variable on the stack.
To answer your question about sizeof operator in C:
You can reliably use sizeof(array)/sizeof(array[0]) to determine the element count of a true array, but using it doesn't mean the compiler performs any range checking.
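A short illustration of that idiom, and of the classic pitfall that it silently stops working once the array has decayed to a pointer (takes_pointer is a made-up helper for the example):

#include <cstdio>

static void takes_pointer(int arr[]) {             // the parameter is really an int*; the size is gone
    std::printf("inside the function: %zu\n", sizeof arr / sizeof arr[0]);   // pointer size / int size, not 10
}

int main() {
    int a[10];
    std::printf("in main: %zu\n", sizeof a / sizeof a[0]);                   // 10, as expected
    takes_pointer(a);
    return 0;
}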
My research showed that C/C++ developers believe that you shouldn't pay for something you don't use, and they trust the programmers to know what they are doing. (see accepted answer to this: Accessing an array out of bounds gives no error, why?)
If you can use C++ instead of C, maybe use vector? You can use vector[] when you need the performance (but no range checking) or, preferably, use vector.at() (which has range checking at the cost of some performance). Note that neither form grows the vector if the index is past the end: to add elements safely, use push_back(), which increases capacity as necessary.
More information on vector: http://www.cplusplus.com/reference/vector/vector/
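A brief sketch of the difference between the two access styles described above; index 10 is deliberately one past the end of a 10-element vector:

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> v(10, 0);       // ten elements, all zero

    // v[10] = 1;                    // unchecked: undefined behavior, may silently corrupt memory

    try {
        v.at(10) = 1;                // checked: throws instead of corrupting memory
    } catch (const std::out_of_range &e) {
        std::cout << "caught: " << e.what() << '\n';
    }

    v.push_back(1);                  // the safe way to grow the vector to an 11th element
    return 0;
}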