Why uninitialized char array is filled with random symbols? - c++

I have this fragment of code in C++:
char x[50];
cout << x << endl;
which outputs some random symbols as seen here:
So my first question: what is the reason behind this output? Shouldn't it be spaces or at least same symbols?
The reason I am concerned with this is that I am writing a program in CUDA and doing some character manipulation inside a __global__ function, where using string gives a "calling host function is not allowed" error.
But if I use a "big enough" char array (each chunk of text I operate on differs in size, so it will not always fill the char array completely), it is sometimes not fully filled and I am left with junk hanging at the end of the text, as in the picture below:
So my second question: is there any way to avoid this?

what is the reason behind this output?
The values in an automatic variable are indeterminate. The standard doesn't specify it, so it might be spaces as you said, it might be random content.
[...] sometimes not fully filled and I am left with junk [...]
Strings in C are null-terminated, so any routine dedicated to printing a string will loop as long as no null byte is encountered. In uninitialized memory, this null byte occurs randomly (or not at all). These weird, trailing characters are a result of that.
is there any way to avoid this?
Yes. Initialize it.

(will assume x86 in this post)
what is the reason behind this output?
Here's roughly what happens, in assembly, when you do char x[50];:
SUB ESP, 0x34 ; 52 bytes
Essentially, the stack pointer is moved down by 0x34 bytes (the size is rounded up so it stays divisible by 4; on x86 the stack grows toward lower addresses). Then, that space on the stack becomes x. There's no cleaning, no changes or pushes or pops, just this space becoming x. Anything that was there before (abandoned params, return addresses, variables from previous function calls) will be in x.
Here's roughly what happens when you do new char[50]:
1. Control gets passed to the allocator
2. The allocator looks for any heap of sufficient size (read as: an already allocated but uncommitted heap)
3. If 2 fails, the allocator makes a new heap
4. The allocator takes the heap (either the found or allocated one) and commits it
5. The address of that heap is returned to your code where it is used as a char*
The same as with a stack, you get whatever data is there. Some programs or systems may have allocators that zero out heaps when they are allocated or committed, where others may only zero when allocated but not committed, and some may not zero at all. Depending on the allocator, you may get clean memory or you may get re-used and dirty memory. This is why the values here can be non-zero and aren't predictable.
is there any way to avoid this?
In the case of heap memory, you can overload the new and delete operators in C++ and always zero newly allocated memory. You can see examples of overloading these operators here. As for memory on the stack, you just have to live with zeroing it out every time.
ZeroMemory(myArray, sizeof(myArray)); // Windows API; std::memset(myArray, 0, sizeof(myArray)) is the portable equivalent
Alternatively, for both methods, you could stay away from naked arrays and use std::vector or other wrappers that take care of initialization for you. You'll still want to make sure to initialize integers and other numeric or pointer data-types, though.

No, there is no way to avoid it. C++ does not automatically initialize automatic variables of built-in types (such as arrays of built-in types, as in your case); you need to initialize them yourself.

Why are you having issues with this code?
char x[50];
cout << new char[50] << endl;
cout << x << endl;
You're leaking memory with the new char[50] without a corresponding delete[].
Also, uninitialized memory is undefined as others have said and in most cases you get garbage within that memory block. A better method is to initialize it:
char x[50] = {};
char* y = new char[50]();
Then just remember to call delete[] on y later to free the memory. Yes, the OS will reclaim it for you when the process exits, but relying on that is never a way to write good programs.

Related

How to check if a certain memory address is available for use in c++?

I'm working on my hobby project in C++ and want to test a continuous memory allocation for variables of different types (like an array with variables of different types). How can I check if a specific memory address is available for use?
More Details:
Let's say we've got the following code: we have an integer int_var (it doesn't matter at which memory address this variable sits). In order to allocate a variable of a different type at the address right after int_var, I need to check whether that address is available and then use it. I tried the following code:
int int_var = 5;
float* flt_ptr = (float*)(&int_var + (sizeof(int_var) / sizeof(int)));
// check if flt_ptr is successfully allocated
if (flt_ptr) { // successfully allocated
// use that address
} else { // not successfully allocated
cout << "ERROR";
}
The problem is: when I run the program, sometimes flt_ptr is successfully allocated and all is well, and sometimes not. But when it is not successfully allocated, it throws an exception that says "Read access violation ..." instead of printing "ERROR". Why is that? Maybe I missed something about checking whether flt_ptr is successfully allocated? Or did I do something wrong? If so, how do I check whether flt_ptr is successfully allocated before I use it?
Thanks!!
This memory model you are assuming was valid back in DOS, where, in real mode, the memory was a continuous stream of bytes.
Now that we have paging (on both x86 and x64), this is not possible. Therefore, you can make no assumptions about the existence of memory "near" other memory.
You have to allocate properly, which means using C++ shared_ptr/unique_ptr/STL. Or, new/malloc the old (bad) way.
If you want variables to be one near the other, allocate the whole memory at once (via a struct, for example).
You can't; the C++ memory model does not work like that.
The only valid pointers are those obtained by the '&' operator, those returned from 'new/malloc' and static arrays. There is no mechanism for checking if the memory address is (still) valid or whether the object has been destroyed or not existed there at all. So it is up to the programmer to manage the correctness of the pointers.
Because of the reasons above your program has undefined behavior.
if(pointer) only checks whether pointer==0, nothing more. Note that int n=5; int array[n]; is not valid C++ either. Not sure if you are using it, but if you do, don't.
Based on the comments, you want a heterogeneous container. In that case use an array of unions or, better, std::array<std::variant<int, double, char, float /* ... */>, N> array; (where N is the number of elements). Or std::vector if you need dynamic size.
C++ guarantees that arrays ([], malloc or new[]) are contiguous, but they only contain one type. In general, you cannot store float, double, int, char continuously together because of the alignment issues. The array above is continuous in terms of std::variant object, but its size will be at least the size of the largest type. So chars will not be packed together.
That is not how you allocate memory. You need to do it properly using new.
See here.
want to test a continuous memory allocation for variables of a different types
You may use a structure and declare required variables as members for continuous memory allocation for different datatypes.
For example:
struct eg_struct
{
    unsigned char abc;
    unsigned int xyz;
};
Note that if required you may need to pack the structure.
Here there is no need to check whether memory is free or not.

What is the purpose of allocating a specific amount of memory for arrays in C++?

I'm a student taking a class on Data Structures in C++ this semester and I came across something that I don't quite understand tonight. Say I were to create a pointer to an array on the heap:
int* arrayPtr = new int [4];
I can access this array using pointer syntax
int value = *(arrayPtr + index);
But if I were to add another value to the memory position immediately after the end of the space allocated for the array, I would then be able to access it
*(arrayPtr + 4) = 0;
int nextPos = *(arrayPtr + 4);
//the value of nextPos will be 0, or whatever value I previously filled that space with
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems. So aside from it being a requirement of C++, why even give arrays a specific size when declaring them?
When you go past the end of allocated memory, you are actually accessing memory of some other object (or memory that is free right now, but that could change later). So, it will cause you problems. Especially if you try to write something to it.
I can access this array using pointer syntax
int value = *(arrayPtr + index);
Yeah, but don't. Use arrayPtr[index]
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems.
You understand wrong. Oh so very wrong. You're invoking undefined behavior and undefined behavior is undefined. It may work for a week, then break one day next week and you'll be left wondering why. If you don't know the collection size in advance use something dynamic like a vector instead of an array.
Yes, in C/C++ you can access memory outside of the space you claim to have allocated. Sometimes. This is what is referred to as undefined behavior.
Basically, you have told the compiler and the memory management system that you want space to store four integers, and the memory management system allocated space for you to store four integers. It gave you a pointer to that space. In the memory manager's internal accounting, those bytes of ram are now occupied, until you call delete[] arrayPtr;.
However, the memory manager has not allocated that next byte for you. You don't have any way of knowing, in general, what that next byte is, or who it belongs to.
In a simple example program like your example, which just allocates a few bytes, and doesn't allocate anything else, chances are, that next byte belongs to your program, and isn't occupied. If that array is the only dynamically allocated memory in your program, then it's probably, maybe safe to run over the end.
But in a more complex program, with multiple dynamic memory allocations and deallocations, especially near the edges of memory pages, you really have no good way of knowing what any bytes outside of the memory you asked for contain. So when you write to bytes outside of the memory you asked for in new you could be writing to basically anything.
This is where undefined behavior comes in. Because you don't know what's in that space you wrote to, you don't know what will happen as a result. Here's some examples of things that could happen:
The memory was not allocated when you wrote to it. In that case, the data is fine, and nothing bad seems to happen. However, if a later memory allocation uses that space, anything you tried to put there will be lost.
The memory was allocated when you wrote to it. In that case, congratulations, you just overwrote some random bytes from some other data structure somewhere else in your program. Imagine replacing a variable somewhere in one of your objects with random data, and consider what that would mean for your program. Maybe a list somewhere else now has the wrong count. Maybe a string now has some random values for the first few characters, or is now empty because you replaced those characters with zeroes.
The array was allocated at the edge of a page, so the next bytes don't belong to your program. The address is outside your program's allocation. In this case, the OS detects you accessing random memory that isn't yours, and terminates your program immediately with SIGSEGV.
Basically, undefined behavior means that you are doing something illegal, but because C/C++ is designed to be fast, the language designers don't include an explicit check to make sure you don't break the rules, like other languages (e.g. Java, C#). They just list the behavior of breaking the rules as undefined, and then the people who make the compilers can have the output be simpler, faster code, since no array bounds checks are made, and if you break the rules, it's your own problem.
So yes, this sometimes works, but don't ever rely on it.
It would not cause any problems in a purely abstract setting, where you only worry about whether the logic of the algorithm is sound. In that case there's no reason to declare the size of an array at all. However, your computer exists in the physical world, and only has a limited amount of memory. When you're allocating memory, you're asking the operating system to let you use some of the computer's finite memory. If you go beyond that, the operating system should stop you, usually by killing your process/program.
Yes, you must write it as arrayPtr[index] because the position in memory of *(arrayPtr + 4) is past the end of the space which you have allocated for the array. It's a flaw in C++ that the array size can't be extended once allocated.

What do C++ arrays init to?

So I can fix this manually so it isn't an urgent question but I thought it was really strange:
Here is the entirety of my code before the weird thing that happens:
int main(int argc, char** arg) {
int memory[100];
int loadCounter = 0;
bool getInput = true;
print_memory(memory);
and then some other unrelated stuff.
The print memory just prints the array, which should've been initialized to all zeros, but instead the first few numbers are:
+1606636544 +32767 +1606418432 +32767 +1856227894 +1212071026 +1790564758 +813168429 +0000 +0000
(the plus and the filler zeros are just for formatting since all the numbers are supposed to be from 0-1000 once the array is filled. The rest of the list is zeros)
It also isn't memory leaking because I tried initializing a different array variable and on the first run it also gave me a ton of weird numbers. Why is this happening?
Since you asked "What do C++ arrays init to?", the answer is they init to whatever happens to be in the memory they have been allocated at the time they come into scope.
I.e. they are not initialized.
Do note that some compilers will initialize stack variables to zero in debug builds; this can lead to nasty, randomly occurring issues once you start doing release builds.
The array you are using is stack allocated:
int memory[100];
When the particular function scope exits (In this case main) or returns, the memory will be reclaimed and it will not leak. This is how stack allocated memory works. In this case you allocated 100 integers (32 bits each on my compiler) on the stack as opposed to on the heap. A heap allocation is just somewhere else in memory hopefully far far away from the stack. Anyways, heap allocated memory has a chance for leaking. Low level Plain Old Data allocated on the stack (like you wrote in your code) won't leak.
The reason you got random values in your function was probably because you didn't initialize the data in the 'memory' array of integers. In release mode the application or the C runtime (in windows at least) will not take care of initializing that memory to a known base value. So the memory that is in the array is memory left over from last time the stack was using that memory. It could be a few milli-seconds old (most likely) to a few seconds old (less likely) to a few minutes old (way less likely). Anyways, it's considered garbage memory and it's to be avoided at all costs.
The problem is we don't know what is in your function called print_memory. But if that function doesn't alter the memory in any way, then that would explain why you are getting seemingly random values. You need to initialize those values to something first before using them. I like to declare my stack based buffers like this:
int memory[100] = {0};
That's a shortcut for the compiler to fill the entire array with zeros.
It works for strings and any other basic data type too:
char MyName[100] = {0};
float NoMoney[100] = {0};
Not sure what compiler you are using, but if you are using a microsoft compiler with visual studio you should be just fine.
In addition to other answers, consider this: What is an array?
In managed languages, such as Java or C#, you work with high-level abstractions. C and C++ don't provide abstractions (I mean hardware abstractions, not language abstractions like OO features). They are designed to work close to the metal; that is, the language uses the hardware directly (memory, in this case) without abstractions.
That means when you declare a local variable, int a for example, what the compiler does is to say "OK, I'm going to interpret the chunk of memory [A,A + sizeof(int)] as an integer, which I call 'a'" (where A is the offset between the beginning of that chunk and the start address of the function's stack frame).
As you can see, the compiler only "assigns" memory-segments to variables. It does not do any "magic", like "creating" variables. You have to understand that your code is executed in a machine, and the machine has only a memory and a CPU. There is no magic.
So what is the value of a variable when the function execution starts? Whatever value is represented by the data already in the variable's chunk of memory. Commonly, that data makes no sense from our current point of view (it could be part of the data used previously by a string, for example), so when you access that variable you get strange values. That's what we call "garbage": data previously written which makes no sense in our context.
The same applies to an array: An array is only a bigger chunk of memory, with enough space to fit all the values of the array: [A,A + (length of the array)*sizeof(type of array elements)]. So as in the variable case, the memory contains garbage.
Commonly you want to initialize an array with a set of values during its declaration. You could achieve that using an initialiser list:
int array[] = {1,2,3,4};
In that case, the compiler adds code to the function to initialize the memory chunk that holds the array with those values.
Sidenote: Non-POD types and static storage
The things explained above only apply to POD types such as basic types and arrays of basic types. With non-POD types like classes, the compiler adds calls to the constructor of the variables, which are designed to initialise the values (attributes) of a class instance.
In addition, even if you use POD types, if variables have static storage duration, the compiler initializes their memory with a default value (zero), because static variables are allocated at program start.
Local variables on the stack are not initialized in C/C++. C/C++ is designed to be fast, so it doesn't zero the stack on function calls.
Before main() runs, the language runtime sets up the environment. Exactly what it's doing you'd have to discover by breaking at the load module's entry point and watching the stack pointer, but at any rate your stack space on entering main is not guaranteed clean.
Anything that needs clean stack or malloc or new space gets to clean it itself. Plenty of things don't. C[++] isn't in the business of doing unnecessary things. In C++ a class object can have non-trivial constructors that run implicitly; those guarantee the object is set up for use. But arrays and plain scalars don't have constructors, so if you want an initial value you have to declare an initializer.

What's the advantage of malloc?

What is the advantage of allocating memory for some data? Instead, we could use an array.
Like
int *lis;
lis = (int*) malloc ( sizeof( int ) * n );
/* Initialize LIS values for all indexes */
for ( i = 0; i < n; i++ )
lis[i] = 1;
we could have used an ordinary array.
Well, I don't understand exactly how malloc works or what it actually does, so an explanation would be more beneficial for me.
And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might I be facing? And is there a way to print the values stored in the variable directly from the memory allocated space, for example here it is lis?
Your question seems to rather compare dynamically allocated C-style arrays with variable-length arrays, which means that this might be what you are looking for: Why aren't variable-length arrays part of the C++ standard?
However the c++ tag yields the ultimate answer: use std::vector object instead.
As long as it is possible, avoid dynamic allocation and responsibility for ugly memory management ~> try to take advantage of objects with automatic storage duration instead. Another interesting reading might be: Understanding the meaning of the term and the concept - RAII (Resource Acquisition is Initialization)
"And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing?"
- If you still consider n to be the amount of integers that it is possible to store in this array, you will most likely experience undefined behavior.
More fundamentally, I think, apart from the stack vs heap and variable vs constant issues (and apart from the fact that you shouldn't be using malloc() in C++ to begin with), is that a local array ceases to exist when the function exits. If you return a pointer to it, that pointer is going to be useless as soon as the caller receives it, whereas memory dynamically allocated with malloc() or new will still be valid. You couldn't implement a function like strdup() using a local array, for instance, or sensibly implement a linked representation of a list or tree.
The answer is simple. Local arrays [1] are allocated on your stack, which is a small pre-allocated memory for your program. Beyond a couple of thousand elements, you can't really do much on a stack. For larger amounts of data, you need to allocate memory outside of your stack.
This is what malloc does.
malloc allocates a piece of memory as big as you ask it. It returns a pointer to the start of that memory, which could be treated similar to an array. If you write beyond the size of that memory, the result is undefined behavior. This means everything could work alright, or your computer may explode. Most likely though you'd get a segmentation fault error.
Reading values from the memory (for example for printing) is the same as reading from an array. For example printf("%d", lis[5]);.
Before C99 (I know the question is tagged C++, but probably you're learning C-compiled-in-C++), there was another reason too. There was no way you could have an array of variable length on the stack. (Even now, variable length arrays on the stack are not so useful, since the stack is small). That's why for variable amount of memory, you needed the malloc function to allocate memory as large as you need, the size of which is determined at runtime.
Another important difference between local arrays, or any local variable for that matter, is the life duration of the object. Local variables are inaccessible as soon as their scope finishes. malloced objects live until they are freed. This is essential in practically all data structures that are not arrays, such as linked-lists, binary search trees (and variants), (most) heaps etc.
An example of malloced objects are FILEs. Once you call fopen, the structure that holds the data related to the opened file is dynamically allocated using malloc and returned as a pointer (FILE *).
[1] Note: Non-local arrays (global or static) are allocated before execution, so they can't really have a length determined at runtime.
I assume you are asking what the purpose of C's malloc() is:
Say you want to take an input from user and now allocate an array of that size:
int n;
scanf("%d",&n);
int arr[n];
This is not valid in standard C++ (or in C before C99), because n is not known at compile time. This is where malloc() comes in;
you may write:
int n;
scanf("%d",&n);
int* arr = malloc(sizeof(int)*n);
malloc() actually allocates memory dynamically, in the heap area.
Some older programming environments did not provide malloc or any equivalent functionality at all. If you needed dynamic memory allocation you had to code it yourself on top of gigantic static arrays. This had several drawbacks:
The static array size put a hard upper limit on how much data the program could process at any one time, without being recompiled. If you've ever tried to do something complicated in TeX and got a "capacity exceeded, sorry" message, this is why.
The operating system (such as it was) had to reserve space for the static array all at once, whether or not it would all be used. This phenomenon led to "overcommit", in which the OS pretends to have allocated all the memory you could possibly want, but then kills your process if you actually try to use more than is available. Why would anyone want that? And yet it was hyped as a feature in mid-90s commercial Unix, because it meant that giant FORTRAN simulations that potentially needed far more memory than your dinky little Sun workstation had, could be tested on small instance sizes with no trouble. (Presumably you would run the big instance on a Cray somewhere that actually had enough memory to cope.)
Dynamic memory allocators are hard to implement well. Have a look at the jemalloc paper to get a taste of just how hairy it can be. (If you want automatic garbage collection it gets even more complicated.) This is exactly the sort of thing you want a guru to code once for everyone's benefit.
So nowadays even quite barebones embedded environments give you some sort of dynamic allocator.
However, it is good mental discipline to try to do without. Over-use of dynamic memory leads to inefficiency, of the kind that is often very hard to eliminate after the fact, since it's baked into the architecture. If it seems like the task at hand doesn't need dynamic allocation, perhaps it doesn't.
However however, not using dynamic memory allocation when you really should have can cause its own problems, such as imposing hard upper limits on how long strings can be, or baking nonreentrancy into your API (compare gethostbyname to getaddrinfo).
So you have to think about it carefully.
we could have used an ordinary array
In standard C++, arrays have a static size; so creating one from a run-time value:
int lis[n];
is not allowed. Some compilers allow this as a non-standard extension, but it is not part of standard C++, so if we want a dynamically sized array we have to allocate it dynamically.
In C, that would mean messing around with malloc; but you're asking about C++, so you want
std::vector<int> lis(n, 1);
to allocate an array of size n containing int values initialised to 1.
(If you like, you could allocate the array with new int[n], and remember to free it with delete [] lis when you're finished, and take extra care not to leak if an exception is thrown; but life's too short for that nonsense.)
Well I don't understand exactly how malloc works, what is actually does. So explaining them would be more beneficial for me.
malloc in C and new in C++ allocate persistent memory from the "free store". Unlike memory for local variables, which is released automatically when the variable goes out of scope, this persists until you explicitly release it (free in C, delete in C++). This is necessary if you need the array to outlive the current function call. It's also a good idea if the array is very large: local variables are (typically) stored on a stack, with a limited size. If that overflows, the program will crash or otherwise go wrong. (And, in current standard C++, it's necessary if the size isn't a compile-time constant).
And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing?
You haven't allocated enough space for n integers; so code that assumes you have will try to access memory beyond the end of the allocated space. This will cause undefined behaviour; a crash if you're lucky, and data corruption if you're unlucky.
And is there a way to print the values stored in the variable directly from the memory allocated space, for example here it is lis?
You mean something like this?
for (i = 0; i < len; ++i) std::cout << lis[i] << '\n';

Default dynamic memory size

I have the following code:
#include <iostream>
using namespace std;
int main() {
char* data = new char;
cin >> data;
cout << data << endl;
return 1;
}
When I type in a char* of 26 ones as a string literal, it compiles and prints it. But when I do 27 ones as data, it aborts. I want to know why.
Why is it 27?
Does it have a special meaning to it?
You're only allocating one character's worth of space. So, reading in any data more than that is overwriting memory you don't own, so that's undefined behavior. Which is what you're seeing in the result.
You'd have to look into specific details under the hood of your C++ implementation. Probably the implementation of malloc, and so on. Your code writes past the end of your buffer, which is UB according to the C++ standard. To get any idea at all of why it behaves as it does, you'd need to know what is supposed to be stored in the 27 or 28 bytes you overwrote, that you shouldn't have done.
Most likely, 27 ones just so happens to be the point at which you started damaging the data structures used by the memory allocator to track allocated and free blocks. But with UB you might find that the behavior isn't as consistent as it first appears. As a C++ programmer you aren't really "entitled" to know about such details, because if you knew about them then you might start relying on them, and then they might change without notice.
You're dynamically allocating one byte of storage. To allocate multiple bytes, do this:
char* data = new char[how_many_bytes];
When you use a string literal, the compiler reserves exactly that much storage automatically. When you allocate dynamically, you have to get the number of bytes right, or you risk a segfault.
This is just Undefined Behavior, a.k.a. "UB". The program can do anything or nothing. Any effect you see is not reliably reproducible.
Why is it UB?
Because you allocate space for a single char value, and you treat that as a zero-terminated string. Since the zero takes up one char value there is no (guaranteed) space for real data. However, since C++ implementations generally do not add inefficient checking of things, you can get away with storing data in parts of memory that you don't own – until it crashes or produces invalid results or has other ungood effect, because of the UB.
To do this correctly, use std::string instead of char*, and don't new or delete (a std::string does that automatically for you).
Then use std::getline to read one line of input into the string.