Character arrays question C++

Is there any difference between the below two snippets?
One is a char array, whereas the other is a character array pointer, but they do behave the same, don't they?
Example 1:
char * transport_layer_header;
// Memory allocation for char * - allocate memory for a 2 character string
transport_layer_header = (char *)malloc(2 * sizeof(char));
sprintf(transport_layer_header,"%d%d",1,2);
Example 2:
char transport_layer_header[2];
sprintf(transport_layer_header,"%d%d",1,2);

Yes, there is a difference. In the first example, you dynamically allocate a two-element char array on the heap. In the second example you have a local two-element char array on the stack.
In the first example, since you don't free the pointer returned by malloc, you also have a memory leak.
They can often be used in the same way, for example using sprintf as you demonstrate, but they are fundamentally different under the hood.

The other difference is that your first example will corrupt data on the heap, while the second will corrupt data on the stack. Neither allocates room for the trailing \0.

The most important difference, IMO, is that in the second option transport_layer_header behaves like a constant pointer (you can't make it point to a different place), whereas in the first option you can.
This is of course in addition to the previous answers.

Assuming you correct the "no room for the null" problem, i.e. allocate 3 bytes instead of 2, you would normally only use malloc() if you need dynamic memory. For example, if you don't know how big the array is going to be, you might use malloc.
As pointed out, if you malloc() and don't later free the memory then you have a memory leak.
One more point: you really should check the return value of malloc() to ensure you got the memory. I know that in Solaris malloc() never fails (though it may sleep -- a good reason not to call it if you don't want your process going to sleep). I assume that on Linux malloc() can fail (i.e. if there is not enough memory available). [Please correct me if I'm wrong.]
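Putting those fixes together, a corrected version of the first example might look like the following minimal sketch (the buffer size of 3 accounts for the two digits plus the terminating null, and the result of malloc is checked as suggested above):
#include <cstdio>
#include <cstdlib>

int main() {
    // 3 bytes: two digit characters plus the terminating '\0'
    char *transport_layer_header = (char *)std::malloc(3);
    if (transport_layer_header == nullptr) {
        return 1; // always check that malloc succeeded
    }
    std::sprintf(transport_layer_header, "%d%d", 1, 2);
    std::printf("%s\n", transport_layer_header);
    std::free(transport_layer_header); // avoid the memory leak
    return 0;
}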

Is calloc better than malloc?

I just learned about the C calloc() function the other day. Having read its description and how it differs from malloc (1, 2), I get the idea that, as a non-embedded programmer, I should always use calloc(). But is that really the case?
One reservation I have is the extra delay for accessing the calloc()-ed memory, but I also wonder if there are cases when switching from malloc() to calloc() will break the program in some more serious way.
P. S. The zero-initializing aspect of calloc() is quite clear to me. What I'm interested in learning about is the other difference between calloc() and malloc() - lazy memory allocation provided by calloc(). Please don't post an answer if you're going to focus purely on the memory initialization aspect.
This is really a situation-dependent decision. The rule of thumb is:
If you're going to write into the allocated memory before reading it, malloc() is better (less possible overhead).
Example: Consider the following scenario
char * pointer = NULL;
//allocation
strcpy(pointer, source);
here, the allocation can very well use malloc().
If there's a possibility of a read-before-write on the allocated memory, go for calloc(), as it initializes the memory. This way you avoid the uninitialized read-before-write scenario, which invokes undefined behavior.
Example:
char * pointer = NULL;
//allocation
strcat(pointer, source);
Here, strcat() needs its first argument to already be a string, and allocating with malloc() cannot guarantee that. Since calloc() zero-initializes the memory, it serves the purpose here, and thus calloc() is the way to go for this case.
To elaborate on the second scenario, quoting from C11, chapter §7.24.3.1 (emphasis mine):
The strcat() function appends a copy of the string pointed to by s2 (including the
terminating null character) to the end of the string pointed to by s1. The initial character
of s2 overwrites the null character at the end of s1. [....]
So, in this case, the destination pointer should be a pointer to a string. Allocating via calloc() guarantees that; allocating via malloc() cannot, as we know from chapter §7.22.3.4:
The malloc function allocates space for an object whose size is specified by size and
whose value is indeterminate.
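To make the two scenarios above concrete, here is a minimal compilable sketch (the buffer size n and the source string are illustrative assumptions):
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    const char *source = "hello";
    const size_t n = 64; // assumed to be large enough for source

    // Scenario 1: write-before-read. malloc is fine, because strcpy
    // overwrites the buffer before anything reads it.
    char *a = (char *)std::malloc(n);
    if (a == nullptr) return 1;
    std::strcpy(a, source);

    // Scenario 2: read-before-write. strcat must find an existing
    // terminating '\0', so the buffer must already be a valid string;
    // calloc's zero-fill makes it the empty string "".
    char *b = (char *)std::calloc(n, 1);
    if (b == nullptr) { std::free(a); return 1; }
    std::strcat(b, source);

    std::printf("%s %s\n", a, b);
    std::free(a);
    std::free(b);
    return 0;
}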
EDIT:
One possible scenario where malloc() is advised over calloc() is writing test stubs for unit / integration testing. In that case, use of calloc() can hide potential bugs that arise in cases similar to the latter one.
The main difference between malloc and calloc is that calloc will zero-initialize your buffer, and malloc will leave the memory uninitialized.
This gets to the common programming idiom of "don't pay for what you don't use". In other words, why zero-initialize something (which has a cost) if you don't necessarily need to (yet)?
As a side note since you tagged C++: manual memory usage using new/delete is frowned upon in modern C++ (except in rare cases of memory pools, etc). Use of malloc/free is even more rare and should be used very sparingly.
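For instance, here is a small sketch of getting a zero-initialized buffer in modern C++ without any manual malloc/free or new/delete (std::vector value-initializes its elements and releases them automatically):
#include <vector>

int main() {
    std::vector<char> buffer(50); // all 50 elements value-initialized to 0
    buffer[0] = 'A';
    return 0; // memory released automatically when buffer goes out of scope
}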
Use calloc for zero-filled allocations, but only when the zero-filling is really needed.
You should always use calloc(count,size) instead of buff=malloc(total_size); memset(buff,0,total_size).
The call to the zeroing memset is the key. Both malloc and calloc are ultimately backed by OS-level allocations that can use many optimizations (for example, handing out pages that are already zeroed), but there is little the OS can do to speed up an explicit memset over the whole buffer.
On the other hand, when do you need to zero-fill the allocated memory? The only common use is for zero-terminated, arbitrary-length data, such as C strings. If that's the case, sure, go with calloc.
But if you allocate the structures in which the elements are either fixed-length or carry the length of arbitrary-sized elements with them (such as C++ strings and vectors), zero-filling is not helpful at all, and if you try to rely on it, it can lead to tricky bugs.
Suppose you write your custom linked list and decide to skip the zeroing of the pointer to the next node by allocating the node's memory with calloc. It works fine; then someone uses it with a custom placement new, which doesn't zero-fill. Trouble is, sometimes the memory will happen to be zero-filled anyway, so the code can pass all the usual testing, go into production, and crash there, but only sometimes: the dreaded unreproducible bug.
For debugging purposes, zero-filling is usually not that good either. 0 is too common: you can rarely write something like assert(size), because 0 is usually a valid value too, handled with if(!size) rather than with asserts. In the debugger zeros won't catch your eye either; there are usually zeros all over your memory. (Best practice is to avoid unsigned types for lengths; signed lengths are also useful for runtime error handling and for some of the most common overflow checks.) So, while buff=malloc(total_size); memset(buff,0,total_size) is to be avoided, the following is OK:
const signed char UNINIT_MEM = MY_SENTINEL_VALUE;
buff = malloc(total_size);
#if DEBUG_MEMORY
memset(buff, UNINIT_MEM, total_size);
#endif
In debug mode, the runtime library or even the OS sometimes does this for you; the VC++ debug runtime, for example, fills freshly allocated heap memory with well-known sentinel values such as 0xCD.
It's all about what you want to do with the memory. malloc returns uninitialized (and possibly not even physically backed yet) memory. calloc returns real, zeroed memory. If you need it zeroed, then yes, calloc is your best option. If you don't, why pay for zeroing with a latency hit when you don't need it?
malloc() is far more common in C code than calloc().
A text search for "malloc" will miss the calloc() calls.
Replacement libraries will often have mymalloc(), myrealloc(), myfree() but not mycalloc().
Zero-initialisation of pointers and reals isn't actually guaranteed to have the expected effect, though on every major platform all-bits-zero is NULL for a pointer and 0.0 for a real.
calloc() tends to hide bugs. A debug malloc usually sets a fill pattern like 0xDEADBEEF, which reads as a large negative number and doesn't look like real data, so the program quickly crashes and, under a debugger, the error is flushed out.

C++ doesn't tell you the size of a dynamic array. But why?

I know that there is no way in C++ to obtain the size of a dynamically created array, such as:
int* a;
a = new int[n];
What I would like to know is: Why? Did people just forget this in the specification of C++, or is there a technical reason for this?
Isn't the information stored somewhere? After all, the command
delete[] a;
seems to know how much memory it has to release, so it seems to me that delete[] has some way of knowing the size of a.
It's a follow-on from the fundamental rule of "don't pay for what you don't need". In your example delete[] a; doesn't need to know the size of the array, because int doesn't have a destructor. If you had written:
std::string* a;
a = new std::string[n];
...
delete [] a;
Then the delete has to call destructors (and needs to know how many to call) - in which case the new has to save that count. However, given it doesn't need to be saved on all occasions, Bjarne decided not to give access to it.
(In hindsight, I think this was a mistake ...)
Even with int of course, something has to know about the size of the allocated memory, but:
Many allocators round up the size to some convenient multiple (say 64 bytes) for alignment and convenience reasons. The allocator knows that a block is 64 bytes long - but it doesn't know whether that is because n was 1 ... or 16.
The C++ run-time library may not have access to the size of the allocated block. If for example, new and delete are using malloc and free under the hood, then the C++ library has no way to know the size of a block returned by malloc. (Usually of course, new and malloc are both part of the same library - but not always.)
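One way to observe this bookkeeping, purely as an illustration (the behaviour is implementation-dependent; many compilers add a hidden element-count "cookie" only when destructors must run), is to replace the global operator new[] and log the size it is asked for:
#include <cstdio>
#include <cstdlib>
#include <new>
#include <string>

// Log every array allocation request (illustrative replacement).
void *operator new[](std::size_t n) {
    std::printf("operator new[] asked for %zu bytes\n", n);
    void *p = std::malloc(n);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}
void operator delete[](void *p) noexcept { std::free(p); }

int main() {
    int *a = new int[4];                 // typically exactly 4 * sizeof(int)
    std::string *s = new std::string[4]; // typically 4 * sizeof(std::string)
                                         // plus room for a count cookie
    delete[] s;
    delete[] a;
    return 0;
}
On a typical 64-bit implementation the second request comes out a few bytes larger than 4 * sizeof(std::string); that extra space is the stored count.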
One fundamental reason is that there is no difference between a pointer to the first element of a dynamically allocated array of T and a pointer to any other T.
Consider a fictitious function that returns the number of elements a pointer points to.
Let's call it "size".
Sounds really nice, right?
If it weren't for the fact that all pointers are created equal:
char* p = new char[10];
size_t ps = size(p+1); // What?
char a[10] = {0};
size_t as = size(a); // Hmm...
size_t bs = size(a + 1); // Wut?
char i = 0;
size_t is = size(&i); // OK?
You could argue that the first should be 9, the second 10, the third 9, and the last 1, but to accomplish this you need to add a "size tag" on every single object.
A char would then require 128 bits of storage on a 64-bit machine (an 8-byte size tag plus the character itself, padded for alignment). This is sixteen times more than what is necessary.
(Above, the ten-character array a would require at least 168 bytes: ten 16-byte tagged chars plus an 8-byte tag for the array itself.)
This may be convenient, but it's also unacceptably expensive.
You could of course envision a version that is only well-defined if the argument really is a pointer to the first element of a dynamic allocation by the default operator new, but this isn't nearly as useful as one might think.
You are right that some part of the system will have to know something about the size. But getting that information is probably not covered by the API of memory management system (think malloc/free), and the exact size that you requested may not be known, because it may have been rounded up.
You will often find that memory managers will only allocate space in a certain multiple, 64 bytes for example.
So, you may ask for new int[4], i.e. 16 bytes, but the memory manager will allocate 64 bytes for your request. To free this memory it doesn't need to know how much memory you asked for, only that it has allocated you one block of 64 bytes.
The next question may be: can it not store the requested size? That is an added overhead which not everybody is prepared to pay for. An Arduino Uno, for example, has only 2K of RAM, and in that context 4 bytes per allocation suddenly become significant.
If you need that functionality then you have std::vector (or equivalent), or you have higher-level languages. C/C++ was designed to enable you to work with as little overhead as you choose to make use of, this being one example.
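A minimal sketch of the vector alternative, which carries its size with it:
#include <cstdio>
#include <vector>

int main() {
    std::size_t n = 5;
    std::vector<int> a(n);          // dynamic, but knows its own size
    std::printf("%zu\n", a.size()); // prints 5
    return 0;
}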
There is a curious case of overloading the operator delete that I found in the form of:
void operator delete[](void *p, size_t size);
The parameter size seems to default to the size (in bytes) of the block of memory to which void *p points. If this is true, it is reasonable to at least hope that it has a value passed by the invocation of operator new and, therefore, would merely need to be divided by sizeof(type) to deliver the number of elements stored in the array.
As for the "why" part of your question, Martin's rule of "don't pay for what you don't need" seems the most logical.
There's no way to know how you are going to use that array.
The allocation size does not necessarily match the element number so you cannot just use the allocation size (even if it was available).
This is a deep flaw in other languages not in C++.
You achieve the functionality you desire with std::vector yet still retain raw access to arrays. Retaining that raw access is critical for any code that actually has to do some work.
Many times you will perform operations on subsets of the array and when you have extra book-keeping built into the language you have to reallocate the sub-arrays and copy the data out to manipulate them with an API that expects a managed array.
Just consider the trite case of sorting the data elements.
If you have managed arrays then you can't use recursion without copying data to create new sub-arrays to pass recursively.
Another example is an FFT which recursively manipulates the data starting with 2x2 "butterflies" and works its way back to the whole array.
To fix the managed array you now need "something else" to patch over this defect, and that "something else" is called 'iterators'. (You now have managed arrays but almost never pass them to any functions, because you need iterators 90+% of the time.)
The size of an array allocated with new[] is not visibly stored anywhere, so you can't access it. And the new[] operator doesn't return an array, just a pointer to the array's first element. If you want to know the size of a dynamic array, you must store it manually or use classes from libraries such as std::vector.

Why is an uninitialized char array filled with random symbols?

I have a this fragment of code in C++:
char x[50];
cout << x << endl;
which outputs a run of random symbols.
So my first question is: what is the reason behind this output? Shouldn't it be spaces, or at least the same symbols each time?
The reason I am concerned with this is that I am writing a program in CUDA, and I'm doing some character manipulations inside a __global__ function, hence the use of string gives a "calling host function is not allowed" error.
But if I am using a "big enough" char array (each chunk of text I am operating on differs in size, meaning the array will not always be fully used), it is sometimes not completely overwritten and I am left with junk hanging at the end of the text.
So my second question: is there any way to avoid this?
what is the reason behind this output?
The values in an automatic variable are indeterminate. The standard doesn't specify it, so it might be spaces as you said, it might be random content.
[...] sometimes not fully filled and I left with junk [...]
Strings in C are null-terminated, so any routine dedicated to printing a string will loop as long as no null byte is encountered. In uninitialized memory, this null byte occurs randomly (or not at all). These weird, trailing characters are a result of that.
is there any way to avoid this?
Yes. Initialize it.
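A minimal sketch of that fix (empty braces value-initialize every element to '\0'):
#include <iostream>

int main() {
    char x[50] = {};             // every element is now '\0'
    std::cout << x << std::endl; // prints an empty line, no garbage
    return 0;
}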
(will assume x86 in this post)
what is the reason behind this output?
Here's roughly what happens, in assembly, when you do char x[50];:
SUB ESP, 0x34 ; reserve 52 bytes
Essentially, the stack pointer is moved down by 0x34 bytes (the stack grows downward; the amount is rounded up for alignment). Then, that space on the stack becomes x. There's no cleaning, no changes or pushes or pops, just this space becoming x. Anything that was there before (abandoned parameters, return addresses, variables from previous function calls) will be in x.
Here's roughly what happens when you do new char[50]:
1. Control gets passed to the allocator
2. The allocator looks for an existing heap of sufficient size (read: an already reserved but uncommitted heap)
3. If 2 fails, the allocator makes a new heap
4. The allocator takes the heap (either the found or allocated one) and commits it
5. The address of that heap is returned to your code where it is used as a char*
Just as with the stack, you get whatever data is already there. Some programs or systems have allocators that zero out heap blocks when they are allocated or committed, others zero only when first allocated but not when recommitted, and some do not zero at all. Depending on the allocator, you may get clean memory or you may get reused, dirty memory. This is why the values here can be non-zero and aren't predictable.
is there any way to avoid this?
In the case of heap memory, you can overload the new and delete operators in C++ so that newly allocated memory is always zeroed (see the sketch after this answer). As for memory on the stack, you just have to live with zeroing it out every time:
ZeroMemory(myArray, sizeof(myArray)); // Win32; memset(myArray, 0, sizeof(myArray)) is the portable equivalent
Alternatively, for both methods, you could stay away from naked arrays and use std::vector or other wrappers that take care of initialization for you. You'll still want to make sure to initialize integers and other numeric or pointer data-types, though.
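As promised above, here is a hedged sketch of the heap-side approach: replacing the global array operators so that every new[] allocation comes back zeroed (a real program would usually replace the scalar forms as well):
#include <cstdlib>
#include <cstring>
#include <new>

// Zero every array allocation.
void *operator new[](std::size_t n) {
    void *p = std::malloc(n);
    if (p == nullptr) throw std::bad_alloc();
    std::memset(p, 0, n);
    return p;
}
void operator delete[](void *p) noexcept { std::free(p); }

int main() {
    char *x = new char[50]; // contents are now guaranteed to be all '\0'
    delete[] x;
    return 0;
}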
No, there is no automatic way to avoid it. C++ does not initialize automatic variables of built-in types (such as your array of built-in type) by itself; you need to initialize them yourself.
Why are you having issues with this code?
char x[50];
cout << new char[50] << endl;
cout << x << endl;
You're leaking memory with the new char[50], which has no corresponding delete[].
Also, uninitialized memory is undefined, as others have said, and in most cases you get garbage within that memory block. A better method is to initialize it:
char x[50] = {};
char* y = new char[50]();
Then just remember to call delete[] on y later to free the memory. Yes, the OS will reclaim it for you at process exit, but relying on that is never a way to write good programs.

How do strings allocate memory in C++?

I know that dynamic memory has advantages over setting up a fixed-size array and using a portion of it. But with dynamic memory you have to specify the amount of data you want to store in the array. When using strings you can type as many letters as you want (you can even use strings for numbers and then use a function to convert them). This fact makes me think that dynamic memory for character arrays is obsolete compared to strings.
So I want to know: what are the advantages and disadvantages of using strings? When is the space occupied by strings freed? Is maybe the option to free your dynamically allocated memory with delete an advantage over strings? Please explain.
The short answer is "no, there are no drawbacks, only advantages" with std::string over character arrays.
Of course, strings do USE dynamic memory, it just hides the fact behind the scenes so you don't have to worry about it.
In answer to your question "When is the space occupied by strings freed?": std::strings are freed once they go out of scope. Often the compiler can decide when to allocate and release the memory.
std::string usually contains an internal dynamically allocated buffer. When you assign data, or if you push back new data, and the current buffer size is not sufficient, a new buffer is allocated with an increased size and the old data is copied or moved to the new buffer. The old buffer is then deallocated.
The main buffer is deallocated when the string goes out of scope. If the string object is a local variable in a function (on the stack), it will deallocate at the end of the current code block. If it's a function parameter, when the function exits. If it's a class member, whenever the class is destroyed.
The advantage of strings is flexibility (increases in size automatically) and safety (harder to go over the bounds of an array). A fixed-size char array on the stack is faster as no dynamic allocation is required. But you should worry about that if you have a performance problem, and not before.
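A small sketch that makes the growth behaviour described above visible (the exact capacity values are implementation-defined):
#include <iostream>
#include <string>

int main() {
    std::string s;
    std::size_t cap = s.capacity();
    for (int i = 0; i < 100; ++i) {
        s.push_back('x');
        if (s.capacity() != cap) { // a larger buffer was allocated, data moved
            cap = s.capacity();
            std::cout << "grew to capacity " << cap
                      << " at size " << s.size() << '\n';
        }
    }
    return 0; // the internal buffer is freed here, when s goes out of scope
}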
Well, your question got me thinking, and then I understood that you are talking about syntax differences, because both ways dynamically allocate char arrays. The only difference is in the need:
if you need to create a string containing a sentence, then you can, and
that's fine, not use malloc
if you want an array to "play" with, meaning you change or set the cells according to some method, or change its size, then initializing it with malloc would be the appropriate way
the only reason I see for statically allocating char a[17] (for example) is a single-purpose string, i.e. only when you know the exact size you'll need and that it won't change
And one important point that I found:
In dynamic memory allocation, if memory is continually allocated but the memory for objects no longer in use is never released, it can lead to heap exhaustion or a memory leak, which is a big disadvantage.

C Programming - Stack and Heap Array Declarations

Suppose I declare an array as int myarray[5]
Or declare it as int *myarray = malloc(5 * sizeof(int))
Will both declarations set aside an equal amount of memory, in number of bytes?
Set aside the fact that the former declaration lives on the stack and the latter on the heap.
Thank you!
There's a fundamental difference, that may not be apparent in the way you use myarray:
int myarray[5]; declares an array of five integers, and the array is an automatic variable (and it is uninitialized).
int * myarray = malloc(5 * sizeof(int)); declares a variable that is a pointer to an int (also as an automatic variable), and that pointer is initialized with the result of a library call. That library call promises to make the resulting pointer point to a region of memory that's big enough to store five consecutive integers.
Because of pointer arithmetic, array-to-pointer decay and the convention that a[i] is the same as *(a + i), you can use both variables in the same way, namely as myarray[i]. This is of course by design.
If you're looking for a difference, then maybe the following helps: the array-of-five-ints is a single object, and it has a definite size. By contrast, the malloc library call doesn't create any objects. It just sets aside enough (suitably aligned) memory, and it could, for example, allocate a lot more memory than requested.
(In C++ there's of course the additional distinction between memory and objects.)
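One observable difference between the two declarations, as a minimal sketch: sizeof sees the whole array in the first case but only the pointer in the second.
#include <cstdio>
#include <cstdlib>

int main() {
    int myarray1[5];
    int *myarray2 = (int *)std::malloc(5 * sizeof(int));
    if (myarray2 == nullptr) return 1;
    std::printf("%zu\n", sizeof(myarray1)); // the whole array: 5 * sizeof(int)
    std::printf("%zu\n", sizeof(myarray2)); // just the pointer, e.g. 8
    std::free(myarray2);
    return 0;
}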
Neither is guaranteed to allocate exactly 5*sizeof(int) bytes, though both will give you at least that much space (assuming no allocation failures or stack exhaustion).
In the first case, the stack variable may be surrounded by alignment padding, and/or stack canaries (depending on compile options). These could result in the stack pointer being adjusted by more than 5*sizeof(int) bytes.
In the second case, you allocate an int * on the stack (sizeof(int *) bytes), plus the space that malloc returns. malloc may allocate additional memory in the form of allocation-tracking structures, alignment padding, linked-list pointers, etc. Thus, in that case you are also not guaranteed to allocate exactly 5*sizeof(int) bytes.
If you want to be very precise about your memory usage, the mmap function allows you to request pages of virtual memory from the OS. The memory you request this way will be precisely the amount you request (ignoring the space taken up in the kernel to track those allocations).
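A sketch of the mmap route (POSIX; MAP_ANONYMOUS is a widespread extension, and the page size here is an assumption):
#include <cstdio>
#include <sys/mman.h>

int main() {
    // Request one page of zero-filled memory directly from the OS.
    const size_t len = 4096; // assumed page size
    void *p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 1;
    // ... use the page ...
    munmap(p, len);
    return 0;
}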
Short answer: No; normally the latter uses slightly more memory.
Long Answer:
Memory management will certainly use some extra memory to manage the returned pointer (to track it and be able to free it later), and you declare an extra pointer to point at that memory. So its actual cost is sizeof(int *) plus the malloc overhead. In the first case you use exactly 5 ints (plus possible alignment padding).
Dynamic allocation will require at least a few extra bytes: the bytes for the pointer variable in addition to the 5 int-sized elements, and potentially some extra bytes to track the size of the allocated region so that it can be freed properly.