Character array and its memory allocation in C++


I am a bit confused after reading a textbook. Consider a character array ar[10] in C++. The textbook says that 10 bytes will be allocated for the array.
Starting from subscript ar[0], how many elements can I store in the given array? Is it 10? If yes, can I store data at ar[10]? I want to know how many bytes will be allocated for the array in total, since I learned that every string ends with \0. Will an overflow happen if I try to store a character into ar[10]?

If yes can I store data at ar[10]
No.
In your example, ar is an array with ten values. The first value is index #0, so you have ar[0] through ar[9], inclusive. Those are the ten values in this array. Count them. Most of us conveniently have exactly ten fingers. Start counting on your fingers, starting with ar[0], and stop when you've used all ten fingers. You'll stop on ar[9].
Attempting to access ar[10] is undefined behavior.

It will store 10 items in total. If it holds a null-terminated string, one of those items must be the '\0', so at most 9 characters fit, with the '\0' null terminator at ar[9] when the string is full.

You can store ten values, from index 0 to index 9. This seems wrong at first, but remember that 0 is a valid index and must be counted too. It's similar to how a 32-bit unsigned int can hold 2^32 distinct values, but the highest value is (2^32)-1.
Note that if you want the array to hold a null-terminated string, you can only store 9 characters, because ar[9] will have to hold '\0' when the string is full. You could store another character there instead, but then your code has to handle the fact that the C-string is not null-terminated.
That said, it's generally considered bad practice to use character arrays for strings in C++. They are much more error-prone than std::string from the standard library.
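To make the "ten slots, nine characters plus terminator" rule concrete, here is a minimal sketch of copying a C string into a char[10] without overflowing it. The helper name copy_bounded is hypothetical; it copies at most 9 characters and always writes the '\0' terminator, so it never touches ar[10].

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies src into a 10-element buffer, truncating if needed.
// Valid indices are 0..9; at most 9 characters fit because one slot holds '\0'.
std::size_t copy_bounded(char (&dst)[10], const char* src) {
    const std::size_t capacity = sizeof dst;  // 10 bytes: sizeof(char) is 1
    std::size_t i = 0;
    // Stop one slot early so there is always room for the terminator.
    while (i + 1 < capacity && src[i] != '\0') {
        dst[i] = src[i];
        ++i;
    }
    dst[i] = '\0';  // i is at most 9 here, so dst[10] is never written
    return i;       // number of characters actually copied
}
```

With a long input such as "abcdefghijkl", only "abcdefghi" is kept and ar[9] holds the terminator.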
More info: http://www.cplusplus.com/reference/string/string/
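As a rough illustration of why std::string is less error-prone, the hypothetical function below builds a string without choosing any buffer size. The string manages its own storage, and c_str() still provides a null-terminated view when you need to pass it to a C API.

```cpp
#include <cassert>
#include <string>

// Hypothetical example: no fixed-size buffer, so there is no terminator slot to miscount.
std::string join_words(const std::string& first, const std::string& second) {
    return first + " " + second;  // std::string grows as needed
}
```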

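Since you have declared a[10], it holds 10 values. Because it is a char array holding a string, and a string is terminated by '\0', the '\0' also takes up one of those values.
So if your string length is n, your array size must be at least n+1 to hold it. Otherwise, an overflow will occur.
Observe the following example, which deliberately writes past the end of a:
#include <cstdio>

int main(){
    char a[1], r, t;
    printf("Size %zu Byte\n", sizeof(a)); // sizeof yields size_t, so use %zu
    a[0] = 'a';
    a[1] = 'b'; // out of bounds: undefined behavior
    a[2] = 'c'; // out of bounds: undefined behavior
    printf("%c\n", r); // printed c on the answerer's system; not guaranteed
    printf("%c\n", t); // printed b on the answerer's system; not guaranteed
}
Here the array size is 1. On the system where this was run, r and t happened to sit right after a in memory, so even though they were never assigned, they ended up holding the values written to a[2] and a[1] respectively. This is undefined behavior: a different compiler or optimization level may print something else or crash.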
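For contrast, a bounds-checked sketch: std::array::at throws std::out_of_range for an invalid index instead of silently overwriting neighbouring variables. The function name try_store is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: stores c at index i, reporting failure instead of overflowing.
bool try_store(std::array<char, 1>& a, std::size_t i, char c) {
    try {
        a.at(i) = c;   // at() checks i < a.size() and throws if it is not
        return true;
    } catch (const std::out_of_range&) {
        return false;  // index 1 or 2 would have been out of bounds
    }
}
```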

Related

Append to String a Signed Int (Converted to Bytes) in Big Endian

I have a 4-byte signed integer, and (i) I want to reverse its byte order to big-endian, and (ii) I want to store its 4 bytes as bytes of a string. I am working in C++. To convert to big-endian I was using ntohl, but I cannot use it because my numbers can also be negative.
Example:
int32_t a = -123456;
string s;
s.append(reinterpret_cast<char*>(reinterpret_cast<void*>(&a))); // int to byte
Further, when I append this data, it seems that I am appending 8 bytes instead of 4. Why?
I need to use the append (I cannot use memcpy or something else).
Do you have any suggestion?

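I was using ntohl, but I cannot use it because my numbers can also be negative.
It's unclear why you think negative numbers would be a problem. ntohl works on the bit pattern of a uint32_t, and converting a negative int32_t to uint32_t is well-defined, so the bytes are reordered just the same. (For host-to-network order, htonl is the conventional name; it performs the same swap.)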
s.append(reinterpret_cast<char*>(reinterpret_cast<void*>(&a)));
std::string::append(const char*) requires that the argument point to a null-terminated string. An integer is not null-terminated (unless one of its bytes happens to be zero). Because this requirement is violated, the behaviour of the program is undefined. This is also why you saw 8 bytes appended instead of 4: append kept reading past the end of a until it happened to find a zero byte.
Do you have any suggestion?
To fix this bug, you can use the std::string::append(const char*, size_type) overload instead:
s.append(reinterpret_cast<char*>(&a), sizeof a);
reinterpret_cast<char*>(reinterpret_cast<void*>
The inner cast to void* is redundant. It makes no difference.
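If you would rather not depend on ntohl/htonl at all, one portable sketch is to convert to uint32_t and extract the bytes with shifts, most significant byte first. This swaps shifting in for the byte-order functions, and the helper name append_be32 is hypothetical; it still uses only std::string::append, as the question requires.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: appends the 4 bytes of v to s in big-endian order.
void append_be32(std::string& s, std::int32_t v) {
    // Converting a negative int32_t to uint32_t is well-defined (modulo 2^32),
    // so the two's-complement bit pattern is preserved.
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    for (int shift = 24; shift >= 0; shift -= 8) {
        // append(size_type count, char ch) adds exactly one byte per iteration.
        s.append(1, static_cast<char>((u >> shift) & 0xFFu));
    }
}
```

For -123456 (bit pattern 0xFFFE1DC0), this appends the bytes FF FE 1D C0, and the string grows by exactly 4 bytes regardless of host endianness.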

sorting a string using counting sort

I looked at the counting sort algorithm to sort a string here: https://www.geeksforgeeks.org/counting-sort/. I have a few questions:
#define RANGE 255
What is the function of RANGE? Why do we have to specifically define the RANGE to 255?
int count[RANGE + 1], i;
Why do we have to declare the size of count[] as RANGE+1? Why couldn't it be just 256?
// Store count of each character
for(i = 0; arr[i]; ++i)
++count[arr[i]];
The array stores the count of the specified digit, but here we have characters in a string, so how does the above code convert the characters to numeric equivalents to be stored in the array?

I would not use that code as a learning example. There are so many errors in it that I wasn't even able to properly compile it.
What is the function of RANGE?
When you compile your code, a special program called the preprocessor runs first. It performs textual substitutions driven by statements called preprocessor directives, which begin with the "#" symbol. Here, #define RANGE 255 tells the preprocessor to replace every occurrence of RANGE in the code with 255. For example, the line int count[RANGE + 1], i; becomes int count[255 + 1], i;.
Why do we have to specifically define the RANGE to be 255?
To be completely honest, I'm not sure why the code chose 255 for RANGE. In my tests it seemed to work with RANGE equal to 114, but that can only have been luck: 's' has the code 115, so count[115] is out of bounds and the behaviour is undefined. What RANGE must cover is the largest character code that can appear in the input; the length of the input string does not matter.
How does the code convert the characters to numeric equivalents to be stored in the count array?
The char data type is actually an integer type. Every character we use (A to Z, 0 to 9, punctuation, etc.) has a corresponding number. For example, the code below prints the number that corresponds to 'a', which is 97 in ASCII.
#include <iostream>
int main()
{
char name[] = "abc";
int a = name[0];
std::cout << a;
return 0;
}
The line ++count[arr[i]]; reads the character arr[i], uses its numeric code as an index into the count array, and increments that counter. This works because char is an integer type: for example, every 'e' in the input increments count[101].

RANGE defines the largest possible key for the counters; the keys are 0..RANGE. It could be arbitrary, but 255 gives exactly 256 possible values, the same as the number of distinct values of an 8-bit char.
So the possible keys are 0..255, which is exactly 256. You could hardcode 256 as the size, but since RANGE is arbitrary, you may want to change it (to 512, for example), and then you would also have to change the hardcoded size. Writing RANGE + 1 keeps the two in sync.
Logically a string consists of characters, but that is only our mental representation. To C++, a string is an array of char, which is an integer type. Standard ASCII uses only the values 0..127, so ASCII characters can safely be used as array indexes. (Beware that char may be signed, so bytes above 127 can become negative indexes; casting to unsigned char avoids that.)
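For reference, here is a compact sketch of counting sort on a std::string, written from scratch rather than fixed from the linked code. It assumes one counter per possible char value (256 with 8-bit chars) and casts each char to unsigned char so no index can be negative. It uses the simpler "emit each value count times" variant instead of the prefix-sum pass in the linked code; for plain characters the sorted result is the same. The function name counting_sort is our own.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Sketch: counting sort for a string, one counter per possible byte value.
std::string counting_sort(const std::string& input) {
    constexpr std::size_t kRange = 1u << CHAR_BIT;  // 256 when char is 8 bits
    std::array<std::size_t, kRange> count{};        // zero-initialized counters

    // Use each character's numeric code as an index and bump that counter.
    for (char ch : input) {
        ++count[static_cast<unsigned char>(ch)];
    }

    // Emit each code value as many times as it was counted, in ascending order.
    std::string output;
    output.reserve(input.size());
    for (std::size_t code = 0; code < kRange; ++code) {
        output.append(count[code], static_cast<char>(code));
    }
    return output;
}
```

For example, counting_sort("geeksforgeeks") returns "eeeefggkkorss".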

Why do arrays start on 0 instead of 1? (C, C++) [duplicate]

This question already has answers here:
Why does the indexing start with zero in 'C'?
(16 answers)
Closed 3 years ago.
Not really a code problem but a question: why do arrays in C and C++ start at 0? Does it have anything to do with some internal process?
int array[4]={1,2,3,4};
cout<<array[0];
cout<<array[1];
cout<<array[2];
cout<<array[3]; ///This prints 1234
But why that instead of
int array[4]={1,2,3,4};
cout<<array[1]; //as the first element
cout<<array[2];
cout<<array[3];
cout<<array[4];
?

Because the notation does pointer arithmetic. array[0] actually means the element at the array's starting address plus the size of 0 elements, i.e. the first element.
As always in C, you're working close to the hardware.

Consider an array int arr[N].
arr[i] is interpreted as *(arr + i). In that expression, arr decays to the address of the array's first element (index 0). Since the elements of an array are stored in consecutive memory locations, the address of the next element is arr + 1.
So arr + 0 is the starting memory location of the array, and *(arr + 0) is the value stored there.
Likewise, arr + 1 is the next element's location and *(arr + 1) is its value, so the index i (0, 1, ...) acts as an offset from the start.
As @ravnsgaard said, in C you're working close to the hardware.
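A small sketch of the equivalence described above, arr[i] meaning *(arr + i); the function name element_at is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: reads element i using explicit pointer arithmetic.
// p + i moves i elements (not i bytes) past p; dereferencing gives the value.
int element_at(const int* p, std::size_t i) {
    return *(p + i);  // exactly what p[i] means
}
```

With int array[4] = {1, 2, 3, 4};, element_at(array, 0) is array[0], which is 1, and an offset of 0 is why the first element has index 0.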

Sizeof and Strlen

I am trying to implement encryption using a salt and a password. Since the recommended size for a salt is 64 bits, I declared:
char Salt[8];
I used RAND_pseudo_bytes to get a random Salt this way:
RAND_pseudo_bytes((unsigned char*)Salt, sizeof Salt);
The hexdump output varied in length (sometimes 5 bytes, mostly 24) each time I compiled, because I had wrongly used strlen instead of sizeof:
RAND_pseudo_bytes((unsigned char*)Salt, strlen(Salt));
I tried the following line to figure out what's happening:
printf("\n%d\n",strlen(Salt));
which outputs 24 each time.
So my question is: why is strlen(Salt) = 24 when I declared Salt's length as 8 (sizeof(Salt) = 8)? I would understand 9 (with the '\0', although I'm not entirely sure how exactly that would happen), but 24 strikes me as odd. Thank you.

strlen is going to walk down the pointer you gave it and count the number of bytes until it reaches a null byte. In this case, your char array of 8 bytes has no null bytes, so strlen happily continues past the boundary into a region of memory beyond the defined char array on the stack, and whatever happens to be there will determine the behaviour of strlen. In this case, 24 bytes past the beginning of the array, there was a null byte.

Don't use char to represent bytes.
Over half of the possible byte values are not printable characters.
I suggest you iterate over the array of uint8_t using printf("0x%02X\n", array[i]);

strlen() searches for the first null character and counts all bytes before it, excluding that null byte.
A salt is 8 random bytes, and there's no guarantee that the byte right after them is a null byte (or that none of the salt bytes is zero).
That's why sizeof and strlen differ.

sizeof is an operator that returns the number of bytes needed to store a specific object or type. When it is applied to an array, it is one of the three cases where the name of the array does not decay to a pointer to its first element (the other two are the unary & operator and a string literal used to initialize an array).
strlen, instead, is a function that assumes its input is a null-terminated sequence of characters. Because the name of a character array does decay to a pointer to its first element when passed to a function, strlen has no way to know the size of the original array (as sizeof does). All it gets is a pointer to char. The only way it can find the end of the string is by running through the characters, looking for a '\0'. In your case, it did not find one until the 24th byte in memory. That happens by pure chance, and reading past the end of the array is undefined behavior.
Try initializing your array with:
char Salt[8] = {0};
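Note, however, that RAND_pseudo_bytes((unsigned char*)Salt, sizeof Salt) overwrites all 8 bytes, so a '\0' sentinel only survives if it sits outside the bytes being filled (for example, an extra ninth byte), and even then one of the random bytes may itself be zero.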
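To see the difference between sizeof and strlen, and the decay to a pointer, here is a sketch; the helper size_inside_function is hypothetical and only exists to show that sizeof applied to a parameter measures a pointer, not the caller's array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: the parameter is a pointer, so sizeof reports pointer size,
// no matter how large the caller's array is.
std::size_t size_inside_function(const char* p) {
    return sizeof p;
}
```

With char buf[8] = "abc";, sizeof buf is 8 (the whole array), std::strlen(buf) is 3 (characters before the first '\0'), and size_inside_function(buf) is sizeof(char*).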

Besides the null-termination problem with Salt that others pointed out, you need to change the printf format specifier to %zu, because strlen returns size_t. Using the wrong specifier invokes undefined behavior.

Addressing Your Question about strlen()
What strlen() is counting is the number of bytes until the first '\0' in memory.
char Salt[9] = { '\0' };
will initialize Salt with all '\0's.
NOTE: As @OliCharlesworth pointed out, Salt can contain embedded null bytes. Don't use any str*() functions on it; use only mem*() functions and keep track of the length yourself. Don't rely on sizeof inside a function that receives the array, because arrays decay to pointers when passed to functions.
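Following the advice to avoid str*() functions and to print the salt as bytes, here is a sketch of a hex formatter that takes the length explicitly, so embedded zero bytes don't cut the output short. The name to_hex is hypothetical; in the question, the length would be sizeof Salt, taken where Salt is still an array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: formats len bytes as lowercase hex.
// The length is passed in, so zero bytes inside the data are printed like any other.
std::string to_hex(const unsigned char* data, std::size_t len) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(len * 2);
    for (std::size_t i = 0; i < len; ++i) {
        out.push_back(digits[data[i] >> 4]);    // high nibble
        out.push_back(digits[data[i] & 0x0F]);  // low nibble
    }
    return out;
}
```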

what does this notation mean in c++

so I have:
char inBuf[80]
and then there's another line
inBuf+9
what does it mean when I add that +9 to the array's name?

It is the same as the address of element number 9 (0-based).
An equivalent notation would be:
&inBuf[9]
If you want to get the value, you could use *(inBuf+9)

This would point to the 10th element of the array. So for example:
*(inBuf + 9) = 10
would assign 10 to the 10th element.

Answer has been given already. I may only be repeating it.
This is called pointer arithmetic, because pointers are involved in the arithmetic operation. There are certain rules for it: you can add an integer to a pointer or subtract one from it, as long as the result stays within the same array (or one past its end). You can also subtract two pointers into the same array, which gives the number of elements between them, but you cannot add two pointers together.
Addition in pointer arithmetic is special in that it takes the element type into account, so when you write
char inBuf[80];
inBuf + 9
it advances past enough memory to hold 9 characters (9 * 1 bytes, since sizeof(char) is 1 by definition), while with
int inBuf[80];
inBuf + 9
it advances past enough memory to hold 9 integers (9 * 4 bytes with a typical 4-byte int).
Arrays and pointers are not always the same; see "Expert C Programming" for details. Also, never use pointer arithmetic polymorphically (stepping through an array of derived objects via a base-class pointer); see Scott Meyers' "More Effective C++" for that.
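To check the scaling described above, this sketch measures how many bytes p + 9 moves for a char array versus an int array. The function byte_offset is hypothetical; the int result depends on sizeof(int) on your platform, which is why it is compared against 9 * sizeof(int) rather than a fixed 36.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many bytes separate p + n from p, for element type T.
template <typename T>
std::ptrdiff_t byte_offset(const T* p, std::ptrdiff_t n) {
    const char* start = reinterpret_cast<const char*>(p);
    const char* moved = reinterpret_cast<const char*>(p + n);  // p + n advances n elements
    return moved - start;  // subtracting pointers into the same object gives a distance
}
```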

Using inBuf without an index is (in most expressions) the same as using a char* pointing to its first element. inBuf + 9 is then the same as &inBuf[9], and *(inBuf + 9) is the same as inBuf[9].

inBuf + 9 means the address of inBuf's first element increased by 9 elements (9 bytes, since the elements are chars).

inBuf refers to the base address, and inBuf + 9 locates the 10th element from the base address.
*(inBuf + 9) = 34;
This would assign the value 34 to the 10th element in the inBuf array.

When you perform addition with it, an array identifier such as your inBuf decays to a pointer to the first element in the array, and the number added is multiplied by the size of the array element (in this case char, which has size 1) to produce a new address.
So, inBuf + 9 is the address of the 10th element in the array, which could also be expressed as &inBuf[9]. You can use it as in:
*(inBuf + 9) = '\0'; // overwrite the 10th element in inBuf with a NUL
const char* p = strchr(inBuf + 9, ' '); // find space at or beyond 10th char

Writing inBuf is like writing &inBuf[0].
So inBuf + 9 means the address of inBuf advanced by 9 chars, i.e. &inBuf[9].
