Sizeof and Strlen - c++

I am trying to implement encryption using a salt and a password. Since the recommended size for a salt is 64 bits, I declared:
char Salt[8];
I used RAND_pseudo_bytes to get a random Salt this way:
RAND_pseudo_bytes((unsigned char*)Salt, sizeof Salt);
Each time I compiled, the hexdump output had a different length (sometimes 5 bytes, mostly 24), because I had mistakenly used strlen instead of sizeof:
RAND_pseudo_bytes((unsigned char*)Salt, strlen(Salt));
I tried the following line to figure out what's happening:
printf("\n%d\n",strlen(Salt));
which outputs 24 each time.
So my question is: why is strlen(Salt) 24 when I declared Salt with length 8 (sizeof(Salt) is 8)? I would understand 9 (counting the '\0', although I'm not entirely sure how that would happen), but 24 strikes me as odd. Thank you.

strlen walks along the pointer you give it and counts bytes until it reaches a null byte. Your 8-byte char array is not guaranteed to contain a null byte, so strlen happily continues past the end of the array into whatever memory lies beyond it on the stack, and whatever happens to be there determines the result. (Reading past the end of the array is undefined behavior.) In your case, the first null byte was 24 bytes past the beginning of the array.

Don't use char to represent bytes.
More than half of the 256 possible byte values are not printable, i.e. they have no corresponding printable character.
I suggest you iterate over the array of uint8_t using printf("0x%02X\n", array[i]);

strlen() searches for the first null character and counts all bytes before it, excluding the null byte itself.
A salt is 8 random bytes, and there's no guarantee that the byte right after them is a null byte (or, for that matter, that none of the 8 bytes is zero).
That's why sizeof and strlen differ.

sizeof is an operator that yields the number of bytes needed to store a specific object or type. When applied to an array of characters, it is one of the three cases where the name of the array does not decay to a pointer to its first element (the other two are the operand of unary & and a string literal used to initialize an array).
strlen, by contrast, is a function that assumes its input is a null-terminated sequence of characters. Because the name of a character array does decay to a pointer to its first element when passed to a function, strlen has no way to know the size of the original array (as sizeof does). All it gets is a pointer to char. The only way it can determine the end of the string is by running through the characters, looking for a '\0'. In your case, the first '\0' it finds happens to be 24 bytes past the start of the array. That happens by pure chance.
Try initializing your array with:
char Salt[8] = {0};
And make sure that your call to RAND_pseudo_bytes does not overwrite the terminating '\0'.

Besides the null termination of the salt, which others have pointed out, you need to change the printf format specifier to %zu, because strlen returns size_t. Using the wrong specifier invokes undefined behavior.

Addressing Your Question about strlen()
What strlen() is counting is the number of bytes until the first '\0' in memory.
char Salt[9] = { '\0' };
This will initialize every element of Salt to '\0'.
NOTE: As @OliCharlesworth pointed out, Salt can contain embedded null bytes. Don't use any str*() functions. Use only mem*() functions and keep track of the length yourself. Don't rely on sizeof inside functions either, because arrays are converted to pointers when passed to functions.

Related

Append to String a Signed Int (Converted to Bytes) in Big Endian

I have a signed 4-byte integer, and I want to (i) reverse its byte order to big endian and (ii) store its 4 bytes as bytes of a string. I am working in C++. To reverse the byte order to big endian I was using ntohl, but I cannot use it because my numbers can also be negative.
Example:
int32_t a = -123456;
string s;
s.append(reinterpret_cast<char*>(reinterpret_cast<void*>(&a))); // int to byte
Further, when I append this data, it seems that 8 bytes are appended instead of 4. Why?
I need to use append (I cannot use memcpy or anything else).
Do you have any suggestion?
I was using ntohl, but I cannot use it because my numbers can also be negative.
It's unclear why you think that negative number would be a problem. It's fine to convert negative numbers with ntohl.
s.append(reinterpret_cast<char*>(reinterpret_cast<void*>(&a)));
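std::string::append(const char*) requires that the argument points to a null-terminated string. An integer is not null-terminated (unless it happens to contain a byte that incidentally represents a null terminator character). As a result of violating this requirement, the behaviour of the program is undefined. This is also the likely reason you saw 8 bytes being appended: none of the four bytes of -123456 is zero, so append kept reading past the end of a until it happened to find a zero byte.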
Do you have any suggestion?
To fix this bug, you can use the std::string::append(const char*, size_type) overload instead:
s.append(reinterpret_cast<char*>(&a), sizeof a);
reinterpret_cast<char*>(reinterpret_cast<void*>
The inner cast to void* is redundant. It makes no difference.

std::copy resulting in "Stack around the variable " was corrupted" error

I am currently trying to copy 4 bytes out of a vector<BYTE> into an integer value. When my function returns, I consistently get an error that my stack is corrupted (Stack around the variable 'rxPacket' was corrupted). I am running in debug mode and debugging my DLL. I stripped the function down to the very basic version shown below, but I still get the error. I did find this issue and I'm wondering if I am experiencing something similar, but I wanted to check whether there is something obvious I am missing.
AtpSocket::RxPacket AtpSocket::sendAndWait(AtpSocket::Command cmd, const void* data, size_t dataLen, int timeout) {
    AtpSocket::TxPacket txPacket;
    AtpSocket::RxPacket rxPacket;
    int rxCommandStringLength = 0;
    std::vector<BYTE> readBuffer(20, 55);
    std::reverse_copy(readBuffer.begin(), readBuffer.begin() + sizeof rxCommandStringLength, &rxCommandStringLength);
    return rxPacket;
}
rxCommandStringLength is an int, so &rxCommandStringLength is an int*. Let's just assume for the sake of argument that sizeof(int) is 4 bytes[1].
[1]: To really ensure that exactly 4 bytes are copied, you should use int32_t instead of int, since int is not guaranteed to be 4 bytes on all platforms.
Iterators (which includes raw pointers) are incremented/decremented in terms of whole elements, not bytes. The element type is whatever type is returned when the iterator is dereferenced.
Since your input iterator is a vector::iterator over a BYTE array, and your output iterator is an int* pointer, reverse_copy() will step through the source array in whole BYTEs and through the destination memory in whole ints, not in single bytes as you are expecting. In other words, each time reverse_copy() steps the input iterator it moves by 1 byte, but each time it increments the destination iterator it moves forward in memory by 4 bytes.
In this example, reverse_copy() will read the 4th input BYTE and write its value to the 1st output int, then read the 3rd input BYTE and write its value to the 2nd output int, and so on. By the end, it will have read 4 BYTEs, and the destination pointer will have advanced by 4 ints, 16 bytes in total, which goes WAY beyond the bounds of the actual rxCommandStringLength variable. reverse_copy() will therefore have written into the surrounding memory, corrupting the stack (if it doesn't just crash instead).
Since you want reverse_copy() to move in 1-byte increments for BOTH input AND output, you need to type-cast the int* pointer to match the data type used by the input iterators, e.g.:
std::reverse_copy(readBuffer.begin(), readBuffer.begin() + sizeof rxCommandStringLength, reinterpret_cast<BYTE*>(&rxCommandStringLength));

Can anyone please explain what this C++ code is doing?

char b = 'a';
int *a = (int*)&b;
std::cout << *a;
What would the content of *a be? It shows a garbage value. Can anyone please explain why?
Suppose char takes one byte in memory and int takes two bytes (sizeof(char) is always 1, while the size of int depends on the platform, but it is usually larger than char). You set a to point to the same memory location as b. When b is dereferenced, only one byte is considered, because its type is char. When a is dereferenced, two bytes are accessed and interpreted as an integer. That's why you get garbage: the first byte is 'a' and the second is a random byte; together they give you a random integer value.
Depending on byte order, either the least or the most significant byte of the result should be hex 61. The other three bytes (with a 4-byte int) are garbage. It's best to change the int to an unsigned int and print it in hex.
I don't know why anyone would want to do this.
You initialize a variable with the datatype char ...
A char in C++ has 1 byte, and an int is larger (commonly 4 bytes). Your a points to the address of the variable b. An address is just a number, usually written in hexadecimal. Every time you run this "program" it may be a different number, because the operating system may place your variables at a different address each time the program starts.
Think of it in terms of byte blocks. A char is one byte block (8 bits). If you convert the pointer to int*, dereferencing it also reads the next sizeof(int) - 1 byte blocks after the char's address (3 with a 4-byte int). Those blocks are random, which means you get a random integer. That's why you get a garbage value.
The code invokes undefined behavior. Garbage output is one form of undefined behavior, but your program could also cause a system violation and crash, with more serious consequences.
int *a = (int*)&b; initializes a pointer to int with the address of a char. Dereferencing this pointer will attempt to read an int from that address:
If the address is misaligned and the processor does not support misaligned accesses, you may get a system specific signal or exception.
If the address is close enough to the end of a segment that accessing beyond the first byte causes a segment violation, that's what you can get.
If the processor can read the sizeof(int) bytes at the address, only one of those will be 'a' (0x61 in ASCII); the others have undetermined values (aka garbage). As a matter of fact, on some architectures reading uninitialized memory may itself cause problems, and under valgrind, for example, this will cause a warning to be displayed to the user.
All the above are speculations, undefined behavior means anything can happen.

Character array and its memory allocation in C++

I am a bit confused after reading a textbook. Consider a character array ar[10] in C++. The textbook says that 10 bytes will be allocated for the array.
Starting from subscript ar[0], how many elements can I store in the array? Is it 10? If yes, can I store data at ar[10]? I want to know how many bytes will be allocated for the array in total, since I learned that every string ends with \0. Will an overflow happen if I try to store a character into ar[10]?
If yes can I store data at ar[10]
No.
In your example, ar is an array with ten values. The first value is index #0, so you have ar[0] through ar[9], inclusively. That's the ten values in this array. Count them. Most of us conveniently have exactly ten fingers. Start counting on your fingers, starting with ar[0], and stop when you've used all your ten fingers. You'll stop on ar[9].
Attempting to access ar[10] is undefined behavior.
It will store 10 items in total, including the '\0' if the array holds a full-length string: 9 characters, plus one '\0' null terminator at ar[9].
You can store ten values, from index 0 to index 9. This seems really wrong at first, but remember that index 0 counts as one of the positions. It's similar to how a 32-bit unsigned int can hold 2^32 distinct values, but the highest one is (2^32)-1.
Note that if you want to have the array be null-terminated you will only be able to store 9 characters, as ar[9] will hold '\0'. You could store another character there instead, but will have to write your code around the fact that your C-string is not null-terminated.
That said, it's generally considered bad practice to use character arrays for strings in C++. They are a lot more error-prone than std::string from the standard library.
More info: http://www.cplusplus.com/reference/string/string/
You have declared a[10], so it holds 10 values. Since it is a char array containing a string, and a string is terminated by '\0', the '\0' is also one of those values.
So if your string length is n, your array size must be at least n+1 to hold it. Otherwise, an overflow will occur.
Observe the following example
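#include <cstdio>

int main(){
    char a[1], r, t;
    std::printf("Size %zu Byte\n", sizeof(a));
    a[0] = 'a';
    a[1] = 'b'; // out of bounds: undefined behavior
    a[2] = 'c'; // out of bounds: undefined behavior
    std::printf("%c\n", r); // printed c here; not guaranteed
    std::printf("%c\n", t); // printed b here; not guaranteed
}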
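Your array size is 1. Although r and t were never assigned, here they ended up with the values written to a[2] and a[1] respectively. This is not guaranteed: writing past the end of an array is undefined behavior, and the result depends on how the compiler happens to lay out the stack.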

When to use unsigned char pointer

What is the use of unsigned char pointers? I have seen in many places that a pointer is cast to a pointer to unsigned char. Why do we do so?
We receive a pointer to int and then cast it to unsigned char*. But if we try to print the elements of that array using cout, it does not print anything. Why? I do not understand. I am new to C++.
EDIT: sample code below
int Stash::add(void* element)
{
    if(next >= quantity)
        // Enough space left?
        inflate(increment);
    // Copy element into storage, starting at next empty space:
    int startBytes = next * size;
    unsigned char* e = (unsigned char*)element;
    for(int i = 0; i < size; i++)
        storage[startBytes + i] = e[i];
    next++;
    return(next - 1); // Index number
}
You are actually looking for pointer arithmetic:
unsigned char* bytes = (unsigned char*)ptr;
for(int i = 0; i < size; i++) {
    // work with bytes[i]
}
In this example, bytes[i] is equivalent to *(bytes + i), and it accesses the memory at address bytes + i * sizeof(*bytes) (counted in bytes). In other words: if you have int* intPtr (with 4-byte ints) and you access intPtr[1], you are actually accessing the integer stored at bytes 4 to 7:
0 1 2 3
4 5 6 7 <--
The size of the type your pointer points to determines where it points after it is incremented or decremented. So if you want to iterate over your data byte by byte, you need a pointer to a type whose size is 1 byte (that's why unsigned char*).
unsigned char is usually used for holding binary data, where 0 is a valid value and still part of your data. When working with a "naked" unsigned char*, you'll probably have to keep track of the length of your buffer.
char is usually used for holding characters representing a string, where 0 is equal to '\0' (the terminating character). If your buffer of characters is always terminated with '\0', you don't need to know its length, because the terminating character specifies exactly where your data ends.
Note that in both of these cases it's better to use an object that hides the internal representation of your data and takes care of memory management for you (see the RAII idiom). So it's a much better idea to use either std::vector<unsigned char> (for binary data) or std::string (for strings).
In C, unsigned char is the only type guaranteed to have no trapping values, and which guarantees copying will result in an exact bitwise image. (C++ extends this guarantee to char as well.) For this reason, it is traditionally used for "raw memory" (e.g. the semantics of memcpy are defined in terms of unsigned char).
In addition, unsigned integral types in general are used when bitwise operations (&, |, >> etc.) are going to be used. unsigned char is the smallest unsigned integral type, and may be used when manipulating arrays of small values on which bitwise operations are used. Occasionally, it's also used because one needs the modulo behavior in case of overflow, although this is more frequent with larger types (e.g. when calculating a hash value). Both of these reasons apply to unsigned types in general; unsigned char will normally only be used for them when there is a need to reduce memory use.
The unsigned char type is usually used to represent a single byte of binary data. Thus, an array of it is often used as a binary data buffer, where each element is a single byte.
The unsigned char* construct will be a pointer to the binary data buffer (or its 1st element).
As for the size of unsigned char: the standard fixes sizeof(unsigned char) at 1, but the number of bits in that byte is given by CHAR_BIT (from <climits>), which is only required to be at least 8. In practice it is 8 on almost every platform.
After seeing your code
When you use something like void* input as a parameter of a function, you deliberately strip away information about the input's original type. This strongly suggests that the input will be treated in a very general manner, i.e. as an arbitrary string of bytes. int* input, on the other hand, would suggest it will be treated as a "string" of signed integers.
void* is mostly used when the input gets encoded, or is treated bit- or byte-wise for whatever reason, since you cannot draw conclusions about its contents.
Then, in your function, you seem to want to treat the input as a string of bytes. But to operate on objects, e.g. to perform operator= (assignment), the compiler needs to know what to do. Since you declare input as void*, an assignment such as *input = something would make no sense, because *input is of type void. To make the compiler treat input's elements as the "smallest raw memory pieces", you cast it to the appropriate type, which is unsigned char.
The cout probably did not work because of a wrong or unintended type conversion. A char* is treated as a null-terminated string, and it is easy to confuse the signed and unsigned versions in code. If you pass an unsigned char* to std::ostream's operator<<, it is treated the same way as a char*: the bytes are expected to be ordinary characters, and a 0 byte means end of string rather than an integer value of 0. When you want to print the contents of memory, it is best to explicitly cast each value you print (for example, each byte to unsigned int).
Also note that to print the memory contents of a buffer you would need to use a loop, since otherwise the printing function would not know when to stop.
Unsigned char pointers are useful when you want to access the data byte by byte. For example, a function that copies data from one area to another could need this:
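#include <cstddef>

// Named copy_bytes so it doesn't clash with the standard memcpy.
void copy_bytes(unsigned char* dest, const unsigned char* source, std::size_t count)
{
    for (std::size_t i = 0; i < count; i++)
        dest[i] = source[i];
}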
It also has to do with the fact that the byte is the smallest addressable unit of memory. If you want to read anything smaller than a byte from memory, you need to get the byte that contains that information, and then select the information using bit operations.
You could also copy the data in the above function using an int pointer, but that would copy chunks of sizeof(int) bytes (typically 4), which may not be the correct behavior in some situations, for example when count is not a multiple of that size.
As for why nothing appears on the screen when you try to use cout, the most likely explanation is that the data starts with a zero byte, which C++ treats as the end of a string of characters.