strlen() gives wrong size because of null bytes in array - c++

I have a dynamic char array that was deserialized from a stream.
Content of char *myarray in the file (viewed with a hex editor):
4F 4B 20 31 32 20 0D 0A 00 00 B4 7F
strlen(myarray) returns 8 (it should be 12).

strlen counts the characters up to the first 0 character; that's what it's for.
If you want to know the length of the deserialized array, you must get that from somewhere else: the deserialization code should know how large an array it deserialized.

strlen(myarray) returns the index of the first 00 in myarray.

Which language are you asking about?
In C, you'll need to remember the size and pass it to anything that needs to know it. There's no (portable) way to determine the size of an allocated array given just a pointer to it and, as you say, strlen and other functions that work with zero-terminated strings won't work with unterminated lumps of data.
In C++, use std::string or std::vector<char> to manage a dynamic array of bytes. Both of these make the size available, as well as handling deallocation for you.
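For illustration, a minimal sketch of that approach, using the byte values from the question (the variable names here are just placeholders):

#include <iostream>
#include <string>
#include <vector>

int main() {
    // Hypothetical raw bytes matching the hex dump in the question.
    const unsigned char raw[] = { 0x4F, 0x4B, 0x20, 0x31, 0x32, 0x20,
                                  0x0D, 0x0A, 0x00, 0x00, 0xB4, 0x7F };

    // Both containers remember their own size, embedded nulls and all.
    std::vector<char> asVector(raw, raw + sizeof raw);
    std::string asString(reinterpret_cast<const char*>(raw), sizeof raw);

    std::cout << asVector.size() << '\n';  // 12
    std::cout << asString.size() << '\n';  // 12
}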

The 9th char is 00, i.e. '\0'. That is why you are getting 8 instead of 12: strlen() treats it as the null terminator.

A C-style string is terminated by NUL, and your char* contains a NUL byte at the 9th position, so strlen returns 8, as it counts elements until it finds a NUL byte.
(from http://www.cplusplus.com/reference/clibrary/cstring/strlen/):
A C string is as long as the number of characters between the beginning of the string and the terminating null character.
As you're using the char* for binary data, you must not use the strlen function; instead, remember (pass along) the size of the char array.
In your case, you could serialize the size of the dynamic array on transmission, and deserialize it before allocating / reading the array.

From cplusplus.com (http://www.cplusplus.com/reference/clibrary/cstring/strlen/):
The length of a C string is determined by the terminating
null-character
You cannot expect it to count the whole string if you have a '\0' in the middle of it.
In such scenarios I have found it best to serialize the length of the message alongside the data in the stream. For example, you first serialize the length of the char array (12) and then the actual data (the characters). That way, when you read the data back, you first read the length and then you know exactly how many characters to read to recover your whole char array. A sketch of this is shown below.
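A minimal sketch of such a length-prefixed scheme, assuming an in-memory stream and ignoring byte-order concerns (a real protocol would fix the endianness of the length field):

#include <cstdint>
#include <iostream>
#include <sstream>
#include <vector>

// Write a 32-bit length followed by the raw bytes.
void writeBlob(std::ostream& out, const std::vector<char>& data) {
    std::uint32_t len = static_cast<std::uint32_t>(data.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(data.data(), data.size());
}

// Read the length first, then exactly that many bytes.
std::vector<char> readBlob(std::istream& in) {
    std::uint32_t len = 0;
    in.read(reinterpret_cast<char*>(&len), sizeof len);
    std::vector<char> data(len);
    in.read(data.data(), len);
    return data;
}

int main() {
    std::stringstream stream;
    std::vector<char> payload = { 'O', 'K', ' ', '1', '2', ' ', '\r', '\n', '\0', '\0' };
    writeBlob(stream, payload);
    std::cout << readBlob(stream).size() << '\n';  // 10, embedded nulls preserved
}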

Related

Size of string data type in C++ array

This is a very simple piece of code in C++. The addresses of the strings are separated by a constant gap of 28 bytes. What do these 28 bytes contain? I am trying to find an analogy with the gap of 4 bytes in an array containing integers. As far as I know, the 4 bytes set the upper limit of the value an integer can reach. What happens in the case of the 28 bytes? Does it really contain 28*8 bits of character data - I do not believe that. I have tried feeding in a large piece of text, and it still prints without any issues.
string str[3] = { "a", "b", "c" };
for (int i = 0; i < 3; ++i) {
    cout << &str[i] << endl;
}
What do these 28 bytes contain?
They contain an object of type string. We don't know any more unless we know how you have defined that type.
If string is an alias of std::string, then it is a class defined by the standard library. The exact contents and thus the exact size depend on and vary between standard library implementations, and the target architecture.
If we consider what some implementation might do in practice:
Does it really contain 28*8 bits of character data - I do not believe that.
Believe it or not, (modern) string implementations really do contain roughly sizeof(string) bytes of character data (minus some bookkeeping overhead) when the characters fit in that space.
They use advanced tricks to change the internal layout to support longer strings. For those, they use pointers. Typically, there is a pointer to the beginning, a pointer to the end of the string (storing an offset is another option), and a pointer (or offset) to the end of the dynamic storage. This representation is essentially identical to a vector.
If you read the standard library headers that you use, you'll find the exact definition of the class there.
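As a rough illustration of the small-string optimization idea (the pointer comparison below is only a diagnostic trick, and the exact sizes and thresholds vary between standard library implementations):

#include <iostream>
#include <string>

// Report whether the string's character data lives inside the string object
// itself (small string optimization) or in a separately allocated buffer.
static bool storedInObject(const std::string& s) {
    const char* obj = reinterpret_cast<const char*>(&s);
    return s.data() >= obj && s.data() < obj + sizeof(s);
}

int main() {
    std::cout << "sizeof(std::string) = " << sizeof(std::string) << '\n';

    std::string shortStr = "abc";
    std::string longStr(100, 'x');

    std::cout << std::boolalpha
              << "short string in-object: " << storedInObject(shortStr) << '\n'
              << "long string in-object:  " << storedInObject(longStr) << '\n';
}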

storing 0 in an unsigned char array

I have an unsigned char array like this:
unsigned char myArr[] = {100, 128, 0, 32, 2, 9};
I am using reinterpret_cast to convert it to const char* as I have to pass a const char* to a method. This information is then sent over gRPC, and the other application (Erlang based) receives it and stores it in an Erlang binary. But what I observe is that the Erlang application only receives <<100, 128>> and nothing after that. What could be causing this? Is it the 0 in the character array that is the problem here? Could someone explain how to handle the 0 in an unsigned char array? I did read quite a few answers but nothing clearly explains my problem.
What could be causing this? Is it the 0 in the character array that is the problem here?
Most likely, yes.
Probably one of the functions that the pointer is passed to is specified to accept an argument which points to a null terminated string. Your array just happens to be null terminated, by containing a null character at index 2, which is where the string terminates. Such a function typically only has well defined behavior if the array is null terminated, so passing a pointer to arbitrary binary data that might not contain a null character would be quite dangerous.
Could someone explain how to handle the 0 in an unsigned char array?
Don't pass the array into functions that expect null terminated strings. This includes most formatted output functions and most functions in <cstring>. The documentation of the function should mention the pre-conditions.
If such functions are your only option, then you can encode the binary data in a textual format and decode it on the other end. A commonly used textual encoding for binary data is Base64, although it is not necessarily optimal.
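Alternatively, if the sending API accepts a std::string or a pointer plus an explicit size for binary payloads (as gRPC bytes fields typically do), a sketch like this preserves the embedded zero:

#include <iostream>
#include <string>

int main() {
    unsigned char myArr[] = {100, 128, 0, 32, 2, 9};

    // Construct the payload with an explicit length so the embedded 0
    // is preserved instead of being treated as a terminator.
    std::string payload(reinterpret_cast<const char*>(myArr), sizeof myArr);

    std::cout << payload.size() << '\n';  // 6, not 2

    // A setter that accepts (const char*, size_t) or a std::string
    // would then receive all six bytes intact.
}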

c++ - string of 64 ASCII characters overflows malloc(64 * sizeof(char))

The code below throws an error if I provide a string of 64 hexadecimal characters (i.e. 26C8D8AB82B027808A371BC46EA789364AB8419F2B17EADFE955CBE5C6369011), even though I allocated 64 * sizeof(char) bytes for it, which should be enough:
char* username = (char*)malloc(64 * sizeof(char));
std::cin >> username;
free(username);
The error is thrown in the third line when I free the allocated memory:
CRT detected that the application wrote to memory after end of heap
buffer.
This does not happen with 63 characters or fewer. Can anyone tell me why exactly 64 * sizeof(char) is not enough, and why the error is thrown when freeing the memory, not before ...
C strings are NULL-terminated.
You did not leave space for the terminator.
The error is detected when freeing the memory, because that's the function that looked at the padding after the object and found it was corrupted. If you disable memory debugging, there might not be any checking (possibly even no padding) and this sort of error could go undetected until it trashes a completely unrelated piece of data.
If you know the exact length already and don't need a terminator to mark the end, you can use
cin.read(username, 64);
This will not store a terminator, and also won't ever read more (or less) than 64 characters of input, so it will not overflow.
A C-style string must contain one more char than the number of characters you try to put into it, to leave room for the terminating null character.
You are neglecting the NUL char ('\0') at the end of the string.
You should allocate 65 bytes for a string with a maximum length of 64, as sketched below.
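For illustration, a sketch of the 65-byte fix using a bounded read, plus the idiomatic std::string alternative that sizes itself (a minimal sketch, not production input handling):

#include <cstdlib>
#include <iostream>
#include <string>

int main() {
    // 64 characters of input plus 1 byte for the terminating '\0'.
    char* username = static_cast<char*>(std::malloc(65));
    std::cin.getline(username, 65);   // bounded read, always null-terminates
    std::free(username);

    // Idiomatic C++ alternative: std::string grows as needed.
    std::string name;
    std::getline(std::cin, name);
}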

How does AES_set_encrypt_key handle short keys

I'm writing a set of tools where a C++ application encodes data with the AES encryption standard and a Java app decodes it. As far as I know, the key length has to be 16 bytes. But when I tried using passwords of different lengths, I came across the following behaviour of the AES_set_encrypt_key function:
length = 16 : nothing special happens, as expected
length > 16 : the password gets cut after the sixteenth character
length < 16 : the password seems to be filled in "magically"
So, does anyone know what exactly happens in the last case?
Btw: Java throws an exception if the password is not exactly 16 chars long.
Thanks,
Robert
Don't confuse a byte array with a C-string. Every C-string is a byte array, but not every byte array is a C-string.
The concept with AES is to use a "key". It acts like a password, but the concept is a little different. It has a fixed size and must be 16 bytes in your case.
The key is a byte array of 16 bytes that is NOT a C-string. That means it can have any value at any point in the buffer, while a C-string must be null-terminated (the '\0' at the end of your content).
When you give a C-string to your AES, it still interprets it as a buffer, ignoring any '\0' character along the way. In other words, if your string is "something", the buffer is in fact "something\0??????", where "??????" stands for random trash bytes that cannot be guaranteed to be the same every time.
Why does a key length < 16 work at all? In debug mode, when you initialize a buffer, it often gets a default fill value, which is what is repeating in your case. But this varies with the compiler and/or platform, so take care.
And for key length > 16, AES is just picking the first 16 bytes of your buffer and ignoring the rest.
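For reference, a minimal sketch of passing a full 16-byte key to OpenSSL's legacy AES_set_encrypt_key (link with -lcrypto); a real application would derive the key from the password with a KDF such as PBKDF2 rather than using raw password bytes:

#include <cstring>
#include <iostream>
#include <openssl/aes.h>

int main() {
    // Exactly 16 bytes (128 bits); may contain any values, including 0x00.
    // A hard-coded key is for illustration only.
    unsigned char key[16] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
                              0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F };

    AES_KEY aesKey;
    if (AES_set_encrypt_key(key, 128, &aesKey) != 0) {
        std::cerr << "failed to set key\n";
        return 1;
    }

    unsigned char plaintext[16];
    std::memset(plaintext, 'A', sizeof plaintext);   // dummy 16-byte block
    unsigned char ciphertext[16] = {0};
    AES_encrypt(plaintext, ciphertext, &aesKey);     // encrypts a single block

    std::cout << "encrypted one block\n";
}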

String class and char in C++

When I have
char anything[20];
cout << sizeof anything;
it prints 20.
However
string anymore;
cout << sizeof anymore; // it prints 4
getline(cin, anymore); // let's suppose I type more than one hundred characters
cout << sizeof anymore; // it still prints 4 !
I would like to understand how C++ manages this. Thanks
sizeof is a compile-time construct. It has nothing to do with runtime, but rather gives a fixed result based on the type passed to it (or the type of the value passed to it). So char[20] is 20 bytes, but a string might be 4 or 8 bytes or whatever depending on the implementation. The sizeof isn't telling you how much storage the string allocated dynamically to hold its contents.
sizeof is a compile-time operator. It tells you the size of the type.
It's because anything is an array of 20 characters. The size of each character is 1 byte, so that's 20 bytes in total.
The string class, on the other hand, contains a pointer to the beginning of a char array and bookkeeping such as a size_t (an unsigned int, for example), which is where your 4 bytes come from. sizeof doesn't know how much memory you allocated for the string; it only knows that you have a pointer to something, because it's a compile-time construct.
sizeof isn't whatever you have decided it should be. It doesn't magically perceive the semantics of whatever type you throw at it. All it knows is how many bytes are used up directly to store an instance of a type.
For an array of five characters, that's 5. For a pointer (to anything, including an array), that's usually 4 or 8. For an std::string, it's however many bytes your C++ standard library implementation happens to need to do its work. This work usually involves dynamic allocation, so the four bytes you're looking at likely represent just enough storage for a pointer.
It is not to be confused with specific "size" semantics. For std::string, that's anymore.length(), which uses whatever internal magic is required to calculate the length of the buffer of characters that it has stored somewhere, possibly (and usually) indirectly.
For what it's worth, I'm very surprised that a std::string could take up only four bytes. I'd expect it'd store at least "length" and a pointer, which is usually going to take more than four bytes.
The 'string' type is an instantiation of the basic_string class template. String objects hold their character data through pointers, and a pointer is 4 bytes on 32-bit systems.
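To make the distinction concrete, a small sketch contrasting the compile-time sizeof with the runtime length (the exact sizeof values differ between implementations):

#include <iostream>
#include <string>

int main() {
    char anything[20];
    std::string anymore = "a line with more than twenty characters in it";

    std::cout << sizeof anything << '\n';   // 20: the array itself
    std::cout << sizeof anymore  << '\n';   // fixed per implementation (e.g. 4, 8, 24, 32)
    std::cout << anymore.size()  << '\n';   // the runtime length of the contents
}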