Convert BYTE array into unsigned long long int - c++

I'm trying to convert a BYTE array into the equivalent unsigned long long int value, but my code is not working as expected. Please help me fix it or suggest an alternative method.
Extra information: the 4 bytes are combined into one hexadecimal number, and the output is its decimal equivalent. For example, given byteArray = {0x00, 0xa8, 0x4f, 0x00}, the hexadecimal number is 00a84f00 and its decimal equivalent is 11030272.
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
typedef unsigned char BYTE;
int main(int argc, char *argv[])
{
BYTE byteArray[4] = { 0x00, 0x08, 0x00, 0x00 };
std::string str(reinterpret_cast<char*>(&byteArray[0]), 4);
std::cout << str << std::endl;
unsigned long long ull = std::strtoull(str.c_str(), NULL, 0);
printf ("The decimal equivalents are: %llu", ull);
return EXIT_SUCCESS;
}
I'm getting the following output:
The decimal equivalents are: 0
While the expected output was:
The decimal equivalents are: 2048

When you call std::strtoull(str.c_str(), NULL, 0);, the first argument is equivalent to an empty string: a C string is a null-terminated sequence of characters, and your first byte is 0x00.
Second, std::strtoull() does not interpret raw byte sequences; it parses the textual representation of a number. For example, you'll get 2048 with std::strtoull("2048", NULL, 10).
Another thing to note is that unsigned long long is a 64-bit data type, whereas your byte array only provides 32 bits. You need to fill the other 32 bits with zero to get the correct result. I use a direct assignment, but you could also use std::memset() here.
What you want to do is:
ull = 0ULL;
std::memcpy(&ull, byteArray, 4);
Provided your platform is little-endian, the result should be 2048.
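For reference, here is a minimal sketch of the zero-then-memcpy approach above wrapped in a function. The name bytes_to_ull_native and the count parameter are my own, not from the question, and the result depends on the platform's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned char BYTE;

// Hypothetical helper: copies `count` bytes into a zeroed unsigned long long
// using the platform's native byte order.
unsigned long long bytes_to_ull_native(const BYTE* bytes, std::size_t count)
{
    assert(count <= sizeof(unsigned long long));
    unsigned long long value = 0ULL;    // zero the bytes that memcpy will not overwrite
    std::memcpy(&value, bytes, count);  // fills the lowest-addressed bytes of value
    return value;
}
```

With byteArray = {0x00, 0x08, 0x00, 0x00} and count 4, a little-endian machine returns 2048. A big-endian machine would place the same bytes in the high half of the value and return something else.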

The first thing to remember is that a C string is null-terminated, so it ends at the first zero byte. Secondly, a string is a sequence of characters, which is not what you have. The third problem is that you have an array of four bytes, which corresponds to an unsigned 32-bit integer, and you want an (at least) 64-bit type, which is 8 bytes.
You can solve all these problems with a temporary variable, a simple call to std::memcpy, and an assignment:
uint32_t temp;
std::memcpy(&temp, byteArray, 4);
ull = temp;
Of course, this assumes that the endianness is correct.
Note that I use std::memcpy instead of std::copy (or std::copy_n) because std::memcpy is explicitly mentioned to be able to bypass strict aliasing this way, while I don't think the std::copy functions are. Also the std::copy functions are more for copying elements and not anonymous bytes (even if they can do that too, but with a clunkier syntax).
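If you would rather not depend on the machine's endianness at all, one common alternative (a different technique from the memcpy shown above) is to assemble the value with shifts. The sketch below is only an illustration; the function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Interprets bytes[0] as the MOST significant byte (big-endian order).
std::uint64_t from_big_endian(const unsigned char* bytes, std::size_t count)
{
    assert(count <= 8);
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < count; ++i)
        value = (value << 8) | bytes[i];
    return value;
}

// Interprets bytes[0] as the LEAST significant byte (little-endian order).
std::uint64_t from_little_endian(const unsigned char* bytes, std::size_t count)
{
    assert(count <= 8);
    std::uint64_t value = 0;
    for (std::size_t i = count; i-- > 0; )
        value = (value << 8) | bytes[i];
    return value;
}
```

With these, {0x00, 0xa8, 0x4f, 0x00} read as big-endian gives 11030272 (the example from the question's extra information), while {0x00, 0x08, 0x00, 0x00} read as little-endian gives 2048 (the question's expected output). The two examples in the question imply different byte orders, so you need to decide which one you actually want.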

Given the answers are using std::memcpy, I want to point out that there's a more idiomatic way of doing this operation:
char byteArray[] = { 0x00, 0x08, 0x00, 0x00 };
uint32_t cp;
std::copy(byteArray, byteArray + sizeof(cp), reinterpret_cast<char*>(&cp));
std::copy is similar to std::memcpy, but is the C++ way of doing it.
Note that you need to cast the address of the output variable cp to one of: char *, unsigned char *, signed char *, or std::byte *, because otherwise the operation wouldn't be byte oriented.

Related

C++ byte array to int

Now there is an unsigned char bytes[4], and I've already known that the byte array is generated from an int in C++. How can I convert the array back to int?
You can do that using std::memcpy():
#include <iostream>
#include <cstring>
int main() {
unsigned char bytes[4]{ 0xdd, 0xcc, 0xbb, 0xaa };
int value;
std::memcpy(&value, bytes, sizeof(int));
std::cout << std::hex << value << '\n';
}
"I've already known that the byte array is generated from an int in C++."
It is crucial to know how the array is generated from an int. If the array was generated by simply copying the bytes on the same CPU, then you can convert back by simply copying:
int value;
assert(sizeof value == sizeof bytes);
std::memcpy(&value, bytes, sizeof bytes);
However, if the array may follow another representation than what your CPU uses (for example, if you've received the array from another computer, over the network), then you must convert the representation. In order to convert the representation, you must know what representation the source data follows.
Theoretically, you would need to handle different sign representations, but in practice, 2's complement is fairly ubiquitous. A consideration that is actually relevant in practice is the byte-endianness.
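As a rough sketch of what "converting the representation" can look like when the array is known to be big-endian (network byte order): the helper names below are invented for this example, and the choice of big-endian is an assumption, since the question does not say how the array was produced.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Writes value as 4 bytes, most significant byte first.
void store_be32(std::uint32_t value, unsigned char* out)
{
    assert(out != nullptr);
    out[0] = static_cast<unsigned char>(value >> 24);
    out[1] = static_cast<unsigned char>(value >> 16);
    out[2] = static_cast<unsigned char>(value >> 8);
    out[3] = static_cast<unsigned char>(value);
}

// Reads 4 bytes, most significant byte first, regardless of the host CPU.
std::uint32_t load_be32(const unsigned char* in)
{
    assert(in != nullptr);
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8)  |
            static_cast<std::uint32_t>(in[3]);
}

// Reinterprets the 32 bits as a signed value; int32_t is two's complement.
std::int32_t to_signed(std::uint32_t bits)
{
    std::int32_t result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

For the bytes {0xdd, 0xcc, 0xbb, 0xaa}, load_be32 yields 0xddccbbaa on any host, whereas the plain memcpy version yields 0xaabbccdd on a little-endian host.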

Convert Hex String to unsigned Char

I have something like:
string hex = "\x80\x01";
and want to convert it to an unsigned char array like:
unsigned char hex_char[] = "\x80\x01";
I tried strcpy, but it doesn't work because it doesn't accept unsigned char.
I would appreciate any suggestions.
In practice you can just copy the values in any way you find natural.
E.g.
using Byte = unsigned char;
string hex = "\x80\x01";
vector<Byte> bytes( hex.begin(), hex.end() );
Or if you know that it will always be just two bytes,
using Byte = unsigned char;
string hex = "\x80\x01";
Byte bytes[] = { static_cast<Byte>( hex[0] ), static_cast<Byte>( hex[1] ) };
Formally it's a different kettle of fish, because with 8-bit byte the value \x80 won't fit as a positive signed char value. So it ends up as an implementation defined value. But in practice this is not a problem because computer evolution has converged on two's complement representation of signed integers, and I don't think there's any C++ compiler that doesn't use it.

Display value of Hexa stored in Array

I have an array of hex values called
const char receiptLogo[] = {0x01,0x80,0x00,0xB4};
When I tried to print a value with Rprintf("%x\r\n",receiptLogo[3]);
the value was sometimes displayed as "ffffffb4" and sometimes as "b4".
The whole function is:
void PRINT_PrintLogo( const char Data[])
{
unsigned int height=0;
const char receiptLogo[] = {0x01,0x80,0x00,0xB4};
height=(((unsigned short)receiptLogo[2] ) << 8) | ((unsigned short)receiptLogo[3] );
Rprintf("height=%d,%x,%x\r\n",height,Data[2],Data[3]);
}
The output of this function is
height=65460,0,ffffffb4
although at other times the output is height=180,0,b4
Please advise on the reason behind this.
In printf(), the %x format specifier prints an unsigned int in hexadecimal. A char argument is promoted to int when it is passed, and if char is signed and the most significant bit is set, the value is sign-extended to fill the int. That is why 0xB4 prints as ffffffb4, and why (unsigned short)receiptLogo[3] becomes 65460 rather than 180. The compiler may optimize by packing variables, and sometimes your variable may be in different places with respect to byte boundaries (i.e. the char may end up in the least significant byte or most significant byte of a 32-bit integer). This access may cause this behavior.
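One way to sidestep the sign extension described above is to convert each char to unsigned char before it is widened. This is only a sketch: hex_of_byte and combine_be16 are names I made up, and std::snprintf stands in for the question's Rprintf, whose exact behaviour I don't know.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats one byte as lowercase hex without sign extension.
std::string hex_of_byte(char c)
{
    char buf[8];
    // unsigned char keeps the value in 0..255 even where plain char is signed.
    const unsigned int value = static_cast<unsigned char>(c);
    const int written = std::snprintf(buf, sizeof buf, "%x", value);
    assert(written > 0 && written < static_cast<int>(sizeof buf));
    (void)written;
    return std::string(buf);
}

// Combines two bytes into a 16-bit quantity, high byte first.
unsigned int combine_be16(char high, char low)
{
    return (static_cast<unsigned int>(static_cast<unsigned char>(high)) << 8) |
            static_cast<unsigned int>(static_cast<unsigned char>(low));
}
```

With the question's data, hex_of_byte(receiptLogo[3]) gives "b4" and combine_be16(receiptLogo[2], receiptLogo[3]) gives 180, whether plain char is signed or unsigned on the target.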

Can I turn unsigned char into char and vice versa?

I want to use a function that expects data like this:
void process(char *data_in, int data_len);
So it's just processing some bytes really.
But I'm more comfortable working with "unsigned char" when it comes to raw bytes (it somehow "feels" more right to deal with positive 0 to 255 values only), so my question is:
Can I always safely pass a unsigned char * into this function?
In other words:
Is it guaranteed that I can safely convert (cast) between char and unsigned char at will, without any loss of information?
Can I safely convert (cast) between pointers to char and unsigned char at will, without any loss of information?
Bonus: Is the answer same in C and C++?
The short answer is yes if you use an explicit cast, but to explain it in detail, there are three aspects to look at:
1) Legality of the conversion
Converting between signed T* and unsigned T* (for some type T) in either direction is generally possible because the source type can first be converted to void * (this is a standard conversion, §4.10), and the void * can be converted to the destination type using an explicit static_cast (§5.2.9/13):
static_cast<unsigned char*>(static_cast<void *>(data_in))
This can be abbreviated (§5.2.10/7) as
reinterpret_cast<unsigned char *>(data_in)
because char is a standard-layout type (§3.9.1/7,8 and §3.9/9) and signedness does not change alignment (§3.9.1/1). It can also be written as a C-style cast:
(unsigned char *)(data_in)
Again, this works both ways, from unsigned* to signed* and back. There is also a guarantee that if you apply this procedure one way and then back, the pointer value (i.e. the address it's pointing to) won't have changed (§5.2.10/7).
All of this applies not only to conversions between signed char * and unsigned char *, but also to char */unsigned char * and char */signed char *, respectively. (char, signed char and unsigned char are formally three distinct types, §3.9.1/1.)
To be clear, it doesn't matter which of the three cast-methods you use, but you must use one. Merely passing the pointer will not work, as the conversion, while legal, is not a standard conversion, so it won't be performed implicitly (the compiler will issue an error if you try).
2) Well-definedness of the access to the values
What happens if, inside the function, you dereference the pointer, i.e. you perform *data_in to retrieve a glvalue for the underlying character; is this well-defined and legal? The relevant rule here is the strict-aliasing rule (§3.10/10):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
[...]
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
[...]
a char or unsigned char type.
Therefore, accessing a signed char (or char) through an unsigned char* (or char) and vice versa is not disallowed by this rule – you should be able to do this without problems.
3) Resulting values
After dereferencing the type-converted pointer, will you be able to work with the value you get? It's important to bear in mind that the conversion and dereferencing of the pointer described above amounts to reinterpreting (not changing!) the bit pattern stored at the address of the character. So what happens when a bit pattern for a signed character is interpreted as that of an unsigned character (or vice versa)?
When going from unsigned to signed, the typical effect will be that for values from 0 to 127 nothing happens, and values of 128 and above become negative. Similarly in reverse: when going from signed to unsigned, negative values will appear as values of 128 or greater.
But this behaviour isn't actually guaranteed by the Standard. The only thing the Standard guarantees is that for all three types, char, unsigned char and signed char, all bits (not necessarily 8, btw) are used for the value representation. So if you interpret one as the other, make a few copies and then store it back to the original location, you can be sure that there will be no information loss (as you required), but you won't necessarily know what the values actually mean (at least not in a fully portable way).
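To make parts 1) and 3) concrete, here is a small sketch. The body of process is a hypothetical stand-in (it just reverses the buffer), since the question only gives its signature; process_bytes and casts_agree are likewise names invented for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the question's process(): reverses the buffer in place.
void process(char* data_in, int data_len)
{
    assert(data_in != nullptr && data_len >= 0);
    std::reverse(data_in, data_in + data_len);
}

// Wrapper for callers that keep their data as unsigned char.
void process_bytes(unsigned char* data, int data_len)
{
    // An explicit cast is required; the conversion is not implicit.
    process(reinterpret_cast<char*>(data), data_len);
}

// Returns true when all three cast spellings yield the same address and
// casting back restores the original pointer.
bool casts_agree(unsigned char* p)
{
    char* via_void = static_cast<char*>(static_cast<void*>(p));
    char* via_reinterpret = reinterpret_cast<char*>(p);
    char* via_c_style = (char*)p;
    return via_void == via_reinterpret &&
           via_reinterpret == via_c_style &&
           reinterpret_cast<unsigned char*>(via_reinterpret) == p;
}
```

Calling process_bytes on {1, 200, 255} leaves {255, 200, 1} in the buffer: the bytes above 127 pass through the char-based code and come back unchanged when read as unsigned char again.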
unsigned char or signed char is just interpretation: there is no conversion happening.
Since you are processing bytes, to show intent, it would be better to declare as
void process(unsigned char *data_in, int data_len);
[As noted by an editor: A plain char may be either a signed or an unsigned type. The C and C++ standards explicitly allow either (it is always a separate type from either unsigned char or signed char, but has the same range as one of them)]
Yes, you can always convert from char to unsigned char & vice versa without problems. If you run the following code, and compare it with an ASCII table (ref. http://www.asciitable.com/), you can see a proof by yourself, and how the C/C++ deal with the conversions - they deal exactly in the same way:
#include <stdio.h>
int main(void) {
//converting from char to unsigned char
char c = 0;
printf("%zu byte(s)\n", sizeof(char)); // result: 1 byte, i.e. 8 bits on typical platforms, so there are 2^8=256 values that a char can store.
for (int i=0; i<256; i++){
printf("int value: %d - from: %c\tto: %c\n", c, c, (unsigned char) c);
c++;
}
//converting from unsigned char to char
unsigned char uc = 0;
printf("\n%zu byte(s)\n", sizeof(unsigned char));
for (int i=0; i<256; i++){
printf("int value: %d - from: %c\tto: %c\n", uc, uc, (char) uc);
uc++;
}
}
I will not post the output because it has too many lines! It can be noticed in the output that in the first half of each section, i.e. from i=0:127, the conversion from chars to unsigned chars and vice-versa works well, without any modification or loss.
However, for i=128:255 the values differ between char and unsigned char, because unsigned char stores values in the interval [0:255] while a signed char stores values in the interval [-128:127]. Nevertheless, the behaviour in this second half is irrelevant as long as you only deal with chars/unsigned chars as ASCII characters, which can take only 128 different values; the other 128 values (negative for signed chars, 128 and above for unsigned chars) are never used.
If you never put a value in a char that doesn't represent a character, and you never put a value in an unsigned char that doesn't represent a character, everything will be OK!
extra: even if you use UTF-8 or other encodings (for special characters) in your strings with C/C++, everything with this kind of casts would be OK, for instance, using UTF-8 encoding (ref. http://lwp.interglacial.com/appf_01.htm):
char hearts[] = {0xe2, 0x99, 0xa5, 0x00};
char diamonds[] = {0xe2, 0x99, 0xa6, 0x00};
char clubs[] = {0xe2, 0x99, 0xa3, 0x00};
char spades[] = {0xe2, 0x99, 0xa0, 0x00};
printf("hearts (%s)\ndiamonds (%s)\nclubs (%s)\nspades (%s)\n\n", hearts, diamonds, clubs, spades);
the output of that code will be:
hearts (♥)
diamonds (♦)
clubs (♣)
spades (♠)
even if you cast each of its chars to unsigned chars.
so:
"can I always safely pass a unsigned char * into this function?"
yes!
"is it guaranteed that I can safely convert (cast) between char and unsigned char at will, without any loss of information?"
yes!
"can I safely convert (cast) between pointers to char and unsigned char at will, without any loss of information?"
yes!
"is the answer same in C and C++?"
yes!
Semantically, passing a pointer between unsigned char * and char * is safe, and so is casting between them; the same holds in C++.
However, consider the following sample code:
#include <stdio.h>
void process_unsigned(unsigned char *data_in, int data_len) {
int i=data_len;
unsigned short product=1;
for(; i--; product*=data_in[i])
;
for(i=sizeof(product); i--; ) {
data_in[i]=((unsigned char *)&product)[i];
printf("%d\r\n", data_in[i]);
}
}
void process(char *data_in, int data_len) {
int i=data_len;
unsigned short product=1;
for(; i--; product*=data_in[i])
;
for(i=sizeof(product); i--; ) {
data_in[i]=((unsigned char *)&product)[i];
printf("%d\r\n", data_in[i]);
}
}
int main(void) {
unsigned char a[]={1, 255};
char b[]={1, -1};
process_unsigned(a, sizeof(a));
process(b, sizeof(b));
return 0;
}
output:
0
255
-1
-1
The code inside process_unsigned and process is IDENTICAL. The only difference is unsigned versus signed. This sample shows that code inside the black box can be affected by the sign, so nothing is guaranteed between the callee and the caller.
Thus I would say that only the passing itself is safe; nothing beyond that is guaranteed.
You can pass a pointer to a different kind of char, but you may need to explicitly cast it. The pointers are guaranteed to be the same size and the same values. There isn't going to be any information loss during the conversion.
If you want to convert char to unsigned char inside the function, you just assign a char value to an unsigned char variable or cast the char value to unsigned char.
If you need to convert unsigned char to char without data loss, it's a bit harder, but still possible:
#include <limits.h>
char uc2c(unsigned char c)
{
#if CHAR_MIN == 0
// char is unsigned
return c;
#else
// char is signed
if (c <= CHAR_MAX)
return c;
else
// ASSUMPTION 1: int is larger than char
// ASSUMPTION 2: integers are 2's complement
return c - CHAR_MAX - 1 - CHAR_MAX - 1;
#endif
}
This function will convert unsigned char to char in such a way that the returned value can be converted back to the same unsigned char value as the parameter.
You really need to view the code to process() to know if you can safely pass in unsigned characters. If the function uses the characters as an index into an array, then no, you can't use unsigned data.
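The indexing hazard mentioned above usually arises where plain char is signed: a byte such as 0xDA then becomes a negative index. A hedged sketch of the usual defence, with a made-up count_bytes function, is to convert to unsigned char before indexing:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Counts how often each byte value occurs in data.
// `counts` must point to UCHAR_MAX + 1 zero-initialised entries.
void count_bytes(const char* data, std::size_t len, unsigned* counts)
{
    assert(counts != nullptr && (data != nullptr || len == 0));
    for (std::size_t i = 0; i < len; ++i) {
        // Where plain char is signed, data[i] may be negative; using it
        // directly as an index would read before the start of the table.
        const unsigned char index = static_cast<unsigned char>(data[i]);
        ++counts[index];
    }
}
```

So if process() indexes a table with its char argument directly, the sign of the data matters; if it converts first, as here, either kind of buffer is handled the same way.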

Why does this hex value get output as a negative number?

char buffer_b[5] = { 0xDA, 0x00, 0x04, 0x00, 0x07 };
printf("%d\n%d\n%d", buffer_b[0], buffer_b[2], buffer_b[4]);
This gives me output:
-38
4
7
However I am expecting:
218
4
7
Thanks.
char is signed. Use unsigned char.
Use %u also.
Evidently, char is signed in your environment. (That's a detail that can vary from one implementation to the next, and some compilers even offer you an option through a command-line switch.) The number you're printing is 0xDA, which has the most significant bit set, so its value is negative. When the compiler passes that value to printf, it promotes the (signed) char value to type int, and it retains its negativity. You used the %d format string, which tells printf to interpret its argument as a signed value.
To treat the value as unsigned, either change your array's element type to an unsigned type, such as unsigned char or uint8_t, or cast the printf argument to unsigned char. Casting the already negative char straight to unsigned would not help: with a 32-bit int it gives 4294967258. If you also switch to the %u format string, make sure the argument you pass is an unsigned int.
When the char 0xDA is promoted to int to pass to printf, the compiler is doing a sign-extension, converting it to 0xffffffda, which is the 32-bit representation of -38. You were expecting it to be zero-extended to 0x000000da. To control how the compiler extends a character, you have to declare it as signed char or unsigned char. Signed integer types are widened by sign-extending, and unsigned integer types are widened by zero-extending.
You can't predict how any particular compiler will treat an unqualified char, or if it will be the same in the next release of the compiler.
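As a small sketch of the fix discussed above (byte_as_decimal and plain_char_is_signed are names invented for this example), you can check at compile time how your implementation treats plain char and print each byte through unsigned char so the result no longer depends on it:

```cpp
#include <cassert>
#include <climits>
#include <cstdio>
#include <string>

// True where plain char behaves like signed char on this implementation.
const bool plain_char_is_signed = (CHAR_MIN < 0);

// Formats a byte as an unsigned decimal number, independent of char's signedness.
std::string byte_as_decimal(char c)
{
    char buf[8];
    // Convert to unsigned char first so the widening zero-extends.
    const unsigned int value = static_cast<unsigned char>(c);
    const int written = std::snprintf(buf, sizeof buf, "%u", value);
    assert(written > 0 && written < static_cast<int>(sizeof buf));
    (void)written;
    return std::string(buf);
}
```

For the question's buffer, byte_as_decimal(buffer_b[0]) gives "218" instead of "-38", while the bytes 0x04 and 0x07 still print as "4" and "7".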