C++: how to cast multiple bytes from char[] into one integer

I am trying to convert a char array to integers:
const int LENGTH = 3 * sizeof(int);
char data[LENGTH];
/* data is filled */
for (int i = 0; i < LENGTH; i += sizeof(int)) {
std::cout << "Integer: " << (int)data[i] << std::endl;
}
for (int i = 0; i < LENGTH; i += sizeof(short)) {
std::cout << (short)data[i] << " ";
}
the output is:
Integer: 0
Integer: 0
Integer: 0
0 3 0 3 0 3
I'd expect that if the shorts are not zero, the integers must not be zero either. Probably the conversion as seen here works for just that one character/byte and not, as expected, for the following 4 bytes. How can I fix that?
To be clear: I want bytes 0 to 3 cast into one integer, then the next (4 to 7) into the next integer, and so on...

You are casting data[i] to an int. However, data[i] is a char, so you can cast all you want, the cast is not going to magically read extra bytes. Instead, you have to cast the data pointer to int * and only then dereference it.
Basically, you'll end up with something like this:
auto voidPtr = static_cast<void const *>(data);
auto intPtr = static_cast<int const *>(voidPtr);
for (size_t i = 0; i < LENGTH / sizeof(int); ++i) {
std::cout << "Int: " << intPtr[i] << "\n";
}
Note how i is only incremented by 1 each time, but the number of increments is divided by sizeof(int). This is because the compiler will automatically do the right thing when you're indexing an int *.
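Also be aware that what you get back might not be what you expect, depending on whether the machine you're running this on is big- or little-endian. Strictly speaking, reading a char buffer through an int * also breaks the aliasing rules and may hit a misaligned address; copying the bytes with std::memcpy avoids both problems.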
P.S.: It's generally discouraged to use a C-style cast, static_cast<int> is much more explicit in showing what you want to achieve.

As @underscore_d pointed out, *((int*)&data[i]) from this answer will result in undefined behaviour and memcpy should be used.
int intData[3];
std::memcpy(intData, data, sizeof data);
for (int i = 0; i < 3; i++) {
std::cout << "int: " << intData[i] << " ";
}
is working fine and complies with the reference of memcpy.

Related

Am I really copying the bytes or am I copying characters in this case?

I have a vector of unsigned char where I copy bytes in C++. I convert all primitive types to bytes and copy to this vector of char (which is interpreted as bytes in C++). Now I am copying also strings. But I am not sure if I am converting strings to bytes. If you take a look at my output when I am printing the vector of unsigned char I am printing bytes from double int float but I am printing the real string of my variable testString. So I suppose that I am not inserting bytes of this testString on my vector of unsigned char. How should I do that?
Thanks
const std::string lat = "lat->", alt = "alt->", lon = "lon->", testString = "TEST-STRING";
double latitude = 10.123456;
double longitude = 50.123456;
double altitude = 1.123456;
std::vector<unsigned char> result(
sizeof(latitude) + sizeof(longitude) + sizeof(altitude) + testString.length());
std::cout << "copying to the vector" << std::endl;
memcpy(result.data(), &longitude, sizeof(longitude));
memcpy(result.data() + sizeof(longitude), &latitude, sizeof(latitude));
memcpy(result.data() + sizeof(longitude) + sizeof(latitude), &altitude, sizeof(altitude));
// Copy only the characters: the vector has no room for the terminating '\0'.
memcpy(result.data() + sizeof(longitude) + sizeof(latitude) + sizeof(altitude), testString.data(),
testString.length());
std::cout << "copied to the vector\n" << std::endl;
std::cout << "printing the vector" << std::endl;
for (unsigned int j = 0; j < result.size(); j++) {
std::cout << result[j];
}
std::cout << std::endl;
std::cout << "printed the vector\n" << std::endl;
// testing converting back ...................
std::cout << "printing back the original value" << std::endl;
double dLat, dLon, dAlt;
std::string value;
memcpy(&dLon, result.data(), sizeof(longitude));
memcpy(&dLat, result.data() + sizeof(longitude), sizeof(latitude));
memcpy(&dAlt, result.data() + sizeof(longitude) + sizeof(latitude), sizeof(altitude));
value.resize(testString.length());
// Copy exactly as many bytes as value can hold.
memcpy(&value[0], result.data() + sizeof(longitude) + sizeof(latitude) + sizeof(altitude),
testString.size());
std::cout << alt << dAlt;
std::cout << lat << dLat;
std::cout << lon << dLon;
std::cout << " " << value << std::endl;
std::cout << "printed back the original value\n" << std::endl;
output:
copying to the vector
copied to the vector
printing the vector
[?�gI#m���5?$#l������?TEST-STRING
printed the vector
printing back the original value
alt->1.12346lat->10.1235lon->50.1235 TEST-STRING
printed back the original value
There's no problem with your code! You're printing the actual bytes of your variables. The bytes in a double can't really be interpreted as a text string (at least, it doesn't make sense if you do) but the bytes in a text string can, producing what you see.
Let's say you've got the following code (which is really just disguised C):
#include <cstdio>
int main(int argc, char *argv[]) {
struct {
double latitude;
double longitude;
char name[30];
} structure = {
53.6344,
126.5223167,
"Keyboard Mash"
};
printf("%f %f %s\n", structure.latitude, structure.longitude, structure.name);
for (size_t i = 0; i < sizeof(structure); i += 1) {
printf("%c", ((char*)&structure)[i]);
}
printf("\n");
}
This code would (probably) print:
53.634400 126.522317 Keyboard Mash
����������������Keyboard Mash�����������������
The first 16 bytes are from the doubles, and the next 30 are from the char[]. That's just how char[]s are stored! Your code is doing what you'd expect it to.
Of course, you can't rely on the output looking exactly like this; the bytes of a double depend on the platform's floating-point format and byte order, and the struct may contain padding.
I feel like you were expecting something like: 128565TESTSTRING where 12, 85 and 65 are values of longitude, latitude and altitude. Well, that's not going to happen because you wrote 12 in the data, not "12"; therefore, it will give you the character whose ASCII code is 12. Maybe you could use something like sprintf() instead.

Comparison Of Pointers

I want to compare the memory address and pointer value of p, p + 1, q, and q + 1.
I want to understand what the following values actually mean. I can't quite wrap my head around what's going on.
When I run the code:
I get an answer of 00EFF680 every time I compare the address of p with another pointer.
I get an answer of 00EFF670 every time I compare the address of q with another pointer.
I get an answer of 15726208 when I look at the pointer value of p.
And I get an answer of 15726212 when I look at the pointer value of p + 1.
I get an answer of 15726192 when I look at the pointer value of q.
And I get an answer of 15726200 when I look at the pointer value of q + 1.
Code
#include <iostream>
#include <string>
using namespace std;
int main()
{
int val = 20;
double valD = 20;
int *p = &val;
double *q;
q = &valD;
cout << "Memory Address" << endl;
cout << p == p + 1;
cout << endl;
cout << q == q + 1;
cout << endl;
cout << p == q;
cout << endl;
cout << q == p;
cout << endl;
cout << p == q + 1;
cout << endl;
cout << q == p + 1;
cout << endl;
cout << "Now Compare Pointer Value" << endl;
cout << (unsigned long)(p) << endl;
cout << (unsigned long) (p + 1) << endl;
cout << (unsigned long)(q) << endl;
cout << (unsigned long) (q + 1) << endl;
cout <<"--------" << endl;
return 0;
}
There are a few warnings and/or errors.
The first is that overloaded operator << has higher precedence than the comparison operator (on clang++ -Woverloaded-shift-op-parentheses is the flag).
The second is that there is a comparison of distinct pointer types ('int *' and 'double *').
For the former, parentheses must be placed around the comparison so that it is evaluated first. For the latter, both pointers should be cast to a common type that allows comparison (e.g., size_t, or more portably std::uintptr_t or const void *).
For instance on line 20, the following would work nicely.
cout << ((size_t) p == (size_t) (q + 1));
As for lines 25-28, this is standard pointer arithmetic. See the explanation here.
As to your question:
I want to compare p, p +1 , q , and q + 1. And Understand what the results mean.
If p is at address 0x80000000 then p+1 is at address 0x80000000 + sizeof(*p). If *p is a 4-byte int, as your output shows, then this is 0x80000000 + 0x4 = 0x80000004. The same reasoning applies to q, with sizeof(double) in place of sizeof(int).
So if you do p == p + 1, the compiler first does the addition p + 1 and then the comparison, so you have 0x80000000 == 0x80000004, which is false.
Now to your code:
cout << p == p + 1;
is actually equivalent to:
(cout << p) == p + 1;
and that is because << has higher precedence than ==. Since C++11 this is a compilation error; older compilers accept it because the stream converts implicitly to void *, which is why you see an address printed.
Another thing is the comparison of pointers to unrelated types, such as double* with int*; without a cast it should not compile.
In C and C++ pointer arithmetic is very closely tied with array manipulation. The goal is that
int array[3] = { 1, 10, 100 };
int *ptr = array;
std::cout << array[2] << '\n';
std::cout << *(ptr + 2) << '\n';
outputs two 100s. This allows the language to treat arrays and pointers as equivalent - that's not the same thing as "the same" or "equal", see the C FAQ for clarification.
This means that the language allows:
int array[3] = { 1, 10, 100 };
int *ptr = array;
And then
std::cout << (void*)array << ", " << (void*)&array[0] << '\n';
outputs the address of the first element twice, the first array behaves like a pointer.
std::cout << (void*)(array + 1) << ", " << (void*)&array[1] << '\n';
prints the address of the second element of array, again array behaving like a pointer in the first case.
std::cout << ptr[2] << ", " << *(ptr + 2) << '\n';
prints element #3 of ptr (100) twice, here ptr is behaving like an array in the first use,
std::cout << (void*)ptr << ", " << (void*)&ptr[0] << '\n';
prints the value of ptr twice, again ptr behaving like an array in the second use.
But this can catch people unaware.
const char* h = "hello"; // h points to the character 'h'.
std::cout << (void*)h << ", " << (void*)(h+1);
This prints the value of h and then a value one higher. But this is purely because the type of h is a pointer to a one-byte-sized data type.
h + 1;
is
h + (sizeof(*h)*1);
If we write:
const char* hp = "hello";
short int s[] = { 1 };
short int* sip = s; // a pointer needs an object to point to
int n[] = { 1 };
int* ip = n;
std::cout << (void*)hp << ", " << (void*)(hp + 1) << "\n";
std::cout << (void*)sip << ", " << (void*)(sip + 1) << "\n";
std::cout << (void*)ip << ", " << (void*)(ip + 1) << "\n";
The first line of output will show two values 1 byte (sizeof char) apart, the second two values will be 2 bytes (sizeof short int) apart and the last will be four bytes (sizeof int) apart.
The << operator invokes
template<typename T>
std::ostream& operator << (std::ostream& stream, const T& instance);
The operator itself has very high precedence, higher than == so what you are actually writing is:
(std::cout << p) == p + 1
what you need to write is
std::cout << (p == p + 1)
this is going to print 0 (the result of int(false)) if the values are different and 1 (the result of int(true)) if the values are the same.
Perhaps a picture will help (For a 64bit machine)
p is a 64bit pointer to a 32bit (4byte) int. The green pointer p takes up 8 bytes. The data pointed to by p, the yellow int val takes up 4 bytes. Adding 1 to p goes to the address just after the 4th byte of val.
Similar for pointer q, which points to a 64bit (8byte) double. Adding 1 to q goes to the address just after the 8th byte of valD.
If you want to print the value of a pointer, you can cast it to void *, for example:
cout << static_cast<void*>(p) << endl;
A void* is a pointer of indefinite type. C code uses it often to point to arbitrary data whose type isn’t known at compile time; C++ normally uses a class hierarchy for that. Here, though, it means: treat this pointer as nothing but a memory location.
Adding an integer to a pointer gets you another pointer, so you want to use the same technique there:
cout << static_cast<void*>(p+1) << endl;
However, the difference between two pointers is a signed whole number (the precise type, if you ever need it, is defined as ptrdiff_t in <cstddef>, but fortunately you don’t need to worry about that with cout), so you just want to use that directly:
cout << (p+1) - p << endl;
cout << reinterpret_cast<char*>(p+1) - reinterpret_cast<char*>(p) << endl;
cout << reinterpret_cast<char*>(q+1) - reinterpret_cast<char*>(q) << endl;
That second line casts to char* because the size of a char is always 1. That’s a big hint what’s going on.
As for what’s going on under the hood: compare the numbers you get to sizeof(*p) and sizeof(*q), which are the sizes of the objects p and q point to.
The pointer values that are printed are likely to change on every execution (see why the addresses of local variables can be different every time and Address Space Layout Randomization)
I get an answer of 00EFF680 every time I compare the address of p with another pointer.
int val = 20;
double valD = 20;
int *p = &val;
cout << p == p + 1;
It is parsed as (cout << p) == p + 1; because operator << has higher precedence than operator ==.
It prints the hexadecimal value of &val, the first address in the stack frame of the main function.
Note that on the stack, addresses are decreasing (see why does the stack address grow towards decreasing memory addresses).
I get an answer of 00EFF670 every time I compare the address of q with another pointer.
double *q = &valD;
cout << q == q + 1;
It is parsed as (cout << q) == q + 1; because operator << has higher precedence than operator ==.
It prints the hexadecimal value of &valD, the second address in the stack frame of the main function.
Note that the address of valD is at least sizeof(double) = 8 bytes below the address of val, since valD sits below val on the stack. Here the gap is 16 bytes (0x00EFF680 - 0x00EFF670); the exact layout is a compiler choice that respects alignment constraints.
I get an answer of 15726208 when I look at the pointer value of p.
cout << (unsigned long)(p) << endl;
It just prints the decimal value of &val
And I get an answer of 15726212 when I look at the pointer value of p + 1.
int *p = &val;
cout << (unsigned long) (p + 1) << endl;
It prints the decimal value of the address &val + sizeof(*p) bytes = &val + sizeof(int) bytes = &val + 4 bytes, since int is 32 bits on your machine.
Note that if p is a pointer to type t, p+1 is sizeof(t) bytes past p, so that consecutive array elements do not overlap.
Note that if p is a pointer to void, p+1 is not valid standard C++ because void has no size; some compilers accept it as an extension (see void pointer arithmetic).
I get an answer of 15726192 when I look at the pointer value of q
cout << (unsigned long)(q) << endl;
It prints the decimal value of &valD
And I get an answer of 15726200 when I look at the pointer value of q + 1.
cout << (unsigned long) (q + 1) << endl;
It prints the decimal value of the address &valD + sizeof(*q) bytes = &valD + sizeof(double) bytes = &valD + 8 bytes.

Concatenating an array size in cout statement

I'm trying to output the number of element-objects in my array, but the syntax for that is not the same as it is for Java:
// print list of all messages to the console
void viewSent()
{
cout << "You have " << sent.size() << " new messages.\n";//Error: left of '.size' must have class/struct,union
std::cout << "Index Subject" << '\n';
for (size_t i = 0; i < sent.size(); ++i)
{
std::cout << i << " : " << sent[i].getSubject() << '\n';
}
}
if the .size doesn't work in C++ syntax, what does?
The C++ equivalent of a Java array is std::vector. sent.size() is the correct way to get the size.
You didn't post your definition of sent but it should be std::vector<YourObject> sent;, perhaps with initial size and/or values also specified.
I'm guessing you tried to use a C-style array -- don't do that, C-style arrays have strange syntax and behaviour for historical reasons and there is really no need to use them ever in C++.
If your array is a C-Array, you can loop through it like this:
for (size_t i = 0; i < (sizeof(sent) / sizeof(TYPE)); ++i)
... where TYPE is the underlying type of the array.
For example, if sent is defined as:
int sent[10]; // any fixed size; the element count must be known here
... then TYPE would be int, like this:
for (size_t i = 0; i < (sizeof(sent) / sizeof(int)); ++i)
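A C-Array is not an object of class type, so it has no members or methods and you cannot use the member operator with it. The sizeof operator yields the size in bytes of a type or object, as a value of type size_t; for an array that is the size of the whole array, which is why dividing by the element size gives the element count. This only works where sent is still an array: once it has been passed to a function as a pointer, sizeof gives the size of the pointer instead.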

Need help understanding pointer arithmetic

I'm having troubles understanding pointer arithmetic or how memory is assigned. In the code snippet below, I am trying to access the value of 'size = 1' which is located 8 bytes before 'test', but I don't get size's value and the value is not random. So I may have an issue with understanding bytes sizes. If void*, long, and char are 8 bytes should it matter when using pointer arithmetic?
#include <iostream>
using namespace std;
char arrayOfCrap[100];
void * what(){
long * size ;
size = (long*)&arrayOfCrap[28];
*size = 1;
return ((void*) &arrayOfCrap[29]);
}
int main(){
long * test;
test = (long*)what();
*test = 1221;
cout << "Value of test: " << *test << endl;
cout << "Long number before test: " << *(test-1) << endl;
}
The code works when main moves forward from what()'s void* pointer:
#include <iostream>
using namespace std;
char arrayOfCrap[100];
void * what(){
long * size ;
size = (long*)&arrayOfCrap[28];
*size = 1;
return ((void*) &arrayOfCrap[28]); //change from above
}
int main(){
long * test;
test = (long*)what();
test++; //change from above
*test = 1221;
cout << "Value of test: " << *test << endl;
cout << "Long number before test: " << *(test-1) << endl;
}
Your code is not locating *size eight bytes before *test:
size = (long*)&arrayOfCrap[28];
arrayOfCrap is char arrayOfCrap[100] so arrayOfCrap[28] is the char at offset 28 and arrayOfCrap[29] is the char at offset 29.
The reason test++ works is that test is of type long*, so incrementing it actually moves to the next position for a long, whereas incrementing a char* or using an index on a char array gives you the next position for a char.
You could also do one of these:
void * what(){
long * size ;
size = (long*)&arrayOfCrap[28];
*size = 1;
return size+1;
}
void * what(){
long * size ;
size = (long*)&arrayOfCrap[28];
*size = 1;
return (void*) &arrayOfCrap[28 + sizeof(long)];
}
By the way, it's not necessarily safe to take a pointer to just any memory location and treat it as a pointer to another type. Some platforms require some types to be 'aligned', or to have those types exist only at addresses that are multiples of a certain value. On those platforms reading or writing to an unaligned object may crash (bus error) or otherwise have undefined behavior. Also, some platforms may not crash or behave incorrectly, but have much better performance when reading/writing aligned objects. I know this is completely beside the point of your experimentation, but it's something you should know for real code. Here's an example of what not to do in real code:
int read_int(char *&c) {
int out = *(int*)c; // c may not be properly aligned!
c += sizeof(int);
return out;
}
Unfortunately on a common platform, x86, unaligned access is usually just slow rather than something that will always cause a crash, so users of that platform have to be especially careful.
When you increment a pointer, it increments not by the pointer size, but by the size of the type of the pointer. A char* pointer increments by sizeof(char), a long* pointer increments by sizeof(long)
sizeof(char *), sizeof(long *) should be both the same size (generally 4 bytes on 32-bit systems, 8 bytes on 64-bit systems).
However, sizeof(char) and sizeof(long) are not the same.
You are confusing your pointer size with the integer size.
#include <iostream>
using namespace std;
int main()
{
cout << "\n sizeof(char*) " << sizeof(char *);
cout << "\n sizeof(char) " << sizeof(char);
cout << "\n sizeof(long*) " << sizeof(long *);
cout << "\n sizeof(long) " << sizeof(long);
}
See it in action here: http://ideone.com/gBcjS

Object/Struct Alignment in C/C++

#include <iostream>
using namespace std;
struct test
{
int i;
double h;
int j;
};
int main()
{
test te;
te.i = 5;
te.h = 6.5;
te.j = 10;
cout << "size of an int: " << sizeof(int) << endl; // Should be 4
cout << "size of a double: " << sizeof(double) << endl; //Should be 8
cout << "size of test: " << sizeof(test) << endl; // Should be 24 (word size of 8 for double)
//These two should be the same
cout << "start address of the object: " << &te << endl;
cout << "address of i member: " << &te.i << endl;
//These two should be the same
cout << "start address of the double field: " << &te.h << endl;
cout << "calculate the offset of the double field: " << (&te + sizeof(double)) << endl; //NOT THE SAME
return 0;
}
Output:
size of an int: 4
size of a double: 8
size of test: 24
start address of the object: 0x7fffb9fd44e0
address of i member: 0x7fffb9fd44e0
start address of the double field: 0x7fffb9fd44e8
calculate the offset of the double field: 0x7fffb9fd45a0
Why do the last two lines produce different values? Something I am doing wrong with pointer arithmetic?
(&te + sizeof(double))
This is the same as:
&((&te)[sizeof(double)])
You should do:
(char*)(&te) + sizeof(int)
You are correct -- the problem is with pointer arithmetic.
When you add to a pointer, the address advances by a multiple of the size of the type it points to.
Therefore, &te + 1 will be 24 bytes after &te.
Your code &te + sizeof(double) will add 24 * sizeof(double) or 192 bytes.
Firstly, your code is wrong, you'd want to add the size of the fields before h (i.e. an int), there's no reason to assume double. Second, you need to normalise everything to char * first (pointer arithmetic is done in units of the thing being pointed to).
More generally, you can't rely on code like this to work. The compiler is free to insert padding between fields to align things to word boundaries and so on. If you really want to know the offset of a particular field, there's an offsetof macro that you can use. It's defined in <stddef.h> in C, <cstddef> in C++.
Most compilers offer an option to remove all padding (e.g. GCC's __attribute__ ((packed))).
I believe offsetof is only guaranteed to work on standard-layout types (POD types in older versions of the standard).
struct test
{
int i;
int j;
double h;
};
Since your largest data type is 8 bytes, the compiler adds padding around your ints. Either group the two ints together, as above, or put the largest data type first, or account for the padding yourself. Hope this helps!
&te + sizeof(double) is equivalent to &te + 8, which is equivalent to &((&te)[8]). That is — since &te has type test *, &te + 8 adds eight times the size of a test.
You can see what's going on more clearly using the offsetof() macro:
#include <iostream>
#include <cstddef>
using namespace std;
struct test
{
int i;
double h;
int j;
};
int main()
{
test te;
te.i = 5;
te.h = 6.5;
te.j = 10;
cout << "size of an int: " << sizeof(int) << endl; // Should be 4
cout << "size of a double: " << sizeof(double) << endl; // Should be 8
cout << "size of test: " << sizeof(test) << endl; // Should be 24 (word size of 8 for double)
cout << "i: size = " << sizeof te.i << ", offset = " << offsetof(test, i) << endl;
cout << "h: size = " << sizeof te.h << ", offset = " << offsetof(test, h) << endl;
cout << "j: size = " << sizeof te.j << ", offset = " << offsetof(test, j) << endl;
return 0;
}
On my system (x86), I get the following output:
size of an int: 4
size of a double: 8
size of test: 16
i: size = 4, offset = 0
h: size = 8, offset = 4
j: size = 4, offset = 12
On another system (SPARC), I get:
size of an int: 4
size of a double: 8
size of test: 24
i: size = 4, offset = 0
h: size = 8, offset = 8
j: size = 4, offset = 16
The compiler will insert padding bytes between struct members to ensure that each member is aligned properly. As you can see, alignment requirements vary from system to system; on one system (x86), double is 8 bytes but only requires 4-byte alignment, and on another system (SPARC), double is 8 bytes and requires 8-byte alignment.
Padding can also be added at the end of a struct to ensure that everything is aligned properly when you have an array of the struct type. On SPARC, for example, the compiler adds 4 bytes of padding at the end of the struct.
The language guarantees that the first declared member will be at an offset of 0, and that members are laid out in the order in which they're declared. (At least that's true for simple structs; C++ metadata might complicate things.)
Compilers are free to space out structs however they want past the first member, and usually use padding to align to word boundaries for speed.
See these:
C struct sizes inconsistence
Struct varies in memory size?
et. al.