Why is memcpy from int to char not working? - c++

I have the hex value 0x48656c6c6f, in which each byte is the ASCII code of a character in the string "Hello". I also have a char array that I want to insert these values into.
When I had a smaller hex value (for example, 0x48656c6c, which represents "Hell"), printing out the char array gave the correct output. But the following code prints "olle" (reversed, since my machine is little-endian) rather than "olleH". Why is this?
#include <iostream>
#include <cstring>
int main()
{
char x[6] = {0};
int y = 0x48656c6c6f;
std::memcpy(x, &y, sizeof y);
for (char c : x)
std::cout << c;
}

Probably int is 32 bits on your machine, which means that the upper byte of your constant is cut off; so your int y = 0x48656c6c6f; is effectively int y = 0x656c6c6f;. (Strictly speaking, converting an out-of-range value to a signed integer type is implementation-defined rather than signed overflow, but to get guaranteed, predictable truncation you should use an unsigned type.)
So, on a little endian machine the in-memory representation of y is 6f 6c 6c 65, which is copied to x, resulting in the "olle" you see.
To "fix" the problem, you should use a bigger-sized integer, which, depending on your platform, may be long long, int64_t or similar stuff. In such a case, be sure to make x big enough (char x[sizeof(y)+1]={0}) to avoid buffer overflows or to change the memcpy to copy only the bytes that fit in x.
Also, always use unsigned integers when doing these kind of tricks - you avoid UB and get predictable behavior in case of overflow.
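A minimal sketch of that fix (assuming a little-endian machine and the fixed-width types from <cstdint>):
#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    char x[6] = {0};
    std::uint64_t y = 0x48656c6c6f;   // all five bytes fit in a 64-bit unsigned type
    std::memcpy(x, &y, 5);            // copy only the bytes that fit in x
    for (char c : x)
        std::cout << c;               // prints "olleH" on a little-endian machine (trailing byte stays 0)
}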

Probably int is four bytes on your platform.
ideone does show a warning here (together with an unrelated error from the pasted code):
http://ideone.com/TSmDk5
prog.cpp: In function ‘int main()’:
prog.cpp:7:13: warning: overflow in implicit constant conversion [-Woverflow]
prog.cpp:12:5: error: ‘error’ was not declared in this scope

int y = 0x48656c6c6f;
int is not guaranteed to be able to store it; it is probably 32 bits on your machine. Use long long instead.

It is because an int is only 4 bytes on your platform, so the H gets cut off when you provide a literal that is larger than that.

Is it safe to compare an unsigned int with a std::string::size_type

I am going through the book "Accelerated C++" by Andrew Koenig and Barbara E. Moo and I have some questions about the main example in chapter 2. The code can be summarized as below, and it compiles without warnings or errors with g++:
#include <string>
using std::string;
int main()
{
const string greeting = "Hello, world!";
// OK
const int pad = 1;
// KO
// int pad = 1;
// OK
// unsigned int pad = 1;
const string::size_type cols = greeting.size() + 2 + pad * 2;
string::size_type c = 0;
if (c == 1 + pad)
{;}
return 0;
}
However, if I replace const int pad = 1; by int pad = 1;, the g++ compiler will return a warning:
warning: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
if (c == 1 + pad)
If I replace const int pad = 1; by unsigned int pad = 1;, the g++ compiler will not return a warning.
I understand why g++ returns the warning, but I am not sure about the three points below:
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Why is the compiler not giving a warning with the original code const int pad = 1;? Is the compiler automatically converting the variable pad to an unsigned int?
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
From the compiler point of view:
It is unsafe to compare signed and unsigned variables (non-constants).
It is safe to compare two unsigned variables of different sizes.
It is safe to compare an unsigned variable with a signed constant if the compiler can check that the constant is in the allowed range for the type of the signed constant (e.g. for a 16-bit signed integer it is safe to use a constant in the range [0..32767]).
So the answers to your questions:
Yes, it is safe to compare unsigned int and std::string::size_type.
There is no warning because the compiler can perform the safety check (while compiling :)).
There is no problem using different unsigned types in a comparison. Use unsigned int.
Comparing signed and unsigned values is "dangerous" in the sense that you may not get what you expect when the signed value is negative: it may well behave as a very large unsigned value, so a > b gives true when a = -1 and b = 100. (The use of const int works because the compiler knows the value isn't changing and thus can say "well, this value is always 1, so it works fine here".)
As long as the value you want to compare fits in unsigned int (a little over 4 billion on typical machines), you are fine.
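A minimal sketch of that surprise (the compiler will typically flag this with -Wsign-compare):
#include <iostream>

int main()
{
    int a = -1;
    unsigned int b = 100;
    // a is converted to unsigned before the comparison, so -1 becomes a huge value
    std::cout << std::boolalpha << (a > b) << '\n';   // prints "true"
}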
If you are using std::string with the default allocator (which is likely), then size_type is actually size_t.
[support.types]/6 defines that size_t is
an implementation-defined unsigned integer type that is large enough to contain the size
in bytes of any object.
So it's not technically guaranteed to be unsigned int (on typical 64-bit platforms it is actually a wider type such as unsigned long), but it is always some unsigned integer type.
Now regarding your second question: if you use const int something = 2, the compiler sees that this integer is a) never negative and b) never changes, so it's always safe to compare this variable with size_t. In some cases the compiler may optimize the variable out completely and simply replace all its occurrences with 2.
I would say that it is better to use size_type everywhere you refer to the size of something, since it is more explicit.
What the compiler warns about is the comparison of unsigned and signed integer types. This is because a signed integer can be negative, and the result is counterintuitive: the signed value is converted to unsigned before the comparison, which means a negative number will compare greater than a positive one.
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Yes, they are both unsigned, so the semantics are what you expect. If their ranges differ, the narrower type is converted to the wider one.
Why is the compiler not giving a warning with the original code const int pad = 1. Is the compiler automatically converting the variable pad to an unsigned int?
This is because of how the compiler is constructed. The compiler parses and to some extent optimizes the code before warnings are issued. The important point is that, by the time this warning is considered, the compiler knows that the signed integer is 1, so it is safe to compare it with an unsigned integer.
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
If you don't want it to be constant, the best solution would probably be to make it at least an unsigned integer type. However, you should be aware that there is no guaranteed relation between the normal integer types and the size types; for example, unsigned int may be narrower than, wider than, or equal to size_t and size_type (and those two may also differ from each other).
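For completeness, a sketch of the size_type variant discussed above, adapted from the question's code (not from the book itself):
#include <string>
using std::string;

int main()
{
    const string greeting = "Hello, world!";
    string::size_type pad = 1;   // non-const, but the same unsigned type as the sizes
    string::size_type cols = greeting.size() + 2 + pad * 2;
    string::size_type c = 0;
    if (c == 1 + pad)            // unsigned == unsigned: no sign-compare warning
        cols = 0;
    return 0;
}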

Why system accepts without warning long int passed as int argument?

Quick simple question (almost out of a curiosity):
If I declare, for example, a long int in a C++ program and then pass it to a function taking an int, I know it will work without any problem unless I give it a value that needs 4 bytes, which leads to garbage being printed.
However, what surprises me is that it doesn't warn in any way about this. If I declare a 4-byte long int, the system knows that it has 32 bits to store that value. But then, if I pass that same long int to a function that takes only an int (2 bytes), I'm assuming I'm using 16 bits of memory that shouldn't be used by this value.
Am I right? Or will it use only the lowest 16 bits from that long int received as an argument? What is the process here?
Code example:
#include <stdio.h>
void test(int x) { // My question is why it accepts this?
printf("%d", x);
}
int main() {
long int y=4294967200; // 32 bits
test(y);
return 0;
}
Most likely it's because you didn't enable that feature in your compiler. For example, using GCC with conversion warnings enabled gives:
warning: conversion to ‘int’ from ‘long int’ may alter its value
If the question is why such warnings aren't enabled by default, it's because a lot of very common code patterns produce a large quantity of spurious warnings due to automatic promotions. For example, unsigned char p[10]; ... p[1] ^= 1;.
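If you do want the narrowing and also want -Wconversion to stay quiet, making the conversion explicit is a common approach; a sketch (the narrowed result is implementation-defined before C++20 when the value does not fit):
#include <cstdio>

void test(int x) {
    std::printf("%d\n", x);
}

int main() {
    long int y = 4294967200;      // needs more than 31 bits
    test(static_cast<int>(y));    // explicit cast: the intent is visible,
                                  // and the conversion warning no longer fires here
    return 0;
}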

Char to Int Pointer Casting Not Working

I'm confused about casting between char and int pointers. I'm checking how pointer casting works, and the code below, going from int to char, works fine.
#include <iostream>
using namespace std;
int main(){
int a=65;
void *p=&a;
cout << *static_cast<char*>(p);
}
Output
A
But when I try to cast from char to int it does not show the correct value.
#include <iostream>
using namespace std;
int main(){
char a='A';
void *p=&a;
cout << *static_cast<int*>(p);
}
What is the problem in the above code? The output is a garbage value.
First, you have to understand that the x86 architecture is what is called little-endian. This means that in multibyte variables, the bytes are ordered in memory from least to most significant. If you don't understand what that means, it'll become clear in a second.
A char is 8 bits -- one byte. When you store 'A' into one, it gets the value 0x41 and is happy. An int is larger; on many architectures it is 32 bits -- 4 bytes. When you assign the value 'A' to an int, it gets the value 0x00000041. This is numerically exactly the same, but there are three extra bytes of zeros in the int.
So your int contains 0x00000041. In memory, that is arranged in bytes, and because you're on a little-endian architecture, those bytes are arranged from least to most significant -- the opposite of how we normally write them! The memory actually looks like this:
+----+----+----+----+
int: | 41 | 00 | 00 | 00 |
+----+----+----+----+
+----+
char: | 41 |
+----+
When you take a pointer to the int and cast it to a char*, and then dereference it, the compiler will take the first byte of the int -- because chars are only one byte wide -- and print it out. The other three bytes get ignored! Now look back and notice that if the order of the bytes in the int were reversed, as on a big-endian architecture, you would have retrieved the value zero instead! So the behavior of this code -- the fact that the cast from int* to char* worked as you expected -- was strictly dependent on the machine you were running it on.
On the other hand, when you take a pointer to the char and cast it to an int*, and then defererence it, the compiler will grab the one byte in the char as you'd expect, but then it will also read three more bytes past it, because ints are four bytes wide! What is in those three bytes? You don't know! Your memory looks like this:
+----+
char: | 41 |
+----+
+----+----+----+----+
int: | 41 | ?? | ?? | ?? |
+----+----+----+----+
You get a garbage value in your int because you're reading memory that is uninitialized. On a different platform or under a different planetary alignment, your code might work perfectly fine, or it might segfault and crash. There's no telling. This is what is known as undefined behavior, and it is a dangerous game that we play with our compilers. We have to be very careful when working with memory like this; there's nothing scarier than nondeterministic code.
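If you want to inspect those bytes without undefined behavior, go in the allowed direction and read the int through an unsigned char*; a sketch:
#include <cstdio>

int main() {
    int a = 65;                                   // 0x00000041
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&a);
    for (unsigned i = 0; i < sizeof a; ++i)       // reading any object as chars is allowed
        std::printf("%02x ", static_cast<unsigned>(bytes[i]));   // "41 00 00 00" on little-endian,
                                                                 // "00 00 00 41" on big-endian
    std::printf("\n");
}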
You can safely represent anything as an array of char. It doesn't work the other way. This is part of the STRICT ALIASING rule.
You can read up on strict aliasing in other questions:
What is the strict aliasing rule?
More closely related to your question:
Once again: strict aliasing rule and char*
Quoting the answer given here: What is the strict aliasing rule?
[...] dereferencing a pointer that aliases another of an incompatible type is undefined behavior. Unfortunately, you can still code this way, maybe get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.
Also related to your question: Once again: strict aliasing rule and char*
Both C and C++ allow accessing any object type via char * (or specifically, an lvalue of type char). They do not allow accessing a char object via an arbitrary type. So yes, the rule is a "one way" rule.
(I must give credit for this second link to #Let_Me_Be)
Here, when you write:
cout << *static_cast<int*>(p);
you are telling the compiler that p points to an integer (represented by 4 bytes in memory), but you only wrote a char into that location (represented by 1 byte in memory), so when you read it as an integer the other 3 bytes are garbage.
But if you cast it back to a char you will get your 'A', because you are slicing the int down to a char:
cout << (char) *static_cast<int*>(p);
Otherwise, if you just want the ASCII value, cast your void* to a char* (so that dereferencing it only accesses 1 byte) and then cast what is inside it to int.
char a = 'A';
void *p=&a;
cout << static_cast<int>(*((char*)p));
The point is that a static_cast on values understands that you want to convert a char to an int (and get its ASCII value), but casting a char* to an int* only changes how many bytes are read when you dereference the pointer.
According to the standard, reading a char object through an int* is undefined behavior, and therefore any result is allowed. Most compilers will still do something plausible, so the following is a likely explanation for the behavior you are seeing on your specific architecture:
Assuming a 32 bit int, an int is the same size as 4 chars
Different architectures will treat those four bytes differently to translate their value to an int, most commonly this is either little endian or big endian
Looking at:
[Byte1][Byte2][Byte3][Byte4]
The int value would either be:
(Little Endian) Byte1 + Byte2*256 + Byte3*256^2 + Byte4*256^3
(Big Endian)    Byte4 + Byte3*256 + Byte2*256^2 + Byte1*256^3
In your case either Byte1 or Byte4 is being set; the remaining bytes are whatever happens to be in memory, since you are only reserving one byte where you need 4.
Try the following:
int main(){
char a[4]={'A', 0, 0, 0};
void *p=a;
cout << *static_cast<int*>(p);
}
You may have to switch the initialization to {0,0,0, 'A'} to get what you want based on architecture
As noted, this is undefined behavior, but should work with most compilers and give you a better idea of what is going on under the hood
Consider following code:
#include <iostream>
#include <iomanip>
using namespace std;
int main(){
{
int a=65;
cout << hex << static_cast<int>(a) << "\n";
void *p=&a;
cout << hex << setfill('0') << setw(2 * sizeof(int)) << *static_cast<int*>(p) << "\n";
}
{
char a='A';
cout << hex << static_cast<int>(a) << "\n";
void *p=&a;
cout << hex << *static_cast<int*>(p) << "\n";
}
}
There is indeed the 'A' character code (0x41) in the output, but it is padded to the size of an int with uninitialized values. You can see this when you output the hexadecimal values of the variables.

How to cast char array to int at non-aligned position?

Is there a way in C/C++ to cast a char array to an int at an arbitrary position?
I tried the following, but it automatically aligns to the nearest 32 bits (on a 32-bit architecture) when I use pointer arithmetic with non-constant offsets:
unsigned char data[8];
data[0] = 0; data[1] = 1; ... data[7] = 7;
int32_t p = 3;
int32_t d1 = *((int*)(data+3)); // = 0x03040506 CORRECT
int32_t d2 = *((int*)(data+p)); // = 0x00010203 WRONG
Update:
As stated in the comments, the input comes in tuples of 3 and I cannot change that.
I want to convert 3 values to an int for further processing, and this conversion should be as fast as possible.
The solution does not have to be cross-platform. I am working with a very specific compiler and processor, so it can be assumed that it is a 32-bit architecture with big endian.
The lowest byte of the result does not matter to me (see above).
My main questions at the moment are: Why has d1 the correct value but d2 does not? Is this also true for other compilers? Can this behavior be changed?
No you can't do that in a portable way.
The behaviour encountered when attempting a cast from char* to int* is undefined in both C and C++ (possibly for the very reasons that you've spotted: ints are possibly aligned on 4 byte boundaries and data is, of course, contiguous.)
(The fact that data+3 works but data+p doesn't is possibly due to to compile time vs. runtime evaluation.)
Also note that the signed-ness of char is not specified in either C or C++ so you should use signed char or unsigned char if you're writing code like this.
Your best bet is to use the bitwise shift operators (>> and <<) together with | and & to assemble the char values into an int. Also consider using int32_t in case you build for targets with 16- or 64-bit ints.
There is no way, converting a pointer to a wrongly aligned one is undefined.
You can use memcpy to copy the char array into an int32_t.
int32_t d = 0;
memcpy(&d, data+3, sizeof d); // copies exactly the 4 bytes of d, regardless of alignment
Most compilers have built-in functions for memcpy with a constant size argument, so it's likely that this won't produce any runtime overhead.
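A self-contained sketch of that memcpy approach, using the buffer from the question:
#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    unsigned char data[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    std::int32_t p = 3;

    std::int32_t d = 0;
    std::memcpy(&d, data + p, sizeof d);   // well-defined for any offset, aligned or not
    std::cout << std::hex << d << '\n';    // 6050403 on little-endian, 3040506 on big-endian
}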
Even though a cast like you've shown is allowed for correctly aligned pointers, dereferencing such a pointer is a violation of strict aliasing. An object with an effective type of char[] must not be accessed through an lvalue of type int.
In general, type-punning is endianness-dependent, and converting a char array representing RGB colours is probably easier to do in an endianness-agnostic way, something like
int32_t d = (int32_t)data[2] << 16 | (int32_t)data[1] << 8 | data[0];
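And a runnable sketch of that endianness-agnostic combination for one 3-byte tuple (the helper name combine3 is just illustrative):
#include <cstdint>
#include <iostream>

// Combines three consecutive bytes into one value, independent of host byte order.
std::int32_t combine3(const unsigned char* src)
{
    return (std::int32_t)src[2] << 16 | (std::int32_t)src[1] << 8 | src[0];
}

int main()
{
    unsigned char data[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    std::cout << std::hex << combine3(data + 3) << '\n';   // prints 50403
}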

c bitfields strange behaviour with long int in struct

I am observing strange behaviour when I run the following code.
I create a bit field by using a struct, where I want to use 52 bits, so I use long int.
The size of long int is 64 bits on my system; I check it inside the code.
Somehow, when I try to set one bit, it always sets two bits: one of them is the one I wanted to set, and the second one is the index of the first one plus 32.
Can anybody tell me why that is?
#include <stdio.h>
typedef struct foo {
long int x:52;
long int:12;
};
int main(){
struct foo test;
int index=0;
printf("%ld\n",sizeof(test));
while(index<64){
if(test.x & (1<<index))
printf("%i\n",index);
index++;
}
test.x=1;
index=0;
while(index<64){
if(test.x & (1<<index))
printf("%i\n",index);
index++;
}
return 0;
}
Sorry, I forgot to post the output, so my question was basically not understandable...
The Output it gives me is the following:
8
0
32
index is of type int, which is probably 32 bits on your system. Shifting a value by an amount greater than or equal to the number of bits in its type has undefined behavior.
Change index to unsigned long (bit-shifting signed types is ill-advised). Or you can change 1<<index to 1L << index, or even 1LL << index.
As others have pointed out, test is uninitialized. You can initialize it to all zeros like this:
struct foo test = { 0 };
The correct printf format for size_t is %zu, not %ld.
And it wouldn't be a bad idea to modify your code so it doesn't depend on the non-portable assumption that long is 64 bits. It can be as narrow as 32 bits. Consider using the uintN_t types defined in <stdint.h>.
I should also mention that bit fields of types other than int, unsigned int, signed int, and _Bool (or bool) are implementation-defined.
You have undefined behavior in your code, as you check the bits in test.x without initializing the structure. Because you don't initialize the variable, it will contain indeterminate (garbage) data.
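Putting the fixes from the answers above together, a sketch of a corrected version (assuming int64_t is available; note that bit-fields of types other than int are implementation-defined):
#include <stdio.h>
#include <stdint.h>

struct foo {
    int64_t x : 52;      /* bit-fields of non-int types are implementation-defined */
    int64_t   : 12;
};

int main(void) {
    struct foo test = { 0 };             /* zero-initialize: no indeterminate bits */
    printf("%zu\n", sizeof test);

    test.x = 1;
    for (unsigned index = 0; index < 52; ++index) {
        if (test.x & (1LL << index))     /* 64-bit shift constant: no UB past bit 31 */
            printf("%u\n", index);       /* now prints only 0 */
    }
    return 0;
}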