I'm confused about casting a char pointer to an int pointer. I'm checking how pointer casting works, and the code below, casting int to char, works fine.
#include <iostream>
using namespace std;
int main(){
int a=65;
void *p=&a;
cout << *static_cast<char*>(p);
}
Output
A
But when I try to cast from char to int, it doesn't show the correct value.
#include <iostream>
using namespace std;
int main(){
char a='A';
void *p=&a;
cout << *static_cast<int*>(p);
}
What is the problem in the above code? The output is a garbage value.
First, you have to understand that the x86 architecture is what is called little-endian. This means that in multibyte variables, the bytes are ordered in memory from least to most significant. If you don't understand what that means, it'll become clear in a second.
A char is 8 bits -- one byte. When you store 'A' into one, it gets the value 0x41 and is happy. An int is larger; on many architectures it is 32 bits -- 4 bytes. When you assign the value 'A' to an int, it gets the value 0x00000041. This is numerically exactly the same, but there are three extra bytes of zeros in the int.
So your int contains 0x00000041. In memory, that is arranged in bytes, and because you're on a little-endian architecture, those bytes are arranged from least to most significant -- the opposite of how we normally write them! The memory actually looks like this:
+----+----+----+----+
int: | 41 | 00 | 00 | 00 |
+----+----+----+----+
+----+
char: | 41 |
+----+
When you take a pointer to the int and cast it to a char*, and then dereference it, the compiler will take the first byte of the int -- because chars are only one byte wide -- and print it out. The other three bytes get ignored! Now look back and notice that if the order of the bytes in the int were reversed, as on a big-endian architecture, you would have retrieved the value zero instead! So the behavior of this code -- the fact that the cast from int* to char* worked as you expected -- was strictly dependent on the machine you were running it on.
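If you want to see this byte order for yourself, a small sketch like the following dumps the individual bytes of an int by reading them through unsigned char* (which the language permits); the output order depends on your machine:

#include <cstddef>
#include <iomanip>
#include <iostream>

int main() {
    int a = 65;  // 0x00000041
    // Inspecting an object's bytes through unsigned char* is allowed.
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&a);
    for (std::size_t i = 0; i < sizeof a; ++i)
        std::cout << std::hex << std::setw(2) << std::setfill('0')
                  << static_cast<int>(bytes[i]) << ' ';
    std::cout << '\n';  // "41 00 00 00" on a little-endian machine
}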
On the other hand, when you take a pointer to the char and cast it to an int*, and then dereference it, the compiler will grab the one byte in the char as you'd expect, but then it will also read three more bytes past it, because ints are four bytes wide! What is in those three bytes? You don't know! Your memory looks like this:
+----+
char: | 41 |
+----+
+----+----+----+----+
int: | 41 | ?? | ?? | ?? |
+----+----+----+----+
You get a garbage value in your int because you're reading memory that is uninitialized. On a different platform or under a different planetary alignment, your code might work perfectly fine, or it might segfault and crash. There's no telling. This is what is known as undefined behavior, and it is a dangerous game that we play with our compilers. We have to be very careful when working with memory like this; there's nothing scarier than nondeterministic code.
You can safely access any object as an array of char. It doesn't work the other way around. This is part of the STRICT ALIASING rule.
You can read up on strict aliasing in other questions:
What is the strict aliasing rule?
More closely related to your question:
Once again: strict aliasing rule and char*
Quoting the answer given here: What is the strict aliasing rule?
[...] dereferencing a pointer that aliases another of an incompatible type is undefined behavior. Unfortunately, you can still code this way, maybe get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.
Also related to your question: Once again: strict aliasing rule and char*
Both C and C++ allow accessing any object type via char * (or specifically, an lvalue of type char). They do not allow accessing a char object via an arbitrary type. So yes, the rule is a "one way" rule.
(I must give credit for this second link to #Let_Me_Be)
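If the goal is to build an int out of bytes you only ever wrote as chars, a well-defined way around the one-way rule is to copy the bytes with std::memcpy instead of dereferencing a casted pointer. A minimal sketch (still byte-order dependent, but no aliasing violation):

#include <cstring>
#include <iostream>

int main() {
    char buf[sizeof(int)] = {'A'};           // 'A' followed by zero bytes, one int's worth in total
    int value = 0;
    std::memcpy(&value, buf, sizeof value);  // copies the object representation; no aliasing UB
    std::cout << value << '\n';              // 65 on a little-endian machine; byte-order dependent
}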
Here when you are doing:
cout << *static_cast<int*>(p);
you are actually saying that p is pointing to an integer (represented by 4 bytes in memory), but you only wrote a char into it before (represented by 1 byte in memory), so when you cast it to an integer you pick up 3 garbage bytes.
But if you cast it back to a char you will get your 'A' because you are slicing your int to a char:
cout << (char) *static_cast<int*>(p);
Otherwise, if you just want the ASCII value, cast your void* to a char* (so when you dereference it you are only accessing 1 byte) and cast what is inside it to int.
char a = 'A';
void *p=&a;
cout << static_cast<int>(*((char*)p));
The point is that static_cast understands that you want to convert a char to an int (and get its ASCII value), but when asked to go from a char* to an int* it just changes the number of bytes read when you dereference the pointer.
According to the Standard, reading a char object through a pointer to int is undefined behavior, and therefore any result is allowed. Most compilers will try to do what makes sense, so the following is a likely explanation for the behavior you are seeing on your specific architecture:
Assuming a 32 bit int, an int is the same size as 4 chars
Different architectures will treat those four bytes differently to translate their value to an int, most commonly this is either little endian or big endian
Looking at:
[Byte1][Byte2][Byte3][Byte4]
The int value would either be:
(Little Endian) Byte1 + Byte2*256 + Byte3*256^2 + Byte4*256^3
(Big Endian)    Byte4 + Byte3*256 + Byte2*256^2 + Byte1*256^3
In your case either Byte1 or Byte4 is being set; the remaining bytes are whatever happens to be in memory, since you are only reserving one byte where you need 4.
Try the following:
#include <iostream>
using namespace std;
int main(){
char a[4]={'A', 0, 0, 0};
void *p=a;
cout << *static_cast<int*>(p);
}
You may have to switch the initialization to {0,0,0, 'A'} to get what you want based on architecture
As noted, this is undefined behavior, but should work with most compilers and give you a better idea of what is going on under the hood
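If you'd rather not guess which initialization your machine needs, a quick way to check the byte order at runtime (a sketch, using memcpy so it stays well-defined) is:

#include <cstring>
#include <iostream>

int main() {
    // Determine byte order by inspecting the first (lowest-addressed) byte of a known value.
    unsigned int probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    if (first_byte == 1)
        std::cout << "little-endian: use {'A', 0, 0, 0}\n";
    else
        std::cout << "big-endian: use {0, 0, 0, 'A'}\n";
}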
Consider the following code:
#include <iostream>
#include <iomanip>
using namespace std;
int main(){
{
int a=65;
cout << hex << static_cast<int>(a) << "\n";
void *p=&a;
cout << hex << setfill('0') << setw(2 * sizeof(int)) << *static_cast<int*>(p) << "\n";
}
{
char a='A';
cout << hex << static_cast<int>(a) << "\n";
void *p=&a;
cout << hex << *static_cast<int*>(p) << "\n";
}
}
The 'A' character code (0x41) is indeed in the output, but it's padded to the size of an int with uninitialized values. You can see this when you output the hexadecimal values of the variables.
In Visual Studio, it seems like pointers to member variables are 32-bit signed integers behind the scenes (even in 64-bit mode), and a null pointer is -1 in that context. So if I have a class like:
#include <iostream>
#include <climits>
struct Foo
{
char arr1[INT_MAX];
char arr2[INT_MAX];
char ch1;
char ch2;
};
int main()
{
auto p = &Foo::ch2;
std::cout << (p?"Not null":"null") << '\n';
}
It compiles, and prints "null". So, am I causing some kind of undefined behavior, or was the compiler supposed to reject this code and this is a bug in the compiler?
Edit:
It appears that I can keep the "2 INT_MAX arrays plus 2 chars" pattern, and only in that case the compiler allows me to add as many members as I wish, and the second character is always considered to be null. See demo. If I change the pattern slightly (like 1 or 3 chars instead of 2 at some point), it complains that the class is too large.
The size limit of an object is implementation defined, per Annex B of the standard [1]. Your struct is of an absurd size.
If the struct is:
struct Foo
{
char arr1[INT_MAX];
//char arr2[INT_MAX];
char ch1;
char ch2;
};
... the size of your struct in a relatively recent version of 64-bit MSVC appears to be around 2147483649 bytes. If you then add in arr2, suddenly sizeof will tell you that Foo is of size 1.
The C++ standard (Annex B) states that the compiler must document limitations, which MSVC does [2]. It states that it follows the recommended limit. Annex B, Section 2.17 provides a recommended limit of 262144(?) for the size of an object. While it's clear that MSVC can handle more than that, it documents that it follows that minimum recommendation so I'd assume you should take care when your object size is more than that.
[1] http://eel.is/c++draft/implimits
[2] https://learn.microsoft.com/en-us/cpp/cpp/compiler-limits?view=vs-2019
It's clearly a collision between an optimization on pointer-to-member representation (use only 4 bytes of storage when no virtual bases are present) and the pigeonhole principle.
For a type X containing N subobjects of type char, there are N+1 possible valid pointer-to-members of type char X::*... one for each subobject, and one for null-pointer-to-member.
This works when there are at least N+1 distinct values in the pointer-to-member representation, which for a 4-byte representation implies that N+1 <= 2^32 and therefore the maximum object size is 2^32 - 1.
Unfortunately the compiler in question made the maximum object-type size (before it rejects the program) equal to 2^32, which is one too large and creates a pigeonhole problem -- at least one pair of pointer-to-members must be indistinguishable. It's not necessary that the null pointer-to-member be one half of this pair, but as you've observed, in this implementation it is.
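To see how much storage your compiler actually uses for such a pointer-to-member (and hence how many distinct values it has to play with), you can print its size. This is just a sketch, and the numbers are ABI-specific; the 4-byte figure discussed above is specific to MSVC's representation for classes without virtual bases:

#include <iostream>

struct Plain { char a; char b; };           // no virtual bases
struct WithVirtualBase : virtual Plain {};  // a virtual base may enlarge the representation

int main() {
    std::cout << sizeof(char Plain::*) << '\n';
    std::cout << sizeof(char WithVirtualBase::*) << '\n';
}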
The expression &Foo::ch2 is of type char Foo::*, which is a pointer to member of class Foo. By the rules, a pointer-to-member converted to bool should evaluate to false ONLY if it is a null pointer, i.e. it had nullptr assigned to it.
The fault here appears to be an implementation flaw. E.g. on gcc with -march=x86-64, any assigned pointer-to-member evaluates to non-null (1) unless nullptr was assigned to it. With the following code:
#include <climits>
#include <cstddef>
#include <iostream>
struct foo
{
char arr1[LLONG_MAX];
char arr2[LLONG_MAX];
char ch1;
char ch2;
};
int main()
{
char foo::* p1 = &foo::ch1;
char foo::* p2 = &foo::ch2;
std::cout << (p1?"Not null ":"null ") << '\n';
std::cout << (p2?"Not null ":"null ") << '\n';
std::cout << LLONG_MAX + LLONG_MAX << '\n';
std::cout << ULLONG_MAX << '\n';
std::cout << offsetof(foo, ch1) << '\n';
}
Output:
Not null
null
-2
18446744073709551615
18446744073709551614
Likely it's related to the fact that the class size exceeds platform limitations, leading to the offset of the member wrapping around and colliding with the internal value used for nullptr. The compiler doesn't detect it because it becomes a victim of... integer overflow with a signed value, and it's the programmer's fault for causing UB within the compiler by using signed literals as array sizes: LLONG_MAX + LLONG_MAX = -2 would be the "size" of the two arrays combined.
Essentially the size of the first two members is calculated as negative, and the offset of ch1 is -2, represented as the unsigned value 18446744073709551614.
And -2 is not the null representation, therefore the pointer is not null. Another compiler may clamp the value to 0, producing a nullptr, or actually detect the existing problem, as clang does.
If the offset of ch1 is -2, is the offset of ch2 then -1? Let's add this:
std::cout << static_cast<long long>(offsetof(foo, ch1)) << '\n';
std::cout << static_cast<long long>(offsetof(foo, ch2)) << '\n';
Additional output:
-2
-1
And the offset of the first member is obviously 0, so if the pointer represents an offset, it needs another value to represent nullptr. It's logical to assume that this particular compiler considers only -1 to be the null value, which may or may not be the case for other implementations.
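If you want to see what bit pattern a given implementation actually uses, one approach (a sketch; the size and the printed bytes are entirely implementation-specific) is to memcpy the pointer-to-member's object representation into a byte array and print it:

#include <cstring>
#include <iomanip>
#include <iostream>

struct Small { char ch1; char ch2; };

// Dump the object representation of a pointer-to-member, byte by byte.
// The size and the bit pattern are entirely implementation-specific.
template <typename T>
void dump(const char* label, const T& pm) {
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &pm, sizeof(T));
    std::cout << label << ' ';
    for (unsigned char b : bytes)
        std::cout << std::hex << std::setw(2) << std::setfill('0')
                  << static_cast<int>(b) << ' ';
    std::cout << '\n';
}

int main() {
    char Small::* null_pm = nullptr;
    dump("null:", null_pm);      // all 0xff bytes if null is represented as -1
    dump("&ch1:", &Small::ch1);  // an encoding of ch1's offset
    dump("&ch2:", &Small::ch2);
}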
When I test the code, VS reports "'Foo': the class is too large."
When I add char arr3[INT_MAX], Visual Studio reports error C2089: "'Foo': 'struct' too large". The Microsoft Docs explain it as "The specified structure or union exceeds the 4GB limit."
So I have this so far:
#include <iostream>
#include <string>
#include <typeinfo>
using namespace std;
int main ()
{
float f = 3.45; // just an example fp#
char* ptr = (char*)&f; // a character pointer to the first byte of the fp#?
cout << int(ptr[0]) << endl; // these lines are just to see if I get what I
cout << int(ptr[1]) << endl; // am looking for... I want ints that I can
cout << int(ptr[2]) << endl; // otherwise manipulate.
cout << int(ptr[3]) << endl;
}
the result is:
-51
-52
92
64
So obviously -51 and -52 are not in the byte range that I would expect for a char... I have taken information from similar questions to arrive at this code, and from all the discussions, a conversion from char to int is straightforward. So why the negative values? I am trying to look at a four-byte number, so I would expect 4 integers, each in the range 0-255.
I am using Code::Blocks 13.12 with gcc 4.8.1 with option -std=c++11 on a Windows 8.1 device.
EDIT:
So the solution was to use:
unsigned char* ptr = reinterpret_cast<unsigned char*>( &f );
Thank you for all the responses.
Use unsigned char in order to get (guaranteed) unsigned values.
You're getting negative values because with your compiler and compiler options (yes, that matters) char is a signed type.
By the way, the prefix + operator is a handy, concise way to promote a char to int, for the purpose of displaying the numerical value.
Also, by the way, it's generally a good idea to use the C++ named casts (reinterpret_cast, const_cast, static_cast and dynamic_cast) instead of C style casts where pointers are concerned. That's because a C style cast can end up doing something unexpected, especially when the code is maintained and types change. In this case you are expressing a reinterpret_cast, so that's the one to use – just for good habit's sake.
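Putting those two suggestions together, a minimal sketch of the corrected program (reinterpret_cast to unsigned char* for the byte values, prefix + for printing) might look like this:

#include <cstddef>
#include <iostream>

int main() {
    float f = 3.45f;  // just an example fp#
    // unsigned char guarantees each byte prints in the 0-255 range.
    const unsigned char* ptr = reinterpret_cast<const unsigned char*>(&f);
    for (std::size_t i = 0; i < sizeof f; ++i)
        std::cout << +ptr[i] << '\n';  // prefix + promotes the byte to int for printing
}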
I encountered the following line in a OpenGL tutorial and I wanna know what does the *(int*) mean and what is its value
if ( *(int*)&(header[0x1E])!=0 )
Let's take this a step at a time:
header[0x1E]
header must be an array of some kind, and here we are getting a reference to the 0x1Eth element in the array.
&(header[0x1E])
We take the address of that element.
(int*)&(header[0x1E])
We cast that address to a pointer-to-int.
*(int*)&(header[0x1E])
We dereference that pointer-to-int, yielding an int: the first sizeof(int) bytes of header, starting at offset 0x1E, are interpreted as an int, and that value is read.
if ( *(int*)&(header[0x1E])!=0 )
It compares that resulting value to 0 and if it isn't 0, executes whatever is in the body of the if statement.
Note that this is potentially very dangerous. Consider what would happen if header were declared as:
double header [0xFF];
...or as:
int header [5];
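If header really is a raw byte buffer (say unsigned char header[...]) and the int at offset 0x1E is what you want, here is a sketch of a safer read that avoids the alignment problem by copying the bytes instead of dereferencing a casted pointer; read_int_at_0x1E is a hypothetical helper name, and the offset and buffer size are assumptions taken from the snippet above:

#include <cstring>

// Sketch: read an int's worth of bytes starting at offset 0x1E of a byte buffer.
// Assumes the buffer holds at least 0x1E + sizeof(int) bytes; still byte-order dependent.
int read_int_at_0x1E(const unsigned char* header) {
    int value;
    std::memcpy(&value, header + 0x1E, sizeof value);
    return value;
}

The check then becomes if (read_int_at_0x1E(header) != 0).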
It's truly a terrible piece of code, but what it's doing is:
&(header[0x1E])
takes the address of the (0x1E + 1)th element of array header, let's call it addr:
(int *)addr
C-style cast this address into a pointer to an int, let's call this pointer p:
*p
dereferences this memory location as an int.
Assuming header is an array of bytes, and the original code has been tested only on intel, it's equivalent to:
header[0x1E] + (header[0x1F] << 8) + (header[0x20] << 16) + (header[0x21] << 24);
However, besides the potential alignment issues the other posters mentioned, it has at least two more portability problems:
on a platform with 64-bit ints, it will make an int out of bytes 0x1E to 0x25 instead of the above; it will also be wrong on a platform with 16-bit ints, but I suppose those are too old to matter
on a big endian platform the number will be wrong, because the bytes will get reversed and it will end up as:
(header[0x1E] << 24) + (header[0x1F] << 16) + (header[0x20] << 8) + header[0x21];
Also, if it's a bmp file header as rici assumed, the field is probably unsigned and the cast is done to a signed int. In this case it doesn't matter as it's being compared to zero, but in some other case it may.
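A fully portable alternative, given that the file field is a 32-bit little-endian value, is to assemble it from the individual bytes with shifts. A sketch, where read_le32 is a hypothetical helper and header is assumed to expose unsigned byte values:

#include <cstdint>

// Assemble a 32-bit little-endian field from four consecutive bytes,
// independent of the host's endianness and int width.
std::uint32_t read_le32(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
          | (static_cast<std::uint32_t>(p[1]) << 8)
          | (static_cast<std::uint32_t>(p[2]) << 16)
          | (static_cast<std::uint32_t>(p[3]) << 24);
}

The test then becomes if (read_le32(&header[0x1E]) != 0), regardless of the host's byte order.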
I have the hex value 0x48656c6c6f for which every byte represents the ASCII value of each character in the string "Hello". I also have the a char array that I want to insert these values into.
When I had a hex value that was smaller (for example, 0x48656c6c, which represents "Hell"), printing out the char array gave the correct output. But the following code prints "olle" (in little-endian) but not "olleH". Why is this?
#include <iostream>
#include <cstring>
int main()
{
char x[6] = {0};
int y = 0x48656c6c6f;
std::memcpy(x, &y, sizeof y);
for (char c : x)
std::cout << c;
}
Demo is here.
Probably int is 32 bit on your machine, which means that the upper byte of your constant is cut; so, your int y = 0x48656c6c6f; is actually int y = 0x656c6c6f; (by the way, converting an out-of-range value to a signed type gives an implementation-defined result; to have fully predictable behavior here you should use an unsigned type).
So, on a little endian machine the in-memory representation of y is 6f 6c 6c 65, which is copied to x, resulting in the "olle" you see.
To "fix" the problem, you should use a bigger-sized integer, which, depending on your platform, may be long long, int64_t or similar stuff. In such a case, be sure to make x big enough (char x[sizeof(y)+1]={0}) to avoid buffer overflows or to change the memcpy to copy only the bytes that fit in x.
Also, always use unsigned integers when doing these kinds of tricks - unsigned arithmetic wraps around predictably instead of overflowing into implementation-defined or undefined territory.
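Put together, a corrected sketch along those lines (using std::uint64_t so all five bytes of 0x48656c6c6f fit, and sizing x to match) might look like this:

#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    char x[sizeof(std::uint64_t) + 1] = {0};  // room for every byte of y, plus a trailing 0
    std::uint64_t y = 0x48656c6c6fULL;        // all five bytes of "Hello" now fit
    std::memcpy(x, &y, sizeof y);
    for (char c : x)
        std::cout << c;                       // prints "olleH" on a little-endian machine
    std::cout << '\n';
}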
Probably int is four bytes on your platform.
ideone does show a warning, if there is also an error:
http://ideone.com/TSmDk5
prog.cpp: In function ‘int main()’:
prog.cpp:7:13: warning: overflow in implicit constant conversion [-Woverflow]
prog.cpp:12:5: error: ‘error’ was not declared in this scope
int y = 0x48656c6c6f;
int is not guaranteed to be able to store it; it is probably 32 bits on your machine. Use long long instead.
It is because an int is only 4 bytes on your platform and the H is being cut out when you provide a literal that is larger than that.
#include <stdio.h>
union node {
int i;
char c[2];
};
int main() {
union node n;
n.c[0] = 0;
n.c[1] = 2;
printf("%d\n", n.i);
return 0;
}
I think it should output 512, because the c[0] value is stored in the first byte and the c[1] value is stored in the second byte, but it gives 1965097472. Why?
I compiled this program in Code::Blocks on Windows.
Your union allocates four bytes, starting off as:
[????] [????] [????] [????]
You set the two least significant bytes:
[????] [????] [0x02] [0x00]
You then print out all four bytes as an integer. You're not necessarily going to get 512, because anything can be in the two most significant bytes. In this case, you had:
[0x75] [0x21] [0x02] [0x00]
Because of undefined behavior. Accessing a union member that wasn't set does that, simple as that. It can do anything, print anything, or even crash.
Undefined behavior is, well... undefined.
We can try to answer why a specific result was given (and the other answers do that by guessing compiler implementation details), but we cannot say why another result was not given. For all that we know, the compiler could have printed 0, formatted your hard drive, set your house on fire or transferred 100,000,000 USD to your bank account.
The int is compiled as a 32-bit number, little-endian. By setting the two lower bytes to 0 and 2 and then reading the whole int, you get 1965097472. If you look at the hexadecimal representation, 0x75210200, you see your bytes again. Besides that, this is undefined behaviour, and part of the result depends on the memory architecture of the platform the program is running on.
Note that your int is likely to be at least 4 bytes (not 2, like it was in the good ol' days). To let the sizes match, change the type of i to uint16_t.
Even after this, the standard does not really permit setting one union member, and then accessing a different one in an attempt to reinterpret the bytes. However, you could get the same effect with a reinterpret_cast.
#include <cstdint>
#include <iostream>
union node {
uint16_t i;
uint8_t c[2];
};
int main() {
union node n;
n.c[0] = 0;
n.c[1] = 2;
std::cout << *reinterpret_cast<uint16_t *>(&n) << std::endl;
return 0;
}
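For completeness, here is a sketch that gets the 512 without reading an inactive union member at all: copy the two bytes into the integer with std::memcpy, which is well-defined in both C and C++ (the result still depends on byte order):

#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    std::uint8_t c[2] = {0, 2};
    std::uint16_t i = 0;
    std::memcpy(&i, c, sizeof i);  // copy the bytes; no union punning needed
    std::cout << i << std::endl;   // 512 on little-endian, 2 on big-endian
    return 0;
}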