How to understand this unfamiliar casting? [closed] - c++

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 3 years ago.
Improve this question
#include <iostream>
int main() {
__int64 a = (__int64)"J\x10";
return 0;
}
When I run it, the result is a = 12950320.
How to understand (__int64)"J\x10"?

"J\x10" is a string literal. String literals are arrays of characters with static storage.
__int64 is presumably some type. Based on the name, we can presume that it is some implementation defined (non-standard) 1 64 bits wide signed integer type.
Expression (T)expression is an explicit type conversion colloquially called C-style cast. It performs one or combination of 2 static cast, reinterpret cast or const cast on the operand expression. In this case, the expression converts the value of the string literal expression into the type __int64.
When the value of an array (such as string literal) is used, it is implicitly converted to a pointer to the first element. This is called decaying. The value of a pointer is the memory address where the object is stored.
So, this pointer to the first character of the string literal is converted to the implementation defined integer type. There is no static cast from pointer to integer, so this is a reinterpret cast. Assuming the integer type is large enough to represent the value stored in the pointer (that'll be the case for most systems today, but is not something guaranteed by C++), this conversion maps the address value to some integer value in an implementation defined manner.
If you're still confused: That's fine; the program doesn't make much sense even after understanding what it does.
1 This means that using such type makes the program usable only on limited set of systems that support such special type.
2 It is generally recommended to avoid using C-style casts and instead use one of the specific casts that you intend to use. C-style casts often prevent the compiler from catching obvious bugs. Also, reinterpret cast and const cast should not be used unless you know exactly what it does in the context where you use it and what are the ramifications.

"J\x10" is a string (two chars here, "J" and hexa 10), which by default in C++ is considered as a const char*.
You are trying to cast that const char * to a __int64 value and store it at "a". I think this is a nasty cast.
Running this code several times will shows you that the pointer may vary from to execution to execution (it may show as the same, just due to OS cache).
Another thing to take into account is that __int64 is not a standard type, but a MS one.

Well, that is an explicit type conversion as mentioned in comments. In this case is it relies heavily on programmer knowing what she/he is doing and memory space needed for the conversion.
Let's say we have memory of 2 bytes, and value of those 2 bytes are 0:
int16_t memory = 0; // 16 bits is 2 bytes,
+-----------+-----------+
| 0000-0000 | 0000-0000 |
+-----------+-----------+
Now we can read those 2 bytes individually (byte per byte), or collectivly (as whole 2 byte value). Now lets think of it as an union, and imagine that 2 byte memory space as independant spaces.
union test{
int16_t bytes2;
int8_t bytes[2];
char chars[2];
};
So when we input an value of 20299 into that union we can do next:
union test sub;
sub.bytes2 = 20299;
std::cout << sub.bytes2 << std::endl; // 20299
std::cout << (int)sub.bytes[1] << " " << (int)sub.bytes[0] << std::endl; // 79 75
std::cout << sub.chars[1] << " " << sub.chars[0] << std::endl; // O K
Which works like this:
+-----------+-----------+
| 20 299 |
+-----------+-----------+
| 79 | 75 |
+-----------+-----------+
| 'O' | 'K' |
+-----------+-----------+
But we could do the similar if we ignore several compiler warnings:
sub.bytes2 = (int16_t)'OK'; //CygWin will issue warning about this
std::cout << sub.bytes2 << std::endl; // 20299
std::cout << (int)sub.bytes[1] << " " << (int)sub.bytes[0] << std::endl; // 79 75
std::cout << sub.chars[1] << " " << sub.chars[0] << std::endl; // O K
But if we pass an string "OK" instead of char array 'OK' we will get an error cast from 'const char*' to 'int16_t {aka short int}' loses precision [-fpermissive], and this is because std::string has an extra character \0 that isn't shown marking an end of string. So instead of passing 2 byte constant, we are passing 3 byte constant, and our compiler is issuing an error.
Among memory error that I described earlier there is a lot more going on. Strings are pointers to the memory addres similar in functionality as our union. So any string is an reference to specific memory addres with no information how many bytes does that memory addres occupy (aka length of an string). Now there are headers ( <string> and <cstring> ) that help with that issue in many ways, but that is off topic for this question.
That is an pure basis how casting works, it reads 1 memory as a different type from original integer.
In code you provided we have __int64 a = (__int64)"J\x10" , where __int64 in an 64 bit integer on Windows, and "J\x10" is string literal of specific size that holds value of those letters - but \x10 is an integer of value of 16.
union test{
uint64_t val;
char chars[8];
};
int64_t a = (int64_t)"J\x10";
union test sub;
sub.val = a;
// 4299178052
std::cout << sub.val << std::endl;
// ◦ ◦ ◦ ◦ # # D
std::cout << sub.chars[7] << " "<< sub.chars[6] << " "<< sub.chars[5] << " "<< sub.chars[4] << " " << sub.chars[3] << " " << sub.chars[2] << " "<< sub.chars[1] << " "<< sub.chars[0] << " "<< std::endl;
// 0 0 0 1 0 40 40 44
std::cout <<std::hex << (int)sub.chars[7] << " "<<std::hex << (int)sub.chars[6] << " "<<std::hex << (int)sub.chars[5] << " "<<std::hex << (int)sub.chars[4] << " "<<std::hex << (int)sub.chars[3] << " " <<std::hex << (int)sub.chars[2] << " " <<std::hex << (int)sub.chars[1] << " "<<std::hex << (int)sub.chars[0] << " "<< std::endl;
But as you can see I didn't get the same result as you, and the issues range in more than this. __int64 is an Virtual Studio only type only, and it is corresponding to the long long, how constant memory reads is prone to errors of referencing, and in general it is bad code that should be discouraged.
This kind of behaviour, especially with VS type only is not easy to understand, and it doesn't have to have same result, it can also raise numerous errors if copy/pasted that are hard to decipher if OS and IDE brands are unknown or not present in metadata. You should always use known aliases and references

Related

Does converting a pointer to int actually return the location of it in byte or something? [duplicate]

This question already has answers here:
What does "dereferencing" a pointer mean?
(6 answers)
Closed 2 years ago.
I am new to C++, but I am curious enough to dig into these strange things.
I was wondering what happens when I convert a pointer to an int and realized could they indicate something. So I wrote this program to test my ideas as pointers in the same array are close enough in terms of memory location to be compared.
This is the code that will explain my question clearly.
#include <iostream>
using namespace std;
int main() {
cout << "--------------------[ Pointers ]--------------------" << endl;
const unsigned int NSTRINGS = 9;
string strArray[NSTRINGS] = { "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine" };
string *pStartArray = &strArray[0]; // Setting pStartArray pointer location to the first block of the array.
string *pEndArray = &strArray[NSTRINGS - 1]; // Setting pEndArray pointer location to the last block of the array.
cout << "---[ pStartArray value : " << *pStartArray << endl; // Showing the value of the pStartArray pointer (Just for safety check).
cout << "---[ pEndArray value : " << *pEndArray << endl; // Showing the value of the pEndArray pointer (Just for safety check).
short int blockDifferential = pEndArray - pStartArray; // Calculating the block differential of those two pointers.
cout << "---[ Differential of the block locations that pointers are pointing to in array (pEndArray - pStartArray) : " << blockDifferential << endl;
long long pStartIntLocation = (long long)pStartArray; // Converting the memory location (Hexadecimal) of pStartArray pointer to int (Maybe it's byte, regardless of being positive or negative). What's your opinion on this?
cout << "---[ (long long) pStartArray current memory location to int : \"" << pStartIntLocation << "\"" << endl;
long long pEndIntLocation = (long long)pEndArray;
cout << "---[ (long long) pEndArray current memory location converted to int : \"" << pEndIntLocation << "\"" << endl; // Converting the memory location (Hexadecimal) of pEndArray pointer to int (Maybe it's byte, regardless of being positive or negative). What's your opinion on this?
short int locationDifferential = pEndIntLocation - pStartIntLocation; // And subtracting the integer convetred location of pEndArray from pStartArray.
cout << "---[ Differential of the memory locations converted to int ((long long)pStartArray - (long long)pEndArray) : " << locationDifferential << " (Bytes?)" << endl; // Seems like even after running the program multiple times, this number does not change. Something's fishy. Doesn't it seem like it's a random thing. It must be investigated.
cout << "---[ Size of variable <string> (According to the computer that it's running) : " << sizeof(string) << " (Bytes)" << endl; // To know how much memory does a string consume. For example mine was 40.
// Here it goes interesting. I can get the block differential of the pointers using <locationDifferential>.
cout << "---[ Differential of the cell location (AGAIN) using the <locationDifferential> that I have calculated : " << locationDifferential/sizeof(string) << endl; // So definately <locationDifferential> was in bytes. Because I got 8 again. I just wonder is it a new discovery. LOL.
/*
I might look really crazy, because I can't tell it another way. It just can't happen.
This is the last try to make it as clear as I can.
pStrArray ]--\ pEndArray ]--\
\ - 8 cell difference - \
Array = | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|-------- Differential ---------|
Cell difference : 8 string cells
String size that I considered : 32 Bytes
data (location difference) : 8 * 32 = 256
So if you see this, all this might make sense.
I am excited to see what opinions you professional programmers have come up with.
- D3F4U1T
*/
cout << "----------------------------------------------------" << endl;
return 0;
}
How does this exactly work?
pStartIntLocation, pEndIntLocation are all in bytes?
If so, why sometimes they return negative value?
This is strange.
Also correct me if I am wrong about any information I provided.
- Best regards.
- D3F4U1T.
Edit 2:
Does the value that results from the conversion from pointer to a long long mean anything? Like the memory address but with the difference that this one is in bytes?
Edit 3: Seems like this is related to virtual address space. Correct me if I am wrong. Does the OS have any mechanism to number the memory as bytes. For example: Byte 1 , Byte 2 , ....
A "pointer" is an integer quantity of some length whose contents are understood to represent a memory address. (By convention, zero means NULL ... no address.)
If you typecast it into an integer, you are simply declaring to the compiler that *"no, these however-many bits should not be treated as an address ... treat them as an integer." The content of the location does not change, only the compiler's interpretation of it.
Typecasting does not change the bits – only the momentary interpretation of what they are and what they mean.
FYI: unions are another way to do a similar thing: every element of a union overlaps the others and describes various interpretations of the same area of storage. (In the Fortran language, this was called EQUIVALENCE.)

How to avoid 0xFF prefix when converting char to short?

When I do:
cout << std::hex << (short)('\x3A') << std::endl;
cout << std::hex << (short)('\x8C') << std::endl;
I expect the following output:
3a
8c
but instead, I have:
3a
ff8c
I suppose that this is due to the way char—and more precisely a signed char—is stored in memory: everything below 0x80 would not be prefixed; the value 0x80 and above, on the other hand, would be prefixed with 0xFF.
When given a signed char, how do I get a hexadecimal representation of the actual character inside it? In other words, how do I get 0x3A for \x3A, and 0x8C for \x8C?
I don't think a conditional logic is well suited here. While I can subtract 0xFF00 from the resulting short when needed, it doesn't seem very clear.
Your output might make more sense if you looked at it in decimal instead of hexadecimal:
std::cout << std::dec << (short)('\x3A') << std::endl;
std::cout << std::dec << (short)('\x8C') << std::endl;
output:
58
-116
The values were cast to short, so we are (most commonly) dealing with 16 bit values. The 16-bit binary representation of -116 is 1111 1111 1000 1100, which becomes FF8C in hexadecimal. So the output is correct given what you requested (on systems where char is a signed type). So not so much the way the char is stored in memory, but more the way the bits are interpreted. As an unsigned value, the 8-bit pattern 1000 1100 represents -116, and the conversion to short is supposed to preserve this value, rather than preserving the bits.
Your desired output of a hexadecimal 8C corresponds (for a short) to the decimal value 140. To get this value out of 8 bits, the value has to be interpreted as an unsigned 8-bit value (since the largest signed 8-bit value is 127). So the data needs to be interpreted as an unsigned char before it gets expanded to some flavor of short. For a character literal like in the example code, this would look like the following.
std::cout << std::hex << (unsigned short)(unsigned char)('\x3A') << std::endl;
std::cout << std::hex << (unsigned short)(unsigned char)('\x8C') << std::endl;
Most likely, the real code would have variables instead of character literals. If that is the case, then rather than casting to an unsigned char, it might be more convenient to declare the variable to be of unsigned char type. Which is possibly the type you should be using anyway, based on the fact that you want to see its hexadecimal value. Not definitively, but this does suggest that the value is seen simply as a byte of data rather than as a number, and that suggests that an unsigned type is appropriate. Have you looked at std::byte?
One other nifty thought to throw out: the following also gives the desired output as a reasonable facsimile of using an unsigned char variable.
#include <iostream>
unsigned char operator "" _u (char c) { return c; } // Suffix for unsigned char literals
int main()
{
std::cout << std::hex << (unsigned short)('\x3A'_u) << std::endl;
std::cout << std::hex << (unsigned short)('\x8C'_u) << std::endl;
}
A more straightforward approach is to cast a signed char to an unsigned char. In other words, this:
cout << std::hex << (short)(unsigned char)('\x3A') << std::endl;
cout << std::hex << (short)(unsigned char)('\x8C') << std::endl;
produces the expected result:
3a
8c
Not sure this is particularly clear, though.

Initialize typedef struct by attribute problem

I defined an struct based on bytes, with size of 3 bytes. (1 packetID and 2 packetSize) I checked the size with sizeof function, and it works well:
#pragma pack(1)
typedef struct ENVIRONMENT_STRUCT{
unsigned char packetID[1];
unsigned char packetSize[2];
}
I created a variable and reserved memory like this:
ENVIRONMENT_STRUCT * environment_struct = new ENVIRONMENT_STRUCT();
For now I want to initialize environment_struct.
The problem is about I am trying to initialize this struct by attribute, just like this:
*environment_struct->packetSize = 100;
But when I checked this value, using:
std::cout << "Packet Size: " << environment_struct->packetSize << endl;
Result: Packet Size: d
Expected result: Packet Size: 100
If i will work with numbers, Should I define the struct using csdint library? For example, u_int8 and this type of variable.
When you do
ENVIRONMENT_STRUCT * environment_struct = new ENVIRONMENT_STRUCT();
you initialize packetSize to be {0, 0}. Then
*environment_struct->packetSize = 100;
turns the array into {100, 0}. Since the array is a character array when you send it to cout with
std::cout << "Packet Size: " << environment_struct->packetSize << endl;
it treats it as a c-string and prints out the string contents. Since you see d that means your system is using ascii as the character 'd' has an integer representation of 100. To see the 100 you need to cast it to an int like
std::cout << "Packet Size: " << static_cast<int>(*environment_struct->packetSize) << endl;
Do note that since packetSize is an array of two chars you can't actually assign a single value that takes up that whole space. If you want this then you need to use fixed width types like
typedef struct ENVIRONMENT_STRUCT{
uint8_t packetID; // unsigned integer that is exactly 8 bits wide. Will be a compiler error if it does not exist
uint16_t packetSize; // unsigned integer that is exactly 16 bits wide. Will be a compiler error if it does not exist
};
int main()
{
ENVIRONMENT_STRUCT * environment_struct = new ENVIRONMENT_STRUCT();
environment_struct->packetSize = 100;
std::cout << "Packet Size: " << environment_struct->packetSize << std::endl;
}
Let's first consider what *environment_struct->packetSize = 100; does. It sets the first byte of ENVIRONMENT_STRUCT::packetSize to 100. A more conventional syntax to do this is: environment_struct->packetSize[0] = 100.
There's really no way to initialize the struct in a way for the expression std::cout << environment_struct->packetSize to result in the output of 100. Let us consider what that does. environment_struct->packetSize is an array, which in this case decays to a pointer to first element. Character pointers inserted into character streams are interpreted as null terminated character strings. Luckily, you had valueinitialized the second byte of environment_struct->packetSize, so your array is indeed null terminated. The value of the first byte is interpreted as an encoded character. On your system encoding, it happens that d character is encoded as value 100.
If you wish to print the numeric value of the first byte of environment_struct->packetSize, which you had set to 100, you can use:
std::cout << "Packet Size: " << (int)environment_struct->packetSize[0] << endl;
You get this result as you tries to print a character symbol not an integer.
To fix it just cast the value or declare it as integer depending on your needs.
Cast example:
std::cout << "Packet Size: " << static_cast<int>(*environment_struct->packetSize) << std::endl;
As packetSize is declared as char-type, it's being output as a character. (ASCII code of character 'd' is 100...)
Try casting it to an integer-type:
std::cout << "Packet Size: " << (int)environment_struct->packetSize << endl;
Alternatively, since you appear to want to store the number as a 2-byte type, you could avoid such casting and simply declare packetSize as unsigned short. This will be interpreted by cout as an integer-type.

std::dec still outputting memory address to hex?

I have:
std::cout << "Start = " << std::dec << (&myObject) << std::endl;
to output an address in decimal. However, the address is still coming out in hex??
(I am outputting one of these for each of ten members, so I don't want to assign each one to a variable and then std::dec the variable separately)
The hex and dec manipulators are for integers, not pointers. Pointers are always rendered in the form that printf's %p formatter would have used on your system (which is, usually, hexadecimal notation).
This helps to emphasise the fact that pointers and numbers are distinct. You may consider it to be one of the rare cases in which number semantics and number representation are, to some degree, coupled.
The best you can do is to cast the pointer to uintptr_t before streaming it:
std::cout << "Start = " << std::dec << uintptr_t(&myObject) << std::endl;
…but please consider whether you really need to do so.

What does casting char array to long does?

The author of this code states that (long)tab returns address of the tab. Is it true? If yes, why is it so?
char tab []= "PJC"
cout << " tab = " << tab << ", address: " << (long)tab << "\n" << endl;
Yes, its true. Raw arrays in C/C++ are considered so that their name is the pointer to the first element. So, you can write:
char tab[] = "PJC";
char c = *(tab + 1); // c == J
As pointer is no more than an integer value representing the address in memory, casting pointer to long will print you the address value.
You must be sure that integer would hold all values. Pointers always matches word size, so on 32-bit CPU a pointer is 4 byte, in 64-bit it is 8 byte and you'll need 64-bit integer not to have overflow - what exact type is it depends on the system (may be long long). You can use intptr_t (thanks #Avt) to store pointer values.
Typecasting a variable changes its interpretation, but the actual value remains the same. If you were to print the value with format specifier %x then you'll always get the same result, what typecast you use won't matter.
In this case, tab is a char*, which is nothing but an "address" of the location.
You should cast to void* to get the address. Run following to check
char tab []= "PJC"
cout << " tab = " << tab << ", address1: " << (void*)tab << ", address2: " << (long)tab << "\n" << endl;
But remember that result depends of architecture!