Strange address conversion in C/C++? [closed] - c++

I wouldn't say that my C/C++ is bad, but I've come across some interesting syntax.
I have this code:
int i=7;
char* m=(char*)&i;
m[2]=9;
cout<<i;
Its output is 589831. Can someone explain in detail what is going on here?

The integer i very likely takes 4 bytes, stored with the least significant byte first (little-endian). In memory the bytes look like this:
0x07 0x00 0x00 0x00
You changed the value at index 2 so now it looks like:
0x07 0x00 0x09 0x00
If you reverse the bytes and put them back together, they make the hex value 0x00090007 which is the same as 589831 in decimal.
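A small sketch that reproduces this, assuming a little-endian machine with a 4-byte int, and also prints the individual bytes as the answer describes:

#include <cstddef>
#include <iostream>

int main()
{
    int i = 7;
    unsigned char* m = reinterpret_cast<unsigned char*>(&i);

    m[2] = 9; // overwrite the byte at offset 2

    // Print each byte of the object representation, lowest address first.
    for (std::size_t k = 0; k < sizeof(i); ++k)
        std::cout << std::hex << static_cast<int>(m[k]) << ' ';
    std::cout << '\n';

    std::cout << std::dec << i << '\n'; // prints 589831 on x86 with a 4-byte int
}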

a 4-byte integer is filled with the number 7.
the 4-byte integer is mapped onto an array of four single bytes (chars). On a little-endian architecture like x86 the least significant byte comes first in a number, so the array looks like this in memory: { 07, 00, 00, 00 }
the 3rd byte of the integer (i.e. of the byte array) is changed to 9. It now looks like this: { 07, 00, 09, 00 }
the resulting integer (hexadecimal 0x90007) is written to stdout (in decimal: 589831).
Long story short, it's an example of how you can manipulate individual bytes in a multi-byte integer.

You are casting the integer's address to a char* and then modifying the bytes through it using array notation. This step
m[2] = 9;
is the same as the pointer arithmetic
*(m+2) = 9;
that is to say, it modifies the byte at the address m + 2 bytes. Thus you have changed one of the bytes (the 3rd) of your initial integer value.
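For what it's worth, that equivalence is guaranteed by the language (E1[E2] is defined as *((E1)+(E2))); a minimal sketch:

#include <iostream>

int main()
{
    int i = 7;
    char* m = reinterpret_cast<char*>(&i);

    m[2]     = 9; // array notation
    *(m + 2) = 9; // identical effect: same byte, written with explicit pointer arithmetic

    std::cout << i << '\n'; // 589831 on a little-endian platform with a 4-byte int
}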

Here is my breakdown of what is going on, then an explanation.
// An integer on the stack, probably 4 bytes big, but we can't say that for sure.
int i=7; // Looks like 0x00000007 in memory. Endianness needs to be considered.
// Take the integer's address and reinterpret it as a pointer to its bytes (it could also be misread as a \0 terminated string).
char* m=(char*)&i; // Read as a string this would be "\x07", terminated by the 0 byte at index 1, but we can't count on that.
// Set the byte at index 2 (the third byte) to 9.
m[2]=9; // Results in i being 0x00090007 (589831 decimal) on whatever architecture you are running. Once again, can't count on it.
// Print the modified integer.
cout<<i;
This is an incredibly dangerous and stupid thing to do for three reasons...
You should not count on the endianness of your architecture. Your code may end up running on a CPU that has a different underlying representation of what an int is.
You cannot count on int to always be 4 bytes.
You now have a char* that could cause a crash if you ever perform a string operation on it. In your specific case it would print a single non-printable byte (0x07) followed by a terminating 0, but it would not take much for that integer to contain no 0 byte at all, in which case a string operation would go on reading other parts of your stack.
If you really, really, really need to do this, the traditional approach is to use a union, but this kind of bit twiddling is very error-prone and unions do little to help.
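If you must do it anyway, here is a sketch of a memcpy-based variant (one option besides a union; it sidesteps aliasing questions in C++ but is still endianness- and size-dependent):

#include <cstring>
#include <iostream>

int main()
{
    int i = 7;

    unsigned char bytes[sizeof(int)];
    std::memcpy(bytes, &i, sizeof(int)); // copy the object representation out

    bytes[2] = 9;                        // modify a single byte

    std::memcpy(&i, bytes, sizeof(int)); // copy it back into the int
    std::cout << i << '\n';              // 589831 on a little-endian system with a 4-byte int
}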

int i=7 reserves 4 bytes of memory for the integer and, depending on the CPU architecture (let's say yours is x86), produces something like this in memory:
7 0 0 0
Then a pointer m is created that points at the beginning of 7 0 0 0.
After m[2] = 9 the memory looks like:
7 0 9 0 (arrays are zero-based).
Then you print out i.

Related

memory address gets cut off after 8 digits

My memory address gets cut off after 8 digits:
DWORD* memoryAddress = (DWORD*)0x155221000;
It turns from 0x155221000 into 0x55221000 after the conversion.
On a 32-bit system, an address is 4 bytes long, so DWORD* memoryAddress = (DWORD*)0x155221000; is truncated by definition (it is also bad practice to use C-style casts). The compiler should give you a truncation warning, by the way.
1428295680 is the base-10 representation of the same truncated value (addresses are usually shown in hexadecimal, but it is still the same value).
As the comments from different people said, DWORD is 4 bytes (it is just a coincidence that addresses here are also 4 bytes), so assigning the address to a DWORD would also truncate your number, for the same reason.
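A small sketch of why the build's pointer width matters here (assuming a platform that provides std::uintptr_t, and using std::uint32_t* in place of the Windows DWORD*):

#include <cstdint>
#include <iostream>

int main()
{
    // A pointer can only hold as many bits as the platform gives it.
    std::cout << "pointer size: " << sizeof(void*) << " bytes\n"; // 4 on a 32-bit build, 8 on a 64-bit build

    // 0x155221000 needs 33 bits, so keep the raw value in a 64-bit integer first.
    std::uint64_t raw = 0x155221000ULL;

    if (sizeof(void*) >= sizeof(raw)) {
        // Safe on a 64-bit build: the whole value fits into a pointer.
        auto* memoryAddress = reinterpret_cast<std::uint32_t*>(static_cast<std::uintptr_t>(raw));
        std::cout << memoryAddress << '\n'; // prints 0x155221000
    } else {
        std::cout << "a 32-bit pointer would truncate this address\n";
    }
}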

What is the endianness of binary literals in C++14?

I have tried searching around but have not been able to find much about binary literals and endianness. Are binary literals little-endian, big-endian or something else (such as matching the target platform)?
As an example, what is the decimal value of 0b0111? Is it 7? Platform specific? Something else? Edit: I picked a bad value of 7 since it is represented within one byte. The question has been sufficiently answered despite this fact.
Some background: Basically I'm trying to figure out what the value of the least significant bits are, and masking it with binary literals seemed like a good way to go... but only if there is some guarantee about endianness.
Short answer: there isn't one. Write the number the way you would write it on paper.
Long answer:
Endianness is never exposed directly in the code unless you really try to get it out (such as with pointer tricks). 0b0111 is 7; the same rules as for hex apply. Writing
int i = 0xAA77;
doesn't mean 0x77AA on some platforms because that would be absurd. Where would the extra 0s that are missing go anyway with 32-bit ints? Would they get padded on the front, then the whole thing flipped to 0x77AA0000, or would they get added after? I have no idea what someone would expect if that were the case.
The point is that C++ doesn't make any assumptions about the endianness of the machine*; if you write code using the primitives and literals it provides, the behavior will be the same from machine to machine (unless you start circumventing the type system, which you may need to do).
To address your update: the number will be the way you write it out. The bits will not be reordered or any such thing, the most significant bit is on the left and the least significant bit is on the right.
There seems to be a misunderstanding here about what endianness is. Endianness refers to how bytes are ordered in memory and how they must be interpreted. If I gave you the number "4172" and asked "if this is four thousand one hundred seventy-two, what is the endianness?", you couldn't really answer, because the question doesn't make sense. (Some argue that the largest digit on the left means big-endian, but without memory addresses the question of endianness is not answerable or relevant.) This is just a number; there are no bytes to interpret and no memory addresses. Assuming a 4-byte integer representation, the bytes that correspond to it are:
low address ----> high address
Big endian: 00 00 10 4c
Little endian: 4c 10 00 00
so, given either of those and told "this is the computer's internal representation of 4172", you could determine whether it is little- or big-endian.
Now consider your binary literal 0b0111. These 4 bits represent one nybble and could be stored as either
low ---> high
Big endian: 00 00 00 07
Little endian: 07 00 00 00
But you don't have to care, because this is handled by the compiler and the hardware; the language dictates that the literal is read from left to right, most significant bit to least significant bit.
Endianness is not about individual bits. Given that a byte is 8 bits, if I hand you 0b00000111 and ask "is this little- or big-endian?", again you can't say, because you only have one byte (and no addresses). Endianness doesn't pertain to the order of bits in a byte; it refers to the ordering of entire bytes with respect to addresses (unless of course you have one-bit bytes).
You don't have to care about what your computer is using internally. 0b0111 just saves you from having to write stuff like
unsigned int mask = 7; // only keep the lowest 3 bits
by writing
unsigned int mask = 0b0111;
without needing a comment explaining the significance of the number.
* In C++20 you can check the endianness using std::endian.
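A minimal C++20 sketch of that check, using std::endian from the <bit> header:

#include <bit>
#include <iostream>

int main()
{
    if constexpr (std::endian::native == std::endian::little)
        std::cout << "little-endian target\n";
    else if constexpr (std::endian::native == std::endian::big)
        std::cout << "big-endian target\n";
    else
        std::cout << "mixed-endian target\n";
}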
All integer literals, including binary ones, are interpreted in the same way we normally read numbers (the left-most digit is the most significant).
The C++ standard guarantees the same interpretation of literals without having to be concerned with the specific environment you're on. Thus, you don't have to concern yourself with endianness in this context.
Your example of 0b0111 is always equal to seven.
The C++ standard doesn't use terms of endianness in regards to number literals. Rather, it simply describes that literals have a consistent interpretation, and that the interpretation is the one you would expect.
C++ Standard - Integer Literals - 2.14.2 - paragraph 1
An integer literal is a sequence of digits that has no period or
exponent part, with optional separating single quotes that are ignored
when determining its value. An integer literal may have a prefix that
specifies its base and a suffix that specifies its type. The lexically
first digit of the sequence of digits is the most significant. A
binary integer literal (base two) begins with 0b or 0B and consists of
a sequence of binary digits. An octal integer literal (base eight)
begins with the digit 0 and consists of a sequence of octal digits.
A decimal integer literal (base ten) begins with a digit other than 0
and consists of a sequence of decimal digits. A hexadecimal integer
literal (base sixteen) begins with 0x or 0X and consists of a sequence
of hexadecimal digits, which include the decimal digits and the
letters a through f and A through F with decimal values ten through
fifteen. [Example: The number twelve can be written 12, 014, 0XC, or
0b1100. The literals 1048576, 1'048'576, 0X100000, 0x10'0000, and
0'004'000'000 all have the same value. — end example ]
Wikipedia describes what endianness is, and uses our number system as an example to understand big-endian.
The terms endian and endianness refer to the convention used to
interpret the bytes making up a data word when those bytes are stored
in computer memory.
Big-endian systems store the most significant byte of a word in the
smallest address and the least significant byte is stored in the
largest address (also see Most significant bit). Little-endian
systems, in contrast, store the least significant byte in the smallest
address.
An example on endianness is to think of how a decimal number is
written and read in place-value notation. Assuming a writing system
where numbers are written left to right, the leftmost position is
analogous to the smallest address of memory used, and rightmost
position the largest. For example, the number one hundred twenty three
is written 1 2 3, with the hundreds place left-most. Anyone who reads
this number also knows that the leftmost digit has the biggest place
value. This is an example of a big-endian convention followed in daily
life.
In this context, we are considering a digit of an integer literal to be a "byte of a word", and the word to be the literal itself. Also, the left-most character in a literal is considered to have the smallest address.
With the literal 1234, the digits one, two, three and four are the "bytes of a word", and 1234 is the "word". With the binary literal 0b0111, the digits zero, one, one and one are the "bytes of a word", and the word is 0111.
This consideration allows us to understand endianness in the context of the C++ language, and shows that integer literals are similar to "big-endian".
You're missing the distinction between endianness as written in the source code and endianness as represented in the object code. The answer for each is unsurprising: source-code literals are big-endian because that's how humans read them; in object code they're written however the target reads them.
Since a byte is by definition the smallest unit of memory access I don't believe it would be possible to even ascribe an endianness to any internal representation of bits in a byte -- the only way to discover endianness for larger numbers (whether intentionally or by surprise) is by accessing them from storage piecewise, and the byte is by definition the smallest accessible storage unit.
The C/C++ languages don't care about endianness of multi-byte integers. C/C++ compilers do. Compilers parse your source code and generate machine code for the specific target platform. The compiler, in general, stores integer literals the same way it stores an integer; such that the target CPU's instructions will directly support reading and writing them in memory.
The compiler takes care of the differences between target platforms so you don't have to.
The only time you need to worry about endianness is when you are sharing binary values with other systems that have a different byte ordering. Then you would read the binary data in, byte by byte, and arrange the bytes in memory in the correct order for the system your code is running on.
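As a sketch of that byte-by-byte reassembly, here is a hypothetical read_be32 helper for a 32-bit value received in big-endian (network) order; on POSIX and Windows, ntohl serves the same purpose:

#include <cstdint>
#include <iostream>

// Assemble a 32-bit value from 4 bytes that arrived in big-endian (network) order.
// This works identically on any host because it never relies on the host's byte order.
std::uint32_t read_be32(const unsigned char* p)
{
    return (std::uint32_t(p[0]) << 24) |
           (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) <<  8) |
            std::uint32_t(p[3]);
}

int main()
{
    unsigned char wire[4] = {0x00, 0x00, 0x10, 0x4C}; // 4172 in network byte order
    std::cout << read_be32(wire) << '\n';             // prints 4172 on every host
}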
Endianness is implementation-defined. The standard guarantees that every object has an object representation as an array of char and unsigned char, which you can work with by calling memcpy() or memcmp(). In C++17, it is legal to reinterpret_cast a pointer or reference to any object type (not a pointer to void, pointer to a function, or nullptr) to a pointer to char, unsigned char, or std::byte, which are valid aliases for any object type.
What people mean when they talk about “endianness” is the order of bytes in that object representation. For example, if you declare unsigned char int_bytes[sizeof(int)] = {1}; and int i; then memcpy( &i, int_bytes, sizeof(i)); do you get 0x01, 0x01000000, 0x0100, 0x0100000000000000, or something else? The answer is: yes. There are real-world implementations that produce each of these results, and they all conform to the standard. The reason for this is so the compiler can use the native format of the CPU.
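That experiment, written out as a small sketch (the printed value depends entirely on the implementation):

#include <cstdio>
#include <cstring>

int main()
{
    unsigned char int_bytes[sizeof(int)] = {1}; // first byte is 1, the rest are 0
    int i;
    std::memcpy(&i, int_bytes, sizeof(i));

    // 0x00000001 on little-endian x86, 0x01000000 on a big-endian CPU, etc.
    std::printf("0x%08X\n", static_cast<unsigned int>(i));
}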
This comes up most often when a program needs to send or receive data over the Internet, where all the standards define that data should be transmitted in big-endian order, on a little-endian CPU like the x86. Some network libraries therefore specify whether particular arguments and fields of structures should be stored in host or network byte order.
The language lets you shoot yourself in the foot by twiddling the bits of an object representation arbitrarily, but it might get you a trap representation, which could cause undefined behavior if you try to use it later. (This could mean, for example, rewriting a virtual function table to inject arbitrary code.) The <type_traits> header has several templates to test whether it is safe to do things with an object representation. You can copy one object over another of the same type with memcpy( &dest, &src, sizeof(dest) ) if that type is_trivially_copyable. You can make a copy to correctly-aligned uninitialized memory if it is_trivially_move_constructible. You can test whether two objects of the same type are identical with memcmp( &a, &b, sizeof(a) ) and correctly hash an object by applying a hash function to the bytes in its object representation if the type has_unique_object_representations. An integral type has no trap representations, and so on. For the most part, though, if you’re doing operations on object representations where endianness matters, you’re telling the compiler to assume you know what you’re doing and your code will not be portable.
As others have mentioned, binary literals are written with the most significant digit first, like decimal, octal or hexadecimal literals. This is different from endianness and will not affect whether you need to call ntohs() on the port number from a TCP header read in from the Internet.
You might want to think of C or C++ (or any other language) as being intrinsically little-endian (think about how the bitwise operators work). If the underlying hardware is big-endian, the compiler ensures that the data is stored big-endian (and likewise for other endiannesses), yet your bitwise operations behave as if the data were little-endian. The thing to remember is that, as far as the language is concerned, the data is little-endian. Endianness-related problems arise when you cast the data from one type to another. As long as you don't do that, you are fine.
I was questioned about the statement that "the C/C++ language is intrinsically little-endian", so I am providing an example. Many people know how this works, but here I go.
#include <cstdio>

typedef union
{
    struct {
        unsigned int a:1;         // the low-order single-bit field
        unsigned int reserved:31; // the remaining 31 bits
    } bits;
    unsigned int value;
} u;

int main(void)
{
    u test;
    test.bits.a = 1;
    test.bits.reserved = 0;
    printf("After bits assignment, test.value = 0x%08X\n", test.value);

    test.value = 0x00000001;
    printf("After value assignment, test.value = 0x%08X\n", test.value);
    return 0;
}
Output on a little endian system:
After bits assignment, test.value = 0x00000001
After value assignment, test.value = 0x00000001
Output on a big endian system:
After bits assignment, test.value = 0x80000000
After value assignment, test.value = 0x00000001
So, if you do not know the processor's endianness, where does everything come out right? In the little-endian system! That is why I say the C/C++ language is intrinsically little-endian.

Storing negative number in an unsigned int [closed]

I have access to a program which I'm running which SHOULD be guessing a very low number for certain things and outputting that number (probably 0 or 1). However, 0.2% of the time, when it should be outputting 0, it outputs a number from 4,294,967,286 to 4,294,967,295. (Note: the latter is the maximum value of a 32-bit unsigned integer.)
What I GUESS is happening is that the function computes a value less than 0 (somewhere around -1 to -9) and, when it assigns that number to an unsigned int, the number wraps around to the maximum, or close to the maximum, value.
I therefore assumed the program is written in C (I do not have access to the source code) and then tested, in C in Visual Studio .NET 2012, what would happen if I assign a variety of negative numbers to an unsigned integer. Unfortunately, nothing seemed to happen; it would still output the number to the console as a negative integer. I'm wondering whether this is MSVS 2012 trying to be smart, or whether there is some other reason.
Anyway, am I correct in assuming that this is in fact what is happening, and the reason why the program outputs the maximum value of an unsigned int? Or are there other valid reasons why this might be happening?
Edit: All I want to know is whether it's valid to assume that assigning a negative number to an unsigned integer can result in the integer being set to the maximum value, i.e. 4,294,967,295. If this is IMPOSSIBLE then okay. I'm not looking for SPECIFICS on exactly why this is happening with this program, as I do not have access to the code. All I want to know is whether it's possible, and therefore a possible explanation for the results I'm getting.
In C and C++ assigning -1 to an unsigned number will give you the maximum unsigned value.
This is guaranteed by the standard, and all compilers I know of (even VC) implement this part correctly. Your C example probably has some other problem that prevents it from showing this result (I cannot say without seeing the code).
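A small sketch of that guarantee, assuming a 32-bit unsigned int as on the questioner's platform:

#include <iostream>

int main()
{
    unsigned int u = -1;     // converted modulo 2^N, as the standard requires
    std::cout << u << '\n';  // 4294967295 when unsigned int is 32 bits

    unsigned int v = -9;
    std::cout << v << '\n';  // 4294967287, inside the range the questioner observes
}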
You can think of a negative number as having its first (most significant) bit count as negative.
A 4-bit integer would be:
Binary        HEX    INT4            UINT4
(in memory)          (as decimal)    (as decimal)

0000          0x0     0               0 (UINT4_MIN)
0001          0x1     1               1
0010          0x2     2               2
0100          0x4     4               4
0111          0x7     7 (INT4_MAX)    7
1000          0x8    -8 (INT4_MIN)    8
1111          0xF    -1              15 (UINT4_MAX)
It may be that the header of a library lies to you and the value is negative.
If the library has no other means of telling you about errors this may be a deliberate error value. I have seen "nonsensical" values used in that manner before.
The error could be calculated as (UINT4_MAX - error) or always UINT4_MAX if an error occurs.
Really, without any source code this is a guessing game.
EDIT:
I expanded the illustrating table a bit.
If you want to log a number like that, you may want to log it in hexadecimal form. The hex view lets you peek at the underlying memory a bit more quickly once you are used to it.
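For instance, a small sketch of logging the same value in both forms:

#include <cstdio>

int main()
{
    unsigned int value = -1; // stand-in for the suspicious "huge" value
    std::printf("decimal: %u  hex: 0x%08X\n", value, value);
    // decimal: 4294967295  hex: 0xFFFFFFFF  (the hex form makes the wrap-around obvious)
}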

Difference between byte flip and byte swap

I am trying to understand the byte flip functionality I see in the Calculator on Mac in Programmer's view.
I wrote a program to byte-swap a value, which is what we do to go from little to big endian or the other way round, and I call that a byte swap. But when I look at byte flip, I do not understand what exactly it is or how it differs from a byte swap. I did confirm that the results are different.
For example, for an int with value 12976128:
Byte flip gives me 198;
Byte swap gives me 50688.
I want to implement an algorithm for byte flip, since 198 is the value I need when reading something. Everything I find on Google says byte flip is just another name for byte swap, which isn't the case for me.
Byte flip and byte swap are synonyms.
The results you see are just two different ways of swapping the bytes, depending on whether you look at the number as a 32-bit number (consisting of 4 bytes), or as the smallest size of number that can hold 12976128, which is 24 bits or 3 bytes.
The 4-byte swap is more usual in computing, because 32-bit processors are currently predominant (even 64-bit architectures still do much of their arithmetic on 32-bit numbers, partly because of backward-compatible software infrastructure, partly because it is enough for many practical purposes). The Mac Calculator, however, seems to use the minimum-width swap, in this case a 3-byte swap.
12976128, when converted to hexadecimal, is 0xC60000. That's 3 bytes total; each hexadecimal digit is 4 bits, or half a byte, wide. The bytes to be swapped are 0xC6, zero, and another zero.
After the 3-byte swap: 0x0000C6 = 198
After the 4-byte swap: 0x0000C600 = 50688
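A sketch of both operations for comparison (my own swap3/swap4 helpers, assuming 32-bit unsigned values; the 3-byte version appears to match the Calculator's byte flip for this input):

#include <cstdint>
#include <iostream>

// Reverse all 4 bytes of a 32-bit value.
std::uint32_t swap4(std::uint32_t v)
{
    return  (v >> 24)               | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |  (v << 24);
}

// Reverse only the low 3 bytes (24 bits), treating the value as 3 bytes wide.
std::uint32_t swap3(std::uint32_t v)
{
    return ((v >> 16) & 0x000000FFu) |
           ( v        & 0x0000FF00u) |
           ((v << 16) & 0x00FF0000u);
}

int main()
{
    std::uint32_t x = 12976128;    // 0x00C60000
    std::cout << swap3(x) << '\n'; // 198   (0x0000C6)
    std::cout << swap4(x) << '\n'; // 50688 (0x0000C600)
}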

Why in C++ the output of integer variable of value 65536 is 0 and < 65536 gives a negative integer and > 65536 value gives a positive integer? [closed]

Why, in the Turbo C++ IDE, does an integer variable hard-coded with the value 65536 print as 0, while a value less than 65536 prints as a negative integer and a value greater than 65536 prints as a positive integer?
If we initialize an integer with the hard-coded value 65536 and print it, it prints 0. If we change the value to 65535, 65534, and so on, it prints -1, -2, ...; and if we change it to 65537, 65538, and so on, it prints 1, 2, 3, ... Why is this happening?
I verified it on Turbo C++ IDE.
Kindly explain the logic and working behind this clearly as I'm a beginner.
The ancient Turbo C++ used 16-bit int.
It seems you are dealing with a 16-bit signed value (-32768 to 32767), which means the left-most bit is treated as the sign bit.
If you put 65535 (1111 1111 1111 1111) into it, it will be treated as negative, since the left-most bit is 1. The remaining bits (all ones) make it the negative value closest to zero, which is -1. The values stay negative as you count down until the left-most bit becomes 0, which happens at 32767.
If you put in 65536 (0001 0000 0000 0000 0000), only the last 16 bits are kept; they are all zeros, so the value is 0.
65538 (0001 0000 0000 0000 0010) is again cut down to its last 16 bits, and you get 2.
Note: generally speaking, you must not store values outside the type's range. If you have a 16-bit integer that can hold only -32768 to 32767, then you must not put 65535 into it.
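A sketch of the same effect on a modern compiler, using std::int16_t to stand in for Turbo C++'s 16-bit int (conversion of an out-of-range value to a signed type was implementation-defined before C++20, so treat the exact results as typical rather than guaranteed):

#include <cstdint>
#include <iostream>

int main()
{
    std::int16_t a = static_cast<std::int16_t>(65536); // low 16 bits are all zero -> 0
    std::int16_t b = static_cast<std::int16_t>(65535); // bit pattern 0xFFFF        -> -1
    std::int16_t c = static_cast<std::int16_t>(65538); // low 16 bits are 0x0002    -> 2

    std::cout << a << ' ' << b << ' ' << c << '\n';    // prints: 0 -1 2
}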
I guess wrap-around of the numbers takes place once we cross the integer limit.
So once 65536 is reached, the positive numbers start again for greater values.
The original range is -32768 to 32767. If we go to 32768, we have in fact reached -32768. So when we reach 65536, we get 0 and the positive numbers start all over again.