memory address gets cut off after 8 digits - c++

The memory address gets cut off after 8 digits. Help!
DWORD* memoryAddress = (DWORD*)0x155221000;
It turns from 0x155221000 into 0x55221000 after the conversion.

On a 32-bit system, an address is 4 bytes long, so DWORD* memoryAddress = (DWORD*)0x155221000; is truncated by definition (it is also bad practice to use C-style casts). The compiler should give you a truncation warning, by the way.
1428295680 is simply the base-10 representation of that same truncated value, 0x55221000 (addresses are usually written in hexadecimal, but it is still the same value).
As several comments have pointed out, a DWORD is 4 bytes (it is only a coincidence that addresses here are also 4 bytes), so it would truncate your number for the same reason.
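If your target is actually 64-bit, the fix is to use a pointer-sized type for the constant instead of a 32-bit one. A minimal sketch, assuming a 64-bit build and that 0x155221000 is a meaningful address in your process (std::uintptr_t is the standard pointer-sized integer; on Windows, DWORD_PTR plays the same role):

#include <cstdint>
#include <cstdio>

int main() {
    // std::uintptr_t is wide enough to hold a pointer value, so the 9-digit
    // address is not truncated on a 64-bit build.
    std::uintptr_t raw = 0x155221000;
    auto* memoryAddress = reinterpret_cast<std::uint32_t*>(raw);

    // Only print the pointer; dereferencing it is valid only if that address
    // really is mapped in your process.
    std::printf("%p\n", static_cast<void*>(memoryAddress));
}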

Related

How do pointers in C/C++ actually store addresses?

If an int is stored in memory in 4 bytes, each of which has a unique address, which of these four addresses does a pointer to that int store?
A pointer to int (an int*) stores the address of the first byte of the integer. The size of int is known to the compiler, so it only needs to know where the integer starts.
How the bytes of the int are interpreted depends on the endianness of your machine, but that doesn't change the fact that the pointer just stores the starting address (the endianness is also known to the compiler).
Those 4 int bytes are not stored in random locations; they are consecutive. So it is enough to store the address of the first byte of the object.
Depends on the architecture. On a big-endian architecture (M68K, IBM z series), it’s usually the address of the most significant byte. On a little-endian architecture (x86), it’s usually the address of the least-significant byte:
 A    A+1  A+2  A+3    big-endian
+----+----+----+----+
|msb |    |    |lsb |
+----+----+----+----+
 A+3  A+2  A+1   A     little-endian
There may be other oddball addressing schemes I’m leaving out.
But basically it’s whatever the underlying architecture considers the “first” byte of the word.
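A quick way to see which byte a pointer actually designates is to view the object representation through unsigned char*, which is always permitted. A minimal sketch (the byte order printed depends on the machine; the value 0x11223344 is just an example):

#include <cstddef>
#include <cstdio>

int main() {
    int value = 0x11223344;
    // &value is the address of the lowest-addressed byte of the int;
    // viewing it as unsigned char* lets us print each byte individually.
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&value);

    for (std::size_t i = 0; i < sizeof value; ++i)
        std::printf("address %p holds 0x%02x\n",
                    static_cast<void*>(bytes + i),
                    static_cast<unsigned>(bytes[i]));
    // Little-endian (x86): 44 33 22 11    Big-endian: 11 22 33 44
}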
The C Standard does not specify how addresses are represented inside pointers. Yet on most current architectures, a pointer to an int stores its address as the offset in the process' memory space of the first byte of memory used to store it, more precisely the byte with the lowest address.
Note however these remarks:
the int may have more or fewer than 32 bits. The only constraint is it must have at least 15 value bits and a sign bit.
bytes may have more than 8 bits. Most current architectures use 8-bit bytes but early Unix systems had 9-bit bytes, and some DSP systems have 16-, 24- or even 32-bit bytes.
when an int is stored using multiple bytes, it is unspecified how its bits are split among these bytes. Many systems use little-endian representation where the least-significant bits are in the first byte, other systems use big-endian representation where the most significant bits and the sign bit are in the first byte. Other representations are possible but only in theory.
many systems require that the address of the int be aligned on a multiple of its size.
how pointers are stored in memory is also system-specific and unspecified. Addresses do not necessarily represent offsets in memory, real or virtual. For example, on some 64-bit CPUs a number of the pointer's upper bits can be ignored or may contain a cryptographic signature verified on the fly by the CPU (as with ARM pointer authentication). Adding one to the stored value of a pointer does not necessarily produce a valid pointer.
If an int is stored in memory in 4 bytes, each of which has a unique address, which of these four addresses does a pointer to that int store?
A pointer to int usually stores the address value of the first byte (which is stored at the lowest memory address) of the int object.
Since the size of int is known and constant for a specific implementation/architecture, and an int object is always stored in consecutive bytes (there are no gaps between them), it is clear that the following three bytes (if sizeof(int) == 4) belong to the same int object.
How the bytes of the int object are interpreted depends upon endianness.*
The first byte is usually aligned on a multiple of the data word size of the specific architecture, so that the CPU can access it most efficiently.
In a 32-bit architecture, for example, where the data word size is 4, the first byte lies on a 4-byte boundary, i.e. an address that is a multiple of 4.
sizeof(int) is not always 4, by the way (although it is common).
*Endianness determines whether the interpretation of the object starts at the most significant (the first) or least significant (the last) byte.
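The size and alignment the answer talks about can be inspected directly. A minimal sketch (the values in the comments are typical for mainstream desktop targets, not guarantees of the standard):

#include <climits>
#include <cstdio>

int main() {
    int x = 0;
    std::printf("CHAR_BIT     = %d\n", CHAR_BIT);       // commonly 8
    std::printf("sizeof(int)  = %zu\n", sizeof(int));   // commonly 4
    std::printf("alignof(int) = %zu\n", alignof(int));  // commonly 4

    // The int occupies sizeof(int) consecutive bytes starting at &x, and &x
    // is a multiple of alignof(int).
    std::printf("&x = %p\n", static_cast<void*>(&x));
}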

Difference between 24-bit addresses with 24-bit address arithmetic vs. 24-bit addresses with 16-bit address arithmetic?

I have found a note on pointer arithmetic in the C167 documentation.
There are two macros, _huge and _shuge.
A quote from the documentation:
_huge or _shuge. Huge data may be anywhere in memory and you can also
reference it using a 24 bit address. However, address arithmetic is done
using the complete address (24 bit). Shuge data may also be anywhere in
memory and you can also reference it using a 24 bit address. However,
address arithmetic is done using a 16 bit address.
So what is the difference in the usage of _huge vs _shuge?
In my understanding, pointer arithmetic uses an offset from a start address.
Example of what I understood so far:
&a[0] + 1, where one element of a is an int32: &a[0] gives me the address of the first element, so this would be equal to 0x1234211 + 32 bits (4 bytes), for example.
Is there a difference, considering the note from above, and what is the difference between _huge and _shuge?
best regards
Huge pointers were used in the (good?) old 8086 family's segmented addressing. These were 16-bit processors with a 24-bit address bus. A full address was given by a segment address (16 bits) and an offset (again 16 bits), with the following formula:
linear_address = segment * 16 + offset
The difference between two _huge addresses was computed by first converting both to 24-bit linear addresses and subtracting those, while for _shuge ones, segment and offset were subtracted separately.
Example: 0011:1236 - 0010:1234 would give 0000:0012 (18) if computed as _huge, and 0001:0002 as _shuge.
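To make the two rules concrete, here is a hedged sketch in ordinary C++ that models the subtraction described above (SegOff, toLinear, hugeDiff and shugeDiff are illustrative names, not anything from the C166/C167 toolchain):

#include <cstdint>
#include <cstdio>

// Model of a segmented 8086-style address: linear = segment * 16 + offset.
struct SegOff {
    std::uint16_t segment;
    std::uint16_t offset;
};

std::uint32_t toLinear(SegOff a) {
    return std::uint32_t{a.segment} * 16u + a.offset;
}

// _huge-style difference: convert both operands to full linear addresses first.
int hugeDiff(SegOff a, SegOff b) {
    return static_cast<int>(toLinear(a)) - static_cast<int>(toLinear(b));
}

// _shuge-style difference: only 16-bit offset arithmetic; the segment
// difference (0x0001 in the example below) is not folded into the result.
int shugeDiff(SegOff a, SegOff b) {
    return static_cast<std::int16_t>(a.offset - b.offset);
}

int main() {
    SegOff a{0x0011, 0x1236}, b{0x0010, 0x1234};
    std::printf("huge  difference: %d\n", hugeDiff(a, b));   // 18 (0x12)
    std::printf("shuge difference: %d\n", shugeDiff(a, b));  // 2
}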
It's obliquely explained on the 17th page (labeled as page 7) of this PDF: https://www.tasking.com/support/c166/c166_user_guide_v4.0.pdf
By default all __far pointer arithmetic is 14-bit. This implies that comparison of __far pointers is also done in 14-bit. For __shuge the same is true, but then with 16-bit arithmetic. This saves code significantly, but has the following implications:
• Comparing pointers to different objects is not reliable. It is only reliable when it is known that these objects are located in the same page.
• Comparing with NULL is not reliable. Objects that are located in another page at offset 0x0000 have the low 14 bits (the page offset) zero and will also be evaluated as NULL.
In other words, _shuge pointers' bits above the lowest 16 are ignored except when dereferencing them. You may also note that _shuge pointers have 16-bit alignment, meaning their lowest 4 bits are always zero and therefore only 12 bits need to be considered in comparison or subtraction.
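The NULL-comparison pitfall in that quote can be modeled in plain C++ by masking addresses down to the bits that actually take part in a __shuge-style 16-bit comparison. A hedged sketch with illustrative addresses (shugeEqual is not a real toolchain function):

#include <cstdint>
#include <cstdio>

// Model of a __shuge-style comparison: only the low 16 bits participate.
bool shugeEqual(std::uint32_t a, std::uint32_t b) {
    return (a & 0xFFFFu) == (b & 0xFFFFu);
}

int main() {
    std::uint32_t null_ptr = 0x000000;  // NULL
    std::uint32_t object   = 0x020000;  // an object at offset 0x0000 of another segment

    // A full 24-bit comparison sees two different addresses...
    std::printf("full compare : %s\n", (null_ptr == object) ? "equal" : "different");
    // ...but the 16-bit comparison treats the object as if it were NULL.
    std::printf("shuge compare: %s\n", shugeEqual(null_ptr, object) ? "equal" : "different");
}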

What is the endianness of binary literals in C++14?

I have tried searching around but have not been able to find much about binary literals and endianness. Are binary literals little-endian, big-endian or something else (such as matching the target platform)?
As an example, what is the decimal value of 0b0111? Is it 7? Platform specific? Something else? Edit: I picked a bad value of 7 since it is represented within one byte. The question has been sufficiently answered despite this fact.
Some background: Basically, I'm trying to figure out what the values of the least significant bits are, and masking them with binary literals seemed like a good way to go... but only if there is some guarantee about endianness.
Short answer: there isn't one. Write the number the way you would write it on paper.
Long answer:
Endianness is never exposed directly in the code unless you really try to get it out (such as by using pointer tricks). 0b0111 is 7; the same rules as for hex apply. Writing
int i = 0xAA77;
doesn't mean 0x77AA on some platforms because that would be absurd. Where would the extra 0s that are missing go anyway with 32-bit ints? Would they get padded on the front, then the whole thing flipped to 0x77AA0000, or would they get added after? I have no idea what someone would expect if that were the case.
The point is that C++ doesn't make any assumptions about the endianness of the machine*; if you write code using the primitives and literals it provides, the behavior will be the same from machine to machine (unless you start circumventing the type system, which you may need to do).
To address your update: the number will be the way you write it out. The bits will not be reordered or any such thing, the most significant bit is on the left and the least significant bit is on the right.
There seems to be a misunderstanding here about what endianness is. Endianness refers to how bytes are ordered in memory and how they must be interpreted. If I gave you the number "4172" and said "if this is four-thousand one-hundred seventy-two, what is the endianness?", you can't really give an answer because the question doesn't make sense. (Some argue that the largest digit on the left means big-endian, but without memory addresses the question of endianness is not answerable or relevant.) This is just a number; there are no bytes to interpret and no memory addresses. Assuming a 4-byte integer representation, the bytes that correspond to it are:
low address ----> high address
Big endian: 00 00 10 4c
Little endian: 4c 10 00 00
So, given either of those and told "this is the computer's internal representation of 4172", you could determine whether it is little- or big-endian.
Now consider your binary literal 0b0111: these 4 bits represent one nybble and (stored in a 4-byte int) can be stored as either
low ---> high
Big endian: 00 00 00 07
Little endian: 07 00 00 00
But you don't have to care, because this is handled for you: the language dictates that the literal is read from left to right, most significant digit to least significant digit.
Endianness is not about individual bits. Given that a byte is 8 bits, if I hand you 0b00000111 and say "is this little- or big-endian?", again you can't say, because you only have one byte (and no addresses). Endianness doesn't pertain to the order of bits in a byte; it refers to the ordering of entire bytes with respect to addresses (unless, of course, you have one-bit bytes).
You don't have to care about what your computer is using internally. 0b0111 just saves you the time from having to write stuff like
unsigned int mask = 7; // only keep the lowest 3 bits
by writing
unsigned int mask = 0b0111;
Without needing to comment explaining the significance of the number.
* In C++20 you can check the endianness using std::endian (in the <bit> header).
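For completeness, a minimal C++20 sketch of the std::endian check mentioned in the footnote (it only reports the target's byte order; it does not change the value of any literal):

#include <bit>
#include <iostream>

int main() {
    if constexpr (std::endian::native == std::endian::little)
        std::cout << "little-endian target\n";
    else if constexpr (std::endian::native == std::endian::big)
        std::cout << "big-endian target\n";
    else
        std::cout << "mixed-endian target\n";  // rare, but allowed by the standard

    std::cout << (0b0111 == 7) << '\n';        // prints 1 on every platform
}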
All integer literals, including binary ones, are interpreted in the same way as we normally read numbers (the leftmost digit being the most significant).
The C++ standard guarantees the same interpretation of literals without having to be concerned with the specific environment you're on. Thus, you don't have to concern yourself with endianness in this context.
Your example of 0b0111 is always equal to seven.
The C++ standard doesn't use terms of endianness in regards to number literals. Rather, it simply describes that literals have a consistent interpretation, and that the interpretation is the one you would expect.
C++ Standard - Integer Literals - 2.14.2 - paragraph 1
An integer literal is a sequence of digits that has no period or
exponent part, with optional separating single quotes that are ignored
when determining its value. An integer literal may have a prefix that
specifies its base and a suffix that specifies its type. The lexically
first digit of the sequence of digits is the most significant. A
binary integer literal (base two) begins with 0b or 0B and consists of
a sequence of binary digits. An octal integer literal (base eight)
begins with the digit 0 and consists of a sequence of octal digits.
A decimal integer literal (base ten) begins with a digit other than 0
and consists of a sequence of decimal digits. A hexadecimal integer
literal (base sixteen) begins with 0x or 0X and consists of a sequence
of hexadecimal digits, which include the decimal digits and the
letters a through f and A through F with decimal values ten through
fifteen. [Example: The number twelve can be written 12, 014, 0XC, or
0b1100. The literals 1048576, 1'048'576, 0X100000, 0x10'0000, and
0'004'000'000 all have the same value. — end example ]
Wikipedia describes what endianness is, and uses our number system as an example to understand big-endian.
The terms endian and endianness refer to the convention used to
interpret the bytes making up a data word when those bytes are stored
in computer memory.
Big-endian systems store the most significant byte of a word in the
smallest address and the least significant byte is stored in the
largest address (also see Most significant bit). Little-endian
systems, in contrast, store the least significant byte in the smallest
address.
An example on endianness is to think of how a decimal number is
written and read in place-value notation. Assuming a writing system
where numbers are written left to right, the leftmost position is
analogous to the smallest address of memory used, and rightmost
position the largest. For example, the number one hundred twenty three
is written 1 2 3, with the hundreds place left-most. Anyone who reads
this number also knows that the leftmost digit has the biggest place
value. This is an example of a big-endian convention followed in daily
life.
In this context, we are considering a digit of an integer literal to be a "byte of a word", and the word to be the literal itself. Also, the left-most character in a literal is considered to have the smallest address.
With the literal 1234, the digits one, two, three and four are the "bytes of a word", and 1234 is the "word". With the binary literal 0b0111, the digits zero, one, one and one are the "bytes of a word", and the word is 0111.
This consideration allows us to understand endianness in the context of the C++ language, and shows that integer literals are similar to "big-endian".
You're missing the distinction between endianness as written in the source code and endianness as represented in the object code. The answer for each is unsurprising: source-code literals are big-endian because that's how humans read them; in the object code they're written however the target reads them.
Since a byte is by definition the smallest unit of memory access, I don't believe it would be possible to even ascribe an endianness to any internal representation of the bits in a byte. The only way to discover endianness for larger numbers (whether intentionally or by surprise) is by accessing them from storage piecewise, and the byte is by definition the smallest accessible storage unit.
The C/C++ languages don't care about endianness of multi-byte integers. C/C++ compilers do. Compilers parse your source code and generate machine code for the specific target platform. The compiler, in general, stores integer literals the same way it stores an integer; such that the target CPU's instructions will directly support reading and writing them in memory.
The compiler takes care of the differences between target platforms so you don't have to.
The only time you need to worry about endianness is when you are sharing binary values with other systems that have a different byte ordering. Then you would read the binary data in, byte by byte, and arrange the bytes in memory in the correct order for the system your code is running on.
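A hedged sketch of that byte-by-byte approach: reassembling a 32-bit value transmitted most-significant-byte-first (network order) from raw bytes. It works the same on any host because the shifts operate on values, not on memory layout (readBigEndian32 is an illustrative name):

#include <cstdint>
#include <cstdio>

// Assemble a 32-bit value from 4 bytes stored most-significant-first.
std::uint32_t readBigEndian32(const unsigned char* p) {
    return (std::uint32_t{p[0]} << 24) |
           (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8)  |
            std::uint32_t{p[3]};
}

int main() {
    const unsigned char wire[] = {0x00, 0x00, 0x10, 0x4C};  // 4172, big-endian on the wire
    std::printf("%u\n", static_cast<unsigned>(readBigEndian32(wire)));  // 4172 on any host
}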
Endianness is implementation-defined. The standard guarantees that every object has an object representation as an array of char and unsigned char, which you can work with by calling memcpy() or memcmp(). In C++17, it is legal to reinterpret_cast a pointer or reference to any object type (not a pointer to void, pointer to a function, or nullptr) to a pointer to char, unsigned char, or std::byte, which are valid aliases for any object type.
What people mean when they talk about “endianness” is the order of bytes in that object representation. For example, if you declare unsigned char int_bytes[sizeof(int)] = {1}; and int i; then memcpy( &i, int_bytes, sizeof(i)); do you get 0x01, 0x01000000, 0x0100, 0x0100000000000000, or something else? The answer is: yes. There are real-world implementations that produce each of these results, and they all conform to the standard. The reason for this is so the compiler can use the native format of the CPU.
This comes up most often when a program needs to send or receive data over the Internet, where all the standards define that data should be transmitted in big-endian order, on a little-endian CPU like the x86. Some network libraries therefore specify whether particular arguments and fields of structures should be stored in host or network byte order.
The language lets you shoot yourself in the foot by twiddling the bits of an object representation arbitrarily, but it might get you a trap representation, which could cause undefined behavior if you try to use it later. (This could mean, for example, rewriting a virtual function table to inject arbitrary code.) The <type_traits> header has several templates to test whether it is safe to do things with an object representation. You can copy one object over another of the same type with memcpy( &dest, &src, sizeof(dest) ) if that type is_trivially_copyable. You can make a copy to correctly-aligned uninitialized memory if it is_trivially_move_constructible. You can test whether two objects of the same type are identical with memcmp( &a, &b, sizeof(a) ) and correctly hash an object by applying a hash function to the bytes in its object representation if the type has_unique_object_representations. An integral type has no trap representations, and so on. For the most part, though, if you’re doing operations on object representations where endianness matters, you’re telling the compiler to assume you know what you’re doing and your code will not be portable.
As others have mentioned, binary literals are written with the most significant digit first, like decimal, octal or hexadecimal literals. This is different from endianness and will not affect whether you need to call ntohs() on the port number from a TCP header read in from the Internet.
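A minimal sketch of the memcpy experiment described above; the hex value printed is entirely up to the implementation, which is the point:

#include <cstdio>
#include <cstring>

int main() {
    unsigned char int_bytes[sizeof(int)] = {1};  // first byte 1, the rest 0
    int i;
    std::memcpy(&i, int_bytes, sizeof i);        // copy the object representation into i

    // Typical results: 0x1 on a little-endian machine, 0x1000000 on a 32-bit
    // big-endian machine; other conforming implementations may print something else.
    std::printf("0x%x\n", static_cast<unsigned>(i));
}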
You might want to think about C or C++ or any other language as being intrinsically little-endian (think about how the bitwise operators work). If the underlying hardware is big-endian, the compiler ensures that the data is stored big-endian (ditto for other endiannesses); however, your bitwise operations work as if the data were little-endian. The thing to remember is that, as far as the language is concerned, the data is little-endian. Endianness-related problems arise when you cast the data from one type to another. As long as you don't do that, you are good.
I was questioned about the statement that "the C/C++ language is intrinsically little-endian", so I am providing an example. Many already know how this works, but here I go.
#include <cstdio>

// The same 32 bits can be viewed either as two bit-fields or as one whole value.
typedef union
{
    struct {
        int a : 1;          // where this bit lands in memory is implementation-defined
        int reserved : 31;
    } bits;
    unsigned int value;
} u;

int main()
{
    u test;
    test.bits.a = 1;
    test.bits.reserved = 0;
    printf("After bits assignment, test.value = 0x%08X\n", test.value);

    test.value = 0x00000001;
    printf("After value assignment, test.value = 0x%08X\n", test.value);
}
Output on a little endian system:
After bits assignment, test.value = 0x00000001
After value assignment, test.value = 0x00000001
Output on a big endian system:
After bits assignment, test.value = 0x80000000
After value assignment, test.value = 0x00000001
So, if you do not know the processor's endianness, where does everything come out right? In the little-endian system! Thus, I say that the C/C++ language is intrinsically little-endian.

How the bits are interpreted in memory

I'm studying the C++ programming language and I have a problem with my book (Programming: Principles and Practice Using C++). What my book says is:
the meaning of the bit in memory is completely dependent on the type used to access it. Think of it this way: computer memory doesn't know about our types, it's just memory. The bits of memory get meaning only when we decide how that memory is to be interpreted.
Can you explain to me what this means? Please do it in a simple way, because I'm only a beginner who has been learning C++ for 3 weeks.
The computer's memory only stores bits and bytes - how those values are interpreted is up to the programmer (and his programming language).
Consider, e.g., the value 01000001. If you interpret it as a number, it's 65 (e.g., in the short datatype). If you interpret it as an ASCII character (e.g., in the char datatype), it's the character 'A'.
A simple example: take the byte 01000001. It contains (as all bytes do) 8 bits. Two bits are set (have the value 1): the second and the last. The second has a corresponding decimal value of 64 within the byte, and the last has the value 1; they are interpreted as different powers of 2 by convention (in this case 2^6 and 2^0). This byte therefore has a decimal value of 64 + 1 = 65. For the byte 01000001 itself there is also an interpretation convention. For instance, it can be the number 65 or the letter 'A' (according to the ASCII table). Or the byte can be part of a piece of data with a representation larger than a single byte.
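A minimal C++ sketch of that idea, looking at the very same byte once as a number and once as a character (an ASCII execution character set is assumed, as in the answers above):

#include <iostream>

int main() {
    unsigned char byte = 0b01000001;  // the same 8 bits, i.e. 0x41

    // The bits don't change; only the type we use to interpret them does.
    std::cout << static_cast<int>(byte) << '\n';   // as a number: 65
    std::cout << static_cast<char>(byte) << '\n';  // as a character: A
}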
As a few people have noted, bits are just a way of representing information. We need to have a way to interpret the bits to derive meaning out of them. It's kind of like a dictionary. There are many different "dictionaries" out there for many different types of data. ASCII, 2s complement integers, and so forth.
C++ variables must have a type, so each is assigned to a category, like int, double, float, char, string and so forth. The data type tells the compiler how much space to allocate in memory for your variable, how to assign it a value, and how to modify it.

Difference between byte flip and byte swap

I am trying to find the difference because of the byte-flip functionality I see in the Calculator on Mac in Programmer's view.
So I wrote a program to byte-swap a value, which is what we do to go from little to big endian or the other way around, and I call that a byte swap. But when I see "byte flip" I do not understand what exactly it is and how it is different from a byte swap. I did confirm that the results are different.
For example, for an int with the value 12976128:
Byte flip gives me 198;
Byte swap gives me 50688.
I want to implement an algorithm for byte flip, since 198 is the value I want to get while reading something. Everything I find on Google says byte flip is the same as byte swap, which isn't the case for me.
Byte flip and byte swap are synonyms.
The results you see are just two different ways of swapping the bytes, depending on whether you look at the number as a 32-bit number (consisting of 4 bytes) or as the smallest number of bytes that can hold 12976128, which is 24 bits or 3 bytes.
The 4-byte swap is more usual in computer culture, because 32-bit processors are currently predominant (even 64-bit architectures still do most of their mathematics in 32-bit numbers, partly because of backward-compatible software infrastructure, partly because it is enough for many practical purposes). But the Mac Calculator seems to use the minimum-width swap, in this case a 3-byte swap.
12976128, when converted to hexadecimal, gives you 0xC60000. That's 3 bytes total; each hexadecimal digit is 4 bits, or half a byte, wide. The bytes to be swapped are 0xC6, zero, and another zero.
After the 3-byte swap: 0x0000C6 = 198
After the 4-byte swap: 0x0000C600 = 50688
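A hedged sketch of both operations for that example value (swap32 and swap24 are illustrative helper names; the masks and shifts simply reverse 4 or 3 bytes respectively):

#include <cstdint>
#include <cstdio>

// Reverse all four bytes of a 32-bit value (the usual byte swap).
std::uint32_t swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Reverse only the three low bytes, i.e. treat the value as 24 bits wide.
std::uint32_t swap24(std::uint32_t v) {
    return ((v & 0x0000FFu) << 16) |
            (v & 0x00FF00u)        |
           ((v & 0xFF0000u) >> 16);
}

int main() {
    std::uint32_t x = 12976128;  // 0x00C60000
    std::printf("3-byte flip: %u\n", static_cast<unsigned>(swap24(x)));  // 198   (0x0000C6)
    std::printf("4-byte swap: %u\n", static_cast<unsigned>(swap32(x)));  // 50688 (0x0000C600)
}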