c++ pointer on 64 bit machine - c++

I am using c++ under 64 bit linux, the compiler (g++) is also 64 bit. When I print the address of some variable, for example an integer, it is supposed to print a 64 bit integer, but in fact it prints a 48 bit integer.
int i;
cout << &i << endl;
output: 0x7fff44a09a7c
I am wondering where the other two bytes are. Looking forward to your help.
Thanks.

The printing of addresses in most C++ implementations suppresses leading zeroes to make things more readable. Stuff like 0x00000000000013fd does not really add value.
As for why you will normally not see values wider than 48 bits in userspace: current AMD64 implementations typically provide only 48 bits of virtual address space (as can be seen in the "address sizes" line of cat /proc/cpuinfo on Linux).

They are there - they haven't gone anywhere - it's just the formatting in the stream. It skips leading zeros (check out fill and width properties of stream).
EDIT: on second thoughts, I don't think there is a nice way of changing the formatting for the default operator<< for pointers. The fill and width attributes can be changed if you are streaming out using the std::hex manipulator.
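If you want the leading zeros back, one option is to convert the pointer to an integer and format that with std::hex, std::setw and std::setfill. This is only a minimal sketch: formatPointer is a made-up helper name, and the width of two hex digits per pointer byte is my assumption about what "full width" should mean.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats a pointer as "0x" followed by a fixed number
// of hex digits (two per byte of a pointer), keeping the leading zeros.
std::string formatPointer(const void* p) {
    std::ostringstream out;
    out << "0x" << std::hex << std::setfill('0')
        << std::setw(static_cast<int>(2 * sizeof(void*)))
        << reinterpret_cast<std::uintptr_t>(p);
    return out.str();
}
```

With this, std::cout << formatPointer(&i) should print all 16 hex digits on a 64-bit build, including the zero upper bytes.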

For fun you could use the C output and see if it's more like what you're after:
printf("%p\n", (void*)&i);

Related

Viewing double bit pattern in Visual Studio C++ Debugger

I'm working with IEEE-754 doubles, and I'd like to verify that the bit patterns match between different platforms. For this reason I would like to see the bit pattern of a double in the Visual Studio C++ Debugger.
I've tried format specifiers, but they don't seem to allow me to format a double as anything which would allow me to see the bit pattern.
One way I finally found was to use Memory View and enter the address of the variable (&x) in the address field. This allows me to set for instance 8-bit integer hex display, which gives me what I need. But is there any other more convenient way of formatting a double this way in the debugger?
To view the exact binary floating-point value, you should print it as hexadecimal with %a/%A or std::hexfloat instead of examining its bit pattern:
printf("Hexadecimal: %a %A\n", 1.5, 1.5);
std::cout << std::hexfloat << 1.5 << '\n';
However, if you really need to view the actual bit pattern, you just need to reinterpret the underlying memory as an integer, like auto bits = *reinterpret_cast<uint64_t*>(&doubleValue). You don't need to open the Memory View for this; a simple cast works in the Watch window. So to get the bit pattern of a double or a float, use *(__int64*)&doubleValue,x and *(int*)&floatValue,x respectively. This technically violates strict aliasing, but you don't need to care about that in the MSVC debugger.
Note that __int64 is an MSVC-specific built-in type, so you might want to use long long instead. Typedefs and macros like uint64_t won't work in the Watch window.
Alternatively, you can access the bytes separately by casting to char* and printing as an array with (char*)&doubleValue, 8 or (char*)&floatValue, 4 or (char*)&floatingPoint, [sizeof floatingPoint]. This time there is no strict aliasing violation, but the output may be less readable.
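Outside the debugger, if you want to compare bit patterns between platforms in code, copying the bytes with memcpy avoids the aliasing question entirely. A minimal sketch, assuming double is a 64-bit IEEE-754 type (doubleBits is just an illustrative name):

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

// Returns the raw bit pattern of a double as a 64-bit unsigned integer.
// memcpy copies the object representation without violating strict aliasing.
std::uint64_t doubleBits(double value) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

For example, doubleBits(1.5) gives 0x3FF8000000000000 on an IEEE-754 platform, which you can print in hex and compare across machines.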

C++ portability issue with the function put in fstream.h when passing a numerical value

I have a question about put and the related output functions in fstream. Can I be sure my code is portable when I simply write something like this:
#include <fstream>
#include <iostream>
using namespace std;
typedef unsigned char u8;
int main()
{
    fstream f;
    u8 ch;
    f.open("deneme.txt", ios::out | ios::binary);
    f.put(129);
    f.close();
    return 0;
}
When I pass 128 to put (which takes a char parameter) I get a € in the file, but for 129 nothing shows up in the text file: I can't see or select anything. Although the cursor appears at the beginning of the file, Notepad's row/column indicator interestingly says 1,2.
So there is something there, but it is not visible. As far as I remember, the tutorials showed the same thing. Can I write values between 0 and 255 without portability issues, so that they are written correctly in binary form on all platforms (compilers, operating systems, etc.)? My concern is the char type, whose range of values can change from platform to platform. Is there such a portability issue with put, or do I have to worry about it in the long run?
OK, I now know what to do about this. Thanks to everybody who tried to help me.
In fact, there is no issue at all. I don't know which OS you use, but on a *nix-like OS it is very simple to check that you get what you want. Look at the size of the file; I assure you it will be 1 byte. And if you open it in a hex editor you will see a byte with the value 0x81, or 129.
About editors: some modern editors may think that this 129 byte is the beginning of a UTF-8 sequence that is at least two bytes long, and show wrong results. Other, less modern editors may think that this is some 8-bit local encoding, but that encoding may not define character 129, or the font used by the editor may not contain such a glyph. These are problems of the editors, not of your program.
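If you want to convince yourself without a hex editor, you can write every value from 0 to 255 and check what comes back. The sketch below uses a std::ostringstream instead of a file so it is easy to check; putByte and allByteValues are made-up helper names. With a real file you would pass an ofstream opened with ios::binary, as in the question. The cast from unsigned char to char is the only place where the signedness of char matters, and on the usual two's-complement platforms it keeps the byte unchanged.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Writes one raw byte (0..255), whether or not plain char is signed.
void putByte(std::ostream& out, unsigned char value) {
    out.put(static_cast<char>(value));
}

// Hypothetical check helper: returns the bytes produced for values 0..255.
std::string allByteValues() {
    std::ostringstream out;
    for (int v = 0; v <= 255; ++v) {
        putByte(out, static_cast<unsigned char>(v));
    }
    return out.str();
}
```

Reading the bytes back as unsigned char should give 0, 1, ..., 255 in order, including 128 and 129.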

On Linux, in C/C++, will a pointer ever have the MSB set?

I want to use a long integer that will be interpreted as a number when the MSB is set otherwise it will be interpreted as a pointer. So would this work or would I run into problems in either C or C++?
This is on a 64-bit system.
Edited for clarity and a better description.
On x86-64, any pointer whose address needs more than 47 bits WILL have bit 63 set, since all the bits above the "max number of bits supported by the architecture" (which is currently 48) must have the same value as the most significant implemented bit. (That is, the next valid address above 0000 7FFF FFFF FFFF is FFFF 8000 0000 0000; everything in between is "invalid" as a pointer.)
That may well be addresses ONLY used by the kernel, but I'm not sure it's guaranteed to be.
However, I would try to avoid using tricks like this - it's likely to come back and haunt you at some point.
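To make the rule about the upper bits concrete, here is a small check for whether a 64-bit value is a valid ("canonical") address. It is only a sketch and assumes 48 implemented address bits; isCanonical48 is an illustrative name, not an existing API.

```cpp
#include <cstdint>

// Assumption: 48 implemented virtual-address bits, so bits 63..47
// (17 bits in total) must be all zeros or all ones.
bool isCanonical48(std::uint64_t address) {
    const std::uint64_t upper = address >> 47;
    return upper == 0 || upper == 0x1FFFF;
}
```

Under that assumption 0x00007FFFFFFFFFFF and 0xFFFF800000000000 pass the check, while 0x0000800000000000 does not.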
People have tried tricks like this before.
It never works out well in the long run.
Simply don't do it.
Edit: better link - see reference to 'bit31', which was previously never returned as set. Once it could be set (over 2 gigs of RAM, gasp!) it would break naughty programs and therefore programs needed to opt into this option once this much memory became the norm as people had used trickery like this (amongst other things). And now my lovely, short and to the point answer has become too long :-)
So would this work or would I run into problems in either C or C++?
Do you have 64 bits? Do you want your code to be portable to 32 bit systems? long does not necessarily have 64 bits. Big-endian v. little-endian? (Do you know which your system is?)
Plus, hopeless confusion. Please just use an extra variable to store this information or you will have many many bugs surrounding this.
It depends on the architecture. The x86_64 architecture, for example, currently uses 48-bit addressing. That means you could use 16 bits for your own needs (a trick that is sometimes referred to as "pointer packing"). However, even the x86_64 architecture definition allows this limit to be raised in future implementations to the full 64 bits. If that happens, you may run into a situation where a lot of your code needs to be changed. So if you really must go that way, make sure your pointer packing is kept in one place that is easy to change in the future. For other architectures you have to check for yourself.
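A rough sketch of what "kept in one place" could look like, assuming 64-bit pointers and 48 address bits. All names here (kAddressBits, packPointer, unpackPointer, unpackTag) are made up for illustration, and the address width is exactly the assumption that may stop holding on future hardware.

```cpp
#include <cstdint>

static_assert(sizeof(void*) == 8, "this sketch assumes 64-bit pointers");

// Assumption: only the low 48 bits of an address carry information.
constexpr int kAddressBits = 48;
constexpr std::uint64_t kAddressMask = (std::uint64_t{1} << kAddressBits) - 1;

// Stores a 16-bit tag in the upper bits of the pointer value.
std::uint64_t packPointer(const void* p, std::uint16_t tag) {
    const std::uint64_t address =
        reinterpret_cast<std::uintptr_t>(p) & kAddressMask;
    return (static_cast<std::uint64_t>(tag) << kAddressBits) | address;
}

// Restores the pointer, copying bit 47 back into the upper 16 bits.
void* unpackPointer(std::uint64_t packed) {
    std::uint64_t address = packed & kAddressMask;
    if (address & (std::uint64_t{1} << (kAddressBits - 1))) {
        address |= ~kAddressMask;
    }
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(address));
}

// Extracts the 16-bit tag.
std::uint16_t unpackTag(std::uint64_t packed) {
    return static_cast<std::uint16_t>(packed >> kAddressBits);
}
```

If the address width ever changes, only kAddressBits and the tag type need to be revisited, which is the point of isolating the trick.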
Unless you really need the space, or you're keeping a lot of these things around, I would just use a plain union and add a tag field. If you're going to go down that route, make sure that your memory is aligned to fit your needs.
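For comparison, the plain union plus tag field could look roughly like this. The type and member names are invented for the example; it costs a few extra bytes per value but makes no assumptions about which address bits are used.

```cpp
#include <cstdint>

// Illustrative tagged union: the tag says which union member is active.
struct IntOrPointer {
    enum class Kind : std::uint8_t { Number, Pointer };

    Kind kind;
    union {
        std::int64_t number;
        void* pointer;
    };

    static IntOrPointer fromNumber(std::int64_t n) {
        IntOrPointer v;
        v.kind = Kind::Number;
        v.number = n;
        return v;
    }

    static IntOrPointer fromPointer(void* p) {
        IntOrPointer v;
        v.kind = Kind::Pointer;
        v.pointer = p;
        return v;
    }

    bool isNumber() const { return kind == Kind::Number; }
};
```

Callers check isNumber() before reading either member, so there is never any guessing from the bit pattern.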
Take a look at boost::lockfree::detail::tagged_ptr from boost.lockfree
This is a class that was introduced in Boost 1.53 (the latest release at the time of writing). It stores a pointer and an additional 16 bits in a 64-bit variable.
Don't do such tricks. If you need to distinguish integers from pointers inside some container, consider using separate bit set to indicate such flag. In C++ std::bitset could be good enough.
Reasons:
Actually nobody guarantees pointers are long unsigned or long long unsigned. If you need to store them, always apply sizeof() and void * type (if you need to remove information about pointed object).
Even on one system addresses are highly dependent on architecture.
Kernel modules could seriously change mapping logics for process so you never know what addresses you will need.
Remember that the virtual address returned to your program does not necessarily line up with the actual physical address in memory. In fact, unless you are directly manipulating pretty special memory [e.g. some forms of graphics memory], this is absolutely the case.
In this case, it's the maximum value supported by the MMU which defines the values of the pointers your program sees. For x64 I'm pretty sure it's (currently) 48 bits, but as Mats specifies above, once you've got the top bit of the 48 set, you get the 63rd bit set as well.
So taking his answer and mine: it's entirely possible to get a pointer with the 47th bit set even with a small amount of RAM, and once you do, you get the 63rd bit set.
If the "64-bit system" in question is x86_64, then yes, it will work.

Writing binary data in c++

I am in the process of building an assembler for a rather unusual machine that me and a few other people are building. This machine takes 18 bit instructions, and I am writing the assembler in C++.
I have collected all of the instructions into a vector of 32 bit unsigned integers, none of which is any larger than what can be represented with an 18 bit unsigned number.
However, there does not appear to be any way (as far as I can tell) to output such an unusual number of bits to a binary file in C++. Can anyone help me with this?
(I would also be willing to use C's stdio and File structures. However there still does not appear to be any way to output such an arbitrary amount of bits).
Thank you for your help.
Edit: It looks like I didn't specify how the instructions will be stored in memory well enough.
Instructions are contiguous in memory. Say the instructions start at location 0 in memory:
The first instruction will be at 0. The second instruction will be at 18, the third instruction will be at 36, and so on.
There are no gaps and no padding between the instructions. There can be a few superfluous 0s at the end of the program if needed.
The machine uses big endian instructions. So an instruction stored as 3 should map to: 000000000000000011
Keep an eight-bit accumulator.
Shift bits from the current instruction into the accumulator until either:
The accumulator is full; or
No bits remain of the current instruction.
Whenever the accumulator is full:
Write its contents to the file and clear it.
Whenever no bits remain of the current instruction:
Move to the next instruction.
When no instructions remain:
Shift zeros into the accumulator until it is full.
Write its contents.
End.
For n instructions, this will leave (8 - 18n mod 8) zero bits after the last instruction.
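The accumulator steps above can be sketched roughly as follows. pack18 is an illustrative name, and the sketch assumes bits are emitted most significant first, which matches the big-endian layout in the question. One deliberate difference: if the accumulator is empty when the instructions run out, this version writes no extra padding byte.

```cpp
#include <cstdint>
#include <vector>

// Packs the low 18 bits of each instruction back to back, MSB first,
// into bytes. The last byte is padded with zero bits if needed.
std::vector<std::uint8_t> pack18(const std::vector<std::uint32_t>& instructions) {
    std::vector<std::uint8_t> bytes;
    std::uint8_t accumulator = 0;
    int bitsInAccumulator = 0;

    for (std::uint32_t instruction : instructions) {
        for (int bit = 17; bit >= 0; --bit) {
            // Shift the next instruction bit into the accumulator.
            accumulator = static_cast<std::uint8_t>(
                (accumulator << 1) | ((instruction >> bit) & 1u));
            if (++bitsInAccumulator == 8) {
                bytes.push_back(accumulator);  // accumulator full: write it
                accumulator = 0;
                bitsInAccumulator = 0;
            }
        }
    }

    // No instructions remain: pad the partial byte with zeros and write it.
    if (bitsInAccumulator > 0) {
        bytes.push_back(
            static_cast<std::uint8_t>(accumulator << (8 - bitsInAccumulator)));
    }
    return bytes;
}
```

The resulting vector can then be written to a binary file in one go, for example with ofstream::write on bytes.data().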
There are a lot of ways you can achieve the same end result (I am assuming the end result is a tight packing of these 18 bits).
A simple method would be to create a bit-packer class that accepts the 32-bit words, and generates a buffer that packs the 18-bit words from each entry. The class would need to do some bit shifting, but I don't expect it to be particularly difficult. The last byte can have a few zero bits at the end if the original vector length is not a multiple of 4. Once you give all your words to this class, you can get a packed data buffer, and write it to a file.
You could maybe represent your data in a bitset and then write the bitset to a file.
Wouldn't work with fstreams write function, but there is a way that is described here...
The short answer: Your C++ program should output the 18-bit values in the format expected by your unusual machine.
We need more information, specifically, that format that your "unusual machine" expects, or more precisely, the format that your assembler should be outputting. Once you understand what the format of the output that you're generating is, the answer should be straightforward.
One possible format — I'm making things up here — is that we could take two of your 18-bit instructions:
         instruction 1       instruction 2       ...
         MSB            LSB  MSB            LSB  ...
bits →   ABCDEFGHIJKLMNOPQR  abcdefghijklmnopqr  ...
...and write them in an 8-bits/byte file thus:
KLMNOPQR CDEFGHIJ 000000AB klmnopqr cdefghij 000000ab ...
...this is basically arranging the values in "little-endian" form, with 6 zero bits padding the 18-bit values out to 24 bits.
But I'm assuming: the padding, the little-endianness, the number of bits / byte, etc. Without more information, it's hard to say if this answer is even remotely near correct, or if it is exactly what you want.
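If the padded little-endian layout just described happened to be what the machine wants, producing it could look like the sketch below. The layout itself is still the made-up one from this answer, and padTo24LittleEndian is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Emits each 18-bit instruction as three bytes, least significant byte
// first, with the top six bits of the third byte left as zero padding.
std::vector<std::uint8_t> padTo24LittleEndian(
        const std::vector<std::uint32_t>& instructions) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(instructions.size() * 3);
    for (std::uint32_t instruction : instructions) {
        const std::uint32_t value = instruction & 0x3FFFFu;  // keep 18 bits
        bytes.push_back(static_cast<std::uint8_t>(value & 0xFF));          // KLMNOPQR
        bytes.push_back(static_cast<std::uint8_t>((value >> 8) & 0xFF));   // CDEFGHIJ
        bytes.push_back(static_cast<std::uint8_t>((value >> 16) & 0x03));  // 000000AB
    }
    return bytes;
}
```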
Another possibility is a tight packing:
ABCDEFGH IJKLMNOP QRabcdef ghijklmn opqr0000
or
ABCDEFGH IJKLMNOP abcdefQR ghijklmn 0000opqr
...but I've made assumptions about where the corner cases go here.
Just output them to the file as 32 bit unsigned integers, just as you have in memory, with the endianness that you prefer.
Then, in the loader / EEPROM writer / JTAG tool, or whatever method you use to send the code to the machine, for each 32-bit word that is read, just drop the 14 most significant bits and send the remaining 18 bits to the target.
Unless, of course, you have written a FAT driver for your machine...
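The 32-bit-word approach could be sketched like this, picking big-endian byte order as an arbitrary example. Both function names are invented; toBigEndianWords would run in the assembler and keepLow18 in whatever loader you end up writing.

```cpp
#include <cstdint>
#include <vector>

// Assembler side: serializes each 32-bit word as four bytes, MSB first.
std::vector<std::uint8_t> toBigEndianWords(
        const std::vector<std::uint32_t>& words) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(words.size() * 4);
    for (std::uint32_t word : words) {
        bytes.push_back(static_cast<std::uint8_t>((word >> 24) & 0xFF));
        bytes.push_back(static_cast<std::uint8_t>((word >> 16) & 0xFF));
        bytes.push_back(static_cast<std::uint8_t>((word >> 8) & 0xFF));
        bytes.push_back(static_cast<std::uint8_t>(word & 0xFF));
    }
    return bytes;
}

// Loader side: drops the 14 most significant bits of a 32-bit word.
std::uint32_t keepLow18(std::uint32_t word) {
    return word & 0x3FFFFu;
}
```

Writing the bytes explicitly, rather than dumping the vector's memory, keeps the file format independent of the endianness of the machine running the assembler.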