How many bits are required to store the pointer value?

How many bits are required to store the pointer value? - c++

As far as I know, the size of the pointer on 32-bit systems is usually 4 bytes, and on 64-bit systems, 8 bytes. But as far as I know not all the bits are used to store the address. If so, is it safe to use free bits for other purposes? If so, how, and how many free bits are available on 32-bit and 64-bit systems in pointer memory space?

At the time of writing the current 64 bit Intel chips use 48 bit pointers internally.
Every C++ compiler I've come across abstracts this 48 bit pointer to a 64 bit pointer with the most significant 16 bits set to zero.
But the behaviour on using any of the free bits is undefined.
Towards the end of 32 bit chips being the norm, it was possible to have 4GB of physical memory, let alone virtual memory. All 32 bits were used for a pointer.

It is not portable to use any bits in a pointer value for a different purpose.
You can look at the documentation for your platform to see if it guarantees that any particular bits in a pointer value are available for use. It is likely that even if they are not directly involved in addressing, they are reserved for use by the platform.

Related

If pointers are just integers that hold an address (4 bytes - 32 bits), how do they store the extra addresses in 64-bit memory?

When using the sizeof() function in c++ programs, the pointers I've looked at seem to all return a size of 4 bytes. I've seen online that pointers are just integer memory addresses. How does this make sense in 64 bit architectures that would then potentially have memory addresses that cannot be accessed in 4 bytes?

Most modern 64-bit operating systems support a 32-bit mode for backward-compatibility. If sizeof(void*) == 4 with you, then you are probably targetting the 32-bit platform of your operating system, so that your program will run in 32-bit mode.
Check your compiler's documentation on how to target the 64-bit platform of your operating system. Afterwards, you should notice that sizeof(void*) == 8.

If pointers are just integers that hold an address (4 bytes - 32 bits), how do they store the extra addresses in 64-bit memory?
A 32 bit pointer cannot store 64 bit addresses. This is why pointers are 64 bits on 64 bit systems.

Encode additional information in pointer

My problem:
I need to encode additional information about an object in a pointer to the object.
What I thought I could do is use part of the pointer to do so. That is, use a few bits encode bool flags. As far as I know, the same thing is done with certain types of handles in the windows kernel.
Background:
I'm writing a small memory management system that can garbage-collect unused objects. To reduce memory consumption of object references and speed up copying, I want to use pointers with additional encoded data e.g. state of the object(alive or ready to be collected), lock bit and similar things that can be represented by a single bit.
My question:
How can I encode such information into a 64-bit pointer without actually overwriting the important bits of the pointer?
Since x64 windows has limited address space, I believe, not all 64 bits of the pointer are used, so I believe it should be possible. However, I wasn't able to find which bits windows actually uses for the pointer and which not. To clarify, this question is about usermode on 64-bit windows.
Thanks in advance.

This is heavily dependent on the architecture, OS, and compiler used, but if you know those things, you can do some things with it.
x86_64 defines a 48-bit1 byte-oriented virtual address space in the hardware, which means essentially all OSes and compilers will use that. What that means is:
the top 17 bits of all valid addresses must be all the same (all 0s or all 1s)
the bottom k bits of any 2k-byte aligned address must be all 0s
in addition, pretty much all OSes (Windows, Linux, and OSX at least) reserve the addresses with the upper bits set as kernel addresses -- all user addresses must have the upper 17 bits all 0s
So this gives you a variety of ways of packing a valid pointer into less than 64 bits, and then later reconstructing the original pointer with shift and/or mask instructions.
If you only need 3 bits and always use 8-byte aligned pointers, you can use the bottom 3 bits to encode extra info, and mask them off before using the pointer.
If you need more bits, you can shift the pointer up (left) by 16 bits, and use those lower 16 bits for information. To reconstruct the pointer, just right shift by 16.
To do shifting and masking operations on pointers, you need to cast them to intptr_t or int64_t (those will be the same type on any 64-bit implementation of C or C++)
1There's some hints that there may soon be hardware that extends this to 56 bits, so only the top 9 bits would need to be 0s or 1s, but it will be awhile before any OS supports this

Why is int being treated as a 32 bit value when compiled x86_64 [duplicate]

Why is int typically 32 bit on 64 bit compilers? When I was starting programming, I've been taught int is typically the same width as the underlying architecture. And I agree that this also makes sense, I find it logical for a unspecified width integer to be as wide as the underlying platform (unless we are talking 8 or 16 bit machines, where such a small range for int will be barely applicable).
Later on I learned int is typically 32 bit on most 64 bit platforms. So I wonder what is the reason for this. For storing data I would prefer an explicitly specified width of the data type, so this leaves generic usage for int, which doesn't offer any performance advantages, at least on my system I have the same performance for 32 and 64 bit integers. So that leaves the binary memory footprint, which would be slightly reduced, although not by a lot...

Bad choices on the part of the implementors?
Seriously, according to the standard, "Plain ints have the
natural size suggested by the architecture of the execution
environment", which does mean a 64 bit int on a 64 bit
machine. One could easily argue that anything else is
non-conformant. But in practice, the issues are more complex:
switching from 32 bit int to 64 bit int would not allow
most programs to handle large data sets or whatever (unlike the
switch from 16 bits to 32); most programs are probably
constrained by other considerations. And it would increase the
size of the data sets, and thus reduce locality and slow the
program down.
Finally (and probably most importantly), if int were 64 bits,
short would have to be either 16 bits or
32 bits, and you'ld have no way of specifying the other (except
with the typedefs in <stdint.h>, and the intent is that these
should only be used in very exceptional circumstances).
I suspect that this was the major motivation.

The history, trade-offs and decisions are explained by The Open Group at http://www.unix.org/whitepapers/64bit.html. It covers the various data models, their strengths and weaknesses and the changes made to the Unix specifications to accommodate 64-bit computing.

Because there is no advantage to a lot of software to have 64-bit integers.
Using 64-bit int's to calculate things that can be calculated in a 32-bit integer (and for many purposes, values up to 4 billion (or +/- 2 billon) are sufficient), and making them bigger will not help anything.
Using a bigger integer will however have a negative effect on how many integers sized "things" fit in the cache on the processor. So making them bigger will make calculations that involve large numbers of integers (e.g. arrays) take longer because.
The int is the natural size of the machine-word isn't something stipulated by the C++ standard. In the days when most machines where 16 or 32 bit, it made sense to make it either 16 or 32 bits, because that is a very efficient size for those machines. When it comes to 64 bit machines, that no longer "helps". So staying with 32 bit int makes more sense.
Edit:
Interestingly, when Microsoft moved to 64-bit, they didn't even make long 64-bit, because it would break too many things that relied on long being a 32-bit value (or more importantly, they had a bunch of things that relied on long being a 32-bit value in their API, where sometimes client software uses int and sometimes long, and they didn't want that to break).

ints have been 32 bits on most major architectures for so long that changing them to 64 bits will probably cause more problems than it solves.

I originally wrote this up in response to this question. While I've modified it some, it's largely the same.
To get started, it is possible to have plain ints wider than 32 bits, as the C++ draft says:
Note: Plain ints are intended to have the natural size suggested by the architecture of the execution environment; the other signed integer types are provided to meet special needs. — end note
Emphasis mine
This would ostensibly seem to say that on my 64 bit architecture (and everyone else's) a plain int should have a 64 bit size; that's a size suggested by the architecture, right? However I must assert that the natural size for even 64 bit architecture is 32 bits. The quote in the specs is mainly there for cases where 16 bit plain ints is desired--which is the minimum size the specifications allow.
The largest factor is convention, going from a 32 bit architecture with a 32 bit plain int and adapting that source for a 64 bit architecture is simply easier if you keep it 32 bits, both for the designers and their users in two different ways:
The first is that less differences across systems there are the easier is for everyone. Discrepancies between systems been only headaches for most programmer: they only serve to make it harder to run code across systems. It'll even add on to the relatively rare cases where you're not able to do it across computers with the same distribution just 32 bit and 64 bit. However, as John Kugelman pointed out, architectures have gone from a 16 bit to 32 bit plain int, going through the hassle to do so could be done again today, which ties into his next point:
The more significant component is the gap it would cause in integer sizes or a new type to be required. Because sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long) is in the actual specification, a gap is forced if the plain int is moved to 64 bits. It starts with shifting long. If a plain int is adjusted to 64 bits, the constraint that sizeof(int) <= sizeof(long) would force long to be at least 64 bits and from there there's an intrinsic gap in sizes. Since long or a plain int usually are used as a 32 bit integer and neither of them could now, we only have one more data type that could, short. Because short has a minimum of 16 bits if you simply discard that size it could become 32 bits and theoretically fill that gap, however short is intended to be optimized for space so it should be kept like that and there are use cases for small, 16 bit, integers as well. No matter how you arrange the sizes there is a loss of a width and therefore use case for an int entirely unavailable. A bigger width doesn't necessarily mean it's better.
This now would imply a requirement for the specifications to change, but even if a designer goes rogue, it's highly likely it'd be damaged or grow obsolete from the change. Designers for long lasting systems have to work with an entire base of entwined code, both their own in the system, dependencies, and user's code they'll want to run and a huge amount of work to do so without considering the repercussions is simply unwise.
As a side note, if your application is incompatible with a >32 bit integer, you can use static_assert(sizeof(int) * CHAR_BIT <= 32, "Int wider than 32 bits!");. However, who knows maybe the specifications will change and 64 bits plain ints will be implemented, so if you want to be future proof, don't do the static assert.

Main reason is backward compatibility. Moreover, there is already a 64 bit integer type long and same goes for float types: float and double. Changing the sizes of these basic types for different architectures will only introduce complexity. Moreover, 32 bit integer responds to many needs in terms of range.

The C + + standard does not say how much memory should be used for the int type, tells you how much memory should be used at least for the type int. In many programming environments on 32-bit pointer variables, "int" and "long" are all 32 bits long.

Since no one pointed this out yet.
int is guaranteed to be between -32767 to 32767(2^16) That's required by the standard. If you want to support 64 bit numbers on all platforms I suggest using the right type long long which supports (-9223372036854775807 to 9223372036854775807).
int is allowed to be anything so long as it provides the minimum range required by the standard.

Why is int typically 32 bit on 64 bit compilers?

Why is int typically 32 bit on 64 bit compilers? When I was starting programming, I've been taught int is typically the same width as the underlying architecture. And I agree that this also makes sense, I find it logical for a unspecified width integer to be as wide as the underlying platform (unless we are talking 8 or 16 bit machines, where such a small range for int will be barely applicable).
Later on I learned int is typically 32 bit on most 64 bit platforms. So I wonder what is the reason for this. For storing data I would prefer an explicitly specified width of the data type, so this leaves generic usage for int, which doesn't offer any performance advantages, at least on my system I have the same performance for 32 and 64 bit integers. So that leaves the binary memory footprint, which would be slightly reduced, although not by a lot...

Bad choices on the part of the implementors?
Seriously, according to the standard, "Plain ints have the
natural size suggested by the architecture of the execution
environment", which does mean a 64 bit int on a 64 bit
machine. One could easily argue that anything else is
non-conformant. But in practice, the issues are more complex:
switching from 32 bit int to 64 bit int would not allow
most programs to handle large data sets or whatever (unlike the
switch from 16 bits to 32); most programs are probably
constrained by other considerations. And it would increase the
size of the data sets, and thus reduce locality and slow the
program down.
Finally (and probably most importantly), if int were 64 bits,
short would have to be either 16 bits or
32 bits, and you'ld have no way of specifying the other (except
with the typedefs in <stdint.h>, and the intent is that these
should only be used in very exceptional circumstances).
I suspect that this was the major motivation.

The history, trade-offs and decisions are explained by The Open Group at http://www.unix.org/whitepapers/64bit.html. It covers the various data models, their strengths and weaknesses and the changes made to the Unix specifications to accommodate 64-bit computing.

Because there is no advantage to a lot of software to have 64-bit integers.
Using 64-bit int's to calculate things that can be calculated in a 32-bit integer (and for many purposes, values up to 4 billion (or +/- 2 billon) are sufficient), and making them bigger will not help anything.
Using a bigger integer will however have a negative effect on how many integers sized "things" fit in the cache on the processor. So making them bigger will make calculations that involve large numbers of integers (e.g. arrays) take longer because.
The int is the natural size of the machine-word isn't something stipulated by the C++ standard. In the days when most machines where 16 or 32 bit, it made sense to make it either 16 or 32 bits, because that is a very efficient size for those machines. When it comes to 64 bit machines, that no longer "helps". So staying with 32 bit int makes more sense.
Edit:
Interestingly, when Microsoft moved to 64-bit, they didn't even make long 64-bit, because it would break too many things that relied on long being a 32-bit value (or more importantly, they had a bunch of things that relied on long being a 32-bit value in their API, where sometimes client software uses int and sometimes long, and they didn't want that to break).

ints have been 32 bits on most major architectures for so long that changing them to 64 bits will probably cause more problems than it solves.

Main reason is backward compatibility. Moreover, there is already a 64 bit integer type long and same goes for float types: float and double. Changing the sizes of these basic types for different architectures will only introduce complexity. Moreover, 32 bit integer responds to many needs in terms of range.

The C + + standard does not say how much memory should be used for the int type, tells you how much memory should be used at least for the type int. In many programming environments on 32-bit pointer variables, "int" and "long" are all 32 bits long.

Since no one pointed this out yet.
int is guaranteed to be between -32767 to 32767(2^16) That's required by the standard. If you want to support 64 bit numbers on all platforms I suggest using the right type long long which supports (-9223372036854775807 to 9223372036854775807).
int is allowed to be anything so long as it provides the minimum range required by the standard.

What platforms have something other than 8-bit char?

Every now and then, someone on SO points out that char (aka 'byte') isn't necessarily 8 bits.
It seems that 8-bit char is almost universal. I would have thought that for mainstream platforms, it is necessary to have an 8-bit char to ensure its viability in the marketplace.
Both now and historically, what platforms use a char that is not 8 bits, and why would they differ from the "normal" 8 bits?
When writing code, and thinking about cross-platform support (e.g. for general-use libraries), what sort of consideration is it worth giving to platforms with non-8-bit char?
In the past I've come across some Analog Devices DSPs for which char is 16 bits. DSPs are a bit of a niche architecture I suppose. (Then again, at the time hand-coded assembler easily beat what the available C compilers could do, so I didn't really get much experience with C on that platform.)

char is also 16 bit on the Texas Instruments C54x DSPs, which turned up for example in OMAP2. There are other DSPs out there with 16 and 32 bit char. I think I even heard about a 24-bit DSP, but I can't remember what, so maybe I imagined it.
Another consideration is that POSIX mandates CHAR_BIT == 8. So if you're using POSIX you can assume it. If someone later needs to port your code to a near-implementation of POSIX, that just so happens to have the functions you use but a different size char, that's their bad luck.
In general, though, I think it's almost always easier to work around the issue than to think about it. Just type CHAR_BIT. If you want an exact 8 bit type, use int8_t. Your code will noisily fail to compile on implementations which don't provide one, instead of silently using a size you didn't expect. At the very least, if I hit a case where I had a good reason to assume it, then I'd assert it.

When writing code, and thinking about cross-platform support (e.g. for general-use libraries), what sort of consideration is it worth giving to platforms with non-8-bit char?
It's not so much that it's "worth giving consideration" to something as it is playing by the rules. In C++, for example, the standard says all bytes will have "at least" 8 bits. If your code assumes that bytes have exactly 8 bits, you're violating the standard.
This may seem silly now -- "of course all bytes have 8 bits!", I hear you saying. But lots of very smart people have relied on assumptions that were not guarantees, and then everything broke. History is replete with such examples.
For instance, most early-90s developers assumed that a particular no-op CPU timing delay taking a fixed number of cycles would take a fixed amount of clock time, because most consumer CPUs were roughly equivalent in power. Unfortunately, computers got faster very quickly. This spawned the rise of boxes with "Turbo" buttons -- whose purpose, ironically, was to slow the computer down so that games using the time-delay technique could be played at a reasonable speed.
One commenter asked where in the standard it says that char must have at least 8 bits. It's in section 5.2.4.2.1. This section defines CHAR_BIT, the number of bits in the smallest addressable entity, and has a default value of 8. It also says:
Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
So any number equal to 8 or higher is suitable for substitution by an implementation into CHAR_BIT.

Machines with 36-bit architectures have 9-bit bytes. According to Wikipedia, machines with 36-bit architectures include:
Digital Equipment Corporation PDP-6/10
IBM 701/704/709/7090/7094
UNIVAC 1103/1103A/1105/1100/2200,

A few of which I'm aware:
DEC PDP-10: variable, but most often 7-bit chars packed 5 per 36-bit word, or else 9 bit chars, 4 per word
Control Data mainframes (CDC-6400, 6500, 6600, 7600, Cyber 170, Cyber 176 etc.) 6-bit chars, packed 10 per 60-bit word.
Unisys mainframes: 9 bits/byte
Windows CE: simply doesn't support the `char` type at all -- requires 16-bit wchar_t instead

There is no such thing as a completely portable code. :-)
Yes, there may be various byte/char sizes. Yes, there may be C/C++ implementations for platforms with highly unusual values of CHAR_BIT and UCHAR_MAX. Yes, sometimes it is possible to write code that does not depend on char size.
However, almost any real code is not standalone. E.g. you may be writing a code that sends binary messages to network (protocol is not important). You may define structures that contain necessary fields. Than you have to serialize it. Just binary copying a structure into an output buffer is not portable: generally you don't know neither the byte order for the platform, nor structure members alignment, so the structure just holds the data, but not describes the way the data should be serialized.
Ok. You may perform byte order transformations and move the structure members (e.g. uint32_t or similar) using memcpy into the buffer. Why memcpy? Because there is a lot of platforms where it is not possible to write 32-bit (16-bit, 64-bit -- no difference) when the target address is not aligned properly.
So, you have already done a lot to achieve portability.
And now the final question. We have a buffer. The data from it is sent to TCP/IP network. Such network assumes 8-bit bytes. The question is: of what type the buffer should be? If your chars are 9-bit? If they are 16-bit? 24? Maybe each char corresponds to one 8-bit byte sent to network, and only 8 bits are used? Or maybe multiple network bytes are packed into 24/16/9-bit chars? That's a question, and it is hard to believe there is a single answer that fits all cases. A lot of things depend on socket implementation for the target platform.
So, what I am speaking about. Usually code may be relatively easily made portable to certain extent. It's very important to do so if you expect using the code on different platforms. However, improving portability beyond that measure is a thing that requires a lot of effort and often gives little, as the real code almost always depends on other code (socket implementation in the example above). I am sure that for about 90% of code ability to work on platforms with bytes other than 8-bit is almost useless, for it uses environment that is bound to 8-bit. Just check the byte size and perform compilation time assertion. You almost surely will have to rewrite a lot for a highly unusual platform.
But if your code is highly "standalone" -- why not? You may write it in a way that allows different byte sizes.

It appears that you can still buy an IM6100 (i.e. a PDP-8 on a chip) out of a warehouse. That's a 12-bit architecture.

Many DSP chips have 16- or 32-bit char. TI routinely makes such chips for example.

The C and C++ programming languages, for example, define byte as "addressable unit of data large enough to hold any member of the basic character set of the execution environment" (clause 3.6 of the C standard). Since the C char integral data type must contain at least 8 bits (clause 5.2.4.2.1), a byte in C is at least capable of holding 256 different values. Various implementations of C and C++ define a byte as 8, 9, 16, 32, or 36 bits
Quoted from http://en.wikipedia.org/wiki/Byte#History
Not sure about other languages though.
http://en.wikipedia.org/wiki/IBM_7030_Stretch#Data_Formats
Defines a byte on that machine to be variable length

The DEC PDP-8 family had a 12 bit word although you usually used 8 bit ASCII for output (on a Teletype mostly). However, there was also a 6-BIT character code that allowed you to encode 2 chars in a single 12-bit word.

For one, Unicode characters are longer than 8-bit. As someone mentioned earlier, the C spec defines data types by their minimum sizes. Use sizeof and the values in limits.h if you want to interrogate your data types and discover exactly what size they are for your configuration and architecture.
For this reason, I try to stick to data types like uint16_t when I need a data type of a particular bit length.
Edit: Sorry, I initially misread your question.
The C spec says that a char object is "large enough to store any member of the execution character set". limits.h lists a minimum size of 8 bits, but the definition leaves the max size of a char open.
Thus, the a char is at least as long as the largest character from your architecture's execution set (typically rounded up to the nearest 8-bit boundary). If your architecture has longer opcodes, your char size may be longer.
Historically, the x86 platform's opcode was one byte long, so char was initially an 8-bit value. Current x86 platforms support opcodes longer than one byte, but the char is kept at 8 bits in length since that's what programmers (and the large volumes of existing x86 code) are conditioned to.
When thinking about multi-platform support, take advantage of the types defined in stdint.h. If you use (for instance) a uint16_t, then you can be sure that this value is an unsigned 16-bit value on whatever architecture, whether that 16-bit value corresponds to a char, short, int, or something else. Most of the hard work has already been done by the people who wrote your compiler/standard libraries.
If you need to know the exact size of a char because you are doing some low-level hardware manipulation that requires it, I typically use a data type that is large enough to hold a char on all supported platforms (usually 16 bits is enough) and run the value through a convert_to_machine_char routine when I need the exact machine representation. That way, the platform-specific code is confined to the interface function and most of the time I can use a normal uint16_t.

what sort of consideration is it worth giving to platforms with non-8-bit char?
magic numbers occur e.g. when shifting;
most of these can be handled quite simply
by using CHAR_BIT and e.g. UCHAR_MAX instead of 8 and 255 (or similar).
hopefully your implementation defines those :)
those are the "common" issues.....
another indirect issue is say you have:
struct xyz {
uchar baz;
uchar blah;
uchar buzz;
}
this might "only" take (best case) 24 bits on one platform,
but might take e.g. 72 bits elsewhere.....
if each uchar held "bit flags" and each uchar only had 2 "significant" bits or flags that
you were currently using, and you only organized them into 3 uchars for "clarity",
then it might be relatively "more wasteful" e.g. on a platform with 24-bit uchars.....
nothing bitfields can't solve, but they have other things to watch out
for ....
in this case, just a single enum might be a way to get the "smallest"
sized integer you actually need....
perhaps not a real example, but stuff like this "bit" me when porting / playing with some code.....
just the fact that if a uchar is thrice as big as what is "normally" expected,
100 such structures might waste a lot of memory on some platforms.....
where "normally" it is not a big deal.....
so things can still be "broken" or in this case "waste a lot of memory very quickly" due
to an assumption that a uchar is "not very wasteful" on one platform, relative to RAM available, than on another platform.....
the problem might be more prominent e.g. for ints as well, or other types,
e.g. you have some structure that needs 15 bits, so you stick it in an int,
but on some other platform an int is 48 bits or whatever.....
"normally" you might break it into 2 uchars, but e.g. with a 24-bit uchar
you'd only need one.....
so an enum might be a better "generic" solution ....
depends on how you are accessing those bits though :)
so, there might be "design flaws" that rear their head....
even if the code might still work/run fine regardless of the
size of a uchar or uint...
there are things like this to watch out for, even though there
are no "magic numbers" in your code ...
hope this makes sense :)

The weirdest one I saw was the CDC computers. 6 bit characters but with 65 encodings. [There were also more than one character set -- you choose the encoding when you install the OS.]
If a 60 word ended with 12, 18, 24, 30, 36, 40, or 48 bits of zero, that was the end of line character (e.g. '\n').
Since the 00 (octal) character was : in some code sets, that meant BNF that used ::= was awkward if the :: fell in the wrong column. [This long preceded C++ and other common uses of ::.]

ints used to be 16 bits (pdp11, etc.). Going to 32 bit architectures was hard. People are getting better: Hardly anyone assumes a pointer will fit in a long any more (you don't right?). Or file offsets, or timestamps, or ...
8 bit characters are already somewhat of an anachronism. We already need 32 bits to hold all the world's character sets.

The Univac 1100 series had two operational modes: 6-bit FIELDATA and 9-bit 'ASCII' packed 6 or 4 characters respectively into 36-bit words. You chose the mode at program execution time (or compile time.) It's been a lot of years since I actually worked on them.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js