I have heard quite a lot about storing external data in a pointer, for example in SSO (short string optimization).
For example: when we want to overload << for our SSO class, depending on the length of the string we want to print either the value of the pointer or the string itself.
Instead of creating a bool flag, we could encode this flag inside the pointer itself. If I am not mistaken, this works thanks to the PC architecture, which adds padding to prevent unaligned memory access.
But I have yet to see an example of it. How could we detect such a flag, when binary operations such as & (to check whether the MSB or LSB is set to 1 as a flag) are not allowed on pointers? Also, wouldn't this mess up dereferencing the pointer?
It is quite possible to do such things (unlike what others have said). Most modern platforms (x86-64, for example) have alignment requirements in their ABIs, so the least significant bits of a suitably aligned pointer may be assumed to be zero, and you can make use of that storage for other purposes.
Let me pause for a second and say that what I'm about to describe is considered 'undefined behavior' by the C and C++ standards. You are going off the rails in a non-portable way by doing what I describe, but there are more standards governing the rules of a computer than the C++ standard (such as the processor's assembly reference and architecture docs). Caveat emptor.
With the assumption that we're working on x86_64, let us say that you have a class/structure that starts with a pointer member:
struct foo {
    bar * ptr;
    /* other stuff */
};
Under the x86-64 ABI, that pointer member must be aligned on an 8-byte boundary, which gives struct foo itself an alignment of 8. In this trivial example, you can therefore assume that the address of every struct foo is divisible by 8, meaning the lowest 3 bits of a foo * will be zero.
In order to take advantage of such a constraint, you must play some casting games to allow the pointer to be treated as a different type. There's a bunch of different ways of performing the casting, ranging from the old C method (not recommended) of casting it to and from a uintptr_t to cleaner methods of wrapping the pointer in a union. In order to access either the pointer or ancillary data, you need to logically 'and' the datum with a bitmask that zeros out the part of the datum you don't wish.
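The masking described above might look like the following minimal sketch. It assumes the compiler maps an 8-byte-aligned pointer to a uintptr_t whose low three bits are zero (true on mainstream x86-64 compilers, but not promised by the standard). The names pack, get_ptr, get_tag and kTagMask are hypothetical, and bar is given a body only so the example compiles.

```cpp
#include <cassert>
#include <cstdint>

struct alignas(8) bar { int value; };      // hypothetical payload type, 8-byte aligned

constexpr std::uintptr_t kTagMask = 0x7;   // low 3 bits are free at 8-byte alignment

// Combine a pointer and a small tag (0..7) into one integer.
inline std::uintptr_t pack(bar* p, unsigned tag) {
    auto bits = reinterpret_cast<std::uintptr_t>(p);
    assert((bits & kTagMask) == 0 && "pointer is not 8-byte aligned");
    assert(tag <= kTagMask);
    return bits | tag;
}

// Mask the tag off before converting back to a dereferenceable pointer.
inline bar* get_ptr(std::uintptr_t packed) {
    return reinterpret_cast<bar*>(packed & ~kTagMask);
}

// Extract only the ancillary data stored in the low bits.
inline unsigned get_tag(std::uintptr_t packed) {
    return static_cast<unsigned>(packed & kTagMask);
}
```

The packed value is never dereferenced directly; every access goes through get_ptr, which is what keeps the tag from corrupting the address.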
As an example of this explanation, I wrote an AVL tree a few years ago that sinks the balance book-keeping data into a pointer, and you can take a look at that example here: (everything you need to see is contained in the struct avl_tree_node at the line I referenced).
Swinging back to a topic you mentioned in your initial question... Short string optimization isn't implemented quite the same way. The implementations of it in Clang and GCC's standard libraries differ somewhat, but both boil down to using a union to overload a block of storage with either a pointer or an array of bytes, and play some clever tricks with the string's internal length field for differentiating whether the data is a pointer or local array. For more of the details, this blog post is rather good at explaining:

"encode this flag inside pointer itself"
No, you are not allowed to do this in either C or C++.
The behaviour on setting (let alone dereferencing) a pointer to memory you don't own is undefined in either language.
What is the rationale for limitations on pointer arithmetic or comparison?

In C/C++, addition or subtraction on pointer is defined only if the resulting pointer lies within the original pointed complete object. Moreover, comparison of two pointers can only be performed if the two pointed objects are subobjects of a unique complete object.
What are the reasons of such limitations?
I supposed that segmented memory model (see here §1.2.1) could be one of the reasons but since compilers can actually define a total order on all pointers as demonstrated by this answer, I am doubting this.
The reason is to keep the possibility to generate reasonable code. This applies to systems with a flat memory model as well as to systems with more complex memory models. If you forbid the (not very useful) corner cases like adding or subtracting out of arrays and demanding a total order on pointers between objects you can skip a lot of overhead in the generated code.
The limitations imposed by the standard allow the compiler to make assumptions about pointer arithmetic and use them to improve the quality of the code. This covers both computing things statically in the compiler instead of at runtime and choosing which instructions and addressing modes to use. As an example, consider a program with two pointers p1 and p2. If the compiler can derive that they point to different data objects, it can safely assume that no operation based on following p1 will ever affect the object pointed to by p2. This allows the compiler to reorder loads and stores based on p1 without considering loads and stores based on p2, and the other way around.
There are architectures where program and data spaces are separated, and it's simply impossible to subtract two arbitrary pointers. A pointer to a function or to const static data will be in a completely different address space than a normal variable.
Even if you arbitrarily supplied a ranking between different address spaces, there's a possibility that the ptrdiff_t type would need to be a larger size. And the process of comparing or subtracting two pointers would be greatly complicated. That's a bad idea in a language that is designed for speed.
You only prove that the restriction could be removed - but miss that it would come with a cost (in terms of memory and code) - which was contrary to the goals of C.
Specifically the difference needs to have a type, which is ptrdiff_t, and one would assume it is similar to size_t.
In a segmented memory model you (normally) indirectly have a limitation on the sizes of objects - assuming that the answers in: What's the real size of `size_t`, `uintptr_t`, `intptr_t` and `ptrdiff_t` type on 16-bit systems using segmented addressing mode? are correct.
Thus at least for differences removing that restriction would not only add extra instructions to ensure a total order - for an unimportant corner case (as in other answer), but also spend double the amount of memory for differences etc.
C was designed to be more minimalistic and not to force compiler to spend memory and code on such cases. (In those days memory limitations mattered more.)
Obviously there are also other benefits, like the possibility to detect errors when mixing pointers from different arrays. Similarly, mixing iterators from two different containers is undefined in C++ (with some minor exceptions), and some debug implementations detect such errors.
The rationale is that some architectures have segmented memory, and pointers to different objects may point at different memory segments. The difference between the two pointers would then not necessarily be something meaningful.
This goes back all the way to pre-standard C. The C rationale doesn't mention this explicitly, but it hints at this being the reason, if we look where it explains the rationale why using a negative array index is undefined behavior (C99 rationale 5.10 6.5.6, emphasis mine):
In the case of p-1, on the other hand, an entire object would have to be allocated prior to the
array of objects that p traverses, so decrement loops that run off the bottom of an array can fail.
This restriction allows segmented architectures, for instance, to place objects at the start of a
range of addressable memory.
Since the C standard intends to cover the majority of processor architectures, it should also cover this one:
Imagine an architecture (I know one, but wouldn't name it) where pointers are not just plain numbers, but are like structures or "descriptors". Such a structure contains information about the object it points into (its virtual address and size) and the offset within it. Adding or subtracting a pointer produces a new structure with only the offset field adjusted; producing a structure with the offset greater than the size of the object is hardware prohibited. There are other restrictions (such as how the initial descriptor is produced or what are the other ways to modify it), but they are not relevant to the topic.
In most cases where the Standard classifies an action as invoking Undefined Behavior, it has done so because:
1. There might be platforms where defining the behavior would be expensive. Segmented architectures could behave weirdly if code tries to do pointer arithmetic that extends beyond object boundaries, and some compilers may evaluate p > q by testing the sign of q-p.
2. There are some kinds of programming where defining the behavior would be useless. Many kinds of code can get by just fine without relying upon forms of pointer addition, subtraction, or relational comparison beyond those given by the Standard.
3. People writing compilers for various purposes should be capable of recognizing cases where quality compilers intended for such purposes should behave predictably, and handling such cases when appropriate, whether or not the Standard compels them to do so.
Both #1 and #2 are very low bars, and #3 was thought to be a "gimme". Although it has become fashionable for compiler writers to show off their cleverness by finding ways of breaking code whose behavior was defined by quality implementations intended for low-level programming, I don't think the authors of the Standard expected compiler writers to perceive a huge difference between actions which were required to behave predictably, versus those where nearly all quality implementations were expected to behave identically, but where it might conceivably be useful to let some arcane implementations do something else.
I would like to answer this by inverting the question. Instead of asking why pointer addition and most of the arithmetic operations are not allowed, why do pointers allow only adding or subtracting an integer, post and pre increment and decrement and comparison (or subtraction) of pointers pointing to the same array? It is to do with the logical consequence of the arithmetic operation.
Adding/subtracting an integer n to a pointer p gives me the address of nth element from the currently pointed element either in the forward or reverse direction. Similarly, subtracting p1 and p2 pointing to the same array gives me the count of elements between the two pointers.
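As a small illustration of those two meaningful operations, here is a sketch with hypothetical helper names (advance_by and elements_between). Both helpers are only well defined while the pointers stay within one array or one past its end.

```cpp
#include <cassert>
#include <cstddef>

// Returns the address of the n-th element after p (n may be negative),
// provided the result stays inside the array or one past its end.
inline const int* advance_by(const int* p, std::ptrdiff_t n) {
    return p + n;
}

// Returns the number of elements between two pointers.
// Both must point into (or one past the end of) the same array.
inline std::ptrdiff_t elements_between(const int* first, const int* last) {
    return last - first;
}
```

For an array int a[5], advance_by(a, 3) points at a[3], and elements_between(a + 1, a + 4) is 3. Something like "p1 + p2" has no comparable meaning, which is why it is not offered.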
Range of values a C pointer can take?

In "Computer System: A Programmer's Perspective", section 2.1 (page 31), it says:
The value of a pointer in C is the virtual address of the first byte of some block of storage.
To me it sounds like the C pointer's value can take values from 0 to [size of virtual memory - 1]. Is that the case? If yes, I wonder if there is any mechanism that checks if all pointers in a program are assigned with legal values -- values at least 0 and at most [size of virtual memory - 1], and where such mechanism is built in -- in compiler? OS? or somewhere else?
There is no process that checks pointers for validity as use of invalid pointers has undefined effects anyway.
Usually it will be impossible for a pointer to hold a value outside of the addressable range as the two will have the same available range — e.g. both will be 32 bit. However some CPUs have rules about pointer alignment that may render some addresses invalid for some types of data. Some runtimes, such as 64-bit Objective-C, which is a strict superset of C, use incorrectly aligned pointers to disguise literal objects as objects on the heap.
There are also some cases where the complete address space is defined by the instruction set to be one thing but is implemented by that specific hardware to be another. An example from history is the original 68000 which defined a 32-bit space but had only 24 address lines. Very early versions of Mac OS used the spare 8 bits for flags describing the block of data, relying on the hardware to ignore them.
there's no runtime checking of validity;
even if there were, the meaning of validity is often dependent on the specific model of CPU (not just the family) or specific version of the OS (ditto) so as to make checking a less trivial task than you might guess.
In practise what will normally happen if your address is illegal per that hardware but is accessed as though legal is a processor exception.
A pointer in C is an abstract object. The only guarantee provided by the C standard is that pointers can point to all the things they need to within C: functions, objects, one past the end of an object, and NULL.
In typical C implementations, pointers can point to any address in virtual memory, and some C implementations deliberately support this in large part. However, there are complications. For example, the value used for NULL may be difficult to use as an address, and converting pointers created for one type to another type may fail (due to alignment problems). Additionally, there are legal non-typical C implementations where pointers do not directly correlate to memory addresses in a normal way.
You should not expect to use pointers to access memory arbitrarily without understanding the rules of the C standard and of the C implementations you use.
There is no mechanism in C which will check if pointers in a program are valid. The programmer is responsible for using them correctly.
Do I understand C/C++ strict-aliasing correctly?

I've read this article about C/C++ strict aliasing. I think the same applies to C++.
As I understand, strict aliasing is used to rearrange the code for performance optimization. That's why two pointers of different (and unrelated in C++ case) types cannot refer to the same memory location.
Does this mean that problems can occur only if memory is modified (apart from possible problems with memory alignment)?
For example, handling network protocol, or de-serialization. I have a byte array, dynamically allocated and packet struct is properly aligned. Can I reinterpret_cast it to my packet struct?
char const* buf = ...; // dynamically allocated
unsigned int i = *reinterpret_cast<unsigned int const*>(buf + shift); // [shift] satisfies alignment requirements; the cast must not drop const
The problem here is not strict aliasing so much as structure representation requirements.
First, it is safe to alias between char, signed char, or unsigned char and any one other type (in your case, unsigned int). This allows you to write your own memory-copy loops, as long as they're defined using a char type. This is authorized by the following language in C99 (§6.5):
6. The effective type of an object for an access to its stored value is the declared type of the object, if any. [Footnote: Allocated objects have no declared type] [...] If a value is copied into an object having no declared type using
memcpy or memmove, or is copied as an array of character type, then the effective type
of the modified object for that access and for subsequent accesses that do not modify the
value is the effective type of the object from which the value is copied, if it has one. For
all other accesses to an object having no declared type, the effective type of the object is
simply the type of the lvalue used for the access.
7. An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [Footnote: The intent of this list is to specify those circumstances in which an object may or may not be aliased.]
a type compatible with the effective type of the object,
a character type.
Similar language can be found in the C++0x draft N3242 §3.10/10, although it is not as clear when the 'dynamic type' of an object is assigned (I'd appreciate any further references on what the dynamic type is of a char array, to which a POD object has been copied as a char array with proper alignment).
As such, aliasing is not a problem here. However, a strict reading of the standard indicates that a C++ implementation has a great deal of freedom in choosing a representation of an unsigned int.
As one random example, unsigned ints might be a 24-bit integer, represented in four bytes, with 8 padding bits interspersed; if any of these padding bits does not match a certain (constant) pattern, it is viewed as a trap representation, and dereferencing the pointer will result in a crash. Is this a likely implementation? Perhaps not. But there have been, historically, systems with parity bits and other oddness, and so directly reading from the network into an unsigned int, by a strict reading of the standard, is not kosher.
Now, the problem of padding bits is mostly a theoretical issue on most systems today, but it's worth noting. If you plan to stick to PC hardware, you don't really need to worry about it (but don't forget your ntohls - endianness is still a problem!)
Structures make it even worse, of course - alignment representations depend on your platform. I have worked on an embedded platform in which all types have an alignment of 1 - no padding is ever inserted into structures. This can result in inconsistencies when using the same structure definitions on multiple platforms. You can either manually work out the byte offsets for data structure members and reference them directly, or use a compiler-specific alignment directive to control padding.
So you must be careful when directly casting from a network buffer to native types or structures. But the aliasing itself is not a problem in this case.
Actually this code already has UB at the point you dereference the reinterpret_casted integer pointer without even needing to invoke strict-aliasing rules. Not only that, but if you aren't rather careful, reinterpreting directly to your packet structure could cause all sorts of issues depending on struct packing and endianness.
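One way to sidestep both the dereference problem and the byte-order problem is to copy or decode the bytes instead of casting the pointer. The sketch below is only an illustration: the function names are hypothetical, and it uses std::uint32_t in place of the question's unsigned int so the field width is explicit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy the bytes into a properly typed object instead of dereferencing a
// reinterpret_cast'ed pointer; this has no alignment or aliasing problem.
inline std::uint32_t read_u32_native(const char* buf, std::size_t shift) {
    std::uint32_t value;
    std::memcpy(&value, buf + shift, sizeof value);
    return value;   // bytes are interpreted in the host's byte order
}

// Decode a 32-bit big-endian (network order) field byte by byte, which is
// independent of the host's endianness.
inline std::uint32_t read_u32_be(const char* buf, std::size_t shift) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(buf + shift);
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}
```

Neither function requires shift to satisfy any alignment, and compilers typically turn the memcpy into a single load where the hardware allows it.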
Can storing unrelated data in the least-significant-bit of a pointer work reliably?

Let me just say up front that I'm aware that what I'm about to propose is a mortal sin, and that I will probably burn in Programming Hell for even considering it.
That said, I'm still interested in knowing if there's any reason why this wouldn't work.
The situation is: I have a reference-counting smart-pointer class that I use everywhere. It currently looks something like this (note: incomplete/simplified pseudocode):
class IRefCountable
{
public:
   IRefCountable() : _refCount(0) {}
   virtual ~IRefCountable() {}
   void Ref() {_refCount++;}
   bool Unref() {return (--_refCount==0);}
private:
   unsigned int _refCount;
};
class Ref
{
public:
   Ref(IRefCountable * ptr, bool isObjectOnHeap) : _ptr(ptr), _isObjectOnHeap(isObjectOnHeap) {_ptr->Ref();}
   ~Ref()
   {
      if ((_ptr->Unref())&&(_isObjectOnHeap)) delete _ptr;
   }
private:
   IRefCountable * _ptr;
   bool _isObjectOnHeap;
};
Today I noticed that sizeof(Ref)=16. However, if I remove the boolean member variable _isObjectOnHeap, sizeof(Ref) is reduced to 8. That means that for every Ref in my program, there are 7.875 wasted bytes of RAM... and there are many, many Refs in my program.
Well, that seems like a waste of some RAM. But I really need that extra bit of information (okay, humor me and assume for the sake of the discussion that I really do). And I notice that since IRefCountable is a non-POD class, it will (presumably) always be allocated on a word-aligned memory address. Therefore, the least significant bit of (_ptr) should always be zero.
Which makes me wonder... is there any reason why I can't OR my one bit of boolean data into the least-significant bit of the pointer, and thus reduce sizeof(Ref) by half without sacrificing any functionality? I'd have to be careful to AND out that bit before dereferencing the pointer, of course, which would make pointer dereferences less efficient, but that might be made up for by the fact that the Refs are now smaller, and thus more of them can fit into the processor's cache at once, and so on.
Is this a reasonable thing to do? Or am I setting myself up for a world of hurt? And if the latter, how exactly would that hurt be visited upon me? (Note that this is code that needs to run correctly in all reasonably modern desktop environments, but it doesn't need to run in embedded machines or supercomputers or anything exotic like that)
If you want to use only the standard facilities and not rely on any implementation then with C++0x there are ways to express alignment (here is a recent question I answered). There's also std::uintptr_t to reliably get an unsigned integral type large enough to hold a pointer. Now the one thing guaranteed is that a conversion from the pointer type to std::[u]intptr_t and back to that same type yields the original pointer.
I suppose you could argue that if you can get back the original std::intptr_t (with masking), then you can get the original pointer. I don't know how solid this reasoning would be.
[edit: thinking about it there's no guarantee that an aligned pointer takes any particular form when converted to an integral type, e.g. one with some bits unset. probably too much of a stretch here]
The problem here is that it is entirely machine-dependent. It isn't something one often sees in C or C++ code, but it has certainly been done many times in assembly. Old Lisp interpreters almost always used this trick to store type information in the low bit(s). (I have seen it in C code, but in projects that were being implemented for a specific target platform.)
Personally, if I were trying to write portable code, I probably wouldn't do this. The fact is that it will almost certainly work on "all reasonably modern desktop environments". (Certainly, it will work on every one I can think of.)
A lot depends on the nature of your code. If you are maintaining it, and nobody else will ever have to deal with the "world of hurt", then it might be ok. You will have to add ifdef's for any odd architecture that you might need to support later on. On the other hand, if you are releasing it to the world as "portable" code, that would be cause for concern.
Another way to handle this is to write two versions of your smart pointer, one for machines on which this will work and one for machines where it won't. That way, as long as you maintain both versions, it won't be that big a deal to change a config file to use the 16-byte version.
It goes without saying that you would have to avoid writing any other code that assumes sizeof(Ref) is 8 rather than 16. If you are using unit tests, run them with both versions.
Any reason? Unless things have changed in the standard lately, the value representation of a pointer is implementation-defined. It is certainly possible that some implementation somewhere may pull the same trick, defining these otherwise-unused low bits for its own purposes. It's even more possible that some implementation might use word-pointers rather than byte-pointers, so instead of two adjacent words being at "addresses" 0x8640 and 0x8642, they would be at "addresses" 0x4320 and 0x4321.
One tricky way around the problem would be to make Ref a (de facto) abstract class, and all instances would actually be instances of RefOnHeap and RefNotOnHeap. If there are that many Refs around, the extra space used to store the code and metadata for three classes rather than one would be made up by the space savings in having each Ref being half the size. (Won't work too well, the compiler can omit the vtable pointer if there are no virtual methods and introducing virtual methods will add the 4-or-8 bytes back to the class).
You always have at least one free bit to use in the pointer as long as
- you're not pointing to arbitrary positions inside a struct or array with an alignment of 1, or
- the platform gives you a free bit
Since IRefCountable has an alignment of at least 4 (it contains an unsigned int), you'll have at least 2 free bottom bits in IRefCountable* to use
Regarding the first point, storing data in the least significant bit is always reliable if the pointer is aligned to a power of 2 larger than 1. That means it'll work for everything apart from char*/bool* or a pointer to a struct containing all char/bool members, and obviously it'll work for IRefCountable* in your case. In C++11 you can use alignof or std::alignment_of to ensure that you have the required alignment like this
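static_assert(alignof(Ref) > 1, "Ref needs an alignment greater than 1");
static_assert(alignof(IRefCountable) > 1, "IRefCountable needs an alignment greater than 1");
// This check for power of 2 is likely redundant
static_assert((alignof(Ref) & (alignof(Ref) - 1)) == 0, "alignment must be a power of 2");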
// Now IRefCountable* is always aligned,
// so its least significant bit can be used freely
Even if you have some object with only 1-byte alignment, for example if you change the _refCount in IRefCountable to uint8_t, then you can still enforce alignment requirement with alignas, or with other extensions in older C++ like __declspec(align). Dynamically allocated memory is already aligned to max_align_t, or you can use aligned_alloc() for a higher level alignment
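A minimal sketch of that alignas approach, using a hypothetical SmallCounter type whose only member is one byte wide. The runtime check assumes the usual mapping from pointers to integers, which the standard does not guarantee.

```cpp
#include <cassert>
#include <cstdint>

// Without alignas this struct would have alignof == 1 (only 1-byte members).
struct alignas(4) SmallCounter {
    std::uint8_t refCount;
};

static_assert(alignof(SmallCounter) == 4, "alignas raises the alignment");
static_assert(sizeof(SmallCounter) == 4, "size is padded up to the alignment");

// True when the two low bits of the address are zero, i.e. free for tagging.
inline bool low_two_bits_clear(const SmallCounter* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 0x3u) == 0;
}
```

The price is visible in the second static_assert: the object grows from 1 byte to 4, so the trick trades space in the pointee for space in the pointer.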
My second bullet point means in case you really need to store arbitrary pointers to objects with absolute 1-byte alignment then most of the time you can still utilize the feature from the platform
On many 32-bit platforms the address space is split in half for user and kernel processes. User pointers will always have the most significant bit unset so you can use that to store data. Of course it won't work on platforms with more than 2GB of user address space, like when the split is 3/1 or 4/4
On 64-bit platforms, most currently have only a 48-bit virtual address space, and a few newer high-end CPUs have 57-bit virtual addresses, which is still far from the full 64 bits. Therefore you'll have lots of bits to spare. In practice this always works in personal computing, since you'll never be able to fill that vast address space
This is called a tagged pointer
If the data is always heap-allocated then you can tell the OS to limit the range of address space to use to get more bits
For more information read Using the extra 16 bits in 64-bit pointers
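As a rough sketch of the high-bits variant: the code below assumes user-space addresses fit in the low 48 bits, which is an assumption about the platform (it does not hold for kernel addresses or for processes using 57-bit addressing). All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: user-space addresses fit in the low 48 bits (top 16 bits zero).
constexpr unsigned kAddrBits = 48;
constexpr std::uint64_t kAddrMask = (std::uint64_t(1) << kAddrBits) - 1;

// Store a 16-bit tag in the unused upper bits of the address.
inline std::uint64_t pack_high(void* p, std::uint16_t tag) {
    auto bits = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
    assert((bits & ~kAddrMask) == 0 && "address uses more than 48 bits");
    return bits | (std::uint64_t(tag) << kAddrBits);
}

// Strip the tag to recover the original pointer.
inline void* unpack_ptr(std::uint64_t packed) {
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(packed & kAddrMask));
}

inline std::uint16_t unpack_tag(std::uint64_t packed) {
    return static_cast<std::uint16_t>(packed >> kAddrBits);
}
```

Unlike the low-bit trick, this works for char* and other 1-byte-aligned pointers, but it depends on the address-space layout rather than on alignment.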
Yes, this can work reliably. This is, in fact, used by the Linux kernel as part of its red-black tree implementation. Instead of storing an extra boolean to indicate whether a node is red or black (which can take up quite a bit of additional space), the kernel uses the low-order bit of the parent node address.
From rbtree_types.h:
struct rb_node {
unsigned long __rb_parent_color;
struct rb_node *rb_right;
struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));
The __rb_parent_color field stores both the address of the node's parent and the color of the node (in the least-significant bit).
Getting The Pointer
To retrieve the parent address from this field you just clear the lower order bits (this clears the lowest 2-bits).
From rbtree.h:
#define rb_parent(r) ((struct rb_node *)((r)->__rb_parent_color & ~3))
Getting The Boolean
To retrieve the color you just extract the lower bit and treat it like a boolean.
From rbtree_augmented.h:
#define __rb_color(pc) ((pc) & 1)
#define __rb_is_black(pc) __rb_color(pc)
#define __rb_is_red(pc) (!__rb_color(pc))
#define rb_color(rb) __rb_color((rb)->__rb_parent_color)
#define rb_is_red(rb) __rb_is_red((rb)->__rb_parent_color)
#define rb_is_black(rb) __rb_is_black((rb)->__rb_parent_color)
Setting The Pointer And Boolean
You set the pointer and boolean value using standard bit manipulation operations (making sure to preserve each part of the final value).
From rbtree_augmented.h:
static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
{
	rb->__rb_parent_color = rb_color(rb) | (unsigned long)p;
}

static inline void rb_set_parent_color(struct rb_node *rb,
				       struct rb_node *p, int color)
{
	rb->__rb_parent_color = (unsigned long)p | color;
}
You can also clear the boolean value setting it to false via (unsigned long)p & ~1.
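Applied to the Ref class from the question, the same idea could be sketched roughly as follows. CompactRef is a hypothetical name, copying is left out for brevity, and the code relies on the pointer-to-integer mapping leaving bit 0 clear for aligned objects, which desktop compilers do but the standard does not promise.

```cpp
#include <cassert>
#include <cstdint>

class IRefCountable {
public:
    virtual ~IRefCountable() {}
    void Ref() { ++_refCount; }
    bool Unref() { return --_refCount == 0; }
private:
    unsigned int _refCount = 0;
};

// Hypothetical compact variant of Ref: the on-heap flag lives in bit 0.
class CompactRef {
public:
    CompactRef(IRefCountable* ptr, bool onHeap)
        : _bits(reinterpret_cast<std::uintptr_t>(ptr)) {
        static_assert(alignof(IRefCountable) > 1, "need a spare low bit");
        assert((_bits & 1u) == 0);          // relies on the usual pointer mapping
        _bits |= onHeap ? 1u : 0u;
        get()->Ref();
    }
    ~CompactRef() {
        if (get()->Unref() && isObjectOnHeap()) delete get();
    }
    CompactRef(const CompactRef&) = delete;             // copying omitted for brevity
    CompactRef& operator=(const CompactRef&) = delete;

    // AND the flag out before the pointer is used.
    IRefCountable* get() const {
        return reinterpret_cast<IRefCountable*>(_bits & ~std::uintptr_t(1));
    }
    bool isObjectOnHeap() const { return (_bits & 1u) != 0; }
private:
    std::uintptr_t _bits;   // pointer and flag share one word
};

static_assert(sizeof(CompactRef) == sizeof(void*), "no extra storage for the flag");
```

Every dereference pays for one extra AND, which is the cost the question anticipated in exchange for halving the size of each reference.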
There will be always a sense of uncertainty in mind even if this method is working, because ultimately you are playing with the internal architecture which may or may not be portable.
On the other hand to solve this problem, if you want to avoid bool variable, I would suggest a simple constructor as,
Ref(IRefCountable * ptr) : _ptr(ptr)
{
   if(ptr != 0) _ptr->Ref();   // null means "automatic object, nothing to count"
}
From the code, I smell that the reference counting is needed only when the object is on heap. For automatic objects, you can simply pass 0 to the class Ref and put appropriate null checks in constructor/destructor.
Have you thought about an out of class storage ?
Depending on whether you have (or not) to worry about multi-threading and control the implementation of new/delete/malloc/free, it might be worth a try.
The point would be that instead of incrementing a local counter (local to the object), you would maintain a "counter" map address --> count that would haughtily ignore addresses passed that are outside the allocated area (stack for example).
It may seem silly (there is room for contention in MT), but it also plays rather nice with read-only since the object is not "modified" only for counting.
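A very rough sketch of such an out-of-class counter is shown below. RefTable is a hypothetical name, the sketch is not thread-safe, and it does not decide by itself which addresses belong to the heap; that part would need cooperation from the allocator, so here untracked addresses are simply ignored on unref.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical side table: reference counts live outside the objects.
class RefTable {
public:
    void ref(const void* p) { ++counts_[p]; }

    // Returns true when the last reference to p was dropped.
    bool unref(const void* p) {
        auto it = counts_.find(p);
        if (it == counts_.end()) return false;   // untracked address: ignore
        if (--it->second != 0) return false;
        counts_.erase(it);
        return true;
    }

    std::size_t count(const void* p) const {
        auto it = counts_.find(p);
        return it == counts_.end() ? 0 : it->second;
    }
private:
    std::unordered_map<const void*, std::size_t> counts_;
};
```

Because the table is keyed by const void*, counting a reference never writes to the object itself, which is the read-only friendliness mentioned above.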
Is it possible to store pointer with two higher-order zero bytes as a WORD when it's not in a cpu register?

On a system where a pointer is 4 bytes, and the intention is to address only the part of memory reachable with two bytes (the low addresses), is it possible to store the pointer as a two-byte WORD whenever it's not in a CPU register? I don't see any way, because given any WORD, say one named "twoBytes", declaring a pointer like:
char * pointer = reinterpret_cast<char *>((unsigned int)(twoBytes));
introduces a whole new 4-byte entity that's going to be stored as a 4-byte entity.
Generally you can store however little information is needed to recover the original pointer value, so yes, you can, although it's outside the guarantees offered by the language (you need to be sure how your particular compiler treats reinterpret casts).
However, in e.g. Windows the only thing you can be sure of is that the upper word of a 32-bit pointer is non-zero for user code (except for nullpointers). This is implicit in the Windows API macros like MAKEINTATOM. If the most significant word could be zero then the APIs couldn't reliably distinguish pointers that represent small integers, from pointers to text strings.
So, in general, optimizing that way won't buy you anything unless you're doing kernel programming. Also, saving a few bytes is seldom worth the added complexity.
Cheers & hth.,
What you are describing sounds more like a compiler feature (good old fashioned "near" pointers) than something you can do from inside the language. Take it up with whoever made the compiler you're using. I can vouch for the theoretical possibility of being able to implement this behavior in GCC, although I suspect it would be a huge pain in the ass.
As an alternative hack, you might be able to get most of what you want using a base pointer and 'unsigned short' offsets.
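The base-pointer-plus-offset idea could look something like this sketch. Arena, to_offset and from_offset are hypothetical names, and it assumes everything you want to refer to lives inside one 64 KiB block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 64 KiB arena; every byte inside it can be named by a
// 16-bit offset from the arena's base address.
struct Arena {
    char bytes[65536];
};

// Convert a pointer into the arena to a 2-byte offset.
inline std::uint16_t to_offset(const Arena& arena, const char* p) {
    std::ptrdiff_t diff = p - arena.bytes;
    assert(diff >= 0 && diff < 65536);
    return static_cast<std::uint16_t>(diff);
}

// Rebuild the full pointer from the base and the stored offset.
inline char* from_offset(Arena& arena, std::uint16_t offset) {
    return arena.bytes + offset;
}
```

Only the offsets are stored in data structures; the full-width pointer exists just long enough to perform the access, which is the effect the question was after.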
No, for the same reason you can't store the word "bike" in two bytes. The data just won't fit. Don't cast pointers to non-pointer types, it's often non-portable and can silently introduce truncation and cause some nasty bugs.
You can also use it without a named 4 byte entity:
((char*)(unsigned)twoBytes)[idx] = some_val;
twoBytes will only take up two bytes in memory. When you cast it to a char*, your compiler will make a 4-byte value to actually address the data, but you'll never see it, and it will likely only ever be in a register. I think that's what you were asking.