How much memory is allocated for pointer * variables?

How much memory is allocated for pointer * variables? - c++

My question:
int
float
char
str
These var types each have a standardized allotment of bytes allocated for them. But I never learned about this for pointers:
These "address" storing variables, whose contents seem like just a series of alphanumeric characters.
C++ code:
int *ptr1;
char *ptr2;
void *ptr3;
So what is the standard memory allocation for the pointers listed above? And in general?

A pointer has the size sizeof(ptr1) bytes and that's about all the standard specifies. In theory, different object pointers can have different sizes, and different function pointers may have different sizes as well.
What it boils down to underneath the hood is that the CPU data size that we usually mean when speaking of 32 bit CPU etc (the max chunk it can process in a single instruction) does not necessarily correspond to the address bus width/addressable memory space. The latter determines the size of pointers.
In practice (simplification):
8 and 16 bit microcontroller systems, as well as old 16 bit PC may have diverse pointer sizes, typically between 8 and 32 bits, where 16 bits is most common. They usually use non-standard keyword such as pointer qualitifers near or far to indicate this.
Some ISA might have a "zero page", meaning 8 bit pointers to the first 256 addresses, where they can execute code/access data faster (usually this will be on von Neumann MCUs). Similarly, they can have extended memory beyond the normal 65536 bytes, which may give slower execution/data access. 24 bit pointers aren't uncommon.
32 bit systems typically have 32 bit pointers.
64 bit systems typically have 64 bit pointers.
In case you need to grab the absolute address for some reason, then uintptr_t is the portable integer type which should be large enough to hold any pointer address on the given system.

Related

Allocate small struct to 32 bit aligned in 64 bit system

The problem: I'm implementing a non-blocking data structure, where threads alter a shared pointer using a CAS operation. As pointers can be recycled, we have the ABA issue. To avoid this, I want to attach a version to each pointer. This is called a versioned pointer. A CAS128 is considered more expensive than a CAS64, so I'm trying to avoid going above 64 bits.
I'm trying to implement a versioned pointer. In a 32b system, the versioned pointer is a 64b struct, where the top 32 bits are the pointer and the bottom 32 is its version. This allows me to use CAS64 to atomically alter the pointer.
I'm having issues with a 64b system. In this case, I still want to use CAS64 instead of CAS128, so I'm trying to allocate a pointer aligned to 4GB (i.e., 32 zeros). I can then use masks to infer the pointer and version.
The solutions I've tried using alligned_malloc, padding, and std::align, but these involve allocating very large amounts of memory, e.g., alligned_malloc(1LL << 32, (1LL << 32)* sizeof(void*)) allocates 4GB of memory. Another solution is using a memory mapped file, but this involves synchronization that we're trying to avoid.
Is there a way to allocate 8B of memory aligned to 4GB that I'm missing?

First off, a non-portable solution that limits the code complexity creep to the point of allocation (see below for another approach that makes point of use more complicated, but should be portable); it only works on POSIX systems (not Windows), but you could reduce your overhead to the size of a page (not 8 bytes, but in the context of a 64 bit system, wasting 4088 bytes isn't too bad if you're not doing it too often; obviously, the nature of your problem means that you can't possibly waste more than sysconf(_SC_PAGESIZE) - 8 bytes per 4 GB, so that's not too bad) by the following mechanism:
mmap 4 GB of memory anonymously (not file-backed; pass fd of -1 and include the MAP_ANONYMOUS flag)
Compute the address of the 4 GB aligned pointer within that block
munmap the memory preceding that address, and the memory beginning sysconf(_SC_PAGE_SIZE) bytes after that address
This works because memory mappings aren't monolithic; they can be unmapped piecemeal, individual pages can be remapped without error, etc.
Note that if you're short on swap space, the brief request for 4 GB might cause problems (e.g. on a Linux system with heuristic overcommit disabled, it might fail to allocate the memory if it can't back it with swap, even though you never use most of it). You can experiment with passing MAP_NORESERVE to the original request, then performing the unmapping, then remapping that single page with MAP_FIXED (without MAP_NORESERVE) to ensure the allocation can be used without triggering a SIGSEGV at time of writing.
If you can't use POSIX mmap, should it really be impossible to use CAS128, you may want to consider a segmented memory model like the old x86 scheme for these pointers. You block allocate 4 GB segments (they don't need any special alignment) up front, and have your "pointers" be 32 bit offsets from the base address of the segment; you can't use the whole 64 bit address space (unless you allow for multiple selectors, possibly by repurposing part of the version number field for example; you can probably make do with a few million versions rather than four billion after all), but if you don't need to do so, this lets you have a base address that never changes after allocation (so no atomics needed), with offsets that fit within your desired 32 bit field. So instead of getting your data via:
data = *mystruct.pointer;
you have a segment pointer like this initialized early:
char *const base_address = new char[1ULL << 32]; // Or use smart pointer of your choosing
wrap it in a suballocator to partition the space, and now lookup is instead:
data = *reinterpret_cast<REAL_TYPE_HERE*>(&base_address[mystruct.pointer]);
I'm sure there are nifty ways to wrap this up better with custom allocators, custom operator news, what have you, but I've never had to do this in C++ (I've done similar magic in C, where there are no facilities to make it "pretty"), and I'd probably get it wrong, so I'll leave that as an exercise.

Memory alignment, structs and malloc

It's a bit hard to formulate what I want to know in a single question, so I'll try to break it down.
For example purposes, let's say we have the following struct:
struct X {
uint8_t a;
uint16_t b;
uint32_t c;
};
Is it true that the compiler is guaranteed to never rearrange the order of X's members, only add padding where necessary? In other words, is it always true that offsetof(X, a) < offsetof(X, c)?
Is it true that the compiler will pick the largest alignment among X's members and use it to align objects of type X (i.e. addresses of X instances will be divisible by the largest alignment among X's members)?
Since malloc does not know anything about the type of objects that we're going to store when we allocate a buffer, how does it choose an alignment for the returned address? Does it simply return an adress that is divisible by the largest alignment possible (in which case, no matter what structure we put in the buffer, the memory accesses will always be aligned)?

Yes
No, the compiler will use its knowledge of the target host hardware to select the best alignment.
See question 2.

Since malloc does not know anything about the type of objects that we're going to store when we allocate a buffer, how does it choose an alignment for the returned address?
malloc(3) returns "memory that is suitably aligned for any kind of variable."
Does it simply return an adress that is divisible by the largest alignment possible (in which case, no matter what structure we put in the buffer, the memory accesses will always be aligned)?
Yes, but watch your compliance with the strict aliasing rule.

The compiler will do whatever is most beneficial on that computer in the largest number of circumstances. On most platforms loading bus wide values on bus width offsets is fastest.
That means that generally on 32-bit computers compilers will chose to align 32-bit numbers on 4 byte offsets. On 64-bit computers 64-bit values are aligned on 8 byte offsets.
On most computers smaller values like 8-bit and 16-bit values are slower to load. Probably all 4 or 8 bytes around it are loaded and the byte or two bytes you need are masked off.
When you have special circumstances you can override the compiler by specifying the alignment and the padding. You might do this when you know fast loading is not important, but you really want to pack the data tightly. Or when you are playing very subtle tricks with casting and unions.
Memory allocation routines on almost any modern computer will always return memory that is aligned on at least the bus width of the platform ( e.g. 4 or 8 bytes ) - or even more - like 16 byte alignment.
When you call "malloc" you are responsible for knowing the size of the structures you need. Luckily the compiler will tell you the size of any structure with "sizeof". That means that if you pack a structure to save memory, sizeof will return a smaller value than an unpacked structure. So you really will save memory - if you are allocating small structures in large arrays of them.
If you allocate small packed structures one at a time - then yes - if you pack them or not it won't make any difference. That is because when you allocate some odd small piece of memory - the allocator will actually use significantly more memory than that. It will allocate an convenient sized block of memory for you, and then an additional block of memory for itself to keep track of you allocation.
Which is why if you care about memory use and want to pack your structures - you definitely don't want to allocate them one at time.

Why does Malloc() care about boundary alignments?

I've heard that malloc() aligns memory based on the type that is being allocated. For example, from the book Understanding and Using C Pointers:
The memory allocated will be aligned according to the pointer's data type. Fore example, a four-byte integer would be allocated on an address boundary evenly divisible by four.
If I follow, this means that
int *integer=malloc(sizeof(int)); will be allocated on an address boundary evenly divisible by four. Even without casting (int *) on malloc.
I was working on a chat server; I read of a similar effect with structs.
And I have to ask: logically, why does it matter what the address boundary itself is divisible on? What's wrong with allocating a group of memory to the tune of n*sizeof(int) using an integer on address 129?
I know how pointer arithmetic works *(integer+1), but I can't work out the importance of boundaries...

The memory allocated will be aligned according to the pointer's data
type.
If you are talking about malloc, this is false. malloc doesn't care what you do with the data and will allocate memory aligned to fit the most stringent native type of the implementation.
From the standard:
The pointer returned if the allocation succeeds is suitably aligned so
that it may be assigned to a pointer to any type of object with a
fundamental alignment requirement and then used to access such an
object or an array of such objects in the space allocated (until the
space is explicitly deallocated)
And:
Logically, why does it matter what the address boundary itself is
divisible on
Due to the workings of the underlying machine, accessing unaligned data might be more expensive (e.g. x86) or illegal (e.g. arm). This lets the hardware take shortcuts that improve performance / simplify implementation.

In many processors, data that isn't aligned will cause a "trap" or "exception" (this is a different form of exception than those understood by the C++ compiler. Even on processors that don't trap when data isn't aligned, it is typically slower (twice as slow, for example) when the data is not correctly aligned. So it's in the compiler's/runtime library's best interest to ensure that things are nicely aligned.
And by the way, malloc (typically) doesn't know what you are allocating. Insteat, malloc will align ALL data, no matter what size it is, to some suitable boundary that is "good enough" for general data-access - typically 8 or 16 bytes in modern OS/processor combinations, 4 bytes in older systems.
This is because malloc won't know if you do char* p = malloc(1000); or double* p = malloc(1000);, so it has to assume you are storing double or whatever is the item with the largest alignment requirement.

The importance of alignment is not a language issue but a hardware issue. Some machines are incapable of reading a data value that is not properly aligned. Others can do it but do so less efficiently, e.g., requiring two reads to read one misaligned value.

The book quote is wrong; the memory returned by malloc is guaranteed to be aligned correctly for any type. Even if you write char *ch = malloc(37);, it is still aligned for int or any other type.
You seem to be asking "What is alignment?" If so, there are several questions on SO about this already, e.g. here, or a good explanation from IBM here.

It depends on the hardware. Even assuming int is 32 bits, malloc(sizeof(int)) could return an address divisible by 1, 2, or 4. Different processors handle unaligned access differently.
Processors don't read directly from RAM any more, that's too slow (it takes hundreds of cycles). So when they do grab RAM, they grab it in big chunks, like 64 bytes at a time. If your address isn't aligned, the 4-byte integer might straddle two 64-byte cache lines, so your processor has to do two loads and fix up the result. Or maybe the engineers decided that building the hardware to fix up unaligned loads isn't necessary, so the processor signals an exception: either your program crashes, or the operating system catches the exception and fixes up the operation (hundreds of wasted cycles).
Aligning addresses means your program plays nicely with hardware.

Because it's more fast; Most processor likes data which is aligned. Even, Some processor CANNOT access data which is not aligned! (If you try to access this data, processor may occur fault)

Is there a memory overhead associated with heap memory allocations (eg markers in the heap)?

Thinking in particular of C++ on Windows using a recent Visual Studio C++ compiler, I am wondering about the heap implementation:
Assuming that I'm using the release compiler, and I'm not concerned with memory fragmentation/packing issues, is there a memory overhead associated with allocating memory on the heap? If so, roughly how many bytes per allocation might this be?
Would it be larger in 64-bit code than 32-bit?
I don't really know a lot about modern heap implementations, but am wondering whether there are markers written into the heap with each allocation, or whether some kind of table is maintained (like a file allocation table).
On a related point (because I'm primarily thinking about standard-library features like 'map'), does the Microsoft standard-library implementation ever use its own allocator (for things like tree nodes) in order to optimize heap usage?

Yes, absolutely.
Every block of memory allocated will have a constant overhead of a "header", as well as a small variable part (typically at the end). Exactly how much that is depends on the exact C runtime library used. In the past, I've experimentally found it to be around 32-64 bytes per allocation. The variable part is to cope with alignment - each block of memory will be aligned to some nice even 2^n base-address - typically 8 or 16 bytes.
I'm not familiar with how the internal design of std::map or similar works, but I very much doubt they have special optimisations there.
You can quite easily test the overhead by:
char *a, *b;
a = new char;
b = new char;
ptrdiff_t diff = a - b;
cout << "a=" << a << " b=" << b << " diff=" << diff;
[Note to the pedants, which is probably most of the regulars here, the above a-b expression invokes undefined behaviour, since subtracting the address of one piece of allocated and the address of another, is undefined behaviour. This is to cope with machines that don't have linear memory addresses, e.g. segmented memory or "different types of data is stored in locations based on their type". The above should definitely work on any x86-based OS that doesn't use a segmented memory model with multiple data segments in for the heap - which means it works for Windows and Linux in 32- and 64-bit mode for sure].
You may want to run it with varying types - just bear in mind that the diff is in "number of the type, so if you make it int *a, *b will be in "four bytes units". You could make a reinterpret_cast<char*>(a) - reinterpret_cast<char *>(b);
[diff may be negative, and if you run this in a loop (without deleting a and b), you may find sudden jumps where one large section of memory is exhausted, and the runtime library allocated another large block]

Visual C++ embeds control information (links/sizes and possibly some checksums) near the boundaries of allocated buffers. That also helps to catch some buffer overflows during memory allocation and deallocation.
On top of that you should remember that malloc() needs to return pointers suitably aligned for all fundamental types (char, int, long long, double, void*, void(*)()) and that alignment is typically of the size of the largest type, so it could be 8 or even 16 bytes. If you allocate a single byte, 7 to 15 bytes can be lost to alignment only. I'm not sure if operator new has the same behavior, but it may very well be the case.
This should give you an idea. The precise memory waste can only be determined from the documentation (if any) or testing. The language standard does not define it in any terms.

Yes. All practical dynamic memory allocators have a minimal granularity1. For example, if the granularity is 16 bytes and you request only 1 byte, the whole 16 bytes is allocated nonetheless. If you ask for 17 bytes, a block whose size is 32 bytes is allocated etc...
There is also a (related) issue of alignment.2
Quite a few allocators seem to be a combination of a size map and free lists - they split potential allocation sizes to "buckets" and keep a separate free list for each of them. Take a look at Doug Lea's malloc. There are many other allocation techniques with various tradeoffs but that goes beyond the scope here...
1 Typically 8 or 16 bytes. If the allocator uses a free list then it must encode two pointers inside every free slot, so a free slot cannot be smaller than 8 bytes (on 32-bit) or 16 byte (on 16-bit). For example, if allocator tried to split a 8-byte slot to satisfy a 4-byte request, the remaining 4 bytes would not have enough room to encode the free list pointers.
2 For example, if the long long on your platform is 8 bytes, then even if the allocator's internal data structures can handle blocks smaller than that, actually allocating the smaller block might push the next 8-byte allocation to an unaligned memory address.

What are near, far and huge pointers?

Can anyone explain to me these pointers with a suitable example ... and when these pointers are used?

The primary example is the Intel X86 architecture.
The Intel 8086 was, internally, a 16-bit processor: all of its registers were 16 bits wide. However, the address bus was 20 bits wide (1 MiB). This meant that you couldn't hold an entire address in a register, limiting you to the first 64 kiB.
Intel's solution was to create 16-bit "segment registers" whose contents would be shifted left four bits and added to the address. For example:
DS ("Data Segment") register: 1234 h
DX ("D eXtended") register: + 5678h
------
Actual address read: 179B8h
This created the concept of 64 kiB segment. Thus a "near" pointer would just be the contents of the DX register (5678h), and would be invalid unless the DS register was already set correctly, while a "far" pointer was 32 bits (12345678h, DS followed by DX) and would always work (but was slower since you had to load two registers and then restore the DS register when done).
(As supercat notes below, an offset to DX that overflowed would "roll over" before being added to DS to get the final address. This allowed 16-bit offsets to access any address in the 64 kiB segment, not just the part that was ± 32 kiB from where DX pointed, as is done in other architectures with 16-bit relative offset addressing in some instructions.)
However, note that you could have two "far" pointers that are different values but point to the same address. For example, the far pointer 100079B8h points to the same place as 12345678h. Thus, pointer comparison on far pointers was an invalid operation: the pointers could differ, but still point to the same place.
This was where I decided that Macs (with Motorola 68000 processors at the time) weren't so bad after all, so I missed out on huge pointers. IIRC, they were just far pointers that guaranteed that all the overlapping bits in the segment registers were 0's, as in the second example.
Motorola didn't have this problem with their 6800 series of processors, since they were limited to 64 kiB, When they created the 68000 architecture, they went straight to 32 bit registers, and thus never had need for near, far, or huge pointers. (Instead, their problem was that only the bottom 24 bits of the address actually mattered, so some programmers (notoriously Apple) would use the high 8 bits as "pointer flags", causing problems when address buses expanded to 32 bits (4 GiB).)
Linus Torvalds just held out until the 80386, which offered a "protected mode" where the addresses were 32 bits, and the segment registers were the high half of the address, and no addition was needed, and wrote Linux from the outset to use protected mode only, no weird segment stuff, and that's why you don't have near and far pointer support in Linux (and why no company designing a new architecture will ever go back to them if they want Linux support). And they ate Robin's minstrels, and there was much rejoicing. (Yay...)

Difference between far and huge pointers:
As we know by default the pointers are near for example: int *p is a near pointer. Size of near pointer is 2 bytes in case of 16 bit compiler. And we already know very well size varies compiler to compiler; they only store the offset of the address the pointer it is referencing. An address consisting of only an offset has a range of 0 - 64K bytes.
Far and huge pointers:
Far and huge pointers have a size of 4 bytes. They store both the segment and the offset of the address the pointer is referencing. Then what is the difference between them?
Limitation of far pointer:
We cannot change or modify the segment address of given far address by applying any arithmetic operation on it. That is by using arithmetic operator we cannot jump from one segment to other segment.
If you will increment the far address beyond the maximum value of its offset address instead of incrementing segment address it will repeat its offset address in cyclic order. This is also called wrapping, i.e. if offset is 0xffff and we add 1 then it is 0x0000 and similarly if we decrease 0x0000 by 1 then it is 0xffff and remember there is no change in the segment.
Now I am going to compare huge and far pointers :
1.When a far pointer is incremented or decremented ONLY the offset of the pointer is actually incremented or decremented but in case of huge pointer both segment and offset value will change.
Consider the following Example, taken from HERE :
int main()
{
char far* f=(char far*)0x0000ffff;
printf("%Fp",f+0x1);
return 0;
}
then the output is:
0000:0000
There is no change in segment value.
And in case of huge Pointers :
int main()
{
char huge* h=(char huge*)0x0000000f;
printf("%Fp",h+0x1);
return 0;
}
The Output is:
0001:0000
This is because of increment operation not only offset value but segment value also change.That means segment will not change in case of far pointers but in case of huge pointer, it can move from one segment to another .
2.When relational operators are used on far pointers only the offsets are compared.In other words relational operators will only work on far pointers if the segment values of the pointers being compared are the same. And in case of huge this will not happen, actually comparison of absolute addresses takes place.Let us understand with the help of an example of far pointer :
int main()
{
char far * p=(char far*)0x12340001;
char far* p1=(char far*)0x12300041;
if(p==p1)
printf("same");
else
printf("different");
return 0;
}
Output:
different
In huge pointer :
int main()
{
char huge * p=(char huge*)0x12340001;
char huge* p1=(char huge*)0x12300041;
if(p==p1)
printf("same");
else
printf("different");
return 0;
}
Output:
same
Explanation: As we see the absolute address for both p and p1 is 12341 (1234*10+1 or 1230*10+41) but they are not considered equal in 1st case because in case of far pointers only offsets are compared i.e. it will check whether 0001==0041. Which is false.
And in case of huge pointers, the comparison operation is performed on absolute addresses that are equal.
A far pointer is never normalized but a huge pointer is normalized . A normalized pointer is one that has as much of the address as possible in the segment, meaning that the offset is never larger than 15.
suppose if we have 0x1234:1234 then the normalized form of it is 0x1357:0004(absolute address is 13574).
A huge pointer is normalized only when some arithmetic operation is performed on it, and not normalized during assignment.
int main()
{
char huge* h=(char huge*)0x12341234;
char huge* h1=(char huge*)0x12341234;
printf("h=%Fp\nh1=%Fp",h,h1+0x1);
return 0;
}
Output:
h=1234:1234
h1=1357:0005
Explanation:huge pointer is not normalized in case of assignment.But if an arithmetic operation is performed on it, it will be normalized.So, h is 1234:1234 and h1 is 1357:0005which is normalized.
4.The offset of huge pointer is less than 16 because of normalization and not so in case of far pointers.
lets take an example to understand what I want to say :
int main()
{
char far* f=(char far*)0x0000000f;
printf("%Fp",f+0x1);
return 0;
}
Output:
0000:0010
In case of huge pointer :
int main()
{
char huge* h=(char huge*)0x0000000f;
printf("%Fp",h+0x1);
return 0;
}
Output:
0001:0000
Explanation:as we increment far pointer by 1 it will be 0000:0010.And as we increment huge pointer by 1 then it will be 0001:0000 because it's offset cant be greater than 15 in other words it will be normalized.

In the old days, according to the Turbo C manual, a near pointer was merely 16 bits when your entire code and data fit in the one segment. A far pointer was composed of a segment as well as an offset but no normalisation was performed. And a huge pointer was automatically normalised. Two far pointers could conceivably point to the same location in memory but be different whereas the normalised huge pointers pointing to the same memory location would always be equal.

All of the stuff in this answer is relevant only to the old 8086 and 80286 segmented memory model.
near: a 16 bit pointer that can address any byte in a 64k segment
far: a 32 bit pointer that contains a segment and an offset. Note that because segments can overlap, two different far pointers can point to the same address.
huge: a 32 bit pointer in which the segment is "normalised" so that no two far pointers point to the same address unless they have the same value.
tee: a drink with jam and bread.
That will bring us back to doh oh oh oh
and when these pointers are used?
in the 1980's and 90' until 32 bit Windows became ubiquitous,

In some architectures, a pointer which can point to every object in the system will be larger and slower to work with than one which can point to a useful subset of things. Many people have given answers related to the 16-bit x86 architecture. Various types of pointers were common on 16-bit systems, though near/fear distinctions could reappear in 64-bit systems, depending upon how they're implemented (I wouldn't be surprised if many development systems go to 64-bit pointers for everything, despite the fact that in many cases that will be very wasteful).
In many programs, it's pretty easy to subdivide memory usage into two categories: small things which together total up to a fairly small amount of stuff (64K or 4GB) but will be accessed often, and larger things which may total up to much larger quantity, but which need not be accessed so often. When an application needs to work with part of an object in the "large things" area, it copies that part to the "small things" area, works with it, and if necessary writes it back.
Some programmers gripe at having to distinguish between "near" and "far" memory, but in many cases making such distinctions can allow compilers to produce much better code.
(note: Even on many 32-bit systems, certain areas of memory can be accessed directly without extra instructions, while other areas cannot. If, for example, on a 68000 or an ARM, one keeps a register pointing at global variable storage, it will be possible to directly load any variable within the first 32K (68000) or 2K (ARM) of that register. Fetching a variable stored elsewhere will require an extra instruction to compute the address. Placing more frequently-used variables in the preferred regions and letting the compiler know would allow for more efficient code generation.

This terminology was used in 16 bit architectures.
In 16 bit systems, data was partitioned into 64Kb segments. Each loadable module (program file, dynamically loaded library etc) had an associated data segment - which could store up to 64Kb of data only.
A NEAR pointer was a pointer with 16 bit storage, and referred to data (only) in the current modules data segment.
16bit programs that had more than 64Kb of data as a requirement could access special allocators that would return a FAR pointer - which was a data segment id in the upper 16 bits, and a pointer into that data segment, in the lower 16 bits.
Yet larger programs would want to deal with more than 64Kb of contiguous data. A HUGE pointer looks exactly like a far pointer - it has 32bit storage - but the allocator has taken care to arrange a range of data segments, with consecutive IDs, so that by simply incrementing the data segment selector the next 64Kb chunk of data can be reached.
The underlying C and C++ language standards never really recognized these concepts officially in their memory models - all pointers in a C or C++ program are supposed to be the same size. So the NEAR, FAR and HUGE attributes were extensions provided by the various compiler vendors.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js