Why is the value of an address in C always even? - c++

Why is the value of an address in C and C++ always even?
For example: I declare a variable int x and x has the memory address 0x6ffe1c (in hexadecimal). No matter what, that value is never an odd number; it is always even. Why is that so?

Computer memory is composed of bits. The bits are organized into groups. A computer may have a gigabyte of memory, which is over 1,000,000,000 bytes or 8,000,000,000 bits, but the physical connections to memory cannot simply get any one particular bit from that memory.
When the processor wants data from memory, it puts a signal on a bus that asks for a particular word of memory. A bus is largely a set of wires that connects different parts of the computer. A word of memory is a group of bits of some size particular to that hardware, perhaps 32 bits. When the memory device sees a request for a word, it gets those bits and puts them on the bus, all at once. (The bus for that will have 32 or more wires, so it can carry all the data for one word at one time.)
Let’s continue with the example of 32-bit words. Since memory is grouped into words of 32 bits, each word starts at a memory address that is a multiple of four bytes (32 bits). And every address that is a multiple of four (0, 4, 8, 12, 16, … 4096, 4100, 4104, …) is the address of a word. The processor always reads or writes memory in units of words - that is the only interaction the hardware can do; the processor cannot read individual bytes from memory. If your int is in a single word, then the processor can get it from memory by asking for that word.
On the other hand, suppose your int starts at address 99. Then one byte of it is in the word that starts at address 96 (addresses 96 to 99), and three bytes of it are in the word that starts at address 100 (addresses 100 to 103). In order to get your int, the processor has to read two words and then stitch together bytes from them to make one int.
First, that is a waste of time. Doing two reads from memory takes longer than doing one read. Second, if the processor has to have extra wires and circuits for doing that, it makes the processor more expensive and use more energy, and it takes resources away from other things the processor could be doing, like adding or multiplying.
So processors are designed to prefer aligned data. They may have components for handling unaligned data, but using those components may take extra time or resources. So compilers are designed to align objects in ways that are preferable for the target architecture.
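As a small illustration of that compiler behaviour, here is a minimal C++ sketch (the struct and the printed values are purely illustrative; the exact padding is target-dependent) showing how the compiler inserts padding so that an int member starts on the boundary the hardware prefers:

#include <cstddef>
#include <cstdio>

struct Mixed {
    char c;   // needs no particular alignment
    int  i;   // typically needs a 4-byte boundary, so 3 bytes of padding are usually inserted before it
};

int main() {
    // On common 32-/64-bit targets this prints: sizeof=8 offsetof(i)=4
    std::printf("sizeof=%zu offsetof(i)=%zu\n", sizeof(Mixed), offsetof(Mixed, i));
    return 0;
}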

Why is the value of an address in C and C++ always even?
This is not true in general. For example, if you have an array char[2], you'll find that exactly one of those elements has an even address, and the other must have an odd address.
I declare a variable int x and x has a memory address 0x6ffe1c
Every type in C and C++ has some alignment requirement. That is, objects of that type must be stored at an address that is divisible by that alignment. An alignment requirement is an integer that is always a power of two.
There is exactly one power of two that is not even: 2^0 == 1. Objects with an alignment requirement of 1 can be stored at odd addresses. char always has a size and alignment of 1 byte. int typically has a higher alignment requirement, in which case it will be stored at an even address.
The reason why alignment is important is that there are CPU instruction sets which only allow reading and writing to memory addresses that are aligned to the width of the CPU word. Other CPU instruction sets may support operations on addresses aligned to some fractions of the word size. Further, some CPU instruction sets, such as the x86 support operating on entirely unaligned address but (at least on older models), such operations may be much slower.
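A short C++ sketch of the points above (alignof(char) is guaranteed to be 1; the other values are implementation-defined, but the ones in the comments are typical):

#include <cstdio>

int main() {
    char c[2];   // alignment 1: one of the two elements necessarily sits at an odd address
    int  x = 0;  // alignment typically 4: &x is then a multiple of 4, hence even

    std::printf("alignof(char)=%zu alignof(int)=%zu\n", alignof(char), alignof(int));
    std::printf("&c[0]=%p &c[1]=%p &x=%p\n",
                static_cast<void*>(&c[0]), static_cast<void*>(&c[1]), static_cast<void*>(&x));
    return 0;
}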

Related

Cache mapping techniques

I'm trying to understand hardware caches. I have a slight idea, but I would like to ask here whether my understanding is correct or not.
So I understand that there are 3 types of cache mapping: direct, fully associative and set associative.
I would like to know: is the type of mapping implemented with logic gates in hardware and specific to some computer system, so that in order to change the mapping one would be required to change the electrical connections?
My current understanding is that in RAM, there exists a memory address to refer to each block of memory. Within a block there are words, and each word contains a number of bytes. We can represent the number of options with a number of bits.
So for example, with 4096 memory locations, each containing 16 bytes, referring to each byte requires 2^12 * 2^4 = 2^16 options, so a 16-bit memory address would be required to refer to each byte.
The cache also has a memory address, a valid bit, a tag, and some data capable of storing a block of main memory of n words and thus m bytes, where m = n*i (i bytes per word).
For example, direct mapping:
A block of main memory can only be at one particular location in the cache. When the CPU requests some data using a 16-bit memory address of RAM, it checks the cache first.
How does it know that this particular 16-bit memory address can only be in a few places?
My thoughts are that there could be some electrical connection between every RAM address and a cache address. The 16-bit address could then be split into parts: for example, only compare the left 8 bits with every cache memory address, then if they match compare the byte bits, then the tag bits, then the valid bit.
Is my understanding correct? Thank you!
I really do appreciate it if someone reads this long post.
You may want to read 3.3.1 Associativity in What Every Programmer Should Know About Memory from Ulrich Drepper.
https://people.freebsd.org/~lstewart/articles/cpumemory.pdf#subsubsection.3.3.1
The title is a little bit catchy, but it explains everything you ask in detail.
In short:
The problem with caches is the number of comparisons. If your cache holds 100 blocks, you need to perform 100 comparisons in one cycle. You can reduce this number with the introduction of sets: if a specific memory region can only be placed in slots 1-10, you reduce the number of comparisons to 10.
The sets are addressed by an additional bit-field inside the memory-address called index.
So for instance your 16 bits (from your example) could be split into:
[15:6] block address; stored in the `cache` as the `tag` to identify the block
[5:4] index bits; 2 bits --> 4 sets
[3:0] block offset; byte position inside the block
So the choice of method depends on the available hardware resources and the access time you want to achieve. It is pretty much hardwired, since you want to reduce the comparison logic.
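As a rough sketch of the split above (assuming the hypothetical 10-bit tag / 2-bit index / 4-bit block-offset layout), the lookup effectively performs this bit arithmetic:

#include <cstdint>
#include <cstdio>

int main() {
    std::uint16_t addr = 0xABCD;               // some 16-bit address

    unsigned offset = addr & 0xF;              // bits [3:0]  - byte inside the 16-byte block
    unsigned index  = (addr >> 4) & 0x3;       // bits [5:4]  - which of the 4 sets to search
    unsigned tag    = addr >> 6;               // bits [15:6] - compared against the stored tags of that set

    std::printf("tag=0x%X index=%u offset=%u\n", tag, index, offset);
    return 0;
}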
There are a few mapping functions used to map cache lines to main memory:
Direct Mapping
Associative Mapping
Set-Associative Mapping
You should have at least a basic idea of these three mapping functions.

How does std::alignas optimize the performance of a program?

On a 32-bit machine, one memory read cycle gets 4 bytes of data.
So it should take 32 read cycles to read the 128-byte buffer below.
char buffer[128];
Now, suppose I have aligned this buffer as shown below; please let me know how that will make it faster to read.
alignas(128) char buffer[128];
I am assuming the memory read cycle will remain 4 bytes.
The size of the registers used for memory access is only one part of the story, the other part is the size of the cache-line.
If a cache-line is 64 bytes and your char[128] is naturally aligned, the CPU generally needs to manipulate three different cache-lines. With alignas(64) or alignas(128), only two cache-lines need to be touched.
If you are working with memory mapped file, or under swapping conditions, the next level of alignment kicks in: the size of a memory page. This would call for 4096 or 8192 byte alignments.
However, I seriously doubt that alignas() has any significant positive effect if the specified alignment is larger than the natural alignment that the compiler uses anyway: It significantly increases memory consumption, which may be enough to trigger more cache-lines/memory pages being touched in the first place. It's only the small misalignments that need to be avoided because they may trigger huge slowdowns on some CPUs, or might be downright illegal/impossible on others.
Thus, truth is only in measurement: If you need all the speedup you can get, try it, measure the runtime difference, and see whether it works out.
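As a starting point for such an experiment, here is a hedged C++ sketch that merely sets up the two buffers and verifies their alignment (the 64-byte cache-line size is an assumption; a real test would time accesses to each buffer):

#include <cstdint>
#include <cstdio>

alignas(64) char aligned_buffer[128];   // starts on an assumed cache-line boundary: spans exactly 2 lines of 64 bytes
char plain_buffer[128];                 // alignment 1: may straddle 3 cache lines depending on where it lands

int main() {
    std::printf("aligned_buffer %% 64 = %u\n",
                static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(aligned_buffer) % 64));
    std::printf("plain_buffer   %% 64 = %u\n",
                static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(plain_buffer) % 64));
    return 0;
}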
On a 32-bit machine, one memory read cycle gets 4 bytes of data.
It's not that simple. Just the term "32 bit machine" is already too broad and can mean many things. 32b registers (GP registers? ALU registers? Address registers?)? 32b address bus? 32b data bus? 32b instruction word size?
And "memory read" by whom. CPU? Cache? DMA chip?
If you have a HW platform where memory is read by 4 bytes (aligned by 4) in single cycle and without any cache, then alignas(128) will do no difference (than alignas(4)).

Why does Malloc() care about boundary alignments?

I've heard that malloc() aligns memory based on the type that is being allocated. For example, from the book Understanding and Using C Pointers:
The memory allocated will be aligned according to the pointer's data type. For example, a four-byte integer would be allocated on an address boundary evenly divisible by four.
If I follow, this means that
int *integer=malloc(sizeof(int)); will be allocated on an address boundary evenly divisible by four. Even without casting (int *) on malloc.
I was working on a chat server; I read of a similar effect with structs.
And I have to ask: logically, why does it matter what the address boundary itself is divisible by? What's wrong with allocating a group of memory to the tune of n*sizeof(int) using an integer at address 129?
I know how pointer arithmetic works *(integer+1), but I can't work out the importance of boundaries...
The memory allocated will be aligned according to the pointer's data type.
If you are talking about malloc, this is false. malloc doesn't care what you do with the data and will allocate memory aligned to fit the most stringent native type of the implementation.
From the standard:
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated)
And:
Logically, why does it matter what the address boundary itself is divisible by?
Due to the workings of the underlying machine, accessing unaligned data might be more expensive (e.g. x86) or illegal (e.g. arm). This lets the hardware take shortcuts that improve performance / simplify implementation.
In many processors, data that isn't aligned will cause a "trap" or "exception" (this is a different form of exception than those understood by the C++ compiler). Even on processors that don't trap when data isn't aligned, it is typically slower (twice as slow, for example) when the data is not correctly aligned. So it's in the compiler's/runtime library's best interest to ensure that things are nicely aligned.
And by the way, malloc (typically) doesn't know what you are allocating. Instead, malloc will align ALL data, no matter what size it is, to some suitable boundary that is "good enough" for general data access - typically 8 or 16 bytes in modern OS/processor combinations, 4 bytes in older systems.
This is because malloc won't know if you do char* p = malloc(1000); or double* p = malloc(1000);, so it has to assume you are storing double or whatever is the item with the largest alignment requirement.
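A minimal sketch of that guarantee (the value of alignof(max_align_t) is implementation-defined; 8 or 16 is common):

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

int main() {
    void* p = std::malloc(1);   // even a 1-byte request gets a fully aligned block
    if (p == nullptr)
        return 1;

    std::printf("alignof(max_align_t) = %zu\n", alignof(std::max_align_t));
    std::printf("address %% alignof(max_align_t) = %u\n",   // always prints 0
                static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(p) % alignof(std::max_align_t)));

    std::free(p);
    return 0;
}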
The importance of alignment is not a language issue but a hardware issue. Some machines are incapable of reading a data value that is not properly aligned. Others can do it but do so less efficiently, e.g., requiring two reads to read one misaligned value.
The book quote is wrong; the memory returned by malloc is guaranteed to be aligned correctly for any type. Even if you write char *ch = malloc(37);, it is still aligned for int or any other type.
You seem to be asking "What is alignment?" If so, there are several questions on SO about this already, e.g. here, or a good explanation from IBM here.
It depends on the hardware. Even assuming int is 32 bits, malloc(sizeof(int)) could return an address divisible by 1, 2, or 4. Different processors handle unaligned access differently.
Processors don't read directly from RAM any more, that's too slow (it takes hundreds of cycles). So when they do grab RAM, they grab it in big chunks, like 64 bytes at a time. If your address isn't aligned, the 4-byte integer might straddle two 64-byte cache lines, so your processor has to do two loads and fix up the result. Or maybe the engineers decided that building the hardware to fix up unaligned loads isn't necessary, so the processor signals an exception: either your program crashes, or the operating system catches the exception and fixes up the operation (hundreds of wasted cycles).
Aligning addresses means your program plays nicely with hardware.
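To make the straddling concrete, here is a small sketch that counts how many cache lines a 4-byte int touches depending on its starting address (the 64-byte line size is just an assumption):

#include <cstddef>
#include <cstdint>
#include <cstdio>

// Number of cache lines of size `line` spanned by `size` bytes starting at `addr`.
static unsigned lines_touched(std::uintptr_t addr, std::size_t size, std::size_t line) {
    std::uintptr_t first = addr / line;
    std::uintptr_t last  = (addr + size - 1) / line;
    return static_cast<unsigned>(last - first + 1);
}

int main() {
    std::printf("int at 0x100: %u line(s)\n", lines_touched(0x100, 4, 64)); // aligned: 1 line
    std::printf("int at 0x13E: %u line(s)\n", lines_touched(0x13E, 4, 64)); // crosses a line boundary: 2 lines
    return 0;
}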
Because it's faster; most processors prefer data that is aligned. Some processors CANNOT access data that is not aligned at all! (If you try to access such data, the processor may raise a fault.)

Virtual memory and alignment - how do they factor together?

I think I understand memory alignment, but what confuses me is that the address of a pointer on some systems is going to be in virtual memory, right? So most of the checking/ensuring of alignment I have seen seems to just use the pointer address. Is it not possible that the physical memory address will not be aligned? Isn't that problematic for things like SSE?
The physical address will be aligned because virtual memory only maps aligned pages to physical memory (and the pages are typically 4KB).
So unless you need alignment > page size, the physical memory will be aligned as per your requirements.
In the specific case of SSE, everything works fine because you only need 16 byte alignment.
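The underlying reason is that address translation only replaces the page number; the offset within the (here assumed 4 KiB) page is carried through unchanged, so any alignment up to the page size survives the translation. A sketch of that arithmetic, with made-up addresses:

#include <cstdint>
#include <cstdio>

int main() {
    const std::uint64_t page_size   = 4096;            // assumed 4 KiB pages
    std::uint64_t virtual_addr      = 0x7f1234561010;  // some 16-byte-aligned virtual address
    std::uint64_t physical_frame    = 0x000089abc000;  // wherever the OS happened to map that page (page-aligned)

    // Translation swaps the page number but keeps the in-page offset.
    std::uint64_t offset        = virtual_addr % page_size;
    std::uint64_t physical_addr = physical_frame + offset;

    std::printf("virtual  %% 16 = %u\n", static_cast<unsigned>(virtual_addr % 16));
    std::printf("physical %% 16 = %u\n", static_cast<unsigned>(physical_addr % 16)); // same remainder
    return 0;
}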
I am not aware of any actual system in which an aligned virtual memory address can result in a misaligned physical memory address.
Typically, all alignments on a given platform will be powers of two. For example, on x86, 32-bit integers have a natural alignment of 4 bytes (2^2). The page size - which defines how fine a block you can map in physical memory - is generally a large power of two; on x86, the smallest allowable page size is 4096 bytes (2^12). The largest alignment commonly needed on x86 is 32 bytes (2^5, for AVX), larger than the 16 bytes needed for XMM registers and CMPXCHG16B. Since 2^12 is divisible by 2^5, you'll find that everything aligns right at the start of a page, and since pages are aligned both in virtual and physical memory, a virtual-aligned address will always be physical-aligned.
On a more practical level, allowing aligned virtual addresses to map to unaligned physical addresses not only would make it really hard to generate code, it would also make the CPU architecture more complex than simply allowing any alignment (since now we have odd-sized pages and other weirdness...)
Note that you may have reason to ask for larger alignments than a page from time to time. Typically, for user space coding, it doesn't matter if this is aligned in physical RAM (for that matter, if you're requesting multiple pages, it's unlikely to be even contiguous!). Problems here only arise if you're writing a device driver and need a large, aligned, contiguous block for DMA. But even then usually the device isn't a stickler about larger-than-page-size alignment.

What are near, far and huge pointers?

Can anyone explain these pointers to me with a suitable example ... and when are these pointers used?
The primary example is the Intel X86 architecture.
The Intel 8086 was, internally, a 16-bit processor: all of its registers were 16 bits wide. However, the address bus was 20 bits wide (1 MiB). This meant that you couldn't hold an entire address in a register, limiting you to the first 64 kiB.
Intel's solution was to create 16-bit "segment registers" whose contents would be shifted left four bits and added to the address. For example:
DS ("Data Segment") register: 1234 h
DX ("D eXtended") register: + 5678h
------
Actual address read: 179B8h
This created the concept of a 64 kiB segment. Thus a "near" pointer would just be the contents of the DX register (5678h), and would be invalid unless the DS register was already set correctly, while a "far" pointer was 32 bits (12345678h, DS followed by DX) and would always work (but was slower, since you had to load two registers and then restore the DS register when done).
(As supercat notes below, an offset to DX that overflowed would "roll over" before being added to DS to get the final address. This allowed 16-bit offsets to access any address in the 64 kiB segment, not just the part that was ± 32 kiB from where DX pointed, as is done in other architectures with 16-bit relative offset addressing in some instructions.)
However, note that you could have two "far" pointers that are different values but point to the same address. For example, the far pointer 100079B8h points to the same place as 12345678h. Thus, pointer comparison on far pointers was an invalid operation: the pointers could differ, but still point to the same place.
This was where I decided that Macs (with Motorola 68000 processors at the time) weren't so bad after all, so I missed out on huge pointers. IIRC, they were just far pointers that guaranteed that all the overlapping bits in the segment registers were 0's, as in the second example.
Motorola didn't have this problem with their 6800 series of processors, since they were limited to 64 kiB. When they created the 68000 architecture, they went straight to 32-bit registers, and thus never had need for near, far, or huge pointers. (Instead, their problem was that only the bottom 24 bits of the address actually mattered, so some programmers (notoriously Apple) would use the high 8 bits as "pointer flags", causing problems when address buses expanded to 32 bits (4 GiB).)
Linus Torvalds just held out until the 80386, which offered a "protected mode" where the addresses were 32 bits, and the segment registers were the high half of the address, and no addition was needed, and wrote Linux from the outset to use protected mode only, no weird segment stuff, and that's why you don't have near and far pointer support in Linux (and why no company designing a new architecture will ever go back to them if they want Linux support). And they ate Robin's minstrels, and there was much rejoicing. (Yay...)
Difference between far and huge pointers:
As we know, by default pointers are near; for example, int *p is a near pointer. The size of a near pointer is 2 bytes in the case of a 16-bit compiler (and we already know very well that sizes vary from compiler to compiler). Near pointers only store the offset of the address they reference. An address consisting of only an offset has a range of 0 - 64K bytes.
Far and huge pointers:
Far and huge pointers have a size of 4 bytes. They store both the segment and the offset of the address the pointer is referencing. Then what is the difference between them?
Limitation of far pointers:
We cannot change or modify the segment address of a given far address by applying any arithmetic operation on it. That is, by using arithmetic operators we cannot jump from one segment to another.
If you increment a far address beyond the maximum value of its offset, then instead of incrementing the segment address the offset wraps around in cyclic order. That is, if the offset is 0xffff and we add 1 then it becomes 0x0000, and similarly if we decrease 0x0000 by 1 then it becomes 0xffff; and remember, there is no change in the segment.
Now I am going to compare huge and far pointers:
1. When a far pointer is incremented or decremented, ONLY the offset of the pointer is actually incremented or decremented; in the case of a huge pointer, both the segment and the offset can change.
Consider the following example, taken from HERE:
#include <stdio.h>

int main()
{
    char far* f = (char far*)0x0000ffff;
    printf("%Fp", f + 0x1);
    return 0;
}
then the output is:
0000:0000
There is no change in segment value.
And in the case of huge pointers:
#include <stdio.h>

int main()
{
    char huge* h = (char huge*)0x0000000f;
    printf("%Fp", h + 0x1);
    return 0;
}
The Output is:
0001:0000
This is because, on the increment operation, not only the offset value but also the segment value changes. That means the segment will not change in the case of far pointers, but in the case of huge pointers it can move from one segment to another.
2. When relational operators are used on far pointers, only the offsets are compared. In other words, relational operators will only work on far pointers if the segment values of the pointers being compared are the same. In the case of huge pointers this does not happen; instead, a comparison of absolute addresses takes place. Let us understand this with the help of an example of far pointers:
#include <stdio.h>

int main()
{
    char far* p = (char far*)0x12340001;
    char far* p1 = (char far*)0x12300041;
    if (p == p1)
        printf("same");
    else
        printf("different");
    return 0;
}
Output:
different
In the case of huge pointers:
#include <stdio.h>

int main()
{
    char huge* p = (char huge*)0x12340001;
    char huge* p1 = (char huge*)0x12300041;
    if (p == p1)
        printf("same");
    else
        printf("different");
    return 0;
}
Output:
same
Explanation: As we see, the absolute address for both p and p1 is 0x12341 (0x1234*0x10 + 0x1, or 0x1230*0x10 + 0x41), but they are not considered equal in the first case because for far pointers only the offsets are compared, i.e. it checks whether 0x0001 == 0x0041, which is false.
In the case of huge pointers, the comparison is performed on the absolute addresses, which are equal.
3. A far pointer is never normalized but a huge pointer is normalized. A normalized pointer is one that has as much of the address as possible in the segment, meaning that the offset is never larger than 15.
Suppose we have 0x1234:1234; then its normalized form is 0x1357:0004 (absolute address 0x13574).
A huge pointer is normalized only when some arithmetic operation is performed on it, and is not normalized during assignment.
#include <stdio.h>

int main()
{
    char huge* h = (char huge*)0x12341234;
    char huge* h1 = (char huge*)0x12341234;
    printf("h=%Fp\nh1=%Fp", h, h1 + 0x1);
    return 0;
}
Output:
h=1234:1234
h1=1357:0005
Explanation: The huge pointer is not normalized on assignment. But if an arithmetic operation is performed on it, it will be normalized. So h is 1234:1234 and h1 is 1357:0005, which is normalized.
4. The offset of a huge pointer is always less than 16 because of normalization; this is not the case for far pointers.
Let's take an example to see what I mean:
#include <stdio.h>

int main()
{
    char far* f = (char far*)0x0000000f;
    printf("%Fp", f + 0x1);
    return 0;
}
Output:
0000:0010
In the case of a huge pointer:
#include <stdio.h>

int main()
{
    char huge* h = (char huge*)0x0000000f;
    printf("%Fp", h + 0x1);
    return 0;
}
Output:
0001:0000
Explanation: As we increment the far pointer by 1 it becomes 0000:0010. As we increment the huge pointer by 1 it becomes 0001:0000, because its offset can't be greater than 15; in other words, it gets normalized.
In the old days, according to the Turbo C manual, a near pointer was merely 16 bits when your entire code and data fit in the one segment. A far pointer was composed of a segment as well as an offset but no normalisation was performed. And a huge pointer was automatically normalised. Two far pointers could conceivably point to the same location in memory but be different whereas the normalised huge pointers pointing to the same memory location would always be equal.
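Since far and huge are compiler extensions that modern compilers no longer accept, here is a hedged standard C++ model of the same 8086 arithmetic - linear addresses, far-pointer offset wrap-around, and huge-pointer normalisation (the Segment/Offset aliases and helper functions are purely illustrative, not any real API):

#include <cstdint>
#include <cstdio>

using Segment = std::uint16_t;
using Offset  = std::uint16_t;

// Real-mode 8086: linear address = segment * 16 + offset (a 20-bit result).
static std::uint32_t linear(Segment seg, Offset off) {
    return (static_cast<std::uint32_t>(seg) << 4) + off;
}

// Huge-pointer style normalisation: push as much as possible into the segment,
// leaving an offset in the range 0..15.
static void normalise(Segment& seg, Offset& off) {
    std::uint32_t lin = linear(seg, off);
    seg = static_cast<Segment>(lin >> 4);
    off = static_cast<Offset>(lin & 0xF);
}

int main() {
    // Two different far pointers that alias the same byte.
    std::printf("1234:5678 -> %05X\n", static_cast<unsigned>(linear(0x1234, 0x5678))); // 179B8
    std::printf("1000:79B8 -> %05X\n", static_cast<unsigned>(linear(0x1000, 0x79B8))); // 179B8 as well

    // Far-pointer arithmetic wraps the offset and never touches the segment.
    Offset off = 0xFFFF;
    off = static_cast<Offset>(off + 1);                                                // wraps to 0x0000
    std::printf("far  0000:FFFF + 1 -> 0000:%04X\n", static_cast<unsigned>(off));

    // Huge-pointer arithmetic normalises instead of wrapping.
    Segment hseg = 0x1234;
    Offset  hoff = 0x1235;                                                             // 0x1234 + 1
    normalise(hseg, hoff);
    std::printf("huge 1234:1234 + 1 -> %04X:%04X\n",
                static_cast<unsigned>(hseg), static_cast<unsigned>(hoff));             // 1357:0005
    return 0;
}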
All of the stuff in this answer is relevant only to the old 8086 and 80286 segmented memory model.
near: a 16 bit pointer that can address any byte in a 64k segment
far: a 32 bit pointer that contains a segment and an offset. Note that because segments can overlap, two different far pointers can point to the same address.
huge: a 32 bit pointer in which the segment is "normalised" so that no two far pointers point to the same address unless they have the same value.
tee: a drink with jam and bread.
That will bring us back to doh oh oh oh
and when are these pointers used?
In the 1980s and '90s, until 32-bit Windows became ubiquitous.
In some architectures, a pointer which can point to every object in the system will be larger and slower to work with than one which can point to a useful subset of things. Many people have given answers related to the 16-bit x86 architecture. Various types of pointers were common on 16-bit systems, though near/far distinctions could reappear in 64-bit systems, depending upon how they're implemented (I wouldn't be surprised if many development systems go to 64-bit pointers for everything, despite the fact that in many cases that will be very wasteful).
In many programs, it's pretty easy to subdivide memory usage into two categories: small things which together total up to a fairly small amount of stuff (64K or 4GB) but will be accessed often, and larger things which may total up to much larger quantity, but which need not be accessed so often. When an application needs to work with part of an object in the "large things" area, it copies that part to the "small things" area, works with it, and if necessary writes it back.
Some programmers gripe at having to distinguish between "near" and "far" memory, but in many cases making such distinctions can allow compilers to produce much better code.
(Note: Even on many 32-bit systems, certain areas of memory can be accessed directly without extra instructions, while other areas cannot. If, for example, on a 68000 or an ARM, one keeps a register pointing at global variable storage, it will be possible to directly load any variable within the first 32K (68000) or 2K (ARM) of that register. Fetching a variable stored elsewhere will require an extra instruction to compute the address. Placing more frequently-used variables in the preferred regions and letting the compiler know would allow for more efficient code generation.)
This terminology was used in 16-bit architectures.
In 16-bit systems, data was partitioned into 64 KB segments. Each loadable module (program file, dynamically loaded library, etc.) had an associated data segment - which could store up to 64 KB of data only.
A NEAR pointer was a pointer with 16-bit storage, and referred to data (only) in the current module's data segment.
16-bit programs that needed more than 64 KB of data could use special allocators that would return a FAR pointer - a data segment id in the upper 16 bits, and an offset into that data segment in the lower 16 bits.
Yet larger programs would want to deal with more than 64 KB of contiguous data. A HUGE pointer looks exactly like a far pointer - it has 32-bit storage - but the allocator has taken care to arrange a range of data segments with consecutive IDs, so that by simply incrementing the data segment selector the next 64 KB chunk of data can be reached.
The underlying C and C++ language standards never really recognized these concepts officially in their memory models - all pointers in a C or C++ program are supposed to be the same size. So the NEAR, FAR and HUGE attributes were extensions provided by the various compiler vendors.