How does a compiler know the alignment of a physical address?

I know that some CPU architectures don't support unaligned address access (e.g., ARM architectures prior to ARM 4 had no instructions to access half-word objects in memory). And some compilers (e.g., some versions of GCC) for such an architecture will emit a series of memory accesses when they find a misaligned address, so that the misaligned access is almost transparent to developers. (Refer to The Definitive Guide to GCC, by William von Hagen.)
But I'm wondering: how does a compiler know whether an address is aligned or not? After all, what a compiler sees is the virtual address (effective address, EA), if it sees anything at all. When the program runs, the EA could be mapped to any physical address by the OS. Even if the virtual address is aligned, couldn't the resulting physical address be misaligned? The alignment of the physical address is what really matters and what actually travels on the CPU's address lines.
Since a compiler is not aware of the physical address at all, how can it be smart enough to know whether a variable's address is aligned?

A virtual address is not mapped to just any physical address. Virtual memory comes in pages that are mapped in an aligned manner to physical pages (generally aligned to 4096 bytes).
See: Virtual memory and alignment - how do they factor together?
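To see why page-granular mapping preserves alignment, here's a minimal arithmetic sketch (the addresses below are made-up illustrative values): since virtual and physical pages share the same page-aligned base, a virtual address and its physical counterpart have the same offset within the page, so any alignment up to the page size carries over.

    #include <cassert>
    #include <cstdint>

    int main() {
        constexpr std::uintptr_t kPageSize = 4096;
        std::uintptr_t virt = 0x7ffd12345008;       // some virtual address
        std::uintptr_t physPageBase = 0x1a2b3c000;  // hypothetical page-aligned physical base
        // Page-granular mapping: only the page base changes, the in-page offset survives.
        std::uintptr_t phys = physPageBase + (virt % kPageSize);
        // Hence any alignment up to 4096 is identical, e.g. 8-byte alignment:
        assert(virt % 8 == phys % 8);
    }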

Alignment is a very useful attribute for object code, partly because some machines insist on "aligned access", but on modern computers mainly because cache lines have a huge impact on performance; cache-alignment of code/loops/data/locks is thus a requirement from your local friendly compiler.
Virtually all the loaders in the world support loading code at power-of-two aligned boundaries, from some modest size on up. (Assemblers and linkers support this too, with various ALIGNMENT directives.) Often linkers and loaders simply align the first loaded value to a well-known boundary size; OSes with virtual memory often provide a convenient boundary based on the VM page size (which ties in with the other answer).
So a compiler can essentially know what the alignment of its emitted code/data is. By keeping track of how much code it has emitted, it can know the alignment of any emitted value. If it needs alignment, it can issue a linker directive or, for modest sizes, simply pad until the emitted amount of code is suitably aligned.
Because of this, you can be fairly sure that most compilers will not place code or data constructs so that they cross cache-line (or other architecture-imposed) boundaries in a way that materially affects performance, unless directed to do so.
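As a concrete illustration of the compiler controlling the alignment of emitted data, here is a minimal C++11 sketch using alignas (64 bytes is assumed here as a typical cache-line size):

    #include <cstdint>

    // Ask the compiler for 64-byte alignment so two counters never share a
    // cache line; the compiler, linker, and loader cooperate to honor this.
    struct alignas(64) PaddedCounter {
        std::uint64_t value;
        // the remaining 56 bytes are padding inserted by the compiler
    };

    static_assert(alignof(PaddedCounter) == 64, "compiler honors the request");
    static_assert(sizeof(PaddedCounter) == 64, "padded out to a full cache line");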

Related

Are std::vector elements contiguous in physical memory?

My question is similar to this, however I'm asking something a bit different.
It is clear that it is possible to use the address of the first std::vector element as a C-style array. That means that in virtual memory, std::vector elements are contiguous. However, if physical memory is fragmented, it is possible that the std::vector is actually split into many parts in physical memory.
My question is: Are std::vector elements contiguous in physical memory (as well as virtual memory)?
The memory used to store the data in a vector must be at contiguous addresses as those addresses are visible to the code.
In a typical case on most modern CPUs/OSes, that will mean the virtual addresses must be contiguous. If those virtual addresses cross a page boundary, then there's a good chance that the physical addresses will no longer be contiguous.
I should add that this is only rarely a major concern. Modern systems have at least some support for such fragmented memory usage right down to the hardware level in many cases. For example, many network and disk controllers include "scatter/gather" capability, where the OS uses the page tables to translate the virtual addresses for the buffer to physical addresses, then supplies a number of physical addresses directly to the controller, which then gathers the data from those addresses if it's transferring from memory to peripheral or "scatters" the data out to those addresses if it's transferring from peripheral to memory.
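A minimal sketch of checking the guarantee discussed above (contiguity in the virtual address space) from user code:

    #include <cassert>
    #include <cstddef>
    #include <vector>

    int main() {
        std::vector<int> v(100000);           // large enough to span several pages
        for (std::size_t i = 0; i + 1 < v.size(); ++i)
            assert(&v[i] + 1 == &v[i + 1]);   // adjacent virtual addresses
        int* raw = v.data();                  // usable as a C-style array
        raw[0] = 42;
        assert(v[0] == 42);
    }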
No, there is no guarantee that you will be provided contiguous physical memory in C++'s abstract machine. Abstractions and hardware below malloc are free to use discontiguous memory.
Only your targeted implementation could make such a guarantee, but the language/model does not care. It relies on the system to do its job.
Virtual to physical memory mapping is handled largely by the CPU, but with kernel support. A userland process cannot know what this mapping is: your program, no matter what the programming language, deals solely in virtual memory addresses. You cannot expect, nor is there any way of even finding out, if two adjacent virtual memory addresses that straddle a page boundary are adjacent in physical memory, so there is absolutely no point worrying about it.

What type of address is returned when applying the ampersand to a variable or a data type in C/C++ or in any other such language?

This is a very basic question that has been boggling my mind since the day I heard about the concepts of virtual and physical memory in my OS class. Now I know that at load time and compile time, the virtual and logical address binding scheme is the same, but at execution time they differ.
First of all, why is it beneficial to generate virtual addresses at compile and load time, and what is returned when we apply the ampersand operator to get the address of a variable, native datatypes, user-defined types, and function definitions?
And how exactly does the OS map from virtual to physical addresses when it does so? These questions are just out of curiosity, and I would love some good, deep insights considering modern-day OSes, and how it was in the OSes of the early days. I am asking specifically about C/C++ since I don't know much about other languages.
Physical addresses occur in hardware, not software. A possible/occasional exception is in the operating system kernel. Physical means it's the address that the system bus and the RAM chips see.
Not only are physical addresses useless to software, but exposing them could be a security issue. Being able to access any physical memory without address translation, and knowing the addresses of other processes, would allow unfettered access to the machine.
That said, smaller or embedded machines might have no virtual memory, and some older operating systems did allow shared libraries to specify their final physical memory location. Such policies hurt security and are obsolete.
At the application level (e.g. Linux application process), only virtual addresses exist. Local variables are on the stack (or in registers). The stack is organized in call frames. The compiler generates the offset of a local variable within the current call frame, usually an offset relative to the stack pointer or frame pointer register (so the address of a local variable, e.g. in a recursive function, is known only at runtime).
Try stepping through a recursive function in your gdb debugger and displaying the address of some local variable to understand more. Try also the bt command of gdb.
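For example, here is a minimal program you could step through in gdb; each recursive call gets its own stack frame, so the same local variable has a different address at each depth:

    #include <cstdio>

    void recurse(int depth) {
        int local = depth;                 // lives in the current call frame
        std::printf("depth %d: &local = %p\n", depth, static_cast<void*>(&local));
        if (depth > 0)
            recurse(depth - 1);            // new frame, new address for `local`
    }

    int main() {
        recurse(3);   // addresses typically step by one frame size per call
    }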
Type
cat /proc/self/maps
to understand the address space (and virtual memory mapping) of the process executing that cat command.
Within the kernel, the mapping from virtual addresses to physical RAM is done by code implementing paging and driving the MMU. Some system calls (notably mmap(2) and others) can change the address space of your process.
Some early computers (e.g. those from the 1950s or early 1960s, like the CAB 500 or the IBM 1130 or IBM 1620) did not have any MMU; even the original Intel 8086 didn't have any memory protection. At that time (the 1960s), C did not exist. On processors without an MMU you don't have virtual addresses (only physical ones, including in your embedded C code for a washing-machine manufacturer). Some machines could protect writing into some memory banks through physical switches. Today, some low-end cheap processors (those in washing machines) don't have any MMU; most cheap microcontrollers don't have one either. Often (but not always), the program is in some ROM, so it cannot be overwritten by buggy code.

boost::interprocess - allocate_aligned - same alignment guaranteed in all processes?

If I use allocate_aligned to allocate a chunk of aligned memory in managed shared memory, is it guaranteed that this allocation will have the same alignment when shared in other processes ? The documentation makes it clear that the base address may be mapped differently, of course, but it doesn't seem to say anything about alignment.
I've run an experiment which seems to show that the alignment is the same, but that may just be down to luck, so I'd like a more reliable confirmation of the expected behaviour. (Common sense says that it ought to be the same alignment, otherwise it would seriously limit the usefulness of allocate_aligned in shared memory, but I really need more than just an appeal to common sense.)
Yes, unless you need more than page alignment for some strange reason.
The base address may be mapped differently, but such mappings are done with page granularity. This implies 4K alignment on common architectures.
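A minimal sketch of that reasoning, independent of Boost: because mappings can differ between processes only by whole pages, the offset within a page, and therefore any alignment up to 4096, is the same in every process (`p` stands for whatever pointer allocate_aligned returned in the current process):

    #include <cstdint>
    #include <cstdio>

    void report_alignment(const void* p) {
        auto addr = reinterpret_cast<std::uintptr_t>(p);
        // This value is identical in every process that maps the same segment:
        std::printf("offset within page: %llu\n",
                    static_cast<unsigned long long>(addr % 4096));
    }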

What are the most common configurations where pointer writes are not atomic?

I am interested in multithreading. There are a lot of gotchas in the field, for example, there is no guarantee that pointer writes are atomic. I get this, but would like to know what are the most popular current configurations when this is actually the case? For example, on my Macbook Pro/gcc, pointer writes definitely seem to be atomic.
This is mostly a problem for CPU architectures where the pointer width is larger than the width of the CPU architecture. For instance, on ATmega CPUs, an 8-bit architecture, the address space is 16-bit. If there aren't any specific instructions to load and store 16-bit addresses, at least two instructions are needed to load / store a pointer value.
See here.
Nearly every architecture is affected, as Daniel said. Unless memory alignment is enforced, each write potentially results in several operations, and it also fails if the address bus is smaller than the data bus. So you will most likely need to write code using locking mechanisms. This is a good idea anyway, as you probably want your code to be portable. For some very special architectures these locking functions would simply be empty.
Pointers might not be atomic types on platforms that use a segmented address space, like MS-DOS or Win 3.x. But I'm not aware of any modern desktop/server platforms using this kind of architecture (at least at the platform's level).
However, even if a write is atomic from the point of view of the C compiler there might be other issues that come into play, even on modern desktop/server systems, especially when dealing with multicore/multiprocessor systems (caching, memory access reordering done at a lower level by the processor). 'Atomic' APIs provided by a platform deal with those issues using memory barriers (if required), so you still should probably use those APIs when trying to ensure that a memory access is atomic.
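In portable C++ (C++11 and later), the standard way to get both the atomicity and the ordering guarantees mentioned above is std::atomic<T*>; a minimal sketch:

    #include <atomic>

    struct Node { int value; };

    std::atomic<Node*> head{nullptr};

    void publish(Node* n) {
        // atomic store; release ordering makes n's contents visible to readers
        head.store(n, std::memory_order_release);
    }

    Node* peek() {
        // atomic load paired with the release store above
        return head.load(std::memory_order_acquire);
    }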

How to get the amount of virtual memory available in C++?

I would like to map a file into memory using the mmap function and would like to know whether the amount of virtual memory on the current platform is sufficient to map a huge file. On a 32-bit system I cannot map a file larger than 4 GB.
Would std::numeric_limits<size_t>::max() give me the amount of addressable memory or is there any other type that I should test (off_t or something else)?
As Lie Ryan has pointed out in his comment, "virtual memory" is misused here. The question, however, holds: there is a type associated with a pointer, and it has a maximum value that defines the upper limit of what you can possibly address on your system. What is this type? Is it size_t or perhaps ptrdiff_t?
size_t is only required to be big enough to store the size of the biggest possible single contiguous object. That may not be the same as the size of the address space (on systems with a segmented memory model, for example).
However, on common platforms with a flat memory space, the two are equal, and so you can get away with using size_t in practice if you know the target CPU.
Anyway, this doesn't really tell you anything useful. Sure, a 32-bit CPU has a 4GB memory space, and so size_t is a 32-bit unsigned integer. But that says nothing about how much you can allocate. Some part of the memory space is used by the OS. And some parts are already used by your own application: for mapping the executable into memory (as well as any dynamic libraries it may use), for each thread's stack, allocated memory on the heap and so on.
So no, tricks such as taking the size of size_t tell you a little bit about the address space you're running in, but nothing very usable. You can ask the OS how much memory is in use by your process and other metrics, but again, that doesn't really help you much. It is possible for a process to use just a couple of megabytes, yet have that spread out over so many small allocations that it's impossible to find a contiguous block of memory larger than, say, 100MB. And so, on a 32-bit machine, with a process that uses nearly no memory, you'd be unlikely to make such an allocation. (And even if the OS had a magical WhatIsTheLargestPossibleMemoryAllocationICanMake() API, that still wouldn't help you. It would tell you what you needed a moment ago; you have no guarantee that the answer would still be valid by the time you tried to map the file.)
So really, the best you can do is try to map the file, and see if it fails.
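A minimal POSIX sketch of that "just try it" approach ("huge.bin" is a placeholder file name):

    #include <cstddef>
    #include <cstdio>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main() {
        int fd = open("huge.bin", O_RDONLY);
        if (fd < 0) { std::perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { std::perror("fstat"); close(fd); return 1; }

        std::size_t len = static_cast<std::size_t>(st.st_size);
        void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {   // not enough contiguous address space (or other error)
            std::perror("mmap");
            close(fd);
            return 1;
        }
        // ... use the mapping ...
        munmap(p, len);
        close(fd);
    }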
You can use GlobalMemoryStatusEx and VirtualQueryEx if you are coding for Win32.
Thing is, the size of a pointer tells you nothing about how much of that "address space" is actually available to you, i.e. can be mapped as a single contiguous chunk.
It's limited by:
the operating system. It may choose to only make a subset of the theoretically-possible address range available to you, because mappable memory is needed for OS-own purposes (like, say, making the graphics card framebuffer visible, and of course for use by the OS itself).
configurable limits. On Linux / UNIX, the "ulimit" command and the setrlimit() system call allow you to restrict the maximum size of an application's address space in various ways, and Windows has similar options through registry parameters.
the history of the application. If the application uses memory mapping extensively, the address space can fragment, limiting the maximum size of "available" contiguous virtual addresses.
the hardware platform. Some CPUs have address spaces with "holes"; an example of that is 64-bit x86, where pointers are only valid if they're between 0x0 and 0x00007fffffffffff, or between 0xffff800000000000 and 0xffffffffffffffff. I.e. you have 2x128TB instead of the full 16EB. Think of it as 48-bit "sign-extended" pointers...
Finally, don't confuse "available memory" and "available address space". There's a difference between doing a malloc(someBigSize) and a mmap(..., someBigSize, ...) because the former might require availability of physical memory to accommodate the request while the latter usually only requires availability of a large-enough free address range.
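A minimal sketch of that distinction on Linux/POSIX: an anonymous PROT_NONE mapping reserves a large address range without requiring physical memory; only committing pages and touching them needs RAM or swap:

    #include <cstddef>
    #include <cstdio>
    #include <sys/mman.h>

    int main() {
        const std::size_t size = std::size_t(1) << 33;  // 8 GiB of address space (64-bit host)
        void* p = mmap(nullptr, size, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { std::perror("mmap"); return 1; }
        // Only the address range is taken; no physical memory is consumed yet.
        munmap(p, size);
    }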
For UNIX platforms, part of the answer is to use getrlimit(RLIMIT_AS) as this gives the upper bound for the current invocation of your application - as said, the user and/or admin can configure this. You're guaranteed that any attempt to mmap areas larger than that will fail.
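A minimal sketch of querying that limit (POSIX):

    #include <cstdio>
    #include <sys/resource.h>

    int main() {
        struct rlimit rl;
        if (getrlimit(RLIMIT_AS, &rl) == 0) {
            if (rl.rlim_cur == RLIM_INFINITY)
                std::puts("address space: no administrative limit");
            else
                std::printf("address space limit: %llu bytes\n",
                            static_cast<unsigned long long>(rl.rlim_cur));
        }
    }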
Re your rephrased question, the "upper limit of what you can possibly address on your system" is somewhat misleading; it's hardware-architecture specific. There are 64-bit architectures out there (x64, SPARC) whose MMU happily allows (uintptr_t)(-1) as a valid address, i.e. you can map something into the last page of a 64-bit address space. Whether the operating system allows an application to do so is again an entirely different question...
For user applications, the "high mark" isn't (always) fixed a priori. It's tunable on e.g. Solaris or Linux. That's where getrlimit(RLIMIT_AS) comes in.
Note that again, by specification, there'd be nothing to prevent a (weird) operating-system design from choosing, e.g., to put application stacks and heaps at "low" addresses while putting code at "high" addresses, on a platform with address-space holes. You'd need full 64-bit pointers there, and couldn't make them any smaller, but there could be an arbitrary number of "inaccessible / invalid" ranges that are never made available to your app.
You can try sizeof(int*). This will give you the length (in bytes) of a pointer on the target platform. Thus, you can find out how big the addressable space is.
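A one-line check along those lines (though, as the other answers note, pointer size says nothing about how much of the address space you can actually map):

    #include <iostream>

    int main() {
        // 4 bytes suggests a 32-bit address space, 8 bytes a 64-bit one
        std::cout << "pointer size: " << sizeof(void*) << " bytes\n";
    }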