Is it possible to access physical address 0? - c++

In C/C++, it is not allowed to access data at address 0.
However, physical memory addresses are numbered from 0. In the DOS era, the interrupt vector table was located at physical address 0, and the first interrupt vector was the handler for the division-by-zero exception.
My question is:
Under what cases is it allowed to access physical address 0?

Whether you can access physical address zero depends on the platform.
The language knows nothing about the underlying addressing model; that is up to the OS.
In a bare-metal environment, you have total control over the page tables if paging is enabled, or you can simply dereference zero if paging is not enabled.
On some Unix and Linux variants, you can open /dev/mem and mmap it to get a non-null pointer whose virtual address is non-zero but whose physical address is zero; this may require special access rights.
I'm not sure about Windows.
PS. Other answers seem to confuse language-level pointers with physical addresses.
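The /dev/mem route mentioned above might look roughly like this on Linux. This is only a sketch: the helper name map_physical_page_zero is made up for illustration, it normally needs root, and many kernels restrict what /dev/mem exposes, so the call may simply fail.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: try to map the page at PHYSICAL address 0, read-only.
 * Returns a non-null virtual address on success, NULL on any failure
 * (no permission, no /dev/mem, kernel restrictions, ...). */
const volatile unsigned char *map_physical_page_zero(size_t *len_out)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    int fd = open("/dev/mem", O_RDONLY);
    if (fd < 0)
        return NULL;

    /* Offset 0 in /dev/mem corresponds to physical address 0. */
    void *p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)page;
    return p;
}
```

The returned pointer is an ordinary non-null virtual address; only the page it is backed by sits at physical address zero.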

"In C/C++, it is not allowed to access address 0."
Yes you can, as long as there's addressable memory there. On most platforms, there won't be.
"Under what cases is it allowed to access physical address 0?"
You can access any physical address if it's mapped into virtual memory. If there's anything sensitive there, then the OS probably won't allow that in user code. Within the kernel, it's just a case of setting up the page tables to include that address.

Generally, the address space in virtual memory is managed by the operating system.
A freestanding C (or C++) implementation could certainly allow you to dereference (void*)0 in an implementation specific way. But beware of undefined behavior.
The C and C++ standards are very careful about the NULL pointer (and C++11 added the nullptr keyword for several good reasons).
A C compiler (at least a hosted implementation) is allowed to assume that, after a successful dereference, a pointer is not null. Some GCC optimizations rely on this.
Most hosted C or C++ implementations have a null pointer which is an all-zero-bits word, but that is not required by the standard (however, it is very common; it helps the compiler and the libc).
However, pragmatically, a lot of software assumes that NULL is represented by all-zero bits (in theory, that is a mistake).
I know of no C implementation where NULL is not all-zero bits, but that is not required. However, writing such a compiler would be a headache.
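A small sketch of the pattern this affects (the function name is made up for illustration). Because the pointer is dereferenced before it is tested, an optimizing compiler may treat the test as always false and drop it; GCC has the option -fno-delete-null-pointer-checks to turn that assumption off.

```c
#include <stddef.h>

/* Hypothetical function, for illustration only. */
int read_then_check(int *p)
{
    int v = *p;      /* dereference happens first ... */
    if (p == NULL)   /* ... so the compiler may assume this is never true */
        return -1;
    return v;
}
```

With a valid pointer the function simply returns the pointed-to value; calling it with a null pointer is undefined behavior, whether or not the check survives optimization.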
On some operating systems, an application can change its address space, e.g. with mmap(2) on POSIX or Linux.
If you really wanted, you could access address 0 in C, but you really should never want to do that.

There's no specification in either C or C++ that would allow you to assign a specific physical address to a pointer. So your question about "how one would access address 0", which could be translated to "how one would assign address 0 to a pointer", formally has no answer. You simply can't assign a specific address to a pointer in C/C++.
But you can get that effect through integer-to-pointer conversion (the result is implementation-defined):
#include <stdint.h> /* uintptr_t */
uintptr_t null_address = 0;
void *ptr = (void *) null_address; /* converted from a non-constant integer */

The C Standard does not require that platforms provide access to any particular physical memory location, zero or otherwise. While it would be common on many embedded platforms for (char*)0x1234 to access physical location 0x1234, nothing in the C Standard requires that.
The C Standard also does not require that platforms do anything in particular if an attempt is made to access a null pointer. Consequently, if a compiler writer wanted to treat a null pointer dereference as an access to physical location zero, such behavior would be conforming.
Compilers where an access to address zero would be meaningful and useful will typically interpret a null pointer access as an access to location zero. The only problem would be with platforms where accesses to location zero are useful but compiler writers don't know that. That situation can generally only be handled by checking the compiler's documentation for ways to force the compiler to treat a null pointer like any other address.

Your question is a bit confusing. You ask whether it is possible to "access physical address zero." In a modern paged operating system, applications cannot specify physical addresses at all. Physical addresses can only be accessed through kernel mode.
In virtual memory systems, it became common not to map the first page of virtual memory into the process address space. Thus, accessing virtual address zero would trigger an access violation.
In such systems, it is usually possible for the application to map the first page into the process address space through system services. Even if the page is not there by default, it can be added by the application.
The answer is that it is possible to access physical memory address zero in kernel mode and it is possible to access virtual address zero IF the application maps that page to memory (and the OS allows that).
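On POSIX-like systems, the "map that page" step could be attempted as in the sketch below. The function name is hypothetical, MAP_ANONYMOUS is a common extension rather than strict POSIX, and on a default Linux or macOS configuration the call is expected to fail for an unprivileged process.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* asks glibc to expose MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a page could be mapped at VIRTUAL
 * address 0, and 0 if the OS refused. The page is unmapped again. */
int try_map_virtual_zero(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return 0;

    /* MAP_FIXED asks for exactly this address instead of treating it as a hint. */
    void *p = mmap((void *)0, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    /* A successful mapping at 0 compares equal to NULL, so the result
     * must be tested against MAP_FAILED, not against NULL. */
    if (p == MAP_FAILED)
        return 0;

    munmap(p, (size_t)page);
    return 1;
}
```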

Related

Why location 0x00000000 is accessible if it is a flash memory

As I know the memory location 0x00000000 isn't accessible through a pointer in C, but recently in my project I was able to access the zeroth location using a pointer.
The zeroth location is flash memory.
But according to the post Why is address zero used for the null pointer? , referencing zero through pointer is a reserved value. So, my question is as it is a reserved value why this doesn't hold true for memory mapped to flash at location zero?
Processor : TMS470Rxx
OS : Micro C OS-II
Thanks
There are many platforms where address zero behaves differently than on the typical desktop PC.
There are microcontrollers where hardware devices are memory-mapped at address zero.
There are operating systems that will happily let you specifically map something at address zero if you wish to. If you have writable flash mapped there, then you can write there.
Different platforms work in different ways.
The decision as to what happens when using what address is rather complex. It depends on the processor architecture, OS and sometimes "what some software does".
The processor may not have memory protection, or address zero may indeed be "used for something" (in x86 real-mode, that DOS runs in, address zero contains the vector table, where interrupts and such jump to, so it's incredibly sensitive to being overwritten - hence badly written programs could and would crash not just themselves but the entire machine in DOS).
Typically, modern OS's on processors that have "virtual memory mapping" (so the physical address that the processor actually uses is not (or may not be) the same as the virtual address that the program sees) will map address zero as "not accessible" for your typical applications.
The OS may allow access to address zero at times:
I had a bug in a Windows driver many years ago that used a NULL pointer, and in that particular situation, accessing address 0 did not cause a crash. The crash happened much later, when the content of address zero was being used for something, at which point the system blue-screened (at least until I attached the debugger, debugged the problem, and fixed it so it didn't try to use a NULL pointer; I don't remember if I had to allocate some memory or just skipped that bit if the pointer was NULL).
The choice of 0 for NULL comes down to "we have to choose some value (and there is no value that we can be 100% sure nobody will ever need to use)", particularly on machines with a 16- or 32-bit address range. Normally, however, address zero, if it is used, contains something "special" (vector table for interrupts, bootstrap code for the processor, or similar), so you are unlikely to meaningfully store data in there. C and C++ as languages do not require that NULL pointers can't be accessed [but also do not "allow you" to freely access this location], just that this value can be used as a "pointer that doesn't point at anything", and the spec also provides for the case where your NULL value is not ACTUALLY zero. But then the compiler has to deal with "ZERO means NULL for pointers, so replace it with X".
The value zero does, at least sometimes, have a benefit over other values - mostly that many processors have special instructions to compare with or identify zero.

range of values a c pointer can take?

In "Computer Systems: A Programmer's Perspective", section 2.1 (page 31), it says:
The value of a pointer in C is the virtual address of the first byte of some block of storage.
To me it sounds like the C pointer's value can take values from 0 to [size of virtual memory - 1]. Is that the case? If yes, I wonder if there is any mechanism that checks if all pointers in a program are assigned with legal values -- values at least 0 and at most [size of virtual memory - 1], and where such mechanism is built in -- in compiler? OS? or somewhere else?
There is no process that checks pointers for validity as use of invalid pointers has undefined effects anyway.
Usually it will be impossible for a pointer to hold a value outside of the addressable range as the two will have the same available range — e.g. both will be 32 bit. However some CPUs have rules about pointer alignment that may render some addresses invalid for some types of data. Some runtimes, such as 64-bit Objective-C, which is a strict superset of C, use incorrectly aligned pointers to disguise literal objects as objects on the heap.
There are also some cases where the complete address space is defined by the instruction set to be one thing but is implemented by that specific hardware to be another. An example from history is the original 68000 which defined a 32-bit space but had only 24 address lines. Very early versions of Mac OS used the spare 8 bits for flags describing the block of data, relying on the hardware to ignore them.
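The spare-bits trick can be sketched with plain integer arithmetic. This is not the actual Mac OS code; the names and the flag layout are assumptions chosen to match the 24-of-32-bits description above.

```c
#include <stdint.h>

/* Only the low 24 bits reach the address bus in this model. */
#define ADDR_MASK 0x00FFFFFFu

/* Store 8 flag bits in the top byte of a 32-bit address value. */
uint32_t pack_flags(uint32_t addr, uint8_t flags)
{
    return (addr & ADDR_MASK) | ((uint32_t)flags << 24);
}

/* What the hardware effectively did: ignore the top byte. */
uint32_t strip_flags(uint32_t tagged)
{
    return tagged & ADDR_MASK;
}

/* Recover the flag byte. */
uint8_t get_flags(uint32_t tagged)
{
    return (uint8_t)(tagged >> 24);
}
```

Code like this breaks as soon as the hardware starts decoding the upper bits, which is the risk with any scheme that hides data in unused address bits.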
So:
there's no runtime checking of validity;
even if there were, the meaning of validity is often dependent on the specific model of CPU (not just the family) or specific version of the OS (ditto) so as to make checking a less trivial task than you might guess.
In practise what will normally happen if your address is illegal per that hardware but is accessed as though legal is a processor exception.
A pointer in C is an abstract object. The only guarantee provided by the C standard is that pointers can point to all the things they need to within C: functions, objects, one past the end of an object, and NULL.
In typical C implementations, pointers can point to any address in virtual memory, and some C implementations deliberately support this in large part. However, there are complications. For example, the value used for NULL may be difficult to use as an address, and converting pointers created for one type to another type may fail (due to alignment problems). Additionally, there are legal non-typical C implementations where pointers do not directly correlate to memory addresses in a normal way.
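The alignment problem mentioned above can be checked on typical implementations by looking at the integer value of the pointer. This is a sketch that assumes a flat address space where converting a pointer to uintptr_t yields its address; the helper name is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if p is a multiple of `alignment` (which must be nonzero).
 * Assumes the pointer-to-integer conversion yields the plain address. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

For example, is_aligned(&some_int, _Alignof(int)) holds for any real int object, while a value such as (int *)1 would fail the same test on platforms where int needs more than byte alignment.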
You should not expect to use pointers to access memory arbitrarily without understanding the rules of the C standard and of the C implementations you use.
There is no mechanism in C which will check if pointers in a program are valid. The programmer is responsible for using them correctly.
For practical purposes a C pointer is either NULL or a memory address to something else. I've never heard of NULL being anything but zero in real life. If it's a memory address you're not supposed to "care" what the actual number is; just pass it around, dereference it etc.

non-NULL reserved pointer value

How can I create a reserved pointer value?
The context is this: I have been thinking of how to implement a data structure for a dynamic scripting language (I am not planning on implementing this - just wondering how it would be done).
Strings may contain arbitrary bytes, including NUL. Thus, it is necessary to store the value separately. This requires a pointer (to point to the array) and a number. The first trick is that if the pointer is NULL, it cannot possibly be a valid string, so the number can be used for an actual integer.
If a second reserved pointer value could be created, this could be used to imply that the other field is now being used as a floating-point value. Can this be done?
One thought is to mmap() an address with no permissions, which could also be done to replace the usage of the NULL pointer.
On any modern system, you can just use the pointer values 1, 2, ... 4095 for such purposes. Another frequent choice is (uintptr_t)-1, which is technically inferior, but used more frequently than 1 nevertheless.
Why are these values "safe"?
Modern systems safeguard against NULL pointer accesses by making it impossible to map anything at virtual address zero. Almost any dereference of a NULL pointer will hit this nonexistent region, and the hardware will tell the OS that something bad happened, which triggers the OS to segfault the process.
Since virtual memory pages are page aligned (at least 4k on current hardware), and nothing is mapped to address zero, nothing can be mapped to the entire range 0, ..., 4095, protecting all these addresses in the same way, and you can use them as special purpose values.
How much virtual address space is reserved for this purpose is a system parameter. On Linux it is controlled by /proc/sys/vm/mmap_min_addr, and the root user can change it to zero, which would disable this protection (which would not be a very smart idea). The default on Ubuntu is 64k (i.e. 16 pages).
This is also the reason why (uintptr_t)-1 is less safe than 1; even though any load of more than one byte will hit the zero page, the address (uintptr_t)-1 itself is not necessarily protected in this way. Consequently, doing string operations on (char*)-1 does not necessarily segfault.
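On Linux, the limit discussed above can be read at run time. A minimal sketch with a hypothetical helper name; on systems without this /proc entry it just reports failure.

```c
#include <stdio.h>

/* Returns the value stored in /proc/sys/vm/mmap_min_addr,
 * or -1 if the file is missing or cannot be parsed. */
long read_mmap_min_addr(void)
{
    FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");
    if (f == NULL)
        return -1;

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;

    fclose(f);
    return value;
}
```

Small pointer values are only safe to use as sentinels when they are below the value this returns.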
Edit:
My original explanation with the special mapping seems to have been a bit stale; probably this was the way things were handled on the old Mac/PPC platform. Even though the effect is pretty much the same, I changed the details of the answer to reflect modern Linux. Anyway, the important point is not how the null page protection is achieved; the important point is that any sane, modern system will have some null page protection that encompasses at least the mentioned address range. Some more details can be found in this SO answer: https://stackoverflow.com/a/12645890/2445184
In standard C (and standard C++), the approach that's 100% valid and works is simple: declare a variable, use its address as a magic value.
char *ptr;
char magic;
if (ptr == &magic) { ... }
This guarantees that magic will never have any overlap with another object.
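A minimal sketch of the approach above applied to the string/integer/float value from the question. All names are hypothetical, and unlike the question's idea, integers are marked by a tag address rather than by NULL.

```c
#include <stddef.h>

/* The addresses of these objects serve as reserved pointer values.
 * No string buffer can ever have the same address. */
static char int_tag;
static char float_tag;

struct value {
    const char *ptr;   /* string bytes, or one of the tag addresses */
    union {
        size_t len;    /* valid when ptr points at string bytes */
        long   i;      /* valid when ptr == &int_tag */
        double f;      /* valid when ptr == &float_tag */
    } u;
};

struct value make_string(const char *s, size_t len)
{
    struct value v; v.ptr = s; v.u.len = len; return v;
}

struct value make_int(long i)
{
    struct value v; v.ptr = &int_tag; v.u.i = i; return v;
}

struct value make_float(double f)
{
    struct value v; v.ptr = &float_tag; v.u.f = f; return v;
}

int is_int(const struct value *v)    { return v->ptr == &int_tag; }
int is_float(const struct value *v)  { return v->ptr == &float_tag; }
int is_string(const struct value *v) { return !is_int(v) && !is_float(v); }
```

Any number of further reserved values can be added the same way, at the cost of one byte of static storage each.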
Magic pointer values such as (char *) 1 have their advantages too, but it's so easy to get them wrong that I'd recommend the standard approach, and optionally temporarily switch to magic pointer values only if you find they help you debug. Even if you disregard the theoretical implementations where (char *) 1 may be a valid object: if you use (int *) 1 as a magic pointer value and the optimiser assumes int * values are suitably aligned, it may remove checks that are no-ops only in 100% valid code, not in your code.
mmap-ing an address can fail if the address is already assigned. It would probably be better to use the address of some static variable or function, or to obtain a unique address via malloc(1).

Is the pointer guaranteed to be > a certain value?

In C++, when I do new (or even malloc), is there any guarantee that the returned address will be greater than a certain value? Because... in this project I find it -very- useful to use 0-1k as an enum. But I wouldn't want to do that if it's possible to get a value that low. My only target systems are 32- or 64-bit CPUs running Windows, Linux or Mac.
Does the standard say anything about pointers? Do Windows or Linux say anything about their C runtime and what the lowest memory address (for RAM) is?
-edit- I ended up modifying my new overload to check that the address is above 1k. I call std::terminate if it isn't.
In terms of the standard, there is nothing. But in reality, it depends on the target OS. Windows, for instance, reserves the first 64 KB of memory as a no-man's land (depending on the build it is read-only memory, else it is marked as PAGE_NOACCESS), while it uses the upper 0x80000000+ for kernel memory, but this can be changed; see this & this on MSDN.
On x64 you can also use the higher bits of the address (only 47 bits are used for addresses currently), but it's not such a good idea, as later on this will change and your program will break (AMD, who set the standard, also advise against it).
There's no such guarantee. You can try using placement new if you need very specific memory locations but it has certain problems that you'll have to work hard to avoid. Why don't you try using a map with an integer key that has the pointer as its value instead? That way you wouldn't have to rely on specific memory addresses and ranges.
In theory, no -- a pointer's not even guaranteed to be > 0. However, in practice, viewed as an unsigned integer (don't forget that a pointer may have a high-order "1" bit), no system that I know of would have a pointer value less than about 1000. But relying on that is relying on "undefined behavior".
There is no standard for where valid memory addresses come from; to write safe system-independent code, you cannot rely on certain addresses (and even with anecdotal support, you never know when that will change with a new system update).
It's very platform-specific, so I would discourage relying on this kind of information unless you have a very good reason and are aware of consequences for portability, maintainability etc.
NULL always compares equal to 0 in source code, even though its bit representation is not guaranteed to be all zeros. If I recall correctly, x86 reserves the first 128 MB of address space as "NULL-equivalent", so that valid pointers can't take on values in this range. On x64 there are some additional addresses which you shouldn't encounter in practice, at least for now.
As for address space reserved for the operating system, it will clearly depend on the OS. On Linux, the kernel-user space division is configurable in the kernel, so at least the 3 splits: 1-3 GB, 2-2 GB and 3-1 GB are common on 32-bit systems. You can find more details on kerneltrap.

Is 0x000001, 0x000002, etc. ever a valid memory address in application level programming?

Or are those addresses reserved for the operating system and things like that?
Thanks.
While it's unlikely that 0x00000001, etc. will be valid pointers (especially if you use odd numbers on many processors) using a pointer to store an integer value will be highly system dependent.
Are you really that strapped for space?
Edit:
You could make it portable like this:
char *base = malloc(NUM_MAGIC_VALUES);
#define MAGIC_VALUE_1 (base + 0)
#define MAGIC_VALUE_2 (base + 1)
...
Well, the OS is going to give each program its own virtual memory space, so when the application references memory addresses 0x0000001 or 0x0000002, it's actually referencing some other physical memory address. I would take a look at paging and virtual memory. So a program will never have access to memory the operating system is using. However, I would stay away from manually assigning a memory address to a pointer rather than using malloc(), because those memory addresses might be text or reserved space.
This depends on operating system layout. For User space applications running in general purpose operating systems, these are inaccessible addresses.
This problem is related to an architecture's virtual address space. Have a look at this: http://web.cs.wpi.edu/~cs3013/c07/lectures/Section09.1-Intel.pdf
Of course, you can do this:
int* myPointer1 = (int*)0x000001; /* an explicit cast is required */
int* myPointer2 = (int*)0x000032;
But do not try to dereference these addresses, because it will end in an access violation.
The OS gives you the memory; by the way, these addresses are just virtual. The OS hides the details and presents memory as one big, continuous stripe. Maybe the 0x000000-0x211501 part is on a web server and you read/write it over the network, and the rest is on your hard disk. Physical memory is just an illusion from your current viewpoint.
You tagged your question C++. I believe that in C++ the address at 0 is reserved and is normally referred to as NULL. Other than that you cannot assume anything. If you want to ask about a particular implementation on a particular OS then that would be a different question.
It depends on the compiler/platform, but many older compilers actually have something like the string "(null)" at address 0x00000000. This is a debug feature because that string will show up if a NULL pointer is ever used by accident. On newer systems like Windows, a pointer to this area will most likely cause a processor exception.
I can pretty much guarantee that address 1 and 2 will either be in use or will raise a processor exception if they're ever used. You can store any value you like in a pointer. But if you try and dereference a pointer with a random value, you're definitely asking for problems.
How about a nice integer instead?
Although the standard requires that NULL is 0, a pointer that is NULL does not have to consist of all zero bits, although it will do in many implementations. That is also something you have to beware of if you memset a POD struct that contains some pointers, and then rely on the pointers holding "NULL" as their value.
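The memset caveat can be made concrete with a short sketch (struct and function names are made up). Initializing with {0} sets pointer members to real null pointers whatever their representation, while memset only writes zero bytes; the second function checks whether the two happen to coincide on the current platform.

```c
#include <string.h>

struct node {
    struct node *next;
    int value;
};

/* Portable: member-wise zero initialization yields a null `next`
 * even if a null pointer is not all-zero bits. */
void node_init(struct node *n)
{
    *n = (struct node){0};
}

/* Returns 1 if a null pointer is stored as all-zero bytes here,
 * i.e. if memset(&n, 0, sizeof n) would also produce a null `next`. */
int null_is_all_zero_bits(void)
{
    void *p = NULL;
    unsigned char zeros[sizeof p] = {0};
    return memcmp(&p, zeros, sizeof p) == 0;
}
```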
If you want to use the same space as a pointer you could use a union, but I guess what you really want is something that doubles up as a pointer and something else, and you know it is not a pointer to a real address if it contains low-numbered values. (With a union you still need to know which type you have).
I'd be interested to know what the magic other value is really being used for. Is this some lazy-evaluation issue where the pointer gives an indication of how to load the data when it is not yet loaded and a genuine pointer when it is?
Yes, on some platforms address 0x00000001 and 0x00000002 are valid addresses. On other platforms they are not.
In the embedded systems world, the validity depends on what resides at those locations. Some platforms may put interrupt or reset vectors at those addresses. Other embedded platforms may place Position Independent executable code there.
There is no standard specification for the layout of addresses. One cannot assume anything. If you want your code to be portable then forget about accessing specific addresses and leave that to the OS.
Also, the structure of a pointer is platform dependent. So is the conversion of the value in a pointer to a physical address. Some systems may only decode a portion of the pointer, others use the entire pointer value. Some may use indirection (a.k.a. virtual addressing) to access real objects. Still no standardization here either.