Is there any way to determine (programmatically, of course) if a given pointer is "valid"? Checking for NULL is easy, but what about things like 0x00001234? When trying to dereference this kind of pointer, an exception/crash occurs.
A cross-platform method is preferred, but platform-specific (for Windows and Linux) is also ok.
Update for clarification:
The problem is not with stale/freed/uninitialized pointers; instead, I'm implementing an API that takes pointers from the caller (like a pointer to a string, a file handle, etc.). The caller can send (on purpose or by mistake) an invalid value as the pointer. How do I prevent a crash?
You can't make that check. There is simply no way you can check whether a pointer is "valid". You have to trust that when people use a function that takes a pointer, those people know what they are doing. If they pass you 0x4211 as a pointer value, then you have to trust that it points to address 0x4211. And if they "accidentally" hit an object, then even if you used some scary operating system function (IsValidPtr or whatever), you would still slip into a bug and not fail fast.
Start using null pointers for signaling this kind of thing and tell the user of your library that they should not use pointers if they tend to accidentally pass invalid pointers, seriously :)
Here are three easy ways for a C program under Linux to inspect the status of the memory in which it is running; they also show why the question has legitimate, sophisticated answers in some contexts.
After calling getpagesize() and rounding the pointer down to a page boundary, you can call mincore() to find out whether a page is valid and whether it happens to be part of the process working set. Note that this requires some kernel resources, so you should benchmark it and determine whether calling this function is really appropriate in your API. If your API is going to be handling interrupts, or reading from serial ports into memory, it is appropriate to call this to avoid unpredictable behavior.
After calling stat() to determine whether a /proc/self directory is available, you can fopen and read through /proc/self/maps to find information about the region in which a pointer resides. Study the man page for proc, the process information pseudo-file system. Obviously this is relatively expensive, but you might be able to get away with caching the result of the parse into an array that you can efficiently look up using a binary search. Also consider /proc/self/smaps. If your API is for high-performance computing, then the program will want to know about /proc/self/numa_maps, which is documented under the man page for numa, the non-uniform memory architecture.
The get_mempolicy(MPOL_F_ADDR) call is appropriate for high-performance computing API work where there are multiple threads of execution and you are managing your work to have affinity for non-uniform memory as it relates to the CPU cores and socket resources. Such an API will of course also tell you whether a pointer is valid.
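As a rough illustration of the first technique, here is a minimal sketch assuming Linux and glibc. The helper name page_is_mapped is hypothetical, and the sketch only reports whether the page containing the pointer is mapped at the moment of the call; it says nothing about access permissions or about what object lives there.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns true if the page containing p is currently
 * mapped in this process. mincore() fails with ENOMEM when the range
 * contains unmapped memory; any failure is treated as "not mapped" here. */
bool page_is_mapped(const void *p)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return false;

    /* Round the address down to the start of its page. */
    uintptr_t base = (uintptr_t)p & ~((uintptr_t)page_size - 1);

    unsigned char residency; /* one status byte per page queried */
    return mincore((void *)base, (size_t)page_size, &residency) == 0;
}
```

A caller might use it as `if (!page_is_mapped(ptr)) return -1;`, keeping in mind that another thread can unmap the page right after the check.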
Under Microsoft Windows there is the function QueryWorkingSetEx that is documented under the Process Status API (also in the NUMA API).
As a corollary to sophisticated NUMA API programming, this function will also let you do simple "testing pointers for validity (C/C++)" work. As such, it is unlikely to be deprecated for at least 15 years.
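The /proc/self/maps approach described above could be sketched roughly as follows, assuming Linux. The function name find_mapping is hypothetical, and the sketch re-reads the file on every call rather than caching the parsed regions in a sorted array as suggested.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scans /proc/self/maps for the region containing p.
 * On success, copies the permission string (for example "rw-p") into
 * perms_out and returns true. */
bool find_mapping(const void *p, char perms_out[5])
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return false;

    uintptr_t addr = (uintptr_t)p;
    char line[512];
    bool found = false;

    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;
        char perms[5];
        /* Each entry starts with "start-end perms", addresses in hex. */
        if (sscanf(line, "%lx-%lx %4s", &lo, &hi, perms) != 3)
            continue;
        if (addr >= lo && addr < hi) {
            memcpy(perms_out, perms, sizeof perms);
            found = true;
            break;
        }
    }
    fclose(f);
    return found;
}
```

The permission string lets the caller distinguish readable from writable regions, which the mincore() approach does not.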
Preventing a crash caused by the caller sending in an invalid pointer is a good way to make silent bugs that are hard to find.
Isn't it better for the programmer using your API to get a clear message that his code is bogus by crashing it rather than hiding it?
On Win32/64 there is a way to do this. Attempt to read through the pointer and catch the resulting SEH exception that will be thrown on failure. If it doesn't throw, then it's a valid pointer.
The problem with this method though is that it just returns whether or not you can read data from the pointer. It makes no guarantee about type safety or any number of other invariants. In general this method is good for little else other than to say "yes, I can read that particular place in memory at a time that has now passed".
In short, Don't do this ;)
Raymond Chen has a blog post on this subject: http://blogs.msdn.com/oldnewthing/archive/2007/06/25/3507294.aspx
AFAIK there is no way. You should try to avoid this situation by always setting pointers to NULL after freeing memory.
On Unix you should be able to utilize a kernel syscall that does pointer checking and returns EFAULT, such as:
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdbool.h>

bool isPointerBad( void * p )
{
    int fh = open( p, 0, 0 );
    int e = errno;

    if ( -1 == fh && e == EFAULT )
    {
        printf( "bad pointer: %p\n", p );
        return true;
    }
    else if ( fh != -1 )
    {
        close( fh );
    }

    printf( "good pointer: %p\n", p );
    return false;
}

int main()
{
    int good = 4;
    isPointerBad( (void *)3 );
    isPointerBad( &good );
    isPointerBad( "/tmp/blah" );
    return 0;
}
returning:
bad pointer: 0x3
good pointer: 0x7fff375fd49c
good pointer: 0x400793
There's probably a better syscall to use than open() [perhaps access], since there's a chance that open() could end up on an actual file-creation code path, and it requires a subsequent close().
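A minimal sketch of the access() variant suggested above, assuming a Unix-like system where the kernel reports EFAULT for an unreadable path argument. The name path_pointer_faults is hypothetical. Because the kernel treats the pointer as a NUL-terminated path, a valid buffer that has no NUL byte before the end of its mapping could still be reported as faulting.

```c
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: returns true if the kernel could not read a path
 * string starting at p. Unlike open(), access() never creates a file and
 * leaves no descriptor to close. */
bool path_pointer_faults(const void *p)
{
    errno = 0;
    return access((const char *)p, F_OK) == -1 && errno == EFAULT;
}
```

Any other outcome, including ENOENT for a path that does not exist, means the kernel was able to read the bytes at p.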
Regarding the answer a bit up in this thread:
IsBadReadPtr(), IsBadWritePtr(), IsBadCodePtr(), IsBadStringPtr() for Windows.
My advice is to stay away from them; someone has already posted this link:
http://blogs.msdn.com/oldnewthing/archive/2007/06/25/3507294.aspx
Another post on the same topic and by the same author (I think) is this one:
http://blogs.msdn.com/oldnewthing/archive/2006/09/27/773741.aspx ("IsBadXxxPtr should really be called CrashProgramRandomly").
If the users of your API send in bad data, let it crash. If the problem is that the data passed isn't used until later (and that makes it harder to find the cause), add a debug mode where the strings etc. are logged at entry. If they are bad it will be obvious (and probably crash). If it is happening way too often, it might be worth moving your API out of process and letting them crash the API process instead of the main process.
Firstly, I don't see any point in trying to protect yourself from the caller deliberately trying to cause a crash. They could easily do this by trying to access through an invalid pointer themselves. There are many other ways - they could just overwrite your memory or the stack. If you need to protect against this sort of thing then you need to be running in a separate process using sockets or some other IPC for communication.
We write quite a lot of software that allows partners/customers/users to extend functionality. Inevitably any bug gets reported to us first so it is useful to be able to easily show that the problem is in the plug-in code. Additionally there are security concerns and some users are more trusted than others.
We use a number of different methods depending on performance/throughput requirements and trustworthiness. From most preferred:
separate processes using sockets (often passing data as text).
separate processes using shared memory (if large amounts of data to pass).
same process separate threads via message queue (if frequent short messages).
same process separate threads all passed data allocated from a memory pool.
same process via direct procedure call - all passed data allocated from a memory pool.
We try never to resort to what you are trying to do when dealing with third party software - especially when we are given the plug-ins/library as binary rather than source code.
Use of a memory pool is quite easy in most circumstances and needn't be inefficient. If YOU allocate the data in the first place then it is trivial to check the pointers against the values you allocated. You could also store the length allocated and add "magic" values before and after the data to check for valid data type and data overruns.
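To make the memory-pool idea concrete, here is a minimal sketch of a fixed-slot pool that can tell whether a pointer is one it handed out. All names (pool_alloc, pool_owns, pool_free) and the sizes are assumptions for illustration; the length tracking and "magic" guard values mentioned above are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SLOTS 64   /* hypothetical capacity */
#define SLOT_SIZE  128  /* hypothetical fixed block size, in bytes */

static _Alignas(max_align_t) unsigned char pool_mem[POOL_SLOTS][SLOT_SIZE];
static bool slot_used[POOL_SLOTS];

/* Hands out one free slot, or NULL when the pool is exhausted. */
void *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!slot_used[i]) {
            slot_used[i] = true;
            return pool_mem[i];
        }
    }
    return NULL;
}

/* True only if p is the start of a slot that is currently allocated. */
bool pool_owns(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)pool_mem;
    if (addr < base || addr >= base + sizeof pool_mem)
        return false;
    size_t offset = (size_t)(addr - base);
    return offset % SLOT_SIZE == 0 && slot_used[offset / SLOT_SIZE];
}

/* Releases a slot; pointers the pool does not own are ignored. */
void pool_free(void *p)
{
    if (pool_owns(p))
        slot_used[((uintptr_t)p - (uintptr_t)pool_mem) / SLOT_SIZE] = false;
}
```

An API entry point can then reject anything for which pool_owns() returns false before touching the data.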
I've got a lot of sympathy with your question, as I'm in an almost identical position myself. I appreciate what a lot of the replies are saying, and they are correct - the routine supplying the pointer should be providing a valid pointer. In my case, it is almost inconceivable that they could have corrupted the pointer - but if they had managed, it would be MY software that crashes, and ME that would get the blame :-(
My requirement isn't that I continue after a segmentation fault - that would be dangerous - I just want to report what happened to the customer before terminating so that they can fix their code rather than blaming me!
This is how I've found to do it (on Windows): http://www.cplusplus.com/reference/clibrary/csignal/signal/
To give a synopsis:
#include <csignal>
#include <cstdlib>
#include <iostream>
using namespace std;

/// Function executed if a segmentation fault is encountered during the cast to an instance.
/// Note: iostream output and exit() are not async-signal-safe, so this is only a best-effort report.
void terminate(int param)
{
    cerr << "\nThe function received a corrupted reference - please check the user-supplied dll.\n";
    cerr << "Terminating program...\n";
    exit(1);
}
...
void MyFunction()
{
    // Install the handler and remember the previous one.
    void (*previous_sigsegv_function)(int);
    previous_sigsegv_function = signal(SIGSEGV, terminate);

    // <-- insert risky stuff here -->

    // Restore the previous handler.
    signal(SIGSEGV, previous_sigsegv_function);
}
Now this appears to behave as I would hope (it prints the error message, then terminates the program) - but if someone can spot a flaw, please let me know!
There are no provisions in C++ to test for the validity of a pointer as a general case. One can obviously assume that NULL (0x00000000) is bad, and various compilers and libraries like to use "special values" here and there to make debugging easier (For example, if I ever see a pointer show up as 0xCECECECE in visual studio I know I did something wrong) but the truth is that since a pointer is just an index into memory it's near impossible to tell just by looking at the pointer if it's the "right" index.
There are various tricks that you can do with dynamic_cast and RTTI to ensure that the object pointed to is of the type that you want, but they all require that you are pointing to something valid in the first place.
If you want to ensure that your program can detect "invalid" pointers then my advice is this: Set every pointer you declare either to NULL or a valid address immediately upon creation and set it to NULL immediately after freeing the memory that it points to. If you are diligent about this practice, then checking for NULL is all you ever need.
Setting the pointer to NULL before and after using is a good technique. This is easy to do in C++ if you manage pointers within a class for example (a string):
#include <cstdlib>
#include <cstring>

class SomeClass
{
public:
    SomeClass();
    ~SomeClass();

    void SetText( const char *text);
    char *GetText() const { return MyText; }
    void Clear();

private:
    char * MyText;
};

SomeClass::SomeClass()
{
    MyText = NULL;
}

SomeClass::~SomeClass()
{
    Clear();
}

void SomeClass::Clear()
{
    if (MyText)
        free( MyText);
    MyText = NULL;
}

void SomeClass::SetText( const char *text)
{
    Clear();
    // +1 leaves room for the terminating '\0' that strcpy writes
    MyText = (char *) malloc( strlen(text) + 1);
    if (MyText)
        strcpy( MyText, text);
}
Indeed, something can be done in specific cases. For example, if you want to check whether a string pointer is valid, the write(fd, buf, size) syscall can help you do the magic: let fd be a file descriptor of a temporary file you create for the test, and let buf point to the string you are testing. If the pointer is invalid, write() returns -1 and sets errno to EFAULT, which indicates that buf is outside your accessible address space.
Peeter Joos's answer is pretty good. Here is an "official" way to do it:
#include <errno.h>
#include <sys/mman.h>
#include <stdbool.h>
#include <unistd.h>

bool is_pointer_valid(void *p) {
    /* get the page size */
    size_t page_size = sysconf(_SC_PAGESIZE);
    /* find the address of the page that contains p */
    void *base = (void *)((((size_t)p) / page_size) * page_size);
    /* msync succeeds on a mapped page; ENOMEM means the page is not mapped */
    if (msync(base, page_size, MS_ASYNC) == 0)
        return true;
    return errno != ENOMEM;
}
There isn't any portable way of doing this, and doing it for specific platforms can be anywhere between hard and impossible. In any case, you should never write code that depends on such a check - don't let the pointers take on invalid values in the first place.
As others have said, you can't reliably detect an invalid pointer. Consider some of the forms an invalid pointer might take:
You could have a null pointer. That's one you could easily check for and do something about.
You could have a pointer to somewhere outside of valid memory. What constitutes valid memory varies depending on how the run-time environment of your system sets up the address space. On Unix systems, it is usually a virtual address space starting at 0 and going to some large number of megabytes. On embedded systems, it could be quite small. It might not start at 0, in any case. If your app happens to be running in supervisor mode or the equivalent, then your pointer might reference a real address, which may or may not be backed up with real memory.
You could have a pointer to somewhere inside your valid memory, even inside your data segment, bss, stack or heap, but not pointing at a valid object. A variant of this is a pointer that used to point to a valid object, before something bad happened to the object. Bad things in this context include deallocation, memory corruption, or pointer corruption.
You could have a flat-out illegal pointer, such as a pointer with illegal alignment for the thing being referenced.
The problem gets even worse when you consider segment/offset based architectures and other odd pointer implementations. This sort of thing is normally hidden from the developer by good compilers and judicious use of types, but if you want to pierce the veil and try to outsmart the operating system and compiler developers, well, you can, but there is not one generic way to do it that will handle all of the issues you might run into.
The best thing you can do is allow the crash and put out some good diagnostic information.
In general, it's impossible to do. Here's one particularly nasty case:
struct Point2d {
    int x;
    int y;
};

struct Point3d {
    int x;
    int y;
    int z;
};

void dump(Point3d *p)
{
    printf("[%d %d %d]\n", p->x, p->y, p->z);
}

Point2d points[2] = { {0, 1}, {2, 3} };
Point3d *p3 = reinterpret_cast<Point3d *>(&points[0]);
dump(p3);
On many platforms, this will print out:
[0 1 2]
You're forcing the runtime system to incorrectly interpret bits of memory, but in this case it's not going to crash, because the bits all make sense. This is part of the design of the language (look at C-style polymorphism with struct sockaddr, sockaddr_in, sockaddr_in6), so you can't reliably protect against it on any platform.
It's unbelievable how much misleading information you can read in the answers above...
Even the Microsoft MSDN documentation claims that the IsBadXxxPtr functions are banned. Oh well, I prefer a working application to a crashing one, even if "working" might mean working incorrectly (as long as the end user can continue with the application).
By googling I haven't found any useful example for Windows. I found a solution for 32-bit apps,
http://www.codeproject.com/script/Content/ViewAssociatedFile.aspx?rzp=%2FKB%2Fsystem%2Fdetect-driver%2F%2FDetectDriverSrc.zip&zep=DetectDriverSrc%2FDetectDriver%2Fsrc%2FdrvCppLib%2Frtti.cpp&obid=58895&obtid=2&ovid=2
but I also need to support 64-bit apps, so this solution did not work for me.
But I've mined Wine's source code and managed to cook up similar code that works for 64-bit apps as well. Attaching the code here:
#include <windows.h>   // IsBadReadPtr, RtlPcToFileHeader
#include <typeinfo.h>

typedef void (*v_table_ptr)();

typedef struct _cpp_object
{
    v_table_ptr* vtable;
} cpp_object;

#ifndef _WIN64
typedef struct _rtti_object_locator
{
    unsigned int signature;
    int base_class_offset;
    unsigned int flags;
    const type_info *type_descriptor;
    //const rtti_object_hierarchy *type_hierarchy;
} rtti_object_locator;
#else
typedef struct
{
    unsigned int signature;
    int base_class_offset;
    unsigned int flags;
    unsigned int type_descriptor;
    unsigned int type_hierarchy;
    unsigned int object_locator;
} rtti_object_locator;
#endif

/* Get type info from an object (internal) */
static const rtti_object_locator* RTTI_GetObjectLocator(void* inptr)
{
    cpp_object* cppobj = (cpp_object*) inptr;
    const rtti_object_locator* obj_locator = 0;

    if (!IsBadReadPtr(cppobj, sizeof(void*)) &&
        !IsBadReadPtr(cppobj->vtable - 1, sizeof(void*)) &&
        !IsBadReadPtr((void*)cppobj->vtable[-1], sizeof(rtti_object_locator)))
    {
        obj_locator = (rtti_object_locator*) cppobj->vtable[-1];
    }

    return obj_locator;
}
The following code can detect whether a pointer is valid or not; you probably need to add some NULL checking:
CTest* t = new CTest();
//t = (CTest*) 0;
//t = (CTest*) 0x12345678;
const rtti_object_locator* ptr = RTTI_GetObjectLocator(t);
#ifdef _WIN64
char *base = ptr->signature == 0 ? (char*)RtlPcToFileHeader((void*)ptr, (void**)&base) : (char*)ptr - ptr->object_locator;
const type_info *td = (const type_info*)(base + ptr->type_descriptor);
#else
const type_info *td = ptr->type_descriptor;
#endif
const char* n =td->name();
This gets the class name from the pointer. I think it should be enough for your needs.
One thing I'm still worried about is the performance of pointer checking: the code snippet above already makes 3-4 API calls, which might be overkill for time-critical applications.
It would be good if someone could measure the overhead of pointer checking compared to, for example, C#/managed C++ calls.
It is not a very good policy to accept arbitrary pointers as input parameters in a public API. It's better to have "plain data" types like an integer, a string or a struct (I mean a classical struct with plain data inside, of course; officially anything can be a struct).
Why? Well because as others say there is no standard way to know whether you've been given a valid pointer or one that points to junk.
But sometimes you don't have the choice - your API must accept a pointer.
In these cases, it is the duty of the caller to pass a good pointer. NULL may be accepted as a value, but not a pointer to junk.
Can you double-check in any way? Well, what I did in a case like that was to define an invariant for the type the pointer points to, and call it when you get it (in debug mode). At least if the invariant fails (or crashes) you know that you were passed a bad value.
// API that does not allow NULL
void PublicApiFunction1(Person* in_person)
{
    assert(in_person != NULL);
    assert(in_person->Invariant());

    // Actual code...
}

// API that allows NULL
void PublicApiFunction2(Person* in_person)
{
    assert(in_person == NULL || in_person->Invariant());

    // Actual code (must keep in mind that in_person may be NULL)
}
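The Invariant() call above is not defined in the answer. One way such a check might look, sketched in C with hypothetical names (struct person, PERSON_MAGIC, person_invariant) and made-up sanity bounds, is a magic tag plus a few plausibility tests:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PERSON_MAGIC 0x50455253u /* hypothetical type tag ("PERS") */

struct person {
    uint32_t magic;   /* set once by person_init, checked by the invariant */
    int age;
    const char *name;
};

void person_init(struct person *p, const char *name, int age)
{
    p->magic = PERSON_MAGIC;
    p->name = name;
    p->age = age;
}

/* Cheap plausibility test: junk memory is unlikely to pass all of these,
 * but passing them does not prove the pointer is genuine. */
bool person_invariant(const struct person *p)
{
    return p->magic == PERSON_MAGIC &&
           p->name != NULL &&
           p->age >= 0 && p->age < 200;
}

/* Example API entry point that does not allow NULL. */
int person_age_next_year(const struct person *p)
{
    assert(p != NULL);
    assert(person_invariant(p));
    return p->age + 1;
}
```

If the pointer is wild, evaluating the invariant may itself crash, which in a debug build is still a useful signal close to the call site.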
The following does work on Windows (somebody suggested it before):
static void copy(void * target, const void* source, int size)
{
    __try
    {
        CopyMemory(target, source, size);
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        doSomething(--whatever--);
    }
}
The function has to be static, standalone, or a static method of some class.
To test for read access, copy the data into a local buffer.
To test for write access without modifying the contents, write the same data back over itself.
You can test the first and last addresses only.
If the pointer is invalid, control will be passed to 'doSomething', and then continue after the __except block.
Just do not use anything requiring destructors, like CString.
On Windows I use this code:
#include <windows.h>
#include <stdexcept>
#include <string>

void * G_pPointer = NULL;
const char * G_szPointerName = NULL;

void CheckPointerIternal()
{
    // volatile keeps the compiler from optimizing the test read away
    volatile char cTest = *((volatile char *)G_pPointer);
    (void)cTest;
}

bool CheckPointerIternalExt()
{
    bool bRet = false;

    __try
    {
        CheckPointerIternal();
        bRet = true;
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
    }

    return bRet;
}

void CheckPointer(void * A_pPointer, const char * A_szPointerName)
{
    G_pPointer = A_pPointer;
    G_szPointerName = A_szPointerName;
    if (!CheckPointerIternalExt())
        throw std::runtime_error("Invalid pointer " + std::string(G_szPointerName) + "!");
}
Usage:
unsigned long * pTest = (unsigned long *) 0x12345;
CheckPointer(pTest, "pTest"); //throws exception
On macOS, you can do this with mach_vm_region, which as well as telling you if a pointer is valid, also lets you validate what access you have to the memory to which the pointer points (read/write/execute). I provided sample code to do this in my answer to another question:
#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <stdio.h>
#include <stdbool.h>

bool ptr_is_valid(void *ptr, vm_prot_t needs_access) {
    vm_map_t task = mach_task_self();
    mach_vm_address_t address = (mach_vm_address_t)ptr;
    mach_vm_size_t size = 0;
    vm_region_basic_info_data_64_t info;
    mach_msg_type_number_t count = VM_REGION_BASIC_INFO_COUNT_64;
    mach_port_t object_name;
    kern_return_t ret = mach_vm_region(task, &address, &size, VM_REGION_BASIC_INFO_64, (vm_region_info_t)&info, &count, &object_name);
    if (ret != KERN_SUCCESS) return false;
    return ((mach_vm_address_t)ptr) >= address && ((info.protection & needs_access) == needs_access);
}

#define TEST(ptr,acc) printf("ptr_is_valid(%p,access=%d)=%d\n", (void*)(ptr), (acc), ptr_is_valid((void*)(ptr),(acc)))

int main(int argc, char**argv) {
    TEST(0,0);
    TEST(0,VM_PROT_READ);
    TEST(123456789,VM_PROT_READ);
    TEST(main,0);
    TEST(main,VM_PROT_READ);
    TEST(main,VM_PROT_READ|VM_PROT_EXECUTE);
    TEST(main,VM_PROT_EXECUTE);
    TEST(main,VM_PROT_WRITE);
    TEST((void*)(-1),0);
    return 0;
}
The SEI CERT C Coding Standard recommendation "MEM10-C. Define and use a pointer validation function" says it is possible to do a check to some degree, especially under Linux.
The method described in the link is to keep track of the highest memory address returned by malloc and add a function that tests if someone tries to use a pointer greater than that value. It is probably of limited use.
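The high-water-mark idea as described above might look roughly like the following sketch. It is not the code from the CERT page; the names tracked_malloc and plausibly_valid are hypothetical, and the check is only a coarse filter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One past the highest byte handed out so far by tracked_malloc. */
static uintptr_t highest_heap_end = 0;

/* Hypothetical malloc wrapper that records the high-water mark. */
void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        uintptr_t end = (uintptr_t)p + size;
        if (end > highest_heap_end)
            highest_heap_end = end;
    }
    return p;
}

/* Rejects NULL and anything at or above the high-water mark. Everything
 * below the mark is accepted, whether or not it points at a live object. */
bool plausibly_valid(const void *p)
{
    return p != NULL && (uintptr_t)p < highest_heap_end;
}
```

This illustrates why the technique is of limited use: freed blocks and arbitrary low addresses pass, while perfectly good stack pointers, which usually sit above the heap, are rejected.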
IsBadReadPtr(), IsBadWritePtr(), IsBadCodePtr(), IsBadStringPtr() for Windows.
These take time proportional to the length of the block, so for sanity check I just check the starting address.
I have seen various libraries use some method to check for unreferenced memory and such. I believe they simply "override" the memory allocation and deallocation methods (malloc/free), which has some logic that keeps track of the pointers. I suppose this is overkill for your use case, but it would be one way to do it.
Technically you can override operator new (and delete) and collect information about all allocated memory, so you can have a method to check whether heap memory is valid.
But:
You still need a way to check whether the pointer refers to memory allocated on the stack.
You will need to define what a 'valid' pointer is:
a) memory at that address is allocated
b) memory at that address is the start address of an object (e.g. the address is not in the middle of a huge array)
c) memory at that address is the start address of an object of the expected type
Bottom line: the approach in question is not the C++ way; you need to define some rules which ensure that the function receives valid pointers.
There is no way to make that check in C++. What should you do if other code passes you an invalid pointer? You should crash. Why? Check out this link: http://blogs.msdn.com/oldnewthing/archive/2006/09/27/773741.aspx
Addendum to the accepted answer(s):
Assume that your pointer could hold only three values -- 0, 1 and -1, where 1 signifies a valid pointer, -1 an invalid one and 0 another invalid one. What is the probability that your pointer is NULL, all values being equally likely? 1/3. Now, take the valid case out, so of the invalid cases, you have a 50:50 chance of catching the error. Looks good, right? Scale this for a 4-byte pointer. There are 2^32 or 4294967296 possible values. Of these, only ONE value is correct, one is NULL, and you are still left with 4294967294 other invalid cases. Recalculate: you have a test for 1 out of (4294967294 + 1) invalid cases. A probability of about 2.3e-10, or 0 for most practical purposes. Such is the futility of the NULL check.
You know, a new driver (at least on Linux) that is capable of this probably wouldn't be that hard to write.
On the other hand, it would be folly to build your programs like this. Unless you have some really specific and single use for such a thing, I wouldn't recommend it. If you built a large application loaded with constant pointer validity checks it would likely be horrendously slow.
you should avoid these methods because they do not work. blogs.msdn.com/oldnewthing/archive/2006/09/27/773741.aspx – JaredPar Feb 15 '09 at 16:02
If they don't work, will the next Windows update fix them?
If they don't work at the conceptual level, the functions will probably be removed from the Windows API completely.
The MSDN documentation claims that they are banned, and the reason is probably that relying on them indicates a design flaw in the application (e.g. generally you should not swallow invalid pointers silently, if you're in charge of the design of the whole application of course), plus the performance cost of pointer checking.
But you should not claim that they do not work because of some blog.
In my test application I've verified that they do work.
these links may be helpful
_CrtIsValidPointer
Verifies that a specified memory range is valid for reading and writing (debug version only).
http://msdn.microsoft.com/en-us/library/0w1ekd5e.aspx
_CrtCheckMemory
Confirms the integrity of the memory blocks allocated in the debug heap (debug version only).
http://msdn.microsoft.com/en-us/library/e73x0s4b.aspx
Is there any way to determine (programatically, of course) if a given pointer is "valid"? Checking for NULL is easy, but what about things like 0x00001234? When trying to dereference this kind of pointer an exception/crash occurs.
A cross-platform method is preferred, but platform-specific (for Windows and Linux) is also ok.
Update for clarification:
The problem is not with stale/freed/uninitialized pointers; instead, I'm implementing an API that takes pointers from the caller (like a pointer to a string, a file handle, etc.). The caller can send (in purpose or by mistake) an invalid value as the pointer. How do I prevent a crash?
Update for clarification: The problem is not with stale, freed or uninitialized pointers; instead, I'm implementing an API that takes pointers from the caller (like a pointer to a string, a file handle, etc.). The caller can send (in purpose or by mistake) an invalid value as the pointer. How do I prevent a crash?
You can't make that check. There is simply no way you can check whether a pointer is "valid". You have to trust that when people use a function that takes a pointer, those people know what they are doing. If they pass you 0x4211 as a pointer value, then you have to trust it points to address 0x4211. And if they "accidentally" hit an object, then even if you would use some scary operation system function (IsValidPtr or whatever), you would still slip into a bug and not fail fast.
Start using null pointers for signaling this kind of thing and tell the user of your library that they should not use pointers if they tend to accidentally pass invalid pointers, seriously :)
Here are three easy ways for a C program under Linux to get introspective about the status of the memory in which it is running, and why the question has appropriate sophisticated answers in some contexts.
After calling getpagesize() and rounding the pointer to a page
boundary, you can call mincore() to find out if a page is valid and
if it happens to be part of the process working set. Note that this requires
some kernel resources, so you should benchmark it and determine if
calling this function is really appropriate in your api. If your api
is going to be handling interrupts, or reading from serial ports
into memory, it is appropriate to call this to avoid unpredictable
behaviors.
After calling stat() to determine if there is a /proc/self directory available, you can fopen and read through /proc/self/maps
to find information about the region in which a pointer resides.
Study the man page for proc, the process information pseudo-file
system. Obviously this is relatively expensive, but you might be
able to get away with caching the result of the parse into an array
you can efficiently lookup using a binary search. Also consider the
/proc/self/smaps. If your api is for high-performance computing then
the program will want to know about the /proc/self/numa which is
documented under the man page for numa, the non-uniform memory
architecture.
The get_mempolicy(MPOL_F_ADDR) call is appropriate for high performance computing api work where there are multiple threads of
execution and you are managing your work to have affinity for non-uniform memory
as it relates to the cpu cores and socket resources. Such an api
will of course also tell you if a pointer is valid.
Under Microsoft Windows there is the function QueryWorkingSetEx that is documented under the Process Status API (also in the NUMA API).
As a corollary to sophisticated NUMA API programming this function will also let you do simple "testing pointers for validity (C/C++)" work, as such it is unlikely to be deprecated for at least 15 years.
Preventing a crash caused by the caller sending in an invalid pointer is a good way to make silent bugs that are hard to find.
Isn't it better for the programmer using your API to get a clear message that his code is bogus by crashing it rather than hiding it?
On Win32/64 there is a way to do this. Attempt to read the pointer and catch the resulting SEH exeception that will be thrown on failure. If it doesn't throw, then it's a valid pointer.
The problem with this method though is that it just returns whether or not you can read data from the pointer. It makes no guarantee about type safety or any number of other invariants. In general this method is good for little else other than to say "yes, I can read that particular place in memory at a time that has now passed".
In short, Don't do this ;)
Raymond Chen has a blog post on this subject: http://blogs.msdn.com/oldnewthing/archive/2007/06/25/3507294.aspx
AFAIK there is no way. You should try to avoid this situation by always setting pointers to NULL after freeing memory.
On Unix you should be able to utilize a kernel syscall that does pointer checking and returns EFAULT, such as:
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdbool.h>
bool isPointerBad( void * p )
{
int fh = open( p, 0, 0 );
int e = errno;
if ( -1 == fh && e == EFAULT )
{
printf( "bad pointer: %p\n", p );
return true;
}
else if ( fh != -1 )
{
close( fh );
}
printf( "good pointer: %p\n", p );
return false;
}
int main()
{
int good = 4;
isPointerBad( (void *)3 );
isPointerBad( &good );
isPointerBad( "/tmp/blah" );
return 0;
}
returning:
bad pointer: 0x3
good pointer: 0x7fff375fd49c
good pointer: 0x400793
There's probably a better syscall to use than open() [perhaps access], since there's a chance that this could lead to actual file creation codepath, and a subsequent close requirement.
Regarding the answer a bit up in this thread:
IsBadReadPtr(), IsBadWritePtr(), IsBadCodePtr(), IsBadStringPtr() for Windows.
My advice is to stay away from them, someone has already posted this one:
http://blogs.msdn.com/oldnewthing/archive/2007/06/25/3507294.aspx
Another post on the same topic and by the same author (I think) is this one:
http://blogs.msdn.com/oldnewthing/archive/2006/09/27/773741.aspx ("IsBadXxxPtr should really be called CrashProgramRandomly").
If the users of your API sends in bad data, let it crash. If the problem is that the data passed isn't used until later (and that makes it harder to find the cause), add a debug mode where the strings etc. are logged at entry. If they are bad it will be obvious (and probably crash). If it is happening way to often, it might be worth moving your API out of process and let them crash the API process instead of the main process.
Firstly, I don't see any point in trying to protect yourself from the caller deliberately trying to cause a crash. They could easily do this by trying to access through an invalid pointer themselves. There are many other ways - they could just overwrite your memory or the stack. If you need to protect against this sort of thing then you need to be running in a separate process using sockets or some other IPC for communication.
We write quite a lot of software that allows partners/customers/users to extend functionality. Inevitably any bug gets reported to us first so it is useful to be able to easily show that the problem is in the plug-in code. Additionally there are security concerns and some users are more trusted than others.
We use a number of different methods depending on performance/throughput requirements and trustworthiness. From most preferred:
separate processes using sockets (often passing data as text).
separate processes using shared memory (if large amounts of data to pass).
same process separate threads via message queue (if frequent short messages).
same process separate threads all passed data allocated from a memory pool.
same process via direct procedure call - all passed data allocated from a memory pool.
We try never to resort to what you are trying to do when dealing with third party software - especially when we are given the plug-ins/library as binary rather than source code.
Use of a memory pool is quite easy in most circumstances and needn't be inefficient. If YOU allocate the data in the first place then it is trivial to check the pointers against the values you allocated. You could also store the length allocated and add "magic" values before and after the data to check for valid data type and data overruns.
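A minimal sketch of that idea, assuming the API hands out every block itself: the pool records each pointer with its length and frames the data with guard bytes. The class name CheckedPool, the guard pattern 0xA5 and the guard size are all made up for illustration, not taken from any real library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <unordered_map>

// Guard size keeps the user pointer aligned like a plain malloc() result.
constexpr std::size_t kGuard = alignof(std::max_align_t);
constexpr unsigned char kPattern = 0xA5;  // arbitrary "magic" byte

class CheckedPool {
public:
    CheckedPool() = default;
    CheckedPool(const CheckedPool&) = delete;
    CheckedPool& operator=(const CheckedPool&) = delete;
    ~CheckedPool() {
        for (auto& entry : blocks_)
            std::free(const_cast<unsigned char*>(
                static_cast<const unsigned char*>(entry.first) - kGuard));
    }

    // Allocates n bytes framed by a guard before and after the data.
    void* allocate(std::size_t n) {
        auto* raw = static_cast<unsigned char*>(std::malloc(n + 2 * kGuard));
        if (!raw) return nullptr;
        std::memset(raw, kPattern, kGuard);
        std::memset(raw + kGuard + n, kPattern, kGuard);
        void* user = raw + kGuard;
        blocks_[user] = n;
        return user;
    }

    // True only if p was handed out by this pool and both guards are intact.
    bool is_valid(const void* p) const {
        auto it = blocks_.find(p);
        if (it == blocks_.end()) return false;
        auto* user = static_cast<const unsigned char*>(p);
        return guard_ok(user - kGuard) && guard_ok(user + it->second);
    }

    void release(void* p) {
        auto it = blocks_.find(p);
        if (it == blocks_.end()) return;  // not ours: ignore
        std::free(static_cast<unsigned char*>(p) - kGuard);
        blocks_.erase(it);
    }

private:
    static bool guard_ok(const unsigned char* g) {
        for (std::size_t i = 0; i < kGuard; ++i)
            if (g[i] != kPattern) return false;
        return true;
    }
    std::unordered_map<const void*, std::size_t> blocks_;
};
```

The lookup never dereferences an unknown pointer; it only compares the value against addresses the pool itself issued, which is why this check is safe where a general "is this pointer valid" test is not.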
I've got a lot of sympathy with your question, as I'm in an almost identical position myself. I appreciate what a lot of the replies are saying, and they are correct - the routine supplying the pointer should be providing a valid pointer. In my case, it is almost inconceivable that they could have corrupted the pointer - but if they had managed, it would be MY software that crashes, and ME that would get the blame :-(
My requirement isn't that I continue after a segmentation fault - that would be dangerous - I just want to report what happened to the customer before terminating so that they can fix their code rather than blaming me!
This is how I've found to do it (on Windows): http://www.cplusplus.com/reference/clibrary/csignal/signal/
To give a synopsis:
#include <signal.h>
#include <stdlib.h>
#include <iostream>
using namespace std;
void terminate(int param)
/// Function executed if a segmentation fault is encountered during the cast to an instance.
{
cerr << "\nThe function received a corrupted reference - please check the user-supplied dll.\n";
cerr << "Terminating program...\n";
exit(1);
}
...
void MyFunction()
{
void (*previous_sigsegv_function)(int);
previous_sigsegv_function = signal(SIGSEGV, terminate);
<-- insert risky stuff here -->
signal(SIGSEGV, previous_sigsegv_function);
}
Now this appears to behave as I would hope (it prints the error message, then terminates the program) - but if someone can spot a flaw, please let me know!
There are no provisions in C++ to test for the validity of a pointer as a general case. One can obviously assume that NULL (0x00000000) is bad, and various compilers and libraries like to use "special values" here and there to make debugging easier (For example, if I ever see a pointer show up as 0xCECECECE in visual studio I know I did something wrong) but the truth is that since a pointer is just an index into memory it's near impossible to tell just by looking at the pointer if it's the "right" index.
There are various tricks that you can do with dynamic_cast and RTTI such to ensure that the object pointed to is of the type that you want, but they all require that you are pointing to something valid in the first place.
If you want to ensure that your program can detect "invalid" pointers then my advice is this: Set every pointer you declare either to NULL or a valid address immediately upon creation and set it to NULL immediately after freeing the memory that it points to. If you are diligent about this practice, then checking for NULL is all you ever need.
Setting the pointer to NULL before and after using is a good technique. This is easy to do in C++ if you manage pointers within a class for example (a string):
class SomeClass
{
public:
SomeClass();
~SomeClass();
void SetText( const char *text);
char *GetText() const { return MyText; }
void Clear();
private:
char * MyText;
};
SomeClass::SomeClass()
{
MyText = NULL;
}
SomeClass::~SomeClass()
{
Clear();
}
void SomeClass::Clear()
{
if (MyText)
free( MyText);
MyText = NULL;
}
void SomeClass::SetText( const char *text)
{
Clear();
// Allocate room for the terminating '\0' as well.
MyText = (char *) malloc( strlen(text) + 1);
if (MyText)
strcpy( MyText, text);
}
Indeed, something can be done in specific cases: for example, if you want to check whether a string pointer is valid, the write(fd, buf, size) syscall can do the magic. Let fd be a file descriptor of a temporary file you create for the test, and buf point to the string you are testing. If the pointer is invalid, write() returns -1 and sets errno to EFAULT, which indicates that buf is outside your accessible address space.
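A rough sketch of that probe, assuming Linux. It swaps the temporary file for a pipe so nothing touches the disk, and it only probes a single byte so the write can never block; the helper name first_byte_is_readable is hypothetical. Note that writing to /dev/null would not work for this on Linux, because that driver never reads the buffer.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Hypothetical helper (Linux/Unix only): asks the kernel to copy one byte
// from p into a pipe and reports whether it refused with EFAULT.
bool first_byte_is_readable(const void* p) {
    int fds[2];
    if (pipe(fds) != 0)
        return false;  // probe could not run; treat the pointer as unusable
    ssize_t n = write(fds[1], p, 1);
    int saved_errno = errno;  // close() below may overwrite errno
    close(fds[0]);
    close(fds[1]);
    return !(n == -1 && saved_errno == EFAULT);
}
```

This costs three or four syscalls per check and only proves that one byte is mapped and readable, so it is better suited to diagnostics than to a hot path.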
Peeter Joos's answer is pretty good. Here is an "official" way to do it:
#include <sys/mman.h>
#include <stdbool.h>
#include <unistd.h>
#include <errno.h>
bool is_pointer_valid(void *p) {
/* get the page size */
size_t page_size = sysconf(_SC_PAGESIZE);
/* find the address of the page that contains p */
void *base = (void *)((((size_t)p) / page_size) * page_size);
/* call msync; it fails with ENOMEM if the page is not mapped */
int ret = msync(base, page_size, MS_ASYNC) != -1;
return ret ? ret : errno != ENOMEM;
}
There isn't any portable way of doing this, and doing it for specific platforms can be anywhere between hard and impossible. In any case, you should never write code that depends on such a check - don't let the pointers take on invalid values in the first place.
As others have said, you can't reliably detect an invalid pointer. Consider some of the forms an invalid pointer might take:
You could have a null pointer. That's one you could easily check for and do something about.
You could have a pointer to somewhere outside of valid memory. What constitutes valid memory varies depending on how the run-time environment of your system sets up the address space. On Unix systems, it is usually a virtual address space starting at 0 and going to some large number of megabytes. On embedded systems, it could be quite small. It might not start at 0, in any case. If your app happens to be running in supervisor mode or the equivalent, then your pointer might reference a real address, which may or may not be backed up with real memory.
You could have a pointer to somewhere inside your valid memory, even inside your data segment, bss, stack or heap, but not pointing at a valid object. A variant of this is a pointer that used to point to a valid object, before something bad happened to the object. Bad things in this context include deallocation, memory corruption, or pointer corruption.
You could have a flat-out illegal pointer, such as a pointer with illegal alignment for the thing being referenced.
The problem gets even worse when you consider segment/offset based architectures and other odd pointer implementations. This sort of thing is normally hidden from the developer by good compilers and judicious use of types, but if you want to pierce the veil and try to outsmart the operating system and compiler developers, well, you can, but there is not one generic way to do it that will handle all of the issues you might run into.
The best thing you can do is allow the crash and put out some good diagnostic information.
In general, it's impossible to do. Here's one particularly nasty case:
struct Point2d {
int x;
int y;
};
struct Point3d {
int x;
int y;
int z;
};
void dump(Point3d *p)
{
printf("[%d %d %d]\n", p->x, p->y, p->z);
}
Point2d points[2] = { {0, 1}, {2, 3} };
Point3d *p3 = reinterpret_cast<Point3d *>(&points[0]);
dump(p3);
On many platforms, this will print out:
[0 1 2]
You're forcing the runtime system to incorrectly interpret bits of memory, but in this case it's not going to crash, because the bits all make sense. This is part of the design of the language (look at C-style polymorphism with struct sockaddr, sockaddr_in, sockaddr_in6), so you can't reliably protect against it on any platform.
It's unbelievable how much misleading information you can read in articles above...
Even Microsoft's MSDN documentation claims that IsBadPtr is banned. Oh well, I prefer a working application to a crashing one, even if "working" might mean working incorrectly (as long as the end user can continue with the application).
Googling did not turn up any useful example for Windows. I found a solution for 32-bit apps,
http://www.codeproject.com/script/Content/ViewAssociatedFile.aspx?rzp=%2FKB%2Fsystem%2Fdetect-driver%2F%2FDetectDriverSrc.zip&zep=DetectDriverSrc%2FDetectDriver%2Fsrc%2FdrvCppLib%2Frtti.cpp&obid=58895&obtid=2&ovid=2
but I also need to support 64-bit apps, so this solution did not work for me.
However, I went through Wine's source code and managed to cook up similar code that works for 64-bit apps as well. Here it is:
#include <typeinfo.h>
typedef void (*v_table_ptr)();
typedef struct _cpp_object
{
v_table_ptr* vtable;
} cpp_object;
#ifndef _WIN64
typedef struct _rtti_object_locator
{
unsigned int signature;
int base_class_offset;
unsigned int flags;
const type_info *type_descriptor;
//const rtti_object_hierarchy *type_hierarchy;
} rtti_object_locator;
#else
typedef struct
{
unsigned int signature;
int base_class_offset;
unsigned int flags;
unsigned int type_descriptor;
unsigned int type_hierarchy;
unsigned int object_locator;
} rtti_object_locator;
#endif
/* Get type info from an object (internal) */
static const rtti_object_locator* RTTI_GetObjectLocator(void* inptr)
{
cpp_object* cppobj = (cpp_object*) inptr;
const rtti_object_locator* obj_locator = 0;
if (!IsBadReadPtr(cppobj, sizeof(void*)) &&
!IsBadReadPtr(cppobj->vtable - 1, sizeof(void*)) &&
!IsBadReadPtr((void*)cppobj->vtable[-1], sizeof(rtti_object_locator)))
{
obj_locator = (rtti_object_locator*) cppobj->vtable[-1];
}
return obj_locator;
}
The following code can detect whether a pointer is valid or not; you probably need to add some NULL checking:
CTest* t = new CTest();
//t = (CTest*) 0;
//t = (CTest*) 0x12345678;
const rtti_object_locator* ptr = RTTI_GetObjectLocator(t);
#ifdef _WIN64
char *base = ptr->signature == 0 ? (char*)RtlPcToFileHeader((void*)ptr, (void**)&base) : (char*)ptr - ptr->object_locator;
const type_info *td = (const type_info*)(base + ptr->type_descriptor);
#else
const type_info *td = ptr->type_descriptor;
#endif
const char* n =td->name();
This gets the class name from the pointer, which I think should be enough for your needs.
One thing I'm still worried about is the performance of pointer checking: the code snippet above already makes 3-4 API calls, which might be overkill for time-critical applications.
It would be good if someone could measure the overhead of pointer checking compared to, for example, C#/managed C++ calls.
It is not a very good policy to accept arbitrary pointers as input parameters in a public API. It's better to have "plain data" types like an integer, a string or a struct (I mean a classical struct with plain data inside, of course; officially anything can be a struct).
Why? Well because as others say there is no standard way to know whether you've been given a valid pointer or one that points to junk.
But sometimes you don't have the choice - your API must accept a pointer.
In these cases, it is the duty of the caller to pass a good pointer. NULL may be accepted as a value, but not a pointer to junk.
Can you double-check in any way? Well, what I did in a case like that was to define an invariant for the type the pointer points to, and call it when you get it (in debug mode). At least if the invariant fails (or crashes) you know that you were passed a bad value.
// API that does not allow NULL
void PublicApiFunction1(Person* in_person)
{
assert(in_person != NULL);
assert(in_person->Invariant());
// Actual code...
}
// API that allows NULL
void PublicApiFunction2(Person* in_person)
{
assert(in_person == NULL || in_person->Invariant());
// Actual code (must keep in mind that in_person may be NULL)
}
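For illustration, the Person type used above might define its invariant along the lines of the sketch below. The tag value, the fields and the plausibility limits are all assumptions made up for this example. Calling Invariant() through a junk pointer is still undefined behaviour, so this can only make a bad value fail earlier and more visibly; it cannot make the call safe.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical Person type: a type tag plus a few plausibility checks.
class Person {
public:
    Person(std::string name, int age)
        : tag_(kTag), name_(std::move(name)), age_(age) {}

    // True if the object looks like a properly constructed Person.
    bool Invariant() const {
        return tag_ == kTag && !name_.empty() && age_ >= 0 && age_ < 150;
    }

private:
    static constexpr std::uint32_t kTag = 0x50455253;  // "PERS", arbitrary
    std::uint32_t tag_;
    std::string name_;
    int age_;
};
```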
Following does work in Windows (somebody suggested it before):
static void copy(void * target, const void* source, int size)
{
__try
{
CopyMemory(target, source, size);
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
doSomething(--whatever--);
}
}
The function has to be static, standalone or static method of some class.
To test on read-only, copy data in the local buffer.
To test on write without modifying contents, write them over.
You can test first/last addresses only.
If pointer is invalid, control will be passed to 'doSomething',
and then outside the brackets.
Just do not use anything requiring destructors, like CString.
On Windows I use this code:
void * G_pPointer = NULL;
const char * G_szPointerName = NULL;
void CheckPointerIternal()
{
char cTest = *((char *)G_pPointer);
}
bool CheckPointerIternalExt()
{
bool bRet = false;
__try
{
CheckPointerIternal();
bRet = true;
}
__except (EXCEPTION_EXECUTE_HANDLER)
{
}
return bRet;
}
void CheckPointer(void * A_pPointer, const char * A_szPointerName)
{
G_pPointer = A_pPointer;
G_szPointerName = A_szPointerName;
if (!CheckPointerIternalExt())
throw std::runtime_error("Invalid pointer " + std::string(G_szPointerName) + "!");
}
Usage:
unsigned long * pTest = (unsigned long *) 0x12345;
CheckPointer(pTest, "pTest"); //throws exception
On macOS, you can do this with mach_vm_region, which as well as telling you if a pointer is valid, also lets you validate what access you have to the memory to which the pointer points (read/write/execute). I provided sample code to do this in my answer to another question:
#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <stdio.h>
#include <stdbool.h>
bool ptr_is_valid(void *ptr, vm_prot_t needs_access) {
vm_map_t task = mach_task_self();
mach_vm_address_t address = (mach_vm_address_t)ptr;
mach_vm_size_t size = 0;
vm_region_basic_info_data_64_t info;
mach_msg_type_number_t count = VM_REGION_BASIC_INFO_COUNT_64;
mach_port_t object_name;
kern_return_t ret = mach_vm_region(task, &address, &size, VM_REGION_BASIC_INFO_64, (vm_region_info_t)&info, &count, &object_name);
if (ret != KERN_SUCCESS) return false;
return ((mach_vm_address_t)ptr) >= address && ((info.protection & needs_access) == needs_access);
}
#define TEST(ptr,acc) printf("ptr_is_valid(%p,access=%d)=%d\n", (void*)(ptr), (acc), ptr_is_valid((void*)(ptr),(acc)))
int main(int argc, char**argv) {
TEST(0,0);
TEST(0,VM_PROT_READ);
TEST(123456789,VM_PROT_READ);
TEST(main,0);
TEST(main,VM_PROT_READ);
TEST(main,VM_PROT_READ|VM_PROT_EXECUTE);
TEST(main,VM_PROT_EXECUTE);
TEST(main,VM_PROT_WRITE);
TEST((void*)(-1),0);
return 0;
}
The SEI CERT C Coding Standard recommendation MEM10-C. Define and use a pointer validation function says it is possible to do a check to some degree, especially under Linux OS.
The method described in the link is to keep track of the highest memory address returned by malloc and add a function that tests if someone tries to use a pointer greater than that value. It is probably of limited use.
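A rough sketch of the bookkeeping described above, not the code from the CERT page. It assumes every allocation goes through a wrapper, and it also tracks a lower bound, which is an addition made here for illustration; the names tracked_malloc and could_be_heap_pointer are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Lowest and highest heap addresses seen so far (single-threaded sketch).
static std::uintptr_t g_lowest = UINTPTR_MAX;
static std::uintptr_t g_highest = 0;

// Wrapper around malloc() that remembers the address range it has returned.
void* tracked_malloc(std::size_t n) {
    void* p = std::malloc(n);
    if (p) {
        auto begin = reinterpret_cast<std::uintptr_t>(p);
        auto end = begin + n;
        if (begin < g_lowest) g_lowest = begin;
        if (end > g_highest) g_highest = end;
    }
    return p;
}

// Rejects pointers outside the range ever handed out. A pointer inside the
// range may still be freed or point between blocks, so "true" proves little.
bool could_be_heap_pointer(const void* p) {
    auto a = reinterpret_cast<std::uintptr_t>(p);
    return p != nullptr && a >= g_lowest && a < g_highest;
}
```

This can only reject obviously wrong values such as small integers; it says nothing about stack or static objects, which is one reason the technique is of limited use.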
IsBadReadPtr(), IsBadWritePtr(), IsBadCodePtr(), IsBadStringPtr() for Windows.
These take time proportional to the length of the block, so for sanity check I just check the starting address.
I have seen various libraries use some method to check for unreferenced memory and such. I believe they simply "override" the memory allocation and deallocation methods (malloc/free), which has some logic that keeps track of the pointers. I suppose this is overkill for your use case, but it would be one way to do it.
Technically you can override operator new (and delete) and collect information about all allocated memory, so you can have a method to check if heap memory is valid.
but:
you still need a way to check if pointer is allocated on stack
you will need to define what is 'valid' pointer:
a) memory on that address is allocated
b) memory at that address is start address of object (e.g. address not in the middle of huge array)
c) memory at that address is start address of object of expected type
Bottom line: approach in question is not C++ way, you need to define some rules which ensure that function receives valid pointers.
There is no way to make that check in C++. What should you do if other code passes you an invalid pointer? You should crash. Why? Check out this link: http://blogs.msdn.com/oldnewthing/archive/2006/09/27/773741.aspx
Addendum to the accepted answer(s):
Assume that your pointer could hold only three values: 0, 1 and -1, where 1 signifies a valid pointer, -1 an invalid one and 0 another invalid one. What is the probability that your pointer is NULL, all values being equally likely? 1/3. Now, take the valid case out, so for every invalid case, you have a 50:50 chance of catching the error. Looks good, right? Scale this for a 4-byte pointer. There are 2^32 or 4294967296 possible values. Of these, only ONE value is correct, one is NULL, and you are still left with 4294967294 other invalid cases. Recalculate: you have a test for 1 out of (4294967294 + 1) invalid cases. A probability of about 2.3e-10, or 0 for most practical purposes. Such is the futility of the NULL check.
You know, a new driver (at least on Linux) that is capable of this probably wouldn't be that hard to write.
On the other hand, it would be folly to build your programs like this. Unless you have some really specific and single use for such a thing, I wouldn't recommend it. If you built a large application loaded with constant pointer validity checks it would likely be horrendously slow.
you should avoid these methods because they do not work. blogs.msdn.com/oldnewthing/archive/2006/09/27/773741.aspx – JaredPar Feb 15 '09 at 16:02
If they don't work, will the next Windows update fix them?
If they don't work at the concept level, the functions will probably be removed from the Windows API completely.
The MSDN documentation claims that they are banned, and the reason is probably that relying on them leads to flawed application design (e.g. you generally should not swallow invalid pointers silently, if you're in charge of the design of the whole application, of course), plus the performance/time cost of pointer checking.
But you should not claim that they do not work because of some blog.
In my test application I've verified that they do work.
these links may be helpful
_CrtIsValidPointer
Verifies that a specified memory range is valid for reading and writing (debug version only).
http://msdn.microsoft.com/en-us/library/0w1ekd5e.aspx
_CrtCheckMemory
Confirms the integrity of the memory blocks allocated in the debug heap (debug version only).
http://msdn.microsoft.com/en-us/library/e73x0s4b.aspx
Currently, I write
assert(false);
at places that my code is never supposed to reach. One example, in a very C-ish style, is:
int findzero( int length, int * array ) {
for( int i = 0; i < length; i++ )
if( array[i] == 0 )
return i;
assert(false);
}
My compiler recognizes that the program finishes once assert(false) has been reached. However, whenever I compile with -DNDEBUG for performance reasons, the last assertion vanishes and the compiler warns that the execution finishes the function without a return statement.
What are better alternatives of finishing off a program if a supposedly unreachable part of the code has been reached? The solution should
be recognized by the compiler and not produce warnings (like the ones above or others)
perhaps even allow for a custom error message.
I am explicitly interested in solutions no matter whether it's modern C++ or like 90s C.
Replacing your assert(false) is exactly what "unreachable" built-ins are for.
They are a semantic equivalent to your use of assert(false). In fact, VS's is spelt very similarly.
GCC/Clang/Intel:
__builtin_unreachable()
MSVS:
__assume(false)
These have effect regardless of NDEBUG (unlike assert) or optimisation levels.
Your compiler, particularly with the above built-ins but also possibly with your assert(false), nods its head in understanding that you're promising that part of the function will never be reached. It can use this to perform some optimisations on certain code paths, and it will silence warnings about missing returns because you've already promised that it was deliberate.
The trade-off is that the statement itself has undefined behaviour (much like going forth and flowing off the end of the function was already). In some situations, you may instead wish to consider throwing an exception (or returning some "error code" value instead), or calling std::abort() (in C++) if you want to just terminate the program.
There's a proposal (P0627R0), to add this to C++ as a standard attribute.
From the GCC docs on Builtins:
If control flow reaches the point of the __builtin_unreachable, the program is undefined. It is useful in situations where the compiler cannot deduce the unreachability of the code. [..]
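Applied to the findzero example from the question, the built-ins might be wrapped as in the sketch below. MY_UNREACHABLE is a hypothetical macro name, and the std::abort() fallback for other compilers is an assumption added here.

```cpp
#include <cassert>
#include <cstdlib>

// MY_UNREACHABLE is a made-up name; pick the built-in the compiler offers.
#if defined(__GNUC__) || defined(__clang__)
#define MY_UNREACHABLE() __builtin_unreachable()
#elif defined(_MSC_VER)
#define MY_UNREACHABLE() __assume(false)
#else
#define MY_UNREACHABLE() std::abort()  // fallback: terminate instead of UB
#endif

// Precondition: array contains at least one zero among its first length items.
int findzero(int length, const int* array) {
    for (int i = 0; i < length; i++)
        if (array[i] == 0)
            return i;
    MY_UNREACHABLE();  // no missing-return warning, with or without NDEBUG
}
```

If the precondition is ever violated, the GCC/Clang and MSVC branches give undefined behaviour rather than a diagnostic, so this fits best where the caller is fully under your control.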
I like to use
assert(!"This should never happen.");
...which can also be used with a condition, as in
assert(!vector.empty() || !"Cannot take element from empty container." );
What's nice about this is that the string shows up in the error message in case an assertion does not hold.
As a fully portable solution, consider this:
[[ noreturn ]] void unreachable(std::string_view msg = "<No Message>") {
std::cerr << "Unreachable code reached. Message: " << msg << std::endl;
std::abort();
}
The message part is, of course, optional.
Looks like std::unreachable() made it to C++23:
https://en.cppreference.com/w/cpp/utility/unreachable
I use a custom assert that turns into __builtin_unreachable() or *(char*)0=0 when NDEBUG is on (I also use an enum variable instead of a macro so that I can easily set NDEBUG per scope).
In pseudocode, it's something like:
#define my_assert(X) do{ \
if(!(X)){ \
if (my_ndebug) MY_UNREACHABLE(); \
else my_assert_fail(__FILE__,__LINE__,#X); \
} \
}while(0)
The __builtin_unreachable() should eliminate the warning and help with optimization at the same time, but in debug mode, it's better to have an assert or an abort(); there so you get a reliable panic. (__builtin_unreachable() just gives you undefined behavior when reached).
I recommend the C++ Core Guidelines' Expects and Ensures. They can be configured to abort (default), throw, or do nothing on violation.
To suppress compiler warnings on unreachable branches you can also use GSL_ASSUME.
#include <gsl/gsl>
int findzero( int length, int * array ) {
Expects(length >= 0);
Expects(array != nullptr);
for( int i = 0; i < length; i++ )
if( array[i] == 0 )
return i;
Expects(false);
// or
// GSL_ASSUME(false);
}
assert is meant for scenarios that are ACTUALLY supposed to be impossible to happen during execution. It is useful in debugging to point out "Hey, turns out what you thought to be impossible is, in fact, not impossible." It looks like what you should be doing in the given example is expressing the function's failure, perhaps by returning -1 as that would not be a valid index. In some instances, it might be useful to set errno to clarify the exact nature of an error. Then, with this information, the calling function can decide how to handle such error.
Depending on how critical this error is to the rest of the application, you might try to recover from it, or you might just log the error and call exit to put it out of its misery.
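A minimal sketch of the error-return variant, using -1 as the "not found" value as suggested above:

```cpp
#include <cassert>

// Returns the index of the first zero, or -1 if the array contains none.
int findzero(int length, const int* array) {
    for (int i = 0; i < length; i++)
        if (array[i] == 0)
            return i;
    return -1;  // not a valid index, so callers can tell failure apart
}
```

Every path now returns a value, so the missing-return warning disappears regardless of NDEBUG, and the caller decides whether a missing zero is fatal.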
I believe the reason you are getting the errors is because assertions are generally used for debugging on your own code. When these functions are run in release, exceptions should be used instead with an exit by std::abort() to indicate abnormal program termination.
If you still want to use asserts, there is an answer by PSkocik about defining a custom one, as well as a linked post in which someone proposes custom asserts and shows how to enable them in CMake.
One rule that is sometimes found in style-guides is
"Never return from the middle of a function"
All functions should have a single return, at the end of the function.
Following this rule, your code would look like:
int findzero( int length, int * array ) {
int i;
for( i = 0; i < length; i++ )
{
if( array[i] == 0 )
break; // Break the loop now that i is the correct value
}
assert(i < length); // Assert that a valid index was found.
return i; // Return the value found, or "length" if not found!
}
First of all: I know that most optimization bugs are due to programming errors or relying on facts which may change depending on optimization settings (floating point values, multithreading issues, ...).
However I experienced a very hard to find bug and am somewhat unsure if there is any way to prevent these kind of errors from happening without turning the optimization off. Am I missing something? Could this really be an optimizer bug? Here's a simplified example:
struct Data {
int a;
int b;
double c;
};
struct Test {
void optimizeMe();
Data m_data;
};
void Test::optimizeMe() {
Data * pData; // Note that this pointer is not initialized!
bool first = true;
for (int i = 0; i < 3; ++i) {
if (first) {
first = false;
pData = &m_data;
pData->a = i * 10;
pData->b = i * pData->a;
pData->c = pData->b / 2;
} else {
pData->a = ++i;
} // end if
} // end for
};
int main(int argc, char *argv[]) {
Test test;
test.optimizeMe();
return 0;
}
The real program of course has a lot more to do than this. But it all boils down to the fact that instead of accessing m_data directly, a (previously uninitialized) pointer is being used. As soon as I add enough statements to the if (first)-part, the optimizer seems to change the code to something along these lines:
if (first) {
first = false;
// pData-assignment has been removed!
m_data.a = i * 10;
m_data.b = i * m_data.a;
m_data.c = m_data.b / 2;
} else {
pData->a = ++i; // This will crash - pData is not set yet.
} // end if
As you can see, it replaces the unnecessary pointer dereference with a direct write to the member struct. However, it does not do this in the else-branch. It also removes the pData-assignment. Since the pointer is now still uninitialized, the program will crash in the else-branch.
Of course there are various things which could be improved here, so you might blame it on the programmer:
Forget about the pointer and do what the optimizer does - use m_data directly.
Initialize pData to nullptr - that way the optimizer knows that the else-branch will fail if the pointer is never assigned. At least it seems to solve the problem in my test-environment.
Move the pointer assignment in front of the loop, effectively initializing pData with &m_data (which then could also be a reference instead of a pointer, for good measure). This makes sense because pData is needed in all cases, so there is no reason to do this inside the loop.
The code is obviously smelly, to say the least, and I'm not trying to "blame" the optimizer for doing this. But I'm asking: What am I doing wrong? The program might be ugly, but it's valid code...
I should add that I'm using VS2012 with C++/CLI and v110_xp-Toolset. Optimization is set to /O2. Please also note that if you really want to reproduce the problem (that's not really the point of this question though) you need to play around with the complexity of the program. This is a very simplified example and the optimizer sometimes doesn't remove the pointer assignment. Hiding &m_data behind a function seems to "help".
EDIT:
Q: How do I know that the compiler is optimizing it to something like the example provided?
A: I'm not very good at reading assembler, I have looked at it however and have made 3 observations which make me believe that it's behaving this way:
As soon as optimization kicks in (adding more assignments usually does the trick) the pointer assignment has no associated assembler statement. It also hasn't been moved up to the declaration, so it's really left uninitialized it seems (at least to me).
In cases where the program crashes, the debugger skips the assignment statement. In cases where the program runs without problems, the debugger stops there.
If I watch the content of pData and the content of m_data while debugging, it clearly shows that all assignments in the if-branch have an effect on m_data and m_data receives the correct values. The pointer itself it still pointing to the same uninitialized value it had from the beginning. Therefore I have to assume that it is in fact not using the pointer to make the assignments at all.
Q: Does it have to do anything with i (Loop unrolling)?
A: No, the actual program actually uses do { ... } while() to loop over a SQL SELECT-resultset so the iteration count is completely runtime-specific and cannot be predetermined by the compiler.
It sure looks like a bug to me. It's fine for the optimizer to eliminate the unnecessary redirection, but it should not eliminate the assignment to pData.
Of course, you can work around the problem by assigning to pData before the loop (at least in this simple example). I gather that the problem in your actual code isn't as easily resolved.
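For the simplified example from the question, that workaround might look like the sketch below. It binds a reference before the loop instead of assigning a pointer inside it; whether this actually avoids the reported behaviour in VS2012 has not been verified here.

```cpp
#include <cassert>

struct Data {
    int a;
    int b;
    double c;
};

struct Test {
    void optimizeMe();
    Data m_data{};
};

void Test::optimizeMe() {
    Data& data = m_data;  // bound once, before the loop; never uninitialized
    bool first = true;
    for (int i = 0; i < 3; ++i) {
        if (first) {
            first = false;
            data.a = i * 10;
            data.b = i * data.a;
            data.c = data.b / 2;
        } else {
            data.a = ++i;  // same logic as the original else-branch
        }
    }
}
```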
I also vote for an optimizer bug if it is really reproducible in this example. To overrule the optimizer you could try to declare pData as volatile.
It seems that
if (x=y) { .... }
instead of
if (x==y) { ... }
is a root of many evils.
Why don't all compilers mark it as error instead of a configurable warning?
I'm interested in finding out cases where the construct if (x=y) is useful.
One useful construct is for example:
char *pBuffer;
if (pBuffer = malloc(100))
{
// Continue to work here
}
As mentioned before, and downvoted several times now, I might add that this is not especially good style, but I have seen it often enough to say it's useful. I've also seen this with new, but that gives me more of a pain in my chest.
Another example, and less controversial, might be:
while (pointer = getNextElement(context))
{
// Go for it. Use the pointer to the new segment of data.
}
which implies that the function getNextElement() returns NULL when there is no next element so that the loop is exited.
Most of the time, compilers try very hard to remain backward compatible.
Changing their behavior in this matter to throw errors will break existing legitimate code, and even starting to throw warnings about it will cause problems with automatic systems that keep track of code by automatically compiling it and checking for errors and warnings.
This is an evil we're pretty much stuck with at the moment, but there are ways to circumvent and reduce the dangers of it.
Example:
void *ptr = calloc(1, sizeof(array));
if (NULL = ptr) {
// Some error
}
This causes a compilation error.
Simple answer: An assignment operation, such as x=y, has a value, which is the same as the newly assigned value in x. You can use this directly in a comparison, so instead of
x = y; if (x) ...
you can write
if (x = y) ...
It is less code to write (and read), which is sometimes a good thing, but nowadays most people agree that it should be written in some other way to increase readability. For example, like this:
if ((x = y) != 0) ...
Here is a realistic example. Assume you want to allocate some memory with malloc, and see if it worked. It can be written step by step like this:
p = malloc(4711); if (p != NULL) printf("Ok!");
The comparison to NULL is redundant, so you can rewrite it like this:
p = malloc(4711); if (p) printf("Ok!");
But since the assignment operation has a value, which can be used, you could put the entire assignment in the if condition:
if (p = malloc(4711)) printf("Ok!");
This does the same thing, but it is more concise.
Because it's not illegal (in C or C++ anyway) and sometimes useful...
if ( (x = read(blah)) > 0)
{
// now you know how many bits/bytes/whatever were read
// and can use that info. Esp. if you know, say 30 bytes
// are coming but only got 10
}
Most compilers kick up a real stink if you don't put parentheses around the assignment anyway, which I like.
About the valid uses of if(i = 0)
The problem is that you're looking at the problem upside down. The "if" notation is not about comparing two values, as it is in some other languages.
The C/C++ "if" statement expects any expression that evaluates to either a boolean or a null/non-null value. This expression can include a comparison of two values, and/or can be much more complex.
For example, you can have:
if(i >> 3)
{
std::cout << "i is 8 or greater" << std::endl; // assuming i is non-negative
}
Which proves that, in C/C++, the if expression is not limited to == and =. Anything will do, as long as it can be evaluated as true or false (C++), or zero or non-zero (C/C++).
Another C++ valid use:
if(MyObject * pObject = dynamic_cast<MyObject *>(pInterface))
{
pObject->doSomething();
}
And these are simple uses of the if expression (note that this can be used, too, in the for loop declaration line). More complex uses do exist.
About advanced uses of if(i = 0) in C++ (Quoted from myself)
After discovering a duplicate of this question at In which case is if(a=b) a good idea?, I decided to complete this answer with an additional bonus, that is, variable injection into a scope, which is possible in C++, because if will evaluate its expression, including a variable declaration, instead of limiting itself to compare two operands like it is done in other languages:
So, quoting from myself:
Another use would be to use what is called C++ variable injection. In Java, there is this cool keyword:
synchronized(p)
{
// Now, the Java code is synchronized using p as a mutex
}
In C++, you can do it, too. I don't have the exact code in mind (nor the exact Dr. Dobb's Journal's article where I discovered it), but this simple define should be enough for demonstration purposes:
#define synchronized(lock) \
if (auto_lock lock_##__LINE__(lock))
synchronized(p)
{
// Now, the C++ code is synchronized using p as a mutex
}
In the same way, by mixing injection with an if and a for declaration, you can declare a primitive foreach macro (if you want an industrial-strength foreach, use Boost's).
See the following articles for a less naive, more complete and more robust implementation:
FOR_EACH and LOCK
Exception Safety Analysis
Concurrent Access Control & C++
How many errors of this kind really happen?
Rarely. In fact, I have yet to remember one, and I have been a professional for the past 8 years.
I guess it happened, but then, in 8 years, I did produce a sizeable quantity of bugs. It's just that this kind of bug did not happen often enough for me to remember it in frustration.
In C, you'll have more bugs because of buffer overruns, like:
void doSomething(char * p)
{
strcpy(p, "Hello, World! How are you \?\n");
}
void doSomethingElse()
{
char buffer[16];
doSomething(buffer);
}
In fact, Microsoft was burned so badly by this that they added a warning deprecating strcpy in Visual C++ 2005!
How can you avoid most errors?
The very first "protection" against this error is to "turn around" the expression: As you can't assign a value to a constant, this:
if(0 = p) // ERROR: It should have been if(0 == p). IT WON'T COMPILE!
It won't compile.
But I find this quite a poor solution, because it tries to hide behind a style what should be a general programming practice, that is: Any variable that is not supposed to change should be constant.
For example, instead of:
void doSomething(char * p)
{
if(p == NULL) // POSSIBLE TYPO ERROR
return;
size_t length = strlen(p);
if(length == 0) // POSSIBLE TYPO ERROR
printf("the string is empty\n");
else
printf("\"%s\" length is %zu\n", p, length);
}
Trying to "const" as many variables as possible will make you avoid most typo errors, including those not inside "if" expressions:
void doSomething(const char * const p) // CONST ADDED HERE
{
if(p == NULL) // NO TYPO POSSIBLE
return;
const size_t length = strlen(p); // CONST ADDED HERE
if(length == 0) // NO TYPO POSSIBLE
printf("the string is empty\n");
else
printf("\"%s\" length is %zu\n", p, length);
}
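To make the effect of const concrete, here is a small sketch in C. The is_empty() helper is hypothetical, not part of the original answer; the commented-out lines show the typos that a compiler rejects once the variables are const-qualified.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if p is NULL or points to an empty string, 0 otherwise. */
static int is_empty(const char * const p)
{
    /* if (p = NULL) return 1;    <- rejected: p itself is const */
    if (p == NULL)
        return 1;

    const size_t length = strlen(p);

    /* if (length = 0) return 1;  <- rejected: length is const */
    return length == 0;
}
```

Un-commenting either typo line turns a silent logic bug into a compile-time diagnostic, without having to reverse the operands.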
Of course, it is not always possible (as some variables do need to change), but I found that most of the variables I use are constants (I initialize them once, and then only read them).
Conclusion
Usually, I see code using the if(0 == p) notation, but without the const-notation.
To me, it's like having a trash can for recyclables and another for non-recyclables, and then, in the end, throwing them together in the same container.
So, do not parrot an easy style habit hoping it will make your code a lot better. It won't. Use the language constructs as much as possible, which means, in this case, using both the if(0 == p) notation when available and the const keyword as much as possible.
The 'if(0 = x)' idiom is next to useless because it doesn't help when both sides are variables ('if(x = y)') and most (all?) of the time you should be using constant variables rather than magic numbers.
Two other reasons I never use this idiom: IMHO it makes code less readable, and to be honest I find the single '=' to be the root of very little evil. If you test your code thoroughly (which we all do, obviously), this sort of syntax error turns up very quickly.
Standard C idiom for iterating:
list_elem* curr;
while ( (curr = next_item(list)) != NULL ) {
/* ... */
}
Many compilers will detect this and warn you, but only if you set the warning level high enough.
For example:
~> gcc -c -Wall foo.c
foo.c: In function ‘foo’:
foo.c:5: warning: suggest parentheses around assignment used as truth value
Is this really such a common error? I learned about it when I learned C myself, and as a teacher I have occasionally warned my students and told them that it is a common error, but I have rarely seen it in real code, even from beginners. Certainly not more often than other operator mistakes, such as for example writing "&&" instead of "||".
So the reason that compilers don't mark it as an error (except for it being perfectly valid code) is perhaps that it isn't the root of very many evils.
The assignment as conditional is legal C and C++, and any compiler that doesn't permit it isn't a real C or C++ compiler. I would hope that any modern language not designed to be explicitly compatible with C (as C++ was) would consider it an error.
There are cases where this allows concise expressions, such as the idiomatic while (*dest++ = *src++); to copy a string in C, but overall it's not very useful, and I consider it a mistake in language design. It is, in my experience, easy to make this mistake, and hard to spot when the compiler doesn't issue a warning.
I think the C and C++ language designers noticed there is no real use in forbidding it because
Compilers can warn about it if they want anyway
Disallowing it would add special cases to the language, and would remove a possible feature.
There isn't complexity involved in allowing it. C++ just says that an expression implicitly convertible to bool is required. In C, there are useful cases detailed by other answers. In C++, they go one step further and allowed this one in addition:
if(type * t = get_pointer()) {
// ....
}
Which actually limits the scope of t to only the if and its bodies.
It depends on the language. Java flags it as an error because only boolean expressions can be used inside the if parentheses (unless both variables are boolean, in which case the assignment is also a boolean expression).
In C, it is a quite common idiom for testing pointers returned by malloc or if after a fork we are in the parent or child process:
if ( x = (X*) malloc( sizeof(X) ) ) {
// 'malloc' worked, pointer != 0
}
if ( pid = fork() ) {
// Parent process as pid != 0
}
C/C++ compilers will warn with a high enough warning level if you ask for it, but it cannot be considered an error as the language allows it. Unless, then again, you ask the compiler to treat warnings as errors.
Whenever comparing with constants, some authors suggest using the test constant == variable so that the compiler will detect if the user forgets the second equality sign.
if ( 0 == variable ) {
// The compiler will complain if you mistakenly
// write =, as you cannot assign to a constant
}
Anyway, you should try to compile with the highest possible warning settings.
Try viewing
if( life_is_good() )
enjoy_yourself();
as
if( tmp = life_is_good() )
enjoy_yourself();
Part of it has to do with personal style and habits. I am agnostic to reading either if (kConst == x) or if (x == kConst). I don't use the constant on the left because historically I don't make that error and I write code as I would say it or would like to read it. I see this as a personal decision as part of a personal responsibility to being a self-aware, improving engineer. For example, I started analyzing the types of bugs that I was creating and started to re-engineer my habits so as not to make them - similar to constant on the left, just with other things.
That said, compiler warnings have historically been pretty crappy, and even though this problem has been well known for years, I didn't see a warning for it in a production compiler until the late '80s. I also found that working on portable projects helped clean up my C a great deal, as different compilers have different tastes (i.e., warnings) and subtly different semantics.
I, personally, consider this the most useful example.
Say that you have a function read() that returns the number of bytes read, and you need to use this in a loop. It's a lot simpler to use
while((count = read(foo)) > 0) {
//Do stuff
}
than to try and get the assignment out of the loop head, which would result in things like
while(1) {
count = read(foo);
if(!(count > 0))
break;
//...
}
or
count = read(foo);
while(count > 0) {
//...
count = read(foo);
}
The first construct feels awkward, and the second repeats code in an unpleasant way.
Unless, of course, I've missed some brilliant idiom for this...
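For a version of the first loop that can actually be compiled, here is a rough sketch. The Source struct and read_chunk() are stand-ins invented for this example (an in-memory buffer instead of a real file descriptor), so treat them as assumptions rather than a real API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory source standing in for a file or socket. */
struct Source {
    const char *data;
    size_t len;
    size_t pos;
};

/* Copies up to cap bytes into buf; returns the number copied, 0 at the end. */
static long read_chunk(struct Source *src, char *buf, size_t cap)
{
    size_t left = src->len - src->pos;
    size_t n = left < cap ? left : cap;

    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return (long)n;
}

/* Drains the source in 4-byte chunks and returns how many chunks were read. */
static int count_chunks(struct Source *src)
{
    char buf[4];
    long count;
    int chunks = 0;

    /* Assign, then test: the call appears exactly once. */
    while ((count = read_chunk(src, buf, sizeof buf)) > 0)
        chunks++;

    return chunks;
}
```

The call to read_chunk() is written once, and the loop condition still reads as "keep going while something was read".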
There are a lot of great uses of the assignment operator in a conditional statement, and it'd be a royal pain in the ass to see warnings about each one all the time. What would be nice would be a function in your IDE that let you highlight all the places where assignment has been used instead of an equality check - or - after you write something like this:
if (x = y) {
then that line blinks a couple of times. Enough to let you know that you've done something not exactly standard, but not so much that it's annoying.
if ((k==1) || (k==2)) is a conditional
if ((k=1) || (k=2) ) is BOTH a conditional AND an assignment statement
Here's the explanation
Like most languages, C evaluates the operands of || from left to right and stops as soon as the result is known (short-circuit evaluation).
First, it sets k to 1, and succeeds.
Result: k = 1 and Boolean = 'true'
Next: because the left operand of || is already true, the right operand (k=2) is never evaluated.
Result: k still = 1, and Boolean = true
Finally, it then resolves the conditional: If (true)
Result: k = 1 and the program takes the first branch.
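A quick way to check what k ends up as is to run it. The sketch below uses two made-up helper functions; it relies on the C rule that || evaluates its left operand first and skips the right operand when the left one is non-zero.

```c
/* Returns the value of k after evaluating (k = 1) || (k = 2). */
static int k_after_or(void)
{
    int k = 0;

    if ((k = 1) || (k = 2)) {
        /* Branch taken: k = 1 is non-zero, so k = 2 was never evaluated. */
    }
    return k;
}

/* With a zero left operand, the right-hand assignment does run. */
static int k_after_or_zero_first(void)
{
    int k = 5;

    if ((k = 0) || (k = 2)) {
        /* Branch taken because the second assignment yields 2. */
    }
    return k;
}
```

The first function returns 1 and the second returns 2, which shows that the second assignment only happens when the first one evaluates to zero.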
In nearly 30 years of programming I have not seen a valid reason for using this construct, though if one exists it probably has to do with a need to deliberately obfuscate your code.
When one of our new people has a problem, this is one of the things I look for, right along with not sticking a terminator on a string, and copying a debug statement from one place to another and not changing the '%i' to '%s' to match the new field they are dumping.
This is fairly common in our shop because we constantly switch between C and Oracle PL/SQL; if( k = 1) is the correct syntax in PL/SQL.
It is very common with "low level" loop constructs in C/C++, such as with copies:
void my_strcpy(char *dst, const char *src)
{
while((*dst++ = *src++) != '\0') { // Note the use of extra parentheses, and the explicit compare.
/* DO NOTHING */
}
}
Of course, assignments are very common with for loops:
int i;
for(i = 0; i < 42; ++i) {
printf("%d\n", i);
}
I do believe it is easier to read assignments when they are outside of if statements:
char *newstring = malloc((strlen(src) + 1) * sizeof(char)); // + 1 for the terminating '\0'
if(newstring == NULL) {
fprintf(stderr, "Out of memory, d00d! Bailing!\n");
exit(2);
}
// Versus:
if((newstring = malloc((strlen(src) + 1) * sizeof(char))) == NULL) // ew...
Make sure the assignment is obvious, though (as with the first two examples). Don't hide it.
As for accidental uses ... that doesn't happen to me much. A common safeguard is to put your variable (lvalues) on the right hand side of the comparison, but that doesn't work well with things like:
if(*src == *dst)
because both operands to == are lvalues!
As for compilers ... who can blame 'em? Writing compilers is difficult, and you should be writing perfect programs for the compiler anyway (remember GIGO?). Some compilers (the most well-known for sure) provide built-in lint-style checking, but that certainly isn't required. Some browsers don't validate every byte of HTML and JavaScript they're thrown, so why would compilers?
There are several tactics to help spot this .. one is ugly, the other is typically a macro. It really depends on how you read your spoken language (left to right, right to left).
For instance:
if ((fp = fopen("foo.txt", "r")) == NULL)
Vs:
if (NULL == (fp = fopen(...)))
Sometimes it can be easier to read/write (first) what you're testing for, which makes it easier to spot an assignment vs a test. Then bring in most comp.lang.c folks, who hate this style with a passion.
So, we bring in assert():
#include <assert.h>
...
fp = fopen("foo.txt", "r");
assert(fp != NULL);
...
When you're in the midst, or at the end, of a convoluted set of conditionals, assert() is your friend. In this case, if fp == NULL, abort() is called and the file and line of the offending code are reported.
So if you oops:
if (i = foo)
instead of
if (i == foo)
followed by
assert (i > foo + 1)
... you'll quickly spot such mistakes.
Hope this helps :)
In short, reversing arguments sometimes helps when debugging. assert() is your lifelong friend and can be turned off with compiler flags in production releases.
As pointed out in other answers, there are cases where using assignment within a condition offers a brief-but-readable piece of code that does what you want. Also, a lot of up-to-date compilers will warn you if they see an assignment where they expect a condition. (If you're a fan of the zero-warnings approach to development, you'll have seen these.)
One habit I've developed that keeps me from getting bitten by this (at least in C-ish languages) is that if one of the two values I'm comparing is a constant (or otherwise not a legal lvalue), I put it on the left-hand side of the comparator: if (5 == x) { whatever(); } Then, if I should accidentally type if (5 = x), the code won't compile.
You asked why it was useful, but keep questioning examples people are providing. It's useful because it's concise.
Yes, all the examples which use it can be rewritten - as longer pieces of code.
I have only had this typo once in my 15 years of development. I would not say it is on the top of my list of things to look out for. I also avoid that construct anyway.
Note also that some compilers (the one I use) issue a warning on that code. Warnings can be treated as errors for any compiler worth its salt. They can also be ignored.
Placing the constant on the left side of a comparison is defensive programming. Sure you would never make the silly mistake of forgetting that extra '=', but who knows about the other guy.
The D programming language does flag this as an error. To avoid the problem with wanting to use the value later, it allows declarations sort of like C++ allows with for loops.
if(int i = some_fn())
{
another_fn(i);
}
The compiler won't flag it as an error because it is valid C/C++. But what you can do (at least with Visual C++) is turn up the warning level so that it flags it as a warning and then tell the compiler to treat warnings as errors. This is a good practice anyway so that developers don't ignore warnings.
If you had actually meant = instead of == then you need to be more explicit about it. E.g.,
if ((x = y) != 0)
Theoretically, you're supposed to be able to do this:
if ((x = y))
to override the warning, but that doesn't seem to always work.
In practice I don't do it, but a good tip is to do:
if ( true == $x )
In the case that you leave out an equals sign, assigning $x to true will obviously produce an error.
RegEx sample
RegEx r;
if((r = new RegEx("\w*")).IsMatch()) {
// ... do something here
}
else if((r = new RegEx("\d*")).IsMatch()) {
// ... do something here
}
Assign a value test
int i = 0;
if((i = 1) == 1) {
// 1 is equal to i, which was just assigned the int value 1
}
else {
// ?
}
That's why it's better to write:
0 == CurrentItem
Instead of:
CurrentItem == 0
so that the compiler rejects the code if you type = instead of ==.