C/C++ Finding pointers - c++

I've recently completed a basic memory scanner, and like all people that have ever done this... I've realised that scanning for values every time a program starts is a pain!
It would be useful to work out the base pointers for these values so that they can be hard coded into a application. To work out how to do this I created this short console app using visual c++ for my "target".
int main(int argc, char *argv[], char *envp[])
{
unsigned int * Pointer1;
Pointer1 = (unsigned int *)malloc(sizeof(unsigned int));
char s[20];
while(1)
{
std::cout<<"(Pid:"<<GetCurrentProcessId()<<") Please enter a value to be stored in memory (type 'q' to exit): \n";
std::cin>>s;
std::cout<<"\n";
if(s[0] =='q')
{
break;
}
*Pointer1 = str2int(s);
std::cout<<"Value:"<<*Pointer1<<" addr:0x"<<Pointer1<<"\n";
}
free(Pointer1);
return 0;
}
I then used this app in accordance with my memory scanner to work out the addresses of the pointer and the memory address the pointer points to. I did this a few times to collate the bellow data. (Paste using '|' as delimiter into excel)
pid |P Hex|P Val|PHx to Dec|M Hex|M Val|MHx To Dec
7028|002afcfc|216032|2817276|00034BE0|10|216032
6312|0032fa70|4541408|3340912|00454BE0|20|4541408
1512|0043fb1c|3242304|4455196|00317940|30|3242304
8140|0036f9fc|1997096|3602940|001E7928|30|1997096
However I cant work out a reliable way of calculating what memory address the pointer will be, what methods/calculations could I perform to do this? I'm aware that the more life like example of pointer chains exists however this seemed like a good starting point before making it harder.

The memory address is whatever malloc returns. There is no reliable way of calculating what it will return. It all depends on how your malloc library works, what virtual address ranges it is given by the OS (via sbrk or mmap or others), etc.

You are asking how to predict the address that will be returned from malloc.
It doesn't matter.
If you could predict what would be returned from malloc, malloc wouldn't need to return anything.
In practice, it never matters what address is returned from malloc, other than NULL. A call to malloc implies that you need a number of bytes of memory. And malloc will give you that. You got what you needed.
Conversely, if you could determine what addresses would be returned from malloc, you would never need to call malloc. You could just use the addresses you predicted.
Put simply, malloc exists because it is not trivial to predict what memory might be available.

Related

How would I validate the address being pointed to is of a type that I want? [duplicate]

Is there any way to determine (programatically, of course) if a given pointer is "valid"? Checking for NULL is easy, but what about things like 0x00001234? When trying to dereference this kind of pointer an exception/crash occurs.
A cross-platform method is preferred, but platform-specific (for Windows and Linux) is also ok.
Update for clarification:
The problem is not with stale/freed/uninitialized pointers; instead, I'm implementing an API that takes pointers from the caller (like a pointer to a string, a file handle, etc.). The caller can send (in purpose or by mistake) an invalid value as the pointer. How do I prevent a crash?
Update for clarification: The problem is not with stale, freed or uninitialized pointers; instead, I'm implementing an API that takes pointers from the caller (like a pointer to a string, a file handle, etc.). The caller can send (in purpose or by mistake) an invalid value as the pointer. How do I prevent a crash?
You can't make that check. There is simply no way you can check whether a pointer is "valid". You have to trust that when people use a function that takes a pointer, those people know what they are doing. If they pass you 0x4211 as a pointer value, then you have to trust it points to address 0x4211. And if they "accidentally" hit an object, then even if you would use some scary operation system function (IsValidPtr or whatever), you would still slip into a bug and not fail fast.
Start using null pointers for signaling this kind of thing and tell the user of your library that they should not use pointers if they tend to accidentally pass invalid pointers, seriously :)
Here are three easy ways for a C program under Linux to get introspective about the status of the memory in which it is running, and why the question has appropriate sophisticated answers in some contexts.
After calling getpagesize() and rounding the pointer to a page
boundary, you can call mincore() to find out if a page is valid and
if it happens to be part of the process working set. Note that this requires
some kernel resources, so you should benchmark it and determine if
calling this function is really appropriate in your api. If your api
is going to be handling interrupts, or reading from serial ports
into memory, it is appropriate to call this to avoid unpredictable
behaviors.
After calling stat() to determine if there is a /proc/self directory available, you can fopen and read through /proc/self/maps
to find information about the region in which a pointer resides.
Study the man page for proc, the process information pseudo-file
system. Obviously this is relatively expensive, but you might be
able to get away with caching the result of the parse into an array
you can efficiently lookup using a binary search. Also consider the
/proc/self/smaps. If your api is for high-performance computing then
the program will want to know about the /proc/self/numa which is
documented under the man page for numa, the non-uniform memory
architecture.
The get_mempolicy(MPOL_F_ADDR) call is appropriate for high performance computing api work where there are multiple threads of
execution and you are managing your work to have affinity for non-uniform memory
as it relates to the cpu cores and socket resources. Such an api
will of course also tell you if a pointer is valid.
Under Microsoft Windows there is the function QueryWorkingSetEx that is documented under the Process Status API (also in the NUMA API).
As a corollary to sophisticated NUMA API programming this function will also let you do simple "testing pointers for validity (C/C++)" work, as such it is unlikely to be deprecated for at least 15 years.
Preventing a crash caused by the caller sending in an invalid pointer is a good way to make silent bugs that are hard to find.
Isn't it better for the programmer using your API to get a clear message that his code is bogus by crashing it rather than hiding it?
On Win32/64 there is a way to do this. Attempt to read the pointer and catch the resulting SEH exeception that will be thrown on failure. If it doesn't throw, then it's a valid pointer.
The problem with this method though is that it just returns whether or not you can read data from the pointer. It makes no guarantee about type safety or any number of other invariants. In general this method is good for little else other than to say "yes, I can read that particular place in memory at a time that has now passed".
In short, Don't do this ;)
Raymond Chen has a blog post on this subject: http://blogs.msdn.com/oldnewthing/archive/2007/06/25/3507294.aspx
AFAIK there is no way. You should try to avoid this situation by always setting pointers to NULL after freeing memory.
On Unix you should be able to utilize a kernel syscall that does pointer checking and returns EFAULT, such as:
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdbool.h>
bool isPointerBad( void * p )
{
int fh = open( p, 0, 0 );
int e = errno;
if ( -1 == fh && e == EFAULT )
{
printf( "bad pointer: %p\n", p );
return true;
}
else if ( fh != -1 )
{
close( fh );
}
printf( "good pointer: %p\n", p );
return false;
}
int main()
{
int good = 4;
isPointerBad( (void *)3 );
isPointerBad( &good );
isPointerBad( "/tmp/blah" );
return 0;
}
returning:
bad pointer: 0x3
good pointer: 0x7fff375fd49c
good pointer: 0x400793
There's probably a better syscall to use than open() [perhaps access], since there's a chance that this could lead to actual file creation codepath, and a subsequent close requirement.
Regarding the answer a bit up in this thread:
IsBadReadPtr(), IsBadWritePtr(), IsBadCodePtr(), IsBadStringPtr() for Windows.
My advice is to stay away from them, someone has already posted this one:
http://blogs.msdn.com/oldnewthing/archive/2007/06/25/3507294.aspx
Another post on the same topic and by the same author (I think) is this one:
http://blogs.msdn.com/oldnewthing/archive/2006/09/27/773741.aspx ("IsBadXxxPtr should really be called CrashProgramRandomly").
If the users of your API sends in bad data, let it crash. If the problem is that the data passed isn't used until later (and that makes it harder to find the cause), add a debug mode where the strings etc. are logged at entry. If they are bad it will be obvious (and probably crash). If it is happening way to often, it might be worth moving your API out of process and let them crash the API process instead of the main process.
Firstly, I don't see any point in trying to protect yourself from the caller deliberately trying to cause a crash. They could easily do this by trying to access through an invalid pointer themselves. There are many other ways - they could just overwrite your memory or the stack. If you need to protect against this sort of thing then you need to be running in a separate process using sockets or some other IPC for communication.
We write quite a lot of software that allows partners/customers/users to extend functionality. Inevitably any bug gets reported to us first so it is useful to be able to easily show that the problem is in the plug-in code. Additionally there are security concerns and some users are more trusted than others.
We use a number of different methods depending on performance/throughput requirements and trustworthyness. From most preferred:
separate processes using sockets (often passing data as text).
separate processes using shared memory (if large amounts of data to pass).
same process separate threads via message queue (if frequent short messages).
same process separate threads all passed data allocated from a memory pool.
same process via direct procedure call - all passed data allocated from a memory pool.
We try never to resort to what you are trying to do when dealing with third party software - especially when we are given the plug-ins/library as binary rather than source code.
Use of a memory pool is quite easy in most circumstances and needn't be inefficient. If YOU allocate the data in the first place then it is trivial to check the pointers against the values you allocated. You could also store the length allocated and add "magic" values before and after the data to check for valid data type and data overruns.
I've got a lot of sympathy with your question, as I'm in an almost identical position myself. I appreciate what a lot of the replies are saying, and they are correct - the routine supplying the pointer should be providing a valid pointer. In my case, it is almost inconceivable that they could have corrupted the pointer - but if they had managed, it would be MY software that crashes, and ME that would get the blame :-(
My requirement isn't that I continue after a segmentation fault - that would be dangerous - I just want to report what happened to the customer before terminating so that they can fix their code rather than blaming me!
This is how I've found to do it (on Windows): http://www.cplusplus.com/reference/clibrary/csignal/signal/
To give a synopsis:
#include <signal.h>
using namespace std;
void terminate(int param)
/// Function executed if a segmentation fault is encountered during the cast to an instance.
{
cerr << "\nThe function received a corrupted reference - please check the user-supplied dll.\n";
cerr << "Terminating program...\n";
exit(1);
}
...
void MyFunction()
{
void (*previous_sigsegv_function)(int);
previous_sigsegv_function = signal(SIGSEGV, terminate);
<-- insert risky stuff here -->
signal(SIGSEGV, previous_sigsegv_function);
}
Now this appears to behave as I would hope (it prints the error message, then terminates the program) - but if someone can spot a flaw, please let me know!
There are no provisions in C++ to test for the validity of a pointer as a general case. One can obviously assume that NULL (0x00000000) is bad, and various compilers and libraries like to use "special values" here and there to make debugging easier (For example, if I ever see a pointer show up as 0xCECECECE in visual studio I know I did something wrong) but the truth is that since a pointer is just an index into memory it's near impossible to tell just by looking at the pointer if it's the "right" index.
There are various tricks that you can do with dynamic_cast and RTTI such to ensure that the object pointed to is of the type that you want, but they all require that you are pointing to something valid in the first place.
If you want to ensure that you program can detect "invalid" pointers then my advice is this: Set every pointer you declare either to NULL or a valid address immediately upon creation and set it to NULL immediately after freeing the memory that it points to. If you are diligent about this practice, then checking for NULL is all you ever need.
Setting the pointer to NULL before and after using is a good technique. This is easy to do in C++ if you manage pointers within a class for example (a string):
class SomeClass
{
public:
SomeClass();
~SomeClass();
void SetText( const char *text);
char *GetText() const { return MyText; }
void Clear();
private:
char * MyText;
};
SomeClass::SomeClass()
{
MyText = NULL;
}
SomeClass::~SomeClass()
{
Clear();
}
void SomeClass::Clear()
{
if (MyText)
free( MyText);
MyText = NULL;
}
void SomeClass::Settext( const char *text)
{
Clear();
MyText = malloc( strlen(text));
if (MyText)
strcpy( MyText, text);
}
Indeed, something could be done under specific occasion: for example if you want to check whether a string pointer string is valid, using write(fd, buf, szie) syscall can help you do the magic: let fd be a file descriptor of temporary file you create for test, and buf pointing to the string you are tesing, if the pointer is invalid write() would return -1 and errno set to EFAULT which indicating that buf is outside your accessible address space.
Peeter Joos answer is pretty good. Here is an "official" way to do it:
#include <sys/mman.h>
#include <stdbool.h>
#include <unistd.h>
bool is_pointer_valid(void *p) {
/* get the page size */
size_t page_size = sysconf(_SC_PAGESIZE);
/* find the address of the page that contains p */
void *base = (void *)((((size_t)p) / page_size) * page_size);
/* call msync, if it returns non-zero, return false */
int ret = msync(base, page_size, MS_ASYNC) != -1;
return ret ? ret : errno != ENOMEM;
}
There isn't any portable way of doing this, and doing it for specific platforms can be anywhere between hard and impossible. In any case, you should never write code that depends on such a check - don't let the pointers take on invalid values in the first place.
As others have said, you can't reliably detect an invalid pointer. Consider some of the forms an invalid pointer might take:
You could have a null pointer. That's one you could easily check for and do something about.
You could have a pointer to somewhere outside of valid memory. What constitutes valid memory varies depending on how the run-time environment of your system sets up the address space. On Unix systems, it is usually a virtual address space starting at 0 and going to some large number of megabytes. On embedded systems, it could be quite small. It might not start at 0, in any case. If your app happens to be running in supervisor mode or the equivalent, then your pointer might reference a real address, which may or may not be backed up with real memory.
You could have a pointer to somewhere inside your valid memory, even inside your data segment, bss, stack or heap, but not pointing at a valid object. A variant of this is a pointer that used to point to a valid object, before something bad happened to the object. Bad things in this context include deallocation, memory corruption, or pointer corruption.
You could have a flat-out illegal pointer, such as a pointer with illegal alignment for the thing being referenced.
The problem gets even worse when you consider segment/offset based architectures and other odd pointer implementations. This sort of thing is normally hidden from the developer by good compilers and judicious use of types, but if you want to pierce the veil and try to outsmart the operating system and compiler developers, well, you can, but there is not one generic way to do it that will handle all of the issues you might run into.
The best thing you can do is allow the crash and put out some good diagnostic information.
In general, it's impossible to do. Here's one particularly nasty case:
struct Point2d {
int x;
int y;
};
struct Point3d {
int x;
int y;
int z;
};
void dump(Point3 *p)
{
printf("[%d %d %d]\n", p->x, p->y, p->z);
}
Point2d points[2] = { {0, 1}, {2, 3} };
Point3d *p3 = reinterpret_cast<Point3d *>(&points[0]);
dump(p3);
On many platforms, this will print out:
[0 1 2]
You're forcing the runtime system to incorrectly interpret bits of memory, but in this case it's not going to crash, because the bits all make sense. This is part of the design of the language (look at C-style polymorphism with struct inaddr, inaddr_in, inaddr_in6), so you can't reliably protect against it on any platform.
It's unbelievable how much misleading information you can read in articles above...
And even in microsoft msdn documentation IsBadPtr is claimed to be banned. Oh well - I prefer working application rather than crashing. Even if term working might be working incorrectly (as long as end-user can continue with application).
By googling I haven't found any useful example for windows - found a solution for 32-bit apps,
http://www.codeproject.com/script/Content/ViewAssociatedFile.aspx?rzp=%2FKB%2Fsystem%2Fdetect-driver%2F%2FDetectDriverSrc.zip&zep=DetectDriverSrc%2FDetectDriver%2Fsrc%2FdrvCppLib%2Frtti.cpp&obid=58895&obtid=2&ovid=2
but I need also to support 64-bit apps, so this solution did not work for me.
But I've harvested wine's source codes, and managed to cook similar kind of code which would work for 64-bit apps as well - attaching code here:
#include <typeinfo.h>
typedef void (*v_table_ptr)();
typedef struct _cpp_object
{
v_table_ptr* vtable;
} cpp_object;
#ifndef _WIN64
typedef struct _rtti_object_locator
{
unsigned int signature;
int base_class_offset;
unsigned int flags;
const type_info *type_descriptor;
//const rtti_object_hierarchy *type_hierarchy;
} rtti_object_locator;
#else
typedef struct
{
unsigned int signature;
int base_class_offset;
unsigned int flags;
unsigned int type_descriptor;
unsigned int type_hierarchy;
unsigned int object_locator;
} rtti_object_locator;
#endif
/* Get type info from an object (internal) */
static const rtti_object_locator* RTTI_GetObjectLocator(void* inptr)
{
cpp_object* cppobj = (cpp_object*) inptr;
const rtti_object_locator* obj_locator = 0;
if (!IsBadReadPtr(cppobj, sizeof(void*)) &&
!IsBadReadPtr(cppobj->vtable - 1, sizeof(void*)) &&
!IsBadReadPtr((void*)cppobj->vtable[-1], sizeof(rtti_object_locator)))
{
obj_locator = (rtti_object_locator*) cppobj->vtable[-1];
}
return obj_locator;
}
And following code can detect whether pointer is valid or not, you need probably to add some NULL checking:
CTest* t = new CTest();
//t = (CTest*) 0;
//t = (CTest*) 0x12345678;
const rtti_object_locator* ptr = RTTI_GetObjectLocator(t);
#ifdef _WIN64
char *base = ptr->signature == 0 ? (char*)RtlPcToFileHeader((void*)ptr, (void**)&base) : (char*)ptr - ptr->object_locator;
const type_info *td = (const type_info*)(base + ptr->type_descriptor);
#else
const type_info *td = ptr->type_descriptor;
#endif
const char* n =td->name();
This gets class name from pointer - I think it should be enough for your needs.
One thing which I'm still afraid is performance of pointer checking - in code snipet above there is already 3-4 API calls being made - might be overkill for time critical applications.
It would be good if someone could measure overhead of pointer checking compared for example to C#/managed c++ calls.
It is not a very good policy to accept arbitrary pointers as input parameters in a public API. It's better to have "plain data" types like an integer, a string or a struct (I mean a classical struct with plain data inside, of course; officially anything can be a struct).
Why? Well because as others say there is no standard way to know whether you've been given a valid pointer or one that points to junk.
But sometimes you don't have the choice - your API must accept a pointer.
In these cases, it is the duty of the caller to pass a good pointer. NULL may be accepted as a value, but not a pointer to junk.
Can you double-check in any way? Well, what I did in a case like that was to define an invariant for the type the pointer points to, and call it when you get it (in debug mode). At least if the invariant fails (or crashes) you know that you were passed a bad value.
// API that does not allow NULL
void PublicApiFunction1(Person* in_person)
{
assert(in_person != NULL);
assert(in_person->Invariant());
// Actual code...
}
// API that allows NULL
void PublicApiFunction2(Person* in_person)
{
assert(in_person == NULL || in_person->Invariant());
// Actual code (must keep in mind that in_person may be NULL)
}
Following does work in Windows (somebody suggested it before):
static void copy(void * target, const void* source, int size)
{
__try
{
CopyMemory(target, source, size);
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
doSomething(--whatever--);
}
}
The function has to be static, standalone or static method of some class.
To test on read-only, copy data in the local buffer.
To test on write without modifying contents, write them over.
You can test first/last addresses only.
If pointer is invalid, control will be passed to 'doSomething',
and then outside the brackets.
Just do not use anything requiring destructors, like CString.
On Windows I use this code:
void * G_pPointer = NULL;
const char * G_szPointerName = NULL;
void CheckPointerIternal()
{
char cTest = *((char *)G_pPointer);
}
bool CheckPointerIternalExt()
{
bool bRet = false;
__try
{
CheckPointerIternal();
bRet = true;
}
__except (EXCEPTION_EXECUTE_HANDLER)
{
}
return bRet;
}
void CheckPointer(void * A_pPointer, const char * A_szPointerName)
{
G_pPointer = A_pPointer;
G_szPointerName = A_szPointerName;
if (!CheckPointerIternalExt())
throw std::runtime_error("Invalid pointer " + std::string(G_szPointerName) + "!");
}
Usage:
unsigned long * pTest = (unsigned long *) 0x12345;
CheckPointer(pTest, "pTest"); //throws exception
On macOS, you can do this with mach_vm_region, which as well as telling you if a pointer is valid, also lets you validate what access you have to the memory to which the pointer points (read/write/execute). I provided sample code to do this in my answer to another question:
#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <stdio.h>
#include <stdbool.h>
bool ptr_is_valid(void *ptr, vm_prot_t needs_access) {
vm_map_t task = mach_task_self();
mach_vm_address_t address = (mach_vm_address_t)ptr;
mach_vm_size_t size = 0;
vm_region_basic_info_data_64_t info;
mach_msg_type_number_t count = VM_REGION_BASIC_INFO_COUNT_64;
mach_port_t object_name;
kern_return_t ret = mach_vm_region(task, &address, &size, VM_REGION_BASIC_INFO_64, (vm_region_info_t)&info, &count, &object_name);
if (ret != KERN_SUCCESS) return false;
return ((mach_vm_address_t)ptr) >= address && ((info.protection & needs_access) == needs_access);
}
#define TEST(ptr,acc) printf("ptr_is_valid(%p,access=%d)=%d\n", (void*)(ptr), (acc), ptr_is_valid((void*)(ptr),(acc)))
int main(int argc, char**argv) {
TEST(0,0);
TEST(0,VM_PROT_READ);
TEST(123456789,VM_PROT_READ);
TEST(main,0);
TEST(main,VM_PROT_READ);
TEST(main,VM_PROT_READ|VM_PROT_EXECUTE);
TEST(main,VM_PROT_EXECUTE);
TEST(main,VM_PROT_WRITE);
TEST((void*)(-1),0);
return 0;
}
The SEI CERT C Coding Standard recommendation MEM10-C. Define and use a pointer validation function says it is possible to do a check to some degree, especially under Linux OS.
The method described in the link is to keep track of the highest memory address returned by malloc and add a function that tests if someone tries to use a pointer greater than that value. It is probably of limited use.
IsBadReadPtr(), IsBadWritePtr(), IsBadCodePtr(), IsBadStringPtr() for Windows.
These take time proportional to the length of the block, so for sanity check I just check the starting address.
I have seen various libraries use some method to check for unreferenced memory and such. I believe they simply "override" the memory allocation and deallocation methods (malloc/free), which has some logic that keeps track of the pointers. I suppose this is overkill for your use case, but it would be one way to do it.
Technically you can override operator new (and delete) and collect information about all allocated memory, so you can have a method to check if heap memory is valid.
but:
you still need a way to check if pointer is allocated on stack ()
you will need to define what is 'valid' pointer:
a) memory on that address is
allocated
b) memory at that address
is start address of object (e.g.
address not in the middle of huge
array)
c) memory at that address
is start address of object of expected type
Bottom line: approach in question is not C++ way, you need to define some rules which ensure that function receives valid pointers.
There is no way to make that check in C++. What should you do if other code passes you an invalid pointer? You should crash. Why? Check out this link: http://blogs.msdn.com/oldnewthing/archive/2006/09/27/773741.aspx
Addendum to the accpeted answer(s):
Assume that your pointer could hold only three values -- 0, 1 and -1 where 1 signifies a valid pointer, -1 an invalid one and 0 another invalid one. What is the probability that your pointer is NULL, all values being equally likely? 1/3. Now, take the valid case out, so for every invalid case, you have a 50:50 ratio to catch all errors. Looks good right? Scale this for a 4-byte pointer. There are 2^32 or 4294967294 possible values. Of these, only ONE value is correct, one is NULL, and you are still left with 4294967292 other invalid cases. Recalculate: you have a test for 1 out of (4294967292+ 1) invalid cases. A probability of 2.xe-10 or 0 for most practical purposes. Such is the futility of the NULL check.
You know, a new driver (at least on Linux) that is capable of this probably wouldn't be that hard to write.
On the other hand, it would be folly to build your programs like this. Unless you have some really specific and single use for such a thing, I wouldn't recommend it. If you built a large application loaded with constant pointer validity checks it would likely be horrendously slow.
you should avoid these methods because they do not work. blogs.msdn.com/oldnewthing/archive/2006/09/27/773741.aspx – JaredPar Feb 15 '09 at 16:02
If they don't work - next windows update will fix it ?
If they don't work on concept level - function will be probably removed from windows api completely.
MSDN documentation claim that they are banned, and reason for this is probably flaw of further design of application (e.g. generally you should not eat invalid pointers silently - if you're in charge of design of whole application of course), and performance/time of pointer checking.
But you should not claim that they does not work because of some blog.
In my test application I've verified that they do work.
these links may be helpful
_CrtIsValidPointer
Verifies that a specified memory range is valid for reading and writing (debug version only).
http://msdn.microsoft.com/en-us/library/0w1ekd5e.aspx
_CrtCheckMemory
Confirms the integrity of the memory blocks allocated in the debug heap (debug version only).
http://msdn.microsoft.com/en-us/library/e73x0s4b.aspx

How to check that the address pointed by a pointer is pointing to an object not a garbage before reading or writing to this object? [duplicate]

Is there any way to determine (programatically, of course) if a given pointer is "valid"? Checking for NULL is easy, but what about things like 0x00001234? When trying to dereference this kind of pointer an exception/crash occurs.
A cross-platform method is preferred, but platform-specific (for Windows and Linux) is also ok.
Update for clarification:
The problem is not with stale/freed/uninitialized pointers; instead, I'm implementing an API that takes pointers from the caller (like a pointer to a string, a file handle, etc.). The caller can send (in purpose or by mistake) an invalid value as the pointer. How do I prevent a crash?
Update for clarification: The problem is not with stale, freed or uninitialized pointers; instead, I'm implementing an API that takes pointers from the caller (like a pointer to a string, a file handle, etc.). The caller can send (in purpose or by mistake) an invalid value as the pointer. How do I prevent a crash?
You can't make that check. There is simply no way you can check whether a pointer is "valid". You have to trust that when people use a function that takes a pointer, those people know what they are doing. If they pass you 0x4211 as a pointer value, then you have to trust it points to address 0x4211. And if they "accidentally" hit an object, then even if you would use some scary operation system function (IsValidPtr or whatever), you would still slip into a bug and not fail fast.
Start using null pointers for signaling this kind of thing and tell the user of your library that they should not use pointers if they tend to accidentally pass invalid pointers, seriously :)
Here are three easy ways for a C program under Linux to get introspective about the status of the memory in which it is running, and why the question has appropriate sophisticated answers in some contexts.
After calling getpagesize() and rounding the pointer to a page
boundary, you can call mincore() to find out if a page is valid and
if it happens to be part of the process working set. Note that this requires
some kernel resources, so you should benchmark it and determine if
calling this function is really appropriate in your api. If your api
is going to be handling interrupts, or reading from serial ports
into memory, it is appropriate to call this to avoid unpredictable
behaviors.
After calling stat() to determine if there is a /proc/self directory available, you can fopen and read through /proc/self/maps
to find information about the region in which a pointer resides.
Study the man page for proc, the process information pseudo-file
system. Obviously this is relatively expensive, but you might be
able to get away with caching the result of the parse into an array
you can efficiently lookup using a binary search. Also consider the
/proc/self/smaps. If your api is for high-performance computing then
the program will want to know about the /proc/self/numa which is
documented under the man page for numa, the non-uniform memory
architecture.
The get_mempolicy(MPOL_F_ADDR) call is appropriate for high performance computing api work where there are multiple threads of
execution and you are managing your work to have affinity for non-uniform memory
as it relates to the cpu cores and socket resources. Such an api
will of course also tell you if a pointer is valid.
Under Microsoft Windows there is the function QueryWorkingSetEx that is documented under the Process Status API (also in the NUMA API).
As a corollary to sophisticated NUMA API programming this function will also let you do simple "testing pointers for validity (C/C++)" work, as such it is unlikely to be deprecated for at least 15 years.
Preventing a crash caused by the caller sending in an invalid pointer is a good way to make silent bugs that are hard to find.
Isn't it better for the programmer using your API to get a clear message that his code is bogus by crashing it rather than hiding it?
On Win32/64 there is a way to do this. Attempt to read the pointer and catch the resulting SEH exeception that will be thrown on failure. If it doesn't throw, then it's a valid pointer.
The problem with this method though is that it just returns whether or not you can read data from the pointer. It makes no guarantee about type safety or any number of other invariants. In general this method is good for little else other than to say "yes, I can read that particular place in memory at a time that has now passed".
In short, Don't do this ;)
Raymond Chen has a blog post on this subject: http://blogs.msdn.com/oldnewthing/archive/2007/06/25/3507294.aspx
AFAIK there is no way. You should try to avoid this situation by always setting pointers to NULL after freeing memory.
On Unix you should be able to utilize a kernel syscall that does pointer checking and returns EFAULT, such as:
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdbool.h>
bool isPointerBad( void * p )
{
int fh = open( p, 0, 0 );
int e = errno;
if ( -1 == fh && e == EFAULT )
{
printf( "bad pointer: %p\n", p );
return true;
}
else if ( fh != -1 )
{
close( fh );
}
printf( "good pointer: %p\n", p );
return false;
}
int main()
{
int good = 4;
isPointerBad( (void *)3 );
isPointerBad( &good );
isPointerBad( "/tmp/blah" );
return 0;
}
returning:
bad pointer: 0x3
good pointer: 0x7fff375fd49c
good pointer: 0x400793
There's probably a better syscall to use than open() [perhaps access], since there's a chance that this could lead to actual file creation codepath, and a subsequent close requirement.
Regarding the answer a bit up in this thread:
IsBadReadPtr(), IsBadWritePtr(), IsBadCodePtr(), IsBadStringPtr() for Windows.
My advice is to stay away from them, someone has already posted this one:
http://blogs.msdn.com/oldnewthing/archive/2007/06/25/3507294.aspx
Another post on the same topic and by the same author (I think) is this one:
http://blogs.msdn.com/oldnewthing/archive/2006/09/27/773741.aspx ("IsBadXxxPtr should really be called CrashProgramRandomly").
If the users of your API sends in bad data, let it crash. If the problem is that the data passed isn't used until later (and that makes it harder to find the cause), add a debug mode where the strings etc. are logged at entry. If they are bad it will be obvious (and probably crash). If it is happening way to often, it might be worth moving your API out of process and let them crash the API process instead of the main process.
Firstly, I don't see any point in trying to protect yourself from the caller deliberately trying to cause a crash. They could easily do this by trying to access through an invalid pointer themselves. There are many other ways - they could just overwrite your memory or the stack. If you need to protect against this sort of thing then you need to be running in a separate process using sockets or some other IPC for communication.
We write quite a lot of software that allows partners/customers/users to extend functionality. Inevitably any bug gets reported to us first so it is useful to be able to easily show that the problem is in the plug-in code. Additionally there are security concerns and some users are more trusted than others.
We use a number of different methods depending on performance/throughput requirements and trustworthyness. From most preferred:
separate processes using sockets (often passing data as text).
separate processes using shared memory (if large amounts of data to pass).
same process separate threads via message queue (if frequent short messages).
same process separate threads all passed data allocated from a memory pool.
same process via direct procedure call - all passed data allocated from a memory pool.
We try never to resort to what you are trying to do when dealing with third party software - especially when we are given the plug-ins/library as binary rather than source code.
Use of a memory pool is quite easy in most circumstances and needn't be inefficient. If YOU allocate the data in the first place then it is trivial to check the pointers against the values you allocated. You could also store the length allocated and add "magic" values before and after the data to check for valid data type and data overruns.
I've got a lot of sympathy with your question, as I'm in an almost identical position myself. I appreciate what a lot of the replies are saying, and they are correct - the routine supplying the pointer should be providing a valid pointer. In my case, it is almost inconceivable that they could have corrupted the pointer - but if they had managed, it would be MY software that crashes, and ME that would get the blame :-(
My requirement isn't that I continue after a segmentation fault - that would be dangerous - I just want to report what happened to the customer before terminating so that they can fix their code rather than blaming me!
This is how I've found to do it (on Windows): http://www.cplusplus.com/reference/clibrary/csignal/signal/
To give a synopsis:
#include <signal.h>
using namespace std;
void terminate(int param)
/// Function executed if a segmentation fault is encountered during the cast to an instance.
{
cerr << "\nThe function received a corrupted reference - please check the user-supplied dll.\n";
cerr << "Terminating program...\n";
exit(1);
}
...
void MyFunction()
{
void (*previous_sigsegv_function)(int);
previous_sigsegv_function = signal(SIGSEGV, terminate);
<-- insert risky stuff here -->
signal(SIGSEGV, previous_sigsegv_function);
}
Now this appears to behave as I would hope (it prints the error message, then terminates the program) - but if someone can spot a flaw, please let me know!
There are no provisions in C++ to test for the validity of a pointer as a general case. One can obviously assume that NULL (0x00000000) is bad, and various compilers and libraries like to use "special values" here and there to make debugging easier (For example, if I ever see a pointer show up as 0xCECECECE in visual studio I know I did something wrong) but the truth is that since a pointer is just an index into memory it's near impossible to tell just by looking at the pointer if it's the "right" index.
There are various tricks that you can do with dynamic_cast and RTTI such to ensure that the object pointed to is of the type that you want, but they all require that you are pointing to something valid in the first place.
If you want to ensure that you program can detect "invalid" pointers then my advice is this: Set every pointer you declare either to NULL or a valid address immediately upon creation and set it to NULL immediately after freeing the memory that it points to. If you are diligent about this practice, then checking for NULL is all you ever need.
Setting the pointer to NULL before and after using is a good technique. This is easy to do in C++ if you manage pointers within a class for example (a string):
class SomeClass
{
public:
SomeClass();
~SomeClass();
void SetText( const char *text);
char *GetText() const { return MyText; }
void Clear();
private:
char * MyText;
};
SomeClass::SomeClass()
{
MyText = NULL;
}
SomeClass::~SomeClass()
{
Clear();
}
void SomeClass::Clear()
{
if (MyText)
free( MyText);
MyText = NULL;
}
void SomeClass::Settext( const char *text)
{
Clear();
MyText = malloc( strlen(text));
if (MyText)
strcpy( MyText, text);
}
Indeed, something could be done under specific occasion: for example if you want to check whether a string pointer string is valid, using write(fd, buf, szie) syscall can help you do the magic: let fd be a file descriptor of temporary file you create for test, and buf pointing to the string you are tesing, if the pointer is invalid write() would return -1 and errno set to EFAULT which indicating that buf is outside your accessible address space.
Peeter Joos answer is pretty good. Here is an "official" way to do it:
#include <sys/mman.h>
#include <stdbool.h>
#include <unistd.h>
bool is_pointer_valid(void *p) {
/* get the page size */
size_t page_size = sysconf(_SC_PAGESIZE);
/* find the address of the page that contains p */
void *base = (void *)((((size_t)p) / page_size) * page_size);
/* call msync, if it returns non-zero, return false */
int ret = msync(base, page_size, MS_ASYNC) != -1;
return ret ? ret : errno != ENOMEM;
}
There isn't any portable way of doing this, and doing it for specific platforms can be anywhere between hard and impossible. In any case, you should never write code that depends on such a check - don't let the pointers take on invalid values in the first place.
As others have said, you can't reliably detect an invalid pointer. Consider some of the forms an invalid pointer might take:
You could have a null pointer. That's one you could easily check for and do something about.
You could have a pointer to somewhere outside of valid memory. What constitutes valid memory varies depending on how the run-time environment of your system sets up the address space. On Unix systems, it is usually a virtual address space starting at 0 and going to some large number of megabytes. On embedded systems, it could be quite small. It might not start at 0, in any case. If your app happens to be running in supervisor mode or the equivalent, then your pointer might reference a real address, which may or may not be backed up with real memory.
You could have a pointer to somewhere inside your valid memory, even inside your data segment, bss, stack or heap, but not pointing at a valid object. A variant of this is a pointer that used to point to a valid object, before something bad happened to the object. Bad things in this context include deallocation, memory corruption, or pointer corruption.
You could have a flat-out illegal pointer, such as a pointer with illegal alignment for the thing being referenced.
The problem gets even worse when you consider segment/offset based architectures and other odd pointer implementations. This sort of thing is normally hidden from the developer by good compilers and judicious use of types, but if you want to pierce the veil and try to outsmart the operating system and compiler developers, well, you can, but there is not one generic way to do it that will handle all of the issues you might run into.
The best thing you can do is allow the crash and put out some good diagnostic information.
In general, it's impossible to do. Here's one particularly nasty case:
struct Point2d {
int x;
int y;
};
struct Point3d {
int x;
int y;
int z;
};
void dump(Point3 *p)
{
printf("[%d %d %d]\n", p->x, p->y, p->z);
}
Point2d points[2] = { {0, 1}, {2, 3} };
Point3d *p3 = reinterpret_cast<Point3d *>(&points[0]);
dump(p3);
On many platforms, this will print out:
[0 1 2]
You're forcing the runtime system to incorrectly interpret bits of memory, but in this case it's not going to crash, because the bits all make sense. This is part of the design of the language (look at C-style polymorphism with struct inaddr, inaddr_in, inaddr_in6), so you can't reliably protect against it on any platform.
It's unbelievable how much misleading information you can read in articles above...
And even in microsoft msdn documentation IsBadPtr is claimed to be banned. Oh well - I prefer working application rather than crashing. Even if term working might be working incorrectly (as long as end-user can continue with application).
By googling I haven't found any useful example for windows - found a solution for 32-bit apps,
http://www.codeproject.com/script/Content/ViewAssociatedFile.aspx?rzp=%2FKB%2Fsystem%2Fdetect-driver%2F%2FDetectDriverSrc.zip&zep=DetectDriverSrc%2FDetectDriver%2Fsrc%2FdrvCppLib%2Frtti.cpp&obid=58895&obtid=2&ovid=2
but I need also to support 64-bit apps, so this solution did not work for me.
But I've harvested wine's source codes, and managed to cook similar kind of code which would work for 64-bit apps as well - attaching code here:
#include <typeinfo.h>
typedef void (*v_table_ptr)();
typedef struct _cpp_object
{
v_table_ptr* vtable;
} cpp_object;
#ifndef _WIN64
typedef struct _rtti_object_locator
{
unsigned int signature;
int base_class_offset;
unsigned int flags;
const type_info *type_descriptor;
//const rtti_object_hierarchy *type_hierarchy;
} rtti_object_locator;
#else
typedef struct
{
unsigned int signature;
int base_class_offset;
unsigned int flags;
unsigned int type_descriptor;
unsigned int type_hierarchy;
unsigned int object_locator;
} rtti_object_locator;
#endif
/* Get type info from an object (internal) */
static const rtti_object_locator* RTTI_GetObjectLocator(void* inptr)
{
cpp_object* cppobj = (cpp_object*) inptr;
const rtti_object_locator* obj_locator = 0;
if (!IsBadReadPtr(cppobj, sizeof(void*)) &&
!IsBadReadPtr(cppobj->vtable - 1, sizeof(void*)) &&
!IsBadReadPtr((void*)cppobj->vtable[-1], sizeof(rtti_object_locator)))
{
obj_locator = (rtti_object_locator*) cppobj->vtable[-1];
}
return obj_locator;
}
And following code can detect whether pointer is valid or not, you need probably to add some NULL checking:
CTest* t = new CTest();
//t = (CTest*) 0;
//t = (CTest*) 0x12345678;
const rtti_object_locator* ptr = RTTI_GetObjectLocator(t);
#ifdef _WIN64
char *base = ptr->signature == 0 ? (char*)RtlPcToFileHeader((void*)ptr, (void**)&base) : (char*)ptr - ptr->object_locator;
const type_info *td = (const type_info*)(base + ptr->type_descriptor);
#else
const type_info *td = ptr->type_descriptor;
#endif
const char* n =td->name();
This gets class name from pointer - I think it should be enough for your needs.
One thing which I'm still afraid is performance of pointer checking - in code snipet above there is already 3-4 API calls being made - might be overkill for time critical applications.
It would be good if someone could measure overhead of pointer checking compared for example to C#/managed c++ calls.
It is not a very good policy to accept arbitrary pointers as input parameters in a public API. It's better to have "plain data" types like an integer, a string or a struct (I mean a classical struct with plain data inside, of course; officially anything can be a struct).
Why? Well because as others say there is no standard way to know whether you've been given a valid pointer or one that points to junk.
But sometimes you don't have the choice - your API must accept a pointer.
In these cases, it is the duty of the caller to pass a good pointer. NULL may be accepted as a value, but not a pointer to junk.
Can you double-check in any way? Well, what I did in a case like that was to define an invariant for the type the pointer points to, and call it when you get it (in debug mode). At least if the invariant fails (or crashes) you know that you were passed a bad value.
// API that does not allow NULL
void PublicApiFunction1(Person* in_person)
{
assert(in_person != NULL);
assert(in_person->Invariant());
// Actual code...
}
// API that allows NULL
void PublicApiFunction2(Person* in_person)
{
assert(in_person == NULL || in_person->Invariant());
// Actual code (must keep in mind that in_person may be NULL)
}
Following does work in Windows (somebody suggested it before):
static void copy(void * target, const void* source, int size)
{
__try
{
CopyMemory(target, source, size);
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
doSomething(--whatever--);
}
}
The function has to be static, standalone or static method of some class.
To test on read-only, copy data in the local buffer.
To test on write without modifying contents, write them over.
You can test first/last addresses only.
If pointer is invalid, control will be passed to 'doSomething',
and then outside the brackets.
Just do not use anything requiring destructors, like CString.
On Windows I use this code:
void * G_pPointer = NULL;
const char * G_szPointerName = NULL;
void CheckPointerIternal()
{
char cTest = *((char *)G_pPointer);
}
bool CheckPointerIternalExt()
{
bool bRet = false;
__try
{
CheckPointerIternal();
bRet = true;
}
__except (EXCEPTION_EXECUTE_HANDLER)
{
}
return bRet;
}
void CheckPointer(void * A_pPointer, const char * A_szPointerName)
{
G_pPointer = A_pPointer;
G_szPointerName = A_szPointerName;
if (!CheckPointerIternalExt())
throw std::runtime_error("Invalid pointer " + std::string(G_szPointerName) + "!");
}
Usage:
unsigned long * pTest = (unsigned long *) 0x12345;
CheckPointer(pTest, "pTest"); //throws exception
On macOS, you can do this with mach_vm_region, which as well as telling you if a pointer is valid, also lets you validate what access you have to the memory to which the pointer points (read/write/execute). I provided sample code to do this in my answer to another question:
#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <stdio.h>
#include <stdbool.h>
bool ptr_is_valid(void *ptr, vm_prot_t needs_access) {
vm_map_t task = mach_task_self();
mach_vm_address_t address = (mach_vm_address_t)ptr;
mach_vm_size_t size = 0;
vm_region_basic_info_data_64_t info;
mach_msg_type_number_t count = VM_REGION_BASIC_INFO_COUNT_64;
mach_port_t object_name;
kern_return_t ret = mach_vm_region(task, &address, &size, VM_REGION_BASIC_INFO_64, (vm_region_info_t)&info, &count, &object_name);
if (ret != KERN_SUCCESS) return false;
return ((mach_vm_address_t)ptr) >= address && ((info.protection & needs_access) == needs_access);
}
#define TEST(ptr,acc) printf("ptr_is_valid(%p,access=%d)=%d\n", (void*)(ptr), (acc), ptr_is_valid((void*)(ptr),(acc)))
int main(int argc, char**argv) {
TEST(0,0);
TEST(0,VM_PROT_READ);
TEST(123456789,VM_PROT_READ);
TEST(main,0);
TEST(main,VM_PROT_READ);
TEST(main,VM_PROT_READ|VM_PROT_EXECUTE);
TEST(main,VM_PROT_EXECUTE);
TEST(main,VM_PROT_WRITE);
TEST((void*)(-1),0);
return 0;
}
The SEI CERT C Coding Standard recommendation MEM10-C. Define and use a pointer validation function says it is possible to do a check to some degree, especially under Linux OS.
The method described in the link is to keep track of the highest memory address returned by malloc and add a function that tests if someone tries to use a pointer greater than that value. It is probably of limited use.
IsBadReadPtr(), IsBadWritePtr(), IsBadCodePtr(), IsBadStringPtr() for Windows.
These take time proportional to the length of the block, so for sanity check I just check the starting address.
I have seen various libraries use some method to check for unreferenced memory and such. I believe they simply "override" the memory allocation and deallocation methods (malloc/free), which has some logic that keeps track of the pointers. I suppose this is overkill for your use case, but it would be one way to do it.
Technically you can override operator new (and delete) and collect information about all allocated memory, so you can have a method to check if heap memory is valid.
but:
you still need a way to check if pointer is allocated on stack ()
you will need to define what is 'valid' pointer:
a) memory on that address is
allocated
b) memory at that address
is start address of object (e.g.
address not in the middle of huge
array)
c) memory at that address
is start address of object of expected type
Bottom line: approach in question is not C++ way, you need to define some rules which ensure that function receives valid pointers.
There is no way to make that check in C++. What should you do if other code passes you an invalid pointer? You should crash. Why? Check out this link: http://blogs.msdn.com/oldnewthing/archive/2006/09/27/773741.aspx
Addendum to the accpeted answer(s):
Assume that your pointer could hold only three values -- 0, 1 and -1 where 1 signifies a valid pointer, -1 an invalid one and 0 another invalid one. What is the probability that your pointer is NULL, all values being equally likely? 1/3. Now, take the valid case out, so for every invalid case, you have a 50:50 ratio to catch all errors. Looks good right? Scale this for a 4-byte pointer. There are 2^32 or 4294967294 possible values. Of these, only ONE value is correct, one is NULL, and you are still left with 4294967292 other invalid cases. Recalculate: you have a test for 1 out of (4294967292+ 1) invalid cases. A probability of 2.xe-10 or 0 for most practical purposes. Such is the futility of the NULL check.
You know, a new driver (at least on Linux) that is capable of this probably wouldn't be that hard to write.
On the other hand, it would be folly to build your programs like this. Unless you have some really specific and single use for such a thing, I wouldn't recommend it. If you built a large application loaded with constant pointer validity checks it would likely be horrendously slow.
you should avoid these methods because they do not work. blogs.msdn.com/oldnewthing/archive/2006/09/27/773741.aspx – JaredPar Feb 15 '09 at 16:02
If they don't work - next windows update will fix it ?
If they don't work on concept level - function will be probably removed from windows api completely.
MSDN documentation claim that they are banned, and reason for this is probably flaw of further design of application (e.g. generally you should not eat invalid pointers silently - if you're in charge of design of whole application of course), and performance/time of pointer checking.
But you should not claim that they does not work because of some blog.
In my test application I've verified that they do work.
these links may be helpful
_CrtIsValidPointer
Verifies that a specified memory range is valid for reading and writing (debug version only).
http://msdn.microsoft.com/en-us/library/0w1ekd5e.aspx
_CrtCheckMemory
Confirms the integrity of the memory blocks allocated in the debug heap (debug version only).
http://msdn.microsoft.com/en-us/library/e73x0s4b.aspx

practical explanation of c++ functions with pointers

I am relatively new to C++...
I am learning and coding but I am finding the idea of pointers to be somewhat fuzzy. As I understand it * points to a value and & points to an address...great but why? Which is byval and which is byref and again why?
And while I feel like I am learning and understanding the idea of stack vs heap, runtime vs design time etc, I don't feel like I'm fully understanding what is going on. I don't like using coding techniques that I don't fully understand.
Could anyone please elaborate on exactly what and why the pointers in this fairly "simple" function below are used, esp the pointer to the function itself.. [got it]
Just asking how to clean up (delete[]) the str... or if it just goes out of scope.. Thanks.
char *char_out(AnsiString ansi_in)
{
// allocate memory for char array
char *str = new char[ansi_in.Length() + 1];
// copy contents of string into char array
strcpy(str, ansi_in.c_str());
return str;
}
Revision 3
TL;DR:
AnsiString appears to be an object which is passed by value to that function.
char* str is on the stack.
A new array is created on the heap with (ansi_in.Length() + 1) elements. A pointer to the array is stored in str. +1 is used because strings in C/C++ typically use a null terminator, which is a special character used to identify the end of the string when scanning through it.
ansi_in.cstr() is called, copying a pointer to its string buffer into an unnamed local variable on the stack.
str and the temporary pointer are pushed onto the stack and strcpy is called. This has the effect of copying the string(including the null-terminator) pointed at from the temporary to str.
str is returned to the caller
Long answer:
You appear to be struggling to understand stack vs heap, and pointers vs non-pointers. I'll break them down for you and then answer your question.
The stack is a concept where a fixed region of memory is allocated for each thread before it starts and before any user code runs.
Ignoring lower level details such as calling conventions and compiler optimizations, you can reason that the following happens when you call a function:
Arguments are pushed onto the stack. This reserves part of the stack for use of the arguments.
The function performs some job, using and copying the arguments as needed.
The function pops the arguments off the stack and returns. This frees the space reserved for the arguments.
This isn't limited to function calls. When you declare objects and primitives in a function's body, space for them is reserved via pushing. When they're out of scope, they're automatically cleaned up by calling destructors and popping.
When your program runs out of stack space and starts using the space outside of it, you'll typically encounter an error. Regardless of what the actual error is, it's known as a stack overflow because you're going past it and therefore "overflowing".
The heap is a different concept where the remaining unused memory of the system is available for you to manually allocate and deallocate from. This is primarily used when you have a large data set that's too big for the stack, or when you need data to persist across arbitrary functions.
C++ is a difficult beast to master, but if you can wrap your head around the core concepts is becomes easier to understand.
Suppose we wanted to model a human:
struct Human
{
const char* Name;
int Age;
};
int main(int argc, char** argv)
{
Human human;
human.Name = "Edward";
human.Age = 30;
return 0;
}
This allocates at least sizeof(Human) bytes on the stack for storing the 'human' object. Right before main() returns, the space for 'human' is freed.
Now, suppose we wanted an array of 10 humans:
int main(int argc, char** argv)
{
Human humans[10];
humans[0].Name = "Edward";
humans[0].Age = 30;
// ...
return 0;
}
This allocates at least (sizeof(Human) * 10) bytes on the stack for storing the 'humans' array. This too is automatically cleaned up.
Note uses of ".". When using anything that's not a pointer, you access their contents using a period. This is direct memory access if you're not using a reference.
Here's the single object version using the heap:
int main(int argc, char** argv)
{
Human* human = new Human();
human->Name = "Edward";
human->Age = 30;
delete human;
return 0;
}
This allocates sizeof(Human*) bytes on the stack for the pointer 'human', and at least sizeof(Human) bytes on the heap for storing the object it points to. 'human' is not automatically cleaned up, you must call delete to free it. Note uses of "a->b". When using pointers, you access their contents using the "->" operator. This is indirect memory access, because you're accessing memory through an variable address.
It's sort of like mail. When someone wants to mail you something they write an address on an envelope and submit it through the mail system. A mailman takes the mail and moves it to your mailbox. For comparison the pointer is the address written on the envelope, the memory management unit(mmu) is the mail system, the electrical signals being passed down the wire are the mailman, and the memory location the address refers to is the mailbox.
Here's the array version using the heap:
int main(int argc, char** argv)
{
Human* humans = new Human[10];
humans[0].Name = "Edward";
humans[0].Age = 30;
// ...
delete[] humans;
return 0;
}
This allocates sizeof(Human*) bytes on the stack for pointer 'humans', and (sizeof(Human) * 10) bytes on the heap for storing the array it points to. 'humans' is also not automatically cleaned up; you must call delete[] to free it.
Note uses of "a[i].b" rather than "a[i]->b". The "[]" operator(indexer) is really just syntactic sugar for "*(a + i)", which really just means treat it as a normal variable in a sequence so I can type less.
In both of the above heap examples, if you didn't write delete/delete[], the memory that the pointers point to would leak(also known as dangle). This is bad because if left unchecked it could eat through all your available memory, eventually crashing when there isn't enough or the OS decides other apps are more important than yours.
Using the stack is usually the wiser choice as you get automatic lifetime management via scope(aka RAII) and better data locality. The only "drawback" to this approach is that because of scoped lifetime you can't directly access your stack variables once the scope has exited. In other words you can only use stack variables within the scope they're declared. Despite this, C++ allows you to copy pointers and references to stack variables, and indirectly use them outside the scope they're declared in. Do note however that this is almost always a very bad idea, don't do it unless you really know what you're doing, I can't stress this enough.
Passing an argument by-ref means pushing a copy of a pointer or reference to the data on the stack. As far as the computer is concerned pointers and references are the same thing. This is a very lightweight concept, but you typically need to check for null in functions receiving pointers.
Pointer variant of an integer adding function:
int add(const int* firstIntPtr, const int* secondIntPtr)
{
if (firstIntPtr == nullptr) {
throw std::invalid_argument("firstIntPtr cannot be null.");
}
if (secondIntPtr == nullptr) {
throw std::invalid_argument("secondIntPtr cannot be null.");
}
return *firstIntPtr + *secondIntPtr;
}
Note the null checks. If it didn't verify its arguments are valid, they very well may be null or point to memory the app doesn't have access to. Attempting to read such values via dereferencing(*firstIntPtr/*secondIntPtr) is undefined behavior and if you're lucky results in a segmentation fault(aka access violation on windows), crashing the program. When this happens and your program doesn't crash, there are deeper issues with your code that are out of the scope of this answer.
Reference variant of an integer adding function:
int add(const int& firstInt, const int& secondInt)
{
return firstInt + secondInt;
}
Note the lack of null checks. By design C++ limits how you can acquire references, so you're not suppose to be able to pass a null reference, and therefore no null checks are required. That said, it's still possible to get a null reference through converting a pointer to a reference, but if you're doing that and not checking for null before converting you have a bug in your code.
Passing an argument by-val means pushing a copy of it on the stack. You almost always want to pass small data structures by value. You don't have to check for null when passing values because you're passing the actual data itself and not a pointer to it.
i.e.
int add(int firstInt, int secondInt)
{
return firstInt + secondInt;
}
No null checks are required because values, not pointers are used. Values can't be null.
Assuming you're interested in learning about all this, I highly suggest you use std::string(also see this) for all your string needs and std::unique_ptr(also see this) for managing pointers.
i.e.
std::string char_out(AnsiString ansi_in)
{
return std::string(ansi_in.c_str());
}
std::unique_ptr<char[]> char_out(AnsiString ansi_in)
{
std::unique_ptr<char[]> str(new char[ansi_in.Length() + 1]);
strcpy(str.get(), ansi_in.c_str());
return str; // std::move(str) if you're using an older C++11 compiler.
}

How to know if two C-strings point to one memory block?

I have an array allocated with malloc:
char *aStr1 = (char* ) malloc (10);
And then I filled this memory:
strcpy(aStr1, "ABCDEFGHI");
After that I created a new pointer aStr2:
char *aStr2 = aStr1 + 5;
And i set fourth element of memory to '\0':
*(aStr1 + 4) = '\0';
And finally, using these two pointers in a simple function:
int checkMem(char *aStr1, char *aStr2);
This function returns true (some none zero value) if aStr1 and aStr2 pointed to one memory block, and returns zero in another case.
How i can implement this function? (I read many linux mans about allocs function and haven't found any information about such problem).
//Added
I need this to do something like that:
char *aStr1 = (char *) malloc (10);
char *aStr2 = aStr1 + 5;
strcpy(aStr1, "ABCDEFGHI");
*(aStr1 + 4) = '\0';
and than:
my_strcat(aStr1, aStr2);
I do not ask for help to implement my_strcat, but maybe, get some hint how i can resolve its problem
//Updated
thanx, for all. I solved it.
Without any low level functions you cannot correctly know, how many memory allocate (maybe on some platform or realization you can do this:
size_t ptr_size = *((size_t *)ptr - 1);
but maybe not for all it will be correct).
And solving is simple: i create local copy of aSrc2, then realloc aSrc1 and copy aSrc2 to new aSrc1.
Unfortunately you cannot tell if two pointers point to memory that belonged to the same initial allocation.
You can create classes/structures that, for instance, save the initial allocation and then you could compare them.
But without added information, you simply cannot tell.
There is no provided standard mechanism for doing this, its up to you to track the memory you received and how big those allocations are, so you'd probably want to provide your own malloc wrapper that tracks what's allocd. Store the pointers in a map so you can use lower_bound to find the nearest allocate to the first string and then check if the second string is in the same allocation.

In C/C++, how do I get the data in a virtual memory space?

I am working with the debug information. I am trying to write kind of like a "debug information parser", I am using DWARF and ELF libraries to do this, but they do not offer anything besides information of the memory space, I am trying to get the data in that memory space. I am hooked to the program. I am using a tool called Pin, so I am actually running the code inside the other program.. That is why I have access to its variables.
Assuming I have a pointer to an address, I want to get all the data that is stored in that address and the next 4 bytes (for example).
As an example, let's say I have an address 0xDEADBEEF and I want to go through the next 4 bytes starting from that address and read the data (dereference the pointer on each byte)
I am relatively new to C, and what I am attempting to do is:
char * address = "0xDEADBEEF";
unsigned int bytesize = 4;
ptr = (void *) address;
ptr_limit = ptr + bytesize;
for(ptr; ptr < ptr_limit; ptr++)
cout << ptr;
I know this might be completely wrong, and I am getting a lot of compiling errors, but it is just to show a bit of the logic I am trying to use...
OK, C and C++ are low level, but they aren't the wild west. You aren't allowed to just make up an address and access it. You aren't allowed to do that in assembly on most OSs; this is where SegFaults come from.
The way you get memory is to allocate it. This process involves telling the OS that you want a piece of memory of some size. At which point, the OS does its stuff so that you can access a certain range of virtual memory. Attempts to access memory outside of this range (or any range that the OS has allowed you to access) will cause the OS to terminate your program.
In C, you generally use malloc/calloc/realloc to allocate memory and free to tell the OS that you're done with it. C++ uses new to allocate objects and delete to deallocate them.
I am trying to write kind of like a "debug information parser", I am using DWARF and ELF libraries to do this, but they do not offer anything besides information of the memory space, I am trying to get the data in that memory space
It'd be great if you put things like that in your question.
In any case, you're talking about accessing someone else's memory, which is not done. Well, it's not permitted by the rules of standard C and C++. The various OSs have calls that can allow you to map some address space of another processes onto yours. But that's much more complex and OS-specific.
A memory address is an integer type (read number).
In your example, you have a char * (read string).
The following code:
char * address = "0xDEADBEEF";
void * ptr = ( void * )address;
will just put the address of that char * variable, as a void *, into p.
It won't set the pointer to memory address 0xDEADBEEF.
If you want to access that specific memory location (assuming you know what you are doing), you'll need something like:
void * ptr = ( void * )0xDEADBEEF;
I said "assuming you know what you are doing", because accessing such a specific address will eventually result in a segfault, since you basically don't know such an address is in your address space, unless you're doing stuff in ring 0 (read kernel), for instance, with DMA.
But then I would assume you know a pointer is a number, not a string...
The code can be pretty simple:
char *address, *limit;
for(address = (char *)0xdeadbeef, limit = address+4; address < limit; address++)
cout << *address;
Note, however, that while converting an arbitrary integer to an address is allowed, the result of using the resulting pointer isn't (even close to) portable. Based on your comment (that you're getting addresses via debug information) so the addresses you're getting should be valid, the result should normally work (there's just no guarantee of it being portable).
store your address as DWORD => DWORD address = 0xDEADBEEF
then cast this address to pointer => void *ptr = (void *)address
here is a example:
char *pointer = "FOO";
DWORD address = (DWORD)pointer;
printf("0x%u\n", address);
printf("%s\n", (char *)address); // prints FOO
address++; //move 1 byte
printf("%s\n", (char *)address); // prints OO