Why uninitialized pointers cause mem access violations close to 0?

Why uninitialized pointers cause mem access violations close to 0? - c++

It is said that often (but not always) when you get an AV in a memory location close to zero (like $89) you have an uninitialized pointer.
But I have seen this also in Delphi books... Hm... or they have been all written by the same author(s)???
Update:
Quote from "C++ builder 6 developers guide" by Bob Swart et all, page 71:
When the memory address ZZZZZZZZZ is close to zero, the cause is often
an uninitialized pointer that has been accessed.
Why is it so? Why uninitialized pointers contain low numbers? Why not big numbers like $FFFFFFF or plain random numbers? Is this urban myth?

This is confusing "uninitialized pointers" with null references or null pointers. Access to an object's fields, or indexes into a pointer, will be represented as an offset with respect to the base pointer. If that reference is null then the offsets will generally be addresses either near zero (for positive offsets) or addresses near the maximum value of the native pointer size (for negative offsets).
Access violations at addresses with these characteristic small (or large) values are a good clue that you have a null reference or null pointer, specifically, and not simply an uninitialized pointer. An uninitialized reference can have a null value, but may also have any other value depending on how it is allocated.

Why uninitialized pointers contain low numbers?
They don't. They can contain any value.
Why not big numbers like $FFFFFFF?
They can perfectly well contain values like $FFFFFFF.
or plain random numbers?
Uninitialised variables tend not to be truly random. They typically contain whatever happened to have been written to that memory location the last time it was used. For instance, it is very common for uninitialised local variables to contain the same value every time a function is called because the history of stack usage happens to be repeatable.
It's also worth pointing out that random is an often misused word. People often say random when they actually mean distributed randomly with uniform distribution. I expect that's what you meant when you used the term random.

Your statement about AV close to zero is true for dereferencing a null pointer. It is zero or close to zero because you either dereference the null pointer:
int* p{};
const auto v = *p; // <-- AV at memory location = 0
or access an array item:
char* p{};
const auto v = p[100]; // <--AV at memory location = 100
or a struct field:
struct Data
{
int field1;
int field2;
};
Data* p{};
const auto v = p->field2; // AV at memory location = 4

Related

C++ what happens when incrementing a char *

Say I have the following code:
void incrementPointer( const char *x) {
char *localVar = new char;
char *localVarPtr = localVar;
while(*xPtr != '\0') {
xPtr++;
localVarPtr++;
}
}
Let's say that x is pointing to some null-terminated word. After the execution of the while loop, is the localVarPtr pointing to a location that has not been allocated for localVar? For instance, if I declare some other variable in this function, and then set all bytes between localVar and localVarPtr to the character 'c', would this potentially overwrite the value of the other variable?
My other question is if this is considered bad practice (i.e. potentially overwriting variables or causing undefined behavior), what would be the way to allocate enough space for localVar if x is pointing to a word who's size is unlimited? The size of x may be larger than an unsigned integer, and thus I would not be able to use the size in the initialization of localVar.

Let's say that x is pointing to some null-terminated word. After the execution of the while loop, is the localVarPtr pointing to a location that has not been allocated for localVar?
That depends entirely on the length of x - If for example x contains only the null-terminator then no, the value of localVarPtr points to the beginning of the region allocated by new.
For instance, if I declare some other variable in this function, and then set all bytes between localVar and localVarPtr to the character 'c', would this potentially overwrite the value of the other variable?
You're not mutating the memory stored in any of the addresses so no, an overwrite would not occur. If your localVarPtr points to the region at say address 0x8000 and you then modify 0x8010, assuming that 0x0810 is a "local variable" then yes, its contents would be changed. This is the wild west, any valid (or sometimes invalid) address that is dereferenced and assigned to will change or signal.
My other question is if this is considered bad practice (i.e. potentially overwriting variables or causing undefined behavior),
Yes, typically pointing to or modifying undefined memory is bad practice. A pointer to a local variable, intentionally set that way however, is perfectly fine. Undefined behavior is always bad practice.
what would be the way to allocate enough space for localVar if x is pointing to a word who's size is unlimited? The size of x may be larger than an unsigned integer, and thus I would not be able to use the size in the initialization of localVar.
As Mat denoted, there is no such way to have an "unlimited" length string. If you need to allocate space to match the size of the string for x, you'll have to simply know the size either by calling a function to determine its length, or passing it in to this function. Once you know the size, you can use new to allocate for it.

Trying to understand this variable definition

What exactly does this do?
int test = *(int*)(0x154512);

0x154512
is an integer, written in base 16.
(int*)(0x154512)
says to treat that number as the address of an int variable.
*(int*)(0x154512)
says to dereference that address, or get the int value at that address.
int test = *(int*)(0x154512)
says to declare the int variable test and assign it the int value located at address 0x154512.

Let's break this down into pieces.
0x154512 is a hex value, or base-16, which is often used for memory addresses for convenience reasons.
int* declares a pointer to a value of type int. So, (int*)(0x154512) means 0x154512 is being treated as a memory address, which we expect to hold an integer.
The last * on the left is the dereference operator, which means "get the value located at this pointer" more or less.
So it copies the integer at memory address 0x154512 to the variable "test".
For more information on pointers:
http://www.cplusplus.com/doc/tutorial/pointers/
If you're planning to do a lot of C++ in the future, make sure to give this a nice, long read. Pointers are fun.

One line get integer value stored at 0x154512 memory location

Array values at negative one

Are there any repricussions having a value in an array stored at -1? could it affect the program or computer in a bad way? I am really curious, I'm new to programming and any clarification I can get really helps, thanks.

There's no way to store anything in an array object at index -1. A mere attempt to obtain a pointer to that non-existing element results in undefined behavior.
Negative indices (like -1) may appear in array-like contexts in situations when base pointer is not the array object itself, but rather an independent pointer pointing into the middle of another array object, as in
int a[10];
int *p = &a[5];
p[-1] = 42; // OK, sets `a[4]`
p[-2] = 5; // OK, sets `a[3]`
But any attempts to access non-existent elements before the beginning of the actual array result in undefined behavior
a[-1]; // undefined behavior
p[-6]; // undefined behavior

You see if you are trying to take element of array by pointing some value in brackets, basically you're specifying offset (multiplied by size of allocated type) from memory address. If you've allocated array in a typical way like int *a = new int[N], memory you're allowed to use is limited from address a until a + <size of memory allocated> (which in this case sizeof (int) * N), so by trying to get value with index -1 you are getting out of bounds of your array and it certainly will lead you to error or possible program crash.
There's of course a chance that your memory pointer is not the one at the beginning of some allocated sequence, like (considering previous example) int *b = a + 1, in this case you may (at least compiler allows that) take value of a[-1] and it would be valid, but since it's pretty hard to manage correctness of code like this I would still recommend against it.

Intuitively explaining pointers and their significance?

I'm having a hard time understanding pointers, particularly function pointers, and I was hoping someone could give me a rundown of exactly what they are and how they should be used in a program. Code blocks in C++ would be especially appreciated.
Thank you.

The concept of indirection is important to understand.
Here we are passing by value (note that a local copy is created and operated on, not the original version) via increment(x):
And here, by pointer (memory address) via increment(&x):
Note that references work similarly to pointers except that the syntax is similar to value copies (obj.member) and that pointers can point to 0 ("null" pointer) whereas references must be pointing to non-zero memory addresses.
Function pointers, on the other hand, let you dynamically change the behaviour of code at runtime by conveniently passing around and dealing withh functions in the same way you would pass around variables. Functors are often preferred (especially by the STL) since their syntax is cleaner and they let you associate local state with a function instance (read up about callbacks and closures, both are useful computer science concepts). For simple function pointers/callbacks, lambdas are often used (new in C++11) due to their compact and in-place syntax.

The pointer points to the point, is an integer value that has the address of that point.
Pointers can point to other pointers. Then you can get the values more-indirect way.
Reference operator (&):
You may equate a pointer to a reference of a variable or a pointer.
Dereference operator (*):
You may get the value of cell pointed by the pointer.
Arrays are decayed into a pointer when passed to a function by not a reference.
Function pointers are not inlined and makes program more functional. Callback is an example to this.

As an analogy, think of the memory in the computer as an Excel sheet. The equivalent of assigning a value to a variable in a C/C++ program would be to write something into a cell on the Excel sheet. Reading from a variable would be like looking at a cell's content.
Now, if you have a cell (say C3) whose content is "B8" you can interpret that content as a reference to another cell. If you treat the cell C3 in that manner, C3 becomes like a pointer. (In Excel, you can actually achieve this behavior by entering =B8 into C3).
In such a scenario, you basically state that the cell whose value you're interested in is referenced in C3. In C++, this could be something like:
int B8 = 42;
int* C3 = &B8;
You now have two variables that occuppy memory. Now, if you want to know what C3 points to, you'll use
int my_value = *C3;
As for function pointers: these are variables like ordinary pointers but the address (cell) they point to is not just a value but rather a function you can call.

Here you can find some example uses of Function pointer:
http://www.cprogramming.com/tutorial/function-pointers.html

In order to understand pointers one needs to understand a bit about hardware and memory layout.
Computer memory can be seen as a cupboard with drawers. A pointer can point to an arbitrary drawer, when you "dereference" the pointer you are looking inside the drawer, i.e. the value stored in the "drawer":
e.g. ten numbers stored after one another
short a[] = {9,2,3,4,5,6,7,8,1,-1] ;
in memory the values that consist the array 'a' are stored sequentially
+---+---+---+---+---+---+---+---+---+---+
a->| 9 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 1 | -1|
+---+---+---+---+---+---+---+---+---+---+
the 'a' array above is a pointer and is the start address where the
values are stored in memory.
short* ptr = 0; // a pointer not pointing to anything (NULL)
ptr = a + 5; // the pointer is now pointing to the 6th value in the array
// the + 5 is an offset on the starting address of a and since the type of
// a is short int array the compiler calculates the correct byte offset based
// on that type. in the above example 5 is 5 short ints since ptr is of type
// short*
*ptr has the value 6 i.e. we are looking at what the ptr is pointing to.
The size of each "drawer" is determined of the data type that is stored
In the above example 10 short ints are stored, every short int occupies 2 bytes
so the whole array occupies 20 bytes of memory (sizeof(a))

Variables store values.
Pointers store values too. The only difference is that they are memory addresses. They are still numbers.
Nothing complicated here.
You could store all pointers in int or long variables:
int p = 43567854;
char *p1 = (char *) p;
But advantage of storing them in pointer variables is that you can describe what type of variable is saved at the address the pointer points to.
What gets complicated about pointers is the cryptic syntax you have to use with them. Syntax is cryptic so that it's short to type.
Like:
&p = return address of variable
*p = return the value of first member of array that is stored at the address
By using the two rules above we can write this cryptic code:
&(*++t)
Which translated into human language gets quite long:
Increase value of t by 1. this now points to second member of array of values pointer points to. then get value of second member (*) and then get address of this value. if we print this pointer it will print whole string except the first character.
You should make a "pointers syntax cheat sheet.txt" and you are good.
And have open "Pointers tests" projects to test everythng that is unclear to you.
Pointers are similar to regular expressions in a way.

What does "dereferencing" a pointer mean?

Please include an example with the explanation.

Reviewing the basic terminology
It's usually good enough - unless you're programming assembly - to envisage a pointer containing a numeric memory address, with 1 referring to the second byte in the process's memory, 2 the third, 3 the fourth and so on....
What happened to 0 and the first byte? Well, we'll get to that later - see null pointers below.
For a more accurate definition of what pointers store, and how memory and addresses relate, see "More about memory addresses, and why you probably don't need to know" at the end of this answer.
When you want to access the data/value in the memory that the pointer points to - the contents of the address with that numerical index - then you dereference the pointer.
Different computer languages have different notations to tell the compiler or interpreter that you're now interested in the pointed-to object's (current) value - I focus below on C and C++.
A pointer scenario
Consider in C, given a pointer such as p below...
const char* p = "abc";
...four bytes with the numerical values used to encode the letters 'a', 'b', 'c', and a 0 byte to denote the end of the textual data, are stored somewhere in memory and the numerical address of that data is stored in p. This way C encodes text in memory is known as ASCIIZ.
For example, if the string literal happened to be at address 0x1000 and p a 32-bit pointer at 0x2000, the memory content would be:
Memory Address (hex) Variable name Contents
1000 'a' == 97 (ASCII)
1001 'b' == 98
1002 'c' == 99
1003 0
...
2000-2003 p 1000 hex
Note that there is no variable name/identifier for address 0x1000, but we can indirectly refer to the string literal using a pointer storing its address: p.
Dereferencing the pointer
To refer to the characters p points to, we dereference p using one of these notations (again, for C):
assert(*p == 'a'); // The first character at address p will be 'a'
assert(p[1] == 'b'); // p[1] actually dereferences a pointer created by adding
// p and 1 times the size of the things to which p points:
// In this case they're char which are 1 byte in C...
assert(*(p + 1) == 'b'); // Another notation for p[1]
You can also move pointers through the pointed-to data, dereferencing them as you go:
++p; // Increment p so it's now 0x1001
assert(*p == 'b'); // p == 0x1001 which is where the 'b' is...
If you have some data that can be written to, then you can do things like this:
int x = 2;
int* p_x = &x; // Put the address of the x variable into the pointer p_x
*p_x = 4; // Change the memory at the address in p_x to be 4
assert(x == 4); // Check x is now 4
Above, you must have known at compile time that you would need a variable called x, and the code asks the compiler to arrange where it should be stored, ensuring the address will be available via &x.
Dereferencing and accessing a structure data member
In C, if you have a variable that is a pointer to a structure with data members, you can access those members using the -> dereferencing operator:
typedef struct X { int i_; double d_; } X;
X x;
X* p = &x;
p->d_ = 3.14159; // Dereference and access data member x.d_
(*p).d_ *= -1; // Another equivalent notation for accessing x.d_
Multi-byte data types
To use a pointer, a computer program also needs some insight into the type of data that is being pointed at - if that data type needs more than one byte to represent, then the pointer normally points to the lowest-numbered byte in the data.
So, looking at a slightly more complex example:
double sizes[] = { 10.3, 13.4, 11.2, 19.4 };
double* p = sizes;
assert(p[0] == 10.3); // Knows to look at all the bytes in the first double value
assert(p[1] == 13.4); // Actually looks at bytes from address p + 1 * sizeof(double)
// (sizeof(double) is almost always eight bytes)
++p; // Advance p by sizeof(double)
assert(*p == 13.4); // The double at memory beginning at address p has value 13.4
*(p + 2) = 29.8; // Change sizes[3] from 19.4 to 29.8
// Note earlier ++p and + 2 here => sizes[3]
Pointers to dynamically allocated memory
Sometimes you don't know how much memory you'll need until your program is running and sees what data is thrown at it... then you can dynamically allocate memory using malloc. It is common practice to store the address in a pointer...
int* p = (int*)malloc(sizeof(int)); // Get some memory somewhere...
*p = 10; // Dereference the pointer to the memory, then write a value in
fn(*p); // Call a function, passing it the value at address p
(*p) += 3; // Change the value, adding 3 to it
free(p); // Release the memory back to the heap allocation library
In C++, memory allocation is normally done with the new operator, and deallocation with delete:
int* p = new int(10); // Memory for one int with initial value 10
delete p;
p = new int[10]; // Memory for ten ints with unspecified initial value
delete[] p;
p = new int[10](); // Memory for ten ints that are value initialised (to 0)
delete[] p;
See also C++ smart pointers below.
Losing and leaking addresses
Often a pointer may be the only indication of where some data or buffer exists in memory. If ongoing use of that data/buffer is needed, or the ability to call free() or delete to avoid leaking the memory, then the programmer must operate on a copy of the pointer...
const char* p = asprintf("name: %s", name); // Common but non-Standard printf-on-heap
// Replace non-printable characters with underscores....
for (const char* q = p; *q; ++q)
if (!isprint(*q))
*q = '_';
printf("%s\n", p); // Only q was modified
free(p);
...or carefully orchestrate reversal of any changes...
const size_t n = ...;
p += n;
...
p -= n; // Restore earlier value...
free(p);
C++ smart pointers
In C++, it's best practice to use smart pointer objects to store and manage the pointers, automatically deallocating them when the smart pointers' destructors run. Since C++11 the Standard Library provides two, unique_ptr for when there's a single owner for an allocated object...
{
std::unique_ptr<T> p{new T(42, "meaning")};
call_a_function(p);
// The function above might throw, so delete here is unreliable, but...
} // p's destructor's guaranteed to run "here", calling delete
...and shared_ptr for share ownership (using reference counting)...
{
auto p = std::make_shared<T>(3.14, "pi");
number_storage1.may_add(p); // Might copy p into its container
number_storage2.may_add(p); // Might copy p into its container } // p's destructor will only delete the T if neither may_add copied it
Null pointers
In C, NULL and 0 - and additionally in C++ nullptr - can be used to indicate that a pointer doesn't currently hold the memory address of a variable, and shouldn't be dereferenced or used in pointer arithmetic. For example:
const char* p_filename = NULL; // Or "= 0", or "= nullptr" in C++
int c;
while ((c = getopt(argc, argv, "f:")) != -1)
switch (c) {
case f: p_filename = optarg; break;
}
if (p_filename) // Only NULL converts to false
... // Only get here if -f flag specified
In C and C++, just as inbuilt numeric types don't necessarily default to 0, nor bools to false, pointers are not always set to NULL. All these are set to 0/false/NULL when they're static variables or (C++ only) direct or indirect member variables of static objects or their bases, or undergo zero initialisation (e.g. new T(); and new T(x, y, z); perform zero-initialisation on T's members including pointers, whereas new T; does not).
Further, when you assign 0, NULL and nullptr to a pointer the bits in the pointer are not necessarily all reset: the pointer may not contain "0" at the hardware level, or refer to address 0 in your virtual address space. The compiler is allowed to store something else there if it has reason to, but whatever it does - if you come along and compare the pointer to 0, NULL, nullptr or another pointer that was assigned any of those, the comparison must work as expected. So, below the source code at the compiler level, "NULL" is potentially a bit "magical" in the C and C++ languages...
More about memory addresses, and why you probably don't need to know
More strictly, initialised pointers store a bit-pattern identifying either NULL or a (often virtual) memory address.
The simple case is where this is a numeric offset into the process's entire virtual address space; in more complex cases the pointer may be relative to some specific memory area, which the CPU may select based on CPU "segment" registers or some manner of segment id encoded in the bit-pattern, and/or looking in different places depending on the machine code instructions using the address.
For example, an int* properly initialised to point to an int variable might - after casting to a float* - access memory in "GPU" memory quite distinct from the memory where the int variable is, then once cast to and used as a function pointer it might point into further distinct memory holding machine opcodes for the program (with the numeric value of the int* effectively a random, invalid pointer within these other memory regions).
3GL programming languages like C and C++ tend to hide this complexity, such that:
If the compiler gives you a pointer to a variable or function, you can dereference it freely (as long as the variable's not destructed/deallocated meanwhile) and it's the compiler's problem whether e.g. a particular CPU segment register needs to be restored beforehand, or a distinct machine code instruction used
If you get a pointer to an element in an array, you can use pointer arithmetic to move anywhere else in the array, or even to form an address one-past-the-end of the array that's legal to compare with other pointers to elements in the array (or that have similarly been moved by pointer arithmetic to the same one-past-the-end value); again in C and C++, it's up to the compiler to ensure this "just works"
Specific OS functions, e.g. shared memory mapping, may give you pointers, and they'll "just work" within the range of addresses that makes sense for them
Attempts to move legal pointers beyond these boundaries, or to cast arbitrary numbers to pointers, or use pointers cast to unrelated types, typically have undefined behaviour, so should be avoided in higher level libraries and applications, but code for OSes, device drivers, etc. may need to rely on behaviour left undefined by the C or C++ Standard, that is nevertheless well defined by their specific implementation or hardware.

Dereferencing a pointer means getting the value that is stored in the memory location pointed by the pointer. The operator * is used to do this, and is called the dereferencing operator.
int a = 10;
int* ptr = &a;
printf("%d", *ptr); // With *ptr I'm dereferencing the pointer.
// Which means, I am asking the value pointed at by the pointer.
// ptr is pointing to the location in memory of the variable a.
// In a's location, we have 10. So, dereferencing gives this value.
// Since we have indirect control over a's location, we can modify its content using the pointer. This is an indirect way to access a.
*ptr = 20; // Now a's content is no longer 10, and has been modified to 20.

In simple words, dereferencing means accessing the value from a certain memory location against which that pointer is pointing.

A pointer is a "reference" to a value.. much like a library call number is a reference to a book. "Dereferencing" the call number is physically going through and retrieving that book.
int a=4 ;
int *pA = &a ;
printf( "The REFERENCE/call number for the variable `a` is %p\n", pA ) ;
// The * causes pA to DEREFERENCE... `a` via "callnumber" `pA`.
printf( "%d\n", *pA ) ; // prints 4..
If the book isn't there, the librarian starts shouting, shuts the library down, and a couple of people are set to investigate the cause of a person going to find a book that isn't there.

Code and explanation from Pointer Basics:
The dereference operation starts at
the pointer and follows its arrow over
to access its pointee. The goal may be
to look at the pointee state or to
change the pointee state. The
dereference operation on a pointer
only works if the pointer has a
pointee -- the pointee must be
allocated and the pointer must be set
to point to it. The most common error
in pointer code is forgetting to set
up the pointee. The most common
runtime crash because of that error in
the code is a failed dereference
operation. In Java the incorrect
dereference will be flagged politely
by the runtime system. In compiled
languages such as C, C++, and Pascal,
the incorrect dereference will
sometimes crash, and other times
corrupt memory in some subtle, random
way. Pointer bugs in compiled
languages can be difficult to track
down for this reason.
void main() {
int* x; // Allocate the pointer x
x = malloc(sizeof(int)); // Allocate an int pointee,
// and set x to point to it
*x = 42; // Dereference x to store 42 in its pointee
}

I think all the previous answers are wrong, as they
state that dereferencing means accessing the actual value.
Wikipedia gives the correct definition instead:
https://en.wikipedia.org/wiki/Dereference_operator
It operates on a pointer variable, and returns an l-value equivalent to the value at the pointer address. This is called "dereferencing" the pointer.
That said, we can dereference the pointer without ever
accessing the value it points to. For example:
char *p = NULL;
*p;
We dereferenced the NULL pointer without accessing its
value. Or we could do:
p1 = &(*p);
sz = sizeof(*p);
Again, dereferencing, but never accessing the value. Such code will NOT crash:
The crash happens when you actually access the data by an
invalid pointer. However, unfortunately, according the the
standard, dereferencing an invalid pointer is an undefined
behaviour (with a few exceptions), even if you don't try to
touch the actual data.
So in short: dereferencing the pointer means applying the
dereference operator to it. That operator just returns an
l-value for your future use.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js