I have few questions:
1) why when I created more than two dynamic allocated variables the difference between their memory address is 16 bytes. (I thought one of the advantages of using dynamic variables is saving memory, so when you delete unused variable it will free that memory); but if the difference between two dynamic variables is 16 bytes even using a short integer, then there a lot of memery that I will not benifit .
2) creating a dynamic allocated variable using new operator.
int x;
cin >> x;
int* a = new int(3);
int y = 4;
int z = 1;
In the e.g above. what is the flow of execution of this program. is it gonna store all variable likes x,a,y and z in the stack and then will store the value 3 in the address that a points to?
3) creating a dynamic alloated array.
int x;
cin >> x;
int* array = new int[x];
int y = 4;
int z = 1;
and the same question here.
4) does the size of the heap(free scope) depend on how much of memory im using in the code area,the stack are, and the global area ?
Storing small values like integers on the heap is fairly pointless because you use the same or more memory to store the pointer. The 16 byte alignment is just so the CPU can access the memory as efficiently as possible.
Yes, although the stack variables might be allocated to registers; that is up to the compiler.
Same as 2.
The size of the heap is controlled by the operating system and expanded as necessary as you allocate more memory.
Yes, in the examples, a and array are both "stack" variables. The data they point to is not.
I put stack in quotes because we are not going to concern ourselves with hardware detail here, but just the semantics. They have the semantics of stack variables.
The chunks of heap memory which you allocate need to store some housekeeping data so that the allocator (the code which works in behind of new) could work. The data usually includes chunk length and the address of next allocated chunk, among other things — depending on the actual allocator.
In your case, the service data are stored directly in front of (and, maybe, behind of, too) the actual allocated chunk. This (plus, likely, alignment) is the reason of 16 byte gap you observe.
Related
I am now reading the source code of OPENCV, a computer vision open source library. I am confused with this function:
#define CV_MALLOC_ALIGN 16
void* fastMalloc( size_t size )
{
uchar* udata = (uchar*)malloc(size + sizeof(void*) + CV_MALLOC_ALIGN);
if(!udata)
return OutOfMemoryError(size);
uchar** adata = alignPtr((uchar**)udata + 1, CV_MALLOC_ALIGN);
adata[-1] = udata;
return adata;
}
/*!
Aligns pointer by the certain number of bytes
This small inline function aligns the pointer by the certian number of bytes by
shifting it forward by 0 or a positive offset.
*/
template<typename _Tp> static inline _Tp* alignPtr(_Tp* ptr, int n=(int)sizeof(_Tp))
{
return (_Tp*)(((size_t)ptr + n-1) & -n);
}
fastMalloc is used to allocated memory for a pointer, which invoke malloc function and then alignPtr. I cannot understand well why alignPtr is called after memory is allocated? My basic understanding is by doing so it is much faster for the machine to find the pointer. Can some references on this issue be found in the internet? For modern computer, is it still necessary to perform this operation? Any ideas will be appreciated.
Some platforms require certain types of data to appear on certain byte boundaries (e.g:- some compilers
require pointers to be stored on 4-byte boundaries).
This is called alignment, and it calls for extra padding within, and possibly at the end of, the object's data.
Compiler might break in case they didn't find proper alignment OR there could be performance bottleneck in reading that data ( as there would be a need to read two blocks for getting same data).
EDITED IN RESPONSE TO COMMENT:-
Memory request by a program is generally handled by memory allocator. One such memory allocator is fixed-size allocator. Fixed size allocation return chunks of specified size even if requested memory is less than that particular size. So, with that background let me try to explain what's going on here:-
uchar* udata = (uchar*)malloc(size + sizeof(void*) + CV_MALLOC_ALIGN);
This would allocate amount of memory which is equal to memory_requested + random_size. Here random_size is filling up the gap to make it fit for size specified for fixed allocation scheme.
uchar** adata = alignPtr((uchar**)udata + 1, CV_MALLOC_ALIGN);
This is trying to align pointer to specific boundary as explained above.
It allocates a block a bit bigger than it was asked for.
Then it sets adata to the address of the next properly allocated byte (add one byte, then round up to the next properly aligned address).
Then it stores the original pointer before the new address. I assume this is later used to free the originally allocated block.
And then we return the new address.
This only makes sense if CV_MALLOC_ALIGN is a stricter alignment than malloc guarantees - perhaps a cache line?
I have two ways of constructing a 2D array:
int arr[NUM_ROWS][NUM_COLS];
//...
tmp = arr[i][j]
and flattened array
int arr[NUM_ROWS*NUM_COLS];
//...
tmp = arr[i*NuM_COLS+j];
I am doing image processing so even a little improvement in access time is necessary. Which one is faster? I am thinking the first one since the second one needs calculation, but then the first one requires two addressing so I am not sure.
I don't think there is any performance difference. System will allocate same amount of contiguous memory in both cases. For calculate i*Numcols+j, either you would do it for 1D array declaration, or system would do it in 2D case. Only concern is ease of usage.
You should have trust into the capabilities of your compiler in optimizing standard code.
Also you should have trust into modern CPUs having fast numeric multiplication instructions.
Don't bother to use one or another!
I - decades ago - optimized some code greatly by using pointers instead of using 2d-array-calculation --> but this will a) only be useful if it is an option to store the pointer - e.g. in a loop and b) have low impact since i guess modern cpus should do 2d array access in a single cycle? Worth measuring! May be related to the array size.
In any case pointers using ptr++ or ptr += NuM_COLS will for sure be a little bit faster if applicable!
The first method will almost always be faster. IN GENERAL (because there are always corner cases) processor and memory architecture as well as compilers may have optimizations built in to aid with 2d arrays or other similar data structures. For example, GPUs are optimized for matrix (2d array) math.
So, again in general, I would allow the compiler and hardware to optimize your memory and address arithmetic if possible.
...also I agree with #Paul R, there are much bigger considerations when it comes to performance than your array allocation and address arithmetic.
There are two cases to consider: compile time definition and run-time definition of the array size. There is big difference in performance.
Static allocation, global or file scope, fixed size array:
The compiler knows the size of the array and tells the linker to allocate space in the data / memory section. This is the fastest method.
Example:
#define ROWS 5
#define COLUMNS 6
int array[ROWS][COLUMNS];
int buffer[ROWS * COLUMNS];
Run time allocation, function local scope, fixed size array:
The compiler knows the size of the array, and tells the code to allocate space in the local memory (a.k.a. stack) for the array. In general, this means adding a value to a stack register. Usually one or two instructions.
Example:
void my_function(void)
{
unsigned short my_array[ROWS][COLUMNS];
unsigned short buffer[ROWS * COLUMNS];
}
Run Time allocation, dynamic memory, fixed size array:
Again, the compiler has already calculated the amount of memory required for the array since it was declared with fixed size. The compiler emits code to call the memory allocation function with the required amount (usually passed as a parameter). A little slower because of the function call and the overhead required to find some dynamic memory (and maybe garbage collection).
Example:
void another_function(void)
{
unsigned char * array = new char [ROWS * COLS];
//...
delete[] array;
}
Run Time allocation, dynamic memory, variable size:
Regardless of the dimensions of the array, the compiler must emit code to calculate the amount of memory to allocate. This quantity is then passed to the memory allocation function. A little slower than above because of the code required to calculate the size.
Example:
int * create_board(unsigned int rows, unsigned int columns)
{
int * board = new int [rows * cols];
return board;
}
Since your goal is image processing then I would assume your images are too large for static arrays. The correct question you should be about dynamically allocated arrays
In C/C++ there are multiple ways you can allocate a dynamic 2D array How do I work with dynamic multi-dimensional arrays in C?. To make this work in both C/C++ we can use malloc with casting (for C++ only you can use new)
Method 1:
int** arr1 = (int**)malloc(NUM_ROWS * sizeof(int*));
for(int i=0; i<NUM_ROWS; i++)
arr[i] = (int*)malloc(NUM_COLS * sizeof(int));
Method 2:
int** arr2 = (int**)malloc(NUM_ROWS * sizeof(int*));
int* arrflat = (int*)malloc(NUM_ROWS * NUM_COLS * sizeof(int));
for (int i = 0; i < dimension1_max; i++)
arr2[i] = arrflat + (i*NUM_COLS);
Method 2 essentially creates a contiguous 2D array: i.e. arrflat[NUM_COLS*i+j] and arr2[i][j] should have identical performance. However, arrflat[NUM_COLS*i+j] and arr[i][j] from method 1 should not be expected to have identical performance since arr1 is not contiguous. Method 1, however, seems to be the method that is most commonly used for dynamic arrays.
In general, I use arrflat[NUM_COLS*i+j] so I don't have to think of how to allocated dynamic 2D arrays.
I am an expert C# programmer, but I am very new to C++. I get the basic idea of pointers just fine, but I was playing around. You can get the actual integer value of a pointer by casting it as an int:
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
Which makes sense; it's a memory address. But I can move to the next pointer, and cast it as an int:
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
int* jptr = (int*)((int)iptr + 1);
int j = (int)*iptr;
and I get a seemingly random number (although this is not a good PSRG). What is this number? Is it another number used by the same process? Is it possibly from a different process? Is this bad practice, or disallowed? And if not, is there a use for this? It's kind of cool.
What is this number? Is it another number used by the same process? Is it possibly from a different process?
You cannot generally cast pointers to integers and back and expect them to be dereferencable. Integers are numbers. Pointers are pointers. They are totally different abstractions and are not compatible.
If integers are not large enough to be able to store the internal representation of pointers (which is likely the case; integers are usually 32 bits long and pointers are usually 64 bits long), or if you modify the integer before casting it back to a pointer, your program exhibits undefined behaviour and as such anything can happen.
See C++: Is it safe to cast pointer to int and later back to pointer again?
Is this bad practice, or disallowed?
Disallowed? Nah.
Bad practice? Terrible practice.
You move beyond i pointer by 4 or 8 bytes and print out the number, which might be another number stored in your program space. The value is unknown and this is Undefined Behavior. Also there is a good chance that you might get an error (that means your program can blow up) [Ever heard of SIGSEGV? The Segmentation violation problem]
You are discovering that random places in memory contain "unknown" data. Not only that, but you may find yourself pointing to memory that your process does not have "rights" to so that even the act of reading the contents of an address can cause a segmentation fault.
In general is you allocate some memory to a pointer (for example with malloc) you may take a look at these locations (which may have random data "from the last time" in them) and modify them. But data that does not belong explicitly to a pointer's block of memory can behave all kings of undefined behavior.
Incidentally if you want to look at the "next" location just to
NextValue = *(iptr + 1);
Don't do any casting - pointer arithmetic knows (in your case) exactly what the above means : " the contents of the next I refer location".
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
int* jptr = (int*)((int)iptr + 1);
int j = (int)*iptr;
You can cast int to pointer and back again, and it will give you same value
Is it possibly from a different process? no it's not, and you can't access memory of other process except using readProcessMemmory and writeProcessMemory under win32 api.
You get other number because you add 1 to the pointer, try to subtract 1 and you will same value.
When you define an integer by
int i = 5;
it means you allocate a space in your thread stack, and initialize it as 5. Then you get a pointer to this memory, which is actually a position in you current thread stack
When you increase your pointer by 1, it means you point to the next location in your thread stack, and you parse it again as an integer,
int* jptr = (int*)((int)iptr + 1);
int j = (int)*jptr;
Then you will get an integer from you thread stack which is close to where you defined your int i.
Of course this is not suggested to do, unless you want to become an hacker and want to exploit stack overflow (here it means what it is, not the site name, ha!)
Using a pointer to point to a random address is very dangerous. You must not point to an address unless you know what you're doing. You could overwrite its content or you may try to modify a constant in read-only memory which leads to an undefined behaviour...
This for example when you want to retrieve the elements of an array. But cannot cast a pointer to integer. You just point to the start of the array and increase your pointer by 1 to get the next element.
int arr[5] = {1, 2, 3, 4, 5};
int *p = arr;
printf("%d", *p); // this will print 1
p++; // pointer arithmetics
printf("%d", *p); // this will print 2
It's not "random". It just means that there are some data on the next address
Reading a 32-bit word from an address A will copy the 4 bytes at [A], [A+1], [A+2], [A+3] into a register. But if you dereference an int at [A+1] then the CPU will load the bytes from [A+1] to [A+4]. Since the value of [A+4] is unknown it may make you think that the number is "random"
Anyway this is EXTREMELY dangerous 💀 since
the pointer is misaligned. You may see the program runs fine because x86 allows for unaligned accesses (with some performance penalty). But most other architectures prohibit unaligned operations and your program will just end in segmentation fault. For more information read Purpose of memory alignment, Data Alignment: Reason for restriction on memory address being multiple of data type size
you may not be allowed to touch the next byte as it may be outside of your address space, is write-only, is used for another variable and you changed its value, or whatever other reasons. You'll also get a segfault in that case
the next byte may not be initialized and reading it will crash your application on some architectures
That's why the C and C++ standard state that reading memory outside an array invokes undefined behavior. See
How dangerous is it to access an array out of bounds?
Access array beyond the limit in C and C++
Is accessing a global array outside its bound undefined behavior?
I want to share some memory between different processes running a DLL. Therefore i create a memory-mapped-file by HANDLE hSharedFile = CreateFileMapping(...) then LPBYTE hSharedView = MapViewOfFile(...) and LPBYTE aux = hSharedView
Now I want to read a bool, a int, a float and a char from the aux array. Reading a bool and char is easy. But how would I go around reading a int or float? Notice that the int or float could start at position 9 e.g. a position that is not dividable by 4.
I know you can read a char[4] and then memcpy it into a float or int. But i really need this to be very fast. I am wondering if it is possible to do something with pointers?
Thanks in advance
If you know, for instance, that array elements aux[13..16] contain a float, then you can access this float in several ways:
float f = *(float*)&aux[13] ; // Makes a copy. The simplest solution.
float* pf = (float*)&aux[13] ; // Here you have to use *pf to access the float.
float& rf = *(float*)&aux[13] ; // Doesn't make a copy, and is probably what you want.
// (Just use rf to access the float.)
There is nothing wrong with grabbing an int at offset 9:
int* intptr = (int*) &data[9];
int mynumber = *intptr;
There might be a really tiny performance penalty for this "unaligned" access, but it will still work correctly, and the chances of you noticing any differences are slim.
First of all, I think you should measure. There are three options you can go with that I can think of:
with unaligned memory
with memcpy into buffers
with custom-aligned memory
Unaligned memory will work fine, it will just be slower than aligned. How slower is that, and does it matter to you? Measure to find out.
Copying into a buffer will trade off the slower unaligned accesses for additional copy operations. Measuring will tell you if it's worth it.
If using unaligned memory is too slow for you and you don't want to copy data around (perhaps because of the performance cost), then you can possibly do faster by wasting some memory space and increasing your program complexity. Don't use the mapped memory blindly: round your "base" pointer upwards to a suitable value (e.g. 8 bytes) and only do reads/writes at 8-byte increments of this "base" value. This will ensure that all your accesses will be aligned.
But do measure before you go into all this trouble.
I know that pointers store the address of the value that they point to, but if you display the value of a pointer directly to the screen, you get a hexadecimal number. If the number is exactly what the pointer stores, then when saying
pA = pB; //both are pointers
you're copying the address. Then wouldn't there be a bigger overhead to using pointers when working with very small items like ints and bools?
A pointer is essentially just a number. It stores the address in RAM where the data is. The pointer itself is pretty small (probably the same size as an int on 32 bit architectures, long on 64 bit).
You are correct though that an int * would not save any space when working with ints. But that is not the point (no pun intended). Pointers are there so you can have references to things, not just use the things themselves.
Memory addresses.
That is the locations in memory where other stuff is.
Pointers are generally the word size of the processor, so they can generally be moved around in a single instruction cycle. In short, they are fast.
As others have said, a pointer stores a memory address which is "just a number' but that is an abstraction. Depending on processor architecture it may be more than one number, for instance a base and offset that must be added to dereference the pointer. In this case the overhead is slightly higher than if the address is a single number.
Yes, there is overhead in accessing an int or a bool via a pointer vs. directly, where the processor can put the variable in a register. Pointers are usually used where the value of the indirection outweighs any overhead, i.e. traversing an array.
I've been referring to time overhead. Not sure if OP was more concerned space or time overhead.
The number refers to its address in memory. The size of a pointer is typically the native size of the computer's architecture so there is no additional overhead compared to any other primitive type.
On some architectures there is an additional overhead of pointers to characters because the architecture only supports addressing words (32- or 64-bit values). A pointer to a character is therefore stored as a word address and an offset of the character within that word. De-referencing the pointer involves fetching the word and then shifting and masking it's value to extract the character.
Let me start from the basics. First of all, you will have to know what variable are and how they are used.
Variables are basically memory locations(usually containing some values) and we use some identifier(i.e., variable names) to refer to that memory location and use the value present at that location.
For understanding it better, suppose we want the information from memory cells present at some location relative to the current variable. Can we use the identifier to extract information from nearby cells?
No. Because the identifier(variable name) will only give the value contained in that particular cell.
But, If somehow we can get the memory address at which this variable is present then we can easily move to nearby locations and use their information as well(at runtime).
This is where pointers come into play. They are used to store the location of that variable so that we can use the additional address information whenever required.
Syntax: To store the address of a variable we can simply use & (address-of) operator.
foo = &bar
Here foo stores the address of variable bar.
Now, what if we want to know the value present at that address?
For that, we can simply use the * (dereference) operator.
value = *foo
Now that we have to store the address of a variable, we'll be needing the memory the same way as we need in case of a variable. This means pointers are also stored in the memory the same way as other variables, so just like in case of variables, we can also store the address of a pointer into yet another pointer.
An address in memory. Points to somewhere! :-)
Yes, you're right, both in terms of speed and memory.
Pointers almost always take up more bytes than your standard int and, especially, bool and char data types. On modern machines pointers typically are 8 bytes while char is almost always just 1 byte.
In this example, accessing the the char and bool from Foo requires more machine instructions than accessing from Bar:
struct Foo
{
char * c; // single character
bool * b; // single bool
};
struct Bar
{
char c;
bool b;
};
... And if we decide to make some arrays, then the size of the arrays of Foo would be 8 times larger - and the code is more spread-apart so this means you'll end up having a lot more cache misses.
#include <vector>
int main()
{
int size = 1000000;
std::vector<Foo> foo(size);
std::vector<Bar> bar(size);
return 0;
}
As dmckee pointed out, a single copy of a one-byte bool and a single copy of a pointer are just as fast:
bool num1, num2,* p1, * p2;
num1 = num2; // this takes one clock cycle
p1 = p2; // this takes another
As dmckee said, this is true when you're using a 64-bit architecture.
However, copying of arrays of ints, bools and chars can be much faster, because we can squeeze multiples of them onto each register:
#include <iostream>
int main ()
{
const int n_elements = 100000 * sizeof(int64_t);
bool A[n_elements];
bool B[n_elements];
int64_t * A_fast = (int64_t *) A;
int64_t * B_fast = (int64_t *) B;
const int n_quick_elements = n_elements / sizeof(int64_t);
for (int i = 0; i < 10000; ++i)
for (int j = 0; j < n_quick_elements; ++j)
A_fast[j] = B_fast[j];
return 0;
}
The STL containers and other good libraries do this sort of thing for us, using type_traits (is_trivially_copyable) and std::memcopy. Using pointers under the false guise that they're always just as fast can prevent those libraries from optimising.
Conclusion: It may seem obvious with these examples, but only use pointers/references on basic data types when you need to take/give access to the original object.