I was looking over some C++ code and I ran into this memcpy call. I understand what memcpy does, but here they add a value to the source pointer. I tried looking up the source code for memcpy, but I can't work out what the addition is actually doing to the copy.
memcpy(Destination, SourceData + intSize, SourceDataSize);
In other words, I want to know what SourceData + intSize is doing. (I am trying to convert this to java.)
EDIT:
So here is my attempt at doing a memcpy function in java using a for loop...
for (int i = 0; i < SourceDataSize; i++) {
    Destination[i] = SourceData[i + 0x100];
}
It is the same thing as:
memcpy(&Destination[0], &SourceData[intSize], SourceDataSize);
This is basic pointer arithmetic. SourceData points to some data type, and adding n to it increases the address it's pointing to by n * sizeof(*SourceData).
For example, if SourceData is defined as:
uint32_t *SourceData;
and
sizeof(uint32_t) == 4
then adding 2 to SourceData would increase the address it holds by 8.
As an aside, if SourceData is defined as an array, then SourceData + n gives a pointer to the nth element of the array, since *(SourceData + n) is exactly SourceData[n]. It's easy enough to see for n==0; when n==1, you're at a memory address that's sizeof(*SourceData) bytes after the beginning of the array.
SourceData + intSize is skipping intSize * sizeof(source data type) bytes at the beginning of SourceData. Maybe SourceDataSize is stored there or something like that.
The closest equivalent to memcpy in Java that you're probably going to get is System.arraycopy, since Java doesn't really have pointers in the same sense.
The add will change the address used for the source of the memory copy.
The amount the address changes will depend on the type of SourceData.
(See http://www.learncpp.com/cpp-tutorial/68-pointers-arrays-and-pointer-arithmetic/)
It might be trying to copy a section of an array SourceData starting at offset intSize and of length SourceDataSize/sizeof(*SourceData).
EDIT
So, for example, if the array was of integers of size 4 bytes, then the equivalent java code would look like:
for (int i = 0; i < SourceDataSize/4; i++) {
    Destination[i] = SourceData[i + intSize];
}
Regarding doing this in Java:
Your loop
for (int i = 0; i < SourceDataSize; i++) {
    Destination[i] = SourceData[i + 0x100];
}
will always start copying data from 0x100 elements into SourceData; this may not be desired behavior. (For instance, when i=0, Destination[0] = SourceData[0 + 0x100]; and so forth.) This would be what you wanted if you never wanted to copy SourceData[0]..SourceData[0xFF], but note that hard-coding this prevents it from being a drop-in replacement for memcpy.
The reason the intSize value is specified in the original code is likely because the first intSize elements are not part of the 'actual' data, and those bytes are used for bookkeeping somehow (like a record of what the total size of the buffer is). memcpy itself doesn't 'see' the offset; it only knows the pointer it's starting with. SourceData + intSize creates a pointer that points intSize elements (intSize * sizeof(*SourceData) bytes) past the start of SourceData.
But, more importantly, what you are doing is likely to be extremely slow. memcpy is a very heavily optimized function that maps to carefully tuned assembly on most architectures, and replacing it with a simple element-by-element loop will dramatically change the performance characteristics of the code. While what you are doing is appropriate if you are trying to understand how memcpy and pointers work, note that if you are porting existing code to Java for actual use, you will likely want an equivalent Java facility such as System.arraycopy or java.util.Arrays.copyOfRange (which takes a starting offset directly).
Related
I have a linear int array arr in CUDA global memory. I want to set sub-arrays of arr to defined values. The sub-array start indexes are given by the starts array, while the length of each sub-array is given in the counts array.
What I want to do is set the value of sub-array i, starting at starts[i] and continuing for counts[i] elements, to the value starts[i]. That is, the operation is:
arr[starts[i]: starts[i]+counts[i]] = starts[i]
I thought of using memset() in the kernel to set the values. However, it is not being written correctly (the array elements are being assigned some random values). The code I am using is:
#include <stdlib.h>

__global__ void kern(int* starts, int* counts, int* arr, int* numels)
{
    unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx >= numels[0])
        return;

    const int val = starts[idx];
    memset(&arr[val], val, sizeof(arr[0]) * counts[idx]);
    __syncthreads();
}
Please note that numels[0] contains the number of elements in the starts array.
I have checked the code with cuda-memcheck but didn't get any errors. I am using PyCUDA, if it's relevant. I am probably misunderstanding the usage of memset here, as I am learning CUDA.
Can you please suggest a way to correct this, or another efficient way of doing this operation?
P.S: I know that thrust::fill() can probably do this well, but since I am learning CUDA, I would like to know how to do this without using external libraries.
The memset and memcpy implementations in CUDA device code emit simple, serial, byte-wise operations (and note that memset can only set byte values, which is likely contributing to the problem you see if the values you are trying to set don't fit in 8 bits).
You could replace the memset call with something like this:
const int val = starts[idx];
//memset(&arr[val], val, sizeof(arr[0])*counts[idx]);
for (int i = 0; i < counts[idx]; i++)
    arr[val + i] = val;
The performance of that code will probably be better than the built-in memset.
Note also that the __syncthreads() call at the end of your kernel is both unnecessary and a potential source of deadlock, and should be removed. See here for more information.
How close to the maximum value can a valid pointer be (as a global, allocated on the stack, malloc, new, VirtualAlloc, or any other alloc method a program/library might use), such that ptr + n risks overflowing?
I come across a lot of code that adds values to pointers when dealing with strings/arrays (in C++ sometimes also in a generic "random access iterator" template function).
e.g.
auto end = arr_ptr + len; //or just whatever some_container.end() returns
for (auto i = begin; i < end; ++i) { ... }
for (auto i = begin; i + 2 <= end; i += 2) { ...i[0]...i[1]... }
if (arr_ptr + 4 <= end && memcmp(arr_ptr, "test", 4) == 0) { ... }
if (arr_ptr + count > end) resize(...);
Would it be valid for the last array element to end on 0xFFFFFFFF (assuming 32bit), such that end == 0? If not, how close can it be?
I think always using p != end (and only ever adding 1), or taking the length as len = end - begin and then working with that (e.g. (end - begin) >= 4), is always safe, but I'm wondering whether this is actually an issue to look out for, and to audit and change existing code for.
The standard doesn't talk about pointer overflow, it talks about what pointer values can legitimately be formed by pointer arithmetic. Simply put, the legitimate range is pointers into your object/array plus a one-past-the-end pointer.
Then, it is the responsibility of the C or C++ implementation not to create any objects in locations where some implementation-specific danger like pointer overflow prevents those legitimate pointer values from working correctly.
So neither malloc etc, nor the stack (presuming you haven't exceeded any stack bounds) will give you an array of char, starting at an address to which you cannot (due to overflow) add the size of the array.
how close can it be?
As close as allows all the required pointer values to work correctly. So on this 32-bit system, a 1-byte object starting at 0xFFFFFFFE would be the maximum possible address. The standard doesn't permit you to add 2 to that address, so it "doesn't matter" that doing so would overflow, so far as the implementation is concerned. For a 2-byte object the maximum would be 0xFFFFFFFD if the type has no alignment requirement, but that's an odd address, so 0xFFFFFFFC if it requires 2-byte alignment.
Of course, other implementation details might dictate a lower limit. For example, it's not unusual for a system to reserve a page of memory either side of 0 and make it inaccessible. This helps catch errors where someone has accessed a null pointer with a small offset. Granted, this is more likely to happen with positive offsets than negative, but still. If your 32-bit system decided to do that, then malloc would need to take account of it, and would never return 0xFFFFFFFE.
I am an expert C# programmer, but I am very new to C++. I get the basic idea of pointers just fine, but I was playing around. You can get the actual integer value of a pointer by casting it as an int:
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
Which makes sense; it's a memory address. But I can move to the next pointer, and cast it as an int:
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
int* jptr = (int*)((int)iptr + 1);
int j = (int)*jptr;
and I get a seemingly random number (although this would not make a good PRNG). What is this number? Is it another number used by the same process? Is it possibly from a different process? Is this bad practice, or disallowed? And if not, is there a use for this? It's kind of cool.
What is this number? Is it another number used by the same process? Is it possibly from a different process?
You cannot generally cast pointers to integers and back and expect them to be dereferencable. Integers are numbers. Pointers are pointers. They are totally different abstractions and are not compatible.
If integers are not large enough to be able to store the internal representation of pointers (which is likely the case; integers are usually 32 bits long and pointers are usually 64 bits long), or if you modify the integer before casting it back to a pointer, your program exhibits undefined behaviour and as such anything can happen.
See C++: Is it safe to cast pointer to int and later back to pointer again?
Is this bad practice, or disallowed?
Disallowed? Nah.
Bad practice? Terrible practice.
You move one byte past iptr (the +1 happens on the integer value, not through pointer arithmetic) and print out whatever is there, which might be another number stored in your program's address space. The value is unknown and this is undefined behavior. There is also a good chance that you might get an error, meaning your program can blow up (ever heard of SIGSEGV, the segmentation violation signal?).
You are discovering that random places in memory contain "unknown" data. Not only that, but you may find yourself pointing to memory that your process does not have "rights" to so that even the act of reading the contents of an address can cause a segmentation fault.
In general, if you allocate some memory to a pointer (for example with malloc), you may look at those locations (which may contain leftover data "from the last time") and modify them. But data that does not belong to the pointer's block of memory gives you all kinds of undefined behavior.
Incidentally, if you want to look at the "next" location, just do
NextValue = *(iptr + 1);
Don't do any casting; pointer arithmetic knows (in your case) exactly what the above means: "the contents of the next int-sized location".
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
int* jptr = (int*)((int)iptr + 1);
int j = (int)*jptr;
You can cast an int to a pointer and back again, and it will give you the same value.
Is it possibly from a different process? No, it's not; you can't access the memory of another process except by using ReadProcessMemory and WriteProcessMemory under the Win32 API.
You get a different number because you added 1 to the pointer's value; subtract 1 and you will get the same value back.
When you define an integer by
int i = 5;
you allocate a space in your thread's stack and initialize it to 5. Then you take a pointer to this memory, which is actually a position in your current thread's stack.
When you increase the integer value of the pointer by 1, it points one byte further into your thread's stack, and you parse that location as an integer again:
int* jptr = (int*)((int)iptr + 1);
int j = (int)*jptr;
You then get an integer from your thread's stack which is close to where you defined your int i.
Of course this is not recommended, unless you want to become a hacker and exploit stack overflows (here it means what it says, not the site name, ha!).
Using a pointer to point to a random address is very dangerous. You must not point to an address unless you know what you're doing. You could overwrite its content or you may try to modify a constant in read-only memory which leads to an undefined behaviour...
This is useful, for example, when you want to walk through the elements of an array; there is no need to cast the pointer to an integer. You just point to the start of the array and increase your pointer by 1 to get the next element:
int arr[5] = {1, 2, 3, 4, 5};
int *p = arr;
printf("%d", *p); // this will print 1
p++; // pointer arithmetics
printf("%d", *p); // this will print 2
It's not "random". It just means that there is some data at the next address.
Reading a 32-bit word from an address A will copy the 4 bytes at [A], [A+1], [A+2], [A+3] into a register. But if you dereference an int at [A+1] then the CPU will load the bytes from [A+1] to [A+4]. Since the value of [A+4] is unknown it may make you think that the number is "random"
Anyway this is EXTREMELY dangerous 💀 since
the pointer is misaligned. You may see the program runs fine because x86 allows for unaligned accesses (with some performance penalty). But most other architectures prohibit unaligned operations and your program will just end in segmentation fault. For more information read Purpose of memory alignment, Data Alignment: Reason for restriction on memory address being multiple of data type size
you may not be allowed to touch the next byte as it may be outside of your address space, is write-only, is used for another variable and you changed its value, or whatever other reasons. You'll also get a segfault in that case
the next byte may not be initialized and reading it will crash your application on some architectures
That's why the C and C++ standard state that reading memory outside an array invokes undefined behavior. See
How dangerous is it to access an array out of bounds?
Access array beyond the limit in C and C++
Is accessing a global array outside its bound undefined behavior?
I've seen very often array iterations using plain pointer arithmetic even in newer C++ code. I wonder how safe they really are and if it's a good idea to use them. Consider this snippet (it compiles also in C if you put calloc in place of new):
int8_t *buffer = new int8_t[16];
for (int8_t *p = buffer; p < buffer + 16; p++) {
...
}
Wouldn't this kind of iteration result in an overflow and the loop being skipped completely when buffer happens to become allocated at address 0xFFFFFFF0 (in a 32 bit address space) or 0xFFFFFFFFFFFFFFF0 (64 bit)?
As far as I know, this would be an exceptionally unlucky, but still possible circumstance.
This is safe. The C and C++ standards explicitly allow you to calculate a pointer value that points one item beyond the end of an array, and to compare a pointer that points within the array to that value.
An implementation that had an overflow problem in the situation you describe would simply not be allowed to place an array right at the end of memory like that.
In practice, a more likely problem is buffer + 16 comparing equal to NULL, but this is not allowed either and again a conforming implementation would need to leave an empty place following the end of the array.
I know that pointers store the address of the value that they point to, but if you display the value of a pointer directly to the screen, you get a hexadecimal number. If the number is exactly what the pointer stores, then when saying
pA = pB; //both are pointers
you're copying the address. Then wouldn't there be a bigger overhead to using pointers when working with very small items like ints and bools?
A pointer is essentially just a number. It stores the address in RAM where the data is. The pointer itself is pretty small (probably the same size as an int on 32 bit architectures, long on 64 bit).
You are correct though that an int * would not save any space when working with ints. But that is not the point (no pun intended). Pointers are there so you can have references to things, not just use the things themselves.
Memory addresses.
That is the locations in memory where other stuff is.
Pointers are generally the word size of the processor, so they can generally be moved around in a single instruction cycle. In short, they are fast.
As others have said, a pointer stores a memory address which is "just a number", but that is an abstraction. Depending on the processor architecture it may be more than one number, for instance a base and an offset that must be added to dereference the pointer. In this case the overhead is slightly higher than if the address were a single number.
Yes, there is overhead in accessing an int or a bool via a pointer vs. directly, where the processor can put the variable in a register. Pointers are usually used where the value of the indirection outweighs any overhead, i.e. traversing an array.
I've been referring to time overhead; I'm not sure whether the OP was more concerned with space or time overhead.
The number refers to its address in memory. The size of a pointer is typically the native size of the computer's architecture so there is no additional overhead compared to any other primitive type.
On some architectures there is additional overhead for pointers to characters, because the architecture only supports addressing words (32- or 64-bit values). A pointer to a character is therefore stored as a word address plus an offset of the character within that word. Dereferencing the pointer involves fetching the word and then shifting and masking its value to extract the character.
Let me start from the basics. First of all, you will have to know what variable are and how they are used.
Variables are basically memory locations (usually containing some values), and we use some identifier (i.e., a variable name) to refer to that memory location and use the value present at that location.
For understanding it better, suppose we want the information from memory cells present at some location relative to the current variable. Can we use the identifier to extract information from nearby cells?
No. Because the identifier (variable name) will only give the value contained in that particular cell.
But, If somehow we can get the memory address at which this variable is present then we can easily move to nearby locations and use their information as well(at runtime).
This is where pointers come into play. They are used to store the location of that variable so that we can use the additional address information whenever required.
Syntax: To store the address of a variable we can simply use & (address-of) operator.
foo = &bar
Here foo stores the address of variable bar.
Now, what if we want to know the value present at that address?
For that, we can simply use the * (dereference) operator.
value = *foo
Now that we have to store the address of a variable, we'll be needing the memory the same way as we need in case of a variable. This means pointers are also stored in the memory the same way as other variables, so just like in case of variables, we can also store the address of a pointer into yet another pointer.
An address in memory. Points to somewhere! :-)
Yes, you're right, both in terms of speed and memory.
Pointers almost always take up more bytes than your standard int and, especially, bool and char data types. On modern machines pointers typically are 8 bytes while char is almost always just 1 byte.
In this example, accessing the char and bool from Foo requires more machine instructions than accessing them from Bar:
struct Foo
{
    char* c; // single character
    bool* b; // single bool
};

struct Bar
{
    char c;
    bool b;
};
... And if we decide to make some arrays, then the arrays of Foo would be 8 times larger, and the data is more spread out, which means you'll end up having a lot more cache misses.
#include <vector>

int main()
{
    int size = 1000000;
    std::vector<Foo> foo(size);
    std::vector<Bar> bar(size);
    return 0;
}
As dmckee pointed out, a single copy of a one-byte bool and a single copy of a pointer are just as fast:
bool num1, num2, *p1, *p2;
num1 = num2; // this takes one clock cycle
p1 = p2;     // this takes another
As dmckee said, this is true when you're using a 64-bit architecture.
However, copying arrays of ints, bools and chars can be much faster, because we can squeeze several of them into each register:
#include <cstdint>

int main()
{
    const int n_elements = 100000 * sizeof(int64_t);
    static bool A[n_elements];     // static: too large to put on the stack
    static bool B[n_elements];
    int64_t* A_fast = (int64_t*)A; // note: this cast violates strict aliasing;
    int64_t* B_fast = (int64_t*)B; // it is shown only to illustrate the idea
    const int n_quick_elements = n_elements / sizeof(int64_t);
    for (int i = 0; i < 10000; ++i)
        for (int j = 0; j < n_quick_elements; ++j)
            A_fast[j] = B_fast[j];
    return 0;
}
The STL containers and other good libraries do this sort of thing for us, using type_traits (is_trivially_copyable) and std::memcpy. Using pointers under the false assumption that they're always just as fast can prevent those libraries from optimising.
Conclusion: It may seem obvious with these examples, but only use pointers/references on basic data types when you need to take/give access to the original object.