c++ void* memory traversal - c++

I'm trying to store a couple of ints in memory using void* & then retrieve them but it keeps throwing "pointer of type ‘void *’ used in arithmetic" warning.
void *a = new char[4];
memset(a, 0 , 4);
unsigned short d = 7;
memcpy(a, (void *)&d, 2);
d=8;
memcpy(a+2, (void *)&d, 2); //pointer of type ‘void *’ used in arithmetic
/*Retrieving*/
unsigned int *data = new unsigned int();
memcpy(data, a, 2);
cout << (unsigned int)(*data);
memcpy(data, a+2, 2); //pointer of type ‘void *’ used in arithmetic
cout << (unsigned int)(*data);
The results are as per expectation but I fear that these warnings might turn into errors on some compiler. Is there another way to do this that I'm not aware of?
I know this is perhaps a bad practice in normal scenario but the problem statement requires that unsigned integers be stored and sent in 2 byte packets. Please correct me if I'm wrong but as per my understanding, using a char* instead of a void* would have taken up 3 bytes for 3-digit numbers.

a+2, with a being a pointer, means that the pointer is increased to allow space for two items of the pointer type. V.g., if a was int32 *, a + 2 would mean "a position plus 8 bytes".
Since void * has no type, it can only try to guess what do you mean by a + 2, since it does not know the size of the type being referred.

The problem is that the compiler doesn't know what to do with
a+2
This instruction means "Move pointer 'a' forward by 2 * (sizeof-what-is-pointed-to-by-'a')".
If a is void *, the compiler doesn't know the size of the target object (there isn't one!), so it gives an error.
You need to do:
memcpy(data, ((char *)a)+2, 2);
This way, the compiler knows how to add 2 - it knows the sizeof(char).

Please correct me if I'm wrong but as per my understanding, using a char* instead of a void* would have taken up 3 bytes for 3-digit numbers.
Yes, you are wrong, that would be the case if you were transmitting the numbers as chars. 'char*' is just a convenient way of referring to 8-bit values - and since you are receiving pairs of bytes, you could treat the destination memory are char's to do simple math. But it is fairly common for people to use 'char' arrays for network data streams.
I prefer to use something like BYTE or uint8_t to indicate clearly 'I'm working with bytes' as opposed to char or other values.
void* is a pointer to an unknown, more importantly, 0 sized type (void). Because the size is zero, offset math is going to result in zeros, so the compiler tells you it's invalid.
It is possible that your solution could be as simple as to receive the bytes from the network into a byte-based array. An int is 32 bits, 4 bytes. They're not "char" values, but quads of a 4-byte integer.
#include <cstdint>
uint8_t buffer[4];
Once you know you've filled the buffer, you can simply say
uint32_t integer = *(static_cast<uint32*>(buffer));
Whether this is correct will depend on whether the bytes are in network or host order. I'm guessing you'll probably need:
uint32_t integer = ntohl(*(static_cast<uint32*>(buffer)));

Related

C++ adding 4 bytes to pointer address

I have a question about pointers, and memory addresses:
Supposing I have the following code:
int * array = (int *) malloc(sizeof(int) * 4);
Now in array im storing a memory address, I know that c++ takes already care when adding +1 to this pointer it will add 4 bytes, but what If I want to add manually 4 bytes?
array + 0x004
If im correct this will lead to add 4*4 (16) bytes, but my Idea is to add manually those 4 bytes.
Why? Just playing around, i've tried this and I got a totally different result from what I expected, then i've researched and i've seen that c++ takes already care when you add +1 to a pointer (it sums 4 bytes in this case).
Any idea?
For a pointer p to a type T with value v, the expression p+n will (on most systems anyway) result in a pointer to the address v+n*sizeof(T). To get a fixed-byte offset to the pointer, you can first cast it to a character pointer, like this:
reinterpret_cast<T*>(reinterpret_cast<char*>(p) + n)
In c++, sizeof(char) is defined to be equal to 1.
Do note that accessing improperly aligned values can have large performance penalties.
Another thing to note is that, in general, casting pointers to different types is not allowed (called the strict aliasing rule), but an exception is explicitly made for casting any pointer type to char* and back.
The trick is convert the type of array into any pointer-type with a size of 1 Byte, or store the pointer value in an integer.
#include <stdint.h>
int* increment_1(int* ptr) {
//C-Style
return (int*)(((char*)ptr) + 4);
}
int* increment_2(int* ptr) {
//C++-Style
char* result = reinterpret_cast<char*>(ptr);
result += 4;
return reinterpret_cast<int*>(result);
}
int* increment_3(int* ptr) {
//Store in integer
intptr_t result = reinterpret_cast<intptr_t>(ptr);
result += 4;
return reinterpret_cast<int*>(result);
}
Consider that if you add an arbitrary number of bytes to an address of an object of type T, it no longer makes sense to use a pointer of type T, since there might not be an object of type T at the incremented memory address.
If you want to access a particular byte of an object, you can do so using a pointer to a char, unsigned char or std::byte. Such objects are the size of a byte, so incrementing behaves just as you would like. Furthermore, while rules of C++ disallow accessing objects using incompatible pointers, these three types are excempt of that rule and are allowed to access objects of any type.
So, given
int * array = ....
You can access the byte at index 4 like this:
auto ptr = reinterpret_cast<unsigned char*>(array);
auto byte_at_index_4 = ptr + 4;
array + 0x004
If im correct this will lead to add 4*4 (16) bytes
Assuming sizeof(int) happens to be 4, then yes. But size of int is not guaranteed to be 4.

Obtaining an int from a void pointer which points to a short

I have a return value from a library which is a void pointer. I know that it points to a short int; I try to obtain the int value in the following way (replacing the function call with a simple assignment to a void *):
short n = 1;
void* s = &n;
int k = *(int*)s;
I try to cast a void pointer that points to an address in which there is a short and I try to cast the pointer to point to an int and when I do so the output becomes a rubbish value. While I understand why it's behaving like that I don't know if there's a solution to this.
If the problem you are dealing with truly deals with short and int, you can simply avoid the pointer and use:
short n = 1;
int k = n;
If the object types you are dealing with are different, then the solution will depend on what those types are.
Update, in response to OP's comment
In a comment, you said,
I have a function that returns a void pointer and I would need to cast the value accordingly.
If you know that the function returns a void* that truly points to a short object, then, your best bet is:
void* ptr = function_returning_ptr();
short* sptr = reinterpret_cast<short*>(ptr);
int k = *sptr;
The last line work since *sptr evaluates to a short and the conversion of a short to an int is a valid operation. On the other hand,
int k = *(int*)sptr;
does not work since conversion of short* to an int* is not a valid operation.
Your code is subject to undefined behavior, as it violates the so-called strict aliasing rules. Without going into too much detail and simplifying a bit, the rule states that you can not access an object of type X though a pointer to type Z unless types X and Z are related. There is a special exception for char pointer, but it doesn't apply here.
In your example, short and int are not related types, and as such, accessing one through pointer to another is not allowed.
The size of a short is only 16 bits the size of a int is 32 bits ( in most cases not always) this means that you are tricking the computer into thinking that your pointer to a short is actually pointing to an integer. This causes it to read more memory that it should and is reading garbage memory. If you cast s to a pointer to a short then deference it it will work.
short n = 1;
void* s = &n;
int k = *(short*)s;
Assuming you have 2 byte shorts and 4 byte ints, There's 3 problems with casting pointers in your method.
First off, the 4 byte int will necessarily pick up some garbage memory when using the short's pointer. If you're lucky the 2 bytes after short n will be 0.
Second, the 4 byte int may not be properly aligned. Basically, the memory address of a 4 byte int has to be a multiple of 4, or else you risk bus errors. Your 2 byte short is not guaranteed to be properly aligned.
Finally, you have a big-endian/little-endian dependency. You can't turn a big-endian short into a little-endian int by just tacking on some 0's at the end.
In the very fortunate circumstance that the bytes following the short are 0, AND the short is integer aligned, AND the system uses little-endian representation, then such a cast will probably work. It would be terrible, but it would (probably) work.
The proper solution is to use the original type and let the compiler cast. Instead of int k = *(int*)s;, you need to use int k = *(short *)s;

How to cast char array to int at non-aligned position?

Is there a way in C/C++ to cast a char array to an int at any position?
I tried the following, bit it automatically aligns to the nearest 32 bits (on a 32 bit architecture) if I try to use pointer arithmetic with non-const offsets:
unsigned char data[8];
data[0] = 0; data[1] = 1; ... data[7] = 7;
int32_t p = 3;
int32_t d1 = *((int*)(data+3)); // = 0x03040506 CORRECT
int32_t d2 = *((int*)(data+p)); // = 0x00010203 WRONG
Update:
As stated in the comments the input comes in tuples of 3 and I cannot
change that.
I want to convert 3 values to an int for further
processing and this conversion should be as fast as possible.
The
solution does not have to be cross platform. I am working with a very
specific compiler and processor, so it can be assumed that it is a 32
bit architecture with big endian.
The lowest byte of the result does not matter to me (see above).
My main questions at the moment are: Why has d1 the correct value but d2 does not? Is this also true for other compilers? Can this behavior be changed?
No you can't do that in a portable way.
The behaviour encountered when attempting a cast from char* to int* is undefined in both C and C++ (possibly for the very reasons that you've spotted: ints are possibly aligned on 4 byte boundaries and data is, of course, contiguous.)
(The fact that data+3 works but data+p doesn't is possibly due to to compile time vs. runtime evaluation.)
Also note that the signed-ness of char is not specified in either C or C++ so you should use signed char or unsigned char if you're writing code like this.
Your best bet is to use bitwise shift operators (>> and <<) and logical | and & to absorb char values into an int. Also consider using int32_tin case you build to targets with 16 or 64 bit ints.
There is no way, converting a pointer to a wrongly aligned one is undefined.
You can use memcpy to copy the char array into an int32_t.
int32_t d = 0;
memcpy(&d, data+3, 4); // assuming sizeof(int) is 4
Most compilers have built-in functions for memcpy with a constant size argument, so it's likely that this won't produce any runtime overhead.
Even though a cast like you've shown is allowed for correctly aligned pointers, dereferencing such a pointer is a violation of strict aliasing. An object with an effective type of char[] must not be accessed through an lvalue of type int.
In general, type-punning is endianness-dependent, and converting a char array representing RGB colours is probably easier to do in an endianness-agnostic way, something like
int32_t d = (int32_t)data[2] << 16 | (int32_t)data[1] << 8 | data[0];

Can I cast an unsigned char* to an unsigned int*?

error: invalid static_cast from type ‘unsigned char*’ to type ‘uint32_t* {aka unsigned int*}’
uint32_t *starti = static_cast<uint32_t*>(&memory[164]);
I've allocated an array of chars, and I want to read 4 bytes as a 32bit int, but I get a compiler error.
I know that I can bit shift, like this:
(start[0] << 24) + (start[1] << 16) + (start[2] << 8) + start[3];
And it will do the same thing, but this is a lot of extra work.
Is it possible to just cast those four bytes as an int somehow?
static_cast is meant to be used for "well-behaved" casts, such as double -> int.
You must use reinterpret_cast:
uint32_t *starti = reinterpret_cast<uint32_t*>(&memory[164]);
Or, if you are up to it, C-style casts:
uint32_t *starti = (uint32_t*)&memory[164];
Yes, you can convert an unsigned char* pointer value to uint32_t* (using either a C-style cast or a reinterpret_cast) -- but that doesn't mean you can necessarily use the result.
The result of such a conversion might not point to an address that's properly aligned to hold a uint32_t object. For example, an unsigned char* might point to an odd address; if uint32_t requires even alignment, you'll have undefined behavior when you try to dereference the result.
If you can guarantee somehow that the unsigned char* does point to a properly aligned address, you should be ok.
I am used to BDS2006 C++ but anyway this should work fine on other compilers too
char memory[164];
int *p0,*p1,*p2;
p0=((int*)((void*)(memory))); // p0 starts from start
p1=((int*)((void*)(memory+64))); // p1 starts from 64th char
p2=((int*)((void*)(&memory[64]))); // p2 starts from 64th char
You can use reinterpret_cast as suggested by faranwath but please understand the risk of going that route.
The value of what you get back will be radically different in a little endian system vs a big endian system. Your method will work in both cases.

When to use unsigned char pointer

What is the use of unsigned char pointers? I have seen it at many places that pointer is type cast to pointer to unsinged char Why do we do so?
We receive a pointer to int and then type cast it to unsigned char*. But if we try to print element in that array using cout it does not print anything. why? I do not understand. I am new to c++.
EDIT Sample Code Below
int Stash::add(void* element)
{
if(next >= quantity)
// Enough space left?
inflate(increment);
// Copy element into storage, starting at next empty space:
int startBytes = next * size;
unsigned char* e = (unsigned char*)element;
for(int i = 0; i < size; i++)
storage[startBytes + i] = e[i];
next++;
return(next - 1); // Index number
}
You are actually looking for pointer arithmetic:
unsigned char* bytes = (unsigned char*)ptr;
for(int i = 0; i < size; i++)
// work with bytes[i]
In this example, bytes[i] is equal to *(bytes + i) and it is used to access the memory on the address: bytes + (i* sizeof(*bytes)). In other words: If you have int* intPtr and you try to access intPtr[1], you are actually accessing the integer stored at bytes: 4 to 7:
0 1 2 3
4 5 6 7 <--
The size of type your pointer points to affects where it points after it is incremented / decremented. So if you want to iterate your data byte by byte, you need to have a pointer to type of size 1 byte (that's why unsigned char*).
unsigned char is usually used for holding binary data where 0 is valid value and still part of your data. While working with "naked" unsigned char* you'll probably have to hold the length of your buffer.
char is usually used for holding characters representing string and 0 is equal to '\0' (terminating character). If your buffer of characters is always terminated with '\0', you don't need to know it's length because terminating character exactly specifies the end of your data.
Note that in both of these cases it's better to use some object that hides the internal representation of your data and will take care of memory management for you (see RAII idiom). So it's much better idea to use either std::vector<unsigned char> (for binary data) or std::string (for string).
In C, unsigned char is the only type guaranteed to have no trapping values, and which guarantees copying will result in an exact bitwise image. (C++ extends this guarantee to char as well.) For this reason, it is traditionally used for "raw memory" (e.g. the semantics of memcpy are defined in terms of unsigned char).
In addition, unsigned integral types in general are used when bitwise operations (&, |, >> etc.) are going to be used. unsigned char is the smallest unsigned integral type, and may be used when manipulating arrays of small values on which bitwise operations are used. Occasionally, it's also used because one needs the modulo behavior in case of overflow, although this is more frequent with larger types (e.g. when calculating a hash value). Both of these reasons apply to unsigned types in general; unsigned char will normally only be used for them when there is a need to reduce memory use.
The unsinged char type is usually used as a representation of a single byte of binary data. Thus, and array is often used as a binary data buffer, where each element is a singe byte.
The unsigned char* construct will be a pointer to the binary data buffer (or its 1st element).
I am not 100% sure what does c++ standard precisely says about size of unsigned char, whether it is fixed to be 8 bit or not. Usually it is. I will try to find and post it.
After seeing your code
When you use something like void* input as a parameter of a function, you deliberately strip down information about inputs original type. This is very strong suggestion that the input will be treated in very general manner. I.e. as a arbitrary string of bytes. int* input on the other hand would suggest it will be treated as a "string" of singed integers.
void* is mostly used in cases when input gets encoded, or treated bit/byte wise for whatever reason, since you cannot draw conclusions about its contents.
Then In your function you seem to want to treat the input as a string of bytes. But to operate on objects, e.g. performing operator= (assignment) the compiler needs to know what to do. Since you declare input as void* assignment such as *input = something would have no sense because *input is of void type. To make compiler to treat input elements as the "smallest raw memory pieces" you cast it to the appropriate type which is unsigned int.
The cout probably did not work because of wrong or unintended type conversion. char* is considered a null terminated string and it is easy to confuse singed and unsigned versionin code. If you pass unsinged char* to ostream::operator<< as a char* it will treat and expect the byte input as normal ASCII characters, where 0 is meant to be end of string not an integer value of 0. When you want to print contents of memory it is best to explicitly cast pointers.
Also note that to print memory contents of a buffer you would need to use a loop, since other wise the printing function would not know when to stop.
Unsigned char pointers are useful when you want to access the data byte by byte. For example, a function that copies data from one area to another could need this:
void memcpy (unsigned char* dest, unsigned char* source, unsigned count)
{
for (unsigned i = 0; i < count; i++)
dest[i] = source[i];
}
It also has to do with the fact that the byte is the smallest addressable unit of memory. If you want to read anything smaller than a byte from memory, you need to get the byte that contains that information, and then select the information using bit operations.
You could very well copy the data in the above function using a int pointer, but that would copy chunks of 4 bytes, which may not be the correct behavior in some situations.
Why nothing appears on the screen when you try to use cout, the most likely explanation is that the data starts with a zero character, which in C++ marks the end of a string of characters.