When to use unsigned char pointer - c++

What is the use of unsigned char pointers? I have seen it at many places that pointer is type cast to pointer to unsinged char Why do we do so?
We receive a pointer to int and then type cast it to unsigned char*. But if we try to print element in that array using cout it does not print anything. why? I do not understand. I am new to c++.
EDIT Sample Code Below
int Stash::add(void* element)
{
if(next >= quantity)
// Enough space left?
inflate(increment);
// Copy element into storage, starting at next empty space:
int startBytes = next * size;
unsigned char* e = (unsigned char*)element;
for(int i = 0; i < size; i++)
storage[startBytes + i] = e[i];
next++;
return(next - 1); // Index number
}

You are actually looking for pointer arithmetic:
unsigned char* bytes = (unsigned char*)ptr;
for(int i = 0; i < size; i++)
// work with bytes[i]
In this example, bytes[i] is equal to *(bytes + i) and it is used to access the memory on the address: bytes + (i* sizeof(*bytes)). In other words: If you have int* intPtr and you try to access intPtr[1], you are actually accessing the integer stored at bytes: 4 to 7:
0 1 2 3
4 5 6 7 <--
The size of type your pointer points to affects where it points after it is incremented / decremented. So if you want to iterate your data byte by byte, you need to have a pointer to type of size 1 byte (that's why unsigned char*).
unsigned char is usually used for holding binary data where 0 is valid value and still part of your data. While working with "naked" unsigned char* you'll probably have to hold the length of your buffer.
char is usually used for holding characters representing string and 0 is equal to '\0' (terminating character). If your buffer of characters is always terminated with '\0', you don't need to know it's length because terminating character exactly specifies the end of your data.
Note that in both of these cases it's better to use some object that hides the internal representation of your data and will take care of memory management for you (see RAII idiom). So it's much better idea to use either std::vector<unsigned char> (for binary data) or std::string (for string).

In C, unsigned char is the only type guaranteed to have no trapping values, and which guarantees copying will result in an exact bitwise image. (C++ extends this guarantee to char as well.) For this reason, it is traditionally used for "raw memory" (e.g. the semantics of memcpy are defined in terms of unsigned char).
In addition, unsigned integral types in general are used when bitwise operations (&, |, >> etc.) are going to be used. unsigned char is the smallest unsigned integral type, and may be used when manipulating arrays of small values on which bitwise operations are used. Occasionally, it's also used because one needs the modulo behavior in case of overflow, although this is more frequent with larger types (e.g. when calculating a hash value). Both of these reasons apply to unsigned types in general; unsigned char will normally only be used for them when there is a need to reduce memory use.

The unsinged char type is usually used as a representation of a single byte of binary data. Thus, and array is often used as a binary data buffer, where each element is a singe byte.
The unsigned char* construct will be a pointer to the binary data buffer (or its 1st element).
I am not 100% sure what does c++ standard precisely says about size of unsigned char, whether it is fixed to be 8 bit or not. Usually it is. I will try to find and post it.
After seeing your code
When you use something like void* input as a parameter of a function, you deliberately strip down information about inputs original type. This is very strong suggestion that the input will be treated in very general manner. I.e. as a arbitrary string of bytes. int* input on the other hand would suggest it will be treated as a "string" of singed integers.
void* is mostly used in cases when input gets encoded, or treated bit/byte wise for whatever reason, since you cannot draw conclusions about its contents.
Then In your function you seem to want to treat the input as a string of bytes. But to operate on objects, e.g. performing operator= (assignment) the compiler needs to know what to do. Since you declare input as void* assignment such as *input = something would have no sense because *input is of void type. To make compiler to treat input elements as the "smallest raw memory pieces" you cast it to the appropriate type which is unsigned int.
The cout probably did not work because of wrong or unintended type conversion. char* is considered a null terminated string and it is easy to confuse singed and unsigned versionin code. If you pass unsinged char* to ostream::operator<< as a char* it will treat and expect the byte input as normal ASCII characters, where 0 is meant to be end of string not an integer value of 0. When you want to print contents of memory it is best to explicitly cast pointers.
Also note that to print memory contents of a buffer you would need to use a loop, since other wise the printing function would not know when to stop.

Unsigned char pointers are useful when you want to access the data byte by byte. For example, a function that copies data from one area to another could need this:
void memcpy (unsigned char* dest, unsigned char* source, unsigned count)
{
for (unsigned i = 0; i < count; i++)
dest[i] = source[i];
}
It also has to do with the fact that the byte is the smallest addressable unit of memory. If you want to read anything smaller than a byte from memory, you need to get the byte that contains that information, and then select the information using bit operations.
You could very well copy the data in the above function using a int pointer, but that would copy chunks of 4 bytes, which may not be the correct behavior in some situations.
Why nothing appears on the screen when you try to use cout, the most likely explanation is that the data starts with a zero character, which in C++ marks the end of a string of characters.

Related

Obtaining an int from a void pointer which points to a short

I have a return value from a library which is a void pointer. I know that it points to a short int; I try to obtain the int value in the following way (replacing the function call with a simple assignment to a void *):
short n = 1;
void* s = &n;
int k = *(int*)s;
I try to cast a void pointer that points to an address in which there is a short and I try to cast the pointer to point to an int and when I do so the output becomes a rubbish value. While I understand why it's behaving like that I don't know if there's a solution to this.
If the problem you are dealing with truly deals with short and int, you can simply avoid the pointer and use:
short n = 1;
int k = n;
If the object types you are dealing with are different, then the solution will depend on what those types are.
Update, in response to OP's comment
In a comment, you said,
I have a function that returns a void pointer and I would need to cast the value accordingly.
If you know that the function returns a void* that truly points to a short object, then, your best bet is:
void* ptr = function_returning_ptr();
short* sptr = reinterpret_cast<short*>(ptr);
int k = *sptr;
The last line work since *sptr evaluates to a short and the conversion of a short to an int is a valid operation. On the other hand,
int k = *(int*)sptr;
does not work since conversion of short* to an int* is not a valid operation.
Your code is subject to undefined behavior, as it violates the so-called strict aliasing rules. Without going into too much detail and simplifying a bit, the rule states that you can not access an object of type X though a pointer to type Z unless types X and Z are related. There is a special exception for char pointer, but it doesn't apply here.
In your example, short and int are not related types, and as such, accessing one through pointer to another is not allowed.
The size of a short is only 16 bits the size of a int is 32 bits ( in most cases not always) this means that you are tricking the computer into thinking that your pointer to a short is actually pointing to an integer. This causes it to read more memory that it should and is reading garbage memory. If you cast s to a pointer to a short then deference it it will work.
short n = 1;
void* s = &n;
int k = *(short*)s;
Assuming you have 2 byte shorts and 4 byte ints, There's 3 problems with casting pointers in your method.
First off, the 4 byte int will necessarily pick up some garbage memory when using the short's pointer. If you're lucky the 2 bytes after short n will be 0.
Second, the 4 byte int may not be properly aligned. Basically, the memory address of a 4 byte int has to be a multiple of 4, or else you risk bus errors. Your 2 byte short is not guaranteed to be properly aligned.
Finally, you have a big-endian/little-endian dependency. You can't turn a big-endian short into a little-endian int by just tacking on some 0's at the end.
In the very fortunate circumstance that the bytes following the short are 0, AND the short is integer aligned, AND the system uses little-endian representation, then such a cast will probably work. It would be terrible, but it would (probably) work.
The proper solution is to use the original type and let the compiler cast. Instead of int k = *(int*)s;, you need to use int k = *(short *)s;

Use reinterpret_cast to convert binary data at an offset in the char array

I found this post:
Why is memcpy slower than a reinterpret_cast when parsing binary data?
where somebody uses reinterpret_cast to convert binary data to an integer. However (I presume) the number they are converting is at the 0th element in the char* array.
How could I use the above for situations where the binary number I want to convert is offset N bytes from the beginning of the char array?
I want to convert the binary number in as fewer CPU cycles as possible, hence my interest in reinterpret_cast and the above SO question.
Just add a offset to the byte array address.
Instead of x, cast x+123
But: Have you read the first line of the question (bold edit)?
TLDR: I forgot to enable compiler optimizations. With the
optimizations enabled the performance is (nearly) identical.
If you have a *array; initialized with your binary data then you can simply do this:
for (int offset = 0; offset < sizeof (array); offset++)
{
... = *reinterpret_cast<const int*>(array + offset);
}

c++ void* memory traversal

I'm trying to store a couple of ints in memory using void* & then retrieve them but it keeps throwing "pointer of type ‘void *’ used in arithmetic" warning.
void *a = new char[4];
memset(a, 0 , 4);
unsigned short d = 7;
memcpy(a, (void *)&d, 2);
d=8;
memcpy(a+2, (void *)&d, 2); //pointer of type ‘void *’ used in arithmetic
/*Retrieving*/
unsigned int *data = new unsigned int();
memcpy(data, a, 2);
cout << (unsigned int)(*data);
memcpy(data, a+2, 2); //pointer of type ‘void *’ used in arithmetic
cout << (unsigned int)(*data);
The results are as per expectation but I fear that these warnings might turn into errors on some compiler. Is there another way to do this that I'm not aware of?
I know this is perhaps a bad practice in normal scenario but the problem statement requires that unsigned integers be stored and sent in 2 byte packets. Please correct me if I'm wrong but as per my understanding, using a char* instead of a void* would have taken up 3 bytes for 3-digit numbers.
a+2, with a being a pointer, means that the pointer is increased to allow space for two items of the pointer type. V.g., if a was int32 *, a + 2 would mean "a position plus 8 bytes".
Since void * has no type, it can only try to guess what do you mean by a + 2, since it does not know the size of the type being referred.
The problem is that the compiler doesn't know what to do with
a+2
This instruction means "Move pointer 'a' forward by 2 * (sizeof-what-is-pointed-to-by-'a')".
If a is void *, the compiler doesn't know the size of the target object (there isn't one!), so it gives an error.
You need to do:
memcpy(data, ((char *)a)+2, 2);
This way, the compiler knows how to add 2 - it knows the sizeof(char).
Please correct me if I'm wrong but as per my understanding, using a char* instead of a void* would have taken up 3 bytes for 3-digit numbers.
Yes, you are wrong, that would be the case if you were transmitting the numbers as chars. 'char*' is just a convenient way of referring to 8-bit values - and since you are receiving pairs of bytes, you could treat the destination memory are char's to do simple math. But it is fairly common for people to use 'char' arrays for network data streams.
I prefer to use something like BYTE or uint8_t to indicate clearly 'I'm working with bytes' as opposed to char or other values.
void* is a pointer to an unknown, more importantly, 0 sized type (void). Because the size is zero, offset math is going to result in zeros, so the compiler tells you it's invalid.
It is possible that your solution could be as simple as to receive the bytes from the network into a byte-based array. An int is 32 bits, 4 bytes. They're not "char" values, but quads of a 4-byte integer.
#include <cstdint>
uint8_t buffer[4];
Once you know you've filled the buffer, you can simply say
uint32_t integer = *(static_cast<uint32*>(buffer));
Whether this is correct will depend on whether the bytes are in network or host order. I'm guessing you'll probably need:
uint32_t integer = ntohl(*(static_cast<uint32*>(buffer)));

Integer to Character conversion in C

Lets us consider this snippet:
int s;
scanf("%c",&s);
Here I have used int, and not char, for variable s, now for using s for character conversion safely I have to make it char again because when scanf reads a character it only overwrites one byte of the variable it is assigning it to, and not all four that int has.
For conversion I could use s = (char)s; as the next line, but is it possible to implement the same by subtracting something from s ?
What you've done is technically undefined behaviour. The %c format calls for a char*, you've passed it an int* which will (roughly speaking) be reinterpreted. Even assuming that the pointer value is still good after reinterpreting, storing an arbitrary character to the first byte of an int and then reading it back as int is undefined behaviour. Even if it were defined, reading an int when 3 bytes of it are uninitialized, is undefined behaviour.
In practice it probably does something sensible on your machine, and you just get garbage in the top 3 bytes (assuming little-endian).
Writing s = (char)s converts the value from int to char and then back to int again. This is implementation-defined behaviour: converting an out-of-range value to a signed type. On different implementations it might clean up the top 3 bytes, it might return some other result, or it might raise a signal.
The proper way to use scanf is:
char c;
scanf("%c", &c);
And then either int s = c; or int s = (unsigned char)c;, according to whether you want negative-valued characters to result in a negative integer, or a positive integer (up to 255, assuming 8-bit char).
I can't think of any good reason for using scanf improperly. There are good reasons for not using scanf at all, though:
int s = getchar();
Are you trying to convert a digit to its decimal value? If so, then
char c = '8';
int n = c - '0';
n should 8 at this point.
That's probably not a good idea; GCC gives me a warning for that code:
main.c:10: warning: format ‘%c’ expects type ‘char *’, but
argument 2 has type ‘int *’
In this case you're ok since you're passing a pointer to more space than you need (for most systems), but what if you did it the other way around? Could be crash city. If you really want to do something like what you have there, just do the typecast or mask it - the mask will be endian-dependent.
As written this won't work reliably . The argument, &s, to scanf is a pointer to int and scanf is expecting a pointer to char. The two data type (int and char) have different sizes (at least on most architectures) so the data may get put in the wrong spot in memeory, and the other part of s may not get properly cleared.
The answers suggesting manipulation of the result after using a pointer to int rely on unspecified behavior (i.e. that scanf will put the character value it has in the least significant byte of the int you're pointing to), and are not safe.
Not but you could use the following:
s = s & 0xFF
That will blank out all of the data except the first byte. But in general all these ideas (and the ones above) are bad ideas, since not all systems store the lowest part of the integer in memory first. So if you ever have to port this code to a big endian system, you'll be screwed.
True, you may never have to port the code, but why write unportable code to begin with?
See this for more info:
http://en.wikipedia.org/wiki/Endianness

string::size_type instead of int

const std::string::size_type cols = greeting.size() + pad * 2 + 2;
Why string::size_type? int is supposed to work! it holds numbers!!!
A short holds numbers too. As does a signed char.
But none of those types are guaranteed to be large enough to represent the sizes of any strings.
string::size_type guarantees just that. It is a type that is big enough to represent the size of a string, no matter how big that string is.
For a simple example of why this is necessary, consider 64-bit platforms. An int is typically still 32 bit on those, but you have far more than 2^32 bytes of memory.
So if a (signed) int was used, you'd be unable to create strings larger than 2^31 characters.
size_type will be a 64-bit value on those platforms however, so it can represent larger strings without a problem.
The example that you've given,
const std::string::size_type cols = greeting.size() + pad * 2 + 2;
is from Accelerated C++ by Koenig. He also states the reason for his choice right after this, namely:
The std::string type defines size_type to be the name of the
appropriate type for holding the number of characters in a string. Whenever we need a local
variable to contain the size of a string, we should use std::string::size_type as the type of that variable.
The reason that we have given cols a type of std::string::size_type is
to ensure that cols is capable of containing the number of characters
in greeting, no matter how large that number might be. We could simply
have said that cols has type int, and indeed, doing so would probably
work. However, the value of cols depends on the size of the input to
our program, and we have no control over how long that input might be.
It is conceivable that someone might give our program a string so long
that an int is insufficient to contain its length.
A nested size_type typedef is a requirement for STL-compatible containers (which std::string happens to be), so generic code can choose the correct integer type to represent sizes.
There's no point in using it in application code, a size_t is completely ok (int is not, because it's signed, and you'll get signed/unsigned comparison warnings).