I have downloaded an image and it is saved in a std::string.
Now I want to use/open it with the following conditions:
typedef uint8_t byte; // 1-byte unsigned integer type
static open(const byte * data, long size)
How do I cast from string to byte* ?
/EDIT:
i have already tried this:
_data = std::vector<byte>(s.begin(), s.end());
//_data = std::vector<uint8_t>(s.begin(), s.end()); //also fails, same error
_p = &_data[0];
open(_p, _data.size())
but I get:
undefined reference to 'open(unsigned char const*, long)'
Why does it wrongly interpret byte as char?!
/EDIT2:
Just to test it, I changed the function call to
open(*_p, _data.size())
but then I get:
error: no matching function for call to 'open(unsigned char&, size_t)'
[...] open(const byte*, long int) <near match>
So the function is definitely found...
Two possibilities:
1) the common one. On your system, char is either 2's complement or else unsigned, and hence it is "safe" to read chars as unsigned chars, and (if char is signed) the result is the same as converting from signed to unsigned.
In which case, use reinterpret_cast<const uint8_t*>(string.data()).
2) the uncommon one. On your system, char is signed and not 2's complement and hence for char *ptr pointing to a negative value the result of *(uint8_t*)ptr is not equal to (uint8_t)(*ptr). Then depending what open actually does with the data, the above might be fine or you might have to copy the data out of a string and into a container of uint8_t, converting as you go. For example, vector<uint8_t> v(string.begin(), string.end());. Then use &v[0] provided that the length is not 0 (unlike string, you aren't permitted to take a pointer to the data of an empty vector even if you never dereference it).
Non-2's-complement systems are approximately non-existent, and even if there was one I think it's fairly unlikely that a system on which char was signed and not 2's complement would provide uint8_t at all (because if it does then it must also provide int8_t). So (2) only serves pedantic completeness.
why does it interpret byte wrongly as char
It isn't wrong, uint8_t is a typedef for unsigned char on your system.
std::string string_with_image;
string_with_image.data();
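As a minimal sketch of option (1): the library function is not shown in the question, so `open_image` below is a hypothetical stand-in with the same parameter list as `open`, and `open_from_string` is an illustrative helper name. It only sums the bytes so that the result can be checked.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

typedef std::uint8_t byte;

// Hypothetical stand-in for the library's open(); it sums the bytes
// so the caller can verify that every byte arrived unchanged.
unsigned long open_image(const byte* data, long size) {
    assert(size >= 0);
    unsigned long sum = 0;
    for (long i = 0; i < size; ++i) sum += data[i];
    return sum;
}

// View the string's storage as bytes; nothing is copied.
unsigned long open_from_string(const std::string& s) {
    const byte* p = reinterpret_cast<const byte*>(s.data());
    return open_image(p, static_cast<long>(s.size()));
}
```

Because `std::string::data()` returns a valid pointer even for an empty string, this version needs no special case for a zero-length download.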
I am reading Beej's Guide to Network Programming and I am having trouble understanding a function. The function expects a char * pointer, but it dereferences the pointer, casts the result to (unsigned long int) and performs some bitwise operations. Why couldn't we just pass it as a (unsigned int *) instead of (unsigned char *)? Also, if the parameter was replaced by (void *) and then inside the code we did something like:
*(unsigned long int *)buf[0] << 24
will we get the same result? (Sorry this is my first time asking a question here so let me know if any more info is required).
unsigned long int unpacku32(unsigned char *buf)
{
return ((unsigned long int)buf[0]<<24) |
((unsigned long int)buf[1]<<16) |
((unsigned long int)buf[2]<< 8) |
buf[3];
}
What you're suggesting is not guaranteed to work. Unless buf points to an actual unsigned long, you're attempting to read an object of one type as another which is not allowed (unless you're reading as an unsigned char). There could be further issues if the pointer value you create is not properly aligned for its type.
Then there is also the issue of endianness. Bytes sent over a network are typically sent in big-endian format, i.e. most significant byte first. If your system is little-endian, it will interpret the bytes in the reverse order.
The function you posted demonstrates the proper way of deserializing an unsigned long from a byte buffer in a standard compliant manner.
That would make it dependent on the endianness of the platform. So we pick out the parts in the defined order to make it platform neutral.
buf[0] is treated as 8 bit unsigned value. If we do this:
(unsigned long int)buf[0] << 24, by casting we tell the compiler to treat it not as an 8 bit value but as an unsigned long (at least 32 bits, 64 bits on many platforms), so there is room for the shifted bits.
We shifted only buf[0], buf[1] and other fields are not considered during shifting process.
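The shifting can be sketched as a round trip. This is an illustration under assumptions: it uses the fixed-width `std::uint32_t` instead of the book's `unsigned long int`, and `packu32` and `round_trip_ok` are hypothetical helpers added only to show the inverse operation.

```cpp
#include <cassert>
#include <cstdint>

// Store a 32-bit value most significant byte first (network byte order).
void packu32(unsigned char* buf, std::uint32_t v) {
    buf[0] = static_cast<unsigned char>(v >> 24);
    buf[1] = static_cast<unsigned char>(v >> 16);
    buf[2] = static_cast<unsigned char>(v >> 8);
    buf[3] = static_cast<unsigned char>(v);
}

// Same logic as the unpacku32 in the question: widen each byte first,
// then shift it into its position and combine.
std::uint32_t unpacku32(const unsigned char* buf) {
    return (static_cast<std::uint32_t>(buf[0]) << 24) |
           (static_cast<std::uint32_t>(buf[1]) << 16) |
           (static_cast<std::uint32_t>(buf[2]) << 8) |
            static_cast<std::uint32_t>(buf[3]);
}

// The byte layout is fixed by the code, not by the host's endianness.
bool round_trip_ok() {
    unsigned char buf[4];
    packu32(buf, 0x12345678u);
    assert(buf[0] == 0x12 && buf[1] == 0x34 && buf[2] == 0x56 && buf[3] == 0x78);
    return unpacku32(buf) == 0x12345678u;
}
```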
If you want to convert a string, let's say "aabbccd", to unsigned long and you don't care about endianness, you can do it like below:
char* str = const_cast<char *>("aabbccd\0");
unsigned long value = *(reinterpret_cast<unsigned long *>(str));
std::cout << value << std::endl;
std::cout << reinterpret_cast<char *>(&value) << std::endl;
It should be pointed out that unsigned long can store at most 8 chars this way, and only on platforms where it is a 64 bit integer.
However, if many platforms are going to use the same data, doing it like this may not be enough due to endianness. The approach given in your book is, as someone mentioned, platform neutral.
The function expects a char * pointer but it dereferences the
pointer and casts it to a (unsigned long int) and perform some
bitwise operations.
Actually, what the code does is use the array index operator to pull out the first byte from the buffer, casts that to an unsigned long int, and then does some bitwise operations. The pointer that's dereferenced is an unsigned char * not anything to do with long integers.
Why couldn't we just pass it as a (unsigned int *) instead of
(unsigned char *).
Because it isn't a pointer to any kind of integer. It's a pointer to a buffer of unsigned char, i.e. bytes. Treating a pointer as if it were a pointer to a different type is likely to lead to a violation of the "Strict Aliasing Rule" (which I encourage you to read about).
Also if the parameter was replaced by (void *) and then inside code we
did some thing like *(unsigned long int *)buf[0] << 24 will we get
the same result?
No. If you define buf as a void*, then buf[0] is a meaningless expression. If buf is defined as, or cast to, an unsigned long int *, then buf[0] is an unsigned long int, not the unsigned char that the algorithm is expecting. There will almost certainly be too many bits set (as many as 64, not 8) and the result of the expression will be invalid.
Need to safe convert array from unsigned char* to char*.
I do it this way. Is it correct or not?
std::vector < unsigned char > arr;
char *imgData = (char*) malloc( arr.size() );
for ( int i = 0; i < arr.size(); i++ ) imgData[ i ] = ( arr.at( i ) - 128 );
No, that is not safe. Or more to the point, it's not well-defined behavior in C++.
char is allowed to be signed or unsigned, however the implementation sees fit. If char is unsigned, subtracting 128 from the unsigned char will just truncate half of the bits. And if char is signed, there's no guarantee that it's two's complement signed, so subtracting 128 won't do what you want.
The kind of conversion you're trying to do is not reasonable. You named the variable imgData, so it seems like you intend to send that data to some image API. And that API takes regular char. So your goal seems to be to convert each unsigned char into a char that shares the exact same bit-pattern of the original unsigned char.
In that case... just cast the pointer: reinterpret_cast<char*>(arr.data()). You're going to provoke undefined behavior either way; I'd rather do it in the way that's likely to actually work ;)
Also, it should be noted that C++14 makes it effectively impossible to implement a signed version of char that doesn't use two's complement. That's because of the need to support UTF-8 through a possibly-signed-char type. You have to be able to cast a char* into an unsigned char* and back, in such a way that the bit-pattern of all valid UTF-8 code units is preserved.
So the cast still is the option most likely to actually do what you want.
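A minimal sketch of the pointer cast follows. The image API is not shown in the question, so `consume_image` is a hypothetical function taking plain `char`, and `pass_bytes` is an illustrative wrapper. Note that `reinterpret_cast` is the named cast that compiles for this pointer conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical image API that takes plain char. Here it only copies
// the bytes so that the caller can inspect what it received.
std::vector<char> consume_image(const char* data, std::size_t size) {
    return std::vector<char>(data, data + size);
}

std::vector<char> pass_bytes(const std::vector<unsigned char>& arr) {
    // Reinterpret the existing buffer: no malloc, no loop, no arithmetic
    // on the individual values.
    const char* p = reinterpret_cast<const char*>(arr.data());
    return consume_image(p, arr.size());
}
```

Comparing the two buffers with `std::memcmp` is one way to confirm that the bit patterns arrive unchanged, including for values above 127.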
error: invalid static_cast from type ‘unsigned char*’ to type ‘uint32_t* {aka unsigned int*}’
uint32_t *starti = static_cast<uint32_t*>(&memory[164]);
I've allocated an array of chars, and I want to read 4 bytes as a 32bit int, but I get a compiler error.
I know that I can bit shift, like this:
(start[0] << 24) + (start[1] << 16) + (start[2] << 8) + start[3];
And it will do the same thing, but this is a lot of extra work.
Is it possible to just cast those four bytes as an int somehow?
static_cast is meant to be used for "well-behaved" casts, such as double -> int.
You must use reinterpret_cast:
uint32_t *starti = reinterpret_cast<uint32_t*>(&memory[164]);
Or, if you are up to it, C-style casts:
uint32_t *starti = (uint32_t*)&memory[164];
Yes, you can convert an unsigned char* pointer value to uint32_t* (using either a C-style cast or a reinterpret_cast) -- but that doesn't mean you can necessarily use the result.
The result of such a conversion might not point to an address that's properly aligned to hold a uint32_t object. For example, an unsigned char* might point to an odd address; if uint32_t requires even alignment, you'll have undefined behavior when you try to dereference the result.
If you can guarantee somehow that the unsigned char* does point to a properly aligned address, you should be ok.
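When the alignment cannot be guaranteed, a commonly used alternative is to copy the four bytes with `std::memcpy` instead of dereferencing a converted pointer. The sketch below uses hypothetical helper names (`load_u32_native`, `unaligned_read_demo`); the value it returns is in the host's byte order, so it does not address endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy four bytes into a uint32_t. This is valid for any alignment of src
// because the bytes are accessed only as unsigned char.
std::uint32_t load_u32_native(const unsigned char* src) {
    std::uint32_t v;
    std::memcpy(&v, src, sizeof v);
    return v;
}

void unaligned_read_demo() {
    unsigned char memory[8] = {};
    const std::uint32_t value = 0xAABBCCDDu;
    // Deliberately store at an odd offset, where a uint32_t* might be misaligned.
    std::memcpy(memory + 1, &value, sizeof value);
    assert(load_u32_native(memory + 1) == value);
}
```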
I am used to BDS2006 C++ but anyway this should work fine on other compilers too
char memory[164];
int *p0,*p1,*p2;
p0=((int*)((void*)(memory))); // p0 starts from start
p1=((int*)((void*)(memory+64))); // p1 starts from 64th char
p2=((int*)((void*)(&memory[64]))); // p2 starts from 64th char
You can use reinterpret_cast as suggested by faranwath but please understand the risk of going that route.
The value of what you get back will be radically different in a little endian system vs a big endian system. Your method will work in both cases.
I want to use a function that expects data like this:
void process(char *data_in, int data_len);
So it's just processing some bytes really.
But I'm more comfortable working with "unsigned char" when it comes to raw bytes (it somehow "feels" more right to deal with positive 0 to 255 values only), so my question is:
Can I always safely pass a unsigned char * into this function?
In other words:
Is it guaranteed that I can safely convert (cast) between char and unsigned char at will, without any loss of information?
Can I safely convert (cast) between pointers to char and unsigned char at will, without any loss of information?
Bonus: Is the answer same in C and C++?
The short answer is yes if you use an explicit cast, but to explain it in detail, there are three aspects to look at:
1) Legality of the conversion
Converting between signed T* and unsigned T* (for some type T) in either direction is generally possible because the source type can first be converted to void * (this is a standard conversion, §4.10), and the void * can be converted to the destination type using an explicit static_cast (§5.2.9/13):
static_cast<unsigned char*>(static_cast<void *>(data_in))
This can be abbreviated (§5.2.10/7) as
reinterpret_cast<unsigned char *>(data_in)
because char is a standard-layout type (§3.9.1/7,8 and §3.9/9) and signedness does not change alignment (§3.9.1/1). It can also be written as a C-style cast:
(unsigned char *)(data_in)
Again, this works both ways, from unsigned* to signed* and back. There is also a guarantee that if you apply this procedure one way and then back, the pointer value (i.e. the address it's pointing to) won't have changed (§5.2.10/7).
All of this applies not only to conversions between signed char * and unsigned char *, but also to char */unsigned char * and char */signed char *, respectively. (char, signed char and unsigned char are formally three distinct types, §3.9.1/1.)
To be clear, it doesn't matter which of the three cast-methods you use, but you must use one. Merely passing the pointer will not work, as the conversion, while legal, is not a standard conversion, so it won't be performed implicitly (the compiler will issue an error if you try).
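As an illustration of the three spellings, the sketch below checks that they yield the same address and that converting back restores the original pointer. `casts_agree` is a hypothetical helper name.

```cpp
#include <cassert>

// Returns true if all three cast forms produce the same pointer value
// and the round trip gives back the original pointer.
bool casts_agree(unsigned char* bytes) {
    char* a = static_cast<char*>(static_cast<void*>(bytes));  // via void*
    char* b = reinterpret_cast<char*>(bytes);                 // abbreviated form
    char* c = (char*)bytes;                                   // C-style cast
    unsigned char* back = reinterpret_cast<unsigned char*>(b);
    return a == b && b == c && back == bytes;
}
```

Writing `char* d = bytes;` without any cast is rejected by the compiler, which matches the point that the conversion is never performed implicitly.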
2) Well-definedness of the access to the values
What happens if, inside the function, you dereference the pointer, i.e. you perform *data_in to retrieve a glvalue for the underlying character; is this well-defined and legal? The relevant rule here is the strict-aliasing rule (§3.10/10):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
[...]
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
[...]
a char or unsigned char type.
Therefore, accessing a signed char (or char) through an unsigned char* (or char*) and vice versa is not disallowed by this rule – you should be able to do this without problems.
3) Resulting values
After dereferencing the type-converted pointer, will you be able to work with the value you get? It's important to bear in mind that the conversion and dereferencing of the pointer described above amounts to reinterpreting (not changing!) the bit pattern stored at the address of the character. So what happens when a bit pattern for a signed character is interpreted as that of an unsigned character (or vice versa)?
When going from unsigned to signed, the typical effect will be that for values between 0 and 127 nothing happens, and values of 128 and above become negative. Similar in reverse: When going from signed to unsigned, negative values will appear as values of 128 or greater.
But this behaviour isn't actually guaranteed by the Standard. The only thing the Standard guarantees is that for all three types, char, unsigned char and signed char, all bits (not necessarily 8, btw) are used for the value representation. So if you interpret one as the other, make a few copies and then store it back to the original location, you can be sure that there will be no information loss (as you required), but you won't necessarily know what the values actually mean (at least not in a fully portable way).
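The "no information loss" claim can be checked with a small experiment. This is only a sketch with a hypothetical function name; it copies every byte value through a `char` view and back and then compares the raw storage.

```cpp
#include <cassert>
#include <cstring>

// Copy bytes out through a char view and back through an unsigned char view,
// then compare the stored bit patterns of the first and last buffer.
bool bit_patterns_survive() {
    unsigned char original[256];
    for (int i = 0; i < 256; ++i) original[i] = static_cast<unsigned char>(i);

    char as_char[256];
    const char* view = reinterpret_cast<const char*>(original);
    for (int i = 0; i < 256; ++i) as_char[i] = view[i];

    unsigned char restored[256];
    const unsigned char* back = reinterpret_cast<const unsigned char*>(as_char);
    for (int i = 0; i < 256; ++i) restored[i] = back[i];

    return std::memcmp(original, restored, sizeof original) == 0;
}
```

The check says nothing about what the intermediate `char` values mean numerically; it only confirms that the bytes come back unchanged.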
unsigned char or signed char is just interpretation: there is no conversion happening.
Since you are processing bytes, to show intent, it would be better to declare as
void process(unsigned char *data_in, int data_len);
[As noted by an editor: A plain char may be either a signed or an unsigned type. The C and C++ standards explicitly allow either (it is always a separate type from either unsigned char or signed char, but has the same range as one of them)]
Yes, you can always convert from char to unsigned char & vice versa without problems. If you run the following code and compare it with an ASCII table (ref. http://www.asciitable.com/), you can see the proof for yourself, and how C/C++ deal with the conversions - both deal with them in exactly the same way:
#include <stdio.h>
int main(void) {
    //converting from char to unsigned char
    char c = 0;
    printf("%zu byte(s)\n", sizeof(char)); // result: 1 byte (8 bits on typical platforms), so a char can store 2^8=256 values.
    for (int i=0; i<256; i++){
        printf("int value: %d - from: %c\tto: %c\n", c, c, (unsigned char) c);
        c++;
    }
    //converting from unsigned char to char
    unsigned char uc = 0;
    printf("\n%zu byte(s)\n", sizeof(unsigned char));
    for (int i=0; i<256; i++){
        printf("int value: %d - from: %c\tto: %c\n", uc, uc, (char) uc);
        uc++;
    }
}
I will not post the output because it has too many lines! It can be noticed in the output that in the first half of each section, i.e. from i=0:127, the conversion from chars to unsigned chars and vice-versa works well, without any modification or loss.
However, for i=128:255 the chars and the unsigned chars print different integer values, because unsigned char holds values in [0:255] while char (when it is signed) holds values in [-128:127]. Nevertheless, the behaviour in this second half is irrelevant if, as is common in C/C++, you only deal with chars/unsigned chars as ASCII characters, which can take only 128 different values; the other 128 values (negative for signed chars, 128 and above for unsigned chars) are never used.
If you never put a value in a char that doesn't represent a character, and you never put a value in an unsigned char that doesn't represent a character, everything will be OK!
extra: even if you use UTF-8 or other encodings (for special characters) in your strings with C/C++, everything with this kind of casts would be OK, for instance, using UTF-8 encoding (ref. http://lwp.interglacial.com/appf_01.htm):
char hearts[] = {0xe2, 0x99, 0xa5, 0x00};
char diamonds[] = {0xe2, 0x99, 0xa6, 0x00};
char clubs[] = {0xe2, 0x99, 0xa3, 0x00};
char spades[] = {0xe2, 0x99, 0xa0, 0x00};
printf("hearts (%s)\ndiamonds (%s)\nclubs (%s)\nspades (%s)\n\n", hearts, diamonds, clubs, spades);
the output of that code will be:
hearts (♥)
diamonds (♦)
clubs (♣)
spades (♠)
even if you cast each of its chars to unsigned chars.
so:
"can I always safely pass a unsigned char * into this function?"
yes!
"is it guaranteed that I can safely convert (cast) between char and unsigned char at will, without any loss of information?"
yes!
"can I safely convert (cast) between pointers to char and unsigned char at will, without any loss of information?"
yes!
"is the answer same in C and C++?"
yes!
Semantically, passing between unsigned char * and char * is safe, and so is casting between them; the same holds in C++.
However, consider the following sample code:
#include <stdio.h>
void process_unsigned(unsigned char *data_in, int data_len) {
int i=data_len;
unsigned short product=1;
for(; i--; product*=data_in[i])
;
for(i=sizeof(product); i--; ) {
data_in[i]=((unsigned char *)&product)[i];
printf("%d\r\n", data_in[i]);
}
}
void process(char *data_in, int data_len) {
int i=data_len;
unsigned short product=1;
for(; i--; product*=data_in[i])
;
for(i=sizeof(product); i--; ) {
data_in[i]=((unsigned char *)&product)[i];
printf("%d\r\n", data_in[i]);
}
}
int main(void) {
    unsigned char a[]={1, 255}; /* 255 has the same bit pattern as (char)-1 */
    char b[]={1, -1};
    process_unsigned(a, sizeof(a));
    process(b, sizeof(b));
    return 0;
}
output:
0
255
-1
-1
The code inside process_unsigned and process is IDENTICAL. The only difference is unsigned versus signed. This sample shows that the code inside the black box is affected by the sign, and that nothing is guaranteed between the callee and the caller.
Thus I would say that only the passing itself is safe; nothing beyond that is guaranteed.
You can pass a pointer to a different kind of char, but you may need to explicitly cast it. The pointers are guaranteed to be the same size and the same values. There isn't going to be any information loss during the conversion.
If you want to convert char to unsigned char inside the function, you just assign a char value to an unsigned char variable or cast the char value to unsigned char.
If you need to convert unsigned char to char without data loss, it's a bit harder, but still possible:
#include <limits.h>
char uc2c(unsigned char c)
{
#if CHAR_MIN == 0
// char is unsigned
return c;
#else
// char is signed
if (c <= CHAR_MAX)
return c;
else
// ASSUMPTION 1: int is larger than char
// ASSUMPTION 2: integers are 2's complement
return c - CHAR_MAX - 1 - CHAR_MAX - 1;
#endif
}
This function will convert unsigned char to char in such a way that the returned value can be converted back to the same unsigned char value as the parameter.
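The round-trip property can be tested exhaustively, since there are only `UCHAR_MAX + 1` inputs. The sketch below repeats `uc2c` with explicit casts so that it compiles cleanly as C++; `uc2c_round_trips` is a hypothetical test helper, and the two assumptions from the function's comments still apply.

```cpp
#include <cassert>
#include <climits>

char uc2c(unsigned char c) {
#if CHAR_MIN == 0
    // char is unsigned: the value fits as-is.
    return static_cast<char>(c);
#else
    // char is signed: map the upper half onto the negative range.
    if (c <= CHAR_MAX) return static_cast<char>(c);
    return static_cast<char>(c - CHAR_MAX - 1 - CHAR_MAX - 1);
#endif
}

// Convert every possible unsigned char to char and back again.
bool uc2c_round_trips() {
    for (int i = 0; i <= UCHAR_MAX; ++i) {
        const unsigned char in = static_cast<unsigned char>(i);
        const unsigned char out = static_cast<unsigned char>(uc2c(in));
        if (out != in) return false;
    }
    return true;
}
```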
You really need to view the code to process() to know if you can safely pass in unsigned characters. If the function uses the characters as an index into an array, then no, you can't use unsigned data.
Is it safe to convert, say, from an unsigned char * to a signed char * (or just a char *)?
The access is well-defined, you are allowed to access an object through a pointer to signed or unsigned type corresponding to the dynamic type of the object (3.10/15).
Additionally, signed char is guaranteed not to have any trap values and as such you can safely read through the signed char pointer no matter what the value of the original unsigned char object was.
You can, of course, expect that the values you read through one pointer will be different from the values you read through the other one.
Edit: regarding sellibitze's comment, this is what 3.9.1/1 says.
A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.9); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers.
So indeed it seems that signed char may have trap values. Nice catch!
The conversion should be safe, as all you're doing is converting from one type of character to another, which should have the same size. Just be aware of what sort of data your code is expecting when you dereference the pointer, as the numeric ranges of the two data types are different. (i.e. if the number pointed to by the pointer was originally positive as unsigned, it might become a negative number once the pointer is converted to a signed char* and you dereference it.)
Casting changes the type, but does not affect the bit representation. Casting from unsigned char to signed char does not change the stored bits at all, but it affects the meaning of the value.
Here is an example:
#include <stdio.h>
int main(int argc, char** argv) {
    /* example 1 */
    unsigned char a_unsigned_char = 192;
    signed char a_signed_char = a_unsigned_char;
    printf("%d, %d\n", a_unsigned_char, a_signed_char); //192, -64
    /* example 2 */
    unsigned char b_unsigned_char = 32;
    signed char b_signed_char = b_unsigned_char;
    printf("%d, %d\n", b_unsigned_char, b_signed_char); //32, 32
    return 0;
}
In the first example, you have an unsigned char with value 192, or 11000000 in binary. After the cast to signed char, the value is still 11000000, but that happens to be the 2s-complement representation of -64. Signed values are stored in 2s-complement representation.
In the second example, our unsigned initial value (32) is less than 128, so it seems unaffected by the cast. The binary representation is 00100000, which is still 32 in 2s-complement representation.
To "safely" cast from unsigned char to signed char, ensure the value is less than 128.
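The "less than 128" rule can be enforced in code. This is a sketch with a hypothetical function name, `to_signed_checked`; it uses `SCHAR_MAX` rather than the literal 127 and throws when the value would not survive the conversion unchanged.

```cpp
#include <cassert>
#include <climits>
#include <stdexcept>

// Convert only when the numeric value is representable in signed char.
signed char to_signed_checked(unsigned char u) {
    if (u > SCHAR_MAX) {
        throw std::out_of_range("value does not fit in signed char");
    }
    return static_cast<signed char>(u);
}
```

With the values from the example above, 32 converts unchanged while 192 is rejected instead of silently turning into a negative number.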
It depends on how you are going to use the pointer. You are just converting the pointer type.
You can safely convert an unsigned char* to a char * as the function you are calling will be expecting the behavior from a char pointer, but, if your char value goes over 127 then you will get a result that will not be what you expected, so just make certain that what you have in your unsigned array is valid for a signed array.
I've seen it go wrong in a few ways, converting to a signed char from an unsigned char.
One, if you're using it as an index to an array, that index could go negative.
Secondly, if inputted to a switch statement, it may result in a negative input which often is something the switch isn't expecting.
Third, it has different behavior on an arithmetic right shift
int x = ...;
char c = 128;
unsigned char u = 128;
c >> x;
has a different result than
u >> x;
Because the former is sign-extended and the latter isn't.
Fourth, a signed character causes underflow at a different point than an unsigned character.
So a common overflow check,
(c + x > c)
could return a different result than
(u + x > u)
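The index and shift cases can be reproduced with a short sketch. It uses `signed char` explicitly because the signedness of plain `char` varies between platforms, and it assumes two's complement with an arithmetic right shift for negative values (guaranteed since C++20, implementation-defined but usual before that). All function names are hypothetical.

```cpp
#include <cassert>

// Both operands are promoted to int before the shift, so the sign of the
// original value decides which bits come in from the left.
int shift_signed(signed char c, int x)     { return c >> x; }
int shift_unsigned(unsigned char u, int x) { return u >> x; }

// A negative value used as an array index would point before the table.
bool safe_as_index(signed char c) { return c >= 0; }

void shift_demo() {
    const unsigned char u = 128;                        // bit pattern 1000 0000
    const signed char c = static_cast<signed char>(u);  // -128 on two's complement
    assert(shift_unsigned(u, 1) == 64);
    assert(shift_signed(c, 1) < 0);   // the sign is carried along
    assert(!safe_as_index(c));
}
```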
Safe if you are dealing with only ASCII data.
I'm astonished it hasn't been mentioned yet: Boost numeric cast should do the trick - but only for the data of course.
Pointers are always pointers. By casting them to a different type, you only change the way the compiler interprets the data pointed to.