I am reading Beej's Guide to Network Programming and I am having trouble understanding a function. The function expects a char * pointer, but it dereferences the pointer, casts the result to (unsigned long int), and performs some bitwise operations. Why couldn't we just pass it as an
(unsigned int *) instead of an (unsigned char *)? Also, if the parameter were replaced by (void *) and inside the code we did something like:
*(unsigned long int *)buf[0] << 24
will we get the same result? (Sorry this is my first time asking a question here so let me know if any more info is required).
unsigned long int unpacku32(unsigned char *buf)
{
return ((unsigned long int)buf[0]<<24) |
((unsigned long int)buf[1]<<16) |
((unsigned long int)buf[2]<< 8) |
buf[3];
}
What you're suggesting is not guaranteed to work. Unless buf points to an actual unsigned long, you're attempting to read an object of one type as another which is not allowed (unless you're reading as an unsigned char). There could be further issues if the pointer value you create is not properly aligned for its type.
Then there is also the issue of endianness. Bytes sent over a network are typically sent in big-endian format, i.e. most significant byte first. If your system is little-endian, it will interpret the bytes in the reverse order.
The function you posted demonstrates the proper way of deserializing an unsigned long from a byte buffer in a standard compliant manner.
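For reference, here is a minimal sketch of the same byte-by-byte technique using a fixed-width type. The name unpack_u32_be is made up for illustration, and it assumes the four bytes arrive most significant byte first (network order), as in the book's function.

```cpp
#include <cstdint>

// Hypothetical helper (not from the book): decode a 32-bit big-endian value
// from four bytes. Each byte is widened before shifting, so no bits are lost,
// and the buffer may start at any address.
std::uint32_t unpack_u32_be(const unsigned char *buf)
{
    return (static_cast<std::uint32_t>(buf[0]) << 24) |
           (static_cast<std::uint32_t>(buf[1]) << 16) |
           (static_cast<std::uint32_t>(buf[2]) <<  8) |
            static_cast<std::uint32_t>(buf[3]);
}
```

Because it only ever reads individual unsigned char values, the result is the same on little-endian and big-endian hosts.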
That would make it dependent on the endianness of the platform. So we pick out the parts in the defined order to make it platform neutral.
buf[0] is treated as an 8-bit unsigned value. If we do this:
(unsigned long int)buf[0] << 24, the cast tells the compiler to treat it not as an 8-bit value but as an unsigned long (at least 32 bits, 64 bits on many 64-bit platforms), so there is room for the shifted bits.
Only buf[0] is shifted; buf[1] and the other elements are not involved in that shift.
If you want to convert, let's say, the string "aabbccd" to an unsigned long and you don't care about endianness, you can do it like below:
char* str = const_cast<char *>("aabbccd\0");
unsigned long value = *(reinterpret_cast<unsigned long *>(str));
std::cout << value << std::endl;
std::cout << reinterpret_cast<char *>(&value) << std::endl;
It should be pointed out that unsigned long can hold at most 8 chars here, because it is a 64-bit integer on this platform.
However, if many platforms are going to use the same data, doing it like this may not be enough due to endianness. The approach given in your book is, as someone mentioned, platform neutral.
The function expects a char * pointer but it dereferences the
pointer and casts it to a (unsigned long int) and perform some
bitwise operations.
Actually, what the code does is use the array index operator to pull out the first byte from the buffer, cast that to an unsigned long int, and then do some bitwise operations. The pointer that's dereferenced is an unsigned char *; it has nothing to do with long integers.
Why couldn't we just pass it as a (unsigned int *) instead of
(unsigned char *).
Because it isn't a pointer to any kind of integer. It's a pointer to a buffer of unsigned char, i.e. bytes. Treating a pointer as if it were a pointer to a different type is likely to lead to a violation of the "Strict Aliasing Rule" (which I encourage you to read about).
Also if the parameter was replaced by (void *) and then inside code we
did some thing like *(unsigned long int *)buf[0] << 24 will we get
the same result?
No. If you define buf as a void*, then buf[0] is a meaningless expression. If buf is defined as, or cast to, an unsigned long int *, then buf[0] is an unsigned long int, not the unsigned char that the algorithm is expecting. There will almost certainly be too many bits set (as many as 64, not 8) and the result of the expression will be invalid.
error: invalid static_cast from type ‘unsigned char*’ to type ‘uint32_t* {aka unsigned int*}’
uint32_t *starti = static_cast<uint32_t*>(&memory[164]);
I've allocated an array of chars, and I want to read 4 bytes as a 32bit int, but I get a compiler error.
I know that I can bit shift, like this:
(start[0] << 24) + (start[1] << 16) + (start[2] << 8) + start[3];
And it will do the same thing, but this is a lot of extra work.
Is it possible to just cast those four bytes as an int somehow?
static_cast is meant to be used for "well-behaved" casts, such as double -> int.
You must use reinterpret_cast:
uint32_t *starti = reinterpret_cast<uint32_t*>(&memory[164]);
Or, if you are up to it, C-style casts:
uint32_t *starti = (uint32_t*)&memory[164];
Yes, you can convert an unsigned char* pointer value to uint32_t* (using either a C-style cast or a reinterpret_cast) -- but that doesn't mean you can necessarily use the result.
The result of such a conversion might not point to an address that's properly aligned to hold a uint32_t object. For example, an unsigned char* might point to an odd address; if uint32_t requires even alignment, you'll have undefined behavior when you try to dereference the result.
If you can guarantee somehow that the unsigned char* does point to a properly aligned address, you should be ok.
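If you can't guarantee alignment, one common way around it is to copy the bytes into a real uint32_t with memcpy. The sketch below uses a made-up name, load_u32_host, and returns the value in the host's byte order, so it behaves like the cast would on a well-aligned address.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a uint32_t (host byte order) from any address.
// memcpy has no alignment requirement, and compilers typically turn this
// into a single load where the hardware allows it.
std::uint32_t load_u32_host(const unsigned char *p)
{
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

With the question's buffer this would be called as load_u32_host(&memory[164]), assuming memory holds at least 168 bytes.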
I am used to BDS2006 C++ but anyway this should work fine on other compilers too
char memory[164];
int *p0,*p1,*p2;
p0=((int*)((void*)(memory))); // p0 starts from start
p1=((int*)((void*)(memory+64))); // p1 starts from 64th char
p2=((int*)((void*)(&memory[64]))); // p2 starts from 64th char
You can use reinterpret_cast as suggested by faranwath but please understand the risk of going that route.
The value of what you get back will be radically different in a little endian system vs a big endian system. Your method will work in both cases.
What I have is this
struct Record
{
unsigned char cat;
unsigned char len[2]={0x00, 0x1b}; // can't put short here because that
// would change the size of the struct
unsigned char dat[253];
};
Record record;
unsigned short recordlen = *((unsigned short*)record.len);
This results in recordlen=0x1b00 instead of 0x001b.
Same with *reinterpret_cast<unsigned short*>(record.len)
Can you explain why ? How should I be doing this ?
What you encounter is called "endianness". In x86, all numeric variables are stored "little endian", meaning the least-significant byte comes first.
From the Wikipedia page:
The little-endian system has the property that the same value can be read from memory at different lengths without using different addresses.
This depends on the endianness of your CPU. See Wikipedia.
In your case you have "little endian", which means that least significant bytes come first. This is convenient when you want to convert numbers to different byte sizes: if you use a long int to represent a short number, its representation is the same as if it were a short number, only it has additional zeroes at the end.
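To make the difference concrete, here is a small sketch that interprets the same two bytes under each byte order. The helper names decode_le16 and decode_be16 are made up for illustration.

```cpp
// Hypothetical helpers: interpret two bytes as a 16-bit value.

// Little-endian: the first byte is the least significant.
unsigned short decode_le16(const unsigned char *p)
{
    return static_cast<unsigned short>(p[0] | (p[1] << 8));
}

// Big-endian: the first byte is the most significant.
unsigned short decode_be16(const unsigned char *p)
{
    return static_cast<unsigned short>((p[0] << 8) | p[1]);
}
```

For the bytes {0x00, 0x1b}, decode_le16 gives 0x1b00 (what the pointer cast produced on x86) and decode_be16 gives 0x001b (what the question expected).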
Can you explain why?
Because you cannot assume a specific endianness of your computer architecture.
The natural follow-up question is what to do about it. Fortunately, you can force a specific byte order by calling one of these functions: htonl, htons, ntohl, or ntohs. They work regardless of the computer architecture on which you run them.
On the sending end, you convert from host order to network order; on the receiving end, you convert from network order to host order.
// Sending end
unsigned short recordlen = calculate_len();
*reinterpret_cast<unsigned short*>(record.len) = htons(recordlen);
// Receiving end
unsigned short recordlen = ntohs(*reinterpret_cast<unsigned short*>(record.len));
unsigned short recordlen = *((unsigned short*)record.len);
This is broken. record.len doesn't point to an unsigned short. Telling the compiler it does is just lying.
I presume you want:
unsigned short recordlen = static_cast<unsigned short>(record.len[0]) * 256 +
static_cast<unsigned short>(record.len[1]);
Or, if you like it better:
unsigned short recordlen = (static_cast<unsigned short>(record.len[0]) << 8) |
static_cast<unsigned short>(record.len[1]);
If not, code whatever it is you actually want.
I have downloaded an image and it is saved in a std::string.
Now I want to use/open it with following conditions:
typedef uint8_t byte //1 byte unsigned integer type.
static open(const byte * data, long size)
How do I cast from string to byte* ?
/EDIT:
i have already tried this:
_data = std::vector<byte>(s.begin(), s.end());
//_data = std::vector<uint8_t>(s.begin(), s.end()); //also fails, same error
_p = &_data[0];
open(_p, _data.size())
but i get:
undefined reference to 'open(unsigned char const*, long)'
why does it interpret byte wrongly as char?!
/EDIT2:
just to test it i changed to function call to
open(*_p, _data.size())
but then i get:
error: no matching function for call to 'open(unsigned char&, size_t)'
[...] open(const byte*, long int) <near match>
So the function is definitely found...
Two possibilities:
1) the common one. On your system, char is either 2's complement or else unsigned, and hence it is "safe" to read chars as unsigned chars, and (if char is signed) the result is the same as converting from signed to unsigned.
In which case, use reinterpret_cast<const uint8_t*>(string.data()).
2) the uncommon one. On your system, char is signed and not 2's complement and hence for char *ptr pointing to a negative value the result of *(uint8_t*)ptr is not equal to (uint8_t)(*ptr). Then depending what open actually does with the data, the above might be fine or you might have to copy the data out of a string and into a container of uint8_t, converting as you go. For example, vector<uint8_t> v(string.begin(), string.end());. Then use &v[0] provided that the length is not 0 (unlike string, you aren't permitted to take a pointer to the data of an empty vector even if you never dereference it).
Non-2's-complement systems are approximately non-existent, and even if there was one I think it's fairly unlikely that a system on which char was signed and not 2's complement would provide uint8_t at all (because if it does then it must also provide int8_t). So (2) only serves pedantic completeness.
why does it interpret byte wrongly as char
It isn't wrong, uint8_t is a typedef for unsigned char on your system.
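As a rough sketch of option (1): sum_bytes below is a made-up stand-in for the real open(), used only so the call can be checked, and sum_string_bytes is likewise a hypothetical wrapper.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the library's open(): it just sums the bytes.
unsigned long sum_bytes(const std::uint8_t *data, long size)
{
    unsigned long total = 0;
    for (long i = 0; i < size; ++i)
        total += data[i];
    return total;
}

// View the string's storage as bytes without copying it.
unsigned long sum_string_bytes(const std::string &s)
{
    const std::uint8_t *p = reinterpret_cast<const std::uint8_t *>(s.data());
    return sum_bytes(p, static_cast<long>(s.size()));
}
```

The pointer is only valid while the string is alive and unmodified, so the string has to outlive whatever the callee does with the data.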
std::string string_with_image;
string_with_image.data();
Hi, I'm trying to use the code from http://blog.firetree.net/2006/08/23/nasa-srtm-elevation-data/ with no success. After much chasing around I found where it is failing, but I have no idea how to fix it. Please help; this has been doing my head in for about 6 hours.
This is the line that fails. data is a void pointer to a memory mapped file.
unsigned short datum=((unsigned short*)data)[i];
I'm on OpenSuse using the gcc compiler. I'm on a 64bit system.
Thanks in advance.
I think this may be caused by memory alignment.
On some platforms a pointer value cannot safely be converted to certain pointer types.
For example, if a platform requires an int* to be 4-byte aligned, then 0x12345 is fine as a void* or char*, but dereferencing it as an int* can crash.
In your situation,
you can cast the void pointer to unsigned char* first, then combine 2 unsigned chars into an unsigned short:
// The i-th 16-bit value occupies bytes 2*i and 2*i+1 (assuming a 2-byte short)
unsigned char a = ((unsigned char*)data)[2*i];
unsigned char b = ((unsigned char*)data)[2*i+1];
unsigned short datum;
if (platform_is_little_endian()) {
datum = (b << 8) | a; // shift by 8 bits, not by sizeof(unsigned char)
}
else {
// platform is big endian
datum = (a << 8) | b;
}
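If the file's byte order is known in advance, the platform check can be dropped entirely. The sketch below assumes the file stores each sample most significant byte first; that is an assumption you should check against the data format's documentation, and read_u16_be is a made-up name.

```cpp
#include <cstddef>

// Hypothetical helper: read the i-th 16-bit sample from a raw buffer,
// ASSUMING the file stores samples big-endian (most significant byte first).
// It reads single bytes only, so alignment and host endianness do not matter.
unsigned short read_u16_be(const void *data, std::size_t i)
{
    const unsigned char *bytes = static_cast<const unsigned char *>(data);
    return static_cast<unsigned short>((bytes[2 * i] << 8) | bytes[2 * i + 1]);
}
```

The caller is still responsible for keeping i within the mapped file, which turned out to matter in this thread.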
If you're saying it crashes at that point, then I would imagine you were reading outside of the array. But storing a pointer to an unsigned short as an unsigned short is interesting to say the least, does that even compile?
Solved by a combination of Donald Tang's method and the realization that the program was reading outside the array. For other potential users: the problem occurs when finding the values of num_rows and num_cols, as the author's square root function doesn't compute a square root.
I have been working on a legacy C++ application and am definitely outside of my comfort-zone (a good thing). I was wondering if anyone out there would be so kind as to give me a few pointers (pun intended).
I need to cast 2 bytes in an unsigned char array to an unsigned short. The bytes are consecutive.
For an example of what I am trying to do:
I receive a string from a socket and place it in an unsigned char array. I can ignore the first byte, and then the next 2 bytes should be converted to an unsigned short. This will be on Windows only, so there are no big/little-endian issues (that I am aware of).
Here is what I have now (not working obviously):
//packetBuffer is an unsigned char array containing the string "123456789" for testing
//I need to convert bytes 2 and 3 into the short, 2 being the most significant byte
//so I would expect to get 515 (2*256 + 3) instead all the code I have tried gives me
//either errors or 2 (only converting one byte)
unsigned short myShort;
myShort = static_cast<unsigned_short>(packetBuffer[1])
Well, you are widening the char into a short value. What you want is to interpret two bytes as a short. static_cast cannot cast from unsigned char* to unsigned short*. You have to cast to void*, then to unsigned short*:
unsigned short *p = static_cast<unsigned short*>(static_cast<void*>(&packetBuffer[1]));
Now, you can dereference p and get the short value. But the problem with this approach is that you cast from unsigned char*, to void* and then to some different type. The Standard doesn't guarantee the address remains the same (and in addition, dereferencing that pointer would be undefined behavior). A better approach is to use bit-shifting, which will always work:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
This is probably well below what you care about, but keep in mind that you could easily get an unaligned access doing this. x86 is forgiving and the abort that the unaligned access causes will be caught internally and will end up with a copy and return of the value so your app won't know any different (though it's significantly slower than an aligned access). If, however, this code will run on a non-x86 (you don't mention the target platform, so I'm assuming x86 desktop Windows), then doing this will cause a processor data abort and you'll have to manually copy the data to an aligned address before trying to cast it.
In short, if you're going to be doing this access a lot, you might look at making adjustments to the code so as not to have unaligned reads, and you'll see a performance benefit.
unsigned short myShort = *(unsigned short *)&packetBuffer[1];
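The bit shift above can go wrong on some compilers:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
If packetBuffer holds bytes (8 bits wide) and the compiler performs the shift at that width, the shift turns packetBuffer[1] into zero, leaving you with only packetBuffer[2]. A conforming C or C++ compiler promotes packetBuffer[1] to int before shifting, so no bits are lost there; the risk is with non-conforming compilers, typically for small embedded targets.
Even so, this is still preferable to pointers. To rule the problem out, I spend a few extra lines of code; except with literally zero optimization, it results in the same machine code: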
unsigned short p;
p = packetBuffer[1]; p <<= 8; p |= packetBuffer[2];
Or to save some clock cycles and not shift the bits off the end:
unsigned short p;
p = (((unsigned short)packetBuffer[1])<<8) | packetBuffer[2];
You have to be careful with pointers, the optimizer will bite you, as well as memory alignments and a long list of other problems. Yes, done right it is faster, done wrong the bug can linger for a long time and strike when least desired.
Say you were lazy and wanted to do some 16 bit math on an 8 bit array. (little endian)
unsigned short *s;
unsigned char b[10];
s=(unsigned short *)&b[0];
if(b[0]&7)
{
*s = *s+8;
*s &= ~7;
}
do_something_With(b);
*s=*s+8;
do_something_With(b);
*s=*s+8;
do_something_With(b);
There is no guarantee that a perfectly bug free compiler will create the code you expect. The byte array b sent to the do_something_With() function may never get modified by the *s operations. Nothing in the code above says that it should. If you don't optimize your code then you may never see this problem (until someone does optimize or changes compilers or compiler versions). If you use a debugger you may never see this problem (until it is too late).
The compiler doesn't see the connection between s and b, they are two completely separate items. The optimizer may choose not to write *s back to memory because it sees that *s has a number of operations so it can keep that value in a register and only save it to memory at the end (if ever).
There are three basic ways to fix the pointer problem above:
Declare s as volatile.
Use a union.
Use a function or functions whenever changing types.
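The third option might look like the minimal sketch below. The name add_to_u16 is made up, and it assumes the 16-bit value lives in the first bytes of the array in host byte order, like the pointer version above.

```cpp
#include <cstring>

// Hypothetical helper: add `delta` to the unsigned short stored at the start
// of b. The value is copied out, modified, and copied back, so every change
// is made through the byte array itself and the compiler can see it.
void add_to_u16(unsigned char *b, unsigned short delta)
{
    unsigned short s;
    std::memcpy(&s, b, sizeof s);
    s = static_cast<unsigned short>(s + delta);
    std::memcpy(b, &s, sizeof s);
}
```

Each `*s = *s + 8;` in the example would then become add_to_u16(b, 8), with no unsigned short pointer into b at all.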
You should not cast an unsigned char pointer into an unsigned short pointer (or, for that matter, cast from a pointer to a smaller data type to a pointer to a larger one). This is because the compiler assumes the address is correctly aligned for the larger type. A better approach is to shift the bytes into a real unsigned short object, or memcpy them into an unsigned short array.
No doubt, you can adjust the compiler settings to get around this limitation, but this is a very subtle thing that will break in the future if the code gets passed around and reused.
Maybe this is a very late solution, but I just want to share it with you. When you want to convert primitives or other types, you can use a union. See below:
union CharToStruct {
char charArray[2];
unsigned short value;
};
short toShort(char* value){
CharToStruct cs;
cs.charArray[0] = value[1]; // on a little-endian host the short's most significant byte is not the first byte of the char array
cs.charArray[1] = value[0];
return cs.value;
}
When you create an array with the hex values below and call the toShort function, you will get a short value of 3.
char array[2];
array[0] = 0x00;
array[1] = 0x03;
short i = toShort(array);
cout << i << endl; // or printf("%hd", i);
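static_cast has a different syntax, and you need to work with pointers. Note that static_cast cannot convert unsigned char* to unsigned short* directly, so the pointer version needs reinterpret_cast (with the alignment caveats mentioned in the other answers):
unsigned short *myShort = reinterpret_cast<unsigned short*>(&packetBuffer[1]);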
Did nobody see the input was a string?
/* If it is a string as explicitly stated in the question.
*/
int byte1 = packetBuffer[1] - '0'; // convert 1st byte from char to number.
int byte2 = packetBuffer[2] - '0';
unsigned short result = (byte1 * 256) + byte2;
/* Alternatively if is an array of bytes.
*/
int byte1 = packetBuffer[1];
int byte2 = packetBuffer[2];
unsigned short result = (byte1 * 256) + byte2;
This also avoids the alignment problems that most of the other solutions may have on certain platforms. Note: a short is at least 16 bits. Some systems will give you a memory error if you try to dereference a short pointer that is not 2-byte aligned (or aligned to whatever sizeof(short) is on your system)!
char packetBuffer[] = {1, 2, 3};
unsigned short myShort = * reinterpret_cast<unsigned short*>(&packetBuffer[1]);
I (had to) do this all the time. Big endian is an obvious problem. What will really get you is incorrect data when the machine dislikes misaligned reads (and writes)!
You may want to write a test case and an assert to check that it reads properly. Then, when run on a big-endian machine or, more importantly, a machine that dislikes misaligned reads, an assertion failure will occur instead of a weird, hard-to-trace 'bug' ;)
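One possible shape for such a check is sketched below; both function names are made up. It covers the byte-order assumption only. A misaligned read is undefined behavior, so it cannot be probed reliably from inside the program.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical self-test: true if copying the bytes {0x02, 0x03} into an
// unsigned short yields 0x0302, i.e. a little-endian host with 16-bit short.
bool host_reads_little_endian_u16()
{
    const unsigned char probe[2] = {0x02, 0x03};
    unsigned short v = 0;
    std::memcpy(&v, probe, sizeof probe);
    return sizeof(unsigned short) == 2 && v == 0x0302;
}

// Call once at startup so a wrong platform fails loudly, not silently.
void check_platform_assumptions()
{
    assert(host_reads_little_endian_u16() && "code assumes a little-endian host");
}
```

Keep in mind that assert is compiled out when NDEBUG is defined, so a release build would need a different reporting mechanism.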
On Windows you can use:
unsigned short i = MAKEWORD(lowbyte,hibyte);
I realize this is an old thread, and I can't say that I tried every suggestion made here. I'm just making myself comfortable with MFC, and I was looking for a way to convert a uint to two bytes, and back again at the other end of a socket.
There are a lot of bit shifting examples you can find on the net, but none of them seemed to actually work. A lot of the examples seem overly complicated; I mean we're just talking about grabbing 2 bytes out of a uint, sending them over the wire, and plugging them back into a uint at the other end, right?
This is the solution I finally came up with:
class ByteConverter
{
public:
static void uIntToBytes(unsigned int theUint, char* bytes)
{
unsigned int tInt = theUint;
void *uintConverter = &tInt;
char *theBytes = (char*)uintConverter;
bytes[0] = theBytes[0];
bytes[1] = theBytes[1];
}
static unsigned int bytesToUint(char *bytes)
{
unsigned theUint = 0;
void *uintConverter = &theUint;
char *thebytes = (char*)uintConverter;
thebytes[0] = bytes[0];
thebytes[1] = bytes[1];
return theUint;
}
};
Used like this:
unsigned int theUint;
char bytes[2];
CString msg;
ByteConverter::uIntToBytes(65000,bytes);
theUint = ByteConverter::bytesToUint(bytes);
msg.Format(_T("theUint = %u"), theUint);
AfxMessageBox(msg, MB_ICONINFORMATION | MB_OK);
Hope this helps someone out.
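Note that the class above copies the first two bytes of the integer as they sit in memory, so it only round-trips when both ends are little-endian. A shift-based sketch that does not depend on the host might look like this; the function names are made up, and it sends the high byte first.

```cpp
#include <cstdint>

// Hypothetical helper: split a 16-bit value into two bytes, high byte first.
void u16_to_bytes(std::uint16_t value, unsigned char bytes[2])
{
    bytes[0] = static_cast<unsigned char>(value >> 8);
    bytes[1] = static_cast<unsigned char>(value & 0xFF);
}

// Hypothetical helper: rebuild the value from two bytes, high byte first.
std::uint16_t bytes_to_u16(const unsigned char bytes[2])
{
    return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
}
```

For 65000 (0xFDE8) the two bytes on the wire are 0xFD and 0xE8, whatever the byte order of either machine.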