Memcpy uint32_t into char* - c++

I'm testing a bit with different formats and the like, and we got a task where we have to put a uint32_t into a char*. This is the code I use:
void appendString(string *s, uint32_t append){
char data[4];
memcpy(data, &append, sizeof(append));
s->append(data);
}
void appendString(string *s, short append){
char data[2];
memcpy(data, &append, sizeof(append));
s->append(data);
}
Going from the string to a char* is simple, and we have to add multiple uints into the char*. So now I'm just calling it like this:
string s;
appendString(&s, (uint32_t)1152); //this works
appendString(&s, (uint32_t)640); //this also works
appendString(&s, (uint32_t)512); //this doesn't work
I absolutely don't understand why the last one isn't working properly. I've tested multiple variations of this conversion. One of them always gave me output like this (in bits): 00110100 | 00110101 ..., so the first 2 bits are always zero, followed by 11 and then what look to me like random bits. What am I doing wrong?

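Assuming that string is std::string, the single-argument version of std::string::append is being used, which assumes the input data is NUL-terminated. Yours is not, but append will go looking for the first NUL byte anyway and appends only the bytes before it. (If none of the four bytes were zero, it would read past the end of data.)
512 is 0x00000200, which on a little-endian machine is stored as 0x00 0x02 0x00 0x00. Since the first byte is NUL, std::string::append() stops there and appends nothing. The other two calls only appear to work: 1152 (0x480) and 640 (0x280) are stored as 0x80 0x04 0x00 0x00 and 0x80 0x02 0x00 0x00, so just two of their four bytes are appended.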
Use the version of std::string::append() where you pass in the length.
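A minimal corrected sketch of the question's uint32_t overload, passing the length explicitly. It keeps the question's name and pointer parameter and, like the original, appends the bytes in host byte order:

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Append the raw bytes of `value` (host byte order) to *s.
// Passing an explicit length keeps embedded zero bytes, such as those in 512.
void appendString(std::string* s, std::uint32_t value) {
    char data[sizeof(value)];
    std::memcpy(data, &value, sizeof(value));
    s->append(data, sizeof(data));
}
```

With this version, the three calls from the question append 12 bytes in total, including the zero bytes of 512.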

Related

Convert BYTE array into unsigned long long int

I'm trying to convert a BYTE array into an equivalent unsigned long long int value, but my code is not working as expected. Please help me fix it or suggest an alternative method.
Extra information: these 4 bytes are combined as a hexadecimal number, and the output is the equivalent decimal number. Say for a given byteArray = {0x00, 0xa8, 0x4f, 0x00}, the hexadecimal number is 00a84f00 and its decimal equivalent is 11030272.
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
typedef unsigned char BYTE;
int main(int argc, char *argv[])
{
BYTE byteArray[4] = { 0x00, 0x08, 0x00, 0x00 };
std::string str(reinterpret_cast<char*>(&byteArray[0]), 4);
std::cout << str << std::endl;
unsigned long long ull = std::strtoull(str.c_str(), NULL, 0);
printf ("The decimal equivalents are: %llu", ull);
return EXIT_SUCCESS;
}
I'm getting the following output:
The decimal equivalents are: 0
While the expected output was:
The decimal equivalents are: 2048
When you call std::strtoull(str.c_str(), NULL, 0), the first argument is effectively an empty string: c_str() is read as a null-terminated sequence of characters, and the first byte of your array is 0x00.
Second, std::strtoull() does not convert byte sequences; it parses the textual representation of a number. For example, you'll get 2048 with std::strtoull("2048", NULL, 10).
Another thing to note is that unsigned long long is an (at least) 64-bit type, whereas your byte array only provides 32 bits. You need to fill the other 32 bits with zero to get the correct result. I use a direct assignment, but you could also use std::memset() here.
What you want to do is:
ull = 0ULL;
std::memcpy(&ull, byteArray, 4);
Given that your platform is little-endian, the result should be 2048.
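Note that the two examples in the question imply different byte orders: {0x00, 0xa8, 0x4f, 0x00} giving 11030272 reads the first byte as most significant (big-endian), while {0x00, 0x08, 0x00, 0x00} giving 2048 reads it as least significant (little-endian). A sketch that makes the order explicit with shifts, so the result no longer depends on the host; the function names are hypothetical:

```cpp
#include <cstdint>

// First byte is least significant (little-endian): {0x00,0x08,0x00,0x00} -> 2048.
std::uint32_t decodeLittleEndian32(const unsigned char b[4]) {
    return static_cast<std::uint32_t>(b[0]) |
           (static_cast<std::uint32_t>(b[1]) << 8) |
           (static_cast<std::uint32_t>(b[2]) << 16) |
           (static_cast<std::uint32_t>(b[3]) << 24);
}

// First byte is most significant (big-endian), as in the "00a84f00" example.
std::uint32_t decodeBigEndian32(const unsigned char b[4]) {
    return (static_cast<std::uint32_t>(b[0]) << 24) |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[2]) << 8) |
           static_cast<std::uint32_t>(b[3]);
}
```

Assigning either result to an unsigned long long zero-extends it, so no separate clearing step is needed.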
The first thing to remember is that a C string, such as the one c_str() returns, is null-terminated. Second, a string is a sequence of characters, which is not what you have. The third problem is that you have an array of four bytes, which corresponds to an unsigned 32-bit integer, while you want an (at least) 64-bit type, which is 8 bytes.
You can solve all these problems with a temporary variable, a simple call to std::memcpy, and an assignment:
uint32_t temp;
std::memcpy(&temp, byteArray, 4);
ull = temp;
Of course, this assumes that the endianness is correct.
Note that I use std::memcpy instead of std::copy (or std::copy_n) because std::memcpy is explicitly mentioned to be able to bypass strict aliasing this way, while I don't think the std::copy functions are. Also the std::copy functions are more for copying elements and not anonymous bytes (even if they can do that too, but with a clunkier syntax).
Given the answers are using std::memcpy, I want to point out that there's a more idiomatic way of doing this operation:
char byteArray[] = { 0x00, 0x08, 0x00, 0x00 };
uint32_t cp;
std::copy(byteArray, byteArray + sizeof(cp), reinterpret_cast<char*>(&cp));
std::copy is similar to std::memcpy, but is the C++ way of doing it.
Note that you need to cast the address of the output variable cp to one of: char *, unsigned char *, signed char *, or std::byte *, because otherwise the operation wouldn't be byte oriented.
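A small self-contained check that the std::copy form and the std::memcpy form from the answers above produce the same value. viaCopy and viaMemcpy are hypothetical names; both results are in host byte order (2048 for the question's bytes on a little-endian machine):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Byte-wise copy via std::copy, writing through a char* view of the destination.
std::uint32_t viaCopy(const char (&src)[4]) {
    std::uint32_t out = 0;
    std::copy(src, src + sizeof(out), reinterpret_cast<char*>(&out));
    return out;
}

// The same operation with std::memcpy, for comparison.
std::uint32_t viaMemcpy(const char (&src)[4]) {
    std::uint32_t out = 0;
    std::memcpy(&out, src, sizeof(out));
    return out;
}
```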

Can I use an array of uchars as a single uchar?

I'm making a LZW compressor that records its output in hexadecimal. It currently uses an uchar (OpenCV) for storing values, and outputs the uchar in hexadecimal.
However, I have been asked to allow the user to choose how many bytes are used when storing each value, so he could have, for example, 2 bytes for each value (or 32 bytes, it's up to him).
So, to manipulate the output, I was thinking of using an array of uchars (so, if the user asks for 32 bytes, I use an array of 32 uchars), and the question is: is there an easy way to write a big value to this array and outputting that value later without having to worry about what is in what index and other things? That is, to treat the array as just a x byte uchar? Should I use a vector?
Any help is appreciated.
You could use the following union
union pun_unsigned {
unsigned char c[sizeof(uint64_t)];
uint16_t u16;
uint32_t u32;
uint64_t u64;
};
Note that only conversions from or to (signed or unsigned) char are defined behaviour.
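A union only covers fixed sizes (here up to 8 bytes), while the question asks for a user-chosen width of up to 32 bytes. A minimal sketch of a shift-based alternative with std::vector, assuming each value fits in a uint64_t and that the most significant byte is stored first so the hex dump reads naturally; toFixedWidth, fromFixedWidth and printHex are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <vector>

// Store value in exactly `width` bytes, most significant first.
// Widths above 8 are zero-padded on the left, so 32 bytes is allowed.
std::vector<unsigned char> toFixedWidth(std::uint64_t value, std::size_t width) {
    std::vector<unsigned char> bytes(width, 0);
    for (std::size_t i = 0; i < width && value != 0; ++i) {
        bytes[width - 1 - i] = static_cast<unsigned char>(value & 0xFF);
        value >>= 8;
    }
    if (value != 0) throw std::out_of_range("value does not fit in width bytes");
    return bytes;
}

// Read the value back; assumes only the last 8 bytes can be non-zero.
std::uint64_t fromFixedWidth(const std::vector<unsigned char>& bytes) {
    std::uint64_t value = 0;
    for (unsigned char b : bytes) value = (value << 8) | b;
    return value;
}

// Print the bytes as uppercase hex, two digits per byte.
void printHex(const std::vector<unsigned char>& bytes) {
    for (unsigned char b : bytes) std::printf("%02X", static_cast<unsigned>(b));
    std::printf("\n");
}
```

Keeping the width as a runtime parameter avoids any type punning: the array is only ever accessed as bytes.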

Read a Fixed Number of (Binary) Bytes from an unsigned const char*

I have an unsigned const char* buffer in memory (comes from the network) that I need to do some stuff with. What stumps me right now is that I need to interpret the first two bytes as binary data, while the rest is ASCII. I have no problem reading the ASCII (I think), but I can't figure out how to read just the first two bytes of the unsigned array, and turn them into (say) an int. I was going to use reinterpret_cast, but the first two bytes are not null-terminated, and the only other help I could find was all about file IO.
In short, I have something like {0000000000001011}ABC Z123 XY0 5, where the characters outside the curly braces are read as ASCII, while the bits inside are supposed to be a single binary number, i.e. 11.
int c1 = buffer[0];
int c2 = buffer[1];
int number = (c1 << 8) | c2; // parentheses needed: + binds tighter than <<
const unsigned char* asciiData = buffer + 2;
I really don't get why the bytes would have to be "null-terminated" for you to use reinterpret_cast. What I would do (and what has worked so far in my projects) is:
uint16_t first_bytes = *(reinterpret_cast<const uint16_t*>(buffer));
That would get you the first two bytes in the buffer and assign the value to the first_bytes variable.
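For the question's example, the first two bytes are 0x00 0x0B and the expected value is 11, which is big-endian (most significant byte first). On a little-endian machine such as x86, reading them as a native uint16_t, whether through reinterpret_cast or memcpy, gives 0x0B00 (2816) instead; the reinterpret_cast form additionally risks a misaligned access and violates strict aliasing. A sketch of both reads, with hypothetical names:

```cpp
#include <cstdint>
#include <cstring>

// Big-endian ("network order") read of the first two bytes; host-independent.
std::uint16_t readBigEndian16(const unsigned char* buffer) {
    return static_cast<std::uint16_t>((buffer[0] << 8) | buffer[1]);
}

// Host-order read without reinterpret_cast: memcpy avoids alignment and
// strict-aliasing problems, but the result still depends on host byte order.
std::uint16_t readHostOrder16(const unsigned char* buffer) {
    std::uint16_t value;
    std::memcpy(&value, buffer, sizeof value);
    return value;
}
```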

Cast stringstream to struct

I have the following struct.
struct testStruct {
uint8_t firstval[2];
uint16_t secondval;
uint8_t thirdval;
}myStruct;
Now I get a stringstream with the following content.
"/002/003/000/207/001"
I got this content over the network. Before sending, it was "0x02 0x03 0x00 0xB8 0x01".
But if I cast this "/207" into a uint8_t, I get 184 (hex 0xB8). So the stringstream content should be correct.
How can I copy the content of this stringstream to the struct?
I tried:
memcpy((char*)&myStruct, sstream.str().c_str(), len);
The values of myStruct.firstval[0], myStruct.firstval[1] and myStruct.thirdval are correct.
The value of myStruct.secondval is incorrect, because it is a 2-byte-datatype.
You're probably on a platform where the bytes in a uint16_t are stored in the opposite order in memory than they are on the network. You'll need to use ntohs to convert the second value to the host's byte order from network byte order.
In addition, structs can have padding inserted between the members for alignment or efficiency reasons, so in general using memcpy on the raw bytes will not work, unless you ensure the struct is packed correctly.
It looks to me like you're probably reading big-endian values and then interpreting them as little endian. You'll need to byte swap secondval. First let's just say that serializing by memcpy isn't portable at all, but if you wish to proceed in that direction, I think something as simple as calling ntohs on myStruct.secondval should do the trick.
Better approach than using memcpy:
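istream& operator>>(istream& is, testStruct& t)
{
// Use unformatted reads: formatted >> would skip whitespace bytes
// (0x09-0x0D, 0x20) and would parse secondval as decimal text.
is.read(reinterpret_cast<char*>(t.firstval), sizeof(t.firstval));
uint16_t raw = 0;
is.read(reinterpret_cast<char*>(&raw), sizeof(raw));
t.secondval = ntohs(raw); // network (big-endian) to host byte order
is.read(reinterpret_cast<char*>(&t.thirdval), sizeof(t.thirdval));
return is;
}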
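A self-contained sketch of the same field-by-field idea, runnable without a socket header: ntohs is swapped for explicit shifts that decode secondval as big-endian (the same effect for this data). readByte and readTestStruct are hypothetical names, and the input is the question's "0x02 0x03 0x00 0xB8 0x01":

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

struct testStruct {
    std::uint8_t firstval[2];
    std::uint16_t secondval;
    std::uint8_t thirdval;
};

// Read one raw byte (no whitespace skipping); returns false on EOF/error.
static bool readByte(std::istream& is, std::uint8_t& out) {
    char c;
    if (!is.get(c)) return false;
    out = static_cast<std::uint8_t>(static_cast<unsigned char>(c));
    return true;
}

// Parse member by member, so struct padding never matters.
bool readTestStruct(std::istream& is, testStruct& t) {
    std::uint8_t hi = 0, lo = 0;
    if (!readByte(is, t.firstval[0]) || !readByte(is, t.firstval[1]) ||
        !readByte(is, hi) || !readByte(is, lo) || !readByte(is, t.thirdval))
        return false;
    t.secondval = static_cast<std::uint16_t>((hi << 8) | lo);  // big-endian
    return true;
}
```

Building the std::string with an explicit length (as in the test) keeps the embedded 0x00 byte.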

C++: how to cast 2 bytes in an array to an unsigned short

I have been working on a legacy C++ application and am definitely outside of my comfort-zone (a good thing). I was wondering if anyone out there would be so kind as to give me a few pointers (pun intended).
I need to cast 2 bytes in an unsigned char array to an unsigned short. The bytes are consecutive.
For an example of what I am trying to do:
I receive a string from a socket and place it in an unsigned char array. I can ignore the first byte, and then the next 2 bytes should be converted to an unsigned short. This will be on Windows only, so there are no big/little-endian issues (that I am aware of).
Here is what I have now (not working obviously):
//packetBuffer is an unsigned char array containing the string "123456789" for testing
//I need to convert bytes 2 and 3 into the short, 2 being the most significant byte
//so I would expect to get 515 (2*256 + 3) instead all the code I have tried gives me
//either errors or 2 (only converting one byte)
unsigned short myShort;
myShort = static_cast<unsigned short>(packetBuffer[1]);
Well, you are widening the char into a short value. What you want is to interpret two bytes as a short. static_cast cannot cast from unsigned char* to unsigned short*. You have to cast to void*, then to unsigned short*:
unsigned short *p = static_cast<unsigned short*>(static_cast<void*>(&packetBuffer[1]));
Now, you can dereference p and get the short value. But the problem with this approach is that you cast from unsigned char*, to void* and then to some different type. The Standard doesn't guarantee the address remains the same (and in addition, dereferencing that pointer would be undefined behavior). A better approach is to use bit-shifting, which will always work:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
This is probably well below what you care about, but keep in mind that you could easily get an unaligned access doing this. x86 is forgiving: the processor handles the unaligned access itself, so your app won't know any different (though it can be slower than an aligned access). If, however, this code will run on a non-x86 processor (you don't mention the target platform, so I'm assuming x86 desktop Windows), doing this can cause a processor data abort, and you'll have to manually copy the data to an aligned address before trying to cast it.
In short, if you're going to be doing this access a lot, you might look at adjusting the code to avoid unaligned reads, and you may see a performance benefit.
unsigned short myShort = *(unsigned short *)&packetBuffer[1];
The bit shift above has a bug:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
If packetBuffer holds bytes (8 bits wide), then the above shift can and will turn packetBuffer[1] into zero, leaving you with only packetBuffer[2].
Despite that, this is still preferable to pointers. To avoid the above problem I spend a few extra lines of code; barring a literal-zero optimization, it results in the same machine code:
unsigned short p;
p = packetBuffer[1]; p <<= 8; p |= packetBuffer[2];
Or to save some clock cycles and not shift the bits off the end:
unsigned short p;
p = (((unsigned short)packetBuffer[1])<<8) | packetBuffer[2];
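For what it's worth, the one-line shift does not lose the high byte in standard C++: the operands of << undergo integral promotion, so an unsigned char is converted to int before the shift. The sketch below, assuming a 32-bit int as on mainstream desktop platforms, puts the three forms side by side; the function names are hypothetical:

```cpp
// Combine b[1] (most significant) and b[2] three ways.
unsigned short viaExpression(const unsigned char* b) {
    // b[1] is promoted to int before <<, so no bits are shifted out.
    return static_cast<unsigned short>((b[1] << 8) | b[2]);
}

unsigned short viaCompound(const unsigned char* b) {
    unsigned short p = b[1];
    p <<= 8;   // p is promoted to int for the shift, then stored back
    p |= b[2];
    return p;
}

unsigned short viaCast(const unsigned char* b) {
    return static_cast<unsigned short>(
        (static_cast<unsigned short>(b[1]) << 8) | b[2]);
}
```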
You have to be careful with pointers, the optimizer will bite you, as well as memory alignments and a long list of other problems. Yes, done right it is faster, done wrong the bug can linger for a long time and strike when least desired.
Say you were lazy and wanted to do some 16 bit math on an 8 bit array. (little endian)
unsigned short *s;
unsigned char b[10];
s=(unsigned short *)&b[0];
if(b[0]&7)
{
*s = *s+8;
*s &= ~7;
}
do_something_With(b);
*s=*s+8;
do_something_With(b);
*s=*s+8;
do_something_With(b);
There is no guarantee that a perfectly bug-free compiler will create the code you expect. The byte array b sent to the do_something_With() function may never get modified by the *s operations. Nothing in the code above says that it should. If you don't optimize your code then you may never see this problem (until someone does optimize or changes compilers or compiler versions). If you use a debugger you may never see this problem (until it is too late).
The compiler doesn't see the connection between s and b; they are two completely separate items. The optimizer may choose not to write *s back to memory because it sees that *s has a number of operations, so it can keep that value in a register and only save it to memory at the end (if ever).
There are three basic ways to fix the pointer problem above:
Declare s as volatile.
Use a union.
Use a function or functions whenever changing types.
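The third option can be sketched with small helpers that read and write the bytes explicitly, so no aliased pointer exists for the optimizer to reason about. This assumes the little-endian layout the example states; load_le16, store_le16 and round_up_to_8 are hypothetical names, and round_up_to_8 redoes the example's first step:

```cpp
#include <cstdint>

// Read a little-endian 16-bit value from two bytes.
std::uint16_t load_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Write a 16-bit value as two little-endian bytes.
void store_le16(unsigned char* p, std::uint16_t v) {
    p[0] = static_cast<unsigned char>(v & 0xFF);
    p[1] = static_cast<unsigned char>(v >> 8);
}

// The example's "+8, then clear the low three bits" when b[0] & 7 is non-zero,
// i.e. round the 16-bit value up to the next multiple of 8.
void round_up_to_8(unsigned char* b) {
    if (b[0] & 7) {
        std::uint16_t s = load_le16(b);
        s = static_cast<std::uint16_t>(s + 8);
        s &= static_cast<std::uint16_t>(~7u);
        store_le16(b, s);
    }
}
```

Each later "*s = *s + 8" becomes store_le16(b, load_le16(b) + 8), which always updates b before do_something_With(b) sees it.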
You should not cast an unsigned char pointer into an unsigned short pointer (or, for that matter, a pointer to a smaller data type into a pointer to a larger one). This is because it is assumed that the address will be aligned correctly. A better approach is to shift the bytes into a real unsigned short object, or memcpy to an unsigned short array.
No doubt, you can adjust the compiler settings to get around this limitation, but this is a very subtle thing that will break in the future if the code gets passed around and reused.
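This may be a very late answer, but I just want to share it. When you want to convert between primitives or other types, you can use a union (note that in standard C++ reading a member other than the one last written is undefined behavior, although compilers such as GCC support it as an extension). See below: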
union CharToStruct {
char charArray[2];
unsigned short value;
};
short toShort(char* value){
CharToStruct cs;
cs.charArray[0] = value[1]; // on a little-endian machine the most significant byte of the short is stored second
cs.charArray[1] = value[0];
return cs.value;
}
When you create an array with the hex values below and call toShort, you get the short value 3 (on a little-endian machine).
char array[2];
array[0] = 0x00;
array[1] = 0x03;
short i = toShort(array);
cout << i << endl; // or printf("%hd\n", i);
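static_cast has a different syntax, plus you need to work with pointers. Note that static_cast cannot convert unsigned char* directly to unsigned short*, so what you want is a reinterpret_cast:
unsigned short *myShort = reinterpret_cast<unsigned short*>(&packetBuffer[1]);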
Did nobody notice that the input was a string?
/* If it is a string as explicitly stated in the question.
*/
int byte1 = packetBuffer[1] - '0'; // convert 1st byte from char to number.
int byte2 = packetBuffer[2] - '0';
unsigned short result = (byte1 * 256) + byte2;
/* Alternatively if is an array of bytes.
*/
int byte1 = packetBuffer[1];
int byte2 = packetBuffer[2];
unsigned short result = (byte1 * 256) + byte2;
This also avoids the alignment problems that most of the other solutions may have on certain platforms. Note: a short is at least two bytes. Some systems will give you a memory error if you try to dereference a short pointer that is not 2-byte aligned (or aligned to whatever sizeof(short) is on your system)!
char packetBuffer[] = {1, 2, 3};
unsigned short myShort = * reinterpret_cast<unsigned short*>(&packetBuffer[1]);
I (had to) do this all the time. Big-endian is an obvious problem. What will really get you is incorrect data when the machine dislikes misaligned reads (and writes)!
You may want to write a test case and an assert to check that it reads properly. Then, when run on a big-endian machine or, more importantly, a machine that dislikes misaligned reads, an assert will fire instead of a weird, hard-to-trace 'bug' ;)
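A minimal sketch of such a self-check, assuming the code it guards relies on little-endian reads of a 2-byte unsigned short; nativeReadMatchesLittleEndian is a hypothetical name. It uses std::memcpy so the check itself has no alignment or aliasing problems, which means it catches byte-order surprises but not misaligned-access faults:

```cpp
#include <cstring>

static_assert(sizeof(unsigned short) == 2, "this check assumes a 2-byte unsigned short");

// Returns true when reading the bytes {0x02, 0x03} as a native unsigned short
// gives 0x0302, i.e. when the host stores the low-order byte first.
bool nativeReadMatchesLittleEndian() {
    const unsigned char bytes[2] = {0x02, 0x03};
    unsigned short value = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value == 0x0302;
}
```

Call it once at startup, for example assert(nativeReadMatchesLittleEndian());, before any code that depends on the raw read.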
On Windows you can use the MAKEWORD macro (low-order byte first):
unsigned short i = MAKEWORD(lowbyte,hibyte);
I realize this is an old thread, and I can't say that I tried every suggestion made here. I'm just making myself comfortable with MFC, and I was looking for a way to convert a uint to two bytes, and back again at the other end of a socket.
There are a lot of bit-shifting examples you can find on the net, but none of them seemed to actually work. A lot of the examples seem overly complicated; I mean, we're just talking about grabbing 2 bytes out of a uint, sending them over the wire, and plugging them back into a uint at the other end, right?
This is the solution I finally came up with:
class ByteConverter
{
public:
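static void uIntToBytes(unsigned int theUint, char* bytes)
{
// Copies the first two bytes of theUint's storage. On a little-endian
// machine (e.g. x86 Windows) these are the two low-order bytes, so
// values above 65535 are truncated.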
unsigned int tInt = theUint;
void *uintConverter = &tInt;
char *theBytes = (char*)uintConverter;
bytes[0] = theBytes[0];
bytes[1] = theBytes[1];
}
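static unsigned int bytesToUint(char *bytes)
{
// Writes the two bytes into the start of a zeroed unsigned int; this
// reverses uIntToBytes only if both ends use the same byte order.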
unsigned theUint = 0;
void *uintConverter = &theUint;
char *thebytes = (char*)uintConverter;
thebytes[0] = bytes[0];
thebytes[1] = bytes[1];
return theUint;
}
};
Used like this:
unsigned int theUint;
char bytes[2];
CString msg;
ByteConverter::uIntToBytes(65000,bytes);
theUint = ByteConverter::bytesToUint(bytes);
msg.Format(_T("theUint = %u"), theUint);
AfxMessageBox(msg, MB_ICONINFORMATION | MB_OK);
Hope this helps someone out.
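ByteConverter copies the first two bytes of the unsigned int's storage, which are its low-order bytes only on a little-endian machine, so both ends of the socket must share a byte order. A shift-based sketch fixes the wire order instead (most significant byte first, an assumption chosen to match network byte order) and behaves the same on any host; uint16ToBytes and bytesToUint16 are hypothetical names, and they take a uint16_t to make the 2-byte limit explicit:

```cpp
#include <cstdint>

// Write value as two bytes, most significant first, independent of host endianness.
void uint16ToBytes(std::uint16_t value, unsigned char bytes[2]) {
    bytes[0] = static_cast<unsigned char>(value >> 8);
    bytes[1] = static_cast<unsigned char>(value & 0xFF);
}

// Reassemble the value from the two bytes written by uint16ToBytes.
std::uint16_t bytesToUint16(const unsigned char bytes[2]) {
    return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
}
```

For 65000 (0xFDE8) the bytes on the wire are 0xFD then 0xE8.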