I have a char buffer buf containing buf[0] = 10, buf[1] = 3, buf[2] = 3, buf[3] = 0, buf[4] = 58,
and a structure:
typedef struct
{
char type;
int version;
int length;
}Header;
I want to convert buf into a Header. Currently I am using these functions:
int getByte( unsigned char* buf)
{
int number = buf[0];
return number;
}
int getInt(unsigned char* buf)
{
int number = (buf[0]<<8)+buf[1];
return number;
}
int main()
{
Header *head = new Header;
int location = 0;
head->type = getByte(&buf[location]);
location++; // location = 1
head->version = getInt(&buf[location]);
location += 2; // location = 3
head->length = getInt(&buf[location]);
location += 2; // location = 5
}
I am searching for a solution such as
Header *head = new Header;
memcpy(head, buf, sizeof(head));
With this, the first field, head->type, is correct, but the rest is garbage. Is it possible to convert an unsigned char* buf to a Header?
The only fully portable and safe way is:
void convertToHeader(unsigned char const * const buffer, Header *header)
{
header->type = buffer[0];
header->version = (buffer[1] << 8) | buffer[2];
header->length = (buffer[3] << 8) | buffer[4];
}
and
void convertFromHeader(Header const * const header, unsigned char * buffer)
{
buffer[0] = header->type;
buffer[1] = (static_cast<unsigned int>(header->version) >> 8) & 0xFF;
buffer[2] = header->version & 0xFF;
buffer[3] = (static_cast<unsigned int>(header->length) >> 8) & 0xFF;
buffer[4] = header->length & 0xFF;
}
See "Converting bytes array to integer" for explanations.
EDIT
A quick summary of the previous link: other possible solutions (memcpy or a union, for example) are not portable, because different systems use different endianness (and what you are doing is probably some kind of communication between at least two heterogeneous systems): on some systems byte[0] is the LSB of an int and byte[1] is the MSB, and on others it is the reverse.
Also, because of alignment, struct Header can be bigger than 5 bytes (typically 12 bytes with a 4-byte int aligned on a 4-byte boundary).
Finally, because of alignment restrictions and strict-aliasing rules on some platforms, the compiler can generate code that does not do what you expect.
What you want would require version and length to be the same size as 2 elements of your buf array; that is, you'd need to use the type uint16_t, defined in <cstdint>, rather than int, which is likely longer. You'd also need to make buf an array of uint8_t, as char is allowed to be wider than 8 bits!
You probably also need to move type to the end; otherwise the compiler will almost certainly insert a padding byte after it to align version on a 2-byte boundary (once you have made it uint16_t and thus 2 bytes), and then your buf[1] would end up in that padding rather than where you want it.
This is probably what you are observing right now, by the way: with a char followed by an int, which is probably 4 bytes, you get 3 bytes of padding, and elements 1 to 3 of your array are copied there (and effectively lost).
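To see that padding on your own compiler, you can ask for the field offsets with offsetof. This is a minimal sketch using the question's Header; the helper names versionOffset and lengthOffset are hypothetical, and the concrete numbers (typically 4, 8 and 12) are implementation-defined, which is exactly the point.

```cpp
#include <cstddef>  // offsetof, std::size_t

// The struct from the question: a char followed by two ints.
struct Header {
    char type;
    int version;
    int length;
};

// Where the compiler actually placed each field (hypothetical helpers).
constexpr std::size_t versionOffset() { return offsetof(Header, version); }
constexpr std::size_t lengthOffset() { return offsetof(Header, length); }

// `version` must start at an int-aligned offset, so whenever alignof(int) > 1
// the compiler inserts padding after `type`, and a raw memcpy puts buf[1]
// into that padding instead of into `version`.
static_assert(versionOffset() % alignof(int) == 0, "version is int-aligned");

// On mainstream platforms (4-byte int, 4-byte alignment) the offsets are
// 4 and 8 and sizeof(Header) is 12, far from the 5-byte layout of buf.
```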
Another solution would be to make your buf array longer, with empty padding bytes as well, so that the data actually lines up with the struct fields.
Worth mentioning again, as pointed out in the comments: sizeof(head) returns the size of a pointer on your system, not the size of the Header structure. You can write sizeof(Header) directly; but at this level of micromanagement, you won't lose any more flexibility if you just write 5, really.
Also, endianness can screw with you. Processors have no obligation to store the bytes of a number in the order you expect rather than the opposite one; both make internal sense, after all. This means that blindly copying bytes buf[0] and buf[1] into a number can result in (buf[0]<<8)+buf[1], but also in (buf[1]<<8)+buf[0], or even in (buf[1]<<24)+(buf[0]<<16) if the data type is 4 bytes (as int usually is). And even if it works on your computer now, there is at least one machine out there on which the same code will produce garbage. Unless, that is, those bytes came from reinterpreting a number in the first place; in that case, however, that code is already wrong (not portable).
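A small sketch of that difference, using two hypothetical helper functions: building the value with shifts gives the same answer on every host, while copying the raw bytes into a uint16_t gives an answer that depends on the CPU's byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Build a 16-bit value from two bytes with shifts: buf[0] is always the high
// byte, so the result is the same on every host.
std::uint16_t fromBytesWithShifts(const unsigned char *buf)
{
    assert(buf != nullptr);
    return static_cast<std::uint16_t>((buf[0] << 8) | buf[1]);
}

// Copy the same two bytes straight into a uint16_t. Which byte becomes the
// high byte depends on the host: {0x03, 0x01} reads as 0x0301 on a big-endian
// CPU but as 0x0103 on a little-endian one such as x86.
std::uint16_t fromBytesWithMemcpy(const unsigned char *buf)
{
    assert(buf != nullptr);
    std::uint16_t value;
    std::memcpy(&value, buf, sizeof value);
    return value;
}
```

With the question's buf, fromBytesWithShifts(buf + 1) yields the version 771 (0x0303) everywhere; the memcpy variant happens to agree there only because both bytes are equal.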
...is it worth it?
All things considered, I strongly advise keeping the approach you use now; maybe simplify it.
It really makes no sense to convert a byte to an int and then back to a byte, or to take the address of a byte only to dereference it again. Nor is there any need for helper variables with non-descriptive names whose only purpose is to be returned, or for a variable whose value you know in advance at all times.
Just do
int getTwoBytes(unsigned char* buf)
{
return (buf[0]<<8)+buf[1];
}
int main()
{
Header *head = new Header;
head->type = buf[0];
head->version = getTwoBytes(buf + 1);
head->length = getTwoBytes(buf + 3);
}
The better way is to create some sort of serialization/deserialization routines.
Also, I wouldn't use plain int or char types, but more specific ones such as int32_t; that's simply the platform-independent way (you can also pack your data structures with #pragma pack). Note that a raw memcpy of the struct, as below, still relies on both sides sharing the same struct layout and endianness.
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <vector>
struct Header
{
char16_t type;
std::int32_t version;
std::int32_t length;
};
struct Tools
{
std::shared_ptr<Header> deserializeHeader(const std::vector<unsigned char> &loadedBuffer)
{
if (loadedBuffer.size() < sizeof(Header))
throw std::runtime_error("buffer too small for a Header");
auto header = std::make_shared<Header>();
std::memcpy(header.get(), loadedBuffer.data(), sizeof(Header));
return header;
}
std::vector<unsigned char> serializeHeader(const Header &header)
{
std::vector<unsigned char> buffer(sizeof(Header));
std::memcpy(buffer.data(), &header, sizeof(Header));
return buffer;
}
};
int main()
{
Tools tools;
Header header = {'B', 5834, 4665};
auto v1 = tools.serializeHeader(header);
auto v2 = tools.deserializeHeader(v1);
}
Related
Consider the following c++ code:
unsigned char* data = readData(..); //Let's say data consists of 12 characters
unsigned int dataSize = getDataSize(...); //the size of the data in bytes is also known (let's say 12 bytes)
struct Position
{
float pos_x; //remember that float is 4 bytes
double pos_y; //remember that double is 8 bytes
};
Now I want to fill a Position variable/instance with data.
Position pos;
pos.pos_x = ? //data[0:4[ The first 4 bytes of data should be set to pos_x, since pos_x is of type float which is 4 bytes
pos.pos_y = ? //data[4:12[ The remaining 8 bytes of data should be set to pos_y which is of type double (8 bytes)
I know that in data, the first bytes correspond to pos_x and the rest to pos_y. That means the first 4 bytes of data should be used to fill pos_x and the remaining 8 bytes to fill pos_y, but I don't know how to do that.
Any ideas? Thanks. PS: I'm limited to C++11.
You can use plain memcpy, as another answer advises. I suggest wrapping memcpy in a function that also does error checking for you, for the most convenient and type-safe usage.
Example:
#include <cstring>
#include <stdexcept>
#include <type_traits>
struct ByteStreamReader {
unsigned char const* begin;
unsigned char const* const end;
template<class T>
operator T() {
static_assert(std::is_trivially_copyable<T>::value,
"The type you are using cannot be safely copied from bytes.");
if(end - begin < static_cast<decltype(end - begin)>(sizeof(T)))
throw std::runtime_error("ByteStreamReader");
T t;
std::memcpy(&t, begin, sizeof t);
begin += sizeof t;
return t;
}
};
struct Position {
float pos_x;
double pos_y;
};
int main() {
unsigned char data[12] = {};
unsigned dataSize = sizeof data;
ByteStreamReader reader{data, data + dataSize};
Position p;
p.pos_x = reader;
p.pos_y = reader;
}
One thing that you can do is to copy the data byte by byte. There is a standard function to do that: std::memcpy, declared in <cstring>. Example usage:
static_assert(sizeof pos.pos_x == 4, "float must be 4 bytes");
std::memcpy(&pos.pos_x, data, 4);
static_assert(sizeof pos.pos_y == 8, "double must be 8 bytes");
std::memcpy(&pos.pos_y, data + 4, 8);
Note that simply copying the data only works if the data is in the same representation that the CPU uses, and different processors use different representations. Therefore, if your readData receives the data over the network, for example, a simple copy is not a good idea. The least you would have to do in that case is to convert the data to the native endianness if it differs (the data is probably big-endian, which is conventionally used as the network byte order). Converting from one floating-point representation to another is much trickier, but luckily IEEE 754 is fairly ubiquitous.
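As a sketch of that endianness conversion for the float field, assuming the sender wrote a 32-bit IEEE 754 float in big-endian order and the host float is also 32-bit IEEE 754 (the function name floatFromBigEndian is hypothetical): rebuild the bit pattern with shifts, then memcpy the bits into the float. A double would work the same way with a uint64_t and 8 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Decode a 4-byte big-endian (network order) IEEE 754 single-precision float.
// Only the byte order is converted; the host float format must match.
float floatFromBigEndian(const unsigned char *p)
{
    static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
                  "host float must be 32-bit IEEE 754");
    assert(p != nullptr);
    // Rebuild the bit pattern with shifts so the host's byte order does not matter.
    std::uint32_t bits = (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
                         (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
    float value;
    std::memcpy(&value, &bits, sizeof value);  // reinterpret the bits without aliasing UB
    return value;
}
```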
I have to read 10 bytes from a file, and the last 4 bytes are an unsigned integer. But what I get is an 11-byte char array/pointer (including a terminating zero). How do I convert the last 4 bytes (not counting the terminating zero character at the end) to an unsigned integer?
// pseudocode
char *p = readBytesFromFile();
unsigned int myInt = 0;
for( int i = 6; i < 10; i++ )
myInt += (int)p[i];
Is that correct? Doesn't seem correct to me.
The following code might work:
myInt = *(reinterpret_cast<unsigned int*>(p + 6));
iff:
There are no alignment problems (e.g. in a GPU memory space this is very likely to blow up unless certain guarantees are provided).
You can guarantee that the system endianness is the same as the one used to store the data.
You can be sure that sizeof(unsigned int) == 4; this is not guaranteed everywhere.
If not, as Dietmar suggested, you should loop over your data (forward or backward, depending on the endianness) and do something like
myInt = myInt << 8 | static_cast<unsigned char>(p[i]);
This is alignment-safe on every system. You still need to know the byte order of the data (to choose the loop direction) and make sure myInt is at least 4 bytes wide.
I agree with the previous answer, but I just want to add that this solution may not work if the file was created on a machine with a different endianness.
I don't want to confuse you with extra information, but keep in mind that endianness may cause you problems when you cast directly from file data.
Here's a tutorial on endianness : http://www.codeproject.com/Articles/4804/Basic-concepts-on-Endianness
Try myInt = *(reinterpret_cast<unsigned int*>(p + 6));.
This takes the address of the character at index 6 (the 7th character), reinterprets it as a pointer to an unsigned int, and then returns the unsigned int value it points to (subject to the alignment, endianness and size caveats above).
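Maybe using a union is an option? I think this might work.
UPDATE: Yes, it works (with my compiler; strictly speaking, reading a union member other than the one last written is allowed in C but is undefined behaviour in C++, and the result depends on the machine's endianness and on sizeof(int) being 4).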
union intc32 {
char c[4];
int v;
};
int charsToInt(char a, char b, char c, char d) {
intc32 r = { { a, b, c, d } };
return r.v;
}
I have a structure like this:
typedef struct {
int32_t DataLen;
char Data[1];
} MTEMSG;
So Data contains DataLen bytes that should be decoded according to certain rules. I need to write ReadInt, ReadString, etc. methods.
As a first step I want to write ReadInt. According to the documentation, this is "Four bytes in a format of x86 CPU (the little-endian byte goes first)." How can I convert the char[1] data to an int? I guess it should be something like:
MTEMSG* data;
int offset;
....
int Reader::ReadInt()
{
int result = // read 4 bytes starting from offset
offset += 4;
return result;
}
Using boost and C++11 is allowed. I'm just looking for a simple and fast way to convert.
I hope that once you show me how to convert an int, I can write most of the other methods myself.
Totally illegal and UB, but you would do something like *reinterpret_cast<int*>(data->Data + offset).
Watch out for alignment and similar issues.
First of all, as stated in the comments, this is illegal in C++. Nevertheless, assuming your compiler anticipates that you might do something like this and gives it well-defined behavior, let's go ahead.
So semantically, you have such a struct:
typedef struct {
int32_t DataLen;
char Data[N];
} MTEMSG;
where N is "large enough".
And you need to convert Data to a 4-byte little endian integer. That's quite simple:
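MTEMSG* data;
int offset = 0;
....
int Reader::ReadInt()
{
/* Cast each char to unsigned char first: if char is signed, a byte >= 0x80
would otherwise be sign-extended and corrupt the higher bytes. */
uint32_t bytes = static_cast<unsigned char>(data->Data[offset + 0])
| (static_cast<uint32_t>(static_cast<unsigned char>(data->Data[offset + 1])) << 8)
| (static_cast<uint32_t>(static_cast<unsigned char>(data->Data[offset + 2])) << 16)
| (static_cast<uint32_t>(static_cast<unsigned char>(data->Data[offset + 3])) << 24);
offset += 4;
/* Note: int32_t would be more precise */
return static_cast<int>(bytes);
}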
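A self-contained variant of the same idea, as a sketch only: instead of relying on the Data[1] trick, this hypothetical Reader class holds a pointer to the DataLen bytes plus their size, checks bounds before every read, and decodes the 4-byte little-endian integer with shifts. The class layout and the exception type are assumptions, not part of the original API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical reader over a raw byte buffer (for example MTEMSG::Data with
// DataLen bytes). Tracks the current offset and checks bounds on every read.
class Reader {
public:
    Reader(const char *data, std::size_t size) : data_(data), size_(size), offset_(0) {
        assert(data != nullptr || size == 0);
    }

    // Read a 4-byte little-endian ("x86 format") signed integer.
    std::int32_t ReadInt() {
        if (size_ - offset_ < 4)
            throw std::out_of_range("ReadInt: not enough data");
        std::uint32_t u = 0;
        for (int i = 3; i >= 0; --i)  // the most significant byte comes last
            u = (u << 8) | static_cast<unsigned char>(data_[offset_ + i]);
        offset_ += 4;
        // Values >= 2^31 wrap to negative numbers on two's-complement compilers.
        return static_cast<std::int32_t>(u);
    }

    std::size_t offset() const { return offset_; }

private:
    const char *data_;
    std::size_t size_;
    std::size_t offset_;
};
```

ReadString and the other methods can follow the same pattern: check the remaining size, decode, then advance offset_.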
In C/C++, is there an easy way to apply bitwise operators (specifically left/right shifts) to dynamically allocated memory?
For example, let's say I did this:
unsigned char * bytes=new unsigned char[3];
bytes[0]=1;
bytes[1]=1;
bytes[2]=1;
I would like a way to do this:
bytes>>=2;
(then the 'bytes' would have the following values):
bytes[0]==0
bytes[1]==64
bytes[2]==64
Why the values should be that way:
After allocation, the bytes look like this:
[00000001][00000001][00000001]
But I'm looking to treat the bytes as one long string of bits, like this:
[000000010000000100000001]
A right shift by two would cause the bits to look like this:
[000000000100000001000000]
Which finally looks like this when separated back into the 3 bytes (thus the 0, 64, 64):
[00000000][01000000][01000000]
Any ideas? Should I maybe make a struct/class and overload the appropriate operators? Edit: If so, any tips on how to proceed? Note: I'm looking for a way to implement this myself (with some guidance) as a learning experience.
I'm going to assume you want bits carried from one byte to the next, as John Knoeller suggests.
The requirements here are insufficient. You need to specify the order of the bits relative to the order of the bytes: when the least significant bit falls out of one byte, does it go to the next higher or the next lower byte?
What you are describing, though, used to be very common for graphics programming. You have basically described a monochrome bitmap horizontal scrolling algorithm.
Assuming that "right" means higher addresses but less significant bits (i.e. matching the normal writing conventions for both), a single-bit shift will be something like...
void scroll_right (unsigned char* p_Array, int p_Size)
{
unsigned char orig_l = 0;
unsigned char orig_r;
unsigned char* dest = p_Array;
while (p_Size > 0)
{
p_Size--;
orig_r = *p_Array++;
*dest++ = (orig_l << 7) + (orig_r >> 1);
orig_l = orig_r;
}
}
Adapting the code for variable shift sizes shouldn't be a big problem. There are obvious opportunities for optimisation (e.g. doing 2, 4 or 8 bytes at a time), but I'll leave that to you.
To shift left, though, you should use a separate loop which should start at the highest address and work downwards.
If you want to expand "on demand", note that the orig_l variable contains the last byte after the loop. To check for an overflow, check whether (unsigned char)(orig_l << 7) is non-zero, i.e. whether the lowest bit of orig_l is set (without the cast, the shift happens in int and would be non-zero for any non-zero orig_l). If your bytes are in a std::vector, inserting at either end should be no problem.
EDIT I should have said - optimising to handle 2, 4 or 8 bytes at a time will create alignment issues. When reading 2-byte words from an unaligned char array, for instance, it's best to do the odd byte read first so that later word reads are all at even addresses up until the end of the loop.
On x86 this isn't necessary, but it is a lot faster. On some processors it's necessary. Just do a switch based on the base (address & 1), (address & 3) or (address & 7) to handle the first few bytes at the start, before the loop. You also need to special case the trailing bytes after the main loop.
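Generalising scroll_right to a variable shift size could look like the sketch below. It keeps the same convention (higher addresses hold less significant bits) and handles shifts smaller than one byte; the name shift_right_bits is hypothetical, and whole-byte shifts would be handled separately by moving bytes.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Shift a byte array right by `shift` bits (0 <= shift < CHAR_BIT), treating it
// as one long bit string with p[0] holding the most significant bits. Bits that
// fall out of the low end of one byte enter the high end of the next
// (higher-address) byte; zeros enter p[0], and bits leaving the last byte are lost.
void shift_right_bits(unsigned char *p, std::size_t size, unsigned shift)
{
    assert(shift < CHAR_BIT);
    if (shift == 0)
        return;  // avoids a shift by CHAR_BIT below
    unsigned char carry = 0;  // low `shift` bits of the previous byte
    for (std::size_t i = 0; i < size; ++i) {
        unsigned char orig = p[i];
        p[i] = static_cast<unsigned char>((orig >> shift) | (carry << (CHAR_BIT - shift)));
        carry = static_cast<unsigned char>(orig & ((1u << shift) - 1));
    }
}
```

With the question's {1, 1, 1} and a shift of 2, this produces {0, 64, 64}.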
Decouple the allocation from the accessor/mutators
Next, see if a standard container like bitset can do the job for you
Otherwise check out boost::dynamic_bitset
If all fails, roll your own class
Rough example:
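#include <climits>
#include <cstddef>
typedef unsigned char byte;
// Returns `bitcount` bits of `value` starting at bit `startbit`,
// counted from 1 at the most significant bit.
byte extract(byte value, int startbit, int bitcount)
{
byte result;
result = (byte)(value << (startbit - 1));
result = (byte)(result >> (CHAR_BIT - bitcount));
return result;
}
// Shifts the whole array right by n bits (0 < n < CHAR_BIT); the low n bits
// of each byte roll over into the high bits of the next byte.
byte *right_shift(byte *bytes, std::size_t nbytes, int n) {
byte rollover = 0;
for (std::size_t i = 0; i < nbytes; ++i) {
byte next_rollover = extract(bytes[ i ], CHAR_BIT - n + 1, n); // low n bits, taken before shifting
bytes[ i ] = (byte)((bytes[ i ] >> n) | (rollover << (CHAR_BIT - n)));
rollover = next_rollover;
}
return &bytes[ 0 ];
}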
Here's how I would do it for two bytes:
unsigned int rollover = byte[0] & 0x3;
byte[0] >>= 2;
byte[1] = byte[1] >> 2 | (rollover << 6);
From there, you can generalize this into a loop for n bytes. For flexibility, you will want to generate the magic numbers (0x3 and 6) rather than hardcoding them.
I'd look into something similar to this:
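#include <cstddef>
#include <cstdint>
#define number_of_bytes 3
template<std::size_t num_bytes>
union MyUnion
{
unsigned char bytes[num_bytes];
std::int64_t ints[num_bytes / sizeof(std::int64_t) + 1];
};
int main()
{
MyUnion<number_of_bytes> mu = {};
mu.bytes[0] = 1;
mu.bytes[1] = 1;
mu.bytes[2] = 1;
// Caution: reading ints after writing bytes is type punning (undefined in C++,
// though many compilers allow it), and on a little-endian CPU this shift moves
// bits toward lower addresses, giving {64, 64, 0} rather than {0, 64, 64}.
mu.ints[0] >>= 2;
}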
Just play with it. You'll get the idea I believe.
Operator overloading is syntactic sugar. It's really just a way of calling a function and passing your byte array without having it look like you are calling a function.
So I would start by writing this function
unsigned char * ShiftBytes(unsigned char * bytes, size_t count_of_bytes, int shift);
Then if you want to wrap this up in an operator overload in order to make it easier to use or because you just prefer that syntax, you can do that as well. Or you can just call the function.
I want to read sizeof(int) bytes from a char* array.
a) In which scenarios do we need to worry about checking endianness?
b) How would you read the first 4 bytes, either taking endianness into consideration or not?
EDIT: The sizeof(int) bytes that I have read need to be compared with an integer value.
What is the best approach to this problem?
Do you mean something like this?
char* a;
int i;
memcpy(&i, a, sizeof(i));
You only have to worry about endianness if the data comes from a different platform, like a device.
a) You only need to worry about "endianness" (i.e., byte-swapping) if the data was created on a big-endian machine and is being processed on a little-endian machine, or vice versa. There are many ways this can occur, but here are a couple of examples.
You receive data on a Windows machine via a socket. Windows employs a little-endian architecture while network data is "supposed" to be in big-endian format.
You process a data file that was created on a system with a different "endianness."
In either of these cases, you'll need to byte-swap all numbers that are bigger than 1 byte, e.g., shorts, ints, longs, doubles, etc. However, if you are always dealing with data from the same platform, endian issues are of no concern.
b) Based on your question, it sounds like you have a char pointer and want to extract the first 4 bytes as an int and then deal with any endian issues. To do the extraction, use this:
int n = *(reinterpret_cast<int *>(myArray)); // where myArray is your data
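Obviously, this assumes myArray is not a null pointer; otherwise, this will crash since it dereferences the pointer, so employ a good defensive programming scheme. It also assumes myArray is suitably aligned for an int (and, strictly speaking, it breaks C++'s aliasing rules); where that cannot be guaranteed, copy the bytes into an int with memcpy instead.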
To swap the bytes on Windows, you can use the ntohs()/ntohl() and/or htons()/htonl() functions defined in winsock2.h. Or you can write some simple routines to do this in C++, for example:
inline unsigned short swap_16bit(unsigned short us)
{
return (unsigned short)(((us & 0xFF00) >> 8) |
((us & 0x00FF) << 8));
}
inline unsigned long swap_32bit(unsigned long ul)
{
return (unsigned long)(((ul & 0xFF000000) >> 24) |
((ul & 0x00FF0000) >> 8) |
((ul & 0x0000FF00) << 8) |
((ul & 0x000000FF) << 24));
}
It depends on how you want to read them. I get the feeling you want to cast 4 bytes into an integer; doing so over network-streamed data will usually end up as something like this:
int foo = *(int*)(stream+offset_in_stream);
The easy way to solve this is to make sure whatever generates the bytes does so in a consistent endianness. Typically, the "network byte order" used by various TCP/IP protocols is best: the library routines htonl and ntohl work very well with it, and they are usually fairly well optimized.
However, if network byte order is not being used, you may need to do things in other ways. You need to know two things: the size of an integer and the byte order. Once you know those, you know how many bytes to extract and in which order to put them together into an int.
Some example code that assumes sizeof(int) is the right number of bytes:
#include <limits.h>
int bytes_to_int_big_endian(const char *bytes)
{
unsigned i;
unsigned result = 0; /* unsigned, so shifting into the top bit is well defined */
for (i = 0; i < sizeof(int); ++i)
result = (result << CHAR_BIT) | (unsigned char)bytes[i]; /* avoid sign extension of char */
return (int)result;
}
int bytes_to_int_little_endian(const char *bytes)
{
unsigned i;
unsigned result = 0;
for (i = 0; i < sizeof(int); ++i)
result |= (unsigned)(unsigned char)bytes[i] << (i * CHAR_BIT);
return (int)result;
}
#ifdef TEST
#include <stdio.h>
int main(void)
{
const int correct = 0x01020304;
const char little[] = "\x04\x03\x02\x01";
const char big[] = "\x01\x02\x03\x04";
printf("correct: %0x\n", correct);
printf("from big-endian: %0x\n", bytes_to_int_big_endian(big));
printf("from-little-endian: %0x\n", bytes_to_int_little_endian(little));
return 0;
}
#endif
How about
int int_from_bytes(const char * bytes, _Bool reverse)
{
if(!reverse)
return *(int *)(void *)bytes;
char tmp[sizeof(int)];
for(size_t i = sizeof(tmp); i--; ++bytes)
tmp[i] = *bytes;
return *(int *)(void *)tmp;
}
You'd use it like this:
int i = int_from_bytes(bytes, SYSTEM_ENDIANNESS != ARRAY_ENDIANNESS);
If you're on a system where casting void * to int * may result in alignment conflicts, you can use
#include <string.h>
int int_from_bytes(const char * bytes, _Bool reverse)
{
int tmp;
if(reverse)
{
for(size_t i = sizeof(tmp); i--; ++bytes)
((char *)&tmp)[i] = *bytes;
}
else memcpy(&tmp, bytes, sizeof(tmp));
return tmp;
}
You shouldn't need to worry about endianness unless you are reading the bytes from a source created on a different machine, e.g. a network stream.
Given that, can't you just use a for loop?
void ReadBytes(char * stream) {
for (int i = 0; i < sizeof(int); i++) {
char foo = stream[i];
}
}
Are you asking for something more complicated than that?
You need to worry about endianness only if the data you're reading is composed of numbers that are larger than one byte.
If you're reading sizeof(int) bytes and expect to interpret them as an int, then endianness makes a difference. Essentially, endianness is the way in which a machine interprets a series of more than one byte as a numerical value.
Just use a for loop that moves over the array in sizeof(int) chunks.
Use the function ntohl (found in the header <arpa/inet.h>, at least on Linux) to convert from bytes in the network order (network order is defined as big-endian) to local byte-order. That library function is implemented to perform the correct network-to-host conversion for whatever processor you're running on.
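Since the bytes may sit at any (possibly unaligned) position in the array, a safe way to combine this with ntohl is to memcpy the 4 bytes into a uint32_t first and then convert. A minimal sketch for POSIX systems (on Windows, ntohl comes from winsock2.h instead); the function name read_network_u32 is hypothetical.

```cpp
#include <arpa/inet.h>  // ntohl (POSIX)
#include <cassert>
#include <cstdint>
#include <cstring>

// Read a 32-bit big-endian (network order) value from an arbitrary, possibly
// unaligned position in a byte buffer and convert it to host byte order.
std::uint32_t read_network_u32(const char *bytes)
{
    assert(bytes != nullptr);
    std::uint32_t raw;
    std::memcpy(&raw, bytes, sizeof raw);  // copying avoids alignment and aliasing problems
    return ntohl(raw);                     // no-op on big-endian hosts, byte swap on little-endian
}
```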
Why read when you can just compare?
bool AreEqual(int i, char *data)
{
return memcmp(&i, data, sizeof(int)) == 0;
}
If you are worried about endianness, you need to convert all integers to some invariant form; htonl and ntohl are good examples.