How to reduce CPU usage? - c++

I have to decode some byte array(raw data). It can consist of basic data types(int,unsigned int,char,short etc.).According to defined structure, i need to interpret them. below is example:
struct testData
{
int a;
char c;
};
unsigned char** buf = {0x01,0x00,0x00,0x00,0x41}
example byte array(in little endian) : 0100000041
should give decoding like : a = 1, c = 'A'
The sample data can be very big and the sample structure( e.g. testData) can contain 200 - 3000 fields.
If I use am casting to read the appropriate data from **buf and set pointer like below:
int a = *(reinterpret_cast<int*>(*buf);
*buf += 4;
char c = **buf;
*buf += 1;
My CPU usage is quite high if number of fileds need to be decoded are high. example:
struct testData
{
int element1;
char element2;
int element3;
... ...
... ...
short element200;
char element201;
char element202;
}
Is there a way to reduce the CPU load as well as keep decoding very fast?
I have two constraints:
"Structure can contain padding byte."
I do not have control on how structure will be defined. Structure can contain nested elements as well.

int a = *(reinterpret_cast<int*>(*buf);
Don't use reinterpret_cast. You are lying to the compiler and forcing unaligned accesses. Worse, you are hiding from the compiler the very information it needs to optimize your code -- that the pointer is actually to characters. Instead, code what you mean as straightforward as possible, which is:
int a=static_cast<int>(*buf[0]) |
(static_cast<int>(*buf[1])<<8) |
(static_cast<int>(*buf[2])<<16) |
(static_cast<int>(*buf[3])<<24);
This is simple, clear, and what you actually want. The compiler will have no problem optimizing it. (And, it will work regardless of your platform's endianness.)

You should be able to simply map the struct to the buffer, as long as the struct is properly packed:
#pragma pack(push, 1)
struct testData
{
int element1;
char element2;
int element3;
... ...
... ...
short element200;
char element201;
char element202;
}
#pragma pack(pop)
You should also declare the structure in a alignment sensible way, don't mix int followed by char followed by int... Then if you read the data in an aligned buffer, a simple cast of the buffer to testData* would give you access to all members. This way you would avoid all those gratuitous copies. If you read the structure in forward fashion (p->element1, then read p->element2 then p->element3 and so on) hardware prefetch should kick in and give a big boost.
Further enhancements would require actual measurements of hot spots. Also, check this book out from the library and read it: The Software Optimization Cookbook.

Further to David Schwartz's response, you can tidy this up by writing some helper template functions. I'd suggest something like this (untested).
template<typename T>
const unsigned char * read_from_buffer( T* value, const unsigned char * buffer);
template<>
const unsigned char * read_from_buffer<int>( int* value, const unsigned char * buffer)
{
*value = static_cast<int>(*buf[0]) |
(static_cast<int>(*buf[1])<<8) |
(static_cast<int>(*buf[2])<<16) |
(static_cast<int>(*buf[3])<<24);
return buffer+4'
}
template<>
const unsigned char * read_from_buffer<char>( char * value, const unsigned char * buffer )
{
*value = *buffer;
return buffer+1;
}
struct TestData
{
int a;
char c;
};
int main()
{
unsigned char buf[] = {0x01,0x00,0x00,0x00,0x41};
unsigned char * ptr = buf;
TestData data;
ptr = read_from_buffer( &data.a, ptr );
ptr = read_from_buffer( &data.c, ptr );
}
You could encapsulate this even further and add error checking etc and you'd have a nice binary stream like interface.

Related

Import structs as nested, anonymous structs in union using C++

Please consider the following "unchangeable" declarations:
typedef struct T_MESSAGE
{
unsigned int uiTimestamp;
unsigned char ucDataType;
unsigned int uiDataSize;
unsigned char aucData[1024];
} TT_MESSAGE;
typedef struct T_SENSORDATA_HEADER
{
unsigned char ucSensorType;
unsigned char ucMountingPoint;
} TT_SENSORDATA_HEADER;
In case the message contains Sensor Data, the data is stored within the aucData array, always beginning with the Sensor Data Header. I would like to create a union or struct, which allows me to directly access all members of such a message, without having to use another variable name.
I hope you understand what I want to do by looking at my previous attempts.
I tried it like this:
union SensorDataMessage
{
struct T_Message;
struct
{
unsigned : 32; // Skip uiTimestamp
unsigned : 8; // Skip ucDataType
unsigned : 32; // Skip uiDataSize
struct T_SENSORDATA_HEADER;
};
};
and this:
struct SensorDataOverlay
{
unsigned : 32; // Skip uiTimestamp
unsigned : 8; // Skip ucDataType
unsigned : 32; // Skip uiDataSize
struct T_SENSORDATA_HEADER;
};
union SensorDataMessage
{
struct T_Message;
struct SensorDataOverlay;
};
But none of that is working. In the end, I would like to be able to write something like this:
int Evaluate(SensorDataMessage msg)
{
unsigned char tmp = msg.ucDataType;
unsigned char tmp2 = msg.ucSensorType;
[...]
}
From here I learned that what I want to do should be possible, but only in Visual C:
A Microsoft C extension allows you to declare a structure variable
within another structure without giving it a name. These nested
structures are called anonymous structures. C++ does not allow
anonymous structures.
You can access the members of an anonymous structure as if they were
members in the containing structure.
However, this seems not to be entirely true, since anonymous structs can be used in Visual C++ as well, like it is suggested here.
I would highly appreciate any help.
Here's what I found might help you out:
Have to change C/C++ compiler as Compile as C code (/TC) to gain anonymous structure support.
There's a missing keyword union on the declaration of Evaluate()
Anonymous native data type declaration in SensorDataOverlay seems would confuse the compiler, so I try to collect them into one single structure as CommonHeader, and put one pack in SensorDataOverlay.
I found T_MESSAGE and SensorDataOverlay shared the same scheme in the first three fields, I would say it would be better to be replaced as CommonHeader, would make more sense in perspective of data inheritance. Since at the beginning of question you pointed out that the T_MESSAGE is unchangeable, so I don't do any modification in the following code.
the complete code posted here, able to run, and I guess the memory offset scheme meets your needs.
*struct CommonHeader
{
unsigned int skipUiTimestamp;
unsigned char skipUcDataType;
unsigned int skipUiDataSize;
};
struct SensorDataOverlay
{
/* Use CommonHeader instead */
//unsigned : 32; // Skip uiTimestamp
//unsigned : 8; // Skip ucDataType
//unsigned : 32; // Skip uiDataSize
struct CommonHeader;
struct T_SENSORDATA_HEADER;
};
union SensorDataMessage
{
TT_MESSAGE;
struct SensorDataOverlay;
};
int Evaluate(union SensorDataMessage msg)
{
unsigned char tmp = msg.uiDataSize;
unsigned char tmp2 = msg.ucSensorType;
return 0;
}*

C++ understanding Unions and Structs

I've come to work on an ongoing project where some unions are defined as follows:
/* header.h */
typedef union my_union_t {
float data[4];
struct {
float varA;
float varB;
float varC;
float varD;
};
} my_union;
If I understand well, unions are for saving space, so sizeof(my_union_t) = MAX of the variables in it. What are the advantages of using the statement above instead of this one:
typedef struct my_struct {
float varA;
float varB;
float varC;
float varD;
};
Won't be the space allocated for both of them the same?
And how can I initialize varA,varB... from my_union?
Unions are often used when implementing a variant like object (a type field and a union of data types), or in implementing serialisation.
The way you are using a union is a recipe for disaster.
You are assuming the the struct in the union is packing the floats with no gaps between then!
The standard guarantees that float data[4]; is contiguous, but not the structure elements. The only other thing you know is that the address of varA; is the same as the address of data[0].
Never use a union in this way.
As for your question: "And how can I initialize varA,varB... from my_union?". The answer is, access the structure members in the normal long-winded way not via the data[] array.
Union are not mostly for saving space, but to implement sum types (for that, you'll put the union in some struct or class having also a discriminating field which would keep the run-time tag). Also, I suggest you to use a recent standard of C++, at least C++11 since it has better support of unions (e.g. permits more easily union of objects and their construction or initialization).
The advantage of using your union is to be able to index the n-th floating point (with 0 <= n <= 3) as u.data[n]
To assign a union field in some variable declared my_union u; just code e.g. u.varB = 3.14; which in your case has the same effect as u.data[1] = 3.14;
A good example of well deserved union is a mutable object which can hold either an int or a string (you could not use derived classes in that case):
class IntOrString {
bool isint;
union {
int num; // when isint is true
str::string str; // when isint is false
};
public:
IntOrString(int n=0) : isint(true), num(n) {};
IntOrString(std::string s) : isint(false), str(s) {};
IntOrString(const IntOrString& o): isint(o.isint)
{ if (isint) num = o.num; else str = o.str); };
IntOrString(IntOrString&&p) : isint(p.isint)
{ if (isint) num = std::move (p.num);
else str = std::move (p.str); };
~IntOrString() { if (isint) num=0; else str->~std::string(); };
void set (int n)
{ if (!isint) str->~std::string(); isint=true; num=n; };
void set (std::string s) { str = s; isint=false; };
bool is_int() const { return isint; };
int as_int() const { return (isint?num:0; };
const std::string as_string() const { return (isint?"":str;};
};
Notice the explicit calls of destructor of str field. Notice also that you can safely use IntOrString in a standard container (std::vector<IntOrString>)
See also std::optional in future versions of C++ (which conceptually is a tagged union with void)
BTW, in Ocaml, you simply code:
type intorstring = Integer of int | String of string;;
and you'll use pattern matching. If you wanted to make that mutable, you'll need to make a record or a reference of it.
You'll better use union-s in a C++ idiomatic way (see this for general advices).
I think the best way to understand unions is to just to give 2 common practical examples.
The first example is working with images. Imagine you have and RGB image that is arranged in a long buffer.
What most people would do, is represent the buffer as a char* and then loop it by 3's to get the R,G,B.
What you could do instead, is make a little union, and use that to loop over the image buffer:
union RGB
{
char raw[3];
struct
{
char R;
char G;
char B;
} colors;
}
RGB* pixel = buffer[0];
///pixel.colors.R == The red color in the first pixel.
Another very useful use for unions is using registers and bitfields.
Lets say you have a 32 bit value, that represents some HW register, or something.
Sometimes, to save space, you can split the 32 bits into bit fields, but you also want the whole representation of that register as a 32 bit type.
This obviously saves bit shift calculation that a lot of programmers use for no reason at all.
union MySpecialRegister
{
uint32_t register;
struct
{
unsigned int firstField : 5;
unsigned int somethingInTheMiddle : 25;
unsigned int lastField : 6;
} data;
}
// Now you can read the raw register into the register field
// then you can read the fields using the inner data struct
The advantage is that with a union you can access the same memory in two different ways.
In your example the union contains four floats. You can access those floats as varA, varB... which might be more descriptive names or you can access the same variables as an array data[0], data[1]... which might be more useful in loops.
With a union you can also use the same memory for different kinds of data, you might find that useful for things like writing a function to tell you if you are on a big endian or little endian CPU.
No, it is not for saving space. It is for ability to represent some binary data as various data types.
for example
#include <iostream>
#include <stdint.h>
union Foo{
int x;
struct y
{
unsigned char b0, b1, b2, b3;
};
char z[sizeof(int)];
};
int main()
{
Foo bar;
bar.x = 100;
std::cout << std::hex; // to show number in hexadec repr;
for(size_t i = 0; i < sizeof(int); i++)
{
std::cout << "0x" << (int)bar.z[i] << " "; // int is just to show values as numbers, not a characters
}
return 0;
}
output: 0x64 0x0 0x0 0x0 The same values are stored in struct bar.y, but not in array but in sturcture members. Its because my machine have a little endiannes. If it were big, than the output would be reversed: 0x0 0x0 0x0 0x64
You can achieve the same using reinterpret_cast:
#include <iostream>
#include <stdint.h>
int main()
{
int x = 100;
char * xBytes = reinterpret_cast<char*>(&x);
std::cout << std::hex; // to show number in hexadec repr;
for (size_t i = 0; i < sizeof(int); i++)
{
std::cout << "0x" << (int)xBytes[i] << " "; // (int) is just to show values as numbers, not a characters
}
return 0;
}
its usefull, for example, when you need to read some binary file, that was written on a machine with different endianess than yours. You can just access values as bytearray and swap those bytes as you wish.
Also, it is usefull when you have to deal with bit fields, but its a whole different story :)
First of all: Avoid unions where the access goes to the same memory but to different types!
Unions did not save space at all. The only define multiple names on the same memory area! And you can only store one of the elements in one time in a union.
if you have
union X
{
int x;
char y[4];
};
you can store an int OR 4 chars but not both! The general problem is, that nobody knows which data is actually stored in a union. If you store a int and read the chars, the compiler will not check that and also there is no runtime check. A solution is often to provide an additional data element in a struct to a union which contains the actual stored data type as an enum.
struct Y
{
enum { IS_CHAR, IS_INT } tinfo;
union
{
int x;
char y[4];
};
}
But in c++ you always should use classes or structs which can derive from a maybe empty parent class like this:
class Base
{
};
class Int_Type: public Base
{
...
int x;
};
class Char_Type: public Base
{
...
char y[4];
};
So you can device pointers to base which actually can hold a Int or a Char Type for you. With virtual functions you can access the members in a object oriented way of programming.
As mentioned already from Basile's answer, a useful case can be the access via different names to the same type.
union X
{
struct data
{
float a;
float b;
};
float arr[2];
};
which allows different access ways to the same data with the same type. Using different types which are stored in the same memory should be avoided at all!

C++ variable length arrays in struct

I am writing a program for creating, sending, receiving and interpreting ARP packets. I have a structure representing the ARP header like this:
struct ArpHeader
{
unsigned short hardwareType;
unsigned short protocolType;
unsigned char hardwareAddressLength;
unsigned char protocolAddressLength;
unsigned short operationCode;
unsigned char senderHardwareAddress[6];
unsigned char senderProtocolAddress[4];
unsigned char targetHardwareAddress[6];
unsigned char targetProtocolAddress[4];
};
This only works for hardware addresses with length 6 and protocol addresses with length 4. The address lengths are given in the header as well, so to be correct the structure would have to look something like this:
struct ArpHeader
{
unsigned short hardwareType;
unsigned short protocolType;
unsigned char hardwareAddressLength;
unsigned char protocolAddressLength;
unsigned short operationCode;
unsigned char senderHardwareAddress[hardwareAddressLength];
unsigned char senderProtocolAddress[protocolAddressLength];
unsigned char targetHardwareAddress[hardwareAddressLength];
unsigned char targetProtocolAddress[protocolAddressLength];
};
This obviously won't work since the address lengths are not known at compile time. Template structures aren't an option either since I would like to fill in values for the structure and then just cast it from (ArpHeader*) to (char*) in order to get a byte array which can be sent on the network or cast a received byte array from (char*) to (ArpHeader*) in order to interpret it.
One solution would be to create a class with all header fields as member variables, a function to create a byte array representing the ARP header which can be sent on the network and a constructor which would take only a byte array (received on the network) and interpret it by reading all header fields and writing them to the member variables. This is not a nice solution though since it would require a LOT more code.
In contrary a similar structure for a UDP header for example is simple since all header fields are of known constant size. I use
#pragma pack(push, 1)
#pragma pack(pop)
around the structure declaration so that I can actually do a simple C-style cast to get a byte array to be sent on the network.
Is there any solution I could use here which would be close to a structure or at least not require a lot more code than a structure?
I know the last field in a structure (if it is an array) does not need a specific compile-time size, can I use something similar like that for my problem? Just leaving the sizes of those 4 arrays empty will compile, but I have no idea how that would actually function. Just logically speaking it cannot work since the compiler would have no idea where the second array starts if the size of the first array is unknown.
You want a fairly low level thing, an ARP packet, and you are trying to find a way to define a datastructure properly so you can cast the blob into that structure. Instead, you can use an interface over the blob.
struct ArpHeader {
mutable std::vector<uint8_t> buf_;
template <typename T>
struct ref {
uint8_t * const p_;
ref (uint8_t *p) : p_(p) {}
operator T () const { T t; memcpy(&t, p_, sizeof(t)); return t; }
T operator = (T t) const { memcpy(p_, &t, sizeof(t)); return t; }
};
template <typename T>
ref<T> get (size_t offset) const {
if (offset + sizeof(T) > buf_.size()) throw SOMETHING;
return ref<T>(&buf_[0] + offset);
}
ref<uint16_t> hwType() const { return get<uint16_t>(0); }
ref<uint16_t> protType () const { return get<uint16_t>(2); }
ref<uint8_t> hwAddrLen () const { return get<uint8_t>(4); }
ref<uint8_t> protAddrLen () const { return get<uint8_t>(5); }
ref<uint16_t> opCode () const { return get<uint16_t>(6); }
uint8_t *senderHwAddr () const { return &buf_[0] + 8; }
uint8_t *senderProtAddr () const { return senderHwAddr() + hwAddrLen(); }
uint8_t *targetHwAddr () const { return senderProtAddr() + protAddrLen(); }
uint8_t *targetProtAddr () const { return targetHwAddr() + hwAddrLen(); }
};
If you need const correctness, you remove mutable, create a const_ref, and duplicate the accessors into non-const versions, and make the const versions return const_ref and const uint8_t *.
Short answer: you just cannot have variable-sized types in C++.
Every type in C++ must have a known (and stable) size during compilation. IE operator sizeof() must give a consistent answer. Note, you can have types that hold variable amount of data (eg: std::vector<int>) by using the heap, yet the size of the actual object is always constant.
So, you can never produce a type declaration that you would cast and get the fields magically adjusted. This goes deeply into the fundamental object layout - every member (aka field) must have a known (and stable) offset.
Usually, the issue have is solved by writing (or generating) member functions that parse the input data and initialize the object's data. This is basically the age-old data serialization problem, which has been solved countless times in the last 30 or so years.
Here is a mockup of a basic solution:
class packet {
public:
// simple things
uint16_t hardware_type() const;
// variable-sized things
size_t sender_address_len() const;
bool copy_sender_address_out(char *dest, size_t dest_size) const;
// initialization
bool parse_in(const char *src, size_t len);
private:
uint16_t hardware_type_;
std::vector<char> sender_address_;
};
Notes:
the code above shows the very basic structure that would let you do the following:
packet p;
if (!p.parse_in(input, sz))
return false;
the modern way of doing the same thing via RAII would look like this:
if (!packet::validate(input, sz))
return false;
packet p = packet::parse_in(input, sz); // static function
// returns an instance or throws
If you want to keep access to the data simple and the data itself public, there is a way to achieve what you want without changing the way you access data. First, you can use std::string instead of the char arrays to store the addresses:
#include <string>
using namespace std; // using this to shorten notation. Preferably put 'std::'
// everywhere you need it instead.
struct ArpHeader
{
unsigned char hardwareAddressLength;
unsigned char protocolAddressLength;
string senderHardwareAddress;
string senderProtocolAddress;
string targetHardwareAddress;
string targetProtocolAddress;
};
Then, you can overload the conversion operator operator const char*() and the constructor arpHeader(const char*) (and of course operator=(const char*) preferably too), in order to keep your current sending/receiving functions working, if that's what you need.
A simplified conversion operator (skipped some fields, to make it less complicated, but you should have no problem in adding them back), would look like this:
operator const char*(){
char* myRepresentation;
unsigned char mySize
= 2+ senderHardwareAddress.length()
+ senderProtocolAddress.length()
+ targetHardwareAddress.length()
+ targetProtocolAddress.length();
// We need to store the size, since it varies
myRepresentation = new char[mySize+1];
myRepresentation[0] = mySize;
myRepresentation[1] = hardwareAddressLength;
myRepresentation[2] = protocolAddressLength;
unsigned int offset = 3; // just to shorten notation
memcpy(myRepresentation+offset, senderHardwareAddress.c_str(), senderHardwareAddress.size());
offset += senderHardwareAddress.size();
memcpy(myRepresentation+offset, senderProtocolAddress.c_str(), senderProtocolAddress.size());
offset += senderProtocolAddress.size();
memcpy(myRepresentation+offset, targetHardwareAddress.c_str(), targetHardwareAddress.size());
offset += targetHardwareAddress.size();
memcpy(myRepresentation+offset, targetProtocolAddress.c_str(), targetProtocolAddress.size());
return myRepresentation;
}
While the constructor can be defined as such:
ArpHeader& operator=(const char* buffer){
hardwareAddressLength = buffer[1];
protocolAddressLength = buffer[2];
unsigned int offset = 3; // just to shorten notation
senderHardwareAddress = string(buffer+offset, hardwareAddressLength);
offset += hardwareAddressLength;
senderProtocolAddress = string(buffer+offset, protocolAddressLength);
offset += protocolAddressLength;
targetHardwareAddress = string(buffer+offset, hardwareAddressLength);
offset += hardwareAddressLength;
targetProtocolAddress = string(buffer+offset, protocolAddressLength);
return *this;
}
ArpHeader(const char* buffer){
*this = buffer; // Re-using the operator=
}
Then using your class is as simple as:
ArpHeader h1, h2;
h1.hardwareAddressLength = 3;
h1.protocolAddressLength = 10;
h1.senderHardwareAddress = "foo";
h1.senderProtocolAddress = "something1";
h1.targetHardwareAddress = "bar";
h1.targetProtocolAddress = "something2";
cout << h1.senderHardwareAddress << ", " << h1.senderProtocolAddress
<< " => " << h1.targetHardwareAddress << ", " << h1.targetProtocolAddress << endl;
const char* gottaSendThisSomewhere = h1;
h2 = gottaSendThisSomewhere;
cout << h2.senderHardwareAddress << ", " << h2.senderProtocolAddress
<< " => " << h2.targetHardwareAddress << ", " << h2.targetProtocolAddress << endl;
delete[] gottaSendThisSomewhere;
Which should offer you the utility needed, and keep your code working without changing anything out of the class.
Note however that if you're willing to change the rest of the code a bit (talking here about the one you've written already, ouside of the class), jxh's answer should work as fast as this, and is more elegant on the inner side.

Variable length array inside a c++ sturcture

I wondering what the best solution is for a structure with variable length array for one of the fields. I've done a bunch of research and I haven't seen a clear answer yet.
I've been playing with the below code and trying to get the varField to be set to an array the size of 10 bytes.
typedef struct TestStruct{
int size;
unsigned char varField[1];
}
I have tried doing zero sized array and that gives me a compile error.
I also tried something like this and it gave me a compile error.
int size= 10;
struct TestStruct*test = malloc(sizeof(struct TestStruct) + (size- 1));
test->size= size;
Thank you so much for help.
The preferred way is to use the dynamically re-sizable std::vector. This class has the rule of five built in.
struct TestStruct {
std::vector<unsigned char> varField;
}
If you're allergic to the standard library, you could use:
unsigned char *varfield;
And supply the necessary constructors/destructors.
If you are implementing messages, a better solution is to set up a hierarchy:
struct Message_Base
{
unsigned int message_length_in_bytes;
unsigned int message_id;
virtual Checksum_Type calculate_checksum(void) = 0;
virtual bool send_message(Receiver& r) = 0;
virtual bool receive_message(Sender& s) = 0;
virtual void process_message(void) = 0;
};
Each child class would be a different message with possible different lengths. Some possible common methods to all message are listed.
This is how to implement using Object Oriented and C++.
The class C language implementation is to declare a zero length array at the end for the message's unique data.
You look like you want a std::vector<unsigned char>:
struct TestStruct{
std::vector<unsigned char> varField;
}
and you get the size with:
ts.varField.size();
You can't. In C++ dynamic size arrays are illegal. The size of an array must be a compile time constant expression.
The options you have basically are
Use an STL container like std::vector or the like. The benefits are that they also take care of memory allocation and deallocation for you.
Use a pointer in your struct and allocate the memory for it dynamically. Don't forget to use delete[] instead of just delete!
In most compilers, the following will work:
template<unsigned N>
struct TestStruct {
unsigned size = N;
unsigned char varField[N];
};
struct ITestStruct {
unsigned size;
unsigned char varField[1]; // variable
};
template<unsigned N>
ITestStruct* make_test_struct() {
return reinterpret_cast<ITestStruct*>(new TestStruct<N>());
};
ITestStruct* make_test_struct( unsigned n ) {
char* buff = new char[ sizeof(ITestStruct)+n-1 ];
ITestStruct* retval = reinterpret_cast<ITestStruct*>(buff);
retval->size = n;
return retval;
}
If you replace char with another non-POD type, things will get hairy.

Simplest way to read binary data from a std::vector<unsigned char>?

I have a lump of binary data in the form of const std::vector<unsigned char>, and want to be able to extract individual fields from that, such as 4 bytes for an integer, 1 for a boolean, etc. This needs to be, as far as possible, both efficient and simple. eg. It should be able to read the data in place without needing to copy it (eg. into a string or array). And it should be able to read one field at a time, like a parser, since the lump of data does not have a fixed format. I already know how to determine what type of field to read in each case - the problem is getting a usable interface on top of an std::vector for doing this.
However I can't find a simple way to get this data into an easily usable form that gives me useful read functionality. eg. std::basic_istringstream<unsigned char> gives me a reading interface, but it seems like I need to copy the data into a temporary std::basic_string<unsigned char> first, which is not idea for bigger blocks of data.
Maybe there is some way I can use a streambuf in this situation to read the data in place, but it would appear that I'd need to derive my own streambuf class to do that.
It occurs to me that I can probably just use sscanf on the vector's data(), and that would seem to be both more succinct and more efficient than the C++ standard library alternatives. EDIT: Having been reminded that sscanf doesn't do what I wrongly thought it did, I actually don't know a clean way to do this in C or C++. But am I missing something, and if so, what?
You have access to the data in a vector through its operator[]. A vector's data is guranteed to be stored in a single contiguous array, and [] returns a reference to a member of that array. You may use that reference directly, or through a memcpy.
std::vector<unsigned char> v;
...
byteField = v[12];
memcpy(&intField, &v[13], sizeof intField);
memcpy(charArray, &v[20], lengthOfCharArray);
EDIT 1:
If you want something "more convenient" that that, you could try:
template <class T>
ReadFromVector(T& t, std::size_t offset,
const std::vector<unsigned char>& v) {
memcpy(&t, &v[offset], sizeof(T));
}
Usage would be:
std::vector<unsigned char> v;
...
char c;
int i;
uint64_t ull;
ReadFromVector(c, 17, v);
ReadFromVector(i, 99, v);
ReadFromVector(ull, 43, v);
EDIT 2:
struct Reader {
const std::vector<unsigned char>& v;
std::size_t offset;
Reader(const std::vector<unsigned char>& v) : v(v), offset() {}
template <class T>
Reader& operator>>(T&t) {
memcpy(&t, &v[offset], sizeof t);
offset += sizeof t;
return *this;
}
void operator+=(int i) { offset += i };
char *getStringPointer() { return &v[offset]; }
};
Usage:
std::vector<unsigned char> v;
Reader r(v);
int i; uint64_t ull;
r >> i >> ull;
char *companyName = r.getStringPointer();
r += strlen(companyName);
If your vector stores binary data, you can't use sscanf or similar, they work on text.
For converting a byte for a bool is simple enough
bool b = my_vec[10];
For extracting an unsigned int that's stored in big endian order (assuming your ints are 32 bits):
unsigned int i = my_vec[10] << 24 | my_vec[11] << 16 | my_vec[12] << 8 | my_vec[13];
A 16 bit unsigned short would be similar:
unsigned short s = my_vec[10] << 8 | my_vec[11];ยจ
If you can afford the Qt dependency, QByteArray has the fromRawData() named constructor, which wraps existing data buffers in a QByteArray without copying the data. With that byte array, you can the feed a QTextStream.
I'm not aware of any such function in the standard streams library (short of implementing your own streambuf, of course), but I'd love to be proved wrong :)
You can use a struct that describes the data you are trying to extract. You can move data from your vector into the struct like this:
struct MyData {
int intVal;
bool boolVal;
char[15] stringVal;
} __attribute__((__packed__));
// assuming all extracted types are prefixed with a one byte indicator.
// Also assumes "vec" is your populated vector
int pos = 0;
while (pos < vec.size()-1) {
switch(vec[pos++]) {
case 0: { // handle int
int intValue;
memcpy(&vec[pos], &intValue, sizeof(int));
pos += sizeof(int);
// do something with handled value
break;
}
case 1: { // handle double
double doubleValue;
memcpy(&vec[pos], &doubleValue, sizeof(double));
pos += sizeof(double);
// do something with handled value
break;
}
case 2: { // handle MyData
struct MyData data;
memcpy(&vec[pos], &data, sizeof(struct MyData));
pos += sizeof(struct MyData);
// do something with handled value
break;
}
default: {
// ERROR: unknown type indicator
break;
}
}
}
Use a for loop to iterate over the vector and use bitwise operators to access each bit group. For example, to access the upper four bits of the first usigned char in your vector:
int myInt = vec[0] & 0xF0;
To read the fifth bit from the right, right after the chunk we just read:
bool myBool = vec[0] & 0x08;
The three least significant (lowest) bits can be accesed like so:
int myInt2 = vec[0] & 0x07;
You can then repeat this process (using a for loop) for every element in your vector.