I am new to Arduino and C++ development, coming from C#, so I am likely missing some fundamental understanding. Kindly answer accordingly.
Context
I am writing an Arduino sketch that forms an HTTP GET request to receive data from a web API. When the response arrives, I am able to read the stream into byte data[] using client.read(data, client.available()). In my application, I know each byte represents a char in ASCII encoding. To process the response, I want to convert this byte[] to a char[], but this got me thinking...
Question
How in C++ can I generically cast a byte[] to another known type without copying memory? In C# I would achieve this using MemoryMarshal. Something tells me I should be able to simply initialise an object from a pointer?
Many Thanks
byte is not a native C++ type; the Arduino environment creates it with a typedef. It is actually uint8_t (an unsigned 8-bit integer). A char on Arduino is also an 8-bit type with the same representation, so you don't actually have to do anything: a byte array and a char array are the same bytes in memory, just labelled differently.
You could use a (char*) cast to improve your code's clarity, and this would not copy any data.
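For example, a minimal sketch of the idea, assuming an EthernetClient/WiFiClient-style client as in the question (the buffer size of 64 is arbitrary):

byte data[65];
int len = client.read(data, min(client.available(), 64)); // fill the byte buffer
data[len] = '\0';          // NUL-terminate so the bytes form a valid C string

char* text = (char*)data;  // reinterpret the same memory; nothing is copied
Serial.println(text);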
Update:
You can use a cast in C or C++ to tell the compiler to interpret some raw data as a different data type. In the example below, an array of 6 floats containing 2D vectors arranged XYXYXY is cast to an array of 2D vector structures. This is done without any data being copied. However, there are many pitfalls with this technique. You need to be absolutely sure how the compiler lays out the underlying storage of the structure. This is not defined by the standard, so it can vary between compilers; many will add padding for word alignment, which will vary based on the architecture used. So use this method with care.
#include <cstdio>

struct My2DVector {
float x, y;
};
float flatVectorData[] = { 0.0, 1.0, 2.0, 2.5, -5.0, 3.0 };
// Cast the pointer to float to a pointer to My2DVector; no data is copied
My2DVector* structVectorData = (My2DVector*)flatVectorData;
printf("Vector 2 (%f %f)\n", structVectorData[1].x, structVectorData[1].y);
Let there be a structure
struct MyDataStructure
{
int a;
int b;
std::string c;
};
Let there be a function in the interface exposed by a dll.
class IDllInterface
{
public:
virtual void getData(MyDataStructure&) = 0;
};
From a client exe which loads the dll, would the following code be safe?
...
IDllInterface* dll = DllFactory::getInterface(); // Imagine this exists
MyDataStructure data;
dll->getData(data);
...
Assume, of course, that MyDataStructure is known to both the client and the dll. Also, as I understand it, since the code is compiled separately for the dll and the exe, the layout of MyDataStructure could differ between compilers/compiler versions. Is my understanding correct?
If so, how can you pass data across the dll boundary safely when working with different compilers/compiler versions?
You could use a "protocol" approach. For this, you could use a memory buffer to transfer the data and both sides just have to agree on the buffer layout.
The protocol agreement could be something like:
We don't use a struct, we just use a memory buffer (pass a pointer, or whatever means your toolkit allows for sharing a memory buffer).
We clear the buffer to 0s before setting any data in it.
All ints use 4 bytes in the buffer. This means each side uses whatever integer type is 4 bytes under its compiler, e.g. int or long.
For the particular case of two ints, the first 8 bytes has the ints and after that it's the string data.
#define MAX_STRING_SIZE_I_NEED 128
// 8 bytes for ints.
#define DATA_SIZE (MAX_STRING_SIZE_I_NEED + 8)
char xferBuf[DATA_SIZE];
So the dll sets the ints etc., e.g.
// In the dll: "int" here is whatever type is known to be 4 bytes.
void GetData(void* p)
{
    char* buf = (char*)p;               // void* has no pointer arithmetic; use char*
    *(int*)buf       = intA_ValueImSending;
    *(int*)(buf + 4) = intB_ValueImSending;
    strcpy(buf + 8, stringBuf_ImSending);
}
On the receiving end it's easy enough to place the buffered values in the struct:
char buf[DATA_SIZE];
theDll.GetData((void*)buf);
theStructInstance.intA = *(int*)buf;
theStructInstance.intB = *(int*)(buf + 4);
...
If you want, you could even agree on the endianness of the bytes per integer and set each of the 4 bytes of each integer in the buffer yourself - but you probably wouldn't need to go to that extent.
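A sketch of that stricter agreement, with a hypothetical helper (little-endian chosen arbitrarily; both sides would have to agree on it):

// Write a 32-bit value into the buffer as little-endian bytes,
// independent of the host's native byte order.
void put_le32(unsigned char* p, unsigned long v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}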
For more general purpose both sides could agree on "markers" in the buffer. The buffer would look like this:
<marker>
<data>
<marker>
<data>
<marker>
<data>
...
Marker: 1st byte indicates the data type, the 2nd byte indicates the length (very much like a network protocol).
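A hypothetical sketch of writing one such record (the type codes here are made up; only the 1-byte-type, 1-byte-length layout matters):

#include <string.h>

#define TYPE_INT32  1
#define TYPE_STRING 2

// Append one <marker><data> record: 1 byte type, 1 byte length, then payload.
// Returns the number of bytes written so records can be chained.
int putRecord(char* buf, char type, const void* data, unsigned char len)
{
    buf[0] = type;
    buf[1] = (char)len;
    memcpy(buf + 2, data, len);
    return 2 + len;
}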
If you want to pass a string in COM, you normally want to use a COM BSTR object. You can create one with SysAllocString. This is defined to be neutral between compilers, versions, languages, etc. Contrary to popular belief, COM does directly support the int type--but from its perspective, int is always a 32-bit type. If you want a 64-bit integer, that's a Hyper, in COM-speak.
Of course you could use some other format that both sides of your connection know/understand/agree upon. Unless you have an extremely good reason to do this, it's almost certain to be a poor idea. One of the major strengths of COM is exactly the sort of interoperation you seem to want - but inventing your own string format would limit that substantially.
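For reference, the usual BSTR pattern looks like this (Windows-only, and the string contents here are just an example):

#include <windows.h>
#include <oleauto.h>

// Allocate a compiler-neutral COM string, hand it across the boundary, free it.
BSTR s = SysAllocString(L"hello across the boundary");
// ... pass s to the COM interface method ...
SysFreeString(s);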
Using JSON for communication.
I think I have found an easier way to do it, hence answering my own question. As suggested in the answer by #Greg, one has to make sure that the data representation follows a protocol, e.g. a network protocol. This makes the object representation between different binary components (the exe and dll here) irrelevant. If we think about it, this is the same problem that JSON solves by defining a simple object-representation protocol.
So a simple yet powerful solution, it seems to me, would be to construct a JSON object from your object in the exe, serialise it, pass it across the dll boundary as bytes, and deserialise it in the dll. The only agreement between the dll and the exe would be that both use the same string encoding (e.g. UTF-8).
https://en.wikibooks.org/wiki/JsonCpp
One can use the Jsoncpp library above. Strings are encoded as UTF-8 by default in the Jsoncpp library, which is convenient as well :-)
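A minimal sketch with Jsoncpp, assuming the MyDataStructure from the question and the library's usual json/json.h header:

#include <json/json.h>
#include <sstream>
#include <string>

std::string toJson(const MyDataStructure& d)
{
    Json::Value root;
    root["a"] = d.a;
    root["b"] = d.b;
    root["c"] = d.c;                        // stored as UTF-8 by default
    Json::StreamWriterBuilder writer;
    return Json::writeString(writer, root); // this string crosses the boundary
}

bool fromJson(const std::string& s, MyDataStructure& d)
{
    Json::Value root;
    Json::CharReaderBuilder reader;
    std::string errs;
    std::istringstream in(s);
    if (!Json::parseFromStream(reader, in, &root, &errs))
        return false;
    d.a = root["a"].asInt();
    d.b = root["b"].asInt();
    d.c = root["c"].asString();
    return true;
}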
As far as I could find, the width of the bool type is implementation-defined. But are there any fixed-width boolean types, or should I stick to, e.g., a uint8_t to represent a fixed-width bool?
[EDIT]
I made a Python script that auto-generates a C++ class which can hold the variables I want to be able to send between a microcontroller and my computer. The way it works is that it also keeps two arrays holding a pointer to each of these variables and the sizeof each of them. This gives me the information needed to easily serialize and deserialize each of these variables. For this to work, however, the sizeof, endianness, etc. of the variable types have to be the same on both sides, since I'm using the same generated code on both sides.
I don't know if this will be a problem yet, but I don't expect it to be. I have already worked with this (32-bit ARM) chip before and haven't had problems sending integer and float types in the past. However, it will be a few days until I'm back and can try booleans out on the chip. This might be a bigger issue later, since this code might be reused on other chips.
So my question is: is there a fixed-width bool type defined in the standard libraries, or should I just use a uint8_t to represent the boolean?
There is not. Just use uint8_t if you need to be sure of the size. Any integer type can easily be treated as boolean in C-related languages. See https://stackoverflow.com/a/4897859/1105015 for a lengthy discussion of how bool's size is not guaranteed by the standard to be any specific value.
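A minimal sketch of the round trip:

#include <stdint.h>

bool flag = true;

// Serialize: exactly one byte on the wire, regardless of sizeof(bool).
uint8_t wire = flag ? 1 : 0;

// Deserialize: any non-zero byte reads back as true.
bool decoded = (wire != 0);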
I'm working with some embedded code and I am writing something new from scratch so I am preferring to stick with the uint8_t, int8_t and so on types.
However, when porting a function:
void functionName(char *data)
to:
void functionName(int8_t *data)
I get the compiler warning "converts between pointers to integer types with different sign" when passing a literal string to the function (i.e. when calling functionName("put this text in");).
Now, I understand why this happens, and these lines are only debug; however, I wonder what people feel is the most appropriate way of handling this, short of typecasting every literal string. I don't feel that blanket typecasting is any safer in practice than using potentially ambiguous types like "char".
You seem to be doing the wrong thing, here.
Characters are not defined by C as being 8-bit integers, so why would you ever choose to use int8_t or uint8_t to represent character data, unless you are working with UTF-8?
For C's string literals, their type is pointer to char, and that's not at all guaranteed to be 8-bit.
Also it's not defined if it's signed or unsigned, so just use const char * for string literals.
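In other words, for the literal-string call from the question, keep the text interface in plain char:

// Text stays char; the fixed-width types are for raw octets, not characters.
void functionName(const char *data);

functionName("put this text in");   // no sign-conversion warning now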
To answer your addendum (the original question was nicely answered by #unwind): I think it mostly depends on the context. If you are working with text, i.e. string literals, you have to use const char* or char*, because the compiler will convert the characters accordingly. Short of writing your own string implementation, you are probably stuck with whatever the compiler provides. However, the moment you have to interact with someone/something outside of your CPU context, e.g. network, serial, etc., you have to have control over the exact size (which I suppose is where your question stems from). In this case I would suggest writing functions to convert strings, or any data type for that matter, to uint8_t buffers for serialized sending (or receiving).
const char* my_string = "foo bar!";
uint8_t* buffer = string2sendbuffer(my_string);
my_send(buffer, destination);
The string2sendbuffer function would know everything there is to know about putting characters in a buffer. For example, it might know that you have to encode each char into two buffer elements using big-endian byte ordering. This function is most certainly platform-dependent, but it encapsulates all this platform dependence, so you would gain a lot of flexibility.
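A hypothetical implementation, just to make the shape concrete (one buffer byte per character, ASCII assumed; the real encoding is exactly the platform-dependent detail this function is meant to hide):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

uint8_t* string2sendbuffer(const char* s)
{
    size_t len = strlen(s) + 1;           // keep the terminator
    uint8_t* buf = (uint8_t*)malloc(len);
    if (buf)
        memcpy(buf, s, len);              // char and uint8_t are both one byte here
    return buf;                           // caller is responsible for free()
}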
The same goes for every other complex data type. For everything else (where the compiler does not have that strong an opinion) I would advise using the (u)intX_t types provided by stdint.h (which should be portable).
It is implementation-defined whether the type char is signed or unsigned. It looks like you are using an environment where it is unsigned.
So, you can either use uint8_t or stick with char, whenever you are dealing with characters.
So I'm doing tests with the C++ MongoDB driver
Here is my test code
http://pastebin.com/eQUekQU2
In the code I make a small integer array and insert it into the mongo database as binary.
I retrieve the binary with this line of code as I traverse the rows
mongo::BSONElement array = obj["binTest"];
At this point I have the binary in this array variable, which is of BSONElement type. I want to convert this binary back into an integer array. A function to do this can be seen in the API:
http://api.mongodb.org/cplusplus/current/classmongo_1_1_b_s_o_n_element.html#a8f4902eacf15f5069f4bb752bfd0aef4
Function Header
const char* mongo::BSONElement::binData (int &len)const
I want to run the function, get the binary data in const char* format, and convert it into an int array. Can I cast it, or do I have to go every 4 bytes and put it into an array myself?
Thanks
As long as you run the code on a hardware architecture with the same endianness (PCs are little-endian), you can cast the retrieved char pointer to an int32 pointer and treat it as an int32 array.
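A sketch of that same-endianness case, continuing from the binData call above (assuming len is a multiple of 4):

int len = 0;
const char* raw = array.binData(len);

// Reinterpret the raw bytes as 32-bit integers; valid only when writer
// and reader share the same endianness.
const int32_t* values = (const int32_t*)raw;
int count = len / (int)sizeof(int32_t);
for (int i = 0; i < count; ++i)
    printf("%d\n", (int)values[i]);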
However, if you have data exchange or cross-platform interoperability in your mind, it would be a bad idea to treat arrays of data types as byte arrays for storage/exchange purposes.
You would need to consider that different hardware platforms might have different endianness. In such cases, casting the char pointer to int32* would not work, and you would need to take the individual bytes and compose the int32 values yourself. But it can get even trickier, because the endianness of the int32s written into the BSON binData is exactly the endianness of the computer architecture the 'write code' was running on - which you would need to know to decode the binData again.
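A sketch of that byte-wise composition, assuming you know the writer was little-endian:

#include <stdint.h>

// Rebuild an int32 from four raw little-endian bytes,
// independent of the endianness of the machine running this code.
int32_t readInt32LE(const unsigned char* b)
{
    return (int32_t)((uint32_t)b[0]
                   | ((uint32_t)b[1] << 8)
                   | ((uint32_t)b[2] << 16)
                   | ((uint32_t)b[3] << 24));
}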
You would also need to consider that for arrays of other data types, casting might not work despite using the same hardware architecture. For example, the size of the long data type is different among the popular OS's for the Intel x86/x64 platform: http://software.intel.com/en-us/articles/size-of-long-integer-type-on-different-architecture-and-os
NOTE: I know that this has been asked many times before, but none of the questions have had a link to a concrete, portable, maintained library for this.
I need a C or C++ library that implements Python/Ruby/Perl like pack/unpack functions. Does such a library exist?
EDIT: Because the data I am sending is simple, I have decided to just use memcpy, pointers, and the hton* functions. Do I need to manipulate a char in any way to send it over the network in a platform-agnostic manner? (The char is only used as a byte, not as a character.)
In C/C++ you would usually just write a struct with the various members in the correct order (correct packing may require compiler-specific pragmas) and dump/read it to/from a file with a raw fwrite/fread (or read/write when dealing with C++ streams). Actually, pack and unpack were born to read stuff generated with this method.
If you need the result in a buffer rather than a file, it's even easier: just copy the structure to your buffer with a memcpy.
If the representation must be portable, your main concerns are byte ordering and field packing; the first problem can be solved with the various hton* functions, while the second can be handled with compiler-specific directives.
In particular, many compilers support the #pragma pack directive (see here for VC++, here for gcc), that allows you to manage the (unwanted) padding that the compiler may insert in the struct to have its fields aligned on convenient boundaries.
Keep in mind, however, that some architectures do not allow accessing fields of particular types unless they are aligned on their natural boundaries, so in those cases you would probably need some manual memcpys to copy the raw bytes into properly aligned variables.
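Putting byte order and packing together, a sketch with a hypothetical message layout (arpa/inet.h assumed for the hton* functions; use winsock2.h on Windows):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#pragma pack(push, 1)     // ask the compiler not to insert padding
struct WireMessage {
    uint32_t id;
    uint16_t flags;
    uint8_t  kind;
};
#pragma pack(pop)

// Fix the byte order field by field, then dump the raw struct into the buffer.
void packMessage(const WireMessage& in, char* buf)
{
    WireMessage tmp;
    tmp.id    = htonl(in.id);
    tmp.flags = htons(in.flags);
    tmp.kind  = in.kind;
    memcpy(buf, &tmp, sizeof tmp);
}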
Why not boost serialization or protocol buffers?
Yes: use std::copy from <algorithm> to operate on the byte representation of a variable. Every variable T x; can be accessed as a byte array via char * p = reinterpret_cast<char*>(&x);, and p can be treated like a pointer to the first element of an array char[sizeof(T)]. For example:
char buf[100];
double q = get_value();
char const * const p = reinterpret_cast<char const *>(&q);
std::copy(p, p + sizeof(double), buf);
// more stuff like that
some_stream.write(buf, sizeof(double)); // ... etc.
And to go back:
double r;
std::copy(data, data + sizeof(double), reinterpret_cast<char *>(&r));
In short, you don't need a dedicated pack/unpack in C++, because the language already allows you access to its variables' binary representation as a standard part of the language.