Encoding and decoding uint8_t using TinyCbor C Library - c++

I am implementing a C++11-based application and I am using the TinyCbor C library for encoding and decoding application-specific data as below:
#include "cbor.h"
#include <iostream>
using namespace std;
int main() {
struct MyTest {
uint8_t varA;
float vabB;
};
MyTest obj;
obj.varA = 100; // If I set it t0 below 20 then it works
obj.varB = 10.10;
uint8_t buff[100];
//Encode
CborEncode encoder;
CborEncode array;
cbor_encoder_init(&encoder, buff, sizeof(buff), 0);
cbor_encoder_create_array(&encode, &array, CborIndefiniteLength);
cbor_encode_simple_value(&array, obj.varA);
cbor_encode_float(&array, obj.varB);
cbor_encoder_close_container(&encoder, &array);
// Decode
CborParser parse;
CborValue value;
cbor_parser_init(buff, sizeof(buff), 0, &parser, &value);
CborValue array;
cbor_value_enter_container(&value, &array);
uint8_t val;
cbor_value_get_simple_type(&array, &val);
// This prints blank
cout << "uint8_t value: " << static_cast<int>(val) << endl;
float fval;
cbor_value_get_simple_type(&array, &fval);
cout << "float value: " << fval << endl;
return 0;
}
The above code works when I set the value of uint8_t varA to below 20; I see the value getting printed on the console. But if I set it to more than 20 then it sometimes gives the error CborErrorIllegalSimpleType, or if the value is set to 21 then it returns the type as CborBooleanType or CborNullType.
What is wrong with the code?
How do I encode and decode uint8_t using TinyCbor?

You have a few things going on here.
cbor_encoder_create_array(&encoder, &arrayEncoder, CborIndefiniteLength);
Don't use indefinite length unless you plan on streaming the encoding. If you have all the data in front of you when encoding, use a definite length. See why here: Serialize fixed size Map to CBOR. Also, I'm not sure what version you are using, but indefinite-length objects were at least fairly recently a TODO item for TinyCBOR.
cbor_value_get_simple_type(&array, &val);
You don't need simple here. Simple types are primitives that are mostly undefined. CBOR's default integer type is int64_t, a signed 64-bit integer. Simple does allow for a natural stop at 8 bits, but 20 in simple is Boolean false, 21 is true, 22 is null. You discovered this already. It can't store a negative or a float, and while you could use it to represent an 8-bit value like 100, you really shouldn't. The good news about CBOR's types is that while the default is 64-bit, it uses as little memory to encode as needed. So storing 100 in CBOR is small, not 8 bytes. EDIT: Clarification: it takes 2 bytes to store the integer 100; CBOR uses one byte for values 0-23, but 24 and up carries a byte of overhead.
By overhead I mean: when you encode a CBOR integer, the first three bits are the major type (binary 000 means unsigned integer) and the remaining five bits are the additional information. A value in the range 0-23 goes directly into those five bits, so you get it all in one byte; the values 24-27 instead say that the actual number follows in the next 1, 2, 4 or 8 bytes. See the chart here to understand it: https://en.m.wikipedia.org/wiki/CBOR So 24-255 requires 2 bytes encoded, etc etc. A far cry from it always using 8 bytes just because it can.
The short version here is that CBOR won't use more space than needed - BUT - it's not a strongly typed serializer! If you mean for 100 inside 8 bits to be stored in that array and someone stores a 10000 that needs 16 bits, it'll look and work fine until you go to parse and try to store a large number in an 8-bit spot. You need to cast or validate your data. I recommend parse THEN validate.
cbor_value_get_simple_type(&array, &fval);
cout << "float value: " << fval << endl;
I'd have to look in TinyCBOR's code, but I think this is a happy accident and not technically supported. Because simple types share the same three major-type bits, you are able to get a value back with get_simple. You should instead examine the type and make the correct call to get a full- or half-precision float.
TinyCBOR is pretty nice, but there are definitely a couple of gotchas hidden. It really helps to understand the CBOR encoding even though you are trusting the serializer to do the work for you.

Use CborError cbor_encode_uint(CborEncoder *encoder, uint64_t value) instead. This function will encode an unsigned integer value with the smallest representation.
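For reference, a sketch of what the whole round trip could look like with a definite-length array, cbor_encode_uint and cbor_encode_float (error checking is omitted for brevity, and exact function availability can differ between TinyCBOR versions):

#include "cbor.h"
#include <cstdint>
#include <iostream>

int main() {
    uint8_t varA = 100;
    float varB = 10.10f;
    uint8_t buff[100];

    // Encode: a definite-length array with exactly two items.
    CborEncoder encoder, arrayEncoder;
    cbor_encoder_init(&encoder, buff, sizeof(buff), 0);
    cbor_encoder_create_array(&encoder, &arrayEncoder, 2);
    cbor_encode_uint(&arrayEncoder, varA);   // plain unsigned integer, not a "simple" value
    cbor_encode_float(&arrayEncoder, varB);
    cbor_encoder_close_container(&encoder, &arrayEncoder);

    // Decode
    CborParser parser;
    CborValue value, item;
    cbor_parser_init(buff, sizeof(buff), 0, &parser, &value);
    cbor_value_enter_container(&value, &item);

    uint64_t decoded = 0;
    cbor_value_get_uint64(&item, &decoded);
    if (decoded <= UINT8_MAX) {              // parse THEN validate before narrowing
        std::cout << "uint8_t value: " << static_cast<int>(decoded) << std::endl;
    }
    cbor_value_advance_fixed(&item);         // step to the next array element

    float fval = 0.0f;
    cbor_value_get_float(&item, &fval);
    std::cout << "float value: " << fval << std::endl;

    cbor_value_leave_container(&value, &item);
    return 0;
}

With a plain unsigned integer, 100 decodes back as 100 regardless of whether it took one or two bytes on the wire.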

Related

How to iterate over every bit of a type in C++

I wanted to write a Digital Search Tree in C++ using templates. To do that, given a type T and data of type T, I have to iterate over the bits of this data. Doing this on integers is easy: one can just shift the number to the right an appropriate number of positions and "&" the number with 1, as described for example here: How to get nth bit values. The problem starts when one tries to get the i'th bit from the templated data. I wrote something like this
#include <cstdint>
#include <iostream>

template<typename T>
bool getIthBit (T data, unsigned int bit) {
    return ((*(((char*)&data)+(bit>>3)))>>(bit&7))&1;
}

int main() {
    uint32_t a = 16;
    for (int i = 0; i < 32; i++) {
        std::cout << getIthBit (a, i);
    }
    std::cout << std::endl;
}
This works, but I am not exactly sure whether it is undefined behavior. The problem with this is that to iterate over all bits of the data, one has to know how many of them there are, which is hard for struct data types because of padding. For example here
#include <cstdint>
#include <iostream>

struct s {
    uint32_t i;
    char c;
};

int main() {
    std::cout << sizeof (s) << std::endl;
}
The actual data has 5 bytes, but the output of the program says it has 8. I don't know how to get the actual size of the data, or if it is at all possible. A question about this was asked here How to check the size of struct w/o padding? , but the answers are just "don't".
It's easy to know how many bits there are in a type. There's exactly CHAR_BIT * sizeof(T). sizeof(T) is the actual size of the type in bytes. But indeed, there isn't a general way within standard C++ to know which of those bits - that are part of the type - are padding.
I recommend not attempting to support types that have padding as keys of your DST.
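For what it is worth, a variant of the question's getIthBit that copies the object representation into an unsigned char buffer sidesteps any worries about the pointer cast (a sketch; the helper name is just illustrative):

#include <climits>
#include <cstring>
#include <iostream>

// Reads bit number `bit` of the object representation of `data`.
// There are CHAR_BIT * sizeof(T) such bits; for padded types some of them are padding.
template <typename T>
bool getIthBit(const T& data, unsigned int bit) {
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &data, sizeof(T));   // inspecting the bytes via unsigned char is well defined
    return (bytes[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1;
}

int main() {
    unsigned int a = 16;
    for (unsigned int i = 0; i < CHAR_BIT * sizeof a; ++i)
        std::cout << getIthBit(a, i);
    std::cout << '\n';
}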
The following trick might work for finding the padding bits of trivially copyable classes:
Use std::memset to set all bits of the object to 0.
For each sub object with no sub objects of its own, set all bits to 1 using std::memset.
For each sub object with sub objects of its own, perform the previous and this step recursively.
Check which bits stayed 0.
I'm not sure if there are any technical guarantees that the padding actually stays 0, so whether this works may be unspecified. Furthermore, there can be non-class types that have padding, and the described trick won't detect those; long double is a typical example, and I don't know if there are others. This probably won't detect unused bits of integers that underlie bitfields either.
So, there are a lot of caveats, but it should work in your example case:
// assumes the struct s from the question, plus:
#include <bitset>
#include <climits>
#include <cstring>
#include <iostream>

s sobj;
std::memset(&sobj, 0, sizeof sobj);
std::memset(&sobj.i, -1, sizeof sobj.i);
std::memset(&sobj.c, -1, sizeof sobj.c);
std::cout << "non-padding bits:\n";
unsigned long long ull = 0;
std::memcpy(&ull, &sobj, sizeof sobj);
std::cout << std::bitset<sizeof sobj * CHAR_BIT>(ull) << std::endl;
There's a Standard way to know if a type has unique representation or not. It is std::has_unique_object_representations, available since C++17.
So if an object has unique representations, it is safe to assume that every bit is significant.
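A rough illustration (C++17; the struct names and the expected results assume a typical ABI where long long is 8-byte aligned and int has no padding bits):

#include <type_traits>

struct Padded { long long a; char b; };   // trailing padding after b on typical ABIs
struct Dense  { int a; int b; };          // every bit participates in the value

static_assert(!std::has_unique_object_representations_v<Padded>,
              "padding makes object representations non-unique");
static_assert(std::has_unique_object_representations_v<Dense>,
              "no padding, so each value has exactly one representation");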
There's no standard way to know whether a non-unique representation is caused by padding bytes/bits, like in struct { long long a; char b; }, or by equivalent representations¹. And there is no standard way to know the offsets of the padding bits/bytes.
Note that "actual size" concept may be misleading, as padding can be in the middle, like in struct { char a; long long b; }
Internally the compiler has to distinguish padding bits from value bits to implement C++20 atomic<T>::compare_exchange_*. MSVC does this by zeroing padding bits with __builtin_zero_non_value_bits. Other compilers may use another name, another approach, or not expose atomic<T>::compare_exchange_* internals at this level.
¹ like multiple NaN floating point values

C++ File IO: Reading and Writing 16-bit Words

I want to write non-Unicode, 16-bit words to a file and read them back later. I know that with a bit of byte manipulation I can do this in char mode using fstream::read() and fstream::write(). What do I need to do to use 16-bit words directly?
For example, it seems I should be able to do the following:
basic_ofstream<uint16_t> aw;
aw.open("test.bin", ios::binary);
uint16_t c[] = {0x55aa, 0x1188};
aw.write(c, 2);
aw.close();
basic_ifstream<uint16_t> ax;
ax.open("test.bin", ios::binary);
uint16_t ui[2];
ax.read(ui, 2);
ax.close();
cout << endl << hex << unsigned(ui[0]) << " " << unsigned(ui[1]) << endl;
gcc 4.4 output:
d 0
Vc++10 output:
CCCC CCCC
I've also tried using std::basic_filebuf<uint16_t> direct and got the same results. Why?
I'm actually surprised you got the streams instantiated to do any reading at all! What the result will be is possibly implementation defined (i.e., you might find the behavior described in the compiler's documentation) but possibly it is just not specified (although not quite undefined). I don't think the stream classes are required to support instantiations for other types than char and wchar_t immediately, i.e., without the user providing at least some of the facets.
The standard stream classes are templates on the character type but aren't easy to instantiate for any unsupported type. At bare minimum, you'd need to implement a suitable std::codecvt<int16_t, char, std::mbstate_t> facet converting between the external representation in bytes and the internal representation. From the looks of it, the two systems you tried have made different choices for their default implementation.
std::codecvt<internT, externT, stateT> is the facet used to convert between an external representation of characters and an internal representation of characters. Streams are only required to support char which is considered to represent bytes as the external type externT. The internal character type internT can be any integral type but the conversion needs to be defined by implementing the code conversion facet. If I recall correctly, the streams can also assume that the state type stateT is std::mbstate_t (which is actually somewhat problematic because there is no interface defined for this type!).
Unless you are really dedicated to creating an I/O stream for your character type uint16_t, you probably want to read bytes using std::ifstream and convert them to your character type, and similarly for writing. To really create an I/O stream also supporting formatting, you'd need a number of other facets, too (e.g., std::ctype<uint16_t>, std::numpunct<uint16_t>), and you'd need to build a std::locale to contain all of these plus a few which can be instantiated from the standard library's implementation (e.g., std::num_get<uint16_t> and std::num_put<uint16_t>; I think their iterator types are suitably defaulted).
When I try your code, the file is written, but nothing is inside, its size is 0 after closing it. When reading from that file, nothing can be read. What you see in the output is uninitialized garbage.
Besides using ofstream/ifstream with the default char, you should not blindly rely on the read() and write() methods, because they do not indicate whether they actually wrote anything. Refer to http://en.cppreference.com/w/cpp/io/basic_ostream/write for details on this. Especially this part is interesting:
This function is an unformatted output function: it begins execution by
constructing an object of type sentry, which flushes the tie()'d
output buffers if necessary and checks the stream errors. After
construction, if the sentry object returns false, the function returns
without attempting any output.
It is likely that this is why there is no output written to your file: it seems it is not designed to work with any types other than char or similar.
Update: To see whether writing/reading succeeded, check the fail or bad bit, which should have already indicated that something went wrong.
cout << aw.fail() << aw.bad() << "\n";
cout << ax.fail() << ax.bad() << "\n";
Both were set to true, so your real question should have been: why did the call to write() fail?
I suggest reading: http://www.cplusplus.com/articles/DzywvCM9/
Snippets:
"The problem with these types is that their size is not well defined.
int might be 8 bytes on one machine, but only 4 bytes on another. The
only one that's consistent is char... which is guaranteed to always be
1 byte."
// u8/u16 are the fixed-width typedefs used in the linked article:
#include <cstdint>
#include <istream>
#include <ostream>
using namespace std;
typedef uint8_t  u8;
typedef uint16_t u16;

u16 ReadU16(istream& file)
{
    u16 val;
    u8 bytes[2];
    file.read( (char*)bytes, 2 );      // read 2 bytes from the file
    val = bytes[0] | (bytes[1] << 8);  // construct the 16-bit value from those bytes
    return val;
}

void WriteU16(ostream& file, u16 val)
{
    u8 bytes[2];
    // extract the individual bytes from our value
    bytes[0] = (val) & 0xFF;       // low byte
    bytes[1] = (val >> 8) & 0xFF;  // high byte
    // write those bytes to the file
    file.write( (char*)bytes, 2 );
}
You may want to refresh on the "typedef" keyword as well, for defining the guaranteed-#-bits types. While a little more of a learning curve, Boost and C99 compilers define guaranteed-size types as well. I'm not sure about C++0x, but it's too new to be portable.
You can use char specializations and reinterpret_cast:
basic_ofstream<char> aw;
...
aw.write( reinterpret_cast<const char*>(i16buf), n2write*sizeof(int16_t) );
basic_ifstream<char> ax;
...
ax.read( reinterpret_cast<char*>(i16buf), n2read*sizeof(int16_t) );
The "sizeof(int16_t)" is for the edge cases where sizeof(int16_t)==1 (e.g. DSP processors)
Of course, if you need to read/write in a specific byte order, then you need endian conversion functions. Note, there is no standard compile-time way of determining endianness.
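For completeness, a small self-contained sketch of that char-stream approach (the file name is arbitrary; note the bytes land in the file in host byte order, so add the endian conversion mentioned above if you need a fixed on-disk layout):

#include <cstdint>
#include <fstream>
#include <iostream>

int main() {
    const uint16_t out[2] = {0x55aa, 0x1188};

    std::ofstream aw("test.bin", std::ios::binary);
    aw.write(reinterpret_cast<const char*>(out), sizeof out);
    aw.close();

    uint16_t in[2] = {0, 0};
    std::ifstream ax("test.bin", std::ios::binary);
    ax.read(reinterpret_cast<char*>(in), sizeof in);
    ax.close();

    std::cout << std::hex << in[0] << " " << in[1] << std::endl;   // 55aa 1188
}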

short pointer to a float

I run this code in C++:
#include <iostream>
using namespace std;
int main()
{
float f = 7.0;
short s = *(short *)&f;
cout << sizeof(float) << endl
<< sizeof(short) << endl
<< s << endl;
return 0;
}
I get the following output:
4
2
0
but, in a lecture given at Stanford University, Professor Jerry Cain says he is sure the output will not be 0.
The lecture can be found here; he says that around the 48-minute mark.
Is he wrong, or has the standard changed since? Or is there a difference between platforms?
I'm using g++ to compile my code.
EDIT: in the next lecture he does mention "big endian" and "little endian" and says that they will affect the result.
#include <cassert>
#include <iostream>
using namespace std;

static void bitPrint(float f)
{
    assert(sizeof(int) == sizeof(float));
    int *data = reinterpret_cast<int*>(&f);
    for (int i = 0; i < sizeof(int) * 8; ++i)
    {
        int bit = (1 << i) & *data;
        if (bit) bit = 1;
        cout << bit;
    }
    cout << endl;
}

int main()
{
    float f = 7.0;
    bitPrint(f);
    return 0;
}
This program prints 00000000000000000000011100000010
Since sizeof(short) == 2 on your platform, you get the first 2 bytes, which are both zeros.
Note that since the sizes of types and possibly the float representation (not sure about this) are implementation defined, different output can be seen on different platforms.
Well, let's see. First you write a float into memory. It occupies 4 bytes, and its value is 7. A float in memory looks something like "sign bit -> exponent bits -> mantissa bits". I'm not sure how many bits there are for each part exactly; that probably depends on your platform (for IEEE 754 single precision it is 1 sign bit, 8 exponent bits and 23 mantissa bits).
Since the float's value is 7, it only occupies some of the least-significant bits on the right (I assume big-endian).
Your short pointer points to the beginning of the float, which means to the most significant bit. Since the value is greater than 0, the sign bit is zero. Since the float value is far on the right, we can say that those two most significant bytes are filled with zeros.
Now, provided that the size of a short is 2, which means we only take two bytes out of the float's 4 bytes, we get our 0.
I believe though, that this result is rather UB and can differ on different platforms, compilers, etc.
Accessing data through a pointer to a different type than it was stored as gives (except in a few special cases) undefined behaviour.
Firstly it's platform dependent how the data is stored, so different systems may well give different values, and secondly the compiler might well generate code that doesn't even see the value you'd expect, as it's allowed to do anything it likes when you do this (it's undefined behaviour due to the strict aliasing rules).
Having said that, there are probably reasons why the number you are seeing is valid, but you can't rely on it unless you specifically know your platform will do what you expect; it's not guaranteed by the standard.
He's "pretty" sure it's not zero, he says that explicitly.
However, given that the representation of a short can be big-endian or little-endian, I wouldn't be so certain. In any case, this is a throwaway line at the end of a fifty-minute lecture so we can forgive him a little. It may be he came back in the next lecture with a clarification.
You would need to examine the underlying bits at (at least) a byte-by-byte level to understand what's going on.
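One way to do that examination without the type-punning cast (a sketch; it assumes a 4-byte float as on the asker's platform):

#include <cstring>
#include <iomanip>
#include <iostream>

int main() {
    float f = 7.0f;
    unsigned char bytes[sizeof f];
    std::memcpy(bytes, &f, sizeof f);          // copy the object representation
    for (unsigned i = 0; i < sizeof f; ++i)    // lowest address first
        std::cout << std::hex << std::setw(2) << std::setfill('0')
                  << static_cast<int>(bytes[i]) << ' ';
    std::cout << '\n';   // prints "00 00 e0 40" on a little-endian IEEE 754 machine
}

On such a machine the first two bytes (the ones a 2-byte short would pick up) are both zero, which is why the question's program printed 0.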

what's the size of hex value of some memory address converted to int or other type?

For example:
int* x = new int;
int y = reinterpret_cast<int>(x);
y now holds the integer value of the memory address of variable x.
Variable y is of size int. Will that int size always be large enough to store the converted memory address of ANY TYPE being converted to int?
EDIT:
Or is safer to use long int to avoid a possible loss of data?
EDIT 2: Sorry people, to make this question more understandable: the thing I want to find out here is the size of the returned HEX value as a number, not the size of int nor the size of a pointer to int, but the plain hex value. I need to get that value in human-readable notation. That's why I'm using reinterpret_cast to convert that memory HEX value to a DEC value. But to store the value safely I also need to find out what kind of variable to store it in: int, long - what type is big enough?
No, that's not safe. There's no guarantee sizeof(int) == sizeof(int*)
On a 64 bit platform you're almost guaranteed that it's not.
As for the "hexadecimal value" ... I'm not sure what you're talking about. If you're talking about the textual representation of the pointer in hexadecimal ... you'd need a string.
Edit to try and help the OP based on comments:
Because computers don't work in hex. I don't know how else to explain it. An int stores some amount of bits (binary), as does a long. Hexadecimal is a textual representation of those bits (specifically, the base16 representation). strings are used for textual representations of values. If you need a hexadecimal representation of a pointer, you would need to convert that pointer to text (hex).
Here's a c++ example of how you would do that:
test.cpp
#include <string>
#include <iostream>
#include <sstream>

int main()
{
    int i = 0;
    int *p = &i;            // declare a pointer to an int (and point it at something)
    std::ostringstream oss; // create a stringstream
    std::string s;          // create a string

    // this takes the value of p (the memory address), converts it to
    // the hexadecimal textual representation, and puts it in the stream
    oss << std::hex << p;

    // Get a std::string from the stream
    s = oss.str();

    // Display the string
    std::cout << s << std::endl;
}
Sample output:
roach$ g++ -o test test.cpp
roach$ ./test
0x7fff68e07730
It's worth noting that the same thing is needed when you want to see the base10 (decimal) representation of a number - you have to convert it to a string. Everything in memory is stored in binary (base2)
On most 64-bit targets, int is still 32-bit while a pointer is 64-bit, so it won't work.
http://en.wikipedia.org/wiki/64-bit#64-bit_data_models
What you probably want is to use std::ostream's formatting of addresses:
int x(0);
std::cout << &x << '\n';
As to the length of the produced string, you need to determine the size of the respective pointer: for each byte used, the output will use two hex digits, because each hex digit can represent 16 values. Typically all bytes are printed even though it is unlikely that you have memory backing every address, e.g. when pointers are 8 bytes as on 64-bit systems. This is because the stack often grows from the biggest address downwards while the executable code starts at the beginning of the address range (well, the very first page may be left unused so that touching it causes segmentation violations). Above the executable code live some data segments, followed by the heap, and lots of unused pages.
There is a question addressing a similar topic:
https://stackoverflow.com/a/2369593/1010666
Summary: do not try to write pointers into a non-pointer variable.
If you need to print out the pointer value, there are other solutions.
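If you really do need the address as an integer rather than as text, the type intended for that is std::uintptr_t from <cstdint> (optional in the standard, but available on common platforms); it is defined to be wide enough to round-trip a pointer:

#include <cstdint>
#include <iostream>

int main() {
    int x = 0;
    int* p = &x;

    // uintptr_t is an unsigned integer type big enough to hold the converted pointer.
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);

    std::cout << "decimal: " << std::dec << addr << '\n';
    std::cout << "hex:     " << std::hex << addr << '\n';
}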

Define smallest possible datatype in c++ that can hold six values

I want to define my own datatype that can hold a single one of six possible values in order to learn more about memory management in C++. In numbers, I want to be able to hold 0 through 5. In binary, three bits would suffice (101 = 5), although some values (6 and 7) won't be used. The datatype should also consume as little memory as possible.
I'm not sure how to accomplish this. First, I tried an enum with defined values for all the fields. As far as I know, the values are in hex there, so one "hexbit" should allow me to store 0 through 15. But comparing it to a char (with sizeof) it stated that it's 4 times the size of a char, and a char holds 0 through 255 if I'm not mistaken.
#include <iostream>
enum Foo
{
a = 0x0,
b = 0x1,
c = 0x2,
d = 0x3,
e = 0x4,
f = 0x5,
};
int main()
{
Foo myfoo = a;
char mychar = 'a';
std::cout << sizeof(myfoo); // prints 4
std::cout << sizeof(mychar); // prints 1
return 1;
}
I've clearly misunderstood something, but fail to see what, so I turn to SO. :)
Also, when writing this post I realised that I clearly lack some parts of the vocabulary. I've made this post a community wiki; please edit it so I can learn the correct words for everything.
A char is the smallest possible type.
If you happen to know that you need several such 3-bit values in a single place, you can use a structure with bitfield syntax:
struct foo {
unsigned int val1:3;
unsigned int val2:3;
};
and hence get 2 of them within one byte. In theory you could pack 10 such fields into a 32-bit "int" value.
C++0x will contain strongly typed enumerations where you can specify the underlying datatype (in your example char), but current C++ does not support this. The standard is not clear about the use of a char here (the examples are with int, short and long), but they mention the underlying integral type and that would include char as well.
As of today Neil Butterworth's answer to create a class for your problem seems the most elegant, as you can even extend it to contain a nested enumeration if you want symbolical names for the values.
C++ does not express units of memory smaller than bytes. If you're producing them one at a time, that's the best you can do; your own example works well. If you need just a few, you can use bit-fields as Alnitak suggests. If you're planning on allocating them one at a time, then you're even worse off: most allocators hand out memory in fixed-size chunks, 16 bytes being common.
Another choice might be to wrap std::bitset to do your bidding. This will waste very little space, if you need many such values, only about 1 bit for every 8.
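A sketch of that idea (the class and member names are made up for illustration): pack N values, each 0-5, at 3 bits apiece into one std::bitset.

#include <bitset>
#include <cstddef>
#include <iostream>

template <std::size_t N>
class SixValues {
public:
    void set(std::size_t index, unsigned value) {   // value is assumed to be 0..5
        for (std::size_t b = 0; b < 3; ++b)
            bits_[index * 3 + b] = (value >> b) & 1u;
    }
    unsigned get(std::size_t index) const {
        unsigned value = 0;
        for (std::size_t b = 0; b < 3; ++b)
            value |= static_cast<unsigned>(bits_[index * 3 + b]) << b;
        return value;
    }
private:
    std::bitset<3 * N> bits_;   // 3*N bits of payload, rounded up to the bitset's storage unit
};

int main() {
    SixValues<10> v;            // ten values fit in 30 bits of payload
    v.set(0, 5);
    v.set(9, 3);
    std::cout << v.get(0) << ' ' << v.get(9) << '\n';   // prints 5 3
}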
If you think about your problem as a number, expressed in base-6, and convert that number to base two, possibly using an Unlimited precision integer (for example GMP), you won't waste any bits at all.
This assumes, of course, that your values have a uniform, random distribution. If they follow a different distribution, your best bet will be general compression of the first example, with something like gzip.
You can store values smaller than 8 or 32 bits. You just need to pack them into a struct (or class) and use bit fields.
For example:
struct example
{
    unsigned int a : 3;  //<Three bits, can be 0 through 7.
    bool b : 1;          //<One bit, stores 0 or 1.
    unsigned int c : 10; //<Ten bits, can be 0 through 1023.
    unsigned int d : 19; //<19 bits, can be 0 through 524287.
};
In most cases, your compiler will round up the total size of your structure to 32 bits on a 32 bit platform. The other problem is, like you pointed out, that your values may not have a power of two range. This will make for wasted space. If you read the entire struct as one number, you will find values that will be impossible to set, if your input ranges aren't all powers of 2.
Another feature you may find interesting is a union. They work like a struct, but share memory. So if you write to one field it overwrites the others.
Now, if you are really tight for space, and you want to push each bit to the maximum, there is a simple encoding method. Let's say you want to store 3 numbers, each can be from 0 to 5. Bit fields are wasteful, because if you use 3 bits each, you'll waste some values (i.e. you could never set 6 or 7, even though you have room to store them). So, lets do an example:
//Here are three example values, each can be from 0 to 5:
const int one = 3, two = 4, three = 5;
To pack them together most efficiently, we should think in base 6 (since each value is from 0-5). So packed into the smallest possible space is:
//This packs all the values into one int, from 0 - 215.
//pack could be any value from 0 - 215. There are no 'wasted' numbers.
int pack = one + (6 * two) + (6 * 6 * three);
See how it looks like we're encoding in base six? Each number is multiplied by its place, like 6^n, where n is the place (starting at 0).
Then to decode:
const int one = pack % 6;
pack /= 6;
const int two = pack % 6;
pack /= 6;
const int three = pack;
These schemes are extremely handy when you have to encode some fields in a bar code or in an alphanumeric sequence for human typing. Just saving those few partial bits can make a huge difference. Also, the fields don't all have to have the same range. If one field is from 0 through 7, you'd use 8 instead of 6 in the proper place. There is no requirement that all fields have the same range.
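Wrapped up as functions, the same mixed-radix packing might look like this (a sketch; the ranges are passed in precisely because the fields don't all have to share one):

#include <cstddef>
#include <iostream>
#include <vector>

// Pack values[i] (each in 0 .. ranges[i]-1) into a single integer, least significant field first.
int pack(const std::vector<int>& values, const std::vector<int>& ranges) {
    int packed = 0;
    int place = 1;
    for (std::size_t i = 0; i < values.size(); ++i) {
        packed += values[i] * place;
        place *= ranges[i];
    }
    return packed;
}

// Recover the original values from the packed integer.
std::vector<int> unpack(int packed, const std::vector<int>& ranges) {
    std::vector<int> values;
    for (int range : ranges) {
        values.push_back(packed % range);
        packed /= range;
    }
    return values;
}

int main() {
    const std::vector<int> ranges = {6, 6, 8};          // the last field is 0 through 7
    const int packed = pack({3, 4, 5}, ranges);         // 3 + 6*4 + 36*5 = 207
    for (int v : unpack(packed, ranges))
        std::cout << v << ' ';                          // prints 3 4 5
    std::cout << "(packed = " << packed << ")\n";
}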
The minimal size you can use is 1 byte.
But if you handle a group of enum values (writing to a file or storing in a container, ...), you can pack the group at 3 bits per value.
You don't have to enumerate the values of the enum:
enum Foo
{
a,
b,
c,
d,
e,
f,
};
Foo myfoo = a;
Here Foo is an alias of int, which on your machine takes 4 bytes.
The smallest type is char, which is defined as the smallest addressable data on the target machine. The CHAR_BIT macro yields the number of bits in a char and is defined in limits.h.
[Edit]
Note that generally speaking you shouldn't ask yourself such questions. Always use [unsigned] int if it's sufficient, except when you allocate quite a lot of memory (e.g. int[100*1024] vs char[100*1024], but consider using std::vector instead).
The size of an enumeration is usually the same as that of an int. But depending on your compiler, you may have the option of creating a smaller enum. For example, in GCC, you may declare:
enum Foo {
a, b, c, d, e, f
}
__attribute__((__packed__));
Now, sizeof(Foo) == 1.
The best solution is to create your own type implemented using a char. This should have sizeof(MyType) == 1, though this is not guaranteed.
#include <iostream>
using namespace std;

class MyType {
public:
    MyType( int a ) : val( a ) {
        if ( val < 0 || val > 5 ) {   // only the six values 0 through 5 are allowed
            throw( "bad value" );
        }
    }
    int Value() const {
        return val;
    }
private:
    char val;
};

int main() {
    MyType v( 2 );
    cout << sizeof(v) << endl;
    cout << v.Value() << endl;
}
It is likely that packing oddly sized values into bitfields will incur a sizable performance penalty due to the architecture not supporting bit-level operations (thus requiring several processor instructions per operation). Before you implement such a type, ask yourself if it is really necessary to use as little space as possible, or if you are committing the cardinal sin of programming that is premature optimization. At most, I would encapsulate the value in a class whose backing store can be changed transparently if you really do need to squeeze every last byte for some reason.
You can use an unsigned char. Probably typedef it to a BYTE. It will occupy only one byte.