I want to write non-Unicode, 16-bit words to a file and read them back later. I know that with a bit of byte manipulation I can do this in char mode using fstream::read() and fstream::write(). What do I need to do to use 16-bit words directly?
For example, it seems I should be able to do the following:
basic_ofstream<uint16_t> aw;
aw.open("test.bin", ios::binary);
uint16_t c[] = {0x55aa, 0x1188};
aw.write(c, 2);
aw.close();
basic_ifstream<uint16_t> ax;
ax.open("test.bin", ios::binary);
uint16_t ui[2];
ax.read(ui, 2);
ax.close();
cout << endl << hex << unsigned(ui[0]) << " " << unsigned(ui[1]) << endl;
gcc 4.4 output:
d 0
VC++ 10 output:
CCCC CCCC
I've also tried using std::basic_filebuf<uint16_t> directly and got the same results. Why?
I'm actually surprised you got the streams instantiated to do any reading at all! What the result will be is possibly implementation-defined (i.e., you might find the behavior described in the compiler's documentation), but possibly it is just unspecified (although not quite undefined). I don't think the stream classes are required to support instantiations for types other than char and wchar_t out of the box, i.e., without the user providing at least some of the facets.
The standard stream classes are templates on the character type, but they aren't easy to instantiate for an unsupported type. At a bare minimum, you'd need to implement a suitable std::codecvt<uint16_t, char, std::mbstate_t> facet converting between the external representation in bytes and the internal representation. From the looks of it, the two systems you tried made different choices for their default implementation.
std::codecvt<internT, externT, stateT> is the facet used to convert between an external representation of characters and an internal representation of characters. Streams are only required to support char, which is considered to represent bytes, as the external type externT. The internal character type internT can be any integral type, but the conversion needs to be defined by implementing the code conversion facet. If I recall correctly, the streams can also assume that the state type stateT is std::mbstate_t (which is actually somewhat problematic because there is no interface defined for this type!).
Unless you are really dedicated to creating an I/O stream for your character type uint16_t, you probably want to read bytes using std::ifstream and convert them to your character type, and similarly for writing. To create an I/O stream that also supports formatting, you'd need a number of other facets, too (e.g., std::ctype<uint16_t>, std::numpunct<uint16_t>), and you'd need to build a std::locale containing all of these plus a few that can be instantiated from the standard library's implementation (e.g., std::num_get<uint16_t> and std::num_put<uint16_t>; I think their iterator types are suitably defaulted).
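For example, a minimal sketch of that "read bytes and convert" approach (the function name, the little-endian byte order on disk, and the use of a std::vector are illustrative choices, not requirements):

#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

std::vector<std::uint16_t> read_u16_le(const std::string& filename)
{
    std::ifstream in(filename, std::ios::binary);
    std::vector<std::uint16_t> words;
    char bytes[2];
    while (in.read(bytes, 2)) {
        // assemble each 16-bit word from two bytes, low byte first
        words.push_back(static_cast<std::uint16_t>(
            static_cast<unsigned char>(bytes[0]) |
            (static_cast<unsigned char>(bytes[1]) << 8)));
    }
    return words;
}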
When I try your code, the file is created but nothing is written to it; its size is 0 after closing it. When reading from that file, nothing can be read, and what you see in the output is uninitialized garbage.
Besides the issue of using ofstream/ifstream with a character type other than the default char, you should not blindly rely on the read() and write() methods, because they do not indicate whether they actually read or wrote anything. Refer to http://en.cppreference.com/w/cpp/io/basic_ostream/write for details. This part in particular is interesting:
This function is an unformatted output function: it begins execution by constructing an object of type sentry, which flushes the tie()'d output buffers if necessary and checks the stream errors. After construction, if the sentry object returns false, the function returns without attempting any output.
This is likely why no output is written to your file: the stream is not designed to work with types other than char and similar without extra support.
Update: To see whether writing/reading succeeded, check the fail or bad bits, which should already have indicated that something went wrong.
cout << aw.fail() << aw.bad() << "\n";
cout << ax.fail() << ax.bad() << "\n";
Both were set to true, so your real question should have been: why did the call to write() fail?
I suggest reading: http://www.cplusplus.com/articles/DzywvCM9/
Snippets:
"The problem with these types is that their size is not well defined.
int might be 8 bytes on one machine, but only 4 bytes on another. The
only one that's consistent is char... which is guaranteed to always be
1 byte."
u16 ReadU16(istream& file)
{
u16 val;
u8 bytes[2];
file.read( (char*)bytes, 2 ); // read 2 bytes from the file
val = bytes[0] | (bytes[1] << 8); // construct the 16-bit value from those bytes
return val;
}
void WriteU16(ostream& file, u16 val)
{
u8 bytes[2];
// extract the individual bytes from our value
bytes[0] = (val) & 0xFF; // low byte
bytes[1] = (val >> 8) & 0xFF; // high byte
// write those bytes to the file
file.write( (char*)bytes, 2 );
}
You may want to refresh on the "typedef" keyword as well, for defining the guaranteed-width types. While a little more of a learning curve, Boost and C99 compilers define guaranteed-size types as well. I'm not sure about C++0x, but it's too new to rely on for portability.
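For illustration, the u8/u16 names used in the snippets above could be introduced with typedefs along these lines (the widths are assumptions about the target platform; in C++11 you would normally just use the fixed-width types from <cstdint>):

typedef unsigned char  u8;   // assumes an 8-bit char
typedef unsigned short u16;  // assumes a 16-bit short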
You can use char specializations and reinterpret_cast:
basic_ofstream<char> aw;
...
aw.write( reinterpret_cast<const char*>(i16buf), n2write*sizeof(int16_t) );
basic_ifstream<char> ax;
...
ax.read( reinterpret_cast<char*>(i16buf), n2read*sizeof(int16_t) );
The "sizeof(int16_t)" is for the edge cases where sizeof(int16_t)==1 (e.g. DSP processors)
Of course, if you need to read/write in a specific byte order, then you need endian conversion functions. Note, there is no standard compile-time way of determining endianness.
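As a rough illustration (the helper name is made up, not part of any standard API), a 16-bit byte swap for when the file's byte order differs from the host's might look like:

#include <cstdint>

inline std::uint16_t swap16(std::uint16_t v)
{
    // exchange the high and low bytes
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}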
I'm having trouble reading a binary file into a bitset and processing it.
std::ifstream is("data.txt", std::ifstream::binary);
if (is) {
// get length of file:
is.seekg(0, is.end);
int length = is.tellg();
is.seekg(0, is.beg);
char *buffer = new char[length];
is.read(buffer, length);
is.close();
const int k = sizeof(buffer) * 8;
std::bitset<k> tmp;
memcpy(&tmp, buffer, sizeof(buffer));
std::cout << tmp;
delete[] buffer;
}
int a = 5;
std::bitset<32> bit;
memcpy(&bit, &a, sizeof(a));
std::cout << bit;
I want to get {05 00 00 00} (hex memory view), i.e. bitset[0~31] = {00000101 00000000 00000000 00000000}, but I get bitset[0~31] = {10100000 00000000 00000000 00000000}.
You need to learn how to crawl before you can crawl on broken glass.
In short, computer memory is an opaque box, and you should stop making assumptions about it.
Hyrum's law is the stupidest thing that has ever existed and if you stopped proliferating this cancer, that would be great.
What I'm about to write is common sense to every single competent C++ programmer out there. As trivial as breathing, and as important as breathing. It should be included in every single copy of C++ book ever, hammered into the heads of new programmers as soon as possible, but for some undefined reason, isn't.
The only thing you can rely on, when it comes to what I'm going to loosely define as "memory", is that the bits of a byte are never out of order. std::byte is the type meant for exactly this; before it was added to the standard we used unsigned char. They are more or less interchangeable, but you should prefer std::byte whenever you can.
So, what do I mean by this?
std::byte a{0b10101000};
assert(((std::to_integer<unsigned>(a) >> 3) & 1) == 1); // always true
That's it, everything else is up to the compiler, your machine architecture and stars in the sky.
Oh, what, you thought you could just write int a = 0b1010100000000010; and expect something good? I'm sorry, but that's just not how things work in these savage lands. If you expect any order here, you will have to split it into bytes yourself; you cannot just cast this into std::byte bytes[2] and expect bytes[0] == 0b10101000. It is NEVER correct to assume anything here. If you do, one day your code will break, and by the time you realize that it's broken it will be too late, because it will be yet another undebuggable 30-million-line legacy codebase, half of which is only available in proprietary shared objects whose source code nobody has seen since 1997. Good luck.
So, what's the correct way? Luckily for us, binary shifts are architecture independent. int is guaranteed to be at least 16 bits wide, and that's the only thing this example relies on, though most machines have sizeof (int) == 4. If you need more bytes, or an exact number of bytes, you should be using an appropriate type from the fixed width integer types in <cstdint>.
int a = 0b1010100000000010;
std::byte bytes[2]; // always correct
// std::byte bytes[4]; // naive assumption that sizeof (int) == 4
// std::byte bytes[sizeof (a)]; // flexible solution that needs more work
// we think in terms of 8 bits, don't care about the rest
bytes[0] = static_cast<std::byte>(a & 0xFF);
// we may need to skip more than 8 bits to reach the next 8 bits, hence CHAR_BIT
bytes[1] = static_cast<std::byte>((a >> CHAR_BIT) & 0xFF);
This is the only absolutely correct way to convert a value with sizeof (T) > 1 into an array of bytes, and if you see anything else then it's without a doubt a subpar implementation that will stop working the moment you change the compiler and/or machine architecture.
The reverse is true too, you need to use binary shifts to convert a byte array to a size bigger than 1 byte.
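A minimal sketch of that reverse direction, under the same assumptions as the snippet above (low byte first, CHAR_BIT bits per byte; the function name is just illustrative):

#include <climits>
#include <cstddef>

int bytes_to_int16(const std::byte bytes[2])
{
    // reassemble the 16 bits with shifts; no assumptions about int's layout
    return std::to_integer<int>(bytes[0]) |
           (std::to_integer<int>(bytes[1]) << CHAR_BIT);
}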
On top of that, this only applies to primitive types. int, long, short... Sometimes you can rely on it working correctly with float or double as long as you always need IEEE 754 and will never need a machine so old or bizarre that it doesn't support IEEE 754. That's it.
If you think really long and hard, you may realize that this is no different from structs.
struct x {
int a;
int b;
};
What can we rely on? Well, we know that x will have address of a. That's it. If we want to set b, we need to access it by x.b, every other assumption is ALWAYS wrong with no ifs or buts. The only exception is if you wrote your own compiler and you are using your own compiler, but you're ignoring the standard and at that point anything is possible; that's fine, but it's not C++ anymore.
So, what can we infer from what we know now? Array of bytes cannot be just memcpy'd into a std::bitset. You don't know its implementation and you cannot know its implementation, it may change tomorrow and if your code breaks because of that then it's wrong and you're a failure of a programmer.
Want to convert an array of bytes to a bitset? Then go ahead and iterate over every single bit in the byte array and set each bit of the bitset however you need it to be, that's the only correct and sane solution. Every other solution is objectively wrong, now, and forever. Until someone decides to say otherwise in C++ standard. Let's just hope that it will never happen.
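For what it's worth, a minimal sketch of that bit-by-bit approach (assuming you want bit 0 of byte 0 to land in bitset position 0; the layout and the function name are illustrative choices):

#include <bitset>
#include <climits>
#include <cstddef>

template <std::size_t N>
std::bitset<N> bytes_to_bitset(const unsigned char* bytes, std::size_t count)
{
    std::bitset<N> result;
    for (std::size_t i = 0; i < count; ++i)
        for (std::size_t b = 0; b < CHAR_BIT && i * CHAR_BIT + b < N; ++b)
            result[i * CHAR_BIT + b] = (bytes[i] >> b) & 1;
    return result;
}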
The put/get methods of std::fstream classes operate on char arguments rather than ints.
Is there a portable way of representing these char-bytes as integers? (My naive expectation is that a binary file is a sequence of bytes, i.e. a sequence of integers.)
To make this question more concrete, consider the following two functions:
void print_binary_file_to_cout( const std::string &filename)
{
std::ifstream ifs(filename, std::ios_base::binary|std::ios_base::in);
char c;
while(ifs.get(c))
std::cout << static_cast<int>(c) << std::endl;
}
and
void make_binary_file_from_cin( const std::string &filename)
{
std::ofstream ofs(filename, std::ios_base::binary|std::ios_base::out);
const int no_char = 256;
int cInt = no_char;
while(std::cin>>cInt && cInt!=no_char )
ofs.put( static_cast<char>( cInt ) );
}
Now, suppose that one function is compiled on Windows in Visual Studio, and the other in gcc on Linux. If the output of print...() is given as the input to make...()
will the original file be reproduced?
I guess not, so I'm asking how to correctly implement this idea, i.e.
how to get a portable (and human-understandable) representation of bytes in binary files?
The most common human-readable representation of bytes is hex (base 16) notation. You can tell iostreams to use hex format by passing std::hex into the stream; std::hex modifies the stream's behavior for both input and output. This format also works the same regardless of compiler and platform. If you write each byte as exactly two hex digits you do not even need a separator (like a newline) between values, and as a stop value you can use any character outside [0-9a-fA-F].
Note that you should use unsigned chars.
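A minimal sketch of that hex idea, mirroring the two functions from the question (two hex digits per byte, one value per line; the function names are just illustrative):

#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>

void print_binary_file_as_hex(const std::string& filename)
{
    std::ifstream ifs(filename, std::ios_base::binary | std::ios_base::in);
    char c;
    while (ifs.get(c))
        std::cout << std::hex << std::setw(2) << std::setfill('0')
                  << static_cast<int>(static_cast<unsigned char>(c)) << '\n';
}

void make_binary_file_from_hex(const std::string& filename)
{
    std::ofstream ofs(filename, std::ios_base::binary | std::ios_base::out);
    int value;
    while (std::cin >> std::hex >> value)   // stops at EOF or at a non-hex stop character
        ofs.put(static_cast<char>(static_cast<unsigned char>(value)));
}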
There is a lot of code out there that presumes the char functions will work correctly with unsigned char variables, perhaps with a static_cast, on the grounds that the two forms are bit-identical, but the language lawyers will say that assumption can't be relied on if you are writing "perfect" portable code.
Luckily, reinterpret_cast does offer the facility to cast any pointer into a pointer to signed or unsigned char, and that is the easiest get-out.
Two notes to consider for all binary files:
On Windows the file must be opened in binary mode, otherwise newline translation will silently insert or remove bytes with value 13 and corrupt the data.
To store numbers larger than 255 you will need to span the value across several bytes. You need to decide the convention for doing this: whether the first byte is the least or the most significant part of the value. Some architectures (e.g. the 68K, and ARM when configured big-endian) use the "big end" model, where the most significant byte comes first, while Intel (and ARM in its usual little-endian mode) uses the "little end" model. If you are reading byte by byte you just have to specify it.
I am implementing a C++11-based application and I am using the TinyCbor C library for encoding and decoding application-specific data as below:
#include "cbor.h"
#include <iostream>
using namespace std;
int main() {
struct MyTest {
uint8_t varA;
float varB;
};
MyTest obj;
obj.varA = 100; // If I set it to below 20 then it works
obj.varB = 10.10;
uint8_t buff[100];
//Encode
CborEncoder encoder;
CborEncoder array;
cbor_encoder_init(&encoder, buff, sizeof(buff), 0);
cbor_encoder_create_array(&encoder, &array, CborIndefiniteLength);
cbor_encode_simple_value(&array, obj.varA);
cbor_encode_float(&array, obj.varB);
cbor_encoder_close_container(&encoder, &array);
// Decode
CborParser parser;
CborValue value;
cbor_parser_init(buff, sizeof(buff), 0, &parser, &value);
CborValue arrayValue;
cbor_value_enter_container(&value, &arrayValue);
uint8_t val;
cbor_value_get_simple_type(&arrayValue, &val);
// This prints blank
cout << "uint8_t value: " << static_cast<int>(val) << endl;
float fval;
cbor_value_get_simple_type(&arrayValue, &fval);
cout << "float value: " << fval << endl;
return 0;
}
The above code works when I set the value of varA to below 20: I see the value printed on the console. But if I set it to more than 20 it sometimes gives the error CborErrorIllegalSimpleType, or, if the value is set to 21, it returns the type as CborBooleanType or CborNullType.
What is wrong with the code?
How do I encode and decode a uint8_t using TinyCbor?
You have a few things going on here.
cbor_encoder_create_array(&encoder, &array, CborIndefiniteLength);
Don't use indefinite length unless you plan on streaming the encoding. If you have all the data in front of you when encoding, use a definite length. See why here: Serialize fixed size Map to CBOR. Also, I'm not sure what version you are using, but indefinite-length objects were at least fairly recently a TODO list item for TinyCBOR.
cbor_value_get_simple_type(&arrayValue, &val);
You don't need simple here. Simple types are primitives that are mostly undefined. CBOR's default integer type is 64-bit (int64_t, a signed long long). Simple does allow for a natural stop at 8 bits, but 20 in simple is Boolean false, 21 is true, 22 is null. You discovered this already. A simple value can't store a negative or a float, and while you could use it to represent an 8-bit value like "100", you really shouldn't. The good news about CBOR's types is that while the default is 64-bit, it uses as little memory to encode as needed. So storing the integer 100 in CBOR takes 2 bytes, not 8: CBOR uses a single byte for 0-23, but 24 and up carry a byte of overhead.
By overhead I mean: when you encode a CBOR integer, the first three bits give the major type UNSIGNED INTEGER (binary 000) and the remaining five bits are the "additional information"; small values (0-23) are stored directly in those five bits, so they fit in a single byte, while larger values are signalled there and follow in 1, 2, 4, or 8 extra bytes. See the chart here to understand it: https://en.m.wikipedia.org/wiki/CBOR So 24-255 requires 2 bytes encoded, etc. A far cry from always using 8 bytes just because it can.
The short version here is that CBOR won't use more space than needed - BUT - it's not a strongly typed serializer! If you mean for an 8-bit value like 100 to be stored in that array and someone stores a 16-bit value like 10000 instead, it'll look/work fine until you go to parse it and try to put the large number into an 8-bit spot. You need to cast or validate your data. I recommend parse THEN validate.
cbor_value_get_simple_type(&arrayValue, &fval);
cout << "float value: " << fval << endl;
I'd have to look in TinyCBOR's code, but I think this is a happy accident and not technically supported. Because simple types use the same three major bits, you are able to get at the stored value with get_simple. You should instead examine the type and make the corresponding call to read the float at whatever precision was actually stored (half, single, or double).
TinyCBOR is pretty nice, but there are definitely a couple of gotchas hidden in it. It really helps to understand the CBOR encoding even though you are trusting the serializer to do the work for you.
Use CborError cbor_encode_uint(CborEncoder *encoder, uint64_t value) instead. This function will encode an unsigned integer value with the smallest representation.
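A hedged sketch of how the example could look with a definite-length array and proper integer encoding (based on my reading of the TinyCbor API; check your version's headers for the exact signatures):

#include "cbor.h"
#include <cstdint>
#include <iostream>

int main() {
    uint8_t buff[100];

    // Encode: a definite-length array of [unsigned integer, float]
    CborEncoder encoder, array;
    cbor_encoder_init(&encoder, buff, sizeof(buff), 0);
    cbor_encoder_create_array(&encoder, &array, 2);   // 2 items, definite length
    cbor_encode_uint(&array, 100);                    // smallest integer representation
    cbor_encode_float(&array, 10.10f);
    cbor_encoder_close_container(&encoder, &array);

    // Decode
    CborParser parser;
    CborValue value, item;
    cbor_parser_init(buff, sizeof(buff), 0, &parser, &value);
    cbor_value_enter_container(&value, &item);

    uint64_t u = 0;
    cbor_value_get_uint64(&item, &u);                 // read the integer
    cbor_value_advance(&item);                        // move to the next element

    float f = 0.0f;
    cbor_value_get_float(&item, &f);                  // read the float

    std::cout << "uint value: " << u << "\nfloat value: " << f << std::endl;
    return 0;
}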
Is it possible to store data in integer form, from 0 to 255, rather than as 8-bit characters? Although both are the same thing, how can we do it, for example, with the write() function?
Is it ok to directly cast any integer to char and vice versa? Does something like
{
int a[1] = {213};
write((char*)a,1);
}
and
{
int a[1];
read((char*)a,1);
cout<<a[0];
}
work to get 213 from the same location in the file? It may work on that computer, but is it portable; in other words, is it suitable for cross-platform projects in that way? If I create a file format for each game level (which will store objects' coordinates in the current level's file) using this principle, will it work on other computers/systems/platforms so that the same level can be loaded?
The code you show would write the first (lowest-address) byte of a[0]'s object representation - which may or may not be the byte with the value 213. The particular object representation of an int is implementation defined.
The portable way of writing one byte with the value of 213 would be
unsigned char c = a[0];
write(&c, 1);
You have the right idea, but it could use a bit of refinement.
{
int intToWrite = 213;
unsigned char byteToWrite = 0;
if ( intToWrite > 255 || intToWrite < 0 )
{
doError();
return;
}
// since your range is 0-255, you really want the low order byte of the int.
// Just reading the 1st byte may or may not work for your architecture. I
// prefer to let the compiler handle the conversion via casting.
byteToWrite = (unsigned char) intToWrite;
write( &byteToWrite, sizeof(byteToWrite) );
// you can hard code the size, but I try to be in the habit of using sizeof
// since it is better when dealing with multibyte types
}
{
int a = 0;
unsigned char toRead = 0;
// just like the write, the byte ordering of the int will depend on your
// architecture. You could write code to explicitly handle this, but it's
// easier to let the compiler figure it out via implicit conversions
read( &toRead, sizeof(toRead) );
a = toRead;
cout<<a;
}
If you need to minimize space or otherwise can't afford the extra char sitting around, then it's definitely possible to read/write a particular byte of your integer. However, it can require pulling in extra headers (e.g. for htons/ntohs) or annoying platform #defines.
It will work, with some caveats:
Use reinterpret_cast<char*>(x) instead of (char*)x to be explicit that you’re performing a cast that’s ordinarily unsafe.
sizeof(int) varies between platforms, so you may wish to use a fixed-size integer type from <cstdint> such as int32_t.
Endianness can also differ between platforms, so you should be aware of the platform byte order and swap byte orders to a consistent format when writing the file. You can detect endianness at runtime and swap bytes manually, or use htonl and ntohl to convert between host and network (big-endian) byte order.
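A minimal sketch combining the last two caveats, writing an int32_t in a fixed (network/big-endian) byte order (the function names are illustrative, and the header differs on Windows, where htonl/ntohl come from winsock2.h):

#include <cstdint>
#include <fstream>
#include <arpa/inet.h>   // htonl/ntohl

void write_i32(std::ofstream& out, std::int32_t value)
{
    std::uint32_t be = htonl(static_cast<std::uint32_t>(value));  // host -> big-endian
    out.write(reinterpret_cast<const char*>(&be), sizeof(be));
}

std::int32_t read_i32(std::ifstream& in)
{
    std::uint32_t be = 0;
    in.read(reinterpret_cast<char*>(&be), sizeof(be));
    return static_cast<std::int32_t>(ntohl(be));                  // big-endian -> host
}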
Also, as a practical matter, I recommend you prefer text-based formats: they're less compact, but far easier to debug when things go wrong, since you can examine them in any text editor. If you determine that loading and parsing these files is too slow, then consider moving to a binary format.
I have a char xyz[2];
I receive a 2-octet number in network byte order in xyz[0] and xyz[1]. How do I change this to host order?
How do I use ntohs to convert xyz? Please help.
Do you mean "How do I convert a data stream in network order to a data stream in host order?" In that case you can use the ntohs()/htons() functions. Be careful how you invoke those, though, since you may have to take alignment issues into account. The universal solution would be to do a manual swapping of the (or each) pair of bytes.
On the other hand, if you want to deserialize data that comes to you in network order, and you want to use the values that are serialized in the data in your program, then the notion of a "host order" is a red herring. All you need to know whether the data that you receive is in big-endian or little-endian order:
unsigned short int le_value = static_cast<unsigned char>(xyz[0]) | (static_cast<unsigned char>(xyz[1]) << 8); // little-endian
unsigned short int be_value = static_cast<unsigned char>(xyz[1]) | (static_cast<unsigned char>(xyz[0]) << 8); // big-endian
This is a typical hallmark of platform-independent programming: Your program interna should be entirely independent of implementation details, and fixed implementations have to be specified precisely at the program boundaries (i.e. (de)serialization).
Note that in general, you cannot just take an existing byte buffer and interpret it as a different value in place, since that is not allowed by the standard. That is, you can only treat data as an int that has been declared as an int. For example, the following is illegal: char * buf = get(); int a = *(int*)buf;. The legal version starts with the target type: int a; get_data((char*)&a);
Try this (copying into a properly aligned uint16_t first, to avoid the alignment and aliasing problems mentioned above):
uint16_t tmp;
std::memcpy(&tmp, xyz, sizeof tmp);
uint16_t result = ntohs(tmp);
Use std::swap on the two bytes if you know that the host order and the network order differ; otherwise do nothing.