Reading binary data with fstream method - c++

I need to read binary data into a buffer, but the fstream read function only reads into a char buffer. So my question is:
How can I transfer or cast the binary data into an unsigned char buffer, and is that the best solution in this case?
Example
char data[54];
unsigned char uData[54];
fstream file(someFilename,ios::in | ios::binary);
file.read(data,54);
// Here I want to move the char data into the unsigned char buffer (?)
// How to?

Just read it into unsigned char data in the first place
unsigned char uData[54];
fstream file(someFilename,ios::in | ios::binary);
file.read((char*)uData, 54);
The cast is necessary but harmless.
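A slightly fuller sketch of the same idea, which also reports how many bytes actually arrived. The helper name readIntoBuffer and its parameters are illustrative and not part of the original answer.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: reads up to `size` bytes from `path` into `out`.
// Returns the number of bytes actually read (0 if the file cannot be opened).
std::streamsize readIntoBuffer(const std::string& path,
                               unsigned char* out, std::size_t size)
{
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file) {
        return 0;
    }
    // read() only accepts char*, so the buffer pointer is reinterpreted.
    file.read(reinterpret_cast<char*>(out),
              static_cast<std::streamsize>(size));
    // read() returns the stream, not a count; gcount() gives the count.
    return file.gcount();
}
```

A caller would pass its own array, for example readIntoBuffer(someFilename, uData, sizeof uData), and compare the result with the expected 54 bytes.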

You don't need to declare the extra array uData. The data array can simply be cast to unsigned:
unsigned char* uData = reinterpret_cast<unsigned char*>(data);
When you access the bytes through uData, you instruct the compiler to interpret the same data differently; for example, data[3] == -1 means uData[3] == 255.
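As a small illustration of that reinterpretation, the sketch below wraps the cast in a function. The name asUnsigned is made up for this example, and the 255 result assumes 8-bit bytes.

```cpp
// Hypothetical helper: view a char buffer as unsigned bytes without copying.
// Accessing any object through an unsigned char pointer is permitted by the
// language's aliasing rules, so no second array is needed.
const unsigned char* asUnsigned(const char* data)
{
    return reinterpret_cast<const unsigned char*>(data);
}
```

With char data[4] whose last element holds -1, asUnsigned(data)[3] compares equal to 255 while data itself is left untouched.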

You could just use
std::copy(data, data + n, uData);
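where n is the number of bytes actually read, which file.gcount() reports after file.read(data, 54) (read itself returns the stream, not a count). I think, specifically for char* and unsigned char*, you can also portably use
file.read(reinterpret_cast<char*>(uData), 54);
std::streamsize n = file.gcount();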

What is the most suitable type of vector to keep the bytes of a file?

I'm considering using the int type, because the bits "00000000" (1 byte) are interpreted to 0!
The goal is to save this data (bytes) to a file and retrieve from this file later.
NOTE: The files contain null bytes ("00000000" in bits)!
I'm a bit lost here. Help me! =D Thanks!
UPDATE I:
To read the file I'm using this function:
char* readFileBytes(const char *name){
std::ifstream fl(name);
fl.seekg( 0, std::ios::end );
size_t len = fl.tellg();
char *ret = new char[len];
fl.seekg(0, std::ios::beg);
fl.read(ret, len);
fl.close();
return ret;
}
NOTE I: I need to find a way to ensure that bits "00000000" can be recovered from the file!
NOTE II: Any suggestions for a safe way to save those bits "00000000" to a file?
NOTE III: When using char array I had problems converting bits "00000000" for that type.
Code Snippet:
int bit8Array[] = {0, 0, 0, 0, 0, 0, 0, 0};
char charByte = (bit8Array[7] ) |
(bit8Array[6] << 1) |
(bit8Array[5] << 2) |
(bit8Array[4] << 3) |
(bit8Array[3] << 4) |
(bit8Array[2] << 5) |
(bit8Array[1] << 6) |
(bit8Array[0] << 7);
UPDATE II:
Following @chqrlie's recommendations:
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <algorithm>
#include <random>
#include <cstring>
#include <iterator>
std::vector<unsigned char> readFileBytes(const char* filename)
{
// Open the file.
std::ifstream file(filename, std::ios::binary);
// Stop eating new lines in binary mode!
file.unsetf(std::ios::skipws);
// Get its size
std::streampos fileSize;
file.seekg(0, std::ios::end);
fileSize = file.tellg();
file.seekg(0, std::ios::beg);
// Reserve capacity.
std::vector<unsigned char> unsignedCharVec;
unsignedCharVec.reserve(fileSize);
// Read the data.
unsignedCharVec.insert(unsignedCharVec.begin(),
std::istream_iterator<unsigned char>(file),
std::istream_iterator<unsigned char>());
return unsignedCharVec;
}
int main(){
std::vector<unsigned char> unsignedCharVec;
// txt file contents "xz"
unsignedCharVec=readFileBytes("xz.txt");
// Letters -> UTF8/HEX -> bits!
// x -> 78 -> 0111 1000
// z -> 7a -> 0111 1010
for(unsigned char c : unsignedCharVec){
printf("%c\n", c);
for(int o=7; o >= 0; o--){
printf("%i", ((c >> o) & 1));
}
printf("%s", "\n");
}
// Prints...
// x
// 01111000
// z
// 01111010
return 0;
}
UPDATE III:
This is the code I am using to write to a binary file:
void writeFileBytes(const char* filename, std::vector<unsigned char>& fileBytes){
std::ofstream file(filename, std::ios::out|std::ios::binary);
file.write(fileBytes.size() ? (char*)&fileBytes[0] : 0,
std::streamsize(fileBytes.size()));
}
writeFileBytes("xz.bin", fileBytesOutput);
UPDATE IV:
Further reading about UPDATE III:
c++ - Save the contents of a "std::vector<unsigned char>" to a file
CONCLUSION:
The solution to the problem of the "00000000" bits (1 byte) was definitely to change the type that stores the bytes of the file to std::vector<unsigned char>, following the guidance of friends. std::vector<unsigned char> is a universal type (it exists in all environments) and will accept any octet (unlike char* in "UPDATE I")!
In addition, changing from an array (char) to a vector (unsigned char) was crucial for success! With a vector I can manipulate my data more safely and completely independently of its content (with a char array I had problems with this).
Thanks a lot!
Use std::vector<unsigned char>. Don't use std::uint8_t: it won't exist on systems that don't have a native hardware type of exactly 8 bits. unsigned char will always exist; it will usually be the smallest addressable type that the hardware supports, and it's required to be at least 8 bits wide, so if you're trafficking in 8-bit bytes, it will handle the bits that you need.
If you really, really, really like the fixed-width types, you might consider std::uint_least8_t, which will always exist and has at least eight bits, or std::uint_fast8_t, which also has at least eight bits. But file I/O traffics in char types, and mixing char and its variants with vaguely specified "least" and "fast" types may well get confusing.
There are 3 problems in your code:
You use the char type and return a char *. Yet the return value is not a proper C string as you do not allocate an extra byte for the '\0' terminator nor null terminate it.
If the file may contain null bytes, you should probably use type unsigned char or uint8_t to make it explicit that the array does not contain text.
You do not return the array size to the caller. The caller has no way to tell how long the array is. You should probably use a std::vector<uint8_t> or std::vector<unsigned char> instead of an array allocated with new.
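The three points above could be addressed together along the following lines. This is only a sketch: the name loadBytes and the choice to return an empty vector on failure are assumptions, not something from the question.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Sketch: load a whole file as raw bytes. The vector carries its own size,
// so embedded zero bytes are preserved and no '\0' terminator is needed.
std::vector<unsigned char> loadBytes(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        return {};  // assumption: an unreadable file yields an empty vector
    }
    // istreambuf_iterator reads unformatted characters, so whitespace and
    // zero bytes are kept as-is (unlike istream_iterator, which skips
    // whitespace by default).
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(file),
                                      std::istreambuf_iterator<char>());
}
```

The caller gets the length from size(), so zero bytes in the middle of the file are just ordinary elements.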
uint8_t is the winner in my eyes:
it's exactly 8 bits, or 1 byte, long;
it's unsigned without requiring you to type unsigned every time;
it's exactly the same on all platforms;
it's a generic type that does not imply any specific use, unlike char / unsigned char, which is associated with characters of text even if it can technically be used for any purpose just the same as uint8_t.
Bottom line: uint8_t is functionally equivalent to unsigned char, but does a better job of saying this is some data of unspecified nature in the source code.
So use std::vector<uint8_t>.
#include <stdint.h> to make the uint8_t definition available.
P. S. As pointed out in the comments, the C++ standard defines char as 1 byte, and byte is not, strictly speaking, required to be the same as octet (8 bits). On such a hypothetical system, char will still exist and will be 1 byte long, but uint8_t is defined as 8 bits (octet) and thus may not exist (due to implementation difficulties / overhead). So char is more portable, theoretically speaking, but uint8_t is more strict and has wider guarantees of expected behavior.
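The portability point can be made explicit at compile time. The sketch below is one way to state the assumption in code; the constant name kBitsPerByte is made up for the example.

```cpp
#include <climits>
#include <cstdint>

// CHAR_BIT is the number of bits in a byte on this platform; the standard
// guarantees at least 8.
static_assert(CHAR_BIT >= 8, "guaranteed by the language");

// unsigned char has size 1 by definition, on every platform.
static_assert(sizeof(unsigned char) == 1, "always true");

// This line only compiles where std::uint8_t exists, that is, where a byte
// is exactly an octet. On other platforms the build fails here, visibly.
static_assert(sizeof(std::uint8_t) == 1, "uint8_t is one byte where it exists");

// Illustrative constant for code that wants to branch on the byte width.
constexpr int kBitsPerByte = CHAR_BIT;
```

Placing such checks next to the I/O code documents which of the two types the code really depends on.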

Convert a 16-bit integer to an array of char? (C++)

I need to write 16-bit integers to a file. fstream only writes characters. Thus I need to convert the integers to char: the actual integer, not the character representing the integer (i.e. 0 should be 0x00, not 0x30). I tried the following:
char * chararray = (char*)(&the_int);
However this creates a backwards array of two characters. The individual characters are not flipped, but the order of the characters is. Thus I created this function:
char * inttochar(uint16_t input)
{
int input_size = sizeof(input);
char * chararray = (char*)(&input);
char * output;
output[0]='\0';
for (int i=0; i<input_size; i++)
{
output[i]=chararray[input_size-(i+1)];
}
return output;
}
This seems slow. Surely there is a more efficient, less hacky way to convert it?
It's a bit hard to understand what you're asking here (perhaps it's just me, although I gather the commentators thought so too).
You write
fstream only writes characters
That's true, but doesn't necessarily mean you need to create a character array explicitly.
E.g., if you have an fstream object f (opened in binary mode), you can use the write method:
uint16_t s;
...
f.write(reinterpret_cast<const char *>(&s), sizeof(uint16_t));
As others have noted, when you serialize numbers, it often pays to use a commonly-accepted ordering. Hence, use htons (refer to the documentation for your OS's library):
uint16_t s;
...
const uint16_t ns = htons(s);
f.write(reinterpret_cast<const char *>(&ns), sizeof(uint16_t));

Converting byte array to unsigned long in C++

I am reading in binary data from a file:
char* buffIn = new char[8];
ifstream inFile(path, ifstream::binary);
inFile.read(buffIn, 8);
I then want to convert the char* read in (as binary) to an unsigned long but I am having problems - I am not quite sure what is going on, but for instance 0x00000000000ACD gets interpreted as 0xFFFFFFFFFFFFCD - I suspect all the 0x00 bytes are causing some sort of problem when converting from char* to unsigned long...
unsigned long number = *(buffIn);
How do I do this properly?
Since buffIn is of type char pointer, when you do *(buffIn) you are just grabbing one character. You have to reinterpret the memory address as an unsigned long pointer and then dereference it.
unsigned long number = *((unsigned long*)buffIn);
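In addition to recasting the char[8] (which reads only the first unsigned long, and that is 32 bits long on some platforms), you can also use some simple bitwise operations. Convert each byte to unsigned char first, so that bytes of 0x80 and above are not sign-extended:
unsigned long value = (((unsigned long)(unsigned char)buffIn[0]) << 24) | (((unsigned long)(unsigned char)buffIn[1]) << 16) | (((unsigned long)(unsigned char)buffIn[2]) << 8) | (unsigned long)(unsigned char)buffIn[3];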
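The 0xFF... prefix seen in the question is what sign extension of a plain char produces where char is signed. A sketch of the shift approach as a function, going through unsigned char first; the name readU32BE and the big-endian byte order are assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical helper: combine four bytes, most significant first, into a
// 32-bit value. Each char is converted to unsigned char before widening so
// that bytes of 0x80 and above are not sign-extended.
std::uint32_t readU32BE(const char* p)
{
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(p[0])) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(p[1])) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(p[2])) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(p[3]));
}
```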
Try something like
unsigned long* buffInL = new unsigned long[2];
char* buffIn=(char*)buffInL;
ifstream inFile(path, ifstream::binary);
inFile.read(buffIn, 8);
Unlike other types, char* is allowed to alias.
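Another way to sidestep the aliasing question is to copy the bytes into the integer with memcpy. This is a sketch under the assumption that the file stores the value in the host's byte order; the name loadU64Native is made up.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy eight raw bytes into a 64-bit integer.
// The result uses the host's byte order, just as a pointer cast would,
// but memcpy places no alignment or aliasing requirements on the source.
std::uint64_t loadU64Native(const char* p)
{
    std::uint64_t value = 0;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```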

Deciphering unsigned char*

I have a process that listens to a UDP multicast and reads in the data as an unsigned char*.
I have a specification that indicates fields within this unsigned char*.
Fields are defined in the specification with a type and size.
Types are: uInt32, uInt64, unsigned int, and single byte string.
For the single byte string I can merely access the offset of the field in the unsigned char* and cast to a char, such as:
char character = (char)(data[1]);
For a single-byte uInt32 I've been doing the following, which also seems to work:
uint32_t integer = (uint32_t)(data[20]);
However, for multiple byte conversions I seem to be stuck.
How would I convert several bytes in a row (substring of data) to its corresponding datatype?
Also, is it safe to wrap data in a string (for use of substring functionality)? I am worried about losing information, since I'd have to cast unsigned char* to char*, like:
std::string wrapper((char*)(data),length); //Is this safe?
I tried something like this:
std::string wrapper((char*)(data),length); //Is this safe?
uint32_t integer = (uint32_t)(wrapper.substr(20,4).c_str()); //4 byte int
But it doesn't work.
Thoughts?
Update
I've tried the suggested bit shift:
void function(const unsigned char* data, size_t data_len)
{
//From specification: Field type: uInt32 Byte Length: 4
//All integer fields are big endian.
uint32_t integer = (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | (data[3]);
}
This sadly gives me garbage (same number for every call --from a callback).
I think you should be very explicit, and not just do "clever" tricks with casts and pointers. Instead, write a function like this:
uint32_t read_uint32_t(unsigned char **data)
{
const unsigned char *get = *data;
*data += 4;
return (get[0] << 24) | (get[1] << 16) | (get[2] << 8) | get[3];
}
This extracts a single uint32_t value from a buffer of unsigned char, and increases the buffer pointer to point at the next byte of data in the buffer.
This assumes big-endian data; you need a well-defined idea of the buffer's byte order in order to interpret it.
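The same cursor idea can be extended with a length check so that a short packet is reported instead of read past its end. The class below is a sketch; ByteReader and its members are hypothetical names, and the fields are assumed to be big-endian as in the specification quoted in the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical cursor over a received packet; not part of any library.
class ByteReader {
public:
    ByteReader(const unsigned char* data, std::size_t size)
        : data_(data), size_(size), pos_(0) {}

    // Reads a big-endian 32-bit field and advances past it.
    std::uint32_t readU32BE()
    {
        if (size_ - pos_ < 4) {
            throw std::out_of_range("not enough bytes for a uint32 field");
        }
        const unsigned char* p = data_ + pos_;
        pos_ += 4;
        return (static_cast<std::uint32_t>(p[0]) << 24) |
               (static_cast<std::uint32_t>(p[1]) << 16) |
               (static_cast<std::uint32_t>(p[2]) << 8) |
                static_cast<std::uint32_t>(p[3]);
    }

    // Offset of the next unread byte.
    std::size_t position() const { return pos_; }

private:
    const unsigned char* data_;
    std::size_t size_;
    std::size_t pos_;
};
```

Consecutive fields are then read by calling readU32BE() repeatedly on one ByteReader constructed from data and data_len.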
It depends on the byte ordering of the protocol; for big-endian, the so-called network byte order, do:
uint32_t i = data[0] << 24 | data[1] << 16 | data[2] << 8 | data[3];
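Without commenting on whether it's a good idea or not, the reason why it doesn't work for you is that the result of wrapper.substr(20,4).c_str() is a pointer (const char *), so casting it to uint32_t converts the address rather than the bytes it points to. To get at the bytes you have to cast to a pointer type and dereference it while the substring is still alive:
std::string field = wrapper.substr(20,4);
uint32_t integer = *(const uint32_t *)(field.c_str());
The value comes out in host byte order.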
uint32_t integer = ntohl(*reinterpret_cast<const uint32_t*>(data + 20));
or (handles alignment issues):
uint32_t integer;
memcpy(&integer, data+20, sizeof integer);
integer = ntohl(integer);
The pointer way:
uint32_t n = *(uint32_t*)&data[20];
You will run into problems on different endian architectures though. The solution with bit shifts is better and consistent.
std::string wrapper((char*)(data),length); //Is this safe?
This should be safe since you specified the length of the data.
On the other hand if you did this:
std::string wrapper((char*)data);
The string length would be determined wherever the first 0 byte occurs, and you will more than likely chop off some data.
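The difference between the two constructors can be seen in a small sketch. The helper name wrapBytes is made up for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: wrap raw bytes in a std::string, keeping zero bytes.
std::string wrapBytes(const unsigned char* data, std::size_t length)
{
    // The (pointer, length) constructor copies exactly `length` chars and
    // does not stop at a zero byte.
    return std::string(reinterpret_cast<const char*>(data), length);
}
```

For a buffer such as {'a', 0, 'b', 0xFF}, wrapBytes(data, 4) keeps all four bytes, whereas the pointer-only constructor would stop after 'a'.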

Reading data from binary file

I am trying to read data from a binary file and am having issues. I have reduced it down to the simplest case here, and it still won't work. I am new to C++, so I may be doing something silly, but if anyone could advise I would be very grateful.
Code:
int main(int argc,char *argv[]) {
ifstream myfile;
vector<bool> encoded2;
cout << encoded2 << "\n"<< "\n" ;
myfile.open(argv[2], ios::in | ios::binary |ios::ate );
myfile.seekg(0,ios::beg);
myfile.read((char*)&encoded2, 1 );
myfile.close();
cout << encoded2 << "\n"<< "\n" ;
}
Output
00000000
000000000000000000000000000011110000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Compression_Program(58221) malloc: * error for object 0x10012d: Non-aligned pointer being freed
* set a breakpoint in malloc_error_break to debug
Thanks in advance.
Do not cast a vector<bool>* to a char*. It does not do anything predictable.
You are reading into encoded2 itself: myfile.read((char*)&encoded2, 1);. This is wrong. You can read a byte, convert it to a bool, and then append it to encoded2:
char x;
myfile.read(&x, 1);
encoded2.push_back(x != 0);
Two mistakes here:
you assume the address of a vector is the address of the first element
you rely on vector<bool>
Casting a vector into a char * is not really a good thing, because a vector is an object and stores some state along with its elements.
Here you are probably overwriting the state of the vector, thus its destructor fails.
Maybe you would like to cast the elements of the vector (which are guaranteed to be stored contiguously in memory). But another trap is that vector<bool> may be implementation-optimized.
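Therefore you should use a vector of a plain byte type such as std::vector<char> instead, call encoded2.resize(8) (resize rather than reserve, so that the elements actually exist) and use myfile.read(&encoded2[0], 8).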
But probably you want to do something else and we need to know what the purpose is here.
You're overwriting a std::vector, which you shouldn't do. A std::vector is actually a pointer to a data array and an integer (probably a size_t) holding its size; if you overwrite these with practically random bits, data corruption will occur.
Since you're only reading a single byte, this will suffice:
char c;
myfile.read(&c, 1);
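If the goal is to end up with individual bits in a vector<bool>, the byte that was read can be expanded afterwards. A sketch, assuming the most significant bit comes first (the file format decides this); appendBits is a hypothetical name.

```cpp
#include <vector>

// Hypothetical helper: append the eight bits of `byte` to `bits`,
// most significant bit first.
void appendBits(std::vector<bool>& bits, unsigned char byte)
{
    for (int shift = 7; shift >= 0; --shift) {
        bits.push_back(((byte >> shift) & 1) != 0);
    }
}
```

After reading c as above, appendBits(encoded2, static_cast<unsigned char>(c)) grows the vector through its own interface instead of overwriting its internals.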
The C++ language does not provide an efficient I/O method for reading bits as bits. You have to read bits in groups. Also, you have to worry about endianness when reading in the bits.
I suggest the old fashioned method of allocating a buffer, reading into the buffer then operating on the buffer.
Allocating a buffer
const unsigned int BUFFER_SIZE = 1024 * 1024; // Let the compiler calculate it.
//...
unsigned char * const buffer = new unsigned char [BUFFER_SIZE]; // The pointer is constant.
Reading in the data
std::streamsize bytes_read = 0;
ifstream data_file("myfile.bin", ios::binary); // Open file for input without translations.
data_file.read(reinterpret_cast<char *>(buffer), BUFFER_SIZE); // Read data into the buffer; read() expects a char *.
bytes_read = data_file.gcount(); // Get actual count of bytes read.
Reminders:
Delete the buffer with delete[] when you are finished with it.
Close the file when you are finished with it.
myfile.read((char*) &encoded2[0], sizeof(int)* COUNT);
or you can use push_back():
int tmp;
for(int i = 0; i < COUNT; i++) {
myfile.read((char*) &tmp, sizeof(int));
encoded2.push_back(tmp);
}