Most efficient way to read and write binary - C++

I'm working on a project where I need to first read data from a file, then make some changes to it, and then save it to another file (all in binary mode).
For reading, my first try was to open the file with ifstream and read directly from it with read(), but because I need to read many small pieces of data back to back, I don't think it's a good idea to keep reading directly from the file itself. Currently I read the file into a structure and plain variables like this:
namespace DBinary {
#pragma pack(push, 1)
struct Structure
{
int32_t iData1;
int16_t iData2;
int16_t iData3;
int16_t iData4a;
int16_t iData4b;
int32_t iData4c;
};
#pragma pack(pop)
}
int main()
{
std::ifstream input(path, std::ios::binary);
// read a whole structure
DBinary::Structure tstruc{};
input.read(reinterpret_cast<char*>(&tstruc), sizeof(DBinary::Structure));
// read a single value
uint16_t anint = 0;
input.read(reinterpret_cast<char*>(&anint), sizeof(anint));
}
It works, but I think I can do better, because the file isn't that big. Maybe I can read it fully into memory and then work on it? I'm not sure what the best way to do that is, or how to do it, because I'm new to C++ and don't have much experience with it.
I also want to be able to freely edit the data that I read from the files, so it's important for me to support that as well.

I prefer this:
std::fstream fa("/etc/passwd", std::ios_base::in | std::ios_base::binary);
std::stringstream mj;
fa >> mj.rdbuf();
Then the whole file content is available from mj.str().
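For the "read everything, edit in memory, write to another file" workflow the question asks about, a minimal sketch could look like the following. It assumes the file fits comfortably in memory; load_file and save_file are hypothetical helper names, not part of any library.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Read the whole file into memory (hypothetical helper name).
std::vector<char> load_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Write the (possibly modified) bytes to another file.
void save_file(const std::string& path, const std::vector<char>& data)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        throw std::runtime_error("cannot create " + path);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    if (!out)
        throw std::runtime_error("write failed for " + path);
}
```

Between the two calls the vector can be edited like any other container (overwrite bytes, insert, erase), which is what makes this approach convenient for small files.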

Related

Read an image or pdf using C++ without external library

After reading about Java and C#, I was wondering whether C++ can also read image and PDF files without the use of external libraries. C++ doesn't have a byte type like Java and C#, so how can we accomplish the task (again, without using an external library)?
Can anyone give a small demonstration (i.e. a program or code to read, copy, or write image or PDF files)?
You can use unsigned char or char reinterpreted as some integer type to parse binary file formats like pdf, jpeg etc. You can create a buffer as std::vector<char> and read it as following:
std::vector<char> buffer((
std::istreambuf_iterator<char>(infile)), // Ensure infile was opened with binary attribute
(std::istreambuf_iterator<char>()));
Related questions: Reading and writing binary file
It makes no difference what kind of file you read once it is opened in binary mode; the only difference is how you interpret the data you get from it.
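To illustrate "same bytes, different interpretation", here is a small sketch that guesses a format from the first bytes of a buffer. The function name and the returned labels are made up for this example; the only facts it relies on are that JPEG files begin with the bytes FF D8 and PDF files begin with the text "%PDF".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Guess a file type from its leading "magic" bytes.
// The function name and the returned labels are illustrative only.
std::string guess_format(const std::vector<unsigned char>& bytes)
{
    // JPEG files start with the SOI marker FF D8.
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xD8)
        return "jpeg";
    // PDF files start with the ASCII text "%PDF".
    if (bytes.size() >= 4 && bytes[0] == '%' && bytes[1] == 'P' &&
        bytes[2] == 'D' && bytes[3] == 'F')
        return "pdf";
    return "unknown";
}
```

Using unsigned char here avoids surprises with values above 0x7F, which can compare as negative when plain char is signed.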
It's significantly better to use a ready-made library such as libjpeg; there are plenty of them. But if you really want to do this yourself, first define suitable structures and constants (see the links below) to keep the code convenient and usable. Then read the data and interpret it step by step. The code below is just pseudocode; I didn't compile it.
#include <fstream>
#include <stdexcept>
#include <string>
// define header structure
struct jpeg_header
{
// 0xffd8 is the SOI (start of image) marker that opens every JPEG file
enum class marker: unsigned short { soi = 0xffd8, sof0 = 0xffc0 /* ... */ };
// ...
};
bool is_soi(unsigned short m) { return static_cast<unsigned short>(jpeg_header::marker::soi) == m; }
jpeg_header read_jpeg_header(const std::string& fn)
{
std::ifstream inf(fn, std::ifstream::binary);
if (!inf)
{
throw std::runtime_error("Can't open file: " + fn);
}
inf.exceptions(std::ifstream::failbit | std::ifstream::eofbit);
// the two marker bytes are stored most significant byte first
unsigned short marker = static_cast<unsigned short>(inf.get() << 8);
marker |= static_cast<unsigned short>(inf.get());
if (!is_soi(marker))
{
throw std::runtime_error("Invalid jpeg header");
}
// ...
jpeg_header header;
// read further and fill header structure
// ...
return header;
}
To read large blocks of data, use the ifstream::read() or ifstream::readsome() methods. There is a good example at http://en.cppreference.com/w/cpp/io/basic_istream/read.
Those functions are also faster than stream iterators. It's also better to define your own exception classes derived from std::runtime_error.
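As a rough sketch of the block-read approach, the helper below fetches up to a given number of bytes with a single read() call and uses gcount() to find out how many bytes actually arrived. read_block is a hypothetical name chosen for this example.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read up to max_bytes from the start of a file with one read() call.
// read_block is a hypothetical helper name.
std::vector<char> read_block(const std::string& path, std::size_t max_bytes)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("Can't open file: " + path);

    std::vector<char> block(max_bytes);
    in.read(block.data(), static_cast<std::streamsize>(block.size()));
    // gcount() reports how many bytes the last read() actually delivered,
    // which is less than requested when the file is shorter.
    block.resize(static_cast<std::size_t>(in.gcount()));
    return block;
}
```

Note that a short read sets failbit and eofbit on the stream, so this sketch checks gcount() rather than the stream state after reading.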
For details on the file formats you are interested in, look here:
Structure of a PDF file?
https://en.wikipedia.org/wiki/JPEG_File_Interchange_Format
https://en.wikipedia.org/wiki/JPEG
It would be a strange world to have a systems language like C, and in this case C++, without a byte type :).
Yes, it has a strange name, unsigned char, but it is still there :).
Just think about the magnitude of redevelopment needed to avoid bytes :). Peripherals, many registers in CPUs and other chips, communication, data protocols: it would all have to be redone :).

C++ Parsing data bytes into struct

I am new here and I have a question.
I have a struct whose overall size is, let's say, 8 bytes. Here is the struct:
struct Header
{
int ID; // 4 bytes
char Title [4]; // 4 bytes too
}; // so it's 8 bytes, right?
and I have a file that is 8 bytes too...
I just want to ask how to parse the data in that file into this struct.
I have tried this:
Header* ParseHeader(char* filename)
{
char* buffer = new char[8];
fstream fs(filename);
if (fs.is_open() != true)
throw new exception("Couldn't Open file for Parsing Header.");
fs.read(buffer, 8);
if (!fs)
{
delete[] buffer;
throw new exception("Couldn't Read header OJN file.\nHeader data was corrupted");
}
Header* header = (Header*)((void*)buffer);
delete[] buffer;
fs.close();
return header;
}
But it fails and returns invalid data instead of what I expect (I can assure you this is not the file's fault; the file is structured correctly).
Can someone help me?
Thanks
Seems like you do everything fine until this point:
Header* header = (Header*)((void*)buffer);
delete[] buffer;
fs.close();
Notice that you delete the buffer after the cast, meaning that header points to a deleted location (junk). You need to either not delete the buffer or copy the data if you want to keep using it.
Also, to be quite honest, I don't understand how your code compiles: your function states that it returns a Header, while you return a Header*.
You are deleting the data that is being returned. Therefore header is no longer accessible.
I think you meant the line to be:
Header header = *(Header*)((void*)buffer);
This will actually copy the header.
The fact that your 8-byte file correctly maps to your struct Header is mere luck as far as C++ is concerned. The structure could have internal padding that makes it bigger than 8 bytes, and the data endianness could be different between your file and your CPU.
I realize your code works with your particular compiler version, on your operating system and on your CPU but you should not get into the habit of coding like that, otherwise you'll probably get into big trouble as soon as you change any of those parameters (or maybe even just some compiler flags). In other words, what you are doing is extremely bad practice. In C++ you don't even have the guarantee that an int is actually 4 bytes.
The Right Way™ to load such binary data from a file is to load each field individually and ensure proper endianness conversion depending on the CPU you're using (eg. through hton* / ntoh* or similar functions). Using a fixed-size type like int32_t also helps.
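A minimal sketch of the field-by-field approach, assuming the file stores the ID as a little-endian 32-bit integer (the question does not state the byte order, so that is an assumption). PortableHeader, read_u32_le and read_header are names invented for this example.

```cpp
#include <array>
#include <cstdint>
#include <fstream>
#include <istream>
#include <stdexcept>

// Same layout as the question's Header, but with a fixed-width integer.
struct PortableHeader
{
    std::int32_t id;
    std::array<char, 4> title;
};

// Assemble a 32-bit value from 4 bytes stored least-significant first.
// The result is the same on little-endian and big-endian hosts.
std::uint32_t read_u32_le(std::istream& in)
{
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), sizeof b))
        throw std::runtime_error("unexpected end of file");
    return  static_cast<std::uint32_t>(b[0])        |
           (static_cast<std::uint32_t>(b[1]) << 8)  |
           (static_cast<std::uint32_t>(b[2]) << 16) |
           (static_cast<std::uint32_t>(b[3]) << 24);
}

// Load each field individually; struct padding never enters the picture.
PortableHeader read_header(std::istream& in)
{
    PortableHeader h{};
    h.id = static_cast<std::int32_t>(read_u32_le(in));
    if (!in.read(h.title.data(), static_cast<std::streamsize>(h.title.size())))
        throw std::runtime_error("unexpected end of file");
    return h;
}
```

Because the integer is rebuilt with shifts instead of being copied as raw memory, neither the host's endianness nor the compiler's padding rules affect the result.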
Just define your structure on a 1-byte boundary:
#pragma pack(1)
struct Header
{
int ID; // 4 bytes
char Title [4]; // 4 bytes too
};
#pragma pack()
The first pack statement instructs the compiler to use one-byte alignment for the members of the structure, so no padding is inserted and the size of Header will be 8 bytes. The second pack statement restores the default setting. You may need to use the push and pop variants, but this isn't required in your case.
Secondly, and more importantly, you should not use hard-coded values like 8. Always use sizeof to read or write a structure. Also, this statement is not needed at all:
char* buffer = new char[8];
...
Just declare the Header variable itself and read into it:
Header header;
...
fs.read(reinterpret_cast<char*>(&header), sizeof(Header));
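Putting the pieces of this answer together, a sketch could look like this. It assumes a compiler that supports #pragma pack (GCC, Clang and MSVC do) and a file whose byte order matches the host's; PackedHeader and parse_header are illustrative names, and int32_t replaces int so the field is exactly 4 bytes.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

#pragma pack(push, 1)
struct PackedHeader
{
    std::int32_t ID;   // exactly 4 bytes
    char Title[4];     // 4 bytes
};
#pragma pack(pop)

// Fail at compile time if the layout is not what the file format expects.
static_assert(sizeof(PackedHeader) == 8, "PackedHeader must be 8 bytes");

// Returns the header by value, so nothing points at a freed buffer.
PackedHeader parse_header(const std::string& filename)
{
    std::ifstream fs(filename, std::ios::binary);
    if (!fs)
        throw std::runtime_error("Couldn't open file: " + filename);

    PackedHeader header{};
    if (!fs.read(reinterpret_cast<char*>(&header), sizeof header))
        throw std::runtime_error("Couldn't read header from: " + filename);
    return header;
}
```

Returning by value and throwing exception objects (rather than pointers created with new) also removes the manual memory management from the original attempt.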

Parsing binary data from file

and thank you in advance for your help!
I am in the process of learning C++. My first project is to write a parser for a binary-file format we use at my lab. I was able to get a parser working fairly easily in Matlab using "fread", and it looks like that may work for what I am trying to do in C++. But from what I've read, it seems that using an ifstream is the recommended way.
My question is two-fold. First, what, exactly, are the advantages of using ifstream over fread?
Second, how can I use ifstream to solve my problem? Here's what I'm trying to do. I have a binary file containing a structured set of ints, floats, and 64-bit ints. There are 8 data fields all told, and I'd like to read each into its own array.
The structure of the data is as follows, in repeated 288-byte blocks:
Bytes 0-3: int
Bytes 4-7: int
Bytes 8-11: float
Bytes 12-15: float
Bytes 16-19: float
Bytes 20-23: float
Bytes 24-31: int64
Bytes 32-287: 64x float
I am able to read the file into memory as a char * array, with the fstream read command:
char * buffer;
ifstream datafile (filename,ios::in|ios::binary|ios::ate);
datafile.read (buffer, filesize); // Filesize in bytes
So, from what I understand, I now have a pointer to an array called "buffer". If I were to call buffer[0], I should get a 1-byte memory address, right? (Instead, I'm getting a seg fault.)
What I now need to do really ought to be very simple. After executing the above ifstream code, I should have a fairly long buffer populated with a number of 1's and 0's. I just want to be able to read this stuff from memory, 32-bits at a time, casting as integers or floats depending on which 4-byte block I'm currently working on.
For example, if the binary file contained N 288-byte blocks of data, each array I extract should have N members each. (With the exception of the last array, which will have 64N members.)
Since I have the binary data in memory, I basically just want to read from buffer, one 32-bit number at a time, and place the resulting value in the appropriate array.
Lastly - can I access multiple array positions at a time, a la Matlab? (e.g. array(3:5) -> [1,2,1] for array = [3,4,1,2,1])
Firstly, the advantage of using iostreams, and in particular file streams, relates to resource management. Automatic file stream variables will be closed and cleaned up when they go out of scope, rather than having to manually clean them up with fclose. This is important if other code in the same scope can throw exceptions.
Secondly, one possible way to address this type of problem is to simply define the stream insertion and extraction operators in an appropriate manner. In this case, because you have a composite type, you need to help the compiler by telling it not to add padding bytes inside the type. The following code should work with GCC and Microsoft compilers.
#include <cstdint>
#include <istream>
#include <ostream>
#pragma pack(push, 1)
struct MyData
{
int i0;
int i1;
float f0;
float f1;
float f2;
float f3;
uint64_t ui0;
float f4[64];
};
#pragma pack(pop)
std::istream& operator>>( std::istream& is, MyData& data ) {
is.read( reinterpret_cast<char*>(&data), sizeof(data) );
return is;
}
std::ostream& operator<<( std::ostream& os, const MyData& data ) {
os.write( reinterpret_cast<const char*>(&data), sizeof(data) );
return os;
}
char * buffer;
ifstream datafile (filename,ios::in|ios::binary|ios::ate);
datafile.read (buffer, filesize); // Filesize in bytes
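You need to allocate a buffer before you read into it. Because the file was opened with ios::ate, you also need to seek back to the beginning first:
buffer = new char[filesize];
datafile.seekg(0, ios::beg);
datafile.read (buffer, filesize);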
As to the advantages of ifstream, it is a matter of abstraction. You can abstract the contents of your file in a more convenient way. You then do not have to work with buffers; instead, you can model the structure with classes and hide the details of how it is stored in the file, by overloading the << operator for instance.
You might perhaps look for serialization libraries for C++. Perhaps s11n might be useful.
This question shows how you can convert data from a buffer to a certain type. In general, you should prefer using a std::vector<char> as your buffer. This would then look like this:
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>
int main() {
std::ifstream input("your_file.dat", std::ios::binary);
std::vector<char> buffer;
std::copy(std::istreambuf_iterator<char>(input),
std::istreambuf_iterator<char>(),
std::back_inserter(buffer));
}
This code will read the entire file into your buffer. The next thing you'd want to do is to write your data into valarrays (for the selection you want). valarray is constant in size, so you have to be able to calculate the required size of your array up-front. This should do it for your format:
std::valarray<int> array1(buffer.size()/288); // each entry takes up 288 bytes
std::valarray<int> array2(buffer.size()/288);
Then you'd use a normal for-loop to copy the elements into your arrays (std::memcpy from <cstring> avoids misaligned reads):
for(std::size_t i = 0; i < buffer.size()/288; i++) {
std::memcpy(&array1[i], &buffer[i*288], sizeof(int)); // first position
std::memcpy(&array2[i], &buffer[i*288 + 4], sizeof(int)); // second position
}
Note that the size of int is implementation-defined. It is 4 bytes on most current 32-bit and 64-bit platforms, but that is not guaranteed, so prefer a fixed-width type such as int32_t here. This question explains a bit about C++ and sizes of types.
The selection you describe there can be achieved using valarray.
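A sketch of both steps, under the assumption that the file's byte order matches the host's: extract_int_column copies one 4-byte integer field out of every 288-byte block, and matlab_3_to_5 shows the std::slice equivalent of the array(3:5) example from the question. All names here are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <valarray>
#include <vector>

constexpr std::size_t kBlockSize = 288;  // record size from the question

// Copy the 4-byte integer at byte `offset` of every block into a valarray.
// Assumes the file's byte order matches the host's.
std::valarray<std::int32_t> extract_int_column(const std::vector<char>& buffer,
                                               std::size_t offset)
{
    const std::size_t count = buffer.size() / kBlockSize;
    std::valarray<std::int32_t> column(count);
    for (std::size_t i = 0; i < count; ++i)
        std::memcpy(&column[i], buffer.data() + i * kBlockSize + offset,
                    sizeof(std::int32_t));
    return column;
}

// Matlab's array(3:5) is 1-based and inclusive; the equivalent zero-based
// selection is std::slice(start = 2, size = 3, stride = 1).
std::valarray<std::int32_t> matlab_3_to_5(const std::valarray<std::int32_t>& a)
{
    return a[std::slice(2, 3, 1)];
}
```

For the question's example array [3,4,1,2,1], matlab_3_to_5 yields [1,2,1]. The float fields can be handled the same way with a valarray of float and sizeof(float).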

Writing to file using c and c++

When I write a file using C's fwrite, which accepts a void pointer as data, the result is not readable in a text editor.
struct index
{
index(int _x, int _y):x(_x), y(_y){}
int x, y;
};
index i(4, 7);
FILE *stream;
fopen_s(&stream, "C:\\File.txt", "wb");
fwrite(&i, sizeof(index), 1, stream);
But when I write with C++'s ofstream in binary mode, the file is readable. Why doesn't it come out the same as the file written with fwrite?
This is the way to write binary data using a stream in C++:
struct C {
int a, b;
} c;
#include <fstream>
int main() {
std::ofstream f("foo.txt",std::ios::binary);
f.write((const char*)&c, sizeof c);
}
This shall save the object in the same way as fwrite would. If it doesn't for you, please post your code with streams - we'll see what's wrong.
C++'s ofstream stream insertion only does text. The difference between opening an iostream in binary vs. text mode is whether or not end-of-line character conversion happens. If you want to write a binary format where a 32-bit int takes exactly 32 bits, use the C functions in C++.
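The distinction is easy to see with a small sketch: stream insertion (<<) formats the value as text even when the file is opened with std::ios::binary, while write() copies the raw bytes just as fwrite would. The two function names below are hypothetical.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Writes the value as decimal text, e.g. 1000000 becomes 7 characters.
void write_as_text(const std::string& path, std::int32_t value)
{
    std::ofstream out(path, std::ios::binary);
    out << value;
}

// Writes the value's in-memory representation: always sizeof(value) bytes.
void write_as_bytes(const std::string& path, std::int32_t value)
{
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}
```

Writing 1000000 with the first function produces a 7-byte file that a text editor can display; the second produces a 4-byte file that it cannot.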
Edit on why fwrite may be the better choice:
Ostream's write method is more or less a clone of fwrite (except that it is a little less useful, since it only takes a byte array and a length instead of fwrite's 4 parameters), but by sticking to fwrite there is no way to accidentally use stream insertion in one place and write in another. It is more or less a safety mechanism. While you gain that margin of safety, you lose a little flexibility: you can no longer make an iostream derivative that compresses output without changing any file-writing code.

Portable way to get file size in C/C++

I need to determine the byte size of a file.
The coding language is C++ and the code should work with Linux, windows and any other operating system. This implies using standard C or C++ functions/classes.
This trivial need has apparently no trivial solution.
Using std's stream you can use:
std::ifstream ifile(....);
ifile.seekg(0, std::ios_base::end);//seek to end
//now get current position as length of file
ifile.tellg();
If you are dealing with a write-only file (std::ofstream), the calls are slightly different:
ofile.seekp(0, std::ios_base::end);
ofile.tellp();
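Wrapped into a function, the seek-and-tell idea might look like the sketch below. It opens the file with std::ios::ate, which positions the stream at the end immediately, so no separate seekg call is needed. stream_file_size is a hypothetical name, and returning -1 on failure is a choice made for this example.

```cpp
#include <fstream>
#include <string>

// Returns the size in bytes, or -1 if the file cannot be opened.
// stream_file_size is a hypothetical helper name.
long long stream_file_size(const std::string& path)
{
    // std::ios::ate positions the stream at the end right after opening.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        return -1;
    return static_cast<long long>(in.tellg());
}
```

Opening in binary mode matters here: in text mode, the position reported on some platforms need not equal the number of bytes on disk.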
You can use the stat system call:
#ifdef WIN32
_stat64()
#else
stat64()
#endif
If you only need the file size this is certainly overkill but in general I would go with Boost.Filesystem for platform-independent file operations.
Amongst other attribute functions it contains
template <class Path> uintmax_t file_size(const Path& p);
You can find the reference here. Although the Boost libraries may seem huge, I have found that they often implement things very efficiently. You could also extract only the function you need, but this might prove difficult, as Boost is rather complex.
std::intmax_t file_size(std::string_view const& fn)
{
std::filebuf fb;
return fb.open(fn.data(), std::ios::binary | std::ios::in) ?
std::intmax_t(fb.pubseekoff({}, std::ios::end, std::ios::in)) :
std::intmax_t(-1);
}
We sacrifice 1 bit for the error indicator and standard disclaimers apply when running on 32-bit systems. Use std::filesystem::file_size(), if possible, as std::filebuf may dynamically allocate buffers for file io. This would make all the iostream-based methods wasteful and slow. Files were/are meant to be streamed, though much more so in the past than today, which relegates file sizes to secondary importance.
Working example.
Simples:
std::ifstream ifs;
ifs.open("mybigfile.txt", std::ios::binary);
ifs.seekg(0, std::ios::end);
std::streampos pos = ifs.tellg();
Portability requires you to use the lowest common denominator, which would be C (not C++).
The method that I use is the following.
#include <stdio.h>
long filesize(const char *filename)
{
FILE *f = fopen(filename,"rb"); /* open the file read-only, in binary mode */
long size = 0;
if (f == NULL) /* could not open the file */
return -1;
if (fseek(f,0,SEEK_END)==0) /* seek was successful */
size = ftell(f);
fclose(f);
return size;
}
The prize for absolute inefficiency would go to:
auto file_size(std::string_view const& fn)
{
std::ifstream ifs(fn.data(), std::ios::binary);
// istreambuf_iterator counts every byte; istream_iterator<char> would skip whitespace
return std::distance(std::istreambuf_iterator<char>(ifs), {});
}
Example.
Often we want to get things done in the most portable manner, but in certain situations, especially ones like this, I would strongly recommend using system APIs for best performance.