File compression to handle intermediate output in C++

I want to compress an intermediate output of my program (in C++) and then decompress it.

You can use Boost IOStreams to compress your data, for example with something along these lines, which compresses into and decompresses from a file (example adapted from the Boost docs):
#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>

namespace bo = boost::iostreams;

int main()
{
    {
        // Compress: text written to `out` passes through the gzip filter into hello.gz
        std::ofstream ofile("hello.gz", std::ios_base::out | std::ios_base::binary);
        bo::filtering_ostream out;
        out.push(bo::gzip_compressor());
        out.push(ofile);
        out << "This is a gz file\n";
    } // `out` is declared after `ofile`, so it is destroyed first, while the file is still open
    {
        // Decompress: read hello.gz through the gzip filter and copy the result to stdout
        std::ifstream ifile("hello.gz", std::ios_base::in | std::ios_base::binary);
        bo::filtering_streambuf<bo::input> in;
        in.push(bo::gzip_decompressor());
        in.push(ifile);
        boost::iostreams::copy(in, std::cout);
    }
}
You can also have a look at Boost Serialization, which can make saving your data much easier. It is possible to combine the two approaches (example). IOStreams supports bzip2 compression as well.
EDIT: To address your last comment: you can compress an existing file, but it would be better to write it compressed to begin with. If you really want to, you could adapt the following code:
std::ifstream ifile("file", std::ios_base::in | std::ios_base::binary);
std::ofstream ofile("file.gz", std::ios_base::out | std::ios_base::binary);
bo::filtering_streambuf<bo::output> out;
out.push(bo::gzip_compressor());
out.push(ofile);
bo::copy(ifile, out);

Related

File object with fstream object is not created

I am trying to create a binary file as follows:
#include <iostream>
#include<fstream>
using namespace std;
int main()
{
cout<<"Hello World";
fstream fileObj = std::fstream("test_File.db", std::ios::in | std::ios::out | std::ios::binary);
if(fileObj)
std::cout<<"success";
else
std::cout<<"fail";
return 0;
}
But the file is not created, and the else branch is always executed. Please advise if I am missing anything.
A stream opened with in | out | binary does not create a file that does not exist (this mode corresponds to fopen's "r+b"). You should get into the habit of reading the documentation!
Try in | out | app | binary (assuming you want existing contents to be kept; also get into the habit of clearly stating your goal/requirements).
And there is no need to initialise from a temporary like that; just instantiate the object in the usual manner, e.g.
std::fstream fileObj(
"test_File.db",
std::ios::in | std::ios::out | std::ios::app | std::ios::binary
);
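A minimal sketch of the difference between the two modes, assuming C++11 or later; the helper names and the scratch file name test_File_demo.db are hypothetical, chosen only for illustration:

```cpp
#include <cassert>
#include <cstdio>   // std::remove
#include <fstream>
#include <string>

// in | out | binary corresponds to fopen mode "r+b": opening fails if the file is missing.
bool opens_without_app(const std::string& path)
{
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    return f.is_open();
}

// in | out | app | binary corresponds to "a+b": a missing file is created,
// existing contents are kept, and every write goes to the end of the file.
bool opens_with_app(const std::string& path)
{
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::app | std::ios::binary);
    return f.is_open();
}

// Checks both modes against a scratch file (hypothetical name) that is
// removed first, so it starts out missing.
void demo_create_on_open()
{
    const std::string path = "test_File_demo.db";
    std::remove(path.c_str());

    const bool before = opens_without_app(path);
    const bool created = opens_with_app(path);
    const bool after = opens_without_app(path);
    std::remove(path.c_str());

    assert(!before);  // "r+b" refused to create the file
    assert(created);  // "a+b" created it
    assert(after);    // once it exists, "r+b" succeeds too
}
```

If you do not need ios::app semantics (every write going to the end), another option is to create the file once with app and then reopen it with plain in | out | binary.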

fstream seekp() not working when the file is opened in ios::in and out mode

I want to replace a certain portion (in the middle) of a binary file. If I use ofstream out("file.bin", ios::binary), it deletes the old file and creates a new one. But if I use fstream out("file.bin", ios::binary|ios::in|ios::out), seekp() does not go to the right place and tellp() always returns -1. So is there any way to replace a certain portion of a file?
Thank you in advance.
You must open the stream with the in and out bits set (here also ate, "at end"):
std::fstream out("file.bin", std::ios::binary | std::ios::in | std::ios::out | std::ios::ate);
This prevents your file from being truncated when it is opened; you can then use seekp and unformatted output functions (such as write) to edit it in the middle.
This example outputs stackovstrlow, showing how to chain all steps together:
#include <fstream>
#include <string>
#include <vector>
#include <iostream>

int main()
{
    // create the bin file
    {
        std::string str("stackoverflow\n");
        std::ofstream file("file.bin", std::ios_base::binary);
        file.write(str.c_str(), str.length() + 1); // +1 also writes the terminating '\0'
    }
    // edit the bin file "in the middle": in | out does not truncate,
    // and binary keeps the byte offsets exact
    {
        std::fstream file("file.bin", std::ios_base::binary | std::ios_base::in | std::ios_base::out | std::ios_base::ate);
        file.seekp(7);
        file.write("str", 3);
    }
    // read and see what we've done
    std::ifstream file("file.bin", std::ios_base::binary);
    std::vector<char> v(14);
    file.read(v.data(), 14);
    std::string str(v.cbegin(), v.cend());
    std::cout << str;
}
Seeking on file streams is supposed to work, but it does not always. Notably, seeking does fail if the encoding used by the imbue()ed std::locale() is variable width. Quoting from 27.9.1.5 [filebuf.virtuals] paragraph 13:
Effects: Let width denote a_codecvt.encoding(). If is_open() == false, or off != 0 && width <= 0, then the positioning operation fails. ...
Assuming the file was opened successfully, a failing seek implies that a std::locale with a variable-width encoding was used. To avoid this issue, imbue the classic "C" locale before opening the file. For example:
std::fstream stream;
stream.imbue(std::locale::classic());
stream.open("file.bin", std::ios_base::binary | std::ios_base::in | std::ios_base::out);
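A sketch combining both answers, assuming C++11: it queries codecvt::encoding() (the width the quoted paragraph refers to) and performs the in-place edit with the classic locale imbued before open(). The helper names and the file names used with them are hypothetical.

```cpp
#include <cassert>
#include <cwchar>   // std::mbstate_t
#include <fstream>
#include <locale>
#include <string>

// encoding() of the char/char codecvt facet that a filebuf uses. A result <= 0
// (0 = variable width, -1 = state-dependent) makes non-zero seeks fail.
int char_encoding_width(const std::locale& loc)
{
    return std::use_facet<std::codecvt<char, char, std::mbstate_t>>(loc).encoding();
}

// Overwrites `text` at byte offset `pos` of an existing file, without truncating it.
bool overwrite_at(const std::string& path, std::streamoff pos, const std::string& text)
{
    assert(pos >= 0 && "offset is measured from the start of the file");

    std::fstream stream;
    stream.imbue(std::locale::classic());  // fixed-width encoding, so seeking works
    stream.open(path, std::ios_base::binary | std::ios_base::in | std::ios_base::out);
    if (!stream.is_open())
        return false;  // in | out never creates a missing file

    stream.seekp(pos);
    if (!stream || stream.tellp() != std::streampos(pos))
        return false;  // positioning failed

    stream.write(text.data(), static_cast<std::streamsize>(text.size()));
    return static_cast<bool>(stream);
}
```

Only the bytes written are replaced; everything before and after offset `pos` stays as it was.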

Boost 1.59 not decompressing all bzip2 streams

I've been trying to decompress some .bz2 files on the fly, line by line, because the files I'm dealing with are massive when uncompressed (in the region of 100 GB), so I wanted a solution that saves disk space.
I have no problems decompressing files compressed with vanilla bzip2, but for files compressed with pbzip2 only the first bz2 stream is decompressed. This bug tracker ticket relates to the problem: https://svn.boost.org/trac/boost/ticket/3853, but I was led to believe it was fixed after version 1.41. I've checked the bzip2.hpp file and it contains the 'fixed' version, and I've also checked that the version of Boost used in the program is 1.59.
The code is here:
cout<<"Warning bzip2 support is a little buggy!"<<endl;
//Open the file here
trans_file.open(files[i].c_str(), std::ios_base::in | std::ios_base::binary);
//Set up boost bzip2 compression
boost::iostreams::filtering_istream in;
in.push(boost::iostreams::bzip2_decompressor());
in.push(trans_file);
std::string str;
//Begin reading
while(std::getline(in, str))
{
    std::stringstream stream(str);
    stream>>id_f>>id_i>>aif;
    /* Do stuff with values here*/
}
Any suggestions would be great. Thanks!
You are right.
It seems that changeset #63057 only fixes part of the issue.
The corresponding unit test does work, though. But it uses the copy algorithm (and, in one case, on a composite<> instead of a filtering_istream, if that is relevant).
I'd open this as a defect or a regression. Include a file that exhibits the problem, of course. For me it's reproduced using just /etc/dictionaries-common/words compressed with pbzip2 (default options).
I have the test.bz2 here: http://7f0d2fd2-af79-415c-ab60-033d3b494dc9.s3.amazonaws.com/test.bz2
Here's my test program:
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/bzip2.hpp>
#include <boost/iostreams/stream.hpp>
#include <fstream>
#include <iostream>
#include <string>

namespace io = boost::iostreams;

void multiple_member_test(); // from the unit tests in changeset #63057

int main() {
    //multiple_member_test();
    //return 0;

    std::ifstream trans_file("test.bz2", std::ios::binary);

    //Set up boost bzip2 compression
    io::filtering_istream in;
    in.push(io::bzip2_decompressor());
    in.push(trans_file);

    //Begin reading
    std::string str;
    while(std::getline(in, str))
    {
        std::cout << str << "\n";
    }
}
#include <boost/iostreams/compose.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/range/iterator_range.hpp> // boost::make_iterator_range
#include <algorithm>                      // std::equal
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

void multiple_member_test() // from the unit tests in changeset #63057
{
    std::string data(20ul << 20, '*');
    std::vector<char> temp, dest;

    // Write compressed data to temp, twice in succession
    io::filtering_ostream out;
    out.push(io::bzip2_compressor());
    out.push(io::back_inserter(temp));
    io::copy(boost::make_iterator_range(data), out);
    out.push(io::back_inserter(temp));
    io::copy(boost::make_iterator_range(data), out);

    // Read compressed data from temp into dest
    io::filtering_istream in;
    in.push(io::bzip2_decompressor());
    in.push(io::array_source(&temp[0], temp.size()));
    io::copy(in, io::back_inserter(dest));

    // Check that dest consists of two copies of data
    assert(data.size() * 2 == dest.size());
    assert(std::equal(data.begin(), data.end(), dest.begin()));
    assert(std::equal(data.begin(), data.end(), dest.begin() + dest.size() / 2));

    dest.clear();
    io::copy(
        io::array_source(&temp[0], temp.size()),
        io::compose(io::bzip2_decompressor(), io::back_inserter(dest)));

    // Check that dest consists of two copies of data
    assert(data.size() * 2 == dest.size());
    assert(std::equal(data.begin(), data.end(), dest.begin()));
    assert(std::equal(data.begin(), data.end(), dest.begin() + dest.size() / 2));
}

read and write a binary file in c++ with fstream

I'm trying to write simple C++ code to read and write a file.
The problem is my output file is smaller than the original file, and I'm stuck finding the cause.
I have an image of 6.6 KB, and my output image is about 6.4 KB.
#include <iostream>
#include <fstream>
using namespace std;
ofstream myOutpue;
ifstream mySource;
int main()
{
    mySource.open("im1.jpg", ios_base::binary);
    myOutpue.open("im2.jpg", ios_base::out);
    char buffer;
    if (mySource.is_open())
    {
        while (!mySource.eof())
        {
            mySource >> buffer;
            myOutpue << buffer;
        }
    }
    mySource.close();
    myOutpue.close();
    return 1;
}
Why not just:
#include <fstream>
int main()
{
    std::ifstream mySource("im1.jpg", std::ios::binary);
    std::ofstream myOutpue("im2.jpg", std::ios::binary);
    myOutpue << mySource.rdbuf();
}
Or, less chattily:
int main()
{
    std::ofstream("im2.jpg", std::ios::binary)
        << std::ifstream("im1.jpg", std::ios::binary).rdbuf();
}
Two things: you forgot to open the output in binary mode, and you can't use the input/output operators >> and << for binary data, except when you use the output operator to write the input stream's basic_streambuf (which you can get using rdbuf).
For input use read and for output use write.
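A minimal sketch of copying with read and write, assuming C++11; the function name copy_binary and the 4096-byte chunk size are hypothetical choices, not from the answer:

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Copies a file byte-for-byte using unformatted read()/write() in chunks.
// Both streams are opened in binary mode, so no newline translation happens.
bool copy_binary(const std::string& from, const std::string& to)
{
    std::ifstream src(from, std::ios::binary);
    std::ofstream dst(to, std::ios::binary);
    if (!src || !dst)
        return false;

    std::vector<char> buffer(4096);
    // The last read() may extract fewer bytes than requested (and set failbit);
    // gcount() reports how many were actually read, so that tail is still written.
    while (src.read(buffer.data(), static_cast<std::streamsize>(buffer.size())) ||
           src.gcount() > 0)
    {
        dst.write(buffer.data(), src.gcount());
    }
    return static_cast<bool>(dst);
}
```

Unlike >>, read() never skips whitespace, so spaces, newlines and NUL bytes are all preserved.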
There are 3 problems in your code:
1- You have not opened your output file in binary mode.
2- Your code returns 1; normally you should return 0, and return an error code only if something went wrong.
3- You should use the std::noskipws manipulator so that >> does not skip whitespace. So, to read from the file, instead of:
mySource >> buffer;
you should use:
mySource >> std::noskipws >> buffer;
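A sketch of point 3 applied to the whole copy loop, assuming both files are opened in binary mode (point 1); the name copy_with_noskipws is hypothetical. It also tests the extraction itself instead of checking eof() before reading: in the original loop, the final failed >> leaves buffer unchanged, so the last byte gets written a second time.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Copies a file one char at a time with operator>>. std::noskipws stops >>
// from discarding whitespace bytes, and the loop stops as soon as an
// extraction fails, so nothing stale is written after the end of the input.
bool copy_with_noskipws(const std::string& from, const std::string& to)
{
    std::ifstream src(from, std::ios::binary);
    std::ofstream dst(to, std::ios::binary);
    if (!src || !dst)
        return false;

    char byte;
    while (src >> std::noskipws >> byte)
        dst << byte;
    return static_cast<bool>(dst);
}
```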
Well, it's just because of padding at the end of the image: the eof of a file does not include the padded bytes added at its end.
Try this:
take an img1.jpg that contains 20 space characters at the end, not visible here (uegfuyregwfyugwrerycgerfcg6ygerbucykgeugcrgfrgeyf ), and run your program (do not include the parentheses in the file; they are only used to show the data content).
You will see that img2.jpg contains (uegfuyregwfyugwrerycgerfcg6ygerbucykgeugcrgfrgeyf).
So, a better option is to get the file size (which you can obtain with stat) and read the file byte by byte in a for loop until the file size is reached. Hopefully this resolves the problem you mentioned above.
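A sketch of the size-driven loop, assuming a POSIX system for stat() from <sys/stat.h>; the function name copy_by_size is hypothetical. Note that it reads with the unformatted get(), which, unlike >>, does not skip whitespace bytes.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <sys/stat.h>  // POSIX stat(); not part of standard C++

// Copies exactly st_size bytes, one at a time, using get()/put().
bool copy_by_size(const std::string& from, const std::string& to)
{
    struct stat info;
    if (stat(from.c_str(), &info) != 0)
        return false;  // cannot determine the file size

    std::ifstream src(from, std::ios::binary);
    std::ofstream dst(to, std::ios::binary);
    if (!src || !dst)
        return false;

    for (off_t i = 0; i < info.st_size; ++i)
    {
        char byte;
        if (!src.get(byte))
            return false;  // file shrank or a read error occurred
        dst.put(byte);
    }
    return static_cast<bool>(dst);
}
```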

How to resume appending data to a file at a specific position? (std::ostream, streampos, tellp/seekp)

I'm trying to append some data to a file, but in some cases want to skip back a bit from the end to overwrite the tail end of the file. However, neither seekp( pos ) nor seekp( offset, relative ) is having any effect for me (except complaining when using a negative offset). Am I using them incorrectly or are they broken?
A little example follows. Compiler: gcc version 4.4.4 (Debian 4.4.4-6)
#include <fstream>
#include <iostream>
#include <sstream>
#include <boost/date_time/posix_time/posix_time.hpp>
using namespace std;
using namespace boost::posix_time;
int main(int nargs, char** pargs){
    if( nargs < 2 || nargs > 3 ){
        cerr<<"Usage: "<<pargs[0]<<" file [pos]"<<endl;
        return 1;
    }
    const char* fname = pargs[1];
    ofstream f( fname, ios::binary|ios::out|ios::ate );
    if( !f.good() ){
        cerr<<"Unable to open file!"<<endl;
        return 1;
    }
    if( nargs == 3 ){
        istringstream offss(pargs[2]);
        streamoff off;
        offss >> off;
        cout <<"Seeking to: "<<off<<endl;
        f.seekp( off, ios_base::end ); // using beg or cur instead doesn't help, nor does: seekp( off )
        if( f.fail() ){
            cerr<<"Unable to seek!"<<endl;
            f.clear(); // clear error flags
        }
    }
    f<<"[appending some data at "<<second_clock::local_time()<<"]\n";
    return 0;
}
Now, if I seek using a 0 offset, it should place the output position at the end of the file and writes should append, right? Well, it's having no effect for me (osf was not previously empty):
> ./ostream_app_pos osf 0
Seeking to: 0
> cat osf
[appending some data at 2010-Jul-21 11:16:16]
The usual way of appending is to use ios::app. In this case appending works, but trying to seek with a negative or positive offset has no effect, since (from the gcc docs):
ios::app Seek to end of file before each write.
I also tried using neither ios::ate nor ios::app (presumably truncate mode), to the same effect as ios::ate.
Sorry if this reads rather like a bug report, but I wanted to check whether there's something I've got wrong here in the usage of seekp and get an idea of whether it's compiler-specific.
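You need to open the file with both the input and output flags (in | out). An ofstream opened with out, with or without ate, truncates the file when it is opened (ate only moves to the end afterwards), which is why osf ends up containing only the new line.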
The following code doesn't have the usual error handling, it is just to illustrate a technique.
#include <iostream>
#include <fstream>
int main()
{
    const char *szFname = "c:\\tmp\\tmp.txt";
    std::fstream fs(szFname,
                    std::fstream::binary |
                    std::fstream::in |
                    std::fstream::out);
    fs.seekp(13, std::fstream::beg);
    fs << "123456789";
    return 0;
}
================================================
C:\Dvl\Tmp>type c:\tmp\tmp.txt
abdcefghijklmnopqrstuvwxyz
C:\Dvl\Tmp>Test.exe
C:\Dvl\Tmp>type c:\tmp\tmp.txt
abdcefghijklm123456789wxyz
C:\Dvl\Tmp>
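Applied to the original goal (skipping back from the end and then continuing to write), a sketch under the same assumption that the file already exists; write_from_end is a hypothetical helper. Writing past the old end simply extends the file, and seeking before the start of the file sets failbit.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Moves the put position `back` bytes before the current end of an existing
// file and writes `text` there, overwriting the tail and extending the file
// if `text` is longer than `back`.
bool write_from_end(const std::string& path, std::streamoff back, const std::string& text)
{
    // in | out: the file is neither truncated nor forced to append-only writes
    std::fstream fs(path, std::ios::binary | std::ios::in | std::ios::out);
    if (!fs)
        return false;

    fs.seekp(-back, std::ios::end);
    if (!fs)
        return false;  // e.g. `back` is larger than the file

    fs.write(text.data(), static_cast<std::streamsize>(text.size()));
    return static_cast<bool>(fs);
}
```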