In my C++ code I need to write a lot of data to a file, and I would like to use a Boost mapped file instead of a normal file. Only when I have finished writing all the data in memory would I like to dump the mapped file to disk in one shot.
I use Visual Studio 2010 on Windows Server 2008 R2 and Boost 1.58.
I've never used a mapped file, so I tried to compile the example from the Boost documentation:
#include <iostream>
#include <fstream>
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
int main(int argc, char** argv)
{
using namespace boost::interprocess;
const char* fileName = "C:\\logAcq\\test.bin";
const std::size_t fileSize = 10000;
std::cout << "create file" << std::endl;
try
{
file_mapping::remove(fileName);
std::filebuf fbuf;
fbuf.open(fileName, std::ios_base::in | std::ios_base::out | std::ios_base::trunc | std::ios_base::binary);
std::cout << "set size" << std::endl;
fbuf.pubseekoff(fileSize-1, std::ios_base::beg);
fbuf.sputc(0);
std::cout << "remove on exit" << std::endl;
struct file_remove
{
file_remove(const char* fileName)
:fileName_(fileName) {}
~file_remove(){ file_mapping::remove(fileName_); }
const char *fileName_;
}remover(fileName);
std::cout << "create file mapping" << std::endl;
file_mapping m_file(fileName, read_write);
std::cout << "map the whole file" << std::endl;
mapped_region region(m_file, read_write);
std::cout << "get the address" << std::endl;
void* addr = region.get_address();
std::size_t size = region.get_size();
std::cout << "write all memory to 1" << std::endl;
memset(addr, 1, size);
}
catch (interprocess_exception &ex)
{
fprintf(stderr, "Exception %s\n", ex.what());
fflush(stderr);
system("PAUSE");
return 0;
}
system("PAUSE");
return 0;
}
but I get the exception
Exception The volume for a file has been externally altered so that the opened file is no longer valid.
when I create the region
"mapped_region region(m_file, read_write)"
Any help is appreciated.
Thanks
Exception The volume for a file has been externally altered so that the opened file is no longer valid.
This strongly suggests that the file was changed by another program while it was mapped. The error message also indicates that the change affected the file size in a way that is not allowed.
Avoid having other programs write to the file, or use proper synchronization and sharing precautions (for example, don't change the size, or only grow it).
UPDATE
Your added SSCCE confirms that you held the file open while mapping it.
You need to close the fbuf before mapping the file. Also, you need to release the mapping before the file can be removed.
Working sample:
#include <cstdio>   // fprintf, fflush
#include <cstring>  // memset
#include <iostream>
#include <fstream>
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
int main() {
using namespace boost::interprocess;
const char *fileName = "test.bin";
const std::size_t fileSize = 10000;
std::cout << "create file " << fileName << std::endl;
try {
file_mapping::remove(fileName);
{
std::filebuf fbuf;
fbuf.open(fileName, std::ios_base::in | std::ios_base::out | std::ios_base::trunc | std::ios_base::binary);
std::cout << "set size" << std::endl;
fbuf.pubseekoff(fileSize - 1, std::ios_base::beg);
fbuf.sputc(0);
}
std::cout << "remove on exit" << std::endl;
struct file_remove {
file_remove(const char *fileName) : fileName_(fileName) {}
~file_remove() { file_mapping::remove(fileName_); }
const char *fileName_;
} remover(fileName);
{
std::cout << "create file mapping" << std::endl;
file_mapping m_file(fileName, read_write);
std::cout << "map the whole file" << std::endl;
mapped_region region(m_file, read_write);
std::cout << "get the address" << std::endl;
void *addr = region.get_address();
std::size_t size = region.get_size();
std::cout << "write all memory to 1" << std::endl;
memset(addr, 1, size);
}
} catch (interprocess_exception &ex) {
fprintf(stderr, "Exception %s\n", ex.what());
fflush(stderr);
}
}
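The step that matters in the sample above is that the std::filebuf is destroyed, and the file therefore closed, before file_mapping opens it. A minimal standard-library sketch of that step in isolation follows. The names create_sized_file and file_size_of are hypothetical helpers, not part of Boost, and seeking past the end before writing relies on the same platform behavior the sample already depends on.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>

// Hypothetical helper: create (or truncate) `path` and extend it to `size` bytes.
// The filebuf is a local object, so the file is closed when the function returns.
bool create_sized_file(const char* path, std::size_t size)
{
    assert(size > 0);
    std::filebuf fbuf;
    if (!fbuf.open(path, std::ios_base::in | std::ios_base::out |
                         std::ios_base::trunc | std::ios_base::binary))
        return false;
    // Seek to the last byte and write a single zero, which extends the file.
    if (fbuf.pubseekoff(static_cast<std::streamoff>(size - 1), std::ios_base::beg)
            == std::streampos(std::streamoff(-1)))
        return false;
    if (fbuf.sputc(0) == std::filebuf::traits_type::eof())
        return false;
    // close() flushes the pending byte and returns a null pointer on failure.
    return fbuf.close() != nullptr;
}

// Hypothetical helper: size of an existing file in bytes, or -1 if it cannot be opened.
long long file_size_of(const char* path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return -1;
    return static_cast<long long>(std::streamoff(in.tellg()));
}
```

Once create_sized_file(fileName, fileSize) has returned true, this process no longer holds the file open, so constructing file_mapping and mapped_region afterwards should not conflict with it.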
I'm attempting to write a simple program to extract some data from a bunch of AVRO files. The schema for each file may be different so I would like to read the files generically (i.e. without having to pregenerate and then compile in the schema for each) using the C++ interface.
I have been attempting to follow the generic.cc example but it assumes a separate schema where I would like to read the schema from each AVRO file.
Here is my code:
#include <fstream>
#include <iostream>
#include "Compiler.hh"
#include "DataFile.hh"
#include "Decoder.hh"
#include "Generic.hh"
#include "Stream.hh"
const std::string BOLD("\033[1m");
const std::string ENDC("\033[0m");
const std::string RED("\033[31m");
const std::string YELLOW("\033[33m");
int main(int argc, char**argv)
{
std::cout << "AVRO Test\n" << std::endl;
if (argc < 2)
{
std::cerr << BOLD << RED << "ERROR: " << ENDC << "please provide an "
<< "input file\n" << std::endl;
return -1;
}
avro::DataFileReaderBase dataFile(argv[1]);
auto dataSchema = dataFile.dataSchema();
// Write out data schema in JSON for grins
std::ofstream output("data_schema.json");
dataSchema.toJson(output);
output.close();
avro::DecoderPtr decoder = avro::binaryDecoder();
auto inStream = avro::fileInputStream(argv[1]);
decoder->init(*inStream);
avro::GenericDatum datum(dataSchema);
avro::decode(*decoder, datum);
std::cout << "Type: " << datum.type() << std::endl;
return 0;
}
Every time I run the code, no matter which file I use, I get this:
$ ./avrotest twitter.avro
AVRO Test
terminate called after throwing an instance of 'avro::Exception'
what(): Cannot have negative length: -40
Aborted
In addition to my own data files, I have tried using the data files located here: https://github.com/miguno/avro-cli-examples, with the same result.
I tried using the avrocat utility on all of the same files and it works fine. What am I doing wrong?
(NOTE: outputting the data schema for each file in JSON works correctly as expected)
After a bunch more fooling around, I figured it out. You're supposed to use DataFileReader templated on GenericDatum. The end result looks something like this:
#include <fstream>
#include <iostream>
#include "Compiler.hh"
#include "DataFile.hh"
#include "Decoder.hh"
#include "Generic.hh"
#include "Stream.hh"
const std::string BOLD("\033[1m");
const std::string ENDC("\033[0m");
const std::string RED("\033[31m");
const std::string YELLOW("\033[33m");
int main(int argc, char**argv)
{
std::cout << "AVRO Test\n" << std::endl;
if (argc < 2)
{
std::cerr << BOLD << RED << "ERROR: " << ENDC << "please provide an "
<< "input file\n" << std::endl;
return -1;
}
avro::DataFileReader<avro::GenericDatum> reader(argv[1]);
auto dataSchema = reader.dataSchema();
// Write out data schema in JSON for grins
std::ofstream output("data_schema.json");
dataSchema.toJson(output);
output.close();
avro::GenericDatum datum(dataSchema);
while (reader.read(datum))
{
std::cout << "Type: " << datum.type() << std::endl;
if (datum.type() == avro::AVRO_RECORD)
{
const avro::GenericRecord& r = datum.value<avro::GenericRecord>();
std::cout << "Field-count: " << r.fieldCount() << std::endl;
// TODO: pull out each field
}
}
return 0;
}
Perhaps an example like this should be included with libavro...
I am using the boost gzip_decompressor() from the following link:
How can I read line-by-line using Boost IOStreams' interface for Gzip files?
Reading the gzip file works fine, but how do I read the gzip_params? I want to know the original file name that's stored in the gzip_params.file_name.
Excellent question.
The solution is to use component<N, T> to get a pointer to the actual decompressor instance:
#include <iostream>
#include <fstream>
#include <string>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
int main()
{
std::ifstream file("file.gz", std::ios_base::in | std::ios_base::binary);
try {
boost::iostreams::filtering_istream in;
using gz_t = boost::iostreams::gzip_decompressor;
in.push(gz_t());
in.push(file);
for(std::string str; std::getline(in, str); )
{
std::cout << "Processed line " << str << '\n';
}
if (gz_t* gz = in.component<0, gz_t>()) {
std::cout << "Original filename: " << gz->file_name() << "\n";
std::cout << "Original mtime: " << gz->mtime() << "\n";
std::cout << "Zip comment: " << gz->comment() << "\n";
}
}
catch(const boost::iostreams::gzip_error& e) {
std::cout << e.what() << '\n';
}
}
Preparing a sample file using
gzip testj.txt
mv testj.txt.gz file.gz
Prints
Processed line Hello world
Original filename: testj.txt
Original mtime: 1518987084
Zip comment:
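For background, the original file name lives in the optional FNAME field of the gzip member header described in RFC 1952. The sketch below reads that field directly with the standard library only, which can be handy for checking what a given file actually contains. The function name gzip_original_name is hypothetical, and the code only looks at the first member of the file.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: returns the original file name stored in the gzip
// header of `path`, or an empty string if the field is absent or the file
// does not start with a gzip header.
std::string gzip_original_name(const std::string& path)
{
    assert(!path.empty());
    std::ifstream in(path.c_str(), std::ios::binary);
    unsigned char h[10];  // fixed part: ID1 ID2 CM FLG MTIME(4) XFL OS
    if (!in.read(reinterpret_cast<char*>(h), sizeof h)) return "";
    if (h[0] != 0x1f || h[1] != 0x8b) return "";  // not a gzip stream
    const unsigned flags = h[3];
    if (flags & 0x04) {  // FEXTRA: 2-byte little-endian length, then that many bytes
        unsigned char x[2];
        if (!in.read(reinterpret_cast<char*>(x), 2)) return "";
        in.ignore(x[0] | (x[1] << 8));
    }
    if (!(flags & 0x08)) return "";  // FNAME flag not set
    std::string name;
    std::getline(in, name, '\0');    // the name is zero-terminated
    return name;
}
```

For a file produced with a plain gzip testj.txt, this should return the same name that gz->file_name() prints above.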
ifstream fin;
fin.open("C:\\Users\\Zach\\Desktop\\input.txt");
if (!fin)
{
cout << "e";
}
"e" is printed whether I use the full path or just input.txt from a resource file.
If the file exists, make sure that you have specified the path correctly. Since you're running on Windows, you can verify the full path to your executable with the following code.
#include <iostream>
#include <fstream>
#include <string>
#include <windows.h>
#define BUFSIZE 4096
std::string getExePath()
{
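char result[BUFSIZE];
// Call the ANSI variant explicitly: the buffer is char, and GetModuleFileName
// maps to the wide-character version when UNICODE is defined.
return std::string(result, GetModuleFileNameA(NULL, result, BUFSIZE));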
}
int main()
{
std::ifstream infile("input.txt");
if (infile.is_open())
{
std::cout << "Success!" << std::endl;
infile.close();
}
else
{
std::cout << "Failed to open input.txt!" << std::endl;
std::cout << "Executable path is ->" << getExePath() << "<-" << std::endl;
}
return 0;
}
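This will allow you to verify that your path to the input file is correct, assuming that it's located next to your executable and that the program is started from that directory (a relative path such as input.txt is resolved against the current working directory).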
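To see why an open fails rather than only that it fails, a small check along these lines may help. It is a sketch using only the standard library; describe_open_failure is a hypothetical name, and the standard does not guarantee that errno is set when an ifstream fails to open, although common implementations do set it.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: returns an empty string if `path` can be opened for
// reading, otherwise a short description of the failure.
std::string describe_open_failure(const std::string& path)
{
    assert(!path.empty());
    errno = 0;
    std::ifstream fin(path.c_str());
    if (fin.is_open()) return std::string();
    // errno is usually, but not necessarily, set by the underlying open call.
    return errno != 0 ? std::string(std::strerror(errno))
                      : std::string("unknown error");
}
```

Printing the returned text instead of a bare "e" typically distinguishes a missing file from a permissions problem.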
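Note that an ifstream is an input-only stream: you read from it with fin >> value or std::getline(fin, line), and fin << "string" will not compile. To write to a file, use an ofstream instead. Printing "e" via cout only reports that the open failed.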
This is about the C++ library Boost.
The managed_mapped_file::shrink_to_fit function works differently on Linux and Windows.
On Linux, this function succeeds even if the target instance exists.
However, on Windows, this function fails if the target instance exists.
Is this the correct behavior?
It seems that the behavior should be the same on both platforms. Is this a bug?
I have put the sample code below.
Compilation environment
Boost: version 1.65.1
Windows: Visual Studio 2017, WSL (Ubuntu 16.04)
Linux: Ubuntu Server 17.10, Clang++ 5.0, g++ 7.2.0
Compile with
clang++-5.0 -std=c++1z ./test.cpp -o test -lpthread
#define BOOST_DATE_TIME_NO_LIB
#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <vector>
#include <iostream>
namespace bip = boost::interprocess;
using intAlloc = bip::allocator<int, bip::managed_mapped_file::segment_manager>;
using intVec = std::vector<int, intAlloc>;
int main() {
bip::managed_mapped_file *p_file_vec;
intVec *vecObj;
std::string fileName = "tmp.dat";
size_t fileSize = 1024 * 1024 * 1;
bip::file_mapping::remove(fileName.c_str());
p_file_vec = new bip::managed_mapped_file(bip::create_only, fileName.c_str(), fileSize);
vecObj = p_file_vec->construct<intVec>("myVecName")(p_file_vec->get_allocator<int>());
for (size_t i = 0; i < 10; i++)
{
vecObj->push_back(1 + 100);
}
p_file_vec->flush();
try
{ //Fail when execute on Windows(WSL),but Success on Linux(Ubuntu17.10).
std::cout << "try to shrink:pointer has existed yet!" << std::endl;
bip::managed_mapped_file::shrink_to_fit(fileName.c_str());
std::cout << "success to shrink!" << std::endl;
}
catch (const boost::interprocess::interprocess_exception &ex)
{
std::cerr << "fail to shrink!" << std::endl;
std::cerr << ex.what() << std::endl;
}
std::cout << "please press enter key." << std::endl;
std::cin.get();
try
{ //Success when execute on Windows(WSL) and Linux(Ubuntu17.10).
delete p_file_vec;
std::cout << "try to shrink:pointer has deleted!" << std::endl;
bip::managed_mapped_file::shrink_to_fit(fileName.c_str());
std::cout << "success to shrink!" << std::endl;
}
catch (const std::exception& ex)
{
std::cerr << "fail to shrink!" << std::endl;
std::cerr << ex.what() << std::endl;
}
std::cout << "please press enter key." << std::endl;
std::cin.get();
}
Don't use new and delete in C++ (rule of thumb).
Apart from that
delete p_file_vec;
does NOT delete anything physical. It effectively disconnects from the mapped file. This is also why shrink_to_fit works: the documentation explicitly says:
If the application can find a moment where no process is attached it can grow or shrink to fit the managed segment.
So, in short: the behaviour is correct on both platforms. It's just UNDEFINED what happens in your case when you shrink while the mapped file is in use (on Ubuntu).
Fixed Code:
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/managed_mapped_file.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace bip = boost::interprocess;
using intAlloc = bip::allocator<int, bip::managed_mapped_file::segment_manager>;
using intVec = std::vector<int, intAlloc>;
int main() {
std::string const fileName = "tmp.dat";
bip::file_mapping::remove(fileName.c_str());
{
bip::managed_mapped_file file_vec(bip::create_only, fileName.c_str(), 1l << 20);
auto *vecObj = file_vec.construct<intVec>("myVecName")(file_vec.get_allocator<int>());
for (size_t i = 0; i < 10; i++) {
vecObj->push_back(1 + 100);
}
}
try { // Success when execute on Windows(WSL) and Linux(Ubuntu17.10).
std::cout << "try to shrink:pointer has deleted!" << std::endl;
bip::managed_mapped_file::shrink_to_fit(fileName.c_str());
std::cout << "success to shrink!" << std::endl;
} catch (const std::exception &ex) {
std::cerr << "fail to shrink!" << std::endl;
std::cerr << ex.what() << std::endl;
}
}
I am trying a reasonably simple program to test binary input/output. I am basically writing a file with a header (string) and some data (doubles). The code is as follows:
#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>
int main() {
typedef std::ostream_iterator<double> oi_t;
typedef std::istream_iterator<double> ii_t;
std::ofstream ofs("data.bin", std::ios::in);
//-If file doesn't exist, create a new one now
if(!ofs) {
ofs.open("data.bin", std::ios::out|std::ios::binary|std::ios::app);
}
else {
ofs.close();
ofs.open("data.bin", std::ios::out|std::ios::binary|std::ios::app);
}
//-Write a header consisting of length of grid subdomain and its name
///*
const std::string grid = "Header";
unsigned int olen = grid.size();
ofs.write(reinterpret_cast<const char*>(&olen), sizeof(olen));
ofs.write(grid.c_str(), olen);
//*/
//-Now write the data
///*
std::vector<double> data_out;
//std::vector<std::pair<int, int> > cell_ids;
for(int i=0; i<100; ++i) {
data_out.push_back(5.0*double(i) + 100.0);
}
ofs << std::setprecision(4);
std::copy(data_out.begin(), data_out.end(), oi_t(ofs, " "));
//*/
ofs.close();
//-Now read the binary file; first header then data
std::ifstream ifs("data.bin", std::ios::binary);
///*
unsigned int ilen;
ifs.read(reinterpret_cast<char*>(&ilen), sizeof(ilen));
std::string header;
if(ilen > 0) {
char* buf = new char[ilen];
ifs.read(buf,ilen);
header.append(buf,ilen);
delete[] buf;
}
std::cout << "Read header: " << header << "\n";
//*/
///*
std::vector<double> data_in;
ii_t ii(ifs);
std::copy(ii, ii_t(), std::back_inserter(data_in));
std::cout << "Read data size: " << data_in.size() << "\n";
//*/
ifs.close();
//-Check the result
///*
for(int i=0; i < data_out.size(); ++i) {
std::cout << "Testing input/output element #" << i << " : "
<< data_out[i] << " " << data_in[i] << "\n";
}
std::cout << "Element sizes: " << data_out.size() << " " << data_in.size() <<
"\n";
//*/
return 0;
}
The problem is that when I try to write and read (and then print) both the header and the data it fails (I confirmed that it doesn't read the data then, but displays the header correctly). But when I comment out one of the write sections (header and/or data), it displays that part correctly indicating the read worked. I am sure I am not doing the read properly. Perhaps I am missing the usage of seekg somewhere.
The code runs fine for me. However, you never check whether the file was successfully opened for writing, so it could be failing silently on your system. After you open ofs you should add:
if (!ofs) {
std::cout << "Could not open file for writing" << std::endl;
return 1;
}
And the same thing after you open ifs:
if (!ifs) {
std::cout << "Could not open file for reading" << std::endl;
return 1;
}
Or something along those lines. Also, I do not understand why you check whether the file exists first, since you do the same thing whether it exists or not.
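Separately from the open checks, note that the program in the question writes the header with write() but the doubles as formatted text through ostream_iterator, and that it opens data.bin in append mode, so repeated runs keep adding to the same file. One possible way to keep both parts in the same raw binary layout is sketched below. The helper names and the count-prefixed layout are assumptions made for illustration and are not taken from the original code.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical layout: [u32 header length][header bytes][u32 count][count doubles].
// Values are stored in the machine's native byte order, so the file is not portable.
bool write_block(const std::string& path, const std::string& header,
                 const std::vector<double>& data)
{
    assert(header.size() <= 0xFFFFFFFFu && data.size() <= 0xFFFFFFFFu);
    // trunc instead of app: every call starts from an empty file.
    std::ofstream ofs(path.c_str(), std::ios::binary | std::ios::trunc);
    if (!ofs) return false;
    const std::uint32_t hlen = static_cast<std::uint32_t>(header.size());
    const std::uint32_t count = static_cast<std::uint32_t>(data.size());
    ofs.write(reinterpret_cast<const char*>(&hlen), sizeof hlen);
    ofs.write(header.data(), hlen);
    ofs.write(reinterpret_cast<const char*>(&count), sizeof count);
    if (count > 0)
        ofs.write(reinterpret_cast<const char*>(data.data()),
                  static_cast<std::streamsize>(count * sizeof(double)));
    return static_cast<bool>(ofs);
}

bool read_block(const std::string& path, std::string& header,
                std::vector<double>& data)
{
    std::ifstream ifs(path.c_str(), std::ios::binary);
    if (!ifs) return false;
    std::uint32_t hlen = 0, count = 0;
    if (!ifs.read(reinterpret_cast<char*>(&hlen), sizeof hlen)) return false;
    header.assign(hlen, '\0');
    if (hlen > 0 && !ifs.read(&header[0], hlen)) return false;
    if (!ifs.read(reinterpret_cast<char*>(&count), sizeof count)) return false;
    data.assign(count, 0.0);
    if (count > 0 &&
        !ifs.read(reinterpret_cast<char*>(data.data()),
                  static_cast<std::streamsize>(count * sizeof(double))))
        return false;
    return true;
}
```

With this layout the reader knows exactly how many bytes belong to each part, so no seekg call or text parsing is needed between the header and the data.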
This should work
#include <iostream>
using std::cout;
using std::cerr;
using std::cin;
using std::endl;
#include <fstream>
using std::ifstream;
#include <cstdint>
#include <cstdlib>  // exit
int main() {
ifstream fin;
fin.open("input.dat", std::ios::binary | std::ios::in);
if (!fin) {
cerr << "Cannot open file " << "input.dat" << endl;
exit(1);
}
uint8_t input_byte;
// Note: operator>> is a formatted read and skips whitespace bytes, even in binary mode.
while (fin >> input_byte) {
cout << "got byte " << input_byte << endl;
}
return 0;
}
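One caveat with the loop above: operator>> is a formatted extraction, so for a character type such as uint8_t it skips whitespace bytes (space, tab, newline) even when the stream is opened in binary mode. If every byte matters, an unformatted read may be a better fit. A minimal sketch follows; the function name read_all_bytes is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file byte by byte with get(), which
// does not skip whitespace. Returns an empty vector if the file cannot be opened.
std::vector<std::uint8_t> read_all_bytes(const std::string& path)
{
    assert(!path.empty());
    std::vector<std::uint8_t> bytes;
    std::ifstream fin(path.c_str(), std::ios::binary);
    char c;
    while (fin.get(c))
        bytes.push_back(static_cast<std::uint8_t>(c));
    return bytes;
}
```

When printing the result, casting each element to unsigned shows its numeric value instead of the corresponding character.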