Suppose I have an ifstream which represents a large file containing lots of sub-files aggregated together. I want to be able to create a "sub" istream from the larger ifstream (given a size and offset) representing a part of the file, so other code can read from that substream as if it were an independent istream.
Any ideas on how I might accomplish this?
EDIT: I would prefer to avoid Boost.
This is an example of a streambuf "filter" that reads from a contained streambuf, starting at a specified location and reading up to a specified size. You create a substreambuf, passing your original streambuf in, and substreambuf then translates accesses so that everything is read from the desired location in the underlying streambuf.
Most of the overhead involved in calling sgetc and sbumpc from underflow and uflow should optimize away. Many extraction operators work byte by byte, so there should not be additional overhead beyond maintaining the read position within the subsection and checking for the end of the subsection. Of course, reading large chunks of data will be less efficient with this class (although that could be fixed).
This still needs improvements, such as testing that the requested location is within the underlying streambuf.
#include <cassert>
#include <cstddef>
#include <ios>
#include <streambuf>

class substreambuf : public std::streambuf
{
public:
    // Expose bytes [start, start + len) of sbuf as an independent stream buffer.
    substreambuf(std::streambuf *sbuf, std::size_t start, std::size_t len)
        : m_sbuf(sbuf), m_start(static_cast<std::streamoff>(start)),
          m_len(static_cast<std::streamsize>(len)), m_pos(0)
    {
        std::streampos p = m_sbuf->pubseekpos(m_start, std::ios_base::in);
        assert(p != std::streampos(std::streamoff(-1)));
        (void)p; // silence the unused-variable warning when NDEBUG is defined
    }

protected:
    // Peek at the next byte without consuming it.
    int_type underflow() override
    {
        if (m_pos >= m_len)   // was m_pos + 1 >= m_len, which hid the last byte
            return traits_type::eof();
        return m_sbuf->sgetc();
    }

    // Consume and return the next byte.
    int_type uflow() override
    {
        if (m_pos >= m_len)
            return traits_type::eof();
        int_type c = m_sbuf->sbumpc();
        if (!traits_type::eq_int_type(c, traits_type::eof()))
            ++m_pos;
        return c;
    }

    pos_type seekoff(off_type off, std::ios_base::seekdir way,
                     std::ios_base::openmode which = std::ios_base::in | std::ios_base::out) override
    {
        off_type cursor;
        if (way == std::ios_base::beg)
            cursor = off;
        else if (way == std::ios_base::cur)
            cursor = m_pos + off;
        else // std::ios_base::end: like any stream, the offset is added to the end
            cursor = m_len + off;
        return seekpos(pos_type(cursor), which);
    }

    pos_type seekpos(pos_type sp,
                     std::ios_base::openmode which = std::ios_base::in | std::ios_base::out) override
    {
        off_type target = sp;
        // Positions 0..m_len are valid; m_len itself is "at end", so tellg() works there.
        if (!(which & std::ios_base::in) || target < 0 || target > m_len)
            return pos_type(off_type(-1));
        if (m_sbuf->pubseekpos(m_start + target, std::ios_base::in) == pos_type(off_type(-1)))
            return pos_type(off_type(-1));
        m_pos = target;
        return pos_type(m_pos);
    }

private:
    std::streambuf *m_sbuf;
    std::streamoff m_start;   // offset of the subsection in the underlying buffer
    std::streamsize m_len;    // length of the subsection
    std::streamoff m_pos;     // current read position within the subsection
};
It can be used like this:
#include <fstream>
#include <istream>
using namespace std;
void somefunc(ifstream &bigifs)
{
    // bigifs should be opened with ios::binary so that offsets refer to raw bytes
    substreambuf sbuf(bigifs.rdbuf(), 100, 100);
    // new istream with the substreambuf as its streambuf
    istream isub(&sbuf);
    // use isub normally
}
This was inspired by "Filtering Streambufs".
I've done something like this using the Boost.Iostreams library. Look under Tutorial|Writing Devices. The idea is to create a "device" class which implements the low-level interface (read/write/seek) and then instantiate an istream/ostream derived class using your device class to do the actual I/O.
All iostreams put most of their custom logic in their streambuf specializations. std::fstream (basic_fstream) initializes its stream base class with an instance of std::filebuf (basic_filebuf), and std::stringstream does the same with std::stringbuf. If you want to roll your own substream, you can do it by implementing your own streambuf in terms of the parent stream's streambuf.
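As a sketch of that idea which also addresses the large-read inefficiency mentioned in the first answer, underflow() can pull a whole chunk from the parent with sgetn instead of forwarding one byte at a time. This is a minimal read-only version under stated assumptions, not a drop-in replacement: seeking is deliberately left out, and the class name buffered_substreambuf, its chunk parameter and demo_substream are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Read-only view of bytes [start, start + len) of a parent streambuf.
// underflow() refills a local buffer in chunks; seeking is not supported.
class buffered_substreambuf : public std::streambuf {
public:
    buffered_substreambuf(std::streambuf *parent, std::streamoff start,
                          std::streamsize len, std::size_t chunk = 4096)
        : m_parent(parent), m_start(start), m_len(len), m_next(0), m_buf(chunk) {
        assert(parent != nullptr && chunk > 0);
        setg(m_buf.data(), m_buf.data(), m_buf.data()); // start with an empty get area
    }

protected:
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        const std::streamsize remaining = m_len - m_next;
        if (remaining <= 0)
            return traits_type::eof();
        const std::streamsize want = std::min<std::streamsize>(
            remaining, static_cast<std::streamsize>(m_buf.size()));
        // Re-position the parent before every refill, in case other code moved it.
        if (m_parent->pubseekpos(m_start + m_next, std::ios_base::in) == pos_type(off_type(-1)))
            return traits_type::eof();
        const std::streamsize got = m_parent->sgetn(m_buf.data(), want);
        if (got <= 0)
            return traits_type::eof();
        m_next += got;
        setg(m_buf.data(), m_buf.data(), m_buf.data() + got);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::streambuf *m_parent;
    std::streamoff m_start;   // where the subsection begins in the parent
    std::streamsize m_len;    // length of the subsection
    std::streamoff m_next;    // subsection offset of the next chunk to fetch
    std::vector<char> m_buf;  // local read buffer
};

// Usage sketch: an istringstream stands in for the big file.
std::string demo_substream() {
    std::istringstream whole("HEADERhello worldTRAILER");
    buffered_substreambuf sb(whole.rdbuf(), 6, 11, 4); // small chunk to force refills
    std::istream sub(&sb);
    return std::string(std::istreambuf_iterator<char>(sub),
                       std::istreambuf_iterator<char>());
}
```

Re-seeking the parent on every refill costs one extra call per chunk but keeps the view correct even if other code reads from the same parent in between.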
Just a little idea: if you have control over the client side of the code (i.e. the part that uses the input stream), I suggest you modify it to accept two additional parameters, as illustrated below:
// Old code
void ClassUsingInput::SetInput(std::streambuf & inputbuf)
{
// Implementation ...
}
can become:
// New code
void ClassUsingInput::SetInput(std::streambuf & inputbuf, std::streampos position, std::streamsize size)
{
inputbuf.pubseekpos(position) ;
// internally use size to detect end-of-substream
}
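A minimal sketch of what "internally use size" could mean if the client is happy to pull the bytes into memory: seek once, then read at most size bytes with sgetn. ReadSubrange and demo_subrange are hypothetical names, and an std::istringstream stands in for the real file.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: seek inputbuf to `position`, then read at most `size`
// bytes. Returns the bytes actually read (fewer if the source ends early).
std::string ReadSubrange(std::streambuf &inputbuf, std::streampos position,
                         std::streamsize size) {
    assert(size >= 0);
    std::string out;
    if (inputbuf.pubseekpos(position, std::ios_base::in) == std::streampos(std::streamoff(-1)))
        return out; // seek failed: nothing to read
    out.resize(static_cast<std::string::size_type>(size));
    const std::streamsize got = inputbuf.sgetn(&out[0], size);
    out.resize(static_cast<std::string::size_type>(got));
    return out;
}

// Usage sketch with an in-memory buffer.
std::string demo_subrange() {
    std::istringstream src("0123456789");
    return ReadSubrange(*src.rdbuf(), 3, 4); // "3456"
}
```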
Related
I am using a std::string to hold binary data read from a socket.
The data consists of messages beginning with a '$' and ending with a '#'. Each message may contain '\0' characters.
I use std::string::find() to find the location of the first message and extract it from the string using std::string::substr():
#include <cstddef>
#include <string>

class MessageSplitter {
public:
    MessageSplitter() { m_data.reserve(1'000'000); }
    void appendBinaryData(const std::string& binaryData) {
        m_data.append(binaryData);
    }
    bool popMessage(std::string& msg) {
        std::size_t beg_index = m_data.find("$");
        if (beg_index == std::string::npos) {
            return false;
        }
        std::size_t end_index = m_data.find("#", beg_index);
        if (end_index == std::string::npos) {
            return false;
        }
        // the terminator '#' is a single character, so the message ends at end_index
        std::size_t count = end_index - beg_index + 1;
        msg = m_data.substr(beg_index, count);
        m_data = m_data.substr(end_index + 1);
        return true;
    }
private:
    std::string m_data;
};
I read from socket this way (error checking on recv omitted):
char buffer[4096];
int ret = ::recv(m_socket, buffer, 4096, 0);
std::string binaryData = std::string(buffer, ret);
This approach seems to work fine on Windows.
However, is it guaranteed to work on other platforms according to the C++ standard?
This is perfectly safe at the language level. std::string is guaranteed to handle non-printable characters, including embedded nul characters, just fine.
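A small illustration of that guarantee, using the same (pointer, length) constructor as the question's recv code: the nul is stored and counted like any other byte, whereas the plain const char* constructor stops at it. The function names are hypothetical.

```cpp
#include <string>

// (pointer, length) constructor: copies every byte, including the embedded '\0'.
std::string embedded_nul_message() {
    const char raw[] = {'$', 'a', '\0', 'b', '#'};
    return std::string(raw, sizeof raw);
}

// For contrast: the const char* constructor treats '\0' as a terminator.
std::string truncated_message() {
    const char raw[] = {'$', 'a', '\0', 'b', '#'};
    return std::string(raw); // only "$a"
}
```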
From a programmer's perspective, though, it's somewhat unsafe because it's surprising. When I see std::string, I generally expect it to hold printable text. It has an operator<<, for example, to make it easy to print to output streams, and I have to remember never to use that.
For the second reason, I would tend to prefer something more explicit. std::vector<std::byte> or std::vector<unsigned char> or similar. Something that doesn't act like text is much more difficult to accidentally treat as text.
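As a sketch of that more explicit alternative, here is the question's splitter rewritten over std::vector<unsigned char>. ByteMessageSplitter is a hypothetical name, and taking a (pointer, length) pair in appendBinaryData is an assumption chosen to match the recv buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

class ByteMessageSplitter {
public:
    void appendBinaryData(const unsigned char *data, std::size_t len) {
        m_data.insert(m_data.end(), data, data + len);
    }

    // Extracts the first complete "$...#" message, delimiters included.
    // As in the original, bytes before the '$' are discarded on success.
    bool popMessage(std::vector<unsigned char> &msg) {
        auto dollar = std::find(m_data.begin(), m_data.end(), static_cast<unsigned char>('$'));
        if (dollar == m_data.end())
            return false;
        auto hash = std::find(dollar, m_data.end(), static_cast<unsigned char>('#'));
        if (hash == m_data.end())
            return false; // incomplete message: keep the data for later
        msg.assign(dollar, hash + 1);
        m_data.erase(m_data.begin(), hash + 1);
        return true;
    }

private:
    std::vector<unsigned char> m_data;
};
```

Because std::vector has no operator<<, accidentally printing a message as text no longer compiles.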
I'm writing to the /dev interface of a hardware device on Linux. The /dev interface is presented as a Linux file; to talk to the device, you simply read and write the file. I am using the C standard library file functions std::fwrite and std::fread because I need access to the underlying file descriptor for ioctl calls, which is not exposed by the preferred std::ofstream, but I digress.
The issue is simple: a write followed by a read fails when using the std:: calls. It appears to be an issue with fseek, but I am unsure. With the fseek code shown below, successive writes return as if they succeeded, but no data is written; without the fseek code, the std::fread call returns an error value. Curiously, the Linux file functions (write and read) work perfectly, without any fseek mess or anything at all. My question is: WHY?!
Linux function version (works perfectly):
bool Write(const std::vector<T> &data)
{
if(write(GetFileDescriptor(),&data[0],sizeof(T) * data.size()) ==
sizeof(T) * data.size())
return true;
return false;
}
std::vector<T> Read(int CountOfT)
{
std::vector<T> buf(CountOfT);
if(read(GetFileDescriptor(), &buf[0], sizeof(T) * CountOfT) !=
sizeof(T) * CountOfT)
throw "stuff"; //i actually use std::optional
return buf;
}
Standard library version (fails):
bool Write(const std::vector<T> &data)
{
if(std::fwrite(data.data(), sizeof(T), data.size(), m_fd.get()) <
data.size())
return false;
return true;
}
std::vector<T> Read(int CountOfT)
{
long fileoffset = std::ftell(m_fd.get()); //get current offset
std::fseek(m_fd.get(),0,SEEK_SET); //place offset at file start
std::vector<T> buf(CountOfT);
if(std::fread(&buf[0],sizeof(T),buf.size(),m_fd.get()) < CountOfT)
throw "stuff";
std::fseek(m_fd.get(),fileoffset,SEEK_SET); //reset to where it was
return buf;
}
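One thing worth ruling out, offered as an assumption rather than a confirmed diagnosis: stdio keeps fwrite output in a user-space buffer, so it may not reach the device until the stream is flushed, and ISO C forbids following output directly with input on an update stream without an intervening fflush or file-positioning call (fseek, fsetpos, rewind). The sketch below disables stdio buffering and flushes after each write. It assumes the FILE* was just opened in an update mode such as "r+b" with no I/O done yet, since setvbuf must come first; DeviceFile is a hypothetical name.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical wrapper around a FILE* opened for update (e.g. "r+b").
template <typename T>
class DeviceFile {
public:
    explicit DeviceFile(std::FILE *fp) : m_fp(fp) {
        assert(fp != nullptr);
        // Must precede any other I/O on the stream. Unbuffered mode means stdio
        // does not hold written data back in its own user-space buffer.
        std::setvbuf(m_fp, nullptr, _IONBF, 0);
    }

    bool Write(const std::vector<T> &data) {
        if (std::fwrite(data.data(), sizeof(T), data.size(), m_fp) < data.size())
            return false;
        // Output may not be directly followed by input without fflush or a seek.
        return std::fflush(m_fp) == 0;
    }

    bool Read(std::vector<T> &buf) {
        return std::fread(buf.data(), sizeof(T), buf.size(), m_fp) == buf.size();
    }

private:
    std::FILE *m_fp;
};
```

On a regular file the read and write positions are shared, so reading back what was just written needs a rewind; what read() returns on a /dev node is up to its driver.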
I have to read a binary file in blocks of 8 bytes and then send those blocks over a TCP socket.
Can I use C++ iterator for this task? Like:
FileIterator file("name_file.bin");
for(iter = file.begin(); iter != file.end(); iter++) {
sendTcp(iter);
}
Class FileIterator has to return some struct which will be sent.
In the constructor of FileIterator I open the binary file and read it. Then I create a dynamic array and write the file's contents into it. At each iteration step I have to read the next block from the array, write it into the struct, and return it.
Yes, you can!
You can use an ifstream with istream_iterator, like so:
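auto f = std::ifstream("lol.bin", std::ios::binary | std::ios::in);
f.exceptions(std::ios::badbit);
// istream_iterator<char> uses operator>>, which skips whitespace bytes unless told not to
f >> std::noskipws;
for (auto start = std::istream_iterator<char>{ f }, end = std::istream_iterator<char>{}; start != end; ++start)
{
    ...
}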
Edit:
I hadn't noticed you asked for 8-byte blocks. The way you can solve it is like this:
First, define an operator>>, for example:
struct My8Bytes {
char bytes[8];
};
std::istream& operator>>(std::istream& s, My8Bytes& bytes) {
s.read(bytes.bytes, sizeof(bytes.bytes));
return s;
}
and then use the iterator the same way as before, only now with your specific type:
for (auto start = std::istream_iterator<My8Bytes>{ f }, end = std::istream_iterator<My8Bytes>{}; start != end; ++start)
{
...
}
I see this as an X-Y problem. Yes, it can be done with an iterator, but iterators aren't the best fit solution for this job. Using an iterator for this is an interesting educational experience, but going old school solves this problem with almost zero fuss and much easier error resolution.
#include <cstdint>
#include <fstream>
#include <iostream>
// simple 8 byte struct
struct EightByteStruct
{
    std::uint32_t a;
    std::uint32_t b;
};
static_assert(sizeof(EightByteStruct) == 8, "expected no padding");
// quick hack send routine. Returns true on success, so the caller can do simple error checking.
bool sendTcp(const EightByteStruct & test)
{
    bool rval = false;
    // send test. Set rval true if success
    return rval;
}
//main event: read file into struct, write struct to socket
int main()
{
    std::ifstream in("filename", std::ios::binary);
    EightByteStruct test;
    while (in.read(reinterpret_cast<char*>(&test), sizeof(test)))
    { // will not enter if sizeof(test) bytes not read from file
        if (!sendTcp(test))
        {
            // handle send error
        }
    }
    // test here for any file error conditions you wish to have special handling
}
I am trying to serialize a Plain Old Data structure using ifstream and ofstream, and I wasn't able to get it to work. I then tried to reduce my problem to an ultra-basic serialization of just a char and an int, and even that didn't work. Clearly I'm missing something at a core fundamental level.
For a basic structure:
struct SerializeTestStruct
{
char mCharVal;
unsigned int mIntVal;
void Serialize(std::ofstream& ofs);
};
With serialize function:
void SerializeTestStruct::Serialize(std::ofstream& ofs)
{
bool isError = (false == ofs.good());
if (false == isError)
{
ofs.write((char*)&mCharVal, sizeof(mCharVal));
ofs.write((char*)&mIntVal, sizeof(mIntVal));
}
}
Why would this fail with the following short program?
//ultra basic serialization test.
SerializeTestStruct* testStruct = new SerializeTestStruct();
testStruct->mCharVal = 'y';
testStruct->mIntVal = 9;
//write
std::string testFileName = "test.bin";
std::ofstream fileOut(testFileName.data());
fileOut.open(testFileName.data(), std::ofstream::binary|std::ofstream::out);
fileOut.clear();
testStruct->Serialize(fileOut);
fileOut.flush();
fileOut.close();
delete testStruct;
//read
char * memblock;
std::ifstream fileIn (testFileName.data(), std::ifstream::in|std::ifstream::binary);
if (fileIn.is_open())
{
// get length of file:
fileIn.seekg (0, std::ifstream::end);
int length = fileIn.tellg();
fileIn.seekg (0, std::ifstream::beg);
// allocate memory:
memblock = new char [length];
fileIn.read(memblock, length);
fileIn.close();
// read data as a block:
SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();
delete[] testStruct2;
}
When I run through the code I notice that memblock has a "y" at the top so maybe it is working and it's just a problem with the placement new at the very end? After that placement new I end up with a SerializeTestStruct with values: 0, 0.
Here is how your stuff should read:
#include <fstream>
#include <iostream>
#include <string>
#include <stdexcept>
struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;
    void Serialize(::std::ostream &os);
    static SerializeTestStruct Deserialize(::std::istream &is);
};
void SerializeTestStruct::Serialize(std::ostream &os)
{
    if (os.good())
    {
        os.write((char*)&mCharVal, sizeof(mCharVal));
        os.write((char*)&mIntVal, sizeof(mIntVal));
    }
}
SerializeTestStruct SerializeTestStruct::Deserialize(std::istream &is)
{
    SerializeTestStruct retval;
    if (is.good())
    {
        is.read((char*)&retval.mCharVal, sizeof(retval.mCharVal));
        is.read((char*)&retval.mIntVal, sizeof(retval.mIntVal));
    }
    if (is.fail()) {
        throw ::std::runtime_error("failed to read full struct");
    }
    return retval;
}
int main()
{
    //ultra basic serialization test.
    // setup
    const ::std::string testFileName = "test.bin";
    // write
    {
        SerializeTestStruct testStruct;
        testStruct.mCharVal = 'y';
        testStruct.mIntVal = 9;
        // Open once, in binary mode: calling open() on an already-open stream fails.
        ::std::ofstream fileOut(testFileName.c_str(),
                                std::ofstream::binary|std::ofstream::out);
        testStruct.Serialize(fileOut);
    }
    // read
    {
        ::std::ifstream fileIn (testFileName.c_str(),
                                std::ifstream::in|std::ifstream::binary);
        if (fileIn.is_open())
        {
            SerializeTestStruct testStruct =
                SerializeTestStruct::Deserialize(fileIn);
            ::std::cout << "testStruct.mCharVal == '" << testStruct.mCharVal
                        << "' && testStruct.mIntVal == " << testStruct.mIntVal
                        << '\n';
        }
    }
    return 0;
}
Style issues:
Don't use new to create things if you can help it. Stack allocated objects are usually what you want and significantly easier to manage than the arbitrary lifetime objects you allocate from the heap. If you do use new, consider using a smart pointer type of some kind to help manage the lifetime for you.
Serialization and deserialization code should be matched up so that they can be examined and altered together. This makes maintenance of such code much easier.
Rely on C++ to clean things up for you with destructors; that's what they're for. This means wrapping parts of your code in their own blocks when the scopes of the variables they use are relatively confined.
Don't needlessly use flags.
Mistakes...
Don't use the data member function of ::std::string to get a file name; use c_str(). Before C++11, data() was not guaranteed to return a null-terminated string.
Using placement new and that memory block thing is a really bad idea because it's ridiculously complex. And if you did use it, you must not use array delete the way you did: the object created by placement new should have its destructor called (if it has a nontrivial one), and the original memblock released with delete[]. And lastly, it won't work anyway, for a reason explained later.
Do not use ofstream in the type taken by your Serialize function, as it is a derived class whose features you don't need. You should always use the most basic class in a hierarchy that has the features you need, unless you have a very specific reason not to. Serialize works fine with the features of the base ostream class, so use that type instead.
The on-disk layout of your structure and the in-memory layout do not match, so your placement new technique is doomed to fail. As a rule, if you have a serialize function, you need a matching deserialize function.
Here is a further explanation of your memory layout issue. The structure, in memory, on an x86_64 based Linux box looks like this:
+------------+-----------+
|Byte number | contents  |
+============+===========+
|     0      |   0x79    |
|            | (aka 'y') |
+------------+-----------+
|     1      |  padding  |
+------------+-----------+
|     2      |  padding  |
+------------+-----------+
|     3      |  padding  |
+------------+-----------+
|     4      |     9     |
+------------+-----------+
|     5      |     0     |
+------------+-----------+
|     6      |     0     |
+------------+-----------+
|     7      |     0     |
+------------+-----------+
The contents of the padding section are undefined, but generally 0. It doesn't matter though because that space is never used and merely exists so that access to the following int lies on an efficient 4-byte boundary.
The size of your structure on disk is 5 bytes, and is completely missing the padding sections. So that means when you read it into memory it won't line up properly with the in memory structure at all and accessing it is likely to cause some kind of horrible problem.
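The mismatch can be checked directly with sizeof and offsetof rather than inferred. The exact values are platform-dependent; the comments give the ones typically seen on x86_64 Linux, matching the table above.

```cpp
#include <cstddef>

struct SerializeTestStruct {
    char mCharVal;
    unsigned int mIntVal;
};

// Bytes written by Serialize(): the two members back to back, no padding.
constexpr std::size_t kOnDiskSize = sizeof(char) + sizeof(unsigned int); // typically 5

// Where mIntVal actually lives in memory (typically 4, not 1).
constexpr std::size_t kIntOffsetInMemory = offsetof(SerializeTestStruct, mIntVal);

// Whole object including padding (typically 8).
constexpr std::size_t kInMemorySize = sizeof(SerializeTestStruct);
```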
The first rule: if you need a serialize function, you need a deserialize function. The second rule: unless you really know exactly what you are doing, do not dump raw memory into a file. This will work just fine in many cases, but there are important cases in which it won't work. And unless you are aware of what does and doesn't work, and when it does or doesn't work, you will end up with code that seems to work OK in certain test situations but fails miserably when you try to use it in a real system.
My code still does dump memory into a file. And it should work as long as you read the result back on exactly the same architecture and platform with code compiled with the same version of the compiler as when you wrote it. As soon as one of those variables changes, all bets are off.
bool isError = (false == ofs.good());
if (false == isError)
{
ofs.write((char*)&mCharVal, sizeof(mCharVal));
ofs.write((char*)&mIntVal, sizeof(mIntVal));
}
Change it to:
if ( ofs.good() )
{
ofs.write((char*)&mCharVal, sizeof(mCharVal));
ofs.write((char*)&mIntVal, sizeof(mIntVal));
}
I would do:
std::ostream & operator << ( std::ostream &os, const SerializeTestStruct &mystruct )
{
    if ( os.good() )
    {
        os.write(reinterpret_cast<const char*>(&mystruct.mCharVal), sizeof(mystruct.mCharVal));
        os.write(reinterpret_cast<const char*>(&mystruct.mIntVal), sizeof(mystruct.mIntVal));
    }
    return os;
}
The problem is here:
SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();
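This will construct a value-initialized object of type SerializeTestStruct in the previously allocated memory, overwriting what was read there: value-initialization is zero-initialization for POD types, so both members become 0. Note also that memblock only holds as many bytes as the file (5), while the struct is typically larger (8 bytes on x86_64), so the object does not even fit in that buffer.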
Here's a quick fix for your code:
SerializeTestStruct* testStruct2 = new SerializeTestStruct;
fileIn.read( (char*)&testStruct2->mCharVal, sizeof(testStruct2->mCharVal) );
fileIn.read( (char*)&testStruct2->mIntVal, sizeof(testStruct2->mIntVal) );
fileIn.close();
// do some with testStruct2
// ...
delete testStruct2;
In my opinion, you need to allow serialization to a buffer rather than directly to a stream. Writing to a buffer allows nested or inherited classes to write to memory; then the whole buffer can be written to the stream. Writing bits and pieces to the stream is not efficient.
Here is something I concocted, before I stopped writing binary data to streams:
#include <cstddef>
#include <cstring>
#include <fstream>
#include <vector>

struct Serialization_Interface
{
    virtual ~Serialization_Interface() = default;
    //! Returns size occupied on a stream.
    /*! Note: size on the platform may be different.
     * This method is used to allocate memory.
     */
    virtual std::size_t size_on_stream(void) const = 0;
    //! Stores the fields of the object to the given pointer.
    /*! Pointer is incremented by the size on the stream.
     */
    virtual void store_to_buffer(unsigned char *& p_buffer) const = 0;
    //! Loads the object's fields from the buffer, advancing the pointer.
    virtual void load_from_buffer(const unsigned char *& p_buffer) = 0;
};

struct Serialize_Test_Structure
    : Serialization_Interface
{
    char mCharVal;
    int mIntVal;
    std::size_t size_on_stream(void) const override
    {
        return sizeof(mCharVal) + sizeof(mIntVal);
    }
    void store_to_buffer(unsigned char *& p_buffer) const override
    {
        *p_buffer++ = static_cast<unsigned char>(mCharVal);
        // memcpy avoids the unaligned, type-punned write of ((int&)*p_buffer)
        std::memcpy(p_buffer, &mIntVal, sizeof(mIntVal));
        p_buffer += sizeof(mIntVal);
    }
    void load_from_buffer(const unsigned char *& p_buffer) override
    {
        mCharVal = static_cast<char>(*p_buffer++);
        std::memcpy(&mIntVal, p_buffer, sizeof(mIntVal));
        p_buffer += sizeof(mIntVal);
    }
};

int main(void)
{
    Serialize_Test_Structure myStruct;
    myStruct.mCharVal = 'G';
    myStruct.mIntVal = 42;
    // Allocate a buffer (std::vector releases it automatically):
    std::vector<unsigned char> buffer(myStruct.size_on_stream());
    // Create output file in binary mode.
    std::ofstream outfile("data.bin", std::ios::binary);
    // Does your design support this concept?
    unsigned char * p_buffer = buffer.data();
    myStruct.store_to_buffer(p_buffer);
    outfile.write(reinterpret_cast<const char *>(buffer.data()),
                  static_cast<std::streamsize>(buffer.size()));
    outfile.close();
    return 0;
}
I stopped writing binary data to streams in favor of textual data, because textual data doesn't have to worry about endianness or which IEEE floating-point format is accepted by the receiving platform.
Am I the only one that finds this totally opaque:
bool isError = (false == ofs.good());
if (false == isError) {
// stuff
}
why not:
if ( ofs ) {
// stuff
}
How can I write only every third item in a char buffer to file quickly in C++?
I get a three-channel image from my camera, but each channel contains the same info (the image is grayscale). I'd like to write only one channel to disk to save space and make the writes faster, since this is part of a real-time, data collection system.
C++'s ofstream::write command seems to only write contiguous blocks of binary data, so my current code writes all three channels and runs too slowly:
char * data = getDataFromCamera();
int dataSize = imageWidth * imageHeight * imageChannels;
std::ofstream output;
output.open( fileName, std::ios::out | std::ios::binary );
output.write( data, dataSize );
I'd love to be able to replace the last line with a call like:
int skipSize = imageChannels;
output.write( data, dataSize, skipSize );
where skipSize would cause write to put only every third into the output file. However, I haven't been able to find any function that does this.
I'd love to hear any ideas for getting a single channel written to disk quickly.
Thanks.
You'll probably have to copy every third element into a buffer, then write that buffer out to disk.
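A sketch of that copy-then-write approach, assuming 8-bit interleaved channels as in the question; ExtractChannel and WriteChannel are hypothetical names.

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Copy one channel (0 <= channel < channels) out of an interleaved buffer.
std::vector<char> ExtractChannel(const char *data, std::size_t pixelCount,
                                 std::size_t channels, std::size_t channel) {
    std::vector<char> out(pixelCount);
    for (std::size_t i = 0; i < pixelCount; ++i)
        out[i] = data[i * channels + channel];
    return out;
}

// Write channel 0 of the image with a single contiguous write().
bool WriteChannel(const char *fileName, const char *data, std::size_t pixelCount,
                  std::size_t channels) {
    const std::vector<char> gray = ExtractChannel(data, pixelCount, channels, 0);
    std::ofstream output(fileName, std::ios::out | std::ios::binary);
    output.write(gray.data(), static_cast<std::streamsize>(gray.size()));
    return static_cast<bool>(output);
}
```

The copy loop touches every input byte once, so it is usually cheap compared with the disk write it shrinks to a third.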
You can use a codecvt facet in a locale to filter out part of the output.
Once it is created, you can imbue any stream with the appropriate locale, and the stream will write only every third character of its output.
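#include <algorithm>
#include <cstddef>
#include <cwchar>
#include <fstream>
#include <iostream>
#include <locale>
#include <vector>
class Filter: public std::codecvt<char,char,std::mbstate_t>
{
public:
    typedef std::codecvt<char,char,std::mbstate_t> MyType;
    typedef MyType::state_type state_type;
    typedef MyType::result result;
protected:
    // This indicates that we are converting the output,
    // thus forcing a call to do_out().
    bool do_always_noconv() const noexcept override {return false;}
    // Reads from -> from_end
    // Writes to -> to_limit
    result do_out(state_type &,
                  const char *from, const char *from_end, const char* &from_next,
                  char *to, char *to_limit, char* &to_next) const override
    {
        // Copy one byte, then skip the next two (without stepping past from_end).
        // The stride restarts on each call, so output is only exact when each
        // chunk the stream passes in has a length that is a multiple of 3.
        while ((from < from_end) && (to < to_limit))
        {
            *to++ = *from;
            from += std::min<std::ptrdiff_t>(3, from_end - from);
        }
        from_next = from;
        to_next = to;
        // Output space ran out before the input did: report a partial conversion.
        return (from < from_end) ? partial : ok;
    }
};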
Once you have the facet all you need is to know how to use it:
int main()
{
    // Construct a locale that is a copy of cout's, except for the Filter facet.
    const std::locale filterLocale(std::cout.getloc(), new Filter());
    // Create a stream and imbue it with the locale.
    std::ofstream saveFile;
    saveFile.imbue(filterLocale);
    // Now that the stream is imbued, we can open it.
    // NB: if you open the file stream first,
    // an attempt to imbue it with a locale may silently fail.
    saveFile.open("Test");
    saveFile << "123123123123123123123123123123123123123123123123123123";
    std::vector<char> data(1000);
    saveFile.write(data.data(), static_cast<std::streamsize>(data.size()) /* The filter implements the skipSize */);
    // With a tiny amount of extra work,
    // you can make Filter take a filter size
    // parameter.
    return 0;
}
Let's say your buffer is 24-bit RGB, and you're using a 32-bit processor (so that operations on 32-bit entities are the most efficient).
For the most speed, let's work with a 12-byte chunk at a time. In twelve bytes, we'll have 4 pixels, like so:
AAABBBCCCDDD
Which is 3 32-bit values:
AAAB
BBCC
CDDD
We want to turn this into ABCD (a single 32-bit value).
We can create ABCD by applying a mask to each input and ORing.
ABCD = A000 | 0BC0 | 000D
In C++, with a little-endian processor, I think it would be:
#include <cstdint>

std::uint32_t turn12ColorBytesInto4GrayBytes( const std::uint32_t buf[3] )
{
    return (buf[0]&0x000000FF) // mask seems reversed because of little-endianness
         | (buf[1]&0x00FFFF00)
         | (buf[2]&0xFF000000);
}
It's probably fastest to do this conversion into another buffer and THEN dump that buffer to disk, instead of going directly to disk.
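Applied to a whole frame, the mask-and-OR idea might look like the following sketch. It assumes a little-endian host (as the answer does), 8-bit channels, and a pixel count that is a multiple of 4; Pack4Gray and PackGrayBuffer are hypothetical names. It uses std::memcpy rather than casting the char buffer to uint32_t*, since such a cast can violate alignment and strict-aliasing rules; compilers typically turn these small memcpy calls into plain loads and stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// 12 interleaved bytes (4 pixels) held in 3 words -> 4 gray bytes in one word.
inline std::uint32_t Pack4Gray(const std::uint32_t w[3]) {
    return (w[0] & 0x000000FFu) | (w[1] & 0x00FFFF00u) | (w[2] & 0xFF000000u);
}

// Converts pixelCount 3-channel pixels into pixelCount gray bytes.
void PackGrayBuffer(const unsigned char *src, unsigned char *dst, std::size_t pixelCount) {
    assert(pixelCount % 4 == 0);
    for (std::size_t p = 0; p < pixelCount; p += 4) {
        std::uint32_t w[3];
        std::memcpy(w, src + p * 3, sizeof w);   // load 12 bytes
        const std::uint32_t gray = Pack4Gray(w);
        std::memcpy(dst + p, &gray, sizeof gray); // store 4 bytes
    }
}
```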
There is no such functionality in the standard library, AFAIK. Jerry Coffin's solution will work best. I wrote a simple snippet which should do the trick:
const char * data = getDataFromCamera();
const int channelNum = 0;
const int channelSize = imageWidth * imageHeight;
char * singleChannelData = new char[channelSize];
for(int i=0; i<channelSize; ++i)
    singleChannelData[i] = data[i*imageChannels + channelNum];
try {
    std::ofstream output;
    output.exceptions(std::ios::failbit | std::ios::badbit); // make I/O failures throw
    output.open( fileName, std::ios::out | std::ios::binary );
    output.write( singleChannelData, channelSize );
}
catch(const std::ios_base::failure&) {
    delete [] singleChannelData;
    throw;
}
delete [] singleChannelData;
EDIT: I added try...catch. Of course, you could also use a std::vector for nicer code, but it might be a tiny bit slower.
First, I'd mention that to maximize writing speed, you should write buffers whose size is a multiple of the sector size (e.g. 64 KB or 256 KB).
To answer your question, you're going to have to copy every 3rd element from your source data into another buffer, and then write that to the stream.
If I recall correctly, Intel Integrated Performance Primitives (IPP) has functions for copying buffers while skipping a certain number of elements. Using IPP will probably give faster results than your own copy routine.
I'm tempted to say that you should read your data into a struct and then overload the insertion operator.
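// Hypothetical struct: a pointer to the bytes and how many there are.
struct data { const char *first; std::streamsize size; };
std::ostream& operator<< (std::ostream& out, const data& s) {
    out.write(s.first, s.size);
    return out;
}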