Reading multiple files - C++

I want to alternate between reading multiple files. Below is a simplified version of my code.
ifstream* In_file1 = new ifstream("a.dat", ios::binary);
ifstream* In_file2 = new ifstream("b.dat", ios::binary);
ifstream* In_file = In_file1;
int ID = 0;
//LOOPING PORTION
if (In_file->eof())
{
    In_file->seekg(0, ios_base::beg);
    In_file->close();
    switch (ID)
    {
    case 0:
        In_file = In_file2; ID = 1; break;
    case 1:
        In_file = In_file1; ID = 0; break;
    }
}
//some code
:
:
In_file->read (data, sizeof(double));
//LOOPING PORTION
The code works well if I read the files only once, and I thought everything was cool. However, if the part termed 'looping portion' is inside a loop, the behaviour becomes weird and I start getting a single repeating output. Please, can someone tell me what is wrong and how I can fix it? If you have a better method of tackling the problem, please suggest it. I appreciate it.
//SOLVED
Thank you everybody for your comments, I appreciate it. Here is what I simply did:
Instead of the original
switch (ID)
{
case 0:
    In_file = In_file2; ID = 1; break;
case 1:
    In_file = In_file1; ID = 0; break;
}
I simply did
switch (ID)
{
case 0:
    In_file = new ifstream("a.dat", ios::binary); ID = 1; break;
case 1:
    In_file = new ifstream("b.dat", ios::binary); ID = 0; break;
}
Now it works like a charm and I can loop as much as I want :-). I appreciate your comments, great to know big brother still helps.
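(For reference: the repeating output happens because eofbit stays set; once a stream has hit end of file, every further read fails immediately and leaves the target buffer unchanged, and the original code also close()s the stream without reopening it. Clearing the state instead of allocating new streams avoids the leak; a minimal sketch using the same file names:)
ifstream In_file1("a.dat", ios::binary);
ifstream In_file2("b.dat", ios::binary);
ifstream* In_file = &In_file1;
int ID = 0;
//LOOPING PORTION
if (In_file->eof())
{
    In_file->clear();                 // reset eofbit/failbit first,
    In_file->seekg(0, ios_base::beg); // then rewind for the next pass
    In_file = (ID == 0) ? &In_file2 : &In_file1;
    ID = 1 - ID;
}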

Let's see: the code you posted works fine, and you want us to tell you
what's wrong with the code you didn't post. That's rather difficult.
Still, the code you posted probably doesn't work correctly either.
std::istream::eof can only be used reliably after an input (or some
other operation) has failed; in the code you've posted, it will almost
certainly be false, regardless.
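A minimal sketch of the reliable idiom, testing the read itself rather than eof():
#include <fstream>

void readAll(std::ifstream& in)
{
    char data[sizeof(double)];
    // The read is the loop condition; eof() only becomes
    // meaningful after an extraction has actually failed.
    while (in.read(data, sizeof(double)))
    {
        // ... use data ...
    }
}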
In addition: there's no need to dynamically allocate ifstream; in
fact, there are almost no cases where dynamic allocation of ifstream
is appropriate. And you don't check that the opens have succeeded.
If you want to read two files, one after the other, the simplest way is
to use two loops, one after the other (calling a common function for
processing the data). If for some reason that's not appropriate, I'd
use a custom streambuf, which takes a list of filenames in the
constructor, and advances to the next whenever it reaches end of file on
one, only returning EOF when it has reached the end of all of the
files. (The only complication in doing this is what to do if one of the
opens fails. I do this often enough that it's part of my tool kit,
and I use a callback to handle failure. For a one time use, however,
you can just hard code in whatever is appropriate.)
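A minimal sketch of the two-loop approach, with a hypothetical processData() standing in for the real work:
#include <fstream>
#include <iostream>

// Hypothetical processing function; replace with the real logic.
void processData(std::ifstream& in)
{
    double value;
    while (in.read(reinterpret_cast<char*>(&value), sizeof(value)))
    {
        // ... use value ...
    }
}

int main()
{
    std::ifstream in1("a.dat", std::ios::binary);
    std::ifstream in2("b.dat", std::ios::binary);
    if (!in1 || !in2) {
        std::cerr << "could not open input files\n";
        return 1;
    }
    processData(in1); // first file, start to finish,
    processData(in2); // then the second
}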
As a quick example of the custom streambuf approach:
#include <cstdio>
#include <fstream>
#include <streambuf>
#include <string>
#include <vector>

// We define our own streambuf, deriving from std::streambuf.
// (All istreams and ostreams delegate to a streambuf for the
// actual data transfer; we'll use an instance of this to
// initialize the istream we're going to read from.)
class MultiFileInputStreambuf : public std::streambuf
{
    // The list of files we will process.
    std::vector<std::string> m_filenames;
    // And our current position in the list (actually
    // one past the current position, since we increment
    // it when we open the file).
    std::vector<std::string>::const_iterator m_current;
    // Rather than create a new filebuf for each file, we'll
    // reuse this one, closing any previously open file, and
    // opening a new file, as needed.
    std::filebuf m_streambuf;

protected:
    // This is part of the protocol for streambuf. The base
    // class will call this function any time it needs to
    // get a character and there aren't any in the buffer.
    // This function can set up a buffer if it wants, but
    // in this case the buffering is handled by the filebuf,
    // so it's likely not worth the bother. (But this depends
    // on the cost of virtual functions---without a buffer,
    // each character read will require a virtual function call
    // to get here.)
    //
    // The protocol is to return the next character, or EOF if
    // there isn't one.
    virtual int underflow()
    {
        // Get one character from the current streambuf.
        int result = m_streambuf.sgetc();
        // As long as 1) the current streambuf is at end of file,
        // and 2) there are more files to read, open the next file
        // and try to get a character from it.
        while ( result == EOF && m_current != m_filenames.end() ) {
            m_streambuf.close();
            m_streambuf.open( m_current->c_str(), std::ios::in );
            if ( !m_streambuf.is_open() ) {
                // Error handling here...
            }
            ++ m_current;
            result = m_streambuf.sgetc();
        }
        // We've either gotten a character from the (now) current
        // streambuf, or there are no more files, and we'll return
        // the EOF from our last attempt at reading.
        return result;
    }

public:
    // Use a template and two iterators to initialize the list
    // of files from any STL sequence whose elements can be
    // implicitly converted to std::string.
    template<typename ForwardIterator>
    MultiFileInputStreambuf(ForwardIterator begin, ForwardIterator end)
        : m_filenames(begin, end)
        , m_current(m_filenames.begin())
    {
    }
};
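A minimal usage sketch (the file names are just placeholders): wrap the streambuf in an std::istream and read it as one continuous stream.
#include <iostream>

int main()
{
    std::vector<std::string> files;
    files.push_back("a.dat");
    files.push_back("b.dat");

    MultiFileInputStreambuf sb(files.begin(), files.end());
    std::istream in(&sb); // reads a.dat, then b.dat, seamlessly

    char c;
    while (in.get(c)) {
        // ... process each character ...
    }
}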

#include <iostream>
#include <fstream>
#include <string>

#define NO_OF_FILES 2

int main ()
{
    std::ifstream in;
    std::string line;
    std::string files[NO_OF_FILES] =
    {
        "file1.txt",
        "file2.txt",
    };
    // start our engine!
    for (int i = 0; i < NO_OF_FILES; i++)
    {
        in.open(files[i].c_str(), std::fstream::in);
        if (in.is_open())
        {
            std::cout << "reading... " << files[i] << std::endl;
            while (std::getline(in, line))
            {
                std::cout << line << std::endl;
            }
            in.close();
            in.clear(); // clear eofbit so the next open/read succeeds
            std::cout << "SUCCESS" << std::endl;
        }
        else
            std::cout << "Error: unable to open " + files[i] << std::endl;
    }
    return 0;
}


C++: buffer the cin istream

The problem is:
I have code that operates on a fully functional istream. It uses methods like:
istream is;
is.seekg(...) // <--- going backwards at times
is.tellg() // <--- to save the position before looking forward
etc.
These methods are only available for istreams from, say, a file. However, if I use cin in this fashion, it will not work: cin does not have the option of saving a position, reading forward, then returning to the saved position.
// So, I can't cat the file into the program
cat file | ./program
// I can only read the file from inside the program
./program -f input.txt
// Which is the problem with a very, very large zipped file
// ... that cannot coexist on the same raid-10 drive system
// ... with the resulting output
zcat really_big_file.zip | ./program //<--- Doesn't work due to cin problem
./program -f really_big_file.zip //<--- not possible without unzipping
I could read cin into a deque and process the deque. A 1 MB deque buffer would be more than enough. However, this is problematic in three senses:
I have to rewrite everything to do this with a deque
It won't be as bulletproof as just using an istream, for which the code has already been debugged
It seems like, if I implement it as a deque with some difficulty, someone is going to come along and say, why didn't you just do it like ___
What is the proper/most efficient way to create a usable istream object, in the sense that all members are active, with a cin istream?
(Bearing in mind that performance is important)
You could create a filtering stream buffer that reads from std::cin when it needs new data but buffers all received characters. You'd be able to implement seeking within the buffered range of the input; seeking beyond the end of the already buffered input would imply reading the corresponding amount of data. Here is an example of a corresponding implementation:
#include <iostream>
#include <string>
#include <vector>

class bufferbuf
    : public std::streambuf {
private:
    std::streambuf*   d_sbuf;
    std::vector<char> d_buffer;

    int_type underflow() {
        char buffer[1024];
        std::streamsize size = this->d_sbuf->sgetn(buffer, sizeof(buffer));
        if (size == 0) {
            return std::char_traits<char>::eof();
        }
        this->d_buffer.insert(this->d_buffer.end(), buffer, buffer + size);
        this->setg(this->d_buffer.data(),
                   this->d_buffer.data() + this->d_buffer.size() - size,
                   this->d_buffer.data() + this->d_buffer.size());
        return std::char_traits<char>::to_int_type(*this->gptr());
    }
    pos_type seekoff(off_type off, std::ios_base::seekdir whence, std::ios_base::openmode) {
        switch (whence) {
        case std::ios_base::beg:
            this->setg(this->eback(), this->eback() + off, this->egptr());
            break;
        case std::ios_base::cur:
            this->setg(this->eback(), this->gptr() + off, this->egptr());
            break;
        case std::ios_base::end:
            this->setg(this->eback(), this->egptr() + off, this->egptr());
            break;
        default:
            return pos_type(off_type(-1));
        }
        return pos_type(off_type(this->gptr() - this->eback()));
    }
    pos_type seekpos(pos_type pos, std::ios_base::openmode) {
        this->setg(this->eback(), this->eback() + pos, this->egptr());
        return pos_type(off_type(this->gptr() - this->eback()));
    }

public:
    bufferbuf(std::streambuf* sbuf)
        : d_sbuf(sbuf)
        , d_buffer() {
        this->setg(0, 0, 0); // actually the default setting
    }
};

int main() {
    bufferbuf sbuf(std::cin.rdbuf());
    std::istream in(&sbuf);
    std::streampos pos(in.tellg());
    std::string line;
    while (std::getline(in, line)) {
        std::cout << "pass1: '" << line << "'\n";
    }
    in.clear();
    in.seekg(pos);
    while (std::getline(in, line)) {
        std::cout << "pass2: '" << line << "'\n";
    }
}
This implementation buffers input before passing it on to the reading step. To provide a more immediate response you can read individual characters (e.g. change char buffer[1024]; to char buffer[1];, or replace the use of sgetn() with sbumpc() as appropriate): there is a trade-off between immediate response and performance for batch processing.
cin is user input and should be treated as unpredictable. If you want to use the mentioned functionality and you are sure about your input, you can read the whole input into an istringstream and then operate on that.
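A minimal sketch of that suggestion, assuming the whole input fits comfortably in memory:
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

int main()
{
    // Slurp all of stdin into one string...
    std::string data((std::istreambuf_iterator<char>(std::cin)),
                     std::istreambuf_iterator<char>());
    // ...then wrap it in an istringstream, which fully
    // supports tellg()/seekg().
    std::istringstream in(data);

    std::istream::pos_type pos = in.tellg(); // save a position
    std::string line;
    std::getline(in, line);                  // read forward
    in.seekg(pos);                           // and jump back
}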

How to discard from streams? .ignore() doesn't work for this purpose; any other methods?

I have a lack of understanding about streams. The idea is to read a file into an ifstream and then work with it: extract data from the stream into a string, and discard the part which is now in the string from the stream. Is that possible? Or how should I handle such problems?
The following method is for reading a file which has been properly opened in an ifstream. (It's a text file containing information about "Lost" episodes; an episode guide.) It works fine for one element of the class Episode. Every time I instantiate an Episode from the file, I want to check the stream, extract the information about one episode (episodes are delimited by "****"), process the extracted text in a string, and discard it from the stream. When I create the next Episode object, it should get the information between the next pair of "****" markers, and so on.
void Episode::read(ifstream& in) {
    string contents((istreambuf_iterator<char>(in)), istreambuf_iterator<char>());
    size_t episodeEndPos = contents.find("****");
    if (episodeEndPos == string::npos) {
        in.ignore(numeric_limits<streamsize>::max());
        in.clear(), in.sync();
        fullContent = contents;
    }
    else { // empty stream for next episode
        in.ignore(episodeEndPos + 4);
        fullContent = contents.substr(0, episodeEndPos);
    }
    // fill attributes
    setNrHelper();
    setTitelHelper();
    setFlashbackHelper();
    setDescriptionHelper();
}
I tried it with inFile >> words (to read the words, this is a way to get the words out of the stream) another way i was thinking about is, to use .ignore (to ignore an amount of characters in the stream). But that doesnt work as intended. Sorry for my bad english, hopefully its clear what i want to do.
If your goal is, at each call of Read(), to read the next episode and advance in the file, then the trick is to use tellg() and seekg() to bookmark the position and update it:
void Episode::Read(ifstream& in) {
    streampos pos = in.tellg(); // back up the current position
    string contents((istreambuf_iterator<char>(in)), istreambuf_iterator<char>());
    size_t episodeEndPos = contents.find("****");
    if (episodeEndPos == string::npos) {
        in.ignore(numeric_limits<streamsize>::max());
        in.clear(), in.sync();
        fullContent = contents;
    }
    else { // empty stream for next episode
        fullContent = contents.substr(0, episodeEndPos);
        in.seekg(pos + streamoff(episodeEndPos + 4)); // position file at next episode
    }
}
In this way, you can call your function several times, each call reading the next episode.
However, please note that your approach is not optimised. When you construct your contents string from a stream iterator, you load the entire rest of the file into memory, starting at the current position in the stream. So you keep reading large subparts of the file again and again.
Edit: streamlined version adapted to your format
You just need to read the line, check if it's not a separator line and concatenate...
void Episode::Read(ifstream& in) {
    string line;
    string fullContent;
    while (getline(in, line) && line != "****") {
        fullContent += line + "\n";
    }
    cout << "DATENSATZ: " << fullContent << endl; // just to verify content
    // fill attributes
    // ...
}
The code you got reads the entire stream in one go just to use some part of the read text to initialize an object. Imagine a gigantic file: that is almost certainly a bad idea. The easier approach is to just read until the end marker is found. In an ideal world the end marker is easily found; based on comments it seems to be on a line of its own, which makes it quite easy:
void Episode::read(std::istream& in) {
    std::string text;
    for (std::string line; std::getline(in, line) && line != "****"; ) {
        text += line + "\n";
    }
    fullContent = text;
}
If the separator isn't on a line of its own, you could use code like this instead:
void Episode::read(std::istream& in) {
    std::string text;
    for (std::istreambuf_iterator<char> it(in), end; it != end; ++it) {
        text.push_back(*it);
        // Stop as soon as the last four characters read form the marker.
        if (*it == '*' && 4u <= text.size()
            && text.substr(text.size() - 4u) == "****") {
            break;
        }
    }
    // Strip the marker itself from the stored text, if present.
    if (4u <= text.size() && text.substr(text.size() - 4u) == "****") {
        text.resize(text.size() - 4u);
    }
    fullContent = text;
}
Both of these approaches simply read the file from start to end, consuming the extracted characters in the process and stopping as soon as one record has been read.

Using seekg() in text mode

While trying to read a simple ANSI-encoded text file in text mode (Windows), I came across some strange behaviour with seekg() and tellg(): any time I tried to use tellg(), save its value (as a pos_type), and then seek to it later, I would always wind up further ahead in the stream than where I left off.
Eventually I did a sanity check; even if I just do this...
#include <fstream>
#include <string>

int main()
{
    std::ifstream dataFile("myfile.txt", std::ifstream::in);
    if (dataFile.is_open() && !dataFile.fail())
    {
        while (dataFile.good())
        {
            std::string line;
            dataFile.seekg(dataFile.tellg());
            std::getline(dataFile, line);
        }
    }
}
...then eventually, further into the file, lines are half cut-off. Why exactly is this happening?
This issue is caused by libstdc++ computing the current offset as the file position reported by lseek64 minus the number of characters remaining in the buffer.
The buffer is filled using the return value of read, which for a text-mode file on Windows returns the number of bytes placed in the buffer after endline conversion (i.e. the 2-byte \r\n endline is converted to \n; Windows also seems to append a spurious newline at the end of the file).
lseek64, however (which with MinGW results in a call to _lseeki64), returns the current absolute file position, so once the two values are subtracted you end up with an offset that is off by 1 for each newline remaining in the text file (+1 for the extra newline).
The following code should demonstrate the issue; you can even use a file with a single character and no newlines, due to the extra newline inserted by Windows.
#include <iostream>
#include <fstream>
int main()
{
    std::ifstream f("myfile.txt");
    for (char c; f.get(c);)
        std::cout << f.tellg() << ' ';
}
For a file with a single 'a' character I get the following output:
2 3
Clearly off by 1 for the first call to tellg(). After the second call the file position is correct, as the end has been reached after taking the extra newline into account.
Aside from opening the file in binary mode, you can circumvent the issue by disabling buffering:
#include <iostream>
#include <fstream>
int main()
{
    std::ifstream f;
    f.rdbuf()->pubsetbuf(nullptr, 0);
    f.open("myfile.txt");
    for (char c; f.get(c);)
        std::cout << f.tellg() << ' ';
}
but this is far from ideal.
Hopefully MinGW / MinGW-w64 or GCC can fix this, but first we'll need to determine who would be responsible for fixing it. I suppose the base issue is with MS's implementation of lseek, which should return appropriate values according to how the file has been opened.
Thanks for this, even though it's a very old post. I was stuck on this problem for more than a week. Here are some code examples on my site (the menu versions 1 and 2). Version 1 uses the solution presented here, in case anyone wants to see it.
:)
void customerOrder::deleteOrder(char* argv[]) {
    std::fstream newinFile, newoutFile;
    newinFile.rdbuf()->pubsetbuf(nullptr, 0);
    newinFile.open(argv[1], std::ios_base::in);
    if (!(newinFile.is_open())) {
        throw "Could not open file to read customer order. ";
    }
    newoutFile.open("outfile.txt", std::ios_base::out);
    if (!(newoutFile.is_open())) {
        throw "Could not open file to write customer order. ";
    }
    newoutFile.seekp(0, std::ios::beg);
    std::string line;
    int skiplinesCount = 2;
    if (beginOffset != 0) {
        // write the file from zero to beginOffset and from endOffset to eof
        // if the record to delete is a middle record, or from zero to
        // beginOffset if it is the last record
        newinFile.seekg(0, std::ios::beg);
        // if primaryKey < largestKey, it's a middle record
        customerOrder order;
        long tempOffset(0);
        int largestKey = order.largestKey(argv);
        if (primaryKey < largestKey) {
            // stops right before the next record
            while (tempOffset < beginOffset) {
                std::getline(newinFile, line);
                newoutFile << line << std::endl;
                tempOffset = newinFile.tellg();
            }
            newinFile.seekg(endOffset);
            // skip two lines between records
            for (int i = 0; i < skiplinesCount; ++i) {
                std::getline(newinFile, line);
            }
            while (std::getline(newinFile, line)) {
                newoutFile << line << std::endl;
            }
        } else if (primaryKey == largestKey) {
            // it's the last record: write from zero to beginOffset
            while ((tempOffset < beginOffset) && (std::getline(newinFile, line))) {
                newoutFile << line << std::endl;
                tempOffset = newinFile.tellg();
            }
        } else {
            throw "Error in delete key";
        }
    } else {
        // it's the first record: write the file from endOffset to eof
        // works with endOffset - 4 (but why??)
        newinFile.seekg(endOffset);
        // skip two lines between records
        for (int i = 0; i < skiplinesCount; ++i) {
            std::getline(newinFile, line);
        }
        while (std::getline(newinFile, line)) {
            newoutFile << line << std::endl;
        }
    }
    newoutFile.close();
    newinFile.close();
}
beginOffset is a specific point in the file (the beginning of each record), and endOffset is the end of the record, calculated in another function (findFoodOrder) with tellg(). I did not add it here as it would become very lengthy, but you can find it on my site (under the "menu version 1" link):
http://www.buildincode.com

Parse buffered data line by line

I want to write a parser for Wavefront OBJ file format, plain text file.
Example can be seen here: people.sc.fsu.edu/~jburkardt/data/obj/diamond.obj.
Most people use old scanf to parse this format line by line; however, I would prefer to load the whole file at once to reduce the IO operation count. Is there a way to parse this kind of buffered data line by line?
void ObjModelConditioner::Import(Model& asset)
{
    uint8_t* buffer = SyncReadFile(asset.source_file_info());
    delete [] buffer;
}
Or would it be preferable to load whole file into a string and try to parse that?
After a while, it seems I found a sufficient (and simple) solution. Since my goal is to create an asset conditioning pipeline, the code has to be able to handle large amounts of data efficiently. The data can be read into a string at once, and once loaded, a stringstream can be initialized with this string.
std::string data;
SyncReadFile(asset.source_file_info(), data);
std::stringstream data_stream(data);
std::string line;
Then I simply call getline():
while (std::getline(data_stream, line))
{
    std::stringstream line_stream(line);
    std::string type_token;
    line_stream >> type_token;
    if (type_token == "v") {
        // Vertex position
        Vector3f position;
        line_stream >> position.x >> position.y >> position.z;
        // ...
    }
    else if (type_token == "vn") {
        // Vertex normal
    }
    else if (type_token == "vt") {
        // Texture coordinates
    }
    else if (type_token == "f") {
        // Face
    }
}
Here's a function that splits a char array into a vector of strings, using '\n' as the line separator:
#include <string>
#include <vector>

std::vector<std::string> split(char* arr)
{
    std::string str = arr;
    std::vector<std::string> result;
    std::string::size_type beg = 0, end; // beginning and end of each line
    while ((end = str.find('\n', beg)) != std::string::npos)
    {
        result.push_back(str.substr(beg, end - beg));
        beg = end + 1; // skip past the '\n' itself
    }
    result.push_back(str.substr(beg)); // the final line has no trailing '\n'
    return result;
}
Here's the usage:
int main()
{
    char a[] = "asdasdasdasdasd \n asdasdasd \n asdasd";
    std::vector<std::string> result = split(a);
}
If you've got the raw data in a char[] (or an unsigned char[]) and
you know its length, it's pretty trivial to write an input-only, no-seek
streambuf which will allow you to create an std::istream
and to use std::getline on it. Just call:
setg( start, start, start + length );
in the constructor. (Nothing else is needed.)
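A minimal sketch of such a streambuf (assuming the buffer outlives the istream that reads from it):
#include <cstddef>
#include <iostream>
#include <streambuf>
#include <string>

class MemoryInputStreambuf : public std::streambuf
{
public:
    MemoryInputStreambuf(char* start, std::size_t length)
    {
        // Point the get area at the caller's buffer; nothing is copied.
        setg(start, start, start + length);
    }
};

int main()
{
    char data[] = "v 1.0 2.0 3.0\nvn 0.0 1.0 0.0\n";
    MemoryInputStreambuf sb(data, sizeof(data) - 1);
    std::istream in(&sb);
    std::string line;
    while (std::getline(in, line))
        std::cout << line << '\n'; // each line of the raw buffer
}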
It really depends on how you're going to parse the text. One way to do this would be simply to read the data into a vector of strings. I'll assume that you've already covered issues such as scalability and memory use.
std::vector<std::string> lines;
std::string line;
std::ifstream file(filename.c_str(), std::ios_base::in);
while (std::getline(file, line))
{
    lines.push_back(line);
}
file.close();
This caches your file in lines. Next, you need to go through lines:
for (std::vector<std::string>::const_iterator it = lines.begin();
     it != lines.end(); ++it)
{
    const std::string& line = *it;
    if (line.empty())
        continue;
    switch (line[0])
    {
    case 'g':
        // Some stuff
        break;
    case 'v':
        // Some stuff
        break;
    case 'f':
        // Some stuff
        break;
    default:
        // Default stuff including '#' (probably nothing)
        break;
    }
}
Naturally, this is very simplistic and depends largely on what you want to do with your file.
The size of the file that you've given as an example is hardly likely to cause IO stress (unless you're using some very lightweight equipment), but if you're reading many files at once I suppose it might be an issue.
I think your concern here is to minimise IO, and I'm not sure that this solution will really help that much, since you're going to be iterating over a collection twice. If you need to go back and keep reading the same file over and over again, then caching the file in memory will definitely speed things up, but there are just as easy ways to do this, such as memory mapping the file and using normal file access (see the sketch below). If you're really concerned, then try profiling a solution like this against simply processing the file directly as you read it from IO.
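For completeness, a POSIX-only sketch of the memory-mapping route mentioned above (Windows would use CreateFileMapping/MapViewOfFile instead; the file name is taken from the question's example):
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int fd = open("diamond.obj", O_RDONLY);
    if (fd == -1) { std::perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { std::perror("fstat"); close(fd); return 1; }

    // Map the whole file read-only; the kernel pages it in on demand.
    void* p = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { std::perror("mmap"); close(fd); return 1; }

    const char* data = static_cast<const char*>(p);
    // data[0] .. data[st.st_size - 1] can now be parsed in place,
    // e.g. by scanning for '\n' to walk it line by line.

    munmap(p, st.st_size);
    close(fd);
}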

How to find specific string constant in line and copy the following

I am creating a somewhat weak/vague database (my experience is very little, and please forgive the mess of my code). For this, every time my console program starts, it checks whether a database (copied to userlist.txt) has already been created. If not, a new one is created; if the database exists, however, it should all be copied to a 'vector users' (which holds a struct) within the class 'Userbase' that will then contain all user information.
My userstats struct looks like this:
enum securityLevel { user, moderator, admin };

struct userstats
{
    string ID;
    string name;
    string password;
    securityLevel secLev;
};
I read all this information from a text file with this code:
int main()
{
    Userbase userbase; // Class to contain user information during runtime.
    ifstream inFile;
    inFile.open("userlist.txt");
    if (inFile.good())
    {
        // ADD DATE OF MODIFICATION
        cout << "USERLIST FOUND, READING USERS.\n";
        userstats tempBuffer;
        int userCount = -1;
        int overCount = 0;
        while (!inFile.eof())
        {
            string buffer;
            getline(inFile, buffer);
            if (buffer == "ID:")
            {
                userCount++;
                if (userCount > overCount)
                {
                    userbase.users.push_back(tempBuffer);
                    overCount++;
                }
                tempBuffer.ID = buffer;
                cout << "ID"; // Just to see if it works
            }
            else if (buffer == "name:")
            {
                cout << "name"; // Just to see if it works
                tempBuffer.name = buffer;
            }
            else if (buffer == "password:")
            {
                cout << "password"; // Just to see if it works
                tempBuffer.password = buffer;
            }
        }
        if (userCount == 0)
        {
            userbase.users.push_back(tempBuffer);
        }
        inFile.close();
    }
...
What I try to do is read and analyze every line of the text file. An example of the userlist.txt could be:
created: Sun Apr 15 22:19:44 2012
mod_date: Sun Apr 15 22:19:44 2012
ID:1d
name:admin
password:Admin1
security level:2
(I am aware I do not read "security level" into the program yet)
EDIT: There could also be more users, each simply following the "security level:x" line of the preceding user in the list.
Now, if the program reads the line "ID:1d", it should copy this into the struct, and finally I will put it all into the vector userbase.users[i]. This does not seem to work, however; it does not seem to catch on any of the if-statements. I've gotten this sort of program to work before, so I am very confused about what I am doing wrong. I could really use some help with this. Any other kind of criticism of the code is very welcome.
Regards,
Mikkel
None of the if (buffer == ...) comparisons will ever be true, as each line contains both the attribute name and its value. For example:
ID:1d
when getline() reads this line, buffer will contain ID:1d, so:
if (buffer == "ID:")
will be false. Use string::find() instead:
if (0 == buffer.find("ID:")) // Comparing to zero ensures that the line
{                            // starts with "ID:".
    // Avoid including the attribute name in the value.
    tempBuffer.ID.assign(buffer.begin() + 3, buffer.end());
}
As commented by jrok, the while loop for reading the file is incorrect, as no check is made immediately after getline(). Change it to:
string buffer;
while (getline(inFile, buffer))
{
    ...
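Putting both fixes together, a sketch of the corrected reading loop (reusing the question's variable names) might look like:
string buffer;
while (getline(inFile, buffer))
{
    if (0 == buffer.find("ID:"))
    {
        userCount++;
        if (userCount > overCount)
        {
            userbase.users.push_back(tempBuffer);
            overCount++;
        }
        tempBuffer.ID.assign(buffer.begin() + 3, buffer.end());
    }
    else if (0 == buffer.find("name:"))
    {
        tempBuffer.name.assign(buffer.begin() + 5, buffer.end());
    }
    else if (0 == buffer.find("password:"))
    {
        tempBuffer.password.assign(buffer.begin() + 9, buffer.end());
    }
}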