Skip reading a line in an INI file if its length is greater than n in C++

I want to skip reading a line in the INI file if it has more than 1000 characters. This is the code I'm using:
#define MAX_LINE 1000
char buf[MAX_LINE];
CString strTemp;
str.Empty();
for(;;)
{
    is.getline(buf, MAX_LINE);
    strTemp = buf;
    if(strTemp.IsEmpty()) break;
    str += strTemp;
    if(str.Find("^") > -1)
    {
        str = str.Left( str.Find("^") );
        do
        {
            is.get(buf, 2);
        } while(is.gcount() > 0);
        is.getline(buf, 2);
    }
    else if(strTemp.GetLength() != MAX_LINE-1) break;
}
//is.getline(buf,MAX_LINE);
return is;
...
...
The problem I'm facing is that if the characters exceed 1000, it seems to fall into an infinite loop (unable to read the next line). How can I make getline skip that line and read the next line?

const std::size_t max_line = 1000; // not a macro, macros are disgusting
std::string line;
while (std::getline(is, line))
{
    if (line.length() > max_line)
        continue;
    // else process the line ...
}
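(As an aside that is not in the original answer: std::getline grows the std::string as needed, so unlike istream::getline into a fixed char buffer it never sets failbit merely because a line is long. That is why this version cannot get stuck the way the code in the question does.)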

How about checking the return value of getline and breaking if that fails?
...or, if is is an istream, you could check for an eof() condition to break you out.
#define MAX_LINE 1000
char buf[MAX_LINE];
CString strTemp;
str.Empty();
while(is.eof() == false)
{
    is.getline(buf, MAX_LINE);
    strTemp = buf;
    if(strTemp.IsEmpty()) break;
    str += strTemp;
    if(str.Find("^") > -1)
    {
        str = str.Left( str.Find("^") );
        do
        {
            is.get(buf, 2);
        } while((is.gcount() > 0) && (is.eof() == false));
        is.getline(buf, 2);
    }
    else if(strTemp.GetLength() != MAX_LINE-1)
    {
        break;
    }
}
return is;
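For what it's worth, here is a minimal sketch (mine, not part of the original answers) of why the loop hangs and how to skip an overlong line while keeping the fixed buffer: istream::getline sets failbit when it fills the buffer without finding '\n', and every later read is then a no-op, which is what causes the infinite loop. Clearing the flag and discarding the rest of the line lets the next getline read the following line.
#include <cstddef>
#include <istream>
#include <limits>

const std::size_t MAX_LINE = 1000;

// Sketch: read `is` line by line into a fixed buffer, silently skipping any line
// that has MAX_LINE-1 or more characters before the newline.
void read_skipping_long_lines(std::istream& is)
{
    char buf[MAX_LINE];
    while (true)
    {
        is.getline(buf, MAX_LINE);
        if (is.eof() && is.gcount() == 0)
            break; // nothing more to read
        if (is.fail() && !is.eof())
        {
            // getline filled the buffer without seeing '\n': the line is too long.
            is.clear(); // reset failbit so the stream is usable again
            is.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); // drop the rest of the line
            continue; // skip this line
        }
        // buf now holds a complete line shorter than MAX_LINE: process it ...
    }
}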

For something completely different:
std::string strTemp;
str.Empty();
while(std::getline(is, strTemp)) {
    if(strTemp.empty()) break;
    str += strTemp.c_str(); //don't need .c_str() if str is also a std::string
    int pos = str.Find("^"); //extracted this for speed
    if(pos > -1){
        str = str.Left(pos);
        //Did not translate this part since it was buggy
    } else {
        //not sure of the intent here either
        //it would stop reading if the line was less than 1000 characters.
    }
}
return is;
This uses strings for ease of use, and no maximum limits on lines. It also uses the std::getline for the dynamic/magic everything, but I did not translate the bit in the middle since it seemed very buggy to me, and I couldn't interpret the intent.
The part in the middle simply reads two characters at a time until it reaches the end of the file, and then everything after that would have done bizarre stuff since you weren't checking return values. Since it was completely wrong, I didn't interpret it.

Related

Reading from FileStream with arbitrary delimiter

I have encountered a problem reading messages from a file using C++. Usually what people do is create a file stream and then use the getline() function to fetch messages. getline() can accept an additional parameter as the delimiter so that it returns each "line" separated by that delimiter instead of the default '\n'. However, this delimiter has to be a char. In my use case, the delimiter in the message may be something else, like "|--|", so I am looking for a solution that accepts a string as the delimiter instead of a char.
I have searched StackOverFlow a little bit and found some interesting posts.
Parse (split) a string in C++ using string delimiter (standard C++)
This one gives a solution using string::find() and string::substr() to parse with an arbitrary delimiter. However, all the solutions there assume the input is a string instead of a stream. In my case, the file stream data is too big/wasteful to fit into memory at once, so it should be read in message by message (or a bulk of messages at once).
Actually, reading through the gcc implementation of std::getline(), it seems much easier to handle the case where the delimiter is a single char: every time you load in a chunk of characters, you can search for the delimiter and split there. It is different if your delimiter is more than one char, because the delimiter itself may straddle two chunks and cause many other corner cases.
Not sure whether anyone else has faced this kind of requirement before and how you handled it elegantly. It seems it would be nice to have a standard function like istream& getNext (istream&& is, string& str, string delim). This seems like a general use case to me. Why isn't something like this in the standard library, so that people no longer have to implement their own version separately?
Thank you very much
The STL simply does not natively support what you are asking for. You will have to write your own function (or find a 3rd party function) that does what you need.
For instance, you can use std::getline() to read up to the first character of your delimiter, and then use std::istream::get() to read subsequent characters and compare them to the rest of your delimiter. For example:
std::istream& my_getline(std::istream &input, std::string &str, const std::string &delim)
{
    if (delim.empty())
        throw std::invalid_argument("delim cannot be empty!");
    if (delim.size() == 1)
        return std::getline(input, str, delim[0]);
    str.clear();
    std::string temp;
    char ch;
    bool found = false;
    do
    {
        if (!std::getline(input, temp, delim[0]))
            break;
        str += temp;
        if (input.eof()) // hit the end of the stream before delim[0]: nothing left to match
            return input;
        found = true;
        for (std::size_t i = 1; i < delim.size(); ++i)
        {
            if (!input.get(ch))
            {
                if (input.eof())
                    input.clear(std::ios_base::eofbit);
                str.append(delim.c_str(), i);
                return input;
            }
            if (delim[i] != ch)
            {
                str.append(delim.c_str(), i);
                str += ch;
                found = false;
                break;
            }
        }
    }
    while (!found);
    return input;
}
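A quick usage sketch of the function above (the input text and delimiter here are made up for illustration):
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::istringstream input("first msg|--|second msg|--|third msg");
    std::string msg;
    while (my_getline(input, msg, "|--|"))
        std::cout << msg << '\n'; // prints each message on its own line
}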
If you are OK with reading byte by byte, you could build a state-transition-table implementation of a finite state machine to recognize your stop condition:
std::string delimiter = "someString";
//initialize table with a row per target string character, a column per possible char and all zeros
std::vector<std::vector<int> > table(delimiter.size(), std::vector<int>(256, 0));
int endState = delimiter.size();
//set the entry for the state looking for the next letter and finding that character to the next state
for(unsigned int i = 0; i < delimiter.size(); i++){
    table[i][(unsigned char)delimiter[i]] = i + 1;
}
Now you can use it like this:
int currentState = 0;
int read = 0;
bool done = false;
while(!done && (read = in.get()) >= 0){ // `in` is your std::istream; get() returns a negative EOF value at the end
    if(read >= 256){
        currentState = 0;
    }else{
        currentState = table[currentState][read];
    }
    if(currentState == endState){
        done = true;
    }
    //do your streamy stuff
}
Granted, this only works if the delimiter is in extended ASCII, but it will work fine for something like your example.
It seems it is easiest to create something like getline(): read to the last character of the separator, then check whether the string is long enough for the separator and, if so, whether it ends with the separator. If it does not, carry on reading:
std::string getline(std::istream& in, std::string const& separator) {
    std::istreambuf_iterator<char> it(in), end;
    if (separator.empty()) { // empty separator -> return the entire stream
        return std::string(it, end);
    }
    std::string rc;
    char last(separator.back());
    for (; it != end; ++it) {
        rc.push_back(*it);
        if (rc.back() == last
            && separator.size() <= rc.size()
            && rc.substr(rc.size() - separator.size()) == separator) {
            rc.resize(rc.size() - separator.size()); // drop the separator itself
            return rc;
        }
    }
    return rc; // no separator was found
}
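A quick usage sketch of the function above (the stream contents are made up for illustration):
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::istringstream in("first msg|--|second msg|--|third msg");
    while (in.peek() != std::char_traits<char>::eof())
        std::cout << getline(in, "|--|") << '\n'; // prints each message on its own line
}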

Parse buffered data line by line

I want to write a parser for Wavefront OBJ file format, plain text file.
Example can be seen here: people.sc.fsu.edu/~jburkardt/data/obj/diamond.obj.
Most people use old scanf to parse this format line by line; however, I would prefer to load the whole file at once to reduce the IO operation count. Is there a way to parse this kind of buffered data line by line?
void ObjModelConditioner::Import(Model& asset)
{
    uint8_t* buffer = SyncReadFile(asset.source_file_info());
    delete [] buffer;
}
Or would it be preferable to load whole file into a string and try to parse that?
After a while, it seems I found a sufficient (and simple) solution. Since my goal is to create an asset conditioning pipeline, the code has to be able to handle large amounts of data efficiently. Data can be read into a string at once, and once loaded, a stringstream can be initialized with this string.
std::string data;
SyncReadFile(asset.source_file_info(), data);
std::stringstream data_stream(data);
std::string line;
Then I simply call getline():
while(std::getline(data_stream, line))
{
    std::stringstream line_stream(line);
    std::string type_token;
    line_stream >> type_token;
    if (type_token == "v") {
        // Vertex position
        Vector3f position;
        line_stream >> position.x >> position.y >> position.z;
        // ...
    }
    else if (type_token == "vn") {
        // Vertex normal
    }
    else if (type_token == "vt") {
        // Texture coordinates
    }
    else if (type_token == "f") {
        // Face
    }
}
Here's a function that splits a char array into a vector of strings (assuming each new string starts with '\n' symbol):
#include <iostream>
#include <string>
#include <vector>
std::vector< std::string > split(const char * arr)
{
    std::string str = arr;
    std::vector< std::string > result;
    std::string::size_type beg = 0, end = 0; //beginning and end of each line in the array
    while( true )
    {
        end = str.find( '\n', beg + 1 );
        if(end == std::string::npos)
        {
            result.push_back(str.substr(beg));
            break;
        }
        result.push_back(str.substr(beg, end - beg));
        beg = end;
    }
    return result;
}
Here's the usage:
int main()
{
    const char * a = "asdasdasdasdasd \n asdasdasd \n asdasd";
    std::vector< std::string > result = split(a);
}
If you've got the raw data in a char[] (or an unsigned char[]), and
you know its length, it's pretty trivial to write an input-only, no-seek
streambuf which will allow you to create a std::istream
and to use std::getline on it. Just call:
setg( start, start, start + length );
in the constructor. (Nothing else is needed.)
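A minimal sketch of that idea (the class and variable names here are mine, not from the answer):
#include <cstddef>
#include <iostream>
#include <istream>
#include <streambuf>
#include <string>

// Input-only streambuf over an existing character buffer; no seeking, no copying.
class membuf : public std::streambuf
{
public:
    membuf(char* start, std::size_t length)
    {
        setg(start, start, start + length); // beginning, current position, end of the get area
    }
};

int main()
{
    char data[] = "line one\nline two\nline three\n";
    membuf buf(data, sizeof(data) - 1); // -1: leave out the trailing '\0'
    std::istream in(&buf);

    std::string line;
    while (std::getline(in, line))
        std::cout << line << '\n';
}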
It really depends on how you're going to parse the text. One way to do this would be simply to read the data into a vector of strings. I'll assume that you've already covered issues such as scalability / use of memory etc.
std::vector<std::string> lines;
std::string line;
std::ifstream file(filename.c_str(), std::ios_base::in);
while ( getline( file, line ) )
{
    lines.push_back( line );
}
file.close();
This would cache your file in lines. Next you need to go through lines
for ( std::vector<std::string>::const_iterator it = lines.begin();
      it != lines.end(); ++it)
{
    const std::string& line = *it;
    if ( line.empty() )
        continue;
    switch ( line[0] )
    {
    case 'g':
        // Some stuff
        break;
    case 'v':
        // Some stuff
        break;
    case 'f':
        // Some stuff
        break;
    default:
        // Default stuff including '#' (probably nothing)
        break;
    }
}
Naturally, this is very simplistic and depends largely on what you want to do with your file.
The size of the file that you've given as an example is hardly likely to cause IO stress (unless you're using some very lightweight equipment) but if you're reading many files at once I suppose it might be an issue.
I think your concern here is to minimise IO and I'm not sure that this solution will really help that much since you're going to be iterating over a collection twice. If you need to go back and keep reading the same file over and over again, then it will definitely speed things up to cache the file in memory but there are just as easy ways to do this such as memory mapping a file and using normal file accessing. If you're really concerned, then try profiling a solution like this against simply processing the file directly as you read from IO.

read multiple lines but especially... parsing them efficiently

I need to read multiple lines with specific keywords at the beginning.
I have a basic problem and I'd need a hand to help me.
Here are the kind of input:
keyword1 0.0 0.0
keyword1 1.0 5.0
keyword2 10.0
keyword3 0.5
keyword4 6.0
rules are:
lines containing keyword1 & keyword2 SHOULD be in that order AND before any other lines.
lines containing keyword3 & keyword4 can be in any order
keyword1 HAS TO be followed by 2 double
keyword2, 3 & 4 HAVE TO be followed by 1 double
at the end of a block of lines containing all four keywords followed by their doubles, the "loop" breaks and a calculation is triggered.
Here's the source I have:
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

int main (int argc, const char * argv[]) {
    vector<double> arrayInputs;
    string line;
    double keyword1_first, keyword1_second, keyword4,
           keyword3, keyword2;
    bool inside_keyword1=false, after_keyword2=false,
         keyword4_defined=false, keyword3_defined=false ;
    //cin.ignore();
    while (getline(cin, line)) {
        if (inside_keyword1 && after_keyword2 && keyword3_defined && keyword4_defined) {
            break;
        }
        else
        {
            std::istringstream split(line);
            std::vector<std::string> tokens;
            char split_char = ' ';
            for (std::string each; std::getline(split, each, split_char); tokens.push_back(each));
            if (tokens.size() > 2)
            {
                if (tokens[0] != "keyword1") return EXIT_FAILURE; // input format error
                else
                {
                    keyword1_first = atof(tokens[1].c_str());
                    keyword1_second = atof(tokens[2].c_str());
                    inside_keyword1 = true;
                }
            }
            else
            {
                if (tokens[0] == "keyword2")
                {
                    if (inside_keyword1)
                    {
                        keyword2 = atof(tokens[1].c_str());
                        after_keyword2 = true;
                    }
                    else return EXIT_FAILURE; // keyword2 must come after a keyword1 definition
                }
                else if (tokens[0] == "keyword3")
                {
                    if (inside_keyword1 && after_keyword2)
                    {
                        keyword3 = atof(tokens[1].c_str());
                        keyword3_defined = true;
                    }
                    else return EXIT_FAILURE; // cannot define keyword3 outside a keyword1
                }
                else if (tokens[0] == "keyword4")
                {
                    if (inside_keyword1 && after_keyword2)
                    {
                        keyword4 = atof(tokens[1].c_str());
                        keyword4_defined = true;
                    }
                    else return EXIT_FAILURE; // cannot define keyword4 outside a keyword1
                }
            }
        }
    }
    // Calculation
    // output
    return EXIT_SUCCESS;
}
My question is: Is there a more efficient way to go about this besides using booleans in the reading/parsing loop ?
You ask about something "more efficient", but it seems you don't have a particular performance objective. So what you want here is probably more like a Code Review. There's a site for that, in particular:
https://codereview.stackexchange.com/
But anyway...
You are correct to intuit that four booleans are not really called for here. That's 2^4 = 16 different "states", many of which you should never be able to get to. (Your specification explicitly forbids, for instance, keyword3_defined == true when inside_keyword1 == false.)
Program state can be held in enums and booleans, sure. That makes it possible for a "forgetful" loop to revisit a line of code under different circumstances, yet still remember what phase of processing it is in. It's useful in many cases, including in sophisticated parsers. But if your task is linear and simple, it's better to implicitly "know" the state based on having reached a certain line of code.
As an educational example to show the contrast I'm talking about, here's a silly state machine to read in a letter A followed by any number of letter Bs:
enum State {
    beforeReadingAnA,
    haveReadAnA,
    readingSomeBs,
    doneReadingSomeBs
};
State s = beforeReadingAnA;
char c;
while(true) {
    switch (s) {
    case beforeReadingAnA:
        cin >> c;
        if (cin.good() && c == 'A') {
            // good! accept and state transition to start reading Bs...
            s = haveReadAnA;
        } else {
            // ERROR: expected an A
            return EXIT_FAILURE;
        }
        break;
    case haveReadAnA:
        // We've read an A, so state transition into reading Bs
        s = readingSomeBs;
        break;
    case readingSomeBs:
        cin >> c;
        if (cin.good() && c == 'B') {
            // good! stay in the readingSomeBs state
        } else if (cin.eof()) {
            // reached the end of the input after 0 or more Bs
            s = doneReadingSomeBs;
        } else {
            // ERROR: expected a B or the EOF
            return EXIT_FAILURE;
        }
        break;
    case doneReadingSomeBs:
        // all done!
        return EXIT_SUCCESS;
    }
}
As mentioned, it's a style of coding that can be very, very useful. Yet for this case it's ridiculous. Compare with a simple linear piece of code that does the same thing:
// beforeReadingAnA is IMPLICIT
char c;
cin >> c;
if (cin.fail() || c != 'A')
    return EXIT_FAILURE;
// haveReadAnA is IMPLICIT
do {
    // readingSomeBs is IMPLICIT
    cin >> c;
    if (cin.eof())
        return EXIT_SUCCESS;
    if (cin.fail() || c != 'B')
        return EXIT_FAILURE;
} while (true);
// doneReadingSomeBs is IMPLICIT
All the state variables disappear. They are unnecessary because the program just "knows where it is". If you rethink your example then you can probably do the same. You won't need four booleans because you can put your cursor on a line of code and say with confidence what those four boolean values would have to be if that line of code happens to be running.
As far as efficiency goes, the <iostream> classes can make life easier than you have it here and let you write more idiomatic C++ without invoking C-isms like atof or ever having to use c_str(). Let's look at a simplified excerpt of your code that just reads the doubles associated with "keyword1".
string line;
getline(cin, line);
istringstream split(line);
vector<string> tokens;
char split_char = ' ';
string each;
while (getline(split, each, split_char)) {
    tokens.push_back(each);
}
double keyword1_first, keyword1_second;
if (tokens.size() > 2) {
    if (tokens[0] != "keyword1") {
        return EXIT_FAILURE; // input format error
    } else {
        keyword1_first = atof(tokens[1].c_str());
        keyword1_second = atof(tokens[2].c_str());
    }
}
Contrast that with this:
string keyword;
cin >> keyword;
if (keyword != "keyword1") {
    return EXIT_FAILURE;
}
double keyword1_first, keyword1_second;
cin >> keyword1_first >> keyword1_second;
Magic. Iostreams can detect the type you are trying to read or write. If it encounters a problem interpreting the input in the way you ask for, then it will leave the input in the buffer so you can try reading it another way. (In the case of asking for a string, the behavior is to read a series of characters up to whitespace...if you actually wanted an entire line, you would use getline as you had done.)
The error handling is something you'll have to deal with, however. It's possible to tell iostreams to use exception-handling methodology, so that the standard response to encountering a problem (such as a random word in a place where a double was expected) would be to crash your program. But the default is to set a failure flag that you need to test:
cin erratic behaviour
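For instance, here is a minimal sketch (mine, not from the linked question) of testing that failure flag and recovering so the stream can be used again:
#include <cstdlib>
#include <iostream>
#include <limits>

int main()
{
    double value;
    std::cout << "Enter a number: ";
    if (!(std::cin >> value)) // extraction failed: failbit is set
    {
        std::cin.clear(); // clear the error flags
        std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); // discard the bad input
        std::cerr << "That was not a number.\n";
        return EXIT_FAILURE;
    }
    std::cout << "Read " << value << "\n";
    return EXIT_SUCCESS;
}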
There's nuance to iostream, so you probably want to do some survey of Q&A...I've been learning a bit myself lately while answering/asking here:
Output error when input isn't a number. C++
When to use printf/scanf vs cout/cin?

function of searching a string from a file

This is some code I wrote to check a string's presence in a file:
bool aviasm::in1(string s)
{
    ifstream in("optab1.txt",ios::in);//opening the optab
    //cout<<"entered in1 func"<<endl;
    char c;
    string x,y;
    while((c=in.get())!=EOF)
    {
        in.putback(c);
        in>>x;
        in>>y;
        if(x==s)
            return true;
    }
    return false;
}
It is certain that the string being searched for lies in the first column of optab1.txt, and in total there are two columns in optab1.txt for every row.
Now the problem is that no matter what string is passed as the parameter s, the function always returns false. Can you tell me why this happens?
What a hack! Why not use standard C++ string and file reading functions:
bool find_in_file(const std::string & needle)
{
    std::ifstream in("optab1.txt");
    std::string line;
    while (std::getline(in, line)) // remember this idiom!!
    {
        // if (line.substr(0, needle.length()) == needle) // not so efficient
        if (line.length() >= needle.length() && std::equal(needle.begin(), needle.end(), line.begin())) // better
        // if (std::search(line.begin(), line.end(), needle.begin(), needle.end()) != line.end()) // for arbitrary position
        {
            return true;
        }
    }
    return false;
}
You can replace substr by more advanced string searching functions if the search string isn't required to be at the beginning of a line. The substr version is the most readable, but it makes a copy of the substring. The equal version compares the two strings in-place (but requires the additional size check). The search version finds the substring anywhere, not just at the beginning of the line (but at a price).
It's not too clear what you're trying to do, but the condition in the
while will never be met if plain char is unsigned. (It usually
isn't, so you might get away with it.) Also, you're not extracting the
end of line in the loop, so you'll probably see it instead of EOF, and
pass once too often in the loop. I'd write this more along the lines
of:
bool
in1( std::string const& target )
{
    std::ifstream in( "optab1.txt" );
    if ( ! in.is_open() ) {
        // Some sort of error handling, maybe an exception.
    }
    std::string line;
    while ( std::getline( in, line )
            && ( line.size() < target.size()
                 || ! std::equal( target.begin(), target.end(), line.begin() ) ) )
        ;
    return !in.fail(); // true if the loop stopped on a matching line rather than at end of file
}
Note the check that the open succeeded. One possible reason you're
always returning false is that you're not successfully opening the file.
(But we can't know unless you check the status after the open.)

File I/O logic with a while statement, how would this code be expected to behave?

I'm trying to understand some differences in file i/o techniques. Suppose I have the following code:
FILE *work_fp;
char record[500] = {0};
while(!feof(work_fp))
{
    static int first = 1;
    fgets(record, 200, work_fp);
    if (first)
    {
        var1 = 2;
        length += var1;
    }
    first = 0;
    if (feof(work_fp))
    {
        continue;
    }
    if((int)strlen(record) < length)
    {
        fclose(work_fp);
        std::ostringstream err;
        err << "ERROR -> Found a record with fewer bytes than required in file."
            << std::endl;
        throw std::runtime_error(err.str());
    }
    const int var2 = 1;
    if(memcmp(argv[1], record + var2, 3) == 0)
    {
        load_count_struct(record, var1);
    }
}
I'm not seeing how the second if condition can be true.
if (feof(work_fp))
{
continue;
}
If feof(work_fp) is true, wouldn't the while condition be false? Then the continue could never get called?
FOLLOW UP QUESTION:
Ok, I see how fgets can cause work_fp to reach eof conditions.
Suppose I want to try and implement this another way. Using getline(), for example.
std::string data(file);
std::ifstream in(data.c_str());
if (!in.is_open())
{
    std::ostringstream err;
    err << "Cannot open file: " << file << std::endl;
    throw std::runtime_error(err.str());
}
std::string buffer = "";
std::string record = "";
while (getline(in, buffer))
{
    static int first = 1;
    if (first)
    {
        var1 = 2;
        length += var1;
    }
    first = 0;
    if (//What should go here?!?)
    {
        break;
    }
    // etc...
}
Any suggestions? I'm thinking
if (buffer == std::string::npos)
no?
The line:
fgets(record, 200, work_fp);
can advance the read/write head to the end of the file, thus changing the return value of feof.
First of all, your code invokes undefined behaviour, because you've not initialized work_fp, yet you're using it, passing it to feof(), first in while(!feof(work_fp)) and elsewhere in the code.
Anyway, supposing you initialize it by opening some file, then I would answer your question as follows:
The following line reads some data from the file using work_fp; that means it is possible for feof(work_fp) to return true in the second if condition, because after reading data with fgets(), the file pointer work_fp may have reached the end of the file.
fgets(record, 200, work_fp);
In the while loop, fgets() is called and the file pointer is advanced. Then if(feof(work_fp)) checks whether the end of the file has been reached. If so, continue sends control back to the while condition, which is now false, so the loop ends. Hence the logic works.
That is a weird statement, and I think it should be
if (feof(work_fp)){
break;
}
The continue; can get called, since it occurs after an fgets, but calling continue is pointless since it brings execution back to the loop condition, which is guaranteed to be false, quitting the loop. It makes more sense, and is more readable/understandable, to put break; there.
Since you have a fgets within the while before your check on feof, the feof status of work_fp may have changed during that read, in which case, it may evaluate to true.
There is a read operation on work_fp between the while and if conditions, so that feof() could be true.
The eof can have been reached at the following line:
fgets(record, 200, work_fp);
So eof is reached right after feof was evaluated to false in the while statement.
This would make
if (feof(work_fp))
evaluate to true.
But this code can be simplified.
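One possible simplification (a sketch of mine, not from the original answer): drive the loop off the return value of fgets instead of testing feof before and after each read; the static flag also becomes an ordinary variable declared before the loop.
// Sketch only: assumes work_fp was opened successfully and that var1, length, argv and
// load_count_struct() exist as in the question.
char record[500] = {0};
int first = 1;
while (fgets(record, 200, work_fp) != NULL) // NULL at end of file or on a read error, so the loop ends by itself
{
    if (first)
    {
        var1 = 2;
        length += var1;
        first = 0;
    }
    if ((int)strlen(record) < length)
    {
        fclose(work_fp);
        throw std::runtime_error("ERROR -> Found a record with fewer bytes than required in file.");
    }
    const int var2 = 1;
    if (memcmp(argv[1], record + var2, 3) == 0)
    {
        load_count_struct(record, var1);
    }
}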