Help improve this INI parsing code - c++

This is something simple I came up with for this question. I'm not entirely happy with it, and I saw it as a chance to improve my use of the STL and stream-based programming.
std::wifstream file(L"\\Windows\\myini.ini");
if (file)
{
    bool section = false;
    std::wstring line;
    // loop on getline itself so the loop stops as soon as a read fails
    while (std::getline(file, line))
    {
        if (line.empty()) continue;
        switch (line[0])
        {
        // new header
        case L'[':
            {
                size_t pos = line.find(L']');
                if (pos != std::wstring::npos)
                {
                    // substr takes (start, count), so stop before the ']'
                    std::wstring header = line.substr(1, pos - 1);
                    section = (header == L"Section");
                }
            }
            break;
        // comments
        case L';':
        case L' ':
        case L'#':
            break;
        // var=value
        default:
            {
                if (!section) continue;
                // what if the name = value does not have white space?
                // what if the value is enclosed in quotes?
                std::wistringstream lineStm(line);   // stream over the current line
                std::wstring name, dummy, value;
                lineStm >> name >> dummy;
                ws(lineStm);
                WCHAR _value[256];   // WCHAR and ELEMENTS are platform/project macros, not standard C++
                lineStm.getline(_value, ELEMENTS(_value));
                value = _value;
            }
        }
    }
}
How would you improve this? Please do not recommend alternative libraries - I just want a simple method for parsing out some config strings from an INI file.

// what if the name = value does not have white space?
// what if the value is enclosed in quotes?
I would use boost::regex to match for every different type of element, something like:
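// line is a std::wstring, so use the wide-character types;
// backslashes must be doubled inside a C++ string literal
boost::wsmatch matches;
boost::wregex name_value(L"(\\S+)\\s*=\\s*(\\S+)");
if (boost::regex_match(line, matches, name_value))
{
    name = matches[1].str();
    value = matches[2].str();
}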
The regular expression might need some tweaking.
I would also replace the stream.getline call with std::getline, getting rid of the static char array.
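For comparison, C++11's standard <regex> header can do the same without Boost, which also fits the request to avoid extra libraries. This is a minimal sketch, not the answerer's code: the function name split_name_value and the exact pattern are assumptions. The pattern allows optional blanks around '=' and an optional pair of double quotes around the value.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: splits  name = value  or  name = "quoted value"
// into its two parts. Returns false if the line is not a name/value pair.
bool split_name_value(const std::wstring& line, std::wstring& name, std::wstring& value)
{
    // group 1: the name (no '=', space or tab)
    // group 2: a value inside double quotes
    // group 3: an unquoted value (lazy, so trailing blanks are not captured)
    static const std::wregex pattern(
        LR"re([ \t]*([^= \t]+)[ \t]*=[ \t]*(?:"([^"]*)"|(.*?))[ \t]*)re");
    std::wsmatch m;
    if (!std::regex_match(line, m, pattern))
        return false;
    name = m[1].str();
    value = m[2].matched ? m[2].str() : m[3].str();
    return true;
}
```

Because regex_match must consume the whole line, a line without '=' simply fails to match instead of producing a half-parsed pair.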

This:
for (size_t i=1; i<line.length(); i++)
{
if (line[i]!=L']')
header.push_back(line[i]);
else
break;
}
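should be simplified to a single search call: since line is a std::wstring, line.find(L']') does it directly, and for a raw wide-character buffer the standard C function is wcschr (some platforms also provide their own macros for this).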
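A small sketch of the search-based version, assuming the header line starts with '['; section_name is a hypothetical helper name. Note that std::wstring::substr takes a start position and a count, so the count has to be pos - 1 to leave the ']' out; a call like substr(1, pos) would keep it.

```cpp
#include <string>

// Hypothetical helper: returns the text between a leading '[' and the first
// ']' of an INI section line, or an empty string if it is not a header.
std::wstring section_name(const std::wstring& line)
{
    if (line.empty() || line[0] != L'[')
        return std::wstring();
    std::wstring::size_type close = line.find(L']');
    if (close == std::wstring::npos)
        return std::wstring();
    // substr(start, count): count is close - 1, so ']' is excluded
    return line.substr(1, close - 1);
}
```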

// how to get a line into a string in one go?
Use the (nonmember) std::getline function from the standard <string> header.
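Putting these suggestions together (std::getline into a std::wstring, no fixed-size buffer, and splitting on '=' instead of relying on whitespace), a standard-library-only reader could look like the sketch below. The names read_section and trim are hypothetical, and the comment and quoting rules are assumptions about the INI dialect in use.

```cpp
#include <istream>
#include <map>
#include <sstream>   // std::wistringstream, used in the usage example
#include <string>

// Hypothetical helper: strips leading and trailing spaces, tabs and '\r'.
static std::wstring trim(const std::wstring& s)
{
    const wchar_t* blanks = L" \t\r";
    std::wstring::size_type first = s.find_first_not_of(blanks);
    if (first == std::wstring::npos)
        return std::wstring();
    std::wstring::size_type last = s.find_last_not_of(blanks);
    return s.substr(first, last - first + 1);
}

// Returns every name=value pair found under [wanted].
// Lines starting with ';' or '#' are treated as comments.
std::map<std::wstring, std::wstring> read_section(std::wistream& in, const std::wstring& wanted)
{
    std::map<std::wstring, std::wstring> values;
    bool in_section = false;
    std::wstring raw;
    while (std::getline(in, raw))              // stops cleanly at end of file
    {
        std::wstring line = trim(raw);
        if (line.empty() || line[0] == L';' || line[0] == L'#')
            continue;
        if (line[0] == L'[')
        {
            std::wstring::size_type close = line.find(L']');
            in_section = close != std::wstring::npos &&
                         line.substr(1, close - 1) == wanted;
            continue;
        }
        if (!in_section)
            continue;
        std::wstring::size_type eq = line.find(L'=');
        if (eq == std::wstring::npos)
            continue;                          // not a name=value line
        std::wstring name = trim(line.substr(0, eq));
        std::wstring value = trim(line.substr(eq + 1));
        // strip one pair of surrounding double quotes, if present
        if (value.size() >= 2 && value.front() == L'"' && value.back() == L'"')
            value = value.substr(1, value.size() - 2);
        values[name] = value;
    }
    return values;
}
```

With a file, pass the std::wifstream directly: read_section(file, L"Section").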

Related

How to access previous string while reading from a file?

I need some help with a project I'm working on. In the program I'm reading in strings from a file and doing different things depending on whether or not they have punctuation marks. If a string has punctuation, I separate the punctuation and add it as a value for that string's key, then set the end-of-sentence value to "$", and lastly set the key for the beginning of the next sentence to "^", with the next string read in as its value. I have finished the code for the case where a string ends with punctuation, but I'm not entirely sure what to do if it DOESN'T have punctuation.
Essentially, if the string read in doesn't have punctuation marks, then I simply want to do: mapName[previousString].push_back(newString)
But how do I access that previous string? If I try to read in 2 strings at once, I would still have to check both for punctuation, which defeats the purpose of checking only once. Apologies if this is a dumb question, but I've been trying to work on this all day yesterday and today. Any help would be greatly appreciated!
void BookBot::readIn(const std::string & filename) {
    std::ifstream inputFile(filename); //open file
    std::string Startkey = "^"; //beginning of sentence
    std::string value;
    std::string value2;
    while (inputFile >> value) { //read a string into value; stops at end of file
        sanitize(value); //clean up string if needed
        if (value.empty()) continue; //nothing left after cleaning
        size_t end = value.size() - 1;
        if (isEndPunctuation(value[end])) {
            std::string endKey = "$";
            std::string endChar(1, value[end]);
            value = value.substr(0, end);
            markov_chain[value].push_back(endChar);
            markov_chain[endChar].push_back(endKey);
            markov_chain[endKey].push_back(Startkey);
            if (inputFile >> value2) {
                sanitize(value2);
                markov_chain[Startkey].push_back(value2);
            }
        } else {
            //if it DOESN'T HAVE PUNCTUATION
            //Essentially i just want to be able to do
            //markov_chain[previousString].push_back(newString)
        }
    }
}
But how do I access that previous string?
Well, you have to remember it. Unfortunately, your question doesn't make it very clear which strings qualify. I'll assume it is value after processing in this code fragment.
string previousString;
while(inputFile) {
    ...
    if(...) {
        ...
        value = value.substr(0,end);
        ...
        markov_chain[Startkey].push_back(value2);
        previousString = value;
    }
    else {
        markov_chain[previousString].push_back(value);
    }
    ...
}
Edit:
From your comment it sounds like the else case may also need to set the previousString
else {
markov_chain[previousString].push_back(value);
previousString = value;
}
in which case it could just be moved to the bottom of the loop.
while(inputFile) {
...
previousString = value;
}
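A self-contained sketch of the whole loop with the previous word remembered. It makes several assumptions that are not in the question: sanitize() is omitted, only '.', '!' and '?' count as end punctuation (is_end_punctuation is a hypothetical stand-in for isEndPunctuation), and the punctuated word is also linked from the word before it. Instead of reading value2 straight away, it sets the previous word to "^", so the next iteration files the following word under the start key; that word then gets its own punctuation check, which the value2 read skipped.

```cpp
#include <istream>
#include <map>
#include <sstream>   // std::istringstream, used in the usage example
#include <string>
#include <vector>

using Chain = std::map<std::string, std::vector<std::string>>;

// Assumption: only '.', '!' and '?' end a sentence.
bool is_end_punctuation(char c) { return c == '.' || c == '!' || c == '?'; }

Chain build_chain(std::istream& in)
{
    const std::string start_key = "^";   // beginning of sentence
    const std::string end_key = "$";     // end of sentence
    Chain chain;
    std::string previous = start_key;    // the very first word follows "^"
    std::string word;
    while (in >> word)                   // stops cleanly at end of input
    {
        if (word.size() > 1 && is_end_punctuation(word.back()))
        {
            std::string end_char(1, word.back());
            word.pop_back();             // "sat." -> "sat"
            chain[previous].push_back(word);
            chain[word].push_back(end_char);
            chain[end_char].push_back(end_key);
            chain[end_key].push_back(start_key);
            previous = start_key;        // next word starts a new sentence
        }
        else
        {
            chain[previous].push_back(word);
            previous = word;             // remember it for the next iteration
        }
    }
    return chain;
}
```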

Reading from FileStream with arbitrary delimiter

I have run into a problem reading messages from a file using C++. Usually, people create a file stream and then use the getline() function to fetch each message. getline() can accept an additional parameter as a delimiter, so that it returns each "line" separated by the new delimiter instead of the default '\n'. However, this delimiter has to be a single char. In my use case, the delimiter in the messages may be something longer, like "|--|", so I am looking for a solution that accepts a string as the delimiter instead of a char.
I have searched Stack Overflow a little and found some interesting posts.
Parse (split) a string in C++ using string delimiter (standard C++)
This one gives a solution that uses string::find() and string::substr() to parse with an arbitrary delimiter. However, all the solutions there assume the input is a string rather than a stream. In my case, the file stream data is too big to fit into memory at once, so it should be read message by message (or a batch of messages at a time).
Reading through the GCC implementation of std::getline(), it seems much easier to handle the case where the delimiter is a single char: every time you load a chunk of characters, you can search it for the delimiter and split there. It is different if the delimiter is more than one char, because the delimiter itself may straddle two chunks, which causes many other corner cases.
I am not sure whether anyone else has faced this kind of requirement before and how you handled it elegantly. It would be nice to have a standard function like istream& getNext(istream&& is, string& str, string delim). This seems like a general use case to me, so why isn't something like this in the standard library, so that people no longer need to implement their own versions?
Thank you very much
The STL simply does not natively support what you are asking for. You will have to write your own function (or find a 3rd party function) that does what you need.
For instance, you can use std::getline() to read up to the first character of your delimiter, and then use std::istream::get() to read subsequent characters and compare them to the rest of your delimiter. For example:
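#include <istream>
#include <stdexcept>
#include <string>

std::istream& my_getline(std::istream &input, std::string &str, const std::string &delim)
{
    if (delim.empty())
        throw std::invalid_argument("delim cannot be empty!");
    if (delim.size() == 1)
        return std::getline(input, str, delim[0]);
    str.clear();
    std::string temp;
    char ch;
    bool found = false;
    do
    {
        // read up to (and consume) the next occurrence of the delimiter's first char
        if (!std::getline(input, temp, delim[0]))
            break;
        str += temp;
        // input ended without another delim[0]: return what was read
        if (input.eof())
            return input;
        found = true;
        for (std::string::size_type i = 1; i < delim.size(); ++i)
        {
            if (!input.get(ch))
            {
                // input ended in the middle of a partial delimiter: keep it as data
                if (input.eof())
                    input.clear(std::ios_base::eofbit);
                str.append(delim, 0, i);
                return input;
            }
            if (delim[i] != ch)
            {
                // not the delimiter after all: keep the matched prefix as data and
                // put ch back, since it may start the real delimiter (e.g. "||--|")
                str.append(delim, 0, i);
                input.unget();
                found = false;
                break;
            }
        }
    }
    while (!found);
    // note: a delimiter that repeats its own prefix (such as "aab") can still be missed
    return input;
}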
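If you are OK with reading byte by byte, you could build a state transition table implementation of a finite state machine to recognize your stop condition. Each state counts how many delimiter characters have matched so far; on a mismatch the table falls back to the longest partial match that is still possible (the same construction the Knuth-Morris-Pratt algorithm uses), so input such as "||--|" does not hide a "|--|" delimiter.
std::string delimeter = "someString";
//initialize table with a row per delimiter character, a column per possible byte, and all zeros
std::vector<std::vector<int> > table(delimeter.size(), std::vector<int>(256, 0));
int endState = delimeter.size();
//from state 0, the first delimiter character moves to state 1
table[0][(unsigned char)delimeter[0]] = 1;
//fallback is the state the machine would be in after reading delimeter[1..i-1]
for (unsigned int i = 1, fallback = 0; i < delimeter.size(); i++) {
    table[i] = table[fallback];                      //on a mismatch, act like the fallback state
    table[i][(unsigned char)delimeter[i]] = i + 1;   //on a match, advance to the next state
    fallback = table[fallback][(unsigned char)delimeter[i]];
}
Now you can use it like this:
int currentState = 0;
int read = 0;
bool done = false;
//in is the std::istream you are reading from; get() returns a byte in [0, 255], or EOF at the end
while (!done && (read = in.get()) != std::char_traits<char>::eof()) {
    currentState = table[currentState][read];
    if (currentState == endState) {
        done = true;
    }
    //do your streamy stuff
}
Granted, this works one byte at a time, so it only handles delimiters made of single-byte characters (ASCII or extended ASCII), but it will work fine for something like your example.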
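Filling in the "streamy stuff", here is a sketch of a complete reader built on the same kind of table. The function name read_until, its bool return convention, and rejecting an empty delimiter are assumptions for illustration; it collects bytes into a string and trims the delimiter off once the final state is reached.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, used in the usage example
#include <string>
#include <vector>

// Hypothetical helper: reads from in until delim has been seen (or input
// ends), storing the text before the delimiter in out. Returns false only
// if nothing at all could be read.
bool read_until(std::istream& in, std::string& out, const std::string& delim)
{
    const std::size_t m = delim.size();
    out.clear();
    if (m == 0) return false;   // assumption: an empty delimiter is rejected
    // table[state][byte] = next state; state = number of delimiter chars matched
    std::vector<std::vector<std::size_t>> table(m, std::vector<std::size_t>(256, 0));
    table[0][static_cast<unsigned char>(delim[0])] = 1;
    for (std::size_t x = 0, j = 1; j < m; ++j)
    {
        table[j] = table[x];   // on a mismatch, behave like fallback state x
        table[j][static_cast<unsigned char>(delim[j])] = j + 1;
        x = table[x][static_cast<unsigned char>(delim[j])];
    }
    std::size_t state = 0;
    bool read_any = false;
    int c;
    while ((c = in.get()) != std::char_traits<char>::eof())
    {
        read_any = true;
        out.push_back(static_cast<char>(c));
        state = table[state][static_cast<unsigned char>(c)];
        if (state == m)
        {
            out.resize(out.size() - m);   // drop the delimiter itself
            return true;
        }
    }
    return read_any;                      // last piece, with no delimiter after it
}
```

A loop such as while (read_until(file, msg, "|--|")) { ... } then processes one message at a time without loading the whole file.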
It seems easiest to write something like getline() yourself: read up to the last character of the separator, then check whether the string read so far is long enough to hold the separator and, if so, whether it ends with the separator. If it does not, carry on reading:
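#include <istream>
#include <iterator>
#include <string>

// Reads up to the next occurrence of separator, stores the text before it in
// value, and consumes the separator. Sets eofbit (and failbit if nothing was
// read) when the input runs out, so it can drive a while loop like std::getline.
std::istream& getline(std::istream& in, std::string& value, std::string const& separator) {
    value.clear();
    std::istreambuf_iterator<char> it(in), end;
    if (separator.empty()) { // empty separator -> return the entire stream
        value.assign(it, end);
        in.setstate(value.empty() ? std::ios_base::eofbit | std::ios_base::failbit
                                  : std::ios_base::eofbit);
        return in;
    }
    char last(separator.back());
    for (; it != end; ++it) {
        value.push_back(*it);
        if (value.back() == last
            && separator.size() <= value.size()
            && value.compare(value.size() - separator.size(), separator.size(), separator) == 0) {
            ++it; // *it only peeks, so step past the separator's last character
            value.resize(value.size() - separator.size());
            return in;
        }
    }
    // no separator was found before the end of the input
    in.setstate(value.empty() ? std::ios_base::eofbit | std::ios_base::failbit
                              : std::ios_base::eofbit);
    return in;
}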

populating a string vector with tab delimited text

I'm very new to C++.
I'm trying to populate a vector with elements from a tab delimited file. What is the easiest way to do that?
Thanks!
There are many ways to do it; a simple Google search will turn up a solution.
Here is an example from one of my projects. It uses getline to read a comma-separated (CSV) file; I'll let you change it to read a tab-delimited file.
ifstream fin(filename.c_str());
string buffer;
while (getline(fin, buffer))
{
    size_t prev_pos = 0, curr_pos = 0;
    vector<string> tokenlist;
    string token;
    // skip empty lines
    if (buffer.empty())
        continue;
    // tokenize string buffer.
    curr_pos = buffer.find(',', prev_pos);
    while (1) {
        if (curr_pos == string::npos)
            curr_pos = buffer.length();
        // could be zero
        size_t token_length = curr_pos - prev_pos;
        // create new token and add it to tokenlist.
        token = buffer.substr(prev_pos, token_length);
        tokenlist.push_back(token);
        // reached end of the line
        if (curr_pos == buffer.length())
            break;
        prev_pos = curr_pos + 1;
        curr_pos = buffer.find(',', prev_pos);
    }
    // tokenlist now holds the fields of this line; use it before the next iteration
}
UPDATE: Improved while condition.
This is probably the easiest way to do it, but vcp's approach can be more efficient.
std::vector<std::string> tokens;
std::string token;
while (std::getline(infile, token, '\t'))
{
    tokens.push_back(token);
}
Done. You can actually get this down to about three lines of code with an input iterator and a back inserter, but why?
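Now, if the file is split into lines and the fields on each line are separated by tabs, you also have to handle the line delimiters; otherwise the newline ends up inside a token that joins the last field of one line to the first field of the next. You just do the above twice: an outer loop for lines and an inner loop to parse the tabs.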
std::vector<std::string> tokens;
std::string line;
while (std::getline(infile, line))
{
    std::stringstream instream(line);
    std::string token;
    while (std::getline(instream, token, '\t'))
    {
        tokens.push_back(token);
    }
}
And if you needed to do line, then tabs, then... I dunno... quotes? Three loops. But to be honest by three I'm probably looking at writing a state machine. I doubt your teacher wants anything like that at this stage.
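If you need to keep track of which fields came from which line, a variation on the two-loop version stores one vector per line. This is a sketch; read_tsv is a hypothetical name. One caveat of the getline-based inner loop: a trailing tab at the end of a line does not produce a final empty field.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a tab-delimited stream into rows of fields,
// one inner vector per line.
std::vector<std::vector<std::string>> read_tsv(std::istream& in)
{
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(in, line))                       // outer loop: lines
    {
        std::vector<std::string> fields;
        std::istringstream line_stream(line);
        std::string field;
        while (std::getline(line_stream, field, '\t'))  // inner loop: tabs
            fields.push_back(field);
        rows.push_back(fields);
    }
    return rows;
}
```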

Parse buffered data line by line

I want to write a parser for the Wavefront OBJ file format, which is plain text.
An example can be seen here: people.sc.fsu.edu/~jburkardt/data/obj/diamond.obj.
Most people use good old scanf to parse this format line by line; however, I would prefer to load the whole file at once to reduce the number of IO operations. Is there a way to parse this kind of buffered data line by line?
void ObjModelConditioner::Import(Model& asset)
{
uint8_t* buffer = SyncReadFile(asset.source_file_info());
delete [] buffer;
}
Or would it be preferable to load whole file into a string and try to parse that?
After a while, I seem to have found a sufficient (and simple) solution. Since my goal is to create an asset conditioning pipeline, the code has to handle large amounts of data efficiently. The data can be read into a string all at once, and once it is loaded, a stringstream can be initialized with that string.
std::string data;
SyncReadFile(asset.source_file_info(), data);
std::stringstream data_stream(data);
std::string line;
Then I simply call getline():
while(std::getline(data_stream, line))
{
    std::stringstream line_stream(line);
    std::string type_token;
    line_stream >> type_token;
    if (type_token == "v") {
        // Vertex position
        Vector3f position;
        line_stream >> position.x >> position.y >> position.z;
        // ...
    }
    else if (type_token == "vn") {
        // Vertex normal
    }
    else if (type_token == "vt") {
        // Texture coordinates
    }
    else if (type_token == "f") {
        // Face
    }
}
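As a slightly more complete sketch of that loop, the code below collects vertex positions and face indices. Vec3, ObjData and parse_obj are hypothetical stand-ins for the asker's Vector3f and Model types, and only "v" and "f" records are handled. Face entries such as 3/1/2 keep just the position index, because std::stoi stops at the first '/'; negative (relative) indices are not handled.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Vec3 { float x, y, z; };            // hypothetical stand-in for Vector3f

struct ObjData
{
    std::vector<Vec3> positions;           // from "v x y z"
    std::vector<std::vector<int>> faces;   // from "f a b c ...", 1-based indices
};

ObjData parse_obj(std::istream& data_stream)
{
    ObjData obj;
    std::string line;
    while (std::getline(data_stream, line))
    {
        std::istringstream line_stream(line);
        std::string type_token;
        if (!(line_stream >> type_token) || type_token[0] == '#')
            continue;                                  // blank line or comment
        if (type_token == "v")
        {
            Vec3 p;
            if (line_stream >> p.x >> p.y >> p.z)
                obj.positions.push_back(p);
        }
        else if (type_token == "f")
        {
            std::vector<int> face;
            std::string vertex;                        // e.g. "3", "3/1" or "3/1/2"
            while (line_stream >> vertex)
                face.push_back(std::stoi(vertex));     // stoi stops at the first '/'
            obj.faces.push_back(face);
        }
        // "vn", "vt" and other records are ignored in this sketch
    }
    return obj;
}
```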
Here's a function that splits a char array into a vector of strings, one string per '\n'-separated line (the '\n' characters themselves are dropped):
#include <string>
#include <vector>

std::vector<std::string> split(const char * arr)
{
    std::string str = arr;
    std::vector<std::string> result;
    std::string::size_type beg = 0; // beginning of the current line in the array
    while (true)
    {
        std::string::size_type end = str.find('\n', beg); // end of the current line
        if (end == std::string::npos)
        {
            result.push_back(str.substr(beg)); // last line has no trailing '\n'
            break;
        }
        result.push_back(str.substr(beg, end - beg));
        beg = end + 1; // skip past the '\n'
    }
    return result;
}
Here's the usage:
int main()
{
    const char * a = "asdasdasdasdasd \n asdasdasd \n asdasd";
    std::vector< std::string > result = split(a);
}
If you've got the raw data in a char[] (or an unsigned char[]) and you know its length, it's pretty trivial to write an input-only streambuf (no seek support) which will allow you to create an std::istream and to use std::getline on it. Just call:
setg( start, start, start + length );
in the constructor. (Nothing else is needed.)
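A minimal sketch of such a streambuf, assuming a plain const char* buffer (an unsigned char buffer such as the one returned by SyncReadFile could be passed through reinterpret_cast<const char*>). The class name membuf is hypothetical. Only the get area is set up, so the stream is read-only and seeking is not supported.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical read-only streambuf over an existing, already-loaded buffer.
class membuf : public std::streambuf
{
public:
    membuf(const char* start, std::size_t length)
    {
        // setg takes non-const pointers; the buffer is only ever read here
        char* p = const_cast<char*>(start);
        setg(p, p, p + length);
    }
};
```

Usage: construct membuf buf(data, length); then std::istream in(&buf); and call std::getline(in, line) exactly as with a file stream. The buffer must outlive the stream, since nothing is copied.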
It really depends on how you're going to parse the text. One way to do this would be simply to read the data into a vector of strings. I'll assume that you've already considered issues such as scalability and memory use.
std::vector<std::string> lines;
std::string line;
ifstream file(filename.c_str(), ios_base::in);
while ( getline( file, line ) )
{
lines.push_back( line );
}
file.close();
This would cache your file in lines. Next, you need to go through lines:
for ( std::vector<std::string>::const_iterator it = lines.begin();
      it != lines.end(); ++it)
{
    const std::string& line = *it;
    if ( line.empty() )
        continue;
    switch ( line[0] )
    {
    case 'g':
        // Some stuff
        break;
    case 'v':
        // Some stuff (this also matches "vn" and "vt", so check line[1])
        break;
    case 'f':
        // Some stuff
        break;
    default:
        // Default stuff including '#' (probably nothing)
        break;
    }
}
Naturally, this is very simplistic and depends largely on what you want to do with your file.
The size of the file that you've given as an example is hardly likely to cause IO stress (unless you're using some very lightweight equipment) but if you're reading many files at once I suppose it might be an issue.
I think your concern here is to minimise IO, and I'm not sure that this solution will really help that much, since you're going to be iterating over a collection twice. If you need to go back and keep reading the same file over and over again, then it will definitely speed things up to cache the file in memory, but there are equally easy ways to do this, such as memory-mapping the file and reading it as ordinary memory. If you're really concerned, try profiling a solution like this against simply processing the file directly as you read it.

Skip reading a line in an INI file if its length is greater than n in C++

I want to skip reading a line in the INI file if it has more than 1000 characters. This is the code I'm using:
#define MAX_LINE 1000
char buf[MAX_LINE];
CString strTemp;
str.Empty();
for(;;)
{
is.getline(buf,MAX_LINE);
strTemp=buf;
if(strTemp.IsEmpty()) break;
str+=strTemp;
if(str.Find("^")>-1)
{
str=str.Left( str.Find("^") );
do
{
is.get(buf,2);
} while(is.gcount()>0);
is.getline(buf,2);
}
else if(strTemp.GetLength()!=MAX_LINE-1) break;
}
//is.getline(buf,MAX_LINE);
return is;
...
The problem I'm facing is that if a line exceeds 1000 characters, it seems to fall into an infinite loop (unable to read the next line). How can I make getline skip that line and read the next one?
const std::size_t max_line = 1000; // not a macro, macros are disgusting
std::string line;
while (std::getline(is, line))
{
if (line.length() > max_line)
continue;
// else process the line ...
}
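The behavior described in the question is consistent with how istream::getline(buf, n) works: when a line does not fit in the buffer, it sets failbit, and every later read on the stream fails until the error is cleared, so nothing past the long line is read. If you want to keep a fixed char buffer, a sketch like the one below clears the error and discards the rest of the long line with ignore(). The name read_short_lines and the 1000-character limit are assumptions for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>   // std::istringstream, used in the usage example
#include <string>
#include <vector>

// Hypothetical helper: collects every line of at most max_len characters
// and skips longer ones, reading through a fixed-size char buffer.
std::vector<std::string> read_short_lines(std::istream& is)
{
    const std::size_t max_len = 1000;
    char buf[max_len + 1];                 // max_len chars plus the '\0'
    std::vector<std::string> lines;
    for (;;)
    {
        is.getline(buf, sizeof buf);
        if (is.bad())
            break;                         // unrecoverable stream error
        if (is.fail())
        {
            if (is.eof())
                break;                     // nothing left to read
            // failbit without eofbit: the line did not fit into buf.
            // Clear the error, then discard the rest of that line.
            is.clear();
            is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
            continue;
        }
        lines.push_back(buf);
        if (is.eof())
            break;                         // last line had no trailing '\n'
    }
    return lines;
}
```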
How about checking the return value of getline and breaking out if it fails?
...or, if the variable is is an istream, you could check for an eof() condition to break out:
#define MAX_LINE 1000
char buf[MAX_LINE];
CString strTemp;
str.Empty();
while(is.eof() == false)
{
    is.getline(buf,MAX_LINE);
    strTemp=buf;
    if(strTemp.IsEmpty()) break;
    str+=strTemp;
    if(str.Find("^")>-1)
    {
        str=str.Left( str.Find("^") );
        do
        {
            is.get(buf,2);
        } while((is.gcount()>0) && (is.eof() == false));
        is.getline(buf,2);
    }
    else if(strTemp.GetLength()!=MAX_LINE-1)
    {
        break;
    }
}
return is;
For something completely different:
std::string strTemp;
str.Empty();
while(std::getline(is, strTemp)) {
    if(strTemp.empty()) break;
    str+=strTemp.c_str(); //don't need .c_str() if str is also a std::string
    int pos = str.Find("^"); //extracted this for speed
    if(pos>-1){
        str=str.Left(pos);
        //Did not translate this part since it was buggy
    } else {
        //not sure of the intent here either:
        //it would stop reading if the line was less than 1000 characters.
    }
}
return is;
This uses std::string for ease of use, with no maximum line length. It also relies on std::getline to handle the dynamic allocation, but I did not translate the bit in the middle, since it seemed very buggy to me and I couldn't work out the intent.
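The part in the middle reads one character at a time (get(buf, 2) stores at most one character) until it reaches the end of the line. At that point get() extracts nothing and sets failbit, so the following getline() and every later read fail, and since the return values are never checked, everything after that behaves bizarrely. Since it was that broken, I didn't try to interpret it.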