Parsing Data of data from a file - c++

i have this project due however i am unsure of how to parse the data by the word, part of speech and its definition... I know that i should make use of the tab spacing to read it but i have no idea how to implement it. here is an example of the file
Recollection n. The power of recalling ideas to the mind, or the period within which things can be recollected; remembrance; memory; as, an event within my recollection.
Nip n. A pinch with the nails or teeth.
Wodegeld n. A geld, or payment, for wood.
Xiphoid a. Of or pertaining to the xiphoid process; xiphoidian.
NB: Each word and part of speech and definition is one line in a text file.

If you can be sure that the definition will always follow the first period on a line, you could use an implementation like this. But it will break if there are ever more than 2 periods on a single line.
string str = "";
vector<pair<string,string>> v; // <word,definition>
while(getline(fileStream, str, '.')) { // grab line, deliminated '.'
str[str.length() - 1] = ""; // get rid of n, v, etc. from word
v.push_back(make_pair<string,string>(str,"")); // push the word
getline(fileStream, str, '.'); // grab the next part of the line
v.back()->second = str; // push definition into last added element
}
for(auto x : v) { // check your results
cout << "word -> " << x->first << endl;
cout << "definition -> " << x->second << endl << endl;
}
The better solution would be to learn Regular Expressions. It's a complicated topic but absolutely necessary if you want to learn how to parse text efficiently and properly:
http://www.cplusplus.com/reference/regex/

Related

C++ retrieve numerical values in a line of string

Here is the content of txt file that i've managed read.
X-axis=0-9
y-axis=0-9
location.txt
temp.txt
I'm not sure whether if its possible but after reading the contents of this txt file i'm trying to store just the x and y axis range into 2 variables so that i'll be able to use it for later functions. Any suggestion? And do i need to use vectors? Here is the code for reading of the file.
string configName;
ifstream inFile;
do {
cout << "Please enter config filename: ";
cin >> configName;
inFile.open(configName);
if (inFile.fail()){
cerr << "Error finding file, please re-enter again." << endl;
}
} while (inFile.fail());
string content;
string tempStr;
while (getline(inFile, content)){
if (content[0] && content[1] == '/') continue;
cout << endl << content << endl;
depends on the style of your file, if you are always sure that the style will remain unchanged, u can read the file character by character and implement pattern recognition stuff like
if (tempstr == "y-axis=")
and then convert the appropriate substring to integer using functions like
std::stoi
and store it
I'm going to assume you already have the whole contents of the .txt file in a single string somewhere. In that case, your next task should be to split the string. Personally, yes, I would recommend using vectors. Say you wanted to split that string by newlines. A function like this:
#include <string>
#include <vector>
std::vector<std::string> split(std::string str)
{
std::vector<std::string> ret;
int cur_pos = 0;
int next_delim = str.find("\n");
while (next_delim != -1) {
ret.push_back(str.substr(cur_pos, next_delim - cur_pos));
cur_pos = next_delim + 1;
next_delim = str.find("\n", cur_pos);
}
return ret;
}
Will split an input string by newlines. From there, you can begin parsing the strings in that vector. They key functions you'll want to look at are std::string's substr() and find() methods. A quick google search should get you to the relevant documentation, but here you are, just in case:
http://www.cplusplus.com/reference/string/string/substr/
http://www.cplusplus.com/reference/string/string/find/
Now, say you have the string "X-axis=0-9" in vec[0]. Then, what you can do is do a find for = and then get the substrings before and after that index. The stuff before will be "X-axis" and the stuff after will be "0-9". This will allow you to figure that the "0-9" should be ascribed to whatever "X-axis" is. From there, I think you can figure it out, but I hope this gives you a good idea as to where to start!
std::string::find() can be used to search for a character in a string;
std::string::substr() can be used to extract part of a string into another new sub-string;
std::atoi() can be used to convert a string into an integer.
So then, these three functions will allow you to do some processing on content, specifically: (1) search content for the start/stop delimiters of the first value (= and -) and the second value (- and string::npos), (2) extract them into temporary sub-strings, and then (3) convert the sub-strings to ints. Which is what you want.

Formatting text in C++

So I'm trying to make a little text based game for my first real project in C++. When the game calls for a big block of text, several paragraphs worth, I'm having trouble getting it to look nice. I want to have uniform line lengths, and to do that I'm just having to manually enter line breaks in the appropraite places. Is there a command to do this for me? I have seen the setw() command, but that doesn't make the text wrap if it goes past the width. Any advice? Here is what I'm doing right now if that helps.
cout << " This is essentially what I do with large blocks of" << '\n';
cout << " descriptive text. It lets me see how long each of " << '\n';
cout << " the lines will be, but it's really a hassle, see? " << '\n';
Getting or writing a library function to automatically insert leading whitespace and newlines suitable for a particular width of output would be a good idea. This website considers library recommendations off topic, but I've included some code below - not particularly efficient but easy to understand. The basic logic should be to jump forwards in the string to the maximum width, then move backwards until you find whitespace (or perhaps a hyphen) at which you're prepared to break the line... then print the leading whitespace and the remaining part of the line. Continue until done.
#include <iostream>
std::string fmt(size_t margin, size_t width, std::string text)
{
std::string result;
while (!text.empty()) // while more text to move into result
{
result += std::string(margin, ' '); // add margin for this line
if (width >= text.size()) // rest of text can fit... nice and easy
return (result += text) += '\n';
size_t n = width - 1; // start by assuming we can fit n characters
while (n > width / 2 && isalnum(text[n]) && isalnum(text[n - 1]))
--n; // between characters; reduce n until word breaks or 1/2 width left
// move n characters from text to result...
(result += text.substr(0, n)) += '\n';
text.erase(0, n);
}
return result;
}
int main()
{
std::cout << fmt(5, 70,
"This is essentially what I do with large blocks of "
"descriptive text. It lets me see how long each of "
"the lines will be, but it's really a hassle, see?");
}
That hasn't been tested very thoroughly, but seems to work. See it run here. For some alternatives, see this SO question.
BTW, your original code can be simplified to...
cout << " This is essentially what I do with large blocks of\n"
" descriptive text. It lets me see how long each of\n"
" the lines will be, but it's really a hassle, see?\n";
...as C++ considers the statement unfinished until it hits the semicolon, and double-quoted string literals that appear next to each other in the code are concatenated automatically, as if the inner double quotes and whitespace were removed.
There are pretty library to handle text formatting, named cppformat/cppformat on github.
By using this, you can manipulate text formatting easily.
You can make a tierce simple function
void print(const std::string& brute, int sizeline)
{
for(int i = 0; i < brute.length(); i += sizeline)
std::cout << brute.substr(i, sizeline) << std::endl;
}
int main()
{
std::string mytext = "This is essentially what I do with large blocks of"
"descriptive text. It lets me see how long each of "
"1the lines will be, but it's really a hassle, see?";
print(mytext, 20);
return 0;
}
but words will be cut
output:
This is essentially
what I do with large
blocks ofdescriptiv
e text. It lets me s
ee how long each of
1the lines will be,
but it's really a ha
ssle, see?

Convert text to csv file in C++?

I got a text file that contain lots of line like the following:
data[0]: a=123 b=234 c=3456 d=4567 e=123.45 f=234.56
I am trying to extract the number out in order to convert it to a csv file in order to let excel import and recognize it.
My logic is, find the " " character, then chop the data out. For example, chop between first
" " and second " ". Is it viable? I have been trying on this but I did not succeed.
Actually I want to create a csv file like
a, b, c, d, e, f
123, 234, 3456 .... blablabla
234, 345, 4567 .... blablabla
But it seems it is quite difficult to do this specific task.
Are there any utilities/better method that could help me to do this?
I suggest you take a look at boost::tokenizer, this is the best approach I have found. You will find several example on the web. Have also a look at this high-score question.
Steps: for each line:
Cut string in two parts using the : character
Cut the right part into several strings using space character
separate the values using the = character, and stuff these into a std::vector<std::string>
Put these values in a file.
Last part can be something like:
std::ofstream f( "myfile.csv" );
for( const auto& s: vstrings )
f << s << ',';
f << "\n";
A easy way with no non-Standard libraries is:
std::string line;
while (getline(input_stream, line))
{
std::istringstream iss(line);
std::string word;
if (is >> word) // throw away "data[n]:"
{
std::string identifier;
std::string value;
while (getline(iss, identifier, '=') && is >> value)
std::cout << value << ",";
std::cout << '\n';
}
}
You can tweak it if training commas are causing excel any trouble, add more sanity checks (e.g. that value is numeric, that fields are consistent across all lines), but the basic parsing above is a start.

Reading text file by scanning for keywords

As part of a bigger application I am working on a class for reading input from a text file for use in the initialization of the program. Now I am myself fairly new to programming, and I only started to learn C++ in December, so I would be very grateful for some hints and ideas on how to get started! I apologise in advance for a rather long wall of text.
The text file format is "keyword-driven" in the following way:
There are a rather small number of main/section keywords (currently 8) that need to be written in a given order. Some of them are optional, but if they are included they should adhere to the given ordering.
Example:
Suppose there are 3 potential keywords ordered like as follows:
"KEY1" (required)
"KEY2" (optional)
"KEY3" (required)
If the input file only includes the required ones, the ordering should be:
"KEY1"
"KEY3"
Otherwise it should be:
"KEY1"
"KEY2"
"KEY3"
If all the required keywords are present, and the total ordering is ok, the program should proceed by reading each section in the sequence given by the ordering.
Each section will include a (possibly large) amount of subkeywords, some of which are optional and some of which are not, but here the order does NOT matter.
Lines starting with characters '*' or '--' signify commented lines, and they should be ignored (as well as empty lines).
A line containing a keyword should (preferably) include nothing else than the keyword. At the very least, the keyword must be the first word appearing there.
I have already implemented parts of the framework, but I feel my approach so far has been rather ad-hoc. Currently I have manually created one method per section/main keyword , and the first task of the program is to scan the file for to locate these keywords and pass the necessary information on to the methods.
I first scan through the file using an std::ifstream object, removing empty and/or commented lines and storing the remaining lines in an object of type std::vector<std::string>.
Do you think this is an ok approach?
Moreover, I store the indices where each of the keywords start and stop (in two integer arrays) in this vector. This is the input to the above-mentioned methods, and it would look something like this:
bool readMAINKEY(int start, int stop);
Now I have already done this, and even though I do not find it very elegant, I guess I can keep it for the time being.
However, I feel that I need a better approach for handling the reading inside of each section, and my main issue is how should I store the keywords here? Should they be stored as arrays within a local namespace in the input class or maybe as static variables in the class? Or should they be defined locally inside relevant functions? Should I use enums? The questions are many!
Now I've started by defining the sub-keywords locally inside each readMAINKEY() method, but I found this to be less than optimal. Ideally I want to reuse as much code as possible inside each of these methods, calling upon a common readSECTION() method, and my current approach seems to lead to much code duplication and potential for error in programming. I guess the smartest thing to do would simply be to remove all the (currently 8) different readMAINKEY() methods, and use the same function for handling all kinds of keywords. There is also the possibility for having sub-sub-keywords etc. as well (i.e. a more general nested approach), so I think maybe this is the way to go, but I am unsure on how it would be best to implement it?
Once I've processed a keyword at the "bottom level", the program will expect a particular format of the following lines depending on the actual keyword. In principle each keyword will be handled differently, but here there is also potential for some code reuse by defining different "types" of keywords depending on what the program expects to do after triggering the reading of it. Common task include e.g. parsing an integer or a double array, but in principle it could be anything!
If a keyword for some reason cannot be correctly processed, the program should attempt as far as possible to use default values instead of terminating the program (if reasonable), but an error message should be written to a logfile. For optional keywords, default values will of course also be used.
In order to summarise, therefore, my main questions are the following:
1. Do you think think my approach of storing the relevant lines in a std::vector<std::string> to be reasonable?
This will of course require me to do a lot of "indexing work" to keep track of where in the vector the different keywords are located. Or should I work more "directly" with the original std::ifstream object? Or something else?
2. Given such a vector storing the lines of the text file, how I can I best go about detecting the keywords and start reading the information following them?
Here I will need to take account of possible ordering and whether a keyword is required or not. Also, I need to check if the lines following each "bottom level" keyword is in the format expected in each case.
One idea I've had is to store the keywords in different containers depending on whether they are optional or not (or maybe use object(s) of type std::map<std::string,bool>), and then remove them from the container(s) if correctly processed, but I am not sure exactly how I should go about it..
I guess there is really a thousand different ways one could answer these questions, but I would be grateful if someone more experienced could share some ideas on how to proceed. Is there e.g. a "standard" way of doing such things? Of course, a lot of details will also depend on the concrete application, but I think the general format indicated here can be used in a lot of different applications without a lot of tinkering if programmed in a good way!
UPDATE
Ok, so let my try to be more concrete. My current application is supposed to be a reservoir simulator, so as part of the input I need information about the grid/mesh, about rock and fluid properties, about wells/boundary conditions throughout the simulation and so on. At the moment I've been thinking about using (almost) the same set-up as the commercial Eclipse simulator when it comes to input, for details see
http://petrofaq.org/wiki/Eclipse_Input_Data.
However, I will probably change things a bit, so nothing is set in stone. Also, I am interested in making a more general "KeywordReader" class that with slight modifications can be adapted for use in other applications as well, at least it can be done in a reasonable amount of time.
As an example, I can post the current code that does the initial scan of the text file and locates the positions of the main keywords. As I said, I don't really like my solution very much, but it seems to work for what it needs to do.
At the top of the .cpp file I have the following namespace:
//Keywords used for reading input:
namespace KEYWORDS{
/*
* Main keywords and corresponding boolean values to signify whether or not they are required as input.
*/
enum MKEY{RUNSPEC = 0, GRID = 1, EDIT = 2, PROPS = 3, REGIONS = 4, SOLUTION = 5, SUMMARY =6, SCHEDULE = 7};
std::string mainKeywords[] = {std::string("RUNSPEC"), std::string("GRID"), std::string("EDIT"), std::string("PROPS"),
std::string("REGIONS"), std::string("SOLUTION"), std::string("SUMMARY"), std::string("SCHEDULE")};
bool required[] = {true,true,false,true,false,true,false,true};
const int n_key = 8;
}//end KEYWORDS namespace
Then further down I have the following function. I am not sure how understandable it is though..
bool InputReader::scanForMainKeywords(){
logfile << "Opening file.." << std::endl;
std::ifstream infile(filename);
//Test if file was opened. If not, write error message:
if(!infile.is_open()){
logfile << "ERROR: Could not open file! Unable to proceed!" << std::endl;
std::cout << "ERROR: Could not open file! Unable to proceed!" << std::endl;
return false;
}
else{
logfile << "Scanning for main keywords..." << std::endl;
int nkey = KEYWORDS::n_key;
//Initially no keywords have been found:
startIndex = std::vector<int>(nkey, -1);
stopIndex = std::vector<int>(nkey, -1);
//Variable used to control that the keywords are written in the correct order:
int foundIndex = -1;
//STATISTICS:
int lineCount = 0;//number of non-comment lines in text file
int commentCount = 0;//number of commented lines in text file
int emptyCount = 0;//number of empty lines in text file
//Create lines vector:
lines = std::vector<std::string>();
//Remove comments and empty lines from text file and store the result in the variable file_lines:
std::string str;
while(std::getline(infile,str)){
if(str.size()>=1 && str.at(0)=='*'){
commentCount++;
}
else if(str.size()>=2 && str.at(0)=='-' && str.at(1)=='-'){
commentCount++;
}
else if(str.size()==0){
emptyCount++;
}
else{
//Found a non-empty, non-comment line.
lines.push_back(str);//store in std::vector
//Start by checking if the first word of the line is one of the main keywords. If so, store the location of the keyword:
std::string fw = IO::getFirstWord(str);
for(int i=0;i<nkey;i++){
if(fw.compare(KEYWORDS::mainKeywords[i])==0){
if(i > foundIndex){
//Found a valid keyword!
foundIndex = i;
startIndex[i] = lineCount;//store where the keyword was found!
//logfile << "Keyword " << fw << " found at line " << lineCount << " in lines array!" << std::endl;
//std::cout << "Keyword " << fw << " found at line " << lineCount << " in lines array!" << std::endl;
break;//fw cannot equal several different keywords at the same time!
}
else{
//we have found a keyword, but in the wrong order... Terminate program:
std::cout << "ERROR: Keywords have been entered in the wrong order or been repeated! Cannot continue initialisation!" << std::endl;
logfile << "ERROR: Keywords have been entered in the wrong order or been repeated! Cannot continue initialisation!" << std::endl;
return false;
}
}
}//end for loop
lineCount++;
}//end else (found non-comment, non-empty line)
}//end while (reading ifstream)
logfile << "\n";
logfile << "FILE STATISTICS:" << std::endl;
logfile << "Number of commented lines: " << commentCount << std::endl;
logfile << "Number of non-commented lines: " << lineCount << std::endl;
logfile << "Number of empty lines: " << emptyCount << std::endl;
logfile << "\n";
/*
Print lines vector to screen:
for(int i=0;i<lines.size();i++){
std:: cout << "Line nr. " << i << " : " << lines[i] << std::endl;
}*/
/*
* So far, no keywords have been entered in the wrong order, but have all the necessary ones been found?
* Otherwise return false.
*/
for(int i=0;i<nkey;i++){
if(KEYWORDS::required[i] && startIndex[i] == -1){
logfile << "ERROR: Incorrect input of required keywords! At least " << KEYWORDS::mainKeywords[i] << " is missing!" << std::endl;;
logfile << "Cannot proceed with initialisation!" << std::endl;
std::cout << "ERROR: Incorrect input of required keywords! At least " << KEYWORDS::mainKeywords[i] << " is missing!" << std::endl;
std::cout << "Cannot proceed with initialisation!" << std::endl;
return false;
}
}
//If everything is in order, we also initialise the stopIndex array correctly:
int counter = 0;
//Find first existing keyword:
while(counter < nkey && startIndex[counter] == -1){
//Keyword doesn't exist. Leave stopindex at -1!
counter++;
}
//Store stop index of each keyword:
while(counter<nkey){
int offset = 1;
//Find next existing keyword:
while(counter+offset < nkey && startIndex[counter+offset] == -1){
offset++;
}
if(counter+offset < nkey){
stopIndex[counter] = startIndex[counter+offset]-1;
}
else{
//reached the end of array!
stopIndex[counter] = lines.size()-1;
}
counter += offset;
}//end while
/*
//Print out start/stop-index arrays to screen:
for(int i=0;i<nkey;i++){
std::cout << "Start index of " << KEYWORDS::mainKeywords[i] << " is : " << startIndex[i] << std::endl;
std::cout << "Stop index of " << KEYWORDS::mainKeywords[i] << " is : " << stopIndex[i] << std::endl;
}
*/
return true;
}//end else (file opened properly)
}//end scanForMainKeywords()
You say your purpose is to read initialization data from a text file.
Seems you need to parse (syntax analyze) this file and store the data under the right keys.
If the syntax is fixed and each construction starts with a keyword, you could write a recursive descent (LL1) parser creating a tree (each node is a stl vector of sub-branches) to store your data.
If the syntax is free, you might pick JSON or XML and use an existing parsing library.

Reading two last occurrences from a file

I have a little problem,
I've got a file containing a lot of information ( posted on pastbin, because its pretty long, http://pastebin.com/MPcTMHfd )
Yes, it is a PokerStars card log, but I am making a odd calculator for myself, I even asked PokerStars about that.
When I filter the file I get something like
CODE FOR FILTERING:
getline(failas,line);
if( line.find(search_str) != std::string::npos )
{
firstCard = line.substr(4);
cout << firstCard << '\n' ;
}
- Result:
::: 7c
::: 5d
::: 13c
::: 7d
::: 12h
::: 13d
and so on, so the thing I want to do is get last cards ( 12h and 13h as in the example above )
All I managed to get is last card, (13d)
Any ideas, how I could read two lines or any other ideas how to solve my little problem?
I know that it is a beginner question, but I haven't really found a suitable answer anywhere.
So you want the n last cards ? Then maintain a "last n cards" list and update it for every card found.
Like that : (code written by hand)
#include <list>
#include <string>
std::list<std::string> last_n_cards;
const unsigned int last_n_cards_max = 2;
// basic init to make code simpler, can be done differently
last_n_cards.push_back( "?" ); // too lazy to write the for(last_n_cards_max) loop ;)
last_n_cards.push_back( "?" );
(loop)
if( line.find(search_str) != std::string::npos )
{
currentCard = line.substr(4);
cout << currentCard << '\n';
last_n_cards.push_back(currentCard); // at new card to the end
last_n_cards.pop_front(); // remove 1 card from the front
}
// At the end :
cout << "Last two cards are : " << last_n_cards[0] << " and " << last_n_cards[1] << '\n';
std::list API is here http://en.cppreference.com/w/cpp/container/list
Note : have you considered using another language than C++ for this task ? Non perf-intensive files parsing may be easier with a dynamic language like python.
There could be two simple methods to solve your problem.
Scan whole file to figure out how many lines (cards) it has and then simply set position to one before last card and read it and then the next. You will have two last cards read.
Use two pointers. Set first pointer to first line and next one to first + 2. Then iterate through whole file and once second pointer will reach end of file (EOF) your first pointer will point to one before last card.