Parsing a string in C++ using a header file and a cpp file

I am able to read my text file. Now I would like to parse the string line by line.
I am using a header file and a cpp file.
Can anyone help me with a parsing tutorial?
Where can I find a good tutorial for parsing?

You can try http://www.cppreference.com/wiki/ and look at examples of using stringstreams.

I don't see what this has to do with header files, but here's how you parse a stream line by line:
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

void process_word(const std::string& word); // hypothetical: your own handler, defined elsewhere

void read_line(std::istream& is)
{
    // read the line from is, for example: reading whitespace-delimited words:
    std::string word;
    while(is >> word)
        process_word(word);
    if( !is.eof() ) // some other error?
        throw std::runtime_error("Dude, you need better error handling!");
}

void read_file(std::istream& is)
{
    std::string line;
    while( std::getline(is, line) ) // stops at end of file (or on error)
    {
        std::istringstream iss(line);
        read_line(iss);
    }
    if( !is.eof() ) // some other error?
        throw std::runtime_error("Dude, you need better error handling!");
}

Try this:
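#include <iostream>
#include <vector>
#include <fstream>
#include <string>
using namespace std;
int main()
{
    ifstream fs("myFile.txt");
    string input;
    vector<string> sets;
    // getline returns the stream, which converts to false at end of file
    while( getline(fs, input) )
        sets.push_back(input); // each element holds one line of the file
}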

First you need to know whether the lines contain fixed-length or variable-length fields. Fixed-length fields are usually padded with some character such as spaces or zeros. Variable-length fields are usually terminated by a character such as a comma or tab.
Variable Length Fields
Use std::string::find or std::string::find_first_of to find the terminating character; also account for the end of the string, because the last field may not have a terminating character. Use this position to determine the length of the field (ending position minus starting position). Finally, use std::string::substr to extract the field's content.
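A minimal sketch of the variable-length approach, assuming a single delimiter character such as ',' or '\t' (the function name split_fields is hypothetical):

```cpp
#include <string>
#include <vector>

// Split one line into variable-length fields terminated by `delim`.
// The last field may have no terminating character, so the end of the
// string also ends a field.
std::vector<std::string> split_fields(const std::string& line, char delim)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;)
    {
        std::string::size_type end = line.find(delim, start);
        if (end == std::string::npos)
        {
            // Last field: runs to the end of the string.
            fields.push_back(line.substr(start));
            break;
        }
        // Field length = ending position - starting position.
        fields.push_back(line.substr(start, end - start));
        start = end + 1; // skip over the delimiter
    }
    return fields;
}
```

For example, split_fields("a,bb,,c", ',') yields the four fields "a", "bb", "" and "c"; empty fields are kept rather than skipped.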
Fixed Length Fields
Use the std::string::substr method to extract the text. The starting position is the accumulated length of the previous fields, if any, and the length to extract is the size of the current field.
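A sketch of the fixed-length case, assuming the field widths are known in advance and that the padding is trailing spaces (fixed_fields is a hypothetical name; zero-padded numbers are left as they are for the conversion step):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Extract fixed-length fields. `widths` lists the size of each field in
// order; each field starts at the accumulated length of the previous ones.
// Trailing space padding is trimmed from each field.
std::vector<std::string> fixed_fields(const std::string& line,
                                      const std::vector<std::size_t>& widths)
{
    std::vector<std::string> fields;
    std::size_t start = 0;
    for (std::size_t width : widths)
    {
        if (start >= line.size()) { fields.push_back(""); continue; } // line too short
        std::string field = line.substr(start, width); // clamped at end of string
        std::string::size_type last = field.find_last_not_of(' ');
        field.erase(last == std::string::npos ? 0 : last + 1); // drop padding
        fields.push_back(field);
        start += width;
    }
    return fields;
}
```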
Converting Field Text
The contents of a field may not be text and may need to be converted to an internal data type, for example a number. Use std::istringstream to convert the text of the field to that type.
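A minimal sketch of such a conversion, using a hypothetical template convert_field that also rejects text with leftover characters:

```cpp
#include <sstream>
#include <string>

// Convert a field's text to a value of type T with std::istringstream.
// Returns false if the text is not a valid T or has trailing characters
// (surrounding whitespace is allowed).
template <typename T>
bool convert_field(const std::string& text, T& value)
{
    std::istringstream iss(text);
    iss >> value;
    if (!iss) return false; // not a valid T at all
    iss >> std::ws;         // skip trailing whitespace
    return iss.eof();       // anything left over means a bad field
}
```

For example, convert_field("150.50", d) with a double d succeeds, while convert_field("12abc", i) with an int i fails instead of silently yielding 12.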

Related

In C++, how do you read a string, a float and an integer from one line?

An input file contains the following data:
Juan Dela Cruz 150.50 5
'Juan Dela Cruz' is a name that I would like to assign to string A,
'150.50' is a number I would like to assign to float B
and 5 is a number I would like to assign to int C.
If I try cin, it is delimited by the spaces in between.
If I use getline, it's getting the whole line as a string.
What would be the correct syntax for this?
If we analyze the string, we can make the following observation: at the very end there is an integer. In front of the integer there is a space, in front of that the float value, and in front of that again a space.
So, we can simply look from the back of the string for the 2nd last space. This can easily be achieved by
size_t position = lineFromeFile.rfind(' ', lineFromeFile.rfind(' ')-1);
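This needs a nested call of rfind: the inner call finds the last space, and the outer call, using the overload that takes a start position, searches backwards from just before it.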
Then we build a substring with the name. From start of the string up to the found position.
For the numbers, we put the rest of the original string into an std::istringstream and then simply extract from there.
Please see the following short program:
#include <iostream>
#include <string>
#include <cctype>
#include <sstream>
int main() {
// This is the string that we read via getline or whatever
std::string lineFromeFile("Juan Dela Cruz 150.50 5");
// Let's search for the 2nd last space
size_t position = lineFromeFile.rfind(' ', lineFromeFile.rfind(' ')-1);
// Get the name as a substring from the original string
std::string name = lineFromeFile.substr(0, position);
// Put the numbers in a istringstream for better extraction
std::istringstream iss(lineFromeFile.substr(position));
// Get the rest of the values
float fValue;
int iValue;
iss >> fValue >> iValue;
// Show the result to the user
std::cout << "\nName:\t" << name << "\nFloat:\t" << fValue << "\nInt:\t" << iValue << '\n';
return 0;
}
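Probably the simplest approach here is to read the whole line into a string and then parse it with a regex:
#include <iostream>
#include <regex>
#include <string>
int main()
{
    std::string input = "Juan Dela Cruz 150.50 5"; // e.g. read with std::getline
    // group 1: name, group 2: float (group 3 is its optional fraction), group 4: int
    const std::regex reg("\\s*(\\S.*)\\s+(\\d+(\\.\\d+)?)\\s+(\\d+)\\s*");
    std::smatch match;
    if (std::regex_match(input, match, reg)) {
        std::string A = match[1];
        float B = std::stof(match[2]);
        int C = std::stoi(match[4]);
        std::cout << A << " | " << B << " | " << C << '\n';
    } else { /* error: invalid format */ }
}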
As always when the input does not (or only sometimes does) follow a strict enough syntax, read the whole line and then apply the rules that seem "obvious" to a human.
In this case (quoting a comment by john):
Read the whole string as a single line. Then analyze the string to work out where the breaks are between A, B and C. Then convert each part to the type you require.
Specifically, you probably want to use reverse searching functions (e.g. https://en.cppreference.com/w/cpp/string/byte/strrchr ), because the last parts of the input seem the most strictly formatted, i.e. easiest to parse. The rest is then the unpredictable part at the start.
Either enter the different values on separate lines and read each line separately, or use a distinct separator character, such as a comma, between the values.
For example, use the same symbol after each value, as in Juan Dela Cruz;150.50;5; then you can search for the ; and split your string there.
If you want to keep the same input format, you could use the digits as an indicator of where to split: the name ends where the first digit begins.
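A sketch of both suggestions, assuming the separated form "Juan Dela Cruz;150.50;5" for the first and, for the second, that the name itself contains no digits (Record, parse_semicolon and parse_by_first_digit are hypothetical names):

```cpp
#include <sstream>
#include <string>

struct Record {
    std::string name;
    float amount = 0.0f;
    int count = 0;
};

// Separator approach: std::getline with ';' reads up to the next ';',
// so the spaces inside the name are kept.
bool parse_semicolon(const std::string& line, Record& r)
{
    std::istringstream iss(line);
    std::string amount, count;
    if (!std::getline(iss, r.name, ';') || !std::getline(iss, amount, ';') ||
        !std::getline(iss, count))
        return false;
    std::istringstream amount_in(amount), count_in(count);
    return (amount_in >> r.amount) && (count_in >> r.count);
}

// Same-format approach: the first digit marks the end of the name.
bool parse_by_first_digit(const std::string& line, Record& r)
{
    std::string::size_type pos = line.find_first_of("0123456789");
    if (pos == std::string::npos) return false;
    r.name = line.substr(0, pos);
    // Drop the space(s) between the name and the first number.
    std::string::size_type last = r.name.find_last_not_of(' ');
    r.name.erase(last == std::string::npos ? 0 : last + 1);
    std::istringstream iss(line.substr(pos));
    return static_cast<bool>(iss >> r.amount >> r.count);
}
```

The second variant breaks for names such as "Louis 14"; that is the price of keeping the unseparated format.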

How to efficiently read only strings from a big txt file

I have a very big .txt file (9 MB). In it, the words are stored like this:
да 2337093
е 1504540
не 1480296
се 1212312
Every line in the .txt file consists of a string followed by a single space and a number.
I want to get only the words and store them in a string array. I see that a regex would be overkill here, but I can't think of another way, as I'm not familiar with streams in C++.
Something like the following sample:
#include <fstream>
#include <string>
#include <vector>
using namespace std;
int main() {
    vector<string> strings;
    ifstream file("path_to_file");
    string line;
    while (getline(file, line))
        strings.push_back(line.substr(0, line.find(' '))); // the word is everything before the first space
    // Do whatever you want with 'strings' vector
}
You should read the file line by line, and for each line use the string's find() method to locate the delimiter (the space) and its substr() method to take the word part before the space, ignoring the rest.
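For comparison, a sketch that uses only stream extraction: read the first word with >> and then skip the rest of the line with ignore(). It assumes the word itself contains no whitespace; read_words is a hypothetical name:

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Read only the first whitespace-delimited word of each line and skip
// the rest (here: the number), without building a line string first.
std::vector<std::string> read_words(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;
    while (in >> word) // >> also skips blank lines
    {
        words.push_back(word);
        // Discard everything up to and including the end of the line.
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return words;
}
```

With the file from the question, this would be called as std::ifstream file("path_to_file"); auto words = read_words(file);.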

How do I only read a certain part of this line into a structure?

I am working with a CSV file with a comma (,) as the delimiter. A certain line in the text-file version of the CSV file looks like this:
Station Name,MONTREAL/PIERRE ELLIOTT TRUDEAU INTL,,,,,,,,,,,,,,,,,,,,,,,
I want to store only "MONTREAL/PIERRE ELLIOTT TRUDEAU INTL", minus the quotes; that is, I do not want to store Station Name. Based on my research, my code looks like this:
#include<iostream>
#include<string>
#include<sstream>
#include<fstream>
using namespace std;
struct company_data
{
string station_name, province, climate_identifier, TC_identifier, time_info;
float latitude, longitude;
int WMO_identifier;
string E, M, NA, symbol;
};
void accept_company_data (company_data initial)
{
ifstream infile;
infile.open("eng-hourly-montreal-wind_dec_2015.csv");
string line, temp1,temp2;
getline (infile, line);
istringstream iss(line);
iss>>temp1;
iss>>initial.station_name;
cout<<initial.station_name;
}
Any help would be greatly appreciated.
There are two ways to solve this:
Both of these use C strings; you can get one from a std::string with string.c_str().
Take a look at strtok(): it breaks up a string based on some delimiter (in your case the comma). On Linux/UNIX, type 'man strtok'. Note that strtok modifies the string it works on, so you cannot pass it the const char* returned by c_str(); copy the text into a writable buffer first.
Alternatively, set a pointer to the beginning of the string and loop until you hit the comma. Then increment the pointer by one (to pass over the comma) and save that position (set a pointer to it). Now continue to look for the next comma. When you have that next comma, you can copy all the characters from the start pointer to the end pointer.
For example:
const char *input = "your input,with commas, in it";
const char *start_pointer, *end_pointer, *ptr;
ptr = input;
while (*ptr && *ptr!=',') ptr++; // scan along looking for the first comma
if (*ptr) ptr++;                 // the above while stopped on the comma (or at the end)
start_pointer=ptr;
while (*ptr!=',' && *ptr) ptr++;
end_pointer=ptr;
//now you can copy to your destination
char destination_buffer[128];
char *des=destination_buffer;
for(ptr=start_pointer; ptr<end_pointer && des<destination_buffer+127;) *des++=*ptr++;
*des='\0'; // terminate the copied C string
The above is not really efficient, since you scan the field twice.
Instead, after you have found the first comma, you can copy while scanning:
while(*ptr!=',' && *ptr) *des++=*ptr++;
*des='\0';
The "&& *ptr" part checks for the null character ('\0') that marks the end of a C string.
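In C++ the same job can be done without manual pointer handling. A minimal sketch using std::getline with ',' as the delimiter, assuming the fields themselves never contain commas (quoted CSV fields are not handled); second_field is a hypothetical name:

```cpp
#include <sstream>
#include <string>

// Return the second comma-separated field of a line, e.g. the station
// name from "Station Name,MONTREAL/PIERRE ELLIOTT TRUDEAU INTL,,,".
// getline with ',' reads up to the next comma, so the spaces and '/'
// inside the name are kept. Returns "" if there is no second field.
std::string second_field(const std::string& line)
{
    std::istringstream iss(line);
    std::string label, value;
    std::getline(iss, label, ','); // "Station Name" (discarded)
    std::getline(iss, value, ','); // the station name
    return value;
}
```

In the question's accept_company_data, this would be initial.station_name = second_field(line);. Note that initial is passed by value there, so the assignment is lost when the function returns; taking a company_data& parameter would keep it.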

getline to split string manipulation error

Hey guys, so I have an assignment for class where I have to split a string and manipulate it. However, when I try to split the string and assign the parts to an array, only the first element is filled and the other two are not. Please help.
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
using namespace std;
int main()
{
string str;
cout<<"Enter a first name, middle initial, last name: ";
cin>> str;
string word;
string array[3];
stringstream stream(str);
int counter = 0;
while( getline(stream, word, ' ') )
{
cout<<word;
array[counter] = word;
counter++;
}
cout<<"The "<<array[0].length()<<" characters of the first name are: "<<array[0]<<endl;
cout<<"The "<<array[2].length()<<" characters of the last name are: "<<array[2]<<endl;
string newstring = array[2]+", "+array[0]+" "+array[1];
cout<<"In the phone book, the name would be: "<<newstring<<endl;
cout<<"The length of the name is: "<<newstring.length()<<endl;
cout<<"The comma is at position: "<<newstring.find(",")<<endl;
array[0].swap(array[2]);
cout<<"After the swap, the last name is "<<array[2]<<" and the first name is "<<array[0];
system("pause");
return 0;
}
There are a few blatant errors in your code:
You always need to check your input after trying to read! You do that in the while-loop, but you also need to verify that you actually read the string successfully first.
It seems you are mixing up what the input operator for std::string and std::getline() do: the input operator reads the first word after skipping leading spaces, while std::getline() reads, well, a line (where the line terminator can be specified as the third argument).
When reading into a fixed-size array, you always need to make sure you do not read more than fits into the array! You may have heard about hackers exploiting software through buffer overruns: had you actually read a line first and then split it into words, you would have created one of those exploitable programs! If you don't want to check before each word whether there is enough space in the array, use, e.g., a std::vector<std::string> instead (doing so also has a problem with hackers, namely that it opens the program up to a Denial of Service attack, but although this is still a problem it is a somewhat lesser one).
There are also a few smaller issues with your program:
If you are only reading from a string stream, you should use std::istringstream, as there is no need to also set up the writing part of the std::stringstream.
The program asks for "first name, middle initial, last name". I would read that specification as expecting, e.g., "John, F., Kennedy", but it seems you expect "John F. Kennedy". One reason I would expect commas to be used is that I don't have a middle name, i.e., I would enter "Dietmar, , Kühl".
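A sketch of the input part with these points applied, assuming the name is entered on one line such as "John F. Kennedy": the whole line is read first (e.g. with std::getline(std::cin, line)), split into a std::vector instead of a fixed array, and the word count is checked before use. The function names are hypothetical:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a whole line such as "John F. Kennedy" into words. A std::vector
// grows as needed, so extra words can never overrun fixed storage.
std::vector<std::string> split_words(const std::string& line)
{
    std::istringstream stream(line); // read-only, so istringstream
    std::vector<std::string> words;
    std::string word;
    while (stream >> word)           // >> also skips repeated spaces
        words.push_back(word);
    return words;
}

// Build the phone-book form "Last, First Middle"; returns false unless
// exactly three words were entered.
bool phone_book_name(const std::string& line, std::string& result)
{
    std::vector<std::string> w = split_words(line);
    if (w.size() != 3) return false;
    result = w[2] + ", " + w[0] + " " + w[1];
    return true;
}
```

In main this could be used as: std::string line, result; if (std::getline(std::cin, line) && phone_book_name(line, result)) std::cout << result << '\n';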

String in a text file containing a string in C++

Here's a part of my code:
string word;
cin >> word;
string keyword;
while (file >> keyword && keyword != word){}
This searches for a word in a file and if it finds that word (keyword) then it starts a string from there later. It's working perfectly at the moment. My problem is that when the line is
"Julia","2 Om.","KA","1 Om. 4:2"
If I enter the word Julia, I cannot find it and use it for my purposes (just FYI, I'm counting it). It works if I search for "Julia","2, since that is where the space comes in.
I'd like to know how I can change the line
while (file >> keyword && keyword != word){}
so I can see when the text/string CONTAINS that string since at the moment it only finds and accepts it if I enter the WHOLE string perfectly.
EDIT: Also, all I have found so far is strstr, strtok and strcmp, but these fit better with printf than with cout.
You can use methods from std::string like find.
#include <string>
#include <iostream>
// ...
std::string keyword;
std::string word;
getline(file, keyword);
do
{
std::cin >> word;
}
while (keyword.find(word) == std::string::npos);
The problem is that you're extracting strings, which by default extracts up to the next space. So on the first iteration, keyword is "Julia","2. If you want to extract everything separated by commas, I suggest using std::getline with ',' as the delimiter:
while (std::getline(file, keyword, ','))
This will look through all of the quoted strings. Now you can use std::string::find to determine if the input word is found within that quoted string:
while (std::getline(file, keyword, ',') &&
keyword.find(word) == std::string::npos)
Now this will loop through each quoted string until it gets to the one that contains word.
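Putting this together with counting, a sketch that reads the file line by line and splits each line at the commas, so a field never runs across a line break. It assumes fields contain no commas; count_fields_containing is a hypothetical name:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Count the comma-separated fields that contain `word`, for lines such as
// "Julia","2 Om.","KA","1 Om. 4:2".
std::size_t count_fields_containing(std::istream& file, const std::string& word)
{
    std::size_t count = 0;
    std::string line;
    while (std::getline(file, line))
    {
        std::istringstream fields(line);
        std::string keyword;
        while (std::getline(fields, keyword, ','))
            if (keyword.find(word) != std::string::npos)
                ++count;
    }
    return count;
}
```

The fields still carry their quotes ("\"Julia\""), which does not matter for find() but would for an exact comparison.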
Use this method of istream to get a whole line instead of just a single "word":
http://www.cplusplus.com/reference/istream/istream/getline/
Then use strstr to find the location of a string (like Julia) in a string (the line of the file):
http://www.cplusplus.com/reference/cstring/strstr/
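A sketch of that combination, assuming every line fits into a 256-byte buffer (istream::getline sets failbit on a longer line, which ends the loop); find_line_with is a hypothetical name:

```cpp
#include <cstring>
#include <istream>
#include <sstream>

// Read each line into a char buffer with istream::getline and search it
// with std::strstr. Returns the 1-based number of the first line that
// contains `needle`, or 0 if no line does.
int find_line_with(std::istream& in, const char* needle)
{
    char line[256];
    int number = 0;
    while (in.getline(line, sizeof line))
    {
        ++number;
        if (std::strstr(line, needle) != nullptr)
            return number;
    }
    return 0;
}
```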