What types of indicators are there for the end of a string when tokenizing a sentence? - c++

I am trying to take a string holding a sentence and break it up by words to add to a linked list class called wordList.
When dealing with strings in C++, what is the indicator that you have reached the end of a string? I searched here and found that C strings are null-terminated, with the end marked by '\0', but these solutions give me errors.
I know there are other ways to do this (like scanning through individual characters), but I am fuzzy on how to implement them.
void lineScan( string line) // Adds words to wordList from line of a file
{
istringstream iss(line);
string lineWord;
getline(iss, lineWord, ' ');
wrds.addWords( lineWord );
while( lineWord!= NULL )
{
getline(iss, lineWord, ' ');
wrds.addWords( lineWord );
}
}

You probably want to skip all whitespace rather than use a single space as the separator (your code will read empty tokens whenever two spaces are adjacent).
But you're not really dealing with strings here, and in particular not with C strings.
Since you're using istringstream, you're looking for the end of a stream, and it works like all input streams: the extraction itself tells you whether it succeeded.
void lineScan(string line) // Adds words to wordList from line of a file
{
istringstream iss(line);
string word;
while (iss >> word)
{
wrds.addWords(word);
}
}
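To make the difference concrete, here is a minimal sketch comparing the two approaches. The function names splitOnWhitespace and splitOnSingleSpace are made up for illustration, and a std::vector stands in for the asker's wordList class, which isn't shown.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits on runs of whitespace. operator>> skips leading whitespace,
// so adjacent spaces never produce empty tokens.
std::vector<std::string> splitOnWhitespace(const std::string& line)
{
    std::istringstream iss(line);
    std::vector<std::string> words;
    std::string word;
    while (iss >> word)          // false once the stream is exhausted
        words.push_back(word);
    return words;
}

// Splits on every single space. Two adjacent spaces yield an empty token.
std::vector<std::string> splitOnSingleSpace(const std::string& line)
{
    std::istringstream iss(line);
    std::vector<std::string> words;
    std::string word;
    while (std::getline(iss, word, ' '))
        words.push_back(word);
    return words;
}
```

For the input "a  b c" (two spaces after a), the first function returns three words and the second returns four, one of them empty. In both cases the loop condition is the stream itself, so there is no end-of-string marker to compare against.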

Related

How to skip a line of a file if it starts with # in c++

So say I have a txt file that goes like:
#unwanted line
something=another thing
something2=another thing 2
#unwanted_line_2=unwanted
something3=another thing 3
and I am reading it with
getline(inFile,astring,'=');
to separate a something from its value (inside a while loop). How do I skip the entire lines that start with # ?
Also I'm storing this in a vector, if it is of any matter.
Use getline() without a delimiter to read an entire line up to \n. Then check if the line begins with #, and if so then discard it and move on. Otherwise, put the string into an istringstream and use getline() with '=' as the delimiter to split the line (or, just use astring.find() and astring.substr() instead).
For example:
while (getline(inFile, astring))
{
if (!astring.empty() && astring[0] != '#')
{
istringstream iss(astring);
getline(iss, aname, '=');
getline(iss, avalue);
...
}
}
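The find()/substr() alternative mentioned above could look roughly like this. It is only a sketch: parseEntry is a hypothetical helper name, and treating a line without '=' as "skip it" is my assumption, not something the question specifies.

```cpp
#include <string>

// Hypothetical helper: returns true and fills name/value when the line is a
// "name=value" entry. Returns false for empty lines, lines starting with '#',
// and lines that contain no '='.
bool parseEntry(const std::string& line, std::string& name, std::string& value)
{
    if (line.empty() || line[0] == '#')
        return false;                       // blank line or comment
    const std::string::size_type eq = line.find('=');
    if (eq == std::string::npos)
        return false;                       // no separator on this line
    name = line.substr(0, eq);              // text before the first '='
    value = line.substr(eq + 1);            // everything after it
    return true;
}
```

You would call it once per line inside the while (getline(inFile, astring)) loop and push the pair into your vector only when it returns true.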

C++ read different kind of datas from file until there's a string beginning with a number

In C++, I'd like to read from an input file that contains different kinds of data: first the name of a contestant (two or more whitespace-separated strings), then an ID (a string without whitespace, always beginning with a digit), then further strings without whitespace, each followed by a number (the sports and the places achieved).
For example:
Josh Michael Allen 1063Szinyei running 3 swimming 1 jumping 1
Here is the code I started to write before I got stuck:
void ContestEnor::next()
{
string line;
getline(_f , line);
if( !(_end = _f.fail()) ){
istringstream is(line);
is >> _cur.contestant >> _cur.id; // here I don't know how to go on
_cur.counter = 0;
//...
}
}
Thank you for your help in advance.
You should look into using std::getline with a delimiter. This way, you can delimit on a space character and read until you find a string whose first character is a digit. Here is a short code example (this seems rather homework-like, so I don't want to write too much of it for you ;):
std::string temp, id;
while (std::getline(_f, temp, ' ')) {
if (!temp.empty() && temp[0] >= '0' && temp[0] <= '9') {
id = temp;
}
// you would need to add more code for the rest of the data on that line
}
/* close the file, etc. */
This code should be pretty self-explanatory. The most important thing to know is that you can use std::getline to get data up until a delimiter. The delimiter is consumed, just like the default behavior of delimiting on a newline character. Thus, the name getline isn't entirely accurate - you can still get only part of a line if you need to.
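For completeness, here is one way the whole line could be parsed, as a sketch only. It swaps getline-with-delimiter for operator>>, which also tolerates repeated spaces. The Contestant struct and parseContestant function are hypothetical stand-ins; the asker's _cur type isn't shown, so the field names are guesses.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type; the real _cur type is not shown in the question.
struct Contestant {
    std::string name;                                    // e.g. "Josh Michael Allen"
    std::string id;                                      // first token starting with a digit
    std::vector<std::pair<std::string, int>> results;    // (sport, place) pairs
};

Contestant parseContestant(const std::string& line)
{
    std::istringstream is(line);
    Contestant c;
    std::string token;
    // Name: accumulate tokens until one starts with a digit; that one is the ID.
    while (is >> token) {
        if (std::isdigit(static_cast<unsigned char>(token[0]))) {
            c.id = token;
            break;
        }
        if (!c.name.empty())
            c.name += ' ';
        c.name += token;
    }
    // Remainder: alternating sport name and place.
    std::string sport;
    int place;
    while (is >> sport >> place)
        c.results.emplace_back(sport, place);
    return c;
}
```

This assumes sport names never start with a digit and never contain whitespace, as described in the question.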

populating a string vector with tab delimited text

I'm very new to C++.
I'm trying to populate a vector with elements from a tab delimited file. What is the easiest way to do that?
Thanks!
There are many ways to do it; a simple Google search will turn up a solution.
Here is an example from one of my projects. It uses getline to read a comma-separated (CSV) file; I'll let you adapt it for a tab-delimited file.
ifstream fin(filename.c_str());
string buffer;
while(!fin.eof() && getline(fin, buffer))
{
size_t prev_pos = 0, curr_pos = 0;
vector<string> tokenlist;
string token;
// check string
assert(buffer.length() != 0);
// tokenize string buffer.
curr_pos = buffer.find(',', prev_pos);
while(1) {
if(curr_pos == string::npos)
curr_pos = buffer.length();
// could be zero
size_t token_length = curr_pos - prev_pos;
// create new token and add it to tokenlist.
token = buffer.substr(prev_pos, token_length);
tokenlist.push_back(token);
// reached end of the line
if(curr_pos == buffer.length())
break;
prev_pos = curr_pos+1;
curr_pos = buffer.find(',', prev_pos);
}
}
UPDATE: Improved while condition.
This is probably the easiest way to do it, but vcp's approach can be more efficient.
std::vector<std::string> tokens;
std::string token;
while (std::getline(infile, token, '\t'))
{
tokens.push_back(token);
}
Done. You can actually get this down to about three lines of code with an input iterator and a back inserter, but why?
Now if the file is cut up into lines and separated by tabs on those lines, you also have to handle the line delimiters. Now you just do the above twice, one loop for lines and an inner loop to parse the tabs.
std::vector<std::string> tokens;
std::string line;
while (std::getline(infile, line))
{
std::stringstream instream(line);
std::string token;
while (std::getline(instream, token, '\t'))
{
tokens.push_back(token);
}
}
And if you needed to do line, then tabs, then... I dunno... quotes? Three loops. But to be honest by three I'm probably looking at writing a state machine. I doubt your teacher wants anything like that at this stage.
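If you want to keep track of which token came from which line, the same two loops can fill a vector of rows instead of one flat vector. This is a sketch; readTabTable is a made-up name, and it takes any std::istream so it works with an ifstream or a stringstream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads tab-delimited text: one inner vector per line, one string per field.
std::vector<std::vector<std::string>> readTabTable(std::istream& in)
{
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(in, line)) {                 // outer loop: lines
        std::istringstream fields(line);
        std::vector<std::string> row;
        std::string field;
        while (std::getline(fields, field, '\t'))    // inner loop: tabs
            row.push_back(field);
        rows.push_back(row);
    }
    return rows;
}
```

Note that two tabs in a row give you an empty field, which is usually what you want for tabular data. A trailing tab at the end of a line does not add a final empty field, though.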

c++ Ignore sections of string before or after reading a file

I am writing a program where I read a text file using getline() and store each line in a vector<string>.
ifstream flirfile(flir_time_dir);
vector<string> flir_times;
string flir_line;
while (getline(flirfile, flir_line))
{
flir_times.push_back(flir_line);
}
This is the text file that the program reads:
The program works fine but what I want to do is ignore everything on each line except for the hex string in the middle. So in other words ignore everything before the first underscore and after the second underscore (including the underscores). Is there an easy way to do that? And is it better to ignore the text while the file is being read or afterwards when the lines are stored in the vector? Thanks.
There are ways to split strings on separators, which means you can split the string on the '_' character, and use only the middle-part.
A simple way of doing this is to use std::getline and std::istringstream:
while (getline(flirfile, flir_line))
{
std::istringstream is{flir_line};
std::string dummy;
std::string flir;
std::getline(is, dummy, '_');
std::getline(is, flir, '_');
flir_times.push_back(flir);
}
If your compiler supports C++11, you can use a regular expression. The capture group keeps the underscores out of the stored string:
#include <regex>
// ...
std::regex my_regex("_([^_]*)_");
while (getline(flirfile, flir_line))
{
std::smatch my_match;
if (std::regex_search(flir_line, my_match, my_regex))
flir_times.push_back(my_match[1].str()); // group 1: the text between the underscores
}
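A third option, if you'd rather avoid both streams and regex, is plain find() and substr(). The sketch below uses a hypothetical helper name, middleField, and since the actual file contents aren't shown in the question, the input format is an assumption.

```cpp
#include <string>

// Returns the text between the first and second underscore,
// or an empty string if the line has fewer than two underscores.
std::string middleField(const std::string& line)
{
    const std::string::size_type first = line.find('_');
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type second = line.find('_', first + 1);
    if (second == std::string::npos)
        return std::string();
    return line.substr(first + 1, second - first - 1);
}
```

Inside the reading loop that becomes flir_times.push_back(middleField(flir_line));. Filtering while reading like this avoids storing the parts you don't need.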

String in a text file containing a string in C++

here's a part from my code
string word;
cin >> word;
string keyword;
while (file >> keyword && keyword != word){}
This searches for a word in a file and if it finds that word (keyword) then it starts a string from there later. It's working perfectly at the moment. My problem is that when the line is
"Julia","2 Om.","KA","1 Om. 4:2"
if I enter word Julia I can not find it and use it for my purposes (just FYI I'm counting it). It works if I search for "Julia","2 since this is where space comes in.
I'd like to know how can I change line
while (file >> keyword && keyword != word){}
so I can see when the text/string CONTAINS that string since at the moment it only finds and accepts it if I enter the WHOLE string perfectly.
EDIT: All I have found so far is strstr, strtok, and strcmp, but these fit better with printf than with cout.
You can use methods from std::string like find.
#include <string>
#include <iostream>
// ...
std::string keyword;
std::string word;
getline(file, keyword);
do
{
std::cin >> word;
}
while (keyword.find(word) == std::string::npos);
The problem is that you're extracting strings, which by default will extract up until the next space. So at the first iteration, keyword is "Julia","2. If you want to extract everything separated by commas, I suggest using std::getline with , as the delimiter:
while (std::getline(file, keyword, ','))
This will look through all of the quoted strings. Now you can use std::string::find to determine if the input word is found within that quoted string:
while (std::getline(file, keyword, ',') &&
keyword.find(word) == std::string::npos)
Now this will loop through each quoted string until it gets to the one that contains word.
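Wrapped up as a function, the idea might look like the sketch below. streamContains is a hypothetical name, and it takes a std::istream so the same code works for the file and for a test string.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Returns true as soon as any comma-separated field contains `word`.
// On success the stream is left positioned just after that field.
bool streamContains(std::istream& in, const std::string& word)
{
    std::string field;
    while (std::getline(in, field, ','))
        if (field.find(word) != std::string::npos)
            return true;
    return false;
}
```

For example, with std::istringstream in("\"Julia\",\"2 Om.\",\"KA\",\"1 Om. 4:2\""); the call streamContains(in, "Julia") returns true. Keep in mind that with ',' as the delimiter a newline is just another character, so the last field of one line and the first field of the next end up in the same string. That doesn't matter for a contains check but would for exact comparisons.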
Use this method of istream to get a whole line instead of just a single "word":
http://www.cplusplus.com/reference/istream/istream/getline/
Then use strstr, to find the location of a string (like Julia) in a string (the line of the file):
http://www.cplusplus.com/reference/cstring/strstr/