How to find a string of 2 words in a file? - c++

With the following code, I can find a string of 1 word (in this example I'm looking for "Word"):
ifstream file("file.txt");
string str;
while (file >> str){
if (str.find("Word") != string::npos){
////
}
}
But it doesn't work if I want to find, for example, "Computer screen", which is composed of two words.
Thanks

file >> str reads a single token (in this case, a string) delimited by whitespace. If you want to read the whole line (or, in any case, more than one word at once), you can use the std::getline function, which reads a string delimited by a newline by default.
ifstream file("file.txt");
string str;
while (std::getline (file,str)){
if (str.find("Computer screen") != string::npos){
////
}
}

If you know there are two words and what they are, all you need is this:
ifstream file("file.txt");
string str;
while (file >> str){
if (str.find("Computer") != string::npos){
file >> str;
if (str.find("screen") != string::npos) {
////
}
}
}
But more likely, you are asking to find a single string that might be two words, or three or more.
Then, can you count on the string being on a single line? If so, @Ashalynd's solution will work. But if the string might be broken across two lines, it will fail, and you need to handle that case.
If your file is "small" - i.e. can easily fit in memory, read in the whole thing, remove line breaks and search for the string.
If it is too large, read in lines as pairs.
Something like this:
std::ifstream file("file.txt");
std::string str[2];
int i = 0;
std::getline (file,str[i]);
++i;
while (std::getline (file,str[i]))
{
int next_i = (i+1)%2;
std::string pair = str[next_i] + " " + str[i];
if (pair.find("Computer screen") != std::string::npos)
{
////
}
i = next_i;
}
All this assumes that the possible white space between the words in the string is a single space or a newline. If there is a line break with more white space of some kind (e.g. tabs), you need either to replace white space in the search string with a regex for white space, or to implement a more complex state machine.
Also, consider whether you need to manage case, probably by converting all strings to lower case before the match.
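For the "small file" case, the whitespace and case points above can be handled together by normalizing both the text and the search phrase before calling find. The following is only a sketch under that assumption; normalize, contains_phrase and file_contains_phrase are hypothetical helper names, not anything from the question.

```cpp
#include <cctype>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: lower-cases text and collapses every run of
// whitespace (spaces, tabs, newlines) into a single space.
std::string normalize(const std::string& text) {
    std::string out;
    bool in_space = false;
    for (unsigned char c : text) {
        if (std::isspace(c)) {
            in_space = true;  // remember the gap, emit it only before the next word
        } else {
            if (in_space && !out.empty()) out.push_back(' ');
            in_space = false;
            out.push_back(static_cast<char>(std::tolower(c)));
        }
    }
    return out;
}

// Hypothetical helper: true if phrase occurs in text, ignoring case and
// treating any run of whitespace as one space.
bool contains_phrase(const std::string& text, const std::string& phrase) {
    return normalize(text).find(normalize(phrase)) != std::string::npos;
}

// Reads the whole file into memory, so this is only suitable for files
// that comfortably fit there.
bool file_contains_phrase(const std::string& path, const std::string& phrase) {
    std::ifstream file(path);
    std::string text((std::istreambuf_iterator<char>(file)),
                     std::istreambuf_iterator<char>());
    return contains_phrase(text, phrase);
}
```

Calling file_contains_phrase("file.txt", "Computer screen") would then also match "COMPUTER" at the end of one line followed by a tab-indented "screen" on the next.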

Related

Reading in only letters from a text file

I am trying to read in from a text file a poem that contains commas, spaces, periods, and newline character. I am trying to use getline to read in each separate word. I do not want to read in any of the commas, spaces, periods, or newline character. As I read in each word I am capitalizing each letter then calling my insert function to insert each word into a binary search tree as a separate node. I do not know the best way to separate each word. I have been able to separate each word by spaces but the commas, periods, and newline characters keep being read in.
Here is my text file:
Roses are red,
Violets are blue,
Data Structures is the best,
You and I both know it is true.
The code I am using is this:
string inputFile;
cout << "What is the name of the text file?";
cin >> inputFile;
ifstream fin;
fin.open(inputFile);
//Input once
string input;
getline(fin, input, ' ');
for (int i = 0; i < input.length(); i++)
{
input[i] = toupper(input[i]);
}
//check for duplicates
if (tree.Find(input, tree.Current, tree.Parent) == true)
{
tree.Insert(input);
countNodes++;
countHeight = tree.Height(tree.Root);
}
Basically I am using the getline(fin,input, ' ') to read in my input.
I was able to figure out a solution. I read an entire line of text into the variable line, then went through it character by character, kept only the letters, and stored them in word. Then I was able to call my insert function to insert the node into my tree.
const int MAXWORDSIZE = 50;
const int MAXLINESIZE = 1000;
char word[MAXWORDSIZE], line[MAXLINESIZE];
int lineIdx, wordIdx, lineLength;
//get a line
fin.getline(line, MAXLINESIZE - 1);
lineLength = strlen(line);
while (fin)
{
for (int lineIdx = 0; lineIdx < lineLength;)
{
//skip over non-alphas, and check for end of line null terminator
while (!isalpha(line[lineIdx]) && line[lineIdx] != '\0')
++lineIdx;
//make sure not at the end of the line
if (line[lineIdx] != '\0')
{
//copy alphas to word c-string
wordIdx = 0;
while (isalpha(line[lineIdx]))
{
word[wordIdx] = toupper(line[lineIdx]);
wordIdx++;
lineIdx++;
}
//make it a c-string with the null terminator
word[wordIdx] = '\0';
//THIS IS WHERE YOU WOULD INSERT INTO THE BST OR INCREMENT FREQUENCY COUNTER IN THE NODE
if (tree.Find(word) == false)
{
tree.Insert(word);
totalNodes++;
//output word
//cout << word << endl;
}
else
{
tree.Counter();
}
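}
}
//get the next line; without this the while loop would never advance
fin.getline(line, MAXLINESIZE - 1);
lineLength = strlen(line);
}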
This is a good time for a technique I've posted a few times before: define a ctype facet that treats everything but letters as white space (searching for imbue will show several examples).
From there, it's a matter of std::transform with istream_iterators on the input side, a std::set for the output, and a lambda to capitalize the first letter.
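As a rough sketch of that technique (not necessarily the exact code from those earlier posts): the facet below marks every character except the ASCII letters as whitespace, so operator>> splits on commas, periods and newlines by itself. The names letters_only and unique_upper_words are made up for this example, and the lambda upper-cases the whole word, as the question does, rather than only the first letter.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <locale>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Facet that classifies everything except ASCII letters as whitespace.
struct letters_only : std::ctype<char> {
    letters_only() : std::ctype<char>(make_table()) {}

    static const mask* make_table() {
        // Start with every character marked as space, then mark the letters.
        static std::vector<mask> table(table_size, space);
        for (int c = 'A'; c <= 'Z'; ++c) table[c] = alpha;
        for (int c = 'a'; c <= 'z'; ++c) table[c] = alpha;
        return table.data();
    }
};

// Hypothetical helper: reads every word from the stream, upper-cases it,
// and collects the distinct words in a std::set.
std::set<std::string> unique_upper_words(std::istream& in) {
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(), new letters_only));
    std::set<std::string> words;
    std::transform(std::istream_iterator<std::string>(in),
                   std::istream_iterator<std::string>(),
                   std::inserter(words, words.end()),
                   [](std::string w) {
                       for (char& ch : w)
                           ch = static_cast<char>(
                               std::toupper(static_cast<unsigned char>(ch)));
                       return w;
                   });
    return words;
}
```

A std::set is used here only because the answer suggests it; in the question's program the loop body would call tree.Insert instead.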
You can make a custom getline function for multiple delimiters:
std::istream &getline(std::istream &is, std::string &str, std::string const& delims)
{
str.clear();
// the 3rd parameter type and the condition part on the right side of &&
// should be all that differs from std::getline
for(char c; is.get(c) && delims.find(c) == std::string::npos; )
str.push_back(c);
return is;
}
And use it:
getline(fin, input, " \n,.");
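Two details of this approach may matter in practice. Adjacent delimiters (such as the "," followed by a newline in the poem) produce empty tokens, and if the input does not end with a delimiter, the stream is already in a failed state when the last token is returned. The sketch below works around both; getline_any and split_any are hypothetical names chosen to avoid clashing with std::getline.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Same idea as the custom getline above: read up to any delimiter character.
std::istream& getline_any(std::istream& is, std::string& str,
                          const std::string& delims) {
    str.clear();
    for (char c; is.get(c) && delims.find(c) == std::string::npos; )
        str.push_back(c);
    return is;
}

// Hypothetical helper: collects all non-empty tokens from the stream.
std::vector<std::string> split_any(std::istream& is, const std::string& delims) {
    std::vector<std::string> tokens;
    std::string tok;
    for (;;) {
        getline_any(is, tok, delims);
        if (!tok.empty()) tokens.push_back(tok);  // skip empty tokens between delimiters
        if (!is) break;  // check the stream only after keeping the last token
    }
    return tokens;
}
```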
You can use std::regex to select your tokens
Depending on the size of your file you can read it either line by line or entirely in an std::string.
To read the file you can use :
std::ifstream t("file.txt");
std::string sin((std::istreambuf_iterator<char>(t)),
std::istreambuf_iterator<char>());
And this will match each run of characters that is neither a comma nor whitespace:
std::regex word_regex("[^,\\s]+");
auto what =
std::sregex_iterator(sin.begin(), sin.end(), word_regex);
auto wend = std::sregex_iterator();
std::vector<std::string> v;
for (; what != wend; ++what) {
std::smatch match = *what;
v.push_back(match.str());
}
I think that to separate tokens delimited by a comma, a space, or a newline you should use this regex: (,| \n| )[[:alpha:]].+ . I have not tested it, though, so you may need to check it.
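If the goal is simply to keep letters and drop everything else, a simpler pattern may be enough. This is a sketch assuming ASCII text; extract_words is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every maximal run of ASCII letters in text.
std::vector<std::string> extract_words(const std::string& text) {
    static const std::regex word_regex("[A-Za-z]+");
    std::vector<std::string> words;
    for (std::sregex_iterator it(text.begin(), text.end(), word_regex), end;
         it != end; ++it)
        words.push_back(it->str());
    return words;
}
```

Matching the words themselves, rather than the separators, avoids having to list every punctuation character that might appear.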

Ignoring whitespaces/tabs/newlines with getline()

I went through a lot of resources on the web but still can't work this out. I don't understand how
std::skipws works to ignore whitespace, tabs, and newlines.
Following is my simple code
vector<string> vec;
while(1){
getline(cin, s);
if( s.compare("#") == 0)
break;
else
vec.push_back(s);
}
I will enter lines of strings containing newlines, whitespace, and tabs. After input I want to store the strings in the vector, stopping when the "#" string is encountered. I tried the above code, but it stores the spaces along with the strings in the vector, although it does terminate after entering "#".
The purpose of std::getline is to read an entire line, including whitespace, into a string buffer.
If you want to read tokens from a stream, skipping whitespace, then use the standard input operator >>.
std::vector<std::string> vec;
std::string s;
while(std::cin >> s && s != "#") {
vec.push_back(s);
}
std::skipws is skipping only the leading whitespace characters in any input stream. It therefore has no effect on all the whitespaces after the first non-whitespace. If you want to read whole lines with getline(cin, s) you might as well consider removing the blanks and tabs that have been read from the string before pushing it into the container like so :
while (1){
getline(cin, s);
if (s.compare("#") == 0) {
break;
}
else {
s.erase(remove_if(s.begin(), s.end(), ::isspace), s.end());
vec.push_back( s );
}
}
For a discussion on how to remove whitespaces from a string see also : Remove spaces from std::string in C++
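A middle ground, if whole lines are wanted but without the leading whitespace, is to apply the std::ws manipulator before each std::getline call. This is a sketch, with read_lines_until_hash as a hypothetical helper name; note that std::ws also swallows blank lines and that trailing whitespace is kept.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads lines until a line consisting of "#" is found.
std::vector<std::string> read_lines_until_hash(std::istream& in) {
    std::vector<std::string> lines;
    std::string s;
    // in >> std::ws discards leading whitespace (including empty lines)
    // before std::getline reads the rest of the line.
    while (std::getline(in >> std::ws, s) && s != "#")
        lines.push_back(s);
    return lines;
}
```

Because the stream state is tested in the loop condition, this also stops cleanly at end of input if no "#" line ever arrives.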

Extracting arguments using stringstream

I want to input a phrase and extract each character of the phrase:
int main()
{
int i = 0;
string line, command;
getline(cin, line); //gets the phrase ex: hi my name is andy
stringstream lineStream(line);
lineStream>>command;
while (command[i]!=" ") //while the character isn't a whitespace
{
cout << command[i]; //print out each character
i++;
}
}
However, I get the error "can't compare between pointer and integer" at the while statement.
As your title "Extracting arguments using stringstream" suggests:
I think you're looking for this :
getline(cin, line);
stringstream lineStream(line);
std::vector<std::string> commands; //Can use a vector to store the words
while (lineStream>>command)
{
std::cout <<command<<std::endl;
//commands.push_back(command); // Push the words in vector for later use
}
command is a string, so command[i] is a character. You can't compare characters to string literals, but you can compare them to character literals, like
command[i]!=' '
However, you're not going to get a space in your string, as the input operator >> reads space delimited "words". So you have undefined behavior as the loop will continue out of bounds of the string.
You might want two loops, one outer reading from the string stream, and one inner to get the characters from the current word. Either that, or loop over the string in line instead (which I don't recommend as there are more whitespace characters than just space). Or of course, since the "input" from the string stream already is whitespace separated, just print the string, no need to loop over the characters.
To extract all words from the string stream into a vector of strings, you can use the following:
std::istringstream is(line);
std::vector<std::string> command_and_args;
std::copy(std::istream_iterator<std::string>(is),
std::istream_iterator<std::string>(),
std::back_inserter(command_and_args));
After the above code, the vector command_and_args contains all whitespace delimited words from the string stream, with command_and_args[0] being the command.
References: std::istream_iterator, std::back_inserter, std::copy.
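The two-loop variant mentioned earlier in this answer could look roughly like this. It is only a sketch; non_space_chars is a hypothetical helper, and it collects the characters instead of printing them.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the characters of line with all whitespace dropped.
std::vector<char> non_space_chars(const std::string& line) {
    std::istringstream lineStream(line);
    std::vector<char> chars;
    std::string word;
    while (lineStream >> word)   // outer loop: one whitespace-delimited word at a time
        for (char ch : word)     // inner loop: every character of that word
            chars.push_back(ch);
    return chars;
}
```

The inner range-based for loop stays within the bounds of each word, which avoids the out-of-bounds read in the original while loop.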

Read an entire line including spaces from fstream

I am currently working on a small project in C++ and am a bit confused at the moment. I need to read a certain number of words from a line taken from a file using ifstream in(). The problem right now is that it keeps ignoring spaces. I need to count the number of spaces in the file to calculate the number of words. Is there any way to have in() not ignore the white space?
ifstream in("input.txt");
ofstream out("output.txt");
while(in.is_open() && in.good() && out.is_open())
{
in >> temp;
cout << tokencount(temp) << endl;
}
To count the number of spaces in a file:
std::ifstream inFile("input.txt");
std::istreambuf_iterator<char> it (inFile), end;
int numSpaces = std::count(it, end, ' ');
To count the number of whitespace characters in a file:
std::ifstream inFile("input.txt");
std::istreambuf_iterator<char> it (inFile), end;
int numWS = std::count_if(it, end, (int(*)(int))std::isspace);
As an alternative, instead of counting spaces, you could count words.
std::ifstream inFile("foo.txt");
std::istream_iterator<std::string> it(inFile), end;
int numWords = std::distance(it, end);
Here's how I'd do it:
std::ifstream fs("input.txt");
std::string line;
while (std::getline(fs, line)) {
int numSpaces = std::count(line.begin(), line.end(), ' ');
}
In general, if I have to do something for every line of a file, I find std::getline to be the least finicky way of doing it. If I need stream operators from there I'll end up making a stringstream out of just that line. It's far from the most efficient way of doing things but I'm usually more concerned with getting it right and moving on with life for this sort of thing.
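As a sketch of that getline-then-stringstream pattern applied to the word-counting goal from the question (words_per_line is a hypothetical helper name):

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the number of whitespace-delimited words
// on each line of the stream.
std::vector<std::size_t> words_per_line(std::istream& in) {
    std::vector<std::size_t> counts;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words(line);  // a fresh stream for just this line
        counts.push_back(static_cast<std::size_t>(std::distance(
            std::istream_iterator<std::string>(words),
            std::istream_iterator<std::string>())));
    }
    return counts;
}
```

Counting words directly sidesteps the problem that the number of spaces and the number of words differ whenever a line has leading, trailing, or repeated spaces.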
You can use count with an istreambuf_iterator:
ifstream fs("input.txt");
int num_spaces = count(istreambuf_iterator<char>(fs),
istreambuf_iterator<char>(),
' ');
edit
Originally my answer used istream_iterator; however, as @Robᵩ pointed out, it doesn't work.
istream_iterator will iterate over a stream, but assume whitespace formatting and skip over it. My example above but using istream_iterator returned the result zero, as the iterator skipped whitespace and then I asked it to count the spaces that were left.
istreambuf_iterator however takes one raw character at a time, no skipping.
See istreambuf_iterator vs istream_iterator for more info.

Parsing a string of numbers into an integer array

I have a text file with numbers ranging from 0-255 separated by commas. I want to be able to store each of these numbers into an integer array. An example of what the text file might contain is;
"32,51,45,12,5,2,7,2,9,233,132,175,143,33..." etc
I have managed to get my program to store the data from the text file as a string and output them on the screen. What I need to do next is store the values of that string in an integer array, separating the numbers by the commas.
Here is the code I have written so far, which I am having problems getting to work:
int _tmain(int argc, _TCHAR* argv[])
{
string line;
ifstream myfile ("example.txt");
if (myfile.is_open())
{
while ( myfile.good() )
{
getline (myfile,line);
cout << line << endl;
}
myfile.close();
}
else cout << "Unable to open file";
//STRING CONVERSION
std::string str = line;
std::vector<int> vect;
std::stringstream ss(str);
int i = 0;
while (ss >> i)
{
vect.push_back(i);
if (ss.peek() == ',')
ss.ignore();
}
system("pause");
return 0;
}
It looks like your code for tokenizing your string is a bit off. In particular, you need to make sure you call atoi() on the string of your integer to get an integer. I'll focus on the parsing of the string, though.
One thing you could use is C's strtok. I recommend this mainly because your case is rather simple, and this is probably the simplest way to go about it.
The code you'd look for is essentially this:
char* numStr = strtok(&str[0], ","); // strtok modifies its input, so pass the mutable buffer (c_str() is const)
while (numStr)
{
vect.push_back(atoi(numStr));
numStr = strtok(NULL, ",");
}
strtok() takes two arguments: a pointer to the C-style string (char*) you're tokenizing, and the string of delimiters (note that each character in the delimiter string is treated as its own delimiter).
I should mention that strtok is not thread-safe, and you also have to ensure that the string you extract from the file ends with a null character \0.
The answers to this question provide many alternatives to my solution. If you'd prefer to use std::stringstream then I suggest you look at the 5th answer on that page.
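For reference, a std::stringstream version might look like the sketch below. This is not necessarily what the linked answer does; parse_csv_ints is a hypothetical helper name, and it assumes every non-empty field is a valid integer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits str on commas and converts each field to int.
std::vector<int> parse_csv_ints(const std::string& str) {
    std::vector<int> values;
    std::istringstream ss(str);
    std::string token;
    while (std::getline(ss, token, ','))  // ',' replaces the default '\n' delimiter
        if (!token.empty())               // tolerate empty fields such as "7,,9"
            values.push_back(std::stoi(token));  // throws std::invalid_argument on non-numeric text
    return values;
}
```

Unlike strtok, this leaves the original string untouched and keeps no hidden state between calls.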
Regarding your trouble with PDBs, what is the exact error you're getting?