Finding a pattern in a text in C++

I have written the following code to find the number of occurrences of "ATA" in a text that is read into a string, such as "GCTATAATAGCCATA". The count returned should be 3, but it returns 0. When I check in the debugger, the string text is initially populated. However, an empty string is passed to the function patternCount. Am I reading the contents of the file into the string text correctly?
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
void patternCount(string text, string pattern);
int main()
{
string text;
fstream file_("test.txt");
if(file_.is_open())
{
while(getline(file_,text))
{
cout << text << '\n';
}
file_.close();
}
cout << "Enter a string ";
string pattern;
getline(cin, pattern);
patternCount(text, pattern);
return 0;
}
void patternCount(string text, string pattern)
{
int count = 0;
size_t nPos = text.find(pattern, 0);
while (nPos != string::npos)
{
nPos = text.find(pattern, nPos + 1);
++count;
}
cout << "There are " << count <<" " << pattern << " in your text.\n";
}

This code only counts the occurrences of the input string in the last line read from the text file. If that line is empty or does not contain the string, the result will be 0.
But I guess the OP wants to search the whole file, in which case the main function needs to be fixed accordingly.
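// Needs <fstream>, <sstream>, <iterator> and <algorithm>.
std::ifstream file{"test.txt"};
std::ostringstream text;
// istream_iterator<char> skips whitespace, so line breaks are dropped and the lines are joined.
std::copy(std::istream_iterator<char>{file}, std::istream_iterator<char>{},std::ostream_iterator<char>{text});
//...
patternCount(text.str(), pattern);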

So if I understand correctly, you're not sure whether you're reading the contents of test.txt correctly. If you want to read the entire contents, try this instead:
ifstream file_("test.txt");
string s,text;
file_>>s;
text=s;
while(file_>>s)
{
text=text+" "+s;
}
This should work. Note that reading with file_ >> s only reads up to the next whitespace, which is why I use the while loop. You can also use getline(), which reads a whole line including its spaces. Also make sure you include <fstream>. Printing out text will help you check what was actually read.

#include <iostream>
#include <fstream>
#include <string>
using std::cout;
using std::cerr;
using std::string;
int count = 0; // we will count the total pattern count here
void patternCount(string text, string pattern);
int main() {
cout << "Enter a string ";
string pattern;
std::getline(std::cin, pattern);
string text;
std::fstream file_("test.txt");
if(file_.is_open()){
while(std::getline(file_,text))
patternCount(text,pattern);
file_.close();
}else
cerr<<"Failed to open file";
cout << "There are " << count <<" " << pattern << " in your text.\n";
return 0;
}
void patternCount(string text, string pattern){
size_t nPos = text.find(pattern, 0);
while (nPos != string::npos) {
nPos = text.find(pattern, nPos + 1);
++count;
}
}
The Problem
Your code was good; there were no bugs in the patternCount function.
But you were reading the file incorrectly. Every time you call std::getline(file_, text), the previous contents of text are overwritten by the new line. So when the loop ends and you pass text to patternCount, it holds at most the last line of the file. If the file ends with a newline, the final failed getline call typically leaves text empty, which matches the empty string you saw in the debugger.
The Solution
You could have solved it in two ways:
1. As shown above, you could call patternCount() on each line inside the while loop and update a global count variable.
2. You could append all the lines to text inside the while loop and call patternCount once at the end.
I have implemented the first; the second is covered in other answers.
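For comparison, here is a minimal sketch of the second approach, with the count returned instead of kept in a global. The names countOccurrences and readAllLines are made up for this illustration, and taking a std::istream& (rather than a file name) is an assumption made so the code can be tried with a std::istringstream as well as a std::ifstream.

```cpp
#include <cstddef>
#include <sstream>  // also provides std::istream; std::istringstream is handy for trying this without a file
#include <string>

// Counts occurrences of pattern in text, including overlapping ones.
// (Hypothetical helper, not part of the original code.)
std::size_t countOccurrences(const std::string& text, const std::string& pattern)
{
    if (pattern.empty()) return 0; // an empty pattern would "match" at every position
    std::size_t count = 0;
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + 1))
        ++count;
    return count;
}

// Reads every line from the stream and concatenates them.
// The newline characters are dropped, so a pattern may match across a line break.
std::string readAllLines(std::istream& in)
{
    std::string text, line;
    while (std::getline(in, line))
        text += line; // append instead of overwrite
    return text;
}
```

With a file, this could be used as std::ifstream file("test.txt"); followed by countOccurrences(readAllLines(file), pattern). If matches across line breaks are not wanted, counting per line as in the code above is the safer choice.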

Related

One extra line being read in file handling

I am trying to get the number of lines and words from a text file in C++, but one extra line is being read by the program.
#include <iostream>
#include<fstream>
using namespace std;
int main(void)
{
ifstream f;
string name;
char a;
int line, words;
line = 0;
words = 0;
f.open("file.txt");
while (f) {
f.get(a);
if (a == '\n')
{
line++;
words++;
}
if (a == ' ')
{
words++;
}
}
f.close();
cout << "Number of words in file : " << words << endl;
cout << "Numbers of lines in the file : " << line << endl;
}
OUTPUT:-
Number of words in file : 79
Numbers of lines in the file : 3
file.txt:-
This C++ Program which counts the number of lines in a file. The program creates an input file stream, reads a line on every iteration of a while loop, increments the count variable and the count variable is printed on the screen.
Here is source code of the C++ program which counts the number of lines in a file. The C++ program is successfully compiled and run on a Linux system. The program output is also shown below.
I am puzzled why one extra line is being read. Kindly help.
You are not checking if f.get() succeeds or fails. When it does fail, a is not updated, and you are not breaking the loop yet, so you end up acting on a's previous value again. And then the next loop iteration detects the failure and breaks the loop.
Change this:
while (f) {
f.get(a);
...
}
to this instead:
while (f.get(a)) {
...
}
That being said, you are also not taking into account that the last line in a file may not end with '\n', and if it does not then you are not counting that line. And also, you are assuming that every line always has at least 1 word in it, as you are incrementing words on every '\n' even for lines that have no words in them.
I would suggest using std::getline() to read and count lines, and std::istringstream to read and count the words in each line, eg:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
using namespace std;
int main(void)
{
ifstream f;
string line, word;
int lines = 0, words = 0;
f.open("file.txt");
while (getline(f, line))
{
++lines;
std::istringstream iss(line);
while (iss >> word) {
++words;
}
}
f.close();
cout << "Number of words in file : " << words << endl;
cout << "Numbers of lines in the file : " << lines << endl;
}
It is because you do not check the state of the stream that f.get(a) returns.
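If you would rather keep the character-by-character approach, the points above (loop on the result of get, count a last line that has no trailing '\n', do not assume every line contains a word) could be sketched as follows. The Counts struct and the countLinesAndWords name are invented for this example, and taking a std::istream& is an assumption so it works with both std::ifstream and std::istringstream.

```cpp
#include <cctype>
#include <sstream>  // also provides std::istream

// Hypothetical result type for this sketch.
struct Counts {
    int lines = 0;
    int words = 0;
};

Counts countLinesAndWords(std::istream& in)
{
    Counts c;
    char ch;
    bool inWord = false;       // true while we are inside a run of non-space characters
    bool lineHasChars = false; // true if the current line has characters but no '\n' yet
    while (in.get(ch)) {       // the loop body only runs when a character was actually read
        if (ch == '\n') {
            ++c.lines;
            lineHasChars = false;
        } else {
            lineHasChars = true;
        }
        if (std::isspace(static_cast<unsigned char>(ch))) {
            inWord = false;
        } else if (!inWord) {
            inWord = true;     // first character of a new word
            ++c.words;
        }
    }
    if (lineHasChars)
        ++c.lines;             // last line did not end with '\n'
    return c;
}
```

Words are counted at the transition from whitespace to non-whitespace, so empty lines and repeated spaces do not inflate the total.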

find lines (from a file) that contain a specified word

I cannot figure out how to list out the lines that contain a specified word. I am provided a .txt file that contains lines of text.
So far I have the code below, but it outputs the number of lines there are. Currently this is the solution that made sense in my head:
#include <iostream>
#include <fstream>
#include <iomanip>
using namespace std;
void searchFile(istream& file, string& word) {
string line;
int lineCount = 0;
while(getline(file, line)) {
lineCount++;
if (line.find(word)) {
cout << lineCount;
}
}
}
int main() {
ifstream infile("words.txt");
string word = "test";
searchFile(infile, word);
}
However, this code simply doesn't get the results I expect.
The output should just simply state which lines have the specified word on them.
So, to sum up the solution from the comments: the problem is with std::string's find member function. It does not return a boolean. It returns the index of the match if the word is found, or the special constant std::string::npos if it is not. Since npos is a nonzero value it converts to true, while a match at index 0 converts to false.
So testing it the traditional way with if (line.find(word)) is wrong; it should be checked this way instead:
if (line.find(word) != std::string::npos) {
std::cout << "Found the string at line: " << lineCount << "\n";
} else {
// String not found (of course this else block could be omitted)
}
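Putting that check back into the search loop, a version that collects the matching line numbers might look like this. It is only a sketch: the function name findLinesContaining and the choice to return a std::vector<int> instead of printing are assumptions made for the example.

```cpp
#include <sstream>  // also provides std::istream
#include <string>
#include <vector>

// Returns the 1-based numbers of all lines that contain word as a substring.
// (Hypothetical helper for illustration.)
std::vector<int> findLinesContaining(std::istream& in, const std::string& word)
{
    std::vector<int> matches;
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line)) {
        ++lineNumber;
        // Compare against npos; a match at index 0 is still a match.
        if (line.find(word) != std::string::npos)
            matches.push_back(lineNumber);
    }
    return matches;
}
```

Note that this is a substring search, so looking for "test" also reports lines that only contain "testing". Matching whole words would need extra checks on the characters around the match.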

How to remove first word from a string?

Let's say I have
string sentence{"Hello how are you."};
And I want string sentence to have "how are you" without the "Hello". How would I go about doing that.
I tried doing something like:
stringstream ss(sentence);
ss>> string junkWord;//to get rid of first word
But when I did:
cout<<sentence;//still prints out "Hello how are you"
It's pretty obvious that the stringstream doesn't change the actual string. I also tried using strtok but it doesn't work well with string.
Try the following
#include <iostream>
#include <string>
int main()
{
std::string sentence{"Hello how are you."};
std::string::size_type n = 0;
n = sentence.find_first_not_of( " \t", n );
n = sentence.find_first_of( " \t", n );
sentence.erase( 0, sentence.find_first_not_of( " \t", n ) );
std::cout << '\"' << sentence << "\"\n";
return 0;
}
The output is
"how are you."
str=str.substr(str.find_first_of(" \t")+1);
Tested:
string sentence="Hello how are you.";
cout<<"Before:"<<sentence<<endl;
sentence=sentence.substr(sentence.find_first_of(" \t")+1);
cout<<"After:"<<sentence<<endl;
Execution:
> ./a.out
Before:Hello how are you.
After:how are you.
This assumes the line does not start with whitespace; if it does, this does not work.
find_first_of("<list of characters>"):
The list of characters in our case is a space and a tab. This searches for the first occurrence of any character in the list and returns its index (a position, not an iterator). Adding 1 moves the position one character forward, so it then points to the second word of the line. If no space or tab is found, find_first_of returns npos, npos + 1 wraps around to 0, and the whole string is returned unchanged.
substr(pos) returns the substring from that position through the last character of the string.
You can for example take the remaining substring
string sentence{"Hello how are you."};
stringstream ss{sentence};
string junkWord;
ss >> junkWord;
cout<<sentence.substr(junkWord.length()+1); //string::substr
However, it also depends on what you want to do next.
There are countless ways to do this. I think I would go with this:
#include <iostream>
#include <string>
int main() {
std::string sentence{"Hello how are you."};
// First, find the index for the first space:
auto first_space = sentence.find(' ');
// The part of the string we want to keep
// starts at the index after the space:
auto second_word = first_space + 1;
// If you want to write it out directly, write the part of the string
// that starts at the second word and lasts until the end of the string:
std::cout.write(
sentence.data() + second_word, sentence.length() - second_word);
std::cout << std::endl;
// Or, if you want a string object, make a copy from the start of the
// second word. substr copies until the end of the string when you give
// it only one argument, like here:
std::string rest{sentence.substr(second_word)};
std::cout << rest << std::endl;
}
Of course, unless you have a really good reason not to, you should check that first_space != std::string::npos, which would mean the space was not found. The check is omitted in my sample code for clarity :)
You could use string::find() to locate the first space. Once you have its index, then get the sub string with string::substr() from the index after the index of the space up to the end of the string.
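The npos checks that several of the snippets above leave out could be added roughly like this. The helper name removeFirstWord and the decision to return an empty string when there is no second word are assumptions of this sketch, not something the answers above prescribe.

```cpp
#include <string>

// Returns sentence without its first word and without the whitespace around it.
// Returns an empty string if there is no second word. (Hypothetical helper.)
std::string removeFirstWord(const std::string& sentence)
{
    const char* whitespace = " \t";

    // Skip any leading whitespace to find where the first word starts.
    std::string::size_type wordStart = sentence.find_first_not_of(whitespace);
    if (wordStart == std::string::npos)
        return ""; // empty or all whitespace

    // Find the whitespace that ends the first word.
    std::string::size_type wordEnd = sentence.find_first_of(whitespace, wordStart);
    if (wordEnd == std::string::npos)
        return ""; // only one word

    // Skip the whitespace between the first and second word.
    std::string::size_type restStart = sentence.find_first_not_of(whitespace, wordEnd);
    if (restStart == std::string::npos)
        return ""; // trailing whitespace only

    return sentence.substr(restStart);
}
```

For the sentence in the question, removeFirstWord("Hello how are you.") gives "how are you.".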
One liner:
std::string subStr = sentence.substr(sentence.find_first_not_of(" \t\r\n", sentence.find_first_of(" \t\r\n", sentence.find_first_not_of(" \t\r\n"))));
working example:
#include <iostream>
#include <string>
int main()
{
std::string sentence{ "Hello how are you." };
char whiteSpaces[] = " \t\r\n";
std::string subStr = sentence.substr(sentence.find_first_not_of(whiteSpaces, sentence.find_first_of(whiteSpaces, sentence.find_first_not_of(whiteSpaces))));
std::cout << subStr;
std::cin.ignore();
}
Here's how to use a stringstream to extract the junkword while ignoring any space before or after (using std::ws), then get the rest of the sentence, with robust error handling....
std::string sentence{"Hello how are you."};
std::stringstream ss{sentence};
std::string junkWord;
if (ss >> junkWord >> std::ws && std::getline(ss, sentence, '\0'))
std::cout << sentence << '\n';
else
std::cerr << "the sentence didn't contain ANY words at all\n";
#include <iostream> // cout
#include <string> // string
#include <sstream> // string stream
using namespace std;
int main()
{
string testString = "Hello how are you.";
istringstream iss(testString); // note: istringstream, declared in <sstream>
char c; // receives the next non-whitespace character ('h' here), since >> skips the space
string firstWord;
iss>>firstWord>>c; // read the first word, then the first character of the next word
cout << "The first word in \"" << testString << "\" is \"" << firstWord << "\""<<endl;
cout << "The rest of the words is \"" <<testString.substr(firstWord.length()+1) << "\""<<endl;
return 0;
}
output
The first word in "Hello how are you." is "Hello"
The rest of the words is "how are you."

HW Help: get char instead of get line C++

I wrote the code below that successfully gets a random line from a file; however, I need to be able to modify one of the lines, so I need to be able to get the line character by character.
How can I change my code to do this?
Use std::istream::get instead of std::getline. Just read your string character by character until you reach \n, EOF or other errors. I also recommend you read the full std::istream reference.
Good luck with your homework!
UPDATE:
OK, I don't think an example will hurt. Here is how I'd do it if I were you:
#include <string>
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <ctime>
using namespace std;
static std::string
answer (const string & question)
{
std::string answer;
const string filename = "answerfile.txt";
ifstream file (filename.c_str ());
if (!file)
{
cerr << "Can't open '" << filename << "' file.\n";
exit (1);
}
for (int i = 0, r = rand () % 5; i <= r; ++i)
{
answer.clear ();
char c;
while (file.get (c).good () && c != '\n')
{
if (c == 'i') c = 'I'; // Replace character? :)
answer.append (1, c);
}
}
return answer;
}
int
main ()
{
srand (time (NULL));
string question;
cout << "Please enter a question: " << flush;
cin >> question;
cout << answer (question) << endl;
}
... the only thing is that I have no idea why you need to read the string char by char in order to modify it. You can modify the std::string object directly, which is even easier. Let's say you want to replace "I think" with "what if"? You might be better off reading more about std::string and using find, erase, replace etc.
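As a rough illustration of that suggestion, the replacement could be done on the whole line after reading it with std::getline. The replaceAll helper below is a name invented for this sketch; it only uses std::string::find and std::string::replace.

```cpp
#include <cstddef>
#include <string>

// Replaces every occurrence of from in text with to and returns the result.
// (Hypothetical helper for illustration.)
std::string replaceAll(std::string text, const std::string& from, const std::string& to)
{
    if (from.empty())
        return text; // nothing sensible to search for
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.length(), to);
        pos += to.length(); // continue after the inserted text to avoid re-matching it
    }
    return text;
}
```

For example, replaceAll("I think it will rain", "I think", "what if") gives "what if it will rain".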
UPDATE 2:
What happens with your latest code is simply this: you open a file, then you read its content character by character until you reach a newline (\n). So in either case you end up reading the first line, and then your do-while loop terminates. If you look at my example, I used a while loop that reads a line until \n inside a for loop. So that is basically what you should do: repeat your do-while loop once for each line you want to (or can) get from that file. For example, something like this will read two lines:
for (int i = 1; i <= 2; ++i)
{
do
{
answerfile.get (answer);
cout << answer << " (from line " << i << ")\n";
}
while (answer != '\n');
}

Reading a file into an array

I would like to read a text file and input its contents into an array. Then I would like to show the contents of the array in the command line.
My idea is to open the file using:
inFile.open("pigData.txt")
And then to get the contents of the file using:
inFile >> myarray [size]
And then show the contents using a for loop.
My problem is that the file I am trying to read contains words and I don't know how to get a whole word as an element in the array. Also, let's say that the words are separated by spaces, thus:
hello goodbye
Could be found on the file. I would like to read the whole line "hello goodbye" into an element of a parallel array. How can I do that?
Should be pretty straightforward.
std::vector<std::string> file_contents;
std::string line;
while ( std::getline(inFile,line) )
file_contents.push_back(line);
std::vector<std::string>::iterator it = file_contents.begin();
for(; it!=file_contents.end() ; ++it)
std::cout << *it << "\n";
Edit:
Your comment about having "hello goodbye" as element zero and element one is slightly confusing to me. The above code snip will read each line of the file and store that as an individual entry in the array 'file_contents'. If you want to read it and split it on spaces that is slightly different.
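If splitting on spaces is what you are after, one common way is to feed each line to a std::istringstream and extract words with >>. This is only a sketch, and splitWords is a name chosen for the example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a line into whitespace-separated words. (Hypothetical helper.)
std::vector<std::string> splitWords(const std::string& line)
{
    std::istringstream iss(line);
    std::vector<std::string> words;
    std::string word;
    while (iss >> word)      // >> skips leading whitespace and stops at the next whitespace
        words.push_back(word);
    return words;
}
```

Calling splitWords on the line "hello goodbye" yields two elements, "hello" and "goodbye", which could then be stored in separate containers.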
For context, you could have provided a link to your previous question, about storing two lists of words in different languages. There I provided an example of reading the contents of a text file into an array:
const int MaxWords = 100;
std::string piglatin[MaxWords];
int numWords = 0;
std::ifstream input("piglatin.txt");
std::string line;
while (std::getline(input, line) && numWords < MaxWords) {
piglatin[numWords] = line;
++numWords;
}
if (numWords == MaxWords) {
std::cerr << "Too many words" << std::endl;
}
You can't have one parallel array. For something to be parallel, there must be at least two. For parallel arrays of words, you could use declarations like this:
std::string piglatin[MaxWords];
std::string english[MaxWords];
Then you have two options for filling the arrays from the file:
Read an entire line, and then split the line into two words based on where the first space is:
while (std::getline(input, line) && numWords < MaxWords) {
std::string::size_type space = line.find(' ');
if (space == std::string::npos) {
std::cerr << "Only one word" << std::endl;
continue; // skip this line; otherwise both substr calls would copy the whole line
}
piglatin[numWords] = line.substr(0, space);
english[numWords] = line.substr(space + 1);
++numWords;
}
Read one word at a time, and assume that each line has exactly two words on it. The >> operator will read a word at a time automatically. (If each line doesn't have exactly two words, then you'll have problems. Try it out to see how things go wrong. Really. Getting experience with a bug when you know what the cause is will help you in the future when you don't know what the cause is.)
while (numWords < MaxWords && input >> piglatin[numWords] >> english[numWords]) {
++numWords; // count the pair only if both reads succeeded
}
Now, if you really want one array with two words per element, then you need to define another data structure because an array can only have one "thing" in each element. Define something that can hold two strings at once:
struct word_pair {
std::string piglatin;
std::string english;
};
Then you'll have just one array:
word_pair words[MaxWords];
You can fill it like this:
while (std::getline(input, line) && numWords < MaxWords) {
std::string::size_type space = line.find(' ');
if (space == std::string::npos) {
std::cerr << "Only one word" << std::endl;
continue; // skip this line; otherwise both substr calls would copy the whole line
}
words[numWords].piglatin = line.substr(0, space);
words[numWords].english = line.substr(space + 1);
++numWords;
}
Notice how the code indexes into the words array to find the next word_pair object, and then it uses the . operator to get to the piglatin or english field as necessary.
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
using namespace std;
int main()
{
// This will store each word (separated by a space)
vector<string> words;
// Temporary variable
string buff;
// Reads the data
fstream inFile("words.txt");
while(inFile>>buff) // loop on the read itself; testing eof() first can store the last word twice
{
words.push_back(buff);
}
inFile.close();
// Display
for(size_t i=0;i<words.size();++i) cout<<words[i]<<" ";
return 0;
}
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;
int main ()
{
vector<string> fileLines;
string line;
ifstream inFile("pigData.txt");
if ( inFile.is_open() ) {
while ( getline(inFile, line) ) { // loop on the read itself; testing eof() first can add a spurious empty line
fileLines.push_back(line);
}
inFile.close();
} else {
cerr << "Error opening file" << endl;
return 1;
}
for (size_t i=0; i<fileLines.size(); ++i) {
cout << fileLines[i] << "\n";
}
cout << endl;
return 0;
}