compare two strings by individual characters C++ - c++

Working on a program that compares an argument to text in a file (my file being a dictionary containing a lot of english words).
Right now the application works only the strings match completely.
Wanted to know if there was a way to compare a partial string that is inputted to a complete string in the file and have it be a match.
Example if the arg is ap, it'll match it to apple, application alliance ext.
# include <iostream>
# include <fstream>
# include <cstdlib>
# include <string>
using namespace std;
int main ( int argc, char *argv[] ) {
ifstream inFile;
inFile.open("/Users/mikelucci/Desktop/american-english-insane");
//Check for error
if (inFile.fail()) {
cerr << "Fail to Open File" << endl;
exit(1);
}
string word;
int count = 0;
//Use loop to read through file until the end
while (!inFile.eof()) {
inFile >> word;
if (word == argv[1]) {
count++;
}
}
cout << count << " words matched." << endl;
inFile.close();
return 0;
}

If by "match" you mean "a string from file contains a string from the input" then you can use string::find method. In this case your condition would look like that:
word.find(argv[1]) != string::npos
If by "match" you mean "a string from file starts with a string from the input" then, again you can use string::find but with the following condition:
word.find(argv[1]) == 0
The relevant documentation is here.

Start by copying argv[1] into a string (not strictly necessary, but it makes the subsequent comparison a bit simpler):
std::string target(arg[1]);
Then use std::equal:
if (std::equal(word.begin(), word.end(), target.begin(). target.end()))
This form of equal (added in C++14) returns true if the shorter of the two sequences matches the corresponding characters at the beginning of the longer.

Related

Need to read a line of text from an archive.txt until "hhh" its found and then go to the next line

My teacher gave me this assignment, where I need to read words from a file and then do some stuff with them. My issue is that these words have to end with "hhh", for example: groundhhh, wallhhh, etc.
While looking for a way to do this, I came up with the getline function from the <fstream> library. The thing is that getline(a,b,c) uses 3 arguments, where the third argument is to read until c is found but c has to be a char, so it doesn't work for me.
In essence, what I'm trying to achieve is reading a word from a file, "egghhh" for example, and make it so if "hhh" is read then it means the line finishes there and I receive "egg" as the word. My teacher used the term "sentinel" to describe this hhh thing.
This is my try:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
ifstream read_archive;
void showListWords(){
string word;
string sentinel = "hhh";
while (getline(read_archive,word,sentinel)){
cout << word << endl;
}
}
void openReadArchive(){
read_archive.open("words.txt");
if(read_archive.fail())
cout << "There is something wrong with the archive" << endl;
}
As you discovered, std::getline() allows you to specify only 1 char for the "sentinel" that it stops reading on, where '\n' (line break) is the default.
What you can do is use this default to read an entire line of text from the file into a std::string, then put that string into an std::istringstream and read words from that stream until the end of the stream is reached or you find a word that ends with "hhh", eg:
#include <sstream>
#include <limits>
void showListWords()
{
string line, word;
string sentinel = "hhh";
while (getline(read_archive, line))
{
istringstream iss(line);
while (iss >> word)
{
if (word.length() > sentinel.length())
{
string::size_type index = word.length() - sentinel.length();
if (word.compare(index, sentinel.length(), sentinel) == 0)
{
word.resize(index);
iss.ignore(numeric_limits<streamsize>::max());
}
}
cout << word << endl;
}
}
}
In which case, you could alternatively just read words from the original file stream, stopping when you find a word that ends with "hhh" and ignore() the rest of the current line before continuing to read words from the next line, eg:
#include <limits>
void showListWords()
{
string word;
string sentinel = "hhh";
while (read_archive >> word)
{
if (word.length() > sentinel.length())
{
string::size_type index = word.length() - sentinel.length();
if (word.compare(index, sentinel.length(), sentinel) == 0)
{
word.resize(index);
read_archive.ignore(numeric_limits<streamsize>:max, '\n');
}
}
cout << word << endl;
}
}

find lines (from a file) that contain a specified word

I cannot figure out how to list out the lines that contain a specified word. I am provided a .txt file that contains lines of text.
So far I have come this far, but my code is outputting the amount of lines there are. Currently this is the solution that made sense in my head:
#include <iostream>
#include <fstream>
#include <iomanip>
using namespace std;
void searchFile(istream& file, string& word) {
string line;
int lineCount = 0;
while(getline(file, line)) {
lineCount++;
if (line.find(word)) {
cout << lineCount;
}
}
}
int main() {
ifstream infile("words.txt");
string word = "test";
searchFile(infile, word);
}
However, this code simply doesn't get the results I expect.
The output should just simply state which lines have the specified word on them.
So, to sum up the solution from the comments. It is just about the std::string's find member function. It doesn't return anything compatible with a boolean, it either return an index if found, or std::string::npos if not found, which is a special constant.
So calling it with traditional way if (line.find(word)) is wrong, but instead, it should be checked this way:
if (line.find(word) != std::string::npos) {
std::cout << "Found the string at line: " << lineCount << "\n";
} else {
// String not found (of course this else block could be omitted)
}

Finding pattern in a text in C++

I have written the following code to find the number of "ATA" in a text that is read to a string as "GCTATAATAGCCATA". The count returned should be 3 but it returns 0. When I check in debugger the string for text is initially created. However, when an empty string is passed to the function patternCount. Am I reading the contents of the file into the string text correctly?
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
void patternCount(string text, string pattern);
int main()
{
string text;
fstream file_("test.txt");
if(file_.is_open())
{
while(getline(file_,text))
{
cout << text << '\n';
}
file_.close();
}
cout << "Enter a string ";
string pattern;
getline(cin, pattern);
patternCount(text, pattern);
return 0;
}
void patternCount(string text, string pattern)
{
int count = 0;
size_t nPos = text.find(pattern, 0);
while (nPos != string::npos)
{
nPos = text.find(pattern, nPos + 1);
++count;
}
cout << "There are " << count <<" " << pattern << " in your text.\n";
}
This code just counts the number of occurrence of input string in the last line of text file. If that line is empty or no does not contain the string, The output result will be 0.
But I guess the OP wants to search a whole file, in which case the main function need be fixed accordingly.
std::ifstream file{"test.txt"};
std::ostringstream text;
std::copy(std::istream_iterator<char>{file}, std::istream_iterator<char>{},std::ostream_iterator<char>{text});
//...
patternCount(text.str(), pattern);
So if I understand correctly, you're not sure if you're reading correctly the contents from the file test.txt. If you want to read every content, then try this instead:
ifstream file_("test.txt");
string s,text;
file_>>s;
text=s;
while(file_>>s)
{
text=text+" "+s;
}
This should probably work. Note that reading from a file like filename>>string only reads till the first space. That's why I'm using the while. You can also use getline(), which reads the whole text with spaces. Also note that you should include fstream. Printing out the text should help more as well.
#include <iostream>
#include <fstream>
#include <string>
using std::cout;
using std::cerr;
using std::string;
int count = 0; // we will count the total pattern count here
void patternCount(string text, string pattern);
int main() {
cout << "Enter a string ";
string pattern;
std::getline(cin, pattern);
string text;
fstream file_("test.txt");
if(file_.is_open()){
while(std::getline(file_,text))
patternCount(text,pattern);
file_.close();
}else
cerr<<"Failed to open file";
cout << "There are " << count <<" " << pattern << " in your text.\n";
return 0;
}
void patternCount(string text, string pattern){
size_t nPos = text.find(pattern, 0);
while (nPos != string::npos) {
nPos = text.find(pattern, nPos + 1);
++count;
}
}
The Problem
Your code was good, there were no bugs in patternCount function.
But You were reading the file in an incorrect way. See, everytime you call std::getline(file_, text), the old result of the _text are overwritten by new line. So, in the end of the loop, when you pass text to patternCount function, your text only contains the last line of the file.
The Solution
You could have solved it in two ways:
As mentioned above, you could run patternCount() to each line in while loop and update a global count variable.
You could append all the lines to text in while loop and at last call the patternCount function.
Whichever you prefer, I have implemented the first, while second one is in other answers.

Count first digit on each line of a text file

My project takes a filename and opens it. I need to read each line of a .txt file until the first digit occurs, skipping whitespace, chars, zeros, or special chars. My text file could look like this:
1435 //1, nextline
0 //skip, next line
//skip, nextline
(*Hi 245*) 2 //skip until second 2 after comment and count, next line
345 556 //3 and count, next line
4 //4, nextline
My desired output would be all the way up to nine but I condensed it:
Digit Count Frequency
1: 1 .25
2: 1 .25
3: 1 .25
4: 1 .25
My code is as follows:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main() {
int digit = 1;
int array[8];
string filename;
//cout for getting user path
//the compiler parses string literals differently so use a double backslash or a forward slash
cout << "Enter the path of the data file, be sure to include extension." << endl;
cout << "You can use either of the following:" << endl;
cout << "A forwardslash or double backslash to separate each directory." << endl;
getline(cin,filename);
ifstream input_file(filename.c_str());
if (input_file.is_open()) { //if file is open
cout << "open" << endl; //just a coding check to make sure it works ignore
string fileContents; //string to store contents
string temp;
while (!input_file.eof()) { //not end of file I know not best practice
getline(input_file, temp);
fileContents.append(temp); //appends file to string
}
cout << fileContents << endl; //prints string for test
}
else {
cout << "Error opening file check path or file extension" << endl;
}
In this file format, (* signals the beginning of a comment, so everything from there to a matching *) should be ignored (even if it contains a digit). For example, given input of (*Hi 245*) 6, the 6 should be counted, not the 2.
How do I iterate over the file only finding the first integer and counting it, while ignoring comments?
One way to approach your problem is the following:
Create a std::map<int, int> where the key is the digit and the value is the count. This allows you to compute statistics on your digits such as the count and the frequency after you have parsed the file. Something similar can be found in this SO answer.
Read each line of your file as a std::string using std::getline as shown in this SO answer.
For each line, strip the comments using a function such as this:
std::string& strip_comments(std::string & inp,
std::string const& beg,
std::string const& fin = "") {
std::size_t bpos;
while ((bpos = inp.find(beg)) != std::string::npos) {
if (fin != "") {
std::size_t fpos = inp.find(fin, bpos + beg.length());
if (fpos != std::string::npos) {
inp = inp.erase(bpos, fpos - bpos + fin.length());
} else {
// else don't erase because fin is not found, but break
break;
}
} else {
inp = inp.erase(bpos, inp.length() - bpos);
}
}
return inp;
}
which can be used like this:
std::string line;
std::getline(input_file, line);
line = strip_comments(line, "(*", "*)");
After stripping the comments, use the string member function find_first_of to find the first digit:
std::size_t dpos = line.find_first_of("123456789");
What is returned here is the index location in the string for the first digit. You should check that the returned position is not std::string::npos, as that would indicate that no digits are found. If the first digit is found, the corresponding character can be extracted using const char c = line[dpos]; and converted to an integer using std::atoi.
Increment the count for that digit in the std::map as shown in that first linked SO answer. Then loop back to read the next line.
After reading all lines from the file, the std::map will contain the counts for all first digits found in each line stripped of comments. You can then iterate over this map to retrieve all the counts, accumulate the total count over all digits found, and compute the frequency for each digit. Note that digits not found will not be in the map.
I hope this helps you get started. I leave the writing of the code to you. Good luck!

Using Boost-Regex to parse string into characters and numerals

I'd like to use Boost's Regex library to separate a string containing labels and numbers into tokens. For example 'abc1def002g30' would be separated into {'abc','1','def','002','g','30'}. I modified the example given in Boost documentation to come up with this code:
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
int main(int argc,char **argv){
string s,str;
int count;
do{
count=0;
if(argc == 1)
{
cout << "Enter text to split (or \"quit\" to exit): ";
getline(cin, s);
if(s == "quit") break;
}
else
s = "This is a string of tokens";
boost::regex re("[0-9]+|[a-z]+");
boost::sregex_token_iterator i(s.begin(), s.end(), re, 0);
boost::sregex_token_iterator j;
while(i != j)
{
str=*i;
cout << str << endl;
count++;
i++;
}
cout << "There were " << count << " tokens found." << endl;
}while(argc == 1);
return 0;
}
The number of tokens stored in count is correct. However, *it contains only an empty string so nothing is printed. Any guesses as to what I am doing wrong?
EDIT: as per the fix suggested below, I modified the code and it now works correctly.
From the docs on the sregex_token_iterator:
Effects: constructs a regex_token_iterator that will enumerate one string for each regular expression match of the expression re found within the sequence [a,b), using match flags m (see match_flag_type). The string enumerated is the sub-expression submatch for each match found; if submatch is -1, then enumerates all the text sequences that did not match the expression re (that is to performs field splitting)
Since your regex matching all items (unlike the sample code, which only matched the strings), you get empty results.
Try replacing it with a 0.