Need to create a word matcher in C++

I need to create a word matcher that counts how many times specific words appear in a text file. Here is what I have done so far; I am not sure what I've done wrong. One text file contains a long paragraph, the other just contains a few words. I need to compare the two files: for example, the word "and" is in the short text file, so I need to count how many times it appears in the long paragraph and then print a report at the end of the program.
E.g. and - 6 times, but - 0 times, it - 23 times.
Something like this. I'm not sure how to start making it.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
ifstream infile("text1.txt");
if(!infile)
{
cout << "Error";
}
string words[250];
int counter = 0;
while (!infile.eof() )
{
infile >> words[counter];
counter++;
}
ifstream infile2("banned.txt");
if(!infile2)
{
cout << "Error";
}
string bannedwords[250];
counter = 0;
while (!infile2.eof() )
{
infile2 >> words[counter];
counter++;
}
int eatcount= 0;
int orcount = 0;
int hellocount = 0;
int number;
for(int i=0; i<200; i++)
{
for(int j = 0; j < 8; j++)
{
if ( words[i] == bannedwords[j])
{
cout << words[i] << " ";
if (words[i]=="eat")
{
eatcount++;
}
else if (words[i] == "or")
{
orcount++;
}
else if (words[i]== "hello")
{
hellocount++;
}
}
}
}
cout << endl;
cout<< "eat was found "<<eatcount<<" times";
cout << endl;
cout<< "or was found "<<orcount<<" times";
cout << endl;
cout<< "hello was found "<<hellocount<<" times";
system("pause");
}

Why not use a std::multiset?
ifstream infile("text1.txt");
if(!infile)
{
cout << "Error";
}
std::multiset<string> words;
string tmp;
while (infile >> tmp) // loop ends as soon as a read fails, so no empty word is inserted at end of file
{
words.insert(tmp);
}
Then also use a map for the banned words:
ifstream infile2("banned.txt");
if(!infile2)
{
cout << "Error";
}
std::map<string, int> banned;
string tmp;
while (infile2 >> tmp)
{
banned[tmp] = 0; // operator[] inserts the word with a count of 0
}
Then you can use std::multiset::count(string) to find the words without all the extra looping. You would only need one loop to go through your banned words list. e.g:
std::map<string, int>::iterator bannedwordIter = banned.begin();
for( ; bannedwordIter != banned.end(); ++bannedwordIter )
{
bannedwordIter->second = words.count(bannedwordIter->first);
// you could print here as you process, or have another loop that prints it all after you finish
cout << bannedwordIter->first << " - " << bannedwordIter->second << " times." << endl;
}

A minimal way would be to use regular expressions, like so
#include <iostream>
#include <fstream>
#include <string>
#include <iterator> // std::distance
#include <regex>
using namespace std;
unsigned countMatches(std::istream &is, std::string const &word)
{
string text;
unsigned count(0);
std::regex const expression(word);
while (getline(is, text)) {
count += distance(sregex_iterator(
text.begin(), text.end(), expression), sregex_iterator());
}
return count;
}
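So you just pass it the input stream (in your case an input file stream), and it counts the occurrences of the specified word after creating a regular expression that matches that word. Note that the pattern matches anywhere in a line, so searching for "and" also counts the "and" inside "band".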
int main()
{
ifstream ifs;
ifs.open("example_text_file.txt");
cout << countMatches(ifs, "word_you_want_to_search_for") << endl;
return 0;
}

Related

Wordcount function having trouble adding words to a set of unique words

I'm writing a wordcount function that should read everything from stdin into a string, then evaluate the string and return the number of words, the number of lines, the size of the string, and the number of unique words.
My issue is with adding words to the set of unique words. The way I wrote it, the whitespace is counted as part of the word, and the whole thing is pushed into my set.
Example:
Input:
this is
is
a test
test
Output
a
test
is test this
line is 4
Words = 7
size is 27
Unique is 6
It counts 7 words in total and 6 unique. I tried debugging it by printing bits of the code as I go so I can keep track of where I went wrong. I can only conclude that the issue lies within my if statements. How can I get past this? I've been stuck for some time now.
Here is my code:
#include<iostream>
#include<string>
#include<set>
using std::string;
using std::set;
using std::cin;
using std::cout;
set<string> UNIQUE;
size_t sfind(const string s) //will take string a count words, add to set
{
string a;
int linecount = 0;
int state = 0; //0 represents reading whitespace/tab, 1 = reading letter
int count = 0; //word count
for(size_t i =0; i < s.length(); i++) {
a+=s[i]; //add to new string to add to set
if(state ==0) { //start at whitespace
if(state != ' ' && state != '\t') { //we didnt read whitespace
count++;
state =1;
}
}
else if(s[i]== ' ' || s[i] == '\t' || s[i] == '\n') {
state = 0;
UNIQUE.insert(a); //add to UNIQUE words
a.clear(); // clear and reset the string
}
if (s[i] == '\n') {
linecount++;
}
}
for(set<string>::iterator i = UNIQUE.begin(); i!= UNIQUE.end(); i++) {
cout << *i;
}
cout << '\n';
cout << "line is " << linecount << '\n';
return count;
}
int main()
{
char c;
string s;
while(fread(&c,1,1,stdin)) {
s+=c; //read element add to string
}
cout << "Words = " << sfind(s) << '\n';
cout << "size is " << s.length() << '\n';
cout << "Unique is "<< UNIQUE.size() << '\n';
return 0;
}
Also I will be using
fread(&c,1,1,stdin)
because I will be using it later on with a larger wordcount function.
Rather than writing code trying to parse the string on spaces, use std::istringstream to do the parsing.
Here is an example:
#include <string>
#include <iostream>
#include <sstream>
#include <set>
int main()
{
std::set<std::string> stringSet;
std::string line;
while (std::getline(std::cin, line))
{
std::istringstream oneline(line);
std::string word;
while (oneline >> word)
{
std::cout << word << "\n";
stringSet.insert(word);
}
}
std::cout << "\n\nThere are " << stringSet.size() << " unique words";
}

Conversion of string to char in c++

I am trying to build a spell-check validator using a hash table. I have a list of words in a text file. I want to import them into the program and insert them into the hash table using separate chaining. When I try to run the program, I get these two errors. Can anyone help me with this?
at line 30-- no matching function for call to
'std::__cxx11::basic_string::push_back(std::__cxx11::string&)'
at line 40-- no match for 'operator==' (operand types are
'__gnu_cxx::__alloc_traits, char>::value_type'
{aka 'char'} and 'std::__cxx11::string' {aka
'std::__cxx11::basic_string'})
I know it's a simple mistake of converting a string to a char, but I couldn't figure out how to fix it without changing the rest of the program.
I would like a simple solution that does not change the existing code.
If that is not possible, please tell me how to proceed.
#include<iostream>
#include <string>
#include <cstring>
#include <fstream>
std::string hashTable[27];
int hashTableSize = 27;
#define MAX_LEN 27
using namespace std;
int hashFunc(std::string s)
{
// A simple hashing, no collision handled
int sum=0,index=0;
for(std::string::size_type i=0; i < s.length(); i++)
{
sum += s[i];
}
index = sum % MAX_LEN;
return index;
}
void insert(std::string s)
{
// Compute the index using Hash Function
int index = hashFunc(s);
// Insert the element in the linked list at the particular index
hashTable[index].push_back(s);
}
void search(string s)
{
//Compute the index by using the hash function
int index = hashFunc(s);
//Search the linked list at that specific index
for(int i = 0;i < hashTable[index].size();i++)
{
if(hashTable[index][i] == s)
{
cout << s << " is found!" << endl;
return;
}
}
cout << s << " is not found!" << endl;
}
int main(){
//opening text file
std::ifstream inFile;
inFile.open("un.txt");
// If text file doesnot exist or not included in root folder.
if(inFile.fail()) {
std::cerr << "Error opening file"<< std::endl ;
exit(1);
}
//if text file exists.
std::string wordsinfile;
std::string words[100];
int count=0,i=0;
std::string str;
// writing words from text file into Array.
while( !inFile.eof()) {
inFile >> wordsinfile;
words[i]=wordsinfile;
count++;
i++;
}
for (i=0;i<100;i++){
std::cout<< words[i]<<std::endl;
}
for(i=0;i<=23;i++) {
insert(words[i]);
}
int choice;
string z;
string y;
while(1) {
cout << "Enter choice. 1) Insert\n 2) Search\n 3) Exit\n";
cin >> choice;
switch (choice) {
case 1:
cin>>y;
insert(y);
break;
case 2:
cin>>z;
search(z);
break;
case 3:
exit(0);
}
}
return 0;
}
My txt file had 38 different words and size of hash table is 27
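Here's a corrected version of your program. Each slot of your hash table is supposed to hold a collection of strings, but std::string hashTable[27] makes each slot a single string, so push_back expects a char and hashTable[index][i] is a char; that is where both errors come from. Since you are using indexing in the search function, making each slot an STL vector of strings should do the trick.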
#include <iostream>
#include <string>
#include <cstring>
#include <fstream>
#include <vector>
std::vector<std::string> hashTable[27];
int hashTableSize = 27;
#define MAX_LEN 27
using namespace std;
int hashFunc(std::string s)
{
// A simple hashing, no collision handled
int sum=0,index=0;
for(std::string::size_type i=0; i < s.length(); i++)
{
sum += s[i];
}
index = sum % MAX_LEN;
return index;
}
void insert(std::string s)
{
// Compute the index using Hash Function
int index = hashFunc(s);
// Insert the element in the linked list at the particular index
hashTable[index].push_back(s);
}
void search(string s)
{
//Compute the index by using the hash function
int index = hashFunc(s);
//Search the linked list at that specific index
for(int i = 0;i < hashTable[index].size();i++)
{
if(hashTable[index][i] == s)
{
cout << s << " is found!" << endl;
return;
}
}
cout << s << " is not found!" << endl;
}
int main(){
//opening text file
std::ifstream inFile;
inFile.open("un.txt");
// If text file doesnot exist or not included in root folder.
if(inFile.fail()) {
std::cerr << "Error opening file"<< std::endl ;
exit(1);
}
//if text file exists.
std::string wordsinfile;
std::vector<std::string> words;
std::string str;
// writing words from text file into Array.
while( inFile >> wordsinfile) {
words.push_back(std::move(wordsinfile));
}
std::cout << "Total words read: " << words.size() << std::endl;
for (int i = 0;i < words.size(); i++){
std::cout << words[i] << std::endl;
}
for(int i=0; i < words.size(); i++) {
insert(words[i]);
}
int choice;
string z;
string y;
while(1) {
cout << "Enter choice. 1) Insert\n 2) Search\n 3) Exit\n";
cin >> choice;
switch (choice) {
case 1:
cin>>y;
insert(y);
break;
case 2:
cin>>z;
search(z);
break;
case 3:
exit(0);
}
}
return 0;
}

Read new line character in fstream C++

How do I read the new line character? I am trying to do a character count, but the new line gets in the way. I tried doing if (text[i] == ' ' && text[i] == '\n') but that didn't work. Here is my repl.it session.
I am trying to read this from file.txt:
i like cats
dogs are also cool
so are orangutans
This is my code:
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
ifstream input;
input.open("file.txt");
int numOfWords = 0;
while (true)
{
string text;
getline(input, text);
for(int i = 0; i < text.length(); i++)
{
if (text[i] == ' ')
{
numOfWords++;
}
}
if (input.fail())
{
break;
}
}
cout << "Number of words: " << numOfWords+1 << endl;
input.close();
}
Your question is asking how to count characters, but your code is counting words instead. std::getline() swallows line breaks. You don't need to worry about them if you want to count words. In fact, you can use operator>> to greatly simplify your counting in that case, eg:
int main()
{
ifstream input("file.txt");
int numOfWords = 0;
string word;
while (input >> word)
++numOfWords;
cout << "Number of words: " << numOfWords << endl;
return 0;
}
If you really want to count characters instead of words, use std::ifstream::get() to read the file 1 character at a time, eg:
int main()
{
ifstream input("file.txt");
int numOfChars = 0;
int numOfWords = 0;
bool isInSpace = true;
char ch;
while (input.get(ch))
{
++numOfChars;
if (std::isspace(ch, input.getloc())) {
isInSpace = true;
}
else if (isInSpace) {
isInSpace = false;
++numOfWords;
}
}
cout << "Number of chars: " << numOfChars << endl;
cout << "Number of words: " << numOfWords << endl;
return 0;
}

Reading from a text file into an array

I'm just a beginner in C++.
I want to read a text file (maximum of 1024 words) into an array, and I need to ignore all single-character words. Can you help me discard single-character words so that symbols and special characters are skipped?
This is my code:
#include <fstream>
#include <string>
#include <iostream>
using namespace std;
const int SIZE = 1024;
void showArray(string names[], int SIZE){
cout << "Unsorted words: " << endl;
for (int i = 0; i < SIZE; i++){
cout << names[i] << " ";
cout << endl;
}
cout << endl;
}
int main()
{
int count = 0;
string names[SIZE];
// Ask the user to input the file name
cout << "Please enter the file name: ";
string fileName;
getline(cin, fileName);
ifstream inputFile;
inputFile.open(fileName);
// If the file name cannot open
if (!inputFile){
cout << "ERROR opening file!" << endl;
exit(1);
}
// sort the text file into array
while (count < SIZE)
{
inputFile >> names[count];
if (names[count].length() == 1);
else
{
count++;
}
}
showArray(names, SIZE); // This function will show the array on screen
system("PAUSE");
return 0;
}
If you change names into a std::vector, then you can populate it using push_back. You could fill names like this.
for (count = 0; count < SIZE; count++)
{
std::string next;
if (!(inputFile >> next)) break; // stop at end of file
if (next.length() > 1) // no semicolon here, or the block below would always run
{
names.push_back(next);
}
}
Alternatively you could read all the words into names and then use the erase-remove idiom. Because the filter is a predicate rather than a value, the algorithm to use is std::remove_if.
std::copy(std::istream_iterator<std::string>(inputFile),
std::istream_iterator<std::string>(),
std::back_inserter(names));
names.erase(std::remove_if(names.begin(), names.end(),
[](const std::string& x){return x.length() == 1;}), names.end());

How to read a file into vector in C++?

I need to read from a .data or .txt file containing a new float number on each line into a vector.
I have searched far and wide and applied numerous different methods, but every time I get the same result: a Main.size() of 0 and an error saying "Vector Subscript out of Range", so evidently nothing is being read from the file into the vector.
Note: the file is both in the folder and also included in the VS project.
Anyway, here's my code:
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <string>
using namespace std;
int main() {
vector<double> Main;
int count;
string lineData;
double tmp;
ifstream myfile ("test.data", ios::in);
double number;
myfile >> count;
for(int i = 0; i < count; i++) {
myfile >> tmp;
Main.push_back(tmp);
cout << count;
}
cout << "Numbers:\n";
cout << Main.size();
for (int i=0; i=((Main.size())-1); i++) {
cout << Main[i] << '\n';
}
cin.get();
return 0;
}
The result I get is always simply:
Numbers:
0
Your loop is wrong:
for (int i=0; i=((Main.size())-1); i++) {
Try this:
for (int i=0; i < Main.size(); i++) {
Also, a more idiomatic way of reading numbers into a vector and writing them to stdout is something along these lines:
#include <iostream>
#include <iterator>
#include <fstream>
#include <vector>
#include <algorithm> // for std::copy
int main()
{
std::ifstream is("numbers.txt");
std::istream_iterator<double> start(is), end;
std::vector<double> numbers(start, end);
std::cout << "Read " << numbers.size() << " numbers" << std::endl;
// print the numbers to stdout
std::cout << "numbers read in:\n";
std::copy(numbers.begin(), numbers.end(),
std::ostream_iterator<double>(std::cout, " "));
std::cout << std::endl;
}
although you should check the status of the ifstream for read errors.
Just to expand on juanchopanza's answer a bit...
for (int i=0; i=((Main.size())-1); i++) {
cout << Main[i] << '\n';
}
does this:
Create i and set it to 0.
Set i to Main.size() - 1. Since Main is empty, Main.size() is 0, and i gets set to -1.
Main[-1] is an out-of-bounds access. Kaboom.
Just a piece of advice.
Instead of writing
for (int i=0; i=((Main.size())-1); i++) {
cout << Main[i] << '\n';
}
as suggested above, write a:
for (vector<double>::iterator it=Main.begin(); it!=Main.end(); it++) {
cout << *it << '\n';
}
to use iterators. If you have C++11 support, you can declare the iterator as auto it=Main.begin() (just a handy shortcut though)
This avoids the nasty one-position-out-of-bound error caused by leaving out a -1 unintentionally.
1. In the loop you are assigning a value rather than comparing values, so
i=((Main.size())-1) -> i=(-1), since Main.size() is 0.
Main[i] will yield "Vector Subscript out of Range" because i = -1.
2. You get Main.size() as 0 maybe because it can't find the file. Give the full file path and check the output. Also, it would be good to initialize the variables.
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;
int main()
{
fstream dataFile;
string name , word , new_word;
vector<string> test;
char fileName[80];
cout<<"Please enter the file name : ";
cin >> fileName;
dataFile.open(fileName);
if(dataFile.fail())
{
cout<<"File can not open.\n";
return 0;
}
cout<<"File opened.\n";
cout<<"Please enter the word : ";
cin>>word;
cout<<"Please enter the new word : ";
cin >> new_word;
while (dataFile >> name) // stops at end of file without storing the last word twice
{
test.push_back(name);
}
dataFile.close();
}
//file name must be of the form filename.yourfileExtension
std::vector<std::string> source;
bool getFileContent(std::string & fileName)
{
if (fileName.substr(fileName.find_last_of(".") + 1) =="yourfileExtension")
{
// Open the File
std::ifstream in(fileName.c_str());
// Check if object is valid
if (!in)
{
std::cerr << "Cannot open the File : " << fileName << std::endl;
return false;
}
std::string str;
// Read the next line from File untill it reaches the end.
while (std::getline(in, str))
{
// Line contains string of length > 0 then save it in vector
if (str.size() > 0)
source.push_back(str);
}
/*for (size_t i = 0; i < source.size(); i++)
{
lexer(source[i], i);
cout << source[i] << endl;
}
*/
//Close The File
in.close();
return true;
}
else
{
std::cerr << ":VIP does not support this file type" << std::endl;
std::cerr << "supported extension is filename.yourfileExtension" << std::endl;
return false; // every path of a function returning bool must return a value
}
}