Word count in c++ after using (getline(cin,input))? - c++

I am extremely new to this. I have an assignment to count the number of lines, words, characters, unique lines, and unique words from user input. So far my code handles lines, unique lines, and characters. I thought I had the word count too, but it breaks when the input contains double spaces or tabs. I also have no idea how to find the unique words. Please help.
Code:
// What I dont have:
//words
//Total words
#include <iostream>
#include <string>
#include <set>
using namespace std;
unsigned long countWords(const string& s, set<string>& wl); //total words
int main()
{
int linenum=0, charnum=0, totalwords=0;
set<string> lines;
string input;
set<string> unique; //to store unique words from countWords function
while (getline(cin,input))
{
lines.insert(input);
linenum++;
charnum+= input.length();
totalwords += countWords(input,unique);
}
cout << linenum <<" "<< totalwords <<" "<< charnum <<" " << lines.size()<<" " << unique.size()<< endl;
system("PAUSE");
return 0;
}
unsigned long countWords(const string& s, set<string>& wl) //total words
{
int wcount=1;
for (unsigned int i=0; i < s.length(); i++)
{
if ((s.at(i) == ' ')&&(s.at(i)+1 !='\0')) {
wcount++;
}
}
return wcount;
}

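The +1 has to go inside the parentheses (s.at(i + 1), not s.at(i) + 1), but s.at(i + 1) throws std::out_of_range at the last character, so it is simpler to treat the end of the line as a separator. Your function could look like this:
unsigned long countWords(const string& s, set<string>& wl) //total words
{
    unsigned long wcount = 0; // initial value must be zero
    string::size_type N = 0;  // number of characters in the word being scanned
    for (string::size_type i = 0; i <= s.length(); i++)
    {
        // Test the end of the line first so that s[i] is never read out of range.
        if (i == s.length() || s[i] == ' ' || s[i] == '\t') {
            if (N > 0) { // repeated spaces or tabs produce no word
                wl.insert(s.substr(i - N, N));
                ++wcount;
            }
            N = 0;
        } else ++N;
    }
    return wcount;
}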

Here is an example of how the function could look
#include <iostream>
#include <sstream>
#include <set>
#include <string>
#include <iterator>
#include <algorithm>
unsigned long countWords( std::set<std::string> &wl, const std::string &s )
{
std::istringstream is( s );
wl.insert( std::istream_iterator<std::string>( is ),
std::istream_iterator<std::string>() );
is.clear();
is.str( s );
return ( std::distance( std::istream_iterator<std::string>( is ),
std::istream_iterator<std::string>() ) );
}
//...
In this example, punctuation marks are considered parts of words.
If you do not yet know std::istringstream and the other C++ facilities used above, you can write the function the following way:
#include <iostream>
#include <set>
#include <string>
unsigned long countWords( std::set<std::string> &wl, const std::string &s )
{
const char *white_space = " \t";
unsigned long count = 0;
for ( std::string::size_type pos = 0, n = 0;
( pos = s.find_first_not_of( white_space, pos ) ) != std::string::npos;
pos = n == std::string::npos ? s.size() : n )
{
++count;
n = s.find_first_of( white_space, pos );
wl.insert( s.substr( pos, ( n == std::string::npos ? std::string::npos : n - pos ) ) );
}
return count;
}
//...
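The stream-based version above reads the line twice, once to fill the set and once to count. As a minimal sketch, assuming the same definition of a word (anything operator>> extracts, punctuation included), both jobs can be done in one pass. The name countWordsOnePass is made up for this illustration, and the parameter order follows the declaration in the question.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical single-pass variant: counts the words in one line and
// records each of them in wl. A word is whatever operator>> extracts,
// so punctuation stays attached to the word.
unsigned long countWordsOnePass(const std::string& s, std::set<std::string>& wl)
{
    std::istringstream is(s);
    std::string word;
    unsigned long count = 0;
    while (is >> word) {   // skips any run of spaces or tabs between words
        wl.insert(word);
        ++count;
    }
    return count;
}
```

It can be called from the getline loop in the question exactly like countWords(input, unique).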

Related

Trying to remove all non alpha characters from a string using C++ what's the best way to do that given the code that I have?

I'm a beginner with C++ and not too familiar with the language yet. So what would be the simplest way to fix my code? I think there's something wrong with userInput.insert(i, ""); but I'm not sure what.
Example: If the input is: -Hello, 1 world$!
The output would be: Helloworld
#include <iostream>
#include<string>
using namespace std;
int main() {
string userInput;
string lowerAlpha = "abcdefghijklmnopqrstuvwxyz";
string upperAlpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
getline(cin, userInput);
for (int i = 0; i < userInput.size(); ++i) {
for (int j = 0; j < 26; ++j) {
if ((userInput.at(i) != lowerAlpha.at(j)) || (userInput.at(i) != upperAlpha.at(j))) {
userInput.insert(i, "");
}
}
}
cout << userInput << endl;
return 0;
}
If your compiler supports C++20, you can use the standard function std::erase_if with a lambda, for example:
#include <string>
#include <cctype>
//...
std::erase_if( userInput, []( unsigned char c ) { return not std::isalpha( c ); } );
Otherwise, use the standard algorithm std::remove_if together with the member function erase, for example:
#include <string>
#include <iterator>
#include <algorithm>
#include <cctype>
//...
userInput.erase( std::remove_if( std::begin( userInput ),
std::end( userInput ),
[]( unsigned char c )
{
return not std::isalpha( c );
} ), std::end( userInput ) );
If you want to keep your approach with for loops, the loops can look like this, for example:
#include <iostream>
#include <string>
#include <string_view>
#include <cctype>
int main()
{
const std::string_view lowerAlpha = "abcdefghijklmnopqrstuvwxyz";
const std::string_view upperAlpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
std::string userInput = "-Hello, 1 world$!";
std::cout << userInput << '\n';
for ( std::string::size_type i = 0; i < userInput.size(); )
{
if ( lowerAlpha.find( userInput[i] ) == std::string_view::npos &&
upperAlpha.find( userInput[i] ) == std::string_view::npos )
{
userInput.erase( i, 1 );
}
else
{
++i;
}
}
std::cout << userInput << '\n';
}
The program output is
-Hello, 1 world$!
Helloworld
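For context on why the loop in the question changes nothing: userInput.insert(i, "") inserts an empty string, which leaves userInput as it was, and the || condition is true for every character because no character can equal both lowerAlpha.at(j) and upperAlpha.at(j). If modifying the string in place is not required, a filtered copy is another option. The following is only a sketch; the helper name keepAlpha is made up for this illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: returns a copy of input with every non-alphabetic
// character dropped (classification follows the current C locale).
std::string keepAlpha(const std::string& input)
{
    std::string result;
    std::copy_if(input.begin(), input.end(), std::back_inserter(result),
                 [](unsigned char c) { return std::isalpha(c) != 0; });
    return result;
}
```

In the question's program this would be used as userInput = keepAlpha(userInput); right after the getline call.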

C++ How to write a code that counts top words while removing any special characters from text file

How do I take a text file from the command line that opens and reads it, and then count the top words in that file but also removes any special characters. I have this code done here and used maps but it isn't counting every word. For instance "hello." is one word and also "$#%hello<>?/". I have this file from the song shake it off that's supposed to read shake 78 times but I only counted 26 in this code.
#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <vector>
using namespace std;
string ask(const string& msg) {
string ans;
cout << msg;
getline(cin, ans);
return ans;
}
int main() {
ifstream fin( ask("Enter file name: ").c_str() ) ;
if (fin.fail()) {
cerr << "ERROR"; // this is if the file fails to open
return 1;
}
map<string, int> wordCount;
string entity;
while (fin >> entity) {
vector<string> words;
for (int i = 0, a = 0; i < entity.length(); i++) {
char& c = entity[i];
if (c < 'A' || (c > 'Z' && c < 'a') || c > 'z') {
string word = entity.substr(a, i - a);
a = i + 1;
if (word.length() > 0)
words.push_back(word);
}
}
for (auto & word : words)
wordCount[word]++;
}
fin.close();
vector<string> topWords;
const size_t MAX_WORDS = 10;
for ( auto iter = wordCount.begin(); iter != wordCount.end(); iter ++ ) {
int som = 0, lim = topWords.size();
while (som < lim) {
int i = ( som + lim ) / 2;
int count = wordCount[topWords[i]];
if ( iter -> second > count)
lim = i;
else if ( iter -> second < count )
som = i + 1;
else
som = lim = i;
}
if (som < MAX_WORDS ) {
topWords.insert( topWords.begin() + som, iter -> first );
if ( topWords.size() > MAX_WORDS )
topWords.pop_back();
}
}
for (auto & topWord : topWords)
cout << "(" << wordCount[topWord] << ")\t" << topWord << endl;
return 0;
}
One last thing I could use help with: how would I take a number from the command line alongside the filename and display that many top words? I assume some argument parsing is involved.
Thank you again!
https://s3.amazonaws.com/mimirplatform.production/files/48a9fa64-cddc-4e45-817f-3e16bd7772c2/shake_it_off.txt
!hi!
#hi#
#hi#
$hi$
%hi%
^hi^
&hi&
*hi*
(hi(
)hi)
_hi_
-hi-
+hi+
=hi=
~hi~
`hi`
:hi:
;hi;
'hi'
"hi"
<hi<
>hi>
/hi/
?hi?
{hi{
}hi}
[hi[
]hi]
|hi|
\hi\
bob bob bob bob bob bob bob !###$$%#&#^*()#*#)_++(#<><#:":bob###$$%#&#^*()#*#)_++(#<><#:":
!###$$%#&#^*()#*#)_++(#<><#:":bob###$$%#&#^*()#*#)_++(#<><#:": !###$$%#&#^*()#*#)_++(#<><#:":bob###$$%#&#^*()#*#)_++(#<><#:":
!###$$%#&#^*()#*#)_++(#<><#:":bob###$$%#&#^*()#*#)_++(#<><#:": !###$$%#&#^*()#*#)_++(#<><#:":bob###$$%#&#^*()#*#)_++(#<><#:
this is the special character test
Your original code is somewhat hard to refine, so I followed your description and wrote a program that uses the STL:
Combine erase with remove_if to remove unwanted characters.
Use a multiset to re-sort the words by count.
If you have some experience with Boost, this is a use case for multimap or bimap, which can make the code even cleaner.
#include <algorithm>
#include <cctype>
#include <fstream>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>
using namespace std;
string ask(const string& msg) {
string ans;
cout << msg;
getline(cin, ans);
return ans;
}
int main() {
// ifstream fin(ask("Enter file name: ").c_str());
ifstream fin("shake_it_off.txt");
if (fin.fail()) {
cerr << "ERROR"; // this is if the file fails to open
return 1;
}
map<string, size_t> wordCount;
string entity;
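while (fin >> entity) {
// Take the character as unsigned char: passing a negative char to isalpha is undefined.
entity.erase(std::remove_if(entity.begin(), entity.end(),
[](unsigned char ch) { return !isalpha(ch); }),
entity.end());
if (!entity.empty()) // a token made only of special characters leaves nothing to count
wordCount[entity] += 1;
}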
auto cmp = [](const std::pair<std::string, size_t>& lhs,
const std::pair<std::string, size_t>& rhs) {
return lhs.second > rhs.second;
};
std::multiset<std::pair<std::string, size_t>, decltype(cmp)> top(
wordCount.begin(), wordCount.end(), cmp);
auto it = top.begin();
const size_t MAX_WORDS = 10;
for (size_t i = 0; i < MAX_WORDS && it != top.end(); ++i, ++it) {
cout << "(" << it->second << ")\t" << it->first << endl; // (count) word, as in the question
}
return 0;
}
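The program above does not cover the last part of the question, taking the file name and the number of top words from the command line. Here is a minimal sketch of one way to do it, assuming the program is started as program <file> [count]. The names Options and parseArgs are made up for this illustration, and the default of 10 simply mirrors MAX_WORDS.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical container for the two command-line values.
struct Options {
    std::string fileName;
    std::size_t maxWords = 10;   // default when no number is given
};

// Expects: program <file> [count].
// Throws std::invalid_argument (or std::out_of_range from std::stoul)
// when the arguments cannot be used.
Options parseArgs(int argc, char* argv[])
{
    if (argc < 2 || argc > 3)
        throw std::invalid_argument("usage: program <file> [count]");
    Options opts;
    opts.fileName = argv[1];
    if (argc == 3) {
        const std::string text = argv[2];
        std::size_t used = 0;
        const unsigned long value = std::stoul(text, &used); // throws if text is not a number
        // Reject trailing junk such as "5x" and negative values, which stoul would wrap around.
        if (used != text.size() || text.find('-') != std::string::npos)
            throw std::invalid_argument("count must be a non-negative integer");
        opts.maxWords = value;
    }
    return opts;
}
```

With this in place, main would be declared as int main(int argc, char* argv[]), open the file with ifstream fin(opts.fileName), and use opts.maxWords where the program above uses MAX_WORDS.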

Removing all occurrences of a character from a string in C++

My assignment says I should write a function called removeChar that:
Takes 4 inputs: an integer num, a string str, a string s and a character c, but does not return anything.
Finds the number of all the occurrences of c in s (both capital and lower case (hint: you may use ASCII codes for comparison)) and saves it in num.
Copies the trimmed string into str.
Write in the same file a main() function containing a series of tests to showcase the correct behavior of removeChar().
All printing operations should be done from the main() function. I have this code:
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
string removeChar(int num, string s, string str, char c);
int main()
{
string s = "asdfasdf";
s = removeChar(1, "a", "hello", 'h');
cout << s;
}
string removeChar(int num, string s, string str, char c)
{
int i;
for (i = 0; i < s.length(); i++)
if (int(s.at(i)) == int(c))
num = int(c);
str.erase(std::remove(str.begin(), str.end(), (char)num), str.end());
return str;
}
It doesn't work, and even if it did, I need to have a void function.
If I have understood the description of the assignment correctly,
then you need something like the following:
#include <iostream>
#include <string>
#include <iterator>
#include <algorithm>
#include <cctype>
void removeChar( std::string::size_type &n, std::string &str, const std::string &s, char c )
{
str.clear();
c = std::toupper( ( unsigned char )c );
auto equal_to_c = [&c]( const auto &item )
{
return std::toupper( ( unsigned char )item ) == c;
};
std::remove_copy_if( std::begin( s ), std::end( s ),
std::back_inserter( str ),
equal_to_c );
n = s.size() - str.size();
}
int main()
{
std::string::size_type n = 0;
std::string str;
removeChar( n, str, "This is a silly assignment", 's' );
std::cout << "n = " << n << ", str = " << str << '\n';
return 0;
}
The program output is:
n = 5, str = Thi i a illy aignment

When testing my c++ code in Xcode, using "Ab cde Fg" as a test string, my code returns "Not Unique"...Why? The code is listed below

I have been practicing interview questions in C++ in Xcode, however I have come across unexpected behavior, yet no compilation error. The code is expected to return whether or not a string contains all unique ASCII characters or not. Testing my code in Xcode on my Mac, with the string "Ab cde Fg" returns "Not Unique". Why is this?
bool isUnique1(std::string str)
{
if (str.length() > 128)
return false;
bool * barr = new bool[128];
for (int i = 0; i < str.length(); i++)
{
int val = str[i];
if (barr[val])
return false;
barr[val] = true;
}
delete[] barr;
return true;
}
int main()
{
std::string name;
bool result1;
std::cout << "Enter a string to test: ";
getline (std::cin, name);
result1 = isUnique1(name);
if (result1)
std::cout << "Unique \n";
else
std::cout << "Not Unique \n";
return 0;
}
The array is not initialized. Write
bool * barr = new bool[128]();
Note that the string
"Ab cde Fg"
   ^   ^
contains two spaces, so the space character is not unique.
Maybe you should write the function so that it ignores white space.
If white space is ignored, the function can be defined, for example, as shown in the demonstration program below.
#include <iostream>
#include <iomanip>
#include <string>
#include <set>
#include <cctype>
bool isUnique1( const std::string &s )
{
std::set<char> set;
std::pair<std::set<char>::iterator, bool> p( std::set<char>::iterator(), true );
for ( std::string::size_type i = 0; p.second && i < s.size(); i++ )
{
if ( not std::isspace( ( unsigned char )s[i] ) ) p = set.insert( s[i] );
}
return p.second;
}
int main()
{
std::cout << std::boolalpha << isUnique1( "Ab cde Fg" ) << '\n';
return 0;
}
The program output is
true
Otherwise, if white space must not be ignored, the loop will look like this:
for ( std::string::size_type i = 0; p.second && i < s.size(); i++ )
{
p = set.insert( s[i] );
}
Or, without a loop, the function can be written the following way (this version does not ignore white space):
#include <iostream>
#include <iomanip>
#include <string>
#include <set>
#include <iterator>
#include <cctype>
bool isUnique1( const std::string &s )
{
return std::set<char>( std::begin( s ), std::end( s ) ).size() == s.size();
}
int main()
{
const char *s = "abcdefghijklmnopqrstuvwxyz";
std::cout << std::boolalpha << isUnique1( s ) << '\n';
return 0;
}
The program output is
true
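If you prefer to keep the flag-array idea from the question, note that its early return false also skips the delete[], so the array leaks on that path. A sketch that avoids both the uninitialized flags and the manual memory management is shown below. The name isUniqueAscii is made up for this illustration, and rejecting characters outside 7-bit ASCII is an assumption, not something the question specifies. Like the original, it does not ignore white space.

```cpp
#include <bitset>
#include <string>

// Hypothetical variant of the array approach: one flag per 7-bit ASCII code.
// std::bitset starts with every flag cleared and needs no delete[].
bool isUniqueAscii(const std::string& str)
{
    std::bitset<128> seen;
    for (unsigned char ch : str) {
        if (ch >= 128) return false;   // assumption: non-ASCII input is rejected
        if (seen[ch]) return false;    // second occurrence of the same character
        seen[ch] = true;
    }
    return true;
}
```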

Recursive Function that returns all substrings of a string

I need to implement a function in C++,
vector<string> generateSubstrings(string s),
that returns a vector of all substrings of a string. For example, the substrings of the string “rum” are the seven strings
“r”, “ru”, “rum”, “u”, “um”, “m”, “”.
The function has to be recursive and has to return the results as a vector.
Here is my code so far. It only prints "r", "ru" and "rm". I'm having a lot of trouble implementing this function. I've been working on it for the past few hours, but I can't figure out how to get it working as stated, so any help would be appreciated.
#include <iostream>
#include <string>
#include <vector>
using namespace std;
vector<string> generateSubstrings(string s, int num){
int index = num;
int SIZE = s.size();
vector<string> substrings;
if(index == s.size()){//BASE CASE
string temp = s.substr(index,1);
substrings.push_back(temp);
}
else{
for(int i = 0; i < SIZE; ++i){
string temp = s.at(index) + s.substr(i,i);
substrings.push_back(temp);
}
generateSubstrings(s, num + 1);
}
return substrings;
}
int main() {
vector<string> vec(20);
vec = generateSubstrings("rum", 0);
cout << endl << endl;cout << "PRINTING VECTOR" << endl;
for ( int i = 0; i<vec.size();++i){
cout << vec.at(i);
cout << endl;
}
cout << "DONE";
}
Your assignment says that the recursive function has to be declared as
vector<string> generateSubstrings(string s),
but you are making a different function recursive, one declared as
vector<string> generateSubstrings(string s, int num);
so in any case your solution does not satisfy the requirement of the assignment.
The function can look like this:
#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> generateSubstrings( std::string s )
{
if ( s.empty() ) return {};
std::vector<std::string> v;
v.reserve( s.size() * ( s.size() + 1 ) / 2 );
for ( std::string::size_type i = 0; i < s.size(); i++ )
{
v.push_back( s.substr( 0, i + 1 ) );
}
for ( const std::string &t : generateSubstrings( s.substr( 1 ) ) )
{
v.push_back( t );
}
return v;
}
int main()
{
std::string s( "rum" );
for ( const std::string &t : generateSubstrings( s ) )
{
std::cout << t << std::endl;
}
return 0;
}
Its output is
r
ru
rum
u
um
m
If you also need to include the empty string, change the condition
if ( s.empty() ) return {};
accordingly, for example to
if ( s.empty() ) return { "" };
In that case you should also reserve room for one more element:
v.reserve( s.size() * ( s.size() + 1 ) / 2 + 1 );
You can also replace the range-based loop in the function above with the member function insert. For example:
#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> generateSubstrings( std::string s )
{
if ( s.empty() ) return {};
std::vector<std::string> v;
v.reserve( s.size() * ( s.size() + 1 ) / 2 );
for ( std::string::size_type i = 0; i < s.size(); i++ )
{
v.push_back( s.substr( 0, i + 1 ) );
}
std::vector<std::string> v2 = generateSubstrings( s.substr( 1 ) );
v.insert( v.end(), v2.begin(), v2.end() );
return v;
}
int main()
{
std::string s( "rum" );
for ( const std::string &t : generateSubstrings( s ) )
{
std::cout << t << std::endl;
}
return 0;
}
The program output will be the same as shown above.
Here is a program modification that includes an empty string in the vector.
#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> generateSubstrings( std::string s )
{
if ( s.empty() ) return { "" };
std::vector<std::string> v;
v.reserve( s.size() * ( s.size() + 1 ) / 2 + 1 );
for ( std::string::size_type i = 0; i < s.size(); i++ )
{
v.push_back( s.substr( 0, i + 1 ) );
}
std::vector<std::string> v2 = generateSubstrings( s.substr( 1 ) );
v.insert( v.end(), v2.begin(), v2.end() );
return v;
}
int main()
{
std::string s( "rum" );
for ( const std::string &t : generateSubstrings( s ) )
{
std::cout << t << std::endl;
}
return 0;
}
Here's an answer using Python. It prints the correct result for "rum", but for "rumm" it prints two "m" substrings for obvious reasons:
def substrings(s):
    result = []
    if len(s) == 0:
        result.append("")
    if len(s) > 0:
        # substrings of the tail first, then every prefix of s
        result += substrings(s[1:])
        for n in range(1, len(s) + 1):
            result.append(s[0:n])
    return result

print(substrings("rum"))
print(substrings("rumm"))
The idea of the algorithm is the following: for "rum", the substrings are the substrings of "um" followed by "r", "ru" and "rum". For "um", the substrings are the substrings of "m" followed by "u" and "um". For "m", the substrings are the substrings of "" followed by "m". For "", the substrings are simply "". So, the final list is "", "m", "u", "um", "r", "ru", "rum".
Although this isn't C++, you should be able to translate the code to C++. But that may not necessarily be what you want as "rumm" has two "m" substrings. If you think that "rumm" should have only one "m" substring, please leave a comment and I'll post another answer.
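For reference, a direct C++ translation of the idea described above might look like the following sketch. The function name substringsOf is made up for this illustration. Like the Python version, it returns the substrings of the tail first and keeps duplicates such as the two "m" substrings of "rumm".

```cpp
#include <string>
#include <vector>

// Hypothetical C++ translation of the recursive idea described above.
std::vector<std::string> substringsOf(const std::string& s)
{
    if (s.empty()) return { "" };                 // base case: only the empty string
    std::vector<std::string> result = substringsOf(s.substr(1));
    for (std::string::size_type n = 1; n <= s.size(); ++n)
        result.push_back(s.substr(0, n));         // prefixes that start at s[0]
    return result;
}
```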
First, you should pay attention to code indentation.
Second, I did not go through your code; instead I wrote some code that does what you want, as follows:
void generateSubstrings(string s, int num, vector<string> &sta)
{
if (num == s.size())
return;
auto b = begin(s) + num;
string temp = "";
temp += *b;
sta.push_back(temp);
b++;
while (b != end(s))
{
temp += *b;
sta.push_back(temp);
b++;
}
generateSubstrings(s, num + 1, sta);
}
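The function above fills a vector passed by reference, while the assignment asks for a function that takes only the string and returns the vector. One possible way to bridge the two is a thin wrapper around a recursive helper, sketched below. The helper name collectSubstrings is made up for this illustration, and whether a recursive helper behind a non-recursive wrapper is acceptable depends on how strictly the assignment is read.

```cpp
#include <string>
#include <vector>

// Recursive helper: appends every substring that starts at index num,
// then recurses on the next starting index.
void collectSubstrings(const std::string& s, std::string::size_type num,
                       std::vector<std::string>& out)
{
    if (num == s.size()) return;
    for (std::string::size_type len = 1; num + len <= s.size(); ++len)
        out.push_back(s.substr(num, len));
    collectSubstrings(s, num + 1, out);
}

// Wrapper with the signature the assignment asks for.
std::vector<std::string> generateSubstrings(std::string s)
{
    std::vector<std::string> result;
    collectSubstrings(s, 0, result);
    result.push_back("");   // the assignment's example also lists the empty string
    return result;
}
```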