Counting occurrences of letter in a file - c++

I'm trying to count the number of times each letter appears in a file. When I run the code below it counts "Z" twice. Can anyone explain why?
The test data is:
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
#include <iostream> //Required if your program does any I/O
#include <iomanip> //Required for output formatting
#include <fstream> //Required for file I/O
#include <string> //Required if your program uses C++ strings
#include <cmath> //Required for complex math functions
#include <cctype> //Required for letter case conversion
using namespace std; //Required for ANSI C++ 1998 standard.
int main ()
{
string reply;
string inputFileName;
ifstream inputFile;
char character;
int letterCount[127] = {};
cout << "Input file name: ";
getline(cin, inputFileName);
// Open the input file.
inputFile.open(inputFileName.c_str()); // Need .c_str() to convert a C++ string to a C-style string
// Check the file opened successfully.
if ( ! inputFile.is_open())
{
cout << "Unable to open input file." << endl;
cout << "Press enter to continue...";
getline(cin, reply);
exit(1);
}
while ( inputFile.peek() != EOF )
{
inputFile >> character;
//toupper(character);
letterCount[static_cast<int>(character)]++;
}
for (int iteration = 0; iteration <= 127; iteration++)
{
if ( letterCount[iteration] > 0 )
{
cout << static_cast<char>(iteration) << " " << letterCount[iteration] << endl;
}
}
system("pause");
exit(0);
}

As others have pointed out, you have two Qs in the input. The reason you have two Zs is that the last
inputFile >> character;
(probably when only a newline character is left in the stream, so peek() does not yet return EOF) fails to extract anything, which leaves the 'Z' from the previous iteration in the local variable 'character'. Try inspecting inputFile.fail() afterwards to see this:
while (inputFile.peek() != EOF)
{
inputFile >> character;
if (!inputFile.fail())
{
letterCount[static_cast<int>(character)]++;
}
}
The idiomatic way to write the loop, and which also fixes your 'Z' problem, is:
while (inputFile >> character)
{
letterCount[static_cast<int>(character)]++;
}
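To make the difference between the two loops concrete, here is a minimal sketch that runs both over an in-memory string instead of a file. The function names countWithPeekLoop and countWithExtractionLoop are made up for this illustration, and std::istringstream stands in for the ifstream from the question.

```cpp
#include <cstdio>   // EOF
#include <sstream>
#include <string>

// Hypothetical helper: counts `target` with the peek()-then-read loop from the question.
int countWithPeekLoop(const std::string& text, char target)
{
    std::istringstream in(text);
    char character = '\0';
    int count = 0;
    while (in.peek() != EOF)
    {
        in >> character;   // fails on trailing whitespace and leaves `character` unchanged
        if (character == target)
            ++count;       // so the previous character is counted a second time
    }
    return count;
}

// Hypothetical helper: same count, but the extraction itself is the loop condition.
int countWithExtractionLoop(const std::string& text, char target)
{
    std::istringstream in(text);
    char character = '\0';
    int count = 0;
    while (in >> character)   // the body only runs after a successful read
    {
        if (character == target)
            ++count;
    }
    return count;
}
```

With the input "XYZ\n" the first function reports two Zs and the second reports one; without the trailing newline both report one.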

There are two Q's in your uppercase string. I believe the reason you get two counts for Z is that you should check for EOF after reading the character, not before, but I am not sure about that.

Well, others already have pointed out the error in your code.
But here is one elegant way you can read the file and count the letters in it:
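#include <algorithm>
#include <cctype>
#include <fstream>
#include <iostream>
#include <iterator>
#include <locale>
#include <map>
#include <vector>
struct letter_only: std::ctype<char>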
{
letter_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
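std::fill(&rc['A'], &rc['Z'+1], std::ctype_base::alpha); // uppercase letters
std::fill(&rc['a'], &rc['z'+1], std::ctype_base::alpha); // lowercase letters; filling 'A'..'z' in one go would also mark [ \ ] ^ _ ` as letters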
return &rc[0];
}
};
struct Counter
{
std::map<char, int> letterCount;
void operator()(char item)
{
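// The letter_only facet already makes the stream skip everything that is not a letter, so no extra check is needed here.
++letterCount[std::tolower(static_cast<unsigned char>(item))]; //remove tolower if you want case-sensitive solution!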
}
operator std::map<char, int>() { return letterCount ; }
};
int main()
{
std::ifstream input;
input.imbue(std::locale(std::locale(), new letter_only())); //enable reading letters only!
input.open("filename.txt");
std::istream_iterator<char> start(input);
std::istream_iterator<char> end;
std::map<char, int> letterCount = std::for_each(start, end, Counter());
for (std::map<char, int>::iterator it = letterCount.begin(); it != letterCount.end(); ++it)
{
std::cout << it->first <<" : "<< it->second << std::endl;
}
}
This is a modified (untested) version of this solution:
Elegant ways to count the frequency of words in a file

For one thing, you do have two Q's in the input.
Regarding Z, @Jeremiah is probably right in that it is doubly counted due to it being the last character, and your code not detecting EOF properly. This can be easily verified by e.g. changing the order of input characters.
As a side note, here
for (int iteration = 0; iteration <= 127; iteration++)
your index goes out of bounds; either the loop condition should be iteration < 127, or your array declared as int letterCount[128].
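As a rough sketch of the second option (an array with 128 slots), the counting could look like this. The names kAsciiSize and countAscii are invented for this example, and the extra range check is an assumption about how to treat bytes outside 7-bit ASCII.

```cpp
#include <array>
#include <istream>
#include <sstream>

// Hypothetical constant: one slot for every 7-bit ASCII code, 0 through 127.
constexpr int kAsciiSize = 128;

// Hypothetical helper: counts every non-whitespace ASCII character in the stream.
std::array<int, kAsciiSize> countAscii(std::istream& in)
{
    std::array<int, kAsciiSize> counts{};   // value-initialized, all zeros
    char ch;
    while (in >> ch)
    {
        // Convert through unsigned char so a negative char never becomes a negative index.
        const unsigned char index = static_cast<unsigned char>(ch);
        if (index < kAsciiSize)
            ++counts[index];                // bytes above 127 are ignored in this sketch
    }
    return counts;
}
```

The function accepts any std::istream, so it can be fed a std::ifstream or, for a quick check, a std::istringstream. A printing loop over the result should then use iteration < kAsciiSize as its condition.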

Given that you apparently only want to count English letters, it seems like you should be able to simplify your code considerably:
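#include <cctype>
#include <fstream>
#include <iostream>
int main(int argc, char **argv) {
if (argc < 2) return 1; // expects the input file name as the first argument
std::ifstream infile(argv[1]);
char ch;
static int counts[26];
while (infile >> ch)
if (std::isalpha(static_cast<unsigned char>(ch)))
++counts[std::tolower(static_cast<unsigned char>(ch))-'a'];
for (int i=0; i<26; i++)
std::cout << static_cast<char>('A' + i) << ": " << counts[i] <<"\n"; // cast so the letter is printed, not its integer code
return 0;
}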
Of course, there are quite a few more possibilities. Compared to @Nawaz's code (for example), this is obviously quite a bit shorter and simpler -- but it's also more limited (e.g., as it stands, it only works with un-accented English characters). It's pretty much restricted to the basic ASCII letters -- an EBCDIC encoding will break it completely, and accented letters in ISO 8859-x or Unicode text are not handled.
His also makes it easy to apply the "letters only" filtration to any file. Choosing between them depends on whether you want/need/can use that flexibility or not. If you only care about the letters mentioned in the question, and only on typical machines that use some superset of ASCII, this code will handle the job more easily -- but if you need more than that, it's not suitable at all.

Related

(C++) Why can't I rely on an extraction operator to act as an iterator?

#include <iostream>
#include <fstream>
int main()
{
std::ifstream file("input.txt");
char currentChar;
int charCount = 0;
while (file >> currentChar)
{
charCount++;
if (currentChar == 'a')
{
std::cout << charCount;
}
}
}
In the above, the charCount that's printed is massively large. If I move charCount into the if statement and turn the input into repetitions of the character 'a', it counts correctly (or would count the number of a's correctly). Is "file >> currentChar" what's causing the charCount number to increment so highly? And if so, what's it doing? Why?
It's not "massively large". You're simply outputting the current count every time you encounter the letter a, and because you don't include any whitespace or newlines, then every number will be joined together and appear like a huge number.
Try this:
std::cout << charCount << std::endl;
And consider doing this just once after the loop. Unless for some reason you want to show all intermediate counts.
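The effect is easy to reproduce on an in-memory string. The sketch below is only an illustration: joinedCounts mimics what the question's loop writes to the console, and countOccurrences is one possible version that reports a single number after the loop. Both names are made up here.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reproduces the output of the question's loop as a string.
// Every time an 'a' is read, the running character count is appended with no separator.
std::string joinedCounts(const std::string& text)
{
    std::istringstream file(text);
    std::string printed;
    char currentChar;
    int charCount = 0;
    while (file >> currentChar)
    {
        ++charCount;
        if (currentChar == 'a')
            printed += std::to_string(charCount);
    }
    return printed;
}

// Hypothetical helper: counts the target character and returns the total once.
int countOccurrences(const std::string& text, char target)
{
    std::istringstream file(text);
    char currentChar;
    int count = 0;
    while (file >> currentChar)
    {
        if (currentChar == target)
            ++count;
    }
    return count;
}
```

For the input "banana" the first function yields "246", which looks like one large number but is really the three positions 2, 4 and 6 printed back to back, while the second returns 3.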

Program is counting consonants wrong

I'm trying to make a program that counts all the vowels and all the consonants in a text file. However, if the file has a word such as cat it says that there are 3 consonants and 1 vowel when there should be 2 consonants and 1 vowel.
#include <string>
#include <cassert>
#include <cstdio>
using namespace std;
int main(void)
{
int i, j;
string inputFileName;
ifstream fileIn;
char ch;
cout<<"Enter the name of the file of characters: ";
cin>>inputFileName;
fileIn.open(inputFileName.data());
assert(fileIn.is_open());
i=0;
j=0;
while(!(fileIn.eof())){
ch=fileIn.get();
if (ch == 'a'||ch == 'e'||ch == 'i'||ch == 'o'||ch == 'u'||ch == 'y'){
i++;
}
else{
j++;
}
}
cout<<"The number of Consonants is: " << j << endl;
cout<<"The number of Vowels is: " << i << endl;
return 0;
}
Here you check if the eof state is set, then try to read a char. eof will not be set until you try to read beyond the end of the file, so reading a char fails, but you'll still count that char:
while(!(fileIn.eof())){
ch=fileIn.get(); // this fails and sets eof when you're at eof
So, if your file only contains 3 chars, c, a and t and you've read the t you'll find that eof() is not set. It'll be set when you try reading the next char.
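A small sketch makes the extra iteration visible. It counts how often each style of loop body runs for a given text; the function names are invented for this example and a std::istringstream replaces the file.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: number of times the body of a `while (!in.eof())` loop runs.
int iterationsOfEofLoop(const std::string& text)
{
    std::istringstream in(text);
    int iterations = 0;
    while (!in.eof())
    {
        in.get();       // the final call fails, and only that failure sets eofbit
        ++iterations;   // so the failed read is still "processed"
    }
    return iterations;
}

// Hypothetical helper: number of times the body runs when the read is the condition.
int iterationsOfReadLoop(const std::string& text)
{
    std::istringstream in(text);
    int iterations = 0;
    char ch;
    while (in.get(ch))  // the body only runs after a successful read
    {
        ++iterations;
    }
    return iterations;
}
```

For the text "cat" the first loop runs four times and the second three times, which matches the one extra consonant reported in the question.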
A better way is to check if fileIn is still in a good state after the extraction:
while(fileIn >> ch) {
With that in place the counting should add up. All special characters will be counted as consonants though. To improve on that, you could check that the char is a letter:
#include <cctype>
// ...
while(fileIn >> ch) {
if(std::isalpha(ch)) { // only count letters
ch = std::tolower(ch); // makes it possible to count uppercase letters too
if(ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u' || ch == 'y') {
i++;
} else {
j++;
}
}
}
Your program doesn't check for numbers and special characters, or for uppercase letters. Plus, .eof() is misused: the loop reads the last character of the file, loops again, tries to read one more character, and only then realizes it is at the end of the file, which produces the extra consonant. Consider using while((ch = fileIn.get()) != EOF), with ch declared as int so that EOF can be told apart from a valid character.
I would use a different approach, searching strings:
const std::string vowels = "aeiou";
int vowel_quantity = 0;
int consonant_quantity = 0;
char c;
while (file >> c)
{
if (isalpha(c))
{
if (vowels.find(c) != std::string::npos)
{
++vowel_quantity;
}
else
{
++consonant_quantity;
}
}
}
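Note: in the above code fragment, the character is first tested for being alphabetic. Characters such as a period or question mark are not alphabetic, and your code counts them as consonants. As written, the fragment only matches lowercase vowels; convert c with tolower first if the file can contain uppercase letters.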
Edit 1: character arrays
If you are not allowed to use std::string, you could also use character arrays (a.k.a. C-Strings):
static const char vowels[] = "aeiou";
int vowel_quantity = 0;
int consonant_quantity = 0;
char c;
while (file >> c)
{
if (isalpha(c))
{
if (strchr(vowels, c) != NULL)
{
++vowel_quantity;
}
else
{
++consonant_quantity;
}
}
}
I first thought my very first comment to your question was just a sidenote, but in fact it's the reason for the results you're getting. Your reading loop
while(!(fileIn.eof())){
ch=fileIn.get();
// process ch
}
is flawed. At the end of the file you'll check for EOF with !fileIn.eof() but you haven't read past the end yet so your program enters the loop once again and fileIn.get() will return EOF which will be counted as a consonant. The correct way to read is
while ((ch = file.get()) != EOF) {
// process ch
}
with ch declared as int or
while (file >> ch) {
// process ch
}
with ch declared as char. To limit the scope of ch to the loop consider using a for-loop:
for (int ch{ file.get() }; ch != EOF; ch = file.get()) {
// process ch;
}
As @TedLyngmo pointed out in the comments, EOF could be replaced by std::char_traits<char>::eof() for consistency although it is specified to return EOF.
Also your program should handle everything that isn't a letter (numbers, signs, control characters, ...) differently from vowels and consonants. Have a look at the functions in <cctype>.
In addition to Why !.eof() inside a loop condition is always wrong., you have another test or two you must implement to count all vowels and consonants. As mentioned in the comment, you will want to use tolower() (by including cctype) to convert each char to lower before your if statement to ensure you classify both upper and lower-case vowels.
In addition to testing for vowels, you need an else if (isalpha(c)) test. You don't want to classify whitespace or punctuation as consonants.
Additionally, unless you were told to treat 'y' as a vowel, it technically isn't one. I'll leave that up to you.
Adding the tests, you could write a short implementation as:
#include <iostream>
#include <fstream>
#include <string>
#include <cctype>
int main (void) {
size_t cons = 0, vowels = 0;
std::string ifname {};
std::ifstream fin;
std::cout << "enter filename: ";
if (!(std::cin >> ifname)) {
std::cerr << "(user canceled input)\n";
exit (EXIT_FAILURE);
}
fin.open (ifname);
if (!fin.is_open()) {
std::cerr << "error: file open failed '" << ifname << "'\n";
exit (EXIT_FAILURE);
}
/* loop reading each character in file */
for (int c = fin.get(); !fin.eof(); c = fin.get()) {
c = tolower(c); /* convert to lower */
if (c=='a' || c=='e' || c=='i' || c=='o' || c=='u')
vowels++;
else if (isalpha(c)) /* must be alpha to be consonant */
cons++;
}
std::cout << "\nIn file " << ifname << " there are:\n " << vowels
<< " vowels, and\n " << cons << " consonants\n";
}
(also worth reading Why is “using namespace std;” considered bad practice?)
Example Input File
$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
Example Use/Output
$ ./bin/vowelscons
enter filename: dat/captnjack.txt
In file dat/captnjack.txt there are:
25 vowels, and
34 consonants
Which if you count and classify each character gives the correct result.
Look things over and let me know if you have any questions.
I know that the following will be hard to digest. I want to show it anyway, because it is the "more-modern C++" solution.
So, I will first think about and develop an algorithm, and then use modern C++ elements to implement it.
First to the algorithm. If we use the ASCII code to encode letters, then we will see the following:
We see that the ASCII codes for an uppercase letter and its lowercase counterpart share the same lower 5 bits; they differ only in bit 5 (0x20). So, if we mask the ASCII code with 0x1F, e.g. char c{'a'}; unsigned int x = c & 0x1F;, we will get values between 1 and 26. So, we can calculate a 5-bit value for each letter. If we now mark all vowels with a 1, we can build a binary number consisting of 32 bits (an unsigned int) and set a bit at each position where the letter is a vowel. We then get something like
Bit position
3322 2222 2222 1111 1111 1100 0000 0000
1098 7654 3210 9876 5432 1098 7654 3210
Position with vowels:
0000 0000 0010 0000 1000 0010 0010 0010
This number can be written as 0x208222. If we now want to find out whether a letter (regardless of whether it is upper- or lowercase) is a vowel, we mask out the unnecessary bits of the character (c & 0x1F) and shift the binary number to the right by as many positions as the resulting letter code. If the bit at the LSB position is then set, we have a vowel. This know-how is decades old.
Aha. Not so easy, but it will work for ASCII-coded letters.
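The bit trick described above can be checked in isolation with a few lines. This is just a sketch under the same assumptions as the text: the input is an ASCII letter, and the names kVowelMask and isAsciiVowel are chosen for this illustration only.

```cpp
#include <cstdint>

// One bit per letter position: a=1, e=5, i=9, o=15, u=21.
constexpr std::uint32_t kVowelMask = 0x208222;

// Compile-time check that the mask really is built from those five positions.
static_assert(kVowelMask == ((1u << 1) | (1u << 5) | (1u << 9) | (1u << 15) | (1u << 21)),
              "vowel mask must have the bits for a, e, i, o and u set");

// Hypothetical helper: true if `letter` is an ASCII vowel, upper- or lowercase.
// The result is only meaningful when `letter` is already known to be a letter.
constexpr bool isAsciiVowel(char letter)
{
    // The lower 5 bits are the same for 'a' and 'A', so one mask covers both cases.
    return ((kVowelMask >> (letter & 0x1F)) & 1u) != 0;
}
```

For example, isAsciiVowel('E') and isAsciiVowel('u') are true, while isAsciiVowel('b') and isAsciiVowel('Y') are false.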
Next, we create a Lambda, that will read a string that purely consists of alpha letters and counts the vowels. What is not a vowel, that is a consonant (because we have letters only).
Then we use modern C++ elements to calculate the requested values:
The result is some elegant C++ code with only a few lines.
Please see
#include <utility>
#include <algorithm>
#include <string>
#include <iostream>
#include <fstream>
#include <cctype>
#include <iterator>
int main() {
// Lambda for counting vowels and consonants in a string consisting of letters only
auto countVowelsAndConsonants = [](std::string& s) -> std::pair<size_t, size_t> {
size_t numberOfVowels = std::count_if(s.begin(), s.end(), [](const char c) { return (0x208222 >> (c & 0x1f)) & 1; });
return { numberOfVowels, s.size() - numberOfVowels }; };
// Inform the user what to do: He should enter a valid filename
std::cout << "\nCount vowels and consonants.\n\nEnter a valid filename with the source text: ";
// Read the filename
if (std::string fileName{}; std::cin >> fileName) {
// Now open the file and check, if it could be opened
if (std::ifstream sourceFileStream(fileName); sourceFileStream) {
// Read the complete source text file into a string. But only letters
std::string completeSourceTextFile{};
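std::copy_if(std::istreambuf_iterator<char>(sourceFileStream), {}, std::back_inserter(completeSourceTextFile), [](unsigned char ch) { return std::isalpha(ch) != 0; }); // a lambda avoids the ambiguity between the overloads of std::isalpha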
// Now count the corresponding vowels and consonants
const auto [numberOfVowels, numberOfConsonants] = countVowelsAndConsonants(completeSourceTextFile);
// Show result to user:
std::cout << "\n\nNumber of vowels: " << numberOfVowels << "\nNumber of consonants: " << numberOfConsonants << "\n\n";
}
else {
std::cerr << "\n*** Error. Could not open source text file '" << fileName << "'\n\n";
}
}
else {
std::cerr << "\n*** Error. Could not get file name for source text file\n\n";
}
return 0;
}
Please note:
There are a million possible solutions. Everybody can do what they want.
Some people are still more in a C-style mode, and others prefer to program in C++.

C++ Cin input to array

I am a beginner in C++ and I want to enter a string character by character into an array, so that I can implement a reverse function. However, unlike in C, when Enter is hit a '\n' is not inserted in the stream. How can I stop data from being entered?
my code is :
#include<iostream>
#include<array>
#define SIZE 100
using namespace std;
char *reverse(char *s)
{
array<char, SIZE>b;
int c=0;
for(int i =(SIZE-1);i>=0;i--){
b[i] = s[c];
c++;
}
return s;
}
int main()
{
cout<<"Please insert a string"<<endl;
char a[SIZE];
int i=0;
do{
cin>>a[i];
i++;
}while(a[i-1]!= '\0');
reverse(a);
return 0;
}
When you read character by character, it really reads characters, and newline is considered a white-space character.
Also the array will never be terminated as a C-style string, that's not how reading characters work. That means your loop condition is wrong.
To begin with I suggest you start using std::string for your strings. You can still read character by character. To continue you need to actually check what characters you read, and end reading once you read a newline.
Lastly, your reverse function does not work. First of all the loop itself is wrong, secondly you return the pointer to the original string, not the "reversed" array.
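For reference, one possible way to repair the reverse function while keeping the char* interface is to swap characters in place from both ends. This is only a sketch; the name reverseInPlace is made up here, and it assumes the buffer holds a properly null-terminated C-style string.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical helper: reverses a null-terminated string in place and returns it.
char* reverseInPlace(char* s)
{
    if (s == nullptr)
        return s;
    std::size_t left = 0;
    std::size_t right = std::strlen(s);   // one past the last character
    while (left + 1 < right)
    {
        --right;
        std::swap(s[left], s[right]);     // exchange the outermost unswapped pair
        ++left;
    }
    return s;
}
```

Because the string is modified where it lives, returning the original pointer is now correct, and no temporary array of SIZE elements is needed.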
To help you with the reading it could be done something like
std::string str;
char ch;
while (std::cin.get(ch)) // get() does not skip whitespace, unlike operator>>
{
if (ch == '\n')
{
break; // End loop at the end of the line
}
str += ch; // Append character to string
}
Do note that not much of this is really needed as shown in the answer by Stack Danny. Even my code above could be simplified while still reading one character at a time.
Since you tagged your question as C++ (and not C), why not solve it with the modern C++ standard headers? They do exactly what you want, are tested, safe, and fast, so you don't need to write your own functions.
#include <string>
#include <algorithm>
#include <iostream>
int main(){
std::string str;
std::cout << "Enter a string: ";
std::getline(std::cin, str);
std::reverse(str.begin(), str.end());
std::cout << str << std::endl;
return 0;
}
output:
Enter a string: Hello Test 4321
1234 tseT olleH

HW Help: get char instead of get line C++

I wrote the code below that successfully gets a random line from a file; however, I need to be able to modify one of the lines, so I need to be able to get the line character by character.
How can I change my code to do this?
Use std::istream::get instead of std::getline. Just read your string character by character until you reach \n, EOF or other errors. I also recommend you read the full std::istream reference.
Good luck with your homework!
UPDATE:
OK, I don't think an example will hurt. Here is how I'd do it if I were you:
#include <string>
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <ctime> // for time()
using namespace std;
static std::string
answer (const string & question)
{
std::string answer;
const string filename = "answerfile.txt";
ifstream file (filename.c_str ());
if (!file)
{
cerr << "Can't open '" << filename << "' file.\n";
exit (1);
}
for (int i = 0, r = rand () % 5; i <= r; ++i)
{
answer.clear ();
char c;
while (file.get (c).good () && c != '\n')
{
if (c == 'i') c = 'I'; // Replace character? :)
answer.append (1, c);
}
}
return answer;
}
int
main ()
{
srand (time (NULL));
string question;
cout << "Please enter a question: " << flush;
cin >> question;
cout << answer (question) << endl;
}
... the only thing is that I have no idea why you need to read the string char by char in order to modify it. You can modify a std::string object directly, which is even easier. Let's say you want to replace "I think" with "what if"? You might be better off reading more about std::string and using find, erase, replace etc.
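As an illustration of that suggestion, here is a minimal sketch that replaces the first occurrence of one phrase with another using std::string::find and std::string::replace. The helper name replaceFirst is made up for this example.

```cpp
#include <string>

// Hypothetical helper: returns a copy of `text` with the first `from` replaced by `to`.
std::string replaceFirst(std::string text, const std::string& from, const std::string& to)
{
    const std::string::size_type pos = text.find(from);
    if (pos != std::string::npos)
        text.replace(pos, from.size(), to);   // overwrite the matched range
    return text;                              // unchanged if `from` was not found
}
```

Applied to a line read with std::getline, replaceFirst(line, "I think", "what if") would do the modification without any character-by-character reading.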
UPDATE 2:
What happens with your latest code is simply this: you open a file, then you get its content character by character until you reach a newline (\n). So in either case you will end up reading the first line and then your do-while loop will terminate. If you look at my example, I used a while loop that reads a line until \n, placed inside a for loop. So that is basically what you should do: repeat your do-while loop once for each line you want to get from that file. For example, something like this will read two lines:
for (int i = 1; i <= 2; ++i)
{
do
{
answerfile.get (answer);
cout << answer << " (from line " << i << ")\n";
}
while (answer != '\n');
}

C++ program not moving past cin step for string input

I'm obviously not quite getting the 'end-of-file' concept with C++ as the below program just isn't getting past the "while (cin >> x)" step. Whenever I run it from the command line it just sits there mocking me.
Searching through SO and other places gives a lot of mention to hitting ctrl-z then hitting enter to put through an end-of-file character on windows, but that doesn't seem to be working for me. That makes me assume my problem is elsewhere. Maybe defining x as a string is my mistake? Any suggestions about where I'm going wrong here would be great.
Note: sorry for the lack of comments in the code - the program itself is supposed to take in a series of
words and then spit back out the count for each word.
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iomanip>
using std::cin;
using std::cout; using std::endl;
using std::sort;
using std::string; using std::vector;
int main()
{
cout << "Enter a series of words separated by spaces, "
"followed by end-of-file: ";
vector<string> wordList;
string x;
while (cin >> x)
wordList.push_back(x);
typedef vector<string>::size_type vec_sz;
vec_sz size = wordList.size();
if (size == 0) {
cout << endl << "This list appears empty. "
"Please try again." << endl;
return 1;
}
sort(wordList.begin(), wordList.end());
cout << "Your word count is as follows:" << endl;
int wordCount = 1;
for (int i = 0; i != size; i++) {
if (wordList[i] == wordList[i+1]) {
wordCount++;
}
else {
cout << wordList[i] << " " << wordCount << endl;
wordCount = 1;
}
}
return 0;
}
If you're on windows ^Z has to come as the first character after a newline, if you're on a unixy shell then you want to type ^D.
The input portion of your code works. The only real problem I see is with the loop that tries to count up the words:
for (int i = 0; i != size; i++) {
if (wordList[i] == wordList[i+1]) {
The valid subscripts for wordList run from 0 through size-1. In the last iteration of your loop, i=size-1, but then you try to use wordList[i+1], indexing beyond the end of the vector and getting undefined results. If you used wordList.at(i+1) instead, it would throw an exception, quickly telling you more about the problem.
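One way to count the runs of equal words without ever touching wordList[i+1] is to compare each word with the last entry already recorded. The following is a sketch of that idea, not the original poster's code; the function name countWords and the pair-based result are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns each distinct word with its count, in sorted order.
std::vector<std::pair<std::string, int>> countWords(std::vector<std::string> words)
{
    std::sort(words.begin(), words.end());   // equal words become neighbours
    std::vector<std::pair<std::string, int>> result;
    for (std::size_t i = 0; i < words.size(); ++i)
    {
        if (!result.empty() && result.back().first == words[i])
            ++result.back().second;          // same word as the previous one
        else
            result.emplace_back(words[i], 1); // first time this word is seen
    }
    return result;
}
```

Every subscript stays within 0 through size-1, so the last word is handled like any other and an empty list simply produces an empty result.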
My guess is that what's happening is that you're hitting Control-Z, and it's exiting the input loop, but crashing when it tries to count the words, so when you fix that things will work better in general. If you really can't get past the input loop after fixing the other problem(s?), and you're running under Windows, you might try using F6 instead of entering control-Z -- it seems to be a bit more dependable.
I pretty much always use getline when using cin (particularly when what I want is a string):
istream& std::getline( istream& is, string& s );
So, you'd call getline(cin, x) and it would grab everything up to the newline. You have to wait for the newline for cin to give you anything anyway. So, in that case, your loop would become:
while(getline(cin, x))
wordList.push_back(x);
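Note that with getline each element of wordList is a whole line rather than a single word. If individual words are still wanted, one option is to split each line with a std::istringstream. The sketch below assumes that approach; readWords is a name invented for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the stream line by line and splits each line into words.
std::vector<std::string> readWords(std::istream& in)
{
    std::vector<std::string> words;
    std::string line;
    while (std::getline(in, line))           // stops at end-of-file
    {
        std::istringstream lineStream(line);
        std::string word;
        while (lineStream >> word)           // whitespace-separated words in this line
            words.push_back(word);
    }
    return words;
}
```

Calling readWords(std::cin) would then fill the list the same way as the original while (cin >> x) loop, still ending when end-of-file is signalled.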
operator>> on cin skips leading whitespace, including blanks and line breaks, so the read does not complete until you enter something else. Here is a test program that gives you what you want:
#include "stdafx.h"
#include<iostream>
#include <string>
#include <sstream>
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
string str = "";
while(std::getline(cin, str) && str!="")
{
cout<<"got "<<str<<endl;
}
cout<<"out"<<endl;
cin>>str;
return 0;
}