Counting alphanumeric characters in a text file in C++

I wrote a program for counting the number of alphanumeric characters in a text file. However, the number it returns is always larger than the number that online character counters return.
For example, the program will calculate the number of alphanumeric characters in this text:
if these people had strange fads and expected obedience on the most
extraordinary matters they were at least ready to pay for their
eccentricity
to be 162. Running the program again, it'll say there are 164 characters in the text. Running it again, it'll say there are 156 characters. Using this online character counter, it seems that the character count ought to be lower than 144 (the online character counter includes spaces as well).
Here is the code:
#include <iostream>
#include <fstream>
#include <cctype>
using namespace std;
int main() {
char line[100];
int charcount = 0;
ifstream file("pg1661sample.txt");
while (!file.eof()) {
file.getline(line, 99);
for (int i = 0; i < 100; i++) {
if (isalnum(line[i])) {
charcount++;
}
}
}
cout << endl << "Alphanumeric character count: " << charcount;
cin.get();
return 0;
}
What am I doing wrong?

Try:
#include <iterator>
#include <algorithm>
#include <iostream>
#include <cctype>
bool isAlphaNum(unsigned char x){return std::isalnum(x);}
int main()
{
std::cout << "Alphanumeric character count: " <<
std::count_if(std::istream_iterator<char>(std::cin),
std::istream_iterator<char>(),
isAlphaNum
) ;
}
Problems with your code:
eof() does not become true until you try to read past the end of the file:
// this is true even if there is nothing left to read.
// It fails the first time you read after there is nothing left.
while (!file.eof()) {
// thus this line may fail
file.getline(line, 99);
It is better to always do this:
while(file.getline(line, 99))
The loop is only entered if the getline actually worked.
You are also using the fixed-buffer version of getline, which cannot cope with lines longer than the buffer (here, 100 characters).
Use the version that works with std::string instead, so the string grows automatically.
std::string line;
while(std::getline(file, line))
{
// stuff
}
Next, you assume the line is exactly 100 characters long.
What happens if the line is only 2 characters long?
for (int i = 0; i < 100; i++)
The loop scans the whole buffer, so it also counts letters left over from a previous line (if a previous line was longer than the current one) or completely random garbage. If you are still using file.getline(), you can retrieve the number of characters read with file.gcount() (this count includes the newline that getline() extracted and discarded). If you use std::getline(), the variable line will be exactly the size of the line read (line.size()).

while (!file.eof()) {
Don't do this. eof() doesn't return true until after an attempted input has failed, so loops like this run an extra time. Instead, do this:
while (file.getline(line, 99)) {
The loop will terminate when the input ends.
The other problem is in the loop that counts characters. Ask yourself: how many characters got read into the buffer on each pass through the input loop? And why, then, is the counting loop looking at 100 characters?

You're assuming that getline() fills line with exactly 100 characters. Check the length of the string read in by getline(), e.g. using strlen() from <cstring>:
for (int i = 0; i < strlen(line); i++) {
if (isalnum(line[i])) {
charcount++;
}
}
EDIT: Also, make sure you heed the suggestion from other answers to use getline()'s return value for the loop condition rather than calling eof().

Related

Reading specific lines from a file in C++

The following code is meant to loop through the last ten lines of a file that I have previously opened. I think the seekg function refers to binary files and only goes through individual bytes of data, so that may be my issue here.
//Set cursor to 10 places before end
//Read lines of input
input.seekg(10L, ios::end);
getline(input, a);
while (input) {
cout << a << endl;
getline(input, a);
}
input.close();
int b;
cin >> b;
return 0;
}
The other method I was thinking of doing is just counting the number of times the file gets looped through initially, taking that and subtracting ten, then counting through the file that number of times, then outputting the next ten, but that seems extensive for what I want to do.
Is there something like seekg that will go to a specific line in the text file? Or should I use the method I proposed above?
EDIT: I answered my own question: the looping thing was like 6 more lines of code.
Search backwards for the newline character 10 times or until the file cursor is less than or equal to zero.
If you don't care about the order of the last 10 lines, you can do this:
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
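int main() {
std::ifstream file("test.txt");
std::vector<std::string> lines(10);
std::string line;
// Read into a temporary first: a failed getline clears its target string,
// which would otherwise wipe one of the ten saved lines at end of file.
for ( int i = 0; std::getline(file, line); ++i )
lines[i % 10] = line;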
return 0;
}

Splitting sentences and placing in vector

My professor gave me some code that takes multiple lines of input. I am currently changing the code for our current assignment and I came across an issue. The code is meant to take strings of input, split them into sentences at periods, and put those strings into a vector.
vector<string> words;
string getInput() {
string s = ""; // string to return
bool cont = true; // loop control.. continue is true
while (cont){ // while continue
string l; // string to hold a line
cin >> l; // get line
char lastChar = l.at(l.size()-1);
if(lastChar=='.') {
l = l.substr(0, l.size()-1);
if(l.size()>0){
words.push_back(s);
s = "";
}
}
if (lastChar==';') { // use ';' to stop input
l = l.substr(0, l.size()-1);
if (l.size()>0)
s = s + " " + l;
cont = false; // set loop control to stop
}
else
s = s + " " + l; // add line to string to return
// add a blank space to prevent
// making a new word from last
// word in string and first word
// in line
}
return s;
}
int main()
{
cout << "Input something: ";
string s = getInput();
cout << "Your input: " << s << "\n" << endl;
for(int i=0; i<words.size(); i++){
cout << words[i] << "\n";
}
}
The code puts strings into a vector but takes the last word of the sentence and attaches it to the next string and I cannot seem to understand why.
This line
s = s + " " + l;
will always execute, except for the end of input, even if the last character is '.'. You are most likely missing an else between the two if-s.
You have:
string l; // string to hold a line
cin >> l; // get line
The last line does not read a whole line unless the line contains no whitespace: operator>> reads a single whitespace-delimited word. To read a line of text, use:
std::getline(std::cin, l);
It's hard telling whether that is tripping your code up since you haven't posted any sample input.
I would at least consider doing this job somewhat differently. Right now, you're reading a word at a time, then putting the words back together until you get to a period.
One possible alternative would be to use std::getline to read input until you get to a period, and put the whole string into the vector at once. Code to do the job this way could look something like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
int main() {
std::vector<std::string> s;
std::string temp;
while (std::getline(std::cin, temp, '.'))
s.push_back(temp);
std::transform(s.begin(), s.end(),
std::ostream_iterator<std::string>(std::cout, ".\n"),
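[](std::string const &s) -> std::string {
// Skip leading whitespace. An all-whitespace piece (e.g. the newline
// after the last period) becomes empty; substr(npos) would throw.
auto pos = s.find_first_not_of(" \t\n");
return pos == std::string::npos ? std::string() : s.substr(pos);
});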
}
This does behave differently in one circumstance--if you have a period somewhere other than at the end of a word, the original code will ignore that period (won't treat it as the end of a sentence) but this will. The obvious place this would make a difference would be if the input contained a number with a decimal point (e.g., 1.234), which this would break at the decimal point, so it would treat the 1 as the end of one sentence, and the 234 as the beginning of another. If, however, you don't need to deal with that type of input, this can simplify the code considerably.
If the sentences might contain decimal points, then I'd probably write the code more like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
class sentence {
std::string data;
public:
friend std::istream &operator>>(std::istream &is, sentence &s) {
std::string temp, word;
while (is >> word) {
temp += word + ' ';
if (word.back() == '.')
break;
}
s.data = temp;
return is;
}
operator std::string() const { return data; }
};
int main() {
std::copy(std::istream_iterator<sentence>(std::cin),
std::istream_iterator<sentence>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Although somewhat longer and more complex, at least to me it still seems (considerably) simpler than the code in the question. I guess it's different in one way--it detects the end of the input by...detecting the end of the input, rather than depending on the input to contain a special delimiter to mark the end of the input. If you're running it interactively, you'll typically need to use a special key combination to signal the end of input (e.g., Ctrl+D on Linux/Unix, or F6 on Windows).
In any case, it's probably worth considering a fundamental difference between this code and the code in the question: this defines a sentence as a type, where the original code just leaves everything as strings, and manipulates strings. This defines an operator>> for a sentence, that reads a sentence from a stream as we want it read. This gives us a type we can manipulate as an object. Since it's like a string in other ways, we provide a conversion to string so once you're done reading one from a stream, you can just treat it as a string. Having done that, we can (for example) use a standard algorithm to read sentences from standard input, and write them to standard output, with a new-line after each to separate them.

C++ - Xcode Program

I am writing a program that extracts data from a text file and encrypts it. I am having some trouble with this. First of all, there is an error at the getline(data,s[i]) part. Also, the text file has two sentences but it only encrypts the second sentence. The other issue is that it encrypts one letter at a time and outputs the sentence every time. It should output just the encrypted sentence.
#include <iostream>
#include <fstream>
#include <istream>
using namespace std;
int main(){
//Declare Variables
string s;
ifstream data;
//Uses Fstream to open text file
data.open ("/Users/MacBookPro/Desktop/data.txt");
// Use while loop to extract the data from the text file
while(!data.eof()){
getline(data,s);
cout<< s << endl;
}
//Puts the data from the text file into a string array
for(int i = 0; data.good(); i++){
getline(data, s[i]);
cout<< s <<endl;
}
// encrypts the string
if(data.is_open()){
for(int i = 0; i < s.length();i++){
s[i] += 2;
cout << s << endl;
}
}
return 0;
}
By the end of the loop below you have already reached the end of the stream, and only the last line is left in the string s.
while(!data.eof()){
getline(data,s);
cout<< s << endl;
}
My suggestion is that you use a list of strings.
vector< string > s;
string tmp;
// Let getline control the loop so a failed read is never stored.
while(getline(data,tmp)){
s.push_back(tmp);
cout<< tmp << endl;
}
The next step you loop through the list and do the encryption
for(size_t i=0; i < s.size(); i++)
{
// encrypt s[i]
}
Hope this helped!
First I had to add this line to get "getline" to be recognised:
#include <string>
Then, there was indeed an error with the line:
getline(data, s[i]);
This is a compilation error: that function expects a stream and a string, but you pass it a stream and a char.
Changing that line for:
getline(data, s);
makes your program compile.
However it probably does not do what you want at this point, since the variable i from the for is being ignored.
I suggest that you check out some documentation on the getline function, then rethink what you want to do and try again.
You can find some documentation here:
https://msdn.microsoft.com/en-us/library/vstudio/2whx1zkx(v=vs.100).aspx
Your other concern was that it output your string many times. This is normal, since your cout statement is inside in your encryption loop.
Move it outside the loop instead, to output it only one time once the encryption loop is done.
It is important to spend the time to understand what each line of your program is doing, and why you need it to achieve your goal.
Also, when doing something that we find complicated, it's easier to do one small part of it at a time, make sure it works, then continue with the next part.
Good Luck :)
Create a temporary string for containing each line and an integer since we're going to find the total number of lines to create an array for all of them.
string temp = "";
int numberOfLines = 0;
Now we try to find the total number of lines
while(getline(data, temp)) { // count a line only if the read succeeded
cout << temp << endl;
numberOfLines++;
}
Now we can create an array for all of the lines. This is a dynamically allocated array; look up dynamic arrays for more information.
string * lines = new string[numberOfLines];
Now it is time to go back and read and encrypt all the lines. First we have to return to the beginning of the file. The counting loop left the stream in a failed state, so we clear that before calling seekg:
data.clear();
data.seekg(0, ios::beg);
For each line we read, we'll put in the array and loop through each character, encrypt it and then show the whole sentence.
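int i = 0;
// Read into the current slot, encrypt it, print it, then advance.
while (i < numberOfLines && getline(data, lines[i])) {
for (size_t j = 0; j < lines[i].size(); j++ ) {
lines[i].at(j) += 2;
}
cout << lines[i] << endl;
i++;
}
delete[] lines; // release the dynamic array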
Voila!
s[i] is a character in the string (actually, a character reference), not the string object itself. string::operator[] returns a char& in the docs. See here.
Consider declaring a std::vector<string> string_array; and then use the string_array.push_back(line) member function (where line is a std::string just read from the file) to append strings from the file onto the vector. Use a for loop to iterate through the vector at a later time with a vector<string>::iterator, or use the std::vector::size function to get the length of the vector for a traditional for loop (via a call to string_array.size()). Use the square brackets to get each string from the vector (string_array[0 or 1 or etc.]).
Get characters from each string in the vector by using something like string_array[n][m] for the mth character of the nth string. Iterating over each character should be as simple as using the string::length member function to get the string length, and then another for loop.
Also, std::cout << s << std::endl is being used in the wrong places. To output each character, try std::cout << s[i] << std::endl instead, or printf("%c", s[i]), whichever you like.
I'd suggest not using an array to hold strings from the file because you don't know what the array's length will be at runtime (the file size could be unbounded), so a vector is better suited for this case.
Finally, if you need code, there's a beginner's forum post here that I think will help you out. It has a lot of code like yours, but you'll have to modify it for your purposes.
Also, please use:
std::someCPPLibraryFunction(args);
instead of...
using namespace std;
someCPPLibraryFunction(args);

first string in array of strings is being skipped

Somehow when I run this code and it comes to inputting strings, the first string where i=0 is being skipped and it starts entering strings from A[1]. So I end up with A[0] filled with random stuff from memory. Can someone please point at the problem?
cin>>s;
char** A;
A = new char *[s];
cout<<"now please fill the strings"<<endl;
for (int i=0;i<s;i++)
{
A[i] = new char[100];
cout<<"string "<<i<<": ";
gets(A[i]);
}
That code is horrible. Here's how it should look in real C++:
#include <string>
#include <iostream>
#include <vector>
int main()
{
std::cout << "Please start entering lines. A blank line or "
<< "EOF (Ctrl-D) will terminate the input.\n";
std::vector<std::string> lines;
for (std::string line; std::getline(std::cin, line) && !line.empty(); )
{
lines.push_back(line);
}
std::cout << "Thank you, goodbye.\n";
}
Note the absence of any pointers or new expressions.
If you like you can add a little prompt print by adding std::cout << "> " && at the beginning of the conditional check in the for loop.
Probably because you're using gets()... never use gets()
Use fgets() instead.
gets vs fgets
The problem is that cin>>s; just picks up the number you want and leaves a \n (newline from the enter press) on stdin that gets() picks up in the first iteration. This is not the nicest way to fix it, but to prove it write this line after that line:
int a = fgetc(stdin);
Check out a afterwards to confirm it has a newline.
Well, you probably get an empty string: when reading s you use formatted input which stops as soon as a non-digit is encountered, e.g., the newline used to indicate its input is finished. gets(), thus, immediately finds a newline, terminating the first string read.
That said, you should never use gets(): it is a serious security problem and the root cause of many potential attacks! You should instead use fgets() or, better yet, std::getline() together with std::string and a std::vector<std::string>. Also, you should always verify that the attempt to read input was successful (this needs <limits> for std::numeric_limits):
if ((std::cin >> s).ignore(std::numeric_limits<std::streamsize>::max(), '\n')) {
std::string line;
for (int i(0); i != s && std::getline(std::cin, line); ) {
A.push_back(line);
}
}

fstream's getline() works for a little bit...and then breaks

I have a text file with a bunch of numbers separated by newlines, like this:
123.25
95.12
114.12 etc...
The problem is, when my program reads it, it only copies the number to the array up to the second number and then fills the rest of the elements with zeroes. I've tried using delimiters and ignore statements but nothing has worked. Here's the code.
Edit(here's the whole program:)
#include <iostream>
#include <string.h>
#include <iomanip>
#include <fstream>
using namespace std;
struct utilityInfo
{
char utility[20];
double monthlyExpenses[12];
};
int main(){
utilityInfo Utility[3];
char charray[100];
fstream inFile;
inFile.open("expenses.txt");
inFile.getline(charray, 7);
cout<<charray<<endl;
if(inFile.fail()) cout<<"it didnt work";
for(int i=0; i<12; i++)
{
inFile.getline(charray,20);
Utility[0].monthlyExpenses[i]=atof(charray);
}
for(int z=0; z<12; z++)
{
cout<<Utility[0].monthlyExpenses[z]<<endl;
}
inFile.close();
return 0;
}
Here's what the text file looks like:
207.14
177.34
150.55
104.22
86.36
53.97
52.55
58.77
64.66
120.32
153.45
170.90
And here's what the output looks like:
207.14
177.34
0
0
0
0
0
0
0
0
0
0
The first entry in your file, "207.14", is actually "207.14 " (there's a trailing space). getline(charray, 7) stores at most 6 characters, so it stops in front of the " " without reaching the newline. In that case istream::getline sets the failbit on inFile, which means your subsequent getlines fail.
To fix this, either read enough to reach the newline character, remove the space, and/or clear inFile's failbit after your first getline.
You should also add a check within your for loop to handle any errors that may occur with fail/bad/eof bits.