calculating average length of words in a local text file - c++

I need to find
- the average length of all the words,
- the shortest and longest word length, and
- how many words there are
in a separate text file, using C++. There are 79 words in the file, and it is called "test.txt".
What I have so far is:
#include <bits/stdc++.h>
#include <cstdio>
using namespace std;
int main()
{
FILE* fp;
char buffer[100];
fp = fopen("test.txt", "r");
while (!feof(fp)) // to read file
{
// function used to read the contents of the file
fread(buffer, sizeof(buffer), 100, fp);
cout << buffer;
}
return 0;
}
All this does is print out the words that are in the file.
I am using an online compiler until I can get to my desktop with Visual Studio 2017 later today.

Well, in C++, rather than FILE*, use a std::ifstream, a std::string word; variable, and the formatted text extraction operator>>() to read single words from the file in a loop:
std::ifstream infile("test.txt");
std::string word;
while(infile >> word) {
}
Count every word read from the file in a variable int wordCount;
int wordCount = 0;
while(infile >> word) {
++wordCount;
}
Sum up the character lengths of the read words in another variable int totalWordsCharacters; (you can use the std::string::length() function to determine the number of characters used in a word).
int totalWordsCharacters = 0;
while(infile >> word) {
totalWordsCharacters += word.length();
}
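After you have finished reading the file, you can compute the average word length by dividing the total by the word count (cast to double first, otherwise the integer division truncates the result):
double avgCharacterPerWord = static_cast<double>(totalWordsCharacters) / wordCount;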
Here's a complete working example; the only difference is that the '\n' in your input file format was replaced by a simple blank character (' ').
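As a rough sketch of how the counting, summing and dividing steps fit together in a single pass (this is not the linked example; the WordAverage struct and the averageWordLength name are made up for illustration):

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this on a literal
#include <string>

// Hypothetical result type: totals gathered in one pass over the words.
struct WordAverage {
    std::size_t wordCount = 0;
    std::size_t totalCharacters = 0;
    double average = 0.0;  // stays 0.0 when no words were read
};

// Reads whitespace-separated words from `in` and computes their average length.
WordAverage averageWordLength(std::istream& in) {
    WordAverage result;
    std::string word;
    while (in >> word) {
        ++result.wordCount;
        result.totalCharacters += word.length();
    }
    if (result.wordCount > 0) {
        // Cast first so the division is not truncated to an integer.
        result.average =
            static_cast<double>(result.totalCharacters) / result.wordCount;
    }
    return result;
}
```

Because the function takes a std::istream&, it can be called with a std::ifstream opened on "test.txt" or with a std::istringstream for a quick check.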

If you want the average over ALL the words, you have to add all the lengths together and divide the sum by the number of words in your file (you said 79 words).
But if you want the average of only the shortest and the longest word, you first have to find those words.
You can do that by keeping two counters as you go through all the words. The first counter starts at the length of the first word and is set to the length of the current word whenever that length is smaller than the counter's value. The second counter starts at 0 and is set to the length of the current word whenever that length is greater than the counter's value.
Then you add the two counters together and divide the sum by 2.
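The two-counter idea could look roughly like this. It is only a sketch: LengthRange and wordLengthRange are illustrative names, and the shortest counter is seeded with the largest possible value instead of the first word's length, which has the same effect.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>  // std::istringstream, handy for trying this on a literal
#include <string>

// Hypothetical result type: lengths of the shortest and longest word seen.
struct LengthRange {
    std::size_t shortest = 0;
    std::size_t longest = 0;
    bool anyWords = false;  // false when the stream contained no words
};

LengthRange wordLengthRange(std::istream& in) {
    LengthRange range;
    // Start high so that the first word always replaces the value.
    range.shortest = std::numeric_limits<std::size_t>::max();
    std::string word;
    while (in >> word) {
        const std::size_t len = word.length();
        range.anyWords = true;
        range.shortest = std::min(range.shortest, len);
        range.longest = std::max(range.longest, len);
    }
    if (!range.anyWords) {
        range.shortest = 0;  // nothing was read, report 0 instead of the seed
    }
    return range;
}
```

The average of the two extremes is then (range.shortest + range.longest) / 2.0; dividing by 2.0 rather than 2 keeps the fractional part.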

Your problem is that you are writing C code, which makes the problem harder.
In C++ reading a list of words from a file is simple using the >> operator.
std::ifstream file("FileName");
std::string word;
while(file >> word)
{
// I have read another word from the file.
// Do your calculations here.
}
// print out your results here after the loop.
Note that the >> operator treats an end of line just like a space: it acts as a word separator and is otherwise ignored.

Related

How to access individual word from c++ vector?

At the end of the program I output the contents of a vector, the strings were inputted from a text file. The entire vector is outputted, but how do I just output one word? I am asking this because I will later need to modify each string.
#include<iostream>
#include<fstream>
#include<vector>
#include<algorithm>
using namespace std;
int main(){
ifstream in;
string line, file_name;
vector <string> phrase;
int total_words, total_letters, total_chars;
cout << "PIG LATIN PROGRAM" << endl;
cout << "Which file are you accessing? : ";
cin >> file_name;
in.open(file_name);
if (in.fail()) cout << "\nFile not found!" << endl;
while(getline(in, line)) phrase.push_back(line);
for(int i = 0; i < phrase.size(); i++){
int limit = phrase.size() - 1;
while(i < limit && phrase[i] == phrase[i]){
i++;
}
cout << phrase[i];
}
You could start by splitting the line in phrase[i] at the points where there is whitespace:
std::istringstream iss{phrase[i]};
std::vector<std::string> words;
std::string word;
while (iss >> word)
words.push_back(std::move(word));
std::istringstream creates an input stream - a bit like cin - that contains the full line of text read from your file and stored in phrase[i]. If you then use >> word it will extract one whitespace-delimited word of text at a time.
Say your line/phrase[i] input contained "the blue socks were her favourites", it'll be split nicely into words. If there is also punctuation in the line, some of the strings in words will embed that punctuation, e.g. "world.". If you care about that, you can learn to use std::string member functions to search in and edit the strings.
In the case of punctuation you could use
word.erase(std::remove_if(word.begin(), word.end(), [](unsigned char c) { return std::ispunct(c); }), word.end());
to remove it (this is the erase-remove idiom; it needs <algorithm> and <cctype>).
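Putting the split and the punctuation removal together might look like the sketch below. The splitAndStrip name is made up for illustration, and dropping words that consist only of punctuation is an assumption about what you want.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on whitespace and strips punctuation
// from every word. Words that become empty (e.g. "--") are dropped.
std::vector<std::string> splitAndStrip(const std::string& line) {
    std::istringstream iss{line};
    std::vector<std::string> words;
    std::string word;
    while (iss >> word) {
        // Erase-remove idiom: remove_if shifts the kept characters to the
        // front, erase then cuts off the leftover tail.
        word.erase(std::remove_if(word.begin(), word.end(),
                                  [](unsigned char c) {
                                      return std::ispunct(c) != 0;
                                  }),
                   word.end());
        if (!word.empty()) {
            words.push_back(word);
        }
    }
    return words;
}
```

Calling splitAndStrip(phrase[i]) gives you a vector in which words[0], words[1], and so on are the individual words of that line.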
phrase[i] == phrase[i]
Well, that's just redundant. This will always return true for a vector holding strings.
for(int i = 0; (...); i++){
while( (...) ){
i++;
}
}
You are modifying the variable i twice in a single for loop: once in the third clause of the for statement and once in the inner while loop. That is almost never a good idea.
What's happening here is that you set i = 0 and then immediately advance it to the last element of the vector (as the second condition in the while is always true).
Then you print that element, which is the last line of your text file, to the console.
What you want to do is:
1. Load text file line by line into a vector.
2. Each element of vector will hold a single line.
3. Split each line into a vector of WORDS (space separated).
4. Work with the resulting vector.
Or perhaps:
1. Load file word by word at the beginning.
vector<string> words;
copy( istream_iterator<string>{YourFileStream}, istream_iterator<string>{}, back_inserter(words) ); // this copies the content of the file directly into the vector, whitespace-separated (no need for a while loop); requires <iterator>
for ( auto i = words.begin(); i != words.end(); ++i ) // iterator-based way of walking a vector: very similar, but i refers to each element in order ( not just to the index of an element )
{
// do some work on *i. at least:
std::cout << *i; // dereference operator (*) is needed here, since i doesn't hold index of an element, it's a "pointer" to an element
}
If you need the first approach ( to differentiate between words in different lines ), there are some excellent ways to separate a string by any delimiter (a space, for example) in: The most elegant way to iterate the words of a string
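A minimal sketch of the word-by-word approach, including modifying each string afterwards. The function names are illustrative, and capitalizing the first letter is just a stand-in for whatever change you need to make.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <sstream>  // std::istringstream, handy for trying this on a literal
#include <string>
#include <vector>

// Reads every whitespace-separated word from `in` into a vector.
std::vector<std::string> readWords(std::istream& in) {
    std::vector<std::string> words;
    std::copy(std::istream_iterator<std::string>{in},
              std::istream_iterator<std::string>{},
              std::back_inserter(words));
    return words;
}

// Modifies each word in place; the reference (&) is what makes the
// change visible in the vector itself rather than in a copy.
void capitalizeAll(std::vector<std::string>& words) {
    for (std::string& w : words) {
        if (!w.empty()) {
            w[0] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(w[0])));
        }
    }
}
```

A single word is then simply words[n] (or words.at(n) if you want bounds checking).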

Reading from file into string while counting the lines C++

I am trying to read a file into a string array, but I want to do it as if I do not know the length of the document. So I want one while loop to count the lines and then another one to read the document.
When I do this, it works fine but it assumes I know what the length is going to be for the size of the arrays.
string count_lines;//dummy string to read the line temp
string votes[11];
string ID[11];
string whole_line[11];
int i = 0;
while (getline(file, count_lines))
{
whole_line[i] = count_lines;
ID[i].assign(count_lines, 0, 4);
votes[i].assign(count_lines, 6, 4);
cout << count_lines << endl;
i++;
}
But I tried this variation, and it just prints blank lines when I print the arrays with the same function I used for the version above:
string count_lines;//dummy string to read the line temp
string votes[11];
string ID[11];
string whole_line[11];
int i = 0;
while (getline(file, count_lines))
{
i++;
}
int k = 0;
while (getline(file, count_lines) && k < i)
{
whole_line[k] = count_lines;
ID[k].assign(count_lines, 0, 4);
votes[k].assign(count_lines, 6, 4);
cout << count_lines << endl;
i++;
}
I am not sure what I'm doing wrong.
Each call to std::getline (as well as the >> operator and the read method) advances the input position stored in the stream object. In the first while loop you read the entire file, so after that loop the input position indicator points to the end of the file.
In order to start reading from the beginning in the second loop, you have to reset the position back to 0 using the ifstream::seekg method. Because the failed getline that ends the first loop leaves the stream in a failed state, call clear() on the stream before seekg. This way you'll be able to "re-read" the entire file.
On the other hand, as pointed out in the comments, this isn't really the best way to read a file into memory, line by line. It would probably be better to use std::vector to store lines and append lines read with getline to it. Alternatively, you could read the entire file at once into a single buffer and split it into lines.
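Both options can be sketched as follows. The function names are made up for illustration, and the functions take a std::istream& so that they work with an std::ifstream as well as with a string stream.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this on a literal
#include <string>
#include <vector>

// Option 1: count the lines, then rewind so the stream can be read again.
std::size_t countLinesAndRewind(std::istream& in) {
    std::size_t lines = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++lines;
    }
    in.clear();                  // reset eofbit/failbit set by the last getline
    in.seekg(0, std::ios::beg);  // move the read position back to the start
    return lines;
}

// Option 2: read every line once into a vector that grows as needed,
// so the number of lines never has to be known in advance.
std::vector<std::string> readLines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

With the second option, lines.size() is the line count, and the ID and votes fields can be cut out of each lines[k] afterwards with assign or substr, as in the question.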
If you really are just looking for the number of lines in your file, it is usually more efficient to read the entire file into a buffer at once and then count the newline characters it contains. The following code is one way this could be done.
#include <vector>
#include <algorithm>
#include <fstream>
int main()
{
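std::ifstream textFile("your_file.txt", std::ios::binary | std::ios::ate); // std::ios::ate opens at the end, so tellg() reports the file size
if (!textFile) return 1; // could not open the file
size_t fileSize = textFile.tellg();
textFile.seekg(0, std::ios::beg);
std::vector<char> buffer(fileSize);
size_t numLines(0);
if (textFile.read(buffer.data(), fileSize))
{
numLines = std::count(buffer.begin(), buffer.end(), '\n'); // a final line without a trailing '\n' is not counted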
}
return 0;
}

C++ - Program to count occurrences of word without ifstream

I have a code where the program will read a word from user and then count its total occurrence in a text file “my_data.txt”. But I don't want to use the ifstream function. I already have a text like "the sky is blue".
I want the program to read from that. I know I can create a string and add the text but how can I count the occurrences?
Here is my code so far:
#include<iostream.h>
#include<fstream.h>
#include<string.h>
int main()
{
ifstream fin("my_data.txt"); //opening text file
int count=0;
char ch[20],c[20];
cout<<"Enter a word to count:";
gets(c);
while(fin)
{
fin>>ch;
if(strcmp(ch,c)==0)
count++;
}
cout<<"Occurrence="<<count<<"\n";
fin.close(); //closing file
return 0;
}
Without using ifstream, you have some choices: cin and piping; or fscanf. I really don't understand why you don't want to use ifstream.
cin and Piping
You can use the cin stream and let the OS route the data file to your program.
Your loop would look something like this:
std::string word;
while (std::cin >> word)
{
// process the word
}
An example invocation using a command line is:
my_program.exe < my_data.txt
This invocation tells the Operating System to redirect the standard input to a driver that reads from the file my_data.txt.
Using fscanf
fscanf comes from C and can be used to read from files. Developing the correct format specifier for a word can be tricky, but it isn't std::ifstream.
Also, fscanf cannot be used safely with std::string, whereas std::ifstream can.
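For what it's worth, an fscanf version could look something like this sketch. The countWord name and the 64-byte buffer are arbitrary choices; the width in "%63s" has to stay one less than the buffer size so that fscanf cannot overflow it.

```cpp
#include <cstdio>
#include <cstring>

// Counts how often `target` appears as a whitespace-separated word in `fp`.
// Words longer than 63 characters are read in several pieces, so they
// would never match; that limit is an assumption of this sketch.
int countWord(std::FILE* fp, const char* target) {
    char word[64];
    int count = 0;
    // %63s skips leading whitespace and reads at most 63 characters,
    // leaving room for the terminating '\0'.
    while (std::fscanf(fp, "%63s", word) == 1) {
        if (std::strcmp(word, target) == 0) {
            ++count;
        }
    }
    return count;
}
```

You would open the file with std::fopen("my_data.txt", "r"), check the result for a null pointer, pass it to countWord, and close it with std::fclose afterwards.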
Edit 1: Words From a String
Since there is some ambiguity in your question, one interpretation is that you want to count words from a string of text.
Let's say you have a declaration like this:
const std::string sentence = "I'm hungry, feed me now.";
You could use std::istringstream and count the words:
std::string word;
std::istringstream sentence_stream(sentence);
unsigned int word_count = 0U;
while (sentence_stream >> word)
{
++word_count;
}
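If the goal is the number of occurrences of one particular word rather than the total word count, the same loop only needs a comparison. A small sketch (countOccurrences is an illustrative name; the match is exact, so "blue." with a trailing period does not count as "blue"):

```cpp
#include <sstream>
#include <string>

// Counts exact, case-sensitive matches of `target` among the
// whitespace-separated words of `text`.
int countOccurrences(const std::string& text, const std::string& target) {
    std::istringstream stream(text);
    std::string word;
    int count = 0;
    while (stream >> word) {
        if (word == target) {
            ++count;
        }
    }
    return count;
}
```

For the text from the question, countOccurrences("the sky is blue", "sky") yields 1.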

When parsing a string using a string stream, it extracts a new line character

Description of the program : The program must read in a variable amount of words until a sentinel value is specified ("#" in this case). It stores the words in a vector array.
Problem : I use a getline to read in the string and parse the string with a stringstream. My problem is that the stringstream is not swallowing the new line character at the end of each line and is instead extracting it.
Some solutions I have thought of are to cut off the last character by taking a substring, or to check whether the next extracted word is a new line character, but I feel there is a better, more efficient solution, such as changing the conditions of my loops.
I have included a minimized version of the overall code that reproduces the problem.
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main()
{
const int MAX_LIST_SIZE = 1000;
string str;
string list[MAX_LIST_SIZE];
int numWords = 0;
// program starts here
getline(cin, str); // read input
stringstream parse(str); // use stringstream to parse input
while(str != "#") // read in until sentinel value
{
while(!parse.fail()) // until all words are extracted from the line
{
parse >> list[numWords]; // store words
numWords++;
}
getline(cin,str); // get next line
parse.clear();
parse.str(str);
}
// print number of words
cout << "Number of words : " << numWords << endl;
}
And a set of test input data that will produce the problem
Input:
apples oranges mangos
bananas
pineapples strawberries
Output:
Number of words : 9
Expected Output:
Number of words : 6
I would appreciate any suggestions on how to deal with this problem in an efficient manner.
Your logic for parsing out the stream isn't quite correct. fail() only becomes true after a >> operation fails, so you'll be doing an extra increment each time. For example:
while(!parse.fail())
{
parse >> list[numWords]; // fails
numWords++; // increment numWords anyway
} // THEN check !fail(), but we incremented already!
All of these operations have returns that you should check as you go to avoid this problem:
while (getline(cin, str) && str != "#") { // stops when cin has no more lines or the sentinel is read
stringstream parse(str);
while (parse >> list[numWords]) { // fails if no more words
++numWords; // *only* increment if we got one!
}
}
Even better would be to not use an array at all for the list of words:
std::vector<std::string> words;
Which can be used in the inner loop:
std::string temp;
while (parse >> temp) {
words.push_back(temp);
}
The increment of numWords happens one more time than you intend at the end of each line. Use a std::vector<std::string> for your list; then you can use list.size() for the count.
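Combining the checked reads with a vector could look like the following sketch. The readUntilSentinel name is made up, and stopping at end of input when no sentinel appears is an assumption.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads lines from `in` until a line equal to `sentinel` (or the end of
// input) and returns all whitespace-separated words found before it.
std::vector<std::string> readUntilSentinel(std::istream& in,
                                           const std::string& sentinel) {
    std::vector<std::string> words;
    std::string line;
    while (std::getline(in, line) && line != sentinel) {
        std::istringstream parse(line);
        std::string word;
        // The extraction itself is the loop condition, so a failed read
        // never adds an entry.
        while (parse >> word) {
            words.push_back(word);
        }
    }
    return words;
}
```

Called as readUntilSentinel(std::cin, "#"), the size() of the returned vector is the number of words, which is 6 for the test input in the question.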

How to input text containing less than 1000 words with spaces and punctuations?

I am able to input string using the following code:
string str;
getline(cin, str);
But I want to know how to put an upper limit on the number of words that can be given as input.
You cannot do what you are asking with just getline or even read. If you want to limit the number of words, you can use a simple for loop and the stream extraction operator (>>).
#include <iostream>
#include <vector>
#include <string>
int main()
{
std::string word;
std::vector<std::string> words;
for (size_t count = 0; count < 1000 && std::cin >> word; ++count)
words.push_back(word);
}
This will read up to 1000 words and stuff them into a vector.
getline() reads characters and has no notion of what a word is. The definition of a word is likely to change with context and language. You'll need to read a stream one character at a time, extracting words that match your definition of a word and stop when you have met your limit.
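As an illustration of the character-by-character approach, here is a sketch that assumes a "word" is a maximal run of alphabetic characters. That definition is only an example (it splits "don't" into "don" and "t"), and the function name is made up.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this on a literal
#include <string>
#include <vector>

// Reads characters from `in` and collects at most `limit` words, where a
// word is assumed to be a run of alphabetic characters. Reading stops as
// soon as the limit is reached; the rest of the stream is left unread.
std::vector<std::string> readWordsLimited(std::istream& in, std::size_t limit) {
    std::vector<std::string> words;
    std::string current;
    char c;
    while (words.size() < limit && in.get(c)) {
        if (std::isalpha(static_cast<unsigned char>(c))) {
            current += c;  // still inside a word
        } else if (!current.empty()) {
            words.push_back(current);  // a non-letter ends the word
            current.clear();
        }
    }
    // The input may end in the middle of a word.
    if (!current.empty() && words.size() < limit) {
        words.push_back(current);
    }
    return words;
}
```

For the question's limit, readWordsLimited(std::cin, 1000) would return the first 1000 words with spaces and punctuation already stripped.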
You can either read one character at a time, or only process 1000 characters from your string(s).
You may be able to set a limit on std::string and use that.
The following stores at most count whitespace-separated words in a vector and discards the rest (it still reads the input to the end).
Punctuation is kept as part of a "word" here, because words are separated only by whitespace; you need to remove it from the vector's entries afterwards.
std::vector<std::string> v;
int count=1000;
std::copy_if(std::istream_iterator<std::string>(std::cin),
// can use a ifstream here to read from file
std::istream_iterator<std::string>(),
std::back_inserter(v),
[&](const std::string & s){return --count >= 0;}
);
Hope this program helps you out. This code handles input of multiple words in a single line as well.
#include<iostream>
#include<string>
#include<cstring>
#include<cstdio>
using namespace std;
int main()
{
const int LIMIT = 5;
int counter = 0;
string line;
string words[LIMIT];
bool flag = false;
char* word;
do
{
cout<<"enter a word or a line";
getline(cin,line);
word = strtok(&line[0]," "); // strtok writes into its argument, so pass the string's own buffer instead of casting away const from c_str()
while(word)
{
if(LIMIT == counter)
{
cout<<"Limit reached";
flag = true;
break;
}
words[counter] = word;
word = strtok(NULL," ");
counter++;
}
if(flag)
{
break;
}
}while(counter>0);
getchar();
}
As of now, this program is limited to accepting only 5 words, which it puts in a string array.
Use the following function:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms684961%28v=vs.85%29.aspx
You can specify the third argument to limit the number of characters read.