Read an entire line including spaces from fstream - c++

I am currently working on a small project in C++ and am a bit confused at the moment. I need to read a certain number of words in a line that is taken from a file using ifstream in(). The problem right now is that it keeps ignoring spaces. I need to count the number of spaces within the file to calculate the number of words. Is there any way to have in() not ignore the whitespace?
ifstream in("input.txt");
ofstream out("output.txt");
while(in.is_open() && in.good() && out.is_open())
{
in >> temp;
cout << tokencount(temp) << endl;
}

To count the number of spaces in a file:
std::ifstream inFile("input.txt");
std::istreambuf_iterator<char> it (inFile), end;
int numSpaces = std::count(it, end, ' ');
To count the number of whitespace characters in a file:
std::ifstream inFile("input.txt");
std::istreambuf_iterator<char> it (inFile), end;
int numWS = std::count_if(it, end, (int(*)(int))std::isspace);
As an alternative, instead of counting spaces, you could count words.
std::ifstream inFile("foo.txt");
std::istream_iterator<std::string> it(inFile), end;
int numWords = std::distance(it, end);

Here's how I'd do it:
std::ifstream fs("input.txt");
std::string line;
while (std::getline(fs, line)) {
int numSpaces = std::count(line.begin(), line.end(), ' ');
}
In general, if I have to do something for every line of a file, I find std::getline to be the least finicky way of doing it. If I need stream operators from there I'll end up making a stringstream out of just that line. It's far from the most efficient way of doing things but I'm usually more concerned with getting it right and moving on with life for this sort of thing.
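The getline-then-stringstream approach described above might look like the following minimal sketch. The function name wordsPerLine is made up for illustration, and it takes any std::istream so it works equally with a std::ifstream or a std::istringstream.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the number of whitespace-separated
// words found on each line of the stream.
std::vector<std::size_t> wordsPerLine(std::istream& in) {
    std::vector<std::size_t> counts;
    std::string line;
    while (std::getline(in, line)) {      // one iteration per line
        std::istringstream words(line);   // stream over just this line
        std::string word;
        std::size_t n = 0;
        while (words >> word) {           // operator>> skips whitespace
            ++n;
        }
        counts.push_back(n);
    }
    return counts;
}
```

Passing std::ifstream fs("input.txt") as the argument would give one count per line of the file; an empty line yields a count of zero.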

You can use count with an istreambuf_iterator:
ifstream fs("input.txt");
int num_spaces = count(istreambuf_iterator<char>(fs),
istreambuf_iterator<char>(),
' ');
Edit:
Originally my answer used istream_iterator, however as @Robᵩ pointed out, it doesn't work.
istream_iterator will iterate over a stream, but assume whitespace formatting and skip over it. My example above but using istream_iterator returned the result zero, as the iterator skipped whitespace and then I asked it to count the spaces that were left.
istreambuf_iterator however takes one raw character at a time, no skipping.
See istreambuf_iterator vs istream_iterator for more info.
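To make the difference concrete, here is a small sketch that counts spaces both ways. The function names are invented for this example, and a std::istringstream stands in for the file so the behaviour is easy to check.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Counts ' ' using istreambuf_iterator, which reads raw characters
// straight from the stream buffer without skipping anything.
std::ptrdiff_t countSpacesRaw(const std::string& text) {
    std::istringstream in(text);
    return std::count(std::istreambuf_iterator<char>(in),
                      std::istreambuf_iterator<char>(), ' ');
}

// Same count using istream_iterator<char>, which reads through
// operator>> and therefore skips whitespace before each character.
std::ptrdiff_t countSpacesFormatted(const std::string& text) {
    std::istringstream in(text);
    return std::count(std::istream_iterator<char>(in),
                      std::istream_iterator<char>(), ' ');
}
```

For the input "a b  c", the raw version reports three spaces, while the formatted version reports none because every space was skipped before the comparison happened.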

Related

calculating average length of words in a local text file

I need to find the
- average length of all the words
- the shortest and longest word length; and
- how many words are
in a separate text file, using c++. There are 79 words in the file and it is called "test.txt."
what i have so far is
#include <bits/stdc++.h>
#include <cstdio>
using namespace std;
int main()
{
FILE* fp;
char buffer[100];
fp = fopen("test.txt", "r");
while (!feof(fp)) // to read file
{
// function used to read the contents of file
fread(buffer, sizeof(buffer), 100, fp);
cout << buffer;
}
return 0;
}
All this does is print out the words that are in the file.
I am using an online compiler until i can get to my desktop with visual studio 2017 later today
Well, in C++, rather than FILE*, use a std::ifstream, a std::string word; variable, and the formatted text extraction operator>>() to read single words from the file in a loop:
std::ifstream infile("test.txt");
std::string word;
while(infile >> word) {
}
Count every word read from the file in a variable int wordCount;
int wordCount = 0;
while(infile >> word) {
++wordCount;
}
Sum up the character lengths of the read words in another variable int totalWordsCharacters; (you can use the std::string::length() function to determine the number of characters used in a word).
int totalWordsCharacters = 0;
while(infile >> word) {
totalWordsCharacters += word.length();
}
After you have finished reading the file, you can compute the average word length by dividing (note that this is integer division, so the result is truncated):
int avgCharacterPerWord = totalWordsCharacters / wordCount;
Here's a complete working example, the only difference is the '\n' in your input file format was replaced by a simple blank character (' ').
If you want to have the average between ALL the words, you have to add all lengths together and divide it by the number of words in your file (You said 79 words)
But if you want the average of only the shortest and the longest word, you first have to find those words.
You can do that by simply using two counters as you go through all the words. The first counter is set to the length of the current word if that length is smaller than the counter's value. The second counter is set to the length of the current word if that length is greater than the counter's value.
Then you will add those two counters together and divide them by 2.
Your problem is that you are writing C Code. This makes the problem harder.
In C++ reading a list of words from a file is simple using the >> operator.
std::ifstream file("FileName");
std::string word;
while(file >> word)
{
// I have read another word from the file.
// Do your calculations here.
}
// print out your results here after the loop.
Note the >> operator treats end of line just like a space and simply ignores it (It acts like a word separator).

Reading strings until end of line? [duplicate]

Is there some way to read consecutive words separated by spaces as strings until end of line is found in C++? To be precise, I'm working on an algorithmic problem and the input goes like:
some_word1 some_word2 some_word3 (...) some_wordn
other_data
And the trick is I don't know how many words there will be in the first line, just that I should read them as separate words for further processing. I know I could use getline(), but after that I'd have to work char-by-char to write each word in a new string when a space occurs. Not that it's a lot of work, I'm just curious if there's a better way of doing this.
Why would you have to work character by character after using getline?
The usual way of parsing line oriented input is to read line by line,
using getline, and then use an std::istringstream to parse the line
(assuming that is the most appropriate parsing tool, as it is in your
case). So to read the file:
std::string line;
while ( std::getline( input, line ) ) {
std::istringstream parse( line );
// ...
}
You could use sstream and combine it with getline(), which is something you already know.
#include <iostream>
#include <sstream>
#include <string>
int main()
{
std::string fl;
std::getline(std::cin, fl); // get first line
std::istringstream iss(fl);
std::string word;
while(iss >> word) {
std::cout << "|" << word << "|\n";
}
// now parse the other lines
while (std::getline(std::cin, fl)) {
std::cout << fl << "\n";
}
}
Output:
a b
|a|
|b|
a
a
g
g
t
t
You can see that the spaces are not saved.
Here you can see relevant answers:
Split a string in C++
Taking input of a string word by word
I would suggest reading the complete line as a string and splitting the string into a vector of strings.
How to split a string is covered in this question: Split a string in C++?
string linestr;
getline(cin, linestr); // cin >> linestr would stop at the first space
string buf; // Have a buffer string
stringstream ss(linestr); // Insert the string into a stream
vector<string> tokens; // Create vector to hold our words
while (ss >> buf)
tokens.push_back(buf);

How to find a string of 2 words in a file?

With the following code, I can find a string of 1 word (in this example I'm looking for "Word"):
ifstream file("file.txt");
string str;
while (file >> str){
if (str.find("Word") != string::npos){
////
}
}
But it doesn't work if I want to find, for example, "Computer screen", which is composed of two words.
Thanks
file >> str reads one value (in this case, a string) delimited by whitespace. If you want to read the whole line (or in any case, more than one word at once), you can use the std::getline function, which reads a string delimited by a newline by default.
ifstream file("file.txt");
string str;
while (std::getline (file,str)){
if (str.find("Computer screen") != string::npos){
////
}
}
If you know there are two words and what they are, all you need is this:
ifstream file("file.txt");
string str;
while (file >> str){
if (str.find("Computer") != string::npos){
file >> str;
if (str.find("screen") != string::npos) {
////
}
}
}
But more likely, you are asking to find a single string that might be two words, or three or more.
Then, can you count on the string being on a single line? In that case, @Ashalynd's solution will work. But if the string might be broken across two lines, it will fail. You then need to handle that case.
If your file is "small" - i.e. can easily fit in memory, read in the whole thing, remove line breaks and search for the string.
If it is too large, read in lines as pairs.
Something like this:
std::ifstream file("file.txt");
std::string str[2];
int i = 0;
std::getline (file,str[i]);
++i;
while (std::getline (file,str[i]))
{
int next_i = (i+1)%2;
std::string pair = str[next_i] + " " + str[i];
if (pair.find("Computer screen") != std::string::npos)
{
////
}
i = next_i;
}
All this assumes that the possible whitespace between the words in the string is a single space or a newline. If there is a line break with more whitespace of some kind (e.g. tabs), you need either to replace whitespace in the search string with a regex for whitespace, or implement a more complex state machine.
Also, consider whether you need to manage case, probably by converting all strings to lower case before the match.

Reading from an iostream

Maybe I'm missing something, but I'm having a lot of trouble finding any information on how to read from an iostream (std::iostream& stream). Is there a way I can convert it to a string or similar?
For clarification this is (what I'm basically trying to do, for example):
std::stringstream ss("Maybe I'm missing something \n but I'm having a lot of trouble finding any information on how to how to read from an iostream.");
readStream(ss);
void readStream(std::iostream& stream)
{
std::string out;
stream >> out;
// Do some stuff with the string
}
This seems to work, but out will be equal to "Maybe" rather than the full string.
You read from an iostream the same way you would if you were using cin.
stream >> varName;
Crazy syntax yes, but that's what the makers of streams decided to do.
You can also use get and getline if you're reading into strings. get will read the next character or a specified buffer of characters, and getline will read up to the next newline.
getline(stream, stringName);
You can read more on this here: http://cplusplus.com/reference/iostream/iostream/
Streams convert automatically to the type they are shifting into.
using namespace std;
int number;
double fraction;
string world;
stream >> number >> fraction >> world;
When shifting into a string, the stream reads only up to the first word delimiter; to read a whole line, you may wish to use std::getline.
using namespace std;
string line;
getline(stream,line);
Maybe you want to read whole lines. In this case you have to use std::getline, thus having:
void readStream(std::iostream& stream)
{
std::string out;
// while getting lines
while(std::getline(stream, out))
{
// Do some stuff with each line
}
}
You can also choose a line delimiter character, by passing it to std::getline as a third parameter.
The stream operator >> is used to read formatted white space separated text.
int val1;
stream >> val1; // reads a space separated int
float val2;
stream >> val2; // reads a space separated float
std::string val3;
stream >> val3; // reads a space separated word.
Unfortunately, std::string (and C-strings) are not symmetric: unlike the other fundamental types, input and output do not work the same way. When you write them, the full string is written (up to the null terminator, '\0', for a C-string), but reading with >> stops at the first whitespace.
If you want to read a whole line of text use std::getline()
std::string line;
std::getline(stream, line);
But like most languages, you can loop reading the stream until it is finished.
std::string word;
while(stream >> word)
{
// Reads one word at a time until the EOF.
std::cout << "Got a word (" << word << ")\n";
}
Or the same thing one line at a time:
std::string line;
while(std::getline(stream, line))
{
// Reads one line at a time until the EOF.
std::cout << "Got a line (" << line << ")\n";
}
Note 1: I mentioned white space separated above. White space includes space/tab and most importantly new line so using the operator >> above it will read one word at a time until the end of file, but ignore new line.
Note 2: The operator >> is supposed to be used on formatted text. Thus its first action is to drop prefix white space characters. On the first non white space text, parse the input as appropriate for the input type and stop on the first character that does not match that type (this includes white space).
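The behaviour in Note 2 can be seen with a short sketch. Parsed and parseLeadingInt are names invented for this example; the function reads an int and then a word from the same text.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type: the number read and the word after it.
struct Parsed {
    int number;
    std::string rest;
};

Parsed parseLeadingInt(const std::string& text) {
    std::istringstream in(text);
    Parsed p{0, ""};
    in >> p.number;  // skips leading whitespace, stops at first non-digit
    in >> p.rest;    // continues where the int left off, stops at whitespace
    return p;
}
```

Given "  42abc def", the int extraction drops the two leading spaces and consumes "42", leaving "abc def" in the stream; the string extraction then picks up "abc" and stops at the space.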

Why is this IO operation looping infinitely?

I am trying to read from a text file and tokenize the input. I was getting a segmentation fault until I realized I forgot to close my ifstream. I added the close call and now it loops infinitely. I'm just trying to learn how to use strtok for now, that is why the code doesn't really look complete.
void loadInstructions(char* fileName)
{
ifstream input;
input.open(fileName);
while(!input.eof());
{
string line;
getline (input,line);
char * lineChar = &line[0];
//instruction cmd; //This will be used later to store instructions from the parse
char * token;
token = strtok (lineChar," ");
// just trying to get the line number for now
int lineNumber = atoi(token);
cout << lineNumber << "\n";
}
input.close();
}
input file:(one line)
5 +8 0 0 25
This while(input.good()); is probably not what you intended: the trailing semicolon is an empty loop body, so the loop never reads anything and never ends.
Use this:
string line;
while(getline (input,line))
{
If the getline() works then the loop is entered.
If you try and read past the EOF then it will fail and the loop will exit.
So this should work as expected.
Rather than using strtok() (which modifies the string) and atoi() (which cannot report conversion errors), use std::stringstream:
std::stringstream linestream(line);
int lineNumber;
linestream >> lineNumber; // reads a number from the line.
Don't explicitly close() the stream (unless you want to detect and correct for any problems). The file will be closed when the object goes out of scope at the end of the function.
You want to use eof() not good().
Avoid strtok. There are other ways to tokenize a string that do not require the called function to modify your string. The fact that it modifies the string it tokenizes could also be what causes the loop here.
But more likely, the good() member is not the right one. Try !input.eof() or similar, depending on what you need.
While you've already gotten some answers to the question you asked, perhaps it's worth answering some you should have about the code that you didn't ask:
void loadInstructions(char* fileName)
Since the function isn't going to modify the file name, you almost certainly want to change this to:
void loadInstructions(char const *fileName)
or
void loadInstructions(std::string const &fileName)
ifstream input;
input.open(fileName);
It's much cleaner to combine these:
ifstream input(fileName);
or (if you passed a string instead):
ifstream input(fileName.c_str());
while(!input.eof());
This has already been covered.
string line;
getline (input,line);
char * lineChar = &line[0];
//instruction cmd; //This will be used later to store instructions from the parse
char * token;
token = strtok (lineChar," ");
// just trying to get the line number for now
int lineNumber = atoi(token);
Most of this is just extraneous. You can just let atoi convert directly from the original input:
string line;
getline(input, line);
int lineNumber = atoi(line.c_str());
If you're going to tokenize more later, you can use strtol instead:
char *end_ptr;
int lineNumber = strtol(line.c_str(), &end_ptr, 10);
This will set end_ptr to point just past the end of the part that strtol converted.
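To show how end_ptr helps with further tokenizing, here is a sketch that keeps calling strtol from wherever the previous call stopped. The function name parseNumbers is invented for this example, and it simply stops at the first thing that is not a number.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts every leading number on the line,
// using the end pointer from one call as the start of the next.
std::vector<long> parseNumbers(const std::string& line) {
    std::vector<long> values;
    const char* cursor = line.c_str();
    char* end = nullptr;
    for (;;) {
        const long value = std::strtol(cursor, &end, 10);
        if (end == cursor) {
            break;  // nothing was converted: end of input or not a number
        }
        values.push_back(value);
        cursor = end;  // resume just past the part strtol consumed
    }
    return values;
}
```

For the sample input line "5 +8 0 0 25" this yields the five values 5, 8, 0, 0 and 25, since strtol skips leading whitespace and accepts an optional sign.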
I'd also consider an alternative though: moving your code for reading and parsing a line into a class, and define operator>> to read those:
struct line {
int number;
operator int() const { return number; }
};
std::istream &operator>>(std::istream &is, line &l) {
// Just for fun, we'll read the data in an alternative fashion.
// Instead of read a line into a buffer, then parse out the first number,
// we'll read a number from the stream, then ignore the rest of the line.
// I usually prefer the other way, but this is worth knowing as well.
is >> l.number;
// When you're ready to parse more, add the extra parsing code here.
is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
return is;
}
With this in place, we can print out the line numbers pretty easily:
std::copy(std::istream_iterator<line>(input),
std::istream_iterator<line>(),
std::ostream_iterator<int>(std::cout, "\n"));
input.close();
I'd usually just let the stream close automatically when it goes out of scope.