Read input file word by word [closed] - c++

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I'll say up front that yes, this is a homework assignment, but my teacher is really not very clear on how to do things.
In C++, I'm asked to write a function that will be passed words from a file one at a time. The function will calculate the word length and then print TO SCREEN the word and its length on its own line.
main will open the input file, read it word by word in a loop, and then pass each word into the function to be printed.
I know how to open a file using fstream and read it word by word, but not in a loop or inside a function like void readfile(). My problem is putting everything together.
This is my program to open a file, get each word's length and display it, using a parallel array:
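#include <fstream>
#include <iostream>
#include <string>
using namespace std;

const int MAXSIZE = 100; // assumed capacity; the original does not show its value

int main()
{
    ifstream inputFile;
    ofstream outputFile;
    //declare parallel arrays
    string words[MAXSIZE];
    int count = 0; // how many words were actually read
    //open files
    outputFile.open("output.txt");
    inputFile.open("/Users/cathiedeane/Documents/CIS 22A/Lab 4/Lab 4 Part 2/lab4.txt");
    //input validation
    if (!inputFile)
    {
        cout << "Could not open input file" << endl;
        return 1;
    }
    // read until extraction fails (end of file) or the array is full;
    // testing the read itself avoids the extra pass that !eof() causes
    while (count < MAXSIZE && inputFile >> words[count])
    {
        outputFile << words[count] << endl;
        ++count;
    }
    inputFile.close();
    // only indexes 0..count-1 hold words (i <= MAXSIZE would run past the end)
    for (int i = 0; i < count; i++)
    {
        cout << words[i] << ":" << words[i].size() << endl;
    }
    //close outputfile
    outputFile.close();
    return 0;
}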

So basically your assignment is:
function read_word
    /* what you have to work on */
end
function read_file_word_by_word
    open file
    while not end_of_file
        word = read_word
        print word, word_length
    end
    close file
end
To read a word, you need to define what it is. Usually it's a bunch of letters, delimited by other characters that are not letters (whitespace, commas, etc.).
You could read the file character by character and store them when they are letters until you encounter some other kind of character. What you have stored is a word, and you can get its length quite easily.
Tip: istream::get (http://www.cplusplus.com/reference/istream/istream/get/) lets you read a single character from a file.

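#include <cctype>
#include <fstream>
#include <iostream>
#include <locale>
#include <string>
#include <vector>
using namespace std;

void func(const string& word)
{
    //set field width
    cout.width(30);
    cout << left << word << " " << word.size() << endl;
}

// ctype facet that treats all punctuation as whitespace, so operator>>
// splits words at commas, periods, etc. (portable replacement for the
// MSVC-internal _Getctype()/_Ctypevec, which also modified a shared table)
struct punct_as_space : ctype<char>
{
    punct_as_space() : ctype<char>(make_table()) {}
    static const mask* make_table()
    {
        // start from a copy of the classic "C" classification table
        static vector<mask> tbl(classic_table(), classic_table() + table_size);
        for (int ch = 0; ch < 128; ++ch)
        {
            //set all punctuations as field separator of extraction
            if (ispunct(ch))
                tbl[ch] = space;
        }
        return tbl.data();
    }
};

int main(int argc, char* argv[])
{
    ifstream ifs("F:\\tmp\\test.txt");
    if (ifs.fail())
    {
        cout << "fail to open file" << endl;
        return -1;
    }
    //change the default locale object of ifstream (the locale deletes the facet)
    ifs.imbue(locale(ifs.getloc(), new punct_as_space));
    string word;
    while (ifs >> word)
    {
        func(word);
    }
    ifs.close();
}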

You'll obviously want to put each word into its own array index. To separate the words, establish a break point such as char delimiter = ' '; (note that break itself is a reserved keyword and cannot be used as a variable name). Then, while your stream is reading the file, store each word at the current index and advance it with a counter (i++).

Now that some time has passed since you asked the question, I'd like to add that this can be answered with quite a small amount of code:
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
void printString( const string & str ) { // ignore the & for now, you'll get to it later.
    cout << str << " : " << str.size() << endl;
}
int main() {
    ifstream fin("your-file-name.txt");
    if (!fin) {
        cout << "Could not open file" << endl;
        return 1;
    }
    string word; // You only need one word at a time.
    while( fin >> word ) {
        printString(word);
    }
    fin.close();
}
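A small note on fin >> word: the expression returns the stream itself, which converts to true as long as a word was successfully read into the string, and to false once a read fails (for example at the end of the file). It also skips any leading whitespace (tab, space and newline) by default.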

Related

Finding pattern in a text in C++

I have written the following code to count the occurrences of "ATA" in a text that is read into a string as "GCTATAATAGCCATA". The count should be 3, but it returns 0. When I check in the debugger, the string text is initially filled; however, an empty string is passed to the function patternCount. Am I reading the contents of the file into the string text correctly?
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
void patternCount(string text, string pattern);
int main()
{
    string text;
    fstream file_("test.txt");
    if(file_.is_open())
    {
        while(getline(file_,text))
        {
            cout << text << '\n';
        }
        file_.close();
    }
    cout << "Enter a string ";
    string pattern;
    getline(cin, pattern);
    patternCount(text, pattern);
    return 0;
}
void patternCount(string text, string pattern)
{
    int count = 0;
    size_t nPos = text.find(pattern, 0);
    while (nPos != string::npos)
    {
        nPos = text.find(pattern, nPos + 1);
        ++count;
    }
    cout << "There are " << count <<" " << pattern << " in your text.\n";
}
This code only counts the occurrences of the input string in the last line of the text file. If that line is empty or does not contain the string, the result will be 0.
But I guess the OP wants to search the whole file, in which case the main function needs to be fixed accordingly:
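std::ifstream file{"test.txt"};
std::ostringstream text;  // needs <sstream>; std::copy needs <algorithm>, the iterators <iterator>
// istream_iterator<char> reads with operator>>, so whitespace (including newlines) is skipped
std::copy(std::istream_iterator<char>{file}, std::istream_iterator<char>{},
          std::ostream_iterator<char>{text});
//...
patternCount(text.str(), pattern);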
So if I understand correctly, you're not sure whether you're reading the contents of test.txt correctly. If you want to read the entire contents, try this instead:
ifstream file_("test.txt");
string s,text;
file_>>s;
text=s;
while(file_>>s)
{
    text=text+" "+s;
}
This should work. Note that reading from a file with file_ >> s only reads up to the next whitespace; that's why I'm using the while loop. You can also use getline(), which reads a whole line including its spaces. Also note that you need to include <fstream>. Printing out text afterwards will also help you check what was actually read.
#include <iostream>
#include <fstream>
#include <string>
using std::cout;
using std::cerr;
using std::cin;
using std::string;
int count = 0; // we will count the total pattern count here
void patternCount(string text, string pattern);
int main() {
    cout << "Enter a string ";
    string pattern;
    std::getline(cin, pattern);
    string text;
    std::fstream file_("test.txt");
    if (file_.is_open()) {
        while (std::getline(file_, text))
            patternCount(text, pattern);   // count each line as soon as it is read
        file_.close();
    } else
        cerr << "Failed to open file\n";
    cout << "There are " << count << " " << pattern << " in your text.\n";
    return 0;
}
void patternCount(string text, string pattern) {
    std::size_t nPos = text.find(pattern, 0);
    while (nPos != string::npos) {
        nPos = text.find(pattern, nPos + 1);
        ++count;
    }
}
The Problem
Your code was fine; there were no bugs in the patternCount function.
But you were reading the file incorrectly. Every time you call std::getline(file_, text), the previous contents of text are overwritten by the new line. So at the end of the loop, when you pass text to patternCount, text contains at most the last line of the file. If the file ends with a newline, text is even empty, because the final, failing getline call clears text before it discovers there is nothing left to read.
The Solution
You could have solved it in two ways:
1. As mentioned above, call patternCount() on each line inside the while loop and update a global count variable.
2. Append all the lines to text inside the while loop and call patternCount once at the end.
Either works; I have implemented the first, while the second is shown in other answers.

First letters being erased when reading from data file c++

Can someone please explain why the first letters are being deleted when reading from a data file, but only in parts 1/2/3 of the array and not in part 0? (Sorry, I really don't know how to explain it.) (I'll only include part of what I am getting, as well as the data file.)
What I get:
GoogleyleSmith01#gmail.comyleman27ecurity question:White rabbit with a watch
Deviantartragonmaster27andalfthegreyNULL
What it's supposed to be:
GoogleKyleSmith01#gmail.comKyleman27securityquestion:Whiterabbitwithawatch
DeviantartDragonmaster27GandalfthegreyNULL
And the original data file:
Google;KyleSmith01#gmail.com;Kyleman27;security question:White rabbit with a watch;
Deviantart;Dragonmaster27;Gandalfthegrey; NULL;
I won't include all of the code, as the rest shouldn't be relevant to this issue:
#include<iostream>
#include <fstream>
#include <string>
#include <vector>
#include<sstream>
#include <cstdlib>
using namespace std;
const int NCOLS = 4;
const int NROWS = 10;
void description_and_options(string data[][NCOLS], int count[NCOLS]);
void available_options();
void view_line_data(int choice,string data[][NCOLS]);
int main()
{
    ifstream file_name;//create the new file
    string user_input_file;//the file's name input by the user
    int stringlength;
    string read_in_array[NROWS][NCOLS];
    string line;
    int counter[10] = { 1,2,3,4,5,6,7,8,9,10 };
    string user_option_choice;
    string small_exit = "x";
    string large_exit = "X";
    int view_choice;
    cout << "Enter the name of the input file: ";
    cin >> user_input_file;
    if (user_input_file.length() > 4)// check to see if it's more than 4 in length
    {
        stringlength = user_input_file.length(); //saves length
        if (user_input_file.substr(stringlength - 4, 4) == ".txt")//checks to see if it's .txt
        {
            file_name.open(user_input_file.c_str());
            if (file_name.fail())
            {
                cerr << "The file " << user_input_file << " failed to open.\n";//tells user if it fails
                exit(1);
            }
        }
    }
    else
    {
        user_input_file += ".txt";//adds .txt to short names
        file_name.open(user_input_file.c_str());
    }
    if (file_name.fail())
    {
        cout << "File failed to open" << endl;
        system("PAUSE");
        exit(1);
    }
    for (int row = 0; row <= 9; row++)
    {
        for (int col = 0; col < 4; col++)
        {
            if (getline(file_name, line, ';'))
            {
                file_name.ignore(1, '\n');
                read_in_array[row][col] = line;
                cout << read_in_array[row][col];
            }
        }
        cout << endl;
    }
    //[updown][leftright]
    file_name.close();
Is there any way to fix this without completely changing the code?
It is ignoring the first character because you tell it to:
file_name.ignore(1, '\n');
This ignores the next character in the stream after each call to getline. It looks like you are doing this because you think the ; is still in the stream. What you need to remember about getline is that it extracts and discards the delimiter you pass it: it reads until it finds a ; and then throws that ; away. So you do not need to ignore it, since it is no longer there.
Just removing the call to ignore is not enough to fix the issue, though. Reading up to each ; would also capture the newline at the end of a line as the start of the next field. Since you are trying to parse an entire line, read the line into a stringstream first and then call getline on that stream to get the individual parts.
A quick refactor of your code gives you something like this:
for (int row = 0; row <= 9; row++)
{
    std::string temp;
    std::getline(file_name, temp);   // read one whole line (the '\n' is discarded)
    std::stringstream ss(temp);      // then split that line at each ';'
    for (int col = 0; col < 4; col++)
    {
        if (getline(ss, line, ';'))
        {
            read_in_array[row][col] = line;
            cout << read_in_array[row][col];
        }
    }
    cout << endl;
}
You are using ifstream::ignore() incorrectly. From the reference:
Extracts characters from the input sequence and discards them, until either n characters have been extracted, or one compares equal to delim.
file_name.ignore(1, '\n'); always discards one character; in your case, that is the first letter after each ";" in the line.
file_name.ignore(1, '\n'); will make the stream ignore one character from the input.
From reading your code:
For what you call "the 0 part": on the first row, ignore has not been called yet before the first getline in the loop, and on later rows the character it skipped was the '\n' at the end of the previous line.
For "parts 1/2/3", the ignore statement makes the stream skip the next character, which is the first letter of the field.
For the remaining parts, the skipped character was a space or a '\n', so no visible letter was lost.

Why is this word sorting program only looping once?

I'm trying to create a word sorting program that will read the words in a .txt file and then write them to a new file in order from shortest words to longest words. So, for instance, if the first file contains:
elephant
dog
mouse
Once the program has executed, I want the second file (which is initially blank) to contain:
dog
mouse
elephant
Here's the code:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
    string word;
    ifstream readFrom;
    ofstream writeTo;
    readFrom.open("C:\\Users\\owner\\Desktop\\wordlist.txt");
    writeTo.open("C:\\Users\\owner\\Desktop\\newwordlist.txt");
    if (readFrom && writeTo)
    {
        cout << "Both files opened successfully.";
        for (int lettercount = 1; lettercount < 20; lettercount++)
        {
            while (readFrom >> word)
            {
                if (word.length() == lettercount)
                {
                    cout << "Writing " << word << " to file\n";
                    writeTo << word << endl;
                }
            }
            readFrom.seekg(0, ios::beg); //resets read pos to beginning of file
        }
    }
    else
        cout << "Could not open one or both of files.";
    return 0;
}
For the first iteration of the for loop, the nested while loop seems to work just fine, writing the correct values to the second file. However, something goes wrong in all the next iterations of the for loop, because no further words are written to the file. Why is that?
Thank you so much.
while (readFrom >> word)
{
    // ...
}
readFrom.seekg(0, ios::beg); //resets read pos to beginning of file
The while loop continues until an extraction fails, which sets the failbit (and the eofbit) on readFrom. Seeking to the beginning does not clear failbit (since C++11, seekg clears only eofbit), and while failbit is set, seekg does nothing and every later read fails immediately. Add the following line right before the seek to clear the flags, and your code should work fine:
readFrom.clear();
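Clear the error flags before the seek; clearing them only after the seek is too late, because seekg does nothing while failbit is set: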
readFrom.clear();

HW Help: get char instead of get line C++

I wrote the code below that successfully gets a random line from a file; however, I need to be able to modify one of the lines, so I need to be able to get the line character by character.
How can I change my code to do this?
Use std::istream::get instead of std::getline. Just read your string character by character until you reach \n, EOF or other errors. I also recommend you read the full std::istream reference.
Good luck with your homework!
UPDATE:
OK, I don't think an example will hurt. Here is how I'd do it if I were you:
#include <string>
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <ctime>
using namespace std;
static std::string
answer (const string & question)
{
    std::string answer;
    const string filename = "answerfile.txt";
    ifstream file (filename.c_str ());
    if (!file)
    {
        cerr << "Can't open '" << filename << "' file.\n";
        exit (1);
    }
    // read lines 0..r (r random in 0..4); only the last one read is kept
    for (int i = 0, r = rand () % 5; i <= r; ++i)
    {
        answer.clear ();
        char c;
        // read the line character by character until '\n' or end of file
        while (file.get (c).good () && c != '\n')
        {
            if (c == 'i') c = 'I'; // Replace character? :)
            answer.append (1, c);
        }
    }
    return answer;
}
int
main ()
{
    srand (time (NULL));
    string question;
    cout << "Please enter a question: " << flush;
    cin >> question;
    cout << answer (question) << endl;
}
... the only thing is that I don't see why you need to read the string character by character in order to modify it. You can modify a std::string object directly, which is even easier. Say you want to replace "I think" with "what if": you might be better off reading more about std::string and using find, erase, replace, etc.
UPDATE 2:
What happens with your latest code is simply this: you open a file, then read its content character by character until you reach a newline (\n). So either way you end up reading only the first line, and then your do-while loop terminates. In my example, the while loop that reads a line up to \n sits inside a for loop. That is basically what you should do: repeat your loop as many times as the number of lines you want (or can get) from the file. For example, something like this will read two lines:
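for (int i = 1; i <= 2; ++i)
{
    char answer;
    // read one character at a time; stop at the end of the line, or if
    // get() fails (e.g. end of file) so the loop cannot spin forever
    while (answerfile.get (answer) && answer != '\n')
    {
        cout << answer << " (from line " << i << ")\n";
    }
}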

Counting occurrences of letter in a file

I'm trying to count the number of times each letter appears in a file. When I run the code below it counts "Z" twice. Can anyone explain why?
The test data is:
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
#include <iostream> //Required if your program does any I/O
#include <iomanip> //Required for output formatting
#include <fstream> //Required for file I/O
#include <string> //Required if your program uses C++ strings
#include <cmath> //Required for complex math functions
#include <cctype> //Required for letter case conversion
using namespace std; //Required for ANSI C++ 1998 standard.
int main ()
{
    string reply;
    string inputFileName;
    ifstream inputFile;
    char character;
    int letterCount[127] = {};
    cout << "Input file name: ";
    getline(cin, inputFileName);
    // Open the input file.
    inputFile.open(inputFileName.c_str()); // Need .c_str() to convert a C++ string to a C-style string
    // Check the file opened successfully.
    if ( ! inputFile.is_open())
    {
        cout << "Unable to open input file." << endl;
        cout << "Press enter to continue...";
        getline(cin, reply);
        exit(1);
    }
    while ( inputFile.peek() != EOF )
    {
        inputFile >> character;
        //toupper(character);
        letterCount[static_cast<int>(character)]++;
    }
    for (int iteration = 0; iteration <= 127; iteration++)
    {
        if ( letterCount[iteration] > 0 )
        {
            cout << static_cast<char>(iteration) << " " << letterCount[iteration] << endl;
        }
    }
    system("pause");
    exit(0);
}
As others have pointed out, you have two Qs in the input. The reason you get two Zs is that the last
inputFile >> character;
(probably run when only a newline is left in the stream, so peek() has not returned EOF yet) skips that newline, finds nothing to read and fails, leaving the 'Z' from the previous iteration in the variable 'character', which is then counted again. Try checking inputFile.fail() after the read to see this:
while (inputFile.peek() != EOF)
{
    inputFile >> character;
    if (!inputFile.fail())
    {
        letterCount[static_cast<int>(character)]++;
    }
}
The idiomatic way to write the loop, and which also fixes your 'Z' problem, is:
while (inputFile >> character)
{
    letterCount[static_cast<int>(character)]++;
}
There are two Q's in your uppercase string. I believe the reason you get two counts for Z is that you should check for EOF after reading the character, not before, but I am not sure about that.
Others have already pointed out the error in your code.
But here is an elegant way to read the file and count the letters in it:
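#include <algorithm>
#include <cctype>
#include <fstream>
#include <iostream>
#include <iterator>
#include <locale>
#include <map>
#include <vector>

// ctype facet that classifies every character as whitespace except the
// ASCII letters, so operator>> (and istream_iterator) skips everything else
struct letter_only: std::ctype<char>
{
    letter_only(): std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::space);
        // mark only A-Z and a-z as letters (a single fill from 'A' to 'z'
        // would also include [ \ ] ^ _ and `)
        std::fill(&rc['A'], &rc['Z'+1], std::ctype_base::alpha);
        std::fill(&rc['a'], &rc['z'+1], std::ctype_base::alpha);
        return &rc[0];
    }
};
struct Counter
{
    std::map<char, int> letterCount;
    void operator()(char item)
    {
        // non-letters never get here: the facet makes the stream skip them
        ++letterCount[static_cast<char>(std::tolower(static_cast<unsigned char>(item)))]; //remove tolower if you want a case-sensitive solution!
    }
    operator std::map<char, int>() { return letterCount ; }
};
int main()
{
    std::ifstream input;
    input.imbue(std::locale(std::locale(), new letter_only())); //enable reading letters only!
    input.open("filename.txt");
    std::istream_iterator<char> start(input);
    std::istream_iterator<char> end;
    std::map<char, int> letterCount = std::for_each(start, end, Counter());
    for (std::map<char, int>::iterator it = letterCount.begin(); it != letterCount.end(); ++it)
    {
        std::cout << it->first <<" : "<< it->second << std::endl;
    }
}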
This is a modified (untested) version of this solution:
Elegant ways to count the frequency of words in a file
For one thing, you do have two Q's in the input.
Regarding Z, @Jeremiah is probably right that it is counted twice because it is the last character and your code does not detect EOF properly. This can easily be verified by, e.g., changing the order of the input characters.
As a side note, here
for (int iteration = 0; iteration <= 127; iteration++)
your index goes out of bounds; either the loop condition should be iteration < 127, or your array declared as int letterCount[128].
Given that you apparently only want to count English letters, it seems like you should be able to simplify your code considerably:
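#include <cctype>
#include <fstream>
#include <iostream>

int main(int argc, char **argv) {
    if (argc < 2) {  // expects the input file name as the first argument
        std::cerr << "usage: " << argv[0] << " <file>\n";
        return 1;
    }
    std::ifstream infile(argv[1]);
    char ch;
    static int counts[26];
    while (infile >> ch)
        if (std::isalpha(static_cast<unsigned char>(ch)))
            ++counts[std::tolower(static_cast<unsigned char>(ch)) - 'a'];
    for (int i=0; i<26; i++)
        // cast back to char, otherwise 'A' + i is printed as a number
        std::cout << static_cast<char>('A' + i) << ": " << counts[i] <<"\n";
    return 0;
}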
Of course, there are quite a few more possibilities. Compared to @Nawaz's code (for example), this is obviously quite a bit shorter and simpler, but it's also more limited: as it stands, it only works with unaccented English letters. It's pretty much restricted to the basic ASCII letters; EBCDIC, ISO 8859-x, or Unicode will break it completely.
His code also makes it easy to apply the "letters only" filtering to any file. Choosing between them depends on whether you want, need, or can use that flexibility. If you only care about the letters mentioned in the question, and only on typical machines that use some superset of ASCII, this code will handle the job more easily; if you need more than that, it's not suitable at all.