Count first digit on each line of a text file - c++

My project takes a filename and opens it. I need to read each line of a .txt file until the first digit occurs, skipping whitespace, chars, zeros, or special chars. My text file could look like this:
1435 //1, nextline
0 //skip, next line
//skip, nextline
(*Hi 245*) 2 //skip until second 2 after comment and count, next line
345 556 //3 and count, next line
4 //4, nextline
My desired output would be all the way up to nine but I condensed it:
Digit Count Frequency
1: 1 .25
2: 1 .25
3: 1 .25
4: 1 .25
My code is as follows:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main() {
    int digit = 1;
    int array[10] = {0}; //one counter per digit 0-9
    string filename;
    //cout for getting user path
    //the compiler parses string literals differently so use a double backslash or a forward slash
    cout << "Enter the path of the data file, be sure to include extension." << endl;
    cout << "You can use either of the following:" << endl;
    cout << "A forward slash or double backslash to separate each directory." << endl;
    getline(cin, filename);
    ifstream input_file(filename.c_str());
    if (input_file.is_open()) { //if file is open
        cout << "open" << endl; //just a coding check to make sure it works ignore
        string fileContents; //string to store contents
        string temp;
        while (getline(input_file, temp)) { //read line by line until getline fails (safer than testing eof())
            fileContents.append(temp); //appends file to string
        }
        cout << fileContents << endl; //prints string for test
    }
    else {
        cout << "Error opening file check path or file extension" << endl;
    }
    return 0;
}
In this file format, (* signals the beginning of a comment, so everything from there to a matching *) should be ignored (even if it contains a digit). For example, given input of (*Hi 245*) 6, the 6 should be counted, not the 2.
How do I iterate over the file only finding the first integer and counting it, while ignoring comments?

One way to approach your problem is the following:
Create a std::map<int, int> where the key is the digit and the value is the count. This allows you to compute statistics on your digits such as the count and the frequency after you have parsed the file. Something similar can be found in this SO answer.
Read each line of your file as a std::string using std::getline as shown in this SO answer.
For each line, strip the comments using a function such as this:
std::string& strip_comments(std::string& inp,
                            std::string const& beg,
                            std::string const& fin = "") {
    std::size_t bpos;
    while ((bpos = inp.find(beg)) != std::string::npos) {
        if (fin != "") {
            std::size_t fpos = inp.find(fin, bpos + beg.length());
            if (fpos != std::string::npos) {
                inp.erase(bpos, fpos - bpos + fin.length());
            } else {
                // else don't erase because fin is not found, but break
                break;
            }
        } else {
            // no end delimiter given: erase from beg to the end of the line
            inp.erase(bpos, inp.length() - bpos);
        }
    }
    return inp;
}
which can be used like this:
std::string line;
std::getline(input_file, line);
line = strip_comments(line, "(*", "*)");
After stripping the comments, use the string member function find_first_of to find the first digit:
std::size_t dpos = line.find_first_of("123456789");
What is returned here is the index in the string of the first non-zero digit. You should check that the returned position is not std::string::npos, as that would indicate that no digit was found. If a digit is found, the corresponding character can be extracted using const char c = line[dpos]; and converted to an integer using c - '0' (std::atoi expects a null-terminated C string, not a single char, so it cannot be used on c directly).
Increment the count for that digit in the std::map as shown in that first linked SO answer. Then loop back to read the next line.
After reading all lines from the file, the std::map will contain the counts for all first digits found in each line stripped of comments. You can then iterate over this map to retrieve all the counts, accumulate the total count over all digits found, and compute the frequency for each digit. Note that digits not found will not be in the map.
I hope this helps you get started. I leave the writing of the code to you. Good luck!

Related

The problem of analyzing a string and searching

I want to write code that takes a string of text from the user and shows the number of characters and the number of words using the .find() function. Then it should take a word from the user, search the text, and show the position of the word. I'm stuck now, please help me.
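#include <iostream>
#include <string>
using namespace std;
int main()
{
    char quit = 'c'; //initialized so the search loop runs at least once
    int word = 0;
    string txt;
    string search;
    cout << "Enter a string: ";
    getline(cin, txt);
    cout << "The number of characters in the string is: " << txt.length() << endl;
    //count words: each word starts at a non-space character
    string::size_type pos = txt.find_first_not_of(' ');
    while (pos != string::npos)
    {
        ++word;
        pos = txt.find(' ', pos); //end of this word
        if (pos != string::npos)
            pos = txt.find_first_not_of(' ', pos); //start of the next word
    }
    cout << "The number of words is: " << word << endl;
    while (quit != 'q')
    {
        cout << "Enter a word to search for: ";
        cin >> search;
        string::size_type found = txt.find(search);
        if (found != string::npos)
            cout << "Found at position " << found << endl;
        else
            cout << "Not found" << endl;
        cout << "Enter (c) if you want to continue, and enter (q) if you want to quit: ";
        cin >> quit;
    }
    return 0;
}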
Here's an example of extracting words. There are many other methods.
static const char end_of_word_chars[] = "!?., :\t";
//...
// position of the first character that can start a word
std::string::size_type start = txt.find_first_not_of(end_of_word_chars);
while (start != std::string::npos)
{
    // the next end-of-word character marks the end of this word
    std::string::size_type end = txt.find_first_of(end_of_word_chars, start);
    std::string word = txt.substr(start, end - start); // end == npos takes the rest
    std::cout << word << "\n";
    if (end == std::string::npos)
        break;
    // skip the run of end-of-word characters to reach the next word
    start = txt.find_first_not_of(end_of_word_chars, end);
}
The above code uses an array of "end of word" characters to mark where a word ends. The string txt is searched from the beginning for the first character that is not in that set, which is the start of a word; the next end-of-word character after it marks the end of that word. In the while loop, the word between those two positions is extracted, the run of spaces or other non-word characters after it is skipped, and the loop repeats from the start of the next word.
Edit 1: String as stream
Another method is to treat the txt as a string stream and use operator>> to skip whitespace:
std::istringstream text_stream(txt);
std::string word;
while (text_stream >> word)
{
std::cout << word << "\n";
}
One issue with the above code fragment is that it doesn't account for word ending characters that are not spaces or tabs. So for example, in the text "Yes. I'm Home.", the period is included as part of the "word", such as "Yes." and "Home."

C++ : find word in a string, count how many times was found, then print meaning of the word

I'm working on an assignment and I'm at my wits' end. Right now I can't figure out what's missing or what I could change.
I need the program to read a file. If it finds a word beginning with the search string, it prints the word and its meaning. If it finds more than one such word, it should print only the words, without their meanings.
Right now, if the program finds more than one word, it prints the meaning for the first word and only the word for the others found.
I don't know what other loop I could use. If you could help me, I would be grateful.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
using namespace std;
int main()
{
ifstream dictionary("dictionary.txt");
if(!dictionary.is_open()){
cout<< "File failed to open" << endl;
return 0;
}
int option;
cout << "1.<starting>" << endl;
cout << "4.<stop>" << endl;
cin >> option;
string find_word;
string word, meaning;
string line;
string found;
int count = 0;
if (option == 1)
{
cout << "Find the meaning of the word beginning with the characters:";
cin >> find_word;
while (getline(dictionary,line))
{
stringstream ss(line);
getline (ss, word, ';');
getline (ss, meaning, ';');
if (word.rfind(find_word, 0) != string::npos)
{
count++;
if (count <=1)
{
found = word + meaning;
cout << found << endl;
}
if (count >= 2)
{
found = word ;
cout << found << endl;
}
}
}
}
if (option == 4)
{
return 0;
}
dictionary.close();
return 0;
}
EDIT
dictionary.txt looks like this:
attention; attentionmeaning
attention; attentionmeaning2
computer; computermeaning
criminal; criminalmeaning
boat; boatmeaning
alien; alienmeaning
atter; meaning
.
.
etc.
For example input is:
Find the meaning of the word beginning with the characters: att
This is what I get now (output):
attention attentionmeaning
attention
atter
This is what I expect (desired output):
attention
attention
atter
If the program finds only one matching word, it should print this:
Find the meaning of the word beginning with the characters: bo
output:
boat boatmeaning
As was already suggested, while reading the file you don't know whether there will be more than one entry matching your search term. So you need some intermediate structure to store all the matching entries.
After you have gathered all the results, you can easily check if the data contains more than one result, in which case you only print the "word" without the meaning. In case there is only one result, you can print the "word" together with its meaning.
The code for that could look something like this:
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Entry {
    std::string name;
    std::string meaning;
    bool startsWith(const std::string& str) const {
        // true only if name begins with str (find() would also match in the middle)
        return name.compare(0, str.size(), str) == 0;
    }
};

Entry createEntry(const std::string& line) {
    Entry entry;
    std::stringstream ss(line);
    std::getline(ss, entry.name, ';');
    std::getline(ss, entry.meaning, ';');
    return entry;
}

int main() {
    std::string query = "att";
    std::ifstream dictionary("dictionary.txt");
    std::vector<Entry> entries;
    std::string line;
    while (std::getline(dictionary, line)) {
        Entry entry = createEntry(line);
        if (entry.startsWith(query)) {
            entries.emplace_back(std::move(entry));
        }
    }
    // print the meaning only when there is exactly one match
    for (const Entry& entry : entries) {
        std::cout << entry.name << (entries.size() > 1 ? "\n" : " " + entry.meaning + '\n');
    }
}
This code could definitely be more optimized, but for the sake of simplicity, this should suffice.
The problem is that the first time through the loop you do not know whether one or more matching words will follow. I would suggest you create an empty list outside the loop and push every matching word and meaning pair onto it. Then, after the loop, if the size of the list is 1 you can output the word and meaning pair; otherwise use a for loop to go through the list and print just the words.

Reading text with blanks and numeric data from a file

So I have data in a text like this:
Alaska 200 500
New Jersey 400 300
.
.
And I am using ifstream to open it.
This is part of a course assignment. We are not allowed to read in the whole line all at once and parse it into the various pieces. So trying to figure out how to read each part of every line.
Using >> will only read in "New" for "New Jersey" due to the white space/blank in the middle of that state name. Have tried a number of different things like .get(), .read(), .getline(). I have not been able to get the whole state name read in, and then read in the remainder of the numeric data for a given line.
I am wondering whether it is possible to read the whole line directly into a structure. Of course, structure is a new thing we are learning...
Any suggestions?
Can't you just read the state name in a loop?
Read a string from the stream (cin, or your ifstream): if the first character of the string is numeric then you've reached the next field and you can exit the loop. Otherwise just append it to the state name and loop again.
Here is a line by line parsing solution that doesn't use any c-style parsing methods:
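// file is the std::ifstream opened on your data file
std::string line;
while (std::getline(file, line) && !line.empty()) {
    size_t startOfNumbers = line.find_first_of("0123456789");
    if (startOfNumbers == std::string::npos || startOfNumbers == 0)
        continue; // skip lines without a name followed by numbers
    // last non-space character before the numbers is the end of the name
    size_t endOfName = line.find_last_not_of(' ', startOfNumbers - 1);
    std::string name = line.substr(0, endOfName + 1); // Extract name
    std::stringstream nums(line.substr(startOfNumbers)); // Get rest of the line
    int num1, num2;
    nums >> num1 >> num2; // Read numbers
    std::cout << name << " " << num1 << " " << num2 << std::endl;
}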
If you can't use getline, do it yourself: Read and store in a buffer until you find '\n'. In this case you probably also cannot use all the groovy stuff in std::string and algorithm and might as well use good ol' C programming at that point.
Once you have grabbed a line, read your way backwards from the end of the line and:
Discard all whitespace until you find non-whitespace.
Gather the characters found into token 3 until you find whitespace again.
Read and discard the whitespace until you find the end of token 2.
Gather token 2 until you find more whitespace.
Discard the whitespace until you find the end of token 1. The rest of the line is all token 1.
Convert token 2 and token 3 into numbers. I like to use strtol for this.
You can build all of the above or Daniel's answer (use his answer if at all possible) into an overload of operator>>. This lets you
mystruct temp;
while (filein >> temp)
{
// do something with temp. Stick it in a vector, whatever
}
The code to do this looks something like (Stealing wholesale from What are the basic rules and idioms for operator overloading? <-- Read this. It could save your life one day)
std::istream& operator>>(std::istream& is, mystruct & obj)
{
// read obj from stream
if( /* no valid object of T found in stream */ )
is.setstate(std::ios::failbit);
return is;
}
Here's another example of reading the file word by word. Edited to remove the example using the eof check as the while loop condition. I also included a struct, since you mentioned that's what you just learned. I'm not sure how you're supposed to use your struct, so I just made it simple and had it contain 3 variables: a string and 2 ints. To verify that it reads correctly, it prints the contents of the struct variables after they are read in, which includes printing "New Jersey" as one name.
#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib> // for atoi
using namespace std;
// Not sure how you're supposed to use the struct you mentioned. But for this example it'll just contain 3 variables to store the data read in from each line
struct tempVariables
{
std::string state;
int number1;
int number2;
};
// This will read the set of characters and return true if it's a number, or false if it's just string text
bool is_number(const std::string& s)
{
return !s.empty() && s.find_first_not_of("0123456789") == std::string::npos;
}
int main()
{
tempVariables temp;
ifstream file;
file.open("readme.txt");
std::string word;
std::string state;
bool stateComplete = false;
bool num1Read = false;
bool num2Read = false;
if(file.is_open())
{
while (file >> word)
{
// Check if text read in is a number or not
if(is_number(word))
{
// Here set the word (which is the number) to an int that is part of your struct
if(!num1Read)
{
// if code gets here we know it finished reading the "string text" of the line
stateComplete = true;
temp.number1 = atoi(word.c_str());
num1Read = true; // won't read the next text in to number1 var until after it reads a state again on next line
}
else if(!num2Read)
{
temp.number2 = atoi(word.c_str());
num2Read = true; // won't read the next text in to number2 var until after it reads a state again on next line
}
}
else
{
// reads in the state text
temp.state = temp.state + word + " ";
}
if(stateComplete)
{
cout<<"State is: " << temp.state <<endl;
temp.state = "";
stateComplete = false;
}
if(num1Read && num2Read)
{
cout<<"num 1: "<<temp.number1<<endl;
cout<<"num 2: "<<temp.number2<<endl;
num1Read = false;
num2Read = false;
}
}
}
return 0;
}

How can I ignore the "end of line" or "new line" character when reading text files word by word?

Objective:
I am reading a text file word by word, and am saving each word as an element in an array. I am then printing out this array, word by word. I know this could be done more efficiently, but this is for an assignment and I have to use an array.
I'm doing more with the array, such as counting repeated elements, removing certain elements, etc. I also have successfully converted the files to be entirely lowercase and without punctuation.
Current Situation:
I have a text file that looks like this:
beginning of file
more lines with some bizzare spacing
some lines next to each other
while
others are farther apart
eof
Here is some of my code, with itemsInArray initialized to 0 and an array of words referred to as wordArray[ (appropriate length for my file) ]:
ifstream infile;
infile.open(fileExample);
while (!infile.eof()) {
    string temp;
    getline(infile, temp, ' '); // Successfully reads words separated by a single space
    if ((temp != "") && (temp != " ") && (temp != "\n") && (temp != "\0")) {
        wordArray[itemsInArray] = temp;
        itemsInArray++;
    }
}
The Problem:
My code is saving the end of line character as an item in my array. In my if statement, I've listed all of the ways I have tried to exclude the end of line character, but I've had no luck.
How can I prevent the end of line character from being saved as an item in my array?
I've tried a few other methods I have found on similar threads, including something with a const char* that I couldn't make work, as well as iterating through and deleting the newline characters. I've been working on this for hours, I don't want to repost the same issue, and I have tried many, many methods.
The standard >> operator overloaded for std::string already uses whitespace (including newlines) as the word boundary, so your program can be simplified a lot.
#include <iostream>
#include <string>
#include <vector>
int
main()
{
std::vector<std::string> words {};
{
std::string tmp {};
while (std::cin >> tmp)
words.push_back(tmp);
}
for (const auto& word : words)
std::cout << "'" << word << "'" << std::endl;
}
For the input you are showing, this will output:
'beginning'
'of'
'file'
'more'
'lines'
'with'
'some'
'bizzare'
'spacing'
'some'
'lines'
'next'
'to'
'each'
'other'
'while'
'others'
'are'
'farther'
'apart'
'eof'
Isn't this what you want?
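The stream's extraction operator should take care of that for you:
std::ifstream ifs("file.txt");
std::string word;
// operator>> skips spaces, tabs and newlines; the loop stops when no word can be read,
// so the last word is kept even if the file does not end with a newline
while (ifs >> word)
{
    std::cout << word << "\n";
}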
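A variant that writes a small test file and then reads it back word by word:
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string n; //a std::string, not an uninitialized char* that input >> n would write through
    int count = 0;
    ofstream output("user.txt");
    output << "aa bb cc";
    output.close();
    ifstream input("user.txt");
    while (input >> n) //stops cleanly at end of file, no extra empty read
    {
        if (count > 0)
            cout << " "; //separate words with a single space
        cout << n;
        ++count;
    }
    cout << "\ncount=" << count;
    cin.get(); //portable replacement for the non-standard getch()
}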

Searching for a phrase in a text file c++

I'm trying to read a text file to find how many times a phrase/sentence(/substring?) occurs. I've done a real bodge job on it currently (see code below) but as you'll see, it relies on some rather clunky if statements.
I don't have access at home to the files I'll be using it on, so I've used a file called big.txt and searched for phrases like "and the" for the time being.
Ideally, I'd like to be able to search for "this error code 1" and it return the number of times it occurs. Any ideas on how I might get my code to work that way would be incredibly useful!
int fileSearch(string errorNameOne, string errorNameTwo, string textFile) {
    string output; //variable that will store word from text file
    int count = 0; //number of times the two-word error code was found
    int marker = 0; //set to 1 when the previous word matched errorNameOne
    ifstream inFile;
    inFile.open(textFile); //open the selected text file
    if (!inFile.is_open()) { //Check to make sure the file has opened correctly
        cerr << "The file cannot be opened";
        exit(1);
    }
    while (inFile >> output) { //Read one word at a time into "output" until the end of the file
        if (output == errorNameOne) { //Check to look for first word of error code
            marker = 1; //If this word is present, set a marker to 1
        }
        else if (marker == 1) { //If the marker is set to 1,
            if (output == errorNameTwo) { //and if the word matches the second error code...
                count++; //increase count
            }
            marker = 0; //either way, set marker to 0 again
        }
    }
    inFile.close(); //Close the opened file
    return count; //Function returns count of error
}
Given that your phrase can only occur once per line and the number follows the phrase after a number of spaces, you can read the file line by line and use std::string::find() to see if your phrase is somewhere in the line. That will return the position of the phrase. You can then check the rest of the line immediately after the phrase to test whether the number is 1 or 0.
This code may not be exactly what you want (still not certain of the exact specs) but hopefully it should contain enough examples of what you can do to achieve your goal.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
// pass the open file stream in to this function along with the
// phrase you are looking for and the number to check
int count(std::istream& is, const std::string& phrase, const int value)
{
int count = 0;
std::string line;
while(std::getline(is, line)) // read the stream line by line
{
// check if the phrase appears somewhere in the line (pos)
std::string::size_type pos = line.find(phrase);
if(pos != std::string::npos) // phrase found pos = position of phrase beginning
{
// turn the part of the line after the phrase into an input-stream
std::istringstream iss(line.substr(pos + phrase.size()));
// attempt to read a number and check if the number is what we want
int v;
if(iss >> v && v == value)
++count;
}
}
return count;
}
int main()
{
const std::string file = "tmp.txt";
std::ifstream ifs(file);
if(!ifs.is_open())
{
std::cerr << "ERROR: Unable to open file: " << file << '\n';
return -1;
}
std::cout << "count: " << count(ifs, "Header Tangs Present", 1) << '\n';
}
Hope this helps.