Searching for a phrase in a text file in C++

I'm trying to read a text file to find how many times a phrase/sentence(/substring?) occurs. I've done a real bodge job on it currently (see code below) but as you'll see, it relies on some rather clunky if statements.
I don't have access to the files I'll be using it on at home, so I've used a file called big.txt and search for phrases like "and the" for the time being.
Ideally, I'd like to be able to search for "this error code 1" and have it return the number of times it occurs. Any ideas on how I might get my code to work that way would be incredibly useful!
int fileSearch(string errorNameOne, string errorNameTwo, string textFile) {
    string output;   // holds each word read from the text file
    int marker = 0;  // 1 when the previous word matched errorNameOne
    int count = 0;   // number of times the two-word phrase was found
    ifstream inFile;
    inFile.open(textFile); // open the selected text file
    if (!inFile.is_open()) {
        cerr << "The file cannot be opened";
        exit(1);
    }
    while (inFile >> output) { // read word by word until extraction fails
        if (output == errorNameOne) { // check for the first word of the error code
            marker = 1; // if this word is present, set a marker to 1
        }
        else if (marker == 1) { // if the marker is set to 1,
            if (output == errorNameTwo) { // and the word matches the second error code...
                count++; // increase count
            }
            marker = 0; // either way, set marker to 0 again
        }
    }
    inFile.close(); // close the opened file
    return count; // function returns count of error
}

Given that your phrase can only occur once per line and the number follows the phrase after a number of spaces, you can read the file line by line and use std::string::find() to see if your phrase is somewhere in the line. That will return the position of the phrase. You can then check the rest of the line immediately after the phrase to test the number for 1 or 0.
This code may not be exactly what you want (still not certain of the exact specs) but hopefully it should contain enough examples of what you can do to achieve your goal.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
// pass the open file stream in to this function along with the
// phrase you are looking for and the number to check
int count(std::istream& is, const std::string& phrase, const int value)
{
int count = 0;
std::string line;
while(std::getline(is, line)) // read the stream line by line
{
// check if the phrase appears somewhere in the line (pos)
std::string::size_type pos = line.find(phrase);
if(pos != std::string::npos) // phrase found pos = position of phrase beginning
{
// turn the part of the line after the phrase into an input-stream
std::istringstream iss(line.substr(pos + phrase.size()));
// attempt to read a number and check if the number is what we want
int v;
if(iss >> v && v == value)
++count;
}
}
return count;
}
int main()
{
const std::string file = "tmp.txt";
std::ifstream ifs(file);
if(!ifs.is_open())
{
std::cerr << "ERROR: Unable to open file: " << file << '\n';
return -1;
}
std::cout << "count: " << count(ifs, "Header Tangs Present", 1) << '\n';
}
Hope this helps.
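If the phrase can appear more than once per line, or you just want a plain count of a phrase such as "this error code 1", the same find() idea can be run in a loop. This is only a sketch: countPhrase is a hypothetical helper name, and it assumes a phrase never spans a line break.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the answer above): counts every
// non-overlapping occurrence of `phrase` in the stream, line by line.
// Assumption: a phrase never spans a line break.
std::size_t countPhrase(std::istream& is, const std::string& phrase)
{
    if (phrase.empty()) return 0;          // avoid an endless loop on ""
    std::size_t total = 0;
    std::string line;
    while (std::getline(is, line))
    {
        std::size_t pos = 0;
        // find() resumes from `pos`, so repeated hits on one line are counted
        while ((pos = line.find(phrase, pos)) != std::string::npos)
        {
            ++total;
            pos += phrase.size();          // skip past this match
        }
    }
    return total;
}
```

You would call it with an open std::ifstream, for example countPhrase(ifs, "and the"). Note that a plain substring search for "this error code 1" also matches "this error code 10", so check the character after the match if that matters for your log format.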

Related

Infile stops reading command line arguments

Edit: As it turns out, it was a simple typo under the if(i==0) statement. I missed putting {} to enclose both first_nonterminal statements.
I'm creating a CFG for an assignment, but I've gotten stuck. My program is supposed to read a file (of strings) by getting the file name from the command line, and then do certain things with the contents of the file.
using namespace std;
string current, first_nonterminal;
int main(int argc, char *argv[])
{
if(argc != 2)
{
std::cout << "No file name given" << std::endl; // if there is no file name in command line
exit(1);
}
ifstream infile(argv[1]);
if(!infile)
{
std::cout << "Given file " << argv[1] << " will not open."; // if file refuses to open
exit(2);
}
string word;
for(int i = 0; infile >> word; ++i)
{
cout << word << endl; // (debug) print input word
try // check if first word is in correct format
{
if (i == 0 && word.find(':') == string::npos) // check only first word,
{
throw runtime_error("File does not have correct format.");
}
}
catch(runtime_error &e)
{
cout << "Error:" << e.what();
exit(3);
}
if (i==0)
first_nonterminal = word;
first_nonterminal.pop_back(); // remove colon
insert(word); //put string through insert() method
}
randomize(); // randomize and replace
print(); // print end result
infile.close();
}
The above code intakes a file which is formatted like so:
STMT: THIS THAT OTHER
THIS: That carpet
THIS: Atlanta
THAT: is wild
OTHER: .
OTHER: , oooh OTHER2
OTHER2: oooh OTHER2
OTHER2: !
Any word that has a colon following it is considered a nonterminal, with the words following it considered terminals. Regardless, I've figured out the issue isn't my randomize() or insert() functions, as they work perfectly if I hard-code the file into the program. My issue is that the file stops being read after a certain number of strings, and I'm not sure why. For example, when I pass the name of the file above on the command line, the program runs, but after it puts "That" into the insert() function, it prints "carpet" via the debug cout, and then stops.
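As the edit at the top notes, the fix is to brace both statements under if (i == 0). Without the braces, pop_back() runs on every pass and keeps shrinking first_nonterminal until it is empty; calling pop_back() on an empty std::string is undefined behavior, which fits the program stopping a few words in. A minimal sketch of the corrected loop, where firstNonterminal is a hypothetical helper and the insert() call from the question is left out:

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the first word of the stream without its
// trailing colon, mirroring what the question's loop does for i == 0.
std::string firstNonterminal(std::istream& in)
{
    std::string first, word;
    for (int i = 0; in >> word; ++i)
    {
        if (i == 0)
        {   // braces keep BOTH statements under the condition
            if (word.find(':') == std::string::npos)
                throw std::runtime_error("File does not have correct format.");
            first = word;
            first.pop_back();   // remove the colon, once only
        }
        // insert(word) from the question would be called here
    }
    return first;
}
```

With the sample grammar file, firstNonterminal(infile) should yield "STMT" and the loop should reach the end of the file.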

C++ How to get substring from ifstream and getline()

What is printed to the console:
START(0,0)
GOAL(0,2)
ooox
xxoo
ooox
I want to be able to obtain the substring of the START and GOAL points, not including the brackets just the coordinate pair. I would also want to store them as variables as well since I want to add validation whether the START or GOAL points are out of bounds from the grid.
I am trying to make an application to traverse the 2D grid, where the 'x' represents the blocked paths and 'o' represents unblocked.
The starting point is always from the bottom left of the grid as represented below:
(0,2)(1,2)(2,2)(3,2)
(0,1)(1,1)(2,1)(3,1)
(0,0)(1,0)(2,0)(3,0)
I have tried using .substr() method with the start and end points of where I would like to store the values but it does not print out anything in the console.
void Grid::loadFromFile(const std::string& filename){
std::string line;
std::ifstream file(filename);
file.open(filename);
// Reads the file line by line and outputs each line
while(std::getline(file, line)) {
std::cout << line << std::endl;
}
std::string startPoint, goalPoint;
startPoint = line.substr(6,3);
std::cout << startPoint << std::endl;
file.close();
}
I expect std::cout << startPoint << std::endl; to print the substring into the console but it just reads the file and prints whatever is in it, and nothing else.
The problem is you are reading ALL lines of the file first, THEN you are parsing only the last line that was read, asking for a starting index that is out of range.
You need to move your parsing inside the reading loop instead:
void Grid::loadFromFile(const std::string& filename)
{
std::ifstream file(filename);
if (!file.is_open()) return;
std::string line, startPoint, goalPoint;
std::vector<std::string> grid;
while (std::getline(file, line))
{
if (line.compare(0, 5, "START") == 0)
startPoint = line.substr(6,3);
else if (line.compare(0, 4, "GOAL") == 0)
goalPoint = line.substr(5,3);
else
grid.push_back(line);
}
file.close();
std::cout << startPoint << std::endl;
std::cout << goalPoint << std::endl;
// process grid as needed...
}
Or, if you know the 1st two lines are ALWAYS START and GOAL:
void Grid::loadFromFile(const std::string& filename)
{
std::ifstream file(filename);
if (!file.is_open()) return;
std::string line, startPoint, goalPoint;
std::vector<std::string> grid;
if (!std::getline(file, line)) return;
if (line.compare(0, 5, "START") != 0) return;
startPoint = line.substr(6,3);
if (!std::getline(file, line)) return;
if (line.compare(0, 4, "GOAL") != 0) return;
goalPoint = line.substr(5,3);
while (std::getline(file, line))
grid.push_back(line);
file.close();
std::cout << startPoint << std::endl;
std::cout << goalPoint << std::endl;
// process grid as needed...
}
I believe getline stores each line of the file into the string line on every pass of the while loop until it reaches the end of the file.
So after the loop, line is essentially empty.
You need either an alternative way of reading the file or a way of storing the data for use outside the loop (perhaps a string array).
Hope that helps :)
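The question also asks for the coordinates as variables so they can be validated against the grid. One way to get numbers instead of a "0,0" substring is to let a string stream parse the bracketed pair. This is a sketch only: Point, parsePoint and inBounds are hypothetical names, not part of the Grid class in the question.

```cpp
#include <sstream>
#include <string>

// Hypothetical type for illustration; not part of the question's Grid class.
struct Point { int x; int y; };

// Parses the "(x,y)" part of a line such as "START(0,0)" or "GOAL(0,2)".
// Returns false if the line does not start with `label` or is malformed.
bool parsePoint(const std::string& line, const std::string& label, Point& out)
{
    if (line.compare(0, label.size(), label) != 0) return false;
    std::istringstream iss(line.substr(label.size()));
    char open = 0, comma = 0, close = 0;
    Point p{0, 0};
    if (!(iss >> open >> p.x >> comma >> p.y >> close)) return false;
    if (open != '(' || comma != ',' || close != ')') return false;
    out = p;
    return true;
}

// True if p lies inside a grid of the given size (origin at bottom left).
bool inBounds(const Point& p, int width, int height)
{
    return p.x >= 0 && p.x < width && p.y >= 0 && p.y < height;
}
```

Inside the reading loop you could call parsePoint(line, "START", start) and parsePoint(line, "GOAL", goal), then check inBounds once the grid rows are loaded. Unlike substr(6,3), this also copes with coordinates of more than one digit.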

How can I label the lines of an existing file?

Let's say I have a text file containing something like:
Four
score
and
seven
years
ago
...
I want to be able to label these lines so that after the program runs, the file looks like:
1.Four
2.score
3.and
4.seven
5.years
6.ago
...
I've prepared a solution; however, I find it to be heavyweight and it has the problem of labeling one past the last line...
std::string file = "set_test - Copy.txt";
std::ifstream in_test{file};
std::vector<std::string> lines;
while(in_test) {
std::string temp;
getline(in_test, temp);
lines.push_back(temp);
}
in_test.close();
std::ofstream out_test{file};
for(unsigned int i = 0; i < lines.size(); ++i) {
out_test << i+1 << '.' << lines[i] << '\n';
}
On top of being heavy-weight, this solution also labels the line beyond the last line of text.
Does anyone have a better solution to this problem?
The cause of your problem is this structure
while (stream is good)
read from stream
do something
as it will read too much. (See this Q&A for explanation.)
What's happening is that the very last getline, the one that actually reaches the end of the file, will fail and leave temp empty.
Then you add that empty line to your lines.
The "canonical" stream-reading loop structure is
while (attempt to read)
do something with the result
in your case,
std::string temp;
while (getline(in_test, temp)) {
lines.push_back(temp);
}
If you write to a different file you don't need to store anything except the last line; you can write each line immediately.
If you want to replace the original, you can replace the old with the new afterwards.
Something like this:
std::ifstream in_test{"set_test - Copy.txt"};
std::ofstream out_test{"set_test - Numbered.txt"};
if (!in_test || !out_test) {
std::cerr << "There was an error in the opening of the files.\n";
return;
}
int i = 1;
std::string line;
while (getline(in_test, line) && out_test << i << '.' << line << '\n') {
i++;
}
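The same loop can be written against generic streams, which makes it easy to try without touching any files. A sketch, with numberLines as a hypothetical helper name:

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies `in` to `out`, prefixing each line with "N.".
// Returns the number of lines written.
int numberLines(std::istream& in, std::ostream& out)
{
    int n = 0;
    std::string line;
    // Read first, then write; stop as soon as either step fails.
    while (std::getline(in, line) && out << (n + 1) << '.' << line << '\n')
        ++n;
    return n;
}
```

Call it with the two file streams from above, numberLines(in_test, out_test). To replace the original afterwards, close both streams and rename the new file over the old one, for example with std::rename from <cstdio>; whether std::rename overwrites an existing target is implementation-defined, so removing the original first (or using std::filesystem::rename in C++17) may be safer.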

Count first digit on each line of a text file

My project takes a filename and opens it. I need to read each line of a .txt file until the first digit occurs, skipping whitespace, chars, zeros, or special chars. My text file could look like this:
1435 //1, nextline
0 //skip, next line
//skip, nextline
(*Hi 245*) 2 //skip until second 2 after comment and count, next line
345 556 //3 and count, next line
4 //4, nextline
My desired output would be all the way up to nine but I condensed it:
Digit Count Frequency
1: 1 .25
2: 1 .25
3: 1 .25
4: 1 .25
My code is as follows:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main() {
int digit = 1;
int array[8];
string filename;
//cout for getting user path
//the compiler parses string literals differently so use a double backslash or a forward slash
cout << "Enter the path of the data file, be sure to include extension." << endl;
cout << "You can use either of the following:" << endl;
cout << "A forwardslash or double backslash to separate each directory." << endl;
getline(cin,filename);
ifstream input_file(filename.c_str());
if (input_file.is_open()) { //if file is open
cout << "open" << endl; //just a coding check to make sure it works ignore
string fileContents; //string to store contents
string temp;
while (!input_file.eof()) { //not end of file I know not best practice
getline(input_file, temp);
fileContents.append(temp); //appends file to string
}
cout << fileContents << endl; //prints string for test
}
else {
cout << "Error opening file check path or file extension" << endl;
}
return 0;
}
In this file format, (* signals the beginning of a comment, so everything from there to a matching *) should be ignored (even if it contains a digit). For example, given input of (*Hi 245*) 6, the 6 should be counted, not the 2.
How do I iterate over the file only finding the first integer and counting it, while ignoring comments?
One way to approach your problem is the following:
Create a std::map<int, int> where the key is the digit and the value is the count. This allows you to compute statistics on your digits such as the count and the frequency after you have parsed the file. Something similar can be found in this SO answer.
Read each line of your file as a std::string using std::getline as shown in this SO answer.
For each line, strip the comments using a function such as this:
std::string& strip_comments(std::string & inp,
std::string const& beg,
std::string const& fin = "") {
std::size_t bpos;
while ((bpos = inp.find(beg)) != std::string::npos) {
if (fin != "") {
std::size_t fpos = inp.find(fin, bpos + beg.length());
if (fpos != std::string::npos) {
inp = inp.erase(bpos, fpos - bpos + fin.length());
} else {
// else don't erase because fin is not found, but break
break;
}
} else {
inp = inp.erase(bpos, inp.length() - bpos);
}
}
return inp;
}
which can be used like this:
std::string line;
std::getline(input_file, line);
line = strip_comments(line, "(*", "*)");
After stripping the comments, use the string member function find_first_of to find the first digit:
std::size_t dpos = line.find_first_of("123456789");
What is returned here is the index location in the string for the first digit. You should check that the returned position is not std::string::npos, as that would indicate that no digits are found. If the first digit is found, the corresponding character can be extracted using const char c = line[dpos]; and converted to an integer with c - '0'. (std::atoi is not suitable here: it expects a null-terminated string, not a single char.)
Increment the count for that digit in the std::map as shown in that first linked SO answer. Then loop back to read the next line.
After reading all lines from the file, the std::map will contain the counts for all first digits found in each line stripped of comments. You can then iterate over this map to retrieve all the counts, accumulate the total count over all digits found, and compute the frequency for each digit. Note that digits not found will not be in the map.
I hope this helps you get started. I leave the writing of the code to you. Good luck!
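For readers who want to see the steps above put together, here is one possible sketch. countFirstDigits is a hypothetical name, the comment stripping is inlined rather than calling strip_comments, and it assumes a (* ... *) comment never spans more than one line.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: tallies the first nonzero digit of each line after
// removing "(* ... *)" comments. Assumption: a comment never spans two lines.
std::map<int, int> countFirstDigits(std::istream& in)
{
    std::map<int, int> counts;
    std::string line;
    while (std::getline(in, line))
    {
        // remove every complete comment on this line
        std::size_t b;
        while ((b = line.find("(*")) != std::string::npos)
        {
            std::size_t e = line.find("*)", b + 2);
            if (e == std::string::npos) break;   // unterminated: leave as is
            line.erase(b, e - b + 2);
        }
        // zero is skipped on purpose, as the question requires
        std::size_t d = line.find_first_of("123456789");
        if (d != std::string::npos)
            ++counts[line[d] - '0'];             // char digit to int
    }
    return counts;
}
```

The frequency of a digit is then its count divided by the sum of all counts in the map; cast to double before dividing to avoid integer division.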

Detect last line of file C++

I've been working on some code for a file parser function to learn some C++:
It's supposed to read in this text file:
>FirstSeq
AAAAAAAAAAAAAA
BBBBBBBBBBBBBB
>SecondSeq
TTTTTTTTTTTTTT
>ThirdSequence
CCCCCCCCCCCCCC
>FourthSequence
GGGGGGGGGGGGGG
and print out the names (lines with '>' at the start) and then the sequences.
However from the output:
AAAAAAAAAAAAAABBBBBBBBBBBBBB
TTTTTTTTTTTTTT
CCCCCCCCCCCCCC
FirstSeq
SecondSeq
ThirdSequence
FourthSequence
We see that the final line of G characters is not included. The code is below. It loops over the lines: if it finds a name, it appends it to the vector of names; if it finds a sequence, it appends it to a temporary string (in case the sequence spans more than one line, like the first sequence). When it reaches the next sequence, it stores the built-up temporary string in a vector, then overwrites the temporary string and starts again. I suspect the cause is in the while loop: the line fullSequence.push_back(currentSeq); is only called after a new name has been detected, to push the old temporary string onto the vector, so it is never called for the last line of G's. The name "FourthSequence" is recorded and the line of G's is read into the temporary string, but the string is never passed to the vector. So, how can I detect that this is the last line of the file and make sure the temporary string is pushed onto the vector?
Thanks,
Ben.
CODE:
#include<cstdio>
#include<fstream>
#include<iostream>
#include<string>
#include<vector>
using namespace std;
void fastaRead(string fileName)
{
ifstream inputFile;
inputFile.open(fileName);
if (inputFile.is_open()) {
vector<string> fullSequence, sequenceNames;
string currentSeq;
string line;
bool newseq = false;
bool firstseq = true;
cout << "Reading Sequence" << endl;
while (getline(inputFile, line))
{
if (line[0] == '>') {
sequenceNames.push_back(line.substr(1,line.size()));
newseq = true;
} else {
if (newseq == true) {
if(firstseq == false){
fullSequence.push_back(currentSeq);
} else {
firstseq = false;
}
currentSeq = line;
newseq = false;
} else {
currentSeq.append(line);
}
}
}
//Report back the sequences and the sequence names...
for ( vector<string>::iterator i = fullSequence.begin(); i != fullSequence.end(); i++) {
cout << *i << endl;
}
for ( vector<string>::iterator i = sequenceNames.begin(); i != sequenceNames.end(); i++) {
cout << *i << endl;
}
cout << fullSequence.size() << endl;
cout << sequenceNames.size() << endl;
inputFile.close();
} else {
perror("error whilst reading this file");
}
if(inputFile.bad()){
perror("error whilst reading this file");
}
}
int main()
{
cout << "Fasta Sequence Filepath" << endl;
string input = "boop.txt";
fastaRead(input);
return 0;
}
getline() only fails once there is nothing left to read, so the last line does go through your loop. The problem is that currentSeq is only pushed onto fullSequence when the first line of the next sequence arrives, so the final sequence is never stored.
I've solved this problem two ways, either by having two flags or just by processing the last line after the loop.
For two flags, the loop requires both to be true. You set one to false when getline() fails, and you set the other to false once the first one is false; this gives you one extra pass through the loop after EOF.
Good luck!
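The "process the last one after the loop" option can be sketched like this. FastaRecord and readFasta are hypothetical names, and keeping each name together with its sequence is a design choice of this sketch rather than something from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for illustration.
struct FastaRecord { std::string name; std::string sequence; };

// Reads ">name" lines followed by one or more sequence lines.
std::vector<FastaRecord> readFasta(std::istream& in)
{
    std::vector<FastaRecord> records;
    FastaRecord current;
    bool have = false;            // true once the first '>' line is seen
    std::string line;
    while (std::getline(in, line))
    {
        if (!line.empty() && line[0] == '>')
        {
            if (have) records.push_back(current);   // flush previous record
            current.name = line.substr(1);
            current.sequence.clear();
            have = true;
        }
        else if (have)
        {
            current.sequence += line;               // multi-line sequences
        }
    }
    if (have) records.push_back(current);           // flush the final record
    return records;
}
```

The single push_back after the loop is what stores the last sequence; no end-of-file detection is needed inside the loop. In the question's code the equivalent change would be to push currentSeq onto fullSequence once the while loop has finished.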