Finding words in text file to extract data using C++


I am trying to find the total amount of time spent doing a certain activity with C++ and Mac Automator (you do not need to know Automator to help me). I am using Mac Automator to output a text file using the "Event Summary" and "New Text File" actions. It outputs a text file like this:
Viewable text file
I am currently struggling over something very trivial; I cannot accurately find the words "Time" and "Date" in the text file. If I cannot find the words "Time" and "Date" I cannot begin processing the total amount of time spent doing that activity or whether that activity went over midnight (I sometimes work into the ams). So far I think I have spent four hours with mixed results. Any feedback would be appreciated.
The code below is what I am using at the moment. It finds the words "Time" and "Date" at the very start of the file, or if a ':' is directly in front of the word, but when the word is on a different line the program fails:
cout << "Reading from the file...." << endl;
infile.open("calendar workflow text.txt");
while (infile.getline(buff, BUFFSIZE, ':')) { // reads everything
    cout << buff << endl; // prints everything
    if (strcmp("Time", buff) == 0) {
        cout << "Time found in text\n" << endl;
    }
    else if (strcmp("Date", buff) == 0) {
        cout << "Date found in text\n" << endl;
    }
}
infile.close();
cout << "Total Time in all events: " << sumtime << " hrs" << endl;
return (0);
If you want the automator workflow I can give it to you.

There are a few assumptions that you need to check:
1. Is "Time" always going to start at the very first character position, with no space or tab before it?
2. Can there be multiple "Time" words in the same line?
3. Can another word appear before "Time"?
strcmp("Time", buff) only matches when the entire string buff is exactly the one word "Time".
That is not what you want. If assumption 1 is true, you can simply do
if (strncmp(buff, "Time", 4) == 0) {
    // do something, as you found time
}
Otherwise, for a generic position, you can use strstr(buff, "Time") for a substring match, where "Time" could be anywhere in the string. strstr returns a pointer to the start of the match (or NULL if there is none); from there, skip over exactly the number of characters needed to reach the value for the time. Extract that and perform your calculations.
Typically, when parsing files, you will have to make some allowance for spaces, tabs, etc. Otherwise, the code becomes too brittle and can fail test cases that deviate ever so slightly.
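As a rough illustration of the substring approach with some whitespace tolerance, here is a minimal sketch. It assumes the file is read one line at a time and that a field looks like "Time: value"; the actual Event Summary layout is not shown in the question, so that format, the function name extractField, and its parameters are all hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: if `line` contains `label` followed by a ':',
// store the trimmed text after the colon in `value` and return true.
// Only the first occurrence of `label` in the line is examined.
bool extractField(const std::string& line, const std::string& label,
                  std::string& value)
{
    const std::string spaces = " \t\r";

    // Substring search, so leading spaces or tabs do not matter.
    std::size_t pos = line.find(label);
    if (pos == std::string::npos)
        return false;

    // Skip the label, then any whitespace, and expect a ':'.
    pos = line.find_first_not_of(spaces, pos + label.size());
    if (pos == std::string::npos || line[pos] != ':')
        return false;

    // Trim whitespace (including a stray '\r') around the value.
    std::size_t first = line.find_first_not_of(spaces, pos + 1);
    if (first == std::string::npos) {
        value.clear();
        return true;
    }
    std::size_t last = line.find_last_not_of(spaces);
    value = line.substr(first, last - first + 1);
    return true;
}
```

Called from a std::getline loop over the file, extractField(line, "Time", value) would leave everything after the colon in value, ready to be parsed into hours and minutes.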

Related

weird visual studio to linux results

I have a project where we have to convert a simple made-up coding language, given in a .txt file, into C++, and I am currently having a problem when checking my work on Linux (where it will be tested). In Visual Studio, where I wrote my code, it works as it should.
From what I have observed, something weird is going on when I take each line of the text, separate the words, and place them in a vector. I do this by starting with an empty string and looking at each character of the line; when I see a space, a tab, or any other character that signifies a separator, I push that string back onto a vector and then make the string empty again for the next characters. So something like "STR man = bloo" turns into <"STR", "man", "=", "bloo"> and so on.
In Visual Studio it worked like this, but on Linux the second element somehow seems to gain a weird character, giving "man ". It's not a space or an empty line, and when I tried to look at the size with std::cout << name << ":" << size << std::endl, the output did not have the right format or result.
[Two screenshots: the Visual Studio output and the Linux output]
Looking at the Visual Studio result:
- first line: "size", how many elements are in that vector.
- second line: a name that I want to check is already in the vector, along with the length of that name.
- third line: all the elements in that vector (which should show only 1, since the size is 1), along with their lengths.
Looking at the Linux result:
- first line: "size" equals 1, which is right.
- second line: the name I'm checking and its length, which is right.
- third line: this is where the problem occurs. The name is 1 character longer than it is supposed to be, and instead of being added onto what's printed out, the rest of the output replaces the first 3 letters?
- fourth line: should not even happen, because the size is only 1 and it's a basic for loop of (int i = 0; i < size; i++).
I've tried brute force: just popping the last character off the second element, which makes my Visual Studio code fail the tests, but on Linux that still fails the condition that checks whether the name equals the name in the vector (which should be true). Whatever character I add after the second element when I cout that value, such as << name << "." << endl;, it always seems to replace the first letters of that string.
Please help.

Sorry for the delay, the problem was that I was not handling invisible characters such as \r.
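One way to handle this, sketched below, is to treat every whitespace character as a separator when splitting a line, since std::isspace also reports true for '\r'. The function name splitTokens is hypothetical and this is not the poster's actual code.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical tokenizer: splits a line on any whitespace character.
// std::isspace is true for ' ', '\t', '\n', '\r', '\v' and '\f', so a
// trailing '\r' from a Windows-style file never ends up inside a token.
std::vector<std::string> splitTokens(const std::string& line)
{
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : line) {
        if (std::isspace(ch)) {
            // A separator ends the token collected so far, if any.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current += static_cast<char>(ch);
        }
    }
    if (!current.empty())
        tokens.push_back(current);
    return tokens;
}
```

With this, a line such as "STR man = bloo" followed by '\r' yields four tokens, and the last one has length 4 on both Windows and Linux.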

When reading from a file in C++, can I just copy the text itself?

Sorry, the wording for the actual question is probably wrong. I have a program that reads in a line from a .txt file and then puts the string into an object to compare it to a string entered by the user. I haven't been able to get it to match, and when I've tried to see what is entered, I don't see much. Maybe there's an invisible character denoting the end of the line? I've tried code like this:
std::cout << "...." << table[row][col]->get() << "...." <<std::endl;
And got
....a
as the result. When reading the file I used std::getline() if that makes a difference.

I didn't find a true fix, although I did see that the length of the read-in string was one character longer than the actual word. I was able to use a substring to cut the end off of the string.
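A string that is one character too long and output that appears cut off are consistent with a trailing '\r' from a file with Windows line endings, though the post does not confirm that this was the cause. A small debugging helper like the sketch below makes such characters visible; the name showInvisible is hypothetical.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical debugging helper: returns a copy of `text` in which
// control characters are written as visible escape sequences.
std::string showInvisible(const std::string& text)
{
    std::string out;
    for (unsigned char ch : text) {
        switch (ch) {
        case '\r': out += "\\r"; break;
        case '\n': out += "\\n"; break;
        case '\t': out += "\\t"; break;
        default:
            if (std::isprint(ch)) {
                out += static_cast<char>(ch);
            } else {
                // Any other non-printable byte is shown as \xNN.
                char buf[8];
                std::snprintf(buf, sizeof buf, "\\x%02X",
                              static_cast<unsigned>(ch));
                out += buf;
            }
        }
    }
    return out;
}
```

Printing showInvisible(...) of the stored string between the "...." markers would show the extra character, if there is one, instead of letting the terminal act on it.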

C++ count functional words occurrence

I'm trying to count occurrences of specific words in a text file. The problem is that my code reads the file with whitespace delimiters, but some of the words I want to count are two-word phrases, for example "out from".
In addition, there is a second problem with words like "aren't" and "don't": my code seems to ignore these words even when I put them in the map with a backslash. My guess is that they are getting ignored while being read from the file for some reason.
The end result I am looking for is the frequency of the words that I am searching for.
std::list<std::string> Fwords = {
    "a", "abroad", "as far as", "ahead of"};
// Begin reading from file:
std::ifstream fileStream(fileName);
// Check if we've opened the file (as we should have).
if (fileStream.is_open())
    while (fileStream.good())
    {
        // Store the next word in the file in a local variable.
        std::string word;
        fileStream >> word;
        std::cout << "This is the word: " << word << endl;
        if (std::find(std::begin(Fwords), std::end(Fwords), word) != std::end(Fwords))
            wordsCount[word]++;
    }
input:
"ahead of me as far as abroad me"
this would be the expected output:
abroad:1
ahead of:1
as far as:1

This approach won't work. Your problem is that you're reading one word at a time from the file. No amount of backslashing or manipulating the list / map of words will fix that.
But how are you supposed to know how many words to read? You don't—it'll have to be trial and error.
One way to "brute force" this, considering your level of programming, would be to add an else case to
if (std::find(std::begin(Fwords), std::end(Fwords), word) != std::end(Fwords))
{
// ...
}
in which you check for words in the map that begin with the word from the file followed by a space, e.g. for "as" the search is for "as ". If one or more matches are found, then it's time to read another word from the file, giving e.g. "as far". This should be put in a loop (or a function called in a loop) so that searching for "as far " and reading the next word, "as", happens automatically. Upon successfully finding "as far as", you're done. You're also done upon failing to find "as ", "as far ", or "as far as", i.e. if you don't have these in your map; in that case, run a for loop over the words read so far to check whether each is a word by itself, and increase its count if so. In doing this, you'll realize that you need the same code as your original code, so it'd be smart to factor it out into a function as well.
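A related way to get the same result, sketched here, is a greedy longest-match variant rather than the exact incremental loop described above: read all the words first, then at each position try the longest candidate phrase before shorter ones. The function name countPhrases is hypothetical, and the sketch assumes words are separated by whitespace only, with no attached punctuation.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: counts occurrences of the given words and
// multi-word phrases in `text`, preferring the longest match.
std::map<std::string, int> countPhrases(const std::string& text,
                                        const std::set<std::string>& phrases)
{
    // Split the text into whitespace-separated words.
    std::vector<std::string> words;
    std::istringstream in(text);
    for (std::string w; in >> w; )
        words.push_back(w);

    // Length, in words, of the longest phrase we are looking for.
    std::size_t maxLen = 1;
    for (const std::string& p : phrases) {
        std::size_t n = 1 + static_cast<std::size_t>(
                                std::count(p.begin(), p.end(), ' '));
        maxLen = std::max(maxLen, n);
    }

    std::map<std::string, int> counts;
    std::size_t i = 0;
    while (i < words.size()) {
        std::size_t matched = 0;
        std::size_t longest = std::min(maxLen, words.size() - i);
        // Try the longest candidate first, then shorter ones.
        for (std::size_t len = longest; len >= 1; --len) {
            std::string candidate = words[i];
            for (std::size_t k = 1; k < len; ++k)
                candidate += " " + words[i + k];
            if (phrases.count(candidate)) {
                ++counts[candidate];
                matched = len;
                break;
            }
        }
        // Skip past a matched phrase, or move on by one word.
        i += (matched > 0) ? matched : 1;
    }
    return counts;
}
```

For the sample input "ahead of me as far as abroad me" and the four entries in Fwords, this counts "abroad", "ahead of" and "as far as" once each. Words such as "don't" pass through operator>> unchanged, so they match as long as the file and the list use the same apostrophe character.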

Is it possible to skip a line in a data file?

I have a data file that I am trying to input and the data is split into sections via a blank line. The data will be read in from a text file.
How do I make my code skip a blank line to read in the next piece of data? I am currently just in the planning stages of my application.
I'm a beginner so I'm not really sure how to go about this.
Can anyone advise a method on how to approach this?
I have just written it out and my code looks like this:
string ship2_id;
char ship2_journey_id[20];
float ship2_l;
int ship2_s;
getline(itinerary_file, ship2_id);
if (ship2_id = ' ')
{
    itinerary_file.ignore(numeric_limits<streamsize>::max(), '\n');
}
else
    getline(itinerary_file, ship2_id);
cout << ship2_id << endl;

Yes,
stream.ignore(max_number_of_chars_to_be_skipped, '\n');
I usually just use 1ul<<30 or similar for the first parameter, but:
- this could be a DoS vector if the input is untrusted and slow to skip those chars
- the "pedant" value would be std::numeric_limits<std::streamsize>::max()

I don't know what you are using to read the file, but to find a blank line, look for two line breaks in a row. Take into account that the line-break character differs between operating systems. On Windows, by default, two characters (\r\n) are used together for a line break.

Strange error printing getline() strings in cout

I was trying to test my classes when I encountered a weird problem in the input of test cases.
I tried to simplify the input to see what went wrong so I created the program below.
#include <iostream>
#include <string>
int main()
{
    std::string number;
    while (std::getline(std::cin, number))
    {
        std::cout << std::string(number) << " ";
    }
}
Basically, I am getting each line of text and storing it in a string variable using getline(). Then I display each string using std::cout and append a single space character.
My input file contains this:
one six
one seven
The expected output should be like this:
one six one seven
But instead, I get this:
one seven
That is a space character followed by the second line of the input. It disregards the first line of input. I know for a fact that each line is being read properly, because the lines were displayed correctly when I replaced the code with this:
std::cout << std::string(number) << std::endl;
This error is quite new to me. What's happening here? Can anybody explain? TIA!

OK, it's clear.
Your input file must be: one six\r\none seven\r\n, with the normal Windows EOL.
When you read it under Cygwin, the first read gives one six\r, because getline only eats the \n, and likewise the second read gives one seven\r.
So you write: one six\r one seven\r (with a trailing blank). But a \r on its own puts the cursor back at the first column of the same line, so the second line overwrites the first.
Normally the problem is not visible if you replace the trailing blank with a std::endl, which puts the cursor on a new line. The tab (\t) is really a special case: it puts the cursor at the eighth column, exactly where you expect it, but by pure chance. If you swap the two lines, it would be more apparent, because you would see the remainder of the first line at the end of the second.
You can confirm it by writing the output to a file and opening it in an editor.
I could reproduce it under Linux with a Windows EOL. The reason is that Cygwin closely mimics Unix/Linux and uses the Unix EOL convention of only \n.
