Reading text with blanks and numeric data from a file

Reading text with blanks and numeric data from a file - c++

So I have data in a text like this:
Alaska 200 500
New Jersey 400 300
.
.
And I am using ifstream to open it.
This is part of a course assignment. We are not allowed to read in the whole line all at once and parse it into the various pieces. So trying to figure out how to read each part of every line.
Using >> will only read in "New" for "New Jersey" due to the white space/blank in the middle of that state name. Have tried a number of different things like .get(), .read(), .getline(). I have not been able to get the whole state name read in, and then read in the remainder of the numeric data for a given line.
I am wondering whether it is possible to read the whole line directly into a structure. Of course, structure is a new thing we are learning...
Any suggestions?

Can't you just read the state name in a loop?
Read a string from cin: if the first character of the string is numeric then you've reached the next field and you can exit the loop. Otherwise just append it to the state name and loop again.

Here is a line by line parsing solution that doesn't use any c-style parsing methods:
std::string line;
while (getline(ss, line) && !line.empty()) {
size_t startOfNumbers = line.find_first_of("0123456789");
size_t endOfName = line.find_last_not_of(" ", startOfNumbers);
std::string name = line.substr(0, endOfName); // Extract name
std::stringstream nums(line.substr(startOfNumbers)); // Get rest of the line
int num1, num2;
nums >> num1 >> num2; // Read numbers
std::cout << name << " " << num1 << " " << num2 << std::endl;
}

If you can't use getline, do it yourself: Read and store in a buffer until you find '\n'. In this case you probably also cannot use all the groovy stuff in std::string and algorithm and might as well use good ol' C programming at that point.
Once you have grabbed a line, read your way backwards from the end of the line and
Discard all whitespace until you find non whitespace.
Gather characters found into token 3 until you find whitepace again.
Read and discard the whitespace until you find the end of token 2.
Gather token 2 until you find more whitespace.
Discard the whitespace until you find the end of token 1. The rest of the line is all token 1.
convert token 2 and token 3 into numbers. I like to use strtol for this.
You can build all of the above or Daniel's answer (use his answer if at all possible) into an overload of operator>>. This lets you
mystruct temp;
while (filein >> temp)
{
// do something with temp. Stick it in a vector, whatever
}
The code to do this looks something like (Stealing wholesale from What are the basic rules and idioms for operator overloading? <-- Read this. It could save your life one day)
std::istream& operator>>(std::istream& is, mystruct & obj)
{
// read obj from stream
if( /* no valid object of T found in stream */ )
is.setstate(std::ios::failbit);
return is;
}

Here's another example of reading the file word by word. Edited to remove the example using the eof check as the while loop condition. Also included a struct as you mentioned that's what you just learned. I'm not sure how you're supposed to use your struct, so I just made it simple and had it contain 3 variables, a string, and 2 ints. To verify it reads correctly it couts the contents of the struct variables after its read in which includes printing out "New Jersey" as one word.
#include <iostream>
#include <fstream>
#include <string>
#include <stdlib.h> // for atoi
using namespace std;
// Not sure how you're supposed to use the struct you mentioned. But for this example it'll just contain 3 variables to store the data read in from each line
struct tempVariables
{
std::string state;
int number1;
int number2;
};
// This will read the set of characters and return true if its a number, or false if its just string text
bool is_number(const std::string& s)
{
return !s.empty() && s.find_first_not_of("0123456789") == std::string::npos;
}
int main()
{
tempVariables temp;
ifstream file;
file.open("readme.txt");
std::string word;
std::string state;
bool stateComplete = false;
bool num1Read = false;
bool num2Read = false;
if(file.is_open())
{
while (file >> word)
{
// Check if text read in is a number or not
if(is_number(word))
{
// Here set the word (which is the number) to an int that is part of your struct
if(!num1Read)
{
// if code gets here we know it finished reading the "string text" of the line
stateComplete = true;
temp.number1 = atoi(word.c_str());
num1Read = true; // won't read the next text in to number1 var until after it reads a state again on next line
}
else if(!num2Read)
{
temp.number2 = atoi(word.c_str());
num2Read = true; // won't read the next text in to number2 var until after it reads a state agaon on next line
}
}
else
{
// reads in the state text
temp.state = temp.state + word + " ";
}
if(stateComplete)
{
cout<<"State is: " << temp.state <<endl;
temp.state = "";
stateComplete = false;
}
if(num1Read && num2Read)
{
cout<<"num 1: "<<temp.number1<<endl;
cout<<"num 2: "<<temp.number2<<endl;
num1Read = false;
num2Read = false;
}
}
}
return 0;
}

Related

How to replace Hi with Bye in a file

I want to replace hi with a bye by reading a file and outputting another file with the replaced letters.
#include <iostream>
#include <fstream>
using namespace std;
int main() {
ifstream myfile;
ofstream output;
output.open("outputfile.txt");
myfile.open("infile.txt");
char letter;
myfile.get(letter);
while (!myfile.eof()) {
if (letter == 'H') {
char z = letter++;
if (z == 'i')
output << "BYE";
}
else output << letter;
}
output.close();
myfile.close();
return 0;
}
My outputs are repeated capital I's that is repeated infinity times.
Here is my input file
Hi
a Hi Hi a
Hi a a Hi

Don't check eof
The eof method is returning the location of the input stream read pointer, and not the status of the get. It is more like telling you whether or not get will succeed, so you could write something like:
while (!myfile.eof()) {
char letter;
myfile.get(letter);
//...
}
In this way, you would at least be getting a new letter at each iteration, and the loop ends when the read pointer reaches the end of the input.
But, there are other cases that might cause the get to not succeed. Fortunately, these are captured by the stream itself, which is returned by get. Testing the status of the stream is as easy as treating the stream as a boolean. So, a more idiomatic way to write the loop is:
char letter;
while (myfile.get(letter)) {
//...
}
Peek at the next letter
When you want to look at the next letter in the input following the detected 'H', you perform an increment.
char z = letter++;
But, this does not achieve the desired result. Instead, it just sets both letter and z variables to the numerical successor of 'H' ('H' + 1), and does not observe the next letter in the input stream.
There is another method you can use that is like get, but leaves the input in the input stream. It is called peek.
char z;
auto peek = [&]() -> decltype(myfile) {
if (myfile) z = myfile.peek();
return myfile;
};
if (peek()) {
//...
}
And now, you can check the value of z, but it is still considered input for the next get on letter.
Close to what you implemented
So, the complete loop could look like:
char letter;
while (myfile.get(letter)) {
if (letter == 'H') {
char z;
auto peek = [&]() -> decltype(myfile) {
if (myfile) z = myfile.peek();
return myfile;
};
if (peek() && z == 'i') {
myfile.get(z);
output << "BYE";
continue;
}
}
output << letter;
}
With this approach, you will be able to correctly handle troublesome cases like HHi as input, or the last letter in the input being an H.

Your two lines:
myfile.get(letter);
while (!myfile.eof()) {
are wrong.
First off you only read letter once, hence your infinite loop.
Secondly you don't use eof in a while loop.
You want something more like:
while (myfile.get(letter)) {
Also:
char z = letter++;
is wrong, you want to read another letter:
myfile.get(z);
but you have to be careful that you get something, so
if(!myfile.get(z)) {
output << letter;
break;
}
So finally:
char letter;
while (myfile.get(letter)) {
if (letter == 'H') {
char z;
if(!myfile.get(z)) {
output << letter;
break;
}
if (z == 'i') {
output << "BYE";
}
else output << letter << z;
}
else output << letter;
}
But now we are consuming the character after any H which may not be desirable.
See #jxh's answer for a way to do this with look ahead.

There is a dedicated function to replace patterns in strings. For example, you could use std::regex_replace. That is very simple. We define, what should be searched for and with what that would be replaced.
Some comments. On StackOverflow, I cannot use files. So in my example program, I use a std::istringstream instead. But this is also an std::istream. You can use any other std::istream as well. So if you define an std::ifstream to read from a file, then it will work in the same way as the std::istringstream. You can simply replace it. For the output I use the same mechanism to show the result on the console.
Please see the simple solution:
#include <iostream>
#include <sstream>
#include <regex>
// The source file
std::istringstream myfile{ R"(Hi
a Hi Hi a
Hi a a Hi)" };
// The destination file
std::ostream& output{ std::cout };
int main() {
// Temporary string, to hold one line that was read from a file
std::string line{};
// Read all lines from the file
while (std::getline(myfile, line)) {
// Replace the sub-string and write to output file
output << std::regex_replace(line, std::regex("Hi"), "Bye") << "\n";
}
return 0;
}

Count first digit on each line of a text file

My project takes a filename and opens it. I need to read each line of a .txt file until the first digit occurs, skipping whitespace, chars, zeros, or special chars. My text file could look like this:
1435 //1, nextline
0 //skip, next line
//skip, nextline
(*Hi 245*) 2 //skip until second 2 after comment and count, next line
345 556 //3 and count, next line
4 //4, nextline
My desired output would be all the way up to nine but I condensed it:
Digit Count Frequency
1: 1 .25
2: 1 .25
3: 1 .25
4: 1 .25
My code is as follows:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main() {
int digit = 1;
int array[8];
string filename;
//cout for getting user path
//the compiler parses string literals differently so use a double backslash or a forward slash
cout << "Enter the path of the data file, be sure to include extension." << endl;
cout << "You can use either of the following:" << endl;
cout << "A forwardslash or double backslash to separate each directory." << endl;
getline(cin,filename);
ifstream input_file(filename.c_str());
if (input_file.is_open()) { //if file is open
cout << "open" << endl; //just a coding check to make sure it works ignore
string fileContents; //string to store contents
string temp;
while (!input_file.eof()) { //not end of file I know not best practice
getline(input_file, temp);
fileContents.append(temp); //appends file to string
}
cout << fileContents << endl; //prints string for test
}
else {
cout << "Error opening file check path or file extension" << endl;
}
In this file format, (* signals the beginning of a comment, so everything from there to a matching *) should be ignored (even if it contains a digit). For example, given input of (*Hi 245*) 6, the 6 should be counted, not the 2.
How do I iterate over the file only finding the first integer and counting it, while ignoring comments?

One way to approach your problem is the following:
Create a std::map<int, int> where the key is the digit and the value is the count. This allows you to compute statistics on your digits such as the count and the frequency after you have parsed the file. Something similar can be found in this SO answer.
Read each line of your file as a std::string using std::getline as shown in this SO answer.
For each line, strip the comments using a function such as this:
std::string& strip_comments(std::string & inp,
std::string const& beg,
std::string const& fin = "") {
std::size_t bpos;
while ((bpos = inp.find(beg)) != std::string::npos) {
if (fin != "") {
std::size_t fpos = inp.find(fin, bpos + beg.length());
if (fpos != std::string::npos) {
inp = inp.erase(bpos, fpos - bpos + fin.length());
} else {
// else don't erase because fin is not found, but break
break;
}
} else {
inp = inp.erase(bpos, inp.length() - bpos);
}
}
return inp;
}
which can be used like this:
std::string line;
std::getline(input_file, line);
line = strip_comments(line, "(*", "*)");
After stripping the comments, use the string member function find_first_of to find the first digit:
std::size_t dpos = line.find_first_of("123456789");
What is returned here is the index location in the string for the first digit. You should check that the returned position is not std::string::npos, as that would indicate that no digits are found. If the first digit is found, the corresponding character can be extracted using const char c = line[dpos]; and converted to an integer using std::atoi.
Increment the count for that digit in the std::map as shown in that first linked SO answer. Then loop back to read the next line.
After reading all lines from the file, the std::map will contain the counts for all first digits found in each line stripped of comments. You can then iterate over this map to retrieve all the counts, accumulate the total count over all digits found, and compute the frequency for each digit. Note that digits not found will not be in the map.
I hope this helps you get started. I leave the writing of the code to you. Good luck!

Splitting sentences and placing in vector

I was given a code from my professor that takes multiple lines of input. I am currently changing the code for our current assignment and I came across an issue. The code is meant to take strings of input and separate them into sentences from periods and put those strings into a vector.
vector<string> words;
string getInput() {
string s = ""; // string to return
bool cont = true; // loop control.. continue is true
while (cont){ // while continue
string l; // string to hold a line
cin >> l; // get line
char lastChar = l.at(l.size()-1);
if(lastChar=='.') {
l = l.substr(0, l.size()-1);
if(l.size()>0){
words.push_back(s);
s = "";
}
}
if (lastChar==';') { // use ';' to stop input
l = l.substr(0, l.size()-1);
if (l.size()>0)
s = s + " " + l;
cont = false; // set loop control to stop
}
else
s = s + " " + l; // add line to string to return
// add a blank space to prevent
// making a new word from last
// word in string and first word
// in line
}
return s;
}
int main()
{
cout << "Input something: ";
string s = getInput();
cout << "Your input: " << s << "\n" << endl;
for(int i=0; i<words.size(); i++){
cout << words[i] << "\n";
}
}
The code puts strings into a vector but takes the last word of the sentence and attaches it to the next string and I cannot seem to understand why.

This line
s = s + " " + l;
will always execute, except for the end of input, even if the last character is '.'. You are most likely missing an else between the two if-s.

You have:
string l; // string to hold a line
cin >> l; // get line
The last line does not read a line unless the entire line has non-white space characters. To read a line of text, use:
std::getline(std::cin, l);
It's hard telling whether that is tripping your code up since you haven't posted any sample input.

I would at least consider doing this job somewhat differently. Right now, you're reading a word at a time, then putting the words back together until you get to a period.
One possible alternative would be to use std::getline to read input until you get to a period, and put the whole string into the vector at once. Code to do the job this way could look something like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
int main() {
std::vector<std::string> s;
std::string temp;
while (std::getline(std::cin, temp, '.'))
s.push_back(temp);
std::transform(s.begin(), s.end(),
std::ostream_iterator<std::string>(std::cout, ".\n"),
[](std::string const &s) { return s.substr(s.find_first_not_of(" \t\n")); });
}
This does behave differently in one circumstance--if you have a period somewhere other than at the end of a word, the original code will ignore that period (won't treat it as the end of a sentence) but this will. The obvious place this would make a difference would be if the input contained a number with a decimal point (e.g., 1.234), which this would break at the decimal point, so it would treat the 1 as the end of one sentence, and the 234 as the beginning of another. If, however, you don't need to deal with that type of input, this can simplify the code considerably.
If the sentences might contain decimal points, then I'd probably write the code more like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
class sentence {
std::string data;
public:
friend std::istream &operator>>(std::istream &is, sentence &s) {
std::string temp, word;
while (is >> word) {
temp += word + ' ';
if (word.back() == '.')
break;
}
s.data = temp;
return is;
}
operator std::string() const { return data; }
};
int main() {
std::copy(std::istream_iterator<sentence>(std::cin),
std::istream_iterator<sentence>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Although somewhat longer and more complex, at least to me it still seems (considerably) simpler than the code in the question. I guess it's different in one way--it detects the end of the input by...detecting the end of the input, rather than depending on the input to contain a special delimiter to mark the end of the input. If you're running it interactively, you'll typically need to use a special key combination to signal the end of input (e.g., Ctrl+D on Linux/Unix, or F6 on Windows).
In any case, it's probably worth considering a fundamental difference between this code and the code in the question: this defines a sentence as a type, where the original code just leaves everything as strings, and manipulates strings. This defines an operator>> for a sentence, that reads a sentence from a stream as we want it read. This gives us a type we can manipulate as an object. Since it's like a string in other ways, we provide a conversion to string so once you're done reading one from a stream, you can just treat it as a string. Having done that, we can (for example) use a standard algorithm to read sentences from standard input, and write them to standard output, with a new-line after each to separate them.

When parsing a string using a string stream, it extracts a new line character

Description of the program : The program must read in a variable amount of words until a sentinel value is specified ("#" in this case). It stores the words in a vector array.
Problem : I use a getline to read in the string and parse the string with a stringstream. My problem is that the stringstream is not swallowing the new line character at the end of each line and is instead extracting it.
Some solutions I have thought of is to cut off the last character by creating a subset or checking if the next extracted word is a new line character, but I feel there is a better cost efficient solution such as changing the conditions for my loops.
I have included a minimized version of the overall code that reproduces the problem.
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main()
{
const int MAX_LIST_SIZE = 1000;
string str;
string list[MAX_LIST_SIZE];
int numWords = 0;
// program starts here
getline(cin, str); // read innput
stringstream parse(str); // use stringstream to parse input
while(str != "#") // read in until sentinel value
{
while(!parse.fail()) // until all words are extracted from the line
{
parse >> list[numWords]; // store words
numWords++;
}
getline(cin,str); // get next line
parse.clear();
parse.str(str);
}
// print number of words
cout << "Number of words : " << numWords << endl;
}
And a set of test input data that will produce the problem
Input:
apples oranges mangos
bananas
pineapples strawberries
Output:
Number of words : 9
Expected Output:
Number of words : 6
I would appreciate any suggestions on how to deal with this problem in an efficient manner.

Your logic for parsing out the stream isn't quite correct. fail() only becomes true after a >> operation fails, so you'll doing an extra increment each time. For example:
while(!parse.fail())
{
parse >> list[numWords]; // fails
numWords++; // increment numWords anyway
} // THEN check !fail(), but we incremented already!
All of these operations have returns that you should check as you go to avoid this problem:
while (getline(cin, str)) { // fails if no more lines in cin
if (str != "#") { // doesn't need to be a while
stringstream parse(str);
while (parse >> list[numWords]) { // fails if no more words
++numWords; // *only* increment if we got one!
}
}
}
Even better would be to not use an array at all for the list of words:
std::vector<std::string> words;
Which can be used in the inner loop:
std::string temp;
while (parse >> temp) {
words.push_back(temp);
}

The increment on numwords happens one more time than you intend at the end of each line. Use a std::vector< std::string > for your list. Then you can use list.size().

Reading in from a txt file. Trouble parsing info

I want to read in scores from a txt file. The scores are going into a struct.
struct playerScore
{
char name[32];
int score, difficulty;
float time;
};
the text file looks like this
Seth 26.255 40 7
as one line, where each item is followed by a tab. (Name\t time\t score\t difficulty\n)
When I begin to read in the text, I don't know how to tell the program when to stop. The scores file could be any number of lines or score entries. This is what I have attempted.
hs.open("scores.txt", ios_base::in);
hs.seekg(0, hs.beg);
if (hs.is_open())
{
int currpos = 0;
while (int(hs.tellg()) != int(hs.end));
{
hs>> inScore.name;
hs >> inScore.time;
hs >> inScore.score;
hs >> inScore.difficulty;
hs.ignore(INT_MAX, '\n');
AllScores.push_back(inScore);
currpos = (int)hs.tellg();
}
}
I'm trying to make a loop that will read in a line of code into a temp struct for the data, then push that struct into a vector of structs. Then update the currpos variable with the current location of the input pointer. However, the loop just gets stuck on the condition and freezes.

There are a multitude of ways to do this, but the following is likely what you're looking for. Declare a free-operator for extracting a single-line definition of a player-score:
std::istream& operator >>(std::istream& inf, playerScore& ps)
{
// read a single line.
std::string line;
if (std::getline(inf, line))
{
// use a string stream to parse line by line.
std::istringstream iss(line);
if (!(iss.getline(ps.name, sizeof(ps.name)/sizeof(*ps.name), '\t') &&
(iss >> ps.time >> ps.score >> ps.difficulty)))
{
// fails to parse a full record. set the top-stream fail-bit.
inf.setstate(std::ios::failbit);
}
}
return inf;
}
With that, your read code can now do this:
std::istream_iterator<playerScore> hs_it(hs), hs_eof;
std::vector<playerScore> scores(hs_it, hs_eof);

I dont think that you can just >> from your file. Do you think it will take everything till \t? :)
You can try to take for example token with strtok()
I guess it can use '\t' to split string and take for each variable via this function needed part of string
In case if it strtok() doesnt work that way i guess you can just copy till '\t' in sub-loop

You can do like this
playerScore s1;
fstream file;
file.open("scores.txt", ios::in | ios::out);
while(!file.eof()) //For end of while loop
{
file.read(s1, sizeof(playerScore));//read data in one structure.
AllScores.push_back(s1);
}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Reading text with blanks and numeric data from a file - c++

Can't you just read the state name in a loop? Read a string from cin: if the first character of the string is numeric then you've reached the next field and you can exit the loop. Otherwise just append it to the state name and loop again.

Related

How to replace Hi with Bye in a file

Count first digit on each line of a text file

Splitting sentences and placing in vector

When parsing a string using a string stream, it extracts a new line character

Reading in from a txt file. Trouble parsing info

Categories

Resources