How to extract specific data from text file containing whitespace and newlines? - c++

I would like to extract and analyze data from a large text file. The data contains floats, integers and words.
The way I thought of doing this is to extract a complete line (up to newline) using std::getline(). Then extract individual data from the line extracted before (extract until whitespace, then repeat).
So far I have this:
int main( )
{
std::ifstream myfile;
myfile.open( "example.txt", std::ios::in );
if( !(myfile.is_open()) )
{ std::cout << "Error Opening File";
std::exit(0); }
std::string firstline;
while( myfile.good() )
{
std::getline( myfile, firstline);
std::cout<< "\n" << firstline <<"\n";
}
myfile.close();
return 0;
}
I have several problems:
1) How do I extract up to a whitespace?
2) What would be the best method of storing the data? There are about 7-9 data types, and the data file is large.
EDIT: An example of the file would be:
Result Time Current Path Requirements
PASS 04:31:05 14.3 Super_Duper_capacitor_413 -39.23
FAIL 04:31:45 13.2 Super_Duper_capacitor_413 -45.23
...
Ultimately I would like to analyze the data, but so far I'm more concerned about proper input/reading.

You can use std::stringstream to parse the data and let it worry about skipping the whitspaces. Since each element in the input line appears to require additional processing just parse them into local variables and after all post processing is done store the final results into a data structure.
#include <sstream>
#include <iomanip>
std::stringstream templine(firstline);
std::string passfail;
float floatvalue1;
std::string timestr;
std::string namestr;
float floatvalue2;
// split to two lines for readability
templine >> std::skipws; // no need to worry about whitespaces
templine >> passfail >> timestr >> floatvalue1 >> namestr >> floatvalue2;
If you do not need or want to validate that the data is in the correct format you can parse the lines directly into a data structure.
struct LineData
{
std::string passfail;
float floatvalue1;
int hour;
int minute;
int seconds;
std::string namestr;
float floatvalue2;
};
LineData a;
char sep;
// parse the pass/fail
templine >> a.passfail;
// parse time value
templine >> a.hour >> sep >> a.minute >> sep >> a.seconds;
// parse the rest of the data
templine >> a.timestr >> a.floatvalue1 >> a.namestr >> a.floatvalue2;

For the first question, you can do this:
while( myfile.good() )
{
std::getline( myfile, firstline);
std::cout<< "\n" << firstline <<"\n";
std::stringstream ss(firstline);
std::string word;
while (std::getline(ss,word,' '))
{
std::cout << "Word: " << word << std::endl;
}
}
As for the second question, can you give us more precision about the data types and what is it you want to do with the data once stored?

Related

Sorting function with output to a separate file

I need to sort the input text document and write it into another test document. so that lines with an even number of words delete every second word, and lines with an odd number of words remain unchanged. The sorting should be performed as a function, the first pointer to the input of the string and the second pointer to the output.
I did the input and output, tried to do a sorting to begin with without a function, but nothing worked. can you tell me what I did not do the rule?
Example:
Input.txt
I do not like to go to school or study
I like to play games
Output.txt
I not to to or
I like to play games
string in, out;
cin >> in;
cin >> out;
ifstream input(in);
string str;
ofstream output(out);
string st;
while (getline(input, str))
{
do
{
int i = count(str.cbegin(), str.cend(), ' ');
if (i % 2 == 0)
st.append(str);
else
st.append(" ");
i++;
}
while (input);
output << st << endl;
}
The std::stringstream will be your friend.
Read about it here and especially for the useful std::istringstream here.
What can we do with this thing? We can put a string into it and then use the extraction operator >> like with any other stream.
So, counting words will become very simple. We read a line from the file as a std::string and then put it into a std::istringstream, Then we extract dummy words and increase a counter until the reading fails. Then we have the number of words.
Example:
// We read one line. Count the number of words
// Put the string in a stringstream to easily extract words
std::istringstream iss(line);
// Counter for words
unsigned int wordCount{};
// Dummy word
std::string tempWord{};
// Do the counting
while (iss >> tempWord) ++wordCount;
The stream tries too read until it cannot read any more (the stream is empty) and then sets a failbit. The failure of any stream can be checked with its bool operator or the !-operator. The while loop expects a boolean. And the extraction operation iss >> tempWord will return a reference to the stream and then its bool operator will be called and used as condition.
OK, understood, we can now count words. Next step is to skip words.
This is similar simple. In case of even number of words in a string, we will read always 2 words in a loop and simply throw the second one away.
Like this
std::string usedWord{};
// Read 2 words, ignore the second
while (iss >> usedWord >> tempWord)
result += (usedWord + ' ');
There are of course many other possible ways to solve that, but maybe the below solution can give you an idea.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
int main() {
// First read the input filename
std::cout << "\nPlease enter the path for the input file:\n";
std::string inputFileName{};
std::getline(std::cin, inputFileName);
// next read the output filename
std::cout << "\nPlease enter the path for the output file:\n";
std::string outputFileName{};
std::getline(std::cin, outputFileName);
// Open now both files. If one file cannot be opened, then no need tocontinue
// Open input file and check, if it could be opened
std::ifstream inputFileStream(inputFileName);
if (inputFileStream) {
// Input file could be opened without error.
// Now open output file and check, if it could be opened
std::ofstream outputFileStream(outputFileName);
if (outputFileStream) {
// Now, both input and output file streams are open
// Read lines
std::string line{};
while (std::getline(inputFileStream, line)) {
// We read one line. Count the number of words
// Put the string in a stringstream to easily extract words
std::istringstream iss(line);
// Counter for words
unsigned int wordCount{};
// Dummy word
std::string tempWord{};
// Do the counting
while (iss >> tempWord) ++wordCount;
// if the number of words is even then
if ((wordCount % 2) == 0) {
// Rinitialize stringstream
iss.clear(), iss.str(line);
// Here we store the resulting sentence
std::string result{};
// Now extract the word that we will keep and the other one that we will throw away
std::string usedWord{};
// Read 2 words, ignore the second
while (iss >> usedWord >> tempWord)
result += (usedWord + ' ');
// Write result to output
outputFileStream << result << '\n';
}
else
outputFileStream << line << '\n';
}
}
else std::cerr << "\n*** Error: could not open output file '" << outputFileName << "'\n\n";
}
else std::cerr << "\n*** Error: could not open input file '" << inputFileName << "'\n\n";
}

C++ read pipe delimited file

I have file1.txt with this information
4231650|A|4444
4225642|A|5555
I checked the code here for how to read pipe delimited file in C++
C++ Read file line by line then split each line using the delimiter
I modified the code a little bit per my needs. The problem is that it reads the first pipe fine but afterwards how do I read rest of the values?
This is my code:
std::ifstream file("file1.txt");
std::string line;
while(std::getline(file, line))
{
std::stringstream linestream(line);
std::string data;
std::string valStr1;
std::string valStr2;
std::getline(linestream, data, '|'); // read up-to the first pipe
// Read rest of the pipe values? Why did the accepted answer worked for int but not string???
linestream >> valStr1 >> valStr2;
cout << "data: " << data << endl;
cout << "valStr1: " << valStr1 << endl;
cout << "valStr2: " << valStr2 << endl;
}
Here's the output:
Code Logic starts here ...
data: 4231650
valStr1: A|4444
valStr2: A|4444
data: 4225642
valStr1: A|5555
valStr2: A|5555
Existing ...
Why did the accepted answer worked for int but not string?
Because | is not a digit and is an implicit delimiter for int numbers. But it is a good char for a string.
Continue doing it in the same way
std::getline(linestream, data, '|');
std::getline(linestream, varStr1, '|');
std::getline(linestream, varStr2);

How skip reading " : " while taking input from a file in C++

Lets say I want to input the hours, minutes and seconds from the first line of a file and store them to 3 different variables, hrs, mins and sec respectively.
I cant figure out an easy way to skip reading the colon character (":").
Input file example:
12:49:00
Store:
hrs = 12
mins = 59
sec = 00
You can use std::regex to match, range-check and validate your input all at once.
#include <iostream>
#include <regex>
#include <string>
int main()
{
const std::regex time_regex("(\\d|[0,1]\\d|2[0-3]):([0-5]\\d):([0-5]\\d)");
std::smatch time_match;
std::string line;
while (std::getline(std::cin, line))
{
if (std::regex_match(line, time_match, time_regex))
{
int hours = std::stoi(time_match[1]);
int minutes = std::stoi(time_match[2]);
int seconds = std::stoi(time_match[3]);
std::cout << "h=" << hours << " m=" << minutes << " s=" << seconds << std::endl;
}
else
{
std::cout << "Invalid time: " << line << std::endl;
}
}
return 0;
}
See this example live here.
Breaking down the regular expression (\\d|[0,1]\\d|2[0-3]):([0-5]\\d):([0-5]\\d):
\d|[0,1]\d|2[0-3] matches the hour (24-hour time) which is one of:
\d : 0-9
[0,1]\d : 01-19
2[0-3] : 20-23
[0-5]\d matches the minutes: two digits 00-59
[0-5]\d matches the seconds: two digits 00-59, as above.
An alternative not using a temporary character for skipping the colon:
#include <iostream>
int main()
{
int h,m,s;
std::cin >> h;
std::cin.ignore(1) >> m;
std::cin.ignore(1) >> s;
std::cout << h << ':' << m << ':' << s << std::endl;
return 0;
}
This seems to work:
int h, m, s;
char c;
cin >> h >> c >> m >> c >> s;
You just skip : symbol this way. I don't know whether it's a good solution.
With cin.ignore:
cin >> h;
cin.ignore(1);
cin >> m;
cin.ignore(1);
cin >> s;
There are already several good answers and one that has already been accepted; however I like to propose my solution not only as a valid answer to your problem but also in regards to a good design practice. IMHO when it involves reading information from a file and storing it's contents to variables or data structures I prefer to do it in a specific way. I like to separate the functionality and responsibility of specific operations into their own functions:
1: I first like to have a function to open a file, read the contents and to store the information into either a string, a stream or some large buffer. Once the appropriate amount of information is read from the file, then the function will close the file handle as we are done with it and then return back the results. There are several ways to do this yet they are all similar.
a: Read a single line from the file and return back a string or a stream.
b: Read in all information form the file line by line and store each line into its own string or stream and return back a vector of those strings or streams.
c: Read in all of the contents of the file into a single string, stream or large buffer and return that back.
2: After I have the contents of that file then I will typically call a function that will parse that data and these functions will vary depending on the type of content that needs to be parsed based on the data structures that will be used. Also, these parsing functions will call a function that will split the string into a vector of strings called tokens. After the split string function is called then the parsing of data will use the string manipulators-converters to convert a string to the required built in types that are needed for the current data structure that is in use and store them into the data structure that is passed in by reference.
3: There are two variations of my splitString function.
a: One takes a single character as a delimiter.
b: The other will take a string as its delimiter.
c: Both functions will return a vector of strings, based on the delimiter used.
Here is an example of my code using this text file for input.
time.txt
4:32:52
main.cpp
#include <vector>
#include <string>
#include <sstream>
#include <fstream>
#include <iostream>
#include <exception>
struct Time {
int hours;
int minutes;
int seconds;
};
std::vector<std::string> splitString( const std::string& s, char delimiter ) {
std::vector<std::string> tokens;
std::string token;
std::istringstream tokenStream( s );
while( std::getline( tokenStream, token, delimiter ) ) {
tokens.push_back( token );
}
return tokens;
}
std::string getLineFromFile( const char* filename ) {
std::ifstream file( filename );
if( !file ) {
std::stringstream stream;
stream << "failed to open file " << filename << '\n';
throw std::runtime_error( stream.str() );
}
std::string line;
std::getline( file, line );
file.close();
return line;
}
void parseLine( const std::string& fileContents, Time& time ) {
std::vector<std::string> output = splitString( fileContents, ':' );
// This is where you would want to do your sanity check to make sure
// that the contents from the file are valid inputs before converting
// them to the appropriate types and storing them into your data structure.
time.hours = std::stoi( output.at( 0 ) );
time.minutes = std::stoi( output.at( 1 ) );
time.seconds = std::stoi( output.at( 2 ) );
}
int main() {
try {
Time t;
std::string line = getLineFromFile( "time.txt" );
parseLine( line, t );
std::cout << "Hours: " << t.hours << '\n'
<< "Minutes: " << t.minutes << '\n'
<< "Seconds: " << t.seconds << "\n\n";
} catch( std::runtime_error& e ) {
std::cerr << e.what() << std::endl;
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
Output:
Hours: 4
Minutes: 32
Seconds: 52
Now as you can see in this particular situation the functions that are being used here is designed only to read a single line from the file and of course the very first line from the file. I have other functions in my library not shown here that will read each line of a file until there are no more lines to read, or read all of the file into a single buffer. I have another version of split string that will take a string as its delimiter instead of a single character. Finally for the parsing function, each parsing function will end up being unique due to the fact that it will rely on the data structure that you are trying to use.
This allows the code to be readable as each function does what it is supposed to do and nothing more. I prefer this design over the fact of trying to get information from a file and trying to parse it while the file is open. Too many things can go wrong while the file is open and if the data is read wrong or corrupted but to the point where the compiler doesn't complain about it, then your variables or data structures may contain invalid information without you being aware of it. At least in this way you can open the file, get what you need from the file and store it into a string or a vector of strings, close the file when done reading and return back the contents. Then it becomes the parsing function's responsibility to test the data after it has been tokenized. Now, in the current parsing function that I shown above I did not do any sanity check to keep things simple, but that is where you would test your data to see if the information is valid before returning back your populated data structure.
If you are interested in another version of this where there are multiple lines being read in from the file, just comment a request and I will append it to this answer.

C++ ignore alphabetical characters when reading input from a file

I have a program that takes an input stream from a text file, which contains positive integers, delimited by spaces. The file contains only numbers and one instance of abc which my program should ignore before continuing to read data from the file.
this is my code and it does not work
int line;
in >> line;
in.ignore(1, 'abc');
in.clear();
could someone specify what the problem is? essentially, I want to discard the alpha input, clear cin and continue to read from file but I get an infinite loop.
This code should work
I get three parts from the input stream, and put the useful parts into a stringstream and read from it
int part1, part2;
std::string abc;
in >> part1 >> abc >> part2;
std::stringstream ss;
ss << part1 << part2;
int line;
ss >> line
You can explicitely ignore any alphabetical character this way:
#include <locale>
#include <iostream>
std::istream& read_number(std::istream& is, int& number) {
auto& ctype = std::use_facet<std::ctype<char>>(is.getloc());
while(ctype.is(std::ctype_base::alpha, is.peek())) is.ignore(1);
return is >> number;
}
int main() {
using std::string;
using std::cin;
using std::locale;
int number;
while(read_number(std::cin, number)) {
std::cout << number << ' ';
}
}
For input like "452abc def 23 11 -233b" this will produce "452 23 11 -233".

Reading Values from fields in a .csv file?

Yesterday I made a small script with some help to read .csv files. I though I found a way to read the first value and store it, but for some reason it stores the last value instead.
I store what I thought should be the first value under value1, and re-display it to make sure it displays properly and is in fact being stored under a callable variable.
Does anyone know what is wrong with this code? I think I should be using vectors but as I read the reference sheets I find on the internet about them I am being a little thrown of. Any help is appreciated.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main ()
{
int loop = 1;
string value;
string value1;
while(loop = 1)
{
cout << "Welcome! \n" << endl;
ifstream myfile;
myfile.open ("C:/Documents and Settings/RHatfield/My Documents/C++/Product Catalog Creator/Source/External/Sample.csv");
while (myfile.good())
getline ( myfile, value, ',' );
cout << string ( value) << endl;
value1 = value;
cout << value1;
myfile.close();
system("PAUSE");
return 0;
}
}
Your error seems to be in pure code formatting.
while (myfile.good())
There is no {} after the loop. So only next line is repeated.
The following code is executed after reading of the whole file.
cout << string ( value) << endl;
Thus value stores the last line of the file.
You may want to change the condition in your while loop:
char separator;
int value1;
int value2;
int value3;
while (myfile >> value1)
{
// Skip the separator, e.g. comma (',')
myfile >> separator;
// Read in next value.
myfile >> value2;
// Skip the separator, e.g. comma (',')
myfile >> separator;
// Read in next value.
myfile >> value3;
// Ignore the newline, as it is still in the buffer.
myfile.ignore(10000, '\n');
// Process or store values.
}
The above code fragment is not robust but demonstrates the concept of reading from a file, skipping non-numeric separators and processing the end of the line. The code is optimized either.