Read CSV File Very Fast - c++

i have a csv file and i have to read this file with only fstream library. There are 8 columns but i will use only first three columns. This file contains 591.000 lines data.
I tried to read like this;
while (retailFile.good()) {
if (i == 0) continue;
getline(retailFile, invoiceNo, ';');
getline(retailFile, stockCode, ';');
getline(retailFile, desc, ';');
getline(retailFile, dummy, ';');
getline(retailFile, dummy, ';');
getline(retailFile, dummy, ';');
getline(retailFile, dummy, ';');
getline(retailFile, dummy);
i++;
}
Tried like that - I wasn't too hopeful - it was a complete disappointment.
How can read very fast? It's ridiculous to keep it in an empty variable. Can't we pass that columns?

To find the end of the line, you have to read through all of the columns in the line looking for the end of the line. This is unavoidable. You do not have to process those unwanted fields though.
Taking inspiration from option two of this linked answer I get something like
//discard first line without looking at it.
if (retailFile.ignore(std::numeric_limits<std::streamsize>::max(), '\n')
{ // ALWAYS test IO transactions to make sure they worked, even something as
// trivial as ignoring the input.
std::string line;
while (std::getline(retailFile, line))
{ // read the whole line
// wrap the line in a stream for easy parsing
std::istringstream stream (line);
if (std::getline(retailFile, invoiceNo, ';') &&
std::getline(retailFile, stockCode, ';') &&
std::getline(retailFile, desc, ';'))
{ // successfully read all three required columns
// Do not use anything you read until after you know it is good. Not
// checking leads to bugs and malware.
// strongly consider doing something with the variables here. The next loop
// iteration will write over them
i++;
}
else
{
// failed to find all three columns. You should look into why and
// handle accordingly.
}
}
}
else
{
// failed to ignore the line. You should look into why and handle accordingly.
}
You probably won't find much of an actual speed difference. Reading files off a disk is usually more time consuming than doing anything with the file unless you do a lot of stuff with the file's data after reading it. There are potentially faster ways to do the splitting of the line, but again the difference is probably buried in the cost of reading the file in the first place.

The question is: What is fast?
In the below demo, I create I file with 591.000 lines. Size is 74MB.
Then I set a bigger input buffer for the std::ifstream, read all lines, parse them, and copy the first 3 entries into the resulting vector. The rest I do ignore.
To avoid that the result is optimized away, I show 50 lines of output.
VS2019, C++17, Release Mode, all optimizations on.
Result: ~2.7s for reading and parsing all lines on my machine. (I must admit that I have 4 SSDs in RAID 0 via PCIe)
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <regex>
#include <array>
#include <chrono>
#include <iterator>
int main() {
// Put whatever filename you want
static const std::string fileName{ "r:\\big.txt" };
// Start Time measurement
auto start = std::chrono::system_clock::now();
#if 0
// Write file with 591000 lines
if (std::ofstream ofs(fileName); ofs) {
for (size_t i = 0U; i < 591000U; ++i) {
ofs << "invoiceNo_" << i << ";"
<< "stockCode_" << i << ";"
<< "description_" << i << ";"
<< "Field_4_" << i << ";"
<< "Field_5_" << i << ";"
<< "Field_6_" << i << ";"
<< "Field_7_" << i << ";"
<< "Field_8_" << i << "\n";
}
}
#endif
auto end = std::chrono::system_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
// How long did it take?
std::cout << "Time for writing the file: " << elapsed.count() << " ms\n";
// We are just interested in 3 fields
constexpr size_t NumberOfNeededFields = 3U;
// We expect 591000 lines, give a little bit more
constexpr size_t NumberOfExpectedFilesInFile = 600000U;
// We will create a bigger input buffer for our stream
constexpr size_t ifStreamBufferSize = 100000U;
static char buffer[ifStreamBufferSize];
// The delimtzer for our csv
static const std::regex delimiter{ ";" };
// Main working variables
using Fields3 = std::array<std::string, NumberOfNeededFields>;
static Fields3 fields3;
static std::vector<Fields3> fields{};
// Reserve space to avoid reallocation
fields.reserve(NumberOfExpectedFilesInFile);
// Start timer
start = std::chrono::system_clock::now();
// Open file and check, if it is open
if (std::ifstream ifs(fileName); ifs) {
// Set bigger file buffer
ifs.rdbuf()->pubsetbuf(buffer, ifStreamBufferSize);
// Read all lines
for (std::string line{}; std::getline(ifs, line); ) {
// Parse string
std::copy_n(std::sregex_token_iterator(line.begin(), line.end(), delimiter, -1), NumberOfNeededFields, fields3.begin());
// Store resulting 3 fields
fields.push_back(std::move(fields3));
}
}
end = std::chrono::system_clock::now();
elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
std::cout << "Time for parsing the file: " << elapsed.count() << " ms\n";
// Show some result
for (size_t i = 0; i < fields.size(); i += (fields.size()/50)) {
std::copy_n(fields[i].begin(), NumberOfNeededFields, std::ostream_iterator<std::string>(std::cout, " "));
std::cout << "\n";
}
return 0;
}

Related

Rid of white space in file print

i am trying to get my file to remove the leading and trailing space but it does not work.
this is the txt file contents:
392402 wench
I have tried printing out my code, and this is what is displayed.
first: 392402 wench second:
I want it to display this instead
first: 392402 second: wench
this is my code
void readFile(const string &fileName) {
int limit;
ifstream ifs(fileName);
string::size_type position;
key_type item;
mapped_type count;
string line;
if (ifs.is_open()) {
ifs >> limit;
for (int i = 0; i < limit; i++) {
getline(ifs, line);
position = line.find(" ", 0);
auto c = line.substr(position + 1);
item = line.substr(0, position);
cout << "first: " << c << " second: " << item << endl;
value_type value(item, count);
values.push_back(value);
}
} else {
cout << "Can't open file.";
}
what am i doing wrong? Thank you
The two biggest mistakes you're making are (a) not checking your values for expected output as you go, and (b) not running your code in a debugger to see what is really happening. If you had, the values of position, c, and item would have been blatantly wrong, and you could then surmise where to go from there.
Belaying the highly-likely possibility that the loop iteration is broken from inception because you never consumed the remainder of the entry line containing input, let's look at the actual data and what you're asking of it with your code.
We read this entire line:
392402 wench
You then ask "find the first single-space string in this line" via this code:
position = line.find(" ", 0);
Well, that would be here:
392402 wench
^here
So position is zero (0). You then ask for the sub-string, starting a that position + 1, through the end of the string with this code:
auto c = line.substr(position + 1);
Therefore c now contains (leading space removed via the +1):
392402 wench
Now we build item, which is done with this line:
item = line.substr(0, position);
Remember, position is zero, so you're asking for the string, starting at location 0, length 0. As you can imagine, that isn't going to amount to anything. So now item is an empty string.
Finally, the output statement:
cout << "first: " << c << " second: " << item << endl;
will produce:
first: 392402 wench second:
I.e. exactly what you're seeing. And that's it. Clearly this is wrong.
Alternative
Use better error checking, value checking, and a string stream for per-line extraction. The following code doesn't give two cents about your type aliases (mainly because you didn't include them anyway and I'd rather not loft any guesses as to their origin).
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <limits>
// Expects a file with the following format:
// count
// name1 value1
// name2 value2
// ...
void readFile(const std::string &fileName)
{
std::ifstream ifs(fileName);
if (ifs.is_open())
{
int limit;
if (ifs >> limit && limit > 0)
{
// consume through end of line.
ifs.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
// repeat until `limit` iterations or stream error/eof
std::string line;
for (int i = 0; i < limit && std::getline(ifs, line); i++)
{
std::istringstream iss(line);
// extract line values. Note these *can* be formatted
// extraction for things besides just strings
std::string first, second;
if (iss >> first >> second)
{
std::cout << "first: " << first << " second: " << second << '\n';
// TODO: whatever you want to do with first/second
}
}
}
ifs.close();
}
else
{
std::cerr << "Can't open file: " << fileName << '\n';
}
}
Note: The above code will NOT work for remaining-line-content as the expected second value. E.g. It will not process something like this as you may first expect:
10000 this is a multi-word description
will produce this:
first: 10000 second: this
which is considerably different than what you may be expecting:
first: 10000 second: this is a multi-word description
There was no suggestion in the original post such support was mandatory, though adding it wouldn't be terribly difficult to add. If it is a requirement, I leave that task to you.

How skip reading " : " while taking input from a file in C++

Lets say I want to input the hours, minutes and seconds from the first line of a file and store them to 3 different variables, hrs, mins and sec respectively.
I cant figure out an easy way to skip reading the colon character (":").
Input file example:
12:49:00
Store:
hrs = 12
mins = 59
sec = 00
You can use std::regex to match, range-check and validate your input all at once.
#include <iostream>
#include <regex>
#include <string>
int main()
{
const std::regex time_regex("(\\d|[0,1]\\d|2[0-3]):([0-5]\\d):([0-5]\\d)");
std::smatch time_match;
std::string line;
while (std::getline(std::cin, line))
{
if (std::regex_match(line, time_match, time_regex))
{
int hours = std::stoi(time_match[1]);
int minutes = std::stoi(time_match[2]);
int seconds = std::stoi(time_match[3]);
std::cout << "h=" << hours << " m=" << minutes << " s=" << seconds << std::endl;
}
else
{
std::cout << "Invalid time: " << line << std::endl;
}
}
return 0;
}
See this example live here.
Breaking down the regular expression (\\d|[0,1]\\d|2[0-3]):([0-5]\\d):([0-5]\\d):
\d|[0,1]\d|2[0-3] matches the hour (24-hour time) which is one of:
\d : 0-9
[0,1]\d : 01-19
2[0-3] : 20-23
[0-5]\d matches the minutes: two digits 00-59
[0-5]\d matches the seconds: two digits 00-59, as above.
An alternative not using a temporary character for skipping the colon:
#include <iostream>
int main()
{
int h,m,s;
std::cin >> h;
std::cin.ignore(1) >> m;
std::cin.ignore(1) >> s;
std::cout << h << ':' << m << ':' << s << std::endl;
return 0;
}
This seems to work:
int h, m, s;
char c;
cin >> h >> c >> m >> c >> s;
You just skip : symbol this way. I don't know whether it's a good solution.
With cin.ignore:
cin >> h;
cin.ignore(1);
cin >> m;
cin.ignore(1);
cin >> s;
There are already several good answers and one that has already been accepted; however I like to propose my solution not only as a valid answer to your problem but also in regards to a good design practice. IMHO when it involves reading information from a file and storing it's contents to variables or data structures I prefer to do it in a specific way. I like to separate the functionality and responsibility of specific operations into their own functions:
1: I first like to have a function to open a file, read the contents and to store the information into either a string, a stream or some large buffer. Once the appropriate amount of information is read from the file, then the function will close the file handle as we are done with it and then return back the results. There are several ways to do this yet they are all similar.
a: Read a single line from the file and return back a string or a stream.
b: Read in all information form the file line by line and store each line into its own string or stream and return back a vector of those strings or streams.
c: Read in all of the contents of the file into a single string, stream or large buffer and return that back.
2: After I have the contents of that file then I will typically call a function that will parse that data and these functions will vary depending on the type of content that needs to be parsed based on the data structures that will be used. Also, these parsing functions will call a function that will split the string into a vector of strings called tokens. After the split string function is called then the parsing of data will use the string manipulators-converters to convert a string to the required built in types that are needed for the current data structure that is in use and store them into the data structure that is passed in by reference.
3: There are two variations of my splitString function.
a: One takes a single character as a delimiter.
b: The other will take a string as its delimiter.
c: Both functions will return a vector of strings, based on the delimiter used.
Here is an example of my code using this text file for input.
time.txt
4:32:52
main.cpp
#include <vector>
#include <string>
#include <sstream>
#include <fstream>
#include <iostream>
#include <exception>
struct Time {
int hours;
int minutes;
int seconds;
};
std::vector<std::string> splitString( const std::string& s, char delimiter ) {
std::vector<std::string> tokens;
std::string token;
std::istringstream tokenStream( s );
while( std::getline( tokenStream, token, delimiter ) ) {
tokens.push_back( token );
}
return tokens;
}
std::string getLineFromFile( const char* filename ) {
std::ifstream file( filename );
if( !file ) {
std::stringstream stream;
stream << "failed to open file " << filename << '\n';
throw std::runtime_error( stream.str() );
}
std::string line;
std::getline( file, line );
file.close();
return line;
}
void parseLine( const std::string& fileContents, Time& time ) {
std::vector<std::string> output = splitString( fileContents, ':' );
// This is where you would want to do your sanity check to make sure
// that the contents from the file are valid inputs before converting
// them to the appropriate types and storing them into your data structure.
time.hours = std::stoi( output.at( 0 ) );
time.minutes = std::stoi( output.at( 1 ) );
time.seconds = std::stoi( output.at( 2 ) );
}
int main() {
try {
Time t;
std::string line = getLineFromFile( "time.txt" );
parseLine( line, t );
std::cout << "Hours: " << t.hours << '\n'
<< "Minutes: " << t.minutes << '\n'
<< "Seconds: " << t.seconds << "\n\n";
} catch( std::runtime_error& e ) {
std::cerr << e.what() << std::endl;
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
Output:
Hours: 4
Minutes: 32
Seconds: 52
Now as you can see in this particular situation the functions that are being used here is designed only to read a single line from the file and of course the very first line from the file. I have other functions in my library not shown here that will read each line of a file until there are no more lines to read, or read all of the file into a single buffer. I have another version of split string that will take a string as its delimiter instead of a single character. Finally for the parsing function, each parsing function will end up being unique due to the fact that it will rely on the data structure that you are trying to use.
This allows the code to be readable as each function does what it is supposed to do and nothing more. I prefer this design over the fact of trying to get information from a file and trying to parse it while the file is open. Too many things can go wrong while the file is open and if the data is read wrong or corrupted but to the point where the compiler doesn't complain about it, then your variables or data structures may contain invalid information without you being aware of it. At least in this way you can open the file, get what you need from the file and store it into a string or a vector of strings, close the file when done reading and return back the contents. Then it becomes the parsing function's responsibility to test the data after it has been tokenized. Now, in the current parsing function that I shown above I did not do any sanity check to keep things simple, but that is where you would test your data to see if the information is valid before returning back your populated data structure.
If you are interested in another version of this where there are multiple lines being read in from the file, just comment a request and I will append it to this answer.

getLine() returns newline and no data

I have the following code:
const char *fn = fileName.c_str();
std::ifstream file (fn);
std::vector<std::string> value(20000);
int i = 0;
while ( file.good() )
{
getline ( file, value[i] );
i+=1;
std::cout << value[i]<< std::endl;
std::cout << i << std::endl;
}
The program reads the whole file, I know this because the correct number of indexes are printed. However there is no data, just a new line before each printing of "i". This is a file that I have saved from excel in windows and am reading in Linux - Is this my issue? What happened to my data?
there is no data, just a new line before each printing of "i".
Because you increment i before accessing value[i].
Incrementing i just after accessing value[i] solves the problem of missing data.
DEMO
A better way to read in the file:
std::string text_line;
std::vector<string> file_lines;
while (std::getline(file, text_line))
{
file_lines.push_back(text_line);
}
Although not optimal speed-wise, it gets the job done and doesn't have an upper limit (except by the amount of memory your program is allowed).
Edit:
Sorry, I was simply fixing the logic error apparent.
However, here is an ideal version of reading lines of a file:
#include <iostream>
#include <fstream>
#include <vector>
int main() {
std::ifstream file {"test.txt"};
std::vector<std::string> values;
std::string temp;
while (getline(file, temp)) {
values.push_back(temp);
}
for (int i = 0; i < values.size(); ++i) {
std::cout << values[i] << '\n' << i << '\n';
}
}

c++ program for reading an unknown size csv file (filled only with floats) with constant (but unknown) number of columns into an array

was wondering if someone could give me a hand im trying to build a program that reads in a big data block of floats with unknown size from a csv file. I already wrote this in MATLAB but want to compile and distribute this so moving to c++.
Im just learning and trying to read in this to start
7,5,1989
2,4,2312
from a text file.
code so far.
// Read in CSV
//
// Alex Byasse
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <sstream>
#include <stdlib.h>
int main() {
unsigned int number_of_lines = 0;
FILE *infile = fopen("textread.csv", "r");
int ch;
int c = 0;
bool tmp = true;
while (EOF != (ch=getc(infile))){
if(',' == ch){
++c;
}
if ('\n' == ch){
if (tmp){
int X = c;
tmp = false;
}
++number_of_lines;
}
}
fclose(infile);
std::ifstream file( "textread.csv" );
if(!file){
std:cerr << "Failed to open File\n";
return 1;
}
const int ROWS = X;
const int COLS = number_of_lines;
const int BUFFSIZE = 100;
int array[ROWS][COLS];
char buff[BUFFSIZE];
std::string line;
int col = 0;
int row = 0;
while( std::getline( file, line ) )
{
std::istringstream iss( line );
std::string result;
while( std::getline( iss, result, ',' ) )
{
array[row][col] = atoi( result.c_str() );
std::cout << result << std::endl;
std::cout << "column " << col << std::endl;
std::cout << "row " << row << std::endl;
col = col+1;
if (col == COLS){
std:cerr << "Went over number of columns " << COLS;
}
}
row = row+1;
if (row == ROWS){
std::cerr << "Went over length of ROWS " << ROWS;
}
col = 0;
}
return 0;
}
My matlab code i use is >>
fid = fopen(twoDM,'r');
s = textscan(fid,'%s','Delimiter','\n');
s = s{1};
s_e3t = s(strncmp('E3T',s,3));
s_e4q = s(strncmp('E4Q',s,3));
s_nd = s(strncmp('ND',s,2));
[~,cell_num_t,node1_t,node2_t,node3_t,mat] = strread([s_e3t{:}],'%s %u %u %u %u %u');
node4_t = node1_t;
e3t = [node1_t,node2_t,node3_t,node4_t];
[~,cell_num_q,node1_q,node2_q,node3_q,node_4_q,~] = strread([s_e4q{:}],'%s %u %u %u %u %u %u');
e4q = [node1_q,node2_q,node3_q,node_4_q];
[~,~,node_X,node_Y,~] = strread([s_nd{:}],'%s %u %f %f %f');
cell_id = [cell_num_t;cell_num_q];
[~,i] = sort(cell_id,1,'ascend');
cell_node = [e3t;e4q];
cell_node = cell_node(i,:);
Any help appreciated.
Alex
I would, obviously, just use IOStreams. Reading a homogeneous array or arrays from a CSV file without having to bother with any quoting is fairly trivial:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
std::istream& comma(std::istream& in)
{
if ((in >> std::ws).peek() != std::char_traits<char>::to_int_type(',')) {
in.setstate(std::ios_base::failbit);
}
return in.ignore();
}
int main()
{
std::vector<std::vector<double>> values;
std::istringstream in;
for (std::string line; std::getline(std::cin, line); )
{
in.clear();
in.str(line);
std::vector<double> tmp;
for (double value; in >> value; in >> comma) {
tmp.push_back(value);
}
values.push_back(tmp);
}
for (auto const& vec: values) {
for (auto val: vec) {
std::cout << val << ", ";
}
std::cout << "\n";
}
}
Given the simple structure of the file, the logic can actually be simplified: Instead of reading the values individually, each line can be viewed as a sequence of values if the separators are read automatically. Since a comma won't be read automatically, the commas are replaced by spaced before creating the string stream for the internal lines. The corresponding code becomes
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
int main()
{
std::vector<std::vector<double> > values;
std::ifstream fin("textread.csv");
for (std::string line; std::getline(fin, line); )
{
std::replace(line.begin(), line.end(), ',', ' ');
std::istringstream in(line);
values.push_back(
std::vector<double>(std::istream_iterator<double>(in),
std::istream_iterator<double>()));
}
for (std::vector<std::vector<double> >::const_iterator
it(values.begin()), end(values.end()); it != end; ++it) {
std::copy(it->begin(), it->end(),
std::ostream_iterator<double>(std::cout, ", "));
std::cout << "\n";
}
}
Here is what happens:
The destination values is defined as a vector of vectors of double. There isn't anything guaranteeing that the different rows are the same size but this is trivial to check once the file is read.
An std::ifstream is defined and initialized with the file. It may be worth checking the file after construction to see if it could be opened for reading (if (!fin) { std::cout << "failed to open...\n";).
The file is processed one line at a time. The lines are simply read using std::getline() to read them into a std::string. When std::getline() fails it couldn't read another line and the conversion ends.
Once the line is read, all commas are replaced by spaces.
From the thus modified line a string stream for reading the line is constructed. The original code reused a std::istringstream which was declared outside the loop to save the cost of constructing the stream all the time. Since the stream goes bad when the lines is completed, it first needed to be in.clear()ed before its content was set with in.str(line).
The individual values are iterated using an std::istream_iterator<double> which just read a value from the stream it is constructed with. The iterator given in is the start of the sequence and the default constructed iterator is the end of the sequence.
The sequence of values produced by the iterators is used to immediately construct a temporary std::vector<double> representing a row.
The temporary vector is pushed to the end of the target array.
Everything after that is simply printing the content of the produced matrix using C++11 features (range-based for and variables with automatically deduced type).
As proposed here changing getline escape may help you with better reading of csv file but you need to change type from string to int.
For dealing with any number of rows and cols you may use multi dimensional vector (vector inside vector as described here), then you have each row in one vector and all rows in the bigger vectors
int fclose(infile);
This line is wrong. The compiler thinks you're trying to initialize the variable fclose with a FILE*, which is wrong. It should be this if you're simply trying to close the file:
fclose(infile);
I intended this as an edit to Dietmar Kuhl's solution, but it was rejected as too large an edit...
The usual reason given for converting Matlab to C++ is performance. So I benchmarked these two solutions. I compiled with G++ 4.7.3 for cygwin with the following options "-Wall -Wextra -std=c++0x -O3 -fwhole-program". I tested on a 32-bit Intel Atom N550.
As input I used 2 10,000 line files. The first file was 10 "0.0" values per line, the second file was 100 "0.0" values per line.
I timed from the command line using time and I used the average of the sum of user+sys over three runs.
I modified the second program to read from std::cin as in the first program.
Finally, I ran the tests again with std::cin.sync_with_stdio(false);
Results (time in seconds):
sync no sync
10/line 100/line 10/line 100/line
prog A 1.839 16.873 0.721 6.228
prog B 1.741 16.098 0.721 5.563
The obvious conclusion is that version B is slightly faster, but more importantly, you should disable syncing with stdio.

Reading a file into an array

I would like to read a text file and input its contents into an array. Then I would like to show the contents of the array in the command line.
My idea is to open the file using:
inFile.open("pigData.txt")
And then to get the contents of the file using:
inFile >> myarray [size]
And then show the contents using a for loop.
My problem is that the file I am trying to read contain words and I don't know how to get a whole word as an element in the array. Also, let's say that the words are divided by spaces, thus:
hello goodbye
Could be found on the file. I would like to read the whole line "hello goodbye" into an element of a parallel array. How can I do that?
Should be pretty straightforward.
std::vector<std::string> file_contents;
std::string line;
while ( std::getline(inFile,line) )
file_contents.push_back(line);
std::vector<std::string>::iterator it = file_contents.begin();
for(; it!=file_contents.end() ; ++it)
std::cout << *it << "\n";
Edit:
Your comment about having "hello goodbye" as element zero and element one is slightly confusing to me. The above code snip will read each line of the file and store that as an individual entry in the array 'file_contents'. If you want to read it and split it on spaces that is slightly different.
For context, you could have provided a link to your previous question, about storing two lists of words in different languages. There I provided an example of reading the contents of a text file into an array:
const int MaxWords = 100;
std::string piglatin[MaxWords];
int numWords = 0;
std::ifstream input("piglatin.txt");
std::string line;
while (std::getline(input, line) && numWords < MaxWords) {
piglatin[numWords] = line;
++numWords;
}
if (numWords == MaxWords) {
std::cerr << "Too many words" << std::endl;
}
You can't have one parallel array. For something to be parallel, there must be at least two. For parallel arrays of words, you could use a declarations like this:
std::string piglatin[MaxWords];
std::string english[MaxWords];
Then you have two options for filling the arrays from the file:
Read an entire line, and the split the line into two words based on where the first space is:
while (std::getline(input, line) && numWords < MaxWords) {
std::string::size_type space = line.find(' ');
if (space == std::string::npos)
std::cerr << "Only one word" << std::endl;
piglatin[numWords] = line.substr(0, space);
english[numWords] = line.substr(space + 1);
++numWords;
}
Read one word at a time, and assume that each line has exactly two words on it. The >> operator will read a word at a time automatically. (If each line doesn't have exactly two words, then you'll have problems. Try it out to see how things go wrong. Really. Getting experience with a bug when you know what the cause is will help you in the future when you don't know what the cause is.)
while (input && numWords < MaxWords) {
input >> piglatin[numWords];
input >> english[numWords];
++numWords;
}
Now, if you really one one array with two elements, then you need to define another data structure because an array can only have one "thing" in each element. Define something that can hold two strings at once:
struct word_pair {
std::string piglatin;
std::string english;
};
Then you'll have just one array:
word_pair words[MaxWords];
You can fill it like this:
while (std::getline(input, line) && numWords < MaxWords) {
std::string::size_type space = line.find(' ');
if (space == std::string::npos)
std::cerr << "Only one word" << std::endl;
words[numWords].piglatin = line.substr(0, space);
words[numWords].english = line.substr(space + 1);
++numWords;
}
Notice how the code indexes into the words array to find the next word_pair object, and then it uses the . operator to get to the piglatin or english field as necessary.
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
using namespace std;
int main()
{
// This will store each word (separated by a space)
vector<string> words;
// Temporary variable
string buff;
// Reads the data
fstream inFile("words.txt");
while(!inFile.eof())
{
inFile>>buff;
words.push_back(buff);
}
inFile.close();
// Display
for(size_t i=0;i<words.size();++i) cout<<words[i]<<" ";
return 0;
}
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;
int main ()
{
vector<string> fileLines;
string line;
ifstream inFile("pigData.txt");
if ( inFile.is_open() ) {
while ( !inFile.eof() ) {
getline(inFile, line);
fileLines.push_back(line);
}
inFile.close();
} else {
cerr << "Error opening file" << endl;
exit(1);
}
for (int i=0; i<fileLines.size(); ++i) {
cout << fileLines[i] << "\n";
}
cout << endl;
return 0;
}