c++ file io & splitting by separator - c++

I have a file with data listed as follows:
0, 2, 10
10, 8, 10
10, 10, 10
10, 16, 10
15, 10, 16
17, 10, 16
I want to be able to input the file and split it into three arrays, in the process trimming all excess spaces and converting each element to integers.
For some reason I can't find an easy way to do this in C++. The only success I've had is by reading each line into an array, then regexing out all the spaces and splitting it up. This entire process took me a good 20-30 lines of code, and it's a pain to modify for, say, another separator (e.g. space).
This is the python equivalent of what I would like to have in C++:
f = open('input_hard.dat')
lines = f.readlines()
f.close()
#declarations
inint, inbase, outbase = [], [], []
#input parsing
for line in lines:
    bits = string.split(line, ',')
    inint.append(int(bits[0].strip()))
    inbase.append(int(bits[1].strip()))
    outbase.append(int(bits[2].strip()))
The ease of doing this in Python is one of the reasons I moved to it in the first place. However, I now need to do this in C++, and I would hate to have to use my ugly 20-30 line code.
Any help would be appreciated, thanks!

There's no real need to use boost in this example as streams will do the trick nicely:
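#include <fstream>
#include <vector>
using namespace std;

int main(int argc, char* argv[])
{
    if (argc < 2)
        return 1; // no input file given
    ifstream file(argv[1]);
    const unsigned maxIgnore = 10;
    const int delim = ',';
    int x, y, z;
    vector<int> vecx, vecy, vecz;
    // Read one record per iteration. The loop ends on the first failed read,
    // so the failed read at end of file does not push the last record twice.
    while (file >> x
           && file.ignore(maxIgnore, delim) >> y
           && file.ignore(maxIgnore, delim) >> z)
    {
        vecx.push_back(x);
        vecy.push_back(y);
        vecz.push_back(z);
    }
}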
Though if I were going to use boost I'd prefer the simplicity of tokenizer to regex... :)

There is really nothing wrong with fscanf, which is probably the fastest solution in this case. And it's as short and readable as the python code:
// needs <cstdio> and <vector>
FILE *fp = fopen("file.dat", "r");
int x, y, z;
std::vector<int> vx, vy, vz;
if (fp) { // fopen returns NULL if the file cannot be opened
    while (fscanf(fp, "%d, %d, %d", &x, &y, &z) == 3) {
        vx.push_back(x);
        vy.push_back(y);
        vz.push_back(z);
    }
    fclose(fp);
}

Something like:
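// needs <cstdio>, <cstdlib>, <cstring> and <vector>; fh is an already opened FILE*
char buf[256];
vector<int> inint;
vector<int> inbase;
vector<int> outbase;
while (fgets(buf, sizeof buf, fh)) {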
char *tok = strtok(buf, ", ");
inint.push_back(atoi(tok));
tok = strtok(NULL, ", ");
inbase.push_back(atoi(tok));
tok = strtok(NULL, ", ");
outbase.push_back(atoi(tok));
}
Except with error checking.
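A minimal sketch of what that error checking might look like, assuming exactly three integer fields per line. The helper name parseRecord is hypothetical, and strtol is swapped in for atoi because atoi cannot report a token that is not a number. Range checking of the converted value is left out.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits one line on commas and spaces and converts
// exactly three integer fields. Returns false if a field is missing,
// is not a complete integer, or if there are extra fields.
bool parseRecord(const std::string& line, int out[3])
{
    // strtok modifies its input, so work on a writable copy
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');

    char* tok = std::strtok(buf.data(), ", \r\n");
    for (int i = 0; i < 3; ++i) {
        if (tok == nullptr)
            return false;                  // fewer than three fields
        char* end = nullptr;
        long value = std::strtol(tok, &end, 10);
        if (end == tok || *end != '\0')
            return false;                  // token is not a complete integer
        out[i] = static_cast<int>(value);  // assumes the value fits in an int
        tok = std::strtok(nullptr, ", \r\n");
    }
    return tok == nullptr;                 // reject a fourth field
}
```

Calling it once per line read by fgets (or std::getline) lets the caller decide whether to skip or report a bad line before anything is pushed into the vectors.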

why not the same code as in python :) ?
std::ifstream file("input_hard.dat");
std::vector<int> inint, inbase, outbase;
int val1, val2, val3;
char delim;
// The extraction itself is the loop condition, so nothing is pushed after a failed read.
while (file >> val1 >> delim >> val2 >> delim >> val3) {
    inint.push_back(val1);
    inbase.push_back(val2);
    outbase.push_back(val3);
}

std::getline allows you to read a line of text, and you can use a string stream to parse the individual line:
string buf;
getline(cin, buf);
stringstream par(buf);
char buf2[512];
par.getline(buf2, 512, ','); /* Reads until the first token. */
Once you get the line of text into the string, you can use any parsing function you want: sscanf(buf.c_str(), "%d,%d,%d", &i1, &i2, &i3), atoi on the substring that contains the integer, or some other method.
You can also ignore unwanted characters in the input stream, if you know they're there:
if (cin.peek() == ',')
cin.ignore(1, ',');
cin >> nextInt;
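Putting the getline and string stream pieces together, a sketch of a reader with a configurable separator could look like the following. The function name readColumns and its parameters are assumptions made for illustration. Lines with a non-integer field or the wrong number of fields are simply skipped here, and an empty field (two separators in a row) also counts as invalid.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads delimiter-separated integers, one record per
// line, and returns them column by column (columns[0] holds the first field
// of every accepted line, and so on).
std::vector<std::vector<int>> readColumns(std::istream& in, char delim, std::size_t fields)
{
    std::vector<std::vector<int>> columns(fields);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream record(line);
        std::vector<int> row;
        std::string cell;
        bool ok = true;
        while (ok && std::getline(record, cell, delim)) {
            std::istringstream field(cell);
            int value;
            // operator>> skips leading whitespace; std::ws consumes trailing
            // whitespace, so eof() means nothing else was left in the cell.
            if (field >> value && (field >> std::ws).eof())
                row.push_back(value);
            else
                ok = false;
        }
        if (ok && row.size() == fields)
            for (std::size_t i = 0; i < fields; ++i)
                columns[i].push_back(row[i]);
    }
    return columns;
}
```

For the file in the question this would be called as readColumns(file, ',', 3), with file being an open std::ifstream. Switching to another separator only changes the second argument.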

If you don't mind using the Boost libraries...
#include <istream>
#include <string>
#include <vector>
#include <boost/lexical_cast.hpp>
#include <boost/regex.hpp>
std::vector<int> ParseFile(std::istream& in) {
    const boost::regex cItemPattern(" *([0-9]+),?");
    std::vector<int> return_value;
    std::string line;
    while (std::getline(in, line)) {
        std::string::const_iterator b=line.begin(), e=line.end();
        boost::smatch match;
        while (b!=e && boost::regex_search(b, e, match, cItemPattern)) {
            return_value.push_back(boost::lexical_cast<int>(match[1].str()));
            b=match[0].second;
        }
    }
    return return_value;
}
That pulls the lines from the stream, then uses the Boost.Regex library (with a capture group) to extract each number from the lines. It automatically ignores anything that isn't a valid number, though that can be changed if you wish.
It's still about twenty lines with the #includes, but you can use it to extract essentially anything from the file's lines. This is a trivial example; I'm using nearly identical code to extract tags and optional values from a database field, and the only major difference is the regular expression.
EDIT: Oops, you wanted three separate vectors. Try this slight modification instead:
const boost::regex cItemPattern(" *([0-9]+), *([0-9]+), *([0-9]+)");
std::vector<int> vector1, vector2, vector3;
std::string line;
while (std::getline(in, line)) {
    std::string::const_iterator b=line.begin(), e=line.end();
    boost::smatch match;
    while (b!=e && boost::regex_search(b, e, match, cItemPattern)) {
        vector1.push_back(boost::lexical_cast<int>(match[1].str()));
        vector2.push_back(boost::lexical_cast<int>(match[2].str()));
        vector3.push_back(boost::lexical_cast<int>(match[3].str()));
        b=match[0].second;
    }
}

If you want to be able to scale to harder input formats, you should consider Spirit, Boost's parser combinator library.
This page has an example which does almost what you need (with reals and a single vector, though).

Related

Reading a input file into a vector

I'm trying to read a file of int's and double's into a vector but I am having difficulty doing so. Given something like:
1 2.1 3 4
2 4
3
9 0.1
How can I use ifstream and the getline function to convert the string into integers and doubles & inserting this into a vector?
I know this is incorrect but I am thinking of something along the lines of:
vector<Pair *> vec; //Pair is a class that contains a int & a double data member
string str;
double num;
ifstream f;
f.open("name of file");
while(getline(f, str)){
num = stod(str);
}
To insert into the vector I believe I can do something along the lines of:
Pair * pairObj = new Pair(x,y); //"x" being of type int and "y" being of type double
vec.push_back(pairObj);
I'm sorry if this is unclear, please let me know and I will do my best to explain myself.
You should just use stream iterators!
#include <iostream> // for IO
#include <vector> // for vector!
#include <iterator> // for stream iterator
#include <algorithm> // for copy (optional)
If you are initializing the vector directly:
vector<double> vdata{istream_iterator<double>(ifile),
                     istream_iterator<double>()};
Otherwise use copy, or copy_n if you only want a fixed amount of data:
copy(istream_iterator<double>(ifile),
     istream_iterator<double>(),
     back_inserter(vdata));
If you are working with a large file, I would recommend this method:
vector<double> vdata;
// This saves a lot of time; without reserve() the vector must keep reallocating as it grows.
// reserve() takes an element count, so file_size should be the expected number of values.
vdata.reserve(file_size);
copy(istream_iterator<double>(ifile),
     istream_iterator<double>(),
     back_inserter(vdata));
strtod() is C. Proper C++ uses the >> operator.
Once you have read each line of text, construct a std::istringstream from the string, then use operator>> to parse it.
Something along these lines:
std::ifstream f("name of file");
// Check if the file was successfully opened, etc...
std::string str;
while( getline(f, str))
{
std::istringstream i(str);
std::vector<double> v;
double d;
while (i >> d)
{
v.push_back(d);
}
if (!i.eof())
{
// Must be a parsing failure, deal with it in some way.
}
else
{
// Otherwise, v is the vector of numbers on this line.
}
}
string str;
std::vector<double> vd;
// loop reading lines of input
while (getline(f, str))
{
    std::stringstream sst(str);
    std::string a;
    // loop reading space-separated values in the line
    while (getline(sst, a, ' '))
    {
        // skip the empty tokens produced by consecutive spaces
        if (a.empty())
            continue;
        // convert to double and add to the end of the vector
        vd.push_back(stod(a));
    }
}
// check for complete pairs
if (vd.size() % 2)
    cout << "Error!";
// loop over pairs
vector< pair<int,double> > vpairs;
for (size_t kp = 0; kp < vd.size() / 2; kp++)
    vpairs.push_back(pair<int,double>((int)vd[kp*2], vd[kp*2+1]));
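If the values strictly alternate between an int and a double, the pairs can also be read directly with operator>>, without going through strings first. This is only a sketch under that assumption (the question does not say how the values pair up). The names readPairs and readPairsFromString are hypothetical, and std::pair is used in place of the question's Pair class so that nothing has to be allocated with new.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads alternating int/double values separated by any whitespace
// (spaces or newlines). Stops at end of input or at the first value
// that cannot be parsed.
std::vector<std::pair<int, double>> readPairs(std::istream& in)
{
    std::vector<std::pair<int, double>> pairs;
    int first;
    double second;
    while (in >> first >> second)
        pairs.emplace_back(first, second);
    return pairs;
}

// Convenience wrapper for in-memory text, e.g. for trying the function out.
std::vector<std::pair<int, double>> readPairsFromString(const std::string& text)
{
    std::istringstream in(text);
    return readPairs(in);
}
```

With a file, the first function would be called on an open std::ifstream. Storing the pairs by value means the vector owns them and there is nothing to delete afterwards.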

Extract numbers from a line of string in c++

I am making a natural language calculator in C++. The user will input a line of string for calculation. The program will extract the numbers and the operation and apply it accordingly. Following is part of my code
#include<iostream>
#include<string>
#include<sstream>
using namespace std;
int main()
{
string inp;
float n1,n2;
string s1,s2;
cout<<"Enter your string"<<endl;
getline(cin,inp);
stringstream ss;
ss.str(inp);
ss>>s1>>n1>>s2>>n2;
}
The program runs successfully if the user enters input in the expected format, e.g. "Add 2 and 3" or "Subtract 8 from 12".
But there is a problem in two cases:
If the user enters in some other format like "7 plus 6".
Even in the correct format but only one number "square root of 25".
Is there a solution which can extract the floats regardless of the position or number of floats?
Thanks
If what you want to do is literally extract the floats, you can take advantage of the fact that std::stof can additionally return where it leaves off, which you can use to determine if the entire "word" is a float (e.g. for "6c") and catch the invalid_argument for words that are definitely not floats (e.g. for "plus"):
std::vector<float> getFloats(const std::string& s) {
std::istringstream iss(s);
std::string word;
std::vector<float> result;
size_t pos = 0;
while (iss >> word) {
try {
float f = std::stof(word, &pos);
if (pos == word.size()) {
result.push_back(f);
}
}
catch (std::invalid_argument const& ) {
// no part of word is a float
continue;
}
}
return result;
}
With that, getFloats("7 plus 6") yields {7, 6} and getFloats("square root of 25") yields {25}.

Converting string to char and int data types

I've populated a string vector with numbers and characters (*,+,-,/). I want to assign each number and character to two new vectors, an int vector and a char vector. Is there a way to convert everything from string to the desired data type?
You can use string stream in the <sstream> header.
string myString = "123";
stringstream sStream( myString );
int convertedInt;
sStream >> convertedInt;
Include the <sstream> header and you can do something like this:
std::vector<std::string> stringVector = /* get data from somewhere */;
std::vector<int> intVector;
std::vector<char> charVector;
for (std::vector<std::string>::const_iterator it = stringVector.begin(); it != stringVector.end(); it++)
{
    if (it->length() == 0)
        continue; // ignore any empty strings
    int intValue;
    std::istringstream ss(*it);
    if (ss >> intValue) // try to parse string as integer
        intVector.push_back(intValue); // int parsed successfully
    else
        charVector.push_back((*it)[0]); // otherwise keep the first character
}
This assumes anything that cannot be parsed as an integer should be pushed into the char vector instead (so, 234, 100000 and -34 will be put into intVector, and /, + etc will be put into charVector). Only the first character of a non-integer value is pushed, so if you have *hello or *123, only * will be put into the charVector.
If you are using C++11, you can swap the std::vector<std::string>::const_iterator with auto to make it look a bit nicer.
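As a sketch of the C++11 form, here is the same classification written with a range-based for loop and auto. The function name classify and its signature are assumptions for illustration; the behaviour is meant to match the loop above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: sorts tokens into integers and single characters.
// Anything that does not start with a parsable integer contributes its
// first character to chars; empty tokens are ignored.
void classify(const std::vector<std::string>& tokens,
              std::vector<int>& ints, std::vector<char>& chars)
{
    for (const auto& token : tokens) {
        if (token.empty())
            continue;
        std::istringstream ss(token);
        int value;
        if (ss >> value)
            ints.push_back(value);      // parsed as an integer
        else
            chars.push_back(token[0]);  // keep only the first character
    }
}
```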

Parsing coordinates from file using c++

I have a question about parsing coordinates from a file into my C++ program.
The content of the file "file.txt" consists of one line: "1,2"
the 1 needs to be the X coordinate. The ',' is the delimiter. And the 2 is the Y coordinate.
The output of my program is: "1".
It looks like my program only puts the string in front of the delimiter in the vector and then thinks it's the end of the file.
How can I solve this problem?
You can find my code down here. Thanks in advance!
#include <string>
#include <vector>
#include <fstream>
#include <iostream>
#include <sstream>
char data[220];
void parseString(std::string string);
int main(int argc, char **argv) {
std::ifstream indata("file.txt");
std::vector <std::string> buffer(5);
int i = 0;
while(indata.good())
{
indata.getline(data, 220);
parseString(data);
++i;
}
return 0;
}
void parseString(std::string string){
std::string delimiter = ",";
size_t pos = 0;
std::string token;
std::vector<std::string> tempVector(2);
int i = 0;
while ((pos = string.find(delimiter)) != std::string::npos) {
token = string.substr(0, pos);
tempVector[i] = token;
string.erase(0, pos + delimiter.length());
}
for(std::string S : tempVector){
std::cout << S << std::endl;
}
}
Here is where the problem comes from:
while ((pos = string.find(delimiter)) != std::string::npos) {
token = string.substr(0, pos);
tempVector[i] = token;
string.erase(0, pos + delimiter.length());
}
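After the first pass through the while loop you erase the first part, i.e. "1,", which leaves only "2". The loop then stops because the remaining string contains no delimiter, so "2" is never stored. That's why you only get 1. Note also that i is never incremented, so with more than one delimiter every token would overwrite tempVector[0].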
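If you want to keep the find/substr approach, one possible fix is to store whatever remains after the loop and to append tokens instead of indexing a fixed-size vector. This is a sketch; the function name split is hypothetical and it assumes a non-empty delimiter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits text on every occurrence of delimiter (assumed to be non-empty).
// The piece after the last delimiter is kept as the final token.
std::vector<std::string> split(std::string text, const std::string& delimiter)
{
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while ((pos = text.find(delimiter)) != std::string::npos) {
        tokens.push_back(text.substr(0, pos));
        text.erase(0, pos + delimiter.length());
    }
    tokens.push_back(text); // the remainder, e.g. "2" for the input "1,2"
    return tokens;
}
```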
You can simply put string data into a std::istringstream, then you can parse data easily by using >>:
std::istringstream iss(data); // e.g. data = "1,2"
int first_int, second_int;
char delimiter;
iss >> first_int >> delimiter >> second_int;
// first_int = 1, delimiter = ',', second_int = 2
The root of the problem is that your requirements are underspecified. For example:
Can you assume that every coordinate is just from 0 to 9? Or are there coordinates with more digits?
Can there be negative coordinates? Should you be able to handle a minus character? Is a plus character allowed, i.e. something like "-1,+1"?
Where is whitespace allowed?
Do you have to handle errors such as when the file is empty or there is no ',' at all, or if there are multiple commas, or if one of the supposed numbers does not consist of digits?
Are you allowed to ignore everything after correct input, i.e. something like "1,2xxx"?
For the simplest of requirements imaginable here, you could just do:
if (data[1] == ',') {
int x = data[0] - '0';
int y = data[2] - '0';
}
But that's apparently not good enough. So you do have more complex requirements, and I think you should put more thought into them. Only then will you be able to produce a really correct program.
As a final word, mind that user input is always a very complex thing, and it's generally hard to think about and cover each and every corner case, but everyone likes programs which handle user input correctly and intuitively and report errors in the most precise way possible, don't we? :)
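To make the questions above concrete, here is a sketch that commits to one possible set of answers: multi-digit and signed values are allowed, whitespace is allowed around the numbers and the comma, exactly one comma is required, and anything after the second number is an error. The function name parseCoordinate is hypothetical, and these choices are assumptions rather than the asker's actual requirements.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: parses "x,y" under the assumptions listed above.
// On failure it returns false and leaves x and y untouched.
bool parseCoordinate(const std::string& text, int& x, int& y)
{
    std::istringstream iss(text);
    int px = 0, py = 0;
    char delimiter = '\0';
    if (!(iss >> px >> delimiter >> py) || delimiter != ',')
        return false;   // missing number or wrong separator
    iss >> std::ws;     // allow trailing whitespace
    if (!iss.eof())
        return false;   // trailing garbage such as "1,2xxx"
    x = px;
    y = py;
    return true;
}
```

Each of the requirement questions maps to one line here, so changing an answer (for example, tolerating trailing text) means changing or removing the corresponding check.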

Need a regular expression to extract only letters and whitespace from a string

I'm building a small utility method that parses a line (a string) and returns a vector of all the words. The istringstream code I have below works fine except for when there is punctuation so naturally my fix is to want to "sanitize" the line before I run it through the while loop.
I would appreciate some help in using the regex library in C++ for this. My initial solution was to use substr() and go to town, but that seems complicated, as I'd have to iterate over the line and test each character to see what it is before performing some operation.
vector<string> lineParser(Line * ln)
{
vector<string> result;
string word;
string line = ln->getLine();
istringstream iss(line);
while(iss)
{
iss >> word;
result.push_back(word);
}
return result;
}
You don't need regular expressions just to handle punctuation:
// Replace all punctuation with a space character (needs <algorithm> and <cctype>).
std::replace_if(line.begin(), line.end(),
    [](unsigned char c) { return std::ispunct(c) != 0; },
    ' '
);
Or if you want everything but letters and numbers turned into spaces:
std::replace_if(line.begin(), line.end(),
    [](unsigned char c) { return !std::isalnum(c); },
    ' '
);
While we are here:
Your while loop is broken and will push the last value into the vector twice.
It should be:
while(iss)
{
    iss >> word;
    if (iss) // The stream is only still good if the read of a word succeeded.
    {
        result.push_back(word); // Only push_back() if the read worked.
    }
}
Or the more common version:
while(iss >> word) // Loop is only entered if the read of the word worked.
{
result.push_back(word);
}
Or you can use the stl:
std::copy(std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>(),
std::back_inserter(result)
);
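Combining the two ideas above (blank out the punctuation, then let the stream split on whitespace), the whole utility could be sketched as follows. The name splitWords is hypothetical, and it takes a std::string rather than the question's Line* so that the example is self-contained.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Replaces every punctuation character with a space, then splits the
// line on whitespace.
std::vector<std::string> splitWords(std::string line)
{
    std::replace_if(line.begin(), line.end(),
                    [](unsigned char c) { return std::ispunct(c) != 0; },
                    ' ');
    std::istringstream iss(line);
    return std::vector<std::string>(std::istream_iterator<std::string>(iss),
                                    std::istream_iterator<std::string>());
}
```

Note that with this approach an apostrophe also becomes a space, so "don't" is split into "don" and "t".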
[^A-Za-z\s] should do what you need if you replace the matching characters with nothing. It removes all characters that are not letters or whitespace. Use [^A-Za-z0-9\s] if you want to keep numbers too.
You can use online tools like this one : http://gskinner.com/RegExr/ to test out your patterns (Replace tab). Indeed some modifications can be required based on the regex lib you are using.
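With the standard library's <regex> (C++11, default ECMAScript grammar), applying that pattern might look like the sketch below. The function name keepLettersAndSpaces is hypothetical.

```cpp
#include <regex>
#include <string>

// Removes every character that is neither an ASCII letter nor whitespace.
std::string keepLettersAndSpaces(const std::string& line)
{
    // The backslash is doubled because this is a C++ string literal.
    static const std::regex unwanted("[^A-Za-z\\s]");
    return std::regex_replace(line, unwanted, "");
}
```

Because the characters are removed rather than replaced by a space, words joined only by punctuation (such as "a,b") end up merged into one.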
I'm not positive, but I think this is what you're looking for:
#include<iostream>
#include<regex>
#include<string>
#include<vector>
int
main()
{
std::string line("some words: with some punctuation.");
std::regex words("[\\w]+");
std::sregex_token_iterator i(line.begin(), line.end(), words);
std::vector<std::string> list(i, std::sregex_token_iterator());
for (auto j = list.begin(), e = list.end(); j != e; ++j)
std::cout << *j << '\n';
}
some
words
with
some
punctuation
The simplest solution is probably to create a filtering
streambuf to convert all non alphanumeric characters to space,
then to read using std::copy:
class StripPunct : public std::streambuf
{
std::streambuf* mySource;
char myBuffer;
protected:
virtual int underflow()
{
int result = mySource->sbumpc();
if ( result != EOF ) {
if ( !::isalnum( result ) )
result = ' ';
myBuffer = result;
setg( &myBuffer, &myBuffer, &myBuffer + 1 );
}
return result;
}
public:
explicit StripPunct( std::streambuf* source )
: mySource( source )
{
}
};
std::vector<std::string>
LineParser( std::istream& source )
{
StripPunct sb( source.rdbuf() );
std::istream src( &sb );
return std::vector<std::string>(
(std::istream_iterator<std::string>( src )),
(std::istream_iterator<std::string>()) );
}