C++ string parsing (python style) - c++

I love how in python I can do something like:
points = []
for line in open("data.txt"):
a,b,c = map(float, line.split(','))
points += [(a,b,c)]
Basically it's reading a list of lines where each one represents a point in 3D space, the point is represented as three numbers separated by commas
How can this be done in C++ without too much headache?
Performance is not very important, this parsing only happens one time, so simplicity is more important.
P.S. I know it sounds like a newbie question, but believe me I've written a lexer in D (pretty much like C++) which involves reading some text char by char and recognizing tokens,
it's just that, coming back to C++ after a long period of python, just makes me not wanna waste my time on such things.

I'd do something like this:
ifstream f("data.txt");
string str;
while (getline(f, str)) {
Point p;
// sscanf returns the number of fields converted; keep only complete points
if (sscanf(str.c_str(), "%f, %f, %f", &p.x, &p.y, &p.z) == 3)
points.push_back(p);
}
x,y,z must be floats.
And include:
#include <cstdio>
#include <fstream>
#include <string>

All these good examples aside, in C++ you would normally override the operator >> for your point type to achieve something like this:
point p;
while (file >> p)
points.push_back(p);
or even:
copy(
istream_iterator<point>(file),
istream_iterator<point>(),
back_inserter(points)
);
The relevant implementation of the operator could look very much like the code by j_random_hacker.
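For illustration, here is one way such an operator could be written. This is only a sketch: the `point` members, the one-point-per-line assumption, and the helper name `read_points` are made up for the example and are not taken from any of the answers.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical point type; the member names are assumptions for illustration.
struct point {
    double x, y, z;
};

// Reads one line of the form "a,b,c" into p.
// Sets failbit on the stream if the line is malformed.
std::istream& operator>>(std::istream& in, point& p) {
    std::string line;
    if (!std::getline(in, line))
        return in;  // end of input or read error: the stream is already failed

    std::istringstream fields(line);
    char comma1 = 0, comma2 = 0;
    point tmp;
    if (fields >> tmp.x >> comma1 >> tmp.y >> comma2 >> tmp.z
        && comma1 == ',' && comma2 == ',')
        p = tmp;
    else
        in.setstate(std::ios_base::failbit);
    return in;
}

// Collects every point from the stream, stopping at the first bad line.
std::vector<point> read_points(std::istream& in) {
    std::vector<point> points;
    point p;
    while (in >> p)
        points.push_back(p);
    return points;
}
```

Because the operator consumes a whole line per point, it also works with the istream_iterator form shown above.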

The C++ String Toolkit Library (StrTk) has the following solution to your problem:
#include <string>
#include <deque>
#include "strtk.hpp"
struct point { double x,y,z; };
int main()
{
std::deque<point> points;
point p;
strtk::for_each_line("data.txt",
[&points,&p](const std::string& str)
{
strtk::parse(str,",",p.x,p.y,p.z);
points.push_back(p);
});
return 0;
}
More examples can be found Here

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm> // For replace()
using namespace std;
struct Point {
double a, b, c;
};
int main(int argc, char **argv) {
vector<Point> points;
ifstream f("data.txt");
string str;
while (getline(f, str)) {
replace(str.begin(), str.end(), ',', ' ');
istringstream iss(str);
Point p;
iss >> p.a >> p.b >> p.c;
points.push_back(p);
}
// Do something with points...
return 0;
}

This answer is based on the previous answer by j_random_hacker and makes use of Boost Spirit.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <boost/spirit.hpp>
using namespace std;
using namespace boost;
using namespace boost::spirit;
struct Point {
double a, b, c;
};
int main(int argc, char **argv)
{
vector<Point> points;
ifstream f("data.txt");
string str;
Point p;
rule<> point_p =
double_p[assign_a(p.a)] >> ','
>> double_p[assign_a(p.b)] >> ','
>> double_p[assign_a(p.c)] ;
while (getline(f, str))
{
parse( str, point_p, space_p );
points.push_back(p);
}
// Do something with points...
return 0;
}

Fun with Boost.Tuples:
#include <boost/tuple/tuple_io.hpp>
#include <vector>
#include <fstream>
#include <iostream>
#include <algorithm>
#include <iterator>
int main() {
using namespace boost::tuples;
typedef boost::tuple<float,float,float> PointT;
std::ifstream f("input.txt");
f >> set_open(' ') >> set_close(' ') >> set_delimiter(',');
std::vector<PointT> v;
std::copy(std::istream_iterator<PointT>(f), std::istream_iterator<PointT>(),
std::back_inserter(v)
);
std::copy(v.begin(), v.end(),
std::ostream_iterator<PointT>(std::cout)
);
return 0;
}
Note that this is not strictly equivalent to the Python code in your question because the tuples don't have to be on separate lines. For example, this:
1,2,3 4,5,6
will give the same output as:
1,2,3
4,5,6
It's up to you to decide if that's a bug or a feature :)

You could read the file from a std::iostream line by line, put each line into a std::string and then use boost::tokenizer to split it. It won't be quite as elegant/short as the python one but a lot easier than reading things in a character at a time...
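The same line-by-line idea can be sketched with the standard library alone. Note the swap: this uses std::getline with a ',' delimiter in place of boost::tokenizer, and the function names `parse_line` and `parse_all` are hypothetical.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Splits one line on commas and converts each piece with strtod.
// Pieces that do not start with a number become 0.0 (strtod's behaviour).
std::vector<double> parse_line(const std::string& line) {
    std::vector<double> values;
    std::istringstream fields(line);
    std::string token;
    while (std::getline(fields, token, ','))
        values.push_back(std::strtod(token.c_str(), 0));
    return values;
}

// Reads the stream line by line, producing one vector of numbers per line.
std::vector<std::vector<double> > parse_all(std::istream& in) {
    std::vector<std::vector<double> > rows;
    std::string line;
    while (std::getline(in, line))
        rows.push_back(parse_line(line));
    return rows;
}
```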

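It's nowhere near as terse, and of course I didn't compile this.
#include <boost/algorithm/string.hpp>
#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>
using namespace std;
// Each token arrives as an iterator range into the line, not as a std::string.
float atof_s( const boost::iterator_range<string::iterator> & r ) { return atof( string( r.begin(), r.end() ).c_str() ); }
int main()
{
ifstream f("data.txt");
string str;
vector<vector<float> > data;
while( getline( f, str ) ) {
vector<float> v;
boost::algorithm::split_iterator<string::iterator> e;
std::transform(
boost::algorithm::make_split_iterator( str, boost::algorithm::token_finder( boost::algorithm::is_any_of( "," ) ) ),
e, back_inserter( v ), atof_s );
v.resize(3); // only grab the first 3
data.push_back(v);
}
}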

One of Sony Picture Imagework's open-source projects is Pystring, which should make for a mostly direct translation of the string-splitting parts:
Pystring is a collection of C++ functions which match the interface and behavior of python’s string class methods using std::string. Implemented in C++, it does not require or make use of a python interpreter. It provides convenience and familiarity for common string operations not included in the standard C++ library
There are a few examples, and some documentation

All of these are good examples, yet they don't handle the following:
a CSV file whose rows have different numbers of columns (some rows with more columns than others)
or values that contain white space or are empty (ya yb,x1 x2,,x2,)
So for those who are still looking, this class:
http://www.codeguru.com/cpp/tic/tic0226.shtml
is pretty cool, though some changes might be needed.
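As a rough sketch of what handling those two cases involves (this is not the linked class; `split_fields` is a hypothetical name, and quoted fields are deliberately not handled):

```cpp
#include <string>
#include <vector>

// Splits a line on commas, keeping empty fields and any spaces inside a field.
// Rows may have any number of columns. Quoting rules are not handled;
// that is an assumption of this sketch.
std::vector<std::string> split_fields(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start));  // last field, possibly empty
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}
```

A manual find loop is used here instead of std::getline with a delimiter because getline silently drops an empty field after a trailing comma.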

Related

separating 2 words from a string

I have done a lot of reading on this topic online and cannot figure out whether my code is working. I am working on my phone with the C4droid app, and the debugger is nearly useless as far as I can tell.
As the title says, I need to separate two words out of one input. Depending on what the first word is, the second may or may not be used. If I do not need the second word, everything is fine. If I need the second word and have it, it works, or seems to. But if I need a second word and only have the first, it compiles but crashes with an out-of-range exception.
ActionCommand is a vector of strings with 2 elements.
void splitstring(std::string original)
{
std::string x;
std::istringstream OrigStream(original);
OrigStream >> x;
ActionCommand.at(0) = x;
OrigStream >> x;
ActionCommand.at(1) = x;
return;
}
This code will separate the words, right?
Any help would be appreciated.
more of the code:
called from main-
void DoAction(Character & Player, room & RoomPlayerIn)
{
ParseAction(Player, GetAction(), RoomPlayerIn);
return;
}
std::string GetAction()
{
std::string action;
std::cout<< ">";
std::cin>>action;
action = Lowercase(action);
return action;
}
maybe Lowercase is the problem.
std::string Lowercase(std::string sourceString)
{
std::string destinationString;
destinationString.resize(sourceString.size());
std::transform(sourceString.begin(), sourceString.end(), destinationString.begin(), ::tolower);
return destinationString;
}
void ParseAction(Character & Player, std::string CommandIn, room & RoomPlayerIn)
{
std::vector<std::string> ActionCommand;
splitstring(CommandIn, ActionCommand);
std::string action = ActionCommand.at(0);
std::string action2; // declared here so it is still in scope below
if (ActionCommand.size() >1)
action2 = ActionCommand.at(1);
// skipping some ifs
if (action =="wield")
{
if(ActionCommand.size() >1)
DoWield(action2);
else std::cout<<"wield what??"<<std::endl;
return;
}
and splitstring now looks like this
void splitstring(std::string const &original, std::vector<std::string> &ActionCommand)
{
std::string x;
std::istringstream OrigStream(original);
if (OrigStream >>x)
ActionCommand.push_back(x);
else return;
if (OrigStream>>x)
ActionCommand.push_back(x);
return;
}
#include <sstream>
#include <vector>
#include <string>
std::vector<std::string> ActionCommand;
void splitstring(std::string const &original)
{
std::string x;
std::istringstream OrigStream{ original };
if(OrigStream >> x)
ActionCommand.push_back(x);
else return;
if(OrigStream >> x)
ActionCommand.push_back(x);
}
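One more thing worth checking in the question's GetAction: std::cin >> action stops at the first whitespace, so a two-word command never reaches splitstring in one piece. Below is a minimal sketch of reading the whole line first, assuming commands are entered one per line; `read_command` is a hypothetical name, not part of the question's code.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads one full line from `in` and returns at most its first two words.
// The result may have 0, 1 or 2 elements, so callers should check size()
// before indexing.
std::vector<std::string> read_command(std::istream& in) {
    std::vector<std::string> words;
    std::string line;
    if (!std::getline(in, line))
        return words;  // no input available

    std::istringstream stream(line);
    std::string word;
    while (words.size() < 2 && stream >> word)
        words.push_back(word);
    return words;
}
```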
Another idea would be to use the standard library. You can split a string into tokens (using spaces as dividers) with the following function:
#include <string>
#include <vector>
#include <sstream>
#include <iterator>
inline auto tokenize(const std::string &String)
{
auto Stream = std::stringstream(String);
return std::vector<std::string>{std::istream_iterator<std::string>{Stream}, std::istream_iterator<std::string>{}};
}
Here, the result is created in place by using an std::istream_iterator, which basically stands in for the >> operation in your example.
Warning:
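This code needs at least C++14 to compile, because of the deduced auto return type. With C++11, write the return type std::vector<std::string> out explicitly.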
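If only a C++11 compiler is available, the return type can be spelled out instead of deduced. A sketch, with `tokenize_cxx11` as a hypothetical name:

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Same whitespace splitting as tokenize, but with the return type written
// out, so no return type deduction is needed.
inline std::vector<std::string> tokenize_cxx11(const std::string& text) {
    std::istringstream stream(text);
    return std::vector<std::string>(std::istream_iterator<std::string>(stream),
                                    std::istream_iterator<std::string>());
}
```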

Elegant solution for string parsing

So I have a dozen strings which I download and need to parse; examples are below.
"Australija 036 AUD 1 4,713831 4,728015 4,742199"
"Vel. Britanija 826 GBP 1 10,300331 10,331325 10,362319"
So my first idea was to count manually where the number I need is (the second one, 4,728015 or 10,331325 in the examples above) and take a substring (52,8).
But then I realized that a few of the strings I'm parsing have a number greater than 9 in them, so I would need a substring of (51,9) for that case, so I can't do it this way.
The second idea was to save all the number-like chars in a vector, then get vector[4] and save it into a separate variable.
And the third one is to just loop over the string until I am positioned after the 5th group of spaces and then take the substring.
Just looking for some feedback on what would be "best".
The problem
is that we can have multiple words at the beginning of the string. I.e. the first element may contain spaces.
The solution
Start from the end of the string where we are stable.
Split the string up at the spaces. Start counting from the end, and pick the second-to-last element.
Solution 1: Boost string algorithms
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
using namespace std;
using namespace boost;
string extractstring(string & fullstring)
{
vector<string> vs;
split(vs, fullstring, is_any_of(" "), token_compress_on); // treat runs of spaces as one separator
return vs[vs.size() - 2];
}
Solution 2: QString (from Qt framework)
#include <QString>
#include <QStringList>
QString extractstring(QString & fullstring)
{
QStringList sl = fullstring.split(" ");
return sl[sl.size() - 2];
}
Solution 3: STL only
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <vector>
using namespace std;
string extractstring(string & fullstring)
{
istringstream iss(fullstring);
vector<string> elements;
copy(istream_iterator<string>(iss),
istream_iterator<string>(),
back_inserter(elements));
return elements[elements.size() - 2];
}
Other solutions: regex, C-pointer acrobatic.
Update:
I would not use sscanf based solutions because it may be difficult to identify multiple words at the beginning of the string.
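Once the field is extracted it still uses a decimal comma. The following sketch combines the count-from-the-end idea with a conversion to double; the function names are hypothetical, and it assumes the default "C" locale is in effect for strtod.

```cpp
#include <cstdlib>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Returns the second-to-last whitespace-separated field, or "" if the line
// has fewer than two fields.
std::string second_to_last_field(const std::string& line) {
    std::istringstream iss(line);
    std::vector<std::string> fields((std::istream_iterator<std::string>(iss)),
                                    std::istream_iterator<std::string>());
    return fields.size() < 2 ? std::string() : fields[fields.size() - 2];
}

// Converts text such as "4,728015" to a double by swapping the decimal comma
// for a point first. Assumes strtod is running under the "C" locale.
double decimal_comma_to_double(std::string text) {
    std::string::size_type comma = text.find(',');
    if (comma != std::string::npos)
        text[comma] = '.';
    return std::strtod(text.c_str(), 0);
}
```

Checking the field count before indexing avoids undefined behaviour on a short or empty line.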
I believe you can do it with a single line using sscanf?
http://www.cplusplus.com/reference/cstdio/sscanf/
For example (http://ideone.com/e2cCT9):
const char *str = "Australija 4,713831 4,728015 4,742199";
char tmp[255];
int a,b,c,d;
sscanf(str, "%*[^0-9] %d,%d %d,%d", &a, &b, &c, &d);
printf("Parsed values: %d %d %d %d\n",a,b,c,d);
The hurdle is that the first field is allowed to have spaces, but the remaining fields are separated by spaces.
This may not be elegant, but the concept should work:
std::string text_line;
getline(my_file, text_line);
std::string::size_type field_1_start;
const unsigned int text_length = text_line.length();
for (field_1_start = 0; field_1_start < text_length; ++field_1_start)
{
if (isdigit(static_cast<unsigned char>(text_line[field_1_start])))
{
break;
}
}
if (field_1_start < text_length)
{
std::string remaining_text = text_line.substr(field_1_start, text_length - field_1_start);
std::istringstream input_data(remaining_text);
int field_1;
std::string field_2;
input_data >> field_1;
input_data >> field_2;
//...
}

Process returned 255 <0xFF>, c++, program stopped working

My c++ code looks like:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
ifstream fin("input.txt");
ofstream fout("out.txt");
string line;
unsigned int number=0;
int counter=0;
while(fin>>line)
{
while(counter<=2)
{
if(line[number]=='/')
counter++;
number++;
}
for(int i=0;i<number;i++)
{
fout.put(line[i]);
}
fout.put('\n');
number=0;
counter=0;
cin.clear();
}
cout<<"DONE!";
}
When I try to run it, the program stops working. What might cause this problem? There is no infinite loop, because there are a lot of '/' symbols in input.txt. The program does produce the output file, but the file contains only part of the result, not all of it.
If any more information is needed to solve this problem, I will be happy to share it.
SAMPLE of input.txt :
http://www.ttsgs.com/page/51/
http://meshing.it/companies/61855-Granify
http://www.theglobeandmail.com/report-on-business/small.....
https://venngage.com/blog/index.php/page/5/
http://www.klasscapital.com/portfolio/granify
http://content.granify.com/why-ab-testing-is-not-enough
http://meetups.shopify.com/meetups/edmonton-shopify-meet-up
http://www.klasscapital.com/partners/jeff-lawrence
https://medium.com/startup-communities/81bb8f8ddfcb
http://freshit.net/blog/internet-marketing/chyortova-dy.....
http://www.higeek.cn/granify?????????.....
http://savepearlharbor.com/?paged=2557
http://www.sellerforum.de/small-talk-allgemeines-f1/irc.....
https://trango.co/preventing-abandoned-carts-using-ai/
http://www.imdevice.com/204602/
http://www.ifanr.com/news/page/17
http://www.webdesign-inspiration.com/web-designs/style/.....
http://worthyofnote.co.uk/tag/ecommerce/page/3/
http://www.siliconsolutions-inc.com/granify-raises-1-5-.....
http://crowdfundingnews.com/category/tech/page/425/
http://meetups.shopify.com/meetups/30
SAMPLE of out.txt:
http://www.ttsgs.com/
http://meshing.it/
http://www.theglobeandmail.com/
https://venngage.com/
http://www.klasscapital.com/
http://content.granify.com/
http://meetups.shopify.com/
http://www.klasscapital.com/
https://medium.com/
http://freshit.net/
http://www.higeek.cn/
http://savepearlharbor.com/
http://www.sellerforum.de/
https://trango.co/
http://www.imdevice.com/
http://www.ifanr.com/
http://www.webdesign-inspiration.com/
http://worthyofnote.co.uk/
http://www.siliconsolutions-inc.com/
http://crowdfundingnews.com/
http://meetups.shopify.com/
https://angel.co/
http://cdling.com/
http://www.sunwei.asia/
https://angel.co/
I'd re-structure the code a bit. First and foremost, I'd separate the code to read and write data from the code to trim the string where needed. Second, I'd use standard algorithms to handle most of the file I/O.
The code would look something like this:
#include <string>
#include <algorithm>
#include <vector>
#include <fstream>
#include <iterator>
struct trim {
std::string operator()(std::string const &input) {
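std::string::size_type pos = 0;
for (int i=0; i<3 && pos != std::string::npos; i++)
pos = input.find('/', pos+1);
// Fewer than three slashes: return the input unchanged.
return pos == std::string::npos ? input : std::string(input, 0, pos+1);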
}
};
int main() {
std::ifstream in("input.txt");
std::ofstream out("output.txt");
std::transform(std::istream_iterator<std::string>(in),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(out, "\n"),
trim());
}
Note that this depends on the fact that a URL isn't supposed to include any white space. If there's a chance that your input does contain whitespace other than the new-line characters separating the lines, then you'd also want to look at the answers to a previous question about how to iterate a line at a time. Although written specifically about std::cin, the same principles apply to essentially any input stream.
Anyway, for your sample input, this code produces the following output:
http://www.ttsgs.com/
http://meshing.it/
http://www.theglobeandmail.com/
https://venngage.com/
http://www.klasscapital.com/
http://content.granify.com/
http://meetups.shopify.com/
http://www.klasscapital.com/
https://medium.com/
http://freshit.net/
http://www.higeek.cn/
http://savepearlharbor.com/
http://www.sellerforum.de/
https://trango.co/
http://www.imdevice.com/
http://www.ifanr.com/
http://www.webdesign-inspiration.com/
http://worthyofnote.co.uk/
http://www.siliconsolutions-inc.com/
http://crowdfundingnews.com/
http://meetups.shopify.com/
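If the input might contain spaces inside a line, the same trimming can be applied one line at a time instead of one word at a time. A sketch under that assumption; `keep_host` and `trim_lines` are hypothetical names:

```cpp
#include <istream>
#include <ostream>
#include <sstream>  // only needed for the in-memory usage shown below
#include <string>

// Keeps everything up to and including the third '/', e.g. "http://host/".
// A line with fewer than three slashes is returned unchanged.
std::string keep_host(const std::string& url) {
    std::string::size_type pos = 0;
    for (int i = 0; i < 3; ++i) {
        pos = url.find('/', pos);
        if (pos == std::string::npos)
            return url;
        ++pos;  // continue the search just after this slash
    }
    return url.substr(0, pos);
}

// Processes the input one line at a time, so spaces inside a line
// do not split it.
void trim_lines(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line))
        out << keep_host(line) << '\n';
}
```

For a quick check without files, pass a std::istringstream and a std::ostringstream to trim_lines; with real files, pass the std::ifstream and std::ofstream directly.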
The expression:
fin>>line
within the while condition does not itself return a bool: operator>> returns a reference to the stream, and it is the stream that gets tested. That test yields false once failbit or badbit is set (for example when a read hits the end of the file), so the loop would only be infinite if neither bit were ever set, and I don't think that's the case here.

C++ Reading lines from txt, removing white spaces and saving to new file

Could I ask for advice? Could someone give an example of code that deletes the spaces from the lines of a first text file and saves the new text, without spaces, into a second file? I understand roughly how it should work, but I can't write it, because I am a beginner in programming. Thanks for any advice.
My code:
//read csv file
void readCSV(istream &input, vector< vector<string> > &output)
{
string csvLine;
while(getline(input, csvLine) )
{
istringstream csvStream(csvLine);
vector<string> csvColumn;
string csvElement;
while(getline(csvStream, csvElement) )
{
csvColumn.push_back(csvElement);
}
output.push_back(csvColumn);
}
}
//save all from csv to txt
void saveToTxt()
{
fstream file("file.csv", ios::in);
ofstream outfile;
outfile.open("file.txt");
typedef vector< vector<string> > csvVector;
csvVector csvData;
readCSV(file, csvData);
for(csvVector::iterator i = csvData.begin(); i != csvData.end(); ++i)
{
for(vector<string>::iterator j = i->begin(); j != i->end(); ++j)
{
outfile<<*j<<endl;
}
//code for deleting spaces that I found, but I can't fit it into the code above because my programming skills are limited
string s;
while (getline( cin, s ))
{
s.erase(
remove_if(
s.begin(),
s.end(),
ptr_fun <int, int> ( isspace )
),
s.end()
);
cout<<s<<endl;
I love solutions which won't qualify as a result for a homework assignment. Below is how I would write code for the specification, partly because I genuinely think that this is how it is to be done and partly to give others a bit of interesting reading. It contains all the necessary hints to create a teacher-friendly solution, too:
#include <algorithm>
#include <cctype>
#include <fstream>
#include <iterator>
int main() {
std::remove_copy_if(
std::istreambuf_iterator<char>(std::ifstream("in.txt").rdbuf()),
std::istreambuf_iterator<char>(),
std::ostreambuf_iterator<char>(std::ofstream("out.txt").rdbuf()),
[](unsigned char c){ return std::isspace(c) && c != '\n'; });
}
If you can't use a C++ 2011 compiler you'll need to replace the lambda function by an actual function with the same signature.
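A sketch of that replacement, written against generic streams so it can be tried without files. The names `is_space_but_not_newline` and `strip_spaces` are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>  // only needed for the in-memory usage shown below
#include <string>

// Named replacement for the lambda: true for whitespace other than '\n'.
// Taking unsigned char keeps std::isspace away from negative values.
bool is_space_but_not_newline(unsigned char c) {
    return std::isspace(c) && c != '\n';
}

// Copies `in` to `out` character by character, dropping matched characters.
void strip_spaces(std::istream& in, std::ostream& out) {
    std::remove_copy_if(std::istreambuf_iterator<char>(in),
                        std::istreambuf_iterator<char>(),
                        std::ostreambuf_iterator<char>(out),
                        is_space_but_not_newline);
}
```

With a std::istringstream holding "a b\tc\nd  e\n" as input and a std::ostringstream as output, the output stream ends up holding "abc\nde\n".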
You can make this significantly simpler by using the same idea you have for remove_if, but applying it directly to deciding whether to copy a character at all, something like the code below. Note: not tested, but I hope you get the idea.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <fstream>
#include <functional>
#include <cctype>
using namespace std;
int main(int argc, char *argv[])
{
ifstream is("file.csv", ios::in);
ofstream os("file.txt", ios::out|ios::trunc);
std::remove_copy_if(
istream_iterator<char>(is),
istream_iterator<char>(),
ostream_iterator<char>(os),
std::ptr_fun<int,int>(isspace));
os.close();
is.close();
return 0;
}
EDIT: I can't believe Dietmar and I had almost identical ideas.

c++ what is the fastest way of storing comma separated int in std::vector

I have comma-separated integers and I want to store them in a std::vector<int>. Currently I am doing it manually. Is there any built-in function that does this?
Edit:
I was in a hurry and forgot to give full details.
Actually I have a string (a Unicode string, to be exact) containing comma-separated values, e.g. "1,2,3,4,5".
Now I want to store them in a std::vector<int>, so in the above case my vector would have five elements pushed into it. Currently I am doing this manually, but it is slow and the code is messy.
It's probably not the most efficient way, but here's a way to do it using the TR1 regex functionality (I also use C++0x lambda syntax in this sample, but obviously it could also be done without that):
#include <iostream>
#include <algorithm>
#include <vector>
#include <regex>
#include <iterator>
#include <cwchar> // std::wcstol
#include <string>
std::vector<int> GetList(const std::wstring &input)
{
std::vector<int> result;
std::wsregex_iterator::regex_type rex(L"(\\d+)(,|$)");
std::wsregex_iterator it(input.begin(), input.end(), rex);
std::transform(it, std::wsregex_iterator(), std::back_inserter(result),
[] (const std::wsregex_iterator::value_type &m)
{ return std::wcstol(m[1].str().c_str(), nullptr, 10); });
return result;
}
You can do this purely with the STL for simplicity (easy to read, no complex libraries needed). It is quick to code but not the fastest in terms of execution speed, though you can probably tweak it a little, for example by pre-reserving space in the vector:
std::vector<int> GetValues(std::wstring s, wchar_t delim)
{
std::vector<int> v;
std::wstring i;
std::wstringstream ss(s);
while(std::getline(ss,i,delim))
{
std::wstringstream c(i);
int x;
c >> x;
v.push_back(x);
}
return v;
}
(no forwarding(&&) or atoi to keep the code portable).
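The pre-reserving tweak could look roughly like this. It is a sketch of the same loop as above with one extra pass to count delimiters; `get_values_reserved` is a hypothetical name, and whether the extra pass pays off depends on the input size.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Variant of GetValues that counts the delimiters first and reserves the
// vector's capacity once, so push_back never has to reallocate.
std::vector<int> get_values_reserved(const std::wstring& s, wchar_t delim) {
    std::vector<int> v;
    v.reserve(std::count(s.begin(), s.end(), delim) + 1);

    std::wstringstream ss(s);
    std::wstring item;
    while (std::getline(ss, item, delim)) {
        std::wstringstream conv(item);
        int x;
        if (conv >> x)  // skip pieces that are not integers
            v.push_back(x);
    }
    return v;
}
```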
Sadly, the STL has no dedicated function for splitting a string on a separator. You can use Boost to do it, though (the lambda below requires a recent C++ compiler such as MSVC 2010 or GCC 4.5):
#include <vector>
#include <string>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <boost/algorithm/string.hpp>
#include <boost/lexical_cast.hpp>
using namespace std;
int main(int argc, char** argv)
{
string input = "1,2,3,4";
vector<string> strs;
boost::split(strs, input, boost::is_any_of(","));
vector<int> result;
transform(
strs.begin(), strs.end(), back_inserter(result),
[](const string& s) -> int { return boost::lexical_cast<int>(s); }
);
for (auto i = result.begin(); i != result.end(); ++i)
cout << *i << endl;
}
The quick and dirty option is to use the C string library strtok() function, and atoi():
void Split(char * string, std::vector<int>& intVec)
{
char * pNext = strtok(string, ",");
while (pNext != NULL)
{
intVec.push_back(atoi(pNext));
pNext = strtok(NULL, ",");
}
}
Insert your own input data validation as required.
See:
http://www.cplusplus.com/reference/clibrary/cstring/strtok/
http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/
As well as the wide string versions:
http://msdn.microsoft.com/en-us/library/2c8d19sb%28v=vs.71%29.aspx
http://msdn.microsoft.com/en-us/library/aa273408%28v=vs.60%29.aspx
EDIT:
Note that strtok() will modify your original string, so pass a copy if need be.
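One way to pass a copy is to let the function make it. A minimal sketch, with `split_copy` as a hypothetical name and no input validation:

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Splits a comma-separated list without touching the caller's string:
// strtok writes '\0' into its input, so it is pointed at a private copy.
std::vector<int> split_copy(const std::string& text) {
    std::vector<int> values;
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');  // strtok needs a terminated C string

    for (char* tok = std::strtok(&buffer[0], ","); tok != NULL;
         tok = std::strtok(NULL, ","))
        values.push_back(std::atoi(tok));
    return values;
}
```

Keep in mind that strtok holds hidden static state, so this is not safe to call from several threads at once.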
Try this:
It will read any type (that can be read with >>) separated by any char (that you choose).
Note: after the object is read, only whitespace may appear between the object and the separator. Thus, for things like ObjectSepReader<std::string, ','>, it will read a word list separated by ','.
This makes it simple to use our standard algorithms:
#include <vector>
#include <sstream>
#include <iostream>
#include <iterator>
#include <algorithm>
int main()
{
std::stringstream data("1,2,3,4,5,6,7,8,9");
std::vector<int> vdata;
// Read the data from a stream
std::copy(std::istream_iterator<ObjectSepReader<int, ','> >(data),
std::istream_iterator<ObjectSepReader<int, ','> >(),
std::back_inserter(vdata)
);
// Copy data to output for testing
std::copy(vdata.begin(), vdata.end(), std::ostream_iterator<int>(std::cout," "));
}
The secret class to make it work.
template<typename T,char S>
struct ObjectSepReader
{
T value;
operator T const&() const {return value;}
};
template<typename T,char S>
std::istream& operator>>(std::istream& stream, ObjectSepReader<T,S>& data)
{
char terminator;
std::string line;
std::getline(stream, line, S);
std::stringstream linestream(line + ':');
if (!(linestream >> data.value >> terminator) || (linestream.tellg() != line.size()+1) || (terminator != ':'))
{ stream.setstate(std::ios::badbit);
}
return stream;
}
Personally I'd make a structure and have the vector contain instances of the struct.
Like so:
struct ExampleStruct
{
int a;
int b;
int c;
};
vector<ExampleStruct> structVec;
How about this?
#include <string>
#include <vector>
#include <functional>
#include <algorithm>
#include <iostream>
#include <cstdlib> // atoi
struct PickIntFunc
{
PickIntFunc(std::vector<int>& vecInt): _vecInt(vecInt),_pBegin(0){}
char operator () (const char& aChar)
{
if(aChar == ',' || aChar == 0)
{
_vecInt.push_back(atoi(std::string(_pBegin,&aChar).c_str()));
_pBegin = 0;
}
else
{
if(_pBegin == 0)
{
_pBegin = &aChar;
}
}
return aChar;
}
const char* _pBegin;
std::vector<int>& _vecInt;
};
int main()
{
std::vector<int> vecInt;
char intStr[] = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20";
std::for_each(intStr,intStr+sizeof(intStr),PickIntFunc(vecInt));
// Now test it
std::for_each(vecInt.begin(),vecInt.end(), [] (int i) { std::cout << i << std::endl;});
return 0;
}