Split a c++ string without boost and not on whitespace [duplicate] - c++

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Splitting a string in C++
I have a string:
14332x+32x=10
I'd like to split it so that it looks like:
[14332][+32][10]
So far, I've tried doing
char c;
std::stringstream ss(equation1);
while (ss >> c) {
std::cout << c << std::endl;
}
but after testing what that prints, I don't think it's possible to do from that info. I know that I need to split the string on x and =, but I'm not sure if that's possible, and if it is, how. I've googled it and didn't find anything that looked helpful, but I'm new to C++ and the answer might be right in front of me.
I'd like to not use boost. Any advice would be helpful!

Consider using a facet that classifies x and = as whitespace characters:
#include <locale>
#include <iostream>
#include <sstream>
#include <string>
struct punct_ctype : std::ctype<char> {
punct_ctype() : std::ctype<char>(get_table()) {}
static mask const* get_table()
{
static mask rc[table_size];
rc[' '] = std::ctype_base::space;
rc['\n'] = std::ctype_base::space;
rc['x'] = std::ctype_base::space;
rc['='] = std::ctype_base::space;
return &rc[0];
}
};
int main() {
std::string equation;
while(std::getline(std::cin, equation)) {
std::istringstream ss(equation);
ss.imbue(std::locale(ss.getloc(), new punct_ctype));
std::string term;
while(ss >> term) {
std::cout << "[" << term << "]";
}
std::cout << "\n";
}
}

The manual way would be to loop over each character in the string: if the character is not one you are splitting on, append it to the current token; if it is, store the current token (in a vector of strings if more than one split is expected) and start a new one.
Before C++20 (which added std::views::split), the standard library has no dedicated split function, but std::getline() has an overload that takes a delimiter character to split on. Note that it accepts only one delimiter and keeps spaces as part of the token.
std::getline() is very handy :)

You can use sscanf like this:
sscanf(s.c_str(), "%[^x]x%[^x]x=%s", a, b, c);
Here %[^x] means "any character except x". If you don't care about the symbols (e.g. +) but only about the numbers, you could do something like:
sscanf(s.c_str(), "%dx%dx=%d", &x, &y, &z);

If you don't mind using C++11, you could use something like this:
#include <string>
#include <vector>
#include <iostream>
#include <algorithm>
#include <functional>
#include <unordered_set>
typedef std::vector<std::string> strings;
typedef std::unordered_set<char> tokens;
struct tokenize
{
tokenize(strings& output,const tokens& t) :
v_(output),
t_(t)
{}
~tokenize()
{
if(!s.empty())
v_.push_back(s);
}
void operator()(const char &c)
{
if(t_.find(c)!=t_.end())
{
if(!s.empty())
v_.push_back(s);
s="";
}
else
{
s = s + c;
}
}
private:
std::string s;
strings& v_;
const tokens& t_;
};
void split(const std::string& input, strings& output, const tokens& t )
{
tokenize tokenizer(output,t);
for( auto i : input )
{
tokenizer(i);
}
}
int main()
{
strings tokenized;
tokens t;
t.insert('x');
t.insert('=');
std::string input = "14332x+32x=10";
split(input,tokenized,t);
for( auto i : tokenized )
{
std::cout<<"["<<i<<"]";
}
return 0;
}
Ideone link to the above code: http://ideone.com/17g75F

See this SO answer for a getline_until() function that provides a simple, basic tokenization capability that should let you do something like the following:
#include <string>
#include <sstream>
#include <iostream>
#include "getline_until.h"
int main()
{
std::string equation1("14332x+32x=10");
std::stringstream ss(equation1);
std::string token;
while (getline_until(ss, token, "x=")) {
if (!token.empty()) std::cout << "[" << token << "]";
}
std::cout << std::endl;
}
The getline_until() function lets you specify a list of delimiters similar to strtok() (though getline_until() will return empty tokens instead of skipping a run of delimiters like strtok()). Or you can provide a predicate that lets you use a function to decide when to delimit tokens.
One thing it won't let you do (again, similar to strtok() or the standard getline()) is split tokens based on context alone: there has to be a delimiter character that gets discarded. For example, with the following input:
42+24
getline_until() (like strtok() or getline()) cannot split the above into the tokens 42, +, and 24.

Related

Formatted input in C++ . like this ---> x : y : z

I want to take inputs in this format (x : y : z) in C++ language.
In the input section, the input format will be:
x : y : z
where x, y and z are three separate integer inputs.
You can simply read it from any stream like this:
#include <iostream>
int main() {
int x, y, z;
char colon;
if (std::cin >> x >> colon >> y >> colon >> z) {
std::cout << "\nYou entered:\t" << x << "\t" << y << "\t" << z;
}
else {
std::cerr << "\nError: Wrong input format\n";
}
return 0;
}
EDIT: Based on the comment by Alan Birtles, I will add input validation, although I could not find that requirement in the question.
And before people want to generalize the question, I will also answer that.
But first, you can also use:
if ((std::cin >> x >> colon) && (colon ==':') && (std::cin >> y >> colon) && (colon == ':') && (std::cin >> z)) {
I do not think that anybody cares about that, but just to be complete...
For the general case, you should rather use std::getline to read a complete line of input and then split it.
You never need boost for such a task.
See some common patterns for splitting a string:
Splitting a string into tokens is a very old task. There are many solutions available, all with different properties. Some are difficult to understand, some are hard to develop, and some are more complex, slower, faster or more flexible than others.
Alternatives:
1) Handcrafted: many variants, using pointers or iterators; maybe hard to develop and error-prone.
2) Using the old-style std::strtok function: maybe unsafe, and maybe should not be used any longer.
3) std::getline: the most used implementation, but actually a "misuse" and not so flexible.
4) Using a dedicated modern function, specifically developed for this purpose: most flexible and a good fit for the STL environment and algorithm landscape, but slower.
Please see 4 examples in one piece of code.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <regex>
#include <algorithm>
#include <iterator>
#include <cstring>
#include <forward_list>
#include <deque>
#include <vector>
using Container = std::vector<std::string>;
std::regex delimiter{ "," };
int main() {
// Some function to print the contents of an STL container
auto print = [](const auto& container) -> void { std::copy(container.begin(), container.end(),
std::ostream_iterator<std::decay_t<decltype(*container.begin())>>(std::cout, " ")); std::cout << '\n'; };
// Example 1: Handcrafted -------------------------------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Search for comma, then take the part and add to the result
for (size_t i{ 0U }, startpos{ 0U }; i <= stringToSplit.size(); ++i) {
// So, if there is a comma or the end of the string
if ((stringToSplit[i] == ',') || (i == (stringToSplit.size()))) {
// Copy substring
c.push_back(stringToSplit.substr(startpos, i - startpos));
startpos = i + 1;
}
}
print(c);
}
// Example 2: Using very old strtok function ----------------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Split string into parts in a simple for loop
#pragma warning(suppress : 4996)
for (char* token = std::strtok(const_cast<char*>(stringToSplit.data()), ","); token != nullptr; token = std::strtok(nullptr, ",")) {
c.push_back(token);
}
print(c);
}
// Example 3: Very often used std::getline with additional istringstream ------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Put string in an std::istringstream
std::istringstream iss{ stringToSplit };
// Extract string parts in simple for loop
for (std::string part{}; std::getline(iss, part, ','); c.push_back(part))
;
print(c);
}
// Example 4: Most flexible iterator solution ------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
//
// Everything done already with range constructor. No additional code needed.
//
print(c);
// Works also with other containers in the same way
std::forward_list<std::string> c2(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
print(c2);
// And works with algorithms
std::deque<std::string> c3{};
std::copy(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {}, std::back_inserter(c3));
print(c3);
}
return 0;
}
A possibility is boost::split(), which allows the specification of multiple delimiters and does not require prior knowledge of the size of the input:
#include <iostream>
#include <vector>
#include <string>
#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string/split.hpp>
int main()
{
std::vector<std::string> tokens;
std::string s("x:y:z");
boost::split(tokens, s, boost::is_any_of(":"));
// "x" == tokens[0]
// "y" == tokens[1]
// "z" == tokens[2]
return 0;
}

Split string by the first N occurrences of a delimiter using boost::split

I know that maybe this is a silly question, but I didn't find an answer yet. Let's suppose I have the following string:
std::string str("1,2,3,4,5,6,7,8,9,10");
I would like to split it using the comma as a delimiter, but only at the first N occurrences. For instance, for N = 2 I should obtain as a result:
{ "1", "2", "3,4,5,6,7,8,9,10" }.
Is it possible to do it with boost::split? I have always used it in the following form:
std::vector<std::string> values;
boost::split(values, str, boost::is_any_of(","), boost::token_compress_on);
which splits the string on every occurrence of the delimiter.
In case it is not possible, can someone suggest an efficient way to do it?
I would like to avoid dealing with regexes, i.e. I would like to write a function like:
std::vector<std::string> tokenize(const std::string& to_split, const std::string& delimiter, int occurrences)
Thanks in advance for the help.
You can use:
find_nth(str, ",", 1);
from <boost/algorithm/string/find.hpp> to find the 2nd (or nth; the index is zero-based) occurrence of the character of interest.
Then use the traditional substr() at the position of that match (find_nth returns an iterator range into str).
Then tokenize the first part of the string with boost::split.
And add the last part to the vector.
You can make use of stringstream and a modified locale:
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <locale>
struct csv_whitespace : std::ctype<char> {
static const mask* make_table() {
static std::vector<mask> v(classic_table(),
classic_table() + table_size);
v[','] |= space;
v[' '] &= ~space;
return &v[0];
}
csv_whitespace(std::size_t refs = 0)
: std::ctype<char>{ make_table(), false, refs } { }
};
int main() {
std::string str("1,2,3,4,5,6,7,8,9,10");
std::stringstream ss(str);
auto loc = ss.getloc();
ss.imbue(std::locale(loc, new csv_whitespace));
std::size_t N = 3;
while(N--) {
std::string a;
ss >> a;
std::cout << a << '\n';
}
ss.ignore();
ss.imbue(loc);
std::string rest;
ss >> rest;
std::cout << "Rest: " << rest << "\n";
}
The trick here is to treat commas as whitespace, which can be done in the ctype<> facet.

Using strtok() to parse text file

I've been trying to make a program that parses a text file and feeds 6 pieces of information into an array of objects. The problem for me is that I'm having issues figuring out how to process the text file. I was told that the first step I needed to do was to write some code that counted how many letters long each entry was. The txt file is in this format:
"thing1","thing2","thing3","thing4","thing5","thing6"
This is the current version of my code:
#include<iostream>
#include<string>
#include<fstream>
#include<cstring>
using namespace std;
int main()
{
ifstream myFile("Book List.txt");
while(myFile.good())
{
string line;
getline(myFile, line);
char *sArr = new char[line.length() + 1];
strcpy(sArr, line.c_str());
char *sPtr;
sPtr = strtok(sArr, " ");
while(sPtr != NULL)
{
cout << strlen(sPtr) << " ";
sPtr = strtok(NULL, " ");
}
cout << endl;
}
myFile.close();
return 0;
}
So there are two things making it hard for me right now.
1) How do I deal with the delimiters?
2) How do I deal with "skipping" the first quotation mark in each line?
Read into a std::string instead of a C-style string. This means that you can use the handy std::string methods.
The std::string::find() method should help you out with finding each thing that you want to parse.
http://www.cplusplus.com/reference/string/string/find/
You can use this to find all the commas, which will give you the starts of all the things.
Then you can use std::string::substr() to cut up the string into each piece.
http://www.cplusplus.com/reference/string/string/substr/
You can get rid of the quotation marks by passing substr() a start one past the opening quote and a length that stops before the closing quote, you can also use
If you have to use strtok, then this code snippet should give you enough to modify your program to parse your data:
#include <cstdio>
#include <cstring>
int main ()
{
char str[] ="\"thing1\",\"thing2\",\"thing3\",\"thing4\",\"thing5\"";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str,"\",");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, ",\"");
}
return 0;
}
If you do not have to use strtok then you should use std::string as others have advised. Using std::string and std::istringstream:
#include <string>
#include <sstream>
#include <vector>
#include <iostream>
int main ()
{
std::string str2( "\"thing1\",\"thing2\",\"thing3\",\"thing4\",\"thing5\"" ) ;
std::istringstream is(str2);
std::string part;
while (getline(is, part, ','))
std::cout << part.substr(1,part.length()-2) << std::endl;
return 0;
}
For starters, don't use strtok if you can avoid it (and you easily can here - and you can even avoid using the find series of functions as well).
If you want to read in the whole line and then parse it:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
// defines a new ctype that treats commas as whitespace
struct csv_reader : std::ctype<char>
{
csv_reader() : std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
rc['\n'] = std::ctype_base::space;
rc[','] = std::ctype_base::space;
return &rc[0];
}
};
int main()
{
std::ifstream fin("yourFile.txt");
std::string line;
std::vector<std::vector<std::string>> values;
while (std::getline(fin, line))
{
std::istringstream iss(line);
// the locale takes ownership of (and later deletes) the facet created with new
iss.imbue(std::locale(std::locale(), new csv_reader));
std::vector<std::string> vec;
std::copy(std::istream_iterator<std::string>(iss), std::istream_iterator<std::string>(), std::back_inserter(vec));
values.push_back(vec);
}
// values now contains a vector for each line that has the strings split by their commas
fin.close();
return 0;
}
That answers your first question. For your second, you can skip all the quotation marks by adding them to the rc table (also treating them as whitespace), or you can strip them out afterwards (either directly or by using std::for_each, since the lambda modifies each string in place and returns nothing):
std::for_each(vec.begin(), vec.end(), [](std::string& s)
{
std::string::iterator pend = std::remove_if(s.begin(), s.end(), [](char c)
{
return c == '"';
});
s.erase(pend, s.end());
});

Splitting a C++ std::string using tokens, e.g. ";" [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to split a string in C++?
Best way to split a string in C++? The string can be assumed to be composed of words separated by ;
According to our guidelines, C string functions are not allowed, and Boost is not allowed either, because open source is not allowed for security reasons.
What I have right now is:
string str("denmark;sweden;india;us");
The string above should be split and stored in a vector of strings. How can we achieve this?
Thanks for inputs.
I find std::getline() is often the simplest. The optional delimiter parameter means it's not just for reading "lines":
#include <sstream>
#include <iostream>
#include <vector>
using namespace std;
int main() {
vector<string> strings;
istringstream f("denmark;sweden;india;us");
string s;
while (getline(f, s, ';')) {
cout << s << endl;
strings.push_back(s);
}
}
You could use a string stream and read the elements into the vector.
Here are many different examples...
A copy of one of the examples:
std::vector<std::string> split(const std::string& s, char seperator)
{
std::vector<std::string> output;
std::string::size_type prev_pos = 0, pos = 0;
while((pos = s.find(seperator, pos)) != std::string::npos)
{
std::string substring( s.substr(prev_pos, pos-prev_pos) );
output.push_back(substring);
prev_pos = ++pos;
}
output.push_back(s.substr(prev_pos, pos-prev_pos)); // Last word
return output;
}
There are several libraries available solving this problem, but the simplest is probably to use Boost Tokenizer:
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
#include <boost/foreach.hpp>
int main()
{
typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
std::string str("denmark;sweden;india;us");
boost::char_separator<char> sep(";");
tokenizer tokens(str, sep);
BOOST_FOREACH(std::string const& token, tokens)
{
std::cout << "<" << token << "> " << "\n";
}
}

Reading integers from a text file with words

I'm trying to read just the integers from a text file structured like this....
ALS 46000
BZK 39850
CAR 38000
//....
using ifstream.
I've considered 2 options.
1) Regex using Boost
2) Creating a throwaway string ( i.e. I read in a word, don't do anything with it, then read in the score ). However, this is a last resort.
Are there any ways to express in C++ that I want the ifstream to only read in text that is an integer? I'm reluctant to use regular expressions if it turns out that there is a much simpler way to accomplish this.
Why make simple things complicated?
What's wrong with this:
ifstream ss("C:\\test.txt");
int score;
string name;
while( ss >> name >> score )
{
// do something with score
}
Edit:
It is in fact possible to work on the stream directly with Spirit, instead of line by line as I suggested previously, with the parser:
+(omit[+(alpha|blank)] >> int_)
and one line of code(except for variable definitions):
void extract_file()
{
std::ifstream f("E:/dd/dd.trunk/sandbox/text.txt");
boost::spirit::istream_iterator it_begin(f), it_end;
// extract all numbers into a vector
std::vector<int> vi;
parse(it_begin, it_end, +(omit[+(alpha|blank)] >> int_), vi);
// print them to verify
std::copy(vi.begin(), vi.end(),
std::ostream_iterator<int>(std::cout, ", " ));
}
You get all the numbers into a vector at once with one line; it couldn't be simpler.
If you do not mind using Boost.Spirit 2, the parser to get only the number from a line is
omit[+(alpha|blank)] >> int_
to extract everything is
+(alpha|blank) >> int_
See the whole program below (tested with VC10 Beta 2):
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <cstring>
#include <vector>
#include <fstream>
#include <algorithm>
#include <iterator>
using std::cout;
using namespace boost::spirit;
using namespace boost::spirit::qi;
void extract_everything(std::string& line)
{
std::string::iterator it_begin = line.begin();
std::string::iterator it_end = line.end();
std::string s;
int i;
parse(it_begin, it_end, +(alpha|blank)>>int_, s, i);
cout << "string " << s
<< "followed by nubmer " << i
<< std::endl;
}
void extract_number(std::string& line)
{
std::string::iterator it_begin = line.begin();
std::string::iterator it_end = line.end();
int i;
parse(it_begin, it_end, omit[+(alpha|blank)] >> int_, i);
cout << "number only: " << i << std::endl;
}
void extract_line()
{
std::ifstream f("E:/dd/dd.trunk/sandbox/text.txt");
std::string s;
int i;
// iterated file line by line
while(getline(f, s))
{
cout << "parsing " << s << " yields:\n";
extract_number(s); //
extract_everything(s);
}
}
void extract_file()
{
std::ifstream f("E:/dd/dd.trunk/sandbox/text.txt");
boost::spirit::istream_iterator it_begin(f), it_end;
// extract all numbers into a vector
std::vector<int> vi;
parse(it_begin, it_end, +(omit[+(alpha|blank)] >> int_), vi);
// print them to verify
std::copy(vi.begin(), vi.end(),
std::ostream_iterator<int>(std::cout, ", " ));
}
int main(int argc, char * argv[])
{
extract_line();
extract_file();
return 0;
}
outputs:
parsing ALS 46000 yields:
number only: 46000
string ALS followed by nubmer 46000
parsing BZK 39850 yields:
number only: 39850
string BZK followed by nubmer 39850
parsing CAR 38000 yields:
number only: 38000
string CAR followed by nubmer 38000
46000, 39850, 38000,
You can call ignore() to have it skip over a specified number of characters.
istr.ignore(4);
You can also tell it to stop at a delimiter. You would still need to know the maximum number of characters the leading string could be, but this would also work for shorter leading strings:
istr.ignore(10, ' ');
You could also write a loop that just reads characters until you see the first digit character:
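// std::isdigit is declared in <cctype>; istream has get(), not getchar()
char c;
while (istr.get(c) && !std::isdigit(static_cast<unsigned char>(c)))
{
// do nothing
}
if (istr && std::isdigit(static_cast<unsigned char>(c)))
istr.putback(c);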
here goes :P
private static void readFile(String fileName) {
try {
HashMap<String, Integer> map = new HashMap<String, Integer>();
File file = new File(fileName);
Scanner scanner = new Scanner(file).useDelimiter(";");
while (scanner.hasNext()) {
String token = scanner.next();
String[] split = token.split(":");
if (split.length == 2) {
Integer count = map.get(split[0]);
map.put(split[0], count == null ? 1 : count + 1);
System.out.println(split[0] + ":" + split[1]);
} else {
split = token.split("=");
if (split.length == 2) {
Integer count = map.get(split[0]);
map.put(split[0], count == null ? 1 : count + 1);
System.out.println(split[0] + ":" + split[1]);
}
}
}
scanner.close();
System.out.println("Counts:" + map);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
readFile("test.txt");
}
}
fscanf(file, "%*s %d", &num);
or %5d if the number has a fixed width of 5 (leading zeros are fine either way, since %d reads them as decimal).
Sometimes the fastest way to do things in C++ is to use C. :)
You can create a ctype facet that classifies letters as white space. Create a locale that uses this facet, then imbue the stream with that locale. Having that, you can extract numbers from the stream, but all letters will be treated as white space (i.e. when you extract numbers, the letters will be ignored just like a space or a tab would be):
Such a locale can look like this:
#include <iostream>
#include <locale>
#include <vector>
#include <algorithm>
#include <iterator>
struct digits_only: std::ctype<char>
{
digits_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
// mark all ten digits '0'..'9' as non-space
if (rc['0'] == std::ctype_base::space)
std::fill_n(&rc['0'], 10, std::ctype_base::mask());
return &rc[0];
}
};
Sample code to use it could look like this:
int main() {
std::cin.imbue(std::locale(std::locale(), new digits_only()));
std::copy(std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
std::ostream_iterator<int>(std::cout, "\n"));
}
Using your sample data, the output I get from this looks like this:
46000
39850
38000
Note that as it stands, I've written this to accept only digits. If (for example) you were reading floating point numbers, you'd also want to retain '.' (or the locale-specific equivalent) as the decimal point. One way to handle things is to start with a copy of the normal ctype table, and then just set the things you want to ignore as space.