Tokenize a String in C++ - c++

I have a string currentLine="12 23 45"
I need to extract 12, 23, 45 from this string without using Boost libraries. Since i am using string, strtok fails for me. I have tried a number of things still no success.
Here is my last attempt
while(!inputFile.eof())
while(getline(inputFile,currentLine))
{
int countVar=0;
int inputArray[10];
char* tokStr;
tokStr=(char*)strtok(currentLine.c_str()," ");
while(tokstr!=NULL)
{
inputArray[countVar]=(int)tokstr;
countVar++;
tokstr=strtok(NULL," ");
}
}
}
the one without strtok
string currentLine;
while(!inputFile.eof())
while(getline(inputFile,currentLine))
{
cout<<atoi(currentLine.c_str())<<" "<<endl;
int b=0,c=0;
for(int i=1;i<currentLine.length();i++)
{
bool lockOpen=false;
if((currentLine[i]==' ') && (lockOpen==false))
{
b=i;
lockOpen=true;
continue;
}
if((currentLine[i]==' ') && (lockOpen==true))
{
c=i;
break;
}
}
cout<<b<<"b is"<<" "<<c;
}

Try this:
#include <sstream>
std::string str = "12 34 56";
int a,b,c;
std::istringstream stream(str);
stream >> a >> b >> c;
Read a lot about c++ streams here: http://www.cplusplus.com/reference/iostream/

std::istringstream istr(your_string);
std::vector<int> numbers;
int number;
while (istr >> number)
numbers.push_back(number);
Or, simpler (though not really shorter):
std::vector<int> numbers;
std::copy(
std::istream_iterator<int>(istr),
std::istream_iterator<int>(),
std::back_inserter(numbers));
(Requires the standard headers <sstream>, <algorithm> and <iterator>.)

You can also opt for Boost tokenizer ......
#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;
int main(int argc, char** argv)
{
string str= "India, gold was dear";
char_separator<char> sep(", ");
tokenizer< char_separator<char> > tokens(str, sep);
BOOST_FOREACH(string t, tokens)
{
cout << t << "." << endl;
}
}

stringstream and boost::tokenizer are two possibilities. Here is a more explicit solution using string::find and string::substr.
std::list<std::string>
tokenize(
std::string const& str,
char const token[])
{
std::list<std::string> results;
std::string::size_type j = 0;
while (j < str.length())
{
std::string::size_type k = str.find(token, j);
if (k == std::string::npos)
k = str.length();
results.push_back(str.substr(j, k-j));
j = k + 1;
}
return results;
}
Hope this helps. You can easily turn this into an algorithm that writes the tokens to arbitrary containers or takes a function handle that processes the tokens.

Related

How to split string read from text file into array using c++

I want to split the strings on each line of my text file into an array, similar to the split() function in python. my desired syntax is a loop that enters every split-string into the next index of an array,
so for example if my string:
"ab,cd,ef,gh,ij"
, every time I encounter a comma then I would:
datafile >> arr1[i]
and my array would end up:
arr1 = [ab,cd,ef,gh,ij]
a mock code without reading a text file is provided below
#include <iostream>
#include <fstream>
#include <stdio.h>
#include <string.h>
#include <string>
using namespace std;
int main(){
char str[] = "ab,cd,ef,gh,ij"; //" ex str in place of file contents/fstream sFile;"
const int NUM = 5;
string sArr[NUM];//empty array
char *token = strtok(str, ",");
for (int i=0; i < NUM; i++)
while((token!=NULL)){
("%s\n", token) >> sArr[i];
token = strtok(NULL, ",");
}
cout >> sArr;
return 0;
}
In C++ you can read a file line by line and directly get a std::string.
You will found below an example I made with a split() proposal as you requested, and a main() example of reading a file:
Example
data file:
ab,cd,ef,gh
ij,kl,mn
c++ code:
#include <fstream>
#include <iostream>
#include <vector>
std::vector<std::string> split(const std::string & s, char c);
int main()
{
std::string file_path("data.txt"); // I assumed you have that kind of file
std::ifstream in_s(file_path);
std::vector <std::vector<std::string>> content;
if(in_s)
{
std::string line;
std::vector <std::string> vec;
while(getline(in_s, line))
{
for(const std::string & str : split(line, ','))
vec.push_back(str);
content.push_back(vec);
vec.clear();
}
in_s.close();
}
else
std::cout << "Could not open: " + file_path << std::endl;
for(const std::vector<std::string> & str_vec : content)
{
for(unsigned int i = 0; i < str_vec.size(); ++i)
std::cout << str_vec[i] << ((i == str_vec.size()-1) ? ("") : (" : "));
std::cout << std::endl;
}
return 0;
}
std::vector<std::string> split(const std::string & s, char c)
{
std::vector<std::string> splitted;
std::string word;
for(char ch : s)
{
if((ch == c) && (!word.empty()))
{
splitted.push_back(word);
word.clear();
}
else
word += ch;
}
if(!word.empty())
splitted.push_back(word);
return splitted;
}
output:
ab : cd : ef : gh
ij : kl : mn
I hope it will help.
So, a few things to fix. Firstly, arrays and NUM are kind of limiting - you have to fix up NUM whenever you change the input string, so C++ provides std::vector which can resize itself to however many strings it finds. Secondly, you want to call strtok until it returns nullptr once, and you can do that with one loop. With both your for and NUM you call strtok too many times - even after it has returned nullptr. Next, to put the token into a std::string, you would assign using my_string = token; rather than ("%s\n", token) >> my_string - which is a broken mix of printf() formatting and C++ streaming notation. Lastly, to print the elements you've extracted, you can use another loop. All these changes are illustrated below.
char str[] = "ab,cd,ef,gh,ij";
std::vector<std::string> strings;
char* token = strtok(str, ",");
while ((token != nullptr))
{
strings.push_back(token);
token = strtok(NULL, ",");
}
for (const auto& s : strings)
cout >> s >> '\n';
Your code is overly complicated and wrong.
You probably want this:
#include <iostream>
#include <string>
#include <string.h>
using namespace std;
int main() {
char str[] = "ab,cd,ef,gh,ij"; //" ex str in place of file contents/fstream sFile;"
const int NUM = 5;
string sArr[NUM];//empty array
char *token = strtok(str, ",");
int max = 0;
while ((token != NULL)) {
sArr[max++] = token;
token = strtok(NULL, ",");
}
for (int i = 0; i < max; i++)
cout << sArr[i] << "\n";
return 0;
}
This code is still poor and no bound checking is done.
But anyway, you should rather do it the C++ way as suggested in the other answers.
Use boost::split
#include <boost/algorithm/string.hpp>
[...]
std::vector<std::string> strings;
std::string val("ab,cd,ef,gh,ij");
boost::split(strings, val, boost::is_any_of(","));
You could do something like this
std::string str = "ab,cd,ef,gh,ij";
std::vector<std::string> TokenList;
std::string::size_type lastPos = 0;
std::string::size_type pos = str.find_first_of(',', lastPos);
while(pos != std::string::npos)
{
std::string temp(str, lastPos, pos - lastPos);
TokenList.push_back(temp);
lastPos = pos + 1;
pos = str.find_first_of(',', lastPos);
}
if(lastPos != str.size())
{
std::string temp(str, lastPos, str.size());
TokenList.push_back(temp);
}
for(int i = 0; i < TokenList.size(); i++)
std::cout << TokenList.at(i) << std::endl;

istream_iterator deal with string with pipe [duplicate]

This question already has answers here:
How do I iterate over the words of a string?
(84 answers)
Closed 4 years ago.
If I have a std::string containing a comma-separated list of numbers, what's the simplest way to parse out the numbers and put them in an integer array?
I don't want to generalise this out into parsing anything else. Just a simple string of comma separated integer numbers such as "1,1,1,1,2,1,1,1,0".
Input one number at a time, and check whether the following character is ,. If so, discard it.
#include <vector>
#include <string>
#include <sstream>
#include <iostream>
int main()
{
std::string str = "1,2,3,4,5,6";
std::vector<int> vect;
std::stringstream ss(str);
for (int i; ss >> i;) {
vect.push_back(i);
if (ss.peek() == ',')
ss.ignore();
}
for (std::size_t i = 0; i < vect.size(); i++)
std::cout << vect[i] << std::endl;
}
Something less verbose, std and takes anything separated by a comma.
stringstream ss( "1,1,1,1, or something else ,1,1,1,0" );
vector<string> result;
while( ss.good() )
{
string substr;
getline( ss, substr, ',' );
result.push_back( substr );
}
Yet another, rather different, approach: use a special locale that treats commas as white space:
#include <locale>
#include <vector>
struct csv_reader: std::ctype<char> {
csv_reader(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
rc[','] = std::ctype_base::space;
rc['\n'] = std::ctype_base::space;
rc[' '] = std::ctype_base::space;
return &rc[0];
}
};
To use this, you imbue() a stream with a locale that includes this facet. Once you've done that, you can read numbers as if the commas weren't there at all. Just for example, we'll read comma-delimited numbers from input, and write then out one-per line on standard output:
#include <algorithm>
#include <iterator>
#include <iostream>
int main() {
std::cin.imbue(std::locale(std::locale(), new csv_reader()));
std::copy(std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
std::ostream_iterator<int>(std::cout, "\n"));
return 0;
}
The C++ String Toolkit Library (Strtk) has the following solution to your problem:
#include <string>
#include <deque>
#include <vector>
#include "strtk.hpp"
int main()
{
std::string int_string = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15";
std::vector<int> int_list;
strtk::parse(int_string,",",int_list);
std::string double_string = "123.456|789.012|345.678|901.234|567.890";
std::deque<double> double_list;
strtk::parse(double_string,"|",double_list);
return 0;
}
More examples can be found Here
Alternative solution using generic algorithms and Boost.Tokenizer:
struct ToInt
{
int operator()(string const &str) { return atoi(str.c_str()); }
};
string values = "1,2,3,4,5,9,8,7,6";
vector<int> ints;
tokenizer<> tok(values);
transform(tok.begin(), tok.end(), back_inserter(ints), ToInt());
Lots of pretty terrible answers here so I'll add mine (including test program):
#include <string>
#include <iostream>
#include <cstddef>
template<typename StringFunction>
void splitString(const std::string &str, char delimiter, StringFunction f) {
std::size_t from = 0;
for (std::size_t i = 0; i < str.size(); ++i) {
if (str[i] == delimiter) {
f(str, from, i);
from = i + 1;
}
}
if (from <= str.size())
f(str, from, str.size());
}
int main(int argc, char* argv[]) {
if (argc != 2)
return 1;
splitString(argv[1], ',', [](const std::string &s, std::size_t from, std::size_t to) {
std::cout << "`" << s.substr(from, to - from) << "`\n";
});
return 0;
}
Nice properties:
No dependencies (e.g. boost)
Not an insane one-liner
Easy to understand (I hope)
Handles spaces perfectly fine
Doesn't allocate splits if you don't want to, e.g. you can process them with a lambda as shown.
Doesn't add characters one at a time - should be fast.
If using C++17 you could change it to use a std::stringview and then it won't do any allocations and should be extremely fast.
Some design choices you may wish to change:
Empty entries are not ignored.
An empty string will call f() once.
Example inputs and outputs:
"" -> {""}
"," -> {"", ""}
"1," -> {"1", ""}
"1" -> {"1"}
" " -> {" "}
"1, 2," -> {"1", " 2", ""}
" ,, " -> {" ", "", " "}
You could also use the following function.
void tokenize(const string& str, vector<string>& tokens, const string& delimiters = ",")
{
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first non-delimiter.
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos) {
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters.
lastPos = str.find_first_not_of(delimiters, pos);
// Find next non-delimiter.
pos = str.find_first_of(delimiters, lastPos);
}
}
std::string input="1,1,1,1,2,1,1,1,0";
std::vector<long> output;
for(std::string::size_type p0=0,p1=input.find(',');
p1!=std::string::npos || p0!=std::string::npos;
(p0=(p1==std::string::npos)?p1:++p1),p1=input.find(',',p0) )
output.push_back( strtol(input.c_str()+p0,NULL,0) );
It would be a good idea to check for conversion errors in strtol(), of course. Maybe the code may benefit from some other error checks as well.
I'm surprised no one has proposed a solution using std::regex yet:
#include <string>
#include <algorithm>
#include <vector>
#include <regex>
void parse_csint( const std::string& str, std::vector<int>& result ) {
typedef std::regex_iterator<std::string::const_iterator> re_iterator;
typedef re_iterator::value_type re_iterated;
std::regex re("(\\d+)");
re_iterator rit( str.begin(), str.end(), re );
re_iterator rend;
std::transform( rit, rend, std::back_inserter(result),
[]( const re_iterated& it ){ return std::stoi(it[1]); } );
}
This function inserts all integers at the back of the input vector. You can tweak the regular expression to include negative integers, or floating point numbers, etc.
#include <sstream>
#include <vector>
const char *input = "1,1,1,1,2,1,1,1,0";
int main() {
std::stringstream ss(input);
std::vector<int> output;
int i;
while (ss >> i) {
output.push_back(i);
ss.ignore(1);
}
}
Bad input (for instance consecutive separators) will mess this up, but you did say simple.
string exp = "token1 token2 token3";
char delimiter = ' ';
vector<string> str;
string acc = "";
for(int i = 0; i < exp.size(); i++)
{
if(exp[i] == delimiter)
{
str.push_back(acc);
acc = "";
}
else
acc += exp[i];
}
bool GetList (const std::string& src, std::vector<int>& res)
{
using boost::lexical_cast;
using boost::bad_lexical_cast;
bool success = true;
typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
boost::char_separator<char> sepa(",");
tokenizer tokens(src, sepa);
for (tokenizer::iterator tok_iter = tokens.begin();
tok_iter != tokens.end(); ++tok_iter) {
try {
res.push_back(lexical_cast<int>(*tok_iter));
}
catch (bad_lexical_cast &) {
success = false;
}
}
return success;
}
I cannot yet comment (getting started on the site) but added a more generic version of Jerry Coffin's fantastic ctype's derived class to his post.
Thanks Jerry for the super idea.
(Because it must be peer-reviewed, adding it here too temporarily)
struct SeparatorReader: std::ctype<char>
{
template<typename T>
SeparatorReader(const T &seps): std::ctype<char>(get_table(seps), true) {}
template<typename T>
std::ctype_base::mask const *get_table(const T &seps) {
auto &&rc = new std::ctype_base::mask[std::ctype<char>::table_size]();
for(auto &&sep: seps)
rc[static_cast<unsigned char>(sep)] = std::ctype_base::space;
return &rc[0];
}
};
This is the simplest way, which I used a lot. It works for any one-character delimiter.
#include<bits/stdc++.h>
using namespace std;
int main() {
string str;
cin >> str;
int temp;
vector<int> result;
char ch;
stringstream ss(str);
do
{
ss>>temp;
result.push_back(temp);
}while(ss>>ch);
for(int i=0 ; i < result.size() ; i++)
cout<<result[i]<<endl;
return 0;
}
simple structure, easily adaptable, easy maintenance.
std::string stringIn = "my,csv,,is 10233478,separated,by commas";
std::vector<std::string> commaSeparated(1);
int commaCounter = 0;
for (int i=0; i<stringIn.size(); i++) {
if (stringIn[i] == ",") {
commaSeparated.push_back("");
commaCounter++;
} else {
commaSeparated.at(commaCounter) += stringIn[i];
}
}
in the end you will have a vector of strings with every element in the sentence separated by spaces. empty strings are saved as separate items.
Simple Copy/Paste function, based on the boost tokenizer.
void strToIntArray(std::string string, int* array, int array_len) {
boost::tokenizer<> tok(string);
int i = 0;
for(boost::tokenizer<>::iterator beg=tok.begin(); beg!=tok.end();++beg){
if(i < array_len)
array[i] = atoi(beg->c_str());
i++;
}
void ExplodeString( const std::string& string, const char separator, std::list<int>& result ) {
if( string.size() ) {
std::string::const_iterator last = string.begin();
for( std::string::const_iterator i=string.begin(); i!=string.end(); ++i ) {
if( *i == separator ) {
const std::string str(last,i);
int id = atoi(str.c_str());
result.push_back(id);
last = i;
++ last;
}
}
if( last != string.end() ) result.push_back( atoi(&*last) );
}
}
#include <sstream>
#include <vector>
#include <algorithm>
#include <iterator>
const char *input = ",,29870,1,abc,2,1,1,1,0";
int main()
{
std::stringstream ss(input);
std::vector<int> output;
int i;
while ( !ss.eof() )
{
int c = ss.peek() ;
if ( c < '0' || c > '9' )
{
ss.ignore(1);
continue;
}
if (ss >> i)
{
output.push_back(i);
}
}
std::copy(output.begin(), output.end(), std::ostream_iterator<int> (std::cout, " ") );
return 0;
}

Counting occurrences of word in vector of characters

I have written a program to store a text file in vector of characters .
#include<iostream>
#include<fstream>
#include <algorithm>
#include<vector>
using namespace std;
int main()
{
vector<char> vec;
ifstream file("text.txt");
if(!file.eof() && !file.fail())
{
file.seekg(0, std::ios_base::end);
std::streampos fileSize = file.tellg();
vec.resize(fileSize);
file.seekg(0, std::ios_base::beg);
file.read(&vec[0], fileSize);
}
int c = count(vec.begin(), vec.end(), 'U');
cout << c;
return 0;
}
I want to count occurrence of "USER" in the text file , but using count i can only count number of characters . How can i count number of occurrences of "USER" in the vector of character?
For example
text.txt
USERABRUSER#$$* 34 USER ABC RR IERUSER
Then the count of "USER" is 4. Words can only be in uppercase.
std::string has a find member function that will find an occurrence of one string inside another. You can use that to count occurrences something like this:
size_t count(std::string const &haystack, std::string const &needle) {
auto occurrences = 0;
auto len = needle.size();
auto pos = 0;
while (std::string::npos != (pos = haystack.find(needle, pos))) {
++occurrences;
pos += len;
}
return occurrences;
}
For example:
int main() {
std::string input{ "USERABRUSER#$$* 34 USER ABC RR IERUSER" };
std::cout << count(input, "USER");
}
...produces an output of 4.
This is how I would do it:
#include <fstream>
#include <sstream>
#include <iostream>
#include <unordered_map>
#include <string>
using namespace std;
int main() {
unordered_map<string, size_t> data;
string line;
ifstream file("text.txt");
while (getline(file, line)) {
istringstream is(line);
string word;
while (is >> word) {
++data[word];
}
}
cout << data["USER"] << endl;
return 0;
}
Let's try again. Once again, a vector isn't necessary. This is what I would consider to be the most C++ idiomatic way. It uses std::string's find() method to repeatedly find the substring in order until the end of the string is reached.
#include <fstream>
#include <iostream>
#include <string>
int main() {
// Read entire file into a single string.
std::ifstream file_stream("text.txt");
std::string file_contents(std::istreambuf_iterator<char>(file_stream),
std::istreambuf_iterator<char>());
unsigned count = 0;
std::string substr = "USER";
for (size_t i = file_contents.find(substr); i != std::string::npos;
i = str.find(substr, i + substr.length())) {
++count;
}
}

How to put an element of a string to a vector?

I want to put specific elements of a string to a vector<string>.
To give you a better explanation of what i intend to do:
string str;
vector<string> in;
cin >> str; // input: abc
for(int i = 0;i < str.length();i++) {
in.push_back(&str[i]);
}
now i want the first element of vector<string> in to be "a" (in[0] = "a"), the second to be b etc.. i want to use strings for this. Is it possible to do it because when i print the vector it gives me that first it has abc then bc and in the end only c?
std::string has a constructor that takes a integral count and a single char to instantiate a string with n elements of the same value. You can use this and the fact that std::strings is much like a container of char:
for (auto c : str)
in.emplace_back(1ul, c);
Alternatively, you can store single char in the vector instead of std::string:
std::vector<char> in(str.begin(), str.end());
Either use std::vector<char> or use substr.
Example 1:
#include <string>
#include <iostream>
#include <vector>
int main()
{
std::string str;
std::vector<char> in;
std::cin >> str; // input: abc
for (std::string::size_type i = 0; i < str.length(); i++) {
in.push_back(str[i]);
}
}
Or a simpler variant:
int main()
{
std::string str;
std::cin >> str; // input: abc
std::vector<char> in(str.begin(), str.end());
}
Example 2:
#include <string>
#include <iostream>
#include <vector>
int main()
{
std::string str;
std::vector<std::string> in;
std::cin >> str; // input: abc
for (std::string::size_type i = 0; i < str.length(); i++) {
in.push_back(str.substr(i, 1));
}
}

C++ split string by line

I need to split string by line.
I used to do in the following way:
int doSegment(char *sentence, int segNum)
{
assert(pSegmenter != NULL);
Logger &log = Logger::getLogger();
char delims[] = "\n";
char *line = NULL;
if (sentence != NULL)
{
line = strtok(sentence, delims);
while(line != NULL)
{
cout << line << endl;
line = strtok(NULL, delims);
}
}
else
{
log.error("....");
}
return 0;
}
I input "we are one.\nyes we are." and invoke the doSegment method. But when i debugging, i found the sentence parameter is "we are one.\\nyes we are", and the split failed. Can somebody tell me why this happened and what should i do. Is there anyway else i can use to split string in C++. thanks !
I'd like to use std::getline or std::string::find to go through the string.
below code demonstrates getline function
int doSegment(char *sentence)
{
std::stringstream ss(sentence);
std::string to;
if (sentence != NULL)
{
while(std::getline(ss,to,'\n')){
cout << to <<endl;
}
}
return 0;
}
You can call std::string::find in a loop and the use std::string::substr.
std::vector<std::string> split_string(const std::string& str,
const std::string& delimiter)
{
std::vector<std::string> strings;
std::string::size_type pos = 0;
std::string::size_type prev = 0;
while ((pos = str.find(delimiter, prev)) != std::string::npos)
{
strings.push_back(str.substr(prev, pos - prev));
prev = pos + delimiter.size();
}
// To get the last substring (or only, if delimiter is not found)
strings.push_back(str.substr(prev));
return strings;
}
See example here.
#include <sstream>
#include <string>
#include <vector>
std::vector<std::string> split_string_by_newline(const std::string& str)
{
auto result = std::vector<std::string>{};
auto ss = std::stringstream{str};
for (std::string line; std::getline(ss, line, '\n');)
result.push_back(line);
return result;
}
#include <iostream>
#include <string>
#include <regex>
#include <algorithm>
#include <iterator>
using namespace std;
vector<string> splitter(string in_pattern, string& content){
vector<string> split_content;
regex pattern(in_pattern);
copy( sregex_token_iterator(content.begin(), content.end(), pattern, -1),
sregex_token_iterator(),back_inserter(split_content));
return split_content;
}
int main()
{
string sentence = "This is the first line\n";
sentence += "This is the second line\n";
sentence += "This is the third line\n";
vector<string> lines = splitter(R"(\n)", sentence);
for (string line: lines){cout << line << endl;}
}
We have a string with multiple lines
we split those into an array (vector)
We print out those elements in a for loop
Using the library range-v3:
#include <range/v3/all.hpp>
#include <string>
#include <string_view>
#include <vector>
std::vector<std::string> split_string_by_newline(const std::string_view str) {
return str | ranges::views::split('\n')
| ranges::to<std::vector<std::string>>();
}
Using C++23 ranges:
#include <ranges>
#include <string>
#include <string_view>
#include <vector>
std::vector<std::string> split_string_by_newline(const std::string_view str) {
return str | std::ranges::views::split('\n')
| std::ranges::to<std::vector<std::string>>();
}
This fairly inefficient way just loops through the string until it encounters an \n newline escape character. It then creates a substring and adds it to a vector.
std::vector<std::string> Loader::StringToLines(std::string string)
{
std::vector<std::string> result;
std::string temp;
int markbegin = 0;
int markend = 0;
for (int i = 0; i < string.length(); ++i) {
if (string[i] == '\n') {
markend = i;
result.push_back(string.substr(markbegin, markend - markbegin));
markbegin = (i + 1);
}
}
return result;
}