How to split the strings in vc++? - c++

I have a string "stack+ovrflow*newyork;" and I have to split it into stack, ovrflow, newyork.
Any ideas?

First and foremost if available, I would always use boost::tokenizer for this kind of task (see and upvote the great answers below)
Without access to boost, you have a couple of options:
You can use C++ std::strings and parse them using a stringstream and getline (safest way)
std::string str = "stack+overflow*newyork;";
std::istringstream stream(str);
std::string tok1;
std::string tok2;
std::string tok3;
std::getline(stream, tok1, '+');
std::getline(stream, tok2, '*');
std::getline(stream, tok3, ';');
std::cout << tok1 << "," << tok2 << "," << tok3 << std::endl;
Or, if you are comfortable with char pointers, you can use one of the strtok family of functions (see Naveen's answer for the Unicode-agnostic version; see xtofl's comments below for warnings about thread safety):
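char str[30];
strncpy(str, "stack+overflow*newyork;", sizeof(str));
str[sizeof(str) - 1] = '\0';
// strtok overwrites each delimiter it finds with '\0' and returns the start of the token;
// pass NULL on later calls to continue scanning the same string
char* result1 = strtok(str, "+");
char* result2 = strtok(NULL, "*");
char* result3 = strtok(NULL, ";");
// output the tokens separated by commas
if (result1 != NULL && result2 != NULL && result3 != NULL)
{
printf("%s,%s,%s\n", result1, result2, result3);
}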
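When the delimiters are not known to appear in a fixed order, a loop over std::string::find_first_of avoids hard-coding one getline call per token. The following is a minimal sketch, not a definitive implementation: split_any is a hypothetical helper name, and it assumes that empty tokens should be dropped.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits text on any character found in delims.
// Empty tokens are skipped, so a trailing ';' produces nothing extra.
std::vector<std::string> split_any(const std::string& text, const std::string& delims) {
    std::vector<std::string> tokens;
    // Position of the first character that is not a delimiter.
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // End of the current token (npos if the token runs to the end).
        std::string::size_type end = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, end - start));
        // Skip the run of delimiters that follows the token.
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Calling split_any("stack+overflow*newyork;", "+*;") should give the three words in order, without modifying the input string.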

Boost tokenizer
It is as simple as this:
#include <boost/tokenizer.hpp>
#include <vector>
#include <string>
std::string stringToTokenize= "stack+ovrflow*newyork;";
boost::char_separator<char> sep("+*;");
boost::tokenizer< boost::char_separator<char> > tok(stringToTokenize, sep);
std::vector<std::string> vectorWithTokenizedStrings;
vectorWithTokenizedStrings.assign(tok.begin(), tok.end());
Now vectorWithTokenizedStrings has the tokens you are looking for. Notice the boost::char_separator variable. It holds the separators between the tokens.

See boost tokenizer here.

You can use _tcstok to tokenize the string based on a delimiter.

This site has a string tokenising function that takes a string of characters to use as delimiters and returns a vector of strings.
Simple STL String Tokenizer Function

There is another way to split a string using C/C++:
First define a function to split a string:
//pointers to the substrings; assumes there are at most 5 fields
char *fields[5];
//str: the string to split (modified in place, so it must be writable)
//splitter: the delimiter string
//returns the number of fields found, or 0 on error
//needs <string.h> for strstr and strlen
int split(char* str, const char *splitter)
{
if(NULL == str || NULL == splitter || '\0' == *splitter)
{
return 0;
}
int cnt;
fields[0] = str;
//test cnt < 5 first so that fields[5] is never written
for(cnt = 1; cnt < 5 &&
(fields[cnt] = strstr(fields[cnt - 1], splitter)) != NULL; cnt++)
{
*fields[cnt] = '\0';
fields[cnt] += strlen(splitter); //skip the whole splitter, not just one character
}
return cnt;
}
Then you can use this function to split the string as follows:
char str[] = "stack+ovrflow*newyork;"; //a writable array, because split() modifies the string
split(str, "+");
printf("%s\n", fields[0]); //print "stack"
split(fields[1], "*");
printf("%s\n", fields[0]); //print "ovrflow"
split(fields[1], ";");
printf("%s\n", fields[0]); //print "newyork"
This approach splits the string in place without copying it, and the function is reusable.

Related

Strtok and Char* [duplicate]

I have some simple code where I am trying to go through a char* and split it into separate words. Here is the code I have.
#include <iostream>
#include <stdio.h>
int main ()
{
char * string1 = "- This is a test string";
char * character_pointer;
std::cout << "Splitting string into tokens: " << string1 << std::endl;
character_pointer = strtok (string1," ");
while (character_pointer != NULL)
{
printf ("%s\n", character_pointer);
character_pointer = strtok (NULL, " ");
}
return 0;
}
I am getting an error that will not allow me to do this.
So my question is, how do I go through and find each word in a char*. For my actual program I am working on, one of my libraries returns a paragraph of words as a const char* and I need to stem each word using a stemming algorithm (I know how to do this, I just do not know how to send each individual word to the stemmer). If someone could just solve how to get the example code to work, I will be able to figure it out. All of the examples online use a char[] for string1 instead of a char* and I cannot do that.
This is the simplest (codewise) way I know to split a string in c++:
std::string string1 = "- This is a test string";
std::string word;
std::istringstream iss(string1);
// by default this splits on any whitespace
while(iss >> word) {
std::cout << word << '\n';
}
Or, if you want to specify a delimiter:
while(std::getline(iss, word, ' ')) {
std::cout << word << '\n';
}
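Here's a corrected version. strtok modifies the string it scans, and a string literal is read-only, so string1 must be a writable char array initialized from the literal (strtok is declared in <cstring>):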
#include <iostream>
#include <stdio.h>
#include <cstring>
int main ()
{
char string1[] = "- This is a test string";
char * character_pointer;
std::cout << "Splitting string into tokens: " << string1 << std::endl;
character_pointer = strtok (string1," ");
while (character_pointer != NULL)
{
printf ("%s\n", character_pointer);
character_pointer = strtok (NULL, " ");
}
return 0;
}
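If the text arrives as a const char* that must not be modified (as with the library call mentioned in the question), one option is to let strtok work on a private copy. This is a minimal sketch under that assumption; tokenize_copy is a hypothetical helper name, and the std::vector<char> buffer is just one way to obtain writable storage.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes read-only text by running strtok on a copy.
std::vector<std::string> tokenize_copy(const char* text, const char* delims) {
    std::vector<std::string> words;
    if (text == NULL || delims == NULL) {
        return words;
    }
    // Copy the characters plus the terminating '\0' into writable storage.
    std::vector<char> buffer(text, text + std::strlen(text) + 1);
    // strtok overwrites delimiters in buffer, never in the caller's text.
    for (char* tok = std::strtok(&buffer[0], delims); tok != NULL;
         tok = std::strtok(NULL, delims)) {
        words.push_back(tok);
    }
    return words;
}
```

Each returned std::string can then be passed on, for example to a stemmer, one word at a time. Note that strtok keeps internal state between calls, so this sketch is not safe to call from several threads at once.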
There are different ways you could do this in C++.
If whitespace is your delimiter, then you can get the tokens this way:
std::string text = "- This is a test string";
std::istringstream ss(text);
std::vector<std::string> tokens;
std::copy(std::istream_iterator<std::string>(ss),
std::istream_iterator<std::string>(),
std::back_inserter(tokens));
You can also tokenize the string in C++ using regular expressions.
std::string text = "- This is a test string";
std::regex pattern("\\s+");
std::sregex_token_iterator it(std::begin(text), std::end(text), pattern, -1);
std::sregex_token_iterator end;
for(; it != end; ++it)
{
std::cout << it->str() << std::endl;
}
Forget about strtok. To get exactly what you seem to be
aiming for:
std::string const source = "- This is a test string";
std::vector<std::string> tokens;
std::string::const_iterator start = source.begin();
std::string::const_iterator end = source.end();
std::string::const_iterator next = std::find( start, end, ' ' );
while ( next != end ) {
tokens.push_back( std::string( start, next ) );
start = next + 1;
next = std::find( start, end, ' ' );
}
tokens.push_back( std::string( start, next ) );
Of course, this can be modified as much as you want: you can use
std::find_first_of if you want more than one separator, or
std::search if you want a multi-character separator, or even
std::find_if for an arbitrary test (with a lambda, if you have
C++11). And in most of the cases where you're parsing, you can
just pass around two iterators, rather than having to construct
a substring; you only need to construct a substring when you
want to save the extracted token somewhere.
Once you get used to using iterators and the standard
algorithms, you'll find it a lot more flexible than strtok,
and it doesn't have all of the drawbacks which the internal
state implies.
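As an illustration of the std::find_if variant mentioned above, the same loop can take the separator test as a parameter. This is only a sketch: split_if and is_space are hypothetical names, and empty tokens between adjacent separators are kept, as in the std::find loop.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: splits source wherever is_separator returns true.
// Adjacent separators yield empty tokens, as in the std::find version.
template <typename Pred>
std::vector<std::string> split_if(const std::string& source, Pred is_separator) {
    std::vector<std::string> tokens;
    std::string::const_iterator start = source.begin();
    std::string::const_iterator end = source.end();
    std::string::const_iterator next = std::find_if(start, end, is_separator);
    while (next != end) {
        tokens.push_back(std::string(start, next));
        start = next + 1;  // step over the separator character
        next = std::find_if(start, end, is_separator);
    }
    tokens.push_back(std::string(start, next));  // the final token
    return tokens;
}

// Example predicate: any whitespace character counts as a separator.
inline bool is_space(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}
```

For example, split_if(source, is_space) splits on spaces, tabs and newlines alike. With C++11, a lambda can be passed in place of is_space.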

How to split character using c++ win32 Api?

I am using the C++ Win32 API.
I want to split a string using a delimiter.
The string looks like "CN=USERS,OU=Marketing,DC=RAM,DC=COM".
I want everything after the first comma (,), which means I need only
OU=Marketing,DC=RAM,DC=COM.
I already tried the strtok function, but it gives me only CN=USERS.
How can I achieve this?
Try the code below; you should be able to get each item (separated by ',') easily:
strtok version:
char domain[] = "CN=USERS,OU=Marketing,DC=RAM,DC=COM";
char *token = std::strtok(domain, ",");
while (token != NULL) {
std::cout << token << '\n';
token = std::strtok(NULL, ",");
}
std::stringstream version:
std::stringstream ss("CN=USERS,OU=Marketing,DC=RAM,DC=COM");
std::string item;
while(std::getline(ss, item, ','))
{
std::cout << item << std::endl;
}
Have a look at std::getline()
http://en.cppreference.com/w/cpp/string/basic_string/getline
Using strchr makes it quite easy:
char domain[] = "CN=USERS,OU=Marketing,DC=RAM,DC=COM";
char *p = strchr(domain, ',');
if (p == NULL)
{
// error, no comma in the string
}
else
{
++p; // point to the character after the comma
}
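Since the question asks only for the text after the first comma, the same idea can also be written with std::string instead of a raw pointer. A minimal sketch follows; after_first is a hypothetical helper name, and returning an empty string when the delimiter is missing is an assumption about the desired behaviour.

```cpp
#include <string>

// Hypothetical helper: returns everything after the first occurrence of
// delim, or an empty string if delim does not occur in text.
std::string after_first(const std::string& text, char delim) {
    std::string::size_type pos = text.find(delim);
    if (pos == std::string::npos) {
        return std::string();
    }
    return text.substr(pos + 1);  // pos + 1 is at most text.size(), which substr allows
}
```

With the string from the question, after_first("CN=USERS,OU=Marketing,DC=RAM,DC=COM", ',') yields "OU=Marketing,DC=RAM,DC=COM".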

C++, How to get multiple input divided by whitespace?

I have a program that needs to get multiple cstrings. I currently get one at a time and then ask if you want to input another word. I cannot find any simple way to get just one input with words divided by whitespace, e.g. "one two three", and save the input in an array of cstrings.
typedef char cstring[20]; cstring myWords[50];
At the moment I am trying to use getline and save the input to a cstring and then I am trying to use the string.h library to manipulate it. Is that the right approach? How else could this be done?
If you really have to use c-style strings, you could use istream::getline, strtok and strcpy functions:
typedef char cstring[20]; // are you sure that 20 chars will be enough?
cstring myWords[50];
char line[2048]; // what's the max length of line?
std::cin.getline(line, 2048);
int i = 0;
char* nextWord = strtok(line, " \t\r\n");
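while (nextWord != NULL && i < 50) // stop before overflowing myWords
{
strncpy(myWords[i], nextWord, sizeof(cstring) - 1); // truncate words longer than 19 chars
myWords[i][sizeof(cstring) - 1] = '\0';
++i;
nextWord = strtok(NULL, " \t\r\n");
}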
But much better would be to use std::string, std::getline, std::istringstream and >> operator instead:
using namespace std;
vector<string> myWords;
string line;
if (getline(cin, line))
{
istringstream is(line);
string word;
while (is >> word)
myWords.push_back(word);
}
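Another option is to read the words straight from std::cin:
std::vector<std::string> strings;
std::string str;
// stop after MAX_STRINGS words or when extraction fails (e.g. at end of input)
for (int i = 0; i < MAX_STRINGS && std::cin >> str; i++) {
strings.push_back(str);
}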
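If the result really has to end up in the fixed-size array of cstrings from the question, the two approaches can be combined: split with a stream, then copy with a bound. This is a sketch under assumptions: store_words, kMaxWords and kMaxLen are hypothetical names, with the limits taken from the typedefs in the question, and over-long words are truncated.

```cpp
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical limits, taken from the typedefs in the question.
const std::size_t kMaxWords = 50;
const std::size_t kMaxLen = 20;
typedef char cstring[kMaxLen];

// Splits line on whitespace and copies at most kMaxWords words into out,
// truncating each word to kMaxLen - 1 characters.
// Returns the number of words stored.
std::size_t store_words(const std::string& line, cstring out[]) {
    std::istringstream is(line);
    std::string word;
    std::size_t count = 0;
    while (count < kMaxWords && is >> word) {
        std::strncpy(out[count], word.c_str(), kMaxLen - 1);
        out[count][kMaxLen - 1] = '\0';  // strncpy does not terminate on truncation
        ++count;
    }
    return count;
}
```

A caller would read one line with std::getline(std::cin, line) and pass it in together with an array declared as cstring myWords[kMaxWords].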

CString Parsing Carriage Returns

Let's say I have a string that has multiple carriage returns in it, e.g.:
394968686
100630382
395950966
335666021
I'm still pretty amateur hour with C++. Would anyone be willing to show me how to go about parsing through each "line" in the string, so I can do something with it later (add the desired line to a list)? I'm guessing using Find("\n") in a loop?
Thanks guys.
while (!str.IsEmpty())
{
CString one_line = str.SpanExcluding(_T("\r\n"));
// do something with one_line
str = str.Right(str.GetLength() - one_line.GetLength()).TrimLeft(_T("\r\n"));
}
Blank lines will be eliminated with this code, but that's easily corrected if necessary.
You could try it using stringstream. Notice that getline takes an optional third argument, so you can use any delimiter you want.
string line;
stringstream ss;
ss << yourstring;
while ( getline(ss, line, '\n') )
{
cout << line << endl;
}
Alternatively you could use the boost library's tokenizer class.
You can use stringstream class in C++.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
int main()
{
// adjacent literals are concatenated; each "\n" keeps the lines separate
string str =
"394968686\n"
"100630382\n"
"395950966\n"
"335666021";
stringstream ss(str);
vector<string> v;
string token;
// operator>> extracts one whitespace-delimited token at a time
while (ss >> token)
{
// insert current token into a std::vector
v.push_back(token);
// print out current token
cout << token << endl;
}
}
Output of the program above:
394968686
100630382
395950966
335666021
Note that no whitespace will be included in the parsed token, with the use of operator>>. Please refer to comments below.
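If the lines may contain spaces, or the text uses Windows-style "\r\n" line endings, std::getline is closer to what is needed than operator>>. The following is a sketch rather than a definitive implementation; split_lines is a hypothetical helper name, and it assumes blank lines should be kept and a trailing '\r' stripped.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text into lines on '\n', keeping blank lines
// and removing a trailing '\r' so that "\r\n" endings are handled as well.
std::vector<std::string> split_lines(const std::string& text) {
    std::vector<std::string> lines;
    std::istringstream stream(text);
    std::string line;
    while (std::getline(stream, line)) {
        if (!line.empty() && line[line.size() - 1] == '\r') {
            line.erase(line.size() - 1);
        }
        lines.push_back(line);
    }
    return lines;
}
```

Unlike the operator>> loop, this keeps any spaces inside a line, so each element of the result is one whole line.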
If your string is stored in a c-style char* or std::string then you can simply search for \n.
std::string s;
size_t pos = s.find('\n');
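You can use string::substr() to get each substring and store it in a list:
std::string s = " .... ";
std::list<std::string> lines;
size_t begin = 0;
// search from begin each time; substr takes a start position and a length
for(size_t pos; std::string::npos != (pos = s.find('\n', begin)); begin = pos + 1)
{
lines.push_back(s.substr(begin, pos - begin));
}
lines.push_back(s.substr(begin)); // the text after the last '\n'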

C++ split string

I am trying to split a string using spaces as a delimiter. I would like to store each token in an array or vector.
I have tried.
string tempInput;
cin >> tempInput;
string input[5];
stringstream ss(tempInput); // Insert the string into a stream
int i=0;
while (ss >> tempInput){
input[i] = tempInput;
i++;
}
The problem is that if I input "this is a test", the array only seems to store input[0] = "this". It does not contain values for input[2] through input[4].
I have also tried using a vector but with the same result.
Go to the duplicate questions to learn how to split a string into words, but your method is actually correct. The actual problem lies in how you are reading the input before trying to split it:
string tempInput;
cin >> tempInput; // !!!
When you use the cin >> tempInput, you are only getting the first word from the input, not the whole text. There are two possible ways of working your way out of that, the simplest of which is forgetting about the stringstream and directly iterating on input:
std::string tempInput;
std::vector< std::string > tokens;
while ( std::cin >> tempInput ) {
tokens.push_back( tempInput );
}
// alternatively, including algorithm and iterator headers:
std::vector< std::string > tokens;
std::copy( std::istream_iterator<std::string>( std::cin ),
std::istream_iterator<std::string>(),
std::back_inserter(tokens) );
This approach will give you all the tokens in the input in a single vector. If you need to work with each line separately, then you should use getline from the <string> header instead of cin >> tempInput:
std::string tempInput;
while ( getline( std::cin, tempInput ) ) { // read line
// tokenize the line, possibly with your own code or
// any answer in the 'duplicate' question
}
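The placeholder comment in the loop above can be filled in with the stringstream loop from the question itself. A minimal sketch, assuming a hypothetical helper tokenize_lines that takes any std::istream (std::cin included) and returns one vector of tokens per input line:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads in line by line and splits each line on
// whitespace. The result holds one vector of tokens per input line.
std::vector<std::vector<std::string> > tokenize_lines(std::istream& in) {
    std::vector<std::vector<std::string> > result;
    std::string line;
    while (std::getline(in, line)) {      // read one whole line
        std::istringstream words(line);   // then split that line only
        std::vector<std::string> tokens;
        std::string word;
        while (words >> word) {
            tokens.push_back(word);
        }
        result.push_back(tokens);
    }
    return result;
}
```

Calling tokenize_lines(std::cin) processes everything up to end of input. Taking a std::istream& rather than using std::cin directly also lets the function be exercised with a std::istringstream.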
Notice that it’s much easier just to use copy:
vector<string> tokens;
copy(istream_iterator<string>(cin),
istream_iterator<string>(),
back_inserter(tokens));
As for why your code doesn’t work: you’re reusing tempInput. Don’t do that. Furthermore, you’re first reading a single word from cin, not the whole string. That’s why only a single word is put into the stringstream.
The easiest way: Boost.Tokenizer
std::vector<std::string> tokens;
std::string s = "This is, a test";
boost::tokenizer<> tok(s);
for(boost::tokenizer<>::iterator it=tok.begin(); it != tok.end(); ++it)
{
tokens.push_back(*it);
}
// tokens is ["This", "is", "a", "test"]
You can parameterize the delimiters and escape sequences to split only on spaces if you wish; by default it tokenizes on both spaces and punctuation.
Here is a little algorithm that splits the string into a list, just like Python does.
// needs <list> and <string>
std::list<std::string> split(const std::string& text, const std::string& split_word) {
std::list<std::string> list;
// an empty separator would match everywhere, so return the text unsplit
if (split_word.empty()) {
list.push_back(text);
return list;
}
std::string::size_type start = 0;
std::string::size_type pos;
// each match of split_word ends the current word
while ((pos = text.find(split_word, start)) != std::string::npos) {
list.push_back(text.substr(start, pos - start));
// jump over the whole separator
start = pos + split_word.length();
}
// whatever follows the last separator is the final word
list.push_back(text.substr(start));
return list;
}
There probably exists a more optimal way to write this.