This question already has answers here:
C's strtok() and read only string literals
(5 answers)
Closed 8 years ago.
I have some simple code where I am trying to go through a char* and split it into separate words. Here is the code I have.
#include <iostream>
#include <stdio.h>
int main ()
{
char * string1 = "- This is a test string";
char * character_pointer;
std::cout << "Splitting stringinto tokens:" << string1 << std::endl;
character_pointer = strtok (string1," ");
while (character_pointer != NULL)
{
printf ("%s\n", character_pointer);
character_pointer = strtok (NULL, " ");
}
return 0;
}
I am getting an error that will not allow me to do this.
So my question is, how do I go through and find each word in a char*. For my actual program I am working on, one of my libraries returns a paragraph of words as a const char* and I need to stem each word using a stemming algorithm (I know how to do this, I just do not know how to send each individual word to the stemmer). If someone could just solve how to get the example code to work, I will be able to figure it out. All of the examples online use a char[] for string1 instead of a char* and I cannot do that.
This is the simplest (codewise) way I know to split a string in c++:
std::string string1 = "- This is a test string";
std::string word;
std::istringstream iss(string1);
// by default this splits on any whitespace
while(iss >> word) {
std::cout << word << '\n';
}
or like this if you want to specify a delimiter.
while(std::getline(iss, word, ' ')) {
std::cout << word << '\n';
}
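Since the question says the text arrives as a const char*, here is a minimal sketch of the same stream-based split wrapped in a function. The name split_words is made up for illustration; the idea is only that the read-only buffer is copied into a std::string first, so nothing ever writes to the original.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: copies the read-only C string into a std::string
// and splits the copy on whitespace. The original buffer is never modified.
std::vector<std::string> split_words(const char* text)
{
    std::vector<std::string> words;
    if (text == nullptr)
        return words;

    std::string copy(text);
    std::istringstream iss(copy);
    std::string word;
    while (iss >> word)
        words.push_back(word);
    return words;
}
```

Each element of the returned vector can then be handed to the stemmer one at a time.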
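Here's a corrected version. A string literal is read-only, and strtok modifies the buffer it scans, so the text has to live in a writable char array. Try it out: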
#include <iostream>
#include <stdio.h>
#include <cstring>
int main ()
{
char string1[] = "- This is a test string";
char * character_pointer;
std::cout << "Splitting string into tokens: " << string1 << std::endl;
character_pointer = strtok (string1," ");
while (character_pointer != NULL)
{
printf ("%s\n", character_pointer);
character_pointer = strtok (NULL, " ");
}
return 0;
}
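If the text really does arrive as a const char* (as in the question), one option is to let strtok work on a private copy. This is only a sketch under that assumption; tokenize_copy is a hypothetical name, and the copy lives in a std::vector<char> so it is freed automatically.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a read-only C string with strtok by
// working on a private, writable copy of the text.
std::vector<std::string> tokenize_copy(const char* text, const char* delimiters)
{
    std::vector<std::string> tokens;
    if (text == nullptr || delimiters == nullptr)
        return tokens;

    // Copy the characters plus the terminating '\0' into a writable buffer.
    std::vector<char> buffer(text, text + std::strlen(text) + 1);

    for (char* token = std::strtok(buffer.data(), delimiters);
         token != nullptr;
         token = std::strtok(nullptr, delimiters))
    {
        tokens.push_back(token); // copies the token out of the buffer
    }
    return tokens;
}
```

The tokens are copied into std::string objects, so they stay valid after the temporary buffer is gone.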
There are different ways you could do this in C++.
If space is your delimiter, then you can get the tokens this way:
std::string text = "- This is a test string";
std::istringstream ss(text);
std::vector<std::string> tokens;
std::copy(std::istream_iterator<std::string>(ss),
std::istream_iterator<std::string>(),
std::back_inserter<std::vector<std::string>>(tokens));
You can also tokenize the string in C++ using regular expressions.
std::string text = "- This is a test string";
std::regex pattern("\\s+");
std::sregex_token_iterator it(std::begin(text), std::end(text), pattern, -1);
std::sregex_token_iterator end;
for(; it != end; ++it)
{
std::cout << it->str() << std::endl;
}
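One thing to watch with the regex approach: as far as I can tell, when the text starts with whitespace the iterator yields an empty first token. A small sketch that collects the tokens and skips empty ones (split_on_whitespace is a hypothetical name) could look like this:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits on runs of whitespace and drops empty tokens,
// which can appear when the text begins with a separator.
std::vector<std::string> split_on_whitespace(const std::string& text)
{
    static const std::regex pattern("\\s+");
    std::vector<std::string> tokens;

    // -1 selects the text between matches rather than the matches themselves.
    std::sregex_token_iterator it(text.begin(), text.end(), pattern, -1);
    std::sregex_token_iterator end;
    for (; it != end; ++it)
    {
        if (it->length() > 0)
            tokens.push_back(it->str());
    }
    return tokens;
}
```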
Forget about strtok. To get exactly what you seem to be
aiming for:
std::string const source = "- This is a test string";
std::vector<std::string> tokens;
std::string::const_iterator start = source.begin();
std::string::const_iterator end = source.end();
std::string::const_iterator next = std::find( start, end, ' ' );
while ( next != end ) {
tokens.push_back( std::string( start, next ) );
start = next + 1;
next = std::find( start, end, ' ' );
}
tokens.push_back( std::string( start, next ) );
Of course, this can be modified as much as you want: you can use
std::find_first_of if you want more than one separator, or
std::search if you want a multi-character separator, or even
std::find_if for an arbitrary test (with a lambda, if you have
C++11). And in most of the cases where you're parsing, you can
just pass around two iterators, rather than having to construct
a substring; you only need to construct a substring when you
want to save the extracted token somewhere.
Once you get used to using iterators and the standard
algorithms, you'll find it a lot more flexible than strtok,
and it doesn't have all of the drawbacks which the internal
state implies.
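To illustrate the std::find_first_of variation mentioned above, here is a rough sketch of the same loop with a set of single-character separators. The function name split_any is an assumption for the example. Like the loop above, it keeps empty tokens between adjacent separators.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: splits `source` at any character found in `separators`.
// Empty tokens (for example between two adjacent separators) are kept.
std::vector<std::string> split_any(const std::string& source,
                                   const std::string& separators)
{
    std::vector<std::string> tokens;
    std::string::const_iterator start = source.begin();
    std::string::const_iterator end = source.end();
    while (true)
    {
        std::string::const_iterator next =
            std::find_first_of(start, end, separators.begin(), separators.end());
        tokens.push_back(std::string(start, next));
        if (next == end)
            break;
        start = next + 1; // step past the separator
    }
    return tokens;
}
```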
Related
I'm working on a program that takes user input two times. The user will input a sentence, and then enter a key phrase to remove each found match throughout the sentence, and return what's left. The code I have below for the 'logic' is...
char delim[AFFIX];
strcpy(delim, userInput); //storing the phrase user wants to remove
char *token = strtok(sentence, delim);
while (token)
{
cout << token << endl;
token = strtok(NULL, delim);
}
The issue I'm having is that strtok removes every single instance of a character found. For example, if the user wishes to remove all instances of 'pre' then the word "preformed" would turn into "fo m d" instead of formed. Is there a way to restrict strtok from removing EVERY instance of a character found, and only remove the series of characters?
I understand that working with strings or vectors would make my life much easier, but I want to work with arrays of chars. I'm really sorry if this isn't clear enough, I'm very new to c++. Any advice on how to approach this problem would be greatly appreciated. Thank you for your time.
An easy, if somewhat dirty, solution would be to add a space before and after your search word, to ensure that it matches a full word and not part of a word.
Eg:
Input: "pre"
Check: " pre "
To ensure you also cut the first word you would need to check the first part of the string and add a space before it if the first word is also the inputted word.
Example:
#include <string>
#include <iostream>
void removeSubstrs(std::string& s, std::string& p) {
std::string::size_type n = p.length();
for (std::string::size_type i = s.find(p);
i != std::string::npos;
i = s.find(p))
s.erase(i, n);
}
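void check(std::string& s, const std::string& p) {
// Nothing to pad if the string is shorter than the search word.
if (s.length() < p.length()) return;
std::string::size_type slen = s.length();
std::string::size_type plen = p.length();
std::string firstpart = s.substr (0,plen);
std::string lastpart = s.substr (slen-plen);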
if (firstpart == p) {
s = " " + s;
}
if (lastpart == p) {
s += " ";
}
}
int main ()
{
std::string str = "pre test inppre test pre";
std::string search = "pre";
std::string pattern = " " + search + " ";
check(str, search);
removeSubstrs(str, pattern);
std::cout << str << std::endl;
return 0;
}
strtok isn't the right choice here. Think of strtok's second argument as a list of delimiters: each char in that list is a delimiter on its own, and strtok breaks the string into tokens wherever any of them occurs.
A better solution would be something along the lines of strstr and strcat.
Maybe something like this (using MSVC++):
void removeSubstr(char *string, const char *sub, int n) {
char *match;
int len = strlen(sub);
while ((match = strstr(string, sub))) {
*match = '\0';
strcat_s(string, n, match + len);
}
}
int main() {
char test[] = "abcxyz123xyz";
removeSubstr(test, "xyz", sizeof(test) / sizeof(char));
cout << test << endl;
}
strcat_s is a more secure version of strcat (MSVC++ likes it way more than strcat :p)
There are better ways of doing this, of course, but I'm restricting myself to char arrays here, as you requested.
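For anyone not on MSVC, a portable variant of the same idea is sketched below. It swaps the concatenation for std::memmove, because as far as I know strcat (and strcat_s) do not allow the source and destination to overlap, while memmove does. The name remove_all is made up for this example.

```cpp
#include <cstring>

// Hypothetical helper: removes every occurrence of `sub` from `str` in place.
void remove_all(char* str, const char* sub)
{
    const std::size_t len = std::strlen(sub);
    if (len == 0)
        return; // an empty pattern would match forever

    char* match;
    while ((match = std::strstr(str, sub)) != nullptr)
    {
        // Shift the tail (including the terminating '\0') over the match.
        // memmove is used because source and destination overlap.
        std::memmove(match, match + len, std::strlen(match + len) + 1);
    }
}
```

With this, "preformed" with the phrase "pre" becomes "formed", which is what the question asks for.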
Suppose i have a string as below
input = " \\PATH\MYFILES This is my sting "
output = MYFILES
Starting from the right-hand side, when the first '\' character is found, get the word that follows it (i.e. MYFILES) and erase the rest.
Below is the approach I tried, but it fails with a runtime error: the program aborts and dumps core.
Please suggest the cleanest and/or shortest way to get only a single word (i.e. MYFILES) from the above string.
I have been searching and trying for the last two days, but with no luck. Please help.
Note: The input string in the above example is not hardcoded in the real program. Its content changes dynamically, but the '\' character is always present.
std::regex const r{R"~(.*[^\\]\\([^\\])+).*)~"} ;
std::string s(R"(" //PATH//MYFILES This is my sting "));
std::smatch m;
int main()
{
if(std::regex_match(s,m,r))
{
std::cout<<m[1]<<endl;
}
}
}
To erase part of a string, you have to find where that part begins and ends. Finding something inside a std::string is easy because the class has six built-in search methods (std::string::find_first_of, std::string::find_last_of, etc.). Here is a small example of how your problem can be solved:
#include <iostream>
#include <string>
int main() {
std::string input { " \\PATH\\MYFILES This is my sting " };
auto pos = input.find_last_of('\\');
if(pos != std::string::npos) {
input.erase(0, pos + 1);
pos = input.find_first_of(' ');
if(pos != std::string::npos)
input.erase(pos);
}
std::cout << input << std::endl;
}
Note: watch out for escape sequences, a single backslash is written as "\\" inside a string literal.
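If you would rather not modify the input, the same two searches can be wrapped in a function that returns the word. This is just a sketch; the name word_after_last_backslash and the choice to return an empty string when no backslash is present are my assumptions.

```cpp
#include <string>

// Hypothetical helper: returns the word that follows the last backslash,
// or an empty string if the input contains no backslash.
std::string word_after_last_backslash(const std::string& input)
{
    const std::string::size_type slash = input.find_last_of('\\');
    if (slash == std::string::npos)
        return std::string();

    // Look for the first space after the backslash; npos means "none found".
    const std::string::size_type stop = input.find(' ', slash + 1);
    if (stop == std::string::npos)
        return input.substr(slash + 1);
    return input.substr(slash + 1, stop - slash - 1);
}
```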
I have a string str ( "1 + 2 = 3" ). I want to obtain the individual numbers of the string in their decimal values( not ASCII ). I have tried atoi and c_str(). But both them require the entire string to consist of only numbers. I am writing my code in C++.
Any help would be great.
My challenge is to evaluate a prefix expression. I am reading from a file where each line contains a prefix expression. My code snippet to tokenize and store the variables is shown below. Each line of the file contains numbers and operators (+, -, *) which are separated by spaces.
Ex - line = ( * + 2 3 4);
ifstream file;
string line;
file.open(argv[1]);
while(!file.eof())
{
getline(file,line);
if(line.length()==0)
continue;
else
{
vector<int> vec;
string delimiters = " ";
size_t current;
size_t next = -1;
do
{
current = next + 1;
next = line.find_first_of( delimiters, current );
if((line[next] <=57)&&(line[next] >=48))
vec.push_back(atoi((line.substr( current, next - current )).c_str()));
}while (next != string::npos);
cout << vec[0] << endl;
}
}
file.close();
In this case vec[0] prints 50 not 2.
You need to learn to delimit a string. Your delimiting characters would be mathematical operators (ie:
C: creating array of strings from delimited source string
http://www.gnu.org/software/libc/manual/html_node/Finding-Tokens-in-a-String.html
In the case of the second link, you would do something like:
const char delimiters[] = "+-=";
With this knowledge, you can create an array of strings, and call atoi() on each string to get the numeric equivalent. Then you can use the address (array index) of each delimiter to determine which operator is there.
For just things like addition and subtraction, this will be dead simple. If you want order of operations and multiplication, parentheses, etc, your process flow logic will be more complicated.
For a more in-depth example, please see this final link. A simple command-line calculator in C. That should make it crystal clear.
http://stevehanov.ca/blog/index.php?id=26
You will not fall into your if, since your next position will be at a delimiter.
string delimiters = " ";
...
next = line.find_first_of( delimiters, current );
if((line[next] <=57)&&(line[next] >=48))
...
Since your delimiters consist of " ", then line[next] will be a space character.
From the description of your problem, you are missing code that will save away your operators. There is no code to attempt to find the operators.
You don't have to assume ASCII when testing for a digit. You can use std::isdigit() from <cctype>, for example, or compare against '0' and '9'.
When you print your vector element, you may be accessing the vector out of bounds, because no item may ever have been inserted into it.
Don't use file.eof() to control a loop. That function is only useful after a read has failed.
There are a number of ways to get ints from a std::string, I'm choosing std::stoi() from the C++11 standard in this case.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>
typedef std::vector<int> ints;
bool is_known_operator(std::string const& token)
{
static char const* tokens[] = {"*", "/", "+", "-"};
return std::find(std::begin(tokens), std::end(tokens), token) != std::end(tokens);
}
ints tokenise(std::string const& line)
{
ints vec;
std::string token;
std::istringstream iss(line);
while (iss >> token)
{
if (is_known_operator(token))
{
std::cout << "Handle operator [" << token << "]" << std::endl;
}
else
{
try
{
auto number = std::stoi(token);
vec.push_back(number);
}
catch (const std::invalid_argument&)
{
std::cerr << "Unexpected item in the bagging area ["
<< token << "]" << std::endl;
}
}
}
return vec;
}
int main(int, const char *argv[])
{
std::ifstream file(argv[1]);
std::string line;
ints vec;
while (std::getline(file, line))
{
vec = tokenise(line);
}
std::cout << "The following " << vec.size() << " numbers were read:\n";
std::copy(vec.begin(), vec.end(), std::ostream_iterator<int>(std::cout, "\n"));
}
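The code above only collects the numbers. Since the stated goal is to evaluate a prefix expression, here is one possible sketch of the evaluation step, assuming integer operands and only the binary operators +, - and * from the question. The names evaluate_prefix and evaluate_line are hypothetical. It relies on the fact that in prefix notation an operator is always followed by its two operands, each of which may itself be a prefix expression.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: evaluates the next sub-expression read from `iss`.
int evaluate_prefix(std::istringstream& iss)
{
    std::string token;
    if (!(iss >> token))
        throw std::invalid_argument("expression ended unexpectedly");

    if (token == "+" || token == "-" || token == "*")
    {
        // An operator is followed by its two operands, which may
        // themselves be prefix expressions.
        int left = evaluate_prefix(iss);
        int right = evaluate_prefix(iss);
        if (token == "+") return left + right;
        if (token == "-") return left - right;
        return left * right;
    }
    return std::stoi(token); // throws if the token is not a number
}

// Evaluates one whitespace-separated prefix expression, e.g. "* + 2 3 4".
int evaluate_line(const std::string& line)
{
    std::istringstream iss(line);
    return evaluate_prefix(iss);
}
```

For the example line from the question, "* + 2 3 4" is read as (2 + 3) * 4.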
This question already has answers here:
Removing leading and trailing spaces from a string
(26 answers)
Closed 7 years ago.
Hi I was wondering what is the shortest way to make string str1 seem equal to string str2
str1 = "Front Space";
str2 = " Front Space";
/*Here is where I'm missing some code to make the strings equal*/
if (str1.compare(str2) == 0) { // They match
cout << "success!!!!" << endl; // This is the output I want
}
All I need is for str1 to equal str2.
How can I do it?
I have made multiple attempts but they all don't seem to work correctly. I think it's because of the number of characters in the string, ie: str1 has less characters than str2.
for (int i = 1; i <= str1.length() + 1; i++){
str1[str1.length() - i ] = str1[str1.length() - (i + 1)];
}
Any help appreciated
If you can use Boost, trim functions are available in boost/algorithm/string.hpp
str1 = "Front Space";
str2 = " Front Space";
boost::trim_left( str2 ); // removes leading whitespace
if( str1 == str2 ) {
// ...
}
Similarly, there's trim that removes both leading and trailing whitespace. And all these functions have *_copy counterparts that return a trimmed string instead of modifying the original.
If you cannot use Boost, it's not hard to create your own trim_left function.
#include <iostream>
#include <string>
#include <algorithm>
#include <cctype>
void trim_left( std::string& s )
{
auto it = s.begin(), ite = s.end();
while( ( it != ite ) && std::isspace( static_cast<unsigned char>( *it ) ) ) {
++it;
}
s.erase( s.begin(), it );
}
int main()
{
std::string s1( "Hello, World" ), s2( " \n\tHello, World" );
trim_left( s1 ); trim_left( s2 );
std::cout << s1 << std::endl;
std::cout << s2 << std::endl;
}
Output:
Hello, World
Hello, World
As others have said, you can use boost. If you don't want to use boost, or you can't (maybe because it's homework), it is easy to make an ltrim function.
string ltrim(string str)
{
string new_str;
size_t index = 0;
while (index < str.size())
{
if (isspace(str[index]))
index++;
else
break;
}
if (index < str.size())
new_str = str.substr(index);
return new_str;
}
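A shorter variant of the same idea lets std::string do the search with find_first_not_of. This is a sketch, with ltrim_copy as a made-up name and the usual set of C whitespace characters assumed:

```cpp
#include <string>

// Hypothetical helper: returns a copy of `str` without leading whitespace.
std::string ltrim_copy(const std::string& str)
{
    const std::string::size_type first = str.find_first_not_of(" \t\n\v\f\r");
    if (first == std::string::npos)
        return std::string(); // empty or whitespace-only input
    return str.substr(first);
}
```

With this, ltrim_copy(str2) == str1 holds for the two strings in the question.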
LLVM also has some trim member functions for their StringRef class. This works without modifying your string and without making copies, in case that's important to you.
llvm::StringRef ref1(str1), ref2(str2);
ref1 = ref1.ltrim(); // ltrim returns the trimmed view, it does not change ref1
ref2 = ref2.ltrim();
if (ref1 == ref2) {
// match
}
I have a string "stack+ovrflow*newyork;" and I have to split it into stack, overflow, newyork.
Any ideas?
First and foremost if available, I would always use boost::tokenizer for this kind of task (see and upvote the great answers below)
Without access to boost, you have a couple of options:
You can use C++ std::strings and parse them using a stringstream and getline (safest way)
std::string str = "stack+overflow*newyork;";
std::istringstream stream(str);
std::string tok1;
std::string tok2;
std::string tok3;
std::getline(stream, tok1, '+');
std::getline(stream, tok2, '*');
std::getline(stream, tok3, ';');
std::cout << tok1 << "," << tok2 << "," << tok3 << std::endl;
Or you can use one of the strtok family of functions (see Naveen's answer for the unicode agnostic version; see xtofls comments below for warnings about thread safety), if you are comfortable with char pointers
char str[30];
strncpy(str, "stack+overflow*newyork;", sizeof(str) - 1);
str[sizeof(str) - 1] = '\0';
// every character in the second argument acts as a delimiter;
// pass NULL on later calls to continue with the same string
const char* separator = "";
for (char* token = strtok(str, "+*;"); token != NULL; token = strtok(NULL, "+*;"))
{
    // output the tokens separated by commas
    printf("%s%s", separator, token);
    separator = ",";
}
printf("\n");
Boost tokenizer
Simple like this:
#include <boost/tokenizer.hpp>
#include <vector>
#include <string>
std::string stringToTokenize= "stack+ovrflow*newyork;";
boost::char_separator<char> sep("+*;");
boost::tokenizer< boost::char_separator<char> > tok(stringToTokenize, sep);
std::vector<std::string> vectorWithTokenizedStrings;
vectorWithTokenizedStrings.assign(tok.begin(), tok.end());
Now vectorWithTokenizedStrings has the tokens you are looking for. Notice the boost::char_separator variable. It holds the separators between the tokens.
See boost tokenizer here.
With Microsoft's <tchar.h>, you can use _tcstok to tokenize the string based on a delimiter.
This site has a string tokenising function that takes a string of characters to use as delimiters and returns a vector of strings.
Simple STL String Tokenizer Function
There is another way to split a string using C/C++:
First define a function to split a string:
// pointers to the substrings; assumes there are never more than 5 fields
char *fields[5];
// str: the string to split (modified in place)
// splitter: the separator string
// returns the number of fields found, or 0 on error
int split(char* str, const char *splitter)
{
if(NULL == str)
{
return 0;
}
int cnt;
fields[0] = str;
for(cnt = 1; cnt < 5 &&
(fields[cnt] = strstr(fields[cnt - 1], splitter)) != NULL; cnt++)
{
// terminate the previous field and step past the separator
*fields[cnt] = '\0';
fields[cnt] += strlen(splitter);
}
return cnt;
}
Then you can use this function to split a string as follows:
char str[] = "stack+ovrflow*newyork;"; // writable array: split() modifies it
split(str, "+");
printf("%s\n", fields[0]); //print "stack"
split(fields[1], "*");
printf("%s\n", fields[0]); //print "ovrflow"
split(fields[1], ";");
printf("%s\n", fields[0]); //print "newyork"
This approach avoids copying the substrings, but note that it modifies the input in place and stores its results in a global array.