I am reading a piece of C++ code in which ids, curi and len are integers and content is a string. I don't understand the match_word(...) part. Is it a function or a variable? I can't find its definition in any of the header files.
if(-1!=ids)
{
len = ids - curi;
string match_word(content, curi, len);
bool rejudge = false;
...
}
From the std::string documentation:
string match_word(content, curi, len);
This uses the substring constructor, which
Copies the portion of content that begins at the character position curi and spans len characters (or until the end of content, if either content is too short or if len is string::npos).
So, for example:
std::string s = "Hello World";
std::string match_word(s, 2, 7);
std::cout<<match_word<<std::endl; //prints llo Wor
The above will print llo Wor
Now coming to your question:
Is this a function or variable definition in C++ code?
Basically this is a variable definition using the substring constructor. So in your case match_word is a variable of type std::string.
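To see the substring constructor in isolation, here is a small sketch with hypothetical values standing in for content, curi and len; the helper names extract_word and start_past_end_throws are made up for illustration. It assumes curi and len are non-negative, since the constructor takes unsigned std::size_t arguments. It shows that the constructor gives the same result as content.substr(curi, len), that a len running past the end is clamped, and that a start index past the end throws std::out_of_range.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper mirroring the line in question:
//     string match_word(content, curi, len);
// Copies len characters of content starting at index curi.
std::string extract_word(const std::string& content, std::size_t curi, std::size_t len)
{
    std::string match_word(content, curi, len);  // substring constructor
    return match_word;
}

// Returns true if the substring constructor rejects the start index curi.
bool start_past_end_throws(const std::string& content, std::size_t curi)
{
    try {
        std::string ignored(content, curi, 1);   // throws if curi > content.size()
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

For example, extract_word("Hello World", 2, 7) yields "llo Wor", the same as the snippet above.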
string match_word(content, curi, len);
match_word is a string. This is a declaration (more precisely, a definition), and it calls this constructor:
string (const string& str, size_t pos, size_t len = npos);
Here is an example from cplusplus.com:
#include <iostream>
#include <string>
int main()
{
std::string s0 ("Initial string");
std::string s3 (s0, 8, 3);
std::cout << s3;
}
This will print:
str
As far as I can understand, strtok() doesn't modify the underlying string, so why doesn't it take a const char* pointer rather than a char* pointer? Also, while tokenizing you wouldn't want your string to change, right?
Updated:
https://godbolt.org/z/3SPvRB
It is clear that strtok() does modify the underlying string. What is the alternative for a non-mutating tokenizer?
But strtok DOES change the string.
Take the following code:
char sz[] = "The quick brown fox";
char* token = strtok(sz, " ");
It's going to alter the contents of the array into:
"The\0quick brown fox";
The first delimiter found gets replaced with a null char. Internally (via thread-local storage or a global variable), a pointer to the next char past that delimiter is stored, so that a subsequent call to strtok(NULL, " ") will parse the next token from the original string.
It does modify the underlying string. See: http://www.cplusplus.com/reference/cstring/strtok/
This end of the token is automatically replaced by a null-character, and the beginning of the token is returned by the function.
Proof:
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
/* note this line... */
printf ("str = \"%s\"\n",str);
return 0;
}
Prints:
Splitting string "- This, a sample string." into tokens:
This
a
sample
string
str = "- This"
Updated: https://godbolt.org/z/3SPvRB It is clear that strtok() does modify the underlying string. What is the alternative for a non-mutating tokenizer?
As mentioned in the comments, you can either:
make a copy of the original string and then tokenize the copy with strtok(); or
write your own implementation that brackets the tokens and copies the tokens to new storage:
using C strspn to scan forward to the first non-delimiter character which will be the beginning of the token, then use strcspn to scan forward to the next delimiter marking the end of the token,
do the same thing manually with a pair of pointers; or
in C++, you can use .find_first_not_of() to scan forward to the first non-delimiter character, and then .find_first_of() to locate the delimiter that follows the token.
In each case you will then copy the token characters to new storage (using memcpy for a C-style implementation; don't forget to nul-terminate) or, in C++, simply use the .substr() member function.
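The strspn/strcspn approach from the first sub-item can be sketched as follows. This is a minimal sketch in C++ that uses only std::strspn and std::strcspn from <cstring> to bracket each token; the function name tokenize_c is hypothetical. The input is never written to, and each token is copied into a new std::string, which takes care of the nul-termination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Non-mutating tokenizer built on std::strspn / std::strcspn.
// s is only read; every token is copied into new storage.
std::vector<std::string> tokenize_c(const char* s, const char* delim)
{
    std::vector<std::string> tokens;
    const char* p = s;
    for (;;) {
        p += std::strspn(p, delim);              // skip leading delimiters
        if (*p == '\0')                          // only delimiters were left
            break;
        std::size_t n = std::strcspn(p, delim);  // length of this token
        tokens.emplace_back(p, n);               // copy token to a new string
        p += n;                                  // continue after the token
    }
    return tokens;
}
```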
A very basic C++11 implementation would look similar to:
std::vector<std::string> stringtok (const std::string& s, const std::string& delim)
{
std::vector<std::string> v {}; /* vector of strings for tokens */
size_t beg = 0, end = 0; /* begin and end positions in str */
/* while non-delimiter char found */
while ((beg = s.find_first_not_of (delim, end)) != std::string::npos) {
end = s.find_first_of (delim, beg); /* find delim after non-delim */
v.push_back (s.substr (beg, end - beg)); /* add substr to vector */
if (end == std::string::npos) /* if last delim, break */
break;
}
return v; /* return vector of tokens */
}
If you follow the logic, it tracks exactly what is described above the function definition. Combining it into a short example, you would have:
#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> stringtok (const std::string& s, const std::string& delim)
{
std::vector<std::string> v {}; /* vector of strings for tokens */
size_t beg = 0, end = 0; /* begin and end positions in str */
/* while non-delimiter char found */
while ((beg = s.find_first_not_of (delim, end)) != std::string::npos) {
end = s.find_first_of (delim, beg); /* find delim after non-delim */
v.push_back (s.substr (beg, end - beg)); /* add substr to vector */
if (end == std::string::npos) /* if last delim, break */
break;
}
return v; /* return vector of tokens */
}
int main (void) {
std::string str = " my dog has fleas ",
delim = " ";
std::vector<std::string> tokens;
tokens = stringtok (str, delim);
std::cout << "string: '" << str << "'\ntokens:\n";
for (const auto& s : tokens)
std::cout << " " << s << '\n';
}
Example Use/Output
$ ./bin/stringtok
string: ' my dog has fleas '
tokens:
my
dog
has
fleas
Note: this is only one of many ways to implement a string tokenization that does not modify the original. Look things over and let me know if you have further questions.
I want to replace part of a string with * characters. Using this code, I am trying to replace everything between he and ld in helloworld:
#include <string>
#include <iostream>
int main()
{
const std::string msg = "helloworld";
const std::string from = "he";
const std::string to = "ld";
std::string s = msg;
std::size_t startpos = s.find(from);
std::size_t endpos = s.find(to);
unsigned int l = endpos-startpos-2;
s.replace(startpos+2, endpos, l, '*');
std::cout << s;
}
The output I got is he******, but I wanted and expected he******ld.
What did I get wrong?
You are replacing all the characters after index two. Count the indexes and replace only the range you want.
Try this:
#include <iostream>
#include <string>
int main ()
{
//this one for replace
std::string str="Hello World";
// replace string added to this one
std::string str2=str;
// You can use string position: replaces 6 characters starting at index 2
str2.replace(2,6,"******");
std::cout << str2 << '\n'; // prints He******rld
return 0;
}
The first parameter is the starting position.
The second parameter is the number of characters to replace (not the ending position).
The third parameter is the replacement string.
There are several ways you can do this; this is one simple method.
UPDATE (after you added your code):
Change:
unsigned int l=endpos-startpos-2;
s.replace(startpos+2,endpos,l,'*');
To:
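unsigned int l=endpos-startpos-2;
s.replace(startpos+2,l,l,'*');
The second parameter of replace() is the number of characters to replace, not an end position. Your endpos stores the position of the l in "ld" (8), so passing endpos as the count replaced everything from index 2 to the end of the string. With l = 8 - 0 - 2 = 6 as the count, only the six characters between "he" and "ld" are replaced, and the output is he******ld.
Read more about replace().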
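A slightly more general sketch of the same fix, assuming you want to mask everything between the first occurrence of one marker and the next occurrence of another; mask_between is a hypothetical helper name. It uses from.size() instead of the hard-coded 2 and passes a count, not an end index, as the second argument of replace().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every character strictly between the first
// occurrence of `from` and the next occurrence of `to` with `fill`.
// Returns the input unchanged if either marker is missing.
std::string mask_between(std::string s, const std::string& from,
                         const std::string& to, char fill = '*')
{
    const std::size_t start = s.find(from);
    if (start == std::string::npos)
        return s;
    const std::size_t first = start + from.size();  // first character to mask
    const std::size_t end = s.find(to, first);       // where `to` begins
    if (end == std::string::npos)
        return s;
    const std::size_t count = end - first;           // a length, not an index
    s.replace(first, count, count, fill);            // replace(pos, count, count2, ch)
    return s;
}
```

Unlike the question's code, this searches for to only after the end of from, so a second marker that appears earlier in the string is ignored; that is a design choice, not something the original requires.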
I am trying to make a function that takes a constant reference of a string as input and returns the string after each character of the string is rotated 1 place to the right. Using references and pointers still confuses me and I am not sure how to obtain the string from the constant reference.
string rotate(const string &str){
string *uno = &str;
string dos = rotate(uno.rbegin(), uno.rbegin() + 1, uno.rend());
return dos;}
This is what I have got so far but it does not compile. Any tips on how to properly get the string from the constant reference will be appreciated.
You can't perform the rotation in-place without violating the const contract on the parameter, so you should copy the input and return a new string:
string rotate(const string &str){
string uno = str;
rotate(uno.rbegin(), uno.rbegin() + 1, uno.rend());
return uno;
}
Another reasonable option would be to use std::rotate_copy.
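A minimal sketch of the std::rotate_copy option, assuming the same rotate-right-by-one behaviour as above; the name rotate_right is hypothetical. Because rotate_copy writes into a separate output range, the const input is only read.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Rotate right by one: the last character moves to the front.
std::string rotate_right(const std::string& str)
{
    if (str.empty())
        return str;  // nothing to rotate; also avoids str.end() - 1 on ""
    std::string out(str.size(), '\0');
    // Copies [end - 1, end) first, then [begin, end - 1), into out.
    std::rotate_copy(str.begin(), str.end() - 1, str.end(), out.begin());
    return out;
}
```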
The line
string* uno = string &str;
makes no sense. I think you mean
string* uno = const_cast<string*>(&str);
You might consider this rotate:
// rotate last char to front
std::string rotate(const std::string& str)
{
if (str.empty()) // nothing to rotate; str.size()-1 would underflow
return str;
return(str[str.size()-1] +
str.substr(0,str.size()-1));
}
// 'abcdefghijklmnopqrstuvwxyz'
// 'zabcdefghijklmnopqrstuvwxy'
You could pass in a string to receive the rotated string, thus avoiding a copy on return by value (although modern compilers usually elide or move that copy anyway).
I passed the string in by pointer, as it's clearer at the call site that it's intended to be altered, but it could easily be passed by reference if preferred.
#include <string>
#include <iostream>
#include <algorithm>
void rotate(std::string const& str, std::string* out)
{
*out = str;
std::rotate(out->rbegin(), out->rbegin() + 1, out->rend());
}
int main(int, char**)
{
std::string out;
std::string x = "1234567";
std::cout << x << '\n';
::rotate(x, &out);
std::cout << out << '\n';
}
I know the starting address of the string (e.g., char* buf) and the maximum length int l of the string (i.e., the total number of characters is less than or equal to l).
What is the simplest way to get the value of the string from the specified memory segment? In other words, how to implement string retrieveString(char* buf, int l);.
EDIT: The memory is reserved for writing and reading strings of variable length. In other words, int l indicates the size of the memory, not the length of the string.
std::string str(buffer, buffer + length);
Or, if the string already exists:
str.assign(buffer, buffer + length);
Edit: I'm still not completely sure I understand the question. But if it's something like what JoshG is suggesting, that you want up to length characters, or until a null terminator, whichever comes first, then you can use this:
std::string str(buffer, std::find(buffer, buffer + length, '\0'));
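Putting that line into the function the question asks for gives something like the following sketch. It assumes l is the size of the memory segment and that the string ends at the first '\0' or at the end of the segment, whichever comes first. It takes const char* so it also accepts the char* from the question, and it treats a null buf or non-positive l as an empty string (an added assumption).

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// buf: start of the memory segment; l: size of the segment in chars.
// Reads up to l characters, stopping early at the first '\0'.
std::string retrieveString(const char* buf, int l)
{
    if (buf == nullptr || l <= 0)
        return std::string();
    const char* end = std::find(buf, buf + l, '\0');  // buf + l if no '\0'
    return std::string(buf, end);
}
```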
const char *charPtr = "test string";
cout << charPtr << endl;
string str = charPtr;
cout << str << endl;
Use the string's constructor
basic_string(const charT* s,size_type n, const Allocator& a = Allocator());
EDIT:
OK, then if the C string length is not given explicitly, use the ctor:
basic_string(const charT* s, const Allocator& a = Allocator());
There seem to be a few details left out of your explanation, but I will do my best...
If these are NUL-terminated strings or the memory is pre-zeroed, you can just iterate down the length of the memory segment until you hit a NUL (0) character or the maximum length (whichever comes first). Use the string constructor, passing the buffer and the size determined in the previous step.
string retrieveString( char* buf, int max ) {
size_t len = 0;
while( (len < max) && (buf[ len ] != '\0') ) {
len++;
}
return string( buf, len );
}
If the above is not the case, I'm not sure how you determine where a string ends.
std::string str;
const char* const s = "test";
str.assign(s);
Signature, for reference: string& assign (const char* s);
Let:
char rw[]="hii"; // This array is readable and writable
const char* r="hello"; // This string is only readable
We can convert char* or const char* to string with the help of string's constructor.
string string_name(parameter);
This parameter accepts both char* and const char* types.
Examples:
1) string st(rw);
Now string 'st' contains "hii".
2) string st(r);
Now, string 'st' contains "hello".
In both the examples, string 'st' is writable and readable.
I have a string that I would like to tokenize.
But the C strtok() function requires my string to be a char*.
How can I do this simply?
I tried:
token = strtok(str.c_str(), " ");
which fails because c_str() returns a const char*, not a char*
#include <iostream>
#include <string>
#include <sstream>
int main(){
std::string myText("some-text-to-tokenize");
std::istringstream iss(myText);
std::string token;
while (std::getline(iss, token, '-'))
{
std::cout << token << std::endl;
}
return 0;
}
Or, as mentioned, use Boost for more flexibility.
Duplicate the string, tokenize it, then free it.
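char *dup = strdup(str.c_str());
token = strtok(dup, " ");
/* use token here; call strtok(NULL, " ") to get further tokens */
free(dup); /* tokens point into dup, so don't use them after this */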
If Boost is available on your system (I think it's standard on most Linux distros these days), it has a Tokenizer class you can use.
If not, then a quick Google turns up a hand-rolled tokenizer for std::string that you can probably just copy and paste. It's very short.
And, if you don't like either of those, then here's a split() function I wrote to make my life easier. It'll break a string into pieces using any of the chars in "delim" as separators. Pieces are appended to the "parts" vector:
void split(const string& str, const string& delim, vector<string>& parts) {
size_t start, end = 0;
while (end < str.size()) {
start = end;
while (start < str.size() && (delim.find(str[start]) != string::npos)) {
start++; // skip leading delimiters
}
end = start;
while (end < str.size() && (delim.find(str[end]) == string::npos)) {
end++; // skip to end of word
}
if (end-start != 0) { // just ignore zero-length strings.
parts.push_back(string(str, start, end-start));
}
}
}
There is a more elegant solution.
With std::string you can use resize() to allocate a suitably large buffer, and &s[0] to get a pointer to the internal buffer.
At this point many fine folks will jump and yell at the screen. But this is a fact: the library working group decided (at its meeting in Lillehammer) that, just like for std::vector, std::string should formally, not just in practice, have a guaranteed contiguous buffer, and C++11 made that guarantee part of the standard.
The other concern is whether strtok() increases the size of the string. The MSDN documentation says:
Each call to strtok modifies strToken by inserting a null character after the token returned by that call.
But this is not correct. Actually the function replaces the first occurrence of a separator character with \0. No change in the size of the string. If we have this string:
one-two---three--four
we will end up with
one\0two\0--three\0-four
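The claim that strtok only overwrites delimiters, without changing the size, can be checked with a small sketch. The hypothetical helper strtok_footprint runs strtok over a writable copy of the input and then renders each '\0' in the buffer as the two characters \0, so the result is easy to compare.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Tokenizes a copy of input with strtok, then returns the raw buffer with
// every '\0' that strtok wrote shown as the two characters "\0".
std::string strtok_footprint(const std::string& input, const char* delims)
{
    std::vector<char> buf(input.begin(), input.end());
    buf.push_back('\0');  // strtok needs a nul-terminated C string
    for (char* t = std::strtok(buf.data(), delims); t != nullptr;
         t = std::strtok(nullptr, delims)) {
        // only the side effect on buf matters here
    }
    std::string shown;
    for (std::size_t i = 0; i + 1 < buf.size(); ++i)  // skip the final terminator
        shown += (buf[i] == '\0') ? std::string("\\0") : std::string(1, buf[i]);
    return shown;
}
```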
So my solution is very simple:
std::string str("some-text-to-split");
char seps[] = "-";
char *token;
token = strtok( &str[0], seps );
while( token != NULL )
{
/* Do your thing */
token = strtok( NULL, seps );
}
Read the discussion on http://www.archivum.info/comp.lang.c++/2008-05/02889/does_std::string_have_something_like_CString::GetBuffer
With C++17, std::string gained a non-const data() overload that returns a pointer to a modifiable buffer, so the string can be passed to strtok directly without any hacks:
#include <string>
#include <iostream>
#include <cstring>
#include <cstdlib>
int main()
{
::std::string text{"pop dop rop"};
char const * const psz_delimiter{" "};
char * psz_token{::std::strtok(text.data(), psz_delimiter)};
while(nullptr != psz_token)
{
::std::cout << psz_token << ::std::endl;
psz_token = std::strtok(nullptr, psz_delimiter);
}
return EXIT_SUCCESS;
}
output
pop
dop
rop
EDIT: the const_cast below is used only to demonstrate the effect of strtok() when applied to a pointer returned by string::c_str().
You should not use strtok() here, since it modifies the tokenized string, which may lead to undesired, if not undefined, behaviour because the C string "belongs" to the string instance.
#include <string>
#include <iostream>
#include <cstring>
int main(int ac, char **av)
{
std::string theString("hello world");
std::cout << theString << " - " << theString.size() << std::endl;
//--- this cast *only* to illustrate the effect of strtok() on std::string
char *token = std::strtok(const_cast<char *>(theString.c_str()), " ");
std::cout << theString << " - " << theString.size() << std::endl;
return 0;
}
After the call to strtok(), the space was replaced by a non-printable null character, so it looks "removed" when printed, but the length remains unchanged.
>./a.out
hello world - 11
helloworld - 11
Therefore you have to resort to a native mechanism, duplicating the string, or a third-party library, as previously mentioned.
I suppose the language is C, or C++...
strtok, IIRC, replaces separators with \0. That's why it cannot use a const string.
To work around that "quickly", if the string isn't huge, you can just strdup() it. Which is wise if you need to keep the string unaltered (which is what the const suggests...).
On the other hand, you might want to use another tokenizer, perhaps hand rolled, less violent on the given argument.
Assuming that by "string" you're talking about std::string in C++, you might have a look at the Tokenizer package in Boost.
First off, I would say use the Boost tokenizer.
Alternatively if your data is space separated then the string stream library is very useful.
But both the above have already been covered.
So as a third C-Like alternative I propose copying the std::string into a buffer for modification.
std::string data("The data I want to tokenize");
// Create a buffer of the correct length:
std::vector<char> buffer(data.size()+1);
// copy the string into the buffer
strcpy(&buffer[0],data.c_str());
// Tokenize
strtok(&buffer[0]," ");
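A complete version of that buffer approach, as a sketch: the function name strtok_copy is hypothetical, and it assumes the input contains no embedded '\0' characters (strtok would stop at the first one). Only the copy in the std::vector<char> is modified, so the original std::string is left intact.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Copies data into a writable, nul-terminated buffer and tokenizes the copy.
std::vector<std::string> strtok_copy(const std::string& data, const char* delims)
{
    std::vector<char> buffer(data.begin(), data.end());
    buffer.push_back('\0');  // strtok needs a C string
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims))
        tokens.emplace_back(tok);  // copy each token out of the buffer
    return tokens;
}
```

Keep in mind that strtok keeps hidden state between calls and is not guaranteed to be thread-safe.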
If you don't mind open source, you could use the subbuffer and subparser classes from https://github.com/EdgeCast/json_parser. The original string is left intact, there is no allocation and no copying of data. I have not compiled the following so there may be errors.
std::string input_string("hello world");
subbuffer input(input_string);
subparser flds(input, ' ', subparser::SKIP_EMPTY);
while (!flds.empty())
{
subbuffer fld = flds.next();
// do something with fld
}
// or if you know it is only two fields
subbuffer fld1 = input.before(' ');
subbuffer fld2 = input.sub(fld1.length() + 1).ltrim(' ');
Typecasting to (char*) got it working for me!
token = strtok((char *)str.c_str(), " ");
Chris's answer is probably fine when using std::string; however in case you want to use std::basic_string<char16_t>, std::getline can't be used. Here is a possible other implementation:
template <class CharT> bool tokenizestring(const std::basic_string<CharT> &input, CharT separator, typename std::basic_string<CharT>::size_type &pos, std::basic_string<CharT> &token) {
if (pos >= input.length()) {
// if input is empty, or ends with a separator, return an empty token when the end has been reached (and return an out-of-bound position so subsequent call won't do it again)
if ((pos == 0) || ((pos > 0) && (pos == input.length()) && (input[pos-1] == separator))) {
token.clear();
pos=input.length()+1;
return true;
}
return false;
}
typename std::basic_string<CharT>::size_type separatorPos=input.find(separator, pos);
if (separatorPos == std::basic_string<CharT>::npos) {
token=input.substr(pos, input.length()-pos);
pos=input.length();
} else {
token=input.substr(pos, separatorPos-pos);
pos=separatorPos+1;
}
return true;
}
Then use it like this:
std::basic_string<char16_t> s;
std::basic_string<char16_t> token;
std::basic_string<char16_t>::size_type tokenPos=0;
while (tokenizestring(s, (char16_t)' ', tokenPos, token)) {
...
}
It fails because str.c_str() returns a constant string, but char * strtok (char * str, const char * delimiters) requires a modifiable (non-const) string. So you need to use const_cast<char*> in order to make it modifiable.
I am giving you a complete but small program to tokenize the string using C strtok() function.
#include <iostream>
#include <string>
#include <string.h>
using namespace std;
int main() {
string s="20#6 5, 3";
// strtok requires a modifiable string as it modifies the supplied string in order to tokenize it
char *str=const_cast< char *>(s.c_str());
char *tok;
tok=strtok(str, "#, " );
int arr[4], i=0;
while(tok!=NULL){
arr[i++]=stoi(tok);
tok=strtok(NULL, "#, " );
}
for(int i=0; i<4; i++) cout<<arr[i]<<endl;
return 0;
}
NOTE: strtok may not be suitable in all situations, as the string passed to the function gets modified by being broken into smaller strings. Please refer to the strtok documentation to get a better understanding of how it works.
How strtok works
I added a few print statements to better understand the changes happening to the string in each call to strtok and how it returns each token.
#include <iostream>
#include <string>
#include <string.h>
using namespace std;
int main() {
string s="20#6 5, 3";
char *str=const_cast< char *>(s.c_str());
char *tok;
cout<<"string: "<<s<<endl;
tok=strtok(str, "#, " );
while(tok!=NULL){
// print only while tok is non-null: streaming a null char* is undefined behavior
cout<<"String: "<<s<<"\tToken: "<<tok<<endl;
tok=strtok(NULL, "#, " );
}
return 0;
}
Output:
string: 20#6 5, 3
String: 206 5, 3 Token: 20
String: 2065, 3 Token: 6
String: 2065 3 Token: 5
String: 2065 3 Token: 3
On the first call, strtok scans the string for the first non-delimiter character (2 in this case) and marks it as the start of the token. It then continues scanning for a delimiter, replaces it with a null character (the # is overwritten in the actual string), and returns a pointer to the start of the token (i.e., it returns the token 20, which is now terminated by the null). On each subsequent call it resumes scanning from the character after the replaced delimiter and returns the next token if one is found, or NULL otherwise. Subsequently, it returns the tokens 6, 5 and 3.
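The mechanism described above can be sketched as a simplified re-implementation. This my_strtok is hypothetical and for illustration only, not the actual library source; real implementations differ in details such as where the saved position is kept. It uses std::strspn to skip leading delimiters and std::strcspn to find the end of the token.

```cpp
#include <cassert>
#include <cstring>

// Simplified re-implementation of strtok's mechanism. Like strtok, it keeps
// hidden state between calls (a static pointer), so it is not reentrant.
char* my_strtok(char* str, const char* delim)
{
    static char* next = nullptr;       // where the next scan starts
    if (str != nullptr)
        next = str;                    // first call: start on a new string
    if (next == nullptr)
        return nullptr;                // a previous call reached the end
    next += std::strspn(next, delim);  // skip leading delimiters
    if (*next == '\0') {               // no token left
        next = nullptr;
        return nullptr;
    }
    char* token = next;                 // the token starts here
    next += std::strcspn(next, delim);  // find the delimiter ending the token
    if (*next != '\0') {
        *next = '\0';                   // overwrite the delimiter: the mutation
        ++next;                         // the next scan starts after it
    } else {
        next = nullptr;                 // the token ran to the end of the string
    }
    return token;
}
```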