What is the correct/best/simplest way to convert a C-style string to a std::string?
The conversion should accept a max_length, and terminate the string at the first \0 char if one occurs before max_length characters.
This page on string::string gives two potential constructors that would do what you want:
string ( const char * s, size_t n );
string ( const string& str, size_t pos, size_t n = npos );
Example:
#include<cstdlib>
#include<cstring>
#include<string>
#include<iostream>
using namespace std;
int main(){
char* p= (char*)calloc(30, sizeof(char));
strcpy(p, "Hello world");
string s(p, 15);
cout << s.size() << ":[" << s << "]" << endl;
string t(p, 0, 15);
cout << t.size() << ":[" << t << "]" << endl;
free(p);
return 0;
}
Output:
15:[Hello world]
11:[Hello world]
The first form treats p as a plain array of characters, so it creates (in our case) a string of length 15: the 11 characters of "Hello world" followed by 4 embedded '\0' characters, which are invisible when printed with cout << .... Probably not what you're looking for.
The second form implicitly converts the char* to a string (so p must be null-terminated), and then keeps the minimum of its length and the n you specify. I think this is the simplest solution, in terms of what you have to write.
std::string str(c_str, strnlen(c_str, max_length));
At Christian Rau's request:
strnlen is specified in POSIX.1-2008 and available in GNU's glibc and the Microsoft run-time library. It is not yet found in some other systems; you may fall back to Gnulib's substitute.
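Where strnlen is not available, the same bounded length can be computed with std::memchr from the standard library. This is only a minimal sketch, assuming the buffer either holds at least max_length readable bytes or contains a '\0' before that point; the name bounded_to_string is made up for illustration:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copy at most max_length chars, stopping at the first '\0'.
std::string bounded_to_string(const char* s, std::size_t max_length) {
    // memchr returns a pointer to the first '\0' within the first
    // max_length bytes, or a null pointer if there is none.
    const void* nul = std::memchr(s, '\0', max_length);
    const std::size_t len = nul
        ? static_cast<std::size_t>(static_cast<const char*>(nul) - s)
        : max_length;
    return std::string(s, len);
}
```

Calling bounded_to_string(p, 15) on the "Hello world" buffer from the example above would give a string of size 11.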
std::string the_string(c_string);
if(the_string.size() > max_length)
the_string.resize(max_length);
This is actually trickier than it looks, because you can't call strlen
unless the string is actually nul terminated. In fact, without some
additional constraints, the problem practically requires inventing a new
function, a version of strlen which never goes beyond a certain
length. However:
If the buffer containing the c-style string is guaranteed to be at least
max_length chars long (although perhaps with a '\0' before the end),
then you can use the address-length constructor of std::string, and
trim afterwards:
std::string result( c_string, max_length );
result.erase( std::find( result.begin(), result.end(), '\0' ), result.end() );
and if you know that c_string is a nul terminated string (but perhaps
longer than max_length), you can use strlen:
std::string result( c_string, std::min( strlen( c_string ), max_length ) );
There is a constructor accepting two pointer parameters, so the code is simply
std::string cppstr(cstr, cstr + min(max_length, strlen(cstr)));
this is also going to be as efficient as std::string cppstr(cstr) if the length is smaller than max_length.
What you want is this constructor:
std::string ( const string& str, size_t pos, size_t n = npos ), passing pos as 0. Your const char* c-style string will be implicitly converted to a const string for the first parameter.
const char *c_style = "012abd";
std::string cpp_style = std::string(c_style, 0, 10);
UPDATE: removed the "new" from the cpp_style initialization
Today I took a test that included the following question, and because I am new to C++, it confused me.
Why doesn't the following statement work?
char str[ ] = "Hello" ;
strcat ( str, '!' ) ;
char str[] = "Hello";
strcat (str, '!') ;
strcat second argument must be a pointer to a string, but you are passing a character constant.
The correct call would be strcat(str, "!"); (note the " instead of the ') but you also need to reserve enough space in str which is only large enough to hold the "Hello" string. For example, for your test, you can reserve more bytes with char str[64] = "Hello";
strcat() expects a pointer for both arguments.
'!' will be converted to a pointer in an implementation-defined manner (in most cases an invalid one), so the program may crash with a segmentation fault.
Note that
char str[ ] = "Hello" ;
strcat ( str, "!" ) ;
won't work well either, because str has no room for the extra character.
char str[ ] = "Hello" ;
strcat ( str, '!' ) ;
^^^ --- this is of type char
strcat signature is :
char * strcat ( char * destination, const char * source );
^^^^^^^^^^^^^^^^^^^
so, second parameter is of type const char* and not char. You must pass either string literal or variable of type const char*. Actually string literals are of type const char[] but they decay to const char* when being assigned to.
The strcat function expects two strings. '!' is a character.
In order to concatenate safely, your array must be big enough to hold the other string, so change '!' to "!", and str[] to str[8] or more.
#include <stdio.h>
#include <string.h>
int main(void)
{
char str[20] = "Hello" ;
strcat ( str, "!" ) ;
printf("%s\n",str);
}
Two problems:
The immediate issue is that the second parameter must be a const char*. strcat reads the memory starting at that location, until the \0 is reached. If no \0 can be reached then the function behaviour is undefined.
It's up to you to make sure that str is large enough to receive the concatenated string. If not, then the behaviour is undefined.
On the second point, you can write something like char str[100] = "Hello"; That will reserve 100 bytes of memory, populating the first 6 elements with Hello\0.
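Both points can be checked before any bytes are written. The following is a rough sketch rather than a standard facility: append_checked is a hypothetical helper that takes the destination capacity as an extra parameter and refuses to append when the result would not fit.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: append src to dest only if the result, including the
// terminating '\0', fits in dest_size bytes. Returns false and leaves dest
// untouched otherwise. Both dest and src must already be NUL-terminated.
bool append_checked(char* dest, std::size_t dest_size, const char* src) {
    const std::size_t used = std::strlen(dest);
    const std::size_t extra = std::strlen(src);
    if (used + extra + 1 > dest_size) {
        return false;
    }
    std::memcpy(dest + used, src, extra + 1); // copies the '\0' as well
    return true;
}
```

With char str[] = "Hello"; the call append_checked(str, sizeof str, "!") returns false, while with char str[100] = "Hello"; it succeeds and str becomes "Hello!".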
Why did you decide that these statements
char str[ ] = "Hello" ;
strcat ( str, '!' ) ;
do not work?:)
All depends on how the function strcat is defined.
For example it can be defined the following way
char * strcat( char *s, char c )
{
size_t n = std::strlen( s );
if ( n ) s[n-1] = c;
return s;
}
Or the following way
char * strcat( char *s, char c )
{
size_t n = std::strlen( s );
char *t = new char[n + 2];
std::strcpy( t, s );
t[ n ] = c;
t[ n + 1 ] = '\0';
return t;
}
If strcat in your example is the standard C function std::strcat then again it does not mean that the statements will not work. It means that the program with such statements will not compile because the second argument has a wrong type.
But if you specify a correct value for the second argument like this
strcat ( str, "!" ) ;
that is using a string literal instead of the character literal then indeed the statements will not work as you are expecting because array str does not have enough space to append string literal "!".
The array should be defined at least like
char str[7] = "Hello" ;
^^^
and the function should be called like
strcat ( str, "!" ) ;
^^^
After trying for about 1 hour, my code didn't work because of this:
void s_s(string const& s, char data[10])
{
for (int i = 0; i < 10; i++)
data[i] = s[i];
}
int main()
{
string ss = "1234567890";
char data[10];
s_s("1234567890", data);
cout << data << endl;//why junk
}
I simply don't understand why the cout displays junk after the char array. Can someone please explain why and how to solve it?
You need to null terminate your char array.
std::cout.operator<<(char*) uses \0 to know where to stop.
Your char[] decays to char* by the way.
As already mentioned you want to NUL terminate your array, but here's something else to consider:
If s is your source string, then you want to loop to s.size(), so that you don't loop past the size of your source string.
void s_s(std::string const& s, char data[20])
{
for (unsigned int i = 0; i < s.size(); i++)
data[i] = s[i];
data[s.size()] = '\0';
}
Alternatively, you can try this:
std::copy(ss.begin(), ss.begin()+ss.size(), data);
data[ss.size()] = '\0';
std::cout << data << std::endl;
You have ONLY allocated 10 bytes for data
The string is actually 11 bytes since there is an implied '\0' at the end
At a minimum you should increase the size of data to 11, and change your loop to copy the '\0' as well
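The points above (leave room for the '\0', never read past the source, never write past the destination) can be combined into one small function. This is only a sketch; copy_terminated and its capacity parameter are hypothetical names, not part of the original code:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: copy s into data (which holds `capacity` bytes),
// truncating if necessary, and always write a terminating '\0'.
// Returns the number of characters copied, not counting the '\0'.
std::size_t copy_terminated(const std::string& s, char* data, std::size_t capacity) {
    if (capacity == 0) {
        return 0; // no room even for the terminator
    }
    const std::size_t n = std::min(s.size(), capacity - 1);
    std::copy(s.begin(), s.begin() + static_cast<std::ptrdiff_t>(n), data);
    data[n] = '\0';
    return n;
}
```

With char data[11]; the call copy_terminated("1234567890", data, sizeof data) copies all ten digits and cout << data prints them without trailing junk.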
The function std::ostream::operator<< that you are trying to use in the last line of the main will take your char array as a pointer and will print every char until the null sentinel character is found (the character is \0).
This sentinel character is generally generated for you in statements where a C-string literal is defined:
char s[] = "123";
In the above example sizeof(s) is 4 because the actual characters stored are:
'1', '2', '3', '\0'
The last character is fundamental in tasks that require to loop on every char of a const char* string, because the condition for the loop to terminate, is that the \0 must be read.
In your example the "junk" that you see is the bytes that follow the final '0' character of the array in memory (interpreted as char). Reading them is undefined behavior and can potentially crash the program.
One solution is to obviously add the \0 char at the end of the char array (of course fixing the size).
The best solution, though, is to never use const char* for strings at all. You are correctly using std::string in your example, which will prevent this kind of problem and many others.
If you ever need a const char* (for C APIs for example) you can always use std::string::c_str and retrieve the C string version of the std::string.
Your example could be rewritten to:
int main(int, char*[]) {
std::string ss = "1234567890";
const char* data = ss.c_str();
std::cout << data << std::endl;
}
(in this particular instance, a version of std::ostream::operator<< that takes a std::string is already defined, so you don't even need data at all)
I know the starting address of the string(e.g., char* buf) and the max length int l; of the string(i.e., total number of characters is less than or equal to l).
What is the simplest way to get the value of the string from the specified memory segment? In other words, how to implement string retrieveString(char* buf, int l);.
EDIT: The memory is reserved for writing and reading string of variable length. In other words, int l; indicates the size of the memory and not the length of the string.
std::string str(buffer, buffer + length);
Or, if the string already exists:
str.assign(buffer, buffer + length);
Edit: I'm still not completely sure I understand the question. But if it's something like what JoshG is suggesting, that you want up to length characters, or until a null terminator, whichever comes first, then you can use this:
std::string str(buffer, std::find(buffer, buffer + length, '\0'));
const char *charPtr = "test string";
cout << charPtr << endl;
string str = charPtr;
cout << str << endl;
Use the string's constructor
basic_string(const charT* s,size_type n, const Allocator& a = Allocator());
EDIT:
OK, then if the C string length is not given explicitly, use the ctor:
basic_string(const charT* s, const Allocator& a = Allocator());
There seem to be a few details left out of your explanation, but I will do my best...
If these are NUL-terminated strings or the memory is pre-zeroed, you can just iterate down the length of the memory segment until you hit a NUL (0) character or the maximum length (whichever comes first). Use the string constructor, passing the buffer and the size determined in the previous step.
string retrieveString( char* buf, int max ) {
size_t len = 0;
while( (len < max) && (buf[ len ] != '\0') ) {
len++;
}
return string( buf, len );
}
If the above is not the case, I'm not sure how you determine where a string ends.
std::string str;
const char* const s = "test";
str.assign(s);
string& assign (const char* s); // signature, for your reference
Let,
char rw[] = "hii"; // This array is readable and writable
const char* r = "hello"; // This string is only readable
we can convert char* or const char* to string with the help of string's constructor.
string string_name(parameter);
This parameter accepts both char* and const char* types .
Examples:
1) string st(rw);
Now string 'st', contains "hii"
2) string st(r);
Now, string 'st' contains "hello".
In both the examples, string 'st' is writable and readable.
I am relatively new to C++. Recent assignments have required that I convert a multitude of char buffers (from structures/sockets, etc.) to strings. I have been using variations on the following but they seem awkward. Is there a better way to do this kind of thing?
#include <iostream>
#include <string>
using std::string;
using std::cout;
using std::endl;
char* bufferToCString(char *buff, int buffSize, char *str)
{
memset(str, '\0', buffSize + 1);
return(strncpy(str, buff, buffSize));
}
string& bufferToString(char* buffer, int bufflen, string& str)
{
char temp[bufflen];
memset(temp, '\0', bufflen + 1);
strncpy(temp, buffer, bufflen);
return(str.assign(temp));
}
int main(int argc, char *argv[])
{
char buff[4] = {'a', 'b', 'c', 'd'};
char str[5];
string str2;
cout << bufferToCString(buff, sizeof(buff), str) << endl;
cout << bufferToString(buff, sizeof(buff), str2) << endl;
}
Given your input strings are not null terminated, you shouldn't use str... functions. You also can't use the popularly used std::string constructors. However, you can use this constructor:
std::string str(buffer, buflen): it takes a char* and a length. (actually const char* and length)
I would avoid the C string version. This would give:
std::string bufferToString(char* buffer, int bufflen)
{
std::string ret(buffer, bufflen);
return ret;
}
If you really must use the C-string version, either drop a 0 at the bufflen position (if you can) or create a buffer of bufflen+1, then memcpy the buffer into it, and drop a 0 at the end (bufflen position).
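The second option (a buffer of bufflen+1, memcpy, then a 0 at the end) could look roughly like this. It is a sketch under the assumption that a std::vector<char> is an acceptable owner for the temporary C string; buffer_to_cstring is a made-up name:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy a non-terminated buffer into a vector that is one
// byte larger, so that out.data() is a valid NUL-terminated C string.
std::vector<char> buffer_to_cstring(const char* buffer, std::size_t bufflen) {
    std::vector<char> out(bufflen + 1, '\0'); // the last element stays '\0'
    if (bufflen > 0) {
        std::memcpy(out.data(), buffer, bufflen);
    }
    return out;
}
```

The vector frees its memory automatically, so no caller-provided array of the right size is needed.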
If the data buffer may have null ('\0') characters in it, you don't want to use the null-terminated operations.
You can either use the constructor that takes char*, length.
char buff[4] = {'a', 'b', 'c', 'd'};
cout << std::string(&buff[0], 4);
Or you can use the constructor that takes a range:
cout << std::string(&buff[0], &buff[4]); // end is last plus one
Do NOT use the std::string(buff) constructor with the buff[] array above, because it is not null-terminated.
std::string to const char*:
my_str.c_str();
char* to std::string:
string my_str1 ("test");
char test[] = "test";
string my_str2 (test);
or even
string my_str3 = "test";
The method needs to know the size of the string. You have to do one of the following:
in the case of char*, pass the length to the method
in the case of char* pointing to a null-terminated array of characters, use everything up to the null character
for char[], use templates to figure out the size of the char[]
1) example - for cases where you're passing the bufflen:
std::string bufferToString(char* buffer, int bufflen)
{
return std::string(buffer, bufflen);
}
2) example - for cases where buffer points to a null-terminated array of characters:
std::string bufferToString(char* buffer)
{
return std::string(buffer);
}
3) example - for cases where you pass char[]:
template <typename T, size_t N>
std::string tostr(T (&array)[N])
{
return std::string(array, N);
}
Usage:
char tstr[] = "Test String";
std::string res = tostr(tstr); // N is 12 here, so res also holds the trailing '\0'
std::cout << res << std::endl;
For the first 2 cases you don't actually have to create new method:
std::string(buffer, bufflen);
std::string(buffer);
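For the char[] case, the deduced array size can also serve as an upper bound only, with the copy stopping early at a '\0' if the array contains one. A possible sketch; tostr_trimmed is a hypothetical variant, not the function from the answer above:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical variant: N is deduced from the array type, and the copy stops
// at the first '\0' if there is one, or after N characters otherwise.
template <std::size_t N>
std::string tostr_trimmed(const char (&array)[N]) {
    const char* end = std::find(array, array + N, '\0');
    return std::string(array, end);
}
```

This handles both a terminated array such as char tstr[] = "Test String"; and an unterminated one such as char buff[4] = {'a', 'b', 'c', 'd'};.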
std::string buf2str(const char* buffer)
{
return std::string(buffer);
}
Or just
std::string mystring(buffer);
Use string constructor that takes the size:
string ( const char * s, size_t n );
Content is initialized to a copy of the string formed by the first n characters in the array of characters pointed by s.
cout << std::string(buff, sizeof(buff)) << endl;
http://www.cplusplus.com/reference/string/string/string/
Non-null-terminated buffer to C string:
memcpy(str, buff, buffSize);
str[buffSize] = 0; // not buffSize+1, because C indexes are 0-based.
string value (reinterpret_cast<const char*>(buffer), length);
I have a string that I would like to tokenize.
But the C strtok() function requires my string to be a char*.
How can I do this simply?
I tried:
token = strtok(str.c_str(), " ");
which fails because it turns it into a const char*, not a char*
#include <iostream>
#include <string>
#include <sstream>
int main(){
std::string myText("some-text-to-tokenize");
std::istringstream iss(myText);
std::string token;
while (std::getline(iss, token, '-'))
{
std::cout << token << std::endl;
}
return 0;
}
Or, as mentioned, use boost for more flexibility.
Duplicate the string, tokenize it, then free it.
char *dup = strdup(str.c_str());
token = strtok(dup, " ");
free(dup);
If boost is available on your system (I think it's standard on most Linux distros these days), it has a Tokenizer class you can use.
If not, then a quick Google turns up a hand-rolled tokenizer for std::string that you can probably just copy and paste. It's very short.
And, if you don't like either of those, then here's a split() function I wrote to make my life easier. It'll break a string into pieces using any of the chars in "delim" as separators. Pieces are appended to the "parts" vector:
void split(const string& str, const string& delim, vector<string>& parts) {
size_t start, end = 0;
while (end < str.size()) {
start = end;
while (start < str.size() && (delim.find(str[start]) != string::npos)) {
start++; // skip leading delimiters
}
end = start;
while (end < str.size() && (delim.find(str[end]) == string::npos)) {
end++; // skip to end of word
}
if (end-start != 0) { // just ignore zero-length strings.
parts.push_back(string(str, start, end-start));
}
}
}
There is a more elegant solution.
With std::string you can use resize() to allocate a suitably large buffer, and &s[0] to get a pointer to the internal buffer.
At this point many fine folks will jump and yell at the screen. But this is a fact. About 2 years ago the library working group decided (meeting at Lillehammer) that just like for std::vector, std::string should also formally, not just in practice, have a guaranteed contiguous buffer.
The other concern is whether strtok() increases the size of the string. The MSDN documentation says:
Each call to strtok modifies strToken by inserting a null character after the token returned by that call.
But this is not correct. Actually the function replaces the first occurrence of a separator character with \0. No change in the size of the string. If we have this string:
one-two---three--four
we will end up with
one\0two\0--three\0-four
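The in-place replacement can be observed directly by running strtok over a scratch copy and then looking at the raw bytes. A small sketch, with bytes_after_strtok being a hypothetical name chosen for this illustration:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Runs strtok over a private copy of `text` and returns the copy's final
// bytes, embedded '\0' characters included, so the changes can be inspected.
std::string bytes_after_strtok(const std::string& text, const char* seps) {
    std::vector<char> buf(text.begin(), text.end());
    buf.push_back('\0'); // strtok needs a terminated, writable buffer
    for (char* tok = std::strtok(buf.data(), seps); tok != nullptr;
         tok = std::strtok(nullptr, seps)) {
        // The tokens are not needed here; only the side effect on buf matters.
    }
    return std::string(buf.data(), text.size());
}
```

The returned string has the same size as the input; only the first separator after each token has been overwritten with '\0'.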
So my solution is very simple:
std::string str("some-text-to-split");
char seps[] = "-";
char *token;
token = strtok( &str[0], seps );
while( token != NULL )
{
/* Do your thing */
token = strtok( NULL, seps );
}
Read the discussion on http://www.archivum.info/comp.lang.c++/2008-05/02889/does_std::string_have_something_like_CString::GetBuffer
With C++17, std::string gains a data() overload that returns a pointer to a modifiable buffer, so the string can be used with strtok directly without any hacks:
#include <string>
#include <iostream>
#include <cstring>
#include <cstdlib>
int main()
{
::std::string text{"pop dop rop"};
char const * const psz_delimiter{" "};
char * psz_token{::std::strtok(text.data(), psz_delimiter)};
while(nullptr != psz_token)
{
::std::cout << psz_token << ::std::endl;
psz_token = std::strtok(nullptr, psz_delimiter);
}
return EXIT_SUCCESS;
}
output
pop
dop
rop
EDIT: the const cast is only used to demonstrate the effect of strtok() when applied to a pointer returned by string::c_str().
You should not use strtok() since it modifies the tokenized string, which may lead to undesired, if not undefined, behaviour as the C string "belongs" to the string instance.
#include <string>
#include <iostream>
#include <cstring>
int main(int ac, char **av)
{
std::string theString("hello world");
std::cout << theString << " - " << theString.size() << std::endl;
//--- this cast *only* to illustrate the effect of strtok() on std::string
char *token = strtok(const_cast<char *>(theString.c_str()), " ");
std::cout << theString << " - " << theString.size() << std::endl;
return 0;
}
After the call to strtok(), the space was "removed" from the string, or turned down to a non-printable character, but the length remains unchanged.
>./a.out
hello world - 11
helloworld - 11
Therefore you have to resort to a native mechanism, duplication of the string, or a third-party library as previously mentioned.
I suppose the language is C, or C++...
strtok, IIRC, replaces separators with \0. That's why it cannot take a const string.
To work around that "quickly", if the string isn't huge, you can just strdup() it. That is wise if you need to keep the string unaltered (which the const suggests...).
On the other hand, you might want to use another tokenizer, perhaps hand-rolled, that is less violent on the given argument.
Assuming that by "string" you're talking about std::string in C++, you might have a look at the Tokenizer package in Boost.
First off I would say use boost tokenizer.
Alternatively if your data is space separated then the string stream library is very useful.
But both the above have already been covered.
So as a third C-Like alternative I propose copying the std::string into a buffer for modification.
std::string data("The data I want to tokenize");
// Create a buffer of the correct length:
std::vector<char> buffer(data.size()+1);
// copy the string into the buffer
strcpy(&buffer[0],data.c_str());
// Tokenize
strtok(&buffer[0]," ");
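The snippet above only extracts the first token. A fuller sketch of the same copy-then-tokenize idea, wrapped in a hypothetical helper named tokenize_copy that collects every token into a vector before the scratch buffer is destroyed:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenize a private copy of `data` with strtok and
// return the tokens as independent std::string objects.
std::vector<std::string> tokenize_copy(const std::string& data, const char* delims) {
    std::vector<char> buffer(data.begin(), data.end());
    buffer.push_back('\0'); // make the copy a valid C string
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.push_back(tok); // copy each token out of the scratch buffer
    }
    return tokens;
}
```

The original std::string is never modified, and the returned tokens stay valid after the function returns because each one is its own copy.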
If you don't mind open source, you could use the subbuffer and subparser classes from https://github.com/EdgeCast/json_parser. The original string is left intact, there is no allocation and no copying of data. I have not compiled the following so there may be errors.
std::string input_string("hello world");
subbuffer input(input_string);
subparser flds(input, ' ', subparser::SKIP_EMPTY);
while (!flds.empty())
{
subbuffer fld = flds.next();
// do something with fld
}
// or if you know it is only two fields
subbuffer fld1 = input.before(' ');
subbuffer fld2 = input.sub(fld1.length() + 1).ltrim(' ');
Typecasting to (char*) got it working for me!
token = strtok((char *)str.c_str(), " ");
Chris's answer is probably fine when using std::string; however in case you want to use std::basic_string<char16_t>, std::getline can't be used. Here is a possible other implementation:
template <class CharT> bool tokenizestring(const std::basic_string<CharT> &input, CharT separator, typename std::basic_string<CharT>::size_type &pos, std::basic_string<CharT> &token) {
if (pos >= input.length()) {
// if input is empty, or ends with a separator, return an empty token when the end has been reached (and return an out-of-bound position so subsequent call won't do it again)
if ((pos == 0) || ((pos > 0) && (pos == input.length()) && (input[pos-1] == separator))) {
token.clear();
pos=input.length()+1;
return true;
}
return false;
}
typename std::basic_string<CharT>::size_type separatorPos=input.find(separator, pos);
if (separatorPos == std::basic_string<CharT>::npos) {
token=input.substr(pos, input.length()-pos);
pos=input.length();
} else {
token=input.substr(pos, separatorPos-pos);
pos=separatorPos+1;
}
return true;
}
Then use it like this:
std::basic_string<char16_t> s;
std::basic_string<char16_t> token;
std::basic_string<char16_t>::size_type tokenPos=0;
while (tokenizestring(s, (char16_t)' ', tokenPos, token)) {
...
}
It fails because str.c_str() returns a pointer to a constant string, but char * strtok (char * str, const char * delimiters ) requires a modifiable string. So you need to use const_cast<char *> in order to make it modifiable.
I am giving you a complete but small program to tokenize the string using C strtok() function.
#include <iostream>
#include <string>
#include <string.h>
using namespace std;
int main() {
string s="20#6 5, 3";
// strtok requires a modifiable string as it modifies the supplied string in order to tokenize it
char *str=const_cast< char *>(s.c_str());
char *tok;
tok=strtok(str, "#, " );
int arr[4], i=0;
while(tok!=NULL){
arr[i++]=stoi(tok);
tok=strtok(NULL, "#, " );
}
for(int i=0; i<4; i++) cout<<arr[i]<<endl;
return 0;
}
NOTE: strtok may not be suitable in all situations, as the string passed to the function gets modified by being broken into smaller strings. Please refer to the following to get a better understanding of how strtok works.
How strtok works
I added a few print statements to better understand the changes happening to the string in each call to strtok and how it returns a token.
#include <iostream>
#include <string>
#include <string.h>
using namespace std;
int main() {
string s="20#6 5, 3";
char *str=const_cast< char *>(s.c_str());
char *tok;
cout<<"string: "<<s<<endl;
tok=strtok(str, "#, " );
cout<<"String: "<<s<<"\tToken: "<<tok<<endl;
while(tok!=NULL){
tok=strtok(NULL, "#, " );
cout<<"String: "<<s<<"\t\tToken: "<<(tok ? tok : "")<<endl; // guard against streaming a null pointer
}
return 0;
}
Output:
string: 20#6 5, 3
String: 206 5, 3 Token: 20
String: 2065, 3 Token: 6
String: 2065 3 Token: 5
String: 2065 3 Token: 3
String: 2065 3 Token:
strtok iterates over the string. The first call finds the first non-delimiter character (2 in this case) and marks it as the token start, then continues scanning for a delimiter, replaces it with a null character (# gets replaced in the actual string), and returns the start pointer, which points to the token's first character (i.e., it returns the token 20, which is terminated by the null). In each subsequent call it starts scanning from the next character and returns a token if found, else null. Subsequently it returns the tokens 6, 5, 3.
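The steps described above can be sketched as a small function that keeps its position in an explicit cursor instead of strtok's hidden internal pointer. This is an illustrative look-alike, not the library's actual implementation; next_token and cursor are hypothetical names:

```cpp
#include <cstring>

// Hypothetical look-alike of strtok: `cursor` plays the role of strtok's
// hidden state. Returns a null pointer when no token is left.
char* next_token(char*& cursor, const char* delims) {
    if (cursor == nullptr) {
        return nullptr;
    }
    cursor += std::strspn(cursor, delims);   // skip leading delimiters
    if (*cursor == '\0') {                   // nothing but delimiters left
        cursor = nullptr;
        return nullptr;
    }
    char* start = cursor;                    // mark the token start
    cursor += std::strcspn(cursor, delims);  // scan to the next delimiter
    if (*cursor != '\0') {
        *cursor = '\0';                      // terminate the token in place
        ++cursor;                            // resume after it on the next call
    } else {
        cursor = nullptr;                    // reached the end of the input
    }
    return start;
}
```

Starting with char text[] = "20#6 5, 3"; and char* cur = text;, repeated calls to next_token(cur, "#, ") yield 20, 6, 5, 3 and then a null pointer.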