I have a method that takes std::string_view and uses function, which takes null terminated string as parameter. For example:
void stringFunc(std::experimental::string_view str) {
some_c_library_func(/* Expects null terminated string */);
}
The question is, what is the proper way to handle this situation? Is str.to_string().c_str() the only option? And I really want to use std::string_view in this method, because I pass different types of strings in it.
I solved this problem by creating an alternate string_view class called zstring_view. It's privately inherited from string_view and contains much of its interface.
The principal difference is that zstring_view cannot be created from a string_view. Also, any string_view APIs that would remove elements from the end are not part of the interface or they return a string_view instead of a zstring_view.
They can be created from any NUL-terminated string source: std::string and so forth. I even created special user-defined literal suffixes for them: _zsv.
The idea being that, so long as you don't put a non-NUL-terminated string into zstring_view manually, all zstring_views should be NUL-terminated. Like std::string, the NUL character is not part of the size of the string, but it is there.
I find it very useful for dealing with C interfacing.
You cannot alter a string through std::string_view. Therefore you cannot add a terminating '\0' character. Hence you need to copy the string somewhere else to add a '\0'-terminator. You could avoid heap allocations by putting the string on the stack, if it's short enough. If you know, that the std::string_view is part of a null-terminated string, then you may check, if the character past the end is a '\0' character and avoid the copy in that case. Other than that, I don't see much more room for optimizations.
You certainly shouldn't call data on std::experimental::string_view:
Unlike basic_string::data() and string literals, data() may return a
pointer to a buffer that is not null-terminated.
So call to_string and c_str on that:
void stringFunc(std::experimental::string_view str) {
some_c_library_func(str.to_string().c_str());
}
or:
void stringFunc(std::experimental::string_view str) {
std::string real_str(str);
some_c_library_func(real_str.c_str());
}
In some cases C-stype functions have overloads, which accept the length of string as a separate argument.
E.g. instead of using strcasesmp() it worth to switch to strncasecmp().
For sure, in this particular case, it would require additional logic implemented in case when strings are not equal, but the first n characters are equal.
But it could be good alternative to writing custom class for string views.
Related
How to initialize a string data type variable by null character('\0')???
string txt='\0';
is it right or wrong?
This is wrong, because there is no constructor of std::string which takes char. If you want to construct an std::string which has a null-terminator in it (for whatever reasons you might have) you should use different constructor:
std::string txt(1, '\0');
Please note, this is the answer which takes question on it's face value. You probably do not need this at all, but I do not know that for sure.
First of all understand that to make std::string::c_str() work, std::string always allocates space for terminating zero, so most probably default constructor will do the job for you.
On other hand if you need string which length is not zero and contains zero inside there are two approaches.
use std::string txt(1, '\0'); as #SergeyA sucgested in other ansewear. (this will work with old C++03)
use string literal suffix operators, which is more handy IMO.
.
using namespace std::literals::string_literals;
auto s2 = "\0"s;
Here is live example.
I have a method that takes std::string_view and uses function, which takes null terminated string as parameter. For example:
void stringFunc(std::experimental::string_view str) {
some_c_library_func(/* Expects null terminated string */);
}
The question is, what is the proper way to handle this situation? Is str.to_string().c_str() the only option? And I really want to use std::string_view in this method, because I pass different types of strings in it.
I solved this problem by creating an alternate string_view class called zstring_view. It's privately inherited from string_view and contains much of its interface.
The principal difference is that zstring_view cannot be created from a string_view. Also, any string_view APIs that would remove elements from the end are not part of the interface or they return a string_view instead of a zstring_view.
They can be created from any NUL-terminated string source: std::string and so forth. I even created special user-defined literal suffixes for them: _zsv.
The idea being that, so long as you don't put a non-NUL-terminated string into zstring_view manually, all zstring_views should be NUL-terminated. Like std::string, the NUL character is not part of the size of the string, but it is there.
I find it very useful for dealing with C interfacing.
You cannot alter a string through std::string_view. Therefore you cannot add a terminating '\0' character. Hence you need to copy the string somewhere else to add a '\0'-terminator. You could avoid heap allocations by putting the string on the stack, if it's short enough. If you know, that the std::string_view is part of a null-terminated string, then you may check, if the character past the end is a '\0' character and avoid the copy in that case. Other than that, I don't see much more room for optimizations.
You certainly shouldn't call data on std::experimental::string_view:
Unlike basic_string::data() and string literals, data() may return a
pointer to a buffer that is not null-terminated.
So call to_string and c_str on that:
void stringFunc(std::experimental::string_view str) {
some_c_library_func(str.to_string().c_str());
}
or:
void stringFunc(std::experimental::string_view str) {
std::string real_str(str);
some_c_library_func(real_str.c_str());
}
In some cases C-stype functions have overloads, which accept the length of string as a separate argument.
E.g. instead of using strcasesmp() it worth to switch to strncasecmp().
For sure, in this particular case, it would require additional logic implemented in case when strings are not equal, but the first n characters are equal.
But it could be good alternative to writing custom class for string views.
There is a function I want to use that takes char str[] as a parameter. I want to call the function giving a string input.
void someFunction (char str[]) {
/* ... */
}
// Works.
someFunction("1010101");
// Does not work.
string someString;
someFunction(someString);
How can I get the second call to work?
EDIT: I cannot change the function's input parameters.
Depends on the nature of the string manipulations. If you read but don't write the string, change the prototype to const char str[] and use someString.c_str(), like others are suggesting.
If you change the characters but not the length of the string, use &*someString.begin().
If you extend/truncate the string, it's easier to pass a string& and work in terms of the string object. Less trouble, honestly.
You should be able to do:
someFunction(const_cast<char*>(someString.c_str()));
Although I'm not sure what will happen if str gets modified.
It's probably best if you just modify the original function to take a different parameter type.
What you want for std::string is void someFunction(std::string& str);
There's a reason for the issue -- a std::string's data is not guaranteed to be contiguous memory (at least, before C++11). Therefore, manipulating its buffer as a contiguous allocation (char[]) is a very bad idea.
casting away the const of std::string::c_str() is also a bad idea. One immediate problem you may face is that a std::string implementation may share backing string allocations with other std::string instances (copy-on-write), and you will end up modifying the values of other std::strings. Of course, there are many other bad things that could go wrong in their own implementation-defined ways -- the standard left this very flexible for the implementors of standard libraries.
EDIT: I cannot change the function's input parameters.
Use a std::vector instead.
You could have your function take a std::string instead:
void someFunction (std::string &str) {
I have just done what appears to be a common newbie mistake:
First we read one of many tutorials that goes like this:
#include <fstream>
int main() {
using namespace std;
ifstream inf("file.txt");
// (...)
}
Secondly, we try to use something similar in our code, which goes something like this:
#include <fstream>
int main() {
using namespace std;
std::string file = "file.txt"; // Or get the name of the file
// from a function that returns std::string.
ifstream inf(file);
// (...)
}
Thirdly, the newbie developer is perplexed by some cryptic compiler error message.
The problem is that ifstream takes const * char as a constructor argument.
The solution is to convert std::string to const * char.
Now, the real problem is that, for a newbie, "file.txt" or similar examples given in almost all the tutorials very much looks like a std::string.
So, is "my text" a std::string, a c-string or a *char, or does it depend on the context?
Can you provide examples on how "my text" would be interpreted differently according to context?
[Edit: I thought the example above would have made it obvious, but I should have been more explicit nonetheless: what I mean is the type of any string enclosed within double quotes, i.e. "myfilename.txt", not the meaning of the word 'string'.]
Thanks.
So, is "string" a std::string, a c-string or a *char, or does it depend on the context?
Neither C nor C++ have a built-in string data type, so any double-quoted strings in your code are essentially const char * (or const char [] to be exact). "C string" usually refers to this, specifically a character array with a null terminator.
In C++, std::string is a convenience class that wraps a raw string into an object. By using this, you can avoid having to do (messy) pointer arithmetic and memory reallocations by yourself.
Most standard library functions still take only char * (or const char *) parameters.
You can implicitly convert a char * into std::string because the latter has a constructor to do that.
You must explicitly convert a std::string into a const char * by using the c_str() method.
Thanks to Clark Gaebel for pointing out constness, and jalf and GMan for mentioning that it is actually an array.
"myString" is a string literal, and has the type const char[9], an array of 9 constant char. Note that it has enough space for the null terminator. So "Hi" is a const char[3], and so forth.
This is pretty much always true, with no ambiguity. However, whenever necessary, a const char[9] will decay into a const char* that points to its first element. And std::string has an implicit constructor that accepts a const char*. So while it always starts as an array of char, it can become the other types if you need it to.
Note that string literals have the unique property that const char[N] can also decay into char*, but this behavior is deprecated. If you try to modify the underlying string this way, you end up with undefined behavior. Its just not a good idea.
std::string file = "file.txt";
The right hand side of the = contains a (raw) string literal (i.a. a null-terminated byte string). Its effective type is array of const char.
The = is a tricky pony here: No assignment happens. The std::string class has a constructor that takes a pointer to char as an argument and this is called to create a temporary std::string and this is used to copy-construct (using the copy ctor of std::string) the object file of type std::string.
The compiler is free to elide the copy ctor and directly instantiate file though.
However, note that std:string is not the same thing as a C-style null-terminated string. It is not even required to be null-terminated.
ifstream inf("file.txt");
The std::ifstream class has a ctor that takes a const char * and the string literal passed to it decays to a pointer to the first element of the string.
The thing to remember is this: std::string provides (almost seamless) conversion from C-style strings. You have to look up the signature of the function to see if you are passing in a const char * or a std::string (the latter because of implicit conversions).
So, is "string" a std::string, a c-string or a char*, or does it depend on the context?
It depends entirely on the context. :-) Welcome to C++.
A C string is a null-terminated string, which is almost always the same thing as a char*.
Depending on the platforms and frameworks you are using, there might be even more meanings of the word "string" (for example, it is also used to refer to QString in Qt or CString in MFC).
The C++ standard library provides a std::string class to manage and represent character sequences. It encapsulates the memory management and is most of the time implemented as a C-string; but that is an implementation detail. It also provides manipulation routines for common tasks.
The std::string type will always be that (it doesn't have a conversion operator to char* for example, that's why you have the c_str() method), but it can be initialized or assigned to by a C-string (char*).
On the other hand, if you have a function that takes a std::string or a const std::string& as a parameter, you can pass a c-string (char*) to that function and the compiler will construct a std::string in-place for you. That would be a differing interpretation according to context as you put it.
Neither C nor C++ have a built-in string data type.
When the compiler finds, during the compilation, a double-quoted strings is implicitly referred (see the code below), the string itself is stored in program code/text and generates code to create even character array:
The array is created in static storage because it must persist to be referred later.
The array is made to constant because it must always contain the original data (Hello).
So at last, what you have is const char * to this constant static character array.
const char* v()
{
char* text = “Hello”;
return text;
// Above code can be reduced to:
// return “Hello”;
}
During the program run, when the control finds opening bracket, it creates “text”, the char* pointer, in the stack and constant array of 6 elements (including the null terminator ‘\0’ at the end) in static memory area. When control finds next line (char* text = “Hello”;), the starting address of the 6 element array is assigned to “text”. In next line (return text;), it returns “text”. With the closing bracket “text” will disappear from the stack, but array is still in the static memory area.
You need not to make return type const. But if you try to change the value in static array using non constant char* it will still give you an error during the run time because the array is constant. So, it’s always good to make return constant to make sure, it cannot be referred by non constant pointer.
But if the compiler finds a double-quoted strings is explicitly referred as an array, the compiler assumes that the programmer is going to (smartly) handle it. See the following wrong example:
const char* v()
{
char text[] = “Hello”;
return text;
}
During the compilation, compiler checks, quoted text and save it as it is in the code to fill the generated array during the runt time. Also, it calculate the array size, in this case again as 6.
During the program run, with the open bracket, the array “text[]” with 6 elements is created in stack. But no initialization. When the code finds (char text[] = “Hello”;), the array is initialized (with the text in compiled code). So array is now on the stack. When the compiler finds (return text;), it returns the starting address of the array “text”. When the compiler find the closing bracket, the array disappears from the stack. So no way to refer it by the return pointer.
Most standard library functions still take only char * (or const char *) parameters.
The Standard C++ library has a powerful class called string for manipulating text. The internal data structure for string is character arrays. The Standard C++ string class is designed to take care of (and hide) all the low-level manipulations of character arrays that were previously required of the C programmer. Note that std::string is a class:
You can implicitly convert a char * into std::string because the
latter has a constructor to do that.
You can explicitly convert a std::string into a const char * by using the c_str() method.
As often as possible it should mean std::string (or an alternative such as wxString, QString, etc., if you're using a framework that supplies such. Sometimes you have no real choice but to use a NUL-terminated byte sequence, but you generally want to avoid it when possible.
Ultimately, there simply is no clear, unambiguous terminology. Such is life.
To use the proper wording (as found in the C++ language standard) string is one of the varieties of std::basic_string (including std::string) from chapter 21.3 "String classes" (as in C++0x N3092), while the argument of ifstream's constructor is NTBS (Null-terminated byte sequence)
To quote, C++0x N3092 27.9.1.4/2.
basic_filebuf* open(const char* s, ios_base::openmode mode);
...
opens a file, if possible, whose name is the NTBS s
I was wondering, I normally use std::string for my code, but when you are passing a string in a parameter for a simply comparison, is it better to just use a literal?
Consider this function:
bool Message::hasTag(string tag)
{
for(Uint tagIndex = 0; tagIndex < m_tags.size();tagIndex++)
{
if(m_tags[tagIndex] == tag)
return 0;
}
return 1;
}
Despite the fact that the property it is making a comparison with is a vector, and whatever uses this function will probably pass strings to it, would it still be better to use a const char* to avoid creating a new string that will be used like a string literal anyway?
If you want to use classes, the best approach here is a const reference:
bool Message::hasTag(const string& tag);
That way, redudant copying can be minimized and it's made clear that the method doesn't intend to modify the argument. I think a clever compiler can emit pretty good code for the case when this is called with a string literal.
Passing a character pointer requires you to use strcmp() to compare, since if you start comparing pointers directly using ==, there will be ... trouble.
Short answer: it depends.
Long answer: std::string is highly useful because it provides a lot of utility functions for strings (searching for substrings, extracting substrings, concatenating strings etc.). It also manages the memory for you, so the ownership of the string cannot be confused.
In your case, you don't need either. You just need to know whether any of the objects in m_tags matches the given string. So for your case, writing the function using a const char *s is perfectly sufficient.
However, as a foot note: you almost always want to prefer std::string over (const) char * when talking about return values. That's because C strings have no ownership semantics at all, so a function returning a const char * needs to be documented very carefully, explaining who owns the pointed to memory (caller or callee) and, in case the callee gets it, how to free it (delete[], delete, free, something else).
I think it would be enough to pass an reference rather than value of string. I mean:
bool Message::hasTag(const string& tag)
That would copy only the reference to the original string value. Which must be created somwhere anyway, but outside of the function. This function would not copy its parameter whatsoever.
Since m_tags is a vector of strings anyway (I suppose), const string& parameter would be better idea.