I'm searching for a substring using string::find in C++. When I defined a string using const auto and used the variable later on, Eclipse replaced . with ->.
I found this SO thread which concludes that auto foo = "bar" is deduced to a (const char *) foo = "bar". So Eclipse is correct in converting . to ->, even though I was a bit baffled at first. I had incorrectly assumed auto would become std::string.
Would there be a downside deducing auto foo = "bar" to std::string instead of const char * ? Increased code size, slower performance?
Your code could have a million classes that can be constructed implicitly from a const char *. Why should std::string be chosen?
auto simply saves you some typing when you want a variable with the same type as the expression¹, not when you want to create a different object.
(1) more or less; things as always get somewhat hairy with C++...
Well, likely, you have just answered your own question. std::string takes slightly more space (it has a size counter), its creation may involve dynamic allocation, etc.
The lack of a complex string type may seem an anachronism nowadays, but since C++ is oriented toward a complete replacement of C with its low-level efficiency, it's pretty explainable.
Moreover, std::string is just a library class. You can choose a different string type, e.g. QString or std::experimental::string_view (std::string_view since C++17), if your task requires it. BTW, string_view is much more similar to const char[] since it doesn't provide dynamic manipulation at all and can be used in constexpr contexts.
"Foobar" is a string literal and not a std::string. This is stored as const char[7] in a read only section of your binary.
The std::string type has an implicit conversion from const char * because it has a non-explicit single-argument constructor, which is invoked if you write: std::string s = "foobar";. Note that the constructor's allocator parameter has a default argument.
Using const auto gives you the actual type instead of a converted type. Converting a string literal to std::string, by contrast, creates another object that holds its own copy of the literal's characters.
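A minimal sketch of the deduction rules discussed above, assuming C++17. The helper name contains and the variable names are made up for illustration; the static_asserts only restate what the compiler deduces for each spelling.

```cpp
#include <string>
#include <string_view>
#include <type_traits>

using namespace std::string_literals;
using namespace std::string_view_literals;

// Hypothetical helper: a substring search that accepts literals,
// const char* and std::string alike through std::string_view.
inline bool contains(std::string_view text, std::string_view needle) {
    return text.find(needle) != std::string_view::npos;
}

const auto raw = "foobar";        // the array decays: const char* const
const auto owned = "foobar"s;     // s suffix: const std::string
constexpr auto view = "foobar"sv; // sv suffix: const std::string_view

// The literal itself is an array of 7 const char (6 letters + '\0').
static_assert(std::is_same_v<decltype("foobar"), const char (&)[7]>);
static_assert(std::is_same_v<decltype(raw), const char* const>);
static_assert(std::is_same_v<decltype(owned), const std::string>);
static_assert(std::is_same_v<decltype(view), const std::string_view>);
```

With raw you would have to fall back on C functions such as std::strstr, while owned.find("oba") and contains(raw, "bar") both use member-style searching.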
http://en.cppreference.com/w/cpp/language/string_literal
http://en.cppreference.com/w/cpp/string/basic_string
We’re updating a project to c++20, and are running into errors where we pass string literals into functions which take char *. I know this has been changed to make code more safe, but we are interfacing with libraries which we cannot change.
I’d rather not disable the strict treatment of literals via compiler flags, so is there a good way to wrap these literals just in these particular cases?
I was thinking of an inline function, that was named something specific to the library, that internally would use const_cast. That way later if we want to change the code because the library gets updated, we know exactly where to look.
Any other ideas?
"Any other ideas?"
static char my_string[] = "string";
...
//elsewhere in the code
library_function(my_string);
The only difference between passing a string like that and passing a string literal is the section of the binary the data is stored in.
A string literal is typically stored in a read-only section (.rodata on most ELF toolchains; some place it alongside the code in .text).
The non-const string will be stored in .data.
If you really, really care if you're passing a function a pointer to .text or a pointer to .data, and you really, really, trust the library to not modify the parameter now and for ever, then you can certainly cast away the const-ness of your string literals.
Leaving aside the fact that documentation lags behind implementation: even if we could believe the documentation's promise not to modify its inputs, as long as the interface does not enforce it, that input could be modified at any time, on purpose or by accident.
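For completeness, the const_cast idea from the question can be sketched as a single named choke point. Both legacy_count and the legacy_lib namespace are hypothetical stand-ins for the real library; the cast is only acceptable while the library never writes through the pointer.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical legacy function: takes char* but only reads from it.
inline std::size_t legacy_count(char* s) { return std::strlen(s); }

namespace legacy_lib {
// The only place where const is cast away. If the library is ever fixed
// to take const char*, searching for legacy_lib::arg finds every call site.
inline char* arg(const char* s) { return const_cast<char*>(s); }
}
```

A call then reads legacy_count(legacy_lib::arg("hello")), which keeps the risk visible at the call site.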
The following user-defined literal creates a std::string for each literal and implicitly converts to char*.
#include <cstddef>
#include <string>

struct stringlitwrapper
{
    constexpr stringlitwrapper(const char* c) : s(c) {}
    operator char*() { return s.data(); } // non-const data() requires C++17
    std::string s;
};

constexpr stringlitwrapper operator""_w(const char* c, std::size_t n)
{
    return stringlitwrapper(c);
}

void libfunction(char* param) {
    // uses non-const char* as parameter
}

int main() {
    libfunction("string literal"_w);
    return 0;
}
For compilers that do not support constexpr here (e.g. MSVC does, Clang does not), drop both constexpr specifiers.
By internally storing the literal as non-const string, there is no undefined behaviour involved.
(The library function of course should not overwrite at all or at least not write over the end of the string.)
To prevent heap allocations, the std::string could be replaced by a char array with fixed (maximal) size.
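The fixed-size variant could look roughly like this. It is a sketch assuming C++17 (class template argument deduction); MutableLiteral and legacy_length are hypothetical names, and the buffer is sized from the literal itself rather than from a chosen maximum.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a library function taking a non-const char*.
inline std::size_t legacy_length(char* s) { return std::strlen(s); }

// Holds a writable copy of a string literal; no heap allocation involved.
template <std::size_t N>
struct MutableLiteral {
    char buf[N];

    // Copies all N characters, including the terminating '\0'.
    constexpr MutableLiteral(const char (&lit)[N]) : buf{} {
        for (std::size_t i = 0; i < N; ++i) buf[i] = lit[i];
    }

    // Lets the wrapper be passed where char* is expected.
    operator char*() { return buf; }
};
```

A call such as legacy_length(MutableLiteral("abc")) deduces N = 4; the temporary buffer lives until the end of the full expression, so the library must not keep the pointer.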
I'm wondering if it's possible in C++ to declare a function parameter that must be a string literal? My goal is to receive an object that I can keep only the pointer to and know it won't be free()ed out from under me (i.e. has application lifetime scope).
For example, say I have something like:
#include <stdlib.h>
#include <string.h>
struct Example {
    Example(const char *s) : string(s) { }
    const char *string;
};
void f() {
    char *freeableFoo = strdup("foo");
    Example e(freeableFoo);   // e.string's lifetime is unknown
    Example e1("literalFoo"); // e1.string is always valid
    free(freeableFoo);
    // e.string is now invalid
}
As shown in the example, when freeableFoo is free()ed the e.string member becomes invalid. This happens without Example's awareness.
Obviously we can get around this if Example copies the string in its constructor, but I'd like to not allocate memory for a copy.
Is a declaration possible for Example's constructor that says "you must pass a string literal" (enforced at compile-time) so Example knows it doesn't have to copy the string and it knows its string pointer will be valid for the application's lifetime?
In C++20 you can do it using a wrapper class that has a consteval converting constructor which takes a string literal:
#include <cstddef>
#include <type_traits>

struct literal_wrapper
{
    template<class T, std::size_t N, std::enable_if_t<std::is_same_v<T, const char>>...>
    consteval literal_wrapper(T (&s)[N]) : p(s) {}

    char const* p;
};
The idea is that string literals have type const char[N] and we match this.
Then you can use this wrapper class in places where you want to enforce passing a string literal:
void takes_literal(literal_wrapper lit) {
    // use lit.p here
}
You can call this as takes_literal("foobar").
Note that this will also match static-storage const char[] arrays, like so:
const char array[] = {'a'};
takes_literal(array); // this compiles
Static arrays share most of the relevant characteristics of string literals, however (for example, static storage duration), so this may work for you.
It does not match local arrays because the decayed pointer value is not a constant expression (that's where consteval comes in).
This answer is almost directly copied from the first variant suggested in C.M.'s comment on the question.
if it's possible in C++ to declare a function parameter that must be a string literal?
In C++ string literals have type char const[N], so that you can declare a parameter to be of such a type:
struct Example {
    template<size_t N>
    Example(char const(&string_literal)[N]); // In C++20 declare this constructor consteval.
    template<size_t N>
    Example(char(&)[N]) = delete; // Reject non-const char[N] arguments.
};
However, not every char const[N] is a string literal. One can have local variables and data members of such types. In C++20 you can declare the constructor as consteval to make it reject non-literal arguments for string_literal parameter.
Conceptually, you'd like to determine the storage duration of the argument to a constructor/function parameter. Or, more precisely, whether the argument has a longer lifetime than Example::string reference to it. C++ doesn't provide that, C++20 consteval is still a poor-man's proxy for that.
The GCC extension __builtin_constant_p determines whether an expression is a compile-time constant, and it is widely used in preprocessor macros in the Linux kernel source code. However, it can only evaluate to 1 on expressions, never on function parameters, so its use is limited to preprocessor macros.
The traditional solution to the problem of differing object lifetimes has been to organize objects into a hierarchy, where objects at lower levels have shorter lifetimes than objects at higher levels, so that an object can always hold a valid plain pointer to an object at a higher level. This approach is somewhat advanced, labour-intensive and error-prone, but it completely obviates the need for garbage collection or smart pointers, so it is only used in ultra-critical applications where no cost is too high. The opposite extreme is using std::shared_ptr/std::weak_ptr for everything, which snowballs into a maintenance nightmare pretty rapidly.
Just make the constructor take an rvalue explicitly:
struct Example {
    Example(const char*&& s) : string(s) { }
    const char* string;
};
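What this rvalue overload accepts and rejects can be checked at compile time. The sketch below assumes C++17 and repeats the Example type so that it is self-contained. It shows that the trick filters out named pointer variables, but not every non-literal: any rvalue pointer, such as the result of std::string::c_str(), still binds.

```cpp
#include <type_traits>

struct Example {
    Example(const char*&& s) : string(s) {}
    const char* string;
};

// A string literal decays to a prvalue pointer, so it is accepted.
static_assert(std::is_constructible_v<Example, const char (&)[4]>);

// A named const char* variable is an lvalue and is rejected.
static_assert(!std::is_constructible_v<Example, const char*&>);

// Any rvalue const char* is accepted as well, literal or not.
static_assert(std::is_constructible_v<Example, const char*>);
```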
How do we differentiate char arrays and string in c++?
Is there anything char arrays do better than std::string ?
How do we differentiate char arrays and string in c++?
You don't, string literals are by definition null-terminated char arrays. Since arrays decay into pointers the first chance they get, const char* is (still) often a synonym for string.
If you are asking about when you should write new char[n], the answer is never. If anything, it should be std::make_unique<char[]>(n); and unless you are writing your own version of std::string, use the standard one. If you need a buffer, use std::vector or std::array.
There are some advantages of const char[] constants over const std::string but they are being "solved" by the new C++ Standards:
Before C++20, std::string could not be used in a constexpr context. So, I still prefer declaring global string constants with constexpr const char[] if all I do is pass them to some function. As @HolyBlackCat mentioned in the comments, C++17 std::string_view makes this use case obsolete too, especially with the new sv literal:
#include <string_view>
using namespace std::literals;
//Compile-time string_view
constexpr auto str = "hello"sv;
const char* is somewhat more universal. You can pass it to a function accepting const char*, std::string, or std::string_view. The reverse requires std::string::c_str(), and for std::string_view it is not possible to do so without copying, because a view is not guaranteed to be null-terminated.
There is no dynamic allocation involved. Although std::string might employ SSO, it is not guaranteed. This might be relevant for very small systems where the heap is precious and the program flash memory is more accommodating and contains the literal anyway.
Interacting with old libraries. But even then, std::string is null-terminated too.
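The point about converting a std::string_view back to a null-terminated string can be illustrated with a short sketch (C++17 assumed). c_api_length is a hypothetical stand-in for any C-style API; the other two helper names are made up as well.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical C-style API that expects a null-terminated string.
inline std::size_t c_api_length(const char* s) { return std::strlen(s); }

// Safe bridge: copy the viewed characters into a std::string,
// whose c_str() is guaranteed to be null-terminated.
inline std::size_t length_via_copy(std::string_view sv) {
    return c_api_length(std::string(sv).c_str());
}

// Unsafe shortcut: data() points into the original buffer and the view
// carries no terminator of its own, so the C API reads past the view.
inline std::size_t length_via_data(std::string_view sv) {
    return c_api_length(sv.data());
}
```

For a view of the first five characters of the literal "hello world", the copying version sees 5 characters while the data() shortcut runs on to the literal's terminator and sees 11.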
Overall, my recommendation would be to use std::string_view every chance you get - for any non-owning string, including holding string literals. Most importantly, it should replace const char* and const std::string& function parameters. If you want to own a string, use std::string.
One reason (which I personally don't think is a very good one) to use char arrays is the use case when you want your code to compile with both a C compiler and a C++ compiler.
Does it matter anymore if I use boost::string_ref over std::string& ? I mean, is it really more efficient to use boost::string_ref over the std version when you are processing strings ? I don't really get the explanation offered here: http://www.boost.org/doc/libs/1_61_0/libs/utility/doc/html/string_ref.html . What really confuses me is the fact that std::string is also a handle class that only points to the allocated memory, and since c++11, with move semantics the copy operations noted in the article above are not going to happen. So, which one is more efficient ?
The use case for string_ref (or string_view in recent Boost and C++17) is for substring references.
The case where
the source string happens to be std::string
and the full length of a source string is referenced
is an (atypical) special case, where it does indeed resemble std::string const&.
Note also that operations on string_ref (like sref.substr(...)) automatically return more string_ref objects, instead of allocating a new std::string.
I have never used it, but it seems to me that its purpose is to provide an interface similar to std::string without having to allocate a string for manipulation. Take the example given, extract_part(): it is given a hard-coded C array "ABCDEFG", but because the initial function takes a std::string an allocation takes place (std::string will have its own copy of "ABCDEFG"). Using string_ref, no allocation occurs; it refers to the initial "ABCDEFG". The constraint is that the string is read-only.
This answer uses the new name string_view to mean the same as string_ref.
What really confuses me is the fact that std::string is also a handle class that only points to the allocated memory
A string allocates, owns, and manages its own memory. A string_view is a handle to some memory that was already allocated. The memory is managed by some other mechanism, unrelated to the string_view.
If you already have some text data, for example in a char array, then the additional memory allocation involved in constructing a string might be redundant. A string_view could be more efficient because it would allow you to operate directly on the original data in the char array. However, it would not permit the data to be modified; string_view allows no non-const access, because it doesn't own the data it refers to.
and since c++11, with move semantics the copy operations noted in the article above are not going to happen.
You can only move from an object that is ready to be discarded. Copying still serves a purpose and is necessary in many cases.
The example in the article constructs two new strings (not copies) and also constructs two copies of existing strings. In C++98 the copies could already be elided by RVO without move semantics, so they're not a big deal. By using string_view it avoids constructing the two new strings. Move semantics are irrelevant here.
In the call to extract_part("ABCDEFG") a string_view is constructed which refers to the char array represented by the string literal. Constructing a string here would have involved a memory allocation and a copy of the char array.
In the call to bar.substr(2,3) a string_view is constructed which refers to parts of the data already referred to by the first string_view. Using a string here would have involved another memory allocation and copy of part of the data.
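A small sketch of those two steps, modelled on the article's extract_part example but using std::string_view in place of boost::string_ref (an assumption made so that the snippet needs only the standard library, C++17):

```cpp
#include <string_view>

// Takes a view and returns a narrower view; no characters are copied
// and no memory is allocated at either step.
inline std::string_view extract_part(std::string_view bar) {
    return bar.substr(2, 3);
}
```

Calling extract_part("ABCDEFG") yields a view equal to "CDE" whose data() pointer lies inside the original literal, two characters past its start.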
So, which one is more efficient?
This is a bit like asking if a hammer is more efficient than a screwdriver. They serve different purposes, so it depends what it is you're trying to accomplish.
You need to be careful when using string_view that the memory it refers to remains valid throughout its lifetime.
If you only ever pass std::string it does not matter, but boost::string_ref also supports const char* directly. That is, do you intend to call your string processing function foo with std::string only?
void foo(const std::string&);
foo("literal"); // compiles, but constructs a temporary std::string (allocation and copy)
Since boost::string_ref is constructible from const char*, it is more flexible: it works with both const char* and std::string without creating a copy.
The proposal N3442 might be helpful.
In short: The main benefit of std::string_view over const std::string& is that you can pass both const char* and std::string objects without doing a copy. As others have said, it also allows you to pass substrings without copying, although (in my experience) this is somewhat less often important.
Consider the following (silly) function (yes I know you could just call s.at(2)):
char getThird(std::string s)
{
    if (s.size() < 3) throw std::runtime_error("String too short");
    return s[2];
}
This function works, but the string is passed by value. This means the whole length of the string is copied even though we don't look at all of it, and it also (often) incurs a dynamic memory allocation. Doing this in a tight loop can be very expensive. One solution to this is to pass the string by const reference instead:
char getThird(const std::string& s);
This works a lot better if you have a std::string variable and you pass it as a parameter to getThird. But now there's a problem: what if you have a null-terminated const char* string? When you call this function, a temporary std::string will get constructed, so you still get the copy and dynamic memory allocation.
Here's another attempt:
char getThird(const char* s)
{
    if (std::strlen(s) < 3) throw std::runtime_error("String too short");
    return s[2];
}
This will obviously now work fine for const char* variables. It will also work for std::string variables, but calling it is a little awkward: getThird(myStr.c_str()). What's more, std::string supports embedded null characters, and getThird will misinterpret the string as ended at the first of these. At worst this could cause a security vulnerability - imagine if the function were called checkStringForBadHacks!
Another problem is simply that it's annoying to write a function in terms of old null-terminated strings instead of std::string objects with their handy methods. Did you notice, for example, that this function looks at the whole length of the string even though only the first few characters are important? It's hidden in std::strlen, which iterates over all characters looking for the null terminator. We could replace that with a manual check that the first three characters aren't null, but you can see this is a lot less convenient than the other versions.
Enter std::string_view (or boost::string_view, previously known as boost::string_ref):
char getThird(std::string_view s)
{
    if (s.size() < 3) throw std::runtime_error("String too short");
    return s[2];
}
This gives you the nice methods you expect from a proper string class, like .size(), and it works in both the situations discussed above, plus another:
It works with std::string objects, which can be implicitly converted to std::string_view objects.
It works with const char* null-terminated strings, which can also be implicitly converted to std::string_view objects.
This does have the potential disadvantage that constructing the std::string_view from a const char* requires iterating over the whole string to find the length, even if the function that uses it never needs it (as is the case here). However, if a caller is using a const char* as a parameter to several functions (or one function in a loop) that take std::string_view objects, it could always construct that object manually beforehand. This could even give a performance increase, because if those functions do need the length, it is computed once and reused.
As other answers have mentioned, it also avoids a copy when you only want to pass a substring. For example, this is very useful in parsing. But std::string_view is justified even without this feature.
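The embedded-null issue mentioned earlier can be made concrete with a side-by-side sketch (C++17 assumed). The two functions are the versions from this answer, renamed getThirdCStr and getThirdView so that they can coexist; withEmbeddedNull is a hypothetical helper.

```cpp
#include <cstring>
#include <stdexcept>
#include <string>
#include <string_view>

inline char getThirdCStr(const char* s) {
    // strlen stops at the first '\0', wherever it is.
    if (std::strlen(s) < 3) throw std::runtime_error("String too short");
    return s[2];
}

inline char getThirdView(std::string_view s) {
    // size() is the real length, embedded nulls included.
    if (s.size() < 3) throw std::runtime_error("String too short");
    return s[2];
}

// A three-character std::string: 'a', '\0', 'c'.
inline std::string withEmbeddedNull() { return std::string("a\0c", 3); }
```

Given the same three-character string, the std::string_view version returns 'c', while the const char* version believes the string has length 1 and throws.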
It's worth noting that there is a case where the original function signature, taking a std::string by value, may actually be better than a std::string_view. That's where you were going to make a copy of the string anyway, for example to store in some other variable or to return from the function. Imagine this function:
std::string changeThird(std::string s, char c)
{
    if (s.size() < 3) throw std::runtime_error("String too short");
    s[2] = c;
    return s;
}
// vs.
std::string changeThird(std::string_view s, char c)
{
    if (s.size() < 3) throw std::runtime_error("String too short");
    std::string result(s); // the std::string_view constructor is explicit
    result[2] = c;
    return result;
}
Note that both of these involve exactly one copy: In the first case this is done implicitly when the parameter s is constructed from whatever is passed in (including if it is another std::string). In the second case we do it explicitly when we create result. But the return statement does not do a copy, because it uses move semantics (as if we had done std::move(result)), or more likely uses the return value optimisation.
The reason the first version can be better is that it is actually possible for it to perform zero copies, if the caller moves the argument:
std::string something = getMyString();
std::string other = changeThird(std::move(something), 'x');
In this case, the first changeThird does not involve any copy at all, whereas the second one does.
I have just done what appears to be a common newbie mistake:
First we read one of many tutorials that goes like this:
#include <fstream>
int main() {
    using namespace std;
    ifstream inf("file.txt");
    // (...)
}
Secondly, we try to use something similar in our code, which goes something like this:
#include <fstream>
#include <string>
int main() {
    using namespace std;
    std::string file = "file.txt"; // Or get the name of the file
                                   // from a function that returns std::string.
    ifstream inf(file);
    // (...)
}
Thirdly, the newbie developer is perplexed by some cryptic compiler error message.
The problem is that ifstream takes const char * as a constructor argument.
The solution is to convert std::string to const char *.
Now, the real problem is that, for a newbie, "file.txt" or similar examples given in almost all the tutorials very much looks like a std::string.
So, is "my text" a std::string, a c-string or a *char, or does it depend on the context?
Can you provide examples on how "my text" would be interpreted differently according to context?
[Edit: I thought the example above would have made it obvious, but I should have been more explicit nonetheless: what I mean is the type of any string enclosed within double quotes, i.e. "myfilename.txt", not the meaning of the word 'string'.]
Thanks.
So, is "string" a std::string, a c-string or a *char, or does it depend on the context?
Neither C nor C++ has a built-in string data type, so any double-quoted strings in your code are essentially const char * (or const char [] to be exact). "C string" usually refers to this, specifically a character array with a null terminator.
In C++, std::string is a convenience class that wraps a raw string into an object. By using this, you can avoid having to do (messy) pointer arithmetic and memory reallocations by yourself.
Most standard library functions still take only char * (or const char *) parameters.
You can implicitly convert a char * into std::string because the latter has a constructor to do that.
You must explicitly convert a std::string into a const char * by using the c_str() method.
Thanks to Clark Gaebel for pointing out constness, and jalf and GMan for mentioning that it is actually an array.
"myString" is a string literal, and has the type const char[9], an array of 9 constant char. Note that it has enough space for the null terminator. So "Hi" is a const char[3], and so forth.
This is pretty much always true, with no ambiguity. However, whenever necessary, a const char[9] will decay into a const char* that points to its first element. And std::string has an implicit constructor that accepts a const char*. So while it always starts as an array of char, it can become the other types if you need it to.
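A small sketch of how the same literal ends up as different types depending on what the receiving function asks for (C++17 assumed). The overload set named which is hypothetical and exists only to report the parameter type that was selected.

```cpp
#include <string>
#include <string_view>

// Two hypothetical overloads that report which parameter type was chosen.
inline std::string_view which(const char*) { return "const char*"; }
inline std::string_view which(const std::string&) { return "std::string"; }

// The literal is an array whose size includes the null terminator.
static_assert(sizeof("Hi") == 3);
static_assert(sizeof("myString") == 9);
```

When both overloads are available, which("text") picks const char*, because array-to-pointer decay is a better match than the user-defined conversion to std::string. Passing an actual std::string object selects the other overload.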
Note that string literals used to have the unique property that const char[N] could also decay into char*, but this conversion was deprecated in C++03 and removed in C++11. If you try to modify the underlying string this way, you end up with undefined behavior. It's just not a good idea.
std::string file = "file.txt";
The right-hand side of the = contains a (raw) string literal (i.e. a null-terminated byte string). Its effective type is array of const char.
The = is a tricky pony here: No assignment happens. The std::string class has a constructor that takes a pointer to char as an argument and this is called to create a temporary std::string and this is used to copy-construct (using the copy ctor of std::string) the object file of type std::string.
The compiler is free to elide the copy ctor and directly instantiate file though.
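However, note that std::string is not the same thing as a C-style null-terminated string. Before C++11 its internal buffer was not even required to be null-terminated; since C++11, c_str() and data() always return a null-terminated array.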
ifstream inf("file.txt");
The std::ifstream class has a ctor that takes a const char * (since C++11 there is also an overload taking const std::string&), and the string literal passed to it decays to a pointer to the first element of the string.
The thing to remember is this: std::string provides (almost seamless) conversion from C-style strings. You have to look up the signature of the function to see if you are passing in a const char * or a std::string (the latter because of implicit conversions).
So, is "string" a std::string, a c-string or a char*, or does it depend on the context?
It depends entirely on the context. :-) Welcome to C++.
A C string is a null-terminated string, which is almost always the same thing as a char*.
Depending on the platforms and frameworks you are using, there might be even more meanings of the word "string" (for example, it is also used to refer to QString in Qt or CString in MFC).
The C++ standard library provides a std::string class to manage and represent character sequences. It encapsulates the memory management and is most of the time implemented as a C-string; but that is an implementation detail. It also provides manipulation routines for common tasks.
The std::string type will always be that (it doesn't have a conversion operator to char* for example, that's why you have the c_str() method), but it can be initialized or assigned to by a C-string (char*).
On the other hand, if you have a function that takes a std::string or a const std::string& as a parameter, you can pass a c-string (char*) to that function and the compiler will construct a std::string in-place for you. That would be a differing interpretation according to context as you put it.
Neither C nor C++ has a built-in string data type.
When the compiler finds a double-quoted string that is used implicitly through a pointer (see the code below), it stores the characters in the program's read-only data and creates a character array for them:
The array is created in static storage because it must persist to be referred to later.
The array is constant because it must always contain the original data (Hello).
So in the end, what you have is a const char * to this constant static character array.
const char* v()
{
    const char* text = "Hello";
    return text;
    // Above code can be reduced to:
    // return "Hello";
}
At run time, entering the function creates text, a pointer, on the stack. The constant array of 6 elements (including the null terminator '\0' at the end) already exists in static memory for the whole run of the program. The initialization assigns the address of the array's first element to text, and return text; returns that address. When the function exits, text disappears from the stack, but the array is still in static memory, so the returned pointer stays valid.
In C, and in C++ before C++11 through a deprecated conversion, the pointer could be declared as plain char*. Writing through such a pointer is still undefined behaviour and typically fails at run time, because the array is constant. So it is always good to use const char*, to make sure the array cannot be modified through a non-const pointer.
But if the double-quoted string is explicitly used to initialize an array, the compiler assumes that the programmer is going to handle the array's lifetime. See the following wrong example:
const char* v()
{
    char text[] = "Hello";
    return text; // wrong: returns the address of a local array
}
During compilation, the compiler stores the quoted text in the program so that it can fill the array at run time. It also calculates the array size, in this case again 6.
At run time, entering the function creates the 6-element array text on the stack, and the declaration char text[] = "Hello"; initializes it with the stored characters. return text; returns the address of the array's first element. When the function exits, the array disappears from the stack, so the returned pointer no longer refers to a valid object.
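Two safe alternatives to the wrong example can be sketched as follows; the function names are made up for illustration. The first returns a pointer to the literal itself, the second copies the local array into a std::string before the array goes away.

```cpp
#include <string>

// Safe: the literal has static storage duration, so the pointer
// remains valid after the function returns.
inline const char* greeting_literal() { return "Hello"; }

// Safe: the local array is copied into the returned std::string
// before the array's lifetime ends.
inline std::string greeting_copy() {
    char text[] = "Hello"; // local array of 6 chars, filled from the literal
    text[0] = 'J';         // modifying the local copy is fine
    return std::string(text);
}
```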
Most standard library functions still take only char * (or const char *) parameters.
The Standard C++ library has a powerful class called string for manipulating text. The internal data structure for string is character arrays. The Standard C++ string class is designed to take care of (and hide) all the low-level manipulations of character arrays that were previously required of the C programmer. Note that std::string is a class:
You can implicitly convert a char * into std::string because the
latter has a constructor to do that.
You can explicitly convert a std::string into a const char * by using the c_str() method.
As often as possible it should mean std::string (or an alternative such as wxString, QString, etc., if you're using a framework that supplies one). Sometimes you have no real choice but to use a NUL-terminated byte sequence, but you generally want to avoid it when possible.
Ultimately, there simply is no clear, unambiguous terminology. Such is life.
To use the proper wording (as found in the C++ language standard), a string is one of the varieties of std::basic_string (including std::string) from chapter 21.3 "String classes" (as in C++0x N3092), while the argument of ifstream's constructor is an NTBS (null-terminated byte string).
To quote, C++0x N3092 27.9.1.4/2.
basic_filebuf* open(const char* s, ios_base::openmode mode);
...
opens a file, if possible, whose name is the NTBS s