We’re updating a project to C++20 and are running into errors where we pass string literals to functions that take char *. I know this was changed to make code safer, but we are interfacing with libraries that we cannot change.
I’d rather not disable the strict treatment of literals via compiler flags, so is there a good way to wrap these literals just in these particular cases?
I was thinking of an inline function, named after the specific library, that uses const_cast internally. That way, if we later want to change the code because the library gets updated, we know exactly where to look.
Any other ideas?
"Any other ideas?"
static char my_string[] = "string";
...
//elsewhere in the code
library_function(my_string);
The only difference between passing a string like that and passing a string literal is the section of the binary in which the data is stored.
A string literal is typically stored in a read-only section (.rodata on ELF targets, .rdata on Windows); the exact placement is up to the implementation.
The non-const array is stored in .data, which is writable.
If you really, really care whether you're passing the function a pointer into read-only data or a pointer into .data, and you really, really trust the library not to modify the parameter now and forever, then you can certainly cast away the const-ness of your string literals.
Ignoring the fact that documentation lags behind implementation: even if we could believe the documentation's promise not to modify its inputs, as long as the interface doesn't enforce that promise, the input could be modified at any time, on purpose or by accident.
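The wrapper idea from the question could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: legacy_strlen is a hypothetical stand-in for a third-party function that takes char* but never writes through it, and the legacy_shim namespace and the as_mutable name are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a library function we cannot change.
// It takes char* but only reads from the buffer.
inline std::size_t legacy_strlen(char* s) { return std::strlen(s); }

namespace legacy_shim {
// The single place where const is cast away for this library. This is only
// safe while the callee never writes through the pointer: writing to a
// string literal is undefined behaviour.
inline char* as_mutable(const char* s) {
    assert(s != nullptr);
    return const_cast<char*>(s);
}
}  // namespace legacy_shim
```

A call then reads legacy_strlen(legacy_shim::as_mutable("hello")), and searching for legacy_shim finds every call site that depends on the library keeping its promise.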
The following user-defined literal creates a std::string for each literal and converts it implicitly to char*.
#include <cstddef>
#include <string>

struct stringlitwrapper
{
    // Copy exactly n characters so embedded '\0' characters are preserved.
    constexpr stringlitwrapper(const char* c, std::size_t n) : s(c, n) {}
    operator char*() { return s.data(); }  // non-const data() requires C++17
    std::string s;
};

constexpr stringlitwrapper operator""_w(const char* c, std::size_t n)
{
    return stringlitwrapper(c, n);
}

void libfunction(char* param) {
    // uses non-const char* as parameter
}

int main() {
    libfunction("string literal"_w);
    return 0;
}
For compilers whose standard library does not support constexpr here (at the time of writing, MSVC did and Clang did not), drop both constexpr specifiers.
Because the literal is stored internally as a non-const string, no undefined behaviour is involved.
(The library function should of course not write past the end of the string, if it writes at all.)
To prevent heap allocations, the std::string could be replaced by a char array with fixed (maximal) size.
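One way to sketch the heap-free variant is shown below. Note the assumptions: instead of a single fixed maximal size, this version sizes the buffer to each literal through a template parameter (C++17 class template argument deduction), and fixed_buffer and legacy_upper are hypothetical names, the latter standing in for a library function that really does modify its argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Owns a writable copy of a string literal, sized to the literal itself
// (terminator included), so no heap allocation takes place.
template <std::size_t N>
struct fixed_buffer {
    char data[N];

    constexpr fixed_buffer(const char (&lit)[N]) : data{} {
        for (std::size_t i = 0; i < N; ++i) data[i] = lit[i];
    }

    operator char*() { return data; }
};

// Hypothetical library function that upper-cases its argument in place.
inline void legacy_upper(char* s) {
    assert(s != nullptr);
    for (; *s != '\0'; ++s) {
        if (*s >= 'a' && *s <= 'z') *s = static_cast<char>(*s - 'a' + 'A');
    }
}
```

With this in place, fixed_buffer buf("abc"); legacy_upper(buf); modifies only the local copy. A temporary such as legacy_upper(fixed_buffer("abc")) also works, since the temporary lives until the end of the full expression.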
I'm wondering if it's possible in C++ to declare a function parameter that must be a string literal? My goal is to receive an object that I can keep only the pointer to and know it won't be free()ed out from under me (i.e. has application lifetime scope).
For example, say I have something like:
#include <string.h>
struct Example {
Example(const char *s) : string(s) { }
const char *string;
};
void f() {
char *freeableFoo = strdup("foo");
Example e(freeableFoo); // e.string's lifetime is unknown
Example e1("literalFoo"); // e1.string is always valid
free(freeableFoo);
// e.string is now invalid
}
As shown in the example, when freeableFoo is free()ed the e.string member becomes invalid. This happens without Example's awareness.
Obviously we can get around this if Example copies the string in its constructor, but I'd like to not allocate memory for a copy.
Is a declaration possible for Example's constructor that says "you must pass a string literal" (enforced at compile-time) so Example knows it doesn't have to copy the string and it knows its string pointer will be valid for the application's lifetime?
In C++20 you can do it using a wrapper class that has a consteval converting constructor which takes a string literal:
#include <cstddef>
#include <type_traits>

struct literal_wrapper
{
    template<class T, std::size_t N, std::enable_if_t<std::is_same_v<T, const char>>...>
    consteval literal_wrapper(T (&s)[N]) : p(s) {}

    char const* p;
};
The idea is that string literals have type const char[N] and we match this.
Then you can use this wrapper class in places where you want to enforce passing a string literal:
void takes_literal(literal_wrapper lit) {
    // use lit.p here
}
You can call this as takes_literal("foobar").
Note that this will also match static-storage const char[] arrays, like so:
const char array[] = {'a'};
takes_literal(array); // this compiles
Static arrays share most of the characteristics of string literals, however (e.g., static storage duration), so this may be acceptable for you.
It does not match local arrays because the decayed pointer value is not a constant expression (that's where consteval comes in).
This answer is almost directly copied from the first variant suggested in C.M.'s comment on the question.
if it's possible in C++ to declare a function parameter that must be a string literal?
In C++ string literals have type char const[N], so that you can declare a parameter to be of such a type:
struct Example {
template<size_t N>
Example(char const(&string_literal)[N]); // In C++20 declare this constructor consteval.
template<size_t N>
Example(char(&)[N]) = delete; // Reject non-const char[N] arguments.
};
However, not every char const[N] is a string literal. One can have local variables and data members of such types. In C++20 you can declare the constructor as consteval to make it reject non-literal arguments for string_literal parameter.
Conceptually, you'd like to determine the storage duration of the argument to a constructor/function parameter. Or, more precisely, whether the argument has a longer lifetime than Example::string reference to it. C++ doesn't provide that, C++20 consteval is still a poor-man's proxy for that.
The GCC extension __builtin_constant_p determines whether an expression is a compile-time constant; it is widely used in preprocessor macros in the Linux kernel source code. However, it does not reliably evaluate to 1 on function parameters (at best when the call is inlined with optimization enabled), so its use is mostly limited to preprocessor macros.
The traditional solution to the problem of differing object lifetimes has been to organize objects into a hierarchy, where objects at lower levels have shorter lifetimes than objects at higher levels, so that an object can always hold a valid plain pointer to an object at a higher level of the hierarchy. This approach is somewhat advanced, labour-intensive and error-prone, but it completely obviates the need for garbage collection or smart pointers, so it is only used in ultra-critical applications where no cost is too high. The opposite extreme is using std::shared_ptr/std::weak_ptr for everything, which snowballs into a maintenance nightmare pretty rapidly.
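Just make the constructor take an rvalue explicitly. Note that this only rejects named const char* variables; a string literal binds because its array decays to a temporary pointer, and so does any other temporary pointer: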
struct Example {
Example(const char*&& s) : string(s) { }
const char* string;
};
I'm searching for a substring using string::find in C++. When I defined a string using const auto and used the variable later down, eclipse replaced . with ->.
I found this SO thread which concludes that auto foo = "bar" deduces foo as const char *. So Eclipse is correct in converting . to ->, even though I was a bit baffled to begin with. I incorrectly assumed auto would become std::string.
Would there be a downside deducing auto foo = "bar" to std::string instead of const char * ? Increased code size, slower performance?
Your code could have a million classes that can be constructed implicitly from a const char *. Why should std::string be chosen?
auto simply saves you some typing when you want a variable with the same type as the expression¹, not when you want to create a different object.
(1) more or less; things as always get somewhat hairy with C++...
Well, likely, you have just answered your own question. std::string takes slightly more space (it has size counter), its creation involves dynamic allocation etc.
The lack of a complex string type may seem an anachronism nowadays, but since C++ is oriented toward a complete replacement of C with its low-level efficiency, it's pretty explainable.
Moreover, std::string is just a library class. You can choose a different string type, e.g. QString or std::string_view (std::experimental::string_view before C++17), if your task requires it. By the way, string_view is much closer to const char[], since it provides no dynamic manipulation at all and can be used in constexpr contexts.
"Foobar" is a string literal and not a std::string. This is stored as const char[7] in a read only section of your binary.
The std::string type has an implicit conversion from const char * because it has a single-argument constructor that is not marked explicit, which is invoked if you write: std::string s = "foobar";. Note that the constructor's allocator parameter takes its default argument.
Using const auto gives you the (decayed) type of the expression instead of a converted type. Converting a string literal to std::string actually creates another object that holds its own copy of the literal's characters.
http://en.cppreference.com/w/cpp/language/string_literal
http://en.cppreference.com/w/cpp/string/basic_string
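A small sketch of what auto deduces in each case may help here. It assumes C++17; the function name deduction_demo is made up for illustration, while the s and sv suffixes are the standard library's literal operators for std::string and std::string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Returns the position of "ar" in the std::string plus the position of 'r'
// in the std::string_view.
inline std::size_t deduction_demo() {
    using namespace std::string_literals;       // enables the s suffix
    using namespace std::string_view_literals;  // enables the sv suffix

    auto plain = "bar";    // const char*: the array decays, nothing is copied
    auto owned = "bar"s;   // std::string: owns a copy of the characters
    auto view  = "bar"sv;  // std::string_view: pointer plus length, no copy

    static_assert(std::is_same_v<decltype(plain), const char*>);
    static_assert(std::is_same_v<decltype(owned), std::string>);
    static_assert(std::is_same_v<decltype(view), std::string_view>);

    assert(plain[0] == 'b');
    return owned.find("ar") + view.find('r');
}
```

So if a std::string is what you want from auto, the s suffix asks for it explicitly, and the cost (a copy, possibly an allocation) is visible at the point of use.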
Is there any simple method to detect, if the parameter passed to a function(const char *argument) was a constant literal or a variable?
I'm trying to fix errors in some code, which is filled with IsBadWritePtr calls, which throw access violation exceptions if the parameter was a constant literal.
This was a terrible design stupidity but now I'm not allowed to change the awkward behavior.
You can add a different overload that will be a better match for string literals. This is not really science but just heuristics:
void f(const char* p); // potential literal
void f(char *p); // pointer to non-const
Another idea would be to take advantage of the fact that literals are really arrays:
template <int N>
void f(const char (&_)[N]); // potential literal
Note that they don't quite detect literal vs. not literal, but rather some of the other features. const char* p = createANewString(); f(p); will resolve to f(const char*), and const char x[] = { 'A', 'b', 'c', '\0' }; will resolve to the template. Neither of them are literals, but you probably don't want to modify either.
Once you make that change, it should be simple to find out where each of the overloads is called.
This all works on the premise that the main function should not take the argument as const char* if it modifies it internally, and that the issue you are facing is because for backwards compatibility your compiler is allowing the call to a function that takes a pointer to non-const with a literal...
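The overload heuristic above can be sketched as follows. The names Kind, classify and array_extent are hypothetical; the array-reference template is kept as a separate function because, when it competes directly with a const char* overload, the non-template overload is preferred for a literal argument.

```cpp
#include <cassert>
#include <cstddef>

enum class Kind { MaybeLiteral, Mutable };

// Chosen for string literals and for any other pointer to const.
inline Kind classify(const char* p) {
    assert(p != nullptr);
    return Kind::MaybeLiteral;
}

// Chosen for writable buffers: char* is a better match than const char*.
inline Kind classify(char* p) {
    assert(p != nullptr);
    return Kind::Mutable;
}

// Accepts only real arrays (a literal is a const char[N]); a plain pointer
// does not bind, because the bound N cannot be deduced from it.
template <std::size_t N>
constexpr std::size_t array_extent(const char (&)[N]) { return N; }
```

As the answer notes, this detects constness and array-ness, not literals as such: a const char* that points into a writable buffer is still classified as MaybeLiteral.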
I don't think there is a way to detect that, at least not without some hackery.
Since the interface takes a const char * the responsibility of the function is to not modify the passed string anyways. You need to modify the implementation because it is simply incorrect.
VirtualQuery can be used to detect if the address is writable, read-only, or inaccessible. Examine the State and Protect members of the returned MEMORY_BASIC_INFORMATION structure to see if the memory is accessible and has the access you need.
One VERY hackish way would involve checking if the pointer is in the .rdata segment.
Use dumpbin /headers after the build to retrieve the offset and length of the .rdata section, or parse the PE headers yourself. Naturally, this is toolchain-specific and generally a bad idea. Also, if the code needs to interoperate with DLLs, you'd have to check several executables and several .rdata segments.
I have just done what appears to be a common newbie mistake:
First we read one of many tutorials that goes like this:
#include <fstream>
int main() {
using namespace std;
ifstream inf("file.txt");
// (...)
}
Secondly, we try to use something similar in our code, which goes something like this:
#include <fstream>
int main() {
using namespace std;
std::string file = "file.txt"; // Or get the name of the file
// from a function that returns std::string.
ifstream inf(file);
// (...)
}
Thirdly, the newbie developer is perplexed by some cryptic compiler error message.
The problem is that, before C++11, ifstream only takes a const char * as a constructor argument.
The solution is to convert the std::string to a const char * (C++11 added a constructor that accepts std::string directly).
Now, the real problem is that, for a newbie, "file.txt" or similar examples given in almost all the tutorials very much looks like a std::string.
So, is "my text" a std::string, a c-string or a char*, or does it depend on the context?
Can you provide examples on how "my text" would be interpreted differently according to context?
[Edit: I thought the example above would have made it obvious, but I should have been more explicit nonetheless: what I mean is the type of any string enclosed within double quotes, i.e. "myfilename.txt", not the meaning of the word 'string'.]
Thanks.
So, is "string" a std::string, a c-string or a char*, or does it depend on the context?
Neither C nor C++ have a built-in string data type, so any double-quoted strings in your code are essentially const char * (or const char [] to be exact). "C string" usually refers to this, specifically a character array with a null terminator.
In C++, std::string is a convenience class that wraps a raw string into an object. By using this, you can avoid having to do (messy) pointer arithmetic and memory reallocations by yourself.
Most standard library functions still take only char * (or const char *) parameters.
You can implicitly convert a char * into std::string because the latter has a constructor to do that.
You must explicitly convert a std::string into a const char * by using the c_str() method.
Thanks to Clark Gaebel for pointing out constness, and jalf and GMan for mentioning that it is actually an array.
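The two conversion directions described above can be sketched like this. The functions c_api_length and cpp_api_length are hypothetical stand-ins for a C-style API and a std::string-based API; they are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a C-style API such as the pre-C++11 ifstream constructor.
inline std::size_t c_api_length(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}

// Stand-in for an API that takes std::string.
inline std::size_t cpp_api_length(const std::string& s) { return s.size(); }

inline bool conversions_agree() {
    std::string file = "file.txt";  // const char[9] -> const char* -> std::string
    return cpp_api_length("file.txt") == 8   // implicit temporary std::string
        && c_api_length(file.c_str()) == 8   // explicit conversion back
        && c_api_length("file.txt") == 8;    // literal decays to const char*
}
```

The asymmetry is deliberate: building a std::string from a pointer is safe to do implicitly, while handing out a pointer into a std::string has lifetime implications, so it requires an explicit c_str() call.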
"myString" is a string literal, and has the type const char[9], an array of 9 constant char. Note that it has enough space for the null terminator. So "Hi" is a const char[3], and so forth.
This is pretty much always true, with no ambiguity. However, whenever necessary, a const char[9] will decay into a const char* that points to its first element. And std::string has an implicit constructor that accepts a const char*. So while it always starts as an array of char, it can become the other types if you need it to.
Note that string literals used to have the unique property that const char[N] could also decay into char*; this conversion was deprecated in C++03 and removed in C++11. If you try to modify the underlying string this way, you end up with undefined behavior. It's just not a good idea.
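These type claims can be checked at compile time. The following is a small sketch assuming C++17; extent_of and first_char_of_hi are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// "Hi" is an lvalue of type const char[3]: two characters plus '\0'.
static_assert(sizeof("Hi") == 3);
static_assert(std::is_same_v<decltype("Hi"), const char (&)[3]>);

// Decay turns the array type into a pointer to its first element.
static_assert(std::is_same_v<std::decay_t<decltype("Hi")>, const char*>);

// Binding to a reference-to-array keeps the size visible to the callee.
template <std::size_t N>
constexpr std::size_t extent_of(const char (&)[N]) { return N; }
static_assert(extent_of("myString") == 9);

inline const char* first_char_of_hi() {
    const char* p = "Hi";  // array-to-pointer decay happens here
    assert(p[2] == '\0');  // the terminator is part of the array
    return p;
}
```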
std::string file = "file.txt";
The right-hand side of the = contains an ordinary string literal (i.e. a null-terminated byte string). Its type is array of const char.
The = is a tricky pony here: no assignment happens. The std::string class has a constructor that takes a pointer to const char as an argument; this is called to create a temporary std::string, which is then used to copy-construct (using the copy ctor of std::string) the object file of type std::string.
The compiler is free to elide the copy ctor and directly instantiate file though.
However, note that std::string is not the same thing as a C-style null-terminated string. Before C++11 its internal buffer was not even required to be null-terminated.
ifstream inf("file.txt");
The std::ifstream class has a ctor that takes a const char * and the string literal passed to it decays to a pointer to the first element of the string.
The thing to remember is this: std::string provides (almost seamless) conversion from C-style strings. You have to look up the signature of the function to see if you are passing in a const char * or a std::string (the latter because of implicit conversions).
So, is "string" a std::string, a c-string or a char*, or does it depend on the context?
It depends entirely on the context. :-) Welcome to C++.
A C string is a null-terminated string, which is almost always the same thing as a char*.
Depending on the platforms and frameworks you are using, there might be even more meanings of the word "string" (for example, it is also used to refer to QString in Qt or CString in MFC).
The C++ standard library provides a std::string class to manage and represent character sequences. It encapsulates the memory management and is most of the time implemented as a C-string; but that is an implementation detail. It also provides manipulation routines for common tasks.
The std::string type will always be that (it doesn't have a conversion operator to char* for example, that's why you have the c_str() method), but it can be initialized or assigned to by a C-string (char*).
On the other hand, if you have a function that takes a std::string or a const std::string& as a parameter, you can pass a c-string (char*) to that function and the compiler will construct a std::string in-place for you. That would be a differing interpretation according to context as you put it.
Neither C nor C++ have a built-in string data type.
When the compiler encounters a double-quoted string that is referred to implicitly (see the code below), it stores the string itself in the program's read-only data and generates a constant character array for it:
The array is created in static storage because it must persist so that it can be referred to later.
The array is made constant because it must always contain the original data (Hello).
So in the end, what you have is a const char * to this constant static character array.
const char* v()
{
    const char* text = "Hello";
    return text;
    // The code above can be reduced to:
    // return "Hello";
}
During the program run, when control reaches the opening brace, the pointer text is created on the stack; the constant array of 6 elements (including the null terminator '\0' at the end) already exists in static memory. When control reaches the declaration of text, the starting address of the 6-element array is assigned to text. The next line (return text;) returns that address. At the closing brace text disappears from the stack, but the array is still in the static memory area.
In C the pointer need not be const, and C++ before C++11 also accepted a non-const char* pointing at a literal. But if you try to change the value in the static array through a non-const char*, the behaviour is undefined (typically a crash at run time), because the array is constant. So it is always good to make the return type const, to make sure the array cannot be referred to through a non-const pointer.
But if a double-quoted string is used explicitly to initialize an array, the compiler assumes that the programmer is going to handle it (smartly). See the following wrong example:
const char* v()
{
    char text[] = "Hello";
    return text;  // wrong: returns the address of a local array
}
During compilation, the compiler checks the quoted text and saves it as-is in the program so that it can fill the generated array at run time. It also calculates the array size, in this case again 6.
During the program run, at the opening brace, the array text with 6 elements is created on the stack, but not yet initialized. When execution reaches the declaration of text, the array is initialized (with the text stored in the compiled program). So the array is now on the stack. When execution reaches return text;, it returns the starting address of the array text. At the closing brace the array disappears from the stack, so the returned pointer dangles and must not be used.
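Two safe alternatives to the wrong example can be sketched as follows; the function names greeting_literal and greeting_copy are made up for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Safe: the literal has static storage duration, so the returned pointer
// stays valid for the whole program run.
inline const char* greeting_literal() { return "Hello"; }

// Safe: the local array is writable, and its contents are copied into the
// returned std::string before the array goes out of scope.
inline std::string greeting_copy() {
    char text[] = "Hello";  // 6 bytes on the stack, initialized from the literal
    text[0] = 'J';          // fine: this modifies the copy, not the literal
    assert(std::strlen(text) == 5);
    return std::string(text);
}
```

The first form suits fixed text; the second is the usual choice when the text has to be modified before it is returned.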
Most standard library functions still take only char * (or const char *) parameters.
The Standard C++ library has a powerful class called string for manipulating text. The internal data structure for string is character arrays. The Standard C++ string class is designed to take care of (and hide) all the low-level manipulations of character arrays that were previously required of the C programmer. Note that std::string is a class:
You can implicitly convert a char * into std::string because the latter has a constructor to do that.
You can explicitly convert a std::string into a const char * by using the c_str() method.
As often as possible it should mean std::string (or an alternative such as wxString, QString, etc., if you're using a framework that supplies one). Sometimes you have no real choice but to use a NUL-terminated byte sequence, but you generally want to avoid it when possible.
Ultimately, there simply is no clear, unambiguous terminology. Such is life.
To use the proper wording (as found in the C++ language standard), a string is one of the varieties of std::basic_string (including std::string) from chapter 21.3 "String classes" (as in C++0x N3092), while the argument of ifstream's constructor is an NTBS (null-terminated byte string).
To quote, C++0x N3092 27.9.1.4/2.
basic_filebuf* open(const char* s, ios_base::openmode mode);
...
opens a file, if possible, whose name is the NTBS s
If I have an API function that expects a 14-digit input and returns a 6-digit output, I basically define the input as a const char *. Would that be the correct and safe thing to do?
Also, why would I not want to use just char *? I could, but it seems more prudent to use const char * in that case, especially since it's an API that I am providing. For different input values I generate 6-digit codes.
I am not sure why are you using char pointers, where you could use std::string:
std::string code(const std::string& input)
{ ... }
If you don't have the choice, using const char* guarantees to the user that you won't change their data, which matters especially if it is a string literal, where modification is undefined behavior.
By using const you're promising your user that you won't change the string being passed in. It becomes part of the API, helping define your function's behavior. It also lets users pass constant strings, including literal strings like "mystring".
When you say const char *c you are telling the compiler that you will not be making any changes to the data that c points to. So this is a good practice if you will not be directly modifying your input data.
String literals have static storage duration (they exist for the duration of the program) and may or may not be shared if the same string literal is referenced from multiple locations in a program. The effect of modifying a string literal is undefined; thus, you should always declare a pointer to a string literal as const char *.
You get several benefits for using const:
It documents your code, the user knows no harm will be done to this string.
You allow the user to send a const char* which he might have. Converting from non-const to const is automatic. The other way around is something that should be avoided (And done explicitly, and might lead to undefined behavior at times)
You let the compiler check you. The compiler can now verify that you don't accidentally change the user's string.
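As a sketch of these benefits, here is a read-only function that accepts literals, writable buffers and pointers to const alike. The name count_digits is hypothetical and is not the asker's actual API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// const char* documents that the input is only read, and the compiler
// rejects any accidental write through s inside the function.
inline std::size_t count_digits(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        // The cast avoids undefined behaviour for negative char values.
        if (std::isdigit(static_cast<unsigned char>(*s))) ++n;
    }
    return n;
}
```

A caller can pass a literal such as "12345678901234" directly, and a writable char buffer converts to const char* automatically. With a char* parameter the literal would be rejected by a conforming C++11 compiler.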
You need to use a const char * anywhere that you're passing a string literal, or the compiler will balk (assuming you don't want to convert it to a std::string).
const char* is usually used in parameters, stating that your function won't modify that string.
void function(char* modified_str, const char* not_modified_str) { ... }
If you're returning a const char*, what you want to say is not obvious. You are trying to say that nobody should modify the returned string, but you may still (I think it would be that way) be transferring ownership to the calling routine, which would then have to invoke delete[] on the char array that your function returned.
Generally speaking, use std::string; then your function will look like this:
std::string function(std::string& modified_str, const std::string& not_modified_str) { ... }