How can strstr return a non-const pointer? - c++

The standard function strstr is used to find the location of a substring in a string. Both arguments of the function are of type const char *, but the return type is char *.
I would like to know how a standard function can be implemented in a way that violates const-correctness.

All the const char * is telling you is that strstr is not going to modify the string you pass into it.
Whether you modify the returned string or not is up to you, as it is your string!
In C++ this has been changed by overloading the function with two versions: the const-input version has a const output.
C doesn't have quite that level of safety built in and assumes you know yourself whether you should be modifying the returned string.
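A minimal sketch of the two C++ overloads in action, assuming a conforming standard library whose <cstring> declares both `const char* strstr(const char*, const char*)` and `char* strstr(char*, const char*)` in namespace std. The names fixed_text, writable_text and capitalize_first are illustrative only; the static_asserts check which overload is picked for each kind of input.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <type_traits>

const char fixed_text[] = "read-only text";  // const array: const overload
char writable_text[] = "mutable text";       // writable array: char* overload

static_assert(std::is_same<decltype(std::strstr(fixed_text, "x")), const char*>::value,
              "const input yields a pointer to const");
static_assert(std::is_same<decltype(std::strstr(writable_text, "x")), char*>::value,
              "non-const input yields a plain char*");

// Upper-cases the first character of the first occurrence of `word`.
// This compiles only because the char* overload returns char*.
void capitalize_first(const char* word) {
    char* hit = std::strstr(writable_text, word);
    if (hit != nullptr) {
        hit[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(hit[0])));
    }
    // const char* ro = std::strstr(fixed_text, "only");
    // ro[0] = 'O';  // would not compile: ro points to const char
}
```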

C allows pointing to memory with const or non-const pointers, regardless of whether the object was defined with the const qualifier.
6.5 Expressions
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a qualified version of a type compatible with the effective type of the object,
The prototype of strstr in C is:
char *strstr(const char *s1, const char *s2);
The returned pointer, if valid, points into the string s1. Turning a pointer to const into a pointer to non-const can be achieved with a cast:
const char safe = 's';
char* careful = (char*)&safe;
The problem is modifying that memory.
6.7.3 Type qualifiers
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
Since you created the string, you should know whether you can modify it or not; if you must not, you can accept the return value with a pointer to const to avoid any problems:
const char* find = strstr( ... ) ;
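To show how a single function can take const input and hand back a non-const pointer, here is a naive search with the C-style signature. It is a sketch, not any real library's implementation: the name my_strstr is hypothetical, and C++ const_cast stands in for the C-style (char*) cast above. Writing through the result is only valid if s1 pointed to writable memory in the first place.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical naive strstr with the C signature: const in, non-const out.
// const is cast away only when a pointer into s1 is returned.
char* my_strstr(const char* s1, const char* s2) {
    const std::size_t n = std::strlen(s2);
    if (n == 0) {
        return const_cast<char*>(s1);  // an empty needle matches at the start
    }
    for (const char* p = s1; *p != '\0'; ++p) {
        if (std::strncmp(p, s2, n) == 0) {
            return const_cast<char*>(p);  // the "violation" of const-correctness
        }
    }
    return nullptr;  // no match
}
```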

According to ISO C++ 21.8(7), strstr returns a const char* or a char* depending on whether it is passed a const char* or a char*:
const char* strstr(const char* s1, const char* s2);
char* strstr(char* s1, const char* s2);

It's done by specifying the signature, and leaving the implementation up to the compiler builders.
Note that returning a char* which points to a const char[] string is just dangerous, but not yet a violation of any rule. However, any attempt to write to that memory is still Undefined Behavior.
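Since the standard only fixes the signatures, one plausible way to provide such a pair without duplicating the search is to have the non-const overload forward to the const one. This is a sketch under that assumption; find_sub is a hypothetical name and says nothing about how any particular vendor does it. The cast is safe here because the char* overload is only reachable with a writable s1.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical const overload: const in, const out.
const char* find_sub(const char* s1, const char* s2) {
    return std::strstr(s1, s2);
}

// Hypothetical non-const overload: s1 is known to be writable, so removing
// const from a pointer into s1 cannot expose read-only memory.
char* find_sub(char* s1, const char* s2) {
    return const_cast<char*>(find_sub(static_cast<const char*>(s1), s2));
}
```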

The strstr function dates back to an era before there was such a thing as a const pointer. In cases where it would be legal for code to write to memory identified by the first pointer passed to strstr, it would be legal for code to write to memory identified by the returned pointer; and in cases where the returned value would only be used in ways that were legal with a pointer to read-only memory (e.g. a string literal), one could legally pass such a pointer to strstr.
If functionality similar to strstr were being defined today, it might be implemented using two functions: one which could accept any pointer and return a pointer which could not be written by the recipient, and one which would only accept writable pointers but would return a pointer that the recipient could use as it saw fit. Because some code which used strstr needed to pass read-only pointers, however, and some code needed to be able to write through the pointers strstr yielded when given writable pointers, it was necessary to have one set of pointer qualifications work both ways. The consequence is a set of pointer qualifications which are not really "safe" (since strstr may return a writable pointer to a read-only area of memory) and which will let code compile in some cases where it really "shouldn't", but which allow code written before the days of const pointers to continue working as intended.

C++ provides two versions: one that takes const arguments and one that takes non-const ones.
const char* strstr( const char* str, const char* target );
char* strstr( char* str, const char* target );
Since C cannot overload, we are left with two unpleasant choices:
Either we take the arguments as non-const, but then if our sources are indeed const we need to perform an unpleasant cast to non-const.
Or, the option C actually chose, we take the arguments as const but return a non-const pointer. We could return a const char*, but then we could never modify the result.

The return value is not the variable that you passed to the function as a parameter; it is a pointer into that string. This function returns a pointer to the first occurrence in haystack of the entire sequence of characters specified in needle, or a null pointer if the sequence is not present in haystack.
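Because the result points into haystack (or is null), the position of the match can be recovered by pointer subtraction. A minimal sketch; find_index and the -1 "not found" convention are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: converts strstr's pointer result into an index.
// Returns -1 when needle does not occur in haystack.
std::ptrdiff_t find_index(const char* haystack, const char* needle) {
    const char* hit = std::strstr(haystack, needle);
    if (hit == nullptr) {
        return -1;          // strstr returned a null pointer: no match
    }
    return hit - haystack;  // hit points into haystack, not into a copy
}
```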

Related

Is it good practice to pass `char []` to a function which accepts `std::string&`

I am not facing any issues with the code below, but is it good practice to pass a char [] to a function which accepts std::string& as a parameter?
const char* function(std::string& MyString)
{
MyString = "Hello World";
return MyString.c_str();
}
int main()
{
char MyString[50];
/*
*Is it good practice to cast like this?
*what possible issues i could face because of this casting?
*/
function((std::string)MyString);
std::cin.get();
return 0;
}
This simply won't compile, because it requires creating a temporary std::string, which cannot be bound to a non-const lvalue reference. Even if function took a const reference to std::string, creating the temporary would have a performance cost. So, depending on the nature of the function, it may be a good idea to add an overload that accepts a pointer to a C string as well. Alternatively, if function is not going to modify the string, you can make it accept std::string_view (C++17) so it can handle both std::string and C strings.
No, it is bad practice, as the cast has no effect; a std::string can be constructed from a char * with a non-explicit constructor, so you can remove the cast and you'll get exactly the same code (just with an implicit construction instead of an explicit cast).
Now as written, you'll get an error (at least with a non-broken compiler), as you can't pass a temporary object to a non-const lvalue reference. But if you change the function to take a const std::string &, it will work just fine.
Also bad practice is returning the const char * you get by calling std::string::c_str(): this pointer is only valid as long as the string object is not modified or destroyed, so the returned pointer becomes invalid (dangling) as soon as the temporary you passed as an argument is destroyed. If you were to save that returned pointer in a local variable in main and then try to do something with it (like printing it), that would be undefined behavior.
In short, passing a char[] to a function accepting a string is common practice (inherited from C), and it is not bad. The explicit cast is not good here, and the function itself is not good either, as it does not accept a char[] ...
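A minimal sketch of the alternatives the answers suggest, assuming C++17 for std::string_view. The names length_of, length_of_view and greeting are hypothetical stand-ins for the question's function: the first takes a const reference so a char array binds through an implicit temporary, the second avoids the temporary entirely, and the third returns a std::string by value instead of a c_str() pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Accepts std::string, char arrays and literals without a cast; a temporary
// std::string is created implicitly when the argument is not a std::string.
std::size_t length_of(const std::string& s) { return s.size(); }

// C++17 alternative: no temporary std::string is constructed.
std::size_t length_of_view(std::string_view s) { return s.size(); }

// Returning by value avoids handing out a pointer into a dying temporary.
std::string greeting() { return "Hello World"; }
```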

Is there a way to pass a string literal as reference in C++

Within C++ it is common to pass by reference instead of by pointer if a value cannot be NULL.
Suppose I have a function with the following signature, which is often used with a string literal.
void setText( const char* text );
I was wondering how I could change the function in such a way that it accepts a reference (and so has the advantage of not accepting NULL)?
If I changed it to (const char& text), it would be a reference to a single char, from which the address could be taken inside the function... but that doesn't feel nice.
Another option would be (const std::string& text) which has the disadvantage that it always calls a constructor and does some dynamic memory allocation.
Any other common ways, or just stick to the std::string& or the char* ?
Honestly, I would just keep the const char* text function and add an overload const std::string& text function that calls the first one with setText(text.c_str())
There's a slight problem here in that C++ and references-to-arrays aren't the best pair. For reference, see: C++ pass an array by reference
Since you're talking about binding a reference to a string, and a string is an array of characters, we run into that problem head-on. In light of this, the best we can really do is bind a ref to a const char*, which looks like this:
void ref(const char* const& s);
But this doesn't do what you want; this binds a reference to a pointer, and all it guarantees is that the pointer itself exists, not that it's pointing to a valid string literal.
This same problem is present in the std::string& examples: those only guarantee that you've bound to a std::string object, but that string could very well be empty, so you still haven't guaranteed yourself a string that has anything of value in it.
In the end, I'll second what Zan says. const char* is a well respected idiom passing string literals, and then having a second overload that binds to strings is a nice convenience.
(One last note: std::string doesn't "always" allocate memory. Implementations with the small string optimization skip the allocation for short strings; the length limit is implementation-specific, commonly between 15 and 22 characters on 64-bit platforms.)
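The approach recommended above (keep const char* and add a forwarding std::string overload) can be sketched as follows. The global g_text is a hypothetical stand-in for whatever setText really does with the text. A string literal picks the const char* overload, since array-to-pointer decay beats a user-defined conversion to std::string; note that forwarding via c_str() truncates at any embedded '\0'.

```cpp
#include <cassert>
#include <string>

// Hypothetical storage standing in for the real effect of setText.
std::string g_text;

// Primary version: accepts literals directly, no temporary std::string.
void setText(const char* text) {
    assert(text != nullptr);  // the pointer form cannot rule out NULL at compile time
    g_text = text;
}

// Convenience overload that forwards to the primary one.
void setText(const std::string& text) {
    setText(text.c_str());
}
```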

std::string declared function does not complain returning char *

This is not my program, so don't start berating me :-). It's some random program I got. A globally declared buffer is being returned by MyFunc(). I use VS2008 and it does not complain:
static char buffer[1024];
std::string MyFunc() {
....
....
return buffer;
}
However when I add this line of code
char * ret;
ret = MyFunc();
It complains: "error: no suitable conversion function from "std::string" to "char *" exists"
My question is: why is the compiler complaining now? Why this inconsistency in syntax checking? Again, I don't have the freedom to change MyFunc(). In my program I can write
std::string ret;
ret = MyFunc();
and get rid of the syntax error, but I would really like to understand this strange behavior.
string() has a constructor that accepts a char*, so you get an automatic conversion. There is no automatic conversion from a string to a char*. You have to call string::c_str() to get the char*.
Edit
Although you asked only for an explanation of the behavior, others in this forum seem to think I have short-changed you by not mentioning that string::c_str returns a const char*, not a simple char*. But the explanation remains: there is no implicit/automatic conversion from string to char* or const char*. Feel free to read about c_str here if it's important to you.
It is not the syntax, it is the structure of the std::string that makes the compiler behave differently.
When you are returning a char* from a function returning std::string, the compiler notices that there is a constructor of std::string that takes char*, calls that constructor, and quietly returns the result.
When you are trying to return a std::string from a char*-returning function (or assign one to a char*, as in your code), the compiler tries to see if there is a conversion operator to make a char* from a std::string, finds that there is no such operator, and reports an error.
If you want to convert a string to char*, you need to make a copy of the string's buffer, like this:
char* ret_ch = new char[ret.size()+1];
memcpy(ret_ch, ret.c_str(), ret.size()+1);
return ret_ch;
You could think that it is OK to return c_str() by itself, but it is not a good idea: the buffer that "backs up" this C string belongs to std::string object, so once the string gets deallocated, accessing the buffer starts producing undefined behavior. That is why you need to make an explicit copy when you access the buffer of a string. Of course you are also responsible for calling delete[] on the copied result.
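A runnable version of the copy described above, as a sketch: MyFunc's body and the buffer contents are hypothetical (the question elides them), and to_owned_cstr is an illustrative name. The caller owns the returned array and must delete[] it; in modern C++ one would usually just keep the std::string instead.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for the question's buffer and MyFunc().
static char buffer[1024] = "contents of the global buffer";
std::string MyFunc() {
    return buffer;  // implicit std::string(const char*) conversion
}

// Copies a std::string into a new[]-allocated, null-terminated array.
// The caller must release the result with delete[].
char* to_owned_cstr(const std::string& s) {
    char* copy = new char[s.size() + 1];
    std::memcpy(copy, s.c_str(), s.size() + 1);  // +1 copies the terminator
    return copy;
}
```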
std::string is designed to be implicitly constructible from char const* because this supports using string literals and typical C-style strings as initializer values.
If this was not supported then one would just have to use some intermediate function, which would add nothing but verbosity and inefficiency.
In the other direction, however, std::string is intentionally designed to not convert implicitly to char const*. Part of the rationale is probably that with std::string being logically mutable, the returned raw pointer is only valid as long as no operations are performed that might cause a buffer replacement or string destruction. For example,
char const* s = foo().c_str();
where foo produces a std::string, makes s point to a buffer that no longer exists, a dangling pointer that is invalid.
The c_str() member function call makes the conversion stand out.
Consider how more common that problem could be if one could write just
char const* s = foo();
and have that compile.
Regarding the remark above about std::string being logically mutable (struck through in my original post): I realized that it's completely irrelevant whether the string is logically mutable or immutable. Sorry. Need more coffee!
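A sketch contrasting the safe and dangling uses of c_str(), with a hypothetical foo(). Using the pointer within the same full-expression is fine, because the temporary lives until the end of it; keeping the pointer past the semicolon is not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical function returning a std::string by value.
std::string foo() { return "temporary text"; }

std::size_t safe_length() {
    // OK: the temporary lives until the end of this full-expression.
    std::size_t a = std::strlen(foo().c_str());

    // OK: a named std::string keeps its buffer alive for the whole scope.
    const std::string owner = foo();
    const char* s = owner.c_str();
    std::size_t b = std::strlen(s);

    // NOT OK: const char* bad = foo().c_str();
    // The temporary is destroyed at the ';', leaving bad dangling.
    assert(a == b);
    return b;
}
```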

C++: Is "my text" a std::string, a char* or a c-string?

I have just done what appears to be a common newbie mistake:
First we read one of many tutorials that goes like this:
#include <fstream>
int main() {
using namespace std;
ifstream inf("file.txt");
// (...)
}
Secondly, we try to use something similar in our code, which goes something like this:
#include <fstream>
int main() {
using namespace std;
std::string file = "file.txt"; // Or get the name of the file
// from a function that returns std::string.
ifstream inf(file);
// (...)
}
Thirdly, the newbie developer is perplexed by some cryptic compiler error message.
The problem is that (before C++11) ifstream takes a const char* as its constructor argument.
The solution is to convert the std::string to a const char*.
Now, the real problem is that, for a newbie, "file.txt" or similar examples given in almost all the tutorials very much looks like a std::string.
So, is "my text" a std::string, a c-string or a char*, or does it depend on the context?
Can you provide examples on how "my text" would be interpreted differently according to context?
[Edit: I thought the example above would have made it obvious, but I should have been more explicit nonetheless: what I mean is the type of any string enclosed within double quotes, i.e. "myfilename.txt", not the meaning of the word 'string'.]
Thanks.
So, is "string" a std::string, a c-string or a char*, or does it depend on the context?
Neither C nor C++ has a built-in string data type, so any double-quoted string in your code is essentially a const char * (or a const char [], to be exact). "C string" usually refers to this, specifically a character array with a null terminator.
In C++, std::string is a convenience class that wraps a raw string into an object. By using this, you can avoid having to do (messy) pointer arithmetic and memory reallocations by yourself.
Most standard library functions still take only char * (or const char *) parameters.
You can implicitly convert a char * into std::string because the latter has a constructor to do that.
You must explicitly convert a std::string into a const char * by using the c_str() method.
Thanks to Clark Gaebel for pointing out constness, and jalf and GMan for mentioning that it is actually an array.
"myString" is a string literal, and has the type const char[9], an array of 9 constant char. Note that it has enough space for the null terminator. So "Hi" is a const char[3], and so forth.
This is pretty much always true, with no ambiguity. However, whenever necessary, a const char[9] will decay into a const char* that points to its first element. And std::string has an implicit constructor that accepts a const char*. So while it always starts as an array of char, it can become the other types if you need it to.
Note that string literals have the unique property that const char[N] can also decay into char*, but this behavior is deprecated (and the conversion was removed entirely in C++11). If you try to modify the underlying string this way, you end up with undefined behavior. It's just not a good idea.
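The claims above can be checked at compile time. A minimal sketch; the variable names decayed and converted are illustrative. decltype of a string literal yields a reference to the const array, because the literal is an lvalue.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// "myString" is an lvalue of type const char[9]: 8 characters plus '\0'.
static_assert(std::is_same<decltype("myString"), const char (&)[9]>::value,
              "a string literal is an array of const char");
static_assert(sizeof("myString") == 9, "the terminator is counted");
static_assert(sizeof("Hi") == 3, "two characters plus the terminator");

// Array-to-pointer decay: the literal becomes a pointer to its first element.
const char* decayed = "myString";

// std::string's const char* constructor performs the implicit conversion.
const std::string converted = "myString";

// char* bad = "myString";  // deprecated in C++03, ill-formed since C++11
```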
std::string file = "file.txt";
The right-hand side of the = contains a (plain) string literal (i.e. a null-terminated byte string). Its effective type is array of const char.
The = is a tricky pony here: no assignment happens. The std::string class has a constructor that takes a pointer to const char as an argument; this is called to create a temporary std::string, which is then used to copy-construct (using the copy ctor of std::string) the object file of type std::string.
The compiler is free to elide the copy ctor and directly instantiate file though.
However, note that std::string is not the same thing as a C-style null-terminated string. Before C++11, its internal buffer was not even required to be null-terminated.
ifstream inf("file.txt");
The std::ifstream class has a ctor that takes a const char * and the string literal passed to it decays to a pointer to the first element of the string.
The thing to remember is this: std::string provides (almost seamless) conversion from C-style strings. You have to look up the signature of the function to see if you are passing in a const char * or a std::string (the latter because of implicit conversions).
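For the ifstream case specifically: since C++11, std::ifstream also has a constructor taking const std::string&, so the question's second snippet compiles with a C++11 compiler, while older compilers need c_str(). A minimal sketch; can_open is a hypothetical helper that tries both forms.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: opens the file named by a std::string both ways.
bool can_open(const std::string& file) {
    std::ifstream pre11(file.c_str());  // const char* constructor, any standard
    std::ifstream since11(file);        // const std::string& constructor, C++11 and later
    return pre11.is_open() && since11.is_open();
}
```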
So, is "string" a std::string, a c-string or a char*, or does it depend on the context?
It depends entirely on the context. :-) Welcome to C++.
A C string is a null-terminated string, which is almost always the same thing as a char*.
Depending on the platforms and frameworks you are using, there might be even more meanings of the word "string" (for example, it is also used to refer to QString in Qt or CString in MFC).
The C++ standard library provides a std::string class to manage and represent character sequences. It encapsulates the memory management and is most of the time implemented as a C-string; but that is an implementation detail. It also provides manipulation routines for common tasks.
The std::string type will always be that (it doesn't have a conversion operator to char* for example, that's why you have the c_str() method), but it can be initialized or assigned to by a C-string (char*).
On the other hand, if you have a function that takes a std::string or a const std::string& as a parameter, you can pass a c-string (char*) to that function and the compiler will construct a std::string in-place for you. That would be a differing interpretation according to context as you put it.
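One concrete case of context-dependent interpretation, as asked in the question: with an overload set, the same literal binds to the const char* version, while a function that only offers a std::string parameter makes the compiler construct one. The names which and only_string are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical overload pair reporting which one was chosen.
const char* which(const char*)        { return "const char*"; }
const char* which(const std::string&) { return "std::string"; }

// With only a std::string parameter, a temporary is constructed implicitly.
std::size_t only_string(const std::string& s) { return s.size(); }
```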
Neither C nor C++ have a built-in string data type.
When, during compilation, the compiler finds a double-quoted string used directly as a pointer value (see the code below), it stores the string itself in the program image and creates a character array for it:
The array is created in static storage because it must persist to be referred to later.
The array is made constant because it must always contain the original data (Hello).
So in the end, what you have is a const char * to this constant static character array.
const char* v()
{
const char* text = "Hello";
return text;
// The above code can be reduced to:
// return "Hello";
}
During the program run, when control reaches the opening brace, "text", the pointer, is created on the stack; the constant array of 6 elements (including the null terminator '\0' at the end) already exists in static storage. When control reaches the next line (const char* text = "Hello";), the starting address of the 6-element array is assigned to "text". The next line (return text;) returns "text". At the closing brace, "text" disappears from the stack, but the array is still in static storage.
You don't have to make the return type const. But if you try to change the static array through a non-constant char*, the behavior is undefined (typically a run-time crash, because the array usually lives in read-only memory). So it's always good to make the return type const, to make sure the array cannot be referred to by a non-constant pointer.
But if a double-quoted string is used to initialize an array explicitly, the compiler assumes that the programmer is going to (smartly) handle it. See the following wrong example:
const char* v()
{
char text[] = "Hello";
return text;
}
During compilation, the compiler saves the quoted text as-is in the program, to be used to fill the array at run time. It also calculates the array size, in this case again 6.
During the program run, at the opening brace, the array "text[]" with 6 elements is created on the stack, but not yet initialized. When control reaches (char text[] = "Hello";), the array is initialized with the text stored in the compiled code, so the array now lives on the stack. At (return text;), the function returns the starting address of the array "text". At the closing brace, the array disappears from the stack, so the returned pointer dangles and there is no valid way to use it.
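Three ways to fix the wrong example, sketched with hypothetical function names: return a pointer to the literal itself (static storage), use a function-local static array, or return a std::string by value so the characters are copied before the stack array dies.

```cpp
#include <cassert>
#include <string>

// OK: the literal has static storage duration.
const char* from_literal() {
    const char* text = "Hello";
    return text;
}

// OK: a function-local static array also lives for the whole program.
const char* from_static_array() {
    static char text[] = "Hello";
    return text;
}

// OK, and usually simplest in C++: return the characters by value.
std::string from_string() {
    char text[] = "Hello";  // stack array, as in the wrong example
    return text;            // copied into the std::string before text dies
}
```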
Most standard library functions still take only char * (or const char *) parameters.
The Standard C++ library has a powerful class called string for manipulating text. The internal data structure for string is a character array. The Standard C++ string class is designed to take care of (and hide) all the low-level manipulations of character arrays that were previously required of the C programmer. Note that std::string is a class.
You can implicitly convert a char * into std::string because the latter has a constructor to do that.
You can explicitly convert a std::string into a const char * by using the c_str() method.
As often as possible it should mean std::string (or an alternative such as wxString, QString, etc., if you're using a framework that supplies one). Sometimes you have no real choice but to use a NUL-terminated byte sequence, but you generally want to avoid it when possible.
Ultimately, there simply is no clear, unambiguous terminology. Such is life.
To use the proper wording (as found in the C++ language standard) string is one of the varieties of std::basic_string (including std::string) from chapter 21.3 "String classes" (as in C++0x N3092), while the argument of ifstream's constructor is NTBS (Null-terminated byte sequence)
To quote, C++0x N3092 27.9.1.4/2.
basic_filebuf* open(const char* s, ios_base::openmode mode);
...
opens a file, if possible, whose name is the NTBS s

when to use const char *

If I have a function API that expects a 14-digit input and returns a 6-digit output, I basically define the input as a const char *. Would that be the correct and safe thing to do?
Also, why would I not want to just use char *? I could, but it seems more prudent to use const char * in this case, especially since it's an API that I am providing. For different input values I generate 6-digit codes.
I am not sure why you are using char pointers where you could use std::string:
std::string code(const std::string& input)
{ ... }
If you don't have the choice, using const char* gives a guarantee to the user that you won't change his data especially if it was a string literal where modifying one is undefined behavior.
By using const you're promising your user that you won't change the string being passed in. It becomes part of the API, helping define your function's behavior. It also lets users pass constant strings, including literal strings like "mystring".
When you say const char *c you are telling the compiler that you will not be making any changes to the data that c points to. So this is a good practice if you will not be directly modifying your input data.
String literals have static storage duration (they exist for the duration of the program) and may or may not be shared if the same string literal is referenced from multiple locations in a program. The effect of modifying a string literal is undefined; thus, you should always declare a pointer to a string literal as const char *.
You get several benefits for using const:
It documents your code, the user knows no harm will be done to this string.
You allow the user to send a const char* which he might have. Converting from non-const to const is automatic. The other way around is something that should be avoided (and done explicitly, as it might lead to undefined behavior at times).
You let the compiler check you. The compiler can now verify that you don't accidentally change the user's string.
You need to use a const char * anywhere that you're passing a string literal, or the compiler will balk (assuming you don't want to convert it to a std::string).
const char* is usually used in parameters, stating that your function won't modify that string.
void function(char* modified_str, const char* not_modified_str) { ... }
If you're returning a const char*, what you want to say is not obvious. You are trying to say that nobody should modify the returned string, but you still (I think it would be that way) transfer ownership to the calling routine, so that it would have to invoke delete[] on the char array your function returned.
Generally speaking, use std::string; then your function will look like this:
std::string function(std::string& modified_str, const std::string& not_modified_str) { ... }
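A sketch of the API from the question with a const char* input and a std::string result, so no ownership question arises. Everything about the mapping is hypothetical: the question does not say how codes are derived, so the last six digits are used as a placeholder rule, and an empty string signals malformed input. Because the parameter is const char*, both literals and writable buffers are accepted.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical API: 14-digit input, 6-digit output.
// The derivation below (last six digits) is a placeholder only.
std::string make_code(const char* input) {
    if (input == nullptr || std::strlen(input) != 14) {
        return "";  // reject malformed input
    }
    for (std::size_t i = 0; i < 14; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(input[i]))) {
            return "";
        }
    }
    return std::string(input + 8, 6);  // placeholder rule
}
```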