I am confused why compiler gives
const char s[]="hello";
s[2]='t'; // Compile Time Error
char *t = "hello";
*(t+2)='u'; // Run time Error
I guess in both cases the compiler should give a compile-time error. Can anyone tell me the particular reason for it being this way?
In the first case, you are writing to a const and the compiler notices that and can reject that.
In the second case, t is a pointer to a non-const char, so you can dereference it and write at *(t+2). However, since t is initialized with a pointer to a read-only segment, you are getting a segmentation violation at runtime.
You could, painfully, configure your linker to put all data in writable segments, but that is ugly and non-standard.
P.S. Some sophisticated static analyzers (maybe Frama-C) might catch both errors without running the program. One could also imagine extending GCC e.g. with MELT to add such checks (but this is non-trivial work, and it might be hard to get funded for it).
Backwards compatibility.
You can't modify a const char. That much is obvious.
What isn't obvious is that in C++ a string literal actually has type array of constant characters (const char[N]), not array of characters. The second declaration therefore has the wrong type; the conversion to char* is supported only for historical reasons.
Note, though, that this is C++-specific. In C (C89, C99, and C11 alike), a string literal has type char[N] rather than const char[N], so the declaration is not actually ill-typed there; but the data is still commonly stored in a read-only segment, so attempting to write to it is undefined behavior.
Also, for what it's worth, you can use -Wwrite-strings with gcc (g++ already includes it) to be warned of this.
When you do: char *t = "hello"; then t is a pointer to memory in a read-only segment of the program (often alongside the code), so you can't change it. Because it's read-only, you get a segmentation fault at runtime.
When you do: const char s[]="hello"; then s is an array of chars on the stack, but it's const, so you can't change it and you get a compilation error (the compiler knows it is const, so it doesn't allow you to change it).
Using const when you don't want your string to be changed is good practice, because it raises a compilation error, which is much better than a run-time error.
Consider the following series of statements:
char *t = "hello";
char s[5];
t = s;
*(t+2)='u';
This series of statements gives no run-time error, because the statement *(t+2)='u'; is not invalid in itself: after t = s;, it modifies the writable array s. In your original case the same statement happens to target read-only memory, but the compiler has no way of knowing, from the statement alone, whether an access violation will occur.
Related
I have a piece of code that looks like this:
constexpr gsl::cstring_span<> const somestring{"Hello, I am a string"};
and it refuses to compile with a message complaining that some non-constexpr function is being called somewhere.
Why is this? This seems like the most important use-case to support. The whole point is to have compile-time bounds checking if at all possible. Compile time bounds checking involving constant string literals seems like the thing it would be used for the most often. But this can't happen if it can't be declared constexpr. What's going on here?
I think the problem is that string literals have type array of const char and are null-terminated. But who is to say you are constructing your cstring_span from a null-terminated array?
Because of that, the constructor of cstring_span performs a runtime check that removes the null terminator if it exists, and otherwise accepts the full length of the array.
I am not sure how powerful constexpr expressions can be, but it may be possible to implement that check in a constexpr way. You could open an issue asking about it here:
https://github.com/Microsoft/GSL/issues
C++ compilers happily compile this code, with no warning:
int ival = 8;
const char *strval = "x";
const char *badresult = ival + strval;
Here we add an int value (ival) to a char* pointer value (strval) and store the result in a char* pointer (badresult). Of course, the content of badresult will be garbage, and the app might crash on this line, or later when it tries to use badresult elsewhere.
The problem is that it is very easy to make such mistakes in real life. The one I caught in my code looked like this:
message += port + "\n";
(where message is a string type that handles the result with its operator+= function; port is an int, and "\n" of course decays to a const char pointer).
Is there any way to disable this kind of behavior and trigger an error at compile time?
I don't see any normal use case for adding char* to int and I would like a solution to prevent this kind of mistakes in my large code base.
When using classes, we can create private operators and use the explicit keyword to disable unneeded conversions/casts, however now we are talking about basic types (char* and int).
One solution is to use clang, as it has a flag that warns about this.
However, I can't use clang, so I am seeking a solution that triggers a compiler error (some kind of operator overload, mangling with some defines to prevent such constructs, or any other idea).
Is there any way to disable this kind of behavior and trigger an error at compile time?
Not in general, because your code is very similar to the following, legitimate, code:
int ival = 3;
const char *strval = "abcd";
const char *goodresult = ival + strval;
Here goodresult is pointing to the last letter d of strval.
BTW, on Linux, getpid(2) is known to return a positive integer. So you could imagine:
int ival = (getpid()>=0)?3:1000;
const char *strval = "abcd";
const char *goodresult = ival + strval;
which is morally the same as the previous example (so we humans know that ival is always 3). But teaching the compiler that getpid() does not return a negative value is tricky in practice (the return type pid_t of getpid is some signed integer, and has to be signed to be usable by fork(2), which could give -1). And you could imagine more weird examples!
You want compile-time detection of buffer overflow (or more generally of undefined behavior), and in general that is equivalent to the halting problem (it is an unsolvable problem). So it is impossible in general.
Of course, one could claim that a clever compiler could warn for your particular case, but then the question becomes which cases are worth warning about.
You might try some static source program analysis tools, perhaps Clang static analyzer or Frama-C (with its recent Frama-C++ variant), or some costly proprietary tools like Coverity and many others. These tools don't detect all errors statically and take much more time to execute than an optimizing compiler.
You could (for example) write your own GCC plugin to detect such mistakes (that means developing your own static source code analyzer). You'll spend months writing it. Are you sure it is worth the effort?
However I can't use clang,
Why? You could ask permission to use the clang static analyzer (or some other one), during development (not for production). If your manager refuses that, it becomes a management problem, not a technical one.
I don't see any normal use case for adding char* to int
You need more imagination. Think of something like
puts("non-empty" + (isempty(x)?4:0));
Ok that is not very readable code, but it is legitimate. In the previous century, when memory was costly, some people used to code that way.
Today you'll code perhaps
if (isempty(x))
    puts("empty");
else
    puts("non-empty");
and the cute thing is that a clever compiler could probably optimize the latter into the equivalent of the former (according to the as-if rule).
No way. It is valid syntax, and very useful in many cases.
Just think: if you meant to write int b=a+10 but mistakenly typed int b=a+00, the compiler cannot know it is an error.
However, you can consider using C++ classes. Most C++ classes are well designed to prevent such obvious mistakes.
In the first example in your question, compilers really should issue a warning. A compiler can trivially see that the addition resolves to 8 + "x", and clang does indeed optimise it to a constant. I see the fact that it doesn't warn about this as a compiler bug: although compilers are not required to warn here, clang goes to great efforts to provide useful diagnostics, and it would be an improvement to diagnose this as well.
In the second example, as Matteo Italia pointed out, clang does already provide a warning option for this, enabled by default: -Wstring-plus-int. You can turn specific warnings into errors by using -Werror=<warning-option>, so in this case -Werror=string-plus-int.
I'm currently learning pointers and my professor provided this piece of code as an example:
//We cannot predict the behavior of this program!
#include <iostream>
using namespace std;
int main()
{
    char * s = "My String";
    char s2[] = {'a', 'b', 'c', '\0'};
    cout << s2 << endl;
    return 0;
}
He wrote in the comments that we can't predict the behavior of the program. What exactly makes it unpredictable though? I see nothing wrong with it.
The behaviour of the program is non-existent, because it is ill-formed.
char* s = "My String";
This is illegal. Prior to 2011, it had been deprecated for 12 years.
The correct line is:
const char* s = "My String";
Other than that, the program is fine. Your professor should drink less whiskey!
The answer is: it depends on what C++ standard you're compiling against. All the code is perfectly well-formed across all standards‡ with the exception of this line:
char * s = "My String";
Now, the string literal has type const char[10] and we're trying to initialize a non-const pointer to it. For all types other than the char family of string literals, such an initialization was always illegal. For example:
const int arr[] = {1};
int *p = arr; // nope!
However, in pre-C++11, for string literals, there was an exception in §4.2/2:
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; [...]. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ]
So in C++03, the code is perfectly fine (though deprecated), and has clear, predictable behavior.
In C++11, that block does not exist - there is no such exception for string literals converted to char*, and so the code is just as ill-formed as the int* example I just provided. The compiler is obligated to issue a diagnostic, and ideally in cases such as this that are clear violations of the C++ type system, we would expect a good compiler to not just be conforming in this regard (e.g. by issuing a warning) but to fail outright.
The code should ideally not compile - but does on both gcc and clang (I assume because there's probably lots of code out there that would be broken with little gain, despite this type system hole being deprecated for over a decade). The code is ill-formed, and thus it does not make sense to reason about what the behavior of the code might be. But considering this specific case and the history of it being previously allowed, I do not believe it to be an unreasonable stretch to interpret the resulting code as if it were an implicit const_cast, something like:
const int arr[] = {1};
int *p = const_cast<int*>(arr); // OK, technically
With that, the rest of the program is perfectly fine, as you never actually touch s again. Reading a created-const object via a non-const pointer is perfectly OK. Writing a created-const object via such a pointer is undefined behavior:
std::cout << *p; // fine, prints 1
*p = 5; // will compile, but undefined behavior, which
// certainly qualifies as "unpredictable"
As there is no modification via s anywhere in your code, the program is fine in C++03, should fail to compile in C++11 but does anyway - and given that the compilers allow it, there's still no undefined behavior in it†. With allowances that the compilers are still [incorrectly] interpreting the C++03 rules, I see nothing that would lead to "unpredictable" behavior. Write to s though, and all bets are off. In both C++03 and C++11.
†Though, again, by definition ill-formed code yields no expectation of reasonable behavior
‡Except not, see Matt McNabb's answer
Other answers have covered that this program is ill-formed in C++11 due to the assignment of a const char array to a char *.
However the program was ill-formed prior to C++11 also.
The operator<< overloads are in <ostream>. The requirement for iostream to include ostream was added in C++11.
Historically, most implementations had iostream include ostream anyway, perhaps for ease of implementation or perhaps in order to provide a better QoI.
But it would be conforming for iostream to only define the ostream class without defining the operator<< overloads.
The only slightly wrong thing that I see with this program is that you're not supposed to assign a string literal to a mutable char pointer, though this is often accepted as a compiler extension.
Otherwise, this program appears well-defined to me:
The rules that dictate how character arrays become character pointers when passed as parameters (such as with cout << s2) are well-defined.
The array is null-terminated, which is a condition for operator<< with a char* (or a const char*).
#include <iostream> includes <ostream>, which in turn defines operator<<(ostream&, const char*), so everything appears to be in place.
You can't predict the behaviour of the compiler, for reasons noted above. (It should fail to compile, but may not.)
If compilation succeeds, then the behaviour is well-defined. You certainly can predict the behaviour of the program.
If it fails to compile, there is no program. In a compiled language, the program is the executable, not the source code. If you don't have an executable, you don't have a program, and you can't talk about behaviour of something that doesn't exist.
So I'd say your prof's statement is wrong. You can't predict the behaviour of the compiler when faced with this code, but that's distinct from the behaviour of the program. So if he's going to pick nits, he'd better make sure he's right. Or, of course, you might have misquoted him and the mistake is in your translation of what he said.
As others have noted, the code is illegitimate under C++11, although it was valid under earlier versions. Consequently, a compiler for C++11 is required to issue at least one diagnostic, but behavior of the compiler or the remainder of the build system is unspecified beyond that. Nothing in the Standard would forbid a compiler from exiting abruptly in response to an error, leaving a partially-written object file which a linker might think was valid, yielding a broken executable.
Although a good compiler should always ensure before it exits that any object file it is expected to have produced will be either valid, non-existent, or recognizable as invalid, such issues fall outside the jurisdiction of the Standard. While there have historically been (and may still be) some platforms where a failed compilation can result in legitimate-appearing executable files that crash in arbitrary fashion when loaded (and I've had to work with systems where link errors often had such behavior), I would not say that the consequences of syntax errors are generally unpredictable. On a good system, an attempted build will generally either produce an executable with a compiler's best effort at code generation, or won't produce an executable at all. Some systems will leave behind the old executable after a failed build, since in some cases being able to run the last successful build may be useful, but that can also lead to confusion.
My personal preference would be for disk-based systems to rename the output file, to allow for the rare occasions when that executable would be useful while avoiding the confusion that can result from mistakenly believing one is running new code; and for embedded-programming systems to allow a programmer to specify, for each project, a program that should be loaded if a valid executable is not available under the normal name (ideally something which safely indicates the lack of a usable program). An embedded-systems tool-set would generally have no way of knowing what such a program should do, but in many cases someone writing "real" code for a system will have access to some hardware-test code that could easily be adapted to the purpose. I don't know that I've seen the renaming behavior, however, and I know that I haven't seen the indicated programming behavior.
I've read a few things saying that a #define doesn't take up any memory, but a colleague at work was very insistent that with modern compilers there is no difference between a #define and a const int/string.
#define STD_VEC_HINT 6
const int stdVecHint = 6;
The conversation came about because an old bit of code that was being modernised dealt with encryption and had its key as a #define.
I always thought that a variable would end up with a memory address revealing its contents, but maybe a release build optimises such things away.
A good compiler will not allocate space for a const variable that can be elided. In C++, const variables at namespace scope also implicitly have internal linkage, which makes it easier for the compiler to optimize the variable away entirely. The link-time optimization feature of GCC also helps with cross-module optimization.
Don't forget the even more important fact that const variables have proper scoping and type safety, which are missing from #define.
As with so many things, it depends!
A #define will just inject the constant straight into your code, so it won't take up any memory. The same is potentially true for a const.
However, you can take the address of a const:
const int *value = &stdVecHint;
And since you're taking its address, the compiler will need to store the constant in memory in order for it to have an address, so in this case it will require memory.
The compiler is likely to replace occurrences of stdVecHint with the literal 6 everywhere it is used. An address and memory space are taken up only if you take its address explicitly; but then again, this is a moot point, since you couldn't do that with STD_VEC_HINT at all. Pedantically, yes, stdVecHint is a variable with internal linkage, and every translation unit that sees the definition will have its own copy. In practice, it shouldn't increase the memory footprint.
The preprocessor handles directives such as #define by plain textual substitution: with #define pi 3.14, every occurrence of pi is replaced by 3.14 before the code is compiled, so no memory needs to be allocated for it. A declaration such as const float pi=3.14; declares a real variable, which may need memory allocated for it. The #define form mainly survives for compatibility with older code.
With this code, I get a segmentation fault:
char* inputStr = "abcde";
*(inputStr+1)='f';
If the code was:
const char* inputStr = "abcde";
*(inputStr+1)='f';
I will get a compile error for "assignment of read-only location".
However, in the first case there is no compile error; I just get the segmentation fault when the assignment actually happens.
Can anyone explain this?
Here is what the standard says about string literals in section [2.13.4/2]:
A string literal that does not begin with u, U, or L is an ordinary string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n const char”, where n is the size of the string as defined below; it has static storage duration (3.7) and is initialized with the given characters.
So, strictly speaking, "abcde" has type
const char[6]
Now what happens in your code is an implicit cast to
char*
so that the assignment is allowed. The reason why it is so is, likely, compatibility with C. Have a look also at the discussion here: http://learningcppisfun.blogspot.com/2009/07/string-literals-in-c.html
Once the cast is done, you are syntactically free to modify the literal, but the write fails because the compiler stores the literal in a non-writable segment of memory, as the standard itself allows.
This gets created in the code segment:
char *a = "abcde";
Essentially it's const.
If you wish to edit it, try:
char a[] = "abcde";
The standard states that you are not allowed to modify string literals directly, regardless of whether you mark them const or not:
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.
In fact, in C (unlike C++), string literals are not const but you're still not allowed to write to them.
This restriction on writing allows certain optimisations to take place, such as sharing of literals along the lines of:
char *ermsg = "invalid option";
char *okmsg = "valid option";
where okmsg can actually point to the 'v' character in ermsg, rather than being a distinct string.
String literals are typically stored in read-only memory. Trying to change this memory will kill your program.
Here's a good explanation: Is a string literal in c++ created in static memory?
It is mostly ancient history; once upon a long time ago, string literals were not constant.
However, most modern compilers place string literals into read-only memory (typically, the text segment of your program, where your code also lives), and any attempt to change a string literal will yield a core dump or equivalent.
With G++, you can most certainly get the compilation warning (-Wall if it is not enabled by default). For example, G++ 4.6.0 compiled on MacOS X 10.6.7 (but running on 10.7) yields:
$ cat xx.cpp
int main()
{
char* inputStr = "abcde";
*(inputStr+1)='f';
}
$ g++ -c xx.cpp
xx.cpp: In function ‘int main()’:
xx.cpp:3:22: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
$
So the warning is enabled by default.
What happened is that the compiler put the constant "abcde" in some read-only memory segment. You pointed your (non-const) char* inputStr at that constant, and kaboom, segfault.
Lesson to be learned: Don't invoke undefined behavior.
Edit (elaboration)
However, for the first case, there is no compile error, just segmentation fault when the assign operation actually happened.
You need to enable your compiler warnings. Always set your compiler warnings as high as possible.
Even though "abcde" is a string literal, which should not be modified, you've told the compiler that you don't care about that by having a non-const char* point to it.
The compiler will happily assume that you know what you're doing, and not throw an error. However, there's a good chance that the code will fail at runtime when you do indeed try to modify the string literal.
String literals, while officially non-const in C, are almost always stored in read-only memory; in your setup that is evidently the case, which is why the write faults at runtime even without the const declaration.
Note that the standard forbids you to modify any string literal.
A little bit of history of string literals, in Ritchie's own words, mostly about the origin and evolution of string literals from K&R 1.
Hope this might clarify a thing or two about const and string literals.
"From: Dennis Ritchie
Subject: Re: History question: String literals.
Date: 02 Jun 1998
Newsgroups: comp.std.c
At the time that the C89 committee was working, writable
string literals weren't "legacy code" (Margolin) and what standard
there existed (K&R 1) was quite explicit (A.2.5) that
strings were just a way of initializing a static array.
And as Barry pointed out there were some (mktemp) routines
that used this fact.
I wasn't around for the committee's deliberations on the
point, but I suspect that the BSD utility for fiddling
the assembler code to move the initialization of strings
to text instead of data, and the realization that most
literal strings were not in fact overwritten, was more
important than some very early version of gcc.
Where I think the committee might have missed something
is in failure to find a formulation that explained
the behavior of string literals in terms of const.
That is, if "abc" is an anonymous literal of type
const char [4]
then just about all of its properties (including the
ability to make read-only, and even to share its storage
with other occurrences of the same literal) are nearly
explained.
The problem with this was not only the relatively few
places that string literals were actually written on, but much
more important, working out feasible rules for assignments
to pointers-to-const, in particular for function's actual
arguments. Realistically the committee knew that whatever
rules they formulated could not require a mandatory
diagnostic for every func("string") in the existing world.
So they decided to leave "..." of ordinary char array
type, but say one was required not to write over it.
This note, BTW, isn't intended to be read as a snipe
at the formulation in C89. It is very hard to get things
both right (coherent and correct) and usable (consistent
enough, attractive enough).
Dennis
"