Have a look at this code:
#include <iostream>
using namespace std;
int main()
{
const char* str0 = "Watchmen";
const char* str1 = "Watchmen";
char* str2 = "Watchmen";
char* str3 = "Watchmen";
cerr << static_cast<void*>( const_cast<char*>( str0 ) ) << endl;
cerr << static_cast<void*>( const_cast<char*>( str1 ) ) << endl;
cerr << static_cast<void*>( str2 ) << endl;
cerr << static_cast<void*>( str3 ) << endl;
return 0;
}
Which produces an output like this:
0x443000
0x443000
0x443000
0x443000
This was on the g++ compiler running under Cygwin. The pointers all point to the same location even with no optimization turned on (-O0).
Does the compiler always optimize so much that it searches all the string constants to see if they are equal? Can this behaviour be relied on?
It can't be relied on, it is an optimization which is not a part of any standard.
I'd changed corresponding lines of your code to:
const char* str0 = "Watchmen";
const char* str1 = "atchmen";
char* str2 = "tchmen";
char* str3 = "chmen";
The output for the -O0 optimization level is:
0x8048830
0x8048839
0x8048841
0x8048848
But for the -O1 it's:
0x80487c0
0x80487c1
0x80487c2
0x80487c3
As you can see GCC (v4.1.2) reused first string in all subsequent substrings. It's compiler choice how to arrange string constants in memory.
It's an extremely easy optimization, probably so much so that most compiler writers don't even consider it much of an optimization at all. Setting the optimization flag to the lowest level doesn't mean "Be completely naive," after all.
Compilers will vary in how aggressive they are at merging duplicate string literals. They might limit themselves to a single subroutine — put those four declarations in different functions instead of a single function, and you might see different results. Others might do an entire compilation unit. Others might rely on the linker to do further merging among multiple compilation units.
You can't rely on this behavior, unless your particular compiler's documentation says you can. The language itself makes no demands in this regard. I'd be wary about relying on it in my own code, even if portability weren't a concern, because behavior is liable to change even between different versions of a single vendor's compiler.
You surely should not rely on that behavior, but most compilers will do this. Any literal value ("Hello", 42, etc.) will be stored once, and any pointers to it will naturally resolve to that single reference.
If you find that you need to rely on that, then be safe and recode as follows:
char *watchmen = "Watchmen";
char *foo = watchmen;
char *bar = watchmen;
You shouldn't count on that of course. An optimizer might do something tricky on you, and it should be allowed to do so.
It is however very common. I remember back in '87 a classmate was using the DEC C compiler and had this weird bug where all his literal 3's got turned into 11's (numbers may have changed to protect the innocent). He even did a printf ("%d\n", 3) and it printed 11.
He called me over because it was so weird (why does that make people think of me?), and after about 30 minutes of head scratching we found the cause. It was a line roughly like this:
if (3 = x) break;
Note the single "=" character. Yes, that was a typo. The compiler had a wee bug and allowed this. The effect was to turn all his literal 3's in the entire program into whatever happened to be in x at the time.
Anyway, its clear the C compiler was putting all literal 3's in the same place. If a C compiler back in the 80's was capable of doing this, it can't be too tough to do. I'd expect it to be very common.
I would not rely on the behavior, because I am doubtful the C or C++ standards would make explicit this behavior, but it makes sense that the compiler does it. It also makes sense that it exhibits this behavior even in the absence of any optimization specified to the compiler; there is no trade-off in it.
All string literals in C or C++ (e.g. "string literal") are read-only, and thus constant. When you say:
char *s = "literal";
You are in a sense downcasting the string to a non-const type. Nevertheless, you can't do away with the read-only attribute of the string: if you try to manipulate it, you'll be caught at run-time rather than at compile-time. (Which is actually a good reason to use const char * when assigning string literals to a variable of yours.)
No, it can't be relied on, but storing read-only string constants in a pool is a pretty easy and effective optimization. It's just a matter of storing an alphabetical list of strings, and then outputting them into the object file at the end. Think of how many "\n" or "" constants are in an average code base.
If a compiler wanted to get extra fancy, it could re-use suffixes too: "\n" can be represented by pointing to the last character of "Hello\n". But that likely comes with very little benifit for a significant increase in complexity.
Anyway, I don't believe the standard says anything about where anything is stored really. This is going to be a very implementation-specific thing. If you put two of those declarations in a separate .cpp file, then things will likely change too (unless your compiler does significant linking work.)
Related
My knowledge till now was that arrays in C and CPP/C++ have fixed sizes. However recently I encountered 2 pieces of code which seems to contradict this fact. I am attaching the pics here. Want to hear everyone's thoughts on how these are working. Also pasting the code and doubts here:
1.
#include <iostream>
#include <string.h>
using namespace std;
int main()
{
char str1[]="Good"; //size of str1 should be 5
char str2[]="Afternoon"; //size of str2 should be 10
cout<<"\nSize of str1 before the copy: "<<sizeof(str1);
cout<<"\nstr1: "<<str1;
strcpy(str1,str2); //copying str1 into str2
cout<<"\nSize of str1 after the copy: "<<sizeof(str1);
cout<<"\nstr1: "<<str1;
return 0;
}
your text
O/P:
Size of str1 before the copy: 5
str1: Good
Size of str1 after the copy: 5
str1: Afternoon
In first snippet I am using strcpy to copy char str2[] contents that is "Afternoon" into char str1[] whose size is 5 less than size of str2. So theoritically the line strcpy(str1,str2) should give error as size of str1 is less than size of str2 and fixed. But it executes, and more surprising is the fact that even after str1 contain the word "afternoon" the size is still the same.
2.
#include <iostream>
#include <cstring>
using namespace std;
int main()
{
char first_string[10]; // declaration of char array variable
char second_string[20]; // declaration of char array variable
int i; // integer variable declaration
cout<<"Enter the first string: ";
cin>>first_string;
cout<<"\nEnter the second string: ";
cin>>second_string;
for(i=0;first_string[i]!='\0';i++);
for(int j=0;second_string[j]!='\0';j++)
{
first_string[i]=second_string[j];
i++;
}
first_string[i]='\0';
cout<<"After concatenation, the string would look like: "<<first_string;
return 0;
}
O/P:
Enter the first string: good
Enter the second string: afternoon
After concatenation, the string would look like: goodafternoon
Here also even if I provide a string of length 20 as input to second_string[] it's still able to concatenate both the strings and put them in first_string[], even though the size of the concatenated string will be clearly greater than size of first_string[] which is 10.
I tried to assign a string of greater length to a string variable of smaller length. techincally it should not work but it worked anyway
There are two misunderstandings here
sizeof is the size of the array at compile time. It has nothing to do with the contents of the array. You can change the contents all you like and sizeof will still be the same. If you want the length of a string use the function strlen.
Most of the time when you break the rules of C++ it leads to undefined behaviour. Copying a string into an array that is too small to hold that string is one example of undefined behaviour.
You said
So theoritically the line strcpy(str1,str2) should give error as size
of str1 is less than size of str2 and fixed.
This is untrue. Undefined behaviour does not mean that there must be an error. It means exactly what it says, the behaviour of your program is undefined, anything could happen. That might mean an error message, or it might mean a crash, or it might mean that your program seems to work. The behaviour is undefined.
You aren't alone in thinking as you did. I reckon the purpose of sizeof and the nature of undefined behaviour are two of the commonest beginner misunderstandings.
And to answer the question in the title. The size of a character array is fixed in C++, nothing in your example contradicts that.
I've honestly never seen a C++ programmer write char stringname[20] = "string";, that just isn't the way you'd handle strings in C++⁰.
And neither would a C programmer use array notation, because well, it's just not common; you'd typically use arrays for things that aren't strings, even if the type of a "string literal" is actually char[length + 1].
Your access beyond the end of an array is simply a bug. It is undefined behaviour. A buffer overflow. A static code analyzer, quite possibly even a compiler, would tell you that this is a mortal sin. The str* functions know literally nothing about the size of your array, they only see a pointer to the first element, and your array literally knows nothing about the length of the string it contains, which is given by the terminating zero character's position. You're mixing up two things here!
In C++, you'd definitely use the std::string class to read from cin, exactly to avoid the problem with buffer overflows.
So, honestly: If you're a C++ beginner, maybe try to ignore C strings for now. It's not a C++ way of dealing with string data other than fixed string literals (i.e., things between "" in your source code), and the C way of string handling is literally still the dominant cause for remote-exploitable bugs in software, far as I can tell. C++ is not C, and, honestly, when it comes to handling strings, for the better. Including both <string.h> and <iostreams> is a pretty reliable indication of a programming beginner who has access to bad guides that treat C++ as extended C. But that's simply not true; it's a very different programming language with some far-reaching C compatibility, but you would, and should, not mix these two languages – as a beginner, it's hard enough to learn one¹.
⁰ Technically speaking, it even feels wrong; a string literal in C++ is a const char pointer, whereas it's just a char pointer in C. C and C++ are not the same language.
¹If you feel like you're explaining C++ to people, and sometimes feel overwhelmed with making a good explanation for things to people who are not expert C programmers already, Kate Gregory made a nice talk why teaching C to teach C++ is a really bad idea, which I agree to, even if she overstresses a few points.
So, my phd project relies on a piece of software I've been building for nearly 3 years. It runs, its stable (It doesn't crash or throw exceptions) and I'm playing with the release version of it. And I've come to realise that there is a huge performance hit, because I'm relying too much on boost::iequals.
I know, there's a lot of on SO about this, this is not a question on how to do it, but rather why is this happening.
Consider the following:
#include <string.h>
#include <string>
#include <boost/algorithm/string.hpp>
void posix_str ( )
{
std::string s1 = "Alexander";
std::string s2 = "Pericles";
std::cout << "POSIX strcasecmp: " << strcasecmp( s1.c_str(), s2.c_str() ) << std::endl;
}
void boost_str ( )
{
std::string s1 = "Alexander";
std::string s2 = "Pericles";
std::cout << "boost::iequals: " << boost::iequals( s1, s2 ) << std::endl;
}
int main ( )
{
posix_str();
boost_str();
return 0;
}
I put this through valgrind and cachegrind, and to my suprise, boost is 4 times slower than the native posix or the std (which appears to be using the same posix) methods. Four times, now that is a lot, even considering that C++ offers a nice safety net. Why is that? I would really like other people to run this, and explain to me, what makes such a performance hit. Is it all the allocations (seems to be from the caller map).
I'm not dissing on boost, I love it and use it everywhere and anywhere.
EDIT: This graph shows what I mean
Boost::iequals is locale-aware. As you can see from its definition here it takes an optional third parameter that is defaulted to a default-constructed std::locale, that represents the current global C++ locale, as set by std::locale::global.
This more or less means that the compiler has no way to know in advance which locale is going to be used, and that means that there will be an indirect call to a certain function to convert each character to lower-case in the current locale.
On the other hand, the documentation for strcasecmp states that:
In the POSIX locale, strcasecmp() and strncasecmp() shall behave as if the strings had been converted to lowercase and then a byte comparison performed. The results are unspecified in other locales.
That means that the locale is fixed, hence you can expect it to be heavily optimized.
I was solving a question my teacher gave me and hit a little snag.
I am supposed to give the output of the following code:(It's written in Turbo C++)
#include<iostream.h>
void main()
{
char *p = "School";
char c;
c=++(*(p++));
cout<<c<<","<<p<<endl;
cout<<p<<","<<++(*(p--))<<","<<++(*(p++))<<endl;
}
The output the program gives is:
T,chool
ijool,j,i
I got the part where the pointer itself increments and then increments the value which the pointer points to. But i don't get the part where the string prints out ijool
Can someone help me out?
The program you showed is non-standard and ill-formed (and should not compile).
"Small" problems:
The proper header for input/output streams in C++ is <iostream>, not <iostream.h>
main() returns an int, not a void.
cout and endl cannot be used without a using namespace std; at the beginning of the file, or better: use std::cout and std::endl.
"Core" problems:
char* p = "School"; is a pointer to string litteral. This conversion is valid in C++03 and deprecated in C++11. Aside from that, normally string litterals are read only, and attempts to modify them often result in segfaults (and modifying a string litteral is undefined behvior by the standard). So, you have undefined behavior everytime you use p, because you modify what it points to, which is the string litteral.
More subtle (and the practical explanation): you are modifying p several times in the line std::cout<<p<<","<<++(*(p--))<<","<<++(*(p++))<<std::endl;. It is undefined behavior. The order used for the operations on p is not defined, here it seems the compiler starts from the right. You can see sequence points, sequence before/after for a better explanation.
You might be interested with the live code here, which is more like what you seemed to expect from your program.
Let's assume you correct:
the header to <iostream> - there is no iostream.h header
your uses of cout and endl with std::cout and std::endl respectively
the return type of main to int
Okay,
char *p = "School";
The string literal "School" is of type "array of 7 const char." The conversion to char* was deprecated in C++03. In C++11, this is invalid.
c=++(*(p++));
Here we hit undefined behaviour. As I said before, the chars in a string literal are const. You simply can't modify them. The prefix ++ here will attempt to modify the S character in the string literal.
So from this point onwards, there's no use making conjectures about what should happen. You have undefined behaviour. Anything can happen.
Even if the preceding lines were legal, this line is also undefined behavior, which means that you cannot accurately predict what the output will be:
cout<<p<<","<<++(*(p--))<<","<<++(*(p++))<<endl;
Notice how it modifies the value of p multiple times on that line (really between sequence points)? That's not allowed. At best you can say "on this compiler with this run-time library and this environment at this moment of execution I observed the following behavior", but because it is undefined behavior you can't count on it to do the same thing every time you run the program, or even if the same code is encountered multiple times within the same run of the program.
There are at least three problems with this code (and maybe more; I'm not a C++ expert).
The first problem is that string constants like should not be modified as they can be placed in read-only parts of the program memory that the OS maps directly to the exe file on disk (the OS may share them between several running instances of that same program for example, or avoid those parts of memory needing to be written to the swap file when RAM is low, as it knows it can get the original from the exe). The example crashes on my compiler, for example. To modify the string you should allocate a modifiable duplicate of the string, such as with strdup.
The second problem is it's using cout and endl from the std namespace without declaring that. You should prefix their accesses with std:: or add a using namespace std; declaration.
The third problem is that the order in which the operations on the second cout line happen is undefined behavior, leading to the apparently mysterious change of the string between the time it was displayed at the end of the first cout line and the next line.
Since this code is not intended to do anything in particular, there are different, valid ways you could fix it. This will probably run:
#include <iostream>
#include <string.h>
#include <stdlib.h>
using namespace std;
int main()
{
char *string = strdup("School");
char *p = string;
char c;
c=++(*(p++));
cout<<c<<","<<p<<endl;
cout<<p<<","<<++(*(p--))<<","<<++(*(p++))<<endl;
free(string);
}
(On my compiler this outputs: T,chool, diool,i,d.)
It still has undefined behavior though. To fix that, rework the second cout line as follows:
cout << p << ",";
cout << ++(*(p--)) << ",";
cout << ++(*(p++)) << endl;
That should give T,chool, chool,d,U (assuming a character set that has A to Z in order).
p++ moves the position of p from "School" to "chool". Before that, since it is p++, not ++p, it increments the value of the char. Now c = "T" from "S"
When you output p, you output the remainder of p, which we identified before as "chool".
Since it is best to learn from trial and error, run this code with a debugger. That is a great tool which will follow you forever. That will help for the second set of cout statements. If you need help with gdb or VS debugger, we can walk through it.
My understanding is that string is a member of the std namespace, so why does the following occur?
#include <iostream>
int main()
{
using namespace std;
string myString = "Press ENTER to quit program!";
cout << "Come up and C++ me some time." << endl;
printf("Follow this command: %s", myString);
cin.get();
return 0;
}
Each time the program runs, myString prints a seemingly random string of 3 characters, such as in the output above.
C++23 Update
We now finally have std::print as a way to use std::format for output directly:
#include <print>
#include <string>
int main() {
// ...
std::print("Follow this command: {}", myString);
// ...
}
This combines the best of both approaches.
Original Answer
It's compiling because printf isn't type safe, since it uses variable arguments in the C sense1. printf has no option for std::string, only a C-style string. Using something else in place of what it expects definitely won't give you the results you want. It's actually undefined behaviour, so anything at all could happen.
The easiest way to fix this, since you're using C++, is printing it normally with std::cout, since std::string supports that through operator overloading:
std::cout << "Follow this command: " << myString;
If, for some reason, you need to extract the C-style string, you can use the c_str() method of std::string to get a const char * that is null-terminated. Using your example:
#include <iostream>
#include <string>
#include <stdio.h>
int main()
{
using namespace std;
string myString = "Press ENTER to quit program!";
cout << "Come up and C++ me some time." << endl;
printf("Follow this command: %s", myString.c_str()); //note the use of c_str
cin.get();
return 0;
}
If you want a function that is like printf, but type safe, look into variadic templates (C++11, supported on all major compilers as of MSVC12). You can find an example of one here. There's nothing I know of implemented like that in the standard library, but there might be in Boost, specifically boost::format.
[1]: This means that you can pass any number of arguments, but the function relies on you to tell it the number and types of those arguments. In the case of printf, that means a string with encoded type information like %d meaning int. If you lie about the type or number, the function has no standard way of knowing, although some compilers have the ability to check and give warnings when you lie.
Please don't use printf("%s", your_string.c_str());
Use cout << your_string; instead. Short, simple and typesafe. In fact, when you're writing C++, you generally want to avoid printf entirely -- it's a leftover from C that's rarely needed or useful in C++.
As to why you should use cout instead of printf, the reasons are numerous. Here's a sampling of a few of the most obvious:
As the question shows, printf isn't type-safe. If the type you pass differs from that given in the conversion specifier, printf will try to use whatever it finds on the stack as if it were the specified type, giving undefined behavior. Some compilers can warn about this under some circumstances, but some compilers can't/won't at all, and none can under all circumstances.
printf isn't extensible. You can only pass primitive types to it. The set of conversion specifiers it understands is hard-coded in its implementation, and there's no way for you to add more/others. Most well-written C++ should use these types primarily to implement types oriented toward the problem being solved.
It makes decent formatting much more difficult. For an obvious example, when you're printing numbers for people to read, you typically want to insert thousands separators every few digits. The exact number of digits and the characters used as separators varies, but cout has that covered as well. For example:
std::locale loc("");
std::cout.imbue(loc);
std::cout << 123456.78;
The nameless locale (the "") picks a locale based on the user's configuration. Therefore, on my machine (configured for US English) this prints out as 123,456.78. For somebody who has their computer configured for (say) Germany, it would print out something like 123.456,78. For somebody with it configured for India, it would print out as 1,23,456.78 (and of course there are many others). With printf I get exactly one result: 123456.78. It is consistent, but it's consistently wrong for everybody everywhere. Essentially the only way to work around it is to do the formatting separately, then pass the result as a string to printf, because printf itself simply will not do the job correctly.
Although they're quite compact, printf format strings can be quite unreadable. Even among C programmers who use printf virtually every day, I'd guess at least 99% would need to look things up to be sure what the # in %#x means, and how that differs from what the # in %#f means (and yes, they mean entirely different things).
use myString.c_str() if you want a c-like string (const char*) to use with printf
thanks
Use std::printf and c_str()
example:
std::printf("Follow this command: %s", myString.c_str());
You can use snprinft to determine the number of characters needed and allocate a buffer of the right size.
int length = std::snprintf(nullptr, 0, "There can only be %i\n", 1 );
char* str = new char[length+1]; // one more character for null terminator
std::snprintf( str, length + 1, "There can only be %i\n", 1 );
std::string cppstr( str );
delete[] str;
This is a minor adaption of an example on cppreference.com
printf accepts a variable number of arguments. Those can only have Plain Old Data (POD) types. Code that passes anything other than POD to printf only compiles because the compiler assumes you got your format right. %s means that the respective argument is supposed to be a pointer to a char. In your case it is an std::string not const char*. printf does not know it because the argument type goes lost and is supposed to be restored from the format parameter. When turning that std::string argument into const char* the resulting pointer will point to some irrelevant region of memory instead of your desired C string. For that reason your code prints out gibberish.
While printf is an excellent choice for printing out formatted text, (especially if you intend to have padding), it can be dangerous if you haven't enabled compiler warnings. Always enable warnings because then mistakes like this are easily avoidable. There is no reason to use the clumsy std::cout mechanism if the printf family can do the same task in a much faster and prettier way. Just make sure you have enabled all warnings (-Wall -Wextra) and you will be good. In case you use your own custom printf implementation you should declare it with the __attribute__ mechanism that enables the compiler to check the format string against the parameters provided.
The main reason is probably that a C++ string is a struct that includes a current-length value, not just the address of a sequence of chars terminated by a 0 byte. Printf and its relatives expect to find such a sequence, not a struct, and therefore get confused by C++ strings.
Speaking for myself, I believe that printf has a place that can't easily be filled by C++ syntactic features, just as table structures in html have a place that can't easily be filled by divs. As Dykstra wrote later about the goto, he didn't intend to start a religion and was really only arguing against using it as a kludge to make up for poorly-designed code.
It would be quite nice if the GNU project would add the printf family to their g++ extensions.
Printf is actually pretty good to use if size matters. Meaning if you are running a program where memory is an issue, then printf is actually a very good and under rater solution. Cout essentially shifts bits over to make room for the string, while printf just takes in some sort of parameters and prints it to the screen. If you were to compile a simple hello world program, printf would be able to compile it in less than 60, 000 bits as opposed to cout, it would take over 1 million bits to compile.
For your situation, id suggest using cout simply because it is much more convenient to use. Although, I would argue that printf is something good to know.
Here’s a generic way of doing it.
#include <string>
#include <stdio.h>
auto print_helper(auto const & t){
return t;
}
auto print_helper(std::string const & s){
return s.c_str();
}
std::string four(){
return "four";
}
template<class ... Args>
void print(char const * fmt, Args&& ...args){
printf(fmt, print_helper(args) ...);
}
int main(){
std::string one {"one"};
char const * three = "three";
print("%c %d %s %s, %s five", 'c', 3+4, one + " two", three, four());
}
Ps: This is more of a conceptual question.
I know this makes things more complicated for no good reason, but here is what I'm wondering. If I'm not mistaken, a const char* "like this" in c++ is pointing to l and will be automatically zero terminated on compile time. I believe it is creating a temporary variable const char* to hold it, unless it is keeping track of the offset using a byte variable (I didn't check the disassembly). My question is, how would you if even possible, add characters to this string without having to call functions or instantiating strings?
Example (This is wrong, just so you can visualize what I meant):
"Like thi" + 's';
The closest thing I came up with was to store it to a const char* with enough spaces and change the other characters.
Example:
char str[9];
strcpy(str, "Like thi")
str[8] = 's';
Clarification:
Down vote: This question does not show any research effort; it is unclear or not useful
Ok, so the question has been highly down voted. There wasn't much reasoning on which of these my question was lacking on, so I'll try to improve all of those qualities.
My question was more so I could have a better understanding of what goes on when you simply create a string "like this" without storing the address of that string in a const char* I also wanted to know if it was possible to concatenate/change the content of that string without using functions like strcat() and without using the overloaded operator + from the class string. I'm aware this is not exactly useful for dealing with strings in C++, but I was curious whether or not there was a way besides the standard ways for doing so.
string example = "Like thi" + "s"; //I'm aware of the string class and its member functions
const char* example2 = "Like this"; //I'm also aware of C-type Strings (CString as well)
It is also possible that not having English as my native language made things even worst, I apologize for the confusion.
Instead of using a plain char string, you should use the string library provided by the C++ library:
#include <string>
#include <iostream>
using namespace std;
int main()
{
string str = "Like thi";
cout << str << endl;
str = str + "s";
cout << str << endl;
return 0;
}
Normally, it's not possible to simply concatenate plain char * strings in C or C++, because they are merely pointers to arrays of characters. There's almost no reason you should be using a bare character array in C++ if you intend on doing any string manipulations within your own code.
Even if you need access to the C representation (e.g. for an external library) you can use string::c_str().
First, there is nothing null terminated, but the zero terminated. All char* strings in C end with '\0'.
When you in code do something like this:
char *name="Daniel";
compiler will generate a string that has a contents:
Daniel\0
and will initialize name pointer to point at it at a certain time during program execution depending on the variable context (member, static, ...).
Appending ANYTHING to the name won't work as you expect, since memory pointed to by name isn't changeable, and you'll probably get either access violation error or will overwrite something else.
Having
const char* copyOfTheName = name;
won't create a copy of the string in question, it will only have copyOfTheName point to the original string, so having
copyOfTheName[6]='A';
will be exactly as
name[6]='A';
and will only cause problems to you.
Use std::strcat instead. And please, do some investigating how the basic string operations work in C.