Are strings in C++ copied when modified? - c++

Using the std::string class in C++, it is possible to modify a character using the array notation, like:
std::string s = "Hello";
s[0] = 'X';
cout << s << '\n';
I have checked that this code compiles, and prints "Xello" as expected. However, I was wondering what the cost of this operation is: is it constant time, or is it O(n) because the string is copied?

The string isn't copied. The internal data is directly modified.
It basically gets the internal data pointer of the actual string memory, and modifies it. Imagine doing this:
char *data = &str[0];
for(size_t i = 0; i < str.size(); ++i)
{
data[i] = '!';
}
The code sets every character of the string to an exclamation mark.
But if the string was copied, then after the first write, the data pointer would become invalid.
Or to use another example:
std::cout << str[5] << std::endl;
That prints the 6th character of the string. Why would that copy the string?
C++ can't tell the difference between char c = str[5] and str[5] = c (except as far as const vs non-const function calls go).
Also, str[n] is guaranteed to never throw exceptions, as long as n < str.size(). It can't make that guarantee if it had to allocate memory internally for a copy - because the allocation could fail and throw.
(As @juanchopanza mentioned, older C++ standards permitted copy-on-write (CoW) strings, but C++11 and later forbid this.)
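As a rough illustration of the point above, the following sketch compares the address of the character buffer before and after a write through operator[]. The function name write_keeps_buffer is made up for this example, and the check assumes a C++11 or later standard library, where copy-on-write is not permitted.

```cpp
#include <string>

// Hypothetical helper: returns true if writing through operator[] left the
// string's character buffer at the same address, i.e. no copy was made.
bool write_keeps_buffer(std::string& s)
{
    if (s.empty()) return true;     // nothing to write to
    const char* before = s.data();  // address of the buffer before the write
    s[0] = 'X';                     // constant-time write of a single character
    return s.data() == before;      // same address: modified in place
}
```

Calling it on a string holding "Hello" should leave the string as "Xello" and report that the buffer did not move.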

You can modify an STL string as in your example; no copy will be made. The standard library string class does not manage a string pool the way some other languages (such as Java) do for strings. This operation has constant complexity.

You only modify the first element, s[0], so it can't be O(n). The string isn't copied.

Related

Is it possible for separately initialized string variables to overlap?

If I initialize several string(character array) variables in the following ways:
const char* myString1 = "string content 1";
const char* myString2 = "string content 2";
Since const char* is simply a pointer to a specific char object, it does not contain any size or range information about the character array it is pointing to.
So, is it possible for two string literals to overlap each other? (The newly allocated overlap the old one)
By overlap, I mean the following behaviour;
// Continue from the code block above
std::cout << myString1 << std::endl;
std::cout << myString2 << std::endl;
It outputs
string costring content 2
string content 2
So the start of myString2 is somewhere in the middle of myString1. Because const char* does not "protect"("possess") a range of memory locations but only that one it points to, I do not see how C++ can prevent other string literals from "landing" on the memory locations of the older ones.
How does C++/compiler avoid such problem?
If I change const char* to const char[], is it still the same?
Yes, string literals are allowed to overlap in general. From lex.string#9
... Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
So it's up to the compiler to make a decision as to whether any string literals overlap in memory. You can write a program to check whether the string literals overlap, but since it's unspecified whether this happens, you may get different results every time you run the program.
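One way such a check could be written is sketched below. The helper name ranges_overlap is hypothetical, and comparing addresses converted to std::uintptr_t assumes a flat address space, which holds on mainstream platforms but is not something the standard promises.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reports whether the byte ranges [a, a + a_size) and
// [b, b + b_size) share at least one byte.
bool ranges_overlap(const char* a, std::size_t a_size,
                    const char* b, std::size_t b_size)
{
    // Assumption: pointers convert to integers that preserve their ordering.
    const auto a_begin = reinterpret_cast<std::uintptr_t>(a);
    const auto b_begin = reinterpret_cast<std::uintptr_t>(b);
    // Two ranges overlap when each one starts before the other one ends.
    return a_begin < b_begin + b_size && b_begin < a_begin + a_size;
}
```

Calling ranges_overlap(myString1, 17, myString2, 17) with the two literals from the question (16 characters plus the terminator each) would show what your compiler did in that particular build; the result is unspecified and may differ between compilers or optimization levels.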
A string is required to end with a null character having a value of 0, and can't have such a character in the middle. So the only case where this is even possible is when two strings are equal from the start of one to the end of both. That is not the case in the example you gave, so those two particular strings would never overlap.
Edit: sorry, I didn't mean to mislead anybody. It's actually easy to put a null character in the middle of a string with \0. But most string handling functions, particularly those in the standard library, will treat that as the end of a string - so your strings will get truncated. Not very practical. Because of that the compiler won't try to construct such a string unless you explicitly ask it to.
The compiler knows the size of each string, because it can "see" it in your code.
Additionally, they are not allocated the way you would allocate them at run time. Instead, if the strings are constant and defined globally, they are most likely located in a read-only section of the object file (such as .rodata, or .text on some toolchains), not on the heap.
And since the compiler knows the size of a constant string at compile time, it can simply put its value in the free space of that section. The specifics depend on the compiler you use, but rest assured the people who wrote it are smart enough to avoid this issue.
If you define these strings inside some function instead, the compiler can choose between the first option and allocating space on the stack.
As for the const char[], most compilers will treat it the same way as const char*.
Two string literals will not likely overlap unless they are the same. In that case though the pointers will be pointing to the same thing. (This isn't guaranteed by the standard though, but I believe any modern compiler should make this happen.)
const char *a = "Hello there.";
const char *b = "Hello there.";
cout << (a == b);
// prints "1" when the compiler stores both literals once, which means they point to the same thing
The const char * can share a string though.
const char *a = "Hello there.";
const char *b = a + 6;
cout << a;
// prints "Hello there."
cout << b;
// prints "there."
I think to answer your second question an explanation of c-style strings is useful.
A const char * is just a pointer to a string of characters. The const means that the characters themselves are immutable. (They are stored as part of the executable itself and you wouldn't want your program to change itself like this. You can use the strings command on unix to easily see all the strings in an executable, e.g. strings a.out. You will see many more strings than what you coded, as many exist as part of the standard library and other things required for an executable.)
So how does it know to just print the string and then stop at the end? Well, a c-style string is required to end with a null byte (\0). The compiler implicitly puts it there when you declare a string. So "string content 1" is actually "string content 1\0".
const char *a = "Hello\0 there.";
cout << a;
// prints "Hello"
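To make the difference between what is stored and what is printed concrete, here is a small sketch using the same literal. The names text, visible_length and stored_size are invented for the illustration.

```cpp
#include <cstddef>
#include <cstring>

// The literal holds 13 characters, and the compiler appends one more '\0'.
const char text[] = "Hello\0 there.";

// Length as C string functions see it: they stop at the first '\0'.
std::size_t visible_length() { return std::strlen(text); }

// Number of bytes actually stored in the array, both terminators included.
std::size_t stored_size() { return sizeof text; }
```

Here visible_length() gives 5 while stored_size() gives 14, which is why printing the pointer shows only "Hello".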
For the most part const char *a and const char a[] are the same.
// These are valid and equivalent
const char *a = "Hello";
const char b[] = "there.";
// This is valid
const char *c = b + 3; // c points to "re."
// This, however, is not valid
const char d[] = b + 3;
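One place where the two declarations visibly differ is sizeof. The sketch below (function names are made up) shows that the array form owns storage for the whole string, while the pointer form only stores an address.

```cpp
#include <cstddef>

// Array form: the object itself holds 'H' 'e' 'l' 'l' 'o' '\0'.
std::size_t array_bytes()
{
    const char arr[] = "Hello";
    return sizeof arr;  // 6: five characters plus the terminator
}

// Pointer form: the object is just a pointer to the literal's storage.
std::size_t pointer_bytes()
{
    const char* p = "Hello";
    return sizeof p;    // size of a pointer, which is platform dependent
}
```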

In the C++ STL, does the string container actually contain a string with a closing 0? [duplicate]

Will the below string contain the null terminator '\0'?
std::string temp = "hello whats up";
No, but if you say temp.c_str() a null terminator will be included in the return from this method.
It's also worth saying that you can include a null character in a string just like any other character.
string s("hello");
cout << s.size() << ' ';
s[1] = '\0';
cout << s.size() << '\n';
prints
5 5
and not 5 1 as you might expect if null characters had a special meaning for strings.
Not in C++03, and before C++11 it is not even guaranteed that a std::string is contiguous in memory. Only C strings (char arrays intended for storing strings) have the null terminator.
In C++11 and later, mystring.c_str() is equivalent to mystring.data() is equivalent to &mystring[0], and mystring[mystring.size()] is guaranteed to be '\0'.
In C++17 and later, mystring.data() also provides an overload that returns a non-const pointer to the string's contents, while mystring.c_str() only provides a const-qualified pointer.
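The C++11 guarantees listed above can be checked with a short sketch like the following. The function name buffer_is_null_terminated is hypothetical, and the code assumes a C++11 or later compiler.

```cpp
#include <string>

// Hypothetical helper: true when the three ways of reaching the buffer agree
// and the element one past the last character is '\0'.
bool buffer_is_null_terminated(const std::string& s)
{
    return s.c_str() == s.data()     // both return the same pointer
        && s.data() == &s[0]         // which is the address of the first element
        && s[s.size()] == '\0';      // reading index size() yields the terminator
}
```

On a conforming C++11 implementation this should return true for any string, including an empty one.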
This depends on your definition of 'contain' here. In
std::string temp = "hello whats up";
there are few things to note:
temp.size() will return the number of characters from first h to last p (both inclusive)
But at the same time temp.c_str() or temp.data() will return with a null terminator
Or in other words int(temp[temp.size()]) will be zero
I know, I sound similar to some of the answers here but I want to point out that size of std::string in C++ is maintained separately and it is not like in C where you keep counting unless you find the first null terminator.
To add, the story is a little different if your string literal contains an embedded \0. In this case, construction of the std::string from the plain literal stops at the first null character, as follows:
std::string s1 = "ab\0\0cd"; // s1 contains "ab", using string literal
std::string s2{"ab\0\0cd", 6}; // s2 contains "ab\0\0cd", using the (pointer, length) constructor
std::string s3 = "ab\0\0cd"s; // s3 contains "ab\0\0cd", using ""s operator
References:
https://akrzemi1.wordpress.com/2014/03/20/strings-length/
http://en.cppreference.com/w/cpp/string/basic_string/basic_string
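The three constructions above can be compared by size, as in this sketch. The function names are invented for the example, and the ""s suffix needs C++14 or later.

```cpp
#include <cstddef>
#include <string>

using namespace std::string_literals;  // enables the ""s suffix (C++14)

// Built from a plain literal: the constructor scans for the first '\0'.
std::size_t size_from_c_string()
{
    std::string s = "ab\0\0cd";
    return s.size();
}

// Built with an explicit length: all 6 characters are taken.
std::size_t size_from_length()
{
    std::string s("ab\0\0cd", 6);
    return s.size();
}

// Built with the ""s suffix: the literal's full length is passed along.
std::size_t size_from_s_literal()
{
    std::string s = "ab\0\0cd"s;
    return s.size();
}
```

The first function returns 2, the other two return 6.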
Yes if you call temp.c_str(), then it will return null-terminated c-string.
However, the actual data stored in the object temp may not be null-terminated, but it doesn't matter and shouldn't matter to the programmer, because when the programmer wants a const char*, he would call c_str() on the object, which is guaranteed to return a null-terminated string.
With C++ strings you don't have to worry about that, and it's possibly dependent on the implementation.
Using temp.c_str() you get a C representation of the string, which will definitely contain the \0 char. Other than that, I don't really see how it would be useful on a C++ string.
std::string internally keeps a count of the number of characters. Internally it works using this count. Like others have said, when you need the string for display or whatever reason, you can call its c_str() method, which will give you the string with the null terminator at the end.

Store value in Pointers as an Array - C++

I am trying to make a function like strcpy in C++. I cannot use built-in string.h functions because of restriction by our instructor. I have made the following function:
int strlen (char* string)
{
int len = 0;
while (string [len] != (char)0) len ++;
return len;
}
char* strcpy (char* *string1, char* string2)
{
for (int i = 0; i<strlen (string2); i++) *string1[i] = string2[i];
return *string1;
}
main()
{
char* i = "Farid";
strcpy (&i, "ABC ");
cout<<i;
}
But I am unable to set *string1 [i] value. When I try to do so an error appears on screen 'Program has encountered a problem and need to close'.
What should I do to resolve this problem?
Your strcpy function is wrong. When you write *string1[i] you are actually modifying the first character of the i-th element of an imaginary array of strings. That memory location does not exist and your program segfaults.
Do this instead:
char* strcpy (char* string1, const char* string2)
{
int i = 0;
for (; string2[i] != (char)0; i++) string1[i] = string2[i];
string1[i] = (char)0; // terminate the copy
return string1;
}
If you pass a char* the characters are already modifiable. Note that it is the responsibility of the caller to allocate the memory to hold the copy. And the declaration:
char* i = "Farid";
is not a valid allocation, because the i pointer will likely point to read-only memory. Do instead:
char i[100] = "Farid";
Now i holds 100 chars of local memory, plenty of room for your copy:
strcpy(i, "ABC ");
If you wanted this function to allocate memory, then you should create another one, say strdup():
char* strdup (char* string)
{
size_t len = strlen(string);
char *n = (char*)malloc(len + 1); // +1 for the terminating null character
if (!n)
return 0;
strcpy(n, string);
return n;
}
Now, with this function the caller has the responsibility to free the memory:
char *i = strdup("ABC ");
//use i
free(i);
Because of this error in the declaration of strcpy: "char* *string1"
I don't think you meant string1 to be a pointer to a pointer to char.
Removing one of the * should work.
The code has several issues:
You can't assign a string literal to char* because the string literal has type char const[N] (for a suitable value of N) which converts to char const* but not to char*. In C++03 it was possible to convert to char* for backward compatibility but this rule is now gone. That is, your i needs to be declared char const*. As implemented above, your code tries to write read-only memory which will have undesirable effects.
The declaration of std::strcpy() takes a char* and a char const*: for the first pointer you need to provide sufficient space to hold a string of the second argument. Since this is error-prone it is a bad idea to use strcpy() in the first place! Instead, you want to replicate std::strncpy() which takes as third argument the length of the first buffer (actually, I'm never sure if std::strncpy() guarantees zero termination or not; you definitely also want to guarantee zero termination).
It is a bad idea to use strlen() in the loop condition, as the function needs to be evaluated for each iteration of the loop, effectively changing the complexity of your strcpy() from linear (O(N)) to quadratic (O(N^2)). Quadratic complexity is very bad: copying a string of 1000 characters takes on the order of 1000000 operations. If you want to try out the effect, copy a string with 1000000 characters using a linear and a quadratic algorithm.
Your strcpy() doesn't add a null-terminator.
In C++ (and in C since ~1990) the implicit int rule doesn't apply. That is, you really need to write int in front of main().
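Putting the points from this list together, a bounded copy might look like the sketch below. The name copy_string and its parameter order are assumptions made for this example, not a standard function; it walks the source once and always writes a terminator.

```cpp
#include <cstddef>

// Hypothetical helper: copies src into dst, writing at most dst_size bytes
// including the terminator. Returns dst.
char* copy_string(char* dst, std::size_t dst_size, const char* src)
{
    if (dst_size == 0) return dst;  // no room even for the terminator
    std::size_t i = 0;
    // Single pass: test the source character directly instead of calling
    // strlen() on every iteration, so the copy stays linear.
    for (; i + 1 < dst_size && src[i] != '\0'; ++i)
        dst[i] = src[i];
    dst[i] = '\0';                  // always terminate, even when truncating
    return dst;
}
```

With char i[100] = "Farid"; the call copy_string(i, sizeof i, "ABC ") leaves "ABC " in the array, and a 4-byte destination would receive a truncated but still terminated "Far".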
OK, a couple of things:
You are missing the return type for the main function declaration. Not really allowed under the standard. Some compilers will still allow it, but others will fail on the compile.
The way you have your for loop structured in strcpy, you are calling your strlen function each time through the loop, and it is having to re-count the characters in the source string. Not a big deal with a string like "ABC " but as strings get longer.... Better to save the value of the result into a variable and use that in the for loop.
Because of the way that you are declaring i in main, you are pointing to read-only storage, and will be causing an access violation.
Look at the other answers here for how to rebuild your code.
Pointer use in C and C++ is a perennial issue. I'd like to suggest the following tutorial from Paul DiLorenzo, "Learning C++ Pointers for REAL dummies".
(This is not to imply that you are a "dummy," it's just a reference to the "<insert subject here> for Dummies" lines of books. I would not be surprised that the insertion of "REAL" is to forestall lawsuits over trademarked titles.)
It is an excellent tutorial.
Hope it helps.

What is the right way to handle char* strings?

I have a third party library that is using char* (non-const) as placeholder for string values. What is the right and safe way to assign values to those datatypes? I have the following test benchmark that uses my own timer class to measure execution times:
#include "string.h"
#include <iostream>
#include <sj/timer_chrono.hpp>
using namespace std;
int main()
{
sj::timer_chrono sw;
int iterations = 1e7;
// first method gives compiler warning:
// conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
cout << "creating c-strings unsafe(?) way..." << endl;
sw.start();
for (int i = 0; i < iterations; ++i)
{
char* str = "teststring";
}
sw.stop();
cout << sw.elapsed_ns() / (double)iterations << " ns" << endl;
cout << "creating c-strings safe(?) way..." << endl;
sw.start();
for (int i = 0; i < iterations; ++i)
{
char* str = new char[strlen("teststr")];
strcpy(str, "teststring");
}
sw.stop();
cout << sw.elapsed_ns() / (double)iterations << " ns" << endl;
return 0;
}
Output:
creating c-strings unsafe(?) way...
1.9164 ns
creating c-strings safe(?) way...
31.7406 ns
While the "safe" way gets rid of the compiler warning, it makes the code about 15-20 times slower according to this benchmark (1.9 nanoseconds per iteration vs 31.7 nanoseconds per iteration). What is the correct way, and what is so dangerous about that "deprecated" way?
The C++ standard is clear:
An ordinary string literal has type “array of n const char” (section 2.14.5.8 in C++11).
and
The effect of attempting to modify a string literal is undefined (section 2.14.5.12 in C++11).
For a string known at compile time, the safe way of obtaining a non-const char* is this
char literal[] = "teststring";
you can then safely
char* ptr = literal;
If at compile time you don't know the string but know its length you can use an array:
char str[STR_LENGTH + 1];
If you don't know the length then you will need to use dynamic allocation. Make sure you deallocate the memory when the strings are no longer needed.
This will work only if the API doesn't take ownership of the char* you pass.
If it tries to deallocate the strings internally then it should say so in the documentation and inform you on the proper way to allocate the strings. You will need to match your allocation method with the one used internally by the API.
The
char literal[] = "test";
will create a local, 5-character array with automatic storage (meaning the variable will be destroyed when the execution leaves the scope in which the variable is declared) and initialize the characters in the array with 't', 'e', 's', 't' and '\0'.
You can later edit these characters: literal[2] = 'x';
If you write this:
char* str1 = "test";
char* str2 = "test";
then, depending on the compiler, str1 and str2 may be the same value (i.e., point to the same string).
("Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined." in Section 2.14.5.12 of the C++ standard)
It may also be true that they are stored in a read-only section of memory and therefore any attempt to modify the string will result in an exception/crash.
They are also, in reality, of type const char[N] (which decays to const char*), so this line:
char* str = "test";
actually casts away the const-ness on the string, which is why the compiler will issue the warning.
The unsafe way is the way to go for all strings that are known at compile-time.
Your "safe" way leaks memory and is rather horrific.
Normally you'd have a sane C API which accepts const char *, so you could use a proper safe way in C++, i.e. std::string and its c_str() method.
If your C API assumes ownership of the string, your "safe way" has another flaw: you can't mix new[] and free(), passing memory allocated using the C++ new[] operator to a C API which expects to call free() on it is not allowed. If the C API doesn't want to call free() later on the string, it should be fine to use new[] on the C++ side.
Also, this is a strange mixture of C++ and C.
You seem to have a fundamental misunderstanding about C strings here.
cout << "creating c-strings unsafe(?) way..." << endl;
sw.start();
for (int i = 0; i < iterations; ++i)
{
char* str = "teststring";
}
Here, you're just assigning a pointer to a string literal constant. In C, string literals are of type char[N] (in C++, const char[N]), and you can assign a string literal to a pointer because of array "decay". (However, assigning a string literal to a non-const pointer is deprecated in C++03 and no longer allowed since C++11.)
But assigning a pointer to a string literal can't be what you want to do. Your API expects a non-const string. String literals are const.
What is the right and safe way to assign values to those [char* strings]?
There's no general answer to this question. Whenever you work with C strings (or pointers in general), you need to deal with the concept of ownership. C++ takes care of this for you automatically with std::string. Internally, std::string owns a pointer to a char array, but it manages the memory for you so you don't need to care about it. But when you use raw C-strings, you DO need to put thought into managing the memory.
How you manage the memory depends on what you're doing with your program. If you allocate a C-string with new[], then you need to deallocate it with delete[]. If you allocate it with malloc, then you must deallocate it with free(). A good solution for working with C-strings in C++ is to use a smart pointer which takes ownership of the allocated C string. (But you'll need to use a deleter that deallocates the memory with delete[]). Or you can just use std::vector<char>. As always, don't forget to allocate room for the terminating null char.
Also, the reason your 2nd loop is so much slower is because it allocates memory in each iteration, whereas the 1st loop simply assigns a pointer to a statically-allocated string literal.
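As a sketch of the std::vector<char> suggestion, the code below builds an owned, writable, null-terminated buffer that can be handed to an API taking char*. Both make_mutable_buffer and legacy_length are made-up names; legacy_length merely stands in for the third-party function, and the approach assumes that function does not take ownership of the pointer.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: builds an owned, writable, null-terminated copy of text.
std::vector<char> make_mutable_buffer(const std::string& text)
{
    // c_str() is null-terminated, so copying size() + 1 chars includes the '\0'.
    return std::vector<char>(text.c_str(), text.c_str() + text.size() + 1);
}

// Stand-in for a third-party function that takes a non-const char*.
std::size_t legacy_length(char* s)
{
    return std::strlen(s);
}
```

A caller would write auto buf = make_mutable_buffer("teststring"); and pass buf.data() to the API. The vector frees its memory when it goes out of scope, so there is no new[] or delete[] to match up.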

Simple String Manipulation

Suppose we have char *a ="Mission Impossible";
If we give cout<<*(a+1), then the output is i.
Is there any way to change this value, or is this not possible?
Yes, there are several ways to do this, but you have to make a copy of the string first because if you didn't, you'd be modifying memory you're not allowed to (where string literals are stored).
const char* a = "Mission Impossible"; // const char*, not char*, because we can't
// modify its contents
char buf[80] = {}; // create an array of chars 80 large, all initialised to 0
strncpy(buf, a, 79); // copy up to 79 characters from a to buf
cout << *(buf + 1); // prints i
buf[1] = 'b';
cout << *(buf + 1); // prints b
*(buf + 1) = 't';
cout << buf[1]; // prints t
That said, if this exercise is not for learning purposes, it is highly recommended that you learn and use std::string rather than C-style strings. They are superior in almost every way and will result in far less frustration and errors in your code.
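For comparison, the std::string version of the same change could look like this sketch. The function name replace_second_char is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns a modified copy of text. The characters are
// copied into the std::string first, so the literal itself is never written to.
std::string replace_second_char(const char* text, char replacement)
{
    std::string s = text;                  // copy into writable storage
    if (s.size() > 1) s[1] = replacement;  // guard against very short input
    return s;
}
```

Calling replace_second_char("Mission Impossible", 'b') returns "Mbssion Impossible", with no fixed-size buffer to get wrong.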
char a[] = "Mission Impossible";
a[1] = 'x';
String literals cannot be modified. Typically they are placed in a section of the binary that will be mapped read-only, so writing to them generates a fault. (Formally, modifying one is undefined behavior; a fault is simply the most common outcome these days.)
By declaring the string as a character array it is writable. The other alternative would be to duplicate the string literal into heap memory, either through malloc, new, or std::string.
No, the char* a is actually read-only and if you try to modify the content you will get undefined behavior. You should ideally declare a as const char*.
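The syntax for changing that character is *(a+1) = 'x'; (with whatever character you want in place of 'x'), but this is only safe when a points to writable memory such as a char array, not to a string literal.
This changes the value that the pointer (in your case a+1) points to.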