What is the right way to handle char* strings? - c++

I have a third party library that is using char* (non-const) as placeholder for string values. What is the right and safe way to assign values to those datatypes? I have the following test benchmark that uses my own timer class to measure execution times:
#include "string.h"
#include <iostream>
#include <sj/timer_chrono.hpp>
using namespace std;
int main()
{
sj::timer_chrono sw;
int iterations = 1e7;
// first method gives compiler warning:
// conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
cout << "creating c-strings unsafe(?) way..." << endl;
sw.start();
for (int i = 0; i < iterations; ++i)
{
char* str = "teststring";
}
sw.stop();
cout << sw.elapsed_ns() / (double)iterations << " ns" << endl;
cout << "creating c-strings safe(?) way..." << endl;
sw.start();
for (int i = 0; i < iterations; ++i)
{
char* str = new char[strlen("teststring") + 1]; // room for the text plus '\0'
strcpy(str, "teststring");
}
sw.stop();
cout << sw.elapsed_ns() / (double)iterations << " ns" << endl;
return 0;
}
Output:
creating c-strings unsafe(?) way...
1.9164 ns
creating c-strings safe(?) way...
31.7406 ns
While the "safe" way gets rid of the compiler warning, it makes the code about 15-20 times slower according to this benchmark (1.9 nanoseconds per iteration vs 31.7 nanoseconds per iteration). What is the correct way, and what is so dangerous about the "deprecated" way?

The C++ standard is clear:
An ordinary string literal has type “array of n const char” (section 2.14.5.8 in C++11).
and
The effect of attempting to modify a string literal is undefined (section 2.14.5.12 in C++11).
For a string known at compile time, the safe way of obtaining a non-const char* is this
char literal[] = "teststring";
you can then safely
char* ptr = literal;
If at compile time you don't know the string but know its length you can use an array:
char str[STR_LENGTH + 1];
If you don't know the length then you will need to use dynamic allocation. Make sure you deallocate the memory when the strings are no longer needed.
This will work only if the API doesn't take ownership of the char* you pass.
If it tries to deallocate the strings internally then it should say so in the documentation and inform you on the proper way to allocate the strings. You will need to match your allocation method with the one used internally by the API.
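As a minimal sketch of the array approach described above: `legacy_set_label` is a hypothetical stand-in for the third-party function (not a real library call), assumed to only read the string and not keep or free the pointer. The 63-character limit is likewise an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a third-party function that takes a non-const
// char* but only reads the string. It returns the length it saw.
std::size_t legacy_set_label(char* label)
{
    assert(label != nullptr);
    return std::strlen(label);
}

// String known at compile time: the array is a writable copy of the literal.
std::size_t set_fixed_label()
{
    char label[] = "teststring";     // 11 bytes including '\0', automatic storage
    return legacy_set_label(label);  // the array decays to char*
}

// String known only at run time, but with a known maximum length.
std::size_t set_runtime_label(const char* text)
{
    const std::size_t max_length = 63;      // assumed upper bound
    char label[max_length + 1];
    std::strncpy(label, text, max_length);  // copies at most max_length chars
    label[max_length] = '\0';               // strncpy may leave it unterminated
    return legacy_set_label(label);
}
```

Neither function allocates from the heap, so there is nothing to deallocate and nothing to leak.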
The
char literal[] = "test";
will create a local, 5-character array with automatic storage (meaning the variable will be destroyed when execution leaves the scope in which the variable is declared) and initialize the elements of the array with the characters 't', 'e', 's', 't' and '\0'.
You can later edit these characters: literal[2] = 'x';
If you write this:
char* str1 = "test";
char* str2 = "test";
then, depending on the compiler, str1 and str2 may be the same value (i.e., point to the same string).
("Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined." in Section 2.14.5.12 of the C++ standard)
It may also be true that they are stored in a read-only section of memory and therefore any attempt to modify the string will result in an exception/crash.
They are also, in reality, of type const char[N] (which decays to const char*), so this line:
char* str = "test";
actually drops the const-ness of the string, which is why the compiler will issue the warning.
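The type claims above can be checked at compile time. This is a small sketch assuming a C++11 (or later) compiler; `binds_to_const` and `array_copy_is_writable` are names invented for the illustration.

```cpp
#include <cassert>
#include <type_traits>

// A string literal is an lvalue of type "array of N const char",
// where N counts the terminating '\0'.
static_assert(std::is_same<decltype("test"), const char (&)[5]>::value,
              "the literal is a const char[5]");

// Overloads that report which pointer type an argument prefers.
constexpr bool binds_to_const(const char*) { return true; }
constexpr bool binds_to_const(char*)       { return false; }

static_assert(binds_to_const("test"), "a literal decays to const char*");

// A named array initialised from a literal is a separate, writable copy.
bool array_copy_is_writable()
{
    char literal[] = "test";
    literal[2] = 'x';                 // fine: modifies the local array only
    assert(literal[2] == 'x');
    return !binds_to_const(literal);  // the array decays to plain char*
}
```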

The unsafe way is the way to go for all strings that are known at compile-time.
Your "safe" way leaks memory and is rather horrific.
Normally you'd have a sane C API which accepts const char *, so you could use a proper safe way in C++, i.e. std::string and its c_str() method.
If your C API assumes ownership of the string, your "safe way" has another flaw: you can't mix new[] and free(), passing memory allocated using the C++ new[] operator to a C API which expects to call free() on it is not allowed. If the C API doesn't want to call free() later on the string, it should be fine to use new[] on the C++ side.
Also, this is a strange mixture of C++ and C.
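A rough sketch of the std::string route, assuming the C API neither stores nor frees the pointer it is given. `c_api_length` and `c_api_uppercase` are hypothetical stand-ins for library functions, and the `&value[0]` trick relies on the C++11 guarantee that std::string storage is contiguous and null-terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical "sane" C API: takes const char*.
std::size_t c_api_length(const char* text)
{
    return std::strlen(text);
}

// Hypothetical legacy C API: takes non-const char* and edits in place.
void c_api_uppercase(char* text)
{
    for (; *text != '\0'; ++text)
        if (*text >= 'a' && *text <= 'z')
            *text = static_cast<char>(*text - 'a' + 'A');
}

std::string call_both(std::string value)
{
    // Read-only access: c_str() is all that is needed.
    const std::size_t n = c_api_length(value.c_str());
    assert(n == value.size());  // holds as long as there is no embedded '\0'

    // Writable access: &value[0] points at the string's own buffer.
    if (!value.empty())
        c_api_uppercase(&value[0]);
    return value;
}
```

The string object keeps ownership throughout, so nothing has to be freed by hand.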

You seem to have a fundamental misunderstanding about C strings here.
cout << "creating c-strings unsafe(?) way..." << endl;
sw.start();
for (int i = 0; i < iterations; ++i)
{
char* str = "teststring";
}
Here, you're just assigning a pointer to a string literal constant. In C, string literals are of type char[N]; in C++ they are of type const char[N]. You can assign a pointer to a string literal array because of array "decay". (However, assigning a string literal to a non-const pointer was deprecated in C++03 and removed in C++11, although many compilers still accept it with a warning.)
But assigning a pointer to a string literal can't be what you want to do. Your API expects a non-const string. String literals are const.
What is the right and safe way to assign values to those [char* strings]?
There's no general answer to this question. Whenever you work with C strings (or pointers in general), you need to deal with the concept of ownership. C++ takes care of this for you automatically with std::string. Internally, std::string owns a pointer to a char array, but it manages the memory for you so you don't need to care about it. But when you use raw C-strings, you DO need to put thought into managing the memory.
How you manage the memory depends on what you're doing with your program. If you allocate a C-string with new[], then you need to deallocate it with delete[]. If you allocate it with malloc, then you must deallocate it with free(). A good solution for working with C-strings in C++ is to use a smart pointer which takes ownership of the allocated C string. (But you'll need to use a deleter that deallocates the memory with delete[]). Or you can just use std::vector<char>. As always, don't forget to allocate room for the terminating null char.
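Both suggestions might look roughly like this. It is a sketch rather than a definitive implementation, and the helper names `make_c_string` and `make_c_buffer` are invented here. Note that the array form `std::unique_ptr<char[]>` already uses delete[] as its default deleter, so no custom deleter has to be written for memory obtained with new[].

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Owning copy of a C string. std::unique_ptr<char[]> calls delete[] on
// destruction, matching the new[] used here.
std::unique_ptr<char[]> make_c_string(const char* source)
{
    assert(source != nullptr);
    const std::size_t length = std::strlen(source);
    std::unique_ptr<char[]> copy(new char[length + 1]);  // +1 for '\0'
    std::memcpy(copy.get(), source, length + 1);         // copies the '\0' too
    return copy;
}

// The same idea with std::vector<char>, which also remembers its size.
std::vector<char> make_c_buffer(const char* source)
{
    assert(source != nullptr);
    return std::vector<char>(source, source + std::strlen(source) + 1);
}
```

A char* for the library is then available as `copy.get()` or `buffer.data()`, and the memory is released when the owner goes out of scope.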
Also, the reason your 2nd loop is so much slower is because it allocates memory in each iteration, whereas the 1st loop simply assigns a pointer to a statically-allocated string literal.

Related

Is it possible for separately initialized string variables to overlap?

If I initialize several string(character array) variables in the following ways:
const char* myString1 = "string content 1";
const char* myString2 = "string content 2";
Since const char* is simply a pointer to a specific char object, it does not contain any size or range information about the character array it is pointing to.
So, is it possible for two string literals to overlap each other? (The newly allocated overlap the old one)
By overlap, I mean the following behaviour;
// Continue from the code block above
std::cout << myString1 << std::endl;
std::cout << myString2 << std::endl;
It outputs
string costring content 2
string content 2
So the start of myString2 is somewhere in the middle of myString1. Because const char* does not "protect"("possess") a range of memory locations but only that one it points to, I do not see how C++ can prevent other string literals from "landing" on the memory locations of the older ones.
How does C++/compiler avoid such problem?
If I change const char* to const char[], is it still the same?
Yes, string literals are allowed to overlap in general. From lex.string#9
... Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
So it's up to the compiler to make a decision as to whether any string literals overlap in memory. You can write a program to check whether the string literals overlap, but since it's unspecified whether this happens, you may get different results every time you run the program.
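Such a check could be sketched as follows. `ranges_overlap` and `suffix_literal_shares_storage` are illustrative names, and the second function's result is exactly the unspecified part, so it should not be relied on.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Reports whether the byte ranges [a, a + a_size) and [b, b + b_size) share
// any byte. std::less gives a total order even for unrelated pointers.
bool ranges_overlap(const char* a, std::size_t a_size,
                    const char* b, std::size_t b_size)
{
    assert(a != nullptr && b != nullptr);
    std::less<const char*> before;
    return before(a, b + b_size) && before(b, a + a_size);
}

// Whether these two literals share storage is unspecified, so the result
// may differ between compilers, optimisation levels, or builds.
bool suffix_literal_shares_storage()
{
    const char* whole  = "string content";
    const char* suffix = "content";
    return ranges_overlap(whole, sizeof("string content"),
                          suffix, sizeof("content"));
}
```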
A string is required to end with a null character having a value of 0, and can't have such a character in the middle. So the only case where this is even possible is when two strings are equal from the start of one to the end of both. That is not the case in the example you gave, so those two particular strings would never overlap.
Edit: sorry, I didn't mean to mislead anybody. It's actually easy to put a null character in the middle of a string with \0. But most string handling functions, particularly those in the standard library, will treat that as the end of a string - so your strings will get truncated. Not very practical. Because of that the compiler won't try to construct such a string unless you explicitly ask it to.
The compiler knows the size of each string, because it can "see" it in your code.
Additionally, they are not allocated the same way, that you would allocate them at run-time. Instead, if the strings are constant and defined globally, they are most likely located in the .text section of the object file, not on the heap.
And since the compiler knows the size of a constant string at compile-time, it can simply put its value in the free space of the .text section. The specifics depend on the compiler you use, but be assured the people who wrote it are smart enough to avoid this issue.
If you define these strings inside some function instead, the compiler can choose between the first option and allocating space on the stack.
As for the const char[], most compilers will treat it the same way as const char*.
Two string literals will not likely overlap unless they are the same. In that case though the pointers will be pointing to the same thing. (This isn't guaranteed by the standard though, but I believe any modern compiler should make this happen.)
const char *a = "Hello there.";
const char *b = "Hello there.";
cout << (a == b);
// prints "1" which means they point to the same thing
The const char * can share a string though.
const char *a = "Hello there.";
const char *b = a + 6;
cout << a;
// prints "Hello there."
cout << b;
// prints "there."
I think to answer your second question an explanation of c-style strings is useful.
A const char * is just a pointer to a string of characters. The const means that the characters themselves are immutable. (They are stored as part of the executable itself and you wouldn't want your program to change itself like this. You can use the strings command on Unix to see all the strings in an executable easily, i.e. strings a.out. You will see many more strings than what you coded, as many exist as part of the standard library and other things required for an executable.)
So how does it know to just print the string and then stop at the end? Well, a C-style string is required to end with a null byte (\0). The compiler implicitly puts it there when you declare a string. So "string content 1" is actually "string content 1\0".
const char *a = "Hello\0 there.";
cout << a;
// prints "Hello"
For the most part const char *a and const char a[] are the same.
// These are valid and equivalent
const char *a = "Hello";
const char b[] = "there.";
// This is valid
const char *c = b + 3; // *c = "re."
// This, however, is not valid
const char d[] = b + 3;
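One place where the pointer and the array visibly differ is sizeof. The short sketch below (function names are made up for the example) shows that, along with the re-pointing that only the pointer form allows.

```cpp
#include <cassert>
#include <cstddef>

// Size of the pointer itself: unrelated to the length of the text.
std::size_t pointer_size()
{
    const char* a = "Hello";
    return sizeof(a);          // same as sizeof(const char*)
}

// Size of the array: one byte per character plus the terminating '\0'.
std::size_t array_size()
{
    const char b[] = "there.";
    return sizeof(b);          // 7
}

// The pointer can be re-pointed; the array name cannot be reassigned.
char fourth_char_via_pointer()
{
    const char b[] = "there.";
    const char* c = b;         // the array decays to a pointer to its first char
    c = b + 3;                 // fine: c now points at "re."
    assert(*c == 'r');
    return *c;
}
```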

A value of type "const char *" cannot be assigned to an entity of type "char *"

So I am trying to avoid using strings for this. I am basically trying to make a string array.
char **hourtimes = (char**)malloc(100 * sizeof(char*));
for (int i = 0; i < 100; i++) {
(*hourtimes) = (char*)malloc((100 * sizeof(char)));
}
So I made a string array basically here
Now, I want to make hourtimes[0] = "twelve";
I tried doing *hourtimes = "twelve";
but I get the same error, I think this works in c, but I'm using c++
hourtimes[0][0] = 't';
hourtimes[0][1] = 'w';
etc works just fine but that would be too cumbersome
*hourtimes = "twelve" is setting *hourtimes to point to an immutable string literal. You are then trying to modify that immutable string. What you want to do is copy "twelve" into *hourtimes.
strcpy(hourtimes[0],"twelve");
Note: This answer was written at a time when the question was tagged for C. C++ will have different preferred ways of doing this kind of thing.
The error message tells you exactly what's wrong: You can't assign a const char * to a char *. What does that mean, though?
Both const char * and char * are types. They are, in fact, very nearly the same type; one is a pointer to a character, and the other is a pointer to a constant character. That means that the latter can't be changed [1]; that's, after all, what "constant" means. So when you try to tell the compiler to treat a pointer to a constant type as a pointer to a non-const type, it'll give you an error -- because otherwise it'd have no way to guarantee that the string isn't modified.
"whatever" is always a const char *, not a char *, because that's stored in memory that's generally not meant to be modified, and the compiler can make some really neat optimizations if it can safely assume that it's unchanged (which, because it's const, it can).
I won't tell you how to "properly" write the code you're going for, because if you're using C++, you should be using std::vector and std::string instead of anything with pointers whenever possible, and that probably includes here. If, for whatever reason, you need to use pointers, the comments have covered that well enough.
1: Okay, yes, it can -- but that's outside the scope of this answer, and I don't want to confuse any beginners.
In your allocation loop, (*hourtimes) is the same as hourtimes[0], so you are assigning your allocated sub-arrays to the same slot in the main array on each loop iteration, causing memory leaks and uninitialized slots. You need to use hourtimes[i] instead:
char **hourtimes = (char**)malloc(100 * sizeof(char*));
for (int i = 0; i < 100; i++) {
hourtimes[i] = (char*)malloc(100 * sizeof(char));
}
And don't forget to deallocate the arrays when you are done with them:
for (int i = 0; i < 100; i++) {
free(hourtimes[i]);
}
free(hourtimes);
Now, a string literal has type const char[N], where N is the number of characters in the literal, + 1 for the null terminator. So "twelve" would be a const char[7].
Your arrays only allow char* pointers to be stored, but a const char[N] decays into a const char* pointer to the first char. You can't assign a const char* to a char*, thus the compiler error.
Even if it were possible to do (which it is, but only with a type-cast), you shouldn't do it, because doing so would cause a memory leak as you would lose your original pointer to the allocated array, and worse free() can't deallocate a string literal anyway.
What you really want to do is copy the content of the string literal into the allocated array storage. You can use strncpy() for that:
strncpy(hourtimes[0], "twelve", 100);
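One caveat with strncpy(): if the source is at least as long as the count, the destination is left without a terminator. A small wrapper such as the following avoids that (the name `copy_terminated` is made up for this sketch):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies source into a buffer of buffer_size bytes and always terminates it.
// Text longer than buffer_size - 1 characters is truncated.
void copy_terminated(char* buffer, std::size_t buffer_size, const char* source)
{
    assert(buffer != nullptr && source != nullptr && buffer_size > 0);
    std::strncpy(buffer, source, buffer_size - 1);
    buffer[buffer_size - 1] = '\0';
}
```

With the arrays allocated above, the call would be `copy_terminated(hourtimes[0], 100, "twelve");`.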
Now, with all of that said, this is the C way of handling arrays of strings. The C++ way is to use std::vector and std::string instead:
#include <string>
#include <vector>
std::vector<std::string> hourtimes(100);
...
hourtimes[0] = "twelve";
This is a string literal, which can be used as a pointer to a constant char, but not as a pointer to a non-const char.
"twelve"
You do however attempt to assign it to a pointer to non-const char.
hourtimes[0] = "twelve";
That is what the compiler does not like.

Are strings in C++ copied when modified?

Using the std::string class in C++, it is possible to modify a character using the array notation, like:
std::string s = "Hello";
s[0] = 'X';
cout << s << '\n';
I have checked that this code compiles, and prints "Xello" as expected. However, I was wondering what the cost of this operation is: is it constant time, or is it O(n) because the string is copied?
The string isn't copied. The internal data is directly modified.
It basically gets the internal data pointer of the actual string memory, and modifies it. Imagine doing this:
char *data = &str[0];
for(size_t i = 0; i < str.size(); ++i)
{
data[i] = '!';
}
The code sets every character of the string to an exclamation mark.
But if the string was copied, then after the first write, the data pointer would become invalid.
Or to use another example:
std::cout << str[5] << std::endl;
That prints the 6th character of the string. Why would that copy the string?
C++ can't tell the difference between char c = str[5] and str[5] = c (except as far as const vs non-const function calls go).
Also, str[n] is guaranteed to never throw exceptions, as long as n < str.size(). It can't make that guarantee if it had to allocate memory internally for a copy - because the allocation could fail and throw.
(As @juanchopanza mentioned, older C++ standards permitted CoW strings, but the latest C++ standard forbids this)
You can modify an STL string like in your example; no copy will be done. The standard library string class does not manage a string pool like other languages do for strings (like Java). This operation is constant in complexity.
You do only modify the first element s[0], therefore it can't be O(n). You don't copy the string.
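One way to convince yourself is to compare the buffer address before and after the write. This is only a sketch, assuming a C++11 or later standard library; `modifies_in_place` is a name invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Writes one character and reports whether the string's buffer stayed put.
bool modifies_in_place(std::string& s, std::size_t index, char value)
{
    assert(index < s.size());
    const char* before = s.data();
    s[index] = value;              // constant time: no reallocation, no copy
    return s.data() == before;
}
```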

Store value in Pointers as an Array - C++

I am trying to make a function like strcpy in C++. I cannot use built-in string.h functions because of restriction by our instructor. I have made the following function:
int strlen (char* string)
{
int len = 0;
while (string [len] != (char)0) len ++;
return len;
}
char* strcpy (char* *string1, char* string2)
{
for (int i = 0; i<strlen (string2); i++) *string1[i] = string2[i];
return *string1;
}
main()
{
char* i = "Farid";
strcpy (&i, "ABC ");
cout<<i;
}
But I am unable to set *string1 [i] value. When I try to do so an error appears on screen 'Program has encountered a problem and need to close'.
What should I do to resolve this problem?
Your strcpy function is wrong. When you write *string1[i] you are actually modifying the first character of the i-th element of an imaginary array of strings. That memory location does not exist and your program segfaults.
Do this instead:
char* strcpy (char* string1, const char* string2)
{
int i = 0;
for (; string2[i] != '\0'; i++) string1[i] = string2[i];
string1[i] = '\0'; // terminate the copy
return string1;
}
If you pass a char* the characters are already modifiable. Note: it is the responsibility of the caller to allocate the memory to hold the copy. And the declaration:
char* i = "Farid";
is not a valid allocation, because the i pointer will likely point to read-only memory. Do instead:
char i[100] = "Farid";
Now i holds 100 chars of local memory, plenty of room for your copy:
strcpy(i, "ABC ");
If you wanted this function to allocate memory, then you should create another one, say strdup():
char* strdup (char* string)
{
size_t len = strlen(string);
char *n = (char*)malloc(len + 1); // +1 for the null terminator
if (!n)
return 0;
strcpy(n, string);
return n;
}
Now, with this function the caller has the responsibility to free the memory:
char *i = strdup("ABC ");
//use i
free(i);
Because of this error in the declaration of strcpy: "char* *string1"
I don't think you meant string1 to be a pointer to a pointer to char.
Removing one of the * should work.
The code has several issues:
You can't assign a string literal to char* because the string literal has type char const[N] (for a suitable value of N) which converts to char const* but not to char*. In C++03 it was possible to convert to char* for backward compatibility but this rule is now gone. That is, your i needs to be declared char const*. As implemented above, your code tries to write read-only memory which will have undesirable effects.
The declaration of std::strcpy() takes a char* and a char const*: for the first pointer you need to provide sufficient space to hold a string of the second argument. Since this is error-prone it is a bad idea to use strcpy() in the first place! Instead, you want to replicate std::strncpy() which takes as third argument the length of the first buffer (actually, I'm never sure if std::strncpy() guarantees zero termination or not; you definitely also want to guarantee zero termination).
It is a bad idea to use strlen() in the loop condition as the function needs to be evaluated for each iteration of the loop, effectively changing the complexity of strcpy() from linear (O(N)) to quadratic (O(N^2)). Quadratic complexity is very bad. Copying a string of 1000 characters takes 1000000 operations. If you want to try out the effect, copy a string with 1000000 characters using a linear and a quadratic algorithm.
Your strcpy() doesn't add a null-terminator.
In C++ (and in C since ~1990) the implicit int rule doesn't apply. That is, you really need to write int in front of main().
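Putting these points together, a hand-written copy that avoids the library functions could look like the sketch below. The `my_` prefixes and the `capacity` parameter are choices made for this example, not part of the original assignment.

```cpp
#include <cassert>
#include <cstddef>

// Length of a null-terminated string, computed once per call.
std::size_t my_strlen(const char* text)
{
    assert(text != nullptr);
    std::size_t length = 0;
    while (text[length] != '\0') ++length;
    return length;
}

// Copies source into destination, which has room for capacity bytes.
// Always null-terminates; returns false if the text had to be truncated.
bool my_strcpy(char* destination, std::size_t capacity, const char* source)
{
    assert(destination != nullptr && source != nullptr && capacity > 0);
    std::size_t i = 0;
    for (; i + 1 < capacity && source[i] != '\0'; ++i)  // single pass: O(N)
        destination[i] = source[i];
    destination[i] = '\0';
    return source[i] == '\0';
}
```

The caller would declare `char i[100] = "Farid";` and call `my_strcpy(i, sizeof i, "ABC ");`, so the destination is writable memory of known size.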
OK, a couple of things:
You are missing the return type for the main function declaration. Not really allowed under the standard. Some compilers will still allow it, but others will fail on the compile.
The way you have your for loop structured in strcpy, you are calling your strlen function each time through the loop, and it is having to re-count the characters in the source string. Not a big deal with a string like "ABC " but as strings get longer.... Better to save the value of the result into a variable and use that in the for loop.
Because of the way that you are declaring i in main, you are pointing to read-only storage, and will be causing an access violation.
Look at the other answers here for how to rebuild your code.
Pointer use in C and C++ is a perennial issue. I'd like to suggest the following tutorial from Paul DiLorenzo, "Learning C++ Pointers for REAL dummies.".
(This is not to imply that you are a "dummy"; it's just a reference to the "<insert subject here> for Dummies" line of books. I would not be surprised if the insertion of "REAL" is to forestall lawsuits over trademarked titles.)
It is an excellent tutorial.
Hope it helps.

Difference between string and char[] types in C++

For C, we use char[] to represent strings.
For C++, I see examples using both std::string and char arrays.
#include <iostream>
#include <string>
using namespace std;
int main () {
string name;
cout << "What's your name? ";
getline(cin, name);
cout << "Hello " << name << ".\n";
return 0;
}
#include <iostream>
using namespace std;
int main () {
char name[256];
cout << "What's your name? ";
cin.getline(name, 256);
cout << "Hello " << name << ".\n";
return 0;
}
(Both examples adapted from http://www.cplusplus.com.)
What is the difference between these two types in C++? (In terms of performance, API integration, pros/cons, ...)
A char array is just that - an array of characters:
If allocated on the stack (like in your example), it will always occupy eg. 256 bytes no matter how long the text it contains is
If allocated on the heap (using malloc() or new char[]) you're responsible for releasing the memory afterwards and you will always have the overhead of a heap allocation.
If you copy a text of more than 256 chars into the array, it might crash, produce ugly assertion messages or cause unexplainable (mis-)behavior somewhere else in your program.
To determine the text's length, the array has to be scanned, character by character, for a \0 character.
A string is a class that contains a char array, but automatically manages it for you. Most string implementations have a built-in array of 16 characters (so short strings don't fragment the heap) and use the heap for longer strings.
You can access a string's char array like this:
std::string myString = "Hello World";
const char *myStringChars = myString.c_str();
C++ strings can contain embedded \0 characters, know their length without counting, are faster than heap-allocated char arrays for short texts and protect you from buffer overruns. Plus they're more readable and easier to use.
However, C++ strings are not (very) suitable for usage across DLL boundaries, because this would require any user of such a DLL function to make sure he's using the exact same compiler and C++ runtime implementation, lest he risk his string class behaving differently.
Normally, a string class would also release its heap memory on the calling heap, so it will only be able to free memory again if you're using a shared (.dll or .so) version of the runtime.
In short: use C++ strings in all your internal functions and methods. If you ever write a .dll or .so, use C strings in your public (dll/so-exposed) functions.
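As an illustration of that split, here is a sketch in which std::string is used internally and only C types cross the library boundary. `get_greeting` and `build_greeting` are hypothetical names, and the caller-supplied-buffer convention is one common choice among several.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Internal code: free to use std::string.
static std::string build_greeting(const std::string& name)
{
    return "Hello " + name + ".";
}

// Hypothetical exported function: only C types cross the boundary, and the
// caller supplies the buffer, so no memory changes hands between runtimes.
// Returns the number of bytes needed, including the terminating '\0'.
extern "C" std::size_t get_greeting(const char* name, char* buffer,
                                    std::size_t buffer_size)
{
    assert(name != nullptr);
    const std::string greeting = build_greeting(name);
    if (buffer != nullptr && buffer_size > 0) {
        const std::size_t count = greeting.size() < buffer_size - 1
                                      ? greeting.size()
                                      : buffer_size - 1;
        std::memcpy(buffer, greeting.data(), count);  // truncates if needed
        buffer[count] = '\0';
    }
    return greeting.size() + 1;
}
```

Because the library never allocates memory that the caller must free, mismatched heaps or runtime versions on either side do not come into play.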
Arkaitz is correct that string is a managed type. What this means for you is that you never have to worry about how long the string is, nor do you have to worry about freeing or reallocating the memory of the string.
On the other hand, the char[] notation in the case above has restricted the character buffer to exactly 256 characters. If you tried to write more than 256 characters into that buffer, at best you will overwrite other memory that your program "owns". At worst, you will try to overwrite memory that you do not own, and your OS will kill your program on the spot.
Bottom line? Strings are a lot more programmer friendly, char[]s are a lot more efficient for the computer.
Well, string type is a completely managed class for character strings, while char[] is still what it was in C, a byte array representing a character string for you.
In terms of API and standard library everything is implemented in terms of strings and not char[], but there are still lots of functions from the libc that receive char[] so you may need to use it for those, apart from that I would always use std::string.
In terms of efficiency of course a raw buffer of unmanaged memory will almost always be faster for lots of things, but take in account comparing strings for example, std::string has always the size to check it first, while with char[] you need to compare character by character.
I personally do not see any reason why one would like to use char* or char[] except for compatibility with old code. std::string's no slower than using a c-string, except that it will handle re-allocation for you. You can set its size when you create it, and thus avoid re-allocation if you want. Its indexing operator ([]) provides constant time access (and is in every sense of the word the exact same thing as using a c-string indexer). Using the at method gives you bounds checked safety as well, something you don't get with c-strings, unless you write it. Your compiler will most often optimize out the indexer use in release mode. It is easy to mess around with c-strings; things such as delete vs delete[], exception safety, even how to reallocate a c-string.
And when you have to deal with advanced concepts like having COW strings, and non-COW for MT etc, you will need std::string.
If you are worried about copies, as long as you use references, and const references wherever you can, you will not have any overhead due to copies, and it's the same thing as you would be doing with the c-string.
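The bounds checking and the up-front sizing mentioned above can be sketched like this (helper names are invented for the example; both take their string arguments by const reference, so no copies are made at the call):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns true if reading s.at(index) throws std::out_of_range.
bool at_rejects(const std::string& s, std::size_t index)
{
    try {
        (void)s.at(index);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Reserving capacity up front avoids reallocation while appending.
std::string repeat(char c, std::size_t count)
{
    std::string result;
    result.reserve(count);
    for (std::size_t i = 0; i < count; ++i) result.push_back(c);
    assert(result.size() == count);
    return result;
}
```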
One of the difference is Null termination (\0).
In C and C++, a function that takes a char* or char[] parameter receives a pointer to a single char and walks along the memory until a 0 byte is reached (often called the null terminator).
C++ strings can contain embedded \0 characters, know their length without counting.
#include<stdio.h>
#include<string.h>
#include<iostream>
#include<string>
using namespace std;
void NullTerminatedString(string str){
int NUll_term = 3;
str[NUll_term] = '\0'; // specific character is kept as NULL in string
cout << str << endl <<endl <<endl;
}
void NullTerminatedChar(char *str){
int NUll_term = 3;
str[NUll_term] = 0; // from specific, all the character are removed
cout << str << endl;
}
int main(){
string str = "Feels Happy";
printf("string = %s\n", str.c_str());
printf("strlen = %zu\n", strlen(str.c_str())); // %zu is the format for size_t
printf("size = %zu\n", str.size());
printf("sizeof = %zu\n", sizeof(str)); // sizeof std::string class and compiler dependent
NullTerminatedString(str);
char str1[12] = "Feels Happy";
printf("char[] = %s\n", str1);
printf("strlen = %zu\n", strlen(str1));
printf("sizeof = %zu\n", sizeof(str1)); // sizeof char array
NullTerminatedChar(str1);
return 0;
}
Output:
string = Feels Happy
strlen = 11
size = 11
sizeof = 32
Fee s Happy
char[] = Feels Happy
strlen = 11
sizeof = 12
Fee
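The same behaviour can be checked without reading printed output. In this sketch (function names are illustrative) the std::string keeps its full length after a '\0' is written into it, while strlen() on the same data stops at that byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds an 11-character std::string whose fourth character is '\0'.
std::string with_embedded_null()
{
    std::string s = "Feels Happy";
    s[3] = '\0';
    assert(s.size() == 11);      // std::string still knows its full length
    return s;
}

// Length as seen by C functions, which stop at the first '\0'.
std::size_t c_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```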
Think of (char *) as string.begin(). The essential difference is that (char *) is an iterator and std::string is a container. If you stick to basic strings a (char *) will give you what std::string::iterator does. You could use (char *) when you want the benefit of an iterator and also compatibility with C, but that's the exception and not the rule. As always, be careful of iterator invalidation. When people say (char *) isn't safe this is what they mean. It's as safe as any other C++ iterator.
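To make the iterator analogy concrete, the sketch below runs the same standard algorithm once over raw char pointers and once over std::string iterators. The function names are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Counts occurrences of c in a C string using raw pointers as iterators.
std::ptrdiff_t count_in_c_string(const char* text, char c)
{
    assert(text != nullptr);
    return std::count(text, text + std::strlen(text), c);
}

// The same algorithm driven by std::string iterators.
std::ptrdiff_t count_in_string(const std::string& text, char c)
{
    return std::count(text.begin(), text.end(), c);
}
```

As with any iterator, the pointer is only valid while the underlying buffer is alive and has not been reallocated.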
Strings have helper functions and manage char arrays automatically. You can concatenate strings, for a char array you would need to copy it to a new array, strings can change their length at runtime. A char array is harder to manage than a string and certain functions may only accept a string as input, requiring you to convert the array to a string. It's better to use strings, they were made so that you don't have to use arrays. If arrays were objectively better we wouldn't have strings.