I'm a C programmer trying to write c++ code. I heard string in C++ was better than char* in terms of security, performance, etc, however sometimes it seems that char* is a better choice. Someone suggested that programmers should not use char* in C++ because we could do all things that char* could do with string, and it's more secure and faster.
Did you ever used char* in C++? What are the specific conditions?
It's safer to use std::string because you don't need to worry about allocating / deallocating memory for the string. The C++ std::string class is likely to use a char* array internally. However, the class will manage the allocation, reallocation, and deallocation of the internal array for you. This removes all the usual risks that come with using raw pointers, such as memory leaks, buffer overflows, etc.
Additionally, it's also incredibly convenient. You can copy strings, append to a string, etc., without having to manually provide buffer space or use functions like strcpy/strcat. With std::string it's as simple as using the = or + operators.
Basically, it's:
std::string s1 = "Hello ";
std::string s2 = s1 + "World";
versus...
const char* s1 = "Hello";
char s2[1024]; // How much should I really even allocate here?
strcpy(s2, s1);
strcat(s2, " World ");
Edit:
In response to your edit regarding the use of char* in C++: Many C++ programmers will claim you should never use char* unless you're working with some API/legacy function that requires it, in which case you can use the std::string::c_str() function to convert an std::string to const char*.
However, I would say there are some legitimate uses of C-arrays in C++. For example, if performance is absolutely critical, a small C-array on the stack may be a better solution than std::string. You may also be writing a program where you need absolute control over memory allocation/deallocation, in which case you would use char*. Also, as was pointed out in the comments section, std::string isn't guaranteed to provide you with a contiguous, writable buffer *, so you can't directly write from a file into an std::string if you need your program to be completely portable. However, in the event you need to do this, std::vector would still probably be preferable to using a raw C-array.
* Although in C++11 this has changed so that std::string does provide you with a contiguous buffer
Ok, the question changed a lot since I first answered.
Native char arrays are a nightmare of memory management and buffer overruns compared to std::string. I always prefer to use std::string.
That said, char array may be a better choice in some circumstances due to performance constraints (although std::string may actually be faster in some cases -- measure first!) or prohibition of dynamic memory usage in an embedded environment, etc.
In general, std::string is a cleaner, safer way to go because it removes the burden of memory management from the programmer. The main reason it can be faster than char *'s, is that std::string stores the length of the string. So, you don't have to do the work of iterating through the entire character array looking for the terminating NULL character each time you want to do a copy, append, etc.
That being said, you will still find a lot of c++ programs that use a mix of std::string and char *, or have even written their own string classes from scratch. In older compilers, std::string was a memory hog and not necessarily as fast as it could be. This has gotten better over time, but some high-performance applications (e.g., games and servers) can still benefit from hand-tuned string manipulations and memory-management.
I would recommend starting out with std::string, or possibly creating a wrapper for it with more utility functions (e.g., starts_with(), split(), format(), etc.). If you find when benchmarking your code that string manipulation is a bottleneck, or uses too much memory, you can then decide if you want to accept the extra risks and testing that a custom string library demands.
TIP: One way of getting around the memory issues and still use std::string is to use an embedded database such as SQLite. This is particularly useful when generating and manipulating extremely large lists of strings, and performance is better than what you might expect.
C char * strings cannot contain '\0' characters. C++ string can handle null characters without a problem. If users enter strings containing \0 and you use C strings, your code may fail. There are also security issues associated with this.
Implementations of std::string hide the memory usage from you. If you're writing performance-critical code, or you actually have to worry about memory fragmentation, then using char* can save you a lot of headaches.
For anything else though, the fact that std::string hides all of this from you makes it so much more usable.
String may actually be better in terms of performance. And innumerable other reasons - security, memory management, convenient string functions, make std::string an infinitely better choice.
Edit: To see why string might be more efficient, read Herb Sutter's books - he discusses a way to internally implement string to use Lazy Initialization combined with Referencing.
Use std::string for its incredible convenience - automatic memory handling and methods / operators. With some string manipulations, most implementations will have optimizations in place (such as delayed evaluation of several subsequent manipulations - saves memory copying).
If you need to rely on the specific char layout in memory for other optimizations, try std::vector<char> instead. If you have a non-empty vector vec, you can get a char* pointer using &vec[0] (the vector has to be nonempty).
Short answer, I don't. The exception is when I'm using third party libraries that require them. In those cases I try to stick to std::string::c_str().
In all my professional career I've had an opportunity to use std::string at only two projects. All others had their own string classes :)
Having said that, for new code I generally use std::string when I can, except for module boundaries (functions exported by dlls/shared libraries) where I tend to expose C interface and stay away from C++ types and issues with binary incompatibilities between compilers and std library implementations.
Compare and contrast the following C and C++ examples:
strlen(infinitelengthstring)
versus
string.length()
std::string is almost always preferred. Even for speed, it uses small array on the stack before dynamically allocating more for larger strings.
However, char* pointers are still needed in many situations for writing strings/data into a raw buffer (e.g. network I/O), which can't be done with std::string.
The only time I've recently used a C-style char string in a C++ program was on a project that needed to make use of two C libraries that (of course) used C strings exclusively. Converting back and forth between the two string types made the code really convoluted.
I also had to do some manipulation on the strings that's actually kind of awkward to do with std::string, but I wouldn't have considered that a good reason to use C strings in the absence of the above constraint.
Related
I read somewhere that usage of CString is costly. Can you calrify it with an example. Also among CString and char array, which is better in terms of memory.
CString in addition to array of chars (or wide chars) contains string size, allocated buffer size, and reference counter (serving additionally as a lock flag). The buffer containing the array of chars may be significantly larger than the string it contains -- it allows to reduce the number of time-costly allocation calls. In addition, when the CString is set to be zero-sized, it still contains two wchar characters.
Naturally, when you compare the size of CString with the size of corresponding C-style array, the array will be smaller. However, if you want to manipulate your string as extensively as CString allows, you will eventually define your own variables for string size, buffer size and sometimes refcounter and/or guard flags. Indeed, you need to store your string size to avoid calling strlen each time you need it. You need to store separately your buffer size if you allow your buffer to be larger than the string length, and avoid calling reallocs each time you add to or subtract from the string. And so on -- you trade some small size increase for significant increases in speed, safety and functionality.
So, the answer depends on what you are going to do with the string. Suppose you want a string to store the name of your class for logging -- there a C-style string (const and static) will do fine. If you need a string to manipulate and use it extensively with MFC or ATL-related classes, use CString family types. If you need to manipulate string in the "engine" parts of your application that are isolated from its interface, and may be converted to other platforms, use std::string or write your own string type to suit your particular needs (this can be really useful when you write the "glue" code to place between the interface and the engine, otherwise std::string is preferable).
CString is from MFC framework specific to windows. std::string is from c++ standard. They are library classes for managing strings in memory. std::string will provide you code portability across platforms.
Using raw array is always good for memory however one has to do operations on strings and it becomes difficult with raw array, consider out of bounds check, get the string length, copy the array or change the size because the string may grow, deleting the array, etc. For all these problem string utility class are good wrapper. The string class will keep the actual string in heap and you have the overhead of the string class itself. However that will provide you functionality to mange the string memory which anyway you have to write by hand.
Prefer std::string if you can, if not, use CString.
In almost all cases I encourage novice programmers to use std::string or CString(*). First they will do significantly less errors. I have seen many buffer overruns, memory invalidation or memory leaks, because of erroneous use of C arrays.
So which is more efficient, CString / std::string or raw character arrays? Memory wise, generally speaking, all CString ans std::string have more is one integer for the size. The question is does it matter?
So which is more efficient in terms of performance? Well it depends on what you are doing with it and how you are using your C-arrays. But passing CString or std::string arround can be computationally more efficient than C-arrays. The problem with C-arrays is that you can't be sure of who owns the memory and what type (heap/stack/literal) it is. Defensive programming results in more copies of arrays, you know, just to be sure that the memory you hold will be valid for the entire duration of when it is needed.
Why is std::string or CString more efficient than C-arrays, if they are passed around by value? This is a bit more complicated and for totally different reasons. For CString, this is simple, it implemented as a COW (copy on write) object. So when you have 5 objects that originate for one CString, it will not use more memory that one, until you start to make change on one object. std::string has stricter requirements and thus it is not allowed to share memory with other std:: string objects. But if you have a newer compiler, std::string should implement the move semantic and thus returning a string from a function will only result in a copy of the pointer not reallocation.
There are very few cases where raw C arrays are good and practical idea.
*) If you are already programming against MFC, why not just use CString.
"One should always use std::string over c-style strings(char *)" is advice that comes up for almost every source code posted here. While the advice is no doubt good, the actual questions being addressed do not permit to elaborate on the why? aspect of the advice in detail. This question is to serve as a placeholder for the same.
A good answer should cover the following aspects(in detail):
Why should one use std::string over c-style strings in C++?
What are the disadvantages (if any) of the practice mentioned in #1?
What are the scenarios where the opposite of the advice mentioned in #1 is a good practice?
std::string manages its own memory, so you can copy, create, destroy them easily.
You can't use your own buffer as a std::string.
You need to pass a c string / buffer to something that expects to take ownership of the buffer - such as a 3rd party C library.
Well, if you just need an array of chars, std::string provides little advantage. But face it, how often is that the case? By wrapping a char array with additional functionality like std::string does, you gain both power and efficiency for some operations.
For example, determining the length of an array of characters requires "counting" the characters in the array. In contrast, an std::string provides an efficient operation for this particular task. (see https://stackoverflow.com/a/1467497/129622)
For power, efficiency and sanity
Larger memory footprint than "just" a char array
When you just need an array of chars
3) The advice always use string of course must be taken with a pinch of common sense. String literals are const char[], and if you pass a literal to a function that takes a const char* (for example std::ifstream::open()) there's absolutely no point wrapping it in std::string.
A char* is basically a pointer to a character. What C does is frequently makes this pointer point to the first character in an array.
An std::string is a class that is much like a vector. Internally, it handles the storage of an array of characters, and gives the user several member functions to manipulate said stored array as well as several overloaded operators.
Reasons to use a char* over an std::string:
C backwards-compatibility.
Performance (potentially).
char*s have lower-level access.
Reasons to use an std::string over a char*:
Much more intuitive to use.
Better searching, replacement, and manipulation functions.
Reduced risk of segmentation faults.
Example :
char* must be used in conjuction with either a char array, or with a dynamically allocated char array. After all, a pointer is worthless unless it actually points to something. This is mainly used in C programs:
char somebuffer[100] = "a string";
char* ptr = somebuffer; // ptr now points to somebuffer
cout << ptr; // prints "a string"
somebuffer[0] = 'b'; // change somebuffer
cout << ptr; // prints "b string"
notice that when you change 'somebuffer', 'ptr' also changes. This is because somebuffer is the actual string in this case. ptr just points/refers to it.
With std::string it's less weird:
std::string a = "a string";
std::string b = a;
cout << b; // prints "a string"
a[0] = 'b'; // change 'a'
cout << b; // prints "a string" (not "b string")
Here, you can see that changing 'a' does not affect 'b', because 'b' is the actual string.
But really, the major difference is that with char arrays, you are responsible for managing the memory, whereas std::string does it for you. In C++, there are very few reasons to use char arrays over strings. 99 times out of 100 you're better off with a string.
Until you fully understand memory management and pointers, just save yourself some headaches and use std::string.
Why should one use std::string over c-style strings in C++?
The main reason is it frees you from managing the lifetime of the string data. You can just treat strings as values and let the compiler/library worry about managing the memory.
Manually managing memory allocations and lifetimes is tedious and error prone.
What are the disadvantages (if any) of the practice mentioned in #1?
You give up fine-grained control over memory allocation and copying. That means you end up with a memory management strategy chosen by your toolchain vendor rather than chosen to match the needs of your program.
If you aren't careful you can end up with a lot of unneeded data copying (in a non-refcounted implementation) or reference count manipulation (in a refcounted implementation)
In a mixed-language project any function whose arguments use std::string or any data structure that contains std::string will not be able to be used directly from other languages.
What are the scenarios where the opposite of the advice mentioned in #1 is a good practice?
Different people will have different opinions on this but IMO
For function arguments passing strings in "const char *" is a good choice since it avoids unnessacery copying/refcouning and gives the caller flexibility about what they pass in.
For things used in interoperation with other languages you have little choice but to use c-style strings.
When you have a known length limit it may be faster to use fixed-size arrays.
When dealing with very long strings it may be better to use a construction that will definately be refcounted rather than copied (such as a character array wrapped in a shared_ptr) or indeed to use a different type of data structure altogether
In general you should always use std::string, since it is less bug prone. Be aware, that memory overhead of std::string is significant. Recently I've performed some experiments about std::string overhead. In general it is about 48 bytes! The article is here: http://jovislab.com/blog/?p=76.
I have a situation where I need to process large (many GB's) amounts of data as such:
build a large string by appending many smaller (C char*) strings
trim the string
convert the string into a C++ const std::string for processing (read only)
repeat
The data in each iteration are independent.
My question is, I'd like to minimise (if possible eliminate) heap allocated memory usage, as it at the moment is my largest performance problem.
Is there a way to convert a C string (char*) into a stl C++ string (std::string) without requiring std::string to internally alloc/copy the data?
Alternatively, could I use stringstreams or something similar to re-use a large buffer?
Edit: Thanks for the answers, for clarity, I think a revised question would be:
How can I build (via multiple appends) a stl C++ string efficiently. And if performing this action in a loop, where each loop is totally independant, how can I re-use thisallocated space.
You can't actually form a std::string without copying the data. A stringstream would probably reuse the memory from pass to pass (though I think the standard is silent on whether it actually has to), but it still wouldn't avoid the copying.
A common approach to this sort of problem is to write the code which processes the data in step 3 to use a begin/end iterator pair; then it can easily process either a std::string, a vector of chars, a pair of raw pointers, etc. Unlike passing it a container type like std::string, it would no longer know or care how the memory was allocated, since it would still belong to the caller. Carrying this idea to its logical conclusion is boost::range, which adds all the overloaded constructors to still let the caller just pass a string/vector/list/any sort of container with .begin() and .end(), or separate iterators.
Having written your processing code to work on an arbitrary iterator range, you could then even write a custom iterator (not as hard as it sounds, basically just an object with some standard typedefs, and operator ++/*/=/==/!= overloaded to get a forward-only iterator) that takes care of advancing to the next fragment each time it hit the end of the one it's working on, skipping over whitespace (I assume that's what you meant by trim). That you never had to assemble the whole string contiguously at all. Whether or not this would be a win depends on how many fragments/how large of fragments you have. This is essentially what the SGI rope mentioned by Martin York is: a string where append forms a linked list of fragments instead of a contiguous buffer, which is thus suitable for much longer values.
UPDATE (since I still see occasional upvotes on this answer):
C++17 introduces another choice: std::string_view, which replaced std::string in many function signatures, is a non-owning reference to a character data. It is implicitly convertible from std::string, but can also be explicitly constructed from contiguous data owned somewhere else, avoiding the unnecessary copying std::string imposes.
Is it at all possible to use a C++ string in step 1? If you use string::reserve(size_t), you can allocate a large enough buffer to prevent multiple heap allocations while appending the smaller strings, and then you can just use that same C++ string throughout all of the remaining steps.
See this link for more information on the reserve function.
To help with really big strings SGI has the class Rope in its STL.
Non standard but may be usefull.
http://www.sgi.com/tech/stl/Rope.html
Apparently rope is in the next version of the standard :-)
Note the developer joke. A rope is a big string. (Ha Ha) :-)
This is a lateral thinking answer, not directly addressing the question but "thinking" around it. Might be useful, might not...
Readonly processing of std::string doesn't really require a very complex subset of std::string's features. Is there a possibility that you could do search/replace on the code that performs all the processing on std::strings so it takes some other type instead? Start with a blank class:
class lightweight_string { };
Then replace all std::string references with lightweight_string. Perform a compilation to find out exactly what operations are needed on lightweight_string for it to act as a drop-in replacement. Then you can make your implementation work however you want.
Is each iteration independent enough that you can use the same std::string for each iteration? One would hope that your std::string implementation is smart enough to re-use memory if you assign a const char * to it when it was previously used for something else.
Assigning a char * into a std::string must always at least copy the data. Memory management is one of the main reasons to use std::string, so you won't be a able to override it.
In this case, might it be better to process the char* directly, instead of assigning it to a std::string.
I can see that almost all modern APIs are developed in the C language. There are reasons for that: processing speed, low level language, cross platform and so on.
Nowadays, I program in C++ because of its Object Orientation, the use of string, the STL but, mainly because it is a better C.
However when my C++ programs need to interact with C APIs I really get upset when I need to convert char[] types to C++ strings, then operate on these strings using its powerful methods, and finally convert from theses strings to char[] again (because the API needs to receive char[]).
If I repeat these operations for millions of records the processing times are higher because of the conversion task.
For that simple reason, I feel that char[] is an obstacle in the moment to assume the C++ as a better c.
I would like to know if you feel the same, if not (I hope so!) I really would like to know which is the best way for C++ to coexist with char[] types without doing those awful conversions.
Thanks for your attention.
The C++ string class has a lot of problems, and yes, what you're describing is one of them.
More specifically, there is no way to do string processing without creating a copy of the string, which may be expensive.
And because virtually all string processing algorithms are implemented as class members, they can only be used on the string class.
A solution you might want to experiment with is the combination of Boost.Range and Boost.StringAlgo.
Range allows you to create sequences out of a pair of iterators. They don't take ownership of the data, so they don't copy the string. they just point to the beginning and end of your char* string.
And Boost.StringAlgo implements all the common string operations as non-member functions, that can be applied to any sequence of characters. Such as, for example, a Boost range.
The combination of these two libraries pretty much solve the problem. They let you avoid having to copy your strings to process them.
Another solution might be to store your string data as std::string's all the time. When you need to pass a char* to some API functoin, simply pass it the address of the first character. (&str[0]).
The problem with this second approach is that std::string doesn't guarantee that its string buffer is null-terminated, so you either have to rely on implementation details, or manually add a null byte as part of the string.
If you use std::vector<char> instead of std::string, the underlying storage will be a C array that can be accessed with &someVec[0]. However, you do lose a lot of std::string conveniences such as operator+.
That said, I'd suggest just avoiding C APIs that mutate strings as much as possible. If you need to pass an immutable string to a C function, you can use c_str(), which is fast and non-copying on most std::string implementations.
I'm not sure what you mean by "conversion", but won't the following suffice for moving between char*, char[], and std::string?
char[] charString = {'a', 'b', 'c', '\0'};
std::string standardString(&charString[0]);
const char* stringPointer(standardString.c_str());
I don't think it's as bad as you make it out to be.
There is a cost converting a char[] to a std::string, but if you're going to be modifying the string, you have to pay that cost anyway whether converting to a std::string or copying to another char[] buffer.
The conversion going the other way (via string.c_str()) is usually trivial. It's usually returning a pointer to an internal buffer (just don't give that buffer to code that will modify it).
I'm not sure why you would be constrained to using C strings and still have an environment that runs C++ code but if you really don't want the overhead of conversion, than don't convert. Just write routines that operate on the C strings.
Another reason for converting to C++ style strings is for bound safety.
"... because it is a better C."
Baloney. C++ is a vastly inferior dialect of C. The problems it solves are trivial, the problems it brings, much worse than those it solves.
I have a situation where I need to process large (many GB's) amounts of data as such:
build a large string by appending many smaller (C char*) strings
trim the string
convert the string into a C++ const std::string for processing (read only)
repeat
The data in each iteration are independent.
My question is, I'd like to minimise (if possible eliminate) heap allocated memory usage, as it at the moment is my largest performance problem.
Is there a way to convert a C string (char*) into a stl C++ string (std::string) without requiring std::string to internally alloc/copy the data?
Alternatively, could I use stringstreams or something similar to re-use a large buffer?
Edit: Thanks for the answers, for clarity, I think a revised question would be:
How can I build (via multiple appends) a stl C++ string efficiently. And if performing this action in a loop, where each loop is totally independant, how can I re-use thisallocated space.
You can't actually form a std::string without copying the data. A stringstream would probably reuse the memory from pass to pass (though I think the standard is silent on whether it actually has to), but it still wouldn't avoid the copying.
A common approach to this sort of problem is to write the code which processes the data in step 3 to use a begin/end iterator pair; then it can easily process either a std::string, a vector of chars, a pair of raw pointers, etc. Unlike passing it a container type like std::string, it would no longer know or care how the memory was allocated, since it would still belong to the caller. Carrying this idea to its logical conclusion is boost::range, which adds all the overloaded constructors to still let the caller just pass a string/vector/list/any sort of container with .begin() and .end(), or separate iterators.
Having written your processing code to work on an arbitrary iterator range, you could then even write a custom iterator (not as hard as it sounds, basically just an object with some standard typedefs, and operator ++/*/=/==/!= overloaded to get a forward-only iterator) that takes care of advancing to the next fragment each time it hit the end of the one it's working on, skipping over whitespace (I assume that's what you meant by trim). That you never had to assemble the whole string contiguously at all. Whether or not this would be a win depends on how many fragments/how large of fragments you have. This is essentially what the SGI rope mentioned by Martin York is: a string where append forms a linked list of fragments instead of a contiguous buffer, which is thus suitable for much longer values.
UPDATE (since I still see occasional upvotes on this answer):
C++17 introduces another choice: std::string_view, which replaced std::string in many function signatures, is a non-owning reference to a character data. It is implicitly convertible from std::string, but can also be explicitly constructed from contiguous data owned somewhere else, avoiding the unnecessary copying std::string imposes.
Is it at all possible to use a C++ string in step 1? If you use string::reserve(size_t), you can allocate a large enough buffer to prevent multiple heap allocations while appending the smaller strings, and then you can just use that same C++ string throughout all of the remaining steps.
See this link for more information on the reserve function.
To help with really big strings SGI has the class Rope in its STL.
Non standard but may be usefull.
http://www.sgi.com/tech/stl/Rope.html
Apparently rope is in the next version of the standard :-)
Note the developer joke. A rope is a big string. (Ha Ha) :-)
This is a lateral thinking answer, not directly addressing the question but "thinking" around it. Might be useful, might not...
Readonly processing of std::string doesn't really require a very complex subset of std::string's features. Is there a possibility that you could do search/replace on the code that performs all the processing on std::strings so it takes some other type instead? Start with a blank class:
class lightweight_string { };
Then replace all std::string references with lightweight_string. Perform a compilation to find out exactly what operations are needed on lightweight_string for it to act as a drop-in replacement. Then you can make your implementation work however you want.
Is each iteration independent enough that you can use the same std::string for each iteration? One would hope that your std::string implementation is smart enough to re-use memory if you assign a const char * to it when it was previously used for something else.
Assigning a char * into a std::string must always at least copy the data. Memory management is one of the main reasons to use std::string, so you won't be a able to override it.
In this case, might it be better to process the char* directly, instead of assigning it to a std::string.