Why should one use std::string over C-style strings in C++?

"One should always use std::string over C-style strings (char *)" is advice that comes up for almost every piece of source code posted here. While the advice is no doubt good, the questions being addressed rarely leave room to elaborate on the "why" of the advice in detail. This question is meant to serve as a placeholder for that.
A good answer should cover the following aspects (in detail):
1. Why should one use std::string over C-style strings in C++?
2. What are the disadvantages (if any) of the practice mentioned in #1?
3. What are the scenarios where the opposite of the advice mentioned in #1 is a good practice?

1) std::string manages its own memory, so you can copy, create and destroy strings easily.
2) You can't use your own buffer as a std::string.
3) When you need to pass a C string or buffer to something that expects to take ownership of it, such as a third-party C library.

Well, if you just need an array of chars, std::string provides little advantage. But face it, how often is that the case? By wrapping a char array with additional functionality like std::string does, you gain both power and efficiency for some operations.
For example, determining the length of a character array requires counting the characters up to the terminating '\0'. In contrast, std::string stores its length, so size()/length() is a constant-time operation. (see https://stackoverflow.com/a/1467497/129622)
1) For power, efficiency and sanity.
2) A larger memory footprint than "just" a char array.
3) When you just need an array of chars.

3) The advice "always use string" must, of course, be taken with a pinch of common sense. String literals have type const char[N], and if you pass a literal to a function that takes a const char* (for example std::ifstream::open()), there's absolutely no point wrapping it in a std::string.

A char* is basically a pointer to a character. In C, this pointer is frequently made to point to the first character of an array.
An std::string is a class that is much like a vector. Internally, it handles the storage of an array of characters, and gives the user several member functions to manipulate said stored array as well as several overloaded operators.
Reasons to use a char* over an std::string:
C backwards-compatibility.
Performance (potentially).
char*s have lower-level access.
Reasons to use an std::string over a char*:
Much more intuitive to use.
Better searching, replacement, and manipulation functions.
Reduced risk of segmentation faults.
Example:
A char* must be used in conjunction with either a char array or a dynamically allocated char array. After all, a pointer is worthless unless it actually points to something. This is mainly done in C programs:
char somebuffer[100] = "a string";
char* ptr = somebuffer; // ptr now points to somebuffer
std::cout << ptr << '\n'; // prints "a string"
somebuffer[0] = 'b'; // change somebuffer
std::cout << ptr << '\n'; // prints "b string"
Notice that when you change 'somebuffer', what you see through 'ptr' also changes. This is because somebuffer is the actual string in this case; ptr just points/refers to it.
With std::string it's less weird:
std::string a = "a string";
std::string b = a;
std::cout << b << '\n'; // prints "a string"
a[0] = 'b'; // change 'a'
std::cout << b << '\n'; // prints "a string" (not "b string")
Here, you can see that changing 'a' does not affect 'b', because 'b' is the actual string.
But really, the major difference is that with char arrays, you are responsible for managing the memory, whereas std::string does it for you. In C++, there are very few reasons to use char arrays over strings. 99 times out of 100 you're better off with a string.
Until you fully understand memory management and pointers, just save yourself some headaches and use std::string.

Why should one use std::string over c-style strings in C++?
The main reason is it frees you from managing the lifetime of the string data. You can just treat strings as values and let the compiler/library worry about managing the memory.
Manually managing memory allocations and lifetimes is tedious and error prone.
What are the disadvantages (if any) of the practice mentioned in #1?
You give up fine-grained control over memory allocation and copying. That means you end up with a memory management strategy chosen by your toolchain vendor rather than chosen to match the needs of your program.
If you aren't careful, you can end up with a lot of unneeded data copying (in a non-refcounted implementation) or reference-count manipulation (in a refcounted implementation).
In a mixed-language project any function whose arguments use std::string or any data structure that contains std::string will not be able to be used directly from other languages.
What are the scenarios where the opposite of the advice mentioned in #1 is a good practice?
Different people will have different opinions on this, but in my opinion:
For function arguments, passing strings as "const char *" is a good choice, since it avoids unnecessary copying/refcounting and gives the caller flexibility about what they pass in.
For things used in interoperation with other languages you have little choice but to use c-style strings.
When you have a known length limit it may be faster to use fixed-size arrays.
When dealing with very long strings, it may be better to use a construct that will definitely be refcounted rather than copied (such as a character array wrapped in a shared_ptr), or indeed to use a different type of data structure altogether.

In general you should always use std::string, since it is less bug-prone. Be aware, though, that the memory overhead of std::string is significant. I recently performed some experiments on std::string overhead, and in general it came to about 48 bytes per string (the exact figure depends on the implementation and platform). The article is here: http://jovislab.com/blog/?p=76.

Related

Is there a reason why `cups_option_t::name` and `::value` aren't `const char*`?

I don't want to use cupsAddOption(), because it has quadratic behaviour (only ever adds one entry to the allocated memory block), and because it strdups every name and value string, while in my case, they are all string literals. So in a call to cupsPrintFile(), I want to pass a C array of cups_option_ts.
But as a C++ programmer, I cannot assign a C string literal (having type const char[]) to the cups_option_t fields, because they are char*.
Is that just lazy API design, or does CUPS actually manipulate those strings in-place?
As they are meant to point to malloc'ed and strcpy'ed memory, they apparently cannot be const.
What performance penalties do you actually expect? How often do you actually call that function? Premature optimization is often a bad habit and results in hard-to-maintain code.

CString or char array which one is better in terms of memory

I read somewhere that using CString is costly. Can you clarify this with an example? Also, between CString and a char array, which is better in terms of memory?
In addition to the array of chars (or wide chars), a CString contains the string size, the allocated buffer size, and a reference counter (which also serves as a lock flag). The buffer containing the array of chars may be significantly larger than the string it holds, which reduces the number of costly allocation calls. In addition, even a zero-length CString still contains two wchar characters.
Naturally, when you compare the size of a CString with the size of the corresponding C-style array, the array will be smaller. However, if you want to manipulate your string as extensively as CString allows, you will eventually define your own variables for the string size, the buffer size, and sometimes a refcounter and/or guard flags. Indeed, you need to store your string size to avoid calling strlen each time you need it. You need to store your buffer size separately if you allow your buffer to be larger than the string length, so that you don't call realloc each time you add to or remove from the string. And so on: you trade a small size increase for significant gains in speed, safety and functionality.
So, the answer depends on what you are going to do with the string. Suppose you want a string to store the name of your class for logging: a static const C-style string will do fine. If you need a string that you manipulate and use extensively with MFC- or ATL-related classes, use the CString family of types. If you need to manipulate strings in the "engine" parts of your application that are isolated from its interface and may be ported to other platforms, use std::string, or write your own string type to suit your particular needs (this can be really useful when you write the "glue" code between the interface and the engine; otherwise std::string is preferable).
CString comes from the MFC framework and is specific to Windows. std::string is part of the C++ standard library. Both are library classes for managing strings in memory, but std::string gives you code portability across platforms.
Using a raw array is always good for memory. However, you have to perform operations on strings, and that becomes difficult with a raw array: consider bounds checking, getting the string length, copying the array, resizing it because the string may grow, deleting the array, etc. String utility classes are good wrappers for all of these problems. The string class keeps the actual string on the heap, and you pay for the overhead of the string object itself. However, it provides the functionality to manage the string memory, which you would otherwise have to write by hand anyway.
Prefer std::string if you can, if not, use CString.
In almost all cases I encourage novice programmers to use std::string or CString(*). First, they will make significantly fewer errors. I have seen many buffer overruns, memory invalidations and memory leaks caused by erroneous use of C arrays.
So which is more efficient, CString/std::string or raw character arrays? Memory-wise, generally speaking, all that CString and std::string add is little more than an integer for the size. The question is: does it matter?
So which is more efficient in terms of performance? Well, it depends on what you are doing with it and how you are using your C arrays. But passing CString or std::string around can be computationally more efficient than passing C arrays. The problem with C arrays is that you can't be sure who owns the memory or what kind of memory (heap/stack/literal) it is. Defensive programming results in more copies of arrays, just to be sure that the memory you hold will be valid for as long as it is needed.
Why are std::string or CString more efficient than C arrays if they are passed around by value? This is a bit more complicated, and for totally different reasons. For CString it is simple: it is implemented as a COW (copy-on-write) object. So when you have 5 objects that originate from one CString, they will not use more memory than one until you start to modify one of them. std::string has stricter requirements (since C++11), so it is not allowed to share memory with other std::string objects. But if you have a newer compiler, std::string supports move semantics, so returning a string from a function only transfers the pointer instead of reallocating.
There are very few cases where raw C arrays are good and practical idea.
*) If you are already programming against MFC, why not just use CString.

How does the string class in c++ std work?

I'm afraid I don't know templates (or C++, really), but I know algorithms and data structures (even some OOP! :). Anyway, to make the question a bit more precise, consider what I would like to be part of the answer (among others I don't know in advance).
1. Why is it coded as a template?
2. How does the template work?
3. How does it do memory allocation?
4. Why is it (or isn't it) better than mere null-terminated char arrays?
std::string is actually a typedef for std::basic_string<char>, and therein lies the answer to your #1 above. It's a template in order to make basic_string work with pretty much any character type: char, unsigned char, wchar_t, pizza, whatever... string itself is just a programmer convenience that uses char as the character type, since that's what's usually wanted.
Unanswerable as asked. If you're confused about something, please try to narrow it down a bit.
There are two answers. One, from the application-layer point of view, all basic_string objects use an allocator object to do the actual allocation. Allocation methods may vary from one implementation to the next, and for different template parameters, but in practice they will use new at the lower levels to allocate & manage the contained resource.
It's better than mere char arrays for a wide variety of reasons.
string manages the memory for you. You never have to allocate buffer space when you add data to or remove data from the string. If you add more than will fit in the currently allocated buffer, string will reallocate it for you behind the scenes.
In this regard, string can be thought of as a kind of smart pointer. For the same reasons that smart pointers are better than raw pointers, strings are better than raw char arrays.
Type safety. This may seem a little convoluted, but string used properly has better type safety than char buffers. Consider a common scenario:
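#include <cstdio>
#include <string>
#include <sstream>
using namespace std;
int main()
{
const char* jamorkee_raw = "jamorkee";
char raw_buf[0x1000] = {};
// BUG (intentional): %f expects a double, but a const char* is passed.
// This compiles (at most with a warning) and is undefined behavior at run time.
sprintf( raw_buf, "This is my string. Hello, %f", jamorkee_raw);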
const string jamorkee_str = "jamorkee";
stringstream ss;
ss << "This is my string. Hello " << jamorkee_str;
string s = ss.str();
}
The type-safety issue raised above (a format specifier that doesn't match its argument) isn't even possible when using string along with streams, because operator<< selects the right overload from the argument's type.
A rather quick (and therefore probably incomplete) shot at answering some of the questions:
Why is it coded as a template?
Templates provide the capability for the class functions to work on arbitrary data types. For example the basic_string<> template class can work on char units (which is what the std::string typedef does) or wchar_t units (std::wstring) or any POD type. Using something other than char or wchar_t is unusual (std::vector<> would more likely be used), but the possibility exists.
How does it do mem allocation?
This isn't specified by the standard. In fact, the basic_string<> template allows an arbitrary allocator to be used for the actual allocation of memory (but doesn't determine at what points allocations might be requested). Some implementations might store short strings in actual class members, and only allocate dynamically when the strings grow beyond a certain size. The size requested might be exactly what's needed to store the string, or might be a multiple of that size to allow for growth without a reallocation.
Additional information stolen from another SO answer:
Scott Meyers's book, Effective STL, has a chapter on std::string implementations that's a decent overview of the common variations: "Item 15: Be aware of variations in string implementations".
He talks about 4 variations:
several variations on a ref-counted implementation (commonly known as copy on write): when a string object is copied unchanged, the refcount is incremented but the actual string data is not copied. Both objects point to the same refcounted data until one of them modifies it, causing a "copy on write" of the data. The variations are in where things like the refcount, locks, etc. are stored.
a "short string optimization" implementation. In this variant, the object contains the usual pointer to data, length, size of the dynamically allocated buffer, etc. But if the string is short enough, it will use that area to hold the string instead of dynamically allocating a buffer.
Why is it (or isn't it) better than mere null-terminated char arrays?
One way the string class is better than a mere null terminated array is that the class manages the memory required, so defects involving allocation errors or overrunning the end of the allocated arrays are reduced. Another (perhaps minor) benefit is that you can store 'null' characters in the string. A drawback is that there's perhaps some overhead - especially that you pretty much have to rely on dynamic memory allocation for the string class. In most scenarios that's probably not a major issue, on some setups (embedded systems for example) it can be a problem.
string is not the template, string is a specialization of the basic_string class template for char. It's a template so that for example you can typedef wstring which specializes on wide characters, and use all the same code for the encapsulated value.
See @GMan's comment. Compile-time code reuse, while retaining the ability to selectively special-case, is the basic rationale for templates.
Implementation-dependent. Some do single-instance allocation with copy on write. Some use a built-in buffer for small strings and allocate from the heap only after a certain size is reached. I suggest you investigate how it works on your compiler by stepping through the constructor and follow-on code in <string>, as that will help you understand #2 hands-on, which is far more valuable than just reading about it (though a book or other reading is a great idea as an intro to templates).
Because const char* and the CRT that supports it is a bug farm for the unwary. Check out all the stuff you get for free with std::string. Plus a whole bunch of Standard C++ algorithms that work with string iterators.
Why is it coded as a template?
Several people have given the answer that having std::basic_string be a template means that you can have both std::basic_string<char> and std::basic_string<wchar_t>. What nobody has explained is why C and C++ have multiple character types in the first place.
C, especially in its early versions, was minimalistic about data types. Why have bool when the integers 0 and 1 work just fine? And why have distinct types for "byte" and "character" when they're both 8 bits?
The problem is that 8 bits limits you to 256 characters, which is adequate for an alphabetic language like English or Russian, but nowhere near enough for Japanese or Chinese. And now we have Unicode with its 21-bit code points. But char couldn't be expanded to 16 or 32 bits because the assumption that char = byte was so entrenched. So we got a separate type for "wide characters".
But now we have the problem that wchar_t is UTF-32 on Linux but UTF-16 on Windows. To solve that problem, C++11 added the char16_t and char32_t types (and the corresponding std::u16string and std::u32string types).
A good free online resource is "Thinking in C++" by Bruce Eckel, whose site is here: http://mindview.net/Books/TICPP/ThinkingInCPP2e.html .
The second volume of his free book is mirrored here: http://www.smart2help.com/e-books/ticpp-2nd-ed-vol-two/#_ftnref14 . Chapter three is all about the string class, why it's a template, and why it's useful.

Why do you prefer char* instead of string, in C++?

I'm a C programmer trying to write c++ code. I heard string in C++ was better than char* in terms of security, performance, etc, however sometimes it seems that char* is a better choice. Someone suggested that programmers should not use char* in C++ because we could do all things that char* could do with string, and it's more secure and faster.
Have you ever used char* in C++? Under what specific conditions?
It's safer to use std::string because you don't need to worry about allocating / deallocating memory for the string. The C++ std::string class is likely to use a char* array internally. However, the class will manage the allocation, reallocation, and deallocation of the internal array for you. This removes all the usual risks that come with using raw pointers, such as memory leaks, buffer overflows, etc.
Additionally, it's also incredibly convenient. You can copy strings, append to a string, etc., without having to manually provide buffer space or use functions like strcpy/strcat. With std::string it's as simple as using the = or + operators.
Basically, it's:
std::string s1 = "Hello ";
std::string s2 = s1 + "World";
versus...
const char* s1 = "Hello ";
char s2[1024]; // How much should I really even allocate here?
strcpy(s2, s1);
strcat(s2, "World");
Edit:
In response to your edit regarding the use of char* in C++: Many C++ programmers will claim you should never use char* unless you're working with some API/legacy function that requires it, in which case you can use the std::string::c_str() function to convert an std::string to const char*.
However, I would say there are some legitimate uses of C-arrays in C++. For example, if performance is absolutely critical, a small C-array on the stack may be a better solution than std::string. You may also be writing a program where you need absolute control over memory allocation/deallocation, in which case you would use char*. Also, as was pointed out in the comments section, std::string isn't guaranteed to provide you with a contiguous, writable buffer *, so you can't directly write from a file into an std::string if you need your program to be completely portable. However, in the event you need to do this, std::vector would still probably be preferable to using a raw C-array.
* Although in C++11 this has changed so that std::string does provide you with a contiguous buffer
Ok, the question changed a lot since I first answered.
Native char arrays are a nightmare of memory management and buffer overruns compared to std::string. I always prefer to use std::string.
That said, char array may be a better choice in some circumstances due to performance constraints (although std::string may actually be faster in some cases -- measure first!) or prohibition of dynamic memory usage in an embedded environment, etc.
In general, std::string is a cleaner, safer way to go because it removes the burden of memory management from the programmer. The main reason it can be faster than char*s is that std::string stores the length of the string, so you don't have to iterate through the entire character array looking for the terminating null character each time you want to copy, append, etc.
That being said, you will still find a lot of c++ programs that use a mix of std::string and char *, or have even written their own string classes from scratch. In older compilers, std::string was a memory hog and not necessarily as fast as it could be. This has gotten better over time, but some high-performance applications (e.g., games and servers) can still benefit from hand-tuned string manipulations and memory-management.
I would recommend starting out with std::string, or possibly creating a wrapper for it with more utility functions (e.g., starts_with(), split(), format(), etc.). If you find when benchmarking your code that string manipulation is a bottleneck, or uses too much memory, you can then decide if you want to accept the extra risks and testing that a custom string library demands.
TIP: One way of getting around the memory issues and still use std::string is to use an embedded database such as SQLite. This is particularly useful when generating and manipulating extremely large lists of strings, and performance is better than what you might expect.
C char * strings cannot contain '\0' characters. C++ string can handle null characters without a problem. If users enter strings containing \0 and you use C strings, your code may fail. There are also security issues associated with this.
Implementations of std::string hide the memory usage from you. If you're writing performance-critical code, or you actually have to worry about memory fragmentation, then using char* can save you a lot of headaches.
For anything else though, the fact that std::string hides all of this from you makes it so much more usable.
String may actually be better in terms of performance. And innumerable other reasons - security, memory management, convenient string functions, make std::string an infinitely better choice.
Edit: To see why string might be more efficient, read Herb Sutter's books: he discusses a way to implement string internally using Lazy Initialization combined with Referencing.
Use std::string for its incredible convenience: automatic memory handling and methods/operators. For some string manipulations, some implementations may have optimizations in place (such as delayed evaluation of several successive manipulations, which saves memory copying).
If you need to rely on the specific char layout in memory for other optimizations, try std::vector<char> instead. If you have a non-empty vector vec, you can get a char* pointer using &vec[0] (or vec.data() in C++11).
Short answer, I don't. The exception is when I'm using third party libraries that require them. In those cases I try to stick to std::string::c_str().
In all my professional career I've had the opportunity to use std::string on only two projects. All the others had their own string classes :)
Having said that, for new code I generally use std::string when I can, except for module boundaries (functions exported by dlls/shared libraries) where I tend to expose C interface and stay away from C++ types and issues with binary incompatibilities between compilers and std library implementations.
Compare and contrast the following C and C++ examples:
strlen(infinitelengthstring)
versus
string.length()
std::string is almost always preferred. Even for speed: many implementations store short strings in a small buffer inside the string object itself before dynamically allocating memory for larger strings.
However, char* pointers are still needed in many situations for writing strings/data into a raw buffer (e.g. network I/O), which std::string did not portably support before C++11.
The only time I've recently used a C-style char string in a C++ program was on a project that needed to make use of two C libraries that (of course) used C strings exclusively. Converting back and forth between the two string types made the code really convoluted.
I also had to do some manipulation on the strings that's actually kind of awkward to do with std::string, but I wouldn't have considered that a good reason to use C strings in the absence of the above constraint.

char[] (c lang) to string (c++ lang) conversion

I can see that almost all modern APIs are developed in the C language. There are reasons for that: processing speed, low level language, cross platform and so on.
Nowadays, I program in C++ because of its Object Orientation, the use of string, the STL but, mainly because it is a better C.
However, when my C++ programs need to interact with C APIs, I get really upset that I need to convert char[] types to C++ strings, then operate on these strings using their powerful methods, and finally convert these strings back to char[] (because the API needs to receive char[]).
If I repeat these operations for millions of records the processing times are higher because of the conversion task.
For that simple reason, I feel that char[] is an obstacle to adopting C++ as a better C.
I would like to know if you feel the same. If not (I hope so!), I would really like to know the best way for C++ to coexist with char[] types without doing those awful conversions.
Thanks for your attention.
The C++ string class has a lot of problems, and yes, what you're describing is one of them.
More specifically, there is no way to do string processing without creating a copy of the string, which may be expensive.
And because virtually all string processing algorithms are implemented as class members, they can only be used on the string class.
A solution you might want to experiment with is the combination of Boost.Range and Boost.StringAlgo.
Range allows you to create sequences out of a pair of iterators. They don't take ownership of the data, so they don't copy the string. They just point to the beginning and end of your char* string.
And Boost.StringAlgo implements all the common string operations as non-member functions, that can be applied to any sequence of characters. Such as, for example, a Boost range.
The combination of these two libraries pretty much solve the problem. They let you avoid having to copy your strings to process them.
Another solution might be to store your string data as std::strings all the time. When you need to pass a char* to some API function, simply pass it the address of the first character (&str[0]).
The problem with this second approach is that, before C++11, std::string didn't guarantee that its string buffer was null-terminated, so you either had to rely on implementation details or manually add a null byte as part of the string.
If you use std::vector<char> instead of std::string, the underlying storage will be a C array that can be accessed with &someVec[0]. However, you do lose a lot of std::string conveniences such as operator+.
That said, I'd suggest just avoiding C APIs that mutate strings as much as possible. If you need to pass an immutable string to a C function, you can use c_str(), which is fast and non-copying on most std::string implementations.
I'm not sure what you mean by "conversion", but won't the following suffice for moving between char*, char[], and std::string?
char charString[] = {'a', 'b', 'c', '\0'};
std::string standardString(&charString[0]);
const char* stringPointer(standardString.c_str());
I don't think it's as bad as you make it out to be.
There is a cost converting a char[] to a std::string, but if you're going to be modifying the string, you have to pay that cost anyway whether converting to a std::string or copying to another char[] buffer.
The conversion going the other way (via string.c_str()) is usually trivial. It's usually returning a pointer to an internal buffer (just don't give that buffer to code that will modify it).
I'm not sure why you would be constrained to using C strings and still have an environment that runs C++ code, but if you really don't want the overhead of conversion, then don't convert. Just write routines that operate on the C strings.
Another reason for converting to C++ style strings is for bound safety.
"... because it is a better C."
Baloney. C++ is a vastly inferior dialect of C. The problems it solves are trivial, the problems it brings, much worse than those it solves.