Why use c strings in c++? - c++

Is there any good reason to use C-strings in C++ nowadays? My textbook uses them in examples at some points, and I really feel like it would be easier just to use a std::string.

The only reasons I've had to use them is when interfacing with 3rd party libraries that use C style strings. There might also be esoteric situations where you would use C style strings for performance reasons, but more often than not, using methods on C++ strings is probably faster due to inlining and specialization, etc.
You can use the c_str() method in many cases when working with those sort of APIs, but you should be aware that the char * returned is const, and you should not modify the string via that pointer. In those sort of situations, you can still use a vector<char> instead, and at least get the benefit of easier memory management.

A couple more memory control notes:
C strings are POD types, so they can be allocated in your application's read-only data segment. If you declare and define std::string constants at namespace scope, the compiler will generate additional code that runs before main() that calls the std::string constructor for each constant. If your application has many constant strings (e.g. if you have generated C++ code that uses constant strings), C strings may be preferable in this situation.
Some implementations of std::string support a feature called SSO ("short string optimization" or "small string optimization") where the std::string class contains storage for strings up to a certain length. This increases the size of std::string but often significantly reduces the frequency of free-store allocations/deallocations, improving performance. If your implementation of std::string does not support SSO, then constructing an empty std::string on the stack will still perform a free-store allocation. If that is the case, using temporary stack-allocated C strings may be helpful for performance-critical code that uses strings. Of course, you have to be careful not to shoot yourself in the foot when you do this.

Because that's how they come from numerous API/libraries?

Let's say you have some string constants in your code, which is a pretty common need. It's better to define these as C strings than as C++ objects -- more lightweight, portable, etc. Now, if you're going to be passing these strings to various functions, it's nice if these functions accept a C string instead of requiring a C++ string object.
Of course, if the strings are mutable, then it's much more convenient to use C++ string objects.

If a function needs a constant string I still prefer to use 'const char*' (or const wchar_t*) even if the program uses std::string, CString, EString or whatever elsewhere.
There are just too many sources of strings in a large code base to be sure the caller will have the string as a std::string and 'const char*' is the lowest common denominator.

Textbooks feature old-school C strings because many basic functions still expect them as arguments, or return them. Additionally, it gives some insight into the underlying structure of the string in memory.

Memory control. I recently had to handle strings (actually blobs from a database) about 200-300 MB in size, in a massively multithreaded application. It was a situation where just-one-more copy of the string might have burst the 32bit address space. I had to know exactly how many copies of the string existed. Although I'm an STL evangelist, I used char * then because it gave me the guarantee that no extra memory or even extra copy was allocated. I knew exactly how much space it would need.
Apart from that, standard STL string processing misses out on some great C functions for string processing/parsing. Thankfully, std::string has the c_str() method for const access to the internal buffer. To use printf() you still have to use char * though (what a crazy idea of the C++ team to not include (s)printf-like functionality, one of the most useful functions EVER in C. I hope boost::format will soon be included in the STL.

If the C++ code is "deep" (close to the kernel, heavily dependent on C libraries, etc.) you may want to use C strings explicitly to avoid lots of conversions in to and out of std::string. Of, if you're interfacing with other language domains (Python, Ruby, etc.) you might do so for the same reason. Otherwise, use std::string.

Some posts mention memory concerns. That might be a good reason to shun std::string, but char* probably is not the best replacement. It's still an OO language. Your own string class is probably better than a char*. It may even be more efficient - you can apply the Small String Optimization, for instance.
In my case, I was trying to get about 1GB worth of strings out of a 2GB file, stuff them in records with about 60 fields and then sort them 7 times of different fields. My predecessors code took 25 hours with char*, my code ran in 1 hour.

1) "string constant" is a C string (const char *), converting it to const std::string& is run-time process, not necessarily simple or optimized.
2) fstream library uses c-style strings to pass file names.
My rule of thumb is to pass const std::string& if I am about to use the data as std::string anyway (say, when I store them in a vector), and const char * in other cases.

After spending far, far, too much time debugging initialization rules and every conceivable string implementation on several platforms we require static strings to be const char*.
After spending far, far, too much time debugging bad char* code and memory leaks I suggest that all non-static strings be some type of string object ... until profiling shows that you can and should do something better ;-)

Legacy code that doesn't know of std::string. Also, before C++11 opening files with std::ifstream or std::ofstream was only possible with const char* as an input to the file name.

Given the choice, there is generally no reason to choose primitive C strings (char*) over C++ strings (std::string). However, often you don't have the luxury of choice. For instance, std::fstream's constructors take C strings, for historical reasons. Also, C libraries (you guessed it!) use C strings.
In your own C++ code it is best to use std::string and extract the object's C string as needed by using the c_str() function of std::string.

It depends on the libraries you're using. For example, when working with the MFC, it's often easier to use CString when working with various parts of the Windows API. It also seems to perform better than std::string in Win32 applications.
However, std::string is part of the C++ standard, so if you want better portability, go with std::string.

For applications such as most embedded platforms where you do not have the luxury of a heap to store the strings being manipulated, and where deterministic preallocation of string buffers is required.

c strings don't carry the overhead of being a class.
c strings generally can result in faster code, as they are closer to the machine level
This is not to say, you can't write bad code with them. They can be misused, like every other construct.
There is a wealth of libary calls that demand them for historical reasons.
Learn to use c strings, and stl strings, and use each when it makes sense to do so.

STL strings are certainly far easier to use, and I don't see any reason to not use them.
If you need to interact with a library that only takes C-style strings as arguments, you can always call the c_str() method of the string class.

The usual reason to do it is that you enjoy writing buffer overflows in your string handling. Counted strings are so superior to terminated strings it's hard to see why the C designers ever used terminated strings. It was a bad decision then; it's a bad decision now.

Related

C vs C++ file handling

I have been working in C and C++ and when it comes to file handling I get confused. Let me state the things I know.
In C, we use functions:
fopen, fclose, fwrite, fread, ftell, fseek, fprintf, fscanf, feof, fileno, fgets, fputs, fgetc, fputc.
FILE *fp for file pointer.
Modes like r, w, a
I know when to use these functions (Hope I didn't miss anything important).
In C++, we use functions / operators:
fstream f
f.open, f.close, f>>, f<<, f.seekg, f.seekp, f.tellg, f.tellp, f.read, f.write, f.eof.
Modes like ios::in, ios::out, ios::bin , etc...
So is it possible (recommended) to use C compatible file operations in C++?
Which is more widely used and why?
Is there anything other than these that I should be aware of?
Sometimes there's existing code expecting one or the other that you need to interact with, which can affect your choice, but in general the C++ versions wouldn't have been introduced if there weren't issues with the C versions that they could fix. Improvements include:
RAII semantics, which means e.g. fstreams close the files they manage when they leave scope
modal ability to throw exceptions when errors occur, which can make for cleaner code focused on the typical/successful processing (see http://en.cppreference.com/w/cpp/io/basic_ios/exceptions for API function and example)
type safety, such that how input and output is performed is implicitly selected using the variable type involved
C-style I/O has potential for crashes: e.g. int my_int = 32; printf("%s", my_int);, where %s tells printf to expect a pointer to an ASCIIZ character buffer but my_int appears instead; firstly, the argument passing convention may mean ints are passed differently to const char*s, secondly sizeof int may not equal sizeof const char*, and finally, even if printf extracts 32 as a const char* at best it will just print random garbage from memory address 32 onwards until it coincidentally hits a NUL character - far more likely the process will lack permissions to read some of that memory and the program will crash. Modern C compilers can sometimes validate the format string against the provided arguments, reducing this risk.
extensibility for user-defined types (i.e. you can teach streams how to handle your own classes)
support for dynamically sizing receiving strings based on the actual input, whereas the C functions tend to need hard-coded maximum buffer sizes and loops in user code to assemble arbitrary sized input
Streams are also sometimes criticised for:
verbosity of formatting, particularly "io manipulators" setting width, precision, base, padding, compared to the printf-style format strings
a sometimes confusing mix of manipulators that persist their settings across multiple I/O operations and others that are reset after each operation
lack of convenience class for RAII pushing/saving and later popping/restoring the manipulator state
being slow, as Ben Voigt comments and documents here
The performance differences between printf()/fwrite style I/O and C++ IO streams formatting are very much implementation dependent.
Some implementations (visual C++ for instance), build their IO streams on top of FILE * objects and this tends to increase the run-time complexity of their implementation. Note, however, that there was no particular constraint to implement the library in this fashion.
In my own opinion, the benefits of C++ I/O are as follows:
Type safety.
Flexibility of implementation. Code can be written to do specific formatting or input to or from a generic ostream or istream object. The application can then invoke this code with any kind of derived stream object. If the code that I have written and tested against a file now needs to be applied to a socket, a serial port, or some other kind of internal stream, you can create a stream implementation specific to that kind of I/O. Extending the C style I/O in this fashion is not even close to possible.
Flexibility in locale settings: the C approach of using a single global locale is, in my opinion, seriously flawed. I have experienced cases where I invoked library code (a DLL) that changed the global locale settings underneath my code and completely messed up my output. A C++ stream allows you to imbue() any locale to a stream object.
An interesting critical comparison can be found here.
C++ FQA io
Not exactly polite, but makes to think...
Disclaimer
The C++ FQA (that is a critical response to the C++ FAQ) is often considered by the C++ community a "stupid joke issued by a silly guy the even don't understand what C++ is or wants to be"(cit. from the FQA itself).
These kind of argumentation are often used to flame (or escape from) religion battles between C++ believers, Others languages believers or language atheists each in his own humble opinion convinced to be in something superior to the other.
I'm not interested in such battles, I just like to stimulate critical reasoning about the pros and cons argumentation. The C++ FQA -in this sens- has the advantage to place both the FQA and the FAQ one over the other, allowing an immediate comparison. And that the only reason why I referenced it.
Following TonyD comments, below (tanks for them, I makes me clear my intention need a clarification...), it must be noted that the OP is not just discussing the << and >> (I just talk about them in my comments just for brevity) but the entire function-set that makes up the I/O model of C and C++.
With this idea in mind, think also to other "imperative" languages (Java, Python, D ...) and you'll see they are all more conformant to the C model than C++. Sometimes making it even type safe (what the C model is not, and that's its major drawback).
What my point is all about
At the time C++ came along as mainstream (1996 or so) the <iostream.h> library (note the ".h": pre-ISO) was in a language where templates where not yet fully available, and, essentially, no type-safe support for varadic functions (we have to wait until C++11 to get them), but with type-safe overloaded functions.
The idea of oveloading << retuning it's first parameter over and over is -in fact- a way to chain a variable set of arguments using only a binary function, that can be overload in a type-safe manner. That idea extends to whatever "state management function" (like width() or precision()) through manipulators (like setw) appear as a natural consequence. This points -despite of what you may thing to the FQA author- are real facts. And is also a matter of fact that FQA is the only site I found that talks about it.
That said, years later, when the D language was designed starting offering varadic templates, the writef function was added in the D standard library providing a printf-like syntax, but also being perfectly type-safe. (see here)
Nowadays C++11 also have varadic templates ... so the same approach can be putted in place just in the same way.
Moral of the story
Both C++ and C io models appear "outdated" respect to a modern programming style.
C retain speed, C++ type safety and a "more flexible abstraction for localization" (but I wonder how many C++ programmers are in the world that are aware of locales and facets...) at a runtime-cost (jut track with a debugger the << of a number, going through stream, buffer locale and facet ... and all the related virtual functions!).
The C model, is also easily extensible to parametric messages (the one the order of the parameters depends on the localization of the text they are in) with format strings like
#1%d #2%i allowing scrpting like "text #2%i text #1%d ..."
The C++ model has no concept of "format string": the parameter order is fixed and itermixed with the text.
But C++11 varadic templates can be used to provide a support that:
can offer both compile-time and run-time locale selection
can offer both compile-time and run-time parametric order
can offer compile-time parameter type safety
... all using a simple format string methodology.
Is it time to standardize a new C++ i/o model ?

Length of a C string: std::strlen() vs. std::char_traits<char>::length()

Both are equivalent in that they return the length of the null-terminated character sequence. Are there reasons of preferring one to the other?
Use the simpler alternative. std::char_traits::length is great and all, but for C strings it does the same and is much longer code.
Do yourself a favour and avoid code bloat. I’m a huge fan of C++ functions over C equivalent (e.g. I will never use std::strcpy or std::memcpy, there’s a perfectly fine std::copy). But avoiding std::strlen is just silly.
One reason to use C++ functions exclusively is interface uniformity: for instance, both std::strcpy and std::memcpy have atrocious interfaces. However, std::strlen is a perfectly fine algorithm in the best tradition of C++. It doesn’t generalise, true, but the neither do other class-specific free functions found in the standard library.
std::strlen() is a holdover from the C Standard Library and only operates on a const char* (it is unsafe in that it has undefined behavior if the string is not null terminated). If the string is using a wide character set (e.g. const unsigned short*), std::strlen() is useless.
std::char_traits<T>::length() will operate on whatever the T type is (e.g. if it is an unsigned short, it will still operate properly, but also requires a null terminated value - that is the last value must be T(0) - if an array of T's passed to it is not null terminated, behavior is undefined as well).
In general, when dealing with strings, it is better to use std::string::length() instead of using C-style character strings.
Both Zack Howland and Konrad Rudolph have a point. Thanks. I accept both answers. The summarized reply would be:
There doesn't seem to be any except personal preference either for shorter code or the C++ standard library (I leave out generalization since it wasn't the point of the question as can be seen from the title).
std::strlen() is a C standard library compatibility (even though it is part of ISO C++) function that takes const char* as an argument. length() is a method of the std::string family of classes. So if you want to use strlen() on std::string you'd have to write:
strlen(mystring.c_str())
which is less tidy than mystr.length(). Apart from that, there should be no tangible difference (for char type, that is).

Why are string literals const?

It is known that in C++ string literals are immutable and the result of modifying a string literal is undefined. For example
char * str = "Hello!";
str[1] = 'a';
This will bring to an undefined behavior.
Besides that string literals are placed in static memory. So they exists during whole program. I would like to know why do string literals have such properties.
There are a couple of different reasons.
One is to allow storing string literals in read-only memory (as others have already mentioned).
Another is to allow merging of string literals. If one program uses the same string literal in several different places, it's nice to allow (but not necessarily require) the compiler to merge them, so you get multiple pointers to the same memory, instead of each occupying a separate chunk of memory. This can also apply when two string literals aren't necessarily identical, but do have the same ending:
char *foo = "long string";
char *bar = "string";
In a case like this, it's possible for bar to be foo+5 (if I'd counted correctly).
In either of these cases, if you allow modifying a string literal, it could modify the other string literal that happens to have the same contents. At the same time, there's honestly not a lot of point in mandating that either -- it's pretty uncommon to have enough string literals that you could overlap that most people probably want the compiler to run slower just to save (maybe) a few dozen bytes or so of memory.
By the time the first standard was written, there were already compilers that used all three of these techniques (and probably a few others besides). Since there was no way to describe one behavior you'd get from modifying a string literal, and nobody apparently thought it was an important capability to support, they did the obvious: said even attempting to do so led to undefined behavior.
It's undefined behaviour to modify a literal because the standard says so. And the standard says so to allow compilers to put literals in read only memory. And it does this for a number of reasons. One of which is to allow compilers to make the optimisation of storing only one instance of a literal that is repeated many times in the source.
I believe you ask about the reason why literals are placed in
read-only memory, not about technical details of linker doing this and
that or legal details of a standard forbidding such and such.
When modification of string literals works, it leads to subtle bugs
even in the absence of string merging (which we have reasons to
disallow if we decided to permit modification). When you see code like
char *str="Hello";
.../* some code, but str and str[...] are not modified */
printf("%s world\n", str);
it's a natural conclusion that you know what's going to be printed,
because str (and its content) were not modified in a particular
place, between initialization and use.
However, if string literals are writable, you don't know it any
more: str[0] could be overwritten later, in this code or inside a
deeply nested function call, and when the code is run again,
char *str="Hello";
won't guarantee anything about the content of str anymore. As we
expect, this initialization is implemented as moving the address known
in link time into a place for str. It does not check that str
contains "Hello" and it does not allocate a new copy of it. However,
we understand this code as resetting str to "Hello". It's hard to
overcome this natural understanding, and it's hard to reason about the
code where it is not guaranteed. When you see an expression like
x+14, what if you had to think about 14 being possibly overwritten
in other code, so it became 42? The same problem with strings.
That's the reason to disallow modification of string literals, both in
the standard (with no requirement to detect the failure early) and in
actual target platforms (providing the bonus of detecting potential
bugs).
I believe that many attempts to explain this thing suffer from the
worst kind of circular reasoning. The standard forbids writing to
literals because the compiler can merge strings, or they can be placed
in read-only memory. They are placed in read-only memory to catch the
violation of the standard. And it's valid to merge literals because
the standard forbids... is it a kind of explanation you asked for?
Let's look at other
languages. Common Lisp standard
makes modification of literals undefined behaviour, even though the
history of preceding Lisps is very different with the history of C
implementations. That's because writable literals are logically
dangerous. Language standards and memory layouts only reflect that
fact.
Python language has exactly one place where something resembling
"writing to literals" can happen: parameter default values, and this
fact confuses people all the time.
Your question is tagged C++, and I'm unsure of its current state
with respect to implicit conversion to non-const char*: if it's a
conversion, is it deprecated? I expect other answers to provide a
complete enlightenment on this point. As we talk of other languages
here, let me mention plain C. Here, string literals are not const,
and an equivalent question to ask would be why can't I modify string
literals (and people with more experience ask instead, why are
string literals non-const if I can't modify them?). However, the
reasoning above is fully applicable to C, despite this difference.
Because is K&R C, there was not such thing as "const". And similarly in pre-ANSI C++. Hence there was a lot of code which had things like char * str = "Hello!"; If the Standards committee made text literals const, all those programs would have no longer compiled. So they made a compromise. Text literals are official const char[], but they have a silent implicit conversion to char*.
In C++, string literals are const because you aren't allowed
to modify them. In standard C, they would have been const as
well, except that when const was introduced into C, there was
so much code along the lines of char* p = "somethin"; that
making them const would have broke, that it was deemed
unacceptable. (The C++ committee chose a different solution to
this problem, with a deprecated implicit conversion which allows
the above.)
In the original C, string literals were not const, and were
mutable, and it was garanteed that no two string literals shared
any memory. This was quickly realized to be a serious error,
allowing things like:
void
mutate(char* p)
{
static char c = 'a';
*p = a ++;
}
and in another module:
mutate( "hello" ); // Can't trust what is written, can you.
(Some early implementations of Fortran had a similar issue,
where F(4) might call F with just about any integral value.
The Fortran committee fixed this, just like the C committee
fixed string literals in C.)

How do you handle strings in C++?

Which is your favorite way to go with strings in C++? A C-style array of chars? Or wchar_t? CString, std::basic_string, std::string, BSTR or CComBSTR?
Certainly each of these has its own area of application, but anyway, which is your favorite and why?
std::string or std::wstring, depending on your needs. Why?
They're standard
They're portable
They can handle I18N
They have performance guarantees (as per the standard)
Protected against buffer overflows and similar attacks
Are easily converted to other types as needed
Are nicely templated, giving you a wide variety of options while reducing code bloat and improving performance. Really. Compilers that can't handle templates are long gone now.
A C-style array of chars is just asking for trouble. You'll still need to deal with them on occasion (and that's what std::string.c_str() is for), but, honestly -- one of the biggest dangers in C is programmers doing Bad Things with char* and winding up with buffer overflows. Just don't do it.
An array of wchar__t is the same thing, just bigger.
CString, BSTR, and CComBSTR are not standard and not portable. Avoid them unless absolutely forced. Optimally, just convert a std::string/std::wstring to them when needed, which shouldn't be very expensive.
Note that std::string is just a child of std::basic_string, but you're still better off using std::string unless you have a really good reason not to. Really Good. Let the compiler take care of the optimization in this situation.
std::string !!
There's a reason why they call it a "Standard".
basic_string is an implementation detail and should be ignored.
BSTR & CComBSTR only for interOp with COM, and only for the moment of interop.
std::string unless I need to call an API that specifically takes one of the others that you listed.
Here's an article comparing the most common kinds of strings in C++ and how to convert between them. Unraveling Strings in Visual C++
If you can use MFC, use CString. Otherwise use std::string. Plus, std::string works on any platform that supports standard C++.
When I have a choice (I usually don't), I tend to use std::string with UTF-8 encoding (and the help of UTF8 CPP library. Not that I like std::string that much, but at least it is standard and portable.
Unfortunatelly, in almost all real-life projects I've worked on, there have been internal string classes - most of them actually better than std::string, but still...
I am a Qt dev, so of course I tend to use QString whenever possible :).
It's quite nice: unicode compliant, thread-safe implicit-sharing (aka copy-on-write), and it comes with an API designed to solve practical real-world problems (split, join, replace (with and without regex), conversion to/from numbers...)
If I can't use QString, then std::wstring. If you are stuck with C, I recommend glib GString.
I use std::string (or basic_string<TCHAR>) whenever I can. It's quite versatile (just like CStringT), it's type-safe (unlike printf), and it's available on every platform.
Other, std::wstring.
std::string is 20th century technology. Use Unicode, and sell to 6 billion people instead of 300 milion.
C-style char arrays have their place, but if you use them extensively you are asking to waste time debugging off by one errors. We have our own string class tailored for use in our (embedded development environment).
We don't use std::string because it isn't always available for us.
If you're using MFC, use CString. Otherwise I agree with most of the others, std::string or std::wstring all the way.
Microsoft could have done the world a huge favor by adding std::basic_string<TCHAR> overloads in their latest update of MFC.
I like to use TCHAR which is a define for wchar or char according to the projects settings.
It's defined in tchar.h where you can find all of the related definitions for functions and types you need.
std::string and std::wstring if I can, and something else if I have to.
They may not be perfect, but they are well tested, well understood, and very versatile. They play nicely with the rest of the standard library which is also a huge bonus.
Also worth mentioning, stringstreams.
std::string is better than nothing, but it's annoying that it's missing basic functionality like split, join and even a decent format call...
Unicode is the future. Do not use char* and std::string. Please )
I am tired of localization bugs.

Why don't the std::fstream classes take a std::string?

This isn't a design question, really, though it may seem like it. (Well, okay, it's kind of a design question). What I'm wondering is why the C++ std::fstream classes don't take a std::string in their constructor or open methods. Everyone loves code examples so:
#include <iostream>
#include <fstream>
#include <string>
int main()
{
std::string filename = "testfile";
std::ifstream fin;
fin.open(filename.c_str()); // Works just fine.
fin.close();
//fin.open(filename); // Error: no such method.
//fin.close();
}
This gets me all the time when working with files. Surely the C++ library would use std::string wherever possible?
By taking a C string the C++03 std::fstream class reduced dependency on the std::string class. In C++11, however, the std::fstream class does allow passing a std::string for its constructor parameter.
Now, you may wonder why isn't there a transparent conversion from a std:string to a C string, so a class that expects a C string could still take a std::string just like a class that expects a std::string can take a C string.
The reason is that this would cause a conversion cycle, which in turn may lead to problems. For example, suppose std::string would be convertible to a C string so that you could use std::strings with fstreams. Suppose also that C string are convertible to std::strings as is the state in the current standard. Now, consider the following:
void f(std::string str1, std::string str2);
void f(char* cstr1, char* cstr2);
void g()
{
char* cstr = "abc";
std::string str = "def";
f(cstr, str); // ERROR: ambiguous
}
Because you can convert either way between a std::string and a C string the call to f() could resolve to either of the two f() alternatives, and is thus ambiguous. The solution is to break the conversion cycle by making one conversion direction explicit, which is what the STL chose to do with c_str().
There are several places where the C++ standard committee did not really optimize the interaction between facilities in the standard library.
std::string and its use in the library is one of these.
One other example is std::swap. Many containers have a swap member function, but no overload of std::swap is supplied. The same goes for std::sort.
I hope all these small things will be fixed in the upcoming standard.
Maybe it's a consolation: all fstream's have gotten an open(string const &, ...) next to the open(char const *, ...) in the working draft of the C++0x standard.
(see e.g. 27.8.1.6 for the basic_ifstream declaration)
So when it gets finalised and implemented, it won't get you anymore :)
The stream IO library has been added to the standard C++ library before the STL. In order to not break backward compatibility, it has been decided to avoid modifying the IO library when the STL was added, even if that meant some issues like the one you raise.
# Bernard:
Monoliths "Unstrung." "All for one, and one for all" may work for Musketeers, but it doesn't work nearly as well for class designers. Here's an example that is not altogether exemplary, and it illustrates just how badly you can go wrong when design turns into overdesign. The example is, unfortunately, taken from a standard library near you...
~ http://www.gotw.ca/gotw/084.htm
It is inconsequential, that is true. What do you mean by std::string's interface being large? What does large mean, in this context - lots of method calls? I'm not being facetious, I am actually interested.
It has more methods than it really needs, and its behaviour of using integral offsets rather than iterators is a bit iffy (as it's contrary to the way the rest of the library works).
The real issue I think is that the C++ library has three parts; it has the old C library, it has the STL, and it has strings-and-iostreams. Though some efforts were made to bridge the different parts (e.g. the addition of overloads to the C library, because C++ supports overloading; the addition of iterators to basic_string; the addition of the iostream iterator adaptors), there are a lot of inconsistencies when you look at the detail.
For example, basic_string includes methods that are unnecessary duplicates of standard algorithms; the various find methods, could probably be safely removed. Another example: locales use raw pointers instead of iterators.
C++ grew up on smaller machines than the monsters we write code for today. Back when iostream was new many developers really cared about code size (they had to fit their entire program and data into several hundred KB). Therefore, many didn't want to pull in the "big" C++ string library. Many didn't even use the iostream library for the same reasons, code size.
We didn't have thousands of megabytes of RAM to throw around like we do today. We usually didn't have function level linking so we were at the mercy of the developer of the library to use a lot of separate object files or else pull in tons of uncalled code. All of this FUD made developers steer away from std::string.
Back then I avoided std::string too. "Too bloated", "called malloc too often", etc. Foolishly using stack-based buffers for strings, then adding all kinds of tedious code to make sure it doesn't overrun.
Is there any class in STL that takes a string... I dont think so (couldnt find any in my quick search). So it's probably some design decision, that no class in STL should be dependent on any other STL class (that is not directly needed for functionality).
I believe that this has been thought about and was done to avoid the dependency; i.e. #include <fstream> should not force one to #include <string>.
To be honest, this seems like quite an inconsequential issue. A better question would be, why is std::string's interface so large?
Nowadays you can solve this problem very easily: add -std=c++11 to your CFLAGS.