Why don't the std::fstream classes take a std::string? - c++

This isn't a design question, really, though it may seem like it. (Well, okay, it's kind of a design question). What I'm wondering is why the C++ std::fstream classes don't take a std::string in their constructor or open methods. Everyone loves code examples so:
#include <iostream>
#include <fstream>
#include <string>
int main()
{
    std::string filename = "testfile";
    std::ifstream fin;
    fin.open(filename.c_str()); // Works just fine.
    fin.close();
    //fin.open(filename); // Error: no such method.
    //fin.close();
}
This gets me all the time when working with files. Surely the C++ library would use std::string wherever possible?

By taking a C string the C++03 std::fstream class reduced dependency on the std::string class. In C++11, however, the std::fstream class does allow passing a std::string for its constructor parameter.
Now, you may wonder why there isn't a transparent conversion from a std::string to a C string, so that a class that expects a C string could still take a std::string, just as a class that expects a std::string can take a C string.
The reason is that this would cause a conversion cycle, which in turn may lead to problems. For example, suppose std::string were convertible to a C string so that you could use std::strings with fstreams. Suppose also that C strings are convertible to std::strings, as is the case in the current standard. Now, consider the following:
void f(std::string str1, std::string str2);
void f(const char* cstr1, const char* cstr2);
void g()
{
    const char* cstr = "abc"; // a string literal needs const char*
    std::string str = "def";
    f(cstr, str); // ERROR (if both conversions were implicit): ambiguous
}
Because you could then convert either way between a std::string and a C string, the call to f() could resolve to either of the two f() alternatives, and is thus ambiguous. The solution is to break the conversion cycle by making one conversion direction explicit, which is what the standard library chose to do with c_str().
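To see how the explicit c_str() breaks the cycle in real C++, here is a minimal sketch. The two f() overloads and the call_mixed/call_explicit helpers are made up for illustration and return a label so you can tell which overload was picked.

```cpp
#include <string>

// Two hypothetical overloads, mirroring f() from the answer above.
std::string f(const std::string&, const std::string&) { return "std::string overload"; }
std::string f(const char*, const char*) { return "C string overload"; }

// Mixed arguments: only const char* -> std::string is implicit, so the
// std::string overload is the only viable one and the call is unambiguous.
std::string call_mixed()
{
    const char* cstr = "abc";
    std::string str = "def";
    return f(cstr, str);
}

// Asking for the C string explicitly selects the other overload.
std::string call_explicit()
{
    const char* cstr = "abc";
    std::string str = "def";
    return f(cstr, str.c_str());
}
```

With only one implicit direction, the caller decides which overload runs by writing (or not writing) c_str().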

There are several places where the C++ standard committee did not really optimize the interaction between facilities in the standard library.
std::string and its use in the library is one of these.
One other example is std::swap. Many containers have a swap member function, but no overload of std::swap is supplied. The same goes for std::sort.
I hope all these small things will be fixed in the upcoming standard.

Maybe it's a consolation: all fstreams have gotten an open(string const &, ...) next to the open(char const *, ...) in the working draft of the C++0x standard.
(see e.g. 27.8.1.6 for the basic_ifstream declaration)
So when it gets finalised and implemented, it won't get you anymore :)
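For reference, this is roughly what it looks like once you compile as C++11 or later. It's only a sketch: round_trip is a made-up helper name, and it assumes the working directory is writable.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Writes one line to `filename`, reads it back, then deletes the file.
// Both streams receive the std::string directly (C++11 and later).
std::string round_trip(const std::string& filename, const std::string& text)
{
    {
        std::ofstream out(filename);   // constructor taking std::string
        out << text << '\n';
    }                                  // out is closed here

    std::string line;
    {
        std::ifstream in;
        in.open(filename);             // open() taking std::string
        std::getline(in, line);
    }                                  // in is closed here

    std::remove(filename.c_str());     // the C API still wants a C string
    return line;
}
```

Note that std::remove comes from the C library, so it still needs the c_str() call.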

The stream IO library was added to the standard C++ library before the STL. In order not to break backward compatibility, it was decided to avoid modifying the IO library when the STL was added, even if that meant some issues like the one you raise.

@Bernard:
Monoliths "Unstrung." "All for one, and one for all" may work for Musketeers, but it doesn't work nearly as well for class designers. Here's an example that is not altogether exemplary, and it illustrates just how badly you can go wrong when design turns into overdesign. The example is, unfortunately, taken from a standard library near you...
~ http://www.gotw.ca/gotw/084.htm

It is inconsequential, that is true. What do you mean by std::string's interface being large? What does large mean, in this context - lots of method calls? I'm not being facetious, I am actually interested.
It has more methods than it really needs, and its behaviour of using integral offsets rather than iterators is a bit iffy (as it's contrary to the way the rest of the library works).
The real issue I think is that the C++ library has three parts; it has the old C library, it has the STL, and it has strings-and-iostreams. Though some efforts were made to bridge the different parts (e.g. the addition of overloads to the C library, because C++ supports overloading; the addition of iterators to basic_string; the addition of the iostream iterator adaptors), there are a lot of inconsistencies when you look at the detail.
For example, basic_string includes methods that are unnecessary duplicates of standard algorithms; the various find methods could probably be safely removed. Another example: locales use raw pointers instead of iterators.

C++ grew up on smaller machines than the monsters we write code for today. Back when iostream was new many developers really cared about code size (they had to fit their entire program and data into several hundred KB). Therefore, many didn't want to pull in the "big" C++ string library. Many didn't even use the iostream library for the same reasons, code size.
We didn't have thousands of megabytes of RAM to throw around like we do today. We usually didn't have function level linking so we were at the mercy of the developer of the library to use a lot of separate object files or else pull in tons of uncalled code. All of this FUD made developers steer away from std::string.
Back then I avoided std::string too. "Too bloated", "called malloc too often", etc. Foolishly using stack-based buffers for strings, then adding all kinds of tedious code to make sure it doesn't overrun.

Is there any class in the STL that takes a string? I don't think so (I couldn't find any in my quick search). So it's probably a design decision that no class in the STL should depend on any other STL class (unless it is directly needed for functionality).

I believe that this has been thought about and was done to avoid the dependency; i.e. #include <fstream> should not force one to #include <string>.
To be honest, this seems like quite an inconsequential issue. A better question would be, why is std::string's interface so large?

Nowadays you can solve this problem very easily: add -std=c++11 to your CXXFLAGS (the C++ counterpart of CFLAGS).

Related

Why don't C++03 file streams accept string constructor parameters?

Why does the following code compile in C++11 and does not in C++03? (both gcc and cl)
#include <string>
#include <iostream>
#include <fstream>
int main(int argc, char* argv[]) {
    const std::string t("Hello");
    std::ofstream out(t);
}
Why don't the C++03 streams accept std::string as the constructor parameter? Was this decision based on something or did it happen accidentally?
The code fails when compiled with a strictly conforming C++03 compiler because the constructor that takes a std::string was only added in C++11.
As to the question, "was it based on something smart", as the interface was added, it can be inferred that there was no technical reason for it to be omitted.
It's an addition of convenience as, if you have a std::string, you can always call .c_str() to get a C string suitable for use with the old interface. (As the documentation in C++11 says, the constructors that take std::string have exactly the same effect as calling the corresponding constructor which takes a const char* with the result of calling .c_str() on the string.)
As I recall, this was discussed on c.l.c++.m some years ago, and Andrew Koenig (I think it was Andrew, anyway) said it was actually brought up during some meetings, but the idea of accepting a string was quickly conflated with the idea of accepting a wstring as well, and from there turned into a discussion about support for internationalized character sets in file names, and ... shortly after that the whole idea was dropped because it had opened a big can of worms nobody was prepared to deal with right then.
They had simply forgotten about adding the string constructor in C++03. Now that's fixed. This time round other things were forgotten, like make_unique. There's always something more that one could have done. C++03 also forgot to specify default arguments for function templates, which are now included.
Edit: As @Charles says, it may not be a literal "forgetting", but rather, it's something that clearly should be there, but just hadn't been specified for some reason or another. Further examples are given by std::next/std::prev, which are a great relief, and std::to_string and std::stoi/d/ul/ull, which again make perfect sense, but nobody had gotten around to specifying them until this time round. There isn't necessarily a deep reason for their previous absence.
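In case those additions are unfamiliar, here is a short sketch of them in use (C++11 or later). The three wrapper functions are made-up names, and the list helpers assume the list is non-empty (second_element needs at least two elements).

```cpp
#include <iterator>
#include <list>
#include <string>

// Number -> string -> number without a stringstream.
int round_trip_int(int n)
{
    const std::string s = std::to_string(n);
    return std::stoi(s);
}

// Second element of a list: std::next works on iterators that lack operator+.
// Precondition: values holds at least two elements.
int second_element(const std::list<int>& values)
{
    return *std::next(values.begin());
}

// Last element via std::prev. Precondition: values is not empty.
int last_element(const std::list<int>& values)
{
    return *std::prev(values.end());
}
```

None of this was impossible in C++03, it just took more typing (a stringstream, or copying an iterator and incrementing it by hand).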

C-like procedures in C++?

Does correct C++ programming style demand writing all your code with classes, or are C-like procedures allowed? If I were to give some code to someone else, would it be accepted as C++ just because it has std::vector and std::string (instead of char *) inside, or does everything have to be a class?
eg:
int number = 204;
std::string result = my_procedure(number);
OR
MyClass machine;
std::string result = machine.get(number);
Are there cases where the programmer has to, or is allowed to, have C-like procedures in some of their source code? Did you ever have to do something like that?
In the context of this question where does the margin between C and C++ exist (if any)?
I hope my question is clear and in line with the rules.
It's certainly OK to have free functions in your code -- this is a matter of architecture, not of "++ness". For small programs it doesn't even make sense to go all-in with classes, as OO is really a tool to manage complexity. If the complexity isn't there to begin with, why bother?
Your second question, where is the line drawn, doesn't have a short answer. The obvious one is that the line is drawn in all places where the C standard differs from the one for C++. But if you are looking for a list of high-level language features that C++ has and C does not, here are some of them:
Class types and OO (of course)
The STL
Function/operator overloading
References
Templates
new/delete to manage memory
C++ is a multi-paradigm language, where OO, procedural, generic/generative and - to a lesser (but increasing with C++0x) extent functional - are among the paradigms. You should use whichever is the best fit for the problem: you want the code to be easy to get and keep right, and hard to stuff up.
The utility of classes is in packaging data (state) along with the related functions. If your wordify function doesn't need to retain any state between calls, then there's no need to use a class/object. That said, if you can predict that you will soon want to have state, then it may be useful to start with a class so that the client code doesn't need to change as much.
For example, imagine adding a parameter to the function to specify whether the output should be "first", "second" instead of "one", "two". You want the behaviour to be set once and remembered, but somewhere else in the application some other code may also use the functionality but prefer the other setting. It's a good idea to use an object to hold the state and arrange it so each object's lifetime and accessibility aligns with the code that will use it.
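A rough sketch of what that could look like follows. The Wordifier class and its Style enum are invented for illustration, and only 1 to 3 are handled to keep it short.

```cpp
#include <string>

// Hypothetical example: the "one" vs "first" setting is stored once per object.
class Wordifier
{
public:
    enum Style { Cardinal, Ordinal };

    explicit Wordifier(Style style) : style_(style) {}

    // Only 1..3 are handled, to keep the sketch short; anything else gives "?".
    std::string wordify(int n) const
    {
        static const char* const cardinal[] = { "one", "two", "three" };
        static const char* const ordinal[]  = { "first", "second", "third" };
        if (n < 1 || n > 3)
            return "?";
        return style_ == Ordinal ? ordinal[n - 1] : cardinal[n - 1];
    }

private:
    Style style_;   // set at construction, remembered for every call
};
```

Two parts of the application can each hold their own Wordifier with a different style, and neither has to pass the setting on every call.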
EDIT:
In the context of this question where does the margin between C and C++ exist (if any)?
C++ just gives you a richer set of ways to tackle your programming tasks, each with their necessary pros and cons. There are plenty of times when the best way is still the same way it would have been done in C. It would be perverse for a C++ programmer to choose a worse way simply because it was only possible in C++. Still, such choices exist at myriad levels, so it's common to have say a non-[class-]member function that takes a const std::string& parameter, combining the procedural function call with object-oriented data that's been generated by a template: it all works well together.
C++ allows a variety of programming styles, procedural code being one of them.
Which style to use depends on the problem you are trying to solve. The margin between C and C++ is whether you are compiling your code with a C++ compiler.
I do at times use procedural functions in my code. Sometimes it best solves the problem.
C++ code can still be valid C++ code even without classes. Classes are more of a feature, and are not required in every piece of code.
C++ is basically C with more features, so there isn't really a "margin" between the two languages.
If you read Stroustrup's Design and Evolution, you'll see that C++ was intended to support multiple programming styles. Use whichever one is most appropriate to the problem (which is not the same as always just using the one you know).
In legacy real-world applications, there is often very little distinction. Some C++ code was originally C code and was then recompiled. Slowly it migrates to use C++ features to improve its quality.
In short, Yes, C++ code can be procedural. But you'll find it does differ from C code if you use C++ features where appropriate.
What is good practice needs to consider things like encapsulation, testability, and the comprehensibility of the client API.
#include <sstream>
#include <string>
#include <iostream>
using namespace std;
string wordify(int n)
{
    stringstream ss;
    ss << n; // put the integer into the stream
    return ss.str(); // return the string
}
int main()
{
    string s1 = wordify(42);
    string s2 = wordify(45678);
    string s3 = wordify(-99);
    cout << s1 << ' ' << s2 << ' ' << s3 << '\n';
}

Why isn't main defined `main(std::vector<std::string> args)`?

This question is only half tongue-in-cheek. I sometimes dream of a world without naked arrays or c strings.
If you're using c++, shouldn't the preferred definition of main be something like:
int main(std::vector<std::string> args)
?
There are already multiple definitions of main to choose from, why isn't there a version that is in the spirit of C++?
Because C++ was designed to be (almost) backwards compatible with C code.
There are cases where C code will break in a C++ compiler, but they're fairly rare, and there's generally a good reason for why this breakage is required.
But changing the signature of main, while convenient for us, isn't necessary. For someone porting code from C, it'd just be another thing you had to change, for no particular gain.
Another reason is that std::vector is a library, not a part of the core language. And so, you'd have to #include <vector> in every C++ program.
And of course, in its early years, C++ didn't have a vector. So when the vector was added to the language, sure, they could have changed the signature of main, but then they'd break not just C code, but also every existing C++ program.
Is it worth it?
There's another reason besides compatibility with C. In C++, the standard library is meant to be entirely optional. There's nothing about the C++ language itself that forces you to use things from the standard library like std::string and std::vector, and that is entirely by design. In fact, it is by design that you should be able to use some parts of the standard library without having to use others (although this has led to some generally annoying things like std::ifstream and std::ofstream operating on const char* C-style strings rather than on std::string objects).
The theory is that you are supposed to be able to take the C++ language and use whatever library of objects, containers, etc, that you want with it, be it the standard library or some proprietary library (e.g. Qt, MFC), or something that you created yourself. Defining main to accept an argument composed of types defined in the standard library defeats this design goal.
Because it will force you to include <vector> and <string>.
A concern that keeps coming back to my mind is that once you allow complex types, you end up with the risk of exceptions being thrown in the type's constructor. And, as the language is currently designed, there's absolutely no way for such an exception to be caught. If it were decided that such exceptions should be caught, then that would require considerably more work, both for the committee and compiler writers, making it all somewhat more troublesome than simply saying "allow std::vector<std::string>".
There might be other issues as well. The whole "incompatible with runtimes" seems like something of a red herring to me, given that you can provide basically the same functionality now with macros. But something like this is rather more involved.
Like @jalf, I sometimes find myself writing
int main(int argc, char** argv) {
    std::vector<std::string> args(argv, argv+argc);
    // ... use args from here on
}
But yes, like everyone said, main has to be C-compatible. I see it as an interface to the OS runtime, which (at least in the systems I use) is written in C.
That said, some development environments encourage replacements such as wmain or _tmain. You could write your own compiler/IDE, which would encourage the use of int vmain(const std::vector<std::string>& args).
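You don't need your own compiler to get most of the way there, though. A minimal sketch, assuming you are happy with a one-line forwarding main: vmain is the name suggested above, to_args is a made-up helper, and the body of vmain is just placeholder logic.

```cpp
#include <string>
#include <vector>

// Hypothetical "C++-style" entry point; the real program logic would live here.
int vmain(const std::vector<std::string>& args)
{
    // Placeholder logic: report how many arguments follow the program name.
    return args.empty() ? 0 : static_cast<int>(args.size()) - 1;
}

// Copies the C-style argument array into a vector of strings.
std::vector<std::string> to_args(int argc, const char* const* argv)
{
    return std::vector<std::string>(argv, argv + argc);
}
```

The real main then shrinks to a single statement, return vmain(to_args(argc, argv));, and everything else sees only std::vector<std::string>.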
Because C++ was in existence long before the C++ standard was, and built heavily on C. And, like the original ANSI C standard, codifying existing practice was an important part of it.
There's no point in changing something that works, especially if it will break a whole lot of existing code.
Even ISO C, which has been through quite a few iterations, still takes backwards compatibility very seriously.
Basically, to remain compatible with C. If we were to give up that, main() would be moved into a class.
The multiple definitions of main() aren't really multiple definitions. There are three:
int main(void) (C99)
int main(int argc, char *argv[]) (C99)
int main(int argc, char *argv[], char *envp[]) (POSIX, I think)
But in POSIX, you only really get the third. The fact that you can call a function with extra arguments is down to the C calling convention.
You can't have extern "C" int main(std::vector<std::string> argv) unless the memory layout happens to be magically compatible in a portable way. The runtime will call main() with the wrong arguments and fail. There's no easy way around this.
Instead, provided main() wasn't extern "C", the runtime could try the various supported symbols in order until it found one. I imagine main() is extern "C" by default, and that you can't overload extern "C" functions.
For more fun, void main(void).
I'll try to explain in the best possible sentence.
C++ was designed to be backward compatible with C and std::vector was included in a library that only got included in C++.
Also, C++ and C programs were designed to run in shells or command lines (Windows, Linux, Mac), and the OS passes arguments to a program as an array of strings. How would an OS really translate vectors?
That's the main reason I can think of; feel free to criticize it.

Why use c strings in c++?

Is there any good reason to use C-strings in C++ nowadays? My textbook uses them in examples at some points, and I really feel like it would be easier just to use a std::string.
The only reasons I've had to use them is when interfacing with 3rd party libraries that use C style strings. There might also be esoteric situations where you would use C style strings for performance reasons, but more often than not, using methods on C++ strings is probably faster due to inlining and specialization, etc.
You can use the c_str() method in many cases when working with those sorts of APIs, but you should be aware that the pointer returned is a const char *, and you should not modify the string via that pointer. In those sorts of situations, you can still use a vector<char> instead, and at least get the benefit of easier memory management.
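The vector<char> approach can be sketched like this. c_api_upper is a made-up stand-in for a third-party C function that writes into the buffer you hand it, and to_upper_via_c_api is likewise just an illustrative name.

```cpp
#include <string>
#include <vector>

// Stand-in for a C API that modifies a caller-supplied buffer (hypothetical).
void c_api_upper(char* buf)
{
    for (; *buf != '\0'; ++buf)
        if (*buf >= 'a' && *buf <= 'z')
            *buf = static_cast<char>(*buf - 'a' + 'A');
}

std::string to_upper_via_c_api(const std::string& s)
{
    // Copy the characters plus the terminating NUL into a writable buffer.
    std::vector<char> buf(s.c_str(), s.c_str() + s.size() + 1);
    c_api_upper(&buf[0]);           // safe to modify: the vector owns this memory
    return std::string(&buf[0]);    // copy back everything up to the NUL
}
```

The vector frees its memory on every exit path, which is the "easier memory management" part.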
A couple more memory control notes:
C strings are POD types, so they can be allocated in your application's read-only data segment. If you declare and define std::string constants at namespace scope, the compiler will generate additional code that runs before main() that calls the std::string constructor for each constant. If your application has many constant strings (e.g. if you have generated C++ code that uses constant strings), C strings may be preferable in this situation.
Some implementations of std::string support a feature called SSO ("short string optimization" or "small string optimization") where the std::string class contains storage for strings up to a certain length. This increases the size of std::string but often significantly reduces the frequency of free-store allocations/deallocations, improving performance. If your implementation of std::string does not support SSO, then constructing an empty std::string on the stack will still perform a free-store allocation. If that is the case, using temporary stack-allocated C strings may be helpful for performance-critical code that uses strings. Of course, you have to be careful not to shoot yourself in the foot when you do this.
Because that's how they come from numerous API/libraries?
Let's say you have some string constants in your code, which is a pretty common need. It's better to define these as C strings than as C++ objects -- more lightweight, portable, etc. Now, if you're going to be passing these strings to various functions, it's nice if these functions accept a C string instead of requiring a C++ string object.
Of course, if the strings are mutable, then it's much more convenient to use C++ string objects.
If a function needs a constant string I still prefer to use 'const char*' (or const wchar_t*) even if the program uses std::string, CString, EString or whatever elsewhere.
There are just too many sources of strings in a large code base to be sure the caller will have the string as a std::string and 'const char*' is the lowest common denominator.
Textbooks feature old-school C strings because many basic functions still expect them as arguments, or return them. Additionally, it gives some insight into the underlying structure of the string in memory.
Memory control. I recently had to handle strings (actually blobs from a database) about 200-300 MB in size, in a massively multithreaded application. It was a situation where just-one-more copy of the string might have burst the 32bit address space. I had to know exactly how many copies of the string existed. Although I'm an STL evangelist, I used char * then because it gave me the guarantee that no extra memory or even extra copy was allocated. I knew exactly how much space it would need.
Apart from that, standard STL string processing misses out on some great C functions for string processing/parsing. Thankfully, std::string has the c_str() method for const access to the internal buffer. To use printf() you still have to use char * though (what a crazy idea of the C++ team not to include (s)printf-like functionality, one of the most useful functions EVER in C). I hope boost::format will soon be included in the STL.
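In the meantime, printf-style formatting into a std::string can be approximated with std::snprintf (C++11). This is only a sketch: format_count is a made-up helper with a fixed format, not a general-purpose replacement for boost::format.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats "<name>: <count>" with printf-style codes.
std::string format_count(const std::string& name, int count)
{
    // First call only measures; snprintf returns the length without the NUL.
    const int len = std::snprintf(nullptr, 0, "%s: %d", name.c_str(), count);
    if (len < 0)
        return std::string();       // formatting error

    // Second call writes into a buffer that is exactly large enough.
    std::vector<char> buf(static_cast<std::size_t>(len) + 1);
    std::snprintf(buf.data(), buf.size(), "%s: %d", name.c_str(), count);
    return std::string(buf.data(), static_cast<std::size_t>(len));
}
```

Measuring first avoids the fixed-size stack buffer and the overrun risk that usually comes with sprintf.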
If the C++ code is "deep" (close to the kernel, heavily dependent on C libraries, etc.) you may want to use C strings explicitly to avoid lots of conversions in to and out of std::string. Or, if you're interfacing with other language domains (Python, Ruby, etc.) you might do so for the same reason. Otherwise, use std::string.
Some posts mention memory concerns. That might be a good reason to shun std::string, but char* probably is not the best replacement. It's still an OO language. Your own string class is probably better than a char*. It may even be more efficient - you can apply the Small String Optimization, for instance.
In my case, I was trying to get about 1GB worth of strings out of a 2GB file, stuff them in records with about 60 fields and then sort them 7 times on different fields. My predecessor's code took 25 hours with char*, my code ran in 1 hour.
1) "string constant" is a C string (const char *), converting it to const std::string& is a run-time process, not necessarily simple or optimized.
2) fstream library uses c-style strings to pass file names.
My rule of thumb is to pass const std::string& if I am about to use the data as std::string anyway (say, when I store them in a vector), and const char * in other cases.
After spending far, far, too much time debugging initialization rules and every conceivable string implementation on several platforms we require static strings to be const char*.
After spending far, far, too much time debugging bad char* code and memory leaks I suggest that all non-static strings be some type of string object ... until profiling shows that you can and should do something better ;-)
Legacy code that doesn't know of std::string. Also, before C++11 opening files with std::ifstream or std::ofstream was only possible with const char* as an input to the file name.
Given the choice, there is generally no reason to choose primitive C strings (char*) over C++ strings (std::string). However, often you don't have the luxury of choice. For instance, std::fstream's constructors take C strings, for historical reasons. Also, C libraries (you guessed it!) use C strings.
In your own C++ code it is best to use std::string and extract the object's C string as needed by using the c_str() function of std::string.
It depends on the libraries you're using. For example, when working with the MFC, it's often easier to use CString when working with various parts of the Windows API. It also seems to perform better than std::string in Win32 applications.
However, std::string is part of the C++ standard, so if you want better portability, go with std::string.
For applications such as most embedded platforms where you do not have the luxury of a heap to store the strings being manipulated, and where deterministic preallocation of string buffers is required.
c strings don't carry the overhead of being a class.
c strings generally can result in faster code, as they are closer to the machine level
This is not to say, you can't write bad code with them. They can be misused, like every other construct.
There is a wealth of library calls that demand them for historical reasons.
Learn to use c strings, and stl strings, and use each when it makes sense to do so.
STL strings are certainly far easier to use, and I don't see any reason to not use them.
If you need to interact with a library that only takes C-style strings as arguments, you can always call the c_str() method of the string class.
The usual reason to do it is that you enjoy writing buffer overflows in your string handling. Counted strings are so superior to terminated strings it's hard to see why the C designers ever used terminated strings. It was a bad decision then; it's a bad decision now.
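The difference between counted and terminated strings is easy to demonstrate. In this sketch (the three function names are made up) the same five bytes have two different "lengths" depending on who is asked.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// A std::string built from 5 bytes, one of which is an embedded NUL.
std::string make_counted()
{
    return std::string("ab\0cd", 5);
}

// Length as std::string sees it: the stored count.
std::size_t counted_length(const std::string& s)
{
    return s.size();
}

// Length as C functions see it: everything up to the first NUL.
std::size_t terminated_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

A counted string knows its size without scanning, and it can hold any byte value, including NUL.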

How do you handle strings in C++?

Which is your favorite way to go with strings in C++? A C-style array of chars? Or wchar_t? CString, std::basic_string, std::string, BSTR or CComBSTR?
Certainly each of these has its own area of application, but anyway, which is your favorite and why?
std::string or std::wstring, depending on your needs. Why?
They're standard
They're portable
They can handle I18N
They have performance guarantees (as per the standard)
Protected against buffer overflows and similar attacks
Are easily converted to other types as needed
Are nicely templated, giving you a wide variety of options while reducing code bloat and improving performance. Really. Compilers that can't handle templates are long gone now.
A C-style array of chars is just asking for trouble. You'll still need to deal with them on occasion (and that's what std::string.c_str() is for), but, honestly -- one of the biggest dangers in C is programmers doing Bad Things with char* and winding up with buffer overflows. Just don't do it.
An array of wchar_t is the same thing, just bigger.
CString, BSTR, and CComBSTR are not standard and not portable. Avoid them unless absolutely forced. Optimally, just convert a std::string/std::wstring to them when needed, which shouldn't be very expensive.
Note that std::string is just a typedef for std::basic_string<char>, but you're still better off using std::string unless you have a really good reason not to. Really Good. Let the compiler take care of the optimization in this situation.
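For what it's worth, the relationship between std::string and std::basic_string can be checked at compile time: std::string is a typedef for std::basic_string<char>. A small sketch (C++11 for static_assert; length_of is a made-up function):

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// std::string and std::wstring are typedefs for basic_string specializations.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is basic_string<wchar_t>");

// So one function template can serve both character types.
template <typename CharT>
std::size_t length_of(const std::basic_string<CharT>& s)
{
    return s.size();
}
```

Writing against basic_string<CharT> is mostly useful in generic code; for everyday use the typedefs are all you need.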
std::string !!
There's a reason why they call it a "Standard".
basic_string is an implementation detail and should be ignored.
BSTR & CComBSTR only for interOp with COM, and only for the moment of interop.
std::string unless I need to call an API that specifically takes one of the others that you listed.
Here's an article comparing the most common kinds of strings in C++ and how to convert between them. Unraveling Strings in Visual C++
If you can use MFC, use CString. Otherwise use std::string. Plus, std::string works on any platform that supports standard C++.
When I have a choice (I usually don't), I tend to use std::string with UTF-8 encoding (and the help of the UTF8 CPP library). Not that I like std::string that much, but at least it is standard and portable.
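The catch with UTF-8 in a std::string is that size() counts bytes, not characters. Here is a hand-rolled sketch of counting code points; utf8_length is a made-up function, it is not the UTF8 CPP API, and it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 encoded std::string by skipping
// continuation bytes (those of the form 10xxxxxx). Assumes valid UTF-8.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0) != 0x80)   // not a continuation byte: starts a code point
            ++count;
    }
    return count;
}
```

For "caf\xC3\xA9" (the word café, with é encoded as two bytes) size() is 5 while utf8_length is 4. A real library also validates the input, which this sketch does not.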
Unfortunately, in almost all real-life projects I've worked on, there have been internal string classes - most of them actually better than std::string, but still...
I am a Qt dev, so of course I tend to use QString whenever possible :).
It's quite nice: unicode compliant, thread-safe implicit-sharing (aka copy-on-write), and it comes with an API designed to solve practical real-world problems (split, join, replace (with and without regex), conversion to/from numbers...)
If I can't use QString, then std::wstring. If you are stuck with C, I recommend glib GString.
I use std::string (or basic_string<TCHAR>) whenever I can. It's quite versatile (just like CStringT), it's type-safe (unlike printf), and it's available on every platform.
Other, std::wstring.
std::string is 20th century technology. Use Unicode, and sell to 6 billion people instead of 300 million.
C-style char arrays have their place, but if you use them extensively you are asking to waste time debugging off-by-one errors. We have our own string class tailored for use in our (embedded) development environment.
We don't use std::string because it isn't always available for us.
If you're using MFC, use CString. Otherwise I agree with most of the others, std::string or std::wstring all the way.
Microsoft could have done the world a huge favor by adding std::basic_string<TCHAR> overloads in their latest update of MFC.
I like to use TCHAR, which is a define for wchar_t or char according to the project's settings.
It's defined in tchar.h where you can find all of the related definitions for functions and types you need.
std::string and std::wstring if I can, and something else if I have to.
They may not be perfect, but they are well tested, well understood, and very versatile. They play nicely with the rest of the standard library which is also a huge bonus.
Also worth mentioning, stringstreams.
std::string is better than nothing, but it's annoying that it's missing basic functionality like split, join and even a decent format call...
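Until something like that is standard, split and join are short enough to write yourself with stringstreams. A minimal sketch, with made-up function names and a single-character delimiter for split:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits text on a single-character delimiter using std::getline.
std::vector<std::string> split(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::istringstream in(text);
    std::string piece;
    while (std::getline(in, piece, delim))
        parts.push_back(piece);
    return parts;
}

// Joins the parts with sep between consecutive elements.
std::string join(const std::vector<std::string>& parts, const std::string& sep)
{
    std::ostringstream out;
    for (std::size_t i = 0; i < parts.size(); ++i)
    {
        if (i != 0)
            out << sep;
        out << parts[i];
    }
    return out.str();
}
```

Empty fields in the middle are kept, but with this getline-based approach a trailing delimiter does not produce a trailing empty field, so adjust if you need that.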
Unicode is the future. Do not use char* and std::string. Please :)
I am tired of localization bugs.