Why does modern C++ still retain the old C style prototype for main with int argc, char** argv - c++

Other than backward compatibility, is there any specific reason for not changing the prototype and benefit from modern features of C++?
Modern C++ encourages the use of value semantics.
Why should argv still be a char array with pointer semantics, which could possible lead to issues if not properly handled?
In Java we have the class with, void main(String[] args).
Am I missing anything fundamental?

Other than backward compatibility
This is the reason and considered most important in moving forwards with C++.
But there is a proposal: A Modern C++ Signature for main

Many systems contain code written in languages other than C++. While it would be possible to have the main entry-point function in user code accept C++ string parameters and then have an implementation-supplied wrapper which translates the underlying platform's arguments into C++ strings and then passes them to the user-code entry point, this would only be advantageous in cases where the underlying environment represented strings in some fashion other than zero-terminated sequences of characters, and where the wrapper would be able to avoid having to first translate to C-style strings before translating into C++ strings. Otherwise, anything that could be done by the wrapper could be done just as well by user code.

Related

C vs C++ file handling

I have been working in C and C++ and when it comes to file handling I get confused. Let me state the things I know.
In C, we use functions:
fopen, fclose, fwrite, fread, ftell, fseek, fprintf, fscanf, feof, fileno, fgets, fputs, fgetc, fputc.
FILE *fp for file pointer.
Modes like r, w, a
I know when to use these functions (Hope I didn't miss anything important).
In C++, we use functions / operators:
fstream f
f.open, f.close, f>>, f<<, f.seekg, f.seekp, f.tellg, f.tellp, f.read, f.write, f.eof.
Modes like ios::in, ios::out, ios::bin , etc...
So is it possible (recommended) to use C compatible file operations in C++?
Which is more widely used and why?
Is there anything other than these that I should be aware of?
Sometimes there's existing code expecting one or the other that you need to interact with, which can affect your choice, but in general the C++ versions wouldn't have been introduced if there weren't issues with the C versions that they could fix. Improvements include:
RAII semantics, which means e.g. fstreams close the files they manage when they leave scope
modal ability to throw exceptions when errors occur, which can make for cleaner code focused on the typical/successful processing (see http://en.cppreference.com/w/cpp/io/basic_ios/exceptions for API function and example)
type safety, such that how input and output is performed is implicitly selected using the variable type involved
C-style I/O has potential for crashes: e.g. int my_int = 32; printf("%s", my_int);, where %s tells printf to expect a pointer to an ASCIIZ character buffer but my_int appears instead; firstly, the argument passing convention may mean ints are passed differently to const char*s, secondly sizeof int may not equal sizeof const char*, and finally, even if printf extracts 32 as a const char* at best it will just print random garbage from memory address 32 onwards until it coincidentally hits a NUL character - far more likely the process will lack permissions to read some of that memory and the program will crash. Modern C compilers can sometimes validate the format string against the provided arguments, reducing this risk.
extensibility for user-defined types (i.e. you can teach streams how to handle your own classes)
support for dynamically sizing receiving strings based on the actual input, whereas the C functions tend to need hard-coded maximum buffer sizes and loops in user code to assemble arbitrary sized input
Streams are also sometimes criticised for:
verbosity of formatting, particularly "io manipulators" setting width, precision, base, padding, compared to the printf-style format strings
a sometimes confusing mix of manipulators that persist their settings across multiple I/O operations and others that are reset after each operation
lack of convenience class for RAII pushing/saving and later popping/restoring the manipulator state
being slow, as Ben Voigt comments and documents here
The performance differences between printf()/fwrite style I/O and C++ IO streams formatting are very much implementation dependent.
Some implementations (visual C++ for instance), build their IO streams on top of FILE * objects and this tends to increase the run-time complexity of their implementation. Note, however, that there was no particular constraint to implement the library in this fashion.
In my own opinion, the benefits of C++ I/O are as follows:
Type safety.
Flexibility of implementation. Code can be written to do specific formatting or input to or from a generic ostream or istream object. The application can then invoke this code with any kind of derived stream object. If the code that I have written and tested against a file now needs to be applied to a socket, a serial port, or some other kind of internal stream, you can create a stream implementation specific to that kind of I/O. Extending the C style I/O in this fashion is not even close to possible.
Flexibility in locale settings: the C approach of using a single global locale is, in my opinion, seriously flawed. I have experienced cases where I invoked library code (a DLL) that changed the global locale settings underneath my code and completely messed up my output. A C++ stream allows you to imbue() any locale to a stream object.
An interesting critical comparison can be found here.
C++ FQA io
Not exactly polite, but makes to think...
Disclaimer
The C++ FQA (that is a critical response to the C++ FAQ) is often considered by the C++ community a "stupid joke issued by a silly guy the even don't understand what C++ is or wants to be"(cit. from the FQA itself).
These kind of argumentation are often used to flame (or escape from) religion battles between C++ believers, Others languages believers or language atheists each in his own humble opinion convinced to be in something superior to the other.
I'm not interested in such battles, I just like to stimulate critical reasoning about the pros and cons argumentation. The C++ FQA -in this sens- has the advantage to place both the FQA and the FAQ one over the other, allowing an immediate comparison. And that the only reason why I referenced it.
Following TonyD comments, below (tanks for them, I makes me clear my intention need a clarification...), it must be noted that the OP is not just discussing the << and >> (I just talk about them in my comments just for brevity) but the entire function-set that makes up the I/O model of C and C++.
With this idea in mind, think also to other "imperative" languages (Java, Python, D ...) and you'll see they are all more conformant to the C model than C++. Sometimes making it even type safe (what the C model is not, and that's its major drawback).
What my point is all about
At the time C++ came along as mainstream (1996 or so) the <iostream.h> library (note the ".h": pre-ISO) was in a language where templates where not yet fully available, and, essentially, no type-safe support for varadic functions (we have to wait until C++11 to get them), but with type-safe overloaded functions.
The idea of oveloading << retuning it's first parameter over and over is -in fact- a way to chain a variable set of arguments using only a binary function, that can be overload in a type-safe manner. That idea extends to whatever "state management function" (like width() or precision()) through manipulators (like setw) appear as a natural consequence. This points -despite of what you may thing to the FQA author- are real facts. And is also a matter of fact that FQA is the only site I found that talks about it.
That said, years later, when the D language was designed starting offering varadic templates, the writef function was added in the D standard library providing a printf-like syntax, but also being perfectly type-safe. (see here)
Nowadays C++11 also have varadic templates ... so the same approach can be putted in place just in the same way.
Moral of the story
Both C++ and C io models appear "outdated" respect to a modern programming style.
C retain speed, C++ type safety and a "more flexible abstraction for localization" (but I wonder how many C++ programmers are in the world that are aware of locales and facets...) at a runtime-cost (jut track with a debugger the << of a number, going through stream, buffer locale and facet ... and all the related virtual functions!).
The C model, is also easily extensible to parametric messages (the one the order of the parameters depends on the localization of the text they are in) with format strings like
#1%d #2%i allowing scrpting like "text #2%i text #1%d ..."
The C++ model has no concept of "format string": the parameter order is fixed and itermixed with the text.
But C++11 varadic templates can be used to provide a support that:
can offer both compile-time and run-time locale selection
can offer both compile-time and run-time parametric order
can offer compile-time parameter type safety
... all using a simple format string methodology.
Is it time to standardize a new C++ i/o model ?

Length of a C string: std::strlen() vs. std::char_traits<char>::length()

Both are equivalent in that they return the length of the null-terminated character sequence. Are there reasons of preferring one to the other?
Use the simpler alternative. std::char_traits::length is great and all, but for C strings it does the same and is much longer code.
Do yourself a favour and avoid code bloat. I’m a huge fan of C++ functions over C equivalent (e.g. I will never use std::strcpy or std::memcpy, there’s a perfectly fine std::copy). But avoiding std::strlen is just silly.
One reason to use C++ functions exclusively is interface uniformity: for instance, both std::strcpy and std::memcpy have atrocious interfaces. However, std::strlen is a perfectly fine algorithm in the best tradition of C++. It doesn’t generalise, true, but the neither do other class-specific free functions found in the standard library.
std::strlen() is a holdover from the C Standard Library and only operates on a const char* (it is unsafe in that it has undefined behavior if the string is not null terminated). If the string is using a wide character set (e.g. const unsigned short*), std::strlen() is useless.
std::char_traits<T>::length() will operate on whatever the T type is (e.g. if it is an unsigned short, it will still operate properly, but also requires a null terminated value - that is the last value must be T(0) - if an array of T's passed to it is not null terminated, behavior is undefined as well).
In general, when dealing with strings, it is better to use std::string::length() instead of using C-style character strings.
Both Zack Howland and Konrad Rudolph have a point. Thanks. I accept both answers. The summarized reply would be:
There doesn't seem to be any except personal preference either for shorter code or the C++ standard library (I leave out generalization since it wasn't the point of the question as can be seen from the title).
std::strlen() is a C standard library compatibility (even though it is part of ISO C++) function that takes const char* as an argument. length() is a method of the std::string family of classes. So if you want to use strlen() on std::string you'd have to write:
strlen(mystring.c_str())
which is less tidy than mystr.length(). Apart from that, there should be no tangible difference (for char type, that is).

different programming language and calling convention

According to the wiki:
Different programming languages use different calling conventions, and
so can different platforms (CPU architecture + operating system). This
can sometimes cause problems when combining modules written in
multiple languages
So am I supposed to be careful when I call C/C++ functions (exported from .so/.dll) in Python?
If yes, what should I be careful of?
Calling between Python and C is a solved problem, so you generally won't have to worry about anything -- not least because Python is written in C.
The problem described there is more an issue when multiple languages on a platform are all developed independently, individually bootstrapped from assembler. For example, there used to famously be problems calling between C and FORTRAN, and between C and Pascal, at a time when all three of those languages coexisted as rough equals. The old Mac Toolbox was written mostly in assembler using Pascal calling conventions, and early application developers used Borland Pascal. But then C compilers like Symatec's THINK C appeared, and those programmers had to specifically worry about how to translate argument types and string conventions (Pascal strings carry a length byte at the beginning, and of course C strings have a 0 at the end.)
The calling convention is not an issue when calling C from python because the python interpreter itself is written in C, and thus uses the same calling convention as the C code.
It becomes an issue when combining code written in different compiled languages, for example C and Ada.
Usually C is the system programming language, and thus the standard. This means that a compiler for any language will generally have special support to inter-operate with C. See here for an example of interoperability between C and Ada. If there is no special support wrappers must be written at the assembly level.
If you need to call C++/Ada from python you'll have to follow a two steps process. The first step is to write a C wrapper around the C++/Ada functions. The second step is to call the C wrapper from python.
Take for example the following C++ class.
class Foo
{
public:
Foo ();
int bar ();
...
};
If you want to make it available to python, you first need to wrap it in C:
extern "C" {
typedef void *FooPtr;
FooPtr foo_new () { return (FooPtr)new Foo(); }
int foo_bar (FooPtr foo) { return ((Foo*)foo)->bar(); }
...
}
Then you can call that from python. (Note, in real life there are tools to automate this, like Boost.Python).
Note that there are two aspects of writing a wrapper: converting between calling conventions and converting between type representations. The later part is usually the most difficult because as soon as you go beyond basic types languages tend to have very different representations of equivalent types.
C and C++ have different calling conventions. C doesn't support calling C++ functions but C++ support calling C functions and supports the implementation of C functions in C++. That is, you can declare functions as extern "C" in C++. I don't know how the python bindings are calling C++ but you should be careful to call C and C++ functions with their respective calling conventions. Generally, C calling conventions are used when calling between different languages. This may mean that you might need to write a wrapper function using C style conventions for C++ style functions.
The calling conventions refers to how functions calls are generated in the resulting assembly. One convention is for the caller to be responsible for creating and cleaning up the stack space for the local variables. The other convention is for the called function to create and clean up the stack space for the local variables. It doesn't matter which one you use, but obviously they have to match or the stack will get corrupted.
You can see more details here: http://en.wikipedia.org/wiki/Calling_convention

Why isn't main defined `main(std::vector<std::string> args)`?

This question is only half tongue-in-cheek. I sometimes dream of a world without naked arrays or c strings.
If you're using c++, shouldn't the preferred definition of main be something like:
int main(std::vector<std::string> args)
?
There are already multiple definitions of main to choose from, why isn't there a version that is in the spirit of C++?
Because C++ was designed to be (almost) backwards compatible with C code.
There are cases where C code will break in a C++ compiler, but they're fairly rare, and there's generally a good reason for why this breakage is required.
But changing the signature of main, while convenient for us, isn't necessary. For someone porting code from C, it'd just be another thing you had to change, for no particular gain.
Another reason is that std::vector is a library, not a part of the core language. And so, you'd have to #include <vector> in every C++ program.
And of course, in its early years, C++ didn't have a vector. So when the vector was added to the language, sure, they could have changed the signature of main, but then they'd break not just C code, but also every existing C++ program.
Is it worth it?
There's another reason besides compatibility with C. In C++, the standard library is meant to be entirely optional. There's nothing about the C++ language itself that forces you to use things from the standard library like std::string and std::vector, and that is entirely by design. In fact, it is by design that you should be able to use some parts of the standard library without having to use others (although this has led to some generally annoying things like std::ifstream and std::ofstream operating on const char* C-style strings rather than on std::string objects).
The theory is that you are supposed to be able to take the C++ language and use whatever library of objects, containers, etc, that you want with it, be it the standard library or some proprietary library (e.g. Qt, MFC), or something that you created yourself. Defining main to accept an argument composed of types defined in the standard library defeats this design goal.
Because it will force you to include <vector> and <string>.
A concern that keeps coming back to my mind is that once you allow complex types, you end up with the risk of exceptions being thrown in the type's constructor. And, as the language is currently designed, there's absolutely no way for such an exception to be caught. If it were decided that such exceptions should be caught, then that would require considerably more work, both for the committee and compiler writers, making it all somewhat more troublesome than simply saying "allow std::vector<std::string>>".
There might be other issues as well. The whole "incompatible with runtimes" seems like something of a red herring to me, given that you can provide basically the same functionality now with macros. But something like this is rather more involved.
Like #jalf, I sometimes find myself writing
int main(int argc, char** argv) {
std::vector<std::string> args(argv, argv+argc);
But yes, like everyone said, main has to be C-compatible. I see it as an interface to the OS runtime, which is (at least int the systems I use) is written in C.
Although some development environment encourage replacements such as wmain or _tmain. You could write your own compiler/IDE, which would encourage the use of int vmain(const std::vector<std::string>& args).
Because C++ was in existence long before the C++ standard was, and built heavily on C. And, like the original ANSI C standard, codifying existing practice was an important part of it.
There's no point in changing something that works, especially if it will break a whole lot of existing code.
Even ISO C, which has been through quite a few iterations, still takes backwards compatibility very seriously.
Basically, to remain compatable with C. If we were to give up that, main() would be moved into a class.
The multiple definitions of main() aren't really multiple definitions. There are three:
int main(void) (C99)
int main(int argc, char *argv[]) (C99)
int main(int argc, char *argv[], char *envp[]) (POSIX, I think)
But in POSIX, you only really get the third. The fact that you can call a function with extra arguments is down to the C calling convention.
You can't have extern "C" int main(std::vector<std::string> argv) unless the memory layout happens to be magically compatible in a portable way. The runtime will call main() with the wrong arguments and fail. There's no easy way around this.
Instead, provided main() wasn't extern "C", the runtime could try the various supported symbols in order until it found one. I imagine main() is extern "C" by default, and that you can't overload extern "C" functions.
For more fun, void main(void).
I'll try explain in the best possible sentence.
C++ was designed to be backward compatible with C and std::vector was included in a library that only got included in C++.
Also, C++ and C programs were designed to run in shells or command lines (windows, linux, mac) and OS pass arguments to a program as an array of String. How would an OS really translate vectors?
That's the most reason I can think of, feel free to criticize it.

Why use c strings in c++?

Is there any good reason to use C-strings in C++ nowadays? My textbook uses them in examples at some points, and I really feel like it would be easier just to use a std::string.
The only reasons I've had to use them is when interfacing with 3rd party libraries that use C style strings. There might also be esoteric situations where you would use C style strings for performance reasons, but more often than not, using methods on C++ strings is probably faster due to inlining and specialization, etc.
You can use the c_str() method in many cases when working with those sort of APIs, but you should be aware that the char * returned is const, and you should not modify the string via that pointer. In those sort of situations, you can still use a vector<char> instead, and at least get the benefit of easier memory management.
A couple more memory control notes:
C strings are POD types, so they can be allocated in your application's read-only data segment. If you declare and define std::string constants at namespace scope, the compiler will generate additional code that runs before main() that calls the std::string constructor for each constant. If your application has many constant strings (e.g. if you have generated C++ code that uses constant strings), C strings may be preferable in this situation.
Some implementations of std::string support a feature called SSO ("short string optimization" or "small string optimization") where the std::string class contains storage for strings up to a certain length. This increases the size of std::string but often significantly reduces the frequency of free-store allocations/deallocations, improving performance. If your implementation of std::string does not support SSO, then constructing an empty std::string on the stack will still perform a free-store allocation. If that is the case, using temporary stack-allocated C strings may be helpful for performance-critical code that uses strings. Of course, you have to be careful not to shoot yourself in the foot when you do this.
Because that's how they come from numerous API/libraries?
Let's say you have some string constants in your code, which is a pretty common need. It's better to define these as C strings than as C++ objects -- more lightweight, portable, etc. Now, if you're going to be passing these strings to various functions, it's nice if these functions accept a C string instead of requiring a C++ string object.
Of course, if the strings are mutable, then it's much more convenient to use C++ string objects.
If a function needs a constant string I still prefer to use 'const char*' (or const wchar_t*) even if the program uses std::string, CString, EString or whatever elsewhere.
There are just too many sources of strings in a large code base to be sure the caller will have the string as a std::string and 'const char*' is the lowest common denominator.
Textbooks feature old-school C strings because many basic functions still expect them as arguments, or return them. Additionally, it gives some insight into the underlying structure of the string in memory.
Memory control. I recently had to handle strings (actually blobs from a database) about 200-300 MB in size, in a massively multithreaded application. It was a situation where just-one-more copy of the string might have burst the 32bit address space. I had to know exactly how many copies of the string existed. Although I'm an STL evangelist, I used char * then because it gave me the guarantee that no extra memory or even extra copy was allocated. I knew exactly how much space it would need.
Apart from that, standard STL string processing misses out on some great C functions for string processing/parsing. Thankfully, std::string has the c_str() method for const access to the internal buffer. To use printf() you still have to use char * though (what a crazy idea of the C++ team to not include (s)printf-like functionality, one of the most useful functions EVER in C. I hope boost::format will soon be included in the STL.
If the C++ code is "deep" (close to the kernel, heavily dependent on C libraries, etc.) you may want to use C strings explicitly to avoid lots of conversions in to and out of std::string. Of, if you're interfacing with other language domains (Python, Ruby, etc.) you might do so for the same reason. Otherwise, use std::string.
Some posts mention memory concerns. That might be a good reason to shun std::string, but char* probably is not the best replacement. It's still an OO language. Your own string class is probably better than a char*. It may even be more efficient - you can apply the Small String Optimization, for instance.
In my case, I was trying to get about 1GB worth of strings out of a 2GB file, stuff them in records with about 60 fields and then sort them 7 times of different fields. My predecessors code took 25 hours with char*, my code ran in 1 hour.
1) "string constant" is a C string (const char *), converting it to const std::string& is run-time process, not necessarily simple or optimized.
2) fstream library uses c-style strings to pass file names.
My rule of thumb is to pass const std::string& if I am about to use the data as std::string anyway (say, when I store them in a vector), and const char * in other cases.
After spending far, far, too much time debugging initialization rules and every conceivable string implementation on several platforms we require static strings to be const char*.
After spending far, far, too much time debugging bad char* code and memory leaks I suggest that all non-static strings be some type of string object ... until profiling shows that you can and should do something better ;-)
Legacy code that doesn't know of std::string. Also, before C++11 opening files with std::ifstream or std::ofstream was only possible with const char* as an input to the file name.
Given the choice, there is generally no reason to choose primitive C strings (char*) over C++ strings (std::string). However, often you don't have the luxury of choice. For instance, std::fstream's constructors take C strings, for historical reasons. Also, C libraries (you guessed it!) use C strings.
In your own C++ code it is best to use std::string and extract the object's C string as needed by using the c_str() function of std::string.
It depends on the libraries you're using. For example, when working with the MFC, it's often easier to use CString when working with various parts of the Windows API. It also seems to perform better than std::string in Win32 applications.
However, std::string is part of the C++ standard, so if you want better portability, go with std::string.
For applications such as most embedded platforms where you do not have the luxury of a heap to store the strings being manipulated, and where deterministic preallocation of string buffers is required.
c strings don't carry the overhead of being a class.
c strings generally can result in faster code, as they are closer to the machine level
This is not to say, you can't write bad code with them. They can be misused, like every other construct.
There is a wealth of libary calls that demand them for historical reasons.
Learn to use c strings, and stl strings, and use each when it makes sense to do so.
STL strings are certainly far easier to use, and I don't see any reason to not use them.
If you need to interact with a library that only takes C-style strings as arguments, you can always call the c_str() method of the string class.
The usual reason to do it is that you enjoy writing buffer overflows in your string handling. Counted strings are so superior to terminated strings it's hard to see why the C designers ever used terminated strings. It was a bad decision then; it's a bad decision now.