How do you handle strings in C++?

Which is your favorite way to go with strings in C++? A C-style array of chars? Or wchar_t? CString, std::basic_string, std::string, BSTR or CComBSTR?
Certainly each of these has its own area of application, but anyway, which is your favorite and why?

std::string or std::wstring, depending on your needs. Why?
They're standard
They're portable
They can handle I18N
They have performance guarantees (as per the standard)
Protected against buffer overflows and similar attacks
Are easily converted to other types as needed
Are nicely templated, giving you a wide variety of options while reducing code bloat and improving performance. Really. Compilers that can't handle templates are long gone now.
A C-style array of chars is just asking for trouble. You'll still need to deal with them on occasion (and that's what std::string.c_str() is for), but, honestly -- one of the biggest dangers in C is programmers doing Bad Things with char* and winding up with buffer overflows. Just don't do it.
An array of wchar_t is the same thing, just bigger.
CString, BSTR, and CComBSTR are not standard and not portable. Avoid them unless absolutely forced. Optimally, just convert a std::string/std::wstring to them when needed, which shouldn't be very expensive.
Note that std::string is just a typedef for std::basic_string<char>, but you're still better off using std::string unless you have a really good reason not to. Really Good. Let the compiler take care of the optimization in this situation.
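To make the relationship between std::string and std::basic_string concrete, here is a minimal sketch. The helper name c_length is made up for illustration, and std::strlen stands in for whatever C API you actually need to call through c_str().

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// std::string and std::wstring are aliases for basic_string specializations.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is std::basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is std::basic_string<wchar_t>");

// Hypothetical helper: hands the string to a C API (std::strlen here)
// through c_str(), which yields a null-terminated const char*.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Calling c_length(std::string("hello")) goes through the C function without any manual buffer management on the caller's side.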

std::string !!
There's a reason why they call it a "Standard".
basic_string is an implementation detail and should be ignored.
BSTR & CComBSTR only for interOp with COM, and only for the moment of interop.

std::string unless I need to call an API that specifically takes one of the others that you listed.

Here's an article comparing the most common kinds of strings in C++ and how to convert between them. Unraveling Strings in Visual C++

If you can use MFC, use CString. Otherwise use std::string. Plus, std::string works on any platform that supports standard C++.

When I have a choice (I usually don't), I tend to use std::string with UTF-8 encoding (and the help of the UTF8 CPP library). Not that I like std::string that much, but at least it is standard and portable.
Unfortunately, in almost all real-life projects I've worked on, there have been internal string classes - most of them actually better than std::string, but still...

I am a Qt dev, so of course I tend to use QString whenever possible :).
It's quite nice: unicode compliant, thread-safe implicit-sharing (aka copy-on-write), and it comes with an API designed to solve practical real-world problems (split, join, replace (with and without regex), conversion to/from numbers...)
If I can't use QString, then std::wstring. If you are stuck with C, I recommend glib GString.

I use std::string (or basic_string<TCHAR>) whenever I can. It's quite versatile (just like CStringT), it's type-safe (unlike printf), and it's available on every platform.

Other, std::wstring.
std::string is 20th century technology. Use Unicode, and sell to 6 billion people instead of 300 million.

C-style char arrays have their place, but if you use them extensively you are asking to waste time debugging off-by-one errors. We have our own string class tailored for use in our embedded development environment.
We don't use std::string because it isn't always available for us.

If you're using MFC, use CString. Otherwise I agree with most of the others, std::string or std::wstring all the way.
Microsoft could have done the world a huge favor by adding std::basic_string<TCHAR> overloads in their latest update of MFC.

I like to use TCHAR, which is a define for wchar_t or char according to the project's settings.
It's defined in tchar.h where you can find all of the related definitions for functions and types you need.

std::string and std::wstring if I can, and something else if I have to.
They may not be perfect, but they are well tested, well understood, and very versatile. They play nicely with the rest of the standard library which is also a huge bonus.
Also worth mentioning, stringstreams.

std::string is better than nothing, but it's annoying that it's missing basic functionality like split, join and even a decent format call...
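For what it's worth, the missing split and join are only a few lines each on top of the standard library. This is a rough sketch, not a library-grade implementation; the function names split and join are my own, and split only handles a single-character delimiter.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits `text` on a single-character delimiter using std::getline.
// Empty fields in the middle are kept; a trailing delimiter does not
// produce a final empty field.
std::vector<std::string> split(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::istringstream in(text);
    std::string item;
    while (std::getline(in, item, delim)) {
        parts.push_back(item);
    }
    return parts;
}

// Joins `parts` with `sep` between consecutive elements.
std::string join(const std::vector<std::string>& parts, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out += sep;
        out += parts[i];
    }
    return out;
}
```

For example, join(split("a,b,c", ','), "-") gives back "a-b-c".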

Unicode is the future. Do not use char* and std::string. Please )
I am tired of localization bugs.

Related

Will we have a size_t strlen(const char8_t*) in a future C++ version

char8_t in C++20 fixes some problems of char, so I was considering using char8_t instead of char for utf8 text (e.g. text from command line). But then I noticed that strlen was not specified in the standard to be used with char8_t, actually none of the functions in the cstring library are. Can I expect this to happen in a next standard update? Or is char8_t never intended to replace char in the way I had in mind?

I'm the author of the P0482 and P1423 char8_t proposals.
The intent of those proposals was to introduce the char8_t type with the same level of support present for char16_t and char32_t and then to follow up with additional functionality later. These proposals were adopted late in the C++20 development cycle (at the San Diego and Cologne meetings respectively), so there wasn't opportunity to deliver additional features for C++20.
One of the directives for SG16 as described in P1238 is to standardize new encoding aware text container and view types. Work is progressing in this area and we hope to deliver it for C++23. It is hoped that these new containers and views will supplant much raw string handling in C++.
With regard to strlen specifically, strlen is a C API. N2231 is a proposal to add char8_t support to C (again, at the same level as the existing support for char16_t and char32_t). That proposal has not yet been accepted by WG14. Assuming it is eventually accepted, then it would make sense to follow up with additional char8_t-based C string management functions (perhaps enhancing support for char16_t and char32_t as well).
At present, I'm working on completing an implementation of N2231 in gcc and glibc. Once that is complete, I intend to submit a revision of N2231 to WG14.
You can help! SG16 is an open group. Please feel free to subscribe to our mailing list, join us on Slack, share your ideas, needs, and wants, and write proposals for new functionality (we can help with how to do that).

These new char types are intended to be used with the C++ string template std::basic_string, namely to define std::u8string. So the best option in your case is to use C++ strings.
As for future support of char8_t in the cstring library, I suppose this question is better suited to the future C standard. I'm afraid it will not be an easy update, and it is unlikely to happen, since C does not have overloaded functions, so this update would require new functions like c8slen in addition to strlen and wcslen.

char8_t is intended for UTF-8-encoded strings. As such, APIs that consume them will be assumed by users to be Unicode aware on some level. Quite a lot of the contents of the <cstring> header would be inappropriate for char8_t, as their behavior is very much not in line with Unicode (would strcmp do proper Unicode collation?).
If you want access to functions that work similarly to the <cstring> functions, then you'll find std::char_traits<char8_t> to contain some useful ones, in particular length (exactly like strlen) and compare (explicitly lexicographical). Most of the rest of <cstring> can be handled adequately through C++ algorithms.
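A small sketch of the char_traits route follows. The alias utf8_char and the two wrapper names are my own; the alias falls back to char on compilers without char8_t so the same code also builds before C++20, where u8"" literals are arrays of char.

```cpp
#include <cstddef>
#include <string>

#if defined(__cpp_char8_t)
using utf8_char = char8_t;  // C++20: u8"" literals are arrays of char8_t
#else
using utf8_char = char;     // before C++20: u8"" literals are arrays of char
#endif

// Counts code units (bytes) up to the terminator, like strlen.
std::size_t utf8_code_units(const utf8_char* s) {
    return std::char_traits<utf8_char>::length(s);
}

// Lexicographic comparison of the first n code units.
// This is NOT Unicode collation, only a code-unit comparison.
int utf8_compare(const utf8_char* a, const utf8_char* b, std::size_t n) {
    return std::char_traits<utf8_char>::compare(a, b, n);
}
```

So utf8_code_units(u8"abc") is 3, and utf8_compare(u8"abc", u8"abd", 3) is negative.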

0 can still act as the null terminator in UTF-8 strings, so technically nothing prevents you (except the lack of an appropriate function) from using strlen to count the number of bytes(!) in a UTF-8 sequence. If you want to find the number of characters you would need a separate function.

C++/Win32 deprecated string functions: mbstowcs, wcstombs, safe or not safe?

The compiler (VC 2010) keeps complaining about me using them.
In case not safe, please offer simplest replacement.

Well, you have the safe versions of most common string functions; their names end in _s and they let you specify the length of the buffer.
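The _s variants (such as mbstowcs_s) are not available on every toolchain, so here is a sketch of the same idea with the standard function: size the destination up front and pass that size as the limit. The wrapper name to_wide is made up, and it relies on the assumption that a multibyte string never produces more wide characters than it has bytes. Note that VC++ will most likely still emit its deprecation warning for std::mbstowcs itself.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Converts a multibyte string to std::wstring using the current C locale.
// Returns false if the input contains an invalid multibyte sequence.
bool to_wide(const char* src, std::wstring& out) {
    // Assumption: the result never has more wide chars than src has bytes.
    std::wstring buffer(std::strlen(src), L'\0');
    if (buffer.empty()) {
        out.clear();
        return true;
    }
    // The third argument bounds how many wide chars may be written.
    const std::size_t written = std::mbstowcs(&buffer[0], src, buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return false;
    }
    buffer.resize(written);
    out.swap(buffer);
    return true;
}
```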

If by "safe" you mean that you can use the functions without worrying that they will disappear in the future, then most likely yes, as these functions are part of the C (and probably C++) standard.

Many Windows DLLs use (imports, exports) these unsafe (also called "obsolete" or "banned") APIs. These are parts of Win32!

I used them with Win32 GDI+ string-drawing functions: I take a char array, change it into a wide char string, and then draw it on screen. I use VC++ 2010 Express too! Works without any leak.

Array of char or std::string for a public library?

my question is simple:
Should I use array of char eg:
char *buf, buf2[MAX_STRING_LENGTH]
etc or should I use std::string in a library that will be used by other programmers where they can use it on any OS and compiler of their choice?
Considering performance and portability...
From my point of view, std strings are easier and performance is equal or the difference is way too little to not use std::string; about portability I don't know. I guess as it is standard, there shouldn't be any compiler that compiles C++ without it, at least any important compiler.
EDIT:
The library will be compiled on 3 major OS and, theoretically, distributed as a lib
Your thoughts?
ty,
Joe

Depends on how this library will be used in conjunction with client code. If it will be linked in dynamically and you have a set of APIs exposed for the client -- you are better off using null terminated byte strings (i.e. char *) and their wide-character counterparts. If you are talking about using them within your code, you certainly are free to use std::string. If it is going to be included in source form -- std::string works fine.

But if your library is shipped as DLL your users will have to use the same implementation of std::string. It won't be possible for them to use STLPort (or any other implementation) if your library was built using Microsoft STL.

As long as you are targeting pure C++ for your library, using std::string is fine and even desirable. However, doing that ties you to a particular implementation of C++ (the one used to build your library), and it can't be linked with other C++ implementations or other languages.
Often, it is highly desirable to give a library a C interface rather than a C++ one. That way it's usable by any other language that provides a C foreign function interface (which is most of them). For a C interface, you need to use char *.
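To illustrate, a library can keep std::string internally and expose only char * at the boundary. This is a sketch with invented names (mylib_greeting, make_greeting); the caller-supplied buffer plus returned length is one common convention, not the only one. A real library would also need to keep exceptions from escaping through the C boundary.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

namespace {
// Internal implementation detail: free to use std::string.
std::string make_greeting(const std::string& name) {
    return "Hello, " + name + "!";
}
}  // namespace

// Hypothetical exported function with C linkage. Copies at most
// buf_size - 1 characters into buf, always null-terminates when
// buf_size > 0, and returns the full length of the untruncated result.
extern "C" std::size_t mylib_greeting(const char* name, char* buf,
                                      std::size_t buf_size) {
    const std::string result = make_greeting(name ? name : "");
    if (buf != nullptr && buf_size > 0) {
        const std::size_t n =
            result.size() < buf_size - 1 ? result.size() : buf_size - 1;
        std::memcpy(buf, result.data(), n);
        buf[n] = '\0';
    }
    return result.size();
}
```

A caller can compare the return value against its buffer size to detect truncation, and no std::string ever crosses the library boundary.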

I would recommend just using std::string. Besides if you want compatibility with libraries requiring C-style strings (for example, which uses a C compatible API), you can always just use the c_str() method of std::string.

In general, you will be better off using std::string, certainly for calls internal to your library.
For your API, it's dependent on what its purpose is. For internal use within your organization, an API that uses std::string will probably be fine. For external use you may wish to provide a C API, one which uses char*

Why use c strings in c++?

Is there any good reason to use C-strings in C++ nowadays? My textbook uses them in examples at some points, and I really feel like it would be easier just to use a std::string.

The only reasons I've had to use them is when interfacing with 3rd party libraries that use C style strings. There might also be esoteric situations where you would use C style strings for performance reasons, but more often than not, using methods on C++ strings is probably faster due to inlining and specialization, etc.
You can use the c_str() method in many cases when working with those sort of APIs, but you should be aware that the char * returned is const, and you should not modify the string via that pointer. In those sort of situations, you can still use a vector<char> instead, and at least get the benefit of easier memory management.
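A sketch of the vector<char> approach, with a made-up stand-in (c_api_uppercase) for a C function that writes through its char * argument:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Stand-in for a C API that modifies its argument in place (hypothetical).
void c_api_uppercase(char* s) {
    for (; *s != '\0'; ++s) {
        *s = static_cast<char>(std::toupper(static_cast<unsigned char>(*s)));
    }
}

// Copies the string into a writable, null-terminated buffer, lets the
// C API modify it, then builds a new std::string from the result.
std::string uppercase_via_c_api(const std::string& input) {
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');
    c_api_uppercase(buffer.data());
    return std::string(buffer.data());
}
```

The vector owns the memory, so there is nothing to free by hand even if the C call is followed by an early return or an exception.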

A couple more memory control notes:
C strings are POD types, so they can be allocated in your application's read-only data segment. If you declare and define std::string constants at namespace scope, the compiler will generate additional code that runs before main() that calls the std::string constructor for each constant. If your application has many constant strings (e.g. if you have generated C++ code that uses constant strings), C strings may be preferable in this situation.
Some implementations of std::string support a feature called SSO ("short string optimization" or "small string optimization") where the std::string class contains storage for strings up to a certain length. This increases the size of std::string but often significantly reduces the frequency of free-store allocations/deallocations, improving performance. If your implementation of std::string does not support SSO, then constructing an empty std::string on the stack will still perform a free-store allocation. If that is the case, using temporary stack-allocated C strings may be helpful for performance-critical code that uses strings. Of course, you have to be careful not to shoot yourself in the foot when you do this.
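If you want to see what your implementation does, a rough probe is to check whether the character data lives inside the std::string object itself. The function name stored_inline is made up, and this is only a heuristic for poking at an implementation, not something to rely on in production code.

```cpp
#include <functional>
#include <string>

// Reports whether the character data of `s` lies within the bytes of the
// std::string object itself, which suggests SSO is in effect for it.
bool stored_inline(const std::string& s) {
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(std::string);
    const char* data = s.data();
    // std::less gives a well-defined ordering even for unrelated pointers.
    std::less<const char*> before;
    return !before(data, begin) && before(data, end);
}
```

On an implementation with SSO you would expect a short string such as "hi" to report true; a string of 1000 characters cannot fit inside the object, so it reports false everywhere.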

Because that's how they come from numerous API/libraries?

Let's say you have some string constants in your code, which is a pretty common need. It's better to define these as C strings than as C++ objects -- more lightweight, portable, etc. Now, if you're going to be passing these strings to various functions, it's nice if these functions accept a C string instead of requiring a C++ string object.
Of course, if the strings are mutable, then it's much more convenient to use C++ string objects.

If a function needs a constant string I still prefer to use 'const char*' (or const wchar_t*) even if the program uses std::string, CString, EString or whatever elsewhere.
There are just too many sources of strings in a large code base to be sure the caller will have the string as a std::string and 'const char*' is the lowest common denominator.

Textbooks feature old-school C strings because many basic functions still expect them as arguments, or return them. Additionally, it gives some insight into the underlying structure of the string in memory.

Memory control. I recently had to handle strings (actually blobs from a database) about 200-300 MB in size, in a massively multithreaded application. It was a situation where just-one-more copy of the string might have burst the 32bit address space. I had to know exactly how many copies of the string existed. Although I'm an STL evangelist, I used char * then because it gave me the guarantee that no extra memory or even extra copy was allocated. I knew exactly how much space it would need.
Apart from that, standard STL string processing misses out on some great C functions for string processing/parsing. Thankfully, std::string has the c_str() method for const access to the internal buffer. To use printf() you still have to use char * though (what a crazy idea of the C++ team to not include (s)printf-like functionality, one of the most useful functions EVER in C). I hope boost::format will soon be included in the STL.
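In the meantime, a printf-style helper that returns a std::string can be built on vsnprintf. This is a sketch (the name format_string is my own) and it inherits printf's lack of type safety: std::string arguments still have to be passed as c_str().

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// printf-style formatting into a std::string. The first vsnprintf call
// only measures the required length; the second one writes the text.
std::string format_string(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    va_list args_copy;
    va_copy(args_copy, args);
    const int needed = std::vsnprintf(nullptr, 0, fmt, args_copy);
    va_end(args_copy);

    std::string result;
    if (needed > 0) {
        std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(buffer.data(), buffer.size(), fmt, args);
        result.assign(buffer.data(), static_cast<std::size_t>(needed));
    }
    va_end(args);
    return result;
}
```

For example, format_string("%d-%s", 42, name.c_str()) with name equal to "x" yields "42-x".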

If the C++ code is "deep" (close to the kernel, heavily dependent on C libraries, etc.) you may want to use C strings explicitly to avoid lots of conversions in to and out of std::string. Or, if you're interfacing with other language domains (Python, Ruby, etc.) you might do so for the same reason. Otherwise, use std::string.

Some posts mention memory concerns. That might be a good reason to shun std::string, but char* probably is not the best replacement. It's still an OO language. Your own string class is probably better than a char*. It may even be more efficient - you can apply the Small String Optimization, for instance.
In my case, I was trying to get about 1GB worth of strings out of a 2GB file, stuff them in records with about 60 fields and then sort them 7 times on different fields. My predecessor's code took 25 hours with char*, my code ran in 1 hour.

1) "string constant" is a C string (const char *), converting it to const std::string& is run-time process, not necessarily simple or optimized.
2) fstream library uses c-style strings to pass file names.
My rule of thumb is to pass const std::string& if I am about to use the data as std::string anyway (say, when I store them in a vector), and const char * in other cases.

After spending far, far, too much time debugging initialization rules and every conceivable string implementation on several platforms we require static strings to be const char*.
After spending far, far, too much time debugging bad char* code and memory leaks I suggest that all non-static strings be some type of string object ... until profiling shows that you can and should do something better ;-)

Legacy code that doesn't know of std::string. Also, before C++11 opening files with std::ifstream or std::ofstream was only possible with const char* as an input to the file name.

Given the choice, there is generally no reason to choose primitive C strings (char*) over C++ strings (std::string). However, often you don't have the luxury of choice. For instance, std::fstream's constructors take C strings, for historical reasons. Also, C libraries (you guessed it!) use C strings.
In your own C++ code it is best to use std::string and extract the object's C string as needed by using the c_str() function of std::string.

It depends on the libraries you're using. For example, when working with the MFC, it's often easier to use CString when working with various parts of the Windows API. It also seems to perform better than std::string in Win32 applications.
However, std::string is part of the C++ standard, so if you want better portability, go with std::string.

For applications such as most embedded platforms where you do not have the luxury of a heap to store the strings being manipulated, and where deterministic preallocation of string buffers is required.

c strings don't carry the overhead of being a class.
c strings generally can result in faster code, as they are closer to the machine level
This is not to say, you can't write bad code with them. They can be misused, like every other construct.
There is a wealth of library calls that demand them for historical reasons.
Learn to use c strings, and stl strings, and use each when it makes sense to do so.

STL strings are certainly far easier to use, and I don't see any reason to not use them.
If you need to interact with a library that only takes C-style strings as arguments, you can always call the c_str() method of the string class.

The usual reason to do it is that you enjoy writing buffer overflows in your string handling. Counted strings are so superior to terminated strings it's hard to see why the C designers ever used terminated strings. It was a bad decision then; it's a bad decision now.
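One concrete difference between counted and terminated strings: a counted string can hold a zero byte as ordinary data, while anything that relies on the terminator stops at it. A small sketch (both function names are made up):

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds a 5-character std::string with a zero byte in the middle.
std::string with_embedded_null() {
    return std::string("ab\0cd", 5);
}

// Length as seen by code that relies on the null terminator.
std::size_t terminated_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Here with_embedded_null().size() is 5, but terminated_length() of the same string is 2, because strlen stops at the first zero byte.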

Why don't the std::fstream classes take a std::string?

This isn't a design question, really, though it may seem like it. (Well, okay, it's kind of a design question). What I'm wondering is why the C++ std::fstream classes don't take a std::string in their constructor or open methods. Everyone loves code examples so:
#include <iostream>
#include <fstream>
#include <string>

int main()
{
    std::string filename = "testfile";
    std::ifstream fin;
    fin.open(filename.c_str()); // Works just fine.
    fin.close();
    //fin.open(filename); // Error: no such method.
    //fin.close();
}
This gets me all the time when working with files. Surely the C++ library would use std::string wherever possible?

By taking a C string the C++03 std::fstream class reduced dependency on the std::string class. In C++11, however, the std::fstream class does allow passing a std::string for its constructor parameter.
Now, you may wonder why there isn't a transparent conversion from a std::string to a C string, so a class that expects a C string could still take a std::string just like a class that expects a std::string can take a C string.
The reason is that this would cause a conversion cycle, which in turn may lead to problems. For example, suppose std::string were convertible to a C string so that you could use std::strings with fstreams. Suppose also that C strings are convertible to std::strings, as is the case in the current standard. Now, consider the following:
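void f(std::string str1, std::string str2);
void f(const char* cstr1, const char* cstr2);

void g()
{
    const char* cstr = "abc"; // string literals are const in C++
    std::string str = "def";
    f(cstr, str); // ERROR: ambiguous (if std::string converted implicitly to a C string)
}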
Because you can convert either way between a std::string and a C string the call to f() could resolve to either of the two f() alternatives, and is thus ambiguous. The solution is to break the conversion cycle by making one conversion direction explicit, which is what the STL chose to do with c_str().
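The one-way conversion that the standard actually provides looks like this in practice. The function names here are invented for illustration:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

std::size_t takes_string(const std::string& s) { return s.size(); }
std::size_t takes_c_string(const char* s) { return std::strlen(s); }

// Exercises both conversion directions and returns the combined length.
std::size_t demo() {
    const char* cstr = "abc";
    std::string str = "def";
    std::size_t total = 0;
    total += takes_string(cstr);           // implicit: const char* -> std::string
    total += takes_c_string(str.c_str());  // explicit: std::string -> const char*
    // takes_c_string(str);                // does not compile: no implicit conversion
    return total;
}
```

Because only one direction is implicit, overload resolution never has to choose between two equally good conversions.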

There are several places where the C++ standard committee did not really optimize the interaction between facilities in the standard library.
std::string and its use in the library is one of these.
One other example is std::swap. Many containers have a swap member function, but no overload of std::swap is supplied. The same goes for std::sort.
I hope all these small things will be fixed in the upcoming standard.

Maybe it's a consolation: all fstream's have gotten an open(string const &, ...) next to the open(char const *, ...) in the working draft of the C++0x standard.
(see e.g. 27.8.1.6 for the basic_ifstream declaration)
So when it gets finalised and implemented, it won't get you anymore :)

The stream IO library has been added to the standard C++ library before the STL. In order to not break backward compatibility, it has been decided to avoid modifying the IO library when the STL was added, even if that meant some issues like the one you raise.

# Bernard:
Monoliths "Unstrung." "All for one, and one for all" may work for Musketeers, but it doesn't work nearly as well for class designers. Here's an example that is not altogether exemplary, and it illustrates just how badly you can go wrong when design turns into overdesign. The example is, unfortunately, taken from a standard library near you...
~ http://www.gotw.ca/gotw/084.htm

It is inconsequential, that is true. What do you mean by std::string's interface being large? What does large mean, in this context - lots of method calls? I'm not being facetious, I am actually interested.
It has more methods than it really needs, and its behaviour of using integral offsets rather than iterators is a bit iffy (as it's contrary to the way the rest of the library works).
The real issue I think is that the C++ library has three parts; it has the old C library, it has the STL, and it has strings-and-iostreams. Though some efforts were made to bridge the different parts (e.g. the addition of overloads to the C library, because C++ supports overloading; the addition of iterators to basic_string; the addition of the iostream iterator adaptors), there are a lot of inconsistencies when you look at the detail.
For example, basic_string includes methods that are unnecessary duplicates of standard algorithms; the various find methods, could probably be safely removed. Another example: locales use raw pointers instead of iterators.
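To show the duplication, here is the same search written with the member function and with the generic algorithm. The wrapper names are made up; the second one converts the iterator back to an offset only so the two results can be compared.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Member function: works with integral offsets and std::string::npos.
std::size_t offset_via_member(const std::string& s, char c) {
    return s.find(c);
}

// Generic algorithm: works with iterators, like the rest of the STL.
std::size_t offset_via_algorithm(const std::string& s, char c) {
    const std::string::const_iterator it = std::find(s.begin(), s.end(), c);
    return it == s.end() ? std::string::npos
                         : static_cast<std::size_t>(it - s.begin());
}
```

Both return 4 for the character 'o' in "hello world" and std::string::npos for a character that is not present.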

C++ grew up on smaller machines than the monsters we write code for today. Back when iostream was new many developers really cared about code size (they had to fit their entire program and data into several hundred KB). Therefore, many didn't want to pull in the "big" C++ string library. Many didn't even use the iostream library for the same reasons, code size.
We didn't have thousands of megabytes of RAM to throw around like we do today. We usually didn't have function level linking so we were at the mercy of the developer of the library to use a lot of separate object files or else pull in tons of uncalled code. All of this FUD made developers steer away from std::string.
Back then I avoided std::string too. "Too bloated", "called malloc too often", etc. Foolishly using stack-based buffers for strings, then adding all kinds of tedious code to make sure it doesn't overrun.

Is there any class in the STL that takes a string? I don't think so (I couldn't find any in my quick search). So it's probably a design decision that no class in the STL should be dependent on any other STL class (that is not directly needed for functionality).

I believe that this has been thought about and was done to avoid the dependency; i.e. #include <fstream> should not force one to #include <string>.
To be honest, this seems like quite an inconsequential issue. A better question would be, why is std::string's interface so large?

Nowadays you can solve this problem very easily: add -std=c++11 to your compiler flags (CXXFLAGS in a typical Makefile).
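With C++11 or later enabled, the stream constructors accept a std::string directly. A small sketch follows; the function name round_trip is made up, and it writes and then deletes a scratch file in the current directory, so it assumes that directory is writable.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Writes one line to `filename`, reads it back, deletes the file,
// and returns the line that was read.
std::string round_trip(const std::string& filename, const std::string& line) {
    {
        std::ofstream out(filename);  // C++11: takes std::string, no c_str()
        out << line << '\n';
    }  // out is closed here
    std::string read_back;
    {
        std::ifstream in(filename);   // same for input streams
        std::getline(in, read_back);
    }
    std::remove(filename.c_str());    // C API, so c_str() is still needed
    return read_back;
}
```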
