Array of char or std::string for a public library? - c++

my question is simple:
Should I use array of char eg:
char *buf, buf2[MAX_STRING_LENGTH]
etc., or should I use std::string, in a library that other programmers will use on any OS and compiler of their choice?
Considering performance and portability...
From my point of view, std::string is easier to use, and performance is either equal or the difference is too small to justify avoiding std::string. About portability I don't know. Since it is standard, I guess there shouldn't be any compiler that compiles C++ without it, at least no important compiler.
EDIT:
The library will be compiled on the 3 major OSes and, theoretically, distributed as a lib.
Your thoughts?
ty,
Joe

Depends on how this library will be used in conjunction with client code. If it will be linked in dynamically and you have a set of APIs exposed for the client -- you are better off using null terminated byte strings (i.e. char *) and their wide-character counterparts. If you are talking about using them within your code, you certainly are free to use std::string. If it is going to be included in source form -- std::string works fine.

But if your library is shipped as a DLL, your users will have to use the same implementation of std::string. It won't be possible for them to use STLPort (or any other implementation) if your library was built using Microsoft's STL.

As long as you are targeting pure C++ for your library, using std::string is fine and even desirable. However, doing that ties you to a particular implementation of C++ (the one used to build your library), and it can't be linked with other C++ implementations or other languages.
Often, it is highly desirable to give a library a C interface rather than a C++ one. That way it's usable by any other language that provides a C foreign function interface (which is most of them). For a C interface, you need to use char *.
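A minimal sketch of what such a C interface could look like, with std::string kept strictly on the inside. The names mylib_greeting and make_greeting are made up for illustration, and a real library would also need export macros and would have to keep exceptions from escaping the extern "C" function:

#include <cstddef>
#include <cstring>
#include <string>

namespace {
// Internal implementation detail: free to use std::string.
std::string make_greeting(const char* name) {
    return std::string("Hello, ") + (name ? name : "world");
}
}

// Hypothetical exported C function: only built-in types cross the boundary.
// Copies at most buf_size bytes (terminator included) into buf and returns
// the length the full result needs, not counting the terminator.
extern "C" std::size_t mylib_greeting(const char* name, char* buf, std::size_t buf_size) {
    const std::string result = make_greeting(name);
    if (buf != nullptr && buf_size > 0) {
        const std::size_t n = result.size() < buf_size - 1 ? result.size() : buf_size - 1;
        std::memcpy(buf, result.data(), n);
        buf[n] = '\0';
    }
    return result.size();
}

The caller owns the buffer, so no memory allocated by one runtime is ever freed by another. If the return value is greater than or equal to buf_size, the result was truncated and the caller can retry with a larger buffer.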

I would recommend just using std::string. Besides, if you need compatibility with libraries that require C-style strings (for example, ones that expose a C-compatible API), you can always use the c_str() method of std::string.

In general, you will be better off using std::string, certainly for calls internal to your library.
For your API, it depends on its purpose. For internal use within your organization, an API that uses std::string will probably be fine. For external use you may wish to provide a C API, one which uses char*.

Related

Mixing C++ flavours in the same project

Is it safe to mix C++98 and C++11 in the same project? By "mixing" I mean not only linking object files but also common header files included in the source code compiled with C++98 and C++11.
The background for the question is the desire to transition at least a part of a large code base to C++11. A part of the code is in C++ CUDA, compiled to be executed on either GPU or CPU, and the corresponding compiler doesn't support C++11 at this time. However, much of the code is intended for CPU only and can be compiled with either C++ flavour. Some header files are included in both CPU+GPU and CPU-only source files.
If we now compile CPU-only source files with C++11 compiler, can we be confident against undesirable side effects?
In practice, maybe.
It is relatively common for the standard library of C++11 and C++03 to disagree about what the layout of std namespace objects is. As an example, sizeof(std::vector<int>) changed noticeably over various compiler versions in MSVC land. (it got smaller as they optimized it)
Other examples could be a different heap on each side of the compiler fence.
So you have to carefully "firewall" between the two source trees.
Now, some compilers seek to minimize such binary compatibility changes, even at the cost of violating the standard. I believe std::list without a size counter might be an example of that (which violates C++11, but I recall that at least one vendor provided a standards-non-compliant std::list to maintain binary compatibility -- I don't remember which one).
For the two compilers (and the same compiler in C++03 mode and in C++11 mode counts as two different compilers) you are going to have some ABI guarantees. There is probably a large chunk of the language for which the ABI will agree, and on that set you are relatively safe.
To be reasonably safe, you'll want to treat the other compiler version's files as if they are third-party DLLs (dynamic-link libraries) that do not link to the same C++ standard library. That means any resource passed from one to the other has to be packaged with its destruction code (i.e., returned to the DLL from whence it came to be destroyed). You'll either have to investigate the ABI of the two standard libraries, or avoid using the standard library in the common header files, so you can pass things like smart pointers between the DLLs.
An even safer approach is to strip yourself down to a C style interface with the other code base, and only pass handles (opaque types) between the two code bases. To make this sane, whip up some header-file only mojo that wraps the C style interface in pretty C++ code, just don't pass those C++ objects between the code bases.
All of this is a pain.
For example, suppose you have a std::string get_some_string(HANDLE) function, and you don't trust ABI stability.
So you have 3 layers.
namespace internal {
    // NOT exported from DLL
    std::string get_some_string(HANDLE) { /* implementation in DLL */ }
}
namespace marshal {
    // exported from DLL
    // visible in external headers, not intended to be called directly
    void get_some_string(HANDLE h, void* pdata, void(*callback)( void*, char const* data, std::size_t length ) ) {
        // implementation in DLL
        auto r = ::internal::get_some_string(h);
        callback( pdata, r.data(), r.size() );
    }
}
namespace interface {
    // exists in only public header file, not within DLL
    inline std::string get_some_string(HANDLE h) {
        std::string r;
        ::marshal::get_some_string(h, &r,
            [](void* pr, const char* str, std::size_t length){
                std::string& r = *static_cast<std::string*>(pr);
                r.append( str, length );
            }
        );
        return r;
    }
}
So the code outside the DLL does an auto s = ::interface::get_some_string(handle);, and it looks like a C++ interface.
The code inside the DLL implements std::string ::internal::get_some_string(HANDLE);.
The marshal's get_some_string provides a C-style interface between the two, which provides better binary compatibility than relying on the layout and implementation of std::string to remain stable between the DLL and the code using the DLL.
The interface's std::string exists completely within the non-DLL code. The internal std::string exists completely within the DLL-code. The marshal code moves the data from one side to the other.
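The "package the resource with its destruction code" advice can be sketched the same way for opaque handles. This is only an illustration under assumptions: widget, widget_create, widget_destroy and widget_name_length are hypothetical names, and everything sits in one file here, whereas in a real build the three functions would be exported from the DLL and the struct definition would stay private to it:

#include <cstddef>
#include <memory>
#include <string>

// Stand-in for the DLL side (hypothetical names).
struct widget { std::string name; };
extern "C" widget* widget_create(const char* name) {
    return new widget{name ? name : ""};
}
extern "C" void widget_destroy(widget* w) { delete w; }
extern "C" std::size_t widget_name_length(const widget* w) { return w->name.size(); }

// Client side, header-only: the deleter hands the object back to the code
// that allocated it, so new and delete always run in the same module.
struct widget_deleter {
    void operator()(widget* w) const { widget_destroy(w); }
};
using widget_ptr = std::unique_ptr<widget, widget_deleter>;

inline widget_ptr make_widget(const char* name) {
    return widget_ptr(widget_create(name));
}

The client only ever holds a pointer it treats as opaque, so the layout of std::string inside widget never has to match between the two sides.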

C++/Win32 deprecated string functions: mbstowcs, wcstombs, safe or not safe?

The compiler (VC 2010) keeps complaining about me using them.
In case not safe, please offer simplest replacement.
Well, there are "safe" versions of most common string functions. Their names end in _s, and they let you specify the length of the destination buffer.
Or, by "safe", do you mean that you can use the functions without worrying that they will disappear in the future? In that case, most likely yes, as these functions are part of the C standard library (and therefore also available in C++).
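What makes the plain functions risky is an undersized destination buffer, so here is a minimal sketch that sizes the buffer explicitly using only standard mbstowcs. The helper name to_wide is made up; on MSVC this still triggers the deprecation warning unless you define _CRT_SECURE_NO_WARNINGS or switch to mbstowcs_s, which additionally takes the destination size:

#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Converts a multibyte string (in the current C locale's encoding) to a
// wide string. to_wide is a hypothetical helper, not a library function.
std::wstring to_wide(const char* src) {
    // A multibyte string never decodes to more wide characters than it has
    // bytes, so strlen(src) + 1 elements are always enough.
    std::vector<wchar_t> buf(std::strlen(src) + 1);
    const std::size_t n = std::mbstowcs(buf.data(), src, buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");
    return std::wstring(buf.data(), n);
}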
Many Windows DLLs use (imports, exports) these unsafe (also called "obsolete" or "banned") APIs. These are parts of Win32!
I used them with the Win32 GDI+ string-drawing functions: my code takes a char array, converts it into a wide-char string, and then draws it on screen. I use VC++ 2010 Express too! It works without any leaks.

Is there a portable wrapper for C++ type_info that standardizes type name string format?

The format of the output of type_info::name() is implementation specific.
namespace N { struct A; }
const N::A *a;
typeid(a).name(); // returns e.g. "const struct N::A" but compiler-specific
Has anyone written a wrapper that returns dependable, predictable type information that is the same across compilers. Multiple templated functions would allow user to get specific information about a type. So I might be able to use:
MyTypeInfo::name(a); // returns "const struct N::A *"
MyTypeInfo::base(a); // returns "A"
MyTypeInfo::pointer(a); // returns "*"
MyTypeInfo::nameSpace(a); // returns "N"
MyTypeInfo::cv(a); // returns "const"
These functions are just examples; someone with better knowledge of the C++ type system could probably design a better API. The one I'm interested in is base(). All functions would raise an exception if RTTI was disabled or an unsupported compiler was detected.
This seems like the sort of thing that Boost might implement, but I can't find it in there anywhere. Is there a portable library that does this?
There are some limitations on doing such things in C++, so you probably won't find exactly what you want in the near future. The meta-information about types that the compiler inserts in the compiled code is also specific to the RTL used by the compiler, so it'd be difficult for a third-party library to do a good job without relying on undocumented features of each specific compiler, which might break in later versions.
The Qt framework has, to my knowledge, the nearest thing to what you intended. But they do that completely independent from RTTI. Instead, they have their own "compiler" that parses the source code and generates additional source modules with the meta-information. Then, you compile+link these modules along with your program and use their API to get the information. Take a look at http://doc.qt.nokia.com/latest/metaobjects.html
Jeremy Pack (from Boost Extension plugin framework) appears to have written such a thing:
http://blog.redshoelace.com/2009/06/resource-management-across-dll.html
3. RTTI does not always function as expected across DLL boundaries. Check out the type_info classes to see how I deal with that.
So you could have a look there.
PS. I remembered because I once fixed a bug in that area; this might still add information so here's the link: https://stackoverflow.com/a/5838527/85371
GCC has __cxa_demangle https://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html
If there are such extensions for all compilers you target, you could use them to write a portable function with macros to detect the compiler.
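A rough sketch of that idea, assuming only two cases: GCC-compatible compilers that ship <cxxabi.h>, and everything else, where the raw name is returned unchanged. The function name type_name is made up, and the output format still differs between compilers (for example MSVC prefixes class types with "struct" or "class"):

#include <cstdlib>
#include <string>
#include <typeinfo>
#if defined(__GNUG__) && !defined(_MSC_VER)
#include <cxxabi.h>
#endif

// Hypothetical helper: returns a readable name for T where the compiler
// offers a demangler, and the raw type_info name otherwise.
// Note that typeid ignores top-level const/volatile and references.
template <class T>
std::string type_name() {
    const char* raw = typeid(T).name();
#if defined(__GNUG__) && !defined(_MSC_VER)
    int status = 0;
    char* demangled = abi::__cxa_demangle(raw, nullptr, nullptr, &status);
    std::string result = (status == 0 && demangled != nullptr) ? demangled : raw;
    std::free(demangled);  // __cxa_demangle allocates with malloc
    return result;
#else
    return raw;
#endif
}

Getting from this to something like base() or nameSpace() would still mean parsing the returned string per compiler, which is the fragile part.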

Why use c strings in c++?

Is there any good reason to use C-strings in C++ nowadays? My textbook uses them in examples at some points, and I really feel like it would be easier just to use a std::string.
The only reasons I've had to use them is when interfacing with 3rd party libraries that use C style strings. There might also be esoteric situations where you would use C style strings for performance reasons, but more often than not, using methods on C++ strings is probably faster due to inlining and specialization, etc.
You can use the c_str() method in many cases when working with that sort of API, but you should be aware that it returns a const char *, and you should not modify the string via that pointer. In that sort of situation, you can still use a vector<char> instead, and at least get the benefit of easier memory management.
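For the vector<char> case, a small sketch. legacy_upcase is a made-up stand-in for a C-style function that writes into its argument, and upcase is a hypothetical wrapper around it:

#include <cctype>
#include <string>
#include <vector>

// Hypothetical C-style API that modifies its argument in place.
extern "C" void legacy_upcase(char* s) {
    for (; *s != '\0'; ++s)
        *s = static_cast<char>(std::toupper(static_cast<unsigned char>(*s)));
}

// Copies the string into a writable, null-terminated buffer, lets the C
// function work on it, and converts the result back to std::string.
std::string upcase(const std::string& in) {
    std::vector<char> buf(in.begin(), in.end());
    buf.push_back('\0');
    legacy_upcase(buf.data());
    return std::string(buf.data());
}

Since C++17, std::string also has a non-const data(), so the extra copy can often be avoided there.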
A couple more memory control notes:
C strings are POD types, so they can be allocated in your application's read-only data segment. If you declare and define std::string constants at namespace scope, the compiler will generate additional code that runs before main() that calls the std::string constructor for each constant. If your application has many constant strings (e.g. if you have generated C++ code that uses constant strings), C strings may be preferable in this situation.
Some implementations of std::string support a feature called SSO ("short string optimization" or "small string optimization") where the std::string object itself contains storage for strings up to a certain length. This increases the size of std::string but often significantly reduces the frequency of free-store allocations/deallocations, improving performance. If your implementation of std::string does not support SSO, then constructing even a short std::string on the stack will typically perform a free-store allocation (whether an empty one allocates depends on the implementation). If that is the case, using temporary stack-allocated C strings may be helpful for performance-critical code that uses strings. Of course, you have to be careful not to shoot yourself in the foot when you do this.
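If you want to find out what your implementation does, one way is to check whether the character data lives inside the string object itself. This is only a probe for experimenting, and stored_inline is a made-up name:

#include <functional>
#include <string>

// Returns true when the characters are stored inside the std::string object,
// which is what an SSO implementation does for short strings.
bool stored_inline(const std::string& s) {
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(std::string);
    const std::less<const char*> before;  // total order, even for unrelated pointers
    return !before(s.data(), begin) && before(s.data(), end);
}

Calling it on a short string such as std::string("hi") tells you whether SSO is in effect for that length. A string much longer than sizeof(std::string) can never be stored inline.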
Because that's how they come from numerous API/libraries?
Let's say you have some string constants in your code, which is a pretty common need. It's better to define these as C strings than as C++ objects -- more lightweight, portable, etc. Now, if you're going to be passing these strings to various functions, it's nice if these functions accept a C string instead of requiring a C++ string object.
Of course, if the strings are mutable, then it's much more convenient to use C++ string objects.
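To illustrate the difference for constants, a small sketch. The names are invented, and what the compiler actually emits for the std::string case depends on the implementation and optimization level:

#include <cstddef>
#include <cstring>
#include <string>

// Constant-initialized: the bytes sit in the program image and no code has
// to run before main().
constexpr char kDefaultName[] = "default";

// Dynamically initialized on most implementations: a std::string constructor
// call is typically emitted to run at startup.
const std::string kDefaultNameString = "default";

// Taking const char* lets callers pass either kind without a temporary.
std::size_t name_length(const char* name) { return std::strlen(name); }

Both name_length(kDefaultName) and name_length(kDefaultNameString.c_str()) work, so a const char* parameter does not lock callers out of std::string.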
If a function needs a constant string I still prefer to use 'const char*' (or const wchar_t*) even if the program uses std::string, CString, EString or whatever elsewhere.
There are just too many sources of strings in a large code base to be sure the caller will have the string as a std::string and 'const char*' is the lowest common denominator.
Textbooks feature old-school C strings because many basic functions still expect them as arguments, or return them. Additionally, it gives some insight into the underlying structure of the string in memory.
Memory control. I recently had to handle strings (actually blobs from a database) about 200-300 MB in size, in a massively multithreaded application. It was a situation where just-one-more copy of the string might have burst the 32bit address space. I had to know exactly how many copies of the string existed. Although I'm an STL evangelist, I used char * then because it gave me the guarantee that no extra memory or even extra copy was allocated. I knew exactly how much space it would need.
Apart from that, standard STL string processing misses out on some great C functions for string processing/parsing. Thankfully, std::string has the c_str() method for const access to the internal buffer. To use printf() you still have to use char * though (what a crazy idea of the C++ team to not include (s)printf-like functionality, one of the most useful functions EVER in C). I hope boost::format will soon be included in the STL.
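In the meantime, printf-style formatting into a std::string can be had with a thin wrapper around std::snprintf (C++11). A minimal sketch follows; format_string is a made-up name, and the arguments must be types printf understands, so pass s.c_str() rather than a std::string:

#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: formats like printf and returns the result as a
// std::string. The first snprintf call only measures, the second one writes.
template <typename... Args>
std::string format_string(const char* fmt, Args... args) {
    const int needed = std::snprintf(nullptr, 0, fmt, args...);
    if (needed < 0) throw std::runtime_error("formatting failed");
    std::vector<char> buf(static_cast<std::size_t>(needed) + 1);
    std::snprintf(buf.data(), buf.size(), fmt, args...);
    return std::string(buf.data(), static_cast<std::size_t>(needed));
}

Usage would look like format_string("%s has %d items", name.c_str(), 3). Unlike a stream-based solution, nothing checks the arguments against the format string beyond whatever compiler warnings you have enabled.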
If the C++ code is "deep" (close to the kernel, heavily dependent on C libraries, etc.) you may want to use C strings explicitly to avoid lots of conversions into and out of std::string. Or, if you're interfacing with other language domains (Python, Ruby, etc.) you might do so for the same reason. Otherwise, use std::string.
Some posts mention memory concerns. That might be a good reason to shun std::string, but char* probably is not the best replacement. It's still an OO language. Your own string class is probably better than a char*. It may even be more efficient - you can apply the Small String Optimization, for instance.
In my case, I was trying to get about 1GB worth of strings out of a 2GB file, stuff them in records with about 60 fields and then sort them 7 times on different fields. My predecessor's code took 25 hours with char*; my code ran in 1 hour.
1) A "string constant" is a C string (const char *). Converting it to const std::string& is a run-time process, not necessarily simple or optimized.
2) fstream library uses c-style strings to pass file names.
My rule of thumb is to pass const std::string& if I am about to use the data as std::string anyway (say, when I store them in a vector), and const char * in other cases.
After spending far, far, too much time debugging initialization rules and every conceivable string implementation on several platforms we require static strings to be const char*.
After spending far, far, too much time debugging bad char* code and memory leaks I suggest that all non-static strings be some type of string object ... until profiling shows that you can and should do something better ;-)
Legacy code that doesn't know of std::string. Also, before C++11, opening files with std::ifstream or std::ofstream was only possible with a const char* as the file name argument.
Given the choice, there is generally no reason to choose primitive C strings (char*) over C++ strings (std::string). However, often you don't have the luxury of choice. For instance, before C++11 std::fstream's constructors took only C strings, for historical reasons. Also, C libraries (you guessed it!) use C strings.
In your own C++ code it is best to use std::string and extract the object's C string as needed by using the c_str() function of std::string.
It depends on the libraries you're using. For example, when working with the MFC, it's often easier to use CString when working with various parts of the Windows API. It also seems to perform better than std::string in Win32 applications.
However, std::string is part of the C++ standard, so if you want better portability, go with std::string.
For applications such as most embedded platforms where you do not have the luxury of a heap to store the strings being manipulated, and where deterministic preallocation of string buffers is required.
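As a sketch of what deterministic preallocation can look like, here is a tiny fixed-capacity string that never touches the heap. The class name FixedString and its interface are invented for illustration; real embedded code bases usually have their own, richer version:

#include <array>
#include <cstddef>
#include <cstring>

// Fixed-capacity, null-terminated string with all storage inside the object.
template <std::size_t Capacity>
class FixedString {
public:
    // Appends as much of text as fits; returns false if it had to truncate.
    bool append(const char* text) {
        const std::size_t len = std::strlen(text);
        const std::size_t room = Capacity - size_;
        const std::size_t n = len < room ? len : room;
        std::memcpy(data_.data() + size_, text, n);
        size_ += n;
        data_[size_] = '\0';
        return n == len;
    }
    const char* c_str() const { return data_.data(); }
    std::size_t size() const { return size_; }

private:
    std::array<char, Capacity + 1> data_{};  // +1 for the terminator
    std::size_t size_ = 0;
};

The memory footprint is known at compile time (sizeof(FixedString<N>)), and overflow is reported to the caller instead of silently writing past the buffer.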
c strings don't carry the overhead of being a class.
c strings generally can result in faster code, as they are closer to the machine level.
This is not to say you can't write bad code with them. They can be misused, like every other construct.
There is a wealth of library calls that demand them for historical reasons.
Learn to use c strings, and stl strings, and use each when it makes sense to do so.
STL strings are certainly far easier to use, and I don't see any reason to not use them.
If you need to interact with a library that only takes C-style strings as arguments, you can always call the c_str() method of the string class.
The usual reason to do it is that you enjoy writing buffer overflows in your string handling. Counted strings are so superior to terminated strings it's hard to see why the C designers ever used terminated strings. It was a bad decision then; it's a bad decision now.
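One concrete difference between counted and terminated strings shows up as soon as the data contains a zero byte. A small sketch (the function names are made up):

#include <cstddef>
#include <cstring>
#include <string>

// Builds a 7-byte string with an embedded NUL in the middle.
std::string make_record() {
    return std::string("abc\0def", 7);
}

// A counted string knows its own length.
std::size_t counted_length(const std::string& s) { return s.size(); }

// A terminated string has to be scanned and stops at the first NUL.
std::size_t terminated_length(const std::string& s) { return std::strlen(s.c_str()); }

For the record above, the counted length is 7 while the terminated view sees only the first 3 characters.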

How do you handle strings in C++?

Which is your favorite way to go with strings in C++? A C-style array of chars? Or wchar_t? CString, std::basic_string, std::string, BSTR or CComBSTR?
Certainly each of these has its own area of application, but anyway, which is your favorite and why?
std::string or std::wstring, depending on your needs. Why?
They're standard
They're portable
They can handle I18N
They have performance guarantees (as per the standard)
Protected against buffer overflows and similar attacks
Are easily converted to other types as needed
Are nicely templated, giving you a wide variety of options while reducing code bloat and improving performance. Really. Compilers that can't handle templates are long gone now.
A C-style array of chars is just asking for trouble. You'll still need to deal with them on occasion (and that's what std::string.c_str() is for), but, honestly -- one of the biggest dangers in C is programmers doing Bad Things with char* and winding up with buffer overflows. Just don't do it.
An array of wchar_t is the same thing, just bigger.
CString, BSTR, and CComBSTR are not standard and not portable. Avoid them unless absolutely forced. Optimally, just convert a std::string/std::wstring to them when needed, which shouldn't be very expensive.
Note that std::string is just a typedef for std::basic_string<char>, but you're still better off using std::string unless you have a really good reason not to. Really Good. Let the compiler take care of the optimization in this situation.
std::string !!
There's a reason why they call it a "Standard".
basic_string is an implementation detail and should be ignored.
BSTR & CComBSTR only for interOp with COM, and only for the moment of interop.
std::string unless I need to call an API that specifically takes one of the others that you listed.
Here's an article comparing the most common kinds of strings in C++ and how to convert between them. Unraveling Strings in Visual C++
If you can use MFC, use CString. Otherwise use std::string. Plus, std::string works on any platform that supports standard C++.
When I have a choice (I usually don't), I tend to use std::string with UTF-8 encoding (and the help of the UTF8 CPP library). Not that I like std::string that much, but at least it is standard and portable.
Unfortunately, in almost all real-life projects I've worked on, there have been internal string classes - most of them actually better than std::string, but still...
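The catch with UTF-8 in a std::string is that size() counts bytes, not characters. A minimal sketch of counting code points by hand, assuming the input is valid UTF-8 (utf8_length is a made-up name; a library such as UTF8 CPP also validates the input):

#include <cstddef>
#include <string>

// Counts code points by skipping continuation bytes, which all have the
// bit pattern 10xxxxxx. Assumes the string holds valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++count;
    }
    return count;
}

For example, "h\xC3\xA9llo" (the word "héllo") occupies 6 bytes but contains 5 code points.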
I am a Qt dev, so of course I tend to use QString whenever possible :).
It's quite nice: unicode compliant, thread-safe implicit-sharing (aka copy-on-write), and it comes with an API designed to solve practical real-world problems (split, join, replace (with and without regex), conversion to/from numbers...)
If I can't use QString, then std::wstring. If you are stuck with C, I recommend glib GString.
I use std::string (or basic_string<TCHAR>) whenever I can. It's quite versatile (just like CStringT), it's type-safe (unlike printf), and it's available on every platform.
Other, std::wstring.
std::string is 20th century technology. Use Unicode, and sell to 6 billion people instead of 300 million.
C-style char arrays have their place, but if you use them extensively you are asking to waste time debugging off-by-one errors. We have our own string class tailored for use in our (embedded) development environment.
We don't use std::string because it isn't always available for us.
If you're using MFC, use CString. Otherwise I agree with most of the others, std::string or std::wstring all the way.
Microsoft could have done the world a huge favor by adding std::basic_string<TCHAR> overloads in their latest update of MFC.
I like to use TCHAR, which is a define for wchar_t or char according to the project's settings.
It's defined in tchar.h where you can find all of the related definitions for functions and types you need.
std::string and std::wstring if I can, and something else if I have to.
They may not be perfect, but they are well tested, well understood, and very versatile. They play nicely with the rest of the standard library which is also a huge bonus.
Also worth mentioning, stringstreams.
std::string is better than nothing, but it's annoying that it's missing basic functionality like split, join and even a decent format call...
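Both complaints can be patched over with stringstreams in a few lines. A rough sketch: split and join are hypothetical helpers, and this split drops a trailing empty field and returns an empty vector for an empty input:

#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits text at every occurrence of delim using std::getline.
std::vector<std::string> split(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::istringstream in(text);
    std::string part;
    while (std::getline(in, part, delim)) parts.push_back(part);
    return parts;
}

// Concatenates parts, putting sep between neighbouring elements.
std::string join(const std::vector<std::string>& parts, const std::string& sep) {
    std::ostringstream out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out << sep;
        out << parts[i];
    }
    return out.str();
}

So join(split("a,b,c", ','), "-") gives "a-b-c".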
Unicode is the future. Do not use char* and std::string. Please :)
I am tired of localization bugs.