use of char * vs std::string in different environments

I have been using std::string in my code. I was going to make a std::string and pass it by reference. However, someone suggested using a char * instead, claiming that std::string is not reliable when porting code. Is that true? I have avoided char * because I would need to do the memory management for it myself, and I find std::string much easier to use.
Basically I have a 10 digit output that I am storing in this string. At the moment, I am not sure which would be better to use.

std::string is part of the C++ Standard, and has been since 1998. It is available in all the current C++ compilers. There really is no portability reason not to use it. If you have an API that needs to use a C-style string, you can use the std::string's c_str() member to get one from a string:
std::string s = "foo";
std::size_t n = std::strlen( s.c_str() );  // needs <cstring>; strlen returns a size_t, not an int

In C++, almost every string should be a std::string. If another library requires a C string, you should still use a std::string and pass its c_str(), unless you're calling functions that write into buffers.
However, if you're writing a library and exporting functions, it's better to use const char* parameters rather than std::string parameters for portability.

Using a char * you are sure that you will not get portability issues among libraries.
If a library exports a function that uses an std::string, it might have problems communicating with another library that has been linked against a different version of the standard library.

I think that there is nothing to worry about unless you are going to provide some API to 3rd party.
Just use std::string

There's nothing unportable about std::string that isn't also an issue with char *. std::string actually uses a char * internally...

string is better. There is nothing unreliable about it on any platform. If you're worried about passing large classes, you can pass const references of your strings into functions. Makes coding faster and less bug prone.

In addition to the fact that it's easier, std::string will probably be more efficient. On implementations that use the small string optimization, the 10 digits can be kept in the std::string object itself, instead of in another memory block on the heap.
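Whether the small string optimization applies depends on the standard library in use; the standard does not require it. A minimal sketch for checking it on a given toolchain follows. The helper name stored_inline is made up for this illustration, and it assumes that an inline buffer lies within the object's own bytes.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: returns true when the character buffer of `s`
// lives inside the std::string object itself (small string optimization),
// false when it points to a separate allocation.
bool stored_inline(const std::string& s) {
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(std::string);
    const char* data = s.data();
    // std::less gives a well-defined ordering even for unrelated pointers.
    std::less<const char*> before;
    return !before(data, begin) && before(data, end);
}
```

A 10 digit string is expected to be stored inline on the common current implementations, but that is an implementation detail rather than a guarantee. A string far larger than sizeof(std::string) can never be stored inline.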

Related

Are the methods in the <cstring> applicable for string class too?

I tried using memcpy() on strings but got a "no matching function call" error, although it works perfectly when I use a char[] array.
Can someone explain why?
www.cplusplus.com/reference/cstring/memcpy/

std::string is an object, not a contiguous array of bytes (which is what memcpy expects). std::string is not char*; std::string contains char* (somewhere really deep).
Although you can pull out the std::string inner byte array by using &str[0] (see note), I strongly encourage you not to. Almost anything you need to do already is implemented as a std::string method. Including appending, subtracting, transforming and anything that makes sense with a text object.
So yes, you can do something as stupid as:
std::string str (100,0);
memcpy(&str[0],"hello world", 11);
but you shouldn't.
Even if you do need memcpy behaviour, try to use std::copy instead.
Note: this is often done with C functions that expect some buffer, while the developer wants to maintain a RAII style in the code. So he or she creates a std::string object but passes it as a C string. But if you write clean C++ code you don't need to.
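A minimal sketch of the alternatives to memcpy, assuming the goal is simply to get the bytes of a C string into a std::string. The function names from_c_string and overwrite_prefix are illustrative only.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Preferred: let std::string manage the size and the copying itself.
std::string from_c_string(const char* text) {
    return std::string(text);
}

// If an existing string must be overwritten in place, std::copy works on
// iterators and needs no raw pointer into the string. At most
// target.size() characters are written, so the string never overflows.
void overwrite_prefix(std::string& target, const char* text) {
    const std::size_t n = std::min(std::strlen(text), target.size());
    std::copy(text, text + n, target.begin());
}
```

Unlike the memcpy version, overwrite_prefix clamps the copy to the current size of the target, so a source that is too long is truncated instead of writing past the end.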

Because there's no matching function call. You're trying to use C library functions with C++ types.

Is there an equivalent way to do CString::GetBuffer in std::string?

Many Windows API functions, such as GetModuleFileName, write their output to a char* buffer. But it is more convenient to use std::string. Is there a way to have them write to the buffer of a std::string (or std::wstring) directly?
Sorry for my poor English. I'm not a native English speaker. -_-
Taworn T.

If you're using C++0x (since published as C++11), then the following is guaranteed to work:
std::string s;
s.resize(max_length);
size_t actual_length = SomeApiCall(&s[0], max_length);
s.resize(actual_length);
Before C++0x, the contents of a std::string are not guaranteed to be contiguous in memory, so the code is not reliable in theory; in practice it works for popular STL implementations.
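The resize, write, resize pattern can be sketched end to end with a stand-in for the API call. fill_buffer below is a hypothetical C-style function invented for this illustration (it is not a Windows API); it writes at most `capacity` characters and returns the number written. The wrapper name read_into_string is likewise an assumption.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style API used only for illustration: copies a fixed
// message into `buffer`, writing at most `capacity` bytes, and returns
// the number of bytes written (no terminating null is written).
std::size_t fill_buffer(char* buffer, std::size_t capacity) {
    const char message[] = "C:\\example\\app.exe";
    const std::size_t length = std::strlen(message);
    const std::size_t n = length < capacity ? length : capacity;
    std::memcpy(buffer, message, n);
    return n;
}

// Lets the API write straight into the string's storage, then trims the
// string to the length actually produced. Relies on the C++11 guarantee
// that std::string storage is contiguous.
std::string read_into_string(std::size_t max_length) {
    std::string s;
    s.resize(max_length);
    if (max_length == 0) {
        return s;  // nothing to write into
    }
    const std::size_t actual = fill_buffer(&s[0], max_length);
    s.resize(actual);
    return s;
}
```

A real API may report truncation or errors differently (for example by returning the required size), so the second resize has to follow whatever that API documents.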

Use std::string::c_str() to retrieve a const char * that is null terminated.
std::string::data() also returns a const char *, but before C++11 that may not be null terminated.
But like zeuxcg says, I don't recommend writing directly into that buffer.

How can I avoid encoding mixups of strings in a C/C++ API?

I'm working on implementing different APIs in C and C++ and wondered what techniques are available for avoiding that clients get the encoding wrong when receiving strings from the framework or passing them back. For instance, imagine a simple plugin API in C++ which customers can implement to influence translations. It might feature a function like this:
const char *getTranslatedWord( const char *englishWord );
Now, let's say that I'd like to enforce that all strings are passed as UTF-8. Of course I'd document this requirement, but I'd like the compiler to enforce the right encoding, maybe by using dedicated types. For instance, something like this:
class Word {
public:
    static Word fromUtf8( const char *data ) { return Word( data ); }
    const char *toUtf8() { return m_data; }
private:
    Word( const char *data ) : m_data( data ) { }
    const char *m_data;
};
I could now use this specialized type in the API:
Word getTranslatedWord( const Word &englishWord );
Unfortunately, it's easy to make this very inefficient. The Word class lacks proper copy constructors, assignment operators, etc., and I'd like to avoid unnecessary copying of data as much as possible. Also, I see the danger that Word gets extended with more and more utility functions (like length or fromLatin1 or substr) and I'd rather not write Yet Another String Class. I just want a little container which avoids accidental encoding mixups.
I wonder whether anybody else has some experience with this and can share some useful techniques.
EDIT: In my particular case, the API is used on Windows and Linux using MSVC 6 - MSVC 10 on Windows and gcc 3 & 4 on Linux.

You could pass around a std::pair instead of a char*:
struct utf8_tag_t{} utf8_tag;
std::pair<const char*,utf8_tag_t> getTranslatedWord(std::pair<const char*,utf8_tag_t> englishWord);
The generated machine code should be identical on a decent modern compiler that uses the empty base class optimization for std::pair.
I don't bother with this though. I'd just use char*s and document that the input has to be utf8. If the data could come from an untrusted source, you're going to have to check the encoding at runtime anyway.
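For comparison, here is a sketch of a thin non-owning wrapper that carries the encoding in its type and copies only a pointer. The names Encoded, Utf8Tag, Latin1Tag and the toy translation are assumptions made for this illustration, and it presumes a compiler with solid template support (so probably not MSVC 6). It does not validate the bytes; it only makes mixing encodings a compile-time error.

```cpp
#include <cstring>

// Empty tag types; they exist only to make encodings distinct types.
struct Utf8Tag {};
struct Latin1Tag {};

// Non-owning view of a C string whose encoding is part of the type.
// Copying it copies only the pointer, never the characters.
template <typename EncodingTag>
class Encoded {
public:
    explicit Encoded(const char* data) : data_(data) {}
    const char* c_str() const { return data_; }
private:
    const char* data_;
};

typedef Encoded<Utf8Tag> Utf8;
typedef Encoded<Latin1Tag> Latin1;

// Hypothetical API function: accepts and returns UTF-8 only.
// Passing a Latin1 value or a bare const char* does not compile.
Utf8 getTranslatedWord(const Utf8& englishWord) {
    if (std::strcmp(englishWord.c_str(), "hello") == 0) {
        return Utf8("hallo");
    }
    return englishWord;  // no translation known; return the input unchanged
}
```

Because the constructor is explicit, a caller has to write Utf8(ptr) at the boundary, which is the one place where the encoding claim is made. The lifetime of the pointed-to data still has to be documented separately.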

I suggest that you use std::wstring.
Check out this other question for details.

The ICU project provides a Unicode support library for C++.

Why did this work with Visual C++, but not with gcc?

I've been working on a senior project for the last several months now, and a major sticking point in our team's development process has been dealing with rifts between Visual C++ and gcc. (Yes, I know we all should have had the same development environment.) Things are about finished up at this point, but I ran into a moderate bug just today that had me wondering whether Visual C++ is easier on newbies (like me) by design.
In one of my headers, there is a function that relies on strtok to chop up a string, do some comparisons and return a string with a similar format. It works a little something like the following:
int main()
{
    string a, b, c;
    //Do stuff with a and b.
    c = get_string(a,b);
}
string get_string(string a, string b)
{
    const char * a_ch, b_ch;
    a_ch = strtok(a.c_str(),",");
    b_ch = strtok(b.c_str(),",");
}
strtok is infamous for being great at tokenizing, but equally great at destroying the original string to be tokenized. Thus, when I compiled this with gcc and tried to do anything with a or b, I got unexpected behavior, since the separator used was completely removed in the string. Here's an example in case I'm unclear; if I set a = "Jim,Bob,Mary" and b="Grace,Soo,Hyun", they would be defined as a="JimBobMary" and b="GraceSooHyun" instead of staying the same like I wanted.
However, when I compiled this under Visual C++, I got back the original strings and the program executed fine.
I tried dynamically allocating memory to the strings and copying them the "standard" way, but the only way that worked was using malloc() and free(), which I hear is discouraged in C++. While I'm curious about that, the real question I have is this: Why did the program work when compiled in VC++, but not with gcc?
(This is one of many conflicts that I experienced while trying to make the code cross-platform.)
Thanks in advance!
-Carlos Nunez

This is an example of undefined behavior. You're passing the result of string::c_str(), a const char*, to strtok, which takes a char*. By modifying the contents of the std::string data, you're invoking undefined behavior (you should be getting warnings for this unless you're casting).
When are you checking the value of a and b? In get_string, or in main? get_string is passed copies of a and b, so strtok will most likely not alter the originals in main. However, it could, as you are invoking undefined behavior.
The "right way" to do this is to use malloc/free or new[]/delete[]. You're using a C function, so you're already guilty of the same crime as you would be using malloc/free. A relatively elegant yet safe way to approach this is:
char *ap = strdup(a.c_str());
const char *a_ch = strtok(ap, ",");
/* do whatever it is you do */
free(ap);
Also bear in mind that strtok uses global state, so it won't play well with threads.
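The same idea can be sketched without a manual free(), assuming a std::vector<char> as the writable scratch buffer. The helper name first_token is illustrative only.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Returns the first token of `input` using strtok on a private, writable
// copy, so the caller's string is never modified.
std::string first_token(const std::string& input, const char* delimiters) {
    // Copy the characters plus the terminating null into a mutable buffer.
    std::vector<char> buffer(input.c_str(), input.c_str() + input.size() + 1);
    const char* token = std::strtok(&buffer[0], delimiters);
    // strtok returns a null pointer when there is no token at all.
    return token ? std::string(token) : std::string();
}
```

The vector releases its memory automatically when the function returns, including when an exception is thrown. The caveat about strtok's global state still applies.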

strtok overwrites each delimiter it finds with a null character. That is not something you can do to constant data.
To make your code safe and cross-platform consider using boost::tokenizer.
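If Boost is not available, a similar result can be had with the standard library alone. This is a minimal sketch using std::getline with a delimiter, not the boost::tokenizer API; the function name split is an assumption.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits `input` on a single delimiter character without modifying it.
std::vector<std::string> split(const std::string& input, char delimiter) {
    std::vector<std::string> tokens;
    std::istringstream stream(input);
    std::string token;
    // std::getline reads up to (and discards) the next delimiter.
    while (std::getline(stream, token, delimiter)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Note one difference from strtok: consecutive delimiters produce an empty token here, whereas strtok skips over runs of delimiters.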

I think the code is working because of differences in string implementation. VC++ string implementation must be making copies when you pass them to a function that could potentially modify the string.

Static library API question (std::string vs. char*)

I have not worked with static libraries before, but now I need to.
Scenario:
I am writing a console app in Unix. I freely use std::string everywhere because it's easy to do so. However, I recently found out that I have to support it in Windows and a third party application would need API's to my code (I will not be sharing source, just the DLL).
With this in mind, can I still use std::string everywhere in my code but then provide them with char * when I code the API's? Would that work?

Yep. Use std::string internally and then just use const char * on the interface functions (which will be converted to std::strings on input).
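A rough sketch of that split, assuming a hypothetical exported function named greeting that fills a caller-supplied buffer. Only primitive types cross the boundary; std::string stays inside the library.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Internal implementation: free to use std::string.
static std::string make_greeting(const std::string& name) {
    return "Hello, " + name;
}

// Hypothetical exported function. Copies the result into `out`
// (truncating if needed, always null terminated when out_size > 0) and
// returns the full length of the result, excluding the terminating null.
extern "C" std::size_t greeting(const char* name, char* out, std::size_t out_size) {
    const std::string result = make_greeting(name ? name : "");
    if (out != nullptr && out_size > 0) {
        const std::size_t room = out_size - 1;
        const std::size_t n = result.size() < room ? result.size() : room;
        std::memcpy(out, result.data(), n);
        out[n] = '\0';
    }
    return result.size();
}
```

Letting the caller own the buffer avoids questions about who frees what. A production version would also need to keep exceptions such as std::bad_alloc from escaping through the extern "C" boundary, and would need the platform's export annotations, both omitted here.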

Why not just provide them with std::string?
It's standard C++, and I'd be very surprised if they didn't support it.

The question is what your clients will do with that pointer. It should of course be const char*, but if clients keep it and reference it later on, it's probably risky to use std::string internally: as soon as you operate on the strings yourself, there is no way to keep std::string from moving its memory, because its reference counting mechanism cannot work with exported char* pointers. As long as you don't touch the std::string objects, their memory won't move, and the pointer is safe.

There is no standardized C++ binary interface (at least I haven't heard of one), so projects built with different settings may turn out to be unlinkable. For example, Visual C++ provides a way to enable or disable iterator debug support. This is controlled by a macro, and the size of some data structures depends on it.
If two pieces of code compiled with different settings start to communicate using these data structures, the best thing you can get is a linker error. The other alternatives are worse: a consistent run-time error, an error that appears only in the release configuration, etc.
So if you don't want to restrict your users to single correct project settings set and compiler version, use only primitive data for interface. For internal implementation choose what is more convenient.

Adding to Poita_'s response:
consider unicode support
If you ever have to support localization, too, you'll be happy to have done it in the first place
when returning char/wchar_t const *, define the lifetime of the data. The best would be to have a project-wide "unless stated otherwise..." standard
Alternatively, you can return a copy that must be freed through a method exported by your library. (C++ clients can move that into a smart pointer to regain automatic memory management.)
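The last option might look roughly like the following. lib_get_name and lib_free_string are hypothetical exported functions invented for this sketch, and the client side assumes C++11 for std::unique_ptr.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Hypothetical library side: returns a heap copy that the caller must
// release with lib_free_string, never with free() or delete directly.
extern "C" char* lib_get_name() {
    const std::string name = "example";
    char* copy = new char[name.size() + 1];
    std::memcpy(copy, name.c_str(), name.size() + 1);
    return copy;
}

// Hypothetical library side: releases memory with the same allocator
// that produced it.
extern "C" void lib_free_string(char* text) {
    delete[] text;
}

// Client side: a deleter that routes destruction back into the library.
struct LibStringDeleter {
    void operator()(char* text) const { lib_free_string(text); }
};
typedef std::unique_ptr<char, LibStringDeleter> LibString;

// Wraps the raw pointer so it is released automatically.
LibString get_name() {
    return LibString(lib_get_name());
}
```

Freeing through the library matters because the client and the library may be linked against different runtime heaps, in which case memory allocated on one side must not be released on the other.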

std::string will work in, at the very least, Visual Studio C++ (and others), so why not just use that?
