When to use char array instead of strings in c++? - c++

How do we differentiate char arrays and string in c++?
Is there anything char arrays do better than std::string ?

How do we differentiate char arrays and string in c++?
You don't, string literals are by definition null-terminated char arrays. Since arrays decay into pointers the first chance they get, const char* is (still) often a synonym for string.
If you are asking about when you should write new char[n], the answer is never. If anything, it should be std::make_unique<char[]>(n); and unless you are writing your own version of std::string, use the standard one. If you need a buffer, use std::vector or std::array.
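As a minimal sketch of the "use std::vector as a buffer" advice, assuming a C-style API that writes into a caller-supplied char buffer (std::snprintf stands in for such an API here, and format_id is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats an integer through a C-style API
// that fills a caller-supplied char buffer.
std::string format_id(int id) {
    std::vector<char> buffer(32);  // owns its memory; released automatically
    // std::snprintf writes at most buffer.size() bytes, terminator included.
    std::snprintf(buffer.data(), buffer.size(), "id-%d", id);
    return std::string(buffer.data());  // copy up to the terminating '\0'
}

inline void format_id_example() {
    assert(format_id(42) == "id-42");
}
```

No new[] or delete[] appears anywhere, and the buffer is released even if an exception is thrown between allocation and return.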
There are some advantages of const char[] constants over const std::string but they are being "solved" by the new C++ Standards:
Before C++20, std::string could not be used in constexpr context. So, I still prefer declaring global string constants with constexpr const char[] if all I do is just passing them to some function. As @HolyBlackCat mentioned in the comments, C++17 std::string_view makes this use case obsolete too, especially with the new sv literal:
#include <string_view>
using namespace std::literals;
//Compile-time string_view
constexpr auto str = "hello"sv;
const char* is somewhat more universal. You can pass it to a function accepting const char*, std::string, or std::string_view. The reverse requires std::string::c_str() and it is not possible to do so without copying the std::string_view.
There is no dynamic allocation involved. Although std::string might employ SSO, it is not guaranteed. This might be relevant for very small systems where the heap is precious and the program flash memory is more accommodating and contains the literal anyway.
Interacting with old libraries. But even then, std::string is null-terminated too.
Overall, my recommendation would be to use std::string_view every chance you get - for any non-owning string, including holding string literals. Most importantly, it should replace const char* and const std::string& function parameters. If you want to own a string, use std::string.
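To illustrate the recommendation, here is a small sketch (C++17) of a function that takes std::string_view and can be called with a literal, a std::string, or another view without copying. The function name has_prefix is made up for this example:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical function: one non-owning parameter type serves all callers.
bool has_prefix(std::string_view text, std::string_view prefix) {
    // substr clamps the count, so a prefix longer than text is handled safely.
    return text.substr(0, prefix.size()) == prefix;
}

inline void has_prefix_examples() {
    const char* c_string = "hello world";
    std::string owned = "hello world";
    std::string_view view = owned;
    assert(has_prefix(c_string, "hello"));  // const char* converts implicitly
    assert(has_prefix(owned, "hello"));     // std::string converts implicitly
    assert(has_prefix(view, "hello"));      // already a view
}
```

The same function declared with const std::string& would construct a temporary std::string for the first call.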

One reason (which I personally don't think is a very good one) to use char arrays is the use case when you want your code to compile with both a C compiler and a C++ compiler.

Related

c++20 handling of string literals

We’re updating a project to c++20, and are running into errors where we pass string literals into functions which take char *. I know this has been changed to make code more safe, but we are interfacing with libraries which we cannot change.
I’d rather not disable the strict treatment of literals via compiler flags, so is there a good way to wrap these literals just in these particular cases?
I was thinking of an inline function, that was named something specific to the library, that internally would use const_cast. That way later if we want to change the code because the library gets updated, we know exactly where to look.
Any other ideas?
"Any other ideas?"
static char my_string[] = "string";
...
//elsewhere in the code
library_function(my_string);
The only difference between passing a string like that, and passing a string literal is the section of the assembly the data is stored in.
A string literal is typically stored in a non-modifiable section (.rodata on many toolchains, .text on some).
The non-const string will be stored in .data.
If you really, really care if you're passing a function a pointer to .text or a pointer to .data, and you really, really, trust the library to not modify the parameter now and for ever, then you can certainly cast away the const-ness of your string literals.
Ignoring the fact that documentation lags behind implementation, even if we could believe the documentation promise to not modify its inputs, if it doesn't enforce it through the interface, at any time, on purpose or on accident, that input could be modified.
The following user-defined literal creates a std::string for each literal and implicitly converts it to char*.
#include <cstddef>
#include <string>
struct stringlitwrapper
{
    constexpr stringlitwrapper(const char* c) : s(c) {}
    operator char*() { return s.data(); }  // non-const data() requires C++17
    std::string s;                         // mutable copy of the literal
};
constexpr stringlitwrapper operator"" _w (const char* c, std::size_t n)
{
    return stringlitwrapper(c);
}
void libfunction(char* param) {
    // uses non-const char* as parameter
}
int main() {
    libfunction("string literal"_w);
    return 0;
}
For compilers that do not support constexpr here (for example, MSVC accepts it but Clang does not), drop both constexpr specifiers.
By internally storing the literal as non-const string, there is no undefined behaviour involved.
(The library function of course should not overwrite at all or at least not write over the end of the string.)
To prevent heap allocations, the std::string could be replaced by a char array with fixed (maximal) size.
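The fixed-size idea could look roughly like the sketch below. The names fixedlitwrapper and legacy_length are hypothetical; the second only stands in for a library function that takes a non-const char*. Literals longer than the capacity are silently truncated in this version, which may or may not be acceptable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity replacement for the std::string member.
// Capacity must be at least 1.
template <std::size_t Capacity>
struct fixedlitwrapper
{
    explicit fixedlitwrapper(const char* c)
    {
        // strncpy copies at most Capacity - 1 characters and zero-pads the rest;
        // the last byte is set explicitly so the buffer is always terminated.
        std::strncpy(buffer, c, Capacity - 1);
        buffer[Capacity - 1] = '\0';
    }
    operator char*() { return buffer; }
    char buffer[Capacity];  // lives inside the object, no heap allocation
};

// Hypothetical stand-in for a library function taking a non-const char*.
std::size_t legacy_length(char* param) { return std::strlen(param); }

inline void fixedlitwrapper_example()
{
    // The temporary lives until the end of the full expression,
    // so the pointer stays valid for the duration of the call.
    assert(legacy_length(fixedlitwrapper<32>("string literal")) == 14);
}
```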

Why standard does not provide specialization for hashing c strings with content examination

I am reading about hashing and I found following statement in here:
There is no specialization for C strings. std::hash<const char*> produces a hash of the value of the pointer (the memory address), it does not examine the contents of any character array.
Why there is no mechanism that hashes C strings that examine their content?
I was browsing code for std::string_view and it seems like it would come handy to cover both: std::string and std::string_view.
Edit
Thanks for the comments. I think I wasn’t very clear. I should ask why there is no functionality to hash c strings with giving its length as an argument? hash(const char* data, size_t size) would handle nul, and not nul terminated c strings.
A specialization is not possible because "C-strings" are just pointers with the implication that they're null-terminated. A const char * could just as well be a pointer to some raw byte data, not necessarily a C-string. You simply can't assume that all const char * are null-terminated.
Your suggestion with std::string_view is not a good idea if you are using std::string, because when the associated std::string gets destroyed, the std::string_view becomes invalid.
Your suggestion would only work if you can guarantee that the associated C-string or std::string stays alive as long as the std::string_view is stored inside the map.
It is the programmer's responsibility to ensure that the resulting string view does not outlive the string.
(see std::basic_string<CharT,Traits,Allocator>::operator basic_string_view)
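For the hash(const char* data, size_t size) shape asked about in the edit, one possible sketch (C++17) is to wrap the pointer and length in a temporary std::string_view and reuse its std::hash specialization. The helper name hash_chars is an assumption, not a standard function:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical helper: hashes the characters, not the pointer value.
// Works for data with or without a null terminator, since the size is explicit.
std::size_t hash_chars(const char* data, std::size_t size) {
    return std::hash<std::string_view>{}(std::string_view(data, size));
}

inline void hash_chars_example() {
    char first[] = "key";
    char second[] = "key";  // same content, different address
    assert(hash_chars(first, 3) == hash_chars(second, 3));
    // The standard requires std::hash of a string and of a view
    // over the same characters to agree.
    assert(hash_chars(first, 3) == std::hash<std::string>{}(std::string("key")));
}
```

The view exists only for the duration of the call, so the lifetime concern above applies only when views are stored, for example as map keys.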

Using auto foo = "bar" vs std::string in C++11

I'm searching for a substring using string::find in C++. When I defined a string using const auto and used the variable later down, eclipse replaced . with ->.
I found this SO thread which concludes that auto foo = "bar" is deduced to a (const char *) foo = "bar". So eclipse is correct converting . to -> even though I was a bit baffled to begin with. I assumed incorrectly auto would become std::string.
Would there be a downside deducing auto foo = "bar" to std::string instead of const char * ? Increased code size, slower performance?
Your code could have a million classes that can be constructed implicitly from a const char *. Why should std::string be chosen?
auto simply saves you some typing when you want a variable with the same type as the expression¹, not when you want to create a different object.
(1) more or less; things as always get somewhat hairy with C++...
Well, likely, you have just answered your own question. std::string takes slightly more space (it has size counter), its creation involves dynamic allocation etc.
The lack of a complex string type may seem an anachronism nowadays, but since C++ is oriented toward a complete replacement of C with its low-level efficiency, it's pretty explainable.
Moreover, std::string is just a library class. you can choose a different string type, e.g. QString or std::experimental::string_view, if your task requires it. BTW, string_view is much more similar to const char[] since it doesn't provide dynamic manipulations at all and can be used in constexpr
"Foobar" is a string literal and not a std::string. This is stored as const char[7] in a read only section of your binary.
The std::string type has an implicit conversion from const char * because it has a single-argument constructor that is not marked explicit, which is invoked if you write: std::string s = "foobar";. Note that the allocator parameter of that constructor has a default argument.
Using const auto gives you the actual type instead of a converted type. Converting a string literal to std::string actually creates another object that holds its own copy of the literal's characters.
http://en.cppreference.com/w/cpp/language/string_literal
http://en.cppreference.com/w/cpp/string/basic_string
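A short sketch of the deduction discussed here. The variable names are illustrative, and the s suffix needs C++14, which is newer than the C++11 in the question:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

using namespace std::string_literals;

auto as_pointer = "bar";   // deduced as const char* (the array decays)
auto as_string  = "bar"s;  // the s suffix asks for a std::string explicitly

static_assert(std::is_same<decltype(as_pointer), const char*>::value,
              "auto keeps the decayed type of the literal");
static_assert(std::is_same<decltype(as_string), std::string>::value,
              "the s suffix produces std::string");

inline void deduction_example() {
    assert(std::strlen(as_pointer) == 3);  // C API for the pointer
    assert(as_string.find("ar") == 1);     // member functions for the string
}
```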

C++ - char* vs. string*

If I have a pointer that points to a string variable array of chars, is there a difference between typing:
char *name = "name";
And,
string name = "name";
Yes, there’s a difference. Mainly because you can modify your string but you cannot modify your first version – but the C++ compiler won’t even warn you that this is forbidden if you try.
So always use the second version.
If you need to use a char pointer for whatever reason, make it const:
char const* str = "name";
Now, if you try to modify the contents of str, the compiler will forbid this (correctly). You should also push the warning level of your compiler up a notch: then it will warn that your first code (i.e. char* str = "name") is legal but deprecated.
For starters, you probably want to change
string *name = "name";
to read
string name = "name";
The first version won't compile, because a string* and a char* are fundamentally different types.
The difference between a string and a char* is that the char* is just a pointer to the sequence. This approach of manipulating strings is based on the C programming language and is the native way in which strings are encoded in C++. C strings are a bit tricky to work with - you need to be sure to allocate space for them properly, to avoid walking off the end of the buffer they occupy, to put them in mutable memory to avoid segmentation faults, etc. The main functions for manipulating them are in <cstring>. Most C++ programmers advise against the use of C-style strings, as they are inherently harder to work with, but they are still supported both for backwards compatibility and as a "lowest common denominator" that low-level APIs can build on.
A C++-style string is an object encapsulating a string. The details of its memory management are not visible to the user (though you can be guaranteed that all the memory is contiguous). It uses operator overloading to make some common operations like concatenation easier to use, and also supports several member functions designed to do high-level operations like searching, replacing, substrings, etc. They also are designed to interoperate with the STL algorithms, though C-style strings can do this as well.
In short, as a C++ programmer you are probably better off using the string type. It's safer and a bit easier to use. It's still good to know about C-style strings because you will certainly encounter them in your programming career, but it's probably best not to use them in your programs where string can also be used unless there's a compelling reason to do so.
Yes, the second one isn't valid C++! (It won't compile).
You can create a string in many ways, but one way is as follows:
string name = "name";
Note that there's no need for the *, as we don't need to declare it as a pointer.
char* name = "name" should be invalid, but most compilers accept it for backward compatibility with the days before const existed, because rejecting it would break large amounts of legacy code. It usually gets a warning though, and since C++11 the conversion is formally ill-formed.
The danger is that you get a pointer to writable data (writable according to the rules of C++) but if you actually tried writing to it you would invoke Undefined Behaviour, and the language rules should attempt to protect you from that as much as is reasonably possible.
The correct construct is
const char * name = "name";
There is nothing wrong with the above, even in C++. Using string is not always more correct.
Your second statement should really be
std::string name = "name";
string is a class (actually a typedef of basic_string<char, char_traits<char>, allocator<char>>) defined in the standard library and therefore in namespace std (as are basic_string, char_traits and allocator).
There are various scenarios where using string is far preferable to using arrays of char. In your immediate case, for example, you CAN modify it. So
name[0] = 'N';
(convert the first letter to upper-case) is valid with string and not with the char* (undefined behaviour) or const char * (won't compile). You would be allowed to modify the string if you had char name[] = "name";
However, if you want to append a character to the string, the std::string construct is the only one that will allow you to do that cleanly. With the old C API you would have to use strcat(), but that would not be valid unless you had allocated enough memory to do that.
std::string manages the memory for you so you do not have to call malloc() etc. Actually allocator, the 3rd template parameter, manages the memory underneath - basic_string makes the requests for how much memory it needs but is decoupled from the actual memory allocation technique used, so you can use memory pools, etc. for efficiency even with std::string.
In addition basic_string does not actually perform many of the string operations which are done instead through char_traits. (This allows it to use specialist C-functions underneath which are well optimised).
std::string therefore is the best way to manage your strings when you are handling dynamic strings constructed and passed around at run-time (rather than just literals).
You will rarely use a string* (a pointer to a string). If you do so it would be a pointer to an object, like any other pointer. You would not be able to allocate it the way you did.
The C++ string class encapsulates a C-like char string and is much more convenient (http://www.cplusplus.com/reference/string/string/).
For legacy code, you can always "extract" a char pointer from a string variable and work with it as a char pointer:
#include <cstring>
#include <string>
std::string str("Please split this phrase into tokens");
char* cstr = new char[str.size() + 1];  // +1 for the null terminator
std::strcpy(cstr, str.c_str());         // c_str() returns a null-terminated const char*
// Since C++11, str.data() is null-terminated as well.
// ... use cstr ...
delete[] cstr;                          // the copy must be released manually
Yes, char* is a pointer to an array of characters, which is a string. string * is a pointer to an array of std::string (which is very rarely used).
string *name = "name";
"name" is a const char*, and it will never be converted to a std::string*. This results in a compile error.
The valid declaration:
string name = "name";
or
const char* name = "name"; // char* name = "name" is valid, but deprecated
string *name = "name";
Does not compile in GCC.

C++: Is "my text" a std::string, a *char or a c-string?

I have just done what appears to be a common newbie mistake:
First we read one of many tutorials that goes like this:
#include <fstream>
int main() {
using namespace std;
ifstream inf("file.txt");
// (...)
}
Secondly, we try to use something similar in our code, which goes something like this:
#include <fstream>
int main() {
using namespace std;
std::string file = "file.txt"; // Or get the name of the file
// from a function that returns std::string.
ifstream inf(file);
// (...)
}
Thirdly, the newbie developer is perplexed by some cryptic compiler error message.
The problem is that ifstream takes const char * as a constructor argument.
The solution is to convert std::string to const char *.
Now, the real problem is that, for a newbie, "file.txt" or similar examples given in almost all the tutorials very much looks like a std::string.
So, is "my text" a std::string, a c-string or a *char, or does it depend on the context?
Can you provide examples on how "my text" would be interpreted differently according to context?
[Edit: I thought the example above would have made it obvious, but I should have been more explicit nonetheless: what I mean is the type of any string enclosed within double quotes, i.e. "myfilename.txt", not the meaning of the word 'string'.]
Thanks.
So, is "string" a std::string, a c-string or a *char, or does it depend on the context?
Neither C nor C++ have a built-in string data type, so any double-quoted strings in your code are essentially const char * (or const char [] to be exact). "C string" usually refers to this, specifically a character array with a null terminator.
In C++, std::string is a convenience class that wraps a raw string into an object. By using this, you can avoid having to do (messy) pointer arithmetic and memory reallocations by yourself.
Most standard library functions still take only char * (or const char *) parameters.
You can implicitly convert a char * into std::string because the latter has a constructor to do that.
You must explicitly convert a std::string into a const char * by using the c_str() method.
Thanks to Clark Gaebel for pointing out constness, and jalf and GMan for mentioning that it is actually an array.
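The two conversion directions can be sketched as follows. c_length and cpp_length are hypothetical functions that stand in for an API taking const char * and one taking std::string:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style API: expects a null-terminated character sequence.
std::size_t c_length(const char* s) { return std::strlen(s); }

// Hypothetical C++-style API: expects a std::string.
std::size_t cpp_length(const std::string& s) { return s.size(); }

inline void conversion_examples() {
    std::string file = "file.txt";
    assert(c_length("file.txt") == 8);    // the literal decays to const char*
    assert(cpp_length("file.txt") == 8);  // a temporary std::string is built implicitly
    assert(c_length(file.c_str()) == 8);  // std::string to const char* must be explicit
    // c_length(file);                    // would not compile: no implicit conversion
}
```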
"myString" is a string literal, and has the type const char[9], an array of 9 constant char. Note that it has enough space for the null terminator. So "Hi" is a const char[3], and so forth.
This is pretty much always true, with no ambiguity. However, whenever necessary, a const char[9] will decay into a const char* that points to its first element. And std::string has an implicit constructor that accepts a const char*. So while it always starts as an array of char, it can become the other types if you need it to.
Note that string literals have the unique property that const char[N] can also decay into char*, but this behavior is deprecated (and was removed in C++11). If you try to modify the underlying string this way, you end up with undefined behavior. It's just not a good idea.
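These claims about the literal's type can be checked at compile time. The sketch below uses only standard type traits; literal_extent is a hypothetical helper that binds to the array before it decays:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: takes the array by reference, so N is still known.
template <std::size_t N>
constexpr std::size_t literal_extent(const char (&)[N]) { return N; }

// "Hi" is two characters plus the null terminator.
static_assert(literal_extent("Hi") == 3, "const char[3]");

// A string literal is an lvalue, so decltype yields a reference to the array.
static_assert(std::is_same<decltype("myString"), const char (&)[9]>::value,
              "array of 9 const char");

// After decay, only a pointer to the first element remains.
static_assert(std::is_same<std::decay<decltype("myString")>::type, const char*>::value,
              "decays to const char*");
```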
std::string file = "file.txt";
The right hand side of the = contains a (raw) string literal (i.e. a null-terminated byte string). Its effective type is array of const char.
The = is a tricky pony here: No assignment happens. The std::string class has a constructor that takes a pointer to char as an argument and this is called to create a temporary std::string and this is used to copy-construct (using the copy ctor of std::string) the object file of type std::string.
The compiler is free to elide the copy ctor and directly instantiate file though.
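However, note that std::string is not the same thing as a C-style null-terminated string: it stores its length explicitly and may contain embedded null characters. Before C++11 its internal buffer was not even required to be null-terminated.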
ifstream inf("file.txt");
The std::ifstream class has a ctor that takes a const char * and the string literal passed to it decays to a pointer to the first element of the string.
The thing to remember is this: std::string provides (almost seamless) conversion from C-style strings. You have to look up the signature of the function to see if you are passing in a const char * or a std::string (the latter because of implicit conversions).
So, is "string" a std::string, a c-string or a char*, or does it depend on the context?
It depends entirely on the context. :-) Welcome to C++.
A C string is a null-terminated string, which is almost always the same thing as a char*.
Depending on the platforms and frameworks you are using, there might be even more meanings of the word "string" (for example, it is also used to refer to QString in Qt or CString in MFC).
The C++ standard library provides a std::string class to manage and represent character sequences. It encapsulates the memory management and is most of the time implemented as a C-string; but that is an implementation detail. It also provides manipulation routines for common tasks.
The std::string type will always be that (it doesn't have a conversion operator to char* for example, that's why you have the c_str() method), but it can be initialized or assigned to by a C-string (char*).
On the other hand, if you have a function that takes a std::string or a const std::string& as a parameter, you can pass a c-string (char*) to that function and the compiler will construct a std::string in-place for you. That would be a differing interpretation according to context as you put it.
Neither C nor C++ have a built-in string data type.
When the compiler encounters, during compilation, a double-quoted string that is referred to implicitly (see the code below), it stores the string itself in the program code/text and generates code to create a character array:
The array is created in static storage because it must persist to be referred to later.
The array is made constant because it must always contain the original data (Hello).
So in the end, what you have is a const char * to this constant static character array.
const char* v()
{
    const char* text = "Hello";  // points at the array in static storage
    return text;
    // Above code can be reduced to:
    // return "Hello";
}
During the program run, when control enters the function, it creates text, the pointer, on the stack; the constant array of 6 elements (including the null terminator '\0' at the end) lives in the static memory area. When control reaches the next line (const char* text = "Hello";), the starting address of the 6-element array is assigned to text. The next line (return text;) returns text. At the closing bracket text disappears from the stack, but the array is still in the static memory area.
Older compilers also accept a non-const char* here. But if you try to change a value in the static array through a non-const char*, the behavior is undefined (in practice often an error at run time), because the array is constant. So it is always good to make the return type const, to make sure the array cannot be referred to through a non-const pointer.
But if the compiler finds that a double-quoted string is explicitly used to initialize an array, it assumes that the programmer is going to (smartly) handle it. See the following wrong example:
const char* v()
{
    char text[] = "Hello";  // local array on the stack, filled from the literal
    return text;            // wrong: the array is destroyed when the function returns
}
During compilation, the compiler checks the quoted text and saves it as is in the code, to fill the generated array at run time. It also calculates the array size, in this case again 6.
During the program run, at the opening bracket, the array text[] with 6 elements is created on the stack, but not yet initialized. When control reaches (char text[] = "Hello";), the array is initialized (with the text in the compiled code). So the array is now on the stack. When control reaches (return text;), it returns the starting address of the array text. At the closing bracket, the array disappears from the stack, so there is no valid way to refer to it through the returned pointer.
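Two ways the wrong example could be repaired are sketched below (the function names are made up for illustration): return a pointer to the literal itself, or return an owning std::string by value.

```cpp
#include <cassert>
#include <string>

// Option 1: the literal has static storage duration,
// so a pointer to it stays valid after the function returns.
const char* greeting_literal() { return "Hello"; }

// Option 2: copy the local array into a std::string before it is destroyed;
// the caller receives its own characters.
std::string greeting_copy() {
    char text[] = "Hello";     // local array, gone once the function returns
    text[0] = 'h';             // modifying the local copy is fine
    return std::string(text);  // characters are copied out here
}

inline void greeting_examples() {
    assert(std::string(greeting_literal()) == "Hello");
    assert(greeting_copy() == "hello");
}
```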
Most standard library functions still take only char * (or const char *) parameters.
The Standard C++ library has a powerful class called string for manipulating text. The internal data structure for string is character arrays. The Standard C++ string class is designed to take care of (and hide) all the low-level manipulations of character arrays that were previously required of the C programmer. Note that std::string is a class:
You can implicitly convert a char * into std::string because the
latter has a constructor to do that.
You can explicitly convert a std::string into a const char * by using the c_str() method.
As often as possible it should mean std::string (or an alternative such as wxString, QString, etc., if you're using a framework that supplies such). Sometimes you have no real choice but to use a NUL-terminated byte sequence, but you generally want to avoid it when possible.
Ultimately, there simply is no clear, unambiguous terminology. Such is life.
To use the proper wording (as found in the C++ language standard), string is one of the varieties of std::basic_string (including std::string) from chapter 21.3 "String classes" (as in C++0x N3092), while the argument of ifstream's constructor is an NTBS (null-terminated byte string).
To quote, C++0x N3092 27.9.1.4/2.
basic_filebuf* open(const char* s, ios_base::openmode mode);
...
opens a file, if possible, whose name is the NTBS s