Strings and character array - c++

What is the difference between the string and character array?
How can each element of the string be accessed in C++?

string manages its own memory; this is not so with an array of char except as a local variable.
In both cases you can access individual elements using [] (but in the case of string this is actually operator[]).
string has a lot of built-in functions that you don't easily get in a C++-friendly way with C-Strings.
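As a minimal sketch of element access in both forms (the helper names count_in_cstring and count_in_string are made up for illustration), the loop over a C-string has to stop at the terminating '\0', while the loop over a std::string can ask the object for its size:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts occurrences of `target` in a C-style string.
// The only way to find the end is to look for the terminating '\0'.
std::size_t count_in_cstring(const char* text, char target) {
    std::size_t count = 0;
    for (std::size_t i = 0; text[i] != '\0'; ++i) {
        if (text[i] == target) ++count;  // built-in [] on a pointer
    }
    return count;
}

// Hypothetical helper: same count for a std::string.
// size() bounds the loop, and [] here is std::string::operator[].
std::size_t count_in_string(const std::string& text, char target) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == target) ++count;
    }
    return count;
}
```

Neither form of [] checks bounds; std::string additionally offers at(), which throws std::out_of_range on a bad index.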

In C, they are the same: a string is a char array, and you have a lot of library functions to handle them, like sprintf, strcat, strcpy, strdup, strchr, strstr...
In C++, you can also use the standard library's std::string class, which provides an object-oriented string that you can manipulate more easily. The advantage is that the code is easier to read and you don't need to allocate and deallocate memory for the strings yourself.

Related

What are some good scenarios to use a cstring over a string?

I've made quite a few projects (small) now with C++ and was wondering, what would be some good scenarios in which it is better to use a cstring instead of a string?
Just to clarify, I'm not calling cstrings bad. I'm just genuinely interested if they are as important as regular strings in C++
std::string allocates dynamic memory at runtime for its character data (unless the std::string implementation employs "Short String Optimization" and the data is short enough to fit in the SSO buffer).
So, one scenario you may want to use C-style strings for is when you want to pass around string literals without allocating memory for them. Assigning a string literal to a std::string will allocate dynamic memory (if SSO is not used).
There can also be scenarios where you want to process character data without allocating new memory for extracted substrings. C-style strings can be good for that, too (though std::string_view in C++17 and later would generally be better for that).
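A rough sketch of the substring case with std::string_view (C++17), assuming a hypothetical helper named first_word: the returned view points into the caller's characters, so nothing is allocated or copied.

```cpp
#include <string_view>

// Hypothetical helper: returns the text up to the first space.
// The result is a view into the caller's data, not a new string.
std::string_view first_word(std::string_view text) {
    const auto space = text.find(' ');
    // If no space is found, find() returns npos and substr() yields the whole view.
    return text.substr(0, space);
}
```

The caller has to keep the underlying characters alive for as long as the view is used. A string literal is safe here; a temporary std::string is not.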
cstrings are just character arrays with a null character to terminate them, while std::string objects are dynamically allocated. With cstrings you have more control over memory; basically, memory management is the main advantage.

Should I always declare char array bigger than my string is?

So I know that you should declare char arrays to be one element bigger than the word you want to put there because of the \0 that has to be at the end, but what about char arrays that I don't want to use as words?
I'm currently writing a program in which I store an array of keyboard letters that have some function assigned to them. Should I still end this array with \0?
That is probably not necessary.
A null terminator is not a requirement for arrays of char; it is a requirement for "C-strings", things that you intend to use as unitary blobs of data, particularly if you intend to pass them to C API functions. It's the conventional way that the "length" of the string is determined.
But if you just want a collection of chars to use independently then knock yourself out.
We cannot see your code, but it sounds to me like you don't want or need it in this case.
The array should have, at least, the same number of elements as the data you will put there. So, if:
you don't need the '\0'
you won't place it there
you won't use routines that will depend on an '\0' to tell you the array size
... you are good with not using the trailing '\0'
If you're using C++, you should probably just use std::string or std::vector<char> or even std::array<char, N> and not worry about terminators.
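For the keyboard-letter case, a minimal sketch might look like this. The key set and the names bound_keys and is_bound_key are invented for illustration; the point is that std::array carries its own size, so no '\0' is needed.

```cpp
#include <algorithm>
#include <array>

// Hypothetical key bindings: four chars, no trailing '\0'.
// The array knows its own size, so nothing needs a terminator.
constexpr std::array<char, 4> bound_keys{'w', 'a', 's', 'd'};

// Hypothetical helper: true if `key` is one of the bound keys.
bool is_bound_key(char key) {
    return std::find(bound_keys.begin(), bound_keys.end(), key) != bound_keys.end();
}
```

Because there is no terminator, this array must not be passed to strlen, strcmp, or streamed as a const char*.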
It depends on usage. If you want to use it not just as a byte array but as a C-string, for example with standard string functions (strcmp and so on) or for output to a stream, your array should end with \0.
It depends on what you are trying to do, if you are trying to define a C-style string then, you need the terminator since the C-library won't be able to calculate the size of the string and other things if you don't...
In C++, though, the size of the string is already stored inside the std::string class along with the dynamic array of chars...
But if you just need a free container for storing characters where you don't need it to do C-string-like things... You are free to do:
char hello[128]; // 128 elements, do anything with them...
Without the terminator...
In your case, you are storing values, not creating a string, and you won't probably treat it as a string either, so doing it without the null-terminator, suffices...
\0 will certainly make it easier when you want to use functions like strlen, strcmp, strcat and the like, but it is not required.
An aside - We have an entire enterprise code base built upon strings (char arrays) with no null terminators in the database. Works just fine.

Does C++17 std::basic_string_view invalidate the use of C strings?

C++17 is introducing std::basic_string_view, a non-owning string type that stores only a pointer to the first element of a string and the size of the string. Is there still a reason to keep using C strings?
Is there still a reason to keep using C strings?
I think it would be fair to say that other than speaking to a C API, there has never been a reason to use C strings.
When designing an interface of a function or method that simply needs a read-only representation of characters, you will want to prefer std::string_view. E.g. searching a string, producing an upper-case copy, printing it, and so on.
When designing an interface that takes a copy of a string of characters, you should probably prefer first and last iterators. However, std::string_view could be thought of as a proxy for these iterators, so string_view is appropriate.
If you want to take ownership of a long string, probably prefer to pass std::string, either by value or by r-value reference.
When designing an object that marshals calls to a C API that expects null-terminated strings, you should prefer std::string or std::string const&, because its c_str() method will correctly yield a null-terminated string.
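As a small sketch of that marshalling case, std::strtol stands in here for any C API that expects a null-terminated string; the wrapper name parse_long is hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical wrapper around a C function that needs a null-terminated string.
// Taking std::string const& means c_str() is always safe to hand over.
long parse_long(const std::string& digits) {
    return std::strtol(digits.c_str(), nullptr, 10);
}
```

A caller holding a std::string_view would have to build a std::string from it first, since a view's data() is not guaranteed to be followed by a terminator.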
When storing strings in objects (which are not temporary proxies), prefer std::string.
Of course the use of const char* as an owner of data in C++ is never appropriate. There is always a better way. This has been true since C++98.
"Invalidate" has a technical meaning here which I think is unintentional. It sounds like "obviate" is the intended word.
You are still going to have to produce and consume C strings in order to interact with common APIs. For example, POSIX has open and execve, Win32 has the rough equivalents CreateFile and CreateProcess, and all of these functions operate on C strings. But in the end, you are still calling str.c_str() or str.data() in order to interact with these APIs, so that use of C strings is not going away, no matter whether str is a std::basic_string or a std::basic_string_view (which offers only data(), with no terminator guarantee).
You will still have to understand what C strings are in order to correctly use these APIs. While std::string guarantees a NUL terminator, std::string_view does not, and neither structure guarantees that there is no NUL byte somewhere inside the string. You will have to sanitize NUL bytes in the middle of your string in either case.
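To illustrate the embedded-NUL point, here is a minimal sketch with two hypothetical helpers. std::strlen stands in for whatever a C API does when it scans for the terminator.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical check: does the text contain a NUL byte before its end?
bool has_embedded_nul(std::string_view text) {
    return text.find('\0') != std::string_view::npos;
}

// What a C API would consider the length: it stops at the first NUL,
// even if the std::string itself is longer.
std::size_t length_seen_by_c_api(const std::string& text) {
    return std::strlen(text.c_str());
}
```

A string built as std::string("ab\0cd", 5) has size() 5, but a C function receiving its c_str() sees only "ab".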
This does not even touch on the wealth of 3rd party libraries which use C strings, or the cost of retrofitting your own code which uses C strings to one which uses std::string_view.

CString or char array which one is better in terms of memory

I read somewhere that usage of CString is costly. Can you clarify it with an example? Also, between CString and char array, which is better in terms of memory?
CString in addition to array of chars (or wide chars) contains string size, allocated buffer size, and reference counter (serving additionally as a lock flag). The buffer containing the array of chars may be significantly larger than the string it contains -- it allows to reduce the number of time-costly allocation calls. In addition, when the CString is set to be zero-sized, it still contains two wchar characters.
Naturally, when you compare the size of CString with the size of corresponding C-style array, the array will be smaller. However, if you want to manipulate your string as extensively as CString allows, you will eventually define your own variables for string size, buffer size and sometimes refcounter and/or guard flags. Indeed, you need to store your string size to avoid calling strlen each time you need it. You need to store separately your buffer size if you allow your buffer to be larger than the string length, and avoid calling reallocs each time you add to or subtract from the string. And so on -- you trade some small size increase for significant increases in speed, safety and functionality.
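The size-versus-buffer-size trade-off can be observed with a short sketch. It uses std::string as a stand-in, because CString is only available with MFC/ATL; that substitution, and the helper name count_capacity_changes, are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical experiment: append one char at a time and count how often
// the capacity (allocated buffer size) changes. Keeping the buffer larger
// than the string is what keeps the number of reallocations small.
std::size_t count_capacity_changes(std::size_t appended_chars) {
    std::string text;
    std::size_t changes = 0;
    std::size_t last_capacity = text.capacity();
    for (std::size_t i = 0; i < appended_chars; ++i) {
        text.push_back('x');
        if (text.capacity() != last_capacity) {
            ++changes;
            last_capacity = text.capacity();
        }
    }
    return changes;
}
```

On common standard library implementations the capacity grows geometrically, so appending thousands of characters triggers only a few dozen capacity changes rather than one per character.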
So, the answer depends on what you are going to do with the string. Suppose you want a string to store the name of your class for logging -- there a C-style string (const and static) will do fine. If you need a string to manipulate and use it extensively with MFC or ATL-related classes, use CString family types. If you need to manipulate string in the "engine" parts of your application that are isolated from its interface, and may be converted to other platforms, use std::string or write your own string type to suit your particular needs (this can be really useful when you write the "glue" code to place between the interface and the engine, otherwise std::string is preferable).
CString is from the MFC framework, which is specific to Windows. std::string is from the C++ standard. Both are library classes for managing strings in memory. std::string gives you code portability across platforms.
Using a raw array is always good for memory. However, you have to perform operations on strings, and that becomes difficult with a raw array: consider bounds checking, getting the string length, copying the array, resizing it because the string may grow, deleting the array, and so on. For all these problems, string utility classes are good wrappers. The string class keeps the actual string on the heap, and you have the overhead of the string class itself. However, it provides the functionality to manage the string memory, which you would otherwise have to write by hand.
Prefer std::string if you can, if not, use CString.
In almost all cases I encourage novice programmers to use std::string or CString(*). First, they will make significantly fewer errors. I have seen many buffer overruns, invalid memory accesses and memory leaks caused by erroneous use of C arrays.
So which is more efficient, CString / std::string or raw character arrays? Memory-wise, generally speaking, all that CString and std::string have in addition is one integer for the size. The question is, does it matter?
So which is more efficient in terms of performance? Well, it depends on what you are doing with it and how you are using your C-arrays. But passing CString or std::string around can be computationally more efficient than passing C-arrays. The problem with C-arrays is that you can't be sure who owns the memory and what kind it is (heap/stack/literal). Defensive programming results in more copies of arrays, just to be sure that the memory you hold will be valid for the entire time it is needed.
Why is std::string or CString more efficient than C-arrays if they are passed around by value? This is a bit more complicated, and the reasons differ between the two. For CString it is simple: it is implemented as a COW (copy-on-write) object. So when you have 5 objects that originate from one CString, they will not use more memory than one until you start to change one of them. std::string has stricter requirements and thus is not allowed to share memory with other std::string objects. But if you have a newer compiler (C++11 or later), std::string implements move semantics, so returning a string from a function results only in a copy of the pointer, not a reallocation.
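The move-semantics claim can be checked with a small sketch. The helper name move_reuses_buffer is hypothetical, and the result assumes a mainstream standard library and a string long enough not to fit in any small-string buffer.

```cpp
#include <string>
#include <utility>

// Hypothetical check: after a move, does the destination own the very
// same character buffer the source had? If so, no characters were copied.
bool move_reuses_buffer() {
    std::string source(1000, 'x');  // long enough to live on the heap
    const char* buffer_before = source.data();
    std::string destination = std::move(source);
    return destination.data() == buffer_before;
}
```

For short strings that fit in the SSO buffer the characters are copied instead, which is cheap because there are so few of them.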
There are very few cases where raw C arrays are good and practical idea.
*) If you are already programming against MFC, why not just use CString.

Why should one use std::string over c-style strings in C++?

"One should always use std::string over c-style strings(char *)" is advice that comes up for almost every source code posted here. While the advice is no doubt good, the actual questions being addressed do not permit to elaborate on the why? aspect of the advice in detail. This question is to serve as a placeholder for the same.
A good answer should cover the following aspects (in detail):
1) Why should one use std::string over c-style strings in C++?
2) What are the disadvantages (if any) of the practice mentioned in #1?
3) What are the scenarios where the opposite of the advice mentioned in #1 is a good practice?
1) std::string manages its own memory, so you can copy, create, destroy them easily.
2) You can't use your own buffer as a std::string.
3) You need to pass a c string / buffer to something that expects to take ownership of the buffer - such as a 3rd party C library.
Well, if you just need an array of chars, std::string provides little advantage. But face it, how often is that the case? By wrapping a char array with additional functionality like std::string does, you gain both power and efficiency for some operations.
For example, determining the length of an array of characters requires "counting" the characters in the array. In contrast, an std::string provides an efficient operation for this particular task. (see https://stackoverflow.com/a/1467497/129622)
1) For power, efficiency and sanity
2) Larger memory footprint than "just" a char array
3) When you just need an array of chars
3) The advice always use string of course must be taken with a pinch of common sense. String literals are const char[], and if you pass a literal to a function that takes a const char* (for example std::ifstream::open()) there's absolutely no point wrapping it in std::string.
A char* is basically a pointer to a character. What C does is frequently makes this pointer point to the first character in an array.
An std::string is a class that is much like a vector. Internally, it handles the storage of an array of characters, and gives the user several member functions to manipulate said stored array as well as several overloaded operators.
Reasons to use a char* over an std::string:
C backwards-compatibility.
Performance (potentially).
char*s have lower-level access.
Reasons to use an std::string over a char*:
Much more intuitive to use.
Better searching, replacement, and manipulation functions.
Reduced risk of segmentation faults.
Example :
char* must be used in conjunction with either a char array or a dynamically allocated char array. After all, a pointer is worthless unless it actually points to something. This is mainly used in C programs:
char somebuffer[100] = "a string";
char* ptr = somebuffer; // ptr now points to somebuffer
cout << ptr; // prints "a string"
somebuffer[0] = 'b'; // change somebuffer
cout << ptr; // prints "b string"
Notice that when you change 'somebuffer', what 'ptr' prints also changes. This is because somebuffer is the actual string in this case; ptr just points/refers to it.
With std::string it's less weird:
std::string a = "a string";
std::string b = a;
cout << b; // prints "a string"
a[0] = 'b'; // change 'a'
cout << b; // prints "a string" (not "b string")
Here, you can see that changing 'a' does not affect 'b', because 'b' is its own independent copy of the string.
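The two snippets above can be wrapped into self-contained functions so the difference is easy to check. The function names are made up for this sketch; the bodies follow the snippets line for line.

```cpp
#include <string>

// Pointer case: ptr aliases somebuffer, so the change is visible through it.
std::string seen_through_pointer() {
    char somebuffer[100] = "a string";
    char* ptr = somebuffer;   // ptr points at somebuffer's first char
    somebuffer[0] = 'b';      // modify the underlying array
    return std::string(ptr);  // "b string"
}

// std::string case: b is a separate copy, so changing a leaves it alone.
std::string seen_through_copy() {
    std::string a = "a string";
    std::string b = a;  // copies the characters
    a[0] = 'b';         // only a changes
    return b;           // "a string"
}
```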
But really, the major difference is that with char arrays, you are responsible for managing the memory, whereas std::string does it for you. In C++, there are very few reasons to use char arrays over strings. 99 times out of 100 you're better off with a string.
Until you fully understand memory management and pointers, just save yourself some headaches and use std::string.
Why should one use std::string over c-style strings in C++?
The main reason is it frees you from managing the lifetime of the string data. You can just treat strings as values and let the compiler/library worry about managing the memory.
Manually managing memory allocations and lifetimes is tedious and error prone.
What are the disadvantages (if any) of the practice mentioned in #1?
You give up fine-grained control over memory allocation and copying. That means you end up with a memory management strategy chosen by your toolchain vendor rather than chosen to match the needs of your program.
If you aren't careful you can end up with a lot of unneeded data copying (in a non-refcounted implementation) or reference count manipulation (in a refcounted implementation)
In a mixed-language project any function whose arguments use std::string or any data structure that contains std::string will not be able to be used directly from other languages.
What are the scenarios where the opposite of the advice mentioned in #1 is a good practice?
Different people will have different opinions on this but IMO
For function arguments, passing strings as "const char *" is a good choice since it avoids unnecessary copying/refcounting and gives the caller flexibility about what they pass in.
For things used in interoperation with other languages you have little choice but to use c-style strings.
When you have a known length limit it may be faster to use fixed-size arrays.
When dealing with very long strings it may be better to use a construction that will definitely be refcounted rather than copied (such as a character array wrapped in a shared_ptr) or indeed to use a different type of data structure altogether.
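One way to get the "definitely refcounted" behaviour from the last point is sketched below. It wraps an immutable std::string in a shared_ptr rather than a raw character array, which is a variation on the suggestion, and the names SharedText, make_shared_text and copies_share_one_buffer are invented for illustration.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical alias: an immutable string shared by reference count.
using SharedText = std::shared_ptr<const std::string>;

// Hypothetical factory: moves the text into shared storage once.
SharedText make_shared_text(std::string text) {
    return std::make_shared<const std::string>(std::move(text));
}

// Copying a SharedText copies a pointer and bumps a counter;
// the characters themselves are never duplicated.
bool copies_share_one_buffer() {
    const SharedText original = make_shared_text(std::string(1000, 'x'));
    const SharedText copy = original;
    return copy->data() == original->data() && original.use_count() == 2;
}
```

The const is deliberate: since every holder sees the same characters, allowing one of them to modify the string would affect all the others.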
In general you should always use std::string, since it is less bug-prone. Be aware that the memory overhead of std::string is significant. I recently ran some experiments on std::string overhead, and in general it came to about 48 bytes! The article is here: http://jovislab.com/blog/?p=76.
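The exact figure depends on the standard library, the build mode and the platform, so it is worth measuring on your own toolchain. A minimal sketch, with hypothetical names string_object_size and capacity_of_empty_string:

```cpp
#include <cstddef>
#include <string>

// Size of the std::string object itself, excluding any heap buffer.
// This is implementation-defined and varies between standard libraries.
constexpr std::size_t string_object_size = sizeof(std::string);

// Capacity of a default-constructed string: with SSO this is the number
// of characters that fit inside the object without touching the heap.
std::size_t capacity_of_empty_string() {
    return std::string().capacity();
}
```

Printing these two values on the target platform shows the fixed per-object cost; a string longer than the empty-string capacity additionally pays for a heap allocation.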