Professional way to initialize an empty string in C++

Hi, I want to learn a professional way to initialize an empty string in C++.
I could do
std::string a_string; // way_1
or
std::string a_string = ""; // way_2
But I think way_1 depends entirely on the default value defined in the std library. To me, it is not an explicit declaration, and it might need to be changed if the code of std::string changes in the future. way_2 is not direct either: it uses "", which merely happens to be equivalent to an empty string. To me, a built-in empty-string constant would be the professional choice, like nullptr for initializing a pointer.
Do you know how professional C++ programmers would initialize an empty string?
Thanks

But I think way_1 depends entirely on the default value defined in the std library. To me, it is not an explicit declaration, and it might need to be changed if the code of std::string changes in the future.
This is faulty reasoning. The default value of std::string is defined by the C++ standard and is just as stable as every other part of the language. If you are worried about it changing, then you should also be worrying about std::string disappearing completely, or the meaning of "" changing, or the meaning of the initialization with the empty string changing.
If you are going for "professional", then I, if I saw a programmer using the explicit form, would wonder whether this programmer doesn't know what the default initialization of such a fundamental type as std::string does, and therefore what else this programmer might not know.
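To make the point above concrete, here is a minimal sketch comparing the spellings. The function name all_spellings_are_empty and the extra variable way_3 are illustrative and not part of the question; the behavior shown is what the standard guarantees for std::string.

```cpp
#include <cassert>
#include <string>

// Returns true when every spelling yields the same empty string.
bool all_spellings_are_empty() {
    std::string way_1;        // default constructor: guaranteed to be empty
    std::string way_2 = "";   // constructed from an empty C string
    std::string way_3{};      // value-initialization, also the default constructor

    assert(way_1.size() == 0);
    return way_1.empty() && way_2.empty() && way_3.empty()
        && way_1 == way_2 && way_2 == way_3;
}
```

All three objects compare equal, so the choice between them is about readability rather than behavior.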

Related

C++ Split definition and assignment of variables

So I recently installed CLion and after some setup I started messing with some older code I'd written. Now a great thing about CLion is that it helps you with coding style etc. but one thing I found strange is that it recommended to me to split the definition and assignment of variables. I remember a specific case where I defined strings as follows:
string path = "mypath";
But the IDE recommended to write it like:
string path;
path = "mypath";
Now I started looking for this online to find pros and cons of both methods. Apparently the former is faster but the latter is more secure because it calls the copy constructor or something (I didn't quite understand that part). My question is basically: Since CLion recommends doing it the latter way, does that mean that it is always better? Is there one way that is preferred over the other or is it situational? Do they each have their pros/cons and if so, what are they?
Any help or reliable resources are greatly appreciated.
Thanks in advance,
Xentro
Your IDE needs to be consigned to the bin.
string path = "mypath"; is an infinitely better pattern. This is because, in general, there is no potential hazard of an indeterminate value of path. (Granted, in this case, that is not an issue since a std::string has a well-defined default constructor but the preferred way is a good habit to get into).
The compiler is also spared the headache of constructing then assigning a value to an object.
But the IDE recommended to write it like:
string path;
path = "mypath";
Time to uninstall the IDE. The recommendation is utter nonsense.
Perhaps the worst thing about it is how it prevents const correctness. If path never changes, then you should make it const, and you can only do that if you initialise the variable with the correct value right away:
std::string const path = "mypath";
Note that depending on the context of the code, you may also want to use auto (which will turn the variable into a pointer to const char, because the type of "mypath" is an array of const char that decays to a pointer; but perhaps it turns out you don't even need a full-fledged std::string?) - and that also only works if you do not follow the stupid IDE recommendation:
auto const path = "mypath";
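The deduction described above can be checked at compile time. This is a small sketch; the variable names path_as_pointer and path_as_string and the helper same_text are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

auto const path_as_pointer = "mypath";        // deduced as const char* const
std::string const path_as_string = "mypath";  // an actual std::string

static_assert(std::is_same<decltype(path_as_pointer), const char* const>::value,
              "auto deduces a pointer, not a std::string");
static_assert(std::is_same<decltype(path_as_string), const std::string>::value,
              "spelling out the type keeps std::string");

// Both variables refer to the same text, even though their types differ.
bool same_text() {
    assert(path_as_pointer != nullptr);
    return path_as_string == path_as_pointer;
}
```

Either form requires an initializer on the declaration, which is exactly what the split declare-then-assign style rules out.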
Apparently the former is faster
Nonsense. This is not about speed but about correctness and code maintainability.
but the latter is more secure because it calls the copy constructor or something (I didn't quite understand that part).
That's nonsense, too. It calls one of std::string's overloaded assignment operators, not the copy constructor. And it's not more "secure" in any usual sense of the word.
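To see which functions actually run in each form, here is a sketch using a hypothetical Traced class that counts its own calls. Traced and Counts are stand-ins invented for this illustration; std::string itself offers no such counters, but it has the same kinds of constructors and assignment operators.

```cpp
#include <cassert>
#include <string>

// Call counters for the hypothetical Traced class below.
struct Counts {
    int default_ctor = 0;
    int from_cstr_ctor = 0;
    int assign_from_cstr = 0;
};

class Traced {
public:
    explicit Traced(Counts& counts) : counts_(&counts) { ++counts_->default_ctor; }
    Traced(Counts& counts, const char* text) : counts_(&counts), value_(text) {
        ++counts_->from_cstr_ctor;
    }
    Traced& operator=(const char* text) {
        value_ = text;
        ++counts_->assign_from_cstr;
        return *this;
    }
    const std::string& value() const { return value_; }

private:
    Counts* counts_;
    std::string value_;
};

// One step: the object is born with its final value.
Counts initialize_directly() {
    Counts counts;
    Traced path(counts, "mypath");
    assert(path.value() == "mypath");
    return counts;
}

// Two steps: construct an empty object, then overwrite it.
Counts declare_then_assign() {
    Counts counts;
    Traced path(counts);
    path = "mypath";
    assert(path.value() == "mypath");
    return counts;
}
```

In this sketch, initialize_directly records a single constructor call, while declare_then_assign records a default construction followed by an assignment; no copy constructor is involved in either.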
The advice is nonsense and other answers here tell that already.
I want to add a source for this, straight from the creator of C++, Bjarne Stroustrup, and the chair of the C++ committee (I hope I don't get his position wrong), Herb Sutter:
C++ core guidelines:
ES.20: Always initialize an object
Reason: Avoid used-before-set errors and their associated undefined behavior. Avoid problems with comprehension of complex initialization. Simplify refactoring.
Example:
void use(int arg)
{
    int i; // bad: uninitialized variable
    // ...
    i = 7; // initialize i
}
No, i = 7 does not initialize i; it assigns to it. Also, i can be read
in the ... part. Better:
void use(int arg) // OK
{
    int i = 7; // OK: initialized
    string s; // OK: default initialized
    // ...
}
Note: The "always initialize" rule is deliberately stronger than the "an object must be set before used" language rule. The latter, more relaxed rule, catches the technical bugs, but:
It leads to less readable code
It encourages people to declare names in greater than necessary scopes
It leads to harder to read code
It leads to logic bugs by encouraging complex code
It hampers refactoring
The always initialize rule is a style rule aimed to improve
maintainability as well as a rule protecting against used-before-set
errors.
string path = "mypath";
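The code above uses copy-initialization: path is constructed from the literal by the constructor that takes a const char* (compilers elide any temporary copy).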
string path;
path = "mypath";
The second form default-constructs the string and then uses the assignment operator.
In theory, the pros and cons depend on how the compiler handles the code, so they can vary from compiler to compiler. In general, the first form builds the object with its final value in one step, while the second first creates a new empty object and then uses the assignment operator ("=", the equals sign) to give the variable its value.
You can see more discussions in the following links:
Why separate variable definition and initialization in C++?
and What's the difference between assignment operator and copy constructor?

Compile time encryption for strings using user-defined literals

I am aware that the new C++ standard allows for user-defined literals and that their generation can be done in compile time.
However, I am quite new to the whole template metaprogramming universe and I'm trying to get some examples going, but still without success, since it's the first time I have contact with this particular feature.
So let's say I have a string:
std::string tmp = "This is a test message";
And I would like to encrypt it at compile time using:
std::string tmp = "This is a test message"_encrypt;
Is it even possible what I'm trying to attempt?
I am currently using VS2015 so any help or feedback is appreciated.
Is it even possible what I'm trying to attempt?
Yes, it is possible*. What you can pre-compute and put directly in the source code can also be done by the compiler at compile time.
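However, you cannot use std::string. It's not a literal type. Something like:
constexpr std::string tmp = "some string literal";
will not compile before C++20, because there std::string (and std::basic_string in general) has no constexpr constructor. Even from C++20 on, a constexpr std::string variable is only accepted if no heap allocation outlives the constant evaluation.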
You must therefore use const char [] as input for your meta-programming; after that, you may assign it to a std::string.
NB: Meta-programming has some restrictions you need to take into account: you don't have access to many tools you'd otherwise have, like new or malloc, for example; you must allocate your variables on the stack.
*Edit: Not entirely with UDLs, as @m.s. points out. Indeed, you receive a pointer to const chars and the length of the string. This is pretty restrictive in a constexpr scenario, and I doubt it's possible to find a way to work on that string. In "normal" meta-programming, where you can have a size that is a constant expression, compile-time encryption is instead possible.
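As a rough sketch of the "size as a constant expression" approach, the array length can be captured as a template parameter instead of going through a user-defined literal. Everything named here (Obfuscated, xor_encode, xor_decode, kKey, kSecret, reveal_secret) is hypothetical, and a single-byte XOR is only obfuscation, not real encryption. The sketch also assumes a compiler with C++14 relaxed constexpr, which may rule out VS2015.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fixed-size holder for the encoded bytes; an aggregate, so it is a literal type.
template <std::size_t N>
struct Obfuscated {
    char data[N];
};

// N is deduced from the string literal, terminator included.
template <std::size_t N>
constexpr Obfuscated<N> xor_encode(const char (&text)[N], char key) {
    Obfuscated<N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out.data[i] = static_cast<char>(text[i] ^ key);
    }
    return out;
}

// Decoding happens at run time and may therefore use std::string.
template <std::size_t N>
std::string xor_decode(const Obfuscated<N>& blob, char key) {
    std::string out;
    for (std::size_t i = 0; i + 1 < N; ++i) {  // skip the encoded terminator
        out.push_back(static_cast<char>(blob.data[i] ^ key));
    }
    return out;
}

constexpr char kKey = 0x5A;
constexpr auto kSecret = xor_encode("This is a test message", kKey);
static_assert(kSecret.data[0] == ('T' ^ kKey), "encoded during compilation");
static_assert(kSecret.data[0] != 'T', "the plain first byte is not stored in kSecret");

std::string reveal_secret() {
    std::string text = xor_decode(kSecret, kKey);
    assert(!text.empty());
    return text;
}
```

Whether the plain literal also ends up in the binary depends on the compiler and optimization level, so the output should be inspected before relying on this.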

Why are string literals const?

It is known that in C++ string literals are immutable and the result of modifying a string literal is undefined. For example
char * str = "Hello!";
str[1] = 'a';
This leads to undefined behavior.
Besides that, string literals are placed in static memory, so they exist for the whole lifetime of the program. I would like to know why string literals have such properties.
There are a couple of different reasons.
One is to allow storing string literals in read-only memory (as others have already mentioned).
Another is to allow merging of string literals. If one program uses the same string literal in several different places, it's nice to allow (but not necessarily require) the compiler to merge them, so you get multiple pointers to the same memory, instead of each occupying a separate chunk of memory. This can also apply when two string literals aren't necessarily identical, but do have the same ending:
char *foo = "long string";
char *bar = "string";
In a case like this, it's possible for bar to be foo+5 (if I've counted correctly).
In either of these cases, if you allow modifying a string literal, it could modify the other string literal that happens to have the same contents. At the same time, there's honestly not a lot of point in mandating the merging either -- it's pretty uncommon to have enough overlapping string literals that most people would want the compiler to run slower just to save (maybe) a few dozen bytes or so of memory.
By the time the first standard was written, there were already compilers that used all three of these techniques (and probably a few others besides). Since there was no way to describe one behavior you'd get from modifying a string literal, and nobody apparently thought it was an important capability to support, they did the obvious: said even attempting to do so led to undefined behavior.
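When a modifiable string is actually needed, the usual way around the restriction is to copy the literal into an array first. A minimal sketch, with the function name modified_copy chosen only for illustration:

```cpp
#include <cassert>
#include <string>

std::string modified_copy() {
    const char* literal = "Hello!";  // read-only view of the literal
    char buffer[] = "Hello!";        // array initialized from the literal: a writable copy
    buffer[1] = 'a';                 // well-defined, only the copy changes
    assert(literal[1] == 'e');       // the literal itself is untouched
    return std::string(buffer);
}
```

Declaring the pointer as const char* also lets the compiler reject an accidental write such as literal[1] = 'a'.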
It's undefined behaviour to modify a literal because the standard says so. And the standard says so to allow compilers to put literals in read only memory. And it does this for a number of reasons. One of which is to allow compilers to make the optimisation of storing only one instance of a literal that is repeated many times in the source.
I believe you ask about the reason why literals are placed in
read-only memory, not about technical details of linker doing this and
that or legal details of a standard forbidding such and such.
When modification of string literals works, it leads to subtle bugs
even in the absence of string merging (which we have reasons to
disallow if we decided to permit modification). When you see code like
char *str="Hello";
.../* some code, but str and str[...] are not modified */
printf("%s world\n", str);
it's a natural conclusion that you know what's going to be printed,
because str (and its content) were not modified in a particular
place, between initialization and use.
However, if string literals are writable, you don't know it any
more: str[0] could be overwritten later, in this code or inside a
deeply nested function call, and when the code is run again,
char *str="Hello";
won't guarantee anything about the content of str anymore. As we
expect, this initialization is implemented as moving the address known
in link time into a place for str. It does not check that str
contains "Hello" and it does not allocate a new copy of it. However,
we understand this code as resetting str to "Hello". It's hard to
overcome this natural understanding, and it's hard to reason about the
code where it is not guaranteed. When you see an expression like
x+14, what if you had to think about 14 being possibly overwritten
in other code, so it became 42? The same problem with strings.
That's the reason to disallow modification of string literals, both in
the standard (with no requirement to detect the failure early) and in
actual target platforms (providing the bonus of detecting potential
bugs).
I believe that many attempts to explain this thing suffer from the
worst kind of circular reasoning. The standard forbids writing to
literals because the compiler can merge strings, or they can be placed
in read-only memory. They are placed in read-only memory to catch the
violation of the standard. And it's valid to merge literals because
the standard forbids... is it a kind of explanation you asked for?
Let's look at other
languages. Common Lisp standard
makes modification of literals undefined behaviour, even though the
history of preceding Lisps is very different from the history of C
implementations. That's because writable literals are logically
dangerous. Language standards and memory layouts only reflect that
fact.
Python language has exactly one place where something resembling
"writing to literals" can happen: parameter default values, and this
fact confuses people all the time.
Your question is tagged C++, and I'm unsure of its current state
with respect to implicit conversion to non-const char*: if it's a
conversion, is it deprecated? I expect other answers to provide a
complete enlightenment on this point. As we talk of other languages
here, let me mention plain C. Here, string literals are not const,
and an equivalent question to ask would be why can't I modify string
literals (and people with more experience ask instead, why are
string literals non-const if I can't modify them?). However, the
reasoning above is fully applicable to C, despite this difference.
Because in K&R C, there was no such thing as "const". And similarly in pre-ANSI C++. Hence there was a lot of code which had things like char * str = "Hello!"; If the Standards committee had made text literals const, all those programs would no longer have compiled. So they made a compromise. Text literals are officially const char[], but they have a silent implicit conversion to char*.
In C++, string literals are const because you aren't allowed
to modify them. In standard C, they would have been const as
well, except that when const was introduced into C, there was
so much code along the lines of char* p = "somethin"; that
making them const would have broken, that it was deemed
unacceptable. (The C++ committee chose a different solution to
this problem, with a deprecated implicit conversion which allows
the above.)
In the original C, string literals were not const, and were
mutable, and it was guaranteed that no two string literals shared
any memory. This was quickly realized to be a serious error,
allowing things like:
void
mutate(char* p)
{
    static char c = 'a';
    *p = c++; // each call overwrites the first character with the next letter
}
and in another module:
mutate( "hello" ); // Can't trust what is written, can you.
(Some early implementations of Fortran had a similar issue,
where F(4) might call F with just about any integral value.
The Fortran committee fixed this, just like the C committee
fixed string literals in C.)

Using void in functions without parameter?

In C++ using void in a function with no parameter, for example:
class WinMessage
{
public:
    BOOL Translate(void);
};
is redundant, you might as well just write Translate();.
I generally include it myself, since it's a bit helpful when IDEs with code completion display the void: it assures me that the function definitely takes no parameters.
My question is: is adding void to parameter-less functions a good practice? Should it be encouraged in modern code?
In C++
void f(void);
is identical to:
void f();
The fact that the first style can still be legally written can be attributed to C.
n3290 § C.1.7 (C++ and ISO C compatibility) states:
Change: In C++, a function declared with an empty parameter list takes
no arguments.
In C, an empty parameter list means that the number and
type of the function arguments are unknown.
Example:
int f(); // means int f(void) in C++
// int f( unknown ) in C
In C, it makes sense to avoid that undesirable "unknown" meaning. In C++, it's superfluous.
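The equivalence in C++ can be checked by the compiler itself. A short sketch, with the function names with_void, without_void and same_function_type made up for illustration:

```cpp
#include <type_traits>

void with_void(void);   // C-style spelling
void without_void();    // idiomatic C++ spelling

// In C++ both declarations have the function type void().
constexpr bool same_function_type() {
    return std::is_same<decltype(with_void), decltype(without_void)>::value;
}

static_assert(same_function_type(), "f(void) and f() declare the same function type in C++");
```

The functions are only declared, never called, so no definitions are needed for this check.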
Short answer: in C++ it's a hangover from too much C programming. That puts it in the "don't do it unless you really have to" bracket for C++ in my view.
I see absolutely no reason for this. IDEs will just complete the function call with an empty argument list, and 4 characters less.
Personally I believe this is making the already verbose C++ even more verbose. There's no version of the language I'm aware of that requires the use of void here.
I think it will only help in backward compatibility with older C code, otherwise it is redundant.
I feel like no. Reasons:
A lot more code out there has the BOOL Translate() form, so others reading your code will be more comfortable and productive with it.
Having less on the screen (especially something redundant like this) means less thinking for somebody reading your code.
Sometimes people, who didn't program in C in 1988, ask "What does foo(void) mean?"
Just as a side note. Another reason for not including the void is that software, like starUML, that can read code and generate class diagrams, read the void as a parameter. Even though this may be a flaw in the UML generating software, it is still annoying to have to go back and remove the "void"s if you want to have clean diagrams

Why don't the std::fstream classes take a std::string?

This isn't a design question, really, though it may seem like it. (Well, okay, it's kind of a design question). What I'm wondering is why the C++ std::fstream classes don't take a std::string in their constructor or open methods. Everyone loves code examples so:
#include <iostream>
#include <fstream>
#include <string>
int main()
{
    std::string filename = "testfile";
    std::ifstream fin;
    fin.open(filename.c_str()); // Works just fine.
    fin.close();
    //fin.open(filename); // Error: no such method.
    //fin.close();
}
This gets me all the time when working with files. Surely the C++ library would use std::string wherever possible?
By taking a C string the C++03 std::fstream class reduced dependency on the std::string class. In C++11, however, the std::fstream class does allow passing a std::string for its constructor parameter.
Now, you may wonder why there isn't a transparent conversion from a std::string to a C string, so that a class that expects a C string could still take a std::string, just like a class that expects a std::string can take a C string.
The reason is that this would cause a conversion cycle, which in turn may lead to problems. For example, suppose std::string were convertible to a C string so that you could use std::strings with fstreams. Suppose also that C strings are convertible to std::strings, as is the case in the current standard. Now, consider the following:
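void f(std::string str1, std::string str2);
void f(const char* cstr1, const char* cstr2);
void g()
{
    const char* cstr = "abc"; // const: a literal does not convert to char* in C++11
    std::string str = "def";
    f(cstr, str); // ERROR: ambiguous (in the hypothetical with both conversions)
}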
Because you can convert either way between a std::string and a C string the call to f() could resolve to either of the two f() alternatives, and is thus ambiguous. The solution is to break the conversion cycle by making one conversion direction explicit, which is what the STL chose to do with c_str().
There are several places where the C++ standard committee did not really optimize the interaction between facilities in the standard library.
std::string and its use in the library is one of these.
One other example is std::swap. Many containers have a swap member function, but no overload of std::swap is supplied. The same goes for std::sort.
I hope all these small things will be fixed in the upcoming standard.
Maybe it's a consolation: all fstream's have gotten an open(string const &, ...) next to the open(char const *, ...) in the working draft of the C++0x standard.
(see e.g. 27.8.1.6 for the basic_ifstream declaration)
So when it gets finalised and implemented, it won't get you anymore :)
The stream IO library was added to the standard C++ library before the STL. In order not to break backward compatibility, it was decided to avoid modifying the IO library when the STL was added, even if that meant some issues like the one you raise.
@Bernard:
Monoliths "Unstrung." "All for one, and one for all" may work for Musketeers, but it doesn't work nearly as well for class designers. Here's an example that is not altogether exemplary, and it illustrates just how badly you can go wrong when design turns into overdesign. The example is, unfortunately, taken from a standard library near you...
~ http://www.gotw.ca/gotw/084.htm
It is inconsequential, that is true. What do you mean by std::string's interface being large? What does large mean, in this context - lots of method calls? I'm not being facetious, I am actually interested.
It has more methods than it really needs, and its behaviour of using integral offsets rather than iterators is a bit iffy (as it's contrary to the way the rest of the library works).
The real issue I think is that the C++ library has three parts; it has the old C library, it has the STL, and it has strings-and-iostreams. Though some efforts were made to bridge the different parts (e.g. the addition of overloads to the C library, because C++ supports overloading; the addition of iterators to basic_string; the addition of the iostream iterator adaptors), there are a lot of inconsistencies when you look at the detail.
For example, basic_string includes methods that are unnecessary duplicates of standard algorithms; the various find methods, could probably be safely removed. Another example: locales use raw pointers instead of iterators.
C++ grew up on smaller machines than the monsters we write code for today. Back when iostream was new many developers really cared about code size (they had to fit their entire program and data into several hundred KB). Therefore, many didn't want to pull in the "big" C++ string library. Many didn't even use the iostream library for the same reasons, code size.
We didn't have thousands of megabytes of RAM to throw around like we do today. We usually didn't have function level linking so we were at the mercy of the developer of the library to use a lot of separate object files or else pull in tons of uncalled code. All of this FUD made developers steer away from std::string.
Back then I avoided std::string too. "Too bloated", "called malloc too often", etc. Foolishly using stack-based buffers for strings, then adding all kinds of tedious code to make sure it doesn't overrun.
Is there any class in the STL that takes a string? I don't think so (I couldn't find any in my quick search). So it's probably a design decision that no class in the STL should depend on any other STL class (unless it is directly needed for functionality).
I believe that this has been thought about and was done to avoid the dependency; i.e. #include <fstream> should not force one to #include <string>.
To be honest, this seems like quite an inconsequential issue. A better question would be, why is std::string's interface so large?
Nowadays you can solve this problem very easily: add -std=c++11 to your CXXFLAGS.
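With C++11 or later enabled, the std::string overloads can be used directly. A minimal sketch, with the helper name opens_for_reading invented for illustration:

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Reports whether the named file can be opened for reading.
bool opens_for_reading(const std::string& filename) {
    std::ifstream fin;
    fin.open(filename);                 // C++11: open() accepts std::string, no c_str() needed
    const bool opened = fin.is_open();

    std::ifstream same_file(filename);  // the constructor has a matching overload
    assert(opened == same_file.is_open());
    return opened;
}
```

Both streams close their files automatically when they go out of scope, so no explicit close() is required here.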