C++: is string.empty() always equivalent to string == ""?


Can I make an assumption that given
std::string str;
... // do something to str
Is the following statement always true?
(str.empty() == (str == ""))

Answer
Yes. Here is the relevant implementation from bits/basic_string.h, the code for basic_string<_CharT, _Traits, _Alloc>:
/**
* Returns true if the %string is empty. Equivalent to *this == "".
*/
bool
empty() const
{ return this->size() == 0; }
Discussion
Even though the two forms are equivalent for std::string, you may wish to use .empty() because it is more general.
Indeed, J.F. Sebastian comments that if you switch to using std::wstring instead of std::string, then =="" won't even compile, because you can't compare a string of wchar_t with one of char. This, however, is not directly relevant to your original question, and I am 99% sure you will not switch to std::wstring.
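To make the std::wstring point concrete, here is a minimal sketch. The helper names is_empty_string and is_empty_by_compare are made up for this example; the first works for any std::basic_string specialization because empty() does not depend on the character type, while the second shows that a wide string needs a wide literal.

```cpp
#include <string>

// Hypothetical helper: accepts std::string, std::wstring, std::u16string, ...
// because empty() is independent of the character type.
template <typename CharT, typename Traits, typename Alloc>
bool is_empty_string(const std::basic_string<CharT, Traits, Alloc>& s) {
    return s.empty();
}

// For comparison: with std::wstring the literal must be wide (L"").
// Writing ws == "" here would not compile.
inline bool is_empty_by_compare(const std::wstring& ws) {
    return ws == L"";
}
```

Both helpers agree for std::wstring, but only the empty() version survives a change of character type without touching the call sites.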

It should be. The ANSI/ISO standard states in 21.3.3 basic_string capacity:
size_type size() const;
Returns: a count of char-like objects currently in the string.
bool empty() const;
Returns: size() == 0
However, in clause 18 of 21.3.1 basic_string constructors it states that the character-type assignment operator uses traits::length() to establish the length of the controlled sequence so you could end up with something strange if you are using a different specialization of std::basic_string<>.
I think that the 100% correct statement is that
(str.empty() == (str == std::string()))
or something like that. If you haven't done anything strange, then std::string("") and std::string() should be equivalent.
They are logically similar but they are testing for different things. str.empty() is checking if the string is empty where the other is checking for equality against a C-style empty string. I would use whichever is more appropriate for what you are trying to do. If you want to know if a string is empty, then use str.empty().

str.empty() is never slower, but might be faster than str == "". This depends on implementation. So you should use str.empty() just in case.
This is a bit like using ++i instead of i++ to increase a counter (assuming you do not need the result of the increment operator itself). Your compiler might optimise, but you lose nothing using ++i, and might win something, so you are better off using ++i.
Apart from performance issues, the answer to your question is yes; both expressions are logically equivalent.

Yes (str.empty() == (str == "")) is always* true for std::string. But remember that a string can contain '\0' characters. So even though the expression s == "" may be false, s.c_str() may still return an empty C-string. For example:
#include <string>
#include <iostream>
using namespace std;
void test( const string & s ) {
bool bempty = s.empty();
bool beq = std::operator==(s, ""); // avoid global namespace operator==
const char * res = (bempty == beq ) ? "PASS" : "FAIL";
const char * isempty = bempty ? " empty " : "NOT empty ";
const char * iseq = beq ? " == \"\"" : "NOT == \"\"";
cout << res << " size=" << s.size();
cout << " c_str=\"" << s.c_str() << "\" ";
cout << isempty << iseq << endl;
}
int main() {
string s; test(s); // PASS size=0 c_str="" empty == ""
s.push_back('\0'); test(s); // PASS size=1 c_str="" NOT empty NOT == ""
s.push_back('x'); test(s); // PASS size=2 c_str="" NOT empty NOT == ""
s.push_back('\0'); test(s); // PASS size=3 c_str="" NOT empty NOT == ""
s.push_back('y'); test(s); // PASS size=4 c_str="" NOT empty NOT == ""
return 0;
}
* Barring an overload of operator== in the global namespace, as others have mentioned.

Some implementations might test for the null character as the first character in the string resulting in a slight speed increase over calculating the size of the string.
I believe that this is not common however.

Normally, yes.
But if someone decides to redefine an operator then all bets are off:
bool operator == (const std::string& a, const char b[])
{
return a != b; // paging www.thedailywtf.com
}

Yes, it is equivalent, but using empty() allows the library to change what empty() actually does, depending on OS, hardware, or anything else, without affecting your code at all. There is a similar practice in Java and .NET.

Related

Why is the == operator not yielding the same result as strcmp? [duplicate]

I've created a two dimensional array of character pointers. I'd like to use it to create a dictionary whereby, if the variable ent is part of the dictionary, the corresponding dictionary entry for that word is retrieved if it exists. I'm currently using strcmp, but only because the == operator is giving me a hard time. I'm not sure why the == operator is not leading to the desired results.
I suspect it might have something to do with pointer comparison, as I'm comparing a pointer to a string with another pointer to a string, and not necessarily its contents.
#include <iostream>
#include <cstring>
int main() {
const char *dictionary[][2] { // string literals are const; char* is ill-formed since C++11
{"First","Monday"},
{"Second","Tuesday"},
{"Third","Wednesday"},
{"Fourth","Thursday"},
{"Fifth","Friday"},
{"Sixth","Saturday"},
{"Seventh","Sunday"},
{"",""}
};
char ent[80] = "Sixth";
for (int i{}; *dictionary[i][0]; i++) {
if (!strcmp(dictionary[i][0], ent)) {
std::cout << "Word found: " << ent
<< " corresponds to: " << dictionary[i][1]
<< std::endl;
return 0;
}
}
std::cout << ent << " not found." << std::endl;
return 1;
}
I would like to replace if (!strcmp(dictionary[i][0], ent)) with something like
if (ent == dictionary[i][0]) and have it yield Word found: Sixth corresponds to: Saturday
If I cannot do this with the == operator, is there a way to do this through a function that uses pointers but doesn't rely on a header?
Thanks!

The condition of the if statement
if (ent == dictionary[i][0])
compares the addresses of the first characters of the two strings, not their contents.
In expressions, arrays are, with rare exceptions (for example when used as the operand of sizeof), implicitly converted to pointers to their first elements.
For example, if you write an if statement like this
if ( "hello" == "hello" ) { /*...*/ }
then the expression evaluates to either true or false, depending on whether the compiler stores equal string literals as one object or as separate objects. The standard leaves this unspecified, and compilers often offer an option to control it.
You could instead define the dictionary so that its elements have type std::string. In that case you can use the equality operator ==.
You can then also compare an object of type std::string with character arrays containing strings, because std::string provides operator== overloads that take a pointer to a null-terminated character array.
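As a rough sketch of that suggestion, the dictionary could be stored as pairs of std::string so that == compares contents. The Entry struct and the lookup function are names invented for this example, not part of the original code.

```cpp
#include <string>

// Hypothetical rewrite of the dictionary using std::string,
// so that == compares contents rather than addresses.
struct Entry {
    std::string key;
    std::string value;
};

static const Entry dictionary[] = {
    {"First", "Monday"},    {"Second", "Tuesday"}, {"Third", "Wednesday"},
    {"Fourth", "Thursday"}, {"Fifth", "Friday"},   {"Sixth", "Saturday"},
    {"Seventh", "Sunday"},
};

// Returns the matching value, or an empty string if the word is not present.
std::string lookup(const std::string& word) {
    for (const Entry& e : dictionary) {
        if (word == e.key) {  // content comparison via std::string's operator==
            return e.value;
        }
    }
    return std::string();
}
```

A char array such as ent can be passed directly, since it converts to std::string at the call: lookup(ent) returns "Saturday" when ent holds "Sixth". The empty sentinel row is no longer needed because the range-based for loop knows the array length.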

Less Than operator on string comparison yields same result no matter the situation

I wanted to see if the Less Than operator (<) would work on strings. Well, it did. I started to experiment with it, and it turns out I got the same result regardless of the situation. The string on the left is always less than the string on the right, even if I swap the strings. Curious about why it did this, I tried to look up what the < operator actually does for strings. I read that it does a lexicographical comparison of the two strings. Still, this did not answer why my code was doing what it was doing. For example:
#include <iostream>
int main () {
if ("A" < "B")
std::cout << "Yes";
else
std::cout << "No";
return 0;
}
It would output Yes. That makes sense. But when I swap the strings:
#include <iostream>
int main () {
if ("B" < "A")
std::cout << "Yes";
else
std::cout << "No";
return 0;
}
It would still output Yes. I don't know if I am just being ignorant right now and not fully understanding what is happening here, or if there is something wrong.

It's because a string literal is a read-only array containing the string (plus its terminator), and in this expression it decays to a pointer to its first character. What you are comparing are not the strings but the pointers to these strings. The result of < on pointers to unrelated objects is unspecified, which is why both orders can print Yes.
Use std::strcmp if you want to compare C-style strings, or use std::string, which has overloaded comparison operators defined.

You are comparing the pointer to the string "A" with the pointer to the string "B".
If you want to compare just the character values, use single quotes: 'A' < 'B'. If you want to compare strings, use the std::string class, as in std::string("A") < std::string("B"), or use strcmp().
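A small sketch of both suggestions side by side. The function names less_as_std_string and less_as_c_string are invented here for illustration.

```cpp
#include <cstring>
#include <string>

// Compares contents lexicographically via std::string's operator<.
bool less_as_std_string(const char* a, const char* b) {
    return std::string(a) < std::string(b);
}

// Compares contents via strcmp: a negative result means a sorts before b.
bool less_as_c_string(const char* a, const char* b) {
    return std::strcmp(a, b) < 0;
}
```

Unlike the raw "A" < "B" pointer comparison, both functions give the opposite answer when the arguments are swapped.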

How could I speed up comparison of std::string against string literals?

I have a bunch of code where objects of type std::string are compared for equality against string literals. Something like this:
//const std::string someString = //blahblahblah;
if( someString == "(" ) {
//do something
} else if( someString == ")" ) {
//do something else
} else if// this chain can be very long
The comparison time accumulates to a serious amount (yes, I profiled) and so it'd be nice to speed it up.
The code compares the string against numerous short string literals and this comparison can hardly be avoided. Leaving the string declared as std::string is most likely inevitable - there're thousands lines of code like that. Leaving string literals and comparison with == is also likely inevitable - rewriting the whole code would be a pain.
The problem is that the STL implementation that comes with Visual C++ 11 uses a somewhat strange approach. == is mapped onto std::operator==(const basic_string&, const char*), which calls basic_string::compare( const char* ), which in turn calls std::char_traits<char>::length( const char* ), which calls strlen() to compute the length of the string literal. Then the comparison runs for the two strings, and the lengths of both strings are passed into that comparison.
The compiler has a hard time analyzing all this and emits code that traverses the string literal twice. With short literals that's not much time but every comparison involves traversing the literal twice instead of once. Simply calling strcmp() would most likely be faster.
Is there anything I could do like perhaps writing a custom comparator class that would help avoid traversing the string literals twice in this scenario?

Similar to Dietmar's solution, but with slightly less editing: you can wrap the string (once) instead of each literal
#include <string>
#include <cstring>
struct FastLiteralWrapper {
std::string const &s;
explicit FastLiteralWrapper(std::string const &s_) : s(s_) {}
template <std::size_t ArrayLength>
bool operator== (char const (&other)[ArrayLength]) const {
std::size_t const StringLength = ArrayLength - 1;
return StringLength == s.size()
&& std::memcmp(s.data(), other, StringLength) == 0;
}
};
and your code becomes:
const std::string someStdString = "blahblahblah";
// just for the context of the comparison:
FastLiteralWrapper someString(someStdString);
if( someString == "(" ) {
//do something
} else if( someString == ")" ) {
//do something else
} else if// this chain can be very long
NB. the fastest solution - at the cost of more editing - is probably to build a (perfect) hash or trie mapping string literals to enumerated constants, and then just switch on the looked-up value. Long if/else if chains usually smell bad IMO.
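As a rough illustration of that last idea, here is a sketch that maps literals to enumerated constants and then switches on the result. It uses std::unordered_map, which is an ordinary hash table rather than a perfect hash, and the names Token, classify and describe are assumptions made for this example.

```cpp
#include <string>
#include <unordered_map>

enum class Token { Unknown, LeftParen, RightParen, NotEqual, Equal };

// Looks the string up once; the table is built on first use.
Token classify(const std::string& s) {
    static const std::unordered_map<std::string, Token> table = {
        {"(", Token::LeftParen},
        {")", Token::RightParen},
        {"!=", Token::NotEqual},
        {"==", Token::Equal},
    };
    const auto it = table.find(s);
    return it == table.end() ? Token::Unknown : it->second;
}

// The long if/else-if chain becomes a single switch on the token.
const char* describe(const std::string& s) {
    switch (classify(s)) {
        case Token::LeftParen:  return "left paren";
        case Token::RightParen: return "right paren";
        case Token::NotEqual:   return "not equal";
        case Token::Equal:      return "equal";
        case Token::Unknown:    break;
    }
    return "unknown";
}
```

Whether this beats the comparison chain depends on the number of literals and on the hash cost, so it is worth measuring before committing to the rewrite.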

Well, aside from C++14's std::string literals (the s suffix), you could easily code up a solution:
For comparison with a single character, use a character literal and:
bool operator==(const std::string& s, char c)
{
return s.size() == 1 && s[0] == c;
}
For comparison with a string literal, you can use something like this:
template<std::size_t N>
bool operator==(const std::string& s, char const (&literal)[N])
{
return s.size() == N - 1 && std::memcmp(s.data(), literal, N - 1) == 0; // N counts the terminator
}
Disclaimer:
The first might even be superfluous,
Only do this if you measure an improvement over what you had.

If you have a long chain of string literals to compare to, there is likely some potential to compare prefixes and group common processing. Especially when comparing a known set of strings for equality with an input string, there is also the option to use a perfect hash and key the operations off an integer produced by it.
Since the use of a perfect hash will probably have the best performance but also requires major changes of the code layout, an alternative could be to determine the size of the string literals at compile time and use this size while comparing. For example:
#include <cstddef>
#include <cstring>
#include <string>
class Literal {
char const* d_base;
std::size_t d_length;
public:
template <std::size_t Length>
Literal(char const (&base)[Length]): d_base(base), d_length(Length - 1) {}
bool operator== (std::string const& other) const {
return other.size() == this->d_length
&& !std::memcmp(this->d_base, other.c_str(), this->d_length);
}
bool operator!=(std::string const& other) const { return !(*this == other); }
};
bool operator== (std::string const& str, Literal const& literal) {
return literal == str;
}
bool operator!= (std::string const& str, Literal const& literal) {
return !(str == literal);
}
Obviously, this assumes that your literals don't embed null characters ('\0') other than the implicitly added terminating null character as the static length would otherwise be distorted. Using C++11 constexpr it would be possible to guard against that possibility but the code gets somewhat more complicated without any good reason. You'd then compare your strings using something like
if (someString == Literal("(")) {
...
}
else if (someString == Literal(")")) {
...
}
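The guard against embedded null characters mentioned above could look roughly like this. This sketch assumes C++14 or later (a loop inside a constexpr function), and has_embedded_null is a name chosen for illustration; a strict C++11 version would need recursion instead.

```cpp
#include <cstddef>

// True if any character before the final terminator is '\0'.
// N is the array length of the literal, including the terminator.
template <std::size_t N>
constexpr bool has_embedded_null(const char (&literal)[N]) {
    for (std::size_t i = 0; i + 1 < N; ++i) {
        if (literal[i] == '\0') {
            return true;
        }
    }
    return false;
}

// Checked at compile time:
static_assert(!has_embedded_null("("), "plain literal has no embedded null");
static_assert(has_embedded_null("a\0b"), "embedded null is detected");
```

A static_assert of this kind placed next to each literal would reject literals whose static length does not match their C-string length.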

The fastest string comparison you can get is by interning the strings: Build a large hash table that contains all strings that are ever created. Ensure that whenever a string object is created, it is first looked up from the hash table, only creating a new object if no preexisting object is found. Naturally, this functionality should be encapsulated in your own string class.
Once you have done this, string comparison is equivalent to comparing their addresses.
This is actually quite an old technique first popularized with the LISP language.
The reason this is faster is that every string only has to be created once. If you are careful, you'll never generate the same string twice from the same input bytes, so string creation overhead is controlled by the amount of input data you work through. And hashing all your input data once is not a big deal.
The comparisons, on the other hand, tend to involve the same strings over and over again (like your comparing to literal strings) when you write some kind of a parser or interpreter. And these comparisons are reduced to a single machine instruction.
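A minimal sketch of such an interning pool, assuming a single-threaded program. StringPool and intern are hypothetical names, and a real implementation would wrap the returned pointer in its own string class as described above.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical interning pool: each distinct string is stored once,
// and callers hold pointers to the pooled copy.
class StringPool {
public:
    // Returns the same pointer for equal contents. Pointers to elements of
    // std::unordered_set stay valid when the table rehashes.
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;
    }

private:
    std::unordered_set<std::string> pool_;
};
```

Once two strings have been interned through the same pool, comparing them for equality is a pointer comparison; the hashing cost is paid once, at creation.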

Two other ideas:
A) Build an FSA using a lexical analyser tool like flex, so the string is converted to an integer token value depending on what it matches.
B) Use the length to break up long else-if chains, possibly partly table driven.
Why not get the length of the string something at the top, then compare only against the literals it could possibly match?
If there are a lot of them, it may be worth making it table driven and using a map and function pointers. You could special-case the single-character literals, for example by using a function lookup table.
Finding non-matches fast and handling the common lengths may suffice without requiring too much code restructuring, and the result may be more maintainable as well as faster.
int len = strlen (something);
if ( ! knownliterallength[ len]) {
// not match
...
} else {
// First char may be used to index search, or literals are stored in map with *func()
switch (len)
{
case 1: // Could use a look table index by char and *func()
processchar( something[0]);
break;
case 2: // Short strings
case 3:
case 4:
processrunts( something);
break;
default:
// First char used to index search, or literals are stored in map with *func()
processlong( something);
break;
}
}

This is not the prettiest solution, but it has proved quite fast when there are a lot of short strings to be compared (like operators and control characters/keywords in a script parser?).
Create a search tree based on string length and only compare characters. Try to represent known strings as an enumeration if this makes it cleaner in the particular implementation.
Short example:
enum StrE {
UNKNOWN = 0 ,
RIGHT_PAR ,
LEFT_PAR ,
NOT_EQUAL ,
EQUAL
};
StrE strCmp(std::string str)
{
size_t l = str.length();
switch(l)
{
case 1:
{
if(str[0] == ')') return RIGHT_PAR;
if(str[0] == '(') return LEFT_PAR;
// ...
break;
}
case 2:
{
if(str[0] == '!' && str[1] == '=') return NOT_EQUAL;
if(str[0] == '=' && str[1] == '=') return EQUAL;
// ...
break;
}
// ...
}
return UNKNOWN;
}
int main()
{
std::string input = "==";
switch(strCmp(input))
{
case RIGHT_PAR:
printf("right par");
break;
case LEFT_PAR:
printf("left par");
break;
case NOT_EQUAL:
printf("not equal");
break;
case EQUAL:
printf("equal");
break;
case UNKNOWN:
printf("unknown");
break;
}
}

!strcmp as substitute for ==

I'm working with rapidxml, so I would like to have comparisons like this in the code:
if ( searchNode->first_attribute("name")->value() == "foo" )
This gives the following warning:
comparison with string literal results in unspecified behaviour [-Waddress]
Is it a good idea to substitute it with:
if ( !strcmp(searchNode->first_attribute("name")->value() , "foo") )
Which gives no warning?
The latter looks ugly to me, but is there anything else?

You cannot in general use == to compare strings in C, since that only compares the address of the first character which is not what you want.
You must use strcmp(), but I would endorse this style:
if( strcmp(searchNode->first_attribute("name")->value(), "foo") == 0) { }
rather than using !, since that operator is a boolean operator and strcmp()'s return value is not boolean. I realize it works and is well-defined, I just consider it ugly and confused.
Of course you can wrap it:
#include <stdbool.h>
static bool first_attrib_name_is(const Node *node, const char *string)
{
return strcmp(node->first_attribute("name")->value(), string) == 0;
}
then your code becomes the slightly more palatable:
if( first_attrib_name_is(searchNode, "foo") ) { }
Note: I use the bool return type, which is standard from C99.

If the value() returns char* or const char*, you have little choice - strcmp or one of its length-limiting alternatives is what you need. If value() can be changed to return std::string, you could go back to using ==.

When comparing char* types with "==" you just compare the pointers. Use the C++ string type if you want to do the comparison with "=="

You have a few options:
You can use strcmp, but I would recommend wrapping it. e.g.
bool equals(const char* a, const char* b) {
return strcmp(a, b) == 0;
}
then you could write: if (equals(searchNode->first_attribute("name")->value(), "foo"))
You can convert the return value to a std::string and use the == operator
if (std::string(searchNode->first_attribute("name")->value()) == "foo")
That will introduce a string copy operation which, depending on context, may be undesirable.
You can use a string reference class. The purpose of a string reference class is to provide a string-like object which does not own the actual string contents. I've seen a few of these and it's simple enough to write your own, but since Boost has a string reference class, I'll use that for an example.
#include <boost/utility/string_ref.hpp>
using namespace boost;
if (string_ref(searchNode->first_attribute("name")->value()) == string_ref("foo"))
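If C++17 is available, std::string_view plays the same role as a string reference class without requiring Boost. This is only a sketch: value_equals is a made-up helper, and it assumes the pointer passed in is non-null, so a missing attribute would have to be checked before calling it.

```cpp
#include <string_view>

// Compares a C string with an expected value without copying either one.
// Assumes value is a valid, null-terminated, non-null pointer.
bool value_equals(const char* value, std::string_view expected) {
    return std::string_view(value) == expected;
}
```

With this helper the call site would read something like value_equals(searchNode->first_attribute("name")->value(), "foo").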

How do you construct a std::string with an embedded null?

If I want to construct a std::string with a line like:
std::string my_string("a\0b");
Where I want to have three characters in the resulting string (a, null, b), I only get one. What is the proper syntax?

Since C++14
we have been able to create literal std::string
#include <iostream>
#include <string>
int main()
{
using namespace std::string_literals;
std::string s = "pl-\0-op"s; // <- Notice the "s" at the end
// This is a std::string literal not
// a C-String literal.
std::cout << s << "\n";
}
Before C++14
The problem is the std::string constructor that takes a const char* assumes the input is a C-string. C-strings are \0 terminated and thus parsing stops when it reaches the \0 character.
To compensate for this, you need to use the constructor that builds the string from a char array (not a C-String). This takes two parameters - a pointer to the array and a length:
std::string x("pq\0rs"); // Two characters because input assumed to be C-String
std::string y("pq\0rs",5); // 5 characters as the input is now a char array with 5 characters.
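Note: a C++ std::string does NOT rely on \0 to mark its end (as suggested in other posts); its length is stored separately, so \0 is an ordinary character inside it. However, you can extract a pointer to an internal buffer that contains a null-terminated C-String with the method c_str().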
Also check out Doug T's answer below about using a vector<char>.
Also check out RiaD for a C++14 solution.
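To see the difference between the two constructors, here is a small sketch. The function names from_c_string and from_char_array are invented for this example.

```cpp
#include <string>

// Pointer-only constructor: reading stops at the first '\0'.
std::string from_c_string() {
    return std::string("pq\0rs");
}

// Pointer-plus-length constructor: takes exactly 5 characters,
// including the embedded '\0'.
std::string from_char_array() {
    return std::string("pq\0rs", 5);
}
```

The first result has size() 2, the second has size() 5 with '\0' at index 2.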

If you are doing manipulation like you would with a c-style string (array of chars) consider using
std::vector<char>
You have more freedom to treat it like an array in the same manner you would treat a c-string. You can use copy() to copy into a string:
std::vector<char> vec(100);
strncpy(&vec[0], "blah blah blah", 100);
std::string vecAsStr( vec.begin(), vec.end());
and you can use it in many of the same places you can use c-strings
printf("%s", &vec[0]);
vec[10] = '\0';
vec[11] = 'b';
Naturally, however, you suffer from the same problems as c-strings. You may forget your null terminator or write past the allocated space.

I have no idea why you'd want to do such a thing, but try this:
std::string my_string("a\0b", 3);

What new capabilities do user-defined literals add to C++? presents an elegant answer: Define
std::string operator""_s(const char* str, std::size_t n)
{
return std::string(str, n);
}
then you can create your string this way:
std::string my_string("a\0b"_s);
or even so:
auto my_string = "a\0b"_s;
There's an "old style" way:
#define S(s) s, sizeof s - 1 // trailing NUL does not belong to the string
then you can define
std::string my_string(S("a\0b"));

The following will work...
std::string s;
s.push_back('a');
s.push_back('\0');
s.push_back('b');

You'll have to be careful with this. If you replace 'b' with any numeric character, you will silently create the wrong string using most methods. See: Rules for C++ string literals escape character.
For example, I dropped this innocent looking snippet in the middle of a program
// Create '\0' followed by '0' 40 times ;)
std::string str("\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00", 80);
std::cerr << "Entering loop.\n";
for (char & c : str) {
std::cerr << c;
// 'Q' is way cooler than '\0' or '0'
c = 'Q';
}
std::cerr << "\n";
for (char & c : str) {
std::cerr << c;
}
std::cerr << "\n";
Here is what this program output for me:
Entering loop.
Entering loop.
vector::_M_emplace_ba
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ
That was my first print statement twice, several non-printing characters, followed by a newline, followed by something in internal memory, which I just overwrote (and then printed, showing that it has been overwritten). Worst of all, even compiling this with thorough and verbose gcc warnings gave me no indication of something being wrong, and running the program through valgrind didn't complain about any improper memory access patterns. In other words, it's completely undetectable by modern tools.
You can get this same problem with the much simpler std::string("0", 100);, but the example above is a little trickier, and thus harder to see what's wrong.
Fortunately, C++11 gives us a good solution to the problem using initializer list syntax. This saves you from having to specify the number of characters (which, as I showed above, you can do incorrectly), and avoids combining escaped numbers. std::string str({'a', '\0', 'b'}) is safe for any string content, unlike versions that take an array of char and a size.
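The escape pitfall and the initializer-list fix can be checked at compile time. This is just a sketch, and make_safe is a name chosen for illustration.

```cpp
#include <string>

// "\01" is ONE character (an octal escape with value 1), so this literal
// holds 'a', '\01' and the terminator.
static_assert(sizeof("a\01") == 3, "octal escape swallowed the digit");

// Splitting the literal ends the escape: 'a', '\0', '1' and the terminator.
static_assert(sizeof("a\0" "1") == 4, "digit kept as a separate character");

// Initializer-list construction needs neither a length nor escape tricks.
std::string make_safe() {
    return std::string{'a', '\0', '1'};
}
```

make_safe() returns a three-character string whose middle character is '\0' and whose last character is the digit '1'.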

In C++14 you now may use literals
using namespace std::literals::string_literals;
std::string s = "a\0b"s;
std::cout << s.size(); // 3

Better to use std::vector<char> if this question isn't just for educational purposes.

anonym's answer is excellent, but there's a non-macro solution in C++98 as well:
template <size_t N>
std::string RawString(const char (&ch)[N])
{
return std::string(ch, N-1); // Again, exclude trailing `null`
}
With this function, RawString(/* literal */) will produce the same string as S(/* literal */):
std::string my_string_t(RawString("a\0b"));
std::string my_string_m(S("a\0b"));
std::cout << "Using template: " << my_string_t << std::endl;
std::cout << "Using macro: " << my_string_m << std::endl;
Additionally, there's an issue with the macro: the expression is not actually a std::string as written, and therefore can't be used e.g. for simple assignment-initialization:
std::string s = S("a\0b"); // ERROR!
...so it might be preferable to use:
#define S(s) std::string(s, sizeof s - 1)
Obviously you should only use one or the other solution in your project and call it whatever you think is appropriate.

I know it has been a long time since this question was asked, but anyone who is having a similar problem might be interested in the following code.
CComBSTR(20,"mystring1\0mystring2\0")

Almost all implementations of std::strings are null-terminated, so you probably shouldn't do this. Note that "a\0b" is actually four characters long because of the automatic null terminator (a, null, b, null). If you really want to do this and break std::string's contract, you can do:
std::string s("aab");
s.at(1) = '\0';
but if you do, all your friends will laugh at you, you will never find true happiness.
