Does an empty string contain an empty string in C++? - c++

Just had an interesting argument in the comment to one of my questions. My opponent claims that the statement "" does not contain "" is wrong.
My reasoning is that if "" contained another "", that one would also contain "" and so on.
Who is wrong?
P.S.
I am talking about a std::string
P.S. P.S
I was not talking about substrings, but even if I add to my question " as a substring", it still makes no sense. An empty substring is nonsense. If you allow empty substrings to be contained in strings, that means you have an infinity of empty substrings. What is the point of that?
Edit:
Am I the only one that thinks there's something wrong with the function std::string::find?
C++ reference clearly says
Return Value: The position of the first character of the first match.
Ok, let's assume it makes sense for a minute and run this code:
string empty1 = "";
string empty2 = "";
int postition = empty1.find(empty2);
cout << "found \"\" at index " << position << endl;
The output is: found "" at index 0
Nonsense part: how can there be index 0 in a string of length 0? It is nonsense.
To be able to even have a 0th position, the string must be at least 1 character long.
And C++ is giving a exception in this case, which proves my point:
cout << empty2.at( empty1.find(empty2) ) << endl;
If it really contained an empty string it would had no problem printing it out.

It depends on what you mean by "contains".
The empty string is a substring of the empty string, and so is contained in that sense.
On the other hand, if you consider a string as a collection of characters, the empty string can't contain the empty string, because its elements are characters, not strings.
Relating to sets, the set
{2}
is a subset of the set
A = {1, 2, 3}
but {2} is not a member of A - all A's members are numbers, not sets.
In the same way, {} is a subset of {}, but {} is not an element in {} (it can't be because it's empty).
So you're both right.

C++ agrees with your "opponent":
#include <iostream>
#include <string>
using namespace std;
int main()
{
bool contains = string("").find(string("")) != string::npos;
cout << "\"\" contains \"\": "
<< boolalpha << contains;
}
Output: "" contains "": true
Demo

It's easy. String A contains sub-string B if there is an argument offset such that A.substr(offset, B.size()) == B. No special cases for empty strings needed.
So, let's see. std::string("").substr(0,0) turns out to be std::string(""). And we can even check your "counter-example". std::string("").substr(0,0).substr(0,0) is also well-defined and empty. Turtles all the way down.

The first thing that is unclear is whether you are talking about std::string or null terminated C strings, the second thing is why should it matter?. I will assume std::string.
The requirements on std::string determine how the component must behave, not what its internal representation must be (although some of the requirements affect the internal representation). As long as the requirements for the component are met, whether it holds something internally is an implementation detail that you might not even be able to test.
In the particular case of an empty string, there is nothing that mandates that it holds anything. It could just hold a size member set to 0 and a pointer (for the dynamically allocated memory if/when not empty) also set to 0. The requirement in operator[] requires that it returns a reference to a character with value 0, but since that character cannot be modified without causing undefined behavior, and since strict aliasing rules allow reading from an lvalue of char type, the implementation could just return a reference to one of the bytes in the size member (all set to 0) in the case of an empty string.
Some implementations of std::string use small object optimizations, in those implementations there will be memory reserved for small strings, including an empty string. While the std::string will obviously not contain a std::string internally, it might contain the sequence of characters that compose an empty string (i.e. a terminating null character)

empty string doesn't contain anything - it's EMPTY. :)

Of course an empty string does not contain an empty string. It'll be turtles all the way down if it did.
Take String empty = ""; that is declaring a string literal that is empty, if you want a string literal to represent a string literal that is empty you would need String representsEMpty = """"; but of course, you need to escape it, giving you string actuallyRepresentsEmpty = "\"\"";
ps, I am taking a pragmatic approach to this. Leave the maths nonsense at the door.
Thinking about you amendment, it could be possible that your 'opponent' meant was that an 'empty' std::string still has an internal storage for characters which is itself empty of characters. That would be an implementation detail I am sure, it could perhaps just keep a certain size (say 10) array of characters 'just incase', so it will technically not be empty.
Of course, there is the trick question answer that 'nothing' fits into anything infinite times, a sort of 'divide by zero' situation.

Today I had the same question since I'm currently bound to a lousy STL implementation (dating back to the pre-C++98 era) that differs from C++98 and all following standards:
TEST_ASSERT(std::string().find(std::string()) == string::npos); // WRONG!!! (non-standard)
This is especially bad if you try to write portable code because it's so hard to prove that no feature depends on that behaviour. Sadly in my case that's actually true: it does string processing to shorten phone numbers input depending on a subscriber line spec.
On Cppreference, I see in std::basic_string::find an explicit description about empty strings that I think matches exactly the case in question:
an empty substring is found at pos if and only if pos <= size()
The referred pos defines the position where to start the search, it defaults to 0 (the beginning).
A standard-compliant C++ Standard Library will pass the following tests:
TEST_ASSERT(std::string().find(std::string()) == 0);
TEST_ASSERT(std::string().substr(0, 0).empty());
TEST_ASSERT(std::string().substr().empty());
This interpretation of "contain" answers the question with yes.

Related

Is the comparison of strings or string views terminated at a null-character?

May a string or string_view include '\0' characters so that the following code prints 1 twice?
Or is this just implementation-defined?
#include <iostream>
#include <string_view>
#include <string>
using namespace std;
int main()
{
string_view sv( "\0hello world", 12 );
cout << (sv == sv) << endl;
string str( sv );
cout << (str == sv) << endl;
}
This isn't a duplicate to the question if strings can have embedded nulls since they obviously can. What I want to ask if the comparison of strings or string views is terminated at a 0-character.
Language lawyer answer since the standards documents are, by definition, the one true source of truth :-)
The standard is clear on this. In C++17 (since that's the tag you provided, but later iterations are similar), [string.operator==] states that, for using strings and/or string views, it:
Returns: lhs.compare(rhs) == 0.
The [string.compare] section further states that these all boil down to a comparison with a string view and explain that it:
Determines the effective length rlen of the strings to compare as the smaller of size() and sv.size(). The function then compares the two strings by calling traits::compare(data(), sv.data(), rlen).
These sizes are not restricted in any way by embedded nulls.
And, if you look at the traits information in table 54 of [char.traits.require], you'll see it's as clear as mud until you separate it out into sections:
X::compare(p,q,n) Returns int:
0 if for each i in [0,n), X::eq(p[i],q[i]) is true; else
a negative value if, for some j in [0,n), X::lt(p[j],q[j]) is true and for each i in [0,j) X::eq(p[i],q[i]) is true; else
a positive value.
The first bullet point is easy, it gives zero if every single character is equal.
The second is a little harder but it basically gives a negative value where the first difference between characters has the first string on the lower side (all previous characters are equal and the offending character is lower in the first string).
The third is just the default "if it's neither equal nor lesser, it must be greater".
nul-character is part of comparison, see https://en.cppreference.com/w/cpp/string/basic_string/operator_cmp
Two strings are equal if both the size of lhs and rhs are equal and each character in lhs has equivalent character in rhs at the same position.

Is the size of an std::string always the number of printed characters?

I'm trying to understand what std::string::size() returns.
According to https://en.cppreference.com/w/cpp/string/basic_string/size it's the "number of CharT elements in the string", but I'm not sure how that relates to the number of printed characters, especially if string termination characters are involved somehow.
This code
int main()
{
std::string str0 = "foo" "\0" "bar";
cout << str0 << endl;
cout << str0.size() << endl;
std::string str1 = "foo0bar";
str1[3] = '\0';
cout << str1 << endl;
cout << str1.size() << endl;
return 0;
}
prints
foo
3
foobar
7
In the case of str0, the size matches the number of printed characters. I assume the constructor iterates on the characters of the string literal until it reaches \0, which is why only 'f', 'o' and 'o' are put in the std::string, i.e. 3 characters, and the string termination character is not put in the std::string.
In the case of str1, the size doesn't match the number of printed characters. I assume the same went on as what I described above, but that I broke something by assigning a character. According to cppreference.com, "the behavior is undefined if this character is modified to any value other than CharT()", so I assume I've walked into undefined behavior here.
My question is this: outside of undefined behavior, is it possible that the size of a std::string doesn't match the number of printed characters, or is it actually something guaranteed by the standard?
(note: if the answer to that question changed between versions of the standard I'm interested in knowing that too)
In the case of str1 ... the behavior is undefined if this character is modified to any value other than CharT(), so I assume I've walked into undefined behavior here.
Your assumption is wrong. There is no UB for two reasons:
You did assign the element to '\0' which happens to be same as CharT() and thus it would be well defined to assign that value to str1[str1.size()].
Furthermore, str1.size() is 7 as you demonstrated and 3 is less than 7 and is therefore within bounds and it would be well defined to assign any value to that element.
is it possible that the size of a std::string doesn't match the number of printed characters
Yes, it is possible. std::string can contain non-printable characters as well, and thus the size is not necessarily the same as the number of printed characters. Your example str1 has no undefined behaviour and demonstrates how size can be different from number of printed characters.
Besides non-printable characters, in some character encodings - notably in unicode - grapheme clusters may consist of multiple graphemes which may consist of multiple code points which may consist of multiple code units (code unit is a single char object). The size of the string is the number of chars i.e. the number of code units. Thus, one should not expect the size of the string to match the number of printed characters.
or is it actually something guaranteed by the standard?
No such guarantee exists.
if the answer to that question changed between versions of the standard I'm interested in knowing that too
There has been no change regarding this.
std::string has several constructors, one of which receives const char* and that's the one that constructs str0. Because there's no length information provided, the string will just be initialized until the null termination character is found
In case of str1 then the string length is really 7 characters. When you replace str1[3] with '\0' then the string doesn't change its length, but the content is now "foo\0bar". Unlike C string, std::string can contain embedded null because it has the length information. Therefore when you cout << str1 << endl; exactly 7 bytes are printed out. It's just that you don't see the byte '\0' in the output because it's ASCII NUL which isn't a printable character
It's recommended to use the s suffix to construct the std::string faster and with the ability to construct from a string with embedded null directly without resorting to another constructor. Try auto str0 = "foo\0bar"s; and see

std::string resize is ruining compare operator (==)

std::string resize causes strings that appear to be equal to no longer be equal. It can appear to be misleading when I hover over the variable in my debugger and they appear to hold the same value.
I think it comes down to the fact that I expected the == operator to stop at the first null character but it keeps going till the end of the size. I'm sure this is working as intended but I was stuck on an issue caused by this for a while so I wanted to see why you would keep comparing characters even after the first null character. thanks!
int main(void)
{
std::string test1;
test1.resize(10);
test1[0] = 'a';
std::string test2 = "a";
//they are not equal
bool same = (test1 == test2);
return 0;
}
test1 is the string "a\0\0\0\0\0\0\0\0\0". test2 is the string "a". They are not equal.
std::string can contain null characters. Its length is not the distance to the first null character. It does also guarantee that the memory buffer containing the characters of the string ends with an additional null character 1 beyond its length.
If you don't intend for the string to be longer but just want the memory, use std::string::reserve. Note that you cannot access elements beyond the end with [] legally, but pushing back or whatever won't cause any new memory allocations until you pass the reserve limit.
This is the intended behavior of std::string. Unlike a c-string a std::string can have as many null characters as you want. For instance "this\0 is\0 a\0 legal\0 std::string\0" would be legal to have as the contents for a std::string. You have to build it like
std::string nulls_inside("this\0 is\0 a\0 legal\0 std::string\0", sizeof("this\0 is\0 a\0 legal\0 std::string\0");
but you can also insert null characters into an existing std::string. In your case you're comparing
"a\0\0\0\0\0\0\0\0\0\0"
against
"a\0"
so it fails.

String comparison of char* to uint8_t [duplicate]

This question already has an answer here:
Comparing uint8_t data with string
(1 answer)
Closed 4 years ago.
I'm new to C and C++, and can't seem to work out how I need to compare these values:
Variable I'm being passed:
typedef struct {
uint8_t ssid[33];
String I want to match. I've tried both of these:
uint8_t AP_Match = "MatchString";
unsigned char* AP_Match = "MatchString";
How I've attempted to match:
if (strncmp(list[i].ssid, "MatchString")) {
if (list[i].ssid == AP_Match) {
if (list[i].ssid == "MatchString") {
// This one fails because String is undeclared, despite having
// an include line for string.h
if (String(reinterpret_cast<const char*>(conf.sta.ssid)) == 'MatchString') {
I've noodled around with this a few different ways, and done some searching. I know one or both of these may be the wrong type, but I'm not sure to get from where I am to working.
There is no such type as "String" defined by any C standard. A string is just an array of characters that are stored as unsigned values based on the chosen encoding. 'string.h' provides various functions for comparison, concatenation, etc. but it can only work if the values you are passing to it are coherent.
The operator "==" is also undefined for string comparisons, because it would require comparing each character at each index, for two arrays that may not be the same size and ultimately may use different encodings, despite the same underlying unsigned integer representation (raising the prospect of false positive comparisons). You can possibly define your own function to do it (note C doesn't allow overloading operators), but otherwise you're stuck with what the standard libraries provide.
Note that strncmp() takes a size parameter for the number of characters to compare (your code is missing this). https://www.tutorialspoint.com/c_standard_library/c_function_strncmp.htm
Otherwise you would be looking at the function strcmp(), which requires the input strings to be null-terminated (last character equal to '\0'). Ultimately it's up to you to consider what the possible combinations of inputs could be and how they are stored and to use a comparison function that is robust to all possibilities.
As a final side note
if (list[i].ssid == "MatchString") {
Since ssid is an array, you should know that when you do this comparison, you are not actually accessing the contents of ssid, but rather the address of the first element of ssid. When you pass list[i].ssid into strcmp (or strncmp), you are passing a pointer to the first element of the array in memory. The function then iterates over the entire array until it reaches the null character (in the case of strcmp) or until it has compared the specified number of elements (in the case of strncmp).
To match two strings use strcmp:
if (0==strcmp(str1, str2))
str1 and str2 are addresses to memory holding a null terminated string. Return value zero means the strings are equal.
In your case one of:
if (0==strcmp(list[i].ssid, AP_Match))
if (0==strcmp(list[i].ssid, "MatchString"))

Is this expression valid?

I've met this code:
std::string str;
std::getline(std::cin, str);
std::string sub = str.substr(str.find('.') + 1);
first reaction was - this is invalid code. But after some thoughts it seems to be not so simple. So is it valid C++ expression (has predictable behavior)?
PS if it is not so clear question is mostly related to what will happen when '.' would not be found in str, but not limited to that, as there could be other issues.
str.find('.') returns the index of the first occurrence of the character in your string. substr with one argument returns the suffix of the string starting at the given index. So what the line does is it returns the tail of the string starting right after the first dot. So if str is "hello.good.bye", sub will be good.bye.
However, there is potentially a problem with the code if the string does not in fact contain any dots. It will return the whole string. This may or may not be intended. This happens because if there is no dot, find will return npos, which is the largest number std::string::size_type can hold. Add 1 to it, and you will get 0 (that's how unsigned types behave, modulo 2n).
So the code always has predictable behavior.
From http://en.cppreference.com/w/cpp/string/basic_string/npos it seems that std::string::npos = -1;. That value is returned when the find is unsuccessful. In that case, the str.substr(0) will return the entire string.
It would seem it is valid and predictable code.
If you're asking about what happens if . is not found, you don't have to worry. std::string::find then returns std::string::npos, which is defined to be -1 by the standard, which, after adding 1 overflows and makes the argument 0:
std::string sub = str.substr(0);
which gives you the whole string. I don't know if that's the desired behavior, but it's certainly not undefined behavior.
Actually, in this specific case--because the string that contains "...does not matter..." actually does--the find() call is not going to find what it is looking for and so will return std::string::npos. You then add 1 to this.
npos is of type std::string::size_type, which is generally size_t, which is usually an unsigned integer of some sort.
npos is defined as the greatest possible value for size_type.
With size_type being unsigned adding 1 to the greatest possible value creates 0.
So you're calling std::string::substr(0). What does this do? It creates a copy of the whole string you called it on because substr takes a starting position and length (defaulting to npos, or "all the way to the end").
As far as I can see it is valid though it isn't particularly readable.
If str is empty then the find() method will return std::string::npos. This is equivalent to the largest unsigned int representable by std::size_type. By adding 1 to this you're causing an integer overflow and it will wrap around to 0. This means the substr() method is attempting to create a string using chars from position 0 till the end of the string. If str is empty then sub is also empty.