Subtracting two strings is giving wrong values in C++

#include<iostream>
#include<string>
using namespace std;
int main(){
string s1 = "abc";
string s2 = "xyz";
cout << s2.compare(s1) << endl;
return 0;
}
This is a simple program in which I just compare two strings and print the return value of the std::string::compare function.
Output:
1
I expected the output of this program to be 23.
Something similar happens with ASCII characters.
#include<iostream>
#include<string>
using namespace std;
int main(){
cout << "a" - "A" << endl;
return 0;
}
The output I get from running the code above is:
Output:
-2
But I expected 32. I don't know what the problem is or why I am getting the wrong output.

The return value of std::string::compare() is, according to the documentation:
Return value
negative value if *this appears before the character sequence specified by the arguments, in lexicographical order
zero if both character sequences compare equivalent
positive value if *this appears after the character sequence specified by the arguments, in lexicographical order
s2 is lexicographically later than s1, so compare returns a positive value, exactly as expected.
"a" - "A" is Undefined Behaviour. You are subtracting the values of two pointers that do not point into the same array. Any output would be valid, as would no output, a crash, or demons flying out of your nose.
I suppose you wanted to subtract the numeric values of the characters, which would in fact give 32 (provided that your compiler uses an ASCII-compatible encoding):
'a' - 'A'
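A minimal sketch of both points above, assuming an ASCII-compatible execution character set. It tests only the sign of compare(), since that is all the standard specifies, and it subtracts char values rather than the addresses of literals. The names sign_of, first_char_difference and print_examples are illustrative, not taken from the answers.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Reduce a compare() result to -1, 0 or +1. Only the sign of compare()'s
// result is specified, so this is the portable way to inspect it.
int sign_of(int result) {
    return (result > 0) - (result < 0);
}

// Subtract character codes (char values), not the addresses of literals.
int first_char_difference(const std::string& a, const std::string& b) {
    return a.at(0) - b.at(0);
}

// Illustrative driver: call from main() to print the values.
void print_examples() {
    std::string s1 = "abc";
    std::string s2 = "xyz";
    std::cout << sign_of(s2.compare(s1)) << '\n';      // prints 1
    std::cout << first_char_difference(s2, s1) << '\n'; // 'x' - 'a'
    std::cout << ('a' - 'A') << '\n';                    // 32 in ASCII
}
```

If you really need the character distance, compute it on characters; if you only need ordering, test compare() against 0.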

The return value of std::string::compare is:
negative value if *this appears before the character sequence specified by the arguments, in lexicographical order
zero if both character sequences compare equivalent
positive value if *this appears after the character sequence specified by the arguments, in lexicographical order
As you see, it is only defined as "negative / zero / positive", and the "negative" and "positive" values need not be -1 and 1.
"a" and "A" are string literals, which represent arrays. Arrays in expressions are automatically converted to pointers to their first elements (with some exceptions), and subtracting two pointers yields an integer telling how many elements the first pointer is from the second one. That subtraction is only defined when both pointers point into the same array, which is not the case here.
What you wanted to use should be character constants, which are surrounded by ' instead of ".
cout << 'a' - 'A' << endl;
Alternatively, you can get the first elements of the arrays using subscripting operators.
cout << "a"[0] - "A"[0] << endl;
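To contrast the undefined "a" - "A" with what the language does define, here is a small sketch (the function names are made up for illustration): pointer subtraction inside one array is well defined, and subscripting the literals first turns the expression into plain char arithmetic. The value 32 assumes an ASCII-compatible encoding.

```cpp
#include <cassert>
#include <cstddef>

// Pointer subtraction is only defined when both pointers point into
// (or one past the end of) the same array; the result is the number
// of elements between them, as a std::ptrdiff_t.
std::ptrdiff_t distance_in_same_array() {
    const char text[] = "hello";   // one array of 6 chars: h e l l o \0
    const char* first = text;      // the array decays to &text[0]
    const char* fourth = text + 3; // points at text[3], the second 'l'
    return fourth - first;         // well defined: 3
}

// Subscripting each literal yields a char, so this is ordinary integer
// arithmetic on character codes, not pointer arithmetic.
int literal_char_difference() {
    return "a"[0] - "A"[0];        // 32 with an ASCII-compatible encoding
}
```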

std::string::compare returns a positive or negative number or 0, it may return 1, -1 and 0 but doesn't have to.
"a" - "A" is performing pointer arithmetic on two unrelated pointers so has undefined behaviour. The behaviour you are seeing is probably that the compiler has laid out your constants in memory as "a\0A\0" so your constants are 2 bytes apart giving a result of -2.

Related

Is the comparison of strings or string views terminated at a null-character?

May a string or string_view include '\0' characters so that the following code prints 1 twice?
Or is this just implementation-defined?
#include <iostream>
#include <string_view>
#include <string>
using namespace std;
int main()
{
string_view sv( "\0hello world", 12 );
cout << (sv == sv) << endl;
string str( sv );
cout << (str == sv) << endl;
}
This isn't a duplicate of the question of whether strings can have embedded nulls, since they obviously can. What I want to ask is whether the comparison of strings or string views stops at a 0-character.
Language lawyer answer since the standards documents are, by definition, the one true source of truth :-)
The standard is clear on this. In C++17 (since that's the tag you provided, but later iterations are similar), [string.operator==] states that, for strings and/or string views, it:
Returns: lhs.compare(rhs) == 0.
The [string.compare] section further states that these all boil down to a comparison with a string view and explains that it:
Determines the effective length rlen of the strings to compare as the smaller of size() and sv.size(). The function then compares the two strings by calling traits::compare(data(), sv.data(), rlen).
These sizes are not restricted in any way by embedded nulls.
And, if you look at the traits information in table 54 of [char.traits.require], you'll see it's as clear as mud until you separate it out into sections:
X::compare(p,q,n) Returns int:
0 if for each i in [0,n), X::eq(p[i],q[i]) is true; else
a negative value if, for some j in [0,n), X::lt(p[j],q[j]) is true and for each i in [0,j) X::eq(p[i],q[i]) is true; else
a positive value.
The first bullet point is easy, it gives zero if every single character is equal.
The second is a little harder but it basically gives a negative value where the first difference between characters has the first string on the lower side (all previous characters are equal and the offending character is lower in the first string).
The third is just the default "if it's neither equal nor lesser, it must be greater".
The null character takes part in the comparison; see https://en.cppreference.com/w/cpp/string/basic_string/operator_cmp:
Two strings are equal if both the size of lhs and rhs are equal and each character in lhs has equivalent character in rhs at the same position.
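A small check of the points above, assuming C++17 for std::string_view; the helper names are illustrative. It mirrors the question's string_view and shows that characters after an embedded '\0' still take part in the comparison, because the length used is size(), not the position of the first null.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Mirrors the question: an explicit length keeps the leading '\0'.
bool question_example_holds() {
    std::string_view sv("\0hello world", 12);
    std::string str(sv);
    return sv == sv && str == sv && str.size() == 12;
}

// Strings that differ only after an embedded '\0' still compare unequal,
// because all size() characters are compared.
bool difference_after_null_is_seen() {
    std::string a("ab\0c", 4);
    std::string b("ab\0d", 4);
    return a != b && a < b;
}

// A trailing '\0' also counts: "ab\0" (size 3) is not equal to "ab" (size 2).
bool trailing_null_counts() {
    return std::string("ab\0", 3) != std::string("ab");
}
```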

Is the size of an std::string always the number of printed characters?

I'm trying to understand what std::string::size() returns.
According to https://en.cppreference.com/w/cpp/string/basic_string/size it's the "number of CharT elements in the string", but I'm not sure how that relates to the number of printed characters, especially if string termination characters are involved somehow.
This code
#include <iostream>
#include <string>
using namespace std;
int main()
{
std::string str0 = "foo" "\0" "bar";
cout << str0 << endl;
cout << str0.size() << endl;
std::string str1 = "foo0bar";
str1[3] = '\0';
cout << str1 << endl;
cout << str1.size() << endl;
return 0;
}
prints
foo
3
foobar
7
In the case of str0, the size matches the number of printed characters. I assume the constructor iterates over the characters of the string literal until it reaches \0, which is why only 'f', 'o' and 'o' are put in the std::string, i.e. 3 characters, and the string termination character is not put in the std::string.
In the case of str1, the size doesn't match the number of printed characters. I assume the same went on as what I described above, but that I broke something by assigning a character. According to cppreference.com, "the behavior is undefined if this character is modified to any value other than CharT()", so I assume I've walked into undefined behavior here.
My question is this: outside of undefined behavior, is it possible that the size of a std::string doesn't match the number of printed characters, or is it actually something guaranteed by the standard?
(note: if the answer to that question changed between versions of the standard I'm interested in knowing that too)
In the case of str1 ... the behavior is undefined if this character is modified to any value other than CharT(), so I assume I've walked into undefined behavior here.
Your assumption is wrong. There is no UB for two reasons:
You assigned '\0' to the element, which is the same as CharT(), so even assigning that value to str1[str1.size()] would be well defined.
Furthermore, str1.size() is 7 as you demonstrated and 3 is less than 7 and is therefore within bounds and it would be well defined to assign any value to that element.
is it possible that the size of a std::string doesn't match the number of printed characters
Yes, it is possible. std::string can contain non-printable characters as well, and thus the size is not necessarily the same as the number of printed characters. Your example str1 has no undefined behaviour and demonstrates how size can be different from number of printed characters.
Besides non-printable characters, in some character encodings, notably the Unicode encodings such as UTF-8, a single printed character (a grapheme cluster) may consist of multiple code points, each of which may consist of multiple code units (a code unit is a single char object). The size of the string is the number of chars, i.e. the number of code units. Thus, one should not expect the size of the string to match the number of printed characters.
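A sketch of both cases, assuming the output is shown by a UTF-8 terminal. The escapes spell out the UTF-8 encoding of é (U+00E9) so the result does not depend on the source file's encoding; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "é" written as its two UTF-8 code units. A UTF-8 terminal shows one
// glyph, but size() counts chars (code units).
std::size_t utf8_e_acute_size() {
    std::string s = "\xC3\xA9";
    return s.size(); // 2
}

// Non-printing characters are counted by size() as well.
std::size_t with_control_char_size() {
    std::string s = "ab\tc\x07"; // tab and BEL are not printing characters
    return s.size();             // 5
}
```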
or is it actually something guaranteed by the standard?
No such guarantee exists.
if the answer to that question changed between versions of the standard I'm interested in knowing that too
There has been no change regarding this.
std::string has several constructors, one of which takes a const char*, and that's the one that constructs str0. Because no length information is provided, the string is initialized only up to the first null character.
In the case of str1, the string length really is 7 characters. When you replace str1[3] with '\0', the string doesn't change its length, but the content is now "foo\0bar". Unlike a C string, std::string can contain embedded nulls because it stores its length. Therefore, when you cout << str1 << endl;, exactly 7 bytes are printed. You just don't see the '\0' byte in the output because ASCII NUL isn't a printable character.
It's recommended to use the s suffix (from std::string_literals) to construct the std::string faster and to construct a string with embedded nulls directly, without resorting to another constructor. Try auto str0 = "foo\0bar"s; and see.
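A sketch comparing the constructions mentioned above. The function names are illustrative; the s suffix needs C++14 and a using-directive for std::string_literals.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

using namespace std::string_literals; // makes the "..."s suffix available

// const char* constructor: copies up to the first '\0', so size() is 3.
std::size_t size_from_pointer() {
    std::string s = "foo\0bar";
    return s.size();
}

// (pointer, count) constructor: copies exactly 7 chars, '\0' included.
std::size_t size_from_pointer_and_count() {
    std::string s("foo\0bar", 7);
    return s.size();
}

// operator""s receives the literal's length, so the '\0' is kept.
std::size_t size_from_literal_suffix() {
    auto s = "foo\0bar"s;
    return s.size();
}
```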

String comparison giving different output when used with variable [duplicate]

Why is the output of the following programs different?
I'm comparing two strings, and I don't understand why they give different outputs.
Code 1:
#include <iostream>
using namespace std;
int main() {
if("35" <= "255")
{
cout << 1;
}
cout << 0;
}
Code 2:
#include <iostream>
#include <string>
using namespace std;
int main() {
string num = "35";
if(num <= "255")
{
cout << 1;
}
cout << 0;
}
The output of code 1 is 10. The output of code 2 is 0.
You made the second program different by using std::string.
std::string has an overload of the comparison operator that compares the contents of the operands lexicographically. Lexicographical ordering, which is different from numerical ordering, is the same ordering used in a dictionary: 255 comes before (i.e. "is less than") 35, just like aardvark comes before zoo.
String literals, on the other hand, are arrays, which decay to pointers to their first elements, and pointer comparison compares relative locations in memory. That has nothing to do with the text content, and in this case the result is at best unspecified, so you may or may not see 1 in the output.
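A minimal sketch of content-based comparison (the helper names are hypothetical): std::string's operator <= and std::strcmp both look at the characters, which is why "35" sorts after "255", whereas comparing the two literals directly only compares addresses.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// std::string's <= compares contents lexicographically (dictionary order).
bool string_le(const std::string& a, const std::string& b) {
    return a <= b;
}

// For C strings, std::strcmp compares contents in the same way.
// Writing "35" <= "255" instead would compare two unrelated addresses.
bool c_string_le(const char* a, const char* b) {
    return std::strcmp(a, b) <= 0;
}
```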
A string is not a magic object that understands what it contains and acts differently on it.
In your case, you are comparing the address that holds a buffer of chars (containing '3', '5', 0) with the address of another buffer of chars (containing '2', '5', '5', 0).
The output is random (in fact, it isn't, but for now, let's assume it is).
If you want to compare strings, you can use the second example (or strcmp), but that compares the buffer contents according to rules that are not the ones you expect (you expect numeric ordering, but that's not what you get).
The rules are:
Compare the buffers char by char according to their character codes (ASCII/Unicode order), and return a negative value if the first one is lower than the second one, a positive value if it's higher. (The < operator yields true if the result is negative, false otherwise, and so on.)
If they are equal, continue to next char.
In the previous example, '3' is higher than '2' (even though 35 is smaller than 255).
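The rules above can be sketched as a hand-written comparison. This is an illustration, not how any particular library implements it. It compares characters as unsigned char, which is what std::strcmp and std::char_traits<char> specify, and it treats a string that is a prefix of the other as the smaller one.

```cpp
#include <cassert>
#include <string>

// Walk both strings in step and stop at the first differing character;
// if one string runs out first, the shorter (prefix) string sorts first.
int lexicographic_compare(const std::string& a, const std::string& b) {
    std::string::size_type n = a.size() < b.size() ? a.size() : b.size();
    for (std::string::size_type i = 0; i < n; ++i) {
        unsigned char ca = static_cast<unsigned char>(a[i]);
        unsigned char cb = static_cast<unsigned char>(b[i]);
        if (ca < cb) return -1;
        if (ca > cb) return 1;
        // equal: continue with the next character
    }
    if (a.size() < b.size()) return -1;
    if (a.size() > b.size()) return 1;
    return 0;
}
```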
You'll need to either convert your string to integer before comparing (and deal with potential conversion errors) or use integers from the beginning.
First convert the strings to int and then compare them.
Example:
#include <iostream>
#include <string>
using std::cout;
std::string value1 = "22";
std::string value2 = "222";
int main()
{
if(std::stoi(value1)<=std::stoi(value2))
{
cout<<"1";
}
cout<<"0";
}
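An answer above mentions dealing with conversion errors; std::stoi throws std::invalid_argument or std::out_of_range on bad input. A hedged sketch assuming C++17 (std::optional): to_int and numeric_less_equal are hypothetical helpers, and rejecting trailing characters such as "12ab" is a design choice of this sketch, since std::stoi alone would return 12 for that input.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts text to int, returning std::nullopt
// instead of throwing when the text is not a valid int.
std::optional<int> to_int(const std::string& text) {
    try {
        std::size_t consumed = 0;
        int value = std::stoi(text, &consumed);
        if (consumed != text.size()) {
            return std::nullopt; // trailing characters, e.g. "12ab"
        }
        return value;
    } catch (const std::invalid_argument&) {
        return std::nullopt; // no leading number at all, e.g. "abc"
    } catch (const std::out_of_range&) {
        return std::nullopt; // does not fit in an int
    }
}

// Numeric a <= b; an empty optional means one side failed to convert.
std::optional<bool> numeric_less_equal(const std::string& a, const std::string& b) {
    std::optional<int> x = to_int(a);
    std::optional<int> y = to_int(b);
    if (!x || !y) {
        return std::nullopt;
    }
    return *x <= *y;
}
```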

what does cout << "\n"[a==N]; do?

In the following example:
cout<<"\n"[a==N];
I have no clue about what the [] option does in cout, but it does not print a newline when the value of a is equal to N.
I have no clue about what the [] option does in cout
This is actually not a cout option. What is happening is that "\n" is a string literal. A string literal has the type array of n const char, and the [] is simply an index into that array of characters, which in this case contains:
\n\0
Note that \0 is appended to all string literals.
The == operator results in either true or false, so the index will be:
0 if false, if a does not equal N resulting in \n
1 if true, if a equals N resulting in \0
This is rather cryptic and could have been replaced with a simple if.
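A small sketch of what the expression evaluates to, together with the simple if suggested above; pick and newline_unless_equal are hypothetical names.

```cpp
#include <cassert>
#include <iostream>

// "\n" is a const char[2] holding {'\n', '\0'}. A bool index is promoted
// to 0 (false) or 1 (true), so this picks one of those two characters.
char pick(bool a_equals_n) {
    return "\n"[a_equals_n];
}

// A readable way to express the apparent intent: print a newline only
// when a != n, and write nothing at all otherwise.
void newline_unless_equal(int a, int n) {
    if (a != n) std::cout << '\n';
}
```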
For reference, the C++14 standard (Lightness confirmed that the draft matches the published standard), whose closest draft is N3936, says in section 2.14.5 String literals [lex.string] (emphasis mine):
string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration (3.7).
and:
After any necessary concatenation, in translation phase 7 (2.2), '\0' is appended to every string literal so that programs that scan a string can find its end.
section 4.5 [conv.prom] says:
A prvalue of type bool can be converted to a prvalue of type int, with false becoming zero and true becoming one.
Writing a null character to a text stream
The claim was made that writing a null character(\0) to a text stream is undefined behavior.
As far as I can tell this is a reasonable conclusion: cout is defined in terms of C streams, as we can see from 27.4.2 [narrow.stream.objects], which says:
The object cout controls output to a stream buffer associated with the object stdout, declared in <cstdio> (27.9.2).
and the C11 draft standard in section 7.21.2 Streams says:
[...]Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line;
and printing characters are covered in 7.4 Character handling <ctype.h>:
[...]the term control character refers to a member of a locale-specific set of characters that are not printing characters.199) All letters and digits are printing characters.
with footnote 199 saying:
In an implementation that uses the seven-bit US ASCII character set, the printing characters are those whose values lie from 0x20 (space) through 0x7E (tilde); the control characters are those whose values lie from 0 (NUL) through 0x1F (US), and the character 0x7F (DEL).
Finally, we can see that the result of writing a null character is not specified, and section 4 Conformance tells us that this makes it undefined behavior:
[...]Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior.[...]
We can also look to the C99 rationale which says:
The set of characters required to be preserved in text stream I/O are those needed for writing C programs; the intent is that the Standard should permit a C translator to be written in a maximally portable fashion. Control characters such as backspace are not required for this purpose, so their handling in text streams is not mandated.
cout<<"\n"[a==N];
I have no clue about what the [] option does in cout
In the C++ operator precedence table, operator [] binds tighter than operator <<, so your code is equivalent to:
cout << ("\n"[a==N]); // or operator<<(cout, "\n"[a==N]);
In other words, operator [] has nothing to do with cout directly; it is used only to index the string literal "\n".
For example for(int i = 0; i < 3; ++i) std::cout << "abcdef"[i] << std::endl; will print characters a, b and c on consecutive lines on the screen.
Because string literals in C++ are always terminated with a null character ('\0', L'\0', char16_t(), etc.), the string literal "\n" is a const char[2] holding the characters '\n' and '\0'.
In memory layout this looks like:
+--------+--------+
|  '\n'  |  '\0'  |
+--------+--------+
    0        1      <-- Offset
  false     true    <-- Result of condition (a == N)
  a != N   a == N   <-- Case
So if a == N is true (promoted to 1), the expression "\n"[a == N] yields '\0'; if it is false, it yields '\n'.
It is functionally similar (but not identical) to:
char anonymous[] = "\n";
int index;
if (a == N) index = 1;
else index = 0;
cout << anonymous[index];
The value of "\n"[a==N] is '\n' or '\0'.
Its type is const char.
If the intention is to print nothing (which may be different from printing '\0' depending on platform and purpose), prefer the following line of code:
if(a != N) cout << '\n';
Even if your intention is to write either '\0' or '\n' on the stream, prefer a readable code for example:
cout << (a == N ? '\0' : '\n');
It's probably intended as a bizarre way of writing
if ( a != N ) {
cout<<"\n";
}
The [] operator selects an element from an array. The string "\n" is actually an array of two characters: a new line '\n' and a string terminator '\0'. So cout<<"\n"[a==N] will print either a '\n' character or a '\0' character.
The trouble is that you're not allowed to send a '\0' character to an I/O stream in text mode. The author of that code might have noticed that nothing seemed to happen, so he assumed that cout<<'\0' is a safe way to do nothing.
In C and C++, that is a very poor assumption because of the notion of undefined behavior. If the program does something that is not covered by the specification of the standard or the particular platform, anything can happen. A fairly likely outcome in this case is that the stream will stop working entirely — no more output to cout will appear at all.
In summary, the effect is,
"Print a newline if a is not equal to N. Otherwise, I don't know. Crash or something."
… and the moral is, don't write things so cryptically.
It is not an option of cout but an index into the array "\n".
The array index [a==N] evaluates to [0] or [1] and indexes the character array represented by "\n", which contains a newline and a null character.
However, passing a null character to the iostream has undefined results, so it would be better to pass a string:
cout << &("\n"[a==N]) ;
However, the code in either case is not particularly advisable and serves no particular purpose other than to obfuscate; do not regard it as an example of good practice. The following is preferable in most instances:
cout << (a != N ? "\n" : "") ;
or just:
if( a != N ) cout << '\n' ;
Each of the following lines will generate exactly the same output:
cout << "\n"[a==N]; // Never do this.
cout << (a==N)["\n"]; // Or this.
cout << *((a==N)+"\n"); // Or this.
cout << *("\n"+(a==N)); // Or this.
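The claim that these four spellings are interchangeable can be checked with a short sketch. The formN names are illustrative, and this demonstrates the language rule rather than recommending the style.

```cpp
#include <cassert>

// All four spellings read the same 2-element array {'\n', '\0'}:
// the built-in E1[E2] means *((E1)+(E2)), and pointer + integer commutes.
// The bool operand is promoted to int (false -> 0, true -> 1).
char form1(bool eq) { return "\n"[eq]; }
char form2(bool eq) { return eq["\n"]; }
char form3(bool eq) { return *(eq + "\n"); }
char form4(bool eq) { return *("\n" + eq); }
```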
As the other answers have specified, this has nothing to do with std::cout. It instead is a consequence of
How the primitive (non-overloaded) subscripting operator is implemented in C and C++.
In both languages, if array is a C-style array of primitives, array[42] is syntactic sugar for *(array+42). Even worse, there's no difference between array+42 and 42+array. This leads to interesting obfuscation: Use 42[array] instead of array[42] if your goal is to utterly obfuscate your code. It goes without saying that writing 42[array] is a terrible idea if your goal is to write understandable, maintainable code.
How booleans are transformed to integers.
Given an expression of the form a[b], one of a or b must be a pointer expression and the other must be an integer expression. Given the expression "\n"[a==N], the "\n" represents the pointer part of that expression and the a==N represents the integer part of the expression. Here, a==N is a boolean expression that evaluates to false or true. The integer promotion rules specify that false becomes 0 and true becomes 1 on promotion to an integer.
How string literals degrade into pointers.
When a pointer is needed, arrays in C and C++ readily degrade into a pointer that points to the first element of the array.
How string literals are implemented.
Every C-style string literal is appended with the null character '\0'. This means the internal representation of your "\n" is the array {'\n', '\0'}.
Given the above, suppose a==N evaluates to false. In this case, the behavior is well-defined across all systems: You'll get a newline. If, on the other hand, a==N evaluates to true, the behavior is highly system dependent. Based on comments to answers to the question, Windows will not like that. On Unix-like systems where std::cout is piped to the terminal window, the behavior is rather benign. Nothing happens.
Just because you can write code like that doesn't mean you should. Never write code like that.

How to compare const char* with a string in C++?

I am working with C++ and I am trying to compare strings.
Below is my code which gives me back const char* -
const char* client_id() const {
return String(m_clientPos);
}
And now I am comparing the strings like this -
cout<<client_ptr->client_id()<< endl;
if (strcmp(client_ptr->client_id(), "Hello")) {
..
} else {
..
}
but it never goes into if statement. But my cout prints out Hello correctly. Is there anything wrong I am doing?
You need to write if (0 == strcmp(...)) instead.
See http://www.cplusplus.com/reference/cstring/strcmp/
strcmp
Returns an integral value indicating the relationship between the strings:
A zero value indicates that both strings are equal.
A value greater than zero indicates that the first character that does not match has a greater value in str1 than in str2; And a value less than zero indicates the opposite.
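A minimal sketch of the fix (the helper names are hypothetical): compare the result of std::strcmp with 0 explicitly, or, with C++17, wrap the pointer in std::string_view and use ==. Either helper assumes its arguments point to valid null-terminated strings.

```cpp
#include <cassert>
#include <cstring>
#include <string_view>

// strcmp returns 0 when the strings are equal, so equality needs an
// explicit == 0; using the raw result as a condition means "they differ".
bool c_strings_equal(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

// C++17 alternative: std::string_view compares contents with ==.
bool is_hello(const char* id) {
    return std::string_view(id) == "Hello";
}
```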
it never goes into if statement.
The strcmp function returns zero when the strings are the same, so you should see the code hit the else branch when the two strings are equal to each other.
A zero value indicates that both strings are equal.
A value greater than zero indicates that the first character that does not match has a greater value in str1 than in str2;
And a value less than zero indicates the opposite.
Since String does not look like a built-in class and assuming that you have access to its source, you may be better off making the comparison with const char* a member function of the String class.