std::string comparison, lexicographical or not - c++

The following code comes from the article C++ quirks, part 198276:
#include <iostream>
#include <string>
using namespace std;
int main()
{
std::string a = "\x7f";
std::string b = "\x80";
cout << (a < b) << endl;
cout << (a[0] < b[0]) << endl;
return 0;
}
Surprisingly, the output is
1
0
Shouldn't string comparison be lexicographical? If yes, how is the output explained?

There is nothing in the C++ specification to say whether char is signed or unsigned; it's up to the compiler. For your compiler it seems that char defaults to signed char, which is why the second comparison returns false.
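If you want to check what your particular compiler does, here is a small sketch of mine (not part of the original question or answers) using <type_traits>:
#include <iostream>
#include <type_traits>
int main()
{
    // Prints "true" if plain char is a signed type on this compiler/platform.
    std::cout << std::boolalpha << std::is_signed<char>::value << std::endl;
}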

So I'm just going to quote directly from your link:
It turns out that this behavior is required by the standard, in section 21.2.3.1 [char.traits.specializations.char]: “The two-argument members eq and lt shall be defined identically to the built-in operators == and < for type unsigned char.”
So:
(a < b) is required to use unsigned char comparisons.
(a[0] < b[0]) is required to use char comparisons, which may or may not be signed.
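To see the two behaviours side by side, here is a small sketch of mine (not from the linked article); the result of the plain char comparison depends on whether char is signed on your platform:
#include <iostream>
#include <string>   // std::char_traits
int main()
{
    char a = '\x7f';   // 127
    char b = '\x80';   // -128 if char is signed, 128 if unsigned
    // char_traits<char>::lt must compare as unsigned char, just like operator< on std::string.
    std::cout << std::char_traits<char>::lt(a, b) << std::endl;   // 1
    // The built-in comparison uses plain char: 0 here if char is signed.
    std::cout << (a < b) << std::endl;
    // Casting to unsigned char reproduces what the string comparison does.
    std::cout << (static_cast<unsigned char>(a) < static_cast<unsigned char>(b)) << std::endl;   // 1
}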

Difference between char in C and C++? [duplicate]

This question already has answers here:
Why are C character literals ints instead of chars?
I know that C and C++ are different languages.
Code - C
#include <stdio.h>
int main()
{
printf("%zu",sizeof('a'));
return 0;
}
Output
4
Code - C++
#include <iostream>
int main()
{
std::cout<<sizeof('a');
return 0;
}
Output
1
In this answer, https://stackoverflow.com/a/14822074/11862989, user Kerrek SB talks about types in C++ but mentions neither char nor int, only "integral".
Is char in C++ an integral type or a strict char type?
Is char in C++ an integral type or a strict char type?
Character types, such as char, are integral types in C++.
The type of a narrow character constant in C is int, while the type of a narrow character literal in C++ is char.
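A quick way to confirm the C++ side of that claim (a sketch of mine, not part of the original answer):
#include <type_traits>
static_assert(std::is_same<decltype('a'), char>::value,
              "in C++ a character literal has type char");
static_assert(sizeof('a') == 1, "so sizeof('a') is 1");
int main() {}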
Is char in C++ an integral type or a strict char type?
Using type_traits lets you check the type:
#include <iostream>
#include <type_traits>
int main()
{
std::cout << std::is_integral<char>();
}
Output:
1
As others mentioned, in C 'a' is a character constant and has type int.
In C++ it has type char, which is itself an integral type.
Also, you can compare the types of char c = 'a' and 'a' in C++ by using RTTI (run-time type information) as follows:
#include <iostream>
#include <typeinfo>
using namespace std;
int main()
{
char c = 'a';
// Get the type info using typeid operator
const type_info& ti2 = typeid('a');
const type_info& ti3 = typeid(c);
// Check if both types are same
if (ti2 != ti3)
cout << "different type" << endl;
else
cout << "same type"<< endl;
return 0;
}
The output is: same type.
However, char c = 'a' and 'a' are NOT same in C.

C++ Safely and Efficiently Casting std::weak_ordering to int

C++20 is introducing a new comparison type: std::weak_ordering.
It allows for representing less than, equal to, or greater than.
However, some older functions use an int for a similar purpose, such as qsort, whose comparison function has the signature
int compar (const void* p1, const void* p2);
How can I cast std::weak_ordering to int for the use in a function such as qsort?
Here is an example situation:
#include <compare>
#include <iostream>
int main() {
long a = 2354, b = 1234;
std::weak_ordering cmp = a <=> b;
if (cmp > 0) std::cout << "a is greater than b" << std::endl;
if (cmp == 0) std::cout << "a is equal to b" << std::endl;
if (cmp < 0) std::cout << "a is less than b" << std::endl;
int equivalent_cmp = cmp; // errors
}
In testing, I noticed that using a reinterpret_cast to int8_t type does work, but I am not sure if this would be portable.
int equivalent_cmp = *(int8_t *)&cmp;
or equivalently,
int equivalent_cmp = *reinterpret_cast<int8_t*>(&cmp);
Is this safe?
Furthermore, there are some other solutions that can work, but are inefficient compared to this "unsafe" method. All of these would be slower than the above solutions:
int equivalent_cmp = (a > b) - (a < b);
or
int equivalent_cmp;
if (cmp < 0) equivalent_cmp = -1;
else if (cmp == 0) equivalent_cmp = 0;
else equivalent_cmp = 1;
Is there a better solution that would be guaranteed to work?
Is there a better solution that would be guaranteed to work?
No.
The standard does not specify the contents or representation of the ordering classes. Barry's answer is based on reasonable assumptions that are likely to hold, but they are not guaranteed.
Should you need it, your best bet is to write something like your last snippet:
constexpr int ordering_as_int(std::weak_ordering cmp) noexcept {
return (cmp < 0) ? -1 : ((cmp == 0) ? 0 : 1);
}
How can I cast std::weak_ordering to int for the use in a function such as qsort?
The easy answer is: don't use qsort, use std::sort; it'll perform better anyway.
That said, we know that std::weak_ordering has to have some integral type member, and C++20 does come with a mechanism to pull it out: std::bit_cast:
static_assert(std::bit_cast<int8_t>(0 <=> 1) == -1);
The rule is that the type you're casting to (in this case int8_t) has to be the same size as the type you're casting from (in this case std::strong_ordering). That's a constraint on bit_cast, so it's safe - if the implementation actually stores an int instead of an int8_t, this won't compile.
So more generally, you'd have to write a short metaprogram to determine the correct signed integer type to cast into.
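Something along these lines could do it; this is only a sketch (the name ordering_to_int is mine), and it still rests on the unguaranteed assumption that the ordering type is trivially copyable and stores a single signed integer whose size matches one of the fixed-width types:
#include <bit>
#include <compare>
#include <cstdint>
template <class Ordering>
constexpr int ordering_to_int(Ordering cmp) noexcept {
    // Pick a signed integer type of the same size as the ordering type,
    // so std::bit_cast's size constraint is satisfied.
    if constexpr (sizeof(Ordering) == sizeof(std::int8_t))
        return std::bit_cast<std::int8_t>(cmp);
    else if constexpr (sizeof(Ordering) == sizeof(std::int16_t))
        return std::bit_cast<std::int16_t>(cmp);
    else if constexpr (sizeof(Ordering) == sizeof(std::int32_t))
        return std::bit_cast<std::int32_t>(cmp);
    else
        return static_cast<int>(std::bit_cast<std::int64_t>(cmp));
}
// These hold on the common implementations, but are not guaranteed by the standard.
static_assert(ordering_to_int(1 <=> 2) < 0);
static_assert(ordering_to_int(2 <=> 2) == 0);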
Note that while weak_ordering and strong_ordering will just be implemented as storing an integer (though not int as illustrated in the standard), partial_ordering will probably not be implemented as storing an int and a bool - it will likely still be implemented as a single integer. So the trick should still work there as well.

Purpose of using (int) in the code below?

n = b.size();
n = max(n, (int)a.size());
where a and b are some user-input strings and n is an integer. Would anybody tell me why we use (int)a.size(), and what is the purpose of the (int)?
I am assuming that n is of type int, and your program will be something like this:
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main ()
{
string a ("Test string");
string b ("Test two");
int n = b.size();
n = max(n,(int)a.size());
cout << "n : " << n ;
return 0;
}
Now if you look at the documentation for the .size() method of the string class, you will see it returns a value of type size_t. As per the documentation:
size_t is an unsigned integral type (the same as member type string::size_type)
Now when we look at the documentation for max(), you can see it uses templating (you can read more about templating here). What it essentially means is that you can use any type as a parameter (int, float, etc.), but both parameters need to be the same type.
Now since n was declared as an int, when calling max(n, x), x needs to have the same type as n, which basically means int in our case.
This is the reason for using (int) before a.size(). What we are doing here is type casting: since a.size() returns a value of type size_t, which is different from int (you can read more about this here), we need to cast the return value to int, which can be done with (int)a.size().
SIDE NOTE
int n = max(b.size(),a.size());
cout << "n : " << n << " \n";
would also work, since both arguments are the same type, so there is no need for type casting.
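If you would rather avoid the C-style cast, two other options (a sketch; the variable names n1 and n2 are mine) are to force max's template argument explicitly or to use the named static_cast. Note that assigning the size_t result back to an int still narrows, just implicitly:
#include <algorithm>
#include <string>
int main()
{
    std::string a("Test string");
    std::string b("Test two");
    // Force both arguments to the same unsigned type; no cast needed.
    auto n1 = std::max<std::string::size_type>(b.size(), a.size());
    // Or keep working in int and use the named cast instead of (int).
    int n2 = std::max(static_cast<int>(b.size()), static_cast<int>(a.size()));
    (void)n1;
    (void)n2;
    return 0;
}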

Output ASCII value of character

#include <iostream>
using namespace std;
int main()
{
char x;
cout << "enter a character:";
cin >> x;
cout << "ASCII Value of " << x << "is" << string(x);
return 0 ;
}
the error is
main.cpp||In function 'int main()':|
main.cpp|10|error: invalid conversion from 'char' to 'const char*'|
main.cpp|10|error: initializing argument 1 of 'std::basic_string<_CharT, _Traits,_Alloc>::basic_string(const _CharT*, const _Alloc&) [with _CharT = char, _Traits = std::char_traits<char>, _Alloc = std::allocator<char>]'|
||=== Build finished: 2 errors, 0 warnings ===|
std::cout << "ASCII Value of " << x << "is" << (int)x;
is one way (the cast circumvents the special treatment of a char type by the I/O stream library), but this will output your platform's encoded value of the character, which is not necessarily ASCII.
A portable solution is much more complex: You'll need to encode the ASCII set in a 128 element array of elements capable of storing a 7 bit unsigned value, and map x to a suitable element of that.
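For illustration only, here is one rough sketch of that idea (the table and the helper name ascii_of are mine; it uses a linear search over the printable characters instead of a full 128-element array):
#include <iostream>
// Returns the ASCII code of a printable character, independent of the
// platform's native encoding; -1 if the character is not in the table.
int ascii_of(char c)
{
    static const char printable[] =
        " !\"#$%&'()*+,-./0123456789:;<=>?@"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
        "abcdefghijklmnopqrstuvwxyz{|}~";
    for (int i = 0; printable[i] != '\0'; ++i)
        if (printable[i] == c)
            return 32 + i;   // the table starts at ASCII 32 (space)
    return -1;
}
int main()
{
    std::cout << ascii_of('A') << std::endl;   // 65
}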
There are 3 approaches to solving this problem:
Use to_string
Passing the correct value to cout
Using the std::string class correctly
The solutions are marked with numbers in the comments.
Use std::to_string
Since C++11, there is a function to convert numbers to a string (to_string):
/*(1)*/ std::cout << std::to_string( x );
There is no overload for a char parameter, so the value is implicitly converted.
Passing the correct value to cout
cout displays the value of a char object as a character.
If we want to output the numeric value of a char object, we need to convert it to a type which cout outputs as a number instead of a character.
The C++ standard guarantees:
1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
So any of those integer types can be used. Usually int is selected.
There are 4 conversions that can be used here:
1) Implicit - "Implicit conversions are performed whenever an expression of some type T1 is used in a context that does not accept that type, but accepts some other type T2;"
/*(2)*/ int i = x;
std::cout << i;
2) Explicit - "Converts between types using a combination of explicit and implicit conversions."
/*(3)*/ std::cout << (int)x;
/*(4)*/ std::cout << int(x); // unsigned int(x) - is invalid,
// has to be a single-word type name
3) A named cast.
/*(5)*/ std::cout << static_cast<int>(x);
4) Use the T{e} notation for construction
/*(6)*/ std::cout << int{x};
The T{e} construction syntax makes it explicit that construction is desired. The T{e} construction syntax doesn't allow narrowing. T{e} is the only safe and general expression for constructing a value of type T from an expression e. The cast notations T(e) and (T)e are neither safe nor general.
About conversions the C++ Core Guidelines specifies the following (among others)
ES.48: Avoid casts
ES.49: If you must use a cast, use a named cast
ES.64: Use the T{e} notation for construction
In this case I would suggest (3) or (4).
Using the std::string class correctly
string is a specialization of basic_string
using string = basic_string<char>;
basic_string has many constructors.
There are only 2 constructors that can take a predefined number of chars:
basic_string( size_type count, CharT ch, const Allocator& alloc = Allocator() );
Constructs the string with count copies of character ch. The behavior is undefined if count >= npos.
/*(7)*/ std::string s = std::string( 1, x );
basic_string( const CharT* s, size_type count, const Allocator& alloc = Allocator() );
Constructs the string with the first count characters of character string pointed to by s. s can contain null characters. The length of the string is count. The behavior is undefined if s does not point at an array of at least count elements of CharT, including the case when s is a null pointer.
/*(8)*/ std::string s = std::string( &x, 1 );
#include <iostream>
using namespace std;
int main()
{
char x;
cout<< "enter a character:";
cin>>x;
cout<< "ASCII Value of "<< x<< "is"<< int(x);
return 0 ;
}
You mean to get the character back? Try this code:
#include <iostream>
using namespace std;
int main()
{
char x;
cout<< "enter a character:";
cin>>x;
cout<< "ASCII Value of "<< x<< "is"<< char(x);
return 0 ;
}
Try this; it's called return.

What is wrong with my for loops? I get warnings: comparison between signed and unsigned integer expressions [-Wsign-compare]

#include <iostream>
#include <string>
#include <vector>
#include <sstream>
using namespace std;
int main() {
vector<double> vector_double;
vector<string> vector_string;
...
while (cin >> sample_string)
{
...
}
for(int i = 0; i <= vector_string.size(); i++)
{
....
}
for (int i = 0; i < vector_double.size(); i++)
....
return 0;
}
Why is there a warning with -Wsign-compare?
As the name of the warning, and its text, imply, the issue is that you are comparing a signed and an unsigned integer. It is generally assumed that this is an accident.
In order to avoid this warning, you simply need to ensure that both operands of < (or any other comparison operator) are either both signed or both unsigned.
How could I do better?
The idiomatic way of writing a for loop is to initialize both the counter and the limit in the first statement:
for (std::size_t i = 0, max = vec.size(); i != max; ++i)
This saves recomputing size() at each iteration.
You could also (and probably should) use iterators instead of indices:
for (auto it = vec.begin(), end = vec.end(); it != end; ++it)
auto here is a shorthand for std::vector<int>::iterator. Iterators work for any kind of container, whereas indices limit you to C-arrays, deque and vector.
It is because the .size() function of the vector class does not return int but vector::size_type.
Use that, or auto i = 0u, and the messages should disappear.
int is signed by default - it is equivalent to writing signed int. The reason you get a warning is because size() returns a vector::size_type which is more than likely unsigned.
This has potential danger since signed int and unsigned int hold different ranges of values. A signed int can hold values between -2147483648 and 2147483647, while an unsigned int can hold values between 0 and 4294967295 (assuming int is 32 bits).
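A short sketch of mine (not from this answer) showing why the mix is dangerous rather than merely pedantic:
#include <iostream>
#include <vector>
int main()
{
    std::vector<int> v;   // empty, so v.size() == 0
    int i = -1;
    // -1 is converted to the unsigned size type and becomes a huge value,
    // so this "obviously true" comparison is false (and -Wsign-compare warns).
    std::cout << (i < v.size()) << std::endl;   // prints 0
}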
I usually solve it like this:
for(int i = 0; i <= (int)vector_string.size(); i++)
I use the C-style cast because it's shorter and more readable than the C++ static_cast<int>(), and accomplishes the same thing.
There's a potential for overflow here, but only if your vector size is larger than the largest int, typically 2147483647. I've never in my life had a vector that large. If there's even a remote possibility of using a larger vector, one of the answers suggesting size_type would be more appropriate.
I don't worry about calling size() repeatedly in the loop, since it's likely an inline access to a member variable that introduces no overhead.
You get this warning because the size of a container in C++ is an unsigned type and mixing signed/unsigned types is dangerous.
What I do normally is
for (int i=0,n=v.size(); i<n; i++)
....
this is in my opinion the best way to use indexes because using an unsigned type for an index (or the size of a container) is a logical mistake.
Unsigned types should be used only when you care about the bit representation and when you are going to use the modulo-(2**n) behavior on overflow. Using unsigned types just because a value is never negative is nonsense.
A typical bug of using unsigned types for sizes or indexes is for example
// Draw all lines between adjacent points
for (size_t i=0; i<pts.size()-1; i++)
drawLine(pts[i], pts[i+1]);
the above code is UB when the point array is empty because in C++ 0u-1 is a huge positive number.
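To make the fix concrete, here is a self-contained sketch of mine (printing the index pairs stands in for drawLine); both loop forms below avoid the wrap-around on an empty vector:
#include <cstddef>
#include <iostream>
#include <vector>
int main()
{
    std::vector<int> pts;   // deliberately empty to show the fix is safe
    // Signed loop variable: (int)pts.size() - 1 is -1 for an empty vector,
    // so the body simply never runs instead of indexing out of bounds.
    for (int i = 0; i < (int)pts.size() - 1; i++)
        std::cout << pts[i] << " -> " << pts[i + 1] << '\n';
    // Equivalent unsigned formulation that also avoids the wrap-around.
    for (std::size_t i = 0; i + 1 < pts.size(); i++)
        std::cout << pts[i] << " -> " << pts[i + 1] << '\n';
    std::cout << "done\n";
    return 0;
}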
The reason C++ uses an unsigned type for the size of containers is historical heritage from 16-bit computers (and IMO, given C++ semantics for unsigned types, it was the wrong choice even back then).
Your variable i is an int, while the size member function of vector returns an Allocator::size_type, most likely size_t, which is almost always implemented as an unsigned integer of some size.
Make your int i a size_type i instead.
std::vector::size() returns size_type, which is an unsigned integer type, since a size cannot be negative.
The warning is obviously because you are comparing a signed integer with an unsigned integer.
Answering after so many answers, but no one noted the loop end. So, here's my full answer:
To remove the warning, change i's type to be unsigned, auto (for C++11), or std::vector< your_type >::size_type.
Your first for loop will read out of bounds (and may seg-fault) if you use this i as an index - you must loop from 0 to size - 1, inclusive. So, change it to be
for( std::vector< your_type >::size_type i = 0; i < vector_xxx.size(); ++i )
(note the <, not <=; my advice is not to use <= with .size() - 1 either, because you can have a 0-size vector and .size() - 1 will wrap around, so you will have issues with that :) ).
To make this more generic, as you're using a container and iterating through it, you can use iterators. This will make a future change of the container type easier (if you don't need the exact position as a number, of course). So, I would write it like this:
for( std::vector< your_type >::iterator iter = vector_XXX.begin();
iter != vector_XXX.end();
++iter )
{
//..
}
Declaring i as size_t works well for me.
std::cout << -1U << std::endl;                  // 4294967295
std::cout << (unsigned)-1 << std::endl;         // 4294967295
std::cout << 1 << std::endl;                    // 1
std::cout << (signed)1 << std::endl;            // 1
std::cout << (unsigned short)-1 << std::endl;   // 65535
std::cout << (signed)-1U << std::endl;          // -1
std::cout << (signed)4294967295 << std::endl;   // -1
Make your index variable unsigned:
unsigned int index;
index < vecArray.size() // size() would never be negative
Some answers suggest using auto, but that won't work on its own, as int is the type deduced from a plain integer literal. Before C++23 you have to explicitly spell out the type std::size_t, defined in the <cstddef> header:
for(std::size_t i = 0; i <= vector_string.size(); i++)
{
....
}
In C++23 the integer literal suffix zu was added; the motivation was precisely to allow the correct type to be deduced.
for(auto i = 0zu; i <= vector_string.size(); i++)
{
....
}
But, unfortunately, at the time this answer was written no compiler supported this feature yet.