Converting char to int without sign bit propagation in C++ - c++

A byte of data is being stored in a 'char' member variable. It should probably be stored as an 'unsigned char' instead, but that can't be changed. I need to retrieve it through an 'int' variable, but without propagating the sign bit.
My solution was this (UINT and UCHAR are the obvious types):
void Foo::get_data( int *val )
{
    if( val )
        *val = (int)(UINT)(UCHAR)m_data; // 'm_data' is type 'char'
}
This seemed the best solution to me. I could use
*val = 0xff & (int)m_data;
instead of the casting, but this doesn't seem as readable. Which alternative is better, if either, and why?

Just write
*val = (UCHAR)m_data;
Since the expression (UCHAR)m_data now has an unsigned type, no sign bit is propagated when it is converted to int for the assignment.
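For illustration, here is a minimal standalone sketch of that behavior (the 0xFF test value and the assert are just examples; UCHAR is the unsigned char typedef from the question):
#include <cassert>
typedef unsigned char UCHAR;
struct Foo {
    char m_data;
    void get_data( int *val ) const
    {
        if( val )
            *val = (UCHAR)m_data; // convert to unsigned first, then implicitly to int
    }
};
int main()
{
    Foo f;
    f.m_data = static_cast<char>(0xFF); // bit pattern 11111111
    int v = 0;
    f.get_data(&v);
    assert(v == 255); // 255 rather than -1: the sign bit was not propagated
}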

The kind of conversion at work here is integral promotion.
When promoting to a wider integer type, the value is always "widened" according to its signedness, so for signed values the sign is propagated into the new high-order bits. To avoid the sign propagation, convert the signed value to its corresponding unsigned type first.
You can do that with an explicit *val = static_cast<UCHAR>(m_data).
Or, more robustly, use an as_unsigned helper function: *val = as_unsigned(m_data). The as_unsigned overload set looks like this:
inline unsigned char as_unsigned(char a) { return a; }
inline unsigned char as_unsigned(unsigned char a) { return a; }
inline unsigned char as_unsigned(signed char a) { return a; }
// And so on for the rest of integer types.
Using as_unsigned eliminates the risk of the explicit cast becoming incorrect during maintenance: should m_data later become a wider integer type, another overload of as_unsigned is selected automatically, without requiring the maintainer to update the expression by hand. The inverse function, as_signed, is also useful.
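For example, with the overloads above the original accessor becomes (a sketch reusing the names from the question):
void Foo::get_data( int *val )
{
    if( val )
        *val = as_unsigned(m_data); // selects the char overload; widens to int without sign extension
}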

The cast is better because some compilers (e.g. clang) actually generate extra code for the bitwise and. Of course, you only need the one cast to unsigned char.
The cast also expresses your intent better: the data is actually an unsigned char that you move into an int. So I would call it better even with compilers that generate the same code.

Related

cast pointer to unsigned int in a switch case in C++

I have a C header file that has a list of definitions like below
#define TAG_A ((A*)0x123456)
#define TAG_B ((B*)0x456789)
I include that file in a cpp file.
I want to cast those definition in a switch case like below
unsigned int get_tag_address(unsigned int i)
{
    switch(i)
    {
    case reinterpret_cast<unsigned int>(TAG_A):
        return 1;
    case reinterpret_cast<unsigned int>(TAG_B):
        return 2;
    }
    return 3;
}
I still get a compiler error saying that I can't cast a pointer to an unsigned integer.
What am I doing wrong?
The definitions look at hardware addresses of an embedded system. I want to return an unsigned integer based on what hardware component is used (i.e. passed into the function argument).
This is how I ended up in that situation.
PS: The header file containing the definitions must not change.
It is impossible to use TAG_A and TAG_B in a case label of a switch, except with preprocessor tricks such as stringifying the macro replacement itself in another macro and then parsing the value from the resulting string. That, however, makes the construct dependent on the exact form of the TAG_X macros, and I feel it is not worth it when you have no strict requirement for compile-time constant values representing the pointers.
The results of the expressions produced by the TAG_A and TAG_B replacements cannot be used as a case operand, because the operand must be a constant expression, and casting an integer to a pointer, as done with (A*) and (B*), disqualifies an expression from being a constant expression.
So, you will need to use if/else if instead:
unsigned int get_tag_address(unsigned int i)
{
    if(i == reinterpret_cast<unsigned int>(TAG_A)) {
        return 1;
    } else if(i == reinterpret_cast<unsigned int>(TAG_B)) {
        return 2;
    } else {
        return 3;
    }
}
Also, consider using std::uintptr_t instead of unsigned int for i and in the reinterpret_casts, since it is not guaranteed that unsigned int is large enough to hold the pointer values. However, compilation of the reinterpret_cast should fail if unsigned int is in fact too small. (It is possible that std::uintptr_t in <cstdint> does not exist, in which case you are either using pre-C++11 or, if not that, it is a hint that the architecture does not allow representing pointers as integer values. That is not guaranteed to be possible, but you would have to be working on some pretty exotic architecture for it not to be.)
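A sketch of the same function rewritten that way (assuming the TAG_A/TAG_B macros from the header are visible here):
#include <cstdint>
unsigned int get_tag_address(std::uintptr_t i)
{
    if(i == reinterpret_cast<std::uintptr_t>(TAG_A)) {
        return 1;
    } else if(i == reinterpret_cast<std::uintptr_t>(TAG_B)) {
        return 2;
    } else {
        return 3;
    }
}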
And if you can, simply pass, store and compare pointers (maybe as void*) instead of integer values representing the pointers. That is safer for multiple reasons and always guaranteed to work.
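For example, a pointer-based variant might look like this (a sketch; the caller would then pass the hardware address as a pointer rather than as an integer):
unsigned int get_tag_address(const void* p)
{
    if(p == TAG_A) {
        return 1;
    } else if(p == TAG_B) {
        return 2;
    }
    return 3;
}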

How to safely compare a ssize_t with a int64_t

I want to safely compare a ssize_t variable with an int64_t variable to check whether the values are equal. By safe I mean the comparison should work for any value of ssize_t. My first guess is to use a static_cast to convert the ssize_t to int64_t, but I'm not sure whether that is a safe way to convert.
Something like:
ssize_t a = read(...);
int64_t b = getsize(...);
if(static_cast<int64_t>(a) == b) {
    // ... read succeeded
} else {
    // ... partial or read failure
}
Update: On Ubuntu, they both are of exactly the same size
Don't over-complicate things.
ssize_t and int64_t are both signed types, thus the common type is the bigger of the two, meaning the conversion is value-preserving.
In conclusion, directly using the comparison-operator will do the right thing.
You only have to take care when mixing signed and unsigned types where the unsigned type is at least as wide as int and as the signed type, because only in that case is the conversion to the common type not value-preserving for negative values.
In that case, C++20 helps with the std::cmp_* comparison functions.
In C++20 you don't even need to care about the signedness or the size of the operands; just use std::cmp_equal (from <utility>) to do the comparison:
if (std::cmp_equal(a, b)) {
    // ... read succeeded
} else {
    // ... partial or read failure
}
There's also std::in_range to check whether a value fits in the range of a given type, regardless of the types and signedness involved.
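A self-contained sketch of both utilities (the types and values here are made up for illustration):
#include <cstdint>
#include <utility>
int main()
{
    long long a = -1;
    std::uint64_t b = 0xFFFFFFFFFFFFFFFFull;
    bool naive = (static_cast<std::uint64_t>(a) == b); // true, but misleading
    bool safe  = std::cmp_equal(a, b);                 // false: -1 can never equal a huge unsigned value
    bool fits  = std::in_range<std::int64_t>(b);       // false: b does not fit in int64_t
    return (naive && !safe && !fits) ? 0 : 1;          // returns 0 as expected
}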
Most of the time you don't have ssize_t, and when you do, what stops you from doing:
int64_t a;
ssize_t b;
b = sizeof(ssize_t) == sizeof(int64_t) ? a : whatever_else;

Is it safe to compare an unsigned int with a std::string::size_type

I am going through the book "Accelerated C++" by Andrew Koenig and Barbara E. Moo and I have some questions about the main example in chapter 2. The code can be summarized as below, and it compiles without warnings/errors with g++:
#include <string>
using std::string;
int main()
{
    const string greeting = "Hello, world!";
    // OK
    const int pad = 1;
    // KO
    // int pad = 1;
    // OK
    // unsigned int pad = 1;
    const string::size_type cols = greeting.size() + 2 + pad * 2;
    string::size_type c = 0;
    if (c == 1 + pad)
    {;}
    return 0;
}
However, if I replace const int pad = 1; by int pad = 1;, the g++ compiler will return a warning:
warning: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
if (c == 1 + pad)
If I replace const int pad = 1; by unsigned int pad = 1;, the g++ compiler will not return a warning.
I understand why g++ return the warning, but I am not sure about the three below points:
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Why is the compiler not giving a warning with the original code const int pad = 1. Is the compiler automatically converting the variable pad to an unsigned int?
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
From the compiler's point of view:
It is unsafe to compare signed and unsigned variables (non-constants).
It is safe to compare two unsigned variables of different sizes.
It is safe to compare an unsigned variable with a signed constant if the compiler can check that the constant is in the non-negative range of its signed type (e.g. for a 16-bit signed integer it is safe to use a constant in the range [0..32767]).
So the answers to your questions:
Yes, it is safe to compare an unsigned int with a std::string::size_type.
There is no warning because the compiler can perform the safety check (while compiling :)).
There is no problem with using different unsigned types in a comparison. Use unsigned int.
Comparing signed and unsigned values is "dangerous" in the sense that you may not get what you expect when the signed value is negative: it may well behave as a very large unsigned value, so that a > b yields true when a = -1 and b = 100. (The use of const int works because the compiler knows the value isn't changing and can say "well, this value is always 1, so it works fine here".)
As long as the value you want to compare fits in an unsigned int (on typical machines, a little over 4 billion), you are fine.
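A tiny illustration of that pitfall (a sketch; the values are arbitrary):
#include <iostream>
int main()
{
    int a = -1;
    unsigned int b = 100;
    // a is converted to unsigned int, becoming a huge value, so the comparison holds:
    std::cout << std::boolalpha << (a > b) << '\n'; // prints "true"
}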
If you are using std::string with the default allocator (which is likely), then size_type is actually size_t.
[support.types]/6 defines that size_t is
an implementation-defined unsigned integer type that is large enough to contain the size
in bytes of any object.
So it's not technically guaranteed to be an unsigned int, but I believe it is defined that way in most implementations.
Now regarding your second question: if you use const int something = 2, the compiler sees that this integer is a) never negative and b) never changes, so it is always safe to compare this variable with a size_t. In some cases the compiler may optimize the variable out completely and simply replace all its occurrences with 2.
I would say that it is better to use size_type everywhere you refer to the size of something, since it is more explicit.
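If you want to check that assumption on your particular implementation, here is a compile-time sketch (it passes on common standard library implementations, but, as noted above, the standard does not guarantee it):
#include <cstddef>
#include <string>
#include <type_traits>
static_assert(std::is_same<std::string::size_type, std::size_t>::value,
              "std::string::size_type is std::size_t here");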
What the compiler warns about is the comparison of unsigned and signed integer types. This is risky because a signed integer can be negative and the result is counterintuitive: the signed value is converted to unsigned before the comparison, which means a negative number will compare greater than a positive one.
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Yes, they are both unsigned, so the semantics are what you'd expect. If their ranges differ, the narrower type is converted to the wider one.
Why is the compiler not giving a warning with the original code const int pad = 1. Is the compiler automatically converting the variable pad to an unsigned int?
This is because of how the compiler is constructed. The compiler parses and to some extent optimizes the code before warnings are issued. The important point is that, by the time this warning is considered, the compiler knows that the signed integer is 1, and it is therefore safe to compare it with an unsigned integer.
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
If you don't want it to be constant, the best solution would probably be to make it at least an unsigned integer type. However, you should be aware that there is no guaranteed relation between the ordinary integer types and the size types; for example, unsigned int may be narrower than, wider than, or the same width as size_t and size_type (and those two may also differ from each other).

C++: can an int be assigned a char*?

I am reading chapter 2 of Advanced Linux Programming:
http://www.advancedlinuxprogramming.com/alp-folder/alp-ch02-writing-good-gnu-linux-software.pdf
In the section 2.1.3 Using getopt_long, there is an example program that goes a bit like this:
int main (int argc, char* argv[]) {
    int next_option;
    // ...
    do {
        next_option = getopt_long (argc, argv, short_options, long_options, NULL);
        switch (next_option) {
        case 'h': /* -h or --help */
            // ...
        }
        // ...
The bit that caught my attention is that next_option is declared as an int. The function getopt_long() apparently returns an int representing the short command line argument which is used in the following switch statement. How come that integer can be compared to a character in the switch statement?
Is there an implicit conversion from a char (a single character?) to an int? How is the code above valid? (see full code in linked pdf)
Neither C nor C++ have a type that can store "characters" as values with some dedicated character-specific properties. In that sense, there's no "character" type neither in C nor in C++.
In both the C++ and C languages, char is an integral type. It contains numbers. It is just the smallest (in terms of range) integral type. Conversion between char and int exists, just as it exists between int and long or int and short. char has no special status among the other integral types (aside from the fact that the char type is distinct from the signed char and unsigned char types).
A literal of the form 'h' in C++ has type char, but like any other integral type it is comparable to an int. That's why you can use it in a case label the way it is used in your original example.
In other words, your original code is as "strange" as
switch (next_option) {
case 1L: ...
// ...
}
would be. In this case the switch argument is an int, but the case label is a long. The code is valid. Do you find it surprising? Probably not. The example with 'h' is not much different.
You are mistaken -- getopt_long(3) returns an int.
Several functions return int in C, but char in C++. Returning an int when a char would make more sense is simply an old C cultural decision. Plus, in a few cases, it's necessary so that a function can return sentinels like EOF.
As the other answerer says, you're asking the wrong question here. But to answer the question you did ask:
No implicit conversion from char* to an int is available. On 32-bit x86 machines, both int and char* are 32 bits wide, so it is "safe" to cast explicitly:
int x = (int) &someChar; // treats the address as an integer
But this is strongly discouraged!
On x64 machines this will not work: int remains 32 bits wide, but all pointers are now 64 bits wide, so you would lose data in the process!
According to the man page, getopt_long returns an int. And yes, there is an implicit conversion from char to int; a char is just a one-byte integer value.
So in this case the conversion does not happen when assigning to next_option, but in the case statement, where a character constant is compared with an int. Of course, this assumes you compile the code as C++. In C++ a character constant is of type char, but in C it is of type int, so if you compile this code as C there is no type conversion at all.
(And in your question you mention char*, but you probably meant char; there are no pointers being used here.)
Think of a char as an 8-bit int. You can perform integer operations on chars and you can even declare them unsigned. You wouldn't be surprised that you can compare a short and a long, so why should comparing a char and an int be any different?
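A minimal sketch of that promotion in action (the option letter is just an example):
#include <iostream>
int main()
{
    int next_option = 'h';    // the char literal converts to its integer value
    switch (next_option) {
    case 'h':                 // promoted to int for the comparison
        std::cout << "got -h / --help\n";
        break;
    default:
        std::cout << "something else\n";
        break;
    }
}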

Conversion from unsigned to signed type safety?

Is it safe to convert, say, from an unsigned char * to a signed char * (or just a char *)?
The access is well-defined, you are allowed to access an object through a pointer to signed or unsigned type corresponding to the dynamic type of the object (3.10/15).
Additionally, signed char is guaranteed not to have any trap values and as such you can safely read through the signed char pointer no matter what the value of the original unsigned char object was.
You can, of course, expect that the values you read through one pointer will be different from the values you read through the other one.
Edit: regarding sellibitze's comment, this is what 3.9.1/1 says.
A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.9); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers.
So indeed it seems that signed char may have trap values. Nice catch!
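A small sketch of the aliasing being discussed (the value 200 is arbitrary; the exact output depends on the implementation):
#include <iostream>
int main()
{
    unsigned char u = 200;                                 // the object's dynamic type is unsigned char
    signed char* sp = reinterpret_cast<signed char*>(&u);  // allowed: corresponding signed type
    char* cp = reinterpret_cast<char*>(&u);                // allowed: plain char may also alias it
    std::cout << static_cast<int>(u) << ' '
              << static_cast<int>(*sp) << ' '
              << static_cast<int>(*cp) << '\n';            // e.g. "200 -56 -56" where plain char is signed
}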
The conversion should be safe, as all you're doing is converting from one type of character to another of the same size. Just be aware of what sort of data your code expects when you dereference the pointer, as the numeric ranges of the two data types are different: if the number pointed to was positive as an unsigned value, it may appear as a negative number once the pointer is converted to a signed char* and you dereference it.
Casting changes the type, but does not affect the bit representation. Casting from unsigned char to signed char does not change the value at all, but it affects the meaning of the value.
Here is an example:
#include <stdio.h>
int main(void) {
    /* example 1 */
    unsigned char a_unsigned_char = 192;
    signed char a_signed_char = a_unsigned_char;
    printf("%d, %d\n", a_signed_char, a_unsigned_char); // -64, 192
    /* example 2 */
    unsigned char b_unsigned_char = 32;
    signed char b_signed_char = b_unsigned_char;
    printf("%d, %d\n", b_signed_char, b_unsigned_char); // 32, 32
    return 0;
}
In the first example, you have an unsigned char with value 192, or 11000000 in binary. After the cast to signed char, the bit pattern is still 11000000, but that happens to be the two's complement representation of -64. Signed values are stored in two's complement representation.
In the second example, the initial unsigned value (32) is less than 128, so it appears unaffected by the cast. Its binary representation is 00100000, which still means 32 in two's complement representation.
To "safely" cast from unsigned char to signed char, ensure the value is less than 128.
It depends on how you are going to use the pointer. You are just converting the pointer type.
You can safely convert an unsigned char* to a char*, as the function you are calling will expect the behavior of a char pointer. However, if your char value goes above 127 you will get a result you did not expect, so just make certain that what you have in your unsigned array is also valid for a signed array.
I've seen conversion from an unsigned char to a signed char go wrong in a few ways.
First, if you're using it as an index into an array, that index can go negative.
Second, if it is fed into a switch statement, it may produce a negative value that the switch isn't expecting.
Third, it behaves differently under an arithmetic right shift (a fuller demonstration follows after this list):
int x = ...;
char c = 128;          // implementation-defined where plain char is signed; typically becomes -128
unsigned char u = 128;
c >> x;
has a different result than
u >> x;
because the former is sign-extended and the latter isn't.
Fourth, a signed character overflows (wraps around) at a different point than an unsigned character.
So a common overflow check,
(c + x > c)
could return a different result than
(u + x > u)
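A minimal sketch demonstrating the right-shift difference from the third point (assuming a typical platform where plain char is signed):
#include <iostream>
int main()
{
    char c = static_cast<char>(128);   // typically -128 where plain char is signed
    unsigned char u = 128;
    int x = 1;
    // Both operands are promoted to int before the shift; c keeps its sign.
    std::cout << (c >> x) << ' ' << (u >> x) << '\n'; // e.g. "-64 64"
}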
Safe if you are dealing with only ASCII data.
I'm astonished it hasn't been mentioned yet: Boost's numeric_cast should do the trick - but only for the data, of course.
Pointers are always pointers. By casting them to a different type, you only change the way the compiler interprets the data pointed to.