How to convert int to short int using bitwise operators? - c++

For example:
short a = 10;
int b = a & 0xffff;
Similarly, if I want to convert from int to short, how do I do it using bitwise operators? I don't want to use the usual cast with (short).

If you want sign extension:
int b = a;
If you don't (i.e. negative values of a will yield (weird) positive values of b)
// note that Standard Conversion of shorts to int happens before &
int b = a & std::numeric_limits<unsigned short>::max();

Doing bit-operations on signed types may not be a good idea and can lead to surprising results: Are the results of bitwise operations on signed integers defined?. Why do you need bit-operations?
short int2short(int x) {
    if (x > std::numeric_limits<short>::max()) {
        // what to do now? Throw exception, return default value ...
    }
    else if (x < std::numeric_limits<short>::min()) {
        // what to do now? Throw exception, return default value ...
    }
    else {
        return static_cast<short>(x);
    }
}
This could be generalized into a template function and also have policies for the error cases, as sketched below.
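A hedged sketch of that generalization might look like this (the name checked_narrow and the decision to throw are illustrative choices, and it assumes both types are signed integer types, like int and short here):
#include <limits>
#include <stdexcept>

template <typename To, typename From>
To checked_narrow(From x)
{
    if (x > static_cast<From>(std::numeric_limits<To>::max()))
        throw std::overflow_error("checked_narrow: value too large");
    if (x < static_cast<From>(std::numeric_limits<To>::min()))
        throw std::underflow_error("checked_narrow: value too small");
    return static_cast<To>(x); // in range, so the conversion is value-preserving
}

// short s = checked_narrow<short>(70000); // throws std::overflow_error
A clamping or default-value policy could be passed in as an extra template parameter instead of hard-coding the throw.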

Why not use (short)? That's the easiest way and gets you what you want.
Unless it's an interview problem; then you need to assume how many bits a short and an int contain. If the number is positive, just use bitwise AND. If the number is negative, flip it to a positive number, do the bitwise AND, and then set the highest bit back to 1.
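For what it's worth, here is one hedged way such a conversion could be written down (a sketch only; it assumes a 16-bit short, a 32-bit int and two's complement, and it uses a subtraction for the sign adjustment rather than flipping bits by hand):
// Mask the low 16 bits, then adjust for the sign bit of the 16-bit field.
short int_to_short_bits(int x)
{
    unsigned int low = static_cast<unsigned int>(x) & 0xFFFFu; // keep low 16 bits
    int value = (low & 0x8000u) ? static_cast<int>(low) - 0x10000  // negative in 16 bits
                                : static_cast<int>(low);           // positive in 16 bits
    return static_cast<short>(value); // value now fits in short's range
}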

Related

Safe, signed subtraction of large unsigned ints

I'm working with a protocol where I don't have control of the input types, but I need to compute the difference of two 64-bit unsigned integers (currently baked into std::uint64_t). The difference might be negative or positive. I don't want to do this:
uint64_t a{1};
uint64_t b{2};
int64_t x = a - b; // -1; correct, but what if a and b were /enormous/?
So I was looking at Boost's safe_numerics here. The large-values case is handled as I would like:
boost::safe_numerics::safe<uint64_t> a{UINT64_MAX};
boost::safe_numerics::safe<uint64_t> b{1};
boost::safe_numerics::safe<int64_t> x = a - b;
// ^^ Throws "converted unsigned value too large: positive overflow error"
Great! But ... they're a little too safe:
boost::safe_numerics::safe<uint64_t> a{1}; //UINT64_MAX;
boost::safe_numerics::safe<uint64_t> b{2};
boost::safe_numerics::safe<int64_t> x = a - b;
// ^^ Throws "subtraction result cannot be negative: negative overflow error"
// ... even though `x` is signed
I have a suspicion that it's a - b that actually throws, not the assignment. But I've tried every kind of cast in the book to get a - b into a safe, signed integer, but no joy.
There are some inelegant ways to deal with this, like comparing a and b to always subtract the smaller from the larger. Or I can do a lot of casting with boost::numeric_cast, or old-school range checking. Or...god forbid...I just throw myself when a or b exceed 63 bits, but all that is a bit lame.
But my real question is: Why does Boost detect a negative overflow in the final example above? Am I using safe_numerics incorrectly?
I'm targeting C++17 with gcc on a 64-bit system and using Boost 1.71.
The behavior I was looking for is actually implemented in boost::safe_numerics::checked_result:
https://www.boost.org/doc/libs/develop/libs/safe_numerics/doc/html/checked_result.html
checked::subtract allows the difference of two unsigned integers to be negative when it is being stored in a signed integer of adequate size, but it throws when the result does not fit. For example:
using namespace std;
using namespace boost::safe_numerics;
safe<uint64_t> a{2};
safe<uint64_t> b{1};
checked_result<int64_t> x0 = checked::subtract<int64_t>(b, a);
assert(x0 == -1);
checked_result<int64_t> x1 = checked::subtract<int64_t>(a, b);
assert(x1 == 1);
a = UINT64_MAX;
checked_result<int64_t> x2 = checked::subtract<int64_t>(a, b); // throws
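And for comparison, the "inelegant" compare-first workaround mentioned in the question could be a plain, Boost-free helper along these lines (a sketch; the name is made up, and for simplicity it rejects the single representable corner case of exactly INT64_MIN):
#include <cstdint>
#include <limits>
#include <stdexcept>

int64_t signed_difference(uint64_t a, uint64_t b)
{
    const uint64_t magnitude = (a >= b) ? (a - b) : (b - a);
    if (magnitude > static_cast<uint64_t>(std::numeric_limits<int64_t>::max()))
        throw std::overflow_error("difference does not fit into int64_t");
    const int64_t diff = static_cast<int64_t>(magnitude);
    return (a >= b) ? diff : -diff;
}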

How to use bits for control statement expression?

I'm using the instructions in this answer to get and set bit values of a char. The setting and getting have no problem (or shouldn't; they are semantically the same as the linked answer).
Problem is I can't properly use the bit values from it in control statements (if).
The relevant piece of code in which I'm having the problem:
unsigned long int find_container(unsigned long int k){
    return (k)/(sizeof(char)*8);
}
unsigned long int find_bit(unsigned long int k){
    return (k)%(sizeof(char)*8);
}
....
if (~(marks[find_container((k-3)/2)] >> (find_bit((k-3)/2)&1))){
    printf("must print\n");
}
marks[find_container((k-3)/2)] |= 1<<find_bit((k-3)/2);
if (~(marks[find_container((k-3)/2)] >> (find_bit((k-3)/2)&1))){
    printf("this shouldn't have been printed\n");
}
....
....
Prints:
must print
this shouldn't have been printed
It's evident that the if statement doesn't take bit value expressions.
Well, I tried casting the bit value to bool, as in ~(bool)(marks[find_container((k-3)/2)] >> (find_bit((k-3)/2)&1)), but it didn't change this behavior.
Initially all the values in the marks array are set to zero:
marks = (char *)calloc( chars, (sizeof(char)));
where chars is an unsigned long int.
So how can I have if statements use bit expressions?
(find_bit((k-3)/2)&1)
is either 0 or 1.
I think you misplaced your parentheses and that you're looking for
(marks[find_container((k-3)/2)] >> find_bit((k-3)/2)) & 1
I would recommend that you add abstracting functions for manipulating bits; it makes the code much more readable and less error-prone.
Something like this:
// No safety, for clarity; takes the same char array as marks
void set(char* bits, size_t which)
{
    bits[find_container(which)] |= 1 << find_bit(which);
}

bool get(char* bits, size_t which)
{
    return (bits[find_container(which)] >> find_bit(which)) & 1;
}

// example
if (!get(marks, k))
{
    set(marks, k);
}
Well, yes, you can use any value in an if and values not equal to zero are considered true.
But how you can tell what all that code is doing is beyond me. The chances of that having a bug are high.
I'd suggest writing some individual functions to read/clear/set a particular bit in your array. Then it should become clearer what you are doing.
Incidentally, if this is C code, you shouldn't be casting the result of calloc. If, on the other hand, it is C++ code, you shouldn't be using C-style casts.
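Following that suggestion, here are hedged sketches of a couple more helpers in the same style (and with the same lack of bounds checking) as the set/get functions shown above:
void clear(char* bits, size_t which)
{
    bits[find_container(which)] &= ~(1 << find_bit(which)); // drop the bit
}

void toggle(char* bits, size_t which)
{
    bits[find_container(which)] ^= 1 << find_bit(which);    // flip the bit
}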

How does the compiler implement bit field arithmetics?

When asking a question on how to do wrapped N bit signed subtraction I got the following answer:
template<int bits>
int sub_wrap( int v, int s )
{
    struct Bits { signed int r: bits; } tmp;
    tmp.r = v - s;
    return tmp.r;
}
That's neat and all, but how will a compiler implement this? From this question I gather that accessing bit fields is more or less the same as doing it by hand, but what about when combined with arithmetic as in this example? Would it be as fast as a good manual bit-twiddling approach?
An answer for "gcc" in the role of "a compiler" would be great if anyone wants to get specific. I've tried reading the generated assembly, but it is currently beyond me.
As written in the other question, unsigned wrapping math can be done as:
int tmp = (a - b) & 0xFFF; /* 12 bit mask. */
Writing to a (12bit) bitfield will do exactly that, signed or unsigned. The only difference is that you might get a warning message from the compiler.
For reading though, you need to do something a bit different.
For unsigned maths, it's enough to do this:
int result = tmp; /* whatever bit count, we know tmp contains nothing else. */
or
int result = tmp & 0xFFF; /* 12bit, again, if we have other junk in tmp. */
For signed maths, the extra magic is the sign-extend:
int result = (tmp << (32-12)) >> (32-12); /* assuming 32-bit int and a 12-bit value. */
All that does is replicate the top bit of the bitfield (bit 11) across the wider int.
This is exactly what the compiler does for bitfields. Whether you code them by hand or as bitfields is up to you, but just make sure you get the magic numbers right.
(I have not read the standard, but I suspect that relying on bitfields to do the right thing on overflow might not be safe?)
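Putting the pieces above together, a by-hand equivalent of sub_wrap might look like the following sketch (same assumptions as the snippets above: 32-bit int, arithmetic right shift on signed values, and bits less than 32):
template <int bits>
int sub_wrap_manual(int v, int s)
{
    const unsigned int mask = (1u << bits) - 1;                 // e.g. 0xFFF for 12 bits
    unsigned int tmp = (static_cast<unsigned int>(v)
                      - static_cast<unsigned int>(s)) & mask;   // wrapped difference
    const int shift = 32 - bits;                                // assumes 32-bit int
    return static_cast<int>(tmp << shift) >> shift;             // sign-extend the field
}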
The compiler has knowledge about the size and exact position of r in your example. Suppose it is like
[xxxxrrrr]
Then
tmp.r = X;
could e.g. be expanded to (the b-suffix indicating binary literals, & is bitwise and, | is bitwise or)
tmp = (tmp & 11110000b)   // <-- get the remainder which is not tmp.r
    | (X & 00001111b);    // <-- put X into tmp.r and filter away unwanted bits
Imagine your layout is
[xxrrrrxx] // 4 bits, 2 left-shifts
the expansion could be
tmp = (tmp & 11000011b)    // <-- get the remainder which is not tmp.r
    | ((X<<2) & 00111100b); // <-- shift X left by 2, then keep the 4 relevant bits
What X actually looks like, whether a complex expression or just a literal, is irrelevant.
If your architecture does not support such bitwise operations, there are still multiplications and divisions by powers of two to simulate shifting, and these can probably also be used to filter out unwanted bits.
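Written out as ordinary C++ for the [xxrrrrxx] layout, that store expansion amounts to something like this sketch (function name made up, masks shown in binary in the comments):
// 4-bit field, shifted left by 2
unsigned char store_field(unsigned char tmp, unsigned char X)
{
    return (tmp & 0xC3)        // 11000011b: keep the bits outside the field
         | ((X << 2) & 0x3C);  // 00111100b: shift X into place, keep the 4 field bits
}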

How does one safely static_cast between unsigned int and int?

I have an 8-character string representing a hexadecimal number and I need to convert it to an int. This conversion has to preserve the bit pattern for strings "80000000" and higher, i.e., those numbers should come out negative. Unfortunately, the naive solution:
int hex_str_to_int(const string hexStr)
{
    stringstream strm;
    strm << hex << hexStr;
    unsigned int val = 0;
    strm >> val;
    return static_cast<int>(val);
}
doesn't work for my compiler if val > INT_MAX (the returned value is 0). Changing the type of val to int also results in a 0 for the larger numbers. I've tried several different solutions from various answers here on SO and haven't been successful yet.
Here's what I do know:
I'm using HP's C++ compiler on OpenVMS (using, I believe, an Itanium processor).
sizeof(int) will be at least 4 on every architecture my code will run on.
Casting from a number > INT_MAX to int is implementation-defined. On my machine, it usually results in a 0 but interestingly casting from long to int results in INT_MAX when the value is too big.
This is surprisingly difficult to do correctly, or at least it has been for me. Does anyone know of a portable solution to this?
Update:
Changing static_cast to reinterpret_cast results in a compiler error. A comment prompted me to try a C-style cast: return (int)val in the code above, and it worked. On this machine. Will that still be safe on other architectures?
Quoting the C++03 standard, §4.7/3 (Integral Conversions):
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
Because the result is implementation-defined, by definition it is impossible for there to be a truly portable solution.
While there are ways to do this using casts and conversions, most rely on implementation-defined (or undefined) behavior that happens to work as expected on some machines / with some compilers. Instead of relying on that, copy the data:
int signed_val;
std::memcpy (&signed_val, &val, sizeof(int));
return signed_val;
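For reference, the same idea folded back into the question's function might look like this (a sketch; like the question, it only assumes that int and unsigned int have the same size):
#include <cstring>
#include <sstream>
#include <string>

int hex_str_to_int(const std::string& hexStr)
{
    std::stringstream strm;
    strm << std::hex << hexStr;
    unsigned int val = 0;
    strm >> val;
    int signed_val;
    std::memcpy(&signed_val, &val, sizeof signed_val); // copy the bit pattern verbatim
    return signed_val;
}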
You can negate an unsigned twos-complement number by taking the complement and adding one. So let's do that for negatives:
if (val < 0x80000000) // positive values need no conversion
    return val;
if (val == 0x80000000) // complement-and-add would overflow, so special-case this
    return INT_MIN;    // i.e. -0x80000000
else
    return -(int)(~val + 1);
This assumes that your ints are represented with 32-bit twos-complement representation (or have similar range). It does not rely on any undefined behavior related to signed integer overflow (note that the behavior of unsigned integer overflow is well-defined - although that should not happen here either!).
Note that if your ints are not 32-bit, things get more complex. You may need to use something like ~(~0U >> 1) instead of 0x80000000. Further, if your ints are not twos-complement, you may have overflow issues on certain values (for example, on a ones-complement machine, -0x80000000 cannot be represented in a 32-bit signed integer). However, non-twos-complement machines are very rare today, so this is unlikely to be a problem.
Here's another solution that worked for me:
if (val <= INT_MAX) {
    return static_cast<int>(val);
}
else {
    int ret = static_cast<int>(val & ~INT_MIN);
    return ret | INT_MIN;
}
If I mask off the high bit, I avoid overflow when casting. I can then OR it back safely.
C++20 will have std::bit_cast that copies bits verbatim:
#include <bit>
#include <cassert>
#include <iostream>
int main()
{
    int i = -42;
    auto u = std::bit_cast<unsigned>(i);
    // Prints 4294967254 on two's complement platforms where int is 32 bits
    std::cout << u << "\n";
    auto roundtripped = std::bit_cast<int>(u);
    assert(roundtripped == i);
    std::cout << roundtripped << "\n"; // Prints -42
    return 0;
}
cppreference shows an example of how one can implement their own bit_cast in terms of memcpy (under Notes).
While OpenVMS is not likely to gain C++20 support anytime soon, I hope this answer helps someone arriving at the same question via internet search.
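For a pre-C++20 compiler, a memcpy-based stand-in along the lines of that cppreference note could look like this sketch (the name bit_cast_compat is made up, and the trivially-copyable checks are omitted for brevity):
#include <cstring>

template <class To, class From>
To bit_cast_compat(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    To to;
    std::memcpy(&to, &from, sizeof(To)); // copy the bits verbatim
    return to;
}
Calling bit_cast_compat<int>(0xFFFFFFFFu) then gives -1 on the usual two's complement platforms, matching the std::bit_cast round trip above.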
unsigned int u = ~0U;
int s = *reinterpret_cast<int*>(&u); // -1
Contrariwise:
int s = -1;
unsigned int u = *reinterpret_cast<unsigned int*>(&s); // all ones

Converting floating point to fixed point

In C++, what's the generic way to convert any floating point value (float) to fixed point (int, 16:16 or 24:8)?
EDIT: For clarification, fixed-point values have two parts: an integer part and a fractional part. The integer part can be represented by a signed or unsigned integer data type. The fractional part is represented by an unsigned integer data type.
Let's make an analogy with money for the sake of clarity. The fractional part may represent cents -- a fractional part of a dollar. The range of the 'cents' data type would be 0 to 99. If an 8-bit unsigned integer were used for fixed-point math, then the fractional part would be split into 256 evenly sized parts.
I hope that clears things up.
Here you go:
// A signed fixed-point 16:16 class
class FixedPoint_16_16
{
    short intPart;
    unsigned short fracPart;
public:
    FixedPoint_16_16(double d)
    {
        *this = d; // calls operator=
    }
    FixedPoint_16_16& operator=(double d)
    {
        intPart = static_cast<short>(d);
        fracPart = static_cast<unsigned short>(
            (numeric_limits<unsigned short>::max() + 1.0) * (d - intPart)); // scale the fraction by 2^16
        return *this;
    }
    // Other operators can be defined here
};
EDIT: Here's a more general class based on another common way to deal with fixed-point numbers (and which KPexEA pointed out):
template <class BaseType, size_t FracDigits>
class fixed_point
{
    const static BaseType factor = 1 << FracDigits;
    BaseType data;
public:
    fixed_point(double d)
    {
        *this = d; // calls operator=
    }
    fixed_point& operator=(double d)
    {
        data = static_cast<BaseType>(d*factor);
        return *this;
    }
    BaseType raw_data() const
    {
        return data;
    }
    // Other operators can be defined here
};

fixed_point<int, 8> fp1(0.0);           // Will be signed 24:8 (if int is 32-bits)
fixed_point<unsigned int, 16> fp2(0.0); // Will be unsigned 16:16 (if int is 32-bits)
A cast from float to integer will throw away the fractional portion, so if you want to keep that fraction around as fixed point, you just multiply the float before casting it. The code below does not check for overflow, mind you.
If you want 16:16
double f = 1.2345;
int n;
n=(int)(f*65536);
if you want 24:8
double f = 1.2345;
int n;
n=(int)(f*256);
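Since the snippets above deliberately skip overflow checking, here is a hedged variant for the 16:16 case that also rounds to the nearest step and clamps out-of-range inputs (the function name is illustrative):
#include <cmath>
#include <cstdint>
#include <limits>

int32_t to_fixed_16_16(double f)
{
    const double scaled = std::round(f * 65536.0);   // 16 fractional bits
    if (scaled >= std::numeric_limits<int32_t>::max())
        return std::numeric_limits<int32_t>::max();  // clamp instead of overflowing
    if (scaled <= std::numeric_limits<int32_t>::min())
        return std::numeric_limits<int32_t>::min();
    return static_cast<int32_t>(scaled);
}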
Edit: My first comment applies to the answer before Kevin's edit, but I'll leave it here for posterity. Answers change so quickly here sometimes!
The problem with Kevin's approach is that with fixed point you are normally packing into a guaranteed word size (typically 32 bits). Declaring the two parts separately leaves you at the whim of your compiler's structure packing. Yes, you could force it, but it does not work for anything other than a 16:16 representation.
KPexEA is closer to the mark by packing everything into an int - although I would use "signed long" to try to be explicit about 32 bits. Then you can use his approach for generating the fixed-point value, and bit slicing to extract the component parts again. His suggestion also covers the 24:8 case.
( And everyone else who suggested just static_cast.....what were you thinking? ;) )
I gave the answer to the guy who wrote the best answer, but I really used a related question's code that points here.
It used templates and made it easy to ditch the dependency on the Boost lib.
This is fine for converting from floating point to integer, but the O.P. also wanted fixed point.
Now how you'd do that in C++, I don't know (C++ not being something I can think in readily). Perhaps try a scaled-integer approach, i.e. use a 32 or 64 bit integer and programmatically allocate the last, say, 6 digits to what's on the right hand side of the decimal point.
There isn't any built-in support in C++ for fixed-point numbers. Your best bet would be to write a wrapper 'FixedInt' class that takes doubles and converts them.
As for a generic method to convert... the int part is easy enough: just grab the integer part of the value and store it in the upper bits. The decimal part would be something along the lines of:
for (int i = 1; i <= precision; i++)
{
    if (decimal_part >= 1.f/(float)(1 << i))   // compare against 2^-i
    {
        decimal_part -= 1.f/(float)(1 << i);
        fixint_value |= (1 << (precision - i));
    }
}
although this is likely to contain bugs still