When asking a question on how to do wrapped N bit signed subtraction I got the following answer:
template<int bits>
int
sub_wrap( int v, int s )
{
struct Bits { signed int r: bits; } tmp;
tmp.r = v - s;
return tmp.r;
}
That's neat and all, but how will a compiler implement this? From this question I gather that accessing bit fields is more or less the same as doing it by hand, but what about when combined with arithmetic as in this example? Would it be as fast as a good manual bit-twiddling approach?
An answer for "gcc" in the role of "a compiler" would be great if anyone wants to get specific. I've tried reading the generated assembly, but it is currently beyond me.
As written in the other question, unsigned wrapping math can be done as:
int tmp = (a - b) & 0xFFF; /* 12 bit mask. */
Writing to a (12bit) bitfield will do exactly that, signed or unsigned. The only difference is that you might get a warning message from the compiler.
For reading though, you need to do something a bit different.
For unsigned maths, it's enough to do this:
int result = tmp; /* whatever bit count, we know tmp contains nothing else. */
or
int result = tmp & 0xFFF; /* 12bit, again, if we have other junk in tmp. */
For signed maths, the extra magic is the sign-extend:
int result = (tmp << (32-12)) >> (32-12); /* asssuming 32bit int, and 12bit value. */
All that does is replicate the top bit of the bitfield (bit 11) across the wider int.
This is exactly what the compiler does for bitfields. Whether you code them by hand or as bitfields is up to you, but just make sure you get the magic numbers right.
(I have not read the standard, but I suspect that relying on bitfields to do the right thing on overflow might not be safe?)
The compiler has knowledge about the size and exact position of r in your example. Suppose it is like
[xxxxrrrr]
Then
tmp.r = X;
could e.g. be expanded to (the b-suffix indicating binary literals, & is bitwise and, | is bitwise or)
tmp = (tmp & 11110000b) // <-- get the remainder which is not tmp.r
| (X & 00001111b); // <-- put X into tmp.r and filter away unwanted bits
Imagine your layout is
[xxrrrrxx] // 4 bits, 2 left-shifts
the expansion could be
tmp = (tmp & 11000011b) // <-- get the remainder which is not tmp.r
| ((X<<2) & 00111100b); // <-- filter 4 relevant bits, then shift left 2
How X actually looks like, whether a complex formulation or just a literal, is actually irrelevant.
If your architecture does not support such bitwise operations, there are still multiplications and divisions by power of two to simulate shifting, and probably these can also be used to filter out unwanted bits.
Related
I get some values from hardware registers where values are stored in 16-bit unsigned integers but these values are actually signed. Knowing the last bit is the sign bit, a colleague has done the following snippet to convert them to 2's complement values :
/* Take 15 bits of the data (last bit is the sign) */
#define DATAMASK 0x7FFF
/* Sign is bit 15 (starting from zero) with the 15 bit data */
#define SIGNMASK 0x8000
#define SIGNBIT 15
int16_t calc2sComplement(uint16_t data)
{
int16_t temp, sign;
int16_t signData;
sign = (int16_t)((data & SIGNMASK) >> SIGNBIT);
if (sign)
{
temp = (~data) & DATAMASK;
signData = (short)(temp * -1);
}
else
{
temp = (data & DATAMASK);
signData = temp;
}
return(signData);
}
As far as I know, unsigned integers types and signed integers types only differs by their type and the meaning of the last bit ; so casting such as following should work as well :
int16_t calc2sComplement(uint16_t data)
{
return(static_cast<int16_t>(data));
}
and when needing to push values to the hardware, the reverse operation is straightforward, unlike the calculation. The advantage of the former solution is it's toolchain-free ; since it can change sooner or later (gcc 4.4.7, and so C++03), I would prefer not having to do it but there won't be any regression when compiled years after. The advantage of the latter is it's more readable, close to standard and avoid unnecessary operations.
What would be the best in my case to be sure to keep the same behaviour if compiled again after a toolchain change (even the standard types are redefined somewhere in the toolchain and I do not really have the hand on it) ?
If you would keep the first solution, how would improve it and/or code the reverse conversion (keep in mind that data can be a pointer over a buffer of data) ?
In the end, let's answer to myself. So, the best way to convert values to or from two's complement, and preventing any unexpected behaviour is to perform two's complement conversion as follows :
int16_t calc2sComplement(uint16_t data)
{
return(static_cast<int16_t>(data));
}
and to do the reverse operation :
uint16_t inv2sComplement(int16_t data)
{
return(static_cast<uint16_t>(data));
}
This method is proven to be completely safe (as long as primitives types are not redefined somewhere in the toolchain - which is considered bad practice but was actually my case, hence my question in the 1st place) by relying on the definition of the primitive built-in types.
I want to modify individual bits of data, (for e.g. ints or chars). I want to do this by making a pointer, say ptr. by assigning it to some int or char, and then after incrementing ptr n times, I want to access the nth bit of that data.
Something like
// If i want to change all the 8 bits in a char variable
char c="A";
T *ptr=&c; //T is the data type of pointer I want..
int index=0;
for(index;index<8;index++)
{
*ptr=1; //Something like assigning 1 to the bit pointed by ptr...
}
There no such thing as a bit pointer in C++. You need to use two things, a byte pointer and an offset to the bit. That seems to be what you are getting towards in your code. Here's how you do the individual bit operations.
// set a bit
*ptr |= 1 << index;
// clear a bit
*ptr &= ~(1 << index);
// test a bit
if (*ptr & (1 << index))
...
The smallest addressable memory unit in C and C++ is 1 byte. So You cannot have a pointer to anything less than a byte.If you want to perform bitwise operations C and C++ provide the bitwise operators for these operations.
It is impossible to have address of individual bit, but you can utilize structures with bit fields. Like in this example from Wikipedia so:
struct box_props
{
unsigned int opaque : 1;
unsigned int fill_color : 3;
unsigned int : 4; // fill to 8 bits
unsigned int show_border : 1;
unsigned int border_color : 3;
unsigned int border_style : 2;
unsigned int : 2; // fill to 16 bits
};
Then by manipulating individual fields you will change sets of bits inside unsigned int. Technically this is identical to bitwise operations, but in this case compiler will generate the code (and you have lower chances of bug).
Be advised that you have to be cautious using bit fields.
C and C++ doesn't have a "bit pointer", technically speaking, C and C++ as such, deosn't know about "bits". You could build your own type, to do this, you need two things: A pointer to some type (char, int - probably unsigned) and a bit number. You'd then use the pointer and the bit number, along with the bitwise operators, to actually access the values.
There is nothing like a pointer to a bit
If you want all bits set to 1 then c = 0xff; is what you want, if you want to set a bit under some condition:
for(index;index<8;index++)
{
if (condition) c |= 1 << index;
}
As you can see there is no need to use a pointer
You can not read a single bit from the memory, CPU always read a full cache line, which could have different sizes for different CPUs.
But from the language point of view you can use bit fields
http://publications.gbdirect.co.uk/c_book/chapter6/bitfields.html
http://en.wikipedia.org/wiki/Bit_field
Say, i have binary protocol, where first 4 bits represent a numeric value which can be less than or equal to 10 (ten in decimal).
In C++, the smallest data type available to me is char, which is 8 bits long. So, within my application, i can hold the value represented by 4 bits in a char variable. My question is, if i have to pack the char value back into 4 bits for network transmission, how do i pack my char's value back into 4 bits?
You do bitwise operation on the char;
so
unsigned char packedvalue = 0;
packedvalue |= 0xF0 & (7 <<4);
packedvalue |= 0x0F & (10);
Set the 4 upper most bit to 7 and the lower 4 bits to 10
Unpacking these again as
int upper, lower;
upper = (packedvalue & 0xF0) >>4;
lower = packedvalue & 0x0F;
As an extra answer to the question -- you may also want to look at protocol buffers for a way of encoding and decoding data for binary transfers.
Sure, just use one char for your value:
std::ofstream outfile("thefile.bin", std::ios::binary);
unsigned int n; // at most 10!
char c = n << 4; // fits
outfile.write(&c, 1); // we wrote the value "10"
The lower 4 bits will be left at zero. If they're also used for something, you'll have to populate c fully before writing it. To read:
infile.read(&c, 1);
unsigned int n = c >> 4;
Well, there's the popular but non-portable "Bit Fields". They're standard-compliant, but may create a different packing order on different platforms. So don't use them.
Then, there are the highly portable bit shifting and bitwise AND and OR operators, which you should prefer. Essentially, you work on a larger field (usually 32 bits, for TCP/IP protocols) and extract or replace subsequences of bits. See Martin's link and Soren's answer for those.
Are you familiar with C's bitfields? You simply write
struct my_bits {
unsigned v1 : 4;
...
};
Be warned, various operations are slower on bitfields because the compiler must unpack them for things like addition. I'd imagine unpacking a bitfield will still be faster than the addition operation itself, even though it requires multiple instructions, but it's still overhead. Bitwise operations should remain quite fast. Equality too.
You must also take care with endianness and threads (see the wikipedia article I linked for details, but the issues are kinda obvious). You should leearn about endianness anyways since you said "binary protocol" (see this previous questions)
I'm writing a Fixedpoint class, but have ran into bit of a snag... The multiplication, division portions, I am not sure how to emulate. I took a very rough stab at the division operator but I am sure it's wrong. Here's what it looks like so far:
class Fixed
{
Fixed(short int _value, short int _part) :
value(long(_value + (_part >> 8))), part(long(_part & 0x0000FFFF)) {};
...
inline Fixed operator -() const // example of some of the bitwise it's doing
{
return Fixed(-value - 1, (~part)&0x0000FFFF);
};
...
inline Fixed operator / (const Fixed & arg) const // example of how I'm probably doing it wrong
{
long int tempInt = value<<8 | part;
long int tempPart = tempInt;
tempInt /= arg.value<<8 | arg.part;
tempPart %= arg.value<<8 | arg.part;
return Fixed(tempInt, tempPart);
};
long int value, part; // members
};
I... am not a very good programmer, haha!
The class's part is 16 bits wide (but expressed as a 32-bit long since I imagine it'd need the room for possible overflows before they're fixed) and the same goes for value which is the integer part. When the 'part' goes over 0xFFFF in one of it's operations, the highest 16 bits are added to 'value', and then the part is masked so only it's lowest 16 bits remain. That's done in the init list.
I hate to ask, but if anyone would know where I could find documentation for something like this, or even just the 'trick' or how to do those two operators, I would be very happy for it! I am a dimwit when it comes to math, and I know someone has had to do/ask this before, but searching google has for once not taken me to the promised land...
As Jan says, use a single integer. Since it looks like you're specifying 16 bit integer and fractional parts, you could do this with a plain 32 bit integer.
The "trick" is to realise what happens to the "format" of the number when you do operations on it. Your format would be described as 16.16. When you add or subtract, the format stays the same. When you multiply, you get 32.32 -- So you need a 64 bit temporary value for the result. Then you do a >>16 shift to get down to 48.16 format, then take the bottom 32 bits to get your answer in 16.16.
I'm a little rusty on the division -- In DSP, where I learned this stuff, we avoided (expensive) division wherever possible!
I'd recommend using one integer value instead of separate whole and fractional part. Than addition and subtraction are the integeral counterparts directly and you can simply use 64-bit support, which all common compilers have these days:
Multiplication:
operator*(const Fixed &other) const {
return Fixed((int64_t)value * (int64_t)other.value);
}
Division:
operator/(const Fixed &other) const {
return Fixed(((int64_t)value << 16) / (int64_t)other.value);
}
64-bit integers are
On gcc, stdint.h (or cstdint, which places them in std:: namespace) should be available, so you can use the types I mentioned above. Otherwise it's long long on 32-bit targets and long on 64-bit targets.
On Windows, it's always long long or __int64.
To get things up and running, first implement the (unary) inverse(x) = 1/x, and then implement a/b as a*inverse(b). You'll probably want to represent the intermediates as a 32.32 format.
I have areas of memory that could be considered "array of bits". They are equivalent to
unsigned char arr[256];
But it would be better thought of as
bit arr[2048];
I'm accessing separate bits from it with
#define GETBIT(x,in) ((in)[ ((x)/8) ] & 1<<(7-((x)%8)))
but I do it a lot in many places of the code, often in performance-critical sections and I wonder if there are any smarter, more optimal methods to do it.
extra info: Architecture: ARM9 (32 bit); gcc/Linux. The physical data representation can't be changed - it is externally provided or mapped for external use.
I don't think so. In fact, many CPU architectures won't access bits individually.
On C++ you have std::bitset<N>. but may not have highest-performance depending on your compiler's implementation and optimization.
BTW, it may be better to group your bit-array as uint32_t[32] (or uint64_t[16]) for aligned dereferencing (which bitset does this for you already).
For randomly accessing individual bits, the macro you've suggested is as good as you're going to get (as long as you turn on optimisations in your compiler).
If there is any pattern at all to the bits you're accessing, then you may be able to do better. For example, if you often access pairs of bits, then you may see some improvement by providing a method to get two bits instead of one, even if you don't always end up using both bits.
As with any optimisation problem, you will need to be very familiar with the behaviour of your code, in particular its access patterns in your bit array, to make a meaningful improvement in performance.
Update: Since you access ranges of bits, you can probably squeeze some more performance out of your macros. For example, if you need to access four bits you might have macros like this:
#define GETBITS_0_4(x,in) (((in)[(x)/8] & 0x0f))
#define GETBITS_1_4(x,in) (((in)[(x)/8] & 0x1e) >> 1)
#define GETBITS_2_4(x,in) (((in)[(x)/8] & 0x3c) >> 2)
#define GETBITS_3_4(x,in) (((in)[(x)/8] & 0x78) >> 3)
#define GETBITS_4_4(x,in) (((in)[(x)/8] & 0xf0) >> 4)
#define GETBITS_5_4(x,in) ((((in)[(x)/8] & 0xe0) >> 5) | (((in)[(x)/8+1] & 0x01)) << 3)
#define GETBITS_6_4(x,in) ((((in)[(x)/8] & 0xc0) >> 6) | (((in)[(x)/8+1] & 0x03)) << 2)
#define GETBITS_7_4(x,in) ((((in)[(x)/8] & 0x80) >> 7) | (((in)[(x)/8+1] & 0x07)) << 1)
// ...etc
These macros would clip out four bits from each bit position 0, 1, 2, etc. (To cut down on the proliferation of pointless parentheses, you might want to use inline functions for the above.) Then perhaps define an inline function like:
inline int GETBITS_4(int x, unsigned char *in) {
switch (x % 8) {
case 0: return GETBITS_0_4(x,in);
case 1: return GETBITS_1_4(x,in);
case 2: return GETBITS_2_4(x,in);
// ...etc
}
}
Since this is a lot of tedious boilerplate code, especially if you've got multiple different widths, you may want to write a program to generate all the GETBIT_* accessor functions.
(I notice that the bits in your bytes are stored in the reverse order from what I've written above. Apply an appropriate transformation to match your structure if you need to.)
Taking Greg's solution as a basis:
template<unsigned int n, unsigned int m>
inline unsigned long getbits(unsigned long[] bits) {
const unsigned bitsPerLong = sizeof(unsigned long) * CHAR_BIT
const unsigned int bitsToGet = m - n;
BOOST_STATIC_ASSERT(bitsToGet < bitsPerLong);
const unsigned mask = (1UL << bitsToGet) - 1;
const size_t index0 = n / bitsPerLong;
const size_t index1 = m / bitsPerLong;
// Do the bits to extract straddle a boundary?
if (index0 == index1) {
return (bits[index0] >> (n % bitsPerLong)) & mask;
} else {
return ((bits[index0] >> (n % bitsPerLong)) + (bits[index1] << (bitsPerLong - (m % bitsPerLong)))) & mask;
}
}
Can get at least 32 bits, even if they are not aligned. Note that's intentionally inline as you don't want to have tons of these functions.
If You reverse the bit order in 'arr', then You can eliminate the substraction from the macro. It is the best what i can say, without knowledge of the problem context (how the bits are used).
#define GETBIT(x,in) ((in)[ ((x)/8) ] & 1<<(7-((x)%8)))
can be optimized.
1) Use standard int which is normally the fastest accessible integer datatype.
If you don't need to be portable, you can find out the size of an int with
sizeof and adapt the following code.
2)
#define GETBIT(x,in) ((in)[ ((x) >>> 3) ] & 1<<((x) & 7))
The mod operator % is slower than ANDing. And you don't need to subtract,
simply adjust your SETBIT routine.
Why not create your own wrapper class?
You could then add bits to the "array" using an operator such as + and get back the individual bits using the [] operator.
Your macro could be improved by using & 7 instead of % 8 but its likely the compiler will make that optimisation for you anyway.
I recently did exactly what you are doing and my stream could consist of any number of bits.
So I have something like the following:
BitStream< 1 > oneBitBitStream;
BitStream< 2 > twoBitBitStream;
oneBitBitStream += Bit_One;
oneBitBitStream += Bit_Zero;
twoBitBitStream += Bit_Three;
twoBitBitStream += Bit_One;
and so on. It makes for nice readable code and you can provide an STL like interface to it for aiding faimilarity :)
Since the question is tagged with C++, is there any reason you can't simply use the standard bitset?
Instead of the unsigned char array and custom macros, you can use std::vector<bool>. The vector class template has a special template specialization for the bool type. This specialization is provided to optimize for space allocation: In this template specialization, each element occupies only one bit (which is eight times less than the smallest type in C++: char).