Addition and subtraction of wildcard masks - bit-manipulation

Wildcard masks are commonly used in networking.
Wildcard masks typically have "wildcard" bits, meaning that bit can be either a 0 or a 1.
This binary wildcard mask (where the x's represent the wildcard bits)
10xx
covers all these values:
1000
1001
1010
1011
Is there an efficient way of adding/subtracting bit masks?
For example...
x011 + 0111 + xx01 + xxx0 + 1111 = xxxx

There are several common ways to represent bitmasks with wildcards. Here's how to compute the "join" (the union of the sets represented by the inputs, "rounded up" to the strictest mask that covers at least that set) for each of them.
Known/value
This consists of a pair of masks, known and value (k, v for short), where known has a 1 iff a bit has a fixed value and a 0 for a wildcard. value holds the values of the non-wildcard bits; for wildcard bits the value is not relevant by itself, but it simplifies the math if you set it to 0.
The representations of the masks from the example would be
mask known value
x011 0111  0011
0111 1111  0111
xx01 0011  0001
xxx0 0001  0000
1111 1111  1111
The join of two of them, (kr, vr) = (ka, va) ⋁ (kb, vb) is
kr = ka & kb & ~(va ^ vb) // known if known in both inputs and same value
vr = va & kr // value is the same as in either input, with wildcards normalized to 0
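For concreteness, here's a minimal C++ sketch of this join (the KnownValue struct, the join name, and the demo values are mine, not from any particular library):

#include <cstdio>

// Hypothetical representation: 'known' has a 1 where a bit is fixed,
// 'value' holds the fixed bits (wildcard positions normalized to 0).
struct KnownValue {
    unsigned known;
    unsigned value;
};

KnownValue join(KnownValue a, KnownValue b) {
    KnownValue r;
    r.known = a.known & b.known & ~(a.value ^ b.value); // known in both inputs with the same value
    r.value = a.value & r.known;                        // wildcards normalized to 0
    return r;
}

int main() {
    KnownValue a = {0x7, 0x3}; // x011: known = 0111, value = 0011
    KnownValue b = {0xF, 0x7}; // 0111: known = 1111, value = 0111
    KnownValue r = join(a, b);
    printf("known = %X, value = %X\n", r.known, r.value); // known = 3, value = 3, i.e. xx11
    return 0;
}

Joining just these two inputs already has to round up: x011 covers {0011, 1011} and 0111 adds {0111}, so bits 2 and 3 must both become wildcards, giving xx11.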
Z,O
Confusing name, but it's a pair of masks where Z (zero) has a 1 iff the bit can be 0 (so it's either 0 or a wildcard) and O (one) has a 1 iff the bit can be 1 (so it's either 1 or a wildcard). Compared to known/value it has some pros and cons:
- More symmetric: computations for Z and O are usually either the same or "dual", whereas computations for known and value are fundamentally different.
- Can represent the empty set; whether this is a pro or a con depends on what you're doing. When a bit is 0 in both Z and O, that bit cannot have any value.
- The math is usually more efficient, though it's often harder to think about. The join is easy, though.
The representations of the masks from the example would be
mask Z    O
x011 1100 1011
0111 1000 0111
xx01 1110 1101
xxx0 1111 1110
1111 0000 1111
The join of two of them, (zr, or) = (za, oa) ⋁ (zb, ob) is
zr = za | zb // a bit can be 0 in the result if it can be 0 in either input
or = oa | ob // a bit can be 1 in the result if it can be 1 in either input
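A matching C++ sketch for this representation (again, the ZO struct and demo values are made up for illustration); note that the join really is just two ORs:

#include <cstdio>

// Hypothetical Z/O representation: Z has a 1 where the bit can be 0,
// O has a 1 where the bit can be 1 (both = wildcard, neither = empty set).
struct ZO {
    unsigned z;
    unsigned o;
};

ZO join(ZO a, ZO b) {
    return {a.z | b.z, a.o | b.o}; // a bit can be 0 (or 1) if it can be in either input
}

int main() {
    ZO a = {0xC, 0xB}; // x011: Z = 1100, O = 1011
    ZO b = {0x8, 0x7}; // 0111: Z = 1000, O = 0111
    ZO r = join(a, b);
    printf("Z = %X, O = %X\n", r.z, r.o); // prints Z = C, O = F, i.e. the mask xx11
    return 0;
}

This agrees with the known/value sketch above: bits 2 and 3 end up with both Z and O set (wildcards), bits 0 and 1 can only be 1.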

Related

Significance of x & (-x) in 2's Complement?

Where '-' denotes negative x, and '&' denotes bitwise AND.
The numbers are in 8-bit 2's complement in a program and I can't seem to find the correlation between inputs and outputs.
8 & (-8) = 8
7 & (-7) = 1
97 & (-97) = 1
So possibly the significance is in the bit manipulation?
0000 1000 & (1111 1000) = 0000 1000
0000 0111 & (1111 1001) = 0000 0001
0110 0001 & (1001 1111) = 0000 0001
In each of the above cases the upper 4-bits always end up being 0's, but I cannot find a correlation between the inputs and what the lower 4-bits end up being.
Any ideas?
ANSWERED: Find the lowest set bit
To expound on the other answer, the two's complement is equal to the one's complement of a number plus 1. Let's look at how adding 1 to the one's complement of 8 goes.
8 -> 00001000 (bin) -> 11110111 (oc) -> 11111000 (tc)
Here, notice how the added 1 moves through the one's complement until it reaches the first 0 (counting from the right), flipping that bit and the bits to the right of it. Also note that the position of the first 0 in the one's complement is the position of the first 1 in the original binary expression.
In x & (-x), the bits to the left of the first 1 in x will be 0 because they are all still flipped from taking the one's complement. Then, the bits to the right of the first 1 will also be 0 because they were 0 in x (else the first 1 would be earlier).
Thus, the output of x & (-x) will be the power of 2 corresponding to the position of the first 1 (the lowest set bit) in x.
The two's complement is, by definition, equal to the one's complement (all bits inverted) plus one.
If you were to AND the number with only its one's complement, it would always give 0000 0000.
The key to understanding the pattern lies here: whether the +1 operation changes other bits or only the last one. That is, whether the number has a 1 at the end, and whether a carry propagates through after the +1 addition.
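To see the identity in action, here's a small C++ sketch (the helper name lowest_set_bit is mine):

#include <cstdio>

// x & -x isolates the lowest set bit of x (relies on two's complement negation).
unsigned lowest_set_bit(unsigned x) {
    return x & -x;
}

int main() {
    printf("%u\n", lowest_set_bit(8));  // 8 (0000 1000 -> 0000 1000)
    printf("%u\n", lowest_set_bit(7));  // 1 (0000 0111 -> 0000 0001)
    printf("%u\n", lowest_set_bit(97)); // 1 (0110 0001 -> 0000 0001)
    printf("%u\n", lowest_set_bit(12)); // 4 (0000 1100 -> 0000 0100)
    return 0;
}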

Why does left shift and right shift in the same statement yields a different result?

Consider the following Example:
First Case:
short x=255;
x = (x<<8)>>8;
cout<<x<<endl;
Second Case:
short x=255;
x = x<<8;
x = x>>8;
cout<<x<<endl;
The output in the first case is 255, whereas in the second case it is -1. -1 as output does make sense, as C++ does an arithmetic right shift here. Here are the intermediate values of x that produce -1 as output.
x: 0000 0000 1111 1111
x<<8: 1111 1111 0000 0000
x>>8: 1111 1111 1111 1111
Why doesn't the same mechanism happen in the first case?
The difference is a result of two factors.
The C++ standard does not specify the exact sizes of the integral types; it only specifies the minimum size of each integer type. On your platform, a short is a 16-bit value, and an int is at least a 32-bit value.
The second factor is two's complement arithmetic.
In your first example, the short value is promoted to an int, which is at least 32 bits wide, so both the left and the right shift operate on an int before the result is converted back to a short.
In your second example, after the first left shift operation the resulting value is once again converted back to a short, and due to two's complement arithmetic, it ends up being a negative value. The right shift ends up sign-extending the negative value, resulting in the final result of -1.
What you just observed is sign extension:
Sign extension is the operation, in computer arithmetic, of increasing the number of bits of a binary number while preserving the number's sign (positive/negative) and value. This is done by appending digits to the most significant side of the number, following a procedure dependent on the particular signed number representation used.
For example, if six bits are used to represent the number "00 1010" (decimal positive 10) and the sign extend operation increases the word length to 16 bits, then the new representation is simply "0000 0000 0000 1010". Thus, both the value and the fact that the value was positive are maintained.
If ten bits are used to represent the value "11 1111 0001" (decimal negative 15) using two's complement, and this is sign extended to 16 bits, the new representation is "1111 1111 1111 0001". Thus, by padding the left side with ones, the negative sign and the value of the original number are maintained.
You left shift all the way to the point where your short becomes negative, and when you then shift back to the right, you get the sign extension.
This doesn't happen in the first case, as the shift isn't applied to a short. It's applied to the promoted value, which isn't a short but the default integral type (probably an int). It only gets converted back to a short after it's already been shifted back:
on the stack: 0000 0000 0000 0000 0000 0000 1111 1111
<<8
on the stack: 0000 0000 0000 0000 1111 1111 0000 0000
>>8
on the stack: 0000 0000 0000 0000 0000 0000 1111 1111
convert to short: 0000 0000 1111 1111
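Here's a short C++ sketch that makes the promotion explicit (assuming a 16-bit short, a 32-bit int, and two's complement, as in the question); forcing the intermediate result back into a short reproduces the -1:

#include <iostream>

int main() {
    short x = 255;

    // Case 1: both shifts happen on the promoted int; only the final
    // result is converted back to short.
    short a = static_cast<short>((x << 8) >> 8);                   // 255

    // Case 2: force the intermediate back into a short, as the
    // two-statement version does; the short becomes negative and the
    // right shift sign-extends it.
    short b = static_cast<short>(static_cast<short>(x << 8) >> 8); // -1

    std::cout << a << " " << b << std::endl;
    return 0;
}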

bitwise shifts, unsigned chars

Can anyone explain verbosely what this accomplishes? I'm trying to learn C and am having a hard time wrapping my head around it.
void tonet_short(uint8_t *p, unsigned short s) {
    p[0] = (s >> 8) & 0xff;
    p[1] = s & 0xff;
}
void tonet_long(uint8_t *p, unsigned long l)
{
    p[0] = (l >> 24) & 0xff;
    p[1] = (l >> 16) & 0xff;
    p[2] = (l >> 8) & 0xff;
    p[3] = l & 0xff;
}
Verbosely, here it goes:
As a direct answer: both of them store the bytes of a variable inside an array of bytes, from left to right. tonet_short does that for unsigned short variables, which consist of 2 bytes; and tonet_long does it for unsigned long variables, which consist of 4 bytes.
I will explain it for tonet_long, and tonet_short will just be the variation of it that you'll hopefully be able to derive yourself:
When the bits of an unsigned variable are bitwise-shifted, they move in the given direction by the given number of positions, and the vacated bits are filled with zeros. I.e.:
unsigned char asd = 10; //which is 0000 1010 in base 2
asd <<= 2; //shifts the bits of asd 2 positions to the left
asd; //it is now 0010 1000, which is 40 in base 10
Keep in mind that this is for unsigned variables, and these may be incorrect for signed variables.
The bitwise-and & operator compares the bits of its two operands, yielding a 1 (true) where both are 1 (true), and 0 (false) where either or both of them are 0 (false); it does this for each bit position. Example:
unsigned char asd = 10; //0000 1010
unsigned char qwe = 6; //0000 0110
asd & qwe; //0000 0010 <-- this is what it evaluates to, which is 2
Now that we know the bitwise-shift and bitwise-and, let's get to the first line of the function tonet_long:
p[0] = (l >> 24) & 0xff;
Here, since l is an unsigned long, (l >> 24) leaves only the top 4 * 8 - 24 = 8 bits of the variable l, which is the first byte of l, now sitting in the lowest positions. I can visualize the process like this:
abcd efgh ijkl mnop qrst uvwx yz.. .... //letters and dots stand for
//unknown zeros and ones
//shift this 24 times towards right
0000 0000 0000 0000 0000 0000 abcd efgh
Note that we do not change the l, this is just the evaluation of l >> 24, which is temporary.
Then 0xff, which is 0000 0000 0000 0000 0000 0000 1111 1111 in binary (ff in base 16), gets bitwise-ANDed with the shifted l. It goes like this:
0000 0000 0000 0000 0000 0000 abcd efgh
&
0000 0000 0000 0000 0000 0000 1111 1111
=
0000 0000 0000 0000 0000 0000 abcd efgh
Since a & 1 depends strictly on a, the result is simply a; and the same holds for the rest... For this first byte the AND looks like a redundant operation, and it really is. It will, however, be important for the rest. This is because, for example, when you evaluate l >> 16, it looks like this:
0000 0000 0000 0000 abcd efgh ijkl mnop
Since we want only the ijkl mnop part, we have to discard the abcd efgh, and that is done with the aid of the 0000 0000 that 0xff has in the corresponding bit positions.
I hope this helps, the rest happens like it does this far, so... yeah.
These routines convert 16- and 32-bit values from native byte order to standard network (big-endian) byte order. They work by shifting and masking 8-bit chunks from the native value and storing them in order into a byte array.
If I see it right, it basically switches the order of the bytes in the short and in the long ... (reverses the byte order of the number, at least on a little-endian machine) and stores the result at an address which hopefully has enough space :)
explain verbosely - OK...
void tonet_short(uint8_t *p, unsigned short s) {
short is typically a 16-bit value (max: 0xFFFF)
The uint8_t is an unsigned 8-bit value, and p is a pointer to some number of unsigned 8-bit values (from the code we're assuming at least 2 sequential ones).
p[0] = (s >> 8) & 0xff;
This takes the "top half" of the value in s and puts it in the first element in the array p. So let's assume s==0x1234.
First s is shifted by 8 bits (s >> 8 == 0x0012), then it's AND'ed with 0xFF and the result is stored in p[0] (p[0] == 0x12).
p[1] = s & 0xff;
Now note that when we did that shift, we never changed the original value of s, so s still has the original value of 0x1234; thus when we do this second line we simply do another bitwise AND, and p[1] gets the "lower half" of the value of s (p[1] == 0x34).
The same applies for the other function you have there, but it's a long instead of a short, so we're assuming p in this case has enough space for all 32 bits (4 x 8), and we have to do some extra shifts too.
This code is used to serialize a 16-bit or 32-bit number into bytes (uint8_t). For example, to write them to disk, or to send them over a network connection.
A 16-bit value is split into two parts. One containing the most-significant (upper) 8 bits, the other containing least-significant (lower) 8 bits. The most-significant byte is stored first, then the least-significant byte. This is called big endian or "network" byte order. That's why the functions are named tonet_.
The same is done for the four bytes of a 32-bit value.
The & 0xff operations are actually useless. When a 16-bit or 32-bit value is converted to an 8-bit value, the lower 8 bits (0xff) are masked implicitly.
The bit-shifts are used to move the needed byte into the lowest 8 bits. Consider the bits of a 32-bit value:
AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD
The most significant byte consists of the 8 bits named A. In order to move them into the lowest 8 bits, the value has to be right-shifted by 24.
The names of the functions are a big hint... "to net short" and "to net long".
If you think about decimal... say we have two pieces of paper so small we can only write one digit on each of them; we can therefore use both to record all the numbers from 0 to 99: 00, 01, 02... 08, 09, 10, 11... 18, 19, 20... 98, 99. Basically, one piece of paper holds the "tens" column (given we're in base 10 for decimal), and the other the "units".
Memory works like that: each byte can store a number from 0..255, so we're working in base 256. If you have two bytes, one of them is going to be the "two-hundred-and-fifty-sixes" column, and the other the "units" column. To work out the combined value, you multiply the former by 256 and add the latter.
On paper we write numbers with the more significant ones on the left, but on a computer it's not clear if a more significant value should be in a higher or lower memory address, so different CPU manufacturers picked different conventions.
Consequently, some computers store 258 - which is 1 * 256 + 2 - as low=1 high=2, while others store low=2 high=1.
What these functions do is rearrange the memory from whatever your CPU happens to use to a predictable order - namely, the more significant value(s) go into the lower memory addresses, and eventually the "units" value is put into the highest memory address. This is a consistent way of storing the numbers that works across all computer types, so it's great when you want to transfer the data over the network; if the receiving computer uses a different memory ordering for the base-256 digits, it can move them from network byte ordering to whatever order it likes before interpreting them as CPU-native numbers.
So, "to net short" packs the most significant 8 bits of s into p[0] - the lower memory address. It didn't actually need the & 0xff, as after taking the 16 input bits and shifting them 8 to the "right", all the left-hand 8 bits are guaranteed 0 anyway, which is the effect of & 0xFF - for example:
1010 1111 1011 0111 // = 0xAFB7 = decimal 10*16^3 + 15*16^2 + 11*16 + 7
>>8 0000 0000 1010 1111 // move right 8, with left-hand values becoming 0
0xff 0000 0000 1111 1111 // we're going to and the above with this
& 0000 0000 1010 1111 // the bits that were on in both the above 2 values
// (the and never changes the value)
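To tie it together, here's a usage sketch with a hypothetical inverse function (fromnet_short is a made-up name, not part of any standard API):

#include <cstdint>
#include <cstdio>

void tonet_short(uint8_t *p, unsigned short s) {
    p[0] = (s >> 8) & 0xff; // most significant byte first (big endian)
    p[1] = s & 0xff;        // least significant byte second
}

// Hypothetical inverse: rebuild the native value from network byte order.
unsigned short fromnet_short(const uint8_t *p) {
    return (unsigned short)((p[0] << 8) | p[1]);
}

int main() {
    uint8_t buf[2];
    tonet_short(buf, 0x1234);
    printf("%02x %02x\n", (unsigned)buf[0], (unsigned)buf[1]); // 12 34
    printf("%04x\n", (unsigned)fromnet_short(buf));            // 1234
    return 0;
}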

What does a bitwise shift (left or right) do and what is it used for?

I've seen the operators >> and << in various code that I've looked at (none of which I actually understood), but I'm just wondering what they actually do and what some practical uses of them are.
If the shifts are like x * 2 and x / 2, what is the real difference from actually using the * and / operators? Is there a performance difference?
You have a collection of bits, and you move some of them beyond their bounds:
1111 1110 << 2
1111 1000
It is filled from the right with fresh zeros. :)
0001 1111 >> 3
0000 0011
Filled from the left. A special case is the leading 1. It often indicates a negative value, depending on the language and datatype, so it is often desirable that a right shift keeps the first bit as it is.
1100 1100 >> 1
1110 0110
And it is conserved over multiple shifts:
1100 1100 >> 2
1111 0011
If you don't want the first bit to be preserved, you use (in Java and Scala, as far as I know, and maybe more) a triple-sign operator; C and C++ don't have it, but you get the same effect there by shifting unsigned types:
1100 1100 >>> 1
0110 0110
There isn't any equivalent in the other direction, because it doesn't make any sense - maybe in your very special context, but not in general.
Mathematically, a left-shift is a *= 2, two left-shifts are a *= 4, and so on. A right-shift is a /= 2, and so on.
Left bit shifting multiplies by a power of two, and right bit shifting divides by a power of two.
For example, x = x * 2; can also be written as x<<1 or x = x*8 can be written as x<<3 (since 2 to the power of 3 is 8). Similarly x = x / 2; is x>>1 and so on.
Left Shift
x = x * 2^value (normal operation)
x << value (bit-wise operation)
x = x * 16 (which is the same as 2^4)
The left shift equivalent would be x = x << 4
Right Shift
x = x / 2^value (normal arithmetic operation)
x >> value (bit-wise operation)
x = x / 8 (which is the same as 2^3)
The right shift equivalent would be x = x >> 3
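A quick sanity check of these equivalences in C++ (they hold as shown for non-negative values):

#include <cassert>

int main() {
    int x = 5;
    assert((x << 4) == x * 16);  // left shift by 4 == multiply by 2^4
    assert((40 >> 3) == 40 / 8); // right shift by 3 == divide by 2^3
    return 0;
}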
Left shift: the result is the value being shifted multiplied by 2 raised to the number of bit positions shifted.
Example:
1 << 3
0000 0001 ---> 1
Shift by 1 bit
0000 0010 ----> 2 which is equal to 1*2^1
Shift By 2 bits
0000 0100 ----> 4 which is equal to 1*2^2
Shift by 3 bits
0000 1000 ----> 8 which is equal to 1*2^3
Right shift: the result is the value being shifted divided by 2 raised to the number of bit positions shifted.
Example:
8 >> 3
0000 1000 ---> 8 which is equal to 8/2^0
Shift by 1 bit
0000 0100 ----> 4 which is equal to 8/2^1
Shift By 2 bits
0000 0010 ----> 2 which is equal to 8/2^2
Shift by 3 bits
0000 0001 ----> 1 which is equal to 8/2^3
Left bit shifting multiplies by a power of two.
Right bit shifting divides by a power of two.
x = x << 5; // Left shift
y = y >> 5; // Right shift
In C/C++ it can be written as:
#include <math.h>
x = x * pow(2, 5); // note: pow() returns a double, so the result is truncated on assignment
y = y / pow(2, 5);
The bit shift operators can be more efficient than the / or * operators.
In many computer architectures, a divide (/) or multiply (*) takes multiple cycles to compute its result, while a bit shift is a single-register, single-cycle computation (modern compilers typically apply this optimization for you when the operand is a power of two).
Some examples:
Bit operations, for example converting to and from Base64 (which uses 6 bits instead of 8)
Doing power-of-2 operations (1 << 4 equals 2^4, i.e. 16)
Writing more readable code when working with bits. For example, defining constants using
1 << 4 or 1 << 5 is more readable.
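For instance, a common pattern is defining flag constants with shifts so each flag occupies one bit (the flag names below are made up for illustration):

#include <cstdio>

// Hypothetical flag set: each constant occupies its own bit.
enum Flags {
    FLAG_READ    = 1 << 0, // 0001
    FLAG_WRITE   = 1 << 1, // 0010
    FLAG_EXECUTE = 1 << 2, // 0100
    FLAG_HIDDEN  = 1 << 3  // 1000
};

int main() {
    unsigned flags = FLAG_READ | FLAG_WRITE; // combine flags with OR
    if (flags & FLAG_WRITE)                  // test a flag with AND
        printf("writable\n");
    flags &= ~FLAG_READ;                     // clear a flag
    printf("%u\n", flags);                   // 2
    return 0;
}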
Yes, performance-wise you might find a difference: a bitwise left or right shift is an O(1) operation, whereas computing a power of two with a loop takes time proportional to the exponent.
For example, calculating the power 2^n:
int value = 1;
int exponent = 0;
while (exponent < n)
{
    value = value * 2; // equivalent to a machine-level left shift
    exponent++;
}
Similar code with a bitwise left shift operation would be like:
value = 1 << n;
Moreover, a bitwise operation maps almost directly onto the machine-level instructions that the processor or microcontroller ultimately executes.
Here is an example:
#include <stdio.h>

int main(void)
{
    int rm;
    printf("Enter any number (e.g., 1, 2, 5): ");
    scanf("%d", &rm);
    // e.g. rm = 5 (0101); rm << 4 appends four zeros, giving 0101 0000 = 80
    printf("The left shift value of %d = %d\n", rm, rm << 4);
    printf("The right shift value of %d = %d\n", rm, rm >> 2);
    return 0;
}

selective access to bits on datatypes with C++

I'm using C++ for hardware-based model design with SystemC. SystemC as a C++ extension introduces specific datatypes useful for signal and byte descriptions.
How can I access the first bits of a datatype in general, like:
sc_bv<16> R0;
Or access the first four bits of tmp:
int my_array[42];
int tmp = my_array[1];
sc_bv is a bit-vector datatype that stores binary sequences. Now I want, e.g., the first four bits of that datatype. My background is C# and Java, so I miss some of the OOP- and reflection-based API constructs. I need to perform conversions on this low-level stuff. Useful introductory material would help a lot.
Thanks :),
wishi
For sc_bv, you can use the indexing operator [].
For the int, just use normal bitwise operations with constants, e.g. the least significant bit in tmp is tmp & 1
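A sketch of what that can look like, assuming a SystemC environment is available; besides operator[] for single bits, sc_bv also provides range() for multi-bit slices:

#include <systemc.h>

int sc_main(int argc, char *argv[]) {
    sc_bv<16> R0 = "1010110010101111";

    // range(hi, lo) selects a slice; to_uint() converts it to an integer.
    unsigned bit0 = R0.range(0, 0).to_uint(); // the LSB, here 1
    unsigned low4 = R0.range(3, 0).to_uint(); // the lowest four bits, here 15

    std::cout << bit0 << " " << low4 << std::endl;
    return 0;
}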
I can't really speak for SystemC (sounds interesting though). In normal C you'd read out the lower four bits with a mask like so:
temp = R0 & 0xf;
and write into only the lower four bits (assuming a 32-bit register, and temp<16) like so:
R0 = (R0 & 0xfffffff0) | temp;
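The same read-modify-write idea as a small self-contained helper (a sketch with made-up names; using ~0xFu as the clearing mask avoids hard-coding the register width):

#include <cstdio>

// Replace the lowest 4 bits of reg with 'field' (field must fit in 4 bits).
unsigned set_low_nibble(unsigned reg, unsigned field) {
    return (reg & ~0xFu) | (field & 0xFu); // clear the old bits, OR in the new ones
}

int main() {
    unsigned r0 = 0xABCD;
    printf("%X\n", set_low_nibble(r0, 0x7)); // ABC7
    printf("%X\n", r0 & 0xF);                // D: read the lowest 4 bits back
    return 0;
}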
To access the first four bits of tmp (I assume you mean the four highest bits), i.e. to get their values, you use bit masks. So if you want to know whether, for example, the second bit is set, you do the following:
int second_bit = (tmp & 0x40000000) >> 30;
now second_bit is 1 if the bit is set and zero otherwise. The idea behind this is the following:
Imagine tmp is (in binary)
1101 0000 0000 0000 0000 0000 0000 0000
Now you use bitwise AND ( the & ) with the following value
0100 0000 0000 0000 0000 0000 0000 0000 // which is 0x40000000 in hex
ANDing produces a 1 on the given bit if and only if both operands have corresponding bits set (they are both 1). So the result will be:
0100 0000 0000 0000 0000 0000 0000 0000
Then you shift this 30 bits to the right, which makes it be:
0000 0000 0000 0000 0000 0000 0000 0001 // which is 1
Note that if the original value had the tested bit zero, the result would be zero.
This way you can test any bit you like; you just need to provide the correct mask. Note that I assumed here that int is 32 bits wide, which is true on most platforms.
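The same test wrapped in a reusable helper (a sketch; shifting the value right instead of shifting the mask left makes it work for any bit position):

#include <cstdio>

// Returns 1 if bit 'pos' of v is set, 0 otherwise (pos 0 = least significant).
unsigned test_bit(unsigned v, unsigned pos) {
    return (v >> pos) & 1u;
}

int main() {
    unsigned tmp = 0xD0000000u; // 1101 0000 ... in binary
    printf("%u\n", test_bit(tmp, 30)); // 1: the second-highest bit is set
    printf("%u\n", test_bit(tmp, 29)); // 0
    return 0;
}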
You will have to know a bit more about sc_bv to make sure you get the right information. Also, when you say the "first four bytes" I assume you mean the "first four bits." Even that is ambiguous, though, because you really need to distinguish between the low-order and the high-order bits.
In any event, you use the C bitwise operators for this kind of thing. However, you will need to know the size of the integer values AND the "endian-ness" of the runtime architecture to get that right.
But, if you REALLY want just the first four bits, then you would do something like this...
inline unsigned char
first_4_bits(void const *ptr)
{
    return (*reinterpret_cast<unsigned char const *>(ptr) & 0xf0) >> 4;
}
and that will grab the very first 4 bits of what is being pointed at. So, if the first byte pointed to is 0x38, this function will return the top 4 bits of it, so the result will be 3.