I have some code here, and don't really understand the ">>" and the "&". Can someone clarify?
buttons[0] = indata[byteindex]&1;
buttons[1] = (indata[byteindex]>>1)&1;
rawaxes[7] = (indata[byteindex]>>4)&0xf;
These are bitwise operators, meaning they operate on the binary bits that make up a value. See Bitwise operation on Wikipedia for more detail.
& is for AND
If indata[byteindex] is the number 4, then in binary it would look like 00000100. ANDing this number with 1 gives 0, because bit 1 is not set:
00000100 AND 00000001 = 0
If the value is 5 however, then you will get this:
00000101 AND 00000001 = 1
Any bit matched with the mask is allowed through.
>> is for right-shifting
Right-shifting shifts bits along to the right!
00010000 >> 4 = 00000001
One of the standard patterns for extracting a bit field is (reg >> offset) & mask, where reg is the register (or other memory location) you're reading, offset is how many least-significant bits you skip over, and mask is the set of bits that matter. The >> offset step can be omitted if offset is 0. mask is usually equal to 2^width - 1, or (1 << width) - 1 in C, where width is the number of bits in the field.
So, looking at what you have:
buttons[0] = indata[byteindex]&1;
Here, offset is 0 (it was omitted) and mask is 1. So this gets just the least-significant bit in indata[byteindex]:
bit number -> 7 6 5 4 3 2 1 0
+-+-+-+-+-+-+-+-+
indata[byteindex] | | | | | | | |*|
+-+-+-+-+-+-+-+-+
|
\----> buttons[0]
Next:
buttons[1] = (indata[byteindex]>>1)&1;
Here, offset is 1 and width is 1...
bit number -> 7 6 5 4 3 2 1 0
+-+-+-+-+-+-+-+-+
indata[byteindex] | | | | | | |*| |
+-+-+-+-+-+-+-+-+
|
\------> buttons[1]
And, finally:
rawaxes[7] = (indata[byteindex]>>4)&0xf;
Here, offset is 4 and width is 4 (2^4 - 1 = 16 - 1 = 15 = 0xf):
bit number -> 7 6 5 4 3 2 1 0
+-+-+-+-+-+-+-+-+
indata[byteindex] |*|*|*|*| | | | |
+-+-+-+-+-+-+-+-+
| | | |
\--v--/
|
\---------------> rawaxes[7]
EDIT...
but I don't understand what the point of it is...
Mike pulls up a rocking chair and sits down.
Back in the old days of 8-bit CPUs, a computer typically had 64K (65 536 bytes) of address space. Now we wanted to do as much as we could with our fancy whiz-bang machines, so we would do things like buy 64K of RAM and map everything to RAM. Shazam, 64K of RAM and bragging rights all around.
But a computer that can only access RAM isn't much good. It needs some ROM for an OS (or at least a BIOS), and some addresses for I/O. (You in the back--siddown. I know Intel chips had separate address space for I/O, but it doesn't help here because the I/O space was much, much smaller than the memory space, so you ran into the same constraints.)
Address space used for ROM and I/O was space that wasn't accessible as RAM, so you wanted to minimize how much space wasn't used for RAM. So, for example, when your I/O peripheral had five different things whose status amounted to a single bit each, rather than give each one of those bits its own byte (and, hence, address), they got the brilliant idea of packing all five of those bits into one byte, leaving three bits that did nothing. Voila, the Interrupt Status Register was born.
The hardware designers were also impressed with how fewer addresses resulted in fewer address bits (since the number of address bits is the ceiling of log-base-2 of the number of addresses), meaning fewer address pins on the chip, freeing pins for other purposes. (These were the days when 48-pin chips were considered large, and 64-pin chips huge, and grid array packages were out of the question because multi-layer circuit boards were prohibitively expensive. These were also the days before multiplexing the address and data on the same pins became commonplace.)
So the chips were taped out and fabricated, and hardware was built, and then it fell to the programmers to make the hardware work. And lo, the programmers said, "WTF? I just want to know if there is a byte to read in the bloody serial port, but there are all these other bits like "receiver overrun" in the way." And the hardware guys considered this, and said, "tough cookies, deal with it."
So the programmers went to the Guru, the guy who hadn't forgotten his Boolean algebra and was happy not to be writing COBOL. And the Guru said, "use the Bit AND operation to force those bits you don't care about to 0. If you need a number, and not just a zero-or-nonzero, use a logical shift right (LSR) on the result." And they tried it. It worked, and there was much rejoicing, though the wiser ones started wondering about things like race conditions in a read-modify-write cycle, but that's a story for another time.
And so the technique of packing loosely or completely unrelated bits into registers became commonplace. People developing protocols, which always want to use fewer bits, jumped on these techniques as well. And so, even today, with our gigabytes of RAM and gigabits of bandwidth, we still pack and unpack bitfields with expressions whose legibility borders on keyboard head banging.
(Yes, I know bit fields probably go back to the ENIAC, and maybe even the Difference Engine if Lady Ada needed to stuff two data elements into one register, but I haven't been alive that long, okay? I'm sticking with what I know.)
(Note to hardware designers out there: There really isn't much justification anymore for packing things like status flags and control bits that a driver writer will want to use independently. I've done several designs with one bit per 32-bit register in many cases. No bit shifting or masking, no races, driver code is simpler to write and understand, and the address decode logic is trivially more complex. If the driver software is complex, simplifying flag and bitfield handling can save you a lot of ROM and CPU cycles.)
(More random trivia: The Atmel AVR architecture (used in the Arduino, among many other places) has some specialized bit-set and bit-clear instructions. The avr-libc library used to provide macros for these instructions, but now the gcc compiler is smart enough to recognize that reg |= (1 << bitNum); is a bit set and reg &= ~(1 << bitNum); is a bit clear, and puts in the proper instruction. I'm sure other architectures have similar optimizations.)
These are bitwise operators.
& ands two arguments bit by bit.
'>>' shifts first argument's bit string to the right by second argument.
'<<' does the opposite. | is bitwise or and ^ is bitwise xor just like & is bitwise and.
In English, the first line grabs only the lowest bit (bit 0) of indata[byteindex] and stores it in buttons[0]. Basically, if the value is odd it will be 1; if even, it will be 0.
The second line grabs the next bit up (bit 1). If that bit is set, it yields 1, else 0. It could also have been written as
buttons[1] = (indata[byteindex]&2)>>1;
and it would have done the same thing.
The last (3rd) line grabs the 5th through 8th bits (bits 4-7). Basically, it will be a number from 0 to 15 when it is complete. It also could have been written as
rawaxes[7] = (indata[byteindex]&0xf0) >> 4;
and done the same thing. I'd also guess from context that these arrays are unsigned char arrays. Just a guess though.
The '&' (in this case) is a bitwise AND operator and '>>' is the bit-shift operator (so x>>y yields x shifted right y bits).
So, they're taking the least significant bit of indata[byteindex] and putting it into buttons[0]. They take the next least significant bit and put it into buttons[1].
The last one probably needs to be looked at in binary to make a lot of sense. 0xf is 1111 in binary, so they're taking the input, shifting it right 4 bits, then retaining the 4 least significant bits of that result.
I am working on a toy file system, using a bitset to keep track of used and unused pages. I am using an array of ints (in order to use GCC's built-in bit ops) to represent the bitset. I am not using std::bitset because it will not be available in the final environment (an embedded system).
Now, according to Linux perf, allocating files takes 35% of the runtime, and 45% of that time is lost setting bits using
#define BIT_SET(a,b) ((a) |= (1ULL<<(b)))
inside a loop. According to perf, 42% of the time is lost in the OR. Deleting is a bit faster, but then most of the time is lost in the AND operation that clears the bits; toggling the bits using XOR did not make any difference.
Basically, I am wondering if there are smarter ways to set multiple bits in one go. If the user requests 10 pages of space, just set all the bits at once, but the problem is that the span can cross word boundaries. Are there any GCC/Clang intrinsics I should be aware of?
You should be able to use a function like this to set multiple bits in a bitset at once:
void set_mask(word_t* bitset, word_t mask, int lowbit) {
    const int word_bits = 8 * sizeof(word_t);   /* bits per word, not bytes */
    int index  = lowbit / word_bits;
    int offset = lowbit % word_bits;
    bitset[index] |= (mask << offset);
    /* Shift in two steps: a single shift by word_bits (when offset == 0)
       would be undefined behavior in C. */
    mask >>= (word_bits - 1 - offset);
    mask >>= 1;
    bitset[index + 1] |= mask;
}
If the mask does not span a boundary, the 2nd word is ORd with 0, so it is unchanged. Doing it unconditionally may be faster than the test to see if it needs to be done. If testing shows otherwise, add an if (mask) before the last line.
I have problem in accessing 32 most significant and 32 least significant bits in Verilog. I have written the following code but I get the error "Illegal part-select expression" The point here is that I don't have access to a 64 bit register. Could you please help.
`MLT: begin
if (multState==0) begin
{C,Res}<={A*B}[31:0];
multState=1;
end
else
begin
{C,Res}<={A*B}[63:32];
multState=2;
end
Unfortunately the bit-select and part-select features of Verilog are part of expression operands. They are not Verilog operators (see Sec. 5.2.1 of the Verilog 2005 Std. Document, IEEE Std 1364-2005) and can therefore not be applied to arbitrary expressions but only directly to registers or wires.
There are various ways to do what you want but I would recommend using a temporary 64 bit variable:
wire [31:0] A, B;
reg [63:0] tmp;
reg [31:0] ab_lsb, ab_msb;
always @(posedge clk) begin
tmp = A*B;
ab_lsb <= tmp[31:0];
ab_msb <= tmp[63:32];
end
(The assignments to ab_lsb and ab_msb could be conditional. Otherwise a simple "{ab_msb, ab_lsb} <= A*B;" would do the trick as well of course.)
Note that I'm using a blocking assignment to assign 'tmp' as I need the value in the following two lines. This also means that it is unsafe to access 'tmp' from outside
this always block.
Also note that the concatenation hack {A*B} is not needed here, as A*B is assigned to a 64 bit register. This also fits the recommendation in Sec 5.4.1 of IEEE Std 1364-2005:
Multiplication may be performed without losing any overflow bits by assigning the result
to something wide enough to hold it.
However, you said: "The point here is that I don't have access to a 64 bit register".
So I will describe a solution that does not use any Verilog 64 bit registers. This will however not have any impact on the resulting hardware. It will only look different in
the Verilog code.
The idea is to access the MSB bits by shifting the result of A*B. The following naive version of this will not work:
ab_msb <= (A*B) >> 32; // Don't do this -- it won't work!
The reason why this does not work is that the width of A*B is determined by the left hand side of the assignment, which is 32 bits. Therefore the result of A*B will only contain the lower 32 bits of the results.
One way of making the bit width of an operation self-determined is by using the concatenation operator:
ab_msb <= {A*B} >> 32; // Don't do this -- it still won't work!
Now the result width of the multiplication is determined using the max. width of its operands. Unfortunately both operands are 32 bit and therefore we still have a 32 bit multiplication. So we need to extend one operand to be 64 bit, e.g. by appending zeros
(I assume unsigned operands):
ab_msb <= {{32'd0, A}*B} >> 32;
Accessing the lsb bits is easy as this is the default behavior anyways:
ab_lsb <= A*B;
So we end up with the following alternative code:
wire [31:0] A, B;
reg [31:0] ab_lsb, ab_msb;
always @(posedge clk) begin
ab_lsb <= A*B;
ab_msb <= {{32'd0, A}*B} >> 32;
end
Xilinx XST 14.2 generates the same RTL netlist for both versions. I strongly recommend the first version as it is much easier to read and understand. If only 'ab_lsb' or 'ab_msb' is used, the synthesis tool will automatically discard the unused bits of 'tmp'. So there is really no difference.
If this is not the information you were looking for, you should probably clarify why and how you "don't have access to 64 bit registers". After all, you try to access bits [63:32] of a 64 bit value in your code as well. As you can't calculate the upper 32 bits of the product A*B without also performing almost all calculations required for the lower 32 bits, you might be asking for something that is not possible.
You are mixing blocking and non-blocking assignments here:
{C,Res}<={A*B}[63:32]; //< non-blocking
multState=2; //< blocking
this is considered bad practice.
Not sure that a concatenation operation which is just {A*B} is valid. At best it does nothing.
The way you have encoded it looks like you will end up with 2 hardware multipliers. What makes you say you do not have a 64-bit reg available? reg does not have to imply flip-flops. If you have 2 32-bit regs then you could have 1 64-bit one. I would personally do the multiply on 1 line, then split the result up and output it as 2 32-bit sections.
However :
x <= (a*b)[31:0] is unfortunately not allowed. If x is 32 bits it will take the LSBs, so all you need is :
x <= (a*b)
To take the MSBs you could try:
reg [31:0] throw_away;
{x, throw_away} <= (a*b) ;
This question already has answers here (closed 10 years ago).
Possible duplicate: practical applications of bitwise operations
I have been programming for several years now and I have always wondered about the practical application of bitwise operators.
In my programming experience, I have not had to utilize the bitwise operators.
When are they most commonly used?
In my programming career, is it necessary for me to learn these?
Thank you.
Bitwise operations are frequently used close to the hardware - when packing data, doing compression, or packing multiple booleans into a byte. Bitwise operations map directly to processor instructions, and are often extremely fast.
If you're working with I/O or device interfaces, bitwise operations become very necessary, for separating the important fields out of a packed bitfield.
Or you could just use it as a fast multiply-by-two. :)
Another fun usage for binary and bit twiddling.
Packing Morse code into a single byte. A . is 0 and a - is 1.
A = .-
A = 00000001b
// Add a 'start bit'
A = 00000101b
Shift the byte left one bit at a time, monitoring the top bit; once the start bit arrives there, start playing sounds for the bits that follow.
+------- Monitor this position
V
A = 00000101 // Starting off
A = 00001010 // Nothing yet
A = 00010100 // Still nothing
A = 00101000 // Wow, a lot of nothing
A = 01010000 // Our boring life, we do nothing
A = 10100000 // Wow! A start bit! Prep to play sound.
A = 01000000 // Play a short
A = 10000000 // And play a long.
I have not needed it lately, but back when coding Pascal I used it to multiply or divide whenever the divisor or multiplier was a power of 2.
Color was stored in a byte with textcolor in the low 4 bits and background color in the high 4 bits.
Using c << 4 instead of c * 16, and c >> 4 instead of c / 16, to save or retrieve the background was many times faster.
And retrieving textcolor with c << 4 >> 4 was also faster than c & 15 (bitwise AND) for some reason. Probably register related ;) but that's way over my head too :D
But unless you are doing checksum calculations, compression or encryption you probably can do without.
Even if you can store bits in an int, many times drivers can optimize things for you anyway, and in C# you can use [Flags] enums to automatically pack bit flags into byte, word or integer values.
So I would guess that since you have not found a use, you probably are not doing work in the area where they make sense.
I am in the process of building an assembler for a rather unusual machine that me and a few other people are building. This machine takes 18 bit instructions, and I am writing the assembler in C++.
I have collected all of the instructions into a vector of 32 bit unsigned integers, none of which is any larger than what can be represented with an 18 bit unsigned number.
However, there does not appear to be any way (as far as I can tell) to output such an unusual number of bits to a binary file in C++. Can anyone help me with this?
(I would also be willing to use C's stdio and File structures. However there still does not appear to be any way to output such an arbitrary amount of bits).
Thank you for your help.
Edit: It looks like I didn't specify how the instructions will be stored in memory well enough.
Instructions are contiguous in memory. Say the instructions start at location 0 in memory:
The first instruction will be at 0. The second instruction will be at 18, the third instruction will be at 36, and so on.
There are no gaps and no padding between instructions. There can be a few superfluous 0s at the end of the program if needed.
The machine uses big endian instructions. So an instruction stored as 3 should map to: 000000000000000011
Keep an eight-bit accumulator.
Shift bits from the current instruction into to the accumulator until either:
The accumulator is full; or
No bits remain of the current instruction.
Whenever the accumulator is full:
Write its contents to the file and clear it.
Whenever no bits remain of the current instruction:
Move to the next instruction.
When no instructions remain:
Shift zeros into the accumulator until it is full.
Write its contents.
End.
For n instructions, this will leave (8 - 18n mod 8) mod 8 zero bits after the last instruction.
There are a lot of ways you can achieve the same end result (I am assuming the end result is a tight packing of these 18 bits).
A simple method would be to create a bit-packer class that accepts the 32-bit words, and generates a buffer that packs the 18-bit words from each entry. The class would need to do some bit shifting, but I don't expect it to be particularly difficult. The last byte can have a few zero bits at the end if the original vector length is not a multiple of 4. Once you give all your words to this class, you can get a packed data buffer, and write it to a file.
You could maybe represent your data in a bitset and then write the bitset to a file.
Wouldn't work with fstreams write function, but there is a way that is described here...
The short answer: Your C++ program should output the 18-bit values in the format expected by your unusual machine.
We need more information, specifically, that format that your "unusual machine" expects, or more precisely, the format that your assembler should be outputting. Once you understand what the format of the output that you're generating is, the answer should be straightforward.
One possible format — I'm making things up here — is that we could take two of your 18-bit instructions:
instruction 1 instruction 2 ...
MSB LSB MSB LSB ...
bits → ABCDEFGHIJKLMNOPQR abcdefghijklmnopqr ...
...and write them in an 8-bits/byte file thus:
KLMNOPQR CDEFGHIJ 000000AB klmnopqr cdefghij 000000ab ...
...this is basically arranging the values in "little-endian" form, with 6 zero bits padding the 18-bit values out to 24 bits.
But I'm assuming: the padding, the little-endianness, the number of bits / byte, etc. Without more information, it's hard to say if this answer is even remotely near correct, or if it is exactly what you want.
Another possibility is a tight packing:
ABCDEFGH IJKLMNOP QRabcdef ghijklmn opqr0000
or
ABCDEFGH IJKLMNOP abcdefQR ghijklmn 0000opqr
...but I've made assumptions about where the corner cases go here.
Just output them to the file as 32 bit unsigned integers, just as you have in memory, with the endianness that you prefer.
And then, when the loader / eeprom writer / JTAG or whatever method you use to send the code to the machine reads each 32 bit word, just omit the 14 most significant bits and send the real 18 bits to the target.
Unless, of course, you have written a FAT driver for your machine...
I have a stream of 16 bit values, and I need to adjust the 4 least significant bits of each sample. The new values are different for each short, but repeat every X shorts - essentially tagging each short with an ID.
Are there any bit twiddling tricks to do this faster than just a for-loop?
More details
I'm converting a file from one format to another. Currently implemented with FILE* but I could use Windows specific APIs if helpful.
[while data remaining]
{
read X shorts from input
tag 4 LSB's
write modified data to output
}
In addition to bulk operations, I guess I was looking for opinions on the best way to stomp those last 4 bits.
Shift right 4, shift left 4, | in the new values
& in my zero bits, then | in the 1 bits
modulus 16, add new value
We're only supporting win7 (32 or 64) right now, so hardware would be whatever people choose for that.
If you're working on e.g. a 32-bit platform, you can do them 2 at a time. Or on a modern x86 equivalent, you could use SIMD instructions to operate on 128 bits at a time.
Other than that, there are no bit-twiddling methods to avoid looping through your entire data set, given that it sounds like you must modify every element!
Best way to stomp those last 4 bits is your option 2:
int i;
i &= 0xFFF0;
i |= tag;
Doing this on a 64-bit integer would be faster if you know the tag values in advance.
You can memcpy 4 shorts into one uint64_t and then do the same operations as above on 4 shorts at a time:
uint64_t l;
l &= 0xFFF0FFF0FFF0FFF0ULL;
l |= tags;
where tags = ((uint64_t)tag1 << 48) | ((uint64_t)tag2 << 32) | ((uint64_t)tag3 << 16) | (uint64_t)tag4; (note the parentheses: + binds tighter than <<, so the unparenthesized form would not do what you expect, and a plain long is only 32 bits on Win7).
This makes sense if you are reusing the value tags often, not if you have to build it differently for each set of 4 shorts.