I have a problem accessing the 32 most significant and the 32 least significant bits of a product in Verilog. I have written the following code, but I get the error "Illegal part-select expression". The point here is that I don't have access to a 64-bit register. Could you please help?
MLT: begin
if (multState==0) begin
{C,Res}<={A*B}[31:0];
multState=1;
end
else
begin
{C,Res}<={A*B}[63:32];
multState=2;
end
Unfortunately the bit-select and part-select features of Verilog are part of expression operands. They are not Verilog operators (see Sec. 5.2.1 of the Verilog 2005 Std. Document, IEEE Std 1364-2005) and can therefore not be applied to arbitrary expressions but only directly to registers or wires.
There are various ways to do what you want but I would recommend using a temporary 64 bit variable:
wire [31:0] A, B;
reg [63:0] tmp;
reg [31:0] ab_lsb, ab_msb;
always @(posedge clk) begin
tmp = A*B;
ab_lsb <= tmp[31:0];
ab_msb <= tmp[63:32];
end
(The assignments to ab_lsb and ab_msb could be conditional. Otherwise a simple "{ab_msb, ab_lsb} <= A*B;" would do the trick as well of course.)
Note that I'm using a blocking assignment to assign 'tmp', as I need the value in the following two lines. This also means that it is unsafe to access 'tmp' from outside this always block.
Also note that the concatenation hack {A*B} is not needed here, as A*B is assigned to a 64 bit register. This also fits the recommendation in Sec 5.4.1 of IEEE Std 1364-2005:
Multiplication may be performed without losing any overflow bits by assigning the result
to something wide enough to hold it.
However, you said: "The point here is that I don't have access to a 64 bit register".
So I will describe a solution that does not use any 64-bit Verilog registers. This will, however, not have any impact on the resulting hardware. It will only look different in the Verilog code.
The idea is to access the MSB bits by shifting the result of A*B. The following naive version of this will not work:
ab_msb <= (A*B) >> 32; // Don't do this -- it won't work!
The reason why this does not work is that the width of A*B is determined by the left hand side of the assignment, which is 32 bits. Therefore the result of A*B will only contain the lower 32 bits of the product.
One way of making the bit width of an operation self-determined is by using the concatenation operator:
ab_msb <= {A*B} >> 32; // Don't do this -- it still won't work!
Now the result width of the multiplication is determined using the max. width of its operands. Unfortunately both operands are 32 bits wide, and therefore we still have a 32-bit multiplication. So we need to extend one operand to 64 bits, e.g. by prepending zeros (I assume unsigned operands):
ab_msb <= {{32'd0, A}*B} >> 32;
Accessing the lsb bits is easy as this is the default behavior anyways:
ab_lsb <= A*B;
So we end up with the following alternative code:
wire [31:0] A, B;
reg [31:0] ab_lsb, ab_msb;
always @(posedge clk) begin
ab_lsb <= A*B;
ab_msb <= {{32'd0, A}*B} >> 32;
end
Xilinx XST 14.2 generates the same RTL netlist for both versions. I strongly recommend the first version as it is much easier to read and understand. If only 'ab_lsb' or 'ab_msb' is used, the synthesis tool will automatically discard the unused bits of 'tmp'. So there is really no difference.
If this is not the information you were looking for, you should probably clarify why and how you "don't have access to 64 bit registers". After all, you try to access the bits [63:32] of a 64-bit value in your code as well. As you can't calculate the upper 32 bits of the product A*B without also performing almost all calculations required for the lower 32 bits, you might be asking for something that is not possible.
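As a software analogy only (this is C, not Verilog, and the names mul_lo and mul_hi are hypothetical), the same width rule can be sketched like this: a 32-bit by 32-bit multiply keeps only the low half unless one operand is widened first. The sketch assumes the usual 32-bit unsigned int.

```c
#include <stdint.h>

/* Low 32 bits of a*b: the product is computed at 32 bits, so the upper
   half is discarded (comparable to `ab_lsb <= A*B;`). */
uint32_t mul_lo(uint32_t a, uint32_t b)
{
    return a * b;
}

/* High 32 bits of a*b: one operand is widened to 64 bits first, so the
   whole multiplication is 64 bits wide (comparable to
   `{{32'd0, A}*B} >> 32`). */
uint32_t mul_hi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
```

Dropping the cast in mul_hi would reproduce the "naive version" problem: the product would already be truncated before the shift.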
You are mixing blocking and non-blocking assignments here:
{C,Res}<={A*B}[63:32]; //< non-blocking
multState=2; //< blocking
This is considered bad practice.
Not sure that a concatenation operation which is just {A*B} is valid. At best it does nothing.
The way you have coded it looks like you will end up with 2 hardware multipliers. What makes you say you do not have a 64-bit reg available? A reg does not have to imply flip-flops. If you have two 32-bit regs, then you could have one 64-bit one. I would personally do the multiply on one line, then split the result up and output it as two 32-bit sections.
However :
x <= (a*b)[31:0] is unfortunately not allowed. If x is 32 bits it will take the LSBs, so all you need is :
x <= (a*b)
To take the MSBs you could try:
reg [31:0] throw_away;
{x, throw_away} <= (a*b) ;
Assuming x is a 8bit unsigned integer, what is the most efficient command to set the last two bits to 01 ?
So regardless of the initial value it should be x = ******01 in the final state.
In order to set
the last bit to 1, one can use OR like x |= 00000001, and
the second-to-last bit to 0, one can use AND like x &= 11111101, which is ~(1<<1).
Is there an arithmetic / logical operation that can be used to apply both operations at the same time?
Can this be answered independently of any program-specific implementation, using pure logical operations only?
Of course you can put both operations in one statement.
x = (x & 0b11111100) | 1;
That at least saves one assignment and one memory read/write. However, most compilers will probably optimize this anyway, even if it is written as two statements.
Depending on the target CPU, compilers may even turn the code into bit manipulation instructions that can directly set or reset single bits. And if the variable is heavily used locally, it is most likely kept in a register.
So in the end the generated code may look as simple as (pseudo asm):
load x
setBit 0
clearBit 1
store x
however it also may be compiled into something like
load x to R1
load immediate 0b11111100 to R2
and R1, R2
load immediate 1 to R2
or R1, R2
store R1 to x
For playing with things like this you may take a look at Compiler Explorer: https://godbolt.org/z/sMhG3YTx9
Try removing the -O2 compiler option and see the difference between optimized and non-optimized code. You can also switch to different CPU architectures and compilers.
This is not possible with a dyadic bitwise operator, because the second operand would have to distinguish between "reset to 0", "set to 1" and "leave unchanged" for every bit. Three choices per bit cannot be encoded in a single binary mask.
Instead, you can use the bitwise ternary operator and form the expression
x = 0b11111100 ??? x : 0b00000001.
Anyway, a piece of caution: this operator does not exist.
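The missing "ternary" operator can be emulated with two ordinary operations and two constants. The following is a minimal sketch; the function names bit_select and set_low_bits_01 are hypothetical, and hex constants are used because 0b literals are not available in every C dialect.

```c
#include <stdint.h>

/* Bitwise select: for each bit position, take the bit from `if_one`
   where `mask` is 1 and from `if_zero` where `mask` is 0. */
uint8_t bit_select(uint8_t mask, uint8_t if_one, uint8_t if_zero)
{
    return (uint8_t)((mask & if_one) | (~mask & if_zero));
}

/* Force the two lowest bits of x to 01, leaving the upper six unchanged. */
uint8_t set_low_bits_01(uint8_t x)
{
    return bit_select(0xFCu, x, 0x01u);
}
```

For example, set_low_bits_01(0xFF) gives 0xFD and set_low_bits_01(0x02) gives 0x01.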
I'm trying to figure out how does this code work, but I can't manage to get a single answer.
#define testbit(x, y) ( ( ((const char*) & (x))[(y)>>3] & 0x80 >> ((y)&0x07)) >> (7-((y)&0x07) ) )
I'm new at pointers, so if you can figure out a way to explain this in simplified english, I would really appreciate it.
It belongs to a segment of code for an X-Plane Plug-in found at https://code.google.com/p/xplugins/source/browse/trunk/Xsaitekpanels/SwitchPanel.cpp?r=38 line=19
The macro tests the value of the y-th bit in x. You can't directly address bits, so the code starts by treating x as an array of bytes (the const char* cast).
It then looks up the byte where the bit lives. There are 8 bits in a byte, so it divides by 8. Chasing performance, instead of simply dividing by 8, the code uses the binary trick of shifting right 3 places. In general, for unsigned x and y, x >> y = x/2^y, and x << y = x*2^y.
At this point you need to test the bit within the byte, so you get the remainder of y/8. Yet another bit trick, using y & 7 instead of the clearer y % 8.
With this information you can make a mask, a single on bit, 0x80 and shift it into position to test the y%8-th bit. The mask is ANDed against the byte and a non-zero result here means the bit was set to 1, otherwise 0.
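The steps above can be sketched as a plain function, which may be easier to read than the macro. This is an illustrative rewrite, not the plug-in's code; the name test_bit is hypothetical, and it uses unsigned char rather than char.

```c
#include <stddef.h>

/* Returns 1 if bit y of the object at p is set, else 0.
   Bit 0 is the most significant bit of the first byte. */
int test_bit(const void *p, size_t y)
{
    const unsigned char *bytes = (const unsigned char *)p;
    unsigned char byte = bytes[y / 8];                      /* same as y >> 3 */
    unsigned char mask = (unsigned char)(0x80u >> (y % 8)); /* same as y & 7  */
    return (byte & mask) != 0;
}
```

The comparison with 0 plays the role of the macro's final shift: it reduces the masked value to exactly 0 or 1.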
Completing @RhythmicFistman's answer
@RhythmicFistman's answer is missing one small part, and that is the last step in the shifts.
The >> (7-((y)&0x07)) step ensures that you only ever get a result of 1 or 0. With this code it is safe to do comparisons like:
if (testbit(variable, 6) == 1) {
// do something
}
Without that step, testbit would return a bit mask in which the tested bit is set to 1 or 0 and all the other bits are always 0. Returning 1 or 0 is the intent, but it is not implemented in what is considered a portable way; see Warning 3 below.
Possible issues with using this code
Now to add something to the other answers. The other answers have not pointed out some keywords that should be mentioned here and they are strict aliasing and shift arithmetic right. My elaboration will come in the form of warnings below.
Warning 1: Endianness
This code assumes that you are using a big endian architecture or only wish to get the correct bit from an array of chars.
The reason is that if you convert an int into an array of chars (bytes) you will get different results on a big endian machine vs a little endian machine.
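One way to sidestep the byte-order dependence is to write the integer into a byte array in a fixed order before testing bits. The following is a sketch under that assumption; store_be32 and test_bit_be are hypothetical helper names, not part of the plug-in.

```c
#include <stdint.h>

/* Write v into out[] most significant byte first, whatever the host's
   byte order is. */
void store_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)(v);
}

/* Bit y of a byte array, counting from the MSB of byte 0. */
int test_bit_be(const unsigned char *bytes, unsigned y)
{
    return (bytes[y >> 3] >> (7 - (y & 7))) & 1;
}
```

With this layout, bit y of the array is always bit (31 - y) of the original value, on both big endian and little endian machines.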
Warning 2: Strict Aliasing
The macro makes use of a cast (const char*) &(x) which is designed to change the type, a.k.a. alias, of (x) so that it is easier to get to the correct bits.
This is dangerous and the reason why is explained beautifully in this SO answer. The short version is that if you compile this code with optimisations strange things can happen.
The wikipedia pages on Aliasing and Pointer Aliasing are also useful and should be read.
Warning 3: Shift Arithmetic Right
In addition to this there could be a potential issue with the way this code uses the right shift operator >>. This operator has two different behaviors depending on whether the variable it is operating on is signed or unsigned. So long as you never use negative numbers you will be safe but this code will not protect you against that mistake. I suspect though, that you're less likely to make such a mistake anyway so it should be ok to use it.
Also worth mentioning, you are using signed char and are shifting it right. Though this works I would prefer unsigned char which would improve portability because it will not risk generating an arithmetic shift right when char and int are the same width (which is almost never the case in practice, granted). This works because char is promoted to int for the shift, see this SO answer for an explanation.
What you see is a macro that does the following, in order:
Shifts y right by 3 bits (that is, divides it by 8).
Takes the address of x, treats it as an array of chars, and picks the char at that index.
Shifts the mask 0x80 right by (y & 0x07) bits. This happens before the AND, because >> binds tighter than &.
ANDs the selected char with the shifted mask.
Shifts the previous result right by 7 - (y & 0x07) bits, leaving 0 or 1.
Does this help you? I don't think so!
This macro is clearly not well written, and it is rather tricky.
Bit masks, binary operations, binary shifts...
If you can explain more precisely what you want to understand about it, maybe I can be more helpful.
I have this line of code:
base_num = (arr[j]/base)%256;
This line runs in a loop, and the operations "/" and "%" take a lot of resources and time to perform. I would like to change this line and use bit operations in order to maximize the program's performance. How can I do that?
Thanks.
If base is the nth power of two, you can replace division by it with a bitshift of n to the right. Then, since taking the mod 256 of an unsigned integer is equivalent to taking its last 8 bits, you can AND it with 0xFF. Alternatively, you can reverse the operations if you AND it with 255*base (that is, 0xFF shifted left by n) and then bitshift n to the right.
base_num = arr[j] >> n;
base_num &= 0xFF;
Of course, any half-decent compiler should be able to do this for you.
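Both orderings can be sketched as small functions. This assumes an unsigned 32-bit value and base == 1 << n with n < 32; the function names are hypothetical.

```c
#include <stdint.h>

/* (value / base) % 256 for base == 1 << n: shift first, then mask. */
uint32_t digit_shift_then_mask(uint32_t value, unsigned n)
{
    return (value >> n) & 0xFFu;
}

/* Same result with the operations reversed: mask with 0xFF << n first,
   then shift the selected bits down. */
uint32_t digit_mask_then_shift(uint32_t value, unsigned n)
{
    return (value & ((uint32_t)0xFFu << n)) >> n;
}
```

For value 0x12345678 and n = 8 (base 256), both return 0x56, the same as (0x12345678 / 256) % 256.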
Add -O1 or greater to your compiler options and the compiler will do it for you.
In gcc, -O1 turns on -ftree-slsr which is, according to the docs,
Perform straight-line strength reduction on trees. This recognizes related expressions involving multiplications and replaces them by less expensive calculations when possible.
This will replace the modulo, and the division by base if base is a constant. However, if you know that the base will be some non-constant power of two, you can refactor the surrounding code to give you the log2 of that number, and >> by that amount.
You could also just declare base_num as an 8 bit integer:
#include <stdint.h>
uint8_t base_num;
uint16_t crap;
crap = 0xFF00;
base_num = crap;
If your compiler is standards compliant, it will put the low byte of 0xFF00 (0x00) into base_num.
I have yet to meet a compiler that does saturated arithmetic in plain C (or in C++ or C#), but if yours does, it will saturate 0xFF00, which is greater than 0xFF, and put 0xFF into base_num.
Keep in mind your compiler will warn you of a loss of precision in this instance. Your compiler may error out in this case (Visual Studio does with Treat Warnings as Errors On). If that happens, you can just do:
base_num = (uint8_t)crap;
but this seems like what you are trying to avoid.
It seems that what you are trying to do is remove the modulus operator, since it requires a division and division is the most costly basic arithmetic operation. I generally would not think of this as a bottleneck, as any "intelligent" compiler (even in debug mode) would "optimize" it to:
base_num = crap & 0xFF;
on a supported platform, which should be any (every mainstream processor I've heard of: x86, AMD64, ARM, MIPS). I would be dumbfounded to hear of a processor that has no basic AND and OR instructions.
I'm having some trouble figuring out the NEON equivalent of a couple of Intel SSE operations. It seems that NEON is not capable of handling an entire Q register at once (128-bit value data type). I haven't found anything in the arm_neon.h header or in the NEON intrinsics reference.
What I want to do is the following:
// Intel SSE
// shift the entire 128 bit value with 2 bytes to the right; this is done
// without sign extension by shifting in zeros
__m128i val = _mm_srli_si128(vector_of_8_s16, 2);
// insert the least significant 16 bits of "some_16_bit_val"
// the whole thing in this case, into the selected 16 bit
// integer of vector "val"(the 16 bit element with index 7 in this case)
val = _mm_insert_epi16(val, some_16_bit_val, 7);
I've looked at the shifting operations provided by NEON but could not find an equivalent way of doing the above (I don't have much experience with NEON). Is it possible to do the above? (I guess it is; I just don't know how.)
Any pointers greatly appreciated.
You want the VEXT instruction. Your example would look something like:
int16x8_t val = vextq_s16(vector_of_8_s16, another_vector_s16, 1);
After this, bits 0-111 of val will contain bits 16-127 of vector_of_8_s16, and bits 112-127 of val will contain bits 0-15 of another_vector_s16.
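To make the lane movement concrete, here is a scalar model of what vextq_s16 does, written in portable C so it can be run without NEON hardware. It is only an illustration of the semantics described above; ext_s16x8 is a hypothetical name, not an intrinsic.

```c
#include <stdint.h>

/* Scalar model of vextq_s16(a, b, n): lanes n..7 of a, followed by
   lanes 0..n-1 of b. Valid for 0 <= n <= 7. */
void ext_s16x8(const int16_t a[8], const int16_t b[8], int n, int16_t out[8])
{
    for (int i = 0; i < 8; ++i) {
        int src = i + n;
        out[i] = (src < 8) ? a[src] : b[src - 8];
    }
}
```

With n = 1, out[0..6] holds a[1..7] and out[7] holds b[0]. So to reproduce the _mm_insert_epi16 step from the question, lane 0 of the second vector would need to hold some_16_bit_val, for example by building that vector with vdupq_n_s16(some_16_bit_val).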
I'm talking about this:
If we have the letter 'M', which is 77 in decimal and 4D in hex.
I am looking for the fastest way to get D.
I thought about two ways:
Given x is a byte.
x <<= 4; x >>= 4
x %= 16
Any other ways? Which one is faster?
Brevity is nice - explanations are better :)
x &= 0x0f
is, of course, the right answer. It exactly expresses the intent of what you're trying to achieve, and on any sane architecture will always compile down to the minimum number of instructions (i.e. 1). Do use hex rather than decimal whenever you put constants in a bit-wise operator.
x <<= 4; x >>= 4
will only work if your 'byte' is a proper unsigned type. If it was actually a signed char then the second operation might cause sign extension (i.e. your original bit 3 would then appear in bits 4-7 too).
Without optimization this will of course take 2 instructions, but with GCC on OSX, even -O1 will reduce this to the first answer.
x %= 16
Even without the optimizer enabled, your compiler will almost certainly do the right thing here and turn that expensive div/mod operation into the first answer. However, it can only do that for powers of two, and this idiom doesn't make it quite so obvious what you're trying to achieve.
I always use x &= 0x0f
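The three variants discussed above can be compared side by side. This is a small sketch with hypothetical function names; it uses uint8_t so that the shift variant cannot sign-extend.

```c
#include <stdint.h>

/* Mask off everything except the low four bits. */
uint8_t low_nibble_and(uint8_t x)
{
    return (uint8_t)(x & 0x0Fu);
}

/* Shift the high nibble out of an 8-bit value, then shift back down.
   The inner cast truncates to 8 bits after integer promotion. */
uint8_t low_nibble_shift(uint8_t x)
{
    return (uint8_t)((uint8_t)(x << 4) >> 4);
}

/* Remainder modulo 16; equivalent to the mask for unsigned values. */
uint8_t low_nibble_mod(uint8_t x)
{
    return (uint8_t)(x % 16u);
}
```

For x = 0x4D all three return 0x0D, and they agree for every 8-bit input.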
There are many good answers and some of them are technically the right ones.
On a broader scale, one should understand that C/C++ is not an assembler. The programmer's job is to tell the compiler what you intend to achieve. The compiler will pick the best way to do it depending on the architecture and various optimization flags.
x &= 0x0F; is the most clear way to tell the compiler what you want to achieve. If shifting up and down is faster on some architecture, it is the compiler's job to know it and do the right thing.
A single AND operation can do it.
x = (x & 0x0F);
It will depend on the architecture to some extent - shifting up and back down on an ARM is probably the fastest way - however, the compiler should do that for you. In fact, all of the suggested methods will probably be optimized to the same code by the compiler.
x = x & 15