Multiple array operations - c++

For my project I need to read a vector from a file, compare it with two vectors A and B, and find out which of A and B is closer to the vector read from the file.
I already did the C++ part (reading the values of X from the file, etc.).
For example, for X(1,3,5) and A(2,4,6), the distance from A to X is |2-1|+|4-3|+|6-5| = 3. Then I need to do the same operation for B and find which value is smaller (which means that vector is closer to X).
Basically, for 3-element arrays I need to find the differences between the 1st, 2nd and 3rd elements of X and A, take their absolute values and sum them; then I need to do the same for B and compare the two sums.
But I'm really stuck on the Assembly part.
So far I know that to find the distance I can use the code below to get an absolute value, but before using it I need to compute the difference between two elements.
Here is the code piece for finding the absolute value; I don't know if it helps:
mov ebx, eax ; move eax to ebx
neg eax ; eax = -eax
cmovl eax, ebx ; if -eax is negative (eax was positive), move ebx back to eax
But my main problem is: how can I take the first elements of both X and A and compute their difference in Assembly? (I need to do this for the 2nd and 3rd elements of both arrays as well.) Then I need to do the same operations for X and B, but if you show me how to do it for A, I'm sure I can apply the same algorithm to B.
My C++ prototype of the Assembly function is:
distance(int n, int * Xptr, int * Aptr, int * Bptr);
and I defined A and B as arrays with 3 elements.

You access an array using indirect addressing.
Like so:
;ecx = number of items in each array (32-bit ints)
push ebx
push esi
push edi
xor ebx,ebx            ;sum = 0
mov esi,Array1         ;esi = address of Array1
mov edi,Array2         ;edi = address of Array2
lea esi,[esi+ecx*4]    ;esi = end of Array1 (4 bytes per element)
lea edi,[edi+ecx*4]    ;edi = end of Array2
neg ecx                ;ecx = -count, so [esi+ecx*4] is element 0
jz done                ;count is zero, nothing to do
next_elem:             ;for (i=0;i<count;i++)
mov eax,[esi+ecx*4]    ;eax = Array1[i]
mov edx,[edi+ecx*4]    ;edx = Array2[i]
sub eax,edx            ;eax = Array1[i] - Array2[i]
cdq                    ;edx = (eax < 0) ? -1 : 0
add eax,edx
xor eax,edx            ;eax = abs(eax)
add ebx,eax            ;sum += abs(difference)
inc ecx                ;i++
jnz next_elem
done:
mov eax,ebx            ;return the sum in eax
pop edi
pop esi
pop ebx
ret
Let me walk you through the code.
We start by setting the sum to zero and setting pointers to the arrays.
Then we advance the pointers to the end of the arrays and negate the count.
This seemingly complicated setup is a speed hack: it lets you count from -count up to zero without having to keep an extra variable around to track the array index, and the final inc / jnz pair both advances the index and tests for the end of the loop.
Then we use a trick to compute abs without jumps or conditional moves.
You call this routine twice: once to get the sum of abs(A[i]-X[i]) and again to get the sum of abs(B[i]-X[i]).
For the abs trick, see: https://www.strchr.com/optimized_abs_function
You'll have to make some changes to adapt it to your calling convention. I leave this as an exercise for the reader. You might also adjust the code to do all of the comparisons in one go, which I also leave up to the reader.
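To check the logic before touching the assembly, here is a C++ sketch that mirrors the routine: pointers to the end of each array, an index running from -count up to zero, and the cdq / add / xor abs trick written as a sign mask. The names sum_abs_diff and closer_to_x are hypothetical, and letting A win a tie is an assumption the question does not specify. It also assumes `>>` on a negative int is an arithmetic shift (guaranteed since C++20 and what mainstream compilers do anyway).

```cpp
#include <cstdint>

// Sum of |a[i] - b[i]| for i in [0, count), structured like the asm routine.
// Assumes each difference a[i] - b[i] fits in int32_t.
std::int32_t sum_abs_diff(const std::int32_t* a, const std::int32_t* b, std::int32_t count) {
    const std::int32_t* a_end = a + count;        // like moving esi to the end of Array1
    const std::int32_t* b_end = b + count;        // like moving edi to the end of Array2
    std::int32_t sum = 0;                         // xor ebx,ebx
    for (std::int32_t i = -count; i != 0; ++i) {  // neg ecx ... inc ecx / jnz
        std::int32_t diff = a_end[i] - b_end[i];  // sub eax,edx
        std::int32_t mask = diff >> 31;           // cdq: 0 if diff >= 0, -1 if negative
        sum += (diff + mask) ^ mask;              // add eax,edx / xor eax,edx == abs(diff)
    }
    return sum;
}

// Hypothetical helper: calls the routine twice, once for A and once for B.
// Returns 'A' if A is at least as close to X as B, otherwise 'B'.
char closer_to_x(const std::int32_t* x, const std::int32_t* a,
                 const std::int32_t* b, std::int32_t n) {
    return sum_abs_diff(a, x, n) <= sum_abs_diff(b, x, n) ? 'A' : 'B';
}
```

With the question's example X(1,3,5) and A(2,4,6), sum_abs_diff returns 3.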
Just for fun, let's pick apart the abs sample:
Alt A           cycles  bytes   |  Alt B          cycles  bytes
mov ebx, eax    0       2       |  cdq            1       1
neg eax         1       2       |  add eax,edx    1       2
cmovl eax, ebx  2       3       |  xor eax,edx    1       2
As you can see there is very little difference between the two samples. I just prefer the cdq variant because it's more elegant.

Related

Assembly: loop through a sequence of characters and swap them

My assignment is to implement a function in assembly that does the following:
Loop through a sequence of characters and swap them such that the end result is the original string in reverse (100 points).
Hint: collect the string from the user as a C-string, then pass it to the assembly function along with the number of characters entered by the user. To find out the number of characters, use the strlen() function.
I have written both the C++ and assembly programs and they work up to a point: for example, if I input 12345 the output is correctly shown as 54321, but if I enter more than 5 characters the output becomes incorrect: for example, if I input 123456 the output is 653241. I will greatly appreciate it if anyone can point out where my mistake is:
.code
_reverse PROC
push ebp
mov ebp,esp ;stack pointer to ebp
mov ebx,[ebp+8] ; address of first array element
mov ecx,[ebp+12] ; the number of elements in the array
mov eax,ebx
mov ebp,0 ;move 0 to base pointer
mov edx,0 ; set data register to 0
mov edi,0
Setup:
mov esi , ecx
shr ecx,1
add ecx,edx
dec esi
reverse:
cmp ebp , ecx
je allDone
mov edx, eax
add eax , edi
add edx , esi
Swap:
mov bl, [edx]
mov bh, [eax]
mov [edx],bh
mov [eax],bl
inc edi
dec esi
cmp edi, esi
je allDone
inc ebp
jmp reverse
allDone:
pop ebp ; pop ebp out of stack
ret ; return the value of eax
_reverse ENDP
END
And here is my C++ code:
#include <iostream>
#include <iomanip>   // setw
#include <cstring>   // strlen
using namespace std;
extern "C" char reverse(char*, int);
int main()
{
char str[64] = {};
int length;
cout << " Please Enter the text you want to reverse:";
cin >> setw(sizeof str) >> str;   // read at most 63 characters
length = static_cast<int>(strlen(str));
reverse(str, length);
cout << " the reversed of the input is: " << str << endl;
}
You didn't comment your code, so I don't know exactly what you're trying to do, but it looks like you are manually doing the array indexing with MOV / ADD instead of using an addressing mode like [eax + edi].
However, it looks like you're modifying your original value and then using it in a way that would make sense if it was unmodified.
mov edx, eax ; EAX holds a pointer to the start of array, read every iter
add eax , edi ; modify the start of the array!!!
add edx , esi
Swap:
inc edi
dec esi
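EAX grows by EDI every step, and EDI increases linearly, so EAX grows quadratically: its offset from the start of the string goes 0, 1, 3, 6, ..., the running sum 0+1+2+... (the discrete version of integral(x dx) = x^2/2).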
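To see the effect concretely, here is a small C++ replay of the question's index arithmetic. It is a model, not the original code: registers become int variables named after them, and a bounds check is added because for longer strings the real loop writes past the end of the buffer.

```cpp
#include <string>
#include <utility>

// Replays the index arithmetic of the question's _reverse loop on a std::string.
// "base" stands for EAX (as an offset from the start of the string); the asm
// adds EDI to it on every pass instead of keeping it fixed.
std::string replay_buggy_reverse(std::string s) {
    const int n = static_cast<int>(s.size());
    int base = 0;           // EAX, as an offset from the start of the string
    int edi = 0;            // front index
    int esi = n - 1;        // back index (mov esi,ecx / dec esi)
    const int ecx = n / 2;  // shr ecx,1
    for (int ebp = 0; ebp != ecx; ++ebp) {  // cmp ebp,ecx / je allDone ... inc ebp
        int back = base + esi;              // mov edx,eax / add edx,esi
        base += edi;                        // add eax,edi  <- clobbers the base pointer
        if (back >= n || base >= n) break;  // the real asm would go out of bounds here
        std::swap(s[back], s[base]);        // the Swap block
        ++edi;                              // inc edi
        --esi;                              // dec esi
        if (edi == esi) break;              // cmp edi,esi / je allDone
    }
    return s;
}
```

It reproduces both symptoms from the question: 12345 comes out as 54321, but 123456 comes out as 653241, because the third swap uses offsets 3 and 4 instead of 2 and 3.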
Single-stepping this in a debugger should have found this easily.
BTW, the normal way to do this is to walk one pointer up and one pointer down, and fall out of the loop when they cross. Then you don't need a separate counter, just cmp / ja. (Don't check with JNE or JE, because the pointers can cross each other without ever being equal.)
Overall you have the right idea: start at both ends of the string and swap elements until you get to the middle. The implementation is horrible, though.
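A C++ sketch of that two-pointer approach (the function name is hypothetical) shows the shape the assembly loop can take: no counter, and an ordered comparison of the pointers as the only exit test.

```cpp
#include <cstddef>

// Two-pointer in-place reverse: lo walks up, hi walks down, and the loop
// ends as soon as they meet or cross. In asm this is an unsigned compare
// of the two pointers (cmp / ja or jb), never je / jne.
void reverse_two_pointers(char* s, std::size_t n) {
    if (n < 2) return;  // nothing to swap; also avoids forming s - 1 when n == 0
    char* lo = s;
    char* hi = s + n - 1;
    while (lo < hi) {
        char tmp = *lo;
        *lo = *hi;
        *hi = tmp;
        ++lo;
        --hi;
    }
}
```

With an odd length the pointers meet on the middle character; with an even length they cross without ever being equal, which is exactly why an equality test is not enough.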
mov ebp,0 ;move 0 to base pointer
This seems to be the loop counter (the comment is useless, or even worse); I guess the idea was to swap length/2 elements, which is perfectly fine. HINT: I'd just compare pointers/indexes and exit once they collide.
mov edx,0 ; set data register to 0
...
add ecx,edx
mov edx, eax
Useless and misleading.
mov edi,0
mov esi , ecx
dec esi
Looks like indexes to the start/end of the string. OK. HINT: I'd go with pointers to the start/end of the string, but indexes work too.
cmp ebp , ecx
je allDone
Exit after length/2 iterations. OK.
mov edx, eax
add eax , edi
add edx , esi
eax and edx point to the current symbols to be swapped. Almost OK, but this clobbers eax! Each loop iteration after the second will use wrong pointers! This is what caused your problem in the first place. This wouldn't have happened if you had used pointers instead of indexes, or offset addressing [eax+edi]/[eax+esi].
...
Swap part is OK
cmp edi, esi
je allDone
Second exit condition, this time comparing for index collision! Generally, one exit condition should be enough; several exit conditions are usually either superfluous or a hint at some flaw in the algorithm. Also, an equality comparison is not enough: the indexes can go from edi<esi to edi>esi during a single iteration.

Which, in x86, is more efficient? Using a variable or lea of the variable? [closed]

I have a section of x86 code inside some C++ code:
void encrypt_chars (int lengthW, char EKey)
{
__asm { //
xor esi, esi //zeroise esi
mov edi, lengthW //store the max loop counter in a register
for:
movzx ecx, OChars[esi] //store the character to encrypt
lea eax, EKey //by ref
movzx ebx, byte ptr[eax] //store the EKey value in EBX as a keep safe for when the original is changed later
sub ecx, 0x0A //change the current characters hex value by -10 (denary)
and byte ptr[eax], 0xAA //and EKey with 170(denary) to get an encryption value
not byte ptr[eax] //not the encryption value to obtain a different value
movzx edx, byte ptr[eax] //store the encryption value in EDX
or ebx, 0xAA //create a second encryption value
add bl, dl //add the values in the last 8 bits of EBX and EDX (the two encryption values), store them in the last 8 bits of EBX (ignores the 9th bit from carry)
xor ecx, ebx //encrypt the original letter with the encryption value
rol cl, 2 //further encryption: rotate the low 8 bits of ECX (CL) 2 bits to the left
mov EChars[esi], cl //move
inc esi //increment loop counter
cmp esi, edi //compare loop counter and the max number of loops
jl for //jump if esi is less than the loop counter
}
return;
}
My question is: which is more efficient, using lea to load the address into eax and then going through the pointer, or using the variable itself instead of all the byte ptr[eax]?
I know that lea is a very quick instruction, but I'm unsure as to whether referencing it in memory is more efficient than just using the variable.
Using a register (not necessarily eax) is better when you access the given variable multiple times and the variable is global, i.e. addressed by an absolute address.
In the code from the question, the variables are function arguments, which are addressed relative to ESP or EBP (depending on the compiler). So it is the same as using EAX.
So using the variables by name will save one instruction (lea eax, EKey) in the inner loop, and the code will be a little faster.
Notice how using inline assembly makes the code less readable and more obscure, because of the hidden code generated by the compiler. It is better to write everything in assembly language and then link the assembled object file to your C++ program.
It seems that most of this code is performing 8 bit operations, and if the key is 8 bits, why not just load it into al? You could also get rid of the offsets for a slight improvement in speed.
__asm {
lea esi, OChars
mov edi, lengthW
add edi, esi
mov al, EKey
for:
mov cl, [esi]
mov bl, al
sub cl, 0x0a
...
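Following up on the point that the loop is really doing 8-bit work, here is a C++ model of the question's __asm loop using uint8_t arithmetic. It assumes OChars and EChars are char arrays, passed here as the hypothetical in and out parameters. One detail worth keeping in any rewrite: the asm modifies EKey in memory, so the key carries over from one character to the next.

```cpp
#include <cstdint>

// Rotate an 8-bit value left by r bits (0 < r < 8), like "rol cl, r".
static std::uint8_t rol8(std::uint8_t v, int r) {
    return static_cast<std::uint8_t>((v << r) | (v >> (8 - r)));
}

// Model of the question's loop. Only the low byte of each register reaches
// EChars, so 8-bit arithmetic produces the same bytes.
void encrypt_chars_model(const char* in, char* out, int length, std::uint8_t key) {
    for (int i = 0; i < length; ++i) {
        std::uint8_t c = static_cast<std::uint8_t>(in[i]);           // movzx ecx, OChars[esi]
        std::uint8_t old_key = key;                                   // movzx ebx, byte ptr[eax]
        c = static_cast<std::uint8_t>(c - 0x0A);                      // sub ecx, 0x0A
        key = static_cast<std::uint8_t>(~(key & 0xAA));               // and / not byte ptr[eax]
        std::uint8_t b = static_cast<std::uint8_t>(old_key | 0xAA);  // or ebx, 0xAA
        b = static_cast<std::uint8_t>(b + key);                       // add bl, dl (wraps at 8 bits)
        c = static_cast<std::uint8_t>(c ^ b);                         // xor ecx, ebx
        out[i] = static_cast<char>(rol8(c, 2));                       // rol cl, 2 / mov EChars[esi], cl
    }
}
```

For example, with a starting key of 0x00 the input "AAA" encrypts to the bytes 0x7A 0x8D 0x27, because the key changes to 0xFF and then 0x55 between characters.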

Is shifting or multiplying faster and why?

What generally is a faster solution, multiplying or bit shifting?
If I want to multiply by 10000, which code would be faster?
v = (v<<13) + (v<<11) + (v<<4) - (v<<8);
or
v = 10000*v;
And the second part of the question: how do I find the lowest number of shifts required to do some multiplication? (I'm interested in multiplying by 10000, 1000 and 100.)
It really depends on the architecture of the processor, as well as the compiler that you're using.
But you can simply view the disassembly of each option and see for yourself.
Here is what I got using the Visual Studio 2010 compiler for Pentium:
int v2 = (v<<13) + (v<<11) + (v<<4) - (v<<8);
mov eax,dword ptr [v]
shl eax,0Dh
mov ecx,dword ptr [v]
shl ecx,0Bh
add eax,ecx
mov edx,dword ptr [v]
shl edx,4
add eax,edx
mov ecx,dword ptr [v]
shl ecx,8
sub eax,ecx
mov dword ptr [v2],eax
int v2 = 10000*v;
mov eax,dword ptr [v]
imul eax,eax,2710h
mov dword ptr [v2],eax
So it appears that the second option is faster in my case.
BTW, you might get a different result if you enable optimization (mine was disabled)...
To the first question: Don't bother. The compiler knows better and will optimize it according to the respective target hardware.
To the second question: Look at the binary representation:
For example: bin(10000) = 0b10011100010000:
bit:   13 12 11 10  9  8  7  6  5  4  3  2  1  0
value:  1  0  0  1  1  1  0  0  0  1  0  0  0  0
So you have to shift by 13, 10, 9, 8 and 4. If you want to shortcut consecutive ones (by subtracting as in your question) you need at least three consecutive ones in order to gain anything.
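The "shortcut consecutive ones" idea is what the non-adjacent form (NAF) does systematically. NAF is a standard technique swapped in here, not something this answer names: it writes n with digits in {-1, 0, +1} so that no two adjacent digits are nonzero, and it has the fewest nonzero digits of any such signed-binary form, i.e. the fewest shifted copies of v to add or subtract. Instruction sequences that reuse intermediate results (like the lea chain in another answer) can still do better. The names naf_terms, apply_terms and ShiftTerm are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// One term of a shift decomposition: sign * (v << shift).
struct ShiftTerm {
    int shift;
    int sign;  // +1 or -1
};

// Non-adjacent form of n, lowest bit first.
std::vector<ShiftTerm> naf_terms(std::uint32_t n) {
    std::vector<ShiftTerm> terms;
    std::uint64_t m = n;  // 64 bits so m + 1 cannot overflow
    for (int shift = 0; m != 0; ++shift, m >>= 1) {
        if (m & 1) {
            if ((m & 3) == 1) {  // low bits ...01: emit +1
                terms.push_back({shift, +1});
                m -= 1;
            } else {             // low bits ...11: emit -1 and carry into the run of ones
                terms.push_back({shift, -1});
                m += 1;
            }
        }
    }
    return terms;
}

// Evaluates the sum of sign * v * 2^shift, to check a decomposition against v * n.
std::int64_t apply_terms(const std::vector<ShiftTerm>& terms, std::int64_t v) {
    std::int64_t sum = 0;
    for (const ShiftTerm& t : terms) {
        sum += t.sign * v * (std::int64_t{1} << t.shift);
    }
    return sum;
}
```

For 10000 this yields (v<<13) + (v<<11) - (v<<8) + (v<<4), which is exactly the expression in the question; 1000 and 100 each need three terms (1024 - 32 + 8 and 128 - 32 + 4).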
But again, let the compiler do this. It's its job.
There is only one situation in which shift operations are faster than *, and it is defined by two conditions:
the operand is a power of two
you multiply by a fractional number, i.e. you divide.
Let's look a little deeper:
Multiplication, division and shift operations are done by dedicated units in the hardware: usually there are shifters and multipliers/dividers, and each operation is performed by a different circuit inside the Arithmetic Logic Unit.
Multiplication/division by a power of two is equivalent to a left shift/right shift operation.
If you are not dealing with powers of 2, then multiplication and division are performed differently:
Multiplication is performed by the hardware (the ALU) in a single instruction (depending on the data type, but let's not overcomplicate things).
Division is also a single instruction, but the hardware performs it iteratively, similar to consecutive subtractions, so it takes many more cycles.
Summarizing:
Multiplication is only one instruction, while replacing multiplication with a series of shift operations takes multiple instructions, so the first option is faster (even on a parallel architecture).
Multiplication by a power of two is the same as a shift operation; the compiler usually generates a shift when it detects this in the code.
Division is slow; replacing it with a series of shifts might prove faster, but it depends on each situation.
Division by a power of two can be replaced with a single right shift for unsigned values (signed values need a small correction for negative numbers); a smart compiler will do this automatically.
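The "small correction" for signed values comes from rounding: C++ integer division truncates toward zero, while an arithmetic right shift rounds toward negative infinity. A common fix, sketched here with a hypothetical div_pow2, adds a bias of 2^k - 1 to negative inputs before shifting. It assumes `>>` on a negative int is an arithmetic shift (guaranteed since C++20 and what mainstream compilers do for earlier standards).

```cpp
#include <cstdint>

// Signed division by 2^k using shifts, matching C++'s truncate-toward-zero
// semantics. Valid for 0 <= k <= 30.
std::int32_t div_pow2(std::int32_t x, int k) {
    std::int32_t sign = x >> 31;                               // 0 if x >= 0, -1 if x < 0
    std::int32_t bias = sign & ((std::int32_t{1} << k) - 1);   // 2^k - 1 for negative x, else 0
    return (x + bias) >> k;
}
```

For example, -7 >> 1 gives -4, whereas -7 / 2 gives -3; div_pow2(-7, 1) adds the bias of 1 first and also gives -3.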
An older Microsoft C compiler optimized the shift sequence using lea (load effective address), which allows multiples of 5:
lea eax, DWORD PTR [eax+eax*4] ;eax = v*5
lea ecx, DWORD PTR [eax+eax*4] ;ecx = v*25
lea edx, DWORD PTR [ecx+ecx*4] ;edx = v*125
lea eax, DWORD PTR [edx+edx*4] ;eax = v*625
shl eax, 4 ;eax = v*10000
Multiply (signed or unsigned) was still faster on my system with an Intel 2600K at 3.4 GHz. Visual Studio 2005 and 2012 multiplied by 10256 and then subtracted (v<<8). The shift and add / subtract sequence was slower than the lea method above:
shl eax,4 ;eax = v*16
mov ecx,eax ;ecx = v*16
shl eax,4 ;eax = v*256
sub ecx,eax ;ecx = v*(16-256)
shl eax,3 ;eax = v*2048
add ecx,eax ;ecx = v*(16-256+2048)
shl eax,2 ;eax = v*8192
add eax,ecx ;eax = v*(16-256+2048+8192) = v*10000
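Both instruction sequences in this answer can be checked with a small C++ model. It uses uint32_t so that arithmetic wraps modulo 2^32 like the 32-bit registers; the function names are hypothetical, and this checks correctness only, not speed.

```cpp
#include <cstdint>

// lea chain: each "lea r,[r+r*4]" multiplies by 5; 5^4 * 16 = 10000.
std::uint32_t mul10000_lea(std::uint32_t v) {
    std::uint32_t eax = v + v * 4;      // v*5
    std::uint32_t ecx = eax + eax * 4;  // v*25
    std::uint32_t edx = ecx + ecx * 4;  // v*125
    eax = edx + edx * 4;                // v*625
    return eax << 4;                    // v*10000
}

// Shift / add / subtract sequence: 16 - 256 + 2048 + 8192 = 10000.
std::uint32_t mul10000_shifts(std::uint32_t v) {
    std::uint32_t eax = v << 4;  // v*16
    std::uint32_t ecx = eax;     // v*16
    eax <<= 4;                   // v*256
    ecx -= eax;                  // v*(16-256), wraps like the register
    eax <<= 3;                   // v*2048
    ecx += eax;                  // v*(16-256+2048)
    eax <<= 2;                   // v*8192
    return eax + ecx;            // v*10000 (mod 2^32)
}
```

Because the arithmetic is modulo 2^32, both functions also agree with a signed 32-bit multiply on two's complement hardware, including for negative v.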

How to increment an array in x86 assembly?

How would you increment an array using x86 assembly within a for loop, if the loop (written in C++) looked like:
for (int i = 0; i < limit; i++)
A value from an array is put in a register, then the altered value is placed in a separate array. How would I increment each array in x86 assembly (I know C++ is simpler, but this is practice work), so that each time the loop iterates, the index used for reading and for writing the arrays is one higher than the previous time? The details of what happens in the loop aside from the array manipulation are unimportant, as I would like to know how this can be done in general, not for a specific situation.
The loop you write here would be:
xor eax, eax ; i = 0 (clear loop variable)
mov ebx, limit
top:
cmp eax, ebx
jge done ; exit when i >= limit (signed compare, like i < limit)
; loop body goes here
inc eax ; i++
jmp top
done:
...
I really don't understand what you mean by "increment an array".
If you mean that you want to load some value from one array, manipulate the value and store the result in a target array, then you should consider this:
Load the pointer for the source array in esi and the target pointer in edi.
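mov esi, offset array1 ; esi = source pointer
mov edi, offset array2 ; edi = destination pointer
mov ebx, counter ; ebx = number of elements (must be > 0)
next_elem:
mov eax, [esi] ; load the current source element
; do what you need with eax
mov [edi], eax ; store the result
add esi, 4 ; advance to the next 32-bit element (use inc for byte arrays)
add edi, 4
dec ebx
jnz next_elem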
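The same pattern in C++ makes the pointer stepping explicit. In C++, ++src advances by sizeof(int) automatically, while in assembly you add the element size yourself. The function name is hypothetical, and op stands in for the "do what you need" step; unlike the asm, the while loop also handles a count of zero.

```cpp
#include <cstddef>

// One pointer reads the source, one writes the destination, and a counter
// runs down to zero, mirroring the esi / edi / ebx loop.
void transform_arrays(const int* src, int* dst, std::size_t count, int (*op)(int)) {
    while (count != 0) {   // dec ebx / jnz next_elem
        *dst = op(*src);   // mov eax,[esi] ... mov [edi],eax
        ++src;             // add esi,4
        ++dst;             // add edi,4
        --count;
    }
}
```

For example, passing a lambda such as `[](int x) { return x + 1; }` as op copies each element plus one into the destination array.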

Effective for loop in assembly

I'm currently trying to get used to assembler, so I wrote a for loop in C++ and then looked at it in the disassembly. I was wondering if anyone could explain to me what each step does and/or how to improve the loop manually.
for (int i = 0; i < length; i++){
013A17AE mov dword ptr [i],0
013A17B5 jmp encrypt_chars+30h (13A17C0h)
013A17B7 mov eax,dword ptr [i]
013A17BA add eax,1
013A17BD mov dword ptr [i],eax
013A17C0 mov eax,dword ptr [i]
013A17C3 cmp eax,dword ptr [length]
013A17C6 jge encrypt_chars+6Bh (13A17FBh)
temp_char = OChars [i]; // get next char from original string
013A17C8 mov eax,dword ptr [i]
013A17CB mov cl,byte ptr OChars (13AB138h)[eax]
013A17D1 mov byte ptr [temp_char],cl
Thanks in advance.
First, I'd note that what you've posted seems to contain only part of the loop body. Second, it looks like you compiled with all optimization turned off -- when/if you turn on optimization, don't be surprised if the result looks rather different.
That said, let's look at the code line-by-line:
013A17AE mov dword ptr [i],0
This is basically just i=0.
013A17B5 jmp encrypt_chars+30h (13A17C0h)
This is going to the beginning of the loop. Although it's common to put the test at the top of a loop in most higher level languages, that's not always the case in assembly language.
013A17B7 mov eax,dword ptr [i]
013A17BA add eax,1
013A17BD mov dword ptr [i],eax
This is i++ in (extremely sub-optimal) assembly language. It's retrieving the current value of i, adding one to it, then storing the result back into i.
013A17C0 mov eax,dword ptr [i]
013A17C3 cmp eax,dword ptr [length]
013A17C6 jge encrypt_chars+6Bh (13A17FBh)
This is basically if (i >= length) /* skip forward to some code you haven't shown */. It's retrieving the value of i and comparing it to the value of length, then jumping somewhere if i is greater than or equal to length.
If you were writing this in assembly language by hand, you'd normally use something like xor eax, eax (or sub eax, eax) to zero a register. In most cases, you'd start from the maximum and count down to zero if possible (avoids a comparison in the loop). You certainly wouldn't store a value into a variable, then immediately retrieve it back out (in fairness, a compiler probably won't do that either, if you turn on optimization).
Applying that, and moving the "variables" into registers, we'd end up with something along these general lines:
mov ecx, length
loop_top:
; stuff that wasn't pasted goes here
dec ecx
jnz loop_top
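The counting-down shape can be sketched in C++ as well: the loop test becomes "decrement, then check for zero", which is what dec ecx / jnz expresses without a separate cmp. This hypothetical sum only works because the body does not care about visiting order; a loop whose iterations depend on each other (for example, a key that changes per character) must keep its original order.

```cpp
// Counting down to zero instead of up to length; assumes length >= 0.
// Elements are visited from last to first.
long long sum_count_down(const int* a, int length) {
    long long sum = 0;
    for (int n = length; n != 0; --n) {  // mov ecx,length ... dec ecx / jnz loop_top
        sum += a[n - 1];
    }
    return sum;
}
```

Note that the asm version above needs a guard for length == 0, since dec ecx would wrap to -1 and the loop would run about 4 billion times; the C++ test n != 0 at the top handles that case.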
I'll try to interpret this in plain English:
013A17AE mov dword ptr [i],0 ; Store 0 into i
013A17B5 jmp encrypt_chars+30h (13A17C0h) ; Jump to check
013A17B7 mov eax,dword ptr [i] ; Load i into the accumulator (register eax)
013A17BA add eax,1 ; Increment the accumulator
013A17BD mov dword ptr [i],eax ; and store it back into i, effectively
; adding 1 to i.
check:
013A17C0 mov eax,dword ptr [i] ; Move i into the accumulator
013A17C3 cmp eax,dword ptr [length] ; Compare it to the value in 'length',
; setting flags
013A17C6 jge encrypt_chars+6Bh (13A17FBh) ; Jump if it's greater or equal. This
; address is not in your code snippet
The compiler prefers EAX for arithmetic. Each register (at least historically; I don't know whether this is still true) has some type of operation that it is faster at doing.
Here's the part that should be more optimized:
(note: your compiler SHOULD do this, so either you have optimizations turned off, or something in the loop body is preventing this optimization)
mov eax,dword ptr [i] ; Go get "i" from memory, put it in register EAX
add eax,1 ; Add one to register EAX
mov dword ptr [i],eax ; Put register EAX back in memory "i". (now one bigger)
mov eax,dword ptr [i] ; Go get "i" from memory, put it in EAX again.
See how often you're moving values back and forth between memory and EAX?
You should be able to load "i" into EAX once at the beginning of the loop, run the full loop directly out of EAX, and put the finished value back into "i" after it's all done.
(unless something else in your code prevents this)
Anyway, this code comes from a Debug build. It is possible to optimize it, but the MS compiler produces very good code for such simple cases.
There is no point doing it manually; just rebuild in Release mode and read the listing to learn how to do it.