I'm relying on an old implementation that does some calculations and converts floats to int.
However, after replicating the calculations, some values are off due to different rounding results.
It boils down to the binary using the following code for converting a float to int (long).
lea ecx, [esp+var_8] ; Load Effective Address
sub esp, 10h ; Integer Subtraction
and ecx, 0FFFFFFF8h ; Logical AND
fld st ; Load Real
fistp qword ptr [ecx] ; Store Integer and Pop
fild qword ptr [ecx] ; Load Integer
mov edx, [ecx+4]
mov eax, [ecx]
test eax, eax ; Logical Compare
jz short loc_3 ; Jump if Zero (ZF=1)
loc_1:
fsubp st(1), st ; Subtract Real and Pop
test edx, edx ; Logical Compare
jz short loc_2 ; Jump if Zero (ZF=1)
fstp dword ptr [ecx] ; Store Real and Pop
mov ecx, [ecx]
add esp, 10h ; Add
xor ecx, 80000000h ; Logical Exclusive OR
add ecx, 7FFFFFFFh ; Add
adc eax, 0 ; Add with Carry
retn ; Return Near from Procedure
; ---------------------------------------------------------------------------
loc_2:
fstp dword ptr [ecx] ; Store Real and Pop
mov ecx, [ecx]
add esp, 10h ; Add
add ecx, 7FFFFFFFh ; Add
sbb eax, 0 ; Integer Subtraction with Borrow
retn ; Return Near from Procedure
; ---------------------------------------------------------------------------
loc_3:
test edx, 7FFFFFFFh ; Logical Compare
jnz short loc_1 ; Jump if Not Zero (ZF=0)
fstp dword ptr [ecx] ; Store Real and Pop
fstp dword ptr [ecx] ; Store Real and Pop
add esp, 10h ; Add
retn ; Return Near from Procedure
This behaves differently than simply doing unsigned int var = (unsigned int)floatVal;.
I believe this is an old ftol implementation, written this way because converting from float to int was very slow when the compiler had to change the FPU rounding mode.
It looks very similar to this one http://www.libsdl.org/release/SDL-1.2.15/src/stdlib/SDL_stdlib.c
Can anyone assist me in converting the function to C? Or tell me how I can create an inline ASM function that takes a float parameter and returns an int using Visual Studio? The one in SDL_stdlib.c has no header and I'm not sure how to call it without function args.
This doesn't exactly answer the questions as asked, but I wanted to start by trying a more thorough English translation. It's not perfect, and there are a few lines where I'm still trying to track down the intent. Everyone please speak up with questions and corrections.
lea ecx, [esp+var_8]; Load Effective Address // make ecx point to somewhere on the stack (I don't know where var_8 is being generated in this case, but I'm guessing it's set such that it makes ecx point to the local stack space allocated on the next line)
sub esp, 10h ; Integer Subtraction // make room on stack for 16 bytes of local variable -- doesn't all get used but adds padding to allow aligned loads and stores
and ecx, 0FFFFFFF8h ; Logical AND // align pointer in ecx to 8-byte boundary
fld st ; Load Real // duplicates whatever was last left (passed by calling convention) on the top of the FPU stack -- st(1) = st(0)
fistp qword ptr [ecx] ; Store Integer and Pop // convert st(0) to *64bit* int (rounded per the current FPU rounding mode, round-to-nearest by default, not truncated), store in aligned 8 bytes (of local variable space?) pointed to by ecx, and pop off the top value from the FPU stack
fild qword ptr [ecx] ; Load Integer // convert the rounded value back to float and leave it sitting on the top of the FPU stack
;// at this point:
;// - st(0) is the rounded float
;// - st(1) is still the original float.
;// - There is a 64bit integer representation pointed to by [ecx]
mov edx, [ecx+4] ; // move [bytes 4 thru 7 of integer output] to edx (most significant bytes)
mov eax, [ecx] ; // move [bytes 0 thru 3 of integer output] to eax (least significant bytes) -- makes sense, as EAX should hold integer return value in x86 calling conventions
test eax, eax ; Logical Compare // (http://stackoverflow.com/questions/13064809/the-point-of-test-eax-eax)
jz short loc_3 ; Jump if Zero (ZF=1) // if the least significant 4 bytes are zero, goto loc_3
; // else fall through to loc_1
loc_1: ;
fsubp st(1), st ; Subtract Real and Pop // subtract the rounded float from the original, store in st(1), then pop. (i.e. for 1.25 st(0) ends up 0.25; for 1.75, which rounds to 2, it ends up -0.25; the original float is no longer on the FPU stack)
test edx, edx ; Logical Compare // same trick as earlier, but for the most significant bytes now
jz short loc_2 ; Jump if Zero (ZF=1) // if the most significant 4 bytes from before were all zero, goto loc_2 -- (i.e. the rounded value is non-negative and fits in 32 bits)
fstp dword ptr [ecx] ; Store Real and Pop // else, dump the fractional portion of the original float over the least significant bytes of the 64bit integer
;// at this point:
;// - the FPU stack should be empty
;// - eax holds a copy of the least significant 4 bytes of the 64bit integer (return value)
;// - edx holds a copy of the most significant 4 bytes of the 64bit integer
;// - [ecx] holds a float equal to (original - rounded), i.e. the error introduced by rounding
;// - [ecx+4] points at the most significant 4 bytes of our 64bit integer output (probably considered garbage now and not used again)
mov ecx, [ecx] ; // make ecx store what it's pointing at directly instead of the pointer to it
add esp, 10h ; Add // clean up stack space from the beginning
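xor ecx, 80000000h ; Logical Exclusive OR // flip the sign bit of the difference (original - rounded)
add ecx, 7FFFFFFFh ; Add // sets the carry flag only if the difference was positive and nonzero, i.e. rounding moved a negative value away from zero
adc eax, 0 ; Add with Carry // add the carry flag to eax: bump the result by 1, back toward zero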
retn ; Return Near from Procedure
; ---------------------------------------------------------------------------
loc_2:
;// at this point: the FPU stack still holds the difference (original - rounded), which is negative if rounding moved the value up
fstp dword ptr [ecx] ; Store Real and Pop // store non-integer part as float in local stack space, and remove it from the FPU stack
mov ecx, [ecx] ; // make ecx store what it's pointing at directly instead of the pointer to it
add esp, 10h ; Add // clean up stack space from the beginning
add ecx, 7FFFFFFFh ; Add // sets the carry flag only if the difference is negative and nonzero, i.e. rounding moved a positive value up
sbb eax, 0 ; Integer Subtraction with Borrow // subtract the carry flag from eax: drop the result by 1, back toward zero
retn ; Return Near from Procedure
; ---------------------------------------------------------------------------
loc_3:
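test edx, 7FFFFFFFh ; Logical Compare // eax is zero here; check whether any bit other than the sign bit is set in the most significant bytes
jnz short loc_1 ; Jump if Not Zero (ZF=0) // if so, the integer is nonzero after all: go back to loc_1
fstp dword ptr [ecx] ; Store Real and Pop // else the integer is zero (or the 8000000000000000h "integer indefinite" value), so no correction is needed: pop the rounded value off the FPU stack
fstp dword ptr [ecx] ; Store Real and Pop // pop the original value too, leaving the FPU stack empty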
add esp, 10h ; Add // clean up stack space from the beginning
retn ; Return Near from Procedure
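Since the question asks for a C version, here is a rough C++ sketch of what the listing appears to compute: round with the current rounding mode, then step the result back toward zero if rounding moved it away. The name ftol_sketch is made up, double stands in for the 80-bit FPU value, and the sketch assumes the default round-to-nearest mode and an input whose magnitude is below 2^32. It follows the control flow of the assembly, not its exact bit tricks.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical name; a sketch of the listing's logic, not a drop-in replacement.
std::int64_t ftol_sketch(double x) {
    // Assumption: finite input with |x| < 2^32 (the range this sketch covers).
    assert(std::isfinite(x) && std::fabs(x) < 4294967296.0);

    double rounded = std::nearbyint(x);   // like fistp: uses the current rounding mode
    std::int64_t i = static_cast<std::int64_t>(rounded);
    double diff = x - rounded;            // like fsubp: original - rounded

    if (i == 0) {
        return 0;                         // loc_3: rounded to zero, nothing to correct
    }
    if (i > 0) {
        if (diff < 0) --i;                // loc_2: rounded up, so step back down (sbb eax, 0)
    } else {
        if (diff > 0) ++i;                // fall-through path: rounded down, step back up (adc eax, 0)
    }
    return i;
}
```

With the default rounding mode, ftol_sketch(1.75) gives 1 and ftol_sketch(-1.75) gives -1, which is plain truncation toward zero. Differences from a C cast would then have to come from inputs outside the range assumed here.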
I am writing an assembly function callable from C++ that will read the CPU Vendor ID. Here is the function signature:
extern "C" void GetVendorID(const char* id);
Here is how I am calling it:
char vendorID[13];
GetVendorID(vendorID);
vendorID[12] = '\0';
Here are the important parts of the assembly:
global GetVendorID
GetVendorID:
push ebp
mov ebp, esp
push eax
push ebx
push ecx
push edx
mov eax, 0
cpuid ; <- this instruction moves the vendor id into ebx, edx, ecx
mov eax, [ebp + 8] ; <- move the value of the char pointer parameter into eax
; I have verified that this instruction works by returning eax and comparing it to the
; address of the vendorID array
; start with ebx
mov byte [eax], bl ; <- move a character into the char array
inc eax ; <- increment the pointer
shl ebx, 8 ; <- shift ebx to get the next character in its least significant bits
mov byte [eax], bl ; <- repeat
inc eax
shl ebx, 8
mov byte [eax], bl
inc eax
shl ebx, 8
mov byte [eax], bl
inc eax
shl ebx, 8
; above is repeated for edx and ecx
pop edx
pop ecx
pop ebx
pop eax
mov esp, ebp
pop ebp
ret
The way the string is stored in the registers is weird. The first character is stored in the least significant byte of ebx, the next is stored in the second least significant byte, and so on. That is why I am doing the left shifts.
I have verified that ebx, edx, ecx do contain the correct values by returning them from the function and printing them out. They contain "GenuineIntel". However, the char array remains unchanged. It is full of zeroes after the function returns.
I am not really sure why this isn't working. Am I accessing the parameter incorrectly?
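To make the byte order described above concrete, here is a small C++ sketch that unpacks three 32-bit register values into a 12-character string, lowest byte first. The function names are hypothetical, and the register values are passed in as plain integers instead of coming from cpuid. One thing it suggests checking in the listing: with the first character in the least significant byte, exposing the next character in the low byte takes a right shift, not a left shift.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: copies the four bytes of reg into out, lowest byte first.
static void unpack_low_byte_first(std::uint32_t reg, char* out) {
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<char>(reg & 0xFF);  // same role as "mov byte [eax], bl"
        reg >>= 8;                               // shift RIGHT to bring the next byte down
    }
}

// Hypothetical wrapper: the vendor string is stored in the order ebx, edx, ecx.
void vendor_from_regs(std::uint32_t ebx, std::uint32_t edx, std::uint32_t ecx,
                      char out[13]) {
    assert(out != nullptr);
    unpack_low_byte_first(ebx, out);
    unpack_low_byte_first(edx, out + 4);
    unpack_low_byte_first(ecx, out + 8);
    out[12] = '\0';
}
```

Calling vendor_from_regs(0x756e6547, 0x49656e69, 0x6c65746e, id) fills id with "GenuineIntel".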
My assignment is to Implement a function in assembly that would do the following:
loop through a sequence of characters and swap them such that the end result is the original string in reverse ( 100 points )
Hint: collect the string from user as a C-string then pass it to the assembly function along with the number of characters entered by the user. To find out the number of characters use strlen() function.
I have written both the C++ and assembly programs and they work fine to an extent: for example, if I input 12345 the output is correctly shown as 54321, but if I go beyond 5 characters the output starts to be incorrect: for example, if I input 123456 the output is 653241. I will greatly appreciate anyone who can point out where my mistake is:
.code
_reverse PROC
push ebp
mov ebp,esp ;stack pointer to ebp
mov ebx,[ebp+8] ; address of first array element
mov ecx,[ebp+12] ; the number of elemets in array
mov eax,ebx
mov ebp,0 ;move 0 to base pointer
mov edx,0 ; set data register to 0
mov edi,0
Setup:
mov esi , ecx
shr ecx,1
add ecx,edx
dec esi
reverse:
cmp ebp , ecx
je allDone
mov edx, eax
add eax , edi
add edx , esi
Swap:
mov bl, [edx]
mov bh, [eax]
mov [edx],bh
mov [eax],bl
inc edi
dec esi
cmp edi, esi
je allDone
inc ebp
jmp reverse
allDone:
pop ebp ; pop ebp out of stack
ret ; return the value of eax
_reverse ENDP
END
and here is my c++ code:
#include<iostream>
#include <string>
using namespace std;
extern"C"
char reverse(char*, int);
int main()
{
char str[64] = {NULL};
int lenght;
cout << " Please Enter the text you want to reverse:";
cin >> str;
lenght = strlen(str);
reverse(str, lenght);
cout << " the reversed of the input is: " << str << endl;
}
You didn't comment your code, so IDK what exactly you're trying to do, but it looks like you are manually doing the array indexing with MOV / ADD instead of using an addressing mode like [eax + edi].
However, it looks like you're modifying your original value and then using it in a way that would make sense if it was unmodified.
mov edx, eax ; EAX holds a pointer to the start of array, read every iter
add eax , edi ; modify the start of the array!!!
add edx , esi
Swap:
inc edi
dec esi
EAX grows by EDI every step, and EDI increases linearly. So EAX increases quadratically (after n iterations it has advanced by 0 + 1 + ... + (n-1) = n(n-1)/2).
Single-stepping this in a debugger should have found this easily.
BTW, the normal way to do this is to walk one pointer up, one pointer down, and fall out of the loop when they cross. Then you don't need a separate counter, just cmp / ja. (Don't check for JNE or JE, because they can cross each other without ever being equal.)
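The two-pointer approach can be sketched in C++ like this; it is only an illustration of the idea (the name reverse_in_place is made up), and it translates almost line for line into assembly with one register per pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical name; reverses the first len characters of s in place.
void reverse_in_place(char* s, std::size_t len) {
    assert(s != nullptr || len == 0);
    if (len < 2) {
        return;                 // nothing to swap
    }
    char* lo = s;               // walks up from the first character
    char* hi = s + len - 1;     // walks down from the last character
    while (lo < hi) {           // stop once the pointers meet or cross
        char tmp = *lo;
        *lo = *hi;
        *hi = tmp;
        ++lo;
        --hi;
    }
}
```

The loop condition is an ordering comparison, so it terminates for both odd and even lengths: for "123456" the pointers cross without ever being equal, and the result is "654321".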
Overall you have the right idea: start at both ends of the string and swap elements until you get to the middle. The implementation is horrible though.
mov ebp,0 ;move 0 to base pointer
This seems to be loop counter (comment is useless or even worse); I guess idea was to swap length/2 elements which is perfectly fine. HINT I'd just compare pointers/indexes and exit once they collide.
mov edx,0 ; set data register to 0
...
add ecx,edx
mov edx, eax
Useless and misleading.
mov edi,0
mov esi , ecx
dec esi
Looks like indexes to start/end of the string. OK. HINT I'd go with pointers to start/end of the string; but indexes work too
cmp ebp , ecx
je allDone
Exit if did length/2 iterations. OK.
mov edx, eax
add eax , edi
add edx , esi
eax and edx point to the current symbols to be swapped. Almost OK, but this clobbers eax! Each loop iteration after the second will use wrong pointers! This is what caused your problem in the first place. This wouldn't have happened if you used pointers instead of indexes, or if you'd used offset addressing [eax+edi]/[eax+esi]
...
Swap part is OK
cmp edi, esi
je allDone
Second exit condition, this time comparing for index collision! Generally one exit condition should be enough; several exit conditions are usually either superfluous or hint at some flaw in the algorithm. Also, an equality comparison is not enough: the indexes can go from edi<esi to edi>esi during a single iteration.
I have the following code, that is supposed to XOR a block of memory:
void XorBlock(DWORD dwStartAddress, DWORD dwSize, DWORD dwsKey)
{
DWORD dwKey;
__asm
{
push eax
push ecx
mov ecx, dwStartAddress // Move Start Address to ECX
add ecx, dwSize // Add the size of the function to ECX
mov eax, dwStartAddress // Copy the Start Address to EAX
crypt_loop: // Start of the loop
xor byte ptr ds:[eax], dwKey // XOR The current byte with 0x4D
inc eax // Increment EAX with dwStartAddress++
cmp eax,ecx // Check if every byte is XORed
jl crypt_loop; // Else jump back to the start label
pop ecx // pop ECX from stack
pop eax // pop EAX from stack
}
}
However, the argument dwKey gives me an error. The code works perfectly if for example the dwKey is replaced by 0x5D.
I think you have two problems.
First, "xor" can't take two memory operands (ds:[eax] is a memory location and dwKey is a memory location); secondly, you've used "byte ptr" to indicate you want a byte, but you're trying to use a DWORD and assembly can't automatically convert those.
So, you'll probably have to load your value into an 8-bit register and then do it. For example:
void XorBlock(DWORD dwStartAddress, DWORD dwSize, DWORD dwsKey)
{
DWORD dwKey;
__asm
{
push eax
push ecx
mov ecx, dwStartAddress // Move Start Address to ECX
add ecx, dwSize // Add the size of the function to ECX
mov eax, dwStartAddress // Copy the Start Address to EAX
mov ebx, dwKey // <---- LOAD dwKey into EBX
crypt_loop : // Start of the loop
xor byte ptr ds : [eax], bl // XOR The current byte with the low byte of EBX
inc eax // Increment EAX with dwStartAddress++
cmp eax, ecx // Check if every byte is XORed
jl crypt_loop; // Else jump back to the start label
pop ecx // pop ECX from stack
pop eax // pop EAX from stack
}
}
Although, it also looks like dwKey is uninitialized in your code; maybe you should just "mov bl, 0x42". I'm also not sure you need to push and pop the registers; I can't remember what registers you are allowed to clobber with MSVC++ inline assembler.
But, in the end, I think Alan Stokes is correct in his comment: it is very unlikely assembly is actually faster than C/C++ code in this case. The compiler can easily generate this code on its own, and you might find the compiler actually does unexpected optimizations to make it run even faster than the "obvious" assembly does (for example, loop unrolling).
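For comparison, this is roughly what the plain C++ version looks like; the name xor_block and the pointer-plus-size signature are my own choices, not the asker's DWORD-based interface. Only the low byte of the key is used, matching what the bl version above does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical name; XORs every byte in [data, data + size) with key.
void xor_block(std::uint8_t* data, std::size_t size, std::uint8_t key) {
    assert(data != nullptr || size == 0);
    for (std::size_t i = 0; i < size; ++i) {
        data[i] ^= key;         // one byte at a time, like "xor byte ptr [eax], bl"
    }
}
```

Applying it twice with the same key restores the original bytes, which makes for an easy sanity check.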
I am currently learning assembly programming as part of one of my university modules. I have a program written in C++ with inline x86 assembly which takes a string of 6 characters and encrypts them based on the encryption key.
Here's the full program: https://gist.github.com/anonymous/1bb0c3be77566d9b791d
My code for the encrypt_chars function:
void encrypt_chars (int length, char EKey)
{ char temp_char; // char temporary store
for (int i = 0; i < length; i++) // encrypt characters one at a time
{
temp_char = OChars [i]; // temp_char now contains the address values of the individual character
__asm
{
push eax // Save values contained within register to stack
push ecx
movzx ecx, temp_char
push ecx // Push argument #2
lea eax, EKey
push eax // Push argument #1
call encrypt
add esp, 8 // Clean parameters of stack
mov temp_char, al // Move the temp character into a register
pop ecx
pop eax
}
EChars [i] = temp_char; // Store encrypted char in the encrypted chars array
}
return;
// Inputs: register EAX = 32-bit address of Ekey,
// ECX = the character to be encrypted (in the low 8-bit field, CL).
// Output: register EAX = the encrypted value of the source character (in the low 8-bit field, AL).
__asm
{
encrypt:
push ebp // Set stack
mov ebp, esp // Set up the base pointer
mov eax, [ebp + 8] // Move value of parameter 1 into EAX
mov ecx, [ebp + 12] // Move value of parameter 2 into ECX
push edi // Used for string and memory array copying
push ecx // Loop counter for pushing character onto stack
not byte ptr[eax] // Negation
add byte ptr[eax], 0x04 // Adds hex 4 to EKey
movzx edi, byte ptr[eax] // Moves value of EKey into EDI using zeroes
pop eax // Pop the character value from stack
xor eax, edi // XOR character to give encrypted value of source
pop edi // Pop original address of EDI from the stack
rol al, 1 // Rotates the encrypted value of source by 1 bit (left)
rol al, 1 // Rotates the encrypted value of source by 1 bit (left) again
add al, 0x04 // Adds hex 4 to encrypted value of source
mov esp, ebp // Deallocate values
pop ebp // Restore the base pointer
ret
}
//--- End of Assembly code
}
My questions are:
What is the best/ most efficient way to convert this for loop into assembly?
Is there a way to remove the call for encrypt and place the code directly in its place?
How can I optimise/minimise the use of registers and instructions to make the code smaller and potentially faster?
Is there a way for me to convert the OChars and EChars arrays into assembly?
If possible, would you be able to provide me with an explanation of how the solution works as I am eager to learn.
I can't help with optimization or the cryptography, but I can show you a way to go about making a loop. Look at the loop in this function:
void f()
{
int a, b ;
for(a = 10, b = 1; a != 0; --a)
{
b = b << 2 ;
}
}
The loop is essentially:
for(/*initialize*/; /*condition*/; /*modify*/)
{
// run code
}
So the function in assembly would look something along these lines:
_f:
push ebp
mov ebp, esp
sub esp, 8 ; int a,b
initialize: ; for
mov dword ptr [ebp-4], 10 ; a = 10,
mov dword ptr [ebp-8], 1 ; b = 1
mov eax, [ebp-4]
condition:
test eax, eax ; tests if a == 0
je exit
runCode:
mov eax, [ebp-8]
shl eax, 2 ; b = b << 2
mov dword ptr [ebp-8], eax
modify:
mov eax, [ebp-4]
sub eax, 1 ; --a
mov dword ptr [ebp-4], eax
jmp condition
exit:
mov esp, ebp
pop ebp
ret
The source also shows how to make local variables: subtract the space from the stack pointer, and access them through the base pointer.
I tried to keep the source as generic Intel x86 assembly syntax as I could, so my apologies if anything needs changing for your specific environment. I was aiming to give a general idea of how to construct a loop in assembly rather than something you can copy, paste and run.
I would suggest to look into assembly code which is generated by compiler. You can change and optimize it later.
How do you get assembler output from C/C++ source in gcc?
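One way to follow that suggestion is to first write the encrypt step in plain C++ and then inspect what the compiler generates for it. The sketch below is my reading of the encrypt routine in the question (update the key with NOT and add 4, XOR with the character, rotate left by 2, add 4); the names encrypt_char and rol8 are made up, and it has not been checked against the full program in the gist.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rotate an 8-bit value left by n bits (0 < n < 8).
static std::uint8_t rol8(std::uint8_t v, unsigned n) {
    return static_cast<std::uint8_t>((v << n) | (v >> (8 - n)));
}

// Hypothetical name; mirrors the asm step by step. Note that it modifies *ekey.
std::uint8_t encrypt_char(std::uint8_t* ekey, std::uint8_t c) {
    assert(ekey != nullptr);
    *ekey = static_cast<std::uint8_t>(~*ekey + 4);  // not byte ptr[eax]; add byte ptr[eax], 0x04
    std::uint8_t x = c ^ *ekey;                     // xor eax, edi (only AL matters)
    x = rol8(x, 2);                                 // rol al, 1 twice
    return static_cast<std::uint8_t>(x + 4);        // add al, 0x04
}
```

Because the function is small, a compiler will normally inline it into the calling loop on its own, which also covers the question about removing the call.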
I am learning assembly and I started experimenting with SSE and MMX registers within the Digital-Mars C++ compiler (Intel syntax is more easily readable). I have finished a program that takes var_1 as a value and converts it to the var_2 number system (this is in 8 bit for now; I will expand it to 32, 64 and 128 later). The program does this in two ways:
__asm inlining
Usual C++ way of %(modulo) operator.
Question: Can you tell me a more efficient way to use the xmm0-7 and mm0-7 registers, and can you tell me how to exchange exact bytes of them with the al, ah... 8-bit registers?
The usual % (modulo) operator in C++ is very slow in comparison with __asm on my computer (Pentium-M Centrino 2.0GHz).
If you can tell me how to get rid of the division instruction in __asm, it will be even faster.
When i run the program it gives me:
(for the values: var_1=17,var_2=2,all loops are 200M times)
17 is 10001 in number system 2
__asm(clock)...........: 7250 <------too bad. it is 8-bit calc.
C++(clock).............: 12250 <------not very slow(var_2 is a power of 2)
(for the values: var_1=33,var_2=7,all loops are 200M times)
33 is 45 in number system 7
__asm(clock)..........: 2875 <-------not good. it is 8-bit calc.
C++(clock)............: 6328 <----------------really slow(var_2 is not a power of 2)
The second C++ code(the one with % operator): /////////////////////////////////////////////////////////
t1=clock();//reference time
for(int i=0;i<200000000;i++)
{
y=x;
counter=0;
while(y>g)
{
var_3[counter]=y%g;
y/=g;
counter++;
}
var_3[counter]=y%g;
}
t2=clock();//final time
_asm code:////////////////////////////////////////////////////////////////////////////////////////////////////////////
__asm // i love assembly in some parts of C++
{
pushf //here does register backup
push eax
push ebx
push ecx
push edx
push edi
mov eax,0h //this will be outer loop counter init to zero
//init of medium-big registers to zero
movd xmm0,eax //cannot set to immediate constant: xmm0=outer loop counter
shufps xmm0,xmm0,0h //this makes all bits zero
movd xmm1,eax
movd xmm2,eax
shufps xmm1,xmm1,0h
shufps xmm2,xmm2,0h
movd xmm2,eax
shufps xmm3,xmm3,0h//could have made pxor xmm3,xmm3(single instruction)
//init complete(xmm0,xmm1,xmm2,xmm3 are zero)
movd xmm1,[var_1] //storing variable_1 to register
movd xmm2,[var_2] //storing var_2 to register
lea ebx,var_3 //calculate var_3 address
movd xmm3,ebx //storing var_3's address to register
for_loop:
mov eax,0h
//this line is index-init to zero(digit array index)
movd edx,xmm2
mov cl,dl //this is var_2 (the number system) stored in cl
movd edx,xmm1
mov al,dl //this is var_1 (the number itself) stored in al
mov edx,0h
dng:
mov ah,00h //preparation for a 8-bit division
div cl //divide
movd ebx,xmm3 //get var_3 address
add ebx,edx //i couldnt find a way to multiply with 4
add ebx,edx //so i added 4 times ^^
add ebx,edx //add
add ebx,edx //last adding
//below, mov [ebx],ah is the only memory accessing instruction
mov [ebx],ah //(8 bit)this line is equivalent to var_3[i]=remainder
inc edx //i++;
cmp al,00h //is division zero?
jne dng //if no, loop again
//here edi register has the number of digits
movd eax,xmm0 //get the outer loop counter from medium-big register
add eax,01h //j++;
movd xmm0,eax //store the new counter to medium-big register
cmp eax,0BEBC200h //is j<(200,000,000) ?
jb for_loop //if yes, go loop again
mov [var_3_size],edx //now we have number of digits too!
//here does registers revert back to old values
pop edi
pop edx
pop ecx
pop ebx
pop eax
popf
}
Whole code://///////////////////////////////////////////////////////////////////////////////////////
#include <iostream.h>
#include <cmath>
#include<stdlib.h>
#include<stdio.h>
#include<time.h>
int main()
{
srand(time(0));
clock_t t1=clock();
clock_t t2=clock();
int var_1=17; //number itself
int var_2=2; //number system
int var_3[100]; //digits to be showed(maximum 100 as seen )
int var_3_size=0;//asm block will decide what will the number of digits be
for(int i=0;i<100;i++)
{
var_3[i]=0; //here we initialize digits to zeroes
}
t1=clock();//reference time to take
__asm // i love assembly in some parts of C++
{
pushf //here does register backup
push eax
push ebx
push ecx
push edx
push edi
mov eax,0h //this will be outer loop counter init to zero
//init of medium-big registers to zero
movd xmm0,eax //cannot set to immediate constant: xmm0=outer loop counter
shufps xmm0,xmm0,0h //this makes all bits zero
movd xmm1,eax
movd xmm2,eax
shufps xmm1,xmm1,0h
shufps xmm2,xmm2,0h
movd xmm2,eax
shufps xmm3,xmm3,0h
//init complete(xmm0,xmm1,xmm2,xmm3 are zero)
movd xmm1,[var_1] //storing variable_1 to register
movd xmm2,[var_2] //storing var_2 to register
lea ebx,var_3 //calculate var_3 address
movd xmm3,ebx //storing var_3's address to register
for_loop:
mov eax,0h
//this line is index-init to zero(digit array index)
movd edx,xmm2
mov cl,dl //this is var_2 (the number system) stored in cl
movd edx,xmm1
mov al,dl //this is var_1 (the number itself) stored in al
mov edx,0h
dng:
mov ah,00h //preparation for a 8-bit division
div cl //divide
movd ebx,xmm3 //get var_3 address
add ebx,edx //i couldnt find a way to multiply with 4
add ebx,edx //so i added 4 times ^^
add ebx,edx //add
add ebx,edx //last adding
//below, mov [ebx],ah is the only memory accessing instruction
mov [ebx],ah //(8 bit)this line is equivalent to var_3[i]=remainder
inc edx //i++;
cmp al,00h //is division zero?
jne dng //if no, loop again
//here edi register has the number of digits
movd eax,xmm0 //get the outer loop counter from medium-big register
add eax,01h //j++;
movd xmm0,eax //store the new counter to medium-big register
cmp eax,0BEBC200h //is j<(200,000,000) ?
jb for_loop //if yes, go loop again
mov [var_3_size],edx //now we have number of digits too!
//here does registers revert back to old values
pop edi
pop edx
pop ecx
pop ebx
pop eax
popf
}
t2=clock(); //finish time
printf("\n assembly_inline(clocks): %i for the 200 million calculations",(t2-t1));
printf("\n value %i(in decimal) is: ",var_1);
for(int i=var_3_size-1;i>=0;i--)
{
printf("%i",var_3[i]);
}
printf(" in the number system: %i \n",var_2);
//and: more readable form(end easier)
int counter=var_3_size;
int x=var_1;
int g=var_2;
int y=x;// backup
t1=clock();//reference time
for(int i=0;i<200000000;i++)
{
y=x;
counter=0;
while(y>g)
{
var_3[counter]=y%g;
y/=g;
counter++;
}
var_3[counter]=y%g;
}
t2=clock();//final time
printf("\n C++(clocks): %i for the 200 million calculations",(t2-t1));
printf("\n value %i(in decimal) is: ",x);
for(int i=var_3_size-1;i>=0;i--)
{
printf("%i",var_3[i]);
}
printf(" in the number system: %i \n",g);
return 0;
}
edit:
this is 32-bit version
void get_digits_asm()
{
__asm
{
pushf //couldnt store this in other registers
movd xmm0,eax//storing in xmm registers instead of pushing
movd xmm1,ebx//
movd xmm2,ecx//
movd xmm3,edx//
movd xmm4,edi//end of push backups
mov eax,[variable_x]
mov ebx,[number_system]
mov ecx,0h
mov edi,0h
begin_loop:
mov edx,0h
div ebx
lea edi,digits
mov [edi+ecx*4],edx
add ecx,01h
cmp eax,ebx
ja begin_loop
mov edx,0
div ebx
lea edi,digits
mov [edi+ecx*4],edx
inc ecx
mov [digits_total],ecx
movd edi,xmm4//pop edi
movd edx,xmm3//pop edx
movd ecx,xmm2//pop ecx
movd ebx,xmm1//pop ebx
movd eax,xmm0//pop eax
popf
}
}
The code can be much simpler of course: (modeled after the C++ version, does not include pushes and pops, and not tested)
mov esi,200000000
_bigloop:
mov eax,[y]
mov ebx,[g]
lea edi,var_3
; eax = y
; ebx = g
; edi = var_3
xor ecx,ecx
; ecx = counter
_loop:
xor edx,edx
div ebx
mov [edi+ecx*4],edx
add ecx,1
test eax,eax
jnz _loop
sub esi,1
jnz _bigloop
But I would be surprised if it was faster than the C++ version, and in fact it'll almost certainly be slower if the base is a power of two - all sane compilers know how to turn a division and/or modulo by a power of two into bitshifts and bitwise ands.
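To illustrate the power-of-two point, here are two C++ sketches of the digit loop: a general one using / and %, and one for bases of the form 1 << shift that uses only shifts and masks. The function names and the shift parameter are hypothetical. Both use a do-while so that the last digit is handled inside the loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical name; writes the digits of value in the given base, least
// significant first, and returns how many digits were written.
std::size_t digits_div(unsigned value, unsigned base, unsigned* out) {
    assert(base >= 2 && out != nullptr);
    std::size_t n = 0;
    do {
        out[n++] = value % base;
        value /= base;
    } while (value != 0);
    return n;
}

// Hypothetical name; same result for base = 1 << shift, without any division.
std::size_t digits_pow2(unsigned value, unsigned shift, unsigned* out) {
    assert(shift >= 1 && shift < 32 && out != nullptr);
    const unsigned mask = (1u << shift) - 1u;
    std::size_t n = 0;
    do {
        out[n++] = value & mask;   // value % base
        value >>= shift;           // value / base
    } while (value != 0);
    return n;
}
```

For 17 in base 2 both produce the digits 1, 0, 0, 0, 1 (least significant first), and digits_div(33, 7, out) produces 5, 4, i.e. "45".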
Here's a version that uses an 8-bit division. Similar caveats apply, but now the division could even overflow (if y / g is more than 255).
mov esi,200000000
_bigloop:
mov eax,[y]
mov ebx,[g]
lea edi,var_3
; eax = y
; ebx = g
; edi = var_3
xor ecx,ecx
; ecx = counter
_loop:
div bl
mov [edi+ecx],ah
add ecx,1
and eax,0xFF
jnz _loop
sub esi,1
jnz _bigloop
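The overflow caveat can be made explicit with a small check. This C++ sketch models the condition under which div bl raises a divide error: the 16-bit dividend in AX is divided by the 8-bit divisor, and the quotient has to fit in AL. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name; true if "div bl" with this AX and BL would fault,
// either because the divisor is zero or because the quotient exceeds 255.
bool div8_would_fault(std::uint16_t ax, std::uint8_t bl) {
    if (bl == 0) {
        return true;
    }
    return (ax / bl) > 0xFF;
}
```

So 17 / 2 is fine, while 512 / 2 is not, because the quotient 256 does not fit in AL.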