I've got a problem. I pass a pointer to a function that overwrites the last 3 elements of an array with the first 3. I have to pass an unsigned char array, and the copy has to be done in ASM.
int main(int argc, char* argv[])
unsigned char arr[24]={
};// example
void AsmFlipVertical(unsigned char *arr)
les esi,arr ; esi = address of the first element
mov eax,esi
add eax,21
mov edi,eax ; edi = address of the first element + 21, i.e. the address of arr[21]
mov ecx,3
rep movsb ; copy ecx bytes from [esi] to [edi]
I get an error at "rep movsb". What's wrong? If I use this ASM code directly in the main function it works, but I have to put the ASM code in a separate function...
You should not use any instructions that affect the segment registers in flat memory models. So, replace les esi,arr with mov esi,arr
The les esi, arr instruction is wrong (you don't want to change the es register too). You should just use mov esi, arr
(tested - works)
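For comparison, here is a minimal portable C++ sketch of what the corrected ASM does: it copies 3 bytes from the start of the array to offset 21. The name FlipVerticalPortable is made up for this example, and the 24-byte array size is taken from the question.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical portable counterpart of AsmFlipVertical.
// Assumes `arr` points at (at least) 24 bytes, as in the question.
void FlipVerticalPortable(unsigned char* arr)
{
    const std::size_t count = 3;   // mov ecx,3
    const std::size_t offset = 21; // add eax,21

    // rep movsb: copy `count` bytes from [esi] = arr to [edi] = arr + 21
    std::copy_n(arr, count, arr + offset);
}
```

Comparing the output of this function with the ASM version on the same input is a quick way to check that the inline assembly reads and writes the addresses you expect.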
I've been struggling trying to convert this assembly code to C++ code.
It's a function from an old game that takes pixel data Stmp and, I believe, copies it to the destination void* dest
void Function(int x, int y, int yl, void* Stmp, void* dest)
unsigned long size = 1280 * 2;
unsigned long j = yl;
void* Dtmp = (void*)((char*)dest + y * size + (x * 2));
push es;
push ds;
pop es;
mov edx,Dtmp;
mov esi,Stmp;
mov ebx,j;
xor eax,eax;
xor ecx,ecx;
loop_1:
or bx,bx;
jz exit_1;
mov edi,edx;
loop_2:
cmp word ptr[esi],0xffff;
jz exit_2;
mov ax,[esi];
add edi,eax;
mov cx,[esi+2];
add esi,4;
shr ecx,2;
jnc Next2;
movsw;
Next2:
rep movsd;
jmp loop_2;
exit_2:
add esi,2;
add edx,size;
dec bx;
jmp loop_1;
exit_1:
pop es;
This is as far as I've gotten (not sure if it's even correct):
while (j > 0)
if (*stmp != 0xffff)
dtmp += size;
Any help is greatly appreciated. Thank you.
It saves / restores ES around setting it equal to DS so rep movsd will use the same segment for its loads (DS:ESI) and its stores (ES:EDI). That instruction is basically memcpy(edi, esi, 4 * ecx) but also incrementing the pointers in EDI and ESI (by 4 * ecx). https://www.felixcloutier.com/x86/movs:movsb:movsw:movsd:movsq
In a flat memory model, you can totally ignore that. This code looks like it might have been written to run in 16-bit unreal mode, or possibly even real mode, hence the use of 16-bit registers all over the place.
It looks like it's loading some kind of records that tell it how many bytes to copy, and reading until the end of the record, at which point it looks for the next record there. There's an outer loop around that, looping through records.
The records look like this I think:
struct sprite_line {
uint16_t skip_dstbytes, src_bytes;
uint16_t src_data[]; // flexible array member, actual size unlimited but assumed to be a multiple of 2.
The inner loop is this:
;; char *dstp; // in EDI
;; struct sprite_line *p; // in ESI
loop_2:
cmp word ptr[esi],0xffff ; while( p->skip_dstbytes != (uint16_t)-1 ) {
jz exit_2;
mov ax,[esi]; ; EAX was xor-zeroed earlier; some old CPUs maybe had slow movzx loads
add edi,eax; ; dstp += p->skip_dstbytes;
mov cx,[esi+2]; ; bytelen = p->src_bytes;
add esi,4; ; ESI now points at p->src_data
shr ecx,2; ; length in dwords = bytelen >> 2
jnc Next2;
movsw; ; one 16-bit (word) copy if bytelen >> 1 is odd, i.e. if last bit shifted out was a 1.
; The first bit shifted out isn't checked, so size is assumed to be a multiple of 2.
Next2:
rep movsd; ; copy in 4-byte chunks
Old CPUs (before IvyBridge) had rep movsd faster than rep movsb, otherwise this code could just have done that.
or bx,bx;
jz exit_1;
That's an obsolete idiom that comes from 8080 for test bx,bx / jz, i.e. jump if BX was zero. So it's a while( bx != 0 ) {} loop, with dec bx in it. It's an inefficient way to write a while (--bx) loop; a compiler would put a dec/jnz .top_of_loop at the bottom, with a test once outside the loop in case it needs to run zero times. (See the Stack Overflow Q&A titled "Why are loops always compiled into "do...while" style (tail jump)?".)
Some people would say that's what a while loop looks like in asm, if they're picturing totally naive translation from C to asm.
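Putting the pieces above together, a possible C++ translation could look like the sketch below. It is an untested-against-the-game reading of the asm, not a verified replacement: the name BlitSpriteSketch, the helper readU16 and the extra pitch parameter (bytes per destination row, 1280 * 2 in the question) are assumptions made for this example, and it assumes a little-endian host such as x86. Note that the asm counts rows in 16-bit BX, while this sketch uses the full int.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 16-bit value from a possibly unaligned address.
static std::uint16_t readU16(const unsigned char* p)
{
    std::uint16_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}

// Sketch of the asm loop. `pitch` is an added parameter for illustration;
// the original hard-codes size = 1280 * 2.
void BlitSpriteSketch(int x, int y, int yl, const void* Stmp, void* dest,
                      std::size_t pitch = 1280 * 2)
{
    const unsigned char* src = static_cast<const unsigned char*>(Stmp); // ESI
    unsigned char* row =
        static_cast<unsigned char*>(dest) + y * pitch + x * 2;          // EDX (Dtmp)

    for (int line = 0; line < yl; ++line) {       // loop_1: one destination row per pass
        unsigned char* dst = row;                 // mov edi,edx
        while (readU16(src) != 0xffff) {          // loop_2: 0xffff terminates the row
            const std::uint16_t skip = readU16(src);     // bytes to skip in the destination
            const std::uint16_t len = readU16(src + 2);  // bytes of pixel data that follow
            src += 4;
            dst += skip;
            // movsw + rep movsd copy (len >> 2) dwords plus one word if bit 1 is set,
            // i.e. len with bit 0 ignored.
            const std::size_t bytes = len & ~std::size_t{1};
            std::memcpy(dst, src, bytes);
            src += bytes;
            dst += bytes;
        }
        src += 2;      // exit_2: step over the 0xffff terminator
        row += pitch;  // add edx,size
    }
}
```

Passing the pitch as a parameter makes it easy to test the function on a small buffer before pointing it at a real 1280-pixel-wide surface.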
I have an assignment in C++ to read a file into a string variable which contains digits (no spaces), and using inline assembly, the program needs to sum up the digits of the string. For this I want to loop until end of string (NULL) and every iteration copy 1 char (which is 1 digit) into a register so I can use compare and subtract on it. The problem is that every time instead of copying the char to the register it copies some random value.
I'm using Visual Studio for debugging. Variable Y is the string and I'm trying to copy every iteration of the loop the current char into register AL.
// read from txt file
string y;
cout << "\n" << "the text is \n";
ifstream infile;
getline(infile, y);
cout << y;
// inline assembly
mov edx, 0 // counter
mov ebx, 0
mov eax, 0
loop1:
movzx AL, y[ebx]
cmp AL, 0x00
jz finished
sub AL, 48 // convert ascii to number, assuming digit
add edx, eax // add digit to counter
add ebx, 1 // move pointer to the next byte
loop loop1
finished:
mov i, edx
For example assuming Y is "123" and it's the first iteration of the loop, EBX is 0. I expect y[ebx] to point to value 49 ('1') and indeed in debug I see y[ebx]'s value is 49. I want to copy said value into a register, so when I use instruction:
movzx AL, y[ebx]
I expect register AL to change to 49 ('1'), but the value changes to something random instead. For instance last debug session it changed to 192 ('À').
y is the std::string object's control block. You want to access its C string data.
MSVC inline asm syntax is pretty crap, so there's no way to just ask for a pointer to that in a register. I think you have to create a new C++ variable like const char *ystr = y.c_str(); (c_str() returns a const char*, so the variable has to be const too).
That C variable is a pointer which you need to load into register with mov ecx, [ystr]. Accessing the bytes of ystr's object-representation directly would give you the bytes of the pointer.
Also, your current code is using the loop instruction, which is slow and equivalent to dec ecx/jnz. But you didn't initialize ECX, and your loop termination condition is based on the zero terminator, not a counter that you know ahead of the first iteration. (Unless you also ask the std::string for its length instead).
There is zero reason to use the loop instruction here. Put a test al,al / jnz loop1 at the bottom of your loop like a normal person.
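As a reference for the loop shape described above, here is a small portable C++ sketch: take a pointer to the string's character data with c_str(), then loop with the termination test at the bottom. The function name sumDigits is made up for this example, and like the asm it assumes the string contains only the characters '0' to '9'.

```cpp
#include <string>

// Hypothetical reference implementation of the digit sum.
// Assumes `text` contains only ASCII digits, as in the question.
int sumDigits(const std::string& text)
{
    const char* p = text.c_str(); // pointer to the NUL-terminated bytes, not to the std::string object
    int sum = 0;

    char c = *p;                  // first byte (what the asm wants in AL)
    if (c == '\0') {
        return 0;                 // empty string: the loop body must not run
    }
    do {
        sum += c - '0';           // sub al,48 / add edx,eax
        c = *++p;                 // advance the pointer and load the next byte
    } while (c != '\0');          // test al,al / jnz loop1 at the bottom
    return sum;
}
```

In the inline-asm version, the same structure would load ystr into a register once before the loop and then index off that register.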
I'm having trouble finding the average, min and max of an array in assembly language. I created a simple array in C++ and a test.asm file to pass it to. I figured out the average, but I can't seem to figure out the min and max.
#include <iostream>
using namespace std;
extern "C"
int test(int*, int);
int main()
const int SIZE = 7;
int arr[SIZE] = { 1,2,3,4,5,6,7 };
int val = test(arr, SIZE);
cout << "The function test returned: " << val << endl;
return 0;
This is my test.asm, which adds up all the values and returns the average, 4.
.model flat
.code
_test PROC ;named _test because C automatically prepends an underscore, it is needed to interoperate
push ebp
mov ebp,esp ;stack pointer to ebp
mov ebx,[ebp+8] ; address of first array element
mov ecx,[ebp+12] ; number of elements
mov ebp,0
mov edx,0
mov eax,0
loopMe:
cmp ebp,ecx
je allDone
add eax,[ebx+edx]
add edx,4
add ebp,1
jmp loopMe
allDone:
mov edx,0
div ecx
pop ebp
ret
_test ENDP
END
I am still trying to figure out how to find the min, since the max will be done in a similar way. I assume you use cmp to compare values, but nothing I have tried so far has worked. I'm fairly new to assembly language and it's hard for me to grasp. Any help is appreciated.
Ok, so I will show you a refactored average function, even if you didn't ask for it directly. :)
Things you can learn from this:
simplified function prologue/epilogue, when ebp is not modified in code
the input array is of 32b int values, so to have correct average you should calculate 64b sum, and do the 64b sum signed division
subtle "tricks" how to get zero value (xor) or how inc is +1 to value (lowering code size)
handling zero sized array by returning fake average 0 (no crash)
addition of two 64b values composed from 32b registers/instructions
counting human "index" (+1 => direct cmp with size possible), yet addressing 32b values (usage of *4 in addressing)
renamed to getAverage
BTW, this is not optimized for performance; I tried to keep the source "simple", so it's easy to read and understand what it is doing.
_getAverage PROC
; avoiding `ebp` usage, so no need to save/set it
; NOTE: ebx, esi and edi are callee-saved in cdecl; push/pop them if the caller relies on that
mov ebx,[esp+4] ; address of first array element
mov ecx,[esp+8] ; size of array
xor esi,esi ; array index 0
; 64b sum (edx:eax) = 0
xor eax,eax
cdq
; test for invalid input (zero sized array)
jecxz zeroSizeArray ; arguments validation, returns 0 for 0 size
; here "0 < size", so no "index < size" test needed for first element
; "do { ... } while(index < size);" loop variant
sumLoop:
; extend value from array[esi] to 64b (edi is upper 32b)
mov edi,[ebx+esi*4]
sar edi,31
; edx:eax += edi:array[esi] (64b array value added to 64b sum)
add eax,[ebx+esi*4]
adc edx,edi
; next index and loop while index < size
inc esi
cmp esi,ecx
jb sumLoop
; divide the 64b sum of integers by "size" to get average value
idiv ecx ; signed (!) division (input array is signed "int")
; can't overflow (Divide-error), as the sum value was accumulated
; from 32b values only, so EAX contains full correct result
zeroSizeArray:
ret
_getAverage ENDP
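Since the question was really about min and max, here is a portable C++ sketch of the logic that an asm version would mirror: start with the first element, then compare each element against the current min and max (a cmp followed by a conditional jump or cmov in asm). The struct ArrayStats and the function getStats are names invented for this example; the average uses a 64-bit sum, as in the asm above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this example.
struct ArrayStats {
    int min;
    int max;
    int average;
};

// Returns all zeros for an empty array, like the "fake average 0" in the asm.
ArrayStats getStats(const int* values, std::size_t size)
{
    ArrayStats stats{0, 0, 0};
    if (size == 0) {
        return stats;
    }

    stats.min = values[0];
    stats.max = values[0];
    std::int64_t sum = 0; // 64-bit sum so large int values cannot overflow it

    for (std::size_t i = 0; i < size; ++i) {
        if (values[i] < stats.min) { stats.min = values[i]; } // cmp + jge/skip
        if (values[i] > stats.max) { stats.max = values[i]; } // cmp + jle/skip
        sum += values[i];
    }

    // Signed division, matching idiv in the asm.
    stats.average = static_cast<int>(sum / static_cast<std::int64_t>(size));
    return stats;
}
```

Because the array holds signed int values, the asm comparisons should use the signed conditions (jl/jg), not the unsigned ones (jb/ja).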
// Calls the external LongRandom function, written in
// assembly language, that returns an unsigned 32-bit
// random integer. Compile in the Large memory model.
// Procedure called LongRandomArray that fills an array with 32-bit unsigned
// random integers
#include <iostream.h>
#include <conio.h>
extern "C" {
unsigned long LongRandom();
void LongRandomArray(unsigned long * buffer, unsigned count);
const int ARRAY_SIZE = 20;
int main()
// Allocate array storage and fill with 32-bit
// unsigned random integers.
unsigned long * rArray = new unsigned long[ARRAY_SIZE];
for(unsigned i = 0; i < 20; i++)
cout << rArray[i] << ',';
cout << endl;
return 0;
LongRandom & LongRandomArray procedure module (longrand.asm)
.model large
.386
Public _LongRandom
Public _LongRandomArray
.data
seed dd 12345678h
; Return an unsigned pseudo-random 32-bit integer
; in DX:AX,in the range 0 - FFFFFFFFh.
.code
_LongRandom proc far, C
mov eax, 214013
mul seed
xor edx,edx
add eax, 2531011
mov seed, eax ; save the seed for the next call
shld edx,eax,16 ; copy upper 16 bits of EAX to DX
ret
_LongRandom endp
_LongRandomArray proc far, C
ARG bufferPtr:DWORD, count:WORD
; fill random array
mov edi,bufferPtr
mov cx, count
L1:
call _LongRandom
mov word ptr [edi],dx
add edi,2
mov word ptr [edi],ax
add edi,2
loop L1
ret
_LongRandomArray endp
end
This code is based on a 16-bit example for MS-DOS from Kip Irvine's assembly book (6th ed.) and explicitly written for Borland C++ 5.01 and TASM 4.0 (see chapter 13.4 "Linking to C/C++ in Real-Address Mode").
Pointers in 16-bit mode consist of a segment and an offset, usually written as segment:offset. This is not the real memory address, which the processor calculates from the two parts. You cannot load segment:offset into a 32-bit register (EDI) and store a value to memory through it. So
mov edi,bufferPtr
mov word ptr [edi],dx
is wrong. You have to load the segment part of the pointer into a segment register, e.g. ES, and the offset part into an appropriate 16-bit general-purpose register, e.g. DI, and possibly use a segment override:
push es
les di,bufferPtr ; bufferPtr => ES:DI
mov word ptr es:[di],dx
pop es
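To make the segment:offset arithmetic concrete, here is a small C++ sketch of how a real-mode far pointer maps to a linear address (segment * 16 + offset). The names FarPointer, splitFarPointer and linearAddress are invented for this example; the layout assumed is the one les expects, with the offset in the low word and the segment in the high word of the DWORD.

```cpp
#include <cstdint>

// Hypothetical representation of a 16-bit far pointer.
struct FarPointer {
    std::uint16_t segment; // goes into a segment register such as ES
    std::uint16_t offset;  // goes into a 16-bit register such as DI
};

// Split a 32-bit far pointer value the way `les di,bufferPtr` does:
// low word -> offset, high word -> segment.
FarPointer splitFarPointer(std::uint32_t raw)
{
    return { static_cast<std::uint16_t>(raw >> 16),
             static_cast<std::uint16_t>(raw & 0xffffu) };
}

// Real-mode address calculation: linear = segment * 16 + offset.
std::uint32_t linearAddress(FarPointer p)
{
    return (std::uint32_t{p.segment} << 4) + p.offset;
}
```

This also shows why treating the raw DWORD as a flat address in EDI cannot work: the segment part would be scaled by 65536 instead of 16.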
The ARG directive replaces the name of the variable with the appropriate [bp+x] operand. Therefore you need a prologue (and an epilogue). TASM inserts the right instructions if the PROC header is well written, which is not the case here. Take a look at the following working function:
_LongRandomArray PROC C FAR
ARG bufferPtr:DWORD, count:WORD
push es
les di,bufferPtr ; bufferPtr => ES:DI
mov cx, count
L1:
call _LongRandom
mov word ptr es:[di],dx
add di,2
mov word ptr es:[di],ax
add di,2
loop L1
pop es
ret
_LongRandomArray ENDP
Compile your code with BCC (not BCC32):
BCC -ml main.cpp longrand.asm
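If you want to sanity-check the generator itself outside of DOS, the arithmetic in _LongRandom can be sketched in portable C++ as below. The class name LongRandomSketch and the function fillRandom are made up for this example; the constants and the default seed are the ones from longrand.asm, and the multiplication wraps modulo 2^32 just like the low half of mul.

```cpp
#include <cstdint>

// Hypothetical portable model of _LongRandom: seed = seed * 214013 + 2531011 (mod 2^32).
class LongRandomSketch {
public:
    explicit LongRandomSketch(std::uint32_t seed = 0x12345678u) : seed_(seed) {}

    std::uint32_t next()
    {
        seed_ = seed_ * 214013u + 2531011u; // unsigned arithmetic wraps like EAX does
        return seed_;                       // the asm returns this value split as DX:AX
    }

private:
    std::uint32_t seed_;
};

// Counterpart of _LongRandomArray: fill `count` elements with successive values.
void fillRandom(std::uint32_t* buffer, unsigned count, LongRandomSketch& rng)
{
    for (unsigned i = 0; i < count; ++i) {
        buffer[i] = rng.next();
    }
}
```

Passing the generator in by reference keeps the seed state explicit instead of hiding it in a global, which makes the sequence easy to reproduce in a test.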
I don't understand why this code
#include <iostream>
using namespace std;
int main(){
int result=0;
mov eax,3;
MUL eax,3;
mov result,eax;
return 0;
shows the following error.
1>c:\users\david\documents\visual studio 2010\projects\assembler_instructions\assembler_instructions.cpp(11): error C2414: illegal number of operands
Everything seems fine, and yet why do I get this compiler error?
According to this page, the mul instruction only takes a single argument:
mul arg
This multiplies "arg" by the value in the A register of the corresponding size; see the table below:
operand size                      1 byte   2 bytes   4 bytes
other operand                     AL       AX        EAX
higher part of result stored in   AH       DX        EDX
lower part of result stored in    AL       AX        EAX
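The 4-byte column of the table can be modelled in C++ as a widening multiply: the 32-bit operand times EAX gives a 64-bit product whose low half lands in EAX and whose high half lands in EDX. This is only an illustrative sketch; MulResult and mul32 are names made up for the example.

```cpp
#include <cstdint>

// Hypothetical model of the register pair written by a 32-bit `mul`.
struct MulResult {
    std::uint32_t low;  // what ends up in EAX
    std::uint32_t high; // what ends up in EDX
};

// Models `mul operand` with the other factor already in EAX.
MulResult mul32(std::uint32_t eax, std::uint32_t operand)
{
    const std::uint64_t product = static_cast<std::uint64_t>(eax) * operand;
    return { static_cast<std::uint32_t>(product),
             static_cast<std::uint32_t>(product >> 32) };
}
```

For small values such as 3 * 4 the high half is simply 0, which is why reading only EAX afterwards gives the expected result.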
Thus following the notes as per Justin's link:
#include <iostream>
int main()
int result=0;
mov eax, 3;
mov ebx, 4;
mul ebx;
mov result,eax;
std::cout << result << std::endl;
return 0;
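Alternatively, use one of the multi-operand forms of imul:
imul eax, 3;
imul eax, eax, 3;
That way you don't need to worry about the edx register being clobbered. It's "signed integer multiply". Your result is an 'int', so it doesn't matter whether you use mul or imul: the low 32 bits of the product are the same either way.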
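The claim that mul and imul agree on the low half of the result can be checked with a few lines of C++. This is a sketch for illustration only; lowHalfUnsigned and lowHalfSigned are invented names, and each one computes the full 64-bit product before truncating to 32 bits.

```cpp
#include <cstdint>

// Low 32 bits of the unsigned product (what `mul` leaves in EAX).
std::uint32_t lowHalfUnsigned(std::uint32_t a, std::uint32_t b)
{
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(a) * b);
}

// Low 32 bits of the signed product (what `imul` leaves in EAX).
std::uint32_t lowHalfSigned(std::int32_t a, std::int32_t b)
{
    return static_cast<std::uint32_t>(static_cast<std::int64_t>(a) * b);
}
```

For example, -3 has the same bit pattern as 0xFFFFFFFD, and both functions give 0xFFFFFFF1 (that is, -15) when multiplying by 5. Only the high half of the full product differs between the signed and unsigned interpretation.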
Sometimes I've gotten errors from not having edx register zeroed when dividing or multiplying. CPU was Intel core2 quad Q9550
There are mind-numbingly overengineered but correct Intel instruction reference manuals you can read, though Intel broke its website a while ago. You could try to find the equivalent reference manuals on AMD's site instead.
Update: I found the manual: http://www.intel.com/design/pentiumii/manuals/243191.htm
I don't know when they are going to break their site again, so you may always need to search for it.
Update2: ARGHL! those are from year 1999.. well most details are unfortunately the same.
You should download the Intel architecture manuals.
For your purpose, volume 2 is going to help you the most.
As of access in July 2010, they are current.