Hooking into a rather big application - c++

I have this code:
.text:0045A020 ; int __thiscall CMapConnection__OnItemOptionCombination(CMapConnection *this, _tagRequestMAP_COMPOSITION_OPTIONITEM *prcreq)
.text:0045A020 ?OnItemOptionCombination#CMapConnection##QAEHPAU_tagRequestMAP_COMPOSITION_OPTIONITEM###Z proc near
.text:0045A020
.text:0045A020 000 push ebp
.text:0045A021 004 mov ebp, esp
.text:0045A023 004 sub esp, 440h ; Integer Subtraction
.text:0045A029 444 mov eax, ___security_cookie
.text:0045A02E 444 xor eax, ebp ; Logical Exclusive OR
.text:0045A030 444 mov [ebp+var_2F0], eax
.text:0045A036 444 push esi
.text:0045A037 448 push edi
.text:0045A038 44C mov [ebp+this], ecx
.text:0045A03E 44C mov eax, [ebp+this]
.text:0045A044 44C mov ecx, [eax+534h]
.text:0045A04A 44C mov [ebp+pPlayer], ecx
.text:0045A050 44C cmp [ebp+pPlayer], 0 ; Compare Two Operands
.text:0045A057 44C jnz short loc_45A063 ; Jump if Not Zero (ZF=0)
.text:0045A057
.text:0045A059 44C mov eax, 1
.text:0045A05E 44C jmp loc_45A97B ; Jump
Long story short, I need to do the following:
- hook into the beginning of the function
- do some checks (a lot of code is required for those checks)
- based on the result of the checks, either let the function continue its normal course, make it jump to the section where it triggers some errors, or simply stop it from advancing.
I have to do this with only a basic understanding of asm.
From what I've read, I can do that with a hook, but here's my problem:
The checking function needs to read the data behind _tagRequestMAP_COMPOSITION_OPTIONITEM *prcreq, so it can gather some numbers.
.text:0041A464 784C mov ecx, [ebp+pPacket] ; jumptable 00417B7A case 27
.text:0041A467 784C add ecx, 4 ; Add
.text:0041A46A 784C mov [ebp+var_1874], ecx
.text:0041A470 784C mov edx, [ebp+var_1874]
.text:0041A476 784C push edx ; prcreq
.text:0041A477 7850 mov ecx, [ebp+this] ; this
.text:0041A47D 7850 call ?OnItemOptionCombination#CMapConnection##QAEHPAU_tagRequestMAP_COMPOSITION_OPTIONITEM###Z ;
That is how the original function is called.
My questions:
How do I read the data from *prcreq in C++ code? Is it possible?
Is it possible to call another function from my hook while passing it the same parameters that the hooked function received?
I don't touch the parameters of OnItemOptionCombination at all. Do I have to restore the stack when I exit from my hook?

Since you can't "pause" the program in order to inject the DLL/so and do the checks (or at least I've never heard of such a thing), you could modify the startup code so that it loops on a variable.
While the program is spinning, perform your checks in the injected DLL/so, then find the static pointer to that variable and modify it so that the program can continue.
This will probably take some skill to achieve.
Eagerly waiting for more answers,
Cheers.
Update:
Here's what I had in mind.
Edit the startup code of the program so that it spins in a loop like the following, using jmp and cmp instructions.
static volatile bool spin = true; // volatile so the flag is re-read on every iteration
while (spin) { }
Then inject your DLL/so and do your checks. Once you're done, change spin to false to let the program continue.
To change spin you'll have to find the static pointer. You can do that by studying the instructions or with a program like Cheat Engine.

Detours Library
http://research.microsoft.com/en-us/projects/detours/
or
EasyHook
http://www.codeproject.com/Articles/27637/EasyHook-The-reinvention-of-Windows-API-hooking
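Whichever library installs the hook, the hook body usually ends up with the same shape: it receives the same this and prcreq the original would have received, reads whatever it needs through the pointer, and then either forwards the call or returns early. Below is a minimal portable sketch of that control flow only. The struct layouts, field names, the limit of 8 and every function name are invented for illustration (the real _tagRequestMAP_COMPOSITION_OPTIONITEM layout has to be recovered from the disassembly), and a plain function pointer stands in for the trampoline a hooking library hands back. Returning 1 on rejection mirrors the mov eax, 1 early exit in the listing; what that value means to the caller is an assumption.

```cpp
#include <cassert>

// Hypothetical layouts: the real ones must be recovered from the binary.
struct OptionItemRequest {
    int item_id;
    int option_count;
};

struct MapConnection {
    int forwarded_calls;  // only here so the sketch can observe forwarding
};

using OnCombineFn = int (*)(MapConnection*, OptionItemRequest*);

// Stand-in for the original function; returns 0 on its normal path.
int original_on_combine(MapConnection* self, OptionItemRequest*) {
    ++self->forwarded_calls;
    return 0;
}

// In a real hook this would be the trampoline returned by the hooking library.
OnCombineFn g_original = original_on_combine;

// The "a lot of code" checks would live here; the limit of 8 is made up.
bool request_is_valid(const OptionItemRequest* req) {
    return req != nullptr && req->option_count >= 0 && req->option_count <= 8;
}

// Same parameters as the hooked function: read them, then forward or refuse.
int hooked_on_combine(MapConnection* self, OptionItemRequest* req) {
    assert(g_original != nullptr);
    if (!request_is_valid(req)) {
        return 1;  // refuse: the original is never entered
    }
    return g_original(self, req);  // forward with the untouched arguments
}
```

On 32-bit MSVC, a __thiscall member function is commonly hooked with a __fastcall free function whose second parameter is an unused EDX placeholder. As long as the hook is declared with a matching calling convention, the compiler handles the stack cleanup, so nothing has to be restored by hand.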

Related

How to add my custom C++ logic to output some register info in the middle of a function?

I used IDA and x64dbg to find some logic I care about. The development environment is Windows x64 with Visual Studio 2017. The register value I care about is in the middle of a function (at 000000006137C034), so I cannot simply use MinHook to hook sub_xxx and output the information I have analyzed from the detour function.
Disassembly of the code in the DLL I want to analyze:
.text:000000006137BFE0 ; __int64 __fastcall sub_6137BFE0(__int64, __int64, __int64, int)
.text:000000006137BFE0 sub_6137BFE0 proc near ; DATA XREF: .rdata:000000006286BB78↓o
.text:000000006137BFE0 ; .pdata:00000000670C06BC↓o
.text:000000006137BFE0
.text:000000006137BFE0 var_28 = dword ptr -28h
.text:000000006137BFE0 var_20 = dword ptr -20h
.text:000000006137BFE0 var_18 = dword ptr -18h
.text:000000006137BFE0 arg_0 = qword ptr 8
.text:000000006137BFE0 arg_8 = qword ptr 10h
.text:000000006137BFE0 arg_10 = qword ptr 18h
.text:000000006137BFE0
.text:000000006137BFE0 mov [rsp+arg_0], rbx
.text:000000006137BFE5 mov [rsp+arg_8], rbp
.text:000000006137BFEA mov [rsp+arg_10], rsi
.text:000000006137BFEF push rdi
.text:000000006137BFF0 sub rsp, 40h
.text:000000006137BFF4 mov rbp, rdx
.text:000000006137BFF7 mov [rsp+48h+var_18], 1
.text:000000006137BFFF mov rdx, [rcx+8]
.text:000000006137C003 mov edi, r9d
.text:000000006137C006 mov rsi, r8
.text:000000006137C009 mov [rsp+48h+var_20], 8
.text:000000006137C011 mov ecx, 0A0h
.text:000000006137C016 mov [rsp+48h+var_28], 4E2h
.text:000000006137C01E mov r9d, 0FF12FD5Ah
.text:000000006137C024 xor r8d, r8d
.text:000000006137C027 call global_new_handler
.text:000000006137C02C mov rbx, rax
.text:000000006137C02F test rax, rax
.text:000000006137C032 jz short loc_6137C05F
.text:000000006137C034 mov rcx, rax
{
# Here I want to insert logic that outputs some memory pointed to by a register.
# After doing this, the process continues to execute.
# Code to add: a jump to the address of my own code.
}
.text:000000006137C037 call sub_61367440
.text:000000006137C03C mov qword ptr [rbx+98h], 0
.text:000000006137C047 lea rax, off_6286B8A8
.text:000000006137C04E mov [rbx], rax
.text:000000006137C051 mov rax, rbx
.text:000000006137C054 mov [rbx+38h], rbp
.text:000000006137C058 mov [rbx+40h], rsi
.text:000000006137C05C mov [rbx+48h], edi
.text:000000006137C05F
.text:000000006137C05F loc_6137C05F: ; CODE XREF: sub_6137BFE0+52↑j
.text:000000006137C05F mov rbx, [rsp+48h+arg_0]
.text:000000006137C064 mov rbp, [rsp+48h+arg_8]
.text:000000006137C069 mov rsi, [rsp+48h+arg_10]
.text:000000006137C06E add rsp, 40h
.text:000000006137C072 pop rdi
.text:000000006137C073 retn
.text:000000006137C073 sub_6137BFE0 endp
.text:000000006137C073
I found these articles: The-Beginners-Guide-to-Codecaves and Easy-Mid-Hook. A code cave is similar to what I want, because it lets me add my custom logic, but the example is a 32-bit project. I need to modify some code at 000000006137C034 so that it jumps to my own code, and after some work the process continues to execute.
My own code is written in C++ in another project, which loads the DLL:
# save all registers (this part may be written in asm)
# monitor logic, written in C++
# restore all registers
# return to 000000006137C037, so the process continues to execute
There may be some technical points:
a. Modifying the code segment to add a jump instruction while the process is running, so the process can jump to my own code.
b. How to implement my own code: saving and restoring the stack and registers, and how to deal with the return address.
c. Other things that need attention, given my poor knowledge of x64 instructions and reverse engineering.
I hope someone can give me some hints (a paper, an open source library, or a web page) so I can keep learning and finally implement the functionality I need.
Thanks!
I have found some pages that helped me understand, and I will keep updating this list with what I find while implementing the functionality:
1.Hooking-By-Example
2.Adding x86, x64 Assembly Language code to the Visual Studio C++ project.
3.What is the 'shadow space' in x64 assembly?
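On point (a), the patch itself is small once the instruction boundaries are known. In the listing, mov rcx, rax occupies 0x6137C034..0x6137C036 and the call occupies 0x6137C037..0x6137C03B, so a 5-byte jmp rel32 written at 0x6137C034 overwrites both instructions (8 bytes in total; the last 3 would be padded with nop), and the cave has to re-issue them before jumping back to 0x6137C03C rather than 0x6137C037. The displaced call uses a relative displacement, so it cannot be copied byte for byte; the displacement has to be recomputed for the new location. Below is a minimal sketch of that displacement arithmetic, assuming the cave lies within ±2 GiB of the patch site. encode_rel32 is a hypothetical helper, not part of any library, and it only builds the bytes: making the page writable and writing them is left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>

// Builds "opcode rel32" (0xE9 = jmp, 0xE8 = call) for an instruction placed
// at address `from` and targeting `to`.
// Returns false if the target is out of rel32 range.
bool encode_rel32(std::uint8_t opcode, std::uint64_t from, std::uint64_t to,
                  std::array<std::uint8_t, 5>& out) {
    assert(opcode == 0xE8 || opcode == 0xE9);
    // The displacement is relative to the end of the 5-byte instruction.
    const std::int64_t delta =
        static_cast<std::int64_t>(to) - static_cast<std::int64_t>(from + 5);
    if (delta < std::numeric_limits<std::int32_t>::min() ||
        delta > std::numeric_limits<std::int32_t>::max()) {
        return false;
    }
    const std::uint32_t rel = static_cast<std::uint32_t>(delta);
    out[0] = opcode;
    for (int i = 0; i < 4; ++i) {
        out[i + 1] = static_cast<std::uint8_t>(rel >> (8 * i));  // little-endian
    }
    return true;
}
```

Calling it with opcode 0xE8, from = 0x6137C037 and to = 0x61367440 rebuilds the displacement of the original call, which is a quick way to check the arithmetic against the listing. When the target is out of rel32 range, a longer absolute jump sequence is needed and more instructions have to be displaced.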

Assembly: loop through a sequence of characters and swap them

My assignment is to implement a function in assembly that does the following:
Loop through a sequence of characters and swap them such that the end result is the original string in reverse (100 points).
Hint: collect the string from the user as a C-string, then pass it to the assembly function along with the number of characters entered by the user. To find out the number of characters, use the strlen() function.
I have written both the C++ and the assembly programs, and they work up to a point: for example, if I input 12345 the output is correctly shown as 54321, but with more than 5 characters the output starts to be incorrect: for example, if I input 123456 the output is 653241. I would greatly appreciate it if anyone could point out where my mistake is:
.code
_reverse PROC
push ebp
mov ebp,esp ;stack pointer to ebp
mov ebx,[ebp+8] ; address of first array element
mov ecx,[ebp+12] ; the number of elemets in array
mov eax,ebx
mov ebp,0 ;move 0 to base pointer
mov edx,0 ; set data register to 0
mov edi,0
Setup:
mov esi , ecx
shr ecx,1
add ecx,edx
dec esi
reverse:
cmp ebp , ecx
je allDone
mov edx, eax
add eax , edi
add edx , esi
Swap:
mov bl, [edx]
mov bh, [eax]
mov [edx],bh
mov [eax],bl
inc edi
dec esi
cmp edi, esi
je allDone
inc ebp
jmp reverse
allDone:
pop ebp ; pop ebp out of stack
ret ; retunr the value of eax
_reverse ENDP
END
And here is my C++ code:
#include <cstring>
#include <iostream>
#include <string>
using namespace std;
extern "C"
char reverse(char*, int);
int main()
{
char str[64] = {0};
int length;
cout << " Please Enter the text you want to reverse:";
cin >> str;
length = static_cast<int>(strlen(str));
reverse(str, length);
cout << " the reversed of the input is: " << str << endl;
}
You didn't comment your code, so IDK what exactly you're trying to do, but it looks like you are manually doing the array indexing with MOV / ADD instead of using an addressing mode like [eax + edi].
However, it looks like you're modifying your original value and then using it in a way that would make sense if it was unmodified.
mov edx, eax ; EAX holds a pointer to the start of array, read every iter
add eax , edi ; modify the start of the array!!!
add edx , esi
Swap:
inc edi
dec esi
EAX grows by EDI every step, and EDI increases linearly, so EAX's offset from the start of the array grows quadratically (the running sum 0 + 1 + 2 + ... + n = n(n+1)/2).
Single-stepping this in a debugger should have found this easily.
BTW, the normal way to do this is to walk one pointer up, one pointer down, and fall out of the loop when they cross. Then you don't need a separate counter, just cmp / ja. (Don't check for JNE or JE, because they can cross each other without ever being equal.)
Overall you have the right idea: start at both ends of the string and swap elements until you get to the middle. The implementation is horrible, though.
mov ebp,0 ;move 0 to base pointer
This seems to be the loop counter (the comment is useless or even worse). I guess the idea was to swap length/2 elements, which is perfectly fine. HINT: I'd just compare pointers/indexes and exit once they collide.
mov edx,0 ; set data register to 0
...
add ecx,edx
mov edx, eax
Useless and misleading.
mov edi,0
mov esi , ecx
dec esi
Looks like indexes to the start/end of the string. OK. HINT: I'd go with pointers to the start/end of the string, but indexes work too.
cmp ebp , ecx
je allDone
Exit if we did length/2 iterations. OK.
mov edx, eax
add eax , edi
add edx , esi
eax and edx point to the current characters to be swapped. Almost OK, but this clobbers eax! Every loop iteration after the second will use wrong pointers! This is what caused your problem in the first place. It wouldn't have happened if you had used pointers instead of indexes, or offset addressing such as [eax+edi]/[eax+esi].
...
The swap part is OK.
cmp edi, esi
je allDone
Second exit condition, this time comparing for index collision! Generally one exit condition should be enough; several exit conditions are usually either superfluous or hint at some flaw in the algorithm. Also, an equality comparison is not enough: the indexes can go from edi<esi to edi>esi during a single iteration.
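The two-pointer approach suggested above, written out in C++ as a reference for what the assembly loop should do. This is a sketch of the algorithm, not a drop-in replacement for the assignment's assembly; the function names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Reverses the first n characters of s in place.
void reverse_in_place(char* s, std::size_t n) {
    assert(s != nullptr || n == 0);
    if (n < 2) {
        return;  // nothing to swap
    }
    char* lo = s;          // walks up from the first character
    char* hi = s + n - 1;  // walks down from the last character
    // One exit condition: stop as soon as the pointers meet or cross.
    while (lo < hi) {
        const char tmp = *lo;
        *lo = *hi;
        *hi = tmp;
        ++lo;
        --hi;
    }
}

// Convenience wrapper that takes the length from strlen(), as in the hint.
void reverse_c_string(char* s) {
    reverse_in_place(s, std::strlen(s));
}
```

Because neither pointer is ever recomputed from a base plus an index, there is no base register to clobber, and the single lo < hi test covers both odd and even lengths.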

Hanging of XShmPutImage event notification

I am using the XShm extension to draw and manipulate images in Linux.
In order to avoid screen flickering, I am passing send_event = TRUE to XShmPutImage and then waiting for the event with XIfEvent, immediately after the call to XShmPutImage.
This way, I make the image drawing blocking, so that the image is not changed until it is displayed on the window surface.
Usually everything works fine. But sometimes, when I have intensive image drawing, it seems that the event never comes and the drawing procedure hangs.
Where should I look for the problem? Is XIfEvent appropriate for this task? How can the event disappear from the message queue?
Is it possible for XShmPutImage not to send the event (with send_event = TRUE), or to send an event other than ShmCompletion in some circumstances (for example, on some internal error)?
EDIT:
After some more research, I found that such hangs happen only when the window manager intensively generates events for the window, for example when I resize the window by dragging its corners.
EDIT2:
I tried several ways to solve this problem, but without success. In the end I was forced to use a timeout and cancel the wait after some time. But of course this is a dirty hack and I want to fix it properly.
So, what can be the reason for XShmPutImage not to send the event if send_event = TRUE, or is it possible for this event to disappear from the message queue?
EDIT3:
Here is the questionable code (FASM):
cinvoke XShmPutImage, ......, TRUE
.loop:
lea eax, [.event]
cinvoke XCheckTypedEvent, [Display], [ShmCompletionEvent], eax
test eax, eax
jz .loop ; there is no message
NB: XShmPutImage always returns TRUE, regardless of whether the event check hangs or not, so I didn't put an error check after it.
EDIT4:
By request, I am posting the whole code of the drawing function. The code uses some macro libraries of FASM, but at least the ideas are clear (I hope).
Notice that this code contains workaround code that limits the event waiting to only 20 ms. Without this timeout the waiting loop simply hangs forever. The number of the XShm event is acquired by calling XShmGetEventBase, as recommended in the XShm documentation.
; Draws the image on a OS provided window surface.
proc DrawImageRect, .where, .pImage, .xDst, .yDst, .xSrc, .ySrc, .width, .height
.event XEvent
rb 256
begin
pushad
mov esi, [.pImage]
test esi, esi
jz .exit
mov ebx, [esi+TImage.ximage]
cinvoke XCreateGC, [hApplicationDisplay], [.where], 0, 0
mov edi, eax
cinvoke XShmPutImage, [hApplicationDisplay], [.where], edi, [esi+TImage.ximage], [.xSrc], [.ySrc], [.xDst], [.yDst], [.width], [.height], TRUE
stdcall GetTimestamp
lea esi, [eax+20] ; 20ms timeout
.loop:
lea eax, [.event]
cinvoke XCheckTypedEvent, [hApplicationDisplay], [ShmCompletionEvent], eax
test eax, eax
jnz .finish
stdcall GetTimestamp
cmp eax, esi
jb .loop
.finish:
cinvoke XFreeGC, [hApplicationDisplay], edi
.exit:
popad
return
endp
And here is the code of the main event loop of the application.
The procedure __ProcessOneSystemEvent simply dispatches the events to the GUI objects and ignores all events it does not use. It does not process ShmCompletionEvent at all.
All the windows created in the application have events mask of:
ExposureMask+FocusChangeMask+KeyPressMask+KeyReleaseMask+ButtonPressMask+ButtonReleaseMask+EnterWindowMask+LeaveWindowMask+PointerMotionMask+StructureNotifyMask
proc ProcessSystemEvents
.event XEvent
rb 256
begin
push ebx ecx edx
.event_loop:
; check for quit
get eax, [pApplication], TApplication:MainWindow
test eax, eax
jz .continue
cmp dword [eax], 0
jne .continue
cinvoke XFlush, [hApplicationDisplay]
xor eax, eax
mov [fGlobalTerminate], 1
stc
pop edx ecx ebx
return
.continue:
cinvoke XPending, [hApplicationDisplay]
test eax, eax
jz .noevents
push edi ecx
lea edi, [.event]
mov ecx, sizeof.XEvent/4
xor eax, eax
rep stosd
pop ecx edi
lea ebx, [.event]
cinvoke XNextEvent, [hApplicationDisplay], ebx
stdcall __ProcessOneSystemEvent, ebx
jmp .event_loop
.noevents:
clc
pop edx ecx ebx
return
endp
The full source code is available in the repository, but it is a very big project and not easy to navigate. The discussed source is in check-in 8453c99b1283def8.
The files: "freshlib/graphics/images.asm" and "freshlib/graphics/Linux/images.asm" are about the image drawing.
The files: "freshlib/gui/Main.asm" and "freshlib/gui/Linux/Main.asm" are about the general events handling in the application.
What is the X server doing?
The X server can and will suppress a ShmCompletionEvent if the parameters passed to XShmPutImage exceed the geometry of the shared memory area attached to the XImage in the call. The server checks X/Y and width/height against the previously stored limits for the given shared area and if the call parameters are out-of-bounds, the server will return BadValue, suppress the drawing operation, and suppress the completion event.
The above is exactly what is happening in your library. Here's how:
The main event dispatcher routine is ProcessSystemEvents. It performs an XNextEvent and, based on the event type, dispatches through a jump table (.jump_table) to an event-specific handler function.
The event-specific handler for an Expose event is .expose.
The .expose function will, in turn, call DrawImageRect using X/Y and width/height values from the XExposeEvent struct. This is wrong and is the true source of the bug, as we shall see momentarily.
DrawImageRect will pass these values along in a call to XShmPutImage
The handler for XShmPutImage in the X server will examine these parameters and reject if they're out of bounds.
The parameters are being rejected because they come from an exposure event and are related to the geometry of the window and not the geometry of the shared memory attached to the XImage used in the XShmPutImage call.
Specifically, this happens when the window has just been resized (e.g. by the window manager) and enlarged, and there has been a prior ConfigureNotify event for the resize. The new Expose event then carries a larger width/height that exceeds the width/height of the shared memory area that the server knows about.
It is the responsibility of the client to field the window resize events [etc.] and tear down/recreate the shared memory area with the enlarged size. This is not being done and is the source of the bug.
NOTE: Just to be completely clear on this, the server can only report on the error and not do anything about it for a few reasons:
The server knows about the window [and its resize].
It knows about the XImage, its shared memory area and size
But they're only associated during the XShmPutImage call [AFAIK]
Even if the server could associate them, it couldn't adjust the shmarea
That's because it has no way to relink the shmarea to the client side
Only the client can do that via XShmDetach/XShmAttach
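The server-side checks are easy to mirror on the client, so that a rectangle taken from an Expose event is never passed to XShmPutImage unmodified. Below is a minimal sketch, assuming the image size is whatever was passed to CreateImage. Rect, ImageSize, fits_source and clamp_to_image are hypothetical names, not part of Xlib or FreshLib. Clamping only avoids the rejected request; the image still has to be recreated at the new size for the enlarged area to be drawn.

```cpp
#include <algorithm>
#include <cassert>

struct Rect {
    int x, y, width, height;
};

struct ImageSize {
    int width, height;  // size the shared-memory image was created with
};

// True if the rectangle lies entirely inside the image.
bool fits_source(const Rect& r, const ImageSize& img) {
    return r.x >= 0 && r.y >= 0 && r.width >= 0 && r.height >= 0 &&
           r.x + r.width <= img.width && r.y + r.height <= img.height;
}

// Intersects an exposed window rectangle with the image bounds.
// The result may be empty (width or height 0); the caller should skip it.
Rect clamp_to_image(const Rect& r, const ImageSize& img) {
    assert(img.width >= 0 && img.height >= 0);
    const int x0 = std::min(std::max(r.x, 0), img.width);
    const int y0 = std::min(std::max(r.y, 0), img.height);
    const int x1 = std::min(r.x + r.width, img.width);
    const int y1 = std::min(r.y + r.height, img.height);
    return Rect{x0, y0, std::max(x1 - x0, 0), std::max(y1 - y0, 0)};
}
```

In the .expose handler the same rectangle is used for both source and destination, so one clamped rectangle can feed both sets of arguments.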
Below are redacted versions of the relevant source files from the c5c765bc7e commit. They have been cleaned up a bit, so only the most germane parts remain. Some lines have been truncated or wrapped to eliminate horizontal scrolling.
The files have been annotated with NOTE and NOTE/BUG comments, which I added while analyzing them.
gui/Main.asm The top level generic main loop. Not much to see here.
; FILE: gui/Main.asm
; _____________________________________________________________________________
;| |
;| ..::FreshLib::.. Free, open source. Licensed under "BSD 2-clause" license." |
;|_____________________________________________________________________________|
;
; Description: Main procedure of GUI application library.
;
; Target OS: Any
;
; Dependencies:
;
; Notes: Organize the main message/event loop needed by every GUI engine.
; This file contains only OS independent part and includes OS dependent
; files.
;______________________________________________________________________________
module "Main library"
proc Run
begin
.mainloop:
stdcall ProcessSystemEvents
jc .terminate
mov eax, [pApplication]
test eax, eax
jz .eventok
get ecx, eax, TApplication:OnIdle
jecxz .eventok
stdcall ecx, eax
.eventok:
stdcall WaitForSystemEvent
jmp .mainloop
.terminate:
DebugMsg "Terminate GUI application!"
return
endp
include '%TargetOS%/Main.asm'
endmodule
gui/Linux/Main.asm The event handlers
; FILE: gui/Linux/Main.asm
; _____________________________________________________________________________
;| |
;| ..::FreshLib::.. Free, open source. Licensed under "BSD 2-clause" license." |
;|_____________________________________________________________________________|
;
; Description: Main procedure of GUI application library.
;
; Target OS: Linux
;
; Dependencies:
;
; Notes: Organize the main message/event loop needed by every GUI engine.
;______________________________________________________________________________
body ProcessSystemEvents
; NOTE: this is the storage for the dequeued event -- all dispatch routines
; should use it and process it
.event XEvent
rb 256
begin
push ebx ecx edx
.event_loop:
; check for quit
get eax, [pApplication], TApplication:MainWindow
test eax, eax
jz .continue ; ???????????
cmp dword [eax], 0
jne .continue
cinvoke XFlush, [hApplicationDisplay]
xor eax, eax
mov [fGlobalTerminate], 1
stc
pop edx ecx ebx
return
; NOTE: it is wasteful for the main loop to call WaitForSystemEvent, then call
; us and we do XPending on the first loop -- we already know we have at least
; one event waiting in the queue
.continue:
cinvoke XPending, [hApplicationDisplay]
test eax, eax
jz .noevents
push edi ecx
lea edi, [.event]
mov ecx, sizeof.XEvent/4
xor eax, eax
rep stosd
pop ecx edi
lea ebx, [.event]
cinvoke XNextEvent, [hApplicationDisplay], ebx
stdcall __ProcessOneSystemEvent, ebx
jmp .event_loop
.noevents:
clc
pop edx ecx ebx
return
endp
body WaitForSystemEvent
.event XEvent
begin
push eax ecx edx
lea eax, [.event]
cinvoke XPeekEvent, [hApplicationDisplay], eax
pop edx ecx eax
return
endp
proc __ProcessOneSystemEvent, .linux_event
begin
pushad
mov ebx, [.linux_event]
; mov eax, [ebx+XEvent.type]
; cmp eax, [ShmCompletionEvent]
; je .shm_completion
stdcall _GetWindowStruct, [ebx+XEvent.window]
jc .notprocessed
test eax, eax
jz .notprocessed
mov esi, eax
mov ecx, [ebx+XEvent.type]
cmp ecx, LASTEvent
jae .notprocessed
mov ecx, [.jump_table+4*ecx]
jecxz .notprocessed
jmp ecx
.notprocessed:
popad
stc
return
.finish:
popad
clc
return
;.shm_completion:
; DebugMsg "Put back completion event!"
;
; int3
; cinvoke XPutBackEvent, [hApplicationDisplay], ebx
; jmp .finish
;.........................................................................
; seMove and seResize events.
;-------------------------------------------------------------------------
.moveresize:
; NOTE/BUG!!!!: we must not only process a resize/move request, but we must also
; adjust the size of the shmarea attached to the XImage -- that is _not_ being
; done. (e.g.) if the window is enlarged, the shmarea must be enlarged
cinvoke XCheckTypedWindowEvent, [hApplicationDisplay],
[ebx+XConfigureEvent.window], ConfigureNotify, ebx
test eax, eax
jnz .moveresize
; resize event...
mov eax, [esi+TWindow._width]
mov edx, [esi+TWindow._height]
cmp eax, [ebx+XConfigureEvent.width]
jne .resize
cmp edx, [ebx+XConfigureEvent.height]
je .is_move
.resize:
exec esi, TWindow:EventResize, [ebx+XConfigureEvent.width],
[ebx+XConfigureEvent.height]
; move event...
.is_move:
mov eax, [esi+TWindow._x]
mov edx, [esi+TWindow._y]
cmp eax, [ebx+XConfigureEvent.x]
jne .move
cmp eax, [ebx+XConfigureEvent.y]
je .finish
.move:
exec esi, TWindow:EventMove,
[ebx+XConfigureEvent.x], [ebx+XConfigureEvent.y]
jmp .finish
;.........................................................................
; DestroyNotify handler it invalidates the handle in TWindow structure and
; then destroys TWindow.
.destroy:
test esi, esi
jz .finish
mov [esi+TWindow.handle], 0
destroy esi
jmp .finish
;.........................................................................
; Window paint event
.expose:
get edi, esi, TWindow:ImgScreen
; NOTE:BUG!!!!!
;
; if the window has been resized (e.g. enlarged), these values are wrong!
; they relate to the _window_ but _not_ the shmarea that is attached to the
; XImage
;
; however, DrawImageRect will call XShmPutImage with these values, they
; will exceed the geometry of what the X server knows about the shmarea and
; it will return BadValue and _suppress_ the completion event for XShmPutImage
stdcall DrawImageRect, [esi+TWindow.handle], edi,
[ebx+XExposeEvent.x],[ebx+XExposeEvent.y],
[ebx+XExposeEvent.x], [ebx+XExposeEvent.y],
[ebx+XExposeEvent.width], [ebx+XExposeEvent.height]
jmp .finish
;.........................................................................
; Mouse event handlers
.mousemove:
cinvoke XCheckTypedWindowEvent, [hApplicationDisplay],
[ebx+XConfigureEvent.window], MotionNotify, ebx
test eax, eax
jnz .mousemove
stdcall ServeMenuMouseMove, [ebx+XMotionEvent.window],
[ebx+XMotionEvent.x], [ebx+XMotionEvent.y],
[ebx+XMotionEvent.state]
jc .finish
cinvoke XCheckTypedWindowEvent, [hApplicationDisplay],
[ebx+XMotionEvent.window], MotionNotify, ebx
test eax, eax
jnz .mousemove
mov edi, [__MouseTarget]
test edi, edi
jz .search_target_move
stdcall __GetRelativeXY, edi, [ebx+XMotionEvent.x], [ebx+XMotionEvent.y]
jmp .target_move
.search_target_move:
exec esi, TWindow:ChildByXY, [ebx+XMotionEvent.x],
[ebx+XMotionEvent.y], TRUE
mov edi, eax
.target_move:
cmp edi, [__LastPointedWindow]
je .move_event
cmp [__LastPointedWindow], 0
je .leave_ok
exec [__LastPointedWindow], TWindow:EventMouseLeave
.leave_ok:
mov [__LastPointedWindow], edi
exec edi, TWindow:EventMouseEnter
.move_event:
exec edi, TWindow:EventMouseMove, ecx, edx, [ebx+XMotionEvent.state]
jmp .finish
;.........................................................................
; event jump table
.jump_table dd 0 ; event 0
dd 0 ; event 1
dd .key_press ; KeyPress = 2
dd .key_release ; KeyRelease = 3
dd .mouse_btn_press ; ButtonPress = 4
dd .mouse_btn_release ; ButtonRelease = 5
dd .mousemove ; MotionNotify = 6
dd 0 ; EnterNotify = 7
dd 0 ; LeaveNotify = 8
dd .focusin ; FocusIn = 9
dd .focusout ; FocusOut = 10
dd 0 ; KeymapNotify = 11
dd .expose ; Expose = 12
dd 0 ; GraphicsExpose = 13
dd 0 ; NoExpose = 14
dd 0 ; VisibilityNotify = 15
dd 0 ; CreateNotify = 16
dd .destroy ; DestroyNotify = 17
dd 0 ; UnmapNotify = 18
dd 0 ; MapNotify = 19
dd 0 ; MapRequest = 20
dd 0 ; ReparentNotify = 21
dd .moveresize ; ConfigureNotify = 22
dd 0 ; ConfigureRequest = 23
dd 0 ; GravityNotify = 24
dd 0 ; ResizeRequest = 25
dd 0 ; CirculateNotify = 26
dd 0 ; CirculateRequest = 27
dd 0 ; PropertyNotify = 28
dd 0 ; SelectionClear = 29
dd 0 ; SelectionRequest = 30
dd 0 ; SelectionNotify = 31
dd 0 ; ColormapNotify = 32
dd .clientmessage ; ClientMessage = 33
dd .mapping_notify ; MappingNotify = 34
graphics/Linux/images.asm The image drawing code [including the DrawImageRect function] and the shared memory create/destroy code.
; FILE: graphics/Linux/images.asm
; _____________________________________________________________________________
;| |
;| ..::FreshLib::.. Free, open source. Licensed under "BSD 2-clause" license." |
;|_____________________________________________________________________________|
;
; Description: Memory based images manipulation library.
;
; Target OS: Linux
;
; Dependencies: memory.asm
;
; Notes:
;______________________________________________________________________________
uses libX11, xshm
struct TImage
.width dd ? ; width in pixels.
.height dd ? ; height in pixels.
.pPixels dd ? ; pointer to the pixel memory.
; os dependent data
.ximage dd ?
.shminfo XShmSegmentInfo
ends
body CreateImage
begin
pushad
stdcall GetMem, sizeof.TImage
jc .finish
mov esi, eax
xor eax, eax
inc eax
mov ecx, [.width]
mov edx, [.height]
cmp ecx, 0
cmovle ecx, eax
cmp edx, 0
cmovle edx, eax
mov [esi+TImage.width], ecx
mov [esi+TImage.height], edx
lea eax, [4*ecx]
imul eax, edx
cinvoke shmget, IPC_PRIVATE, eax, IPC_CREAT or 777o
test eax, eax
js .error
mov [esi+TImage.shminfo.ShmID], eax
cinvoke shmat, eax, 0, 0
cmp eax, -1
je .error_free
mov [esi+TImage.shminfo.Addr], eax
mov [esi+TImage.pPixels], eax
mov [esi+TImage.shminfo.fReadOnly], 1
lea ebx, [esi+TImage.shminfo]
cinvoke XShmCreateImage, [hApplicationDisplay], 0, $20, ZPixmap, eax,
ebx, [esi+TImage.width], [esi+TImage.height]
mov [esi+TImage.ximage], eax
cinvoke XShmAttach, [hApplicationDisplay], ebx
clc
mov [esp+4*regEAX], esi
.finish:
popad
return
.error_free:
cinvoke shmctl, [ebx+XShmSegmentInfo.ShmID], IPC_RMID, 0
.error:
stdcall FreeMem, esi
stc
jmp .finish
endp
body DestroyImage
begin
pushad
mov esi, [.ptrImage]
test esi, esi
jz .finish
lea eax, [esi+TImage.shminfo]
cinvoke XShmDetach, [hApplicationDisplay], eax
cinvoke XDestroyImage, [esi+TImage.ximage]
cinvoke shmdt, [esi+TImage.shminfo.Addr]
cinvoke shmctl, [esi+TImage.shminfo.ShmID], IPC_RMID, 0
stdcall FreeMem, esi
.finish:
popad
return
endp
;if used ___CheckCompletionEvent
;___CheckCompletionEvent:
;
;virtual at esp+4
; .display dd ?
; .pEvent dd ?
; .user dd ?
;end virtual
;
;; timeout
; stdcall GetTimestamp
; cmp eax, [.user]
; jbe #f
;
; DebugMsg "Timeout!"
;
; mov eax, 1
; retn
;
;##:
; mov eax, [.pEvent] ;.pEvent
; mov eax, [eax+XEvent.type]
;
; cmp eax, [ShmCompletionEvent]
; sete al
; movzx eax, al
; retn
;end if
body DrawImageRect
.event XEvent
rb 256
begin
pushad
mov esi, [.pImage]
test esi, esi
jz .exit
mov ebx, [esi+TImage.ximage]
; NOTE: is this necessary? it seems wasteful to create and destroy a GC
; repeatedly. Dunno, does this _have_ to be done here, _every_ time?
cinvoke XCreateGC, [hApplicationDisplay], [.where], 0, 0
mov edi, eax
; NOTE/BUG: The return ShmCompletionEvent will be suppressed due to a BadValue
; if the X/Y and width/height parameters given to us by caller exceed the
; geometry/range of the shmarea attached to .ximage
;
; the routine that calls us is .expose and it _is_ giving us bad values. it is
; passing us X/Y width/height related to an exposure event of the .where
; _window_ which we put in the call. The X server will compare these against
; the size of the shmarea of TImage.ximage and complain if we exceed the bounds
cinvoke XShmPutImage, [hApplicationDisplay], [.where], edi,
[esi+TImage.ximage], [.xSrc], [.ySrc], [.xDst], [.yDst],
[.width], [.height], TRUE
; NOTE/BUG: this code should _not_ be looping on XCheckTypedEvent because it
; disrupts the normal event processing. if we want to be "synchronous" on this
; we should loop on the main event dispatcher (ProcessSystemEvents) and let it
; dispatch to a callback we create. we can set a "pending" flag that our [not
; yet existent] dispatch routine can clear
; THIS CODE SOMETIMES CAUSES HANGS!
stdcall GetTimestamp
lea esi, [eax+20]
.loop:
lea eax, [.event]
cinvoke XCheckTypedEvent, [hApplicationDisplay], [ShmCompletionEvent],
eax
test eax, eax
jnz .finish
stdcall GetTimestamp
cmp eax, esi
jb .loop
.finish:
cinvoke XFreeGC, [hApplicationDisplay], edi
.exit:
popad
return
endp
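The NOTE/BUG comment above suggests replacing the private XCheckTypedEvent loop with a pending flag that the main dispatcher clears. Below is a minimal simulation of that idea, with a std::deque standing in for the X event queue. EventKind, EventLoop and its members are hypothetical, and no Xlib calls are made. In a real program, pump_until_complete would block on the connection instead of returning when the queue is empty, and handlers reached through it must tolerate running while a put is still in flight.

```cpp
#include <cassert>
#include <deque>
#include <vector>

enum class EventKind { Expose, ConfigureNotify, ShmCompletion };

struct EventLoop {
    std::deque<EventKind> queue;        // stand-in for the X event queue
    std::vector<EventKind> dispatched;  // ordinary events handled, in order
    bool put_pending = false;           // set while a put awaits completion

    // Called right after issuing the put with send_event = TRUE.
    void begin_put() {
        assert(!put_pending);
        put_pending = true;
    }

    // The single dispatcher: completion events clear the flag,
    // everything else is handled normally and in arrival order.
    void dispatch(EventKind e) {
        if (e == EventKind::ShmCompletion) {
            put_pending = false;
            return;
        }
        dispatched.push_back(e);
    }

    // Pumps the normal dispatcher until the completion arrives or the
    // queue runs dry. Returns true if the put completed.
    bool pump_until_complete() {
        while (put_pending && !queue.empty()) {
            const EventKind e = queue.front();
            queue.pop_front();
            dispatch(e);
        }
        return !put_pending;
    }
};
```

The point of the design is that no event is pulled out of order: events that arrive before the completion are dispatched as usual instead of being skipped over.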
Xext/shm.c The X server code that checks and processes the XShmPutImage call.
// FILE: Xext/shm.c
static int
ProcShmPutImage(ClientPtr client)
{
GCPtr pGC;
DrawablePtr pDraw;
long length;
ShmDescPtr shmdesc;
REQUEST(xShmPutImageReq);
REQUEST_SIZE_MATCH(xShmPutImageReq);
VALIDATE_DRAWABLE_AND_GC(stuff->drawable, pDraw, DixWriteAccess);
VERIFY_SHMPTR(stuff->shmseg, stuff->offset, FALSE, shmdesc, client);
// NOTE: value must be _exactly_ 0/1
if ((stuff->sendEvent != xTrue) && (stuff->sendEvent != xFalse))
return BadValue;
if (stuff->format == XYBitmap) {
if (stuff->depth != 1)
return BadMatch;
length = PixmapBytePad(stuff->totalWidth, 1);
}
else if (stuff->format == XYPixmap) {
if (pDraw->depth != stuff->depth)
return BadMatch;
length = PixmapBytePad(stuff->totalWidth, 1);
length *= stuff->depth;
}
else if (stuff->format == ZPixmap) {
if (pDraw->depth != stuff->depth)
return BadMatch;
length = PixmapBytePad(stuff->totalWidth, stuff->depth);
}
else {
client->errorValue = stuff->format;
return BadValue;
}
// NOTE/BUG: The following block is the "check parameters" code. If the
// given drawing parameters of the request (e.g. X, Y, width, height) [or
// combinations thereof] exceed the geometry/size of the shmarea, the
// BadValue error is being returned here and the code to send a return
// event will _not_ be executed. The bug isn't really here, it's on the
// client side, but it's the client side bug that causes the event to be
// suppressed
/*
* There's a potential integer overflow in this check:
* VERIFY_SHMSIZE(shmdesc, stuff->offset, length * stuff->totalHeight,
* client);
* the version below ought to avoid it
*/
if (stuff->totalHeight != 0 &&
length > (shmdesc->size - stuff->offset) / stuff->totalHeight) {
client->errorValue = stuff->totalWidth;
return BadValue;
}
if (stuff->srcX > stuff->totalWidth) {
client->errorValue = stuff->srcX;
return BadValue;
}
if (stuff->srcY > stuff->totalHeight) {
client->errorValue = stuff->srcY;
return BadValue;
}
if ((stuff->srcX + stuff->srcWidth) > stuff->totalWidth) {
client->errorValue = stuff->srcWidth;
return BadValue;
}
if ((stuff->srcY + stuff->srcHeight) > stuff->totalHeight) {
client->errorValue = stuff->srcHeight;
return BadValue;
}
// NOTE: this is where the drawing takes place
if ((((stuff->format == ZPixmap) && (stuff->srcX == 0)) ||
((stuff->format != ZPixmap) &&
(stuff->srcX < screenInfo.bitmapScanlinePad) &&
((stuff->format == XYBitmap) ||
((stuff->srcY == 0) &&
(stuff->srcHeight == stuff->totalHeight))))) &&
((stuff->srcX + stuff->srcWidth) == stuff->totalWidth))
(*pGC->ops->PutImage) (pDraw, pGC, stuff->depth,
stuff->dstX, stuff->dstY,
stuff->totalWidth, stuff->srcHeight,
stuff->srcX, stuff->format,
shmdesc->addr + stuff->offset +
(stuff->srcY * length));
else
doShmPutImage(pDraw, pGC, stuff->depth, stuff->format,
stuff->totalWidth, stuff->totalHeight,
stuff->srcX, stuff->srcY,
stuff->srcWidth, stuff->srcHeight,
stuff->dstX, stuff->dstY, shmdesc->addr + stuff->offset);
// NOTE: this is where the return event gets sent
if (stuff->sendEvent) {
xShmCompletionEvent ev = {
.type = ShmCompletionCode,
.drawable = stuff->drawable,
.minorEvent = X_ShmPutImage,
.majorEvent = ShmReqCode,
.shmseg = stuff->shmseg,
.offset = stuff->offset
};
WriteEventsToClient(client, 1, (xEvent *) &ev);
}
return Success;
}
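To see why the division-based form of the size check above matters, here is a minimal C++ sketch. It is not the server's code: `fits_naive` and `fits_safe` are hypothetical names, and the request fields are modelled as 32-bit unsigned values so that the wraparound is well defined.

```cpp
#include <cstdint>

// Naive check: multiplies first, so the product can wrap modulo 2^32
// and slip under the limit even though the image is far too large.
bool fits_naive(std::uint32_t size, std::uint32_t offset,
                std::uint32_t length, std::uint32_t height)
{
    return length * height <= size - offset;
}

// Division-based check, same shape as the test in ProcShmPutImage:
// no multiplication, so nothing can overflow.
// Assumes offset <= size, which the server verifies beforehand.
bool fits_safe(std::uint32_t size, std::uint32_t offset,
               std::uint32_t length, std::uint32_t height)
{
    if (height == 0)
        return true;  // zero rows: nothing to read from the segment
    return length <= (size - offset) / height;
}
```

With a 4096-byte segment, a request of 0x10000 bytes per line and 0x10000 lines passes the naive check (the product wraps to 0) but is rejected by the division-based one.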
Your source code is ultimately what we would need to analyse, but since I understand very little assembly, I can only answer at a high level. I still don't know the exact cause.
The issue appears only when there are too many events, not at a normal event rate. That suggests your framework is either running out of virtual memory or triggering another frame's event before the previous one has released its memory. In this kind of situation you can do a few things:
Check whether there is a memory leak. After one frame's event is over, clean up the memory or properly end that frame object before triggering the new one.
You can also develop a mechanism that makes the second frame wait for the first one to finish. In C/C++ this is usually done with synchronization primitives such as a mutex, or with the select system call. If your design follows that kind of pattern, you can do the same.
If you have the ability anywhere to change the amount of memory allocated to your window, try increasing it, because one thing seems certain (according to your explanation): this is some kind of memory issue.
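As a rough illustration of the second suggestion (making the next frame wait for the previous one), here is a minimal C++ sketch. `FrameGate` and its member names are hypothetical, and it assumes the completion notification arrives on a different thread from the one that starts frames. In a single-threaded event loop you would poll the flag instead of blocking.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical helper: allows at most one frame "in flight" at a time.
class FrameGate {
public:
    // Blocks until the previous frame has completed, then marks a new
    // frame as pending.
    void begin_frame() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !pending_; });
        pending_ = true;
    }

    // Call this when the completion event for the frame arrives.
    void end_frame() {
        {
            std::lock_guard<std::mutex> lock(m_);
            pending_ = false;
        }
        cv_.notify_one();
    }

    // Reports whether a frame is currently in flight.
    bool pending() {
        std::lock_guard<std::mutex> lock(m_);
        return pending_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool pending_ = false;
};
```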
Replying to Edit 3: It looks like you are calling some method named cinvoke. I don't know how it handles the event internally. Why don't you implement this directly in C? I am sure that for whatever target you are working on, you can get a cross-compiler.

Why access in a for loop is faster than access in a ranged-for in -O0 but not in -O3?

I'm learning about performance in C++ (and C++11). I need good performance in both Debug and Release mode, because I spend time both debugging and executing.
I'm surprised by these two tests and by how much the results change with different compiler optimization flags.
Test iterator 1:
Optimization 0 (-O0): faster.
Optimization 3 (-O3): slower.
Test iterator 2:
Optimization 0 (-O0): slower.
Optimization 3 (-O3): faster.
P.S.: I use the following clock code.
Test iterator 1:
void test_iterator_1()
{
int z = 0;
int nv = 1200000000;
std::vector<int> v(nv);
size_t count = v.size();
for (unsigned int i = 0; i < count; ++i) {
v[i] = 1;
}
}
Test iterator 2:
void test_iterator_2()
{
int z = 0;
int nv = 1200000000;
std::vector<int> v(nv);
for (int& i : v) {
i = 1;
}
}
UPDATE: The problem is still the same, but for the ranged-for at -O3 the difference is small. So for loop 1 is the best.
UPDATE 2: Results:
With -O3:
t1: 80 units
t2: 74 units
With -O0:
t1: 287 units
t2: 538 units
UPDATE 3: The code! Compile with: g++ -std=c++11 test.cpp -O0 (and then -O3)
Your first test is actually setting the value of each element in the vector to 1.
Your second test is setting the value of a copy of each element in the vector to 1 (the original vector is the same).
When you optimize, the second loop more than likely is removed entirely as it is basically doing nothing.
If you want the second loop to actually set the value:
for (int& i : v) // notice the &
{
i = 1;
}
Once you make that change, your loops are likely to produce assembly code that is almost identical.
As a side note, if you wanted to initialize the entire vector to a single value, the better way to do it is:
std::vector<int> v(SIZE, 1);
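To make the by-value versus by-reference difference concrete, here is a small sketch. The function names are made up for the illustration, and each function takes its own copy of the vector so the two results can be compared side by side.

```cpp
#include <vector>

// Iterates by value: each `i` is a copy of an element, so the
// vector itself is left untouched.
std::vector<int> fill_by_value(std::vector<int> v)
{
    for (int i : v) {
        i = 1;      // modifies the copy only
        (void)i;    // silence "set but not used" warnings
    }
    return v;
}

// Iterates by reference: each `i` aliases an element, so the
// assignment writes into the vector.
std::vector<int> fill_by_reference(std::vector<int> v)
{
    for (int& i : v) {
        i = 1;
    }
    return v;
}
```

Starting from a vector of zeros, the first function returns zeros and the second returns ones.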
EDIT
The assembly is fairly long (100+ lines), so I won't post it all, but a couple things to note:
Version 1 will store a value for count and increment i, testing it against count each time. Version 2 uses iterators (basically the same as std::for_each(v.begin(), v.end() ...)). So the code for the loop maintenance is very different (there is more setup for version 2, but less work each iteration).
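For reference, this is roughly what the two versions look like once the ranged-for is written out by hand. It is a sketch with hypothetical function names, not the exact expansion the compiler performs.

```cpp
#include <cstddef>
#include <vector>

// Version 1: index-based loop, calling operator[] on every iteration.
void set_all_indexed(std::vector<int>& v)
{
    std::size_t count = v.size();
    for (unsigned int i = 0; i < count; ++i) {
        v[i] = 1;
    }
}

// Version 2 as written in the question.
void set_all_ranged(std::vector<int>& v)
{
    for (int& i : v) {
        i = 1;
    }
}

// Approximately what version 2 expands to: begin/end are fetched once
// (the extra setup), then each step is a dereference and an increment.
void set_all_iterators(std::vector<int>& v)
{
    auto it = v.begin();
    auto end = v.end();
    for (; it != end; ++it) {
        int& i = *it;
        i = 1;
    }
}
```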
Version 1 (just the meat of the loop)
mov eax, DWORD PTR _i$2[ebp]
push eax
lea ecx, DWORD PTR _v$[ebp]
call ??A?$vector#HV?$allocator#H#std###std##QAEAAHI#Z ; std::vector<int,std::allocator<int> >::operator[]
mov DWORD PTR [eax], 1
Version 2 (just the meat of the loop)
mov eax, DWORD PTR _i$2[ebp]
mov DWORD PTR [eax], 1
When they get optimized, this all changes and (other than the ordering of a few instructions), the output is almost identical.
Version 1 (optimized)
push ebp
mov ebp, esp
sub esp, 12 ; 0000000cH
push ecx
lea ecx, DWORD PTR _v$[ebp]
mov DWORD PTR _v$[ebp], 0
mov DWORD PTR _v$[ebp+4], 0
mov DWORD PTR _v$[ebp+8], 0
call ?resize#?$vector#HV?$allocator#H#std###std##QAEXI#Z ; std::vector<int,std::allocator<int> >::resize
mov ecx, DWORD PTR _v$[ebp+4]
mov edx, DWORD PTR _v$[ebp]
sub ecx, edx
sar ecx, 2 ; this is the only differing instruction
test ecx, ecx
je SHORT $LN3#test_itera
push edi
mov eax, 1
mov edi, edx
rep stosd
pop edi
$LN3#test_itera:
test edx, edx
je SHORT $LN21#test_itera
push edx
call DWORD PTR __imp_??3#YAXPAX#Z
add esp, 4
$LN21#test_itera:
mov esp, ebp
pop ebp
ret 0
Version 2 (optimized)
push ebp
mov ebp, esp
sub esp, 12 ; 0000000cH
push ecx
lea ecx, DWORD PTR _v$[ebp]
mov DWORD PTR _v$[ebp], 0
mov DWORD PTR _v$[ebp+4], 0
mov DWORD PTR _v$[ebp+8], 0
call ?resize#?$vector#HV?$allocator#H#std###std##QAEXI#Z ; std::vector<int,std::allocator<int> >::resize
mov edx, DWORD PTR _v$[ebp]
mov ecx, DWORD PTR _v$[ebp+4]
mov eax, edx
cmp edx, ecx
je SHORT $LN1#test_itera
$LL33#test_itera:
mov DWORD PTR [eax], 1
add eax, 4
cmp eax, ecx
jne SHORT $LL33#test_itera
$LN1#test_itera:
test edx, edx
je SHORT $LN47#test_itera
push edx
call DWORD PTR __imp_??3#YAXPAX#Z
add esp, 4
$LN47#test_itera:
mov esp, ebp
pop ebp
ret 0
Do not worry about how much time each operation takes; that falls squarely under Donald Knuth's "premature optimization is the root of all evil" quote. Write simple, easy-to-understand programs: your time spent writing the program (and reading it next week to tweak it, or to find out why the &%$# it is giving crazy results) is much more valuable than any computer time wasted. Just compare your weekly income to the price of an off-the-shelf machine, and think how much of your time is required to shave off a few minutes of compute time.
Do worry when you have measurements showing that the performance isn't adequate. Then you must measure where your runtime (or memory, or whatever other resource is critical) is spent, and see how to make that better. The (sadly out of print) book "Writing Efficient Programs" by Jon Bentley (much of it also appears in his "Programming Pearls") is an eye-opener, and a must-read for any budding programmer.
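If you do get to the point of measuring, a small timing helper is enough to start with. This is a minimal sketch using std::chrono; `time_ms` is a hypothetical name, and it is not the clock code from the question.

```cpp
#include <chrono>

// Runs `f` once and returns the elapsed wall-clock time in milliseconds.
// steady_clock is used because it never jumps backwards.
template <typename F>
long long time_ms(F&& f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

You would call it as time_ms([] { test_iterator_1(); }) and repeat the measurement a few times, since a single run can be noisy.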
Optimization is pattern matching: The compiler has a number of different situations it can recognize and optimize. If you change the code in a way that makes the pattern unrecognizable to the compiler, suddenly the effect of your optimization vanishes.
So, what you are witnessing is nothing more or less than that the ranged for loop produces more bloated code without optimization, but that in this form the optimizer is able to recognize a pattern that it cannot recognize for the iterator-free case.
In any case, if you are curious, you should take a look at the produced assembler code (compile with -S option).

Effective for loop in assembly

I'm currently trying to get used to assembler, so I wrote a for loop in C++ and then looked at its disassembly. I was wondering if anyone could explain to me what each step does and/or how to improve the loop manually.
for (int i = 0; i < length; i++){
013A17AE mov dword ptr [i],0
013A17B5 jmp encrypt_chars+30h (13A17C0h)
013A17B7 mov eax,dword ptr [i]
013A17BA add eax,1
013A17BD mov dword ptr [i],eax
013A17C0 mov eax,dword ptr [i]
013A17C3 cmp eax,dword ptr [length]
013A17C6 jge encrypt_chars+6Bh (13A17FBh)
temp_char = OChars [i]; // get next char from original string
013A17C8 mov eax,dword ptr [i]
013A17CB mov cl,byte ptr OChars (13AB138h)[eax]
013A17D1 mov byte ptr [temp_char],cl
Thanks in advance.
First, I'd note that what you've posted seems to contain only part of the loop body. Second, it looks like you compiled with all optimization turned off -- when/if you turn on optimization, don't be surprised if the result looks rather different.
That said, let's look at the code line-by-line:
013A17AE mov dword ptr [i],0
This is basically just i=0.
013A17B5 jmp encrypt_chars+30h (13A17C0h)
This jumps to the loop's condition test, skipping the increment on the first pass. Although it's common to put the test at the top of a loop in most higher level languages, that's not always the case in assembly language.
013A17B7 mov eax,dword ptr [i]
013A17BA add eax,1
013A17BD mov dword ptr [i],eax
This is i++ in (extremely sub-optimal) assembly language. It's retrieving the current value of i, adding one to it, then storing the result back into i.
013A17C0 mov eax,dword ptr [i]
013A17C3 cmp eax,dword ptr [length]
013A17C6 jge encrypt_chars+6Bh (13A17FBh)
This is basically if (i >= length) /* skip forward to some code you haven't shown */. It's retrieving the value of i and comparing it to the value of length, then jumping somewhere if i was greater than or equal to length.
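Putting those pieces together, the control flow of the unoptimized loop can be written back in C++ with gotos. This is only a sketch: the function name is hypothetical, and the push_back is a stand-in for the part of the loop body that wasn't posted.

```cpp
#include <string>

// Mirrors the jump structure of the debug-build disassembly.
std::string copy_like_debug_build(const std::string& ochars)
{
    std::string out;
    int length = static_cast<int>(ochars.size());
    int i;
    char temp_char;

    i = 0;                        // mov dword ptr [i],0
    goto check;                   // jmp to the comparison
increment:
    i = i + 1;                    // mov eax,[i] / add eax,1 / mov [i],eax
check:
    if (i >= length) goto done;   // cmp eax,[length] / jge
    temp_char = ochars[i];        // first statement of the loop body
    out.push_back(temp_char);     // assumption: stand-in for the rest of the body
    goto increment;
done:
    return out;
}
```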
If you were writing this in assembly language by hand, you'd normally use something like xor eax, eax (or sub eax, eax) to zero a register. In most cases, you'd start from the maximum and count down to zero if possible (avoids a comparison in the loop). You certainly wouldn't store a value into a variable, then immediately retrieve it back out (in fairness, a compiler probably won't do that either, if you turn on optimization).
Applying that, and moving the "variables" into registers, we'd end up with something on this general order:
mov ecx, length
loop_top:
; stuff that wasn't pasted goes here
dec ecx
jnz loop_top
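The same count-down shape can be written in C++. The sketch below uses a made-up task (counting uppercase letters) just to have a loop body. Note two things the assembly above glosses over: the elements are visited last to first, and a length of zero needs a guard, because dec ecx on zero wraps around and jnz would keep looping.

```cpp
#include <cstddef>
#include <string>

// Counts uppercase ASCII letters, iterating from the end towards the
// start so the loop test is simply "has the counter reached zero?".
std::size_t count_upper_down(const std::string& s)
{
    std::size_t n = 0;
    std::size_t remaining = s.size();
    if (remaining == 0)
        return 0;                  // guard: the dec/jnz form assumes length > 0
    do {
        char c = s[remaining - 1]; // last element first
        if (c >= 'A' && c <= 'Z')
            ++n;
    } while (--remaining != 0);    // dec ecx / jnz loop_top
    return n;
}
```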
I'll try to interpret this in plain English:
013A17AE mov dword ptr [i],0 ; Move into i, 0
013A17B5 jmp encrypt_chars+30h (13A17C0h) ; Jump to check
013A17B7 mov eax,dword ptr [i] ; Load i into the accumulator (register eax)
013A17BA add eax,1 ; Increment the accumulator
013A17BD mov dword ptr [i],eax ; and put that in it, effectively adding
; 1 to i.
check:
013A17C0 mov eax,dword ptr [i] ; Move i into the accumulator
013A17C3 cmp eax,dword ptr [length] ; Compare it to the value in 'length',
; setting flags
013A17C6 jge encrypt_chars+6Bh (13A17FBh) ; Jump if it's greater or equal. This
; address is not in your code snippet
The compiler prefers EAX for arithmetic. Each register (at least in the past; I don't know if this is still true) had some type of operation that it was faster at doing.
Here's the part that should be more optimized:
(note: your compiler SHOULD do this, so either you have optimizations turned off, or something in the loop body is preventing this optimization)
mov eax,dword ptr [i] ; Go get "i" from memory, put it in register EAX
add eax,1 ; Add one to register EAX
mov dword ptr [i],eax ; Put register EAX back in memory "i". (now one bigger)
mov eax,dword ptr [i] ; Go get "i" from memory, put it in EAX again.
See how often you're moving values back-n-forth from memory to EAX?
You should be able to load "i" into EAX once at the beginning of the loop, run the full loop directly out of EAX, and put the finished value back into "i" after it's all done.
(unless something else in your code prevents this)
Anyway, this code comes from a DEBUG build. It is possible to optimize it, but the MS compiler produces very good code for such simple cases.
There is no point in doing it manually; just rebuild it in release mode and read the listing to learn how to do it.