How to pass float as function argument (external call) and return float in inline assembly? An example below doesn't work and is crashing the app. The comments are mine, so they might be also wrong.
The first two lines are added by myself just for this example. Originally I start with two float values on st(0) and st(1) and there's nothing I can do about that.
fld a ; load float 'a' on st(0)
fld b ; load float 'b' on st(0), 'a' is now st(1)
sub esp, 4 ; make room for float
fstp dword ptr [esp] ; push st(0) on stack, pop st(0)
mov ecx, ebp ; move 'this' on ecx
call Class::ModValue ; returns float on st(0)
fcompp ; compare returned st(0) with st(1)
fnstsw ax
test ah, 41h
jnz Exit_label
The code snippet above is inside asm{} block, there is more unimportant code before and after that. The crash is happening between 1st line of this code fragment and ModValue function call.
Function signature:
float Class::ModValue(float value)
{
_LOG("ModValue") // doesn't show
return value;
}
Compiler: VisualStudio, Architecture: x86, Calling convention: __thiscall
if it's a __thiscall
According to msdn:
The __thiscall calling convention is used on member functions and is the default calling convention used by C++ member functions that do not use variable arguments. Under __thiscall, the callee cleans the stack, which is impossible for vararg functions. Arguments are pushed on the stack from right to left, with the this pointer being passed via register ECX, and not on the stack, on the x86 architecture.
source: https://msdn.microsoft.com/en-us/library/ek8tkfbw.aspx
then a small example:
float DoStuffs(void *pThis, float a)
{
float flResult = 0.0f;
__asm
{
push a;
mov ecx, pThis;
call Class::ModValue;
fstp[flResult];
}
return flResult;
}
Some extended example:
class Test
{
public:
float b;
float Add(float a);
};
float Test::Add(float a)
{
return a + this->b;
}
float CallTest(void* pThis, float x)
{
float flResult = 0.0f;
__asm
{
push x;
mov ecx, pThis;
call Test::Add;
fstp[flResult];
}
return flResult;
}
int main()
{
void* m = malloc(4);
*(float*)m = 2.0f;
std::cout << CallTest(m, 2.1f) << std::endl;
return 0;
}
Related
Have made function that operates on several streams of data in same time, creates output result which is put to destination stream. It has been put huge amount of time to optimize performance of this function (openmp, intrinsics, and etc...). And it performs beautifully.
There is alot math involved here, needless to say very long function.
Now I want to implement in same function with math replacement code for each instance of this without writing each version of this function. Where I want to differentiate between different instances of this function using only #defines or inlined function (code has to be inlined in each version).
Went for templates, but templates allow only type specifiers, and realized that #defines can't be used here. Remaining solution would be inlined math functions, so simplified idea is to create header like this:
'alm_quasimodo.h':
#pragma once
typedef struct ALM_DATA
{
int l, t, r, b;
int scan;
BYTE* data;
} ALM_DATA;
typedef BYTE (*MATH_FX)(BYTE&, BYTE&);
// etc
inline BYTE math_a1(BYTE& A, BYTE& B){ return ((BYTE)((B > A) ? B:A)); }
inline BYTE math_a2(BYTE& A, BYTE& B){ return ((BYTE)(255 - ((long)((long)(255 - A) * (255 - B)) >> 8))); }
inline BYTE math_a3(BYTE& A, BYTE& B){ return ((BYTE)((B < 128)?(2*(((long)A>>1)+64))*((float)B/255):(255-(2*(255-(((long)A>>1)+64))*(float)(255-B)/255)))); }
// etc
template <typename MATH>
inline int const template_math_av (MATH math, ALM_DATA& a, ALM_DATA& b)
{
// ultra simplified version of very complex code
for (int y = a.t; y <= a.b; y++)
{
int yoffset = y * a.scan;
for (int x = a.l; x <= a.r; x++)
{
int xoffset = yoffset + x;
a.data[xoffset] = math(a.data[xoffset], b.data[xoffset]);
}
}
return 0;
}
ALM_API int math_caller(int condition, ALM_DATA& a, ALM_DATA& b);
and math_caller is defined in 'alm_quasimodo.cpp' as follows:
#include "stdafx.h"
#include "alm_quazimodo.h"
ALM_API int math_caller(int condition, ALM_DATA& a, ALM_DATA& b)
{
switch(condition)
{
case 1: return template_math_av<MATH_FX>(math_a1, a, b);
break;
case 2: return template_math_av<MATH_FX>(math_a2, a, b);
break;
case 3: return template_math_av<MATH_FX>(math_a3, a, b);
break;
// etc
}
return -1;
}
Main concern here is optimization, mainly in-lining of MATH function code, and not to break existing optimizations of original code. Without writing each instance of function for specific math operation, of course ;)
So does this template inlines properly all math functions?
And any suggestions how to optimize this function template?
If nothing, thanks for reading this lengthy question.
It all depends on your compiler, optimization level, and how and where are math_a1 to math_a3 functions defined.
Usually, the compiler can optimize this if the functions in question are inline function in the same compilation unit as the rest of the code.
If this doesn't happen for you, you may want to consider functors instead of functions.
Here are some simple examples I experimented with. You can do the same for your function, and check the behavior of different compilers.
For my example, GCC 7.3 and clang 6.0 are pretty good in optimizing-out function calls (provided they see the definition of the function of course). However, somewhat surprisingly, ICC 18.0.0 is only able to optimize-out functors and closures. Even inline functions give it some trouble.
Just to have some code here in case the link stops working in the future.
For the following code:
template <typename T, int size, typename Closure>
T accumulate(T (&array)[size], T init, Closure closure) {
for (int i = 0; i < size; ++i) {
init = closure(init, array[i]);
}
return init;
}
int sum(int x, int y) { return x + y; }
inline int sub_inline(int x, int y) { return x - y; }
struct mul_functor {
int operator ()(int x, int y) const { return x * y; }
};
extern int extern_operation(int x, int y);
int accumulate_function(int (&array)[5]) {
return accumulate(array, 0, sum);
}
int accumulate_inline(int (&array)[5]) {
return accumulate(array, 0, sub_inline);
}
int accumulate_functor(int (&array)[5]) {
return accumulate(array, 1, mul_functor());
}
int accumulate_closure(int (&array)[5]) {
return accumulate(array, 0, [](int x, int y) { return x | y; });
}
int accumulate_exetern(int (&array)[5]) {
return accumulate(array, 0, extern_operation);
}
GCC 7.3 (x86) produces the following assembly:
sum(int, int):
lea eax, [rdi+rsi]
ret
accumulate_function(int (&) [5]):
mov eax, DWORD PTR [rdi+4]
add eax, DWORD PTR [rdi]
add eax, DWORD PTR [rdi+8]
add eax, DWORD PTR [rdi+12]
add eax, DWORD PTR [rdi+16]
ret
accumulate_inline(int (&) [5]):
mov eax, DWORD PTR [rdi]
neg eax
sub eax, DWORD PTR [rdi+4]
sub eax, DWORD PTR [rdi+8]
sub eax, DWORD PTR [rdi+12]
sub eax, DWORD PTR [rdi+16]
ret
accumulate_functor(int (&) [5]):
mov eax, DWORD PTR [rdi]
imul eax, DWORD PTR [rdi+4]
imul eax, DWORD PTR [rdi+8]
imul eax, DWORD PTR [rdi+12]
imul eax, DWORD PTR [rdi+16]
ret
accumulate_closure(int (&) [5]):
mov eax, DWORD PTR [rdi+4]
or eax, DWORD PTR [rdi+8]
or eax, DWORD PTR [rdi+12]
or eax, DWORD PTR [rdi]
or eax, DWORD PTR [rdi+16]
ret
accumulate_exetern(int (&) [5]):
push rbp
push rbx
lea rbp, [rdi+20]
mov rbx, rdi
xor eax, eax
sub rsp, 8
.L8:
mov esi, DWORD PTR [rbx]
mov edi, eax
add rbx, 4
call extern_operation(int, int)
cmp rbx, rbp
jne .L8
add rsp, 8
pop rbx
pop rbp
ret
I have simple class using a kind of ATL database access.
All functions are defined in a header file.
The problematic functions all do the same. There are some macros in use. The generated code looks like this
void InitBindings()
{
if (sName) // Static global char*
m_sTableName = sName; // Save into member
{ AddCol("Name", some_constant_data... _GetOleDBType(...), ...); };
{ AddCol("Name1", some_other_constant_data_GetOleDBType(...), ...); };
...
}
AddCol returns a reference to a structure, but as you see it is ignored.
When I look into the assembler code where I have a function that uses 6 AddCol calls I can see that the function requires 2176 bytes of stack space. I have functions that requires 20kb and more. And in the debugger I can see that the stack isn't use at all. (All initialized to 0xCC and never touched)
See assembler code at the end.
The problem can be seen with VS-2015, and VS-2017.Only in Debug mode.
In Release mode the function reserves no extra stack space at all.
The only rule I see is; more AddCol calls, will cause more stack to be reserved. I can see that approximativ 500bytes per AddCol call is reserved.
Again: The function returns no object, it returns a reference to the binding information.
I already used the following pragmas in front of the function (but inside the class definition in the header):
__pragma(runtime_checks("", off)) __pragma(optimize("ts", on)) __pragma(strict_gs_check(push, off))
But no avail. This pragmas should turn optimization on, switches off runtime checks and stack checks. How can I reduce this unneeded stack space that is allocated. In some cases I can see stack overflows in the debug version, when this functions are used. No problems in the release version.
; 325 : BIND_BEGIN(CMasterData, _T("tblMasterData"))
push ebp
mov ebp, esp
sub esp, 2176 ; 00000880H
push ebx
push esi
push edi
mov DWORD PTR _this$[ebp], ecx
mov eax, OFFSET ??_C#_1BM#GOLNKAI#?$AAt?$AAb?$AAl?$AAM?$AAa?$AAs?$AAt?$AAe?$AAr?$AAD?$AAa?$AAt?$AAa?$AA?$AA#
test eax, eax
je SHORT $LN2#InitBindin
push OFFSET ??_C#_1BM#GOLNKAI#?$AAt?$AAb?$AAl?$AAM?$AAa?$AAs?$AAt?$AAe?$AAr?$AAD?$AAa?$AAt?$AAa?$AA?$AA#
mov ecx, DWORD PTR _this$[ebp]
add ecx, 136 ; 00000088H
call DWORD PTR __imp_??4?$CStringT#_WV?$StrTraitMFC_DLL#_WV?$ChTraitsCRT#_W#ATL#####ATL##QAEAAV01#PB_W#Z
$LN2#InitBindin:
; 326 : // Columns:
; 327 : B$C_IDENT (_T("Id"), m_lId);
push 0
push 0
push 1
push 4
push 0
call ?_GetOleDBType#ATL##YAGAAJ#Z ; ATL::_GetOleDBType
add esp, 4
movzx eax, ax
push eax
push 0
push OFFSET ??_C#_15NCCOGFKM#?$AAI?$AAd?$AA?$AA#
mov ecx, DWORD PTR _this$[ebp]
call ?AddCol#CDBAccess#DB##QAEAAUS_BIND#2#PB_WKGKW4TYPE#32#0_N#Z ; DB::CDBAccess::AddCol
; 328 : B$C (_T("Name"), m_szName);
push 0
push 0
push 0
push 122 ; 0000007aH
mov eax, 4
push eax
call ?_GetOleDBType#ATL##YAGQA_W#Z ; ATL::_GetOleDBType
add esp, 4
movzx ecx, ax
push ecx
push 4
push OFFSET ??_C#_19DINFBLAK#?$AAN?$AAa?$AAm?$AAe?$AA?$AA#
mov ecx, DWORD PTR _this$[ebp]
call ?AddCol#CDBAccess#DB##QAEAAUS_BIND#2#PB_WKGKW4TYPE#32#0_N#Z ; DB::CDBAccess::AddCol
; 329 : B$C (_T("Data"), m_data);
push 0
push 0
push 0
push 4
push 128 ; 00000080H
call ?_GetOleDBType#ATL##YAGAAVCComBSTR#1##Z ; ATL::_GetOleDBType
add esp, 4
movzx eax, ax
push eax
push 128 ; 00000080H
push OFFSET ??_C#_19IEEMEPMH#?$AAD?$AAa?$AAt?$AAa?$AA?$AA#
mov ecx, DWORD PTR _this$[ebp]
call ?AddCol#CDBAccess#DB##QAEAAUS_BIND#2#PB_WKGKW4TYPE#32#0_N#Z ; DB::CDBAccess::AddCol
It is a compiler bug. Already known in connect.
EDIT The problem seams to be fixed in VS-2017 15.5.1
The problem has to do with a bug in the built in offsetof.
It is not possible for me to #undef _CRT_USE_BUILTIN_OFFSETOF as written in this case.
For me it only works to #undef offsetof and to use one of this:
#define myoffsetof1(s,m) ((size_t)&reinterpret_cast<char const volatile&>((((s*)0)->m)))
#define myoffsetof2(s, m) ((size_t)&(((s*)0)->m))
#undef offsetof
#define offsetof myoffsetof1
All ATL DB consumers are affected.
Here is a minimum repro, that shows the bug. Set a breakpint on the Init function. Look into the assembler code and wonder how much stack is used!
// StackUsage.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <string>
#include <list>
#include <iostream>
using namespace std;
struct CRec
{
char t1[20];
char t2[20];
char t3[20];
char t4[20];
char t5[20];
int i1, i2, i3, i4, i5;
GUID g1, g2, g3, g4, g5;
DBTIMESTAMP d1, d2, d3, d4, d5;
};
#define sizeofmember(s,m) sizeof(reinterpret_cast<const s *>(0)->m)
#define typeofmember(c,m) _GetOleDBType(((c*)0)->m)
#define myoffsetof1(s,m) ((size_t)&reinterpret_cast<char const volatile&>((((s*)0)->m)))
#define myoffsetof2(s, m) ((size_t)&(((s*)0)->m))
// Undef this lines to fix the bug
// #undef offsetof
// #define offsetof myoffsetof1
#define COL(n,v) { AddCol(n,offsetof(CRec,v),typeofmember(CRec,v),sizeofmember(CRec,v)); }
class CFoo
{
public:
CFoo()
{
Init();
}
void Init()
{
COL("t1", t1);
COL("t2", t2);
COL("t3", t3);
COL("t4", t4);
COL("t5", t5);
COL("i1", i1);
COL("i2", i2);
COL("i3", i3);
COL("i4", i4);
COL("i5", i5);
COL("g1", g1);
COL("g2", g2);
COL("g2", g3);
COL("g2", g4);
COL("g2", g5);
COL("d1", d1);
COL("d2", d2);
COL("d2", d3);
COL("d2", d4);
COL("d2", d5);
}
void AddCol(PCSTR szName, ULONG nOffset, DBTYPE wType, ULONG nSize)
{
cout << szName << '\t' << nOffset << '\t' << wType << '\t' << nSize << endl;
}
};
int main()
{
CFoo foo;
return 0;
}
I have the following C++ code
int myFuncSum( int a, int b)
{
int c;
c = a + b;
return c;
}
int main(int argc, char *argv[])
{
int result;
result = myFuncSum(100, 200);
return 0;
}
When I step through this in the disassembly window of Visual Studio 2008 on my Win7 Pro machine, I see the following for the call to myFuncSum():
int myFuncSum( int a, int b)
{
001C1000 push ebp
001C1001 mov ebp,esp
001C1003 push ecx
int c;
c = a + b;
001C1004 mov eax,dword ptr [a] <--------------------
001C1007 add eax,dword ptr [b] <--------------------
001C100A mov dword ptr [c],eax <--------------------
return c;
001C100D mov eax,dword ptr [c] <--------------------
}
As I have indicated, the four lines that refer to the variables a, b and c refer to them as such, not as offsets relative to EBP.
Can anyone please suggest what I need to do to Visual Studio to get it to refer to them via EBP? I have already disabled C++ optimisations in the project settings; without doing so I don't even get my function called.
The disassembler is just trying to be helpful, not spamming you too much about irrelevant details. Not entirely accidental, this syntax abbreviation is also permitted in inline assembly with the __asm keyword.
You can switch back-and-forth, right-click the disassembly window and untick Show Symbol Names. Now you'll see the real machine code:
c = a + b;
003613DE mov eax,dword ptr [ebp+8]
003613E1 add eax,dword ptr [ebp+0Ch]
003613E4 mov dword ptr [ebp-8],eax
return c;
003613E7 mov eax,dword ptr [ebp-8]
So I've asked this before but with significantly less detail. The question title accurately describes the problem: I have a method in C++ that I am trying to call from assembly (x86) that has both parameters and a return value. I have a rough understanding, at best, of assembly and a fairly solid understanding of C++ (otherwise I would not have undertaken this problem). Here's what I have as far as code goes:
// methodAddr is a pointer to the method address
void* methodAddr = method->Address;
// buffer is an int array of parameter values. The parameters can be anything (of any type)
// but are copied into an int array so they can be pushed onto the stack in reverse order
// 4 bytes at a time (as in push (int)). I know there's an issue here that is irrelevent to my baseline testing, in that if any parameter is over 4 bytes it will be broken and
// reversed (which is not good) but for basic testing this isn't an issue, so overlook this.
for (int index = bufferElementCount - 1; index >= 0; index--)
{
int val = buffer[index];
__asm
{
push val
}
}
int returnValueCount = 0;
// if there is a return value, allocate some space for it and push that onto the stack after
// the parameters have been pushed on
if (method->HasReturnValue)
{
*returnSize = method->ReturnValueSize;
outVal = new char[*returnSize];
returnValueCount = (*returnSize / 4) + (*returnSize % 4 != 0 ? 1 : 0);
memset(outVal, 0, *returnSize);
for (int index = returnValueCount - 1; index >= 0; index--)
{
char* addr = ((char*)outVal) + (index * 4);
__asm
{
push addr
}
}
}
// calculate the stack pointer offset so after the call we can pop the parameters and return value
int espOffset = (bufferElementCount + returnValueCount) * 4;
// call the method
__asm
{
call methodAddr;
add esp, espOffset
};
For my basic testing I am using a method with the following signature:
Person MyMethod3( int, char, int );
The problem is this: when omit the return value from the method signature, all of the parameter values are properly passed. But when I leave the method as is, the parameter data that is passed is incorrect but the value returned is correct. So my question, obviously, is what is wrong? I've tried pushing the return value space onto the stack before the parameters. The person structure is as follows:
class Person
{
public:
Text Name;
int Age;
float Cash;
ICollection<Person*>* Friends;
};
Any help would be greatly appreciated. Thanks!
I'm using Visual Studio 2013 with the November 2013 CTP compiler for C++, targeting x86.
As it relates to disassembly, this is the straight method call:
int one = 876;
char two = 'X';
int three = 9738;
Person p = MyMethod3(one, two, three);
And here is the disassembly for that:
00CB0A20 mov dword ptr [one],36Ch
char two = 'X';
00CB0A27 mov byte ptr [two],58h
int three = 9738;
00CB0A2B mov dword ptr [three],260Ah
Person p = MyMethod3(one, two, three);
00CB0A32 push 10h
00CB0A34 lea ecx,[p]
00CB0A37 call Person::__autoclassinit2 (0C6AA2Ch)
00CB0A3C mov eax,dword ptr [three]
00CB0A3F push eax
00CB0A40 movzx ecx,byte ptr [two]
00CB0A44 push ecx
00CB0A45 mov edx,dword ptr [one]
00CB0A48 push edx
00CB0A49 lea eax,[p]
00CB0A4C push eax
00CB0A4D call MyMethod3 (0C6B783h)
00CB0A52 add esp,10h
00CB0A55 mov dword ptr [ebp-4],0
My interpretation of this is as follows:
Execute the assignments to the local variables. Then create the output register. Then put the parameters in a particular register (the order here happens to be eax, ecx, and edx, which makes sense (eax and ebx are for one, ecx is for two, and edx and some other register for the last parameter?)). Then call LEA (load-effective address) which I don't understand but have understood to be a MOV. Then it calls the method with an address as the parameter? And then moves the stack pointer to pop the parameters and return value.
Any further explanation is appreciated, as I'm sure my understanding here is somewhat flawed.
I have a pointer to struct, its 0xB7CD98. And the offset to some float value 0x540. How to get this value. All its in C++ and assembler. Another thing is that this its code from my dll injected to exe.
float buffer ;
_asm {
MOV EAX, [0xB7CD98]+0x540
MOV buffer, EAX
}
But it isn't work. Why?
Why are you using Assembly for that?
float* pBuffer = (float*)(0xB7CD98 + 0x540)
printf("%f", *pBuffer);
Disassembly of your code :
0FB310E7 mov eax,0B7D2D8h;
0FB310EC mov dword ptr [buffer],eax
//It just fill 'buffer' with 0xB7CD98+0x540
What you really want is this :
DWORD basePtr = *(DWORD*)0xB7CD98;
float someVal = *(float*)(basePtr + 0x540);
Or, if you wish to obtain a permanent pointer to this value :
typedef struct _XStruct
{
BYTE fill_0[0x540];
float Value;
}*PXStruct;
//...
PXStruct basePtr = (PXStruct)0xB7CD98;
//0F0745E7 mov dword ptr [basePtr],0B7CD98h
float buffer = basePtr->Value;
//0F0745EE mov eax,dword ptr [basePtr]
//0F0745F1 fld dword ptr [eax+540h]
//0F0745F7 fstp dword ptr [buffer]