Replacement for vsscanf on msvc - c++

I've run into an issue porting a codebase from Linux (GCC) to Windows (MSVC). It seems like the C99 function vsscanf isn't available and has no obvious replacement.
I've read about a solution using the internal function _input_l and linking statically to the CRT, but unfortunately I cannot link statically since it would mess with all the plugins (as DLLs) being loaded by the application.
So is there any replacement or a way to write a wrapper for vsscanf?
Update 2016-02-24:
When this was first asked there was no native replacement but since then MSVC has implemented support for this and much more.
VS2013 and later implements vsscanf and friends.
C++11 includes support as well.
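For toolchains that do ship vsscanf (per the update above, VS2013 and later), a wrapper is just the usual va_list forwarding pattern. The following is a minimal sketch; the name scan_fields is made up for illustration and is not from the original code base.

```cpp
#include <cstdarg>
#include <cstdio>

// Hypothetical helper: parses `input` according to `fmt`, forwarding the
// caller's variadic arguments to vsscanf through a va_list.
// Returns the number of fields that were matched and assigned.
int scan_fields(const char* input, const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int matched = std::vsscanf(input, fmt, args);
    va_end(args);
    return matched;
}
```

A call such as scan_fields("-52 Hello 456 #", "%d Hello %u %c", &a, &b, &c) fills an int, an unsigned and a char, the same way a direct sscanf call would.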

A hack that should work:
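int vsscanf(const char *s, const char *fmt, va_list ap)
{
void *a[20];
int i;
for (i=0; i<sizeof(a)/sizeof(a[0]); i++) a[i] = va_arg(ap, void *);
/* pass every slot; sscanf only reads as many as the format asks for */
return sscanf(s, fmt, a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7], a[8], a[9],
a[10], a[11], a[12], a[13], a[14], a[15], a[16], a[17], a[18], a[19]);
}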
Replace 20 with the max number of args you think you might need. This code isn't terribly portable but it's only intended to be used on one particular broken system missing vsscanf so that shouldn't matter so much.

A quick search turned up several suggestions, including http://www.flipcode.net/archives/vsscanf_for_Win32.shtml

As this is tagged C++ have you considered just biting the bullet and moving away from the scanf line of functions completely? The C++ idiomatic way would be to use a std::istringstream. Rewriting to make use of that instead of looking for a vsscanf replacement would possibly be easier and more portable, not to mention having much greater type safety.
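As a rough illustration of the std::istringstream route, here is a minimal sketch. The Record type and the parse_record name are invented for this example; the real fields would depend on what the existing format strings extract.

```cpp
#include <sstream>
#include <string>

// Hypothetical record type used only for this illustration.
struct Record {
    int id = 0;
    std::string name;
    double value = 0.0;
};

// Extracts three whitespace-separated fields from `line`.
// Returns true only when all three extractions succeeded.
bool parse_record(const std::string& line, Record& out) {
    std::istringstream in(line);
    return static_cast<bool>(in >> out.id >> out.name >> out.value);
}
```

The stream operators pick the conversion from the destination type, so there is no format string that can disagree with the argument list.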

Funny it never came up for me before today. I could've sworn I'd used the function in the past. But anyway, here's a solution that works and is as safe as your arguments and format string:
template < size_t _NumArgs >
int VSSCANF_S(LPCTSTR strSrc, LPCTSTR ptcFmt, INT_PTR (&arr)[_NumArgs]) {
class vaArgs
{
vaArgs() {}
INT_PTR* m_args[_NumArgs];
public:
vaArgs(INT_PTR (&arr)[_NumArgs])
{
for(size_t nIndex=0;nIndex<_NumArgs;++nIndex)
m_args[nIndex] = &arr[nIndex];
}
};
return sscanf_s(strSrc, ptcFmt, vaArgs(arr));
}
///////////////////////////////////////////////////////////////////////////////
int _tmain(int, LPCTSTR argv[])
{
INT_PTR args[3];
int nScanned = VSSCANF_S(_T("-52 Hello 456 #"), _T("%d Hello %u %c"), args);
return printf(_T("Arg1 = %d, arg2 = %u, arg3 = %c\n"), args[0], args[1], args[2]);
}
Out:
Arg1 = -52, arg2 = 456, arg3 = #
Press any key to continue . . .

If you want to wrap sscanf and you are using C++11, you can do this:
template<typename... Args>
int mysscanf(const char* str, const char* fmt, Args... args) {
//...
return sscanf(str, fmt, args...);
}
To make this work on MSVC, you need to download this update:
http://www.microsoft.com/en-us/download/details.aspx?id=35515
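One thing the variadic-template form makes easy is checking the result against the number of destinations. A minimal sketch, assuming a compiler with variadic template support; scan_all is a made-up name, and taking the arguments as Args*... is my own choice so that only pointers are accepted.

```cpp
#include <cstdio>

// Hypothetical wrapper: forwards to sscanf and reports whether every
// destination pointer was filled. sizeof...(Args) is the number of
// destinations supplied by the caller.
template <typename... Args>
bool scan_all(const char* str, const char* fmt, Args*... args) {
    int matched = std::sscanf(str, fmt, args...);
    return matched == static_cast<int>(sizeof...(Args));
}
```

For example, scan_all("7 abc", "%d %15s", &number, word) returns true when both the int and the char buffer were filled. The format string is still not checked against the argument types, so this is no safer than sscanf itself in that respect.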

Modified from:
http://www.gamedev.net/topic/310888-no-vfscanf-in-visual-studio/
#if defined(_WIN32) && (_MSC_VER <= 1500)
static int vsscanf(
const char *buffer,
const char *format,
va_list argPtr
)
{
// Get an upper bound for the # of args
size_t count = 0;
const char* p = format;
while(1)
{
char c = *(p++);
if (c == 0)
break;
if (c == '%' && (p[0] != '*' && p[0] != '%'))
++count;
}
if (count <= 0)
return 0;
int result;
// copy stack pointer
_asm
{
mov esi, esp;
}
// push variable parameters pointers on stack
for (int i = count - 1; i >= 0; --i)
{
_asm
{
mov eax, dword ptr[i];
mov ecx, dword ptr [argPtr];
mov edx, dword ptr [ecx+eax*4];
push edx;
}
}
int stackAdvance = (2 + count) * 4;
_asm
{
// now push on the fixed params
mov eax, dword ptr [format];
push eax;
mov eax, dword ptr [buffer];
push eax;
// call sscanf, and move the return value into result
call dword ptr [sscanf];
mov result, eax;
// restore stack pointer
mov eax, dword ptr[stackAdvance];
add esp, eax;
}
return result;
}
#endif // _WIN32 / _MSC_VER <= 1500
Tested only on Visual Studio 2008.

Related

C++ ASM push offset string problem in release mode

The function I want to call is a function of a class:
void D3DBase::SetTexture(const std::string& path);
When I call it from an asm block it works in debug builds, but the release build gave an error. When I inspected memory I realized that I needed to shift the string offset by 4 bytes, and once I did that it worked.
My question is: why do I have to do that? What is the reason for it?
std::string __tmpString = "";
void SetTexture(DWORD table, const std::string& str)
{
__tmpString = str;
__asm {
#ifdef NDEBUG
push offset __tmpString - 0x4
#else
push offset __tmpString
#endif
mov ecx, table
mov eax, 0x401FC0
call eax
}
}

Why does MSVC optimize away this memcpy call?

I have the following C code (I shortened it removing some other calls and checks):
#include <stdint.h>
#include <memory.h>
extern char buffer[];
extern void getstr1(char *buff, int buflen);
extern void getstr2(char **s);
extern void dosomething(char *s);
void myfn()
{
char *s, *s1;
int len;
getstr1(buffer, 128);
getstr2(&s);
len = *s + *buffer;
memcpy(buffer + *buffer + 1, s + 1, (*s) * sizeof(char));
*buffer = len;
dosomething(buffer);
}
MSVC with the /O2 optimization option produces the following output:
_s$ = -4 ; size = 4
void myfn(void) PROC ; myfn, COMDAT
push ecx
push 128 ; 00000080H
push OFFSET char * buffer ; buffer
call void getstr1(char *,int) ; getstr1
lea eax, DWORD PTR _s$[esp+12]
push eax
call void getstr2(char * *) ; getstr2
mov eax, DWORD PTR _s$[esp+16]
push OFFSET char * buffer ; buffer
mov al, BYTE PTR [eax]
add BYTE PTR char * buffer, al
call void dosomething(char *) ; dosomething
add esp, 20 ; 00000014H
ret 0
void myfn(void) ENDP ; myfn
You can check this on Godbolt
Why did the compiler omit the memcpy call? It's interesting that declaring the external variable as "extern char buffer[N];" where N >= 2 or as "extern char *buffer;" makes the compiler use memcpy. Also replacing memcpy with memmove does the same thing. I know about possible UB when the source and destination regions overlap but here the compiler doesn't have knowledge of this.
I think this is a bug in MSVC as what you are doing is legal.
Note that there has been a similar bug filed already titled: Release build with speed optimize leaves an array uninitialized.
The code given to reproduce the problem in the bug report also uses an extern type array[];
According to the team, this issue is fixed in an upcoming release, though the report does not say which one.
What you do is perfectly legal, this is definitely a bug in MSVC.
Here is a stripped down version to file a bug report:
#include <string.h>
extern unsigned char buffer[], *s;
void myfn() {
memcpy(buffer + *buffer + 1, s + 1, *s);
*buffer = 1;
}
Compiles to:
void myfn(void) PROC ; myfn, COMDAT
mov BYTE PTR unsigned char * buffer, 1
ret 0
void myfn(void) ENDP ; myfn
Removing the statement *buffer = 1; prevents the code generation bug.
Check it on Godbolt's Compiler Explorer.

Repeating macro n times

Is there a way to "repeat" a macro n times automatically? By automatically I mean at compile time. I want to do something like this:
#define foo _asm mov eax, eax
#define bar(x) //I don't know how can I do it
int main()
{
bar(5); //would generate 5 times _asm mov eax, eax
return 0;
}
I know I can embed macros in other macros, but I don't know how to repeat something exactly n times. I want to use it in a random-sized junk generator.
You can do this using a recursive template:
// recursive step
template
<
size_t count
>
void n_asm() {
_asm mov eax, eax
n_asm<count - 1>();
}
// base of recursion
template<>
void n_asm<0>() {
}
int main()
{
n_asm<5>();
return 0;
}
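Since _asm only builds with 32-bit MSVC, here is the same recursion as a portable sketch. emit_junk is a hypothetical stand-in for the `_asm mov eax, eax` statement and simply counts its calls, and I've used a class template with a static member instead of a function template; neither detail comes from the answer above.

```cpp
#include <cstddef>

// Stand-in for the `_asm mov eax, eax` statement, so the sketch compiles
// on any compiler; it just counts how many times it ran.
inline void emit_junk(int& counter) { ++counter; }

// Recursive step: emit once, then continue with Count - 1.
template <std::size_t Count>
struct Repeat {
    static void run(int& counter) {
        emit_junk(counter);
        Repeat<Count - 1>::run(counter);
    }
};

// Base of recursion: nothing left to emit.
template <>
struct Repeat<0> {
    static void run(int&) {}
};
```

Repeat<5>::run(counter) expands at compile time into five nested calls. Whether those calls end up as one straight run of instructions depends on the optimizer inlining them.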

Calling C++ Method from Assembly with Parameters and Return Value

So I've asked this before but with significantly less detail. The question title accurately describes the problem: I have a method in C++ that I am trying to call from assembly (x86) that has both parameters and a return value. I have a rough understanding, at best, of assembly and a fairly solid understanding of C++ (otherwise I would not have undertaken this problem). Here's what I have as far as code goes:
// methodAddr is a pointer to the method address
void* methodAddr = method->Address;
// buffer is an int array of parameter values. The parameters can be anything (of any type)
// but are copied into an int array so they can be pushed onto the stack in reverse order
// 4 bytes at a time (as in push (int)). I know there's an issue here that is irrelevant to my baseline testing, in that if any parameter is over 4 bytes it will be broken and
// reversed (which is not good) but for basic testing this isn't an issue, so overlook this.
for (int index = bufferElementCount - 1; index >= 0; index--)
{
int val = buffer[index];
__asm
{
push val
}
}
int returnValueCount = 0;
// if there is a return value, allocate some space for it and push that onto the stack after
// the parameters have been pushed on
if (method->HasReturnValue)
{
*returnSize = method->ReturnValueSize;
outVal = new char[*returnSize];
returnValueCount = (*returnSize / 4) + (*returnSize % 4 != 0 ? 1 : 0);
memset(outVal, 0, *returnSize);
for (int index = returnValueCount - 1; index >= 0; index--)
{
char* addr = ((char*)outVal) + (index * 4);
__asm
{
push addr
}
}
}
// calculate the stack pointer offset so after the call we can pop the parameters and return value
int espOffset = (bufferElementCount + returnValueCount) * 4;
// call the method
__asm
{
call methodAddr;
add esp, espOffset
};
For my basic testing I am using a method with the following signature:
Person MyMethod3( int, char, int );
The problem is this: when I omit the return value from the method signature, all of the parameter values are properly passed. But when I leave the method as is, the parameter data that is passed is incorrect but the value returned is correct. So my question, obviously, is what is wrong? I've tried pushing the return value space onto the stack before the parameters. The person structure is as follows:
class Person
{
public:
Text Name;
int Age;
float Cash;
ICollection<Person*>* Friends;
};
Any help would be greatly appreciated. Thanks!
I'm using Visual Studio 2013 with the November 2013 CTP compiler for C++, targeting x86.
As it relates to disassembly, this is the straight method call:
int one = 876;
char two = 'X';
int three = 9738;
Person p = MyMethod3(one, two, three);
And here is the disassembly for that:
00CB0A20 mov dword ptr [one],36Ch
char two = 'X';
00CB0A27 mov byte ptr [two],58h
int three = 9738;
00CB0A2B mov dword ptr [three],260Ah
Person p = MyMethod3(one, two, three);
00CB0A32 push 10h
00CB0A34 lea ecx,[p]
00CB0A37 call Person::__autoclassinit2 (0C6AA2Ch)
00CB0A3C mov eax,dword ptr [three]
00CB0A3F push eax
00CB0A40 movzx ecx,byte ptr [two]
00CB0A44 push ecx
00CB0A45 mov edx,dword ptr [one]
00CB0A48 push edx
00CB0A49 lea eax,[p]
00CB0A4C push eax
00CB0A4D call MyMethod3 (0C6B783h)
00CB0A52 add esp,10h
00CB0A55 mov dword ptr [ebp-4],0
My interpretation of this is as follows:
Execute the assignments to the local variables. Then create the output register. Then put the parameters in a particular register (the order here happens to be eax, ecx, and edx, which makes sense (eax and ebx are for one, ecx is for two, and edx and some other register for the last parameter?)). Then call LEA (load-effective address) which I don't understand but have understood to be a MOV. Then it calls the method with an address as the parameter? And then moves the stack pointer to pop the parameters and return value.
Any further explanation is appreciated, as I'm sure my understanding here is somewhat flawed.
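One way to read the listing, assuming the 32-bit cdecl convention this build appears to use: the three `push` instructions put three, two and one on the stack, then `lea eax,[p]` / `push eax` pushes the address of p as one extra argument, and `add esp,10h` removes all four 4-byte slots. In other words, the struct return is handled as if the function took a pointer to its result as a leading parameter. The sketch below shows that equivalence in plain C++; Result and both function names are made up for illustration, and Result is only a 16-byte stand-in for Person.

```cpp
// Hypothetical 16-byte stand-in for Person, used only for illustration.
struct Result {
    int a;
    int b;
    int c;
    int d;
};

// What the source code declares: a function returning a struct by value.
Result make_result(int one, char two, int three) {
    Result r = {one, static_cast<int>(two), three, 0};
    return r;
}

// Roughly what the call in the listing amounts to: the caller supplies the
// address of the destination object as an extra leading argument, and the
// callee writes the result through it.
void make_result_lowered(Result* out, int one, char two, int three) {
    out->a = one;
    out->b = static_cast<int>(two);
    out->c = three;
    out->d = 0;
}
```

Seen this way, a hand-written call sequence would push a single pointer to the result storage after the parameters, rather than one push per 4 bytes of return value.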

How can I prevent MSVC++ from over-allocating stack space for a switch statement?

As part of updating the toolchain for a legacy codebase, we would like to move from the Borland C++ 5.02 compiler to the Microsoft compiler (VS2008 or later). This is an embedded environment where the stack address space is predefined and fairly limited. It turns out that we have a function with a large switch statement which causes a much larger stack allocation under the MS compiler than with Borland's and, in fact, results in a stack overflow.
The form of the code is something like this:
#ifdef PKTS
#define RETURN_TYPE SPacket
typedef struct
{
int a;
int b;
int c;
int d;
int e;
int f;
} SPacket;
SPacket error = {0,0,0,0,0,0};
#else
#define RETURN_TYPE int
int error = 0;
#endif
extern RETURN_TYPE pickone(int key);
void findresult(int key, RETURN_TYPE* result)
{
switch(key)
{
case 1 : *result = pickone(5 ); break;
case 2 : *result = pickone(6 ); break;
case 3 : *result = pickone(7 ); break;
case 4 : *result = pickone(8 ); break;
case 5 : *result = pickone(9 ); break;
case 6 : *result = pickone(10); break;
case 7 : *result = pickone(11); break;
case 8 : *result = pickone(12); break;
case 9 : *result = pickone(13); break;
case 10 : *result = pickone(14); break;
case 11 : *result = pickone(15); break;
default : *result = error; break;
}
}
When compiled with cl /O2 /FAs /c /DPKTS stack_alloc.cpp, a portion of the listing file looks like this:
_TEXT SEGMENT
$T2592 = -264 ; size = 24
$T2582 = -240 ; size = 24
$T2594 = -216 ; size = 24
$T2586 = -192 ; size = 24
$T2596 = -168 ; size = 24
$T2590 = -144 ; size = 24
$T2598 = -120 ; size = 24
$T2588 = -96 ; size = 24
$T2600 = -72 ; size = 24
$T2584 = -48 ; size = 24
$T2602 = -24 ; size = 24
_key$ = 8 ; size = 4
_result$ = 12 ; size = 4
?findresult@@YAXHPAUSPacket@@@Z PROC ; findresult, COMDAT
; 27 : switch(key)
mov eax, DWORD PTR _key$[esp-4]
dec eax
sub esp, 264 ; 00000108H
...
$LN11@findresult:
; 30 : case 2 : *result = pickone(6 ); break;
push 6
lea ecx, DWORD PTR $T2584[esp+268]
push ecx
jmp SHORT $LN17@findresult
$LN10@findresult:
; 31 : case 3 : *result = pickone(7 ); break;
push 7
lea ecx, DWORD PTR $T2586[esp+268]
push ecx
jmp SHORT $LN17@findresult
$LN17@findresult:
call ?pickone@@YA?AUSPacket@@H@Z ; pickone
mov edx, DWORD PTR [eax]
mov ecx, DWORD PTR _result$[esp+268]
mov DWORD PTR [ecx], edx
mov edx, DWORD PTR [eax+4]
mov DWORD PTR [ecx+4], edx
mov edx, DWORD PTR [eax+8]
mov DWORD PTR [ecx+8], edx
mov edx, DWORD PTR [eax+12]
mov DWORD PTR [ecx+12], edx
mov edx, DWORD PTR [eax+16]
mov DWORD PTR [ecx+16], edx
mov eax, DWORD PTR [eax+20]
add esp, 8
mov DWORD PTR [ecx+20], eax
; 41 : }
; 42 : }
add esp, 264 ; 00000108H
ret 0
The allocated stack space includes dedicated locations for each case to temporarily store the structure returned from pickone(), though in the end, only one value will be copied to the result structure. As you can imagine, with larger structures, more cases, and recursive calls in this function, the available stack space is consumed rapidly.
If the return type is a built-in type such as int, as when the above is compiled without the /DPKTS directive, each case copies directly to result, and stack usage is more efficient:
$LN10@findresult:
; 31 : case 3 : *result = pickone(7 ); break;
push 7
call ?pickone@@YAHH@Z ; pickone
mov ecx, DWORD PTR _result$[esp]
add esp, 4
mov DWORD PTR [ecx], eax
; 41 : }
; 42 : }
ret 0
Can anyone explain why the compiler takes this approach and whether there's a way to convince it to do otherwise? I have limited freedom to re-architect the code, so pragmas and the like are the more desirable solutions. So far, I have not found any combination of optimization, debug, etc. arguments that make a difference.
Thank you!
EDIT
I understand that findresult() needs to allocate space for the return value of pickone(). What I don't understand is why the compiler allocates additional space for each possible case in the switch. It seems that space for one temporary would be sufficient. This is, in fact, how gcc handles the same code. Borland, on the other hand, appears to use RVO, passing the pointer all the way down and avoiding use of a temporary. The MS C++ compiler is the only one of the three that reserves space for each case in the switch.
I know that it's difficult to suggest refactoring options when you don't know which portions of the test code can change -- that's why my first question is why does the compiler behave this way in the test case. I'm hoping that if I can understand that, I can choose the best refactoring/pragma/command-line option to fix it.
Why not just
void findresult(int key, RETURN_TYPE* result)
{
if (key >= 1 && key <= 11)
*result = pickone(4+key);
else
*result = error;
}
Assuming this counts as a smaller change, I just remembered an old question about scope, specifically related to embedded compilers. Does the optimizer do any better if you wrap each case in braces to explicitly limit the temporary scope?
switch(key)
{
case 1 : { *result = pickone(5 ); break; }
Another scope-changing option:
void findresult(int key, RETURN_TYPE* result)
{
RETURN_TYPE tmp;
switch(key)
{
case 1 : tmp = pickone(5 ); break;
...
}
*result = tmp;
}
This is all a bit hand-wavy, because we're just trying to guess which input will coax a sensible response from this unfortunate optimizer.
I'm going to assume that rewriting that function is allowed, as long as the changes don't "leak" outside the function. I'm also assuming that (as mentioned in the comments) you actually have a number of separate functions to call (but that they all receive the same type of input and return the same result type).
For such a case, I'd probably change the function to something like:
RETURN_TYPE func1(int) { /* ... */ }
RETURN_TYPE func2(int) { /* ... */ }
// ...
void findresult(int key, RETURN_TYPE *result) {
typedef RETURN_TYPE (*f)(int);
f funcs[] = { func1, func2, func3, func4, func5, /* ... */ };
if (in_range(key))
*result = funcs[key](key+4);
else
*result = error;
}
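A compilable version of the table idea might look like the sketch below. Everything in it is a placeholder: ResultType stands in for RETURN_TYPE, the three handlers and kError are invented, and the bounds check replaces the unspecified in_range. It also assumes the keys stay 1-based as in the original switch, so the index is key - 1.

```cpp
#include <cstddef>

typedef int ResultType;  // stand-in for RETURN_TYPE

// Hypothetical handlers; the real ones would be the separate functions
// that the switch cases call.
ResultType handler_a(int x) { return x * 10; }
ResultType handler_b(int x) { return x + 100; }
ResultType handler_c(int x) { return -x; }

const ResultType kError = 0;  // stand-in for the global `error` value

void find_result(int key, ResultType* result) {
    typedef ResultType (*Handler)(int);
    static const Handler handlers[] = {handler_a, handler_b, handler_c};
    const std::size_t count = sizeof(handlers) / sizeof(handlers[0]);

    // Keys are 1-based, as in the original switch, so subtract 1 to index.
    if (key >= 1 && static_cast<std::size_t>(key) <= count)
        *result = handlers[key - 1](key + 4);
    else
        *result = kError;
}
```

With a single call site there is only one place where a returned temporary can live, which is the point of the rewrite. Whether MSVC then reserves just one slot would still need to be confirmed in the listing file.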