MASM str and substr? - c++

I'm currently coding an irc bot in asm
I have already done this once in C++, so I know how to solve most problems I encounter, but I need a substr()[*] function like the one seen in C++. I need the substr function to receive the server name from a PING request so I can respond with the corresponding PONG response
But I don't know how to implement it in MASM. I have heard of something called macro assembling, and substr seems to be used a lot there.
Does anyone have any idea how I can get my substr function to work?
[*] string substr ( size_t pos = 0, size_t n = npos )
This is how I use the substr() function in C++:
if(data.find("PING :") != std::string::npos){
string pong = "PONG :" + data.substr( (data.find_last_of(":")+1), (data.find_last_of("\r")-1) );
SCHiMBot.Pong(pong); // Keep the connection alive!
}
Where data is a string holding all the information the server sends me, and SCHiMBot is a class I use to talk with the server
This code is c&p'ed directly out of a bot I coded, so it should be flawless

This really isn't nearly as easy to answer as it might initially seem. The problem is pretty simple: a function like substr doesn't really exist in isolation -- it's part of a string library, and to make it useful, you just about need to at least sketch out how the library as a whole fits together, how you represent your data, etc. I.e., substr creates a string, but to do so you need to decide what a string is.
To avoid that problem, I'm going to sort of ignore what you actually asked, and give a somewhat simpler answer that's more suited to assembly language. What you really need is to start with one buffer of data, find a couple of "markers" in that buffer, and copy what's in between those markers to a designated position in another buffer. First we need the code to do the "find_last":
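; expects:
; ESI = address of buffer
; ECX = length of data in buffer (must be non-zero)
; AH = character to find
; returns:
; ESI = position of item (start of buffer if it was not found)
; ZF = set if the item was found
;
find_last proc
scan:
mov al, [esi+ecx-1] ; examine the buffer from its last byte backwards
cmp ah, al
loopnz scan ; stop on a match, or when ECX reaches zero
lea esi, [esi+ecx] ; address of the match; LEA leaves the flags alone
ret
find_last endp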
Now to copy the substring to the transmission buffer, we do something like this:
CR = 13
copy_substr proc
mov esi, offset read_buffer
mov ecx, bytes_read
mov ah, CR
call find_last ; find the carriage-return
mov edx, esi ; save its position
mov esi, offset read_buffer
mov ecx, bytes_read
mov ah, ':'
call find_last ; find the colon
inc esi ; point to character following colon
sub edx, esi ; get distance from colon+1 to CR
mov ecx, edx
; Now: ESI = address following ':'
; ECX = distance to CR
mov edi, (offset trans_buffer) + prefix_length
rep movsb ; copy the data
ret
copy_substr endp
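For comparison, the same marker-based idea can be sketched in C++ over raw buffers. This is only an illustration of the approach described above, not a translation of any particular library: the names find_last and copy_between_markers, and the choice to return 0 when the markers are missing or the destination is too small, are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns a pointer to the last occurrence of ch in buf[0..len), or nullptr.
const char* find_last(const char* buf, std::size_t len, char ch) {
    while (len > 0) {
        --len;
        if (buf[len] == ch) return buf + len;
    }
    return nullptr;
}

// Copies the bytes between the last ':' and the last '\r' of src into dst.
// Returns the number of bytes copied, or 0 if a marker is missing, the
// markers are out of order, or the result would not fit in dst_cap bytes.
// (Hypothetical helper; the 0-on-failure convention is an assumption.)
std::size_t copy_between_markers(const char* src, std::size_t src_len,
                                 char* dst, std::size_t dst_cap) {
    assert(src != nullptr && dst != nullptr);
    const char* cr = find_last(src, src_len, '\r');
    const char* colon = find_last(src, src_len, ':');
    if (cr == nullptr || colon == nullptr || cr <= colon) return 0;
    const char* first = colon + 1;                        // byte after ':'
    const std::size_t n = static_cast<std::size_t>(cr - first);
    if (n > dst_cap) return 0;
    std::memcpy(dst, first, n);                           // like rep movsb
    return n;
}
```

As in the assembly, the caller decides where in the transmission buffer the copy lands, for example just past a "PONG :" prefix.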

data.substr( (data.find_last_of(":")+1)
The first parameter of substr is the starting position. If it is greater than the length of the string, an out_of_range exception is thrown, so you should verify that this cannot happen. Note also that the second parameter is a character count, not an end position.
if(data.find("PING :") != std::string::npos)
{
size_t s1 = data.find_last_of(":");
size_t s2 = data.find_last_of("\r");
if (s1 != string::npos &&
s2 != string::npos &&
s1 < s2)
{
// the second argument of substr is a length, so pass the distance from s1+1 to s2
string pong = "PONG :" + data.substr(s1+1, s2-(s1+1));
SCHiMBot.Pong(pong); // Keep the connection alive!
}
}
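The checks can also be packaged into a small helper. The following is a minimal sketch, assuming the server line has the form "PING :token\r\n"; the function name make_pong and the empty-string return on failure are choices made for this example, not part of the original bot.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds the PONG reply for a line such as "PING :irc.example.net\r\n".
// Returns an empty string if the line has no usable token.
// (Hypothetical helper; the name and failure convention are assumptions.)
std::string make_pong(const std::string& data) {
    const std::size_t colon = data.find_last_of(':');
    const std::size_t cr = data.find_last_of('\r');
    if (colon == std::string::npos || cr == std::string::npos || cr <= colon) {
        return std::string();
    }
    const std::size_t first = colon + 1;   // first character of the token
    assert(first <= data.size());          // substr would throw otherwise
    // substr takes (position, count): the count is the distance to '\r'.
    return "PONG :" + data.substr(first, cr - first);
}
```

Because the count is computed as cr - first, the carriage return and anything after it are left out of the reply.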

Related

why does visual studio have i++ post increment as a default for autocompleting for loop?

So this is what Visual Studio gives when auto-completing a for loop.
I have been told that ++i is more efficient than i++, as the pre-increment does not require the compiler to create a temporary variable to store i, and that I should use ++i unless I actually need i++. Is there any reason that Visual Studio does this by default, or am I just overthinking it?
for (size_t i = 0; i < length; i++)
{
}
Some of this answer is based on my experience rather than on any hard facts, so please take it with a grain of salt.
Reasons for i++ as the default loop incrementor
Postfix vs Prefix
The postfix incrementor i++ produces a temporary holding the value i had before the increment; that temporary is passed 'upwards' in the expression, and the variable i itself is incremented.
The prefix incrementor ++i does not produce a temporary object, but directly increments i and passes the incremented result to the expression.
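For a user-defined type the difference is visible in the operator signatures. The sketch below uses a made-up Counter struct (an assumption for illustration only) to show where the temporary comes from in the postfix form.

```cpp
#include <cassert>

// Hypothetical counter type, used only to show the two operator forms.
struct Counter {
    int value = 0;

    // Prefix: increment in place and return the updated object by reference.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Postfix: the unused int parameter marks this overload as postfix.
    // A copy of the old state is made and returned by value.
    Counter operator++(int) {
        Counter old = *this;
        ++value;
        return old;
    }
};

// Small self-check showing what each form hands back.
void check_counter() {
    Counter a;
    Counter before = a++;   // before holds the old state
    assert(before.value == 0 && a.value == 1);

    Counter b;
    Counter& same = ++b;    // same refers to b itself
    assert(same.value == 1 && &same == &b);
}
```

For a plain int used as a loop counter, the result of the expression is discarded, so there is nothing for the copy to cost.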
Performance
Looking at two simple loops, one using i++, and the other ++i, let us examine the generated assembly
#include <stdio.h>
int main()
{
for (int i = 0; i < 5; i++) {
//
}
for (int j = 0; j < 5; ++j) {
//
}
return 0;
}
Generates the following assembly using GCC:
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 0
.L3:
cmp DWORD PTR [rbp-4], 4
jg .L2
add DWORD PTR [rbp-4], 1
jmp .L3
.L2:
mov DWORD PTR [rbp-8], 0
.L5:
cmp DWORD PTR [rbp-8], 4
jg .L4
add DWORD PTR [rbp-8], 1
jmp .L5
.L4:
mov eax, 0
pop rbp
ret
.L3 corresponds to the first loop, and .L5 corresponds to the second loop. The compiler optimizes away any differences in this case resulting in identical code, so no performance difference.
Appearance
Looking at the following three loops, which one appeals to you most aesthetically?
for (int i = 0; i < 5; i++) {
//
}
for (int i = 0; i < 5; ++i) {
//
}
for (int i = 0; i < 5; i+=1) {
//
}
For me i+=1 is right out, and ++i just feels sort of... backwards
Ergonomics
i++) - Your hand is already in the correct position to type ) after typing ++
++i) - Requires moving from upper row to homerow and back
History
Older C programmers, and current C programmers who write systems level code make frequent use of i++ to allow for writing very compact code such as:
// print some char * s
while(*s)
putc(*s++, stdout);
// strcpy from src to dest
while (*dest++=*src++);
Because of this, i++ became the incrementor C programmers reach for first, eventually becoming a sort of standard in cases where ++i and i++ are functionally equivalent.
Choosing ++i instead there would merely be a style choice.
There are no performance gains to be had in this context. That is a myth.
i++ is also more "conventional" for a simple loop over ints.
You are of course free to write your own loop in your own style!

Reverse "formula" based on a string

Is it possible to reverse this process?
string dir = textenc.Text;
uint EDI = 0x1505;
uint EDX = 0;
byte ECX = 0;
for ( int i = 0; i < dir.Length; i++ )
{
if ( dir[ i ] != '.' && dir[ i ] != '\\' )
{
EDX = EDI;
EDX = EDX << 5;
ECX = ( byte )dir[ i ];
EDX = EDX + EDI;
EDX = EDX + ECX;
EDI = EDX;
}
};
return EDI;
The dir is a string, for example, when dir is "data\font\tahoma.ttf" the output of that function would be: 2114405758.
Is there a way to retrieve the original string giving only the output number?
No.
The hash function ignores the characters . and \. You can add as many of these as you want, and it will still calculate the same value.
Note: As mentioned by others, this is a hash function, which will create infinitely many collisions.
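To make the collision concrete, here is a rough C++ rendition of the loop from the question. It assumes the C# code runs in the default unchecked context, so the uint arithmetic wraps modulo 2^32, and that the input is plain ASCII; the names path_hash and check_collision are invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// C++ rendition of the question's loop: start at 0x1505 and fold each
// character in as hash = hash * 33 + byte, skipping '.' and '\\'.
// (Sketch only; assumes ASCII input and wrapping 32-bit arithmetic.)
std::uint32_t path_hash(const std::string& dir) {
    std::uint32_t h = 0x1505;
    for (unsigned char c : dir) {
        if (c == '.' || c == '\\') continue;   // separators never reach the hash
        h = (h << 5) + h + c;                  // same as EDX = (EDI << 5) + EDI + ECX
    }
    return h;
}

// Strings that differ only in '.' and '\\' collide by construction.
void check_collision() {
    assert(path_hash("data\\font\\tahoma.ttf") == path_hash("datafonttahomattf"));
}
```

Since two different inputs map to the same number, no function can recover "the" original string from the output alone.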
The short answer is NO, it is not possible.
The long answer is that it is nearly impossible to tell whether a given hash is unique to its input.
The only way to know would be to generate hashes for all possible strings until you hit a duplicate. Either you find one, or you run out of memory first. Going a long time without hitting a duplicate may mean that the hash function is quite good, but it still does not mean it is reversible.

A variation of fibonacci assembly x86, have to call it from a c++ main method, kind of lost at few parts

This is the fibonacci code,
unsigned int fib(unsigned int n)
{
if (n==1 || n ==2)
return 1;
else
return fib(n-2) + fib(n-1);
}
but instead for my code, I have to change formula to a new one,
f(n-2)/2 + f(n-1) * 2, so the sequence is 1, 2, 4, 9, 20, 44, 98, 218
I need to write a recursive function called Mobonacci in assembly to calculate the nth number in the sequence, and also a main function in C++ that reads a positive number n, calls the Mobonacci assembly function with parameter n, and then prints the result.
So I'm kind of confused: do I write the function in assembly like I did below, then write a C++ function to call it? And how would you change my code from Fibonacci to the new formula? Here is my code. What do I need to change, and do I need to create a new part that lets the code read input? Also, is my code too short? Do I need to add anything else?
.code
main PROC
mov ecx,0
push 4 ; calculate the nth fib
call Fib ; calculate fib (eax)
call WriteDec
call Crlf
exit
main ENDP
Fib PROC
add ecx,1
push ebp
mov ebp,esp
mov eax,[ebp+8] ; get n
cmp eax,2 ; n == 2?
je exception2
cmp eax,1 ; n == 1?
je exception2
dec eax
push eax ; Fib(n-1)
call fib
add eax,
jmp Quit
Exception2:
dec eax
Quit:
pop ebp ; return EAX
ret 4 ; clean up stack
Fib ENDP
END main
Depends on where you are trying to insert asm code into your c++ code...
For gcc/linux you can do something like :
//Simple example:
void *frame; /* Frame pointer */
__asm__ ("mov %%ebp,%0":"=r"(frame));
//Complicated example:
int foo(void) {
int joe=1234, fred;
__asm__(
" mov %1,%%eax\n"
" add $2,%%eax\n"
" mov %%eax,%0\n"
:"=r" (fred) /* %0: Out */
:"r" (joe) /* %1: In */
:"%eax" /* Overwrite */
);
return fred;
}
The important thing is to understand how to use your asm function in cpp.
You can find some useful things about this subject here : https://www.cs.uaf.edu/2011/fall/cs301/lecture/10_12_asm_c.html
About the second part of your question.
To multiply, you can use the "mul" instruction, and to divide, "div".
So if you want to compute f(n-1) * 2,
you have to take the value in register %eax after the "call fib" and use mul on it.
Just have a look here:
http://www.tutorialspoint.com/assembly_programming/assembly_arithmetic_instructions.htm
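As a reference to check the assembly version against, the recurrence from the question can be sketched in plain C++. The base cases f(1) = 1 and f(2) = 2 are read off the sequence 1, 2, 4, 9, 20, 44, 98, 218 given in the question, and the division is assumed to be unsigned integer division; the lowercase name mobonacci is just a placeholder.

```cpp
#include <cassert>

// Reference version of the recurrence f(n) = f(n-2)/2 + f(n-1)*2.
// Base cases f(1) = 1 and f(2) = 2 are taken from the listed sequence.
unsigned int mobonacci(unsigned int n) {
    assert(n >= 1);   // the sequence is defined for positive n only
    if (n == 1) return 1;
    if (n == 2) return 2;
    // Integer division truncates, e.g. f(4) = 2/2 + 4*2 = 9.
    return mobonacci(n - 2) / 2 + mobonacci(n - 1) * 2;
}
```

Because both factors are 2 and the values are unsigned, the assembly version could also use a one-bit right shift (shr) and left shift (shl) in place of div and mul.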

Is this a compiler error in Visual Studio 2010?

I have a bug in this conditional:
while(CurrentObserverPathPointDisplacement > lengthToNextPoint && CurrentObserverPathPointIndex < (PathSize - 1) )
{
CurrentObserverPathPointIndex = CurrentObserverPathPointIndex + 1;
CurrentObserverPathPointDisplacement -= lengthToNextPoint;
lengthToNextPoint = (CurrentObserverPath->pathPoints[min((PathSize - 1),CurrentObserverPathPointIndex + 1)] - CurrentObserverPath->pathPoints[CurrentObserverPathPointIndex]).length();
}
It seems to get stuck in an infinite loop in Release mode. It works fine in Debug mode or, more interestingly, when I put a debug print on the last line:
OutputInDebug("Here");
Here is the generated assembly for the conditional itself:
while(CurrentObserverPathPointDisplacement > lengthToNextPoint && CurrentObserverPathPointIndex < (PathSize - 1) )
00F074CF fcom qword ptr [dist]
00F074D2 fnstsw ax
00F074D4 test ah,5
00F074D7 jp ModelViewData::moveCameraAndCenterOnXYPlaneForwardBackward+27Eh (0F0753Eh)
00F074D9 mov eax,dword ptr [dontRotate]
00F074DC cmp eax,ebx
00F074DE jge ModelViewData::moveCameraAndCenterOnXYPlaneForwardBackward+27Eh (0F0753Eh)
{
You can see that for the second condition, it seems to move the value of 'dontRotate', a function parameter of type bool, into eax, and then compare against it, yet dontRotate is used nowhere near that bit of code.
I understand that this may not be much to go on, but to me it looks like an obvious compiler error. Sadly, I'm not sure how to distill it down to a self-contained problem that I could put in a bug report.
Edit:
Not the actual declarations, but the types:
double CurrentObserverPathPointDisplacement;
double lengthToNextPoint;
int CurrentObserverPathPointIndex;
int PathSize;
vector<vector3<double>> CurrentObserverPath::pathPoints;
Edit2:
Once I add in the debug print statement to the end of the while, this is the assembly that gets generated, which no longer expresses the bug:
while(CurrentObserverPathPointDisplacement > lengthToNextPoint && CurrentObserverPathPointIndex < (PathSize - 1) )
00B1751E fcom qword ptr [esi+208h]
00B17524 fnstsw ax
00B17526 test ah,5
00B17529 jp ModelViewData::moveCameraAndCenterOnXYPlaneForwardBackward+2D6h (0B175A6h)
00B1752B mov eax,dword ptr [esi+200h]
00B17531 cmp eax,ebx
00B17533 jge ModelViewData::moveCameraAndCenterOnXYPlaneForwardBackward+2D6h (0B175A6h)
{
Here:
while(/* foo */ && CurrentObserverPathPointIndex < (PathSize - 1) )
{
CurrentObserverPathPointIndex = CurrentObserverPathPointIndex + 1;
Since this is the only point (unless min does something really nasty) in the loop where CurrentObserverPathPointIndex is changed and both CurrentObserverPathPointIndex and PathSize are signed integers of the same size (and PathSize is small enough to rule out integer promotion issues), the remaining floating point fiddling is irrelevant. The loop must terminate eventually (it may take quite a long time if the initial value of CurrentObserverPathPointIndex is small compared to PathSize, though).
This allows only one conclusion; If the compiler generates code that does not terminate (ever), the compiler is wrong.
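The termination argument can be checked on a reduced stand-in for the loop. The sketch below is an assumption-laden simplification: it replaces the vector3 path points with one-dimensional doubles and uses an invented function name, so it only illustrates that the index bound alone limits the number of iterations.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Stand-in for the question's loop using 1-D points (hypothetical
// simplification). Returns the number of iterations executed.
int advance_along_path(const std::vector<double>& points, int& index,
                       double& displacement) {
    const int pathSize = static_cast<int>(points.size());
    assert(pathSize >= 2 && index >= 0 && index < pathSize - 1);
    double lengthToNext = std::fabs(points[index + 1] - points[index]);
    int iterations = 0;
    while (displacement > lengthToNext && index < pathSize - 1) {
        index = index + 1;                  // the only change to the index
        displacement -= lengthToNext;
        const int next = std::min(pathSize - 1, index + 1);
        lengthToNext = std::fabs(points[next] - points[index]);
        ++iterations;
    }
    return iterations;                      // never exceeds pathSize - 1
}
```

Whatever the floating point values are, the index grows by one per pass and the second condition stops the loop once it reaches pathSize - 1.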
It looks like PathSize doesn't change in the loop, so the compiler can compute PathSize - 1 before looping and coincidentally use the same memory location as dontRotate, whatever that is.
More importantly, how many elements are there in CurrentObserverPath->pathPoints?
Your loop condition includes this test:
CurrentObserverPathPointIndex < (PathSize - 1)
Inside your loop is this assignment:
CurrentObserverPathPointIndex = CurrentObserverPathPointIndex + 1;
followed by this further incremented subscript:
[min((PathSize - 1),CurrentObserverPathPointIndex + 1)]
Maybe your code appeared to work in debug mode because random undefined behaviour appeared to work?

Why are some statements skipped in gdb?

I'm debugging this code :
len = NGX_SYS_NERR * sizeof(ngx_str_t);
ngx_sys_errlist = malloc(len);
if (ngx_sys_errlist == NULL) {
goto failed;
}
for (err = 0; err < NGX_SYS_NERR; err++) {
But in gdb if (ngx_sys_errlist == NULL) { is skipped directly:
(gdb)
59 ngx_sys_errlist = malloc(len);
(gdb) n
64 for (err = 0; err < NGX_SYS_NERR; err++) {
I have experienced this before too, but never found out the reason. Does anyone know?
Is it a bug?
UPDATE
0x000000000041be9d <ngx_strerror_init+0>: mov %rbx,-0x30(%rsp)
0x000000000041bea2 <ngx_strerror_init+5>: mov %rbp,-0x28(%rsp)
0x000000000041bea7 <ngx_strerror_init+10>: mov %r12,-0x20(%rsp)
0x000000000041beac <ngx_strerror_init+15>: mov %r13,-0x18(%rsp)
0x000000000041beb1 <ngx_strerror_init+20>: mov %r14,-0x10(%rsp)
0x000000000041beb6 <ngx_strerror_init+25>: mov %r15,-0x8(%rsp)
0x000000000041bebb <ngx_strerror_init+30>: sub $0x38,%rsp
0x000000000041bebf <ngx_strerror_init+34>: mov $0x840,%edi
0x000000000041bec4 <ngx_strerror_init+39>: callq 0x402388 <malloc@plt>
0x000000000041bec9 <ngx_strerror_init+44>: mov %rax,0x26e718(%rip) # 0x68a5e8 <ngx_sys_errlist>
0x000000000041bed0 <ngx_strerror_init+51>: mov $0x840,%r12d
0x000000000041bed6 <ngx_strerror_init+57>: test %rax,%rax
0x000000000041bed9 <ngx_strerror_init+60>: je 0x41bf56 <ngx_strerror_init+185>
0x000000000041bedb <ngx_strerror_init+62>: mov $0x0,%r13d
0x000000000041bee1 <ngx_strerror_init+68>: mov $0x0,%r14d
0x000000000041bee7 <ngx_strerror_init+74>: mov $0x0,%r15d
0x000000000041beed <ngx_strerror_init+80>: mov %r13d,%edi
0x000000000041bef0 <ngx_strerror_init+83>: callq 0x402578 <strerror@plt>
UPDATE
Has nobody else run into the same thing when using gdb? It happens to me frequently when debugging.
Most likely the two statements were optimized into a single set-and-test expression, which then can't be decomposed into the original two lines. The generated pseudocode is likely to be something like
call _malloc
jz _failed
mov acc, _ngx_sys_errlist
where the test now happens before the assignment; do you let the source level trace go backwards to reflect this?
Please check:
a) whether you are debugging a release build (if one exists)
b) whether your source file has been modified
If you still have the issue, please provide the details (compiler and version, debugger version, platform, code, ...)