I'm trying to get a hold of assembly, but there's one probably very simple thing I don't understand.
Consider the following simple example:
long long * values = new long long[2];
values[0] = 10;
values[1] = 20;
int j = -1;
values[j+2] = 15; // xxxxxxx
Now, the last line (marked with xxxxxx) disassembles to:
000A6604 mov eax,dword ptr [j]
000A6607 mov ecx,dword ptr [values]
000A660A mov dword ptr [ecx+eax*8+10h],0Fh
First question: What is actually stored in eax and ecx? Is it the actual values (i.e. -1 for "j", and the two long long values 10 and 20 for "values"), or is it merely a memory address (e.g. something like &p, &values) pointing to some place where the values are being stored?
Second question, I know what the third line is supposed to do, but I'm not quite sure why this actually works.
So my understanding is that it copies the value 0x0F into the specified memory location. The memory location is basically
- the location of the first element stored in ecx
- plus the size of long long in bytes (= 8) * the value of eax (which equals j, so -1)
- plus the generic offset of 16 bytes (2 times the size of long long).
What I don't get is: In this expression, ecx seems to be a memory address, while eax seems to be a value (-1). How is this possible? Seeing they were defined in pretty much the same way, shouldn't eax and ecx either both contain memory addresses, or both values?
Thanks.
eax and ecx are registers -- the first two instructions load those registers with the values used in the calculation, i.e. j and values (where values means the base address of the array by that name).
I know what the third line is supposed to do, but I'm not quite sure why this actually works
The instruction mov dword ptr [ecx+eax*8+10h],0Fh means move the value 0Fh (i.e. 15 decimal) into the location ecx+eax*8+10h. To figure that out, consider each piece:
ecx is the base address of the values array
eax is the value at j, i.e. -1
eax*8 is j converted to an offset in bytes -- the size of a long long is 8 bytes
eax*8+10h adds 10h, which is 16 decimal, i.e. 2*8, so this is j+2 converted to a byte offset
ecx+eax*8+10h adds that final offset to the base address of the array to determine the location in which to store the value 15
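The address arithmetic described above can be reproduced in C++ as a quick sanity check. This is only a sketch: effectiveAddress and matchesCompilerIndexing are hypothetical helper names, and it assumes sizeof(long long) is 8, as in the disassembly.

```cpp
#include <cstdint>

static_assert(sizeof(long long) == 8, "the scale factor 8 below assumes 8-byte long long");

// Hypothetical helper mirroring the addressing mode [ecx + eax*8 + 10h]:
// base plays the role of ecx (the address of values[0]),
// index plays the role of eax (the value of j).
std::uintptr_t effectiveAddress(std::uintptr_t base, std::int32_t index) {
    // scale = sizeof(long long) = 8, displacement = 0x10 = 2 * 8
    return base + static_cast<std::intptr_t>(index) * 8 + 0x10;
}

// Returns true when the hand-computed address equals &values[j + 2].
bool matchesCompilerIndexing(long long* values, int j) {
    auto base = reinterpret_cast<std::uintptr_t>(values);
    auto expected = reinterpret_cast<std::uintptr_t>(&values[j + 2]);
    return effectiveAddress(base, j) == expected;
}
```

With j = -1 the two terms combine to base - 8 + 16 = base + 8, which is the address of values[1]. So ecx holds an address, eax holds a plain number, and the addressing mode is what combines them.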
#include <iostream>
#include <random>
using namespace std;
struct TradeMsg {
int64_t timestamp; // 0->7
char exchange; // 8
char symbol[17]; // 9->25
char sale_condition[4]; // 26 -> 29
char source_of_trade; // 30
uint8_t trade_correction; // 31
int64_t trade_volume; // 32->39
int64_t trade_price; // 40->47
};
static_assert(sizeof(TradeMsg) == 48);
char buffer[1000000];
template<class T, size_t N=1>
int someFunc(char* buffer, T* output, int& cursor) {
// read + process data from buffer. Return data in output. Set cursor to the last byte read + 1.
return cursor + (rand() % 20) + 1; // dummy code
}
void parseData(TradeMsg* msg) {
int cursor = 0;
cursor = someFunc<int64_t>(buffer, &msg->timestamp, cursor);
cursor = someFunc<char>(buffer, &msg->exchange, cursor);
cursor++;
int i = 0;
// i is GUARANTEED to be <= 17 after this loop,
// edit: the input data in buffer[] guarantee that fact.
while (buffer[cursor + i] != ',') {
msg->symbol[i] = buffer[cursor + i];
i++;
}
msg->symbol[i] = '\n'; // might access symbol[17].
cursor = cursor + i + 1;
for (i=0; i<4; i++) msg->sale_condition[i] = buffer[cursor + i];
cursor += 5;
//cursor = someFunc...
}
int main()
{
TradeMsg a;
a.symbol[17] = '\0';
return 0;
}
I have this struct that is guaranteed to have predictable size. In the code, there is a case where the program tries to assign value to an array element past its size msg->symbol[17] = ... .
However, in that case, the assignment does not cause any harm as long as:
It is done before the next struct members (sale_condition) are assigned (no unexpected code reordering).
It does not modify any previous members (timestamp, exchange).
It does not access any memory outside the struct.
I read that this is undefined behavior. But what kind of compiler optimization/code generation can make this go wrong? symbol[17] is pretty deep inside the middle of the struct, so I don't see how the compiler could generate an access outside it. Assume that the platform is x86-64 only.
Various folks have pointed out debug-mode checks that will fire on access outside the bounds of an array member of a struct, with options like gcc -fsanitize=undefined. Separate from that, it's also legal for a compiler to use the assumption of non-overlap between member accesses to reorder two assignments which actually do alias:
@Peter in comments points out that the compiler is allowed to assume that accesses to msg->symbol[i] don't affect other struct members, and potentially delay msg->symbol[i] = '\n'; until after the loop that writes msg->sale_condition[i]. (i.e. sink that store to the bottom of the function).
There isn't a good reason you'd expect a compiler to want to do that in this function alone, but perhaps after inlining into some caller that also stored something there, it could be relevant. Or just because it's a DeathStation 9000 that exists in this thought experiment to break your code.
You could write this safely, although GCC compiles that worse
Since char* is allowed to alias any other object, you could offset a char* relative to the start of the whole struct, rather than to the start of the member array. Use offsetof to find the right start point like this:
#include <cstddef>
...
((char*)msg + offsetof(TradeMsg, symbol))[i] = '\n'; // might access symbol[17].
That's exactly equivalent to *((char*)msg + offsetof(...) + i) = '\n'; by definition of C++'s [] operator, even though it lets you use [i] to index relative to the same position.
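For reference, here is a self-contained sketch of that offsetof-based store. The struct is copied from the question; storeNewlineAt is a hypothetical helper name, and the static_assert encodes the assumption that the layout has no padding before sale_condition (true on x86-64, which is the only platform the question cares about).

```cpp
#include <cstddef>
#include <cstdint>

struct TradeMsg {
    std::int64_t timestamp;         // bytes 0-7
    char exchange;                  // byte 8
    char symbol[17];                // bytes 9-25
    char sale_condition[4];         // bytes 26-29
    char source_of_trade;           // byte 30
    std::uint8_t trade_correction;  // byte 31
    std::int64_t trade_volume;      // bytes 32-39
    std::int64_t trade_price;       // bytes 40-47
};
static_assert(sizeof(TradeMsg) == 48, "layout assumed by the offsets in the comments");

// Hypothetical helper: stores '\n' at byte (offsetof(symbol) + i) of the struct.
// The index is taken relative to the whole object through a char*, not relative
// to the symbol member, so i == 17 simply lands on sale_condition[0].
void storeNewlineAt(TradeMsg* msg, int i) {
    char* bytes = reinterpret_cast<char*>(msg);
    bytes[offsetof(TradeMsg, symbol) + i] = '\n';
}
```

Calling storeNewlineAt(&m, 17) writes byte 26 of the struct, i.e. the first byte of sale_condition, which the later loop then overwrites as the question intends.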
However, that does compile to less efficient asm with GCC11.2 -O2. (Godbolt), mostly because int i, cursor are narrower than pointer-width. The "safe" version that redoes indexing from the start of the struct does more indexing work in asm, not using the msg+offsetof(symbol) pointer that it was already using as the base register in the loop.
# original version, with UB if `i` goes past the buffer.
# gcc11.2 -O2 -march=haswell. -O3 fully unrolls into a chain of copy/branch
... partially peeled first iteration
.L3: # do{
mov BYTE PTR [rbx+8+rax], dl # store into msg->symbol[i]
movsx rdi, eax # not read inside the loop
lea ecx, [r8+rax]
inc rax
movzx edx, BYTE PTR buffer[rsi+1+rax] # load from buffer
cmp dl, 44
jne .L3 # }while(buffer[cursor+i] != ',')
## End of copy-and-search loop.
# Loops are identical up to this point except for MOVSX here vs. MOV in the no-UB version.
movsx rcx, ecx # just redo sign extension of this calculation that was done repeatedly inside the loop just for this, apparently.
.L2:
mov BYTE PTR [rbx+9+rdi], 10 # store a newline
mov eax, 1 # set up for next loop
# offsetof version, without UB
# same loop, but with RDI and RSI usage switched.
# And with mov esi, eax zero extension instead of movsx rdi, eax sign extension
cmp dl, 44
jne .L3 # }while(buffer[cursor+i] != ',')
add esi, 9 # offsetof(TradeMsg, symbol)
movsx rcx, ecx # more stuff getting sign extended.
movsx rsi, esi # including something used in the newline store
.L2:
mov BYTE PTR [rbx+rsi], 10
mov eax, 1 # set up for next loop
The RCX calculation seems to just be for use by the next loop, setting sale_condition.
BTW, the copy-and-search loop is like strcpy but with a ',' terminator. Unfortunately gcc/clang don't know how to optimize that; they compile to a slow byte-at-a-time loop, not e.g. an AVX512BW masked store using mask-1 from a vec == set1_epi8(',') compare, to get a mask selecting the bytes-before-',' instead of the comma element. (Probably needs a bithack to isolate that lowest-set-bit as the only set bit, though, unless it's safe to always copy 16 or 17 bytes separate from finding the ',' position, which could be done efficiently without masked stores or branching.)
Another option might be a union between a char[21] and struct{ char sym[17], sale[4];}, if you use a C++ implementation that allows C99-style union type-punning. (It's a GNU extension, and also supported by MSVC, but not necessarily literally every x86 compiler.)
Also, style-wise, shadowing int i = 0; with for( int i=0 ; i<4 ; i++ ) is poor style. Pick a different var name for that loop, like j. (Or if there is anything meaningful, a better name for i which has to survive across multiple loops.)
In a few cases:
When variable guard is set up: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
In a C++ interpreter (yes they exist): https://root.cern/cling/
Your symbol array has a size of 17, yet you are trying to assign a value to an 18th element with a.symbol[17] = '\0';
Remember that index values start at 0, not 1, so the valid indices are 0 through 16.
So you have two places that can go wrong: i can equal 17, which writes past the end of the array, and so does that last line I showed above.
I have an assignment in C++ to read a file into a string variable which contains digits (no spaces), and using inline assembly, the program needs to sum up the digits of the string. For this I want to loop until end of string (NULL) and every iteration copy 1 char (which is 1 digit) into a register so I can use compare and subtract on it. The problem is that every time instead of copying the char to the register it copies some random value.
I'm using Visual Studio for debugging. Variable Y is the string and I'm trying to copy every iteration of the loop the current char into register AL.
// read from txt file
string y;
cout << "\n" << "the text is \n";
ifstream infile;
infile.open("1.txt");
getline(infile, y);
cout << y;
infile.close();
// inline assembly
_asm
{
mov edx, 0 // counter
mov ebx, 0
mov eax, 0
loop1:
movzx AL, y[ebx]
cmp AL, 0x00
jz finished
sub AL, 48 // convert ascii to number, assuming digit
add edx, eax // add digit to counter
add ebx, 1 // move pointer to the next byte
loop loop1
finished:
mov i, edx
}
For example assuming Y is "123" and it's the first iteration of the loop, EBX is 0. I expect y[ebx] to point to value 49 ('1') and indeed in debug I see y[ebx]'s value is 49. I want to copy said value into a register, so when I use instruction:
movzx AL, y[ebx]
I expect register AL to change to 49 ('1'), but the value changes to something random instead. For instance last debug session it changed to 192 ('À').
y is the std::string object's control block. You want to access its C string data.
MSVC inline asm syntax is pretty crap, so there's no way to just ask for a pointer to that in a register. I think you have to create a new C++ variable like const char *ystr = y.c_str(); (c_str() returns a const char*, so the variable has to be const-qualified).
That C variable is a pointer which you need to load into register with mov ecx, [ystr]. Accessing the bytes of ystr's object-representation directly would give you the bytes of the pointer.
Also, your current code is using the loop instruction, which is slow and equivalent to dec ecx/jnz. But you didn't initialize ECX, and your loop termination condition is based on the zero terminator, not a counter that you know ahead of the first iteration. (Unless you also ask the std::string for its length instead).
There is zero reason to use the loop instruction here. Put a test al,al / jnz loop1 at the bottom of your loop like a normal person.
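As a reference for what the fixed asm loop should compute, here is the same logic in plain C++. It is only a sketch: sumDigits is a hypothetical name, and like the original code it assumes the string contains nothing but ASCII digits.

```cpp
#include <string>

// Sums the decimal digits of y by walking its zero-terminated character data,
// which is what the asm loop should do once it has the pointer from c_str().
int sumDigits(const std::string& y) {
    const char* p = y.c_str();  // pointer to the characters, not to the string object
    int sum = 0;
    while (*p != '\0') {        // loop condition is the terminator, not a counter
        sum += *p - '0';        // ASCII digit to number (assumes digits only)
        ++p;                    // advance to the next byte
    }
    return sum;
}
```

The pointer p corresponds to the register loaded from ystr, and the while condition corresponds to the test al,al / jnz pair at the bottom of the asm loop.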
I'm having a problem finding the average, min and max of an array in assembly language. I created a simple array in C++ and created a test.asm file to pass it to. I figured out the average, but now it's the min and max I can't seem to figure out.
#include <iostream>
using namespace std;
extern "C"
int test(int*, int);
int main()
{
const int SIZE = 7;
int arr[SIZE] = { 1,2,3,4,5,6,7 };
int val = test(arr, SIZE);
cout << "The function test returned: " << val << endl;
return 0;
}
This is my test.asm that adds all the values and returns 4.
.686
.model flat
.code
_test PROC ;named _test because C automatically prepends an underscore, which is needed to interoperate
push ebp
mov ebp,esp ;stack pointer to ebp
mov ebx,[ebp+8] ; address of first array element
mov ecx,[ebp+12]
mov ebp,0
mov edx,0
mov eax,0
loopMe:
cmp ebp,ecx
je allDone
add eax,[ebx+edx]
add edx,4
add ebp,1
jmp loopMe
allDone:
mov edx,0
div ecx
pop ebp
ret
_test ENDP
END
I am still trying to figure out how to find the min, since the max will be done in a similar way. I assume you use cmp to compare values, but everything I tried so far hasn't been successful. I'm fairly new to assembly language and it's hard for me to grasp. Any help is appreciated.
Any help is appreciated
OK, so I will show you a refactored average function, even though you didn't ask for it directly. :)
Things you can learn from this:
simplified function prologue/epilogue, when ebp is not modified in code
the input array is of 32b int values, so to have correct average you should calculate 64b sum, and do the 64b sum signed division
subtle "tricks" how to get zero value (xor) or how inc is +1 to value (lowering code size)
handling zero sized array by returning fake average 0 (no crash)
addition of two 64b values composed from 32b registers/instructions
counting human "index" (+1 => direct cmp with size possible), yet addressing 32b values (usage of *4 in addressing)
renamed to getAverage
BTW, this is not optimized for performance. I tried to keep the source "simple", so it's easy to read and understand what it is doing.
_getAverage PROC
; avoiding `ebp` usage, so no need to save/set it
mov ebx,[esp+4] ; address of first array element
mov ecx,[esp+8] ; size of array
xor esi,esi ; array index 0
; 64b sum (edx:eax) = 0
xor eax,eax
cdq
; test for invalid input (zero sized array)
jecxz zeroSizeArray ; arguments validation, returns 0 for 0 size
; here "0 < size", so no "index < size" test needed for first element
; "do { ... } while(index < size);" loop variant
sumLoop:
; extend value from array[esi] to 64b (edi is upper 32b)
mov edi,[ebx+esi*4]
sar edi,31
; edx:eax += edi:array[esi] (64b array value added to 64b sum)
add eax,[ebx+esi*4]
adc edx,edi
; next index and loop while index < size
inc esi
cmp esi,ecx
jb sumLoop
; divide the 64b sum of integers by "size" to get average value
idiv ecx ; signed (!) division (input array is signed "int")
; can't overflow (Divide-error), as the sum value was accumulated
; from 32b values only, so EAX contains full correct result
zeroSizeArray:
ret
_getAverage ENDP
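For comparison, the same algorithm can be sketched in C++: a 64-bit sum, a signed division, and 0 returned for an empty array. One deliberate difference is labelled in the comments: this sketch also returns 0 for a negative size, whereas the asm above only checks for exactly zero.

```cpp
#include <cstdint>

// C++ counterpart of the asm routine above (a sketch, not a drop-in replacement).
int getAverage(const int* arr, int size) {
    if (size <= 0) {
        return 0;  // fake average for empty input; negative sizes are also rejected here
    }
    std::int64_t sum = 0;  // 64-bit accumulator, like edx:eax in the asm
    for (int i = 0; i < size; ++i) {
        sum += arr[i];     // each int is sign-extended to 64 bits before the add
    }
    // Signed division that truncates toward zero, like idiv.
    return static_cast<int>(sum / size);
}
```

Because the sum is kept in 64 bits, averaging values close to INT_MAX does not overflow, which is the point of the add/adc pair in the asm version.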
I have the following code
int isBST(struct node* node)
{
return(isBSTUtil(node, INT_MIN, INT_MAX));
}
int isBSTUtil(struct node* node, int min, int max)
{
if (node==NULL)
return 1;
if (node->data <= min || node->data > max)
return 0;
return
isBSTUtil(node->left, min, node->data) && // Allow only distinct values
isBSTUtil(node->right, node->data, max); // Allow only distinct values
}
When I debug the code in GDB, I see that the second parameter is at address ebp + 0xc (0xbffff188+0xc) and the third parameter is at ebp + 0x10, but it is not clear where the first parameter is. In theory, we know that the return address of the function is located at EBP + 4, the first parameter at EBP + 8, and so on. So how does that match what I am seeing?
In theory, we don't know anything about where the arguments or
the return address is located. On a specific architecture, we
can (usually) figure out what a specific compiler does by
examining some of the generated assembler. The first thing to
examine is the function preamble. On an Intel 32 bit processor,
the frame pointer will be in EBP, and it will be set up after
a certain number of pushes to save registers. (In particular,
there must be a push EBP before EBP is set up for the local
frame.) A typical preamble for an Intel might be:
function:
PUSH EBP
MOV EBP, ESP
SUB ESP, n ; allocate space for local variables
Beyond that: the Intel stack grows down, and compilers
for Intel almost universally push the arguments from right to
left, so in your case, max will have the highest address,
min will be right below it, and node below that. So your
image of the frame will be:
[EBP - ...] local variables
[EBP + 0] the pushed old EBP
[EBP + 4] the return address
[EBP + 8] the first argument (node)
[EBP + 12] the second argument (min)
[EBP + 16] the third argument (max)
(This is supposing that the arguments are all 32 bit values.)
Of course, a compiler might push additional registers before
pushing EBP, causing the offsets to be correspondingly higher.
This is just one possible layout.
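The offsets in that picture follow a simple rule, sketched below. argOffsetFromEbp is a hypothetical helper, and it only holds under the assumptions stated above: a 32-bit target, arguments passed on the stack as 4-byte values, and a preamble that pushes nothing before EBP.

```cpp
// Byte offset from EBP of the n-th stack argument (0-based), assuming the
// frame layout shown above: saved EBP at +0, return address at +4, and
// 4-byte arguments pushed right to left starting at +8.
constexpr int argOffsetFromEbp(int argIndex) {
    return 8 + 4 * argIndex;  // 4 bytes saved EBP + 4 bytes return address = 8
}

static_assert(argOffsetFromEbp(0) == 8,  "node is expected at [EBP + 8]");
static_assert(argOffsetFromEbp(1) == 12, "min is expected at [EBP + 0xc]");
static_assert(argOffsetFromEbp(2) == 16, "max is expected at [EBP + 0x10]");
```

This agrees with the GDB observation in the question: the second and third parameters were seen at ebp + 0xc and ebp + 0x10, so under the same assumptions the first parameter would be at ebp + 8.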
I created an application to compute the prime numbers in the 64-bit range. When I tried to compute the square root of a 64-bit number using the sqrt function from math.h, I found that the answer is not accurate. For example, when the input is ~0ull the answer should be ~0u, but the one I get is 0x100000000, which is not right. So I decided to write my own version in x86 assembly to see whether this is a bug. Here is my function:
inline unsigned prime_isqrt(unsigned long long value)
{
const unsigned one = 1;
const unsigned two = 2;
__asm
{
test dword ptr [value+4], 0x80000000
jz ZERO
mov eax, dword ptr [value]
mov ecx, dword ptr [value + 4]
shrd eax, ecx, 1
shr ecx, 1
mov dword ptr [value],eax
mov dword ptr [value+4],ecx
fild value
fimul two
fiadd one
jmp REST
ZERO:
fild value
REST:
fsqrt
fisttp value
mov eax, dword ptr [value]
}
}
The input is an odd number whose square root I want. When I tested my function with the same input, the result was the same.
What I don't get is why those functions round the result or, to be specific, why the sqrt instruction rounds the result.
sqrt doesn't round anything - you do when you convert your integer into a double. A double can't represent all the numbers that a 64-bit integer can without loss of precision. Specifically, starting at 2^53, there are multiple integers that will be represented as the same double value.
So if you convert an integer above 2^53 to double, you lose some of the least significant bits, which is why (double)(~0ull) is 18446744073709552000.0, not 18446744073709551615.0 (or to be more precise the latter is actually equal to the former because they represent the same double number).
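The collision can be checked directly. This sketch assumes double is IEEE-754 binary64 with round-to-nearest and that comparisons are done in double precision (as on x86-64 with SSE2); collideAsDouble is a hypothetical helper name.

```cpp
#include <cstdint>

// True when two different 64-bit integers convert to the same double value.
bool collideAsDouble(std::uint64_t a, std::uint64_t b) {
    double da = static_cast<double>(a);
    double db = static_cast<double>(b);
    return da == db;
}

// The double that ~0ull (2^64 - 1) is converted to.
double allOnesAsDouble() {
    return static_cast<double>(~0ull);
}
```

Under those assumptions, 2^53 and 2^53 + 1 collide while 2^53 - 1 and 2^53 do not, and allOnesAsDouble() is exactly 2^64. The square root of 2^64 is 2^32, which is the 0x100000000 result from the question.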
You're not very clear about the C++ function that you're calling. sqrt is an overloaded name. You probably wanted sqrt(double(~0ull)). There's no sqrt overload which takes an unsigned long long.
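If an exact integer square root is what you need, one common workaround is to treat the floating-point result only as an estimate and correct it with integer arithmetic. The sketch below is not taken from the question or from any library: isqrt64 is a hypothetical name, and the clamp to 0xFFFFFFFF exists precisely because the estimate can come out as 2^32 for inputs near the top of the range.

```cpp
#include <cmath>
#include <cstdint>

// Floor of the square root of a 64-bit unsigned integer (a sketch).
std::uint32_t isqrt64(std::uint64_t value) {
    // Floating-point estimate; it can be slightly off for inputs above 2^53
    // because the conversion to double rounds the input.
    double estimate = std::sqrt(static_cast<double>(value));

    // Clamp so that r * r below can never overflow 64 bits.
    std::uint64_t r = (estimate >= 4294967296.0)
                          ? 0xFFFFFFFFull
                          : static_cast<std::uint64_t>(estimate);

    // Correct the estimate downward, then upward, using exact integer math.
    while (r * r > value) {
        --r;
    }
    while (r < 0xFFFFFFFFull && (r + 1) * (r + 1) <= value) {
        ++r;
    }
    return static_cast<std::uint32_t>(r);
}
```

For ~0ull the estimate is 2^32, the clamp brings it to 0xFFFFFFFF, and since 0xFFFFFFFF squared is still below 2^64 - 1, that is the value returned.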