Intel PIN Routine address retrieval: Linux vs. Windows - c++

I'm new to PIN so perhaps there is an easy explanation to this. I'm puzzled by routine addresses PIN returns on Windows only. I created a minimal test program to illustrate my point.
I'm currently using PIN 2.14. My inspected application is a Debug build, with disabled ASLR on Windows.
First consider this simple application that calls an empty method and prints the method's address:
#include "stdio.h"
class Test {
public:
void method() {}
};
void main() {
Test t;
t.method();
printf("Test::method = 0x%p\n", &Test::method);
}
The following pin tool will disassemble a program's main routine and print the address of Test::method:
#include <pin.H>
#include "stdio.h"
VOID disassemble(IMG img, VOID *v)
{
for (SEC sec = IMG_SecHead(img); SEC_Valid(sec); sec = SEC_Next(sec))
{
for (RTN rtn = SEC_RtnHead(sec); RTN_Valid(rtn); rtn = RTN_Next(rtn))
{
auto name = RTN_Name(rtn);
if(name == "Test::method" || name == "_ZN4Test6methodEv") {
printf("%s detected by PIN resides at 0x%p.\n", name.c_str(), RTN_Address(rtn));
}
if (RTN_Name(rtn) != "main") continue;
RTN_Open(rtn);
for (INS ins = RTN_InsHead(rtn); INS_Valid(ins); ins = INS_Next(ins)) {
printf("%s\n", INS_Disassemble(ins).c_str());
}
RTN_Close(rtn);
}
}
}
int main(int argc, char **argv)
{
PIN_InitSymbols();
PIN_Init(argc, argv);
IMG_AddInstrumentFunction(disassemble, 0);
PIN_StartProgram();
return 0;
}
Running this PIN tool against the test application on an Ubuntu 14.04.3 x64 VM prints:
push rbp
mov rbp, rsp
push r13
push r12
push rbx
sub rsp, 0x18
lea rax, ptr [rbp-0x21]
mov rdi, rax
call 0x400820
mov r12d, 0x400820
mov r13d, 0x0
mov rcx, r12
mov rbx, r13
mov rax, r12
mov rdx, r13
mov rax, rdx
mov rsi, rcx
mov rdx, rax
mov edi, 0x4008b4
mov eax, 0x0
call 0x4006a0
mov eax, 0x0
add rsp, 0x18
pop rbx
pop r12
pop r13
pop rbp
ret
_ZN4Test6methodEv detected by PIN resides at 0x0x400820.
Test::method = 0x0x400820
Please note that both the call targeting t.method(), as well as the function address retrieved by PIN and the application's output show Test::method to reside at address 0x0x400820.
The output on my Windows 10 x64 machine is:
push rdi
sub rsp, 0x40
mov rdi, rsp
mov ecx, 0x10
mov eax, 0xcccccccc
rep stosd dword ptr [rdi]
lea rcx, ptr [rsp+0x24]
call 0x14000104b
lea rdx, ptr [rip-0x2b]
lea rcx, ptr [rip+0x74ca3]
call 0x1400013f0
xor eax, eax
mov edi, eax
mov rcx, rsp
lea rdx, ptr [rip+0x74cf0]
call 0x1400015f0
mov eax, edi
add rsp, 0x40
pop rdi
ret
Test::method detected by PIN resides at 0x0000000140001120.
Test::method = 0x000000014000104B
The application's output and the call target in the disassembly show the same value. However the routine address returned by PIN is different!
I'm very puzzled about this behavior. Do you have any idea how to explain this?
Thanks for any suggestions!

Answering my question for myself:
After probing for a while it became evident that both function pointers, the call target as well as the routine address returned by PIN were valid and ended up calling the method.
I ended up looking at the in-memory diassembly of my call target's address and sure enough it looked something like this:
0000000140001122 E9 D2 00 00 00 jmp Test::method (014000104Bh)
In other words: PIN seemingly replaced the original method with a jump to its own version of the code. Knowing this, correlating original function pointers and PIN routine addresses again became possible which is all I needed to do.

Related

Why is the C++ function parameter stored 20 bytes off of the rbp in x86-64 when the method body only has one 4 byte variable?

Consider the following program, compiled using x86-64 GCC 12.2 with flags --std=c++17 -O0:
int square(int num, int num2) {
int foo = 37;
return num * num;
}
int main () {
return square(10, 5);
}
The resulting assembly using godbolt is:
square(int, int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi
mov DWORD PTR [rbp-24], esi
mov DWORD PTR [rbp-4], 37
mov eax, DWORD PTR [rbp-20]
imul eax, eax
pop rbp
ret
main:
push rbp
mov rbp, rsp
mov esi, 5
mov edi, 10
call square(int, int)
nop
pop rbp
ret
I read about shadow spaces and it appears that in x64 there must be at minimum 32 bytes allocated: "32 bytes above the return address which the called function owns" ...
With that said, how is the offset -20 determined for the parameter num? If there's 32 bytes from rbp, wouldn't that be -24?
I noticed even if you add more local variables, it'll remain -20 until it gets pushed over to -36, but I cannot understand why. Thanks!

Get device encryption support information

I want to detect the device encryption support in my program. This info is available in the System Information program. Please check out the screenshot below:
What kind of Win API functions are used/available to detect the device encryption support? What System Information program uses to detect it? I just need some information.
TL;DR: it uses undocumented functions from fveapi.dll (Windows Bitlocker Drive Encryption API). It seems to rely only on the TPM capabilities.
Note that I only spent like 15 mins on it, but I doubt I missed something crucial, althoug this might be possible.
A bit of Reverse engineering
Typed system information in search bar, saw it spawned msinfo32.exe. Put the binary in a disassembler. It uses a MUI file so I'll have to search for the strings in the MUI file and not the executable.
Searching Device Encryption Support leads to string ID 951 (0x3b7)
STRINGTABLE
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
{
951, "Device Encryption Support|%s"
Searching for the contant in the disassembler leads to a function named:
DeviceEncryptionDataPoints(struct CWMIHelper *, struct CPtrList *)
The load ing of the aforementioned string is almost right at the start:
.text:00000001400141E9 mov edx, 3B7h
.text:00000001400141EE lea rcx, [rsp+2C8h+var_280]
.text:00000001400141F3
.text:00000001400141F3 loc_1400141F3:
.text:00000001400141F3 ; try {
.text:00000001400141F3 call cs:__imp_?LoadStringW#CString##QEAAHI#Z ; CString::LoadStringW(uint)
So we are definitely in the right function.
It loads the fveapi.dll module:
.text:0000000140014269 xor edx, edx ; hFile
.text:000000014001426B mov r8d, 800h ; dwFlags
.text:0000000140014271 lea rcx, LibFileName ; "fveapi.dll"
.text:0000000140014278 call cs:__imp_LoadLibraryExW
Gets a pointer on FveQueryDeviceEncryptionSupport:
.text:00000001400142AB lea rdx, aFvequerydevice ; "FveQueryDeviceEncryptionSupport"
.text:00000001400142B2 mov rcx, rdi ; hModule
.text:00000001400142B5 call cs:__imp_GetProcAddress
And immediately calls the function (this is a protected CFG call, but it's here):
.text:00000001400142CA mov [rsp+2C8h+var_254], rbx
.text:00000001400142CF mov [rsp+2C8h+var_260], 14h
.text:00000001400142D7 mov [rsp+2C8h+var_25C], 1
.text:00000001400142DF mov [rsp+2C8h+var_258], 1
.text:00000001400142E7 lea rcx, [rsp+2C8h+var_260]
.text:00000001400142EC call cs:__guard_dispatch_icall_fptr
Return value
If the function fails:
.text:00000001400142EC call cs:__guard_dispatch_icall_fptr
.text:00000001400142F2 mov esi, eax
.text:00000001400142F4 test eax, eax
.text:00000001400142F6 js loc_1400143F0 ; check failure
We land here:
.text:00000001400143F7 mov edx, 2FFh
.text:00000001400143FC lea rcx, [rsp+2C8h+var_288]
.text:0000000140014401 call cs:__imp_?LoadStringW#CString##QEAAHI#Z ; CString::LoadStringW(uint)
The string 0x2FF (767) is:
767, "Elevation Required to View"
If the call succeed, the code checks one of the parameter which is definitly an out parameter:
.text:00000001400142EC call cs:__guard_dispatch_icall_fptr
.text:00000001400142F2 mov esi, eax
.text:00000001400142F4 test eax, eax
.text:00000001400142F6 js loc_1400143F0
.text:00000001400142FC cmp dword ptr [rsp+2C8h+var_254], ebx ; rbx = 0
.text:0000000140014300 jnz short loc_14001431D
.text:0000000140014302 mov edx, 3B8h
.text:0000000140014307 lea rcx, [rsp+2C8h+var_288]
.text:000000014001430C call cs:__imp_?LoadStringW#CString##QEAAHI#Z ; CString::LoadStringW(uint)
If it's 0, the string 0x3b8 (952) is used:
952, "Meets prerequisites"
Otherwise various failure functions are called.
Failure
In case of a failure, the UpdateDeviceEncryptionStateFailureString function is called:
.text:0000000140014325 lea r9, [rsp+2C8h+var_294] ; int *
.text:000000014001432A lea r8, [rsp+2C8h+var_290] ; int *
.text:000000014001432F mov edx, 3C1h ; unsigned int
.text:0000000140014334 lea rcx, [rsp+2C8h+var_288] ; struct CString *
.text:0000000140014339 call ?UpdateDeviceEncryptionStateFailureString##YAXPEAVCString##IPEAH1#Z ; UpdateDeviceEncryptionStateFailureString(CString *,uint,int *,int *)
Its main goal is to fetch some string from the resource file.
One that stands out is 0x3b9:
.text:0000000140013A37 mov edx, 3B9h
.text:0000000140013A3C mov rcx, rbx
.text:0000000140013A3F call cs:__imp_?LoadStringW#CString##QEAAHI#Z ; CString::LoadStringW(uint)
953, "Reasons for failed automatic device encryption"
Which is the case for me since I don't have a TPM.
Other Functions
All of the other functions called from the DeviceEncryptionDataPoints (at least to get the needed results) are all from the fveapi.dll.
There are a lot in a function called PerformIndividualHardwareTests(HINSTANCE hModule, struct CString *, int *, int *):
.text:0000000140013AEF lea rdx, aNgscbcheckisao ; "NgscbCheckIsAOACDevice"
.text:0000000140013AF6 mov [rbp+var_1F], 0
.text:0000000140013AFA mov rdi, r9
.text:0000000140013AFD mov [rbp+var_20], 0
.text:0000000140013B01 mov rsi, r8
.text:0000000140013B04 mov [rbp+var_1E], 0
.text:0000000140013B08 mov rbx, rcx
.text:0000000140013B0B call cs:__imp_GetProcAddress
.text:0000000140013B12 nop dword ptr [rax+rax+00h]
.text:0000000140013B17 mov r12, rax
.text:0000000140013B1A test rax, rax
.text:0000000140013B1D jz loc_140013CB9
.text:0000000140013B23 lea rdx, aNgscbcheckishs ; "NgscbCheckIsHSTIVerified"
.text:0000000140013B2A mov rcx, rbx ; hModule
.text:0000000140013B2D call cs:__imp_GetProcAddress
.text:0000000140013B34 nop dword ptr [rax+rax+00h]
.text:0000000140013B39 mov r15, rax
.text:0000000140013B3C test rax, rax
.text:0000000140013B3F jz loc_140013CB9
.text:0000000140013B45 lea rdx, aNgscbcheckhsti ; "NgscbCheckHSTIPrerequisitesVerified"
.text:0000000140013B4C mov rcx, rbx ; hModule
.text:0000000140013B4F call cs:__imp_GetProcAddress
.text:0000000140013B56 nop dword ptr [rax+rax+00h]
.text:0000000140013B5B mov r13, rax
.text:0000000140013B5E test rax, rax
.text:0000000140013B61 jz loc_140013CB9
.text:0000000140013B67 lea rdx, aNgscbcheckdmas ; "NgscbCheckDmaSecurity"
.text:0000000140013B6E mov rcx, rbx ; hModule
.text:0000000140013B71 call cs:__imp_GetProcAddress
There's also a registry key checked SYSTEM\CurrentControlSet\Control\BitLocker\AutoDE\HSTI:
.text:0000000140013C10 lea r8, ?NGSCB_AUTODE_HSTI_REQUIRED##3QBGB ; "HSTIVerificationRequired"
.text:0000000140013C17 mov [rsp+60h+pcbData], rax ; pcbData
.text:0000000140013C1C lea rdx, ?NGSCB_AUTODE_HSTI_PREREQS##3QBGB ; "SYSTEM\\CurrentControlSet\\Control\\Bit"...
.text:0000000140013C23 lea rax, [rbp+var_1C]
.text:0000000140013C27 mov r9d, 10h ; dwFlags
.text:0000000140013C2D mov [rsp+60h+pvData], rax ; pvData
.text:0000000140013C32 mov rcx, 0FFFFFFFF80000002h ; hkey
.text:0000000140013C39 and [rsp+60h+var_40], 0
.text:0000000140013C3F call cs:__imp_RegGetValueW
and some other functions (NgscbCheckPreventDeviceEncryption, NgscbGetWinReConfiguration, FveCheckTpmCapability, ...) , once again, all from the fveapi.dll module.
So basically the checks are all based on functions from this DLL. It seems that none of them are documented (as far as I can see with a quick search).
I didn't find anything around in the DeviceEncryptionDataPoints caller (which is basically the main() function), since the next calls are dealing with checking the hypervisor capabilities.

C++ inline asm move WCHAR in 32-bit register

I am trying to practice the inline ASM in C++ :) Maybe outdated, but it is interesting, to know how CPU is executing the code.
So, what I am trying to do here, is to loop through processes and get a handle of needed one :) I am using for that already created methods from tlhelp32
I have this code:
HANDLE RetHandle = nullptr, snap;
int SizeOfPE = sizeof(PROCESSENTRY32), pid; PROCESSENTRY32 pe;
int PA = PROCESS_ALL_ACCESS;
const char* Pname = "explorer.exe";
__asm
{
mov eax, pe
mov ebx, this
mov ecx, [ebx]pe.dwSize
mov ecx, SizeOfPE
mov[ebx]pe.dwSize, ecx
mov eax, PA
mov ebx,0
call CreateToolhelp32Snapshot
mov eax,snap
label1:
mov eax, snap
mov ebx, [pe]
call Process32First
cmp eax,1
jne exitLabel
Process32NextLoop:
mov eax, snap
mov ebx, [pe]
call Process32Next
cmp eax, 1
jne Process32NextLoop
mov edx, pe
mov ecx, [edx].szExeFile
cmp ecx, Pname
je ExitLoop
jne Process32NextLoop
ExitLoop:
mov eax, [ebx].th32ProcessID
mov pid, eax
ExitLabel:
ret
}
Apparently, it is throwing error in th32ProcessID as well, however, it is just regular int.
Have been searching, but haven't found the equivalent for movl in C++

Why there is no `leave` instruction at function epilog on x64? [duplicate]

This question already has answers here:
Why does the x86-64 GCC function prologue allocate less stack than the local variables?
(1 answer)
Why is there no "sub rsp" instruction in this function prologue and why are function parameters stored at negative rbp offsets?
(2 answers)
Closed 4 years ago.
I'm on the way to get idea how the stack works on x86 and x64 machines. What I observed however is that when I manually write a code and disassembly it, it differs from what I see in the code people provide (eg. in their questions and tutorials). Here is little example:
Source
int add(int a, int b) {
int c = 16;
return a + b + c;
}
int main () {
add(3,4);
return 0;
}
x86
add(int, int):
push ebp
mov ebp, esp
sub esp, 16
mov DWORD PTR [ebp-4], 16
mov edx, DWORD PTR [ebp+8]
mov eax, DWORD PTR [ebp+12]
add edx, eax
mov eax, DWORD PTR [ebp-4]
add eax, edx
leave (!)
ret
main:
push ebp
mov ebp, esp
push 4
push 3
call add(int, int)
add esp, 8
mov eax, 0
leave (!)
ret
Now goes x64
add(int, int):
push rbp
mov rbp, rsp
(?) where is `sub rsp, X`?
mov DWORD PTR [rbp-20], edi
mov DWORD PTR [rbp-24], esi
mov DWORD PTR [rbp-4], 16
mov edx, DWORD PTR [rbp-20]
mov eax, DWORD PTR [rbp-24]
add edx, eax
mov eax, DWORD PTR [rbp-4]
add eax, edx
(?) where is `mov rsp, rbp` before popping rbp?
pop rbp
ret
main:
push rbp
mov rbp, rsp
mov esi, 4
mov edi, 3
call add(int, int)
mov eax, 0
(?) where is `mov rsp, rbp` before popping rbp?
pop rbp
ret
As you can see, my main confusion is that when I compile against x86 - I see what I expect. When it's x64 - I miss leave instruction or exact following sequence: mov rsp, rbp then pop rbp. What's worng?
UPDATE
It seems like leave is missing, just because it wasn't altered previously. But then, goes another question - why there is no allocation for local vars in the frame?
To this question #melpomene gives pretty straightforward answer - because of "red zone". Which basically means the function that calls no further functions (leaf) can use the first 128 bytes below the stack without allocating space. So if I insert a call inside an add() to any other dumb function - sub rsp, X and add rsp, X will be added to prologue and epilogue respectively.

Segmentation fault in NASM 64bit

I am trying to output the result to the user after getting 3 inputs from scanf.
When I run my code, I am able to get the input I need. However it crashes after I collect the input and begin the calculation.
By the way, I am using Ubuntu 14.04 with g++ and NASM 64bit.
Here's how it should look:
This program is brought to you by Chris Tarazi
Welcome to Areas of Trapezoids
Please enter one of the base numbers: 5.8
Please enter the other base number: 2.2
Please enter the height: 6.5
****//Crashes here with Segmentation fault (core dumped)****
The area of a trapezoid with sizes 5.799999999999999365, 2.200000000000000153,
and 6.500000000000000000 is 26.000000000000000328
Have a nice day. Enjoy your trapezoids.
C++ file:
#include <stdio.h>
#include <stdint.h>
extern "C" double ComputeArea(); // links with global in assembly
using namespace std;
int main()
{
double area;
printf("This program is brought to you by Chris Tarazi.\n");
area = ComputeArea();
printf("Have a nice day. Enjoy your trapezoids.\n");
return 0;
}
Assembly file:
extern printf ; This function will be linked later.
extern scanf
global ComputeArea ; Declare function global to link with "extern" from C++.
;---------------------------------Declare variables-------------------------------------------
segment .data
welcome: db "Welcome to the area of trapezoids.", 10, 0
input: db "Please enter one of the base numbers: ", 0
secInput: db "Please enter the other base number: ", 0
output: db "The area of a trapezoid with sizes %1.18lf, %1.18lf, and %1.18lf is %1.18lf .", 10, 0
hInput: db "Please enter the height: ", 0
inputformat: db "%lf", 0
stringformat: db "%s", 0
fourfloatformat: db "%1.18lf %1.18lf %1.18lf %1.18lf", 0
;---------------------------------Begin segment of executable code------------------------------
segment .text
ComputeArea: ; Area of trapezoid = ((a + b) / 2) * h.
push rbp ; Save a copy of the stack base pointer
mov rbp, rsp ; We do this in order to be 100% compatible with C and C++.
push rbx ; Back up rbx
push rcx ; Back up rcx
push rdx ; Back up rdx
push rsi ; Back up rsi
push rdi ; Back up rdi
push r8 ; Back up r8
push r9 ; Back up r9
push r10 ; Back up r10
push r11 ; Back up r11
push r12 ; Back up r12
push r13 ; Back up r13
push r14 ; Back up r14
push r15 ; Back up r15
pushf ; Back up rflags
;---------------------------------Output messages to user---------------------------------------
mov qword rax, 0
mov rdi, stringformat
mov rsi, welcome
call printf
mov qword rax, 0
mov rdi, stringformat
mov rsi, input
call printf
push qword 0
mov qword rax, 0
mov rdi, inputformat
mov rsi, rsp ;firstbase
call scanf
movsd xmm0, [rsp]
pop rax
mov qword rax, 0
mov rdi, stringformat
mov rsi, secInput
call printf
push qword 0
mov qword rax, 0
mov rdi, inputformat
mov rsi, rsp ;secondbase
call scanf
movsd xmm1, [rsp + 4]
pop rax
mov qword rax, 0
mov rdi, stringformat
mov rsi, hInput
call printf
push qword 0
mov qword rax, 0
mov rdi, inputformat
mov rsi, rsp ;height
call scanf
movsd xmm2, [rsp + 8]
pop rax
;---------------------------------Begin ComputeArea Calculation-----------------------------------
mov rax, 2
cvtsi2sd xmm3, rax
addsd xmm0, xmm1
divsd xmm0, xmm3
mulsd xmm0, xmm2
ret
;---------------------------------Output result to user-------------------------------------------
mov rax, 3
mov rdi, output
call printf
First off, why on earth are you saving ALL of those registers?!? The ABI for 64 bit Linux says you only need to save rbx, rbp, and r12 - r15 if you use those registers in your function. Also, you using Assembler, there is no need to create a stack frame in 64bit land (plus you aren't even using rbp! so why create a stack frame?) The only thing that is very important is to make sure your stack is aligned on a 16 byte boundary - call pushes an 8 byte return address, so all you need in your ComputeArea function is sub rsp, 8 and add rsp, 8 right before your ret.
In your first scanf you are using rsp without adjusting it, you just overwrote something!
You do some computations here:
mov rax, 2
cvtsi2sd xmm3, rax
addsd xmm0, xmm1
divsd xmm0, xmm3
mulsd xmm0, xmm2
ret
You return from the procedure here but do not pop all of those registers you just pushed!! So basically your stack pointer is all messed up! The CPU does not know what the return address is!
What you do in the prologue, must be reversed in the epilogue before you return!
Maybe, you should start simple, read in 3 floats and try to print them!
When I correct your code, this is my output:
Welcome to the area of trapezoids.
Please enter one of the base numbers: 5.8
Please enter the other base number: 2.2
Please enter the height: 6.5
The area of a trapezoid with sizes 5.799999999999999822, 2.200000000000000178, and 6.500000000000000000 is 26.000000000000000000 .