Delete instruction using PIN - c++

I am using the Intel PIN tool to emulate some new instructions and check the corresponding results. For this purpose I am using illegal opcodes of x86_64 to represent my instructions. For example- opcodes 0x16, 0x17 are illegal in x86_64. which represent my instruction opcodes. I am using a C program to generate an executable and then pass it to the Pintool. A C program I am using is this -
#include <stdio.h>
int main()
{
asm(".byte 0x16");
asm(".byte 0x17");
return 0;
}
So if we see the instruction trace 0x16 and 0x17 will appear as bad instructions and if we try to run the executable we get -
Illegal instruction (core dumped)
which is expected as 0x16, 0x17 are illegal in x86_64 and hence the executable should not pass. I am using this executable as input to my Pintool, which examines the instruction trace and hence will encounter 0x16 and 0x17 in the trace.
The Pintool I am using is this -
#include "pin.H"
#include <iostream>
#include <fstream>
#include <cstdint>
UINT64 icount = 0;
using namespace std;
KNOB<string> KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "test.out","This pin tool simulates ULI");
FILE * op;
//====================================================================
// Analysis Routines
//====================================================================
VOID simulate_ins(VOID *ip, UINT32 size) {
fprintf(op,"Wrong instruction encountered here\n");
// Do something based on the instruction
}
//====================================================================
// Instrumentation Routines
//====================================================================
VOID Instruction(INS ins, void *v) {
UINT8 opcodeBytes[15];
UINT64 fetched = PIN_SafeCopy(&opcodeBytes[0],(void *)INS_Address(ins),INS_Size(ins));
if (fetched != INS_Size(ins))
fprintf(op,"\nBad\n");
else {
if(opcodeBytes[0]==0x16 || opcodeBytes[0]==0x17) {
INS_InsertCall( ins, IPOINT_BEFORE, (AFUNPTR)simulate_ins, IARG_INST_PTR, IARG_UINT64, INS_Size(ins) , IARG_END);
INS_Delete(ins);
}
}
VOID Fini(INT32 code, VOID *v) {
//Display some end result
}
INT32 Usage() {
PIN_ERROR("This Pintool failed\n" + KNOB_BASE::StringKnobSummary() + "\n");
return -1;
}
int main(int argc, char *argv[])
{
op = fopen("test.out", "w");
if (PIN_Init(argc, argv))
return Usage();
PIN_InitSymbols();
PIN_AddInternalExceptionHandler(ExceptionHandler,NULL);
INS_AddInstrumentFunction(Instruction, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_StartProgram();
return 0;
}
So I am extracting my assembly opcodes and if the first byte is 0x16 or 0x17 I am sending the instruction to my analysis routine and then deleting the instruction. But however when I run this Pintool on the executable I still get the Illegal instruction (core dumped) error and my code fails to run. My understanding is that the Instrumentation routine is called every time a new instruction is encountered in the trace and the analysis routine is called before the instruction is executed. Here I am checking for the opcode and based on the result I am sending the code to the analysis routine and deleting the instruction. I will be simulating my new instruction in the analysis routine so, I just need to delete the old instruction and let the program proceed futher and make sure it dosen't give the illegal instruction error again.
Anywhere I am doing something wrong?

Related

Tracking native instructions in Intel PIN [duplicate]

This question already has an answer here:
What instructions 'instCount' Pin tool counts?
(1 answer)
Closed 5 years ago.
I am using the Intel PIN tool to do some analysis on the assembly instructions of a C program. I have a simple C program which prints "Hello World", which I have compiled and generated an executable. I have the assembly instruction trace generated from gdb like this-
Dump of assembler code for function main:
0x0000000000400526 <+0>: push %rbp
0x0000000000400527 <+1>: mov %rsp,%rbp
=> 0x000000000040052a <+4>: mov $0x4005c4,%edi
0x000000000040052f <+9>: mov $0x0,%eax
0x0000000000400534 <+14>: callq 0x400400 <printf#plt>
0x0000000000400539 <+19>: mov $0x0,%eax
0x000000000040053e <+24>: pop %rbp
0x000000000040053f <+25>: retq
End of assembler dump.
I ran a pintool where I gave the executable as an input, and I am doing an instruction trace and printing the number of instructions. I wish to trace the instructions which are from my C program and probably get the machine opcodes and do some kind of analysis. I am using a C++ PIN tool to count the number of instructions-
#include "pin.H"
#include <iostream>
#include <stdio.h>
UINT64 icount = 0;
using namespace std;
//====================================================================
// Analysis Routines
//====================================================================
void docount(THREADID tid) {
icount++;
}
//====================================================================
// Instrumentation Routines
//====================================================================
VOID Instruction(INS ins, void *v) {
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_THREAD_ID, IARG_END);
}
VOID Fini(INT32 code, VOID *v) {
printf("count = %ld\n",(long)icount);
}
INT32 Usage() {
PIN_ERROR("This Pintool failed\n"
+ KNOB_BASE::StringKnobSummary() + "\n");
return -1;
}
int main(int argc, char *argv[]) {
if (PIN_Init(argc, argv)) return Usage();
PIN_InitSymbols();
PIN_AddInternalExceptionHandler(ExceptionHandler,NULL);
INS_AddInstrumentFunction(Instruction, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_StartProgram();
return 0;
}
When I run my hello world program with this tool, I get icount = 81563. I understand that PIN adds its own instructions for analysis, but I don't understand how it adds so many instructions, while I don't have more than 10 instructions in my C program. Also is there a way to identify the assembly instructions which are from my code and the ones generated by PIN. I seem to find no way to differentiate between instructions generated by PIN and the ones which are from my program. Please Help!
You're not measuring what you think you're measuring. See my answer here for details:
What instructions 'instCount' Pin tool counts?
Pin does not count its own instructions. The large count is the result of preparation before and after main() and the call to printf().

Run ARM coprocessor instructions with sudo

My goal is to write to the coprocessor p15 register, and right now I'm just trying to read it.
I have the following example c++ program, the first asm instruction is just a simple ror, which works just fine. The second asm instruction, I'm trying to just read SCTLR register.
I compile the program with g++ test_program.cpp -o test
and run with ./test or sudo ./test
If I run with ./test, I get the output:
Value: 10 Result: 8
Illegal instruction
If I run with sudo ./test, I get:
Value: 10 Result: 8
So clearly the instruction is not working since its not printing the line "got here" or "SCTLR". Is there something else I have to do to execute the coprocessor read? I'm running this on a raspberry pi (Cortex A-53).
#include <cstdlib>
#include <cstdio>
int main(){
unsigned int y, x;
x = 16;
//Assembly rotate right 1 example
asm("mov %[result], %[value], ror #1"
: [result]"=r" (y) /* Rotation result. */
: [value]"r" (x) /* Rotated value. */
: /* No clobbers */
);
printf("Value: %x Result: %x\n", x, y);
// Assembly read SCTLR, program crashes if this is run
unsigned int id = 0;
asm("MRC p15, 0, %[result], c0, c0, 0"
: [result]"=r" (id) //Rotation result.
);
printf("Got here\n");
printf("SCTLR: %x", id);
return 0;
}

clock_gettime fails in chrooted Debian etch with CLOCK_PROCESS_CPUTIME_ID

I have setup a chrooted Debian Etch (32bit) under Ubuntu 12.04 (64bit), and it appears that clock_gettime() works with CLOCK_MONOTONIC, but fails with both CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID. The errno is set to EINVAL, which according to the man page means that "The clk_id specified is not supported on this system."
All three clocks work fine outside the chrooted Debian and in 64bit chrooted Debian etch.
Can someone explains to me why this is the case and how to fix it?
Much appreciated.
I don't know the cause yet, but I have ideas that won't fit in the comment box.
First, you can make the test program simpler by compiling it as C instead of C++ and not linking it to libpthread. -lrt should be good enough to get clock_gettime. Also, compiling it with -static could make tracing easier since the dynamic linker startup stuff won't be there.
Static linking might even change the behavior of clock_gettime. It's worth trying just to find out whether it works around the bug.
Another thing I'd like to see is the output of this vdso-bypassing test program:
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
int main(void)
{
struct timespec ts;
if(syscall(SYS_clock_gettime, CLOCK_PROCESS_CPUTIME_ID, &ts)) {
perror("clock_gettime");
return 1;
}
printf("CLOCK_PROCESS_CPUTIME_ID: %lu.%09ld\n",
(unsigned long)ts.tv_sec, ts.tv_nsec);
return 0;
}
with and without -static, and if it fails, add strace.
Update (actually, skip this. go to the second update)
A couple more simple test ideas:
compile and run a 32-bit test program in the Ubuntu host system, by adding -m32 to the gcc command. It's possible that the kernel's 32-bit compatibility mode is causing the error. If that's the case, then the 32-bit version will fail no matter which libc it gets linked to.
take the non-static test programs you compiled under Debian, copy them to the Ubuntu host system and try to run them there. Change in behavior will point to libc as the cause.
Then it's time for the hard stuff. Looking at disassembled code and maybe single-stepping it in gdb. Instead of having you do that on your own, I'd like to get a copy of the code you're running. Upload a a static-compiled failing test program somewhere I can get it. Also a copy of the 32-bit vdso provided by your kernel might be interesting. To extract the vdso, run the following program (compiled in the 32-bit chroot) which will create a file called vdso.dump, and upload that too.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static int getvseg(const char *which, const char *outfn)
{
FILE *maps, *outfile;
char buf[1024];
void *start, *end;
size_t sz;
void *copy;
int ret;
char search[strlen(which)+4];
maps = fopen("/proc/self/maps", "r");
if(!maps) {
perror("/proc/self/maps");
return 1;
}
outfile = fopen(outfn, "w");
if(!outfile) {
perror(outfn);
fclose(maps);
return 1;
}
sprintf(search, "[%s]\n", which);
while(fgets(buf, sizeof buf, maps)) {
if(strlen(buf)<strlen(search) ||
strcmp(buf+strlen(buf)-strlen(search),search))
continue;
if(sscanf(buf, "%p-%p", &start, &end)!=2) {
fprintf(stderr, "weird line in /proc/self/maps: %s", buf);
continue;
}
sz = (char *)end - (char *)start;
/* copy because I got an EFAULT trying to write directly from vsyscall */
copy = malloc(sz);
if(!copy) {
perror("malloc");
goto fail;
}
memcpy(copy, start, sz);
if(fwrite(copy, 1, sz, outfile)!=sz) {
if(ferror(outfile))
perror(outfn);
else
fprintf(stderr, "%s: short write", outfn);
free(copy);
goto fail;
}
free(copy);
goto success;
}
fprintf(stderr, "%s not found\n", which);
fail:
ret = 1;
goto out;
success:
ret = 0;
out:
fclose(maps);
fclose(outfile);
return ret;
}
int main(void)
{
int ret = 1;
if(!getvseg("vdso", "vdso.dump")) {
printf("vdso dumped to vdso.dump\n");
ret = 0;
}
if(!getvseg("vsyscall", "vsyscall.dump")) {
printf("vsyscall dumped to vsyscall.dump\n");
ret = 0;
}
return ret;
}
Update 2
I reproduced this by downloading an etch libc. It's definitely caused be glibc stupidity. Instead of a simple syscall wrapper for clock_gettime it has a big wad of preprocessor spaghetti culminating in "you can't use clockid's that we didn't pre-approve". You're not going to get it to work with that old glibc. Which brings us to the question I didn't want to ask: why are you trying to use an obsolete version of Debian anyway?

My trampoline won't bounce (detouring, C++, GCC)

It feels like I'm abusing Stackoverflow with all my questions, but it's a Q&A forum after all :) Anyhow, I have been using detours for a while now, but I have yet to implement one of my own (I've used wrappers earlier). Since I want to have complete control over my code (who doesn't?) I have decided to implement a fully functional detour'er on my own, so I can understand every single byte of my code.
The code (below) is as simple as possible, the problem though, is not. I have successfully implemented the detour (i.e a hook to my own function) but I haven't been able to implement the trampoline.
Whenever I call the trampoline, depending on the offset I use, I get either a "segmentation fault" or an "illegal instruction". Both cases ends the same though; 'core dumped'. I think it is because I've mixed up the 'relative address' (note: I'm pretty new to Linux so I have far from mastered GDB).
As commented in the code, depending on sizeof(jmpOp)(at line 66) I either get an illegal instruction or a segmentation fault. I'm sorry if it's something obvious, I'm staying up way too late...
// Header files
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include "global.h" // Contains typedefines for byte, ulong, ushort etc...
#include <cstring>
bool ProtectMemory(void * addr, int flags)
{
// Constant holding the page size value
const size_t pageSize = sysconf(_SC_PAGE_SIZE);
// Calculate relative page offset
size_t temp = (size_t) addr;
temp -= temp % pageSize;
// Update address
addr = (void*) temp;
// Update memory area protection
return !mprotect(addr, pageSize, flags);
}
const byte jmpOp[] = { 0xE9, 0x00, 0x00, 0x00, 0x00 };
int Test(void)
{
printf("This is testing\n");
return 5;
}
int MyTest(void)
{
printf("This is ******\n");
return 9;
}
typedef int (*TestType)(void);
int main(int argc, char * argv[])
{
// Fetch addresses
byte * test = (byte*) &Test;
byte * myTest = (byte*) &MyTest;
// Call original
Test();
// Update memory access for 'test' function
ProtectMemory((void*) test, PROT_EXEC | PROT_WRITE | PROT_READ);
// Allocate memory for the trampoline
byte * trampoline = new byte[sizeof(jmpOp) * 2];
// Do copy operations
memcpy(trampoline, test, sizeof(jmpOp));
memcpy(test, jmpOp, sizeof(jmpOp));
// Setup trampoline
trampoline += sizeof(jmpOp);
*trampoline = 0xE9;
// I think this address is incorrect, how should I calculate it? With the current
// status (commented 'sizeof(jmpOp)') the compiler complains about "Illegal Instruction".
// If I uncomment it, and use either + or -, a segmentation fault will occur...
*(uint*)(trampoline + 1) = ((uint) test - (uint) trampoline)/* + sizeof(jmpOp)*/;
trampoline -= sizeof(jmpOp);
// Make the trampoline executable (and read/write)
ProtectMemory((void*) trampoline, PROT_EXEC | PROT_WRITE | PROT_READ);
// Setup detour
*(uint*)(test + 1) = ((uint) myTest - (uint) test) - sizeof(jmpOp);
// Call 'detoured' func
Test();
// Call trampoline (crashes)
((TestType) trampoline)();
return 0;
}
In case of interest, this is the output during a normal run (with the exact code above):
This is testing
This is **
Illegal instruction (core dumped)
And this is the result if I use +/- sizeof(jmpOp) at line 66:
This is testing
This is ******
Segmentation fault (core dumped)
NOTE: I'm running Ubuntu 32 bit and compile with g++ global.cpp main.cpp -o main -Iinclude
You're not going to be able to indiscriminately copy the first 5 bytes of Test() into your trampoline, followed by a jump to the 6th instruction byte of Test(), because you don't know if the first 5 bytes comprise an integral number of x86 variable-length instructions. To do this, you're going to have to do at least a minimal amount of automated disassembling of the Test() function in order to find an instruction boundary that's 5 or more bytes past the beginning of the function, then copy an appropriate number of bytes to your trampoline, and THEN append your jump (which won't be at a fixed offset within your trampoline). Note that on a typical RISC processor (like PPC), you wouldn't have this problem, as all instructions are the same width.

How to get a "bus error"?

I am trying very hard to get a bus error.
One way is misaligned access and I have tried the examples given here and here, but no error for me - the programs execute just fine.
Is there some situation which is sure to produce a bus error?
This should reliably result in a SIGBUS on a POSIX-compliant system.
#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>
int main() {
FILE *f = tmpfile();
int *m = mmap(0, 4, PROT_WRITE, MAP_PRIVATE, fileno(f), 0);
*m = 0;
return 0;
}
From the Single Unix Specification, mmap:
References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal.
Bus errors can only be invoked on hardware platforms that:
Require aligned access, and
Don't compensate for an unaligned access by performing two aligned accesses and combining the results.
You probably do not have access to such a system.
Try something along the lines of:
#include <signal.h>
int main(void)
{
raise(SIGBUS);
return 0;
}
(I know, probably not the answer you want, but it's almost sure to get you a "bus error"!)
As others have mentioned this is very platform specific. On the ARM system I'm working with (which doesn't have virtual memory) there are large portions of the address space which have no memory or peripheral assigned. If I read or write one of those addresses, I get a bus error.
You can also get a bus error if there's actually a hardware problem on the bus.
If you're running on a platform with virtual memory, you might not be able to intentionally generate a bus error with your program unless it's a device driver or other kernel mode software. An invalid memory access would likely be trapped as an access violation or similar by the memory manager (and it never even has a chance to hit the bus).
on linux with an Intel CPU try this:
int main(int argc, char **argv)
{
# if defined i386
/* enable alignment check (AC) */
asm("pushf; "
"orl $(1<<18), (%esp); "
"popf;");
# endif
char d[] = "12345678"; /* yep! - causes SIGBUS even on Linux-i386 */
return 0;
}
the trick here is to set the "alignment check" bit in one of the CPUs "special" registers.
see also: here
I am sure that you must be using x86 machines.
X86 cpu does not generate bus error unless its AC flag in EFALAGS register is set.
Try this code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
char *p;
__asm__("pushf\n"
"orl $0x40000, (%rsp)\n"
"popf");
/*
* malloc() always provides aligned memory.
* Do not use stack variable like a[9], depending on the compiler you use,
* a may not be aligned properly.
*/
p = malloc(sizeof(int) + 1);
memset(p, 0, sizeof(int) + 1);
/* making p unaligned */
p++;
printf("%d\n", *(int *)p);
return 0;
}
More about this can be found at http://orchistro.tistory.com/206
Also keep in mind that some operating systems report "bus error" for errors other than misaligned access. You didn't mention in your question what it was you were actually trying to acheive. Maybe try thus:
int *x = 0;
*x=1;
the Wikipedia page you linked to mentions that access to non-existant memory can also result is a bus error. You might have better luck with loading a known-invalid address into a pointer and dereferwncing that.
How about this? untested.
#include<stdio.h>
typedef struct
{
int a;
int b;
} busErr;
int main()
{
busErr err;
char * cPtr;
int *iPtr;
cPtr = (char *)&err;
cPtr++;
iPtr = (int *)cPtr;
*iPtr = 10;
}
int main(int argc, char **argv)
{
char *bus_error = new char[1];
for (int i=0; i<1000000000;i++) {
bus_error += 0xFFFFFFFFFFFFFFF;
*(bus_error + 0xFFFFFFFFFFFFFF) = 'X';
}
}
Bus error: 10 (core dumped)
Simple, write to memory that isn't yours:
int main()
{
char *bus_error = 0;
*bus_error = 'X';
}
Instant bus error on my PowerPC Mac [OS X 10.4, dual 1ghz PPC7455's], not necessarily on your hardware and/or operating system.
There's even a wikipedia article about bus errors, including a program to make one.
For 0x86 arch:
#include <stdio.h>
int main()
{
#if defined(__GNUC__)
# if defined(__i386__)
/* Enable Alignment Checking on x86 */
__asm__("pushf\norl $0x40000,(%esp)\npopf");
# elif defined(__x86_64__)
/* Enable Alignment Checking on x86_64 */
__asm__("pushf\norl $0x40000,(%rsp)\npopf");
# endif
#endif
int b = 0;
int a = 0xffffff;
char *c = (char*)&a;
c++;
int *p = (int*)c;
*p = 10; //Bus error as memory accessed by p is not 4 or 8 byte aligned
printf ("%d\n", sizeof(a));
printf ("%x\n", *p);
printf ("%x\n", p);
printf ("%x\n", &a);
}
Note:If asm instructions are removed, code wont generate the SIGBUS error as suggested by others.
SIGBUS can occur for other reason too.
Bus errors occur if you try to access memory that is not addressable by your computer. For example, your computer's memory has an address range 0x00 to 0xFF but you try to access a memory element at 0x0100 or greater.
In reality, your computer will have a much greater range than 0x00 to 0xFF.
To answer your original post:
Tell me some situation which is sure to produce a bus error.
In your code, index into memory way outside the scope of the max memory limit. I dunno ... use some kind of giant hex value 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF indexed into a char* ...