C++ DLL Memory Reading Crashes - c++

I've been trying all day how to read properly the memory of a game with my injected DLL, it works correctly but if the DLL reach other type of variable which is not "float" then it crashes. My question is, how can I detect if is float and avoid crashes. Here is my code:
for (DWORD i = 0x1000000; i < 0x2FFFFFF; i += 0x4)
{
DWORD Base = *(DWORD*)(base + i);
DWORD lvl_2 = *(DWORD*)(Base + 0x8);
float* posx = static_cast<float*>((float*)(lvl_2 + 0x90));
Position_x = static_cast<float>(*posx);
if (Position_x > 7.05f && Position_x < 7.20f) //test > 7.05f && test < 7.20f || i == 0x2217710
{
fprintf(file, "Pointer: 0x%x Position x: %.2f \n", i, Position_x);
}
}
This is a scanner I made to update pointer of a game knowing the structs offsets. This code works correctly if I use as condition i == 0x2217710, it returns the correct position x of the player. If I remove the condition it crashes due to the line Position_x = static_cast... is converting other type of variable in float which is illegal for some variables. How could I do this? Thanks in advance.

First, I'm pretty sure the problem isn't that it crashes when you have a pointer that isn't to a float. All binary values 0 - 0xFFFFFFFF are generally valid as float values, but some are "special", such as "not a number" - but as long as you only compare the value, etc, it will work just fine to use such values too - will just come out as false in the comparison.
Your actual problem is that you are making a pointer from base+i, and then using the content at that location as another pointer - but if base+i does not hold a valid pointer {which is pretty much most 32- or 64-bit values in most systems}, then your next fetch will go wrong. Unfortunately, there is no trivial way to check if a memory address is valid or not. In Windows, you could possibly use __try, __except to catch when you are trying to read an invalid memory address and "do something else".

Related

Is there a compiler bug for my iOS metal compute kernel or am I missing something?

I need an implementation of upper_bound as described in the STL for my metal compute kernel. Not having anything in the metal standard library, I essentially copied it from <algorithm> into my shader file like so:
static device float* upper_bound( device float* first, device float* last, float val)
{
ptrdiff_t count = last - first;
while( count > 0){
device float* it = first;
ptrdiff_t step = count/2;
it += step;
if( !(val < *it)){
first = ++it;
count -= step + 1;
}else count = step;
}
return first;
}
I created a simple kernel to test it like so:
kernel void upper_bound_test(
device float* input [[buffer(0)]],
device uint* output [[buffer(1)]]
)
{
device float* where = upper_bound( input, input + 5, 3.1);
output[0] = where - input;
}
Which for this test has a hardcoded input size and search value. I also hardcoded a 5 element input buffer on the framework side as you'll see below. This kernel I expect to return the index of the first input greater than 3.1
It doesn't work. In fact output[0] is never written--as I preloaded the buffer with a magic number to see if it gets over-written. It doesn't. In fact after waitUntilCompleted, commandBuffer.error looks like this:
Error Domain = MTLCommandBufferErrorDomain
Code = 1
NSLocalizedDescription = "IOAcceleratorFamily returned error code 3"
What does error code 3 mean? Did my kernel get killed before it had a chance to finish?
Further, I tried just a linear search version of upper_bound like so:
static device float* upper_bound2( device float* first, device float* last, float val)
{
while( first < last && *first <= val)
++first;
return first;
}
This one works (sort-of). I have the same problem with a binary search lower_bound from <algorithm>--yet a naive linear version works (sort-of). BTW, I tested my STL copied versions from straight C-code (with device removed obviously) and they work fine outside of shader-land. Please tell me I'm doing something wrong and this is not a metal compiler bug.
Now about that "sort-of" above: the linear search versions work on a 5s and mini-2 (A7s) (returns index 3 in the example above), but on a 6+ (A8) it gives the right answer + 2^31. What the heck! Same exact code. Note on the framework side I use uint32_t and on the shader side I use uint--which are the same thing. Note also that every pointer subtraction (ptrdiff_t are signed 8-byte things) are small non-negative values. Why is the 6+ setting that high order bit? And of course, why don't my real binary search versions work?
Here is the framework side stuff:
id<MTLFunction> upperBoundTestKernel = [_library newFunctionWithName: #"upper_bound_test"];
id <MTLComputePipelineState> upperBoundTestPipelineState = [_device
newComputePipelineStateWithFunction: upperBoundTestKernel
error: &err];
float sortedNumbers[] = {1., 2., 3., 4., 5.};
id<MTLBuffer> testInputBuffer = [_device
newBufferWithBytes:(const void *)sortedNumbers
length: sizeof(sortedNumbers)
options: MTLResourceCPUCacheModeDefaultCache];
id<MTLBuffer> testOutputBuffer = [_device
newBufferWithLength: sizeof(uint32_t)
options: MTLResourceCPUCacheModeDefaultCache];
*(uint32_t*)testOutputBuffer.contents = 42;//magic number better get clobbered
id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
id<MTLComputeCommandEncoder> commandEncoder = [commandBuffer computeCommandEncoder];
[commandEncoder setComputePipelineState: upperBoundTestPipelineState];
[commandEncoder setBuffer: testInputBuffer offset: 0 atIndex: 0];
[commandEncoder setBuffer: testOutputBuffer offset: 0 atIndex: 1];
[commandEncoder
dispatchThreadgroups: MTLSizeMake( 1, 1, 1)
threadsPerThreadgroup: MTLSizeMake( 1, 1, 1)];
[commandEncoder endEncoding];
[commandBuffer commit];
[commandBuffer waitUntilCompleted];
uint32_t answer = *(uint32_t*)testOutputBuffer.contents;
Well, I've found a solution/work-around. I guessed it was a pointer-aliasing problem since first and last pointed into the same buffer. So I changed them to offsets from a single pointer variable. Here's a re-written upper_bound2:
static uint upper_bound2( device float* input, uint first, uint last, float val)
{
while( first < last && input[first] <= val)
++first;
return first;
}
And a re-written test kernel:
kernel void upper_bound_test(
device float* input [[buffer(0)]],
device uint* output [[buffer(1)]]
)
{
output[0] = upper_bound2( input, 0, 5, 3.1);
}
This worked--completely. That is, not only did it fix the "sort-of" problem for the linear search, but a similarly re-written binary search worked too. I don't want to believe this though. The metal shader language is supposed to be a subset of C++, yet standard pointer semantics don't work? Can I really not compare or subtract pointers?
Anyway, I don't recall seeing any docs saying there can be no pointer aliasing or what declaration incantation would help me here. Any more help?
[UPDATE]
For the record, as pointed out by "slime" on Apple's dev forum:
https://developer.apple.com/library/ios/documentation/Metal/Reference/MetalShadingLanguageGuide/func-var-qual/func-var-qual.html#//apple_ref/doc/uid/TP40014364-CH4-SW3
"Buffers (device and constant) specified as argument values to a graphics or kernel function cannot be aliased—that is, a buffer passed as an argument value cannot overlap another buffer passed to a separate argument of the same graphics or kernel function."
But it's also worth noting that upper_bound() is not a kernel function and upper_bound_test() is not passed aliased arguments. What upper_bound_test() does do is create a local temporary that points into the same buffer as one of its arguments. Perhaps the docs should say what it means, something like: "No pointer aliasing to device and constant memory in any function is allowed including rvalues." I don't actually know if this is too strong.

Problems with pointers and memory adresses

I wonder why this code doesn't work:
#include <iostream>
using namespace std;
int main()
{
int *pointer = (int*)0x02F70BCC;
cout<<*pointer;
return 0;
}
In my opinion it should write on the screen value of 0x02F70BCC,
instead of this my programm crashes.
I know that memory with adress 0x02F70BCC stores value of 20.
But like I said no matter what it just doesn't want to show correct number.
Please help me guys, detailed explanation would be very nice of you.
It doesn't work, because you won't get access to every location in memory you want. Not every location in memory is valid, you may want to read about Virtual Address Space.
Some addresses are reserved for device drivers and kernel mode operations. Another range of addresses (for example 0xCCCCCCCC and higher) may be reserved for uninitialized pointers.
Even if some location is valid, operating system may still deny access to write to/read from certain location, if that would cause undefined behaviour or violate system safety.
EDIT
I think you might be interested in creating some kind of "GameHack", that allows you to modify amount of resources, number of units, experience level, attributes or anything.
Memory access is not a simple topic. Different OSes use different strategies to prevent security violations. But many thing can be done here, after all there is a lot software for doing such things.
First of all, do you really need to write your own tool? If you just want some cheating, use ArtMoney - it is a great memory editor, that I have been using for years.
But if you really have to write it manually, you need to do some research first.
On Windows, for example, I would start from these:
ReadProcessMemory
WriteProcessMemory
Also, I am quite certain, that one of possible techniques is to pretend, that you are a debugger:
DebugActiveProcess.
EDIT 2
I have done some research and it looks, that on Windows (I assume this is your platform, since you mentioned gaming; can't imagine playing anything on crappy Linux), steps required to write another process' memory are:
1. Enumerate processes: (EnumProcesses)
const size_t MAX_PROC_NUM = 512;
DWORD procIDs[MAX_PROC_NUM] = { 0 };
DWORD idsNum = 0;
if(!EnumProcesses(procIDs, sizeof(DWORD) * MAX_PROC_NUM, &idsNum))
//handle error here
idsNum /= sizeof(DWORD); //After EnumProcesses(), idsNum contains number of BYTES!
2. Open required process. (OpenProcess,GetModuleFileNameEx)
const char* game_exe_path = "E:\\Games\\Spellforce\\Spellforce.exe"; //Example
HANDLE game_proc_handle = nullptr;
DWORD proc_access = PROCESS_QUERY_INFORMATION | PROCESS_VM_READ | PROCESS_VM_WRITE; //read & write memory, query info needed to get .exe name
const DWORD MAX_EXE_PATH_LEN = 1024;
for(DWORD n = 0 ; n < idsNum ; ++idsNum)
{
DWORD current_id = procIDs[n];
HANDLE current_handle = OpenProcess(proc_access, false, current_id);
if(!current_handle)
{
//handle error here
continue;
}
char current_path[MAX_EXE_PATH_LEN];
DWORD length = GetModuleFileNameEx(current_handle, nullptr, current_path, MAX_EXE_PATH_LEN);
if(length > 0)
{
if(strcmp(current_path, game_exe_path) == 0) //that's our game!
{
game_proc_handle = current_handle;
break;
}
}
CloseHandle(current_handle); //don't forget this!
}
if(!game_proc_handle)
//sorry, game not found
3. Write memory (WriteProcessMemory)
void* pointer = reinterpret_cast<void*>(0x02F70BCC);
int new_value = 5000; //value to be written
BOOL success = WriteProcessMemory(game_proc_handle, pointer, &new_value, sizeof(int), nullptr);
if(success)
//data successfully written!
else
//well, that's... em...
This code is written just 'as is', but I see no errors, so you can use it as your starting point. I also provided links for all functions I used, so with some additional research (if necessary), you can achieve what you are trying to.
Cheers.
When you use,
cout<<*pointer;
the program tries to dereference the value of the pointer and writes the value at the address.
If you want to print just the pointer, use:
cout << pointer;
Example:
int main()
{
int i = 20;
int* p = &i;
std::cout << *p << std::endl; // print the value stored at the address
// pointed to by p. In this case, it will
// print the value of i, which is 20
std::cout << p << std::endl; // print the address that p points to
// It will print the address of i.
}

Bad optimization of std::fabs()?

Recently i was working with an application that had code similar to:
for (auto x = 0; x < width - 1 - left; ++x)
{
// store / reset points
temp = hPoint = 0;
for(int channel = 0; channel < audioData.size(); channel++)
{
if (peakmode) /* fir rms of window size */
{
for (int z = 0; z < sizeFactor; z++)
{
temp += audioData[channel][x * sizeFactor + z + offset];
}
hPoint += temp / sizeFactor;
}
else /* highest sample in window */
{
for (int z = 0; z < sizeFactor; z++)
{
temp = audioData[channel][x * sizeFactor + z + offset];
if (std::fabs(temp) > std::fabs(hPoint))
hPoint = temp;
}
}
.. some other code
}
... some more code
}
This is inside a graphical render loop, called some 50-100 times / sec with buffers up to 192kHz in multiple channels. So it's a lot of data running through the innermost loops, and profiling showed this was a hotspot.
It occurred to me that one could cast the float to an integer and erase the sign bit, and cast it back using only temporaries. It looked something like this:
if ((const float &&)(*((int *)&temp) & ~0x80000000) > (const float &&)(*((int *)&hPoint) & ~0x80000000))
hPoint = temp;
This gave a 12x reduction in render time, while still producing the same, valid output. Note that everything in the audiodata is sanitized beforehand to not include nans/infs/denormals, and only have a range of [-1, 1].
Are there any corner cases where this optimization will give wrong results - or, why is the standard library function not implemented like this? I presume it has to do with handling of non-normal values?
e: the layout of the floating point model is conforming to ieee, and sizeof(float) == sizeof(int) == 4
Well, you set the floating-point mode to IEEE conforming. Typically, with switches like --fast-math the compiler can ignore IEEE corner cases like NaN, INF and denormals. If the compiler also uses intrinsics, it can probably emit the same code.
BTW, if you're going to assume IEEE format, there's no need for the cast back to float prior to the comparison. The IEEE format is nifty: for all positive finite values, a<b if and only if reinterpret_cast<int_type>(a) < reinterpret_cast<int_type>(b)
It occurred to me that one could cast the float to an integer and erase the sign bit, and cast it back using only temporaries.
No, you can't, because this violates the strict aliasing rule.
Are there any corner cases where this optimization will give wrong results
Technically, this code results in undefined behavior, so it always gives wrong "results". Not in the sense that the result of the absolute value will always be unexpected or incorrect, but in the sense that you can't possibly reason about what a program does if it has undefined behavior.
or, why is the standard library function not implemented like this?
Your suspicion is justified, handling denormals and other exceptional values is tricky, the stdlib function also needs to take those into account, and the other reason is still the undefined behavior.
One (non-)solution if you care about performance:
Instead of casting and pointers, you can use a union. Unfortunately, that only works in C, not in C++, though. That won't result in UB, but it's still not portable (although it will likely work with most, if not all, platforms with IEEE-754).
union {
float f;
unsigned u;
} pun = { .f = -3.14 };
pun.u &= ~0x80000000;
printf("abs(-pi) = %f\n", pun.f);
But, granted, this may or may not be faster than calling fabs(). Only one thing is sure: it won't be always correct.
You would expect fabs() to be implemented in hardware. There was an 8087 instruction for it in 1980 after all. You're not going to beat the hardware.
How the standard library function implements it is .... implementation dependent. So you may find different implementation of the standard library with different performance.
I imagine that you could have problems in platforms where int is not 32 bits. You 'd better use int32_t (cstdint>)
For my knowledge, was std::abs previously inlined ? Or the optimisation you observed is mainly due to suppression of the function call ?
Some observations on how refactoring may improve performance:
as mentioned, x * sizeFactor + offset can be factored out of the inner loops
peakmode is actually a switch changing the function's behaviour - make two functions rather than test the switch mid-loop. This has 2 benefits:
easier to maintain
fewer local variables and code paths to get in the way of optimisation.
The division of temp by sizeFactor can be deferred until outside the channel loop in the peakmode version.
abs(hPoint) can be pre-computed whenever hPoint is updated
if audioData is a vector of vectors you may get some performance benefit by taking a reference to audioData[channel] at the start of the body of the channel loop, reducing the array indexing within the z loop down to one dimension.
finally, apply whatever specific optimisations for the calculation of fabs you deem fit. Anything you do here will hurt portability so it's a last resort.
In VS2008, using the following to track the absolute value of hpoint and hIsNeg to remember whether it is positive or negative is about twice as fast as using fabs():
int hIsNeg=0 ;
...
//Inside loop, replacing
// if (std::fabs(temp) > std::fabs(hPoint))
// hPoint = temp;
if( temp < 0 )
{
if( -temp > hpoint )
{
hpoint = -temp ;
hIsNeg = 1 ;
}
}
else
{
if( temp > hpoint )
{
hpoint = temp ;
hIsNeg = 0 ;
}
}
...
//After loop
if( hIsNeg )
hpoint = -hpoint ;

C array with float data crashing in Objective C class (EXC_BAD_ACCESS)

I am doing some audio processing and therefore mixing some C and Objective C. I have set up a class that handles my OpenAL interface and my audio processing. I have changed the class suffix to
.mm
...as described in the Core Audio book among many examples online.
I have a C style function declared in the .h file and implemented in the .mm file:
static void granularizeWithData(float *inBuffer, unsigned long int total) {
// create grains of audio data from a buffer read in using ExtAudioFileRead() method.
// total value is: 235377
float tmpArr[total];
// now I try to zero pad a new buffer:
for (int j = 1; j <= 100; j++) {
tmpArr[j] = 0;
// CRASH on first iteration EXC_BAD_ACCESS (code=1, address= ...blahblah)
}
}
Strange??? Yes I am totally out of ideas as to why THAT doesn't work but the FOLLOWING works:
float tmpArr[235377];
for (int j = 1; j <= 100; j++) {
tmpArr[j] = 0;
// This works and index 0 - 99 are filled with zeros
}
Does anyone have any clue as to why I can't declare an array of size 'total' which has an int value? My project uses ARC, but I don't see why this would cause a problem. When I print the value of 'total' when debugging, it is in fact the correct value. If anyone has any ideas, please help, it is driving me nuts!
Problem is that that array gets allocated on the stack and not on the heap. Stack size is limited so you can't allocate an array of 235377*sizeof(float) bytes on it, it's too large. Use the heap instead:
float *tmpArray = NULL;
tmpArray = (float *) calloc(total, sizeof(float)); // allocate it
// test that you actually got the memory you asked for
if (tmpArray)
{
// use it
free(tmpArray); // release it
}
Mind that you are always responsible of freeing memory which is allocated on the heap or you will generate a leak.
In your second example, since size is known a priori, the compiler reserves that space somewhere in the static space of the program thus allowing it to work. But in your first example it must do it on the fly, which causes the error. But in any case before being sure that your second example works you should try accessing all the elements of the array and not just the first 100.

memcpy() crashes randomly

I am using memcpy in my application. memcpy crashes randomely and below is the logs i got in Dr.Watson files.
100181b5 8bd1 mov edx,ecx
100181b7 c1e902 shr ecx,0x2
100181ba 8d7c030c lea edi,[ebx+eax+0xc]
100181be f3a5 rep movsd
100181c0 8bca mov ecx,edx
100181c2 83e103 and ecx,0x3
FAULT ->100181c5 f3a4 rep movsb ds:02a3b000=?? es:01b14e64=00
100181c7 ff1508450210 call dword ptr [Debug (10024508)]
100181cd 83c424 add esp,0x24
100181d0 6854580210 push 0x10025854
100181d5 ff1508450210 call dword ptr [Debug (10024508)]
100181db 83c404 add esp,0x4
Below is the code
memcpy((char *)dep + (int)sizeof(EntryRec) + (int)adp->fileHdr.keySize, data, dataSize );
Where:
dep is a structure
EntryRec is a charecter pointer
adp is a structure
data is not NULL in this case
Has anyone faced this issue and can help me?
I have tried to debug the prog,
then i got the following error
Unhandled exception in Prog.exe(MSVCRTD.DLL):0xC0000005: Access voilation
Data is passed argument for this program and this is void*
Further Info:
I have tried to Debug the code adapter is crashing in the following area this function is present in OUTPUT.c (I think this is a library function)
#else /* _UNICODE */
if (flags & (FL_LONG|FL_WIDECHAR)) {
if (text.wz == NULL) /* NULL passed, use special string */
text.wz = __wnullstring;
bufferiswide = 1;
pwch = text.wz;
while ( i-- && *pwch )
++pwch;
textlen = pwch - text.wz;
/* textlen now contains length in wide chars */
} else {
if (text.sz == NULL) /* NULL passed, use special string */
text.sz = __nullstring;
p = text.sz;
while (i-- && *p) //Crash points here
++p;
textlen = p - text.sz; /* length of the string */
}
Value for variables:
p= ""(not initialised)
i= 2147483598
There are two very likely explanations:
You are using memcpy across overlapping addresses -- the behavior of this situation is undefined. If you require the ability to handle overlapping addresses, std::memmove is the "equivalent" tool.
You are using memcpy to copy to/from memory that is inaccessible to your program.
From the code you've shown, it looks like (2) is the more likely scenario. Since you are able to debug the source, try setting a breakpoint before the memcpy occurs, and verify that the arguments to memcpy all match up (i.e. source + num < dest or source > dest + num).
From the disassembled code it appears that the source pointer is not in your address space.
rep movsb copies from ds:si to es:di. The ?? indicates that the memory at ds:si could not be read.
Is the data pointed to by (char *)dep + (int)sizeof(EntryRec) + (int)adp->fileHdr.keySize always at least dataSize long?
I have come across similar crashes where variable length strings are later treated like fixed with strings.
eg
char * ptr = strdup("some string");
// ...
memcpy(ptr, dest, fixedLength);
Where fixedLength is greater than 10. Obviously these were in different functions so the length issue was not noticed. Most of the time this will work, dest will contain "some string" and after the null will be random garbage. In this case if you treat dest as a null terminated string you will never notice, as you don't see the garbage after the null.
However if ptr is allocated at the end of a page of memory, you can only read to the end of the allocated memory and no further. As soon as you read past the end of the page the operating system will rightly crash your program.
It looks like you've run over the end of a buffer and generated an access violation.
Edit: There still is not enough information. We cannot spot a bug without knowing much more about how the buffer you are trying to copy to is allocated whether it has enough space (I suspect it does not) and whether dataSize is valid.
If memcpy crashes the usual reason is, that you passed illegal arguments.
Note that with memcpy source and destination may not overlap.
In such a case use memmove.
from your code "memcpy((char *)dep + (int)sizeof(EntryRec) + (int)adp->fileHdr.keySize, data, dataSize)" and the debug infomation, the "data" looks like a local variable (on-stack variable), you'd do "data = malloc(DATA_SIZE)" instead of "char data[DATA_SIZE]" etc; otherwise, at your current code line, the "data" was popped already, so may cause memory accessing fault randomly.
I'd suggest using memmove as this handles overlapping strings, when using memcpy in this situation the result is unpredictable.