Is it possible without any file.out and source code, but just the binary?
Is it possible, knowing the name of a var, found and read at runtime the value?
Is it possible, knowing the name of a var, found and read at runtime the value
It depends.
If the variable is a global, and the binary is not stripped, then you should be able to examine its value with a simple
x/gx &var
print var
The latter may print the variable as if it were of type int (if the binary has no debug info), which may not be what you are looking for.
If the variable is local (automatic), then you can print it only while inside the routine in which it is declared (obviously).
If the binary has debug info, then simple print var in correct context should work.
If the binary doesn't, you'll have to figure out the in-memory address of the variable (usually at fixed offset from stack pointer of frame pointer register), and examine that address. You can often figure out a lot about the given routine by disassembling it.
Update:
if I strip the binary, is harder to do the reverse engineering?
Sure: the less info you provide to the attacker, the harder you make his job.
But you also make your job harder: when your binary doesn't work, often your end-user will know more about his system than you do. Often he will load your binary into GDB, and tell you exactly where your bug is. With a stripped executable, he likely wouldn't be able to do that, so you'll guess back and forth, and after a week of trying will lose that customer.
And there is nothing you can do to prevent a sufficiently determined and sufficiently skilled hacker with root access to his system and hardware from reverse engineering your program.
In the end, in my experience, anti-circumvention techniques are usually much more trouble than they are worth.
Related
A long time ago I learned about filling unused / uninitialized memory with 0xDEADBEEF so that in a debugger or a crash report if I ever see that value I know I'm looking at uninitialized memory. I saw from a crash report iOS uses 0xBBADBEEF.
What other creative values have people used? Do any particular values have any kind of specific benefit?
The most obvious benefit of values that turn into words is that, at least of most people, if the words are in their language they stick out easily where as some strictly numeric value is less likely to stick out.
But, maybe there are other reason to pick numbers? For example an odd number might crash a processors (68000) for example on certain memory accesses so it's probably better to pick 0x0BADBEEF over 0xBADBEEF0. Are their any other values (maybe processor specific) that have a concrete benefit for using for uninitialized memory?
Generally speaking, you want a value which is unlikely to happen to "work" when interpreted as either an integer, a pointer, or a string. So, here are a few constraints:
Don't use a value that's a multiple of the smallest "usual" alignment on your target architecture. For x86, that's 4 (bytes), so no values that are divisible by 4. This ensures that if the value is interpreted as a pointer, it'll be obviously-incorrect. If you're on a non-x86 architecture, you might even be able to use a value that will cause an alignment trap if used as a pointer.
Don't use a value which could reasonably be a small (positive or negative) integer. Your typical "int" variable in a C program never gets larger than 1,000 or so, so don't use small numbers as your empty data fill.
Don't use a value which is composed entirely of valid ASCII characters. Make sure there's at least one byte in there with the high bit set. These days, you'd want to make sure they weren't valid UTF-8 or possibly UTF-16 values, either.
Don't have any zero bytes in the value. There are too many cases where this would work out to be "helpful" to keeping the program from crashing - terminating a string, giving a non-int field a reasonable-looking value, etc.
Don't use a single (or two) byte values, repeated over and over. Having a full-word length pattern can make it easier to determine how your wild pointer ended up pointing where it is, at least narrowing down which operations offset it from the start of the pattern.
Don't use a value that maps to an valid address for a "typical" process. If the highest bits are set, it'll typically take a whole lot of malloc() before your process will grow large enough to make that a valid address.
Perhaps unsurprisingly, patterns like 0xDEADBEEF meet basically all of these requirements.
One technical term for values like this is "poison value".
Hex numbers that form English words are called Hexspeak. Wikipedia's Hexspeak article pretty much answers this question, cataloguing many known constants in use for various things, including several that are used as poison values / canaries / sanity checks, as well as other uses like error codes or IPv6 addresses.
I seem to recall some variation of 0xBADF00D. (maybe with a repeated letter like your 2nd example).
There's also 0xDEADC0DE. (Googling for where I've seen this used found the wikipedia article linked above).
Other English words in hex I've seen: Java .class files use 0xCAFEBABE as the magic number (first 4 bytes of the file). As a play on this, I guess, the Jikes JVM uses 0xDEADBABE as a sanity check constant.
Apparently Java wasn't the first user of 0xCAFEBABE. Wikipedia says "It was originally created by NeXTSTEP developers as a reference to the baristas at Peet's Coffee & Tea", and was used by the people developing Java before they thought of the name "Java". So it didn't come out of Java -> coffee (if anything the other way around), it's just plain old non-feminist tech culture. :(
re: update: Choosing a good value. For a poison value (not an error code), you want all the bytes to be different and not 0x00 or 0xFF, since those are probably the most likely values for an errant single-byte store. This applies especially for things like stack canaries (to detect buffer overruns), or other cases where detecting that it didn't get overwritten is important.
Your speculation about picking an odd value makes a lot of sense. Not being a valid memory address in the virtual memory layout of typical processes is a big advantage. Failing noisily as early as possible is optimal for debugging. Anyway, this probably means that having the high bit set is a good idea, so 0x0... is probably not a good idea.
I'm trying to understand a small binary using gdb but there is something I can't find a way to achieve : how can I find the list of jumps that point to a specified address?
I have a small set of instructions in the disassembled code and I want to know where it is called.
I first thought about searching the corresponding instruction in .text, but since there are many kind of jumps, and address can be relative, this can't work.
Is there a way to do that?
Alternatively, if I put a breakpoint on this address, is there a way to know the address of the previous instruction (in this case, the jump)?
If this is some subroutine being called from other places, then it must respect some ABI while it's called.
Depending on a CPU used, the return address (and therefore a place from where it was called) will be stored somewhere (on stack or in some registers). If you replace original code with the one that examines this, you can create a list of return addresses. Or simpler, as you suggested, if you use gdb and put a breakpoint at that routine, you can see from where it was called by using a bt command.
If it was actual jump (versus a "jump to subroutine") that led you there (which I doubt, if it's called from many places, unless it's a kind of longjmp/setjmp), then you will probably not be able to determine where this was called from, unless the CPU you are using allows you to trace the execution in some way.
Sorry to pester everyone, but this has been causing me some pain. Here's the code:
char buf[500];
sprintf(buf,"D:\\Important\\Calibration\\Results\\model_%i.xml",mEstimatingModelID);
mEstimatingModelID is an integer, currently holding value 0.
Simple enough, but debugging shows this is happening:
0x0795f630 "n\Results\model_0.xml"
I.e. it's missing the start of the string.
Any ideas? This is simple stuff, but I can't figure it out.
Thanks!
In an effort to make this an actual general answer: Here's a checklist for similar errors:
Never trust what you see in release mode, especially local variables that have been allocated from stack memory. Static variables that exist in heap data are about the only thing that will generally be correct but even then, don't trust it. (Which was the case for the user above)
It's been my experience that the more recent versions of VS have less reliable release mode data (probably b/c they optimize much more in release, or maybe it's 64bitness or whatever)
Always verify that you are examining the variable in the correct function. It is very easy to have a variable named "buf" in a higher function that has some uninitialized garbage in it. This would be easily confused with the same named variable in the lower subroutine/function.
It's always a good idea to double check for buffer overruns. If you ever use a %s in your sprintf, you could get a buffer overrun.
Check your types. sprintf is pretty adaptable and you can easily get a non-crashing but strange result by passing in a string pointer when an int is expected etc.
Is there any way to profile the mathkernel memory usage (down to individual variables) other than paying $$$ for their Eclipse plugin (mathematica workbench, iirc)?
Right now I finish execution of a program that takes multiple GB's of ram, but the only things that are stored should be ~50MB of data at most, yet mathkernel.exe tends to hold onto ~1.5GB (basically, as much as Windows will give it). Is there any better way to get around this, other than saving the data I need and quitting the kernel every time?
EDIT: I've just learned of the ByteCount function (which shows some disturbing results on basic datatypes, but that's besides the point), but even the sum over all my variables is nowhere near the amount taken by mathkernel. What gives?
One thing a lot of users don't realize is that it takes memory to store all your inputs and outputs in the In and Out symbols, regardless of whether or not you assign an output to a variable. Out is also aliased as %, where % is the previous output, %% is the second-to-last, etc. %123 is equivalent to Out[123].
If you don't have a habit of using %, or only use it to a few levels deep, set $HistoryLength to 0 or a small positive integer, to keep only the last few (or no) outputs around in Out.
You might also want to look at the functions MaxMemoryUsed and MemoryInUse.
Of course, the $HistoryLength issue may or not be your problem, but you haven't shared what your actual evaluation is.
If you're able to post it, perhaps someone will be able to shed more light on why it's so memory-intensive.
Here is my solution for profiling of memory usage:
myByteCount[symbolName_String] :=
Replace[ToHeldExpression[symbolName],
Hold[x__] :>
If[MemberQ[Attributes[x], Protected | ReadProtected],
Sequence ## {}, {ByteCount[
Through[{OwnValues, DownValues, UpValues, SubValues,
DefaultValues, FormatValues, NValues}[Unevaluated#x,
Sort -> False]]], symbolName}]];
With[{listing = myByteCount /# Names[]},
Labeled[Grid[Reverse#Take[Sort[listing], -100], Frame -> True,
Alignment -> Left],
Column[{Style[
"ByteCount for symbols without attributes Protected and \
ReadProtected in all contexts", 16, FontFamily -> "Times"],
Style[Row#{"Total: ", Total[listing[[All, 1]]], " bytes for ",
Length[listing], " symbols"}, Bold]}, Center, 1.5], Top]]
Evaluation the above gives the following table:
Michael Pilat's answer is a good one, and MemoryInUse and MaxMemoryUsed are probably the best tools you have. ByteCount is rarely all that helpful because what it measures can be a huge overestimate because it ignores shared subexpressions, and it often ignores memory that isn't directly accessible through Mathematica functions, which is often a major component of memory usage.
One thing you can do in some circumstances is use the Share function, which forces subexpressions to be shared when possible. In some circumstances, this can save you tens or even hundreds of magabytes. You can tell how well it's working by using MemoryInUse before and after you use Share.
Also, some innocuous-seeming things can cause Mathematica to use a whole lot more memory than you expect. Contiguous arrays of machine reals (and only machine reals) can be allocated as so-called "packed" arrays, much the way they would be allocated by C or Fortran. However, if you have a mix of machine reals and other structures (including symbols) in an array, everything has to be "boxed", and the array becomes an array of pointers, which can add a lot of overhead.
One way is to automatize restarting of kernel when it goes out of memory. You can execute your memory-consuming code in a slave kernel while the master kernel only takes the result of computation and controls memory usage.
Im trying to get the engine version of a game from a global pointer, but I am fairly new to this. Here is a very small example I found...
http://ampaste.net/mb42243
And this is the disassembly for what I am trying to get, the pointer (gpszVersionString) is the highlighted line (line 5)
http://ampaste.net/m2a8f8887
So what I need to find out is basically using the example approach I found to get it, would I need to basically sig out the first part of the function and find the offset to that line?
Like...
Memory signature - /x56/x8B/x35/x74/xD5/x29/x10/x68/x00/xA8/x38/x10
Then an offset to reach that line? (not sure how to find the offset)
You can't directly do this. Process address space is completely unique to your process -- 0xDEADBEEF can point to "Dog" in one process, while 0xDEADBEEF can point to "Cat" in another. You would have to make operating system calls that allow you to access another process' address space, and even then you'd have to guess. Many times that location will be different each run of the application -- you can't generally predict what the runtime layout of a process will be in all cases.
Assuming you're on Windows you'll need to (EDIT: You don't need A and B in all cases but you usually need them) A. be an administrator, B. take the SeDebugPrivilege for your process, C, open a handle to the process, and then D. use ReadProcessMemory/WriteProcessMemory to do what you want.
Hope that helps :)
EDIT 2: It looks like you're looking at an address taken from a disassembler. If that's the case, then you can't use that value of the address -- the image can be re-based at runtime and the value there would be completely different. Particularly on recent versions of Windows which support Address Space Layout Randomization.