I wonder whether the input parameters of main() can be changed at runtime. In other words, should we protect the app from a possible TOCTTOU attack when handling data in argv? Currently, I don't know of any way to change data that was passed in argv, but I'm not sure that no such way exists.
UPD: I forgot to point out that I'm curious about changing argv from outside the program since argv is accepted from outside the program.
I'd say there are two main options based on your threat model here:
You do not trust the environment and assume that other privileged processes on your machine are able to alter the contents of memory of your program while it is running. If so, nothing is safe, the program could be altered to do literally anything. In such case, you can't even trust an integer comparison.
You trust the environment in which your program is running. In this case your program is the only owner of its data, and as long as you don't explicitly decide to alter argv or any other piece of data, you can rely on it.
In the first case, it doesn't matter if you guard against potential argv modifications, since you are not trusting the execution environment, so even those guards could be fooled. In the second case, you trust the execution environment, so you don't need to guard against the problem in the first place.
In both the above cases, the answer is: no, you shouldn't protect the app from a possible TOCTTOU attack when handling data in argv.
TOCTTOU problems usually arise from external, untrusted data that can be modified by somebody else and so should not be trusted by definition. A simple example is the existence of a file: you cannot rely on it, because other users or programs on the machine could delete or move it; the only way to make sure the file can be used is to try to open it. In the case of argv, the data is not external and is owned by the process itself, so the problem really does not apply.
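The file example can be sketched in a few lines. This is only an illustration; the helper name read_first_line is made up for this post. Instead of checking that the file exists and opening it later, the open attempt itself serves as the check:

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Racy pattern (avoid): test that the file exists, then open it later.
// The file can be removed or replaced between the two steps.
//
// Safer pattern: attempt the open and act on the result of that one call.
// Returns true and stores the first line in `out` if the file could be opened.
// (read_first_line is a hypothetical helper name.)
bool read_first_line(const std::string& path, std::string& out) {
    std::ifstream in(path);      // the open itself is the check
    if (!in.is_open()) {
        return false;            // missing, moved, or not readable
    }
    std::getline(in, out);       // keep using the handle we already hold
    return true;
}
```

Once the stream is open, the program works with the handle it holds rather than looking the path up a second time.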
In general, the strings passed to main() in the argv array live in the program's user space, usually in a fixed place at the top of the program stack.
The reason for such a fixed place is that some programs modify this area so that a privileged program (e.g. the ps command) can gather and show different command arguments as the program evolves at runtime. This is used in programs like sendmail(8), or in a user program's threads, to show which thread is doing which job.
This feature is not standard, and different operating systems handle it differently (I have described the BSD way). As far as I know, Linux and Solaris exhibit this behaviour as well.
In general, this makes the arguments to main something that, while belonging to the user process space, has to be modified with care (following some operating-system-specific contract), as it is normally subject to rigid conventions. The ps(1) command digs into the user space of the process it is going to show in order to produce the long listing with the command parameters. The different operating systems document the exact format, or how the stack is initialized by the exec(2) family of calls (you can probably get this from the standard linker script used on your system; the exec(2) manual page should also help).
I don't know exactly whether this is what you expect, or whether you just want to see if you can modify the arguments. As something belonging to the user space of the process, they are most probably modifiable, but I cannot think of any reason to do so apart from those described in this answer.
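For illustration, here is a minimal sketch of the "modify with care" part: overwriting an argument string in place without ever writing past the space the original string occupied. The function name overwrite_in_place is made up for this example, and whether ps actually shows the change depends on the operating system, so treat it as a sketch rather than a portable recipe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Overwrites the string stored at `arg` (e.g. argv[0]) with `title`,
// never writing past the bytes the original string occupied.
// Returns the number of characters of `title` that were kept.
// (overwrite_in_place is a hypothetical helper, not a system API.)
std::size_t overwrite_in_place(char* arg, const char* title) {
    const std::size_t capacity = std::strlen(arg);   // bytes before the terminator
    const std::size_t wanted = std::strlen(title);
    const std::size_t n = wanted < capacity ? wanted : capacity;
    std::memcpy(arg, title, n);
    std::memset(arg + n, '\0', capacity - n + 1);    // clear the tail, keep termination
    return n;
}
```

A longer title is silently truncated, which is the price of staying inside the original buffer.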
By the way, the FreeBSD manual page for the execlp(2) system call shows the following excerpt:
The type of the argv and envp parameters to execle(), exect(), execv(),
execvp(), and execvP() is a historical accident and no sane
implementation should modify the provided strings. The bogus parameter
types trigger false positives from const correctness analyzers. On
FreeBSD, the __DECONST() macro may be used to work around this
limitation.
This states clearly that you should not modify them (in FreeBSD at least). I assume the ps(1) command handles the extra work of verifying those parameters properly, so that it never runs into a security bug (this can be tested, but I leave it as an exercise for the interested reader).
EDIT
If you look at /usr/include/sys/exec.h (line 43) in FreeBSD, you will find that there's a struct ps_strings located at the top of the user stack, which the ps(1) command uses to locate the process environment and argv strings. While you can edit this to change the information a program gives to ps(1), there is also a setproctitle(3) library function for that purpose (again, all of this is FreeBSD-specific; you'll have to dig to find out how Linux or other systems solve this problem).
I've tried this approach, but it doesn't work. Today there's a library function for this purpose, but the top of the stack is still filled with the data mentioned above (I assume for compatibility reasons).
I was thinking about some RPC code that I have to implement in C++, and I wondered if it's safe (and under which assumptions) to send a function pointer over the network to the same binary code (assuming it's exactly the same and that both sides are running on the same architecture). I guess virtual memory should make the difference here.
I'm asking just out of curiosity, since it's a bad design in any case, but I would like to know if it's theoretically possible (and if it extends to other kinds of pointers to static data, besides functions, that the program may include).
In general, it's not safe for many reasons, but there are limited cases in which it will work. First of all, I'm going to assume you're using some sort of signing or encryption in the protocol that ensures the integrity of your data stream; if not, you have serious security issues already that are only compounded by passing around function pointers.
If the exact same program binary is running on both ends of the connection, if the function is in the main program (or in code linked from a static library) and not in a shared library, and if the program is not built as a position-independent executable (PIE), then the function pointer will be the same on both ends and passing it across the network should work. Note that these are very stringent conditions that would have to be documented as part of using your program, and they're very fragile; for instance if somebody upgrades the software on one side and forgets to upgrade the version on the other side of the connection at the same time, things will break horribly and dangerously.
I would avoid this type of low-level RPC entirely in favor of a higher-level command structure or abstract RPC framework, but if you really want to do it, a slightly safer approach would be to pass function names and use dlsym or equivalent to look them up. If the symbols reside in the main program binary rather than libraries, then depending on your platform you might need -rdynamic (GCC) or a similar option to make them available to dlsym. libffi might also be a useful tool for abstracting this.
Also, if you want to avoid depending on dlsym or libffi, you could keep your own "symbol table" hard-coded in the binary as a static const linear table or hash table mapping symbol names to function pointers. The hash table format used in ELF for this purpose is very simple to understand and implement, so I might consider basing your implementation on that.
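A minimal sketch of such a hard-coded table, using a linear search rather than a hash table. The handler names add and mul, and the helpers lookup and dispatch, are invented for the example. Only names travel over the wire, and unknown names are rejected instead of being turned into a jump target:

```cpp
#include <cassert>
#include <cstring>

using Handler = int (*)(int, int);

// Example functions a peer is allowed to request (illustrative only).
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

struct Symbol {
    const char* name;
    Handler fn;
};

// The hard-coded "symbol table": only functions listed here can be called.
static const Symbol kSymbols[] = {
    {"add", add},
    {"mul", mul},
};

// Linear lookup by name; returns nullptr for unknown names.
Handler lookup(const char* name) {
    for (const Symbol& s : kSymbols) {
        if (std::strcmp(s.name, name) == 0) {
            return s.fn;
        }
    }
    return nullptr;
}

// Calls the named function if it is in the table; reports failure otherwise.
bool dispatch(const char* name, int a, int b, int& result) {
    Handler fn = lookup(name);
    if (fn == nullptr) {
        return false;
    }
    result = fn(a, b);
    return true;
}
```

For a handful of entries the linear scan is fine; a hash table only starts to matter once the table grows large.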
What is it a pointer to?
Is it a pointer to a piece of static program memory? If so, don't forget that it's an address, not an offset, so you'd first need to convert between the two accordingly.
Second, if it's not a piece of static memory (i.e. a statically allocated array created at build time as opposed to run time), it's not really possible at all.
Finally, how are you ensuring the two pieces of code are the same? Are both binaries bit-identical (e.g. diff -a binary1 binary2)? Even if they are bit-identical, depending on the virtual memory management on each machine, the program's entire memory segment may not exist in a single page, or the alignment across multiple pages may be different on each system.
This is really a bad idea, no matter how you slice it. This is what message passing and APIs are for.
I don't know of any form of RPC that will let you send a pointer over the network (at least without doing something like casting to int first). If you do convert to int on the sending end, and convert that back to a pointer on the far end, you get pretty much the same as converting any other arbitrary int to a pointer: undefined behavior if you ever attempt to dereference it.
Normally, if you pass a pointer to an RPC function, it'll be marshalled -- i.e., the data it points to will be packaged up, sent across, put into memory, and a pointer to that local copy of the data passed to the function on the other end. That's part of why/how IDL gets a bit ugly -- you need to tell it how to figure out how much data to send across the wire when/if you pass a pointer. Most know about zero-terminated strings. For other types of arrays, you typically need to specify the size of the data (somehow or other).
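To make the marshalling idea concrete, here is a rough sketch (not what any particular IDL compiler generates) that packages the pointed-to bytes behind a 4-byte big-endian length prefix and rebuilds a local copy on the receiving side. The names marshal and unmarshal are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Marshal: write a 4-byte big-endian length, then the bytes pointed to.
// The pointer value itself is never sent.
std::vector<std::uint8_t> marshal(const char* data, std::uint32_t size) {
    std::vector<std::uint8_t> wire;
    wire.reserve(4 + size);
    for (int shift = 24; shift >= 0; shift -= 8) {
        wire.push_back(static_cast<std::uint8_t>((size >> shift) & 0xFF));
    }
    wire.insert(wire.end(), data, data + size);
    return wire;
}

// Unmarshal: validate the length prefix, then copy the payload into local
// memory. The receiver hands a pointer to this local copy to the callee.
bool unmarshal(const std::vector<std::uint8_t>& wire, std::string& out) {
    if (wire.size() < 4) {
        return false;
    }
    std::uint32_t size = 0;
    for (int i = 0; i < 4; ++i) {
        size = (size << 8) | wire[i];
    }
    if (wire.size() - 4 != size) {
        return false;   // truncated or padded message
    }
    out.assign(wire.begin() + 4, wire.end());
    return true;
}
```

The explicit size parameter is the hand-written equivalent of telling the IDL how much data sits behind the pointer.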
This is highly system dependent. On systems with virtual addressing such that each process thinks it's running at the same address each time it executes, this could plausibly work for executable code. Darren Kopp's comment and link regarding ASLR is interesting - a quick read of the Wikipedia article suggests the Linux & Windows versions focus on data rather than executable code, except for "network facing daemons" on Linux, and on Windows it applies only when "specifically linked to be ASLR-enabled".
Still, "same binary code" is best assured by static linking - if different shared objects/libraries are loaded, or they're loaded in different order (perhaps due to dynamic loading - dlopen - driven by different ordering in config files or command line args etc.) you're probably stuffed.
Sending a pointer over the network is generally unsafe. The two main reasons are:
Reliability: the data/function pointer may not point to the same entity (data structure or function) on another machine due to different location of the program or its libraries or dynamically allocated objects in memory. Relocatable code + ASLR can break your design. At the very least, if you want to point to a statically allocated object or a function, you should send its offset w.r.t. the image base if your platform is Windows, or do something similar on whatever OS you are on.
Security: if your network is open and there's a hacker (or they have broken into your network), they can impersonate your first machine and make the second machine either hang or crash, causing a denial of service, or execute arbitrary code and get access to sensitive information or tamper with it or hijack the machine and turn it into an evil bot sending spam or attacking other computers. Of course, there are measures and countermeasures here, but...
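A rough sketch of the offset idea from the reliability point, assuming both sides run the identical binary. Here anchor_fn is a hypothetical stand-in for the image base; in a real program the base would come from the OS loader. Converting function pointers to integers is only conditionally supported by the C++ standard, although mainstream compilers accept it.

```cpp
#include <cassert>
#include <cstdint>

using Fn = int (*)(int);

int twice(int x) { return 2 * x; }
int anchor_fn(int x) { return x; }   // hypothetical stand-in for the image base

// Sender side: turn an absolute address into an offset from the base.
std::intptr_t to_offset(Fn fn, Fn base) {
    return reinterpret_cast<std::intptr_t>(fn) -
           reinterpret_cast<std::intptr_t>(base);
}

// Receiver side: rebuild the address from the local base plus the offset.
Fn from_offset(std::intptr_t offset, Fn base) {
    return reinterpret_cast<Fn>(reinterpret_cast<std::intptr_t>(base) + offset);
}
```

The offset only survives relocation of the image as a whole; it does nothing for the security problem, since the receiver still jumps wherever the peer tells it to.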
If I were you, I'd design something different. And I'd ensure that the transmitted data is either unimportant or encrypted and the receiving part does the necessary validation of it prior to using it, so there are no buffer overflows or execution of arbitrary things.
If you're looking for some formal guarantees, I cannot help you. You would have to look in the documentation of the compiler and OS that you're using. However, I doubt that you would find the necessary guarantees, except possibly for some specialized embedded OSes.
I can however provide you with one scenario where I'm 99.99% sure that it will work without any problems:
Windows
32 bit process
Function is located in a module that doesn't have relocation information
The module in question is already loaded & initialized on the client side
The module in question is 100% identical on both sides
A compiler that doesn't do very crazy stuff (e.g. MSVC and GCC should both be fine)
If you want to call a function in a DLL you might run into problems. As per the list above the module (=DLL) may not have relocation information, which of course makes it impossible to relocate it (which is what we need). Unfortunately that also means that loading the DLL will fail, if the "preferred load address" is used by something else. So that would be kind-of risky.
If the function resides in the EXE, however, you should be fine. A 32-bit EXE doesn't need relocation information, and most don't include it (MSVC default settings). BTW: ASLR is not an issue here, since a) ASLR only moves modules that are tagged as wanting to be moved and b) ASLR could not move a 32-bit Windows module without relocation information, even if it wanted to.
Most of the above just makes sure that the function will have the same address on both sides. The only remaining question - at least that I can think of - is: is it safe to call a function via a pointer that we initialized by memcpy-ing over some bytes that we received from the network, assuming that the byte-pattern is the same that we would have gotten if we had taken the address of the desired function? That surely is something that the C++ standard doesn't guarantee, but I don't expect any real-world problems from current real-world compilers.
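The memcpy step in question looks roughly like this. It is a same-process round trip only, and the names to_bytes, from_bytes, and square are made up; whether the received bytes mean anything in another process is exactly what the conditions above are about.

```cpp
#include <array>
#include <cassert>
#include <cstring>

using Fn = int (*)(int);
using FnBytes = std::array<unsigned char, sizeof(Fn)>;

int square(int x) { return x * x; }

// Sender: copy the object representation of the function pointer.
FnBytes to_bytes(Fn fn) {
    FnBytes bytes{};
    std::memcpy(bytes.data(), &fn, sizeof fn);
    return bytes;
}

// Receiver: rebuild a function pointer from the received bytes.
// Calling it is only meaningful if the same function lives at the
// same address in this process; the language does not promise that.
Fn from_bytes(const FnBytes& bytes) {
    Fn fn = nullptr;
    std::memcpy(&fn, bytes.data(), sizeof fn);
    return fn;
}
```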
That being said, I would not recommend doing that, except for situations where security and robustness really aren't important.
Recently I was updating some code used to take screenshots using the GetWindowDC -> CreateCompatibleDC -> CreateCompatibleBitmap -> SelectObject -> BitBlt -> GetDIBits series of WinAPI functions. Now I check all those for failure because they can and sometimes do fail. But then I have to perform cleanup by deleting the created bitmap, deleting the created DC, and releasing the window DC. In any example I've seen -- even on MSDN -- the related functions (DeleteObject, DeleteDC, ReleaseDC) aren't checked for failure, presumably because if they were retrieved/created OK, they will always be deleted/released OK. But they still can fail.
That's just one notable example, since the calls are all right next to each other. But occasionally there are other functions that can fail but in practice never do, such as GetCursorPos. Or functions that can fail only if passed invalid data, such as FileTimeToSystemTime.
So, is it good-practice to check ALL functions that can fail for failure? Or are some OK not to check? And as a corollary, when checking these should-never-fail functions for failure, what is proper? Throwing a runtime exception, using an assert, something else?
Whether to test or not depends on what you would do if the call failed. Most samples exit once cleanup is finished, so verifying proper cleanup serves no purpose: the program is exiting in either case.
Not checking something like GetCursorPos could lead to bugs, but whether you should check depends on how much code the check requires. If checking would add 3 lines around every call, you are likely better off taking the risk. However, if you have a macro set up to handle it, it wouldn't hurt to add that macro just in case.
Whether to check FileTimeToSystemTime depends on what you are passing into it. A file time from the system? Probably safe to ignore it. A custom value built from user input? Probably better to make sure.
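One possible shape for such a macro, as a sketch: CHECK_CALL and fake_api are made-up names, and a WinAPI version would probably also record GetLastError. The check costs a single wrapper at each call site.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Throws std::runtime_error naming the failed expression and its location.
// (CHECK_CALL is a hypothetical macro name.)
#define CHECK_CALL(expr) \
    do { \
        if (!(expr)) { \
            throw std::runtime_error(std::string("call failed: ") + #expr + \
                                     " at " + __FILE__ + ":" + \
                                     std::to_string(__LINE__)); \
        } \
    } while (0)

// Hypothetical stand-in for an API that returns nonzero on success.
int fake_api(bool succeed) { return succeed ? 1 : 0; }

// Returns true if the checked call went through, false if the check fired.
bool call_checked(bool succeed) {
    try {
        CHECK_CALL(fake_api(succeed));
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```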
Yes. You never know when a promised service will surprise by not working. Best to report an error even for the surprises. Otherwise you will find yourself with a customer saying your application doesn't work, and the reason will be a complete mystery; you won't be able to respond in a timely, useful way to your customer and you both lose.
If you organize your code to always do such checks, it isn't that hard to add the next check to that next API you call.
It's funny that you mention GetCursorPos, since that fails in WOW64 processes when the address passed is above 2 GB. It fails every time. The bug was fixed in Windows 7.
So, yes, I think it's wise to check for errors even when you don't expect them.
Yes, you need to check, but if you're using C++ you can take advantage of RAII and leave cleanup to the various resources that you are using.
The alternative would be to have a jumble of if-else statements, and that's really ugly and error-prone.
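A minimal sketch of the RAII approach using a generic scope guard. ScopeGuard and run_steps are names invented here; with GDI the cleanup lambdas would call ReleaseDC, DeleteDC, and DeleteObject. Cleanup runs in reverse order on every exit path, including early returns:

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Runs the stored cleanup action when the guard goes out of scope.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup)) {}
    ~ScopeGuard() {
        if (cleanup_) {
            cleanup_();
        }
    }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> cleanup_;
};

// Hypothetical two-step acquisition; `released` records the release order.
bool run_steps(bool fail_midway, std::vector<int>& released) {
    ScopeGuard first([&released] { released.push_back(1); });   // e.g. ReleaseDC
    ScopeGuard second([&released] { released.push_back(2); });  // e.g. DeleteDC
    if (fail_midway) {
        return false;   // early exit: both guards still run
    }
    return true;
}
```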
Yes. Suppose you don't check what a function returned and the program just continues after the function failure. What happens next? How will you know why your program misbehaves a long time later?
One quite reliable solution is to throw an exception, but this will require your code to be exception-safe.
Yes. If a function can fail, then you should protect against it.
One helpful way to categorise potential problems in code is by the potential causes of failure:
invalid operations in your code
invalid operations in client code (code that calls yours, written by someone else)
external dependencies (file system, network connection etc.)
In situation 1, it is enough to detect the error and not perform recovery, as this is a bug that should be fixable by you.
In situation 2, the error should be notified to client code (e.g. by throwing an exception).
In situation 3, your code should recover as far as possible automatically, and notify any client code if necessary.
In both situations 2 & 3, you should endeavour to make sure that your code recovers to a valid state, e.g. you should try to offer the "strong exception guarantee" etc.
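As an illustration of the strong exception guarantee, one common way to get it is to do the work on a copy and commit with a non-throwing swap. The function append_all and its "negative values are invalid" rule are made up for the example.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Appends all items or none. Negative values count as invalid input
// (an illustrative rule). If anything throws, `target` is left unchanged.
void append_all(std::vector<int>& target, const std::vector<int>& items) {
    std::vector<int> copy = target;      // work on a copy
    for (int v : items) {
        if (v < 0) {
            throw std::invalid_argument("negative value");
        }
        copy.push_back(v);
    }
    target.swap(copy);                   // commit; swap does not throw
}
```

The caller either sees the whole update or the state from before the call, never something in between.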
The longer I've coded with WinAPIs with C++ and to a lesser extent PInvoke and C#, the more I've gone about it this way:
Design the usage to assume it will fail (eventually) regardless of what the documentation seems to imply
Make sure to know the return value indication for pass/fail, as sometimes 0 means pass, and vice-versa
Check to see if GetLastError is noted, and decide what value that info can give your app
If robustness is a serious enough goal, you may consider it a worthy time-investment to see if you can do a somewhat fault-tolerant design with redundant means to get whatever it is you need. Many times with WinAPIs there's more than one way to get to the specific info or functionality you're looking for, and sometimes that means using other Windows libraries/frameworks that work in-conjunction with the WinAPIs.
For example, getting screen data can be done with straight WinAPIs, but a popular alternative is to use GDI+, which plays well with WinAPIs.
I'm making a commercial product that will have a client and a server side. The client is totally dependent on the server, just to make it harder to crack/pirate. The problem is that, even so, there is a chance that someone will reverse engineer the protocol and make their own server.
I've thought about encrypting the connection either with ssl or with another algorithm so it won't be so easy to figure out the protocol just from sniffing the traffic between the client and the server.
Now the only thing I can think of that pirates would use is to decompile the program, remove the encryption and try to see the "plain text" protocol in order to reverse engineer it.
I have read previous topics and I know that it's impossible to make a program impossible to crack, but what tweaks can we programmers bring to our code to make it a huge headache for crackers?
Read how Skype did it:
The binary is decrypted into memory at startup.
The import table is overwritten.
The startup code is erased from memory.
Code integrity checks bust most debuggers: in random points in the code it computes a checksum of some other chunk of code and uses the checksum for an indirect jump to the next instruction. (Explanation: most debuggers implement breakpoints by changing the instruction at the breakpoint address. This check detects that.)
If a debugger is detected, it scrambles the registers and jumps to a random page.
Obfuscates code: call destination addresses are dynamically computed; dummy branches that are never executed; raises SEH where the handler sets some registers and resumes execution.
Keep in mind that these or other techniques make reverse engineering harder, but not impossible. Also, you should never rely on any of them for security.
IMO your best option is to design your servers to provide some useful functionality (SaaS). Your clients will essentially be paying for using that functionality. If your client app is dumb enough, you won't care about it being open source.
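A much simplified sketch of the integrity-check idea: checksum a byte range and compare it with a value recorded earlier. A byte buffer stands in for the code region, and the function names are made up. The scheme described above goes further and feeds the checksum into an indirect jump instead of comparing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simple rotate-and-xor checksum over a byte range (not cryptographic).
std::uint32_t checksum(const unsigned char* p, std::size_t n) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum = (sum << 5) | (sum >> 27);   // rotate left by 5 bits
        sum ^= p[i];
    }
    return sum;
}

// True if the region still matches the checksum recorded earlier.
// A software breakpoint rewrites a byte in the region, which changes the sum.
bool region_intact(const unsigned char* p, std::size_t n, std::uint32_t expected) {
    return checksum(p, n) == expected;
}
```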
One thing you need to be aware of is that most packers/cryptors cause false positives with virus scanners. That can be pretty annoying, because people complain all the time that your software contains a virus (they don't get the concept of false positives).
And for protocol obfuscation, don't rely on SSL alone. It is trivial for an attacker to intercept the plaintext at the point where you call Send with it. Use SSL for securing the connection, and obfuscate the data before sending it. The obfuscation algorithm doesn't need to be cryptographically secure.
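For example, something as simple as a repeating-key XOR would do as the obfuscation layer (sketch only; xor_obfuscate is a made-up name, and this is explicitly not encryption):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// XORs every byte with a repeating key. Applying it twice with the same
// key restores the input. This only hides the protocol from casual
// inspection; the transport security still comes from SSL.
std::string xor_obfuscate(const std::string& data, const std::string& key) {
    if (key.empty()) {
        return data;
    }
    std::string out = data;
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    }
    return out;
}
```

The client would call this on the message before handing it to the SSL send call, and the server would apply it again after receiving.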
This might be helpful: http://www.woodmann.com/crackz/Tutorials/Protect.htm
IMHO, it's difficult to hide the actual plain code. What most packers do is to make it difficult to patch. However, in your case, Themida could do the trick.
Here are some nice tips about writing a good protection: http://www.inner-smile.com/nocrack.phtml
With my basic knowledge of C++, I've managed to whip together a simple program that reads some data from a program (using ReadProcessMemory) and sends it to my web server every five minutes, so I can see the status of said program while I'm not at home.
I found the memory addresses to read from using a program designed to hack games called "Memory Hacking Software." The problem is, the addresses change whenever I move the program to another machine.
My question is: is there a way to find a 'permanent' address that is the same on any machine? Or is this simply impossible. Excuse me if this is a dumb question, but I don't know a whole lot on the subject. Or perhaps another means to access information from a running program.
Thanks for any and all help!
There are ways to do it such as being able to recognise memory patterns around the thing you're looking for. Crackers can use this to find memory locations to patch even with software that "moves around", so to speak (as with operating systems that provide randomisation of address spaces).
For example, if you know that there are fixed character strings always located X bytes beyond the area of interest, you can scan the whole address space to find them, then calculate the area of interest from that.
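A sketch of that scan, with a byte vector standing in for memory already read from the target process (e.g. via ReadProcessMemory). The function name locate_area and its parameters are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Finds `marker` in `memory` and computes the offset of the area of
// interest, which is assumed to start `bytes_before_marker` bytes before
// the marker. Returns false if the marker is absent or the result would
// fall outside the buffer.
bool locate_area(const std::vector<unsigned char>& memory,
                 const std::vector<unsigned char>& marker,
                 std::size_t bytes_before_marker,
                 std::size_t& area_offset) {
    if (marker.empty()) {
        return false;
    }
    auto it = std::search(memory.begin(), memory.end(),
                          marker.begin(), marker.end());
    if (it == memory.end()) {
        return false;
    }
    const std::size_t marker_pos = static_cast<std::size_t>(it - memory.begin());
    if (marker_pos < bytes_before_marker) {
        return false;
    }
    area_offset = marker_pos - bytes_before_marker;
    return true;
}
```

Only the first match is used, so the marker has to be distinctive enough not to appear elsewhere in the scanned range.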
However, it's not always as reliable as you might think.
I would instead be thinking of another way to achieve your ends, one that doesn't involve battling the features that are protecting such software from malicious behaviour.
Think of questions like:
Why exactly do you need access to the address space at all?
Does the program itself provide status information in a more workable manner?
If the program is yours, can you modify it to provide that information?
If you only need to know if the program is doing its job, can you simply "ping" the program (e.g., for a web server, send an HTTP request and ensure you get a valid response)?
As a last resort, can you convince the OS to load your program without address space randomisation then continue using your (somewhat dubious) method?
Given your comment that:
I use the program on four machines and I have to "re-find" the addresses (8 of them) on all of them every time they update the program.
I would simply opt for automating this process. This is what some cracking software does. It scans files or in-memory code and data looking for markers that it can use for locating an area of interest.
If you can do it manually, you should be able to write a program that can do it. Have that program locate the areas of interest (by reading the process address space) and, once they're found, just read your required information from there. If the methods of finding them change with each release (instead of just the actual locations), you'll probably need to update your locator routines with each release of their software but, unfortunately, that's the price you pay for the chosen method.
It's unlikely the program you're trying to read will be as secure as some - I've seen some move their areas of interest around as the program is running, to try and confuse crackers.
What you are asking for is impossible by design. ASLR is designed specifically to prevent this kind of snooping.
What kind of information are you getting from the remote process?
Sorry, this isn't possible. The memory layout of processes isn't going to be reliably consistent.
You can achieve your goal in a number of ways:
Add a client/server protocol that you can connect to and ask "what's your status?" (this also lends itself nicely to asking for more info).
Have the process periodically touch a file; the "monitor" can check the modification time of that file to see if the process is dead.
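The heartbeat-file option could look roughly like this. It is a sketch that assumes C++17 for std::filesystem; the function names and the choice of a maximum age are made up for the example.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Called periodically by the monitored process.
void touch_heartbeat(const fs::path& file) {
    std::ofstream(file, std::ios::app).close();   // create the file if missing
    fs::last_write_time(file, fs::file_time_type::clock::now());
}

// Called by the monitor: the process counts as alive if the file exists
// and was touched within the last `max_age`.
bool is_alive(const fs::path& file, std::chrono::seconds max_age) {
    std::error_code ec;
    const auto stamp = fs::last_write_time(file, ec);
    if (ec) {
        return false;   // no heartbeat file (or it is unreadable)
    }
    return fs::file_time_type::clock::now() - stamp <= max_age;
}
```

The monitor's max_age should be a comfortable multiple of the touch interval, so that a slightly late heartbeat is not reported as a dead process.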