Can argv be changed at runtime (not by the app itself) - c++

I wonder whether the input parameters of main() can be changed at runtime. In other words, should we protect the app from a possible TOCTTOU attack when handling data in argv? Currently, I don't know of any way to change data that was passed in argv, but I'm not sure that no such way exists.
UPD: I forgot to point out that I'm curious about changing argv from outside the program since argv is accepted from outside the program.

I'd say there are two main options based on your threat model here:
You do not trust the environment and assume that other privileged processes on your machine are able to alter the contents of memory of your program while it is running. If so, nothing is safe, the program could be altered to do literally anything. In such case, you can't even trust an integer comparison.
You trust the environment in which your program is running. In this case your program is the only owner of its data, and as long as you don't explicitly decide to alter argv or any other piece of data, you can rely on it.
In the first case, it doesn't matter if you guard against potential argv modifications, since you are not trusting the execution environment, so even those guards could be fooled. In the second case, you trust the execution environment, so you don't need to guard against the problem in the first place.
In both the above cases, the answer is: no, you shouldn't protect the app from a possible TOCTTOU attack when handling data in argv.
TOCTTOU problems usually arise from external untrusted data, which can be modified by somebody else and should not be trusted by definition. A simple example is the existence of a file: you cannot rely on it, as other users or programs on the machine could delete or move it; the only way to make sure the file can be used is to try to open it. In the case of argv, the data is not external and is owned by the process itself, so the problem really does not apply.
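To illustrate the file example, here is a minimal sketch of the "just try to open it" approach. The helper name read_file is made up for this illustration; the point is only that there is no separate existence check that could go stale before the file is used.

```cpp
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: read a whole file, or return std::nullopt if it
// cannot be opened. There is no separate "does it exist?" check, so there
// is no window between a check and the use.
std::optional<std::string> read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return std::nullopt;  // missing, unreadable, or removed by someone else
    }
    std::ostringstream contents;
    contents << in.rdbuf();   // copy everything that is readable right now
    return contents.str();
}
```

A caller would handle the empty result as the error case instead of asking beforehand whether the file exists. This assumes C++17 for std::optional.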

In general, the strings passed to main() in the argv array live in the program's user space, usually at a fixed place at the top of the program stack.
The reason for such a fixed place is that some programs modify this area so that a privileged program (e.g. the ps command) can gather and show different command arguments as the program evolves at runtime. This is used in programs like sendmail(8), or in a user program's threads, to show which thread is doing what job in your program.
This feature is not standard and is implemented differently by different operating systems (I have described the BSD way). As far as I know, Linux and Solaris also exhibit this behaviour.
In general, this means the arguments to main, which belong to the user process space, have to be modified with care (following some operating-system-specific contract), as they are normally subject to rigid conventions. The ps(1) command digs into the user space of the process it is going to show in order to produce the long listing with the command parameters. Each operating system documents the exact format, or how the stack is initialized by the exec(2) family of calls (you can probably get this from the standard linker script used on your system; the exec(2) manual page should also help).
I don't know exactly whether this is what you expect, or whether you just want to see if you can modify the arguments. As something belonging to the user space of the process, they are most probably modifiable, but I cannot think of any reason to do that apart from those described in this answer.
By the way, the FreeBSD manual page for the execlp(2) system call shows the following excerpt:
The type of the argv and envp parameters to execle(), exect(), execv(),
execvp(), and execvP() is a historical accident and no sane
implementation should modify the provided strings. The bogus parameter
types trigger false positives from const correctness analyzers. On
FreeBSD, the __DECONST() macro may be used to work around this
limitation.
This states clearly that you cannot modify them (in FreeBSD at least). I assume the ps(8) command does the extra work of verifying those parameters properly so that it never runs into a security bug (this can be tested, but I leave it as an exercise for interested readers).
EDIT
If you look at /usr/include/sys/exec.h (line 43) in FreeBSD, you will find that there's a struct ps_strings located at the top of the user stack, which the ps(1) command uses to locate the process environment and argv strings. While you could edit this to change the information a program gives to ps(1), there is a setproctitle(3) library function for that purpose (again, all of this is FreeBSD-specific; you'll have to dig to find out how Linux or other systems solve this problem).
I've tried this approach, but it doesn't work. Today there's a library function call for this, but the top of the stack is still filled with the data mentioned above (I assume for compatibility reasons).
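The remark above that the argv strings sit in the process's own writable memory can be sketched as follows. The helper overwrite_arg is hypothetical and not part of any OS contract; it only shows a process rewriting one of its own argument strings without running past the original length. (The C standard says the argv strings are modifiable by the program; whether ps then shows the change is system-specific.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: overwrite one argv string in place without writing
// past its original length. Returns the number of characters copied.
std::size_t overwrite_arg(char* arg, const char* replacement) {
    assert(arg != nullptr && replacement != nullptr);
    const std::size_t capacity = std::strlen(arg);         // room before the '\0'
    const std::size_t wanted = std::strlen(replacement);
    const std::size_t n = wanted < capacity ? wanted : capacity;
    std::memcpy(arg, replacement, n);
    std::memset(arg + n, '\0', capacity - n);               // clear the leftover tail
    return n;
}
```

From main one could call, for example, overwrite_arg(argv[1], "hidden") after parsing. Only the process itself is doing the modification here, which is consistent with the threat-model discussion in the first answer.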

Related

A Function in an application(.exe) should be called only once regardless of how many times I run the same application

Suppose there are two functions, one to print "hello" and the other to print "world", and I call these two functions inside the main function. When I compile, this creates a .exe file. When I run this .exe for the first time, both functions execute and print "hello world". The .exe then terminates.
But if I run the same .exe a second time or multiple times, only one function must execute, i.e. it should print only "world". I want a piece of code or a function that runs only once and after that destroys itself, so that it is not executed again regardless of how many times I run the application (.exe).
I can achieve this by writing a value once to a local file or the Windows registry and checking for it on later runs: if the value is present, there is no need to execute this piece of code or function.
Can I achieve it without any external help that the application itself should be capable of performing this behaviour?
Any ideas are appreciated. thanks for reading
There is no coherent or portable way [1] to do this from software without requiring the use of an external resource of some kind.
The issue is that you want the invocation of this process to be aware of the number of times it has been executed, but that number is not a property that is recorded anywhere [2]. A program itself has no memory of its previous executions unless you program it to do so.
Your best bet is to write out this information in some canonicalized location so that it can be read on later executions. This could be as a file in the filesystem (such as a hidden .firstrun file or something), or it could be through the registry (Windows specific), or some other environment-specific form of communication.
The main thing is that this must persist between executions and be available to your process.
[1] You could potentially write code that overwrites the executable itself after the first invocation -- but this is extraordinarily brittle, and will be highly specific to the executable format. This is neither an ideal nor a recommended approach to solving this problem.
[2] This is not a capability defined in the C or C++ standard. It's possible that there may be some specialized operating systems/flavors of Linux that allow querying this -- but this is not something seen in most general-purpose operating systems. Generally the approach is to communicate via an external resource.
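The marker-file idea from this answer can be sketched roughly as follows. The function name is_first_run and the choice of marker path are assumptions made for this illustration; the check and the creation are two separate steps, so this is not safe against two instances starting at the same moment.

```cpp
#include <filesystem>
#include <fstream>

// Hypothetical helper: returns true only when the marker file was absent
// and could be created, so later runs (which find the marker) get false.
bool is_first_run(const std::filesystem::path& marker) {
    if (std::filesystem::exists(marker)) {
        return false;                 // an earlier run already left the marker
    }
    std::ofstream create(marker);     // the file itself is the persistent "memory"
    return create.good();             // false if the marker could not be written
}
```

main would then call the one-time function only when is_first_run(...) returns true, and always call the other one. Deleting the marker resets the behaviour, which is exactly the dependence on an external resource the answer describes. This assumes C++17 for std::filesystem.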
Can I achieve it without any external help that the application itself
should be capable of performing this behaviour?
Not by any means defined by C or C++, and probably not on Windows at all.
You have to somehow, somewhere memorialize the fact that the one-time function has been called. If you have nothing but the compiled program to use for that, then the only alternative is to modify the program. Neither C nor C++ provides for live modification of the running program, much less for writing it back to the executable file containing its image.
Conceivably, if the program knows where to find or how to recreate its own source code, and if it knows how to run the compiler, then it could compile a modified version of itself. On Windows, however, it very likely could not overwrite its own executable file while it was running (though that would be possible on various other operating systems), so that would not solve the problem.
Moreover, note that any approach that involves modifying the executable would be at least a bit wonky, for different copies of the program would have their own, semi-independent idea of whether the one-time function had been run.
Basically, then, no.

How can I change an address in another process with a value that can also change?

I am using C++ with Qt and I am struggling to find the way to achieve something I never did before.
Here is what I want to achieve:
I have a client (let's call it Client.exe) whose source I don't have access to, and a launcher (let's call it... Launcher.exe) whose source I do have access to.
Client.exe needs a password and a username, which are supposed to come from Launcher.exe.
If I had only one username/password pair, I know I could make a .dll and inject it, but since I can have a lot of combinations, that is impossible.
So here is my question: what is the way to create a link that lets me send the password and username from Launcher.exe to Client.exe?
My second question is: is there a way to use VirtualProtect and that kind of thing (in order to modify some instructions in memory) from an executable, meaning without any injection? (I guess the answer is no, but I want to be sure.)
Your "Launcher.exe" and your DLL injected into "Client.exe" can communicate with each other via interprocess communication, for example through file mapping. This could be used for "Launcher.exe" to pass any desired username and password to "Client.exe".
However, the main problem I see is how to get "Client.exe" to use this data, if you do not have access to the source code and if it also does not provide an API for this.
If you want to trick "Client.exe" into using the data provided by you (or by your injected DLL) instead of the intended data, then you must reverse engineer the program and change the appropriate instructions so that they load your data instead of the original data. Since you do not have access to the C/C++ source code, you will have to understand the assembly language instructions to accomplish this.
In order to find out which instructions to change, you will likely need a debugging tool such as x64dbg, which is designed to debug applications that you haven't written yourself (and have no source code for) and possibly also a static analysis tool, such as IDA or Ghidra. Furthermore, if the program deliberately protects itself from reverse-engineering, you will have to learn how to overcome this (which can be very hard).
You could also accomplish this without injecting a DLL, by using WriteProcessMemory. You may need to also use VirtualAllocEx if you need extra memory inside the target process, for example for injecting instructions or data.
In any case, before tampering with another process's instructions or data, it may be advisable to suspend all of its threads using SuspendThread, and then resuming all threads afterwards with ResumeThread. Otherwise, if the program runs while its instructions or data are in an inconsistent state, the program may crash.

How to program securely when you can only expect advisory file locks where the program will be used?

Asked on the grounds of: "...but if your question generally covers…
- a specific programming problem..." (Help center - asking)
Scope: This is not about how to use the file lock mechanisms on different platforms, but about how to mitigate the absence of mandatory file locks on the user's system. E.g.: I can't expect a user of a Linux system to modify the system, let alone know how to do it, so I have to assume advisory locks are all that is available. I have found a lot of info about how to use locks of both kinds, what is available on some platforms, and even why they are not available on some systems. Portability would be great, but this is probably too low-level for that.
I am a bit confused about how to safeguard my program's data if other processes don't cooperate, intentionally or not. Assuming that my program uses its own directory for the data, is there a way to make sure that my data will stay consistent while the program runs?
Would, for example, temporary hidden files be a practical solution (create a file, delete it from the OS' directory, so only my handle holds the inode to the file), copying all data at program start and overwriting the original at the end? It seems to be very platform specific, though.
Are there specific mechanisms or techniques to use that could help with this, or can I only "trust"?
Note: This is not specifically about Linux, it's just an example.
------ EDIT -------
I'm looking for a way to do this that works in C/C++, hence those tags, but am aware that it might involve system specific features. If possible, the solution would work regardless of platform and file locking mechanism.
While file locks are the mechanism referenced in the question, the real problem is how to prevent another process from trampling over the data my program relies on while running, even if that process runs as the same user as my program but does not care to check whether the files are locked, or if it uses a mechanism that doesn't work well with the one my program is using. (AFAICT, locks acquired with one of flock()/lockf() on Linux may not work when the other is used in the other process. This is another situation from Linux, but one that is outside the scope of the question.)
As I tried to explain about the scope, this is not about how to use file locks on any platform, but what to do when you cannot assume anything about what mechanism is available/turned on, to achieve similar protection to what mandatory file locks would give.
I am a bit confused about how to safeguard my program's data if other processes don't cooperate, intentionally or not. Assuming that my program uses its own directory for the data, is there a way to make sure that my data will stay consistent while the program runs?
You cannot do so and should not even try to do so. How do you safeguard your data if the admin pulls the power cord? All things with access to your data must cooperate -- that is the precondition for safeguarding being possible.
Simply specify that your program requires a directory that is only touched by programs that cooperate with yours. That is a trivial requirement that many programs have and that any competent administrator can provide.

Is it safe to send a pointer to a static function over the network?

I was thinking about some RPC code that I have to implement in C++ and I wondered if it's safe (and under which assumptions) to send it over the network to the same binary code (assuming it's exactly the same and that they are running on same architecture). I guess virtual memory should do the difference here.
I'm asking it just out of curiosity, since it's a bad design in any case, but I would like to know if it's theoretically possible (and if it's extendable to other kind of pointers to static data other than functions that the program may include).
In general, it's not safe for many reasons, but there are limited cases in which it will work. First of all, I'm going to assume you're using some sort of signing or encryption in the protocol that ensures the integrity of your data stream; if not, you have serious security issues already that are only compounded by passing around function pointers.
If the exact same program binary is running on both ends of the connection, if the function is in the main program (or in code linked from a static library) and not in a shared library, and if the program is not built as a position-independent executable (PIE), then the function pointer will be the same on both ends and passing it across the network should work. Note that these are very stringent conditions that would have to be documented as part of using your program, and they're very fragile; for instance if somebody upgrades the software on one side and forgets to upgrade the version on the other side of the connection at the same time, things will break horribly and dangerously.
I would avoid this type of low-level RPC entirely in favor of a higher-level command structure or abstract RPC framework, but if you really want to do it, a slightly safer approach would be to pass function names and use dlsym or equivalent to look them up. If the symbols reside in the main program binary rather than libraries, then depending on your platform you might need -rdynamic (GCC) or a similar option to make them available to dlsym. libffi might also be a useful tool for abstracting this.
Also, if you want to avoid depending on dlsym or libffi, you could keep your own "symbol table" hard-coded in the binary as a static const linear table or hash table mapping symbol names to function pointers. The hash table format used in ELF for this purpose is very simple to understand and implement, so I might consider basing your implementation on that.
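A minimal sketch of such a hand-written "symbol table" might look like this. It uses a std::unordered_map rather than the ELF hash table format mentioned above, and the function names add_one and twice are invented for the example.

```cpp
#include <string_view>
#include <unordered_map>

// Example functions that a peer may request by name (names are illustrative).
static int add_one(int x) { return x + 1; }
static int twice(int x) { return 2 * x; }

using RpcFn = int (*)(int);

// Returns the function registered under `name`, or nullptr if the name is
// unknown. Only names travel over the wire; addresses never leave the process.
RpcFn lookup(std::string_view name) {
    static const std::unordered_map<std::string_view, RpcFn> table = {
        {"add_one", &add_one},
        {"twice", &twice},
    };
    const auto it = table.find(name);
    return it == table.end() ? nullptr : it->second;
}
```

The receiver would call lookup() on the name it read from the network and reject the request when the result is nullptr. Because the table is an explicit whitelist, a peer cannot request anything that was not registered, and the two sides no longer need identical binaries.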
What is it a pointer to?
Is it a pointer to a piece of static program memory? If so, don't forget that it's an address, not an offset, so you'd first need to convert between the two accordingly.
Second, if it's not a piece of static memory (ie: statically allocated array created at build time as opposed to run time) it's not really possible at all.
Finally, how are you ensuring the two pieces of code are the same? Are both binaries bit identical (eg: diff -a binary1 binary2). Even if they are bit-identical, depending on the virtual memory management on each machine, the entire program's program memory segment may not exist in a single page, or the alignment across multiple pages may be different for each system.
This is really a bad idea, no matter how you slice it. This is what message passing and APIs are for.
I don't know of any form of RPC that will let you send a pointer over the network (at least without doing something like casting to int first). If you do convert to int on the sending end, and convert that back to a pointer on the far end, you get pretty much the same as converting any other arbitrary int to a pointer: undefined behavior if you ever attempt to dereference it.
Normally, if you pass a pointer to an RPC function, it'll be marshalled -- i.e., the data it points to will be packaged up, sent across, put into memory, and a pointer to that local copy of the data passed to the function on the other end. That's part of why/how IDL gets a bit ugly -- you need to tell it how to figure out how much data to send across the wire when/if you pass a pointer. Most know about zero-terminated strings. For other types of arrays, you typically need to specify the size of the data (somehow or other).
This is highly system dependent. On systems with virtual addressing such that each process thinks it's running at the same address each time it executes, this could plausibly work for executable code. Darren Kopp's comment and link regarding ASLR is interesting - a quick read of the Wikipedia article suggests the Linux & Windows versions focus on data rather than executable code, except for "network facing daemons" on Linux, and on Windows it applies only when "specifically linked to be ASLR-enabled".
Still, "same binary code" is best assured by static linking - if different shared objects/libraries are loaded, or they're loaded in different order (perhaps due to dynamic loading - dlopen - driven by different ordering in config files or command line args etc.) you're probably stuffed.
Sending a pointer over the network is generally unsafe. The two main reasons are:
Reliability: the data/function pointer may not point to the same entity (data structure or function) on another machine due to a different location of the program, its libraries, or dynamically allocated objects in memory. Relocatable code + ASLR can break your design. At the very least, if you want to point to a statically allocated object or a function, you should send its offset w.r.t. the image base if your platform is Windows, or do something similar on whatever OS you are on.
Security: if your network is open and there's a hacker (or they have broken into your network), they can impersonate your first machine and make the second machine either hang or crash, causing a denial of service, or execute arbitrary code and get access to sensitive information or tamper with it or hijack the machine and turn it into an evil bot sending spam or attacking other computers. Of course, there are measures and countermeasures here, but...
If I were you, I'd design something different. And I'd ensure that the transmitted data is either unimportant or encrypted and the receiving part does the necessary validation of it prior to using it, so there are no buffer overflows or execution of arbitrary things.
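As a rough illustration of the "send an offset, and validate on the receiving side" idea, here is a sketch. Note the substitution: instead of the image base it measures offsets from an anchor function inside the same module, which assumes both peers run the identical binary. All names (greet, farewell, anchor) are hypothetical.

```cpp
#include <cstdint>

using Handler = int (*)();

// Illustrative handlers that a peer is allowed to request.
static int greet() { return 1; }
static int farewell() { return 2; }

// Anchor inside this module; offsets are measured from its address, so a
// different load address on the other machine does not matter.
static void anchor() {}

std::intptr_t to_offset(Handler fn) {
    return reinterpret_cast<std::intptr_t>(fn) -
           reinterpret_cast<std::intptr_t>(&anchor);
}

// Validation on the receiving side: accept an offset only if it matches one
// of the known handlers; anything else yields nullptr and is never called.
Handler from_offset(std::intptr_t offset) {
    static const Handler known[] = {&greet, &farewell};
    for (Handler h : known) {
        if (to_offset(h) == offset) {
            return h;
        }
    }
    return nullptr;
}
```

The receiver never builds a pointer out of the received integer; it only compares it against offsets it computed itself, so a forged value cannot redirect execution to arbitrary code.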
If you're looking for some formal guarantees, I cannot help you. You would have to look in the documentation of the compiler and OS that you're using. However, I doubt that you would find the necessary guarantees, except possibly for some specialized embedded OSes.
I can however provide you with one scenario where I'm 99.99% sure that it will work without any problems:
Windows
32 bit process
Function is located in a module that doesn't have relocation information
The module in question is already loaded & initialized on the client side
The module in question is 100% identical on both sides
A compiler that doesn't do very crazy stuff (e.g. MSVC and GCC should both be fine)
If you want to call a function in a DLL you might run into problems. As per the list above, the module (= the DLL) must not have relocation information, which of course makes it impossible to relocate it (which is what we need). Unfortunately that also means that loading the DLL will fail if the "preferred load address" is already used by something else. So that would be kind of risky.
If the function resides in the EXE however, you should be fine. A 32 bit EXE doesn't need relocation information, and most don't include it (MSVC default settings). BTW: ASLR is not an issue here since a) ASLR does only move modules that are tagged as wanting to be moved and b) ASLR could not move a 32 bit windows module without relocation information, even if it wanted to.
Most of the above just makes sure that the function will have the same address on both sides. The only remaining question - at least that I can think of - is: is it safe to call a function via a pointer that we initialized by memcpy-ing over some bytes that we received from the network, assuming that the byte-pattern is the same that we would have gotten if we had taken the address of the desired function? That surely is something that the C++ standard doesn't guarantee, but I don't expect any real-world problems from current real-world compilers.
That being said, I would not recommend doing that, except in situations where security and robustness really aren't important.

How can I log which thread called which function from which class and at what time throughout my whole project?

I am working on a fairly large project that runs on embedded systems. I would like to add the capability of logging which thread called which function from which class and at what time. E.g., here's what a typical line of the log file would look like:
Time - Thread Name - Function Name - Class Name
I know that I can do this by using the _penter hook function, which would execute at the beginning of every function called within my project (Source: http://msdn.microsoft.com/en-us/library/c63a9b7h%28VS.80%29.aspx). I could then find a library that would help me find the function, class, and thread from which _penter was called. However, I cannot use this solution since it is VC++ specific.
Is there a different way of doing this that would be supported by non-VC++ implementations? I am using the ARM/Thumb C/C++ Compiler, RVCT3.1. Additionally, is there a better way of tracking problems that may arise from multithreading?
Thank you,
Borys
I've worked with a system that had similar requirements (ARM embedded device). We had to build much of it from scratch, but we used some CodeWarrior stuff to do it, and then the map file for the function name lookup.
With CodeWarrior, you can get some code inserted into the start and end of each function, and using that, you can track when you enter each function, and when you switch threads. We used assembly, and you might have to as well, but it's easier than you think. One of your registers will be your return value, which is a hex value. If you compile with a map file, you can then use that hex value to look up the (mangled) name of that function. You can find the class name in the function name.
But, basically, get yourself a stream to somewhere (ideally to a desktop), and yell to the stream:
Entered Function #####
Left Function #####
Switched to Thread #
(PS - Actual encoding should be more like 1 21361987236, 2 1238721312, since you don't actually want to send characters)
If you're only ever processing one thread at a time, this should give you an accurate record of where you went, in the order you went there. Attach clock tick info for function profiling, add a message for allocations (and deallocations) and you get memory tracking.
If you're actually running multiple threads, it could get substantially harder, or be more of the same - I don't know. I'd put timing information on everything, and then have a separate stream for each thread. Although you might just be able to detect which processor you're running on, and report that, for which thread.... I don't, however, know if any of that will work.
Still, the basic idea was: Report on each step (function entry/exit, thread switching, and allocation), and then re-assemble the information you care about on the desktop side, where you have processing to spare.
gcc has the __PRETTY_FUNCTION__ identifier. With regard to threads, you can always call gettid or similar.
I've written a few log systems that just increment a thread number and store it in thread-local data. That helps with identifying the thread of each log statement (time is easy to print out).
For tracing all function calls automatically, I'm not so sure. If it's just a few, you can easily write an object & macro that logs entry/exit using __func__ or __FUNCTION__ (or something similar for your compiler).
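The "object & macro" idea together with the per-thread counter could be sketched as follows. This assumes a C++11 compiler (thread_local, std::atomic), which an older embedded toolchain may not offer; ScopeTrace, TRACE_SCOPE, trace_log and thread_number are names invented here. The log goes to an in-memory vector just to keep the sketch self-contained, and timestamps are left out.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <vector>

// Small per-thread number, assigned the first time a thread logs.
inline int thread_number() {
    static std::atomic<int> next{1};
    thread_local const int mine = next.fetch_add(1);
    return mine;
}

// In-memory sink for the sketch; a real system would stream this elsewhere.
inline std::vector<std::string>& trace_log() {
    static std::vector<std::string> log;
    return log;
}
inline std::mutex& trace_mutex() {
    static std::mutex m;
    return m;
}

// Logs "enter" on construction and "leave" on destruction of the guard.
class ScopeTrace {
public:
    explicit ScopeTrace(const char* name) : name_(name) { write("enter "); }
    ~ScopeTrace() { write("leave "); }
    ScopeTrace(const ScopeTrace&) = delete;
    ScopeTrace& operator=(const ScopeTrace&) = delete;

private:
    void write(const char* what) const {
        std::lock_guard<std::mutex> lock(trace_mutex());
        trace_log().push_back("T" + std::to_string(thread_number()) + " " +
                              what + name_);
    }
    const char* name_;
};

// Put this at the top of every function that should be traced.
#define TRACE_SCOPE() ScopeTrace trace_scope_guard(__func__)

// Example of an instrumented function.
int compute() {
    TRACE_SCOPE();
    return 42;
}
```

Note that __func__ yields only the unqualified function name, so the class name would have to come from something like __PRETTY_FUNCTION__ on gcc or be passed to the macro by hand.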