How to get a list of files opened and closed by a program execution? - c++

I have the source code of a program. The source code is extremely huge and written in C/C++. I have the credentials to modify the source code, compile and execute it.
I want to know the filenames of all the files opened and closed by this program when it executes. It would be a plus if this list is sorted in the order the file operations occurred.
How can I get this information? Is there some monitoring tool I need to use or can I inject a library call into the C++ code to achieve this? The code is too large and complicated to hunt down every file open/close call and add a printf there. Or adding a pseudo macro to the file open API call might also be difficult.
Note that this is not the same as viewing what files are open currently by a process. I am aware of the many questions on StackOverflow that already address this problem (using lsof or /proc and so on).

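You can use strace, as shown below:
$ strace -e trace=open,openat,close -o /tmp/trace.log <your_program> <program_options>
The file /tmp/trace.log will then contain every open and close operation performed by the program, in the order they occurred. On current Linux systems the C library usually issues openat rather than open, so trace both.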

In addition to strace, you can use library interposition to intercept the open/close calls. Searching for "interposition shared library linux" turns up many references.

I understand that you want to determine statically which files a given source code could open (across many runs of its compiled program).
If you just want to know this dynamically for a given run, use strace(1), as answered by Rohan, and/or an interposition library, as answered by Kec. Note that ltrace(1) could also be useful, and perhaps more relevant (since it traces stdio or C++ library calls).
First, a program can (and many do) open a file whose name comes from some input (or some program argument). You cannot enumerate such arbitrary file names in a static list.
You could #define fopen and #define open to print a message. You could use LD_PRELOAD tricks to override open, fopen, and so on.
If the code is C++, the program may also open files using std::ifstream etc., which such macros will not catch.
You could consider customizing the GCC compiler with MELT to help you...
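As a rough illustration of the "#define fopen" idea above, here is a minimal sketch. The names traced_fopen, traced_fclose, file_event_log and open_file_names are hypothetical helpers invented for this example, and the in-memory log is an assumption (a real build might print to stderr instead). The header would have to be included after <cstdio> in every translation unit, it is not thread-safe, and it does not see open(2) or std::ifstream.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory event log, in the order the operations occurred.
inline std::vector<std::string>& file_event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical table that remembers which name each open FILE* came from.
inline std::map<std::FILE*, std::string>& open_file_names() {
    static std::map<std::FILE*, std::string> names;
    return names;
}

// Wrapper that records the file name, then forwards to the real fopen.
inline std::FILE* traced_fopen(const char* path, const char* mode) {
    std::FILE* f = std::fopen(path, mode);
    if (f) {
        open_file_names()[f] = path;
        file_event_log().push_back(std::string("open ") + path);
    } else {
        file_event_log().push_back(std::string("open-failed ") + path);
    }
    return f;
}

// Wrapper that records the close (with the remembered name), then forwards.
inline int traced_fclose(std::FILE* f) {
    auto it = open_file_names().find(f);
    if (it != open_file_names().end()) {
        file_event_log().push_back("close " + it->second);
        open_file_names().erase(it);
    }
    return std::fclose(f);
}

// These macros must come after the wrappers, which need the real functions.
// Code that spells the call as std::fopen(...) will no longer compile with
// these macros in place, so this only suits code that calls plain fopen().
#define fopen(path, mode) traced_fopen((path), (mode))
#define fclose(stream) traced_fclose(stream)
```

Redefining standard library names with macros is formally outside what the standard permits, so treat this as a debugging aid rather than something to ship.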

Related

How can I find the underlying file type in C++?

On *nix systems there is a command called 'file', which can tell you the underlying type of a file. Say you rename a binary executable to foo.txt, or rename an mp3 file to .txt: the system will still tell you the real type of the file. On Windows there seems to be no such functionality; if you rename an executable to .txt, you cannot execute it. Can anyone explain how this is done on *nix systems, and how I can find the real type of a file using C++, especially on Windows, where I cannot use std::system("file blah")?
The file utility uses the libmagic library. It recognises a file's type by parsing "special" fields in the file.
Of course, you can write your own recognition code for some formats, but sometimes this requires plenty of work, e.g. when you try to differentiate between different flavours of MP4.
The developers of that library have done a huge amount of work, so it's advisable to use their results if you want good results in determining what format you are dealing with. (This is a big area, and it is better to rely on them than on your own code.)
File utility - http://www.darwinsys.com/file/
You can download the source code and see just how many different recognition rules it uses.
Download archive file-4.26 -> magic -> Magdir
Personally I had luck with compiling file 4.26 on Windows ftp://ftp.astron.com/pub/file/
Caution: it's merely a convention that files of certain formats have predefined signatures. The convention holds almost always and helps identify file formats properly.
If that is not a concern for you, you can trust the signature. Just keep in mind that anyone with enough knowledge and motivation can open a file in a hex editor and, by changing a few bytes, make it look like a different format.
Even in Unix/Linux, the system doesn't actually definitively know a file's type. The "file" program makes an educated guess by comparing the file's contents against a database of patterns that characterize a variety of common file types, but it's no more than a guess — it doesn't know about all possible file formats, and it can be wrong about the ones that it does know.
It's entirely possible to write a program like "file" for Windows; it doesn't depend on any special capabilities in the OS. Cygwin provides a Windows port of the "file" program, for example.
The issue of renaming a program to have a .txt extension is unrelated to the "file" program. That comes from the fact that Windows decides whether a file is executable based on its name (specifically, its extension), whereas Unix/Linux decides whether a file is executable based on its permissions — not its contents. If you chmod a-x a program on a Linux system, the system will consider it non-executable, just like if you remove the .exe extension from a program on Windows.
The command reference suggests that the type information is saved in an external place for further use. It also mentions magic numbers, which refers to file signatures.
Being 100% sure of a file's type is theoretically impossible, since there are no precise rules about what a given type must contain. Even if there were such rules, it would be possible to alter a file to make it look like another type. While both signatures and extensions can give you a good idea of what the type actually is, you still need to face the possibility of dealing with a wrong type.
The UNIX file command uses heuristics. There is a database of magic numbers, usually in /usr/share/file/magic and /etc/magic/, that allows you to add new file "types" to be recognized by the file command. It simply probes the file to look for magic numbers (signatures) in its contents.
UNIX traditionally doesn't have the same type of file extension and type associations that Windows does, although Linux is accumulating that in recent times.
I would think that on Windows you'd want to at least check the file extension association, to be correct. But even within a given extension (such as .txt) the individual program may perform its own heuristics. For example, Notepad has to make an educated guess at the character encoding when it opens a file. Raymond Chen wrote a good piece about this on his blog, The Old New Thing: "The Notepad file encoding problem, redux".
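The signature probing that these answers describe can be sketched in portable C++, with no dependence on the OS. This is only an illustration, not how libmagic is implemented: the Signature struct, the function names and the tiny five-entry table are assumptions made for the example, while the byte sequences themselves are the well-known leading bytes of those formats.

```cpp
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

// One entry of a (hypothetical, very small) magic-number table.
struct Signature {
    std::string name;
    std::vector<unsigned char> magic;  // bytes expected at offset 0
};

// Compare the leading bytes of a file against the table.
inline std::string guess_type_from_bytes(const std::vector<unsigned char>& head) {
    static const std::vector<Signature> table = {
        {"PNG image", {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}},
        {"ELF executable", {0x7F, 'E', 'L', 'F'}},
        {"PDF document", {'%', 'P', 'D', 'F', '-'}},
        {"ZIP archive", {'P', 'K', 0x03, 0x04}},
        {"DOS/Windows executable", {'M', 'Z'}},
    };
    for (const Signature& sig : table) {
        if (head.size() >= sig.magic.size() &&
            std::equal(sig.magic.begin(), sig.magic.end(), head.begin())) {
            return sig.name;
        }
    }
    return "unknown";
}

// Read up to 16 leading bytes of the file in binary mode and classify them.
inline std::string guess_file_type(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return "unreadable";
    char buf[16] = {};
    in.read(buf, sizeof buf);
    const std::streamsize got = in.gcount();
    std::vector<unsigned char> head(buf, buf + got);
    return guess_type_from_bytes(head);
}
```

As the answers stress, a match is only an educated guess: a file can carry a forged signature, and many formats are not distinguishable from their first few bytes.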

How to use system calls with c++

What I need to do is mimic std::cout using system calls.
I have seen the syscall() function that uses a number for the system call, the system() function that uses a string with a command and system_call() that worked for someone here in stackoverflow but she didn't list header files or anything so it didn't work for me.
I don't expect you to code it for me, since this is a homework, but I would like some clues as to which is the best way to go around it, what header files to use and functions to use and look into with more depth. I don't know the differences between those functions but ideally I would like to find c++11 functions.
I have only found vague information about those functions so I haven't been able to put any code together.
System calls, like any API, are operating system (OS) specific.
To use the API, you will need to include the appropriate header file and link with the appropriate libraries for your system.
Again, the C++ language does not cover platform specific functionality and you will need to search the web to find the API for your platform.
What I need to do is mimic std::cout using system calls.
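You want to call the system call write(2) on the standard output file descriptor, which is file descriptor 1. On 32-bit x86 Linux, write is system call number 4; the number differs on other architectures (it is 1 on x86-64), so prefer the write() wrapper or the SYS_write constant over a hard-coded number.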
Read:
$ man 2 syscall
$ man 2 syscalls
$ man 2 write
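As a starting point, here is a minimal sketch of a cout-like object built on write(2). It assumes a POSIX system (<unistd.h>); the names write_all and SysOut are hypothetical, and only three operand types are supported. There are no C++11 standard functions for raw system calls, so the POSIX wrapper is used directly.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <unistd.h>

// Write the whole buffer, retrying after short writes and EINTR.
inline bool write_all(int fd, const char* data, std::size_t len) {
    while (len > 0) {
        const ssize_t n = ::write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;                  // real error
        }
        data += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Hypothetical, unbuffered stand-in for std::cout.
class SysOut {
public:
    explicit SysOut(int fd = STDOUT_FILENO) : fd_(fd) {}

    SysOut& operator<<(const std::string& s) {
        write_all(fd_, s.data(), s.size());
        return *this;
    }
    SysOut& operator<<(const char* s) { return *this << std::string(s); }
    SysOut& operator<<(int value) { return *this << std::to_string(value); }

private:
    int fd_;  // descriptor every operator<< writes to
};
```

With this in place, `SysOut out; out << "x=" << 42 << "\n";` writes to standard output. Each operator<< issues its own system call, whereas std::cout buffers, so this is noticeably slower for many small writes.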

Calling external files (e.g. executables) in C++ in a cross-platform way

I know many have asked this question before, but as far as I can see, there's no clear answer that helps C++ beginners. So, here's my question (or request if you like),
Say I'm writing a C++ code using Xcode or any text editor, and I want to use some of the tools provided in another C++ program. For instance, an executable. So, how can I call that executable file in my code?
Also, can I exploit other functions/objects/classes provided in a C++ program and use them in my C++ code via this calling technique? Or is it just executables that I can call?
I hope someone could provide a clear answer that beginners can absorb.. :p
So, how can I call that executable file in my code?
The easiest way is to use system(). For example, if the executable is called tool, then:
system( "tool" );
However, there are a lot of caveats with this technique. This call just asks the operating system to do something, but each operating system can understand or answer the same command differently.
For example:
system( "pause" );
...will work on Windows, pausing execution, but not on other operating systems. Also, the rules regarding spaces inside the path to the file differ between systems. Finally, even the path separator can differ ('\' on Windows only).
And can I also exploit other functions/objects/classes... from a c++
and use them in my c++ code via this calling technique?
Not really. If you want to use classes or functions created by others, you will have to get their source code and compile it with your program. This is probably one of the easiest ways to do it, provided that the source code is small enough.
Often, people create libraries, which are collections of useful classes and/or functions. If the library is distributed in binary form, you'll need the DLL file (or the equivalent for other OSes) and a header file describing the classes and functions provided by the library. This is a rich source of frustration for C++ programmers, since even libraries created with different compilers on the same operating system are potentially incompatible. That's why libraries are often distributed in source form, with a list of instructions (a makefile or even worse) to obtain a binary version in a single file, and a header file, as described before.
This is because the C++ standard does not specify the low-level details of what happens inside a compiler. Many implementation details were deliberately left for compiler vendors to handle as they wished, possibly to achieve better performance. Unfortunately, this means that it is difficult to distribute a simple binary library.
You can call another program easily - this will start an entirely separate copy of the program. See the system() or exec() family of calls.
This is common in unix where there are lots of small programs which take an input stream of text, do something and write the output to the next program. Using these you could sort or search a set of data without having to write any more code.
On Windows it's easy to start the default application for a file automatically, so you could write a PDF file and start the default app for viewing it. What is harder on Windows is controlling a separate GUI program: unless the program has deliberately been written to allow remote control (e.g. with COM/OLE on Windows), you can't control anything the user does in that program.
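The system() approach from the answers above can be sketched as follows. std::system comes from <cstdlib> and exists on every hosted platform, but the command string and the meaning of the return value are platform-specific. The helper names shell_available and run_command are hypothetical, and the -1 sentinel is a choice made for this example.

```cpp
#include <cstdlib>
#include <string>

// std::system(nullptr) reports whether a command processor exists at all.
inline bool shell_available() {
    return std::system(nullptr) != 0;
}

// Hand a command line to the platform's command processor and return
// whatever std::system returns. The value is implementation-defined:
// on POSIX it is a wait status, on Windows usually the exit code.
// In both cases 0 conventionally means the command succeeded.
inline int run_command(const std::string& command) {
    if (!shell_available()) {
        return -1;  // hypothetical sentinel: nothing could be executed
    }
    return std::system(command.c_str());
}
```

For example, `run_command("tool")` would start the executable called tool from the answer above, provided it can be found on the search path. Because the string is interpreted by a shell, never build it from untrusted input.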

Disable system() and exec() function in C and Pascal

Is there any way to disable system() and exec() function in C/C++ and Pascal, by using any compiler argument or modifying header/unit file? (It's a Windows)
I've tried using -Dsystem=NONEXIST for gcc and g++ but #include <cstdio> causes compile error.
EDIT: Of course I know they can use #undef system to bypass the defense, so I've tried commenting out the system function line in stdlib.h, but that doesn't work either.
EDIT2 (comment): It's a system to which users submit their programs; the server compiles and runs each one with different input data, then compares the program's output with pre-calculated expected output to see if the program is correct. Now some users send code like system("shutdown -s -t 0"); to shut down the server.
The server is running Windows system so I don't have any chroot environment. Also the server application is closed-source so I can do nothing to control how the program submitted by user is executed. What I can do is to modify the compiler commandline argument and modify header files.
Well, you could try:
#define system DontEvenThinkAboutUsingThisFunction
#define exec OrThisOneYouClown
in a header file but I'm pretty certain any code monkey worth their salt could bypass such a "protection".
I'd be interested in understanding why you thought this was necessary (there may be a better solution if we understood the problem better).
The only thing that comes to mind is that you want to provide some online compiler/runner akin to Project Euler. If that is the case, then you could search the code for the string system<whitespace>( as an option but, even then, a determined party could just:
#define InoccuousFunction system
to get around your defenses.
If that is the case, you might want to think about using something like chroot so that no-one can even get access to any dangerous binaries like shutdown (and that particular beast shouldn't really be runnable by a regular user anyway) - in other words, restrict their environment so that the only things they can even see are gcc and its kin.
You need to do proper sandboxing since, even if you somehow prevented them from running external programs, they may still be able to do dangerous things like overwrite files or open socket connections to their own box to send through the contents of your precious information.
One possibility is to create your own version of such functions, and link them into every program you compile/link on the server. If the symbol is found in your objects, it'll take precedence.
Just make sure you get them all ;)
It would be much better to run the programs as a user with as few privileges as possible. Then you don't have to worry about them deleting/accessing system files, shutting down the system, etc.
EDIT: of course, by my logic, the user could provide their own version of the function also, which does dynamic library loading & symbol lookup to find the original function. You really just need to sandbox it.
For unixoid environments, there is Geordi, which uses a lot of help from the operating system to sandbox the code to be executed.
Basically you want to run the code in a very restricted environment; Linux provides a special process flag for that which disables any system calls that would give access to resources that the process did not have at the point where the flag was set (i.e. it disallows opening new files, but any files that are already open may be accessed normally).
I think Windows should have a similar mechanism.
Not really (because of tricks like calling some library function which would call system itself, or because the functionality of spawning processes can be done with just fork & execve system calls, which remain available...).
But why do you ask that?
You can never (as you have found out) rely on user input to be safe. system and execXX are unlikely to be your only problems.
This means you have the following options:
1. Run the program in some kind of chroot jail (I'm not sure how to do this on Windows).
2. Scan the code before compiling to ensure there are no "illegal" functions.
3. Scan the executable binary after compiling to ensure that it is not using any "forbidden" library function.
4. Prevent the linker from linking to any external libraries, including the standard C library (libc) on Unix. You then create your own "libc" which explicitly allows certain functions.
For number 3 on Unix, utilities like readelf or objdump can check for linked-in symbols. This can probably also be done using the Binary File Descriptor library.
Number 4 will require fiddling with compiler flags, but it is probably the safest of the options listed above.
You could use something like this
#include<stdlib.h>
#include<unistd.h>
#define system <stdlib.h>
#define exec <unistd.h>
In this case, even if users want to swap the macro values back, they can't. If they try to swap the macro values like this
#define <stdlib.h> system
#define <unistd.h> exec
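they can't, because C does not allow that kind of name in a macro definition. Even if they somehow did swap these values, the header files have already been included, so the result would be a compile-time error. Note, however, that #undef system (which the question already mentions) still removes the macro.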

how to make sure that a file will be closed at the end of the run

Suppose someone wrote a method that opens a certain file and forgets to close it in some cases. Given this method, can I make sure that the file is closed without changing the code of the original method?
The only option I see is to write a method that wraps the original method, but this is only possible if the file is defined outside the original method, right? Otherwise it's lost forever...
Since this is C++, I would expect that the I/O streams library (std::ifstream and friends) would be used, not the legacy C I/O library. In that case, yes, the file will be closed because the stream is closed by the stream object's destructor.
If you are using the legacy C API, then no, you're out of luck.
In my opinion, the best answer to an interview question like this is to point out the real flaw in the code--managing resources manually--and to suggest the correct solution: use automatic resource management ("Resource Acquisition is Initialization" or "Scope-Bound Resource Management").
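As a minimal sketch of scope-bound resource management applied to the legacy C API, a std::unique_ptr with a custom deleter closes the file on every exit path. The names FileCloser, FilePtr and open_file are hypothetical and chosen for this example.

```cpp
#include <cstdio>
#include <memory>

// Deleter that closes the FILE* when the owning unique_ptr is destroyed.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept {
        if (f) std::fclose(f);
    }
};

// Owning handle: the file is closed when the handle goes out of scope,
// whether by normal return, early return or exception.
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Open a file and hand back an owning handle (null if fopen failed).
inline FilePtr open_file(const char* path, const char* mode) {
    return FilePtr(std::fopen(path, mode));
}
```

A function that does `FilePtr f = open_file("data.txt", "r");` can then return from any branch without an explicit fclose. This does require changing the code that opens the file, which is the point of the answer: the real fix is in the original method.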
You are correct that if the wrapper doesn't somehow get a reference to the opened file, it may be difficult to close it. However, the operating system might provide a means to get a list of open files, and you could then find the one you need to close.
However, note that most (practically all) operating systems take care of closing files when the application exits, so you don't need to worry about a file being left open indefinitely after the program stops. (This may or may not be a reasonable answer to the question you were given, which seems incredibly vague and ambiguous.)
If you are using the C functions to open files, you can use the _fcloseall function (Microsoft's C runtime; glibc has a similar fcloseall extension) to close all open streams.
If you are using C++ streams, as James suggested, the stream destructor should take care of it.
Which environment are you in? You can always check the file descriptors opened by the process and close them forcefully.
Under Linux you can use the lsof command to list the open files of a process. Do it once before the method and once after the method to detect newly opened files. Hopefully you aren't fighting some multithreaded legacy beast.
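The same before/after comparison can be done from inside the process. The sketch below assumes Linux, where /proc/self/fd has one entry per open descriptor. The function names open_descriptors and newly_opened are hypothetical. The directory handle used for the listing is itself a descriptor, so it is excluded from the result.

```cpp
#include <dirent.h>
#include <algorithm>
#include <cstdlib>
#include <iterator>
#include <set>

// Snapshot of the descriptors currently open in this process (Linux only).
// Returns an empty set if /proc/self/fd cannot be read.
inline std::set<int> open_descriptors() {
    std::set<int> fds;
    DIR* dir = opendir("/proc/self/fd");
    if (!dir) return fds;
    const int own_fd = dirfd(dir);  // descriptor used for the listing itself
    while (dirent* entry = readdir(dir)) {
        if (entry->d_name[0] == '.') continue;  // skip "." and ".."
        const int fd = std::atoi(entry->d_name);
        if (fd != own_fd) fds.insert(fd);
    }
    closedir(dir);
    return fds;
}

// Descriptors present in `after` but not in `before`.
inline std::set<int> newly_opened(const std::set<int>& before,
                                  const std::set<int>& after) {
    std::set<int> fresh;
    std::set_difference(after.begin(), after.end(),
                        before.begin(), before.end(),
                        std::inserter(fresh, fresh.end()));
    return fresh;
}
```

Take one snapshot before calling the leaky method and one after it; whatever newly_opened returns can then be passed to close(). As with lsof, this is only reliable if no other thread opens or closes files in between.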