How do I run freestanding scripts with SML/NJ?

How do I use SML/NJ to run a script that reads from STDIN and writes to STDOUT, say? Is there a way to get rid of the output from the interpreter itself?

Just to be very clear, SML/NJ is not strictly an interpreter. It's a compiler that just so happens to have a REPL. The best way to achieve what you're suggesting is to create a heap image (basically a compiled program that's ready to be loaded by the SML/NJ runtime system) and then run it directly using sml @SMLload=heapfile.img, where heapfile.img is the name of the heap file you generated. You might also want to pass @SMLquiet as a command-line option. This will suppress any output while loading the heap file.
You might also just be trying to compile a program into something you can run stand-alone, in which case you might like to look at the MLton compiler.

GDB fails to open file when debugging f77 program

I'm not a big CS guy, so bear with me as I try to explain this adequately.
At work, I use a program written in Fortran 77 to do some modeling. Debugging has been an issue for us, due to some IT constraints that are outside my control. When we attempt to use GDB, the program loads, but when we run it, it fails through internal logic checks. The program is looking for an input file, but it can't find it because, under GDB, it does not load another file that lists all the directories the input file (and other relevant files) could possibly be in.
The relevant code:
...
logical exst
...
INQUIRE(FILE='KEYWORDS',EXIST=exst)
if(exst)then
...
endif
End code
This DOES work when I run the program normally. The KEYWORDS file is found and read in through a call within the if branch, which allows the program to find the input file. When debugging, however, exst is always false, so the file is never read and the program fails later in its logic checks.
Does GDB require certain permissions? The only thing I could find in my own search was a possible issue with signed/unsigned reported file sizes, but beyond understanding what signed and unsigned values are, the explanation was a bit over my head.
Any help is appreciated. Will try to provide more information where requested.
gdb doesn't change the permissions of the program it runs. It runs under the same user id, as usual.
Normally when this sort of problem arises, it comes from an environmental difference. Typical sources are the current working directory, the command-line arguments, or environment variables. It's also reasonably common to have a wrapper script that invokes the program properly; when running in gdb, one bypasses the wrapper and then fails to duplicate the setup it provides. Less common, but still possible, is code in .gdbinit that alters the environment inside gdb. So be sure to double-check things inside gdb with pwd, show args, and show environment.
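One way to test the working-directory explanation directly is a tiny diagnostic, sketched here in C as an illustration rather than something from the answer above. It prints the current directory and whether a relative name such as KEYWORDS resolves from there, which is what a relative INQUIRE(FILE='KEYWORDS', ...) depends on. The name check_relative_file is hypothetical; call it from a small main, launch the result the same way you launch the Fortran program (once directly, once under gdb), and compare the two reports.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical diagnostic: report the current working directory and whether
   `name` exists relative to it. Returns 1 if it exists, 0 otherwise. */
int check_relative_file(const char *name)
{
    char cwd[4096];
    if (getcwd(cwd, sizeof cwd) == NULL) {
        perror("getcwd");
        cwd[0] = '\0';              /* still report the file check below */
    }
    int exists = access(name, F_OK) == 0;
    fprintf(stderr, "cwd=\"%s\": %s %s\n", cwd, name,
            exists ? "found" : "NOT found");
    return exists;
}
```

If the two runs print different cwd values, starting gdb from the program's usual directory (or adjusting the relative path) should make the KEYWORDS lookup behave the same way.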

How to get a list of files opened and closed by a program execution?

I have the source code of a program. The source code is huge and written in C/C++. I am able to modify the source code, compile it, and execute it.
I want to know the filenames of all the files opened and closed by this program when it executes. It would be a plus if this list is sorted in the order the file operations occurred.
How can I get this information? Is there a monitoring tool I should use, or can I inject a library call into the C++ code to achieve this? The code is too large and complicated to hunt down every file open/close call and add a printf there, and substituting a macro for the file-open API calls might also be difficult.
Note that this is not the same as viewing what files are open currently by a process. I am aware of the many questions on StackOverflow that already address this problem (using lsof or /proc and so on).
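You can use strace as below (on current Linux systems glibc's open() usually issues the openat system call, so trace that as well; -f also follows child processes):
$ strace -f -e trace=open,openat,close -o /tmp/trace.log <your_program> <program_options>
The file /tmp/trace.log will then contain every open and close operation performed by the program, in the order they occurred.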
In addition to strace, you can use interposition to intercept the open/close library calls. If you Google for "interposition shared library linux", you'll find many other references.
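A minimal interposition library might look like the following sketch, which assumes Linux with glibc. It defines its own open() and close(), logs each call to stderr, and forwards to the next definition found via dlsym(RTLD_NEXT, ...), normally libc's. It is only a starting point: programs built with 64-bit file offsets on 32-bit systems call open64, many call openat directly, and glibc's own fopen and the C++ stream classes typically reach the kernel without going through the exported open symbol, so those would need further wrappers.

```c
/* Build: gcc -shared -fPIC -o libtrace_files.so trace_files.c -ldl */
#define _GNU_SOURCE            /* for RTLD_NEXT and dprintf */
#undef _FORTIFY_SOURCE         /* fortified headers may define open() inline */
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Replacement open(): forward to the next "open" in link order (normally
   libc's), then log the path and the resulting descriptor to stderr. */
int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    if (real_open == NULL)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    /* A third (mode) argument is only passed when a file may be created. */
    mode_t mode = 0;
    int needs_mode = (flags & O_CREAT) != 0;
#ifdef O_TMPFILE
    needs_mode = needs_mode || (flags & O_TMPFILE) == O_TMPFILE;
#endif
    if (needs_mode) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }

    int fd = real_open(path, flags, mode);
    int saved_errno = errno;   /* logging must not clobber open()'s errno */
    dprintf(STDERR_FILENO, "[trace] open(\"%s\") = %d\n", path, fd);
    errno = saved_errno;
    return fd;
}

/* Replacement close(): log first, then forward to the real close(). */
int close(int fd)
{
    static int (*real_close)(int);
    if (real_close == NULL)
        real_close = (int (*)(int))dlsym(RTLD_NEXT, "close");
    dprintf(STDERR_FILENO, "[trace] close(%d)\n", fd);
    return real_close(fd);
}
```

It would then be used without recompiling the target, e.g. LD_PRELOAD=./libtrace_files.so <your_program> 2> files.log. On glibc 2.34 and later, dlsym lives in libc itself and the -ldl flag is harmless.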
I understand that you want to determine statically which files a given source code could open (over many runs of its compiled program).
If you just want to know it dynamically for a given run, use strace(1) as answered by Rohan and/or an interposition library as answered by Kec. Notice that ltrace(1) could also be useful, and perhaps more relevant (since you would trace stdio or C++ library calls).
First, a program can (and many do) open a file whose name comes from some input (or some program argument). In that case you cannot add that arbitrary file name to a static list.
You could #define fopen and #define open to print a message. You could also use LD_PRELOAD tricks to override open and fopen.
In C++, the program may also open files using std::ifstream, etc.
You could consider customizing the GCC compiler with MELT to help you.
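The #define idea can be sketched as a header that is force-included into every translation unit (for example with GCC's -include option), so no call site has to be edited by hand. The names traced_fopen and traced_fclose are hypothetical. This only sees fopen/fclose calls compiled from your own C sources; open(2), std::ifstream, and calls made inside prebuilt libraries are not covered.

```c
#ifndef TRACE_FILES_H
#define TRACE_FILES_H
#include <errno.h>
#include <stdio.h>

/* Wrappers are defined before the macros below, so inside them
   fopen/fclose still refer to the real stdio functions. */
static FILE *traced_fopen(const char *path, const char *mode,
                          const char *file, int line)
{
    FILE *fp = fopen(path, mode);
    int saved_errno = errno;        /* keep fopen()'s errno for the caller */
    fprintf(stderr, "[trace] %s:%d fopen(\"%s\", \"%s\") = %p\n",
            file, line, path, mode, (void *)fp);
    errno = saved_errno;
    return fp;
}

static int traced_fclose(FILE *fp, const char *file, int line)
{
    fprintf(stderr, "[trace] %s:%d fclose(%p)\n", file, line, (void *)fp);
    return fclose(fp);
}

/* Every later call site now records its source file and line. */
#define fopen(path, mode) traced_fopen((path), (mode), __FILE__, __LINE__)
#define fclose(fp)        traced_fclose((fp), __FILE__, __LINE__)

#endif /* TRACE_FILES_H */
```

Because the log records __FILE__ and __LINE__, each entry points back to the call site, which the strace and LD_PRELOAD approaches cannot do.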

Is there a way to figure out what environment variables are needed/used by an executable?

I've got a C++ program that runs certain very specific commands as root. This is needed because another program running under Node.js needs to do things like set the system time and the time zone, which require root privileges. I'm using the execve function in C++ to make the call with root privileges after calling setuid. I specifically chose execve because I want to wall off the environment so I don't create an environment-variable vulnerability.
setuid(0);
execve(acExeName, pArgsForExec2, pcEnv);
What I want to find out is exactly which pcEnv (the list of environment variables the executed program runs with) my program needs. For example, if I want to run the tool time-admin as if I were running it from the console, how can I figure out which environment variables it needs? I know I can print the environment variables with the printenv command, but that gives me all of them. I'm quite sure I don't need them all, and I want as small a subset as possible.
I know I could pass them all and then remove them one at a time to see whether it keeps working, but I'd really rather not go that far.
Anyone got a clever way to figure out which environment variables are used by a program? I should add that I'm doing this on an Ubuntu 12.04 LTS install.
Thanks for any help.
There is no general way of figuring out the environment variables used by some program. For example, one could imagine a program having configuration files that give the names of environment variables.
Actually, many shell-like programs (or script interpreters) do exactly that.
More generally, the argument to getenv(3) could be computed, so in theory you cannot guess its possible values. (I might be wrong, but some very old versions of libc and of bash used to play such tricks; unfortunately I forgot the details, but sometimes an environment variable with some pid number in its name was used.)
And, as others commented, you might want to use ltrace (or play LD_PRELOAD tricks), or use gdb, to find out how getenv is called.
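The LD_PRELOAD trick could look like the following sketch, assuming Linux with glibc: a replacement getenv() that asks the real getenv() (found with dlsym(RTLD_NEXT, ...)) and logs each name the program looks up. Treat its output as a lower bound. Lookups made inside glibc itself (for example for TZ or locale variables) may not go through the exported getenv symbol, direct reads of environ are invisible to it, and the dynamic loader ignores LD_PRELOAD paths for setuid binaries, so run the target unprivileged while tracing.

```c
/* Build: gcc -shared -fPIC -o libtrace_getenv.so trace_getenv.c -ldl */
#define _GNU_SOURCE            /* for RTLD_NEXT and dprintf */
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Replacement getenv(): look the name up with the real getenv(), log
   whether it was set, and return the real result unchanged. */
char *getenv(const char *name)
{
    static char *(*real_getenv)(const char *);
    if (real_getenv == NULL)
        real_getenv = (char *(*)(const char *))dlsym(RTLD_NEXT, "getenv");

    char *value = real_getenv(name);
    int saved_errno = errno;   /* callers don't expect getenv() to touch errno */
    dprintf(STDERR_FILENO, "[getenv] %s%s\n", name, value ? "" : " (unset)");
    errno = saved_errno;
    return value;
}
```

A run such as LD_PRELOAD=./libtrace_getenv.so time-admin 2> getenv.log, followed by sort -u on the log, gives a candidate list to start trimming from.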
The application might also use the environ variable (see environ(7)) or the third argument to main.
In practice, however, a reasonably written program should clearly document all the environment variables it uses.
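Once the required variables are known (from documentation or from tracing), the execve approach from the question can pass exactly those and nothing else. The sketch below omits setuid and wraps execve in fork/waitpid so the caller can observe the exit status. run_with_minimal_env is a hypothetical name, and the PATH and LANG entries are placeholders, not a list that time-admin is known to need.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `path` with an explicit, minimal environment instead of inheriting
   the caller's. Returns the child's exit status, or -1 on failure. */
int run_with_minimal_env(const char *path, char *const argv[])
{
    /* Placeholder variables: replace with what the target actually needs. */
    char *const envp[] = {
        "PATH=/usr/sbin:/usr/bin:/sbin:/bin",
        "LANG=C",
        NULL
    };

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);   /* nothing else is inherited */
        _exit(127);                 /* only reached if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything not listed in envp (HOME, LD_PRELOAD, IFS, and so on) is simply absent in the child, which is the walling-off the question is after.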
If you have access to the source code of the program and it is compiled by GCC, you could use (the just released version 1.0 of) the MELT plugin. MELT is a domain-specific language for extending GCC, and it can be used to explore the internal Gimple representations GCC handles while compiling your program. In particular, with its new findgimple mode you could find all the calls to getenv with a constant string in one command.

MATLAB arbitrary code execution

I am writing an automatic grader program under linux. There are several graders written in MATLAB, so I want to tie them all together and let students run a program to do an assignment, and have them choose the assignment. I am using a C++ main program, which then has mcc-compiled MATLAB libraries linked to it.
Specifically, my program reads a config file for the names of the various MATLAB programs and other information. It then uses that information to present choices to the student. So, if an assignment is changed, added, or removed, all you have to do is change the config file.
The idea is that the program then invokes the correct MATLAB library that has been compiled with mcc. But that means the libraries have to be recompiled whenever a grader changes. Worse, the whole program must be recompiled if a grader is added or removed. So I would like one simple, unchanging MATLAB library function to call the grader m-files directly. I currently have such a library, which uses eval on a string passed to it from the main program.
The problem is that when I do this, mcc apparently absorbs the grader m-code into itself; changing the grader m-code after compilation has no effect. I would like for this not to happen. It was brought to my attention that MathWorks may not want me to be able to do this, since it could bypass MATLAB entirely. That is not my intention, and I would be happy with a solution that requires a full MATLAB install.
My possible solutions are to use a mex file for the main program, or have the main program call a mcc library, which then calls a mex file, which then calls the proper grader. The reason I am hesitant about the first solution is that I'm not sure how many changes I would have to make to my code to make it work; my code is C++, not C, which I think makes things more complicated. The 2nd solution, though, may just be more complicated and ultimately have the same problem.
So, any thoughts on this situation? How should I do this?
You seem to have picked the most complicated way of solving the problem. Here are some alternatives:
Don't use C/C++ at all: write a MATLAB program to display the menu of choices (either a GUI or a simple text menu in the MATLAB command window) and then invoke the appropriate MATLAB grading programs.
Write your menu program in C/C++, but invoke MATLAB using a -r argument to run a specific grading program (to speed up the startup times, use the -nodesktop, -nojvm or -nodisplay options as appropriate). However, note that MATLAB will be started anew on each menu selection.
Write your menu program in C/C++ and start MATLAB using the popen command (this sets up a pipe between your C++ program and the MATLAB process). After a menu selection by the user:
Your C++ program writes the name of the MATLAB program (and any parameters) to the pipe.
On the MATLAB side, write a MATLAB program that does a blocking read on that pipe. When it reads a command, it invokes the appropriate MATLAB function.
You could also use named pipes. See this MATLAB newsgroup thread for more information.
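The popen step of option #3 can be sketched in C (the same calls work unchanged from C++). This covers only the menu-program side: send_commands is a hypothetical helper that starts a child process and writes one command per line to its standard input. In the grader setting, program would be a MATLAB command line such as matlab -nodesktop -nojvm and the commands would be grader names; the MATLAB-side loop that reads and dispatches them is not shown and is assumed to exist.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/* Start `program` via popen() and write each string in the NULL-terminated
   `cmds` array to its standard input, one per line. Returns the child's
   exit status, or -1 on failure. */
int send_commands(const char *program, const char *const cmds[])
{
    FILE *to_child = popen(program, "w");
    if (to_child == NULL)
        return -1;

    for (size_t i = 0; cmds[i] != NULL; i++) {
        fprintf(to_child, "%s\n", cmds[i]);
        fflush(to_child);           /* deliver each command immediately */
    }

    int status = pclose(to_child);  /* closes the pipe, waits for the child */
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For an interactive menu you would instead keep the FILE * open across selections, so that MATLAB starts only once, and call pclose when the student quits.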
Update: Option #3 above is effectively how the MATLAB engine works, so you are probably better off using that directly.
Don't make this a mex function.
Use a regular m-file that has to be executed in MATLAB. If you don't want to launch MATLAB first, write a bat file. I believe -r or -m runs a given command (you will have to cd to the correct directory before running your MATLAB function).
To compile C++ code using mex, first install Visual Studio. Then run mex -setup in MATLAB, select "locate installed compilers" or some such, and then select your compiler from the list. Now mex will compile C++ code.
Using the MATLAB Engine to Call MATLAB Software from C/C++ and Fortran Programs

Restoring program state from a core file

Is it possible, under any circumstances, to restore the state of a program to what it was during the generation of a core file?
The reason I ask is that in order to take advantage of gdb's ability to execute functions and so forth you need to have a running instance. Surely it should be possible to produce a mock process of the same executable with the state set to be the contents of the core?
If not, what alternatives are there for the sort of situation that made me want to do this in the first place? In this case the backtrace of the core led to a library function, and I wanted to replicate the inputs to this function call. However, one of the inputs was a complex object that could easily be serialized to a string with a function call in a running instance, but not in a core dump.
It is theoretically possible to do exactly what you want, but (AFAICT) there is no support for this in GDB (yet).
Your best bet is to use GDB 7.0 (or later) and its embedded Python scripting to re-implement the serialization function.
Isn't that what a core file does already? If you load gdb with the original executable and the core file:
gdb myprogram.exe -c mycorefile
then it'll go to the point where it crashed. You can use all the normal inspection functionality to view the variables, see the stack trace, and so on.
Or have I misunderstood your question?
In case it's useful to someone,
I've implemented a Python module to do just that: call functions in a core file (by emulating the CPU).
It's called EmuCore.
I've successfully used it on very complex functions, for example serializing a GStreamer pipeline graph.
Note that it still has important limitations such as:
only x64 Linux
the function can't call the OS (to e.g. read files)
function arguments can't be floats
See README for more info.