Calling external files (e.g. executables) in C++ in a cross-platform way - c++

I know many have asked this question before, but as far as I can see, there's no clear answer that helps C++ beginners. So, here's my question (or request if you like),
Say I'm writing a C++ code using Xcode or any text editor, and I want to use some of the tools provided in another C++ program. For instance, an executable. So, how can I call that executable file in my code?
Also, can I exploit other functions/objects/classes provided in a C++ program and use them in my C++ code via this calling technique? Or is it just executables that I can call?
I hope someone could provide a clear answer that beginners can absorb.. :p

So, how can I call that executable file in my code?
The easiest way is to use system(). For example, if the executable is called tool, then:
system( "tool" );
However, there are a lot of caveats with this technique. This call just asks the operating system to do something, but each operating system can understand or answer the same command differently.
For example:
system( "pause" );
...will work in Windows, stopping the exectuion, but not in other operating systems. Also, the rules regarding spaces inside the path to the file are different. Finally, even the separator bar can be different ('\' for windows only).
And can I also exploit other functions/objects/classes... from a c++
and use them in my c++ code via this calling technique?
Not really. If you want to use clases or functions created by others, you will have to get the source code for them and compile them with your program. This is probably one of the easiest ways to do it, provided that source code is small enough.
Many times, people creates libraries, which are collections of useful classes and/or functions. If the library is distributed in binary form, then you'll need the dll file (or equivalent for other OS's), and a header file describing the classes and functions provided y the library. This is a rich source of frustration for C++ programmers, since even libraries created with different compilers in the same operating system are potentially incompatible. That's why many times libraries are distributed in source code form, with a list of instructions (a makefile or even worse) to obtain a binary version in a single file, and a header file, as described before.
This is because the C++ standard does not the low level stuff that happens inside a compiler. There are lots of implementation details that were freely left for compiler vendors to do as they wanted, possibly trying to achieve better performance. This unfortunately means that it is difficult to distribute a simple library.

You can call another program easily - this will start an entirely separate copy of the program. See the system() or exec() family of calls.
This is common in unix where there are lots of small programs which take an input stream of text, do something and write the output to the next program. Using these you could sort or search a set of data without having to write any more code.
On windows it's easy to start the default application for a file automatically, so you could write a pdf file and start the default app for viewing a PDF. What is harder on Windows is to control a separate giu program - unless the program has deliberately written to allow remote control (eg with com/ole on windows) then you can't control anything the user does in that program.

Related

Precompile script into objects inside C++ application

I need to provide my users the ability to write mathematical computations into the program. I plan to have a simple text interface with a few buttons including those to validate the script grammar, save etc.
Here's where it gets interesting. These functions the user is writing need to execute at multi-megabyte line speeds in a communications application. So I need the speed of a compiled language, but the usage of a script. A fully interpreted language just won't cut it.
My idea is to precompile the saved user modules into objects at initialization of the C++ application. I could then use these objects to execute the code when called upon. Here are the workflows I have in mind:
1) Testing(initial writing) of script: Write code in editor, save, compile into object (testing grammar), run with test I/O, Edit Code
2) Use of Code (Normal operation of application): Load script from file, compile script into object, Run object code, Run object code, Run object code, etc.
I've looked into several off the shelf interpreters, but can't find what I'm looking for. I considered JAVA, as it is pretty fast, but I would need to load the JAVA virtual machine, which means passing objects between C and the virtual machine... The interface is the bottleneck here. I really need to create a native C++ object running C++ code if possible. I also need to be able to run the code on multiple processors effectively in a controlled manner.
I'm not looking for the whole explanation on how to pull this off, as I can do my own research. I've been stalled for a couple days here now, however, and I really need a place to start looking.
As a last resort, I will create my own scripting language to fulfill the need, but that seems a waste with all the great interpreters out there. I've also considered taking an existing open source complier and slicing it up for the functionality I need... just not saving the compiled results to disk... I don't know. I would prefer to use a mainline language if possible... but that's not required.
Any help would be appreciated. I know this is not your run of the mill idea I have here, but someone has to have done it before.
Thanks!
P.S.
One thought that just occurred to me while writing this was this: what about using a true C compiler to create object code, save it to disk as a dll library, then reload and run it inside "my" code? Can you do that with MS Visual Studio? I need to look at the licensing of the compiler... how to reload the library dynamically while the main application continues to run... hmmmmm I could then just group the "functions" created by the user into library groups. Ok that's enough of this particular brain dump...
A possible solution could be use gcc (MingW since you are on windows) and build a DLL out of your user defined code. The DLL should export just one function. You can use the win32 API to handle the DLL (LoadLibrary/GetProcAddress etc.) At the end of this job you have a C style function pointer. The problem now are arguments. If your computation has just one parameter you can fo a cast to double (*funct)(double), but if you have many parameters you need to match them.
I think I've found a way to do this using standard C.
1) Standard C needs to be used because when it is compiled into a dll, the resulting interface is cross compatible with multiple compilers. I plan to do my primary development with MS Visual Studio and compile objects in my application using gcc (windows version)
2) I will expose certain variables to the user (inputs and outputs) and standardize them across units. This allows multiple units to be developed with the same interface.
3) The user will only create the inside of the function using standard C syntax and grammar. I will then wrap that function with text to fully define the function and it's environment (remember those variables I intend to expose?) I can also group multiple functions under a single executable unit (dll) using name parameters.
4) When the user wishes to test their function, I dump the dll from memory, compile their code with my wrappers in gcc, and then reload the dll into memory and run it. I would let them define inputs and outputs for testing.
5) Once the test/create step was complete, I have a compiled library created which can be loaded at run time and handled via pointers. The inputs and outputs would be standardized, so I would always know what my I/O was.
6) The only problem with standardized I/O is that some of the inputs and outputs are likely to not be used. I need to see if I can put default values in or something.
So, to sum up:
Think of an app with a text box and a few buttons. You are told that your inputs are named A, B, and C and that your outputs are X, Y, and Z of specified types. You then write a function using standard C code, and with functions from the specified libraries (I'm thinking math etc.)
So now your done... you see a few boxes below to define your input. You fill them in and hit the TEST button. This would wrap your code in a function context, dump the existing dll from memory (if it exists) and compile your code along with any other functions in the same group (another parameter you could define, basically just a name to the user.) It then runs the function using a functional pointer, using the inputs defined in the UI. The outputs are sent to the user so they can determine if their function works. If there are any compilation errors, that would also be outputted to the user.
Now it's time to run for real. Of course I kept track of what functions are where, so I dynamically open the dll, and load all the functions into memory with functional pointers. I start shoving data into one side and the functions give me the answers I need. There would be some overhead to track I/O and to make sure the functions are called in the right order, but the execution would be at compiled machine code speeds... which is my primary requirement.
Now... I have explained what I think will work in two different ways. Can you think of anything that would keep this from working, or perhaps any advice/gotchas/lessons learned that would help me out? Anything from the type of interface to tips on dynamically loading dll's in this manner to using the gcc compiler this way... etc would be most helpful.
Thanks!

How to Prevent I/O Access in C++ or Native Compiled Code

I know this may be impossible but I really hope there's a way to pull it off. Please tell me if there's any way.
I want to write a sandbox application in C++ and allow other developers to write native plugins that can be loaded right into the application on the fly. I'd probably want to do this via DLLs on Windows, but I also want to support Linux and hopefully Mac.
My issue is that I want to be able to prevent the plugins from doing I/O access on their own. I want to require them to use my wrapped routines so that I can ensure none of the plugins write malicious code that starts harming the user's files on disk or doing things undesireable on the network.
My best guess on how to pull off something like this would be to include a compiler with the application and require the source code for the plugins to be distributed and compiled right on the end-user platform. Then I'd need an code scanner that could search the plugin uncompiled code for signatures that would show up in I/O operations for hard disk or network or other storage media.
My understanding is that the STD libaries like fstream wrap platform-specific functions so I would think that simply scanning all the code that will be compiled for platform-specific functions would let me accomplish the task. Because ultimately, any C native code can't do any I/O unless it talks to the OS using one of the OS's provided methods, right??
If my line of thinking is correct on this, does anyone have a book or resource recommendation on where I could find the nuts and bolts of this stuff for Windows, Linux, and Mac?
If my line of thinking is incorrect and its impossible for me to really prevent native code (compiled or uncompiled) from doing I/O operations on its own, please tell me so I don't create an application that I think is secure but really isn't.
In an absolutely ideal world, I don't want to require the plugins to distribute uncompiled code. I'd like to allow the developers to compile and keep their code to themselves. Perhaps I could scan the binaries for signatures that pertain to I/O access????
Sandboxing a program executing code is certainly harder than merely scanning the code for specific accesses! For example, the program could synthesize assembler statements doing system calls.
The original approach on UNIXes is to chroot() the program but I think there are problems with that approach, too. Another approach is a secured environment like selinux, possible combined with chroot(). The modern approach used to do things like that seems to run the program in a virtual machine: upon start of the program fire up a suitable snapshot of a VM. Upon termination just rewind to tbe snaphot. That merely requires that the allowed accesses are somehow channeled somewhere.
Even a VM doesn't block I/O. It can block network traffic very easily though.
If you want to make sure the plugin doesn't do I/O you can scan it's DLL for all it's import functions and run the function list against a blacklist of I/O functions.
Windows has the dumpbin util and Linux has nm. Both can be run via a system() function call and the output of the tools be directed to files.
Of course, you can write your own analyzer but it's much harder.
User code can't do I/O on it's own. Only the kernel. If youre worried about the plugin gaining ring0/kernel privileges than you need to scan the ASM of the DLL for I/O instructions.

What does embedding a language into another do?

This may be kind of basic but... here goes.
If I decide to embed some kind of scripting language like Lua or Ruby into a C++ program by linking it's interpreter what does that allow me to do in C++ then?
Would I be able to write Ruby or Lua code right into the cpp file or simply call scripts from the program?
If the latter is true, how would I do that?
Because they're scripting languages, the code is always going to be "interpreted." In reality, you aren't "calling" the script code inside your program, but rather when you reach that point, you're executing the interpreter in the context of that thread (the thread that reaches the scripting portion), which then reads the scripting language and executes the applicable machine code after interpreting it (JIT compiling kind of, but not really, there's no compiling involved).
Because of this, its basically the same thing as forking the interpreter and running the script, unless you want access to variables in your compiled program/in your script from the compiled program. To access values to/from, because you're using the thread that has your compiled program's context, you should be able to store script variables on the stack as well and access them when your thread stops running the interpreter (assuming you stored the variables on the stack).
Edit: response:
You would have to write it yourself. Think about it this way: if you want to use assembly in c++, you use the asm keyword. You then in the c++ compiler, need to parse the source file, get to the asm keyword, and then switch to the assembly compiler. Then the assembly compiler needs to go until the end bracket of the asm region and compile this code.
If you want to do this,it will be a bit different, since assembly gets compiled, not interpreted (which is what you want to do). What you'll need to do, is change the compiler you're using (lets say c++), so that it recognizes your own user defined keyword. Lets say this keyword is scriptX{}. You need to change the c++'s parser so that when it see's scriptX{}, it stores everything between the brackets in the readonly data section of your compiled program. You then need to add a hook in the compiled assembly file to switch the context of the thread to your script interpreter, and start the program counter at the beginning of your script section (which you put in read only data section of the object file).
Good luck with that...
A common reason to embed a scripting language into a program is to provide for the ability to control the program with scripts provided by the end user.
Probably the simplest example of such a script is a configuration file. Assume that your program has options, and needs to remember the options from run to run. You could write them out to a file as a binary image of your options structure, but that would be fragile, not easy to inspect or edit, and likely not portable across systems. Writing the options out in plain text with some sort of labels for which is which addresses most of those complaints, but now you need to parse that text and recover the options. Then some users want different options on Tuesdays, want to do simple arithmetic to compute one option from another, or to write one configuration file that they can use on both Windows and Linux, and pretty soon you find yourself inventing a little language to express all of those ideas and mechanisms with. At this point, there's a better way.
The languages Lua and TCL both grew out of essentially that scenario. Larger systems needed to be configured and controlled by end users. End users wanted to edit a simple text file and get immediate satisfaction, even (especially) when working with large systems that might have required hours to compile successfully.
One advantage here is that rather than inventing a programming language one feature at a time as user's needs change, you start with a complete language along with its documentation. The language designer has already made a number of tough decisions for you (how do I represent strings and numbers, what about lists, what about named values, what does if look like, etc.) and has generally also brought a carefully designed and debugged implementation to the table.
Lua is particularly easy to integrate. Reading a simple configuration file and extracting the settings from the Lua state can be done using a small subset of its C API. Once you have Lua available, it is attractive to use it for other purposes. In many cases, you will find that it is more productive to write only the innermost loops in C, and use Lua to glue those functions together and provide all the "business logic" of the application. This is how Adobe Lightroom is implemented, as well as many games on platforms ranging from simple set-top-boxes to iOS devices and even PCs.

When writing a portable c/c++ program, what is the best way to consume external files?

I'm pretty new to the c/c++ scene, I've been spoon fed on virtual machines for too long.
I'm modifying an existing C++ tool that we use across the company. The tool is being used on all the major operating systems (Windows, Mac, Ubuntu, Solaris, etc). I'm attempting to bridge the tool with another tool written Java. Basically I just need to call java -jar from the C++ tool.
The problem is, how do I know where the jar is located on the user's computer? The c++ executables are currently checked into Perforce, and users sync and then call the exe, presumably leaving the exe in place (although they could copy it somewhere else). My current solution checks in the jar file beside the exe.
I've looked at multiple ways to calculate the location of the exe from C++, but none of them seem to be portable. On windows there is a 'GetModuleLocation' and on posix you can look at the procs/process.exe info to figure out the location of the process. And on most systems you can look at argv[0] to figure out where the exe is. But most of these techniques are 100% guaranteed due to users using $PATH, symlinks, etc to call the exe.
So, any guidance on the right way to do this that will always work? I guess I have no problem ifdef'ing multiple solutions, but it seems like there should be a more elegant way to do this.
I don't believe there is a portable way of doing this. The C++ standard itself does not define anything about the execution environment. The best you get is the std::system call, and that can fail for things like Unicode characters in path names.
The issue here is that C and C++ are both used on systems where there's no such thing as an operating system. No such thing as $PATH. Therefore, it would be nonsensical for the standards committee to require a conforming implementation provide such features.
I would just write one implementation for POSIX, one for Mac (if it differs significantly from the POSIX one... never used it so I'm not sure), and one for Windows (Select which one at compilation time with the preprocessor). It's maybe 3 function calls for each one; not a lot of code, and you'll be sure you're following the conventions of your target platform.
I'd like to point you to a few URLs which might help you find where the current executable was located. It does not appear as if there is one method for all (aside from the ARGV[0] + path search method which as you note is spoofable, but…are you really in a threat environment such that this is likely to happen?).
How to get the application executable name in WindowsC++/CLI?
https://superuser.com/questions/49104/handy-tool-to-find-executable-program-location
Finding current executable's path without /proc/self/exe
How do I find the location of the executable in C?
There are several solutions, none of them perfect. Under Windows, as
you have said, you can use GetModuleLocation, but that's not available
under Unix. You can try to simulate how the shell works, using
argv[0] and getenv("PATH"), but that's not easy, and it's not 100%
reliable either. (Under Unix, and I think under Windows as well, the
spawning application can hoodwink you, and put any sort of junk in
argv[0].) The usual solution under Unix is to require an environment
variable, e.g. MYAPPLICATION_HOME, which should contain the root
directory where you're application is installed; the application won't
start without it. Or you can ask the user to specify the root path with
a command line option.
In practice, I usually use all three: the command line option has
precedence, and is very useful when testing; the environment variable
works well in the Unix world, since it's what people are used to; and if
neither are present, I'll try to work out the location from where I was
started, using system dependent code: GetModuleLocation under Windows,
and getenv("PATH") and all the rest under Unix. (The Unix solution
isn't that hard if you already have code for breaking a string into
fields, and are using boost::filesystem.)
Good solution would be to write your custom function that is guaranteed to work in every platform you use. Preferably should use runtime checks if it worked, and then fallback to ifdefs only if some way of detecting it is not available in all platforms. But it might not be easy to detect if your code that executes correctly for example argv[0] would return the correct path...

Disabling system calls in C++

Is it possible to disable system calls when compiling C++ code? And if it is, how would I do that?
And to extend this question a bit. I wish to make program to not be able to interact with operating system, except for file reading and writing. Is it possible to do this?
EDIT: With not be able to interact with OS, I mean to not be able to change anything in OS, like creating, editing or deleting something. My main concern is system calls, which would almost in all cases be intended to be harmful.
This is for grading programs, where I would be running other people code. The programs would usually solve various algorithmic problems, so there is no need for very advanced features. Basic (more or less) STL usage and classic code. There would be no external libraries (like Boost or anything like that) or multiple files.
Yes, it's certainly possible.
Take a look at the source code for geordi to see how it does it. Geordi is an IRC bot that compiles, links and runs C++ code under an environment where most system calls are disabled.
#define system NO_SYSTEM_CALL
If you are ok with macros to generate errors for compilation purpose.
You could use any combination of the following:
create your own library with a dummy function called system and link it with the student code (assuming you control the build steps)
grep the source code (though preprocessing hacks could get around that)
run the built binaries under an unprivileged user id, after chroot etc.
use a virtual machine
invoke the compiler with -Dsystem= (though the student could #undef)
(maybe - have to check the end-user agreement) upload their source to ideone or similar and let their security handle such issues
An program can always invoke system calls, at leased under *nix it can. You could however take a look at SELinux, Apparmor, GRsec this are kernel safeguards which can block certain system calls for an application.