C++: Any way to 'jail function'?

Well, it's a kind of a web server.
I load .dll(.a) files and use them as program modules.
I recursively go through directories and put the '_main' functors from these libraries into a std::map, keyed by a name that is listed in special '.m' files.
The main directory contains a separate directory for each host.
The problem is that I need to prevent 'fopen' or any other filesystem function from working with anything outside of that host's directory.
The only way I can see to do that is to write a wrapper for stdio.h (I mean, write an s_stdio.h that adds a filename check).
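A minimal sketch of what that s_stdio.h wrapper might look like on POSIX (the names s_fopen and g_host_root are hypothetical; note that realpath() fails for files that don't exist yet, so a real version would also have to resolve the parent directory):

// s_stdio.h -- hypothetical checked wrapper around fopen (POSIX sketch)
#include <cstdio>
#include <cstring>
#include <limits.h>
#include <stdlib.h>

// Set by the server to the absolute path of the current host's directory.
extern char g_host_root[PATH_MAX];

inline FILE* s_fopen(const char* filename, const char* mode)
{
    char resolved[PATH_MAX];
    // realpath() collapses "..", symlinks, etc. before the check.
    if (realpath(filename, resolved) == nullptr)
        return nullptr;
    if (std::strncmp(resolved, g_host_root, std::strlen(g_host_root)) != 0)
        return nullptr;   // outside the host directory -- refuse
    return std::fopen(resolved, mode);
}

As the answers below point out, this only constrains modules that cooperate; compiled code can ignore the wrapper entirely.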
Maybe it could be a daemon catching system calls and identifying the offending ones?
Edit:
And what about this kind of setup: I upload only the sources and then compile them directly on my server after checking them over? That's the only way I have found so far (while still keeping everything inside one address space).

As C++ is a low-level language and the DLLs are compiled to machine code, they can do anything. Even if you wrap the standard library functions, the code can make the system calls directly, reimplementing the functionality you have wrapped.
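For instance, on Linux a module can open a file by issuing the system call itself (a sketch; SYS_openat and the syscall() wrapper are Linux-specific), so a substituted fopen is never consulted:

// Linux-only illustration: bypass any fopen/open wrapper by calling the kernel directly.
#include <sys/syscall.h>
#include <unistd.h>
#include <fcntl.h>

int open_directly(const char* path)
{
    // Equivalent to open(path, O_RDONLY), but without going through libc's
    // open() symbol, so interposed or wrapped functions never see the call.
    return static_cast<int>(syscall(SYS_openat, AT_FDCWD, path, O_RDONLY));
}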
Probably the only way to effectively sandbox such a DLL is some kind of virtualisation, so the code is not run directly but in a virtual machine.
The simpler solution is to use some higher-level language for the loadable modules that should be sandboxed. Some high-level languages are better at sandboxing (Lua, Java), others are not so good (e.g. AFAIK there is currently no official restricted environment implemented for Python).

If you are the one loading the module, you can perform a static analysis on the code to verify what APIs it calls, and refuse to link it if it doesn't check out (i.e. if it makes any kind of suspicious call at all).
Having said that, it's a lot of work to do this, and not very portable.

Related

How to add boost.asio to the windows universal app project?

How can I add boost.asio to a Windows universal project's shared components?
Do I need to create a separate project and include the header files there, or is there a simpler way?
Thanks!
While I can't get into the specifics of Universal Apps too much (I'm not an authority on that subject), I can tell you this: boost::asio is a header-only library. That means that by simply including the headers in your C++ project, the code is merged directly into your main assembly. I highly recommend using it that way.
If you're going to include this header-only library in another DLL that you then include in your main app, things are going to get messy. First, you have the headache of building binaries for each target (x86, x64 and ARM) and maintaining those dependencies, but beyond that, the real headache is what you need to go through to make boost::asio function when being loaded from a shared assembly.
In order to do this, you need to define a special static member inside ::asio called winsock_init in your code. ::asio uses an internal, static customized reference counter using interlocked exchanges to track its own usage. When the counter is incremented beyond zero, calls to things such as WSAStartup() are made to ensure that the library plays nice with Winsock. When the counter reaches zero again, WSACleanup() is called again for the same reasons.
The structure winsock_init circumvents this functionality, so it's up to you to call these functions correctly and manually from within your shared assembly; otherwise you're going to completely break ASIO, and your application will fail compliance testing for app store deployment.
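Roughly, and per the Boost.Asio documentation on manual Winsock initialisation (verify against the docs for your Boost version; this is only a sketch):

// Inside the DLL that wraps ::asio: suppress asio's automatic
// WSAStartup()/WSACleanup() reference counting, as described in the
// Boost.Asio docs, and take over that responsibility yourself.
#include <boost/asio/detail/winsock_init.hpp>

boost::asio::detail::winsock_init<>::manual manual_winsock_init;

// The DLL must then call WSAStartup() when it is loaded and WSACleanup()
// when it is unloaded, matching them exactly.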
Also, whenever you try to wrap ::asio into a shared assembly, you need to include special source files exactly once, within the DLL, and then you need to define a number of special Boost config macros both in the DLL project and in any project that uses this ::asio DLL.
My advice again is to simply include the headers alone in your primary assembly and then you're not introducing all of these headaches. Another alternative is to simply use C++/CLI or Managed C++, whatever it's called these days, and directly access the .NET socket classes from within your mixed C++ code.
See here for more details about compiling ASIO into a separate assembly if you really want to suffer all the pain I've described.

How to Prevent I/O Access in C++ or Native Compiled Code

I know this may be impossible but I really hope there's a way to pull it off. Please tell me if there's any way.
I want to write a sandbox application in C++ and allow other developers to write native plugins that can be loaded right into the application on the fly. I'd probably want to do this via DLLs on Windows, but I also want to support Linux and hopefully Mac.
My issue is that I want to be able to prevent the plugins from doing I/O access on their own. I want to require them to use my wrapped routines so that I can ensure none of the plugins write malicious code that starts harming the user's files on disk or doing undesirable things on the network.
My best guess on how to pull off something like this would be to include a compiler with the application and require the source code for the plugins to be distributed and compiled right on the end-user platform. Then I'd need a code scanner that could search the plugins' uncompiled code for signatures that would show up in I/O operations for the hard disk, network, or other storage media.
My understanding is that the standard libraries like fstream wrap platform-specific functions, so I would think that simply scanning all the code that will be compiled for platform-specific functions would let me accomplish the task. Because ultimately, native C code can't do any I/O unless it talks to the OS using one of the OS's provided methods, right?
If my line of thinking is correct on this, does anyone have a book or resource recommendation on where I could find the nuts and bolts of this stuff for Windows, Linux, and Mac?
If my line of thinking is incorrect and it's impossible for me to really prevent native code (compiled or uncompiled) from doing I/O operations on its own, please tell me so I don't create an application that I think is secure but really isn't.
In an absolutely ideal world, I don't want to require the plugins to distribute uncompiled code. I'd like to allow the developers to compile and keep their code to themselves. Perhaps I could scan the binaries for signatures that pertain to I/O access?
Sandboxing a program that executes foreign code is certainly harder than merely scanning that code for specific accesses! For example, the program could synthesize assembler statements that perform system calls.
The original approach on UNIXes is to chroot() the program, but I think there are problems with that approach, too. Another approach is a secured environment like SELinux, possibly combined with chroot(). The modern approach to this kind of thing seems to be running the program in a virtual machine: upon start of the program, fire up a suitable snapshot of a VM; upon termination, just rewind to the snapshot. That merely requires that the allowed accesses are somehow channeled somewhere.
Even a VM doesn't block I/O. It can block network traffic very easily though.
If you want to make sure the plugin doesn't do I/O, you can scan its DLL for all of its imported functions and run the function list against a blacklist of I/O functions.
Windows has the dumpbin util and Linux has nm. Both can be run via a system() function call and the output of the tools be directed to files.
Of course, you can write your own analyzer but it's much harder.
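A rough sketch of that approach, assuming dumpbin (or nm on Linux) is on the PATH and using a hypothetical blacklist:

// Sketch: dump a DLL's import table with an external tool and scan it for
// blacklisted I/O functions. Flags assume dumpbin; on Linux "nm -D" would
// list a shared object's dynamic symbols instead.
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

bool plugin_looks_clean(const std::string& dllPath)
{
    std::string cmd = "dumpbin /IMPORTS \"" + dllPath + "\" > imports.txt";
    if (std::system(cmd.c_str()) != 0)
        return false;                       // be conservative on failure

    const std::vector<std::string> blacklist = {
        "CreateFileW", "CreateFileA", "WriteFile", "DeleteFileW", "fopen"
    };

    std::ifstream in("imports.txt");
    std::string line;
    while (std::getline(in, line))
        for (const auto& bad : blacklist)
            if (line.find(bad) != std::string::npos)
                return false;               // imports a blacklisted function
    return true;
}

Note that this only catches imports by name; a plugin can still resolve functions at runtime (e.g. via GetProcAddress) or, as discussed above, make system calls directly.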
User code can't do I/O on its own; only the kernel can. If you're worried about the plugin gaining ring0/kernel privileges, then you need to scan the DLL's assembly for I/O instructions.

When writing a portable c/c++ program, what is the best way to consume external files?

I'm pretty new to the C/C++ scene; I've been spoon-fed virtual machines for too long.
I'm modifying an existing C++ tool that we use across the company. The tool is being used on all the major operating systems (Windows, Mac, Ubuntu, Solaris, etc). I'm attempting to bridge the tool with another tool written in Java. Basically I just need to call java -jar from the C++ tool.
The problem is, how do I know where the jar is located on the user's computer? The c++ executables are currently checked into Perforce, and users sync and then call the exe, presumably leaving the exe in place (although they could copy it somewhere else). My current solution checks in the jar file beside the exe.
I've looked at multiple ways to calculate the location of the exe from C++, but none of them seem to be portable. On Windows there is GetModuleFileName, and on POSIX you can look at /proc/self/exe to figure out the location of the process. And on most systems you can look at argv[0] to figure out where the exe is. But none of these techniques is 100% guaranteed, because users may call the exe via $PATH, symlinks, etc.
So, any guidance on the right way to do this that will always work? I guess I have no problem ifdef'ing multiple solutions, but it seems like there should be a more elegant way to do this.
I don't believe there is a portable way of doing this. The C++ standard itself does not define anything about the execution environment. The best you get is the std::system call, and that can fail for things like Unicode characters in path names.
The issue here is that C and C++ are both used on systems where there's no such thing as an operating system. No such thing as $PATH. Therefore, it would be nonsensical for the standards committee to require a conforming implementation provide such features.
I would just write one implementation for POSIX, one for Mac (if it differs significantly from the POSIX one... never used it so I'm not sure), and one for Windows (Select which one at compilation time with the preprocessor). It's maybe 3 function calls for each one; not a lot of code, and you'll be sure you're following the conventions of your target platform.
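A rough sketch of that, selecting at compile time (GetModuleFileName on Windows, readlink on /proc/self/exe on Linux; the macOS variant, _NSGetExecutablePath, is omitted here):

// Sketch: find the running executable's path, one implementation per platform.
#include <string>

#ifdef _WIN32
#include <windows.h>
std::string exe_path()
{
    char buf[MAX_PATH] = {};
    GetModuleFileNameA(nullptr, buf, MAX_PATH);
    return buf;
}
#else   // Linux; macOS would use _NSGetExecutablePath() instead
#include <unistd.h>
#include <limits.h>
std::string exe_path()
{
    char buf[PATH_MAX] = {};
    ssize_t n = readlink("/proc/self/exe", buf, sizeof(buf) - 1);
    return n > 0 ? std::string(buf, static_cast<size_t>(n)) : std::string();
}
#endif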
I'd like to point you to a few URLs which might help you find where the current executable is located. It does not appear as if there is one method for all platforms (aside from the argv[0] + path search method, which as you note is spoofable, but…are you really in a threat environment such that this is likely to happen?).
How to get the application executable name in Windows (C++/CLI)?
https://superuser.com/questions/49104/handy-tool-to-find-executable-program-location
Finding current executable's path without /proc/self/exe
How do I find the location of the executable in C?
There are several solutions, none of them perfect. Under Windows, as you have said, you can use GetModuleFileName, but that's not available under Unix. You can try to simulate how the shell works, using argv[0] and getenv("PATH"), but that's not easy, and it's not 100% reliable either. (Under Unix, and I think under Windows as well, the spawning application can hoodwink you and put any sort of junk in argv[0].) The usual solution under Unix is to require an environment variable, e.g. MYAPPLICATION_HOME, which should contain the root directory where your application is installed; the application won't start without it. Or you can ask the user to specify the root path with a command line option.
In practice, I usually use all three: the command line option has precedence, and is very useful when testing; the environment variable works well in the Unix world, since it's what people are used to; and if neither is present, I'll try to work out the location from where I was started, using system-dependent code: GetModuleFileName under Windows, and getenv("PATH") and all the rest under Unix. (The Unix solution isn't that hard if you already have code for breaking a string into fields, and are using boost::filesystem.)
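A compressed sketch of that precedence, using the MYAPPLICATION_HOME variable from this answer and leaving the system-dependent last resort (e.g. an exe_path() helper like the one sketched earlier) to the caller:

// Sketch: command-line option wins, then the environment variable,
// then a best-effort guess derived from the executable's own location.
#include <cstdlib>
#include <string>

std::string application_home(const std::string& cmdLineOption,
                             const std::string& exeDirGuess)
{
    if (!cmdLineOption.empty())
        return cmdLineOption;                       // e.g. a --home=... option
    if (const char* env = std::getenv("MYAPPLICATION_HOME"))
        return env;                                 // conventional Unix-style override
    return exeDirGuess;                             // system-dependent fallback
}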
A good solution would be to write your own function that is guaranteed to work on every platform you use. Preferably it should check at runtime whether each method actually worked, and fall back to #ifdefs only where a detection method is not available on all platforms. But it might not be easy to detect whether, for example, argv[0] really yielded the correct path...

Obtaining cross-platform path for config file (C/C++)

I would like to store my application's settings in a configuration file. Under Linux (and Mac?) this (might) be /home/user/.config/app.conf while under Windows it (might) be "C:\Documents and Settings\username\Application Data\app.conf". It can of course be stored elsewhere, so the only way to get the correct location is to use a platform-specific function.
Suffice it to say I don't wish to risk coding this myself and getting it wrong (because I lack access to some of these platforms for testing), so does anyone know if there are any well-tested cross-platform C/C++ libraries that can do this? A .h/.hpp file that uses a bunch of #defines would also be fine, as long as it's widely used.
I thought Boost's program options library might be able to (as it can load configuration files) but it doesn't seem able to.
Any suggestions?
This came up again, so I decided to bite the bullet and create my own solution since the only existing ones are part of huge frameworks and impractical for small programs.
I have published the code at https://github.com/Malvineous/cfgpath
It is placed in the public domain, so it is free for anyone to use for any purpose. It has no dependencies beyond the standard platform APIs. Just #include a single .h file and call one of the functions. The other files in the repository are just test code; you don't need those unless you want to make changes you intend to send to me (please do!)
Unfortunately as I said in my original post I don't have easy access to many platforms, so I hope I will get a few patches to add support for more platforms.
Qt's QSettings class will do this for you.
On *nix the settings will be stored in $HOME/.config. On Windows the settings will be stored in the registry. On Mac the settings will be stored in $HOME/Library/Preferences/.
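A minimal sketch of that usage (the organization and application names here are made up):

// Sketch: QSettings picks the platform-appropriate location (registry,
// ~/.config, ~/Library/Preferences) based on these names.
#include <QSettings>

void save_and_load()
{
    QSettings settings("ExampleOrg", "ExampleApp");   // hypothetical names
    settings.setValue("window/width", 800);
    int width = settings.value("window/width", 640).toInt();
    (void)width;   // use the value...
}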
wxWidgets has a function you can call to get this (see the sketch after the links below), but for Unix it's a bit outdated, as it returns the home directory instead of the more common ~/.config
See:
https://docs.wxwidgets.org/3.0/classwx_standard_paths.html#a7c7cf595d94d29147360d031647476b0
https://github.com/wxWidgets/wxWidgets/issues/9300
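For reference, roughly what the wxWidgets call looks like (a sketch using wxStandardPaths; check the linked documentation for the exact behaviour on your version):

// Sketch: ask wxWidgets for the user configuration directory.
#include <wx/stdpaths.h>
#include <wx/string.h>

wxString user_config_dir()
{
    return wxStandardPaths::Get().GetUserConfigDir();
}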
I think the boost filesystem library should help. It has a platform-independent path grammar.

C++ internal code reuse: compile everything or share the library / dynamic library?

General question:
For unmanaged C++, what's better for internal code sharing?
Reuse code by sharing the actual source code? OR
Reuse code by sharing the library / dynamic library (+ all the header files)
Whichever it is: what's your strategy for reducing duplicate code (copy-paste syndrome) and code bloat?
Specific example:
Here's how we share the code in my organization:
We reuse code by sharing the actual source code.
We develop on Windows using VS2008, though our project actually needs to be cross-platform. We have many projects (.vcproj) committed to the repository; some might have their own repository, some might be part of a larger one. For each deliverable solution (.sln) (e.g. something that we deliver to the customer), it will svn:externals all the necessary projects (.vcproj) from the repository to assemble the "final" product.
This works fine, but I'm quite worried that eventually the code size for each solution could get quite huge (right now our total code size is about 75K SLOC).
Also one thing to note is that we prevent all transitive dependencies. That is, each project (.vcproj) that is not an actual solution (.sln) is not allowed to svn:externals any other project, even if it depends on it. This is because you could have 2 projects (.vcproj) that both depend on the same library (i.e. Boost) or project (.vcproj); when you svn:externals both projects into a single solution, svn:externals will pull it in twice. So we carefully document all dependencies for each project, and it's up to the guy that creates the solution (.sln) to ensure all dependencies (including transitive ones) are svn:externals'd as part of the solution.
If we reuse code by using .lib/.dll files instead, this would obviously reduce the code size for each solution, as well as eliminate the transitive dependency issue mentioned above where applicable (exceptions are, for example, third-party libraries/frameworks that use DLLs, like Intel TBB and the default Qt build).
Addendum: (read if you wish)
Another motivation to share source code might be summed up best by Dr. GUI:
On top of that, what C++ makes easy is not creation of reusable binary components; rather, C++ makes it relatively easy to reuse source code. Note that most major C++ libraries are shipped in source form, not compiled form. It's all too often necessary to look at that source in order to inherit correctly from an object—and it's all too easy (and often necessary) to rely on implementation details of the original library when you reuse it. As if that isn't bad enough, it's often tempting (or necessary) to modify the original source and do a private build of the library. (How many private builds of MFC are there? The world will never know . . .)
Maybe this is why, when you look at libraries like the Intel Math Kernel Library, their "lib" folder contains "vc7", "vc8" and "vc9" subfolders, one for each Visual Studio version. Scary stuff.
Or how about this assertion:
C++ is notoriously non-accommodating when it comes to plugins. C++ is extremely platform-specific and compiler-specific. The C++ standard doesn't specify an Application Binary Interface (ABI), which means that C++ libraries from different compilers or even different versions of the same compiler are incompatible. Add to that the fact that C++ has no concept of dynamic loading and each platform provides its own solution (incompatible with the others) and you get the picture.
What are your thoughts on the above assertion? Does something like Java or .NET face these kinds of problems? e.g. if I produce a JAR file from NetBeans, will it work if I import it into IntelliJ, as long as I ensure that both have compatible JRE/JDK versions?
People seem to think that C specifies an ABI. It doesn't, and I'm not aware of any standardised compiled language that does. To answer your main question, use of libraries is of course the way to go - I can't imagine doing anything else.
One good reason to share the source code: Templates are one of C++'s best features because they are an elegant way around the rigidity of static typing, but by their nature are a source-level construct. If you focus on binary-level interfaces instead of source-level interfaces, your use of templates will be limited.
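A trivial illustration of the point: a template like the one below has to ship as source in a header, because each instantiation is generated at the call site, so there is no single binary artifact you could hand out instead (the function itself is just an example):

// Sketch: clamp_to_range<int>, clamp_to_range<double>, ... only come into
// existence when client code instantiates them, so the "library" is the
// header text itself, not a compiled binary.
template <typename T>
T clamp_to_range(T value, T low, T high)
{
    if (value < low)  return low;
    if (value > high) return high;
    return value;
}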
We do the same. Trying to use binaries can be a real problem if you need to use shared code on different platforms, build environments, or even if you need different build options such as static vs. dynamic linking to the C runtime, different structure packing settings, etc.
I typically set projects up to build as much from source on-demand as possible, even with third-party code such as zlib and libpng. For those things that must be built separately, e.g. Boost, I typically have to build 4 or 8 different sets of binaries for the various combinations of settings needed (debug/release, VS7.1/VS9, static/dynamic), and manage the binaries along with the debugging information files in source control.
Of course, if everyone sharing your code is using the same tools on the same platform with the same options, then it's a different story.
I never saw shared libraries as a way to reuse code from an old project into a new one. I always thought it was more about sharing a library between different applications that you're developing at about the same time, to minimize bloat.
As far as copy-paste syndrome goes, if I copy and paste it in more than a couple places, it needs to be its own function. That's independent of whether the library is shared or not.
When we reuse code from an old project, we always bring it in as source. There's always something that needs tweaking, and it's usually safer to tweak a project-specific version than to tweak a shared version that can wind up breaking the previous project. Going back and fixing the previous project is out of the question because 1) it worked (and shipped) already, 2) it's no longer funded, and 3) the test hardware needed may no longer be available.
For example, we had a communication library that had an API for sending a "message", a block of data with a message ID, over a socket, pipe, whatever:
void Foo::Send(unsigned messageID, const void* buffer, size_t bufSize);
But in a later project, we needed an optimization: the message needed to consist of several blocks of data in different parts of memory concatenated together, and we couldn't (and didn't want to, anyway) do the pointer math to create the data in its "assembled" form in the first place, and the process of copying the parts together into a unified buffer was taking too long. So we added a new API:
void Foo::SendMultiple(unsigned messageID, const void** buffer, size_t* bufSize);
Which would assemble the buffers into a message and send it. (The base class's method allocated a temporary buffer, copied the parts together, and called Foo::Send(); subclasses could use this as a default or override it with their own, e.g. the class that sent the message on a socket would just call send() for each buffer, eliminating a lot of copies.)
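A sketch of what that default base-class implementation might look like; note that the count parameter is my own addition for illustration, since the declaration above doesn't show how the number of blocks is conveyed:

// Sketch of the default SendMultiple described above: concatenate the blocks
// into one temporary buffer and hand it to the existing Send().
#include <cstring>
#include <vector>

struct Foo {
    virtual void Send(unsigned messageID, const void* buffer, size_t bufSize) = 0;

    // bufferCount is hypothetical -- the original declaration doesn't show it.
    virtual void SendMultiple(unsigned messageID, const void** buffers,
                              size_t* bufSizes, size_t bufferCount)
    {
        size_t total = 0;
        for (size_t i = 0; i < bufferCount; ++i)
            total += bufSizes[i];

        std::vector<unsigned char> assembled(total);
        size_t offset = 0;
        for (size_t i = 0; i < bufferCount; ++i) {
            std::memcpy(assembled.data() + offset, buffers[i], bufSizes[i]);
            offset += bufSizes[i];
        }
        // Default behaviour: one copy, one call to the existing Send().
        // A socket-backed subclass would override this and call send() per
        // buffer, eliminating the copy.
        Send(messageID, assembled.data(), assembled.size());
    }

    virtual ~Foo() {}
};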
Now, by doing this, we have the option of backporting (copying, really) the changes to the older version, but we're not required to backport. This gives the managers flexibility, based on the time and funding constraints they have.
EDIT: After reading Neil's comment, I thought of something that we do that I need to clarify.
In our code, we do lots of "libraries". LOTS of them. One big program I wrote had something like 50 of them. Because, for us and with our build setup, they're easy.
We use a tool that auto-generates makefiles on the fly, taking care of dependencies and almost everything. If there's anything strange that needs to be done, we write a file with the exceptions, usually just a few lines.
It works like this: The tool finds everything in the directory that looks like a source file, generates dependencies if the file changed, and spits out the needed rules. Then it makes a rule to take everything and ar/ranlib it into a libxxx.a file, named after the directory. All the objects and the library are put in a subdirectory that is named after the target platform (this makes cross-compilation easy to support). This process is then repeated for every subdirectory (except the object-file subdirs). Then the top-level directory gets linked with all the subdirectories' libraries into the executable, and a symlink is created, again named after the top-level directory.
So directories are libraries. To use a library in a program, make a symbolic link to it. Painless. Ergo, everything's partitioned into libraries from the outset. If you want a shared lib, you put a ".so" suffix on the directory name.
To pull in a library from another project, I just use a Subversion external to fetch the needed directories. The symlinks are relative, so as long as I don't leave something behind it still works. When we ship, we lock the external reference to a specific revision of the parent.
If we need to add functionality to a library, we can do one of several things. We can revise the parent (if it's still an active project and thus testable), tell Subversion to use the newer revision and fix any bugs that pop up. Or we can just clone the code, replacing the external link, if messing with the parent is too risky. Either way, it still looks like a "library" to us, but I'm not sure that it matches the spirit of a library.
We're in the process of moving to Mercurial, which has no "externals" mechanism so we have to either clone the libraries in the first place, use rsync to keep the code synced between the different repositories, or force a common directory structure so you can have hg pull from multiple parents. The last option seems to be working pretty well.