What is a Sandbox - C++

When antivirus software runs an application in a virtual environment called a "sandbox", how exactly does this sandbox work from the Windows kernel's point of view?
Is it hard to write such a sandbox?

At a high level, such sandboxes are kernel drivers that intercept API calls and modify the results those APIs return using hooking. How an entire sandboxing solution works under the hood, though, could easily fill several books.
As for difficulty, it's probably one of the harder things you could ever write. Not only do you have to provide hooks for nearly everything the operating system kernel provides, but you also have to prevent the application from accessing the memory space of other processes, and you need a way to save the state of the changes a program makes so that the program does not realize it's running in a sandbox. You have to do all of this in kernel mode, which effectively limits you to C and forces you to deal with different kinds of memory, e.g. paged pool and nonpaged pool. Oh, and you have to do all of this very fast, so that the user feels it's worthwhile to run applications inside your sandbox; most users won't tolerate a performance hit of 50% or more.
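To make "hooking" concrete, here is a minimal user-mode sketch using the Microsoft Detours library; a real AV sandbox does the equivalent (and far more) in kernel mode, so treat this only as an illustration of the intercept-and-forward pattern, not as how any particular product works:

```cpp
// Sketch: intercept CreateFileW with Microsoft Detours (user-mode analogy).
#include <windows.h>
#include <detours.h>

// Pointer to the original function, so the hook can forward calls to it.
static decltype(&CreateFileW) RealCreateFileW = CreateFileW;

static HANDLE WINAPI HookedCreateFileW(
    LPCWSTR name, DWORD access, DWORD share, LPSECURITY_ATTRIBUTES sa,
    DWORD disposition, DWORD flags, HANDLE templ)
{
    // A sandbox would redirect the path into a private copy-on-write area
    // here; this sketch merely observes the call and forwards it unchanged.
    OutputDebugStringW(name);
    return RealCreateFileW(name, access, share, sa, disposition, flags, templ);
}

bool InstallHook()
{
    DetourTransactionBegin();
    DetourUpdateThread(GetCurrentThread());
    DetourAttach(reinterpret_cast<PVOID*>(&RealCreateFileW),
                 reinterpret_cast<PVOID>(HookedCreateFileW));
    return DetourTransactionCommit() == NO_ERROR;
}
```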

Related

Can I somehow protect the whole Windows machine from stalling in case my 64-bit program consumes all memory?

Is there any way to set a system-wide limit on the memory a single process can use in Windows XP? I have a couple of unstable apps which work OK most of the time but can hit a bug that eats all available memory in a matter of seconds (or at least I suppose that's what happens). This forces a hard reset, as Windows becomes totally unresponsive and I lose my work.
I would like to be able to do something like /etc/limits on Linux - setting M90, for instance (to cap a single user's allocations at 90% of memory), so the system keeps the remaining 10% no matter what.
Use Windows Job Objects. Jobs are like process groups and can limit memory usage and process priority.
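A minimal Win32 sketch of that approach (the executable name here is just a placeholder): create a job with a per-process memory limit, start the target suspended, assign it to the job, then let it run.

```cpp
// Sketch: confine a child process to ~512 MB of committed memory via a Job
// Object. Error handling is omitted for brevity.
#include <windows.h>

int main()
{
    HANDLE job = CreateJobObjectW(nullptr, nullptr);

    JOBOBJECT_EXTENDED_LIMIT_INFORMATION limits = {};
    limits.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_PROCESS_MEMORY;
    limits.ProcessMemoryLimit = 512ull * 1024 * 1024;   // 512 MB per process
    SetInformationJobObject(job, JobObjectExtendedLimitInformation,
                            &limits, sizeof(limits));

    wchar_t cmd[] = L"unstable_app.exe";   // placeholder target program
    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    CreateProcessW(nullptr, cmd, nullptr, nullptr, FALSE,
                   CREATE_SUSPENDED, nullptr, nullptr, &si, &pi);

    AssignProcessToJobObject(job, pi.hProcess);   // the limit now applies
    ResumeThread(pi.hThread);                     // let it run under the cap
    WaitForSingleObject(pi.hProcess, INFINITE);
}
```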
Use the Application Verifier (AppVerifier) tool from Microsoft.
In my case I need to simulate memory no longer being available, so I did the following in the tool:
Added my application
Unchecked Basic
Checked Low Resource Simulation
Changed TimeOut to 120000 - my application will run normally for 2 minutes before anything goes into effect.
Changed HeapAlloc to 100 - 100% chance of heap allocation error
Set Stacks to true - the stack will not be able to grow any larger
Save
Start my application
After 2 minutes my program could no longer allocate new memory and I was able to see how everything was handled.
Depending on your applications, it might be easier to limit the memory the language runtime uses. For example, with Java you can set the maximum heap size the JVM is allowed to allocate (via the -Xmx flag).
Otherwise, it is possible to set it once for each process with the Windows API:
SetProcessWorkingSetSize
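A minimal sketch of that call for the current process. Note, as a later answer points out, that this caps the working set (paged-in RAM), not total committed memory, and without the SetProcessWorkingSetSizeEx hard-limit flags the maximum is only a soft target:

```cpp
// Sketch: ask Windows to keep this process's working set between 1 MB and
// 64 MB; pages beyond the maximum become prime candidates for page-out.
#include <windows.h>

int main()
{
    SetProcessWorkingSetSize(GetCurrentProcess(),
                             1 * 1024 * 1024,     // minimum working set
                             64 * 1024 * 1024);   // maximum working set
}
```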
No way to do this that I know of, although I'm very curious to read if anyone has a good answer. I have been thinking about adding something like this to one of the apps my company builds, but have found no good way to do it.
The one thing I can think of (although not directly on point) is that I believe you can limit the total memory usage for a COM+ application in Windows. It would require the app to be written to run in COM+, of course, but it's the closest way I know of.
The working set stuff is good (Job Objects also control working sets), but that's not total memory usage, only real memory usage (what's paged in) at any one time. It may work for what you want, but as far as I know it doesn't limit total allocated memory.
Per-process limits
From an end-user perspective, there are some helpful answers (and comments) at the superuser question “Is it possible to limit the memory usage of a particular process on Windows”, including discussions of how to set recursive quota limits on any or all of:
CPU assignment (quantity, affinity, NUMA groups),
CPU usage,
RAM usage (both ‘committed’ and ‘working set’), and
network usage,
… mostly via the built-in Windows ‘Job Objects’ system (as mentioned in @Adam Mitz’s answer and @Stephen Martin’s comment above), using:
the registry (for persistence, when desired) or
free tools, such as the open-source Process Governor.
(Note: nested Job Objects may not have been available under all earlier versions of Windows, but the un-nested version appears to date back to Windows XP.)
Per-user limits
As far as overall per-user quotas:
??
It is possible that each user session is automatically assigned to a job group itself; if true, per-user limits could be applied to that job group. Update: nope; Job Objects can only be nested at the time they are created or associated with a specific process, and in some cases a child Job Object is allowed to ‘break free’ from its parent and become independent, so they can’t facilitate ‘per-user’ resource limits.
(NTFS does support per-user file system storage quotas, though.)
Per-system limits
Besides simple BIOS or ‘energy profile’ restrictions:
VM hypervisor or Kubernetes-style container resource limit controls may be the most straightforward (in terms of end-user understandability, at least) option.
Footnotes, regarding per-process and other resource quotas / QoS for non-Windows systems:
‘Classic’ Mac OS (including ‘classic’ applications running on 2000s-era versions of Mac OS X): per-application memory limits can easily be set within the ‘Memory’ section of the Finder ‘Get Info’ window for the target program; because the system used a cooperative multitasking concurrency model, per-process CPU limits were impossible.
BSD: ? (probably has some overlap with Linux and non-proprietary macOS methods?)
macOS (aka ‘Mac OS X’): no user-facing interface; system support includes, depending on version, the ‘Multiprocessing Services API’, Grand Central Dispatch, POSIX threads / pthread, ‘operation objects’, and possibly others.
Linux: ‘Resource Manager’/limits.conf, control groups/‘cgroups’, process priority/‘niceness’/renice, others?
IBM z/OS and other mainframe-style systems: resource controls / allocation were built in from nearly the beginning

How (and why?) do memory editors work

Memory editors such as Cheat Engine are able to read the memory of other processes and modify it.
How do they do it? (A code snippet would be interesting!) A process typically does not have the ability to access the memory of another one; the only cases I've heard of involve sub-processes/threading, but memory editors are typically not related to the target process in any way.
Why do they work? In what scenario is this ability useful aside from hacking other processes? Why wouldn't the operating system simply disallow unrelated processes from reading each other's memory?
On Windows, the function typically used to alter the memory of another process is called WriteProcessMemory:
https://learn.microsoft.com/en-in/windows/win32/api/memoryapi/nf-memoryapi-writeprocessmemory
If you search the Cheat Engine source code for WriteProcessMemory you can find it both in their Pascal code and the C kernel code. It needs PROCESS_VM_WRITE and PROCESS_VM_OPERATION access to the process, which basically means you need to run Cheat Engine as admin.
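Since the question asks for a snippet, here is a hedged sketch of that call sequence; the PID and target address are placeholders that a real tool would first discover by scanning memory (e.g. with ReadProcessMemory/VirtualQueryEx):

```cpp
// Sketch: overwrite a 4-byte value at a known address in another process.
#include <windows.h>
#include <cstdint>

bool PokeInt(DWORD pid, LPVOID address, std::int32_t newValue)
{
    HANDLE proc = OpenProcess(PROCESS_VM_WRITE | PROCESS_VM_OPERATION,
                              FALSE, pid);
    if (!proc) return false;   // typically means you need to run elevated

    SIZE_T written = 0;
    BOOL ok = WriteProcessMemory(proc, address, &newValue,
                                 sizeof(newValue), &written);
    CloseHandle(proc);
    return ok && written == sizeof(newValue);
}
```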
WriteProcessMemory is used any time you want to alter the runtime behavior of another process. There are legitimate uses, such as with Cheat Engine or ModOrganizer, and of course lots of illegitimate ones. It's worth mentioning that anti-virus software is typically trained to look for this API call (among others) so unless your application has been whitelisted it might get flagged because of it.
Operating systems typically expose system calls that allow a userspace program to inspect a target process's memory.
For instance, the Linux kernel supports the ptrace system call. This system call is primarily used by the well-known debugger gdb and by popular system-call tracing utilities such as strace.
The ptrace system call allows for the inspection of memory contents of the target process, code injection, register manipulation and much more. I would suggest this as a resource if you are interested in learning more.
On Linux, you can either run a binary from within gdb or attach to a running process. In the case of the latter, you need elevated privileges. There are other protections that try to limit what you can do with ptrace, such as the one mentioned here.
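A minimal sketch of that ptrace flow, assuming suitable privileges: attach to the target, read one machine word from its address space, then detach.

```cpp
// Sketch: read one word from another process's memory via ptrace (Linux).
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <cerrno>
#include <cstdio>

long PeekWord(pid_t pid, void* addr)
{
    if (ptrace(PTRACE_ATTACH, pid, nullptr, nullptr) == -1) return -1;
    waitpid(pid, nullptr, 0);                 // wait until the target stops

    errno = 0;                                // -1 can be a valid word, so
    long word = ptrace(PTRACE_PEEKDATA, pid, addr, nullptr);
    if (errno != 0) std::perror("PTRACE_PEEKDATA");

    ptrace(PTRACE_DETACH, pid, nullptr, nullptr);   // resume the target
    return word;
}
```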
On Windows you can achieve the same effect by using the following functions in order: OpenProcess, GetProcAddress, VirtualAllocEx, WriteProcessMemory and CreateRemoteThread. I would suggest this as a resource if you are interested in knowing more. You might need elevated privileges to do this on newer Windows versions.
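A hedged sketch of that five-call sequence, with the PID and DLL path as placeholders, purely to show how the calls fit together (this is the classic DLL-injection pattern):

```cpp
// Sketch: make another process load a DLL by starting a remote thread at
// LoadLibraryA with the DLL's path as its argument.
#include <windows.h>
#include <cstring>

bool InjectDll(DWORD pid, const char* dllPath)   // e.g. "C:\\myhook.dll"
{
    HANDLE proc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if (!proc) return false;

    // Reserve space in the target and copy the DLL path into it.
    SIZE_T len = std::strlen(dllPath) + 1;
    LPVOID remote = VirtualAllocEx(proc, nullptr, len,
                                   MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    WriteProcessMemory(proc, remote, dllPath, len, nullptr);

    // kernel32.dll is normally mapped at the same base in every process,
    // so LoadLibraryA's local address is valid in the target too.
    auto loadLib = reinterpret_cast<LPTHREAD_START_ROUTINE>(
        GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA"));

    HANDLE thread = CreateRemoteThread(proc, nullptr, 0, loadLib,
                                       remote, 0, nullptr);
    if (thread) { WaitForSingleObject(thread, INFINITE); CloseHandle(thread); }
    CloseHandle(proc);
    return thread != nullptr;
}
```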

Dynamic Linking ~ Limiting a DLL's system access

I know the question might seem a little vague but I will try to explain as clearly as I can.
In C++ there is a way to dynamically link code into your already-running program. I am thinking about creating my own plugin system (for learning/research purposes), but I'd like to restrict the plugins to specific system access for security purposes.
I would like to give the plugins limited access to, for example, disk writing, such that they can only call functions from an API I pass in from my application (and write through my predefined interface). Is there a way to enforce this kind of behaviour from the application side?
If not: are there other languages that support secure dynamically linked modules?
You should think about writing a plugin container (or sandbox), then coordinating everything through the container; also make sure to drop privileges you do not need inside the container process before running the plugin. Running the plugin in its own process means you can run the container as a dedicated user rather than the one who started the application, and by limiting that user you automatically limit the process. Having a dedicated user for a process is the most common and easiest approach, and it is also the only cross-platform way to limit a process; even on Windows you can use this method.
Limiting access to shared resources that the OS provides, like disk, RAM, or CPU, depends heavily on the OS, and you have not specified which one. While it is doable on most OSes, Linux is the prime choice because it is written with multi-seat and server use cases in mind. For example, on Linux you can use cgroups to limit CPU or RAM easily for each process; you then only need to apply them to your plugin-container process. There is blkio to control disk access, but you can also use the traditional quota mechanism in Linux to limit a per-process or per-user share of disk space.
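As a concrete illustration of the cgroups idea (this sketch assumes cgroup v2 mounted at /sys/fs/cgroup, sufficient privileges, and a made-up group name), the limits are just files you write to:

```cpp
// Sketch: cap a plugin-container process at 256 MB of RAM with cgroup v2.
#include <sys/types.h>
#include <filesystem>
#include <fstream>

void ConfinePid(pid_t pid)
{
    namespace fs = std::filesystem;
    fs::path group = "/sys/fs/cgroup/plugin_box";      // hypothetical group
    fs::create_directory(group);

    std::ofstream(group / "memory.max") << 256 * 1024 * 1024;  // 256 MB cap
    std::ofstream(group / "cgroup.procs") << pid;      // move the process in
}
```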
Supporting plugins is an involved process, and the best way to start is reading code that does some of this; Chromium's sandboxing is the best place I can suggest. It is very cleanly written and has nice documentation, and fortunately the code is not very big.
If you prefer less involvement with raw cgroups, there is an even easier mechanism for limiting resources: Docker is fairly new but abstracts away low-level OS constructs to contain applications easily, without the need to run them in virtual machines.
To block some calls, a first idea may be to hook the system calls which are forbidden and any other API calls you don't want. You can also hook the dynamic-linking calls to prevent your plugins from loading other DLLs, and hook the disk read/write APIs to block reads and writes; see the sketch after this answer.
Take a look at this; it may give you an idea of how you can forbid function calls.
You can also try to sandbox your plugins: look at some open-source sandboxes and understand how they work. It should help you.
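As a sketch of what hooking at the dynamic-linking level can look like, the example below patches a loaded plugin module's import address table (IAT) so that CreateFileW resolves to a stub that denies the call. This only shows the core mechanism; a real solution must also cover GetProcAddress lookups, direct syscalls, delay-loaded imports, and more:

```cpp
// Sketch: IAT patching - make a plugin's CreateFileW import fail.
#include <windows.h>
#include <cstring>

static HANDLE WINAPI BlockedCreateFileW(LPCWSTR, DWORD, DWORD,
        LPSECURITY_ATTRIBUTES, DWORD, DWORD, HANDLE)
{
    SetLastError(ERROR_ACCESS_DENIED);
    return INVALID_HANDLE_VALUE;               // deny the call outright
}

void PatchIat(HMODULE plugin)
{
    auto base = reinterpret_cast<BYTE*>(plugin);
    auto dos  = reinterpret_cast<IMAGE_DOS_HEADER*>(base);
    auto nt   = reinterpret_cast<IMAGE_NT_HEADERS*>(base + dos->e_lfanew);
    auto dir  = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];
    auto imp  = reinterpret_cast<IMAGE_IMPORT_DESCRIPTOR*>(base + dir.VirtualAddress);

    for (; imp->Name != 0; ++imp) {            // walk each imported DLL
        if (_stricmp(reinterpret_cast<char*>(base + imp->Name), "kernel32.dll"))
            continue;
        auto names = reinterpret_cast<IMAGE_THUNK_DATA*>(base + imp->OriginalFirstThunk);
        auto addrs = reinterpret_cast<IMAGE_THUNK_DATA*>(base + imp->FirstThunk);
        for (; names->u1.AddressOfData != 0; ++names, ++addrs) {
            if (names->u1.Ordinal & IMAGE_ORDINAL_FLAG) continue;  // by-ordinal
            auto byName = reinterpret_cast<IMAGE_IMPORT_BY_NAME*>(
                base + names->u1.AddressOfData);
            if (std::strcmp(byName->Name, "CreateFileW")) continue;
            DWORD old;                         // the IAT page is read-only
            VirtualProtect(&addrs->u1.Function, sizeof(void*), PAGE_READWRITE, &old);
            addrs->u1.Function = reinterpret_cast<ULONG_PTR>(&BlockedCreateFileW);
            VirtualProtect(&addrs->u1.Function, sizeof(void*), old, &old);
        }
    }
}
```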
In this case you really have to sandbox the environment in which the DLL runs. Building such a sandbox is not easy at all, and it is something you probably do not want to do yourself. System calls can be hidden in strings or generated through metaprogramming at execution time, so they are hard to detect just by analysing the binary. Luckily, people have already built solutions, for example Google's Native Client project, whose goal is to generally allow C++ code to be run safely in the browser. And if it is safe enough for a browser, it is probably safe enough for you, and it might work outside of the browser.

C++ distributed computing of an executable program

I was wondering if it is possible to run an executable program without adding to its source code, like running any game across several computers. When I was programming in C# I noticed the Process class, which lets you start or close any application or process. I was wondering if there is something similar in C++ which would let me transfer the processes of any executable file or game to other computers or servers, minimizing my computer's processor consumption.
Thanks.
Everything is possible, but this would require a huge amount of work and would almost certainly make your program painfully slower (I'm talking about a factor of millions or billions here). Essentially you would need to make sure every layer the program uses allows this, so you'd have to rewrite the OS to support it, and also quite a few of the libraries it uses.
Why? Let's assume you want to distribute actual threads over different machines. It would be slightly easier if it were actual processes, but I'd be surprised if many applications worked like this.
To begin with, you need to synchronize the memory, more specifically all non-thread-local storage, which often means 'all memory' because not all languages have a thread-aware memory model. Of course, this can be optimized, for example by buffering everything until you encounter an 'atomic' read or write, if your system has such a concept. Now imagine every thread blocking for a few seconds of synchronization whenever a thread has to be locked/unlocked or an atomic variable has to be read or written.
Next to that, there are the issues of managing devices. Assume you need a network connection: which machine will open it, and how will the IP address be chosen? To solve this seamlessly you probably need a virtual device shared amongst all platforms. This has to happen for network devices, filesystems, printers, monitors, and so on. And since you mention games: it would have to happen for the GPU as well; just imagine how this would affect performance merely in sending data to and from the GPU (hint: even 16x PCIe is often already a bottleneck).
In conclusion: this is not feasible. If you want a clustered application, you have to build it into the application from scratch.
I believe the closest thing you can do is MapReduce: it's a paradigm which will hopefully become part of the official Boost library soon. However, I don't think you would want to apply it to a real-time application like a game.
A related question may provide more answers: https://stackoverflow.com/questions/2168558/is-there-anything-like-hadoop-in-c
But as KillianDS pointed out, there is no automagical way to do this, nor does it seem like there is a feasible way to do it. So what is the exact problem that you're trying to solve?
The current state of research is into practical means of distributing the work of a process across multiple CPU cores on a single computer. In that case, the processors still share RAM. This is essential: RAM latencies are measured in nanoseconds.
In distributed computing, remote memory access can take tens if not hundreds of microseconds. Distributed algorithms explicitly take this into account. No amount of magic can make this disappear: light itself is slow.
The Plan 9 OS from AT&T Bell Labs supports distributed computing in the most seamless and transparent manner. Plan 9 was designed to take the Unix ideas of breaking jobs into interoperating small tasks performed by highly specialised utilities, "everything is a file", and the client/server model to a whole new level. It has the idea of a CPU server which performs computations for less powerful networked clients. Unfortunately the idea was too ambitious and way ahead of its time, and Plan 9 remained largely a research project. It is still being developed as open-source software, though.
MOSIX is another distributed OS project that provides a single process space over multiple machines and supports transparent process migration. It allows processes to become migratable without any changes to their source code, as all context saving and restoration is done by the OS kernel. There are several implementations of the MOSIX model - MOSIX2, openMosix (discontinued since 2008) and LinuxPMI (a continuation of the openMosix project).
ScaleMP is yet another commercial Single System Image (SSI) implementation, mainly targeted towards data processing and High-Performance Computing. It not only provides transparent migration between the nodes of a cluster but also provides emulated shared memory (known as Distributed Shared Memory). Basically it transforms a bunch of computers, connected via a very fast network, into a single big NUMA machine with many CPUs and a huge amount of memory.
None of these would allow you to launch a game on your PC and have it transparently migrated and executed somewhere on the network. Besides, most games are GPU intensive rather than CPU intensive - most games still don't even utilise the full computing power of multicore CPUs. We have a ScaleMP cluster here and it doesn't run Quake very well...

How to check the performance of a C++ API

My web server has a lot of dependencies for sending back data when it gets a request. I am testing one of these dependency applications within the web server. The application is decoupled from the main web server, and queries reach it only through the APIs it exposes.
My question is: if I wish to test these APIs in a multithreaded environment (C++ functions, on a machine with two quad-core processors), what is the best way to go about doing it?
Do I call each API in a separate thread or process? If so, how do I implement such code? From what I can figure out, I would be duplicating the functioning of the web server, but I can find no better way to measure the performance improvement given by that component alone.
It depends on whether your app deals with data that's shared when it is run in parallel processes, because that will most likely determine where the speed bottleneck awaits.
E.g., if the app accesses a database or disk files, you'll probably have to simulate multiple threads/processes querying the app in order to see how they get along with each other, i.e. whether they have to wait for each other while accessing the shared resource.
But if the app only does some internal calculation, all on its own, then it may scale well, as long as all its data fits into memory (i.e. no virtual-memory access, e.g. disk access, is necessary). Then you can test the performance of just one instance and focus on optimizing its speed.
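For the "simulate multiple threads querying the app" case, a minimal harness sketch might look like this, with callApi() standing in as a placeholder for the dependency's entry point:

```cpp
// Sketch: N threads hammer one API function and we report throughput.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

void callApi() { /* replace with a call into the real component */ }

int main()
{
    const int kThreads   = 8;        // e.g. one per core on 2x quad-core
    const int kCallsEach = 100000;
    std::atomic<long long> calls{0};

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int i = 0; i < kThreads; ++i)
        pool.emplace_back([&] {
            for (int j = 0; j < kCallsEach; ++j) { callApi(); ++calls; }
        });
    for (auto& t : pool) t.join();

    std::chrono::duration<double> secs =
        std::chrono::steady_clock::now() - start;
    std::printf("%lld calls in %.2f s => %.0f calls/s\n",
                calls.load(), secs.count(), calls.load() / secs.count());
}
```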
It also might help to state the OS you're planning to use. Mac OS X offers tools for performance testing and optimization that Windows and Linux may not, and vice versa.