How to check the performance of a C++ API

My web server has a lot of dependencies for sending back data when it gets a request. I am testing one of these dependency applications within the web server. The application is decoupled from the main web server, and queries reach it only through the APIs it exposes.
My question is: if I wish to check these APIs in a multithreaded environment (C++ functions on a machine with two quad-core processors), what is the best way to go about doing it?
Do I call each API in a separate thread or process? If so, how do I implement such code? From what I can figure out, I would be duplicating the functioning of the web server, but I can find no better way to measure the performance improvement contributed by that component alone.

It depends on whether your app deals with data that is shared when it runs in parallel processes, because that will most likely determine where the speed bottleneck lies.
E.g., if the app accesses a database or disk files, you'll probably have to simulate multiple threads/processes querying the app in order to see how they get along with each other, i.e. whether they have to wait for each other while accessing the shared resource.
But if the app only does some internal calculation, all on its own, then it may scale well as long as all its data fits into physical memory (i.e. no virtual memory access, such as swapping to disk, is necessary). Then you can test the performance of just one instance and focus on optimizing its speed.
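As a starting point, a multithreaded micro-benchmark can be a small sketch like the one below. This is only an illustration: run_query() is a placeholder for whichever exposed API call you want to measure, and the thread and iteration counts are arbitrary.

    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <vector>

    // Placeholder for whichever exposed API call you want to measure.
    void run_query() { /* call the component's real API here */ }

    int main()
    {
        const unsigned threads = std::thread::hardware_concurrency(); // 8 on two quad-cores
        const int iterations = 10000;                                 // calls per thread

        const auto start = std::chrono::steady_clock::now();

        std::vector<std::thread> pool;
        for (unsigned t = 0; t < threads; ++t)
            pool.emplace_back([iterations] {
                for (int i = 0; i < iterations; ++i)
                    run_query();                    // each thread hammers the API
            });
        for (auto &th : pool)
            th.join();

        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        std::cout << (threads * iterations) / elapsed.count() << " calls/second\n";
    }

Running it with 1, 2, 4, 8 threads and comparing the throughput numbers tells you how the component scales on its own, without dragging the whole web server into the test.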
It also might help to state the OS you're planning to use. Mac OS X offers tools for performance testing and optimization that Windows and Linux may not, and vice versa.


QSharedMemory with no Qt application

I have an application A, and I want to share some information with an application B.
Application A writes information every ~150 ms.
Application B reads the information at any time.
I searched and found QSharedMemory, which looks great, but application B will not be developed by my company, so I can't choose its programming language.
Is QSharedMemory a good idea?
How can I do that?
QSharedMemory is a thin wrapper around named and unnamed platform shared memory. When named, there's simply a file that the other application can memory-map and use from any programming language, as long as said language supports binary buffers.
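On the Qt side, a minimal sketch could look like the following; the segment key "appA_record" and the Record layout are assumptions for illustration only.

    #include <QSharedMemory>
    #include <QString>
    #include <cstring>

    // Plain-old-data layout so any language that can map the segment can read it.
    struct Record {
        qint64 timestampMs;
        double value;
    };

    bool publish(const Record &r)
    {
        static QSharedMemory shm(QStringLiteral("appA_record"));
        if (!shm.isAttached() && !shm.create(sizeof(Record)) && !shm.attach())
            return false;                      // could neither create nor attach

        shm.lock();                            // cross-process lock built into QSharedMemory
        std::memcpy(shm.data(), &r, sizeof(Record));
        shm.unlock();
        return true;
    }

Application A would call publish() every ~150 ms; B attaches to the same named segment with whatever shared-memory facility its language provides and interprets the same layout. One caveat: QSharedMemory's built-in lock() is a Qt-level convenience, so the non-Qt reader would need its own synchronization strategy, which is one more argument for the pipe approach below.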
I do wonder if it wouldn't be easier, though, if you used a pipe for IPC. QLocalSocket encapsulates that on Qt's end, and the other side simply uses a native pipe.
Shared memory makes sense only in certain scenarios, like, say, pushing images between applications when they do not change all that much from one update to the next, where the cost of pushing the entire image every time would be prohibitive compared with the small average bandwidth of the changes. The image doesn't need to be a visual image; it may be an industrial process image, etc.
In many cases, shared memory is a premature pseudo-optimization that makes things much harder than necessary, and it can, in the case of a multitude of communicating processes, become a pessimization: you pay the cost in virtual memory for each shared memory segment.
Sounds like you need to implement a simple server; using local sockets, it should be pretty fast in terms of bandwidth and easy to develop. The server will store the data from A and deliver it to B upon request.
Obviously, it won't work "with no application" in between. Whether you go for shared memory or a local socket, you will need some server code running at all times to service A and B. If A is running all the time, it can well be a part of it, but it can also be standalone.
It would be preferable to use a local socket, because the API for that is more portable across different programming languages; in that case A and B can be implemented in arbitrary languages and frameworks and communicate at the socket protocol level. With QSharedMemory it won't be as portable in your scenario.
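A rough sketch of that server piece, assuming a socket name "appA_state" and Qt 5 or later: application A (or a small standalone broker) hands out the latest record to B on each connection.

    #include <QByteArray>
    #include <QCoreApplication>
    #include <QLocalServer>
    #include <QLocalSocket>

    int main(int argc, char *argv[])
    {
        QCoreApplication app(argc, argv);

        QByteArray latest = "initial state";        // A would update this every ~150 ms
        QLocalServer server;
        QLocalServer::removeServer("appA_state");   // clean up a stale socket/pipe name
        server.listen("appA_state");

        QObject::connect(&server, &QLocalServer::newConnection, [&]() {
            QLocalSocket *client = server.nextPendingConnection();
            client->write(latest);                  // B connects and reads whenever it likes
            client->disconnectFromServer();         // flushes pending data, then closes
            QObject::connect(client, &QLocalSocket::disconnected,
                             client, &QLocalSocket::deleteLater);
        });

        return app.exec();
    }

On Windows the name maps to a named pipe (something like \\.\pipe\appA_state); on Unix it becomes a local socket file, so B can talk to it with its platform's native pipe or socket API from any language.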

Dynamic Linking ~ Limiting a DLL's system access

I know the question might seem a little vague but I will try to explain as clearly as I can.
In C++ there is a way to dynamically link code into your already running program. I am thinking about creating my own plugin system (for learning/research purposes), but I'd like to limit the plugins to specific system access for security purposes.
I would like to give the plugins limited access to, for example, disk writing, such that they can only call functions from an API I pass in from my application (and therefore write only through my predefined interface). Is there a way to enforce this kind of behaviour from the application side?
If not: are there other languages that support secure dynamically linked modules?
You should think of writing a plugin container (or sandbox) and coordinating everything through it; also make sure to drop any privileges you do not need inside the container process before running the plugin. Because the container is a separate process, you can also run it as a dedicated user rather than the one who started it; once you restrict that user, the process is automatically restricted as well. Having a dedicated user per process is the most common and easiest approach, and it is also the only cross-platform way to limit a process; even on Windows you can use this method to limit a process.
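A minimal POSIX-only sketch of that dedicated-user idea, assuming an unprivileged account named "plugin-runner" already exists; the container would call this before dlopen()ing any plugin:

    #include <grp.h>
    #include <pwd.h>
    #include <unistd.h>

    // Drop from root (or whoever started the container) to a dedicated, restricted user.
    bool drop_privileges(const char *user = "plugin-runner")
    {
        const passwd *pw = getpwnam(user);
        if (!pw)
            return false;                       // account does not exist

        // Order matters: supplementary groups first, then gid, then uid.
        if (setgroups(0, nullptr) != 0) return false;
        if (setgid(pw->pw_gid) != 0)    return false;
        if (setuid(pw->pw_uid) != 0)    return false;

        return true;                            // plugins loaded after this inherit the limits
    }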
Limiting access to the shared resources the OS provides, like disk, RAM, or CPU, depends heavily on the OS, and you have not specified which one. While it is doable on most OSes, Linux is the prime choice because it is written with multi-seat and server use cases in mind. For example, on Linux you can use cgroups to limit CPU or RAM easily for each process; you then only need to apply them to your plugin container process, as in the sketch below. There is blkio to control disk I/O, and you can still use the traditional quota mechanism in Linux to limit the per-process or per-user share of disk space.
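For example, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup and a pre-created group named "plugin-box" whose CPU and memory limits are already configured (both names are assumptions), the container process can move itself into the group before loading any plugin:

    #include <fstream>
    #include <unistd.h>

    // Join a pre-configured cgroup so the kernel enforces its CPU/RAM limits on this
    // process and on everything it dlopen()s afterwards.
    bool join_plugin_cgroup()
    {
        std::ofstream procs("/sys/fs/cgroup/plugin-box/cgroup.procs");
        if (!procs)
            return false;                       // group must exist and be writable by us

        procs << getpid() << std::endl;         // std::endl flushes, so errors surface here
        return procs.good();
    }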
Supporting plugins is an involved process, and the best way to start is reading code that already does some of this. Chromium's sandboxing is the best place I can suggest: it is very cleanly written and has nice documentation, and fortunately the code is not very big.
If you prefer less involvement with raw cgroups, there is an even easier mechanism for limiting resources: Docker is fairly new, but it abstracts away the low-level OS constructs to contain applications easily, without the need to run them in virtual machines.
To block some calls, a first idea may be to hook the forbidden system calls and any other API calls you don't want to allow. You can also hook the dynamic-linking calls to prevent your plugins from loading other DLLs, and hook the disk read/write APIs to block reads and writes (see the sketch after this answer for a user-space illustration).
Take a look at this; it may give you an idea of how you can forbid function calls.
You can also try to sandbox your plugins: look at some open-source sandboxes and understand how they work. It should help you.
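To make the hooking idea concrete, here is a rough Linux-only, user-space sketch (an assumption; on Windows you would need IAT patching or a detours-style library instead). The host injects this library via LD_PRELOAD, or links it ahead of libc, so any open() for writing outside /tmp/plugin/ fails, including calls made from dynamically loaded plugins.

    // guard.cpp -- build with: g++ -shared -fPIC -o libguard.so guard.cpp -ldl
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE                         // needed for RTLD_NEXT
    #endif
    #include <dlfcn.h>
    #include <fcntl.h>
    #include <cerrno>
    #include <cstdarg>
    #include <cstring>

    extern "C" int open(const char *path, int flags, ...)
    {
        using open_fn = int (*)(const char *, int, ...);
        static open_fn real_open =
            reinterpret_cast<open_fn>(dlsym(RTLD_NEXT, "open")); // the real libc open()

        // Deny write access outside the plugin's jail directory.
        if ((flags & (O_WRONLY | O_RDWR)) &&
            std::strncmp(path, "/tmp/plugin/", 12) != 0) {
            errno = EACCES;
            return -1;
        }

        mode_t mode = 0;
        if (flags & O_CREAT) {                  // forward the optional mode argument
            va_list ap;
            va_start(ap, flags);
            mode = static_cast<mode_t>(va_arg(ap, unsigned int));
            va_end(ap);
        }
        return real_open(path, flags, mode);
    }

A determined plugin can still bypass this (for instance by issuing raw system calls), which is why the other answers point at OS-level mechanisms such as cgroups, dedicated users, or a full sandbox like Chromium's.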
In this case you really have to sandbox the environment in which the DLL runs. Building such a sandbox is not easy at all, and it is something you probably do not want to do at all. System calls can be hidden in strings, or generated through metaprogramming at execution time, so they are hard to detect just by analysing the binary. Luckily, people have already built solutions. For example, Google's Native Client project has the goal of generally allowing C++ code to run safely in the browser. And if it is safe enough for a browser, it is probably safe enough for you, and it might work outside of the browser.

C++ distributed computing of an executable program

I was wondering if it is possible to run an executable program without adding to its source code, like running any game across several computers. When I was programming in C# I noticed the Process class, which lets you start or close any application or process. I was wondering if there was something similar in C++ which would let me transfer the processes of any executable file or game to other computers or servers, minimizing my computer's processor consumption.
Thanks.
Everything is possible, but this would require a huge amount of work and would almost certainly make your program painfully slower (I'm talking about a factor of millions or billions here). Essentially you would need to make sure every layer the program uses allows this, so you'd have to rewrite the OS to support it, and quite a few of the libraries it uses as well.
Why? Let's assume you want to distribute actual threads over different machines. It would be slightly easier if they were separate processes, but I'd be surprised if many applications worked like this.
To begin with, you need to synchronize the memory, more specifically all non-thread-local storage, which often means "all memory" because not all languages have a thread-aware memory model. Of course, this can be optimized, for example by buffering everything until you encounter an "atomic" read or write, if your system has such a concept. Now imagine every thread blocking for a few seconds of synchronization whenever a lock has to be acquired or released or an atomic variable has to be read or written.
Next to that there are the issues related to managing devices. Assume you need a network connection: which machine's device will open it, how will the IP address be chosen, and so on? To solve this seamlessly you would probably need a virtual device shared amongst all machines. The same has to happen for network devices, filesystems, printers, monitors, and so on. And since you mention games, this would have to happen for the GPU as well; just imagine the performance impact of merely sending data to and from the GPU over a network (hint: even 16x PCIe is often already a bottleneck).
In conclusion: this is not feasible. If you want a clustered application, you have to build clustering into it from scratch.
I believe the closest thing you can do is MapReduce: it's a paradigm which will hopefully become part of the official Boost libraries soon. However, I don't think you would want to apply it to a real-time application like a game.
A related question may provide more answers: https://stackoverflow.com/questions/2168558/is-there-anything-like-hadoop-in-c
But as KillianDS pointed out, there is no automagical way to do this, nor does there seem to be a feasible way to do it. So what is the exact problem that you're trying to solve?
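To give a feel for the map/reduce decomposition mentioned above, here is a single-machine sketch using std::async; score() and the data are placeholders. Shipping the chunks to other machines, with all the serialization, scheduling and fault tolerance that implies, is exactly the part a framework like Hadoop provides and that you would otherwise have to build yourself.

    #include <algorithm>
    #include <future>
    #include <iostream>
    #include <vector>

    double score(int item) { return item * 0.5; }        // placeholder "map" step

    double parallel_sum(const std::vector<int> &items, std::size_t workers = 8)
    {
        std::vector<std::future<double>> parts;
        const std::size_t chunk = (items.size() + workers - 1) / workers;

        for (std::size_t begin = 0; begin < items.size(); begin += chunk) {
            const std::size_t end = std::min(items.size(), begin + chunk);
            parts.push_back(std::async(std::launch::async, [&items, begin, end] {
                double sum = 0.0;                        // map each item, partially reduce
                for (std::size_t i = begin; i < end; ++i)
                    sum += score(items[i]);
                return sum;
            }));
        }

        double total = 0.0;                              // final reduce over the partial sums
        for (auto &part : parts)
            total += part.get();
        return total;
    }

    int main()
    {
        std::vector<int> data(1000000, 2);
        std::cout << parallel_sum(data) << '\n';
    }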
Current research focuses on practical means of distributing the work of a process across multiple CPU cores on a single computer. In that case the processors still share RAM. This is essential: RAM latencies are measured in nanoseconds.
In distributed computing, remote memory access can take tens if not hundreds of microseconds. Distributed algorithms explicitly take this into account. No amount of magic can make this disappear: light itself is slow.
The Plan 9 OS from AT&T Bell Labs supports distributed computing in the most seamless and transparent manner. Plan 9 was designed to take the Unix ideas of breaking jobs into interoperating small tasks, performed by highly specialised utilities, and "everything is a file", as well as the client/server model, to a whole new level. It has the idea of a CPU server which performs computations for less powerful networked clients. Unfortunately the idea was too ambitious and way ahead of its time, and Plan 9 remained largely a research project. It is still being developed as open source software, though.
MOSIX is another distributed OS project that provides a single process space over multiple machines and supports transparent process migration. It allows processes to become migratable without any changes to their source code as all context saving and restoration are done by the OS kernel. There are several implementations of the MOSIX model - MOSIX2, openMosix (discontinued since 2008) and LinuxPMI (continuation of the openMosix project).
ScaleMP is yet another commercial Single System Image (SSI) implementation, mainly targeted at data processing and High Performance Computing. It not only provides transparent migration between the nodes of a cluster but also provides emulated shared memory (known as Distributed Shared Memory). Basically it transforms a bunch of computers, connected via a very fast network, into a single big NUMA machine with many CPUs and a huge amount of memory.
None of these would allow you to launch a game on your PC and have it transparently migrated and executed somewhere on the network. Besides, most games are GPU-intensive rather than CPU-intensive; most games still do not even utilise the full computing power of multicore CPUs. We have a ScaleMP cluster here and it doesn't run Quake very well...

Is QtCore too 'heavy' for server-side use?

I am looking into writing a self-contained HTTP server using the Qt libraries, although many people have the view that QtCore is too bloated and that the overhead would be too large. Would a QtCore HTTP server manage a load of about 50 concurrent connections, using a thread pool?
The QtCore library is dynamically linked on Arch Linux and compiled for release with -O2 optimization.
There is no reason one could not write a server with Qt; however, there is really no way to tell beforehand whether the performance will be what you want (it depends on what your server does). Note that the optimal number of concurrent threads typically depends on the number of hardware cores, as well as the level of parallelism in your program. My suggestion would be to implement whatever you can in the least amount of time, and then tune the performance as needed afterwards. Even if the server cannot handle that many concurrent connections, you can use process-level parallelism (running multiple instances of your multithreaded server) until you have improved the performance.
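A rough sketch of the thread-pool approach, assuming the QtNetwork module, Qt 5 or later, and a fixed placeholder response (this is nowhere near a complete HTTP implementation):

    #include <QCoreApplication>
    #include <QHostAddress>
    #include <QRunnable>
    #include <QTcpServer>
    #include <QTcpSocket>
    #include <QThreadPool>

    // Handles one connection on a pool thread, using the blocking socket API.
    class Worker : public QRunnable {
    public:
        explicit Worker(qintptr fd) : fd_(fd) {}
        void run() override {
            QTcpSocket socket;                  // must be created in the pool thread
            socket.setSocketDescriptor(fd_);
            socket.waitForReadyRead(5000);      // read (and here simply ignore) the request
            socket.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok");
            socket.waitForBytesWritten(5000);
            socket.disconnectFromHost();
        }
    private:
        qintptr fd_;
    };

    class PooledServer : public QTcpServer {
    protected:
        void incomingConnection(qintptr fd) override {
            QThreadPool::globalInstance()->start(new Worker(fd)); // pool deletes it when done
        }
    };

    int main(int argc, char *argv[])
    {
        QCoreApplication app(argc, argv);
        PooledServer server;
        server.listen(QHostAddress::Any, 8080);
        return app.exec();
    }

QThreadPool sizes itself to the number of hardware cores by default, which matches the advice above about matching the thread count to the hardware.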
Your question is very broad and the answer depends on how you want to design your HTTP server. You could design it as a "single-threaded reactor", a "multi-threaded proactor", or a "half-sync/half-async" server.
Qt mostly uses thin wrapper classes over native or POSIX APIs and certainly brings some overhead of its own. 50 connections does not sound like many, but again the answer depends on what those connections will do: serve simple pages or perform heavy calculations?
I think the difficulty of the project lies in implementing a full HTTP server that is secure, reliable and scalable. You will have to do a lot of coding just to provide the life cycle of a simple Java servlet model; many interfaces and abstractions are required.
You can find open-source HTTP servers that are already tested. I would not even bother writing my own for production software.
50 connections isn't much.
But I hope you will add the QtNetwork module :-)

What is a Sandbox

When anti-viruses run some application in a virtual environment called a "sandbox", how does this sandbox precisely work from the Windows kernel point of view?
Is it hard to write such a sandbox?
At a high level, such sandboxes are kernel drivers which intercept calls to APIs and, using hooking, modify the results those APIs return. How an entire sandboxing solution works under the hood, though, could easily fill several books.
As for difficulty, it's probably one of the harder things you could ever possibly write. Not only do you have to provide hooks for nearly everything the operating system kernel provides, but you also have to prevent the application from accessing the memory space of other processes, and you have to have a way to save the state of the changes a program makes so that the program does not realize it's running inside a sandbox. You have to do all of this in kernel mode, which effectively limits you to using C and forces you to deal with different kinds of memory, e.g. paged pool and nonpaged pool. Oh, and you have to do all of this very fast, so that the user feels it's worthwhile to run applications inside your sandbox; 50+% performance hits won't be tolerated by most users.
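As a much smaller, user-mode illustration of the hooking idea (real antivirus sandboxes do this in kernel mode, and this sketch assumes the third-party MinHook library plus a made-up C:\Sandbox directory): detour CreateFileW so code running in the process cannot write outside the designated sandbox directory.

    #include <windows.h>
    #include <cwchar>
    #include "MinHook.h"                        // third-party hooking library (assumption)

    static decltype(&CreateFileW) realCreateFileW = nullptr;

    HANDLE WINAPI hookedCreateFileW(LPCWSTR name, DWORD access, DWORD share,
                                    LPSECURITY_ATTRIBUTES sa, DWORD disposition,
                                    DWORD flags, HANDLE templateFile)
    {
        // Deny writes outside the sandbox directory; pass everything else through.
        if ((access & GENERIC_WRITE) && name &&
            std::wcsncmp(name, L"C:\\Sandbox\\", 11) != 0) {
            SetLastError(ERROR_ACCESS_DENIED);
            return INVALID_HANDLE_VALUE;
        }
        return realCreateFileW(name, access, share, sa, disposition, flags, templateFile);
    }

    int wmain()
    {
        MH_Initialize();
        MH_CreateHook(reinterpret_cast<LPVOID>(&CreateFileW),
                      reinterpret_cast<LPVOID>(&hookedCreateFileW),
                      reinterpret_cast<LPVOID *>(&realCreateFileW));
        MH_EnableHook(MH_ALL_HOOKS);

        // ... load and exercise the untrusted code here ...

        MH_DisableHook(MH_ALL_HOOKS);
        MH_Uninitialize();
        return 0;
    }

This only constrains reasonably well-behaved code in your own process; the points above still stand that a real sandbox must cover the kernel interface itself, memory isolation, and saving/restoring the state the program changes.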