How to implement an online judge bot? (e.g. TopCoder, UVa, ACM-ICPC) - C++

There are many online judge sites which can verify your program by comparing its output to the correct answers. What's more, they also check the running time and memory usage to make sure that your program doesn't exceed the maximum limit.
So here is my question: since some online judge sites run several test programs at the same time, how do they achieve performance isolation? And how do they get the same running time for the same program when it is run at another time?
I think there are isolated environments such as VMware or a sandbox that always return the same result. Is this correct? And any ideas about how to implement these things?
Current Solution
I'm using Docker for sandboxing. It's dead simple and the safest way I have found.

Unfortunately, it is VERY hard to actually guarantee consistent running times, even on a dedicated machine, let alone in a VM. If you do want to implement something like this then, as was mentioned, you probably want a VM to keep all the code that will run sandboxed. Usually you don't want to service more than a couple of requests per core, so I would say that for algorithms that are memory- and CPU-bound you should use at most 2 VMs per physical core of the machine.
Although I can only speculate, why not try different numbers of VMs per core and see how it performs? Try to aim for about a 90% or higher rate of SLO compliance (or 98-99% if you really need to) and you should be just fine. Again, it's hard to tell you exactly what to do, as a lot of these things require just testing it out and seeing how it does.

This may be overly simplistic depending on your other requirements, which aren't in the question, but:
If the algorithms are CPU-bound, simply running them in an isolated VM (or FreeBSD jail, or...) and using the built-in operating system instrumentation would be the simplest.
(It could be as simple as using the 'time' command in Unix and setting memory limits with 'limit' in csh or 'ulimit' in sh/bash.)
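For what it's worth, the 'time' plus memory-limit idea can also be done directly from C++ on a POSIX system. The following is only a minimal sketch, not how any particular judge does it: run_limited and RunResult are names made up for this example, and it assumes Linux-style behaviour (ru_maxrss in kilobytes, RLIMIT_AS honoured). The child gets its limits via setrlimit before exec, and the parent reads the child's CPU time and peak memory from wait4.

```cpp
#include <sys/resource.h>  // setrlimit, struct rlimit, struct rusage
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>      // wait4, WIFEXITED, WEXITSTATUS
#include <unistd.h>        // fork, execvp, _exit

// Outcome of one limited run (hypothetical struct for this sketch).
struct RunResult {
    bool exited_normally;  // false if the child was killed by a signal
    int exit_code;         // meaningful only when exited_normally is true
    double cpu_seconds;    // user + system CPU time used by the child
    long max_rss_kb;       // peak resident set size (kilobytes on Linux)
};

// Runs argv[0] with a CPU-time limit (seconds) and an address-space limit (bytes).
RunResult run_limited(char* const argv[], rlim_t cpu_limit_s, rlim_t mem_limit_bytes) {
    RunResult result{false, -1, 0.0, 0};
    pid_t pid = fork();
    if (pid < 0) return result;  // fork failed

    if (pid == 0) {
        // Child: apply the limits, then replace this process with the target.
        struct rlimit cpu{cpu_limit_s, cpu_limit_s};
        struct rlimit mem{mem_limit_bytes, mem_limit_bytes};
        if (setrlimit(RLIMIT_CPU, &cpu) != 0 || setrlimit(RLIMIT_AS, &mem) != 0) {
            _exit(126);  // could not apply the limits
        }
        execvp(argv[0], argv);
        _exit(127);  // exec failed
    }

    // Parent: wait for the child and collect its resource usage.
    int status = 0;
    struct rusage usage{};
    if (wait4(pid, &status, 0, &usage) < 0) return result;

    result.exited_normally = WIFEXITED(status);
    if (result.exited_normally) result.exit_code = WEXITSTATUS(status);
    result.cpu_seconds = usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1e6 +
                         usage.ru_stime.tv_sec + usage.ru_stime.tv_usec / 1e6;
    result.max_rss_kb = usage.ru_maxrss;
    return result;
}
```

Note that RLIMIT_CPU only counts CPU time, so a program that just sleeps would never hit it; a real judge would presumably need a wall-clock timeout as well, plus proper sandboxing, neither of which this sketch attempts.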


Profiling a multiprocess system

I have a system that I need to profile.
It consists of tens of processes, mostly C++, some of them with several threads, that communicate with the network and with one another through various system calls.
I know there are performance bottlenecks sometimes, but no one has put in the time/effort to check where they are: they may be in userspace code, in inefficient use of syscalls, or in something else.
What would be the best way to approach profiling a system like this?
I have thought of the following strategy:
Manually logging the round-trip times of various code sequences (for example, processing an incoming packet or a CLI command) and seeing which process takes the most time. After that, profiling that process, fixing the problem and repeating.
This method seems sort of hacky and guess-worky. I don't like it.
How would you suggest approaching this problem?
Are there tools that would help me out (a multi-process profiler?)?
What I'm looking for is more of a strategy than just specific tools.
Should I profile every process separately and look for problems? If so, how do I approach this?
Do I try to isolate the problematic processes and go from there? If so, how do I isolate them?
Are there other options?
I don't think there is a single answer to this sort of question. Every type of issue has its own problems and solutions.
Generally, the first step is to figure out WHERE in the big system the time is spent. Is it CPU-bound or I/O-bound?
If the problem is CPU-bound, a system-wide profiling tool can be useful to determine where in the system the time is spent. The next question is of course whether that time is actually necessary or not, and no automated tool can tell the difference between a badly written piece of code that does a million completely useless processing steps and one that does a matrix multiplication with a million elements very efficiently: both take the same amount of CPU time, but one isn't actually achieving anything. However, knowing which program takes most of the time in a multi-program system can be a good starting point for figuring out IF that code is well written or can be improved.
If the system is I/O-bound, for example on network or disk I/O, then there are tools for analysing disk and network traffic that can help. But again, expecting the tool to tell you what packet response or disk access time you should expect is a different matter: whether you contact Google to search for "kerflerp" or contact your local web server a meter away will have a dramatic impact on what counts as a reasonable response time.
There are lots of other issues. Running two pieces of code in parallel that both use LOTS of memory can cause both to run slower than if they were run in sequence, because the high memory usage causes swapping, or because the OS isn't able to use spare memory for caching file I/O, for example.
On the other hand, two or more simple processes that use very little memory will benefit quite a lot from running in parallel on a multiprocessor system.
Adding logging to your applications so that you can see WHERE they are spending time is another method that works reasonably well, particularly if you KNOW which use-case takes the time.
If you have a use-case where you know "this should take no more than X seconds", running a regular pre- or post-commit test to check that the code is behaving as expected, and that no one has added a lot of code that slows it down, would also be a useful thing.
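As a rough illustration of the logging approach, something like the following scoped timer can be dropped around a code sequence (handling one packet, one CLI command) to record how long it took. This is just a sketch: ScopedTimer and its callback signature are made up for the example, and by default it simply writes to std::clog; in a real system you would presumably route it into whatever logging each process already uses.

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <string>
#include <utility>

// Measures the lifetime of a scope and reports it when the scope ends.
// (Hypothetical helper for this sketch, not part of any library.)
class ScopedTimer {
public:
    using Report = std::function<void(const std::string&, std::chrono::microseconds)>;

    explicit ScopedTimer(std::string label, Report report = log_to_clog)
        : label_(std::move(label)),
          report_(std::move(report)),
          start_(std::chrono::steady_clock::now()) {}

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

    ~ScopedTimer() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        report_(label_, std::chrono::duration_cast<std::chrono::microseconds>(elapsed));
    }

    // Default reporter: one line per measurement on the standard log stream.
    static void log_to_clog(const std::string& label, std::chrono::microseconds us) {
        std::clog << label << " took " << us.count() << " us\n";
    }

private:
    std::string label_;
    Report report_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage would be a single line at the top of the block you care about, e.g. `ScopedTimer t("handle_packet");`. Comparing these numbers across processes tells you which process to point a real profiler at.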

How to get an accurate performance measure?

In our project we're trying to automatically monitor the performance of test runs, to make sure that we don't have any significant changes in the performance of the program over time.
The problem is that there seems to be a consistent 5% variability in the measures we get. That is, on the same machine with the same program (no recompilation) running the same test we get values that differ by around 5% from run to run. This is way too much for what we want to use the numbers for.
We're already excluding setup costs from the timing considerations - that is, from within C++ code itself we're grabbing the time immediately before and after running the time-critical portions, rather than doing the timing of the whole program on the OS level. We are also doing averaging and outlier exclusion. The problem is that the variability looks to also have long-term trends, so we get tight clustering of times for replicates right after each other, but an hour or two later the times are substantially different. (Unfortunately, spreading the test out over several hours is not feasible.) The tests are also being run on a dedicated machine while "nothing else" is being run on it.
We're not quite sure where the timing variation is coming from, but it may have to do with the processor and the system: there are indications that the size of the variability depends on which machine the program is running on.
Does anyone have an idea where this variation is likely to be coming from, and how to remove it? The tests are running on a dedicated machine, so changing the operating system settings would be possible.
(As indicated by the tags, this is a C++ program running on an x86 Linux system, if that helps clarify things.)
Edit: Response to comments
Our current timing scheme is to use the clock() function from the C standard library, looking at the difference in the return value from before/after the functions we want to test.
The code we're testing should be deterministic, and shouldn't involve heavy IO.
I realize that the situation is a little hazy for a "silver bullet" answer. I guess I'm more looking for a "these are the factors that are important to consider, this is the order you probably should check them in, and here's how you go about checking each of them" type answer.
I'm amazed you got down to 5% variation.
Unless you can get rid of all the unnecessary things running on your system, you will be getting high variation. This is at the top level.
Your OS needs to be deterministic. You need to know what other tasks and threads are running and their durations. For example, there is the clock interrupt. Now, how many other functions are chained to this interrupt? Do these other functions vary?
Is your system isolated? For example, your measurements may vary if your system is connected to a network.
Does your program use external resources? For example a hard drive. If the program writes to the hard drive, the drive will not be deterministic. Files and parts of files may move on the drive. The drive may become fragmented. This fragmentation may cause variance in your measurements.
The operating system memory may get fragmented. Also, the executable's memory may become fragmented. Fragmentation may add to the variance.
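One thing worth checking on the measurement side: clock() reports processor time used by the program, not elapsed wall-clock time, so the two can tell different stories. Below is a minimal sketch of a harness that uses std::chrono::steady_clock instead, does a few warm-up runs, and reports the minimum and median over repeated runs (the minimum is usually the value least disturbed by whatever else the machine is doing). The names measure and TimingSummary are made up for this example, and it does not remove the variation, it only makes it easier to see.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Summary of repeated timings, in seconds (hypothetical struct for this sketch).
struct TimingSummary {
    double min_s;
    double median_s;
    double max_s;
};

// Calls fn() `warmup` times untimed, then `runs` times timed.
template <typename Fn>
TimingSummary measure(Fn&& fn, std::size_t runs, std::size_t warmup = 3) {
    using clock = std::chrono::steady_clock;
    if (runs == 0) return {0.0, 0.0, 0.0};

    // Warm-up runs let caches and branch predictors settle before measuring.
    for (std::size_t i = 0; i < warmup; ++i) fn();

    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        auto t0 = clock::now();
        fn();
        auto t1 = clock::now();
        samples.push_back(std::chrono::duration<double>(t1 - t0).count());
    }

    std::sort(samples.begin(), samples.end());
    return {samples.front(), samples[samples.size() / 2], samples.back()};
}
```

If the minimum is stable from hour to hour while the median drifts, the drift is probably interference from the rest of the system; if the minimum itself drifts, the machine is likely changing state between runs.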

Check the performance of an exe through code

I want to check the performance of an application (I have the exe, but no source code) by running it multiple times and possibly comparing the results. I didn't find much on the internet regarding this topic.
Since I have to do it with multiple inputs, I thought doing it through code (any language will do) would make things easier, as I may have to repeat the runs many times.
Can anyone help me get started?
Note: by performance I mean the memory usage, the CPU usage and possibly the time taken to run!
(I'm currently using perfmon on Windows with the necessary counters to check these parameters, and noting the values down manually.)
It strongly depends upon your operating system. On Linux, you could use the time utility, and strace might help you understand which system calls are used.
I have no idea of the equivalent on Windows systems.
I think that you could create a bash/batch script to call your program as many times as you need and with different inputs.
You could then have your script create a CSV file that contains the time it took to execute your program (start date and end date for example). CSV files are usually compatible with most spreadsheet programs like Excel, so I think that can make it easier for you to process your data, like creating means and standard deviations.
I don't have much to say regarding the memory and CPU usage, but if you are on Windows it wouldn't hurt to take a look at Process Explorer and Process Monitor (you can find them in this page). I think that they might help you in your task.
Finally, if you are on Linux, I think that you might be able to use grep with the top command to gather some statistics.
If you want repeatable results, Rational Purify (on Windows) or Valgrind (on Linux) are good tools. Valgrind runs your application on a synthetic CPU, and its Callgrind and Cachegrind tools count the instructions executed, which is far more repeatable than wall-clock timing, although it is not an exact cycle count of the real hardware.
In another post, a utility named timethis.exe was mentioned for measuring time under Windows. Maybe it is useful for your purposes.
I ended up automating what I had been noting down manually from perfmon:
that is, I used the PerformanceCounter class available in .NET to sample the particular application at regular intervals and generated a graph from those values.
Thanks :)
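For anyone who wants the script-plus-CSV idea from the answers above in a single portable program, here is a rough C++ sketch. The names time_command, write_csv and RunSample are made up for this example. It only captures wall-clock time per run (including the cost of launching the command through the shell), not memory or CPU usage, and the status value returned by std::system is implementation-defined.

```cpp
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <ostream>
#include <string>
#include <vector>

// One timed run of the external program (hypothetical struct for this sketch).
struct RunSample {
    int run;         // 1-based run number
    int status;      // raw return value of std::system (implementation-defined)
    double seconds;  // wall-clock time of the run
};

// Runs `command` through the system shell `runs` times and times each run.
std::vector<RunSample> time_command(const std::string& command, int runs) {
    std::vector<RunSample> samples;
    for (int i = 1; i <= runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        int status = std::system(command.c_str());
        auto stop = std::chrono::steady_clock::now();
        samples.push_back({i, status, std::chrono::duration<double>(stop - start).count()});
    }
    return samples;
}

// Writes the samples as CSV so a spreadsheet can compute means and deviations.
void write_csv(const std::vector<RunSample>& samples, std::ostream& out = std::cout) {
    out << "run,status,seconds\n";
    for (const RunSample& s : samples) {
        out << s.run << ',' << s.status << ',' << s.seconds << '\n';
    }
}
```

Something like `write_csv(time_command("myapp.exe input1.txt", 10));` (the command line here is just a placeholder) would print ten rows that can be redirected to a file and opened in Excel.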

How to find performance bottlenecks in C++ code

I have a server application written in C++ and deployed on CentOS. I didn't write any part of its code, but I need to optimize its performance. Its current performance is acceptable for a small number of users, but when the number of users increases, the server's performance decreases dramatically.
Are there any tools, techniques or best practices to find out the bottlenecks?
People typically use profilers to determine performance bottlenecks. Earlier SO questions asking for C++ profilers are here and here (depending on the operating system and compiler you use). For Linux, people typically use gprof, just because it comes with the system.
You'll start by building a performance test environment, if you don't have one. It needs:
Production-grade hardware. If you do not have the budget for this, you may as well give up.
Driver program(s) or hardware devices which throw production-like traffic at it at a high rate, as fast as or faster than production. Depending on your protocol and use-case this may be easy or difficult. One technique is to sample some requests from production and replay them, but this may give unrealistic results, as replayed requests will see higher cache hit rates.
Surrounding infrastructure as similar to production as you can reasonably get.
Then reproduce the problem as it exists in production. Once you've done that, use a profiler etc., as others have suggested.
This works, without fail.
I like Mike Dunlavey's answer above (so uptick his if you uptick mine).
I'd like to elaborate, for someone in a hurry, with two methods:
A quick way for gcc users to sample is gstack.
Self-inspection with SIGALRM combined with backtrace (driven by your own timer).
Just a few days ago I did something like this
# while true; do gstack $MYPID; sleep 2; done | logger $PARAMS
using PARAMS that go with my syslog routing rules, so that my app logs were intermixed with stacks (not a perfect line-up with the events).
The results were on the nose: they pointed me to an area that I never thought could be an issue at all but was my bottleneck, due to misuse of a reference in a tr1::bind.
In the alarm method, be careful what you do in the signal handler: don't use anything that allocates memory (no cout/cerr/boost), and use just simple formats (e.g. "%08X" with printf).
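The second method could look roughly like the sketch below. It assumes Linux with glibc (backtrace and backtrace_symbols_fd from execinfo.h); start_stack_sampler and stop_stack_sampler are names made up for this example. The handler sticks to write() and backtrace_symbols_fd, which do not go through malloc, and backtrace is called once up front because its first call may load libgcc dynamically, which can allocate.

```cpp
#include <cerrno>
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc)
#include <signal.h>    // sigaction, SIGALRM
#include <sys/time.h>  // setitimer, ITIMER_REAL
#include <unistd.h>    // write, STDERR_FILENO

static int g_sample_fd = STDERR_FILENO;  // where stack samples are written

// SIGALRM handler: dump the current call stack to g_sample_fd.
static void on_sample(int) {
    int saved_errno = errno;  // do not disturb the interrupted code's errno
    void* frames[64];
    int depth = backtrace(frames, 64);
    static const char header[] = "--- sample ---\n";
    ssize_t ignored = write(g_sample_fd, header, sizeof(header) - 1);
    (void)ignored;
    backtrace_symbols_fd(frames, depth, g_sample_fd);
    errno = saved_errno;
}

// Starts periodic stack sampling every interval_ms milliseconds (must be > 0).
bool start_stack_sampler(int fd, long interval_ms) {
    g_sample_fd = fd;

    // First call outside the handler, so any one-time loading happens here.
    void* warmup[1];
    backtrace(warmup, 1);

    struct sigaction action{};
    action.sa_handler = on_sample;
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_RESTART;  // restart interrupted system calls
    if (sigaction(SIGALRM, &action, nullptr) != 0) return false;

    struct itimerval timer{};
    timer.it_interval.tv_sec = interval_ms / 1000;
    timer.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    timer.it_value = timer.it_interval;
    return setitimer(ITIMER_REAL, &timer, nullptr) == 0;
}

// Stops the timer; the handler stays installed but no longer fires.
void stop_stack_sampler() {
    struct itimerval off{};
    setitimer(ITIMER_REAL, &off, nullptr);
}
```

In a multithreaded server the signal lands on whichever thread the kernel picks, so this shows one thread's stack per sample. Building with -rdynamic makes the symbol names in the output more useful; otherwise you get mostly raw addresses.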

Limiting CPU speed for profiling

I'm trying to optimize several bottlenecks in an application which is supposed to run on a really wide range of CPUs and architectures (some of them very close to embedded devices).
The results of my profiler, however, aren't really significant because of the speed of my CPU. Is there any way (preferably under Windows or Mac OS X) to limit the speed of my CPU for profiling purposes?
I've thought about using a virtual machine, but haven't found any with such functionality.
This works well and supports multicore.
It's a common misconception that you need to know how fast your code is to know where your performance problems are. That confuses problem-finding with problem-measurement.
This is the method I use.
If there is some wasteful logic in the program, it will be wasteful no matter what CPU runs it.
What you need to know is where it is. For measurement, you don't need to know how big it is; you only need to know that it is big enough to need to be fixed.
Usually there are a number of problems, of different sizes. You will probably find the biggest ones first, but no matter what order you fix them in, each one you fix will make it easier to find the remaining ones, because they will take a larger percentage.
I'm afraid I don't know any answer other than to start looking around in your area for old hardware. The CPU isn't the only variable that can (usually) affect things. L1/L2 cache size, memory bus speed, memory speed/latency, hard drive speed, etc. are all significant factors in many applications.
There was an app on recently. I don't remember the name of it, but it did some fun stuff with processors and Task Manager. It may have only been for managing which apps run on which CPU, but maybe it would give you this. I will try to look for it this afternoon, and respond back if I find it.
Many profilers (for example oprofile, but that's Linux-only) let you set the frequency at which they collect data. See if your profiler supports this, and if not, try a different one that does.
"I've thought about using a virtual machine, but haven't found any with such functionality."
Why do you need a VM that explicitly offers that functionality? Just limit the CPU usage of the VM in the host OS (where it is just a regular process). That should have exactly the same effect.
You can do this e.g. using cpulimit on Linux; similar solutions exist for MS Windows.