Online judge system - C++

I would like to create a web service with some problems and a judge system. The user should write a solution in C++ and then upload the source code. The code should then be compiled and tested to determine whether the solution is right (like TopCoder does when you submit your code). There should be time and memory limits for solutions to run. Are there any ready-to-use judge systems for creating such a service on Linux?

Do you mean an online judge system like acm.timus.ru? Then you could try the open source project on code.google.com; I guess that is what they use there. I studied this code a little bit a while ago: basically, it blocks certain system calls and measures the time and memory spent. For the details, have a look at the source.
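To illustrate the system-call-blocking idea, here is a generic sketch of the technique (not that project's actual code; ORIG_RAX makes it x86-64 specific): run the submission under ptrace and stop at every system call, so the judge can veto dangerous ones before they execute.
#include <sys/ptrace.h>
#include <sys/reg.h>   // ORIG_RAX (x86-64)
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    if (argc < 2) return 1;
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);  // let the parent trace us
        execvp(argv[1], &argv[1]);              // run the submitted binary
        _exit(127);
    }
    int status;
    waitpid(child, &status, 0);                 // child stops at exec
    while (!WIFEXITED(status) && !WIFSIGNALED(status)) {
        long nr = ptrace(PTRACE_PEEKUSER, child, sizeof(long) * ORIG_RAX, NULL);
        printf("syscall %ld\n", nr);            // inspect / deny here
        ptrace(PTRACE_SYSCALL, child, NULL, NULL); // run to the next syscall stop
        waitpid(child, &status, 0);
    }
    return 0;
}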

I used the DOMjudge system, which is ready for online competitions: http://domjudge.sourceforge.net/

For simple memory and time caps, you can inject the checking code into the submitted program automatically. Also, see Limit physical memory per process for memory caps.
For example, to check memory, add:
(edited for a Linux example)
#include <sys/types.h>
#include <sys/sysinfo.h>
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <signal.h>

long long getMemoryUsage()
{
    struct sysinfo memInfo;
    sysinfo(&memInfo);
    // total RAM minus free RAM = physical memory currently in use, in bytes
    return (long long)(memInfo.totalram - memInfo.freeram) * memInfo.mem_unit;
}
And start a new thread from their main() that runs a function like the following:
static const long long MAXMEMORY = 1024LL * 1024 * 1024; // memory cap (1 GB here)
static const int LIMITAPPLICATION = 100;                 // 100 * 100ms = 10 seconds
static int COUNTER = 0;

void TimeLimitToApplication()
{
    // LIMITAPPLICATION represents how many 100ms intervals must
    // pass until the time limit is reached.
    while (COUNTER < LIMITAPPLICATION)
    {
        usleep(100 * 1000); // sleep 100ms
        /* Include if using the code block below
        if (CheckProcessThreadsIsComplete()) break;
        */
        // check memory usage every 100ms
        if (getMemoryUsage() > MAXMEMORY)
            break;
        COUNTER++;
    }
    // time or memory limit reached: terminate the whole process group
    pid_t pgid = getpgid(0);
    killpg(pgid, SIGTERM);
}
You might also want to add another check to see if all your threads are completed:
(I only have a C#.NET sample of this, but I'm sure you could find something similar.)
public static bool CheckProcessThreadsIsComplete()
{
    Process[] allProcs = Process.GetProcesses();
    foreach (Process proc in allProcs)
    {
        ProcessThreadCollection myThreads = proc.Threads;
        foreach (ProcessThread pt in myThreads)
        {
            // Ignore the thread that was started to do the time and memory checks
            if (pt.Id == TIMELIMIT_AND_MEMORYCHECK_THREAD_ID) continue;
            // ProcessThread exposes System.Diagnostics.ThreadState: anything
            // other than Terminated or Wait means the thread is still running
            if (pt.ThreadState != System.Diagnostics.ThreadState.Terminated &&
                pt.ThreadState != System.Diagnostics.ThreadState.Wait)
                return false; // not all threads are completed
        }
    }
    return true; // all threads are completed
}
And add that to your checks in TimeLimitToApplication();
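The checks above poll from inside the process. If you control how the submission is launched, you can instead have the kernel enforce the caps with setrlimit(2) before exec'ing the compiled solution. A minimal sketch (the limit values and the helper name are placeholders):
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

int run_with_limits(const char *exe_path)
{
    pid_t pid = fork();
    if (pid == 0) {
        struct rlimit cpu = { 10, 10 };                   // 10 s of CPU time
        struct rlimit mem = { 256UL << 20, 256UL << 20 }; // 256 MB address space
        setrlimit(RLIMIT_CPU, &cpu);
        setrlimit(RLIMIT_AS, &mem);
        execl(exe_path, exe_path, (char *)NULL);
        _exit(127);                                       // exec failed
    }
    int status;
    waitpid(pid, &status, 0);
    // a SIGXCPU/SIGKILL termination here maps to "Time Limit Exceeded"
    return status;
}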

Related

Resource intensive multithreading killing other processes

I have some very resource-intensive code that I wrote so I could split the workload over multiple pthreads. While everything works and the computation is done faster, what I'm guessing happens is that other processes on that processor core get so slow that they crash after a few seconds of runtime.
I've already managed to kill random processes like Chrome tabs, the Cinnamon DE, or even the entire OS (kernel?).
Code: (It's late, and I'm too tired to write pseudocode or even comments...)
It's brute-force code, not so much for cracking as for testing passwords and/or CPU IPS.
Any ideas how to fix this, while still keeping as much performance as possible?
#include <thread>
#include <string>
#include <mutex>

static unsigned int NTHREADS = std::thread::hardware_concurrency();
static int THREAD_COMPLETE = -1;
static std::string PASSWORD = "";
static std::string CHARS;
static std::mutex MUTEX;

void *find_seq(void *arg_0)
{
    unsigned int _arg_0 = *((unsigned int *) arg_0);
    std::string *str_CURRENT = new std::string(" ");
    while (true)
    {
        // each thread tries every NTHREADS-th character for the last position
        for (unsigned int loop_0 = _arg_0; loop_0 < CHARS.length() - 1; loop_0 += NTHREADS)
        {
            str_CURRENT->back() = CHARS[loop_0];
            if (*str_CURRENT == PASSWORD)
            {
                THREAD_COMPLETE = _arg_0;
                return (void *) str_CURRENT;
            }
        }
        // carry over: advance the remaining positions like an odometer
        str_CURRENT->back() = CHARS.back();
        for (int loop_1 = (str_CURRENT->length() - 1); loop_1 >= 0; loop_1--)
        {
            if (str_CURRENT->at(loop_1) == CHARS.back())
            {
                if (loop_1 == 0)
                    str_CURRENT->assign(str_CURRENT->length() + 1, CHARS.front());
                else
                {
                    str_CURRENT->at(loop_1) = CHARS.front();
                    str_CURRENT->at(loop_1 - 1) = CHARS[CHARS.find(str_CURRENT->at(loop_1 - 1)) + 1];
                }
            }
        }
    }
}
Areuz,
Can you post the full code? I suspect the issue is the NTHREADS value. On my Ubuntu box, the value is 8, which is the number of cores listed in the /proc/cpuinfo file. Kicking off 8 'hot' threads on my box hogs 100% of the CPU. The kernel will time-slice for its own critical processes, but in general all other processes will starve for CPU.
Check out the max processor value in /proc/cpuinfo and go at least one lower than that. The CPUs are numbered 0-7 on my box, so 7 would be the max for me. The actual max might be 3, since 4 of my cores are hyper-threads. For completely CPU-bound processes, hyper-threading generally doesn't help.
Bottom line, don't hog all the CPU, it will destabilize the system.
--Matt
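A quick sketch of that suggestion in the question's own terms (a hypothetical helper, not Matt's code):
#include <thread>

// Use one thread fewer than the core count, falling back to 1 if
// hardware_concurrency() cannot tell (it returns 0 in that case).
static unsigned int worker_threads()
{
    unsigned int hw = std::thread::hardware_concurrency();
    return hw > 1 ? hw - 1 : 1;
}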
Thank you for your answers, and especially Matthew Fisher for his suggestion to try it on another system.
After some trial and error I decided to pull back my CPU overclock, which I had thought was stable (I'd had it for over a year), and that solved this weird behaviour. I guess I had never run such a CPU-intensive and (I'm guessing) efficient script (in regards to not throttling the full CPU by yielding) to see this happen.
As Matthew suggested, I need to come up with a better way than constantly checking the THREAD_COMPLETE variable in a while-true loop; one alternative is sketched below, but I hope to resolve the details in the comments.
Full and updated code for future visitors is here: pastebin.com/jbiYyKBu
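For illustration, one such alternative is to block on a condition variable instead of spinning; this is a minimal sketch (the helper names are mine, not from the pastebin code):
#include <condition_variable>
#include <mutex>

static int THREAD_COMPLETE = -1;   // mirrors the question's global
static std::mutex done_mutex;
static std::condition_variable done_cv;

// called by the worker that finds the password
void signal_done(int thread_id)
{
    {
        std::lock_guard<std::mutex> lock(done_mutex);
        THREAD_COMPLETE = thread_id;
    }
    done_cv.notify_one();
}

// called by main instead of a while(true) polling loop
int wait_for_result()
{
    std::unique_lock<std::mutex> lock(done_mutex);
    done_cv.wait(lock, [] { return THREAD_COMPLETE != -1; });
    return THREAD_COMPLETE;
}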

MongoDB C driver efficiency

I'm trying to write a program whose job it is to go into shared memory, retrieve a piece of information (a struct 56 bytes in size), then parse that struct lightly and write it to a database.
The catch is that it needs to do this several tens of thousands of times per second. I'm running this on a dedicated Ubuntu 14.04 server with dual Xeon X5677s and 32GB of RAM. Also, Mongo is running PerconaFT as its storage engine. I am making an uneducated guess here, but say the worst-case load scenario would be 100,000 writes per second.
Shared memory is populated by another program that's reading information from a real-time data stream, so I can't necessarily reproduce scenarios.
First... is Mongo the right choice for this task?
Next, this is the code that I've got right now. It starts with creating a list of collections (the list of items I want to record data points on is fixed) and then retrieving data from shared memory until it catches a signal.
// includes for this snippet; project-specific headers (shared memory API,
// symbol_data, LOG, etc.) are omitted
#include <mongoc.h>
#include <string>
#include <time.h>

int main()
{
    //these deal with navigating shared memory
    uint latestNotice=0, latestTurn=0, latestPQ=0, latestPQturn=0;
    symbol_data *notice = nullptr;
    bool done = false;
    //this is our 56 byte struct
    pq item;
    uint64_t today_at_midnight; //since epoch, in milliseconds
    {
        time_t seconds = time(NULL);
        today_at_midnight = seconds/(60*60*24);
        today_at_midnight *= (60*60*24*1000);
    }
    //connect to shared memory
    infob=info_block_init();
    uint32_t used_symbols = infob->used_symbols;
    getPosition(latestNotice, latestTurn);
    //fire up mongo
    mongoc_client_t *client = nullptr;
    mongoc_collection_t *collections[used_symbols];
    mongoc_collection_t *collection = nullptr;
    bson_error_t error;
    bson_t *doc = nullptr;
    mongoc_init();
    client = mongoc_client_new("mongodb://localhost:27017/");
    for(uint32_t symbol = 0; symbol < used_symbols; symbol++)
    {
        collections[symbol] = mongoc_client_get_collection(client, "scribe",
                                  (infob->sd+symbol)->getSymbol());
    }
    //this will be used later to sleep one millisecond
    struct timespec ts;
    ts.tv_sec=0;
    ts.tv_nsec=1000000;
    while(continue_running) //becomes false if a signal is caught
    {
        //check that new info is available in shared memory
        //sleep 1ms if it isn't
        while(!getNextNotice(&notice,latestNotice,latestTurn)) nanosleep(&ts, NULL);
        //get the new info
        done=notice->getNextItem(item, latestPQ, latestPQturn);
        if(done) continue;
        //just some simple array math to make sure we're on the right collection
        collection = collections[notice - infob->sd];
        //switch on the item type and parse it accordingly
        switch(item.tp)
        {
        case pq::pq_event:
            doc = BCON_NEW(
                //decided to use this instead of std::chrono
                "ts", BCON_DATE_TIME(today_at_midnight + item.ts),
                //item.pr is a uint64_t, and the guidance I've read on mongo
                //advises using strings for those values
                "pr", BCON_UTF8(std::to_string(item.pr).c_str()),
                "sz", BCON_INT32(item.sz),
                "vn", BCON_UTF8(venue_labels[item.vn]),
                "tp", BCON_UTF8("e")
            );
            if(!mongoc_collection_insert(collection, MONGOC_INSERT_NONE, doc, NULL, &error))
            {
                LOG(1,"Mongo Error: "<<error.message<<endl);
            }
            break;
        //obviously, several other cases go here, but they all look the
        //same, using BCON macros for their data.
        default:
            LOG(1,"got unknown type = "<<item.tp<<endl);
            break;
        }
    }
    //clean up once we break from the while()
    if(doc != nullptr) bson_destroy(doc);
    for(uint32_t symbol = 0; symbol < used_symbols; symbol++)
    {
        collection = collections[symbol];
        mongoc_collection_destroy(collection);
    }
    if(client != nullptr) mongoc_client_destroy(client);
    mongoc_cleanup();
    return 0;
}
My second question is: is this the fastest way to do this? The retrieval from shared memory isn't perfect, but this program is falling way behind its supply of data, far more so than I can tolerate. So I'm looking for obvious mistakes with regard to efficiency or technique when speed is the goal.
Thanks in advance. =)
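One technique worth sketching for this kind of write rate is batching: instead of one round trip per document, the C driver's bulk API groups many inserts into a single command. This is a hypothetical sketch (flush_batch and BATCH_SIZE are illustrative names, not part of the program above):
#include <mongoc.h>
#include <stdio.h>

#define BATCH_SIZE 1000

// Collect up to BATCH_SIZE documents, then send them in one bulk round trip.
static void flush_batch(mongoc_collection_t *collection, bson_t **docs, size_t n)
{
    bson_error_t error;
    bson_t reply;
    mongoc_bulk_operation_t *bulk =
        mongoc_collection_create_bulk_operation(collection, false, NULL);
    for (size_t i = 0; i < n; i++) {
        mongoc_bulk_operation_insert(bulk, docs[i]);
        bson_destroy(docs[i]);
    }
    if (!mongoc_bulk_operation_execute(bulk, &reply, &error))
        fprintf(stderr, "Bulk error: %s\n", error.message);
    bson_destroy(&reply);
    mongoc_bulk_operation_destroy(bulk);
}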

What could cause a mutex to misbehave?

I've been busy the last couple of months debugging a rare crash caused somewhere within a very large proprietary C++ image processing library, compiled with GCC 4.7.2 for an ARM Cortex-A9 Linux target. Since a common symptom was glibc complaining about heap corruption, the first step was to employ a heap corruption checker to catch oob memory writes. I used the technique described in https://stackoverflow.com/a/17850402/3779334 to divert all calls to free/malloc to my own function, padding every allocated chunk of memory with some amount of known data to catch out-of-bounds writes - but found nothing, even when padding with as much as 1 KB before and after every single allocated block (there are hundreds of thousands of allocated blocks due to intensive use of STL containers, so I can't enlarge the padding further, plus I assume any write more than 1KB out of bounds would eventually trigger a segfault anyway). This bounds checker has found other problems in the past so I don't doubt its functionality.
(Before anyone says 'Valgrind', yes, I have tried that too with no results either.)
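For readers who don't want to follow the link, the wrapper technique looks roughly like this (a sketch with illustrative canary sizes and patterns, not my checker's actual code; build with -Wl,--wrap=malloc -Wl,--wrap=free so the linker redirects calls to the __wrap_ versions):
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAD 64   /* canary bytes on each side of every block */
#define HDR 16   /* header holding the block size (keeps alignment) */
static const uint8_t PATTERN = 0xAB;

extern "C" void *__real_malloc(size_t size);
extern "C" void  __real_free(void *ptr);

extern "C" void *__wrap_malloc(size_t size)
{
    uint8_t *p = (uint8_t *)__real_malloc(HDR + 2 * PAD + size);
    *(size_t *)p = size;                        /* remember the size */
    memset(p + HDR, PATTERN, PAD);              /* front canary */
    memset(p + HDR + PAD + size, PATTERN, PAD); /* back canary */
    return p + HDR + PAD;
}

extern "C" void __wrap_free(void *ptr)
{
    if (!ptr) return;
    uint8_t *p = (uint8_t *)ptr - PAD - HDR;
    size_t size = *(size_t *)p;
    for (size_t i = 0; i < PAD; i++)            /* verify both canaries */
        if (p[HDR + i] != PATTERN || p[HDR + PAD + size + i] != PATTERN)
            abort();                            /* out-of-bounds write caught */
    __real_free(p);
}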
Now, my memory bounds checker also has a feature where it prepends every allocated block with a data struct. These structs are all linked in one long linked list, to allow me to occasionally go over all allocations and test memory integrity. For some reason, even though all manipulations of this list are mutex protected, the list was getting corrupted. When investigating the issue, it began to seem like the mutex itself was occasionally failing to do its job. Here is the pseudocode:
pthread_mutex_t alloc_mutex;
static bool boolmutex; // set to false during init. volatile has no effect.

void malloc_wrapper() {
    // ...
    pthread_mutex_lock(&alloc_mutex);
    if (boolmutex) {
        printf("mutex misbehaving\n");
        __THROW_ERROR__; // this happens!
    }
    boolmutex = true;
    // manipulate linked list here
    boolmutex = false;
    pthread_mutex_unlock(&alloc_mutex);
    // ...
}
The code commented with "this happens!" is occasionally reached, even though this seems impossible. My first theory was that the mutex data structure was being overwritten. I placed the mutex within a struct, with large arrays before and after it, but when this problem occurred the arrays were untouched so nothing seems to be overwritten.
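The overwrite test looked roughly like this (a sketch; the guard sizes are illustrative): fill both arrays with a known pattern at init, then verify them whenever the impossible branch is hit.
#include <pthread.h>
#include <stdint.h>

struct GuardedMutex {
    uint8_t front_guard[4096];  // filled with a known pattern at init
    pthread_mutex_t mutex;
    uint8_t back_guard[4096];   // checked for modification on failure
};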
So.. What kind of corruption could possibly cause this to happen, and how would I find and fix the cause?
A few more notes. The test program uses 3-4 threads for processing. Running with fewer threads seems to make the corruptions less common, but they don't disappear. The test runs for about 20 seconds each time and completes successfully in the vast majority of cases (I can have 10 units repeating the test, with the first failure occurring after 5 minutes to several hours). When the problem occurs it is quite late in the test (say, 15 seconds in), so this isn't a bad initialization issue. The memory bounds checker never catches actual out-of-bounds writes, but glibc still occasionally fails with a corrupted heap error (can such an error be caused by something other than an oob write?). Each failure generates a core dump with plenty of trace information; there is no pattern I can see in these dumps, no particular section of code that shows up more than others. This problem seems very specific to a particular family of algorithms and does not happen in other algorithms, so I'm quite certain this isn't a sporadic hardware or memory error. I have done many more tests to check for oob heap accesses, which I won't list here to keep this post from getting any longer.
Thanks in advance for any help!
Thanks to all commenters. I had tried nearly all the suggestions with no results when I finally decided to write a simple memory allocation stress test - one that would run a thread on each of the CPU cores (my unit is a Freescale i.MX6 quad-core SoC), each allocating and freeing memory in random order at high speed. The test crashed with a glibc memory corruption error within minutes, or a few hours at most.
Updating the kernel from 3.0.35 to 3.0.101 solved the problem; both the stress test and the image processing algorithm now run overnight without failing. The problem does not reproduce on Intel machines with the same kernel version, so the problem is specific either to ARM in general or perhaps to some patch Freescale included with the specific BSP version that included kernel 3.0.35.
For those curious, attached is the stress test source code. Set NUM_THREADS to the number of CPU cores and build with:
<cross-compiler-prefix>g++ -O3 test_heap.cpp -lpthread -o test_heap
I hope this information helps someone. Cheers :)
// Multithreaded heap stress test. By Itay Chamiel 20151012.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <assert.h>
#include <pthread.h>
#include <sys/time.h>

#define NUM_THREADS 4 // set to number of CPU cores
#define ALIVE_INDICATOR NUM_THREADS

// Each thread constantly allocates and frees memory. In each iteration of the infinite loop, decide at random whether to
// allocate or free a block of memory. A list of 500-1000 allocated blocks is maintained by each thread. When memory is allocated
// it is added to this list; when freeing, a random block is selected from this list, freed and removed from the list.
void* thr(void* arg) {
    int* alive_flag = (int*)arg;
    int thread_id = *alive_flag; // this is a number between 0 and (NUM_THREADS-1) given by main()
    int cnt = 0;
    timeval t_pre, t_post;
    gettimeofday(&t_pre, NULL);

    const int ALLOCATE=1, FREE=0;
    const unsigned int MINSIZE=500, MAXSIZE=1000;
    const int MAX_ALLOC=10000;
    char* membufs[MAXSIZE];
    unsigned int membufs_size = 0;
    int num_allocs = 0, num_frees = 0;

    while(1)
    {
        int action;
        // Decide whether to allocate or free a memory block.
        // if we have less than MINSIZE buffers, allocate.
        if (membufs_size < MINSIZE) action = ALLOCATE;
        // if we have MAXSIZE, free.
        else if (membufs_size >= MAXSIZE) action = FREE;
        // else, decide randomly.
        else {
            action = ((rand() & 0x1)? ALLOCATE : FREE);
        }

        if (action == ALLOCATE) {
            // choose size to allocate, from 1 to MAX_ALLOC bytes
            size_t size = (rand() % MAX_ALLOC) + 1;
            // allocate and fill memory
            char* buf = (char*)malloc(size);
            memset(buf, 0x77, size);
            // add buffer to list
            membufs[membufs_size] = buf;
            membufs_size++;
            assert(membufs_size <= MAXSIZE);
            num_allocs++;
        }
        else { // action == FREE
            // choose a random buffer to free
            size_t pos = rand() % membufs_size;
            assert(pos < membufs_size);
            // free and remove from list by replacing entry with last member
            free(membufs[pos]);
            membufs[pos] = membufs[membufs_size-1];
            membufs_size--;
            num_frees++;
        }

        // once in 10 seconds print a status update
        gettimeofday(&t_post, NULL);
        if (t_post.tv_sec - t_pre.tv_sec >= 10) {
            printf("Thread %d [%d] - %d allocs %d frees. Alloced blocks %u.\n", thread_id, cnt++, num_allocs, num_frees, membufs_size);
            gettimeofday(&t_pre, NULL);
        }

        // indicate alive to main thread
        *alive_flag = ALIVE_INDICATOR;
    }
    return NULL;
}

int main()
{
    int alive_flag[NUM_THREADS];
    printf("Memory allocation stress test running on %d threads.\n", NUM_THREADS);

    // start a thread for each core
    for (int i=0; i<NUM_THREADS; i++) {
        alive_flag[i] = i; // tell each thread its ID.
        pthread_t th;
        int ret = pthread_create(&th, NULL, thr, &alive_flag[i]);
        assert(ret == 0);
    }

    while(1) {
        sleep(10);
        // check that all threads are alive
        bool ok = true;
        for (int i=0; i<NUM_THREADS; i++) {
            if (alive_flag[i] != ALIVE_INDICATOR)
            {
                printf("Thread %d is not responding\n", i);
                ok = false;
            }
        }
        assert(ok);
        for (int i=0; i<NUM_THREADS; i++)
            alive_flag[i] = 0;
    }
    return 0;
}

Windows 7 memory management - how to prevent concurrent threads from blocking

I'm working on a program consisting of two concurrent threads. One (here "Clock") performs some computation on a regular basis (10 Hz) and is quite memory-intensive. The other one (here "hugeList") uses even more RAM but is not as time-critical as the first one, so I decided to reduce its priority to THREAD_PRIORITY_LOWEST. Yet when that thread frees most of the memory it has used, the critical one fails to keep its timing.
I was able to condense down the problem to this bit of code (make sure optimizations are turned off!):
While Clock tries to keep a 10 Hz timing, the hugeList thread allocates and frees more and more memory, not organized in any sort of chunks.
#include "stdafx.h"
#include <stdio.h>
#include <forward_list>
#include <time.h>
#include <windows.h>
#include <vector>
void wait_ms(double _ms)
{
clock_t endwait;
endwait = clock () + _ms * CLOCKS_PER_SEC/1000;
while (clock () < endwait) {} // active wait
}
void hugeList(void)
{
SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_LOWEST);
unsigned int loglimit = 3;
unsigned int limit = 1000;
while(true)
{
for(signed int cnt=loglimit; cnt>0; cnt--)
{
printf(" Countdown %d...\n", cnt);
wait_ms(1000.0);
}
printf(" Filling list...\n");
std::forward_list<double> list;
for(unsigned int cnt=0; cnt<limit; cnt++)
list.push_front(42.0);
loglimit++;
limit *= 10;
printf(" Clearing list...\n");
while(!list.empty())
list.pop_front();
}
}
void Clock()
{
clock_t start = clock()-CLOCKS_PER_SEC*100/1000;
while(true)
{
std::vector<double> dummyData(100000, 42.0); // just get some memory
printf("delta: %d ms\n", (clock()-start)*1000/CLOCKS_PER_SEC);
start = clock();
wait_ms(100.0);
}
}
int main()
{
DWORD dwThreadId;
if (CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)&Clock, (LPVOID) NULL, 0, &dwThreadId) == NULL)
printf("Thread could not be created");
if (CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)&hugeList, (LPVOID) NULL, 0, &dwThreadId) == NULL)
printf("Thread could not be created");
while(true) {;}
return 0;
}
First of all, I noticed that allocating memory for the linked list is way faster than freeing it.
On my machine (Windows 7), at around the 4th iteration of the "hugeList" method, the Clock thread gets significantly disturbed (up to 200ms). The effect disappears without the dummyData vector "asking" for some memory in the Clock thread.
So,
Is there any way of increasing the priority of memory allocation for the Clock-Thread in Win7?
Or do I have to split both operations onto two contexts (processes)?
Note that my original code uses some communication via shared variables which would require for some kind of IPC if I chose the second option.
Note that my original code gets stuck for about 1 sec when the equivalent of the "hugeList" method clears a boost::unordered_map and enters ntdll.dll!RtlInitializeCriticalSection many, many times.
(Observed with the Sysinternals Process Explorer.)
Note that the effects observed are not due to swapping; I'm using 1.4GB of my 16GB (64-bit Win7).
edit:
just wanted to let you know that up to now I haven't been able to solve my issue. Splitting both parts of the code into two processes does not seem to be an option, since my time is rather limited and I've never worked with processes so far; I'm afraid I won't be able to get to a running version in time.
However, I managed to reduce the effects by reducing the number of memory deallocations made by the non-critical thread. This was achieved by using a fast pooling memory allocator (like the one provided in the Boost library).
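As an illustration of that change (a sketch, not my production code; the typedef and function are hypothetical), the Boost pool allocator can be dropped into the container declaration so that freeing list nodes returns them to a per-type pool instead of the shared process heap:
#include <boost/pool/pool_alloc.hpp>
#include <forward_list>

// forward_list whose nodes are allocated from a Boost memory pool
typedef std::forward_list<double, boost::fast_pool_allocator<double>> PooledList;

void fillAndClear(unsigned int limit)
{
    PooledList list;
    for (unsigned int cnt = 0; cnt < limit; cnt++)
        list.push_front(42.0);
    // pop_front() now hands nodes back to the pool, not the heap,
    // so it avoids contention on the process-wide allocator lock
    while (!list.empty())
        list.pop_front();
}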
There does not seem to be a way of explicitly creating certain objects (like e.g. the huge forward list in my example) on some sort of thread-private heap that would not require synchronisation.
For further reading:
http://bmagic.sourceforge.net/memalloc.html
Do threads have a distinct heap?
Memory Allocation/Deallocation Bottleneck?
http://software.intel.com/en-us/articles/avoiding-heap-contention-among-threads
http://www.boost.org/doc/libs/1_55_0/libs/pool/doc/html/boost_pool/pool/introduction.html
Replacing std::forward_list with std::list, I ran your code on a Core i7 / 4GB machine until 2GB were consumed. No disturbances at all. (In a debug build.)
P.S.
Yes, the release build recreates the issue. I replaced the forward list with an array:
double* p = new double[limit];
for(unsigned int cnt=0; cnt<limit; cnt++)
    p[cnt] = 42.0;
and
for(unsigned int cnt=0; cnt<limit; cnt++)
    p[cnt] = -1;
delete [] p;
and it does not recreate the issue then.
It seems the thread scheduler is punishing the process for asking for a lot of small memory chunks.

Determine Process Info Programmatically in Darwin/OSX

I have a class with the following member functions:
/// caller pid
virtual pid_t Pid() const = 0;
/// physical memory size in KB
virtual uint64_t Size() const = 0;
/// resident memory for this process
virtual uint64_t Rss() const = 0;
/// cpu used by this process
virtual double PercentCpu() const = 0;
/// memory used by this process
virtual double PercentMemory() const = 0;
/// number of threads in this process
virtual int32_t Lwps() const = 0;
This class's duty is to return information about the calling process. Physical memory size can easily be determined by a sysctl call, and pid is trivial, but the remaining calls have eluded me, aside from invoking popen on ps or top and parsing the output, which isn't acceptable. Any help would be greatly appreciated.
Requirements:
Compiles on g++ 4.0
No obj-c
OSX 10.5
Process info comes from proc_pidinfo:
cristi:~ diciu$ grep proc_pidinfo /usr/include/libproc.h
int proc_pidinfo(int pid, int flavor, uint64_t arg, void *buffer, int buffersize);
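For example, resident size and thread count for the calling process can be read via the PROC_PIDTASKINFO flavor (a sketch; the struct fields come from sys/proc_info.h):
#include <libproc.h>
#include <unistd.h>
#include <stdio.h>

int main()
{
    struct proc_taskinfo ti;
    // returns the number of bytes filled in, or <= 0 on failure
    int ret = proc_pidinfo(getpid(), PROC_PIDTASKINFO, 0, &ti, sizeof(ti));
    if (ret < (int)sizeof(ti)) return 1;
    printf("rss bytes: %llu\n", (unsigned long long)ti.pti_resident_size);
    printf("threads:   %d\n", (int)ti.pti_threadnum);
    return 0;
}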
cpu load comes from host_statistics:
cristi:~ diciu$ grep -r host_statistics /usr/include/
/usr/include/mach/host_info.h:/* host_statistics() */
/usr/include/mach/mach_host.defs:routine host_statistics(
/usr/include/mach/mach_host.h:/* Routine host_statistics */
/usr/include/mach/mach_host.h:kern_return_t host_statistics
For more details, check out the sources for top and lsof; they are open source (you need to register as an Apple developer, but that's free of charge):
https://opensource.apple.com/source/top/top-111.20.1/libtop.c.auto.html
Later edit: all these interfaces are version-specific, so you need to take that into account when writing production code (libproc.h):
/*
* This header file contains private interfaces to obtain process information.
* These interfaces are subject to change in future releases.
*/
Since you say no Objective-C we'll rule out most of the MacOS frameworks.
You can get CPU time using getrusage(), which gives the total amount of User and System CPU time charged to your process. To get a CPU percentage you'd need to snapshot the getrusage values once per second (or however granular you want to be).
#include <sys/resource.h>

struct rusage r_usage;
if (getrusage(RUSAGE_SELF, &r_usage)) {
    /* ... error handling ... */
}
printf("Total User CPU = %ld.%06ld\n",
       r_usage.ru_utime.tv_sec,
       r_usage.ru_utime.tv_usec);
printf("Total System CPU = %ld.%06ld\n",
       r_usage.ru_stime.tv_sec,
       r_usage.ru_stime.tv_usec);
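To turn those totals into a percentage, convert each snapshot to seconds and divide the CPU-time delta by the wall-clock delta. A sketch (the helper name is mine, not part of any API):
#include <sys/resource.h>

// Convert a rusage snapshot (user + system time) to seconds.
static double cpu_seconds(const struct rusage *r)
{
    return (double)(r->ru_utime.tv_sec + r->ru_stime.tv_sec)
         + (double)(r->ru_utime.tv_usec + r->ru_stime.tv_usec) / 1e6;
}

// Sample getrusage() once per second, then:
// percent_cpu = 100.0 * (cpu_seconds(&now) - cpu_seconds(&prev)) / wall_delta_seconds;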
There is an RSS field in the getrusage structure, but it appears to always be zero in MacOS X 10.5. Michael Knight wrote a blog post several years ago about how to determine the RSS.
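I don't have that post's code handy, but the usual Mach route to the resident size looks roughly like this (a sketch; task_info() with TASK_BASIC_INFO is a documented Mach call, though it may not be exactly what the post describes):
#include <mach/mach.h>
#include <stdio.h>

int main()
{
    struct task_basic_info info;
    mach_msg_type_number_t count = TASK_BASIC_INFO_COUNT;
    // query the kernel for this task's basic accounting info
    kern_return_t kr = task_info(mach_task_self(), TASK_BASIC_INFO,
                                 (task_info_t)&info, &count);
    if (kr != KERN_SUCCESS) return 1;
    printf("resident size: %lu bytes\n", (unsigned long)info.resident_size);
    return 0;
}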
You can use the code below to get process info on macOS (GetBSDProcessList() comes from Apple's "GetBSDProcessList" sample code and allocates the result array itself via sysctl):
void IsInBSDProcessList(char *name) {
    assert(name != NULL);
    kinfo_proc *result = NULL;
    size_t count = 0;
    if (GetBSDProcessList(&result, &count) == 0) {
        for (size_t i = 0; i < count; i++) {
            kinfo_proc *proc = &result[i];
            // compare proc->kp_proc.p_comm against name here
        }
    }
    free(result);
}
The kinfo_proc struct contains all the information about a process, such as the process identifier (pid), process group, process status, and so on.
I think most of these values are available in the Mach API, but it's been a while since I've poked around in there. Alternatively, you could just look at the source code for the "ps" or "top" commands, and see how they do it.
Most of this info can be obtained from GetProcessInformation().
By the way, why virtual methods for functions that return process-wide info?
This is Carbon only, and won't work with Cocoa.