I have a class with the following member functions:
/// caller pid
virtual pid_t Pid() const = 0;
/// physical memory size in KB
virtual uint64_t Size() const = 0;
/// resident memory for this process
virtual uint64_t Rss() const = 0;
/// cpu used by this process
virtual double PercentCpu() const = 0;
/// memory used by this process
virtual double PercentMemory() const = 0;
/// number of threads in this process
virtual int32_t Lwps() const = 0;
This class's duty is to return process information about the caller. Physical memory size can easily be determined by a sysctl call, and pid is trivial, but the remaining calls have eluded me, aside from invoking popen on ps or top and parsing the output - which isn't acceptable. Any help would be greatly appreciated.
Requirements:
Compiles on g++ 4.0
No obj-c
OSX 10.5
Process info comes from proc_pidinfo:
cristi:~ diciu$ grep proc_pidinfo /usr/include/libproc.h
int proc_pidinfo(int pid, int flavor, uint64_t arg, void *buffer, int buffersize);
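For example, the PROC_PIDTASKINFO flavor fills a struct proc_taskinfo that carries the resident size and thread count. A minimal sketch (check the pti_* field names against your SDK's libproc.h):
#include <libproc.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    struct proc_taskinfo pti;
    /* ask the kernel for task-level info about the calling process */
    int ret = proc_pidinfo(getpid(), PROC_PIDTASKINFO, 0, &pti, sizeof(pti));
    if (ret <= 0) {
        fprintf(stderr, "proc_pidinfo failed\n");
        return 1;
    }
    printf("resident size = %llu bytes\n",
           (unsigned long long)pti.pti_resident_size);
    printf("threads       = %d\n", (int)pti.pti_threadnum);
    return 0;
}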
cpu load comes from host_statistics:
cristi:~ diciu$ grep -r host_statistics /usr/include/
/usr/include/mach/host_info.h:/* host_statistics() */
/usr/include/mach/mach_host.defs:routine host_statistics(
/usr/include/mach/mach_host.h:/* Routine host_statistics */
/usr/include/mach/mach_host.h:kern_return_t host_statistics
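A sketch of reading the aggregate CPU tick counters with the HOST_CPU_LOAD_INFO flavor; to turn ticks into a percentage you would diff two samples taken some interval apart, just as with getrusage() below:
#include <mach/mach.h>
#include <stdio.h>

int main(void)
{
    host_cpu_load_info_data_t load;
    mach_msg_type_number_t count = HOST_CPU_LOAD_INFO_COUNT;
    kern_return_t kr = host_statistics(mach_host_self(), HOST_CPU_LOAD_INFO,
                                       (host_info_t)&load, &count);
    if (kr != KERN_SUCCESS) {
        fprintf(stderr, "host_statistics failed: %d\n", (int)kr);
        return 1;
    }
    unsigned long long total = 0;
    for (int i = 0; i < CPU_STATE_MAX; i++)
        total += load.cpu_ticks[i];
    printf("idle fraction = %f\n",
           (double)load.cpu_ticks[CPU_STATE_IDLE] / (double)total);
    return 0;
}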
For more details, check out the sources for top and lsof; they are open source (you need to register as an Apple developer, but that's free of charge):
https://opensource.apple.com/source/top/top-111.20.1/libtop.c.auto.html
Later edit: All these interfaces are version specific, so you need to take that into account when writing production code (libproc.h):
/*
* This header file contains private interfaces to obtain process information.
* These interfaces are subject to change in future releases.
*/
Since you say no Objective-C, we'll rule out most of the Mac OS X frameworks.
You can get CPU time using getrusage(), which gives the total amount of User and System CPU time charged to your process. To get a CPU percentage you'd need to snapshot the getrusage values once per second (or however granular you want to be).
#include <sys/resource.h>
struct rusage r_usage;
if (getrusage(RUSAGE_SELF, &r_usage)) {
/* ... error handling ... */
}
printf("Total User CPU = %ld.%ld\n",
r_usage.ru_utime.tv_sec,
r_usage.ru_utime.tv_usec);
printf("Total System CPU = %ld.%ld\n",
r_usage.ru_stime.tv_sec,
r_usage.ru_stime.tv_usec);
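The snapshot-diff calculation itself is then just arithmetic; a sketch (helper names are illustrative, not a library API):
/* total CPU seconds charged to the process so far */
static double cpu_seconds(const struct rusage *r)
{
    return (r->ru_utime.tv_sec + r->ru_stime.tv_sec)
         + (r->ru_utime.tv_usec + r->ru_stime.tv_usec) / 1e6;
}

/* %CPU between two snapshots taken interval_sec seconds apart */
double percent_cpu(const struct rusage *prev, const struct rusage *cur,
                   double interval_sec)
{
    return 100.0 * (cpu_seconds(cur) - cpu_seconds(prev)) / interval_sec;
}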
There is an RSS field in the getrusage structure, but it appears to always be zero in Mac OS X 10.5. Michael Knight wrote a blog post several years ago about how to determine the RSS.
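One common workaround (not necessarily the one from that post) is the Mach task_info() call, which reports the resident size in bytes; a minimal sketch:
#include <mach/mach.h>
#include <stdio.h>

int main(void)
{
    struct task_basic_info info;
    mach_msg_type_number_t count = TASK_BASIC_INFO_COUNT;
    kern_return_t kr = task_info(mach_task_self(), TASK_BASIC_INFO,
                                 (task_info_t)&info, &count);
    if (kr != KERN_SUCCESS) {
        fprintf(stderr, "task_info failed: %d\n", (int)kr);
        return 1;
    }
    printf("resident size = %lu bytes\n", (unsigned long)info.resident_size);
    return 0;
}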
You can use the code below to get process info on Mac OS:
void IsInBSDProcessList(char *name) {
    assert(name != NULL);
    kinfo_proc *result = NULL;
    size_t count = 0;
    /* GetBSDProcessList allocates the result array itself */
    if (GetBSDProcessList(&result, &count) == 0) {
        for (size_t i = 0; i < count; i++) {
            kinfo_proc *proc = &result[i];
            /* examine proc->kp_proc.p_pid, proc->kp_proc.p_comm, ... */
        }
    }
    free(result);
}
The kinfo_proc struct contains the information about a process, such as the process identifier (pid), process group, and process status.
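Note that GetBSDProcessList is not a system call; it is a helper from Apple's Technical Q&A QA1123, built on sysctl. A simplified sketch (the real version loops because the process table can grow between the two sysctl calls):
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdlib.h>

typedef struct kinfo_proc kinfo_proc;

static int GetBSDProcessList(kinfo_proc **list, size_t *count)
{
    int mib[3] = { CTL_KERN, KERN_PROC, KERN_PROC_ALL };
    size_t len = 0;
    /* first call: ask how much space the process table needs */
    if (sysctl(mib, 3, NULL, &len, NULL, 0) < 0)
        return -1;
    *list = (kinfo_proc *)malloc(len);
    if (*list == NULL)
        return -1;
    /* second call: fetch the table itself */
    if (sysctl(mib, 3, *list, &len, NULL, 0) < 0) {
        free(*list);
        *list = NULL;
        return -1;
    }
    *count = len / sizeof(kinfo_proc);
    return 0;
}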
I think most of these values are available in the Mach API, but it's been a while since I've poked around in there. Alternatively, you could just look at the source code for the "ps" or "top" commands, and see how they do it.
Most of this info can be gotten from GetProcessInformation().
By the way, why virtual methods for functions that return processwide info?
This is Carbon only, and won't work with Cocoa
Related
I want to allocate my buffers according to the memory available, so that when I do processing and memory usage goes up, the usage still remains within the available memory limits. Is there a way to get the available memory (I don't know whether virtual or physical memory status will make any difference)? The method has to be platform independent as it's going to be used on Windows, OS X, Linux and AIX. (And if possible, I would also like to allocate some of the available memory for my application, so that it doesn't change during the execution.)
Edit: I did it with configurable memory allocation.
I understand it is not a good idea, as most OSes manage memory for us, but my application was an ETL framework (intended to be used on a server, but it was also being used on desktops as a plugin for Adobe InDesign). So I was running into the issue that, instead of using swap, Windows would return bad alloc and other applications would start to fail. And as I was taught to avoid crashes, I was just trying to degrade gracefully.
On UNIX-like operating systems, there is sysconf.
#include <unistd.h>
unsigned long long getTotalSystemMemory()
{
long pages = sysconf(_SC_PHYS_PAGES);
long page_size = sysconf(_SC_PAGE_SIZE);
return (unsigned long long)pages * page_size; // cast before multiplying to avoid overflow
}
On Windows, there is GlobalMemoryStatusEx:
#include <windows.h>
unsigned long long getTotalSystemMemory()
{
MEMORYSTATUSEX status;
status.dwLength = sizeof(status);
GlobalMemoryStatusEx(&status);
return status.ullTotalPhys;
}
So just do some fancy #ifdefs and you'll be good to go.
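For instance, a merged version of the two functions above might look like this sketch:
#if defined(_WIN32)
#include <windows.h>
#else
#include <unistd.h>
#endif

unsigned long long getTotalSystemMemory()
{
#if defined(_WIN32)
    MEMORYSTATUSEX status;
    status.dwLength = sizeof(status);
    GlobalMemoryStatusEx(&status);
    return status.ullTotalPhys;
#else
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGE_SIZE);
    return (unsigned long long)pages * page_size;
#endif
}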
There are reasons to want to do this in HPC for scientific software (not game, web, business or embedded software). Scientific software routinely goes through terabytes of data to get through one computation (or run) (and runs for hours or weeks) - all of which cannot be stored in memory (and if one day you tell me a terabyte is standard for any PC or tablet or phone, it will be the case that scientific software will be expected to handle petabytes or more). The amount of memory can also dictate the kind of method/algorithm that makes sense. The user does not always want to decide the memory and method - he/she has other things to worry about. So the programmer should have a good idea of what is available (4 GB or 8 GB or 64 GB or thereabouts these days) to decide whether a method will automatically work or a more laborious method is to be chosen. Disk is used, but memory is preferable. And users of such software are not encouraged to be doing too many things on their computer when running it - in fact, they often use dedicated machines/servers.
There is no platform independent way to do this, different operating systems use different memory management strategies.
These other stack overflow questions will help:
How to get memory usage at run time in c++?
C/C++ memory usage API in Linux/Windows
You should watch out though: it is notoriously difficult to get a "real" value for available memory in Linux. What the operating system displays as used by a process is no guarantee of what is actually allocated for the process.
This is a common issue when developing embedded Linux systems such as routers, where you want to buffer as much as the hardware allows. Here is a link to an example showing how to get this information on Linux (in C):
http://www.unix.com/programming/25035-determining-free-available-memory-mv-linux.html
Having read through these answers, I'm astonished that so many take the stance that the OP's computer memory belongs to others. It's his computer and his memory to do with as he sees fit, even if it breaks other systems laying claim to it. It's an interesting question. On a more primitive system I had memavail() which would tell me this. Why shouldn't the OP take as much memory as he wants without upsetting other systems?
Here's a solution that allocates less than half the memory available, just to be kind. Output was:
Required FFFFFFFF
Required 7FFFFFFF
Required 3FFFFFFF
Memory size allocated = 1FFFFFFF
#include <stdio.h>
#include <stdlib.h>
#define MINREQ 0xFFF // arbitrary minimum
int main(void)
{
    unsigned int required = (unsigned int)-1; // adapt to native uint
    char *mem = NULL;
    while (mem == NULL) {
        printf ("Required %X\n", required);
        mem = malloc (required);
        required >>= 1;   // halve: only ever keep less than half of what fits
        if (mem == NULL && required < MINREQ) {
            printf ("Cannot allocate enough memory\n");
            return (1);
        }
    }
    free (mem);
    mem = malloc (required);
    if (mem == NULL) {
        printf ("Cannot allocate enough memory\n");
        return (1);
    }
    printf ("Memory size allocated = %X\n", required);
    free (mem);
    return 0;
}
Mac OS X example using sysctl (man 3 sysctl):
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/sysctl.h>
int main(void)
{
int mib[2] = { CTL_HW, HW_MEMSIZE };
u_int namelen = sizeof(mib) / sizeof(mib[0]);
uint64_t size;
size_t len = sizeof(size);
if (sysctl(mib, namelen, &size, &len, NULL, 0) < 0)
{
perror("sysctl");
}
else
{
printf("HW.HW_MEMSIZE = %llu bytes\n", size);
}
return 0;
}
(it may also work on other BSD-like operating systems)
The code below gives the free and available memory in bytes. It works for FreeBSD, but you should be able to use the same/similar sysctl tunables on your platform to do the same thing (Linux and OS X have sysctl at least).
#include <stdio.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/vmmeter.h>
int main(){
int rc;
u_int page_size;
struct vmtotal vmt;
size_t vmt_size, uint_size;
vmt_size = sizeof(vmt);
uint_size = sizeof(page_size);
rc = sysctlbyname("vm.vmtotal", &vmt, &vmt_size, NULL, 0);
if (rc < 0){
perror("sysctlbyname");
return 1;
}
rc = sysctlbyname("vm.stats.vm.v_page_size", &page_size, &uint_size, NULL, 0);
if (rc < 0){
perror("sysctlbyname");
return 1;
}
printf("Free memory : %ld\n", vmt.t_free * (u_int64_t)page_size);
printf("Available memory : %ld\n", vmt.t_avm * (u_int64_t)page_size);
return 0;
}
Below is the output of the program, compared with the vmstat(8) output on my system.
~/code/memstats % cc memstats.c
~/code/memstats % ./a.out
Free memory : 5481914368
Available memory : 8473378816
~/code/memstats % vmstat
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr ad0 ad1 in sy cs us sy id
0 0 0 8093M 5228M 287 0 1 0 304 133 0 0 112 9597 1652 2 1 97
Linux currently free memory: sysconf(_SC_AVPHYS_PAGES) and get_avphys_pages()
The total RAM was covered at https://stackoverflow.com/a/2513561/895245 with sysconf(_SC_PHYS_PAGES).
Both sysconf(_SC_AVPHYS_PAGES) and get_avphys_pages() are glibc extensions to POSIX that instead give the number of currently available RAM pages.
You then just have to multiply by sysconf(_SC_PAGE_SIZE) to obtain the current free RAM.
Minimal runnable example at: C - Check available free RAM?
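For reference, a minimal version along those lines:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long pages = sysconf(_SC_AVPHYS_PAGES);
    long page_size = sysconf(_SC_PAGE_SIZE);
    printf("currently free RAM: %lld bytes\n",
           (long long)pages * page_size);
    return 0;
}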
The "official" function for this is was std::get_temporary_buffer(). However, you might want to test whether your platform has a decent implemenation. I understand that not all platforms behave as desired.
Instead of trying to guess, have you considered letting the user configure how much memory to use for buffers, as well as assuming somewhat conservative defaults? This way you can still run (possibly slightly slower) with no override, but if the user knows there is X memory available for the app, they can improve performance by configuring that amount.
Here is a proposal to get the available memory on the Linux platform:
#include <fstream>
#include <string>

/// Provides the available RAM in kibibytes (1 KiB = 1024 B) on the Linux platform
/// ("MemAvailable" in /proc/meminfo), or -1 if it cannot be determined.
/// For more info about /proc/meminfo : https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s2-proc-meminfo
long long getAvailableMemory()
{
    long long memAvailable = -1;
    std::ifstream meminfo("/proc/meminfo");
    std::string line;
    while (std::getline(meminfo, line))
    {
        if (line.find("MemAvailable:") != std::string::npos)
        {
            // the line looks like "MemAvailable:   14219060 kB"
            const std::size_t firstWhiteSpacePos = line.find_first_of(' ');
            const std::size_t firstNonWhiteSpaceChar = line.find_first_not_of(' ', firstWhiteSpacePos);
            const std::size_t nextWhiteSpace = line.find_first_of(' ', firstNonWhiteSpaceChar);
            const std::size_t numChars = nextWhiteSpace - firstNonWhiteSpaceChar;
            const std::string memAvailableStr = line.substr(firstNonWhiteSpaceChar, numChars);
            memAvailable = std::stoll(memAvailableStr);
            break;
        }
    }
    return memAvailable;
}
I have a Monte Carlo simulation in which the state of the system is a bit string (size N) with the bits being randomly flipped. In an effort to accelerate the simulation, the code was revised to use CUDA. However, because of the large number of statistics I need calculated from the system state (it goes as N^2), this part needs to be done on the CPU, where there is more memory. Currently the algorithm looks like this:
loop:
    CUDA kernel making 10s of Monte Carlo steps
    Copy system state back to CPU
    Calculate statistics
This is inefficient and I would like to have the kernel run persistently while the CPU occasionally queries the state of the system and calculates the statistics while the kernel continues to run.
Based on Tom's answer to this question I think the answer is double buffering, but I haven't been able to find an explanation or example of how to do this.
How does one set up the double buffering described in the third paragraph of Tom's answer for a CUDA/C++ code?
Here's a fully worked example of a "persistent" kernel, producer-consumer approach, with a double-buffered interface from device (producer) to host (consumer).
Persistent kernel design generally implies launching kernels with, at most, the number of blocks that can be simultaneously resident on the hardware (see item 1 on slide 16 here). For the most efficient usage of the machine, we'd generally like to maximize this, while still staying within the aforementioned limit. This involves an occupancy study for a specific kernel, and it will vary from kernel to kernel. Therefore I've chosen to take a shortcut here, and simply launch as many blocks as there are multiprocessors. Such an approach is always guaranteed to work (it could be considered a "lower bound" on the number of blocks to launch for a persistent kernel), but is (typically) not the most efficient usage of the machine. Nevertheless, I claim the occupancy study is beside the point of your question. Furthermore, it is arguable that proper "persistent kernel" design with guaranteed forward progress is actually quite tricky - requiring careful design of the CUDA thread code and placement of threadblocks (e.g. only use 1 threadblock per SM) to guarantee forward progress. However we don't need to delve to this level to address your question (I don't think) and the persistent kernel example I propose here only places 1 threadblock per SM.
I'm also assuming a proper UVA setup, so that I can skip the details of arranging for proper mapped memory allocations in a non-UVA setup.
The basic idea is that we will have 2 buffers on the device, along with 2 "mailboxes" in mapped memory, one for each buffer. The device kernel will fill a buffer with data, then set the "mailbox" to a value (2, in this case) that indicates the host may "consume" the buffer. The device then goes on to the other buffer and repeats the process in a ping-pong fashion between buffers. In order to make this work we must make sure that the device itself has not overrun the buffers (no thread is allowed to be more than one buffer ahead of any other thread) and that before a buffer is populated by the device, the host has consumed the previous contents.
On the host side, it simply waits for the mailbox to indicate "full", then copies the buffer from device to host, resets the mailbox, and performs the "processing" on it (the validate function). It then goes on to the next buffer in a ping-pong fashion. The actual data "production" by the device is just to fill each buffer with the iteration number. The host then checks to see that the proper iteration number was received.
I've structured the code to call out the actual device "work" function (my_compute_function) which is where you would put whatever your Monte Carlo code is. If your code is nicely thread-independent, this should be straightforward. Thus the device side my_compute_function is the producer function, and the host side validate is the consumer function. If your device producer code is not simply thread independent, then you may need to restructure things slightly around the calling point to my_compute_function.
The net effect of this is that the device can "race ahead" and begin filling the next buffer, while the host is "consuming" the data in the previous buffer.
Because persistent kernel design imposes an upper bound on the number of blocks (and threads) in a kernel launch, I've chosen to implement the "work" producer function in a grid-striding loop, so that arbitrary size buffers can be handled by the given grid-width.
Here's a fully worked example:
$ cat t942.cu
#include <stdio.h>
#define ITERS 1000
#define DSIZE 65536
#define nTPB 256
#define cudaCheckErrors(msg) \
do { \
cudaError_t __err = cudaGetLastError(); \
if (__err != cudaSuccess) { \
fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
msg, cudaGetErrorString(__err), \
__FILE__, __LINE__); \
fprintf(stderr, "*** FAILED - ABORTING\n"); \
exit(1); \
} \
} while (0)
__device__ volatile int blkcnt1 = 0;
__device__ volatile int blkcnt2 = 0;
__device__ volatile int itercnt = 0;
__device__ void my_compute_function(int *buf, int idx, int data){
buf[idx] = data; // put your work code here
}
__global__ void testkernel(int *buffer1, int *buffer2, volatile int *buffer1_ready, volatile int *buffer2_ready, const int buffersize, const int iterations){
// assumption of persistent block-limited kernel launch
int idx = threadIdx.x+blockDim.x*blockIdx.x;
int iter_count = 0;
while (iter_count < iterations ){ // persistent until iterations complete
int *buf = (iter_count & 1)? buffer2:buffer1; // ping pong between buffers
volatile int *bufrdy = (iter_count & 1)?(buffer2_ready):(buffer1_ready);
volatile int *blkcnt = (iter_count & 1)?(&blkcnt2):(&blkcnt1);
int my_idx = idx;
while (iter_count - itercnt > 1); // don't overrun buffers on device
while (*bufrdy == 2); // wait for buffer to be consumed
while (my_idx < buffersize){ // perform the "work"
my_compute_function(buf, my_idx, iter_count);
my_idx += gridDim.x*blockDim.x; // grid-striding loop
}
__syncthreads(); // wait for my block to finish
__threadfence(); // make sure global buffer writes are "visible"
if (!threadIdx.x) atomicAdd((int *)blkcnt, 1); // mark my block done
if (!idx){ // am I the master block/thread?
while (*blkcnt < gridDim.x); // wait for all blocks to finish
*blkcnt = 0;
*bufrdy = 2; // indicate that buffer is ready
__threadfence_system(); // push it out to mapped memory
itercnt++;
}
iter_count++;
}
}
int validate(const int *data, const int dsize, const int val){
for (int i = 0; i < dsize; i++) if (data[i] != val) {printf("mismatch at %d, was: %d, should be: %d\n", i, data[i], val); return 0;}
return 1;
}
int main(){
int *h_buf1, *d_buf1, *h_buf2, *d_buf2;
volatile int *m_bufrdy1, *m_bufrdy2;
// buffer and "mailbox" setup
cudaHostAlloc(&h_buf1, DSIZE*sizeof(int), cudaHostAllocDefault);
cudaHostAlloc(&h_buf2, DSIZE*sizeof(int), cudaHostAllocDefault);
cudaHostAlloc(&m_bufrdy1, sizeof(int), cudaHostAllocMapped);
cudaHostAlloc(&m_bufrdy2, sizeof(int), cudaHostAllocMapped);
cudaCheckErrors("cudaHostAlloc fail");
cudaMalloc(&d_buf1, DSIZE*sizeof(int));
cudaMalloc(&d_buf2, DSIZE*sizeof(int));
cudaCheckErrors("cudaMalloc fail");
cudaStream_t streamk, streamc;
cudaStreamCreate(&streamk);
cudaStreamCreate(&streamc);
cudaCheckErrors("cudaStreamCreate fail");
*m_bufrdy1 = 0;
*m_bufrdy2 = 0;
cudaMemset(d_buf1, 0xFF, DSIZE*sizeof(int));
cudaMemset(d_buf2, 0xFF, DSIZE*sizeof(int));
cudaCheckErrors("cudaMemset fail");
// inefficient crutch for choosing number of blocks
int nblock = 0;
cudaDeviceGetAttribute(&nblock, cudaDevAttrMultiProcessorCount, 0);
cudaCheckErrors("get multiprocessor count fail");
testkernel<<<nblock, nTPB, 0, streamk>>>(d_buf1, d_buf2, m_bufrdy1, m_bufrdy2, DSIZE, ITERS);
cudaCheckErrors("kernel launch fail");
volatile int *bufrdy;
int *hbuf, *dbuf;
for (int i = 0; i < ITERS; i++){
if (i & 1){ // ping pong on the host side
bufrdy = m_bufrdy2;
hbuf = h_buf2;
dbuf = d_buf2;}
else {
bufrdy = m_bufrdy1;
hbuf = h_buf1;
dbuf = d_buf1;}
// int qq = 0; // add for failsafe - otherwise a machine failure can hang
while ((*bufrdy)!= 2); // use this for a failsafe: if (++qq > 1000000) {printf("bufrdy = %d\n", *bufrdy); return 0;} // wait for buffer to be full;
cudaMemcpyAsync(hbuf, dbuf, DSIZE*sizeof(int), cudaMemcpyDeviceToHost, streamc);
cudaStreamSynchronize(streamc);
cudaCheckErrors("cudaMemcpyAsync fail");
*bufrdy = 0; // release buffer back to device
if (!validate(hbuf, DSIZE, i)) {printf("validation failure at iter %d\n", i); exit(1);}
}
printf("Completed %d iterations successfully\n", ITERS);
}
$ nvcc -o t942 t942.cu
$ ./t942
Completed 1000 iterations successfully
$
I've tested the above code and it seems to work well on Linux. I believe it should be OK on a Windows TCC setup. On Windows WDDM, however, I think there are issues that I am still investigating.
Note that the above kernel design attempts to do a grid-wide synchronization using a block-counting atomic strategy. CUDA now (9.0 and newer) has cooperative groups, and that is the recommended approach, rather than the above methodology, to create a grid-wide sync.
This isn't a direct answer to your question but it may be of help.
I am working with a CUDA producer-consumer code that appears to be similar in basic structure to yours. I was hoping to speed up the code by making the CPU and GPU run concurrently. I attempted this by restructuring the code this way:
Launch kernel
Copy data
Loop:
    Launch kernel
    CPU work
    Copy data
    CPU work
This way the CPU can work on the data from the last kernel run while the next set of data is being generated. This cut 30% off the runtime of my code. I am guessing it could get better if the GPU/CPU work can be balanced so they take roughly the same amount of time.
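In host code, that structure might look something like the sketch below (names and sizes are illustrative, not my actual code). Kernel launches are asynchronous, and the blocking cudaMemcpy on the default stream supplies the synchronization, so the CPU call overlaps with the in-flight kernel:
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel(int *state, int iter) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    state[i] = iter; // stand-in for the Monte Carlo steps
}

void calculate_statistics(const int *data, int n) {
    long long sum = 0; // stand-in for the real statistics pass
    for (int i = 0; i < n; ++i) sum += data[i];
    std::printf("sum = %lld\n", sum);
}

int main() {
    const int N = 1 << 20, iterations = 10;
    int *d_state, *h_prev;
    cudaMalloc(&d_state, N * sizeof(int));
    cudaMallocHost(&h_prev, N * sizeof(int));
    kernel<<<N / 256, 256>>>(d_state, 0); // first batch
    cudaMemcpy(h_prev, d_state, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 1; i < iterations; ++i) {
        kernel<<<N / 256, 256>>>(d_state, i); // GPU starts the next batch (async)
        calculate_statistics(h_prev, N);      // CPU works on the previous batch
        cudaMemcpy(h_prev, d_state, N * sizeof(int), // waits for the kernel
                   cudaMemcpyDeviceToHost);
    }
    calculate_statistics(h_prev, N); // last batch
    cudaFree(d_state);
    cudaFreeHost(h_prev);
    return 0;
}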
I am still launching the same kernel 1000s of times. If the overhead of launching a kernel repeatedly is significant, then looking for a way to accomplish what I have with a single launch would be worth it. Otherwise this is probably the best (simplest) solution.
I went through this blog and this videocast. In Windows, if I want to retrieve information about a range of pages within the virtual address space of a specified process, I can use the WinAPI VirtualQueryEx function:
MEMORY_BASIC_INFORMATION meminfo;
unsigned char *addr = 0;
for(;;)
{
if(!VirtualQueryEx(hProc, addr, &meminfo, sizeof(meminfo)))
break;
if(meminfo.State & MEM_COMMIT)
{
//collect some data from meminfo
}
addr = (unsigned char*)meminfo.BaseAddress + meminfo.RegionSize;
}
I wondered how to get a similar set of information in Linux using syscalls, but it is not clear to me how I can gather such data under Linux using C/C++. I went through this thread where there are suggestions to take a look at the /proc/<pid>/mem or /proc/<pid>/maps files. Is that the right direction? What would the closest implementation to the one provided here look like for Linux?
Yes, the proc filesystem is part of the Linux API, so this is the way to go. A lot of data in that filesystem is usually accessed using a library wrapper, but that's where the data lie.
As far as I know, /proc/<pid>/maps is the only reliable and supported way to do it. Even libunwind uses it:
if (maps_init (&mi, getpid()) < 0)
return -1;
unsigned long offset;
while (maps_next (&mi, &low, &hi, &offset)) {
struct dl_phdr_info info;
info.dlpi_name = mi.path;
info.dlpi_addr = low;
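For the loop itself, here is a sketch of the Linux analogue of the VirtualQueryEx walk, parsing /proc/<pid>/maps line by line (each line looks like "00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/foo"):
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>
#include <sys/types.h>

void walkMaps(pid_t pid)
{
    std::ifstream maps("/proc/" + std::to_string(pid) + "/maps");
    std::string line;
    while (std::getline(maps, line))
    {
        std::istringstream iss(line);
        unsigned long long start = 0, end = 0;
        char dash = 0;
        std::string perms;
        iss >> std::hex >> start >> dash >> end >> perms;
        // collect some data from the region, e.g. filter on perms
        std::printf("%llx-%llx %s\n", start, end, perms.c_str());
    }
}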
Can I get C++ code to read a Windows perfmon counter (category, counter name and instance name)?
It's very easy in C#, but I need C++ code.
Thanks
As Doug T. pointed out earlier, I posted a helper class a while ago to query performance counter values. The usage of the class is pretty simple: all you have to do is provide the string for the performance counter.
http://askldjd.wordpress.com/2011/01/05/a-pdh-helper-class-cpdhquery/
However, the code I posted on my blog has been modified in practice. From your comment, it seems like you are interested in querying just a single field.
In this case, try adding the following function to my CPdhQuery class.
double CPdhQuery::CollectSingleData()
{
double data = 0;
while(true)
{
status = PdhCollectQueryData(hQuery);
if (ERROR_SUCCESS != status)
{
throw CException(GetErrorString(status));
}
PDH_FMT_COUNTERVALUE cv;
// Format the performance data record.
status = PdhGetFormattedCounterValue(hCounter,
PDH_FMT_DOUBLE,
(LPDWORD)NULL,
&cv);
if (ERROR_SUCCESS != status)
{
continue;
}
data = cv.doubleValue;
break;
}
return data;
}
For example, to get processor time:
counter = boost::make_shared<CPdhQuery>(std::tstring(_T("\\Processor Information(_Total)\\% Processor Time")));
To get file read bytes / sec:
counter = boost::make_shared<CPdhQuery>(std::tstring(_T("\\System\\File Read Bytes/sec")));
To get % Committed Bytes:
counter = boost::make_shared<CPdhQuery>(std::tstring(_T("\\Memory\\% Committed Bytes In Use")));
To get the data, do this.
double data = counter->CollectSingleData();
I hope this helps.
... Alan
Some of the commonly used performance values have API calls to get them directly. For example, the total processor time can be obtained from GetSystemTimes, and you can calculate the percentage yourself.
If this isn't an option then the Performance Data Helper library provides a moderately simple interface to performance data.
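If you'd rather not pull in a helper class at all, a minimal raw PDH query looks roughly like this (error handling elided; rate counters need two samples):
#include <windows.h>
#include <pdh.h>
#include <cstdio>
#pragma comment(lib, "pdh.lib")

int main()
{
    PDH_HQUERY query = NULL;
    PDH_HCOUNTER counter = NULL;
    PdhOpenQuery(NULL, 0, &query);
    PdhAddCounter(query, TEXT("\\Processor(_Total)\\% Processor Time"),
                  0, &counter);
    PdhCollectQueryData(query); // first sample
    Sleep(1000);                // rate counters need a second sample
    PdhCollectQueryData(query);
    PDH_FMT_COUNTERVALUE value;
    PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE, NULL, &value);
    std::printf("%% Processor Time = %.2f\n", value.doubleValue);
    PdhCloseQuery(query);
    return 0;
}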
I would like to create a web service with some problems and a judge system. The user should write a solution in C++ and then upload the source code. The code should then be compiled and tested to determine whether the solution is right (like TopCoder does when you submit your code). There should be time and memory limits for solutions to run. Are there any ready-to-use judge systems for building such a service on Linux?
Do you mean an online judge system like acm.timus.ru? Then you could try the open source project on code.google.com, and I guess they use it here. I studied this code a little bit before: basically it prevents some system calls and measures spent time and memory. For details, have a look here.
I used the DomJudge system, which is ready for online competitions: http://domjudge.sourceforge.net/
For just checking memory and time caps, code like the following can be injected into the submitted project automatically. Also see Limit physical memory per process for memory caps.
For example, to check memory, add:
(EDITED FOR LINUX EXAMPLE)
#include <sys/types.h>
#include <sys/sysinfo.h>
#include <stdio.h>
#include <pthread.h>

long long getMemoryUsage()
{
    struct sysinfo memInfo;
    sysinfo(&memInfo);
    /* system-wide physical memory currently in use, in bytes */
    long long memUsed = memInfo.totalram - memInfo.freeram;
    return memUsed * memInfo.mem_unit;
}
And add a new thread to their main() that calls a function like this:
#include <signal.h>
#include <unistd.h>

#define MAXMEMORY (256LL * 1024 * 1024) /* memory cap in bytes; pick your limit */
#define LIMITAPPLICATION 100            /* 100 x 100 ms = 10 seconds */

static int COUNTER = 0;

void TimeLimitToApplication(void)
{
    /* LIMITAPPLICATION represents how many 100 ms intervals must
       pass until the time limit is reached. */
    while (COUNTER < LIMITAPPLICATION)
    {
        usleep(100 * 1000); /* 100 ms; sleep() would be whole seconds */
        /* Include if using the code block below
        if (CheckProcessThreadsIsComplete()) break;
        */
        /* check memory usage every 100 ms */
        if (getMemoryUsage() > MAXMEMORY)
            break;
        COUNTER++;
    }
    killpg(getpgrp(), SIGTERM); /* signal the whole process group */
}
You might also want to add another check to see if all your threads are completed:
(I only have a C#.NET sample of this, but I'm sure you could find something similar.)
public static bool CheckProcessThreadsIsComplete()
{
    Process[] allProcs = Process.GetProcesses();
    foreach (Process proc in allProcs)
    {
        ProcessThreadCollection myThreads = proc.Threads;
        foreach (ProcessThread pt in myThreads)
        {
            // Ignore the thread that you started to do the time and memory checks
            if (pt.Id == TIMELIMIT_AND_MEMORYCHECK_THREAD_ID) continue;
            if (pt.ThreadState != ThreadState.Terminated)
                return false; // this thread is still running
        }
    }
    return true; // all threads are completed
}
And add that to your checks in TimeLimitToApplication().