I am measuring time in my MPI code like this:
MPI_Barrier(MPI_COMM_WORLD);
MPIt1 = MPI_Wtime();
// my code
MPI_Barrier(MPI_COMM_WORLD);
MPIt2 = MPI_Wtime();
MPIelapsed_inverse = MPIt2 - MPIt1;
I log into the lab computer with ssh and run my program there. However, these days I am running a really long experiment (it takes about 1 or 2 days to complete). Last night another user logged in and ran some heavy tasks too, which took the CPU away from my job for some time.
Will this affect my time measurements, or will MPI_Wtime() still report the actual elapsed time, regardless of the other user?
MPI_Wtime reports 'wall-clock' or 'elapsed' time. If another user's program takes clock cycles from your program then the elapsed time of your program, from start to finish, will increase.
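As a minimal sketch (assuming the same code region as in the question), you can record both wall-clock time with MPI_Wtime and process CPU time with clock(); under contention from other users the wall time grows while the CPU time stays roughly the same:
#include <time.h>    // for clock() and CLOCKS_PER_SEC
#include <stdio.h>

double wall0, wall1;
clock_t cpu0, cpu1;

MPI_Barrier(MPI_COMM_WORLD);
wall0 = MPI_Wtime();
cpu0 = clock();
// my code
MPI_Barrier(MPI_COMM_WORLD);
wall1 = MPI_Wtime();
cpu1 = clock();

printf("wall: %f s, cpu: %f s\n",
       wall1 - wall0, (double)(cpu1 - cpu0) / CLOCKS_PER_SEC);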
I use boost::timer::cpu_timer to measure the performance of some algorithm in my application. The example output looks like this:
Algo1 duration: 6.755457s wall, 12.963683s user + 1.294808s system = 14.258491s CPU (211.1%)
From boost cpu_timer documentation:
The output of this program will look something like this:
5.713010s wall, 5.709637s user + 0.000000s system = 5.709637s CPU (99.9%)
In other words, this program ran in 5.713010 seconds as would be measured by a clock on the wall, the operating system charged it for 5.709637 seconds of user CPU time and 0 seconds of system CPU time, the total of these two was 5.709637, and that represented 99.9 percent of the wall clock time.
What does the value I obtained mean (211.1%)? Does it mean that more than two cores were involved in the execution of my algorithm?
What is the meaning of user CPU time and system CPU time?
What does the value I obtained mean (211.1%)? Does it mean that more than two cores were involved in the execution of my algorithm?
It means that the program used a little more than twice as much CPU time as wall time. For that to happen, it must have been running on at least three cores for some of that time.
What is the meaning of user CPU time and system CPU time?
User CPU time is time when the CPU is running user code. System CPU time is time when the CPU is running system code. When you call a system function, like the function to read from a file, you switch from running user code to running system code until that function returns.
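A minimal sketch of reading the three components with boost::timer::cpu_timer (the placeholder comment stands in for the algorithm being measured); the percentage printed by format() is (user + system) / wall:
#include <boost/timer/timer.hpp>
#include <iostream>

int main()
{
    boost::timer::cpu_timer t;
    // ... run the algorithm, possibly on several threads ...
    boost::timer::cpu_times e = t.elapsed();   // wall, user, system in nanoseconds
    std::cout << t.format();                   // same layout as the output quoted above
    double percent = 100.0 * (e.user + e.system) / e.wall;
    std::cout << "CPU utilisation: " << percent << "%\n";
}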
I have the following code:
clock_t tt = clock();
sleep(10);
tt = clock()-tt;
cout<<(float)tt/CLOCKS_PER_SEC<<" "<<CLOCKS_PER_SEC<<endl;
When I run the code, it apparently pauses for 10 seconds and the output is:
0.001074 1000000
This indicates that only 1074 clock ticks (about 1 ms) passed, which is apparently wrong.
Why does this happen?
I am using g++ under linux.
The function clock() returns the processor time consumed by the program. While sleeping, your process consumes essentially no processor time, so this is expected. The small amount of time reported probably comes from the clock() calls themselves.
clock() doesn't measure elapsed time (what you would measure with a stopwatch); it measures the time your program spent running on the CPU. But sleep() uses almost no CPU: it simply puts your process to sleep. Try replacing sleep(10) with another value, sleep(1) for example, and you will get the same result.
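A minimal sketch contrasting the two clocks around the same sleep(10) call, here using std::chrono::steady_clock for the wall-clock side (just one of several ways to measure elapsed time):
#include <chrono>
#include <ctime>
#include <iostream>
#include <unistd.h>

int main()
{
    std::clock_t c0 = std::clock();                 // CPU time
    auto w0 = std::chrono::steady_clock::now();     // wall-clock time

    sleep(10);                                      // uses almost no CPU

    std::clock_t c1 = std::clock();
    auto w1 = std::chrono::steady_clock::now();

    std::cout << "cpu:  " << (double)(c1 - c0) / CLOCKS_PER_SEC << " s\n";
    std::cout << "wall: " << std::chrono::duration<double>(w1 - w0).count() << " s\n";
}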
I recently compared 2 kinds of doing kernel runtime measuring and I see some confusing results.
I use an AMD Bobcat CPU (E-350) with integrated GPU and Ubuntu Linux (CL_PLATFORM_VERSION is OpenCL 1.2 AMD-APP (923.1)).
The basic gettimeofday idea looks like this:
clFinish(...) // make sure all previously queued tasks on the command queue have finished
gettimeofday(&starttime,0x0)
clEnqueueNDRangeKernel(...)
clFlush(...)
clWaitForEvents(...)
gettimeofday(&endtime,0x0)
This says the kernel needs around 5466 ms.
Second time measurement I did with clGetEventProfilingInfo for QUEUED / SUBMIT / START / END.
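A minimal sketch of how these four timestamps can be queried (assuming the command queue was created with CL_QUEUE_PROFILING_ENABLE and event is the event returned by clEnqueueNDRangeKernel):
cl_ulong queued, submit, start, end;
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_QUEUED, sizeof(cl_ulong), &queued, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_SUBMIT, sizeof(cl_ulong), &submit, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START,  sizeof(cl_ulong), &start,  NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,    sizeof(cl_ulong), &end,    NULL);
// all values are in nanoseconds; differences give the time spent in each state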
With the 4 time values I can calculate the time spent in the different states:
time spent queued: 0.06 ms,
time spent submitted: 2733 ms,
time spent in execution: 2731 ms (the actual execution time).
I see that it adds up to the 5466 ms, but why does it stay in the submitted state for half the time?
And the funny things are:
the time spent in the submitted state is always half of the actual execution time, even for different kernels or different workloads (so it can't be a constant setup time),
for the CPU the time spent in the submitted state is 0 and the execution time equals the gettimeofday result,
I tested my kernels on an Intel Ivy Bridge under Windows, using both the CPU and the GPU, and I didn't see the effect there.
Does anyone have a clue?
I suspect that either the GPU runs the kernel twice (so that the gettimeofday result is double the actual execution time) or that clGetEventProfilingInfo is not working correctly for the AMD GPU.
I posted the problem in an AMD forum. They say it's a bug in the AMD profiler.
http://devgurus.amd.com/thread/159809
This is on Linux. The app is written in C++ with the ACE library.
I suspect that one of the threads in the process sometimes gets blocked for an unusually long time (5 to 40 seconds). The app runs fine most of the time, except that a couple of times a day it has this issue. There are 5 other similar apps running on the box which are also I/O bound due to heavy incoming socket data.
I would like to know if there is anything I can do programmatically to see whether the threads/processes are getting their time slices.
If a process is being starved out, self monitoring for that process would not be that productive. But if you just want that process to notice it hasn't been run in a while, it can call times() periodically and compare the relative difference in elapsed time with the relative difference in scheduled user time (you would sum the tms_utime and tms_cutime fields if you want to count waiting for children as productive time, and you would sum in the tms_stime and tms_cstime fields if you count kernel time spent on your behalf as productive time). For thread times, the only way I know of is to consult the /proc filesystem.
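A minimal sketch of such a periodic self-check using times() (the function name check_starvation and the decision to count only this process's own user and system time are illustrative):
#include <sys/times.h>
#include <unistd.h>
#include <stdio.h>

static clock_t last_real, last_cpu;

void check_starvation(void)
{
    struct tms t;
    clock_t now_real = times(&t);                    // elapsed time, in clock ticks
    clock_t now_cpu  = t.tms_utime + t.tms_stime;    // user + system time of this process
    long ticks_per_sec = sysconf(_SC_CLK_TCK);

    double elapsed   = (double)(now_real - last_real) / ticks_per_sec;
    double scheduled = (double)(now_cpu  - last_cpu)  / ticks_per_sec;
    if (elapsed > 0)                                 // a tiny ratio means we were starved
        printf("got %.1f%% of the CPU over the last %.1f s\n",
               100.0 * scheduled / elapsed, elapsed);

    last_real = now_real;
    last_cpu  = now_cpu;
}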
A high priority external process or high priority thread could externally monitor processes (and threads) of interest by reading the appropriate /proc/<pid>/stat entries for the process (and /proc/<pid>/task/<tid>/stat for the threads). The user times are found in the 14th and 16th fields of the stat file. The system times are found in the 15th and 17th fields. (The field positions are accurate for my Linux 2.6 kernel.)
Between two time points, you determine the amount of elapsed time that has passed (a monitor process or thread would usually wake up at regular intervals). Then the difference between the cumulative processing times at each of those time points represents how much time the thread of interest got to run during that time. The ratio of processing time to elapsed time would represent the time slice.
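A minimal sketch of reading the user and system time fields from such a stat file (the function name is illustrative; the parse skips past the comm field, which is enclosed in parentheses and may itself contain spaces):
#include <stdio.h>
#include <string.h>

/* reads utime (field 14) and stime (field 15), in clock ticks */
int read_stat_times(const char *stat_path, unsigned long *utime, unsigned long *stime)
{
    char buf[1024];
    FILE *f = fopen(stat_path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    char *p = strrchr(buf, ')');   /* end of the comm field */
    if (!p)
        return -1;
    /* after ") " come fields 3..13, then utime and stime */
    return sscanf(p + 2, "%*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
                  utime, stime) == 2 ? 0 : -1;
}
Call it with "/proc/<pid>/stat" for a process or "/proc/<pid>/task/<tid>/stat" for a thread; divide the tick counts by sysconf(_SC_CLK_TCK) to get seconds.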
One last bit of info: On Linux, I use the following to obtain the tid of the current thread for examining the right task in the /proc/<pid>/task/ directory:
tid = syscall(__NR_gettid);
I do this, because I could not find the gettid system call actually exported by any library on my system, even though it was documented. But, it might be available on yours.
I am writing a program which simulates an activity, and I am wondering how to speed up time in the simulation, so that, say, 1 hour in the real world corresponds to 1 month in the program.
Thank you.
The program is actually similar to a restaurant simulation where you don't really know when customers will come. Let's say we pick a random number (2-10) of customers every hour.
It depends on how the program obtains the current time.
For example, if it calls the Linux system function time(), just replace that with your own function (say, mytime) which returns a sped-up time. Perhaps mytime calls time() and multiplies the elapsed time by whatever factor makes sense; 1 hour = 1 month is a factor of 720. The origin should be taken as the moment the program starts:
#include <time.h>

time_t t0;                    // wall-clock time at program start

time_t mytime (void *);

int main (void)
{
    t0 = time(NULL);          // record the origin at program initialization
    ....
    for (;;)
    {
        time_t sim_time = mytime (NULL);
        // yada yada yada
        ...
    }
}

time_t mytime (void *unused)
{
    return 720 * (time (NULL) - t0);   // time since program start,
                                       // magnified by 720 so one real hour is one simulated month
}
You just do it. You decide how many events take place in an hour of simulation time (e.g., if an event takes place once a second, then after 3600 simulated events you've simulated an hour of time). There's no need for your simulation to run in real time; you can run it as fast as you can calculate the relevant numbers.
It sounds like you are implementing a Discrete Event Simulation. You don't even need to have a free-running timer (no matter what scaling you may use) in such a situation. It's all driven by the events. You have a priority queue containing events, ordered by the event time. You have a processing loop which takes the event at the head of the queue, and advances the simulation time to the event time. You process the event, which may involve scheduling more events. (For example, the customerArrived event may cause a customerOrdersDinner event to be generated 2 minutes later.) You can easily simulate customers arriving using random().
The other answers I've read thus far are still assuming you need a continuous timer, which is usually not the most efficient way of simulating an event-driven system. You don't need to scale real time to simulation time, or have ticks. Let the events drive time!
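A minimal sketch of that event loop (the Event struct and the schedule() helper are illustrative; times are in simulated seconds):
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <queue>
#include <vector>

struct Event {
    double time;                                  // simulated time at which the event fires
    std::function<void()> action;
    bool operator>(const Event& other) const { return time > other.time; }
};

std::priority_queue<Event, std::vector<Event>, std::greater<Event>> events;
double now = 0.0;                                 // current simulated time, in seconds

void schedule(double delay, std::function<void()> action)
{
    events.push({now + delay, std::move(action)});
}

void customerArrived()
{
    std::printf("%8.1f  customer arrived\n", now);
    schedule(120.0, []{ std::printf("%8.1f  customer orders dinner\n", now); });
    schedule(std::rand() % 3600 + 1, customerArrived);   // next arrival within the hour
}

int main()
{
    schedule(0.0, customerArrived);
    while (!events.empty() && now < 24 * 3600) {  // simulate one day
        Event e = events.top();
        events.pop();
        now = e.time;                             // jump straight to the next event
        e.action();
    }
}
Note that the loop never waits: the simulated clock jumps from one event time to the next, so a simulated month takes only as long as the computation itself.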
If the simulation is data dependent (like a stock market program), just speed up the rate at which the data is pumped. If it is something that depends on time() calls, you will have to do something like wallyk's answer (assuming you have the source code).
If time in your simulation is discrete, one option is to structure your program so that something happens "every tick".
Once you do that, time in your program is arbitrarily fast.
Is there really a reason for having a month of simulation time correspond exactly to an hour of time in the real world? If yes, you can always process the number of ticks that correspond to a month, and then pause the appropriate amount of time to let an hour of "real time" finish.
Of course, a key variable here is the granularity of your simulation, i.e. how many ticks correspond to a second of simulated time.
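If that alignment with real time is really needed, a minimal sketch could look like this (tick() and TICKS_PER_MONTH are placeholders for whatever your simulation does per step and for the granularity you choose):
#include <chrono>
#include <thread>

const long TICKS_PER_MONTH = 30L * 24 * 3600;     // e.g. one tick per simulated second

void tick() { /* advance the simulation by one step */ }

int main()
{
    for (;;) {
        auto hour_start = std::chrono::steady_clock::now();
        for (long i = 0; i < TICKS_PER_MONTH; ++i)
            tick();                               // a month of simulated time, as fast as possible
        // then wait out the rest of the real hour
        std::this_thread::sleep_until(hour_start + std::chrono::hours(1));
    }
}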