I am doing a benchmark project comparing two graphics libraries (SDL and SFML) for my final CS project. It is almost finished, but when I benchmark the speed of playing sounds, the time taken always comes back as 0, no matter how many loops it does. Do you know what's wrong with my code? The sound actually plays; I suspect I need a different approach.
void playSound()
{
    Mix_PlayChannel(-1, sound, 0);
}

void soundBenchmark(int numOfCycles)
{
    int time = SDL_GetTicks(), timeRequired;
    for(int i = 0; i < numOfCycles; i++) playSound();
    timeRequired = SDL_GetTicks() - time;
    cout << "Time required for " << numOfCycles << " cycles: " << timeRequired << " milliseconds.\n";
}
The function Mix_PlayChannel() does not block execution. It just hands the data to the sound card (or equivalent) and returns immediately, which is why the elapsed time is essentially zero.
You will have to remember the channel returned by Mix_PlayChannel(), then check periodically with Mix_Playing() whether that channel is still playing, and only then look at the time.
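For illustration, here is a minimal, untested sketch of that polling approach; it assumes sound is a loaded Mix_Chunk* and that SDL/SDL_mixer are initialised as in the rest of your project. Note that this times complete playback of each sound rather than just the cost of the call:

void soundBenchmarkBlocking(int numOfCycles)
{
    Uint32 start = SDL_GetTicks();
    for (int i = 0; i < numOfCycles; i++)
    {
        int channel = Mix_PlayChannel(-1, sound, 0);   // remember which channel was used
        if (channel == -1) continue;                   // no free channel or an error
        while (Mix_Playing(channel))                   // poll until that channel goes silent
            SDL_Delay(1);
    }
    Uint32 elapsed = SDL_GetTicks() - start;
    std::cout << "Time required for " << numOfCycles << " cycles: " << elapsed << " ms.\n";
}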
I have written a small C++ program that receives data from the USRP. The program can receive the I/Q data and show it on a spectrum analyzer. The receiver LED is not always green, though; it sort of blinks and dims. I suspect there is a rate mismatch between the computer and the USRP. Could this be the case? How does one make sure that the computer consumes the samples at the same rate as the USRP is acquiring them? Below is the thread function I use for I/Q signal acquisition.
void
USRPDriver::RxEventLoop()
{
    uhd::rx_metadata_t md;
    uhd::stream_cmd_t stream_cmd(uhd::stream_cmd_t::STREAM_MODE_NUM_SAMPS_AND_DONE);
    stream_cmd.stream_now = true;
    stream_cmd.num_samps = 1024;
    //std::cout << "Maximum num samps = " << rx_stream->get_max_num_samps() << std::endl;

    std::vector<std::complex<float> > fcpxIQ;
    fcpxIQ.resize(1024);
    usrp->issue_stream_cmd(stream_cmd);

    while (true)
    {
        usrp->issue_stream_cmd(stream_cmd);
        size_t num_rx_samps = rx_stream->recv(&fcpxIQ[0], 1024, md);
        emit ReceiveIQ(fcpxIQ);
        //std::cout << "Rx rate = " << usrp->get_rx_rate(0) << std::endl;
        //fcpxIQ.clear();
    }
}
You should not use NUM_SAMPS_AND_DONE if you want continuous streaming. That is exactly the use case it is not meant for: it tells the USRP to stop receiving once 1024 samples have been received.
Simply don't use that mode; request continuous streaming instead.
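A rough, untested sketch of what the loop could look like with continuous streaming, assuming the same usrp, rx_stream members and Qt signal as in your code:

uhd::stream_cmd_t stream_cmd(uhd::stream_cmd_t::STREAM_MODE_START_CONTINUOUS);
stream_cmd.stream_now = true;
usrp->issue_stream_cmd(stream_cmd);   // issue once, before the loop

uhd::rx_metadata_t md;
std::vector<std::complex<float> > fcpxIQ(1024);
while (true)
{
    rx_stream->recv(&fcpxIQ[0], fcpxIQ.size(), md);
    if (md.error_code != uhd::rx_metadata_t::ERROR_CODE_NONE)
        std::cerr << "recv error code: " << md.error_code << std::endl;   // e.g. an overflow if the host falls behind
    emit ReceiveIQ(fcpxIQ);
}

// On shutdown:
// uhd::stream_cmd_t stop_cmd(uhd::stream_cmd_t::STREAM_MODE_STOP_CONTINUOUS);
// usrp->issue_stream_cmd(stop_cmd);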
I am trying to create a function that will let me enter the desired frames per second and the maximum frame count, and then have the function cout to the console on fixed time steps. I am using Sleep() to avoid busy waiting as well. I seem to make the program sleep longer than it needs to; I think it keeps stalling on the Sleep call. Can you help me with this? I am having some trouble understanding timing, especially on Windows.
Ultimately I will probably use this timing method to time and animate a simple game, maybe like Pong, or even a simple program with objects that can accelerate. I think I already understand GDI and WASAPI well enough to play sound and show colour on the screen, so now I need to understand timing. I have looked for a long time before asking this question and I am sure I am missing something, but I can't quite put my finger on it.
Here is the code:
#include <windows.h>
#include <iostream>
// in this program i am trying to make a simple function that prints frame: and the number frame in between fixed time intervals
// i am trying to make it so that it doesn't do busy waiting
using namespace std;
void frame(LARGE_INTEGER& T, LARGE_INTEGER& T3, LARGE_INTEGER& DELT, LARGE_INTEGER& DESI, double& framepersec, unsigned long long& count, unsigned long long& maxcount, bool& on, LARGE_INTEGER& mili)
{
    QueryPerformanceCounter(&T3);                 // second measurement
    DELT.QuadPart = T3.QuadPart - T.QuadPart;     // ticks between the two time measurements (values, not addresses)
    if (DELT.QuadPart >= DESI.QuadPart) { count++; cout << "frame: " << count << " !" << endl; T.QuadPart = T3.QuadPart; } // adding just one frame (this may cause problems if more than one interval passes)
    if (count > maxcount) { on = false; }         // turning off the loop
    else
    {
        DESI.QuadPart = T.QuadPart + DESI.QuadPart;   // setting the stop tick
        unsigned long long sleep = (DESI.QuadPart - DELT.QuadPart) / mili.QuadPart;
        cout << sleep << endl;
        Sleep(sleep);                             // sleeping to avoid busy waiting
    }
}
int main()
{
    LARGE_INTEGER T1, T2, Freq, Delta, desired, mil;
    bool loopon = true;                        // keeps the loop running until max frames has been reached
    QueryPerformanceFrequency(&Freq);          // getting the number of counter ticks per second
    mil.QuadPart = Freq.QuadPart / 1000;       // the number of counter ticks that occur in a millisecond
    double framespersec;                       // the target number of frames per second
    unsigned long long framecount, maxcount;   // to stop the program after a certain amount of frames
    framecount = 0;
    cout << "Hello world! enter the amount of frames per second : " << endl;
    cin >> framespersec;
    cout << "you entered: " << framespersec << " ! how many max frames?" << endl;
    cin >> maxcount;
    cout << "you entered: " << maxcount << " ! now doing the frames !!!" << endl;
    desired.QuadPart = (LONGLONG)(Freq.QuadPart / framespersec);   // counter ticks per target frame
    QueryPerformanceCounter(&T1);              // first time measurement
    while (loopon == true)
    {
        frame(T1, T2, Delta, desired, framespersec, framecount, maxcount, loopon, mil);
    }
    cout << "all frames are done!" << endl;
    return 0;
}
The time that you sleep is limited by the frequency of the system clock. The frequency defaults to 64 Hz, so you'll end up seeing sleeps in increments of 16ms. Any sleep that's less than 16ms will be at least 16ms long - it could be longer depending on CPU load. Likewise, a sleep of 20ms will likely be rounded up to 32ms.
You can change this period by calling timeBeginPeriod(...) and timeEndPeriod(...), which can increase sleep accuracy to 1ms. If you have a look at multimedia apps like VLC Player, you'll see that they use these functions to get reliable frame timing. Note that this changes the system wide scheduling rate, so it will affect battery life on laptops.
More info:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd757624%28v=vs.85%29.aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686298%28v=vs.85%29.aspx
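Purely as an illustration (not from the linked articles), a minimal MSVC-style sketch of using the pair around a timing-sensitive section; the Sleep(5) is arbitrary and winmm.lib is linked via a pragma:

#include <windows.h>
#include <mmsystem.h>   // timeBeginPeriod / timeEndPeriod
#include <iostream>
#pragma comment(lib, "winmm.lib")

int main()
{
    timeBeginPeriod(1);                        // request 1 ms scheduling granularity

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);
    Sleep(5);                                  // would round up to ~16 ms without timeBeginPeriod
    QueryPerformanceCounter(&t1);
    std::cout << "Sleep(5) took "
              << (t1.QuadPart - t0.QuadPart) * 1000.0 / freq.QuadPart
              << " ms\n";

    timeEndPeriod(1);                          // always restore the previous period
    return 0;
}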
Waitable timers are more accurate than Sleep, and also integrate with a GUI message loop better (replace GetMessage with MsgWaitForMultipleObjects). I've used them successfully for graphics timing before.
They won't get you high precision for e.g. controlling serial or network output at sub-millisecond timing, but UI updates are limited by VSYNC anyway.
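For what it's worth, here is a small console-only sketch of a periodic waitable timer (no message loop; in a GUI app the WaitForSingleObject call is where MsgWaitForMultipleObjects would go). The 16 ms period is just an example, and expirations are still quantized by the system timer resolution discussed above:

#include <windows.h>
#include <iostream>

int main()
{
    HANDLE timer = CreateWaitableTimer(NULL, FALSE, NULL);   // auto-reset timer
    if (!timer) return 1;

    LARGE_INTEGER due;
    due.QuadPart = -160000LL;          // first expiry after 16 ms (negative = relative, 100 ns units)
    const LONG periodMs = 16;          // roughly 60 frames per second
    SetWaitableTimer(timer, &due, periodMs, NULL, NULL, FALSE);

    for (int frame = 0; frame < 60; ++frame)
    {
        WaitForSingleObject(timer, INFINITE);   // in a GUI app: MsgWaitForMultipleObjects
        std::cout << "frame: " << frame << "\n";
    }

    CancelWaitableTimer(timer);
    CloseHandle(timer);
    return 0;
}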
I have been looking at the performance of our C++ server application running on embedded Linux (ARM). The pseudo code for the main processing loop of the server is this -
for i = 1 to 1000
Process item i
Sleep for 20 ms
The processing for one item takes about 2ms. The "Sleep" here is really a call to the Poco library to do a "tryWait" on an event. If the event is fired (which it never is in my tests) or the time expires, it returns. I don't know what system call this equates to. Although we ask for a 2ms block, it turns out to be roughly 20ms. I can live with that - that's not the problem. The sleep is just an artificial delay so that other threads in the process are not starved.
The loop takes about 24 seconds to go through 1000 items.
The problem is, we changed the way the sleep is used so that we had a bit more control. I mean - 20ms delay for 2ms processing doesn't allow us to do much processing. With this new parameter set to a certain value it does something like this -
For i = 1 to 1000
Process item i
if i % 50 == 0 then sleep for 1000ms
That's the rough code, in reality the number of sleeps is slightly different and it happens to work out at a 24s cycle to get through all the items - just as before.
So we are doing exactly the same amount of processing in the same amount of time.
Problem 1 - the CPU usage for the original code is reported at around 1% (it varies a little but that's about average) and the CPU usage reported for the new code is about 5%. I think they should be the same.
Well, perhaps this CPU reporting isn't accurate, so I thought I'd sort a large text file at the same time and see how much it's slowed down by our server. The sort is a CPU-bound process (98% CPU usage according to top). The results are very odd. With the old code, the time taken to sort the file goes up by 21% when our server is running.
Problem 2 - If the server is only using 1% of the CPU then wouldn't the time taken to do the sort be pretty much the same?
Also, the time taken to go through all the items doesn't change - it's still 24 seconds with or without the sort running.
Then I tried the new code, it only slows the sort down by about 12% but it now takes about 40% longer to get through all the items it has to process.
Problem 3 - Why do the two ways of introducing an artificial delay cause such different results. It seems that the server which sleeps more frequently but for a minimum time is getting more priority.
I have a half baked theory on the last one - whatever the system call that is used to do the "sleep" is switching back to the server process when the time is elapsed. This gives the process another bite at the time slice on a regular basis.
Any help appreciated. I suspect I'm just not understanding it correctly and that things are more complicated than I thought. I can provide more details if required.
Thanks.
Update: replaced tryWait(2) with usleep(2000) - no change. In fact, sched_yield() does the same.
Well I can at least answer problem 1 and problem 2 (as they are the same issue).
After trying out various options in the actual server code, we came to the conclusion that the CPU reporting from the OS is incorrect. It's quite a result, so to make sure, I wrote a stand-alone program that doesn't use Poco or any of our code, just plain Linux system calls and standard C++ features. It implements the pseudo code above. The processing is replaced with a tight loop that just checks the elapsed time to see if 2ms is up. The sleeps are proper sleeps.
The small test program shows exactly the same problem. i.e. doing the same amount of processing but splitting up the way the sleep function is called, produces very different results for CPU usage. In the case of the test program, the reported CPU usage was 0.0078 seconds using 1000 20ms sleeps but 1.96875 when a less frequent 1000ms sleep was used. The amount of processing done is the same.
Running the test on a Linux PC did not show the problem. Both ways of sleeping produced exactly the same CPU usage.
So it is clearly a problem with our embedded system and the way it measures CPU time when a process yields so often (you get the same problem with sched_yield instead of a sleep).
Update: Here's the code. RunLoop is where the main bit is done -
#include <iostream>
#include <sys/time.h>   // gettimeofday
#include <time.h>       // clock_gettime
#include <unistd.h>     // usleep

int sleepCount;

double getCPUTime()
{
    clockid_t id = CLOCK_PROCESS_CPUTIME_ID;
    struct timespec ts;
    if (id != (clockid_t)-1 && clock_gettime(id, &ts) != -1)
        return (double)ts.tv_sec +
               (double)ts.tv_nsec / 1000000000.0;
    return -1;
}

double GetElapsedMilliseconds(const timeval& startTime)
{
    timeval endTime;
    gettimeofday(&endTime, NULL);
    double elapsedTime = (endTime.tv_sec - startTime.tv_sec) * 1000.0;   // sec to ms
    elapsedTime += (endTime.tv_usec - startTime.tv_usec) / 1000.0;       // us to ms
    return elapsedTime;
}

void SleepMilliseconds(int milliseconds)
{
    timeval startTime;
    gettimeofday(&startTime, NULL);
    usleep(milliseconds * 1000);
    double elapsedMilliseconds = GetElapsedMilliseconds(startTime);
    if (elapsedMilliseconds > milliseconds + 0.3)
        std::cout << "Sleep took longer than it should " << elapsedMilliseconds;
    sleepCount++;
}

void DoSomeProcessingForAnItem()
{
    timeval startTime;
    gettimeofday(&startTime, NULL);
    double processingTimeMilliseconds = 2.0;
    double elapsedMilliseconds;
    do
    {
        elapsedMilliseconds = GetElapsedMilliseconds(startTime);
    } while (elapsedMilliseconds <= processingTimeMilliseconds);
    if (elapsedMilliseconds > processingTimeMilliseconds + 0.1)
        std::cout << "Processing took longer than it should " << elapsedMilliseconds;
}

void RunLoop(bool longSleep)
{
    int numberOfItems = 1000;
    timeval startTime;
    gettimeofday(&startTime, NULL);
    timeval startMainLoopTime;
    gettimeofday(&startMainLoopTime, NULL);
    for (int i = 0; i < numberOfItems; i++)
    {
        DoSomeProcessingForAnItem();
        double elapsedMilliseconds = GetElapsedMilliseconds(startTime);
        if (elapsedMilliseconds > 100)
        {
            std::cout << "Item count = " << i << "\n";
            if (longSleep)
            {
                SleepMilliseconds(1000);
            }
            gettimeofday(&startTime, NULL);
        }
        if (longSleep == false)
        {
            // Does 1000 * 20 ms sleeps.
            SleepMilliseconds(20);
        }
    }
    double elapsedMilliseconds = GetElapsedMilliseconds(startMainLoopTime);
    std::cout << "Main loop took " << elapsedMilliseconds / 1000 << " seconds\n";
}

void DoTest(bool longSleep)
{
    timeval startTime;
    gettimeofday(&startTime, NULL);
    double startCPUtime = getCPUTime();
    sleepCount = 0;
    int runLoopCount = 1;
    for (int i = 0; i < runLoopCount; i++)
    {
        RunLoop(longSleep);
        std::cout << "**** Done one loop of processing ****\n";
    }
    double endCPUtime = getCPUTime();
    std::cout << "Elapsed time is " << GetElapsedMilliseconds(startTime) / 1000 << " seconds\n";
    std::cout << "CPU time used is " << endCPUtime - startCPUtime << " seconds\n";
    std::cout << "Sleep count " << sleepCount << "\n";
}

void testLong()
{
    std::cout << "Running testLong\n";
    DoTest(true);
}

void testShort()
{
    std::cout << "Running testShort\n";
    DoTest(false);
}
I am working on some grid-generation code, and I really want to see how far along it is, so I downloaded a piece of progress-bar code from the internet and inserted it into my code, something like:
std::string bar;
for (int i = 0; i < 50; i++)
{
    if (i < (percent / 2))
    {
        bar.replace(i, 1, "=");
    }
    else if (i == (percent / 2))
    {
        bar.replace(i, 1, ">");
    }
    else
    {
        bar.replace(i, 1, " ");
    }
}
std::cout << "\r" "[" << bar << "] ";
std::cout.width(3);
std::cout << percent << "% "
          << " iteration: " << iterationCycle << std::flush;
This is very straightforward. However, it GREATLY slows down the whole process; note that percent = iterI/nIter.
I am really getting annoyed with this, and I am wondering if there is a smarter, more efficient way to print a progress bar to the screen.
Thanks a million.
Firstly you could consider only updating it on every 100 or 1000 iterations. Secondly, I don't think the division is the bottleneck, but much rather the string operations and the outputting itself.
I guess the only significant improvement would be to just output less often.
Oh, and just for good measure: an efficient way to execute the code only every, say, 1024 iterations is to avoid the modulo operation and test the low bits directly with a bitwise AND. Something along the lines of
if ((iterationCycle & 1023) == 0) {
would work. iterationCycle & 1023 keeps only the lowest ten bits, so the condition is true exactly once every 1024 iterations (whenever those bits are all zero), which is the same test as iterationCycle % 1024 == 0. These kinds of operations are extremely fast, as your CPU has dedicated hardware for them. A small sketch putting this together follows.
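As an illustration only (percent and iterationCycle are the variables from the question; drawProgress is a made-up helper), the redraw could be pulled into a function and throttled like this:

#include <iostream>
#include <string>

// Redraw the 50-character bar; call this only every so often, not every iteration.
void drawProgress(int percent, int iterationCycle)
{
    int pos = percent / 2;
    std::string bar(50, ' ');
    for (int i = 0; i < pos && i < 50; ++i) bar[i] = '=';
    if (pos >= 0 && pos < 50) bar[pos] = '>';
    std::cout << "\r[" << bar << "] ";
    std::cout.width(3);
    std::cout << percent << "%  iteration: " << iterationCycle << std::flush;
}

// In the grid-generation loop:
// if ((iterationCycle & 1023) == 0)   // same test as iterationCycle % 1024 == 0
//     drawProgress(percent, iterationCycle);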
You might be overthinking this. I would just output a single character every however-many cycles of your main application code. Run some tests to see how many (hundreds? millions?), but you shouldn't print more than say once a second. Then just do:
std::fputc('*', stdout);
std::fflush(stdout);
You should really check the "efficiency" yourself, but what would work almost the same is boost.progress:
#include <boost/progress.hpp>
...
boost::progress_display pd(50);
for (int i = 0; i < 50; i++) {
    ++pd;
}
And, as Joost already answered, output less often.
Is there any way in C++ to calculate how long it takes to run a given program or routine in CPU time?
I work with Visual Studio 2008 running on Windows 7.
If you want to know the total amount of CPU time used by a process, neither clock nor rdtsc (either directly or via a compiler intrinsic) is really the best choice, at least IMO. If you need the code to be portable, about the best you can do is use clock, test with the system as quiescent as possible, and hope for the best (but be aware that clock ticks in units of 1/CLOCKS_PER_SEC seconds, that CLOCKS_PER_SEC may or may not be 1000, and that even if it is, your actual timing resolution often won't be that good -- it may report times in milliseconds, but will normally advance by tens of milliseconds at a time).
Since, however, you don't seem to mind the code being specific to Windows, you can do quite a bit better. At least if my understanding of what you're looking for is correct, what you really want is probably GetProcessTimes, which will (separately) tell you both kernel-mode and user-mode CPU usage of the process (as well as the start time and exit time, from which you can compute wall time used, if you care). There's also QueryProcessCycleTime, which will tell you the total number of CPU clock cycles used by the process (the total of both user and kernel mode across all threads). Personally, I have a hard time imagining much use for the latter, though -- counting individual clock cycles can be useful for small sections of code subject to intensive optimization, but I'm less certain about how you'd apply it to a complete process. GetProcessTimes uses FILETIME structures, which support resolutions of 100 nanoseconds, but in reality most times you'll see will be multiples of the scheduler's time slice (which varies with the version of Windows, but is on the order of milliseconds to tens of milliseconds).
In any case, if you truly want time from beginning to end, GetProcessTimes will let you do that -- if you spawn the program (e.g., with CreateProcess), you'll get a handle to the process which will be signaled when the child process exits. You can then call GetProcessTimes on that handle, and retrieve the times even though the child has already exited -- the handle will remain valid as long as at least one handle to the process remains open.
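To make that concrete, here is a rough, untested sketch of that pattern. The command line "child.exe" is just a placeholder and error handling is trimmed:

#include <windows.h>
#include <iostream>

int main()
{
    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    wchar_t cmd[] = L"child.exe";                      // placeholder command line
    if (!CreateProcessW(NULL, cmd, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi))
        return 1;

    WaitForSingleObject(pi.hProcess, INFINITE);        // handle stays valid after the child exits

    FILETIME creation, exitTime, kernel, user;
    GetProcessTimes(pi.hProcess, &creation, &exitTime, &kernel, &user);

    ULARGE_INTEGER k, u;                               // FILETIME is in 100 ns units
    k.LowPart = kernel.dwLowDateTime;  k.HighPart = kernel.dwHighDateTime;
    u.LowPart = user.dwLowDateTime;    u.HighPart = user.dwHighDateTime;
    std::cout << "kernel: " << k.QuadPart / 1e7 << " s, "
              << "user: "   << u.QuadPart / 1e7 << " s\n";

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}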
Here's one way. It measures routine execution time in milliseconds.
clock_t begin = clock(); is taken before the routine runs and clock_t end = clock(); right after it returns.
The two readings are then subtracted and the result converted to a millisecond value.
#include <stdio.h>
#include <iostream>
#include <time.h>

using namespace std;

double get_CPU_time_usage(clock_t clock1, clock_t clock2)
{
    double diffticks = clock1 - clock2;
    double diffms = (diffticks * 1000) / CLOCKS_PER_SEC;
    return diffms;
}

void test_CPU_usage()
{
    cout << "Standby.. measuring execution time: ";
    for (int i = 0; i < 10000; i++)
    {
        cout << "\b\\" << std::flush;
        cout << "\b|" << std::flush;
        cout << "\b/" << std::flush;
        cout << "\b-" << std::flush;
    }
    cout << " \n\n";
}

int main(void)
{
    clock_t begin = clock();
    test_CPU_usage();
    clock_t end = clock();
    cout << "Time elapsed: " << double(get_CPU_time_usage(end, begin)) << " ms ("
         << double(get_CPU_time_usage(end, begin)) / 1000 << " sec) \n\n";
    return 0;
}
The __rdtscp intrinsic will give you the time in CPU cycles with some caveats.
Here's the MSDN article
It really depends on what you want to measure. For better results, take the average over a few million (if not billion) iterations.
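For instance, a small MSVC-style sketch (assuming a compiler and CPU that support the intrinsic); note that the result is in cycles, not wall-clock time:

#include <intrin.h>
#include <iostream>

int main()
{
    unsigned int aux;                              // receives the processor ID
    unsigned __int64 start = __rdtscp(&aux);

    volatile long long sum = 0;                    // the code being measured
    for (int i = 0; i < 1000000; ++i) sum += i;

    unsigned __int64 end = __rdtscp(&aux);
    std::cout << "elapsed cycles: " << (end - start) << "\n";
    return 0;
}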
The clock() function [as provided by Visual C++ 2008] doesn't return processor time used by the program, while it should (according to the C standard and/or C++ standard). That said, to measure CPU time on Windows, I have this helper class (which is inevitably non-portable):
#include <windows.h>
#include <iostream>
#include <tuple>

class ProcessorTimer
{
public:
    ProcessorTimer() { start(); }
    void start() { ::GetProcessTimes(::GetCurrentProcess(), &ft_[3], &ft_[2], &ft_[1], &ft_[0]); }
    std::tuple<double, double> stop()
    {
        ::GetProcessTimes(::GetCurrentProcess(), &ft_[5], &ft_[4], &ft_[3], &ft_[2]);
        ULARGE_INTEGER u[4];
        for (size_t i = 0; i < 4; ++i)
        {
            u[i].LowPart = ft_[i].dwLowDateTime;
            u[i].HighPart = ft_[i].dwHighDateTime;
        }
        double user = (u[2].QuadPart - u[0].QuadPart) / 10000000.0;     // FILETIME is in 100 ns units
        double kernel = (u[3].QuadPart - u[1].QuadPart) / 10000000.0;
        return std::make_tuple(user, kernel);
    }
private:
    FILETIME ft_[6];
};

class ScopedProcessorTimer
{
public:
    ScopedProcessorTimer(std::ostream& os = std::cerr) : timer_(ProcessorTimer()), os_(os) { }
    ~ScopedProcessorTimer()
    {
        std::tuple<double, double> t = timer_.stop();
        os_ << "user " << std::get<0>(t) << "\n";
        os_ << "kernel " << std::get<1>(t) << "\n";
    }
private:
    ProcessorTimer timer_;
    std::ostream& os_;
};
For example, one can measure how long it takes a block to execute, by defining a ScopedProcessorTimer at the beginning of that {} block.
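For example (just a sketch; the loop is stand-in CPU-bound work), the destructor reports the times when the block is left:

void doWork()
{
    {
        ScopedProcessorTimer timer;              // writes to std::cerr by default
        for (volatile int i = 0; i < 100000000; ++i)
            ;                                    // CPU-bound work being measured
    }                                            // timer destroyed here, user/kernel times printed
}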
This code measures process CPU usage:
ULONGLONG LastCycleTime = 0;
LARGE_INTEGER LastPCounter;
LastPCounter.QuadPart = 0;   // LARGE_INTEGER init

// get the number of CPU cores
SYSTEM_INFO sysInfo;
GetSystemInfo(&sysInfo);
int numProcessors = sysInfo.dwNumberOfProcessors;

HANDLE hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, Process::pid);
if (hProcess == NULL)
    nResult = 0;

int count = 0;
while (true)
{
    ULONG64 CycleTime;
    LARGE_INTEGER qpcLastInt;
    if (!QueryProcessCycleTime(hProcess, &CycleTime))
        nResult = 0;
    ULONG64 cycle = CycleTime - LastCycleTime;
    if (!QueryPerformanceCounter(&qpcLastInt))
        nResult = 0;
    double Usage = cycle / ((double)(qpcLastInt.QuadPart - LastPCounter.QuadPart));
    // scaling
    Usage *= 1.0 / numProcessors;
    Usage *= 0.1;
    LastPCounter = qpcLastInt;
    LastCycleTime = CycleTime;
    if (count > 3)
    {
        printf("%.1f", Usage);
        break;
    }
    Sleep(1);   // QueryPerformanceCounter resolution is about 1 microsecond
    count++;
}
CloseHandle(hProcess);