Hi, I am using QueryPerformanceFrequency to get the number of cycles, i.e. the processor speed.
But it is showing me the wrong value. The specification says the processor is about 400 MHz, but what we get through code is about 16 MHz.
Please provide any pointers.
The code for the WinCE device is:
<pre>
LARGE_INTEGER FrequencyCounter;
QueryPerformanceFrequency(&FrequencyCounter);
CString temp;
temp.Format(L"%lld", FrequencyCounter.QuadPart);
AfxMessageBox(temp);
</pre>
Thanks,
Mukesh
QueryPerformanceFrequency returns the frequency of the counter peripheral, not of the processor. These peripherals typically run at the original crystal clock frequency. 16 MHz should be good enough resolution for you to measure fine-grained intervals.
QPF doesn't return the CPU clock speed. It returns the frequency of a high-performance timer. On a few systems, it might actually measure CPU cycles. On other systems, it might use a separate timer running at the same frequency (but which is unaffected by things like SpeedStep, which can change the clock speed of the CPU). Often it uses a separate timer entirely, one which may not even be on the CPU itself, but may be part of the motherboard.
QueryPerformanceCounter/QueryPerformanceFrequency only promise that they use the best timer available on the system. They make no promises about what that timer might be.
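To make the distinction concrete, here is a minimal sketch (my illustration, not from the answers above) of how the pair is meant to be used: convert a difference in counter ticks to seconds using the reported frequency, rather than treating the frequency as the CPU speed.
<pre>
LARGE_INTEGER freq, start, stop;
QueryPerformanceFrequency(&freq);      // ticks per second of the timer (the ~16 MHz seen above)
QueryPerformanceCounter(&start);
// ... code being timed ...
QueryPerformanceCounter(&stop);
double seconds = (double)(stop.QuadPart - start.QuadPart) / (double)freq.QuadPart;
</pre>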
Related
Previously, in the main loop of my game, time was managed at 60 FPS, with a corresponding Delay for the frame time.
The sprite sequence was animated as follows:
<pre>
// advance to the next sprite frame every 10 iterations of the main loop
if(++ciclos > 10){
    siguienteSprite++;
    ciclos = 0;
}
</pre>
Now that I am using smooth motion with DeltaTime, I have removed the Delay from the main loop. As a result, the animation's sprite cycles run faster, and the time between each step of the sequence also varies.
Could someone give me a hand with just the logic of this problem? Thanks in advance. :)
A delay in the main loop is not really a good way to do this (it does not account for the time the other work in your main loop takes). When you removed the delay, the speed increased and varies more because the timing of the other work in your main loop becomes more significant, and it is usually not constant for many reasons such as:
OS granularity
synchronization with gfx card/driver
non constant processing times
There are several ways to handle this:
measure time
<pre>
t1 = get_actual_time();
while (t1 - t0 >= animation_T)
{
    siguienteSprite++;      // advance one animation frame per elapsed period
    t0 += animation_T;
}
// t0 = t1; // this is optional and changes the timing properties a bit
</pre>
where t0 is a global variable holding the "last" measured time of a sprite change, t1 is the current time, and animation_T is the time constant between animation changes. To measure time you need to use an OS API like PerformanceCounter on Windows, or RDTSC in asm, or anything else you have at hand with small enough resolution.
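For example, a possible get_actual_time() on Windows (my sketch, returning milliseconds as a double) could look like this; any monotonic clock with fine enough resolution would do:
<pre>
double get_actual_time()
{
    LARGE_INTEGER f, c;
    QueryPerformanceFrequency(&f);   // ticks per second
    QueryPerformanceCounter(&c);     // current tick count
    return 1000.0 * (double)c.QuadPart / (double)f.QuadPart;
}
</pre>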
OS timer
simply increment siguienteSprite in some timer with an animation_T interval. This is simple, but OS timers are not precise, usually around 1 ms or more plus OS granularity (similar to Sleep accuracy).
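For illustration only (not part of the original answer), a Win32 version of this idea could use SetTimer; the window handle and interval below are assumptions:
<pre>
// assumes a Win32 message loop is already running and siguienteSprite is global
VOID CALLBACK OnAnimTimer(HWND hwnd, UINT msg, UINT_PTR idEvent, DWORD time)
{
    siguienteSprite++;   // advance the animation frame
}

// during initialization:
//   SetTimer(hwnd, 1, animation_T, OnAnimTimer);   // fires roughly every animation_T ms
// when done:
//   KillTimer(hwnd, 1);
</pre>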
Thread timer
you can create a single thread for timing purposes, for example something like this:
<pre>
for (;!threads_stop;)
{
    Delay(animation_T); // or Sleep()
    siguienteSprite++;
}
</pre>
Do not forget that siguienteSprite must be volatile and buffered during rendering to avoid flickering and/or access-violation errors. This approach is a bit more precise (unless you have a single-core CPU).
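A self-contained sketch of the same idea in standard C++ (my illustration; std::atomic takes the place of the volatile caveat above):
<pre>
#include &lt;atomic&gt;
#include &lt;chrono&gt;
#include &lt;thread&gt;

std::atomic&lt;int&gt;  siguienteSprite{0};
std::atomic&lt;bool&gt; threads_stop{false};
const int animation_T = 100;   // [ms] assumed frame interval

void animation_timer()
{
    while (!threads_stop)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(animation_T));
        ++siguienteSprite;     // the render loop reads a snapshot of this value
    }
}
// usage: std::thread t(animation_timer); ... threads_stop = true; t.join();
</pre>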
You can also increment some time variable instead, and use that as the actual time in your app with any resolution you want. But beware: if the delay does not return CPU control to the OS, this approach will use 100%/CPU_cores of your CPU. There is a remedy for this, which is replacing your delay with this:
<pre>
Sleep(0.9*animation_T);
for (;;)
{
    t1 = get_actual_time();
    if (t1 - t0 >= animation_T)
    {
        siguienteSprite++;
        t0 = t1;
        break;
    }
}
</pre>
If you are using measured time, then you should handle overflows (t1 &lt; t0), because any counter will overflow after some time. For example, using the 32-bit part of RDTSC on a 3.2 GHz CPU core overflows every 2^32/3.2e9 = 1.342 s, so it is a real possibility. If my memory serves well, performance counters on Windows usually run at around 3.5 MHz on older OS versions and around 60-120 MHz on newer ones (at least the last time I checked), and they are 64-bit, so overflows are not that big of a problem (unless you run 24/7). Also, in case you use RDTSC, you should set the process/thread affinity to a single CPU core to avoid timing problems on multi-core CPUs.
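As an illustration of both points (my sketch, Windows/MSVC assumed): unsigned subtraction gives a correct delta even across a single wrap, and SetThreadAffinityMask pins the thread so RDTSC is always read from the same core:
<pre>
#include &lt;windows.h&gt;
#include &lt;intrin.h&gt;   // __rdtsc()

void tsc_example()
{
    SetThreadAffinityMask(GetCurrentThread(), 1);   // pin to core 0 for a consistent TSC

    unsigned __int64 t0 = __rdtsc();
    // ... work being timed ...
    unsigned __int64 t1 = __rdtsc();

    unsigned __int64 dt = t1 - t0;   // correct even if the counter wrapped once
}
</pre>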
I did my share of benchmarking and advanced high-resolution timing at a low level over the years, so here are a few related QAs of mine:
wrong clock cycle measurements with rdtsc - OS Granularity
Measuring Cache Latencies - measuring CPU frequency
Cache size estimation on your system? - PerformanceCounter example
Questions on Measuring Time Using the CPU Clock - PIT as alternative timing source
Is taking the timestamp in Linux the same as taking the clock cycles in ticks?
I know how to get the timestamp, but I don't know how to get the clock cycles in ticks.
Could someone tell me what the difference between the two is?
Could someone give me a small example of clock cycles in ticks?
Looking forward to your reply.
Extra: I am trying to add assembly instructions for timing in the PISA architecture using the SimpleScalar simulator. For my instructions I should access the clock cycles and store them in a register.
Well, there are:
Ticks. Modern kernels don't use them.
Timestamp counter (TSC): this is what you probably want to use for high-resolution time measurements, but you have to understand how to make unbiased measurements with it.
Clocks, portable clock_gettime, nanosecond-resolution, often good enough for everything.
If you want to get access to CPU performance registers, you can start with Intel® Performance Counter Monitor inside your programs.
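As a small example of the difference (my sketch, x86 Linux assumed): clock_gettime() returns a wall-clock-style timestamp, while __rdtsc() returns the raw TSC tick count:
<pre>
#include &lt;stdio.h&gt;
#include &lt;stdint.h&gt;
#include &lt;time.h&gt;
#include &lt;x86intrin.h&gt;   // __rdtsc()

int main(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);   // timestamp: seconds + nanoseconds
    uint64_t ticks = __rdtsc();            // clock cycles in ticks (raw TSC value)

    printf("timestamp: %ld.%09ld s\n", (long)ts.tv_sec, ts.tv_nsec);
    printf("tsc ticks: %llu\n", (unsigned long long)ticks);
    return 0;
}
</pre>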
This question is not about timing something accurately on Windows (XP or better), but rather about doing something very rapidly via callback or interrupt.
I need to be doing something regularly every 1 millisecond, or preferably even every 100 microseconds. What I need to do is drive some asynchronous hardware (Ethernet) at this rate to output a steady stream of packets to the network, and make that stream appear as regular and synchronous as possible. But if the question can be separated from the (Ethernet) device, it would be good to know the general answer.
Before you say "don't even think about using Windows!!!!", a little context. Not all real-time systems have the same demands. Most of the time songs and video play acceptably on Windows despite needing to handle blocks of audio or images every 10-16ms or so on average. With appropriate buffering, Windows can have its variable latencies, but the hardware can be broadly immune to them, and keep a steady synchronous stream of events happening. Even so, most of us tolerate the occasional glitch. My application is like that - probably quite tolerant.
The expensive option for me is to port my entire application to Linux. But Linux is simply different software running on the same hardware, so my strong preference is to write some better software, and stick with Windows. I have the luxury of being able to eliminate all competing hardware and software (no internet or other network access, no other applications running, etc). Do I have any prospect of getting Windows to do this? What limitations will I run into?
I am aware that my target hardware has a High Performance Event Timer, and that this timer can be programmed to interrupt, but that there is no driver for it. Can I write one? Are there useful examples out there? I have not found one yet. Would this interfere with QueryPerformanceCounter? Does the fact that I'm going to be using an ethernet device mean that it all becomes simple if I use select() judiciously?
Pointers to useful articles welcomed - I have found dozens of overviews on how to get accurate times, but none yet on how to do something like this other than by using what amounts to a busy wait. Is there a way to avoid a busy wait? Is there a kernel mode or device driver option?
You should consider looking at the Multimedia Timers. These are timers intended to provide the sort of resolution you are looking for.
Have a look here on MSDN.
I did this using DirectX 9 with QueryPerformanceCounter, but you will need to hog at least one core, as task switching will mess you up.
For a good comparison of timers you can look at
http://www.geisswerks.com/ryan/FAQS/timing.html
If you run into timer granularity issues, I would suggest using good old Sleep() with a spin loop. Essentially, the code should do something like:
<pre>
void PrecisionSleep(uint64 microSec)
{
    uint64 start_time;
    start_time = GetCurrentTime(); // assuming GetCurrentTime() returns microsecs

    // Calculate number of 10ms intervals using standard OS sleep.
    Sleep(10*(microSec/10000)); // assuming Sleep() takes millisecs as argument

    // Spin loop to spend the rest of the time in
    while(GetCurrentTime() - start_time < microSec)
    {}
}
</pre>
This way, you will have a high-precision sleep which won't tax your CPU much, as long as most of the sleeps are longer than the scheduling granularity (assumed 10 ms). You can send your packets in a loop while you use the high-precision sleep to time them.
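For instance, a hypothetical send loop (SendOnePacket() is a stand-in name, not from the answer) could pace packets at 1 ms like this:
<pre>
for (;;)
{
    SendOnePacket();        // assumed transmit routine
    PrecisionSleep(1000);   // wait ~1000 microseconds before the next packet
}
</pre>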
The reason audio works fine on most systems is that the audio device has its own clock. You just buffer the audio data to it and it takes care of playing it and interrupts the program when the buffer is empty. In fact, a time skew between the audio card clock and the CPU clock can cause problems if a playback engine relies on the CPU clock.
EDIT:
You can build a timer abstraction out of this by using a thread which keeps a lock-protected min-heap of timed entries (the heap comparison is done on the expiry timestamp); then you can either callback() or SetEvent() when the PrecisionSleep() to the next timestamp completes.
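A rough sketch of that abstraction (my illustration, reusing the assumed GetCurrentTime()/PrecisionSleep() from above; a production version would block rather than spin when the heap is empty):
<pre>
#include &lt;cstdint&gt;
#include &lt;functional&gt;
#include &lt;mutex&gt;
#include &lt;queue&gt;
#include &lt;vector&gt;

struct TimedEntry
{
    uint64_t expiry;                  // absolute expiry time in microseconds
    std::function&lt;void()&gt; callback;   // work to run when it expires
};

struct LaterFirst                     // makes priority_queue a min-heap on expiry
{
    bool operator()(const TimedEntry& a, const TimedEntry& b) const
    { return a.expiry > b.expiry; }
};

std::mutex heap_lock;
std::priority_queue&lt;TimedEntry, std::vector&lt;TimedEntry&gt;, LaterFirst&gt; heap;

void TimerThread()
{
    for (;;)
    {
        TimedEntry next;
        {
            std::lock_guard&lt;std::mutex&gt; guard(heap_lock);
            if (heap.empty()) continue;   // real code would wait on a condition variable
            next = heap.top();
            heap.pop();
        }
        uint64_t now = GetCurrentTime();            // microseconds (assumed)
        if (next.expiry > now)
            PrecisionSleep(next.expiry - now);      // high-precision wait until expiry
        next.callback();                            // or SetEvent(...) instead
    }
}
</pre>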
Use NtSetTimerResolution when the program starts up to set the timer resolution. Yes, it is an undocumented function, but it works well. You may also use NtQueryTimerResolution to check the timer resolution (before setting and after setting the new resolution, to be sure).
You need to get the addresses of these functions dynamically using GetProcAddress from NTDLL.DLL, as they are not declared in any header or LIB file.
Setting the timer resolution this way will affect Sleep, Windows timers, functions that return the current time, etc.
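A sketch of that lookup (since these are undocumented NTDLL calls, the prototype below is the commonly reported one and the resolution values are in 100-ns units; treat it as an assumption rather than a documented contract):
<pre>
#include &lt;windows.h&gt;

typedef LONG (NTAPI *NtSetTimerResolution_t)(ULONG DesiredResolution,
                                             BOOLEAN SetResolution,
                                             PULONG CurrentResolution);

void RaiseTimerResolution()
{
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
    if (!ntdll) return;

    NtSetTimerResolution_t NtSetTimerResolution =
        (NtSetTimerResolution_t)GetProcAddress(ntdll, "NtSetTimerResolution");
    if (!NtSetTimerResolution) return;

    ULONG actual = 0;
    NtSetTimerResolution(5000, TRUE, &actual);   // request 0.5 ms; 'actual' receives what was granted
}
</pre>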
Is there any method to sleep a thread for 100.8564 milliseconds under the Windows OS? I am using a multimedia timer, but its resolution is at minimum 1 second. Kindly guide me so that I can handle the fractional part of the millisecond.
Yes, you can do it. See QueryPerformanceCounter() to read an accurate time, and make a busy loop.
This will enable you to make waits with up to 10-nanosecond resolution; however, if the thread scheduler decides to take control from you at the moment the wait ends, it will, and there's nothing you can do about it except assigning your process realtime priority.
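A minimal busy-wait sketch along those lines (my illustration):
<pre>
#include &lt;windows.h&gt;

void BusyWaitMs(double milliseconds)   // e.g. BusyWaitMs(100.8564)
{
    LARGE_INTEGER freq, start, now;
    QueryPerformanceFrequency(&freq);   // counts per second
    QueryPerformanceCounter(&start);

    const LONGLONG target = (LONGLONG)(milliseconds * freq.QuadPart / 1000.0);
    do {
        QueryPerformanceCounter(&now);
    } while (now.QuadPart - start.QuadPart < target);
}
</pre>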
You may also have a look at this: http://msdn.microsoft.com/en-us/library/ms838340(WinEmbedded.5).aspx
Several frameworks have been developed to do hard realtime on Windows.
Otherwise, your question probably implies that you might be doing something wrong. There are numerous mechanisms to avoid ever needing precise delays, such as using proper bus drivers (in the case of hardware/IO, or the respective DMAs if you are designing a driver), and more.
Please tell us what exactly you are building.
I do not know your use case, but even a high-end realtime operating system would be hard pressed to achieve less than 100 ns of jitter on timings.
In most cases I have found that you do not need that precision in reproducibility, but only in long-term drift. In that respect it is relatively straightforward to keep a timeline and calculate the events at the desired precision, then use that timeline to synchronize events which may individually be off even by tens of ms. As long as these errors do not add up, I found I got adequate performance.
If you need guaranteed latency, you cannot get it with MS Windows. It's not a realtime operating system. It might swap in another thread or process at an inopportune instant. You might get a cache miss. When I did a robot controller a while back, I used an OS called On Time RTOS 32. It has an MS Windows API emulation layer. You can use it with Visual Studio. You'll need something like that.
The resolution of a multimedia timer is much better than one second. It can go down to 1 millisecond when you call timeBeginPeriod(1) first. The timer will automatically adjust its interval for the next callback when a callback is delivered late, which is inevitable on a multi-tasking operating system: there is always some kernel thread with a higher priority than yours that will delay the callback.
While it will work pretty well on average, worst-case latency is on the order of hundreds of milliseconds. Clearly, your requirements cannot be met by Windows by a long shot. You'll need some kind of microcontroller to supply that kind of execution guarantee.
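For reference, a hedged sketch of the multimedia-timer setup described above, using the documented winmm calls (note that the interval is still whole milliseconds, so 100.8564 ms gets rounded):
<pre>
#include &lt;windows.h&gt;
#include &lt;mmsystem.h&gt;
#pragma comment(lib, "winmm.lib")

void CALLBACK OnTick(UINT uTimerID, UINT uMsg, DWORD_PTR dwUser, DWORD_PTR dw1, DWORD_PTR dw2)
{
    // periodic work goes here; keep it short, it runs on a worker thread
}

void StartPeriodicTimer()
{
    timeBeginPeriod(1);                                           // ask for 1 ms scheduler granularity
    MMRESULT id = timeSetEvent(101, 1, OnTick, 0, TIME_PERIODIC); // nearest whole ms to 100.8564
    // ... later: timeKillEvent(id); timeEndPeriod(1);
}
</pre>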
I was just wondering if there is an elegant way to set the maximum CPU load for a particular thread doing intensive calculations.
Right now I have located the most time-consuming loop in the thread (it does only compression) and use GetTickCount() and Sleep() with hardcoded values. It makes sure that the loop continues for a certain period and then sleeps for a certain minimum time. It more or less does the job, i.e. it guarantees that the thread will not use more than 50% of the CPU. However, the behavior depends on the number of CPU cores (a huge disadvantage) and is simply ugly (a smaller disadvantage :)). Any ideas?
I am not aware of any API to get the OS's scheduler to do what you want (even if your thread is idle-priority, if there are no higher-priority ready threads, yours will run). However, I think you can improvise a fairly elegant throttling function based on what you are already doing. Essentially (I don't have a Windows dev machine handy):
Pick a default amount of time the thread will sleep each iteration. Then, on each iteration (or on every nth iteration, so that the throttling function doesn't itself become a significant CPU load),
Compute the amount of CPU time your thread used since the last time your throttling function was called (I'll call this dCPU). You can use the GetThreadTimes() API to get the amount of time your thread has been executing.
Compute the amount of real time elapsed since the last time your throttling function was called (I'll call this dClock).
dCPU / dClock is the percent CPU usage (of one CPU). If it is higher than you want, increase your sleep time, if lower, decrease the sleep time.
Have your thread sleep for the computed time.
Depending on how your watchdog computes CPU usage, you might want to use GetProcessAffinityMask() to find out how many CPUs the system has. dCPU / (dClock * CPUs) is the percentage of total CPU time available.
You will still have to pick some magic numbers for the initial sleep time and the increment/decrement amount, but I think this algorithm could be tuned to keep a thread running fairly close to a chosen percentage of CPU.
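A rough sketch of that throttle (my code, not the answerer's; the initial values are placeholder magic numbers and GetTickCount64() supplies the wall clock):
<pre>
#include &lt;windows.h&gt;

void ThrottleToTarget(double targetFraction)   // e.g. 0.5 for ~50% of one CPU
{
    static ULONGLONG lastWall = GetTickCount64();   // ms of real time
    static ULONGLONG lastCpu  = 0;                  // CPU time in 100-ns units
    static DWORD     sleepMs  = 10;                 // initial sleep guess

    FILETIME createT, exitT, kernelT, userT;
    GetThreadTimes(GetCurrentThread(), &createT, &exitT, &kernelT, &userT);
    ULARGE_INTEGER k, u;
    k.LowPart = kernelT.dwLowDateTime; k.HighPart = kernelT.dwHighDateTime;
    u.LowPart = userT.dwLowDateTime;   u.HighPart = userT.dwHighDateTime;
    ULONGLONG cpuNow  = k.QuadPart + u.QuadPart;    // total CPU time used by this thread
    ULONGLONG wallNow = GetTickCount64();

    double dCPU   = (cpuNow - lastCpu) / 10000.0;   // CPU time since last call, in ms
    double dClock = (double)(wallNow - lastWall);   // real time since last call, in ms

    if (dClock > 0.0)
    {
        double usage = dCPU / dClock;               // fraction of one CPU
        if (usage > targetFraction && sleepMs < 100) ++sleepMs;        // too hot: sleep longer
        else if (usage < targetFraction && sleepMs > 0) --sleepMs;     // too cool: sleep less
    }
    lastCpu  = cpuNow;
    lastWall = wallNow;
    Sleep(sleepMs);
}
</pre>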
On Linux, you can change the scheduling priority of a thread with nice().
I can't think of any cross-platform way of doing what you want (or any guaranteed way, full stop), but as you are using GetTickCount, perhaps you aren't interested in cross-platform :)
I'd use interprocess communication and set the intensive process's nice level to get what you require, but I'm not sure that's appropriate for your situation.
EDIT:
I agree with Bernard, which is why I think a process rather than a thread might be more appropriate, but it just might not suit your purposes.
The problem is that it's not normal to want to leave the CPU idle while you have work to do. Normally you set a background task to IDLE priority and let the OS handle scheduling it all the CPU time that isn't used by interactive tasks.
It sounds to me like the problem is the watchdog process.
If your background task is CPU-bound then you want it to take all the unused CPU time for its task.
Maybe you should look at fixing the watchdog program?
You may be able to change the priority of a thread, but changing its maximum utilization would require either polling and hacks to limit how many things are occurring, or using OS tools that can set the maximum utilization of a process.
However, I don't see any circumstance where you would want to do this.