Determining CPU usage in WinCE

I want to be able to get the current % CPU usage in a C++ program running under WinCE.
I found this link that states where the source code is, but I cannot find it in my Platform Builder installation. I expect this is because it isn't the Windows Automotive platform.
Does anyone know where I can find this source code or (even better) know how I can get this information directly? i.e. what DLL / function calls to make etc.

Since GetProcessTimes doesn't exist in CE, you have to calculate this.
You have to start with the toolhelp APIs to enumerate the processes and the threads in the processes. You then call GetThreadTimes for each thread and add all that up.
Bear in mind that the act of calculating this info will affect the CPU utilization.

I have found that GetIdleTime (or CeGetIdleTimeEx on WEC7 or newer) works well for calculating system-wide processor usage. Sample code for calculating the processor idle time percentage is shown on the GetIdleTime MSDN page. Processor utilization percentage can then be calculated by subtracting the idle time percentage from 100.
The MSDN page does warn that support for GetIdleTime is dependent on OAL implementation.
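As a rough sketch of that arithmetic (not the MSDN sample itself): take a GetTickCount() reading and a GetIdleTime() reading at the start and end of an interval, then compare the deltas. The function name and parameter layout below are my own, and the code assumes both counters are 32-bit millisecond values.

```cpp
#include <cstdint>

// Percentage of the interval during which the CPU was busy.
// tickStart/tickEnd : GetTickCount() readings in ms (assumed source).
// idleStart/idleEnd : GetIdleTime() readings in ms (assumed source).
// Unsigned subtraction stays correct across one 32-bit wraparound.
// Returns a negative value if no time elapsed between the samples.
double cpuBusyPercent(std::uint32_t tickStart, std::uint32_t idleStart,
                      std::uint32_t tickEnd, std::uint32_t idleEnd)
{
    const std::uint32_t elapsed = tickEnd - tickStart;
    const std::uint32_t idle = idleEnd - idleStart;
    if (elapsed == 0) {
        return -1.0;
    }
    double idlePercent = 100.0 * idle / elapsed;
    if (idlePercent > 100.0) {
        idlePercent = 100.0;  // timer granularity can push idle past elapsed
    }
    return 100.0 - idlePercent;
}
```

On the device you would sample both counters, Sleep() for a second or so, sample again, and pass the four readings in.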

Note that when using the Toolhelp APIs to calculate the CPU usage, you need to take two measurements and then calculate the difference. When doing so, you won't know how much CPU time was used by any threads that terminated before the second sample was taken.
So, applications that often create short-lived threads will not be represented properly in your result.
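To make the two-measurement idea concrete, here is a minimal sketch of just the difference step. It assumes you have already walked the Toolhelp thread list twice and stored kernel + user time from GetThreadTimes per thread ID; the ThreadTimes type and the function name are made up for illustration.

```cpp
#include <cstdint>
#include <map>

// Busy time (kernel + user) per thread ID, in 100 ns units.
using ThreadTimes = std::map<std::uint32_t, std::uint64_t>;

// CPU usage in percent between two snapshots taken elapsed100ns apart.
// Threads that ended before the second snapshot are simply absent from
// `second`, so whatever they consumed is lost: that is the blind spot.
double cpuPercentBetween(const ThreadTimes& first, const ThreadTimes& second,
                         std::uint64_t elapsed100ns)
{
    if (elapsed100ns == 0) {
        return 0.0;
    }
    std::uint64_t busy = 0;
    for (const auto& entry : second) {
        const std::uint64_t now = entry.second;
        const auto it = first.find(entry.first);
        const std::uint64_t before = (it != first.end()) ? it->second : 0;
        // A smaller value than before means the ID was reused by a new
        // thread, so its whole time falls inside the interval.
        busy += (now >= before) ? (now - before) : now;
    }
    return 100.0 * static_cast<double>(busy) / static_cast<double>(elapsed100ns);
}
```

A shorter sampling interval narrows the blind spot but makes the measurement itself more expensive.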

You can look into Remote Task Monitor. It will let you get the current % CPU usage of your process (or thread), which is exactly what you are looking for. It is also very lightweight and does not impact your device much.

Related

Gauss Blur 3d image in cuda, sometimes it works sometimes it does not [duplicate]

I've noticed that CUDA applications tend to have a rough maximum run-time of 5-15 seconds before they will fail and exit out. I realize it's ideal to not have CUDA application run that long but assuming that it is the correct choice to use CUDA and due to the amount of sequential work per thread it must run that long, is there any way to extend this amount of time or to get around it?

I'm not a CUDA expert; I've been developing with the AMD Stream SDK, which AFAIK is roughly comparable.
You can disable the Windows watchdog timer, but that is strongly discouraged, for reasons that should be obvious.
To disable it, open regedit, go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display, create a REG_DWORD value named DisableBugCheck, and set it to 1.
You may also need to do something in the NVidia control panel. Look for some reference to "VPU Recovery" in the CUDA docs.
Ideally, you should be able to split your kernel operations into multiple passes over your data, so that each pass runs within the time limit.
Alternatively, you can divide the problem domain up so that it's computing fewer output pixels per command. I.e., instead of computing 1,000,000 output pixels in one fell swoop, issue 10 commands to the GPU to compute 100,000 each.
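The host-side bookkeeping for that split might look something like the sketch below. It is plain C++ with no GPU calls; Chunk and splitWork are hypothetical names, and the idea is that each chunk becomes one kernel launch that receives its offset and count.

```cpp
#include <cstddef>
#include <vector>

// One unit of work small enough to finish inside the watchdog limit.
struct Chunk {
    std::size_t offset;  // index of the first output element
    std::size_t count;   // number of output elements in this launch
};

// Splits `total` output elements into launches of at most `maxPerLaunch`.
std::vector<Chunk> splitWork(std::size_t total, std::size_t maxPerLaunch)
{
    std::vector<Chunk> chunks;
    if (maxPerLaunch == 0) {
        return chunks;  // nothing sensible to do with a zero-sized launch
    }
    std::size_t offset = 0;
    while (offset < total) {
        const std::size_t remaining = total - offset;
        const std::size_t count =
            remaining < maxPerLaunch ? remaining : maxPerLaunch;
        chunks.push_back(Chunk{offset, count});
        offset += count;
    }
    return chunks;
}
```

With splitWork(1000000, 100000) you get the 10 commands of 100,000 pixels from the example; the right maxPerLaunch for a given card is something you would have to measure.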
The basic unit that has to fit within the time slice is not your entire application, but the execution of a single command buffer. In the AMD Stream SDK, a long sequence of operations can be broken up into multiple time slices by explicitly flushing the command queue with a CtxFlush() call. Perhaps CUDA has something similar?
You should not have to read all of your data back and forth across the PCIe bus on every time slice; you can leave your textures, etc. in GPU local memory. You just need some command buffers to complete occasionally, to prove to the OS that you're not stuck in an infinite loop.
Finally, GPUs are fast, so if your application is not able to do useful work in that 5 or 10 seconds, I'd take that as a sign that something is wrong.
[EDIT Mar 2010 to update:] (outdated again, see the updates below for the most recent information) The registry key above is out-of-date. I think that was the key for Windows XP 64-bit. There are new registry keys for Vista and Windows 7. You can find them here: http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx
or here: http://msdn.microsoft.com/en-us/library/ee817001.aspx
[EDIT Apr 2015 to update:] This is getting really out of date. The easiest way to disable TDR for CUDA programming, assuming you have the NVIDIA Nsight tools installed, is to open the Nsight Monitor, click on "Nsight Monitor options", and under "General" set "WDDM TDR enabled" to false. This will change the registry setting for you. Close and reboot. Any change to the TDR registry setting won't take effect until you reboot.
[EDIT August 2018 to update:]
Although the NVIDIA tools allow disabling the TDR now, the same question is relevant for AMD/OpenCL developers. For those: The current link that documents the TDR settings is at https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys

On Windows, the graphics driver has a watchdog timer that kills any shader programs that run for more than 5 seconds. Note that the Xorg/XFree86 drivers don't do this, so one possible workaround is to run the CUDA apps on Linux.
AFAIK it is not possible to disable the watchdog timer on Windows. The only way to get around this on Windows is to use a second card that has no displayed screens on it. It doesn't have to be a Tesla but it must have no active screens.

Resolve Timeout Detection and Recovery - WINDOWS 7 (32/64 bit)
Create a registry value in Windows to raise the TDR delay, so that Windows allows a longer delay before the TDR process starts.
1. Open Regedit from Run or a DOS prompt.
2. In Windows 7, navigate to the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers. There will probably be one value in there already, called DxgKrnlVersion, stored as a DWORD.
3. Right-click, create a new REG_DWORD value, and name it TdrDelay. The value assigned to it is the number of seconds before TDR kicks in. The default in Windows is currently 2 (even though the registry value doesn't exist until you create it). Assign it a new value (I tried 4 seconds, which doubles the time before TDR).
4. Restart the PC. The new value does not take effect until you restart.
Source: Win7 TDR (Driver Timeout Detection & Recovery)
I have also verified this, and it works fine.

The most basic solution is to pick a point some percentage of the way through the calculation that I am sure the GPU I am working with can reach in time, save all the state information and stop, then start again from there.
Update:
For Linux: Exiting X will allow you to run CUDA applications as long as you want. No Tesla required (A 9600 was used in testing this)
One thing to note, however, is that if X is never entered, the drivers probably won't be loaded, and it won't work.
It also seems that for Linux, simply not having any X displays up at the time will also work, so X does not need to be exited as long as you switch to a non-X full-screen terminal.

This isn't possible. The time-out is there to prevent bugs in calculations from taking up the GPU for long periods of time.
If you use a dedicated card for CUDA work, the time limit is lifted. I'm not sure if this requires a Tesla card, or if a GeForce with no monitor connected can be used.

The solution I use is:
1. Pass all information to device.
2. Run iterative versions of algorithms, where each iteration invokes the kernel on the memory already stored within the device.
3. Finally transfer memory to host only after all iterations have ended.
This enables control over iterations from CPU (including option to abort), without the costly device<-->host memory transfers between iterations.

The watchdog timer only applies on GPUs with a display attached.
On Windows the timer is part of WDDM. It is possible to modify the settings (timeout, behaviour on reaching the timeout, etc.) with some registry keys; see this Microsoft article for more information.

It is possible to disable this behavior in Linux. Although the "watchdog" has an obvious purpose, it may cause some very unexpected results when doing extensive computations using shaders / CUDA.
The option can be toggled in your X configuration (likely /etc/X11/xorg.conf).
Adding Option "Interactive" "0" to the Device section for your GPU does the job.
For details on the config, see "CUDA Visual Profiler 'Interactive' X config option?".
For a description of the parameter, see ftp://download.nvidia.com/XFree86/Linux-x86/270.41.06/README/xconfigoptions.html#Interactive

Find out the system frequency in wince 7

I have a system that runs on WinCE 7.
I have encountered a problem in which the system slows down after a while.
How can I find out the system frequency?

There are several methods (depending on the information you want):
You can use the toolhelp api to find how much processor time each application is using. http://www.codeproject.com/Articles/159461/Mobile-Processor-Usage
IOCTL_PROCESSOR_INFORMATION will tell you what kind of processor is in your system and what clock speed it is running at. http://www.codeproject.com/Tips/122843/What-processor-is-in-my-mobile-device
GetIdleTime() can be used to tell you how busy your processor is overall: http://www.codeproject.com/Tips/133104/Mobile-processor-usage

C API for getting CPU load in linux

In linux, is there a built-in C library function for getting the CPU load of the machine? Presumably I could write my own function for opening and parsing a file in /proc, but it seems like there ought to be a better way.
Doesn't need to be portable
Must not require any libraries beyond a base RHEL4 installation.

If you really want a C interface, use getloadavg(), which also works on Unixes without /proc.
It has a man page with all the details.
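A minimal sketch of calling it from C++. getloadavg() is declared in <stdlib.h> on glibc and the BSDs; the wrapper name readLoadAverages is just for illustration.

```cpp
#include <stdlib.h>  // getloadavg (glibc/BSD extension, not ISO C++)
#include <array>

// Fills `out` with the 1-, 5- and 15-minute load averages.
// Returns false if the system could not supply all three samples.
bool readLoadAverages(std::array<double, 3>& out)
{
    return getloadavg(out.data(), 3) == 3;
}
```

Keep in mind this returns load averages (run-queue length), not a CPU utilization percentage.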

The preferred method of getting information about CPU load on linux is to read from /proc/stat, /proc/loadavg and /proc/uptime. All the normal linux utilities like top use this method.
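For actual utilization (as opposed to load average), the usual trick is to read the aggregate "cpu" line of /proc/stat twice and compare the deltas. The sketch below only handles the parsing and arithmetic. CpuSample, parseCpuLine and utilizationPercent are names I made up, and counting iowait as idle is a choice on my part, not something the kernel dictates.

```cpp
#include <sstream>
#include <string>

struct CpuSample {
    unsigned long long busy = 0;   // jiffies spent doing work
    unsigned long long total = 0;  // all jiffies accounted for
};

// Parses the aggregate "cpu ..." line (the first line of /proc/stat).
// Only the first eight fields are summed (user nice system idle iowait
// irq softirq steal); guest time is already included in user/nice.
// Older kernels report fewer fields, which is handled as well.
bool parseCpuLine(const std::string& line, CpuSample& out)
{
    std::istringstream in(line);
    std::string label;
    if (!(in >> label) || label != "cpu") {
        return false;
    }
    unsigned long long value = 0, idle = 0, total = 0;
    int field = 0;
    while (field < 8 && (in >> value)) {
        if (field == 3 || field == 4) {
            idle += value;  // idle and iowait
        }
        total += value;
        ++field;
    }
    if (field < 4) {
        return false;  // need at least user, nice, system, idle
    }
    out.busy = total - idle;
    out.total = total;
    return true;
}

// Utilization in percent between an earlier and a later sample.
double utilizationPercent(const CpuSample& earlier, const CpuSample& later)
{
    if (later.total <= earlier.total || later.busy < earlier.busy) {
        return 0.0;
    }
    return 100.0 * (later.busy - earlier.busy) / (later.total - earlier.total);
}
```

To use it, read the first line of /proc/stat with std::ifstream and std::getline, wait a second, read it again, and pass both parsed samples to utilizationPercent.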

From the proc(5) man page:
/proc/loadavg
The first three fields in this file are load average figures
giving the number of jobs in the run queue (state R) or waiting
for disk I/O (state D) averaged over 1, 5, and 15 minutes. They
are the same as the load average numbers given by uptime(1) and
other programs. The fourth field consists of two numbers separated
by a slash (/). The first of these is the number of currently
executing kernel scheduling entities (processes,
threads); this will be less than or equal to the number of CPUs.
The value after the slash is the number of kernel scheduling
entities that currently exist on the system. The fifth field is
the PID of the process that was most recently created on the
system.
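Going by that field layout, a parser for the file's single line can be quite small. This is only a sketch; the LoadAvg struct and parseLoadAvg are hypothetical names, and it assumes the default "C" locale so that "." is the decimal separator.

```cpp
#include <cstdio>
#include <string>

struct LoadAvg {
    double one;      // 1-minute load average
    double five;     // 5-minute load average
    double fifteen;  // 15-minute load average
    int running;     // currently executing scheduling entities
    int total;       // scheduling entities that exist on the system
    int lastPid;     // PID of the most recently created process
};

// Parses the contents of /proc/loadavg, e.g. "0.20 0.18 0.12 1/80 11206".
// Returns false unless all six values were read.
bool parseLoadAvg(const std::string& text, LoadAvg& out)
{
    return std::sscanf(text.c_str(), "%lf %lf %lf %d/%d %d",
                       &out.one, &out.five, &out.fifteen,
                       &out.running, &out.total, &out.lastPid) == 6;
}
```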

My understanding is that parsing the contents of /proc is the official interface for that kind of thing (there are a number of files there which are really meant to be parsed before being presented to the user).

"Load average" may not be very useful. We find it to be of limited use, as it doesn't actually tell you how much CPU is being used, only the average number of tasks "ready to run". "Ready to run" is somewhat subjective, but not very helpful as it often includes processes waiting for IO.
On busy systems, we see load average of 20+ on machines with only 8 cores, and still the CPUs are relatively idle.
If you want to see how much CPU is in use, have a look at the various files in /proc.

Running background services on a PocketPC

I've recently bought myself a new cellphone, running Windows Mobile 6.1 Professional. And of course I am currently looking into doing some coding for it, on a hobby basis. My plan is to have a service running as a DLL, loaded by Services.exe. This needs to gather some data, and do some processing at regular intervals (every 5-10 minutes).
Since I need to run this at regular intervals, it is a bit of a problem for me, that the system typically goes to sleep (suspend) after a short period of inactivity by the user.
I have been reading all the documentation I could find on MSDN, and MSDN blogs about this subject, and it seems to me, that there are three possible solutions to this problem:
1. Keep the system in an "Always On" state, by calling SystemIdleTimerReset periodically. This seems a bit excessive, and is therefore out of the question.
2. Have the system periodically woken up with CeRunAppAtTime, and enter the unattended state, to do my processing.
3. Use the unattended state instead of going into a full suspend. This would be transparent to the user, but the system would never go to sleep.
The second approach seems to be preferred. However, it would require an executable to be called by the system on wake-up, with the only task of notifying my service that it should commence processing. This seems a bit unnecessary and I would like to avoid this extra executable. I could of course move all my processing into this extra executable, but I would like to use some of the facilities provided when running as a service, and also not have a program pop up (even if it's in the background) whenever processing starts.
At first glance, the third approach seems to have the same basic problem as the first. However, I have read on some of the MSDN blogs that it might actually be possible to conserve battery with this approach, compared with going in and out of suspend mode often. (The argument for this was that the WM platform by nature has very low battery consumption when the system is idle, and that going in and out of suspend requires quite a bit of processing.)
So I guess my questions are as follows:
Which approach would you recommend in my situation? With respect to keeping a minimum battery consumption, and a nice clean implementation.
In the case of approach number two, is it possible to eliminate the need for a notifying executable? Either through alternative API functions, or existing generic applications on the platform?
In the case of approach number three, do you know of any information/statistics relevant to the claim, that it is possible to extend the battery lifetime when using unattended mode over going into suspend. E.g. how often do you need to pull the system out of suspend, before unattended mode is to be preferred.
Implementation specific (bonus) question: Is it necessary to regularly call SystemIdleTimerReset to stay in unattended mode?
And finally, if you think I have prematurely eliminated approach number one, please tell me why.
Please include in your response whether you base your response on knowledge, or are merely guessing (the latter is also very welcome!).
Please leave a comment, if you think I need to clarify any parts of this question.

CeRunAppAtTime is a much-misunderstood API (largely because of the terrible name). It doesn't have to run an app. It can simply set a named system event (see the description of the pwszAppName parameter in the MSDN docs). If you care to know when it has fired (to let your app put the device to sleep again when it's done processing), simply have a worker thread that is doing a WaitForSingleObject on that same named event.
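A hedged sketch of what that could look like. The "\\.\Notifications\NamedEvents\" prefix is the form I recall from the pwszAppName description, so check it against the docs for your SDK. The helper names, and the event name you pass in, are hypothetical. The CE-only part is guarded so the rest still compiles elsewhere, and I have not run it on a device.

```cpp
#include <string>

// Builds the pwszAppName string that asks the notification system to set
// a named event instead of launching an executable.
std::wstring namedEventTarget(const std::wstring& eventName)
{
    return L"\\\\.\\Notifications\\NamedEvents\\" + eventName;
}

#ifdef UNDER_CE  // Windows CE / Windows Mobile builds only
#include <windows.h>
#include <notify.h>

// Schedules `eventName` to be set `minutes` from now (local time).
bool scheduleWakeEvent(const std::wstring& eventName, unsigned minutes)
{
    SYSTEMTIME now;
    FILETIME ft;
    GetLocalTime(&now);
    if (!SystemTimeToFileTime(&now, &ft)) {
        return false;
    }
    ULARGE_INTEGER t;
    t.LowPart = ft.dwLowDateTime;
    t.HighPart = ft.dwHighDateTime;
    t.QuadPart += static_cast<ULONGLONG>(minutes) * 60 * 10000000;  // 100 ns units
    ft.dwLowDateTime = t.LowPart;
    ft.dwHighDateTime = t.HighPart;

    SYSTEMTIME when;
    if (!FileTimeToSystemTime(&ft, &when)) {
        return false;
    }
    const std::wstring target = namedEventTarget(eventName);
    return CeRunAppAtTime(const_cast<wchar_t*>(target.c_str()), &when) != FALSE;
}

// Worker-thread body: blocks until the scheduled event fires.
bool waitForWakeEvent(const std::wstring& eventName)
{
    HANDLE wakeEvent = CreateEvent(NULL, FALSE, FALSE, eventName.c_str());
    if (wakeEvent == NULL) {
        return false;
    }
    const DWORD result = WaitForSingleObject(wakeEvent, INFINITE);
    CloseHandle(wakeEvent);
    return result == WAIT_OBJECT_0;
}
#endif
```

The service thread would call scheduleWakeEvent, block in waitForWakeEvent, do its processing, and then schedule the next wake-up before letting the device sleep again.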
Unattended state is often used for devices that need to keep an app running continuously (like an MP3 player) but conserve power by shutting down the backlight (probably the single most power consuming subsystem).
Obviously unattended mode uses significantly more power than suspend, because in suspend the only power draw is for RAM self-refresh. In unattended mode the processor is still powered and running (and several peripherals may be too, depending on how the OEM defined their unattended mode).
SystemIdleTimerReset simply prevents the power manager from putting the device into low-power mode due to inactivity. This mode, whether suspended, unattended, flight or other, is defined by the OEM. Use it sparingly, because when you do, it impacts the power consumption of the device. Doing it in unattended mode is especially problematic from a user perspective because they might think the device is off (it looks that way) but now their battery life has gone south.

I had a whole long post detailing how you shouldn't expect to be able to get acceptable battery life because WM is not designed to support what you're trying to do, but -- you could signal your service on wakeup, do your processing, then use the methods in this post to put the device back to sleep immediately. You should be able to keep the ratio of on-time-to-sleep-time very low this way -- but as you say, I'm only guessing.
See also:
Power-Efficient Apps (MSDN)
Power To The People (Developers 1, Developers 2, Devices)
Power-Efficient WM Apps (blog post)

monitor cpu usage per thread on windows mobile device

Is it possible to measure CPU usage per thread on a Windows Mobile (or CE 5) device programmatically (C++)? If not, is there a utility that will monitor the CPU usage of a process?

CPU usage cannot be directly measured because, unlike an x86, the ARM processor doesn't have a register for it. You can calculate it using the Toolhelp APIs to get a list of processes and their child threads and then use GetThreadTimes to figure out how much time each thread is using.
Keep in mind that doing this calculation directly affects how much the CPU is in use.

Someone wrote a tool that looks a lot like Task Manager on the PC:
http://www.vttoth.com/LPK/taskmanager.html
As ctacke says, it does seem to use a lot of the CPU. It reports using ~15%-30% of the CPU on our 800MHz ARM device.
