Correctly measure CPU usage with HyperThreading? - c++

I know there are several answers here on SO on how to measure CPU usage with either of two approaches:
By using the performance counters (PDH API)
By using GetProcessTimes() and dividing that against either wall time or times from GetSystemTimes()
For some days now I have been failing miserably to measure the CPU usage of my program with either of these: with both mechanisms I get a CPU usage that is smaller than the one displayed in Task Manager or Process Explorer. Is there some magic in how these tools do this, and is it related to HyperThreading being enabled? I will repeat my tests on a CPU without HyperThreading, but if anyone can point out what I am missing here I would be very thankful.
To illustrate what I have tried, here is the code that does PDH-based measurements:
class CCpuUsageMonitor
{
public:
CCpuUsageMonitor(const wchar_t* pProcessName)
{
GetSystemInfo(&m_SystemInfo);
auto nStatus = PdhOpenQuery(NULL, NULL, &m_hPdhQuery);
_ASSERT(nStatus == ERROR_SUCCESS);
nStatus = PdhAddCounter(m_hPdhQuery, L"\\Processor(_Total)\\% Processor Time", NULL, &m_hPdhCpuUsageCounter);
_ASSERT(nStatus == ERROR_SUCCESS);
wchar_t pCounterPath[PDH_MAX_COUNTER_PATH];
StringCbPrintf(pCounterPath, sizeof(pCounterPath), L"\\Process(%s)\\%% Processor Time", pProcessName); // StringCbPrintf expects the buffer size in bytes
nStatus = PdhAddCounter(m_hPdhQuery, pCounterPath, NULL, &m_hPhdProcessCpuUsageCounter);
_ASSERT(nStatus == ERROR_SUCCESS);
}
~CCpuUsageMonitor()
{
PdhCloseQuery(m_hPdhQuery); // pass the query handle itself, not its address
}
void CollectSample()
{
auto nStatus = PdhCollectQueryData(m_hPdhQuery);
_ASSERT(nStatus == ERROR_SUCCESS);
}
double GetCpuUsage()
{
DWORD nType;
PDH_FMT_COUNTERVALUE CounterValue;
auto nStatus = PdhGetFormattedCounterValue(m_hPdhCpuUsageCounter, PDH_FMT_DOUBLE | PDH_FMT_NOCAP100, &nType, &CounterValue);
_ASSERT(nStatus == ERROR_SUCCESS);
return CounterValue.doubleValue;
}
double GetProcessCpuUsage()
{
DWORD nType;
PDH_FMT_COUNTERVALUE CounterValue;
auto nStatus = PdhGetFormattedCounterValue(m_hPhdProcessCpuUsageCounter, PDH_FMT_DOUBLE | PDH_FMT_NOCAP100, &nType, &CounterValue);
_ASSERT(nStatus == ERROR_SUCCESS);
return CounterValue.doubleValue / m_SystemInfo.dwNumberOfProcessors;
}
private:
SYSTEM_INFO m_SystemInfo;
HANDLE m_hPdhQuery;
HANDLE m_hPdhCpuUsageCounter;
HANDLE m_hPhdProcessCpuUsageCounter;
};
With the second approach I basically take two snapshots of process times via GetProcessTimes(), one before and one after my code runs, subtract them, and divide the difference by the wall time multiplied by the number of processors.

Here are a few links I've used in the past and a good article on why GetThreadTimes is wrong (I wouldn't use it as a reliable source of data):
http://blog.kalmbachnet.de/?postid=28
https://msdn.microsoft.com/en-us/library/aa392397(VS.85).aspx
http://www.drdobbs.com/windows/win32-performance-measurement-options/184416651
https://msdn.microsoft.com/en-us/library/aa394279(VS.85).aspx
You seem well on your way and knowledgeable, so those links should at least get you going in the right direction.

From this link:
Starting with Windows 8, a change was made to the way that Task Manager and Performance Monitor report CPU utilization...
This change affects the way that CPU utilization is computed. The values in Task Manager now correspond to the Processor Information\% Processor Utility and Processor Information\% Privileged Utility performance counters, not to the Processor Information\% Processor Time and Processor Information\% Privileged Time counters as in Windows 7.
Your code will work as written once you change which counters you query. You are using the Processor counters; you should switch to the Processor Information counters, available in Windows 8, and also use the "Utility" versions of those counters.
If you query the formatted value as you currently do, you'll get the same number that Task Manager displays with 1-second polling.
If you want to do calculations over longer intervals, you can query the raw value, reading the numbers into a PDH_RAW_COUNTER structure instead of your current PDH_FMT_COUNTERVALUE. The values used for the numerator of the usage calculation are in the PDH_RAW_COUNTER structure's FirstValue, and the "base" values for the denominator are in SecondValue.

Related

Resource intensive multithreading killing other processes

I have some very resource-intensive code that I wrote so that I can split the workload over multiple pthreads. Everything works and the computation finishes faster, but my guess is that other processes on the same processor core get so slow that they crash after a few seconds of runtime.
I already managed to kill random processes like Chrome tabs, the Cinnamon DE or even the entire OS (Kernel?).
Code: (It's late, and I'm too tired to make a pseudo code, or even comments..)
-- But it's brute-force code, not so much for cracking as for testing passwords and/or CPU IPS.
Any ideas how to fix this, while still keeping as much performance as possible?
static unsigned int NTHREADS = std::thread::hardware_concurrency();
static int THREAD_COMPLETE = -1;
static std::string PASSWORD = "";
static std::string CHARS;
static std::mutex MUTEX;
void *find_seq(void *arg_0)
{
unsigned int _arg_0 = *((unsigned int *) arg_0);
std::string *str_CURRENT = new std::string(" ");
while (true)
{
for (unsigned int loop_0 = _arg_0; loop_0 < CHARS.length() - 1; loop_0 += NTHREADS)
{
str_CURRENT->back() = CHARS[loop_0];
if (*str_CURRENT == PASSWORD)
{
THREAD_COMPLETE = _arg_0;
return (void *) str_CURRENT;
}
}
str_CURRENT->back() = CHARS.back();
for (int loop_1 = (str_CURRENT->length() - 1); loop_1 >= 0; loop_1--)
{
if (str_CURRENT->at(loop_1) == CHARS.back())
{
if (loop_1 == 0)
str_CURRENT->assign(str_CURRENT->length() + 1, CHARS.front());
else
{
str_CURRENT->at(loop_1) = CHARS.front();
str_CURRENT->at(loop_1 - 1) = CHARS[CHARS.find(str_CURRENT->at(loop_1 - 1)) + 1];
}
}
}
};
}
Areuz,
Can you post the full code? I suspect the issue is the NTHREADS value. On my Ubuntu box the value is 8, which is the number of cores listed in the /proc/cpuinfo file. Kicking off 8 'hot' threads on my box hogs 100% of the CPU. The kernel will time-slice for its own critical processes, but in general all other processes will starve for CPU.
Check the highest processor number in /proc/cpuinfo and go at least one lower than that. The CPUs are numbered 0-7 on my box, so 7 would be the max for me. The actual max might be 3, since 4 of my cores are hyper-threads. For completely CPU-bound processes, hyper-threading generally doesn't help.
Bottom line: don't hog all the CPU; it will destabilize the system.
--Matt
Thank you for your answers and especially Matthew Fisher for his suggestion to try it on another system.
After some trial and error I decided to pull back my CPU overclock, which I thought was stable (I had had it for over a year), and that solved this weird behaviour. I guess I had never run such a CPU-intensive and (I'm guessing) efficient script (efficient in the sense of not throttling the CPU by yielding) to see this happen.
As Matthew suggested I need to come up with a better way than to just constantly check the THREAD_COMPLETE variable with a while true loop, but I hope to resolve that in the comments.
Full and updated code for future visitors is here: pastebin.com/jbiYyKBu

C++ beginner how to use GetSystemTimeAsFileTime

I have a program that reads the current time from the system clock and saves it to a text file. I previously used the GetSystemTime function, which worked, but the times weren't completely consistent, e.g. one of the times is 32567.789 and the next time is 32567.780, which is backwards in time.
I am using this program to save the time up to 10 times a second. I read that the GetSystemTimeAsFileTime function is more accurate. My question is, how do I convert my current code to use the GetSystemTimeAsFileTime function? I tried to use the FileTimeToSystemTime function but that had the same problems.
SYSTEMTIME st;
GetSystemTime(&st);
WORD sec = (st.wHour*3600) + (st.wMinute*60) + st.wSecond; //convert to seconds in a day
lStr.Format( _T("%d %d.%d\n"),GetFrames() ,sec, st.wMilliseconds);
std::wfstream myfile;
myfile.open("time.txt", std::ios::out | std::ios::in | std::ios::app );
if (myfile.is_open())
{
myfile.write((LPCTSTR)lStr, lStr.GetLength());
myfile.close();
}
else {lStr.Format( _T("open file failed: %d"), WSAGetLastError());
}
EDIT To add some more info, the code captures an image from a camera which runs 10 times every second and saves the time the image was taken into a text file. When I subtract the 1st entry of the text file from the second and so on eg: entry 2-1 3-2 4-3 etc I get this graph, where the x axis is the number of entries and the y axis is the subtracted values.
All of them should be around the 0.12 mark which most of them are. However you can see that a lot of them vary and some even go negative. This isn't due to the camera because the camera has its own internal clock and that has no variations. It has something to do with capturing the system time. What I want is the most accurate method to extract the system time with the highest resolution and as little noise as possible.
Edit 2 I have taken on board your suggestions and ran the program again. This is the result:
As you can see it is a lot better than before but it is still not right. I find it strange that it seems to do it very incrementally. I also just plotted the times and this is the result, where x is the entry and y is the time:
Does anyone have any idea on what could be causing the time to go out every 30 frames or so?
First of all, you want to get the FILETIME as follows:
FILETIME fileTime;
GetSystemTimeAsFileTime(&fileTime);
// Or for higher precision, use
// GetSystemTimePreciseAsFileTime(&fileTime);
According to FILETIME's documentation,
It is not recommended that you add and subtract values from the FILETIME structure to obtain relative times. Instead, you should copy the low- and high-order parts of the file time to a ULARGE_INTEGER structure, perform 64-bit arithmetic on the QuadPart member, and copy the LowPart and HighPart members into the FILETIME structure.
So what you should do next is:
ULARGE_INTEGER theTime;
theTime.LowPart = fileTime.dwLowDateTime;
theTime.HighPart = fileTime.dwHighDateTime;
__int64 fileTime64Bit = theTime.QuadPart;
And that's it. The fileTime64Bit variable now contains the time you're looking for.
If you want to get a SYSTEMTIME object instead, you could just do the following:
SYSTEMTIME systemTime;
FileTimeToSystemTime(&fileTime, &systemTime);
Getting the system time out of Windows with decent accuracy is something that I've had fun with, too... I discovered that JavaScript code running in Chrome seemed to produce more consistent timer results than I could get with C++ code, so I went looking in the Chrome source. An interesting place to start is the comments at the top of time_win.cc in the Chrome source. The links given there to a Mozilla bug and a Dr. Dobb's article are also very interesting.
Based on the Mozilla and Chrome sources, and the above links, the code I generated for my own use is here. As you can see, it's a lot of code!
The basic idea is that getting the absolute current time is quite expensive. Windows does provide a high resolution timer that's cheap to access, but that only gives you a relative, not absolute time. What my code does is split the problem up into two parts:
1) Get the system time accurately. This is in CalibrateNow(). The basic technique is to call timeBeginPeriod(1) to get accurate times, then call GetSystemTimeAsFileTime() until the result changes, which means that the timeBeginPeriod() call has had an effect. This gives us an accurate system time, but is quite an expensive operation (and the timeBeginPeriod() call can affect other processes) so we don't want to do it each time we want a time. The code also calls QueryPerformanceCounter() to get the current high resolution timer value.
bool NeedCalibration = true;
LONGLONG CalibrationFreq = 0;
LONGLONG CalibrationCountBase = 0;
ULONGLONG CalibrationTimeBase = 0;
void CalibrateNow(void)
{
// If the timer frequency is not known, try to get it
if (CalibrationFreq == 0)
{
LARGE_INTEGER freq;
if (::QueryPerformanceFrequency(&freq) == 0)
CalibrationFreq = -1;
else
CalibrationFreq = freq.QuadPart;
}
if (CalibrationFreq > 0)
{
// Get the current system time, accurate to ~1ms
FILETIME ft1, ft2;
::timeBeginPeriod(1);
::GetSystemTimeAsFileTime(&ft1);
do
{
// Loop until the value changes, so that the timeBeginPeriod() call has had an effect
::GetSystemTimeAsFileTime(&ft2);
}
while (FileTimeToValue(ft1) == FileTimeToValue(ft2));
::timeEndPeriod(1);
// Get the current timer value
LARGE_INTEGER counter;
::QueryPerformanceCounter(&counter);
// Save calibration values
CalibrationCountBase = counter.QuadPart;
CalibrationTimeBase = FileTimeToValue(ft2);
NeedCalibration = false;
}
}
2) When we want the current time, get the high resolution timer by calling QueryPerformanceCounter(), and use the change in that timer since the last CalibrateNow() call to work out an accurate "now". This is in GetNow() in my code. It also periodically calls CalibrateNow() to ensure that the system time doesn't go backwards or drift.
FILETIME GetNow(void)
{
for (int i = 0; i < 4; i++)
{
// Calibrate if needed, and give up if this fails
if (NeedCalibration)
CalibrateNow();
if (NeedCalibration)
break;
// Get the current timer value and use it to compute now
FILETIME ft;
::GetSystemTimeAsFileTime(&ft);
LARGE_INTEGER counter;
::QueryPerformanceCounter(&counter);
LONGLONG elapsed = ((counter.QuadPart - CalibrationCountBase) * 10000000) / CalibrationFreq;
ULONGLONG now = CalibrationTimeBase + elapsed;
// Don't let time go back
static ULONGLONG lastNow = 0;
now = max(now,lastNow);
lastNow = now;
// Check for clock skew
if (LONGABS(FileTimeToValue(ft) - now) > 2 * GetTimeIncrement())
{
NeedCalibration = true;
lastNow = 0;
}
if (!NeedCalibration)
return ValueToFileTime(now);
}
// Calibration has failed to stabilize, so just use the system time
FILETIME ft;
::GetSystemTimeAsFileTime(&ft);
return ft;
}
It's all a bit hairy but works better than I had hoped. This also seems to work well as far back on Windows as I have tested (which was Windows XP).
I believe you are looking for the GetSystemTimePreciseAsFileTime() function or even QueryPerformanceCounter(); in short, something that is guaranteed to produce monotonic values.

How can I measure CPU time in C++ on windows and include calls of system()?

I want to run some benchmarks on a C++ algorithm and want to get the CPU time it takes, depending on inputs. I use Visual Studio 2012 on Windows 7. I already discovered one way to calculate the CPU time in Windows: How can I measure CPU time and wall clock time on both Linux/Windows?
However, I use the system() command in my algorithm, which is not measured that way. So, how can I measure CPU time and include the times of my script calls via system()?
I should add a small example. This is my get_cpu_time-function (From the link described above):
double get_cpu_time(){
FILETIME a,b,c,d;
if (GetProcessTimes(GetCurrentProcess(),&a,&b,&c,&d) != 0){
// Returns total user time.
// Can be tweaked to include kernel times as well.
return
(double)(d.dwLowDateTime |
((unsigned long long)d.dwHighDateTime << 32)) * 0.0000001;
}else{
// Handle error
return 0;
}
}
That works fine so far, and when I made a program, that sorts some array (or does some other stuff that takes some time), it works fine. However, when I use the system()-command like in this case, it doesn't:
int main( int argc, const char* argv[] )
{
double start = get_cpu_time();
double end;
system("Bla.exe");
end = get_cpu_time();
printf("Everything took %f seconds of CPU time", end - start);
std::cin.get();
}
The execution of the given exe-file is measured in the same way and takes about 5 seconds. When I run it via system(), the whole thing takes a CPU time of 0 seconds, which obviously does not include the execution of the exe-file.
One possibility would be to get a HANDLE on the system call, is that possible somehow?
Linux:
For the wall clock time, use gettimeofday() or clock_gettime()
For the CPU time, use getrusage() or times()
It will actually print the CPU time that your program takes. But if you use threads in your program, it will not work properly: you should wait for each thread to finish its job before taking the final CPU time. So basically you should write this:
WaitForSingleObject(threadhandle, INFINITE);
If you don't know exactly what your program uses (whether it's multithreaded or not), you can create a thread to do the job, wait for that thread to terminate, and measure the time.
#include <windows.h>
#include <iostream>
// cputime() stands for your own CPU-time function, e.g. get_cpu_time() from the question
double cputime();
DWORD WINAPI MyThreadFunction(LPVOID lpParam);
int main()
{
    DWORD dwThreadId = 0;
    double startcputime = cputime();
    HANDLE hThread = CreateThread(
        NULL,             // default security attributes
        0,                // use default stack size
        MyThreadFunction, // thread function name
        NULL,             // argument to thread function
        0,                // use default creation flags
        &dwThreadId);     // receives the thread identifier
    if (hThread == NULL)
        return 1;         // thread creation failed
    WaitForSingleObject(hThread, INFINITE); // wait until the work is done
    double endcputime = cputime();
    CloseHandle(hThread);
    std::cout << "it took " << endcputime - startcputime << " s of CPU to execute this\n";
    return 0;
}
DWORD WINAPI MyThreadFunction(LPVOID lpParam)
{
    // do your job here
    return 0;
}
If you're using C++11 (or have access to it), std::chrono has all of the functions you need to calculate how long a program has run.
You'll need to add your process to a Job object before creating any child processes. Child processes will then automatically run in the same job, and the information you want can be found in the TotalUserTime and TotalKernelTime members of the JOBOBJECT_BASIC_ACCOUNTING_INFORMATION structure, available through the QueryInformationJobObject function.
Further information:
Resource Accounting for Jobs
JOBOBJECT_BASIC_ACCOUNTING_INFORMATION structure
Beginning with Windows 8, nested jobs are supported, so you can use this method even if some of the programs already rely on job objects.
I don't think there is a cross-platform mechanism. Using CreateProcess to launch the application, with a WaitForSingleObject to wait for it to finish, would allow you to get the direct descendants' times. After that you would need job objects for complete accounting (if you needed to time grandchildren).
You might also give external sampling profilers a shot. I've used the freebie "Sleepy" [http://sleepy.sourceforge.net/] and the even better "Very Sleepy" [http://www.codersnotes.com/sleepy/] profilers under Windows and been very happy with the results -- nicely formatted info in a few minutes with virtually no effort.
There is a similar project called "Shiny" [http://sourceforge.net/projects/shinyprofiler/] that is supposed to work on both Windows and *nix.
You can try using boost timer. It is cross-platform capable. Sample code from boost web-site:
#include <boost/timer/timer.hpp>
#include <cmath>
int main() {
boost::timer::auto_cpu_timer t;
for (long i = 0; i < 100000000; ++i)
std::sqrt(123.456L); // burn some time
return 0;
}

how to read windows perfmon counter?

Can I get C++ code to read a Windows perfmon counter (category, counter name and instance name)?
It's very easy in C#, but I need C++ code.
Thanks
As Doug T. pointed out earlier, I posted a helper class a while ago to query performance counter values. Using the class is pretty simple: all you have to do is provide the string for the performance counter.
http://askldjd.wordpress.com/2011/01/05/a-pdh-helper-class-cpdhquery/
However, the code I posted on my blog has been modified in practice. From your comment, it seems like you are interested in querying just a single field.
In this case, try adding the following function to my CPdhQuery class.
double CPdhQuery::CollectSingleData()
{
double data = 0;
while(true)
{
status = PdhCollectQueryData(hQuery);
if (ERROR_SUCCESS != status)
{
throw CException(GetErrorString(status));
}
PDH_FMT_COUNTERVALUE cv;
// Format the performance data record.
status = PdhGetFormattedCounterValue(hCounter,
PDH_FMT_DOUBLE,
(LPDWORD)NULL,
&cv);
if (ERROR_SUCCESS != status)
{
continue;
}
data = cv.doubleValue;
break;
}
return data;
}
For example:
To get processor time:
counter = boost::make_shared<CPdhQuery>(std::tstring(_T("\\Processor Information(_Total)\\% Processor Time")));
To get file read bytes / sec:
counter = boost::make_shared<CPdhQuery>(std::tstring(_T("\\System\\File Read Bytes/sec")));
To get % Committed Bytes:
counter = boost::make_shared<CPdhQuery>(std::tstring(_T("\\Memory\\% Committed Bytes In Use")));
To get the data, do this.
double data = counter->CollectSingleData();
I hope this helps.
... Alan
Some of the commonly used performance values have API calls to get them directly. For example, the total processor time can be obtained from GetSystemTimes, and you can calculate the percentage yourself.
If this isn't an option then the Performance Data Helper library provides a moderately simple interface to performance data.

XP app won't increase cpu utilization

I am trying to fix a problem with a legacy Visual Studio win32 un-managed c++ app which is not keeping up with input. As a part of my solution, I am exploring bumping up the class and thread priorities.
My PC has 4 Xeon processors and runs 64-bit XP. I wrote a short win32 test app which creates 4 background looping threads, each one running on its own processor. Some code samples follow. The problem is that even when I bump the priorities to the extreme, the CPU utilization is still less than 1%.
My test app is 32 bit, running on WOW64. The same test app also utilizes less than 1% cpu utilization on a 32 bit xp machine. I am an administrator on both machines. What else do I need to do to get this to work?
DWORD __stdcall ThreadProc4 (LPVOID)
{
SetThreadPriority(GetCurrentThread(),THREAD_PRIORITY_TIME_CRITICAL);
while (true)
{
for (int i = 0; i < 1000; i++)
{
int p = i;
int red = p *5;
theClassPrior4 = GetPriorityClass(theProcessHandle);
}
Sleep(1);
}
}
int APIENTRY _tWinMain(...)
{
...
theProcessHandle = GetCurrentProcess();
BOOL theAffinity = GetProcessAffinityMask(
theProcessHandle,&theProcessMask,&theSystemMask);
SetPriorityClass(theProcessHandle,REALTIME_PRIORITY_CLASS);
DWORD threadid4 = 0;
HANDLE thread4 = CreateThread((LPSECURITY_ATTRIBUTES)NULL,
0,
(LPTHREAD_START_ROUTINE)ThreadProc4,
NULL,
0,
&threadid4);
DWORD_PTR theAff4 = 8;
DWORD_PTR theAf4 = SetThreadAffinityMask(thread1,theAff4);
SetThreadPriority(thread4,THREAD_PRIORITY_TIME_CRITICAL);
ResumeThread(thread4);
Well, if you want it to actually eat CPU time, you'll want to remove that 'Sleep' call: your 'processing' takes no significant amount of time, so the thread spends most of its time sleeping.
You'll also want to look at what the optimizer is doing to your code. I wouldn't be totally surprised if it completely removed 'p' and 'red' (and the multiply) in your loop, because the results are never used. You could try marking 'red' as volatile; that should force it not to remove the calculation.