Can I get C++ code to read a Windows perfmon counter (category, counter name and instance name)?
It's very easy in C# but I need C++ code.
Thanks
As Doug T. pointed out earlier, I posted a helper class a while ago to query performance counter values. The usage of the class is pretty simple: all you have to do is provide the string for the performance counter.
http://askldjd.wordpress.com/2011/01/05/a-pdh-helper-class-cpdhquery/
However, the code I posted on my blog has been modified in practice. From your comment, it seems like you are interested in querying just a single field.
In this case, try adding the following function to my CPdhQuery class.
double CPdhQuery::CollectSingleData()
{
    double data = 0;
    while (true)
    {
        // Collect a sample. Rate counters need two samples before
        // PdhGetFormattedCounterValue can produce a value.
        status = PdhCollectQueryData(hQuery);
        if (ERROR_SUCCESS != status)
        {
            throw CException(GetErrorString(status));
        }

        // Format the performance data record.
        PDH_FMT_COUNTERVALUE cv;
        status = PdhGetFormattedCounterValue(hCounter,
            PDH_FMT_DOUBLE,
            (LPDWORD)NULL,
            &cv);
        if (ERROR_SUCCESS != status)
        {
            // Typically PDH_INVALID_DATA on the first pass;
            // loop around and collect another sample.
            continue;
        }

        data = cv.doubleValue;
        break;
    }
    return data;
}
For example, to get processor time:
counter = boost::make_shared<CPdhQuery>(std::tstring(_T("\\Processor Information(_Total)\\% Processor Time")));
To get file read bytes / sec:
counter = boost::make_shared<CPdhQuery>(std::tstring(_T("\\System\\File Read Bytes/sec")));
To get % Committed Bytes In Use:
counter = boost::make_shared<CPdhQuery>(std::tstring(_T("\\Memory\\% Committed Bytes In Use")));
To get the data, do this.
double data = counter->CollectSingleData();
I hope this helps.
... Alan
Some of the commonly used performance values have API calls to get them directly. For example, the total processor time can be obtained from GetSystemTimes, and you can calculate the percentage yourself.
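For instance, a minimal sketch of that calculation (the 100 ms sampling interval is an arbitrary choice; note that the kernel time reported by GetSystemTimes includes idle time):

#include <windows.h>

// Convert a FILETIME to a 64-bit tick count (100ns units).
static ULONGLONG ToUInt64(const FILETIME &ft)
{
    ULARGE_INTEGER li;
    li.LowPart = ft.dwLowDateTime;
    li.HighPart = ft.dwHighDateTime;
    return li.QuadPart;
}

// Sample total CPU usage (%) over a short interval.
double GetTotalCpuUsage()
{
    FILETIME idle1, kernel1, user1, idle2, kernel2, user2;
    GetSystemTimes(&idle1, &kernel1, &user1);
    Sleep(100); // sampling interval
    GetSystemTimes(&idle2, &kernel2, &user2);

    ULONGLONG idle   = ToUInt64(idle2)   - ToUInt64(idle1);
    ULONGLONG kernel = ToUInt64(kernel2) - ToUInt64(kernel1);
    ULONGLONG user   = ToUInt64(user2)   - ToUInt64(user1);
    ULONGLONG total  = kernel + user;    // kernel already contains idle
    if (total == 0) return 0.0;
    return 100.0 * (double)(total - idle) / (double)total;
}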
If this isn't an option then the Performance Data Helper library provides a moderately simple interface to performance data.
I'm trying to write a program whose job is to go into shared memory, retrieve a piece of information (a struct 56 bytes in size), then parse that struct lightly and write it to a database.
The catch is that it needs to do this several tens of thousands of times per second. I'm running this on a dedicated Ubuntu 14.04 server with dual Xeon X5677's and 32GB RAM. Also, Mongo is running PerconaFT as its storage engine. I am making an uneducated guess here, but say the worst-case load scenario would be 100,000 writes per second.
Shared memory is populated by another program that's reading information from a real-time data stream, so I can't necessarily reproduce scenarios.
First... is Mongo the right choice for this task?
Next, this is the code that I've got right now. It starts with creating a list of collections (the list of items I want to record data points on is fixed) and then retrieving data from shared memory until it catches a signal.
int main()
{
    //these deal with navigating shared memory
    uint latestNotice=0, latestTurn=0, latestPQ=0, latestPQturn=0;
    symbol_data *notice = nullptr;
    bool done = false;

    //this is our 56 byte struct
    pq item;

    uint64_t today_at_midnight; //since epoch, in milliseconds
    {
        time_t seconds = time(NULL);
        today_at_midnight = seconds/(60*60*24);
        today_at_midnight *= (60*60*24*1000);
    }

    //connect to shared memory
    infob = info_block_init();
    uint32_t used_symbols = infob->used_symbols;
    getPosition(latestNotice, latestTurn);

    //fire up mongo
    mongoc_client_t *client = nullptr;
    mongoc_collection_t *collections[used_symbols];
    mongoc_collection_t *collection = nullptr;
    bson_error_t error;
    bson_t *doc = nullptr;
    mongoc_init();
    client = mongoc_client_new("mongodb://localhost:27017/");
    for(uint32_t symbol = 0; symbol < used_symbols; symbol++)
    {
        collections[symbol] = mongoc_client_get_collection(client, "scribe",
                                  (infob->sd+symbol)->getSymbol());
    }

    //this will be used later to sleep one millisecond
    struct timespec ts;
    ts.tv_sec = 0;
    ts.tv_nsec = 1000000;

    while(continue_running) //becomes false if a signal is caught
    {
        //check that new info is available in shared memory
        //sleep 1ms if it isn't
        while(!getNextNotice(&notice, latestNotice, latestTurn)) nanosleep(&ts, NULL);

        //get the new info
        done = notice->getNextItem(item, latestPQ, latestPQturn);
        if(done) continue;

        //just some simple array math to make sure we're on the right collection
        collection = collections[notice - infob->sd];

        //switch on the item type and parse it accordingly
        switch(item.tp)
        {
        case pq::pq_event:
            doc = BCON_NEW(
                //decided to use this instead of std::chrono
                "ts", BCON_DATE_TIME(today_at_midnight + item.ts),
                //item.pr is a uint64_t, and the guidance I've read on mongo
                //advises using strings for those values
                "pr", BCON_UTF8(std::to_string(item.pr).c_str()),
                "sz", BCON_INT32(item.sz),
                "vn", BCON_UTF8(venue_labels[item.vn]),
                "tp", BCON_UTF8("e")
            );
            if(!mongoc_collection_insert(collection, MONGOC_INSERT_NONE, doc, NULL, &error))
            {
                LOG(1,"Mongo Error: "<<error.message<<endl);
            }
            break;
        //obviously, several other cases go here, but they all look the
        //same, using BCON macros for their data.
        default:
            LOG(1,"got unknown type = "<<item.tp<<endl);
            break;
        }
    }

    //clean up once we break from the while()
    if(doc != nullptr) bson_destroy(doc);
    for(uint32_t symbol = 0; symbol < used_symbols; symbol++)
    {
        collection = collections[symbol];
        mongoc_collection_destroy(collection);
    }
    if(client != nullptr) mongoc_client_destroy(client);
    mongoc_cleanup();
    return 0;
}
My second question is: is this the fastest way to do this? The retrieval from shared memory isn't perfect, but this program is getting way behind its supply of data, far more so than I can afford. So I'm looking for obvious mistakes with regard to efficiency or technique when speed is the goal.
Thanks in advance. =)
I know there are several answers here on SO on how to measure CPU usage with either of two approaches:
By using the performance counters (PDH API)
By using GetProcessTimes() and dividing that against either wall time or times from GetSystemTimes()
For some days now I have been failing miserably to perform CPU usage measurements of my program with either of these - with both mechanisms I get a CPU usage that is smaller than the one displayed in Task Manager or Process Explorer. Is there some magic these tools use, and is it related to HyperThreading being enabled? I will perform my tests on a CPU without HyperThreading, but if anyone can point out what I am missing here I would be very thankful.
To illustrate what I have tried, here is the code that does the PDH-based measurements:
class CCpuUsageMonitor
{
public:
    CCpuUsageMonitor(const wchar_t* pProcessName)
    {
        GetSystemInfo(&m_SystemInfo);

        auto nStatus = PdhOpenQuery(NULL, NULL, &m_hPdhQuery);
        _ASSERT(nStatus == ERROR_SUCCESS);

        nStatus = PdhAddCounter(m_hPdhQuery, L"\\Processor(_Total)\\% Processor Time", NULL, &m_hPdhCpuUsageCounter);
        _ASSERT(nStatus == ERROR_SUCCESS);

        wchar_t pCounterPath[PDH_MAX_COUNTER_PATH];
        StringCbPrintf(pCounterPath, PDH_MAX_COUNTER_PATH, L"\\Process(%s)\\%% Processor Time", pProcessName);
        nStatus = PdhAddCounter(m_hPdhQuery, pCounterPath, NULL, &m_hPhdProcessCpuUsageCounter);
        _ASSERT(nStatus == ERROR_SUCCESS);
    }

    ~CCpuUsageMonitor()
    {
        // PdhCloseQuery takes the query handle itself, not a pointer to it.
        PdhCloseQuery(m_hPdhQuery);
    }

    void CollectSample()
    {
        auto nStatus = PdhCollectQueryData(m_hPdhQuery);
        _ASSERT(nStatus == ERROR_SUCCESS);
    }

    double GetCpuUsage()
    {
        DWORD nType;
        PDH_FMT_COUNTERVALUE CounterValue;
        auto nStatus = PdhGetFormattedCounterValue(m_hPdhCpuUsageCounter, PDH_FMT_DOUBLE | PDH_FMT_NOCAP100, &nType, &CounterValue);
        _ASSERT(nStatus == ERROR_SUCCESS);
        return CounterValue.doubleValue;
    }

    double GetProcessCpuUsage()
    {
        DWORD nType;
        PDH_FMT_COUNTERVALUE CounterValue;
        auto nStatus = PdhGetFormattedCounterValue(m_hPhdProcessCpuUsageCounter, PDH_FMT_DOUBLE | PDH_FMT_NOCAP100, &nType, &CounterValue);
        _ASSERT(nStatus == ERROR_SUCCESS);
        return CounterValue.doubleValue / m_SystemInfo.dwNumberOfProcessors;
    }

private:
    SYSTEM_INFO m_SystemInfo;
    PDH_HQUERY m_hPdhQuery;
    PDH_HCOUNTER m_hPdhCpuUsageCounter;
    PDH_HCOUNTER m_hPhdProcessCpuUsageCounter;
};
With the second approach I basically take two snapshots of process times via GetProcessTimes() before and after my code runs, subtract, and divide by the wall time multiplied by the number of processors.
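For reference, a minimal sketch of that calculation for the current process (RunCodeUnderTest is a placeholder for the measured work):

#include <windows.h>

static ULONGLONG FileTimeTo64(const FILETIME &ft)
{
    ULARGE_INTEGER li;
    li.LowPart = ft.dwLowDateTime;
    li.HighPart = ft.dwHighDateTime;
    return li.QuadPart; // 100ns units
}

double MeasureOwnCpuUsage()
{
    FILETIME ftCreate, ftExit, ftKernel1, ftUser1, ftKernel2, ftUser2;
    HANDLE hProcess = GetCurrentProcess();

    ULONGLONG wall1 = GetTickCount64(); // wall clock, in ms
    GetProcessTimes(hProcess, &ftCreate, &ftExit, &ftKernel1, &ftUser1);

    RunCodeUnderTest(); // placeholder

    GetProcessTimes(hProcess, &ftCreate, &ftExit, &ftKernel2, &ftUser2);
    ULONGLONG wall2 = GetTickCount64();

    ULONGLONG cpu = (FileTimeTo64(ftKernel2) - FileTimeTo64(ftKernel1))
                  + (FileTimeTo64(ftUser2)   - FileTimeTo64(ftUser1));
    ULONGLONG wall = (wall2 - wall1) * 10000; // ms -> 100ns units

    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return 100.0 * (double)cpu / ((double)wall * si.dwNumberOfProcessors);
}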
Here are a few links I've used in the past and a good article on why GetThreadTimes is wrong (I wouldn't use it as a reliable source of data):
http://blog.kalmbachnet.de/?postid=28
https://msdn.microsoft.com/en-us/library/aa392397(VS.85).aspx
http://www.drdobbs.com/windows/win32-performance-measurement-options/184416651
https://msdn.microsoft.com/en-us/library/aa394279(VS.85).aspx
You seem well on your way and knowledgeable; those links should get you going in the right direction, at least.
From this link:
Starting with Windows 8, a change was made to the way that Task Manager and Performance Monitor report CPU utilization...
This change affects the way that CPU utilization is computed. The values in Task Manager now correspond to the Processor Information\% Processor Utility and Processor Information\% Privileged Utility performance counters, not to the Processor Information\% Processor Time and Processor Information\% Privileged Time counters as in Windows 7.
Your code will work as written, other than the change in which counters you are querying. You are using the Processor counter set; you should switch to the Processor Information set introduced in Windows 8, and also use the "Utility" versions of the counters.
If you query the formatted value as you currently do, you'll get the same number displayed in Task Manager with 1-second polling.
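In your CCpuUsageMonitor constructor that would just be a different counter path (a sketch; everything else stays the same):

nStatus = PdhAddCounter(m_hPdhQuery,
    L"\\Processor Information(_Total)\\% Processor Utility",
    NULL, &m_hPdhCpuUsageCounter);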
If you want to do calculations over longer intervals, you can query the raw values into a PDH_RAW_COUNTER structure instead of your current PDH_FMT_COUNTERVALUE. The values used to calculate the numerator are in the structure's FirstValue field, and the "base" values for the denominator are in SecondValue.
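A sketch of that raw-value calculation (the arithmetic follows the FirstValue/SecondValue description above; the 5-second interval is arbitrary):

PDH_RAW_COUNTER raw1, raw2;
DWORD dwType;

PdhCollectQueryData(m_hPdhQuery);
PdhGetRawCounterValue(m_hPdhCpuUsageCounter, &dwType, &raw1);
Sleep(5000); // any interval you like
PdhCollectQueryData(m_hPdhQuery);
PdhGetRawCounterValue(m_hPdhCpuUsageCounter, &dwType, &raw2);

// Numerator from FirstValue, denominator ("base") from SecondValue.
double utility = 100.0 * (double)(raw2.FirstValue  - raw1.FirstValue)
                       / (double)(raw2.SecondValue - raw1.SecondValue);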
I have no experience in audio programming, and C++ is quite a low-level language, so I have a few problems with it. I work with the ASIO SDK 2.3 downloaded from http://www.steinberg.net/en/company/developers.html.
I am writing my own host based on the example inside the SDK.
For now I've managed to go through the whole sample and it looks like it's working. I have an external sound card connected to my PC. I've successfully loaded the driver for this device, configured it, handled callbacks, converted data from analog to digital, etc. - common stuff.
And the part where I am stuck now:
When I play some track via my device I can see bars moving in the mixer (the device's software). So the device is connected the right way. In my code I've picked the inputs and outputs with the names of the bars that are moving in the mixer. I've also used ASIOCreateBuffers() to create a buffer for each input/output.
Now correct me if I am wrong:
When ASIOStart() is called and the driver is in the running state, and I feed a sound signal into my external device, I believe the buffers get filled with data, right?
I am reading the documentation but I am a bit lost - how can I access the data sent by the device to the application, stored in the INPUT buffers? Or the signal? I need it for signal analysis or maybe recording in the future.
EDIT: If I made it too complicated, then in a nutshell my question is: how can I access the input stream data from code? I don't see any objects/callbacks letting me do so in the documentation.
The hostsample in the ASIO SDK is pretty close to what you need. In the bufferSwitchTimeInfo callback there is some code like this:
for (int i = 0; i < asioDriverInfo.inputBuffers + asioDriverInfo.outputBuffers; i++)
{
    int ch = asioDriverInfo.bufferInfos[i].channelNum;
    if (asioDriverInfo.bufferInfos[i].isInput == ASIOTrue)
    {
        char* buf = asioDriver.bufferInfos[i].buffers[index];
        ....
Inside that if block, asioDriver.bufferInfos[i].buffers[index] is a pointer to the raw audio data (index is a parameter to the method).
The format of the buffer depends on the driver, and it can be discovered by testing asioDriverInfo.channelInfos[i].type. The formats will be 32-bit int LSB first, 32-bit int MSB first, and so on. You can find the list of values in the ASIOSampleType enum in asio.h. At this point you'll want to convert the samples to some common format for downstream signal processing code. If you're doing signal processing, you'll probably want to convert to double. The file host\asioconvertsample.cpp will give you some idea of what's involved in the conversion. The most common format you're going to encounter is probably INT32 MSB. Here is how you'd convert it to double.
for (int i = 0; i < asioDriverInfo.inputBuffers + asioDriverInfo.outputBuffers; i++)
{
    int ch = asioDriverInfo.bufferInfos[i].channelNum;
    if (asioDriverInfo.bufferInfos[i].isInput == ASIOTrue)
    {
        switch (asioDriverInfo.channelInfos[i].type)
        {
        case ASIOSTInt32LSB:
        {
            double* pDoubleBuf = new double[_bufferSize];
            for (int i = 0 ; i < _bufferSize ; ++i)
            {
                pDoubleBuf[i] = *(int*)asioDriverInfo.bufferInfos.buffers[index] / (double)0x7fffffff;
            }
            // now pDoubleBuf contains one channel's worth of samples in the range of -1.0 to 1.0.
            break;
        }
        // and so on...
Thank you very much. Your answer helped quite a lot, but as I am a bit inexperienced with C++ :P I find it a bit problematic.
In general I've written my own host based on hostsample. I didn't implement the asioDriverInfo structure and use plain variables for now.
My first problem was:
char* buf = asioDriver.bufferInfos[i].buffers[index];
as I got an error that I can't cast (void*) to char*, but this probably solved the problem:
char* buf = static_cast<char*>(bufferInfos[i].buffers[doubleBufferIndex]);
My second problem is with the data conversion. I've checked the file you recommended, but I find it a bit of black magic. For now I am trying to follow your example and:
for (int i = 0; i < inputBuffers + outputBuffers; i++)
{
    if (bufferInfos[i].isInput)
    {
        switch (channelInfos[i].type)
        {
        case ASIOSTInt32LSB:
        {
            double* pDoubleBuf = new double[buffSize];
            for (int j = 0 ; j < buffSize ; ++j)
            {
                pDoubleBuf[j] = bufferInfos[i].buffers[doubleBufferIndex] / (double)0x7fffffff;
            }
            break;
        }
        }
    }
}
I get an error there:
pDoubleBuf[j] = bufferInfos[i].buffers[doubleBufferIndex] / (double)0x7fffffff;
which is:
error C2296: '/' : illegal, left operand has type 'void *'
What I don't get is that in your example there is no array index after bufferInfos: asioDriverInfo.bufferInfos.buffers[index]. And even if I fix it... to what type should I cast it to make it work?
PS. I am sure the ASIOSTInt32LSB data type is right for my PC.
The ASIO input and output buffers are accessible through void pointers, but using memcpy or memmove to access the I/O buffers creates a memory copy, which is to be avoided if you are doing real-time processing. I would suggest casting the pointer to int* so you can access the samples directly.
It's also very slow in real-time processing to convert samples one by one when you have 100+ audio channels, while AVX2 is supported on most CPUs.
_mm256_loadu_si256() and _mm256_cvtepi32_ps() will do the conversion much faster.
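For example, a sketch of an AVX2 conversion of ASIOSTInt32LSB samples to float, eight at a time (assumes the buffer length is a multiple of 8; handle any tail samples with a scalar loop):

#include <immintrin.h>
#include <cstdint>

// Convert 32-bit int samples to normalized floats in [-1.0, 1.0).
void ConvertInt32ToFloatAVX2(const int32_t* in, float* out, size_t n)
{
    const __m256 scale = _mm256_set1_ps(1.0f / 2147483648.0f); // 1 / 2^31
    for (size_t i = 0; i + 8 <= n; i += 8)
    {
        __m256i v = _mm256_loadu_si256((const __m256i*)(in + i)); // load 8 ints
        __m256  f = _mm256_cvtepi32_ps(v);                        // int -> float
        _mm256_storeu_ps(out + i, _mm256_mul_ps(f, scale));       // scale and store
    }
}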
I have a socket program which acts as both client and server.
It initiates a connection on an input port and reads data from it. In a real-time scenario it reads data on the input port and sends the data (record by record) on to the output port.
The problem is that while sending data to the output port, CPU usage increases to 50%, which is not permissible.
while(1)
{
    if(IsInputDataAvail==1) //check if data is available on input port
    {
        //condition to avoid duplications while sending
        if( LastRecordSent < LastRecordRecvd )
        {
            record_time temprt;
            list<record_time> BufferList;
            list<record_time>::iterator j;
            list<record_time>::iterator i;

            // Storing into a temp list
            for(i = L.begin(); i != L.end(); ++i)
            {
                if((i->recordId > LastRecordSent) && (i->recordId <= LastRecordRecvd))
                {
                    temprt.listrec = i->listrec;
                    temprt.recordId = i->recordId;
                    temprt.timestamp = i->timestamp;
                    BufferList.push_back(temprt);
                }
            }

            //Sending to output port
            for(j = BufferList.begin(); j != BufferList.end(); ++j)
            {
                LastRecordSent = j->recordId;
                std::string newlistrecord = j->listrec;
                newlistrecord.append("\n");
                char* newrecord = new char [newlistrecord.size()+1];
                strcpy (newrecord, newlistrecord.c_str());

                if ( s.OutputClientAvail() == 1) //check if output client is available
                {
                    int ret = s.SendBytes(newrecord, strlen(newrecord));
                    if ( ret < 0)
                    {
                        log1.AddLogFormatFatal("Nice Send Thread : Nice Client Disconnected");
                        --connected;
                        return;
                    }
                }
                else
                {
                    log1.AddLogFormatFatal("Nice Send Thread : Nice Client Timedout..connection closed");
                    --connected; //if output client not available disconnect after a timeout
                    return;
                }
            }
        }
    }
    // Sleep(100); if we include sleep here CPU usage is less..but to send data real time I need to remove this sleep.
} //End of while loop
If I remove the Sleep(), CPU usage goes very high while sending data to the output port.
Are there any possible ways to maintain real-time data transfer and reduce CPU usage? Please suggest.
There are two potential CPU sinks in the listed code. First, the outer loop:
while (1)
{
    if (IsInputDataAvail == 1)
    {
        // Not run most of the time
    }
    // Sleep(100);
}
Given that the Sleep call significantly reduces your CPU usage, this spin-loop is the most likely culprit. It looks like IsInputDataAvail is a variable set by another thread (though it could be a preprocessor macro), which would mean that almost all of that CPU is being used to run this one comparison instruction and a couple of jumps.
The way to reclaim that wasted power is to block until input is available. Your reading thread probably does so already, so you just need some sort of semaphore to communicate between the two, with a system call to block the output thread. Where available, the ideal option would be sem_wait() in the output thread, right at the top of your loop, and sem_post() in the input thread, where it currently sets IsInputDataAvail. If that's not possible, the self-pipe trick might work in its place.
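A minimal sketch of that pattern with POSIX semaphores (the names here are illustrative, not from your code):

#include <semaphore.h>

sem_t dataAvailable; /* shared by both threads; sem_init(&dataAvailable, 0, 0) at startup */

/* Input thread: where it currently sets IsInputDataAvail */
sem_post(&dataAvailable);

/* Output thread: at the top of the sending loop, instead of spinning */
sem_wait(&dataAvailable); /* blocks until the input thread posts */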
The second potential CPU sink is in s.SendBytes(). If a positive result indicates that the record was fully sent, then that method must be using a loop. It probably uses a blocking call to write the record; if it doesn't, then it could be rewritten to do so.
Alternatively, you could rewrite half the application to use select(), poll(), or a similar method to merge reading and writing into the same thread, but that's far too much work if your program is already mostly complete.
if(IsInputDataAvail==1)//check if data is available on input port
Get rid of that. Just read from the input port. It will block until data is available. This is where most of your CPU time is going. However there are other problems:
std::string newlistrecord = j->listrec;
Here you are copying data.
newlistrecord.append("\n");
char* newrecord= new char [newlistrecord.size()+1];
strcpy (newrecord, newlistrecord.c_str());
Here you are copying the same data again. You are also dynamically allocating memory, and you are also leaking it.
if ( s.OutputClientAvail() == 1) //check if output client is available
I don't know what this does but you should delete it. The following send is the time to check for errors. Don't try to guess the future.
int ret = s.SendBytes(newrecord,strlen(newrecord));
Here you are recomputing the length of a string you already knew back when you set j->listrec. It would be much more efficient to just call s.SendBytes() directly with j->listrec and then again with "\n" than to do all this. TCP will coalesce the data anyway.
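Something along these lines (a sketch; it assumes SendBytes accepts a const buffer, otherwise a const_cast is needed):

for (j = BufferList.begin(); j != BufferList.end(); ++j)
{
    LastRecordSent = j->recordId;
    // Send the record and the newline separately: no copy, no allocation, no strlen.
    if (s.SendBytes(j->listrec.c_str(), j->listrec.size()) < 0 ||
        s.SendBytes("\n", 1) < 0)
    {
        log1.AddLogFormatFatal("Nice Send Thread : Nice Client Disconnected");
        --connected;
        return;
    }
}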
I would like to create a web service with some problems and a judge system. The user should write a solution in C++ and then upload the source code. The code should then be compiled and tested to determine whether the solution is right (like TopCoder does when you submit your code). There should be time and memory limits for solutions to run. Are there some ready-to-use judge systems for creating such a service on Linux?
Do you mean an online judge system like acm.timus.ru? Then you could try an open-source project on code.google.com, and I guess that they use it here. I studied this code a little bit before: basically it blocks some system calls and measures spent time and memory. For details, have a look here.
I used the DOMjudge system, which is ready for online competitions: http://domjudge.sourceforge.net/
For just checking memory and time caps, you can add code like the following into the submitted project automatically. Also, see Limit physical memory per process for memory caps.
For example, to check memory, add:
(EDITED FOR LINUX EXAMPLE)
#include "sys/types.h"
#include "sys/sysinfo.h"
#inlcude <stdio.h>
#include <pthread.h>
public long long getMemoryUsage()
{
sysinfo (&memInfo);
long long totalVirtualMem = memInfo.totalram; //Total physical memory
}
And add a new thread to their main() that calls a function like the following:
static const long long MAXMEMORY = 1024;  // Memory cap
static const int LIMITAPPLICATION = 100;  // 100 * 100ms = 10 seconds
static int COUNTER = 0;

static void TimeLimitToApplication()
{
    //LIMITAPPLICATION represents how many 100ms's must
    //pass until the time limit is reached.
    while(COUNTER < LIMITAPPLICATION)
    {
        usleep(100 * 1000); // 100ms; note plain sleep(100) would be 100 seconds
        /* Include if including the code block below
        if(CheckProcessThreadsIsComplete()) break;
        */
        //Check memory usage every 100ms
        if(getMemoryUsage() > MAXMEMORY)
            break;
        COUNTER++;
    }
    pid_t pgid = getpgid(0);
    killpg(pgid, SIGTERM); // signal 15, sent to the whole process group
}
You might also want to add another check to see if all your threads are completed:
(I only have a C#.NET sample of this, but I'm sure you could find something similar.)
public static bool CheckProcessThreadsIsComplete()
{
    Process[] allProcs = Process.GetProcesses();
    foreach(Process proc in allProcs)
    {
        ProcessThreadCollection myThreads = proc.Threads;
        foreach(ProcessThread pt in myThreads)
        {
            //Ignore the thread that you started to do the time and memory checks
            if(pt.Id == TIMELIMIT_AND_MEMORYCHECK_THREAD_ID) continue;
            //System.Diagnostics.ThreadState: anything not terminated is still running
            if(pt.ThreadState != ThreadState.Terminated)
                return false; // at least one thread has not completed
        }
    }
    return true; // all threads are completed
}
And add that to your checks in TimeLimitToApplication();