How can I get Windows data like CPU usage, physical memory, and network utilization and bandwidth, similar to what I see in Task Manager? I'm using C++.
You can get CPU usage using the performance data. If you want data for only one (or a few processes) it may be simpler to call GetProcessTimes every few seconds.
It's generally difficult to pin down exactly what physical memory usage even means. Until you do, trying to describe how to measure it is pretty pointless.
You can get some information about network utilization with GetIpStatistics.
You can get the rated bandwidths of the installed network adapters with GetIfTable.
Look Performance Counters in msdn
Related
My program is in C++ and I have one server listening to a number of clients. Clients send small packets to server. I'm running my code on Ubuntu.
I want to measure the CPU utilization and possibly total number of CPU cycles on both sides, ideally with a breakdown on cycles/utilization spent on networking (all the way from NIC to the user space and vice versa), kernel space, user space, context switches, etc.
I did some search, but I couldn't figure out whether it should be done inside my C++ code or an external profiler should be used, or perhaps some other way.
Your best friend/helper in this case is the /proc file system in Linux. In /proc you will find CPU usage, memory usage, power usage etc. Have a look at this link
http://www.linuxhowtos.org/System/procstat.htm
Even you can check each process cpu usage by looking at the files /proc/process_id/stat.
Take a look at the RDTSCP instruction and some other ways to measure performacnce metrics. System simulators like SniperSim, Gem5 etc. can also give total cycle count of your running program ( however, they may not be very accurate - there are some conditions that need to be met (core frequencies are same etc.))
As I commented, you probably should consider using oprofile. I am not very familiar with it (and it may be complex to use, and require system-wide configuration)
I'm writing a small Linux application which logs the computer's power consumption along with CPU utilisation and disk utilisation. I'd like to add the ability to log memory bandwidth currently being used so I can see how well that correlates with a power consumption.
I understand that I can get information about the amount of memory currently allocated from /proc/meminfo but, of course, that doesn't tell me how much bandwidth is being used at present. Does anyone know how I could measure memory bandwidth currently in use?
edit
I'd like this to work primarily on the x86 and x86-64 platforms
It's highly CPU-dependent but you'll need to be able to get access to the CPU's performance registers. You may be able to do this via oprofile. Note that not all CPUs have a performance register (or combination of registers) which can be used to calculate to memory bandwidth usage, however.
My application buffers data for likely requests in the background. Currently I limit the size of the buffer based on a command-line parameter, and begin dumping less-used data when we hit this limit. This is not ideal because it relies on the user to specify a performance-critical parameter. Is there a better way to handle this? Is there a way to automatically monitor system memory use and dump the oldest/least-recently-used data before the system starts to thrash?
A complicating factor here is that my application runs on Linux, OSX, and Windows. But I'll take a good way to do this on only one platform over nothing.
Your best bet would likely be to monitor your applications working set/resident set size, and try to react when it doesn't grow after your allocations. Some pointers on what to look for:
Windows: GetProcessMemoryInfo
Linux: /proc/self/statm
OS X: task_info()
Windows also has GlobalMemoryStatusEx which gives you a nice Available Physical Memory figure.
I like your current solution. Letting the user decide is good. It's not obvious everyone would want the buffer to be as big as possible, is it? If you do invest in implemting some sort of memory monitor for automatically adjusting the buffer/cache size, at least let the user choose between the user set limit and the automatic/dynamic one.
I know this isn't a direct answer, but I'd say step back a bit and maybe don't do this.
Even if you have the API to see current physical memory usage, that's not enough to choose an ideal cache size. That would depend on your typical and future workloads for both the program and the machine (and the overall system of all clients running this program + the server(s) they're querying), the platform's caching behavior, whether the system should be tuned for throughput or latency, and so on. In a tight memory situation, you're going to be competing for memory with other opportunistic caches, including the OS's disk cache. On the one hand, you want to be exerting some pressure on them, to force out other low-value data. On the other hand, if you get greedy while there's plenty of memory, you're going to be affecting the behavior of other adaptive caches.
And with speculative caching/prefetching, the LRU value function is odd: you will (hopefully) fetch the most-likely-to-be-called data first, and less-likely data later, so the LRU data in your prefetch cache may be less valuable than older data. This could lead to perverse behavior in the systemwide set of caches by artificially "heating up" less commonly used data.
It seems unlikely that your program would be able to make a cache size choice better than a simple fixed size, perhaps scaled based on the size of overall physical memory on the machine. And there's very little chance it could beat a sysadmin who knows the machine's typical workload and its performance goals.
Using an adaptive cache sizing strategy means that your program's resource usage is going to be both variable and unpredictable. (With respect to both memory and the I/O and server requests used to populate that prefetch cache.) For a lot of server situations, that's not good. (Especially in HPC or DB servers, which this sounds like it might be for, or a high-utilization/high-throughput environment.) Consistency, configurability, and availability are often more important than maximum resource utilization. And locality of reference tends to fall off quickly, so you're likely getting very diminishing returns with larger cache sizes. If this is going to be used server-side, at least leave the option for explicit control of cache sizes, and probably make that the default, if not only, option.
There is a way: it is called virtual memory (vm). All three operating systems listed will use virtual memory (vm), unless there is no hardware support (which may be true in embedded systems). So I will assume that vm support is present.
Here is a quote from the architecture notes of the Varnish project:
The really short answer is that computers do not have two kinds of storage any more.
I would suggest you read the full text here: http://www.varnish-cache.org/trac/wiki/ArchitectNotes
It is a good read, and I believe will answer your question.
You could attempt to allocate some large-ish block of memory then check for a memory allocation exception. If the exception occurs, dump data. The problem is, this will only work when all system memory (or process limit) is reached. This means your application is likely to start swapping.
try {
char *buf = new char[10 * 1024 * 1024]; // 10 megabytes
free(buf);
} catch (const std::bad_alloc &) {
// Memory allocation failed - clean up old buffers
}
The problems with this approach are:
Running out of system memory can be dangerous and cause random applications to be shut down
Better memory management might be a better solution. If there is data that can be freed, why has it not already been freed? Is there a periodic process you could run to clean up unneeded data?
Any recommendations out there for Windows application tuning resources (books web sites etc.)?
I have a C++ console application that needs to feed a hardware device with a considerable amount of data at a fairly high rate. (buffer is 32K in size and gets consumed at ~800k bytes per second)
It will stream data without under runs, except when I perform file IO like opening a folder etc... (It seems to be marginally meeting its timing requirements).
Anyway.. a good book or resource to brush up on realtime performance with windows would be helpful.
Thanks!
The best you can hope for on commodity Windows is "usually meets timing requirements". If the system is running any processes other than your target app, it will occasionally miss deadlines due scheduling inconsistencies. However, if your app/hardware can handle the rare but occasional misses, there are a few things you can do to reduce the number of misses.
Set your process's priority to REALTIME_PRIORITY_CLASS
Change the scheduler's granularity to 1ms resolution via the timeBeginPeriod() function (part of the Windows Multimedia libraries)
Avoid as many system calls in your main loop as possible (this includes allocating memory). Each syscall is an opportunity for the OS to put the process to sleep and, consequently, is an opportunity for the non-deterministic scheduler to miss the next deadline
If this doesn't get the job done for you, you might consider trying a Linux distribution with realtime kernel patches applied. I've found those to provide near-perfect timing (within 10s of microseconds accuracy over the course of several hours). That said, nothing short of a true-realtime OS will actually give you perfection but the realtime-linux distros are much closer than commodity Windows.
The first thing I would do is tune it to where it's as lean as possible. I use this method. For these reasons. Since it's a console app, another option is to try out LTProf, which will show you if there is anything you can fruitfully optimize. When that's done, you will be in the best position to look for buffer timing issues, as #Hans suggested.
Optimizing software in C++ from agner.com is a great optimization manual.
As Rakis said, you will need to be very careful in the processing loop:
No memory allocation. Use the stack and preallocated memory instead.
No throws. Exceptions are quite expensive, in win32 they have a cost even not throwing.
No polymorphism. You will save some indirections.
Use inline extensively.
No locks. Try lock-free approaches when possible.
The buffer will last for only 40 milliseconds. You can't guarantee zero under-runs on Windows with such strict timing requirements. In user mode land, you are looking at, potentially, hundreds of milliseconds when kernel threads do what they need to do. They run with higher priorities that you can ever gain. The thread quantum on the workstation version is 3 times the clock tick, already beyond 40 milliseconds (3 x 15.625 msec). You can't even reliably compete with user mode threads that boosted their priority and take their sweet old time.
If a bigger buffer is not an option then you are looking at a device driver to get this kind of service guarantee. Or something in between that can provide a larger buffer.
I'm writing a performance critical application where its essential to store as much data as possible in the physical memory before dumping to disc.
I can use ::GlobalMemoryStatusEx(...) and ::GetProcessMemoryInfo(...) to find out what percentage of physical memory is reserved\free and how much memory my current process handles.
Using this data I can make sure to dump when ~90% of the physical memory is in use or ~90 of the maximum of 2GB per application limit is hit.
However, I would like a method for simply recieving how many bytes are actually left before the system will start using the virtual memory, especially as the application will be compiled for both 32bit and 64bit, whereas the 2 GB limit doesnt exist.
How about this function:
int
bytesLeftUntilVMUsed() {
return 0;
}
it should give the correct result in nearly all cases I think ;)
Imagine running Windows 7 in 256Mb of RAM (MS suggest 1GB minimum). That's effectively what you're asking the user to do by wanting to reseve 90% of available RAM.
The real question is: Why do you need so much RAM? What is the 'performance critical' criteria exactly?
Usually, this kind of question implies there's something horribly wrong with your design.
Update:
Using top of the range RAM (DDR3) would give you a theoretical transfer speed of 12GB/s which equates to reading one 32 bit value every clock cycle with some bandwidth to spare. I'm fairly sure that it is not possible to do anything useful with the data coming into the CPU at that speed - instruction processing stalls would interrupt this flow. The extra, unsued bandwidth can be used to page data to/from a hard disk. Using RAID this transfer rate can be quite high (about 1/16th of the RAM bandwidth). So it would be feasible to transfer data to/from the disk and process it without having any degradation of performance - 16 cycles between reads is all it would take (OK, my maths might be a bit wrong here).
But if you throw Windows into the mix, it all goes to pot. Your memory can go away at any moment, your application can be paused arbitrarily and so on. Locking memory to RAM would have adverse affects on the whole system, thus defeating the purpose of locing the memory.
If you explain what you're trying to acheive and the performance critria, there are many people here that will help develop a suitable solution, because if you have to ask about system limits, you really are doing something wrong.
Even if you're able to stop your application from having memory paged out to disk, you'll still run into the problem that the VMM might be paging out other programs to disk and that might potentially affect your performance as well. Not to mention that another application might start up and consume memory that you're currently occupying and thus resulting in some of your applications memory being paged out. How are you planning to deal with that?
There is a way to use non-pageable memory via the non-paged pool but (a) this pool is comparatively small and (b) it's used by device drivers and might only be usable from inside the kernel. It's also not really recommended to use large chunks of it unless you want to make sure your system isn't that stable.
You might want to revisit the design of your application and try to work around the possibility of having memory paged to disk before you either try to write your own VMM or turn a Windows machine into essentially a DOS box with more memory.
The standard solution is to not worry about "virtual" and worry about "dynamic".
The "virtual" part of virtual memory has to be looked at as a hardware function that you can only defeat by writing your own OS.
The dynamic allocation of objects, however, is simply your application program's design.
Statically allocate simple arrays of the objects you'll need. Use those arrays of objects. Increase and decrease the size of those statically allocated arrays until you have performance problems.
Ouch. Non-paged pool (the amount of RAM which cannot be swapped or allocated to processes) is typically 256 MB. That's 12.5% of RAM on a 2GB machine. If another 90% of physical RAM would be allocated to a process, that leaves either -2,5% for all other applications, services, the kernel and drivers. Even if you'd allocate only 85% for your app, that would still leave only 2,5% = 51 MB.