How to monitor TLB misses (windows)? - c++

Is there a way in windows to monitor the TLB misses of an application? The resource monitor in windows shows me 0 Hard faults/sec (so, TLB misses where the page isn't in main memory). But is there also some way to monitor TLB misses where the page is in main memory?
I have an app where I need to make random accesses across about 100GB of data. I'm running it on a computer with 160GB of ram and am keeping all the data in the working set. Still, what I'm seeing is that during the sections where the random access happens the CPUs are shown in windows task manager to only run at about 20% load (the app is multi-threaded with as many threads as there are cpu cores, without any critical sections and with no I/O).
I'm currently suspecting TLB misses to be the problem and wonder how I could confirm/reject this theory.

Related

Retrieving disk read/write max speed (programmatically)

I am in the process of creating a C++ application that measures disk usage. I've been able to retrieve current disk usage (read and write speeds) by reading /proc/diskstats at regular intervals.
I would now like to be able to display this usage as a percentage (I find it is more user-friendly than raw numbers, which can be hard to interpret). Therefore, does anyone know of a method for retrieving maximum (or nominal) disk I/O speed programmatically on Linux (API call, reading a file, etc)?
I am aware of various answers about measuring disks speeds(eg https://askubuntu.com/questions/87035/how-to-check-hard-disk-performance), but all are through testing. I would like to avoid such methods as they take some time to run and entail heavy disk I/O while running (thus potentially degrading the performance of other running applications).
In the advent of IBM PC era, there was a great DOS utility, I forgot its name, but it was measuring the speed of the computer (maybe Speedtest? whatever). There was a bar in the 2/3 bottom of the screen, which is represented the speed of the CPU. If you had a 4.0 MHz (not GHz!) the bar occupied the 10% of the screen.
2-3 years later, '386 computers has risen, and the speed indicator bar overgrown not just the line but the screen, and it looked crappy.
So, there is no such as 100% disk speed, CPU speed etc.
The best you can do: if you program runs for a while, you can remember the highest value and set it as 100%. Probably you may save the value into a tmp file.

Windows pauses my process after a short time if memory consumption is high

I am using a 3d-lattice to update two fields in time, using OpenCL kernels for the update rule, and a C++ host program, and run my program under Windows 64bit with 8GB RAM. The application is built using VS2017.
My problem is: No matter whether I use my graphics card or the CPU for the computation, the application is paused by Windows after a brief time (about 15min), and I have to press a key in the open console to wake it up, afer which it continues running, but stops outputting status information to the console (which it should do).
This happens only when I use a lot of memory, i.e. compute on a big lattice with at least 3GB of allocated memory, with less memory consumption the program runs just fine for as long as I need it to.
Of course, I would like to be able to run my simulations without having to watch my PC all the time.I already tried increasing the priority of the process, which did not help.
Is there a way to tell Windows to leave my processes running?

Windows based C++ application consumes more CPU over time

We have a C++ based Multi-threaded application on Windows that captures network packets in real-time using the WinPCAP library and then processes these packets for monitoring the network. This application is intended to run 24x7. Our applicatin easily consumes 7-8 GB of RAM.
Issue that we are observing :
Lets say the application is monitoring 100Mbps of network traffic and consumes 60% CPU. We have observed that when the application keeps running for a longer duration like a day or two, the CPU consumption of the application increases to like 70-80%, even though it is still processing 100 Mbps traffic (doing the same amount of work).
We have tried to debug this issue to the thread level using ProcessExplorer and noticed that the packet capturing threads start consuming more CPU over time. This issue is not resolved even after re-starting the application. Only a machine restart solves the problem.
We have observed this issue is easily reproducible on Windows 2012 R2 Server OS during over night runs. In Windows 7, the issue happens but over few days.
Any idea what might be causing this ?
Thanks in Advance
What about memory allocation? Because you are using lots of memory it could be a memory fregmentation problem so if you do several allocation/reallocation of buffers this of course will cause a major cost for the processor to find and allocate space available.
I finally found the reason for the above behavior : it was the winpcap code that was causing it. After replacing that, we did not observe this behavior.

Slow or delayed loading of my application

Question:
My question is what will be the impact on my application memory footprint or performance if I replace functions like foo1 (which I have in my code) below with foo2. This function is called frequently in application.
#define SIZE 5000
void foo1()
{
double data[SIZE];
// ....
}
void foo2()
{
std::unique_ptr< double[] > data( new double[SIZE] );
// ....
}
Context:
My MFC application loads really slow on the embedded device running Windows 7 after implementation of new features/modules. The same application loads fast on PC. At least one of the difference and what I suspect is the cause is RAM on embedded unit is really low, just 768 MB.
I debugged it to find out where does this delay occurs and recorded time stamps within application in loading process. What I discovered was interesting. When I double click the exe, it takes about a minute to record the first time stamp and after that it runs fast, so all the delay is right there.
My theory is that windows is taking all this time to setup the environment for exe and once done, it runs fast. The reason I suspect this is there are a lot big structures declared on stack in the application to the point I had to move some of them to heap to get rid of stack overflow errors even on PC with new features.
What do you think is the cause of the slow or more accurately delayed loading of executable on low RAM machine? Do you think it will fix up if I move all of the big structures from stack to heap?
There are not a lot of things that take a minute in modern day computing. Not on a machine with an embedded version of Windows either. Not the processor, not the RAM, not the disk.
Except one, networking is still based on assumptions that were last valid in the 1980s. TCP/IP has taken over as the only protocol in common use. But has a flaw, there is no reasonable way to discover how long a connection attempt might take. So connection timeouts are based on absolute worst-case conditions, trying to hook up to a machine half-way around the world, connected with a modem that needs to spin up the drum to load the program.
The minimum timeout on Windows is hard-baked at 45 seconds. And, in general, a condition that certainly isn't unlikely in an embedded machine. You might have hooked it up to a network to get it initialized but it isn't connected anymore or the machine you copied from might no longer be powered up.
Chase it down by first looking for a disconnected disk drive, very common. Next use SysInternals' utilities like TcpView to look for network activity, like trying to connect to a CRL server. Use Process Explorer to find out where the program is stuck. Mark Russinovich' blog is excellent to show his trouble-shooting strategies using these tools. Good luck with it.

Octo.py only using between 0% and 3% of my CPUs

I have been running a Python octo.py script to do word counting/author on a series of files. The script works well -- I tried it on a limited set of data and am getting the correct results.
But when I run it on the complete data set it takes forever. I am running on a windows XP laptop with dual core 2.33 GHz and 2 GB RAM.
I opened up my CPU usage and it shows the processors running at 0%-3% of maximum.
What can I do to force Octo.py to utilize more CPU?
Thanks.
As your application isn't very CPU intensive, the slow disk turns out to be the bottleneck. Old 5200 RPM laptop hard drives are very slow, which, in addition to fragmentation and low RAM (which impacts disk caching), make reading very slow. This in turns slows down processing and yields low CPU usage. You can try defragmenting, compressing the input files (as they become smaller in disk size, processing speed will increase) or other means of improving IO.