We have a C++, multi-threaded application on Windows that captures network packets in real time using the WinPcap library and then processes these packets to monitor the network. This application is intended to run 24x7. Our application easily consumes 7-8 GB of RAM.
The issue we are observing:
Let's say the application is monitoring 100 Mbps of network traffic and consumes 60% CPU. We have observed that when the application keeps running for a longer duration, like a day or two, its CPU consumption increases to 70-80%, even though it is still processing 100 Mbps of traffic (doing the same amount of work).
We have tried to debug this issue down to the thread level using Process Explorer and noticed that the packet-capturing threads start consuming more CPU over time. The issue persists even after restarting the application; only a machine restart solves the problem.
We have observed that this issue is easily reproducible on Windows Server 2012 R2 during overnight runs. On Windows 7 the issue also happens, but only over a few days.
Any idea what might be causing this?
Thanks in advance.
What about memory allocation? Because you are using lots of memory, it could be a memory fragmentation problem: if you do many allocations/reallocations of buffers, the processor pays an ever-growing cost to find and allocate available space.
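If fragmentation is indeed the culprit, one common mitigation (a hypothetical sketch, not something stated in the question) is to reuse one preallocated buffer per capture thread instead of allocating for every packet:

#include <cstddef>
#include <vector>

// Hypothetical reusable packet buffer: it grows when a larger packet
// arrives and never shrinks, so steady-state traffic causes no
// allocator churn and therefore no fragmentation.
class PacketBuffer
{
public:
    unsigned char* acquire(std::size_t size)
    {
        if (buf_.size() < size)
            buf_.resize(size);  // grow only when needed
        return buf_.data();
    }
private:
    std::vector<unsigned char> buf_;
};

Each capture thread would own one PacketBuffer and copy incoming packets into it, rather than calling new/delete per packet.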
I finally found the reason for the above behavior: it was the WinPcap code that was causing it. After replacing that code, we no longer observed this behavior.
I am using a 3D lattice to update two fields in time, using OpenCL kernels for the update rule and a C++ host program, and I run my program under 64-bit Windows with 8 GB of RAM. The application is built with VS2017.
My problem is: no matter whether I use my graphics card or the CPU for the computation, the application is paused by Windows after a brief time (about 15 minutes), and I have to press a key in the open console to wake it up, after which it continues running but stops outputting status information to the console (which it should do).
This happens only when I use a lot of memory, i.e. when I compute on a big lattice with at least 3 GB of allocated memory; with less memory consumption the program runs just fine for as long as I need it to.
Of course, I would like to be able to run my simulations without having to watch my PC all the time. I already tried increasing the priority of the process, which did not help.
Is there a way to tell Windows to leave my processes running?
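Two things worth ruling out from the host program (hedged sketches, not confirmed fixes for this exact symptom): a stray mouse click can put a Windows console into Quick Edit selection mode, which blocks output until a key is pressed, and power management can suspend a busy machine:

#include <windows.h>

// Disable Quick Edit mode on the console so an accidental click cannot
// freeze output until a key press (which matches the symptom above).
void DisableQuickEditMode()
{
    HANDLE hStdin = GetStdHandle(STD_INPUT_HANDLE);
    DWORD mode = 0;
    if (GetConsoleMode(hStdin, &mode))
    {
        mode &= ~ENABLE_QUICK_EDIT_MODE;  // turn selection mode off
        mode |= ENABLE_EXTENDED_FLAGS;    // required for the change to apply
        SetConsoleMode(hStdin, mode);
    }
}

// Ask Windows not to sleep while the simulation runs, in case the
// pause is power management rather than the console.
void KeepSystemAwake()
{
    SetThreadExecutionState(ES_CONTINUOUS | ES_SYSTEM_REQUIRED);
}

Calling both once at startup is harmless if neither turns out to be the cause.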
I use C++ (Visual Studio 2015) and OpenCV (version 3.2.0) to process data sent from a Kinect v1. My C++ program has no problem when I start debugging it for the first time. After I stop debugging and start again, however, it gets very slow.
I suspect that the program closes without releasing some memory (i.e., a memory leak). I am aware that I would need delete to release memory allocated with new, but I didn't use new in the C++ program (nor did I use malloc(), its C equivalent).
For OpenCV, I call the destroyAllWindows function at the end of the program. For Kinect v1, I also call NuiShutdown(), Release(), and CloseHandle() at the end of the program.
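For reference, a minimal sketch of that teardown order, assuming the Kinect v1 SDK headers (NuiApi.h); hNextFrameEvent is a hypothetical event handle created during initialization:

#include <opencv2/highgui.hpp>
#include <windows.h>
#include <NuiApi.h>

void shutdownAll(HANDLE hNextFrameEvent)
{
    cv::destroyAllWindows();       // close all OpenCV HighGUI windows
    NuiShutdown();                 // stop the Kinect v1 sensor
    CloseHandle(hNextFrameEvent);  // release the Win32 event handle
}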
Is there anything else I need to do to release memory (e.g., memory associated with Mat in OpenCV)? Or is something else causing the decrease in processing speed?
I'd appreciate your help. Thanks.
After the first run, disconnect the Kinect, then reconnect it and try a second run.
If all goes well then, the problem is most likely a stuck thread. Device access is usually handled by separate threads, and especially with USB they can get stuck (in case of an error or a sync problem between access from the host side and what the device side expects) until you disconnect the device (I am not sure which Kinect driver you are using, but the JUNGO version that NuiShutdown() implies has this problem). You can also check Task Manager before disconnecting to see whether some stuck processes are left over from the first run.
To remedy this you need to find out what you are doing wrong during access. It could be:
- a wrong USB port: use the back-side ports, not the front slots (see [Edit1] below)
- an invalid USB transfer request: the device may be waiting for a specific set of commands or a stream, and it blocks everything else until it receives them, so using unsupported commands or reading at the wrong times or with the wrong packet sizes can cause this
- USB communication that is out of sync: the PC host can time out if it does not have enough CPU power while a critical operation is processed (or if too many apps are open in the background)
This can also be caused by a wrong gfx driver, as I suspect you are using rendering ... Intel HD Graphics can generate such problems with ease, especially on notebooks. Try to disable any rendering in your app, or at least limit rendering to OpenGL 1.0, to see if the speed is the same between runs. If this is the case, the whole desktop usually flickers or does not repaint parts of apps ... and animations are sometimes sluggish.
Another problem might be the debugger. If all is well without it, then the debugger is the problem and you cannot really solve that. Debugging while accessing IO can cause sync and timeout problems, especially with USB.
To check for memory leaks you can simply note how much free memory you have before the 1st run and compare it to the values after the 1st, 2nd, 3rd ... runs; if the value keeps dropping, something is stuck somewhere. After an app closes, all the memory belonging to it is freed by the OS, so even a forgotten delete does not matter unless some thread is still running ...
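A minimal sketch of that check using the Win32 GlobalMemoryStatusEx call; run it before and after each test run and compare the numbers:

#include <windows.h>
#include <iostream>

int main()
{
    MEMORYSTATUSEX status;
    status.dwLength = sizeof(status);  // must be set before the call
    if (GlobalMemoryStatusEx(&status))
    {
        std::cout << "Free physical memory: "
                  << status.ullAvailPhys / (1024 * 1024) << " MB\n";
    }
    return 0;
}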
Some USB drivers based on libUSB that I have encountered also have problems with handle leaks. But that behaves differently ... everything runs fine until there are no free handles left. After that the OS becomes non-functional: you cannot open any window, app, anything ... until some app is closed.
[Edit1] Front USB slots
Front slots are usually connected to the motherboard with a relatively long cable (usually flat and not very well shielded), so they are more susceptible to noise. Also, because the cable usually runs near the HDD and above the high-frequency parts of the motherboard, noise is induced into the USB feed. All this degrades the quality of the USB signal, causing a much bigger rejection rate, hence lowering sync capability and also the overall usable bandwidth.
If you compare that with the back-side USB ports, they have no cables but are connected directly on the PCB with short and well-shielded traces, so the connection quality is much better.
So if you use a device demanding high bandwidth or tight synchronization, the front ports are a bad choice.
I am using Eclipse for C/C++ development. I am trying to compile and run a project. When I compile and run the project, after a while my CPU usage reaches 100%. I checked Task Manager and found that Eclipse isn't closing any of the previous builds; they keep running in the background, which uses my CPU heavily. How do I solve this problem? At 100% usage my PC becomes very, very slow.
If you don't want the build to use up all your CPU time (maybe because you want to do other stuff while building), you could decrease the parallelism of the build to a point where it leaves one or more cores unused. For example, if you have 8 cores you could configure your build to use only 6 of them.
Your build will take longer, but your machine will be more responsive for other tasks while the build runs.
Adding more RAM seems to have solved my problem, and disk usage is low now too. Maybe since there wasn't enough RAM in my laptop, the system was constantly paging between RAM and disk, which drove the disk usage up.
Question:
My question is: what will be the impact on my application's memory footprint or performance if I replace functions like foo1 below (which I have in my code) with foo2? This function is called frequently in the application.
#include <memory>  // for std::unique_ptr

#define SIZE 5000

void foo1()
{
    double data[SIZE];  // ~40 KB on the stack; no allocation cost, but a large frame
    // ....
}

void foo2()
{
    std::unique_ptr<double[]> data(new double[SIZE]);  // ~40 KB on the heap; one allocation per call
    // ....
}
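For comparison, a middle-ground pattern sometimes used for frequently called functions (a sketch; foo3 is a hypothetical name, and it assumes the function is never re-entered on the same thread):

#include <vector>

#define SIZE 5000

void foo3()
{
    // One buffer per thread, allocated on the first call and reused
    // afterwards: neither a ~40 KB stack frame nor a heap allocation
    // on every call.
    static thread_local std::vector<double> data(SIZE);
    // ....
}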
Context:
My MFC application loads really slowly on an embedded device running Windows 7 after the implementation of new features/modules. The same application loads fast on a PC. At least one difference, and what I suspect is the cause, is that the RAM on the embedded unit is really low: just 768 MB.
I debugged it to find out where this delay occurs, recording timestamps within the application during the loading process. What I discovered was interesting: when I double-click the exe, it takes about a minute to record the first timestamp, and after that it runs fast, so all the delay is right there.
My theory is that Windows is taking all this time to set up the environment for the exe, and once that is done, it runs fast. The reason I suspect this is that a lot of big structures are declared on the stack in the application, to the point where I had to move some of them to the heap to get rid of stack-overflow errors, even on the PC, after adding the new features.
What do you think is the cause of the slow, or more accurately delayed, loading of the executable on the low-RAM machine? Do you think it will be fixed if I move all of the big structures from the stack to the heap?
There are not a lot of things that take a minute in modern-day computing. Not on a machine with an embedded version of Windows either. Not the processor, not the RAM, not the disk.
Except one: networking is still based on assumptions that were last valid in the 1980s. TCP/IP has taken over as the only protocol in common use, but it has a flaw: there is no reasonable way to discover how long a connection attempt might take. So connection timeouts are based on absolute worst-case conditions: trying to hook up to a machine halfway around the world, connected with a modem that needs to spin up the drum to load the program.
The minimum timeout on Windows is hard-baked at 45 seconds. And a dead network connection certainly isn't unlikely on an embedded machine: you might have hooked it up to a network to get it initialized, but it isn't connected anymore, or the machine you copied from might no longer be powered up.
Chase it down by first looking for a disconnected disk drive; that's very common. Next, use SysInternals utilities like TcpView to look for network activity, such as trying to connect to a CRL server. Use Process Explorer to find out where the program is stuck. Mark Russinovich's blog is excellent for showing the troubleshooting strategies he uses with these tools. Good luck with it.
I am developing a multi-threaded application which ran fine on my development system, which has 8 cores. When I ran it on a PC with 2 cores, I encountered some synchronization issues.
Apart from turning off hyper-threading, is there any way of limiting the number of cores an application can use, so that I can emulate single- and dual-core environments for testing and debugging?
My application is written in C++ using Visual Studio 2010.
We always test in virtual machines nowadays, since it's so easy to set up specific environments with given limitations.
For example, VMware easily allows you to limit the number of processors in use, how much memory there is, hard disk sizes, the presence of USB or floppies or printers, and all sorts of other wondrous things.
In fact, we have scripts which do all the work at the push of a button: restoring the VM to a known initial state, booting it up, installing the code over the network, running a test cycle, then moving the results to an analysis machine on the network as well.
It greatly speeds up and simplifies the testing regime.
You want the SetProcessAffinityMask function or the SetThreadAffinityMask function.
The former works on the whole process and the latter on a specific thread.
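A minimal sketch of the process-wide variant; the mask is a bit field of logical processors, so 0x3 selects the first two:

#include <windows.h>

int main()
{
    // Restrict this process to logical processors 0 and 1 to emulate a
    // dual-core machine; use 0x1 to emulate a single core.
    DWORD_PTR mask = 0x3;
    if (!SetProcessAffinityMask(GetCurrentProcess(), mask))
    {
        return 1;  // GetLastError() explains the failure
    }
    // ... run the code under test here ...
    return 0;
}

SetThreadAffinityMask takes a thread handle and a mask with the same layout, if you only want to pin individual threads.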
You can also limit the active cores via the Windows Task Manager: right-click the process name and select "Set Affinity".