System not reaching 100% CPU how to trouble shoot [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 6 years ago.
I have an application (basically a C++ application) with the following properties:
Multi-threaded.
Each thread has its own thread attributes (like stack size, etc.).
Multi-process (i.e. it runs multiple processes).
Runs on an 8-core processor.
Uses shared memory, IPC, extensive heap management (allocation/deallocation), system sleep, etc.
Now I am supposed to find the system CAPS at max CPU. The ideal way is to load the system to 100% CPU and then check the CAPS (successful) the system supports.
I know that in complex systems the CPU will be "dead" during context switches, page swaps, I/O, etc.
But my system tops out at 95% CPU (never more, irrespective of the load). So the idea is to find out which of these points really contribute to the "CPU eating" and then see if I can engineer them to reduce or eliminate the unused CPU.
Question
How do we find out which I/O, context switching, etc. is the cause of the un-conquerable 5% CPU? Is there any tool for this? I am aware of OProfile/Quantify and vmstat reports, but none of them gives this information.
There may be some operations I am not aware of that restrict the max CPU utilization. Any link or document that helps me understand in detail which operations reduce my CPU usage would be very helpful.
Edit 1:
Added some more information
a. The OS in question is SUSE10 Linux server.
b. CAPS - the average number of calls you can run on your system per second. Basically a telecommunication term, but it can be considered generic: assume your application provides a protocol implementation. How many protocol calls can you make per second?

"100% CPU" is a convenient engineering concept, not a mathematical absolute. There's no objective definition of what it means. For instance, time spent waiting on DRAM is often counted as CPU time, but time spent waiting on Flash is counted as I/O time. With my hardware hat on, I'd say that both Flash and DRAM are solid-state cell-organized memories, and could be treated the same.
So, in this case, your system is running at "100% CPU" for engineering purposes. The load is CPU-limited, and you can measure the Calls Per Second in this state.
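To get a rough idea of where the remaining few percent go, one option on Linux is to sample the per-process counters that `getrusage` exposes (context switches and page faults) before and after a load run. This is only a minimal sketch under that assumption; the struct and function names (`UsageSnapshot`, `take_snapshot`, `delta`, `print_delta`) are made up for illustration and are not from any tool mentioned above.

```cpp
#include <sys/resource.h>  // getrusage (POSIX)
#include <cassert>
#include <cstdio>

// Counters that hint at where "non-CPU" time goes (hypothetical helper type).
struct UsageSnapshot {
    long voluntary_switches;    // thread blocked (I/O, sleep, lock) and gave up the CPU
    long involuntary_switches;  // scheduler preempted the thread
    long major_faults;          // page faults that needed I/O
    long minor_faults;          // page faults served without I/O
};

// Read the counters accumulated so far by the calling process.
UsageSnapshot take_snapshot() {
    struct rusage ru{};
    const int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;  // silence "unused" warning when assertions are disabled
    return {ru.ru_nvcsw, ru.ru_nivcsw, ru.ru_majflt, ru.ru_minflt};
}

// Counter growth between two snapshots.
UsageSnapshot delta(const UsageSnapshot& before, const UsageSnapshot& after) {
    return {after.voluntary_switches - before.voluntary_switches,
            after.involuntary_switches - before.involuntary_switches,
            after.major_faults - before.major_faults,
            after.minor_faults - before.minor_faults};
}

// Print the growth in a human-readable form.
void print_delta(const UsageSnapshot& d) {
    std::printf("voluntary ctx switches:   %ld\n", d.voluntary_switches);
    std::printf("involuntary ctx switches: %ld\n", d.involuntary_switches);
    std::printf("major page faults:        %ld\n", d.major_faults);
    std::printf("minor page faults:        %ld\n", d.minor_faults);
}
```

Calling `take_snapshot()` before and after a load interval and printing `delta(before, after)` shows whether the process mostly gives up the CPU voluntarily (blocking on I/O, sleeps, locks) or is being preempted.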

Related

CPU utilization degradation over time [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have a multi-threaded process. Each thread is CPU bound (performs calculations) and also uses a lot of memory. The process starts with 100% cpu utilization according to resource monitor, but after several hours, cpu utilization starts to degrade, slowly. After 24 hours, it's on 90-95% and falling.
The question is - what should I look for, and what best-known-methods can I use to debug this?
Additional info:
I have enough RAM - most of it is unused at any given moment.
According to perfmon - memory doesn't grow (so I don't think it's leaking).
The code is a mix of .Net and native c++, with some data marshaling back and forth.
I saw this on several different machines (servers with 24 logical cores).
One thing I saw in perfmon - Modified Page List Bytes indicator increases over time as CPU utilization degrades.
Edit 1
One of the third-party libraries in use is openfst. It looks like the problem is closely related to some misuse of that library.
Specifically, I noticed that I have the following warnings:
warning LNK4087: CONSTANT keyword is obsolete; use DATA
Edit 2
Since the question is closed, and wasn't reopened, I will write my findings and how the issue was solved in the body of the question (sorry) for future users.
Turns out there is an openfst.def file that defines all the openfst FLAGS_* symbols to be used by consuming applications/dlls. I had to fix those to use the keyword "DATA" instead of "CONSTANT" (CONSTANT is obsolete because it's risky - more info: https://msdn.microsoft.com/en-us/library/aa271769(v=vs.60).aspx).
After that - no more degradation in CPU utilization was observed. No more rise in the "modified page list bytes" indicator. I suspect that it was related to the default values of the FLAGS (specifically the garbage collection flags - FLAGS_fst_default_cache_gc), which were non-deterministic because of the misuse of the CONSTANT keyword in the openfst.def file.
Conclusion: Understand your warnings! Eliminate as many of them as you can!
Thanks.

For a non-obvious issue like this, you should also use a profiler that actually samples the underlying hardware counters in the CPU. Most profilers that I’m familiar with use kernel supplied statistics and not the underlying HW counters. This is especially true in Windows. (The reason is in part legacy, and in part that Windows wants its kernel statistics to be independent of hardware. PAPI APIs attempt to address this but are still relatively new.)
One of the best profilers is Intel’s VTune. Yes, I work for Intel but the internal HPC people use VTune as well. Unfortunately, it costs. If you’re a student, there are discounts. If not, there is a trial period.
You can find a lot of optimization and performance issue diagnosis information at software.intel.com. Here are pointers for optimization and for profiling. Even if you are not using an x86 architecture, the techniques are still valid.
As to what might be the issue, a degradation that slow is strange.
How often do you use new memory or access old? At what rate? If the rate is very slow, you might still be running into a situation where you are slowly using up a resource, e.g. pages.
What are your memory access patterns? Does it change over time? How rapidly? Perhaps your memory access patterns over time are spreading, resulting in more cache misses.
Perhaps your partitioning of the problem space is such that you have entered a new computational domain and there is no real pathology.
Look at whether there are periodic maintenance activities that take place over a longer interval, though this would result in a periodic degradation, say every 24 hours. This doesn’t sound like your situation, since what you are experiencing is a gradual degradation.
If you are using an x86 architecture, consider submitting a question in an Intel forum (e.g. "Intel® Clusters and HPC Technology" and "Software Tuning, Performance Optimization & Platform Monitoring").
Let us know what you ultimately find out.

How to decide whether to use hyperthreading or not? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Hyperthreading can hurt the performance of some applications, in which case it should not be used. From the Microsoft website (https://msdn.microsoft.com/en-us/library/cc615012%28BTS.10%29.aspx):
It is critical hyper-threading be turned off for BizTalk Server
computers. Hyper-threading makes the server appear to have more
processors/processor cores than it actually does; however
hyper-threaded processors typically provide between 20 and 30% of the
performance of a physical processor/processor core. When BizTalk
Server counts the number of processors to adjust its self-tuning
algorithms; the hyper-threaded processors cause these adjustments to
be skewed which is detrimental to overall performance.
The Process Lasso program allows disabling hyperthreading for some processes:
You can use programs like Process Lasso (free) to set default CPU
affinities for critical processes, so that their threads never get
allocated to logical cores. We call this feature HyperThreaded Core
Avoidance.
I've got some older programs which perform a lot of mathematical computations. It is frustrating to see them use one core when they could use 4. I want to rewrite them to use many threads. They use large contiguous memory blocks, so the number of cache misses is minimal. My questions are the following:
How do I decide whether to use hyperthreading or not in my application? (General guidance with some technical details if necessary.)
Does it come down to performing experiments to make the final decision?
How do I avoid hyperthreading in my application if it is not advantageous? (Examples in C++ and C.)

I don't know how Process Lasso works wrt "disabling HyperThreading". For that particular app, the best it can do is to inject a DLL into every process in the system and call SetProcessAffinityMask with something that only amounts to a guess (disabling every other core), in the hope that the OS will avoid scheduling onto the hyperthreaded logical cores.
Guesses and hopes: there's nothing in the Windows API that will do this for certain. This answers your third bullet point.
You can disable HyperThreading at the BIOS level (usually).
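To make the "guess" concrete, here is a minimal sketch of building an affinity mask that keeps only every other logical processor. It assumes, and this is exactly the unreliable part, that logical processors 2k and 2k+1 are the two hyperthreads of one physical core. The function name `every_other_cpu_mask` is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Build a bit mask selecting logical processors 0, 2, 4, ...
// ASSUMPTION (a guess, not guaranteed by the OS): processors 2k and 2k+1
// are hyperthread siblings on the same physical core.
std::uint64_t every_other_cpu_mask(unsigned logical_cpus) {
    assert(logical_cpus <= 64);  // one bit per logical processor
    std::uint64_t mask = 0;
    for (unsigned cpu = 0; cpu < logical_cpus; cpu += 2) {
        mask |= std::uint64_t{1} << cpu;
    }
    return mask;
}
```

On Windows, such a value could be passed as the second argument of `SetProcessAffinityMask(GetCurrentProcess(), mask)`. Whether that actually avoids sharing a core depends entirely on the numbering assumption above, so it remains a hint rather than a guarantee.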
I can't comment on the Microsoft advice of disabling HT for BizTalk, your linked article, since I can't find a date for this article. The only interesting bit was about "Assigning interrupt affinity to logical processors...", new to me. The only other advice in that article regarding HT is rather weak.
On a larger note: I don't know why you're asking about HyperThreading, when you should be concerned with multithreading in general. If you're concerned about multiple threads contending for the same shared resource... then don't use threads in your app.
A humorous aside: the same company also sells a product called SmartTrim, reminiscent of the RAM-doublers that were popular in the '90s.

Basically, it comes down to configuring the number of concurrent threads executing CPU workloads. The OS is aware of hyperthreading, and will assign threads to physical cores until it runs out, and only if there are more threads than physical cores will it start assigning work to logical cores.
To decide whether the optimal number of threads is the number of physical or logical cores, measuring performance of your real tasks is the best approach. Synthetic benchmarks can teach you something about how hyperthreading works, but won't tell you what is best for your particular mix of instructions.
The exact way to control number of threads depends on the multithreading construct you use -- if you create threads yourself, it is obvious, but threadpool and automated parallelism frameworks such as OpenMP provide ways to tune thread count also.
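If you create the threads yourself, the experiment could look roughly like the sketch below: the same CPU-bound job is split across a configurable number of `std::thread` workers and timed. The workload (`sum_squares`) and the helper names are placeholders invented for this example; substitute your real computation.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Placeholder CPU-bound task: sum of i*i over [begin, end), modulo 2^64.
std::uint64_t sum_squares(std::uint64_t begin, std::uint64_t end) {
    std::uint64_t total = 0;
    for (std::uint64_t i = begin; i < end; ++i) {
        total += i * i;
    }
    return total;
}

// Split [0, n) into contiguous chunks, one per thread, and add up the results.
std::uint64_t parallel_sum_squares(std::uint64_t n, unsigned thread_count) {
    thread_count = std::max(1u, thread_count);  // treat 0 as "one thread"
    std::vector<std::uint64_t> partial(thread_count, 0);
    std::vector<std::thread> workers;
    const std::uint64_t chunk = n / thread_count;
    for (unsigned t = 0; t < thread_count; ++t) {
        const std::uint64_t begin = t * chunk;
        // The last thread also takes the remainder.
        const std::uint64_t end = (t + 1 == thread_count) ? n : begin + chunk;
        workers.emplace_back([&partial, t, begin, end] {
            partial[t] = sum_squares(begin, end);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    std::uint64_t total = 0;
    for (std::uint64_t p : partial) {
        total += p;
    }
    return total;
}

// Wall-clock seconds for one run with the given thread count.
double time_run(std::uint64_t n, unsigned thread_count) {
    const auto start = std::chrono::steady_clock::now();
    volatile std::uint64_t sink = parallel_sum_squares(n, thread_count);
    (void)sink;  // keep the compiler from discarding the computation
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

`std::thread::hardware_concurrency()` gives a hint for the number of hardware threads (it may return 0 if unknown), which typically corresponds to logical cores. Timing the run with that value and with half of it is one simple way to compare "logical" against "physical" thread counts on your own workload.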

How is a process able to acquire many CPU cycles? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 7 years ago.
I noticed that VLC media player at times acquired up to 98% of the CPU when performing a file conversion from MP4 to MP3. My understanding is that the OS tries to balance the time each process gets, so this captured my attention. I have a feeling that programs like disk defragmenters and antivirus may also require processor cycles on such a magnitude. How is this achieved in code (C, C++)?

It depends on the OS, but balancing the time each process gets is usually not the scheduler's prime objective.
A smart scheduler will instead utilise the available CPU(s) while still being responsive to higher-priority things like user input and hardware events. A well-behaved thread will also give up its time slice before its CPU quota is used up if there is no more work to do (e.g. when blocking on an event); otherwise, when the quota expires, the scheduler may take over the CPU (preempt) and give another thread a chance to execute.
You may set the thread priority as a hint to the scheduler, which may affect the takeover condition, but it all depends on the scheduler and OS internals.
Simply put, you don't need to do anything special to utilise a CPU core: if you have an intensive calculation, the OS gives you as much CPU as it can.
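The difference between a thread that keeps computing and one that blocks can be seen by comparing consumed CPU time with elapsed wall-clock time. The following is a small sketch, assuming a POSIX-like system where `std::clock()` reports processor time used by the process (on some platforms it reports wall time instead). The names `Timing`, `measure`, `busy_for` and `block_for` are made up for this example.

```cpp
#include <chrono>
#include <cstdint>
#include <ctime>
#include <thread>

// CPU time consumed vs. wall-clock time elapsed while running a task.
struct Timing {
    double cpu_seconds;
    double wall_seconds;
};

// Run `task` once and report both kinds of time.
template <typename Task>
Timing measure(Task task) {
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();
    task();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();
    return {static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC,
            std::chrono::duration<double>(wall_end - wall_start).count()};
}

// CPU-bound: always has work to do, so the scheduler keeps running it
// until it is preempted.
void busy_for(std::chrono::milliseconds duration) {
    const auto deadline = std::chrono::steady_clock::now() + duration;
    volatile std::uint64_t counter = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        counter = counter + 1;
    }
}

// Blocking: gives up the CPU voluntarily until the time has passed.
void block_for(std::chrono::milliseconds duration) {
    std::this_thread::sleep_for(duration);
}
```

On an otherwise idle machine, `measure` applied to `busy_for` should report CPU time close to wall time (near 100% of one core), while `block_for` should report almost no CPU time. No special API call is involved in either case.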

OS task scheduling emulator [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I'm trying to find a C or C++ library which can work with tasks (or threads) in a preemptive way. I need a mechanism which can manage tasks one by one like in an RTOS: creating tasks (with a function as the entry point to a task), timeslicing, switching, etc.
Is it possible to write it in user space?

The simplest solution is perhaps to run a real RTOS in a virtual machine or processor emulator. Any RTOS with an x86 port might be persuaded to run in a PC VM, but you could also use QEMU.
For example, you can run RTEMS on QEMU, and QEMU itself can emulate ARM targets - though that may not matter, and the i386 emulation may suit your needs and will be faster.

RTOS scheduling/dispatching to handle threads in an efficient manner requires hardware interrupts to communicate effectively with peripheral hardware, (KB, mouse, disk, NIC, timer etc). Standard C has no means of handling interrupts, so you cannot do it.
If you have memory-management hardware that defines separate user and kernel memory access rights, then no - a hardware interrupt will change state in hardware and so you will leave user space whether you want to or not.
You should be aware that preemptive schedulers are not primarily designed to switch between tasks that need CPU upon a timer interrupt - they are designed first to provide efficient, high-performance I/O by removing the CPU from tasks that don't need it because their I/O requests cannot be met immediately.

C/C++ timer, like Swiss watches [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Can someone provide a list of timers for C/C++ that provide god-level accuracy?
If for example I take 100 computers and start the program at the same microsecond in all of them, I want the timers to display the same time on all computers (with different CPUs and different CPU loads) after a year of continuously running.
Platform: Linux
Accuracy: 1 second, but the timer must be EXACTLY 1 second, not 1 second and 1/1000000. This extra 1/1000000 is not acceptable. In other words, in one year of running, not even a second of lost accuracy is acceptable.
The timer must not need extra hardware.
Q1: What's the best timer mankind has made that is also free (chrono, setitimer, one of the many boost timers, or something else)?
Q2: Using this best timer, what kind of accuracy can I expect when using an Ivy Bridge CPU?

The best timing accuracy you can get with your program is synchronizing with an atomic clock device, like the USNO Master Clock.
/sarcasm off
To give a few hints:
The C++ standard doesn't guarantee anything beyond millisecond accuracy, and even that might end up with tenths of ms of jitter (depends on the OS).
Your hardware timers might provide better accuracy, but your drivers/applications might still introduce unwanted latencies.
If you're really going to get nearly as precise as your requirements demand, don't forget to compensate for relativistic effects like the altitude, speed, etc. of the measurement equipment.
Q2: Using this best timer, what kind of accuracy I can expect, when using an Ivy Bridge CPU?
Multiples of nanoseconds I'd guess, if done right (forget about that atomic clock joke when going to this direction).
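One thing that can be controlled in code is whether the program's own scheduling error accumulates. A common approach is to compute each tick's deadline from the fixed start time and sleep until that absolute deadline, instead of sleeping for "one period" after each tick. Below is a minimal sketch with `std::chrono::steady_clock`; the function names `tick_deadline` and `run_ticks` are made up for illustration.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Deadline of tick number `n` (1-based), derived from the fixed start time,
// so a late wake-up on one tick does not shift the following ticks.
Clock::time_point tick_deadline(Clock::time_point start,
                                Clock::duration period,
                                unsigned n) {
    return start + period * n;
}

// Wait for `ticks` periods and return the total elapsed time.
Clock::duration run_ticks(Clock::duration period, unsigned ticks) {
    const Clock::time_point start = Clock::now();
    for (unsigned n = 1; n <= ticks; ++n) {
        std::this_thread::sleep_until(tick_deadline(start, period, n));
        // ... periodic work would go here ...
    }
    return Clock::now() - start;
}
```

Each individual wake-up can still be late by whatever latency the OS introduces, and this does nothing about the hardware oscillators of different machines drifting apart. Keeping 100 computers in agreement for a year would still need an external time reference.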
