Can someone provide a list of timers for C/C++ that provide god-level accuracy?
If, for example, I take 100 computers and start the program at the same microsecond on all of them, I want the timers to display the same time on all computers (with different CPUs and different CPU loads) after a year of continuous running.
Platform: Linux
Accuracy: 1 second, but the timer must be EXACTLY 1 second, not 1 second plus 1/1000000. This extra 1/1000000 is not acceptable. In other words, in one year of running, not even a second of lost accuracy is acceptable.
The timer must not need extra hardware.
Q1: What's the best free timer mankind has made (chrono, setitimer, one of the many Boost timers, or something else)?
Q2: Using this best timer, what kind of accuracy can I expect on an Ivy Bridge CPU?
The best timing accuracy you can get with your program is synchronizing with an atomic clock device, like the USNO Master Clock.
/sarcasm off
To give a few hints:
The C++ standard doesn't guarantee anything beyond millisecond accuracy, and even that might end up with tenths-of-a-millisecond jitter (depends on the OS).
Your hardware timers might provide better accuracy, but your drivers/applications may still introduce unwanted latencies.
If you really want to get near that level of precision, don't forget to compensate for relativistic effects such as the altitude and speed of the measurement equipment.
Q2: Using this best timer, what kind of accuracy can I expect on an Ivy Bridge CPU?
Multiples of nanoseconds, I'd guess, if done right (and forget about the atomic clock joke if you go in that direction).
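For reference, a minimal sketch of what the standard library itself exposes, assuming any C++11 compiler: it prints the nominal tick period and steadiness of std::chrono::steady_clock and compares its elapsed time against the wall clock over a short interval. Nothing here fixes long-term drift across machines; keeping 100 computers within a second per year of each other requires external synchronization (e.g. NTP/PTP against a common reference), not a better local timer.

    #include <chrono>
    #include <iostream>
    #include <thread>

    int main() {
        using steady = std::chrono::steady_clock;
        using wall   = std::chrono::system_clock;

        // Nominal tick period (in seconds) and steadiness of the monotonic clock.
        std::cout << "steady_clock period: "
                  << static_cast<double>(steady::period::num) / steady::period::den
                  << " s, is_steady = " << steady::is_steady << '\n';

        // Compare elapsed time on the monotonic clock against the wall clock.
        auto s0 = steady::now();
        auto w0 = wall::now();
        std::this_thread::sleep_for(std::chrono::seconds(10));
        auto s1 = steady::now();
        auto w1 = wall::now();

        auto steady_us = std::chrono::duration_cast<std::chrono::microseconds>(s1 - s0).count();
        auto wall_us   = std::chrono::duration_cast<std::chrono::microseconds>(w1 - w0).count();
        std::cout << "steady: " << steady_us << " us, wall: " << wall_us
                  << " us, difference: " << (steady_us - wall_us) << " us\n";
    }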
I have a multi-threaded process. Each thread is CPU-bound (performs calculations) and also uses a lot of memory. The process starts at 100% CPU utilization according to Resource Monitor, but after several hours CPU utilization starts to degrade, slowly. After 24 hours it's at 90-95% and falling.
The question is - what should I look for, and what best-known-methods can I use to debug this?
Additional info:
I have enough RAM - most of it is unused at any given moment.
According to perfmon - memory doesn't grow (so I don't think it's leaking).
The code is a mix of .NET and native C++, with some data marshaling back and forth.
I saw this on several different machines (servers with 24 logical cores).
One thing I saw in perfmon - Modified Page List Bytes indicator increases over time as CPU utilization degrades.
Edit 1
One of the third-party libraries used is OpenFst. It looks like the issue is closely related to a misuse of that library.
Specifically, I noticed that I have the following warnings:
warning LNK4087: CONSTANT keyword is obsolete; use DATA
Edit 2
Since the question is closed and wasn't reopened, I will write my findings and how the issue was solved in the body of the question (sorry) for future users.
It turns out there is an openfst.def file that defines all the OpenFst FLAGS_* symbols to be used by consuming applications/DLLs. I had to fix those to use the keyword "DATA" instead of "CONSTANT" (CONSTANT is obsolete because it's risky; more info: https://msdn.microsoft.com/en-us/library/aa271769(v=vs.60).aspx).
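For illustration only, this is roughly what the relevant line of such a .def file looks like after the fix; the surrounding exports are omitted and the exact symbol list depends on the OpenFst version:

    EXPORTS
        FLAGS_fst_default_cache_gc    DATA    ; previously exported with the obsolete CONSTANT keyword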
After that, no more degradation in CPU utilization was observed, and no more rise in the "Modified Page List Bytes" indicator. I suspect it was related to the default values of the FLAGS (specifically the garbage collection flag FLAGS_fst_default_cache_gc), which were non-deterministic because of the misuse of the CONSTANT keyword in the openfst.def file.
Conclusion: Understand your warnings! Eliminate as many of them as you can!
Thanks.
For a non-obvious issue like this, you should also use a profiler that actually samples the underlying hardware counters in the CPU. Most profilers that I’m familiar with use kernel-supplied statistics rather than the underlying hardware counters. This is especially true on Windows. (The reason is in part legacy, and in part that Windows wants its kernel statistics to be independent of hardware. The PAPI APIs attempt to address this but are still relatively new.)
One of the best profilers is Intel’s VTune. Yes, I work for Intel but the internal HPC people use VTune as well. Unfortunately, it costs. If you’re a student, there are discounts. If not, there is a trial period.
You can find a lot of optimization and performance issue diagnosis information at software.intel.com. Here are pointers for optimization and for profiling. Even if you are not using an x86 architecture, the techniques are still valid.
As to what might be the issue, a degradation that slow is strange.
How often do you use new memory or access old memory? At what rate? If the rate is very slow, you might still be running into a situation where you are slowly using up a resource, e.g. pages.
What are your memory access patterns? Do they change over time? How rapidly? Perhaps your memory access patterns are spreading over time, resulting in more cache misses (a small sketch of this effect follows after these points).
Perhaps your partitioning of the problem space is such that you have entered a new computational domain and there is no real pathology.
Look at whether there are periodic maintenance activities that take place over a longer interval, though this would result in a periodic degradation, say every 24 hours. That doesn’t sound like your situation, since what you are experiencing is a gradual degradation.
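As a rough illustration of how much access patterns alone can matter, here is a toy sketch (not related to the poster's code) that traverses the same buffer sequentially and then with a large stride; the strided pass is typically several times slower purely because of cache misses:

    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Touch every element of the buffer exactly once, visiting it with the given stride.
    static long long traverse(const std::vector<int>& v, std::size_t step) {
        auto t0 = std::chrono::steady_clock::now();
        long long sum = 0;
        for (std::size_t start = 0; start < step; ++start)
            for (std::size_t i = start; i < v.size(); i += step)
                sum += v[i];
        auto t1 = std::chrono::steady_clock::now();
        std::cout << "stride " << step << ": "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
                  << " ms (checksum " << sum << ")\n";
        return sum;
    }

    int main() {
        std::vector<int> v(64 * 1024 * 1024, 1);   // 256 MB of ints, far larger than any cache
        traverse(v, 1);                            // sequential: cache- and prefetcher-friendly
        traverse(v, 4096);                         // 16 KB stride: almost every access misses
    }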
If you are using an x86 architecture, consider submitting a question in an Intel forum (e.g. "Intel® Clusters and HPC Technology" and "Software Tuning, Performance Optimization & Platform Monitoring").
Let us know what you ultimately find out.
I noticed that VLC media player at times used up to 98% of the CPU when performing a file conversion from MP4 to MP3. My understanding is that the OS tries to balance the time each process gets, so this caught my attention. I have a feeling that programs like disk defragmenters and antivirus software may also require processor cycles on such a magnitude. How is it achieved in code (C, C++)?
It depends on the OS, but "the OS tries to balance the time each process gets" is usually not the prime objective.
A smart scheduler will instead utilise the available CPU(s) while still being responsive to higher-priority things like user input and hardware events. A well-behaved thread will also give up its time slice before its CPU quota expires if there is no more work to do (e.g. by blocking on an event); otherwise, when the quota expires, the scheduler may take the CPU away (preempt the thread) and give another thread a chance to execute.
You may set the thread priority as a hint to the scheduler; that may affect the preemption behaviour, but it all depends on the scheduler and OS internals.
Simply put, you don't need to do anything special to utilise a CPU core: if you have an intensive calculation, the OS gives most of the CPU to you.
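To make that concrete, a minimal sketch using standard C++11 threads: a compute-bound loop per logical core will show up as near-100% CPU in a task manager without any special API calls.

    #include <atomic>
    #include <chrono>
    #include <thread>
    #include <vector>

    int main() {
        std::atomic<bool> stop{false};
        unsigned n = std::thread::hardware_concurrency();       // one worker per logical core
        if (n == 0) n = 1;
        std::vector<std::thread> workers;

        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back([&stop] {
                volatile double x = 1.0;
                while (!stop)
                    x = x * 1.0000001 + 1e-9;                    // pure computation, never blocks
            });

        std::this_thread::sleep_for(std::chrono::seconds(30));  // watch the CPU meter meanwhile
        stop = true;
        for (auto& t : workers) t.join();
    }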
I wish to run a very simple function a lot of times.
At first I thought about inlining the function (it's only four lines long), so I figured that placing it in the header would do that automatically. gprof said that was a good idea. However, I heard that pixel shaders are optimized for that purpose. I was wondering if this is true? I have a simple function that takes 6 numbers and I wish to run it N times. Would a pixel shader speed things up?
Maybe a GPU could speed up your function, maybe not. It depends vastly on the function. GPUs are good at parallel execution. While a consumer-grade x86 CPU has 8 cores at most, graphics cards can execute a lot more calculations in parallel. But the bottleneck is often the transfer of data between GPU RAM and system RAM. When your function isn't actually that computationally expensive, that overhead might overshadow it.
In the end you can just try yourself, measure it, and see for yourself which is faster.
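A rough way to get that baseline before touching any GPU API, as a sketch; the function body and N below are placeholders for the real ones, and offloading is only interesting if this CPU time is large compared with the cost of copying 6*N inputs to the device and the results back:

    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Stand-in for the real 4-line function: takes 6 numbers, returns one result.
    static inline float f(float a, float b, float c, float d, float e, float g) {
        return a * b + c * d + e * g;
    }

    int main() {
        const std::size_t N = 10000000;                  // use your real N here
        std::vector<float> in(6 * N, 1.5f), out(N);

        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < N; ++i) {
            const float* p = &in[6 * i];
            out[i] = f(p[0], p[1], p[2], p[3], p[4], p[5]);
        }
        auto t1 = std::chrono::steady_clock::now();

        std::cout << "CPU: "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
                  << " ms for " << N << " calls (sample " << out[N / 2] << ")\n";
    }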
You might want to take a look at OpenCL, the most widely supported standard for moving computation to the graphics card.
If you are living in Windows-land, there is also DirectCompute, which is part of DirectX, and the C++ Accelerated Massive Parallelism (C++ AMP) extension. There is also CUDA, but it only supports NVIDIA GPUs.
I know that the default is 15.6 ms per tick, but some loser may change it and then change it back, again and again, and I need to poll the current value to perform valid QueryPerformanceCounter synchronization.
So is there an API way to get the timer resolution?
I'm on C++ BTW.
Windows timer resolution is provided by the hidden API call:
NTSTATUS NtQueryTimerResolution(OUT PULONG MinimumResolution,
OUT PULONG MaximumResolution,
OUT PULONG ActualResolution);
NtQueryTimerResolution is exported by the native Windows NT library NTDLL.DLL.
Common hardware platforms report 156,250 or 100,144 for ActualResolution; older platforms may report even larger numbers; newer systems, particularly when HPET (High Precision Event Timer) or a constant/invariant TSC is supported, may return 156,001 for ActualResolution.
Calls to timeBeginPeriod(n) are reflected in ActualResolution.
More details in this answer.
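Since NtQueryTimerResolution is not declared in the usual SDK headers, it is typically resolved from ntdll.dll at run time; a minimal sketch (all values are in units of 100 ns):

    #include <windows.h>
    #include <iostream>

    // Not in the SDK headers; resolve it from ntdll.dll at run time.
    typedef LONG (NTAPI *NtQueryTimerResolution_t)(PULONG MinimumResolution,
                                                   PULONG MaximumResolution,
                                                   PULONG ActualResolution);

    int main() {
        HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
        auto query = (NtQueryTimerResolution_t)GetProcAddress(ntdll, "NtQueryTimerResolution");
        if (!query) return 1;

        ULONG min = 0, max = 0, actual = 0;
        if (query(&min, &max, &actual) == 0) {     // 0 == STATUS_SUCCESS
            // Units are 100 ns, so 156250 corresponds to 15.625 ms.
            std::cout << "min "    << min    / 10000.0 << " ms, "
                      << "max "    << max    / 10000.0 << " ms, "
                      << "actual " << actual / 10000.0 << " ms\n";
        }
    }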
This won't be helpful; another process can change it while you are calibrating.
This falls in the "if you can't beat them, join them" category. Call timeBeginPeriod(1) before you start calibrating. This ensures that you have a known rate that nobody can change. Getting the improved timer accuracy surely doesn't hurt either.
Do note that it is pretty unlikely that you can do better than QueryPerformanceFrequency(). Unless you calibrate for a very long time, the clock rate just isn't high enough to give you extra accuracy since you can never measure better than +/- 0.5 msec. And the timer event isn't delivered with millisecond accuracy, it can be arbitrarily delayed. If you calibrate over long periods then GetTickCount64() is plenty good enough.
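A minimal sketch of that approach (link against winmm.lib); the request stays in effect until the matching timeEndPeriod call or process exit:

    #include <windows.h>
    #include <mmsystem.h>                 // timeBeginPeriod / timeEndPeriod
    #pragma comment(lib, "winmm.lib")     // MSVC: link against winmm

    int main() {
        timeBeginPeriod(1);               // request 1 ms timer resolution
        // ... calibration / timing-sensitive work goes here ...
        Sleep(1);                         // now wakes after ~1-2 ms instead of ~15.6 ms
        timeEndPeriod(1);                 // always pair with the same value
    }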
The RDTSC instruction may be used to read the CPU time-stamp counter.
In most cases (if not all), this counter will change at the CPU clock rate.
If you want to be picky, you can also use an instruction like CPUID to serialize instructions.
Refer to the Intel manuals for more details.
You can use RDTSC together with APIs like QueryPerformanceCounter, et al.
In other words, use RDTSC before and after a call to make measurements.
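As a sketch of that kind of measurement using compiler intrinsics (MSVC headers shown; GCC/Clang provide __rdtsc in <x86intrin.h>); keep in mind the result is in TSC cycles, not seconds, and only CPUs with an invariant TSC tick at a constant rate:

    #include <windows.h>
    #include <intrin.h>                   // __cpuid, __rdtsc
    #include <iostream>

    int main() {
        int regs[4];

        __cpuid(regs, 0);                 // serialize: wait for earlier instructions to retire
        unsigned long long c0 = __rdtsc();

        LARGE_INTEGER qpc;
        QueryPerformanceCounter(&qpc);    // the call being measured

        __cpuid(regs, 0);                 // serialize again before the second read
        unsigned long long c1 = __rdtsc();

        // Note: the second CPUID's own cost is included in the measurement.
        std::cout << "QueryPerformanceCounter took about " << (c1 - c0) << " TSC cycles\n";
    }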
There is also the WINAPI function GetSystemTimeAdjustment, which reports the interval between periodic clock interrupts.
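A small sketch of calling it; as with NtQueryTimerResolution, the reported values are in units of 100 ns:

    #include <windows.h>
    #include <iostream>

    int main() {
        DWORD adjustment = 0, increment = 0;
        BOOL  disabled   = FALSE;
        if (GetSystemTimeAdjustment(&adjustment, &increment, &disabled)) {
            // 'increment' is the interval between periodic clock interrupts, in 100-ns units.
            std::cout << "clock interrupt every " << increment / 10000.0 << " ms, "
                      << "adjustment " << adjustment << ", "
                      << "adjustment disabled: " << (disabled ? "yes" : "no") << '\n';
        }
    }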
I have an application (basically a C++ application) which has the properties below:
Multi-threaded.
Each thread has its own thread attributes (like stack size, etc.).
Multi-process (i.e. it will run multiple processes).
Runs on an 8-core processor.
Uses shared memory/IPC, extensive heap management (allocation/deallocation), system sleep, etc.
So now, I am supposed to find the system's CAPS at max CPU. The ideal way is to load the system to 100% CPU and then check the CAPS (successful calls) the system supports.
I know, that in complex systems, CPU will be "dead" for context switches, page swaps, I/O etc.
But my system is able to run at 95% CPU at most (no more than that, irrespective of the load). So the idea here is to find out which of these points is really contributing to the "CPU eating" and then see if I can engineer them to reduce or eliminate the unused CPU.
Question
How do we find out which I/O, context switching, etc. is the cause of the unconquerable 5% of CPU? Is there any tool for this? I am aware of OProfile/Quantify and vmstat reports, but none of them gives this information.
There may be some operations I am not aware of which restrict the max CPU utilization. Any link/document that can help me understand the detailed set of operations that reduce my CPU usage would be very helpful.
Edit 1:
Added some more information
a. The OS in question is a SUSE 10 Linux server.
b. CAPS - the average number of calls you can run on your system per second. It is basically a telecommunications term, but it can be considered generic: assume your application provides a protocol implementation; how many protocol calls can you make per second?
"100% CPU" is a convenient engineering concept, not a mathematical absolute. There's no objective definition of what it means. For instance, time spent waiting on DRAM is often counted as CPU time, but time spent waiting on Flash is counted as I/O time. With my hardware hat on, I'd say that both Flash and DRAM are solid-state cell-organized memories, and could be treated the same.
So, in this case, your system is running at "100% CPU" for engineering purposes. The load is CPU-limited, and you can measure the Calls Per Second in this state.
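As a first pass at attributing those last few percent on Linux, the kernel's own accounting in /proc/stat already splits CPU time into user, system, idle, iowait, irq, softirq and steal; a small sketch that samples the aggregate line twice (field order as documented in proc(5)):

    #include <chrono>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <thread>
    #include <vector>

    // Read the aggregate "cpu" line from /proc/stat: user nice system idle iowait irq softirq steal ...
    static std::vector<long long> read_cpu_line() {
        std::ifstream f("/proc/stat");
        std::string cpu;
        f >> cpu;                                  // the literal token "cpu"
        std::vector<long long> v;
        long long x;
        while (v.size() < 8 && f >> x) v.push_back(x);
        return v;
    }

    int main() {
        const char* names[] = {"user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"};
        auto a = read_cpu_line();
        std::this_thread::sleep_for(std::chrono::seconds(5));
        auto b = read_cpu_line();

        long long total = 0;
        for (std::size_t i = 0; i < a.size(); ++i) total += b[i] - a[i];
        for (std::size_t i = 0; i < a.size(); ++i)
            std::cout << names[i] << ": " << 100.0 * (b[i] - a[i]) / total << "%\n";
    }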