This is a sequel to my previous question. I am using fork() to create a child process. Inside the child, I run a command as follows:
if ((childpid = fork()) == 0)
{
    system("./runBinary");
    exit(1);
}
My runBinary measures how much time it takes from start to finish.
What amazes me is that when I run runBinary directly on the command line, it takes ~60 seconds, but when I run it as a child process it takes longer, ~75 seconds or more. Is there something I am doing wrong that could be causing this, or something I can do about it?
Thanks in advance for the help.
MORE DETAILS: I am running on a Linux RHEL server with 24 cores, and I am measuring CPU time. I only fork 8 children at a time (sequentially), each of which is bound to a different core using taskset (not shown in the code). The system is not loaded except for my own program.
The system() function invokes the shell. You can do anything inside it, including running a script. This gives you a lot of flexibility, but it comes with a price: you're loading a shell, and then runBinary inside it. Although I don't think loading the shell would be responsible for so much of the time difference (15 seconds is a lot, after all), since it doesn't seem you need that - you just want to run the app - try using something from the exec() family instead.
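For example, a minimal sketch of the fork()/exec()/wait() pattern (assuming runBinary takes no arguments; this is an illustration, not your exact code):

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

int main()
{
    pid_t childpid = fork();
    if (childpid == 0)
    {
        // Child: replace this process image with runBinary directly,
        // without the intermediate /bin/sh that system() would start.
        execl("./runBinary", "./runBinary", (char *)NULL);
        perror("execl");   // reached only if execl itself fails
        _exit(1);
    }
    // Parent: wait for the child so its exit status is collected.
    int status = 0;
    waitpid(childpid, &status, 0);
    return 0;
}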
Without profiling the application it is hard to say for certain, but if the parent process that forks has a large memory space, you might find that time is spent in the fork() itself, duplicating (the page tables for) that memory space.
This isn't a problem in Red Hat Enterprise Linux 6, but was in earlier versions of Red Hat Enterprise Linux 5.
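If fork() overhead does turn out to be the issue, posix_spawn() is one way to avoid duplicating the parent's address space (most implementations use vfork()-like semantics under the hood). A minimal sketch, again assuming runBinary takes no arguments:

#include <spawn.h>
#include <sys/wait.h>
#include <cstdio>

extern char **environ;   // environment passed on to the child

int main()
{
    pid_t pid;
    char *const argv[] = { (char *)"./runBinary", NULL };

    // posix_spawn starts the child without copying the parent's memory image.
    int rc = posix_spawn(&pid, "./runBinary", NULL, NULL, argv, environ);
    if (rc != 0)
    {
        fprintf(stderr, "posix_spawn failed: %d\n", rc);
        return 1;
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return 0;
}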
I have been working for 2.5 years on a personal flight sim project in my leisure time, written in C++ and using OpenGL on a Windows 7 PC.
I recently had to move to Windows 10. The hardware is exactly the same. I reinstalled Code::Blocks.
It turns out that on the first launch of my project after the system starts, performance is OK, similar to what I used to see with Windows 7. But the second, third, and all subsequent launches give me lower performance, with noticeably less fluidity in the frame rate compared to the first run, detectable by eye. This never happened with Windows 7.
Any time I start my system, the first run is fast and the next ones are slower.
I had a look at the Task Manager while doing some runs. The first run is handled by one of the 4 cores of my CPU (Core i5-6500) at approximately 85%. For the next runs, the load is spread across the 4 cores. During those slower runs on 4 cores, I tried to modify the affinity and direct my program to only one core, without significant improvement in performance. The selected core was working at full load, though.
My C++ code doesn't explicitly use any threading functions at this stage. From my modest programmer's point of view, there is only one main thread running in main(). In the Task Manager, I can see that some 10 to 14 threads are alive when my program runs. I guess (wrongly?) that they are implicitly created by the use of joysticks, TrackIR, or other communication tasks with the GPU...
Could it come from memory not being correctly freed when my program stops? I thought Windows would free it properly, even if I forgot some 'delete' after using 'new'.
Has anyone encountered a similar situation? Any explanations that come to mind?
Any suggestions to better understand this? Obviously, my ultimate goal is to have a consistent performance level regardless of the number of launches.
[Screenshots of the first and second runs, as viewed in Task Manager, were attached here.]
Well, I ran into problems when switching to Win10 for clients at my work too. Here are a few I encountered, all because Windows 10 has changed the scheduling of processes, creating a lot of issues like:
non-blocking thread synchronization techniques from older Windows versions no longer working
A well-placed Sleep() sometimes helps. By the way, similar problems were encountered when switching from Windows 2000 to Windows XP.
huge slowdowns and frequent freezes of a few seconds in older single-threaded apps
Usually setting the affinity to a single core solves this. You can also do this in Task Manager just to check, and if it helps you can do it in code too. Here is an example of how to do it with WinAPI (see the sketch after this list):
Cache size estimation on your system?
messed-up driver timings causing zombie processes, even total freezes and/or BSODs
I deal with USB in my work and it's sometimes a nightmare on Win10. On top of all this, Win10 tends to force the wrong drivers onto devices (like graphics cards, custom USB systems, etc.).
apps automatically frozen or closed if they do not respond to their WndProc in time
In Windows 10 the timeout is much, much smaller than in older versions. If that is your case, you can try running in compatibility mode (set in the icon properties on the desktop) for an older Windows (however, that does not work for #1 and #2), or change the app's code to speed up the response. For example, in VCL you can call ProcessMessages from inside the blocking code to remedy this... Or you can use threads for the heavy lifting... just be careful with rendering and using WinAPI, as accessing some WinAPI functions (any window/visual-related stuff) from outside the main thread causes havoc...
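Here is the affinity sketch promised above: only a minimal illustration that pins the calling process to core 0 (mask 0x1); in a real app you would pick the mask for the core you want.

#include <windows.h>

int main()
{
    // GetCurrentProcess() returns a pseudo-handle; no CloseHandle() needed.
    HANDLE hProcess = GetCurrentProcess();
    DWORD_PTR mask = 0x1;   // bit 0 set -> run on CPU 0 only
    if (!SetProcessAffinityMask(hProcess, mask))
    {
        // The call failed; GetLastError() tells why.
        return 1;
    }
    // ... the rest of the app now runs on a single core ...
    return 0;
}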
On top of all this, old IDEs (especially for MCUs) don't work properly anymore, and new ones are usually much worse to work with (or unusable because of a lack of functionality that was present in older versions), so I have stayed faithful to Windows 7 for development purposes.
If none of the above helps, then try to log the times your tasks needed... it might show you which part of the code is the problem. I usually do this using a timing graph like this:
Both the x and y axes are time, and each task has its own color and row in the graph. The graph scrolls in time (to the left in my case) and has a changeable time scale. The numbers show the actual and max (or sliding average) values...
This way I can see whether some task is taking too much time or even overlapping its next execution. Peaks are also nicely visible, and it all runs at runtime without any debug tools, which might change the behavior of execution.
I'm working on an application that uses multiple threads to process its data. The app is developed in C++ (Intel C++ compiler 9.1) and uses OpenMP. It is a 64-bit app running on Win7.
The problem is that when I run it during the day, it runs correctly. But when I run it during the night, after the screen has been locked, it enters an infinite loop after a few of the runs.
To be more precise, the app is called many times for different files to process. The calls are done within a batch file (no problem there).
I found that it enters the infinite loop about 2 hours after the screen lock occurs.
I disabled all power saving settings. But nothing changed.
This description is not very clear, but that's because I don't have a clue about the source of the problem. I just hope someone among you has had the same problem (and found a fix!). If you want more details, just let me know.
Any idea? Thanks in advance!
As my tests go on, I installed the same setup (but a release rather than a debug build) on another computer. I ran into the same problem after 20 minutes (after the screen lock) with another set of data. I ran the same data on my own computer (which is not locked) and everything was fine.
I'm mystified!
Are you assigning a thread priority that lets one thread take control of the application?
Also, I would suggest running it through some kind of profiler, such as VTune, as it can point out potential odd cases that could be causing the issue for you. (There is a free evaluation that you can try.)
I am writing a program that needs to run a set of executables and find their execution times.
My first approach was just to run the process, start a timer, and see the difference between the start time and the moment the process returns its exit value.
Unfortunately, this program will not run on a dedicated machine, so many other processes/threads can greatly change the execution time.
I would like to get the time in milliseconds/clocks that was actually given to the process by the OS. I hope that Windows stores that information somewhere, but I cannot find anything useful on MSDN.
Sure, one solution is to run the process multiple times and calculate the average time, but I want to avoid that.
Thanks.
You can take a look at the GetProcessTimes API.
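For example, a minimal sketch (assuming you launch the measured program yourself with CreateProcess; "child.exe" is just a placeholder name):

#include <windows.h>
#include <iostream>

// Convert a FILETIME (100-nanosecond units) to milliseconds.
static unsigned long long FileTimeToMs(const FILETIME &ft)
{
    ULARGE_INTEGER u;
    u.LowPart = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart / 10000ULL;
}

int main()
{
    STARTUPINFOA si;
    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi;
    ZeroMemory(&pi, sizeof(pi));

    if (!CreateProcessA("child.exe", NULL, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi))
        return 1;

    WaitForSingleObject(pi.hProcess, INFINITE);

    FILETIME creationTime, exitTime, kernelTime, userTime;
    if (GetProcessTimes(pi.hProcess, &creationTime, &exitTime, &kernelTime, &userTime))
    {
        // Kernel + user time is the CPU time the OS actually gave the process,
        // independent of whatever else was competing for the machine.
        std::cout << "CPU time: "
                  << FileTimeToMs(kernelTime) + FileTimeToMs(userTime) << " ms\n";
    }

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}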
The "High-Performance Counter" might be what you're looking for.
I've used QueryPerformanceCounter/QueryPerformanceFrequency for high-resolution timing in stuff like 3D programming where the stock functionality just doesn't cut it.
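A minimal sketch of that (note it measures elapsed wall-clock time, not the CPU time the OS actually gave the process):

#include <windows.h>
#include <iostream>

int main()
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);   // counts per second
    QueryPerformanceCounter(&start);

    // ... the work to be timed goes here ...

    QueryPerformanceCounter(&stop);
    double ms = (stop.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
    std::cout << "Elapsed: " << ms << " ms\n";
    return 0;
}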
You could also try the RDTSC x86 instruction.
short version:
Is there a good time based sampling profiler for Linux?
long version:
I generally use OProfile to optimize my applications. I recently found a shortcoming that has me wondering.
The problem was a tight loop spawning c++filt to demangle a C++ name. I only stumbled upon the code by accident while chasing down another bottleneck. OProfile didn't show anything unusual about the code, so I almost ignored it, but my code sense told me to optimize the call and see what happened. I changed the popen of c++filt to abi::__cxa_demangle. The runtime went from more than a minute to a little over a second. About a 60x speedup.
Is there a way I could have configured OProfile to flag the popen call? As the profile data sits now, OProfile thinks the bottleneck was the heap and std::string calls (which, BTW, once optimized, dropped the runtime to less than a second, for more than a 2x speedup).
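For reference, a minimal sketch of the in-process demangling call that replaced the popen of c++filt (the returned buffer is malloc'd and must be free()d; the mangled name here is just an example):

#include <cxxabi.h>
#include <cstdlib>
#include <iostream>

int main()
{
    const char *mangled = "_ZSt4cout";   // example mangled name (std::cout)
    int status = 0;

    // Demangle in-process instead of spawning c++filt via popen().
    char *demangled = abi::__cxa_demangle(mangled, NULL, NULL, &status);
    if (status == 0 && demangled != NULL)
    {
        std::cout << demangled << std::endl;
        free(demangled);                 // buffer is allocated by the ABI library
    }
    return 0;
}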
Here is my OProfile configuration:
$ sudo opcontrol --status
Daemon not running
Event 0: CPU_CLK_UNHALTED:90000:0:1:1
Separate options: library
vmlinux file: none
Image filter: /path/to/executable
Call-graph depth: 7
Buffer size: 65536
Is there another profiler for Linux that could have found the bottleneck?
I suspect the issue is that OProfile only attributes its samples to the currently running process. I'd like it to always attribute its samples to the process I'm profiling, so that if the process is currently switched out (blocking on I/O or a popen call), OProfile would simply charge the sample to the blocked call.
If I can't fix this, OProfile will only be useful when the executable is pushing near 100% CPU. It can't help with executables that have inefficient blocking calls.
Glad you asked. I believe OProfile can be made to do what I consider the right thing, which is to take stack samples on wall-clock time when the program is being slow and, if it won't let you examine individual stack samples, to at least summarize, for each line of code that appears in the samples, the percentage of samples that line appears on. That is a direct measure of what would be saved if that line were not there. Here's one discussion. Here's another, and another. And, as Paul said, Zoom should do it.
If your time went from 60 sec to 1 sec, that implies every single stack sample would have had a 59/60 probability of showing you the problem.
Try Zoom - I believe it will let you profile all processes - it would be interesting to know if it highlights your problem in this case.
I wrote this a long time ago, only because I couldn't find anything better: https://github.com/dicej/profile
I just found this, too, though I haven't tried it: https://github.com/oliver/ptrace-sampler
Quickly hacked up trivial sampling profiler for linux: http://vi-server.org/vi/simple_sampling_profiler.html
It appends backtrace(3) output to a file on SIGUSR1, and then converts it to annotated source.
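The idea, roughly (this is my own sketch of that approach, not the tool's actual code), is a handler like this, with an external script sending SIGUSR1 periodically (e.g. kill -USR1 <pid>):

#include <execinfo.h>
#include <signal.h>
#include <fcntl.h>

static int g_fd = -1;

// Signal handler: dump the current call stack to the log file.
static void on_sigusr1(int)
{
    void *frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, g_fd);   // writes one line per frame
}

int main()
{
    g_fd = open("samples.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    signal(SIGUSR1, on_sigusr1);

    // ... the program being profiled does its work here; for the demo,
    // just spin for a while so samples can be taken ...
    for (volatile long i = 0; i < 2000000000L; ++i) {}
    return 0;
}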
After trying everything suggested here (except for the now-defunct Zoom, which is still available as a huge file from Dropbox), I found that NOTHING does what Mr. Dunlavey recommends. The "quick hacks" listed in some of the answers above wouldn't build for me, or didn't work for me either. I spent all day trying stuff... and nothing could find fseek as a hotspot in an otherwise simple test program that was I/O-bound.
So I coded up yet another profiler, this time with no build dependencies, based on GDB, so it should "just work" for almost any debuggable code. It is a single CPP file.
https://github.com/jasonrohrer/wallClockProfiler
It automates the manual process suggested by Mr. Dunlavey: interrupting the target process with GDB periodically, harvesting a stack trace, and then printing a report at the end about which stack traces are the most common. Those are your true wall-clock hotspots. And it actually works.
Is it possible to change the name of a process (the one that appears under 'Processes' in Task Manager) at runtime in Win32? I want the program to be able to change its own name, not another program's. Help would be appreciated, preferably in C++. And to dispel any thoughts of viruses: no, this isn't a virus; yes, I know what I'm doing; it's for my own use.
I would like to submit what I believe IS a valid reason for changing the process name at runtime:
I have an exe that runs continuously on a server, though it is not a service. Several instances of this process can run on the server. The process is a scheduling system, and an instance of it runs for each line that is being scheduled, monitored, and controlled. Imagine a factory with 7 lines to be scheduled: a main assembly line, 3 sub-assembly lines, and 3 machining lines.
Rather than seeing sched.exe 7 times in Task Manager, it would be more helpful to see:
sched-main
sched-sub1
sched-sub2
sched-sub3
sched-mach1
sched-mach2
sched-mach3
This would be much more helpful to the administrator (the user in this situation should never see Task Manager). If one process is hung, the administrator can easily know which one to kill and restart.
I know you're asking for Win32, but under most *nixes, this can be accomplished by just changing argv[0]
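For illustration only, a rough sketch of that on Linux (overwriting argv[0] changes what ps shows; prctl(PR_SET_NAME, ...) also changes the 15-character comm name some tools display; details vary between Unix flavors, and "sched-main" is just a made-up name):

#include <cstring>
#include <unistd.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

int main(int argc, char *argv[])
{
    const char *newName = "sched-main";   // hypothetical new process name

    // Overwrite argv[0] in place; never write past its original length,
    // since that memory is only guaranteed to hold the original string.
    size_t len = strlen(argv[0]);
    strncpy(argv[0], newName, len);

#ifdef __linux__
    // Also set the kernel's "comm" name (truncated to 15 characters).
    prctl(PR_SET_NAME, newName, 0, 0, 0);
#endif

    sleep(60);   // leave time to inspect the name with ps/top
    return 0;
}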
I found code for doing that in VB. I believe it won't be too hard to convert it to C++ code.
A good book about low level stuff is Microsoft Windows Internals.
And I agree with Peter Ruderman: this is not something you should do.