I have an application that runs through a wrapper and is submitted as a job on a grid (Linux).
My task is to monitor the RAM and virtual memory usage of the process and, if the process fails because of a memory issue, resubmit it to the grid with a higher memory requirement (using some switch).
I think this can be achieved by starting a separate thread from the application which watches the main application and, in case of failure, relaunches it.
I am looking for advice on a better solution to this problem.
Thanks
Ruchi
A thread will not work: when the process terminates, whether by returning from main or by crashing, all of its threads die with it (courtesy of "Do child threads exit when the parent thread terminates").
You will need to make it another process, perhaps a script that starts the process which then manages your application.
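A minimal sketch of such a supervising process, assuming Linux. The names `run_supervised`, `run_result` and `looks_like_memory_failure` are made up for illustration, and the policy of treating SIGKILL or SIGABRT as a likely memory failure is only an assumption (the OOM killer and many grid schedulers use SIGKILL, but so can other things). The grid resubmission command itself is not shown, since it depends on your scheduler.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Outcome of one supervised run (hypothetical helper type). */
struct run_result {
    int exited_ok;     /* 1 if the child exited with status 0 */
    int killed_by;     /* terminating signal, or 0 if it exited normally */
    long peak_rss_kb;  /* peak resident set size, kilobytes on Linux */
};

/* Fork, exec argv in the child, wait for it and collect its rusage.
 * Returns 0 on success, -1 if fork or wait4 failed. */
static int run_supervised(char *const argv[], struct run_result *out)
{
    assert(argv != NULL && argv[0] != NULL && out != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  /* only reached if exec failed */
    }

    int status;
    struct rusage ru;
    if (wait4(pid, &status, 0, &ru) < 0)
        return -1;

    out->exited_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    out->killed_by = WIFSIGNALED(status) ? WTERMSIG(status) : 0;
    out->peak_rss_kb = ru.ru_maxrss;
    return 0;
}

/* Assumed policy, not a rule: SIGKILL (OOM killer, scheduler limit) or
 * SIGABRT (for example an uncaught std::bad_alloc) hint at memory trouble. */
static int looks_like_memory_failure(const struct run_result *r)
{
    return r->killed_by == SIGKILL || r->killed_by == SIGABRT;
}
```

The wrapper would call `run_supervised` with the application's command line and, if `looks_like_memory_failure` returns nonzero, resubmit with a larger memory request, possibly using `peak_rss_kb` as a lower bound for the new limit.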
A usual way of doing this is to check whether a memory allocation, i.e. malloc(), fails. If malloc() fails, that is an indication that your system's memory is almost full, and in that particular case you can do whatever you like.
I'm having exactly the same problem described here:
timer_create() : -1 EAGAIN (Resource temporarily unavailable)
In short, some process is reserving a lot of timers via timer_create but never releasing them.
What I cannot figure out is how to determine the process affected by the leak in our production environment.
How could I know what process is the bad one, without randomly killing all the running stuff?
Is there any /proc/`pidof myprocess`/ debug info that tells me how many timers are reserved?
Thank you in advance!
Why yes, actually. Use the stap tool to trace system calls and determine which calls processes make most often.
The SystemTap Beginners Guide is a good resource. In particular, see the script on this page for an example of counting specific system calls per process.
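As a complement to tracing, and closer to the /proc idea in the question: Linux kernels from 3.10 on can expose a process's POSIX timers in /proc/<pid>/timers, where each timer's record starts with a line of the form `ID: <n>`. The file is not present on every kernel build and may need suitable permissions to read, so treat the following as a sketch under those assumptions. The function names are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count timer records in a stream formatted like /proc/<pid>/timers.
 * Each record begins with a line starting with "ID:". */
static int count_timers(FILE *f)
{
    assert(f != NULL);

    char line[256];
    int count = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "ID:", 3) == 0)
            count++;
    }
    return count;
}

/* Number of POSIX timers owned by pid, or -1 if the file cannot be
 * opened (no such process, no permission, or kernel without the file). */
static int count_timers_for_pid(long pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/timers", pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int n = count_timers(f);
    fclose(f);
    return n;
}
```

Looping over the numeric directories in /proc and calling `count_timers_for_pid` on each would point at the process holding an unusually large number of timers, without killing anything.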
I'm trying to understand what the mechanism is for getting a string from a C++ daemon I've written to Java for use by a UI. I'll post a picture of what I envision, then continue the question afterward:
There are two issues that I envision here:
1) The semaphore needs to be available to the library. In Windows, that could've been done with a named semaphore and access to its handle. In Linux, I've been pointed toward using a semaphore in shared memory and making processes aware of it through a key to the shared memory. It's vague to me, but will that concept work to synchronize Java and the daemon?
2) Do I have to place the queue in shared memory in order to make the ??? link in the above chart work? Can and should the queue reside in the .so?
So those are my concerns. I'd love and welcome any and all help, challenges, and pleas for sanity and will do my best to provide all additionally necessary information. Thanks in advance.
You're running the two applications as separate processes; in vanilla Linux this means they cannot touch each other's memory directly. The Java VM is one process and the C++ daemon is another. Each has its own virtual address space, which the MMU maps to different physical pages, so a queue that lives in the .so loaded by one process is not visible to the other unless you explicitly set up shared memory.
Search for "inter-process communication" if you'd like. I prefer socketpair for bidirectional parent-child communication.
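A minimal sketch of the socketpair approach, assuming the two ends are a parent and a child created with fork. The function name `receive_from_child` is made up for illustration. Note that socketpair only connects related processes; for a daemon and a JVM that are started independently, the closest equivalent would be a named Unix domain socket or a TCP socket on localhost, which is not shown here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that sends msg over a socketpair; the parent reads it
 * into buf (always NUL-terminated). Returns bytes received, -1 on error. */
static ssize_t receive_from_child(const char *msg, char *buf, size_t buflen)
{
    assert(msg != NULL && buf != NULL && buflen > 0);

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: plays the role of the sender */
        close(sv[0]);
        ssize_t w = write(sv[1], msg, strlen(msg));
        close(sv[1]);
        _exit(w < 0 ? 1 : 0);
    }

    close(sv[1]);                   /* parent: read until EOF or buffer full */
    size_t total = 0;
    ssize_t n;
    while (total < buflen - 1 &&
           (n = read(sv[0], buf + total, buflen - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';

    close(sv[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Both descriptors are bidirectional, so the same pair could carry requests from the UI side back to the daemon side.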
I am running my application (VC MFC) under gflags with Pageheap enabled to track down a heap corruption.
Now the application has crashed with the error below. I could not interpret these lines (other than getting a feeling of resource unavailability).
Can anyone shed light on what exactly caused the app to crash?
(Info: the application is multithreaded, with about 500 threads running, on a multiprocessor machine.)
kernel32!RaiseException+53
msvcrt!_CxxThrowException+36
mfc42u!AfxThrowResourceException+19
mfc42u!AfxRegisterWndClass+ab
mfc42u!CAsyncSocket::AttachHandle+5c
mfc42u!CAsyncSocket::Socket+25
mfc42u!CAsyncSocket::Create+14
This same problem drove me nuts, but I finally fixed it and it is working. This is a bug in the MFC socket library: inside a thread other than the main application thread, if we try to do something like
CSocket socket;
socket.Create();
it throws an unhandled exception. I found a Microsoft article on it (see "What Microsoft says about this"), but that did not help me either. So here is a workaround I found, and I hope it can help some frustrated fellow like me.
Inside the thread, do this:
CSocket mySock;
SOCKET sockethandle = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
mySock.m_hSocket= sockethandle;
After that, DO NOT call mySock.Create, as the socket has already been created through the assignment of the socket handle. I am not sure whether mySock.Attach(sockethandle) can be used instead, as I have not tried it yet.
You can then call Connect etc. directly.
When you are done using the socket, DO NOT call mySock.Close(); call closesocket(mySock.m_hSocket) instead, and that will free the socket. If Attach works in the case above, then I guess we would need to call Detach here before freeing the socket.
Good Luck
I wonder if this is your actual heap corruption issue, or if your program has just hit a resource limitation as a consequence of running with Pageheap.
I can't remember the exact details, but Pageheap incurs extra memory overhead, so much so that you can run out of memory much sooner than you would without Pageheap enabled.
With 500 threads running, you have a 1MB stack for each, plus any memory they've allocated dynamically along the way.
CAsyncSocket::AttachHandle triggers AfxThrowResourceException if it can't create a window. It seems that your system is saturated due to Pageheap.
Do you have to have 500 threads running to reproduce the problem? Maybe if you could lower that count a little, there would be more resources available.
I had the same problem, and after trying many things I noticed the following in the CAsyncSocket reference:
Create is not thread-safe. If you are calling it in a multi-threaded environment where it could be invoked simultaneously by different threads, be sure to protect each call with a mutex or other synchronization lock.
After adding mutex synchronization around the Create calls, it no longer throws an exception.
I'd like to emulate a violent system shutdown, i.e. to get as close as possible to a power outage at the application level. We are talking about a C/C++ application on Linux. I need the application to terminate itself.
Currently I see several options:
call exit()
call _exit()
call abort()
do division by zero or dereference NULL.
other options?
What is the best choice?
Partly duplicate of this question
IMHO the closest you can come to a power outage is to run the application in a VM and power off the VM without shutting it down. In all other cases, where the OS is still running when the application terminates, the OS will do some cleanup that would not occur in a real power outage.
At the application level, the most violent you can get is _exit(). Division by zero, segfaults, etc are all signals, which can be trapped - if untrapped, they're basically the same as _exit(), but may leave a coredump depending on the signal.
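To make the difference between exit() and _exit() concrete, here is a small sketch (the function name `bytes_surviving_exit` is made up for illustration). A child process buffers some text in stdio on a pipe and then terminates one way or the other; the parent counts how many bytes actually arrive. It assumes the stdio stream on a pipe is fully buffered, which is the usual glibc behaviour for non-terminal output.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes "unsaved data" into a stdio buffer and terminates with
 * exit() (use_underscore_exit == 0) or _exit() (nonzero). Returns the
 * number of bytes that reached the parent, or -1 on error. */
static long bytes_surviving_exit(int use_underscore_exit)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    fflush(NULL);  /* do not let the child inherit pending parent output */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        FILE *out = fdopen(fds[1], "w");
        if (out == NULL)
            _exit(2);
        fputs("unsaved data", out);  /* still sitting in the user-space buffer */
        if (use_underscore_exit)
            _exit(0);                /* no stdio flush, no atexit handlers */
        exit(0);                     /* flushes and closes stdio streams */
    }

    close(fds[1]);
    char buf[64];
    long total = 0;
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

With exit() the buffered bytes are flushed on the way out; with _exit() they are lost, which is closer to what an abrupt stop does to unflushed user-space data.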
If you truly want a hard shutdown, the best bet is to cut power in the most violent way possible. Invoking /sbin/poweroff -fn is about as close as you can get, although it may do some cleanup at the hardware level on its way out.
If you really want to stress things, though, your best bet is to really, truly cut the power - install some sort of software controlled relay on the power cord, and have the software cut that. The uncontrolled loss of power will turn up all sorts of weird stuff. For example, data on disk can be corrupted due to RAM losing power before the DMA controller or hard disk. This is not something you can test by anything other than actually cutting power, in your production hardware configuration, over multiple trials.
kill -9
It kills a process and does not allow any signal handlers to run.
Why not do a halt? Or call panic?
Try
raise(SIGKILL)
in the process,
or from the command line:
kill -9 pid
where pid is the PID of your process (these two methods are equivalent and should not perform any cleanup)
You're unclear as to what your requirements are. If you're doing tests of how you will recover from a power failure, you need to actually cause a power failure. Even doing things like a kernel panic will allow write buffers on hard disks to flush, since they are independent of the CPU.
A remote power strip might be a solution if you really need to test the complete failure case.
You could try using a virtual machine.
Freeze it, screw it hard, and see what happens.
Otherwise kill -9 would be the best solution.
If you need the application to terminate itself, the following seems appropriate:
kill(getpid(), SIGKILL); // same as kill -9
If that's not violent enough (and it may not be), then I like the idea of terminating a VM inside which your application is running. You should be able to rig up something where the application can send a command to the host machine (via ssh or something) to terminate its own VM.
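A sketch that checks the self-kill behaviour, under the assumption of a POSIX system; the names `report_cleanup` and `self_kill_skips_cleanup` are invented for illustration. A child registers an atexit handler that would report back through a pipe, then sends itself SIGKILL. The parent verifies both the terminating signal and that the handler never ran.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int cleanup_fd = -1;  /* write end of the pipe, set in the child */

/* atexit handler: would tell the parent that normal cleanup ran. */
static void report_cleanup(void)
{
    if (cleanup_fd >= 0 && write(cleanup_fd, "c", 1) < 0) {
        /* nothing useful to do on failure */
    }
}

/* Fork a child that registers an atexit handler and then kills itself
 * with SIGKILL. Stores the terminating signal in *term_signal and
 * returns 1 if the handler never ran, 0 if it did, -1 on error. */
static int self_kill_skips_cleanup(int *term_signal)
{
    assert(term_signal != NULL);

    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        cleanup_fd = fds[1];
        atexit(report_cleanup);
        kill(getpid(), SIGKILL);  /* same effect as kill -9 on ourselves */
        for (;;)
            pause();              /* not reached in practice */
    }

    close(fds[1]);
    char c;
    ssize_t n = read(fds[0], &c, 1);  /* 0 means EOF: nothing was written */
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    *term_signal = WIFSIGNALED(status) ? WTERMSIG(status) : 0;
    return n == 0;
}
```

This only shows that user-space cleanup is skipped; the kernel still closes descriptors and releases memory, so it remains gentler than a real power loss.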
I've had regression tests that we used to perform where we flicked the power switch to OFF.
While doing disk IO.
Failure to recover later was, well: a failure.
You can buy reliability like that: generally you'll need an "end user certificate".
You can get there in software by talking (dirty) to your UPS.
APC UPSes will definitely do power off under software control!
Who says systems can't power cycle themselves?
Infinite recursion, should run out of stack space (if not, the OOM killer will finish the job):
void a() { a(); }
Fork bomb (if the app doesn't have any fork limits then the OOM killer should kill the app at some point):
while(1)
fork();
Run out of memory:
while(1)
malloc(1);
Within a single running process kill(getpid(), SIGKILL) is the most extreme, as no cleanup is possible.
Otherwise, try a VM, or put a test machine on a power strip and turn the power off, if you are doing automated testing.
Any solution where the process terminates itself programmatically does not emulate an asynchronous termination in any way. It is entirely deterministic in the sense that it will terminate at the same point in the code every time.
Of your suggestions
exit() is defined to "terminate normally, performing the regular cleanup for terminating processes." - hardly violent!
_exit() performs some subset of exit()'s operations, but remains 'nice' to the application, the OS, and its resources.
abort() raises SIGABRT, and the OS may choose to perform clean-up of some resources.
Division by zero typically raises SIGFPE and probably behaves similarly to abort().
It is probably best not to have the application terminate itself, but to have some external process kill it asynchronously so that termination may occur at any point in the execution. Use kill from another process or script on a randomised timer to send the SIGKILL signal, which cannot be trapped and performs no clean-up. If you must have the process terminate itself, do it from some asynchronous thread that wakes up after some non-deterministic time and kills the process, but even then you will know which thread was running when it terminated. Even using these methods, there is no way a process can be terminated mid-CPU-cycle as a real power-down might, and any cached or buffered data pending output may still appear or be written after process termination.
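The external, randomised kill described above might look like the following sketch. The names `kill_at_random_point` and `busy_forever` are hypothetical, the workload is a stand-in for the real application, and rand() is used only for brevity; the caller is assumed to seed it with srand().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Run work() in a child process and send it SIGKILL after a random delay
 * below max_delay_ms milliseconds. Returns the terminating signal, 0 if
 * the child finished on its own first, or -1 on error. */
static int kill_at_random_point(void (*work)(void), unsigned max_delay_ms)
{
    assert(work != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        work();
        _exit(0);
    }

    unsigned delay_ms = max_delay_ms == 0 ? 0 : (unsigned)rand() % max_delay_ms;
    struct timespec ts = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (long)(delay_ms % 1000) * 1000000L
    };
    nanosleep(&ts, NULL);

    /* Safe even if the child already exited: until we wait for it, the
     * pid still refers to our (zombie) child and cannot be reused. */
    kill(pid, SIGKILL);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Stand-in workload that never finishes by itself. */
static void busy_forever(void)
{
    for (;;)
        pause();
}
```

Running this in a loop with different seeds, followed by the application's recovery path, exercises termination at many different points instead of always the same one.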
As pointed out, try to consume as many resources as possible until the kernel kills you:
while(1)
{
malloc(1);
fork();
}
Another way is to try writing to a read-only page: just keep writing to memory until you get a segmentation fault or bus error.
If you can get into the kernel, a great way to kill it is simply to write over a data structure the kernel uses; bonus points if you find a read-only page, mark it writable, and then overwrite it. By the way, on kernels where the syscall table or interrupt table is writable, writing there will crash your system for sure.
On a recent system, a process with superuser privileges could take realtime CPU/IO priority, lock all addressable memory, spew garbage across /proc, /dev, /sys, LAN/WiFi, firmware ioctls, and flash memory simultaneously, overclock/overvolt the CPU/GPU/RAM, and have a good chance of exiting by causing something nearby to Halt and Catch Fire.
If the process only needs to do metaphorical violence, it could stop at /proc/sysrq-trigger.