My server exited with code 137 - c++

I wrote a C++ server/client pair using C++11, boost::asio and HDF5. The server ran fine for some time (2 days), and then it stopped with exit code 137. Since I launch the server from an infinite loop, it was restarted.
Unfortunately, my error logs don't provide sufficient information to understand the problem. So I've been trying to understand what this code means. It seems there's consensus that this means it's an error of 128+9, with 9 meaning that the program was killed with kill -9. Now I'm not sure at all why this happened. I need help to find out.
By reading further, I found out that it could have been killed by the system because it exceeded a certain allowed execution time, and thus the system killed it. Now this is not so unlikely, since my linux server is provided by my university, so they could be applying some kind of security to do this. I read about something called timeout in linux. My first question is: How can I know if this is the cause of the problem?
My second question is: what should I check also to understand this problem? What would you do? Please advise.
If you require any additional information, please ask.
Thanks.

Sounds like you have blown through memory limits and the Linux kernel's out-of-memory (OOM) killer sent SIGKILL to your process. In that case you should check the /var/log/messages file (or the kernel log, e.g. with dmesg) to see if there is anything about it. That's the first thing I would do. Check with your sysadmin if you don't have permissions.
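To make the 128+9 arithmetic from the question concrete, here is a minimal sketch that decodes a shell-reported exit status. The function names are hypothetical, and it assumes the status comes from a POSIX shell on Linux, where "killed by signal N" is reported as 128 + N. Note that a program can also call exit(137) itself, so this is a convention rather than proof of a kill.

```cpp
#include <csignal>
#include <string>

// Hypothetical helper: shells report "terminated by signal N" as 128 + N.
// Returns the signal number, or 0 if the status is outside that range.
// Assumption: Linux signal numbers run from 1 to 64.
int signal_from_exit_status(int status) {
    return (status > 128 && status <= 128 + 64) ? status - 128 : 0;
}

// Hypothetical helper: turns an exit status into a short description.
std::string describe_exit_status(int status) {
    const int sig = signal_from_exit_status(status);
    if (sig == 0) {
        return "exited with status " + std::to_string(status);
    }
    std::string text = "terminated by signal " + std::to_string(sig);
    if (sig == SIGKILL) {
        // SIGKILL cannot be caught, so the process gets no chance to log anything.
        text += " (SIGKILL)";
    }
    return text;
}
```

Calling describe_exit_status(137) in the restart loop's logging would record that the previous run was killed with SIGKILL, which is a hint to go look at the system logs.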

Related

What causes libdispatch error EVFILT_MACHPORT in MacOSX Sierra?

Good morning,
I am facing a crash in my application. When the user tries to start it, he waits like a minute and then a std::exception is raised. Really I could not reproduce the bug by myself, but it seems quite a common problem.
The only thing I could track is the following line in syslog:
BUG in libdispatch client: kevent[EVFILT_MACHPORT] monitored resource vanished before the source cancel handler was invoked
Then I started to google it, but I could not find much more. I can only "suppose" that it is some problem with GCD (which I do not use afaik, or at least not directly). What I saw on the Internet is that it is related to MacOSX Sierra. But most forums have no answer, just a lot of attempts without a consistent result. Maybe the only web page that seems reasonably clear about a workaround (which I have not tested, and anyway do not want to use) is this.
So...:
Does anyone know clearly what can cause the exception in libdispatch?
Can someone give me a good link, official documentation or something?
Is it true that it can be a bug in Sierra without updates?
Could it be related to the installer of an application?
Does anyone know a way to reproduce this exception with a test program?
This libdispatch log message is not fatal, and is almost certainly not related to your crash, which sounds like an abort due to an uncaught C++ exception (without a crashreport/backtrace it is difficult to say anything more). libdispatch does not itself generate any C++ exceptions FWIW.
As to the meaning of that specific log message, it relates to the following section in the dispatch_source_create(3) manpage:
CANCELLATION:
Important: a cancellation handler is required for file descriptor and mach port based sources in order to safely close the descriptor or destroy the port. Closing the descriptor or port before the cancellation handler has run may result in a race condition: if a new descriptor is allocated with the same value as the recently closed descriptor while the source's event handler is still running, the event handler may read/write data to the wrong descriptor.
If you see the EVFILT_MACHPORT "vanished" log message, somebody in your process has violated that API contract and deallocated a machport while a dispatch source was still monitoring it (causing the kernel to generate an EV_VANISHED kevent), see the source code.

IcmpCreateFile hangs

I have a C++ program that needs to ping a device before it proceeds. I've been using IcmpSendEcho for a few months now. During this time I have seen my program hang on the IcmpCreateFile call more than once. I haven't pinpointed the cause and all of my searches have led nowhere. This seems to happen without rhyme or reason, whether my program has been running for weeks or was just launched.
HANDLE hIcmpFile;
hIcmpFile = IcmpCreateFile();
Is anyone familiar with this, or should I just use a different method to ping?
EDIT: I'm not 100% sure, but Windows Update seems to encourage this behavior; I don't seem to see this issue when the server and the client it's talking to have updates turned off.

How to find where the program is waiting

I am working on a big code base. It is heavily multithreaded.
After running the linux based application for a few hours, in the end, right before reporting, the application silences. It doesn't die, it doesn't crash, it just waits there. Joins, mutexes, condition variables ... any of these can be the culprit.
If it had crashed, I would at least have a chance to find the source using debugger. But this way, I have no clue how to use what tool to find the bug. I can't even post a code sample for you. The only thing that can possibly help is to tap MANY places with cout to get a visual where the application is.
Have you been in such a situation? What do you recommend?
If you're running under Linux then just use gdb to run the program. When the application 'silences', interrupt it with CTRL+C, then type backtrace to see the call stack. With this you will find out the function where your application was blocked.
In the case of Linux, gdb will be a great help. Another tool that can be of great help is strace. (It can also be used when there are problems with programs for which source is not readily available, because strace does not need recompilation to trace them.)
strace intercepts and records the system calls made by a process and also the signals received by the process. It can show the order of events and all the return/resumption paths of calls. This can get you much closer to the problem area.
iotop, LTTng and Ftrace are a few other tools that can be helpful to you in this scenario.

Failed to resume in time Crashlog

I am trying to figure out a "Failed to resume in time" problem. On one of our testers' devices (an iPhone 4S with the latest OS) it happens very frequently, whereas on my own device it doesn't seem to happen at all.
Anyway, I got a few crashlogs. I am unable to trace the root cause though. I understand that the issue might be
1. When a process is holding up the main thread for too long.
2. When there is a memory issue.
I don't think the memory is much of an issue since it seems to happen when the user leaves the main menu and comes back. Nothing much is happening in the main menu so it probably is a task that runs too long.
Here is an excerpt from the crash log:
Can somebody help me or guide me on how I can trace the cause of the issue? Is there any way to turn off the watchdog timer (probably not, huh?) Also, what does the highlighted thread refer to?
I have already checked my applicationDidBecomeActive & applicationWillEnterForeground to make sure there is nothing going on there.
To my knowledge there are no synchronous calls being made at this point. Does Reachability use synchronous calls to check for internet? How can I check for that?
I am not making any large data transfers upon resume.
I notice that GameCenter automatically logs in, or checks for a login, upon resuming the app. Is there any way to prevent this? Could this possibly cause a timeout issue?
I tried doing a time profile, but I am not able to understand how to use it to analyze. If you can provide a good resource for that, that would be amazing.
Thanks!!!
You're currently in "trying to find the issue mode". You should switch to "try to find out how much of an issue this really is" mode.
So go find another 4S (actually as many as you can) to rule out that it's a device-specific issue. If it happens on all 4S it should be easier to pinpoint. If not, have someone else look over it, discuss possible causes. The peer programming approach often helps when you're stuck in a dead-end situation.
If the issue is only on that one device, you might want to check if it's broken (or "jailbroken") or might simply need a hard reboot (hold power and home for 10+ seconds).
If it only happens on some devices but not all, try to find what they have in common. This could be language/locale, or dictation, practically any kind of setting the user might have changed. If necessary, write a logger that logs as many settings as possible to your (web) server so you can compare settings one-by-one and quickly discard those that aren't in synch.
If only very few devices are affected, you could also ignore the issue and hope that additional crash logs from users will reveal the key to the issue.
Finally, there's always the option to disable suspending and instead terminate the app when the home button is pressed (as it was pre iOS 4). Unless of course the app has to run in the background.

Process name change at runtime (C++)

Is it possible to change the name (the one that appears under 'processes' in Task Manager) of a process at runtime in win32? I want the program to be able to change its own name, not another program's. Help would be appreciated, preferably in C++. And to dispel any thoughts of viruses: no, this isn't a virus; yes, I know what I'm doing; it's for my own use.
I would like to submit what I believe IS a valid reason for changing the process name at runtime:
I have an exe that runs continuously on a server -- though it is not a service. Several instances of this process can run on the server. The process is a scheduling system. An instance of the process runs for each line that is being scheduled, monitored and controlled. Imagine a factory with 7 lines to be scheduled. Main Assembly line, 3 sub assembly lines, and 3 machining lines.
Rather than see sched.exe 7 times in task manager, it would be more helpful to see:
sched-main
sched-sub1
sched-sub2
sched-sub3
sched-mach1
sched-mach2
sched-mach3
This would be much more helpful to the Administrator ( the user in this situation should never see task manager). If one process is hung, the Administrator can easily know which one to kill and restart.
I know you're asking for Win32, but under most *nixes, this can be accomplished by just changing argv[0]
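For the *nix route mentioned above, here is a minimal sketch of overwriting argv[0] in place. The helper name overwrite_argv0 is hypothetical, and the sketch assumes Linux, where tools such as ps read the command line from the memory that argv points to. It does not apply to Win32 Task Manager. The new name is truncated to the length of the original argv[0] so that nothing is written past the existing buffer.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: overwrites the buffer that argv[0] points to.
// Never writes past the original string length; longer names are truncated.
// Returns the number of characters of new_name that were stored.
std::size_t overwrite_argv0(char* argv0, const char* new_name) {
    // Measure the available room before touching the buffer.
    const std::size_t capacity = std::strlen(argv0);
    const std::size_t wanted = std::strlen(new_name);
    const std::size_t n = wanted < capacity ? wanted : capacity;

    std::memcpy(argv0, new_name, n);
    // Zero the rest so no tail of the old name remains visible.
    std::memset(argv0 + n, '\0', capacity - n);
    return n;
}
```

A typical call would be overwrite_argv0(argv[0], "sched-main") early in main. On Linux the short name shown by top is a separate field, which can be set with prctl(PR_SET_NAME, ...) instead.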
I found code for doing that in VB. I believe it won't be too hard to convert it to C++ code.
A good book about low level stuff is Microsoft Windows Internals.
And I agree with Peter Ruderman
This is not something you should do.