How to figure out why ssh session does not exit sometimes? - c++

I have a C++ application that uses ssh to summon a connection to the server. I find that sometimes the ssh session is left lying around long after the command that summons the server has exited. Looking at the CentOS 4 man page for ssh, I see the following:
The session terminates when the command or shell on the remote machine
exits and all X11 and TCP/IP connections have been closed. The exit
status of the remote program is returned as the exit status of ssh.
I see that the command has exited, so I imagine not all the X11 and TCP/IP connections have been closed. How can I figure out which of these ssh is waiting for, so that I can fix my C++ application to clean up whatever is being left behind that keeps the ssh session open?
I also wonder why this failure occurs only some of the time and not on every invocation; it seems to happen roughly 50% of the time. What could my C++ application be leaving around to trigger this?
More background: the server is a daemon; when launched, it forks and the parent exits, leaving the child running. The client summons it using:
popen("ssh -n -o ConnectTimeout=300 user#host \"sererApp argsHere\""
" 2>&1 < /dev/null", "r")

Use libssh or libssh2 rather than calling popen(3) from C++ just to invoke ssh(1), which is itself another C program. For my personal experience, I'd say try libssh2: I've used it in a C++ program and it works.
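To give a feel for it, here is a minimal sketch of running one remote command with libssh2. This is not the original application's code: password authentication, an IPv4 address literal, and the run_remote name are all assumptions, and error handling is mostly elided.
#include <libssh2.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

// Minimal sketch: run one remote command and drain its output.
// Real code needs error checks on every call plus known-host verification.
int run_remote(const char *ip, const char *user, const char *pass,
               const char *command) {
    libssh2_init(0);

    // libssh2 does not open the TCP connection for you.
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in sin{};
    sin.sin_family = AF_INET;
    sin.sin_port = htons(22);
    sin.sin_addr.s_addr = inet_addr(ip);
    if (connect(sock, (sockaddr *)&sin, sizeof(sin)) != 0)
        return -1;

    LIBSSH2_SESSION *session = libssh2_session_init();
    libssh2_session_handshake(session, sock);
    libssh2_userauth_password(session, user, pass);

    LIBSSH2_CHANNEL *channel = libssh2_channel_open_session(session);
    libssh2_channel_exec(channel, command);

    // Read until the remote side sends EOF; no stray ssh process can linger.
    char buf[4096];
    ssize_t n;
    while ((n = libssh2_channel_read(channel, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    libssh2_channel_close(channel);
    int status = libssh2_channel_get_exit_status(channel);
    libssh2_channel_free(channel);
    libssh2_session_disconnect(session, "normal shutdown");
    libssh2_session_free(session);
    close(sock);
    libssh2_exit();
    return status;  // like ssh, return the remote exit status
}
A side benefit over popen(): the read loop makes it explicit that the session ends only when the remote side closes its output, which is exactly the behaviour being debugged here.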

I find some hints here:
http://www.snailbook.com/faq/background-jobs.auto.html
This problem is usually due to a feature of the OpenSSH server. When writing an SSH server, you have to answer the question, "When should the server close the SSH connection?" The obvious answer might seem to be: close it when the server-side user program started by client request (shell or remote command) exits. However, it's actually a bit more complicated; this simple strategy allows a race condition which can cause data loss (see the explanation below). To avoid this problem, sshd instead waits until it encounters end-of-file (eof) on the pipes connecting to the stdout and stderr of the user program.

@sienkiew: If you really want to execute a command or script via ssh and exit, have a look at the daemon tool from the libslack package. (Similar tools that can detach a command from its standard streams are screen, tmux, or detach.)
To inspect stdin, stdout & stderr of the command executed via ssh on the command line, you can, for example, use lsof.
# Sample commands to figure out why an ssh session does not exit.
# sleep inherits the shell's stdout, so sshd only sees EOF once sleep exits.
ssh localhost 'sleep 10 &'                                 # blocks for the full 10 s
ssh localhost 'sleep 10 1>&- &'                            # stdout closed: returns at once
ssh localhost 'sleep 10 & lsof -p ${!}'                    # list the fds the background job holds
ssh localhost 'sleep 10 1>&- & lsof -p ${!}'               # stdout no longer in the list
ssh localhost 'sleep 10 1>/dev/null & lsof -p ${!}'        # redirecting works as well
ssh localhost 'sleep 10 1>/dev/null 2>&1 & lsof -p ${!}'   # and stderr likewise
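Applied to the question's popen() call, this suggests the forked daemon is (sometimes, depending on timing) still holding sshd's stdout/stderr pipes. A hedged sketch of the adjusted invocation, redirecting the remote command's output so sshd sees EOF as soon as the launcher exits (serverApp and argsHere are the question's placeholders):
#include <cstdio>

int main() {
    // Redirect the remote command's stdout/stderr to /dev/null so the
    // daemonized child no longer pins sshd's pipes open; point them at a
    // log file instead if the daemon's output matters.
    FILE *f = popen("ssh -n -o ConnectTimeout=300 user@host "
                    "\"serverApp argsHere > /dev/null 2>&1\" 2>&1 < /dev/null",
                    "r");
    if (!f)
        return 1;
    char buf[256];
    while (fgets(buf, sizeof buf, f))  // anything ssh itself reports
        fputs(buf, stdout);
    return pclose(f);
}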

Related

With Django on Windows, how can I stop and restart the server via batch.bat from a button on the screen?

All the methods I found are manual; I did not find an automated way or code that does it.
For example:
1. Open Windows PowerShell as Administrator.
2. Find the PID (process ID) listening on port 8080:
netstat -aon | findstr 8080
TCP 0.0.0.0:8080 0.0.0.0:0 LISTENING 77777
3. Kill the zombie process:
taskkill /f /pid 77777
Now we return to the question: how can I do this automatically, either through the batch.bat file or through the Django code?
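For completeness, the same manual steps can be scripted. Since no Django code was shown, this is only a hedged sketch: a small Windows-only C++ helper that a batch file (or Django, via subprocess) could invoke. The port 8080 and the output parsing are assumptions based on the steps above.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

int main() {
    // Same as the manual step: netstat -aon | findstr 8080 (LISTENING rows only)
    FILE *p = _popen("netstat -aon | findstr :8080 | findstr LISTENING", "r");
    if (!p)
        return 1;
    char line[512];
    while (fgets(line, sizeof line, p)) {
        // The PID is the last whitespace-separated field on each line.
        char *last = strrchr(line, ' ');
        long pid = last ? strtol(last + 1, nullptr, 10) : 0;
        if (pid > 0) {
            std::string cmd = "taskkill /f /pid " + std::to_string(pid);
            system(cmd.c_str());  // same as the manual taskkill step
        }
    }
    _pclose(p);
    return 0;
}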

How to run valgrind for the server?

I am new to Valgrind. I need to run Valgrind on a server written in C++. The server listens on a port, but when I run the server inside Valgrind, I cannot communicate with it; the port is not listening.
valgrind --tool=memcheck --leak-check=yes --log-file=valgrind_log.txt /binary_path-c
I need the server to listen on the port when I run it under Valgrind.
If you have already confirmed that the exact same binary performs the desired network socket open() outside Valgrind but not inside it, then read on.
Valgrind only works by launching the binary itself; it cannot attach to an already running process (as explained here).
Valgrind is also sensitive to a change of effective UID, particularly when running from the root UID: you cannot use sudo with valgrind (detailed here).
You cannot run Valgrind on an executable binary that has Linux capability bits enabled (details here).
Valgrind cannot handle a setuid-root binary on an NFS filesystem (even when mounted to allow this). The workaround is to move your build or binary to a non-NFS partition.
Having said all that, it may be a timing problem: Valgrind makes everything run slower, so the control flow of your code can "miss its mark" when performing that network socket open. One way to nail down the timing logic is to put debug print statements throughout your code.
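Following that advice, here is a minimal sketch of what such instrumentation could look like around the listen path; the function name and layout are illustrative, not taken from the original server:
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>

// Print each step so a run under valgrind shows exactly where the
// socket setup stalls or fails.
int listen_on(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    fprintf(stderr, "socket() -> %d\n", fd);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    int rc = bind(fd, (sockaddr *)&addr, sizeof(addr));
    fprintf(stderr, "bind(%d) -> %d (%s)\n", port, rc,
            rc ? strerror(errno) : "ok");

    rc = listen(fd, SOMAXCONN);
    fprintf(stderr, "listen() -> %d\n", rc);
    return fd;
}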
Alternatively...
To see what a production-grade daemon is doing from the very beginning of startup, execute:
valgrind --trace-children=yes /usr/sbin/<your-server-binary>
There is another way to monitor a network socket in action; read on.
Tracing from start of execution
You can run strace from the start of execution and find out which network socket gets opened (and, as described later, dump its buffer contents):
strace -e trace=network <your-server-binary> <server-arguments>
Make a note of the desired fd (file descriptor) number.
As with any strace invocation that starts the process itself, pressing Ctrl-C stops that process. When strace is attached to an already running process, however, Ctrl-C safely detaches from the target (leaving it running) and returns you to your shell prompt.
Attaching to already running server
You can also monitor an already running production daemon with strace, but it is harder to find the fd number of its open network socket; run the previous step briefly to get that fd.
Find the daemon's PID using ps auxw.
Then plug your server/daemon's PID in here:
strace -f -p <your-server-PID> -e trace=network
to find its fd number.
Exact socket monitoring
With the identified fd in hand, rerun strace attached to the production server:
strace -f -e read=<fd> -e write=<fd> -p <your-daemon-PID>
Network troubleshooting checklist
lsof -i -n (list open network sockets and ports)
strace (trace system calls)
netstat -lt (list listening TCP sockets)
tcpdump/wireshark (capture and inspect traffic)
A list of network troubleshooting tools for Linux is given here, here and most comprehensively here.

Shell Script stops after connecting to external server

I am in the process of trying to automate deployment to an AWS server as a cool project for my coding course. I'm using a shell script to automate different processes, but when the script connects to the AWS EC2 Ubuntu server, it will not run any further shell commands until I close the connection. Is there any way to have it continue sending commands while connected?
read -p "Enter Key Name: " KEYNAME
read -p "Enter Server IP With Dashes: " IPWITHD
chmod 400 $KEYNAME.pem
ssh -i "$KEYNAME.pem" ubuntu#ec2-$IPWITHD.us-east-2.compute.amazonaws.com
ANYTHING HERE AND BELOW WILL NOT RUN UNTIL SERVER IS DISCONNECTED
A couple of basic points:
A shell script is a sequential set of commands for the shell to execute. It runs a program, waits for it to exit, and then runs the next one.
The ssh program connects to the server and tells it what to do. Once it exits, you are no longer connected to the server.
The instructions that you put in after ssh will only run when ssh exits. Those commands will then run on your local machine instead of the server you are sshed into.
So what you want to do instead is to run ssh and tell it to run a set of steps on the server, and then exit.
Look at man ssh. It says:
ssh destination [command]
If a command is specified, it is executed on the remote host instead of a login shell.
So, to run a command like echo hi, you use ssh like this:
ssh -i "$KEYNAME.pem" ubuntu#ec2-$IPWITHD.us-east-2.compute.amazonaws.com "echo hi"
Or, for longer commands, use a bash heredoc:
ssh -i "$KEYNAME.pem" ubuntu#ec2-$IPWITHD.us-east-2.compute.amazonaws.com <<EOF
echo "this will execute on the server"
echo "so will this"
cat /etc/os-release
EOF
Or, put all those commands in a separate script and pipe it to ssh:
cat commands-to-execute-remotely.sh | ssh -i "$KEYNAME.pem" ubuntu@ec2-$IPWITHD.us-east-2.compute.amazonaws.com
Definitely read What is the cleanest way to ssh and run multiple commands in Bash? and its answers.

GDB Connection Timeout

I used ST-LINK to write the .bin to the STM32F4 and saw the message I expected. Now I hope to understand how GPIO init works, so I am using OpenOCD and arm-none-eabi-gdb. Here is my process:
$ minicom
$ openocd -f /opt/openocd/share/openocd/scripts/board/stm32f4discovery.cfg
$ arm-none-eabi-gdb main.elf
(gdb) target remote localhost:3333
(gdb) localhost:3333: Connection timed out.
How do I check which port OpenOCD is listening on? Why does the connection time out?
That certainly means that openocd did not start or that the port is busy.
Usually, you use:
openocd -f board/stm32f4discovery.cfg
You should check that your OpenOCD session is actually running.
Are you running a Linux virtual machine on a Windows host?
If so, you probably need to replace localhost with 10.0.0.2 (or whatever your Windows IP is).
A good way to check is to telnet to the OpenOCD address on port 4444 and see if you get the OpenOCD prompt and can type a few commands.

Two-machine GDB debugging between Macs over Ethernet - transaction timed out

I am trying to debug a device driver which is crashing the kernel on a Mac using a remote machine running gdb (trying to follow the instructions here). Both machines are connected to the same network by Ethernet (same router even, and both can access the network). I have also set nvram boot-args="debug=0x144" on the target and restarted.
I then load the kernel extension on the target as usual. On the host machine I start gdb like this:
$ gdb -arch i386 /Volumes/KernelDebugKit/mach_kernel
Once in gdb, I load the kernel macros and set up for remote attachment
(gdb) source /Volumes/KernelDebugKit/kgmacros
(gdb) target remote-kdp
(gdb) kdp-reattach 11.22.33.44
However, the last command then does not make a connection and I get an endless spool of
kdp_reply_wait: error from kdp_receive: receive timeout exceeded
kdp_transaction (remote_connect): transaction timed out
kdp_transaction (remote_connect): re-sending transaction
What is the correct way to get gdb connected to the target machine?
There are a number of ways to break into the target, including:
Kernel panic, as stated in your answer above.
Non-maskable interrupt, which is triggered by the cmd-option-ctrl-shift-esc key combination.
Code a break into your kernel extension using PE_enter_debugger(), which is declared in pexpert/pexpert.h (see the sketch after this list).
Halt at boot by setting DB_HALT (0x01) in the NVRAM boot-args value.
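For item 3, here is a hedged sketch of what that call can look like inside a kext; the function name and message are hypothetical, and only PE_enter_debugger() itself comes from pexpert/pexpert.h:
#include <mach/mach_types.h>
#include <pexpert/pexpert.h>

// Call this from your kext's start routine (or just before the suspect
// code path) to halt the machine and wait for the remote debugger.
static void break_for_remote_gdb(void) {
    PE_enter_debugger("my-kext: breaking for two-machine debugging");
}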
Additionally, you may need to set a persistent ARP table entry, as the target is unable to respond to ARP requests while stopped in the debugger. I use the following in my debugger-launch shell script to set the ARP entry if it doesn't already exist:
if !(arp -a -n -i en0 | grep '10\.211\.55\.10[)] at 0:1c:42:d7:29:47 on en0 permanent' > /dev/null) ; then
    echo "Adding arp entry"
    sudo arp -s 10.211.55.10 00:1c:42:d7:29:47
fi
Someone more expert could probably improve on my bit of shell script.
All of the above is documented in http://developer.apple.com/library/mac/documentation/Darwin/Conceptual/KernelProgramming/KernelProgramming.pdf.
The answer is simply to make sure the target has a kernel panic before you try to attach gdb from the host.