How can an executable file be debugged while it changes - gdb

Here is the scenario I'm asking about:
gdb debugs some program executable
programmer finds bug (good!)
programmer fixes code and re-compiles (great!)
programmer realizes he didn't quit gdb, so it was running all that time
specifically, it was running while the executable was written and the OS (Linux) allowed it.
How come the above is possible?
Shouldn't I get some OS error message like "file is being used by another application (gdb)"?

Shouldn't I get some OS error message like
It depends.
Suppose your rebuild command is gcc -o foo t.c.
This command can either open(2) foo for writing, or it can write to a temporary file foo.$uniqsuffix and rename(2) the temporary to foo on success, or it can unlink(2) foo and create and write to a new foo.
Only the first variant (attempting to write to the original foo) would fail with ETXTBSY.
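To make the first variant concrete, here is a minimal sketch, assuming a Linux system with a sleep binary on PATH and a temporary directory that allows execution. It is written in Python only to keep it short; the same system calls are available from C. The helper name probe_write_while_running is made up for illustration. It copies the binary, starts the copy, and then tries to open the running file for writing.

```python
import errno
import os
import shutil
import subprocess
import tempfile


def probe_write_while_running():
    """Try to open a running executable for writing.

    Returns the errno of the failed open(2), 0 if the open succeeded,
    or None if the copied binary could not be started at all.
    """
    source = shutil.which("sleep")  # assumption: any small binary would do
    if source is None:
        return None
    with tempfile.TemporaryDirectory() as workdir:
        exe = os.path.join(workdir, "sleep")
        shutil.copy(source, exe)  # private copy, nothing system-wide is touched
        try:
            proc = subprocess.Popen([exe, "30"])
        except OSError:
            return None  # e.g. the temporary directory is mounted noexec
        try:
            # No O_TRUNC: if the open is allowed, the file is left unchanged.
            fd = os.open(exe, os.O_WRONLY)
        except OSError as err:
            return err.errno
        else:
            os.close(fd)
            return 0
        finally:
            proc.kill()
            proc.wait()


if __name__ == "__main__":
    result = probe_write_while_running()
    # Print the symbolic errno name when there is one.
    print(errno.errorcode.get(result, result))
```

On Linux I would expect this to report ETXTBSY, which is the error the first variant runs into.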
Running strace -fe file gcc -o foo t.c |& grep foo on my (Ubuntu) system shows:
[pid 116892] stat("foo", {st_mode=S_IFREG|0750, st_size=16520, ...}) = 0
[pid 116892] lstat("foo", {st_mode=S_IFREG|0750, st_size=16520, ...}) = 0
[pid 116892] unlink("foo") = 0
[pid 116892] openat(AT_FDCWD, "foo", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
[pid 116892] stat("foo", {st_mode=S_IFREG|0640, st_size=16520, ...}) = 0
[pid 116892] chmod("foo", 0750) = 0
So on this system the linker uses the unlink + create new file strategy, and no error is expected.
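For comparison, the second variant from the list above (temporary file plus rename(2)) can be sketched as follows. This is not what the linker on my system does, just an illustration of the strategy; replace_via_rename is a hypothetical helper and the file contents are placeholders.

```python
import os
import tempfile


def replace_via_rename(path, data):
    """Write data to a temporary file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(
        dir=directory, prefix=os.path.basename(path) + "."
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # rename(2): the name now refers to the new file; the old file
        # lives on for anyone who still has it open or mapped.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "foo")
        with open(target, "wb") as f:
            f.write(b"old build")
        before = os.stat(target).st_ino
        replace_via_rename(target, b"new build")
        # Compare inode numbers before and after the replacement.
        print(before, os.stat(target).st_ino)
```

The original file is never opened for writing here, so ETXTBSY cannot occur with this strategy either.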
See this answer for why you can continue debugging the original program even after it has been rebuilt.


Failures building Perl 5.32.1 on HP-UX (hpia11.31) - related to failed regex evaluations

I'm attempting to build Perl 5.32.1 on an hpia11.31 system and am getting what appear to be failures in regex evaluations. For instance, make_patchnum.pl fails because a regex intended to pull the filename from a heredoc instead returns the entire heredoc as the filename:
./miniperl -Ilib make_patchnum.pl
Failed to open for write './lib/Config_git.pl' is generated by make_patchnum.pl
# DO NOT EDIT DIRECTLY - edit make_patchnum.pl instead
######################################################################
$Config::Git_Data=<<'ENDOFGIT';
git_commit_id=''
git_describe=''
git_branch=''
git_uncommitted_changes=''
git_commit_id_title='':File name too long at make_patchnum.pl line 84.
Manually getting past that, configpm exhibits the same issue: regex evaluations to extract variable headers fail in 5.32.1 where they succeed in 5.28.1. Example:
Expected a Configure variable header, instead we got:
_exe (Unix.U):
This variable defines the extension used for executable files.
DJGPP, Cygwin and OS/2 use '.exe'. Stratus VOS uses '.pm'.
On operating systems which do not require a specific extension
for executable files, this variable is empty.
I assume this is using the regexec.c built earlier in the build process, although I don't know that for certain. The regexec.c build reports some warnings, but they seem in line with warnings reported against 5.28.1: I don't see anything here that suggests it doesn't work.
Here's the build command for regexec.c:
cc -c -DPERL_CORE -D_POSIX_C_SOURCE=199506L -D_REENTRANT \
-Ae -Wp,-H150000 -D_HPUX_SOURCE -Wl,+vnocompatwarnings +DD64 \
-D_INCLUDE__STDC_A1_SOURCE -I/usr/local/include -D_LARGEFILE_SOURCE \
-D_FILE_OFFSET_BITS=64 +O2 +Onolimit regexec.c
Any ideas why the regex parser might behave differently on HP-UX from other platforms? I've successfully built 5.32.1 for x86 Linux, plinux, zlinux, rs6000, and Solaris, so this seems specific to HP-UX.
EDIT: compiler info
bash-4.0$ /opt/aCC/bin/cc --version
cc: HP C/aC++ B3910B A.06.20 [May 13 2008]
Which compiler is your cc?
You need the ANSI C compiler to be able to build perl on HP-UX:
$ cc --version
cc: HP C/aC++ B3910B A.06.28.02 [Mar 09 2016]
$ cc -V
cc: HP C/aC++ B3910B A.06.28.02 [Mar 09 2016]

Program execution steps

I have a C++ program that works fine, but it needs to run for a long time. While it is running, I would like to continue developing some parts of it. If I recompile my program, this will replace the binary with a new one. Will this modify the behavior of the running program? Or are the process and the binary file two separate things once the program is launched?
More generally, what are the steps of a program execution?
On Linux, the process uses memory mapping to map the text section of the executable file and shared libraries directly into the running process memory. So if you could overwrite the executable file, it would affect the running process. However, writing into a file that's mapped for execution is prohibited -- you get a "Text file busy" error.
However, you can still recompile the program. If the compiler (actually the linker) gets this error, it removes the old executable file and creates a new one. On Unix, if you remove a file that's in use, the file contents are not actually removed from the disk, only the reference from the directory entry is removed; the file isn't fully deleted until all references to it (directory entries, file descriptors and memory mappings) go away. So the running process continues to be mapped to the old, nameless file. You can see this with the following demonstration:
barmar@dev:~$ ls -li testsleep
229774 -rwxr-xr-x 1 barmar adm 4584 Apr 24 04:30 testsleep
barmar@dev:~$ ./testsleep &
[1] 17538
barmar@dev:~$ touch testsleep.c
barmar@dev:~$ make testsleep
cc testsleep.c -o testsleep
barmar@dev:~$ ls -li testsleep
229779 -rwxr-xr-x 1 barmar adm 4584 Apr 24 04:32 testsleep
The inode number changed from 229774 to 229779 when I recompiled the program while it was running, indicating that a new file was created.
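The same effect can be reproduced without a compiler. The sketch below is only an analogy: an open file handle stands in for the running process, and rebuild_while_in_use is a made-up name for the remove-and-recreate step described above.

```python
import os
import tempfile


def rebuild_while_in_use(path, new_data):
    """Unlink path and create a fresh file under the same name."""
    os.unlink(path)
    with open(path, "wb") as f:
        f.write(new_data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        exe = os.path.join(workdir, "testsleep")
        with open(exe, "wb") as f:
            f.write(b"old build")
        running = open(exe, "rb")  # stands in for the running process
        old_inode = os.fstat(running.fileno()).st_ino
        rebuild_while_in_use(exe, b"new build")
        # The name now points at a new inode ...
        print(old_inode, os.stat(exe).st_ino)
        # ... while the handle opened earlier still reads the old contents.
        print(running.read())
        running.close()
```

The old inode cannot be reused for the new file because it is still referenced by the open handle, which is why the two numbers differ.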
On Windows, you couldn't even write the new executable while the old version is running: the file on disk is locked while the process exists. On Linux, you can replace the file on disk (by removing or renaming it and creating a new one), but the running process keeps using the old file.
OTOH, while running in an IDE, it may be possible to patch the running process as the IDE is aware of the relevant details. But it's complex and not all IDEs support this.

Supply arguments to program under gdbserver on MSYS

I needed to debug a program asynchronously, because it stalled, and Ctrl+C killed gdb, rather than interrupting the program (this is on MinGW/MSYS).
Someone hinted that gdb wouldn't work on Windows in async mode, and indeed it didn't (with the message "Asynchronous execution not supported on this target."), but that gdbserver would.
So I try:
$ gdbserver localhost:60000 ./a_.exe 0
Process ./a_.exe created; pid = 53644
Listening on port 60000
(Supplying the 0 as the argument, according to how the manpage says it's done.)
Then in another terminal:
$ gdb ./a_.exe
(gdb) target remote localhost:60000
Remote debugging using localhost:60000
0x76fa878f in ntdll!DbgBreakPoint () from C:\Windows\system32\ntdll.dll
(gdb) continue
Continuing.
[Inferior 1 (Remote target) exited with code 01]
While the original now looks like:
$ gdbserver localhost:60000 ./a_.exe 0
Process ./a_.exe created; pid = 53484
Listening on port 60000
Remote debugging from host 127.0.0.1
Expecting 1 argument: test case number to run.
Child exited with status 1
GDBserver exiting
That is, my program thought that it got no arguments.
Is the manpage wrong?
Yes! "Misleading" is a more fitting term. (Misleading, at least as it applies to this version of gdbserver on this platform.)
The first argument is literally the first argument (argv[0]) given to the inferior. Normally this is the name of the executable. So, the following worked:
$ gdbserver localhost:60000 ./a_.exe whatever 0
That is, the manpage should have said, to be consistent:
target> gdbserver host:2345 emacs emacs foo.txt
I am not able to help you out with gdbserver, but if you are just looking for a way to interrupt a program running in mingw/msys gdb (similar to Ctrl+C on Linux), take a look at Debugbreak.

Windows vs. Linux GCC argv[0] value [duplicate]

I'm programming on Windows using MinGW, gcc 4.4.3. When I use the main function like this:
#include <iostream>
int main(int argc, char* argv[]) {
    std::cout << "path is " << argv[0] << std::endl;
}
On Windows I get a full path like this: "C:/dev/stuff/bin/Test". When I run the same application on Linux, however, I get some sort of relative path: "bin/Test". It's breaking my application! Any idea on how to make sure the path is absolute on both systems?
No, there isn't. Under most shells on Linux, argv[0] contains exactly what the user typed to run the binary. This allows binaries to do different things depending on what the user types.
For example, a program with several different command-line commands may install the binary once, and then hard-link the various different commands to the same binary. For example, on my system:
$ ls -l /usr/bin/git*
-rwxr-xr-x 109 root wheel 2500640 16 May 18:44 /usr/bin/git
-rwxr-xr-x 2 root wheel 121453 16 May 18:43 /usr/bin/git-cvsserver
-rwxr-xr-x 109 root wheel 2500640 16 May 18:44 /usr/bin/git-receive-pack
-rwxr-xr-x 2 root wheel 1021264 16 May 18:44 /usr/bin/git-shell
-rwxr-xr-x 109 root wheel 2500640 16 May 18:44 /usr/bin/git-upload-archive
-rwxr-xr-x 2 root wheel 1042560 16 May 18:44 /usr/bin/git-upload-pack
-rwxr-xr-x 1 root wheel 323897 16 May 18:43 /usr/bin/gitk
Notice how some of these files have exactly the same size. More investigation reveals:
$ stat /usr/bin/git
234881026 459240 -rwxr-xr-x 109 root wheel 0 2500640 "Oct 29 08:51:50 2011" "May 16 18:44:05 2011" "Jul 26 20:28:29 2011" "May 16 18:44:05 2011" 4096 4888 0 /usr/bin/git
$ stat /usr/bin/git-receive-pack
234881026 459240 -rwxr-xr-x 109 root wheel 0 2500640 "Oct 29 08:51:50 2011" "May 16 18:44:05 2011" "Jul 26 20:28:29 2011" "May 16 18:44:05 2011" 4096 4888 0 /usr/bin/git-receive-pack
The inode number (459240) is identical and so these are two links to the same file on disk. When run, the binary uses the contents of argv[0] to determine which function to execute. You can see this (sort of) in the code for Git's main().
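The dispatch-on-argv[0] idea can be sketched in a few lines. This is not Git's code; the command names (demo-upload-pack, demo-receive-pack) and handler functions are hypothetical, and Python is used only for brevity.

```python
import os


def upload_pack(args):
    return "upload-pack " + " ".join(args)


def receive_pack(args):
    return "receive-pack " + " ".join(args)


# One binary, several names: each hard link (or symlink) maps to a handler.
COMMANDS = {
    "demo-upload-pack": upload_pack,
    "demo-receive-pack": receive_pack,
}


def dispatch(argv):
    """Pick the handler from the name the program was invoked under."""
    name = os.path.basename(argv[0])
    handler = COMMANDS.get(name)
    if handler is None:
        raise SystemExit(f"{name}: unknown command name")
    return handler(argv[1:])


if __name__ == "__main__":
    # A fixed argv is used so the sketch runs as-is; a real tool would
    # pass its own argument vector instead.
    print(dispatch(["/usr/local/bin/demo-receive-pack", "repo.git"]))
```

Only the basename is inspected, so it does not matter whether the caller typed a relative or an absolute path.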
argv array
argv[0] is a parameter like any others: it can be an arbitrary NUL terminated byte string. It can be the empty string. It is whatever the launching process wants.
By default, the shell will set argv[0] to whatever is used to name the program: a name looked up in $PATH, a relative or an absolute path. It can be a symbolic link or a regular file.
To invoke a program with some other value, with zsh (dunno with other shells) use:
ARGV0=whatever_you_want some_program arguments
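Any launching process can do the same thing, not just zsh. As a rough sketch (Linux only, since it reads /proc/<pid>/cmdline, and assuming a sleep binary on PATH), the following starts sleep with a made-up argv[0] and reads back the argument vector the child actually received. The function name argv_seen_by_child is mine.

```python
import os
import shutil
import subprocess


def argv_seen_by_child(fake_argv0):
    """Start sleep with a custom argv[0] and return the argv it received.

    Returns None when /proc or a sleep binary is unavailable.
    """
    real_path = shutil.which("sleep")
    if real_path is None or not os.path.isdir("/proc/self"):
        return None
    # executable= is the file that gets executed; the list is passed
    # through as argv, so its first element is free-form.
    proc = subprocess.Popen([fake_argv0, "30"], executable=real_path)
    try:
        with open(f"/proc/{proc.pid}/cmdline", "rb") as f:
            raw = f.read()
    finally:
        proc.kill()
        proc.wait()
    # cmdline is a sequence of NUL-terminated strings.
    return [os.fsdecode(part) for part in raw.split(b"\0")[:-1]]


if __name__ == "__main__":
    print(argv_seen_by_child("made-up/dir/sleep"))
```

The fake name deliberately keeps the basename sleep, because some systems ship sleep as a multi-call binary that looks at argv[0] to decide what to do.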
If you really need the path to the executable, you cannot use the command line on Unix.
Linux only
On Linux: /proc/self/exe is a symbolic link to the executable file.
You can readlink it. You can also stat or open it directly.
Renaming and soft link
A normal soft link is a dumb string, and doesn't know what happens to its target (if it exists at all). But the /proc/self/exe soft link is magic.
In case of renaming, the soft-but-magic-link will follow renaming. In case there are several hard links, it will follow the name of the particular hard link that was used. (So different hard links to the same file are not perfectly equivalent under Linux.)
If this hard link is unlinked, I think " (deleted)" is appended to the value of the symbolic link. Note that this is a valid file name, so another unrelated file could have that name.
In any case, following the symbolic link behaves like following a hard link to the file, so you can stat or open it directly.
I don't think you can count on anything on a network file system if the binary is renamed or unlinked on another system than the one where the executable is launched.
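A small sketch of the two ways of using /proc/self/exe described above: reading the link text, and following the link directly. It is Linux only, and running_executable is a made-up helper name. Note that when run as a Python script it reports the interpreter binary, not the script, because the interpreter is the executable of the process.

```python
import os


def running_executable():
    """Return (path, same_file) for the current process, or None without /proc.

    path is what readlink(2) reports for /proc/self/exe; same_file tells
    whether that path still names the file the process is running from.
    """
    link = "/proc/self/exe"
    if not os.path.islink(link):
        return None  # not Linux, or /proc is not mounted
    path = os.readlink(link)
    try:
        # stat() through the magic link always reaches the running file;
        # stat() on the path only does so if the name is still in place.
        same_file = os.path.samefile(link, path)
    except OSError:
        same_file = False  # the name was unlinked or renamed meanwhile
    return path, same_file


if __name__ == "__main__":
    print(running_executable())
```

Comparing the two stat results is one way to notice that the name returned by readlink no longer refers to the running file, though the race mentioned below still applies.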
Security considerations
When your program gets to use the /proc/self/exe special file, it is possible for the file used to launch your program to be unlinked or renamed. This should be taken seriously in case the program is privileged (SUID or Set Capabilities): even if the user doesn't have write access to the original "Set Something" binary, he may be able to make a hard link to it if he has write access to a directory on the same file system, so he may be able to change the name of a running privileged binary.
By the time you readlink, the value returned may refer to another file. (Of course, there is always an unavoidable race condition with opening the result of readlink.)
As usual, NFS does not provide all the same guarantees that local file systems have.
There is no way to ensure that argv[0] is an absolute path because it is supposed to be exactly how the user invoked the program. So, if on a Linux command line you invoke your program via ./bin/Test, then argv[0] should be exactly "./bin/Test".
It seems like a bug in MinGW's runtime if when you invoke the program from a command prompt via .\bin\Test, argv[0] is "C:/dev/stuff/bin/Test". With the latest MinGW (gcc version 4.5.2), invoking a binary via .\bin\Test means argv[0] is ".\bin\Test". A Microsoft Visual C++-built binary (cl version 16.00.40219.01) invoked via .\bin\Test also has ".\bin\Test" for argv[0].

per process configurable core dump directory

Is there a way to configure the directory where core dump files are placed for a specific process?
I have a daemon process written in C++ for which I would like to configure the core dump directory. Optionally the filename pattern should be configurable, too.
I know about /proc/sys/kernel/core_pattern, however this would change the pattern and directory structure globally.
Apache has the directive CoreDumpDirectory - so it seems to be possible.
No, you cannot set it per process. The core file gets dumped either to the current working directory of the process, or to the directory set in /proc/sys/kernel/core_pattern if the pattern includes a directory.
CoreDumpDirectory in Apache is a hack: Apache registers signal handlers for all signals that cause a core dump, and changes the current directory in its signal handler.
/* handle all varieties of core dumping signals */
static void sig_coredump(int sig)
{
    apr_filepath_set(ap_coredump_dir, pconf);
    apr_signal(sig, SIG_DFL);
#if AP_ENABLE_EXCEPTION_HOOK
    run_fatal_exception_hook(sig);
#endif
    /* linuxthreads issue calling getpid() here:
     * This comparison won't match if the crashing thread is
     * some module's thread that runs in the parent process.
     * The fallout, which is limited to linuxthreads:
     * The special log message won't be written when such a
     * thread in the parent causes the parent to crash.
     */
    if (getpid() == parent_pid) {
        ap_log_error(APLOG_MARK, APLOG_NOTICE,
                     0, ap_server_conf,
                     "seg fault or similar nasty error detected "
                     "in the parent process");
        /* XXX we can probably add some rudimentary cleanup code here,
         * like getting rid of the pid file. If any additional bad stuff
         * happens, we are protected from recursive errors taking down the
         * system since this function is no longer the signal handler GLA
         */
    }
    kill(getpid(), sig);
    /* At this point we've got sig blocked, because we're still inside
     * the signal handler. When we leave the signal handler it will
     * be unblocked, and we'll take the signal... and coredump or whatever
     * is appropriate for this particular Unix. In addition the parent
     * will see the real signal we received -- whereas if we called
     * abort() here, the parent would only see SIGABRT.
     */
}
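Since the core lands in the current working directory when core_pattern does not include a directory, a simpler variation is to change directory once at startup instead of inside a signal handler. This is not what Apache does; it is a sketch that assumes the daemon does not depend on its working directory and that core_pattern is a plain relative name such as "core". It is shown in Python for brevity (chdir(2) and setrlimit(2) are the underlying calls), and prepare_core_dumps and the example path are hypothetical.

```python
import os
import resource


def prepare_core_dumps(directory):
    """Make directory the working directory and raise the core size limit.

    Only helps when core_pattern is a plain relative name; an absolute
    path or a "|command" pattern takes precedence over the working directory.
    """
    os.makedirs(directory, exist_ok=True)
    os.chdir(directory)
    # Raise the soft limit as far as the hard limit allows.
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return os.getcwd()


# Typical use near the start of the daemon (hypothetical path):
# prepare_core_dumps("/var/tmp/mydaemon-cores")
```

If the hard limit is 0, no core file will be written regardless of the directory, so that is worth checking as well.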
It is possible to do this using the "|command" mechanism of the core_pattern file. The executed command can create the directories and files as needed. The command can be passed the following specifiers in the parameters (cf. man 5 core):
%% a single % character
%c core file size soft resource limit of crashing process
%d dump mode—same as value returned by prctl(2) PR_GET_DUMPABLE
%e executable filename (without path prefix)
%E pathname of executable, with slashes ('/') replaced by exclamation marks ('!')
%g (numeric) real GID of dumped process
%h hostname (same as nodename returned by uname(2))
%i TID of thread that triggered core dump, as seen in the PID namespace in which the thread resides
%I TID of thread that triggered core dump, as seen in the initial PID namespace
%p PID of dumped process, as seen in the PID namespace in which the process resides
%P PID of dumped process, as seen in the initial PID namespace
%s number of signal causing dump
%t time of dump, expressed as seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC)
%u (numeric) real UID of dumped process
For example, it is possible to create a script (e.g. named crash.sh) as follows:
#!/bin/bash
# $1: process number on host side (%P)
# $2: program's name (%e)
OUTDIR=/tmp/core/$2
OUTFILE="core_$1"
# Create a sub-directory in /tmp
mkdir -p "$OUTDIR"
# Redirect stdin to a per-process file:
cat > "$OUTDIR"/"$OUTFILE"
exit 0
In the shell:
$ chmod +x crash.sh
$ mv crash.sh /tmp # Put the script in some place
$ sudo su
# echo '|/tmp/crash.sh %P %e' > /proc/sys/kernel/core_pattern
# cat /proc/sys/kernel/core_pattern
|/tmp/crash.sh %P %e
# exit
$
Create an example program which crashes (e.g. fail.c):
int main(void)
{
    char *ptr = (char *)0;
    *ptr = 'q';
    return 0;
}
Compile the program (make several executables) and adjust the core file size in the current shell:
$ gcc fail.c -o fail1
$ gcc fail.c -o fail2
$ ulimit -c
0
$ ulimit -c unlimited
$ ulimit -c
unlimited
Run the failing programs several times to get multiple process IDs:
$ ./fail1
Segmentation fault (core dumped)
$ ./fail2
Segmentation fault (core dumped)
$ ./fail1
Segmentation fault (core dumped)
$ ./fail2
Segmentation fault (core dumped)
Look at /tmp, where the core_pattern redirects the core dumps:
$ ls -l /tmp/core
total 8
drwxrwxrwx 2 root root 4096 nov. 3 15:57 fail1
drwxrwxrwx 2 root root 4096 nov. 3 15:57 fail2
$ ls -l /tmp/core/fail1/
total 480
-rw-rw-rw- 1 root root 245760 nov. 3 15:57 core_10606
-rw-rw-rw- 1 root root 245760 nov. 3 15:57 core_10614
$ ls -l /tmp/core/fail2
total 480
-rw-rw-rw- 1 root root 245760 nov. 3 15:57 core_10610
-rw-rw-rw- 1 root root 245760 nov. 3 15:57 core_10618
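The handler does not have to be a shell script. As a sketch, the same logic as crash.sh could look like this in Python, assuming the file is made executable, given a suitable interpreter line, and registered with the same '%P %e' parameters. The save_core name and the root parameter are my own additions.

```python
import os
import shutil
import sys


def save_core(stream, pid, exe_name, root="/tmp/core"):
    """Copy a core image from stream to <root>/<exe_name>/core_<pid>."""
    out_dir = os.path.join(root, exe_name)
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, f"core_{pid}")
    with open(out_path, "wb") as out:
        # The kernel feeds the core image on standard input.
        shutil.copyfileobj(stream, out)
    return out_path


if __name__ == "__main__" and len(sys.argv) == 3:
    # Expected invocation via core_pattern: |/path/to/handler %P %e
    save_core(sys.stdin.buffer, sys.argv[1], sys.argv[2])
```

Keeping the copy step in a function with an explicit root makes it possible to try the handler on an ordinary file before pointing core_pattern at it.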