Why does '$' for funcname in git log -L cause an infinite search? - regex

You can search for a filename and function name in git log with git log -L :funcname:filename.
I ran into an issue where we had been running this search programmatically and the funcname was set to '$', which caused an endless search. (e.g. git log -L :$:somefile.py)
'$' means end of string in regex, but why does this cause an endless search loop when other regex characters like '^' or '?' don't? What unique effects does the '$' character have?

I was able to reproduce this locally and step through the code using gdb.
This looks like a bug in the find_funcname_matching_regexp function. It eventually ends up matching $ against the empty string, which succeeds but matches zero characters, so the pointer that marks the current position in the file never advances and the loop never terminates.
Here's a walkthrough of the reproducer. In this example, we're running git log against the file main.go, which has the following content:
package main
import "fmt"
func main() {
fmt.Println("example repository for demonstrating git log bug")
}
Start gdb and set a breakpoint at the beginning of the while loop in line-range.c. Arrange to print the value of start after each break:
(gdb) break line-range.c:140
Breakpoint 1 at 0x5a7a89: file line-range.c, line 140.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>p start
>end
(gdb)
Run git log -L :$:main.go under the control of gdb:
(gdb) run log -L :$:main.go
Starting program: /home/lars/src/git/git log -L :$:main.go
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Detaching after fork from child process 2469232]
Breakpoint 1, find_funcname_matching_regexp (xecfg=0x0, start=0x8477c0 "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"example repository for demonstrating git log bug\")\n}\n", regexp=0x7fffffffc020) at line-range.c:140
140 reg_error = regexec(regexp, start, 1, match, 0);
$1 = 0x8477c0 "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"example repository for demonstrating git log bug\")\n}\n"
In this output, we can see that start is pointing at the beginning of the file:
$1 = 0x8477c0 "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"example repository for demonstrating git log bug\")\n}\n"
Skip a few iterations:
(gdb) c 7
Will ignore next 6 crossings of breakpoint 1. Continuing.
Breakpoint 1, find_funcname_matching_regexp (xecfg=0x0, start=0x84782b "}\n", regexp=0x7fffffffc020) at line-range.c:140
140 reg_error = regexec(regexp, start, 1, match, 0);
$2 = 0x84782b "}\n"
Here we see that start now points at the last line in the file.
Watch what happens if we iterate a few more times:
(gdb) c
Continuing.
Breakpoint 1, find_funcname_matching_regexp (xecfg=0x0, start=0x84782d "", regexp=0x7fffffffc020) at line-range.c:140
140 reg_error = regexec(regexp, start, 1, match, 0);
$3 = 0x84782d ""
(gdb) c
Continuing.
Breakpoint 1, find_funcname_matching_regexp (xecfg=0x0, start=0x84782d "", regexp=0x7fffffffc020) at line-range.c:140
140 reg_error = regexec(regexp, start, 1, match, 0);
$4 = 0x84782d ""
(gdb) c
Continuing.
Breakpoint 1, find_funcname_matching_regexp (xecfg=0x0, start=0x84782d "", regexp=0x7fffffffc020) at line-range.c:140
140 reg_error = regexec(regexp, start, 1, match, 0);
$5 = 0x84782d ""
After one more iteration of the loop, start now points at the empty string. It keeps this value in every subsequent iteration, and we never break out of the while loop.
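The failure mode can be illustrated with a small model in Python. This is only a sketch of the general pattern (search, reject the candidate, resume after the matched line), not git's actual code; the function name scan, the max_steps limit and the "stuck" return value are all made up for the illustration, and every candidate line is simply rejected. The guard marks the point where an unguarded loop would spin forever.

```python
import re


def scan(text, pattern, max_steps=1000):
    """Toy model: reject every candidate and resume after the matched line."""
    regex = re.compile(pattern, re.MULTILINE)
    start = 0
    for _ in range(max_steps):
        match = regex.search(text, start)
        if match is None:
            return "no match"
        # Resume at the beginning of the line after the one that matched.
        newline = text.find("\n", match.end())
        next_start = len(text) if newline == -1 else newline + 1
        if next_start == start:
            # Zero-width match at the end of the text: nothing was consumed,
            # so an unguarded loop would repeat this iteration forever.
            return "stuck at offset %d" % start
        start = next_start
    return "gave up"


if __name__ == "__main__":
    sample = "package main\nfunc main() {\n}\n"
    print(scan(sample, "$"))           # hits the guard at the end of the text
    print(scan(sample, "nosuchname"))  # an ordinary pattern runs out of matches
```

In this model, '$' keeps matching all the way to the end of the text and then matches the empty remainder without consuming anything, which mirrors the repeated start value in the gdb session above.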
I've submitted a patch to git that should correct this behavior.
You can follow the discussion there to see if they like my patch or if they decide there is a more appropriate way to resolve the problem.
With the patched version of the code, we see the following behavior instead:
$ git log -L :$:main.go
fatal: -L parameter '$' starting at line 1: no match


gdb step until source is available again

I'm running GNU grep under gdb on linux and single stepping it. After about 12 steps, control is transferred to setlocale.c, for which no source code is available.
Example session, after step 12 no source code information is available and the list command just shows the file.
Is there a way of getting gdb to keep stepping until a file with source code is available again? Alternatively, is there a way of telling gdb to keep stepping until control is transferred to a different file?
Example session, showing source code initially available and then unavailable for setlocale.c:
(gdb) start
Temporary breakpoint 1 at 0x402e50: file grep.c, line 2415.
Starting program: ~/ws/opt/grep/out/bin/grep --context=20 -r --line-number --byte-offset --include=\*.c int .
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, main (argc=0x8, argv=0x7fffffffdaa8) at grep.c:2415
2415 {
(gdb) l
2410 return result;
2411 }
2412
2413 int
2414 main (int argc, char **argv)
2415 {
2416 char *keys = NULL;
2417 size_t keycc = 0, oldcc, keyalloc = 0;
2418 int matcher = -1;
2419 bool with_filenames = false;
(gdb) s 12
__GI_setlocale (category=category@entry=0x6, locale=locale@entry=0x420b7b "") at setlocale.c:220
220 setlocale.c: No such file or directory.
(gdb) l
215 in setlocale.c
You need gdb's finish command. It runs until the current stack frame returns, which gets you out of a frame that has no source code available. You can repeat it as many times as needed until you are back in a stack frame with source code. See the documentation.
I ended up writing a simple gdb script using the Python API to do this. It keeps stepping until control is transferred to a different file, regardless of whether that involves adding a new stack frame or leaving the current one.
The script can be loaded with source leave_this_file.py. It defines a command called leave_this_file that can be invoked with no arguments, or given a number of times to repeat.
The script is a little bit makeshift and ends up parsing the result of the gdb command frame 0 rather than using one of gdb's proper APIs for inspecting frames.
import gdb

MAX_STEPS = 10000

def get_file_name():
    """Extract the file name for the bottommost frame."""
    # Example output of "frame 0":
    #   #0  main (argc=0x7, argv=0x7fffffffdaa8) at grep.c:2415
    #   <source fragment>
    where_str = gdb.execute("frame 0", from_tty=False, to_string=True)
    # The last word of the first line is file:line.
    file_line = where_str.splitlines()[0].split()[-1]
    filename, _, line = file_line.rpartition(":")
    # Confirm that the line number is an int; raises ValueError otherwise.
    int(line)
    return filename

def step_out_of_file_once():
    """Step until the current file changes or MAX_STEPS is reached."""
    orig_file_name = get_file_name()
    current_file_name = orig_file_name
    counter = 0
    for x in range(MAX_STEPS):
        gdb.execute("step", from_tty=False, to_string=True)
        counter += 1
        current_file_name = get_file_name()
        if orig_file_name != current_file_name:
            break
    print("%s: %30s, %s: %s" % ("new", current_file_name, "steps", counter))

class LeaveThisFile(gdb.Command):
    """Step out of the current file."""

    def __init__(self):
        gdb.Command.__init__(
            self, "leave_this_file", gdb.COMMAND_DATA, gdb.COMPLETE_SYMBOL, True
        )

    def invoke(self, arg, from_tty):
        # Interpret the arg as a number of times to execute the command,
        # 1 by default.
        if arg:
            arg = int(arg)
        else:
            arg = 1
        for x in range(arg):
            step_out_of_file_once()

# Instantiating the class registers the command with gdb.
LeaveThisFile()
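For what it's worth, the file name can probably also be read through gdb's frame API instead of parsing the output of frame 0. The following is a minimal sketch, not something I ran as part of the session above; frame_file_name is a hypothetical helper name, and it relies on Frame.find_sal() and the symtab.filename attribute from gdb's Python API.

```python
def frame_file_name(frame):
    """Return the source file name for a gdb.Frame-like object, or None."""
    sal = frame.find_sal()      # symbol-table-and-line object for the frame
    if sal.symtab is None:      # gdb has no symbol table for this frame
        return None
    return sal.symtab.filename
```

Inside gdb it would be called as frame_file_name(gdb.newest_frame()), and get_file_name() in the script above could be replaced with that call. The helper takes the frame as an argument so that it can be exercised outside gdb with stand-in objects.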
Here's some example output when running GNU grep under gdb
2415 {
(gdb) startQuit
(gdb) source leave_this_file.py
(gdb) leave_this_file 15
new: setlocale.c, steps: 18
new: pthread_rwlock_wrlock.c, steps: 8
new: ../sysdeps/unix/sysv/linux/x86/hle.h, steps: 3
new: pthread_rwlock_wrlock.c, steps: 1
new: setlocale.c, steps: 7
new: ../sysdeps/x86_64/multiarch/../strcmp.S, steps: 1
new: setlocale.c, steps: 48
new: getenv.c, steps: 4
new: ../sysdeps/x86_64/strlen.S, steps: 2
new: getenv.c, steps: 16
new: ../sysdeps/x86_64/multiarch/../strcmp.S, steps: 64
new: getenv.c, steps: 53
new: setlocale.c, steps: 16
new: ../sysdeps/x86_64/multiarch/../strchr.S, steps: 5
new: setlocale.c, steps: 23

Program (nload) runs as a daemon when executed in shell but not in startup/automation script

I would like to run nload (a network throughput monitor) as a daemon on startup (or just automate in general). I can successfully run it as a daemon from the command line by typing this:
nload eth0 >& /dev/null &
Just some background: I modified the nload source code (written in C++) slightly to write to a file in addition to outputting to the screen. I would like to read the throughput values from the file that nload writes to. The reason I am outputting to /dev/null is so that I don't need to worry about the stdout output.
The weird thing is that when I run it manually, it runs just fine as a daemon and I am able to read throughput values from the file. But every attempt at automation has failed. I have tried init.d, rc.local, and cron, but no luck. The script I wrote to run this in automation is:
#!/bin/bash
echo "starting nload"
/usr/bin/nload eth0 >& /dev/null &
if [ $? -eq 0 ]; then
echo started nload
else
echo failed to start nload
fi
I can confirm that when automated, the script does run, since I tried logging the output. It even logs "started nload", but when I look at the list of processes running nload is not one of them. I can also confirm that when the script is run manually from the shell, nload starts up just fine as a daemon.
Does anyone know what could be preventing this program from running when run via an automated script?
It looks like nload crashes when it isn't run from a terminal. I traced it with strace from rc.local:
viroos@null-linux:~$ cat /etc/rc.local
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
strace -o /tmp/nload.trace /usr/bin/nload
exit 0
It looks like the HOME environment variable is missing:
viroos@null-linux:~$ cat /tmp/nload.trace
brk(0x1f83000) = 0x1f83000
write(2, "Could not retrieve home director"..., 34) = 34
write(2, "\n", 1) = 1
exit_group(1) = ?
+++ exited with 1 +++
Let's fix this:
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
export HOME=/tmp
strace -o /tmp/nload.trace /usr/bin/nload
exit 0
Now we have another problem:
viroos@null-linux:~$ cat /tmp/nload.trace
read(3, "\32\1\36\0\7\0\1\0\202\0\10\0unknown|unknown term"..., 4096) = 320
read(3, "", 4096) = 0
close(3) = 0
munmap(0x7f23e62c9000, 4096) = 0
ioctl(2, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7ffedd149010) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(2, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7ffedd148fb0) = -1 ENOTTY (Inappropriate ioctl for device)
write(2, "Error opening terminal: unknown."..., 33) = 33
exit_group(1) = ?
+++ exited with 1 +++
You mentioned that you modified the nload code, but my guess is that you haven't removed its handling of a missing terminal. You can either edit the nload code further or run it under screen in detached mode:
viroos@null-linux:~$ cat /etc/rc.local
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
export HOME=/tmp
screen -S nload -dm /usr/bin/nload
exit 0
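Both failures seen in the traces above (no HOME, no terminal) come from the environment the program is started in, so they can be reproduced without rebooting. Here is a rough Python sketch of that idea; run_like_boot and the CHILD snippet are invented for the illustration, and a small Python child stands in for nload so the example runs anywhere Python can start with an empty environment (true on typical Linux systems, an assumption elsewhere).

```python
import subprocess
import sys

# Stand-in for the real program: report what it sees at startup.
CHILD = (
    "import os, sys\n"
    "print('HOME=%s tty=%s' % (os.environ.get('HOME'), sys.stdout.isatty()))\n"
)


def run_like_boot(extra_env=None):
    """Start the child with only the given variables and no terminal."""
    env = dict(extra_env or {})
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,                      # nothing inherited from our own shell
        stdin=subprocess.DEVNULL,     # no terminal on stdin
        stdout=subprocess.PIPE,       # stdout is a pipe, not a tty
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_like_boot())                   # child sees neither HOME nor a tty
    print(run_like_boot({"HOME": "/tmp"}))   # HOME fixed, still no tty
```

Setting HOME only addresses the first problem; the child still has no terminal, which is why the screen wrapper is needed for a curses program like nload.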

Get PID of new process

So I'm creating a process in Perl like this:
my $process = `nohup ./run > /dev/null 2>&1 &`;
Which returns something along the lines of
[1] 2905
How do I go about getting the process ID from this so later on in the script execution I can run something like:
exec("kill -9 $pid");
Here's what I've got so far:
/\[1\] ([0-9]+)/g
but it looks quite messy. Is there any way to improve upon this regular expression? Will that regex always work? Is there any case where it won't be [1]?
How about:
@ar = split(/\s+/, $process);
$pid = $ar[1];
You should probably be using a "fork and exec" pattern here.
if (my $pid = fork()) {
    # You're in the parent process
    # $pid contains the PID of the new child process
    ...
} else {
    # You're in the new child process
    # exec() your new command
    exec($cmd);
    # Execution never gets here
}
Edit: Actually, given that you're basically creating a daemon process here, perhaps you should look at Proc::Daemon instead.
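The underlying idea is language independent: if the parent starts the child itself instead of asking a shell to background it, it gets the PID directly and has nothing to parse. As an illustration only, here is the same pattern sketched in Python; start_and_stop is a made-up name, a sleeping Python child stands in for ./run, and the negative exit status applies to POSIX systems.

```python
import os
import signal
import subprocess
import sys


def start_and_stop():
    """Start a long-running child, then kill it using the PID we were given."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    pid = child.pid                  # no shell output to parse
    os.kill(pid, signal.SIGKILL)     # equivalent of kill -9 $pid
    status = child.wait()            # reap the child; negative means killed by a signal
    return pid, status


if __name__ == "__main__":
    print(start_and_stop())
```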

RegEx for extracting a value from Open3.popen3 stdout

How do I get the output of an external command and extract values from it?
I have something like this:
stdin, stdout, stderr, wait_thr = Open3.popen3("#{path}/foobar", configfile)
if /exit 0/ =~ wait_thr.value.to_s
runlog.puts("Foobar exited normally.\n")
puts "Test completed."
someoutputvalue = stdout.read("TX.*\s+(\d+)\s+")
puts "Output value: " + someoutputvalue
end
I'm not using the right method on stdout since Ruby tells me it can't convert String into Integer.
So for instance, if the output is
"TX So and so: 28"
I would like to get only "28". I validated that the regex above matches what I need to match, I'm only wondering how to store that extracted value in a variable.
What is the right way of doing this? I can't find the methods available for stdout anywhere in the documentation. I'm using stdout.read from Ruby 1.9.3.
All the information needed is in the Popen3 documentation, but you have to read it all and look at the examples pretty carefully. You can also glean useful information from the Process docs too.
Maybe this will 'splain it better:
require 'open3'
captured_stdout = ''
captured_stderr = ''
exit_status = Open3.popen3(ENV, 'date') {|stdin, stdout, stderr, wait_thr|
  pid = wait_thr.pid # pid of the started process.
  stdin.close
  captured_stdout = stdout.read
  captured_stderr = stderr.read
  wait_thr.value # Process::Status object returned.
}
puts "STDOUT: " + captured_stdout
puts "STDERR: " + captured_stderr
puts "EXIT STATUS: " + (exit_status.success? ? 'succeeded' : 'failed')
Running that outputs:
STDOUT: Wed Jun 12 07:07:12 MST 2013
STDERR:
EXIT STATUS: succeeded
Things to note:
You often have to close the stdin stream. If the called application expects input on STDIN it will hang until it sees the stream close, then will continue its processing.
stdin, stdout, stderr are IO handles, so you have to read the IO class documentation to find out what methods are available.
You have to output to stdin using puts, print or write, and read or gets from stdout and stderr.
exit_status isn't a string, it's an instance of the Process::Status class. You can mess with trying to parse from its to_s version, but don't. Instead use the accessors to see what it returned.
I passed in the ENV hash, so the child program had access to the entire environment the parent saw. It's not necessary to do that; Instead you can create a reduced environment for the child if you don't want it to have access to everything, or you can mess with its view of the environment by changing values.
The code stdout.read("TX.*\s+(\d+)\s+") posted in the question is, um... nonsense. I have no idea where you got that as nothing like that is documented in Ruby's IO class for IO#read or IO.read.
It's easier to use capture3 if you don't need to write to STDIN of the called code:
require 'open3'
stdout, stderr, exit_status = Open3.capture3('date')
puts "STDOUT: " + stdout
puts "STDERR: " + stderr
puts "EXIT STATUS: " + (exit_status.success? ? 'succeeded' : 'failed')
Which outputs:
STDOUT: Wed Jun 12 07:23:23 MST 2013
STDERR:
EXIT STATUS: succeeded
Extracting a value from a string using a regular expression is trivial, and well covered by the Regexp documentation. Starting from the last code example:
stdout[/^\w+ (\w+ \d+) .+ (\d+)$/]
puts "Today is: " + [$1, $2].join(' ')
Which outputs:
Today is: Jun 12 2013
That's using the String#[] method, which is extremely flexible.
An alternative is using "named captures":
/^\w+ (?<mon_day>\w+ \d+) .+ (?<year>\d+)$/ =~ stdout
puts "Today is: #{ mon_day } #{ year }"
which outputs the same thing. The downside to named captures is they're slower for what I consider a minor bit of convenience.
"TX So and so: 28"[/\d+$/]
=> "28"
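For comparison, the same flow (capture the output first, check the exit status through an accessor, then apply the regex to the captured string) looks much the same in other languages. This Python sketch is only an illustration; extract_tx_value, run_and_extract and the TX_VALUE pattern are invented names, and a one-line Python child stands in for foobar.

```python
import re
import subprocess
import sys

# "TX", anything, whitespace, then the trailing number on that line.
TX_VALUE = re.compile(r"^TX.*\s(\d+)\s*$", re.MULTILINE)


def extract_tx_value(output):
    """Return the trailing number of the first TX line, or None."""
    match = TX_VALUE.search(output)
    return match.group(1) if match else None


def run_and_extract(argv):
    """Run a command, and extract the TX value only if it exited normally."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:       # use the status accessor, not string parsing
        return None
    return extract_tx_value(result.stdout)


if __name__ == "__main__":
    print(run_and_extract([sys.executable, "-c", "print('TX So and so: 28')"]))
```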

How to log all commands run By system() System Call

I am trying to debug a C++ application which invokes many command line applications, such as grep, through the system() call. I need to see all the commands the application is executing through the system() call.
I tried to view these commands by enabling history and view the .history file. But these commands are not executed through a terminal. The history file has only the commands executed interactively.
Any idea how this can be done?
Define a new macro with similar name:
#define system(_x) std::cout << _x << std::endl; (system)(_x);
The system macro replaces the system function and:
It prints the command to the standard output (or elsewhere).
It calls the system function.
Thanks to Hasturkun's suggestion, the following is better:
#define system(_x) (std::cout << (_x) << std::endl, system(_x))
That returns the result of system function call, too ;-)
To trace every command executed by "yourProgram":
truss -s!all -daDf -t exec yourProgram
eg:
$ truss -s!all -daDf -t exec sh -c "/bin/echo hello world;/bin/date"
Base time stamp: 1282164973.7245 [ Wed Aug 18 22:56:13 CEST 2010 ]
5664: 0.0000 0.0000 execve("/usr/bin/i86/ksh93", 0x080471DC, 0x080471EC) argc = 3
5664: argv: sh -c /bin/echo hello world;/bin/date
5665: 0.0106 0.0106 execve("/bin/echo", 0x08067484, 0x080674F8) argc = 3
5665: argv: /bin/echo hello world
hello world
5664: 0.0126 0.0126 execve("/bin/date", 0x080674E0, 0x080674F8) argc = 1
5664: argv: /bin/date
Wed Aug 18 22:56:13 CEST 2010
If you want to correlate these execs with system() calls, you can use this command:
truss -t execve -f -u 'libc:system' yourProgram
eg:
$ cat a.c
main()
{
system("echo a b c");
system("pwd");
}
$ truss -t execve -f -u 'libc:system' ./a
20073: execve("a", 0x08047240, 0x08047248) argc = 1
20073/1#1: -> libc:system(0x8050a5c, 0x0)
20074/1: execve("/bin/sh", 0x080471BC, 0x08047248) argc = 3
a b c
20073/1#1: <- libc:system() = 0
20073/1#1: -> libc:system(0x8050a68, 0x0)
20076/1: execve("/bin/sh", 0x080471BC, 0x08047248) argc = 3
/tmp
20073/1#1: <- libc:system() = 0
Finally, if you are using Solaris 10 or newer, you can use DTrace for this task like this:
dtrace -Z -q -c yourProgram -n ' pid$target:libc:system:entry { printf("system(\"%s\")\n", copyinstr(arg0)); } '
which gives this output for the same "a" program:
a b c
/tmp
system("echo a b c")
system("pwd")
PS: By the way system() isn't a system call but a standard library function.
You can use truss or strace (Not sure which one comes with Solaris) to run the program and trace the calls to system.
For truss the relevant command will be something like truss -caf program_name