grep not matching strings when they come from a variable - regex

I'm writing a script that is helping me process log files. In it, I have my grep flags stored in a variable. The flags and strings themselves work just fine, but when I pass them to grep using a variable, the parts of the string that use escaped characters don't produce any matches. See below:
grepvars="-B4 -Psihe 'caused\sby|unable|fault|error|deadlock|checkpoint|corrupt|fail|exception|fatal|severe|\tat\s'"
grep -B4 -Psihe 'caused\sby|unable|fault|error|deadlock|checkpoint|corrupt|fail|exception|fatal|severe|\tat\s' adapter_15.log > adapter_15-error1.log
grep $grepvars adapter_15.log > adapter_15-error2.log
wc -l *-error?.log
51398 adapter_15-error1.log
25032 adapter_15-error2.log
As you can see, the \tat\s part does not produce matches when passed through a variable to grep. What that is supposed to match is a (literal tab)at(literal space). Although this works correctly without using a variable, I'd rather use one since it makes my multiple grep calls easier to manage. What do I have to do to ensure that grep will perform this match correctly when passed through a variable?

After not having any sort of luck with this, I found a workaround: create a function and call it when needed. Here's what I came up with:
grep4j () {
unset IFS
nice -n 15 grep -B3 -Psihe '\tat\s|caused\sby|unable|fault|error|deadlock|checkpoint|corrupt|fail|exception|fatal|severe' $1
IFS=$'\n'
}
Yes, I did try unsetting IFS before and after the grep calls that were using the variable. It didn't work (and I need it to be set for other things to work). Doing the function like this met my needs, and maybe it will help someone else as well. Cheers!
In case you're curious, this is designed to get relevant messages out of log4j-formatted logs. It saves me a lot of time.

If you're storing all the options of grep in a string then I guess you need to use evil eval:
str="grep $grepvars adapter_15.log > adapter_15-error2.log"
eval "$str"

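It may be easier to put options into the environment variable GREP_OPTIONS (note that GNU grep has treated GREP_OPTIONS as obsolescent since version 2.21, and newer releases ignore it), and patterns into a file, like so: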
grep -f <file-with-patterns> ...
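A likely explanation for the missing matches, assuming default word splitting: the single quotes stored inside the string are not quote syntax when the variable is expanded. They reach grep as literal characters, so the first alternative becomes 'caused\sby (with a leading quote) and the last becomes \tat\s' (with a trailing quote), and only the alternatives in the middle can match. A bash array avoids this without eval. The sketch below is one way to do it; the names grep_opts and grep_log are made up for illustration, and -P needs a grep built with PCRE support.

```shell
#!/usr/bin/env bash
# One array element per grep argument. The quotes delimit the element
# here and are not stored in it.
grep_opts=(-B4 -Psihe 'caused\sby|unable|fault|error|deadlock|checkpoint|corrupt|fail|exception|fatal|severe|\tat\s')

# Hypothetical wrapper: run grep with the shared options on one log file.
# "${grep_opts[@]}" expands to exactly one word per element, whatever IFS is.
grep_log() {
  grep "${grep_opts[@]}" "$1"
}

# Example call, using the file names from the question:
# grep_log adapter_15.log > adapter_15-error2.log
```

Because the array expansion does not depend on IFS, this should also work in a script that keeps IFS set to a newline.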

Related

grep for a list of variables that are referenced by a script?

I have some scripts that use various variables, and I want to grep (from within bash on FreeBSD) each of them for the list of variables that are used by the script. These are not shell scripts, but the syntax of referencing a variable is similar to that used in bash. Specifically, a reference to a variable can be like:
$X, or
${X}
and the name of the variable ("X" in this case) can include alphanumerics and underscores. At this point I want to explicitly note that I imagine that bash itself probably has a more complicated set of possible ways to reference variables, but if so, I do not care about that for the purposes of this question.
I would like to find all variable names that are so referenced in a given file - just the name, not the entire line. So something of the form awesomegrepcmd filename | sort | uniq, or something like that.
Note that there may be multiple different variables referenced on a single line.
I have been fighting regex and escaping rules via largely ignorant stab-in-the-dark attempts (e.g. "OK, maybe I need to put TWO backslashes here, but only ONE there") for a while now, and googling to try to find others who have done this sort of thing, but have thus far been unable to accomplish what I want. How can this be done? Thanks.
EDIT: Example: Let's say the content of a file is this (just pseudocode, not meant as bash or anything):
SPECIFICS={ "be born", "die", "plant", "reap", "kill", "heal", "laugh", "weep" }
CHORUS_PREFIX="To every thing"
CHORUS_INFIX="there is a season"
CHORUS_SUFFIX="and a time to every purpose under heaven."
CHORUS_SEPARATOR=", turn, turn, turn, "
CHORUS = $CHORUS_PREFIX + ${CHORUS_SEPARATOR} + $CHORUS_INFIX + $CHORUS_SEPARATOR + ${CHORUS_SUFFIX}
SPECIFIC_PREFIX="A time to "
UNUSED=${SOMETHING}
echo $CHORUS
foreach SPECIFIC in $SPECIFICS {
echo $SPECIFIC_PREFIX + " " + ${SPECIFIC}
}
echo ${CHORUS}
Then the output I would want would be:
CHORUS
CHORUS_INFIX
CHORUS_PREFIX
CHORUS_SEPARATOR
CHORUS_SUFFIX
SOMETHING
SPECIFIC
SPECIFIC_PREFIX
SPECIFICS
A couple things to note:
${CHORUS} and $CHORUS (for example) both refer to the same variable, named CHORUS
The UNUSED variable is not in the output, because it is not actually used in the script (despite having been defined in it)
The SOMETHING variable, used only to define the unused UNUSED variable, is in the output, since it was used
The various CHORUS_xxx variables are only used in the one single line defining the CHORUS variable, but they are all present in the output.
Using an Extended regular expression:
me@lappy386:/tmp$ grep -Eoi '\$(\{[a-z0-9_]+\}|[a-z0-9_]+)' \
/tmp/example |tr -d '{}$'|sort|uniq
CHORUS
CHORUS_INFIX
CHORUS_PREFIX
CHORUS_SEPARATOR
CHORUS_SUFFIX
SOMETHING
SPECIFIC
SPECIFIC_PREFIX
SPECIFICS
Using grep and sed
grep -oE '\$[A-Za-z0-9_]+|\$\{[A-Za-z0-9_]+\}' inputFile | sed -E 's/[${}]//g' | sort | uniq
Example:
$ grep -oE '\$[A-Za-z0-9_]+|\$\{[A-Za-z0-9_]+\}' inputFile | sed -E 's/[${}]//g' | sort | uniq
CHORUS
CHORUS_INFIX
CHORUS_PREFIX
CHORUS_SEPARATOR
CHORUS_SUFFIX
SOMETHING
SPECIFIC
SPECIFICS
SPECIFIC_PREFIX
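Both answers follow the same three steps: print only the matching references with grep -o, strip the $, { and } characters, then sort and de-duplicate. A minimal sketch of that pipeline wrapped in a function follows; the name list_vars is hypothetical, and LC_ALL=C is an assumption added to make the sort order predictable across locales.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every variable name referenced as $NAME or ${NAME}
# in the file given as the first argument.
list_vars() {
  # -o prints each match on its own line, so several references on one
  # input line are all reported.
  grep -oE '\$(\{[A-Za-z0-9_]+\}|[A-Za-z0-9_]+)' "$1" |
    tr -d '${}' |        # keep only the bare name
    LC_ALL=C sort -u     # sort and drop duplicates
}

# Example call:
# list_vars /tmp/example
```

Definitions such as UNUSED=... are not reported because they carry no leading $, which matches the behaviour asked for in the question.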

Bash: Pass all arguments exactly as they are to a function and prepend a flag on each of them

This seems like a relatively basic question, but I can't find it anywhere after an hour of searching. Many (there are a lot!) of the similar questions do not seem to hit the point.
I am writing a script ("vims") to use vim in a sed-like mode (so I can call normal vim commands on a stream input without actually opening vim), so I need to pass each argument to vim with a "-c" flag prepended to it. There are also many characters that need to be escaped (I need to pass regex expressions), so some of the usual methods on SO do not work.
Basically, when I write:
cat myfile.txt | vims ':%g/foo/exe "norm yyPImyfile: \<esc>\$dF,"' ':3p'
which passes two vim commands to run on the piped input,
I need these two single-quoted arguments to be passed exactly the way they are to my function vims(), which then tags each of them with a -c flag, so they are interpreted as commands in vim.
Here's what I've tried so far:
vims() {
vim - -nes -u NONE -c "$1" -c ':q!' | tail -n +2
}
This seems to work perfectly for a single command. No characters get escaped, and the "-c" flag is there.
Then, using the oft-duplicated question-answer, the "$@" trick, I tried:
vims() {
vim - -nes -u NONE $(for arg in "$@"; do echo -n " -c $arg "; done) -c ':q!' | tail -n +2
}
This seems to break the spaces within each string I pass it, so does not work. I also tried a few variations of the printf command, as suggested in other questions, but this has weird interactions with the vim command sequences. I've tried many other different backslash-quote-combinations in a perpetual edit-test loop, but have always found a quirk in my method.
What is the command sequence I am missing?
Add all the arguments to an array one at a time, then pass the entire array to vim with proper quoting to ensure whitespace is correctly preserved.
vims() {
local args=()
while (($# > 0)); do
args+=(-c "$1")
shift
done
vim - -nes -u NONE "${args[@]}" -c ':q!' | tail -n +2
}
As a rule of thumb, if you find yourself trying to escape things, add backslashes, use printf, etc., you are likely going down the wrong path. Careful use of quoting and arrays will cover most scenarios.
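To check what vim would actually receive, the same array-building loop can be pointed at a helper that prints one argument per line instead of at vim. This is only a sketch for inspection; print_args and prefix_c are hypothetical names, not part of the answer above.

```shell
#!/usr/bin/env bash
# Print each argument on its own line, wrapped in <>, so word boundaries are visible.
print_args() { printf '<%s>\n' "$@"; }

# Prepend -c to every argument, keeping each argument as a single word.
prefix_c() {
  local args=() cmd
  for cmd in "$@"; do
    args+=(-c "$cmd")
  done
  print_args "${args[@]}"
}

prefix_c ':%g/foo/exe "norm yyPImyfile: \<esc>\$dF,"' ':3p'
# <-c>
# <:%g/foo/exe "norm yyPImyfile: \<esc>\$dF,">
# <-c>
# <:3p>
```

Spaces, double quotes and backslashes inside each command survive unchanged because nothing is re-parsed after the array is built.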

Unpredictable behavior in sed interpreters output from multiple expressions

Why does GNU sed sometimes handle substitution with piped output into another sed instance differently than when multiple expressions are used with the same one?
Specifically, for msys/mingw sessions, in the /etc/profile script I have a series of manipulations that "rearrange" the order of the environment variable PATH and remove duplicate entries.
Take note that while sed normally treats each line of input separately (and therefore can't easily substitute '\n' in the input stream), this sed statement substitutes ':' with '\n', so it still handles the entire input stream as one line (with '\n' characters in it). This stays true for all sed expressions in the same instance of sed (basically until you redirect or pipe the output into another program).
Here's the obligatory specs:
Windows 7 Professional Service Pack 1
HP Pavilion dv7-6b78us
16 GB DDR3 RAM
MinGW-w64 (x86_64-w64-mingw32-gcc-4.7.1.2-release-win64-rubenvb) mounted on /mingw/
MSYS (20111123) mounted on / and on /usr/
$ uname -a="MINGW32_NT-6.1 CHRIV-L09 1.0.17(0.48/3/2) 2011-04-24 23:39 i686 Msys"
$ which sed="/bin/sed.exe" (it's part of MSYS)
$ sed --version="GNU sed version 4.2.1"
This is the contents of PATH before manipulation:
PATH='.:/usr/local/bin:/mingw/bin:/bin:/c/PHP:/c/Program Files (x86)/HP SimplePass 2011/x64:/c/Program Files (x86)/HP SimplePass 2011:/c/Windows/system32:/c/Windows:/c/Windows/System32/Wbem:/c/Windows/System32/WindowsPowerShell/v1.0:/c/si:/c/android-sdk:/c/android-sdk/tools:/c/android-sdk/platform-tools:/c/Program Files (x86)/WinMerge:/c/ntp/bin:/c/GnuWin32/bin:/c/Program Files/MySQL/MySQL Server5.5/bin:/c/Program Files (x86)/WinSCP:/c/Program Files (x86)/Overlook Fing 2.1/bin:/c/Program Files/7-zip:.:/c/Program Files/TortoiseGit/bin:/c/Program Files (x86)/Git/bin:/c/VS10/VC/bin/x86_amd64:/c/VS10/VC/bin/amd64:/c/VS10/VC/bin'
This is an excerpt of /etc/profile (where I have begun the PATH manipulation):
set | grep --color=never ^PATH= | sed -e "s#^PATH=##" -e "s#'##g" \
-e "s/:/\n/g" -e "s#\n\(/[^\n]*tortoisegit[^\n]*\)#\nZ95-\1#ig" \
-e "s#\n\(/[a-z]/win\)#\nZ90-\1#ig" -e "s#\n\(/[a-z]/p\)#\nZ70-\1#ig" \
-e "s#\.\n#A10-.\n#g" -e "s#\n\(/usr/local/bin\)#\nA15-\1#ig" \
-e "s#\n\(/bin\)#\nA20-\1#ig" -e "s#\n\(/mingw/bin\)#\nA25-\1#ig" \
-e "s#\n\(/[a-z]/vs10/vc/bin\)#\nA40-\1#ig"
The last sed expression in that line basically looks for lines that begin with "/c/VS10/VC/bin" and prepends 'A40-' to them, like this:
...
/c/si
A40-/c/VS10/VC/bin
A40-/c/VS10/VC/bin/amd64
A40-/c/VS10/VC/bin/x86_amd64
/c/GnuWin32/bin
...
I like my sed expressions to be flexible (path structures change), but I don't want this one to match the lines that end with amd64 or x86_amd64 (those are going to have a different string prepended). So I change the last expression to:
-e "s#\n\(/[a-z]/vs10/vc/bin\)\n#\nA40-\1\n#ig"
This works:
...
/c/si
A40-/c/VS10/VC/bin
/c/VS10/VC/bin/amd64
/c/VS10/VC/bin/x86_amd64
/c/GnuWin32/bin
...
Then, (to match any "line" matching the pseudocode "/x/.../bin") I change the last expression to:
-e "s#\n\(/[a-z]/.*/bin\)\n#\nA40-\1\n#ig"
Which produces:
...
/c/si
/c/VS10/VC/bin
/c/VS10/VC/bin/amd64
/c/VS10/VC/bin/x86_amd64
/c/GnuWin32/bin
...
??? - sed didn't match any character ('.') any number of times ('*') in the middle of the line ???
But, if I pipe the output into a different instance of sed (and compensate for sed handling each "line" separately) like this:
| sed -e "s#^\(/[a-z]/.*/bin\)$#A40-\1#ig"
I get:
sed: -e expression #1, char 30: unterminated `s' command
??? How is that unterminated? It's got all three '#' characters after the s, has the modifiers 'i' and 'g' after the third '#', and the entire expression is in double quotes ('"'). Also, there are no escapes ('\') immediately preceding the delimiters, and the delimiter is not a part of either the search or the replacement. Let's try a different delimiter than '#', like '~':
I use:
| sed -e "s~^\(/[a-z]/.*/bin\)$~A40-\1~ig"
and, I get:
...
/c/si
A40-/c/VS10/VC/bin
/c/VS10/VC/bin/amd64
/c/VS10/VC/bin/x86_amd64
A40-/c/GnuWin32/bin
...
And, that is correct! The only thing I changed was the delimiter from '#' to '~' and it worked ???
This is not (even close to) the first time that sed has produced unexplainable results for me.
Why, oh, why, is sed NOT matching syntax in an expression in the same instance, but IS matching when piped into another instance of sed?
And, why, oh, why, do I have to use a different delimiter when I do this (in order not to get an "unterminated 's' command")?
And the real reason I'm asking: Is this a bug in sed, OR, is it correct behavior that I don't understand (and if so, can someone explain why this behavior is correct)? I want to know if I'm doing it wrong, or if I need a different/better tool (or both, they don't have to be mutually exclusive).
I'll mark a response as the answer if someone can prove either why this behavior is correct or why it is a bug. I'll gladly accept any advice about other tools or different methods of using sed, but those won't answer the question.
I'm going to have to get better at other text processors (like awk, tr, etc.) because sed is costing me too much time with its unexplainable results.
P.S. This is not the complete logic of my PATH manipulation. The complete logic also finishes prepending all the lines with values from 'A00-' to 'Z99-', then pipes that output into 'sort -u -f' and back into sed to remove those same prefixes on each line and to convert the lines ('\n') back into colons (':'). Then "export PATH='" is prepended to the single line and "'" is appended to it. Then that output is redirected into a temporary file. Next, that temporary file is sourced. And, finally, that temporary file is removed.
The /etc/profile script also displays the contents of PATH before and after sorting (in case it screwed up the path).
P.P.S. I'm sure there is a much better way to do this. It started as some very simple sed manipulations, and grew into the monster you see here. Even if there is a better way, I still need to know why sed is giving me these results.
sed -e "s#^\(/[a-z]/.*/bin\)$#A40-\1#ig"
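is unterminated because, inside double quotes, the shell expands $# (the number of positional parameters) before sed ever sees the expression, so the second delimiter disappears. Put your expressions in single quotes to avoid this.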
The expression
-e "s#\n\(/[a-z]/.*/bin\)\n#\nA40-\1\n#ig"
fails, or doesn't do what you expect, because . matches the newline in a multi-line expression. Check your whole output, the A40- is at the very beginning. Change it to
-e "s#\n\(/[a-z]/[^\n]*/bin\)\n#\nA40-\1\n#ig"
and it might be more what you expect. This may very well be the case with most of your issues with multi-line modifications.
You can also put the statements, one per line, into a standalone file and invoke sed with sed -f editscript. It might make maintenance of this a bit easier.
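The quoting problem can be seen without running sed at all by printing the script text that each quoting style produces. The sketch below uses a hypothetical function, quote_demo, called with no arguments so that $# is 0 inside it. It also suggests why '~' appeared to fix things: "$~" is not a shell parameter, so it is left alone inside double quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the same sed script with both quoting styles.
# Inside a function called with no arguments, $# expands to 0.
quote_demo() {
  double_quoted="s#^\(/[a-z]/.*/bin\)$#A40-\1#ig"   # the shell rewrites $# here
  single_quoted='s#^\(/[a-z]/.*/bin\)$#A40-\1#ig'   # passed through untouched
}

quote_demo
printf 'double: %s\n' "$double_quoted"
printf 'single: %s\n' "$single_quoted"
# double: s#^\(/[a-z]/.*/bin\)0A40-\1#ig
# single: s#^\(/[a-z]/.*/bin\)$#A40-\1#ig
```

The double-quoted version contains only two '#' characters, which is exactly what sed reports as an unterminated `s' command.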

Bash quote behavior and sed

I wrote a short bash script that is supposed to strip the leading tabs/spaces from a string:
#!/bin/bash
RGX='s/^[ \t]*//'
SED="sed '$RGX'"
echo " string" | $SED
It works from the command line, but the script gets this error:
sed: -e expression #1, char 1: unknown command: `''
My guess is that something is wrong with the quotes, but I'm not sure what.
Putting commands into variables and getting them back out intact is hard, because quoting doesn't work the way you expect (see BashFAQ #050, "I'm trying to put a command in a variable, but the complex cases always fail!"). There are several ways to deal with this:
1) Don't do it unless you really need to. Seriously, unless you have a good reason to put your command in a variable first, just execute it and don't deal with this messiness.
2) Don't use eval unless you really really really need to. eval has a well-deserved reputation as a source of nasty and obscure bugs. They can be avoided if you understand them well enough and take the necessary precautions to avert them, but this should really be a last resort.
3) If you really must define a command at one point and use it later, either define it as a function or an array. Here's how to do it with a function:
RGX='s/^[ \t]*//'
SEDCMD() { sed "$RGX"; }
echo " string" | SEDCMD
Here's the array version:
RGX='s/^[ \t]*//'
SEDCMD=(sed "$RGX")
echo " string" | "${SEDCMD[@]}"
The idiom "${SEDCMD[@]}" lets you expand an array, keeping each element a separate word, without any of the problems you're having.
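To see where the original error comes from, it can help to print the words that the unquoted $SED expansion produces. This is a small sketch with a hypothetical helper, show_args; it assumes the default IFS and that no file names in the current directory happen to match the resulting words.

```shell
#!/usr/bin/env bash
# Print each argument on its own line, wrapped in <>.
show_args() { printf '<%s>\n' "$@"; }

RGX='s/^[ \t]*//'
SED="sed '$RGX'"

# Unquoted expansion: split on whitespace, and the stored quotes stay literal.
show_args $SED
# <sed>
# <'s/^[>
# <\t]*//'>

# Passing the pattern as its own quoted word keeps it intact.
show_args sed "$RGX"
# <sed>
# <s/^[ \t]*//>
```

sed therefore receives a script whose first character is a single quote, which matches the "unknown command" message at char 1.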
It does. Try:
#!/bin/bash
RGX='s/^[ \t]*//'
#SED='$RGX'
echo " string" | sed "$RGX"
This works.
The issue you have is with quotes and spaces. Double quoted strings are passed as single arguments.
Add set -x to your script. You'll see that variables within a single-quote mark are not expanded.
To expand on my comment above:
#!/bin/bash
RGX='s/^[[:space:]]+//'
SED="sed -r '$RGX'"
eval "printf \" \tstring\n\" | $SED"
Note that this also makes your regex an extended one, for no particular reason. :-)

Extracting username from UNIX path using Regex

I need to get a username from an Unix path with this format:
/home/users/myusername/project/number/files
I just want "myusername". I've been trying for almost an hour and I'm completely clueless.
Any idea?
Thanks!
Maybe just /home/users/([a-zA-Z0-9_\-]*)/.*?
Note that the critical part [a-zA-Z0-9_\-]* has to contain all valid characters for unix usernames. I took from here, that a username should only contain digits, characters, dashes and underscores.
Also note that the extracted username is not the whole matching, but the first group (indicated by (...)).
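If the path is already in a shell variable, bash can apply a regex like this directly and expose the first group through BASH_REMATCH. A minimal sketch, assuming bash and the same character set for usernames as above; the function name user_from_path_regex is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path component that follows /home/users/.
user_from_path_regex() {
  # Keeping the regex in a variable avoids quoting surprises inside [[ ]].
  local re='^/home/users/([A-Za-z0-9_-]+)/'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"   # first capture group, not the whole match
  else
    return 1                             # path does not have the expected shape
  fi
}

user_from_path_regex /home/users/myusername/project/number/files
# myusername
```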
The best answer depends on what you are trying to achieve. If you want to know which user owns that file, you can use the stat command. Its syntax unfortunately differs between operating systems, but the following two commands work:
Mac OS X
stat -f '%Su' /home/users/myusername/project/number/files
Redhat/Fedora/Centos
stat -c '%U' /home/users/myusername/project/number/files
If you really do want the string following /home/users, then either of the regexes provided above will do that. You could use one in a bash script as follows (Mac OS X):
USERNAME=$(echo '/home/users/myusername/project/number/files' | \
sed -E -e 's!^/home/users/([^/]+)/.*$!\1!g')
Check http://rubular.com/r/84zwJmV62G. The first capture group, not the entire match, is the username.
In a Bourne-style shell, something like:
string="/home/users/STRINGWEWANT/some/subdir/here"
echo "$string" | awk -F/ '{print $4}'
would be one option, assuming the username is always the third component of the path (the fourth awk field, because the leading slash produces an empty first field). There are more lightweight options that use only shell builtins:
echo "${path#*users/}"
strips everything up to and including 'users/', and
echo "${y%%/*}"
strips everything from the first remaining slash onwards.
So to put it all together:
path="/home/users/STRINGWEWANT/some/other/dirs"
y=${path#*users/} && echo "${y%%/*}"
STRINGWEWANT
Also check out the bash manpage and search for "Parameter Expansion".
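The two parameter expansions can be combined in a small function. This is a sketch under the assumption that the path always starts with /home/users/; the function name user_from_path_expansion is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the username using only parameter expansion.
user_from_path_expansion() {
  local rest=${1#/home/users/}   # remove the fixed prefix from the front
  printf '%s\n' "${rest%%/*}"    # remove the longest suffix starting at a slash
}

user_from_path_expansion /home/users/myusername/project/number/files
# myusername
```

No external process is started, which makes this a reasonable choice inside loops over many paths.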
(\/home\/users\/)([^\/]+)
The 2nd capture group (index 1) will be myusername