Bash quote behavior and sed - regex

I wrote a short bash script that is supposed to strip the leading tabs/spaces from a string:
#!/bin/bash
RGX='s/^[ \t]*//'
SED="sed '$RGX'"
echo " string" | $SED
It works from the command line, but the script gets this error:
sed: -e expression #1, char 1: unknown command: `''
My guess is that something is wrong with the quotes, but I'm not sure what.

Putting commands into variables and getting them back out intact is hard, because quoting doesn't work the way you expect (see BashFAQ #050, "I'm trying to put a command in a variable, but the complex cases always fail!"). There are several ways to deal with this:
1) Don't do it unless you really need to. Seriously, unless you have a good reason to put your command in a variable first, just execute it and don't deal with this messiness.
2) Don't use eval unless you really really really need to. eval has a well-deserved reputation as a source of nasty and obscure bugs. They can be avoided if you understand them well enough and take the necessary precautions to avert them, but this should really be a last resort.
3) If you really must define a command at one point and use it later, either define it as a function or an array. Here's how to do it with a function:
RGX='s/^[ \t]*//'
SEDCMD() { sed "$RGX"; }
echo " string" | SEDCMD
Here's the array version:
RGX='s/^[ \t]*//'
SEDCMD=(sed "$RGX")
echo " string" | "${SEDCMD[#]}"
The idiom "${SEDCMD[#]}" lets you expand an array, keeping each element a separate word, without any of the problems you're having.

It does. Try:
#!/bin/bash
RGX='s/^[ \t]*//'
#SED='$RGX'
echo " string" | sed "$RGX"
This works.
The issue you have is with quotes and spaces. Double quoted strings are passed as single arguments.

Add set -x to your script. You'll see that variables within a single-quote mark are not expanded.

+To expand on my comment above:
#!/bin/bash
RGX='s/^[[:space:]]+//'
SED="sed -r '$RGX'"
eval "printf \" \tstring\n\" | $SED"
Note that this also makes your regex an extended one, for no particular reason. :-)

Related

grep for a list of variables that are referenced by a script?

I have some scripts that use various variables, and I want to grep (from within bash on FreeBSD) each of them for the list of variables that are used by the script. These are not shell scripts, but the syntax of referencing a variable is similar to that used in bash. Specifically, a reference to a variable can be like:
$X, or
${X}
and the name of the variable ("X" in this case) can include alphanumerics and underscores. At this point I want to explicitly note that I imagine that bash itself probably has a more complicated set of possible ways to reference variables, but if so, I do not care about that for the purposes of this question.
I would like to find all variable names that are so referenced in a given file - just the name, not the entire line. So something of the form awesomegrepcmd filename | sort | uniq, or something like that.
Note that there may be multiple different variables referenced on a single line.
I have been fighting regex and escaping rules via largely ignorant stab-in-the-dark attempts (e.g. "OK, maybe I need to put TWO backslashes here, but only ONE there") for a while now, and googling to try to find others who have done this sort of thing, but have thus far been unable to accomplish what I want. How can this be done? Thanks.
EDIT: Example: Let's say the content of a file is this (just pseudocode, not meant as bash or anything):
SPECIFICS={ "be born", "die", "plant", "reap", "kill", "heal", "laugh", "weep" }
CHORUS_PREFIX="To every thing"
CHORUS_INFIX="there is a season"
CHORUS_SUFFIX="and a time to every purpose under heaven."
CHORUS_SEPARATOR=", turn, turn, turn, "
CHORUS = $CHORUS_PREFIX + ${CHORUS_SEPARATOR} + $CHORUS_INFIX + $CHORUS_SEPARATOR + ${CHORUS_SUFFIX}
SPECIFIC_PREFIX="A time to "
UNUSED=${SOMETHING}
echo $CHORUS
foreach SPECIFIC in $SPECIFICS {
echo $SPECIFIC_PREFIX + " " + ${SPECIFIC}
}
echo ${CHORUS}
Then the output I would want would be:
CHORUS
CHORUS_INFIX
CHORUS_PREFIX
CHORUS_SEPARATOR
CHORUS_SUFFIX
SOMETHING
SPECIFIC
SPECIFIC_PREFIX
SPECIFICS
A couple things to note:
${CHORUS} and $CHORUS (for example) both refer to the same variable, named CHORUS
The UNUSED variable is not in the output, because it is not actually used in the script (despite having been defined in it)
The SOMETHING variable, used only to define the unused UNUSED variable, is in the output, since it was used
The various CHORUS_xxx variables are only used in the one single line defining the CHORUS variable, but they are all present in the output.
Using an Extended regular expression:
me#lappy386:/tmp$ grep -Eoi '\$(\{[a-z0-9_]+\}|[a-z0-9_]+)' \
/tmp/example |tr -d '{}$'|sort|uniq
CHORUS
CHORUS_INFIX
CHORUS_PREFIX
CHORUS_SEPARATOR
CHORUS_SUFFIX
SOMETHING
SPECIFIC
SPECIFIC_PREFIX
SPECIFICS
Using grep and sed
grep -oE '\$[A-Za-z_]+|\${[A-Za-z_]+}' inputFile| sed -r 's/[${}]//g' | sort | uniq
Example :
$ grep -oE '\$[A-Za-z_]+|\${[A-Za-z_]+}' inputFile | sed -r 's/[${}]//g' | sort | uniq
CHORUS
CHORUS_INFIX
CHORUS_PREFIX
CHORUS_SEPARATOR
CHORUS_SUFFIX
SOMETHING
SPECIFIC
SPECIFICS
SPECIFIC_PREFIX

Bash: Pass all arguments exactly as they are to a function and prepend a flag on each of them

This seems like a relatively basic question, but I can't find it anywhere after an hour of searching. Many (there are a lot!) of the similar questions do not seem to hit the point.
I am writing a script ("vims") to use vim in a sed-like mode (so I can call normal vim commands on a stream input without actually opening vim), so I need to pass each argument to vim with a "-c" flag prepended to it. There are also many characters that need to be escaped (I need to pass regex expressions), so some of the usual methods on SO do not work.
Basically, when I write:
cat myfile.txt | vims ':%g/foo/exe "norm yyPImyfile: \<esc>\$dF,"' ':3p'
which are two command-line vim arguments to run on stdout,
I need these two single-quoted arguments to be passed exactly the way they are to my function vims(), which then tags each of them with a -c flag, so they are interpreted as commands in vim.
Here's what I've tried so far:
vims() {
vim - -nes -u NONE -c '$1' -c ':q!' | tail -n +2
}
This seems to work perfectly for a single command. No characters get escaped, and the "-c" flag is there.
Then, using the oft-duplicated question-answer, the "$#" trick, I tried:
vims() {
vim - -nes -u NONE $(for arg in "$#"; do echo -n " -c $arg "; done) -c ':q!' | tail -n +2
}
This seems to break the spaces within each string I pass it, so does not work. I also tried a few variations of the printf command, as suggested in other questions, but this has weird interactions with the vim command sequences. I've tried many other different backslash-quote-combinations in a perpetual edit-test loop, but have always found a quirk in my method.
What is the command sequence I am missing?
Add all the arguments to an array one at a time, then pass the entire array to vim with proper quoting to ensure whitespace is correctly preserved.
vims() {
local args=()
while (($# > 0)); do
args+=(-c "$1")
shift
done
vim - -nes -u NONE "${args[#]}" -c ':q!' | tail -n +2
}
As a rule of thumb, if you find yourself trying to escape things, add backslashes, use printf, etc., you are likely going down the wrong path. Careful use of quoting and arrays will cover most scenarios.

What is the difference b/w two sed commands below?

Information about the environment I am working in:
$ uname -a
AIX prd231 1 6 00C6B1F74C00
$ oslevel -s
6100-03-10-1119
Code Block A
( grep schdCycCleanup $DCCS_LOG_FILE | sed 's/[~]/ \
/g' | grep 'Move(s) Exist for cycle' | sed 's/[^0-9]*//g' ) > cycleA.txt
Code Block B
( grep schdCycCleanup $DCCS_LOG_FILE | sed 's/[~]/ \n/g' | grep 'Move(s) Exist for cycle' | sed 's/[^0-9]*//g' ) > cycleB.txt
I have two code blocks(shown above) that make use of sed to trim the input down to 6 digits but one command is behaving differently than I expected.
Sample of input for the two code blocks
Mar 25 14:06:16 prd231 ajbtux[33423660]: 20160325140616:~schd_cem_svr:1:0:SCHD-MSG-MOVEEXISTCYCLE:200705008:AUDIT:~schdCycCleanup - /apps/dccs/ajbtux/source/SCHD/schd_cycle_cleanup.c - line 341~ SCHD_CYCLE_CLEANUP - Move(s) Exist for cycle 389210~
I get the following output when the sample input above goes through the two code blocks.
cycleA.txt content
389210
cycleB.txt content
25140616231334236602016032514061610200705008341389210
I understand that my last piped sed command (sed 's/[^0-9]*//g') is deleting all characters other than numbers so I omitted it from the block codes and placed the output in two additional files. I get the following output.
cycleA1.txt content
SCHD_CYCLE_CLEANUP - Move(s) Exist for cycle 389210
cycleB1.txt content
Mar 25 15:27:58 prd231 ajbtux[33423660]: 20160325152758: nschd_cem_svr:1:0:SCHD-MSG-MOVEEXISTCYCLE:200705008:AUDIT: nschdCycCleanup - /apps/dccs/ajbtux/source/SCHD/schd_cycle_cleanup.c - line 341 n SCHD_CYCLE_CLEANUP - Move(s) Exist for cycle 389210 n
I can see that the first code block is removing every thing other that (SCHD_CYCLE_CLEANUP - Move(s) Exist for cycle 389210) and is using the tilde but the second code block is just replacing the tildes with the character n. I can also see that it is necessary in the first code block for a line break after this(sed 's/[~]/ ) and that is why I though having \n would simulate a line break but that is not the case. I think my different output results are because of the way regular expressions are being used. I have tried to look into regular expressions and searched about them on stackoverflow but did not obtain what I was looking for. Could someone explain how I can achieve the same result from code block B as code block A without having part of my code be on a second line?
Thank you in advance
This is an example of the XY problem (http://xyproblem.info/). You're asking for help to implement something that is the wrong solution to your problem. Why are you changing ~s to newlines, etc when all you need given your posted sample input and expected output is:
$ sed -n 's/.*schdCycCleanup.* \([0-9]*\).*/\1/p' file
389210
or:
$ awk -F'[ ~]' '/schdCycCleanup/{print $(NF-1)}' file
389210
If that's not all you need then please edit your question to clarify your requirements for WHAT you are trying to do (as opposed to HOW you are trying to do it) as your current approach is just wrong.
Etan Reisner's helpful answer explains the problem and offers a single-line solution based on an ANSI C-quoted string ($'...'), which is appropriate, given that you originally tagged your question bash.
(Ed Morton's helpful answer shows you how to bypass your problem altogether with a different approach that is both simpler and more efficient.)
However, it sounds like your shell is actually something different - presumably ksh88, an older version of the Korn shell that is the default sh on AIX 6.1 - in which such strings are not supported[1]
(ANSI C-quoted strings were introduced in ksh93, and are also supported not only in bash, but in zsh as well).
Thus, you have the following options:
With your current shell, you must stick with a two-line solution that contains an (\-escaped) actual newline, as in your code block A.
Note that $(printf '\n') to create a newline does not work, because command substitutions invariably trim all trailing newlines, resulting in the empty string in this case.
Use a more modern shell that supports ANSI C-quoted strings, and use Etan's answer. http://www.ibm.com/support/knowledgecenter/ssw_aix_61/com.ibm.aix.cmds3/ksh.htm tells me that ksh93 is available as an alternative shell on AIX 6.1, as /usr/bin/ksh93.
If feasible: install GNU sed, which natively understands escape sequences such as \n in replacement strings.
[1] As for what actually happens when you try echo 'foo~bar~baz' | sed $'s/[~]/\\\n/g' in a POSIX-like shell that does not support $'...': the $ is left as-is, because what follow is not a valid variable name, and sed ends up seeing literal $s/[~]/\\\n/g, where the $ is interpreted as a context address applying to the last input line - which doesn't make a difference here, because there is only 1 line. \\ is interpreted as plain \, and \n as plain n, effectively replacing ~ instances with literal \n sequences.
GNU sed handles \n in the replacement the way you expect.
OS X (and presumably BSD) sed does not. It treats it as a normal escaped character and just unescapes it to n. (Though I don't see this in the manual anywhere at the moment.)
You can use $'' quoting to use \n as a literal newline if you want though.
echo 'foo~bar~baz' | sed $'s/[~]/\\\n/g'

grep not matching strings when they come from a variable

I'm writing a script that is helping me process log files. In it, I have my grep flags stored in a variable. The flags and strings themselves work just fine, but when I pass them to grep using a variable, the parts of the string that use escaped characters don't produce any matches. See below:
grepvars="-B4 -Psihe 'caused\sby|unable|fault|error|deadlock|checkpoint|corrupt|fail|exception|fatal|severe|\tat\s'"
grep -B4 -Psihe 'caused\sby|unable|fault|error|deadlock|checkpoint|corrupt|fail|exception|fatal|severe|\tat\s' adapter_15.log > adapter_15-error1.log
grep $grepvars adapter_15.log > adapter_15-error2.log
wc -l *-error?.log
51398 adapter_15-error1.log
25032 adapter_15-error2.log
As you can see, the \tat\s part does not produce matches when passed through a variable to grep. What that is supposed to match is a (literal tab)at(literal space). Although this works correctly without using a variable, I'd rather use one since it makes my multiple grep calls easier to manage. What do I have to do to ensure that grep will perform this match correctly when passed through a variable?
After not having any sort of luck with this, I found a workaround: create a function and call it when needed. Here's what I came up with:
grep4j () {
unset IFS
nice -n 15 grep -B3 -Psihe '\tat\s|caused\sby|unable|fault|error|deadlock|checkpoint|corrupt|fail|exception|fatal|severe' $1
IFS=$'\n'
}
Yes, I did try unsetting IFS before and after the grep strings that were using the varaible. It didn't work (and I need it to be set for other things to work). Doing the function like this met my needs, and maybe it will help someone else as well. Cheers!
In case you're curious, this is designed to get relevant messages out of log4j-formatted logs. It saves me a lot of time.
If you're storing all the options of grep in a string then I guess you need to use evil eval:
str="grep $grepvars adapter_15.log > adapter_15-error2.log"
eval "$str"
It may be easier to stuff options into the environment variable GREP_OPTIONS, and patterns into a file, like so:
grep -f <file-with-patterns> ...

Unpredictable behavior in sed interpreters output from multiple expressions

Why does GNU sed sometimes handle substitution with piped output into another sed instance differently than when multiple expressions are used with the same one?
Specifically, for msys/mingw sessions, in the /etc/profile script I have a series of manipulations that "rearrange" the order of the environment variable PATH and removes duplicate entries.
Take note that while normally sed treats each line of input seperately (and therfore can't easily substitute '\n' in the input stream, this sed statement does a substitution of ':' with '\n', so it still handles the entire input stream like one line (with '\n' characters in it). This behavior stays true for all sed expressions in the same instance of sed (basically until you redirect or pipe the output into another program).
Here's the obligatory specs:
Windows 7 Professional Service Pack 1
HP Pavilion dv7-6b78us
16 GB DDR3 RAM
MinGW-w64 (x86_64-w64-mingw32-gcc-4.7.1.2-release-win64-rubenvb) mounted on /mingw/
MSYS (20111123) mounted on / and on /usr/
$ uname -a="MINGW32_NT-6.1 CHRIV-L09 1.0.17(0.48/3/2) 2011-04-24 23:39 i686 Msys"
$ which sed="/bin/sed.exe" (it's part of MSYS)
$ sed --version="GNU sed version 4.2.1"
This is the contents of PATH before manipulation:
PATH='.:/usr/local/bin:/mingw/bin:/bin:/c/PHP:/c/Program Files (x86)/HP SimplePass 2011/x64:/c/Program Files (x86)/HP SimplePass 2011:/c/Windows/system32:/c/Windows:/c/Windows/System32/Wbem:/c/Windows/System32/WindowsPowerShell/v1.0:/c/si:/c/android-sdk:/c/android-sdk/tools:/c/android-sdk/platform-tools:/c/Program Files (x86)/WinMerge:/c/ntp/bin:/c/GnuWin32/bin:/c/Program Files/MySQL/MySQL Server5.5/bin:/c/Program Files (x86)/WinSCP:/c/Program Files (x86)/Overlook Fing 2.1/bin:/c/Program Files/7-zip:.:/c/Program Files/TortoiseGit/bin:/c/Program Files (x86)/Git/bin:/c/VS10/VC/bin/x86_amd64:/c/VS10/VC/bin/amd64:/c/VS10/VC/bin'
This is an excerpt of /etc/profile (where I have begun the PATH manipulation):
set | grep --color=never ^PATH= | sed -e "s#^PATH=##" -e "s#'##g" \
-e "s/:/\n/g" -e "s#\n\(/[^\n]*tortoisegit[^\n]*\)#\nZ95-\1#ig" \
-e "s#\n\(/[a-z]/win\)#\nZ90-\1#ig" -e "s#\n\(/[a-z]/p\)#\nZ70-\1#ig" \
-e "s#\.\n#A10-.\n#g" -e "s#\n\(/usr/local/bin\)#\nA15-\1#ig" \
-e "s#\n\(/bin\)#\nA20-\1#ig" -e "s#\n\(/mingw/bin\)#\nA25-\1#ig" \
-e "s#\n\(/[a-z]/vs10/vc/bin\)#\nA40-\1#ig"
The last sed expression in that line basically looks for lines that begins with "/c/VS10/VC/bin" and prepends them with 'A40-' like this:
...
/c/si
A40-/c/VS10/VC/bin
A40-/c/VS10/VC/bin/amd64
A40-/c/VS10/VC/bin/x86_amd64
/c/GnuWin32/bin
...
I like my sed expressions to be flexible (path structures change), but I don't want it to match the lines that end with amd64 or x86_amd64 (those are going to have a different string prepended). So I change the last expression to:
-e "s#\n\(/[a-z]/vs10/vc/bin\)\n#\nA40-\1\n#ig"
This works:
...
/c/si
A40-/c/VS10/VC/bin
/c/VS10/VC/bin/amd64
/c/VS10/VC/bin/x86_amd64
/c/GnuWin32/bin
...
Then, (to match any "line" matching the pseudocode "/x/.../bin") I change the last expression to:
-e "s#\n\(/[a-z]/.*/bin\)\n#\nA40-\1\n#ig"
Which produces:
...
/c/si
/c/VS10/VC/bin
/c/VS10/VC/bin/amd64
/c/VS10/VC/bin/x86_amd64
/c/GnuWin32/bin
...
??? - sed didn't match any character ('.') any number of times ('*') in the middle of the line ???
But, if I pipe the output into a different instance of sed (and compensate for sed handling each "line" seperately) like this:
| sed -e "s#^\(/[a-z]/.*/bin\)$#A40-\1#ig"
I get:
sed: -e expression #1, char 30: unterminated `s' command
??? How is that unterminated? It's got all three '#' characters after the s, has the modifiers 'i' and 'g' after the third '#', and the entire expression is in double quotes ('"'). Also, there are no escapes ('\') immediately preceding the delimiters, and the delimiter is not a part of either the search or the replacement. Let's try a different delimiter than '#', like '~':
I use:
| sed -e "s~^(/[a-z]/.*/bin)$~A40-\1~ig"
and, I get:
...
/c/si
A40-/c/VS10/VC/bin
/c/VS10/VC/bin/amd64
/c/VS10/VC/bin/x86_amd64
A40-/c/GnuWin32/bin
...
And, that is correct! The only thing I changed was the delimeter from '#' to '~' and it worked ???
This is not (even close to) the first time that sed has produced unexplainable results for me.
Why, oh, why, is sed NOT matching syntax in an expression in the same instance, but IS matching when piped into another instance of sed?
And, why, oh, why, do I have to use a different delimeter when I do this (in order not to get an "unterminated 's' command"?
And the real reason I'm asking: Is this a bug in sed, OR, is it correct behavior that I don't understand (and if so, can someone explain why this behavior is correct)? I want to know if I'm doing it wrong, or if I need a different/better tool (or both, they don't have to be mutually exclusive).
I'll mark a response it as the answer if someone can either prove why this behavior is correct or if they can prove why it is a bug. I'll gladly accept any advice about other tools or different methods of using sed, but those won't answer the question.
I'm going to have to get better at other text processors (like awk, tr, etc.) because sed is costing me too much time with it's unexplainable results.
P.S. This is not the complete logic of my PATH manipulation. The complete logic also finishes prepending all the lines with values from 'A00-' to 'Z99-', then pipes that output into 'sort -u -f' and back into sed to remove those same prefixes on each line and to convert the lines ('\n') back into colons (':'). Then "export PATH='" is prepended to the single line and "'" is appended to it. Then that output is redirected into a temporary file. Next, that temporary file is sourced. And, finally, that temporary file is removed.
The /etc/profile script also displays the contents of PATH before and after sorting (in case it screwed up the path).
P.P.S. I'm sure there is a much better way to do this. It started as some very simple sed manipulations, and grew into the monster you see here. Even if there is a better way, I still need to know why sed is giving me these results.
sed -e "s#^\(/[a-z]/.*/bin\)$#A40-\1#ig"
is unterminated because the shell is trying to expand "$#A". Put your expressions in single quotes to avoid this.
The expression
-e "s#\n\(/[a-z]/.*/bin\)\n#\nA40-\1\n#ig"
fails, or doesn't do what you expect, because . matches the newline in a multi-line expression. Check your whole output, the A40- is at the very beginning. Change it to
-e "s#\n\(/[a-z]/[^\n]*/bin\)\n#\nA40-\1\n#ig"
and it might be more what you expect. This may very well be the case with most of your issues with multi-line modifications.
You can also put the statements, one per line, into a standalone file and invoke sed with sed -f editscript. It might make maintenance of this a bit easier.