Bash variable search and replace instead of sed - regex

See Code Review
See Github Project
I need to parse out instances of +word+ line by line (replace +word+ with blank). I'm currently using the following (working) sed regex:
newLine=$(echo "$line" | sed "s/+[a-Z]\++//g")
This violates SC2001 according to ShellCheck:
SC2001: See if you can use ${variable//search/replace} instead.
I've attempted several variations without success (The string "+word+" remains in the output):
newLine=$(line//+[a-Z]+/)
newLine=$(line/+[a-Z]+//)
newLine=$(line/+[a-Z]\++/)
newLine=${line//+[a-Z]+/}
and more..
I've heard that in some cases sed is necessary, but I would like to use Bash's built-in find and replace if possible.

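The substitution in parameter expansion doesn't use regular expressions, but patterns. To get closer to regular expressions, you can turn on extended patterns. Note that [a-Z] is not a reliable range (its meaning depends on the locale, and it is an invalid range in ASCII order), so [[:alpha:]] is used here instead:
shopt -s extglob
new_line=${line//++([[:alpha:]])+}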
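To see the pattern in action, here is a minimal sketch. The sample line is made up for illustration (it is not from the question), and [[:alpha:]] is used for the letters to avoid locale-dependent ranges.

```shell
#!/usr/bin/env bash
# Enable extended patterns such as +(...), meaning "one or more of".
shopt -s extglob

# Hypothetical sample input; not taken from the question.
line='keep +word+ this and +other+ that'

# Literal '+', then one or more letters, then literal '+'.
# The '//' form replaces every match; the empty replacement deletes it.
new_line=${line//++([[:alpha:]])+}

printf '%s\n' "$new_line"   # prints: keep  this and  that
```

The surrounding spaces are left alone, which is why two spaces remain where each +word+ was.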

Related

Bash variable substitution with a regex not working as expected

Given a bash variable holding the following string:
INPUT="Cookie: cf_clearance=foo; __cfduid=bar;"
Why is the substitution ${INPUT/cf_clearance=[^;]*;/} producing the output: Cookie: instead of what I'd expect: Cookie: __cfduid=bar;
Testing the same regex in online regex validators confirms that cf_clearance=[^;]*; should match cf_clearance=foo; only, and not the rest of the string.
What am I doing wrong here?
Use the actual regular-expression matching features instead of parameter expansion, which works with patterns.
[[ $INPUT =~ (.*)(cf_clearance=[^;]*;)(.*) ]]
ans=${BASH_REMATCH[1]}${BASH_REMATCH[3]}
You can also use an extended pattern, which is equivalent to a regular expression in power:
shopt -s extglob
echo "${INPUT/cf_clearance=*([^;]);/}"
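As a rough side-by-side sketch of why the original attempt over-matches: in a plain pattern, [^;] matches exactly one character and * matches anything, whereas the extended pattern *([^;]) behaves like the regex [^;]*. The variable names glob_result, ext_result, regex_result and re are made up for this illustration.

```shell
#!/usr/bin/env bash
shopt -s extglob

INPUT="Cookie: cf_clearance=foo; __cfduid=bar;"

# Plain glob: '[^;]' is ONE non-semicolon character, '*' is "anything",
# so the longest match runs all the way to the last ';'.
glob_result=${INPUT/cf_clearance=[^;]*;/}

# Extended glob: '*([^;])' is zero or more non-semicolon characters,
# so the match stops at the first ';'.
ext_result=${INPUT/cf_clearance=*([^;]);/}

# Real regex via =~; keeping the regex in a variable avoids quoting surprises.
re='(.*)(cf_clearance=[^;]*;)(.*)'
if [[ $INPUT =~ $re ]]; then
  regex_result=${BASH_REMATCH[1]}${BASH_REMATCH[3]}
fi

printf '[%s]\n' "$glob_result" "$ext_result" "$regex_result"
```

The first result is just "Cookie: ", while the other two keep the __cfduid part (with the two spaces that surrounded the removed cookie).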
Use sed:
INPUT=$(sed 's/cf_clearance=[^;]*;//' <<< "$INPUT")
Like you have been told in comments, bash parameter substitution only supports glob patterns, not regular expressions. So the problem is really with your expectation, not with your code per se.
If you know that the expression can be anchored to the beginning of the string, you can use the ${INPUT#prefix} parameter substitution to grab the shortest possible match, and add back the Cookie: in front:
echo "Cookie: ${INPUT#Cookie: cf_clearance=*;}"
If you don't have this guarantee, something very similar can be approximated with a pair of parameter substitutions. Find which part precedes cf_clearance, find which part follows after the semicolon after cf_clearance; glue them together.
head=${INPUT%cf_clearance=*}
tail=${INPUT#*cf_clearance=*;}
echo "$head$tail"
(If you are not scared of complex substitutions, the temporary variables aren't really necessary or useful.
echo "${INPUT%cf_clearance=*}${INPUT#*cf_clearance=*;}"
This is a little dense even for my sophisticated taste, though.)
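Here is a small sketch of the head/tail approach when cf_clearance is not the first cookie. The extra a=1 cookie is a hypothetical addition to the question's input, there only to show the unanchored case.

```shell
#!/usr/bin/env bash
# Hypothetical input: the question's string with one more cookie in front.
INPUT="Cookie: a=1; cf_clearance=foo; __cfduid=bar;"

# '%' strips the shortest suffix matching the pattern:
# everything from "cf_clearance=" to the end goes away.
head=${INPUT%cf_clearance=*}

# '#' strips the shortest prefix matching the pattern:
# everything up to the first ';' after "cf_clearance=" goes away.
tail=${INPUT#*cf_clearance=*;}

result="$head$tail"
printf '%s\n' "$result"   # prints: Cookie: a=1;  __cfduid=bar;
```

This relies on cf_clearance= appearing only once; with several occurrences the shortest-match rules would pick different ones for head and tail.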

How to get sed to take extended regular expressions?

I want to do string replacement using regular expressions in sed. Now, I'm aware that the behavior of sed is funky on a Mac. I've often seen workarounds using egrep when I want to just examine a certain pattern in a line. But, in this case I want to do string replacement.
I want to replace cp an and cp <tab or newline> an with gggg. I tried the following, which would work under extended regular expressions:
sed -i'_backup' 's/cp\s+an/gggg/g'
But of course this does nothing. I tried egrepping, and of course it picks out the lines with cp <one or more space characters> an.
How do I get sed to do replacement using extended regular expressions? Or what is a better way to do replacement using regular expressions?
I'm on Mac OS X.
On OS X, the following command works, using -E for extended regex support:
sed -i.backup -E 's/cp[[:blank:]]+an/gggg/g'
POSIX Character Class Reference
Since you mentioned you want <newline> to be handled, you'll need to coax sed a bit. Your exact requirements aren't too clear to me but the following example illustrates that sed can easily handle certain cases in which a newline is in the "target" regex:
$ echo $'cp\nancp an' | sed -E '/cp/{N; s/cp(\n|[[:blank:]])an/gggg/g;}'
gggggggg
(Note to non-Mac readers: If your sed does not support -E, try -r instead.)
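A quick way to check the single-line case without touching any file is to feed sed some text on standard input. This is only a sketch; the sample lines and the result variable are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample text: one space, several spaces, a tab, and no gap at all.
result=$(printf 'cp an\ncp   an\ncp\tan\ncpan\n' |
  sed -E 's/cp[[:blank:]]+an/gggg/g')

# The first three lines become "gggg"; "cpan" has no blank, so it is unchanged.
printf '%s\n' "$result"
```

[[:blank:]] covers space and tab, which is what \s was meant to do in the original attempt, minus the newline.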

Trying to remove version number from a string using sed in OSX

I have what I hope is a simple issue which is stumping me. I need to take an installer file with a name like:
installer_v0.29_linux.run
installer_v10.22_linux_x64.run
installer_v1.1_osx.app
installer_v5.6_windows.exe
and zip it up into a file with the format
installer_linux.zip
installer_linux_x64.zip
installer_osx.zip
installer_windows.zip
I already have a bash script running on OSX which does almost everything else I need in the build chain, and was certain I could achieve this with sed using something like:
ZIP_NAME=`echo "$OUTPUT_NAME" | sed -E 's/_(?:\d*\.)?\d+//g'`
That is, replacing the regex _(?:\d*\.)?\d+ with a blank - the regex should match any decimal number preceded by an underscore.
However, I get the error RE error: repetition-operator operand invalid when I try to run this. At this stage I am stumped - I have Googled around this and can't see what I am doing wrong. The regex I wrote works correctly at Regexr, but clearly some element of it is not supported by the sed implementation in OSX. Does anyone know what I am doing wrong?
You can try this sed (with -E, because the \+ syntax is a GNU extension that the sed on OS X does not understand):
sed -E 's/_v[^_]*//; s/\.[[:alnum:]]+$/.zip/' file
installer_linux.zip
installer_linux_x64.zip
installer_osx.zip
installer_windows.zip
You don't need sed, just some parameter expansion magic with an extended pattern.
shopt -s extglob
zip_name=${OUTPUT_NAME/_v+([^_])/}
The pattern _v+([^_]) matches a string starting with _v and all characters up to the next _. The extglob option enables the use of the +(...) pattern to match one or more occurrences of the enclosed pattern (in this case, a non-_ character). The parameter expansion ${var/pattern/} removes the first occurrence of the given pattern from the expansion of $var.
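The expansion above only removes the version; the extension still has to be swapped for .zip. A minimal sketch over the four file names from the question, assuming the extension is always the last dot-separated part (the names, base and zip_names variables are made up for this example):

```shell
#!/usr/bin/env bash
shopt -s extglob

# File names taken from the question.
names=(
  installer_v0.29_linux.run
  installer_v10.22_linux_x64.run
  installer_v1.1_osx.app
  installer_v5.6_windows.exe
)

zip_names=()
for OUTPUT_NAME in "${names[@]}"; do
  base=${OUTPUT_NAME/_v+([^_])/}   # drop "_v<version>" up to the next '_'
  zip_names+=("${base%.*}.zip")    # strip the last ".ext" and append ".zip"
done

printf '%s\n' "${zip_names[@]}"
```

This prints installer_linux.zip, installer_linux_x64.zip, installer_osx.zip and installer_windows.zip, one per line.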
You can also try it this way (-E is used because \+ is a GNU extension that is not available in the sed on OS X):
sed -E 's/_[^_]+//' FileName
Output:
installer_linux.run
installer_linux_x64.run
installer_osx.app
installer_windows.exe
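If you want to append .zip to the name, use the method below. Note that it keeps the original extension and, because the first .* is greedy, only the last underscore-separated part survives:
sed -E 's/([^_]+).*(_.*).*/\1\2.zip/' Filename
Output: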
installer_linux.run.zip
installer_x64.run.zip
installer_osx.app.zip
installer_windows.exe.zip

difference between 'i' and 'I' in sed

I thought i and I both mean ignorecase in sed, e.g.
$ echo "abcABC"|sed -e 's/a/j/gi'
jbcjBC
$ echo "abcABC"|sed -e 's/a/j/gI'
jbcjBC
However, looks like it's only for substitution:
$ echo "abcABC"|sed -e '/a/id' # <--
d
abcABC
$ echo "abcABC"|sed -e '/a/Id'
$
It's really confusing.
Where can I find full reference of the meaning of regular expression for sed?
i and I are indeed flags to the s command; they are not generally applicable to all uses of regular expressions in sed. The GNU man page is oddly silent on which flags s accepts (or even the fact that s accepts flags), so you'll have to look in the info page (run info sed).
Other uses of regular expressions are governed by the function in which they are used.
In your other examples, i is an actual sed function applied to lines that match the regular expression a; i means to insert text, so /a/id prints the text "d" before the matching line. The capital I in /a/Id is a different thing: in GNU sed, an I directly after an address regex is a modifier that makes the address match case-insensitively. That leaves d as the function, which deletes the matching line.
The sed man page in FreeBSD in the section describing options to the s (substitute) command, says only:
i or I    Match the regular expression in a case-insensitive way
Thus, the following are identical:
s/a/j/gi
s/a/j/gI
But that's only using i as a modifier to the s command. In your second example, you're using i as a command. The man page in this case states:
[1addr]i\
text Write text to the standard output.
and at least in FreeBSD's sed, there is no I (capital-I) command. So your sed script /a/id would (1) match lines containing an a, and if found (2) print the text "d". Which is what you saw.
And since I is not a command, I would have expected an error, but my results match yours -- /a/Id appears to eliminate output.
Note that commands, modifiers, and completeness of documentation may differ depending on the variant of sed you are using.
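For what it's worth, in GNU sed a capital I directly after an address regex makes that address match case-insensitively, which would explain why /a/Id deletes the line. A small sketch, assuming GNU sed (other variants may behave differently); the variable names are made up for illustration:

```shell
#!/usr/bin/env bash
# Assumption: GNU sed. The I flag on s and the I address modifier are GNU extensions.

# I as a flag to the s command: case-insensitive substitution.
flag_result=$(echo "abcABC" | sed -e 's/a/j/gI')

# I as an address modifier: /a/I matches "ABC", so d deletes the line.
addr_result=$(echo "ABC" | sed -e '/a/Id')

# Without I the address does not match "ABC", so the line is kept.
plain_result=$(echo "ABC" | sed -e '/a/d')

printf '[%s]\n' "$flag_result" "$addr_result" "$plain_result"
```

With GNU sed this prints [jbcjBC], [] and [ABC].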

Use grep to find strings at the beginning of a line or after a delimiter in Git Bash for Windows

I have such file:
blue|1|red|2
green|3|blue|4
darkblue|0|yellow|3
I want to use grep to find anything containing blue| at the beginning of a line or |blue| anywhere, but not any darkblue| or |darkblue| or |blueberry|
I tried to use grep [^|\|]blue\| but Git Bash gives me error:
$ grep [^|\|]blue\| *.*
grep: Unmatched [ or [^
sh.exe": |]blue|: command not found
What did I do wrong? What's the proper way to do it?
Here's a quick & dirty one:
grep -E '(^|\|)blue\|' *
Matches start of line or |, followed by blue|. The important note is that you need extended regular expressions (via egrep or the -E flag) to use the | (or) construct.
Also, note the single quotes around the regular expression.
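To try it without creating a file, the sample lines can be piped in. This is just a sketch; the fourth line with blueberry is a made-up addition to cover the case mentioned in the question, and data and matches are illustrative variable names.

```shell
#!/usr/bin/env bash
# Three lines from the question plus one hypothetical "blueberry" line.
data='blue|1|red|2
green|3|blue|4
darkblue|0|yellow|3
red|1|blueberry|2'

# (^|\|) is "start of line OR a literal pipe"; blue\| is "blue" plus a literal pipe.
matches=$(printf '%s\n' "$data" | grep -E '(^|\|)blue\|')

printf '%s\n' "$matches"
```

Only the first two lines are printed: darkblue| has a letter before blue, and |blueberry| has no pipe right after blue.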
So, in answer to the OP's "What did I do wrong?",
You forgot to put the regexp in single quotes;
You chose the wrong type of brackets to enclose the alternate expressions; and finally
You forgot to use egrep or the -E flag
It's always easier to see other people's errors; I wish I were as quick to spot my own :-|