Syntax error using =~ operator in bash - regex

I'm very new to bash and I'm trying to extract a portion of a string based on a pattern, but when I execute my code I'm seeing errors.
Sample code:
#!/bin/sh
STRING="LAX-8912_Words_Are_Here";
if [[ $STRING =~ LAX-(\d)+ ]]; then
echo "${BASH_REMATCH[1]}"
fi
So from the code above, I want to extract the "LAX-8912" portion of the string. Basically the string will be LAX- followed by a series of digits, which could be any length. When the code is executed, however, I get this message:
Syntax error: "(" unexpected (expecting "then")
I've also tried storing the regex in a variable like this:
#!/bin/sh
STRING="LAX-8912_Words_Are_Here";
REX="LAX-(\d)+";
if [[ $STRING =~ $REX ]]; then
echo "${BASH_REMATCH[1]}"
fi
But then I get this error:
[[: not found
My bash version is 4.2.25 so I'm guessing it's not a version issue, but I'm at a bit of a loss as to what's going on.

Since your script starts with:
#!/bin/sh
it will run the system shell, which may be a completely different shell like dash, or at best bash in compatibility mode. You should use:
#!/bin/bash
to use bash with all its features.
Similarly, if you run the script as sh file, you override the shebang and force the script to run with the system shell. Run it as ./file so that the script runs with its declared shebang.
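Once the script actually runs under bash, the pattern itself still needs a small change. The =~ operator uses POSIX extended regular expressions, which do not define \d, so [0-9] is the safer spelling. A minimal sketch, assuming the goal is to capture the whole "LAX-8912" prefix (the variable names string, re and prefix are illustrative, not from the question):

```bash
#!/bin/bash
string="LAX-8912_Words_Are_Here"
re='^(LAX-[0-9]+)'   # [0-9] instead of \d; the group wraps the whole prefix

if [[ $string =~ $re ]]; then
    prefix=${BASH_REMATCH[1]}   # first capture group
    echo "$prefix"              # prints LAX-8912
fi
```

Keeping the regex in a variable and expanding it unquoted on the right-hand side of =~ also avoids quoting surprises inside [[ ]].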

Related

Bash Regex matches on Ubuntu but not on macOS

I have the following bash script:
#!/bin/bash
git_status="$(git status 2>/dev/null)"
branch_pattern="^(# |)On branch ([^${IFS}]*)"
echo $git_status
echo $branch_pattern
if [[ ${git_status} =~ ${branch_pattern} ]]; then
echo 'hello'
echo $BASH_REMATCH
fi
Here is the output when I run the script on Ubuntu with bash version 4:
On branch master Initial commit Untracked files: (use "git add <file>..." to include in what will be committed) test.sh nothing added to commit but untracked files present (use "git add" to track)
^(# |)On branch ([^ ]*)
hello
On branch master
However, when I run the same script on macOS with bash version 3, the regex does not match, and nothing inside the if block is executed. The rest of the output is identical. What am I missing? Does my regex need to be formatted differently on macOS/in this version of bash? Is there a flag I am missing?
I have seen similar posts about differences in regex behavior across platforms for e.g., the find command, but I have not yet found a post relevant to my issue.
It looks to me like there's a bug in the RE engine in the version of bash that macOS comes with (it's rather old -- 3.2.57). It's something to do with the ^(# |) part -- it doesn't seem to match an empty string at the beginning of the string, as it should. But I found a workaround. Apparently the bug doesn't happen if the ^ is inside the parentheses, like this:
branch_pattern="(^# |^)On branch ([^${IFS}]*)"
BTW, you shouldn't use echo $varname to print the contents of a variable. For one thing, it'll do word splitting (converting all runs of whitespace into single spaces) and wildcard expansion on the value, which can be very confusing/misleading. Try something like printf '<<%q>>\n' "$varname" instead. Its output can be a little cryptic if the variable contains weird characters, but at least it'll make it clear that there are weird things in there.
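To make the difference concrete, here is a small sketch, assuming a hypothetical multi-line value similar to the git status output above. It compares what an unquoted echo produces with what printf %q reveals:

```bash
#!/bin/bash
# Hypothetical value with embedded newlines, similar to `git status` output.
value=$'On branch master\n\nInitial commit'

flattened=$(echo $value)            # unquoted: runs of whitespace collapse to single spaces
quoted=$(printf '<<%q>>' "$value")  # %q prints the newlines in an escaped, visible form

printf '%s\n' "$flattened"          # prints: On branch master Initial commit
printf '%s\n' "$quoted"
```

The exact escaped form that %q emits can vary between bash versions, so treat its output as a debugging aid rather than something to parse.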
If you are only trying to get the current git branch name, there is no need for regex. Git already has this built in.
git rev-parse --abbrev-ref HEAD 2>/dev/null
This will print out the current branch name (if any)
If you are in a git repository without any commits, it will only return "HEAD"

Bash regex - different number of backslashes for escaping?

Recently I wrote a script with such regex test:
# Works fine on Sabayon/Mac, doesn't work on CentOS
[[ $line =~ (.+)\{(.+)\} ]] || continue
It runs smoothly on Sabayon Linux and also on Mac, but then I needed to run it in another environment, CentOS Linux, and the script failed. After a while I found out that I need to double the backslash escapes to make it work.
# Works fine on CentOS, does not on Sabayon/Mac
[[ $line =~ (.+)\\{(.+)\\} ]] || continue
Environment list:
CentOS release 5.5 (Final), 2.6.18-194.el5, running GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu)
Sabayon latest release, 3.12.0-sabayon, running GNU bash, version 4.2.45(1)-release (x86_64-pc-linux-gnu)
OS X 10.9.3, running GNU bash, version 3.2.51(1)-release (x86_64-apple-darwin13)
Why is that happening? How to make it run on both environments?
You can store the expression in a variable and use that in your test:
re="(.+)\{(.+)\}"
[[ $line =~ $re ]] || continue
I don't think you really need the parentheses here. ".+\{.+\}" would work as well.
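A short sketch of the variable approach, assuming you do want the two capture groups. The input line 'color{red}' and the names name and value are made up for illustration:

```bash
#!/bin/bash
re='(.+)\{(.+)\}'   # single quotes keep the backslashes literal
line='color{red}'   # hypothetical input line

if [[ $line =~ $re ]]; then
    name=${BASH_REMATCH[1]}    # text before the opening brace
    value=${BASH_REMATCH[2]}   # text between the braces
    echo "name=$name value=$value"   # prints name=color value=red
fi
```

Because the pattern reaches [[ ]] through an unquoted variable expansion, the shell does not reinterpret the backslashes on the way, which is presumably why the same line behaves consistently across the bash versions listed.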

renaming a file with regex in bash MacOSX

I have file names like this
223h123.sdsdas.png
which I would like to rename to
sdsdas.png
I am using this command
for i in *.png;do mv "$i" "${i/[a-zA-Z0-9]*/}";done
which gives me this instead
png,
I am using bash on MacOS X.
You're confusing regular expressions with Parameter Expansion. They look a bit similar, but they are not the same.
A regular expression parser might allow you to make a regex-based substitution, for example:
[ghoti#pc ~]$ i="223h123.sdsdas.png"
[ghoti#pc ~]$ echo "$i" | sed 's/^[^.]*\.\([^.]*\)/\1/'
sdsdas.png
[ghoti#pc ~]$
Sed, the Stream Editor, has a substitute command that takes a Basic Regular Expression and replaces \1 with the first bracketed atom of the regex.
Alternately, you could use parameter expansion to strip off text to the first dot.
[ghoti#pc ~]$ i="223h123.sdsdas.png"
[ghoti#pc ~]$ echo "${i#*.}"
sdsdas.png
[ghoti#pc ~]$
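Applied to the rename loop from the question, the parameter expansion could look like the sketch below. The function name strip_prefix and the guard that skips names with only one dot are my own additions, not part of the original answer:

```bash
#!/bin/bash
# Rename each .png in a directory by dropping everything up to the first dot.
strip_prefix() {
    local dir=$1 f base new
    for f in "$dir"/*.png; do
        [ -e "$f" ] || continue      # no matches: the glob stays literal
        base=${f##*/}                # file name without the directory part
        new=${base#*.}               # "223h123.sdsdas.png" -> "sdsdas.png"
        case $new in
            *.*) mv -- "$f" "$dir/$new" ;;   # rename only if an extension remains
        esac
    done
}
```

Calling strip_prefix . would process the current directory. The guard keeps a file such as "sdsdas.png" from being renamed to just "png" on a second run.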

pattern matching while using ls command in bash script

In a sh script, I am trying to loop over all files that match the following pattern
abc.123: basically abc. followed only by digits, where the number after the . can be of any length.
Using
$ shopt -s extglob
$ ls abc.+([0-9])
does the job but on terminal and not through the script. How can I get only files that match the pattern?
If I understood you right, the pattern could be translated into this regex:
^abc\.[0-9]+$
So you could either keep using ls and grep the output, for example:
ls *.*|xargs -n1|grep -E '^abc\.[0-9]+$'
or use find, which has a -regex option.
If you're using sh and not bash, and presumably you also want to be POSIX compliant, you can use:
for f in ./*
do
echo "$f" | grep -Eq '^\./abc\.[0-9]+$' || continue
echo "Something with $f here"
done
It will work fine with filenames with spaces, quotes and such, but may match some filenames with line feeds in them that it shouldn't.
If you tagged your question bash because you're using bash, then just use extglob like you described.
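For the bash case, a sketch of how the extglob pattern might be used inside a script. The function name list_matches is hypothetical. The key assumption is that the script really runs under bash (not sh) and that shopt -s extglob appears on an earlier line than the pattern, since bash needs the option enabled when it parses the pattern:

```bash
#!/bin/bash
shopt -s extglob nullglob   # extglob for +(...); nullglob so no match means no iterations

# Print the names of files in a directory that look like abc.<digits>.
list_matches() {
    local dir=$1 f
    for f in "$dir"/abc.+([0-9]); do
        printf '%s\n' "${f##*/}"   # strip the directory part
    done
}
```

Calling list_matches . lists the matching files in the current directory; replace the printf with whatever needs to happen per file.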

How to go from a multiple line sed command in command line to single line in script

I have sed running with the following argument fine if I copy and paste this into an open shell:
cat test.txt | sed '/[,0-9]\{0,\}[0-9]\{1,\}[acd][0-9]\{1,\}[,0-9]\{0,\}/{N
s/[,0-9]\{0,\}[0-9]\{1,\}[acd][0-9]\{1,\}[,0-9]\{0,\}\n\-\-\-//}'
The problem is that when I try to move this into a KornShell (ksh) script, the ksh throws errors because of what I think is that new line character. Can anyone give me a hand with this? FYI: the regular expression is supposed to be a multiple line replacement.
Thank you!
This: \{0,\} can be replaced by this: *
This: \{1,\} can be replaced by this: \+ (a GNU sed extension; with other sed implementations keep \{1,\})
It's not necessary to escape hyphens.
The newline can be replaced by -e (or by a semicolon)
The cat can be replaced by using the filename as an argument to sed
The result:
sed -e '/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*/{N' -e 's/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*\n---//}' test.txt
or
sed '/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*/{N;s/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*\n---//}' test.txt
(untested)
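If the script may run against a non-GNU sed, a more conservative variant keeps \{1,\} and gives each command its own -e. This is only a sketch: the wrapper name strip_markers is made up, and the sample input is an assumption about what test.txt contains (diff-style change lines followed by ---):

```bash
#!/bin/bash
# Remove a diff-style change line (e.g. "3c3") together with the "---" line after it.
strip_markers() {
    sed -e '/[,0-9]*[0-9]\{1,\}[acd][0-9]\{1,\}[,0-9]*/{' \
        -e 'N' \
        -e 's/[,0-9]*[0-9]\{1,\}[acd][0-9]\{1,\}[,0-9]*\n---//' \
        -e '}' "$@"
}

# Hypothetical sample: the first two lines collapse into one empty line.
out=$(printf '3c3\n---\nkeep\n' | strip_markers)
printf '%s\n' "$out"
```

Like the original command, this leaves an empty line where the two removed lines were, because the substitution empties the pattern space rather than deleting it.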
can you try to put your regex in a file and call sed with the option -f ?
cat test.txt | sed -f file.sed
Can you try to replace the new line character with `echo -e \\r`
The Korn Shell - unlike the C Shell - has no problem with newlines in strings. The newline is very unlikely to be your problem, therefore. The same comments apply to Bourne and POSIX shells, and to Bash. I've copied your example and run it on Linux under both Bash and Korn shell without any problem.
If you use C Shell for your work, are you sure you're running 'ksh ./script' and not './script'?
Otherwise, there is some other problem - an unbalanced quote somewhere, perhaps.
Check out the '-v' and '-n' options as well as the '-x' option to the Korn Shell. That may tell you more about where the problem is.