Bash Regex matches on Ubuntu but not on macOS - regex

I have the following bash script:
#!/bin/bash
git_status="$(git status 2>/dev/null)"
branch_pattern="^(# |)On branch ([^${IFS}]*)"
echo $git_status
echo $branch_pattern
if [[ ${git_status} =~ ${branch_pattern} ]]; then
echo 'hello'
echo $BASH_REMATCH
fi
Here is the output when I run the script on Ubuntu with bash version 4:
On branch master Initial commit Untracked files: (use "git add <file>..." to include in what will be committed) test.sh nothing added to commit but untracked files present (use "git add" to track)
^(# |)On branch ([^ ]*)
hello
On branch master
However, when I run the same script on macOS with bash version 3, the regex does not match, and nothing inside the if block is executed. The rest of the output is identical. What am I missing? Does my regex need to be formatted differently on macOS/in this version of bash? Is there a flag I am missing?
I have seen similar posts about differences in regex behavior across platforms for e.g., the find command, but I have not yet found a post relevant to my issue.

It looks to me like there's a bug in the RE engine in the version of bash that macOS comes with (it's rather old -- 3.2.57). It's something to do with the ^(# |) part -- it doesn't seem to match an empty string at the beginning of the string, as it should. But I found a workaround. Apparently the bug doesn't happen if the ^ is inside the parentheses, like this:
branch_pattern="(^# |^)On branch ([^${IFS}]*)"
BTW, you shouldn't use echo $varname to print the contents of a variable. For one thing, it'll do word splitting (converting all runs of whitespace into single spaces) and wildcard expansion on the value, which can be very confusing/misleading. Try something like printf '<<%q>>\n' "$varname" instead. Its output can be a little cryptic if the variable contains weird characters, but at least it'll make it clear that there are weird things in there.
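As a rough sketch of the workaround, the pattern with the anchor moved inside the group can be wrapped in a small function and tried against sample text. The function name branch_from_status and the sample string are made up for illustration; the sample only stands in for real git status output.

```shell
#!/bin/bash
# Hypothetical helper: extract the branch name from `git status`-style text.
branch_from_status() {
    local status_text=$1
    # The ^ sits inside the group, which sidesteps the bash 3.2 problem.
    local pattern="(^# |^)On branch ([^${IFS}]*)"
    if [[ $status_text =~ $pattern ]]; then
        # Group 1 is the optional "# " prefix, group 2 the branch name.
        printf '%s\n' "${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

# Assumed sample input, not real command output.
sample=$'On branch master\nInitial commit'
branch_from_status "$sample"    # prints: master
```

Keeping the pattern in a variable and expanding it unquoted on the right of =~ is what lets the same script behave the same way on bash 3.2 and 4.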

If you are only trying to get the current git branch name, there is no need for regex. Git already has this built in.
git rev-parse --abbrev-ref HEAD 2>/dev/null
This prints the current branch name, if any.
In a git repository without any commits (or with a detached HEAD), it prints only "HEAD".
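A script usually has to tell these cases apart. A minimal sketch, assuming the three outcomes are an empty result (not in a repository), the literal HEAD, and a branch name; describe_head is a hypothetical helper, not part of git.

```shell
#!/bin/bash
# Hypothetical helper: classify what `git rev-parse --abbrev-ref HEAD` printed.
describe_head() {
    case $1 in
        '')   echo 'not inside a git repository' ;;
        HEAD) echo 'no branch name (detached HEAD or no commits yet)' ;;
        *)    echo "on branch $1" ;;
    esac
}

# Errors are discarded, so a failure simply yields an empty string.
describe_head "$(git rev-parse --abbrev-ref HEAD 2>/dev/null)"
```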

Related

Delete Local References Not Starting With Match

The strategy that we have for the project I'm working on is to release the project in phases, aka releases. Therefore our branching strategy is to have release1 act as the master branch for all features related to the first release, release2 for the second, and so on.
Occasionally I like to clean out the local instances of my branches. The majority of the time they've either been merged into their respective release branch or they've been abandoned. This usually involves the following sequence:
$ git branch
--- prints the list ---
$ git branch -d branch1, branch2, branch3, etc...
To try and do this in one go I tried running the following command:
$ git branch | grep -v '^release.+|QA' | xargs git branch -d
The idea is that it should:
Get every branch
Grep everything that does not start with release and is also not QA
Pass the branches to the git branch -d
But what is happening is that it deletes every branch except for the branch I currently have checked out. What am I doing wrong?
You will find that git branch output is intended to be read by humans, but has some quirks (there is a * in its output to mark the active branch, each line starts with two spaces ...) that make for a poor scripting experience.
Try using git for-each-ref instead :
# here is the mantra to get all local branch names :
git for-each-ref --format="%(refname:short)" refs/heads | grep -v ...
The reason behind it is quite simple.
git branch
  QA
* master
  not-release1
  not-release2
  not-release3
  release1
  release2
  release3
  release4
Each line begins with 2 spaces. So that's why your regex is not matching. Secondly, there is an asterisk.
The asterisk only causes a minor error such as error: branch '*' not found., and git refuses to delete the branch you currently have checked out. Since no line matched your regex, the -v option inverted that and selected every branch, so everything except your current branch was deleted.
error: Cannot delete branch 'master' checked out at ...
Solution to your problem
git branch | grep -v "^ *release" | grep -v "QA" | xargs git branch -d
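Combining both answers, here is a sketch that filters plain branch names with an extended regex. Note that without -E, grep treats + and | as literal characters, which is a second reason the original pattern never matched. branches_to_delete is a hypothetical name, and ^QA$ assumes the QA branch is called exactly QA.

```shell
#!/bin/bash
# Hypothetical filter: drop protected branches, keep the deletable ones.
branches_to_delete() {
    # -E enables alternation with |; -v inverts the match.
    grep -Ev '^release|^QA$'
}

# Dry run on sample names (assumed, not real repository output).
printf '%s\n' QA master not-release1 release1 release2 | branches_to_delete

# Against a real repository the pipeline would look like this:
# git for-each-ref --format='%(refname:short)' refs/heads \
#     | branches_to_delete | xargs git branch -d
```

The dry run prints master and not-release1. In a real run git would still refuse to delete the checked-out branch, and -d refuses branches that are not fully merged.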

Bash on macOS: How replace a path in a file with another string?

For integration tests, I have output that contains full file paths. I want to have my test script replace the user-specific start of the file path (e.g. /Users/uli/) with a generic word (USER_DIR) so that I can compare the files.
The problem, of course, is the slashes in the path. I tried the solutions given here and here, but they don't work for me:
#!/bin/bash
old_path="/Users/uli/"
new_path="USERDIR"
sed -i "s#$old_path#$new_path#g" /Users/uli/Desktop/replacetarget.txt
I get the error
sed: 1: "/Users/uli/Desktop/repl ...": invalid command code u
This is the version of sed that comes with macOS 10.14.6 (it has no --version option and is installed in /usr/bin/, so no idea what exact version).
Update:
I also tried
#!/bin/bash
old_path="/Users/uli/"
old_path=${old_path//\//\\\/}
new_path="USERDIR"
regex="s/$old_path/$new_path/g"
echo $old_path
echo $regex
sed -i $regex /Users/uli/Desktop/replacetarget.txt
But I get the same error. What am I doing wrong?
BSD sed requires an argument following -i (the empty string '' indicates no backup, similar to argumentless -i in GNU sed). As a result, your script is being treated as the backup-file extension, and your input file as the script.
old_path="/Users/uli/"
new_path="USERDIR"
sed -i '' "s#$old_path#$new_path#g" /Users/uli/Desktop/replacetarget.txt
However, sed is a stream editor, based on the file editor ed, so using -i is an indication you are using the wrong tool to begin with. Just use ed.
old_path="/Users/uli/"
new_path="USERDIR"
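# The leading , applies s to every line; without an address, ed edits only the current (last) line.
printf ',s#%s#%s#g\nwq\n' "$old_path" "$new_path" | ed /Users/uli/Desktop/replacetarget.txt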
Obligatory warning: neither editor is parameterized as such; you are simply generating the script dynamically, which means it's your responsibility to ensure that the resulting script is valid. (For example, if either parameter contains the delimiter #, it must be escaped so that (s)ed doesn't take it as the end of the pattern or replacement; regex metacharacters in the old path, and & or \ in the new one, need the same care.)
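If the script has to run unchanged on both macOS and Linux, one option is to avoid -i altogether and go through a temporary file. This is only a sketch: replace_in_file is a hypothetical helper, the demo paths are invented, and the strings are not escaped, so the warning above still applies.

```shell
#!/bin/bash
# Hypothetical helper: replace every occurrence of $2 with $3 in file $1
# without -i, whose syntax differs between BSD and GNU sed.
replace_in_file() {
    local file=$1 old=$2 new=$3 tmp
    tmp=$(mktemp "${TMPDIR:-/tmp}/replace.XXXXXX") || return 1
    # '#' is the delimiter, so neither string may contain '#'.
    # Regex metacharacters in $old and '&' or '\' in $new are NOT escaped.
    sed "s#$old#$new#g" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Demo on a throwaway file (assumed sample content).
demo=$(mktemp "${TMPDIR:-/tmp}/demo.XXXXXX")
printf '%s\n' '/Users/uli/project/a.txt' '/Users/uli/b.txt' > "$demo"
replace_in_file "$demo" '/Users/uli/' 'USER_DIR/'
cat "$demo"
rm -f "$demo"
```

Because mv swaps in a new file, the result takes the temporary file's permissions and ownership, which may matter outside of test fixtures.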

Capture group from regex in bash script

When building an R package the command outputs the process steps to std out. From that output I would like to capture the final name of the package.
In the simulated script below I show the output of the build command. The part that needs to be captured is the last line starting with building.
How do I get the regex to match with these quotes, and then capture the package name into a variable?
#!/usr/bin/env bash
var=$(cat <<"EOF"
Warning message:
* checking for file ‘./DESCRIPTION’ ... OK
* preparing ‘analysis’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
Removed empty directory ‘analysis/.idea/inspectionProfiles’
Removed empty directory ‘analysis/.idea/snapshots’
* creating default NAMESPACE file
* building ‘analysis_0.1.tar.gz’
EOF
)
regex="building [\u2018](.*?)?[\u2019]"
if [[ "${var}" =~ $regex ]]; then
pkgname="${BASH_REMATCH[1]}"
echo "${pkgname}"
else
echo "sad face"
fi
This should work on both macOS and CentOS.
Support for the \u and \U unicode escapes was introduced in Bash 4.2. CentOS 7 has Bash 4.2, so this should work on that platform:
regex=$'.*building[[:space:]]+\u2018(.*)\u2019'
Unfortunately, earlier versions of CentOS had older versions of Bash, and I believe the default version of Bash on MacOS is still 3.2. For those, assuming that the quotes are encoded as UTF-8, this should work:
regex=$'.*building[[:space:]]+\xe2\x80\x98(.*)\xe2\x80\x99'
If the quotes are encoded in different ways on different platforms, then you could use alternation (e.g. (\xe2\x80\x98|...) instead of \xe2\x80\x98) to match all of the possibilities (and adjust the index used for BASH_REMATCH accordingly).
See How do you echo a 4-digit Unicode character in Bash? for more information about Unicode in Bash.
I've used $'...' to set the regular expression because it supports \x and (from Bash 4.2) \u escapes for characters, and Bash regular expressions don't.
With regard to the regular expression:
The leading .* is greedy, so the match lands on the last occurrence of building in the text.
I've dropped the ?s because they aren't compatible with Bash's built-in regular expressions. See mkelement0's excellent answer to How do I use a regex in a shell script? for information about Bash regular expressions.
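Put together, the byte-escape variant can be tried like this. It is a sketch that assumes UTF-8 encoded quotes; package_name is a hypothetical helper and the sample text is a shortened version of the build output.

```shell
#!/bin/bash
# Hypothetical helper: print the package name from R build output.
package_name() {
    # \xe2\x80\x98 and \xe2\x80\x99 are the UTF-8 bytes of the curly quotes.
    local regex=$'.*building[[:space:]]+\xe2\x80\x98(.*)\xe2\x80\x99'
    [[ $1 =~ $regex ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}

# Assumed sample, built with the same byte escapes.
sample=$'* creating default NAMESPACE file\n* building \xe2\x80\x98analysis_0.1.tar.gz\xe2\x80\x99'
package_name "$sample"
```

Since only \x escapes are used, this form does not depend on the \u support that arrived in Bash 4.2.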
There are many ways to do it, this is one:
file=`echo "$var" | grep '^\* building' | grep -o '‘.*’' | head -c -4 | tail -c +4`
echo $file
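Find the line starting with * building (first grep)
Find the text between ‘ and ’, quotes included (second grep)
Discard the quotes: tail -c +4 skips the 3-byte opening quote, and head -c -4 drops the 3-byte closing quote plus the trailing newline. Note that a negative count for head -c is a GNU extension, so this does not work with the BSD head on macOS.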

How can I capture group from string using regexp in shell in msysgit on Windows?

[Editor's note: The OP has later clarified that he's running bash as part of msysgit, the Git version for Windows.]
I'm trying to get the last digits from the string. I have a little script but it doesn't work and I don't know why:
#!bin/bash
TAGS="MASTER_46"
re="_(\d+)"
if [[ ${TAGS}=~$re ]]; then
echo "Find"
echo ${BASH_REMATCH}
echo ${BASH_REMATCH[1]}
fi
The output:
Find
{empty}
{empty}
I am using bash
$ bash --version
GNU bash, version 3.1.20(4)-release
There are several problems:
bash doesn't support \d. Use [0-9].
Whitespace is needed around the operator: $TAGS =~ $re, otherwise bash parses it as [[ -n "$TAGS=~$re" ]].
Path to the shell is /bin/bash, not bin/bash.
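With those three fixes applied, the script would look roughly like this (a sketch; the trailing $ anchor is an addition so that only digits at the end of the string match):

```shell
#!/bin/bash
TAGS="MASTER_46"
# [0-9] instead of \d; $ pins the digits to the end of the string.
re='_([0-9]+)$'
# Spaces around =~ are required.
if [[ $TAGS =~ $re ]]; then
    echo "Found"
    echo "${BASH_REMATCH[0]}"   # whole match: _46
    echo "${BASH_REMATCH[1]}"   # first group: 46
fi
```

This only works with a bash that was built with regex support.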
Update, based on the OP's clarification re environment and his own findings:
tl;dr:
msysgit (as of version 1.9.5) comes with a bash executable that is compiled without support for =~, the regex-matching operator
A limited workaround is to use utilities such as grep, sed, and awk instead.
To solve this problem fundamentally, install a separate Unix emulation environment such as MSYS or Cygwin, and use git.exe (the core of msysgit) from there.
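As an illustration of the utility-based workaround, the trailing digits can be extracted with sed, which needs no =~ support in bash. trailing_number is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the digits after the last underscore, if the
# string ends in "_<digits>"; print nothing otherwise.
trailing_number() {
    # Basic regex only, so this also works with older sed versions.
    printf '%s\n' "$1" | sed -n 's/.*_\([0-9][0-9]*\)$/\1/p'
}

trailing_number "MASTER_46"   # prints: 46
```

For a case this simple, the parameter expansion ${TAGS##*_} would also do, although it does not check that the suffix is numeric.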
choroba's answer has great pointers, but let me add that, since you get the following error message:
conditional binary operator expected syntax error near =~
the implication is either that
your bash version is too old to support =~, the regex-matching operator.
your bash version was compiled without regex support
Given that =~ was introduced in bash 3.0 (see http://tiswww.case.edu/php/chet/bash/NEWS) and you're running 3.1.x, it must be the latter, which indeed turned out to be true:
The OP runs msysgit, the official Git version for Windows that comes with bash and a set of Unix utilities.
As it turns out, as of version 1.9.5, the bash executable that comes with msysgit is built without regex support, presumably due to technical difficulties - see https://groups.google.com/forum/#!topic/msysgit/yPh85MPDyfE.
Incredibly, the "Known Issues" section of the release notes does not mention this limitation.
Your best bet is to:
Install MSYS to use as your Unix emulation environment - its bash does come with =~ support (note that msysgit was originally forked from MSYS).
Alternatively, to get better Unix emulation and more tools at the expense of a larger installation footprint and possibly performance, install Cygwin instead.
In MSYS, use only git.exe from msysgit, via the Windows %PATH%.
To that end, be sure to install msysgit with the Run Git from the Windows Command Prompt option - see https://stackoverflow.com/a/5890222/45375
Alternatively, add C:\Program Files\Git\cmd (assumes the default path on 32-bit Windows, on 64-bit Windows it's C:\Program Files (x86)\Git\cmd) manually to your Windows %PATH%, OR extend $PATH in your MSYS ~/.profile file (e.g., PATH="$PATH:/c/program files/git/cmd").
You could hack your msysgit installation, but it hardly seems worth it and I don't know what the side effects are;
If you really wanted to try, do the following: Copy the following files from an MSYS installation's bin directory to msysgit's bin directory: bash.exe, sh.exe, msys-termcap-0.dll - in other words: replace msysgit's bash altogether.

Rename multiple files with /usr/bin/rename using regex

I have a lot of pdfs that I want to rename with /usr/bin/rename.
The files are named in the following pattern:
<rating> <a pretty long title> (<author> <year>).pdf
e.g.: +++ The discovery of some very interesting stuff (Dude 1999).pdf
rating: 1 to 5 '+' signs
year: numerical
They should be renamed into the following pattern:
<author>, <year> <rating> <a pretty long title>.pdf
e.g.: Dude, 1999 +++ The discovery of some very interesting stuff.pdf
I tried to use /usr/bin/rename and wrote this command:
rename 's/(.*)\ (.*)\ \((.*)\ (.*)\).pdf/$3, $4 $1 $2.pdf/' *.pdf
However, the command does not consider that the rating always contains '+' signs and that the year is always numerical. How can I achieve this? I tried something like ([+]{1,5}) and ([0-9]{4}), but it didn't work.
Is rename actually able to interpret something other than (.*) as the input for the variables $1 ... $n?
Thanks for your help!
This works fine for me:
rename 's/^(\+{1,5}) (.*) \((.*) ([0-9]{4})\)\.pdf$/$3, $4 $1 $2.pdf/' -- \
'+++ The discovery of some very interesting stuff (Dude 1999).pdf'
... however your question doesn't quote the error message, so it's hard to tell what might be wrong in your situation.
Just as a warning, there are two different versions of /usr/bin/rename that are widely found on Linux systems, and which have different syntaxes. I assume that you are using the Perl one, however, since your original command worked at all. That means that you can use any Perl expression to modify the name - see perlre for more details.
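On a system where the Perl rename is not available, the same transformation can be sketched with bash's own regex matching. new_name is a hypothetical helper, and the commented loop at the end is only one way to apply it.

```shell
#!/bin/bash
# Hypothetical helper: compute the new name for one file, or fail if the
# name does not follow "<rating> <title> (<author> <year>).pdf".
new_name() {
    local re='^(\+{1,5}) (.*) \((.*) ([0-9]{4})\)\.pdf$'
    [[ $1 =~ $re ]] || return 1
    # Groups: 1 = rating, 2 = title, 3 = author, 4 = year.
    printf '%s, %s %s %s.pdf\n' "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}" \
        "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}

new_name '+++ The discovery of some very interesting stuff (Dude 1999).pdf'
# prints: Dude, 1999 +++ The discovery of some very interesting stuff.pdf

# To rename for real (mv -n avoids overwriting existing files):
# for f in *.pdf; do n=$(new_name "$f") && mv -n -- "$f" "$n"; done
```

Files that do not match the pattern are skipped, because new_name returns a non-zero status for them.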
Unfortunately Fedora (my distro) ships a version of rename that is useless for this.
I replaced it with the Perl version of the rename utility.
You can find it on CPAN.
Download and untar the archive, and then:
# ./Build installdeps
# sudo ./Build install
Warning: this replaces the original Fedora rename (both the binary and the manual page). It can be reverted with yum reinstall, and it may also be reverted by the next Fedora update.
You can also install it separately or use alternatives.