Regular Expressions in BASH?

I am OK with regular expressions in Perl but have not had to use them in Bash before.
I tried to google for a tutorial on it but haven't found any really good ones yet, the way there are for Perl.
What I am trying to achieve is to strip /home/devtestdocs/devtestdocs-repo/ out of a variable called $filename and replace it with the contents of another variable called $testdocsdirurl.
Hopefully that makes sense. If anybody has any good links, that would be much appreciated.
Another way might be a function someone has already written to do a find and replace in Bash.

sed is the typical weapon of choice for string manipulation in Unix:
echo "$filename" | sed "s/\/home\/devtestdocs\/devtestdocs-repo\//$testdocsdirurl/"
This breaks if $testdocsdirurl itself contains a slash. As hop suggests, you can use a different delimiter such as # to avoid escaping the path:
echo "$filename" | sed "s#/home/devtestdocs/devtestdocs-repo/#$testdocsdirurl#"
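A minimal sketch of the sed approach applied to the question's variables, assuming the replacement is a URL. The sample file name and URL are made up for illustration; only the prefix comes from the question. Because a URL contains slashes and may contain #, the sketch uses | as the delimiter. Whatever delimiter you pick must not occur in either the pattern or the replacement.

```shell
# Hypothetical sample values; only the prefix comes from the question.
filename="/home/devtestdocs/devtestdocs-repo/docs/readme.txt"
testdocsdirurl="http://example.com/testdocs/"

# '|' is the delimiter, so the slashes need no escaping.
# '^' anchors the match to the start of the string.
newname=$(printf '%s\n' "$filename" |
    sed "s|^/home/devtestdocs/devtestdocs-repo/|$testdocsdirurl|")

echo "$newname"   # http://example.com/testdocs/docs/readme.txt
```

Note that & and backslashes in the replacement are still special to sed, so this is only safe when the replacement value is known not to contain them.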

You can achieve this without a regular expression:
somepath="/foo/bar/baz"
newprefix="/alpha/beta/"
newpath="$newprefix${somepath##/foo/bar/}"
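Applied to the variables from the question, the same expansion might look like this. It is only a sketch: the sample file name and URL are invented, and quoting the prefix inside the expansion is an addition here so that any glob characters in it are taken literally.

```shell
# Hypothetical sample values; only the prefix comes from the question.
filename="/home/devtestdocs/devtestdocs-repo/docs/readme.txt"
testdocsdirurl="http://example.com/testdocs/"
prefix="/home/devtestdocs/devtestdocs-repo/"

# ${filename#"$prefix"} removes $prefix from the start of $filename, if present
newname="${testdocsdirurl}${filename#"$prefix"}"

echo "$newname"   # http://example.com/testdocs/docs/readme.txt
```

If $filename does not start with the prefix, the expansion leaves it unchanged and the URL is simply prepended, so check for that case if it can occur.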

Yes, bash supports regular expressions, e.g.
$ [[ 'abc' =~ (.)(.)(.) ]]
$ echo "${BASH_REMATCH[1]}"
a
$ echo "${BASH_REMATCH[2]}"
b
but you might rather want to use the basename utility:
$ f='/some/path/file.ext'
$ echo "/new/path/$(basename "$f")"
/new/path/file.ext
An excellent source of information is the bash manual page.

With bash pattern substitution:
pattern=/home/devtestdocs/devtestdocs-repo/
testdocsdirurl=/tmp/
filename=/foo/bar/home/devtestdocs/devtestdocs-repo/filename
echo "${filename/$pattern/$testdocsdirurl}" # => /foo/bar/tmp/filename
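The substitution above replaces the first match anywhere in the string. If the prefix should only be replaced when it is at the very start, bash lets you anchor the pattern with a leading #. A small sketch using the same sample values (quoting the pattern is an addition here, so that it is matched literally rather than as a glob):

```shell
#!/usr/bin/env bash
pattern=/home/devtestdocs/devtestdocs-repo/
testdocsdirurl=/tmp/
filename=/foo/bar/home/devtestdocs/devtestdocs-repo/filename

# First match anywhere in the string
anywhere=${filename/"$pattern"/$testdocsdirurl}
# A leading # only matches at the start of the string
at_start=${filename/#"$pattern"/$testdocsdirurl}

echo "$anywhere"   # /foo/bar/tmp/filename
echo "$at_start"   # /foo/bar/home/devtestdocs/devtestdocs-repo/filename (unchanged)
```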

Why do you need regular expressions for this?
These are just a few possibilities:
$ filename=/home/devtestdocs/devtestdocs-repo/foo.txt
$ echo ${filename/'/home/devtestdocs/devtestdocs-repo/'/'blah/'}
blah/foo.txt
$ basename "$filename"
foo.txt
$ realfilename=$(basename "$filename")

Are you looking for an example of how to use regular expressions in PowerShell?
Here is one:
$text = "hello,123"   # renamed from $input, which is an automatic variable in PowerShell
$pattern = [regex]"[0-9]+"
$match = $pattern.Match($text)
$ok = $text -match $pattern # returns a boolean value indicating whether the pattern matched
if ($ok) {
    $output = $match.Groups[0].Value
    [console]::Write($output)
} else {
    # no match
}
In 'bash classic', regular expression usage is precarious.
You can use this:
http://www.robvanderwoude.com/findstr.php

Related

How to use perl to extract text between a look-ahead and a look-behind string without applying it twice?

I currently have a string:
https://drive.google.com/file/d/j2903r293rj092j3r20/view?usp=sharing
I would like to extract j2903r293rj092j3r20 from. I am using a standard perl installation in Mac OS. I have
URL="https://drive.google.com/file/d/j2903r293rj092j3r20/view?usp=sharing"
echo $URL | perl -pe 's/https\:\/\/drive.google.com\/file\/d\///g' | perl -pe 's/\/view\?usp=sharing//g'
where I apply perl to the front and back. Is there a way to do this in one step instead? thanks

When parsing URLs you are probably better off using a proper parser, such as URI
use strict;
use warnings;
use URI;
my $uri = URI->new("https://drive.google.com/file/d/j2903r293rj092j3r20/view?usp=sharing");
my @path = $uri->path_segments;
print $path[-2];
This prints:
j2903r293rj092j3r20
I suppose if you need this in a one-liner it would be something like:
perl -MURI -lne'$u = URI->new($_); print (( $u->path_segments )[-2])'

Sure.
Firstly, using the substitution operator (s/.../.../) here is the wrong tool. You can use the match operator (m/.../) to just extract the bit of the string that you want.
echo $URL | perl -pe 'm/https\:\/\/drive.google.com\/file\/d\/(\w+)/ and $_ = $1'
Here, we're using "capturing parentheses" to copy the string of "word characters" (alphanumerics and the underscore) that follows the /d/ in the URL into the variable $1. We then copy that into $_ as that's the variable that -p will automatically print.
But we can do better than that. Both s/.../.../ and m/.../ allow us to change our delimiters, so that we don't have to escape all of those slashes.
echo $URL | perl -pe 'm[https://drive.google.com/file/d/(\w+)] and $_ = $1'
We can use print directly to remove the slightly confusing variable assignment at the end.
echo $URL | perl -ne 'print m[https://drive.google.com/file/d/(\w+)]'
And, if we know that our input data is always going to look like the current example, there's really no need to include so much of the URL.
echo $URL | perl -ne 'print m[/d/(\w+)]'
Update: You've got a comment suggesting that you use the URI module to parse your string. I'm not convinced that's particularly useful as the module will give you the path part of your URL and you still need to extract the correct part of the path. But, for completeness, here's an example using that module:
echo $URL | perl -MURI -ne 'print +(URI->new($_)->path_segments)[3]'
We create a URI object from our input and immediately call its path_segments() method to get the segments of the path. We print the fourth element of the list that is returned.

Since you tagged the question with macos, I guess there is nothing wrong with a simple sed command like
echo "$URL" | sed -n 's,.*/d/\([^/]*\).*,\1,p'
Match all up to and including /d/, capture the next characters until the first / or end of string and then match the rest. Replace with the contents of the first group and only print that value.
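The same idea (match /d/, then capture everything up to the next slash) can also be sketched without any external process, using bash's own regex operator. This assumes bash 3.2 or later and the URL from the question; the variable names re and id are just illustrative.

```shell
#!/usr/bin/env bash
URL="https://drive.google.com/file/d/j2903r293rj092j3r20/view?usp=sharing"

# Capture the characters after /d/ up to the next slash (or end of string)
re='/d/([^/]*)'

id=
if [[ $URL =~ $re ]]; then
    id=${BASH_REMATCH[1]}
fi

echo "$id"   # j2903r293rj092j3r20
```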

You can just put the two pieces comma separated in one perl -pe command:
echo $URL | perl -pe 's/https\:\/\/drive.google.com\/file\/d\///g','s/\/view\?usp=sharing//g'

How do I return just an individual string in bash?

I have a text file with multiple lines, but I know that on one line there's the following
name="some_string" value="some_string"
I know I can get down to the line with cat file.txt|grep "value=\".*\"" but I can't figure out how to return just what's inside the quotes. I think grep can only narrow it down to the line.

Use GNU grep with the -P (PCRE) option and the -o option to print only the matched part:
grep -oP '(?<=value=")[^"]+' file

If your regex contains grouping operators, you can use =~ and BASH_REMATCH:
regex='value="([^"]+)"'
[[ $string =~ $regex ]] && result=${BASH_REMATCH[1]}
This works on systems (like MacOS) without GNU grep.
Another approach, which similarly works on systems with only POSIX grep, is to filter grep's output in native bash with parameter expansions:
while read -r; do
value=${REPLY#*'value="'}
value=${value%'"'*}
echo "$value"
done < <(grep -e 'name="foo"' file)
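Putting the =~ approach together with a line-by-line read gives something like the following sketch. The three input lines are invented sample data standing in for the text file, and the loop stops at the first line that matches.

```shell
#!/usr/bin/env bash
# Capture whatever sits between the quotes after value=
regex='value="([^"]+)"'

result=
while IFS= read -r line; do
    if [[ $line =~ $regex ]]; then
        result=${BASH_REMATCH[1]}
        break
    fi
done <<'EOF'
first line
name="foo" value="bar baz"
last line
EOF

echo "$result"   # bar baz
```

To read from a real file, replace the here-document with a redirection such as done < file.txt.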

Change variable value with regular expression

I have a string: http://user_name:user_password@example.com/gitproject.git
and want to make it without user and pass - http://example.com/gitproject.git
i.e.
http://user_name:user_password@example.com/gitproject.git
to
http://example.com/gitproject.git
How can I do it automatically in bash?

Some languages you may have installed, such as PHP or Python, have excellent URL parsing facilities. For example, in PHP:
$url = parse_url("http://user_name:user_password@example.com/gitproject.git");
return "$url[scheme]://" . $url['host'] . $url['path'];
However, since that's not what you asked for, you can still do it in sed:
sed -E 's#(.*://)[^@]*@(.*)#\1\2#' <<<"http://user:pass@example.com/git"

This sed should work:
s="http://user_name:user_password@example.com/gitproject.git"
sed 's~^\(.*//\)[^@]*@\(.*\)$~\1\2~' <<< "$s"
http://example.com/gitproject.git
Using pure Bash:
echo "${s/*@/http://}"
http://example.com/gitproject.git

With sed:
$ sed "s#//.*@#//#g" <<< "http://user_name:user_password@example.com/gitproject.git"
http://example.com/gitproject.git

A pure bash possibility
var='http://user_name:user_password@example.com/gitproject.git'
pat='(http://)[^@]*@(.*)'
[[ $var =~ $pat ]]
echo "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
http://example.com/gitproject.git
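The pure bash variant can be wrapped in a small function. This is only a sketch under a few assumptions: strip_credentials is a hypothetical name, the scheme pattern is a loose approximation rather than a full URL grammar, and URLs without credentials are passed through unchanged.

```shell
#!/usr/bin/env bash
# strip_credentials is a hypothetical helper name, not an existing command.
strip_credentials() {
    local url=$1
    # group 1: scheme and "://"; then user[:password] followed by "@"; group 2: the rest.
    # [^/@]* keeps the match inside the authority part, so an "@" in the path is ignored.
    local re='^([a-zA-Z][a-zA-Z0-9+.-]*://)[^/@]*@(.*)$'
    if [[ $url =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
    else
        printf '%s\n' "$url"
    fi
}

strip_credentials 'http://user_name:user_password@example.com/gitproject.git'
# prints http://example.com/gitproject.git
```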

Regexp for extensions tgz, tar.gz, TGZ and TAR.GZ

I'm trying to get a regexp (in bash) to identify files with only the following extensions:
tgz, tar.gz, TGZ and TAR.GZ.
I tried several but can't get it to work.
I'm using this regexp to select only files with those extensions to do some work with them:
if [ -f $myregexp ]; then
.....
fi
thanks.

Try this:
#!/bin/bash
# case-insensitive matching
shopt -s nocasematch
matchRegex='\.(tgz|tar\.gz)$'
for f in *
do
    # display matching regular files; the regex variable must stay unquoted,
    # otherwise bash treats it as a literal string
    [[ -f "$f" ]] && [[ "$f" =~ $matchRegex ]] && echo "$f"
done

I have found an elegant way of doing this:
shopt -s nocasematch
for file in *
do
    [[ "$file" =~ .*\.(tar\.gz|tgz)$ ]] && echo "$file"
done
This may be good for you since you seem to want to use if and a bash regex. The =~ operator checks whether the string on its left matches the regular expression on its right. shopt -s nocasematch has to be set to perform a case-insensitive match.
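With nocasematch set, mixed-case names such as Tar.Gz match too. If only the four spellings from the question should be accepted, one option is to list them explicitly and leave nocasematch at its default (off). A sketch, where is_archive and the sample names are made up for illustration:

```shell
#!/usr/bin/env bash
# is_archive is a hypothetical helper name.
# Succeeds only for names ending in .tgz, .tar.gz, .TGZ or .TAR.GZ
# (assumes nocasematch is off, which is the default).
is_archive() {
    local re='\.(tgz|tar\.gz|TGZ|TAR\.GZ)$'
    [[ $1 =~ $re ]]
}

for name in backup.tgz backup.TAR.GZ backup.Tar.Gz notes.txt mytar.gz; do
    if is_archive "$name"; then
        echo "match:    $name"
    else
        echo "no match: $name"
    fi
done
```

Only the first two sample names match. The last one, mytar.gz, is rejected because the pattern requires a literal dot before tar.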

Use this pattern:
.*\.(tgz|tar\.gz)$
How to make a regular expression case-insensitive depends on the language you use. In JavaScript you write /pattern/i, where i denotes that the search should be case-insensitive. In C# you use the RegexOptions enumeration.

Depends on where you want to use this regex. With grep, use grep -E with the -i option, which stands for "ignore case":
grep -Ei '\.(tgz|tar\.gz)$'

Write 4 regexes, and check whether the file name matches any of them. Or write 2 case-insensitive regexes.
This way the code will be much more readable (and easier) than writing 1 regex.

You can even do it without a regex (a bit wordy though):
for f in *.[Tt][Gg][Zz] *.[Tt][Aa][Rr].[Gg][Zz]; do
echo "$f"
done

In bash? Use curly brackets, *.{tar.gz,tgz,TAR.GZ,TGZ} or even *.{t{ar.,}gz,T{AR.,}GZ}. Thus, ls -l *.{t{ar.,}gz,T{AR.,}GZ} on the command-line will do a detailed listing of all files with the matching extensions.
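In a script, the brace-expansion form might be used as in the sketch below. It builds a throwaway directory with invented file names so it can run on its own, and it sets nullglob so that patterns with no matching file disappear instead of being passed through literally.

```shell
#!/usr/bin/env bash
# Throwaway directory with made-up sample files
workdir=$(mktemp -d)
touch "$workdir/a.tgz" "$workdir/b.TAR.GZ" "$workdir/c.txt"

shopt -s nullglob   # patterns with no match expand to nothing
found=()
for f in "$workdir"/*.{tar.gz,tgz,TAR.GZ,TGZ}; do
    found+=("${f##*/}")   # keep only the file name
done
shopt -u nullglob

rm -rf "$workdir"
printf '%s\n' "${found[@]}"
```

Brace expansion produces four separate glob patterns, so only the four exact spellings are covered. A mixed-case name such as x.Tar.Gz would not be picked up.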

Return a regex match in a Bash script, instead of replacing it

I just want to match some text in a Bash script. I've tried using sed but I can't seem to make it just output the match instead of replacing it with something.
echo -E "TestT100String" | sed 's/[0-9]+/dontReplace/g'
Which will output TestTdontReplaceString.
Which isn't what I want, I want it to output 100.
Ideally, it would put all the matches in an array.
edit:
Text input is coming in as a string:
newName()
{
#Get input from function
newNameTXT="$1"
if [[ $newNameTXT ]]; then
#Use code that im working on now, using the $newNameTXT string.
fi
}

You could do this purely in bash using the double square bracket [[ ]] test operator, which stores results in an array called BASH_REMATCH:
[[ "TestT100String" =~ ([0-9]+) ]] && echo "${BASH_REMATCH[1]}"
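BASH_REMATCH only holds the first match, so to collect every match into an array (as the question asks) you can loop and keep matching against the remainder of the string. A sketch, assuming bash 3.2 or later. The sample input with several numbers is invented.

```shell
#!/usr/bin/env bash
input="Test12T100String7"

# group 1: the next run of digits; group 2: everything after it
re='([0-9]+)(.*)'

matches=()
rest=$input
while [[ $rest =~ $re ]]; do
    matches+=("${BASH_REMATCH[1]}")
    rest=${BASH_REMATCH[2]}   # continue after the match just found
done

printf '%s\n' "${matches[@]}"
# prints 12, 100 and 7 on separate lines
```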

echo "TestT100String" | sed 's/[^0-9]*\([0-9]\+\).*/\1/'
echo "TestT100String" | grep -o '[0-9]\+'
The method you use to put the results in an array depends somewhat on how the actual data is being retrieved. There's not enough information in your question to be able to guide you well. However, here is one method:
index=0
while read -r line
do
array[index++]=$(echo "$line" | grep -o '[0-9]\+')
done < filename
Here's another way:
array=($(grep -o '[0-9]\+' filename))

Pure Bash. Use parameter substitution (no external processes and pipes):
string="TestT100String"
echo ${string//[^[:digit:]]/}
Removes all non-digits.

I know this is an old topic, but I came here through the same search and found another great possibility: applying a regex to a string/variable using grep:
# Simple
num=$(echo "TestT100String" | grep -Po "[0-9]{3}")
# More complex, using \K and lookahead
num=$(echo "TestT100String" | grep -Po "(?i)TestT\K[0-9]{3}(?=String)")
Lookaround lets you extend the search expression for better matching. Here (?i) makes the match case-insensitive, TestT is the pattern that must precede the searched pattern, \K discards everything matched so far from the reported match (an alternative to a lookbehind), [0-9]{3} is the actual search pattern, and (?=String) is a lookahead containing the pattern that must follow it.
https://www.regular-expressions.info/lookaround.html
The given example extracts the same text as the first capture group of the PCRE regex TestT([0-9]{3})String
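For systems where grep has no -P option, the capture-group form of that regex can be sketched directly in bash. Unlike the (?i) version it is case-sensitive unless shopt -s nocasematch is set.

```shell
#!/usr/bin/env bash
s="TestT100String"

# Same idea as the PCRE TestT([0-9]{3})String, written as a POSIX ERE
re='TestT([0-9]{3})String'

num=
if [[ $s =~ $re ]]; then
    num=${BASH_REMATCH[1]}
fi

echo "$num"   # 100
```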

Use grep. Sed is an editor. If you only want to match a regexp, grep is more than sufficient.

using awk
linux$ echo -E "TestT100String" | awk '{gsub(/[^0-9]/,"")}1'
100

I don't know why nobody ever uses expr: it's portable and easy.
newName()
{
#Get input from function
newNameTXT="$1"
if num=$(expr "$newNameTXT" : '[^0-9]*\([0-9][0-9]*\)'); then
echo "contains $num"
fi
}

Well, sed with s/pattern1/pattern2/g just replaces all occurrences of pattern1 with pattern2.
Besides that, sed prints the entire line by default.
I suggest piping the output to a cut command and extracting the numbers you want.
If you want to use only sed, then use tagged regular expressions (TRE):
sed -n 's/.*\([0-9]\)\([0-9]\)\([0-9]\).*/\1,\2,\3/p'
I didn't try executing the above command, so make sure the syntax is right.
Hope this helps.

using just the bash shell
declare -a array
i=0
while read -r line
do
case "$line" in
*TestT*String* )
while true
do
line=${line#*TestT}
array[$i]=${line%%String*}
line=${line#*String*}
i=$((i+1))
case "$line" in
*TestT*String* ) continue;;
*) break;;
esac
done
esac
done <"file"
echo "${array[@]}"
