Change variable value with regular expression - regex

I have a string: http://user_name:user_password@example.com/gitproject.git
and want to strip the user name and password from it, giving http://example.com/gitproject.git
i.e.
http://user_name:user_password@example.com/gitproject.git
to
http://example.com/gitproject.git
How can I do this automatically in bash?

Some languages you may have installed, such as PHP or Python, have excellent URL-parsing facilities. For example, in PHP:
$url = parse_url("http://user_name:user_password@example.com/gitproject.git");
echo $url['scheme'] . '://' . $url['host'] . $url['path'];
However, since that's not what you asked for, you can still do it in sed:
sed -E "s#(.*://)[^@]*@(.*)#\1\2#" <<<"http://user:pass@example.com/git"

This sed should work:
s="http://user_name:user_password@example.com/gitproject.git"
sed 's~^\(.*//\)[^@]*@\(.*\)$~\1\2~' <<< "$s"
http://example.com/gitproject.git
Using pure bash:
echo "${s/*@/http://}"
http://example.com/gitproject.git

With sed:
$ sed "s#//.*@#//#g" <<< "http://user_name:user_password@example.com/gitproject.git"
http://example.com/gitproject.git

A pure bash possibility
var='http://user_name:user_password@example.com/gitproject.git'
# bash uses POSIX ERE, which has no lazy quantifiers, so match "anything but @" instead of .*?
pat='(http://)[^@]*@(.*)'
[[ $var =~ $pat ]]
echo "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
http://example.com/gitproject.git
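The answers above hard-code http:// or rely on the string containing only one separator. A minimal sketch of a more general variant, using only parameter expansion, might look like the following. The function name strip_credentials is hypothetical, and the sketch assumes the input has the form scheme://[user[:password]@]host/path, with credentials separated from the host by @ as in standard URLs.

```bash
# strip_credentials is a hypothetical helper name, not taken from the answers.
# Assumed input shape: scheme://[user[:password]@]host/path
strip_credentials() {
  local url="$1"
  local scheme="${url%%://*}"     # text before the first "://"
  local rest="${url#*://}"        # text after the first "://"
  local authority="${rest%%/*}"   # host, possibly prefixed with user:password@
  case $authority in
    # Only strip when the part before the first "/" contains an "@";
    # remove everything up to the last "@" in that part.
    *@*) rest="${rest#"${authority%@*}@"}" ;;
  esac
  printf '%s://%s\n' "$scheme" "$rest"
}

strip_credentials 'http://user_name:user_password@example.com/gitproject.git'
# prints http://example.com/gitproject.git
```

Restricting the check to the part before the first slash means an @ that appears later in the path is left alone.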

Related

bash - Extract part of string

I have a string something like this
xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace=
I want to extract the following from it :
AppointmentManagementService.xsd6.xsd
I have tried using regex, bash and sed with no success. Can someone please help me out with this?
The regex that I used was this :
/AppointmentManagementService.xsd\d{1,2}.xsd/g
Your string is:
nampt@nampt-desktop:$ cat 1
xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace=
Try with awk:
cat 1 | awk -F "\"" '{print $2}'
Output:
AppointmentManagementService.xsd6.xsd
sed doesn't recognize \d, use [0-9] or [[:digit:]] instead:
sed 's/^.*schemaLocation="\([^"]\+[[:digit:]]\{1,2\}\.xsd\)".*$/\1/g'
## or
sed 's/^.*schemaLocation="\([^"]\+[0-9]\{1,2\}\.xsd\)".*$/\1/g'
You can use bash native regex matching:
$ in='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
$ if [[ $in =~ \"(.+)\" ]]; then echo "${BASH_REMATCH[1]}"; fi
Output:
AppointmentManagementService.xsd6.xsd
Based on your example, if you want to require at least 1 and at most 2 digits in the .xsd... component, you can fine-tune the regex (escaping the dots so they match literally):
$ if [[ $in =~ \"(AppointmentManagementService\.xsd[0-9]{1,2}\.xsd)\" ]]; then echo "${BASH_REMATCH[1]}"; fi
Using PCRE in GNU grep:
grep -oP 'schemaLocation="\K.*?(?=")'
This prints the text between schemaLocation=" and the very next occurrence of ".
Reference:
https://unix.stackexchange.com/a/13472/109046
We can also use the cut command for this purpose:
[root@code]# echo "xsd:import schemaLocation=\"AppointmentManagementService.xsd6.xsd\" namespace=" | cut -d\" -f 2
AppointmentManagementService.xsd6.xsd
s='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
echo "$s" | sed 's/.*schemaLocation="\(.*\)" namespace=.*/\1/'
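The bash-native approach above can be wrapped in a small function that looks up an attribute by name. This is only a sketch: attr_value is a hypothetical name, and it assumes the value is double-quoted, contains no embedded double quotes, and that the attribute name has no regex metacharacters.

```bash
# attr_value is a hypothetical helper: attr_value NAME TEXT
# Prints the double-quoted value of NAME found in TEXT, or returns 1.
attr_value() {
  local name=$1 text=$2
  # Keep the pattern in a variable so the quotes need no special escaping
  # on the right-hand side of =~.
  local re="${name}=\"([^\"]*)\""
  if [[ $text =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

in='xsd:import schemaLocation="AppointmentManagementService.xsd6.xsd" namespace='
attr_value schemaLocation "$in"
# prints AppointmentManagementService.xsd6.xsd
```

Using [^"]* instead of .+ keeps the match from running on to a later quote if the line contains more than one quoted attribute.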

Regex to get number after last underscore

I am having trouble coming up with the regex that will get me Y from the following string: X_X_X_Y. BTW: Y is an integer, but I can validate that afterwards.
You could use shell parameter expansion:
$ s="X_X_X_Y"
$ echo "${s##*_}"
Y
Using sed:
$ sed 's/.*_//' <<< "$s"
Y
Using grep:
$ grep -oP '.*_\K.*' <<< "$s"
Y
This regex (PCRE syntax, since it uses \d) will work as long as the part you're matching is an integer:
[^_]+_[^_]+_[^_]+_(\d+)
As an alternative, if you are always tokenizing on the _ character, you can skip the regex and use awk:
echo 'X_X_X_Y' | awk -F_ '{print $NF}'
Using BASH regex:
s="X_X_X_10"
[[ "$s" =~ [^_]+$ ]] && echo "${BASH_REMATCH[0]}"
10
This will print an integer at the end of the string after an underscore.
perl -e '"0_0_0_1" =~ /_([0-9]+)$/; print $1,"\n" if defined $1'
1
This might work for you:
sed 's/.*_\([0-9][0-9]*\)/\1/' file
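Since the question mentions validating the integer afterwards, the parameter-expansion answer can be combined with a pattern check. The sketch below uses a hypothetical function name, last_int, and treats only unsigned decimal digits as an integer. If the input contains no underscore, the whole string is checked.

```bash
# last_int is a hypothetical helper name.
# Prints the text after the last underscore if it is all digits,
# otherwise returns a non-zero status.
last_int() {
  local field="${1##*_}"        # strip everything up to the last underscore
  case $field in
    ''|*[!0-9]*) return 1 ;;    # empty, or contains a non-digit: reject
    *) printf '%s\n' "$field" ;;
  esac
}

last_int 'X_X_X_10'                            # prints 10
last_int 'X_X_X_Y' || echo 'not an integer'    # prints not an integer
```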

bash regex path match

I have a path like this:
/Users/me/bla/dev/trunk/source/java/com/mecorp/sub/misc/filename.java
I'd like to be able to use bash to create the package structure in another dir somewhere e.g.
com/mecorp/sub/misc/
I tried the following but it won't work. I was able to get a match when I changed my regex to .*, which implies my bash is OK, so there must be something wrong with the way I'm quoting the regex, or with the regex itself. I do see it working here:
http://regexr.com?3439m
So I'm confused.
regex="(?<=/java)(.*)(?=/)"
[[ $fullfile =~ $regex ]]
echo "pkg name " ${BASH_REMATCH[0]}
Thanks for your time.
EDIT - I'm using OSX so it doesn't have all those nice spiffy GNU extensions.
Try this :
using GNU grep :
$ echo '/Users/me/bla/dev/trunk/source/java/com/mecorp/sub/misc/filename.java' |
grep -oP 'java/\K.*/'
com/mecorp/sub/misc/
See http://regexr.com?3439p
Or using bash :
x="/Users/me/bla/dev/trunk/source/java/com/mecorp/sub/misc/filename.java"
[[ $x =~ java/(.*/) ]] && echo ${BASH_REMATCH[1]}
Or with GNU awk (gensub is a gawk extension; its third argument selects which match to replace):
echo "$x" | gawk '{print gensub(".*/java/(.*/).*", "\\1", 1)}'
Or with sed :
echo "$x" | sed -e 's#.*/java/\(.*/\).*#\1#'
If you are trying to extract the path after /java/, you can do it like this (-E works with both GNU and BSD/macOS sed):
path=/Users/me/bla/dev/trunk/source/java/com/mecorp/sub/misc/filename.java
package=$(echo "$path" | sed -E 's,^.*/java/(.*/).*$,\1,')
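The original attempt fails because bash's =~ operator uses POSIX extended regular expressions, which have no lookbehind (?<=...) or lookahead (?=...). For this particular task no regex or external tool is needed. A sketch using only parameter expansion, which also works in the stock macOS bash, could be the following. The variable names after_java, pkg_dir and target_root are illustrative, and the sketch assumes the path contains /java/ followed by at least one directory.

```bash
fullfile=/Users/me/bla/dev/trunk/source/java/com/mecorp/sub/misc/filename.java

after_java=${fullfile#*/java/}   # com/mecorp/sub/misc/filename.java
pkg_dir=${after_java%/*}/        # com/mecorp/sub/misc/

echo "$pkg_dir"
# prints com/mecorp/sub/misc/

# To recreate the structure elsewhere (target_root is a hypothetical variable):
# mkdir -p "$target_root/$pkg_dir"
```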

Regular Expressions in BASH?

I am OK with regular expressions in Perl but have not had to use them in bash before.
I tried to google for a tutorial on it but haven't found any really good ones yet, the way there are for Perl.
What I am trying to achieve is to strip /home/devtestdocs/devtestdocs-repo/ out of a variable called $filename and replace it with the value of another variable called $testdocsdirurl.
Hopefully that makes sense. If anybody has any good links, that would be much appreciated.
Alternatively, perhaps someone has already written a function to do a find and replace in bash.
sed is the typical weapon of choice for string manipulation in Unix:
echo "$filename" | sed "s/\/home\/devtestdocs\/devtestdocs-repo\//$testdocsdirurl/"
Also, as hop suggests, you can use # as the delimiter to avoid escaping the slashes in the path:
echo "$filename" | sed "s#/home/devtestdocs/devtestdocs-repo/#$testdocsdirurl#"
You can achieve this without a regular expression:
somepath="/foo/bar/baz"
newprefix="/alpha/beta/"
newpath="$newprefix${somepath##/foo/bar/}"
Yes, bash supports regular expressions, e.g.
$ [[ 'abc' =~ (.)(.)(.) ]]
$ echo ${BASH_REMATCH[1]}
a
$ echo ${BASH_REMATCH[2]}
b
But you might prefer to use the basename utility:
$ f='/some/path/file.ext'
$ echo "/new/path/$(basename "$f")"
/new/path/file.ext
An excellent source of information is the bash manual page.
With bash
pattern=/home/devtestdocs/devtestdocs-repo/
testdocsdirurl=/tmp/
filename=/foo/bar/home/devtestdocs/devtestdocs-repo/filename
echo ${filename/$pattern/$testdocsdirurl} # => /foo/bar/tmp/filename
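Note that the substitution above replaces the first occurrence anywhere in the string, which is why /foo/bar/ survives in the output. If the directory should only be replaced when it is a leading prefix, bash's ${var/#pattern/replacement} form anchors the match at the start. In this sketch the value of testdocsdirurl and the file name are made-up examples.

```bash
prefix=/home/devtestdocs/devtestdocs-repo/
testdocsdirurl=http://example.com/docs/   # hypothetical value
filename=/home/devtestdocs/devtestdocs-repo/guide/index.html

# The "#" right after the first slash anchors the match to the start of the
# string; quoting "$prefix" makes it match literally instead of as a glob.
url=${filename/#"$prefix"/$testdocsdirurl}
echo "$url"
# prints http://example.com/docs/guide/index.html

# A string that merely contains the directory somewhere else is left unchanged.
other=/foo/bar/home/devtestdocs/devtestdocs-repo/filename
echo "${other/#"$prefix"/$testdocsdirurl}"
# prints /foo/bar/home/devtestdocs/devtestdocs-repo/filename
```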
Why do you need regular expressions for this?
These are just a few possibilities:
$ filename=/home/devtestdocs/devtestdocs-repo/foo.txt
$ echo ${filename/'/home/devtestdocs/devtestdocs-repo/'/'blah/'}
blah/foo.txt
$ basename $filename
foo.txt
$ realfilename=$(basename "$filename")
Are you looking for an example of how to use regular expressions in PowerShell?
Here is one:
$text = "hello,123"
$pattern = [regex]"[0-9]+"
$match = $pattern.Match($text)
$ok = $text -match $pattern # returns a Boolean indicating whether it matched
if ($ok) {
$output = $match.Groups[0].Value
[console]::Write($output)
} else {
# no match
}
In 'classic bash', regular expression usage is precarious.
You can use this instead:
http://www.robvanderwoude.com/findstr.php

Return a regex match in a Bash script, instead of replacing it

I just want to match some text in a Bash script. I've tried using sed but I can't seem to make it just output the match instead of replacing it with something.
echo -E "TestT100String" | sed 's/[0-9]+/dontReplace/g'
Which will output TestTdontReplaceString.
That isn't what I want; I want it to output 100.
Ideally, it would put all the matches in an array.
edit:
Text input is coming in as a string:
newName()
{
#Get input from function
newNameTXT="$1"
if [[ $newNameTXT ]]; then
#Use code that im working on now, using the $newNameTXT string.
fi
}
You could do this purely in bash using the double square bracket [[ ]] test operator, which stores results in an array called BASH_REMATCH:
[[ "TestT100String" =~ ([0-9]+) ]] && echo "${BASH_REMATCH[1]}"
echo "TestT100String" | sed 's/[^0-9]*\([0-9]\+\).*/\1/'
echo "TestT100String" | grep -o '[0-9]\+'
The method you use to put the results in an array depends somewhat on how the actual data is being retrieved. There's not enough information in your question to be able to guide you well. However, here is one method:
index=0
while read -r line
do
array[index++]=$(echo "$line" | grep -o '[0-9]\+')
done < filename
Here's another way:
array=($(grep -o '[0-9]\+' filename))
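When the input is already in a variable, as in the edited question, all matches can also be collected without grep by applying =~ repeatedly and trimming the string after each match. This is a sketch rather than a definitive approach: all_numbers and the global array matches are hypothetical names.

```bash
# all_numbers is a hypothetical helper; it fills the global array "matches"
# with every run of digits found in its first argument.
all_numbers() {
  local rest=$1
  matches=()
  while [[ $rest =~ [0-9]+ ]]; do
    matches+=("${BASH_REMATCH[0]}")
    # Drop everything up to and including this match, then search again.
    rest=${rest#*"${BASH_REMATCH[0]}"}
  done
}

all_numbers 'TestT100String42and7'
printf '%s\n' "${matches[@]}"
# prints 100, 42 and 7 on separate lines
```

Because =~ only reports the first match, the loop has to shorten the string itself. The array ends up empty when the input contains no digits.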
Pure Bash. Use parameter substitution (no external processes and pipes):
string="TestT100String"
echo ${string//[^[:digit:]]/}
Removes all non-digits.
I know this is an old topic, but I came here through the same kind of search and found another good option: applying a regex to a string or variable using grep:
# Simple
num=$(echo "TestT100String" | grep -Po "[0-9]{3}")
# More complex, using lookaround
num=$(echo "TestT100String" | grep -Po "(?i)TestT\K[0-9]{3}(?=String)")
With lookaround, search expressions can be made more precise. Here (?i) makes the match case-insensitive,
\K discards everything matched so far from the reported match (so TestT must precede the digits, much like a lookbehind), and (?=String) is a lookahead that requires String after the digits without including it in the output.
https://www.regular-expressions.info/lookaround.html
The given example matches the same as the PCRE regex TestT([0-9]{3})String
Use grep. Sed is an editor. If you only want to match a regexp, grep is more than sufficient.
using awk
linux$ echo -E "TestT100String" | awk '{gsub(/[^0-9]/,"")}1'
100
I don't know why nobody ever uses expr: it's portable and easy.
newName()
{
#Get input from function
newNameTXT="$1"
if num=`expr "$newNameTXT" : '[^0-9]*\([0-9][0-9]*\)'`; then
echo "contains $num"
fi
}
Well, sed with s/pattern1/pattern2/g just replaces every occurrence of pattern1 with pattern2.
Besides that, sed prints the entire line by default.
I suggest piping the output to a cut command and extracting the numbers you want.
If you want to use only sed, use tagged regular expressions (capture groups):
sed -n 's/.*\([0-9]\)\([0-9]\)\([0-9]\).*/\1,\2,\3/p'
I didn't run the above command, so make sure the syntax is right.
Hope this helps.
using just the bash shell
declare -a array
i=0
while read -r line
do
case "$line" in
*TestT*String* )
while true
do
line=${line#*TestT}
array[$i]=${line%%String*}
line=${line#*String*}
i=$((i+1))
case "$line" in
*TestT*String* ) continue;;
*) break;;
esac
done
esac
done <"file"
echo "${array[@]}"