Shell script regex help needed

My file names are like this F1616L_GATCAG_L002_R2_001, and I want to extract the name before the first underscore _, in this case, F1616L.
I am a newbie to shell script regex, could someone help with this?
I appreciate it.

You can do that using BASH string manipulation:
s='F1616L_GATCAG_L002_R2_001'
echo "${s%%_*}"
F1616L
UPDATE: To get 2nd part after _:
[[ "$s" =~ ^[^_]+_([^_]+) ]] && echo ${BASH_REMATCH[1]}
GATCAG
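To apply both expansions to a batch of names, something like the following might work. It is only a sketch: just the first file name comes from the question, the other two are invented, and the array names prefixes and barcodes are placeholders of my own.

```shell
#!/bin/bash
# Only the first name is from the question; the other two are made up.
names=(F1616L_GATCAG_L002_R2_001 F1617L_TTAGGC_L002_R1_001 A12_ACGT_L001_R1_001)

prefixes=()
barcodes=()
for s in "${names[@]}"; do
  # %%_* removes the longest suffix that starts at the first underscore.
  prefixes+=("${s%%_*}")
  # The capture group holds the second underscore-separated field.
  if [[ $s =~ ^[^_]+_([^_]+) ]]; then
    barcodes+=("${BASH_REMATCH[1]}")
  fi
done

printf 'prefix: %s\n' "${prefixes[@]}"
printf 'barcode: %s\n' "${barcodes[@]}"
```

The parameter expansion needs no regex engine at all, so it is probably the cheaper choice when only the first field is wanted.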

Related

BASH script to match a word using REGEX

I have a CSV being read into a script that has the phrases:
This port supports SSLv3/TLSv1.0.
This port supports TLSv1.0/TLSv1.1/TLSv1.2.
This port supports TLSv1.2.
What I'm looking to do is setup a REGEX variable on the word/number: TLSv1.0
Then reference that variable in an IF/Then statement. The problem I'm
having is getting the regex to see the TLSv1.0. Could somebody help me
craft my BASH script to see TLSv1.0 when it's along a line that starts off with "This port supports"?
#!/bin/sh
REGEX="\TLSv1.0\"
cat filename.csv | awk -F"," '{gsub(/\"/,"",$4);print $5}' | sed s/\"//g |
while IFS=" " read pluginoutput
do
if [[ "$pluginoutput" =~ $REGEX ]]; then
.
. rest of my code
.
You can see that I'm trying to set the regex in the variable, but it just isn't working. Obviously a typo or something. Does anybody have a regex suggestion?
Thanks,
There are a lot of things wrong here. To pick some key ones:
#!/bin/sh specifies that you want your script to be interpreted with a POSIX-compliant interpreter, but doesn't specify which one. Many of these, like ash or dash, don't have [[ ]], =~, or other extensions which your code depends on. Use #!/bin/bash instead.
In REGEX="\TLSv1.0\", the "s are data, not syntax. This means that they're part of the content being searched for when you do [[ $string =~ $regex ]]. By contrast, regex=TLSv1.0, regex="TLSv1.0" or regex='TLSv1.0' will all have the identical effect, of assigning TLSv1.0 as the content of the regex variable.
That said, as a point on regex syntax, you probably want regex='TLSv1[.]0' -- that way it will only match a ., as opposed to treating the dot as a match-any-character wildcard (as it is in regular-expression syntax).
Personally, I might do something more like the following (if I had a reason to do the parsing in bash rather than to let a single egrep call process all your input):
#!/bin/bash
regex='(^|,)"?This port supports .*TLSv1[.]0.*[.]"?($|,)'
while IFS= read -r line; do
[[ $line =~ $regex ]] && echo "Found TLSv1.0 support"
done
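A small self-contained check of the dot-escaping point, fed with the three phrases from the question. This is only a sketch: it uses the short regex rather than the full CSV-aware one, and the names strict, loose, and matches are made up for the example.

```shell
#!/bin/bash
strict='TLSv1[.]0'   # the dot must be a literal "."
loose='TLSv1.0'      # the dot matches any single character

matches=0
while IFS= read -r line; do
  if [[ $line =~ $strict ]]; then
    matches=$((matches + 1))
    echo "Found TLSv1.0 support: $line"
  fi
done <<'EOF'
This port supports SSLv3/TLSv1.0.
This port supports TLSv1.0/TLSv1.1/TLSv1.2.
This port supports TLSv1.2.
EOF

# "TLSv1x0" is an invented string that only the loose pattern accepts.
if [[ TLSv1x0 =~ $loose ]]; then loose_hit=yes; else loose_hit=no; fi
if [[ TLSv1x0 =~ $strict ]]; then strict_hit=yes; else strict_hit=no; fi
echo "matches=$matches loose=$loose_hit strict=$strict_hit"
```

Because the loop reads from a here-document rather than a pipe, it runs in the current shell and the counter is still set after the loop ends.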

Find a string in a file name (shell script)

I am trying to use regex to match a file name and extract only a portion of the file name. My file names have this pattern: galax_report_for_Sample11_8757.xls, and I want to extract the string Sample11 in this case. I have tried the following regex, but it does not work for me, could someone help with the correct regex?
name=galax_report_for_Sample11_8757.xls
sampleName=$([[ "$name" =~ ^[^_]+_([^_]+) ]] && echo ${BASH_REMATCH[2]})
edit:
just found this works for me:
sampleName=$([[ "$name" =~ ^[^_]+_([^_]+)_([^_]+)_([^_]+) ]] && echo ${BASH_REMATCH[3]})
In a simple case like this, where you essentially have just a list of values separated by a single instance of a separator character each, consider using cut to extract the field of interest:
sampleName=$(echo 'galax_report_for_Sample11_8757.xls' | cut -d _ -f 4)
If you're using bash or zsh or ksh, you can make it a little more efficient:
sampleName=$(cut -d _ -f 4 <<< 'galax_report_for_Sample11_8757.xls')
Here is a slightly shorter alternative to the approach you used:
sampleName=$([[ "$name" =~ ^([^_]+_){3}([^_]+) ]] && echo ${BASH_REMATCH[2]})
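Another option that avoids both cut and a regex is to split the name on underscores into an array. A minimal sketch, assuming bash and the file name from the question; the array name fields is my own placeholder:

```shell
#!/bin/bash
name=galax_report_for_Sample11_8757.xls

# Split on "_" into an array; IFS is changed only for this one read.
IFS=_ read -r -a fields <<< "$name"
sampleName=${fields[3]}   # arrays are zero-based, so this is the 4th field
echo "$sampleName"

# The {3} regex from above, for comparison: group 2 is the 4th field.
if [[ $name =~ ^([^_]+_){3}([^_]+) ]]; then
  viaRegex=${BASH_REMATCH[2]}
fi
echo "$viaRegex"
```

Note the off-by-one difference: cut -f 4 counts fields from 1, while the array index starts at 0.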

Regex bash script

Can anyone help me understand why this doesn't work? Just trying out some simple regex in bash.
#!/bin/bash
re="-regex_"
if [[ "$re" =~ ^[-[:alpha:]_]+$ ]]; then
echo "Regex"
else
echo "this is not regex"
fi
Cheers
I am assuming that you are hoping that the "-regex_" will evaluate to true in your if statement.
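Note that [:alpha:] matches letters only (not digits), and the + after the bracket expression already allows one or more characters from the set. If you want to require the leading - and the trailing _ explicitly, try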
[[ "$re" =~ ^-[[:alpha:]]+_$ ]]
If you are having an error running it, make sure you are using unix line endings (run it through dos2unix) and make sure it is marked executable. Otherwise, the script prints "Regex" for me.
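To see what the original pattern accepts, it can be run against a few strings. This is just a sketch; every candidate except "-regex_" is invented for the test, and the variable names are placeholders.

```shell
#!/bin/bash
# Keeping the regex in a variable avoids quoting surprises inside [[ ]].
pattern='^[-[:alpha:]_]+$'

results=()
for candidate in "-regex_" "regex" "regex1" "two words"; do
  if [[ $candidate =~ $pattern ]]; then
    verdict="match"
  else
    verdict="no match"
  fi
  results+=("$verdict")
  echo "$candidate: $verdict"
done
```

The leading - inside the brackets is taken literally because it comes first, so the set is "hyphen, letters, underscore"; digits and spaces fall outside it.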

Regular Expressions in BASH?

I am OK with regular expressions in Perl but have not had to use them in BASH before.
I tried to google for a tutorial but didn't find any really good ones, the way there are for Perl.
What I am trying to achieve is to strip /home/devtestdocs/devtestdocs-repo/ out of a variable called $filename and replace it with another variable called $testdocsdirurl
Hopefully that makes sense, and if anybody has any good links, that would be much appreciated.
Alternatively, perhaps someone has already written a function to do a find and replace in bash.
sed is the typical weapon of choice for string manipulation in Unix (note that this form only works if $testdocsdirurl contains no / characters):
echo "$filename" | sed "s/\/home\/devtestdocs\/devtestdocs-repo\//$testdocsdirurl/"
Also, as hop suggests, you can use # as the delimiter to avoid escaping the path, which also keeps working when the replacement contains slashes:
echo "$filename" | sed "s#/home/devtestdocs/devtestdocs-repo/#$testdocsdirurl#"
You can achieve this without a regular expression:
somepath="/foo/bar/baz"
newprefix="/alpha/beta/"
newpath="$newprefix${somepath##/foo/bar/}"
yes, bash supports regular expressions, e.g.
$ [[ 'abc' =~ (.)(.)(.) ]]
$ echo ${BASH_REMATCH[1]}
a
$ echo ${BASH_REMATCH[2]}
b
but you might rather want to use the basename utility
$ f='/some/path/file.ext'
$ echo "/new/path/$(basename "$f")"
/new/path/file.ext
An excellent source of information is the bash manual page.
With bash
pattern=/home/devtestdocs/devtestdocs-repo/
testdocsdirurl=/tmp/
filename=/foo/bar/home/devtestdocs/devtestdocs-repo/filename
echo ${filename/$pattern/$testdocsdirurl} # => /foo/bar/tmp/filename
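If the directory should only be replaced when it is at the very start of the path, bash can anchor the match with /#. A minimal sketch; the URL is a made-up value for $testdocsdirurl, and the second path is the one from the example above:

```shell
#!/bin/bash
pattern=/home/devtestdocs/devtestdocs-repo/
testdocsdirurl=http://example.com/docs/   # hypothetical replacement value

filename=/home/devtestdocs/devtestdocs-repo/guide/intro.txt
# /# anchors the pattern at the start; quoting "$pattern" makes it literal
# text instead of a glob pattern.
newname=${filename/#"$pattern"/$testdocsdirurl}
echo "$newname"

# The directory appears mid-path here, so the anchored form leaves it alone.
other=/foo/bar/home/devtestdocs/devtestdocs-repo/filename
unchanged=${other/#"$pattern"/$testdocsdirurl}
echo "$unchanged"
```

Keep in mind that the pattern side of this expansion is a shell glob, not a regular expression, which is why quoting it matters if the path could contain *, ?, or [ characters.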
Why do you need regular expressions for this?
These are just a few possibilities:
$ filename=/home/devtestdocs/devtestdocs-repo/foo.txt
$ echo ${filename/'/home/devtestdocs/devtestdocs-repo/'/'blah/'}
blah/foo.txt
$ basename $filename
foo.txt
$ realfilename=$(basename "$filename")
Are you looking for an example of how to use regular expressions in PowerShell?
Here is one:
$text = "hello,123"
$pattern = ([regex]"[0-9]+")
$match = $pattern.Match($text)
$ok = $text -match $pattern # returns a boolean: true if matched
if ($ok) {
$output = $match.Groups[0].Value
[console]::Write($output)
} else {
# no match
}
in 'bash classic' regular expressions usage is precarious.
you can use this:
http://www.robvanderwoude.com/findstr.php

Return a regex match in a Bash script, instead of replacing it

I just want to match some text in a Bash script. I've tried using sed but I can't seem to make it just output the match instead of replacing it with something.
echo -E "TestT100String" | sed 's/[0-9]+/dontReplace/g'
Which will output TestTdontReplaceString.
Which isn't what I want, I want it to output 100.
Ideally, it would put all the matches in an array.
edit:
Text input is coming in as a string:
newName()
{
#Get input from function
newNameTXT="$1"
if [[ $newNameTXT ]]; then
#Use code that im working on now, using the $newNameTXT string.
fi
}
You could do this purely in bash using the double square bracket [[ ]] test operator, which stores results in an array called BASH_REMATCH:
[[ "TestT100String" =~ ([0-9]+) ]] && echo "${BASH_REMATCH[1]}"
echo "TestT100String" | sed 's/[^0-9]*\([0-9]\+\).*/\1/'
echo "TestT100String" | grep -o '[0-9]\+'
The method you use to put the results in an array depends somewhat on how the actual data is being retrieved. There's not enough information in your question to be able to guide you well. However, here is one method:
index=0
while read -r line
do
array[index++]=$(echo "$line" | grep -o '[0-9]\+')
done < filename
Here's another way:
array=($(grep -o '[0-9]\+' filename))
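When the input is already in a variable, all matches can also be collected without grep by matching repeatedly and trimming the string after each hit. This is a sketch under the assumption that bash is available; the sample text with two numbers is invented:

```shell
#!/bin/bash
text="TestT100String and TestT42String"   # made-up input with two numbers

matches=()
rest=$text
while [[ $rest =~ [0-9]+ ]]; do
  matches+=("${BASH_REMATCH[0]}")
  # Cut off everything up to and including this match, then search again.
  rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf '%s\n' "${matches[@]}"
```

Each pass finds the leftmost run of digits, so the array ends up in the same order as the numbers appear in the text.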
Pure Bash. Use parameter substitution (no external processes and pipes):
string="TestT100String"
echo ${string//[^[:digit:]]/}
Removes all non-digits.
I know this is an old topic, but I came here through similar searches and found another great possibility: applying a regex to a string/variable using grep (-P needs GNU grep built with PCRE support):
# Simple
echo "TestT100String" | grep -Po "[0-9]{3}"
# More complex, using lookaround
echo "TestT100String" | grep -Po "(?i)TestT\K[0-9]{3}(?=String)"
With lookaround capabilities, search expressions can be extended for better matching. Here (?i) makes the match case-insensitive,
\K discards everything matched so far (so the preceding TestT acts like a lookbehind and is not printed), and (?=String) is a lookahead that requires String to follow without including it in the match.
https://www.regular-expressions.info/lookaround.html
The given example prints what group 1 of the PCRE regex TestT([0-9]{3})String would capture.
Use grep. Sed is an editor. If you only want to match a regexp, grep is more than sufficient.
using awk
linux$ echo -E "TestT100String" | awk '{gsub(/[^0-9]/,"")}1'
100
I don't know why nobody ever uses expr: it's portable and easy.
newName()
{
#Get input from function
newNameTXT="$1"
if num=$(expr "$newNameTXT" : '[^0-9]*\([0-9][0-9]*\)'); then
echo "contains $num"
fi
}
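A quick way to try the expr approach outside the function. This is a sketch: the helper name first_number and the test strings other than TestT100String are my own, and the pattern spells "one or more digits" as [0-9][0-9]* because \+ is a GNU extension in basic regular expressions.

```shell
#!/bin/sh
# Hypothetical helper: print the first run of digits found in $1.
first_number() {
  expr "$1" : '[^0-9]*\([0-9][0-9]*\)'
}

num=$(first_number "TestT100String")
echo "number: $num"

# Without digits, expr prints an empty string and returns status 1.
none=$(first_number "NoDigitsHere")
none_status=$?
echo "status without digits: $none_status"

# Caveat: expr also returns 1 when the result is 0, although it matched.
zero=$(first_number "Test0")
zero_status=$?
echo "result '$zero' with status $zero_status"
```

Because of that last caveat, testing whether the output is non-empty is probably safer than relying on the exit status alone.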
Well, sed with s/pattern1/pattern2/g just replaces every occurrence of pattern1 with pattern2.
Besides that, sed prints the entire line by default.
I suggest piping the output to a cut command to extract the numbers you want.
If you want to use only sed, then use TRE (tagged regular expressions, i.e. capture groups):
sed -n 's/.*\([0-9]\)\([0-9]\)\([0-9]\).*/\1,\2,\3/p'
I didn't try to execute the above command, so make sure the syntax is right.
Hope this helped.
using just the bash shell
declare -a array
i=0
while read -r line
do
case "$line" in
*TestT*String* )
while true
do
line=${line#*TestT}
array[$i]=${line%%String*}
line=${line#*String*}
i=$((i+1))
case "$line" in
*TestT*String* ) continue;;
*) break;;
esac
done
esac
done <"file"
echo "${array[@]}"