syntax error executing regex on bash - regex

I have the following small code to retrieve file name,
for pfile in ../data/${BENCH}/*.data; do
prot=`expr match ${pfile%.data} "../data/${BENCH}/\(.*\)"`
echo ${prot}
done
pfile is a string like "../data/gpcr/3.4.5.data", so ${pfile%.data} is that path without the ".data" suffix. However, this expression returns 'syntax error'.
I also tried,
prot=`expr match "${pfile%.data}" "../data/${BENCH}/\(.*\)"` AND
prot=`expr match "${pfile%.data}" : "../data/${BENCH}/\(.*\)"` AND
prot=`expr match "${pfile%.data}" '../data/${BENCH}/\(.*\)'`
None of them worked. I am running these in the Terminal on Mac OS X.
Thanks in advance.

According to https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/expr.1.html, you need to drop the "match" keyword, which does not exist in this implementation:
prot=$(expr "${pfile%.data}" : "../data/${BENCH}/\(.*\)")
But since you have bash:
prot=${pfile%.data}
prot=${prot##*/}
echo "$prot"
Or you could do:
prot=$( basename "$pfile" ".data" )
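The three approaches above can be compared side by side. This is only a minimal sketch: the function names are hypothetical, and the sample path is the one from the question.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; each reduces "../data/gpcr/3.4.5.data" to "3.4.5".

name_via_expansion() {
    local path=${1%.data}          # strip the ".data" suffix
    printf '%s\n' "${path##*/}"    # strip everything up to the last "/"
}

name_via_basename() {
    basename "$1" .data            # basename removes the directory and the suffix
}

name_via_expr() {
    # "STRING : REGEX" is the portable form; no "match" keyword is needed.
    # The greedy ".*/" consumes everything up to the last slash.
    expr "${1%.data}" : '.*/\(.*\)'
}

name_via_expansion ../data/gpcr/3.4.5.data
name_via_basename  ../data/gpcr/3.4.5.data
name_via_expr      ../data/gpcr/3.4.5.data
```

The parameter-expansion variant starts no external process, which may matter inside a loop over many files.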

Would just this work for you?
for pfile in ../data/$BENCH/*.data; do
[[ $pfile =~ ^\.\./data/$BENCH/(.*)\.data$ ]]
prot=${BASH_REMATCH[1]}
echo "$prot"
done
(I guess it's what you're trying to express with the expr match statement).
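A slightly more defensive variant of the same idea might check whether the match succeeded before reading BASH_REMATCH. The sketch below assumes BENCH=gpcr and that BENCH contains no regex metacharacters; stem_of is a hypothetical name.

```bash
#!/usr/bin/env bash
BENCH=gpcr   # assumed value, taken from the example path in the question

# Print the part between "../data/$BENCH/" and ".data"; fail if the path
# does not have that shape.
stem_of() {
    # Single quotes keep the backslashes; the dots are escaped to match literally.
    local re='^\.\./data/'"$BENCH"'/(.*)\.data$'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

stem_of ../data/gpcr/3.4.5.data
```

Keeping the pattern in a variable and expanding it unquoted on the right-hand side of =~ avoids most quoting surprises.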

Related

FreeBSD Bash - Unable to code a condition with regex

I am trying to write a script for pfSense, which is based on FreeBSD. The only part still giving me trouble is a condition with a regex. The simplified code is this:
RESPONSE='{"port":98989}'
REG='{"port":([0-9]*)}'
if [[ $RESPONSE =~ $REG ]]; then
PORT=${BASH_REMATCH[1]}
fi
With the trace mode enabled, the error returned is the following :
+ RESPONSE='{"port":98989}'
+ REG='{"port":([0-9]*)}'
+ '[[' '{"port":98989}' '=~' '{"port":([0-9]*)}' ]]
./pia-port-v2: [[: not found
I don't understand why the [[ is between single quotes in the trace; that is probably why the "not found" error occurs.
Update
It is probably because pfSense's FreeBSD does not provide bash, and these instructions are bash-only. I found that out after writing this question and trying to find an answer.
Does anybody have an alternative for the Bourne shell? The goal is to return the port number if the expression matches.
I am new to scripting on Unix-like operating systems.
In the meantime, I looked at grep, but it seems to apply the regex to file input only.
You should be able to use the expr utility to do this, but note that it uses POSIX basic regular expressions, which means that you need to backslash your parentheses to turn them into captures:
response='{"port":98989}'
reg='{"port":\([0-9]*\)}'
port=$(expr "$response" : "$reg")
expr returns failure if the regex doesn't match, so you could use a shell conditional to test:
port=$(expr "$response" : "$reg") || { echo Failed; }
or
if ! port=$(expr "$response" : "$reg"); then
# Do something on failure
fi
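Put together, the expr approach could look like the sketch below. The function name extract_port is hypothetical, and the pattern is tightened slightly (at least one digit, anchored at the end), which is an assumption about what the caller wants.

```bash
#!/bin/sh
# Print the port from a response of the form {"port":N}; fail otherwise.
extract_port() {
    # expr anchors the pattern at the start of the string implicitly.
    # \( \) is a BRE capture; [0-9][0-9]* means "one or more digits".
    expr "$1" : '{"port":\([0-9][0-9]*\)}$'
}

if port=$(extract_port '{"port":98989}'); then
    echo "port is $port"
else
    echo "no port found" >&2
fi
```

One caveat: expr also returns a failure status when the captured text is 0, so a response carrying port 0 would take the failure branch.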
With /bin/sh:
#!/bin/sh
response='{"port":98989}'
case $response in
'{"port":'[0-9]*'}')
port=${response#*:} # {"port":98989} --> 98989}
port=${port%'}'} # 98989} --> 98989
esac
printf 'response %s yields port %s\n' "$response" "$port"
Note that a case statement does not use regular expressions but shell filename globbing patterns. In a glob, [0-9]* means "a single digit followed by any characters", so the pattern above will also trigger for bogus strings like {"port":0xxx}.
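If that looseness matters, one possible workaround is to extract the candidate first and then reject anything containing a non-digit. This is a sketch only; port_from_response and the variable candidate are hypothetical names.

```bash
#!/bin/sh
# Print the port if the response is exactly {"port":<digits>}; fail otherwise.
port_from_response() {
    case $1 in
        '{"port":'*'}') ;;          # right overall shape, contents unchecked
        *) return 1 ;;
    esac
    candidate=${1#'{"port":'}       # {"port":98989} --> 98989}
    candidate=${candidate%'}'}      # 98989}         --> 98989
    case $candidate in
        ''|*[!0-9]*) return 1 ;;    # empty, or contains a non-digit
    esac
    printf '%s\n' "$candidate"
}

port_from_response '{"port":98989}'
```

The second case statement expresses "digits only" by negation, since a glob cannot express "one or more digits" directly.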
If the response string is a JSON document:
$ response='{"port":98989}'
$ printf '%s\n' "$response" | jq .port
98989
There is trouble with ' and " when using [[ regexps (sometimes; not always) so I would try this instead (which works fine for me):
#!/bin/bash
REG=\{\"port\"\:\([0-9]\*\)\} # This line is altered
RESPONSE='{"port":98989}'
if [[ $RESPONSE =~ $REG ]]; then
echo funkar
fi
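Much of the quoting trouble mentioned above comes from one rule: in bash 3.2 and later, a quoted right-hand side of =~ is matched as a literal string, not as a regex. A small sketch, assuming bash 3.2 or later with default compatibility settings:

```bash
#!/usr/bin/env bash
# The braces are escaped so that they are literal on any regex library.
re='\{"port":([0-9]+)\}'
response='{"port":98989}'

# Unquoted expansion: $re is interpreted as an extended regular expression.
if [[ $response =~ $re ]]; then
    echo "regex match, port ${BASH_REMATCH[1]}"
fi

# Quoted expansion: "$re" is searched for as a literal substring, so no match.
if ! [[ $response =~ "$re" ]]; then
    echo "no literal match"
fi
```

Storing the pattern in a single-quoted variable, as here, usually avoids the need for the heavy backslash escaping shown above.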

How to capture the beginning of a filename using a regex in Bash?

I have a number of files in a directory named test_file_name_edit/, each containing a ? in its name. I want to use a Bash script, edit_file_names.sh, to cut each file name off right before the first ?. For example, these would be my current filenames:
test.file.1?twagdsfdsfdg
test.file.2?
test.file.3?.?
And these would be my desired filenames after running the script:
test.file.1
test.file.2
test.file.3
However, I can't seem to capture the beginning of the filenames in my regex to use in renaming the files. Here is my current script:
#!/bin/bash
cd test_file_name_edit/
regex="(^[^\?]*)"
for filename in *; do
$filename =~ $regex
echo ${BASH_REMATCH[1]}
done
At this point I'm just attempting to print off the beginnings of each filename so that I know that I'm capturing the correct string, however, I get the following error:
./edit_file_names.sh: line 7: test.file.1?twagdsfdsfdg: command not found
./edit_file_names.sh: line 7: test.file.2?: command not found
./edit_file_names.sh: line 7: test.file.3?.?: command not found
How can I fix my code to successfully capture the beginnings of these filenames?
Regex as such may not be the best tool for this job. Instead, I'd suggest using bash parameter expansion. For example:
#!/bin/bash
files=('test.file.1?twagdsfdsfdg' 'test.file.2?' 'test.file.3?.?')   # quoted so ? is not globbed
for f in "${files[@]}"; do
echo "${f} shortens to ${f%%\?*}"
done
which prints
test.file.1?twagdsfdsfdg shortens to test.file.1
test.file.2? shortens to test.file.2
test.file.3?.? shortens to test.file.3
Here, ${f%%\?*} expands f and trims the longest suffix that matches a ? followed by any characters (the ? must be escaped since it's a wildcard character).
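Applied to the actual renaming task, the same expansion might be used as in the sketch below. The function name shorten_names is hypothetical, and skipping empty or clashing target names is an assumption about the desired behaviour, not something the question specifies.

```bash
#!/usr/bin/env bash
# Rename every file in directory $1 whose name contains "?",
# keeping only the part of the name before the first "?".
shorten_names() {
    local dir=$1 path name new
    for path in "$dir"/*\?*; do
        [[ -e $path ]] || continue                   # glob matched nothing
        name=${path##*/}                             # file name without directory
        new=${name%%\?*}                             # drop the first "?" and the rest
        [[ -n $new && ! -e $dir/$new ]] || continue  # skip empty or clashing names
        mv -- "$path" "$dir/$new"
    done
}
```

A call such as `shorten_names test_file_name_edit` would then rename the three example files to test.file.1, test.file.2 and test.file.3.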
You are missing the test command [[ ]]:
for filename in *; do
[[ $filename =~ $regex ]] && echo ${BASH_REMATCH[1]}
done

Bash: shell script if statement using multiple conditions including regex

I am currently studying shell scripting and have run into a syntax issue.
What I am trying to do is make the 'if' statement catch any user input containing letters, except the line "giveup".
here is the code that I built:
if [ $usrGuess =~ *[:alpha:]* && $usrGuess != "giveup" ]
When I run the code, it gives the following error message:
[: missing `]'
If you guys have any solution to this, I will be happy to hear your advice :)
Thanks!
The test ([) builtin of any shell (or the external one) does not support conditional constructs such as && and ||, or command separators such as ;, inside it.
Also, [ does not support regex matching with =~. By the way, your regex pattern is not correct; it looks more like a glob pattern (and a glob should suffice in this case).
Both of the above are supported by the [[ keyword of bash, but not all shells support it.
So, you can do:
if [[ $usrGuess = *[[:alpha:]]* && $usrGuess != "giveup" ]]
Here, I have switched to [[ and used the glob match $usrGuess = *[[:alpha:]]* (dropping the regex matching).
Use double brackets, as your condition is composite:
if [[ $usrGuess =~ [[:alpha:]] && $usrGuess != "giveup" ]]
A slightly different approach using grep command would also work.
if grep -v '^giveup$' <<<"$usrGuess" | grep -iq '^[a-z]*$'
In this example, we use the exit code of the grep command to make an if-else decision. Also note the '-q' option to the second grep command, which makes it match the pattern silently.
Pro: A less complicated if clause.
Con: There are two grep processes executed.
If you did want to retain POSIX compatibility, use the expr command to perform the regular expression match.
if expr "$usrGuess" : '[[:alpha:]]*' > /dev/null && [ "$usrGuess" != "giveup" ]
Either way, I'd opt to check against "giveup" first; if that check fails, you avoid the more expensive regular-expression check altogether.
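The suggestion to test for "giveup" first can be sketched in both a bash and a POSIX flavour. The function names are hypothetical, and "contains at least one letter" is an assumed reading of what the question wants.

```bash
#!/usr/bin/env bash
# bash: the cheap string comparison runs first; && short-circuits on failure.
is_letter_guess() {
    [[ $1 != giveup && $1 == *[[:alpha:]]* ]]
}

# POSIX sh: case tries its patterns in order, so "giveup" is rejected first.
is_letter_guess_posix() {
    case $1 in
        giveup)        return 1 ;;
        *[[:alpha:]]*) return 0 ;;
        *)             return 1 ;;
    esac
}

for guess in apple giveup 12345 a1; do
    if is_letter_guess "$guess"; then
        echo "$guess: accepted"
    else
        echo "$guess: rejected"
    fi
done
```

Both versions use glob patterns rather than regular expressions, so no external process such as grep or expr is started.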

How do I print out all the value my regex could take in bash

I would like to print all the values a regex could match in bash. How can I do that, given that the following code doesn't work?
regex="^db10300[7-9]$"
for valueofregex in $regex
do
echo "$valueofregex";
done
It should of course print :
db103007
db103008
db103009
Thanks in advance
Looks like you are searching for brace expansion:
for value in db10300{7..9}
do
echo "$value"
done
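Bash cannot enumerate the strings a regex accepts, but it can generate candidates and keep those that match. The sketch below combines brace expansion with =~; list_matches is a hypothetical helper, and the candidate range {0..9} is an assumption chosen to cover the last digit.

```bash
#!/usr/bin/env bash
# Print every candidate (arguments 2..n) that matches the regex in argument 1.
list_matches() {
    local regex=$1 candidate
    shift
    for candidate in "$@"; do
        if [[ $candidate =~ $regex ]]; then
            echo "$candidate"
        fi
    done
}

# Brace expansion produces db103000 ... db103009; the regex keeps 7, 8 and 9.
list_matches '^db10300[7-9]$' db10300{0..9}
```

Note that brace expansion happens before variable expansion, so the range bounds have to be written literally rather than taken from variables.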

Bash regular expression execution hangs on long expressions

I need to validate a comma-separated string of 38 fields. Fields can be integers, decimals, or strings that are allowed to be empty.
The problem is that when I construct a regular expression for all 38 fields and try to execute it, it hangs forever.
I use following per field reg exps:
INT="[0-9]+"
TIM="[0-9]+"
NUM="[0-9]+(\.[0-9]+)?"
STR=".*" # --> (also tried "[^,]*" but no change)
I constructed my regexps with above expressions.
1) This is working fine: (Output: "matches")
[[ "str1,1.1,5,6,7,8,9,str2,str3,str4,str1,1.1,5,6,7,8,9,str2,str3,str4,str1,1.1,5,6,7,8,9,str2,str3,str4,str1,1.1,5,6,7,8,9,str2,str3,str4" =~ ^.*\,[0-9]+(\.[0-9]+)?\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,.*\,.*\,.*\,.*\,[0-9]+(\.[0-9]+)?\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,.*\,.*\,.*\,.*\,[0-9]+(\.[0-9]+)?\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,.*\,.*\,.*\,.*\,[0-9]+(\.[0-9]+)?\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,.*\,.*\,.*$ ]] && echo matches
2) This hangs and execution never completes:
[[ "str1,1.1,5,6,7,8,9,str2,str3,str4,str5,str6,str7,str8,str9,str10,str11,2.0,str12,0.0,5.0,str13,12312545645,45456456478,78979754545,12312545645,45456456478,78979754545,78979754545,4.74,0.1245,4.174,0.4245,6,80,str14,str15" =~ ^.*\,[0-9]+(\.[0-9]+)?\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,.*\,.*\,.*\,.*\,.*\,.*\,.*\,.*\,.*\,.*\,[0-9]+(\.[0-9]+)?\,.*\,[0-9]+(\.[0-9]+)?\,[0-9]+(\.[0-9]+)?\,.*\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+(\.[0-9]+)?\,[0-9]+(\.[0-9]+)?\,[0-9]+(\.[0-9]+)?\,[0-9]+(\.[0-9]+)?\,[0-9]+\,[0-9]+\,[0-9]+(\.[0-9]+)?\,.*\,.*$ ]] && echo matches
I thought .* is too generic then tried [^,]* but nothing changed.
Please advise how I can solve this without first splitting on "," and then comparing the fields one by one.
!!! Correction !!!
Above I stated:
STR=".*" # --> (also tried "[^,]*" but no change)
This is wrong. I noticed that I had failed to replace all of them. When I replaced every .* with [^,]*, the problem was resolved. See below:
3) This is fixed version and working as expected:
[[ "str1,1.1,5,6,7,8,9,str2,str3,str4,str5,str6,str7,str8,str9,str10,str11,2.0,str12,0.0,5.0,str13,12312545645,45456456478,78979754545,12312545645,45456456478,78979754545,78979754545,4.74,0.1245,4.174,0.4245,6,80,1,str15,str16" =~ ^[^,]*\,[0-9]+(\.[0-9]+)?\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[^,]*\,[^,]*\,[^,]*\,[^,]*\,[^,]*\,[^,]*\,[^,]*\,[^,]*\,[^,]*\,[^,]*\,[0-9]+(\.[0-9]+)?\,[^,]*\,[0-9]+(\.[0-9]+)?\,[0-9]+(\.[0-9]+)?\,[^,]*\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+\,[0-9]+(\.[0-9]+)?\,[0-9]+(\.[0-9]+)?\,[0-9]+(\.[0-9]+)?\,[0-9]+(\.[0-9]+)?\,[0-9]+\,[0-9]+\,[0-9]+(\.[0-9]+)?\,[^,]*\,[^,]*$ ]] && echo matches
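Watch out for catastrophic backtracking, which is what I learned about from this issue: each .* can also match commas, so the matcher may try an enormous number of ways to split the string between fields before giving up, whereas [^,]* can never cross a field boundary.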
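Writing 38 fields by hand makes it easy to leave a stray .* behind, so one option is to assemble the pattern from the per-field definitions. The following is a sketch under stated assumptions: build_csv_regex and validate_line are hypothetical helpers, and STR uses [^,]* as in the corrected version.

```bash
#!/usr/bin/env bash
INT='[0-9]+'
NUM='[0-9]+(\.[0-9]+)?'
STR='[^,]*'          # never crosses a comma, so fields cannot overlap

# build_csv_regex TYPE...: join the per-field patterns with commas and anchor both ends.
build_csv_regex() {
    local regex='' type
    for type in "$@"; do
        case $type in
            INT) regex+=",$INT" ;;
            NUM) regex+=",$NUM" ;;
            STR) regex+=",$STR" ;;
            *)   return 1 ;;          # unknown field type
        esac
    done
    printf '^%s$' "${regex#,}"        # drop the leading comma
}

# validate_line LINE TYPE...: succeed if LINE matches the generated pattern.
validate_line() {
    local line=$1 regex
    shift
    regex=$(build_csv_regex "$@") || return 2
    [[ $line =~ $regex ]]
}

validate_line 'str1,1.1,5,,str2' STR NUM INT STR STR && echo matches
```

For the 38-field case the type list would simply be longer; the field order is taken from the question and is not checked by this sketch.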
Sorry I am not able to comment since my reputation is lower than 50. :(
Will the following regex work for you?
^([A-Za-z0-9\s\.]+\,){37}[A-Za-z0-9\s\.]+$