I'm trying to use regex in an if statement in a bash script, but I am getting different values.
The script:
#!/bin/bash
a="input2.txt"
paramCheck(){
while read -r line; do
d=( $line )
e=${d[@]:1}
for i in "$e"; do
if [ "$i" == $[st][0-9] ]; then
echo "$i"
fi
done
done < "$a"
}
echo `paramCheck`
The text file:
add $s1 $s2 $s3
sub $t0
sub $t1 $t0
addi $t1 $t0 $s5
The predicted results:
$s1 $s2 $s3 $t0 $t1 $t0 $t1 $t0 $s5
The actual result was: nothing printed out.
You have to use double brackets for pattern matching and escape the dollar sign, as it is a special character in bash. Replace
if [ "$i" == $[st][0-9] ]; then
with
if [[ "$i" = \$[st][0-9] ]]; then
Here's one way you could do this using various standard utilities:
$ cut -d' ' -f2- infile | grep -o '\$[st][[:digit:]]' | paste -sd ' '
$s1 $s2 $s3 $t0 $t1 $t0 $t1 $t0 $s5
cut removes the first space separated column
grep finds all matches of the pattern and prints them one per line
paste gets the output on a single line
In pure Bash:
#!/usr/bin/env bash
while read -ra line; do
for word in "${line[@]:1}"; do
[[ $word == \$[st][[:digit:]] ]] && printf '%s ' "$word"
done
done < 'input2.txt'
reads directly into an array with read -a
no intermediate assignment, loop directly over elements of "${line[@]:1}"
use [[ ]] for pattern matching, escape $, use locale-safe [[:digit:]] instead of [0-9]
use printf instead of echo to suppress linebreaks
Notice that this'll add a trailing blank.
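If the trailing blank matters, one option is to collect the matches in an array and print them in one go. This is only a minimal sketch: it assumes the sample input is supplied inline through a here-document instead of being read from input2.txt, and the names matches and result are illustrative.

```shell
#!/usr/bin/env bash
# Collect every operand that looks like $s<digit> or $t<digit>.
matches=()
while read -ra line; do
    for word in "${line[@]:1}"; do
        [[ $word == \$[st][[:digit:]] ]] && matches+=("$word")
    done
done <<'EOF'
add $s1 $s2 $s3
sub $t0
sub $t1 $t0
addi $t1 $t0 $s5
EOF

# "${matches[*]}" joins the elements with the first character of IFS
# (a space by default), so there is no trailing blank.
result="${matches[*]}"
printf '%s\n' "$result"
```

With the sample input this should print the nine registers on one line, separated by single spaces.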
A few pointers for your code:
d=( $line ) relies on word splitting and is subject to filename expansion; if you have a word * in $line, it'll expand to all files in the directory.
e=${d[@]:1} assigns the second and later elements of the array to a single string – now we don't have an array any longer. To keep the array, use e=("${d[@]:1}") instead.
for i in "$e" now has $e containing all the elements in a single string, and the quoting suppresses word splitting, so for the first line, this'll put all of $s1 $s2 $s3 into i instead of just $s1. The intent is probably for i in $e, but that's again subject to word splitting and glob expansion; use an array instead.
[ ] doesn't support pattern matching, use [[ ]] instead. $ has to be escaped.
Glob patterns (used here) are not regular expressions. Check the "Patterns" article in the references for a good overview of the differences.
Bash does understand both == and = within [ ], but == isn't portable (as in "POSIX conformant") – it's a good habit to use = instead. Within [[ ]], it's debatable what to use, as [[ ]] isn't portable itself.
echo `cmd` is the same as just cmd.
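The difference between the scalar and the array assignment mentioned in the pointers can be checked with a small sketch. The array contents and the counter names are made up for illustration; the array d is filled directly here rather than by splitting a line.

```shell
#!/usr/bin/env bash
d=(add '$s1' '$s2' '$s3')

# Scalar assignment: the slice is joined into one string.
e=${d[@]:1}
# Array assignment: each element stays separate.
f=("${d[@]:1}")

# Count how many times each loop body runs.
scalar_count=0
for i in "$e"; do scalar_count=$((scalar_count + 1)); done

array_count=0
for i in "${f[@]}"; do array_count=$((array_count + 1)); done

printf 'scalar loop ran %d time(s), array loop ran %d time(s)\n' \
    "$scalar_count" "$array_count"
```

The quoted scalar is visited once as a whole, while the array loop sees each operand separately.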
References:
cut invocation
grep -o manual
paste invocation
Wooledge wiki article about patterns
Related
I'm trying to solve a problem that appeared in my script which doesn't let me match the date+time (YYYY-MM-DD HH:MM:SS) inside a for loop
list='"dt_txt":"2022-06-03 21:00:00"},'
regex_datehour='"dt_txt":"([0-9,-]*.[0-9,:]*)'
for i in $list; do
[[ $i =~ $regex_datehour ]] && echo "${BASH_REMATCH[1]}"
done
It seems that the "." between the two bracket expressions is not matching the space. I suspect this because if I replace the space between the date and the time in the list with a _, it works as intended: list='"dt_txt":"2022-06-03_21:00:00"},'
desired output:
2022-06-03 21:00:00
what I get:
2022-06-03
The problem here is one that catches a lot of people, and that is whitespace breaking. In the for loop, your $list variable is not quoted, and it contains a space:
$ list='"dt_txt":"2022-06-03 21:00:00"},'
$ for i in $list ; do echo "i = $i" ; done ;
i = "dt_txt":"2022-06-03
i = 21:00:00"},
Make sure to double-quote every expansion that contains a variable, except the regex on the right-hand side of =~.
Using an array for list, which is what makes sense when using the for loop from your original code, it would look something like this:
#!/usr/bin/env bash
# filename: re.sh
list=(
'"dt_txt":"2022-06-03 21:00:00"},'
'"dt_txt":"2022-06-03 22:00:00"},'
'"dt_txt":"2022-06-03 23:00:00"},'
)
regex_datehour='"dt_txt":"([0-9,-]*.[0-9,:]*)'
for i in "${list[@]}" ; do
[[ "$i" =~ $regex_datehour ]] && echo "${BASH_REMATCH[1]}"
done
$ ./re.sh
2022-06-03 21:00:00
2022-06-03 22:00:00
2022-06-03 23:00:00
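The reason regexes are the exception to the quoting advice can be shown with a short sketch. It assumes bash 3.2 or later, where a quoted right-hand side of =~ is matched as a literal string; the variable names and the simplified pattern are illustrative, not taken from the question.

```shell
#!/usr/bin/env bash
s='2022-06-03 21:00:00'
re='^[0-9-]+ [0-9:]+$'

# Unquoted: the variable's contents are interpreted as a regex.
if [[ $s =~ $re ]]; then unquoted=match; else unquoted=nomatch; fi

# Quoted: the contents are compared as a literal string instead.
if [[ $s =~ "$re" ]]; then quoted=match; else quoted=nomatch; fi

printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
```

Only the unquoted form should report a match, because the quoted form looks for the pattern text itself inside $s.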
I am trying to write a regex that will validate a number in the range -100 to 100.
The regex I came up with is ^[-+]?([0-9][0-9]?|100)$.
I am looking for the pattern inside a string, not just an integer by itself.
this is my script:
#!/bin/bash
a="input2.txt"
while read -r line; do
mapfile -t d <<< "$line"
for i in "${d[@]}"; do
if [[ "$i" =~ ^[-+]?([0-9][0-9]?|100)$ ]]; then
echo "$i"
fi
done
done < "$a"
this is my input file:
add $s1 $s2 $s3
sub $t0
sub $t1 $t0
addi $t1 $t0 75
lw $s1 -23($s2)
the actual result is nothing.
the expected result:
75 -23($s2)
[...] denotes a set of characters, where the dash can be used to specify a character range. For instance, [4-6u-z] in a regexp means one of the characters 4, 5, 6, u, v, w, x, y, z. Your expression [1-200] simply matches the characters (digits) 0, 1 and 2.
In your case, I would therefore proceed in two steps: first, extract the initial numeric part from your string, and then use an arithmetic comparison on the result. For example (not tested!):
if [[ $i =~ ^-?[0-9]+ ]]
then
intval=${BASH_REMATCH[0]}
if (( intval >= -200 && intval <= 1000 ))
then
....
See the bash man page for an explanation of the BASH_REMATCH array.
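The two-step idea can be sketched end to end for the -100 to 100 range from the question. The helper name in_range and the sample words are hypothetical, and the sketch assumes the numbers carry no leading zeros (bash arithmetic would treat a value such as 08 as invalid octal).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the word starts with an integer in [-100, 100].
in_range() {
    local word=$1 intval
    # Step 1: extract the leading (optionally signed) integer.
    [[ $word =~ ^[-+]?[0-9]+ ]] || return 1
    intval=${BASH_REMATCH[0]}
    # Step 2: compare numerically; the result is the function's exit status.
    (( intval >= -100 && intval <= 100 ))
}

hits=()
for w in add '$s1' 75 '-23($s2)' 150 '-101'; do
    if in_range "$w"; then hits+=("$w"); fi
done
printf '%s\n' "${hits[@]}"
```

Of the sample words, only 75 and -23($s2) should pass both steps.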
#first store your file in an array so that we could pass thru the words
word_array=( $(<filename) )
for i in "${word_array[@]}"
do
if [[ $i =~ ^([[:blank:]]{0,1}-?[0-9]+)([^[:digit:]]?[^[:blank:]]*)$ ]]
#above line looks for the pattern while separating the number and an optional string
#that may follow like ($s2) using '()' so that we could access each part using BASH_REMATCH later.
then
#now we have only the number which could be checked to fall within a range
[ ${BASH_REMATCH[1]} -ge -100 ] && [ ${BASH_REMATCH[1]} -le 100 ] && echo "$i"
fi
done
Sample Output
75
-23($s2)
Note: The pattern might need a bit more testing, but you should get the idea.
I really don't know what I'm doing.
In variable a, I want to find the first appearance of '$' after the first appearance of 'Bitcoin', and print everything after it until the first newline.
I have the following code:
a = 'something Bitcoin something againe $jjjkjk\n againe something'
if [[ $a =~ .*Bitcoin.*[\$](.*).* ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
In this example I would like to get 'jjjkjk'. All I get is 'no'.
This code might be really flawed, I have no experience in this. I think tho the problem might be with the '$' sign. Please help!
Properly handle newlines in bash with ANSI-C Quoting -- \n sequences become literal newlines.
a=$'something Bitcoin something againe $jjjkjk\n againe something'
regex=$'Bitcoin[^$]*[$]([^\n]+)'
[[ $a =~ $regex ]] && declare -p BASH_REMATCH
declare -ar BASH_REMATCH='([0]="Bitcoin something againe \$jjjkjk" [1]="jjjkjk")'
# .................................................................^^^^^^^^^^^^
To verify the contents contain newlines:
$ printf '%s' "$regex" | od -c
0000000 B i t c o i n [ ^ $ ] * [ $ ] (
0000020 [ ^ \n ] + )
0000026
Here is a working version of your code:
a='something Bitcoin something againe $jjjkjk\n againe something'
r=".*Bitcoin.*[\$]([^\n]*).*"
if [[ $a =~ $r ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
You need to find 'Bitcoin' and then a '$' somewhere after it, whatever lies in between, so use .* between them. To capture text up to a specific character, a negated bracket expression [^...] is the usual tool; here that is [^\n]. Because $a is single-quoted, the \n in it is a literal backslash followed by n, and inside a bracket expression [^\n] excludes those two characters rather than a real newline, so the capture stops at the backslash.
There was also a problem with your variable assignment: a = "..." is not valid, because spaces are not allowed around the =. The correct form is a="...".
Double quotes around the value are wrong here too: the shell would expand $jjjkjk as a (probably empty) variable. Use single quotes, or escape the dollar sign.
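The quoting differences that come up in these answers can be checked directly. The following is a small sketch with made-up variable names; jjjkjk is set to an empty string to stand in for an unset variable.

```shell
#!/usr/bin/env bash
single='a\nb'     # backslash and n stay two literal characters
ansi=$'a\nb'      # ANSI-C quoting turns \n into a real newline

printf 'single-quoted length: %d\n' "${#single}"
printf 'ANSI-C quoted length: %d\n' "${#ansi}"

# Inside double quotes an unescaped $name is expanded; \$ keeps the dollar sign.
jjjkjk=''
expanded="price $jjjkjk"
escaped="price \$jjjkjk"
printf '[%s] [%s]\n' "$expanded" "$escaped"
```

The single-quoted string is one character longer than the ANSI-C quoted one, and only the escaped form keeps the text $jjjkjk.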
I need to match a string $str that contains any of
foo{77..93}
and capture the above substring in a variable.
So far I've got:
str=/random/string/containing/abc-foo78_efg/ # for example
if [[ $str =~ (foo[7-9][0-9]) ]]; then
id=${BASH_REMATCH[1]}
fi
echo $id # gives foo78
but this also captures ids outside of the target range (e.g. foo95).
Is there a way to restrict the regex to an exact integer range? (I tried foo[77-93], but that doesn't work.)
Thanks
If you want to use a regex, you're going to have to make it slightly more complex:
if [[ $str =~ foo(7[7-9]|8[0-9]|9[0-3]) ]]; then
id=${BASH_REMATCH[0]}
fi
Note that I have removed the capture group around the whole pattern and am now using the 0th element of the match array.
As an aside, for maximum compatibility with older versions of bash, I would recommend assigning the pattern to a variable and using in the test like this:
re='foo(7[7-9]|8[0-9]|9[0-3])'
if [[ $str =~ $re ]]; then
id=${BASH_REMATCH[0]}
fi
An alternative to using a regex would be to use an arithmetic context, like this:
if (( "${str#foo}" >= 77 && "${str#foo}" <= 93 )); then
id=$str
fi
This strips the "foo" prefix from the start of the variable so that the integer part can be compared numerically. It assumes $str holds just the identifier (for example foo78), not a longer path.
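The regex and the arithmetic context can also be combined, so that the identifier is found inside a longer string and the range is still checked numerically. This is a sketch rather than a drop-in answer: it only looks at the first occurrence of foo followed by digits.

```shell
#!/usr/bin/env bash
str=/random/string/containing/abc-foo78_efg/
id=''

# Capture the digits after "foo", then compare them numerically.
# The 10# prefix forces base 10 so a value such as 08 is not read as octal.
if [[ $str =~ foo([0-9]+) ]] &&
   (( 10#${BASH_REMATCH[1]} >= 77 && 10#${BASH_REMATCH[1]} <= 93 )); then
    id=${BASH_REMATCH[0]}
fi
printf '%s\n' "$id"
```

Changing the bounds then only means editing the two numbers, not rewriting the alternation in the regex.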
Sure is easy to do with Perl:
$ echo foo{1..100} | tr ' ' '\n' | perl -lne 'print $_ if m/foo(\d+)/ and $1>=77 and $1<=93'
foo77
foo78
foo79
foo80
foo81
foo82
foo83
foo84
foo85
foo86
foo87
foo88
foo89
foo90
foo91
foo92
foo93
Or awk even:
$ echo foo{1..100} | tr ' ' '\n' | awk -F 'foo' '$2>=77 && $2<=93 {print}'
foo77
foo78
foo79
foo80
foo81
foo82
foo83
foo84
foo85
foo86
foo87
foo88
foo89
foo90
foo91
foo92
foo93
All I need to do is extract the versioning information from the following file:
my_archive_1.1.1.201_x86_64.tgz
I am trying to extract both the version number which is 1.1.1 and the release number which is 201. Normally I use python for these purposes, but I have been asked not to. How do I do it by just using bash? The filename will always be of the form
([A-Za-z_]+)_([0-9]+\.[0-9]+\.[0-9]+)\.([0-9]+)_x86_64\.tgz
The groups are in parentheses. I need the second and third groups if you start counting from 1.
Use pure BASH:
s='my_archive_1.1.1.201_x86_64.tgz'
[[ $s =~ ^[^_]+_[^_]+_(([^.]+\.){2}[^.]+)\.([^_]+) ]] && \
echo "${BASH_REMATCH[1]}, ${BASH_REMATCH[3]}"
OUTPUT:
1.1.1, 201
Using your own regex:
[[ $s =~ ([A-Za-z_]+)_([0-9]+\.[0-9]+\.[0-9]+)\.([0-9]+)_x86_64\.tgz ]] && \
echo "${BASH_REMATCH[2]}, ${BASH_REMATCH[3]}"
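If the match should fail cleanly for names that do not fit the pattern, the regex can be anchored and wrapped in a function. This is a sketch under assumptions: the function name parse_archive_name is hypothetical, it sets the global variables version and release, and only the two needed groups are captured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sets version and release, or returns 1 if the name does not fit.
parse_archive_name() {
    # Keeping the pattern in a variable avoids quoting surprises inside [[ ]].
    local re='^[A-Za-z_]+_([0-9]+\.[0-9]+\.[0-9]+)\.([0-9]+)_x86_64\.tgz$'
    [[ $1 =~ $re ]] || return 1
    version=${BASH_REMATCH[1]}
    release=${BASH_REMATCH[2]}
}

if parse_archive_name 'my_archive_1.1.1.201_x86_64.tgz'; then
    printf '%s, %s\n' "$version" "$release"
fi
```

The ^ and $ anchors make the function reject names with extra text before or after the expected form.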
You can use simple string substitutions to extract substrings. You don't really need regular expressions. As a bonus, this is portable to other POSIX shells. Whether this is simpler or not is a matter of taste, and also depends on the problem.
s='my_archive_1.1.1.201_x86_64.tgz'
# ${s%%_[0-9]*} is 'my_archive'
s=${s#${s%%_[0-9]*}_}
# s='1.1.1.201_x86_64.tgz'
s=${s%%_*}
# s='1.1.1.201'
release=${s##*.}
version=${s%."$release"}
You might also want to experiment with set:
s='my_archive_1.1.1.201_x86_64.tgz'
oldIFS=$IFS
IFS=_
set $s
# $1 = my, $2=archive, $3=1.1.1.201, $4=x86, $5=64.tgz
# Shift until $1 contains only numbers and periods
while [ "$#" -gt 0 ]; do
case $1 in *[!.0-9]* ) shift ;; *) break ;; esac
done
IFS=.
set $1
version=$1.$2.$3
release=$4
IFS=$oldIFS
Another alternative without using regular expressions:
split=$(echo "my_archive_1.1.1.201_x86_64.tgz" | cut -d'_' -f3)
versionnumber=$(echo "$split" | cut -d'.' -f1,2,3)
releasenumber=$(echo "$split" | cut -d'.' -f4)
echo "$versionnumber $releasenumber"