Bash regex not recognizing a single space " "

I'm trying to solve a problem that appeared in my script which doesn't let me match the date+time (YYYY-MM-DD HH:MM:SS) inside a for loop
list='"dt_txt":"2022-06-03 21:00:00"},'
regex_datehour='"dt_txt":"([0-9,-]*.[0-9,:]*)'
for i in $list; do
[[ $i =~ $regex_datehour ]] && echo "${BASH_REMATCH[1]}"
done
It seems that the "." between the two bracket expressions is not matching the space. I think so because if I replace the space between the date and the time in the list with a _, it works as intended: list='"dt_txt":"2022-06-03_21:00:00"},'
desired output:
2022-06-03 21:00:00
what I get:
2022-06-03

The problem here is one that catches a lot of people, and that is whitespace breaking. In the for loop, your $list variable is not quoted, and it contains a space:
$ list='"dt_txt":"2022-06-03 21:00:00"},'
$ for i in $list ; do echo "i = $i" ; done ;
i = "dt_txt":"2022-06-03
i = 21:00:00"},
Make sure to put double quotes around every variable expansion, except a regex on the right-hand side of =~, which must stay unquoted.
Using an array for list, which is what makes sense with the for loop from your original code, it would look something like this:
#!/usr/bin/env bash
# filename: re.sh
list=(
'"dt_txt":"2022-06-03 21:00:00"},'
'"dt_txt":"2022-06-03 22:00:00"},'
'"dt_txt":"2022-06-03 23:00:00"},'
)
regex_datehour='"dt_txt":"([0-9,-]*.[0-9,:]*)'
for i in "${list[@]}" ; do
[[ "$i" =~ $regex_datehour ]] && echo "${BASH_REMATCH[1]}"
done
$ ./re.sh
2022-06-03 21:00:00
2022-06-03 22:00:00
2022-06-03 23:00:00
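To make the word splitting visible, here is a minimal sketch that counts how many pieces the loop sees with and without quotes, then matches the whole string directly. The array names unquoted and quoted and the tightened regex (an explicit space instead of ".") are illustrative assumptions, not part of the original script.

```shell
#!/usr/bin/env bash
list='"dt_txt":"2022-06-03 21:00:00"},'
# Illustrative pattern: a literal space between the date and the time.
regex_datehour='"dt_txt":"([0-9-]+ [0-9:]+)'

# Unquoted expansion: the shell splits the value at the space.
unquoted=()
for i in $list; do unquoted+=("$i"); done

# Quoted expansion: the value stays in one piece.
quoted=()
for i in "$list"; do quoted+=("$i"); done

echo "unquoted pieces: ${#unquoted[@]}"
echo "quoted pieces:   ${#quoted[@]}"

# No word splitting happens inside [[ ]], so no loop is needed for one string.
match=""
[[ $list =~ $regex_datehour ]] && match=${BASH_REMATCH[1]}
echo "$match"
```

For a single string the loop can be dropped entirely; it only earns its place once list is an array.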

Related

'$' in regexp in bash

I really don't know what I'm doing.
In variable a, I want to find the first appearance of '$' after the first appearance of 'Bitcoin', and print everything after it until the first newline.
I have the following code:
a = 'something Bitcoin something againe $jjjkjk\n againe something'
if [[ $a =~ .*Bitcoin.*[\$](.*).* ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
In this example I would like to get 'jjjkjk'. All I get is 'no'.
This code might be really flawed, I have no experience in this. I think tho the problem might be with the '$' sign. Please help!
Properly handle newlines in bash with ANSI-C Quoting -- \n sequences become literal newlines.
a=$'something Bitcoin something againe $jjjkjk\n againe something'
regex=$'Bitcoin[^$]*[$]([^\n]+)'
[[ $a =~ $regex ]] && declare -p BASH_REMATCH
declare -ar BASH_REMATCH='([0]="Bitcoin something againe \$jjjkjk" [1]="jjjkjk")'
# .................................................................^^^^^^^^^^^^
To verify the contents contain newlines:
$ printf '%s' "$regex" | od -c
0000000 B i t c o i n [ ^ $ ] * [ $ ] (
0000020 [ ^ \n ] + )
0000026
Here is a working version of your code:
a='something Bitcoin something againe $jjjkjk\n againe something'
r=".*Bitcoin.*[\$]([^\n]*).*"
if [[ $a =~ $r ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
You need to find 'Bitcoin' and then a '$' after it, whatever lies between them, so use .* in between. To capture text up to a specific character, a negated bracket expression [^...] is the best tool. Note that inside a POSIX bracket expression a backslash is literal, so [^\n] means "anything except a backslash or the letter n"; it stops at the right place here only because the single-quoted string contains a literal backslash followed by n, not a real newline.
Your variable declaration was also wrong: a = "..." is not valid, because spaces are not allowed around the =. The correct form is a='...'.
Double quotes are wrong here too: the shell would expand $jjjkjk as a variable, which is most likely empty.
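The quoting point can be checked with a short sketch. The helper name capture and the simplified pattern are assumptions made for illustration; the sketch also assumes no variable named jjjkjk is set in the environment, which the unset guarantees.

```shell
#!/usr/bin/env bash
unset jjjkjk                    # make sure the name is not defined

double="againe $jjjkjk tail"    # the shell expands $jjjkjk to nothing
single='againe $jjjkjk tail'    # the dollar sign is kept literally

re='[$]([a-z]+)'                # a literal $ followed by lowercase letters

# Hypothetical helper: print the captured group, or "no" when nothing matches.
capture() {
  if [[ $1 =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "no"
  fi
}

from_double=$(capture "$double")
from_single=$(capture "$single")
echo "double quotes: $from_double"
echo "single quotes: $from_single"
```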

Bash only get the first matched result when use regex

There's a string example
"j2sdk/1.8.0_25-static j2sdk/1.8.0_45 j2sdk/1.8.0_p120 j2sdk/1.8.0_40 j2sdk/1.8.0_51"
I want to find the ones matched with format j2sdk/1.8.0_xxx, but xxx only with digits, here, I want below strings be matched
j2sdk/1.8.0_45
j2sdk/1.8.0_40
j2sdk/1.8.0_51
I wrote below code, but when run, it only get the first matched j2sdk/1.8.0_45, anything wrong with my code?
avail_versions="j2sdk/1.8.0_25-static j2sdk/1.8.0_45 j2sdk/1.8.0_p120 j2sdk/1.8.0_40 j2sdk/1.8.0_51"
patern='j2sdk\/1\.8\.0_[0-9]+\s+'
if [[ $avail_versions =~ $patern ]];then
echo matched
echo ${BASH_REMATCH[0]}
echo ${BASH_REMATCH[1]}
echo ${BASH_REMATCH[2]}
fi
The result is that BASH_REMATCH[0] is j2sdk/1.8.0_45, while BASH_REMATCH[1] and [2] are empty.
I expected to get the matches in BASH_REMATCH[1], BASH_REMATCH[2], BASH_REMATCH[3].
Is there another way in Bash to get the expected matches?
Thanks
I split the input at spaces and add back the space after each word.
for s in $avail_versions ; do
s="$s "
if [[ $s =~ $patern ]];then
echo ${BASH_REMATCH[0]}
fi
done
j2sdk/1.8.0_45
j2sdk/1.8.0_40
j2sdk/1.8.0_51
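A variation on the same idea, as a sketch: split the string into an array and test each word against a pattern anchored with ^ and $, so no trailing space has to be added back. The names anchored, words and matches are illustrative and not from the question.

```shell
#!/usr/bin/env bash
avail_versions="j2sdk/1.8.0_25-static j2sdk/1.8.0_45 j2sdk/1.8.0_p120 j2sdk/1.8.0_40 j2sdk/1.8.0_51"

# Anchored at both ends, so the whole word must be j2sdk/1.8.0_<digits>.
anchored='^j2sdk/1\.8\.0_[0-9]+$'

# Split on whitespace into an array without relying on unquoted expansion.
read -r -a words <<< "$avail_versions"

matches=()
for w in "${words[@]}"; do
  if [[ $w =~ $anchored ]]; then
    matches+=("$w")
  fi
done

printf '%s\n' "${matches[@]}"
```

BASH_REMATCH only ever holds the groups of one match, so collecting results in a separate array is what stands in for "all matches".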

preg_match_all equivalent for BASH?

I have a string like this
foo:collection:indexation [options] [--] <text> <text_1> <text_2> <text_3> <text_4>
And I want to use a bash regex to get an array, or a string that I can split, so I can check whether the syntax is correct:
["text", "text_1", "text_2", "text_3", "text_4"]
I have tried to do this :
COMMAND_OUTPUT=$($COMMAND_HELP)
# get the output of the help
# regex
ARGUMENT_REGEX="<([^>]+)>"
GOOD_REGEX="[a-z-]"
# get all the arguments
while [[ $COMMAND_OUTPUT =~ $ARGUMENT_REGEX ]]; do
ARGUMENT="${BASH_REMATCH[1]}"
# bad syntax
if [[ ! $ARGUMENT =~ $GOOD_REGEX ]]; then
echo "Invalid argument '$ARGUMENT' for the command $FILE"
echo "Must only use characters [a-z:-]"
exit 5
fi
done
But the while does not seem to be appropriate since I always get the first match.
How can I get all the matches for this regex ?
Thanks !
The loop doesn't work because every time you're just testing the same input string against the regexp. It doesn't know that it should start scanning after the match from the previous iteration. You'd need to remove the part of the string up to and including the previous match before doing the next test.
A simpler way is to use grep -o to get all the matches.
$COMMAND_HELP | grep -oE "$ARGUMENT_REGEX" | while read -r ARGUMENT; do
if [[ ! $ARGUMENT =~ $GOOD_REGEX ]]; then
echo "Invalid argument '$ARGUMENT' for the command $FILE"
echo "Must only use characters [a-z:-]"
exit 5
fi
done
Bash doesn't have this directly, but you can achieve a similar effect with a slight modification.
string='foo...'
re='<([^>]+)>'
while [[ $string =~ $re(.*) ]]; do
string=${BASH_REMATCH[2]}
# process as before
done
This matches the regex we want and also everything in the string after the regex. We keep shortening $string by assigning only the after-our-regex portion to it on every iteration. On the last iteration, ${BASH_REMATCH[2]} will be empty so the loop will terminate.
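Put together, the shortening loop might look like the sketch below, which collects every capture into an array. The sample string is the one from the question; the names rest and args are assumptions, and the trailing (.*) is folded into the pattern variable for readability.

```shell
#!/usr/bin/env bash
string='foo:collection:indexation [options] [--] <text> <text_1> <text_2> <text_3> <text_4>'

# Group 1 is the argument name, group 2 is everything after the match.
re='<([^>]+)>(.*)'

args=()
rest=$string
while [[ $rest =~ $re ]]; do
  args+=("${BASH_REMATCH[1]}")   # keep this match
  rest=${BASH_REMATCH[2]}        # continue scanning after it
done

printf '%s\n' "${args[@]}"
```

Because the regex is matched leftmost-first, each iteration picks up the next <...> in order, and the loop stops as soon as the remainder contains no further match.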

In bash how do I match the a string of the form [SOME_ALPHA_NUM_WORD]?

I have tried stuff like =~ "\[[A-Za-z0-9]+\]", which I would expect to work but doesn't. I also tried "[[A-Za-z0-9]+]" and "\[[:alnum:]+\]". What am I doing wrong? Sample line I want to match: [RTNUT18] (I am iterating through a file, and some lines are of this form)
This is my code snippet:
while read line;
do
if [[ $line =~ "^\[[A-Za-z0-9]+\]$" ]]; then
echo match
else
echo no match
fi
done < $1
This is a sample file:
[RBPAT7]
Whatever=foo,bla
Otherline
RRR
and I run:
./script.sh thefile.txt
I am not getting a hit on the [RBPAT7] line at all
Stuff like that isn't enough. You must use it in [[.
$ [[ [foo] =~ ^\[[A-Za-z0-9]+\]$ ]] ; echo $?
0
EDIT:
Unlike test, [[ does not need quotes around its arguments. With the quotes your code matches nothing: in bash 3.2 and later, a quoted right-hand side of =~ is compared as a literal string, not as a regular expression. Remove the quotes.
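The difference can be seen side by side in a small sketch, assuming bash 3.2 or later with default compatibility settings. The variable names are illustrative; the pattern is stored in a variable so that only the quoting of the expansion changes.

```shell
#!/usr/bin/env bash
line='[RBPAT7]'
re='^\[[A-Za-z0-9]+\]$'

# Unquoted expansion: the value is used as a regular expression.
if [[ $line =~ $re ]]; then unquoted='match'; else unquoted='no match'; fi

# Quoted expansion: the value is compared as a literal string.
if [[ $line =~ "$re" ]]; then quoted='match'; else quoted='no match'; fi

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```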

Bash script with regex not behaving on Ubuntu

I have a Bash script that is working on my OpenSuSE box, but when copied across to my Ubuntu box, is not working. The script reads in from a file. The file has fields separated by white space (tabs and spaces).
#!/bin/bash
function test1()
{
while read LINE
do
if [[ $LINE =~ "^$" || $LINE =~ "^#.*" ]] ; then
continue;
fi
set -- $LINE
local field1=$1
local field2=$2
done < test.file
}
test1
with test.file containing:
# Field1Header Field2Header
abcdef A-2
ghijkl B-3
There seem to be two problems:
(1) $field2, the one with the hyphen, is blank
(2) The regex to strip out the blank lines and lines that start with # is not working
Anyone know what's wrong? As I said, it works fine on OpenSuSE.
Thanks,
Paul
Apparently, as of bash 3.2 the regular expression should not be quoted. So this should work:
#!/bin/bash
while read LINE
do
if [[ $LINE =~ ^$ || $LINE =~ ^#.* ]] ; then
continue;
fi
set -- $LINE
field1=$1   # no "local" here: it is only valid inside a function
field2=$2
done < test.file
Edit: you should probably use Jo So's answer, as it's definitely cleaner. But I was explaining why the regex fails and the reason for the different behavior between OpenSuSE and Ubuntu (very probably a different version of bash).
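For reference, a self-contained sketch of the unquoted-regex approach, wrapped in a function so that local is valid again. The function name parse_fields, the combined pattern skip_re and the inline sample input are assumptions; the input mirrors test.file from the question and is fed through a pipe instead of a file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "field1|field2" for every data line on stdin.
parse_fields() {
  local line field1 field2 rest
  # Blank (or whitespace-only) lines and comment lines are skipped.
  local skip_re='^([[:space:]]*|#.*)$'
  while IFS= read -r line; do
    if [[ $line =~ $skip_re ]]; then
      continue
    fi
    # Let read split on spaces and tabs; extra columns land in "rest".
    read -r field1 field2 rest <<< "$line"
    printf '%s|%s\n' "$field1" "$field2"
  done
}

output=$(printf '%s\n' \
  '# Field1Header Field2Header' \
  '' \
  $'abcdef\tA-2' \
  'ghijkl   B-3' | parse_fields)
echo "$output"
```

Keeping the pattern in a variable and expanding it unquoted sidesteps the quoting change between bash versions altogether.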
Quoting is wrong, that probably accounts for the regex failing.
No need to use bashisms.
No need to use set
Try
while read field1 field2 dummy
do
if ! test "${field1%%#*}"
then
continue
fi
# do stuff here
done
EDIT: The obvious version using set
while read -r line
do
if ! test "${line%%#*}"
then
continue
fi
set -- $line
do_stuff_with "$@"
done
On my Ubuntu there is no expression like "=~" for the test command. Just use this one:
if [[ $LINE = "" || ${LINE:0:1} = "#" ]] ; then
continue;
fi