Matching exactly one whitespace inside if statement - regex

I'm trying to match a file right now to change the name of the file
tempString="hi"
end="_hi.pdf"
for c in *.pdf; do
tempString="$(echo ${c})"
if [[ $tempString =~ $AA[0-9][0-9][0-9]\.pdf$ ]]
then
echo "inside if"
tempString="$(echo $tempString | tr -d ' ')"
tempString=${tempString%.pdf}
mv "%c" "$monthyear$tempString$end"
fi
done
tempString is set to something like "AA 111.pdf"
I need it to match something like "AA 111.pdf" but not "AA  111.pdf" (one space rather than two). I just want it to match exactly one whitespace character between AA and 111.
it keeps matching both of those examples or neither. i've tried \s, [\s], [:space:], [[:space:]], etc.
i've tried looking it up everywhere but to no avail. can somebody help me out?

The following will match names with one (and only one) space, like AA 111.pdf:
if [[ "$tempString" =~ ^AA" "[0-9][0-9][0-9]\.pdf$ ]];
The trick is to quote your spaces inside the regex.
Update: The following code rejects the example with two (or more) spaces:
tempString="AA  111.pdf"
if [[ "$tempString" =~ ^AA" "[0-9][0-9][0-9]\.pdf$ ]]; then
echo "yes"
else
echo "no"
fi
This prints no
One-liner version:
tempString="AA  111.pdf"; if [[ "$tempString" =~ ^AA" "[0-9][0-9][0-9]\.pdf$ ]]; then echo "yes"; else echo "no"; fi
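A slightly more defensive variant keeps the pattern in a variable, which avoids having to quote the space inside [[ ]]. This is only a sketch: the function name matches_one_space and the sample names are made up for illustration, and {3} is just shorthand for three [0-9] in a row.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only for names like "AA 111.pdf"
# (exactly one space between AA and the three digits).
matches_one_space() {
  local re='^AA [0-9]{3}\.pdf$'
  [[ $1 =~ $re ]]
}

for name in "AA 111.pdf" "AA  111.pdf" "AA111.pdf"; do
  if matches_one_space "$name"; then
    echo "match:    '$name'"
  else
    echo "no match: '$name'"
  fi
done
```

Only the first name should be reported as a match.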

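Try the regex [a-zA-Z]+ [0-9]+ (bash's =~ uses POSIX extended regular expressions, which have no \d shorthand, so write [0-9] instead).
If you want any word character before the space, use \w+ [0-9]+ (\w is a GNU extension, so it only works where the regex library supports it).
If you want any word character before and after the space, use \w+ \w+
To take the file extension into account, add \.pdf$ at the end of any of the regexes above.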
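For use with bash's =~, here is a hedged sketch of the same idea written with POSIX character classes, which are part of the extended regex syntax bash relies on. The names re and is_match, and the sample file names, are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Letters, exactly one space, digits, then a literal ".pdf".
re='^[[:alpha:]]+ [[:digit:]]+\.pdf$'

# Hypothetical helper: returns success when the name fits the pattern.
is_match() {
  [[ $1 =~ $re ]]
}

for name in "AA 111.pdf" "AA  111.pdf" "report 7.pdf" "AA 111.txt"; do
  if is_match "$name"; then
    echo "match:    $name"
  else
    echo "no match: $name"
  fi
done
```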

Related

Mix of regex and non-regex in bash if-statement

Inside of my $foo variable I have this data (please pay close attention to the .s and ,s):
,example.com,de.wikipedia.org,reddit,stackoverflow.com.,amazon.,
I am trying to write an if statement in bash that basically works like this:
if [[ "${foo}" =~ *','[a-z0-9]','* || "${foo}" =~ *','[a-z0-9]'.,'* ]]; then
echo "Invalid input detected"
else
echo "OK"
fi
It would echo Invalid input detected since reddit and amazon. are in $foo.
If I change the contents of $foo to be:
,example.com,de.wikipedia.org,www.reddit.com,stackoverflow.com.,amazon.com,
Then it would echo OK.
I am using bash 3.2.57(1)-release on OS X 10.11.6 El Capitan.
Try:
if [[ $foo =~ ,[a-z0-9]*, || $foo =~ ,[a-z0-9]*\., ]]; then
echo "Invalid input detected"
else
echo "OK"
fi
Notes:
=~ is a regular expression operator. The right-hand-side needs to be a regular expression, not a glob.
, is not a shell-active character. Thus, it does not need any special quoting.
[a-z0-9] matches exactly one alphanumeric. Since we want to allow any number of them, use [a-z0-9]*
In regular expressions, ','* matches zero or more commas. This is not what you want. One might write ,.* which, because . is a wildcard, matches a comma followed by zero or more of anything. Since the regex is not anchored to the end, adding a final .* makes no difference.
Inside of [[...]] there is no word splitting. So shell variables do not need the double-quoting that they need elsewhere.
Note that, in [a-z0-9], the exact characters that match a-z or 0-9 depend on the collation order in the locale.
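To make the notes above concrete, here is a small sketch that wraps the two regexes in a function and runs them against both sample values from the question. The function name has_invalid_entry is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the comma-separated list contains an
# entry with no dot at all (",reddit,") or only a trailing dot (",amazon.,").
has_invalid_entry() {
  local list=$1
  [[ $list =~ ,[a-z0-9]*, || $list =~ ,[a-z0-9]*\., ]]
}

bad=',example.com,de.wikipedia.org,reddit,stackoverflow.com.,amazon.,'
good=',example.com,de.wikipedia.org,www.reddit.com,stackoverflow.com.,amazon.com,'

for foo in "$bad" "$good"; do
  if has_invalid_entry "$foo"; then
    echo "Invalid input detected"
  else
    echo "OK"
  fi
done
```

Note that an empty entry (two commas in a row) also counts as invalid here, because [a-z0-9]* can match nothing.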

'$' in regexp in bash

I really don't know what I'm doing.
In variable a, I want to find the first appearance of '$' after the first appearance of 'Bitcoin', and print everything after it until the first newline.
I have the following code:
a = 'something Bitcoin something againe $jjjkjk\n againe something'
if [[ $a =~ .*Bitcoin.*[\$](.*).* ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
In this example I would like to get 'jjjkjk'. All I get is 'no'.
This code might be really flawed, I have no experience in this. I think tho the problem might be with the '$' sign. Please help!
Properly handle newlines in bash with ANSI-C Quoting -- \n sequences become literal newlines.
a=$'something Bitcoin something againe $jjjkjk\n againe something'
regex=$'Bitcoin[^$]*[$]([^\n]+)'
[[ $a =~ $regex ]] && declare -p BASH_REMATCH
declare -ar BASH_REMATCH='([0]="Bitcoin something againe \$jjjkjk" [1]="jjjkjk")'
# .................................................................^^^^^^^^^^^^
To verify the contents contain newlines:
$ printf '%s' "$regex" | od -c
0000000 B i t c o i n [ ^ $ ] * [ $ ] (
0000020 [ ^ \n ] + )
0000026
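If the difference between the two quoting styles is unclear, this sketch compares string lengths and then reruns the match. The variable names plain, ansi and captured are only for illustration.

```bash
#!/usr/bin/env bash
plain='one\ntwo'   # single quotes: the backslash and the n stay two characters
ansi=$'one\ntwo'   # ANSI-C quoting: \n becomes one real newline character

echo "plain length: ${#plain}"
echo "ansi length:  ${#ansi}"

# Same data and regex as above, both written with ANSI-C quoting.
a=$'something Bitcoin something againe $jjjkjk\n againe something'
regex=$'Bitcoin[^$]*[$]([^\n]+)'

captured=""
if [[ $a =~ $regex ]]; then
  captured=${BASH_REMATCH[1]}
fi
echo "captured: $captured"
```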
Here is a working version of your code:
a='something Bitcoin something againe $jjjkjk\n againe something'
r=".*Bitcoin.*[\$]([^\n]*).*"
if [[ $a =~ $r ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
You need to find 'Bitcoin' and then a '$' somewhere after it, whatever lies in between, so use .* between them. To capture text up to a specific character, a negated bracket expression [^...] is the usual tool; here [^\n] is meant to capture everything up to the \n. Be aware that inside brackets the regex engine does not treat \n as a newline: [^\n] excludes a backslash and the letter n, which happens to work here because the single-quoted string contains a literal backslash followed by n.
Your variable assignment was also broken: a = "..." is not valid because spaces are not allowed around the =, so the correct form is a="...".
Using double quotes would be wrong too, because the shell would expand $jjjkjk as a (most likely empty) variable.
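The quoting point is easy to check for yourself. In this sketch the variable jjjkjk is deliberately set to an empty string to mimic the usual situation where no such variable has a value; the names single and double are just for the demo.

```bash
#!/usr/bin/env bash
jjjkjk=""   # assumption for the demo: the variable has no useful value

single='againe $jjjkjk'   # single quotes keep the dollar sign literally
double="againe $jjjkjk"   # double quotes expand $jjjkjk to its (empty) value

echo "single: [$single]"
echo "double: [$double]"
```

With double quotes the dollar sign never reaches the regex match, so there is nothing left for [\$] to find.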

shell script odd regex

I have some regex that is behaving oddly in my shell script. I keep the patterns in variables, and I have tried every which way to get them to behave, but they don't seem to work as regexes at all, even though I know my regex quite well thanks to regex101. Here is what a sample looks like:
fname="direcheck"
FIND="*"
if [[ $fname =~ $FIND ]]; then
echo "no quotes"
fi
if [[ "$fname" =~ "$FIND" ]]; then
echo "with quotes"
fi
right now it will display nothing
if i change find to
FIND="[9]*"
then it prints no quotes
if i say
FIND="[a-z]*"
then it prints no quotes
if i say
FIND="dircheck"
then nothing prints
if i say
FIND="*ck"
then nothing prints
I don't get how this regex is working
how do i use these variables, and what is the proper syntax?
* and *ck are invalid regular expressions. It would work (with no quotes) if you were comparing with ==, not =~. If you want to use the same functionality that you get in == for them, the equivalent regexps are .* and .*ck.
[9]* is any number (including zero) of characters that are 9. There are zero 9 characters in your direcheck, so it matches.
dircheck is not found in direcheck, so not printing anything is hardly surprising.
[a-z]* is any number of characters that are between a and z (i.e. any number of lowercase letters). This will match, assuming it's not quoted.
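A side-by-side sketch of the points above, using the same fname as the question. The *_result variable names are made up, and the [9]+ case is an extra illustration that is not in the question.

```bash
#!/usr/bin/env bash
fname="direcheck"

# Glob matching with ==: an unquoted * on the right-hand side is a wildcard.
if [[ $fname == *ck ]]; then glob_result="match"; else glob_result="no match"; fi

# Regex matching with =~: the equivalent of the glob *ck is .*ck, anchored with $.
re='.*ck$'
if [[ $fname =~ $re ]]; then regex_result="match"; else regex_result="no match"; fi

# [9]* may match the empty string, so it succeeds even though there is no 9.
zero_or_more='[9]*'
if [[ $fname =~ $zero_or_more ]]; then star_result="match"; else star_result="no match"; fi

# Asking for at least one 9 with + gives a meaningful answer.
one_or_more='[9]+'
if [[ $fname =~ $one_or_more ]]; then plus_result="match"; else plus_result="no match"; fi

echo "glob *ck:    $glob_result"
echo "regex .*ck\$: $regex_result"
echo "regex [9]*:  $star_result"
echo "regex [9]+:  $plus_result"
```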
I finally figured it out, and why it was working so oddly
[a-z]* and [9]* and [anythinghere]* they all match because it matches zero or more times. so "direcheck" has [9] zero or more times.
so
if [[ "$fname" =~ $FIND ]]; then
or
if [[ $fname =~ $FIND ]]; then
are both correct, and
if [[ "$fname" =~ "$FIND" ]]; then
matches only when $fname contains the value of $FIND literally, because a quoted $FIND is treated as a literal string, not a regex
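Here is a short sketch of the quoting difference, assuming bash 3.2 or later without the compat31 option. The pattern [a-z]+ and the variable literal_text are chosen only for the demonstration.

```bash
#!/usr/bin/env bash
fname="direcheck"
FIND='[a-z]+'
literal_text='see [a-z]+ here'

# Unquoted: $FIND is interpreted as a regex, so the lowercase letters match.
if [[ $fname =~ $FIND ]]; then unquoted="match"; else unquoted="no match"; fi

# Quoted: "$FIND" is matched as literal text, and direcheck does not contain
# the seven characters [a-z]+ anywhere.
if [[ $fname =~ "$FIND" ]]; then quoted="match"; else quoted="no match"; fi

# Quoted against a string that really contains the characters [a-z]+.
if [[ $literal_text =~ "$FIND" ]]; then quoted_literal="match"; else quoted_literal="no match"; fi

echo "unquoted:       $unquoted"
echo "quoted:         $quoted"
echo "quoted literal: $quoted_literal"
```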

Bash regex does not accept slash

i am pretty new to bash shell scripting (and linux too)... i try to do a simple script which involves some regex for a string given by keyboard from a user.
clear
read -p "Insert e-mail > "
if [[ $REPLY =~ ^[.] ]]
then
echo "ERROR (code 1): e-mail cannot start with \".\""
elif [[ $REPLY =~ .[.]$ ]]
then
echo "ERROR (code 2): e-mail cannot end with \".\""
else
if [[ $REPLY =~ ^[0-9][0-9a-zA-Z!#$%^\&\'*+-]+$ ]] #THIS IS WHERE I NEED HELP
then
echo "Good!"
else
echo "Bad!"
fi
fi
so what i want to do is to make a regex
so that the user cant start with . or end with . (i pretty much did that and its working)...
next what i wanted to do was make the string start with a number and i did that with ^[0-9] (i think this is correct)
and after that..string could be anything like a number 0-9 or letters a-z and A-Z or the next characters: !#$%^&'*+-/
so when user entered 1& (it starts with number and the rest is in the acceptable characters) but it didn't work.. because it need to be \& (at the regex formula).
next the same problem occurred to character ' what i did, was to add again a backslash to regex formula (\') and it worked..
then i tried to do the same with the / character (slash character), so what i did was add a backslash before it, \/ (backslash slash), but when the user entered 1/ (it starts with a number and the rest are acceptable characters) unfortunately it printed "Bad!" ... it should print Good!..
why is that happening?
i tried \/ and \\/ but still... cant understand why it doesn't work!
The problem is the ! in your character class, which triggers history expansion.
I suggest declaring your regex beforehand like this:
re="^[0-9][0-9a-zA-Z\!#$%^&/*'+-]+$"
Then use it as:
s='1/'
[[ $s =~ $re ]] && echo "good" || echo "bad"
good
Actually, /s work in character classes just fine:
$ [[ "1/" =~ ^[0-9][/]+$ ]]; echo $?
0
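Another way to sidestep most of the escaping trouble is to build the pattern with ANSI-C quoting, where only the single quote needs a backslash. This is a sketch rather than a full e-mail check; the helper name is_valid and the test inputs are assumptions.

```bash
#!/usr/bin/env bash
# Inside $'...' the characters !, $, & and / are taken literally;
# \' is the only escape needed here, and the trailing - stays a literal hyphen.
re=$'^[0-9][0-9a-zA-Z!#$%^&/*\'+-]+$'

# Hypothetical helper: succeeds when the input starts with a digit and
# continues with one or more of the allowed characters.
is_valid() {
  [[ $1 =~ $re ]]
}

for s in '1/' '1&' "1'" 'a1' '1 x'; do
  if is_valid "$s"; then
    echo "Good! ($s)"
  else
    echo "Bad!  ($s)"
  fi
done
```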

how to match two strings with regular expression in bash shell script?

I want to automate some task in a shell script. Among the code I need to make a comparison between two names that share the same digit but differ in one letter. I have a bunch of strings:
YC1SM YM1SM YC1SN YM1SN
YC4SM YM4SM YC4SN YM4SN
I need to match between the following:
$a=YC1SM
$b=YM1SM
or
$a=YC4SM
$b=YM4SM
or
$a=YC4SN
$b=YM4SN
I need to have an if clause using regular expression basically to do something like this:
if [$a matches $b]; then
command xxx
fi
How can I do this match within bash?
Edit:
The names are all the same length. They all differ in just one letter. This differing letter occur at the same position in the strings (here, the second character).
Edit2:
Added more scenario
Build a pattern from variable a and match b against the pattern.
a=YC1SM
b=YM1SM
pattern="${a:0:1}?${a:2}"
echo "$pattern"
[[ $b == $pattern ]] && echo match
Y?1SM
match
If the unmatched char must be a letter, change ? to [[:alpha:]]
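Wrapped in a function with the [[:alpha:]] variant, this could look like the sketch below. The function name differs_only_in_second is made up, and it assumes the names contain no glob characters such as * or ?.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when b equals a everywhere except possibly
# the second character, which must be a letter.
differs_only_in_second() {
  local a=$1 b=$2
  local pattern="${a:0:1}[[:alpha:]]${a:2}"
  # The right-hand side is left unquoted so it is treated as a glob pattern.
  [[ $b == $pattern ]]
}

if differs_only_in_second YC1SM YM1SM; then echo "YC1SM / YM1SM: match"; fi
if differs_only_in_second YC4SN YM4SN; then echo "YC4SN / YM4SN: match"; fi
if ! differs_only_in_second YC1SM YM4SM; then echo "YC1SM / YM4SM: no match"; fi
```

Two identical names also count as a match with this pattern, since the second character of a is itself a letter.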
You can have this comparison like this using BASH regex:
a=YC123SM
b=YART123JKL
[[ "$a" =~ ([0-9]+) ]] && n1="${BASH_REMATCH[1]}"
[[ "$b" =~ ([0-9]+) ]] && n2="${BASH_REMATCH[1]}"
[[ "$n1" -eq "$n2" ]] && echo "same" || echo "not same"
same
You don't need a regex. Just use the substring operation like this:
c="${a:0:1}${b:1:1}${a:2}"
if [[ "$c" == "$b" ]]; then
command xxx
fi
The substring operator works like this: ${var:first:length}
So the first line takes the first character of a, then the second character of b, then everything from the third character to the end of a.
In your case this will create a copy of a (called c) that will have all of the letters from a except it will contain the second character from b, which is the only character that you say can be different. Since this character is copied from b to make c, c will now match b if that character was the only difference.
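As a runnable sketch of the substring approach with a plain string comparison, the steps can be put in a small function. The name same_except_second is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rebuild a with b's second character spliced in,
# then compare the result with b as a string.
same_except_second() {
  local a=$1 b=$2
  local c="${a:0:1}${b:1:1}${a:2}"
  [[ "$c" == "$b" ]]
}

a=YC4SN
b=YM4SN
if same_except_second "$a" "$b"; then
  echo "$a and $b differ at most in the second character"
fi

if ! same_except_second YC1SM YM4SM; then
  echo "YC1SM and YM4SM differ elsewhere too"
fi
```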