Using regex on bash script to identify option parameters - regex

I'm fairly new to this topic, and I apologize if this question is irrelevant. I searched the website thoroughly but didn't find an answer.
I'm writing a shell script for a college project where we use rsync and crontab to sync folders. I'm trying to let the user customize the rsync and crontab parameters, as follows:
rsync accepts -auvn. I tried the following regex in my .sh file:
#!/bin/bash
#(...) lots of previous code
if [[ $1 =~ ^-a?u?v?n? ]]; then
if [ $1 != "-" ]; then
But it accepts arguments such as "-x". You can see that the second if shows that I have no idea of what I'm doing!
crontab accepts five parameters:
min, [0,59] or * if any;
hour, [0,23] or * if any;
day of week, [Mon, Tue, ... , Sun] or * if any;
month, [Jan, Feb, ... , Dec] or * if any;
year, 2017 and forward, or * if any;
I'm not worried about the crontab regex (for now), but I'm struggling to make the rsync regex work. I downloaded the rsync source code to see how it handles its option parameters, but most of it is written in C, which is beyond the scope of this project. I could also just pass whatever the user requests to rsync and watch it explode, but I'm trying to validate it a little first.
Thank you!

[[ $1 =~ ^-[auvn]+$ ]]
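That is, match a leading minus, then one or more of the letters a, u, v, n, up to the end of the string. Your original pattern accepts "-x" because every letter in it is optional and there is no $ anchor, so the bare "-" prefix is already enough to match. To test on the command line: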
$ [[ '-auvn' =~ ^-[auvn]+$ ]] && echo yes || echo no
yes
$ [[ '-a' =~ ^-[auvn]+$ ]] && echo yes || echo no
yes
$ [[ '-x' =~ ^-[auvn]+$ ]] && echo yes || echo no
no
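For use inside a script, the same test can be wrapped in a small function. This is only a minimal sketch: the name valid_rsync_opts is hypothetical, and it accepts just the four letters from the question (repeats such as -aa included), not rsync's full option set.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the argument is "-" followed by
# one or more of the letters a, u, v, n (the set named in the question).
valid_rsync_opts() {
    local re='^-[auvn]+$'
    [[ $1 =~ $re ]]
}

for opt in -auvn -a -x - -av-; do
    if valid_rsync_opts "$opt"; then
        echo "$opt: accepted"
    else
        echo "$opt: rejected"
    fi
done
```

Keeping the pattern in a variable avoids quoting surprises on the right-hand side of =~.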

Related

Bash script with regex and capturing group

I'm working on a bash script to automatically rename files on my Synology NAS.
I have a loop over the files, and everything is OK until I try to make my script more efficient with regex.
I have several bits of code that work as expected:
filename="${filename//[-_.,\']/ }"
filename="${filename//[éèēěëê]/e}"
But I have this:
filename="${filename//t0/0}"
filename="${filename//t1/1}"
filename="${filename//t2/2}"
filename="${filename//t3/3}"
filename="${filename//t4/4}"
filename="${filename//t5/5}"
filename="${filename//t6/6}"
filename="${filename//t7/7}"
filename="${filename//t8/8}"
filename="${filename//t9/9}"
And I would like to use a capture group to get something like this:
filename="${filename//t([0-9]{1,2})/\1}"
filename="${filename//t([0-9]{1,2})/${BASH_REMATCH[1]}}"
I've been looking for a working syntax without success...
The shell's parameter expansion facility does not support regular expressions. But you can approximate it with something like
filename=$(sed 's/t\([0-9]\)/\1/g' <<<"$filename")
This will work regardless of whether the first digit is followed by additional digits or not, so dropping that requirement simplifies the code.
If you want the last or all t[0-9]{1,2}s replaced:
$ filename='abt1cdt2eft3gh'; [[ "$filename" =~ (.*)t([0-9]{1,2}.*) ]] && filename="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"; echo "$filename"
abt1cdt2ef3gh
$ filename='abt1cdt2eft3gh'; while [[ "$filename" =~ (.*)t([0-9]{1,2}.*) ]]; do filename="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"; done; echo "$filename"
ab1cd2ef3gh
Note that the "replace all" case above would keep iterating until all t[0-9]{1,2}s are changed, even ones that didn't exist in the original input but were being created by the loop, e.g.:
$ filename='abtt123de'; while [[ "$filename" =~ (.*)t([0-9]{1,2}.*) ]]; do filename="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"; echo "$filename"; done
abt123de
ab123de
whereas the sed script in @tripleee's answer would not do that:
$ filename='abtt123de'; filename=$(sed 's/t\([0-9]\)/\1/g' <<<"$filename"); echo "$filename"
abt123de
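If you want the single-pass behaviour of sed without spawning a process, one possible approach is to consume the string from the left and never rescan what has already been emitted. The sketch below assumes the simplified pattern t followed by one digit; the function name strip_t_before_digit is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: remove a "t" that directly precedes a digit,
# scanning left to right and never re-examining text already emitted.
strip_t_before_digit() {
    local rest=$1 out='' match
    while [[ $rest =~ t([0-9]) ]]; do
        match=${BASH_REMATCH[0]}                   # leftmost match, e.g. "t1"
        out+=${rest%%"$match"*}${BASH_REMATCH[1]}  # text before it, then the digit
        rest=${rest#*"$match"}                     # continue after the match
    done
    printf '%s\n' "$out$rest"
}

strip_t_before_digit 'abt1cdt2eft3gh'
strip_t_before_digit 'abtt123de'
```

Because each iteration only looks at the unprocessed remainder, a "t" that ends up next to a digit as a result of an earlier replacement is left alone, as with sed.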

Extracting CGI query parameter values in bash [duplicate]

This question already has answers here:
How to parse $QUERY_STRING from a bash CGI script?
(16 answers)
Closed 3 years ago.
All right, folks, you may have seen this infamous quirk to get hold of those values:
query=`echo $QUERY_STRING | sed "s/=/='/g; s/&/';/g; s/$/'/"`
eval $query
If the query string is host=example.com&port=80 it works just fine and you get the values in bash variables host and port.
However, you may know that a cleverly crafted query string will cause an arbitrary command to be executed on the server side.
I'm looking for a secure replacement or an alternative not using eval. After some research I dug up these alternatives:
read host port <<< $(echo "$QUERY_STRING" | tr '=&' ' ' | cut -d ' ' -f 2,4)
echo $host
echo $port
and
if [[ $QUERY_STRING =~ ^host=([^&]*)\&port=(.*)$ ]]
then
echo ${BASH_REMATCH[1]}
echo ${BASH_REMATCH[2]}
else
echo no match, sorry
fi
Unfortunately, these two alternatives only work if the parameters come in the order host, port. But they could come in the opposite order.
There could also be more than two parameters, and any order is possible and allowed. So how do you propose to get the values into the appropriate bash variables? Can the above methods be amended? Remember that with n parameters there are n! possible orders. With 2 parameters there are only 2, but with 3 there are already 3! = 6.
I returned to the first method. Can it be made safe to run eval? Can you transform $QUERY_STRING with sed in a way that makes it safe to do eval $query?
EDIT: Note that this question differs from the other one referred to and is not a duplicate. The emphasis here is on using eval in a safe way. That is not answered in the other thread.
This method is safe. It does not eval or execute the QUERY_STRING. It uses string manipulation to break up the string into pieces:
QUERY_STRING='host=example.com&port=80'
declare -a pairs
IFS='&' read -ra pairs <<<"$QUERY_STRING"
declare -A values
for pair in "${pairs[@]}"; do
IFS='=' read -r key value <<<"$pair"
values["$key"]="$value"
done
echo do something with "${values[host]}" and "${values[port]}"
URL "percent decoding" left as an exercise.
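For the percent decoding, a commonly used bash idiom is to rewrite each %XX as \xXX and let printf expand it. The sketch below is one way to combine it with the loop above; the function name urldecode and the sample query string are illustrative assumptions. Note that printf '%b' also expands any literal backslash sequences present in the input, and malformed escapes are not validated, so treat this as a starting point rather than a hardened decoder.

```bash
#!/bin/bash
# Hypothetical helper: decode application/x-www-form-urlencoded text.
urldecode() {
    local s=${1//+/ }              # "+" encodes a space in query strings
    printf '%b' "${s//%/\\x}"      # turn %XX into \xXX and let printf expand it
}

# Sample input, made up for illustration
QUERY_STRING='host=example.com&path=%2Ftmp%2Fmy+file'

declare -A values
IFS='&' read -ra pairs <<<"$QUERY_STRING"
for pair in "${pairs[@]}"; do
    IFS='=' read -r key value <<<"$pair"
    key=$(urldecode "$key")
    values["$key"]=$(urldecode "$value")
done

printf '%s\n' "${values[host]}" "${values[path]}"
```

Decoding happens after splitting on & and =, so an encoded %26 or %3D inside a value cannot be mistaken for a separator.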
You must never execute strings that come from untrusted sources. I would therefore strongly advise against ever using eval in Bash to process such a string.
To be really safe, I think I would echo the string into a file, use grep to retrieve parts of the string, and remove the file afterwards. Always use a directory outside the web root.
#!/bin/bash
MYFILE=$(mktemp)
QUERY_STRING='host=example.com&port=80&host=lepmaxe.moc&port=80'
echo "${QUERY_STRING}" > "${MYFILE}"
# Collect every host=... and port=... token, one array element each
TMP_ARR=($(grep -Eo '(host|port)[^&]*' "${MYFILE}"))
# Require a non-empty, even number of tokens (host/port pairs)
[ ${#TMP_ARR[@]} -gt 0 ] || exit 1
[ $((${#TMP_ARR[@]} % 2)) -eq 0 ] || exit 1
declare -A ARRAY
for ((i = 0; i < ${#TMP_ARR[@]}; i+=2)); do
    tmp=$(echo ${TMP_ARR[@]:$((i)):2})
    port=$(echo $tmp | sed -r 's/.*port=([^ ]*).*/\1/')
    host=$(echo $tmp | sed -r 's/.*host=([^ ]*).*/\1/')
    ARRAY[$host]=$port
done
for i in ${!ARRAY[@]}; do
    echo "$i = ${ARRAY[$i]}"
done
rm "${MYFILE}"
exit 0
This produces:
lepmaxe.moc = 80
example.com = 80

BASH find regex for arbitrary range of numbers in a large number of files

I am writing a BASH script that, among other things, copies files from one directory to another based on input arguments for the start and end dates. The filenames are of the format YYYYMMDDhhmmss.jpg, e.g. 20161230143922.jpg. I am using find ... -exec cp {} ... because there are tens of thousands of files in the source directory. The input arguments are the start and end date in the format YYYYMMDD.
I know that I can't do a simple range in the regex like ($startdate..$enddate), but I am unable to figure out how to programmatically generate a regex that would work. If I had fewer files I could simply do cp {$startdate..$enddate} destination, but alas I don't think that is feasible.
I would like to copy all files between $startdate and $enddate that fall between the hours of 0500 and 1700. This would include images like 20170102060635.jpg and 20170104131255.jpg, but not 20170103010022.jpg.
This is what I have so far:
#!/bin/bash
STARTDATE=$1
ENDDATE=$2
FILE_NAME="review-${STARTDATE}-${ENDDATE}.mp4"
if [[ -n "$STARTDATE" ]]; then
echo "STARTDATE: $STARTDATE"
else
echo "Invalid start date: '$STARTDATE'"
echo "Syntax: ./create_time_lapse_date_range.sh <startdate> <enddate>"
exit
fi
if [[ -n "$ENDDATE" ]]; then
echo "ENDDATE: $ENDDATE"
else
echo "Invalid end date: '$ENDDATE'"
echo "Syntax: ./create_time_lapse_date_range.sh <startdate> <enddate>"
exit
fi
cd ~/Desktop/test\ timelapse
# Copy relevant files to local directory
find ~/Desktop/originals -regex "???????????????" -exec cp {} ~/Desktop/test\ timelapse/ \;
# Rename files to be sequential serial numbers
find ~/Desktop/test\ timelapse -name "*.jpg" | awk 'BEGIN{ a=0 }{ printf "mv \"%s\" ~/Desktop/\"test\ timelapse/%06d.jpg\"\n", $0, a++ }' | bash
# Generate timelapse video
ffmpeg -framerate 25 -i %06d.jpg -c:v libx264 -r 25 ${FILE_NAME}
Regex isn't the best tool for dealing with numerical ranges, so you may need to consider a solution that incorporates some logic outside the regex itself. Something like this:
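regex="([0-9]{8})([0-9]{4})"
for f in ~/Desktop/originals/*.jpg
do
    if [[ $f =~ $regex ]]
    then
        datepart=${BASH_REMATCH[1]}
        timepart=${BASH_REMATCH[2]}
        # if the DATE part is within the requested range
        if (( $STARTDATE <= $datepart )) && (( $datepart <= $ENDDATE ))
        then
            # if the TIME part (hhmm) starts with an hour from 05 to 17;
            # the pattern must be unquoted, or bash treats it as a literal string
            if [[ $timepart =~ ^(0[5-9]|1[0-7]) ]]
            then
                # copy file ... (":" is a no-op placeholder so the branch parses)
                :
            fi
        fi
    fi
done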
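The same logic can be packaged as a predicate that is easy to test in isolation. This is a sketch under a few assumptions: the function name in_window is hypothetical, the file names follow the YYYYMMDDhhmmss.jpg format from the question, and "between 0500 and 1700" is read as an inclusive numeric comparison on hhmm, which is stricter than the hour-prefix pattern above because it excludes 17:01 to 17:59.

```bash
#!/bin/bash
# Hypothetical helper: succeed if a YYYYMMDDhhmmss.jpg file name falls
# within [start, end] (both YYYYMMDD) and between 05:00 and 17:00 inclusive.
in_window() {
    local name=${1##*/} start=$2 end=$3
    local re='^([0-9]{8})([0-9]{4})[0-9]{2}\.jpg$'
    [[ $name =~ $re ]] || return 1
    local day=${BASH_REMATCH[1]} hhmm=${BASH_REMATCH[2]}
    # 10# forces base 10, so values such as 0830 are not parsed as octal
    (( 10#$day >= 10#$start && 10#$day <= 10#$end )) || return 1
    (( 10#$hhmm >= 500 && 10#$hhmm <= 1700 ))
}

for f in 20170102060635.jpg 20170104131255.jpg 20170103010022.jpg; do
    if in_window "$f" 20170101 20170105; then
        echo "$f: copy"
    else
        echo "$f: skip"
    fi
done
```

In the real script the loop would iterate over the files in the source directory and run cp where this example prints "copy".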
Pure Regex Solution
If you really want a pure regex solution, this will help demonstrate the complexity. Here's a regex to find all the files in the 0500 to 1700 timeframe, for dates in January 2017: ^201701\d{2}(0[5-9]|1[0-7])\d{4}\.jpg$
Notice the regex pattern needed to match times from 0500 to 1700:
(0[5-9]|1[0-7])
It's not pretty, and that's with a hardcoded range. To deal with dynamic start and end dates, you would be building a similar pattern dynamically. It could be done, but why use regex for it?
Here's an example, showing what you would need to generate for a date range from 20161225 to 20170114:
^(201612(2[5-9]|3\d)|201701(0\d|1[0-4]))(0[5-9]|1[0-7])\d{4}\.jpg$

bash regular expression format

My code has a problem comparing a variable against a regular expression.
The main problem is here:
if [[ “$alarm” =~ ^[0-2][0-9]\:[0-5][0-9]$ ]]
This "if" is never true and I don't know why. Even if I pass a value like 13:00 or 08:19 as "$alarm", it is always false and prints "invalid clock format".
When I try ^[0-2][0-9]:[0-5][0-9]$ on a regex testing site, it works; for example, I compared it with 12:20.
I start my script with the command ./alarm 11:12
Below is the whole code:
#!/bin/bash
masa="`date +%k:%M`"
mp3="$HOME/Desktop/alarm.mp3" #change this
echo "lol";
if [ $# != 1 ]; then
echo "please insert alarm time [24hours format]"
echo "example ./alarm 13:00 [will ring alarm at 1:00pm]"
exit;
fi
alarm=$1
echo "$alarm"
#fix me with better regex >_<
if [[ “$alarm” =~ ^[0-2][0-9]\:[0-5][0-9]$ ]]
then
echo "time now $masa"
echo "alarm set to $alarm"
echo "will play $mp3"
else
echo "invalid clock format"
exit;
fi
while [ $masa != $alarm ];do
masa="`date +%k:%M`" #update time
sleep 1 #dont overload the cpu cycle
done
echo $masa
if [ $masa = $alarm ];then
echo ringggggggg
play $mp3 > /dev/null 2> /dev/null &
fi
exit
I can see a couple of issues with your test.
Firstly, it looks like you may be using the wrong kind of double quotes around your variable (“ ”, rather than "). These "fancy quotes" are being concatenated with your variable, which I assume is what causes your pattern to fail to match. You could change them but within bash's extended tests (i.e. [[ instead of [), there's no need to quote your variables anyway, so I would suggest removing them entirely.
Secondly, your regular expression currently allows some invalid times (for example, 25:00). I would suggest using something like this:
re='^([01][0-9]|2[0-3]):[0-5][0-9]$'
if [[ $alarm =~ $re ]]
I have deliberately chosen to use a separate variable to store the pattern, as this is the most widely compatible way of working with bash regexes.
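Putting both points together, a minimal sketch might look like the following; the function name valid_time is hypothetical. It also illustrates a related detail of the script in the question: GNU date's %k pads single-digit hours with a space, whereas %H pads with a zero, so only %H produces strings that can ever equal a validated HH:MM value before 10:00.

```bash
#!/bin/bash
# Hypothetical helper: succeed only for HH:MM with HH in 00-23 and MM in 00-59.
valid_time() {
    local re='^([01][0-9]|2[0-3]):[0-5][0-9]$'
    [[ $1 =~ $re ]]
}

# The last item keeps the typographic quotes as part of the value,
# which is what happened in the original test.
for t in 13:00 08:19 24:00 9:30 '“13:00”'; do
    if valid_time "$t"; then
        echo "$t: valid"
    else
        echo "$t: invalid clock format"
    fi
done

# %H is zero-padded (08:19); %k would give " 8:19" instead.
now=$(date +%H:%M)
if valid_time "$now"; then
    echo "current time $now uses the same format"
fi
```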

How to check an input string in bash it's in version format (n1.n2.n3)

I've written a script that updates a version in a certain file. I need to check that the user's input is in version format, so that I don't end up adding numbers that don't belong in those important files. The way I have done it is by adding a new variable, version_checked, in which sed deletes the input if it matches my regex pattern, followed by an if check.
version=$1
version_checked=$(echo $version | sed -e '/[0-9]\+\.[0-9]\+\.[0-9]/d')
if [[ -z $version_checked ]]; then
echo "$version is the right format"
else
echo "$version_checked is not in the right format, please use XX.XX.XX format (ie: 4.15.3)"
exit
fi
That works fine for XX.XX and XX.XX.XX but it also allows XX.XX.XX.XX and XX.XX.XX.XX.XX etc.. so if user makes a mistake it will input wrong data on the file. How can I get the sed regex to ONLY allow 3 pairs of numbers separated by a dot?
Change your regex from:
/[0-9]\+\.[0-9]\+\.[0-9]/
to this (anchored at both ends, and with \+ so that every group needs at least one digit):
/^[0-9]\+\.[0-9]\+\.[0-9]\+$/
You can do this with bash pattern matching:
$ for version in 1.2 1.2.3 1.2.3.4; do
printf "%s\t" $version
[[ $version == +([0-9]).+([0-9]).+([0-9]) ]] && echo y || echo n
done
1.2 n
1.2.3 y
1.2.3.4 n
If you need each group of digits to be exactly 2 digits:
[[ $version == [0-9][0-9].[0-9][0-9].[0-9][0-9] ]]
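A regex-based variant of the same check, using bash's =~ operator instead of sed or glob patterns, could look like this. It is a sketch only, and the function name valid_version is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeed only for exactly three dot-separated
# groups of one or more digits, e.g. 4.15.3.
valid_version() {
    local re='^[0-9]+\.[0-9]+\.[0-9]+$'
    [[ $1 =~ $re ]]
}

for version in 1.2 1.2.3 1.2.3.4 4.15.3 1..3; do
    if valid_version "$version"; then
        echo "$version y"
    else
        echo "$version n"
    fi
done
```

The ^ and $ anchors are what reject inputs with extra groups such as 1.2.3.4.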