Does anyone have a recommendation for the best way to write a regular expression that validates CRON fields?
Allow me to explain a little better. I have a config file with individual variables corresponding to the fields in CRON, and I need to verify that each field is valid, i.e. 0-59 for minutes, 1-31 for the day of the month, etc. I'm using sed to update CRON, and if the configuration file has syntax errors (accidental extra characters, letters, anything that CRON doesn't like) the results are disastrous (the CRON file is clobbered).
I would need to verify all possible numbers and wildcards and throw an error on anything else. I don't know if I'm just getting tired or what, but I can't seem to get started logically on this one.
I'm open to any suggestions, not just coding: how to prevent CRON from getting clobbered, or maybe keeping the whole CRON entry in one string in the config file instead of individual variables.
Thanks for any help.
Here is an example of the config. Very simple.
# SUMMARY REPORT FREQUENCY ( * Wildcards acceptable )
MIN="30"
HOUR="*"
DAY="12"
MON="*"
WEEK="*"
* UPDATE *
Ubuntu 12.04 LTS which ships with Bash 4.2.25
and here is the code that is doing the updating.
function REPORT.CHECK {
sleep 1s
if [ "`crontab -l | grep report.sh`" \> " " ]; then
CTMP="$(set -f; crontab -l | grep report.sh)"
if [ "$CTMP" = "$MIN $HOUR $DAY $MON $WEEK cd $DIR && ./report.sh" ]; then
if [ "$DISABLE" = "false" ]; then
RETURN="true"
fi
else
if [ "$DISABLE" = "false" ]; then
CTMPESC=$(sed 's/[\*\.&]/\\&/g' <<<"$CTMP")
DIRESC=$(sed 's/[\*\.&]/\\&/g' <<<"$DIR")
crontab -l | sed "s%$CTMPESC%/$MIN /$HOUR /$DAY /$MON /$WEEK cd $DIRESC \&\& \./report\.sh" | crontab -
RETURN="update"
fi
fi
if [ "$DISABLE" = "true" ]; then
crontab -l | grep -F -v report.sh | crontab -
RETURN="disable"
fi
else
if [ "$DISABLE" = "true" ]; then
RETURN="exit"
else
(crontab -l ; echo "$MIN $HOUR $DAY $MON $WEEK cd $DIR && ./report.sh") | crontab -
RETURN="default"
fi
fi
}
This snippet of code actually does quite a bit. It adds the entry to CRON if it doesn't exist. It stops the script (well, returns "exit") if the reporting portion is disabled in the config. It updates CRON if what is in CRON differs from what's in the config. Finally, if the config is identical to what's in CRON, it does nothing and moves on. Those features are not listed in order. Hopefully that adds enough detail.
If you are sticking with the regex-based approach, this set of regexes (regeces?) should get you started. It doesn't support names for days of the week or months, nor "frequency" notation like */5 for every five minutes. But try this (assuming the path to your config file is stored in $configfile):
min=$(grep -P 'MIN="([0-5]?[0-9]|\*)"' $configfile | grep -oP '([0-5]?[0-9]|\*)')
hour=$(grep -P 'HOUR=\"([1-2]?[0-9]|\*)"' $configfile | grep -oP "([1-2]?[0-9]|\*)")
day=$(grep -P 'DAY=\"([1-3]?[0-9]|\*)"' $configfile | grep -oP "([1-3]?[0-9]|\*)")
mon=$(grep -P 'MON=\"(1?[0-9]|\*)"' $configfile | grep -oP "(1?[0-9]|\*)")
week=$(grep -P 'WEEK=\"([0-7]|\*)"' $configfile | grep -oP "([0-7]|\*)")
After you've collected these values, you can easily check whether they're in the correct range. For example, the HOUR regex can match 29, which obviously isn't a real hour. But now that the value is saved, you can do:
# an empty value means the grep found no valid HOUR line; "*" must skip the numeric test
if [ -z "$hour" ] || { [ "$hour" != "*" ] && [ "$hour" -gt 23 ]; }; then
#throw an error, exit the test, whatever
fi
Just make sure to quote the variables when you test them! For example, "$hour", not $hour. If you have an * in a variable and don't quote it, the shell will expand it inline to all the filenames in your current directory.
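As an alternative to grepping the file, here is a minimal sketch that validates the values after the config has been sourced. The function name valid_cron_field is made up for this example, and like the regexes above it only accepts * or a plain number (no names, lists, ranges or */5 steps).

```shell
#!/bin/bash
# Hypothetical helper: succeed if $1 is "*" or a whole number within [$2, $3].
valid_cron_field() {
    local value=$1 min=$2 max=$3
    [[ $value == '*' ]] && return 0
    # one or two digits only, so letters and stray characters are rejected
    [[ $value =~ ^[0-9]{1,2}$ ]] || return 1
    # 10# forces base 10 so a value like "08" is not parsed as octal
    (( 10#$value >= min && 10#$value <= max ))
}

# Example values standing in for the sourced config file.
MIN="30" HOUR="*" DAY="12" MON="*" WEEK="*"

if valid_cron_field "$MIN" 0 59 && valid_cron_field "$HOUR" 0 23 &&
   valid_cron_field "$DAY" 1 31 && valid_cron_field "$MON" 1 12 &&
   valid_cron_field "$WEEK" 0 7; then
    echo "config OK"
else
    echo "invalid cron field in config" >&2
fi
```

Running the check before any crontab -l | sed ... | crontab - pipeline means a bad config never reaches the crontab at all.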
This question already has answers here:
How to parse $QUERY_STRING from a bash CGI script?
All right, folks, you may have seen this infamous quirk to get hold of those values:
query=`echo $QUERY_STRING | sed "s/=/='/g; s/&/';/g; s/$/'/"`
eval $query
If the query string is host=example.com&port=80 it works just fine and you get the values in bash variables host and port.
However, you may know that a cleverly crafted query string will cause an arbitrary command to be executed on the server side.
I'm looking for a secure replacement or an alternative not using eval. After some research I dug up these alternatives:
read host port <<< $(echo "$QUERY_STRING" | tr '=&' ' ' | cut -d ' ' -f 2,4)
echo $host
echo $port
and
if [[ $QUERY_STRING =~ ^host=([^&]*)\&port=(.*)$ ]]
then
echo ${BASH_REMATCH[1]}
echo ${BASH_REMATCH[2]}
else
echo no match, sorry
fi
Unfortunately these two alternatives only work if the parameters come in the order host, port, but they could come in the opposite order. There could also be more than two parameters, and any order is possible and allowed. So how do you propose to get the values into the appropriate bash variables? Can the above methods be amended? Remember that with n parameters there are n! possible orders: with 2 parameters there are only 2, but with 3 there are already 3! = 6.
I returned to the first method. Can it be made safe to run eval? Can you transform $QUERY_STRING with sed in a way that makes it safe to do eval $query?
EDIT: Note that this question differs from the other one referred to and is not a duplicate. The emphasis here is on using eval in a safe way. That is not answered in the other thread.
This method is safe. It does not eval or execute the QUERY_STRING. It uses string manipulation to break up the string into pieces:
QUERY_STRING='host=example.com&port=80'
declare -a pairs
IFS='&' read -ra pairs <<<"$QUERY_STRING"
declare -A values
for pair in "${pairs[@]}"; do
IFS='=' read -r key value <<<"$pair"
values["$key"]="$value"
done
echo do something with "${values[host]}" and "${values[port]}"
URL "percent decoding" left as an exercise.
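For what it's worth, here is one possible sketch of that decoding step. The function name urldecode is hypothetical; it relies on printf '%b' to interpret \xHH escapes, and it doubles any literal backslashes first so that input such as \n is not interpreted as an escape.

```shell
#!/bin/bash
# Hypothetical helper: decode one URL-encoded key or value.
urldecode() {
    local s="${1//+/ }"        # in query strings "+" stands for a space
    s=${s//\\/\\\\}            # double literal backslashes so %b leaves them alone
    # turn every %XX into \xXX, then let printf %b interpret the escapes
    printf '%b' "${s//%/\\x}"
}

urldecode 'example.com%3A8080+is+up'; echo
# prints: example.com:8080 is up
```

It would be called on each key and value after the split, e.g. values["$(urldecode "$key")"]="$(urldecode "$value")". Nothing is ever passed to eval, so the decoded text stays plain data.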
You must avoid executing strings at all times when they come from untrusted sources. Therefore I would strongly suggest never using eval in Bash to do something with such a string.
To be really safe, I think I would echo the string into a file, use grep to retrieve parts of the string and remove the file afterwards. Always use a directory outside the web root.
#! /bin/bash
MYFILE=$(mktemp)
QUERY_STRING='host=example.com&port=80&host=lepmaxe.moc&port=80'
echo "${QUERY_STRING}" > ${MYFILE}
TMP_ARR=($(grep -Eo '(host|port)[^&]*' ${MYFILE}))
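# ${#TMP_ARR[@]} is the number of elements; a bare ${#TMP_ARR} is only the length of the first one
[ ${#TMP_ARR[@]} -gt 0 ] || exit 1
[ $((${#TMP_ARR[@]} % 2)) -eq 0 ] || exit 1
declare -A ARRAY;
for ((i = 0; i < ${#TMP_ARR[@]}; i+=2)); do
tmp=$(echo ${TMP_ARR[@]:$((i)):2})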
port=$(echo $tmp | sed -r 's/.*port=([^ ]*).*/\1/')
host=$(echo $tmp | sed -r 's/.*host=([^ ]*).*/\1/')
ARRAY[$host]=$port
done
for i in ${!ARRAY[@]}; do
echo "$i = ${ARRAY[$i]}"
done
rm ${MYFILE}
exit 0
This produces:
lepmaxe.moc = 80
example.com = 80
I'm currently trying to get into bash regular expressions to change multiple filenames at the same time. Here are the file names:
a_001_D_xy_S37_L003_R1_001.txt
a_001_D_xy_S37_L003_R2_001.txt
a_002_D_xy_S37_L006_R1_001.txt
a_002_D_xy_S37_L006_R2_001.txt
a_003_D_xy_S23_L003_R1_001.txt
a_003_D_xy_S23_L003_R2_001.txt
I want this as my result:
a_002_D_xy_R1.txt
a_002_D_xy_R2.txt
...
I only want to change those with *001.txt at the end. First I want to remove the _S.._L00. in the filenames and the 001 in the end. I split this procedure in two parts:
for file in *001.txt;
do
echo ${file#_S.._L..6}
done
This loop already does not work. As a second alternative I tried:
for file in *001.fastq.gz;
do
echo ${file/_S.._L00./}
done
but the filenames are again unchanged. (I just use echo here to see the results. If it works I will replace it with mv ${file} ${regularexpression})
Thanks for help!
Considering that you need lots of different fields it is possibly better to just split the filename and then reconstruct it as you wish.
I suggest using an array built by splitting the original filename with _. Then you just reconstruct the new name by using the fields that you wish.
for file in *001.txt; do
echo "FILE: $file"
IFS='_' read -r -a fileFields <<< "$file"
echo "FILE FIELDS: "
for index in "${!fileFields[@]}"; do
echo "- $index ${fileFields[index]}"
done
fileName="${fileFields[0]}_${fileFields[1]}_${fileFields[2]}_${fileFields[3]}_${fileFields[-2]}.txt"
echo "NEW FILE NAME: $fileName"
# mv -- "$file" "$fileName"
done
The echo commands are just for debugging; you can remove them all once you understand the code.
However, if you really need to split the string using BASH expressions you can check this post:
Extracting part of a string to a variable in bash or take a look at this BASH cheat sheet.
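As for why the two attempts in the question leave the names unchanged: the patterns in ${var#pattern} and ${var/pattern/} are shell globs, not regular expressions, so a dot matches only a literal dot, and # only strips from the start of the string. A minimal sketch with ? (any single character) in place of the dots; the function name short_name is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: print the shortened name for one file name.
short_name() {
    local name=$1
    name=${name/_S??_L00?/}     # "?" matches any single character
    name=${name%_001.txt}.txt   # drop the trailing _001 before the extension
    printf '%s\n' "$name"
}

short_name a_002_D_xy_S37_L006_R1_001.txt
# prints: a_002_D_xy_R1.txt
```

Inside the loop this would become mv -- "$file" "$(short_name "$file")" once the echoed names look right.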
Try a function that loops over the matching files directly with a glob; there is no need to count them first or to parse the output of ls (indexing into ls output also skips files, because the list shrinks as files are renamed).
functionRename(){
for file in *_001.txt
do
# ${file%_S??_L*} keeps the part before _Sxx_Lxxx; ${file##*_L???_} keeps what follows it
new="${file%_S??_L*}_${file##*_L???_}"
# drop the trailing _001 before the extension
mv -- "$file" "${new%_001.txt}.txt"
done
}
functionRename
Here is an example of logs in my /var/www/apache2/log folder-
./no_domain_access.log.7.gz
./no_domain_access.log.8.gz
./no_domain_access.log.9.gz
./no_domain_error.log.10.gz
./no_domain_error.log.11.gz
./no_domain_error.log.12.gz
./no_domain_error.log.13.gz
./no_domain_error.log.14.gz
./no_domain_error.log.15.gz
./no_domain_error.log.16.gz
./no_domain_error.log.17.gz
./no_domain_error.log.18.gz
./no_domain_error.log.19.gz
./no_domain_error.log.20.gz
and goes until 50...
I would like to iterate over those files and remove all log files numbered greater than 5.
Regex syntax gives me the option to match numbers with a pattern like [1-9] or {1,2}, but this also matches the log files that I don't want to delete (the single-digit 1-5 log files that I wish to keep).
How can I match only file names with numbers higher than 5?
Thanks!
You can use awk one-liner for this:
printf '%s\n' *[0-9].gz | awk -F '.' '$(NF-1) > 5'
This awk command uses a dot as the field separator and compares $(NF-1) (the numeric field before the extension) with 5, so logs 1 to 5 are kept.
To delete these files use:
printf '%s\n' *[0-9].gz | awk -F '.' '$(NF-1) > 5' | xargs rm
xargs takes the file names from awk and passes them to rm, which deletes them.
Use the bash regex operator =~ to extract the number, and list the file if the number is greater than 5:
for file in /var/www/apache2/log/*.gz; do
test -f "$file" || continue
[[ $file =~ ^.*log\.([[:digit:]]+).*$ ]] && { (( "${BASH_REMATCH[1]}" > 5 )) && printf "%s\n" "$file"; }
done
If you just want to delete the files, replace printf "%s\n" by just rm.
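If you would like to try the selection on a list of names before deleting anything, the same test can be wrapped in a small function. This is only a sketch; the name logs_older_than and the idea of passing the threshold as the first argument are assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical helper: print every name after the first argument whose
# rotation number is greater than the first argument.
logs_older_than() {
    local keep=$1 f
    shift
    for f in "$@"; do
        # capture the number between ".log." and ".gz"; 10# forces base 10
        if [[ $f =~ \.log\.([0-9]+)\.gz$ ]] && (( 10#${BASH_REMATCH[1]} > keep )); then
            printf '%s\n' "$f"
        fi
    done
}

logs_older_than 5 no_domain_access.log.5.gz no_domain_access.log.6.gz no_domain_error.log.10.gz
# prints:
# no_domain_access.log.6.gz
# no_domain_error.log.10.gz
```

With real files it could be called as logs_older_than 5 /var/www/apache2/log/*.gz, and the printed names reviewed before they are handed to rm.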
Find with regular expressions
find . -regex './no_domain_access.log.*gz' ! -regex './no_domain_access.log.[1-5].gz'
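This finds all files matching no_domain_access.log.*gz and then uses a second regular expression to exclude the files numbered 1 to 5 from those results. (The same pair of expressions would be needed for the no_domain_error logs.)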
Without regular expressions, using shell globs and entirely native & portable POSIX shell code:
rm -f no_domain_access.log.[6-9].gz no_domain_access.log.[0-9][0-9].gz
It's easier in bash:
rm -f no_domain_access.log.{6..50}.gz
These are probably created with logrotate or a similar log rotation utility. You might want to just change its configuration to only store five logs.
If it's controlled by logrotate, you can find the documentation with man logrotate and you'll probably find something like this:
/var/log/no_domain_access.log {
rotate 50
daily
}
Change the 50 to 5 and you're done. You probably(?) still have to clean up the current old logs using one of the above commands.
(I'm in a Bash environment, Cygwin on a Windows machine, with awk, sed, grep, perl, etc...)
I want to add the last folder name to the filename, just before the last underscore (_) followed by numbers or at the end if no numbers are in the filename.
Here is an example of what I have (hundreds of files needed to be reorganized) :
./aaa/A/C_17x17.p
./aaa/A/C_32x32.p
./aaa/A/C.p
./aaa/B/C_12x12.p
./aaa/B/C_4x4.p
./aaa/B/C_A_3x3.p
./aaa/B/C_X_91x91.p
./aaa/G/C_6x6.p
./aaa/G/C_7x7.p
./aaa/G/C_A_113x113.p
./aaa/G/C_A_8x8.p
./aaa/G/C_B.p
./aab/...
I would like to rename all these files like this:
./aaa/C_A_17x17.p
./aaa/C_A_32x32.p
./aaa/C_A.p
./aaa/C_B_12x12.p
./aaa/C_B_4x4.p
./aaa/C_A_B_3x3.p
./aaa/C_X_B_91x91.p
./aaa/C_G_6x6.p
./aaa/C_G_7x7.p
./aaa/C_A_G_113x113.p
./aaa/C_A_G_8x8.p
./aaa/C_B_G.p
./aab/...
I tried many bash for loops with sed and the last one was the following :
IFS=$'\n'
for ofic in `find * -type d -name 'A'`; do
fic=`echo $ofic|sed -e 's/\/A$//'`
for ftr in `ls -b $ofic | grep -E '.png$'`; do
nfi=`echo $ftr|sed -e 's/(_\d+[x]\d+)?/_A\1/'`
echo mv \"$ofic/$ftr\" \"$fic/$nfi\"
done
done
But yet with no success... This \1 does not get inserted in the $nfi...
This is the last one I tried, only working on 1 folder (which is a subfolder of a huge folder collection) and after over 60 minutes of unsuccessful trials, I'm here with you guys.
I modified your script so that it works for all your examples.
IFS=$'\n'
for ofic in ???/?; do
IFS=/ read fic fia <<<$ofic
for ftr in `ls -b $ofic | grep -E '\.p.*$'`; do
nfi=`echo $ftr|sed -e "s/_[0-9]*x[0-9]*/_$fia&/;t;s/\./_$fia./"`
echo mv \"$ofic/$ftr\" \"$fic/$nfi\"
done
done
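The two cases handled by that sed command (a trailing _NxN size, or no size at all) can also be sketched in pure Bash with =~. The function name add_folder is invented for this example, and it only computes the new name; it does not move anything.

```shell
#!/bin/bash
# Hypothetical helper: print the new file name for a folder and a file name.
add_folder() {
    local folder=$1 name=$2
    if [[ $name =~ ^(.*)(_[0-9]+x[0-9]+)(\.[^.]+)$ ]]; then
        # name ends in _<digits>x<digits>: insert the folder before that part
        printf '%s_%s%s%s\n' "${BASH_REMATCH[1]}" "$folder" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        # no size suffix: insert the folder before the extension
        printf '%s_%s.%s\n' "${name%.*}" "$folder" "${name##*.}"
    fi
}

add_folder A C_17x17.p    # prints: C_A_17x17.p
add_folder B C_A_3x3.p    # prints: C_A_B_3x3.p
add_folder G C_B.p        # prints: C_B_G.p
```

As for the \1 that never got inserted in the question's attempt: sed's default (basic) regex syntax treats unescaped ( ) as literal characters and has no \d, so the capture group was never formed.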
# it's easier to change to here first
cd aaa
# process every file
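# find * (rather than find .) prints paths like A/C_17x17.p without a leading ./,
# which is what the substring offsets below assume
for f in $(find * -type f); do
# strips everything from the first / onward, so this is our foldername
foldername=${f/\/*/}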
# creates the new filename from substrings of the
# original filename concatenated to the foldername
newfilename=".${f:1:3}${foldername}_${f:4}"
# if you are satisfied with the output, just leave out the `echo`
# from below
echo mv ${f} ${newfilename}
done
Might work for you.
See here in action. (slightly modified, as ideone.com handles STDIN/find differently...)
I know this is an ugly script but it does the job.
What I am facing now is adding a few more extensions, which would clutter the script even more.
How can I make it more modular?
Specifically, how can I write this long regular expression (source file extensions) on multiple lines, say one extension on each line? I guess I am doing something wrong with string concatenation, but I'm not quite sure what exactly.
Here's the original file:
#!/bin/bash
COMMAND='svn status'
XARGS='xargs'
SVN='svn add'
$COMMAND | grep -E '(\.m|\.mat|\.java|\.js|\.php|\.cpp|\.h|\.c|\.py|\.hs|\.pl|\.xml|\.html|\.sh|\.asm|\.s|\.tex|\.bib|Makefile|\.jpg|\.gif|\.png|\.css)$' | awk ' { print $2 } ' | $XARGS $SVN
and here's roughly what I am aiming at
...code code
'(\.m|
\.mat|
\.js|
.
.
.
\.css)'
..more code here
Anybody?
I know this doesn't answer the question directly, but from a readability perspective, I think most developers would agree that a single-line regex is the most common way to do things and therefore the most maintainable approach. Further, I'm not sure why you're including a period in each of your extensions, this should only need to be used once.
I wrote this little script to automatically add all images to svn. You should be able to simply add extensions between the pipes in the regex to add or remove different file types. Note that it only adds unversioned files, by making sure each line starts with a "?" (^\?) and ends with a period followed by one of the extensions (\.(extensions)$). Hope it's helpful!
#!/bin/bash
svn st | grep -E "^\?.*\.(png|jpg|jpeg|tiff|bmp|gif)$" > /tmp/svn-auto-add-img
while read -r output; do
# second column of "svn st" output is the path (paths containing spaces are not handled)
FILE=$(echo "$output" | awk '{ print $2 }')
svn add "$FILE"
done < /tmp/svn-auto-add-img
exit 0
How about this:
PATTERNS="
\.foo
\.bar
\.baz"
# Put them into one list separated by or ("|").
PATTERNS=`echo $PATTERNS |sed 's/\s\+/|/g'`
$COMMAND | grep -E "($PATTERNS)"
(Note that this would not work if you put quotes around $PATTERNS in the call to echo -- echo is taking care of stripping whitespace and converting newlines to spaces for us.)
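A variation on the same idea, as a sketch: keep the extensions in a Bash array, one per line, and join them with | through IFS. The helper name join_alternatives and the sample extensions are assumptions for the example; the period and the end-of-line anchor are written once around the group.

```shell
#!/bin/bash
# One extension per line; add or remove entries freely.
extensions=(
    m
    mat
    java
    js
)

# Hypothetical helper: join all arguments with "|".
join_alternatives() {
    local IFS='|'
    printf '%s\n' "$*"
}

pattern="\.($(join_alternatives "${extensions[@]}"))\$"
echo "$pattern"
# prints: \.(m|mat|java|js)$
```

The result can then be used as $COMMAND | grep -E "$pattern", and because the array is quoted when it is expanded, no whitespace trickery with echo is needed.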
#!/bin/bash
COMMAND='svn status'
XARGS='xargs'
SVNADD='svn add'
pats=
pats+=' \.m'
pats+=' \.mat'
pats+=' \.java'
pats+=' \.js'
# add your 'or-able' sub patterns here
# build the full pattern
pattern='(';for pat in $pats;do pattern+="$pat|";done;pattern=${pattern%\|}')$'
# run grep with the generated pattern
# quote the pattern so the shell does not split or glob-expand it
files=$($COMMAND | grep -E "${pattern}" | awk ' { print $NF } ')
if [ -n "$files" ]
then
$COMMAND | grep -E "${pattern}" | awk ' { print $NF } ' | $XARGS $SVNADD
fi