Using bash (parameter expansion) to sanitize input file - regex

I have a bash script that has a function like so:
sanitize(){
rb_reg="^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$"
if grep -Ex "${rb_reg}" "${1}/.ruby-version" > /dev/null 2>&1; then
sanitize_tmp="$(<"${1}"/.ruby-version)" &&
ruby_version="${sanitize_tmp//[^0-9\.]/}" &&
echo "Setting Ruby Version: ${ruby_version}"
else
echo "There was an error trying to sanitize a .ruby-version file"
echo "The file was: ${1}/.ruby-version"
exit 7
fi
}
I'm using it to check a .ruby-version file and then set the version in there as a variable.
Mostly these files will contain something sensible like: 2.0.0 which works OK. I want to be defensive and not trust the input file, so check/sanitize it as much as possible.
Two questions:
1. If for some reason there were multiple version numbers in the file on multiple lines, say:
'2.0.0
1.0.0'
That's going to smash them together currently removing white space and end up with a variable like: '2.0.01.0.0'
What's a good way to only pick up the first version number that matches the regex?
2. Is there a better way to do this, maybe entirely in bash without grep? Appreciate any examples people have of checking for a version like this but not trusting the input file.

I'm still playing around with this a little, but here is what I ended up doing.
I'm passing in the file name as an argument to the function elsewhere in the script. Really liked the concept of BASH_REMATCH, so tried to avoid using grep, sed, awk etc and do it this way.
You can view the latest version of the code here: https://github.com/octopusnz/scripts
sanitize(){
if [[ "${#}" -ne 1 ]]; then
echo "[ERROR 7]: We expected 1 argument to the sanitize() function."
echo "But we got ${#} instead."
exit 7
fi
rbv_reg="^([0-9]{1,2})\.([0-9]{1,2})\.([0-9]{1,2})(-([a-z]{1,10}))?$"
reg_matches=0
while read -r rbv_line || [[ -n "$rbv_line" ]]; do
if [[ "${rbv_line}" =~ ${rbv_reg} ]]; then
ruby_version="${BASH_REMATCH[0]//[^0-9a-z\.\-]/}" &&
((reg_matches="${reg_matches}"+1)) &&
echo "" &&
echo "Setting Ruby version: ${ruby_version}" &&
break
fi
done < "${1}"
if [[ "${reg_matches}" -lt 1 ]]; then
if [[ -v ruby_version ]]; then
echo "We couldn't parse ${1} and set a valid Ruby version."
echo "Using default: ${ruby_version}"
else
echo "We couldn't parse ${1} and set a default Ruby version."
echo "[ERROR 4]: No valid .ruby-version file found."
exit 4
fi
fi
}
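For readers new to BASH_REMATCH, here is a minimal sketch of how the capture groups are filled in after a successful =~ match. The helper name parse_version and the sample strings are made up for illustration; the pattern is the one used in the function above.

```bash
#!/bin/bash
# Hypothetical helper: prints "major minor patch suffix" for a valid version string.
parse_version() {
    local re='^([0-9]{1,2})\.([0-9]{1,2})\.([0-9]{1,2})(-([a-z]{1,10}))?$'
    if [[ $1 =~ $re ]]; then
        # [0] is the whole match, [1]..[3] are the numeric groups,
        # [5] is the optional suffix (empty when it is absent).
        echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]} ${BASH_REMATCH[3]} ${BASH_REMATCH[5]:-none}"
    else
        return 1
    fi
}

parse_version "2.0.0"           # prints: 2 0 0 none
parse_version "2.7.1-preview"   # prints: 2 7 1 preview
parse_version "2.0.01.0.0" || echo "rejected"
```

Because the pattern is anchored at both ends and applied to one line at a time, the smashed-together string from the question is rejected instead of being silently accepted.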

Related

Check if a string contains valid pattern in Bash

I have a file a.txt contains a string like:
Axxx-Bxxxx
Rules for checking if it is valid or not include:
length is 10 characters.
x here is digits only.
Then, I try to check with:
#!/bin/bash
exp_len=10;
file=a.txt;
msg="checking string";
tmp="File not exist";
echo $msg;
if[ -f $file];then
tmp=$(cat $file);
if[[${#tmp} != $exp_len ]];then
msg="invalid length";
elif [[ $tmp =~ ^[A[0-9]{3}-B[0-9]{4}]$]];then
msg="valid";
else
msg="invalid";
fi
else
msg="file not exist";
fi
echo $msg;
But in valid case it doesn't work...
Is there someone help to correct me?
Thanks :)

Apart from the regex fix, your code has several syntax issues and can be simplified. Consider this code:
file="a.txt"
msg="checking string"
tmp="File not exist"
echo "$msg"
if [[ -f $file ]]; then
s="$(<"$file")"
if [[ $s =~ ^A[0-9]{3}-B[0-9]{4}$ ]]; then
msg="valid"
else
msg="invalid"
fi
else
msg="file not exist"
fi
echo "$msg"
Changes are:
Remove unnecessary cat
Use [[ ... ]] when using bash
Spaces inside [[ ... ]] are required (your code was missing them)
There is no need to check for a length of 10, since the anchored regex already enforces it
As mentioned in the comments earlier, the correct regex should be ^A[0-9]{3}-B[0-9]{4}$ or ^A[[:digit:]]{3}-B[[:digit:]]{4}$

Note that a regex like ^[A[0-9]{3}-B[0-9]{4}]$ matches
^ - start of string
[A[0-9]{3} - three occurrences of A, [ or a digit
-B - a -B string
[0-9]{4} - four digits
] - a ] char
$ - end of string.
So, it matches strings like [A[-B1234], [[[-B1939], etc.
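The breakdown above can be checked with a quick sketch like the following. The helper name check is made up for illustration, and the patterns are stored in variables so that the brackets are not mangled by shell parsing inside [[ ... ]].

```bash
#!/bin/bash
bad_re='^[A[0-9]{3}-B[0-9]{4}]$'    # the pattern from the question
good_re='^A[0-9]{3}-B[0-9]{4}$'     # the corrected pattern

# Hypothetical helper: check <regex> <string>
check() {
    if [[ $2 =~ $1 ]]; then echo "match"; else echo "no match"; fi
}

check "$bad_re"  "A123-B1234"    # the intended input is rejected
check "$bad_re"  "[A[-B1234]"    # an unintended input is accepted
check "$good_re" "A123-B1234"
```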
Your regex checking line must look like
if [[ $tmp =~ ^A[0-9]{3}-B[0-9]{4}$ ]];then
See the online demo:
#!/bin/bash
tmp="A123-B1234";
if [[ $tmp =~ ^A[0-9]{3}-B[0-9]{4}$ ]];then
msg="valid";
else
msg="invalid";
fi
echo $msg;
Output:
valid

Using just grep might be easier:
$ echo A123-B1234 > valid.txt
$ echo 123 > invalid.txt
$ grep -Pq '^A\d{3}-B\d{4}$' valid.txt && echo valid || echo invalid
valid
$ grep -Pq '^A\d{3}-B\d{4}$' invalid.txt && echo valid || echo invalid
invalid

Based on your samples and attempts, you could also try the following code.
#!/bin/bash
exp_len=10;
file=a.txt;
msg="checking string";
tmp="File not exist";
if [[ -f "$file" ]]
then
echo "File named $file is existing.."
awk '/^A[0-9]{3}-B[0-9]{4}$/{print "valid";next} {print "invalid"}' "$file"
else
echo "Please do check File named $file is not existing, exiting from script now..."
exit 1;
fi
Or, if you also want to check that each line in your input file is exactly 10 characters long (going by the exp_len variable in the OP's attempt), try the following code, which adds a length condition to the awk code.
#!/bin/bash
exp_len=10;
file=a.txt;
msg="checking string";
tmp="File not exist";
if [[ -f "$file" ]]
then
echo "File named $file is existing.."
awk -v len="$exp_len" 'length($0) == len && /^A[0-9]{3}-B[0-9]{4}$/{print "valid";next} {print "invalid"}' "$file"
else
echo "Please do check File named $file is not existing, exiting from script now..."
exit 1;
fi
NOTE: I am using the -f flag here to test whether the file exists. You can change it to -s, e.g. -s "$file", if you want to check that the file exists and is not empty.

incrementing list variable and then loop on it

In a sh script, I am trying to build a list of the filenames in a folder and then loop over it to check that consecutive filenames meet my naming criteria.
in a folder I have:
file1.nii
file1_mask.nii
file2.nii
file2_mask.nii
etc ...
The number of files is not fixed, but if filex.nii exists, filex_mask.nii must exist too.
There is also a .txt file that the user modifies.
it contains:
file1.nii tab some parameter \n
file2.nii tab some parameter \n
etc ...
The script then takes long hours to run, and the mask files, for example, are only used after a few hours.
So at the beginning of the .sh I want to check that the filenames are spelled correctly and that every file listed in the .txt is present in the folder.
If not, the script should stop and warn the user, rather than waiting hours before the problem is noticed.
For now I tried:
test=""
for entry in "${search_dir}"/*
do
echo "$entry"
test="${test} $entry"
done
I have then a string variable with space between filenames, but it has the folder name as well.
./search_dir/file1.nii ./search_dir/file1_mask.nii
I wanted file1.nii file1_mask.nii etc ...
and now I read my .txt file and check if the filename specified in it are in my test variable.
while read -r line
do
set -- $line
stack=$1
check=False
check2=False
for i in $test; do
echo "$stack.nii"
echo "$i"
if "${stack}.nii" == "$i";
then
check=True
fi
if "${stack}_mask.nii"=="$i";
then
check2=True
fi
done
done < "$txt_file"
but it is not working.
"$stack_mask.nii"=="$i"
doesn't seems to be the good way to compare strings
it generates the error:
"file1.nii" not found
Here is my solution for now, based on glenn answer:
errs=0
while read -r line; do
set -- $line
prefix="${1}.nii"
prefix2="${1}.nii.gz"
if [ -e ${PATH}/$prefix2 ]; then
echo "File found: ${PATH}/$prefix2" >&2
elif [ -e ${PATH}/$prefix ]; then
echo "File found: ${PATH}/$prefix" >&2
else
echo "File not found: ${PATH}/$prefix" >&2
errs=$((errs + 1))
fi
prefixmask="${1}_brain_mask.nii"
prefixmask2="${1}_brain_maskefsd.nii.gz"
if [ -e ${PATH}/$prefixmask ]; then
echo "Mask file found for ${PATH}/$prefixmask" >&2
elif [ -e ${PATH}/$prefixmask2 ]; then
echo "Mask file found for ${PATH}/$prefixmask2" >&2
else
echo "Mask file not found: ${PATH}/$prefixmask" >&2
errs=$((errs + 1))
fi
done < "$INPUT"
echo $errs
if [ $errs > 0 ]; then
echo "Errors found"
exit 3
fi
The only problem now is that it always exits, even if errs is equal to 0, and I don't know why ...
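A likely explanation for the unconditional exit: inside [ ... ], > is not a comparison but an output redirection. So [ $errs > 0 ] runs [ $errs ], which is true for any non-empty string, and sends its (empty) output to a file named 0. A numeric comparison needs -gt. The sketch below shows the difference; the function names are made up for illustration.

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1           # scratch directory, because the buggy test creates a file

buggy_check()   { [ $1 > 0 ]; }       # '>' redirects to a file named 0; the test is just [ $1 ]
numeric_check() { [ "$1" -gt 0 ]; }   # real numeric comparison

errs=0
buggy_check "$errs"   && echo "buggy: errors found"
numeric_check "$errs" || echo "numeric: no errors"
ls                                    # lists the stray file: 0
```

Separately, using PATH as the name of the directory variable replaces the shell's command search path, so a different name such as data_dir is safer.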

I would do this:
errs=0
for f in "$search_dir"/*.nii; do
[[ $f == *_mask.nii ]] && continue # skip the mask files
prefix=${f%.nii} # strip off the extension
if [[ ! -f "${prefix}_mask.nii" ]]; then
echo "Error: $f has no mask file" >&2
((errs++))
fi
done
if [[ $errs -gt 0 ]]; then
echo "Aborting due to errors" >&2
exit 2
fi
That should be pretty efficient, since it just loops through the files once.
Now that we see the input file:
errs=0
while read -r nii_file other_stuff; do
prefix="${nii_file%.nii}"
if [[ ! -f ./"$nii_file" ]]; then # adjust your relative path accordingly
echo "File not found: $nii_file" >&2
((errs++))
elif [[ ! -f ./"${prefix}_mask.nii" ]]; then
echo "Mask file missing for $nii_file" >&2
((errs++))
fi
done < "$txt_file"
if (( errs > 0 )); then
echo "Errors found"
exit 2
fi

Why is this regex with a capture group not working in bash?

I do this in a bash script:
#!/bin/bash
set -e
function getVersion {
REGEX="^$2-(.*?)\.ear$"
if [[ "$1" =~ $REGEX ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "Cannot deduce artifact version from file name $1, exiting" >&2
exit 1
fi
}
MY_APP=app-1.3.4.ear
APP_VERSION=$(getVersion "$MY_APP" app)
echo "Version is $APP_VERSION"
I would expect getVersion to return 1.3.4, but it produces the error message instead. Why doesn't the regex match?

Bash doesn't support ? to mean non-greedy in a regular expression: the =~ operator uses POSIX extended regular expressions, which have no non-greedy quantifiers.
Luckily, you don't need it in this case:
function getVersion {
REGEX="^$2-(.*)\.ear$"
if [[ "$1" =~ $REGEX ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "Cannot deduce artifact version from file name $1, exiting" >&2
exit 1
fi
}
The fact that you're specifying what must come after the version number prevents it from being consumed by the .*.
As an aside, I'd recommend getting out of the habit of using uppercase variable names in the shell; they're used internally so you run the risk of accidentally overwriting something useful.
I'd also normally advise against using the bash-specific function syntax in favour of the standard getVersion() { but you're using bash features so I guess it's not so much of an issue.
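As a quick check of the point about greedy matching, here is a small sketch using a lowercase variant of the function. The name get_version and the second file name are made up for illustration; it returns a status instead of calling exit so that it is easy to try interactively.

```bash
#!/bin/bash
# get_version <file name> <artifact prefix>
get_version() {
    local re="^$2-(.*)\.ear$"
    [[ $1 =~ $re ]] || return 1
    # The greedy .* cannot swallow ".ear", because the pattern requires
    # that suffix to follow the capture group.
    echo "${BASH_REMATCH[1]}"
}

get_version "app-1.3.4.ear" app      # prints: 1.3.4
get_version "app-2.0-rc1.ear" app    # prints: 2.0-rc1
```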

Check for valid number in busybox?

I am trying to write a script in which I need to check whether the user input is valid. I just can't figure it out: I have tried different ways but can't find a solution. So if there is a busybox ash guru out there, I would be happy for any help.
if ! [[ $ANS =~ ^[0-9][.0-9]*$ ]]; then
echo "abort"
else
echo "go on"
fi
I want to see if the user inputs a number. A number with decimal is also allowed. If not then it should abort.
Same goes with..
if ! [[ $ANS =~ ^[0-9A-Fa-f]{6}$ ]] ; then
echo "abort"
else
echo "go on"
fi
Where i need it to see if hexadecimal is used. All i get is "unknown operand".

It feels a bit of a hack, but you can use egrep for this:
$ ANS=10.2
$ echo -n $ANS | egrep -q '^[0-9]*[.]?[0-9]*$' && echo success || echo failure
success
$ ANS=10.2.2
$ echo -n $ANS | egrep -q '^[0-9]*[.]?[0-9]*$' && echo success || echo failure
failure
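If you would rather not spawn a process, a case statement can do similar checks using only constructs that plain sh (and so, presumably, busybox ash) understands. This is a sketch, not a drop-in replacement: the function names is_number and is_hex6 are made up, and unlike the egrep pattern above, which also matches an empty string or a lone dot, this version rejects both.

```bash
#!/bin/bash
# Uses only POSIX sh constructs, so it should also run under ash or dash.

# is_number: only digits and at most one decimal point, with at least one digit.
is_number() {
    case $1 in
        ''|.|*[!0-9.]*|*.*.*) return 1 ;;   # empty, lone dot, bad character, two dots
        *) return 0 ;;
    esac
}

# is_hex6: exactly six hexadecimal digits.
is_hex6() {
    case $1 in
        [0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]) return 0 ;;
        *) return 1 ;;
    esac
}

ANS=10.2
if is_number "$ANS"; then echo "go on"; else echo "abort"; fi
```

Note that inputs such as "10." and ".5" are accepted by is_number; tighten the patterns if that is not wanted.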

shell scripting and regular expression

#!bin/bash
echo enter your password :
read password
passlength=$(echo ${#password})
if [ $passlength -le 8 ];
then
echo you entered correct password
else
echo entered password is incorrect
fi
if [[$password == [a-z]*[0-9][a-z]*]];
then
echo match found
else
echo match not found
fi
I am not getting what's wrong with this code. If I enter any string as a password, let's say hello123, it gives me an error:
hello123 : command not found
What is wrong with my script?

You can do the following to make it work across platforms with any Bourne-style shell (/bin/sh), with no bash-specific primitives:
echo "$password" | grep -q "[a-z]*[0-9][a-z]*"
if [ $? -eq 0 ] ;then
echo "match found"
else
echo "match not found"
fi
Also, always put quotes around variable expansions. It will save you hours and hours of useless debugging. :)

Technically it should give you an error like [[hello123 : command not found.
The issue is that [[$password is not expanded the way you think it is. Bash first expands the $password variable to what you entered (i.e. hello123). This yields the word [[hello123, which bash then tries to run as a command (and fails, as there is nothing with that name).
Simply add a space after [[ (and another before ]]) and bash will recognise [[ as the start of a conditional expression (it is a shell keyword rather than an external command).
if [[ "$password" == [a-z]*[0-9][a-z]* ]]
then
...

The corrected script is below. The errors were:
The shebang must be #!/bin/bash, not #!bin/bash
To read the password length, just use passlength=${#password}, not passlength=$(echo ${#password})
Always put a space after [ or [[ and before ] or ]]
#!/bin/bash
echo "enter your password :"
read password
passlength=${#password}
if [[ $passlength -le 8 ]]
then
echo "you entered correct password"
else
echo "entered password is incorrect"
fi
if [[ $password == [a-z]*[0-9][a-z]* ]]
then
echo "match found"
else
echo "match not found"
fi

In the bash [[ construct, the == operator will match glob-style patterns, and =~ will match regular expressions. See the documentation.
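To make the difference concrete, here is a small sketch that runs the same characters once as a glob (with ==) and once as an anchored regex (with =~). The function names and sample strings are made up for illustration.

```bash
#!/bin/bash
# As a glob: one letter, anything, one digit, one letter, anything.
glob_match()  { [[ $1 == [a-z]*[0-9][a-z]* ]]; }

# As a regex: any number of letters, exactly one digit, any number of letters.
regex_match() { local re='^[a-z]*[0-9][a-z]*$'; [[ $1 =~ $re ]]; }

for s in abc1def 1 hello123; do
    glob_match "$s"  && g=yes || g=no
    regex_match "$s" && r=yes || r=no
    echo "$s glob=$g regex=$r"
done
# abc1def glob=yes regex=yes
# 1 glob=no regex=yes
# hello123 glob=no regex=no
```

Note that hello123 fails both tests: the glob needs a letter after a digit, and the regex allows only a single digit.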

#!/bin/bash
read -r -s -p "Enter Password: " password
password_length=${#password}
if [ "$password_length" -lt 8 ] || [ "$password_length" -gt 20 ]; then
echo "Invalid password - should be between 8 and 20 characters in length."
echo
else
# Check for invalid characters
case $password in
*[^a-zA-Z0-9]* )
echo "Password contains invalid characters."
echo
;;
* )
# No break here: break is only valid inside a loop
echo "Password accepted."
echo
;;
esac
fi
A more tuned example.

Try to replace line
if [[$password == [a-z]*[0-9][a-z]*]];
with following
if echo "$password" | grep -qs '[a-z]*[0-9][a-z]*'
HTH
