incrementing list variable and then loop on it - list

In a sh script, I am trying to build a list of the filenames in a folder and then loop over it to check that each pair of consecutive filenames matches an expected pattern.
In the folder I have:
file1.nii
file1_mask.nii
file2.nii
file2_mask.nii
etc ...
The number of files is not known in advance, but if filex.nii exists, filex_mask.nii must exist too.
There is also a .txt file that the user edits.
It contains:
file1.nii tab some parameter \n
file2.nii tab some parameter \n
etc ...
The script then runs for many hours, and the mask files, for example, are only used after a few hours.
So at the beginning of the .sh I want to check that the filenames are spelled correctly and that every file listed in the .txt is present in the folder.
If not, the script should stop and warn the user, rather than run for hours before the problem is noticed.
So far I have tried:
test=""
for entry in "${search_dir}"/*
do
echo "$entry"
test="${test} $entry"
done
I then have a string variable with the filenames separated by spaces, but each one includes the folder name as well:
./search_dir/file1.nii ./search_dir/file1_mask.nii
I wanted file1.nii file1_mask.nii etc ...
Next I read my .txt file and check whether the filenames specified in it are in my test variable.
while read -r line
do
set -- $line
stack=$1
check=False
check2=False
for i in $test; do
echo "$stack.nii"
echo "$i"
if "${stack}.nii" == "$i";
then
check=True
fi
if "${stack}_mask.nii"=="$i";
then
check2=True
fi
done
done < "$txt_file"
But it is not working.
"$stack_mask.nii"=="$i"
does not seem to be the right way to compare strings.
It generates the error:
"file1.nii" not found
Here is my solution for now, based on glenn's answer:
errs=0
while read -r line; do
set -- $line
prefix="${1}.nii"
prefix2="${1}.nii.gz"
if [ -e ${PATH}/$prefix2 ]; then
echo "File found: ${PATH}/$prefix2" >&2
elif [ -e ${PATH}/$prefix ]; then
echo "File found: ${PATH}/$prefix" >&2
else
echo "File not found: ${PATH}/$prefix" >&2
errs=$((errs + 1))
fi
prefixmask="${1}_brain_mask.nii"
prefixmask2="${1}_brain_maskefsd.nii.gz"
if [ -e ${PATH}/$prefixmask ]; then
echo "Mask file found for ${PATH}/$prefixmask" >&2
elif [ -e ${PATH}/$prefixmask2 ]; then
echo "Mask file found for ${PATH}/$prefixmask2" >&2
else
echo "Mask file not found: ${PATH}/$prefixmask" >&2
errs=$((errs + 1))
fi
done < "$INPUT"
echo $errs
if [ $errs > 0 ]; then
echo "Errors found"
exit 3
fi
The only problem now is that it always exits, even if errs is equal to 0, and I don't know why ...

I would do this:
errs=0
for f in "$search_dir"/*.nii; do
    [[ $f == *_mask.nii ]] && continue # skip the mask files
    prefix=${f%.nii} # strip off the extension
    if [[ ! -f "${prefix}_mask.nii" ]]; then
        echo "Error: $f has no mask file" >&2
        ((errs++))
    fi
done
if [[ $errs -gt 0 ]]; then
    echo "Aborting due to errors" >&2
    exit 2
fi
That should be pretty efficient, since it just loops through the files once.
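The question also asked for bare names such as file1.nii rather than ./search_dir/file1.nii. The same kind of parameter expansion can strip the directory part. A minimal sketch, where list_names is a hypothetical helper name and the directory argument stands for whatever search_dir points to:

```shell
#!/bin/bash
# list_names DIR: print the entries of DIR, space separated, without the
# leading directory part. (list_names is an illustrative name only.)
list_names() {
    local dir=$1 entry names=""
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue      # skip the unexpanded pattern when DIR is empty
        names="$names ${entry##*/}"      # ${entry##*/} drops everything up to the last slash
    done
    printf '%s\n' "${names# }"           # remove the leading space
}
```

It could be used as test=$(list_names "$search_dir"). Like the original loop, a space-separated string breaks on filenames that contain spaces.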
Now that we see the input file:
errs=0
while read -r nii_file other_stuff; do
    prefix="${nii_file%.nii}"
    if [[ ! -f ./"$nii_file" ]]; then # adjust your relative path accordingly
        echo "File not found: $nii_file" >&2
        ((errs++))
    elif [[ ! -f ./"${prefix}_mask.nii" ]]; then
        echo "Mask file missing for $nii_file" >&2
        ((errs++))
    fi
done < "$txt_file"
if (( errs > 0 )); then
    echo "Errors found"
    exit 2
fi
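As for the version in the question that always exits: inside [ ], > is an output redirection, not a comparison. So [ $errs > 0 ] only tests whether the string $errs is non-empty, which is always true, and it creates a file named 0 as a side effect. The numeric operator is -gt. A small sketch of the difference (has_errors is just an illustrative name):

```shell
#!/bin/bash
# has_errors COUNT: succeed only when COUNT is numerically greater than zero.
has_errors() {
    [ "$1" -gt 0 ]
}

errs=0

# Run the buggy form in a scratch directory, because it creates a file named "0".
scratch=$(mktemp -d)
(
    cd "$scratch" || exit 1
    if [ $errs > 0 ]; then
        echo "buggy test is true although errs=$errs"
    fi
    ls        # lists the stray file "0" created by the redirection
)
rm -rf "$scratch"

if has_errors "$errs"; then
    echo "Errors found"
else
    echo "No errors"
fi
```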

Related

Check if a string contains valid pattern in Bash

I have a file a.txt that contains a string like:
Axxx-Bxxxx
The rules for checking whether it is valid are:
the length is 10 characters.
each x is a digit.
I tried to check it with:
#!/bin/bash
exp_len=10;
file=a.txt;
msg="checking string";
tmp="File not exist";
echo $msg;
if[ -f $file];then
tmp=$(cat $file);
if[[${#tmp} != $exp_len ]];then
msg="invalid length";
elif [[ $tmp =~ ^[A[0-9]{3}-B[0-9]{4}]$]];then
msg="valid";
else
msg="invalid";
fi
else
msg="file not exist";
fi
echo $msg;
But it doesn't work for a valid string...
Could someone help me correct it?
Thanks :)
Besides the regex fix, your code has syntax issues and can also be simplified. Consider this code:
file="a.txt"
msg="checking string"
tmp="File not exist"
echo "$msg"
if [[ -f $file ]]; then
s="$(<"$file")"
if [[ $s =~ ^A[0-9]{3}-B[0-9]{4}$ ]]; then
msg="valid"
else
msg="invalid"
fi
else
msg="file not exist"
fi
echo "$msg"
Changes are:
Remove the unnecessary cat
Use [[ ... ]] when using bash
Spaces inside [[ ... ]] are required (your code was missing them)
There is no need to check for a length of 10, as the regex already enforces it
As mentioned in the comments earlier, the correct regex is ^A[0-9]{3}-B[0-9]{4}$ or ^A[[:digit:]]{3}-B[[:digit:]]{4}$
Note that a regex like ^[A[0-9]{3}-B[0-9]{4}]$ matches
^ - start of string
[A[0-9]{3} - three occurrences of A, [ or a digit
-B - a -B string
[0-9]{4} - four digits
] - a ] char
$ - end of string.
So, it matches strings like [A[-B1234], [[[-B1939], etc.
Your regex checking line must look like
if [[ $tmp =~ ^A[0-9]{3}-B[0-9]{4}$ ]];then
See the online demo:
#!/bin/bash
tmp="A123-B1234";
if [[ $tmp =~ ^A[0-9]{3}-B[0-9]{4}$ ]];then
msg="valid";
else
msg="invalid";
fi
echo $msg;
Output:
valid
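The breakdown of the faulty pattern can be checked directly in bash. The sketch below compares the pattern from the question with the corrected one on a few sample strings; matches is a hypothetical helper and the sample strings are chosen for illustration:

```shell
#!/bin/bash
# matches STRING REGEX: succeed when STRING matches the extended regex.
matches() {
    [[ $1 =~ $2 ]]
}

bad='^[A[0-9]{3}-B[0-9]{4}]$'   # pattern from the question
good='^A[0-9]{3}-B[0-9]{4}$'    # corrected pattern

for s in 'A123-B1234' '[A[-B1234]' 'A12-B1234'; do
    matches "$s" "$bad"  && b=yes || b=no
    matches "$s" "$good" && g=yes || g=no
    printf '%-12s question-pattern=%s corrected-pattern=%s\n' "$s" "$b" "$g"
done
```

Keeping the pattern in a variable and expanding it unquoted on the right of =~ avoids quoting surprises with brackets and braces.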
Using just grep might be easier:
$ echo A123-B1234 > valid.txt
$ echo 123 > invalid.txt
$ grep -Pq '^A\d{3}-B\d{4}$' valid.txt && echo valid || echo invalid
valid
$ grep -Pq '^A\d{3}-B\d{4}$' invalid.txt && echo valid || echo invalid
invalid
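Note that -P (Perl-style regex) is a GNU grep extension. Where that is not available, a similar check can probably be written with the POSIX options -E (extended regex) and -x (the whole line must match). A sketch, with is_valid_file as an illustrative name and throwaway sample files:

```shell
#!/bin/bash
# is_valid_file FILE: succeed when some line of FILE is exactly
# "A", three digits, "-B", four digits.
is_valid_file() {
    grep -Eqx 'A[0-9]{3}-B[0-9]{4}' "$1"
}

workdir=$(mktemp -d)
echo 'A123-B1234'   > "$workdir/valid.txt"
echo 'xA123-B1234x' > "$workdir/padded.txt"   # right shape, but extra characters

for f in valid.txt padded.txt; do
    if is_valid_file "$workdir/$f"; then
        echo "$f: valid"
    else
        echo "$f: invalid"
    fi
done
rm -rf "$workdir"
```

Without -x or explicit ^ and $ anchors, the padded line would be accepted as well.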
Based on your samples and attempts, you could also try the following code.
#!/bin/bash
exp_len=10;
file=a.txt;
msg="checking string";
tmp="File not exist";
if [[ -f "$file" ]]
then
echo "File named $file is existing.."
awk '/^A[0-9]{3}-B[0-9]{4}$/{print "valid";next} {print "invalid"}' "$file"
else
echo "Please do check File named $file is not existing, exiting from script now..."
exit 1;
fi
Or, if each line of your input file should also be exactly 10 characters long (going by the exp_len variable in the OP's code), try the following, which adds a length condition to the awk code.
#!/bin/bash
exp_len=10;
file=a.txt;
msg="checking string";
tmp="File not exist";
if [[ -f "$file" ]]
then
echo "File named $file is existing.."
awk -v len="$exp_len" 'length($0) == len && /^A[0-9]{3}-B[0-9]{4}$/{print "valid";next} {print "invalid"}' "$file"
else
echo "Please do check File named $file is not existing, exiting from script now..."
exit 1;
fi
NOTE: I use the -f flag here to test whether the file exists; you can change it to -s (e.g. -s "$file") if you want to check that the file exists and is not empty.
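The difference between the two tests can be seen with a small helper. This is only a sketch; file_state is a hypothetical name:

```shell
#!/bin/bash
# file_state FILE: print "missing", "empty" or "ok".
file_state() {
    if [[ ! -f $1 ]]; then
        echo missing      # not a regular file (or does not exist)
    elif [[ ! -s $1 ]]; then
        echo empty        # exists, but has size zero
    else
        echo ok           # exists and has content
    fi
}
```

For example, file_state a.txt could replace the plain -f test when an empty a.txt should be reported separately.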

Using bash (parameter expansion) to sanitize input file

I have a bash script that has a function like so:
sanitize(){
rb_reg="^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$"
if grep -Ex "${rb_reg}" "${1}/.ruby-version" > /dev/null 2>&1; then
sanitize_tmp="$(<"${1}"/.ruby-version)" &&
ruby_version="${sanitize_tmp//[^0-9\.]/}" &&
echo "Setting Ruby Version: ${ruby_version}"
else
echo "There was an error trying to sanitize a .ruby-version file"
echo "The file was: ${1}/.ruby-version"
exit 7
fi
}
I'm using it to check a .ruby-version file and then set the version in there as a variable.
Mostly these files will contain something sensible like: 2.0.0 which works OK. I want to be defensive and not trust the input file, so check/sanitize it as much as possible.
Two questions:
If for some reason there were multiple version numbers in the file on multiple lines, say:
'2.0.0
1.0.0'
Currently that smashes them together, removing the whitespace, and ends up with a variable like: '2.0.01.0.0'
What's a good way to only pick up the first version number that matches the regex?
Is there a better way to do this, maybe entirely in bash without grep? Appreciate any examples people have of checking for a version like this but not trusting the input file.
I'm still playing around with this a little, but here is what I ended up doing.
I'm passing in the file name as an argument to the function elsewhere in the script. Really liked the concept of BASH_REMATCH, so tried to avoid using grep, sed, awk etc and do it this way.
You can view the latest version of the code here: https://github.com/octopusnz/scripts
sanitize(){
if [[ "${#}" -ne 1 ]]; then
echo "[ERROR 7]: We expected 1 argument to the sanitize() function."
echo "But we got ${#} instead."
exit 7
fi
rbv_reg="^([0-9]{1,2})\.([0-9]{1,2})\.([0-9]{1,2})(-([a-z]{1,10}))?$"
reg_matches=0
while read -r rbv_line || [[ -n "$rbv_line" ]]; do
if [[ "${rbv_line}" =~ ${rbv_reg} ]]; then
ruby_version="${BASH_REMATCH[0]//[^0-9a-z\.\-]/}" &&
((reg_matches="${reg_matches}"+1)) &&
echo "" &&
echo "Setting Ruby version: ${ruby_version}" &&
break
fi
done < "${1}"
if [[ "${reg_matches}" -lt 1 ]]; then
if [[ -v ruby_version ]]; then
echo "We couldn't parse ${1} and set a valid Ruby version."
echo "Using default: ${ruby_version}"
else
echo "We couldn't parse ${1} and set a default Ruby version."
echo "[ERROR 4]: No valid .ruby-version file found."
exit 4
fi
fi
}
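Stripped of the error handling, the core idea (read line by line, stop at the first line that matches) fits in a few lines. The sketch below uses the simpler x.y.z pattern from the question; first_version is an illustrative name and is not taken from the linked repository:

```shell
#!/bin/bash
# first_version FILE: print the first line of FILE that is a plain x.y.z
# version, and fail if no line matches.
first_version() {
    local re='^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$' line
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            # BASH_REMATCH[0] is the whole match; [1], [2], [3] hold the
            # major, minor and patch parts captured by the parentheses
            printf '%s\n' "${BASH_REMATCH[0]}"
            return 0
        fi
    done < "$1"
    return 1
}
```

Because the regex is anchored at both ends, a matching line can only contain digits and dots, so no further stripping is needed before using the value.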

Weird regex behavior in bash if condition

I have written a small script that loops through directories (starting from a directory given as an argument) and prints the directories that have an xml file inside. Here is my code:
#! /bin/bash
process()
{
LIST_ENTRIES=$(find $1 -mindepth 1 -maxdepth 1)
regex="\.xml"
if [[ $LIST_ENTRIES =~ $regex ]]; then
echo "$1"
fi
# Process found entries
while read -r line
do
if [[ -d $line ]]; then
process $line
fi
done <<< "$LIST_ENTRIES"
}
process $1
This code works fine. However, if I change the regex to \.xml$ to indicate that it should match at the end of the line, the result is different, and I do not get all the right directories.
Is there something wrong with this ?
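LIST_ENTRIES holds all the entries in one multi-line string, and $ only matches at the end of that whole string, so \.xml$ succeeds only when the .xml file happens to be the last entry.
To check, try echo "$LIST_ENTRIES".
To overcome this, test each entry in a for loop around your if: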
process()
{
LIST_ENTRIES=$(find $1 -mindepth 1 -maxdepth 1)
regex="\.xml$"
for each in $LIST_ENTRIES; do
if [[ $each =~ $regex ]]; then
echo "$1"
break # one xml file is enough; print the directory only once
fi
done
# Process found entries
while read -r line
do
if [[ -d $line ]]; then
process $line
fi
done <<< "$LIST_ENTRIES"
}
process $1
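The anchoring behaviour is easy to reproduce in isolation. The sketch below builds a two-entry list by hand (the paths are made up) and compares a match against the whole string with a match per line; has_xml is a hypothetical helper that also copes with names containing spaces, which the for loop above does not:

```shell
#!/bin/bash
entries=$'./dir/a.xml\n./dir/b.txt'   # two entries, one per line, as find prints them
re='\.xml$'

# Whole-string match: $ refers to the end of the entire value, i.e. after b.txt.
if [[ $entries =~ $re ]]; then
    echo "whole string: match"
else
    echo "whole string: no match"
fi

# has_xml LIST: test each line separately and stop at the first hit.
has_xml() {
    local line
    while IFS= read -r line; do
        [[ $line =~ \.xml$ ]] && return 0
    done <<< "$1"
    return 1
}

if has_xml "$entries"; then
    echo "per line: match"
fi
```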

Splitting all txt files in a folder into smaller files based on a regular expression using bash

I have a folder containing large text files. Each file is a collection of 1000 files separated by [[ file name ]]. I want to split the files and make 1000 files out of them and put them in a new folder. Is there a way in bash to do it? Any other fast method will also do.
for f in $(find . -name '*.txt')
do mkdir $f
mv
cd $f
awk '/[[.*]]/{g++} { print $0 > g".txt"}' $f
cd ..
done
You are trying to create a folder with the same name as the already existing file.
for f in $(find . -name '*.txt')
do mkdir $f
Here, "find" will list the files in the current path, and for each of these files you will try to create a directory with exactly the same name. One way of doing it would be first creating a temporary folder:
for f in $(find . -name '*.txt')
do mkdir temporary # create a temporary folder
mv "$f" temporary # move the file into the folder
mv temporary "$f" # rename the temporary folder to the name of the file
cd "$f" # enter the folder and go on....
awk '/^\[\[.*\]\]/{g++} { print $0 > (g ".txt") }' "$f" # brackets must be escaped to match a literal [[ ... ]]
cd ..
done
Note that all your folders will have the ".txt" extension. If you don't want that, you can cut it out before creating the folder; that way, you won't need the temporary folder, because the folder you're trying to create has a different name from the .txt file.
Example:
for f in $(find . -name '*.txt' | rev | cut -b 5- | rev)
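The extension can also be removed inside the loop with parameter expansion instead of the rev | cut | rev pipeline. A minimal sketch (folder_name is an illustrative helper and the sample path is made up):

```shell
#!/bin/bash
# folder_name PATH: print PATH without a trailing ".txt".
folder_name() {
    printf '%s\n' "${1%.txt}"    # ${var%pattern} removes the shortest matching suffix
}

folder_name ./notes/collection1.txt
```

Inside the loop this would look like dir=${f%.txt} followed by mkdir -p "$dir", which leaves $f itself untouched for the later commands.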
This is Python rather than awk, it was written by a drunk person, and it is not guaranteed to work.
import re
import sys

def main():
    pattern = re.compile(r'\[\[(.+)]]')
    with open(sys.argv[1]) as f:
        for line in f:
            m = re.search(pattern, line)
            if m:
                # a new header: flush the previous file, if there was one
                try:
                    with open(fname, 'w+') as g:
                        g.writelines(lines)
                except NameError:
                    pass
                fname = m.group(1)
                lines = []
            else:
                lines.append(line)
    # flush the last file
    with open(fname, 'w+') as g:
        g.writelines(lines)

if __name__ == '__main__':
    main()
Write a bash script. Here, I've done it for you.
Notice the structure and features of this script:
explain what it does in a usage() function, which is used for the -h option.
provide a set of standard options: -h, -n, -v.
use getopts to do option processing
do lots of error checking on the arguments
be careful about filename parsing (notice that blanks surrounding the file names are ignored).
hide details within functions. Notice the 'talk', 'qtalk', 'nvtalk' functions? Those are from a bash library I've built to make this kind of scripting easy to do.
explain what is going on to the user if in $verbose mode.
provide the user the ability to see what would be done without actually doing it (the -n option, for $norun mode).
never run commands directly, but use the run function, which pays attention to the $norun, $verbose, and $quiet variables.
I'm not just fishing for you, but teaching you how to fish.
Good luck with your next bash script.
Alan S.
#!/bin/bash
# split-collections IN-FOLDER OUT-FOLDER
PROG="${0##*/}"
usage() {
cat 1>&2 <<EOF
usage: $PROG [OPTIONS] IN-FOLDER OUT-FOLDER
This script splits a collection of files within IN-FOLDER into
separate, named files into the given OUT-FOLDER. The created file
names are obtained from formatted text headers within the input
files.
The format of each input file is a set of HEADER and BODY pairs,
where each HEADER is a text line formatted as:
[[input-filename1]]
text line 1
text line 2
...
[[input-filename2]]
text line 1
text line 2
...
Normal processing will show the filenames being read, and file
names being created. Use the -v (verbose) option to show the
number of text lines being written to each created file. Use
-v twice to show the actual lines of text being written.
Use the -n option to show what would be done, without actually
doing it.
Options
-h Show this help
-n Dry run -- do NOT create any files or make any changes
-o Overwrite existing output files.
-v Be verbose
EOF
exit
}
talk() { echo 1>&2 "$@" ; }
chat() { [[ -n "$norun$verbose" ]] && talk "$@" ; }
nvtalk() { [[ -n "$verbose" ]] || talk "$@" ; }
qtalk() { [[ -n "$quiet" ]] || talk "$@" ; }
nrtalk() { talk "${norun:+(norun) }$@" ; }
error() {
local code=2
case "$1" in [0-9]*) code=$1 ; shift ;; esac
echo 1>&2 "$@"
exit $code
}
talkf() { printf 1>&2 "$@" ; }
chatf() { [[ -n "$norun$verbose" ]] && talkf "$@" ; }
nvtalkf() { [[ -n "$verbose" ]] || talkf "$@" ; }
qtalkf() { [[ -n "$quiet" ]] || talkf "$@" ; }
nrtalkf() { talkf "${norun:+(norun) }$@" ; }
errorf() {
local code=2
case "$1" in [0-9]*) code=$1 ; shift ;; esac
printf 1>&2 "$@"
exit $code
}
# run COMMAND ARGS ...
qrun() {
( quiet=1 run "$@" )
}
run() {
if [[ -n "$norun" ]]; then
if [[ -z "$quiet" ]]; then
nrtalk "$@"
fi
else
if [[ -n "$verbose" ]]; then
talk ">> $@"
fi
# return the command's own exit status on failure
eval "$@" || return $?
fi
return 0
}
show_line() {
talkf "%s:%d: %s\n" "$in_file" "$lines_in" "$line"
}
# given an input filename, read it and create
# the output files as indicated by the contents
# of the text in the file
split_collection() {
in_file="$1"
out_file=
lines_in=0
lines_out=0
skipping=
while IFS= read -r line ; do # keep leading blanks and backslashes intact
: $(( lines_in++ ))
(( verbose_count > 1 )) && show_line
# if a line with the format of "[[foo]]" occurs,
# close the current output file, and open a new
# output file called "foo"
if [[ "$line" =~ ^\[\[[[:blank:]]*([^ ]+.*[^ ]|[^ ])[[:blank:]]*\]\][[:blank:]]*$ ]] ; then
new_file="${BASH_REMATCH[1]}"
# close out the current file, if any
if [[ "$out_file" ]]; then
nrtalkf "%d lines written to %s\n" $lines_out "$out_file"
fi
# check the filename for bogosities
case "$new_file" in
*..*|*/*)
(( verbose_count < 2 )) && show_line
error "Badly formatted filename"
;;
esac
out_file="$out_folder/$new_file"
if [[ -e "$out_file" ]]; then
if [[ -n "$overwrite" ]]; then
nrtalk "Overwriting existing '$out_file'"
qrun "cat /dev/null >'$out_file'"
else
error "$out_file already exists."
fi
else
nrtalk "Creating new output file: '$out_file' ..."
qrun "touch '$out_file'"
fi
lines_out=0
elif [[ -z "$out_file" ]]; then
# apparently, there are text lines before the filename
# header; ignore them (out loud)
if [[ ! "$skipping" ]]; then
talk "Text preceding first filename ignored.."
skipping=1
fi
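else # next line of input for the file
# write the line verbatim; passing it through eval would expand quotes, $ and backticks in the text
[[ -n "$norun" ]] || printf '%s\n' "$line" >>"$out_file"
: $(( lines_out++ ))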
fi
done
}
norun=
verbose=
verbose_count=0
overwrite=
quiet=
while getopts 'hnoqv' opt ; do
case "$opt" in
h) usage ;;
n) norun=1 ;;
o) overwrite=1 ;;
q) quiet=1 ;;
v) verbose=1 ; : $(( verbose_count++ )) ;;
esac
done
shift $(( OPTIND - 1 ))
in_folder="${1:?Missing IN-FOLDER; see $PROG -h for details}"
out_folder="${2:?Missing OUT-FOLDER; see $PROG -h for details}"
# validate the input and output folders
#
# It might be reasonable to create the output folder for the
# user, but that's left as an exercise for the user.
in_folder="${in_folder%/}" # remove trailing slash, if any
out_folder="${out_folder%/}"
[[ -e "$in_folder" ]] || error "$in_folder does not exist"
[[ -d "$in_folder" ]] || error "$in_folder is not a directory."
[[ -e "$out_folder" ]] || error "$out_folder does not exist."
[[ -d "$out_folder" ]] || error "$out_folder is not a directory."
for collection in "$in_folder"/* ; do
talk "Reading $collection .."
split_collection "$collection" <"$collection"
done
exit
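For comparison, the header-driven split itself can be sketched in a few lines of awk wrapped in a shell function. This is only a rough sketch under the same assumptions as the script above (headers of the form [[ name ]] on their own line, text before the first header ignored, names containing / or .. rejected). The function name split_by_header is hypothetical, and existing output files are overwritten without asking:

```shell
#!/bin/bash
# split_by_header IN_FILE OUT_DIR
# Write the body lines following each "[[ name ]]" header to OUT_DIR/name.
split_by_header() {
    awk -v dir="$2" '
        /^\[\[.*\]\][ \t]*$/ {
            name = $0
            sub(/^\[\[[ \t]*/, "", name)         # drop the opening [[ and blanks
            sub(/[ \t]*\]\][ \t]*$/, "", name)   # drop the closing ]] and blanks
            if (name == "" || name ~ /\// || name ~ /\.\./) {
                print "bad file name in header: " $0 > "/dev/stderr"
                exit 1
            }
            if (out != "") close(out)            # finish the previous output file
            out = dir "/" name
            printf "" > out                      # create or truncate the new file
            next
        }
        out != "" { print > out }                # body line: goes to the current file
    ' "$1"
}
```

A loop such as for f in "$in_folder"/*; do split_by_header "$f" "$out_folder"; done would then process every collection.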

shell scripting and regular expression

#!bin/bash
echo enter your password :
read password
passlength=$(echo ${#password})
if [ $passlength -le 8 ];
then
echo you entered correct password
else
echo entered password is incorrect
fi
if [[$password == [a-z]*[0-9][a-z]*]];
then
echo match found
else
echo match not found
fi
I don't understand what's wrong with this code. If I enter any string as a password, let's say hello123, it gives me an error:
hello123 : command not found
What is wrong with my script?
You can do the following to make it work across platforms with any Bourne-shell-based shell (/bin/sh), with no bash-specific primitives:
echo "$password" | grep -q "[a-z]*[0-9][a-z]*"
if [ $? -eq 0 ] ;then
echo "match found"
else
echo "match not found"
fi
Also feel free to use quotes around the variable names. It will save you hours and hours worth of useless debugging. :)
Technically it should give you an error like [[hello123 : command not found.
The issue is that [[$password is not expanded how you think it is. Bash will first resolve the $password variable to what you entered (i.e. hello123). This will yield the string [[hello123 which bash will then try to invoke (and fail, as there is nothing with that name).
Simply add a space after [[ (and one before ]]) and bash will recognise [[ as the command to run (strictly speaking it is a shell keyword rather than an external command).
if [[ "$password" == [a-z]*[0-9][a-z]* ]]
then
...
The corrected script is below. The errors were:
#!/bin/bash, not #!bin/bash
To read password length, just do passlength=${#password}, not
passlength=$(echo ${#password})
Always put a space after [ or [[, and before ] or ]]
#!/bin/bash
echo "enter your password :"
read password
passlength=${#password}
if [[ $passlength -le 8 ]]
then
echo "you entered correct password"
else
echo "entered password is incorrect"
fi
if [[ $password == [a-z]*[0-9][a-z]* ]]
then
echo "match found"
else
echo "match not found"
fi
In the bash [[ construct, the == operator will match glob-style patterns, and =~ will match regular expressions. See the documentation.
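The two operators can give different answers for the same pattern text. In the sketch below (the helper names and sample strings are made up), the glob reading of [a-z]*[0-9][a-z]* requires a lowercase letter first and, somewhere later, a digit followed immediately by a lowercase letter, while the unanchored regex reading is satisfied by any string that contains a digit:

```shell
#!/bin/bash
glob_match()  { [[ $1 == $2 ]]; }   # unquoted right-hand side: glob pattern, whole string
regex_match() { [[ $1 =~ $2 ]]; }   # unquoted right-hand side: extended regex, substring

pattern='[a-z]*[0-9][a-z]*'

for s in hello123 a1b; do
    glob_match  "$s" "$pattern" && g=yes || g=no
    regex_match "$s" "$pattern" && r=yes || r=no
    echo "$s: glob=$g regex=$r"
done
```

So hello123 from the question is rejected by == with this pattern but accepted by =~, which is worth keeping in mind when choosing between the answers here.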
#!/bin/bash
read -s -p "Enter Password: " password
password_length=${#password}
if [ $password_length -lt 8 -o $password_length -gt 20 ] ;then
echo -e "Invalid password - should be between 8 and 20 characters in length.";
echo ;
else
# Check for invalid characters
case $password in
*[^a-zA-Z0-9]* )
echo -e "Password contains invalid characters.";
echo ;
;;
* )
echo "Password accepted.";
echo ;
;;
esac
fi
More tuned example..
Try to replace line
if [[$password == [a-z]*[0-9][a-z]*]];
with following
if echo "$password" | grep -qs '[a-z]*[0-9][a-z]*'
HTH