Checking if a file is in a S3 bucket using the s3cmd


I have a program that successfully uploads all of the files that I need. I have new files every day that I need to upload. After I have uploaded the files I no longer need them, so I am not looking to sync them.
I am curious whether there is a way, given a path and file name, to check if that file exists within S3 using s3cmd.

You can use the ls command in s3cmd to check whether a file is present in S3.
Bash code
path=$1
count=$(s3cmd ls "$path" | wc -l)
if [[ $count -gt 0 ]]; then
    echo "exist"
else
    echo "do not exist"
fi
Usage: ./s3_exist.sh s3://foo/bar.txt
Edit:
As cocoatomo pointed out in the comments, s3cmd ls $path lists all files whose names begin with $path. A safer approach is to use s3cmd info $path and check the exit code.
New Bash code
path=$1
if s3cmd info "$path" >/dev/null 2>&1; then
    echo "exist"
else
    echo "do not exist"
fi

Assuming that bar.txt and bar.txt.bak exist in a bucket s3://foo, "s3cmd ls s3://foo/bar.txt" shows the following output.
$ s3cmd ls s3://foo/bar.txt
2013-11-11 11:11 5 s3://foo/bar.txt
2013-11-11 11:11 5 s3://foo/bar.txt.bak
Since we need to drop the 2nd line from the command result, we use the "awk" command to filter out the unwanted lines.
$ filename=s3://foo/bar.txt
$ s3cmd ls "${filename}" | awk "\$4 == \"${filename}\" { print \$4 }"
s3://foo/bar.txt
Finally, we build up all commands.
filename=s3://foo/bar.txt
count=$(s3cmd ls "${filename}" | awk "\$4 == \"${filename}\" { print \$4 }" | wc -l)
if [ "$count" -eq 0 ]; then
    echo "file does not exist"
else
    echo "file exists"
fi
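The awk filter above embeds the filename in the awk program text, which is why it needs backslash escaping. A minimal sketch of the same exact-match test, assuming the `s3cmd ls` output format shown in the example (date, time, size, URI) and object names without spaces. `exact_match_count` is a hypothetical helper name, and the sample listing stands in for a real `s3cmd ls` call:

```shell
# Count listing lines whose 4th field equals the target exactly.
# Reads "s3cmd ls"-style output on stdin; the target is passed with awk -v,
# so nothing has to be escaped inside the awk program.
exact_match_count() {
    awk -v target="$1" '$4 == target { n++ } END { print n + 0 }'
}

# Sample listing copied from the example (stands in for: s3cmd ls "$filename")
listing='2013-11-11 11:11 5 s3://foo/bar.txt
2013-11-11 11:11 5 s3://foo/bar.txt.bak'

filename=s3://foo/bar.txt
count=$(printf '%s\n' "$listing" | exact_match_count "$filename")
if [ "$count" -eq 0 ]; then
    echo "file does not exist"
else
    echo "file exists"
fi
```

In real use the `printf` would be replaced by `s3cmd ls "$filename"` piped into the helper.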

In newer versions of the AWS CLI, you can use the following code to detect the existence of a file or directory:
count=$(aws s3 ls "$path" | wc -l)
if [ "$count" -gt 0 ]
then
    (>&2 echo "$path already exists!")
    return
fi
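Note that `aws s3 ls` matches by prefix, much like `s3cmd ls`, so a count greater than zero does not prove that the exact key exists. A hedged alternative is `aws s3api head-object`, which asks about a single key and exits non-zero when the request fails. The sketch below assumes configured AWS credentials; the helper names `split_s3_uri` and `s3_object_exists` are hypothetical and not part of the AWS CLI.

```shell
# Split s3://bucket/key into the global variables "bucket" and "key".
split_s3_uri() {
    uri=${1#s3://}
    bucket=${uri%%/*}
    key=${uri#*/}
}

# Succeeds only if a HEAD request for exactly this key succeeds.
s3_object_exists() {
    split_s3_uri "$1"
    aws s3api head-object --bucket "$bucket" --key "$key" >/dev/null 2>&1
}

# Usage (needs network access and credentials):
#   s3_object_exists s3://foo/bar.txt && echo "exists" || echo "does not exist"
```

A non-zero exit can also mean a permissions or network problem, so the "does not exist" branch is only as reliable as the credentials in use.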

We can use s3cmd ls together with a flag, FLAG_EXISTS, that is true if the file is there and false if it is not.
FLAG_EXISTS=false
for j in $(s3cmd ls s3://abc//abc.txt); do
if [[ "$j" == "s3://abc//abc.txt" ]]; then
FLAG_EXISTS=true
break
fi
done
if [ "$FLAG_EXISTS" = false ]; then
echo 'file not exists'
else
echo 'file exists'
fi
Explanation: ls can return several entries. For example, s3cmd ls for abc.txt can return abc.txt, abc.txt.bak and so on, so we loop over the results and compare each one against the exact file name.

Related

bash script to check script file extension and adding an extension

I have written the following Bash script. Its role is to check its own name and, if the extension is missing, to append ".sh" with sed. Still, I get the error "missing target file..."
#!/bin/bash
FILE_NAME="$0"
EXTENSION=".sh"
FILE_NAME_MOD="$FILE_NAME$EXTENSION"
if [[ "$0" != "FILE_NAME_MOD" ]]; then
echo mv -v "$FILENAME" "$FILENAME$EXTENSION"
cp "$0" | sed 's/\([^.sh]\)$/\1.sh/g' $0
fi

#!/bin/bash
file="$0"
extension=".sh"
if [ "$(echo -n "$file" | tail -c 3)" != "$extension" ]; then
mv -v "$file" "$file$extension"
fi
Important stuff:
The -n flag suppresses the newline at the end, so we can test for 3 chars instead of 4.
When in doubt, always use set -x to debug your scripts.

Try this Shellcheck-clean code:
#! /bin/bash -p
file=${BASH_SOURCE[0]}
extension=.sh
[[ $file == *"$extension" ]] || mv -i -- "$file" "$file$extension"
See choosing between $0 and BASH_SOURCE for details of why ${BASH_SOURCE[0]} is better than $0.
See Correct Bash and shell script variable capitalization for details of why file is better than FILE and extension is better than EXTENSION. (In short, ALL_UPPERCASE names are dangerous because there is a danger that they will clash with names that are already used for something else.)
The -i option to mv means that you will be prompted to continue if the new filename is already in use.
See Should I save my scripts with the .sh extension? before adding .sh extensions to your shell programs.

Just for fun, here is a way to do it just with GNU sed:
#!/usr/bin/env bash
sed --silent '
# match FILENAME only if it does not end with ".sh"
/\.sh$/! {
# change "FILENAME" to "mv -v FILENAME FILENAME.sh"
s/.*/mv -v & &.sh/
# execute the command
e
}
' <<<"$0"
You can also make the above script output useful messages:
#!/usr/bin/env bash
sed --silent '
/\.sh$/! {
s/.*/mv -v & &.sh/
e
# exit with code 0 immediately after the change has been made
q0
}
# otherwise exit with code 1
q1
' <<<"$0" && echo 'done' || echo 'no changes were made'

incrementing list variable and then loop on it

In a sh script, I am trying to build a list of the filenames in a folder and then loop over it to check whether each pair of consecutive filenames matches the expected naming pattern.
In a folder I have:
file1.nii
file1_mask.nii
file2.nii
file2_mask.nii
etc ...
The number of files is not fixed, but if filex.nii exists, filex_mask.nii must exist too.
There is also a .txt file that the user edits.
It contains:
file1.nii tab some parameter \n
file2.nii tab some parameter \n
etc ...
The script then takes many hours to run and, for example, the mask files are only used after a few hours.
So at the beginning of the .sh I want to check that the filenames are spelled correctly and that every file listed in the .txt is present in the folder.
If not, the script should stop and warn the user, rather than leaving them to wait hours before noticing the problem.
So far I have tried:
test=""
for entry in "${search_dir}"/*
do
echo "$entry"
test="${test} $entry"
done
I then have a string variable with spaces between the filenames, but it includes the folder name as well:
./search_dir/file1.nii ./search_dir/file1_mask.nii
I wanted file1.nii file1_mask.nii etc.
Now I read my .txt file and check whether the filenames specified in it are in my test variable.
while read -r line
do
set -- $line
stack=$1
check=False
check2=False
for i in $test; do
echo "$stack.nii"
echo "$i"
if "${stack}.nii" == "$i";
then
check=True
fi
if "${stack}_mask.nii"=="$i";
then
check2=True
fi
done
done < "$txt_file"
But it is not working.
"$stack_mask.nii"=="$i"
doesn't seem to be the right way to compare strings.
It generates the error:
"file1.nii" not found
Here is my solution for now, based on glenn's answer:
errs=0
while read -r line; do
set -- $line
prefix="${1}.nii"
prefix2="${1}.nii.gz"
if [ -e ${PATH}/$prefix2 ]; then
echo "File found: ${PATH}/$prefix2" >&2
elif [ -e ${PATH}/$prefix ]; then
echo "File found: ${PATH}/$prefix" >&2
else
echo "File not found: ${PATH}/$prefix" >&2
errs=$((errs + 1))
fi
prefixmask="${1}_brain_mask.nii"
prefixmask2="${1}_brain_maskefsd.nii.gz"
if [ -e ${PATH}/$prefixmask ]; then
echo "Mask file found for ${PATH}/$prefixmask" >&2
elif [ -e ${PATH}/$prefixmask2 ]; then
echo "Mask file found for ${PATH}/$prefixmask2" >&2
else
echo "Mask file not found: ${PATH}/$prefixmask" >&2
errs=$((errs + 1))
fi
done < "$INPUT"
echo $errs
if [ $errs > 0 ]; then
echo "Errors found"
exit 3
fi
The only problem now is that it always exits, even if errs is equal to 0, and I don't know why.

I would do this:
errs=0
for f in "$search_dir"/*.nii; do
    [[ $f == *_mask.nii ]] && continue # skip the mask files
    prefix=${f%.nii} # strip off the extension
    if [[ ! -f "${prefix}_mask.nii" ]]; then
        echo "Error: $f has no mask file" >&2
        ((errs++))
    fi
done
if [[ $errs -gt 0 ]]; then
    echo "Aborting due to errors" >&2
    exit 2
fi
That should be pretty efficient, since it just loops through the files once.
Now that we see the input file:
errs=0
while read -r nii_file other_stuff; do
    prefix="${nii_file%.nii}"
    if [[ ! -f ./"$nii_file" ]]; then # adjust your relative path accordingly
        echo "File not found: $nii_file" >&2
        ((errs++))
    elif [[ ! -f ./"${prefix}_mask.nii" ]]; then
        echo "Mask file missing for $nii_file" >&2
        ((errs++))
    fi
done < "$txt_file"
if (( errs > 0 )); then
    echo "Errors found"
    exit 2
fi
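As for why the question's updated script always exits: inside `[ ... ]`, an unescaped `>` is parsed by the shell as an output redirection, so `[ $errs > 0 ]` runs `[ $errs ]` (true for any non-empty string, including "0") and creates an empty file named `0`. A minimal sketch of the difference; `buggy_check` and `fixed_check` are hypothetical names used only for illustration:

```shell
errs=0

# Buggy form from the question: ">" is a redirection, so this runs [ "$1" ]
# (true for any non-empty string) and creates an empty file named "0".
buggy_check() { [ "$1" > 0 ]; }

# Numeric comparison with -gt behaves as intended.
fixed_check() { [ "$1" -gt 0 ]; }

if fixed_check "$errs"; then
    echo "Errors found"
else
    echo "No errors"
fi
```

The `(( errs > 0 ))` form used in the answer works for the same reason: inside `(( ))`, `>` is an arithmetic operator rather than a redirection.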

Bash Script sed command not working correctly with file passed through command line

Problem
I am trying to write a script that renames a large number of files according to a regex. The command works fine when typed into iTerm2, but the same command fails when run from the script.
Also, some of my file names include Chinese and Korean characters (I don't know whether that is the problem or not).
Code
My code takes three inputs: the old regex, the new regex, and the files that need to be renamed.
Here is the code:
#!/bin/bash
# we have less than 3 arguments. Print the help text:
if [ $# -lt 3 ] ; then
cat << HELP
ren -- renames a number of files using sed regular expressions USAGE: ren 'regexp'
'replacement' files...
EXAMPLE: rename all *.HTM files into *.html:
ren 'HTM' 'html' *.HTM
HELP
exit 0
fi
OLD="$1"
NEW="$2"
# The shift command removes one argument from the list of
# command line arguments.
shift
shift
# "$@" now contains all the files:
for file in "$@"; do
if [ -f "$file" ] ; then
newfile=`echo "$file" | sed "s/${OLD}/${NEW}/g"`
if [ -f "$newfile" ]; then
echo "ERROR: $newfile exists already"
else
echo "renaming $file to $newfile ..."
mv "$file" "$newfile"
fi
fi
done
I register the bash command in the .profile as:
alias ren="bash /pathtothefile/ren.sh"
Test
The original file name is "제01과.mp3" and I want it to become "第01课.mp3".
So with my script I use:
$ ren "제\([0-9]*\)과" "第\1课" *.mp3
It seems that the sed in the script does not do its work.
But the following, which is exactly the same, does replace the name:
$ echo "제01과.mp3" | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
Any thoughts? Thx
Print the result
I have made the following change in the script so that it prints what it is doing:
newfile=`echo "$file" | sed "s/${OLD}/${NEW}/g"`
echo "The ${file} is changed to ${newfile}"
And the result for my test is:
The 제01과.mp3 is changed to 제01과.mp3
ERROR: 제01과.mp3 exists already
So there is no format problem.
Update (all done under bash 4.2.45(2), Mac OS 10.9)
Testing
When I run the commands directly in bash, that is, with the for loop, something interesting happens. I first stored all the names in a files.txt file using:
$ ls | grep mp3 > files.txt
and then ran sed over them. A single command in bash interactive mode like:
$ file="제01과.mp3"
$ echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
gives
第01课.mp3
whereas the following, also in interactive mode:
files=`cat files.txt`
for file in $files
do
echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
done
gives no changes!
And by now:
echo $file
gives:
$ 제30과.mp3
(There are only 30 files)
Problem Part
And I tried the first command which worked before:
$ echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
It makes no changes and prints:
$ 제30과.mp3
So I created a new variable, newfile, and tried again:
$ newfile="제30과.mp3"
$ echo $newfile | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
And it correctly gives:
$ 第30课.mp3
WOW ORZ... Why! Why! Why! I then checked whether file and newfile are the same and, of course, they are not:
if [[ "$file" == "$newfile" ]]; then
echo True
else
echo False
fi
gives:
False
My guess
I guess there is some encoding problem, but I have found no reference for it. Could anyone help? Thanks again.
Update 2
I seem to understand now that there is a huge difference between a string literal and a file name. To be specific, if I directly use a variable like:
file="제30과.mp3"
in the script, sed works fine. However, if the variable comes from "$@" or is set like:
file=./*mp3
then sed fails to work, and I don't know why. By the way, Mac sed has no -r option, and on Ubuntu -r does not solve the problem I mention above.

Some errors combined:
Unescaped parentheses only form groups with extended regular expressions (-r in GNU sed, -E in grep); with basic regular expressions a group has to be written \( ... \).
Escaping correctly is a beast :)
Example
files="제2과.mp3 제30과.mp3"
for file in $files
do
echo $file | sed -r 's/제([0-9]*)과\.mp3/第\1课.mp3/g'
done
outputs
第2课.mp3
第30课.mp3
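The thread does not settle whether encoding is the cause, but the question's guess can be tested by comparing the raw bytes of the typed string with those of the name that comes from the file system. One possibility (an assumption, not confirmed in the thread) is that the file system returns Hangul in decomposed form, which looks identical on screen but is a different byte sequence, so a pattern typed in precomposed form would not match it. `hex_bytes` is a hypothetical helper name.

```shell
# Print the bytes of a string as one run of lowercase hex digits.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# The syllable "제" written two ways, built from octal byte escapes:
typed=$(printf '\354\240\234')                   # precomposed U+C81C
decomposed=$(printf '\341\204\214\341\205\246')  # U+110C followed by U+1166

hex_bytes "$typed"; echo
hex_bytes "$decomposed"; echo

if [ "$(hex_bytes "$typed")" = "$(hex_bytes "$decomposed")" ]; then
    echo "same bytes"
else
    echo "different bytes"
fi
```

Running `hex_bytes "$file"` on a name taken from the glob and on the same name typed by hand would show directly whether the two differ.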

If you are not doing this as a programming project, but want to skip ahead to the part where it just works, I found these resources listed at http://www.tldp.org/LDP/GNU-Linux-Tools-Summary/html/x4055.htm:
MMV (and MCP, MLN, ...) utilities use a specialized syntax to perform bulk file operations on paths. (http://linux.maruhn.com/sec/mmv.html)
mmv before\*after.mp3 Before\#1After.mp3
Esomaniac, a Java alternative that also works on Windows, is apparently dead (home page is parked).
rename is a perl script you can download from CPAN: https://metacpan.org/release/File-Rename
rename 's/\.JPG$/.jpg/' *.JPG

Splitting all txt files in a folder into smaller files based on a regular expression using bash

I have a folder containing large text files. Each file is a collection of 1000 files separated by [[ file name ]]. I want to split the files and make 1000 files out of them and put them in a new folder. Is there a way in bash to do it? Any other fast method will also do.
for f in $(find . -name '*.txt')
do mkdir $f
mv
cd $f
awk '/[[.*]]/{g++} { print $0 > g".txt"}' $f
cd ..
done

You are trying to create a folder with the same name as the already existing file.
for f in $(find . -name '*.txt')
do mkdir $f
Here, "find" will list the files in the current path, and for each of these files you will try to create a directory with exactly the same name. One way around this is to create a temporary folder first (the awk pattern is also changed so that the brackets are matched literally):
for f in $(find . -name '*.txt')
do mkdir temporary # create a temporary folder
    mv "$f" temporary # move the file into the folder
    mv temporary "$f" # rename the temporary folder to the name of the file
    cd "$f" # enter the folder and go on....
    awk '/^\[\[.*\]\]/{g++} { print $0 > (g ".txt") }' "$f"
    cd ..
done
Note that all your folders will have the ".txt" extension. If you don't want that, you can cut it out before creating the folder; that way, you won't need the temporary folder, because the folder you're trying to create has a different name from the .txt file.
Example:
for f in $(find . -name '*.txt' | rev | cut -b 5- | rev)
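The snippets above number the output files (1.txt, 2.txt, ...). A minimal sketch of a variant that names each output file after the text between `[[` and `]]`, assuming headers sit on their own line and the output folder already exists. `split_by_header` is a hypothetical helper name, and no sanitising of the header text (for example slashes) is attempted:

```shell
# split_by_header INPUT_FILE OUT_DIR
# Writes the lines following each "[[ name ]]" header to OUT_DIR/name.
split_by_header() {
    awk -v outdir="$2" '
        /^\[\[.*\]\]/ {
            if (out != "") close(out)        # finish the previous output file
            name = $0
            sub(/^\[\[[ \t]*/, "", name)     # drop "[[" and leading blanks
            sub(/[ \t]*\]\].*$/, "", name)   # drop trailing blanks and "]]"
            out = outdir "/" name
            next
        }
        out != "" { print > out }            # lines before the first header are skipped
    ' "$1"
}

# Usage:
#   mkdir -p out && split_by_header big_collection.txt out
```

Closing each file before opening the next keeps the number of open files small, which matters with 1000 parts per input file.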

This is not awk, it was written by a drunk person, and it is not guaranteed to work.
import re
import sys
def main():
    pattern = re.compile(r'\[\[(.+)]]')
    with open (sys.argv[1]) as f:
        for line in f:
            m = re.search(pattern, line)
            if m:
                try:
                    with open(fname, 'w+') as g:
                        g.writelines(lines)
                except NameError:
                    pass
                fname = m.group(1)
                lines = []
            else:
                lines.append(line)
    with open(fname, 'w+') as g:
        g.writelines(lines)
if __name__ == '__main__':
    main()

Write a bash script. Here, I've done it for you.
Notice the structure and features of this script:
explain what it does in a usage() function, which is used for the -h option.
provide a set of standard options: -h, -n, -v.
use getopts to do option processing
do lots of error checking on the arguments
be careful about filename parsing (notice that blanks surrounding the file names are ignored).
hide details within functions. Notice the 'talk', 'qtalk', 'nvtalk' functions? Those are from a bash library I've built to make this kind of scripting easy to do.
explain what is going on to the user if in $verbose mode.
provide the user the ability to see what would be done without actually doing it (the -n option, for $norun mode).
never run commands directly. but use the run function, which pays attention to the $norun, $verbose, and $quiet variables.
I'm not just fishing for you, but teaching you how to fish.
Good luck with your next bash script.
Alan S.
#!/bin/bash
# split-collections IN-FOLDER OUT-FOLDER
PROG="${0##*/}"
usage() {
cat 1>&2 <<EOF
usage: $PROG [OPTIONS] IN-FOLDER OUT-FOLDER
This script splits a collection of files within IN-FOLDER into
separate, named files into the given OUT-FOLDER. The created file
names are obtained from formatted text headers within the input
files.
The format of each input file is a set of HEADER and BODY pairs,
where each HEADER is a text line formatted as:
[[input-filename1]]
text line 1
text line 2
...
[[input-filename2]]
text line 1
text line 2
...
Normal processing will show the filenames being read, and file
names being created. Use the -v (verbose) option to show the
number of text lines being written to each created file. Use
-v twice to show the actual lines of text being written.
Use the -n option to show what would be done, without actually
doing it.
Options
-h Show this help
-n Dry run -- do NOT create any files or make any changes
-o Overwrite existing output files.
-v Be verbose
EOF
exit
}
talk() { echo 1>&2 "$@" ; }
chat() { [[ -n "$norun$verbose" ]] && talk "$@" ; }
nvtalk() { [[ -n "$verbose" ]] || talk "$@" ; }
qtalk() { [[ -n "$quiet" ]] || talk "$@" ; }
nrtalk() { talk "${norun:+(norun) }$@" ; }
error() {
    local code=2
    case "$1" in [0-9]*) code=$1 ; shift ;; esac
    echo 1>&2 "$@"
    exit $code
}
talkf() { printf 1>&2 "$@" ; }
chatf() { [[ -n "$norun$verbose" ]] && talkf "$@" ; }
nvtalkf() { [[ -n "$verbose" ]] || talkf "$@" ; }
qtalkf() { [[ -n "$quiet" ]] || talkf "$@" ; }
nrtalkf() { talkf "${norun:+(norun) }$@" ; }
errorf() {
    local code=2
    case "$1" in [0-9]*) code=$1 ; shift ;; esac
    printf 1>&2 "$@"
    exit $code
}
# run COMMAND ARGS ...
qrun() {
    ( quiet=1 run "$@" )
}
run() {
    if [[ -n "$norun" ]]; then
        if [[ -z "$quiet" ]]; then
            nrtalk "$@"
        fi
    else
        if [[ -n "$verbose" ]]; then
            talk ">> $@"
        fi
        # on failure, return the command's own exit status
        eval "$@" || return $?
    fi
    return 0
}
show_line() {
talkf "%s:%d: %s\n" "$in_file" "$lines_in" "$line"
}
# given an input filename, read it and create
# the output files as indicated by the contents
# of the text in the file
split_collection() {
in_file="$1"
out_file=
lines_in=0
lines_out=0
skipping=
while read line ; do
: $(( lines_in++ ))
[[ $verbose_count -gt 1 ]] && show_line
# if a line with the format of "[[foo]]" occurs,
# close the current output file, and open a new
# output file called "foo"
if [[ "$line" =~ ^\[\[[[:blank:]]*([^ ]+.*[^ ]|[^ ])[[:blank:]]*\]\][[:blank:]]*$ ]] ; then
new_file="${BASH_REMATCH[1]}"
# close out the current file, if any
if [[ "$out_file" ]]; then
nrtalkf "%d lines written to %s\n" $lines_out "$out_file"
fi
# check the filename for bogosities
case "$new_file" in
*..*|*/*)
[[ $verbose_count -lt 2 ]] && show_line
error "Badly formatted filename"
;;
esac
out_file="$out_folder/$new_file"
if [[ -e "$out_file" ]]; then
if [[ -n "$overwrite" ]]; then
nrtalk "Overwriting existing '$out_file'"
qrun "cat /dev/null >'$out_file'"
else
error "$out_file already exists."
fi
else
nrtalk "Creating new output file: '$out_file' ..."
qrun "touch '$out_file'"
fi
lines_out=0
elif [[ -z "$out_file" ]]; then
# apparently, there are text lines before the filename
# header; ignore them (out loud)
if [[ ! "$skipping" ]]; then
talk "Text preceding first filename ignored.."
skipping=1
fi
else # next line of input for the file
qrun "echo \"$line\" >>'$out_file'"
: $(( lines_out++ ))
fi
done
}
norun=
verbose=
verbose_count=0
overwrite=
quiet=
while getopts 'hnoqv' opt ; do
case "$opt" in
h) usage ;;
n) norun=1 ;;
o) overwrite=1 ;;
q) quiet=1 ;;
v) verbose=1 ; : $(( verbose_count++ )) ;;
esac
done
shift $(( OPTIND - 1 ))
in_folder="${1:?Missing IN-FOLDER; see $PROG -h for details}"
out_folder="${2:?Missing OUT-FOLDER; see $PROG -h for details}"
# validate the input and output folders
#
# It might be reasonable to create the output folder for the
# user, but that's left as an exercise for the user.
in_folder="${in_folder%/}" # remove trailing slash, if any
out_folder="${out_folder%/}"
[[ -e "$in_folder" ]] || error "$in_folder does not exist"
[[ -d "$in_folder" ]] || error "$in_folder is not a directory."
[[ -e "$out_folder" ]] || error "$out_folder does not exist."
[[ -d "$out_folder" ]] || error "$out_folder is not a directory."
for collection in "$in_folder"/* ; do
talk "Reading $collection .."
split_collection "$collection" <"$collection"
done
exit

Remove text in file name after particular search pattern

I have a file name such as follows:
file_name1.pdf.sometext.here
I have a directory of several files in the same format, and I want to edit all the files so that the portion after .pdf is deleted... thus the file would look like this
file_name1.pdf

Using parameter expansion:
$ ls *.pdf*
file_name1.pdf.sometext.here file_name2.pdf.blah file_name3.pdf.sometext
$ for fname in *.pdf*; do mv "$fname" "${fname//.pdf.*/.pdf}"; done
$ ls *.pdf*
file_name1.pdf file_name2.pdf file_name3.pdf
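Before moving anything, the new names can be previewed with a small function. This is a minimal sketch using the suffix-removal form `${name%%.pdf.*}` rather than the substitution shown above; `strip_after_pdf` is a hypothetical helper name:

```shell
# Print the name with everything after the first ".pdf." removed;
# names that do not contain ".pdf." are printed unchanged.
strip_after_pdf() {
    case $1 in
        *.pdf.*) printf '%s\n' "${1%%.pdf.*}.pdf" ;;
        *)       printf '%s\n' "$1" ;;
    esac
}

# Dry run: show what each rename would be without touching the files.
for fname in *.pdf.*; do
    [ -e "$fname" ] || continue   # the glob matched nothing
    echo "$fname -> $(strip_after_pdf "$fname")"
done
```

Once the preview looks right, the `echo` line can be replaced with `mv -- "$fname" "$(strip_after_pdf "$fname")"`.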

This should get you started:
#!/bin/bash
for FILE in "$@"; do
NEWFILE=$(echo "$FILE" | sed -re 's/(.*\.pdf).*/\1/')
if [ ! -z "$NEWFILE" -a ! -f "$NEWFILE" -a ! -d "$NEWFILE" ]; then
mv "$FILE" "$NEWFILE"
fi
done
But if you have /usr/bin/rename, use it:
rename 's/(.*\.pdf).*/$1/' *.here
