Does AWS CLI upload files larger than 4GB to Amazon Glacier? - amazon-web-services

The AWS Command Line Interface (CLI) allows you to upload a file to AWS Glacier, but the AWS REST API limits a single upload to 4 GB. To upload a file larger than 4 GB through the REST API, I have to use a multipart upload.
My question is: does the AWS CLI handle uploads larger than 4 GB internally, or do I have to manage the multipart upload myself? Can I just pass a 20 GB file to the CLI's upload-archive command and have it work? If the CLI can't handle large uploads directly, is there any command-line tool that does it for me (sparing me the trouble of implementing the checksum computation, error handling, and retry logic when a part upload fails)?
I understand that the 4 GB limit is on the AWS REST API, but I could not find anything about how the CLI handles it. I could just run a test, but my upload speed is not that fast and I'd rather not waste a few hours only to discover that it does not work.
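For reference, the low-level multipart flow (which the script in one of the answers below automates) consists of three AWS CLI calls; a minimal sketch with placeholder vault and file names and a 1 GiB part size (part sizes must be a power-of-two multiple of 1 MiB):
# 1. Start the upload; the response contains an uploadId.
aws glacier initiate-multipart-upload --account-id - --vault-name myvault \
    --part-size 1073741824 --archive-description "bigfile.tar.gz"
# 2. Upload each chunk with its byte range within the archive (repeat per part).
aws glacier upload-multipart-part --account-id - --vault-name myvault \
    --upload-id "$UPLOAD_ID" --body chunk0000 --range 'bytes 0-1073741823/*'
# 3. Finish with the archive's total size and its SHA-256 tree hash.
aws glacier complete-multipart-upload --account-id - --vault-name myvault \
    --upload-id "$UPLOAD_ID" --archive-size "$ARCHIVE_SIZE" --checksum "$TREE_HASH"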

I'm using glacier-cmd (https://github.com/uskudnik/amazon-glacier-cmd-interface). It works pretty well, but it seems to have gone unmaintained recently, and it sometimes times out with big files (~50 GB).

The script below works fine. It splits the archive into chunks separately for the tree-hash calculation and for the file-part uploads.
#!/bin/bash
date1=$(date +"%s")
byteSize=1073741824
CHUNK_SIZE=1073741824
hashsize=1048576
if [[ -z "${1}" ]]; then
echo "No file provided."
exit 1
fi
ARCHIVE="/mnt/dbfiles/mahipal/splitfiles/${1}"
ARCHIVE_SIZE=`cat "${ARCHIVE}" | wc --bytes`
cd /mnt/dbfiles/mahipal/splitfiles
rm -rf TEMP
rm -rf HASH
mkdir TEMP
mkdir HASH
cd /mnt/dbfiles/mahipal/splitfiles/TEMP
date3=$(date +"%s")
split -d --bytes=${CHUNK_SIZE} "${ARCHIVE}" chunk -a 4
date4=$(date +"%s")
diff2=$(($date4-$date3))
cd /mnt/dbfiles/mahipal/splitfiles/HASH
date5=$(date +"%s")
split -d --bytes=${hashsize} "${ARCHIVE}" chunk -a 5
date6=$(date +"%s")
diff3=$(($date6-$date5))
cd /mnt/dbfiles/mahipal/splitfiles/TEMP
lastpartsize=`expr $(ls -l | tail -1 | awk '{print$5}') + 0`
lastfile=$(ls -l | tail -1 | awk '{print$9}')
cont=$(ls -l | wc -l)
cnt=`expr $cont - 2`
fileCount=$(ls -1 | grep "^chunk" | wc -l)
echo "Total parts to upload: " $fileCount
files=$(ls | grep "^chunk")
init=$(/bin/aws glacier initiate-multipart-upload --account-id - --part-size $byteSize --vault-name final_vault --archive-description "${1}_${ARCHIVE_SIZE}_${byteSize}")
echo "---------------------------------------"
uploadId=$(echo $init | jq '.uploadId' | xargs)
touch commands.txt
i=0
for f in $files
do
byteStart=$((i*byteSize))
byteEnd=$((i*byteSize+byteSize-1))
echo /bin/aws glacier upload-multipart-part --body $f --range "'"'bytes '"$byteStart"'-'"$byteEnd"'/*'"'" --account-id - --vault-name final_vault --upload-id $uploadId >> commands.txt
i=$(($i+1))
if [ "$i" == "$cnt" ]
then
byteEnd=`expr $byteEnd + 1`
byteEnd2=$((i*byteSize+lastpartsize-1))
byteSize=$lastpartsize
echo /bin/aws glacier upload-multipart-part --body $lastfile --range "'"'bytes '"$byteEnd"'-'"$byteEnd2"'/*'"'" --account-id - --vault-name final_vault --upload-id $uploadId >> commands.txt
break
fi
done
parallel --load 100% -a commands.txt --no-notice --bar
cd /mnt/dbfiles/mahipal/splitfiles/HASH
files=$(ls | grep "^chunk")
for f in $files
do
openssl dgst -sha256 -binary ${f} > "hash${f:5}"
done
echo "List Active Multipart Uploads:"
echo "Verify that a connection is open:"
/bin/aws glacier list-multipart-uploads --account-id - --vault-name final_vault >> /mnt/dbfiles/mahipal/splitfiles/TEMP/commands.txt
echo "-------------"
echo "Contents of commands.txt"
cd /mnt/dbfiles/mahipal/splitfiles/TEMP
cat commands.txt
# Calculate tree hash.
cd /mnt/dbfiles/mahipal/splitfiles/HASH
echo "Calculating tree hash..."
while true; do
COUNT=`ls hash* | wc -l`
if [[ ${COUNT} -le 2 ]]; then
TREE_HASH=$(cat hash* | openssl dgst -sha256 | awk '{print $2}')
break
fi
ls hash* | xargs -n 2 | while read PAIR; do
PAIRARRAY=(${PAIR})
if [[ ${#PAIRARRAY[@]} -eq 1 ]]; then
break
fi
cat ${PAIR} | openssl dgst -sha256 -binary > temphash
rm ${PAIR}
mv temphash "${PAIRARRAY[0]}"
done
done
cd /mnt/dbfiles/mahipal/splitfiles/TEMP
echo "Finalizing..."
/bin/aws glacier complete-multipart-upload --account-id=- --vault-name="final_vault" --upload-id="$uploadId" --checksum="${TREE_HASH}" --archive-size=${ARCHIVE_SIZE} >>commands.txt
RETVAL=$?
if [[ ${RETVAL} -ne 0 ]]; then
echo "complete-multipart-upload failed with status code: ${RETVAL}" >>commands.txt
echo "Aborting upload ${uploadId}" >>commands.txt
/bin/aws glacier abort-multipart-upload --account-id=- --vault-name="final_vault" --upload-id="${uploadId}" >>commands.txt
exit 1
fi
echo "--------------"
echo "Deleting temporary commands.txt file"
#rm commands.txt
date2=$(date +"%s")
diff=$(($date2-$date1))
echo "Total Split Duration for Chunk Part Size: $(($diff2/ 3600 )) hours $((($diff2 % 3600) / 60)) minutes $(($diff2 % 60)) seconds" >>commands.txt
echo "Total Split Duration for hash Part Size: $(($diff3/ 3600 )) hours $((($diff3 % 3600) / 60)) minutes $(($diff3 % 60)) seconds" >>commands.txt
echo "Total upload Duration: $(($diff/ 3600 )) hours $((($diff % 3600) / 60)) minutes $(($diff % 60)) seconds" >>commands.txt
echo "Done."
exit 0
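Assuming the script above is saved as, say, glacier_multipart.sh (a name chosen here for illustration), that jq, GNU parallel, and the AWS CLI are installed, and that the archive already sits in the hard-coded /mnt/dbfiles/mahipal/splitfiles directory, it is invoked with just the file name:
chmod +x glacier_multipart.sh
./glacier_multipart.sh mybackup.tar.gz   # uploads /mnt/dbfiles/mahipal/splitfiles/mybackup.tar.gz to vault final_vault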

Related

How to download a secured file using curl from s3 bucket using SecretAccessKey and AccessKeyId

I want to download an APK that lives in my private S3 bucket using the curl command. I don't want to use awscli/boto3. I have a SecretAccessKey, SessionToken, Expiration, and AccessKeyId.
I tried the following:
curl -k -v -L -o url="https://s3-eu-west-1.amazonaws.com" -H "x-amz-security-token: xxxxxxxxxxxxxxxx" -H "Content-Type: application/xml" -X GET https://xyz/test.apk
curl -k -v -L -o url="https://s3-eu-west-1.amazonaws.com" -H "Content-Type: application/xml" -X GET https://xyz/.test.apk?AWSAccessKeyId=xxxxxxxxxxxxx
Here is a script that downloads from and uploads to S3. You have to export your keys, or you can modify the script accordingly.
export AWS_ACCESS_KEY_ID=AKxxx
export AWS_SECRET_ACCESS_KEY=zzzz
Download a file
./s3download.sh get s3://mybucket/myfile.txt myfile.txt
That's it; all you need to pass is the S3 URL along with the file name.
#!/bin/bash
set -eu
s3simple() {
local command="$1"
local url="$2"
local file="${3:--}"
# todo: nice error message if unsupported command?
if [ "${url:0:5}" != "s3://" ]; then
echo "Need an s3 url"
return 1
fi
local path="${url:4}"
if [ -z "${AWS_ACCESS_KEY_ID-}" ]; then
echo "Need AWS_ACCESS_KEY_ID to be set"
return 1
fi
if [ -z "${AWS_SECRET_ACCESS_KEY-}" ]; then
echo "Need AWS_SECRET_ACCESS_KEY to be set"
return 1
fi
local method md5 args
case "$command" in
get)
method="GET"
md5=""
args="-o $file"
;;
put)
method="PUT"
if [ ! -f "$file" ]; then
echo "file not found"
exit 1
fi
md5="$(openssl md5 -binary $file | openssl base64)"
args="-T $file -H Content-MD5:$md5"
;;
*)
echo "Unsupported command"
return 1
esac
local date="$(date -u '+%a, %e %b %Y %H:%M:%S +0000')"
local string_to_sign
printf -v string_to_sign "%s\n%s\n\n%s\n%s" "$method" "$md5" "$date" "$path"
local signature=$(echo -n "$string_to_sign" | openssl sha1 -binary -hmac "${AWS_SECRET_ACCESS_KEY}" | openssl base64)
local authorization="AWS ${AWS_ACCESS_KEY_ID}:${signature}"
curl $args -s -f -H Date:"${date}" -H Authorization:"${authorization}" https://s3.amazonaws.com"${path}"
}
s3simple "$@"
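The same function also handles uploads through its put branch (which adds a Content-MD5 header); assuming the script is saved as s3download.sh as in the get example above:
# Upload a local file to the bucket
./s3download.sh put s3://mybucket/myfile.txt myfile.txt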
You can find more detail here

How to upload files to S3 using Signature v4

I've been trying this for a couple of days now but keep getting stuck at the signature calculation. For background: I have an EC2 instance role assigned to the instance and I also need to use KMS SSE (server-side encryption) to store the data in S3.
So, this is the script I'm using now:
#!/usr/bin/env bash
set -E
export TERM=xterm
#
s3_region='eu-west-1'
s3_bucket='my-s3-file-bucket'
#
bkup_optn='hourly'
data_type='application/octet-stream'
bkup_path="/tmp/backups/${bkup_optn}"
bkup_file="$( ls -t ${bkup_path}|head -n1 )"
timestamp="$( LC_ALL=C date -u "+%Y-%m-%d %H:%M:%S" )"
#
appHost="$( hostname -f )"
thisApp="$( facter -p my_role )"
thisEnv="$( facter -p my_environment )"
upldUri="${thisEnv}/${appHost}/${bkup_optn}/${bkup_file}"
# This doesn't work on OS X
iso_timestamp=$(date -ud "${timestamp}" "+%Y%m%dT%H%M%SZ")
date_scope=$(date -ud "${timestamp}" "+%Y%m%d")
date_header=$(date -ud "${timestamp}" "+%a, %d %h %Y %T %Z")
## AWS instance role
awsMetaUri='http://169.254.169.254/latest/meta-data/iam/security-credentials/'
awsInstRole=$( curl -s ${awsMetaUri} )
awsAccessKey=$( curl -s ${awsMetaUri}${awsInstRole}|awk -F'"' '/AccessKeyId/ {print $4}' )
awsSecretKey=$( curl -s ${awsMetaUri}${awsInstRole}|grep SecretAccessKey|cut -d':' -f2|sed 's/[^0-9A-Za-z/+=]*//g' )
awsSecuToken=$( curl -s ${awsMetaUri}${awsInstRole}|sed -n '/Token/{p;}'|cut -f4 -d'"' )
signedHeader='date;host;x-amz-content-sha256;x-amz-date;x-amz-security-token;x-amz-server-side-encryption;x-amz-server-side-encryption-aws-kms-key-id'
echo -e "awsInstRole => ${awsInstRole}\nawsAccessKey => ${awsAccessKey}\nawsSecretKey => ${awsSecretKey}"
payload_hash()
{
local output=$(shasum -ba 256 "${bkup_path}/${bkup_file}")
echo "${output%% *}"
}
canonical_request()
{
echo "PUT"
echo "/${upldUri}"
echo ""
echo "date:${date_header}"
echo "host:${s3_bucket}.s3.amazonaws.com"
echo "x-amz-security-token:${awsSecuToken}"
echo "x-amz-content-sha256:$(payload_hash)"
echo "x-amz-server-side-encryption:aws:kms"
echo "x-amz-server-side-encryption-aws-kms-key-id:arn:aws:kms:eu-west-1:xxxxxxxx111:key/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx1432"
echo "x-amz-date:${iso_timestamp}"
echo ""
echo "${signedHeader}"
printf "$(payload_hash)"
}
canonical_request_hash()
{
local output=$(canonical_request | shasum -a 256)
echo "${output%% *}"
}
string_to_sign()
{
echo "AWS4-HMAC-SHA256"
echo "${iso_timestamp}"
echo "${date_scope}/${s3_region}/s3/aws4_request"
echo "x-amz-security-token:${awsSecuToken}"
echo "x-amz-server-side-encryption:aws:kms"
echo "x-amz-server-side-encryption-aws-kms-key-id:arn:aws:kms:eu-west-1:xxxxxxxx111:key/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx1432"
printf "$(canonical_request_hash)"
}
signature_key()
{
local secret=$(printf "AWS4${awsSecretKey}" | hex_key)
local date_key=$(printf ${date_scope} | hmac_sha256 "${secret}" | hex_key)
local region_key=$(printf ${s3_region} | hmac_sha256 "${date_key}" | hex_key)
local service_key=$(printf "s3" | hmac_sha256 "${region_key}" | hex_key)
printf "aws4_request" | hmac_sha256 "${service_key}" | hex_key
}
hex_key() {
xxd -p -c 256
}
hmac_sha256() {
local hexkey=$1
openssl dgst -binary -sha256 -mac HMAC -macopt hexkey:${hexkey}
}
signature() {
string_to_sign | hmac_sha256 $(signature_key) | hex_key | sed "s/^.* //"
}
curl \
-T "${bkup_path}/${bkup_file}" \
-H "Authorization: AWS4-HMAC-SHA256 Credential=${awsAccessKey}/${date_scope}/${s3_region}/s3/aws4_request,SignedHeaders=${signedHeader},Signature=$(signature)" \
-H "Date: ${date_header}" \
-H "x-amz-date: ${iso_timestamp}" \
-H "x-amz-security-token:${awsSecuToken}" \
-H "x-amz-content-sha256: $(payload_hash)" \
-H "x-amz-server-side-encryption:aws:kms" \
-H "x-amz-server-side-encryption-aws-kms-key-id:arn:aws:kms:eu-west-1:xxxxxxxx111:key/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx1432" \
"https://${s3_bucket}.s3.amazonaws.com/${upldUri}"
I wrote it following this doc:
http://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html, plus parts of a sample script on GitHub that I forgot to bookmark. After the initial bumpy ride, I now keep getting this error:
<Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message><AWSAccessKeyId>XXXXXXXXXXXXXXX</AWSAccessKeyId><StringToSign>AWS4-HMAC-SHA256
I went through several AWS docs but I cannot figure out why. Can anyone help me out here, please?
-San
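For comparison with the script above, the canonical request and string to sign that Signature Version 4 expects (per the AWS doc linked above) have a fixed layout: canonical headers are lower-cased, sorted alphabetically, and must match the SignedHeaders list, and the string to sign has exactly four lines. A rough sketch of the two layouts, not a drop-in fix for the script:
Canonical request (one header per line in alphabetical order, then a blank line,
the signed-headers list, and the hex SHA-256 of the payload):
    PUT
    /<uri>
    <query string, empty here>
    date:<date header>
    host:<bucket>.s3.amazonaws.com
    x-amz-content-sha256:<payload hash>
    x-amz-date:<iso timestamp>
    x-amz-security-token:<session token>
    x-amz-server-side-encryption:aws:kms
    x-amz-server-side-encryption-aws-kms-key-id:<key arn>

    <signed headers list>
    <payload hash>
String to sign (exactly four lines):
    AWS4-HMAC-SHA256
    <iso timestamp>
    <yyyymmdd>/<region>/s3/aws4_request
    <hex SHA-256 of the canonical request>
In particular, the header order produced by canonical_request() and the order in the signedHeader variable need to agree with each other and follow this alphabetical ordering.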

Best way to automatically move backups of web server to an AWS sever

I have a web server that produces .tar.gz backup files that I want to automatically transfer to an AWS server.
To accomplish this I have tried to write a bash script on the AWS server that will automatically check the web server for a new backup and make a copy of it if it is more recent (preserving timestamps).
Is there an easier or more robust way to go about this?
Am I correct in my FTP script syntax?
# Credentials to access other machine
HOST=xxxxxx
USER=xxxxx
PASSWD=xxxxxxx
# path to the remoteBackups
remoteBackups=/home/ubuntu/testBackups
# Loops indefinitely
#while [[ true ]]
#do
# FTP to remote host and get the name most recent backup
ftp -inv $HOST<<-EOT
user $USER $PASSWD
# Store the name of the most recent backup in FILE
# (does this work, or will it just set a variable FILE on the remote machine?)
FILE=`ls -t ~/Desktop/backups/*.tar.gz | head -1`
bye
EOT
# For testing
echo $FILE
# Copy (preserving modification dates) the file to the local backups folder on the AWS server
#scp -p -i <.pem> $FILE $remoteBackups
# Get the most recent back up from both directories
latestLocal=`ls -t ~/intranetBackups/*.tar.gz | head -1`
latestRemote=`ls -t $remoteBackups/*.tar.gz | head -1`
# For testing
echo $latestLocal
echo $latestRemote
# If the backup from the remote is newer than the local one, save it to backups and sleep for 15 days
if [[ $latestLocal -ot $latestRemote ]]
then
echo Transferring backup from $latestRemote to $latestLocal
sleep 15d
else
echo No new backup file found
sleep 1d
fi
# If there are more than 20 backups delete the oldest
if [[ `ls -1 ~/intranetBackups | wc -l` -ge 20 ]]
then
rm `ls -t ~/intranetBackups | tail -1`
echo removed the oldest backup
else
echo no file to be removed
fi
#done
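As a point of comparison with the commented-out scp line above, the same fetch can be done without the FTP heredoc at all, assuming key-based SSH access to the web server (the host, key path, and directories below are placeholders):
#!/bin/bash
# Placeholders; adjust to your environment.
HOST=user@webserver.example.com
KEY=~/.ssh/backup.pem
remoteBackups=/home/ubuntu/testBackups
# Ask the web server for its newest backup, then copy it over,
# preserving timestamps (-p), unless we already have it.
latest=$(ssh -i "$KEY" "$HOST" 'ls -t ~/Desktop/backups/*.tar.gz | head -1')
if [[ -n "$latest" && ! -e "$remoteBackups/$(basename "$latest")" ]]; then
    scp -p -i "$KEY" "$HOST:$latest" "$remoteBackups/"
fi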

looking for s3cmd download command for a certain date

I am trying to figure out what the s3cmd command would be to download files from a bucket by date. For example, I have a bucket named "test" that contains files from different dates, and I want to get the files that were uploaded yesterday. What would the command be?
There is no single command that will do that; you have to write a script something like the one below, or use an SDK that lets you do it. The sample script below fetches the S3 files older than a given age (here, 30 days).
#!/bin/bash
# Usage: ./getOld "bucketname" "30 days"
s3cmd ls s3://$1 | while read -r line; do
createDate=`echo $line|awk {'print $1" "$2'}`
createDate=`date -d"$createDate" +%s`
olderThan=`date -d"-$2" +%s`
if [[ $createDate -lt $olderThan ]]
then
fileName=`echo $line|awk {'print $4'}`
echo $fileName
if [[ $fileName != "" ]]
then
s3cmd get "$fileName"
fi
fi
done;
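Given the usage comment at the top, an invocation against the question's bucket might look like the line below; note that the comparison selects objects older than the given age, so "1 day" pulls everything uploaded more than a day ago rather than only yesterday's files:
./getOld "test" "1 day"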
I like s3cmd, but for a single-line command I prefer the JSON output of the AWS CLI together with the jq JSON processor.
The command will look like
aws s3api list-objects --bucket "yourbucket" |\
jq '.Contents[] | select(.LastModified | startswith("yourdate")).Key' --raw-output |\
xargs -I {} aws s3 cp s3://yourbucket/{} .
Basically, what the script does:
lists all objects in the given bucket
(the interesting part) uses jq to parse the Contents array and select the elements whose LastModified value starts with your pattern (which you will need to change), then extracts the Key of each S3 object; --raw-output strips the quotes from the value
passes the result to an aws cp command to download each file from S3
If you want to automate it a bit further, you can get yesterday's date from the command line.
For macOS:
$ export YESTERDAY=`date -v-1d +%F`
$ aws s3api list-objects --bucket "ariba-install" |\
jq '.Contents[] | select(.LastModified | startswith('\"$YESTERDAY\"')).Key' --raw-output |\
xargs -I {} aws s3 cp s3://ariba-install/{} .
For Linux (or other systems with GNU date):
$ export YESTERDAY=`date -d "1 day ago" '+%Y-%m-%d' `
$ aws s3api list-objects --bucket "ariba-install" |\
jq '.Contents[] | select(.LastModified | startswith('\"$YESTERDAY\"')).Key' --raw-output |\
xargs -I {} aws s3 cp s3://ariba-install/{} .
Now you get the idea; change the YESTERDAY variable if you want a different kind of date.
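The same approach works for any prefix of the ISO LastModified timestamp; for example, to pull everything from the current month (GNU or BSD date, same bucket as above, THISMONTH being just an illustrative variable name):
$ export THISMONTH=`date '+%Y-%m'`
$ aws s3api list-objects --bucket "ariba-install" |\
jq '.Contents[] | select(.LastModified | startswith('\"$THISMONTH\"')).Key' --raw-output |\
xargs -I {} aws s3 cp s3://ariba-install/{} .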

Quickly finding the size of an S3 'folder'

We have s3 'folders' (objects with a prefix under a bucket) with millions and millions of files and we want to figure out the size of these folders.
Writing my own .NET application to list the S3 objects was easy enough, but the maximum number of keys per request is 1,000, so it's taking forever.
Using S3Browser to look at a 'folder's' properties takes a long time too, I'm guessing for the same reason.
I've had this .NET application running for a week - I need a better solution.
Is there a faster way to do this?
The AWS CLI's ls command can do this: aws s3 ls --summarize --human-readable --recursive s3://$BUCKETNAME/$PREFIX --region $REGION
Seems like AWS added a menu item where it's possible to see the size.
I prefer using the AWS CLI; I find that the web console often times out when there are too many objects.
Replace s3://bucket/ with the prefix you want to start from.
The snippet relies on awscli, awk, tail, and a bash-like shell.
start=s3://bucket/ && \
for prefix in `aws s3 ls $start | awk '{print $2}'`; do
echo ">>> $prefix <<<"
aws s3 ls $start$prefix --recursive --summarize | tail -n2
done
or in one line form:
start=s3://bucket/ && for prefix in `aws s3 ls $start | awk '{print $2}'`; do echo ">>> $prefix <<<"; aws s3 ls $start$prefix --recursive --summarize | tail -n2; done
Output looks something like:
$ start=s3://bucket/ && for prefix in `aws s3 ls $start | awk '{print $2}'`; do echo ">>> $prefix <<<"; aws s3 ls $start$prefix --recursive --summarize | tail -n2; done
>>> extracts/ <<<
Total Objects: 23
Total Size: 10633858646
>>> hackathon/ <<<
Total Objects: 2
Total Size: 10004
>>> home/ <<<
Total Objects: 102
Total Size: 1421736087
I don't think an ideal solution exists, but here are some ideas you can develop further:
Is your app the only means by which files are written to S3? If so, you can store the file sizes (in a database, a file, or whatever) and sum them when necessary.
Make concurrent calls to the LIST API.
Can you switch from an organisation based on folders to one based on buckets? If so, you could query the billing API (yes, the billing one) and calculate the size (or an approximation of it) from the cost...
If they're limiting you to 1,000 keys per request, I'm not certain how PowerShell is going to help, but if you want the size of a bunch of folders, something like this should do it.
Save the following in a file called Get-FolderSize.ps1:
param
(
[Parameter(Position=0, ValueFromPipeline=$True, Mandatory=$True)]
[ValidateNotNullOrEmpty()]
[System.String]
$Path
)
function Get-FolderSize ($_ = (get-item .)) {
Process {
$ErrorActionPreference = "SilentlyContinue"
#? { $_.FullName -notmatch "\\email\\?" } <-- Exclude folders.
$length = (Get-ChildItem $_.fullname -recurse | Measure-Object -property length -sum).sum
$obj = New-Object PSObject
$obj | Add-Member NoteProperty Folder ($_.FullName)
$obj | Add-Member NoteProperty Length ($length)
Write-Output $obj
}
}
Function Class-Size($size)
{
IF($size -ge 1GB)
{
"{0:n2}" -f ($size / 1GB) + " GB"
}
ELSEIF($size -ge 1MB)
{
"{0:n2}" -f ($size / 1MB) + " MB"
}
ELSE
{
"{0:n2}" -f ($size / 1KB) + " KB"
}
}
Get-ChildItem $Path | Get-FolderSize | Sort-Object -Property Length -Descending | Select-Object -Property Folder, Length | Format-Table -Property Folder, @{ Label="Size of Folder" ; Expression = {Class-Size($_.Length)} }
Usage: .\Get-FolderSize.ps1 -Path \path\to\your\folders