aws s3 sync fails to create root folders - amazon-web-services

I am archiving some folders to S3
Example: C:\UserProfile\E21126\data ....
I expect to have a folder structure in S3 like UserProfiles\E21126.
The problem is that it creates the folders under \E21126 but misses creating the root folder \E21126 itself.
Folds1.txt contains these folders to sync:
G:\UserProfiles\E21126
G:\UserProfiles\E47341
G:\UserProfiles\C68115
G:\UserProfiles\C30654
G:\UserProfiles\C52860
G:\UserProfiles\E47341
G:\UserProfiles\C68115
G:\UserProfiles\C30654
G:\UserProfiles\C52860
My code is below:
ForEach ($Folder in (Get-content "F:\scripts\Folds1.txt")) {
    aws s3 sync $Folder s3://css-lvdae1cxfs003-archive/Archive-Profiles/ --acl bucket-owner-full-control --storage-class STANDARD
}

It will upload all the folders with their names but without the path. If you want to include UserProfiles in the S3 bucket, then you need to include it in the key, i.e. upload them to the S3 bucket while specifying the key name:
aws s3 sync $Folder s3://css-lvdae1cxfs003-archive/Archive-Profiles/UserProfiles --acl bucket-owner-full-control --storage-class STANDARD
If your folders sit under a different name instead of the UserProfiles string, you can get the parent path and then fetch its leaf to get that name from the string:
PS C:\> Split-Path -Path "G:\UserProfiles\E21126"
G:\UserProfiles
PS C:\> Split-Path -Path "G:\UserProfiles" -Leaf -Resolve
UserProfiles
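In the same way, Split-Path -Leaf on each entry from Folds1.txt gives the user folder name, so a sketch that appends it to the destination key (same bucket and options as in the question) could look like:
ForEach ($Folder in (Get-Content "F:\scripts\Folds1.txt")) {
    # e.g. "G:\UserProfiles\E21126" -> "E21126"
    $Leaf = Split-Path -Path $Folder -Leaf
    aws s3 sync $Folder "s3://css-lvdae1cxfs003-archive/Archive-Profiles/$Leaf/" --acl bucket-owner-full-control --storage-class STANDARD
}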

If you were to modify the text file to contain:
E21126
E47341
C68115
Then you could use the command:
ForEach ($Folder in (Get-content "F:\scripts\Folds1.txt")) {
    aws s3 sync G:\UserProfiles\$Folder s3://css-lvdae1cxfs003-archive/Archive-Profiles/$Folder/ --acl bucket-owner-full-control --storage-class STANDARD
}
Note that the folder name is included in the destination path.

Related

Download Last 24 hour files from s3 using Powershell

I have an S3 bucket with different filenames. I need to download specific files (filenames that start with "impression") that were created or modified in the last 24 hours from the S3 bucket to a local folder using PowerShell.
$items = Get-S3Object -BucketName $sourceBucket -ProfileName $profile -Region 'us-east-1' | Sort-Object LastModified -Descending | Select-Object -First 1 | select Key
Write-Host "$($items.Length) objects to copy"
$index = 1
$items | % {
    Write-Host "$index/$($items.Length): $($_.Key)"
    $fileName = $Folder + ".\$($_.Key.Replace('/','\'))"
    Write-Host "$fileName"
    Read-S3Object -BucketName $sourceBucket -Key $_.Key -File $fileName -ProfileName $profile -Region 'us-east-1' > $null
    $index += 1
}
A workaround might be to turn on access logging; since the access logs contain timestamps, you can get all access logs from the past 24 hours, de-duplicate repeated S3 objects, and then download them all.
You can enable S3 access log in the bucket settings, the logs will be stored in another bucket.
If you end up writing a script for this, just bear in mind downloading the S3 objects will essentially create new access logs, making the operation irreversible.
If you want something fancy, you can even query the logs and deduplicate them using AWS Athena.
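Alternatively, since Get-S3Object already exposes each object's LastModified, a minimal client-side filter may be enough for the "last 24 hours" requirement. A sketch, assuming the AWSPowerShell module and that $sourceBucket and $profile are defined as in the question ($destFolder is a hypothetical local target folder):
$cutoff = (Get-Date).AddHours(-24)
Get-S3Object -BucketName $sourceBucket -KeyPrefix 'impression' -ProfileName $profile -Region 'us-east-1' |
    Where-Object { $_.LastModified -gt $cutoff } |
    ForEach-Object {
        # mirror the key into the local folder, converting / to \
        $fileName = Join-Path $destFolder ($_.Key.Replace('/', '\'))
        Read-S3Object -BucketName $sourceBucket -Key $_.Key -File $fileName -ProfileName $profile -Region 'us-east-1' > $null
    }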

Powershell writing to AWS S3

I'm trying to get PowerShell to write results to AWS S3 and I can't figure out the syntax. Below is the line that is giving me trouble. If I run this without everything after the ">>", the results print on the screen.
Write-host "Thumbprint=" $i.Thumbprint " Expiration Date="$i.NotAfter " InstanceID ="$instanceID.Content" Subject="$i.Subject >> Write-S3Object -BucketName arn:aws:s3:::eotss-ssl-certificatemanagement
Looks like you have an issue with >>; be aware that you can't pass the Write-Host function's result into another command.
In order to do that, you need to assign the string you want to a variable and then pass it to the -Content parameter.
Take a look at the following code snippet:
Install-Module AWSPowerShell
Import-Module AWSPowerShell
#Set AWS Credential
Set-AWSCredential -AccessKey "AccessKey" -SecretKey "SecretKey"
#File upload
Write-S3Object -BucketName "BucketName" -Key "File upload test" -File "FilePath"
#Content upload
$content = "Thumbprint= $($i.Thumbprint) Expiration Date=$($i.NotAfter) InstanceID = $($instanceID.Content) Subject=$($i.Subject)"
Write-S3Object -BucketName "BucketName" -Key "Content upload test" -Content $content
How to create new AccessKey and SecretKey - Managing Access Keys for Your AWS Account.
AWSPowerShell Module installation.
AWS Tools for PowerShell - S3 Documentation.

How to pass AWS CLI parameters as variables in Powershell

I have an AWS CLI command to download some files from my S3 bucket, but I wanted to pass parameters and their values in from PowerShell variables. Is this possible?
This works:
$filterInclude = "7012*"
$results = aws s3 cp $bucketPath $destinationDir --recursive --exclude $filterExclude --include $filterInclude
But I wanted something like:
$filterInclude = "7012*"
$includeCom = "--include \`"$($filterInclude)\`""
$results = aws s3 cp $bucketPath $destinationDir --recursive --exclude $filterExclude "$($includeCom)"
The result I get is:
Unknown options: --include "7012*"
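That error usually happens because PowerShell hands the entire string --include "7012*" to the AWS CLI as a single argument. A sketch of one common workaround, assuming $bucketPath, $destinationDir and $filterExclude are defined as above: build the extra options as an array, which PowerShell expands into separate arguments when calling an external command.
$filterInclude = "7012*"
# each element of the array is passed to aws as its own argument
$includeArgs = @('--include', $filterInclude)
$results = aws s3 cp $bucketPath $destinationDir --recursive --exclude $filterExclude $includeArgs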

How do I list all AWS S3 objects that are public?

I want to list all the objects in my S3 buckets that are public. Using get-object-acl would list the grantees for a specific object, so I was wondering if there are better options.
Relying on get-object-acl is probably not what you want to do, because objects can be made public by means other than their ACL. At the very least, this is possible through both the object's ACL and also the bucket's policy (see e.g. https://havecamerawilltravel.com/photographer/how-allow-public-access-amazon-bucket/), and perhaps there are other means I don't know about.
A smarter test is to make a HEAD request to each object with no credentials. If you get a 200, it's public. If you get a 403, it's not.
The steps, then, are:
Get a list of buckets with the ListBuckets endpoint. From the CLI, this is:
aws2 s3api list-buckets
For each bucket, get its region and list its objects. From the CLI (assuming you've got credentials configured to use it), you can do these two things with these two commands, respectively:
aws2 s3api get-bucket-location --bucket bucketnamehere
aws2 s3api list-objects --bucket bucketnamehere
For each object, make a HEAD request to a URL like
https://bucketname.s3.us-east-1.amazonaws.com/objectname
with bucketname, us-east-1, and objectname respectively replaced with your bucket name, the actual name of the bucket's region, and your object name.
To do this from the Unix command line with Curl, do
curl -I https://bucketname.s3.us-east-1.amazonaws.com/objectname
An example implementation of the logic above in Python using Boto 3 and Requests:
from typing import Iterator

import boto3
import requests

s3 = boto3.client('s3')

all_buckets = [
    bucket_dict['Name'] for bucket_dict in
    s3.list_buckets()['Buckets']
]

def list_objs(bucket: str) -> Iterator[str]:
    """
    Generator yielding all object names in the bucket. Potentially requires
    multiple requests for large buckets since list_objects is capped at 1000
    objects returned per call.
    """
    response = s3.list_objects_v2(Bucket=bucket)
    while True:
        if 'Contents' not in response:
            # Happens if bucket is empty
            return
        for obj_dict in response['Contents']:
            yield obj_dict['Key']
        last_key = obj_dict['Key']
        if response['IsTruncated']:
            response = s3.list_objects_v2(Bucket=bucket, StartAfter=last_key)
        else:
            return

def is_public(bucket: str, region: str, obj: str) -> bool:
    url = f'https://{bucket}.s3.{region}.amazonaws.com/{obj}'
    resp = requests.head(url)
    if resp.status_code == 200:
        return True
    elif resp.status_code == 403:
        return False
    else:
        raise Exception(f'Unexpected HTTP code {resp.status_code} from {url}')

for bucket in all_buckets:
    region = s3.get_bucket_location(Bucket=bucket)['LocationConstraint']
    for obj in list_objs(bucket):
        if is_public(bucket, region, obj):
            print(f'{bucket}/{obj} is public')
Be aware that this takes about a second per object, which is... not ideal, if you have a lot of stuff in S3. I don't know of a faster alternative, though.
After spending some time with the AWS CLI, I can tell you that the best approach is to sync, mv, or cp files with permissions under structured prefixes.
Permission – Specifies the granted permissions, and can be set to read, readacl, writeacl, or full.
For example: aws s3 sync . s3://my-bucket/path --acl public-read
Then list all the objects under the prefix you need.
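A minimal sketch of that listing using the AWS Tools for PowerShell (the bucket name and prefix here are placeholders):
# list the keys under the prefix where the public-read objects were synced
Get-S3Object -BucketName 'my-bucket' -KeyPrefix 'path/' | Select-Object -ExpandProperty Key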
Put the name of the bucket, or a list of buckets, into a "buckets.list" file and run the bash script below.
The script supports an unlimited(!) number of objects, as it uses pagination.
#!/bin/bash
MAX_ITEMS=100
PAGE_SIZE=100

for BUCKET in $(cat buckets.list); do
  NEXT_TOKEN=""
  while : ; do
    # fetch one page of keys; pass --starting-token only from the second page on
    if [[ -z "$NEXT_TOKEN" ]]; then
      RESPONSE=$(aws s3api list-objects-v2 --bucket "$BUCKET" --max-items=$MAX_ITEMS --page-size=$PAGE_SIZE 2>&1)
    else
      RESPONSE=$(aws s3api list-objects-v2 --bucket "$BUCKET" --max-items=$MAX_ITEMS --page-size=$PAGE_SIZE --starting-token "$NEXT_TOKEN" 2>&1)
    fi
    e1=$?
    if [[ "$RESPONSE" =~ "Could not connect to the endpoint URL" ]]; then
      echo "Could not connect to the endpoint URL!"
      echo -e "$BUCKET" "Could not connect to the endpoint URL" >> errors.log
    fi
    OBJECTS=$(echo "$RESPONSE" | jq -r '.Contents[]?.Key')
    for OBJECT in $OBJECTS; do
      ACL=$(aws s3api get-object-acl --bucket "$BUCKET" --key "$OBJECT" --query "Grants[?Grantee.URI=='http://acs.amazonaws.com/groups/global/AllUsers']" --output=text 2>&1)
      e2=$?
      if [[ "$ACL" =~ "Could not connect to the endpoint URL" ]]; then
        echo "Could not connect to the endpoint URL!"
        echo -e "$BUCKET" "$OBJECT" "Could not connect to the endpoint URL" >> errors.log
      fi
      if [[ ! "$ACL" == "" ]] && [[ $e1 == 0 ]] && [[ $e2 == 0 ]]; then
        echo -e "$BUCKET" "$OBJECT" "Public object!!!" "$ACL"
        echo -e "$BUCKET" "$OBJECT" "$ACL" >> public-objects.log
      else
        echo -e "$BUCKET" "$OBJECT" "not public"
      fi
    done
    # NextToken is only present when there are more pages
    NEXT_TOKEN=$(echo "$RESPONSE" | jq -r '.NextToken // empty')
    [[ -z "$NEXT_TOKEN" ]] && break
  done
done

How to cp file only if it does not exist, throw error otherwise?

aws s3 cp "dist/myfile" "s3://my-bucket/production/myfile"
It always copies myfile to s3 - I would like to copy the file ONLY if it does not exist, and throw an error otherwise. How can I do it? Or at least, how can I use the awscli to check if the file exists?
You could test for the existence of a file by listing the file, and seeing whether it returns something. For example:
aws s3 ls s3://bucket/file.txt | wc -l
This would return a zero (no lines) if the file does not exist.
If you only want to copy a file if it does not exist, try the sync command, e.g.:
aws s3 sync . s3://bucket/ --exclude '*' --include 'file.txt'
This will synchronize the local file with the remote object, only copying it if it does not exist or if the local file is different to the remote object.
So, turns out that "aws s3 sync" doesn't do files, only directories. If you give it a file, you get...interesting...behavior, since it treats anything you give it like a directory and throws a slash on it. At least aws-cli/1.6.7 Python/2.7.5 Darwin/13.4.0 does.
%% date > test.txt
%% aws s3 sync test.txt s3://bucket/test.txt
warning: Skipping file /Users/draistrick/aws/test.txt/. File does not exist.
So, if you -really- only want to sync a file (only upload if exists, and if checksum matches) you can do it:
file="test.txt"
aws s3 sync --exclude '*' --include "$file" "$(dirname $file)" "s3://bucket/"
Note the exclude/include order - if you reverse that, it won't include anything. And your source and include path need to have sanity around their matching, so maybe a $(basename $file) is in order for --include if you're using full paths... aws --debug s3 sync is your friend here to see how the includes evaluate.
And don't forget the target is a directory key, not a file key.
Here's a working example:
%% file="test.txt"
%% date >> $file
%% aws s3 sync --exclude '*' --include "$file" "$(dirname $file)" "s3://bucket/"
upload: ./test.txt to s3://bucket/test.txt
%% aws s3 sync --exclude '*' --include "$file" "$(dirname $file)" "s3://bucket/"
%% date >> $file
%% aws s3 sync --exclude '*' --include "$file" "$(dirname $file)" "s3://bucket/"
upload: ./test.txt to s3://bucket/test.txt
(now, if only there were a way to ask aws s3 to -just- validate the checksum, since it seems to always do multipart style checksums.. oh, maybe some --dryrun and some output scraping and sync..)
You can do this by listing the object and copying only if the list fails (i.e. the object does not exist).
aws s3 ls "s3://my-bucket/production/myfile" || aws s3 cp "dist/myfile" "s3://my-bucket/production/myfile"
Edit: replaced && with || to get the desired effect of copying only if the list fails.
You can also check the existence of a file with the aws s3api head-object subcommand. An advantage of this over aws s3 ls is that it just requires the s3:GetObject permission instead of s3:ListBucket.
$ aws s3api head-object --bucket ${BUCKET} --key ${EXISTENT_KEY}
{
    "AcceptRanges": "bytes",
    "LastModified": "Wed, 1 Jan 2020 00:00:00 GMT",
    "ContentLength": 10,
    "ETag": "\"...\"",
    "VersionId": "...",
    "ContentType": "binary/octet-stream",
    "ServerSideEncryption": "AES256",
    "Metadata": {}
}
$ echo $?
0
$ aws s3api head-object --bucket ${BUCKET} --key ${NON_EXISTENT_KEY}
An error occurred (403) when calling the HeadObject operation: Forbidden
$ echo $?
255
Note that the HTTP status code for the non-existent object depends on whether you have the s3:ListBucket permission. See the API document for more details:
If you have the s3:ListBucket permission on the bucket, Amazon S3 returns an HTTP status code 404 ("no such key") error.
If you don’t have the s3:ListBucket permission, Amazon S3 returns an HTTP status code 403 ("access denied") error.
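To tie this back to the original question, here is a short PowerShell-style sketch (the $Bucket and $Key variables are placeholders; the AWS CLI sets $LASTEXITCODE) that copies only when head-object reports the key as missing:
aws s3api head-object --bucket $Bucket --key $Key *> $null
if ($LASTEXITCODE -eq 0) {
    # the object is already there: fail, as the question asks
    Write-Error "Object already exists: s3://$Bucket/$Key"
    exit 1
}
aws s3 cp "dist/myfile" "s3://$Bucket/$Key"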
AWS HACK
You can run the following command to raise an ERROR if the file already exists:
Run the aws s3 sync command to sync the file to S3; it will print the copied path if the file doesn't exist, and give blank output if it exists.
Run the wc -c command to check the character count and raise an error if the output is zero.
com=$(aws s3 sync dist/ s3://my-bucket/production/ | wc -c); if [[ $com -eq 0 ]]; then exit 1; else exit 0; fi;
OR
#!/usr/bin/env bash
com=$(aws s3 sync dist s3://my-bucket/production/ | wc -c)
echo "hello $com"
if [[ $com -eq 0 ]]; then
    echo "File already exists"
    exit 1
else
    echo "success"
    exit 0
fi
I voted up aviggiano. Using his example above, I was able to get this to work in my Windows .bat file. If the S3 path exists it will throw an error and end the batch job. If the file does not exist it will continue on to perform the copy function. Hope this helps someone.
:Step1
aws s3 ls s3://00000000000-fake-bucket/my/s3/path/inbound/test.txt && ECHO Could not copy to S3 bucket because S3 Object already exists, ending script. && GOTO :Failure
ECHO No file found in bucket, begin upload.
aws s3 cp Z:\MY\LOCAL\PATH\test.txt s3://00000000000-fake-bucket/my/s3/path/inbound/test.txt --exclude "*" --include "*.txt"
:Step2
ECHO YOU MADE IT, LET'S CELEBRATE
IF %ERRORLEVEL% == 0 GOTO :Success
GOTO :Failure
:Success
echo Job Ended success
GOTO :ExitScript
:Failure
echo BC_Script_Execution_Complete Failure
GOTO :ExitScript
:ExitScript
I am running the AWS CLI on Windows, and this is my simple script.
rem clean work files:
if exist SomeFileGroup_remote.txt del /q SomeFileGroup_remote.txt
if exist SomeFileGroup_remote-fileOnly.txt del /q SomeFileGroup_remote-fileOnly.txt
if exist SomeFileGroup_Local-fileOnly.txt del /q SomeFileGroup_Local-fileOnly.txt
if exist SomeFileGroup_remote-Download-fileOnly.txt del /q SomeFileGroup_remote-Download-fileOnly.txt
Rem prep:
call F:\Utilities\BIN\mhedate.cmd
aws s3 ls s3://awsbucket//someuser#domain.com/BulkRecDocImg/folder/folder2/ --recursive >>SomeFileGroup_remote.txt
for /F "tokens=1,2,3,4* delims= " %%i in (SomeFileGroup_remote.txt) do #echo %%~nxl >>SomeFileGroup_remote-fileOnly.txt
dir /b temp\*.* >>SomeFileGroup_Local-fileOnly.txt
findstr /v /I /l /G:"SomeFileGroup_Local-fileOnly.txt" SomeFileGroup_remote-fileOnly.txt >>SomeFileGroup_remote-Download-fileOnly.txt
Rem Download:
for /F "tokens=1* delims= " %%i in (SomeFileGroup_remote-Download-fileOnly.txt) do (aws s3 cp s3://awsbucket//someuser#domain.com/BulkRecDocImg/folder/folder2/%%~nxi "temp" >>"SomeFileGroup_Download_%DATE.YEAR%%DATE.MONTH%%DATE.DAY%.log")
I added the date to the path in order to not overwrite the file:
aws s3 cp videos/video_name.mp4 s3://BUCKET_NAME/$(date +%D-%H:%M:%S)
That way I will have a history and the existing file won't be overwritten.