PowerShell Copy Files to AWS Bucket with Object Lock

I want to copy CSV files generated by an SSIS package from an AWS EC2 server to an S3 bucket. Each time I try I get a Content-MD5 HTTP header error, because we have Object Lock enabled on the bucket.
Write-S3Object : Content-MD5 HTTP header is required for Put Object requests with Object Lock parameters
I would assume there is a PowerShell parameter I can add, or that I am missing something, but after furious googling I cannot find a resolution. Any help or an alternative option would be appreciated.
I am now testing using the AWS CLI process instead of PowerShell.

If you do want to continue to use the Write-S3Object PowerShell command the missing magic flag is:
-CalculateContentMD5Header 1
So the final command will be:
Write-S3Object -Region $region -BucketName $bucketName -File $fileToBackup -Key $destinationFileName -CalculateContentMD5Header 1
https://docs.aws.amazon.com/powershell/latest/reference/items/Write-S3Object.html

After a lot of testing, reading and frustration, I found the AWS CLI was able to do exactly what I needed. I am unsure if this is an issue with my PowerShell knowledge or a missing feature (I lean toward my knowledge).
I created a .bat file that used the CLI to move the files into the S3 bucket, and then called that .bat file from an SSIS Execute Process Task.
I've dropped the one-line command below in case it helps others.
aws s3 mv C:\path\to\files\ s3://your.s3.bucket.name/ --recursive

Related

aws s3 sync --delete, does not remove directory manually created from console

aws s3 sync <> <> --delete works fine, but I have a scenario where someone created directories using the AWS console and manually uploaded some files into those directories. Now when I run the sync command, those files get removed but the manually created directories still persist.
Is this an expected behavior of the command?
This issue is tracked here - https://github.com/aws/aws-cli/issues/2685 . It is a known bug and no direct solution is available yet, so we need to create workarounds to suit our own situation.
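One possible workaround, as a sketch (the bucket and folder names below are placeholders): a folder created from the console is just a zero-byte placeholder object whose key ends in "/", and sync --delete leaves those placeholders behind, so they can be removed directly.

# "your-bucket" and "manually-created-folder/" are hypothetical names.
# A console-created folder is a zero-byte object whose key ends in "/",
# so once its contents are gone it can be deleted like any other object:
aws s3 rm s3://your-bucket/manually-created-folder/

# To see which placeholder keys are left over before deleting anything:
aws s3api list-objects-v2 --bucket your-bucket --query "Contents[?ends_with(Key, '/')].Key"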

aws S3 how to show contents of file based on a pattern

I would like to perform a simple
cat dir/file.OK.*
How can this be achieved in AWS?
I came up with
aws s3 cp s3://bucket-name/path/to/folder/ - --exclude="*" --include="R0.OK.*"
but this returns:
download failed: *(the path)* to - An error occurred (404) when calling the HeadObject operation: Not Found
Thank you for your help.
Additional detail: there is supposed to be only one file matching the pattern, so we could use that information. Anything else is allowed to (horribly) fail.
Edit - currently I am just executing an aws s3 ls into a file, and then cp-ing every file that grep matches. It works, but it's a nuisance.
As an alternative to streaming the S3 content directly to stdout, consider simply copying the files from S3 to local disk, which lets you use --include/--exclude/--recursive; then cat the files locally and delete them afterwards.
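For example, a minimal sketch of that approach, using the bucket, prefix and pattern from the question and an assumed temporary directory:

# Copy only the matching objects into a scratch directory
tmpdir=$(mktemp -d)
aws s3 cp s3://bucket-name/path/to/folder/ "$tmpdir" --recursive --exclude "*" --include "R0.OK.*"

# There is supposed to be exactly one match, so cat it and clean up
cat "$tmpdir"/R0.OK.*
rm -rf "$tmpdir"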

automating file archival from ec2 to s3 based on last modified date

I want to write an automated job that goes through my files stored on the EC2 instance storage and checks the last modified date. If a file has not been modified for more than (x) days, it should automatically be archived to my S3 bucket.
Also, I don't want to convert the file to a zip file for now.
What I don't understand is how to give the path on the EC2 instance storage and how to put the condition on the last modified date.
aws s3 sync your-new-dir-name s3://your-s3-bucket-name/folder-name
Please correct me if I have understood this wrong.
Your requirement is to archive the older files.
So you need a script that checks the modified time, and if a file has not been modified for X days, you need to free space by archiving it to S3 storage. You don't wish to keep the file locally.
Is that correct?
Here is some advice:
1. Please provide OS information; this would help us suggest a shell script or a PowerShell script.
Here is a PowerShell script:
# Get-ChildItem (not Get-Content) lists the files in a folder
$fileList = Get-ChildItem "C:\pathtofolder" -File
foreach ($file in $fileList) {
    $file | Select-Object -Property FullName, LastWriteTime |
        Export-Csv 'C:\fileAndDate.csv' -NoTypeInformation -Append
}
Then use aws s3 cp to copy the files to the S3 bucket.
You would do the same with a shell script.
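For example, a rough shell-script sketch of the same idea; the source path, age threshold and bucket name are assumptions, and aws s3 mv removes the local copy once the upload succeeds:

#!/usr/bin/env bash
# Hypothetical source directory and bucket; adjust to your environment.
SRC_DIR=/path/to/files
BUCKET=s3://your-s3-bucket-name/archive

# Find regular files not modified in the last 30 days and move each to S3
find "$SRC_DIR" -type f -mtime +30 -print0 |
  while IFS= read -r -d '' f; do
    aws s3 mv "$f" "$BUCKET/$(basename "$f")"
  done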
Using aws s3 sync is a great way to backup files to S3. You could use a command like:
aws s3 sync /home/ec2-user/ s3://my-bucket/ec2-backup/
The first parameter (/home/ec2-user/) is where you specify the source of the files. I recommend only backing up user-created files, not the whole operating system.
There is no capability for specifying a number of days. I suggest you just copy all files.
You might choose to activate Versioning to keep copies of all versions of files in S3. This way, if a file gets overwritten you can still go back to a prior version. (Storage charges will apply for all versions kept in S3.)
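If you do decide to use Versioning, it can be enabled from the CLI as well; the bucket name below is a placeholder:

# One-time setup: turn on versioning for the destination bucket
aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled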

Remove processed source files after AWS Datapipeline completes

A third party sends me a daily upload of log files into an S3 bucket. I'm attempting to use DataPipeline to transform them into a slightly different format with awk, place the new files back on S3, then move the original files aside so that I don't end up processing the same ones again tomorrow.
Is there a clean way of doing this? Currently my shell command looks something like:
#!/usr/bin/env bash
set -eu -o pipefail
aws s3 cp s3://example/processor/transform.awk /tmp/transform.awk
for f in "${INPUT1_STAGING_DIR}"/*; do
  # Strip the directory and any extensions to get the bare file name
  base=$(basename "$f")
  base=${base%%.*}
  unzip -p "$f" | awk -f /tmp/transform.awk | gzip > "${OUTPUT1_STAGING_DIR}/${base}.tsv.gz"
done
I could use the AWS CLI to move the source file aside on each iteration of the loop, but that seems flaky - if my loop dies halfway through processing, those earlier files are going to get lost.
A few possible solutions:
Create a trigger on your S3 bucket: whenever an object is added to the bucket, invoke a Lambda function (which can be a Python script) that performs the transformation and copies the result to another bucket. On that second bucket, another Lambda function is invoked which deletes the file from the first bucket.
I personally feel that what you have achieved is good enough. All you need is error handling in the shell script: delete the source file (never lose data) ONLY when the output file has been created successfully (you can probably also check the size of the output file).
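A sketch of that pattern, building on the loop above; the incoming and processed prefixes are hypothetical, and the original is only moved aside once the transform pipeline has succeeded:

# Hypothetical prefixes: where the originals arrive and where to park them
SRC_PREFIX=s3://example/incoming
DONE_PREFIX=s3://example/processed

for f in "${INPUT1_STAGING_DIR}"/*; do
  base=$(basename "$f")
  name=${base%%.*}
  # Move the original aside only if the whole transform pipeline succeeded
  if unzip -p "$f" | awk -f /tmp/transform.awk | gzip > "${OUTPUT1_STAGING_DIR}/${name}.tsv.gz"; then
    aws s3 mv "${SRC_PREFIX}/${base}" "${DONE_PREFIX}/${base}"
  fi
done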

aws s3 mv/sync command

I have about 2 million files nested in subfolders in a bucket and want to move all of them to another bucket. After spending a lot of time searching... I found a solution: use the AWS CLI mv/sync commands - either use the move command, or use the sync command and then delete all the files after they have successfully synced.
aws s3 mv s3://mybucket/ s3://mybucket2/ --recursive
or it can be done as:
aws s3 sync s3://mybucket/ s3://mybucket2/
But the problem is: how would I know how many files/folders have been moved or synced, and how much time it will take...
And what if some exception occurs (the machine/server stops, or the internet disconnects for any reason)... do I have to execute the command again, or will it definitely complete and move/sync all the files? How can I be sure about the number of files moved/synced and not moved/synced?
Or could I do something like this:
move a limited number of files, e.g. 100 thousand, and repeat until all files are moved...
or move files on the basis of upload time, e.g. files uploaded from a starting date to an ending date?
If yes... how?
To sync them use:
aws s3 sync s3://mybucket/ s3://mybucket2/
You can repeat the command after it finishes (or fails) without issue. It will check whether anything is missing or different in the target S3 bucket and process it again.
The time depends on the size of the files and how many objects you have. Amazon counts directories as objects, so they matter too.
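To check how many objects ended up on each side, one simple way (a sketch using the bucket names from the question) is to compare the object counts that aws s3 ls reports with --summarize:

# The last two lines of output are the "Total Objects" and "Total Size" summary
aws s3 ls s3://mybucket/ --recursive --summarize | tail -n 2
aws s3 ls s3://mybucket2/ --recursive --summarize | tail -n 2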