AWS S3 presigned URL, check if file exists

I use the AWS PHP SDK.
How can I check whether a file exists using presigned request commands?
Currently I use the GetObject command, but I do not need to download the file; I only need to check that it exists.
$cmd = $s3->getCommand('GetObject', [
    'Bucket' => 's3.test.bucket',
    'Key'    => $fileKey
]);
$request = $s3->createPresignedRequest($cmd, '+60 minutes')->withMethod('GET');
return (string)$request->getUri();
Is there a command to achieve this?
Thank you.

I found the solution. The proper command is HeadObject and the method is HEAD.
It returns 200 if the object exists and 404 if it does not.
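For reference, a minimal sketch of the presigned HEAD request, assuming the same $s3 client and $fileKey as above:

// Presign a HeadObject call instead of GetObject; HEAD fetches only metadata.
$cmd = $s3->getCommand('HeadObject', [
    'Bucket' => 's3.test.bucket',
    'Key'    => $fileKey
]);
$request = $s3->createPresignedRequest($cmd, '+60 minutes')->withMethod('HEAD');
$url = (string)$request->getUri();
// An HTTP HEAD request to $url returns 200 if the object exists, 404 if not.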

Related

Amazon S3 Multipart Upload Fails at 21% in Paws::S3, Net::Amazon::S3::Client and Amazon::S3

I've been attempting to do a simple but large file upload to S3 in Perl. Using Paws::S3, Net::Amazon::S3::Client and Amazon::S3, it fails at around 21% of the chunks being uploaded -- the first chunks go fine, but no matter what chunk size I choose, it fails at approximately that percentage. I've run into problems trying to use Net::Amazon::S3::Client more generally, but have kludged my way past the issue in this other question for now.
I suspect this is an S3 issue at this point, since I'm using two completely separate Perl libraries to attempt the upload, although perhaps both libraries have the same flaw. The Paws::S3 library simply pauses for a long time, outputs so much gibberish that it crashes my Terminal app, and dies. Net::Amazon::S3::Client at least gives a somewhat more meaningful error output:
Subroutine Net::Amazon::S3::Client::Object::put_part redefined at ftBackupPaws.pl line 136, <FH> line 14.
Progress: 21% [======================================= ]ETA 12:05500: write failed: at /usr/local/share/perl5/Net/Amazon/S3/Error/Handler/Confess.pm line 35.
Net::Amazon::S3::Error::Handler::Confess::handle_error(Net::Amazon::S3::Error::Handler::Confess=HASH(0x55b8d529af68), Net::Amazon::S3::Operation::Object::Upload::Part::Response=HASH(0x55b8d5ca7d50)) called at /usr/local/share/perl5/Net/Amazon/S3.pm line 393
Net::Amazon::S3::_perform_operation(Net::Amazon::S3=HASH(0x55b8d55a6fb0), "Net::Amazon::S3::Operation::Object::Upload::Part", "error_handler", Net::Amazon::S3::Error::Handler::Confess=HASH(0x55b8d529af68), "bucket", "MyBucketName", "part_number", 21, ...) called at /usr/local/share/perl5/Net/Amazon/S3/Client.pm line 109
Net::Amazon::S3::Client::_perform_operation(Net::Amazon::S3::Client=HASH(0x55b8d044cab0), "Net::Amazon::S3::Operation::Object::Upload::Part", "bucket", "MyBucketName", "headers", HASH(0x55b8d601b8c0), "part_number", 21, ...) called at /usr/local/share/perl5/Net/Amazon/S3/Client/Bucket.pm line 197
Net::Amazon::S3::Client::Bucket::_perform_operation(Net::Amazon::S3::Client::Bucket=HASH(0x55b8d55a6fc8), "Net::Amazon::S3::Operation::Object::Upload::Part", "key", "multipart1", "part_number", 21, "headers", HASH(0x55b8d601b8c0), ...) called at /usr/local/share/perl5/Net/Amazon/S3/Client/Object.pm line 520
Net::Amazon::S3::Client::Object::_perform_operation(Net::Amazon::S3::Client::Object=HASH(0x55b8d55a17b0), "Net::Amazon::S3::Operation::Object::Upload::Part", "upload_id", "BmDoBE8OdS8aUMUpP5m3jf9ltt7NmTiWuou01ODeW3YZzloElL3EnUcdApIX_"..., "part_number", 21, "headers", HASH(0x55b8d601b8c0), ...) called at ftBackupPaws.pl line 133
main::__ANON__(Net::Amazon::S3::Client::Object=HASH(0x55b8d55a17b0), "upload_id", "theUploadId"..., "part_number", 21, "value", "\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}\x{0}"...) called at ftBackupPaws.pl line 164
The code that is presently doing the upload is, as I said, a bit of a kludge. Net::Amazon::S3::Client seems to have a bug that prevents it from initializing a multipart upload, so to get started I used aws-cli to get an upload id and hard-coded it for the time being, just to assess how everything else worked. Here's the Paws::S3 version since it is less kludgy, but, again, they fail at the same point.
use Paws;
use Paws::Credential::Explicit;
use POSIX;
use Term::ProgressBar;

our $paws = Paws->new(config => {
    credentials => Paws::Credential::Explicit->new(
        access_key => $config{'access_key_id'},
        secret_key => $config{'access_key'}
    )
});
our $pawsS3 = Paws->service('S3',
    credentials => Paws::Credential::Explicit->new(
        access_key => $config{'access_key_id'},
        secret_key => $config{'access_key'}),
    region => 'us-east-1');
my $CreateMultipartUploadOutput = $pawsS3->CreateMultipartUpload(
    'Bucket' => $bucketName,
    'Key'    => 'backup/daily/2022-11-13/myaccount.tar.gz'
);
my $filename = "/backup/2022-12-13/accounts/myaccount.tar.gz";
my $size = -s $filename;
my $fiveMeg = (1024*1024*500);
my $parts = ceil($size / $fiveMeg);
my @manifest;
open(my $file, "/backup/2022-12-13/accounts/myaccount.tar.gz") or die("Error reading file, stopped");
my $progress = Term::ProgressBar->new({name   => 'Progress',
                                       count  => $parts,
                                       ETA    => 'linear',
                                       remove => 1});
for (my $i = 0; $i <= $parts; $i++) {
    my $chunk;
    my $offset = $i * $fiveMeg;
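    # Caution: read()'s fourth argument is an offset into $chunk (the target
    # scalar), not a position in the file, so passing $offset here front-pads
    # each chunk with "\0" bytes -- consistent with the \x{0} run in the trace above.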
    read($file, $chunk, $fiveMeg, $offset);
    my $UploadPartOutput = $pawsS3->UploadPart(
        'Body'       => $chunk,
        'Bucket'     => $bucketName,
        'Key'        => 'backup/daily/2022-11-13/myaccount.tar.gz',
        'PartNumber' => $i + 1,
        'UploadId'   => $CreateMultipartUploadOutput->UploadId
    );
    push(@manifest, { 'ETag' => $UploadPartOutput->ETag, 'PartNumber' => $i + 1 });
    $progressBarUpdate = $progress->update($i);
    say STDOUT 'Chunk ' . $i . '/' . $parts;
}
close($file);
my $CompleteMultipartUploadOutput = $pawsS3->CompleteMultipartUpload(
    'Bucket' => $bucketName,
    'Key'    => 'backup/daily/2022-11-13/myaccount.tar.gz',
    'MultipartUpload' => {
        'Parts' => \@manifest
    },
    'UploadId' => $CreateMultipartUploadOutput->UploadId
);
# Results:
use Data::Dumper;
say STDOUT Dumper $CompleteMultipartUploadOutput;
I can provide the other code base, too, if that's helpful.
Update (December 15, 2022): So it must have something to do with the API and not S3 itself. To test it further, I modified my script to output the 18 chunks rather than upload them. I then tried to use the aws CLI to do a multipart upload:
aws s3api create-multipart-upload --bucket mybucket --key 'multipart1'
aws s3api upload-part --bucket mybucket --key 'multipart1' --part-number 1 --body part0 --upload-id [ID GOES HERE]
aws s3api upload-part --bucket mybucket --key 'multipart1' --part-number 2 --body part1 --upload-id [ID GOES HERE]
# ....
I went well beyond the stopping point in the Perl API and each upload successfully received an ETag. So there's something in both Paws::S3 and Net::Amazon::S3::Client causing a disruption at around 2GB of uploads that doesn't happen in aws-cli.
Update (December 15, 2022): I tried Amazon::S3 and the result was the same: when it has uploaded 21%, it fails with:
Amazon::S3: Amazon responded with 500 write failed: at /usr/local/share/perl5/Amazon/S3/Bucket.pm line 418 thread 2.
I also tried reordering the iteration so that the program would hit the "troublemaking" chunk that appears at the 21% mark first. In doing so, the upload still failed after uploading 21% of the chunks -- so the failure seems to be tied to how much has been uploaded rather than to something corrupt in a particular chunk.
Also, I tried implementing threading, thinking maybe the upload was timing out. But queuing the work into multiple threads didn't solve anything, other than getting me to the failure point faster.

How to determine if a string is located in AWS S3 CSV file

I have a CSV file in AWS S3.
The file is very large: 2.5 gigabytes.
It has a single column of strings, over 120 million of them:
apc.com
xyz.com
ggg.com
dddd.com
...
How can I query the file to determine whether the string xyz.com is present?
I only need to know if the string is there or not; I don't need to return the file.
It would also be great if I could pass multiple strings to search for, and return only the ones that were found in the file.
For example:
Query => ['xyz.com','fds.com','ggg.com']
Will return => ['xyz.com','ggg.com']
The "S3 Select" SelectObjectContent API enables applications to retrieve only a subset of data from an object by using simple SQL expressions. Here's a Python example:
res = client.select_object_content(
    Bucket="my-bucket",
    Key="my.csv",
    ExpressionType="SQL",
    InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},  # or IGNORE, USE
    OutputSerialization={"JSON": {}},
    Expression="SELECT * FROM S3Object s WHERE _1 IN ('xyz.com', 'ggg.com')")  # _1 refers to the first column
See this AWS blog post for an example with output parsing.
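For convenience, here's a minimal sketch of pulling the matched rows out of the response above (select_object_content returns an event stream; the matches arrive as Records events):

matches = []
for event in res["Payload"]:   # the payload is a stream of event dicts
    if "Records" in event:
        matches.append(event["Records"]["Payload"].decode("utf-8"))
found = "".join(matches)       # one JSON document per matching row
print(found)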
If you use the aws s3 cp command, you can send the output to stdout:
aws s3 cp s3://yourbucket/foo.csv - | grep 'apc.com'
The dash (-) sends the output to stdout instead of a local file.
Here are two examples of grep checking multiple patterns:
aws s3 cp s3://yourbucket/foo.csv - | grep -e 'apc.com' -e 'dddd.com'
aws s3 cp s3://yourbucket/foo.csv - | grep 'apc.com\|dddd.com'
To learn more about grep, please look at the manual: GNU Grep 3.7
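Since you only need to know whether the string is there, a small variation on the above (same bucket and file assumed) uses grep's exit status instead of printing matches:

# -q suppresses output; -F treats the pattern as a fixed string, not a regex.
# The pipeline's exit status is grep's: 0 if found, 1 if not. Note that
# aws s3 cp may report a broken pipe once grep exits early; that is harmless here.
aws s3 cp s3://yourbucket/foo.csv - | grep -qF 'xyz.com' && echo 'found' || echo 'not found'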

How to execute AWS SSM send command to run shell script with arguments from Lambda?

I'm currently working on an AWS Lambda function to execute a shell script (with arguments) remotely on an EC2 instance.
The shell script's argument values are stored as environment variables in Lambda.
How do I reference Lambda's environment variables inside the SSM send command?
I have a code snippet like this (but it doesn't work):
response = ssm_client.send_command(
    InstanceIds=instances,
    DocumentName="AWS-RunShellScript",
    Parameters={
        "commands": ["sh /bin/TEST/test.sh -v {{os.environ:tar_version}}"]
    },
    OutputS3BucketName="tar",
    OutputS3Region="eu-west-1"
)
Could you please help me here?
Thanks.
All you need to do is simple string formatting. Using Python's f-strings:
import os

tar_version = os.environ['TAR_VERSION']

response = ssm_client.send_command(
    InstanceIds=instances,
    DocumentName="AWS-RunShellScript",
    Parameters={
        "commands": [f"sh /bin/TEST/test.sh -v {tar_version}"]
    },
    OutputS3BucketName="tar",
    OutputS3Region="eu-west-1"
)

AWS - Centos7 - /home/.aws/credentials not working

I have a CentOS 7 VPS with the AWS CLI installed in the /home directory. I've added my credentials via aws configure and it generated the following files:
/home/.aws/credentials
/home/.aws/config
If I run the following code, it fails:
$client = new Aws\Lightsail\LightsailClient([
    'region'  => 'eu-west-2',
    'version' => '2016-11-28'
]);
The error message is:
AccessDeniedException (client): User: arn:aws:sts::523423432423:assumed-role/AmazonLightsailInstanceRole/i-0eb5b2155b08e5185 is not authorized to perform
However, if I add my credentials like so, it works:
$credentials = new Aws\Credentials\Credentials('key', 'secret');
$client = new Aws\Lightsail\LightsailClient([
    'region'      => 'eu-west-2',
    'version'     => '2016-11-28',
    'credentials' => $credentials
]);
Do I need to do something extra in order to get my script to read the /home/.aws/credentials file?
Do I need to do something extra in order to get my script to read the /home/.aws/credentials file?
Yes, you need to put the .aws directory (which contains the credentials file) in the home directory of the user running the script. That will be something like /home/username, meaning the full path to the credentials file will be /home/username/.aws/credentials. It does not matter where the aws command itself is installed.
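For illustration, a minimal sketch assuming the files were moved to /home/username/.aws/ (the username is a placeholder): the SDK resolves ~/.aws from the HOME environment variable, and you can also name the profile explicitly.

putenv('HOME=/home/username'); // only needed if the process runs without a proper HOME

$client = new Aws\Lightsail\LightsailClient([
    'region'  => 'eu-west-2',
    'version' => '2016-11-28',
    'profile' => 'default'     // read from ~/.aws/credentials
]);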

AWS SSM RunCommand - Issue with RunRemoteScript Document to run PowerShell script with parameters

In AWS SSM, I use the RunRemoteScript document to run a PowerShell script that installs some software on SSM-managed instances. The script is hosted in a publicly accessible S3 bucket.
The RunCommand works fine when the script takes no parameters; the software was successfully deployed to the managed instances. But my script has a unique CID embedded in the code. For security reasons, I need to take it out and pass it in as a parameter to the PS script. Ever since then, the RunCommand just keeps failing.
My script looks like the one below (with the parameter CID):
param (
    [Parameter(Position = 0, Mandatory = 1)]
    [string]$CID
)

Start-Transcript -Path "$([System.Environment]::GetEnvironmentVariable('TEMP','Machine'))\app_install.log" -Append

function Install-App {
    <#
        Installs App
    #>
    [CmdletBinding()]
    [OutputType([PSCustomObject])]
    param (
        [Parameter(Position = 0, Mandatory = 1)]
        [string]$msiURL,
        [Parameter(Position = 2, Mandatory = 1)]
        [string]$InstallCheck,
        [Parameter(Position = 3, Mandatory = 1)]
        [string]$CustomerID
    )
    if ( -not (Test-Path $installCheck)) {
        # Do stuff
        ...
    }
    else {
        Write-Host ("$installCheck - Already Installed")
        Return "Already Installed, Skipped $(($msiURL -split '([^\\/]+$)')[1])"
    }
}

Install-App -msiURL "https://s3.amazonaws.com/app.foo.com/Windows/app.exe" -InstallCheck "C:\Program Files\App\app.exe" -CustomerID $CID
Stop-Transcript
Following the AWS SSM documentation below, I ran this command to kick off the RunCommand:
https://docs.aws.amazon.com/systems-manager/latest/userguide/integration-remote-scripts.html
aws ssm send-command --document-name "AWS-RunRemoteScript" --targets "Key=instanceids,Values=mi-abc12345"
--parameters '{"sourceType":["S3"],"sourceInfo":["{\"path\": \"https://s3.amazonaws.com/app.foo.com/Windows/app_install.ps1\"}"],"commandLine":["app_install.ps1 abcd123456"]}'
The RunCommand keeps failing with the error below:
----------ERROR-------
app_install.ps1 : The term 'app_install.ps1' is not recognized
as the name of a cmdlet, function, script file, or operable program. Check the
spelling of the name, or if a path was included, verify that the path is
correct and try again.
At C:\ProgramData\Amazon\SSM\InstanceData\mi-abcd1234\document\orchestration\a6811111d-c411-411-a222-bad123456\runPowerShellScript\_script.ps1:4 char:2
+ app_install.ps1 abcd123456
+ ~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (app_install.ps1:String)
[], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
failed to run commands: exit status 255
I suspect this has to do with the way RunCommand handles the argument for the PowerShell script, but I cannot find any examples other than the official document, which I followed. Can anyone point out what the issue is here?
BTW, I already tried putting ".\" before the ps1, without luck.
I found out the cause of the issue. The IAM role attached to the instance did not have sufficient rights to access the S3 bucket that holds the script. As a result, SSM wasn't able to download the script to the instance, hence the error "...ps1 is not recognized".
So it's not actually related to the code.
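For reference, a minimal sketch of the kind of statement the instance role was missing; the bucket and path are taken from the question and may need adjusting:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::app.foo.com/Windows/*"
        }
    ]
}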