AWS boto3: how to get the metadata from the key?

I am trying to get the metadata value of a file uploaded to an S3 bucket.
# I have to specifically use boto3.resource('s3') for other API calls in the project.
I have the below data available under the metadata field:
# metadata
Key=Content-Type
Value=application/json
Below is the code:
bucket= 'mybucket'
key='L1/input/file.json'
s3_resource = boto3.resource('s3')
object = s3_resource.Object(bucket,key)
metadata = object.metadata
But I am getting the below error:
[ERROR] ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
Can anyone help me with this?

Be careful of your syntax. This line:
s3_client=boto3.resource('s3')
is returning a resource, not a client.
Therefore, this line is failing:
obj = s3_client.head_object(bucket,key)
because head_object() is not an operation that can be performed on a resource.
Instead, use:
s3_resource = boto3.resource('s3')
object = s3_resource.Object('bucket_name','key')
metadata = object.metadata
It will return a dictionary of the metadata.
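For reference, here is a minimal sketch of reading the metadata both ways, reusing the bucket and key from the question and assuming the credentials in use are allowed to call HeadObject on that key (the 403 Forbidden usually means they are not):
import boto3

bucket = 'mybucket'
key = 'L1/input/file.json'

# Resource API: Object.metadata issues a HeadObject call behind the scenes
s3_resource = boto3.resource('s3')
obj = s3_resource.Object(bucket, key)
print(obj.metadata)        # user-defined metadata as a dict
print(obj.content_type)    # system metadata such as Content-Type is exposed separately

# Client API equivalent
s3_client = boto3.client('s3')
response = s3_client.head_object(Bucket=bucket, Key=key)
print(response['Metadata'])
print(response['ContentType'])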

Related

Trying to use AWS' SelectObjectContent but getting error code: NotImplemented

I am running the following code to get the number of records in a parquet file placed inside an S3 bucket.
import boto3
import os

s3 = boto3.client('s3')
sql_stmt = """SELECT count(*) FROM s3object s"""
req_fact = s3.select_object_content(
    Bucket='test_hadoop',
    Key='counter_db.cm_workload_volume_sec.dt=2023-01-23.cm_workload_volume_sec+2+000000347262.parquet',
    ExpressionType='SQL',
    Expression=sql_stmt,
    InputSerialization={'Parquet': {}},
    OutputSerialization={'JSON': {}})

for event in req_fact['Payload']:
    if 'Records' in event:
        print(event['Records']['Payload'].decode('utf-8'))
    elif 'Stats' in event:
        print(event['Stats'])
However I get this error: botocore.exceptions.ClientError: An error occurred (XNotImplemented) when calling the SelectObjectContent operation: This node does not support SelectObjectContent.
What is the issue?
I ran your code against a known good (uncompressed) parquet file with no errors.
resp = s3.select_object_content(
    Bucket='my-test-bucket',
    Key='counter_db.cm_workload_volume_sec.dt=2023-01-23.cm_workload_volume_sec+2+000000347262.parquet',
    ExpressionType='SQL',
    Expression="SELECT count(*) FROM s3object s",
    InputSerialization={'Parquet': {}},
    OutputSerialization={'JSON': {}},
)
Output:
{"_1":4}
In the AWS console you can navigate to the S3 bucket, find the file, highlight it and choose to run S3 Select on it (Actions > Query with S3 Select). That will let you validate that the file can be queried with S3 Select, which I think is your issue here.
Note the following: Amazon S3 Select does not support whole-object compression for Apache Parquet objects.
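If you prefer to check this programmatically rather than through the console, here is a minimal sketch (reusing the bucket and key from the question) that reads the first four bytes of the object: a plain Parquet file starts with the magic bytes PAR1, while a whole-object gzip file starts with 0x1f 0x8b, which S3 Select cannot query as Parquet.
import boto3

s3 = boto3.client('s3')
resp = s3.get_object(
    Bucket='test_hadoop',
    Key='counter_db.cm_workload_volume_sec.dt=2023-01-23.cm_workload_volume_sec+2+000000347262.parquet',
    Range='bytes=0-3',   # fetch only the first four bytes
)
header = resp['Body'].read()

if header == b'PAR1':
    print('Looks like an uncompressed Parquet file')
elif header[:2] == b'\x1f\x8b':
    print('Whole-object gzip compression detected; S3 Select does not support this for Parquet')
else:
    print('Unrecognised header:', header)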

failed to download files from AWS S3

Scenario:
commit an Athena query with boto3 and output the result to S3
download the result from S3
Error: An error occurred (404) when calling the HeadObject operation: Not Found
It's weird: the file exists in S3 and I can copy it down with the aws s3 cp command, but I just cannot download it with boto3, and its head-object call fails.
aws s3api head-object --bucket dsp-smaato-sink-prod --key /athena_query_results/c96bdc09-d545-4ee3-bc66-be3be928e3f2.csv
That command does work. I've checked the account policies and it has been granted the admin policy.
# snippets
import os
import boto3
from urllib.parse import urlparse
# `constant` and `logger` are project-specific helpers defined elsewhere

def s3_donwload(url, target=None):
    # s3 = boto3.resource('s3')
    # client = s3.meta.client
    client = boto3.client("s3", region_name=constant.AWS_REGION,
                          endpoint_url='https://s3.ap-southeast-1.amazonaws.com')
    s3_file = urlparse(url)
    if target:
        target = os.path.abspath(target)
    else:
        target = os.path.abspath(os.path.basename(s3_file.path))
    logger.info(f"download {url} to {target}...")
    client.download_file(s3_file.netloc, s3_file.path, target)
    logger.info(f"download {url} to {target} done!")
Take a look at the value of s3_file.path: does it start with a slash? If so, it needs to change, because Amazon S3 keys do not start with a slash.
I suggest that you print the contents of netloc, path and target to see what values are actually being passed.
It's a bit strange to use os.path with an S3 URL, so it might need some tweaking.
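As a minimal sketch of that fix (the function name and example URL are illustrative), strip the leading slash from the parsed path before handing it to download_file:
import os
import boto3
from urllib.parse import urlparse

def s3_download(url, target=None):
    client = boto3.client("s3")
    parsed = urlparse(url)            # e.g. s3://my-bucket/athena_query_results/result.csv
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")     # S3 keys must not start with a slash
    target = os.path.abspath(target or os.path.basename(key))
    client.download_file(bucket, key, target)
    return target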

boto3 getting error when trying to list buckets

I'm using
>>> s3 = session.client(service_name='s3',
... aws_access_key_id='access_key_id_goes_here',
... aws_secret_access_key='secret_key_goes_here',
... endpoint_url='endpoint_url_goes_here')
>>> s3.list_buckets()
to list my existing buckets, but got the error botocore.exceptions.ClientError: An error occurred () when calling the ListBuckets operation. Not sure how to proceed from that.
Are you using boto3?
Here is some sample code. There are two ways to use boto3:
The 'client' method that maps to AWS API calls, or
The 'resource' method that is more Pythonic
boto3 will automatically retrieve your user credentials from a configuration file, so there is no need to put credentials in the code. You can create the configuration file with the AWS CLI aws configure command.
import boto3

# Using the 'client' method
s3_client = boto3.client('s3')
response = s3_client.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])

# Or, using the 'resource' method
s3_resource = boto3.resource('s3')
for bucket in s3_resource.buckets.all():
    print(bucket.name)
If you are using an S3-compatible service, you can add an endpoint_url parameter to the client() and resource() calls.
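For example, a minimal sketch with a placeholder endpoint URL (substitute your provider's actual endpoint):
import boto3

endpoint = 'https://s3.example-provider.com'   # placeholder endpoint of an S3-compatible service

s3_client = boto3.client('s3', endpoint_url=endpoint)
s3_resource = boto3.resource('s3', endpoint_url=endpoint)

for bucket in s3_client.list_buckets()['Buckets']:
    print(bucket['Name'])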

Getting 'ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden' while doing cross account copy of file in using boto3

Copying a file from an S3 bucket in one AWS account to an S3 bucket in another account.
The required roles/policies for this task were created by the IAM team, which is out of my scope.
This Lambda is going to run in the destination account, and it has to copy the object from the source bucket.
While running the Lambda, I get the below error:
ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
I was wondering if there is something to be fixed in my code, or if it is just a permission issue?
Here is my Lambda.
import boto3

ID_list = ['123456789101']

def lambda_handler(event, context):
    for entry in ID_list:
        sts_client = boto3.client('sts')
        assumed_role_object = sts_client.assume_role(
            RoleArn="arn:aws:iam::" + entry[0] + ":role/requiredrole",
            RoleSessionName="SampleSession"
        )
        credentials = assumed_role_object['Credentials']
        s3_resource = boto3.resource(
            's3',
            aws_access_key_id=credentials['AccessKeyId'],
            aws_secret_access_key=credentials['SecretAccessKey'],
            aws_session_token=credentials['SessionToken'],
        )

        ## performing copy from one bucket to the other
        s3 = boto3.resource('s3')
        source = {'Bucket': 'my-bucket' + entry[0], 'Key': 'test.csv'}  ## source bucket, file details
        dest_bucket = s3.Bucket('dest-bucket')  # bucket in destination account
        dest_bucket.copy(source, 'test1.csv')
I think it's a permission issue: you can't access either the source bucket or the destination bucket.
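To narrow down which credentials are being denied, here is a minimal sketch (the bucket and key names are the placeholders from the question, and credentials is the dict returned by assume_role above) that issues the same HeadObject call on the source object, first with the Lambda's own role and then with the assumed role:
import boto3
from botocore.exceptions import ClientError

def check_read(s3_client, label, bucket, key):
    # HeadObject is the first call copy() makes, so a 403 here reproduces the error
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        print(f"{label}: OK, can read s3://{bucket}/{key}")
    except ClientError as e:
        print(f"{label}: FAILED for s3://{bucket}/{key}: {e}")

source_bucket = 'my-bucket123456789101'   # placeholder source bucket name
source_key = 'test.csv'

# 1) the Lambda's own role, which the plain boto3.resource('s3') in the copy step uses
check_read(boto3.client('s3'), 'lambda role', source_bucket, source_key)

# 2) the assumed role, using the credentials returned by sts_client.assume_role(...)
assumed = boto3.client(
    's3',
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken'],
)
check_read(assumed, 'assumed role', source_bucket, source_key)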

What am I doing wrong in allowing a user to download a file from AWS with Django and boto3?

Hello Awesome People!
I will try to be clear with my question.
All of my media files are uploaded to AWS. I created a view that allows each user to download images. Before, I did it without boto, which ended with
"this back-end doesn't support absolute path."
And now, after some research, I'm using the s3 boto3 connection.
Model
class Photo(models.Model):
    file = models.ImageField(upload_to="uploaded_documents/")
    total_download = models.PositiveIntegerField()
View
def download_file(request, id):
    photo = get_object_or_404(Photo, id=id)
    photo.total_download += 1
    photo.save()
    path = os.path.basename(photo.file.name)
    # path = '/media/public/uploaded_documents/museum.jpg'
    client = boto3.client('s3', aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                          aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
    resource = boto3.resource('s3')
    bucket = resource.Bucket(settings.AWS_STORAGE_BUCKET_NAME)
    bucket.download_file(path, 'my_local_image.jpg')
Here I don't know what to do to trigger it. When I run it, I get the following error:
NoCredentialsError at /api/download-file/75
Exception Type: NoCredentialsError
Exception Value: Unable to locate credentials
UPDATE
I used the credentials in the resource instead of the client:
client = boto3.client('s3')
resource = boto3.resource('s3', aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                          aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
and it seems to be authenticated. But now I get this error:
Exception Type: ClientError
Exception Value: An error occurred (404) when calling the HeadObject operation: Not Found
Please try looking at the answers in:
Boto3 Error: botocore.exceptions.NoCredentialsError: Unable to locate credentials
Also check that the file path is correct in S3.
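On the file path point: in the view above, os.path.basename strips the uploaded_documents/ prefix, so the key that HeadObject looks up probably does not exist in the bucket. Here is a minimal sketch of the view using the full stored name as the key; it assumes the object lives in the bucket under exactly photo.file.name (if your storage backend adds a location prefix such as media/public/, as the commented path suggests, that prefix has to be part of the key as well):
import boto3
from django.conf import settings
from django.shortcuts import get_object_or_404

def download_file(request, id):
    photo = get_object_or_404(Photo, id=id)
    photo.total_download += 1
    photo.save()

    key = photo.file.name  # e.g. 'uploaded_documents/museum.jpg', the full key in the bucket
    resource = boto3.resource(
        's3',
        aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
        aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
    )
    bucket = resource.Bucket(settings.AWS_STORAGE_BUCKET_NAME)
    bucket.download_file(key, 'my_local_image.jpg')  # saves to the web server's local disk
    # then return an HTTP response to the user as before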