I want to delete multiple GCS keys with boto. Its documentation suggests that there is a multi-object delete method (delete_keys), but I cannot get it to work.
According to this article it is possible for Amazon S3:
s3 = boto.connect_s3()
bucket = s3.get_bucket("basementcoders.logging")
result = bucket.delete_keys([key.name for key in bucket if key.name[-1] == '6'])
result.deleted
However, when I try the same thing for Google Cloud Storage it doesn't work:
bucket = BotoConnection().get_bucket(bucketName)
keys = [key for key in bucket]
print len(keys)
result = bucket.delete_keys(keys)
print result.deleted
print result.errors
Traceback (most recent call last):
File "gcsClient.py", line 166, in <module>
GcsClient.deleteMultipleObjects('debug_bucket')
File "gcsClient.py", line 155, in deleteMultipleObjects
result = bucket.delete_keys(keys)
File "/usr/local/lib/python2.7/dist-packages/boto/s3/bucket.py", line 583, in delete_keys
while delete_keys2(headers):
File "/usr/local/lib/python2.7/dist-packages/boto/s3/bucket.py", line 582, in delete_keys2
body)
boto.exception.GSResponseError: GSResponseError: 400 Bad Request
This uses S3's multi-object delete API, which Google Cloud Storage does not support, so it is not possible to delete keys this way on Google Cloud Storage; you will need to call delete_key() once per key.
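A minimal per-key sketch, reusing the BotoConnection helper and bucketName from the question:
# GCS via boto has no multi-object delete, so delete keys one at a time.
bucket = BotoConnection().get_bucket(bucketName)
for key in bucket:
    bucket.delete_key(key.name)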
I have a specific use case where I want to upload an object to S3 at a specific prefix. A file already exists at that prefix, and I want to replace it with this new one. I am using boto3 to do this. Bucket versioning is turned off, so I am expecting the file to be overwritten in this case. However, I get the following error.
{
    "errorMessage": "An error occurred (InvalidRequest) when calling the CopyObject operation: This copy request is illegal because it is trying to copy an object to itself without changing the object's metadata, storage class, website redirect location or encryption attributes.",
    "errorType": "ClientError",
    "stackTrace": [
        " File \"/var/task/lambda_function.py\", line 25, in lambda_handler\n s3.Object(bucket,product_key).copy_from(CopySource=bucket + '/' + product_key)\n",
        " File \"/var/runtime/boto3/resources/factory.py\", line 520, in do_action\n response = action(self, *args, **kwargs)\n",
        " File \"/var/runtime/boto3/resources/action.py\", line 83, in __call__\n response = getattr(parent.meta.client, operation_name)(*args, **params)\n",
        " File \"/var/runtime/botocore/client.py\", line 386, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
        " File \"/var/runtime/botocore/client.py\", line 705, in _make_api_call\n raise error_class(parsed_response, operation_name)\n"
    ]
}
This is what I have tried so far.
import boto3
import os
import tempfile

print('Loading function')

s3 = boto3.resource('s3')
glue = boto3.client('glue')

bucket = 'my-bucket'
bucket_prefix = 'my-prefix'

def lambda_handler(_event, _context):
    my_bucket = s3.Bucket(bucket)
    # Code to find the object name. There is going to be only one file.
    for object_summary in my_bucket.objects.filter(Prefix=bucket_prefix):
        product_key = object_summary.key
    print(product_key)
    # Using the product_key variable I am trying to copy the same file name to
    # the same location, which is when I get the error.
    s3.Object(bucket, product_key).copy_from(CopySource=bucket + '/' + product_key)
    # Maybe the following line is not required
    s3.Object(bucket, bucket_prefix).delete()
I have a very specific reason to copy the same file to the same location. AWS Glue doesn't pick up the same file once it has bookmarked it. By copying the file again I am hoping that the Glue bookmark will be dropped and the Glue job will consider this a new file.
I am not too attached to the name. If you can help me modify the above code to generate a new file at the same prefix level, that would work as well. There always has to be exactly one file here, though. Consider this file a static list of products that has been brought over from a relational DB into S3.
Thanks
From Tracking Processed Data Using Job Bookmarks - AWS Glue:
For Amazon S3 input sources, AWS Glue job bookmarks check the last modified time of the objects to verify which objects need to be reprocessed. If your input source data has been modified since your last job run, the files are reprocessed when you run the job again.
So, it seems your theory could work!
However, as the error message states, it is not permitted to copy an S3 object to itself "without changing the object's metadata, storage class, website redirect location or encryption attributes".
Therefore, you can add some metadata as part of the copy process and it will succeed. For example:
s3.Object(bucket,product_key).copy_from(CopySource=bucket + '/' + product_key, Metadata={'foo': 'bar'})
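If the copy still fails with the same InvalidRequest error, the CopyObject call may also need to be told explicitly to replace the metadata rather than copy it; a sketch under that assumption ('refreshed' is just an arbitrary example key):
# Sketch: ask S3 to REPLACE the metadata so the object can be copied onto itself.
s3.Object(bucket, product_key).copy_from(
    CopySource={'Bucket': bucket, 'Key': product_key},
    Metadata={'refreshed': 'true'},   # arbitrary example metadata
    MetadataDirective='REPLACE')      # apply the new metadata instead of copying the old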
I have a TensorFlow model deployed behind an AWS SageMaker endpoint.
From a Python Lambda function I am using the boto3 client to invoke the endpoint.
The TensorFlow model accepts 3 inputs, as follows:
{'input1': numpy array, 'input2': integer, 'input3': numpy array}
From Lambda I am using runtime.invoke_endpoint to invoke the SageMaker endpoint.
I get a Parse Error when the API is invoked from the boto3 client.
I tried serializing the data into CSV format before calling the API endpoint.
Below is the code written in the Lambda function:
import csv
import io
import boto3
import numpy as np

runtime = boto3.client('runtime.sagemaker')  # SageMaker runtime client

payload = {'input1': encoded_enc_inputstanza_in_batch,
           'input2': encoded_enc_inputstanza_in_batch.shape[1],
           'input3': np.reshape([[15] * 20], 20)}

# Write each (key, value) pair as a CSV row into an in-memory buffer
infer_file = io.StringIO()
writer = csv.writer(infer_file)
for key, value in payload.items():
    writer.writerow([key, value])

response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body=infer_file.getvalue())
Additional Details
The SageMaker model expects 3 fields as input:
- 'input1' - numpy array
- 'input2' - int data type
- 'input3' - numpy array
Actual result:
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 143, in lambda_handler
Body=infer_file.getvalue())
File "/var/runtime/botocore/client.py", line 320, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/runtime/botocore/client.py", line 623, in _make_api_call
raise error_class(parsed_response, operation_name)
END RequestId: fa70e1f3-763b-41be-ad2d-76ae80aefcd0
Expected result: successful invocation of the API endpoint.
For text/csv the value for the Body argument to invoke_endpoint should be a string with commas separating the values for each feature. For example, a record for a model with four features might look like: 1.5,16.0,14,23.0. You can try the following:
dp = ','.join(str(a) for a in payload)
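Since payload above is a dict whose values include numpy arrays, iterating over it directly would yield only the key names, so the features most likely need to be flattened into a single row of numbers first. A rough sketch under that assumption (whether the serving container accepts a single flattened CSV row depends on how the model's input function was written):
import numpy as np

# Hypothetical flattening of the three inputs into one CSV row; the names
# follow the question's payload and the layout is only illustrative.
features = (list(encoded_enc_inputstanza_in_batch.flatten())
            + [encoded_enc_inputstanza_in_batch.shape[1]]
            + list(np.reshape([[15] * 20], 20)))
body = ','.join(str(f) for f in features)

response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body=body)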
After converting the data to comma-delimited text/csv format, have you updated the endpoint to use the new model data? The input data needs to match the schema of the model.
Is there a comma in the "encoded_enc_inputstanza_in_batch" variable?
Hi, I am working on a project where I have a source image in an S3 bucket and I want to compare it with images on my local computer. I have already set up the AWS CLI. Here is the code. My image is in a bucket 'bx' with the name 's.jpg'. I want to read it, so I called the get_object method and used open() to read it, but it didn't work.
import boto3
import os

if __name__ == "__main__":
    path2 = '/home/vivek/Desktop/tar/'
    t = [j for j in os.listdir(path2)]
    buck = boto3.client('s3')
    obj = buck.get_object(Bucket='bx', Key='s.jpg')
    imageSource = open(obj['Body'], 'rb')
    for targetFile in t:
        client = boto3.client('rekognition')
        imageTarget = open(targetFile, 'rb')
        response = client.compare_faces(SimilarityThreshold=70,
                                        SourceImage={'Bytes': imageSource.read()},
                                        TargetImage={'Bytes': imageTarget.read()})
I get an error:
vivek#skywalker:~/Desktop/code$ python3 y.py
Traceback (most recent call last):
File "y.py", line 9, in <module>
imageSource=open(obj['Body'],'rb')
TypeError: expected str, bytes or os.PathLike object, not StreamingBody
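For what it's worth, the traceback points at the open() call: get_object returns a dict whose 'Body' entry is a botocore StreamingBody, not a file path, so the bytes can be read from it directly. A minimal sketch reusing the names from the snippet above:
# Sketch: read the image bytes straight from the StreamingBody instead of open().
obj = buck.get_object(Bucket='bx', Key='s.jpg')
source_bytes = obj['Body'].read()

response = client.compare_faces(SimilarityThreshold=70,
                                SourceImage={'Bytes': source_bytes},
                                TargetImage={'Bytes': imageTarget.read()})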
I am trying to pass boto3 a list of bucket names and have it first enable versioning on each bucket, then enable a lifecycle policy on each.
I have run aws configure and I do have two profiles, both current, active user profiles with all the necessary permissions. The one I want to use is named "default".
import boto3

# Create session
s3 = boto3.resource('s3')

# Bucket list
buckets = ['BUCKET-NAME']

# Iterate through list of buckets
for bucket in buckets:
    # Enable versioning
    bucketVersioning = s3.BucketVersioning('bucket')
    bucketVersioning.enable()

    # Current lifecycle configuration
    lifecycleConfig = s3.BucketLifecycle(bucket)
    lifecycleConfig.add_rule = {
        'Rules': [
            {
                'Status': 'Enabled',
                'NoncurrentVersionTransition': {
                    'NoncurrentDays': 7,
                    'StorageClass': 'GLACIER'
                },
                'NoncurrentVersionExpiration': {
                    'NoncurrentDays': 30
                }
            }
        ]
    }

    # Configure lifecycle
    bucket.configure_lifecycle(lifecycleConfig)

print "Versioning and lifecycle have been enabled for buckets."
When I run this I get the following error:
Traceback (most recent call last):
File "putVersioning.py", line 27, in <module>
bucketVersioning.enable()
File "/usr/local/lib/python2.7/dist-packages/boto3/resources/factory.py", line 520, in do_action
response = action(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/boto3/resources/action.py", line 83, in __call__
response = getattr(parent.meta.client, operation_name)(**params)
File "/home/user/.local/lib/python2.7/site-packages/botocore/client.py", line 253, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/user/.local/lib/python2.7/site-packages/botocore/client.py", line 557, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the PutBucketVersioning operation: Access Denied
My profiles have full privileges, so that shouldn't be a problem. Is there something else I need to do to pass credentials? Thanks, everyone!
To set the versioning state, you must be the bucket owner.
The above statement means that to use the PutBucketVersioning operation to enable versioning, you must be the owner of the bucket.
Use the command below to check the owner of the bucket. If you are the owner, you should be able to set the versioning state to ENABLED / SUSPENDED.
aws s3api get-bucket-acl --bucket yourBucketName
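Since the rest of the question uses boto3, the same ownership check can also be done from Python; a minimal sketch:
# Sketch: boto3 equivalent of the CLI check above.
import boto3

s3_client = boto3.client('s3')
acl = s3_client.get_bucket_acl(Bucket='yourBucketName')
print(acl['Owner'])  # DisplayName / ID of the bucket owner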
Ok, notionquest is correct; however, it appears I also goofed up in my code by quoting a variable:
bucketVersioning = s3.BucketVersioning('bucket')
should be
bucketVersioning = s3.BucketVersioning(bucket)
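Putting the two answers together, a rough sketch of the whole flow. Note that assigning to lifecycleConfig.add_rule in the original code only sets a plain Python attribute and never calls S3, so this sketch applies the lifecycle rules with the client-level put_bucket_lifecycle_configuration call instead (the rule ID and empty prefix filter are illustrative choices; the rules mirror the ones in the question):
import boto3

s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

buckets = ['BUCKET-NAME']

lifecycle_rules = {
    'Rules': [
        {
            'ID': 'noncurrent-to-glacier',  # illustrative rule name
            'Filter': {'Prefix': ''},       # apply to the whole bucket
            'Status': 'Enabled',
            'NoncurrentVersionTransitions': [
                {'NoncurrentDays': 7, 'StorageClass': 'GLACIER'}
            ],
            'NoncurrentVersionExpiration': {'NoncurrentDays': 30}
        }
    ]
}

for bucket in buckets:
    # Enable versioning (pass the variable, not the literal string 'bucket')
    s3.BucketVersioning(bucket).enable()
    # Apply the lifecycle rules
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=lifecycle_rules)

print("Versioning and lifecycle have been enabled for buckets.")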
I have been trying to get the Service Account authentication working for the Google Admin SDK for a few days now to no avail. I am using the google-api-python-client-1.2 library freshly installed from Google.
I have been following Google's documentation on the topic. Links are here:
https://developers.google.com/accounts/docs/OAuth2ServiceAccount
https://developers.google.com/api-client-library/python/guide/aaa_oauth
http://google-api-python-client.googlecode.com/hg/docs/epy/oauth2client.client.SignedJwtAssertionCredentials-class.html
And have the tasks.py Service Account example working which you can be found here:
http://code.google.com/p/google-api-python-client/source/browse/samples/service_account/tasks.py?r=c21573904a2df1334d13b4380f63463c94c8d0e8
And have been closely studying these two Stack Overflow threads on a related topic here:
google admin sdk directory api 403 python
Google Admin API using Oauth2 for a Service Account (Education Edition) - 403 Error
And have studied the relevant code in gam.py (Dito GAM).
Yet I'm still missing something as I am getting an 'oauth2client.client.AccessTokenRefreshError: access_denied' exception in nearly every test case I write.
Here is a concise example of a test authentication:
import httplib2
from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials
f = file('myKey.p12', 'rb')
key = f.read()
f.close()
credentials = SignedJwtAssertionCredentials(
    'myServiceAdmin@developer.gserviceaccount.com',
    key,
    sub='myAdminUser@my.googleDomain.edu',
    scope=['https://www.googleapis.com/auth/admin.directory.user'])
http = httplib2.Http()
http = credentials.authorize(http)
service = build('admin', 'directory_v1', http=http)
When I run the above code I get this stack dump and exception:
Traceback (most recent call last):
File "./test.py", line 17, in <module>
service = build('admin', 'directory_v1', http=http)
File "/usr/lib/python2.7/dist-packages/oauth2client/util.py", line 132, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/apiclient/discovery.py", line 192, in build resp, content = http.request(requested_url)
File "/usr/lib/python2.7/dist-packages/oauth2client/util.py", line 132, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/oauth2client/client.py", line 475, in new_request
self._refresh(request_orig)
File "/usr/lib/python2.7/dist-packages/oauth2client/client.py", line 653, in _refresh
self._do_refresh_request(http_request)
File "/usr/lib/python2.7/dist-packages/oauth2client/client.py", line 710, in _do_refresh_request
raise AccessTokenRefreshError(error_msg)
oauth2client.client.AccessTokenRefreshError: access_denied
I've tried multiple super user accounts, service accounts, and keys and always end up with the same exception. If I add sub to the tasks.py example I end up with the same error. Replacing sub with prn also generates this exception and adding private_key_password='notasecret' does nothing (it is the default). The Admin SDK is activated in the Google Developers Console and the target accounts have super user privileges. This makes me think something is missing on the Google domain side but I cannot think of anything else to check.
Any one have an idea what I am doing wrong?
Have you granted the third party client access in your Admin Console for your service account?
My go-to instructions for setting up a Service Account are the ones Google provides for the Drive API.
https://developers.google.com/drive/web/delegation
Take a look at the "Delegate domain-wide authority to your service account" part and see if you have completed those steps.
Maybe not OP's problem, but I had this same error and my issue was that I was setting the sub field in the credentials object
credentials = SignedJwtAssertionCredentials(SERVICE_ACCOUNT_EMAIL, key,
scope=SCOPES, sub=**<DON'T SET ME>**)
If you're using domain-wide delegation, you need to not set a sub (because your "user" is the domain administrator). The docs are a bit confusing on this point. I just removed the sub field and it worked for me.
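For reference, a minimal sketch of the construction that worked in that case, i.e. the same call with sub omitted:
# Sketch: same credentials construction as above, with the `sub` argument removed.
credentials = SignedJwtAssertionCredentials(SERVICE_ACCOUNT_EMAIL, key,
                                            scope=SCOPES)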