Custom message attributes on default GCS events - google-cloud-platform

I was wondering if it's possible to customise the pubsub messages that are triggered by GCS events. In particular, I'm interested in adding metadata in the message "attributes".
For example, upon the creation of a new object in GCS, the OBJECT_FINALIZE event (see https://cloud.google.com/functions/docs/calling/storage) is triggered.
I pull this message, e.g.
response = pubsub_v1.SubscriberClient().pull(request)
for msg in response.received_messages:
    message_data = json.loads(msg.message.data.decode('utf-8')).get("data")
    msg_attributes = message_data.get("attributes")
I want to be able to customise what goes into "attributes" prior to creating the object in GCS.

It is not possible to customize the Pub/Sub notifications from Cloud Storage. They are published by Cloud Storage and the schema and contents are controlled by the service and are specified in the notifications documentation.
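Although the attributes cannot be customised, the service-defined ones are available on every message. A minimal sketch of reading them, assuming the attribute names from the notifications documentation (`eventType`, `bucketId`, `objectId`, `payloadFormat`). The function name `summarize_gcs_notification` is hypothetical, and its two arguments stand in for `msg.message.attributes` and `msg.message.data` of a pulled message:

```python
import json

def summarize_gcs_notification(attributes, data):
    """Collect the service-defined fields of one Cloud Storage notification.

    `attributes` is a mapping of message attributes and `data` is the raw
    message payload as bytes (stand-ins for the fields of a pulled message).
    """
    summary = {
        "event_type": attributes.get("eventType"),
        "bucket": attributes.get("bucketId"),
        "object": attributes.get("objectId"),
    }
    # With the JSON_API_V1 payload format, the payload is the object resource as JSON.
    if attributes.get("payloadFormat") == "JSON_API_V1" and data:
        resource = json.loads(data.decode("utf-8"))
        # Assumption: custom object metadata set at upload time, if any,
        # appears under the "metadata" key of the object resource.
        summary["object_metadata"] = resource.get("metadata", {})
    return summary
```

Since the attributes themselves are fixed, setting custom metadata on the object before upload and reading it back from the payload may be a workable alternative; treat that as an assumption to verify against the notifications documentation.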

Related

Sending message to HTTP Google Cloud Function

I would like to send a message to an HTTP triggered Google Cloud Function. Specifically I want to tell the function when a file version has changed so that the function loads the new version of the file in memory.
I thought about updating an environment variable as a way of sending that message but it is not so straightforward to run an update-env-vars since this needs to be done in the context of the function's project.
I also thought of using a database, which sounds like too much for a single variable, or a simple text file in storage holding the current version, which sounds like too little. Any other ideas?
According to the conversation in the comments section, I believe the best way to achieve what you are looking for is a GCS notification that publishes to Pub/Sub:
gsutil notification create -t TOPIC_NAME -f json gs://BUCKET_NAME
Pub/Sub is notified based on event types, so the right choice depends on what you consider a new version of the file (a metadata change, or a new blob being created?).
Basically, you can pass the -e flag in the command above to indicate the event type:
OBJECT_FINALIZE Sent when a new object (or a new generation of an existing object) is successfully created in the bucket. This includes copying or rewriting an existing object. A failed upload does not trigger this event.
OBJECT_METADATA_UPDATE Sent when the metadata of an existing object changes.
That means any file upload or metadata change in GCS publishes to Pub/Sub, which in turn triggers your Cloud Function. Example function that handles the Pub/Sub message:
import base64

def hello_pubsub(event, context):
    # Log which message triggered this invocation.
    print("""This Function was triggered by messageId {} published at {} to {}
    """.format(context.event_id, context.timestamp, context.resource["name"]))
    # The Pub/Sub payload arrives base64-encoded under 'data'.
    if 'data' in event:
        name = base64.b64decode(event['data']).decode('utf-8')
    else:
        name = 'World'
    print('Hello {}!'.format(name))
Documents for reference:
https://cloud.google.com/storage/docs/pubsub-notifications
https://cloud.google.com/functions/docs/calling/pubsub#functions_calling_pubsub-python
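For the file-version use case, the handler could compare the notification's object generation with the one already loaded. A minimal sketch, assuming the notification attributes `eventType`, `objectId`, and `objectGeneration` are present on the event; `should_reload` and the `loaded_generations` cache are hypothetical names:

```python
def should_reload(event, loaded_generations):
    """Return True when the event announces a generation not yet loaded.

    `event` is the dict passed to a Pub/Sub-triggered function;
    `loaded_generations` maps object names to the generation already in
    memory (a hypothetical in-process cache).
    """
    attributes = event.get("attributes") or {}
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return False  # ignore metadata updates, deletes, and unknown event types
    name = attributes.get("objectId")
    generation = attributes.get("objectGeneration")
    if name is None or generation is None:
        return False
    if loaded_generations.get(name) == generation:
        return False  # duplicate delivery of a generation we already have
    loaded_generations[name] = generation
    return True
```

Calling this at the top of `hello_pubsub` and reloading the file only when it returns True would also make the function tolerant of duplicate deliveries.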

How to subscribe to changes in DynamoDB

I don't know how to subscribe to changes in a DynamoDB database. Let me show an example: User A sends a message (which is saved in the database) to User B, and the message automatically appears in User B's app.
I know this is possible with recently released AWS AppSync, but I couldn't integrate it with Ionic (which I am using). However, there must be an alternative since AWS AppSync was released only at the end of 2017/beginning of 2018.
I've also seen something called Streams in DynamoDB but not sure if that's what I need.
DynamoDB Streams is designed specifically for capturing/subscribing to table activity. You can set up a Lambda Function with your notification logic to process the stream and send notifications accordingly.
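A minimal sketch of such a Lambda function in Python, assuming the stream is configured to include new images and that each message item has string attributes named `recipient` and `text` (both hypothetical, as is the `new_messages` helper):

```python
def new_messages(event):
    """Extract newly inserted items from a DynamoDB Streams Lambda event."""
    messages = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # skip MODIFY and REMOVE records
        image = record.get("dynamodb", {}).get("NewImage", {})
        # Stream images use DynamoDB's typed JSON, e.g. {"S": "text"} for strings.
        messages.append({
            "recipient": image.get("recipient", {}).get("S"),
            "text": image.get("text", {}).get("S"),
        })
    return messages

def lambda_handler(event, context):
    for message in new_messages(event):
        # Placeholder: forward to the recipient via your push mechanism of choice.
        print("notify", message["recipient"], message["text"])
```

How the notification reaches User B's app (push notification, WebSocket, and so on) is left open here, since the question does not specify it.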

Google Cloud storage - delete processed files

Can you please help me here:
I'm batch processing files (JSON files) from Cloud Storage to write the data into BigQuery.
I have created a topic with a Cloud Function subscribed to it (to process the message and write the data into BQ).
I have created a 'DataFlow' job to notify the topic whenever a JSON file is created/stored in my source bucket.
The above flow processes the JSON file and inserts rows into the BQ table perfectly.
I want to delete the source json file from the Cloud Storage after the file is successfully processed. Any input on how this can be done?
You can use the Client Libraries and call the object's delete function at some point in your pipeline.
First install the Java Client Library, for example, and after the file is processed call the delete method as shown in this sample:
BlobId blobId = BlobId.of(bucketName, blobName);
boolean deleted = storage.delete(blobId);
if (deleted) {
    // the blob was deleted
} else {
    // the blob was not found
}
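The same idea in Python, as a minimal sketch: delete the object only after processing succeeds. It assumes `bucket` behaves like a `google.cloud.storage.Bucket`, i.e. `blob(name)` returns an object with `download_as_bytes()` and `delete()`; `process_then_delete` and the `process` callback are hypothetical names.

```python
def process_then_delete(bucket, blob_name, process):
    """Run `process` on the object's contents, then delete the object.

    If `process` raises, the exception propagates and the object is kept,
    so a failed load can be retried later.
    """
    blob = bucket.blob(blob_name)
    process(blob.download_as_bytes())
    # Reached only when processing did not raise.
    blob.delete()
```

In a Cloud Function, `process` would be whatever writes the rows into BigQuery, and `bucket` could come from a storage client created once per instance.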
UPDATE:
Another thing that comes to mind is to use Pub/Sub notifications in order to know when certain events occur in your storage bucket. The list of supported events is:
OBJECT_FINALIZE Sent when a new object (or a new generation of an existing object) is successfully created in the bucket ...
OBJECT_METADATA_UPDATE Sent when the metadata of an existing object changes ...
OBJECT_DELETE Sent when an object has been permanently deleted. This includes objects that are overwritten or are deleted as part of the bucket's lifecycle configuration ...
OBJECT_ARCHIVE Only sent when a bucket has enabled object versioning ...
Important: Additional event types may be released later. Client code should either safely ignore unrecognized event types, or else explicitly specify in their notification configuration which event types they are prepared to accept.
Hope this helps.

AWS S3 Event - Client Identification

I'm looking to allow multiple clients to upload files to an S3 bucket (or buckets). The S3 create event would trigger a notification that adds a message to an SNS topic. This works, but I'm having trouble deciding how to identify which client uploaded the file. I could get this to work by explicitly checking the uploaded file's subfolder/S3 name, but I'd much rather automatically add the client identifier as an attribute to the SNS message.
Is this possible? My other thought is using a Lambda function as a middle man to add the attribute and pass it along to the SNS Topic, but again I'd like to do it without the Lambda function if possible.
The Event Message Structure sent from S3 to SNS includes a field:
"userIdentity":{
"principalId":"Amazon-customer-ID-of-the-user-who-caused-the-event"
},
However, this also depends upon the credentials that were used when the object was uploaded:
If users have their individual AWS credentials, then the Access Key will be provided
If you are using a pre-signed URL to permit the upload, then the Access Key will belong to the one used in the pre-signed URL and your application (which generated the pre-signed URL) would be responsible for tracking the user who requested the upload
If you are generating temporary credentials for each client (e.g. by calling AssumeRole), then the Role's ID will be returned
(I didn't test all the above cases, so please do test them to confirm the definition of Amazon-customer-ID-of-the-user-who-caused-the-event.)
If your goal is to put your own client identifier in the message, then the best method would be:
Configure the event notification to trigger a Lambda function
Your Lambda function uses the above identifier to determine which user identifier within your application triggered the notification (presumably consulting a database of application user information)
The Lambda function sends the message to SNS or to whichever system you wish to receive the message (SNS might not be required if you send directly)
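The three steps above can be sketched in Python as a helper that turns an S3 event into the arguments for an SNS publish call. This is a minimal sketch under stated assumptions: `build_publish_args`, the `clientId` attribute name, and the `lookup_client_id` callable (standing in for your application's user lookup) are all hypothetical.

```python
import json
from urllib.parse import unquote_plus

def build_publish_args(s3_event, topic_arn, lookup_client_id):
    """Build keyword arguments for an SNS publish call from an S3 event.

    `lookup_client_id` is a hypothetical callable mapping the S3
    principalId to your application's own client identifier.
    """
    # This sketch handles only the first record of the event.
    record = s3_event["Records"][0]
    principal_id = record["userIdentity"]["principalId"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(record["s3"]["object"]["key"])
    return {
        "TopicArn": topic_arn,
        "Message": json.dumps({"bucket": record["s3"]["bucket"]["name"], "key": key}),
        "MessageAttributes": {
            "clientId": {
                "DataType": "String",
                "StringValue": lookup_client_id(principal_id),
            }
        },
    }
```

Inside the Lambda handler, the result could be passed on with `boto3.client("sns").publish(**args)`.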
You can add user-defined metadata to your files before you upload the file like below:
private final static String CLIENT_ID = "client-id";
ObjectMetadata meta = new ObjectMetadata();
meta.addUserMetadata(CLIENT_ID, "testid");
s3Client.putObject(<bucket>, <objectKey>, <inputstream of the file>, meta);
Then when downloading the S3 files:
ObjectMetadata meta = s3Client.getObjectMetadata(<bucket>, <objectKey>);
String clientId = meta.getUserMetaDataOf(CLIENT_ID);
Hope this is what you are looking for.

Is it possible to generate custom event on s3?

I tried to enable notifications on an S3 bucket, but I get a long JSON payload sent to my registered email. I want to filter on the notification's attributes, such as "object deleted" and "date-time" only. Is that possible?
If you want to either limit the fields returned, or filter the events that get generated, you are going to have to do that yourself.
The easiest way would probably be to have the S3 event notifications sent to a custom Lambda function (that you write) that filters and/or reformats the raw S3 event notification and then sends it on to your downstream consumer, e.g. via email if you want. There is nothing built into AWS to do the filtering/reformatting for you.
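The filtering step of such a Lambda function might look like the following minimal sketch. It assumes the standard S3 event record fields `eventName`, `eventTime`, and `s3.object.key`; the function name `condense_s3_event` and the `wanted_prefix` parameter are hypothetical.

```python
def condense_s3_event(s3_event, wanted_prefix="ObjectRemoved"):
    """Keep only event name, time, and object key for matching records."""
    condensed = []
    for record in s3_event.get("Records", []):
        event_name = record.get("eventName", "")
        if not event_name.startswith(wanted_prefix):
            continue  # drop event types the consumer did not ask for
        condensed.append({
            "event": event_name,
            "time": record.get("eventTime"),
            "key": record.get("s3", {}).get("object", {}).get("key"),
        })
    return condensed
```

The condensed list can then be formatted into a short message and forwarded to the email topic or any other consumer.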