I would like to send a message to an HTTP triggered Google Cloud Function. Specifically I want to tell the function when a file version has changed so that the function loads the new version of the file in memory.
I thought about updating an environment variable as a way of sending that message, but it is not so straightforward to run an update-env-vars since this needs to be done in the context of the function's project.
I also thought of using a database, which sounds like too much for a single variable, or a simple text file in storage holding the current version, which sounds like too little. Any other ideas?
According to the conversation in the comments section, I believe the best way to achieve what you are looking for is a GCS notification triggering Pub/Sub.
gsutil notification create -t TOPIC_NAME -f json gs://BUCKET_NAME
Pub/Sub gets notified based on event types, and which events matter will depend on what you consider a new version of the file (a metadata change? a new blob being created?).
Basically, you can pass the -e flag in the command above to indicate the event type (an example follows the list):
OBJECT_FINALIZE: Sent when a new object (or a new generation of an existing object) is successfully created in the bucket. This includes copying or rewriting an existing object. A failed upload does not trigger this event.
OBJECT_METADATA_UPDATE: Sent when the metadata of an existing object changes.
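For example, if a new version of the file means a new blob (or generation) being written, the notification can be restricted to the finalize event, roughly like this (TOPIC_NAME and BUCKET_NAME are placeholders):
gsutil notification create -t TOPIC_NAME -f json -e OBJECT_FINALIZE gs://BUCKET_NAME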
That means any file upload or metadata change in GCS will trigger Pub/Sub, which in turn triggers your Cloud Function. Here is an example function triggered by a Pub/Sub message:
import base64

def hello_pubsub(event, context):
    print("""This Function was triggered by messageId {} published at {} to {}
    """.format(context.event_id, context.timestamp, context.resource["name"]))
    if 'data' in event:
        name = base64.b64decode(event['data']).decode('utf-8')
    else:
        name = 'World'
    print('Hello {}!'.format(name))
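To wire the function to the topic, a deploy command along these lines should work (the runtime version and names are illustrative, not taken from the question):
gcloud functions deploy hello_pubsub --runtime python39 --trigger-topic TOPIC_NAME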
Documents for reference:
https://cloud.google.com/storage/docs/pubsub-notifications
https://cloud.google.com/functions/docs/calling/pubsub#functions_calling_pubsub-python
Related
I was wondering if it's possible to customise the pubsub messages that are triggered by GCS events. In particular, I'm interested in adding metadata in the message "attributes".
For example, upon the creation of a new object in GCS, the OBJECT_FINALIZE event (see https://cloud.google.com/functions/docs/calling/storage) is triggered.
I pull this message, e.g.
response = pubsub_v1.SubscriberClient().pull(request=request)
for msg in response.received_messages:
    message_data = json.loads(msg.message.data.decode('utf-8'))
    msg_attributes = msg.message.attributes
I want to be able to customise what goes into "attributes" prior to creating the object in GCS.
It is not possible to customize the Pub/Sub notifications from Cloud Storage. They are published by Cloud Storage and the schema and contents are controlled by the service and are specified in the notifications documentation.
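For reference, the attributes that do arrive on each message are the fixed, service-defined ones (for example eventType, bucketId, objectId, objectGeneration and eventTime). A minimal pull sketch, assuming placeholder project and subscription names:
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")  # placeholders

response = subscriber.pull(request={"subscription": subscription_path, "max_messages": 10})
for msg in response.received_messages:
    attrs = msg.message.attributes  # set by Cloud Storage, not by you
    print(attrs["eventType"], attrs["bucketId"], attrs["objectId"])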
In my bash script I want to (re)create a GCS notification with:
gsutil notification create -f json -t <topic> -p <prefix> gs://<bucket>
If I call this line again, it will create one more (identical) notification.
In order to delete the notification I need:
gsutil notification delete projects/_/buckets/<bucket>/notificationConfigs/<config-id>
config-id is the identifier returned when the notification is created. Also, it can be retrieved with:
gsutil notification list gs://<bucket>
The output of the list call is similar to:
projects/_/buckets/<bucket>/notificationConfigs/<config-id>
        Cloud Pub/Sub topic: projects/<project>/topics/<topic>
        Filters:
                Object name prefix: '<prefix>'
This config-id does not look like something that is easy to parse in a shell script.
Is there a normal way to manage notifications? Or can I create notifications without duplicates (so that the second create call will not create a new notification, but update the existing one)?
If you use the CLI, that is the normal way. If you use the debug flag (-D), you get an id field, but I'm not sure the response is easier to parse:
gsutil -D notification list gs://<bucket>
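If you want to stay in plain shell, a rough sketch (untested; the bucket, topic and prefix values are placeholders) is to delete whatever configs the list command reports before creating the new one, so repeated runs don't accumulate duplicates:
BUCKET=my-bucket      # placeholder
TOPIC=my-topic        # placeholder
PREFIX=my-prefix      # placeholder

# The first line of each config in the list output is its full name.
for CONFIG in $(gsutil notification list gs://"$BUCKET" | grep '^projects/_/buckets/'); do
  gsutil notification delete "$CONFIG"
done

gsutil notification create -f json -t "$TOPIC" -p "$PREFIX" gs://"$BUCKET"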
You can also use the Google Cloud Storage REST API for notifications.
The list endpoint returns a notification description that includes the notification ID, which is easy to extract this time.
Finally, you can use the client library (here in Python, for example), which has handy methods, such as exists(), to make sure you don't create the same notification twice.
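A minimal sketch with the Python client (bucket, topic and prefix names are placeholders), creating the notification only if an equivalent one is not already there:
from google.cloud import storage
from google.cloud.storage.notification import JSON_API_V1_PAYLOAD_FORMAT

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder

# Look for an existing config pointing at the same topic and prefix.
existing = [
    n for n in bucket.list_notifications()
    if n.topic_name == "my-topic" and n.blob_name_prefix == "my-prefix"
]

if not existing:
    notification = bucket.notification(
        topic_name="my-topic",            # placeholder
        blob_name_prefix="my-prefix",     # placeholder
        payload_format=JSON_API_V1_PAYLOAD_FORMAT,
    )
    notification.create()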
As the title says, I need to fetch the size of the video / object I just uploaded to the bucket.
Every few seconds, an object is uploaded to my bucket which is of the form, {id}/video1.mp4.
I want to make use of Google Cloud Storage triggers, which would alert me if a 0 byte video was added. Can someone please suggest how to access the size of the added object?
Farhan,
Assuming you know the basics of Cloud Functions, you can create a Cloud Function trigger that runs a script every time you create/finalize an object in a selected bucket.
The link you posted contains the tutorial and the following Python script:
def hello_gcs_generic(data, context):
    """Background Cloud Function to be triggered by Cloud Storage.
    This generic function logs relevant data when a file is changed.

    Args:
        data (dict): The Cloud Functions event payload.
        context (google.cloud.functions.Context): Metadata of triggering event.
    Returns:
        None; the output is written to Stackdriver Logging
    """
    print('Event ID: {}'.format(context.event_id))
    print('Event type: {}'.format(context.event_type))
    print('Bucket: {}'.format(data['bucket']))
    print('File: {}'.format(data['name']))
    print('Metageneration: {}'.format(data['metageneration']))
    print('Created: {}'.format(data['timeCreated']))
    print('Updated: {}'.format(data['updated']))
In this example, we see that data has multiple items such as name, timeCreated, etc.
What this example doesn't show, however, is that data has another item: the size, available as data['size'].
So now we have a Cloud Function that gets the file name and file size of whatever is uploaded, at the moment it is uploaded. All we have to do now is add an if statement that does "something" when the file size is 0. Note that data['size'] comes in as a string, so it should be converted to a number before comparing. In Python it will look something like this (this is the gist of it):
def hello_gcs_generic(data, context):
    """Background Cloud Function to be triggered by Cloud Storage.
    This generic function logs relevant data when a file is changed.

    Args:
        data (dict): The Cloud Functions event payload.
        context (google.cloud.functions.Context): Metadata of triggering event.
    Returns:
        None; the output is written to Stackdriver Logging
    """
    print('File: {}'.format(data['name']))
    print('Size: {}'.format(data['size']))

    # data['size'] arrives as a string, so convert it before comparing.
    size = int(data['size'])
    if size == 0:
        print("its 0!")
    else:
        print("its not 0!")
Hope this helps!
Can you please help me here:
I'm batch processing files (json files) from Cloud Storage to write the data into BigQuery.
I have a topic created, with a Cloud Function (which processes the message and writes the data into BQ) subscribed to that topic.
I have created a 'DataFlow' job to notify the topic of any json files created/stored in my source bucket.
The above flow processes the json file and inserts rows in to BQ table perfectly.
I want to delete the source json file from the Cloud Storage after the file is successfully processed. Any input on how this can be done?
You can use the Client Libraries and then make a call to the objects delete function at some point in your pipeline.
First install the Java Client Library, for example, and after the file is processed make a call to the delete method as shown in this sample:
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

Storage storage = StorageOptions.getDefaultInstance().getService();
BlobId blobId = BlobId.of(bucketName, blobName);
boolean deleted = storage.delete(blobId);
if (deleted) {
    // the blob was deleted
} else {
    // the blob was not found
}
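If the deletion will happen inside a Python component instead (for example the Cloud Function that writes to BQ), a rough equivalent with the google-cloud-storage client, using placeholder names, would be:
from google.cloud import storage

def delete_source_file(bucket_name, blob_name):
    # Deletes the processed object; raises NotFound if it does not exist.
    client = storage.Client()
    client.bucket(bucket_name).blob(blob_name).delete()

delete_source_file("my-source-bucket", "path/to/file.json")  # placeholders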
UPDATE:
Another thing that comes to mind is to use Pub/Sub notifications in order to know when certain events occur in your storage bucket. But the supported event types only describe changes to the objects themselves; there is no event for when a file has been successfully processed downstream:
OBJECT_FINALIZE: Sent when a new object (or a new generation of an existing object) is successfully created in the bucket ...
OBJECT_METADATA_UPDATE: Sent when the metadata of an existing object changes ...
OBJECT_DELETE: Sent when an object has been permanently deleted. This includes objects that are overwritten or are deleted as part of the bucket's lifecycle configuration ...
OBJECT_ARCHIVE: Only sent when a bucket has enabled object versioning ...
Important: Additional event types may be released later. Client code should either safely ignore unrecognized event types, or else explicitly specify in their notification configuration which event types they are prepared to accept.
Hope this helps.
I'm looking to allow multiple clients to upload files to an S3 bucket (or buckets). The S3 create event would trigger a notification that would add a message to an SNS topic. This works, but I'm having issues deciding how to identify which client uploaded the file. I could get this to work by explicitly checking the uploaded file's subfolder/S3 name, but I'd much rather automatically add the client identifier as an attribute to the SNS message.
Is this possible? My other thought is using a Lambda function as a middle man to add the attribute and pass it along to the SNS Topic, but again I'd like to do it without the Lambda function if possible.
The Event Message Structure sent from S3 to SNS includes a field:
"userIdentity":{
"principalId":"Amazon-customer-ID-of-the-user-who-caused-the-event"
},
However, this also depends upon the credentials that were used when the object was uploaded:
If users have their individual AWS credentials, then the Access Key will be provided
If you are using a pre-signed URL to permit the upload, then the Access Key will belong to the one used in the pre-signed URL and your application (which generated the pre-signed URL) would be responsible for tracking the user who requested the upload
If you are generating temporary credentials for each client (e.g. by calling AssumeRole), then the Role's ID will be returned
(I didn't test all the above cases, so please do test them to confirm the definition of Amazon-customer-ID-of-the-user-who-caused-the-event.)
If your goal is to put your own client identifier in the message, then the best method would be:
Configure the event notification to trigger a Lambda function
Your Lambda function uses the above identifier to determine which user identifier within your application triggered the notification (presumably consulting a database of application user information)
The Lambda function sends the message to SNS or to whichever system you wish to receive the message (SNS might not be required if you send directly), as sketched below.
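A rough sketch of such a Lambda function in Python (the topic ARN and the lookup of your application's client identifier are placeholders):
import json
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:my-topic"  # placeholder

def lookup_client_id(principal_id):
    # Hypothetical lookup; replace with your application's user store.
    return "client-" + principal_id

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        principal_id = record["userIdentity"]["principalId"]

        client_id = lookup_client_id(principal_id)

        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=json.dumps({"bucket": bucket, "key": key}),
            MessageAttributes={
                "client_id": {"DataType": "String", "StringValue": client_id}
            },
        )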
You can add user-defined metadata to your files before you upload them, like below:
private final static String CLIENT_ID = "client-id";
ObjectMetadata meta = new ObjectMetadata();
meta.addUserMetadata(CLIENT_ID, "testid");
s3Client.putObject(<bucket>, <objectKey>, <inputstream of the file>, meta);
Then when downloading the S3 files:
ObjectMetadata meta = s3Client.getObjectMetadata(<bucket>, <objectKey>);
String clientId = meta.getUserMetaDataOf(CLIENT_ID);
Hope this is what you are looking for.