In my bash script I want to create a GCS notification with:
gsutil notification create -f json -t <topic> -p <prefix> gs://<bucket>
If I call this line again, it creates one more (identical) notification.
In order to delete the notification I need:
gsutil notification delete projects/_/buckets/<bucket>/notificationConfigs/<config-id>
The config-id is the identifier returned when the notification is created. It can also be retrieved with:
gsutil notification list gs://<bucket>
The output of the list call is similar to:
projects/_/buckets/<bucket>/notificationConfigs/<config-id>
	Cloud Pub/Sub topic: projects/<project>/topics/<topic>
	Filters:
		Object name prefix: '<prefix>'
This config-id does not look like something that is easy to parse in a shell script.
Is there a normal way to manage notifications? Or can I create notifications without duplicates (so that a second create call does not create a new notification, but updates the existing one)?
If you use the CLI, that is the normal way. If you use the debug flag, there is an id field in the output, but I'm not sure the response is easier to parse:
gsutil -D notification list gs://<bucket>
You can also use the Cloud Storage REST API for notifications.
The list endpoint returns a notification description containing the ID of the notification, which is easy to extract this time.
Finally, you can use the client library (in Python, for example), which has handy methods, such as exists, to make sure you don't create the same notification twice.
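A minimal sketch of that with the google-cloud-storage client (bucket, topic, and prefix names are placeholders; the duplicate check simply compares the topic of each existing config):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")   # placeholder bucket name
topic = "my-topic"                    # placeholder topic name

# Only create a new notification config if none of the existing ones points at this topic.
if not any(n.topic_name == topic for n in bucket.list_notifications()):
    notification = bucket.notification(
        topic_name=topic,
        payload_format="JSON_API_V1",
        blob_name_prefix="my-prefix",  # placeholder prefix
    )
    notification.create()

Because the check runs before create(), re-running the script doesn't add a duplicate config.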
How can I get the last 30 minutes of AWS CloudWatch logs that were written to a specific log stream using the AWS CLI?
Can you describe what you already tried yourself and what you ran into? Looking at the AWS CLI command reference, it seems that you should be able to run "aws logs get-log-events --log-group-name <name of the group> --log-stream-name <name of the stream> --start-time <timestamp>" to get a list of events starting at a given UNIX timestamp (in milliseconds); calculating that timestamp should be fairly trivial.
In addition, based on your comment: you'll need to look into the AWS concept of pagination. Most AWS API calls (which the CLI makes for you) retrieve a size-limited set of data and return a token if there is more data present. You can then make a subsequent call passing that token, which tells the service to return data starting at that token. Repeat this process until you no longer get a token back, at which point you know you have iterated over the full dataset.
For this specific CLI command, there is a flag for that:
--next-token (string)
The token for the next set of items to return. (You received this token from a previous call.)
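If you end up scripting the loop rather than passing --next-token by hand, here is a rough equivalent in Python with boto3 (log group and stream names are placeholders):

import time
import boto3

logs = boto3.client("logs")

# Look back 30 minutes; CloudWatch Logs timestamps are in milliseconds.
start_time = int((time.time() - 30 * 60) * 1000)

kwargs = {
    "logGroupName": "my-log-group",      # placeholder
    "logStreamName": "my-log-stream",    # placeholder
    "startTime": start_time,
    "startFromHead": True,
}

events = []
prev_token = None
while True:
    response = logs.get_log_events(**kwargs)
    events.extend(response["events"])
    token = response.get("nextForwardToken")
    # get_log_events keeps returning the same token once the stream is exhausted.
    if token is None or token == prev_token:
        break
    prev_token = token
    kwargs["nextToken"] = token

for event in events:
    print(event["timestamp"], event["message"])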
Hope this helps?
I am developing a solution where a Cloud Function calls a BigQuery procedure and, upon successful completion of this stored procedure, triggers another Cloud Function. For this I am using the Audit Logs "jobservice.jobcompleted" method. The problem with this approach is that it triggers the Cloud Function for every job that completes in BigQuery, irrespective of dataset and procedure.
Is there any way to add a path pattern to the filter so that it triggers only when a specific query completes, and not for all jobs?
My query starts something like: CALL storedProc() ...
Also, when creating a 2nd gen function from the console, I tried an Eventarc trigger. But to my surprise, the BigQuery event provider doesn't have an event for jobCompleted.
Now I'm wondering if it's possible to trigger based on the job-completed event at all.
Update: I changed my logic to use the google.cloud.bigquery.v2.TableService.InsertTable method, to make sure that after inserting a record into a table an audit log message is produced that I can use to trigger the next service. This insert statement is the last statement in the BigQuery procedure.
After running the procedure, the insert statement inserts the data, but the resource name comes through as projects/<project_name>/jobs.
I was expecting something like projects/<project_name>/tables/<table_name> so that I could apply a path pattern to the resource name.
Do I need to use a different protoPayload.method?
Try creating a log sink for the job-completed entries, filtered on the unique principal email of the service account, and use Pub/Sub as the sink destination.
Then use the published Pub/Sub event to run the destination service.
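As a rough sketch of that idea with the google-cloud-logging client (the sink name, project, topic, and the audit-log field paths below are assumptions to adapt to your project):

from google.cloud import logging

client = logging.Client()

# Match BigQuery "job completed" audit entries whose query text calls the stored procedure.
# These field paths follow the legacy AuditData format; double-check them against your own log entries.
log_filter = (
    'resource.type="bigquery_resource" '
    'protoPayload.methodName="jobservice.jobcompleted" '
    'protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.query:"CALL storedProc"'
)

sink = client.sink(
    "bq-storedproc-completed",                                               # placeholder sink name
    filter_=log_filter,
    destination="pubsub.googleapis.com/projects/my-project/topics/bq-done",  # placeholder topic
)
# unique_writer_identity gives the sink its own service account (the "unique principal email" mentioned above).
sink.create(unique_writer_identity=True)

After creating the sink, grant that writer identity the Pub/Sub Publisher role on the topic, and point your Cloud Function trigger at the topic.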
I would like to send a message to an HTTP-triggered Google Cloud Function. Specifically, I want to tell the function when a file version has changed so that the function loads the new version of the file in memory.
I thought about updating an environment variable as a way of sending that message, but it is not so straightforward to run an update-env-vars call, since this needs to be done in the context of the function's project.
I also thought of using a database, which sounds like too much for a single variable, or a simple text file in storage holding the current version, which sounds like too little. Any other ideas?
Based on the conversation in the comments section, I believe the best way to achieve what you are looking for is a GCS notification that publishes to Pub/Sub:
gsutil notification create -t TOPIC_NAME -f json gs://BUCKET_NAME
Pub/Sub will get notified based on event types, and I believe the right choice depends on what you consider a new version of the file (a metadata change? a new blob being created?).
Basically, you can pass the -e flag in the command above to indicate the event type (for example -e OBJECT_FINALIZE):
OBJECT_FINALIZE: Sent when a new object (or a new generation of an existing object) is successfully created in the bucket. This includes copying or rewriting an existing object. A failed upload does not trigger this event.
OBJECT_METADATA_UPDATE: Sent when the metadata of an existing object changes.
That means any file upload or metadata change in GCS will trigger Pub/Sub, which in turn triggers your Cloud Function. Here is an example function that pulls the message from Pub/Sub:
import base64

def hello_pubsub(event, context):
    # Log which Pub/Sub message triggered this execution.
    print("""This Function was triggered by messageId {} published at {} to {}
    """.format(context.event_id, context.timestamp, context.resource["name"]))
    # The Pub/Sub payload (if any) arrives base64-encoded in event['data'].
    if 'data' in event:
        name = base64.b64decode(event['data']).decode('utf-8')
    else:
        name = 'World'
    print('Hello {}!'.format(name))
Documents for reference:
https://cloud.google.com/storage/docs/pubsub-notifications
https://cloud.google.com/functions/docs/calling/pubsub#functions_calling_pubsub-python
I'm trying to use Go to send objects in an S3 bucket to Textract and collect the response.
I'm using the AWS Go SDK package and am able to connect to my S3 bucket and list all the objects contained within. So far so good. I now need to send one of those objects (a .pdf file) to Textract and collect the response(s).
The AWS Go SDK content for interacting with Textract seems to be quite extensive, but I cannot find a good example of how to do this.
I would be very grateful for a sample or advice on how to do this.
To start a job, you invoke StartDocumentTextDetection, using a DocumentLocation to specify the file, and you specify an SNS topic where Textract will publish a notification when it has finished processing your job.
You now have two possibilities:
Subscribe to the SNS topic and retrieve the result when you receive a message.
Create a Lambda function triggered by the SNS topic, which retrieves the result.
The second option is better in my opinion because it uses less compute time (nothing runs until the job has finished).
To retrieve the results, you use GetDocumentTextDetection; see the sketch below.
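A rough sketch of that flow (shown with boto3 in Python for brevity, since the Go SDK exposes the same two operations; the bucket, key, topic ARN, and role ARN are placeholders):

import boto3

textract = boto3.client("textract")

# Start the asynchronous job; Textract publishes to the SNS topic when it is done.
start = textract.start_document_text_detection(
    DocumentLocation={"S3Object": {"Bucket": "my-bucket", "Name": "my-file.pdf"}},
    NotificationChannel={
        "SNSTopicArn": "arn:aws:sns:us-east-1:123456789012:textract-done",
        "RoleArn": "arn:aws:iam::123456789012:role/textract-sns-publish",
    },
)
job_id = start["JobId"]

# Later (e.g. in the Lambda triggered by the SNS message), page through the results.
# Check response["JobStatus"] == "SUCCEEDED" before relying on the blocks.
blocks = []
next_token = None
while True:
    kwargs = {"JobId": job_id}
    if next_token:
        kwargs["NextToken"] = next_token
    response = textract.get_document_text_detection(**kwargs)
    blocks.extend(response.get("Blocks", []))
    next_token = response.get("NextToken")
    if not next_token:
        break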
If anyone else reaches this site searching for an answer:
I understood the documentation to mean that I could just call the StartDocumentAnalysis function through the Textract SDK, but what was missing is that you need to create a new Session first and make the calls based on that session:
https://docs.aws.amazon.com/sdk-for-go/api/service/textract/#New
We have the CloudWatch Logs agent set up, and the streamed logs have a timestamp prepended to the beginning of each line, which we can see after export.
2017-05-23T04:36:02.473Z "message"
Is there any configuration in the CloudWatch Logs agent setup that avoids adding this timestamp to each log entry?
Is there a way to export only the messages of the log events? We don't want the timestamp in our exported logs.
Thanks
Assuming that you are able to retrieve those logs using your Lambda function (Python 3.x), you can use a regular expression to identify the timestamp and write a function to strip it from the event log.
^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{3}Z\t
The above will identify the following timestamp: 2019-10-10T22:11:00.123Z
Here is a simple Python function:
import re

def strip(eventLog):
    # Remove the leading timestamp (e.g. "2019-10-10T22:11:00.123Z") and the tab that follows it.
    timestamp = r'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{3}Z\t'
    result = re.sub(timestamp, "", eventLog)
    return result
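For example, assuming the timestamp and the message are separated by a tab as in the regex above:

print(strip('2019-10-10T22:11:00.123Z\tSome log message'))  # prints: Some log message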
I don't think it's possible. I needed the exact same behavior you are asking for, and it looks like it's not possible unless you implement a man-in-the-middle processor that removes the timestamp from every log message, as suggested in the other answer.
Checking the CloudWatch Logs client API in the first place: you are required to send a timestamp with every log message you send to CloudWatch Logs (API reference).
And the export-logs-to-S3 task API also has no parameter to control this behavior (API reference).