I have 2 problems related to managing concurrency between Google Cloud Functions.
The setup: I have a Slack bot that provides a "checkoff" slash command. The command sends another Slack user yes/no buttons asking whether to authorize the checkoff. When the user clicks an option, Slack sends that response to a Google Cloud Function, which 1) sends a response back to Slack to close the buttons and 2) records the checkoff, if authorized, in a Google Sheet using the Sheets v4 API (spreadsheets.values.append).
Issue #1: Users who spam the yes/no buttons trigger multiple Slack requests to the Cloud Function before the Function can acknowledge and close the buttons. This leads to multiple Cloud Functions spawning and multiple checkoffs being recorded in the sheet. If I could maintain state, I could save unique information from the request and check that the request had not already been serviced. Is there a pattern to do this with Cloud Functions?
Issue #2: Sometimes multiple checkoffs are authorized at similar times by independent users. These requests spawn independent Cloud Function instances, each of which attempts to append to the Sheet. There is a rare case where another Function writes between the first Function's read and its write, causing an overwrite. I would use a read-write lock to deal with this, but as far as I'm aware there's no way to share concurrency primitives between Cloud Functions.
(Less important) Issue #3: I'd really love to batch the spreadsheet writes but it seems against the grain of serverless computing in the 1st place. Is there a way to do this?
Any help is appreciated.
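The check described in Issue #1 amounts to an idempotency guard: claim a unique request ID before doing the work, and skip the work if the claim fails. Below is a minimal sketch of that idea. The names `ProcessedRequests`, `claim`, and `handle_click` are hypothetical, and the in-memory set is only a stand-in: separate Cloud Function instances do not share memory, so a real deployment would need a shared store that offers an atomic create-if-absent operation.

```python
import threading


class ProcessedRequests:
    """In-memory stand-in for a shared store with atomic create-if-absent."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def claim(self, request_id: str) -> bool:
        """Return True only for the first caller that presents request_id."""
        with self._lock:
            if request_id in self._seen:
                return False
            self._seen.add(request_id)
            return True


def handle_click(store: ProcessedRequests, request_id: str, sheet_rows: list) -> str:
    """Record a checkoff once, no matter how often the button was clicked."""
    if not store.claim(request_id):
        return "duplicate ignored"
    # Stand-in for the spreadsheets.values.append call.
    sheet_rows.append(request_id)
    return "recorded"
```

The important property is that the check and the insert happen as one atomic step; a separate "read, then write" against the store would reintroduce the same race.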
I had similar issues with Cloud Functions and Firestore. In my case I was receiving notifications about new and updated data in the form of 'order/123', and I was then creating a copy of the order in Firestore. The problem was that multiple notifications sometimes arrived at the same time, and the resulting race conditions produced duplicated orders.
My solution was to use Google Cloud Tasks (https://console.cloud.google.com/cloudtasks). One Cloud Function receives the notification and adds a message to a queue that is processed with a concurrency of 1; another Cloud Function then takes care of the processing.
Receive notification -> Post message to queue (concurrency 1) -> Process message
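The effect of that pipeline can be simulated locally. The sketch below is not Cloud Tasks code: it uses a standard-library queue and a single worker thread as a stand-in for a queue with concurrency 1, and a dict as a stand-in for the Firestore collection. The function name `process_serially` is hypothetical.

```python
import queue
import threading


def process_serially(notifications):
    """Drain notifications through one worker so writes never interleave."""
    pending = queue.Queue()
    orders = {}  # stand-in for the Firestore collection, keyed by order ID

    def worker():
        while True:
            path = pending.get()
            if path is None:  # sentinel: no more work
                break
            order_id = path.split("/", 1)[1]
            # Check-then-write is safe only because this is the sole writer.
            if order_id not in orders:
                orders[order_id] = {"id": order_id, "updates": 0}
            else:
                orders[order_id]["updates"] += 1

    consumer = threading.Thread(target=worker)
    consumer.start()
    for path in notifications:
        pending.put(path)
    pending.put(None)
    consumer.join()
    return orders
```

Two simultaneous notifications for 'order/123' end up as one order with one update, rather than two copies of the order.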
In this case I have one queue per customer. I am sure there are better ways, but for now this is good enough. Later on you can route several customers to a shared set of queues, as long as a given customer always lands on the same queue.
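One way to keep a customer on the same queue while sharing a fixed set of queues is to hash the customer ID. This is a minimal sketch under assumptions: the `orders-queue-N` naming scheme and the function name are hypothetical, not something Cloud Tasks prescribes.

```python
import hashlib


def queue_for_customer(customer_id: str, queue_count: int) -> str:
    """Map a customer to one of queue_count queues, always the same one."""
    if queue_count < 1:
        raise ValueError("queue_count must be at least 1")
    # A cryptographic hash is stable across processes and restarts,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % queue_count
    return f"orders-queue-{index}"
```

Note that changing `queue_count` remaps customers, so messages still in flight on the old queues should be drained before the count changes.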
Related
Our many end users will, through a web browser, read and write partly overlapping data.
When a user makes a change, a related change should be broadcast to the other relevant users.
Example use case: Several end users, each on their own device, look at a calendar with available time blocks to make an appointment. One of them creates an appointment, which makes that time block unavailable to the others. The calendar on the other users' screens is updated accordingly and immediately.
Technically this would mean:
Browser sends 'create appointment' event through WebSocket
This event spins up a Cloud Function, which does the following (and then terminates):
Reserve the required capacity in the database
If this makes the time block unavailable to other users: Broadcast a 'not available anymore' event through the WebSockets of the other users who are viewing this time block.
In Google Cloud this is possible using an Apigee Java callout, where the Java code (if needed) calls a Cloud Function, as described at https://cloud.google.com/apigee/docs/api-platform/develop/how-create-java-callout. However, Apigee runs in Kubernetes (https://cloud.google.com/apigee/docs/hybrid/kubernetes-resources), which incurs the overhead of containers running even when they are idle or barely used.
Google Cloud's API Gateway (https://cloud.google.com/api-gateway) doesn't support WebSockets: https://issuetracker.google.com/issues/176472002?pli=1
Is there a way to accomplish our goal through a Cloud Function, without any container?
Suppose we have a web service that does some time-consuming processing of a file submitted by a user. The file is not processed inside the HTTP(S) request handler, as that would hurt the user experience. Instead, once the user submits the file, a new Google Cloud Tasks task is created and posted to the HTTP(S) endpoint that handles the file processing.
The task runs pretty long, so the user should be updated with the progress to have a better waiting experience. As a beginner user of GCP, I have the following questions.
Is there an API in Google Cloud Tasks to achieve that? For example, an API that allows setting custom metadata on a task object so that it can be retrieved from the other side later.
If not, what are the best practices for storing the progress information of a Google Cloud Tasks task? Which services are best to use in this case?
The simplest idea is to store the progress information in plain text files in a Google Cloud Storage bucket.
Assume there's a different HTTP(s) handler that receives the polling requests and must return current task progress. The progress is determined by the file-processing code.
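The split described above, where the file-processing code writes progress and a separate handler reads it for polling requests, can be sketched as follows. This is only an illustration under assumptions: `ProgressStore`, `report`, and `poll` are hypothetical names, and the in-memory dict stands in for whatever shared storage is chosen (for example, the text files in a bucket mentioned above), since the two handlers would not share memory in practice.

```python
import json


class ProgressStore:
    """In-memory stand-in for shared storage keyed by task ID."""

    def __init__(self):
        self._records = {}

    def report(self, task_id: str, done: int, total: int) -> None:
        """Called by the file-processing code after each unit of work."""
        # Serialized to text, as it would be when written to a file.
        self._records[task_id] = json.dumps({"done": done, "total": total})

    def poll(self, task_id: str) -> dict:
        """Called by the polling handler to build its response."""
        raw = self._records.get(task_id)
        if raw is None:
            return {"status": "unknown"}
        record = json.loads(raw)
        total = record["total"]
        percent = 100 * record["done"] // total if total else 0
        status = "finished" if record["done"] >= total else "running"
        return {"status": status, "percent": percent}
```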
We currently run an AWS Lambda function whose main job is simply to redirect the user to a different URL. The function is invoked via API Gateway.
For tracking purposes, we would like to create a widget on our dashboard that provides real-time insights into how many redirects are performed each second. The creation of the widget itself is not the problem.
My main question is which AWS service is best suited for telling our other services that an invocation took place. We plan to register the invocation in our database.
Some additional things:
low latency (< 5 seconds) in order to be real-time data
almost no added wait time for the user; we aim to redirect the user as fast as possible
Many thanks in advance!
Best Regards
Martin
I understand that your goal is to simply persist the information that an invocation happened somewhere with minimal impact on the response time of the Lambda.
For that purpose I'd probably use an SQS standard queue and just send a message to the queue that the invocation happened.
You can then have an asynchronous process (Lambda, Docker, EC2) process the messages from the queue and update your Dashboard.
Depending on the scalability requirements looking into Kinesis Data Analytics might also be worth it.
It's a fully managed streaming data solution and the analytics part allows you to do sliding window analyses using SQL on data in the Stream.
In that case you'd write the info that something happened to the stream, which also has a low latency.
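To make the sliding-window idea concrete, here is a minimal sketch of the calculation the consumer of the queue or stream would perform for the dashboard. It is plain Python, not Kinesis or SQS code, and the class name `RedirectCounter` and the 5-second window are assumptions chosen for illustration.

```python
class RedirectCounter:
    """Average redirects per second over a trailing time window."""

    def __init__(self, window_seconds: float = 5.0):
        self.window_seconds = window_seconds
        self._timestamps = []

    def record(self, timestamp: float) -> None:
        """Register one invocation, e.g. one message taken from the queue."""
        self._timestamps.append(timestamp)

    def rate(self, now: float) -> float:
        """Drop events older than the window and return events per second."""
        cutoff = now - self.window_seconds
        # Filtering (instead of popping from the front) tolerates
        # messages that arrive out of order.
        self._timestamps = [t for t in self._timestamps if t > cutoff]
        return len(self._timestamps) / self.window_seconds
```

The Lambda itself only has to emit a timestamped message; all aggregation happens in the asynchronous consumer, so the redirect is not slowed down.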
I'm studying GCP, and while reading about the different ways to communicate between and manage Cloud Functions, I ended up wondering when to use each of the services that GCP offers.
I have been reading about GCP Composer, GCP Workflows, and Cloud Pub/Sub, and I don't clearly see when to use each one, or when to just use simple HTTP calls.
I understand that it depends a lot on the application you are building. For example, suppose I'm building a payment gateway and some functions should be fired after the payment is verified, such as sending emails, running unrelated business logic, and adding the purchase to a sales platform. Which one should I use to manage this flow, and in which cases would the others be a better fit? Should I use events to create an async flow with Pub/Sub, use more complex solutions like Composer and Workflows, or just make simple HTTP calls?
As always, it depends! Even within your use case, it depends. After a payment you want to send an email, run some business logic, add the order to your databases, and so on.
But can all these actions be done in parallel, or do you need to execute them in a certain order and stop the process if a step fails?
In the first case, you can use Cloud Pub/Sub: publish one message (payment OK) and fan out to several functions in parallel. Otherwise, you can use Workflows to test the response of each function and then decide whether to call the following functions. With Composer you can perform many more checks and actions.
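The difference between the two cases can be shown with a small local simulation. This is not Pub/Sub or Workflows code: `fan_out` mimics independent subscribers receiving the same event, `run_in_order` mimics a workflow that stops at the first failing step, and the three handlers are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(event, handlers):
    """Run every handler on the same event; one failure does not stop the others."""
    with ThreadPoolExecutor() as pool:
        futures = {handler.__name__: pool.submit(handler, event) for handler in handlers}
    results = {}
    for name, future in futures.items():
        try:
            results[name] = future.result()
        except Exception as exc:
            results[name] = f"failed: {exc}"
    return results


def run_in_order(event, steps):
    """Run steps one by one and stop at the first failure."""
    completed = []
    for step in steps:
        try:
            step(event)
        except Exception:
            break
        completed.append(step.__name__)
    return completed


def send_email(event):
    return f"email sent for {event['payment_id']}"


def update_sales_platform(event):
    raise RuntimeError("sales platform unavailable")


def run_business_logic(event):
    return "done"
```

With the fan-out, the email and the business logic still complete when the sales platform is down; with the ordered run, everything after the failing step is skipped.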
You could also send another email 24 hours later to thank the customer for their order, using Cloud Tasks to delay that action.
You talked about Cloud Functions, but there are other ways to host code on GCP: App Engine and Cloud Run. A Cloud Function is, most of the time, single purpose. Sending an email is a perfect fit for a function.
Now, if you have a "set of functions" to browse your stock, view the object details, review the price, and book an object (validating an order "books" the order content in your warehouse), the "functions" are all single purpose but related to the same domain: warehouse management. You can therefore create a web server that exposes different paths to manage the warehouse (a microservice for the warehouse, if you prefer) and host it on Cloud Run or App Engine.
Each product has its strengths and weaknesses. You will see the same thing when you learn about storage on GCP. Most of the time you can achieve a goal with several products, but if you don't use the right one, it will be slower or cost much more.
I would like to use AWS Lambda as a social media post scheduler, but I can't find an elegant way to do so. In our app, users create social media posts and set a time. We then post them via the social network's API at the time specified.
I need to be able to schedule a Lambda to run once at a scheduled time and with unique data (being the user's token and the body of the post) in order to accomplish it. Here's an example:
John wants to post to Twitter next Thursday at 2pm. He's scheduled a
post with the body "Hello world!" for that time via our web app. The
app will talk to AWS Lambda via the API and set a Lambda function to
fire one time next Thursday at 2pm. That function would fire a request
to the Twitter API with John's token and the body ("Hello world!").
Would love to be able to do this serverless with Lambda, but I can't find a great way. If you could pair a CloudWatch scheduled event trigger with a unique payload, that might work, but I don't see a way to do that. Otherwise, it seems this would require either creating a new Lambda function for each post with the data hard-coded, or having the Lambda hit the database to look for the scheduled post. Creating potentially hundreds of bespoke Lambda functions seems like a huge mess, and hitting the database at Lambda runtime seems like undue stress on the database, since we have all the data we need in hand at the time we schedule.
Any suggestions for how I might accomplish this with Lambda? Is there another AWS service that is better suited to the task? Should I give up on serverless and just set up another EC2 instance to handle the scheduler?
You definitely don't want to be creating a function + event per scheduled task.
The scalable way to do this would be to schedule a single function to run regularly (e.g. hourly), check a database for any posts that were scheduled for the last hour (i.e. since the last run), and publish them if so.
The reason I am suggesting a database is because you need to manage your state (that is, the post payload/details) somewhere, and relying on CloudWatch Events for this is not the right way, for all the reasons you've listed in your question.
An alternative to a database would be to put the payload in S3, and have the scheduled function check a specific location/bucket for the payloads that need processing. Lambda to S3 communication is very fast, and you don't need to worry about load or network transfers.
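The selection step of the polling approach in this answer can be sketched as below. The record layout (`post_at`, `sent`) and the function name `due_posts` are assumptions for illustration; the list stands in for rows from the database or payload objects read from S3.

```python
from datetime import datetime, timezone


def due_posts(posts, last_run: datetime, now: datetime):
    """Return posts scheduled after the previous run and up to the current one."""
    return [
        post
        for post in posts
        # Half-open interval (last_run, now] so consecutive runs
        # never pick up the same post twice.
        if last_run < post["post_at"] <= now and not post.get("sent", False)
    ]
```

Using the time of the previous run as the lower bound, rather than a fixed "one hour ago", keeps posts from being skipped if a run starts late.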
You could use AWS Step Functions for this task. With these you can model a state machine which waits for the exact timestamp to trigger.
https://aws.amazon.com/step-functions/
The only drawback is that the documentation is still pretty scarce, but if you log into the AWS console, it provides some samples showing how to implement such wait processes.
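As a rough sketch of what such a state machine could look like, the snippet below builds a two-state definition in the Amazon States Language: a Wait state that reads its target time from the execution input via `TimestampPath`, followed by a Task state that invokes a Lambda. The state names, the `post_at` input field, and the Lambda ARN are hypothetical placeholders.

```python
import json


def build_scheduler_definition(publish_lambda_arn: str) -> dict:
    """Build a state machine that waits until $.post_at, then runs one Lambda."""
    return {
        "Comment": "Wait until the scheduled time, then publish the post",
        "StartAt": "WaitUntilPostTime",
        "States": {
            "WaitUntilPostTime": {
                "Type": "Wait",
                # The timestamp comes from the execution input,
                # e.g. {"post_at": "2030-01-03T14:00:00Z", ...}.
                "TimestampPath": "$.post_at",
                "Next": "PublishPost",
            },
            "PublishPost": {
                "Type": "Task",
                "Resource": publish_lambda_arn,
                "End": True,
            },
        },
    }


definition = build_scheduler_definition(
    "arn:aws:lambda:us-east-1:123456789012:function:publish-post"  # placeholder ARN
)
definition_json = json.dumps(definition, indent=2)
```

Each scheduled post would then start one execution of this single state machine, with the post time, the user's token, and the body passed as the execution input, so no per-post Lambda function is needed.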