GCP Best way to manage multiple cloud function flow - google-cloud-platform

I'm studying GCP and reading about different ways to communicate and manage cloud functions I end up wondering when to use each of the services that offer GCP.
So, I have been reading about GCP Composer, GCP Workflows, Cloud Pub/Sub and I don't see clearly when to use each one, or use simple HTTP calls.
I understand that it depends a lot on the application that you are building, but for example, If I'm building a payment gateway and some functions should be fired after the payment was verified, like sending emails, making not related business logic, adding the purchase to a sales platform. So which one should be the way I manage this flow and in which case would be better to use the others? Should I use events to create an async flow with Pub/Sub, or use complex solutions like composer and workflows? or just simple HTTP calls?

As always, it depends!! Even in your use case, it depends! Ok, after a payment you want to send an email, make business logic, adding the order to your databases,...
But, is all theses actions can be done in parallel, or you need to execute them in a certain order and if a step fails, you stop the process?
In the first case, you can use Cloud PubSub with 1 message published (payment OK) and then a fan out to several functions in parallel. Else, you can use workflow to test the response of the fonction and then to call, or not the following fonctions. With composer you can perform much more checks and actions.
You can also imagine to send another email 24h after to thank the customer for their order, and use Cloud Task to delayed an action.
You talked about Cloud Functions, but you also have other solutions to host code on GCP: App Engine and Cloud Run. Cloud function is, most of the time, single purpose. Sending an email is perfect for a function.
Now, if you have "set of functions" to browse your stock, view the object details, review the price, and book an object (validate an order "books" the order content in your warehouse), the "functions" are all single purpose but related to the same domain: warehouse management. Thus you can create a webserver that propose different path to manage the warehouse (a microservice for the warehouse if you prefer) and host it on CloudRun or App Engine.
Each product has its strength and weakness. You will also see this when you will learn about the storage on GCP. Most of the time, you can achieve things with several product, but if you don't use the right one, it will be slower, or cost much more.

Related

Can a Google Cloud Function communicate through a WebSocket?

Our many end users will, through a web browser, read and write in partly overlapping data.
When a user makes a change, a related change should be broadcasted to relevant other users.
Example use case: Several end users, each on their own device, look at a calendar with available time blocks to make an appointment. One of them creates an appointment, causing that a time block is not available for others anymore. The calendar on the screens of those others is updated accordingly and immediately.
Technically this would mean:
Browser sends 'create appointment' event through WebSocket
This event spins up a Cloud Function, which does the following (and then terminates):
Reserve the required capacity in the database
If this causes that the used time block is not available anymore for other users: Broadcast a 'not available anymore' event through the WebSockets of those other users that are viewing this time block.
In Google Cloud this is possible using an Apigee Java callout, where the Java (if needed) calls a Cloud Function, as described on https://cloud.google.com/apigee/docs/api-platform/develop/how-create-java-callout. However, Apigee runs in Kubernetes (https://cloud.google.com/apigee/docs/hybrid/kubernetes-resources), causing the overhead of containers being up at moments when they are not or sparsely used.
Google Clouds API Gateway https://cloud.google.com/api-gateway doesn't support WebSockets: https://issuetracker.google.com/issues/176472002?pli=1
Is there a way to accomplish our goal through a Cloud Function, without any container?

Possible to route multiple projects to a Cloud Function endpoint?

I have a Saas billing-model and each user has their own GCP Project. This is similar to this reddit thread, which asks:
I’m thinking about selling a saas service. I’ve decided every customer will get their own gcp project every customer will have a bunch of cloud run services, a cloud sql database and some users in Identity platform. I know the default project limit is around 12 and it can be increased by filling a form.
This works for something like BigQuery, where each user's Dataset or Table will be created within their own GCP project, and thus their billing (and data) will be segmented under their project.
However, I also have some shared endpoints on Google Cloud Functions, for example let's say I have general/shared endpoints to do something like "export data". Now of course the query to grab the data will hit the correct GCP project, but if the export (or some other data processing task) is doing something that is very expensive -- some exports might take over an hour to write the data, if dealing with billions of rows, what would be the suggested way to set that up so the end user is paying for their computation, since I imagine an endpoint such as www.example.com/api/export is just going to be on the main Project account, and we wouldn't have, for example, 1000 different cloud functions that do the same thing just to have each one under their respective project.
What might be a solution to this? In a way I'm looking for something like this I suppose where the requestor pays.
You would probably need to record how long each function call took, and save that data somewhere before exiting the shared function.
The only alternative would be to split the function for each client, and use billing labels to help with allocation.

How create a combined response from multiple microservices (cloud run containers) in a single api endpoint using Google Cloud Endpoints (gateway)?

I am familiar with firebase platform, but I am relatively a new user of the google cloud platform as whole.
I am working on a project built using a microservices structure, and I do have so many question for which I cannot find an answer or better I cannot find any example.
Unfortunately all the example that I am able to find are way to simple to be able to extrapolate a viable answer for my issues.
I adopted the new cloud run offer, and I decided to play with the full managed version (not kubernetes). I built few microservices (each service is built using express for node or flask for python - depending on what the services does). Each microservices expose it's own endpoint and has it's own api to call the methods - and I use a service account to allow the application to perform the internal calls.
I now want to expose the application to the external (specifically to my client built using vuejs technology), and I was trying to leverage another google product to create and expose an api: the google endpoints.
My question (specifically referred to the cloud run structure) is related to how is possible and what I need to do to create an api endpoints to communicate with the client app, that internally calls multiple services and combine their response in one.
Just to be clear, let's make an example:
Cloud run service 1 -> crud user api
Cloud run service 2 -> crud product api
Cloud endpoint external visible api -> get user from service 1, and after get products from service 2 and return the combined response all green products for user Jane Doe.
How I can aggregate the response directly in the endpoint gateway, check for failure and if everything goes smooth send the aggregate response to the client?
I need to build the aggregate endpoint in something else, like a cloud function for example? or I can do it directly in the google endpoints gateway?
Note that for cloud run the google endpoints is another cloud run container.
Thanks guys for some help, running pretty much out of option here.
As per my understanding, API Gateway should just work as a proxy, presenting all micro services as a single endpoint. To this scenarios I think you can have following 2 approaches :
1: Implement a new micro service (or on any of the existing one) which will do invocations and aggregation of responses.
2: Client(like UI) can invoke the services and do the aggregation on their side as well.
I feel, it is not a good idea to do it at api-gateway.
In my opinion, from an architectural point of view, the best option for you is to create a new microservice which will take the responses from the other two and then, it will aggregate them.
I understand that you want to aggregate the responses in a api-geteway and you are not able to find code examples for it. Here I was able to find a guide on what are you wanting to implement. The full code implementation can be found in this repository.
Keep in mind though, this idea of implementation is not a best practice.
This is ok, only if those two services that are going to be combined are independent. Meaning there is no functional/business relation between them and the concurrency or inconsistency problem will not occur in the process of aggregating.

Display real time data on website that scales?

I am starting a project where I want to create a website which will display LIVE flight information and status. We all have seen this at airport. An example is given here - http://www.computronics.biz/productimages/prodairport4.jpg. As you can see this information changes continuously. The website will talk to a backend api and the this backend api will talk to database. Now the important part is that the flight information in the database will be updated by the airline itself. There could be several airlines and they will update their data respectively. I have drawn a diagram and uploaded here - https://imgur.com/a/ssw1S.
Now those airlines will obviously have an interface (website talking to some backend API) through which they will update the database.
Now here is my attempt to solve it. We need to have some sort of trigger such that if any airline updates a flight detail in the database between current time - 1 hour to current + 4 hours (website will only display few hours of flights), we need to call the web api and then send the update to the website in the real time. The user must not refresh the page at all. At the same time the website needs to scale well i.e. if 1 million users are on the website, and there is an update in the database in the correct time range, all 1 million user's website should get updated within a decent amount of time.
I did some research and it looks like we need to have an event based approach. For example - we need to create a function (AWS lambda or Azure function) that should be called whenever there is an update in the database (Dynamo DB for example) within the correct time range. This function then should call an API which should then update the website through web socket technology for example.
I am not looking for any code but just some alternative suggestions on how this can be solved in a scalable way. Also how do we test scalability?
Dont use serverless functions(Lambda/Azure functions)
Although I am a huge fan of serverless functions, and currently running a full web app in Lambda, I don't think its needed for your use case and doesn't make sense economically. As you've answered in the comments, each airline will not write directly to the database, they'll push to an API, meaning you are explicitly told when flights have changed. When an airline has sent you new data you can simply propagate this to all the browser endpoints via websockets. This keeps the design very simple. There is no need to artificially create a database event that then triggers a function that will then tell you a flight has been updated. Thats like removing your doorbell and replacing it with a motion detector that triggers a doorbell :)
Cost
Money always deserves its own section. Lambda is more of an economic break through than a technological one. You have to know when its cost effective. You pay per request so if your dealing with a process that handles 10,000 operations a month, or something that only fires 1,000 times a day, than lambda is dirt cheap and practically free. You also pay for the length of time the function is executing and the memory consumed while executing. Generally, it makes sense to use lambda functions where a dedicated server would be sitting idle for most of the time. So instead of a whole EC2 instance, AWS provides you with a container on demand. There are points at which high requests rates and constantly running processes makes lambda more expensive than EC2. This article discusses how generally its cheaper to use lambda up to a point -> https://www.trek10.com/blog/lambda-cost/ The same applies to Azure functions and googles equivalent. They are all just containers offered on demand.
If you're dealing with flight information I would imagine you will have thousands of flights being updated every minute so your lambda functions will be firing constantly as if you were running an EC2 instance. You will end up paying a lot more than EC2. When you have a service that needs to stay up 24/7 and run 24/7 with high activity that is most certainly a valid use case for a dedicated server or servers.
Proposed Solution
These are the components I would use below:
Message Queue of some sort (RabbitMQ or AWS SQS with SNS perhaps)
Web Socket Backend (The choice will depend on programming language)
Airline input API (REST,GraphQL, or maybe AWS Kinesis Data Firehose)
The airlines publish their data to a back-end api. The updates are stored on a message queue and the web applicaton that actually displays the results to users, via websockets, reads from the queue.
Scalability
For scalability you can run the websocket application on multiple EC2 instances (all reading from the same queuing service) in an autoscaling group, so with extra load more instances will be created automatically hence the name "autoscaling". And those instances can sit behind an elastic load balancer. Lots of AWS documentation on how to do this and its their flagship design pattern. If you use AWS SQS you don't have to manage the scalability details yourself, aws handles that. The only real components to scale are your websocket application and the flight data input endpoint. You can run the flight api in an autoscaling group as well but AWS does offer an additional tool for high traffic data processing. I detail that below.
Testing Scalability
It would be fairly easy to have a mock airline blast your service with thousands and thousands of fake updates and on the other end you can easily run multiple threads of selenium tests simulating browser clicks and validating that the UI is still operational.
Additional tools
If it ends up being large amounts of data, rather than using a conventional REST api for your flight update service you could consider a service AWS offers specifically for dealing with large amounts of real time updates (Kinessis Data Firehose) https://aws.amazon.com/kinesis/data-firehose/ But I've never used it.
First, please don't over think this. This is a trivial problem to solve and doesn't require any special techniques, technologies or trendy patterns & frameworks.
You actually have three functional areas you can address almost separately.
Ingestion - Collection and normalization of the data from the various sources. For this, you'll need a process and transformation engine, LogicApps or such.
Your databases. You'll quickly learn that not all flights are the same ;). While it might seem so, the amount of data isn't that much. Instances of MySQL/SQL Server tuned for a particular function will work just fine. Hint, you don't need to have data for every movement ready to present all the time.
Presentation. The data API and UIs. This, really, is the easy part. I would suggest you use basic polling at first. For reasons you will never have any control over, the SLA for flight data is ~5 minutes so a real-time client notification system is time you should spend elsewhere at first.

Best option to schedule payments: azure scheduler, WebJob or Azure Functions or a Worker Role?

I've hosted my website on azure and now I want to schedule payments on a monthly basis. I am using Authorize.net for payments but I cannot use their recurring billing feature as it gives very little control. I've to perform checks in the database, make payments and update records. What should I use Azure Scheduler, Azure WebJob or Azure Functions a Worker Role?
Definitely not a Worker Role. They are very heavyweight and generally not worth the effort for a single, simple job like this.
Web jobs might be a good solution. It can run in the context of your web app, so you can use this with no additional cost. But you'll need to do some development with this - you have to create an app that calls Authorize.net.
If you only need to fire a single HTTP request, then using Azure Scheduler to schedule this HTTP action might be a good choice. You can configure the request itself (headers, payload) and it has error handling as well. But you might have to store sensitive information in the Azure portal, in the configuration of the scheduled job.
So I'd say forget about the Worker Role, then weigh simplicity against flexibility and development effort. That being sad, I would probably try it with the scheduler, and then move on to the WebJob, if I encounter something that is not feasible with the scheduler.
Edit:
Azure Functions can also be a good option - I'd say it's sort of a middle ground between the webjob and the simple scheduled option. It is part of the app services featureset, so it can run in the same appservice plan as the web app, so no costs. But here you have to code the http request to Authorize.net yourself as well. But Azure Functions is a lot more lightweight compared to webjobs - you do not have to create an exe (or ps script or whatever), you can just code the http request in a script editor inside the Azure portal. But you still have to do it yourself. This is a bit more flexible than the simple scheduled option though, which is something to consider when it comes to error handling.
So this is a good middleground, but I think it's still a lot of work given the complexity of the task (which is to fire a single HTTP request).
To get it working quickly, Logic Apps is a good choice. With Logic Apps, you can trigger it with a timer based on schedule you defined, use the out-of-box SQL/DocDB (depending on your exact scenario) to connect to your database. Although there's currently no Authorize.net connector available, you should be able to use the generic HTTP action to talk to its RESTful APIs. Most likely, you should be able to get this working very quickly. I'd also recommend submit a suggestion on aka.ms/logicapps-wish so we can track the request for Authorize.net connector, when available, is going to make this ever easier.