Can Cloud Workflows be used for both orchestration AND transformation? - google-cloud-platform

I hope you guys are doing well.
We are evaluating some solutions (Apache Camel K and the like) to allow teams to:
Low-code protocol transformation (Kafka, FTP, S3, MQ, SOAP, SFTP, gRPC, GraphQL, etc.). One team in particular has to integrate their product with hundreds of external partners (each one using a different integration technology), and writing each integration "by hand" would be a waste of time/motivation.
Enrich integrations' payloads (by calling both internal and external services)
Pay per execution/transformation/step (SERVERLESS)
Orchestrate processes that span multiple domains/services (either in our GCP account or in partners' external datacenters)
Strong retry and monitoring capabilities
Be part of our CI/CD pipeline (and not be limited to a Graphical interface)
Some of these items seem to be part of what Cloud Workflows does natively, but can the other requirements be added to (or achieved with) GCW while keeping it "serverless"?
Any help would be appreciated.
Thanks

Cloud Workflows can perform basic transformations (on strings or dates), but I can't recommend that. It's better to have a Cloud Function or a Cloud Run service that performs the transformation in code. You will be able to write unit tests for it and ensure the quality and evolution of your system without regressions.
As for orchestration, that is the purpose of Cloud Workflows. That said, there are some limits, and some corner cases that are less easy to achieve with it. It depends on the complexity of your process and your expectations (observability, portability, replayability, ...).
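To illustrate the transformation point, here is a minimal sketch (in Python, with a hypothetical payload shape and field names) of keeping the mapping in a plain function behind an HTTP-triggered Cloud Function. The pure function is trivial to unit test, and a Cloud Workflows step can then call the deployed endpoint over HTTP.

```python
import functions_framework  # pip install functions-framework

def to_canonical(order: dict) -> dict:
    """Pure mapping logic, easy to cover with plain unit tests.
    Field names here are hypothetical."""
    return {
        "order_id": order["id"],
        "total_cents": int(round(float(order["total"]) * 100)),
        "currency": order.get("currency", "EUR").upper(),
    }

@functions_framework.http
def transform(request):
    # HTTP entry point; a Cloud Workflows http.post step can call this URL.
    payload = request.get_json(silent=True) or {}
    return to_canonical(payload)
```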

Related

Pros and Cons of Google Dataflow VS Cloud Run while pulling data from HTTP endpoint

This is a design-approach question where we are trying to pick the best option between Apache Beam / Google Dataflow and Cloud Run to pull data from HTTP endpoints (source) and push it downstream to Google BigQuery (sink).
Traditionally we have implemented similar functionality using Google Dataflow, where the sources are files in a Google Storage bucket or messages in Google Pub/Sub, etc. In those cases the data arrives in a 'push' fashion, so it makes much more sense to use a streaming Dataflow job.
However, in the new requirement, since the data is fetched periodically from an HTTP endpoint, it sounds reasonable to use a Cloud Run service spun up on a schedule.
So I want to gather pros and cons of going with either of these approaches, so that we can make a sensible design for this.
I am not sure this question is appropriate for SO, as it opens a big discussion with different opinions, without clear context, scope, functional and non-functional requirements, time and budget restrictions (including CAPEX/OPEX), who is going to support the solution in BAU after commissioning and how, etc.
In my personal experience, I have developed a few dozen similar pipelines using various combinations of Cloud Functions, Pub/Sub topics, Cloud Storage, Firestore (for pipeline process state management) and so on, sometimes with Dataflow as well (embedded into the pipelines), but I have never used Cloud Run. So my knowledge and experience may not be relevant in your case.
The only thing I might suggest is to try to prioritize your requirements (in a whole-solution-lifecycle context) and then design the solution based on those priorities. I know, it is a trivial idea, sorry to disappoint you.
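For context, a rough sketch of what the Cloud Run style approach from the question might look like, assuming a hypothetical HTTP endpoint that returns JSON records matching the target table schema; in practice the container would be triggered on a schedule (e.g. by Cloud Scheduler):

```python
import requests                     # pip install requests
from google.cloud import bigquery   # pip install google-cloud-bigquery

# Hypothetical source endpoint and target table.
SOURCE_URL = "https://api.example.com/metrics"
TABLE_ID = "my-project.analytics.metrics_raw"

def pull_and_load() -> None:
    # Pull: fetch a JSON array of records from the HTTP source.
    rows = requests.get(SOURCE_URL, timeout=30).json()
    # Load: stream the records into BigQuery (rows must match the table schema).
    client = bigquery.Client()
    errors = client.insert_rows_json(TABLE_ID, rows)
    if errors:
        raise RuntimeError(f"BigQuery insert errors: {errors}")

if __name__ == "__main__":
    pull_and_load()
```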

Approach for syncing data from different CRM/MAP tools for different clients

We are building a highly efficient marketing automation tool.
What we want is to periodically sync data/modified data from our customers' CRM/MAP tools, such as Salesforce, Marketo, HubSpot, etc.
While we have the REST APIs to do that, the biggest challenge we are facing is on the architectural side. We are completely on AWS.
We have used an SQS-Lambda approach, but maintaining parallel syncing is a challenge. It is also difficult to accommodate changes with this approach as the number of customers and their CRM/MAP integrations increases.
We are now trying Airflow (MWAA) with Fargate, but we are facing some issues there as well.
Note: some syncs take a long time to process (hence Lambda is not a feasible approach).
Please suggest a good approach to do this efficiently on a daily basis.

Planning an architecture in GCP

I want to plan an architecture based on the GCP cloud platform. Below are the subject areas I have to cover. Can someone please help me find the proper services to perform each operation?
Data ingestion (Batch, Real-time, Scheduler)
Data profiling
AI/ML based data processing
Analytical data processing
Elastic search
User interface
Batch and Real-time publish
Security
Logging/Audit
Monitoring
Code repository
If I am missing something that I have to take care of, please add that too.
GCP offers many products whose functionality can partially overlap. Which product to use will depend on the specific use case, and you can find an overview here.
That being said, an overall summary of the services you asked about would be:
1. Data ingestion (Batch, Real-time, Scheduler)
That will depend on where your data comes from, but the most common options are Dataflow (both for batch and streaming) and Pub/Sub for streaming messages.
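As a small example of the streaming side, publishing an event to Pub/Sub from Python looks roughly like this (project and topic names are hypothetical); a streaming Dataflow job or another subscriber would then consume the topic:

```python
import json
from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

publisher = pubsub_v1.PublisherClient()
# Hypothetical project and topic names.
topic_path = publisher.topic_path("my-project", "ingest-events")

event = {"sensor_id": "A-17", "reading": 21.4}
# Messages are raw bytes; extra keyword arguments become string attributes.
future = publisher.publish(topic_path, json.dumps(event).encode("utf-8"), source="sensors")
print("Published message", future.result())  # result() returns the message ID
```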
2. Data profiling
Dataprep (which actually runs on top of Dataflow) can be used for data profiling; here is an overview of how you can do it.
3. AI/ML based data processing
For this you have several options depending on your needs. For developers with limited machine learning expertise there is AutoML, which allows you to quickly train and deploy models. For more experienced data scientists there is ML Engine, which allows training and serving of custom models built with frameworks like TensorFlow or scikit-learn.
Additionally, there are some pre-trained models for things like video analysis, computer vision, speech to text, speech synthesis, natural language processing or translation.
Plus, it's even possible to perform some ML tasks in SQL directly in GCP's data warehouse, BigQuery.
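As an illustration of that last point, a model can be trained with plain SQL via BigQuery ML; here is a rough sketch using the Python client (dataset, table, and column names are hypothetical):

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

# Hypothetical dataset, table, and columns; a logistic regression on visit data.
sql = """
CREATE OR REPLACE MODEL `my_dataset.purchase_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['purchased']) AS
SELECT country, pageviews, purchased
FROM `my_dataset.visits`
"""

client.query(sql).result()  # blocks until the training job completes
print("Model trained")
```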
4. Analytical data processing
Depending on your needs, you can use Dataproc, which is a managed Hadoop and Spark service, or Dataflow for stream and batch data processing.
BigQuery is also designed with analytical operations in mind.
5. Elastic search
There is no managed Elasticsearch service provided directly by GCP, but you can find several options on the Marketplace, like an API service or a Kubernetes app for Google Kubernetes Engine.
6. User interface
If you are referring to a user interface for your own use, GCP’s console is what you’d be using. If you are referring to a UI for end-users, I’d suggest using App Engine.
If you are referring to a UI for data exploration, there is Datalab, which is essentially a managed notebook service, and Data Studio, where you can build plots of your data in real time.
7. Batch and Real-time publish
The publishing service in GCP, for both synchronous and asynchronous messages is Pub/Sub.
8. Security
Most security concerns in GCP are addressed here. This is a wide topic by itself and would probably need a separate question.
9. Logging/Audit
GCP uses Stackdriver for logging across most of its products, and provides many ways to process and analyze those logs.
10. Monitoring
Stackdriver also has monitoring features.
11. Code repository
For this there is Cloud Source Repositories, which integrates with GCP's automated build system and can also be easily synced with a GitHub repository.
12. Analytical data warehouse
You did not ask for this one, but I think it's an important part of a data analysis stack.
In the case of GCP, this would be BigQuery.

How to build complex apps with AWS Lambda and SOA?

We currently run a Java backend which we're hoping to move away from and switch to Node running on AWS Lambda & Serverless.
Ideally during this process we want to build out a fully service orientated architecture.
My question is: if our frontend Angular app requests the current user's ordered items, it would need to hit three services to get that information: the user service, the order service, and the item service.
Does this mean we would need to make three GET requests to these services? At the moment we have a single endpoint built for that specific request, which can take advantage of DB joins for optimal performance.
I understand the benefits of SOA, but how do we scale when performing more complex requests such as this? Are there any good resources I can take a look at?
Looking at your question, I would advise you to align your priorities first: why do you want to move away from the Java backend you're running now? Which problems do you want to overcome?
You're combining the microservices architecture and the concept of serverless infrastructure in your question. Both can be used in conjunction, but they don't have to. A lot of companies are using microservices, even bigger enterprises like Uber (on NodeJS), but serverless infrastructures like Lambda are really just getting started. I would advise you to read up on microservices especially, e.g. here are some nice articles. You'll also find answers to your question about performance and joins.
When considering an architecture based on Lambda, do consider that no state whatsoever is possible in a Lambda function. This goes a step further than the stateless services we usually talk about, which generally just avoid keeping 'client state' between requests. A Lambda function cannot hold any state at all, so e.g. a persistent DB connection pool is not possible. For all the downsides, there is also a lot of stuff you don't have to deal with, which can be very beneficial, especially in terms of scalability.
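To make the point about joins concrete, one common pattern is an aggregating (API composition) endpoint that fans out to the individual services and merges the results in code instead of in the database. A rough sketch, with hypothetical service URLs and payload shapes:

```python
import asyncio
import aiohttp  # pip install aiohttp

# Hypothetical service endpoints, e.g. each backed by its own Lambda.
USER_URL = "https://api.example.com/users/{uid}"
ORDERS_URL = "https://api.example.com/users/{uid}/orders"
ITEMS_URL = "https://api.example.com/items?ids={ids}"

async def fetch_json(session: aiohttp.ClientSession, url: str):
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.json()

async def get_user_ordered_items(uid: str) -> dict:
    async with aiohttp.ClientSession() as session:
        # Fan out to the user and order services concurrently.
        user, orders = await asyncio.gather(
            fetch_json(session, USER_URL.format(uid=uid)),
            fetch_json(session, ORDERS_URL.format(uid=uid)),
        )
        # Then resolve the items referenced by the orders ("item_id" is hypothetical).
        item_ids = ",".join(str(o["item_id"]) for o in orders)
        items = await fetch_json(session, ITEMS_URL.format(ids=item_ids))
        # The "join" happens here, in the composing service, not in the DB.
        return {"user": user, "orders": orders, "items": items}

if __name__ == "__main__":
    print(asyncio.run(get_user_ordered_items("42")))
```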

What is web service composition?

What exactly is web service composition?
Composition refers to the way something is built. The newer term at the moment is 'mash-up', which basically means utilising a variety of different services in a composite application, so that the functionality of disparate applications can be used in one application.
I think you're referring to service granularity, which means how much functionality a service exposes. A coarse-grained service will expose a whole process as a consumable unit, whereas a fine-grained service will expose a specific unit of logic from a larger process. Obviously, it is up to the service architects to determine what granularity of service works best in the given environment.
This also, in a way, has to do with the style of SOAP message you are using (RPC style or document style), and with the idea that a service should be atomic and not hold external state, meaning it does not need any information beyond what is in the SOAP message to perform its function.
Hope this gives you a good starting point. The trouble with service-orientation is that it differs depending on who you read, but the main points stay the same!
Jon
Some web services provided to clients are abstract compositions of smaller web services; this is called web service composition.
Sometimes more than one candidate web service can play the role of one of those smaller services, so we choose between them based on QoS (Quality of Service); much research has been done on this subject.
Web service composition involves the integration of two or more web services to add business value. A workflow composer is responsible for aggregating different web services so that they act as a single service, according to functional requirements as well as QoS constraints. BPEL is one of the popular composers; it uses an XML-based language to perform service composition. Fine-grained services perform a single business task and provide higher flexibility and reusability, whereas coarse-grained services perform complex business functionality, leading to lower flexibility.