AWS Step Functions vs Camunda for workflow

I am not sure if I am comparing apples to oranges, but both Camunda and AWS Step Functions seem to address the same thing: workflows. Help me compare the two and decide which to use when. Are they interchangeable?

You are not comparing apples to oranges; both tools are workflow engines.
As background reading, the comparison is also discussed, for example, here: https://forum.camunda.org/t/bpmn-vs-aws-step-function/5460.
Differences in essence:
Process modeling language: the proprietary Amazon States Language vs. standardized BPMN, which supports more of the language constructs catalogued at http://www.workflowpatterns.com/ (for a feel of ASL, see the sketch after this list)
Visualization of process models for different stakeholders: simple auto-generated diagrams for Step Functions vs. full BPMN diagrams for Camunda
Architecture possibilities: Step Functions is cloud-only and even AWS-only, but in return tightly integrated with the AWS world; Camunda is vendor-independent and can run in any environment, but needs additional work to integrate with AWS
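To give a feel for the difference in modeling language, here is a minimal sketch that registers a trivial Amazon States Language definition (a single Pass state) using the AWS SDK for Java v2; the state machine name and IAM role ARN are placeholders, not values from the discussion above.

```java
import software.amazon.awssdk.services.sfn.SfnClient;
import software.amazon.awssdk.services.sfn.model.CreateStateMachineRequest;
import software.amazon.awssdk.services.sfn.model.CreateStateMachineResponse;

public class CreateHelloStateMachine {
    public static void main(String[] args) {
        // A trivial ASL definition: one Pass state that immediately ends the execution.
        String definition = """
            {
              "Comment": "Minimal Amazon States Language example",
              "StartAt": "HelloWorld",
              "States": {
                "HelloWorld": { "Type": "Pass", "Result": "Hello, world!", "End": true }
              }
            }
            """;

        try (SfnClient sfn = SfnClient.create()) {
            CreateStateMachineResponse response = sfn.createStateMachine(
                CreateStateMachineRequest.builder()
                    .name("hello-world")                                  // placeholder name
                    .roleArn("arn:aws:iam::123456789012:role/StepFnRole") // placeholder ARN
                    .definition(definition)
                    .build());
            System.out.println("Created: " + response.stateMachineArn());
        }
    }
}
```

The equivalent Camunda model would be a BPMN XML file, typically drawn in a graphical modeler rather than written by hand, which is exactly the visualization difference listed above.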
As a rule of thumb:
Use Step Functions if you have fairly technical workflows that need to run only within the AWS world
Use Camunda in all other cases, including more hybrid environments and "bigger" processes
I describe this in more detail in https://processautomationbook.com/

Related

Pros and cons of Google Dataflow vs. Cloud Run for pulling data from an HTTP endpoint

This is a design-approach question where we are trying to pick the best option between Apache Beam / Google Dataflow and Cloud Run to pull data from HTTP endpoints (source) and push it downstream to Google BigQuery (sink).
Traditionally we have implemented similar functionality using Google Dataflow, where the sources are files in a Google Storage bucket, messages in Google Pub/Sub, etc. In those cases the data arrives in a 'push' fashion, so it makes much more sense to use a streaming Dataflow job.
However, in the new requirement, since the data is fetched periodically from an HTTP endpoint, it seems reasonable to use a Cloud Run service spun up on a schedule.
So I want to gather the pros and cons of going with either of these approaches, so that we can arrive at a sensible design.
I am not sure this question is appropriate for SO, as it opens a big discussion with different opinions, without clear context, scope, functional and non-functional requirements, time and finance restrictions (including CAPEX/OPEX), who is going to support the solution in BAU after commissioning and how, etc.
In my personal experience, I have developed a few dozen similar pipelines using various combinations of Cloud Functions, Pub/Sub topics, Cloud Storage, Firestore (for pipeline process state management) and so on - sometimes with Dataflow as well (embedded into the pipelines) - but I have never used Cloud Run. So my knowledge and experience may not be relevant in your case.
The only thing I might suggest: try to prioritize your requirements (in the context of the whole solution lifecycle) and then design the solution based on those priorities. I know it is a trivial idea - sorry to disappoint you.
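Not to pick a side, but to make the Cloud Run variant concrete: a minimal sketch of the periodic pull-and-load job, assuming the google-cloud-bigquery Java client; the endpoint URL, dataset, table, and column names are placeholders, and Cloud Scheduler would trigger this on a cron.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.HashMap;
import java.util.Map;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;

public class HttpToBigQueryJob {
    public static void main(String[] args) throws Exception {
        // 1. Pull from the HTTP endpoint (placeholder URL).
        HttpResponse<String> response = HttpClient.newHttpClient().send(
            HttpRequest.newBuilder(URI.create("https://example.com/api/data")).GET().build(),
            HttpResponse.BodyHandlers.ofString());

        // 2. Stream the payload into BigQuery (placeholder dataset/table with one STRING column).
        Map<String, Object> row = new HashMap<>();
        row.put("payload", response.body());

        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        InsertAllResponse result = bigquery.insertAll(
            InsertAllRequest.newBuilder(TableId.of("my_dataset", "raw_payloads"))
                .addRow(row)
                .build());

        if (result.hasErrors()) {
            throw new IllegalStateException("BigQuery insert failed: " + result.getInsertErrors());
        }
    }
}
```

The Dataflow alternative would replace all of this with a Beam pipeline, which buys you windowing and autoscaling at the cost of more machinery for what is essentially a scheduled fetch.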

Can Cloud Workflows be used for both orchestration AND transformation?

I hope you guys are doing well.
We are evaluating some solutions (Apache Camel K and the like) to allow teams to:
Low-code protocol transformation (Kafka, FTP, S3, MQ, SOAP, SFTP, gRPC, GraphQL, etc.). One team in particular has to integrate their product with hundreds of external partners (each one uses a different integration technology), and writing each integration "by hand" would be a waste of time/motivation.
Enrich integrations' payloads (by calling both internal and external services)
Pay per execution/transformation/step (SERVERLESS)
Orchestrate processes that span multiple domains/services (on either our GCP account or partners' external datacenters)
Strong retry and monitoring capabilities
Be part of our CI/CD pipeline (and not be limited to a graphical interface)
Some of these items seem to be part of what Cloud Workflows does natively, but can the other requirements be added to (or achieved with) GCW while keeping it "serverless"?
Any help would be appreciated. Thanks!
Cloud Workflows can perform basic transformations (on strings or dates), but I can't recommend that. It's better to have a Cloud Function or a Cloud Run service perform the transformation in code. That way you can write unit tests for it and ensure the quality and evolution of your system without regressions (a minimal sketch follows below).
Orchestration, on the other hand, is exactly the purpose of Cloud Workflows. That said, it has some limits, and some corner cases are less easy to achieve with it. It depends on the complexity of your process and your expectations (observability, portability, replayability, ...).
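A minimal sketch of that advice, assuming the Java Functions Framework (functions-framework-api): the transformation lives in a pure static method, so a plain unit test can cover it without touching any cloud service, and the HTTP entry point is just a thin wrapper. The transformation logic itself is a placeholder.

```java
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;

import com.google.cloud.functions.HttpFunction;
import com.google.cloud.functions.HttpRequest;
import com.google.cloud.functions.HttpResponse;

public class TransformFunction implements HttpFunction {

    // Pure, deterministic transformation: trivially unit-testable with plain JUnit.
    // (Placeholder logic - substitute your real mapping here.)
    static String transform(String raw) {
        return raw.trim().toUpperCase();
    }

    @Override
    public void service(HttpRequest request, HttpResponse response) throws Exception {
        String body = new String(request.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        try (BufferedWriter writer = response.getWriter()) {
            writer.write(transform(body));
        }
    }
}
```

A unit test then asserts directly on transform(...), which is exactly the regression safety the answer is pointing at; Cloud Workflows would call this function as one orchestration step.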

Orchestrated vs Choreographed Service-Oriented Architecture at large scale?

I'm an architect at a large-scale financial company, and we are at the beginning of implementing a new business-oriented information system across our different countries.
From very early on, the core idea has been to follow microservice-oriented principles as much as possible (and to make sure engineers have read the book Building Microservices by Sam Newman).
By now I've come to a crossroads. Our services are primarily JSON REST services using Swagger for automated documentation, but in order to use these services in our business processes - while making sure not to write business logic into services outside their domain - we've been using Camunda as an orchestration tool. And Camunda is fine (though some have considered Corezoid as an alternative), but somewhat clumsy within what is otherwise an elegant set of services.
Now, service orchestration is a concept familiar to most engineers. But it is one I am not entirely happy with, because it still means a central engine that drives everything. That engine is incredibly expensive to replace later down the road (though still cheaper to replace than a monolith), and even splitting it into multiple engines (which is actually the case today) does not necessarily make things much better.
In recent years there has been a movement with microservices towards choreographed (close to event-driven) architecture. It is at this point where I am looking for advice from engineers and architects who have faced similar crossroad decision points.
I absolutely love the idea of a decoupled architecture, and despite feeling good about killing monoliths and having elegant independent services, I still detect a lot of dependencies in the business process as a whole in the current orchestrated solution, where they should not actually exist.
And it's not like we are avoiding events. We have actually implemented events in our architecture as well, in order to decouple many processes, with the core principle that if you don't need a synchronous response and just need to notify that something happened in order to initiate another process, an event is published that may be caught by another process, which then starts executing. On the other hand, orchestration is easier to explain and visualize, and easier to tweak and modify by more technically minded business users. I also think it is easier to test and validate from a business perspective. An orchestrated architecture like this also (usually) demands good service discovery, quality automated documentation, and well-defined non-functional requirements - all things I value greatly.
All of those things are open questions for me in the choreographed approach, since I don't have first-hand experience running it at large scale - just some local test prototypes.
But I think you see where I am coming from. I'm trying to consider the alternatives without driving the company all the other way, only to regret it in the end.
Perhaps you can share your own experience with a similar situation or share an interesting link or two? Or am I looking for a silver bullet that doesn't exist yet?
Services need to interact - services that don't interact are not part of the same system. The search needs access to the catalog, the cart doesn't get the price info from the page, the account needs the purchase history, the recommender needs purchase history, the cart needs to verify the currently available coupons, the inventory needs to know something was purchased etc.
Service boundaries are set to minimize the needed interactions. It can make sense to cut a service into smaller components, but if they share a database (internal structure), they are different aspects of the same service.
When services interact, it creates a level of coupling - at the very least, this coupling is some API (JSON or otherwise) that the service has to "maintain" so that other services can interact with it.
Another coupling type is temporal coupling - which is what you get in request/reply situations (and which you can eliminate in event-driven systems). However, orchestration vs. choreography is not about these differences (even though orchestration is mostly associated with request/reply) - it is about central control and governance vs. flexibility and serendipity.
Orchestration has risks like business logic migrating out of services into the orchestration layer, while choreography runs the risk of chaos. (By the way, direct request/reply integration has the worst of both worlds, but wins on simplicity when systems are small enough.)
Choosing between the two is a balancing act (like most architectural decisions). For instance, Netflix built on choreography for a long time, but then found they needed some control back and introduced an orchestration engine. Nothing is a silver bullet :)
Personally, I like choreography better because of the reduced coupling and flexibility, and I favor tools like OpenZipkin to bring some order into the chaos.
You can see a partial example of an orchestration-based architecture in slides 10-22 of a presentation I did about microservices.
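To make the orchestration/choreography distinction concrete, here is a toy Java sketch (all service names hypothetical): in the orchestrated variant a central engine owns the sequence via request/reply, while in the choreographed variant services independently react to a published event and no component owns the whole process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class OrchestrationVsChoreography {

    record OrderPlaced(String orderId) {}

    // --- Orchestration: a central engine drives each step via request/reply. ---
    static class Orchestrator {
        void handleOrder(String orderId) {
            reserveInventory(orderId); // step 1, synchronous
            chargePayment(orderId);    // step 2, only runs after step 1 returns
            // The process logic lives HERE, in the central engine.
        }
        void reserveInventory(String orderId) { System.out.println("inventory reserved for " + orderId); }
        void chargePayment(String orderId)    { System.out.println("payment charged for " + orderId); }
    }

    // --- Choreography: services subscribe to events; nobody owns the sequence. ---
    static class EventBus {
        private final List<Consumer<OrderPlaced>> subscribers = new ArrayList<>();
        void subscribe(Consumer<OrderPlaced> subscriber) { subscribers.add(subscriber); }
        void publish(OrderPlaced event) { subscribers.forEach(s -> s.accept(event)); }
    }

    public static void main(String[] args) {
        new Orchestrator().handleOrder("42");

        EventBus bus = new EventBus();
        bus.subscribe(e -> System.out.println("inventory reacts to " + e.orderId()));
        bus.subscribe(e -> System.out.println("payment reacts to " + e.orderId()));
        bus.publish(new OrderPlaced("42")); // each service decides for itself how to react
    }
}
```

The trade-off from the answer above is visible even in this toy: the orchestrator is a single place to read and change the process, while the event-driven version has no such place - which is both its flexibility and its risk of chaos.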
I think I understand where you're coming from, having recently redesigned a system to a "microservices" architecture. I like (and use) the approach by these guys: http://scs-architecture.org/
The main point is that you try to avoid cross-dependencies between your "services", which basically makes choreography obsolete. The hard part is decomposing your problem domain into chunks that do not need each other for any of the executed business cases. They may need different kinds of data that may or may not be "shared" (i.e., present in multiple systems), but they don't need synchronous calls between them for any given business case.
This is quite different from what Netflix is doing, for example. Those folks are chain-calling through different layers of services, each adding its logic to the "process". This model might fit in some cases, and probably fits Netflix's case. But it may not be necessary for you.
The ideal Self-Contained System would be completely independent of other Self-Contained Systems, would cover one or more highly cohesive business functions (in full depth, from the UI to persistence!), and would not call any other system synchronously. The ideal system lets the client "orchestrate", simply by offering links through its Web (HTML) interface.
Think more like Amazon. The "landing page" is a different application from "search", which is different again from "checkout". They are completely different, sometimes they even look a bit different! They are integrated by links and forms in HTML, not explicitly orchestrated.
This might be what you are looking for.
Some warnings: the first instinct of some people is to have a "Customer" microservice, a "Product Repository" microservice, and similar. This will not lead to Self-Contained Systems, as you will need synchronous calls to these things, making them essentially "central" components. The key is to split the business domain into bounded contexts à la Eric Evans.

Amazon States Language: Java Library

https://states-language.net/spec.html
Is there an API/library I can use to process states per this standard? I understand AWS resources are available; is there any Java library, independent of AWS, that can be used?
There is a complete, lightweight microservice platform (probably the fastest on the market) called light-4j. One of its subprojects is https://github.com/networknt/light-workflow-4j, which implements the Amazon States Language completely.
As of now, there are no Java implementations/bindings outside AWS.
Netflix Conductor blog states:
Recently announced AWS Step Functions added some of the features we were looking for in an orchestration engine. There is a potential for Conductor to adopt the states language to define workflows.
So - since their decider component is a Java service - maybe that could become a Java implementation someday.
https://github.com/totherik/step is based on Node.js and focused on OpenWhisk compatibility.
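Not one of the libraries mentioned above, just a sketch: to give a feel for what "processing states" per the spec involves if you roll it yourself, here is a toy interpreter, using Jackson, that walks StartAt/Next/End and handles Pass states only.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class TinyStatesInterpreter {
    public static void main(String[] args) throws Exception {
        // A two-state machine in Amazon States Language.
        String machine = """
            {
              "StartAt": "A",
              "States": {
                "A": { "Type": "Pass", "Result": "from A", "Next": "B" },
                "B": { "Type": "Pass", "Result": "from B", "End": true }
              }
            }
            """;

        JsonNode root = new ObjectMapper().readTree(machine);
        JsonNode states = root.get("States");
        String current = root.get("StartAt").asText();

        // Execute each state and follow "Next" until a state carries "End": true.
        while (true) {
            JsonNode state = states.get(current);
            if (!"Pass".equals(state.get("Type").asText())) {
                throw new UnsupportedOperationException("Toy interpreter: Pass states only");
            }
            System.out.println(current + " -> " + state.get("Result").asText());
            if (state.path("End").asBoolean(false)) break;
            current = state.get("Next").asText();
        }
    }
}
```

A real implementation also needs Task, Choice, Parallel, Wait, error handling, and input/output processing as defined in the spec, which is why a maintained library is preferable.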

'Workers and Workloads'

What are 'workers' and 'workloads' in distributed computing? I heard these terms recently. Are they general terms or vendor-specific?
Workloads are described in this article as:
an independent service or collection of code that can be executed. Therefore, a workload doesn’t depend on outside elements. A workload can be a small or complete application.
Workers are computers that you program to perform automated tasks within the cloud - tasks like monitoring databases, checking files on servers, etc.
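As a toy illustration of that definition, here is a hypothetical worker that wakes up on a schedule and performs an automatic check (the task itself is a made-up placeholder):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MonitoringWorker {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Wake up every minute and run an automatic task - the essence of a "worker".
        scheduler.scheduleAtFixedRate(
            () -> System.out.println("checking database health..."), // placeholder task
            0, 1, TimeUnit.MINUTES);
    }
}
```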
I found an article about IBM introducing a "workloads in the cloud" concept, but I think the first part of my response covers what you're curious about.