'Workers and Workloads' - amazon-web-services

What are 'Workers and Workloads' in Distributed Computing? Heard about these terms recently. Are these general terms or vendor-specific?

Workloads are described in this article as:
an independent service or collection of code that can be executed. Therefore, a workload doesn’t depend on outside elements. A workload can be a small or complete application.
Workers are computers that you program to perform automatic tasks within the cloud. Tasks like monitoring databases, checking files on servers, etc. All within the cloud.
I found an article which writes about IBM introducing a workloads in the cloud concept, but I think the first part of my response covers the thing you're curious about.

Related

Pros and Cons of Google Dataflow VS Cloud Run while pulling data from HTTP endpoint

This is a design approach question where we are trying to pick the best option between Apache Beam / Google Dataflow and Cloud Run to pull data from HTTP endpoints (source) and put them down the stream to Google BigQuery (sink).
Traditionally we have implemented similar functionalities using Google Dataflow where the sources are files in the Google Storage bucket or messages in Google PubSub, etc. In those cases, the data arrived in a 'push' fashion so it makes much more sense to use a streaming Dataflow job.
However, in the new requirement, since the data is fetched periodically from an HTTP endpoint, it sounds reasonable to use a Cloud Run spinning up on schedule.
So I want to gather pros and cons of going with either of these approaches, so that we can make a sensible design for this.
I am not sure this question is appropriate for SO, as it opens a big discussion with different opinions, without clear context, scope, functional and non functional requirements, time and finance restrictions including CAPEX/OPEX, who and how is going to support the solution in BAU after commissioning, etc.
In my personal experience - I developed a few dozens of similar pipelines using various combinations of cloud functions, pubsub topics, cloud storage, firestore (for the pipeline process state managemet) and so on. Sometimes with the dataflow as well (embedded into the pipelieines); but never used the cloud run. But my knowledge and experience may be not relevant in your case.
The only thing I might suggest - try to priorities your requirements (in a whole solution lifecycle context) and then design the solution based on those priorities. I know - it is a trivial idea, sorry to disappoint you.

Can Cloud Workflows be used for both orchestration AND transformation?

I hope you guys are doing well.
We are evaluating some solutions (Apache Camel K and the likes) to allow teams to:
Low Code protocol transformation (Kafka, FTP, S3, MQ, SOAP, SFTP, gRPC, GraphQL, etc.) One team in particular has to integrate their product with 100s of external partners (each one uses a different integration technology), and writing each integration "by hand" would be a waste of time/motivation.
Enrich integrations' payloads (by calling both internal and external services)
Pay per execution/transformation/step (SERVERLESS)
Orchestrate processes that span multiples domain/services (On either our GCP account or Partners external Datacenters)
Strong retry and monitoring capabilities
Be part of our CI/CD pipeline (and not be limited to a Graphical interface)
The items in bold seem to be part of what Cloud Workflows does natively, but are the other requirements something that can be added to (or achieved with) GCW to keep it "serverless"? Please.
Any help would be appreciated.
Thanks
Cloud Workflow can perform basic transformation (on string or date), but I can't recommend that. It's better to have a Cloud Functions or a Cloud Run that perform a transformation with code. You will be able to write unit test on it and to ensure the quality and the evolution of your system without regressions.
For orchestration, it's the purpose of Cloud Workflows. Now, there is also some limits, or some corner case less easy to achieve with it. It depends on the complexity of your process and your expectations (observability, portability, replayability,...)

How does Google Cloud Run spin up instantly

So, I really like the idea of server-less. I came across Google Cloud Functions and Google Cloud Run.
So google cloud functions are individual functions, which is a broad perspective, I assume google must be securely running on a huge nodejs server. And it contains all the functions of all the google consumers and fulfils the request using unique URLs. Now, Google takes care of the cost of this one big server and charges users for every hit their function gets. So its pay to use. And makes sense.
But when it comes to Cloud Run. I fail to understand how does it work. Obviously the container must not always be running because then they will simply charge a monthly basis instead of a per-hit basis, just like a normal VM where docker image is deployed. But no, in reality, they charge on per hit basis, that means they spin up the container when a request arrives. So, I don't understand how does it spin it up so fast? The users have the flexibility of running any sort of environment, that means the docker container could contain literally anything. Maybe a full-fledged Linux OS. How does it load up the environment OS so quickly and fulfils the request? Well, maybe it maintains the state of the machine and shut it down when not in use, but even then, it will require a decent amount of time to restore the state.
So how does google really does it? How is it able to spin up a customer's container in literally no time?
The idea of fast spinning-up sandboxes containers (that run on their own kernel for security reasons) have been around for a pretty long time. For example, Intel Clear Linux Containers and Firecracker provide fast startup through various optimizations.
As you can imagine, implementing something like this would require optimizations at many layers (scheduling, traffic serving, autoscaling, image caching...).
Without giving away Google’s secrets, we can probably talk about image storage and caching: Just like how VMs use initramfs to pre-cache the state of the VM, instead of reading all the files from harddisk and following the boot sequence, we can do similar tricks with containers.
Google uses a similar solution for Cloud Run, called gVisor. It's a user-space virtualization technique (not an actual VMM or hypervisor). To run containers on a Linux-like environment, gVisor doesn't need to boot a Linux kernel from scratch (because gVisor reimplements the linux kernel in go!).
You’ll find many optimizations on other serverless platforms across most cloud providers (such as how to keep a container instance around, should you be predictively scheduling inactive containers before the load arrives). I recommend reading the Peeking Behind the Curtains of Serverless Platforms paper to get an idea about what are the problems in this space and what are cloud providers trying to optimize for speed and cost.
You have to decouple the containers to the VMs. The second link of Dustin is great because if you understand the principles of Kubernetes (and more if you have a look to Knative), it's easy to translate this to Cloud Run.
You have a pool of resources (Nodes in Kubernetes, the VM in fact with CPU and memory) and on these resources, you can run container: 1, 2, 1000 per VM, maybe, you don't know and you don't care. The power of the container, is the ability to be packaged with all the dependency that it needs. Yes, I talked about package because your container isn't an OS, it contains the dependencies for interacting with the host OS.
For preventing any problem between container from different project/customer, the container run into a sandbox (GVisor, first link of Dustin).
So, there is no VM to start and to stop, no VM to create when you deploy a Cloud Run services,... It's only a start of your container on existing resources. It's also for this reason that you need to have a stateless container, without disks attached to it.
Do you want 3 "secrets"?
It's exactly the same things with Cloud Functions! Your code is packaged into a container and deploy exactly as it's done with Cloud Run.
The underlying platform that manages Cloud Functions and Cloud Run is the same. That's why the behavior and the feature are very similar! Cloud Functions is longer to deploy because Google need to build the container for you. With Cloud Run the container is already built.
Your Compute Engine instance is also managed as a container on the Google infrastructure! More generally, all is container at Google!

Orchestrated vs Choreographed Service-Oriented Architecture in large scale?

I'm an architect in a large scale financial company and we are in the beginning of implementing a new business oriented infosystem across our different countries.
From the very early on the core idea has been to follow microservice oriented principles as much as possible (and making sure engineers have read Building Microservices book by Sam Newman).
By now I've come to crossroads. Our services are primarily JSON REST services using Swagger for automated documentation, but in order to use these services in our business processes and making sure not to write business logic into services outside the domain of those services, we've been using Camunda as an orchestration tool. And Camunda is fine (though some have considered Corezoid as an alternative), but somewhat clumsy in what is an otherwise an elegant set of services.
Now service orchestration is a concept pretty familiar to most engineers. But it is one that I am not entirely happy with due to still having a central engine that drives everything. It is incredibly expensive to replace later down the road (though still cheaper to replace than a monolith). And even if this central engine is split into multiple engines (which is actually the case today), it does not necessarily make it much better.
In recent years there has been a movement with microservices towards choreographed (close to event-driven) architecture. It is at this point where I am looking for advice from engineers and architects who have faced similar crossroad decision points.
I absolutely love the idea of decoupled architecture and despite feeling good about killing monoliths and having elegant independent services, I still detect a lot of dependencies in business process as a whole in current orchestrated solution in where it should not actually exist.
And it's not like we are avoiding events. We have actually implemented events on our architecture as well in order to decouple many processes with the core principle that if you don't need a synchronized response and just need to notify of something happening to initiate another process an event is put up that may be caught by another process that starts executing. And orchestration is easier to explain and visualize, it is easier to tweak and modify by more technical minded business users. And I think it is easier to test and validate from business perspective. Orchestrated architecture like this also (usually) expects a good service discovery and quality automated documentation and non-functional requirements which are all things I value greatly.
All of those things that are a question to me in choreographed approach since I don't have first-hand experience in running this in large scale - just some local test prototypes.
But I think you see where I am coming from. I'm trying to consider alternatives without having to regret driving the company all the other way in the end.
Perhaps you can share your own experience with a similar situation or share an interesting link or two? Or am I looking for a silver bullet that doesn't exist yet?
Services need to interact - services that don't interact are not part of the same system. The search needs access to the catalog, the cart doesn't get the price info from the page, the account needs the purchase history, the recommender needs purchase history, the cart needs to verify the currently available coupons, the inventory needs to know something was purchased etc.
Service boundaries are set to minimize the needed interactions. It can make sense to cut a service to smaller components but if they share a database (internal structure) they are different aspects of the service.
When services interact it creates a level of coupling - at the least, this coupling is some API (JSON or otherwise) that the service has to "maintain" for so other services can interact with it.
Another coupling type is temporal coupling - which is what you get in request-reply situations (and you can eliminate in event driven systems) However, Orchestration vs. Choreography is not about these differences (even though orchestration is mostly associated with request/reply) - it is about central control and governance vs.flexibility and serendipity.
Orchestration has risks like migrating business logic out of services into the orchestration while choreography runs the risk of chaos. By the way, direct request/reply integration has the worst of both worlds but wins on simplicity when systems are small enough.
Choosing between the two is a balancing act (like most architectural decisions) for instance, Netflix built on choreography for a lot of time but then found they need some control back and introduced an orchestration engine. Nothing is a silver bullet :)
Personally, I like choreography better because of the reduced coupling and flexibility and favor tools like open Zipkin to bring some order into the chaos.
You can see a partial example for an orchestration based arch in slides 10-22 of a presentation I did about microservices
I think I understand where you're coming from, having recently redesigned a system to a "microservices" architecture. I like (and use) the approach by these guys: http://scs-architecture.org/
The main point is, that you try to avoid cross-dependencies between you "services", which basically makes choreography obsolete. The hard part is decomposing your problem domain into chunks which do not need eachother for any of the executed business-cases. They may need different kinds of data that may or may not be "shared", as in present in multiple systems, but they don't need synchronous calls between them for any given business case.
This is quite different from what Netflix is doing for example. Those guys/gals are doing chain-calling through different layers of services, each adding its logic to the "process". This model might fit in some cases, and probably fits in Netflix's case. But it may not be necessary for you.
The ideal Self-Contained System would be completely independent of other Self-Contained Systems, would cover one or more highly cohesive business functions (in full depth from the UI to Persistence!), and would be not calling any other system synchronously. The ideal system would let the client "orchestrate", by just offering links through its Web (HTML) interface.
Think more like Amazon. The "Landing Page" is a different application than the "Search", which is still different from the "Checkout". They are completely different, sometimes even look a bit different! Integrated by links and forms in HTML, not explicitly orchestrated.
This might be what you are looking for.
Some warnings: First instinct of some people is to have "Customer" microservice, or "Product Repository" microservice, and similar. This will not lead to Self-Contained Systems, as you will need synchronous calls to these things, making them essentially "central" components. The key is to split the business domain, so bounded contexts a la Eric Evans.

SOA - How granular should services be to maintain performance?

I am taking over a project to replace an ancient legacy system from the ground up. Before I came on, the company hired a consultant who put together a basic sketch of the system and pushed SOA heavily. This resulted in a long list of "entity services", with the intention of them being composed into more complex service combinations. For instance, a user wanting committee info would hit the "Committee" service, which then calls the "Person" service to get its members, and the "Meeting" service to get its meetings, and so on.
I understand the flexibility gains in this, but my concerns are about performance. It seems to me that a system built with such a fine level of granularity to its services spends too many resources on translating service messages, and the performance will be unacceptable. It also seems to me that the flexibility gains can still be made using basic reusable objects, although in that case the benefit of a technology-agnostic interface is lost to gain performance.
For more background: The organization requesting this software does not currently have a stable of third-party software suites that need integrating with. This software will replace all suites. There are also currently no outside consumers who need to access the data outside of the provided website interface -- all service calls will be from other pieces inside our system. The choice of SOA in this case seems entirely based on the concept of "preparation".
So my question -- what level of granularity is acceptable in a stable of services without sacrificing performance? Am I being too skeptical of the performance hits we'll take implementing all our entities as services? Should functionality be available as web services only when they are needed, with the "preparation" focus instead going into designing the business layer for the probability of services later being dropped on top of it?
First off, finding the "sweet spot" in the number of services is difficult for sure. Too many services, and your integration costs suffer, too few, and your implementation costs suffer. You have to find a good balance.
My advice to you is to follow Juval Lowy's methodology in that you should break down your services by areas of volatility, or areas of change. This will give you your granularity level. You should also read his WCF book if you can.
As for the performance, WCF will inherently support many thousands of calls per second depending on your use cases and hardware. Services calling services is not a problem. The platform will support it if you do your part. For example, use the right binding for the right scenario (named pipes to call services on the same machine and TCP to call services across machine where possible). You should also implement a vertical slice of the application and do performance testing before building the rest of the application. This will verify your architecture.
When I say "Service", I mean the complete vertical component that can perform a complete independent operation. And I don't prefer going in more granularity unless there is exceptional requirement. In my view of SOA, A service should perform the meaningful business function that can be independently performed. A service should not require another service to complete its function.
What level of granularity is acceptable in a stable of services without sacrificing performance?
Individual entities. As described by the consultant.
Am I being too skeptical of the performance hits we'll take implementing all our entities as services?
Yes. Way too skeptical.
A decent framework can optimize some of these requests so that they don't involve a lot of network overheads.
As with SQL databases, the problems are largely solved. You'll find that the underlying applications that you're presenting as services are the bottlenecks. The SOA layer is largely plumbing. The bits still need to move through the pipes, the SOA layer just organizes them more intelligently than most of the alternatives.
Should services only be implemented when they are needed, with the "preparation" focus instead going into designing the business layer for the probability of services later being dropped on top of it?
Yes.
That's what "Agile" means.
Find a user story. Build just the services (and entities) for that story.
You will have some significant overhead for the first few stories in getting your SOA framework all squared away and deployable as a simple, repeatable release step.
Never do extensive "preparation" for things you "may" need in some improbable future. Read up on Agile and how to prioritize a backlog.