I am playing around with containers (Docker) and trying to create several extraction jobs.
Long story short, currently I have a huge container that runs for several hours doing these extractions.
I want to break this Docker image into several smaller images, but there is a catch: some of these extractions need to run sequentially. For example, in some cases a given container can only start after a previous one has finished.
Is it possible to do this with Docker Compose? Is there any other solution? I would like to avoid Kubernetes and run this on AWS or Azure (with Azure Container Instances or AWS ECS).
Thanks
First of all, split your application into as many components as you need. Ideally, one container should carry out one task only.
Then, given that you want to run the containers sequentially, you should adopt a state machine service. On AWS, you can use Step Functions. On Azure, you can use Logic Apps. With either of these services, you can run containers in sequence, or even in parallel, as needed.
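For illustration only, here is a minimal sketch of such a sequence as a Step Functions state machine that runs two ECS/Fargate tasks one after the other. The native Amazon States Language is JSON; it is rendered as YAML here purely for readability, and the cluster and task definition names are placeholders, not anything from your setup:

    # Hypothetical sketch: ExtractionB only starts after ExtractionA exits.
    # The .sync suffix makes Step Functions wait for the ECS task to finish.
    StartAt: ExtractionA
    States:
      ExtractionA:
        Type: Task
        Resource: arn:aws:states:::ecs:runTask.sync
        Parameters:
          LaunchType: FARGATE
          Cluster: arn:aws:ecs:eu-west-1:111111111111:cluster/extractions  # placeholder
          TaskDefinition: extraction-a                                     # placeholder
          # A real Fargate task also needs a NetworkConfiguration (subnets,
          # security groups); omitted here to keep the sketch short.
        Next: ExtractionB
      ExtractionB:
        Type: Task
        Resource: arn:aws:states:::ecs:runTask.sync
        Parameters:
          LaunchType: FARGATE
          Cluster: arn:aws:ecs:eu-west-1:111111111111:cluster/extractions  # placeholder
          TaskDefinition: extraction-b                                     # placeholder
        End: true

On Azure, the equivalent would be a Logic Apps workflow whose steps start your container groups in order.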
Related
I have a set of calculations that needs to run in a batch, and the workload is easily parallelized across machines. The work to be done is already packaged in a Docker container. I'm trying to understand the easiest way for me to run this workload in a highly parallel way on AWS. However, in trying to figure out where to begin I'm having trouble finding the right entry point. I read about AWS Batch and AWS Fargate, but each time I try to go down one of those paths to learn about them in more detail, more AWS services start popping up (Lambdas, Step Functions, ECS, Auto Scaling groups), with each article having a different combination. Furthermore, I start thinking about the problem as a Batch vs Fargate problem, and then I find another article that talks about Batch + Fargate, or X + ECS + ....
I'm having trouble finding the appropriate introduction to the choices so I can get started with setting something up and getting some experience. Any pointers on which direction I might go or some resources for me to look at?
AWS containers services team member here. Your question pushes all my buttons because I have been working on a deliverable to address some of this confusion ("where do I start with xyz?"). I can try to answer your question briefly here, but if you want to read more (perhaps way more than you'd need), feel free to contact me offline (mreferre at amazon dot com will work).
First and foremost, it's not a versus, it's an and. Think of all these products you mention as sitting at different layers of the stack (there is a draft visual of this in the deliverable).
Fargate represents capacity (where your container is running), ECS represents a core container orchestrator, and Batch is one of the provisioners on top of the container orchestrator. Lambda is something separate that lives on its own. The options for your specific use case seem to be:
Lambda
ECS/Fargate
Batch/ECS/Fargate
Step Functions/ECS/Fargate (this one is outside of the analysis and you don't see it in my visual; I'm wondering if I should add it).
As others have hinted, you probably want to use Lambda if your model is event-driven (e.g. if you want to fire up a dedicated function for every event, like a new file uploaded to S3).
You probably do not want to use a naked ECS/Fargate solution because it would require more work to deal with the triggering and the scheduling of your batch jobs.
You probably want to use either Batch or Step Functions to schedule jobs on ECS/Fargate. I'd argue SF is good if you have basic workflows to deal with, and Batch if you need to manage complex jobs at scale. Perhaps this 35-minute presentation that I did last year can provide a bit more background on these Batch vs SF differences.
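To make the layering a bit more concrete, here is a rough sketch of what the Batch/ECS/Fargate combination boils down to: you register a job definition that points at your container image, attach it to a job queue backed by a Fargate compute environment, and submit one job per work item. The RegisterJobDefinition API takes JSON; it is rendered as YAML here for readability, and every name below is a placeholder:

    # Hypothetical job definition: Batch schedules this container onto
    # Fargate capacity via ECS; you never manage the instances yourself.
    jobDefinitionName: parallel-worker                                # placeholder
    type: container
    platformCapabilities:
      - FARGATE
    containerProperties:
      image: 111111111111.dkr.ecr.us-east-1.amazonaws.com/worker:v1  # placeholder image
      command: ["python", "run_chunk.py"]                            # placeholder command
      executionRoleArn: arn:aws:iam::111111111111:role/batch-exec    # placeholder role
      resourceRequirements:
        - type: VCPU
          value: "1"
        - type: MEMORY
          value: "2048"

Each SubmitJob call against a queue using this definition then becomes one container run, with Batch handling queueing, placement, and retries.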
Let me know if you have any additional questions because this discussion is super useful for the positioning I am trying to build.
I am trying to build a web application with Kubernetes and Amazon Web Services.
I believe there are many different approaches but I would like to ask for your opinions!
To make it simple, the app would be a simple web page that displays information. A user logs in and, based on filters, gets specific information.
My question is about the internal k8s architecture:
Can I run my whole application as a Pod? That way, cluster and node scaling would make the app available to each user by allocating each of them one pod. Is that good practice?
Following that logic, every element of my app would be a container. For instance, in a simple setup: one container for the front end of the app, one container for data access/management, one container for the backend/auth, etc.
So one user would "consume" one pod, whose containers talk to each other to return the data the user requested. k8s would create a pod for every user, scaling the number of nodes up and down, etc.
But then, except for the data itself, everything would be dockerized and stored in ECR (Elastic Container Registry), right? So, in my opinion, there would be no need for any S3/EBS/EFS.
I am quite new at AWS and k8s, so please feel free to give honest opinions :) Feedback, good or bad, is always good to take.
Thanks in advance!
I'd recommend a layout where:
Any container can perform its unit of work for any user
One container per pod
Each component of your application has its own (stateless) deployment
It probably won't work well to try to put your entire application into a single multi-container pod. This means the entire application would need to fit on a single node; some applications are larger than that, and even if it fits, it can lead to trouble with scheduling. It also means that, if you update the image for any single container, you need to delete and recreate all of the containers, which could be more disruptive than you want.
Trying to create a pod per user also will present some practical problems. You need to figure out how to route inbound requests to a particular user's pod, and keep requests within that user's set of containers; Kubernetes doesn't have any sort of native support for this. A per-user pod will also keep using resources even if it's overnight or the weekend for that user and they're not using the application. You would also need something with access to the Kubernetes API to create and destroy resources as new users joined your platform.
In an AWS-specific context you might consider using RDS (hosted PostgreSQL/MySQL) or S3 (object storage) for data storage (and again one database or S3 bucket shared across all customers). ECR is useful, but as a place to store your Docker images; that is, your built code, but not any of the persisted data or running containers.
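To make the recommended layout concrete, here is a minimal sketch of one component as its own stateless Deployment plus a Service; the "frontend" name, image, and port are placeholders, and each other component (data access, backend/auth, etc.) would get a similar pair of its own:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: frontend                 # placeholder component name
    spec:
      replicas: 3                    # scale by replica count, not per user
      selector:
        matchLabels:
          app: frontend
      template:
        metadata:
          labels:
            app: frontend
        spec:
          containers:
            - name: frontend         # one container per pod
              image: 111111111111.dkr.ecr.us-east-1.amazonaws.com/frontend:v1  # placeholder ECR image
              ports:
                - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: frontend
    spec:
      selector:
        app: frontend
      ports:
        - port: 80
          targetPort: 8080           # routes any user's request to any ready pod

Any replica can serve any user, so scaling is a matter of adjusting replica counts (or adding a HorizontalPodAutoscaler) rather than creating per-user resources.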
I am monitoring an EC2 instance using Prometheus, Node Exporter, and Grafana, each of which runs in its own container.
I thought that putting each of the 3 monitoring tools in its own container would make the system easier to set up and make it run faster. Is it true that the system would run faster?
To start all 3 monitoring tools, I have a docker-compose file that starts all 3 containers at once. Would it be beneficial to run all 3 monitoring tools in one container instead of separate containers?
Here is the current system architecture
It is probably not a good idea to try to combine the processes into a single container.
There are various reasons:
Containers aren't like VMs; there's no significant overhead in running X processes in X containers;
In fact, keeping the processes in separate images (and thus running them in separate containers) permits each to be maintained (e.g. patched) distinctly;
... And it permits you to potentially secure each process distinctly;
It's considered good practice to run one process per container;
Keeping the processes in their own containers permits you to e.g. run one Prometheus container and one Grafana container that serve multiple monitored containers or hosts;
...following on from this, it also permits more flexibility in relocating containers, e.g. dropping the Grafana container in favor of a hosted Grafana service.
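As a point of reference, a minimal docker-compose sketch of the separate-containers layout could look like the following (the image tags, ports, and config-file path are the usual defaults and are assumptions, not taken from your setup):

    services:
      prometheus:
        image: prom/prometheus:latest
        ports:
          - "9090:9090"
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml  # scrape config kept outside the image
      node-exporter:
        image: prom/node-exporter:latest
        ports:
          - "9100:9100"
      grafana:
        image: grafana/grafana:latest
        ports:
          - "3000:3000"
        depends_on:
          - prometheus               # Grafana reads from Prometheus as a data source

Each service can then be upgraded, secured, or relocated (e.g. to a hosted Grafana) independently of the other two.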
So, I really like the idea of serverless. I came across Google Cloud Functions and Google Cloud Run.
Google Cloud Functions are individual functions, so from a broad perspective I assume Google must be securely running them on a huge Node.js server. That server contains all the functions of all of Google's customers and fulfils requests using unique URLs. Now, Google takes care of the cost of this one big server and charges users for every hit their function gets. So it's pay-per-use, and that makes sense.
But when it comes to Cloud Run, I fail to understand how it works. Obviously the container must not always be running, because then they would simply charge on a monthly basis instead of a per-hit basis, just like a normal VM where a Docker image is deployed. But no, in reality they charge on a per-hit basis, which means they spin up the container when a request arrives. So I don't understand how it spins the container up so fast. Users have the flexibility of running any sort of environment, which means the Docker container could contain literally anything, maybe a full-fledged Linux OS. How does it load up the environment/OS so quickly and fulfil the request? Well, maybe it maintains the state of the machine and shuts it down when not in use, but even then, it would take a decent amount of time to restore the state.
So how does Google really do it? How is it able to spin up a customer's container in literally no time?
The idea of fast-starting sandboxed containers (that run on their own kernel for security reasons) has been around for a pretty long time. For example, Intel Clear Linux Containers and Firecracker provide fast startup through various optimizations.
As you can imagine, implementing something like this would require optimizations at many layers (scheduling, traffic serving, autoscaling, image caching...).
Without giving away Google's secrets, we can probably talk about image storage and caching: just as VMs use initramfs to pre-cache the state of the VM instead of reading all the files from the hard disk and following the boot sequence, we can do similar tricks with containers.
Google uses a similar solution for Cloud Run, called gVisor. It's a user-space virtualization technique (not an actual VMM or hypervisor). To run containers in a Linux-like environment, gVisor doesn't need to boot a Linux kernel from scratch (because gVisor reimplements the Linux kernel in Go!).
You'll find many optimizations on other serverless platforms across most cloud providers (such as how long to keep a container instance around, and whether to predictively schedule inactive containers before the load arrives). I recommend reading the Peeking Behind the Curtains of Serverless Platforms paper to get an idea of the problems in this space and what cloud providers are trying to optimize for speed and cost.
You have to decouple the containers from the VMs. Dustin's second link is great because if you understand the principles of Kubernetes (and even more so if you have a look at Knative), it's easy to translate this to Cloud Run.
You have a pool of resources (Nodes in Kubernetes; in fact VMs with CPU and memory), and on these resources you can run containers: 1, 2, 1000 per VM, maybe; you don't know and you don't care. The power of the container is that it is packaged with all the dependencies it needs. I say "packaged" because your container isn't an OS; it contains the dependencies for interacting with the host OS.
To prevent any problems between containers from different projects/customers, the containers run in a sandbox (gVisor, Dustin's first link).
So there is no VM to start and stop, and no VM to create when you deploy a Cloud Run service; it's only your container starting on existing resources. This is also why you need a stateless container, without disks attached to it.
Do you want 3 "secrets"?
It's exactly the same thing with Cloud Functions! Your code is packaged into a container and deployed exactly as it is with Cloud Run.
The underlying platform that manages Cloud Functions and Cloud Run is the same. That's why the behavior and the features are very similar! Cloud Functions takes longer to deploy because Google needs to build the container for you; with Cloud Run the container is already built.
Your Compute Engine instance is also managed as a container on Google's infrastructure! More generally, everything is a container at Google!
I have a scenario where our Azure App Service needs to run a job every night. The job cannot scale to multiple machines -- it involves downloading a large data file and doing special processing on it (it only takes a couple of minutes). Special software will need to be installed as well. A lot of memory will be needed on the machine for the computation, so I was thinking of one of the Ev-series machines. For these reasons, I cannot run the job as a web job on the Azure App Service, and I need to delegate it elsewhere.
Anyway, I have experience with Azure Batch, so at first I was thinking of Azure Batch. But I am not sure this makes sense for my scenario because the work cannot scale to multiple machines. Does it make sense to have a pool with a single node (a single VM)? When I need to do the work, an Azure web job enqueues the job, and the pool automatically scales from 0 to 1?
Are there better options out there? I was looking to see if there are any .NET libraries to spin up a single VM, start executing work on it, and then disable the VM when done, but I couldn't find anything.
Thanks!
For Azure Batch, the scenario of a single VM in a single pool is valid. Azure Container Instances or Azure Functions would appear to be a better fit, however, if you can provision the appropriate VM sizes for your workload.
As you suggested, you can combine Azure Functions/Web Jobs to enqueue the work to an Azure Batch Job. If you have autoscaling or an autopool set on the Azure Batch Pool or Job, respectively, then the work will be processed and the compute resources will be deallocated after (assuming you have the correct settings in-place).