Is it possible to run the Carla simulator in a serverless manner on the cloud (AWS in particular)?

As part of my project, I need to run Carla Simulator on the cloud. I'd be developing around it and I need to be able to use the cloud to run and record simulations.
One of the requirements is that cloud usage be as cheap as possible, hence my preference for going serverless: EC2 instances are quite expensive, especially with the resources Carla needs.
I have tried using the Fargate launch type but discovered that it doesn't support the GPU resources Carla needs. EKS can use either Fargate or EC2 instances; the former has the same problem and the latter isn't an option due to cost. The only option I see with potential is AWS RoboMaker, which I've tried to use, but I'm not sure whether the Carla images and Dockerfiles match RoboMaker's requirements. I'm also not sure this would even fulfil the needs of my project, as RoboMaker seems to have prohibitive limitations.
I'm not at all experienced with other cloud providers but would be open to using them if there is a strong case for them over AWS.

Related

Can we run an application that is configured to run on a multi-node AWS EC2 K8s cluster using kops on a local Kubernetes cluster (using kubeadm)?

Can we run an application that is configured to run on a multi-node AWS EC2 K8s cluster using kops (project link) on a local Kubernetes cluster (set up using kubeadm)?
My thinking is that if the application runs in a k8s cluster based on AWS EC2 instances, it should also run in a local k8s cluster. I am trying it locally for testing purposes.
Here's what I have tried so far, but it is not working.
First, I set up my local 2-node cluster using kubeadm.
Then I modified the project's installation script (link given above) by removing all references to EC2 (as I am using local machines) and to kops state (particularly in their create_cluster.py script).
I also modified their application YAML files (app requirements) to match my local setup (2 nodes).
Unfortunately, although most of the application pods are created and running, some other application pods cannot be created, so I am unable to run the whole application on my local cluster.
I appreciate your help.
This is the beauty of Docker and Kubernetes: they help keep your development environment matching production. For simple applications written without custom resources, you can deploy the same workload to any cluster running on any cloud provider.
However, the ability to deploy the same workload to different clusters depends on several factors, such as:
How do you manage authentication and authorization in your cluster? For example, IAM or IRSA (IAM Roles for Service Accounts).
Are you using any cloud-specific resources or controllers? For example, AWS load balancers (ALB/NLB) provisioned for Ingresses or LoadBalancer Services.
Are you using any cloud-native storage? For example, do your pods rely on EFS/EBS volumes?
Is your application cloud agnostic? For example, does it use AWS-native technologies like Neptune?
Can you mock cloud technologies locally? For example, using LocalStack to mock Kinesis and DynamoDB.
How do you resolve DNS? For example, say you are using RDS on AWS and access it through a Route 53 entry. Locally you might be running a MySQL instance, and you need a DNS mechanism to discover that instance.
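As a rough first check along these lines, a short bash sketch can scan the manifests for AWS-specific references (EBS/EFS volumes, AWS load balancer annotations, IRSA role annotations). These are common reasons an AWS-targeted workload only partly works on a kubeadm cluster; for example, a PVC whose storage class has no provisioner leaves its pod Pending. The marker strings are real AWS/Kubernetes identifiers, but the list is illustrative rather than exhaustive, and the function name `find_aws_dependencies` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: flag AWS-specific references in Kubernetes manifests before
# deploying them to a non-AWS (e.g. kubeadm) cluster.
# The marker list is illustrative, not exhaustive.

AWS_MARKERS=(
  'kubernetes.io/aws-ebs'                         # in-tree EBS volume provisioner
  'ebs.csi.aws.com'                               # EBS CSI driver
  'efs.csi.aws.com'                               # EFS CSI driver
  'alb.ingress.kubernetes.io'                     # AWS Load Balancer Controller annotations
  'service.beta.kubernetes.io/aws-load-balancer'  # ELB/NLB Service annotations
  'eks.amazonaws.com/role-arn'                    # IRSA service-account annotation
)

# find_aws_dependencies DIR
# Prints "file:line:text" for every *.yaml/*.yml line under DIR that contains
# one of the markers. Exit status follows grep: 0 = found, 1 = none found.
find_aws_dependencies() {
  local dir="$1" marker
  local patterns=()
  for marker in "${AWS_MARKERS[@]}"; do
    patterns+=(-e "$marker")
  done
  grep -rnF --include='*.yaml' --include='*.yml' "${patterns[@]}" -- "$dir"
}
```

Usage: `find_aws_dependencies ./k8s || echo "no AWS-specific references found"`. Each hit points at something that needs a local surrogate (a different storage class, ingress controller, or credential mechanism).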
I did a Google search and looked at the kOps documentation. I could not find any information about deploying to a local cluster; it appears to support only cloud providers.
IMO, you need to figure out a way to set up a local equivalent of your AWS cluster and, wherever the application uses cloud-native technologies, find an alternative way of doing the same locally.
The true answer, as Rajan Panneer Selvam said in his response, is that it depends, but I'd like to expand somewhat on his answer: your application should run on any K8S cluster, provided the cluster offers the services that the application consumes. What you're doing is considered good practice to ensure that your application is portable, which is always a factor in non-trivial applications, where simply upgrading a downstream service could count as a change of environment/platform that requires portability (platform independence).
To help you achieve this, you should be developing a 12-Factor Application (12-FA) or one of its more up-to-date derivatives (12-FA is getting a little dated now and many variations have been suggested, but mostly they're all good).
For example, if your application uses a database, then it should use database-independent SQL (or NoSQL) so that you can switch the database out. In production you may run on Oracle, but in your local environment you may use MySQL: your application should not care. To help you achieve this, the credentials and connection string should be passed to the application via the usual K8S mechanisms of Secrets and ConfigMaps. And all logging should be sent to stdout (and stderr) so that you can use a log-shipping agent to send the logs somewhere more useful than a local filesystem.
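A minimal bash sketch of that idea, assuming the settings arrive as environment variables (in a Pod spec, `envFrom` with a `configMapRef` and a `secretRef` can populate them). The variable names `DB_DRIVER`, `DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASSWORD` and `DATABASE_URL` are hypothetical. Switching from Oracle in production to MySQL locally then only changes the injected values, not the code, and all log output goes to stdout/stderr.

```shell
#!/usr/bin/env bash
# Sketch of a 12-factor style entrypoint: environment-specific settings come
# only from environment variables (populated by Kubernetes from a ConfigMap
# and a Secret), and logs go to stdout/stderr, never to a local file.
# All variable names below are hypothetical.

log()   { printf '%s INFO %s\n'  "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*"; }
error() { printf '%s ERROR %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >&2; }

# build_db_url: assemble a connection URL from the environment, failing
# loudly when a required value is missing instead of guessing.
build_db_url() {
  local var
  for var in DB_DRIVER DB_HOST DB_NAME DB_USER DB_PASSWORD; do
    if [[ -z "${!var:-}" ]]; then
      error "missing required setting: $var"
      return 1
    fi
  done
  # DB_PORT is optional; 3306 is MySQL's default port.
  printf '%s://%s:%s@%s:%s/%s\n' "$DB_DRIVER" "$DB_USER" "$DB_PASSWORD" \
    "$DB_HOST" "${DB_PORT:-3306}" "$DB_NAME"
}

# start CMD [ARGS...]: log (without secrets) and replace this shell with the
# real application, handing it the URL via DATABASE_URL.
start() {
  local url
  url=$(build_db_url) || return 1
  log "starting with database host $DB_HOST"
  DATABASE_URL="$url" exec "$@"
}
```

The same image then runs unchanged in both environments; only the ConfigMap and Secret differ per cluster.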
If you run your app locally, then you have to provide a surrogate for every 'platform' service that is provided in production, and this may mean switching out major components of what you consider to be your application. That is OK; it is meant to happen. You provide a platform that provides services to your application layer. Switching from EC2 to local may mean reconfiguring the ingress controller to work without the ELB, or configuring Kubernetes Secrets to use local storage for dev credentials rather than AWS KMS. It may mean reconfiguring your persistent volume storage classes to use local storage rather than EBS. All of this is expected and right.
What you should not have to do is start editing microservices to work in the new environment. If you find yourself doing that then the application has made a factoring and layering error. Platform services should be provided to a set of microservices that use them, the microservices should not be aware of the implementation details of these services.
Of course, it is possible that you have some non-portable code in your system; for example, you may be using some Oracle-specific PL/SQL that can't be run elsewhere. This code should be extracted to config files, with equivalents provided for each database you wish to run on. This isn't always possible, in which case you should abstract as much as possible into isolated services; you'll then have to reimplement only those services on each new platform, which could still be time-consuming but is ultimately worth the effort for most non-trivial systems.

Enabling all APIs in a Google Cloud project

Google Cloud requires an API to be enabled before many things can be done.
Enabling one takes just one CLI command and is usually very fast. The CLI even offers to enable it if I try to do something that requires a disabled API. But it still interrupts development.
My question is: why are they not enabled by default? And is it OK to enable them all right after creating a new project, so that I don't have to bother enabling them later?
I would like to understand the purpose of this design and learn the best practices.
Well, they're disabled mainly so that you don't incur costs you didn't intend to, so that you're aware of which services you're using at any point, and so that you can track the usage and costs of each of them.
Also, some services, like Pub/Sub, depend on others, and others, such as Container Registry (or Artifact Registry), require a Cloud Storage bucket for storing artifacts, which will be created automatically if you push a Docker image or use Cloud Build. These are things to be aware of.
Enabling an API does take a bit of time, depending on the service, but it's a one-time action per project. I'm not sure exactly what your concerns about the waiting time are, but if you want to keep running commands after issuing a gcloud command to enable some APIs, you can use the --async flag: the command returns immediately and the operation continues in the background, so you don't have to wait for it to complete before running another one.
Lastly, yes, you can just enable them all if you know what you're doing, but at your own risk. It's safer to enable just the ones you need, and, as you may already know, you can enable several in a single gcloud command. Taking Container Registry as an example, it uses Cloud Storage, which you will still be billed for.
Enabling services enables access to (often billed) resources.
It's considered good practice to keep this "surface" of resources constrained to those that you(r customers) need; the more services you enable, the greater your potential attack surface and potential bills.
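One way to keep that surface visible is to compare what is enabled against an allowlist. The bash sketch below assumes that `gcloud services list --enabled` with `--format='value(config.name)'` prints one service name (e.g. `compute.googleapis.com`) per line; the function name and the `GCLOUD` override (so the logic can be exercised without a real project) are hypothetical conveniences.

```shell
#!/usr/bin/env bash
# Sketch: list services enabled on a project that are NOT on an approved
# allowlist. GCLOUD can be overridden (e.g. for a dry run or a test double).
GCLOUD="${GCLOUD:-gcloud}"

# unexpected_services PROJECT ALLOWED_SERVICE...
unexpected_services() {
  local project="$1"; shift
  # comm -23 keeps lines that appear only in the first (enabled) list;
  # both inputs must be sorted the same way, hence sort -u on each side.
  comm -23 \
    <("$GCLOUD" services list --enabled --project="$project" \
        --format='value(config.name)' | sort -u) \
    <(printf '%s\n' "$@" | sort -u)
}
```

Run periodically (or in CI), a non-empty result is a prompt to either justify the service or disable it.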
Google provides an increasing number of services (accessible through APIs). It is highly unlikely that you would ever want to access them all.
APIs are enabled by Project. The Project creation phase (including enabling services) is generally only a very small slice of the entire lifetime of a Project; even of those Projects created-and-torn-down on demand.
It's possible to enable the APIs asynchronously, so that enabling each service does not block:
# Assumes PROJECT holds your project ID; extend the list with further short service names.
for SERVICE in "containerregistry" "container" "cloudbuild"  # ...
do
  gcloud services enable "${SERVICE}.googleapis.com" --project="${PROJECT}" --async
done
Following on from this, it is good practice to automate your organization's project provisioning (scripts, Terraform, Deployment Manager, etc.). This provides a baseline template for how your projects are created, which services are enabled, default permissions, etc. Your developers then simply fire-and-forget a provisioner (hopefully also checked in to your source control), drink a coffee, and wait while these steps are done for them.
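A minimal provisioner sketch in bash, assuming the project ID is passed as the first argument and that the three services from the loop above form the baseline. It relies on `gcloud services enable` accepting several services in one call, combined with `--async`. Real provisioners usually also link billing and set IAM policy, which are omitted here; the `GCLOUD` override is a hypothetical convenience for dry runs.

```shell
#!/usr/bin/env bash
# Sketch of a checked-in project provisioner: create the project, then enable
# a fixed baseline of APIs in ONE asynchronous gcloud call.
# BASELINE_SERVICES is illustrative; GCLOUD can be overridden for a dry run.
GCLOUD="${GCLOUD:-gcloud}"

BASELINE_SERVICES=(
  containerregistry.googleapis.com
  container.googleapis.com
  cloudbuild.googleapis.com
)

# provision_project PROJECT_ID
provision_project() {
  local project="$1"
  "$GCLOUD" projects create "$project" || return 1
  # "services enable" accepts several services at once; --async returns
  # immediately instead of waiting for the operation to finish.
  "$GCLOUD" services enable "${BASELINE_SERVICES[@]}" \
    --project="$project" --async
}
```

`GCLOUD=echo provision_project my-new-project` prints the commands instead of running them, which is a cheap way to review what the provisioner will do.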

Continuous Integration on AWS EMR

We have a long running EMR cluster that has multiple libraries installed on it using bootstrap actions. Some of these libraries are under continuous development and their codebase is on GitHub.
I've been looking to integrate Travis CI with AWS EMR, similar to the way Travis integrates with CodeDeploy. The idea is to have the code on GitHub tested and deployed automatically to EMR, using bootstrap actions to install the updated libraries on all of EMR's nodes.
A solution I came up with is to use an EC2 instance in the middle: Travis and CodeDeploy first deploy the code to the instance, and then a launch script on the instance is triggered to create a new EMR cluster with the updated libraries.
However, this solution means we need to create a new EMR cluster every time we deploy a new version of the system.
Any other suggestions?
You definitely don't want to maintain an EC2 instance to orchestrate a CI/CD process like that. First of all, it introduces a number of challenges because then you need to deal with an entire server instance, keep it maintained, deal with networking, apply monitoring and alerts to deal with availability issues, and even then, you won't have availability guarantees, which may cause other issues. Most of all, maintaining an EC2 instance for a purpose like that simply is unnecessary.
I recommend that you investigate using AWS CodePipeline with AWS Step Functions (orchestrating Lambda functions).
Step Functions can be used to orchestrate the provisioning of your EMR cluster in a fully serverless way. With CodePipeline, you can set up a webhook on your GitHub repo to pull your code and spin up a new deployment automatically whenever changes are committed to your master branch (or whatever branch you specify). You can use EMRFS to sync an S3 bucket or folder to your EMR file system for your cluster and then obtain the security benefits of IAM, as well as the additional consistency guarantees that come with EMRFS. With Lambda, you also get seamless integration with other services, such as Kinesis, DynamoDB, and CloudWatch, among many others, which will simplify many administrative and development tasks and enable more sophisticated automation with minimal effort.
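As a rough sketch of the hand-off between the CI system and the orchestration layer, a deploy stage could publish the built libraries to S3 under a versioned prefix and then start a Step Functions execution that provisions the cluster. `aws s3 sync` and `aws stepfunctions start-execution` are real CLI commands; the bucket, state machine ARN, input document shape and function name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a CI deploy stage: publish built libraries and the bootstrap
# script to S3, then hand off to a Step Functions state machine that
# provisions the EMR cluster. Bucket, prefix and ARN are hypothetical.
AWS="${AWS:-aws}"

# deploy_libraries BUILD_DIR S3_PREFIX STATE_MACHINE_ARN VERSION
deploy_libraries() {
  local build_dir="$1" s3_prefix="$2" state_machine_arn="$3" version="$4"
  # Copy new/changed files under a versioned prefix so older bootstrap
  # runs keep pointing at the artifacts they were built with.
  "$AWS" s3 sync "$build_dir" "$s3_prefix/$version/" || return 1
  # Pass the artifact location to the state machine as its input document;
  # execution names must be unique, so the version (e.g. a git SHA) is reused.
  "$AWS" stepfunctions start-execution \
    --state-machine-arn "$state_machine_arn" \
    --name "deploy-$version" \
    --input "{\"artifacts\":\"$s3_prefix/$version/\"}"
}
```

In CodePipeline or Travis, this would run as the final build step with the commit SHA as `VERSION`.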
There are some great resources and tutorials for using CodePipeline with EMR, as well as in general. Here are some examples:
https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-apache-spark-applications-using-aws/
https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-ecs-ecr-codedeploy.html
https://chalice-workshop.readthedocs.io/en/latest/index.html
There are also great tutorials for orchestrating applications with Step Functions and Lambda, including the use of EMR. Here are some examples:
https://aws.amazon.com/blogs/big-data/orchestrate-apache-spark-applications-using-aws-step-functions-and-apache-livy/
https://aws.amazon.com/blogs/big-data/orchestrate-multiple-etl-jobs-using-aws-step-functions-and-aws-lambda/
https://github.com/DavidWells/serverless-workshop/tree/master/lessons-code-complete/events/step-functions
https://github.com/aws-samples/lambda-refarch-imagerecognition
https://github.com/aws-samples/aws-serverless-workshops
In the very worst case, if all of those options fail (for example, if you need very strict control over the startup process on the EMR cluster after it completes its bootstrapping), you can always create a Java JAR that is loaded as a final step and use it either to execute a shell script or to run your provisioning commands through the various Amazon Java libraries. Even in this case, you have no need to maintain your own EC2 instance for orchestration purposes (which, in my opinion, would still be hard to justify even if it were running in a Docker container in Kubernetes), because you can easily maintain that deployment process with a fully serverless approach as well.
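For that last-resort case, EMR already ships a `script-runner.jar` that runs a shell script from S3 as a step, so a custom JAR may not be needed. A bash sketch with a hypothetical cluster ID, region and script location; keep in mind that steps run on the master (primary) node only, whereas bootstrap actions run on every node.

```shell
#!/usr/bin/env bash
# Sketch: run a provisioning shell script as an EMR step on an existing
# cluster, using EMR's script-runner.jar rather than a custom JAR.
# Cluster ID, region and script location are hypothetical.
# Note: EMR steps run on the master (primary) node only; bootstrap actions
# are what run on every node.
AWS="${AWS:-aws}"

# add_provisioning_step CLUSTER_ID REGION SCRIPT_S3_URI
add_provisioning_step() {
  local cluster_id="$1" region="$2" script_s3="$3"
  local jar="s3://${region}.elasticmapreduce/libs/script-runner/script-runner.jar"
  "$AWS" emr add-steps --cluster-id "$cluster_id" --region "$region" \
    --steps "Type=CUSTOM_JAR,Name=Provision,ActionOnFailure=CONTINUE,Jar=${jar},Args=[${script_s3}]"
}
```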
There are many great videos from the Amazon re:Invent conferences that you may want to watch to get a jump start before you dive into the workshops. For example:
https://www.youtube.com/watch?v=dCDZ7HR7dms
https://www.youtube.com/watch?v=Xi_WrinvTnM&t=1470s
Many more such videos are available on YouTube.
Travis CI also supports Lambda deployment, as mentioned here: https://docs.travis-ci.com/user/deployment/lambda/

Docker for AWS vs pure Docker deployment on EC2

The goal is a production-level deployment of an 8-container application, using Swarm.
It seems (ECS aside) we are faced with 2 options:
Use the so-called Docker for AWS (docker-for-aws), which does the (Swarm) provisioning via a CloudFormation template.
Set up our VPC as usual, install Docker Engine, bootstrap the swarm (via init/join, etc.) and deploy our application on plain EC2 instances.
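For comparison, the core of the manual Swarm bootstrap in option 2 looks roughly like the bash sketch below, with hypothetical IPs and file names. The security groups must also allow 2377/tcp (cluster management), 7946/tcp and udp (node communication) and 4789/udp (overlay network traffic) between the nodes.

```shell
#!/usr/bin/env bash
# Sketch of option 2's manual Swarm bootstrap on plain EC2 instances.
# Run each function on the appropriate host; IPs and file names are hypothetical.
DOCKER="${DOCKER:-docker}"

# On the first manager: initialise the swarm on its private IP.
init_manager() {
  "$DOCKER" swarm init --advertise-addr "$1"
}

# On the first manager: print only the join token for "worker" or "manager".
join_token() {
  "$DOCKER" swarm join-token -q "${1:-worker}"
}

# On every other node: join using the token and the manager's address
# (2377/tcp is the Swarm cluster-management port).
join_swarm() {
  local token="$1" manager_ip="$2"
  "$DOCKER" swarm join --token "$token" "${manager_ip}:2377"
}

# On a manager: deploy the application from a compose file as a stack.
deploy_stack() {
  "$DOCKER" stack deploy -c "$1" "$2"
}
```

Everything beyond this (log shipping, load balancers, replacing failed managers) is what the template adds on top.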
Is the only difference between these two approaches the swarm bootstrap performed by docker-for-aws?
Any other benefits of docker-for-aws compared to a normal AWS VPC provisioning?
Thx
If you need portability across different cloud providers, go with the AWS CloudFormation template provided by the Docker team. If you only need to run on AWS, ECS should be fine, but you will need to spend a bit of time figuring out how service discovery works there. The benefit of Swarm is that they made it fairly simple: you just access your services by their service names, as if they were DNS names, with built-in load balancing.
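To illustrate the name-based service discovery, here is a bash sketch that puts two services on the same overlay network; the client reaches the API simply as `http://api/`, and Swarm load-balances across the API's replicas. Network, service and image choices are hypothetical, and the commands must run on a Swarm manager.

```shell
#!/usr/bin/env bash
# Sketch of Swarm's name-based service discovery: services attached to the
# same overlay network reach each other by service name, and that name
# resolves to a virtual IP that load-balances across the replicas.
# Network, service names and images are hypothetical.
DOCKER="${DOCKER:-docker}"

create_demo_services() {
  "$DOCKER" network create --driver overlay appnet || return 1
  "$DOCKER" service create --name api --network appnet --replicas 3 nginx:alpine || return 1
  # The client addresses the API only by its service name.
  "$DOCKER" service create --name client --network appnet alpine:3 \
    sh -c 'while true; do wget -qO- http://api/ >/dev/null && echo ok; sleep 5; done'
}
```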
It's fairly easy to automate new environment creation with that template, and if you later need to move to, say, Azure or Google Cloud, you simply use the template for that provider to get your Docker cluster ready.
The Docker team has put quite a few things into that template, and you really don't want to re-create them yourself unless you have to. For instance, if you don't use static IPs for your infrastructure (a fairly typical scenario) and one of the managers dies, you can't just restart it; you will need to manually re-join it to the cluster. Docker for AWS handles that by syncing IPs via DynamoDB and uses other provider-specific techniques to make failover and recovery work smoothly. Another example is logging: your logs are pushed automatically into CloudWatch, which is very handy.
A few tips on automating your environment provisioning if you go with the Swarm template:
Use an infrastructure automation tool to create a VPC per environment, starting from a template provided by that tool so you don't write too much yourself. A separate VPC per environment keeps environments isolated and easier to work with, with less chance of breaking something. Also, you're likely to add more elements to those environments later, such as RDS; if you control VPC creation, it's easier to do that and keep all related resources under the same VPC. For example, the DEV1 environment's database lives in the DEV1 VPC.
Hook in a run of the AWS CloudFormation template provided by Docker to provision a Swarm cluster within this VPC (they have a separate template for that).
My preference for automation is Terraform. It lets me describe the desired state of the infrastructure rather than how to achieve it.
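A rough bash sketch of those two steps, assuming a Terraform configuration that takes an `environment` variable and creates that environment's VPC, and a Docker-provided template that accepts an existing VPC. The Terraform variable name, the template URL and the `Vpc` parameter key are hypothetical placeholders; check the template's actual parameters before relying on them.

```shell
#!/usr/bin/env bash
# Sketch of the two-step flow: a Terraform workspace per environment creates
# that environment's VPC, then the Docker-provided CloudFormation template
# builds the Swarm inside it.
# The "environment" variable, template URL and "Vpc" parameter key are
# hypothetical placeholders.
TERRAFORM="${TERRAFORM:-terraform}"
AWS="${AWS:-aws}"

# create_vpc ENV: one Terraform workspace (and state) per environment.
create_vpc() {
  local env="$1"
  "$TERRAFORM" workspace select "$env" 2>/dev/null \
    || "$TERRAFORM" workspace new "$env" || return 1
  "$TERRAFORM" apply -auto-approve -var "environment=$env"
}

# create_swarm_stack ENV TEMPLATE_URL VPC_ID
create_swarm_stack() {
  local env="$1" template_url="$2" vpc_id="$3"
  "$AWS" cloudformation create-stack \
    --stack-name "swarm-$env" \
    --template-url "$template_url" \
    --capabilities CAPABILITY_IAM \
    --parameters "ParameterKey=Vpc,ParameterValue=$vpc_id"
}
```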
I would say no, there are basically no other benefits.
However, if you want to achieve all/several of the things that the docker-for-aws template provides I believe your second bullet point should contain a bit more.
E.g.
Logging to CloudWatch
Setting up EFS for persistence/sharing
Creating subnets and route tables
Creating and configuring elastic load balancers
Basic auto scaling for your nodes
and probably more that I do not recall right now.
The template also ingests a bunch of information about resources related to your EC2 instances and makes it readily available to all Docker services.
I have been using the docker-for-aws template at work and have grown to appreciate a lot of what it automates. And what I do not appreciate I change, with the official template as a base.
I would go with ECS over a roll-your-own solution. Unless your organization has the capacity to re-engineer the services and integrations AWS already offers, you would be artificially painting yourself into a corner for future changes. "Don't reinvent the wheel" comes to mind here.
Basically what @Jonatan states. Building solutions to integrate what is already available is... a trial of pain, when you could be working on other parts of your business or application.

How to deploy Kubernetes on AWS?

I'm wondering how people are deploying a production-caliber Kubernetes cluster in AWS and, more importantly, how they chose their approach.
The k8s documentation points towards kops for Debian, Ubuntu, CentOS, and RHEL or kube-aws for CoreOS/Container Linux. Among these choices it's not clear how to pick one over the others. CoreOS seems like the most compelling option since it's designed for container workloads.
But wait, there's more.
bootkube seems to be the next iteration of the CoreOS deployment technology and is on the roadmap for inclusion in kube-aws. Should I wait until kube-aws uses bootkube?
Heptio recently announced a Quickstart architecture for deploying k8s in AWS. This is the newest approach and so probably the least mature approach but it does seem to have gained traction from within AWS.
Lastly kubeadm is a thing and I'm not really sure where it fits into all of this.
There are probably more approaches that I'm missing too.
Given the number of options with overlapping intent it's very difficult to choose a path forward. I'm not interested in a proof-of-concept. I want to be able to deploy a secure, highly-available cluster for production use and be able to upgrade the cluster (host OS, etcd, and k8s system components) over time.
What did you choose and how did you decide?
I'd say pick whatever fits your needs (see also Picking the right solution)...
Which could be:
Speed of the cluster setup
Integration in your existing toolchain
e.g. kops integrates with Terraform, which might be a good fit for some people
Experience within your team/company/...
e.g. how comfortable are you with the related Linux distribution
Required maturity of the tool itself
some tools are very alpha; are you willing to play the role of an early adopter?
Ability to upgrade between Kubernetes versions
kubeadm has this on their agenda, some others prefer to throw away clusters instead of upgrading
Required integration into external tools (monitoring, logging, auth, ...)
Supported cloud providers
With your specific requirements I'd pick the Heptio or kubeadm approach.
Heptio if you can live with the given constraints (e.g. predefined OS)
kubeadm if you need more flexibility, everything done with kubeadm can be transferred to other cloud providers
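To give a feel for the kubeadm route, the lifecycle on hosts you have already provisioned is roughly init, join and upgrade. Below is a bash sketch with hypothetical endpoint, CIDR and version values. Note that `kubeadm upgrade` covers the control-plane components only, so kubelet/kubectl packages and the host OS still have to be upgraded separately, and a CNI plugin has to be installed after `init`.

```shell
#!/usr/bin/env bash
# Sketch of the kubeadm lifecycle on hosts you have already provisioned
# (e.g. EC2 instances). Endpoint, CIDR and version values are hypothetical.
KUBEADM="${KUBEADM:-kubeadm}"

# First control-plane node. A stable --control-plane-endpoint (e.g. a load
# balancer DNS name) is what later allows adding more control-plane nodes;
# --upload-certs shares the certificates those nodes will need.
init_control_plane() {
  local endpoint="$1" pod_cidr="$2"
  "$KUBEADM" init --control-plane-endpoint "$endpoint" \
    --pod-network-cidr "$pod_cidr" --upload-certs
}

# Print the full "kubeadm join ..." command to run on a worker node.
worker_join_command() {
  "$KUBEADM" token create --print-join-command
}

# Upgrade path, run on a control-plane node: review the plan, then apply.
upgrade_control_plane() {
  local version="$1"
  "$KUBEADM" upgrade plan || return 1
  "$KUBEADM" upgrade apply -y "$version"
}
```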
Other options for AWS lower on my list:
Kubernetes the hard way - using this might be the only true way to setup a production cluster as this is the only way you can fully understand each moving part of the system. Lower on the list, because often the result from any of the tools might just be more than enough, even for production.
kube-up.sh - is deprecated by the community, so I'd not use it for new projects
kops - my team had some strange experiences with it which seemed due to our (custom) needs back then (existing VPC), that's why it's lower on my list - it would be #1 for an environment where Terraform is used too.
bootkube - lower on my list because of its limitation to CoreOS
Rancher - interesting toolchain, seems to be too much for a single cluster
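For teams already using Terraform, kops can emit Terraform configuration instead of creating the AWS resources itself. Below is a bash sketch with a hypothetical cluster name, state bucket and zones. The optional fifth argument passes an existing VPC ID via `--vpc`, which is the kind of custom setup mentioned in the kops item above.

```shell
#!/usr/bin/env bash
# Sketch of kops emitting Terraform instead of creating resources directly.
# Cluster name, state bucket and zones are hypothetical.
KOPS="${KOPS:-kops}"

# generate_cluster_tf NAME STATE_S3_URI ZONES OUT_DIR [EXISTING_VPC_ID]
generate_cluster_tf() {
  local name="$1" state="$2" zones="$3" out_dir="$4"
  local args=(create cluster --name "$name" --state "$state" --zones "$zones"
              --target terraform --out "$out_dir")
  # Reuse an existing VPC instead of letting kops create one.
  [[ -n "${5:-}" ]] && args+=(--vpc "$5")
  "$KOPS" "${args[@]}"
}
```

Afterwards, `terraform init` and `terraform apply` run in the output directory, so the cluster's AWS resources go through the same review and state handling as the rest of the infrastructure.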
Off-topic: if you don't have to run on AWS, I'd also always consider running production workloads on GKE (Google's managed Kubernetes) instead, as it is a well-managed platform rather than something you have to build yourself.