Configure cloud-based vscode ide on AWS - amazon-web-services

CONTEXT:
We have a platform where users can create their own projects - multiple projects per user. We need to provide them with a browser-based IDE to edit those projects.
We decided to go with coder-server. For this we need to configure an auto-scalable cluster on AWS. When the user clicks "Edit Project" we will bring up a new container each time.
https://hub.docker.com/r/codercom/code-server
QUESTION:
How to pass parameters from the url query (my-site.com/edit?project=1234) into a startup script to pre-configure the workspace in a docker container when it starts?
Let's say the stack is AWS + ECS + Fargate. We could use kubernetes instead of ECS if it helps.
I don't have any experience in cluster configuration. Will appreciate any help or at least a direction where to dig further.

The above can be achieved using multiple ways in AWS ECS. The basic requirements for such systems are to launch and terminate containers on the fly while persisting the changes in the files. (I will focus on launching the containers)
Using AWS SDK's:
The task can be easily achieved using AWS SDKs, Using a base task definition. AWS SDK allows starting tasks with overrides on the base task definition.
E.G. If task definition has a memory of 2GB then the SDK can override the memory to parameterised value while launching a task from task def.
Refer to the boto3 (AWS SDK for Python) docs.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Client.run_task
Overall Solution
Now that we know how to run custom tasks with python SDK (on demand). The overall flow for your application is your API calling AWS lambda function whit parameters to spin up and wait to keep checking task status and update and rout traffic to it once the status is healthy.
API calls AWS lambda functions with parameters
Lambda function using AWS SDK create a new task with overrides from base task definition. (assuming the base task definition already exists)
Keep checking the status of the new task in the same function call and set a flag in your database for your front end to be able to react to it.
Once the status is healthy you can add a rule in the application load balancer using AWS SDK to route traffic to the IP without exposing the IP address to the end client (AWS application load balancer can get expensive, I'll advise using Nginx or HAProxy on ec2 to manage dynamic routing)
Note:
Ensure your Image is lightweight, and the startup times are less than 15 mins as lambda cannot execute beyond that. If that's the case create a microservice for launching ad-hoc containers and hosting them on EC2
Using Terraform:
If you looking for infrastructure provisioning terraform is the way to go. It has a learning curve so recommend it as a secondary option.
Terraform is popular for parametrising using variables and it can be plugged in easily as a backend for an API. The flow of your application still remains the same from step 1, but instead of AWS Lambda API will be calling your ad-hoc container microservice, which in turn calls terraform script and passing variables to it.
Refer to the Terrafrom docs for AWS
https://registry.terraform.io/providers/hashicorp/aws/latest

Related

Can we run an application that is configured to run on multi-node AWS EC2 K8s cluster using kops into local kubernetes cluster (using kubeadm)?

Can we run an application that is configured to run on multi-node AWS EC2 K8s cluster using kops (project link) into local Kubernetes cluster (setup using kubeadm)?
My thinking is that if the application runs in k8s cluster based on AWS EC2 instances, it should also run in local k8s cluster as well. I am trying it locally for testing purposes.
Heres what I have tried so far but it is not working.
First I set up my local 2-node cluster using kubeadm
Then I modified the installation script of the project (link given above) by removing all the references to EC2 (as I am using local machines) and kops (particularly in their create_cluster.py script) state.
I have modified their application yaml files (app requirements) to meet my localsetup (2-node)
Unfortunately, although most of the application pods are created and in running state, some other application pods are unable to create and therefore, I am not being able to run the whole application on my local cluster.
I appreciate your help.
It is the beauty of Docker and Kubernetes. It helps to keep your development environment to match production. For simple applications, written without custom resources, you can deploy the same workload to any cluster running on any cloud provider.
However, the ability to deploy the same workload to different clusters depends on some factors, like,
How you manage authorization and authentication in your cluster? for example, IAM, IRSA..
Are you using any cloud native custom resources - ex, AWS ALBs used as LoadBalancer Services
Are you using any cloud native storage - ex, your pods rely on EFS/EBS volumes
Is your application cloud agonistic - ex using native technologies like Neptune
Can you mock cloud technologies in your local - ex. Using local stack to mock Kinesis, Dynamo
How you resolve DNS routes - ex, Say you are using RDS n AWS. You can access it using a route53 entry. In local you might be running a mysql instance and you need a DNS mechanism to discover that instance.
I did a google search and looked at the documentation of kOps. I could not find any info about how to deploy to local, and it only supports public cloud providers.
IMO, you need to figure out a way to set up your local EKS cluster, and if there are any usage of cloud native technologies, you need to figure out an alternative way about doing the same in your local.
The true answer, as Rajan Panneer Selvam said in his response, is that it depends, but I'd like to expand somewhat on his answer by saying that your application should run on any K8S cluster given that it provides the services that the application consumes. What you're doing is considered good practice to ensure that your application is portable, which is always a factor in non-trivial applications where simply upgrading a downstream service could be considered a change of environment/platform requiring portability (platform-independence).
To help you achieve this, you should be developing a 12-Factor Application (12-FA) or one of its more up-to-date derivatives (12-FA is getting a little dated now and many variations have been suggested, but mostly they're all good).
For example, if your application uses a database then it should use DB independent SQL or no-sql so that you can switch it out. In production, you may run on Oracle, but in your local environment you may use MySQL: your application should not care. The credentials and connection string should be passed to the application via the usual K8S techniques of secrets and config-maps to help you achieve this. And all logging should be sent to stdout (and stderr) so that you can use a log-shipping agent to send the logs somewhere more useful than a local filesystem.
If you run your app locally then you have to provide a surrogate for every 'platform' service that is provided in production, and this may mean switching out major components of what you consider to be your application but this is ok, it is meant to happen. You provide a platform that provides services to your application-layer. Switching from EC2 to local may mean reconfiguring the ingress controller to work without the ELB, or it may mean configuring kubernetes secrets to use local-storage for dev creds rather than AWS KMS. It may mean reconfiguring your persistent volume classes to use local storage rather than EBS. All of this is expected and right.
What you should not have to do is start editing microservices to work in the new environment. If you find yourself doing that then the application has made a factoring and layering error. Platform services should be provided to a set of microservices that use them, the microservices should not be aware of the implementation details of these services.
Of course, it is possible that you have some non-portable code in your system, for example, you may be using some Oracle-specific PL/SQL that can't be run elsewhere. This code should be extracted to config files and equivalents provided for each database you wish to run on. This isn't always possible, in which case you should abstract as much as possible into isolated services and you'll have to reimplement only those services on each new platform, which could still be time-consuming, but ultimately worth the effort for most non-trival systems.

What is the difference between AWS Lambda & AWS Elastic Beanstalk

I am studying for my AWS Cloud Practitioner Certification and I am confused with the difference between AWS Lambda & AWS Elastic Beanstalk. From my understanding, for both services you upload your code to AWS and AWS essentially manages the underlying infrastructure for you.
I know with Lambda you upload your code to a 'Lambda Function' and set triggers for when the code executes.
With AWS EB you upload your application code and EB automatically handles the deployment, capacity, provisioning, etc...
They both sound very similar as you upload your code to both and both handle underlying instances/environments.
Thanks!
Elastic beanstalk and lambda are very different though some of the features may look similar. At high level, elastic beanstalk deploys a long running application whereas lambda deploys short running code function
Lambda can at maximum run for 15 minutes, whereas EB can run continuously. Generally, we deploy websites/apps on EB whereas lambda are generally used for triggered functionality like processing image when image gets uploaded to S3.
Lambda can only handle one request at a time whereas number of concurrent requests EB can handle depends on your underlying infrastructure. So, if you are having say 100 requests, 100 lambdas will be created whereas these 100 requests can be handled by one underlying EC2 instance in EB
Lambda is serverless (underlying infra is entirely abstracted from developer). Whereas EB is automation over infra provisioning. You can still see your EC2 instances, load balancer, auto scaling group etc. in your AWS console. You can even ssh/rdp to your instance and change running services. AWS EB allows you also to have your custom AMIs.
Lambda is having issue of cold starts as in lambda, infra needs to be provisioned on demand by AWS, whereas in EB, you generally have EC2 instances already provisioned to handle your requests.
All great (and exam-specific) points by SmartCoder. If I may add a general ancillary comment:
Wittgenstein said, "In most cases, the meaning of a word is its use." I think this maxim is remarkably apt for software engineering too. In the context of your question, those two AWS services are used for significantly different purposes.
Lambda - Say you developed a photo uploading application with Node.js that uploads some processed images to an S3 bucket. The core logic for this is probably quite straightforward, and it's got a singular, distinct task. Simply take in an image, do some processing and if not for any exception, store it in a bucket. In this case, it's inefficient to waste time spinning up servers, configuring them with a runtime environment, downloading dependencies, maintenance, etc. A literal copy and paste of your code into the Lambda console while setting up a few configurations should get your job done. Plus, you save a lot of money as infrastructure is "provisioned" only when your Node.js function is invoked. Again, keep in mind the principle of this code performing a singular task.
Elastic Beanstalk - This same photo uploading system mentioned above might now mature into a more complex full-fledged software application that requires user management, authentication, and further processing of the images, which certainly requires more provisioning of resources. This application will probably do a lot of things with multiple code repositories for you to manage and deploy. And yet, you don't want to spend money on a DevOps engineer or learn to use an IaC (Infrastructure as Code) platform like CloudFormation or Terraform. In this case, Elastic Beanstalk is useful for a developer without too much in-depth DevOps knowledge as it's a PaaS (Platform as a Service) tool; it pretty much gives you a clear interface to spin up whole new production-ready systems.
Here are two good whitepapers I read a while back on the above topics.
https://docs.aws.amazon.com/whitepapers/latest/serverless-architectures-lambda/serverless-architectures-lambda.pdf
https://docs.aws.amazon.com/whitepapers/latest/introduction-devops-aws/introduction-devops-aws.pdf
Lambda is run based on specific trigger events.. and it exits as soon as its work is over.

how to deploy code on multiple instances Amazon EC2 Autocaling group?

So we are launching an ecommerce store built on magento. We are looking to deploy it on Amazon EC2 instance using RDS as database service and using amazon auto-scaling and elastic load balancer to scale the application when needed.
What I don't understand is this:
I have installed and configured my production magento enviorment on an EC2 instance (database is in RDS). This much is working fine. But now when I want to dynamically scale the number of instances
how will I deploy the code on the dynamically generated instances each time?
Will aws copy the whole instance assign it a new ip and spawn it as a
new instance or will I have to write some code to automate this
process?
Plus will it not be an overhead to pull code from git and deploy every time a new instance is spawned?
A detailed explanation or direction towards some resources on the topic will be greatly appreciated.
You do this in the AutoScalingGroup Launch Configuration. There is a UserData section in the LaunchConfiguration in CloudFormation where you would write a script that is ran when ever the ASG scales up and deploys a new instance.
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-launchconfig.html#cfn-as-launchconfig-userdata
This is the same as the UserData section in an EC2 Instance. You can use LifeCycle hooks that will tell the ASG not to put the EC2 instance into load until everything you want to have configured it set up.
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-as-lifecyclehook.html
I linked all CloudFormation pages, but you may be using some other CI/CD tool for deploying your infrastructure, but hopefully that gets you started.
To start, do check AWS CloudFormation. You will be creating templates to design how the infrastructure of your application works ~ infrastructure as code. With these templates in place, you can rollout an update to your infrastructure by pushing changes to your templates and/or to your application code.
In my current project, we have a github repository dedicated for these infrastructure templates and a separate repository for our application code. Create a pipeline for creating AWS resources that would rollout an updated to AWS every time you push to the repository on a specific branch.
Create an infrastructure pipeline
have your first stage of the pipeline to trigger build whenever there's code changes to your infrastructure templates. See AWS CodePipeline and also see AWS CodeBuild. These aren't the only AWS resources you'll be needing but those are probably the main ones, of course aside from this being done in cloudformation template as mentioned earlier.
how will I deploy the code on the dynamically generated instances each time?
Check how containers work, it would be better and will greatly supplement on your learning on how launching new version of application work. To begin, see docker, but feel free to check any resources at your disposal
Continuation with my current project: We do have a separate pipeline dedicated for our application, but will also get triggered after our infrastructure pipeline update. Our application pipeline is designed to build a new version of our application via AWS Codebuild, this will create an image that will become a container ~ from the docker documentation.
we have two triggers or two sources that will trigger an update rollout to our application pipeline, one is when there's changes to infrastructure pipeline and it successfully built and second when there's code changes on our github repository connected via AWS CodeBuild.
Check AWS AutoScaling , this areas covers the dynamic launching of new instances, shutting down instances when needed, replacing unhealthy instances when needed. See also AWS CloudWatch, you can design criteria with it to trigger scaling down/up and/or in/out.
Will aws copy the whole instance assign it a new ip and spawn it as a new instance or will I have to write some code to automate this process?
See AWS ElasticLoadBalancing and also check out more on AWS AutoScaling. On the automation process, if ever you'll push through with CloudFormation, instance and/or containers(depending on your design) will be managed gracefully.
Plus will it not be an overhead to pull code from git and deploy every time a new instance is spawned?
As mentioned, earlier having a pipeline for rolling out new versions of your application via CodeBuild, this will create an image with the new code changes and when everything is ready, it will be deployed ~ becomes a container. The old EC2 instance or the old container( depending on how you want your application be deployed) will be gracefully shut down after a new version of your application is up and running. This will give you zero downtime.

Docker for AWS vs pure Docker deployment on EC2

The purpose is production-level deployment of a 8-container application, using swarm.
It seems (ECS aside) we are faced with 2 options:
Use the so called docker-for-aws that does (swarm) provisioning via a cloudformation template.
Set up our VPC as usual, install docker engines, bootstrap the swarm (via init/join etc) and deploy our application in normal EC2 instances.
Is the only difference between these two approaches the swarm bootstrap performed by docker-for-aws?
Any other benefits of docker-for-aws compared to a normal AWS VPC provisioning?
Thx
If you need to provide a portability across different cloud providers - go with AWS CloudFormation template provided by Docker team. If you only need to run on AWS - ECS should be fine. But you will need to spend a bit of time on figuring out how service discovery works there. Benefit of Swarm is that they made it fairly simple, just access your services via their service name like they were DNS names with built-in load-balancing.
It's fairly easy to automate new environment creation with it and if you need to go let's say Azure or Google Cloud later - you simply use template for them to get your docker cluster ready.
Docker team has put quite a few things into that template and you really don't want to re-create them yourself unless you really have to. For instance if you don't use static IPs for your infra (fairly typical scenario) and one of the managers dies - you can't just restart it. You will need to manually re-join it to the cluster. Docker for AWS handles that through IPs sync via DynamoDB and uses other provider specific techniques to make failover / recovery work smoothly. Another example is logging - they push your logs automatically into CloudWatch, which is very handy.
A few tips on automating your environment provisioning if you go with Swarm template:
Use some infra automation tool to create VPC per environment. Use some template provided by that tool so you don't write too much yourself. Using a separate VPC makes all environment very isolated and easier to work with, less chance to screw something up. Also, you're likely to add more elements into those environments later, such as RDS. If you control your VPC creation it's easier to do that and keep all related resources under the same one. Let's say DEV1 environment's DB is in DEV1 VPC
Hook up running AWS Cloud Formation template provided by docker to provision a Swarm cluster within this VPC (they have a separate template for that)
My preference for automation is Terraform. It lets me to describe a desired state of infrastructure rather than on how to achieve it.
I would say no, there are basically no other benefits.
However, if you want to achieve all/several of the things that the docker-for-aws template provides I believe your second bullet point should contain a bit more.
E.g.
Logging to CloudWatch
Setting up EFS for persistence/sharing
Creating subnets and route tables
Creating and configuring elastic load balancers
Basic auto scaling for your nodes
and probably more that I do not recall right now.
The template also ingests a bunch of information about related resources to your EC2 instances to make it readily available for all Docker services.
I have been using the docker-for-aws template at work and have grown to appreciate a lot of what it automates. And what I do not appreciate I change, with the official template as a base.
I would go with ECS over a roll your own solution. Unless your organization has the effort available to re-engineer the services and integrations AWS offers as part of the offerings; you would be artificially painting yourself into a corner for future changes. Do not re-invent the wheel comes to mind here.
Basically what #Jonatan states. Building the solutions to integrate what is already available is...a trial of pain when you could be working on other parts of your business / application.

Accessing files in EC2 from Lambda

I have few EC2 servers in AWS. Whenever the disk space exceeds a limit, i want to delete some files (may be logs folder) in EC2 instance automatically. I am planning to use Lambda and cloudwatch for this. Can i use Lambda to interact with EC2. If not possible, what is the alternate approach to achieve this functionality.
This is not an appropriate use-case for an AWS Lambda function.
AWS Lambda is suitable for tasks where compute is required in response to an event. Your use-case, however, is to manipulate information on an EC2 instance, which does not need cloud compute.
You could run a script on each each computer, triggered by a Scheduled Task.
Alternatively, you could use the Systems Manager Run Command (also known as the EC2 Run Command), which allows you to run commands on multiple Amazon EC2 instances and view the results. This could be used to trigger a local script, or it could pass the whole command to run (including the script). It is purpose-built for the type of task you describe.
AWS Lambda has access to your instances if they are available in the internet. If they are not available in the internet, it is possible to give access to AWS lambda using a NAT or instance Gateway in your VPC.
The problem is: access to your instance does not means access to the instances filesystems. To delete the files from Lambda you can use two alternatives:
Configure a network filesystem service in your instances an connect
to this services in your lambda function. Using windows you would
just "share" your disks, but in that case you would use some SMB
library in your lambda code, that "I think" did not have native SMB
support. Just keep in mind that your security guy will scream out
loud when you propose this alternative.
Create a "agent" in your EC2 instances and keep it running as a
Windows Service and call this agent from your lambda function. In
that case, the lambda will start the execution of the agent that
will be responsible for the file deletion.
Another option, is to follow Ramesh's suggestion and create a Powershell script and configure a cron job. To be easy, you can create a Image with this Powershell script and use the image to initialize each instance. The same solution would be applicable to "the agent" solution in the lambda alternantives.
I think that, in any case, you will need to change something in your 150 servers. Using a customized image can help you to simplify this a little bit, but you will not get a solution without some changes.
According to the following thread, you cannot access files inside a EC2 VM unless you are exposing files to the public using different methodology.
AWS Forum
Quoting from the forum
If you are talking about the underlying EC2 instance, answer is No, you cannot access those files.
However as a solution for your problem, you can used scheduled job to cleanup your files depending your usage. You can use a service or cron job.