How to detect when an AWS instance is fully functional - amazon-web-services

We are using Bamboo to spin up (through Powershell on the buildserver) an AWS Windows 2008 R2 instance as a target for a web deployment.
What is the best method for determining that the target instance is ready for a deployment (all services up and running, etc).

There's no real easy way to do this with Windows instances. Your best bet is to write a infrastructure test that looks to see if a service is running on the target environment and to keep retrying until it either verifies that the service is available or a timeout has occurred. At that point you could start your deployment.
I generally do this with a cucumber script that will check a service's status continually until it gets an answer
You could also set a timeout for an appropriate amount of time, although this option wouldn't be my recommendation

Related

Configure cloud-based vscode ide on AWS

CONTEXT:
We have a platform where users can create their own projects - multiple projects per user. We need to provide them with a browser-based IDE to edit those projects.
We decided to go with coder-server. For this we need to configure an auto-scalable cluster on AWS. When the user clicks "Edit Project" we will bring up a new container each time.
https://hub.docker.com/r/codercom/code-server
QUESTION:
How to pass parameters from the url query (my-site.com/edit?project=1234) into a startup script to pre-configure the workspace in a docker container when it starts?
Let's say the stack is AWS + ECS + Fargate. We could use kubernetes instead of ECS if it helps.
I don't have any experience in cluster configuration. Will appreciate any help or at least a direction where to dig further.
The above can be achieved using multiple ways in AWS ECS. The basic requirements for such systems are to launch and terminate containers on the fly while persisting the changes in the files. (I will focus on launching the containers)
Using AWS SDK's:
The task can be easily achieved using AWS SDKs, Using a base task definition. AWS SDK allows starting tasks with overrides on the base task definition.
E.G. If task definition has a memory of 2GB then the SDK can override the memory to parameterised value while launching a task from task def.
Refer to the boto3 (AWS SDK for Python) docs.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Client.run_task
Overall Solution
Now that we know how to run custom tasks with python SDK (on demand). The overall flow for your application is your API calling AWS lambda function whit parameters to spin up and wait to keep checking task status and update and rout traffic to it once the status is healthy.
API calls AWS lambda functions with parameters
Lambda function using AWS SDK create a new task with overrides from base task definition. (assuming the base task definition already exists)
Keep checking the status of the new task in the same function call and set a flag in your database for your front end to be able to react to it.
Once the status is healthy you can add a rule in the application load balancer using AWS SDK to route traffic to the IP without exposing the IP address to the end client (AWS application load balancer can get expensive, I'll advise using Nginx or HAProxy on ec2 to manage dynamic routing)
Note:
Ensure your Image is lightweight, and the startup times are less than 15 mins as lambda cannot execute beyond that. If that's the case create a microservice for launching ad-hoc containers and hosting them on EC2
Using Terraform:
If you looking for infrastructure provisioning terraform is the way to go. It has a learning curve so recommend it as a secondary option.
Terraform is popular for parametrising using variables and it can be plugged in easily as a backend for an API. The flow of your application still remains the same from step 1, but instead of AWS Lambda API will be calling your ad-hoc container microservice, which in turn calls terraform script and passing variables to it.
Refer to the Terrafrom docs for AWS
https://registry.terraform.io/providers/hashicorp/aws/latest

Is it possible to set up auto-scaling so that it always duplicates the most recent version of your main server?

I know that you can create an image of your server as-is and setup auto-scaling on that, but what if I then make changes to my original server? Do I have to then make another snapshot of that and setup auto-scaling again?
There are two approaches for configuring a server:
Creating an Amazon Machine Image (AMI) with all software fully configured, or
Having the instance configure itself via a startup script triggered via User Data
A fully-configured AMI is fast to startup, whereas a configuration script can take several minutes to configure the instance before the instance is ready to accept traffic.
In general, it is not considered good practice to "make changes to my original server" because there is no concept of an "original server". All instances are considered equal. Instead, the configuration should be created and tested on development servers separate to any production servers and then released by deploying new servers or by having an 'update' script intelligently run on existing servers. Some of these capabilities are provided by AWS CodeDeploy.

Can we run an application that is configured to run on multi-node AWS EC2 K8s cluster using kops into local kubernetes cluster (using kubeadm)?

Can we run an application that is configured to run on multi-node AWS EC2 K8s cluster using kops (project link) into local Kubernetes cluster (setup using kubeadm)?
My thinking is that if the application runs in k8s cluster based on AWS EC2 instances, it should also run in local k8s cluster as well. I am trying it locally for testing purposes.
Heres what I have tried so far but it is not working.
First I set up my local 2-node cluster using kubeadm
Then I modified the installation script of the project (link given above) by removing all the references to EC2 (as I am using local machines) and kops (particularly in their create_cluster.py script) state.
I have modified their application yaml files (app requirements) to meet my localsetup (2-node)
Unfortunately, although most of the application pods are created and in running state, some other application pods are unable to create and therefore, I am not being able to run the whole application on my local cluster.
I appreciate your help.
It is the beauty of Docker and Kubernetes. It helps to keep your development environment to match production. For simple applications, written without custom resources, you can deploy the same workload to any cluster running on any cloud provider.
However, the ability to deploy the same workload to different clusters depends on some factors, like,
How you manage authorization and authentication in your cluster? for example, IAM, IRSA..
Are you using any cloud native custom resources - ex, AWS ALBs used as LoadBalancer Services
Are you using any cloud native storage - ex, your pods rely on EFS/EBS volumes
Is your application cloud agonistic - ex using native technologies like Neptune
Can you mock cloud technologies in your local - ex. Using local stack to mock Kinesis, Dynamo
How you resolve DNS routes - ex, Say you are using RDS n AWS. You can access it using a route53 entry. In local you might be running a mysql instance and you need a DNS mechanism to discover that instance.
I did a google search and looked at the documentation of kOps. I could not find any info about how to deploy to local, and it only supports public cloud providers.
IMO, you need to figure out a way to set up your local EKS cluster, and if there are any usage of cloud native technologies, you need to figure out an alternative way about doing the same in your local.
The true answer, as Rajan Panneer Selvam said in his response, is that it depends, but I'd like to expand somewhat on his answer by saying that your application should run on any K8S cluster given that it provides the services that the application consumes. What you're doing is considered good practice to ensure that your application is portable, which is always a factor in non-trivial applications where simply upgrading a downstream service could be considered a change of environment/platform requiring portability (platform-independence).
To help you achieve this, you should be developing a 12-Factor Application (12-FA) or one of its more up-to-date derivatives (12-FA is getting a little dated now and many variations have been suggested, but mostly they're all good).
For example, if your application uses a database then it should use DB independent SQL or no-sql so that you can switch it out. In production, you may run on Oracle, but in your local environment you may use MySQL: your application should not care. The credentials and connection string should be passed to the application via the usual K8S techniques of secrets and config-maps to help you achieve this. And all logging should be sent to stdout (and stderr) so that you can use a log-shipping agent to send the logs somewhere more useful than a local filesystem.
If you run your app locally then you have to provide a surrogate for every 'platform' service that is provided in production, and this may mean switching out major components of what you consider to be your application but this is ok, it is meant to happen. You provide a platform that provides services to your application-layer. Switching from EC2 to local may mean reconfiguring the ingress controller to work without the ELB, or it may mean configuring kubernetes secrets to use local-storage for dev creds rather than AWS KMS. It may mean reconfiguring your persistent volume classes to use local storage rather than EBS. All of this is expected and right.
What you should not have to do is start editing microservices to work in the new environment. If you find yourself doing that then the application has made a factoring and layering error. Platform services should be provided to a set of microservices that use them, the microservices should not be aware of the implementation details of these services.
Of course, it is possible that you have some non-portable code in your system, for example, you may be using some Oracle-specific PL/SQL that can't be run elsewhere. This code should be extracted to config files and equivalents provided for each database you wish to run on. This isn't always possible, in which case you should abstract as much as possible into isolated services and you'll have to reimplement only those services on each new platform, which could still be time-consuming, but ultimately worth the effort for most non-trival systems.

AWS autoscaling starts not ready instances because of userdata script

I have an autoscaling that works great, with a launchconfiguration where i defined a userdata script that is executed on a new instance launch.
The userscript updates basecode and generate cache, this takes some seconds. But as soon as the instance is "created" (and not "ready"), the autoscaling adds it to the load balancer.
It's a problem because while the userdata script is executed, the instance does not answer with a good response (basically, 500 errors are throw).
I would like to avoid that, of course I saw this documentation : http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/InstallingAdd
As with a standalone EC2 instance, you have the option of configuring instances launched into an Auto Scaling group using user data. For example, you can specify a configuration script using the User data field in the AWS Management Console, or the --userdata parameter in the AWS CLI.
If you have software that can't be installed using a configuration script, or if you need to modify software manually before Auto Scaling adds the instance to the group, add a lifecycle hook to your Auto Scaling group that notifies you when the Auto Scaling group launches an instance. This hook keeps the instance in the Pending:Wait state while you install and configure the additional software.
Looks like i'm not in this case. Also, modify the pending hook on the userdata script is complicated. There must be a simple solution to fix my problem.
Thank you for your help !
EC2 instance Userdata does not utilize a lifecycle hook to stop a newly launched instance being brought into service until after it has finished executing.
Stopping your web server at the start of your user data script sounds a little unreliable to me, and therefore I would urge you to utilize the features AutoScaling provides that were designed to solve this very problem.
I have two suggestions:
Option 1:
Using lifecycle hooks isn't at all complicated, once you read through the docs. And in your user data, you can easily use the CLI to control the hook, check this out. In fact, a hook can be controlled from any supported language or scripting language.
Option 2:
If manually taking care of lifecycle hooks doesn't appeal to you, then I would recommend scrapping your user data script and doing a work around with AWS CodeDeploy. You could have CodeDeploy deploy nothing (eg. empty S3 folder) but you could use the deployment hook scripts to replace your user data script. Code Deploy integrates with AutoScaling seamlessly and handles lifecycle hooks automatically. A newly launched instance won't be brought into service by AutoScaling until a deployment has succeeded. Read the docs here and here for more info.
However, I would urge you to go with option 1. Lifecycle hooks were designed to solve the very problem you have. They're powerful, robust, awesome and free. Use them.
#Brooks said the easiest way to "wait" before the ELB serve the instance is to deal with ELB health status.
I solved my problem by shutting down the http server at the start of the userdata script. So the ELB can't have a green health status, and it does not send clients to the instance. I re-start the http server at the end of the script, the health is good so the ELB serve it.

Best practice for reconfiguring and redeploying on AWS autoscalegroup

I am new to AWS (Amazon Web Services) as well as our own custom boto based python deployment scripts, but wanted to ask for advice or best practices for a simple configuration management task. We have a simple web application with configuration data for several different backend environments controlled by a command line -D defined java environment variable. Sometimes, the requirement comes up that we need to switch from one backend environment to another due to maintenance or deployment schedules of our backend services.
The current procedure requires python scripts to completely destroy and rebuild all the virtual infrastructure (load balancers, auto scale groups, etc.) to redeploy the application with a change to the command line parameter. On a traditional server infrastructure, we would log in to the management console of the container, change the variable, bounce the container, and we're done.
Is there a best practice for this operation on AWS environments, or is the complete destruction and rebuilding of all the pieces the only way to accomplish this task in an AWS environment?
It depends on what resources you have to change. AWS is evolving everyday in a fast paced manner. I would suggest you to take a look at the AWS API for the resources you need to deal with and check if you can change a resource without destroying it.
Ex: today you cannot change a Launch Group once it is created. you must delete it and create it again with the new configurations. but if you have one auto scaling group attached to that launch group you will have to delete the auto scaling group and so on.
IMHO a see no problems with your approach, but as I believe that there is always room for improvement, I think you can refactor it with the help of AWS API documentation.
HTH
I think I found the answer to my own question. I know the interface to AWS is constantly changing, and I don't think this functionality is available yet in the Python boto library, but the ability I was looking for is best described as "Modifying Attributes of a Stopped Instance" with --user-data as being the attribute in question. Documentation for performing this action using HTTP requests and the command line interface to AWS can be found here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_ChangingAttributesWhileInstanceStopped.html