As you can see, I have a task definition for revision 4 and a task definition for revision 5. I want to permanently stop running revision 4 and only run revision 5.
In other words, the task that is PROVISIONING is the only one I want; the task that is RUNNING should not run anymore. How do I achieve this?
I tried to replicate the scenario and it worked fine for me, so I think you need to dig further under the hood.
Your task being stuck in the PROVISIONING state is, I believe, related to your environment rather than to your task, service, or cluster.
From the AWS documentation:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-lifecycle.html
**PROVISIONING**
Amazon ECS has to perform additional steps before the task is launched. For example, for tasks that use the awsvpc network mode, the elastic network interface needs to be provisioned.
You might want to check the following things to start debugging:
- The CloudFormation template that ECS uses to provision your resources.
- Your VPC, to see whether anything has changed since the last deployment.
- Security groups and IAM roles, to find out whether anything is blocking resource creation.
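If you have CLI access, a hedged starting point might look like this (the cluster, service, and task names are placeholders for your own):

```
# Pin the service to revision 5; ECS should drain and stop the
# revision-4 tasks once the new deployment is healthy.
aws ecs update-service \
  --cluster my-cluster \
  --service my-service \
  --task-definition my-task:5

# Inspect the task stuck in PROVISIONING; the attachments and
# stoppedReason fields often surface ENI (awsvpc) problems.
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks <task-id>
```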
I have a service that needs to run on our own EC2 instances, since it requires some support from the kernel. My previous experience is all with containers in AWS. The application itself is distributed as a single JAR file, and I'm looking for advice on how to automate deployments. The architecture is:
An ALB in front of the ASG.
EC2 instance running a single Java application.
Any open socket stays open for an hour at most, and to avoid causing trouble we have to drain connections to the EC2 instances before performing an update; a hard requirement is that the ALB stops opening new connections for an hour before the software is updated. The application is mission critical, and ECS had some issues last year, so I want to minimize the AWS services I depend on. While I could do what I want with my own ECS cluster and custom AMIs, I don't want to, since I will run a single instance of the app per host and don't need the extra layer.
My question: what is the simplest way to achieve this using CodePipeline? My understanding is that I need a CodeDeploy deployment step to push something to bare EC2 instances. How does draining with an ALB work in this case? We're using CloudFormation for the deployment.
You need to use CodeDeploy; you can find a tutorial in the AWS CodeDeploy documentation.
See the CodeDeploy deployment lifecycle hooks for EC2:
https://docs.aws.amazon.com/codedeploy/latest/userguide/reference-appspec-file-structure-hooks.html#appspec-hooks-server
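On the draining question: as far as I understand, an in-place CodeDeploy deployment with traffic control enabled deregisters each instance from the ALB target group (the BlockTraffic step) and waits out the target group's deregistration delay before touching the app. That delay can be raised to its maximum of 3600 seconds, which happens to match your one-hour requirement. A sketch, where the application, deployment group, and target group names are all placeholders:

```
# Let in-flight connections drain for up to one hour after deregistration.
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:...:targetgroup/my-tg/... \
  --attributes Key=deregistration_delay.timeout_seconds,Value=3600

# Enable traffic control on the deployment group so CodeDeploy
# blocks/allows ALB traffic around each in-place deployment.
aws deploy update-deployment-group \
  --application-name my-app \
  --current-deployment-group-name my-dg \
  --deployment-style deploymentType=IN_PLACE,deploymentOption=WITH_TRAFFIC_CONTROL \
  --load-balancer-info 'targetGroupInfoList=[{name=my-tg}]'
```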
My current ECS infrastructure works as follows: ALB -> ECS Fargate -> ECS service -> ECS task.
Now I would like to replace the normal ECS task with a scheduled ECS task, but I can't find any way to connect the scheduled task to the service and thus make it accessible via the ALB. Isn't that possible?
Thanks in advance for any answers.
A scheduled task is really meant for something that runs to completion and then exits.
If you want to connect your ECS task to a load balancer you should run it as part of a Service. ECS will handle connecting the task to the load balancer for you when it runs as a Service.
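For illustration, a minimal sketch of a Fargate service wired to a target group; every name, ARN, subnet, and security group here is a placeholder:

```
# ECS registers each task's ENI with the target group for you.
aws ecs create-service \
  --cluster my-cluster \
  --service-name my-service \
  --task-definition my-task:1 \
  --desired-count 1 \
  --launch-type FARGATE \
  --load-balancers 'targetGroupArn=arn:aws:elasticloadbalancing:...:targetgroup/my-tg/...,containerName=web,containerPort=80' \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-0abc],securityGroups=[sg-0abc],assignPublicIp=ENABLED}'
```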
You mentioned in comments that your end goal is to run a dev environment for a specific time each day. You can do this with an ECS service and scheduled auto-scaling. This feature isn't available through the AWS Web console for some reason, but you can configure it via the AWS CLI or one of the AWS SDKs. You would configure it to scale to 0 during the time you don't want your app running, and scale up to 1 or more during the time you do want it running.
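A sketch of that scheduled scaling setup via the AWS CLI; the cluster and service names and the cron windows are placeholders:

```
# Register the service as a scalable target.
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/my-service \
  --min-capacity 0 \
  --max-capacity 1

# Scale up to 1 task at 08:00 UTC on weekdays...
aws application-autoscaling put-scheduled-action \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/my-service \
  --scheduled-action-name dev-up \
  --schedule "cron(0 8 ? * MON-FRI *)" \
  --scalable-target-action MinCapacity=1,MaxCapacity=1

# ...and back down to 0 at 18:00 UTC.
aws application-autoscaling put-scheduled-action \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/my-service \
  --scheduled-action-name dev-down \
  --schedule "cron(0 18 ? * MON-FRI *)" \
  --scalable-target-action MinCapacity=0,MaxCapacity=0
```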
A scheduled ECS task is a one-off task launched with the RunTask API, and it has no ties to an ALB (because it's not part of an ECS service). You could probably make this work, but you'd need to build the wiring yourself by finding out the details of the task and adding it to the target group. If you want ECS to deal with the wiring, I believe what you need is to schedule a Lambda that increments the desired number of tasks in the service. I'm also wondering what the use case is for this (maybe there are other ways to achieve it); scheduled tasks are usually batch jobs of some sort, not web services that need to be wired to a load balancer. What is the scenario / end goal you have?
UPDATE: I missed the non-UI support for scheduling the desired number of tasks, so the Lambda isn't really needed.
This is almost the same question as this one, but for Fargate.
I can't find any way to stop the cluster or the Fargate service temporarily without deleting it or changing its task definition.
I tried stopping each task individually, but as expected, Fargate provisions a new task right after.
It seems there is no option in the AWS console yet; maybe a CLI option exists?
Fargate does not let you stop the cluster because there are no underlying EC2 instances under your control to stop. Resources are provisioned in a "serverless" way, so you don't have to deal with them.
You need to stop the individual tasks but, as you reported, they may be replaced after you stop them if they are part of a service. To ensure this doesn't happen, update your services to set "Number of tasks" to 0. This keeps the service definition, so you don't have to delete it, while removing any running tasks.
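If you prefer the CLI, the same change is a one-liner (cluster and service names are placeholders):

```
# Scale the service to zero: running tasks are drained and stopped,
# but the service definition itself is kept.
aws ecs update-service \
  --cluster my-cluster \
  --service my-service \
  --desired-count 0
```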
Hope that helps!
Found a command in the ecs-cli that does exactly what @jd-d described:
ecs-cli compose --project-name name service down --cluster-config cluster --cluster cluster
Stops the running tasks that belong to the service created with the compose project. This command updates the desired count of the service to 0.
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cmd-ecs-cli-compose-service-stop.html
It does work! But unfortunately I don't think it is a complete answer, as it seems to work only when using ecs-cli to manage a Docker Compose project.
As the screenshot showed, the deployment seems to succeed in only one Availability Zone.
I checked the CodeDeploy logs on a failed instance and found an error; it seems the instance is being recognized as an on-premises instance.
2018-01-10 04:40:22 INFO [codedeploy-agent(2696)]: On Premises config file does not exist or not readable
2018-01-10 04:40:43 ERROR [codedeploy-agent(2696)]: CodeDeploy Instance Agent Service: CodeDeploy Instance Agent Service: error during start or run: InstanceMetadata::InstanceMetadataError - Not an EC2 instance and region not provided in the environment variable AWS_REGION. Please specify your region using environment variable AWS_REGION.......
I've been searching for about three days, but this issue isn't mentioned in the AWS documentation. In production I plan to attach two Availability Zones to the Auto Scaling group. I wonder if I'm overlooking something other than CodeDeploy. What should I check? Thank you in advance.
[Updated]
I've updated the question with screenshots of the ASG and the ASG launch configuration. There's nothing special; it's a vanilla, default setup. I've been waiting five days for a response from the AWS Support Center, but it's still pending.
Auto Scaling Group (screenshot)
Auto Scaling Group launch configuration (screenshot)
Finally, I found out why CodeDeploy fails across multiple Availability Zones on Windows 2016. The problem seems to be an issue with the Windows 2016 EC2 image itself rather than with the ASG or CodeDeploy (I have not tested it on Linux). I found two solutions:
1. Shut down the server safely by clicking the "Shutdown with Sysprep" button in Ec2LaunchSettings, then create the AMI as usual.
2. Run the C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1 -Schedule script manually (the -Schedule argument is required), then create the AMI as usual.
The first method is intuitive and convenient (GUI); the second is suitable for automation via a PowerShell script. I have confirmed that both methods deploy successfully to multiple AZs, with no errors in the logs recorded by the CodeDeploy agent.
To be more specific, the CodeDeploy agent writes various logs during deployment, and it appears to use instance metadata from 169.254.169.254. When the deployment failed, the log said "You are On-Premise Instance.". The deployment probably fails because the instance cannot retrieve its metadata. The following document helped me a lot, and all of the solutions above are described there:
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ec2launch.html
In particular, from the document:
.....In Windows PowerShell, run the following command so that the system schedules the script to run as a Windows Scheduled Task. The script runs one time during the next boot and then disables these tasks from running again....
C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1 -Schedule
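If you bake AMIs in a pipeline, the second method can be scripted; a sketch, assuming the AWS CLI is available, with a placeholder instance ID and AMI name:

```
# On the instance itself (PowerShell), schedule re-initialization first:
#   C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1 -Schedule
# Then, from any machine with the AWS CLI, bake the AMI:
aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "win2016-codedeploy-base" \
  --description "Windows 2016 with EC2Launch initialization scheduled for next boot"
```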
There are two questions about AWS Auto Scaling and deployment that I cannot clearly answer:
I'm currently trying to figure out the best strategy for deploying, without downtime, to an EC2 instance that sits behind an ELB and is the only member of an Auto Scaling group.
Right now, the EC2 setup is done with Puppet, including deployment of the application, triggered after a successful Jenkins build.
The best solution I have found is to check, via a script, how many instances are registered with the ELB. If a single one is registered, spawn a new one that runs Puppet on startup (so the new node is up to date), then kill the old node.
How do I deploy (auto-scaled EC2 behind an ELB) without serving two different versions of the application at the same time?
Possible solution: check, via a script, how many EC2 instances are registered with the ELB, spawn the same number of new instances, register all of the new ones, and deregister all of the old ones.
My experience with AWS has taught me that AWS has a service for everything. So, are there services out there that meet my requirements and make my homegrown solutions unnecessary?
You can create an entirely new environment with its own ELB and, when it's ready and checked, switch the DNS record to the new ELB.
Either way, for a brief time (60 seconds or so, depending on the TTL of your DNS record) some users will see the old version while others see the new one.
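If the record lives in Route 53, the cutover itself can be scripted; the hosted zone ID, record name, and the new ELB's DNS name below are placeholders:

```
# Point the record at the new environment's ELB; a low TTL keeps the
# window where clients still resolve the old address short.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "CNAME",
        "TTL": 60,
        "ResourceRecords": [{"Value": "new-elb-1234567890.eu-west-1.elb.amazonaws.com"}]
      }
    }]
  }'
```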
In the end there were two possible solutions. Both of them would temporarily deliver two versions of the app.
1. Use AWS CodeDeploy to perform a sequential deployment (one instance after another). This solution offers the ability to roll back to a previous state and visually shows the state and results of the deployment.
2. Create a Python script to get the registered nodes (using Boto) and run the appropriate Puppet script on them (using Fabric). This solution offers more control over the deployment, but it takes time to build the scripts, and they can have bugs (a sketch follows below).
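The node lookup in option 2 can also be sketched with the plain AWS CLI instead of Boto; the load balancer name and the SSH/Puppet invocation are placeholders:

```
# Find the instances currently in service behind the Classic ELB,
# then run Puppet on each one in turn.
for id in $(aws elb describe-instance-health \
      --load-balancer-name my-elb \
      --query 'InstanceStates[?State==`InService`].InstanceId' \
      --output text); do
  ip=$(aws ec2 describe-instances --instance-ids "$id" \
      --query 'Reservations[0].Instances[0].PrivateIpAddress' --output text)
  ssh "ec2-user@$ip" 'sudo puppet agent --test'
done
```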
For now I chose AWS CodeDeploy because it's already available and, hopefully, well tested.