Docker service update downtime - amazon-web-services

I have an AWS EC2 instance running a Docker service.
The service has just one container, and when I update the container (changing the image), I get about a minute of downtime.
This is my docker service create code:
docker service create \
--name service-$IMAGE_NAME \
--publish 80:80 \
--env ENVIRONMENT=$(cat /etc/service_environment) \
--env-file=/etc/.env \
--replicas=1 \
--update-failure-action rollback \
--update-order start-first \
$ECR_IMAGE
Here is the update code:
#pull image from private ECR repository
docker pull $IMAGE
docker service update \
--force \
--image $IMAGE:latest \
--update-failure-action rollback \
--update-order start-first \
service-$IMAGE_NAME
Why does this happen? What am I doing wrong?
Thank you

Changing the image means stopping the existing Docker container and starting a new one; that's why the service is down for a while, in your case about a minute, which is the time the container needs to restart. You can make this seamless with two EC2 instances behind a load balancer: point traffic at one instance while you update the other, and once the updated instance is healthy, do the same for the second one.
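For the load-balancer approach, here is a minimal sketch of rotating one instance at a time, assuming an ALB target group ARN in $TG_ARN and the two instance IDs in $INSTANCE_A and $INSTANCE_B (all three are placeholder names, not from the original setup):
# Take instance A out of the target group so traffic only hits instance B
aws elbv2 deregister-targets --target-group-arn "$TG_ARN" --targets Id="$INSTANCE_A"
aws elbv2 wait target-deregistered --target-group-arn "$TG_ARN" --targets Id="$INSTANCE_A"
# Update the service on instance A (docker pull + docker service update as above)
# Put instance A back and wait until the ALB reports it healthy again
aws elbv2 register-targets --target-group-arn "$TG_ARN" --targets Id="$INSTANCE_A"
aws elbv2 wait target-in-service --target-group-arn "$TG_ARN" --targets Id="$INSTANCE_A"
# Repeat the same steps for instance B
The deregistration step respects the target group's deregistration delay, so in-flight requests can drain before the instance is updated.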

Related

Deploy Interactive Docker image on AWS

I am currently facing issues with deploying my Docker image on AWS. I managed to push my image into an Elastic Container Registry repository. I created an Elastic Container Service cluster with a task. Everything seems fine so far.
However, it does not start as I expect. I noticed that locally my Docker image must be run with the "-it" argument (interactive shell).
Can you tell me how to enable such "-it" parameter?
Thanks!
You can set the 'initProcessEnabled' parameter to true in the container definition. This will allow you to access the running container.
The following doc might help:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_linuxparameters
Once this parameter is set to true, you can access the running container using the CLI command below.
aws ecs execute-command --cluster *cluster-name* \
--region *aws-region* \
--task *task-id* \
--container *container-name* \
--interactive \
--command "/bin/sh"
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html
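For reference, a rough sketch of both pieces: setting linuxParameters.initProcessEnabled in the container definition and starting the task with ECS Exec enabled. The family, container, image and cluster names below are placeholders, and ECS Exec additionally needs the task-role/SSM permissions described in the linked doc:
# Register a task definition whose container has init enabled
aws ecs register-task-definition \
  --family my-task \
  --container-definitions '[{
    "name": "my-app",
    "image": "<account>.dkr.ecr.<region>.amazonaws.com/my-image:latest",
    "memory": 256,
    "essential": true,
    "linuxParameters": { "initProcessEnabled": true }
  }]'
# Run the task with ECS Exec enabled so execute-command can attach to it
aws ecs run-task \
  --cluster cluster-name \
  --task-definition my-task \
  --enable-execute-command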

How to open master node of aws ec2 in ssh

I have created a Kubernetes cluster using kops and kubectl on the main EC2 instance, and the master and child nodes were created automatically.
kops create cluster \
--state=${KOPS_STATE_STORE} \
--node-count=2 \
--master-size=t2.medium \
--node-size=t2.medium \
--zones=ap-south-1a,ap-south-1b \
--name=${KOPS_CLUSTER_NAME} \
--dns private \
--master-count 1
I am able to connect to the kops EC2 instance (main) from Git Bash or directly, but I am not able to open the master instance either way.
ssh -i key1.pem ec2-user@kops-ip  # working for kops
While connecting to the master node, it gives:
There was a problem setting up the instance connection
Log in failed. If this instance has just started up, try again in a minute or two.
My questions are:
1. How do I open the master EC2 instance?
2. Do I need to install the Kubernetes dashboard on the master or on the kops instance (which I currently have)?
AWS Instances:
kops(ec2-user)
master-ap-south-1a.masters.a.com
nodes.a.com
nodes.a.com

How to keep Google Dataproc master running?

I created a cluster on Dataproc and it works great. However, after the cluster is idle for a while (~90 min), the master node automatically stops. This happens to every cluster I create. I see there is a similar question here: Keep running Dataproc Master node
It looks like it's an initialization action problem. However, the post does not give me enough info to fix the issue. Below is the command I used to create the cluster:
gcloud dataproc clusters create $CLUSTER_NAME \
--project $PROJECT \
--bucket $BUCKET \
--region $REGION \
--zone $ZONE \
--master-machine-type $MASTER_MACHINE_TYPE \
--master-boot-disk-size $MASTER_DISK_SIZE \
--worker-boot-disk-size $WORKER_DISK_SIZE \
--num-workers=$NUM_WORKERS \
--initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://dataproc-initialization-actions/datalab/datalab.sh \
--metadata gcs-connector-version=$GCS_CONNECTOR_VERSION \
--metadata bigquery-connector-version=$BQ_CONNECTOR_VERSION \
--scopes cloud-platform \
--metadata JUPYTER_CONDA_PACKAGES=numpy:scipy:pandas:scikit-learn \
--optional-components=ANACONDA,JUPYTER \
--image-version=1.3
I need the BigQuery connector, GCS connector, Jupyter and DataLab for my cluster.
How can I keep my master node running? Thank you.
As summarized in the comment thread, this is indeed caused by Datalab's auto-shutdown feature. There are a couple of ways to change this behavior:
1. Upon first creating the Datalab-enabled Dataproc cluster, log in to Datalab and click on the "Idle timeout in about ..." text to disable it (see https://cloud.google.com/datalab/docs/concepts/auto-shutdown#disabling_the_auto_shutdown_timer); the text will change to "Idle timeout is disabled".
2. Edit the initialization action to set the environment variable as suggested by yelsayed:
function run_datalab(){
if docker run -d --restart always --net=host -e "DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true" \
-v "${DATALAB_DIR}:/content/datalab" ${VOLUME_FLAGS} datalab-pyspark; then
echo 'Cloud Datalab Jupyter server successfully deployed.'
else
err 'Failed to run Cloud Datalab'
fi
}
Then use your custom initialization action instead of the stock gs://dataproc-initialization-actions one, as sketched below. It could also be worth filing a tracking issue in the GitHub repo for the Dataproc initialization actions, suggesting that the timeout be disabled by default or made configurable via a metadata option. The auto-shutdown behavior is arguably not what you'd expect by default on a Dataproc cluster, since the master also performs roles other than running the Datalab service.
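A minimal sketch of swapping in the edited script, assuming a bucket of your own (gs://my-bucket is a placeholder):
# Fetch the stock Datalab init action and apply the run_datalab() edit shown above
gsutil cp gs://dataproc-initialization-actions/datalab/datalab.sh .
# ... edit datalab.sh locally ...
# Host the modified copy in a bucket you control
gsutil cp datalab.sh gs://my-bucket/init-actions/datalab.sh
# In the cluster create command, replace the stock path with the custom one:
#   --initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://my-bucket/init-actions/datalab.sh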

Set Desired Capacity error during the bootstrap of the instance

I've created an ASG with the min size and desired capacity set to 1. The EC2 instance is bound to an Application Load Balancer. I use Ignition to define the user data of the Launch Configuration. A script defined in Ignition runs these two commands:
# Set the ASG Desired Capacity - get CoreOS metadata
ASG_NAME=$(/usr/bin/docker run --rm --net=host \
"$AWSCLI_IMAGE" aws autoscaling describe-auto-scaling-instances \
--region="$COREOS_EC2_REGION" --instance-ids="$COREOS_EC2_INSTANCE_ID" \
--query 'AutoScalingInstances[].AutoScalingGroupName' --output text)
echo "Check desired capacity of Auto Scaling group..."
# shellcheck disable=SC2154,SC2086
/usr/bin/docker run --rm --net=host \
$AWSCLI_IMAGE aws autoscaling set-desired-capacity \
--region="$COREOS_EC2_REGION" --auto-scaling-group-name "$ASG_NAME" \
--desired-capacity 3 \
--honor-cooldown
The problem is that I get a ScalingActivityInProgress error, so I can't change the desired capacity.
First I'd like to understand the root cause. Could it be because the ALB target is not yet healthy when I run the above commands?
Solved by removing the --honor-cooldown param from the request.
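In other words, the same call as above, just without the flag:
/usr/bin/docker run --rm --net=host \
$AWSCLI_IMAGE aws autoscaling set-desired-capacity \
--region="$COREOS_EC2_REGION" --auto-scaling-group-name "$ASG_NAME" \
--desired-capacity 3
Without --honor-cooldown, the call is no longer rejected while a previous scaling activity's cooldown period is still in effect.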

How to automatically run consul agent and registrator container on scaling ECS instance

I have created a Consul cluster of three nodes. Now I need to run the consul agent and registrator containers, and join the consul agent to one of the Consul server nodes, whenever I bring up or scale out an ECS instance that runs my microservices.
I have automated the rest of the deployment process with rolling updates, but I have to manually start up the consul agent and registrator whenever I scale out an ECS instance.
Does anyone have an idea how we can automate this?
Create a task definition with two containers, consul-client and registrator.
Run aws ecs start-task in your user data.
This AWS post focuses on this approach.
Edit: since you mentioned an ECS instance, I assume you already have the necessary IAM role set on the instance.
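As a rough sketch of that user data (the region, cluster and task definition names below are placeholders), reading the container instance ARN from the ECS agent's local introspection endpoint:
#!/bin/bash
# Wait until the ECS agent has registered this instance, then read its container instance ARN
arn=""
until [ -n "$arn" ]; do
  sleep 5
  arn=$(curl -s http://localhost:51678/v1/metadata \
    | sed -n 's/.*"ContainerInstanceArn":"\([^"]*\)".*/\1/p')
done
# Start the consul-agent + registrator task definition on this specific instance
aws ecs start-task \
  --region us-east-1 \
  --cluster my-cluster \
  --task-definition consul-sidecars \
  --container-instances "$arn"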
Create an ELB in front of your Consul servers, or use an Elastic IP, so that the address doesn't change.
Then in the user data:
#!/bin/bash
consul_host=consul.mydomain.local
#start the agent
docker run -d --restart=always -p 8301:8301 -p 8301:8301/udp -p 8400:8400 -p 8500:8500 -p 53:53/udp \
-v /opt/consul:/data -v /var/run/docker.sock:/var/run/docker.sock -v /etc/consul:/etc/consul -h \
$(curl -s http://169.254.169.254/latest/meta-data/instance-id) --name consul-agent progrium/consul \
-join $consul_host -advertise $(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
#start the registrator
docker run -d --restart=always -v /var/run/docker.sock:/tmp/docker.sock \
-h $(curl -s http://169.254.169.254/latest/meta-data/instance-id) --name consul-registrator \
gliderlabs/registrator:latest -ip $(curl -s http://169.254.169.254/latest/meta-data/local-ipv4) \
consul://$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4):8500
Note: this snippet assumes your setup is all locally reachable, etc. It's adapted from the CloudFormation templates in this blog post and this link.