Using Spark-Submit in Docker Image? - amazon-web-services

I am wondering something about PySpark applications. If I container a PySpark program called my_spark_script.py, can I just execute it inside the Docker container? I mean to ask, is a Docker file like this valid:
WORKDIR /app
COPY . .
RUN pip3 install -r requirements.txt
CMD spark-submit --master yarn --deploy-mode cluster--num-executors 2 my_spark_script.py // <-- ???
And I can build it as:
docker build -t my_docker_image .
and then run it as
docker run -d my_docker_image
I am wondering if this can be run on AWS EC2 or AWS EMR or something else like this? Would it work?
I just dont know how the container CMD works in relation to environment like EC2 or EMR. Please help!

Amazon Elastic Container Service (ECS) is a managed AWS service to run Docker containers. ECS provides the Fargate launch type, which is a serverless platform with which a container service is run on Docker containers instead of EC2 instances.
To build the source code into a Docker image you can use AWS CodeBuild service, and AWS CodePipeline for Continuous Integration, please check the following example, here.

Related

Why can't I run X-Ray daemon inside docker container in AWS Elastic Beanstalk

I read the AWS X-RAY and AWS Elastic Beanstalk documentation and wonder the question why do they say that X-RAY daemon should be run as extension. As I know Elastic Beanstalk can run my application as docker container. Can I just run the daemon inside that container?
Documentation:
Here they say that we should run the X-RAY daemon as extension:
https://docs.aws.amazon.com/xray/latest/devguide/xray-daemon-beanstalk.html
Here they show how to run that daemon inside Docker:
https://docs.aws.amazon.com/xray/latest/devguide/xray-daemon-local.html
You should edit your dockerFile to have the X-ray Daemon
FROM amazonlinux
RUN yum install -y unzip
RUN curl -o daemon.zip https://s3.us-east-2.amazonaws.com/aws-xray-assets.us-east-2/xray-daemon/aws-xray-daemon-linux-3.x.zip
RUN unzip daemon.zip && cp xray /usr/bin/xray
ENTRYPOINT ["/usr/bin/xray", "-t", "0.0.0.0:2000", "-b", "0.0.0.0:2000"]
EXPOSE 2000/udp
EXPOSE 2000/tcp

Unable to deploy custom docker image to AWS ECS using Terraform

I am using terraform to build infrastructure on AWS provider. I am using ECR to push my local docker images using AWSCLI.
Now, I have a Application load balancer which would route traffic to ECS_service. I want ECS to manage my Docker Containers using Fargate. But, the docker containers are exited by saying "Essential Docker container exited".
Thats the only log printed out.
If i change the docker image to be nginx:latest(which is fetched from dockerhub). It works.
PS: My docker container is a simple node application with node:alpine as base image. Is it something related to this, i am wrong !
Can anyone provide me with some insight on what is wrong with my approach.
I get the following error in AWS Logs:
docker-standard-init-linux-go211-exec-user-process-caused-exec-format-error.
My Dockerfile
FROM node:alpine
WORKDIR /app
COPY . .
RUN npm install
# Expose a port.
EXPOSE 8080
# Run the node server.
ENTRYPOINT ["npm", "start"]
They say, its issue with the start script. I am just running this command. npm start to start the server.
It’s not your approach, your image is just not working.
Try running it locally and see the output otherwise you will need to ship the logs to Cloudwatch and see what they say

How can I install aws cli, from WITHIN the ECS task?

Question:
How can I install aws cli, from WITHIN the ECS task ?
DESCRIPTION:
I'm using a docker container to run the logstash application (it is part of the elastic family).
The docker image name is "docker.elastic.co/logstash/logstash:7.10.2"
This logstash application needs to write to S3, thus it needs AWS CLI installed.
If aws is not installed, it crashes.
# STEP 1 #
To avoid crashing, when I used this application only as a docker, I ran it in a way that I caused the 'logstash start' to be delayed, after docker container was started.
I did this by adding "sleep" command to an external docker-entrypoint file, before it starts the logstash.
This is how it looks in the docker-entrypoint file:
sleep 120
if [[ -z $1 ]] || [[ ${1:0:1} == '-' ]] ; then
exec logstash "$#"
else
exec "$#"
fi
# EOF
# STEP 2 #
run the docker with "--entrypoint" flag so it will use my entrypoint file
docker run \
-d \
--name my_logstash \
-v /home/centos/DevOps/psifas_logstash_docker-entrypoint:/usr/local/bin/psifas_logstash_docker-entrypoint \
-v /home/centos/DevOps/logstash.conf:/usr/share/logstash/pipeline/logstash.conf \
-v /home/centos/DevOps/logstash.yml:/usr/share/logstash/config/logstash.yml \
--entrypoint /usr/local/bin/psifas_logstash_docker-entrypoint \
docker.elastic.co/logstash/logstash:7.10.2
# STEP 3 #
install aws cli and configure aws cli from the server hosting the docker:
docker exec -it -u root <DOCKER_CONTAINER_ID> yum install awscli -y
docker exec -it <DOCKER_CONTAINER_ID> aws configure set aws_access_key_id <MY_aws_access_key_id>
docker exec -it <DOCKER_CONTAINER_ID> aws configure set aws_secret_access_key <MY_aws_secret_access_key>
docker exec -it <DOCKER_CONTAINER_ID> aws configure set region <MY_region>
This worked for me,
Now I want to "translate" this flow into an AWS ECS task.
in ECS I will use parameters instead of running the above 3 "aws configure" commands.
MY QUESTION
How can I do my 3rd step, installing aws cli, from WITHIN the ECS task ? (meaning not to run it on the EC2 server hosting the ECS cluster)
When I was working on the docker I also thought of these options to use the aws cli:
find an official elastic docker image containing both logstash and aws cli. <-- I did not find one.
create such an image by myself and use. <-- I prefer not , because I want to avoid the maintenance of creating new custom images when needed (e.g when new version of logstash image is available).
Eventually I choose the 3 steps above, but I'm open to suggestion.
Also, My tests showed that running 2 containers within the same ECS task:
logstah
awscli
and then the logstash container will use the aws cli container
(image "amazon/aws-cli") is not working.
THANKS A LOT IN ADVANCE :-)
Your option #2, create the image yourself, is really the best way to do this. Anything else is going to be a "hack". Also, you shouldn't be running aws configure for an image running in ECS, you should be assigning a IAM role to the task, and the AWS CLI will pick that up and use it.
Mark B, your answer helped me to solve this. Thanks!
writing here the solution in case it will help somebody else.
There is no need to install AWS CLI, in the logstash docker container running inside the ECS task.
Inside the logstash container (from image "docker.elastic.co/logstash/logstash:7.10.2") there is AWS SDK to connect to the S3.
The only thing required is to allow the ECS Task execution role, access to S3.
(I attached AmazonS3FullAccess policy)

Dockerfile - install jenkins on AWS

New to AWS so any help would be appreciated.
I'm attempting to run Jenkins through Docker on AWS. I found this article https://docs.aws.amazon.com/aws-technical-content/latest/jenkins-on-aws/containerized-deployment.html
Can anyone share a better step-by-step tutorial to achieve this? the page above seems incomplete.
It talks about "The Dockerfile should also contain the steps to install the Jenkins Amazon ECS plugin" but does not show how to go about installing the plugin using the Dockerfile.
thanks
Please follow below steps:
Launch EC2 cluster according to your needs.
Install docker in you local machine. For example, for ubuntu (sudo apt-get isntall docker.io)
systemctl start docker
Create new folder for our jenkins docker. Create new Dockerfile inside it with following contents.
FROM Jenkins
COPY plugins.txt /usr/share/jenkins/plugins.txt
RUN /usr/local/bin/plugins.sh /usr/share/jenkins/plugins.txt
Create plugins.txt in same folder and add below line
amazon-ecs:1.3
Login to ECR using aws cli. Configure aws first with your credentials.
aws ecr get-login --region <REGION>
Run the output returned from above command to docker login.
sudo docker build -t jenkins_master .
sudo docker tag jenkins_master:latest <AWS ACC ID>.dkr.ecr.<REGION>.amazonaws.com/jenkins_master:latest
Create repository in ECR for this image
aws ecr create-repository --repository-name jenkins_master
Push the image in AWS ECR.
sudo docker push <AWS ACC ID>.dkr.ecr.<REGION>+.amazonaws.com/jenkins_master:latest
Our Jenkins docker image is ready. But data stored by this Jenkins server will not be persistent. To store data permanently, we will create another docker image which will create a volume with mount point. For that, create new directory for this new docker image and inside it create another Dockerfile with below content.
FROM Jenkins
VOLUME ["/var/jenkins_home"]
Again follow same commands to push this new repository to ECR.
sudo docker build -t jenkins_dv .
sudo docker tag jenkins_dv:latest <AWS ACC ID>.dkr.ecr.<REGION>.amazonaws.com/jenkins_dv:latest
aws ecr create-repository --repository-name jenkins_dv
sudo docker push <AWS Account Number>.dkr.ecr.<REGION>.amazonaws.com/jenkins_dv:latest
Now our images are ready. We will use this images to run them as service on our ECS cluster. For that we need to install ecs-cli using below command for linux.
sudo curl -o /usr/local/bin/ecs-cli https://s3.amazonaws.com/amazon-ecs-cli/ecs-cli-linux-amd64-latest
Create a new txt file with below contents which will have jenkins configuration.
jenkins_master:
image: jenkins_master
cpu_shares: 100
mem_limit: 2000M
ports:
- "8080:8080"
- "50000:50000"
volumes_from:
- jenkins_dv
jenkins_dv:
image: jenkins_dv
cpu_shares: 100
mem_limit: 500M
15. Finally push this service using above file to your newly created cluster.
ecs-cli compose --file docker_compose.txt service up --cluster <cluster_name>
Hope this helps!

how to setup continuos deployment from docker-hub to AWS ECS?

I am setting up a CI/CD pipeline for my micro-services. Currently I use TravisCI to pull the code from Github upon check-in, build the docker image and push it to DockerHub. I tried using docker cloud(previously knows as Tutum), which provides automatic deployment feature to AWS EC2 instance but the deployment sometimes recreates the container and the service endpoint URL changes, which is not desirable.
I am exploring amazon's ECS and its tasks , but I can not find any reference for how to setup continuos deployment to ECS when a new image is pushed to docker hub.
Anybody has any experience doing the setup ?
with ECS you would basically have CI detect a change to docker hub and update your task definition/service.
For this I use the wonderful ecs-deploy script from here:
https://github.com/silinternational/ecs-deploy
After my container has been built and deployed to dockerhub it's simply a matter of:
ecs-deploy -k $AWS_KEY -s $AWS_SECRET -r $AWS_REGION -c $CLUSTER_NAME -n $SERVICE_NAME -i $DOCKER_IMAGE_NAME
and that does it.