Copy S3 File to Docker Image via Dockerfile - amazon-web-services

I have a Dockerfile that installs awscli and then tries to run aws s3 cp to fetch a file and copy it into the Docker image.
My Dockerfile is:
FROM my-kie-server:latest
USER root
RUN echo "ip_resolve=4" >> /etc/yum.conf
ENV http_proxy host.docker.internal:9000
ENV https_proxy host.docker.internal:9000
ENV HTTP_PROXY host.docker.internal:9000
ENV HTTPS_PROXY host.docker.internal:9000
RUN yum install -y maven
RUN yum install -y awscli
USER jboss
ARG AWS_ACCESS_KEY_ID
ARG AWS_SECRET_ACCESS_KEY
RUN aws s3 cp s3://myBucket/myPath/myFile.jar x.jar
But when I build the image I get this error:
fatal error: [SSL: UNKNOWN_PROTOCOL] unknown protocol (_ssl.c:618)
The command '/bin/sh -c aws s3 cp s3://myBucket/myPath/myFile.jar x.jar' returned a non-zero code: 1
I have tried using --no-verify-ssl on the aws s3 cp command but get the same error.
I've found very little online that mentions this UNKNOWN_PROTOCOL error. Any advice appreciated, thanks.

Related

Setup Apache Sedona on EMR

I want to use Apache Sedona for distributed GIS computing on AWS EMR, and I need the right bootstrap script to pull in all of its dependencies.
I tried setting up GeoSpark on EMR 5.33 using the jars listed here, but it didn't work because some dependencies were still missing.
I then set Sedona up manually on a local machine, worked out the difference in jars between a plain Spark 3 install and the Sedona setup, and came up with the following bootstrap script:
#!/bin/bash
sudo pip3 install numpy
sudo pip3 install boto3 pandas findspark shapely py4j attrs
sudo pip3 install geospark --no-dependencies
sudo pip3 install apache-sedona
sudo aws s3 cp s3://emr_setup/apache-sedona-1.0.1-incubating-bin/sedona-python-adapter-2.4_2.11-1.0.1-incubating.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/apache-sedona-1.0.1-incubating-bin/sedona-viz-2.4_2.11-1.0.1-incubating.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/geospark_bin/postgresql-42.2.23.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/sedona-core-2.4_2.11-1.0.1-incubating.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/stream-2.7.0.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/orc-core-1.5.5-nohive.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/jersey-media-jaxb-2.22.2.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/hadoop-mapreduce-client-common-2.6.5.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/hadoop-mapreduce-client-shuffle-2.6.5.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/org.w3.xlink-24.0.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/minlog-1.3.0.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/jersey-client-2.22.2.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/xz-1.5.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/pyrolite-4.13.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/hadoop-yarn-common-2.6.5.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/curator-recipes-2.6.0.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/aopalliance-1.0.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/commons-configuration-1.6.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/commons-beanutils-1.7.0.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/gt-metadata-24.0.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/spark-unsafe_2.11-2.4.7.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/objenesis-2.5.1.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/commons-httpclient-3.1.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/stax-api-1.0-2.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/hk2-api-2.4.0-b34.jar /usr/lib/spark/jars/
sudo aws s3 cp s3://emr_setup/spark_2.4_2.11_sedona_all_jars/apacheds-i18n-2.0.0-M15.jar /usr/lib/spark/jars/
The EMR cluster starts, but the notebooks attached to it don't seem to be able to start; the master appears to fail for some reason.
I need help preparing the right bootstrap script to install Apache Sedona on EMR 6.0.
Here is a complete tutorial for setting up Sedona on EMR on EC2.
EMR version: 6.9.0.
Installed applications: Hadoop 3.3.3, JupyterEnterpriseGateway 2.6.0, Livy 0.7.1, Spark 3.3.0
I am using it together with EMR Studio (notebooks).
In an S3 bucket, add a script with the following content:
#!/bin/bash
# EMR clusters only have ephemeral local storage. It does not really matter where we store the jars.
sudo mkdir /jars
# Download Sedona jar
sudo curl -o /jars/sedona-python-adapter-3.0_2.12-1.3.1-incubating.jar "https://repo1.maven.org/maven2/org/apache/sedona/sedona-python-adapter-3.0_2.12/1.3.1-incubating/sedona-python-adapter-3.0_2.12-1.3.1-incubating.jar"
# Download GeoTools jar
sudo curl -o /jars/geotools-wrapper-1.3.0-27.2.jar "https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.3.0-27.2/geotools-wrapper-1.3.0-27.2.jar"
# Install necessary python libraries
sudo python3 -m pip install pandas geopandas==0.10.2
sudo python3 -m pip install attrs matplotlib descartes apache-sedona==1.3.1
When you create an EMR cluster, specify the location of this script as a bootstrap action.
When you create an EMR cluster, add the following content in the software configuration:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.yarn.dist.jars": "/jars/sedona-python-adapter-3.0_2.12-1.3.1-incubating.jar,/jars/geotools-wrapper-1.3.0-27.2.jar",
      "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
      "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
      "spark.sql.extensions": "org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions"
    }
  }
]
The key point is to use Sedona 1.3.1-incubating, which can pick up jars specified in the spark.yarn.dist.jars property. The spark.jars property is ignored for EMR on EC2, since EMR uses YARN to deploy jars. See SEDONA-183.
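As a quick smoke test, here is a minimal sketch of a notebook cell, assuming the bootstrap script and spark-defaults above are in place and that spark is the session provided by the EMR Studio notebook:
# Register Sedona's spatial types and SQL functions on the notebook's existing session.
from sedona.register import SedonaRegistrator

SedonaRegistrator.registerAll(spark)

# If the jars from spark.yarn.dist.jars were picked up, this returns a point geometry.
spark.sql("SELECT ST_Point(1.0, 2.0) AS geom").show()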

AWS lambda: how can I run aws cli commands in lambda

I want to run AWS CLI commands from Lambda.
I have a pull request event that triggers when the approval state changes, and whenever it changes I need to run an AWS CLI command from Lambda, but the Lambda function says aws not found!
How do I get the status of PRs in my Lambda function?
Create a Lambda function, build an image and push it to ECR, have the Lambda function reference the image, and then test the image with an event. This is a good way to run things like aws s3 sync.
Testing locally:
docker run -p 9000:8080 repo/lambda:latest
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
app.py
import subprocess
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def run_command(command):
    try:
        logger.info('Running shell command: "{}"'.format(command))
        result = subprocess.run(command, stdout=subprocess.PIPE, shell=True)
        logger.info(
            "Command output:\n---\n{}\n---".format(result.stdout.decode("UTF-8"))
        )
    except Exception as e:
        logger.error("Exception: {}".format(e))
        return False
    return True

def handler(event, context):
    run_command('aws s3 ls')
Dockerfile (awscliv2, can make requirements file if needed)
FROM public.ecr.aws/lambda/python:3.9
RUN yum -y install unzip
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-2.0.30.zip" -o "awscliv2.zip" && \
unzip awscliv2.zip && \
./aws/install
COPY app.py ${LAMBDA_TASK_ROOT}
COPY requirements.txt .
RUN pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
CMD [ "app.handler" ]
Makefile (make all - login,build,tag,push to ecr repo)
ROOT:=$(shell dirname $(realpath $(lastword $(MAKEFILE_LIST))))
IMAGE_NAME:=repo/lambda
ECR_TAG:="latest"
AWS_REGION:="us-east-1"
AWS_ACCOUNT_ID:="xxxxxxxxx"
REGISTRY_URI=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${IMAGE_NAME}
REGISTRY_URI_WITH_TAG=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${IMAGE_NAME}:${ECR_TAG}

# Login to AWS ECR registry (must have docker running)
login:
	aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${REGISTRY_URI}

build:
	docker build --no-cache -t ${IMAGE_NAME}:${ECR_TAG} .

# Tag docker image
tag:
	docker tag ${IMAGE_NAME}:${ECR_TAG} ${REGISTRY_URI_WITH_TAG}

# Push to ECR registry
push:
	docker push ${REGISTRY_URI_WITH_TAG}

# Pull version from ECR registry
pull:
	docker pull ${REGISTRY_URI_WITH_TAG}

# Build docker image and push to AWS ECR registry
all: login build tag push
The default Lambda environment doesn't provide the AWS CLI; in fact, the idea of using it there is quite awkward. Anything the AWS CLI can do, you can do via an SDK such as boto3, which is provided in that environment.
You can, however, include binaries in your Lambda package if you wish, and then execute them.
You could also consider using a container image for your Lambda. You can find information here: https://docs.aws.amazon.com/lambda/latest/dg/images-create.html.
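For example, a minimal sketch of the aws s3 ls call above done with boto3 instead of shelling out (bucket listing only, no CLI binary needed):
import boto3

def handler(event, context):
    # boto3 ships with the Lambda Python runtime, so nothing extra needs packaging.
    s3 = boto3.client("s3")
    buckets = [b["Name"] for b in s3.list_buckets()["Buckets"]]
    print(buckets)
    return {"buckets": buckets}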

Docker Build using an Assumed Role Profile

I want to build a Docker image locally that copies a file from S3 and sets it as the file to be executed by the container.
How can I reference the profile I need for the S3 bucket inside the Dockerfile without using access keys?
dockerfile:
FROM onesysadmin/awscli:latest
RUN aws s3 cp s3://sample-bucket-dev-us-east-1/test_script.sh test_script.sh
RUN chmod 755 test_script.sh
CMD test_script.sh
.aws/credentials:
[master]
aws_access_key_id = ASIASF.......
aws_secret_access_key = 75opt1.......
aws_session_token = FwoGZXIvYXdzE......
aws_security_token = FwoGZXIvYXdzEFwoGZ......
[master-dev]
region = us-east-1
role_arn = arn:aws:iam::1234567890:role/master-admin
source_profile = master
i.e., I want to be able to use master-dev as the profile in my docker build command.
I ended up using Docker BuildKit.
I'm on a Mac and had to set experimental to true in my Docker Desktop settings (Docker --> Preferences --> Docker Engine):
{
  "debug": true,
  "experimental": true
}
Then I changed my Dockerfile:
# syntax = docker/dockerfile:experimental
FROM onesysadmin/awscli:latest
ARG PROFILE
ENV AWS_DEFAULT_PROFILE=$PROFILE
RUN --mount=type=secret,id=aws,target=/root/.aws/credentials aws sts get-caller-identity
RUN --mount=type=secret,id=aws,target=/root/.aws/credentials aws s3 cp s3://sample-bucket-dev-us-east-1/test_script.sh test_script.sh
RUN chmod 755 test_script.sh
CMD test_script.sh
And finally ran the build command:
DOCKER_BUILDKIT=1 docker build -t testing --build-arg PROFILE=master-dev \
--secret id=aws,src=$HOME/.aws/credentials .

Can you multi stage build a docker image with both aws/gsutil cli?

I am wondering if there is a straightforward way in Docker to build an image that has both the aws CLI and gsutil CLI installed on it for use. Unfortunately, an S3 bucket name containing periods produces a "Host ... returned an invalid certificate" error (https://github.com/GoogleCloudPlatform/gsutil/issues/267), and I cannot change the S3 bucket name, which means I cannot do the following:
gsutil -m cp -r "s3://path.with.periods/path/files" "gs://bucket_path/path"
so instead I'll have to do something like:
aws s3 cp --recursive --quiet "s3://path.with.periods/path/files" ./
gsutil -m cp -r "./" "gs://bucket_path/path"
but I was wondering if there is a straightforward Dockerfile that could run these commands?
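For reference, a minimal sketch of that same two-step copy done with the Python SDKs instead of the CLIs (boto3 and google-cloud-storage are assumptions here, not something the question's image already includes):
import boto3
from google.cloud import storage

s3 = boto3.client("s3")
gcs_bucket = storage.Client().bucket("bucket_path")  # destination bucket from the gs:// path above

# Walk the S3 prefix and re-upload each object to GCS one file at a time;
# the destination object naming is illustrative.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="path.with.periods", Prefix="path/files"):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        s3.download_file("path.with.periods", key, "/tmp/object")
        gcs_bucket.blob("path/" + key).upload_from_filename("/tmp/object")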

How to run aws configure in a travis deploy script?

I am trying to get travis-ci to run a custom deploy script that uses awscli to push a deployment up to my staging server.
In my .travis.yml file I have this:
before_deploy:
- 'curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"'
- 'unzip awscli-bundle.zip'
- './awscli-bundle/install -b ~/bin/aws'
- 'export PATH=~/bin:$PATH'
- 'aws configure'
And I have set up the following environment variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION
with their correct values in the travis-ci web interface.
However, when aws configure runs, it stops and waits for user input. How can I tell it to use the environment variables I have defined?
Darbio's solution works fine, but it doesn't take into consideration that you may end up pushing your AWS credentials to your repository.
That is a bad thing, especially if Docker is trying to pull a private image from one of your ECR repositories: it would mean storing your AWS production credentials in the .travis.yml file, which is far from ideal.
Fortunately, Travis gives you the possibility to encrypt environment variables, notification settings, and deploy API keys.
gem install travis
Do a travis login first of all; it will ask you for your GitHub credentials. Once you're logged in, go to your project root folder (where your .travis.yml file is) and encrypt your access key ID and secret access key:
travis encrypt AWS_ACCESS_KEY_ID="HERE_PUT_YOUR_ACCESS_KEY_ID" --add
travis encrypt AWS_SECRET_ACCESS_KEY="HERE_PUT_YOUR_SECRET_ACCESS_KEY" --add
Thanks to the --add option you'll end up with two new (encrypted) environment variables in your configuration file. Now just open your .travis.yml file and you should see something like this:
env:
  global:
    - secure: encrypted_stuff
    - secure: encrypted_stuff
Now you can make travis run a shell script that creates the ~/.aws/credentials file for you.
ecr_credentials.sh
#!/usr/bin/env bash
mkdir -p ~/.aws
cat > ~/.aws/credentials << EOL
[default]
aws_access_key_id = ${AWS_ACCESS_KEY_ID}
aws_secret_access_key = ${AWS_SECRET_ACCESS_KEY}
EOL
Then you just need to run the ecr_credentials.sh script from your .travis.yml file:
before_install:
- ./ecr_credentials.sh
Done! :-D
Source: Encryption keys on Travis CI
You can set these in a few ways.
Firstly, by creating a file at ~/.aws/config (or ~/.aws/credentials).
For example:
[default]
aws_access_key_id=foo
aws_secret_access_key=bar
region=us-west-2
Secondly, you can add environment variables for each of your settings.
For example, create the following environment variables:
AWS_DEFAULT_REGION
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
Thirdly, you can pass region in as a command line argument. For example:
aws eb deploy --region us-west-2
You won't need to run aws configure in these cases, as the CLI is already configured.
There is further AWS documentation on this page.
Following the advice from Darbio, I came up with this solution:
- stage: deploy
  name: "Deploy to AWS EKS"
  language: minimal
  before_install:
    # Install kubectl
    - curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
    - chmod +x ./kubectl
    - sudo mv ./kubectl /usr/local/bin/kubectl
    # Install AWS CLI
    - if ! [ -x "$(command -v aws)" ]; then curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" ; unzip awscliv2.zip ; sudo ./aws/install ; fi
    # export environment variables for AWS CLI (using Travis environment variables)
    - export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
    - export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
    - export AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION}
    # Setup kubectl config to use the desired AWS EKS cluster
    - aws eks update-kubeconfig --region ${AWS_DEFAULT_REGION} --name ${AWS_EKS_CLUSTER_NAME}
  deploy:
    - provider: script
      # bash script containing the kubectl commands to setup the cluster
      script: bash k8s-config/deployment.sh
      on:
        branch: master
It is also possible to avoid installing the AWS CLI altogether, but then you need to configure kubectl yourself:
kubectl config set-cluster <cluster-name> --server=<endpoint-url> --certificate-authority=<ca-cert>
kubectl config set-credentials <user-name> --client-certificate=<client-cert> --client-key=<client-key>
kubectl config set-context myContext --cluster=<cluster-name> --namespace=<namespace> --user=<user-name>
kubectl config use-context myContext
You can find most of the needed values in ~/.kube/config in your user's home directory, after you have run the aws eks update-kubeconfig command on your local machine.
The exceptions are the client certificate and key. I couldn't figure out where to get them from, and therefore needed to install the AWS CLI in the pipeline as well.