airflow docker awscliv2 using bitnami image - amazon-web-services

I am trying to get awscliv2 installed in a Docker image for Airflow. However, when I run the DAG I get the error below, and the alias is not being created, so I have to change it manually in the container. I am still pretty new to Docker.
no name!#f3d6d31933d8:/$ awscliv2 configure
18:51:03 - awscliv2 - ERROR - Command failed with code 127
Dockerfile:
# set up some variables
ARG IMAGE=airflow
ARG TAG=2.3.4
ARG STAGEPATH=/etc/airflow/builddeps
# builder stage
FROM bitnami/$IMAGE:$TAG as builder
# refresh the arg
ARG STAGEPATH
# user root is required for installing packages
USER root
# install build essentials
RUN install_packages build-essential unixodbc-dev curl gnupg2
# make paths, including apt archives or the download-only fails trying to cleanup
RUN mkdir -p $STAGEPATH/deb; mkdir -p /var/cache/apt/archives
# download & build pip wheels to directory
RUN mkdir -p $STAGEPATH/pip-wheels
RUN pip install wheel
RUN python -m pip wheel --wheel-dir=$STAGEPATH/pip-wheels \
    numpy \
    requests \
    pythonnet==3.0.0rc5 \
    pymssql \
    awscliv2 \
    apache-airflow-providers-odbc \
    apache-airflow-providers-microsoft-mssql \
    apache-airflow-providers-ssh \
    apache-airflow-providers-sftp \
    statsd
# next stage
FROM bitnami/$IMAGE:$TAG as airflow
# refresh the arg within this stage
ARG STAGEPATH
# user root is required for installing packages
USER root
# copy pre-built pip packages from first stage
RUN mkdir -p $STAGEPATH
COPY --from=builder $STAGEPATH $STAGEPATH
# install updated and required pip packages
RUN . /opt/bitnami/airflow/venv/bin/activate && python -m pip install --upgrade --no-index --find-links=$STAGEPATH/pip-wheels \
    numpy \
    requests \
    pythonnet==3.0.0rc5 \
    pymssql \
    awscliv2 \
    apache-airflow-providers-odbc \
    apache-airflow-providers-microsoft-mssql \
    apache-airflow-providers-ssh \
    apache-airflow-providers-sftp \
    statsd
# create awscliv2 alias
RUN alias aws='awsv2' /bin/bash
# return to airflow user
USER 1000
I expect awscliv2 to install with pip and the alias to be configured.
I have tried running this from the container command line, and the DAG still gives the error "command not found", exit code 128.
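A likely reason the alias never shows up is that a RUN alias ... line only defines the alias in the temporary shell that executes that single build step; it is not persisted into later layers or into interactive sessions. A minimal sketch of an alternative, assuming the awscliv2 pip package installs its awsv2 entry point into the Airflow virtualenv's bin directory, is to create a symlink instead:
# hypothetical fix: expose the venv's awsv2 entry point as `aws` on the PATH
RUN ln -s /opt/bitnami/airflow/venv/bin/awsv2 /usr/local/bin/aws
Separately, the exit code 127 from awscliv2 configure may mean that the wrapped AWS CLI v2 binary itself is missing from the image, since the awscliv2 pip package is only a wrapper around it.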

Related

"Timeout in polling result file" error when executing a Dataflow flex-template job

I've tried a lot of different things found online, but I'm still unable to solve the timeout error below:
2021-11-27T14:51:21.844520452Z Timeout in polling result file: gs://...
when submitting a Dataflow flex-template job. It goes into the Queued state and, after 14 mins {x} secs, goes to the Failed state with the above log message. My Dockerfile is as follows:
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base
ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}
COPY requirements.txt .
COPY test-beam.py .
# Do not include `apache-beam` in requirements.txt
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/test-beam.py"
# Setting Proxy
ENV http_proxy=http://proxy-web.{company_name}.com:80 \
https_proxy=http://proxy-web.{company_name}.com:80 \
no_proxy=127.0.0.1,localhost,.{company_name}.com,{company_name}.com,.googleapis.com
# Company Cert
RUN apt-get update && apt-get install -y curl \
&& curl http://{company_name}.com/pki/{company_name}%20Issuing%20CA.pem -o - | tr -d '\r' > /usr/local/share/ca-certificates/{company_name}.crt \
&& curl http://{company_name}.com/pki/{company_name}%20Root%20CA.pem -o - | tr -d '\r' > /usr/local/share/ca-certificates/{company_name}-root.crt \
&& update-ca-certificates \
&& apt-get remove -y --purge curl \
&& apt-get autoremove -y \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Set pip config to point to Company Cert
RUN pip config set global.cert /etc/ssl/certs/ca-certificates.crt
# Install apache-beam and other dependencies to launch the pipeline
RUN pip install --no-cache-dir --upgrade pip \
&& pip install --no-cache-dir apache-beam[gcp]==2.32.0 \
&& pip install --no-cache-dir -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE \
# Download the requirements to speed up launching the Dataflow job.
&& pip download --no-cache-dir --dest /tmp/dataflow-requirements-cache -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE
# Since we already downloaded all the dependencies, there's no need to rebuild everything.
ENV PIP_NO_DEPS=True
ENV http_proxy= \
https_proxy= \
no_proxy=
ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]
And requirements.txt:
numpy
setuptools
scipy
wavefile
I know the Python script used above, test-beam.py, works, as it executes successfully locally using a DirectRunner.
I have gone through many SO posts and GCP's own troubleshooting guide here, aimed at this error, but without success. As you can see from my Dockerfile, I have already done the following:
Installing apache-beam[gcp] separately and not including it in my requirements.txt file.
Pre-downloading all dependencies using pip download --no-cache-dir --dest /tmp/dataflow-requirements-cache -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE.
Setting ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"] explicitly, as it seems this is not set in the base image gcr.io/dataflow-templates-base/python3-template-launcher-base, as found by executing docker inspect on it (am I correct about this? see the snippet after this list).
Unsetting the company proxy at the end, as it seems to have been the cause of timeout issues seen in job logs from previous runs.
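For reference, one way to check what entrypoint the base image actually declares (the third point above) is to inspect its config:
docker pull gcr.io/dataflow-templates-base/python3-template-launcher-base
docker inspect --format '{{json .Config.Entrypoint}}' gcr.io/dataflow-templates-base/python3-template-launcher-base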
What am I missing? How can I fix this issue?

'pip' is not recognized as an internal or external command, operable program or batch file on docker web application

I am working on a web project using Django and Docker. The tutorial references how to set up an email service. I registered with AWS and followed a guide on how to link it to Docker. The first step is to run "pip install --upgrade boto3", which is followed by the error in the title. How do I install boto3 through Docker?
You can use the docker-boto3 Docker image instead of installing and maintaining an image yourself:
docker run --rm -t \
    -v $HOME/.aws:/home/worker/.aws:ro \
    -v "$(pwd)"/example:/work \
    shinofara/docker-boto3 python example.py
Or you can create your own Docker image:
FROM alpine:latest
RUN apk add --update python3 py3-pip \
    && pip3 install --upgrade pip \
    && pip3 install boto3 requests PyYAML pg8000 -U \
    && ln -sv /usr/bin/python3 /usr/bin/python
ENTRYPOINT [ "python3" ]
Boto3 Dockerfile
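A possible way to build and run the custom image above (the my-boto3 tag and the example script path are placeholders):
docker build -t my-boto3 .
docker run --rm -v $HOME/.aws:/root/.aws:ro -v "$(pwd)":/work my-boto3 /work/example.py
Because the ENTRYPOINT is python3, whatever path you pass as the command is executed as a Python script.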

Docker "ImportError: No module named boto" when Boto is installed

In my Alpine-based Docker image, I have installed boto3. Output of the docker build run by docker-compose:
Running setup.py install for s3cmd: started
Running setup.py install for s3cmd: finished with status 'done'
Successfully installed awscli-1.14.5 **boto3-1.13.15** botocore-1.8.9 colorama-0.3.7 docutils-0.16 futures-3.3.0 jmespath-0.10.0 pyasn1-0.4.8 python-dateutil-2.8.1 pyyaml-5.3.1 rsa-3.4.2 s3cmd-2.0.1 s3transfer-0.1.13 six-1.15.0
A snippet of the Dockerfile looks like below:
FROM alpine:3.6
RUN apk -v --update add \
python3 \
py-pip \
groff \
less \
mailcap \
curl \
jq && \
pip install --upgrade pip && \
pip install --no-cache-dir awscli==1.14.5 s3cmd==2.0.1 boto3 pyyaml && \
apk -v --purge del py-pip && \
rm /var/cache/apk/*
When I try to execute my Python package via docker-compose using the same Dockerfile, it says:
setup-application_1 | File "test_cf_create_or_update.py", line 1, in <module>
setup-application_1 | import boto3
setup-application_1 | ModuleNotFoundError: No module named 'boto3'
localstack_setup-application_1 exited with code 1
I don't know how to resolve this.
Try:
pip3 install boto3 -t .
It seems that boto3 is not going to the right target; with the -t flag and ".", you install it into the current directory so the script can see it.
Since you are installing python3, you should be using pip3:
pip3 install --upgrade pip && \
pip3 install --no-cache-dir awscli==1.14.5 s3cmd==2.0.1 boto3 pyyaml && \
And then, in the container, you will use python3 to execute your script.
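For example, a sketch of running the script with python3 inside the rebuilt image (the my-alpine-aws tag is just a placeholder):
docker build -t my-alpine-aws .
docker run --rm -v "$(pwd)":/work -w /work my-alpine-aws python3 test_cf_create_or_update.py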
Also worth noting: since you have pinned the versions of awscli and s3cmd, you may get warnings about libraries being too old or too new when building the Docker image.

gcc error while building docker image for django on windows

I am trying to build a Docker image using Visual Studio Code, following this tutorial: https://code.visualstudio.com/docs/python/tutorial-deploy-containers.
I created a Django app with a connection to an MS SQL Server on Azure using the package pyodbc.
During the build of the Docker image I receive the following error messages:
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1
----------------------------------------
Failed building wheel for pyodbc
and
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1
----------------------------------------
Failed building wheel for typed-ast
I read solutions for Linux systems where one should install python-dev, but since I am working on a Windows machine, this is no solution.
Then I read that on Windows all the needed files are in the 'include' directory of the Python installation. But in a venv installation this directory is empty... so I created a directory junction to the original 'include'. The error still exists.
My Dockerfile is included below.
# Python support can be specified down to the minor or micro version
# (e.g. 3.6 or 3.6.3).
# OS Support also exists for jessie & stretch (slim and full).
# See https://hub.docker.com/r/library/python/ for all supported Python
# tags from Docker Hub.
FROM tiangolo/uwsgi-nginx:python3.6-alpine3.7
# Indicate where uwsgi.ini lives
ENV UWSGI_INI uwsgi.ini
# Tell nginx where static files live (as typically collected using Django's
# collectstatic command).
ENV STATIC_URL /app/static_collected
# Copy the app files to a folder and run it from there
WORKDIR /app
ADD . /app
# Make app folder writable for the sake of db.sqlite3, and make that file also writable.
# RUN chmod g+w /app
# RUN chmod g+w /app/db.sqlite3
# If you prefer miniconda:
#FROM continuumio/miniconda3
LABEL Name=hello_django Version=0.0.1
EXPOSE 8000
# Using pip:
RUN python3 -m pip install -r requirements.txt
CMD ["python3", "-m", "hello_django"]
# Using pipenv:
#RUN python3 -m pip install pipenv
#RUN pipenv install --ignore-pipfile
#CMD ["pipenv", "run", "python3", "-m", "hello_django"]
# Using miniconda (make sure to replace 'myenv' w/ your environment name):
#RUN conda env create -f environment.yml
#CMD /bin/bash -c "source activate myenv && python3 -m hello_django"
I could use some help building the image without the errors.
Based on the answer from 2ps, I added these lines near the top of the Dockerfile:
FROM tiangolo/uwsgi-nginx:python3.6-alpine3.7
RUN apk update \
&& apk add apk add gcc libc-dev g++ \
&& apk add libffi-dev libxml2 libffi-dev \
&& apk add unixodbc-dev mariadb-dev python3-dev
and received a new error...
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
v3.7.1-98-g2f2e944c59 [http://dl-cdn.alpinelinux.org/alpine/v3.7/main]
v3.7.1-105-g7db92f4321 [http://dl-cdn.alpinelinux.org/alpine/v3.7/community]
OK: 9053 distinct packages available
ERROR: unsatisfiable constraints:
add (missing):
required by: world[add]
apk (missing):
required by: world[apk]
The command '/bin/sh -c apk update && apk add apk add gcc libc-dev g++ && apk add libffi-dev libxml2 libffi-dev && apk add unixodbc-dev mariadb-dev python3-dev' returned a non-zero code: 2
I found out that adding
RUN echo "ipv6" >> /etc/modules
helped with the errors above. Taken from: https://github.com/gliderlabs/docker-alpine/issues/55
The app now works, except that the intended connection to the MS SQL database still does not work.
Error at /
('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 13 for SQL Server' : file not found (0) (SQLDriverConnect)")
I think I should get my hands dirty with some Docker documentation.
I gave up on the Alpine solution and switched to Debian:
FROM python:3.7
# needed files for pyodbc
RUN apt-get update
RUN apt-get install gcc libc-dev g++ libffi-dev libxml2 libffi-dev unixodbc-dev -y
# MS SQL driver 17 for debian
RUN apt-get install -y apt-transport-https \
    && curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
    && curl https://packages.microsoft.com/config/debian/9/prod.list > /etc/apt/sources.list.d/mssql-release.list \
    && apt-get update \
    && ACCEPT_EULA=Y apt-get install -y msodbcsql17
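As a quick sanity check, since the earlier error referenced 'ODBC Driver 13 for SQL Server' while msodbcsql17 registers version 17, it is worth confirming which driver name the image ends up with and pointing the connection settings at that name, for example:
# optional check: msodbcsql17 should register "ODBC Driver 17 for SQL Server" here
RUN cat /etc/odbcinst.ini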
You'll need to use apk to install gcc and the other native dependencies needed to build your pip dependencies. For the ones that you listed (typed-ast and pyodbc), I think they would be:
RUN apk update \
    && apk add gcc libc-dev g++ \
    && apk add libffi-dev libxml2 \
    && apk add unixodbc-dev mariadb-dev python3-dev

Dockerfile PHP, NGINX and Composer

I'm having a difficult time finding resources for creating a Dockerfile that sets up a proper PHP, Composer and NGINX environment.
I can create a docker-compose container set, but I cannot get Composer installed that way. If anyone has good resources to point me to for writing a full PHP, Composer and NGINX Dockerfile, that would be appreciated.
This is my Dockerfile example for a similar scenario; I hope it helps. Feedback and ideas are welcome!
FROM php:7.4-fpm
# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
curl \
libpng-dev \
libonig-dev \
libxml2-dev \
libzip-dev \
zip \
unzip \
software-properties-common \
lsb-release \
apt-transport-https \
ca-certificates \
wget \
gnupg2
# Clear cache
RUN apt-get clean && rm -rf /var/lib/apt/lists/*
# Install PHP extensions (some are already compiled in the PHP base image)
RUN docker-php-ext-install pdo_mysql mbstring exif pcntl bcmath gd json zip xml
# Get latest Composer
COPY --from=composer:latest /usr/bin/composer /usr/bin/composer
# Create myuser
RUN useradd -G www-data,root -u 1000 -d /home/myuser myuser
RUN mkdir -p /home/myuser/.composer && \
chown -R myuser:myuser /home/myuser
# Set working directory
WORKDIR /var/www/mypage
USER myuser
You can add NGINX to this container, but then I recommend using supervisord to control multiple processes.
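As a usage sketch (the my-php-app tag is just a placeholder), Composer is then available inside the container, which also covers the original problem of getting Composer installed:
docker build -t my-php-app .
docker run --rm -it -v "$(pwd)":/var/www/mypage my-php-app composer install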