Docker "ImportError: No module named boto" when Boto is installed - amazon-web-services

In my alpine based docker image, I have installed boto3.output of dockerbuild from docker-compose.
Running setup.py install for s3cmd: started
Running setup.py install for s3cmd: finished with status 'done'
Successfully installed awscli-1.14.5 **boto3-1.13.15** botocore-1.8.9 colorama-0.3.7 docutils-0.16 futures-3.3.0 jmespath-0.10.0 pyasn1-0.4.8 python-dateutil-2.8.1 pyyaml-5.3.1 rsa-3.4.2 s3cmd-2.0.1 s3transfer-0.1.13 six-1.15.0
snippet of dockerfile looks like below,
FROM alpine:3.6
RUN apk -v --update add \
python3 \
py-pip \
groff \
less \
mailcap \
curl \
jq && \
pip install --upgrade pip && \
pip install --no-cache-dir awscli==1.14.5 s3cmd==2.0.1 boto3 pyyaml && \
apk -v --purge del py-pip && \
rm /var/cache/apk/*
when I try to execute my python package via docker-compose using same Dockerfile:
its says
setup-application_1 | File "test_cf_create_or_update.py", line 1, in <module>
setup-application_1 | import boto3
setup-application_1 | ModuleNotFoundError: No module named 'boto3'
localstack_setup-application_1 exited with code 1
I don't know, how to resolve this.

Try:
pip3 install boto3 -t .
It seems that boto is not going to the right target, with "t" flag and ".", you make sure it is seen by all!

Since you are installing python3, you should be using pip3:
pip3 install --upgrade pip && \
pip3 install --no-cache-dir awscli==1.14.5 s3cmd==2.0.1 boto3 pyyaml && \
And the in the container you will use python3 to execute your script.
Also worth noting is the fact that since you have fixed the versions of awscli and s3cmd, you will be getting warnings about either too old or too new libraries when building the docker image.

Related

airflow docker awscliv2 using bitnami image

I am trying to get awscliv2 installed in docker image for airflow. However, when I run the dag I get this error and the alias is not being created I have to manually change it in the container. I am still pretty new to docker.
no name!#f3d6d31933d8:/$ awscliv2 configure
18:51:03 - awscliv2 - ERROR - Command failed with code 127
Dockerfile:
# set up some variables
ARG IMAGE=airflow
ARG TAG=2.3.4
ARG STAGEPATH=/etc/airflow/builddeps
# builder stage
FROM bitnami/$IMAGE:$TAG as builder
# refresh the arg
ARG STAGEPATH
# user root is required for installing packages
USER root
# install build essentials
RUN install_packages build-essential unixodbc-dev curl gnupg2
# make paths, including apt archives or the download-only fails trying to cleanup
RUN mkdir -p $STAGEPATH/deb; mkdir -p /var/cache/apt/archives
# download & build pip wheels to directory
RUN mkdir -p $STAGEPATH/pip-wheels
RUN pip install wheel
RUN python -m pip wheel --wheel-dir=$STAGEPATH/pip-wheels \
numpy\
requests \
pythonnet==3.0.0rc5 \
pymssql \
awscliv2 \
apache-airflow-providers-odbc \
apache-airflow-providers-microsoft-mssql \
apache-airflow-providers-ssh \
apache-airflow-providers-sftp \
statsd
# next stage
FROM bitnami/$IMAGE:$TAG as airflow
# refresh the arg within this stage
ARG STAGEPATH
# user root is required for installing packages
USER root
# copy pre-built pip packages from first stage
RUN mkdir -p $STAGEPATH
COPY --from=builder $STAGEPATH $STAGEPATH
# install updated and required pip packages
RUN . /opt/bitnami/airflow/venv/bin/activate && python -m pip install --upgrade --no-index --find-links=$STAGEPATH/pip-wheels \
numpy\
requests \
pythonnet==3.0.0rc5 \
pymssql \
awscliv2 \
apache-airflow-providers-odbc \
apache-airflow-providers-microsoft-mssql \
apache-airflow-providers-ssh \
apache-airflow-providers-sftp \
statsd
# createawscliv2 alias
RUN alias aws='awsv2' /bin/bash
# return to airflow user
USER 1000
I expect the awscliv2 to install with PIP and configure the alias.
I have tried running this from the container command line and the dag still gives the error command not found exit code 128

How can I make packages installed using Poetry accessible in Docker?

I have a Django REST framework API that I'm trying to run in Docker. The project uses Poetry 1.1.12. When running, I can see that Poetry is installed correctly, and that Poetry installs the packages in my pyproject.toml, including Django. I'm using supervisor to run the API using Daphne, as well as some other tasks (like collecting static files).
However, when supervisor runs the app, I get:
Traceback (most recent call last):
File "/home/docker/api/manage.py", line 22, in <module>
main()
File "/home/docker/api/manage.py", line 13, in main
raise ImportError(
ImportError: Couldn't import Django. Are you sure it's installed and available on your PYTHONPATH environment variable? Did you forget to activate a virtual environment?
Traceback (most recent call last):
File "/home/docker/api/manage.py", line 11, in main
from django.core.management import execute_from_command_line
ModuleNotFoundError: No module named 'django'
Notice how I set POETRY_VIRTUALENVS_CREATE=false and ENV PATH="/root/.local/bin:${PATH}". According to the poetry installation script, that is the path that needs to be added to PATH.
Here is an abridged versioned of my Dockerfile:
FROM python:3.9-slim-buster
ENV PATH="/root/.local/bin:${PATH}"
RUN apt-get update && apt-get install -y --no-install-recommends \
... \
curl \
supervisor \
&& curl -sSL 'https://install.python-poetry.org' | python - && poetry --version \
&& apt-get remove -y curl \
&& apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
&& apt-get clean -y && rm -rf /var/lib/apt/lists/* \
&& rm -rf /var/lib/apt/lists/*
COPY poetry.lock pyproject.toml /home/docker/api/
WORKDIR /home/docker/api
RUN if [ "$DEBUG" = "false" ] \
; then POETRY_VIRTUALENVS_CREATE=false poetry install --no-dev --no-interaction --no-ansi -vvv --extras "production" \
; else POETRY_VIRTUALENVS_CREATE=false poetry install --no-interaction --no-ansi -vvv --extras "production" \
; fi
COPY . /home/docker/api/
COPY .docker/services/api/files/supervisor.conf /etc/supervisor/conf.d/
CMD ["supervisord", "-n"]
Which is pretty much how I see others doing it. Any ideas?
Poetry documents itself as trying very very hard to always run inside a virtual environment. However, a Docker container is itself isolation from other Pythons, and it's normal (and easiest) to install packages in the "system" Python.
There is a poetry export command that can convert Poetry's files to a normal pip requirements.txt file, and from there you can RUN pip install in your Dockerfile. You could use a multi-stage Dockerfile to generate that file without actually including Poetry in your main image.
FROM python:3.9-slim-buster AS poetry
RUN pip install poetry
WORKDIR /app
COPY pyproject.toml poetry.lock .
RUN poetry export -f requirements.txt --output requirements.txt
FROM python:3.9-slim-buster
WORKDIR /app
COPY --from=poetry /app/requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["./manage.py", "runserver", "0.0.0.0:8000"]
django should show up in the generated requirements.txt file, and since pip install installs it as a "normal" "system" Python package, your application should be able to see it normally, without tweaking environment variables or other settings.
Could it be because of a missing DJANGO_SETTINGS_MODULE environment variable?

Module errorr for airflow oracle hook

I am trying to set up an Oracle database connection on airflow.
I am getting this error:
ModuleNotFoundError: No module named 'airflow.providers.oracle' when using: from airflow.provideres.oracle.hooks.oracle import OracleHook
Part of my dag file:
from airflow.decorators import task
from airflow.providers.oracle.hooks.oracle import OracleHook
def exe_query_oracle_hook():
hook = OracleHook(oracle_conn_id="oracle_conn")
df = hook.get_pandas_df(sql='SELECT * FROM TABLE')
print(df.to_string())
I tried installing pip install apache-airflow-providers-oracle and most were already required, my current version is 2.1.0. I also followed the docs: airflow building custom images. Here is my Dockerfile
FROM apache/airflow:2.1.0
ARG ORACLE_VERSION=11.2.0.4.0
ARG ORACLE_SHORT_VER=11204
ENV CLIENT_ZIP=instantclient-basiclite-linux.x64-${ORACLE_VERSION}.zip
ENV SDK_ZIP=instantclient-sdk-linux.x64-${ORACLE_VERSION}.zip
ENV ORACLE_HOME=/opt/oracle
ENV TNS_ADMIN ${ORACLE_HOME}/network/admin
WORKDIR ${ORACLE_HOME}
USER root
RUN apt-get update \
&& apt-get -yq install unzip curl \
&& apt-get clean
COPY dockerfiles/${CLIENT_ZIP} ${ORACLE_HOME}/${CLIENT_ZIP}
COPY dockerfiles/${SDK_ZIP} ${ORACLE_HOME}/${SDK_ZIP}
RUN unzip ${ORACLE_HOME}/${CLIENT_ZIP} && unzip ${ORACLE_HOME}/${SDK_ZIP} \
&& rm -f *.zip
VOLUME ["${TNS_ADMIN}"]
RUN apt-get -yq install libaio1 \
&& apt-get autoremove \
&& apt-get clean \
&& echo ${ORACLE_HOME} > /etc/ld.so.conf.d/oracle.conf \
&& mkdir -p ${TNS_ADMIN} \
&& ldconfig \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN pip install --no-cache-dir apache-airflow-providers-oracle
USER 1001
Not sure what else to try, can anyone please provide some assistance? Thanks.
You have to run the pip install as airflow user, currently you run it as root.
...
USER 1001
RUN pip install --no-cache-dir apache-airflow-providers-oracle

"Timeout in polling result file" error when executing a Dataflow flex-template job

I've tried a lot of different things found online, but I'm still unable to solve the below timeout error:
2021-11-27T14:51:21.844520452ZTimeout in polling result file: gs://...
when submitting a Dataflow flex-template job. It goes into Queued state and after 14 mins {x} secs goes to Failed state with the above log message. My Dockerfile is as follows:
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base
ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}
COPY requirements.txt .
COPY test-beam.py .
# Do not include `apache-beam` in requirements.txt
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/test-beam.py"
# Setting Proxy
ENV http_proxy=http://proxy-web.{company_name}.com:80 \
https_proxy=http://proxy-web.{company_name}.com:80 \
no_proxy=127.0.0.1,localhost,.{company_name}.com,{company_name}.com,.googleapis.com
# Company Cert
RUN apt-get update && apt-get install -y curl \
&& curl http://{company_name}.com/pki/{company_name}%20Issuing%20CA.pem -o - | tr -d '\r' > /usr/local/share/ca-certificates/{company_name}.crt \
&& curl http://{company_name}.com/pki/{company_name}%20Root%20CA.pem -o - | tr -d '\r' > /usr/local/share/ca-certificates/{company_name}-root.crt \
&& update-ca-certificates \
&& apt-get remove -y --purge curl \
&& apt-get autoremove -y \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Set pip config to point to Company Cert
RUN pip config set global.cert /etc/ssl/certs/ca-certificates.crt
# Install apache-beam and other dependencies to launch the pipeline
RUN pip install --no-cache-dir --upgrade pip \
&& pip install --no-cache-dir apache-beam[gcp]==2.32.0 \
&& pip install --no-cache-dir -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE \
# Download the requirements to s7peed up launching the Dataflow job.
&& pip download --no-cache-dir --dest /tmp/dataflow-requirements-cache -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE
# Since we already downloaded all the dependencies, there's no need to rebuild everything.
ENV PIP_NO_DEPS=True
ENV http_proxy= \
https_proxy= \
no_proxy=
ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]
And requirements.py:
numpy
setuptools
scipy
wavefile
I know my Python script used above test-beam.py works as it executes successfully locally using a DirectRunner.
I have gone through many SO posts and GCP's own troubleshooting guide here aimed at this error, however to no success. As you can see from my Dockerfile, I have done the following in it:
Installing apache-beam[gcp] separately and not including it in my requirements.txt file.
Pre-downloading all dependencies using pip download --no-cache-dir --dest /tmp/dataflow-requirements-cache -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE.
Setting ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"] explicitly as it seems this is not set in the base image gcr.io/dataflow-templates-base/python3-template-launcher-base as found by executing docker inspect on it (am I correct about this?).
Unsetting company proxy at the end as it seems to be the cause of timeout issues seen in job logs from previous runs.
What am I missing? How can I fix this issue?

'pip' is not recognized as an internal or external command, operable program or batch file on docker web application

I am undergoing a web project using django and docker. The tutorial references how to set up an email service. I registered with AWS and followed a guide of how to link it to docker. The first step is to run "pip install --upgrade boto3". This is followed by the error in the title. How do I install boto3 through docker?
You can use docker-boto3 docker image instead of installing and maintaining a docker image for your self.
docker run --rm -t \
-v $HOME/.aws:/home/worker/.aws:ro \
-v ${pwd}/example:/work \
shinofara/docker-boto3 python example.py
or you can create your own docker image
FROM alpine:latest
RUN apk add --update python3 \
&& pip3 install --upgrade pip \
&& pip3 install boto3 requests PyYAML pg8000 -U \
&& ln -sv /usr/bin/python3 /usr/bin/python
ENTRYPOINT [ "python3" ]
Boto3 Dockerfile