I am trying to set up an Oracle database connection in Airflow.
I am getting this error:
ModuleNotFoundError: No module named 'airflow.providers.oracle' when using: from airflow.providers.oracle.hooks.oracle import OracleHook
Part of my DAG file:
from airflow.decorators import task
from airflow.providers.oracle.hooks.oracle import OracleHook
def exe_query_oracle_hook():
    hook = OracleHook(oracle_conn_id="oracle_conn")
    df = hook.get_pandas_df(sql='SELECT * FROM TABLE')
    print(df.to_string())
I tried pip install apache-airflow-providers-oracle, and most of the dependencies were already satisfied; my current Airflow version is 2.1.0. I also followed the Airflow docs on building custom images. Here is my Dockerfile:
FROM apache/airflow:2.1.0
ARG ORACLE_VERSION=11.2.0.4.0
ARG ORACLE_SHORT_VER=11204
ENV CLIENT_ZIP=instantclient-basiclite-linux.x64-${ORACLE_VERSION}.zip
ENV SDK_ZIP=instantclient-sdk-linux.x64-${ORACLE_VERSION}.zip
ENV ORACLE_HOME=/opt/oracle
ENV TNS_ADMIN=${ORACLE_HOME}/network/admin
WORKDIR ${ORACLE_HOME}
USER root
RUN apt-get update \
&& apt-get -yq install unzip curl \
&& apt-get clean
COPY dockerfiles/${CLIENT_ZIP} ${ORACLE_HOME}/${CLIENT_ZIP}
COPY dockerfiles/${SDK_ZIP} ${ORACLE_HOME}/${SDK_ZIP}
RUN unzip ${ORACLE_HOME}/${CLIENT_ZIP} && unzip ${ORACLE_HOME}/${SDK_ZIP} \
&& rm -f *.zip
VOLUME ["${TNS_ADMIN}"]
RUN apt-get -yq install libaio1 \
&& apt-get autoremove \
&& apt-get clean \
&& echo ${ORACLE_HOME} > /etc/ld.so.conf.d/oracle.conf \
&& mkdir -p ${TNS_ADMIN} \
&& ldconfig \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN pip install --no-cache-dir apache-airflow-providers-oracle
USER 1001
I'm not sure what else to try; can anyone please provide some assistance? Thanks.
You have to run the pip install as the airflow user; currently you run it as root. In the official image, packages installed by root end up under root's home directory rather than the airflow user's, so the airflow user's Python environment cannot import the provider.
...
USER 1001
RUN pip install --no-cache-dir apache-airflow-providers-oracle
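To double-check that the provider is now visible to the airflow user, something like the following should import cleanly after rebuilding (the my-airflow tag is just a placeholder):
docker build -t my-airflow .
docker run --rm --entrypoint python my-airflow -c "from airflow.providers.oracle.hooks.oracle import OracleHook; print('ok')"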
Related
We have been using Google Cloud Build via build triggers for our GitHub repository, which holds a C++ application that is deployed to a Google Kubernetes Engine cluster.
As seen above, our build configuration comes from a Dockerfile located in our GitHub repository.
Everything is working as expected, however our builds take 55+ minutes. I would like to add Kaniko cache support as suggested [here], but the Google Cloud documentation only shows how to add it via a YAML file, as below:
steps:
- name: 'gcr.io/kaniko-project/executor:latest'
  args:
  - --destination=gcr.io/$PROJECT_ID/image
  - --cache=true
  - --cache-ttl=XXh
How can I achieve Kaniko builds with a Dockerfile-based trigger?
FROM --platform=amd64 ubuntu:22.10
ENV GCSFUSE_REPO gcsfuse-stretch
RUN apt-get update && apt-get install --yes --no-install-recommends \
ca-certificates \
curl \
gnupg \
&& echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" \
| tee /etc/apt/sources.list.d/gcsfuse.list \
&& curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - \
&& apt-get update \
&& apt-get install --yes gcsfuse \
&& apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
EXPOSE 80
RUN \
sed -i 's/# \(.*multiverse$\)/\1/g' /etc/apt/sources.list && \
apt-get update && \
apt-get -y upgrade && \
apt-get install -y build-essential && \
apt-get install -y gcc && \
apt-get install -y software-properties-common && \
apt install -y cmake && \
apt-get install -y make && \
apt-get install -y clang && \
apt-get install -y mesa-common-dev && \
apt-get install -y git && \
apt-get install -y xorg-dev && \
apt-get install -y nasm && \
apt-get install -y byobu curl git htop man unzip vim wget && \
rm -rf /var/lib/apt/lists/*
# Update and upgrade repo
RUN apt-get update -y -q && apt-get upgrade -y -q
COPY . /app
RUN cd /app
RUN ls -la
# Set environment variables.
ENV HOME /root
ENV WDIR /app
# Define working directory.
WORKDIR /app
RUN cd /app/lib/glfw && cmake -G "Unix Makefiles" && make && apt-get install libx11-dev
RUN apt-cache policy libxrandr-dev
RUN apt install libxrandr-dev
RUN cd /app/lib/ffmpeg && ./configure && make && make install
RUN cmake . && make
# Define default command.
CMD ["bash"]
Any suggestions are quite welcome.
As I mentioned in the comment, you can only configure Kaniko in your cloudbuild.yaml file, as that is also the only option shown in this GitHub link, but you can add the --dockerfile argument to point at your Dockerfile's path.
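A minimal sketch of such a cloudbuild.yaml with the --dockerfile argument added (the image name, Dockerfile path, and cache TTL are placeholders to adjust):
steps:
- name: 'gcr.io/kaniko-project/executor:latest'
  args:
  - --destination=gcr.io/$PROJECT_ID/image
  - --dockerfile=Dockerfile   # path relative to the build context
  - --cache=true
  - --cache-ttl=6h
The trigger then needs to use this file as its build configuration ("Cloud Build configuration file" instead of "Dockerfile" in the trigger settings).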
I can build this Dockerfile normally, but when I run the container, the Python application crashes. After a while, I got into the container to debug and realized this happened because the mariadb service was somehow down, even though I had started it in this line: RUN service mariadb start && sleep 3 && \. I have already worked around this by creating another Dockerfile with different commands, but does anyone know why the mariadb service suddenly went down?
FROM debian
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt install python3 python3-venv debconf-utils -y && \
echo mariadb-server mysql-server/root_password password r00tp#ssw0rd | debconf-set-selections && \
echo mariadb-server mysql-server/root_password_again password r00tp#ssw0rd | debconf-set-selections && \
DEBIAN_FRONTEND=noninteractive apt-get install -y \
mariadb-server \
&& \
apt-get clean && rm -rf /var/lib/apt/lists/*
WORKDIR /app
RUN python3 -m venv venv
COPY requirements.txt .
RUN /app/venv/bin/pip3 install -r requirements.txt
COPY . .
RUN useradd -ms /bin/bash app
RUN chown app:app -R /app
RUN service mariadb start && sleep 3 && \
mysql -uroot -pr00tp#ssw0rd -e "CREATE USER app#localhost IDENTIFIED BY 'sup3r#ppp#ssw0rd';CREATE DATABASE my_lab_1; GRANT ALL PRIVILEGES ON my_lab_1.* TO 'app'#'localhost';" && \
mysql -uroot -pr00tp#ssw0rd -D "my_lab_1" < makedb.sql
EXPOSE 8000
CMD ["/app/venv/bin/python3","/app/run.py"]
I've tried a lot of different things found online, but I'm still unable to solve the below timeout error:
2021-11-27T14:51:21.844520452Z Timeout in polling result file: gs://...
when submitting a Dataflow Flex Template job. It goes into the Queued state and, after 14 mins {x} secs, moves to the Failed state with the above log message. My Dockerfile is as follows:
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base
ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}
COPY requirements.txt .
COPY test-beam.py .
# Do not include `apache-beam` in requirements.txt
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/test-beam.py"
# Setting Proxy
ENV http_proxy=http://proxy-web.{company_name}.com:80 \
https_proxy=http://proxy-web.{company_name}.com:80 \
no_proxy=127.0.0.1,localhost,.{company_name}.com,{company_name}.com,.googleapis.com
# Company Cert
RUN apt-get update && apt-get install -y curl \
&& curl http://{company_name}.com/pki/{company_name}%20Issuing%20CA.pem -o - | tr -d '\r' > /usr/local/share/ca-certificates/{company_name}.crt \
&& curl http://{company_name}.com/pki/{company_name}%20Root%20CA.pem -o - | tr -d '\r' > /usr/local/share/ca-certificates/{company_name}-root.crt \
&& update-ca-certificates \
&& apt-get remove -y --purge curl \
&& apt-get autoremove -y \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Set pip config to point to Company Cert
RUN pip config set global.cert /etc/ssl/certs/ca-certificates.crt
# Install apache-beam and other dependencies to launch the pipeline
RUN pip install --no-cache-dir --upgrade pip \
&& pip install --no-cache-dir apache-beam[gcp]==2.32.0 \
&& pip install --no-cache-dir -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE \
# Download the requirements to speed up launching the Dataflow job.
&& pip download --no-cache-dir --dest /tmp/dataflow-requirements-cache -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE
# Since we already downloaded all the dependencies, there's no need to rebuild everything.
ENV PIP_NO_DEPS=True
ENV http_proxy= \
https_proxy= \
no_proxy=
ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]
And requirements.txt:
numpy
setuptools
scipy
wavefile
I know the Python script used above, test-beam.py, works, as it executes successfully locally using the DirectRunner.
I have gone through many SO posts and GCP's own troubleshooting guide here aimed at this error, but with no success. As you can see from my Dockerfile, I have already done the following:
Installing apache-beam[gcp] separately and not including it in my requirements.txt file.
Pre-downloading all dependencies using pip download --no-cache-dir --dest /tmp/dataflow-requirements-cache -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE.
Setting ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"] explicitly, as it seems this is not set in the base image gcr.io/dataflow-templates-base/python3-template-launcher-base, as found by executing docker inspect on it (am I correct about this? see the command sketch after this list).
Unsetting company proxy at the end as it seems to be the cause of timeout issues seen in job logs from previous runs.
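For reference, this is roughly the check I ran against the base image's entrypoint (standard docker inspect usage, nothing Dataflow-specific):
docker pull gcr.io/dataflow-templates-base/python3-template-launcher-base
docker inspect --format '{{.Config.Entrypoint}}' gcr.io/dataflow-templates-base/python3-template-launcher-base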
What am I missing? How can I fix this issue?
I have the following Dockerfile:
# Use Python base image from DockerHub
FROM python:2.7
WORKDIR /salmon
# INSTALL CMAKE
RUN apt-get update && apt-get install -y sudo \
&& sudo apt-get update \
&& sudo apt-get install -y \
python \
cmake \
wget
#INSTALL BOOST
RUN wget https://dl.bintray.com/boostorg/release/1.66.0/source/boost_1_66_0.tar.gz \
&& mv boost_1_66_0.tar.gz /usr/local/bin/ \
&& cd /usr/local/bin/ \
&& tar -xzf boost_1_66_0.tar.gz \
&& cd ./boost_1_66_0/ \
&& ./bootstrap.sh \
&& ./b2 install
#INSTALL SALMON
RUN wget https://github.com/COMBINE-lab/salmon/releases/download/v0.14.1/salmon-0.14.1_linux_x86_64.tar.gz \
&& mv salmon-0.14.1_linux_x86_64.tar.gz /usr/local/bin/ \
&& cd /usr/local/bin/ \
&& tar -xzf salmon-0.14.1_linux_x86_64.tar.gz \
&& cd salmon-latest_linux_x86_64/
ENV PATH=/salmon/
ADD . /salmon
When I try to run it interactively via sudo docker run -v ~/Documents/Docker/salmon_test/:/data -it salmon:00.00.01, I get the error:
"exec: \"python2\": executable file not found in $PATH": unknown."
I don't understand why I'm getting this error. I even added the sudo apt-get install python command (which I didn't have before) but that didn't solve this either. Any thoughts?
This is because you are overriding the $PATH variable, and as a result the container cannot find the executable.
The default PATH value is:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
So when you set PATH to /salmon/, python can only be called by its full path, e.g. /usr/local/bin/python. In any case, you should not overwrite the PATH variable like this.
It is better to prepend to the existing PATH variable:
FROM python:2.7
ENV PATH="/salmon/:${PATH}"
.
.
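A quick way to confirm the fix after rebuilding is to print PATH from the image (the tag is the one used in the question):
docker run --rm salmon:00.00.01 printenv PATH
# should print /salmon/: followed by the default directories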
Hello, I have the following Docker container definition:
FROM temp_base_image_name_for_post
RUN apt-get update \
&& apt-get install -y python3 \
&& apt-get install -y python3-pip \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
&& pip3 install boto3
ENV INSTALL_PATH /docker-flowcell-restore
RUN mkdir -p $INSTALL_PATH
WORKDIR $INSTALL_PATH
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY /src/* $INSTALL_PATH/src/
ENTRYPOINT python3 src/main.py
My Terraform module points to this container and has a parameter called --object_key.
The module submission is getting the parameters correctly, but they are not being passed through to my Python script inside the container. How do I modify my current Docker image definition so that it picks up the arguments passed in from my Terraform definition?
For future reference, the fix was to change
FROM temp_base_image_name_for_post
RUN apt-get update \
&& apt-get install -y python3 \
&& apt-get install -y python3-pip \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
&& pip3 install boto3
ENV INSTALL_PATH /docker-flowcell-restore
RUN mkdir -p $INSTALL_PATH
WORKDIR $INSTALL_PATH
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY /src/* $INSTALL_PATH/src/
ENTRYPOINT python3 src/main.py
to
FROM temp_base_image_name_for_post
RUN apt-get update \
&& apt-get install -y python3 \
&& apt-get install -y python3-pip \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
&& pip3 install boto3
ENV INSTALL_PATH /docker-flowcell-restore
RUN mkdir -p $INSTALL_PATH
WORKDIR $INSTALL_PATH
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY /src/* $INSTALL_PATH/src/
ENTRYPOINT ["python3", "src/main.py"]
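This works because the exec form of ENTRYPOINT lets run-time arguments (such as the --object_key value passed through the Terraform/container definition) be appended to the entrypoint command, whereas the shell form (ENTRYPOINT python3 src/main.py) runs under /bin/sh -c and drops them. A minimal sketch of how the script could then read the flag; using argparse here is an assumption, not taken from the question:
# src/main.py (sketch; assumes the flag is parsed with argparse)
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--object_key", required=True)  # name matches the Terraform parameter
args = parser.parse_args()

print("object_key received:", args.object_key)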