Dataflow with python flex template - launcher timeout - google-cloud-platform

I'm trying to run my Python Dataflow job with a Flex Template. The job works fine locally when I run it with the direct runner (without a Flex Template); however, when I try to run it with a Flex Template, the job is stuck in "Queued" status for a while and then fails with a timeout.
Here are some of the logs I found in the GCE console:
INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/local/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', '/dataflow/template/requirements.txt', '--exists-action', 'i', '--no-binary', ':all:'
Shutting down the GCE instance, launcher-202011121540156428385273524285797, used for launching.
Timeout in polling result file: gs://my_bucket/staging/template_launches/2020-11-12_15_40_15-6428385273524285797/operation_result.
Possible causes are:
1. Your launch takes too long time to finish. Please check the logs on stackdriver.
2. Service my_service_account@developer.gserviceaccount.com may not have enough permissions to pull container image gcr.io/indigo-computer-272415/samples/dataflow/streaming-beam-py:latest or create new objects in gs://my_bucket/staging/template_launches/2020-11-12_15_40_15-6428385273524285797/operation_result.
3. Transient errors occurred, please try again.
For 1, I see no useful logs. For 2, the service account is the default service account, so it should have all the required permissions.
How can I debug this further?
Here is my Dockerfile:
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base
ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}
ADD localdeps localdeps
COPY requirements.txt .
COPY main.py .
COPY setup.py .
COPY bq_field_pb2.py .
COPY bq_table_pb2.py .
COPY core_pb2.py .
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"
RUN pip install -U --no-cache-dir -r ./requirements.txt
I'm following this guide - https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates

A possible cause of this issue is the requirements.txt file. If you install apache-beam through the requirements file, the Flex Template runs into exactly the behaviour you describe: jobs stay in the Queued state for some time and finally fail with "Timeout in polling result file".
This is a known issue that only affects Flex Templates; the same jobs run properly locally or with Standard Templates.
The solution is to install apache-beam separately in the Dockerfile, before the rest of the requirements:
RUN pip install -U apache-beam==<your desired version>
RUN pip install -U -r ./requirements.txt
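As a rough pre-build check, the sketch below scans the lines of a requirements file for an apache-beam entry, so that it can be moved into its own RUN pip install line as described above. The helper name find_beam_requirements and the sample version numbers are hypothetical, and the parsing is deliberately simple: it is not a full requirements parser.

```python
import re


def find_beam_requirements(lines):
    """Return the requirement lines that name the apache-beam package.

    Hypothetical helper: it handles plain specifiers such as
    'apache-beam[gcp]==2.25.0' and ignores comments and pip options.
    """
    hits = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The package name ends at the first extras/version/marker character.
        name = re.split(r"[\[<>=!~; ]", line, maxsplit=1)[0]
        # pip treats '-', '_' and '.' in names as equivalent, ignoring case.
        if re.sub(r"[-_.]+", "-", name).lower() == "apache-beam":
            hits.append(line)
    return hits


if __name__ == "__main__":
    # Sample content only; the versions are made up for illustration.
    sample = ["# deps", "apache-beam[gcp]==2.25.0", "protobuf>=3.12", "-r other.txt"]
    print(find_beam_requirements(sample))
```

If the function returns anything, those lines are candidates to remove from requirements.txt and install in a separate Dockerfile step.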

Download the requirements to speed up launching the Dataflow job.
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base
ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}
COPY . .
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
RUN apt-get update \
# Upgrade pip and install the requirements.
&& pip install --no-cache-dir --upgrade pip \
&& pip install --no-cache-dir -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE \
# Download the requirements to speed up launching the Dataflow job.
&& pip download --no-cache-dir --dest /tmp/dataflow-requirements-cache -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE
# Since we already downloaded all the dependencies, there's no need to rebuild everything.
ENV PIP_NO_DEPS=True

Related

Unable to load shared library 'libgdiplus' or one of its dependencies while running lambda function

I am writing an AWS Lambda function in .NET Core 3.1 that uses the Aspose.Slides library. I am publishing the Lambda function as a Docker image on AWS. The function is published successfully, but when I test it I get the following error:
Aspose.Slides.PptxReadException: The type initializer for 'Gdip' threw an exception.
---> System.TypeInitializationException: The type initializer for 'Gdip' threw an exception.
---> System.DllNotFoundException: Unable to load shared library 'libgdiplus' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: liblibgdiplus: cannot open shared object file: No such file or directory
at System.Drawing.SafeNativeMethods.Gdip.GdiplusStartup(IntPtr& token, StartupInput& input, StartupOutput& output)
at System.Drawing.SafeNativeMethods.Gdip..cctor()
Even though I am installing the libgdiplus package in the Dockerfile, I still get the above error.
The Dockerfile is:
FROM public.ecr.aws/lambda/dotnet:core3.1 AS base
FROM mcr.microsoft.com/dotnet/sdk:3.1 as build
WORKDIR /src
COPY ["Lambda.PowerPointProcessor.csproj", "base/"]
RUN dotnet restore "base/Lambda.PowerPointProcessor.csproj"
WORKDIR "/src"
COPY . .
RUN apt-get update && apt-get install -y libc6-dev
RUN apt-get update && apt-get install -y libgdiplus
RUN dotnet build "Lambda.PowerPointProcessor.csproj" --configuration Release --output /app/build
FROM build AS publish
RUN dotnet publish "Lambda.PowerPointProcessor.csproj" \
--configuration Release \
--runtime linux-x64 \
--self-contained false \
--output /app/publish \
-p:PublishReadyToRun=true
FROM base AS final
WORKDIR /var/task
COPY --from=publish /app/publish .
CMD ["Lambda.PowerPointProcessor::Lambda.PowerPointProcessor.Function::FunctionHandler"]
Any help would be much appreciated.
FROM public.ecr.aws/lambda/dotnet:core3.1
WORKDIR /var/task
COPY "bin/Release/netcoreapp3.1/linux-x64" .
RUN yum install -y amazon-linux-extras
RUN amazon-linux-extras install epel -y
RUN yum install -y libgdiplus
CMD ["Lambda.PowerPointProcessor::Lambda.PowerPointProcessor.Function::FunctionHandler"]
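This Dockerfile resolved the issue for me. The original Dockerfile installed libgdiplus only in the build stage, so the library never reached the final image; here it is installed with yum directly in the Lambda base image that runs the function.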

How to identify why a program is not starting inside docker

I have a Docker image that contains a C++ executable together with its dependencies. The executable runs fine outside Docker; I have tested it multiple times.
However, inside Docker it stops immediately after it is started.
To debug this, I added std::cout << "Main 1" << std::endl at the very start of main(), but even this is not printed when I start the executable inside Docker.
Any tips on how to debug this issue?
Here is the Dockerfile used to build the image:
FROM ubuntu:18.04
# install app dependencies
RUN apt-get -yqq update \
&& apt-get -yqq dist-upgrade \
&& apt-get -yqq install apt-utils libgomp1 libprotobuf10 libboost-thread1.65.1 libboost-filesystem1.65.1 libopencv-core3.2 libopencv-imgproc3.2 libopencv-imgcodecs3.2 libjpeg-turbo8 libpo
&& apt-get -yqq remove systemd cups perl ffmpeg apt-utils \
&& rm -rf /var/lib/apt/lists/*
# create app folder
RUN mkdir -p /opt/aimes
# copy app, dependencies and config
COPY deps/aimes /opt/aimes/
COPY deps/*.* /opt/aimes/
COPY deps/config /opt/aimes/config
# copy wrapper script
COPY run-es.sh /opt/aimes/
# run command
WORKDIR /opt/aimes
ENV LD_LIBRARY_PATH .
ENTRYPOINT ["./run-es.sh"]
Adding --cap-add=SYS_PTRACE to the docker run command made it possible to find the issue using gdb.
The same option also turned out to be the fix, since the executable required root permissions.
The command below solved my issue:
docker run --cap-add=SYS_PTRACE -it --rm
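Before reaching for gdb, it can help to see how the process actually ended. The sketch below is not part of the original answer: it is a hypothetical wrapper that runs a command, captures stderr, and translates the return code. It relies on the documented behaviour of Python's subprocess module, where a negative return code means the child was terminated by that signal number. The child command used here is a stand-in, not the real executable.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as -N on POSIX.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-returncode} ({name})"
    if returncode == 0:
        return "exited normally"
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Run cmd, returning a description of its exit and its stderr text."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return describe_exit(result.returncode), result.stderr


if __name__ == "__main__":
    # Stand-in child process; replace with the path to the real executable.
    status, err = run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"])
    print(status)
    print(err)
```

A program that dies before printing anything often leaves a message from the dynamic loader or the shell on stderr, which is why the wrapper returns it alongside the status.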

I am not able to execute the next commands after the localstack --host command

FROM ubuntu:18.04
RUN apt-get update -y && \
apt-get install -y apt-utils && \
apt-get install -y python3-pip python3-dev\
pypy-setuptools
COPY . .
WORKDIR .
RUN pip3 install boto3
RUN pip3 install awscli
RUN apt-get install libsasl2-dev
ENV HOST_TMP_FOLDER=/tmp/localstack
RUN apt-get install -y git
RUN apt-get install -y npm
RUN mkdir -p .localstacktmp
ENV TMPDIR=.localstacktmp
RUN pip3 install localstack[full]
RUN SERVICES=s3,lambda,es DEBUG=1 localstack start --host
WORKDIR ./boto3Tools
ENTRYPOINT [ "python3" ]
CMD [ "script.py" ]
You can't start services in a Dockerfile.
In your case what's happening is that your Dockerfile is running RUN localstack start. That goes ahead and starts up the selected set of services and stays running, waiting for connections. Meanwhile, the Dockerfile is waiting for the command you launched to finish before it moves on.
The usual answer to this is to start servers and clients in separate containers (or start a server in a container and run clients directly from your host). In this case, there is already a localstack/localstack Docker image and a prebuilt Docker Compose setup, so you can just run it:
curl -LO https://github.com/localstack/localstack/raw/master/docker-compose.yml
docker-compose up
The localstack GitHub repo has more information on using it.
If you wanted to use a Boto-based application with this, the easiest way is to add it to the same docker-compose.yml file (or, conversely, add Localstack to the Compose setup you already have). At this point you can use normal Docker inter-container communication to reach the mock AWS, but you have to configure this in your code:
import boto3
# Assumes the Compose service is named "localstack", so that name resolves from other containers.
s3 = boto3.client('s3',
                  endpoint_url='http://localstack:4566')
You have to make similar changes anyways to use localstack, so the only difference is the hostname you're setting.
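One way to keep that hostname out of the code is to read the endpoint from the environment, so the same application can talk to Localstack in Compose and to real AWS elsewhere. This is only a sketch: the helper s3_client_kwargs and the variable name S3_ENDPOINT_URL are hypothetical choices, not something boto3 or Localstack defines. The helper builds keyword arguments only, so it runs without boto3 installed.

```python
import os


def s3_client_kwargs(environ=None):
    """Build keyword arguments for boto3.client('s3', ...).

    S3_ENDPOINT_URL is a hypothetical variable name chosen for this sketch.
    When it is unset, no endpoint override is passed and boto3 would use
    its default AWS endpoint.
    """
    environ = os.environ if environ is None else environ
    kwargs = {}
    endpoint = environ.get("S3_ENDPOINT_URL")
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs


if __name__ == "__main__":
    # In the application this would be used as:
    #   s3 = boto3.client("s3", **s3_client_kwargs())
    print(s3_client_kwargs({"S3_ENDPOINT_URL": "http://localstack:4566"}))
    print(s3_client_kwargs({}))
```

In the Compose file, the application service would then set the variable to http://localstack:4566 in its environment section.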

How to run the bash when we trigger docker run command without -it?

I have a Dockerfile as follows:
FROM centos
RUN mkdir work
RUN yum install -y python3 java-1.8.0-openjdk java-1.8.0-openjdk-devel tar git wget zip
RUN pip install pandas
RUN pip install boto3
RUN pip install pynt
WORKDIR ./work
CMD ["bash"]
where I am installing some basic dependencies.
Now when I run
docker run imagename
it does nothing but when I run
docker run -it imageName
I land in the bash shell. But I want to get into the bash shell as soon as I trigger the run command, without any extra parameters.
I am using this Docker image in AWS CodeBuild, where I can't specify parameters like -it, but I want to execute my code inside the Docker container itself.
Is it possible to modify CMD/ENTRYPOINT in such a way that when running the docker image I land right inside the container?
I checked your Dockerfile; it will not even build because pip is missing. So I modified it a bit so that it at least builds:
FROM centos
RUN mkdir glue
RUN yum install -y python3 java-1.8.0-openjdk java-1.8.0-openjdk-devel tar git wget zip python3-pip
RUN pip3 install pandas
RUN pip3 install boto3
RUN pip3 install pynt
WORKDIR ./glue
Build it using, e.g.:
docker build . -t glue
Then you can run command in it using for example the following syntax:
docker run --rm glue bash -c "mkdir a; ls -a; pwd"
I use --rm as I don't want to keep the container.
Hope this helps.
We cannot log in to the Docker container directly.
If you want to run specific commands when the container starts in detached mode, you can put them in the CMD or ENTRYPOINT instruction of the Dockerfile.
If you want to get into the shell directly, you can run
docker run -it imageName
or
docker run imageName bash -c "ls -ltr;pwd"
and it will return the output.
If you have triggered the run command without the -it parameter, you can get into the running container using:
docker exec -it <container_id> bash
and you will land in the shell.
Now, if you are using AWS CodeBuild custom images and are concerned about how commands are submitted to the container, put your commands in the buildspec.yml file under the pre_build, build, or post_build phase; those commands will be run inside the Docker container.
buildspec.yml
version: 0.2
phases:
  pre_build:
    commands:
      - pip install boto3 # or any pre-build configuration
  build:
    commands:
      - spark-submit job.py
  post_build:
    commands:
      - rm -rf /tmp/*
More about build_spec here

Getting Permission Denied error while accessing a file in Docker

I am trying to deploy a model on AWS SageMaker using the following Dockerfile:
FROM ubuntu:16.04
#MAINTAINER Amazon AI <sage-learner@amazon.com>
RUN apt-get -y update && apt-get install -y --no-install-recommends \
wget \
python3.5-dev \
gcc \
nginx \
ca-certificates \
libgcc-5-dev \
&& rm -rf /var/lib/apt/lists/*
# Here we get all python packages.
# There's substantial overlap between scipy and numpy that we eliminate by
# linking them together. Likewise, pip leaves the install caches populated which uses
# a significant amount of space. These optimizations save a fair amount of space in the
# image, which reduces start up time.
RUN wget https://bootstrap.pypa.io/3.3/get-pip.py && python3.5 get-pip.py && \
pip3 install numpy==1.14.3 scipy lightfm scikit-optimize pandas==0.22.0 flask gevent gunicorn && \
rm -rf /root/.cache
# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard
# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE
# keeps Python from writing the .pyc files which are unnecessary in this case. We also update
# PATH so that the train and serve programs are found when the container is invoked.
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
# Set up the program in the image
COPY lightfm /opt/program
WORKDIR /opt/program
The Docker image builds successfully, but when I run the following command:
docker run XYZ train
on my local machine or even on SageMaker, I get the following error:
standard_init_linux.go:207: exec user process caused "permission denied"
In the Dockerfile I am copying a folder called lightfm, and there is a file called "train" in it.
Can anyone help?
OUTPUT OF MY DOCKER BUILD:
$ docker build -t lightfm .
Sending build context to Docker daemon 41.47kB
Step 1/9 : FROM ubuntu:16.04
---> 5e13f8dd4c1a
Step 2/9 : RUN apt-get -y update && apt-get install -y --no-install-recommends wget python3.5-dev gcc nginx ca-certificates libgcc-5-dev && rm -rf /var/lib/apt/lists/*
---> Using cache
---> 14ae3a1eb780
Step 3/9 : RUN wget https://bootstrap.pypa.io/3.3/get-pip.py && python3.5 get-pip.py && pip3 install numpy==1.14.3 scipy lightfm scikit-optimize pandas==0.22.0 flask gevent gunicorn && rm -rf /root/.cache
---> Using cache
---> 5a2727e27385
Step 4/9 : ENV PYTHONUNBUFFERED=TRUE
---> Using cache
---> 43bf8c5e8414
Step 5/9 : ENV PYTHONDONTWRITEBYTECODE=TRUE
---> Using cache
---> 7d2c45d61cec
Step 6/9 : ENV PATH="/opt/program:${PATH}"
---> Using cache
---> f3cc6313c0d9
Step 7/9 : COPY lightfm /opt/program
---> ad929ba84692
Step 8/9 : WORKDIR /opt/program
---> Running in a040dd0bab03
Removing intermediate container a040dd0bab03
---> 8f53c5a3ba63
Step 9/9 : RUN chmod 755 serve
---> Running in 5666abb27cd0
Removing intermediate container 5666abb27cd0
---> e80aca934840
Successfully built e80aca934840
Successfully tagged lightfm:latest
SECURITY WARNING: You are building a Docker image from Windows against a non-Windows Docker host. All files and directories added to build context will have '-rwxr-xr-x' permissions. It is recommended to double check and reset permissions for sensitive files and directories.
Assuming train is the executable you want to run, give it execute permission. After the COPY lightfm /opt/program line, add RUN chmod +x /opt/program/train.
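The same fix can also be applied to the file before the image is built. The sketch below is a hypothetical helper, not part of the original answer: it sets the execute bits on a file using Python's os and stat modules, which is roughly what chmod a+x does. The demo runs against a temporary file rather than the real train script.

```python
import os
import stat
import tempfile


def ensure_executable(path):
    """Add execute permission for user, group and others (like chmod a+x).

    Returns True if the owner execute bit is set afterwards.
    """
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


if __name__ == "__main__":
    # Demo on a throwaway file; point this at lightfm/train in a real project.
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "train")
        with open(script, "w") as handle:
            handle.write("#!/usr/bin/env python3\n")
        os.chmod(script, 0o644)
        print(ensure_executable(script))
        print(oct(stat.S_IMODE(os.stat(script).st_mode)))
```

Keeping the RUN chmod +x line in the Dockerfile is still the more robust option, because the permission is then set inside the image regardless of the host the build context comes from.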