When I took over a project, I found a command "RUN true" in the Dockerfile.
FROM xxx
RUN xxx
RUN true
RUN xxx
I don't know what this command does; can anyone help explain it? In my opinion this command makes no sense, but I'm not sure whether it has some other use.
There is a doc about Creating Images where you can see it:
RUN true \
&& dnf install -y --setopt=tsflags=nodocs \
httpd vim \
&& systemctl enable httpd \
&& dnf clean all \
&& true
@David Maze
I tested it. Dockerfile:
FROM centos:7.9.2009
RUN yum install tmux -y
RUN yum install not_exists -y
build log:
Sending build context to Docker daemon 2.048kB
Step 1/3 : FROM centos:7.9.2009
---> eeb6ee3f44bd
Step 2/3 : RUN yum install tmux -y
---> Running in 6c6e29ea9f2c
...omit...
Complete!
Removing intermediate container 6c6e29ea9f2c
---> 7c796c2b5260
Step 3/3 : RUN yum install not_exists -y
---> Running in e4b7096cc42b
...omit...
No package not_exists available.
Error: Nothing to do
The command '/bin/sh -c yum install not_exists -y' returned a non-zero code: 1
Modified Dockerfile:
FROM centos:7.9.2009
RUN yum install tmux -y
RUN yum install tree -y
build log:
Sending build context to Docker daemon 2.048kB
Step 1/3 : FROM centos:7.9.2009
---> eeb6ee3f44bd
Step 2/3 : RUN yum install tmux -y
---> Using cache
---> 7c796c2b5260
Step 3/3 : RUN yum install tree -y
---> Running in 180b32cb44f3
...omit...
Installed:
tree.x86_64 0:1.6.0-10.el7
Complete!
Removing intermediate container 180b32cb44f3
---> 4e905ed25cc2
Successfully built 4e905ed25cc2
Successfully tagged test:v0
You can see "Using cache ---> 7c796c2b5260": even without a "RUN true" command, the cache for the first "RUN" step is reused.
RUN true as a standalone command does absolutely nothing and it's safe to delete it.
/bin/true is a standard shell command. It reads no input, produces no output, and neither reads nor writes files; it just exits with a status code of 0 ("success"). Running it as a Docker step will have no effect on the final image other than inserting an additional layer into the docker history.
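You can check the behaviour in any shell; true consumes nothing, prints nothing, and only sets the exit status:
$ true; echo $?
0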
The one clever use I can think of for this is to cause a later part of a Dockerfile to re-run. Imagine a Dockerfile like
RUN some_expensive_command http://server-a.example.com/input1
RUN another_expensive_command http://server-b.example.com/input2
If the second input changes, you might want to rebuild this image. docker build --no-cache would re-run the first step too, though, and that could take longer than you want. Inserting a RUN true line between the two commands breaks Docker's layer caching at that point, but only after the first command has already been served from cache.
# identical RUN line as before, from cache
RUN some_expensive_command http://server-a.example.com/input1
# not the same RUN line, so "executes" (but does nothing)
RUN true
# not running commands from cache any more
RUN another_expensive_command http://server-b.example.com/input2
I found an already existing answer which explains it quite well.
And if I quote the answer here:
Running and thus creating a new container even if it terminates still keeps the resulting container image and metadata lying around which can still be linked to.
So when you run docker run ... /bin/true you are essentially creating a new container for storage purposes and running the simplest thing you can.
Docker 1.5 introduced the docker create command, so I believe you can now "create" containers without confusingly running something like /bin/true
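As a rough sketch of the old pattern the quote refers to (the container names, volume path, and image are only illustrative), the /bin/true trick just produced a stopped container to hang a volume on, which docker create now does without running anything:
# old trick: start a container whose only job is to exist and hold a volume
docker run --name app-data -v /data busybox /bin/true
# with docker create: the same stopped container, but nothing is ever executed
docker create --name app-data2 -v /data busybox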
And I found a quick explanation on the best-practices GitHub page, under the section '#chaining-commands', saying:
The first and last commands of the block are special.
If you would like to prepend or append a one-line command to the block, you will have to edit two lines - one that you are adding and the first or last commands. The first command is on the same line as the RUN directive, whereas the last command lacks the trailing backslash.
Editing a line with a command that you don’t want to change presents a risk of introducing a bug, and you also obscure the line’s history. This can be mitigated by making both the first and last commands true - they don’t do anything.
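Concretely, that is the same pattern as the doc snippet at the top of this page; because the RUN true \ line and the final && true never change, appending or removing a package only ever touches its own line:
RUN true \
    && dnf install -y httpd \
    && dnf install -y vim \
    && true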
Related
I am trying to use a post-startup script to create a Vertex AI User Managed Notebook whose Jupyter Lab has a dedicated virtual environment and corresponding computing kernel when first launched. I have had success creating the instance and then, as a second manual step from within the Jupyter Lab > Terminal, running a bash script like so:
#!/bin/bash
cd /home/jupyter
mkdir -p env
cd env
python3 -m venv envName --system-site-packages
source envName/bin/activate
envName/bin/python3 -m pip install --upgrade pip
python -m ipykernel install --user --name=envName
pip3 install geemap --user
pip3 install earthengine-api --user
pip3 install ipyleaflet --user
pip3 install folium --user
pip3 install voila --user
pip3 install jupyterlab_widgets
deactivate
jupyter labextension install --no-build @jupyter-widgets/jupyterlab-manager jupyter-leaflet
jupyter lab build --dev-build=False --minimize=False
jupyter labextension enable @jupyter-widgets/jupyterlab-manager
However, I have not had luck using this code as a post-startup script (supplied through the console creation tools rather than the command line, so far). When I open Jupyter Lab and look at the relevant structures, I find that there is no environment or kernel. Could someone please provide a working example that accomplishes my aim, or otherwise describe the order of build steps one would follow?
Post startup scripts run as root.
When you run:
python -m ipykernel install --user --name=envName
the command runs as the current user, which is root, whereas when you use the Terminal it runs as the jupyter user.
Option 1) Have 2 scripts:
Script A. Contents specified in original post. Example: gs://newsml-us-central1/so73649262.sh
Script B. Downloads Script A and executes it as jupyter. Example: gs://newsml-us-central1/so1.sh; use it as the post-startup script.
#!/bin/bash
set -x
gsutil cp gs://newsml-us-central1/so73649262.sh /home/jupyter
chown jupyter /home/jupyter/so73649262.sh
chmod a+x /home/jupyter/so73649262.sh
su -c '/home/jupyter/so73649262.sh' jupyter
Option 2) Create the file in bash using a heredoc (EOF): write the contents into a single file and execute it as described above; see the sketch below.
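A minimal sketch of Option 2 (the script path and environment name are illustrative; the install steps are taken from the original post): write the whole thing into one file with a heredoc, then hand it to the jupyter user exactly as in Option 1.
#!/bin/bash
set -x
# write the install steps into a single script
cat << 'EOF' > /home/jupyter/create-env.sh
#!/bin/bash
cd /home/jupyter
mkdir -p env && cd env
python3 -m venv envName --system-site-packages
source envName/bin/activate
python -m ipykernel install --user --name=envName
pip install geemap earthengine-api ipyleaflet folium voila jupyterlab_widgets
deactivate
EOF
chown jupyter /home/jupyter/create-env.sh
chmod a+x /home/jupyter/create-env.sh
# run it as the jupyter user, not as root
su -c '/home/jupyter/create-env.sh' jupyter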
This is being posted as support context for the accepted solution from @gogasca.
@gogasca's suggestion (I'm using Option 1) works great if you are patient. Through many attempts, I discovered that the inconsistent behavior came down to timing of access. Using Option 1, the User Managed Notebook appears available for use in Vertex AI Workbench (green check and clickable "OPEN JUPYTERLAB" link) before the installation script(s) have finished.
If you open the Notebook too soon, you will find two things: (1) you will be prompted for a recommended Jupyter Lab build, for instance:
Build Recommended
JupyterLab build is suggested:
@jupyter-widgets/jupyterlab-manager changed from file:../extensions/jupyter-widgets-jupyterlab-manager-3.1.1.tgz to file:../extensions/jupyter-widgets-jupyterlab-manager-5.0.3.tgz
and (2) while the custom environment/kernel is present and accessible, if you try to use ipyleaflet or ipywidget tools you will see one of several JavaScript errors, depending on how quickly you try to use the kernel relative to the build that is (apparently) still taking place in the background: Error displaying widget: model not found, and/or a broken-page icon with a JavaScript error that, if clicked, shows you something like:
[Open Browser Console for more detailed log - Double click to close this message]
Failed to load model class 'LeafletMapModel' from module 'jupyter-leaflet'
Error: No version of module jupyter-leaflet is registered
at f.loadClass (https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/134.bcbea9feb6e7c4da7530.js?v=bcbea9feb6e7c4da7530:1:74856)
at f.loadModelClass (https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.3e1e5adfd821b9b96340.js?v=3e1e5adfd821b9b96340:1:10729)
at f._make_model (https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.3e1e5adfd821b9b96340.js?v=3e1e5adfd821b9b96340:1:7517)
at f.new_model (https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.3e1e5adfd821b9b96340.js?v=3e1e5adfd821b9b96340:1:5137)
at https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.3e1e5adfd821b9b96340.js?v=3e1e5adfd821b9b96340:1:6385
at Array.map ()
at f._loadFromKernel (https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.3e1e5adfd821b9b96340.js?v=3e1e5adfd821b9b96340:1:6278)
at async f.restoreWidgets (https://someURL.notebooks.googleusercontent.com/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/134.bcbea9feb6e7c4da7530.js?v=bcbea9feb6e7c4da7530:1:77764)
The solution here is to keep waiting. In my demo script, I transfer a file at the end of the build process. If I wait long enough for this file to actually appear in the Instance directories, the recommendation for a rebuild is absent and the extensions work properly.
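For example (the marker file name is hypothetical, not part of the original scripts), touching a sentinel as the very last line of the install script gives you something concrete to wait for:
touch /home/jupyter/.post-startup-done
# then, from the Jupyter Lab terminal, wait for it before using the new kernel:
while [ ! -f /home/jupyter/.post-startup-done ]; do sleep 10; done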
I am trying to customize Amazon SageMaker Notebook Instances using Lifecycle Configurations because I need to install additional pip packages. What it means is that I have to create an on-start.sh and an on-create.sh script within a lifecycle configuration. You can see a sample here.
Now, I have many packages and the installation time might go over 5 minutes, causing a potential timeout. It is suggested to use nohup to run the script as a background job in that case.
But how do I run this with nohup, since I do not have a terminal in this case? Is there a way to run the script as a background job from within the script? Is there anything else I am missing? Please suggest.
I have done this before, installing many libraries for around 15 minutes. I wrapped the script I actually wanted to run in a create.sh and ran that create.sh using nohup. You can view the logs in CloudWatch, the SageMaker start won't time out, and as a bonus you get a nohup.out file in the directory where you executed nohup.
Below, the script from https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/tree/master/scripts/export-to-pdf-enable is wrapped into create.sh:
#!/bin/bash
set -e
cat <<'EOF'>create.sh
#!/bin/bash
sudo -u ec2-user -i <<'EOF'
set -e
# OVERVIEW
# This script enables Jupyter to export a notebook directly to PDF.
# nbconvert depends on XeLaTeX and several LaTeX packages that are non-trivial to
# install because `tlmgr` is not included with the texlive packages provided by yum.
# REQUIREMENTS
# Internet access is required in on-create.sh in order to fetch the latex libraries from the ctan mirror.
sudo yum install -y texlive*
unset SUDO_UID
ln -s /home/ec2-user/SageMaker/.texmf /home/ec2-user/texmf
EOF
echo 'EOF' >> create.sh   # appends the terminator for the inner 'sudo -u ec2-user -i' heredoc inside create.sh
nohup bash create.sh &
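A stripped-down sketch of the same idea for an on-create.sh (the conda environment name and package list are only illustrative): write the slow installs into their own script, then background it with nohup so the lifecycle configuration returns well within the five-minute limit.
#!/bin/bash
set -e
# write the long-running installs into a separate script on the persistent volume
cat << 'SCRIPT' > /home/ec2-user/SageMaker/install-packages.sh
#!/bin/bash
set -e
source /home/ec2-user/anaconda3/bin/activate python3   # assumed target environment
pip install --upgrade pandas scikit-learn              # hypothetical package list
source /home/ec2-user/anaconda3/bin/deactivate
SCRIPT
chmod +x /home/ec2-user/SageMaker/install-packages.sh
# background it so on-create.sh returns immediately; output lands in nohup.out
nohup sudo -u ec2-user bash /home/ec2-user/SageMaker/install-packages.sh > /home/ec2-user/SageMaker/nohup.out 2>&1 &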
I'm trying to run my Python Dataflow job with a Flex Template. The job works fine locally when I run it with the direct runner (without the Flex Template); however, when I try to run it with the Flex Template, the job gets stuck in "Queued" status for a while and then fails with a timeout.
Here are some of the logs I found in the GCE console:
INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/local/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', '/dataflow/template/requirements.txt', '--exists-action', 'i', '--no-binary', ':all:'
Shutting down the GCE instance, launcher-202011121540156428385273524285797, used for launching.
Timeout in polling result file: gs://my_bucket/staging/template_launches/2020-11-12_15_40_15-6428385273524285797/operation_result.
Possible causes are:
1. Your launch takes too long time to finish. Please check the logs on stackdriver.
2. Service my_service_account@developer.gserviceaccount.com may not have enough permissions to pull container image gcr.io/indigo-computer-272415/samples/dataflow/streaming-beam-py:latest or create new objects in gs://my_bucket/staging/template_launches/2020-11-12_15_40_15-6428385273524285797/operation_result.
3. Transient errors occurred, please try again.
For 1, I see no useful log. For 2, the service account is the default service account, so it should have all permissions.
How can I debug this further?
Here is my Docker file:
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base
ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}
ADD localdeps localdeps
COPY requirements.txt .
COPY main.py .
COPY setup.py .
COPY bq_field_pb2.py .
COPY bq_table_pb2.py .
COPY core_pb2.py .
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"
RUN pip install -U --no-cache-dir -r ./requirements.txt
I'm following this guide - https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates
A possible cause of this issue can be found within the requirements.txt file. If you are trying to install apache-beam within the requirements file, the Flex Template will experience the exact issue you are describing: jobs stay some time in the Queued state and finally fail with Timeout in polling result.
The reason is that they are affected by this issue. This only affects Flex Templates; the jobs run properly locally or with Standard Templates.
The solution is to install it separately in the Dockerfile.
RUN pip install -U apache-beam==<your desired version>
RUN pip install -U -r ./requirements.txt
Download the requirements to speed up launching the Dataflow job.
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base
ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}
COPY . .
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
RUN apt-get update \
# Upgrade pip and install the requirements.
&& pip install --no-cache-dir --upgrade pip \
&& pip install --no-cache-dir -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE \
# Download the requirements to speed up launching the Dataflow job.
&& pip download --no-cache-dir --dest /tmp/dataflow-requirements-cache -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE
# Since we already downloaded all the dependencies, there's no need to rebuild everything.
ENV PIP_NO_DEPS=True
I have a custom build step in Google Cloud Build, which first builds a docker image and then deploys it as a cloud run service.
This last step fails, with the following log output;
Step #2: Deploying...
Step #2: Setting IAM Policy.........done
Step #2: Creating Revision............................................................................................................................failed
Step #2: Deployment failed
Step #2: ERROR: (gcloud.run.deploy) Cloud Run error: Invalid argument error. Invalid ENTRYPOINT. [name: "gcr.io/opencobalt/silo@sha256:fb860e758eb1957b90ff3761fcdf68dedb9d10f832f2bb21375915d3de2aaed5"
Step #2: error: "Invalid command \"/bin/sh\": file not found"
Step #2: ].
Finished Step #2
ERROR
ERROR: build step 2 "gcr.io/cloud-builders/gcloud" failed: step exited with non-zero status: 1
The build step looks like this;
{"name":"gcr.io/cloud-builders/gcloud","args":["run","deploy","silo","--image","gcr.io/opencobalt/silo","--region","us-central1","--platform","managed","--allow-unauthenticated"]}
The image is built and exists in the registry, and if I change the last build step to deploy a Compute Engine VM instead, it works. That build step looks like this;
{"name":"gcr.io/cloud-builders/gcloud","args":["compute","instances",
"create-with-container","silo","--container-image","gcr.io/opencobalt/silo","--zone","us-central1-a","--tags","silo,pharo"]}
I can also build the image locally but run into the same error when running gcloud run deploy locally.
I am trying to figure out how to solve this problem. The image works, since it runs fine locally and runs fine when deployed as a Compute Engine VM; the error only shows up when I try to deploy the image as a Cloud Run service.
(added) The Dockerfile looks like this;
######################################
# Based on Ubuntu image
######################################
FROM ubuntu
######################################
# Basic project infos
######################################
LABEL maintainer="PeterSvensson"
######################################
# Update Ubuntu apt and install some tools
######################################
RUN apt-get update \
&& apt-get install -y wget \
&& apt-get install -y git \
&& apt-get install -y unzip \
&& rm -rf /var/lib/apt/lists/*
######################################
# Have an own directory for the tool
######################################
RUN mkdir webapp
WORKDIR webapp
######################################
# Download Pharo using Zeroconf & start script
######################################
RUN wget -O- https://get.pharo.org/64/80+vm | bash
COPY service_account.json service_account.json
RUN export certificate="$(cat service_account.json)"
COPY load.st load.st
COPY setup.sh setup.sh
RUN chmod +x setup.sh
RUN ./setup.sh; echo 0
RUN ./pharo Pharo.image load.st; echo 0
######################################
# Expose port 8080 of Zinc outside the container
######################################
EXPOSE 8080
######################################
# Finally run headless as server
######################################
CMD ./pharo --headless Pharo.image --no-quit
Any advice warmly welcome.
Thank you.
After a lot of testing I managed to get further. It seems that the missing /bin/sh file is a red herring.
I tried to change the startup command from CMD to ENTRYPOINT, since that was mentioned in the error, but it did not work. However, when I copied the startup instruction into a new file 'startup.sh' and changed the last line of the Dockerfile to;
ENTRYPOINT ./startup.sh
It did work. I needed to chmod +x the new file of course, but the strange thing is that ENTRYPOINT ./pharo --headless Pharo.image --no-quit gave the same error, and even ENTRYPOINT ["./pharo", "--headless", "Pharo.image", "--no-quit"] also gave the same error.
But having just one argument to ENTRYPOINT made Cloud Run work. Go figure.
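For reference, the working variant looked roughly like this (assuming startup.sh does nothing but wrap the original command); the end of the Dockerfile becomes:
COPY startup.sh startup.sh
RUN chmod +x startup.sh
ENTRYPOINT ./startup.sh
with startup.sh containing:
#!/bin/bash
./pharo --headless Pharo.image --no-quit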
It appears that Google Cloud Run has a dislike for the ubuntu:20.04 image. I have the exact same problem with a Play framework application.
The command
ENTRYPOINT /opt/play-codecheck/bin/play-codecheck -Dconfig.file=/opt/codecheck/production.conf
failed with
error: "Invalid command \"/bin/sh\": file not found"
I also tried
ENTRYPOINT ["/bin/bash", "/opt/play-codecheck/bin/play-codecheck", "-Dconfig.file=/opt/codecheck/production.conf"]
and was rewarded with
error: "Invalid command \"/bin/bash\": file not found"
The trick of putting the command in a shell script didn't work for me either. However, when I changed
FROM ubuntu:20.04
to
FROM ubuntu:18.04
the image deployed. At this point, that's an acceptable fix for me, but it seems like something that Google needs to address.
See also:
Unable to deploy Ubuntu 20.04 Docker container on Google Cloud Run
My workaround was to use a CMD directive that calls Python directly rather than a shell (either /bin/sh or /bin/bash). It's working well so far.
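For example (main.py is just a placeholder for the actual entrypoint script), the exec (JSON array) form starts the interpreter directly instead of wrapping the command in /bin/sh -c:
CMD ["python3", "main.py"]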
When I try to build my Dockerfile, it can't connect to the network, or at least DNS fails:
Sending build context to Docker daemon 15.95 MB
Sending build context to Docker daemon
Step 0 : FROM ruby
---> eeb85dfaa855
Step 1 : RUN apt-get update -qq && apt-get install -y build-essential libpq-dev
---> Running in ec8cbd41bcff
W: Failed to fetch http://httpredir.debian.org/debian/dists/jessie/InRelease
W: Failed to fetch http://httpredir.debian.org/debian/dists/jessie-updates/InRelease
W: Failed to fetch http://security.debian.org/dists/jessie/updates/InRelease
W: Failed to fetch http://httpredir.debian.org/debian/dists/jessie/Release.gpg Could not resolve 'httpredir.debian.org'
W: Failed to fetch http://httpredir.debian.org/debian/dists/jessie-updates/Release.gpg Could not resolve 'httpredir.debian.org'
W: Failed to fetch http://security.debian.org/dists/jessie/updates/Release.gpg Could not resolve 'security.debian.org'
W: Some index files failed to download. They have been ignored, or old ones used instead.
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package build-essential
INFO[0001] The command "/bin/sh -c apt-get update -qq && apt-get install -y build-essential libpq-dev" returned a non-zero code: 100
But if I run exactly the same command via docker run it works:
docker run --name="test" ruby /bin/sh -c 'apt-get update -qq && apt-get install -y build-essential libpq-dev'
Does anybody have an idea why docker build does not work? I have tried all the DNS-related tips on Stack Overflow, like starting Docker with --dns 8.8.8.8, etc.
Thanks in advance
Check what networks are available on your host with the below command:
docker network ls
then pick one that you know is working; the host network could be a good candidate.
Now, assuming you are in the directory where your Dockerfile is, build your image appending the --network flag and replacing <image-name> with yours:
docker build . -t <image-name> --no-cache --network=host
Docker definitely seems to have some network issues. I managed to fix this problem with
systemctl restart docker
... which is basically just the unix-level 'restart-the-daemon' command in Debian 8.
I had a similar problem, but as I was running AWS Linux I had no systemctl. I solved it using:
sudo service docker restart
My docker build also failed while trying to run apt-get upgrade with the exact same errors. I was using docker-machine on Mac OSX and a simple docker-machine restart default solved this issue. No idea what initially caused this, though.
Another case of the above reported behaviour - this time building a docker image from Jenkins:
[...]
Step 3 : RUN apt-get update && apt-get install -y curl libapache2-mod-proxy-html
---> Running in ea7aca5dea9b
Err http://security.debian.org jessie/updates InRelease
Err http://security.debian.org jessie/updates Release.gpg
Could not resolve 'security.debian.org'
Err http://httpredir.debian.org jessie InRelease
[...]
In my case it turned out that the DNS wasn't reachable from within the container, but it still was from the Docker host!? (The container's resolver configuration was okay.)
After restarting the Docker machine (a complete reboot; a 'docker.service restart' didn't do the trick) it has been working again.
So some activity of mine (or of a colleague) must have broken the Docker networking. Maybe some firewalld modification?
I'm still investigating, as I'm not sure which activity may have corrupted the Docker networking...
I had the exact same issue on a Raspberry Pi.
Starting/stopping the service did not help, but re-installing the package (dpkg -i docker-hypriot_1.10.3-1_armhf.deb && service docker start in my case) immediately solved the situation: apt-get update manages to resolve and reach the servers.
There must be some one-shot actions in the installation process...
I also faced the same issue today. My workaround was to restart the docker-machine; in my case it runs on VirtualBox.
Once you power it off and then restart the machine, http://security.debian.org resolves again.
Hope this helps.
A couple of suggestions, not sure whether they will work. Can you change ...apt-get install -y... to ...apt-get install -yqq...?
Also, has the image that you're trying to build from changed?