I'm new to Docker and I'm trying to process a lot of data on AWS. Right now, the input for the scripts I want to run using my parent image is about 20 GB.
First, I tried simply copying the data into the image's writable layer (using COPY), but then I got this error:
Sending build context to Docker daemon 20.53 GB
Error response from daemon: Error processing tar file(exit status 1): write /Input/dbsnp_138.b37.vcf: no space left on device
So I thought that 20G would be too much to just store on my writeable layer.
Then I looked at mounting a volume on the docker host (using VOLUME), but wouldn't that also need to be written on the writeable layer first? Wouldn't that also give me the same error?
The way docker build works is this: when you run something like the following
docker build .
it sends the content of the current directory (the build context) to the Docker daemon. So if you are sending a 20 GB context, the daemon needs at least another 20 GB of free space to receive it.
When you mount a host directory into a running container, Docker bind-mounts your folder directly, so no extra space is used. But if you delete files in that directory from inside the container, they are also deleted on the host.
So mounting the directory is a possible solution. Also, if there are files in your current directory that you do not want to send to the Docker daemon as part of the context, use a .dockerignore file to specify which files to exclude.
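For example, a minimal sketch (assuming the 20 GB of input lives in an Input/ folder next to the Dockerfile; the host path and image name below are placeholders):

    # .dockerignore -- keeps the large data folder out of the build context
    Input/

    # run the container with the data bind-mounted from the host instead of copied into the image
    $ docker run -it -v /home/ec2-user/Input:/Input my-image /bin/bash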
I have a Java application that extracts content from a zip archive.
When launching it as a Fargate task, it produces the following error:
java.util.zip.ZipException: invalid block type
I could reproduce a similar zlib error locally by pointing the extraction at a non-writable directory; otherwise the app works fine.
Using various directories inside the Docker layer did not help (I tried /tmp and WORKDIR, and I also tried User: root in the ContainerDefinition), nor did mounting a writable volume in the ContainerDefinition.
According to the documentation, Fargate provides 10 GB for the writable upper Docker layer and 4 GB for a mounted volume. Why can't I extract the zip archive?
I cannot trace it further, as Fargate does not provide an option for this, and I could not get a more informative Java exception.
It turned out that the zip file I was uploading arrived with a zeroed-out buffer. I needed to add binary media type support to API Gateway for the zip file to reach the destination task intact.
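For reference, if this is a REST API, registering the binary media type can be done roughly like this (a sketch; the API id and stage name are placeholders, and some setups register */* instead of application/zip):

    # add application/zip to the API's binary media types ("/" is escaped as "~1" in the patch path)
    $ aws apigateway update-rest-api --rest-api-id abc123 \
        --patch-operations op=add,path=/binaryMediaTypes/application~1zip

    # redeploy so the change takes effect
    $ aws apigateway create-deployment --rest-api-id abc123 --stage-name prod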
I have an Elastic Beanstalk environment where I run my application on a single EC2 instance. I added a load balancer when I configured the environment initially, but since then I have set it to use only one instance.
The application running inside the container apparently produces quite a lot of logs: after several days they use up the whole disk and the application crashes. The health check drops to severe.
I see that terminating the instance manually helps: the environment removes the old instance and creates a new one that works (until it fills up the disk again).
What are my options? A script that regularly cleans up logs? Some log rotation? A trigger that reboots the instance when the disk is nearly full?
I do not write anything to a file myself; my application only logs to stdout and stderr, so writing to a file is done by the EC2/Elastic Beanstalk wrapper. (I deploy the application as a ZIP containing a JAR, a bash script and a Procfile, if that is relevant.)
By default EB will rotate some of the logs produced by the Docker containers, but not all of them. After contacting support on this issue I received the following helpful config file, to be placed in the source path .ebextensions/liblogrotate.config:
files:
  "/etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.containers.conf":
    mode: "00644"
    owner: "root"
    group: "root"
    content: |
      /var/lib/docker/containers/*/*.log {
        size 10M
        rotate 5
        missingok
        compress
        notifempty
        copytruncate
        dateext
        dateformat %s
        olddir /var/lib/docker/containers/rotated
      }
  "/etc/cron.hourly/cron.logrotate.elasticbeanstalk.containers.conf":
    mode: "00755"
    owner: "root"
    group: "root"
    content: |
      #!/bin/sh
      test -x /usr/sbin/logrotate || exit 0
      /usr/sbin/logrotate /etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.containers.conf

container_commands:
  create_rotated_dir:
    command: mkdir -p /var/lib/docker/containers/rotated
    test: test ! -d /var/lib/docker/containers/rotated
  99_cleanup:
    command: rm /etc/cron.hourly/*.bak /etc/logrotate.elasticbeanstalk.hourly/*.bak
    ignoreErrors: true
What this does is install an additional log rotation configuration and cron task for the /var/lib/docker/containers/*/*.log files, which are the ones not rotated automatically on EB.
Eventually, however, the rotated logs themselves will fill up the disk if the host lives long enough. For this, you can add shred to the list of logrotate options (alongside compress, notifempty, etc.), as in the sketch below.
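For illustration, the rotation block above with shred added would look roughly like this (a sketch; check that your logrotate build supports the shred directive):

    /var/lib/docker/containers/*/*.log {
      size 10M
      rotate 5
      missingok
      compress
      notifempty
      copytruncate
      dateext
      dateformat %s
      shred
      olddir /var/lib/docker/containers/rotated
    }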
(However, I'm not sure whether the container logs that are already configured for rotation are set to be shredded; probably not, so those may accumulate too and require modifying the default EB log rotation config. I'm not sure how to do that yet. But the above solution should be sufficient in most cases, since hosts typically do not live that long. The volume of logging and the lifetime of your containers may force you to go even further.)
Log rotation is the way forward. You can create a configuration file in /etc/logrotate.d/ where you state your options in order to avoid having large log files.
You can read more about the configuration options here: https://linuxconfig.org/setting-up-logrotate-on-redhat-linux
A sample configuration file would look something like this:
/var/log/your-large-log.log {
    missingok
    notifempty
    compress
    size 20k
    daily
    create 0600 root root
}
You can also test the new configuration file from the CLI by running the following:
logrotate -d [your_config_file]
This tests whether the rotation would succeed, but only in debug mode, so the log file is not actually rotated.
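For example, assuming you saved the sample above as /etc/logrotate.d/your-large-log:

    $ logrotate -d /etc/logrotate.d/your-large-log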
I have two containers: one is a web server based on Node.js with an assets directory, and the other is nginx, which proxies page requests to the web server and serves static files from the assets directory.
I created an AWS cluster and an EC2 instance, built and pushed the Docker images to the registry, and made tasks to deploy my applications, but I can't share the assets directory with nginx because the directory is not part of that container.
So to solve my problem I decided to create an EFS volume, attach it, add permissions for ec2-user, and make the directory available at the path /var/html/assets.
Fine, but how do I copy the assets content from my web-server Docker container to /var/html/assets?
I want to make it public/shared because soon I will add more servers which should also place their assets into this common directory.
The process should be automated and run on each deployment. Any suggestions? Thanks!
To copy the assets content from your web-server Docker container to your host machine,
say you want the container's assets to end up in /var/html/assets on the host, run your container with a command like this:
docker run --name=nginx -d -v /var/html/assets:[Your Container path] -p 5000:80 nginx
-v /var/html/assets:[Your Container path] sets up a bind-mount volume that links the [Your Container path] directory inside the nginx container to the /var/html/assets directory on the host machine. Docker uses a colon (:) to separate the host path from the container path, and the host path always comes first.
Hope it will help!
I solved the problem by making the host directory writable with chmod 777 /var/html/assets, then adding a volume that points at the host directory and attaching it to both the web and nginx containers. When the web container starts, it runs a cp command to copy the assets into the mounted directory (the host directory). Nginx then sees the populated directory and can serve from it.
Note: this is a temporary workaround; giving rwx access to everyone on the directory is not good security practice.
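A rough sketch of that setup with plain docker run (the image name, the container paths, and the copy command are placeholders for whatever your web image actually uses):

    # host directory shared by both containers (an EFS mount point in this case)
    $ sudo chmod 777 /var/html/assets

    # web container: copy its built assets into the shared directory on startup, then start the server
    $ docker run -d --name web -v /var/html/assets:/shared/assets my-web-image \
        sh -c "cp -r /app/assets/. /shared/assets/ && node server.js"

    # nginx container: serve statics straight from the same shared directory
    $ docker run -d --name nginx -p 80:80 -v /var/html/assets:/usr/share/nginx/html/assets nginx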
I have a dockerfile with something like:
VOLUME /tmp/space
ADD local/directory/ /tmp/space/
RUN cp /tmp/space/somescript.sh /opt/real/space/
After the image is built and I get an interactive shell, I notice that /tmp/space still contains the data from local/directory.
If I add RUN rm -rf /tmp/space/* to the end of the Dockerfile and get shell access, the data is still there in /tmp/space/.
As a result, I'm left making a running container using the same volume and then committing the changed container to an updated image.
Is there a method to, during the build, have a temporarily loaded volume that doesn't bloat the resulting image?
The goal is to use source files and scripts to perform some actions during a build. Docker's layers end up recording both the COPY/ADD step and the RUN step, duplicating the data. So it would be better to COPY the data into a space that isn't recorded as a layer and then, in a single RUN step, cp the files and execute the scripts, to save on space.
I am not sure what you are trying to do here:
VOLUME /tmp/space - this declares a mount point and maps /tmp/space on your container to a directory on the host
ADD /local/directory /tmp/space - this copies a local directory from your build context into the image; I think you are attempting to populate your mounted volume this way
RUN cp /tmp/space/* /opt/real/space/ - Are you trying to copy from your volume to your host?
After adding a VOLUME directive in a Dockerfile, the folder /tmp/space in the container is mapped to a folder on the host, say /hosttmp/hostspace. You can find out what this path is on the host by running the command:
$ docker inspect -f {{.Volumes}} <your_container_name>
To prevent corruption of the data in /hosttmp/hostspace, once a VOLUME is declared in the Dockerfile you cannot play around with its contents: changes made to that path by later build steps are discarded.
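If you want the data at a known host path instead of the auto-generated one, the usual approach is to pass the mapping at run time rather than in the Dockerfile (a sketch; the host path and image name are placeholders):

    $ docker run -it -v /hosttmp/hostspace:/tmp/space your_image /bin/bash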
I would recommend reading this article, as it explains the rather confusing Docker volume concept:
http://container-solutions.com/understanding-volumes-docker/
Sharing data between a running docker container and my host (on AWS) seems overly complicated. From the docker documentation it seems as if I need to specify volumes when I start the container.
I found this: https://github.com/synack/docker-rsync
But this watches recursively and copies only from the host machine to the Docker container.
I'm looking for a way to create (preferably in a Dockerfile) a folder visible on my host machine on AWS, so that I can scp files into that folder and have them visible in my Docker container. I also want my container to be able to write to that folder, so that if the container is stopped I won't lose those files.
As a side note, I have already declared the following in my Dockerfile:
VOLUME /Training-master
but I don't know how to access it from my machine, and when I stopped the container I lost the data.
Does anyone know how to do this or can they point me in the right direction?
What you are looking for is provided by docker run-time options, documented here: http://docs.docker.com/engine/userguide/dockervolumes/#mount-a-host-directory-as-a-data-volume
At the end of that page, it's clearly mentioned:
Note: The host directory is, by its nature, host-dependent.
For this reason, you can’t mount a host directory from Dockerfile
because built images should be portable. A host directory wouldn’t
be available on all potential hosts.
As Raghav said, a host directory cannot be mounted and shared from a Dockerfile, because of image portability.
But after you create the image, you can run the following command to create a shared folder between the host and the container. Be careful, because the mount will hide (shadow) an existing directory inside the container if the target path has the same name:
$ sudo docker run -itd -v /home/ubuntu/Sharing:/Share dockeruser/imageID:version bash
/home/ubuntu/Sharing -- Path to sharing folder on host computer
/Share -- Path to sharing folder in my container
dockeruser/imageID:version -- the name of your image
-v -- specifies that you are creating a bind-mount volume
-d -- daemonizes the container, running it in the background
bash -- the command for the container to execute
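To verify the share works, you could do something like the following (the container ID comes from docker ps, since the command above did not set a name):

    $ docker ps                                # note the ID of the container you just started
    $ touch /home/ubuntu/Sharing/test.txt
    $ docker exec <container_id> ls /Share     # test.txt should appear inside the container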
Just for reference for Windows users:
1) You can mount a host folder into a container by
docker run -ti -v C:\local_folder\:c:\container_folder container1
2) Alternatively, you can create a volume:
docker volume create --name temp_volume
See the absolute path of the volume by:
docker volume inspect temp_volume
The mountpoint is the absolute path of the volume. You can add/remove files from that path. Then you can mount it to the container by:
docker run -ti -v temp_volume:c:\tmploc container1
Notice that both host and container are Windows machines.