How to alter shared memory for SageMaker Docker containers? - amazon-web-services

I have a Docker image in Elastic Container Registry (ECR). It was created via a simple Dockerfile which I have control over.
The image itself is fine, but I have a problem where the shared memory is insufficient when working inside a container in SageMaker Studio. Therefore I need to raise the shared memory of these containers.
To raise the shared memory of a container, I believe the usual method is to pass the --shm-size argument to the docker run command when starting the container. However, I do not have control over this command, as SageMaker is doing that bit for me. The docs say that SageMaker is running docker run <image> train when starting a container.
Is it possible to work around this problem, either by somehow providing additional arguments to the command or by specifying something when creating the Docker image (such as in the Dockerfile or the script that deploys it to ECR)?

According to this issue, there is no option for this in SageMaker at the moment. If ECS is an option for you, it does support the equivalent of --shm-size in the task definition (the sharedMemorySize setting under linuxParameters).

As pointed out by @rok (thank you!), it is not possible in this situation to pass arguments to docker run, although it would be possible after switching to ECS.
It is, however, possible to pass the --shm-size argument to docker build when building the image to push to ECR. This seems to have fixed the problem, although it does require building and pushing a new Docker image whenever this parameter needs to change.
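Whichever route is taken, it may be worth confirming from inside the running container how much shared memory is actually available. The following is a minimal Python sketch, assuming a Linux container where shared memory is mounted at /dev/shm; the function name shm_size_mib is only illustrative.

```python
import shutil


def shm_size_mib(path="/dev/shm"):
    """Return the total size of the filesystem mounted at `path`, in MiB."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.total / (1024 * 1024)
```

Calling shm_size_mib() in a notebook cell or in the container's entry script before and after a change shows whether the new size took effect.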

Related

Is there a service in AWS that is equivalent to docker configs?

I have a WordPress site that is gonna be hosted using ECS in AWS.
To make the management even more flexible, I plan not to store service configurations (i.e. php.ini, nginx.conf) inside the docker image itself. I found that docker swarm offers "docker configs" for this purpose. Are there any equivalent tools doing the same thing? (I know AWS Secrets Manager can handle docker secrets though)
Any advice or alternative approaches? Thank you all.
The closest equivalent is probably AWS SSM Parameter Store.
You will need some logic to retrieve the values when you run the image.
If you don't want the files to exist inside the running containers either, you can pull the values from Parameter Store and add them to the environment. The application will then probably need some work to read its settings from the environment (it stays decoupled from the actual source of the config). Alternatively, you can read directly from Parameter Store in the application (easier, but it couples your image to Parameter Store).
If your only concern is keeping the values out of the image, and it is fine for them to be inside the running container, you can read from Parameter Store and write the values to the usual location of the config files inside the container, so that the change is transparent to the application.
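The second option (writing the value to the usual file location at container start) could be sketched in Python roughly as follows. This assumes boto3 is installed in the image and that the whole config file is stored as a single parameter; the function names and the optional ssm_client argument are illustrative, not part of any AWS tooling.

```python
import os


def fetch_parameter(name, ssm_client=None):
    """Read one value from SSM Parameter Store, decrypting SecureString values."""
    if ssm_client is None:
        import boto3  # imported lazily so the helper can be exercised with a stub
        ssm_client = boto3.client("ssm")
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


def write_config_file(name, destination, ssm_client=None):
    """Write the parameter's value to the path where the service expects its config."""
    content = fetch_parameter(name, ssm_client)
    os.makedirs(os.path.dirname(destination) or ".", exist_ok=True)
    with open(destination, "w") as handle:
        handle.write(content)
    return destination
```

An entrypoint script might call write_config_file("/wordpress/prod/nginx.conf", "/etc/nginx/nginx.conf") before starting nginx (the parameter name here is hypothetical). Parameter values have size limits, so large config files may need to be split or kept in S3 instead.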
As additional approaches:
Especially for php.ini and nginx.conf, I like the simple approach of keeping a separate git repo with different config files for each environment.
You have a common Docker image regardless of the environment.
At build time, you pull the proper file for the environment and either save the values as env variables or inject the file into the container.
And last, I should mention classic tools like Chef or Puppet, and also Ansible. They are more complex and may be overkill.
The two ways that I store configs and secrets for most services are:
Credstash, which is a combination of KMS and DynamoDB, and
Parameter Store, which has already been mentioned.
The aws command line tool can be used to fetch from Parameter Store and S3 (for configs), while credstash is its own utility (quite useful and easy to use) and needs to be installed separately.

How to use a docker file with terraform

Terraform has a dedicated "docker" provider which works with images and containers and which can use a private registry and supply it with credentials, cf. the registry documentation. However, I didn't find any means to supply a Dockerfile directly without the use of a separate registry. The problem of handling changes to Dockerfiles themselves is already solved, e.g. in this question, albeit without the use of Terraform.
I could try a couple of workarounds: not using the dedicated docker provider but some other provider (although I don't know which one). Or I could start my own private registry (possibly in a Docker container managed with Terraform), run the docker commands locally to generate the image files (from Terraform this could be done using the null_resource of the null provider) and then continue with those.
None of these workarounds make much sense to me. Is there a way to deploy docker containers described in a docker file directly using terraform?
Terraform is a provisioning tool rather than a build tool, so building artifacts like Docker images from source is not really within its scope.
Just as the common and recommended way to deal with EC2 images (AMIs) is to have some other tool build them and have Terraform simply use them, the same principle applies to Docker images: the common and recommended path is to have some other system build your Docker images -- a CI system, for example -- and to publish the results somewhere that Terraform's Docker provider will be able to find them at provisioning time.
The primary reason for this separation is that it separates the concerns of building a new artifact and provisioning infrastructure using artifacts. This is useful in a number of ways, for example:
If you're changing something about your infrastructure that doesn't require a new image then you can just re-use the image you already built.
If there's a problem with your Dockerfile that produces a broken new image, you can easily roll back to the previous image (as long as it's still in the registry) without having to rebuild it.
It can be tempting to try to orchestrate an entire build/provision/deploy pipeline with Terraform alone, but Terraform is not designed for that and so it will often be frustrating to do so. Instead, I'd recommend treating Terraform as just one component in your pipeline, and using it in conjunction with other tools that are better suited to the problem of build automation.
If avoiding running a separate registry is your goal, I believe that can be accomplished by skipping docker_image altogether and just using docker_container with an image argument referring to an image that is already available to the Docker daemon indicated in the provider configuration.
docker_image retrieves a remote image into the daemon's local image cache, but docker build writes its result directly into the local image cache of the daemon used for the build process, so as long as both Terraform and docker build are interacting with the same daemon, Terraform's Docker provider should be able to find and use the cached image without interacting with a registry at all.
For example, you could build an automation pipeline that runs docker build first, obtains the raw id (hash) of the image that was built, and then runs terraform apply -var="docker_image=$DOCKER_IMAGE" against a suitable Terraform configuration that can then immediately use that image.
Having such a tight coupling between the artifact build process and the provisioning process does slightly defeat the advantages of the separation, but the capability is there if you need it.
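The build-then-apply pipeline described above could be sketched in Python roughly as follows. This assumes that docker and terraform are on the PATH, that both talk to the same Docker daemon, and that the Terraform configuration declares a variable named docker_image; the function names are illustrative.

```python
import subprocess


def build_image(context_dir="."):
    """Run `docker build` quietly and return the image ID it prints on success."""
    result = subprocess.run(
        ["docker", "build", "--quiet", context_dir],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def terraform_apply_command(image_id):
    """Build the `terraform apply` command line that passes the image ID in."""
    return ["terraform", "apply", f"-var=docker_image={image_id}"]


def build_and_apply(context_dir="."):
    """Build the image, then hand its ID to Terraform as the docker_image variable."""
    image_id = build_image(context_dir)
    subprocess.run(terraform_apply_command(image_id), check=True)
    return image_id
```

Passing the arguments as a list avoids shell quoting issues around the -var value.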

Does sagemaker use nvidia-docker or docker runtime==nvidia by default or user need to manually set up?

As stated in the question, "Does sagemaker use nvidia-docker or docker runtime==nvidia by default or user need to manually set up?"
A common error message is "CannotStartContainerError. Please ensure the model container for variant variant-name-1 starts correctly when invoked with 'docker run <image> serve'." and nothing indicated that the container was running with the NVIDIA driver.
So, do we need to set this up manually?
I'm using a tensorflow-gpu image as the base image for my containers, and I can use the GPU without specifying anything GPU-related. When building Docker containers for SageMaker, you have to be careful about the folder structure and make sure that your container is able to start with the command serve (which is what the error suggests).
If you have problems setting this up, I find this example the most useful one for getting the hang of it.

Is it possible to directly call docker run from AWS lambda

I have a standalone Java application which I have dockerized. I want to run this container every time an object is put into S3 storage. One way is to do it via AWS Batch, which I am trying to avoid.
Is there a direct and easy way to call docker run from a lambda?
Yes and no.
What you can't do is execute docker run to run a container within the context of the Lambda call. But you can trigger a task on ECS to be executed. For this to work, you need to have a cluster set up on ECS, which means you need to pay for at least one EC2 instance. Because of that, it might be better to not use Docker, but I know too little about your application to judge that.
There are a lot of articles out there on how to connect S3, Lambda and ECS. Here is a pretty in-depth article by Amazon that you might be interested in:
https://aws.amazon.com/blogs/compute/better-together-amazon-ecs-and-aws-lambda/
If you are looking for code, this repository implements what is discussed in the above article:
https://github.com/awslabs/lambda-ecs-worker-pattern
Here is a snippet we use in our Lambda function (Python) to run a Docker container from Lambda:
import boto3
# cluster, task_definition and overrides are supplied by the caller (described below)
result = boto3.client('ecs').run_task(
    cluster=cluster,
    taskDefinition=task_definition,
    overrides=overrides,
    count=1,
    startedBy='lambda'
)
We pass in the name of the cluster on which we want to run the container, as well as the task definition that defines which container to run, the resources it needs and so on. overrides is a dictionary/map with settings that you want to override in the task definition, which we use to specify the command we want to run (i.e. the argument to docker run). This enables us to use the same Lambda function to run a lot of different jobs on ECS.
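To show how this could be wired to an S3 upload, here is a hedged sketch of a complete handler. The cluster, task definition and container names are hypothetical placeholders, as is the "process" command; the optional ecs_client argument exists only so the handler can be exercised without AWS access.

```python
import urllib.parse

# Hypothetical names for illustration; replace with your own resources.
CLUSTER = "my-cluster"
TASK_DEFINITION = "my-java-job"
CONTAINER_NAME = "worker"


def build_overrides(bucket, key):
    """Override the container command so the task processes one S3 object."""
    return {
        "containerOverrides": [
            {"name": CONTAINER_NAME, "command": ["process", f"s3://{bucket}/{key}"]}
        ]
    }


def handler(event, context, ecs_client=None):
    """Lambda entry point for S3 object-created notifications."""
    if ecs_client is None:
        import boto3  # lazy import so the handler can be tested with a stub
        ecs_client = boto3.client("ecs")
    started = 0
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        ecs_client.run_task(
            cluster=CLUSTER,
            taskDefinition=TASK_DEFINITION,
            overrides=build_overrides(bucket, key),
            count=1,
            startedBy="lambda",
        )
        started += 1
    return started
```

The Lambda function's role would also need permission to call ecs:RunTask on the task definition.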
Hope that points you in the right direction.
Yes. It is possible to run containers out of Docker images stored in Docker Hub within AWS Lambda using SCAR.
For example, you can create a Lambda function to execute a container out of the ubuntu:16.04 image in Docker Hub as follows:
scar init ubuntu:16.04
And then you can run a command or a shell-script within that container upon each invocation of the function:
scar run scar-ubuntu-16-04 whoami
SCAR: Request Id: ed5e9f09-ce0c-11e7-8375-6fc6859242f0
Log group name: /aws/lambda/scar-ubuntu-16-04
Log stream name: 2017/11/20/[$LATEST]7e53ed01e54a451494832e21ea933fca
---------------------------------------------------------------------------
sbx_user1059
You can use your own Docker images stored in Docker Hub. Some limitations apply but it can be effectively used to run generic applications on AWS Lambda. It also features a programming model for file-processing event-driven applications. It uses uDocker under the hood.
Yes, try udocker.
udocker is a simple tool written in Python. It has a minimal set of dependencies, so it can be executed on a wide range of Linux systems.
udocker neither makes use of Docker nor requires it to be installed.
udocker "executes" the containers by simply providing a chroot like environment over the extracted container. The current implementation uses PRoot to mimic chroot without requiring privileges.
Examples
Pull from docker hub and list the pulled images.
udocker pull fedora
Create the container from a pulled image and run it.
udocker create --name=myfed fedora
udocker run myfed cat /etc/redhat-release
It's also worth checking the write-up on Hackernoon, because:
In Lambda, the only place you are allowed to write is /tmp, but udocker will attempt to write to the home directory by default. There are other issues as well.

How Docker and Ansible fit together to implement Continuous Delivery/Continuous Deployment

I'm new to the configuration management and deployment tools. I have to implement a Continuous Delivery/Continuous Deployment tool for one of the most interesting projects I've ever put my hands on.
First of all, individually, I'm comfortable with AWS, and I know what Ansible is, the logic behind it and its purpose. I do not have the same level of understanding of Docker, but I get the idea. I went through a lot of Internet resources, but I can't get the big picture.
What I've been struggling with is how they fit together. Using Ansible, I can manage my infrastructure as code: building EC2 instances, installing packages... I can even deploy a full application by pulling its code, modifying config files and starting the web server. Docker is, itself, a tool that packages an application and ensures that it can be run wherever you deploy it.
My problems are:
How does Docker (or Ansible and Docker) extend the Continuous Integration process!?
Suppose we have a source code repository; the team members finish working on a feature and push their work. Jenkins detects this, runs all the acceptance/unit/integration test suites and, if they all pass, declares it a stable build. How does Docker fit in here? I mean, when the team pushes their work, does Jenkins have to pull the Dockerfile kept with the app's source code, build the image of the application, start the container and run all the tests against it? Or does it run the tests the classic way and, if all is good, build the Docker image from the Dockerfile and save it in a private place?
Should Jenkins tag the final image using x.y.z for example!?
Docker containers configuration :
Suppose we have an image built by Jenkins and stored somewhere. How do we handle deploying the same image into different environments, with different configuration parameters (vhost config, DB hosts, queue URLs, S3 endpoints, etc.)? What is the most flexible way to deal with this issue without breaking Docker principles? Are these configurations baked into the image when it gets built, or injected when the container based on it is started? If the latter, how are they injected?
Ansible and Docker:
Ansible provides a Docker module to manage Docker containers. Assuming I solved the problems mentioned above, when I want to deploy a new version x.y.z of my app, I tell Ansible to pull that image from where it was stored and start the app container. So how do I inject the configuration settings!? Does Ansible have to log in to the Docker image before it's running (this sounds insane to me) and use its Jinja2 templates the same way as with a classic host!? If not, how is this handled?!
Excuse me if it was a long question or if I misspelled something, but this is my thinking out loud. I've been blocked for the past two weeks and I can't figure out the correct workflow. I want this to be a reference for future readers.
Please, it would be very helpful to read your experiences and solutions because this looks like a common workflow.
I would like to answer in parts
How does Docker (or Ansible and Docker) extend the Continuous Integration process!?
Since Docker images are the same everywhere, you can treat your Docker images as if they were production images. Therefore, when somebody commits code, you build your Docker image and run the tests against it. When all tests pass, you tag that image accordingly. Since Docker is fast, this is a feasible workflow.
Also, Docker changes are incremental, so your images will have minimal impact on storage. When your tests fail, you may choose to save that image too. That way, a developer can pull that image and easily investigate why the tests failed. Developers may also choose to run the tests on their own machines, since the Docker images in Jenkins and on their machines are no different.
What this brings is that all developers have the same environment and the same versions of all software, since you decide which ones are used in the Docker images. I have come across bugs that were due to differences between developer machines. For example, even on the same operating system, Unicode settings may affect your code. But with Docker images, all developers test against the same settings and the same software versions.
Docker containers configuration :
If you are using a private repository, and you should use one, then configuration changes will not affect disk space much. Therefore, except for security-sensitive configuration such as DB passwords, you can apply configuration changes to the Docker images themselves (baking the configuration into the container). Then you can use Ansible to apply the configuration that is not stored in the image to deployed containers before/after startup, using environment variables or Docker volumes.
https://dantehranian.wordpress.com/2015/03/25/how-should-i-get-application-configuration-into-my-docker-containers/
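As an illustration of the environment-variable approach, the application inside the container might read its settings along these lines. This is a minimal sketch: the variable names (DB_HOST, QUEUE_URL, S3_ENDPOINT) and the AppConfig class are assumptions made for the example, not names from any particular framework.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    db_host: str
    queue_url: str
    s3_endpoint: str


def load_config(environ=None):
    """Build the app's settings from environment variables set at container start."""
    env = os.environ if environ is None else environ
    missing = [name for name in ("DB_HOST", "QUEUE_URL") if name not in env]
    if missing:
        # Fail fast so a misconfigured container never starts serving traffic.
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return AppConfig(
        db_host=env["DB_HOST"],
        queue_url=env["QUEUE_URL"],
        s3_endpoint=env.get("S3_ENDPOINT", "https://s3.amazonaws.com"),
    )
```

The same image can then be started per environment with different values, for example with docker run -e DB_HOST=... or the environment settings of whatever tool launches the container.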
Does Ansible have to log in the Docker image, before it's running (this sounds insane to me) and use its Jinja2 templates the same way with a classic host!? If not, how is this handled?!
No, Ansible will not log in to the Docker image, but Ansible with Jinja2 templates can be used to generate the Dockerfile. You can change the Dockerfile with templates and inject your configuration into different files. Tag the resulting images accordingly and you have configured images to spin up.
Regarding your question about handling multiple environment configurations using the same Docker image, I have been planning on using a Service Discovery tool like Consul as a centralized config/property management tool. So, when you start your container up, you set an ENV var that tells it what application it is (appID) and what environment config it should use (ex: MyApplication:Dev), and it will pull its config from Consul at startup. I still have to investigate the security around Consul (for example, if we are storing DB connection credentials in there, how do we restrict who can query/update those values?). I don't want to use this just for containers, but for all apps in general. Another cool capability is to change a config value in Consul and have a hook back into your app to apply the change immediately (maybe a REST endpoint on your app that changes are pushed to and applied dynamically). Of course your app has to be written to support this!
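A rough sketch of what pulling config from Consul at startup could look like, using Consul's KV HTTP endpoint (/v1/kv/<prefix>?recurse=true, which returns entries with base64-encoded values). The environment variable names APP_ID and APP_ENV, the key layout "<app>/<env>/<setting>" and the function names are assumptions made for illustration.

```python
import base64
import json
import os
import urllib.request


def config_prefix(environ=None):
    """Derive the KV prefix from the app ID and environment passed to the container."""
    env = os.environ if environ is None else environ
    return f"{env['APP_ID']}/{env['APP_ENV']}/"


def decode_entries(entries, prefix):
    """Turn Consul KV entries (base64 values) into a dict keyed without the prefix."""
    config = {}
    for entry in entries:
        if entry.get("Value") is None:  # folder-style keys carry no value
            continue
        name = entry["Key"][len(prefix):]
        config[name] = base64.b64decode(entry["Value"]).decode("utf-8")
    return config


def load_consul_config(consul_url="http://localhost:8500", environ=None):
    """Fetch every key under the app's prefix from a Consul agent."""
    prefix = config_prefix(environ)
    url = f"{consul_url}/v1/kv/{prefix}?recurse=true"
    with urllib.request.urlopen(url) as response:
        entries = json.load(response)
    return decode_entries(entries, prefix)
```

Restricting who may read those keys would be handled on the Consul side (for example with ACL tokens), which this sketch does not cover.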
You might be interested in checking out Martin Fowler's blog articles on immutable infrastructure and on Phoenix servers.
Although not a complete solution, I have suggestions for two of your issues. They might not be perfect, but these are the practices we use in our workflow, and they have proven themselves so far.
Defining different environments - supposing you've written a different Ansible role for each environment you launch, we define an environment variable that sets the environment we wish the container to belong to. We then use that variable to download the matching configuration file from an S3 bucket into the container (which should be possible if you supply AWS creds or give your server an IAM role) and inject these parameters into the code when building it.
Ansible doesn't need to log into the Docker app, but the solution is a bit tricky. I've tried two ways of tackling this problem, and neither is ideal. The first one is to download the configuration file as part of the Docker image's command line and build the app on container startup. While this solution works, it breaches the Docker philosophy and makes the image highly prone to build errors.
Another solution is pushing several images to your Docker Hub repo and then pulling the appropriate image according to the environment at hand.
More broadly, I tried launching our app completely with Ansible and it was hell: many configuration steps are tricky and get trickier when you try to implement them as a playbook. When I switched to maintaining the servers alone with Ansible and deploying the app itself with Docker, things got a lot easier.