Does Google Life Sciences API support dependencies and parallel commands? - google-cloud-platform

From what I can tell from the docs, the Google Cloud Life Sciences API (v2beta) allows you to define a "pipeline" with multiple commands in it, but these run sequentially.
Am I correct in thinking there is no way to have some commands run in parallel, and for a group of commands to be dependent on others (that is, to not start running until their predecessors have finished)?

You are correct that you cannot run commands in parallel, or in such a way that the process is dependent upon the completion of some other process.
When you run commands using the commands[] flag, it is exactly the same as passing the CMD parameter to a Docker container (because that is exactly what you are doing). The commands[] flag overrides the CMD arguments passed to the Docker container at runtime. If the container uses an ENTRYPOINT, then the commands[] flag overrides the argument values passed to that entrypoint.
You can review the official documentation here:
Method: projects.locations.pipelines.run
gcloud command-line tool examples
gcloud beta lifesciences
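
For reference, here is a minimal sketch (using the Python API client, with placeholder project, region, image, and commands) of what a pipelines.run request looks like; each entry in actions[] only starts after the previous one has exited, which is exactly why parallel or dependent steps are not expressible inside a single pipeline:

# Sketch only: assumes google-api-python-client is installed, application
# default credentials are configured, and the Life Sciences API is enabled.
from googleapiclient import discovery

service = discovery.build("lifesciences", "v2beta")

body = {
    "pipeline": {
        "actions": [
            # Each action is one container run; actions execute strictly in order.
            {"imageUri": "gcr.io/my-project/worker", "commands": ["do-work", "--step=1"]},
            {"imageUri": "gcr.io/my-project/worker", "commands": ["do-work", "--step=2"]},
        ],
        "resources": {"regions": ["us-central1"]},
    }
}

operation = (
    service.projects()
    .locations()
    .pipelines()
    .run(parent="projects/my-project/locations/us-central1", body=body)
    .execute()
)
print(operation["name"])  # name of the long-running operation to poll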

Related

Input and Output to ECS task in Step Function

I have previously worked with Lambda orchestration using AWS Step Functions, and it has been working very well. Setting the result_path of each Lambda passes arguments along to the subsequent Lambda.
However, I now need to run a Fargate task and then pass along arguments from that Fargate task to subsequent Lambdas. I have created a Python script that acts as the entrypoint in the container definition. Obviously, in a Lambda function the handler(event, context) acts as the entrypoint, and by defining a return {"return_object": "hello_world"} it is easy to pass an argument along to the next state of the state machine.
In my case though, I have task definition with a container definition created from this Dockerfile:
FROM python:3.7-slim
COPY my_script.py /my_script.py
RUN ln -s /my_script.py /usr/bin/my_script && \
    chmod +x /my_script.py
ENTRYPOINT ["my_script"]
Hence, I am able to invoke the state machine and it will execute my_script as intended. But how do I get the output from this python script and pass it along to another state in the state machine?
I have found some documentation on how to pass along inputs, but no example of passing along outputs.
To get output from an ECS/Fargate task, I think you have to use the Task Token integration instead of Run Job (Sync), which is usually recommended for Fargate tasks. You can pass the token to the container as an environment variable override (an Environment entry named TASK_TOKEN whose value is "$$.Task.Token", set via the "Value.$" form). Then inside your image you need some logic like this:
import json
import os
import boto3

# Report the result back to Step Functions using the injected task token.
client = boto3.client('stepfunctions')
client.send_task_success(
    taskToken=os.environ["TASK_TOKEN"],
    output=json.dumps({"return_object": "hello_world"})
)
to pass it back.
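
For completeness, a rough sketch of the state-machine side; the ARNs, cluster, task definition, and container name are placeholders, the NetworkConfiguration that Fargate requires is omitted for brevity, and the definition is built here as a Python dict and registered with boto3:

import json
import boto3

# Hypothetical names/ARNs for illustration only.
definition = {
    "StartAt": "RunFargateTask",
    "States": {
        "RunFargateTask": {
            "Type": "Task",
            # .waitForTaskToken pauses this state until the container calls
            # send_task_success (or send_task_failure) with the token.
            "Resource": "arn:aws:states:::ecs:runTask.waitForTaskToken",
            "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": "my-cluster",
                "TaskDefinition": "my-task-def",
                "Overrides": {
                    "ContainerOverrides": [
                        {
                            "Name": "my-container",
                            "Environment": [
                                {"Name": "TASK_TOKEN", "Value.$": "$$.Task.Token"}
                            ],
                        }
                    ]
                },
            },
            "Next": "UseTaskOutput",
        },
        "UseTaskOutput": {"Type": "Pass", "End": True},
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="fargate-with-task-token",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/my-stepfunctions-role",
)

Whatever JSON string the container passes as output to send_task_success becomes the output of the RunFargateTask state and is available to the next state.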

How to run a script on a Google Compute Instance?

You can run a startup-script and a shutdown-script, but is it possible to use the Compute Engine API to run a script after startup?
The primary reason I'm asking is that the startup script isn't executing for me on the first run.
Yes, this can be done. You can pass the file through a gcloud command or just add it to the instance metadata using the UI. Take a look at the following documentation for startup-script and shutdown-script.
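
If you want to do it programmatically rather than through the UI, here is a hedged sketch (instance name, zone, and script path are placeholders, and the gcloud CLI is assumed to be installed and authenticated) of attaching a startup script to an existing instance:

import subprocess

# Sets the startup-script metadata key from a local file. Note that the
# script only runs on the next boot; updating the metadata alone does not
# execute it, so reset or restart the instance afterwards if needed.
subprocess.run(
    [
        "gcloud", "compute", "instances", "add-metadata", "my-instance",
        "--zone", "us-central1-a",
        "--metadata-from-file", "startup-script=./startup.sh",
    ],
    check=True,
)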

How to execute a docker-run command from C++ program?

I want to execute "docker run -it Image_name" from a C++ program. Is there any way to achieve this?
Try it as you would run a simple system command from C++:
system("docker run -it Image_name");
I can think of two ways you could achieve this.
For a quick-and-dirty approach, you can actually run commands from your C++ code. There seem to be a few ways to run commands with C++, but the system() function is an easy option if you just want to run the command:
#include <cstdlib>

int main() {
    std::system("docker run -it Image_name");
    return 0;
}
Bear in mind you will need to make sure the docker executable is in your PATH environment variable. You will also need to consider which operating systems you want to support; a system call on Linux might not behave the same as on Windows. It can be tricky to get system calls right.
Another method is to use the Docker Engine's API directly; docker commands are sent to this API. You could connect to this API yourself and make the same calls that the docker run -it Image_name command makes. The Engine API is documented here: https://docs.docker.com/engine/api/v1.24/ . The docker run command corresponds to the API's container create and start endpoints.
The shell command will be the easiest approach. The Engine API approach would take more effort up front, but will result in cleaner, more robust code. The correct approach will depend on your situation.
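
To give a feel for the Engine API approach, here is what the create/start flow behind docker run looks like, shown with the Python Docker SDK purely for illustration; a C++ program would make the equivalent HTTP calls against the Docker daemon socket (for example with libcurl):

# Illustration only: assumes the docker Python SDK is installed and the
# Docker daemon socket is reachable.
import docker

client = docker.from_env()

# Roughly equivalent to `docker run Image_name`; the -it flags only matter
# for an interactive terminal, which an automated caller usually doesn't need.
container = client.containers.run("Image_name", detach=True)

# Unlike system(), the API lets you wait for the exit code and read the logs.
result = container.wait()
print(result["StatusCode"])
print(container.logs().decode())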

Google Cloud container orchestration and cleanup

In short, I am looking to see if it is possible to run multiple Docker containers on the same machine via gcloud's create-with-container functionality (or similar). The idea is that there will be some "worker" container (which does some arbitrary work) which runs and completes, followed by a "cleanup" container which subsequently runs performing the same task each time.
Longer explanation:
I currently have an application that launches tasks that run inside Docker containers on Google Cloud. I use gcloud beta compute instances create-with-container <...other args...> to launch the VM, which runs the specified container. I will call that the "worker" container, and the tasks it performs are not relevant to my question. However, regardless of the "worker" container, I would like to run a second, "cleanup" container upon the completion of the first. In this way, developers can independently write Docker containers that do not have to "repeat" the work done by the "cleanup" container.
Side note:
I know that I could alternatively specify a startup script (e.g. a bash script) which starts the Docker containers as I describe above. However, when I first tried that, I kept running into issues where the docker pull <image> command would time out or fail for some reason when communicating with Docker Hub. The gcloud beta compute instances create-with-container <...args...> approach seemed to have error handling/retries built in, which seemed ideal. Does anyone have a working snippet that would provide relatively robust error handling in the startup script?
As far as I know the limitation is one container per VM instance. See limitations.
Answer: It is currently not possible to launch multiple containers with the create-with-container functionality.
Alternative: You mentioned that you have already tried launching your containers with a startup script. Another option would be to specify a cloud-init config through instance metadata. Cloud-init is built into Container-Optimized OS (the same OS that you would use with create-with-container).
It works by adding and starting a systemd service, which means that you can:
specify that your service should run after other services: network-online.target and docker.socket
specify a Restart policy for the service to do retries on failure,
add an ExecStopPost specification to run your cleanup (or add a separate service for that in the cloud-init config)
This is a snippet that could be a starting point (you would need to add it under the user-data metadata key):
#cloud-config

users:
- name: cloudservice
  uid: 2000

write_files:
- path: /etc/systemd/system/cloudservice.service
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Description=Start a simple docker container
    Wants=network-online.target docker.socket
    After=network-online.target docker.socket

    [Service]
    ExecStart=/usr/bin/docker run --rm -u 2000 --name=cloudservice busybox:latest /bin/sleep 180
    ExecStopPost=/bin/echo Finished!
    Restart=on-failure
    RestartSec=30

runcmd:
- systemctl daemon-reload
- systemctl start cloudservice.service
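
As a hedged example of how this config could be supplied (the instance name, zone, and file name are placeholders; this assumes the gcloud CLI and a Container-Optimized OS image), save the config to a file and pass it under the user-data key at instance creation:

import subprocess

# Creates a Container-Optimized OS instance and passes the cloud-init
# config above as the user-data metadata value.
subprocess.run(
    [
        "gcloud", "compute", "instances", "create", "my-worker-vm",
        "--zone", "us-central1-a",
        "--image-family", "cos-stable",
        "--image-project", "cos-cloud",
        "--metadata-from-file", "user-data=cloud-init.yaml",
    ],
    check=True,
)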

Docker on EC2, RUN command in dockerfile not reading environment variable

I have two Elastic Beanstalk environments on AWS: development and production. I'm running a GlassFish server on each instance, and it is requested that the same application package be deployable in the production and the development environment, without requiring two different .EAR files. The two instances differ in size: dev has a micro instance while production has a medium instance, so I need to deploy two different configuration files for GlassFish, one for each environment.
The main problem is that the file has to be in the GlassFish config directory before the server starts, so I thought it would be better to move it into place while the container was being created.
Of course each environment uses a docker container to host the glassfish instance, so my first thought was to configure an environment variable for the elastic-beanstalk. In this case
ypenvironment = dev
for the development environment and
ypenvironment = pro
for the production environment. Then in my Dockerfile I put this statement in the RUN command:
RUN if [ "$ypenvironment"="pro" ] ; then \
mv --force /var/app/GF_domain.xml /usr/local/glassfish/glassfish/domains/domain1/config/domain.xml ; \
elif [ "$ypenvironment"="dev" ] ; then \
mv --force /var/app/GF_domain.xml.dev /usr/local/glassfish/glassfish/domains/domain1/config/domain.xml ; \
fi
Unfortunately, when the startup finishes, both GF_domain files are still in /var/app.
Then I read that the RUN command runs things BEFORE the container is fully loaded, possibly missing the Elastic Beanstalk-injected variables. So I tried moving the code to the ENTRYPOINT directive. No luck again; the container startup fails. I also tried the
ENTRYPOINT ["command", "param"]
syntax, but it didn't work giving a
System error: exec: "if": executable file not found in $PATH
Thus I'm stuck.
You need:
1/ Not to use ENTRYPOINT (or at least to use a sh -c 'if...' syntax): ENTRYPOINT is for runtime execution, not compile-time image build.
2/ To use build-time variables (--build-arg):
You can use ENV instructions in a Dockerfile to define variable values. These values persist in the built image.
However, often persistence is not what you want. Users want to specify variables differently depending on which host they build an image on.
A good example is http_proxy or source versions for pulling intermediate files. The ARG instruction lets Dockerfile authors define values that users can set at build-time using the --build-arg flag:
$ docker build --build-arg HTTP_PROXY=http://10.20.30.2:1234 .
In your case, your Dockerfile should include:
ARG ypenvironment
(a --build-arg value is only visible during the build if the Dockerfile declares a matching ARG.)
Then: docker build --build-arg ypenvironment=dev ... myDevImage
You will build 2 different images (based on the same Dockerfile)
Since, as you note, "I need to be able to use the same EAR package for dev and pro environments," you instead want your ENTRYPOINT, when run, to move a file depending on the value of an environment variable.
Your Dockerfile can declare a default for that variable:
ENV ypenvironment=dev
But you need to run your single image with:
docker run -e ypenvironment=dev ...
Make sure your script (referenced by your entrypoint) includes the if [ "$ypenvironment" = "pro" ] ; then ... test you mention in your question (note the spaces around the =, otherwise the test always succeeds), plus the actual launch (in the foreground) of your app; see the sketch below.
Your script must not exit right away, or your container will switch to the exited status right after it starts.
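
As a minimal sketch of such an entrypoint (written in Python for illustration and assuming Python is available in the image; the same few steps work equally well as a shell script), it picks the GlassFish config based on ypenvironment and then launches the server in the foreground:

#!/usr/bin/env python3
# Hypothetical entrypoint: selects the GlassFish domain config based on the
# ypenvironment variable injected with `docker run -e ...`, then starts the
# server in the foreground so the container keeps running.
import os
import shutil

CONFIG_DST = "/usr/local/glassfish/glassfish/domains/domain1/config/domain.xml"

env = os.environ.get("ypenvironment", "dev")
src = "/var/app/GF_domain.xml" if env == "pro" else "/var/app/GF_domain.xml.dev"
shutil.copyfile(src, CONFIG_DST)

# Assumes asadmin is on PATH; --verbose keeps it attached in the foreground.
os.execvp("asadmin", ["asadmin", "start-domain", "--verbose"])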
When working with Docker you must differentiate between build-time actions and run-time actions.
Dockerfiles are used for building Docker images, not for deploying containers. This means that all the commands in the Dockerfile are executed when you build the Docker image, not when you deploy a container from it.
CMD and ENTRYPOINT are special instructions: they are set at build time, but they tell Docker what command to execute when a container is started from that image.
Now, in your case a better approach would be to check if Glassfish supports environment variables inside domain.xml (or somewhere else). If it does, you can use the same domain.xml file for both environments, and have the same Docker image for both of them. You then differentiate between the environments by injecting run-time environment variables to the containers by using docker run -e "VAR=value" when running locally, and by using the Environment Properties configuration section when deploying on Elastic Beanstalk.
Edit: In case you can't use environment variables inside domain.xml, you can solve the problem by starting the container with a script which reads the runtime environment variables and puts their values in the correct places in domain.xml using sed, then starts your application as usual. You can find an example in this post.