Implicit process creation when pushing a Spring Boot application - cloud-foundry

I am pushing a minimalistic Spring Boot web application to Cloud Foundry. My manifest looks like this:
---
applications:
- name: training-app
  path: target/spring-boot-initial-0.0.1-SNAPSHOT.jar
  instances: 1
  memory: 1G
  buildpacks:
  - java_buildpack
  env:
    TRAINING_KEY_3: from manifest
When I push the application with the Java Buildpack (https://github.com/cloudfoundry/java-buildpack/releases/tag/v4.45), I see that it creates an additional process of type task, which does not have any running instances though.
name: training-app
requested state: started
isolation segment: trial
routes: ***************************
last uploaded: Thu 20 Jan 21:29:31 IST 2022
stack: cflinuxfs3
buildpacks:
isolation segment: trial
name             version                                                                      detect output   buildpack name
java_buildpack   v4.45-offline-https://github.com/cloudfoundry/java-buildpack.git#f1b695a0   java            java
type: web
sidecars:
instances: 1/1
memory usage: 1024M
start command: JAVA_OPTS="-agentpath:$PWD/.java-buildpack/open_jdk_jre/bin/jvmkill-1.16.0_RELEASE=printHeapHistogram=1 -Djava.io.tmpdir=$TMPDIR -XX:ActiveProcessorCount=$(nproc)
-Djava.ext.dirs=$PWD/.java-buildpack/container_security_provider:$PWD/.java-buildpack/open_jdk_jre/lib/ext -Djava.security.properties=$PWD/.java-buildpack/java_security/java.security $JAVA_OPTS" &&
CALCULATED_MEMORY=$($PWD/.java-buildpack/open_jdk_jre/bin/java-buildpack-memory-calculator-3.13.0_RELEASE -totMemory=$MEMORY_LIMIT -loadedClasses=13109 -poolType=metaspace -stackThreads=250 -vmOptions="$JAVA_OPTS") && echo JVM Memory Configuration:
$CALCULATED_MEMORY && JAVA_OPTS="$JAVA_OPTS $CALCULATED_MEMORY" && MALLOC_ARENA_MAX=2 SERVER_PORT=$PORT eval exec $PWD/.java-buildpack/open_jdk_jre/bin/java $JAVA_OPTS -cp $PWD/. org.springframework.boot.loader.JarLauncher
     state     since                  cpu    memory        disk         details
#0   running   2022-01-20T15:59:55Z   0.0%   62.2M of 1G   130M of 1G
type: task
sidecars:
instances: 0/0
memory usage: 1024M
start command: JAVA_OPTS="-agentpath:$PWD/.java-buildpack/open_jdk_jre/bin/jvmkill-1.16.0_RELEASE=printHeapHistogram=1 -Djava.io.tmpdir=$TMPDIR -XX:ActiveProcessorCount=$(nproc)
-Djava.ext.dirs=$PWD/.java-buildpack/container_security_provider:$PWD/.java-buildpack/open_jdk_jre/lib/ext -Djava.security.properties=$PWD/.java-buildpack/java_security/java.security $JAVA_OPTS" &&
CALCULATED_MEMORY=$($PWD/.java-buildpack/open_jdk_jre/bin/java-buildpack-memory-calculator-3.13.0_RELEASE -totMemory=$MEMORY_LIMIT -loadedClasses=13109 -poolType=metaspace -stackThreads=250 -vmOptions="$JAVA_OPTS") && echo JVM Memory Configuration:
$CALCULATED_MEMORY && JAVA_OPTS="$JAVA_OPTS $CALCULATED_MEMORY" && MALLOC_ARENA_MAX=2 SERVER_PORT=$PORT eval exec $PWD/.java-buildpack/open_jdk_jre/bin/java $JAVA_OPTS -cp $PWD/. org.springframework.boot.loader.JarLauncher
There are no running instances of this process.
I understand that it is a Spring Boot web application and that this corresponds to the process of type web; however, I do not know:
Who is creating the process of type task?
What is the purpose of this process?
It would be great if someone is able to help me here.
Regards
AM

Who is creating the process of type task?
The buildpack creates both. This has been the behavior for a while, but recent cf CLI changes are making it more visible.
What is the purpose of this process?
I didn't add that to the buildpack, so I can't say with 100% certainty what its purpose is, but I believe it is meant to be used in conjunction with running Java apps as tasks on CF.
See this commit.
When you run a task, there is a --process flag on the cf run-task command which can be used to pick the process whose start command serves as the command template. I believe the idea is that you'd set it to task so that this command is used to run your task. See here for reference to that flag.
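For illustration, a hypothetical invocation would look something like this (the task name is made up; the flag requires a cf CLI version that supports it):
cf run-task training-app --process task --name my-task
As I understand it, omitting --command here makes the task reuse the start command of the process named by --process.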

Related

cf v3-push with manifest and variable substitution

I have a v3 app that I want to deploy to 2 different environments. The app name and some definitions vary from env to env, but the structure of the manifest is the same. For example:
# manifest_test.yml
applications:
- name: AppTest
  processes:
  - type: web
    command: start-web.sh
    instances: 1
  - type: worker
    command: start-worker.sh
    instances: 1
# manifest_prod.yml
applications:
- name: AppProd
  processes:
  - type: web
    command: start-web.sh
    instances: 3
  - type: worker
    command: start-worker.sh
    instances: 5
Instead of keeping duplicate manifests with only minor changes in variables, I wanted to use a single manifest with variable substitution. So I created something like this:
# manifest.yml
applications:
- name: App((env))
  processes:
  - type: web
    command: start-web.sh
    instances: ((web_instances))
  - type: worker
    command: start-worker.sh
    instances: ((worker_instances))
However, it seems like cf v3-apply-manifest doesn't have an option to provide variables for substitution (as cf push did).
Is there any way around this, or do I have to keep using a separate manifest for each environment?
Please try one of the cf v7 CLI beta releases. I didn't test it, but the output of cf7 push -h shows --vars and --vars-file flags. It also uses the v3 APIs, so it will support things like rolling deploys.
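For example, something along these lines (a sketch; the vars file name and values are hypothetical):
# vars-prod.yml
env: Prod
web_instances: 3
worker_instances: 5
cf7 push -f manifest.yml --vars-file vars-prod.yml
You can also pass values inline, e.g. cf7 push -f manifest.yml --var env=Prod --var web_instances=3 --var worker_instances=5.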
For what it's worth, if you're looking to use CAPI v3 features, you should probably use the cf7 beta releases going forward. That is going to get you the latest and greatest support for CAPI v3.
Hope that helps!

DB Migration on a Load-Balanced Cloud Foundry Deployment

I'm deploying an app on Cloud Foundry, and I run a db migration before the deployment. To do this, my launch command looked like:
./run_migration && ./run_app
That was working well on 1 instance, but now I have 2 instances, so the launch command was changed to:
[ $CF_INSTANCE_INDEX != 0 ] || ./run_migration && ./run_app
This way the migration runs only on instance number 0, and this works as well. However, at one point the migration failed:
2019-02-12T13:56:45.27+0100 [APP/PROC/WEB/0]OUT Exit status 1
2019-02-12T13:56:45.28+0100 [CELL/SSHD/0]OUT Exit status 0
OK
requested state: started
instances: 2/2
     state      since                    cpu     memory        disk
#0   starting   2019-02-12 01:56:36 PM   0.0%    0 of 1G       0 of 1G
#1   running    2019-02-12 01:56:39 PM   15.8%   93.3M of 1G   249.4M of 1G
So as far as I understand, the push is considered healthy although only one instance managed to start.
Is there a way to fail the push when not all instances managed to start?
Is there a way to fail the push when not all instances managed to start?
I don't know of a way to do that, but you could always follow up and check after cf push completes.
Run cf app <app> | grep 'instances:' and you should see the number running and the total requested. If they don't match, something's up.
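A minimal sketch of that check for a deployment script, assuming the running/requested format of the instances: line shown above (the app name is a placeholder):
#!/bin/sh
# fail the deploy script if running != requested, e.g. "instances: 1/2"
counts=$(cf app my-app | grep 'instances:' | awk '{print $2}')
running="${counts%/*}"
requested="${counts#*/}"
if [ "$running" != "$requested" ]; then
  echo "only $running of $requested instances are running" >&2
  exit 1
fi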
If you're just trying to make your deployment script fail, a check like that, while a little more work, should suffice. If there's some other reason, you'll need to add some background to your question so we can understand the use case better.
Hope that helps!

Reuse a Cloud Foundry app without having to rebuild from scratch

I deploy a Django application with Cloud Foundry. Building the app takes some time, but I need to launch the application with different start commands, and the only solution I have today is to fully rebuild the application each time.
With Docker, changing the start command is very easy and doesn't require rebuilding the whole container, so there must be a more efficient way to do this.
Here are the applications launched:
FrontEndApp-Prod: The Django App using gunicorn
OrchesterApp-Prod: The Django Celery Camera & Heartbeat
WorkerApp-Prod: The Django Celery Workers
All these apps are basically identical, they just use different routes, configurations and start commands.
Below is the manifest.yml file I use:
defaults: &defaults
  timeout: 120
  memory: 768M
  disk_quota: 2G
  path: .
  stack: cflinuxfs2
  buildpack: https://github.com/cloudfoundry/buildpack-python.git
  services:
  - PostgresDB-Prod
  - RabbitMQ-Prod
  - Redis-Prod

applications:
- name: FrontEndApp-Prod
  <<: *defaults
  routes:
  - route: www.myapp.com
  instances: 2
  command: chmod +x ./launch_server.sh && ./launch_server.sh
- name: OrchesterApp-Prod
  <<: *defaults
  memory: 1G
  instances: 1
  command: chmod +x ./launch_orchester.sh && ./launch_orchester.sh
  health-check-type: process
  no-route: true
- name: WorkerApp-Prod
  <<: *defaults
  instances: 3
  command: chmod +x ./launch_worker.sh && ./launch_worker.sh
  health-check-type: process
  no-route: true
Two options I can think of for this:
You can use some of the new v3 API features and take advantage of their support for multiple processes via a Procfile. With that, you'd essentially have a Procfile like this (process type names must be unique, so the two worker scripts get distinct types):
web: ./launch_server.sh
orchester: ./launch_orchester.sh
worker: ./launch_worker.sh
The platform should then stage your app once, but run it as multiple processes based on the droplet that is produced from staging. It's slick because you end up with only one application that has multiple processes running off of it. The drawback is that this is an experimental API at the time of writing, so it still has some rough edges, plus the exact support you get could vary depending on how quickly your CF provider installs new versions of the Cloud Controller API.
You can read all the details about this here:
https://www.cloudfoundry.org/blog/build-cf-push-learn-procfiles/
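As a side note, once the processes exist you should be able to scale each type independently. A hypothetical example using the experimental v3 commands of that era (app name made up; run cf v3-scale -h to confirm the flags on your CLI version):
cf v3-scale my-app --process worker -i 3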
You can use cf local. This is a cf CLI plugin which allows you to build a droplet locally (staging occurs in a Docker container on your local machine). You can then take that droplet and deploy it as much as you want.
The process would look roughly like this; you'll just need to fill in some options/flags (hint: run cf local -h to see all the options):
cf local stage
cf local push FrontEndApp-Prod
cf local push OrchesterApp-Prod
cf local push WorkerApp-Prod
The first command will create a file ending in .droplet in your current directory; the subsequent three commands will deploy that droplet to your provider and run it. The net result is that you should end up with three applications, like you have now, that are all deployed from the same droplet.
The drawback is that your droplet is local, so you're uploading it three times, once for each app.
I suppose you also have a third option, which is to just use a Docker container. That has its own advantages & drawbacks though.
Hope that helps!

Issues with Docker Swarm running TeamCity using rexray/ebs for drive persistence in AWS EBS

I'm quite new to Docker but have started thinking about production set-ups, hence needing to crack the challenge of data persistence when using Docker Swarm. I decided to start by creating my deployment infrastructure (TeamCity for builds and NuGet, plus the registry [https://hub.docker.com/_/registry/] for storing images).
I've started with TeamCity. Obviously, this needs data persistence in order to work. I am able to run TeamCity in a container with an EBS drive, and everything looks like it is working just fine: TeamCity works through the set-up steps and my TeamCity drives appear in AWS EBS, but then the worker node that TeamCity gets allocated to shuts down and the install process stops.
Here are all the steps I'm following:
Phase 1 - Machine Setup:
Create one AWS instance for master
Create two AWS instances for workers
All are 64-bit Ubuntu t2.micro instances
Create three elastic IPs for convenience and assign them to the above machines.
Install Docker on all nodes using this: https://docs.docker.com/install/linux/docker-ce/ubuntu/
Install Docker Machine on all nodes using this: https://docs.docker.com/machine/install-machine/
Install Docker Compose on all nodes using this: https://docs.docker.com/compose/install/
Phase 2 - Configure Docker Remote on the Master:
$ sudo docker run -p 2375:2375 --rm -d -v /var/run/docker.sock:/var/run/docker.sock jarkt/docker-remote-api
Phase 3 - install the rexray/ebs plugin on all machines:
$ sudo docker plugin install --grant-all-permissions rexray/ebs REXRAY_PREEMPT=true EBS_ACCESSKEY=XXX EBS_SECRETKEY=YYY
[I lifted the correct values from AWS for XXX and YYY]
I test this using:
$ sudo docker volume create --driver=rexray/ebs --name=delete --opt=size=2
$ sudo docker volume rm delete
All three nodes are able to create and delete drives in AWS EBS with no issue.
Phase 4 - Setup the swarm:
Run this on the master:
$ sudo docker swarm init --advertise-addr eth0:2377
This gives the command to run on each of the workers, which looks like this:
$ sudo docker swarm join --token XXX 1.2.3.4:2377
These execute fine on the worker machines.
Phase 5 - Set up visualisation using Remote Powershell on my local machine:
$ $env:DOCKER_HOST="{master IP address}:2375"
$ docker stack deploy --with-registry-auth -c viz.yml viz
viz.yml looks like this:
version: '3.1'
services:
  viz:
    image: dockersamples/visualizer
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    ports:
      - "8080:8080"
    deploy:
      placement:
        constraints:
          - node.role==manager
This works fine and allows me to visualise my swarm.
Phase 6 - Install TeamCity using Remote Powershell on my local machine:
$ docker stack deploy --with-registry-auth -c docker-compose.yml infra
docker-compose.yml looks like this:
version: '3'
services:
  teamcity:
    image: jetbrains/teamcity-server:2017.1.2
    volumes:
      - teamcity-server-datadir:/data/teamcity_server/datadir
      - teamcity-server-logs:/opt/teamcity/logs
    ports:
      - "80:8111"
volumes:
  teamcity-server-datadir:
    driver: rexray/ebs
  teamcity-server-logs:
    driver: rexray/ebs
[Incorporating NGINX as a proxy is a later step on my to do list.]
I can see both of the required drives appear in AWS EBS, and the container appears in my swarm visualisation.
However, after a while of showing the TeamCity progress screen, the worker machine hosting the TeamCity instance shuts down and the process abruptly ends.
I'm at a loss as to what to do next. I'm not even sure where to look for logs.
Any help gratefully received!
Cheers,
Steve.
I found a way to get logs for my service. First do this to list the services the stack creates:
$ sudo docker service ls
Then do this to see logs for the service:
$ sudo docker service logs --details {service name}
Now I just need to wade through the logs and see what went wrong...
Update
I found the following error in the logs:
infra_teamcity.1.bhiwz74gnuio#ip-172-31-18-103 | [2018-05-14 17:38:56,849] ERROR - r.configs.dsl.DslPluginManager - DSL plugin compilation failed
infra_teamcity.1.bhiwz74gnuio#ip-172-31-18-103 | exit code: 1
infra_teamcity.1.bhiwz74gnuio#ip-172-31-18-103 | stdout: #
infra_teamcity.1.bhiwz74gnuio#ip-172-31-18-103 | # There is insufficient memory for the Java Runtime Environment to continue.
infra_teamcity.1.bhiwz74gnuio#ip-172-31-18-103 | # Native memory allocation (mmap) failed to map 42012672 bytes for committing reserved memory.
infra_teamcity.1.bhiwz74gnuio#ip-172-31-18-103 | # An error report file with more information is saved as:
infra_teamcity.1.bhiwz74gnuio#ip-172-31-18-103 | # /opt/teamcity/bin/hs_err_pid125.log
infra_teamcity.1.bhiwz74gnuio#ip-172-31-18-103 |
infra_teamcity.1.bhiwz74gnuio#ip-172-31-18-103 | stderr: Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000e2dfe000, 42012672, 0) failed; error='Cannot allocate memory' (errno=12)
Which is making me think this is a memory problem. I'm going to try this again with a better AWS instance and see how I get on.
Update 2
Using a larger AWS instance solved the issue. :)
I then discovered that rexray/ebs doesn't like it when a container switches between hosts in my swarm: it duplicates the EBS volumes so that it keeps one per machine. My solution to this was to use an EFS drive in AWS and mount it on each possible host. I then updated /etc/fstab so that the drive is remounted on every reboot. Job done. Now to look into using a reverse proxy...
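For anyone following along, the /etc/fstab entry for an EFS mount looks roughly like this (hypothetical filesystem ID, region and mount point; the NFS options are the ones AWS documents for EFS):
fs-12345678.efs.eu-west-1.amazonaws.com:/ /mnt/efs nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev 0 0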

How to tune/troubleshoot/optimize docker block i/o on AWS

I have the following docker containers that I have set up to test my web application:
Jenkins
Apache 1 (serving a laravel app)
Apache 2 (serving a legacy codeigniter app)
MySQL (accessed by both Apache 1 and Apache 2)
Selenium HUB
Selenium Node — ChromeDriver
The Jenkins job runs a behat command on Apache 1, which in turn connects to the Selenium Hub, which has a ChromeDriver node to actually hit the two apps: Apache 1 and Apache 2.
The whole system is running on an EC2 t2.small instance (1 core, 2GB RAM) with AWS linux.
The problem
The issue I am having is that if I run the pipeline multiple times, the first few times it runs just fine (the behat stage takes about 20s), but on the third and subsequent runs, the behat stage starts slowing down (taking 1m30s) and then failing after 3m, or 10m, or whenever I lose patience.
If I restart the docker containers, it works again, but only for another 2-4 runs.
Clues
Monitoring docker stats each time I run the Jenkins pipeline, I noticed that the Block I/O, and specifically the 'I', was growing exponentially after the first few runs.
[Screenshots of docker stats after runs 1 through 4 showed the Block I/O figure climbing steeply with each run.]
The Block I/O for the chromedriver container is 21GB and the driver hangs. While I might expect the Block I/O to grow, I wouldn't expect it to grow exponentially as it seems to be doing. It's like something is... exploding.
The same docker configuration (using docker-compose) runs flawlessly every time on my personal MacBook Pro. Block I/O does not 'explode'. I constrain Docker to only use 1 core and 2GB of RAM.
What I've tried
This situation has sent me down the path of learning a lot more about docker, filesystems and memory management, but I'm still not resolving the issue. Some of the things I have tried:
Memory
I set mem_limit options on all containers and tuned them so that during any given run, the memory would not reach 100%. Memory usage now seems fairly stable, and never 'blows up'.
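For illustration, the limits look something like this in a v2-format compose file (hypothetical service names and values; mem_limit is a service-level key in the v2 format):
version: '2'
services:
  hub:
    image: selenium/hub
    mem_limit: 256m
  chromedriver:
    image: selenium/node-chrome
    mem_limit: 512m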
Storage Driver
The default for AWS Linux Docker is devicemapper in loop-lvm mode. After reading this doc
https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/#configure-docker-with-devicemapper
I switched to the suggested direct-lvm mode.
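For reference, the resulting /etc/docker/daemon.json ends up looking something like this (a sketch based on that doc; the thin pool device name matches the Pool Name in the docker info output below):
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.thinpooldev=/dev/mapper/docker-thinpool",
    "dm.use_deferred_removal=true"
  ]
}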
docker-compose restart
This does indeed 'reset' the issue, allowing me to get a few more runs in, but it doesn't last. After 2-4 runs, things seize up and the tests start failing.
iotop
Running iotop on the host shows that reads are going through the roof.
My Question...
What is happening that causes the block I/O to grow exponentially? I'm not clear whether it's Docker, Jenkins, Selenium or ChromeDriver causing the problem. My first guess is ChromeDriver, although the other containers are also showing signs of 'exploding'.
What is a good approach to tuning a system like this with multiple moving parts?
Additional Info
My chromedriver container has the following environment set in docker-compose:
- SE_OPTS=-maxSession 6 -browser browserName=chrome,maxInstances=3
docker info:
$ docker info
Containers: 6
Running: 6
Paused: 0
Stopped: 0
Images: 5
Server Version: 1.12.6
Storage Driver: devicemapper
Pool Name: docker-thinpool
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 4.862 GB
Data Space Total: 20.4 GB
Data Space Available: 15.53 GB
Metadata Space Used: 2.54 MB
Metadata Space Total: 213.9 MB
Metadata Space Available: 211.4 MB
Thin Pool Minimum Free Space: 2.039 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: overlay null host bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.51-40.60.amzn1.x86_64
Operating System: Amazon Linux AMI 2017.03
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.956 GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/