Health check in Cloud Foundry - cloud-foundry

Does anyone know how I can tell my Cloud Foundry instance to monitor my health endpoint, so that the app is restarted whenever the endpoint reports a status other than UP?

The cf CLI 6.24.0 (released Feb 2017) exposed this type of health checking.
In your app manifest, use:
applications:
- name: myapp
  health-check-type: http
  health-check-http-endpoint: /admin/health
Your app needs to return a 200 status code from that path when it is healthy, and an error code when its status is not UP.
You can also use the cf set-health-check command to configure it on existing apps.
Check out this documentation for more details on the different health check types.
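As a rough illustration of what the app side could look like, here is a minimal sketch using only the Python standard library. The path matches the manifest above; app_is_up() is a hypothetical placeholder for your own checks, and 503 is just one possible error code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

HEALTH_PATH = "/admin/health"  # same path as in the manifest above


def health_response(is_up: bool) -> Tuple[int, bytes]:
    """Map the app's own health state to an HTTP status and body."""
    if is_up:
        return 200, b'{"status": "UP"}'
    # An error code tells the platform that the instance is unhealthy.
    return 503, b'{"status": "DOWN"}'


def app_is_up() -> bool:
    """Hypothetical placeholder: replace with real checks (database, queues, ...)."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != HEALTH_PATH:
            self.send_error(404)
            return
        status, body = health_response(app_is_up())
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int) -> None:
    """Serve the health endpoint until the process is stopped."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

On Cloud Foundry the port to listen on is passed in the PORT environment variable, so the entry point would be something like serve(int(os.environ["PORT"])).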

If an app instance dies, Cloud Foundry will, by default, create a new instance and try to start it. That resiliency is built into Cloud Foundry.
Actuators are REST endpoints that Spring Boot adds to your app automatically; they let you see the app's status and health at runtime.
https://spring.io/guides/gs/actuator-service/
Try Actuators out.

I don't believe that custom URL health checking is available today in CF. If your application instance is no longer healthy and you want to restart it, you can call System.exit(1) and CF will restart it for you.
I've heard rumors of custom health checks possibly coming in the future with the CC V3 API and Diego.
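The exit-on-unhealthy idea above is not specific to Java. A minimal sketch of the same pattern in Python, assuming you supply your own is_healthy callable (the max_checks, exit_fn and sleep_fn parameters are hypothetical and exist only to make the loop testable):

```python
import sys
import time
from typing import Callable, Optional


def watch_and_exit(is_healthy: Callable[[], bool],
                   interval_seconds: float = 10.0,
                   max_checks: Optional[int] = None,
                   exit_fn: Callable[[int], None] = sys.exit,
                   sleep_fn: Callable[[float], None] = time.sleep) -> None:
    """Poll is_healthy() and exit with status 1 on the first failure."""
    checks = 0
    while max_checks is None or checks < max_checks:
        if not is_healthy():
            # A non-zero exit ends the process; the platform then starts
            # a replacement instance.
            exit_fn(1)
            return
        checks += 1
        sleep_fn(interval_seconds)
```

Note that sys.exit only ends the whole process when called from the main thread; from a background thread it just ends that thread, so os._exit(1) would be needed there.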

The way to configure a health check in PCF:
cf set-health-check APP-NAME <HEALTH-CHECK-TYPE> --endpoint <CUSTOM-HTTP-ENDPOINT>
HEALTH-CHECK-TYPE = process | port | http ( ideally http for web apps )
CUSTOM-HTTP-ENDPOINT = /health
Reference: https://docs.cloudfoundry.org/devguide/deploy-apps/healthchecks.html
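If you need to apply this to many apps from a script, the command can be assembled programmatically. This is only a sketch: it assumes the cf CLI is installed and already logged in, and the function names are made up for illustration. Rejecting an endpoint for non-http types is a choice of this sketch, since a path only makes sense for an http check.

```python
import subprocess
from typing import List, Optional

VALID_TYPES = ("process", "port", "http")


def set_health_check_command(app_name: str, check_type: str,
                             endpoint: Optional[str] = None) -> List[str]:
    """Build the argument list for cf set-health-check."""
    if check_type not in VALID_TYPES:
        raise ValueError(f"unknown health check type: {check_type}")
    command = ["cf", "set-health-check", app_name, check_type]
    if endpoint is not None:
        if check_type != "http":
            raise ValueError("an endpoint only applies to the http type")
        command += ["--endpoint", endpoint]
    return command


def apply_health_check(app_name: str, check_type: str,
                       endpoint: Optional[str] = None) -> None:
    """Run the command; raises CalledProcessError if cf reports a failure."""
    subprocess.run(set_health_check_command(app_name, check_type, endpoint),
                   check=True)
```

For example, set_health_check_command("myapp", "http", "/health") produces the same command line as shown above with APP-NAME set to myapp.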

Related

Cloud Run Error 504 (Upstream Request Timeout) after successful deploy

I was following this tutorial from Google to deploy a service to Cloud Run (https://codelabs.developers.google.com/codelabs/cloud-run-hello-python3#5). In Cloud Shell my project is deployed successfully (screenshot below). However, once I click on the link I get a timeout. If I test it locally from Cloud Shell it works fine.
Why could this be happening? Where could I get more data about the issue?
As mentioned in the documentation:
For Cloud Run services, the request timeout setting specifies the time
within which a response must be returned by services deployed to Cloud
Run. If a response isn't returned within the time specified, the
request ends and error 504 is returned.
The timeout is set by default to 5 minutes and can be extended up to
60 minutes. You can change this setting when you deploy a container
image or by updating the service configuration. In addition to
changing the Cloud Run request timeout, you should also check your
language framework to see whether it has its own request timeout
setting that you must also update.
You can refer to this public group issue, which may help in resolving the current error.
You can increase the timeout by clicking EDIT & DEPLOY NEW REVISION and then adjusting the Request timeout value.

Airflow web-server produces temporary 502 errors in Cloud Composer

I'm encountering 502 errors on the Airflow (2.0.2) UI hosted in Cloud Composer (1.17.0).
Error: Server Error The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.
The errors last for a few minutes and occur several times a day; once they pass, everything works fine.
At the moment of the errors:
there is a gap in the logs, after which they resume with messages about starting gunicorn:
[1133] [INFO] Starting gunicorn 19.10.0
there is a spike in resource usage of the web server
I didn't spot any other suspicious activity in other parts of the system (workers, scheduler, DB).
I think this is the result of an OOM error, because we have DAGs with a large number of tasks (2k).
But I'd like to be sure, and I haven't found a way to connect to the App Engine VM in the tenant project (where the Airflow web server is hosted by default) to get additional logs.
Does anyone know a way to get additional logs from the Airflow server VMs, or have any other ideas?
The Cloud Composer documentation has a Troubleshooting DAGs section. It shows how to check individual worker logs and even mentions OOM issues (direct link).
The troubleshooting section is generally well documented, so you should find plenty of useful information there. You can also use Cloud Monitoring and Cloud Logging to monitor Composer, but I am not sure how valuable that will be in this use case (reference).

How to determine if application/application container has started on AWS Elastic Beanstalk?

I am writing a generic application deployment tool. It takes an application from the user and deploys it to Elastic Beanstalk. That part is working. The issue is that the users want to compose the use of the deployment tool with other operations, and right now my tool reports success when it has told the Beanstalk APIs to start the application.
Unfortunately, it is thus returning before the application itself has started. So the user is forced to write polling logic themselves to await the starting of their application.
Looking at the AWS Elastic Beanstalk API, I cannot see any method that reports such a state. The closest I can find is DescribeEvents, which looks hopeful; however, judging from the examples, that API does not report at the granularity of the application or application container starting within the environment:
<DescribeEventsResponse xmlns="https://elasticbeanstalk.amazonaws.com/docs/2010-12-01/">
<DescribeEventsResult>
<Events>
<member>
<Message>Successfully completed createEnvironment activity.</Message>
<EventDate>2010-11-17T20:25:35.191Z</EventDate>
<VersionLabel>New Version</VersionLabel>
<RequestId>bb01fa74-f287-11df-8a78-9f77047e0d0c</RequestId>
<ApplicationName>SampleApp</ApplicationName>
<EnvironmentName>SampleAppVersion</EnvironmentName>
<Severity>INFO</Severity>
</member>
Note: the INFO level event says that the environment was created; nothing at the lower level of the application container starting within the environment appears to be reported.
I could mandate that the applications deployed with this tool expose a status REST endpoint, but that puts restrictions on the application.
Is there some API I am missing that reports when the application container (e.g. Tomcat, Node, etc.) has started, or better yet, when the application deployed within the container has started? I can live with just the application container.
Every application is supposed to expose a health URL; otherwise Beanstalk/ELB will run into problems in any case, because it will think the instances are not responding and might replace them. The check is typically a HEAD request expecting a 200 OK.
Since this URL is expected to exist in all apps anyway, you can hit it to check that the deployment is OK. I guess the Beanstalk console itself uses this method.
You can also poll the DescribeEnvironments API call, which gives you the environment CNAME (the URL to check), the health of the environment (RED, GREEN), and its status (Launching | Updating | Ready | Terminating | Terminated). The call takes the environment name as an argument, so you can fetch the description of just one environment.
API Documentation: http://docs.aws.amazon.com/elasticbeanstalk/latest/APIReference/API_DescribeEnvironments.html
Explanation of Environment Description in response: http://docs.aws.amazon.com/elasticbeanstalk/latest/APIReference/API_EnvironmentDescription.html
Sample Response Below:
<DescribeEnvironmentsResponse xmlns="https://elasticbeanstalk.amazonaws.com/docs/2010-12-01/">
<DescribeEnvironmentsResult>
<Environments>
<member>
<VersionLabel>Version1</VersionLabel>
<Status>Available</Status>
<ApplicationName>SampleApp</ApplicationName>
<EndpointURL>elasticbeanstalk-SampleApp-1394386994.us-east-1.elb.amazonaws.com</EndpointURL>
<CNAME>SampleApp-jxb293wg7n.elasticbeanstalk.amazonaws.com</CNAME>
<Health>Green</Health>
<EnvironmentId>e-icsgecu3wf</EnvironmentId>
<DateUpdated>2010-11-17T04:01:40.668Z</DateUpdated>
<SolutionStackName>32bit Amazon Linux running Tomcat 7</SolutionStackName>
<Description>EnvDescrip</Description>
<EnvironmentName>SampleApp</EnvironmentName>
<DateCreated>2010-11-17T03:59:33.520Z</DateCreated>
</member>
</Environments>
</DescribeEnvironmentsResult>
<ResponseMetadata>
<RequestId>44790c68-f260-11df-8a78-9f77047e0d0c</RequestId>
In your case you may want to read the following documentation:
Monitoring Application Health
You can also configure an Application Health Check URL for your environment. By default, AWS Elastic Beanstalk uses a TCP:80 check on your instances; setting the Application Health Check URL option overrides this with an HTTP:80 check, as described here.
Using Status/Health from DescribeEnvironments you can check if the application has been deployed.
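The polling described above could be sketched roughly as follows. The describe callable is an assumption of this sketch: it is expected to return a dict holding the Status and Health fields of one environment. Treating Ready plus Green as "started" is also a choice made here, not something the API mandates.

```python
import time
from typing import Callable, Dict


def wait_until_ready(describe: Callable[[], Dict[str, str]],
                     timeout_seconds: float = 600.0,
                     interval_seconds: float = 15.0,
                     sleep_fn: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until Status is Ready and Health is Green.

    Returns False on timeout or if the environment is being terminated.
    """
    deadline = clock() + timeout_seconds
    while True:
        env = describe()
        status = env.get("Status") or ""
        health = (env.get("Health") or "").lower()
        if status == "Ready" and health == "green":
            return True
        if status in ("Terminating", "Terminated"):
            return False
        if clock() >= deadline:
            return False
        sleep_fn(interval_seconds)
```

With boto3 installed, describe could be a small lambda around boto3.client("elasticbeanstalk").describe_environments(EnvironmentNames=[name])["Environments"][0]. A deployment tool would then report success only once wait_until_ready returns True.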

Error when deploying applications to Cloud Foundry using the CloudBees plugin

I have integrated my Cloud Foundry account with CloudBees as described at
http://docs.cloudfoundry.com/docs/dotcom/integration/cloudbees/
and am trying to deploy a few sample applications from GitHub.
The build succeeded every time, but when I tried to deploy the app using this plugin, it threw an exception (the same exception for the 2-3 applications I have tried).
[INFO] Deployment done in 1.2 sec
[cloudbees-deployer] Deploying as (jenkins) to the svcnvghi293 account
[cloudbees-deployer] Deploying null
com.cloudbees.plugins.deployer.exceptions.DeployException: Could not create DeployEvent
at com.cloudbees.plugins.deployer.impl.run.RunEngineImpl.createEvent(RunEngineImpl.java:132)
at com.cloudbees.plugins.deployer.impl.run.RunEngineImpl.createEvent(RunEngineImpl.java:51)
at com.cloudbees.plugins.deployer.engines.Engine.perform(Engine.java:82)
at com.cloudbees.plugins.deployer.DeployPublisher.perform(DeployPublisher.java:95)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:728)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:703)
at hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.post2(MavenModuleSetBuild.java:994)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:650)
at hudson.model.Run.execute(Run.java:1530)
at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:477)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
Caused by: java.lang.NullPointerException
at com.cloudbees.plugins.deployer.impl.run.RunEngineImpl$EventImpl.<init>(RunEngineImpl.java:208)
at com.cloudbees.plugins.deployer.impl.run.RunEngineImpl.createEvent(RunEngineImpl.java:124)
... 12 more
Build step 'Deploy applications' marked build as failure
Finished: FAILURE
Does anyone have any idea about this?
Thanks in advance.
After a bit of digging I figured out which account you have.
The issue is that you had left the CloudBees RUN#Cloud host service in the list of host services to deploy to, but had not provided a complete configuration for it; e.g. see the "Application Id cannot be empty" red error text in this screenshot.
I have removed this host section and saved your hellospring job. Build 8 shows a successful deployment.

What is the reason for the AWS health status becoming RED?

I've deployed an application to AWS Elastic Beanstalk.
After starting, the application runs well. But after 5 minutes (I set the health check to run every 5 minutes), it fails: when I access the URL, I get an HTTP 503 error back.
From the event info, I only learn that the health status went from YELLOW TO GREEN.
How can I get detailed info, and what can I do about this error?
BTW: I don't understand whether the RED health status prevents the application from starting up, or whether something else makes the application fail and the health status then becomes RED.
Elastic Load Balancing has a health check daemon that checks the path you've provided for a 200-range HTTP status.
If there is a problem with the application, or it's not returning a 2xx status code, or you've misconfigured the health check URL, the status will go RED.
Two things you can do to see what's going on:
Hit the hostname of an individual instance in your web browser — particularly the health check path. Are you seeing what you expected?
SSH into the instance and check the logs in /var/log and /opt/elasticbeanstalk/var/log. Are there any errors that you can find?
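The first check can also be scripted instead of done in a browser. A minimal sketch with the Python standard library; the function names are made up for illustration, and the URL you pass in (instance hostname plus health check path) is up to you:

```python
import urllib.error
import urllib.request


def probe(url: str, timeout_seconds: float = 5.0) -> int:
    """Return the HTTP status code for a GET on url, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions but still carry a status code.
        return error.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout and similar.
        return 0


def passes_health_check(status: int) -> bool:
    """Mirror the 200-range rule described above."""
    return 200 <= status < 300
```

Keep in mind that urlopen follows redirects, so a 3xx answer from the health path shows up here as the status of the redirect target rather than as the 3xx itself.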
Without knowing more about your application, stack or container type, that's the best I can do.
I hope this helps! :)