Decrease in Instances Size Suddenly - cloud-foundry

I deployed a Spring Boot Java application to CF in one org and space. Initially I deployed the app with 2 instances, as described in the manifest file. Later, due to a requirement, I increased the number of instances to 4, but I did this via the cf push command ("cf push appname -i 4") in a Bamboo script task. Everything was running as expected, but on Friday the instance count suddenly dropped back to 2, with no changes to the manifest or the script. I checked "cf logs app-name --recent" and "cf events appname" and there is no clue about an app crash. What might be the reason for this weird behaviour? Do I need to check somewhere else? If the app restarts suddenly, would it pick up the manifest file again? Please help me with this.

It is unclear if you're referring to the total instance count or the currently running instance count.
Running Instance Count
If your currently running instance count does not match the total instance count, then there is some sort of problem with the instances: the app is crashing, you've scaled up/down recently and new instances couldn't be created, or perhaps your platform is undergoing maintenance and app instances are being shuffled around.
In all cases, you'd want to look at cf events and cf logs for clues. The events output will tell you if it's an app crash or a specific scaling event. If there's nothing in the events output, it could be platform maintenance, which doesn't generate event entries.
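For example (the app name is hypothetical):

cf events my-app          # crash, scale, and push events, with the actor that triggered them
cf logs my-app --recent   # recent application and platform log lines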
Total Instance Count
If the total instance count does not match what you'd expect, then it was changed by someone or something. The total count is stored in Cloud Controller and would not change unless it was directed.
If your total instance count is changing unexpectedly, look at the cf events output. This will include entries that modify the application instance count, like cf push and cf scale. It will also tell you who made the change (the actor), which can help to track down how it's being changed.
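For example, a one-off scale made outside the manifest (the app name is hypothetical):

cf scale my-app -i 4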
I'm not totally sure this is what's happening here, but I have seen others hit by it. If you do a one-off cf push or cf scale, that change will stick on the server. However, if you are using a manifest.yml file, that file also contains state, i.e. your old instance count. If you, someone else, or your CI then does a cf push with the manifest.yml that still has the old instance count, that push will set the instance count on Cloud Controller back to the old value.
In short, if you do a one-off cf push or cf scale to change the instance count, you need to make sure your manifest.yml is kept up-to-date and that the updated file is checked into CI (if you're doing that).
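A sketch of what an up-to-date manifest would look like (the app name and memory value are illustrative):

applications:
  - name: my-app
    memory: 1G
    instances: 4   # keep this in sync after any one-off cf scale or cf push -i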
Lastly, check if you have any autoscaling rules in place. Those could also obviously impact your instance count. If you change your total instance count, you'll probably also want to update your scaling rules.

Related

How can I have my instances refresh automatically in Google Cloud Run?

I have a Cloud Run application built in Python and Docker. The application serves a dashboard that runs queries against data and displays visualizations and statistics. Currently, if I want the app to load quickly I have to set the minimum number of instances to a number greater than 0, I typically use 10. This is great for serving the app immediately, however it can become outdated. I would essentially like to be able to keep a minimum number of instances available to serve the app immediately, but I would like it if they would refresh, or shut down and start up, once every few hours or at least once a day. Is there a way to achieve this?
I have tried looking into Cloud Scheduler to somehow get the Cloud Run application to refresh on a schedule, but I was unclear on how to make the whole thing shut down and reload, especially without serving another revision.
I have a design in mind, but I have never tested it. Try the following:
At startup, store time.Now() in a global variable. That gives you the startup time.
Implement a health check probe. The health check answers OK (HTTP 200) if now minus the startup time (the global variable) is below X (1 hour, for instance); otherwise it answers KO (HTTP 500).
Deploy your Cloud Run service with the new health check feature.
That way, after X has elapsed, the instance will declare itself unhealthy and Cloud Run will evict it and create a new one.
It should work. Let me know, I'm interested in the result!
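A minimal Go sketch of that design, assuming a /healthz path, port 8080, and a 1-hour cutoff (all of those values are assumptions, not requirements):

package main

import (
    "fmt"
    "net/http"
    "time"
)

// startupTime is recorded once, when the instance starts.
var startupTime = time.Now()

// maxAge is the duration after which this instance reports itself unhealthy.
const maxAge = 1 * time.Hour

func healthHandler(w http.ResponseWriter, r *http.Request) {
    if time.Since(startupTime) < maxAge {
        w.WriteHeader(http.StatusOK) // still young enough, keep serving
        fmt.Fprintln(w, "ok")
        return
    }
    // Past the cutoff: report unhealthy so the platform replaces the instance.
    w.WriteHeader(http.StatusInternalServerError)
    fmt.Fprintln(w, "stale instance")
}

func main() {
    http.HandleFunc("/healthz", healthHandler)
    http.ListenAndServe(":8080", nil)
}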

GCP cloud run send a request to all running instances

I have a REST API running on Cloud Run that implements a cache, which needs to be cleared maybe once a week when I update a certain property in the database. Is there any way to send an HTTP request to all running instances of my application? Right now my understanding is that even if I send multiple requests and there are 5 instances, they could all go to one instance. So is there a way to do this?
Let's go back to basics:
Cloud Run instances start based on a revision/image.
If you have the above use case, where, say, you have 5 instances running and you suddenly need to restart them because restarting the instances resolves your problem (such as clearing/rebuilding the cache), what you need to do is:
Trigger a change in the service/config, so that a new revision gets created.
This will automatically replace the running revision, stopping and relaunching all of your instances on the fly.
You have a couple of options here, choose which is suitable for you:
If you have your services defined as YAML files, the easiest is to run the replace service command:
gcloud beta run services replace myservice.yaml
Otherwise, add an environment variable, like a date that you increase; this will yield a new revision (a change in env vars means a new config, and a new config means a new revision):
gcloud run services update SERVICE --update-env-vars KEY1=VALUE1,KEY2=VALUE2
As these operations are executed, you will see a new revision created, and your active instances will be replaced on their next request with fresh new instances that will build the new cache.
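For example, a hedged one-liner for the environment-variable approach (the service and variable names are illustrative):

gcloud run services update my-service --update-env-vars CACHE_BUST=$(date +%s)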
You can't directly reach all the active instances; that's the magic (and the tradeoff) of serverless: you don't really know what is running! If you implement a cache on Cloud Run, you need a way to invalidate it.
Either based on duration: when it expires, refresh it (a sketch follows below).
Or by explicit invalidation, which you can't do on Cloud Run.
The other way to see this use case is that you have a cache shared between all your instances, and thus you need a shared cache, something like Memorystore. A single Cloud Run instance can invalidate and recreate it, and all the other instances will use it.
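For the duration-based option, a minimal Go sketch of a lazily refreshed TTL cache (names and structure are illustrative, not a specific library):

package main

import (
    "fmt"
    "sync"
    "time"
)

// ttlCache refreshes its value lazily once it is older than ttl.
type ttlCache struct {
    mu        sync.Mutex
    value     string
    fetchedAt time.Time
    ttl       time.Duration
    fetch     func() string // reloads the value from the source of truth
}

func (c *ttlCache) get() string {
    c.mu.Lock()
    defer c.mu.Unlock()
    if time.Since(c.fetchedAt) > c.ttl {
        c.value = c.fetch()
        c.fetchedAt = time.Now()
    }
    return c.value
}

func main() {
    c := &ttlCache{ttl: time.Minute, fetch: func() string { return "fresh data" }}
    fmt.Println(c.get())
}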

CF set-env without restage

Let's say I would have one app with two instances.
I would change an environment variable (cf set-env) and not perform a cf restage.
Eventually one of the two instances would crash and restart. Would it take the new environment variable or the old?
In general if an instance crashes (say the app runs out of memory) and is restarted by Diego (the runtime that's actually running the container instances), the restarted instance will still have the environment variables it was originally "desired" (created) with.
If you explicitly cf restart or cf stop && cf start the app, it will pick up the new environment variables without needing to be restaged.
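For example (the app and variable names are hypothetical):

cf set-env my-app FEATURE_FLAG on   # only updates the desired state in Cloud Controller
cf restart my-app                   # restarted instances now see FEATURE_FLAG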
As user152468 said above, if the environment variables are used during the staging process you will need to cf restage the app for them to functionally take effect.
Edge Case Scenario
If the Diego runtime goes away/loses data for some catastrophic reason, the Cloud Controller will re-sync it and recreate the apps that are meant to be running. In this event the behavior is similar to a cf restart and the app will pick up the new environment variables. This is definitely uncommon, but would also count as a "crash" scenario.
EDIT:
After reading tcdowney's answer below, I tried it out. And tcdowney is right. When the app is restarted due to a crash, it will not pick up the new environment variable, so both of your app instances will share the same environment.
In contrast when you do a cf restart my-app then it will pick it up. Sorry for the confusion!
========================================================
It would take the new value of the environment variable. See here: https://docs.run.pivotal.io/devguide/deploy-apps/start-restart-restage.html
You will only need to restage if the variable is relevant to the buildpack.

how to auto update AWS windows EC2 instances when updates become available

I am working with AWS EC2 Windows instances, and my goal is to associate them with a maintenance window or a patch baseline (I'm not sure which one) to schedule an automation so that, when updates for an instance become available, it automatically updates itself. I have created a maintenance window for the instances, but I think my issue is figuring out how to link up a system that checks for updates and runs them when they become available.
What you're looking for is the Patch Manager feature of the EC2 Systems Manager service: http://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-patch.html.
There is also a handy getting started blog post available here: https://aws.amazon.com/blogs/mt/getting-started-with-patch-manager-and-amazon-ec2-systems-manager/
Creating a Maintenance Window is the right first step; this will let you control when you want to patch your instances, which instances you want to patch, and how you want them to be patched.
To define which instances you want to patch, the easiest way is to tag your instances (e.g. create a tag with the key 'Type' and value 'Patching'), but if you have a reasonably small number of instances and don't launch new instances on a regular basis you can also add them individually, by instance id to the Maintenance Window as a target. If you regularly launch new instances (either manually or as part of an Auto Scaling Group), tagging is convenient as those instances will be picked up automatically for patching.
Once you've added your instances as targets to your maintenance window, the next step is to add a task to the maintenance window. Specifically you want to add the Run Command task 'AWS-RunPatchBaseline' and run that for the target you created above (making sure to set Operation to 'Install').
This completes the minimum steps needed to patch all of your instances whenever the maintenance window runs. Every time the maintenance window runs, the AWS-RunPatchBaseline command will be sent to your instances and all approved patches will be installed and patch compliance reported.
If you want more control over exactly which patches are approved you can also create a custom patch baseline and define specific rules controlling which patches to approve when. If you choose to do so (if not, the default patch baseline is used), you'll also want to set the 'Patch Group' tag on your instances to define which patch baseline to use for which instance. That's described in more detail in the documentation.
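A hedged AWS CLI sketch of that wiring (the window and target IDs, tag names, schedule, and parameters file are illustrative; the console flow described above works just as well):

# Create a weekly maintenance window
aws ssm create-maintenance-window --name "WeeklyPatching" \
    --schedule "cron(0 2 ? * SUN *)" --duration 4 --cutoff 1 --allow-unassociated-targets

# Register instances by tag as the window's target
aws ssm register-target-with-maintenance-window --window-id mw-0123456789abcdef0 \
    --resource-type INSTANCE --targets "Key=tag:Type,Values=Patching"

# Register the AWS-RunPatchBaseline Run Command task against that target;
# the referenced JSON file sets the Operation parameter to Install
aws ssm register-task-with-maintenance-window --window-id mw-0123456789abcdef0 \
    --targets "Key=WindowTargetIds,Values=<window-target-id>" \
    --task-arn AWS-RunPatchBaseline --task-type RUN_COMMAND \
    --max-concurrency 2 --max-errors 1 \
    --task-invocation-parameters file://run-patch-baseline-params.json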
Hope this helps, feel free to ping me otherwise.
/Mats

issues with elastic beanstalk leader election

We have a rails app that has been working fine for months. Today we discovered some inconsistencies with leader election. Primarily:
su - "leader_only bundle exec rake db:migrate" webapp
After many hours of trial and error (and dozens of deployments) none of the instances in our dev application run this migration. /usr/bin/leader_only looks for an environment variable that is never set on any instance (the dev app has only one instance).
Setting the application deployment to 1 instance at a time and providing the value that /usr/bin/leader_only expects as an env var works, but not the way it has been working and should work. (Now every instance considers itself the leader, so each will fruitlessly run db:migrate, and since deployments go 1 instance at a time, having many instances will slow us down.)
We thought maybe it was due to some issues with the code and/or app, so we rebuilt it. No change.
I even cloned our test application's RDS server and created a new application from a saved configuration, deployed a new git hash, and it never ran db:migrate either. It attempts to, and shows the leader_only line, but the migration never runs. That rules out code, configuration, and artifacts.
Also, for what it's worth, it never says it is skipping migrations due to RAILS_SKIP_MIGRATIONS, which is set to false. This means it is in fact trying to run db:migrate but isn't, because the instance is never designated as the leader.
We have been in talk with the AWS support teams. It seems as though EB leader election is very fragile.
Per the tech:
Also, as explained before (the leader is the first instance in an auto-scaling group, and if it is removed we lose the leader; even using leader_only: true in container_commands, db:migrate doesn't work).
What happened is that we lost all instances. The leader is elected once, and is passed through instance rotation. If you do not lose all instances, everything is fine.
I did not mention a detail. We have many non-production environments, and through Elastic Beanstalk autoscaling settings we use timed scaling to set our instance count to 0 at night and back up to the expected 1-2 instances during the day. We do this for our dev, test, and UAT environments to make sure we don't run at full speed 24/7. Because of this, we lost the leader and never got it back.
Per the follow-up from the tech:
We have a feature request in place to overcome the issue of losing the leader when the very first instance is deleted.
"Elastic Beanstalk uses leader election to determine which instance in
your worker environment queues the periodic task. Each instance
attempts to become leader by writing to a DynamoDB table. The first
instance that succeeds is the leader, and must continue to write to
the table to maintain leader status. If the leader goes out of
service, another instance quickly takes its place."
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html#worker-periodictasks
In Elastic Beanstalk, you can run a command on a single "leader" instance. Just create a .ebextensions file that contains container_commands and then deploy it. Make sure you set the leader_only value to true.
For example:
.ebextensions/00_db_migration.config
container_commands:
  00_db_migrate:
    command: "rake db:migrate"
    leader_only: true
The working directory of this command will be your new application.
The leader-instance environment variable is set by the Elastic Beanstalk agent at deploy/update time. It is not exported to a normal SSH shell.