I have a super small C# .NET Framework web application that I run on AWS Beanstalk. It works fine, except for one peculiar thing:
AWS seems to restart the service with unregular interval. I can see this as I can see log prints that only happens on startup.
In this case I am using ServiceStack AppHostBase, which has a Configure method that should be executed only once, on startup. In this method, a log print is done, and this gets executed between every 15 min to 2 hours approximately, indicating that the service is restarted at those times:
No update of the build has been done at those times seen above.
The Beanstalk configuration looks like this:
I would like to know why it restarts and what I can do to stop it.
Related
I am fairly new to AWS and would like your suggestions. The problem I would like to solve is that I want to automate the process. I have this ec2 image running ubuntu and I want to call this executable "executable_hello_world_repeat" inside the image which prints "Hello World" every second. and when calling the executable I want to add input parameters such as "executable_hello_world_repeat -n10" this would print "hello world" 10 times.
Manually I can do the following:
go to AWS management console and choose the ec2 image to start
check if the instance is running successfully
from the terminal call "executable_hello_world_repeat -n10"
it prints the "Hello World"
I want to write a program to do them all programatically. Eventually I will have a web page in React/JS and automate this process.
Thanks for reading.
When an Amazon EC2 instance is first launched, a User Data script can be provided, which is automatically executed as the root user towards the end of the boot process. You can use this script to install software, configure settings, start process, etc.
Please note that this script only runs on the first boot, because the software does not need to be installed on subsequent boots.
If you want a script to run on every boot, put it in the /var/lib/cloud/scripts/per-boot/ directory.
If you later want to trigger a script to run, then you will need some mechanism that receives this request and runs the script. A few ways you could do this are:
Run a web server on the instance and the request comes via an HTTP / REST request, or
Trigger the AWS Systems Manager Run Command that will cause a script to be run on the instance, or even multiple instances, or
Have a program or script running on the instance that is continuously polling an Amazon SQS queue. When a message is received from the queue, trigger a program/script to process the message. This is known as a "Worker" that pulls from the Queue
The EC2 instance is basically just a normal Linux instance, so you'll need to somehow get something to trigger on the instance when desired.
We built an app that heavily rely on google cloud task, but as our user base is increasing we are pushing 100s of cloud task in a queue. While 99%+ of task gets executed correctly, there are few that just disappear.
It's a very strange issue. Having no logs adds salt to injury. I added custom logs in my application to print task name.
My application A --> create cloud tasks --> calls another application B
A creates CT successfully but B receives 99%+ tasks only, not 100%.
Example this ask:
locations/asia-south1/queues/tpl-pending/tasks/88947425534926389671
I have a AWS pipeline, which:
1) first stage, get template.yaml and build a ec2 windows instance via script
note when this machine boots up, via user data it starts a script to downloads requirements, git etc, code, setups iis and various other stuff.
so this happens once the cloudformation part has completed, and takes about another 5 mins
2) i then want to run external tests on this machine - maybe using blazemeter, as the second part of the pipeline
the problem is that between stage 1 and 2 i need to wait for the website to work on the box, so i need to wait at least 5 mins. i could add a manual approval stage, but this seams cumbersome.
does anyone have a way to add this timed wait? or a pipeline process to check the site is up?
I am trying to use Amazon Elastic Beanstalk to run a very long numerical simulation - up to 20 hours. The code works beautifully when I tell it to do a short, 20 second simulation. However, when running a longer one, I get the error "The following instances have not responded in the allowed command timeout time (they might still finish eventually on their own)".
After browsing the web, it seems to me that the issue is that Elastic Beanstalk allows worker processes to run for 30 minutes at most, and then they time out because the instance has not responded (i.e. finished the simulation). The solution some have proposed is to send a message every 30 seconds or so that "pings" Elastic Beanstalk, letting it know that the simulation is going well so it doesn't time out, which would let me run a long worker process. So I have a few questions:
Is this the correct approach?
If so, what code or configuration would I add to the project to make it stop terminating early?
If not, how can I smoothly run a 12+ hour simulation on AWS or more generally, the cloud?
Add on information
Thank you for the feedback, Rohit. To give some more information, I'm using Python with Flask.
• I am indeed using an Elastic Beanstalk worker tier with SQS queues
• In my code, I'm running a simulation of variable length - from as short as 20 seconds to as long as 20 hours. 99% of the work that Elastic Beanstalk does is running the simulation. The other 1% involves saving results, sending emails, etc.
• The simulation itself involves using generating many random numbers and working with objects that I defined. I use numpy heavily here.
Let me know if I can provide any more information. I really appreciate the help :)
After talking to a friend who's more in the know about this stuff than me, I solved the problem. It's a little sketchy, but got the job done. For future reference, here is an outline of what I did:
1) Wrote a main script that used Amazon's boto library to connect to my SQS queue. Wrote an infinite while loop to poll the queue every 60 seconds. When there's a message on the queue, run a simulation and then continue through with the loop
2) Borrowed a beautiful /etc/init.d/ template to run my script as a daemon (http://blog.scphillips.com/2013/07/getting-a-python-script-to-run-in-the-background-as-a-service-on-boot/)
3) Made my main script and the script in (2) executable
4) Set up a cron job to make sure the script would start back up if it failed.
Once again, thank you Rohit for taking the time to help me out. I'm glad I still got to use Amazon even though Elastic Beanstalk wasn't the right tool for the job
From your question it seems you are running into launches timing out because some commands during launch that run on your instance take more than 30 minutes.
As explained here, you can adjust the Timeout option in the aws:elasticbeanstalk:command namespace. This can have values between 1 and 1800. This means if your commands finish within 30 minutes you won't see this error. The commands might eventually finish as the error message says but since Elastic Beanstalk has not received a response within the specified period it does not know what is going on your instance.
It would be helpful if you could add more details about your usecase. What commands you are running during startup? Apparently you are using ebextensions to launch commands which take a long time. Is it possible to run those commands in the background or do you need these commands to run during server startup?
If you are running a Tomcat web app you could also use something like servlet init method to run app bootstrapping code. This code can take however long it needs without giving you this error message.
Unfortunately, there is no way to 'process a message' from an SQS queue for more than 12 hours (see the description of ChangeVisibilityTimeout).
With that being the case, this approach doesn't fit your application well. I have ran into the same problem.
The correct way to do this: I don't know. However, I would suggest an alternate approach where you grab a message off of your queue, spin off a thread or process to run your long running simulation, and then delete the message (signaling successful processing). In this approach, be careful of spinning off too many threads on one machine and also be wary of machines shutting down before the simulation has ended, because the queue message has already been deleted.
Final note: your question is excellently worded and sufficiently detailed :)
For those looking to run jobs shorter than 10 hours, it needs to be mentioned that the current inactivity timeout limit is 36000 seconds, so exactly 10 hours and not anymore 30 minutes, like mentioned in posts all over the web (which led me to think a workaround like described above is needed).
Check out the docs: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
A very nice write-up can be found here: https://dev.to/rizasaputra/understanding-aws-elastic-beanstalk-worker-timeout-42hi
What does "Pending Restart" mean? I have stopped and restarted my WebJob numerous times and that doesn't seem to fix it. Does it mean I have to restart my website? What caused my job to get in this state in the first place? Is there any way I can prevent this from happening in the future?
Usually, it means that the job fails to start (an exception?). Look in the jobs dashboard for logs.
Also, make sure that if the job is continuous, you actually have an infinite loop that keeps the process alive.
To add to Victor's answer, the continuous WebJob states are:
Initializing - The site was just started and the WebJob is doing it's initialization process.
Starting - The WebJob is starting up the process/script.
Running - The WebJob's process is running.
PendingRestart - The WebJob's process exited (for any good or bad reason) in less than 2 minutes since it started, for a continuous WebJob it's considered that something was probably not right with it (some exception during start-up probably as mentioned by Victor), at this point the system is waiting for 60 seconds before it'll restart the WebJob process (hence the name "pending restart").
Stopped - The WebJob was stopped (usually from the Azure portal) and is currently not running and will not be running until it is started again, best way to see this is as disabled.
Also, take a look at the webjob log, it should hold cue to what's been happening.
if the JOB is set to run continuously, once the process exits (say you are polling a queue and it's empty) the job shuts down and status changes to "pending restart". Azure Scheduler will typically restart the process in 60 seconds.
Try changing the target framework to .NET 4.5. This same issue was fixed for me when I changed the target framework from 4.6.1 to 4.5.
Had the same problem, found out that i need to keep my webjob alive, so I put a continous loop to keep it alive.
It means that application is failing after start. Check the App Service Application setting might have some problem. .. In my case i am passing date as a configuration and i entered wrong date like 20160431
This is just because Webjobs is failing or giving exception.
Make sure the Webjobs is continuous and you can check that in the log where it failing and can make the changes.
Process went down, waiting for 60 seconds
Status changed to Pending Restart
In my case, we got the deployment package prepared through the Visual Studio folder publish and deployed along with WebApp. Package created through 'Folder publish' lacked 'run.cmd' file in which command to invoke the console application (.exe) is available; this file is automatically created when we directly publish to Azure WebJob from Visual Studio.
After manually adding this to a package folder, the issue got fixed.