I have just done a clean install of CF8 on a Windows 2000 machine. I have a scheduled task I need to run every 15 minutes on this machine, and the machine does little else.
The task is set up as normal through CF admin, but for some reason, when the task takes about 5 minutes to run it will complete fine (I can see this from debug output and from cfstat) but the scheduler will not reschedule the task.
The scheduling log shows that the task started to execute, but not entry that it was rescheduled. Eg:
[ProcessRecords] Executing at Wed May 20 10:30:00 BST 2009
I have been over my server timeouts. I have NO timeout in CF admin and this particular script has a <cfsetting requesttimeout="43200" /> tag set. There are no exceptions in the console logging. The last bit of console logging is the very last debug statement in my .cfm template.
I do notice that task that run in a shorter time, say for example under a minute, will reschedule as normal.
Has anyone come across a problem like this before?
I'm baffled. Any and all replies are appreciated!
Cheers,
Ciaran
not for nothing, but i've never seen anything like this with cf8. are you sure that you have the latest hotfix and jvm installed? this might have been something in cf8 that was fixed in 8.01.
hotfix 2 for cf8.01
list of all hotfixes and updates for cf8.01
hotfix 3 for cf8
list of all hotfixes and updates for cf8
latest jvm
upgrade instruction for jvm
If you suspect that it's an uncaught exception causing the issue, then might I suggest logging portions of the process. Case in point, I had a similar problem with a scheduled task where it would just bottom out for no reason (never had the reschedule problem though). What I ended up doing to diagnose the problem was use cflog to write out portions of the process as they completed. This particular task too about 4 minutes to complete but ran through about 200 portions (it was a mass emailer for a bunch of clients).
I logged the when the portion started and completed along with how log it took. By doing so, i could see what portion would trip up the whole process and knew where to focus my attention.
Related
I have small project with couple of tasks to run several times a day.
The project is based on Django 2.1, having celery 4.2.1 and django-celery-beat 1.3.0. And also have rabbitmq installed.
Each task is inside it's projects application. Runs, works, gives some result.
The problem is - on virtual server, leased from some provider, if I set any task to run periodically (each hour, or two)- it starts running immidiately, without end and, as i suppose in some kind of parallel threads, wish mesh each other.
Command rabbintmqctl list_queues name messages_unacknowldged always shows 8 in queue celery. Purging the queue celery does not give any changes. Restarting service - too.
But setting tasks schedule to run in exact time works good. Well, almost good. Two tasks have schedule to run in the beginning of different hours (even and odd). But both run in about 30 minutes after hour beginning, of the same (odd) hour. At least tasks don't run more times in a day than set in schedule. But it is still something wrong.
As a newbie with rabbitmq and celery don't know where to look for solution. Official celery docs didn't help me. May be was not looking in right place. Any help help or advice would be good. Thanks.
It seems this is bug of django-celery-beat - https://github.com/celery/celery/issues/4041.
If anyone have already made any solution for this - please inform.
I have scheduled a C# console application in Task Scheduler of Windows 2012 R2. Application will run when executed it manually or Right click on scheduled task and click on Run, but it is failed when triggered by Task Scheduler with below error.
The operator or administrator has refused the request(0x800710E0)
I have followed below steps also after Google search
Selected "Run whether user logged in or not"
Unchecked "Start the task only if the computer is on AC power"
In my case, the error message "The operator or administrator has refused the request" meant that a previous instance of the task has still been running and the task was configured to not start a new instance if it's already running (the default configuration), so the Task Scheduler refused to start a new instance when the task was triggered.
You can find that option in a select box on the task's Settings tab, under the caption "If the task is already running, then the following rule applies". The default value is "Do not start a new instance".
But that error message is pretty confusing. From the other answers, you may see that it may mean many completely distinct errors. As is usual in Microsoft's products.
Tip
It's helpful to check the History tab of a task. That's where I have found out what's actually going on. There was an event "Launch request ignored, instance already running".
In my case, I had to redo the permissions on the task. Somehow it had lost the domain portion of the username. Instead of `DOMAIN\joeuser' it was just 'joeuser'. After a reset, it worked correctly as it had for the previous year.
In my case as per having a job setup with Task Scheduler as written about in the "Prevent a Task Scheduler Task from Executing on Setting Updates", I had a job setup to run every "X" minutes for a period of indefinitely.
Upon seeing the dreaded "The operator or administrator has refused the request" for the Last Run Result, I looked over the History tab and see detail indicating that is "missed its schedule".
The Solution
From the Settings tab of the job properties, I simply checked the option "Run task as soon as possible after a scheduled start is missed", and problem resolved; although, I did have to type in the credential again as well.
Note: This started occurring once a server was moved from a redundant backup server once hardware repair was completed back to the original hardware. The OS was Server 2012 R2 and the OS was moved to other hardware while repair was done on the production server but I didn't notice this there—maybe an oversight there though—not sure.
I know that #Sushmit-Patil found a solution, but I wanted to add a solution to my similar problem:
It turns out a prior process never exited (it was hanging around in memory because of a defect I had in my code). By default, Windows Task Scheduler won't run the process again if it's already running.
In addition to fixing the defect, in Task Scheduler, under the Settings tab, I set If the task is already running, then the following rule applies: to Run a new instance in parallel
1
Error occurred due to folder permission, I was creating CSV from my application, which was required folder permission to be granted. After giving Full Control to the folder error got resolved.
For me, the solution was to check Run with highest privileges in the properties.
In my case my task launches a PowerShell script--and it produced the "The operator or administrator has refused the request (0x800710E0)" error message as seen in the Task Scheduler's task-entry grid. My user name was correct, but when I dropped to a command prompt and simulated the task by running the PowerShell against my .ps1 file, I saw an Avast prompt that flagged my script as suspicious and wasn't allowing it to run. I created an Avast exception and now the task runs without any issue.
After turning on history I also had the error "Missed task start rejected: Task Scheduler did not launch task as it missed its schedule." but I didn't want the task to start when I woke up the computer, I wanted to figure out why the computer didn't wake up.
This answer helped me out -- by default Windows was waking for "Important Wake Timers Only" (system updates, but not my scheduled task).
In the setting Power Options > Edit Plan Settings > Change advanced power settings > Sleep > Allow wake timer change the option to "Enabled" and then your computer will wake up to run the task.
You can also do this from "settings". Probably earlier instance was already running and launching a new instance failed.
In my case, the error message "The operator or administrator has refused the request" appeared because the computer was in stand-by at the scheduled time (and the options "Wake the computer to run this task" and "Run task as soon as possible after a scheduled start was missed" were unchecked).
I had previously chosen "Enable All Tasks History" and a more useful error message appeared in the History tab: "Missed task start rejected: Task Scheduler did not launch task as it missed its schedule. Consider using the configuration option to start the task when available, if schedule is missed."
I have found what I believe to be a bizarre bug in Windows Server 2016 scheduler and maybe other Windows Server versions that produces the OP's error (and a workaround):
Here are the conditions:
You're using the "Monthly" option trigger in your task (I currently have all months selected and just a couple days chosen, e.g. 1st and 15th)
You have the "Synchronize across time zones" selected.
This was originally an issue I found back in November 2020 when my tasks were running twice all of a sudden after the DST time change (and this was a widely reported bug, but not an obvious solution). I never would have known, except that users started receiving duplicate emails from one of my tasks. In the history you would simply see the task running twice at what appeared to be exactly the same time. It worked fine before the time change. I forget all the troubleshooting I did then, but my end theory was that it was somehow confusing the time after the time change. The work around was to set the option "Synchronize across time zones" and all seemed well...
Fast forward to March when the DST time just changed back again and now I get every time the tasks with the Monthly option runs:
The operator or administrator has refused the request
The History tab on the task is also blank. If you change options and save, the History tab starts logging again and then sometimes stops if the task errors again. Weird.
One work around is to simply turn off the "Synchronize across time zones" option (tested). However, I don't recommend that option as I assume you'll have the duplicate running task issue again when the DST time changes again in November.
The one time I got an error to show in the History tab it stated:
Task Scheduler did not launch task "\EmailCampaign" as it missed its
schedule. Consider using the configuration option to start the task
when available, if schedule is missed.
Therefore, I went and set that option to start the task if the schedule is missed and all seems well. I figured I'd see the original error and then subsequently the task running, but no error any more either. It all just works.
I know this solution was reported above, but that's because most people's computers were asleep or something to that effect. My issue is on a production internet facing server that doesn't go to sleep, hibernate or anything related and only happens with specific conditions related to the Monthly trigger option. All my others tens of scheduled tasks work flawless.
I wrote a Powershell script to do a task. I was getting this error and landed here (as well as other lower ranked search results). The task would run manually and the first time it was triggered, but not on repeat even though I had it set up to end the task if it took longer than a minute.
My problem was caused by not providing an exit code in my powershell script. Task scheduler simply did not know the task had finished and would consider it still running. I could have simply allowed the next instance of the task to be started if the previous was not finished, but using the exit code is the 'right way'.
So I simply added a new line on the end of my PS1 --
exit
This topic is old but I had the same problem on windows server 2016.
My task executes a BAT script that zip a folder and upload on an external backup.
The task never ended because there was a "pause" at the end of my script. And my task was configured with "Dot not start a new instance" settings.
I solved my problem by removing the "pause". I don't know if it will be useful..
I am trying to use Amazon Elastic Beanstalk to run a very long numerical simulation - up to 20 hours. The code works beautifully when I tell it to do a short, 20 second simulation. However, when running a longer one, I get the error "The following instances have not responded in the allowed command timeout time (they might still finish eventually on their own)".
After browsing the web, it seems to me that the issue is that Elastic Beanstalk allows worker processes to run for 30 minutes at most, and then they time out because the instance has not responded (i.e. finished the simulation). The solution some have proposed is to send a message every 30 seconds or so that "pings" Elastic Beanstalk, letting it know that the simulation is going well so it doesn't time out, which would let me run a long worker process. So I have a few questions:
Is this the correct approach?
If so, what code or configuration would I add to the project to make it stop terminating early?
If not, how can I smoothly run a 12+ hour simulation on AWS or more generally, the cloud?
Add on information
Thank you for the feedback, Rohit. To give some more information, I'm using Python with Flask.
• I am indeed using an Elastic Beanstalk worker tier with SQS queues
• In my code, I'm running a simulation of variable length - from as short as 20 seconds to as long as 20 hours. 99% of the work that Elastic Beanstalk does is running the simulation. The other 1% involves saving results, sending emails, etc.
• The simulation itself involves using generating many random numbers and working with objects that I defined. I use numpy heavily here.
Let me know if I can provide any more information. I really appreciate the help :)
After talking to a friend who's more in the know about this stuff than me, I solved the problem. It's a little sketchy, but got the job done. For future reference, here is an outline of what I did:
1) Wrote a main script that used Amazon's boto library to connect to my SQS queue. Wrote an infinite while loop to poll the queue every 60 seconds. When there's a message on the queue, run a simulation and then continue through with the loop
2) Borrowed a beautiful /etc/init.d/ template to run my script as a daemon (http://blog.scphillips.com/2013/07/getting-a-python-script-to-run-in-the-background-as-a-service-on-boot/)
3) Made my main script and the script in (2) executable
4) Set up a cron job to make sure the script would start back up if it failed.
Once again, thank you Rohit for taking the time to help me out. I'm glad I still got to use Amazon even though Elastic Beanstalk wasn't the right tool for the job
From your question it seems you are running into launches timing out because some commands during launch that run on your instance take more than 30 minutes.
As explained here, you can adjust the Timeout option in the aws:elasticbeanstalk:command namespace. This can have values between 1 and 1800. This means if your commands finish within 30 minutes you won't see this error. The commands might eventually finish as the error message says but since Elastic Beanstalk has not received a response within the specified period it does not know what is going on your instance.
It would be helpful if you could add more details about your usecase. What commands you are running during startup? Apparently you are using ebextensions to launch commands which take a long time. Is it possible to run those commands in the background or do you need these commands to run during server startup?
If you are running a Tomcat web app you could also use something like servlet init method to run app bootstrapping code. This code can take however long it needs without giving you this error message.
Unfortunately, there is no way to 'process a message' from an SQS queue for more than 12 hours (see the description of ChangeVisibilityTimeout).
With that being the case, this approach doesn't fit your application well. I have ran into the same problem.
The correct way to do this: I don't know. However, I would suggest an alternate approach where you grab a message off of your queue, spin off a thread or process to run your long running simulation, and then delete the message (signaling successful processing). In this approach, be careful of spinning off too many threads on one machine and also be wary of machines shutting down before the simulation has ended, because the queue message has already been deleted.
Final note: your question is excellently worded and sufficiently detailed :)
For those looking to run jobs shorter than 10 hours, it needs to be mentioned that the current inactivity timeout limit is 36000 seconds, so exactly 10 hours and not anymore 30 minutes, like mentioned in posts all over the web (which led me to think a workaround like described above is needed).
Check out the docs: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
A very nice write-up can be found here: https://dev.to/rizasaputra/understanding-aws-elastic-beanstalk-worker-timeout-42hi
What does "Pending Restart" mean? I have stopped and restarted my WebJob numerous times and that doesn't seem to fix it. Does it mean I have to restart my website? What caused my job to get in this state in the first place? Is there any way I can prevent this from happening in the future?
Usually, it means that the job fails to start (an exception?). Look in the jobs dashboard for logs.
Also, make sure that if the job is continuous, you actually have an infinite loop that keeps the process alive.
To add to Victor's answer, the continuous WebJob states are:
Initializing - The site was just started and the WebJob is doing it's initialization process.
Starting - The WebJob is starting up the process/script.
Running - The WebJob's process is running.
PendingRestart - The WebJob's process exited (for any good or bad reason) in less than 2 minutes since it started, for a continuous WebJob it's considered that something was probably not right with it (some exception during start-up probably as mentioned by Victor), at this point the system is waiting for 60 seconds before it'll restart the WebJob process (hence the name "pending restart").
Stopped - The WebJob was stopped (usually from the Azure portal) and is currently not running and will not be running until it is started again, best way to see this is as disabled.
Also, take a look at the webjob log, it should hold cue to what's been happening.
if the JOB is set to run continuously, once the process exits (say you are polling a queue and it's empty) the job shuts down and status changes to "pending restart". Azure Scheduler will typically restart the process in 60 seconds.
Try changing the target framework to .NET 4.5. This same issue was fixed for me when I changed the target framework from 4.6.1 to 4.5.
Had the same problem, found out that i need to keep my webjob alive, so I put a continous loop to keep it alive.
It means that application is failing after start. Check the App Service Application setting might have some problem. .. In my case i am passing date as a configuration and i entered wrong date like 20160431
This is just because Webjobs is failing or giving exception.
Make sure the Webjobs is continuous and you can check that in the log where it failing and can make the changes.
Process went down, waiting for 60 seconds
Status changed to Pending Restart
In my case, we got the deployment package prepared through the Visual Studio folder publish and deployed along with WebApp. Package created through 'Folder publish' lacked 'run.cmd' file in which command to invoke the console application (.exe) is available; this file is automatically created when we directly publish to Azure WebJob from Visual Studio.
After manually adding this to a package folder, the issue got fixed.
I've created a command in the Sitecore Editor that automatically builds out up to 25 items at a time. The problem that I'm experiencing is that the operation just "hangs" and does not complete. I don't think it's an error because I've added error handling and logging.
I'm getting the following error message "The operation could not be completed. Your session may have been lost due to a time-out or a server failure. Try again."
How can I increase the "time-out" duration (if this is a setting somewhere) - or is there another solution to this problem?
Long running operations will time out eventually depending on your IIS settings, usually after 20 mins. Instead, you should run your commands as a scheduled task, as they run in the background, with no waiting for the IIS request.
However, it seemes strange that inserting 25 items is such a long operation that the browser times out. You might have another issue in your code.