We've got a little Java scheduler running on AWS ECS. It does what cron used to do on our old monolith: it fires up (Fargate) tasks in Docker containers. We've got a task that runs every hour, and it's quite important to us. I want to know if it crashes or fails to run for any reason (e.g. the Java scheduler fails, or someone turns the task off).
I'm looking for a service that will alert me if it's not notified. I want to call the notification system every time the script runs successfully. Then if the alert system doesn't get the "OK" notification as expected, it shoots off an alert.
I figure this kind of service must exist, and I don't want to reinvent the wheel by building it myself. I guess my question is, what's it called? And where can I go to get that kind of thing? (We're using AWS, obviously, and we've got a PagerDuty account.)
We use this approach for these types of problems. First, the task writes a timestamp to a file in S3 or EFS. This file is the external evidence that the task ran to completion. Then you need an HTTP-based service that reads that file and checks whether the timestamp is valid, i.e. has been updated in the last hour. This could be a simple PHP or Node.js script, exposed to the public web at e.g. https://example.com/heartbeat.php. The script returns an HTTP response code of 200 if the timestamp file is present and valid, or 500 if not. Then we use StatusCake to monitor the URL and notify us via its PagerDuty integration if there is an incident. We usually include a message in the response so a human can see the nature of the error.
This may seem tedious, but it is robust: a failure anywhere along the line produces a notification. StatusCake has a great free service level. This approach can be used to monitor any critical task in the same way. We've learned the hard way that critical cron-type tasks and processes can fail for any number of reasons, and you want to know before it becomes customer critical. 24x7x365 monitoring of these types of tasks is necessary, and helps us sleep better at night.
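The answer suggests PHP or Node.js for the heartbeat endpoint. As a rough Python sketch of the same idea, the check could look like the block below. The file path, the five-minute grace period, and all function names are assumptions made for illustration. Reading from S3 instead of a mounted EFS path would need a different `read_timestamp`.

```python
import time
from http.server import BaseHTTPRequestHandler

# Assumption: one hour between runs plus a 5-minute grace period.
MAX_AGE_SECONDS = 3600 + 300


def heartbeat_status(last_run, now, max_age=MAX_AGE_SECONDS):
    """Return (http_status, message) for a heartbeat timestamp.

    last_run is the epoch time the task wrote on completion, or None
    if the timestamp file is missing or unreadable.
    """
    if last_run is None:
        return 500, "FAIL: timestamp file is missing"
    age = now - last_run
    if age > max_age:
        return 500, f"FAIL: last successful run was {int(age)}s ago"
    return 200, f"OK: last successful run was {int(age)}s ago"


def read_timestamp(path):
    """Read an epoch timestamp from a local file; None if absent or invalid."""
    try:
        with open(path) as f:
            return float(f.read().strip())
    except (OSError, ValueError):
        return None


class HeartbeatHandler(BaseHTTPRequestHandler):
    # Hypothetical location of the timestamp file (e.g. an EFS mount).
    timestamp_path = "/mnt/efs/heartbeat.txt"

    def do_GET(self):
        status, message = heartbeat_status(
            read_timestamp(self.timestamp_path), time.time()
        )
        body = message.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        # The message lets a human see the nature of the error.
        self.wfile.write(body)
```

To try it locally, the handler can be served with `http.server.HTTPServer(("", 8080), HeartbeatHandler).serve_forever()`. The external monitor then only needs to alert on any non-200 response.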
Note: We always have a daily system test event that triggers a PagerDuty notification at 9am each day. For the truly paranoid, this assures that PagerDuty itself has not failed in some way, e.g. through misconfiguration. Our support team knows that if they don't get a test alert each day, there is a problem in the notification system itself. The tech on duty has to acknowledge the incident as per SOP. If they do not acknowledge it, it escalates to the next tier, and we know we have to have a talk about response times. It keeps people on their toes. This is the final piece to ensure you have a robust monitoring infrastructure.
Opsgenie has a heartbeat service, which is basically a watchdog timer. You can configure it to call you if you don't ping them within x minutes.
Unfortunately I would not recommend them. I have been using them for 4 years and they have changed their account system twice and left my paid account orphaned silently. I have to find a new vendor as soon as I have some free time.
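Whichever vendor you pick, the client side of a watchdog-style heartbeat is small. Here is a minimal sketch, assuming the service exposes a plain HTTP ping URL. The URL and the function names are hypothetical and do not belong to any particular vendor's API.

```python
import urllib.request

# Hypothetical ping URL; a real heartbeat service will give you its own.
HEARTBEAT_URL = "https://heartbeat.example.com/ping/hourly-task"


def http_ping(url=HEARTBEAT_URL, timeout=10):
    """Send the "OK" notification by requesting the ping URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def run_and_ping(task, ping=http_ping):
    """Run task(); call ping() only if it finishes without raising.

    Returns True when the heartbeat was sent, False when the task failed.
    """
    try:
        task()
    except Exception:
        # No ping here: the missing heartbeat is what triggers the alert.
        return False
    ping()
    return True
```

The important design choice is that the ping happens only after the task succeeds, so a crash, a disabled task, and a dead scheduler all look the same to the watchdog: silence.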
I have scheduled a C# console application in Task Scheduler on Windows Server 2012 R2. The application runs when I execute it manually, or when I right-click the scheduled task and click Run, but it fails with the error below when triggered by Task Scheduler.
The operator or administrator has refused the request(0x800710E0)
After a Google search, I have also tried the following steps:
Selected "Run whether user logged in or not"
Unchecked "Start the task only if the computer is on AC power"
In my case, the error message "The operator or administrator has refused the request" meant that a previous instance of the task was still running and the task was configured not to start a new instance if one is already running (the default configuration), so Task Scheduler refused to start a new instance when the task was triggered.
You can find that option in a select box on the task's Settings tab, under the caption "If the task is already running, then the following rule applies". The default value is "Do not start a new instance".
But that error message is pretty confusing. From the other answers, you may see that it may mean many completely distinct errors. As is usual in Microsoft's products.
Tip
It's helpful to check the History tab of a task. That's where I have found out what's actually going on. There was an event "Launch request ignored, instance already running".
In my case, I had to redo the permissions on the task. Somehow it had lost the domain portion of the username. Instead of 'DOMAIN\joeuser' it was just 'joeuser'. After a reset, it worked correctly as it had for the previous year.
In my case, I had a job set up in Task Scheduler as described in "Prevent a Task Scheduler Task from Executing on Setting Updates": it ran every "X" minutes, repeating indefinitely.
Upon seeing the dreaded "The operator or administrator has refused the request" as the Last Run Result, I looked over the History tab and saw a detail indicating that the task "missed its schedule".
The Solution
From the Settings tab of the job properties, I simply checked the option "Run task as soon as possible after a scheduled start is missed", and the problem was resolved, although I did have to type in the credentials again as well.
Note: This started occurring after the server was moved back to its original hardware from a redundant backup server, once a hardware repair was completed. The OS was Server 2012 R2, and it ran on the other hardware while the production server was being repaired. I didn't notice the problem there, though that may have been an oversight on my part.
I know that #Sushmit-Patil found a solution, but I wanted to add a solution to my similar problem:
It turns out a prior process never exited (it was hanging around in memory because of a defect I had in my code). By default, Windows Task Scheduler won't run the process again if it's already running.
In addition to fixing the defect, in Task Scheduler, under the Settings tab, I set "If the task is already running, then the following rule applies" to "Run a new instance in parallel".
The error occurred due to folder permissions. My application was creating a CSV file, which required permission to be granted on the folder. After giving Full Control on the folder, the error was resolved.
For me, the solution was to check "Run with highest privileges" in the properties.
In my case my task launches a PowerShell script--and it produced the "The operator or administrator has refused the request (0x800710E0)" error message as seen in the Task Scheduler's task-entry grid. My user name was correct, but when I dropped to a command prompt and simulated the task by running the PowerShell against my .ps1 file, I saw an Avast prompt that flagged my script as suspicious and wasn't allowing it to run. I created an Avast exception and now the task runs without any issue.
After turning on history I also had the error "Missed task start rejected: Task Scheduler did not launch task as it missed its schedule." but I didn't want the task to start when I woke up the computer, I wanted to figure out why the computer didn't wake up.
This answer helped me out -- by default Windows was waking for "Important Wake Timers Only" (system updates, but not my scheduled task).
Under Power Options > Edit Plan Settings > Change advanced power settings > Sleep > Allow wake timers, change the option to "Enable", and your computer will then wake up to run the task.
You can also do this from "Settings". Probably an earlier instance was already running, so launching a new instance failed.
In my case, the error message "The operator or administrator has refused the request" appeared because the computer was in stand-by at the scheduled time (and the options "Wake the computer to run this task" and "Run task as soon as possible after a scheduled start was missed" were unchecked).
I had previously chosen "Enable All Tasks History" and a more useful error message appeared in the History tab: "Missed task start rejected: Task Scheduler did not launch task as it missed its schedule. Consider using the configuration option to start the task when available, if schedule is missed."
I have found what I believe to be a bizarre bug in Windows Server 2016 scheduler and maybe other Windows Server versions that produces the OP's error (and a workaround):
Here are the conditions:
You're using the "Monthly" option trigger in your task (I currently have all months selected and just a couple days chosen, e.g. 1st and 15th)
You have the "Synchronize across time zones" selected.
This was originally an issue I found back in November 2020 when my tasks were running twice all of a sudden after the DST time change (and this was a widely reported bug, but not an obvious solution). I never would have known, except that users started receiving duplicate emails from one of my tasks. In the history you would simply see the task running twice at what appeared to be exactly the same time. It worked fine before the time change. I forget all the troubleshooting I did then, but my end theory was that it was somehow confusing the time after the time change. The work around was to set the option "Synchronize across time zones" and all seemed well...
Fast forward to March, when DST changed back again, and now I get the following every time a task with the Monthly trigger runs:
The operator or administrator has refused the request
The History tab on the task is also blank. If you change options and save, the History tab starts logging again and then sometimes stops if the task errors again. Weird.
One work around is to simply turn off the "Synchronize across time zones" option (tested). However, I don't recommend that option as I assume you'll have the duplicate running task issue again when the DST time changes again in November.
The one time I got an error to show in the History tab it stated:
Task Scheduler did not launch task "\EmailCampaign" as it missed its
schedule. Consider using the configuration option to start the task
when available, if schedule is missed.
Therefore, I went and set that option to start the task if the schedule is missed and all seems well. I figured I'd see the original error and then subsequently the task running, but no error any more either. It all just works.
I know this solution was reported above, but in most of those cases the computer was asleep or something to that effect. My issue is on a production, internet-facing server that doesn't sleep or hibernate, and it only happens under the specific conditions related to the Monthly trigger option. All my other scheduled tasks (tens of them) work flawlessly.
I wrote a PowerShell script to do a task. I was getting this error and landed here (as well as on other, lower-ranked search results). The task would run manually and the first time it was triggered, but not on repeat, even though I had it set up to end the task if it took longer than a minute.
My problem was caused by not providing an exit code in my PowerShell script. Task Scheduler simply did not know the task had finished and would consider it still running. I could have simply allowed the next instance of the task to start if the previous one had not finished, but using the exit code is the 'right way'.
So I simply added a new line on the end of my PS1 --
exit
This topic is old, but I had the same problem on Windows Server 2016.
My task executes a BAT script that zips a folder and uploads it to an external backup.
The task never ended because there was a "pause" at the end of my script, and my task was configured with the "Do not start a new instance" setting.
I solved my problem by removing the "pause". I don't know if it will be useful.
This is a pretty simple question. I am posting it because I couldn't find a satisfying answer. First, the background: I have a Jenkins job that builds and deploys a web application onto a server. The server takes some time to come up (on the order of 5 to 10 minutes). I would like to set up a job (or modify the existing one as required) to run the unit test cases that test the application. I am thinking of the following approaches. I would like you to validate them or suggest alternatives:
Have an Ant target that waits for a fixed time
Have a custom Ant target which pings the URL and checks for app availability
Thanks in advance for your help.
-Vadiraj.
Waiting for a fixed time has the problem that the time you choose is either too short (the build fails) or too long (a waste of build time). So I think it would be better to check whether the app is available.
I have done something similar for my Selenium tests. I had to wait until the Selenium Remote Server had started. I used the waitfor element. For detailed documentation, see here.
Here is a stripped down version of my ant-Target:
<parallel>
    <sequential>
        ... Start web application server ...
    </sequential>
    <sequential>
        <waitfor maxwait="10" maxwaitunit="minute">
            <socket server="localhost" port="8080" />
        </waitfor>
        <junit>
            ...
        </junit>
    </sequential>
</parallel>
If your server is available before the web app is deployed, you can try using the http condition instead of socket to check for an HTTP error code. The conditions are documented here.
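If you would rather do the availability check outside Ant (for example in a Jenkins build step), the same wait-until-available logic can be sketched in Python. This is only an illustration of the polling idea, not a replacement for waitfor. The function names, the 5-second interval, and the choice to treat 2xx/3xx as "up" are assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for(check, max_wait, interval=5.0, clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or max_wait seconds have elapsed.

    Returns True as soon as the check passes, False on timeout.
    """
    deadline = clock() + max_wait
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def url_is_up(url, timeout=5):
    """True if the URL answers with a 2xx or 3xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or HTTP error: not available yet.
        return False
```

A build step could then call something like `wait_for(lambda: url_is_up("http://localhost:8080/"), max_wait=600)` and fail the build if it returns False. Unlike a fixed sleep, this returns as soon as the app responds.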
I use Django-Celery + RabbitMQ to execute some async tasks. I defined a queue, 'sendmail', for the send-email task. Sending the mail is triggered by a specific task (which has its own queue). Now I've run into a problem: after that task finishes, the mail is sometimes sent at once and sometimes takes 5-20 minutes. I want to know what causes this.
Django-Celery packages the task name and parameters as a message to RabbitMQ when task.delay() is called.
I want to know when the message reaches RabbitMQ, but the web management tool only shows total message counts, not the details of each message, and in particular not the time the message arrived. The Django-Celery log only shows when the worker got the task from the broker and when it executed it. I want to know all the related timestamps so I can be sure which step consumes most of the time.
Django-Celery does (I believe) report task data on a per-task basis. When you sync your database, it creates a bunch of monitoring tables which are accessible via the admin. However, in order for tasks to be recorded in these tables, you need to run the celerycam program in the Django context (python ./manage.py celerycam). The celerycam program will take "snapshots" of your tasks every second or so (by default) and record information about them. Another useful tool for monitoring is the celerymon program (which also has to run in the Django context). This is a command-line ncurses program that reports real-time information about tasks as they occur. Finally, rabbitmqctl has a bunch of options that might help with monitoring.
This is a particularly useful page in the docs:
http://celery.github.com/celery/userguide/monitoring.html
Anyway, this is what I use to monitor my tasks when using celery.
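Separately from those tools, one low-tech way to see where the 5-20 minutes go is to stamp each message with the time it was sent and compare that with the time the worker starts it. The sketch below shows only the bookkeeping and does not touch Celery itself. The helper names and the `enqueued_at` key are hypothetical, and it assumes the producer and worker clocks are reasonably in sync.

```python
import time


def stamp(kwargs, now=None):
    """Return a copy of the task kwargs with the enqueue time added."""
    stamped = dict(kwargs)
    stamped["enqueued_at"] = time.time() if now is None else now
    return stamped


def queue_delay(kwargs, now=None):
    """Seconds between enqueueing and the worker starting the task."""
    started = time.time() if now is None else now
    return started - kwargs["enqueued_at"]
```

On the calling side this would be used as something like `send_mail_task.delay(**stamp({"to": address}))` (with `send_mail_task` standing in for your own task), and the first line of the task body would log `queue_delay(kwargs)`. A large delay points at the queue or at busy workers rather than at the mail sending itself.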
I have just done a clean install of CF8 on a Windows 2000 machine. I have a scheduled task I need to run every 15 minutes on this machine, and the machine does little else.
The task is set up as normal through CF admin, but for some reason, when the task takes about 5 minutes to run it will complete fine (I can see this from debug output and from cfstat) but the scheduler will not reschedule the task.
The scheduling log shows that the task started to execute, but there is no entry showing that it was rescheduled. E.g.:
[ProcessRecords] Executing at Wed May 20 10:30:00 BST 2009
I have been over my server timeouts. I have NO timeout in CF admin and this particular script has a <cfsetting requesttimeout="43200" /> tag set. There are no exceptions in the console logging. The last bit of console logging is the very last debug statement in my .cfm template.
I do notice that tasks that run in a shorter time, for example under a minute, will reschedule as normal.
Has anyone come across a problem like this before?
I'm baffled. Any and all replies are appreciated!
Cheers,
Ciaran
not for nothing, but i've never seen anything like this with cf8. are you sure that you have the latest hotfix and jvm installed? this might have been something in cf8 that was fixed in 8.01.
hotfix 2 for cf8.01
list of all hotfixes and updates for cf8.01
hotfix 3 for cf8
list of all hotfixes and updates for cf8
latest jvm
upgrade instruction for jvm
If you suspect that an uncaught exception is causing the issue, then might I suggest logging portions of the process. Case in point: I had a similar problem with a scheduled task that would just bottom out for no reason (I never had the reschedule problem, though). What I ended up doing to diagnose the problem was to use cflog to write out portions of the process as they completed. This particular task took about 4 minutes to complete but ran through about 200 portions (it was a mass emailer for a bunch of clients).
I logged when each portion started and completed, along with how long it took. By doing so, I could see which portion would trip up the whole process and knew where to focus my attention.
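The original was done with cflog in ColdFusion. As an illustration of the same idea in Python (all names here are assumptions), each portion is logged on start and on completion with its duration, and a failure is logged with the portion name before being re-raised:

```python
import logging
import time

log = logging.getLogger("scheduled_task")


def run_portions(portions, clock=time.monotonic):
    """Run each (name, func) portion in order, logging start, end and duration.

    Returns a list of (name, seconds) for the portions that completed.
    """
    timings = []
    for name, func in portions:
        log.info("portion %s started", name)
        start = clock()
        try:
            func()
        except Exception:
            # Record which portion tripped up the process, then re-raise.
            log.exception("portion %s failed after %.2fs", name, clock() - start)
            raise
        elapsed = clock() - start
        log.info("portion %s completed in %.2fs", name, elapsed)
        timings.append((name, elapsed))
    return timings
```

If the task dies silently, the last "started" line without a matching "completed" line shows where to look.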