Spring Integration Pop3MailReceiver stops polling silently without logging why

Problem
I have a very basic Spring Integration mail adapter configuration (the relevant part is below):
<int:channel id="emailChannel">
<int:interceptors>
<int:wire-tap channel="logger"/>
</int:interceptors>
</int:channel>
<mail:inbound-channel-adapter id="popChannel"
store-uri="pop3://user:password#domain.net/INBOX"
channel="emailChannel"
should-delete-messages="true"
auto-startup="true">
<int:poller max-messages-per-poll="1" fixed-rate="30000"/>
</mail:inbound-channel-adapter>
<int:logging-channel-adapter id="logger" level="DEBUG"/>
<int:service-activator input-channel="emailChannel" ref="mailResultsProcessor" method="onMessage" />
This works fine most of the time, and I can see the polling in the logs (it also hands mail to my mailResultsProcessor correctly when a message is there):
2013-08-13 08:19:29,748 [task-scheduler-3] DEBUG org.springframework.integration.mail.Pop3MailReceiver - opening folder [pop3://user:password#fomain.net/INBOX]
2013-08-13 08:19:29,796 [task-scheduler-3] INFO org.springframework.integration.mail.Pop3MailReceiver - attempting to receive mail from folder [INBOX]
2013-08-13 08:19:29,796 [task-scheduler-3] DEBUG org.springframework.integration.mail.Pop3MailReceiver - found 0 new messages
2013-08-13 08:19:29,796 [task-scheduler-3] DEBUG org.springframework.integration.mail.Pop3MailReceiver - Received 0 messages
2013-08-13 08:19:29,893 [task-scheduler-3] DEBUG org.springframework.integration.endpoint.SourcePollingChannelAdapter - Received no Message during the poll, returning 'false'
The problem is that the polling stops at some point during the day, with nothing in the logs to indicate why. The only way I can tell is that the debug output above no longer appears in the logs and emails build up in the mailbox.
Questions
Has anyone seen this before, and does anyone know how to resolve it?
Is there a change that I can make in my configuration to capture the issue into the log? I thought the logging channel adapter set to debug would have this covered.
I am using Spring Integration 2.2.3.RELEASE on Tomcat 7; log output goes to catalina.out by default. It is deployed on a standard AWS Tomcat 7 instance.

Most likely the poller thread is hung someplace upstream. With your configuration, the next poll won't happen until the current poll completes.
You can use jstack or VisualVM to get a thread dump to find out what the thread is doing.
Another possibility is poller thread starvation, which can happen if your application has a lot of other polled elements, depending on how they are configured. The default taskScheduler bean has only 10 threads.
You can add a task executor to the <poller/> so each poll is handed off to another thread, but be aware that this can result in concurrent polls if a polled task takes longer to execute than the polling interval.
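The difference between polling on the scheduler thread and handing each poll to a task executor can be sketched with a small timing model. This is a simplified illustration in Python, not Spring Integration code: poll_start_times and max_concurrent are hypothetical helpers, and the model assumes the executor always has a free thread.

```python
import math


def poll_start_times(durations, rate, use_executor):
    """Start time of each poll under a fixed-rate trigger (simplified model).

    durations: how long each poll runs (math.inf models a hung poll)
    rate: seconds between scheduled polls
    use_executor: True if each poll is handed off to another thread
    """
    starts = []
    previous_end = 0.0
    for i, duration in enumerate(durations):
        scheduled = i * rate
        if use_executor:
            # Assumption: the pool always has a free thread, so the poll starts on time.
            start = scheduled
        else:
            # Without an executor the next poll waits for the current one to finish.
            start = max(scheduled, previous_end)
        starts.append(start)
        previous_end = start + duration
    return starts


def max_concurrent(starts, durations):
    """Largest number of polls running at the same moment."""
    events = []
    for start, duration in zip(starts, durations):
        events.append((start, 1))
        events.append((start + duration, -1))
    # Tuples sort by time first; at equal times an end (-1) sorts before a start (+1).
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak
```

In this model a poll that never returns (duration math.inf) pushes every later poll out to infinity when no executor is used, which matches the "polling silently stops" symptom. With an executor the later polls still start on schedule, but a slow poll can overlap the next one.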

To resolve this problem specifically I used the configuration below:
<mail:inbound-channel-adapter id="popChannel"
store-uri="pop3://***/INBOX"
channel="emailChannel"
should-delete-messages="true"
auto-startup="true">
<int:poller max-messages-per-poll="5" fixed-rate="60000" task-executor="pool"/>
</mail:inbound-channel-adapter>
<task:executor id="pool" pool-size="10" keep-alive="50"/>
After moving to this approach we saw no further problems. As with any use of a pool, the advantage is that any threads that become a problem are cleaned up and recreated.

Related

Does Windows ever stop services when resuming from sleep?

I'm running on Windows 8.
Occasionally, when I resume from sleep, my service gets a stop request through the SCM (call to SvcCtrlHandler with SERVICE_CONTROL_STOP). I wasn't able to trace the source of this request. Can it possibly be sent by the OS itself, in some scenario?
My two main suspicions right now:
If handling the resume event (SERVICE_CONTROL_POWEREVENT of type PBT_APMRESUMEAUTOMATIC) takes too long, the OS might stop the service (the system log contains entries referring to this specific service: A timeout was reached (30000 milliseconds) while waiting for the [...] The service did not respond to the start or control request in a timely fashion)
The OS stops the service because it has been flagged as a problematic service (the system log contains entries referring to this specific service: service did not shut down properly after receiving a preshutdown control)

Automate Suspended orchestrations to be resumed automatically

We have a BizTalk application which sends XML files to external applications by using a web-service.
BizTalk calls the web-services method by passing XML file and destination application URL as parameters.
If the external applications are not able to receive the XML, or if no response comes back from the web service to BizTalk, the message gets suspended in BizTalk.
At present we handle this manually: we go to the BizTalk admin console and resume each suspended message.
Our clients want this process fully automated. They want a dashboard that lists the message details and has a button that resumes all suspended messages when clicked.
If you are doing this within an orchestration and catching the connection error, just add a delay shape configured to 5 hours. Or set a retry interval to 300 minutes and multiple retries on the send port if that makes sense. You can do this using the rule engine as well.
Why not implement an asynchronous pattern?
Have the orchestration send the file out via a send shape while initializing a correlation set.
Then add a listen shape with two branches:
- the receive (following the initialized correlation set)
- a delay shape set to 5 hours.
When you receive the message, your orchestration can handle it gracefully.
When you don't, the delay shape will kick in and you handle accordingly.
The benefit of this solution compared to 40Alpha's is that your orchestration will only 'wake up' from a dehydrated state if the timeout kicks in OR when the response is received. In 40Alpha's example, the orchestration would wake up many times, consuming extra resources.
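The shape of that pattern (wait for either the correlated response or the timeout, whichever comes first) can be sketched outside BizTalk. The following is only an analogy in Python, not BizTalk code: wait_for_response is a hypothetical helper, and the queue stands in for the correlation set that routes the response back to the waiting orchestration instance.

```python
import queue


def wait_for_response(responses, timeout_seconds):
    """Wait for a correlated response or give up after the timeout.

    Returns ("received", message) or ("timed_out", None).
    """
    try:
        # Blocks without busy-waiting until a message arrives or the timeout expires.
        message = responses.get(timeout=timeout_seconds)
    except queue.Empty:
        # Corresponds to the delay branch: no response arrived in time.
        return ("timed_out", None)
    # Corresponds to the receive branch.
    return ("received", message)
```

The caller then branches on the first element of the result, in the same way the orchestration continues down whichever listen branch fired.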
You may want to look at a product like BizTalk 360. It has that sort of monitoring and command functionality built in. I'm not sure it works with BizTalk 2006R2 though, but you should be thinking about moving off that platform anyway, as it is going out of Microsoft support.

What Does Azure WebJob "Pending Restart" Mean?

What does "Pending Restart" mean? I have stopped and restarted my WebJob numerous times and that doesn't seem to fix it. Does it mean I have to restart my website? What caused my job to get in this state in the first place? Is there any way I can prevent this from happening in the future?
Usually, it means that the job fails to start (an exception?). Look in the jobs dashboard for logs.
Also, make sure that if the job is continuous, you actually have an infinite loop that keeps the process alive.
To add to Victor's answer, the continuous WebJob states are:
Initializing - The site was just started and the WebJob is going through its initialization process.
Starting - The WebJob is starting up the process/script.
Running - The WebJob's process is running.
PendingRestart - The WebJob's process exited (for any reason, good or bad) less than 2 minutes after it started. For a continuous WebJob this is taken to mean that something was probably not right (most likely an exception during start-up, as Victor mentioned). At this point the system waits 60 seconds before restarting the WebJob process (hence the name "pending restart").
Stopped - The WebJob was stopped (usually from the Azure portal), is not currently running, and will not run until it is started again; it is best thought of as disabled.
Also, take a look at the WebJob log; it should hold a clue to what's been happening.
If the job is set to run continuously, once the process exits (say you are polling a queue and it's empty) the job shuts down and the status changes to "pending restart". Azure will typically restart the process after 60 seconds.
Try changing the target framework to .NET 4.5. This same issue was fixed for me when I changed the target framework from 4.6.1 to 4.5.
I had the same problem and found out that I needed to keep my WebJob alive, so I added a continuous loop to keep it running.
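A minimal sketch of such a keep-alive loop, written in Python for illustration (a real WebJob would more likely be a .NET console application). The names run_continuous, work, handle and stop are all hypothetical. The point is that an empty queue makes the loop wait and try again instead of letting the process exit.

```python
import queue
import threading


def run_continuous(work, handle, stop, idle_wait=1.0):
    """Process items forever; an empty queue must not end the process.

    work: queue.Queue of pending items
    handle: callable invoked for each item
    stop: threading.Event used to request an orderly shutdown
    idle_wait: seconds to wait for new work before checking stop again
    """
    while not stop.is_set():
        try:
            item = work.get(timeout=idle_wait)
        except queue.Empty:
            # Nothing to do yet: keep looping rather than returning.
            continue
        handle(item)
```

The stop event is only there so the loop can be shut down deliberately; without some exit condition like it, the sketch would run until the process is killed, which is the intended behavior for a continuous job.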
It means that the application is failing after it starts. Check the App Service application settings; one of them may be wrong. In my case I was passing a date as a configuration value and had entered an invalid date, 20160431.
This happens simply because the WebJob is failing or throwing an exception.
Make sure the WebJob is continuous; you can check in the log where it is failing and make the necessary changes.
Process went down, waiting for 60 seconds
Status changed to Pending Restart
In my case, the deployment package was prepared through Visual Studio's folder publish and deployed along with the web app. The package created through 'Folder publish' lacked the 'run.cmd' file that contains the command to invoke the console application (.exe); this file is created automatically when you publish directly to an Azure WebJob from Visual Studio.
After manually adding this to a package folder, the issue got fixed.

Correct way to register for pre-shutdown notification from C++

I'm writing a local service application in C++ and I can't find the correct way to register for a pre-shutdown notification (for OS versions later than Windows XP). I believe the SERVICE_CONTROL_PRESHUTDOWN notification was added in Vista, but when you call SetServiceStatus, do we need to specify:
dwServiceStatus.dwControlsAccepted = SERVICE_ACCEPT_PRESHUTDOWN;
or
dwServiceStatus.dwControlsAccepted = SERVICE_ACCEPT_SHUTDOWN | SERVICE_ACCEPT_PRESHUTDOWN;
You cannot accept both a shutdown and a preshutdown if your service is correctly coded. The documentation explicitly states this.
From http://msdn.microsoft.com/en-us/library/windows/desktop/ms683241(v=vs.85).aspx:
Referring to SERVICE_CONTROL_PRESHUTDOWN:
A service that handles this notification blocks system shutdown until the service stops or the preshutdown time-out interval specified through SERVICE_PRESHUTDOWN_INFO expires.
In the same page, the section about SERVICE_CONTROL_SHUTDOWN adds:
Note that services that register for SERVICE_CONTROL_PRESHUTDOWN notifications cannot receive this notification because they have already stopped.
So, the correct way is to set dwControlsAccepted to include either SERVICE_ACCEPT_SHUTDOWN or SERVICE_ACCEPT_PRESHUTDOWN, depending on your needs, but not both at the same time.
But do note that you probably want to accept more controls. You should always allow at least SERVICE_CONTROL_INTERROGATE, and almost certainly allow SERVICE_CONTROL_STOP, since without the latter the service cannot be stopped (e.g. in order to uninstall the software) and the process will have to be forcibly terminated (i.e. killed).
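The advice above can be illustrated as plain bitmask arithmetic. This sketch is in Python rather than C++ and does not call any Windows API. The flag values are the ones I believe winsvc.h defines, so check them against your SDK headers; controls_accepted and accepts_both_shutdown_kinds are hypothetical helpers.

```python
# Values as defined in winsvc.h (assumption: verify against your SDK headers).
SERVICE_ACCEPT_STOP = 0x00000001
SERVICE_ACCEPT_SHUTDOWN = 0x00000004
SERVICE_ACCEPT_PRESHUTDOWN = 0x00000100


def controls_accepted(want_preshutdown):
    """Build a dwControlsAccepted mask: STOP plus exactly one shutdown flavor."""
    mask = SERVICE_ACCEPT_STOP
    if want_preshutdown:
        mask |= SERVICE_ACCEPT_PRESHUTDOWN
    else:
        mask |= SERVICE_ACCEPT_SHUTDOWN
    return mask


def accepts_both_shutdown_kinds(mask):
    """True if the mask asks for both shutdown and preshutdown, which this answer advises against."""
    both = SERVICE_ACCEPT_SHUTDOWN | SERVICE_ACCEPT_PRESHUTDOWN
    return (mask & both) == both
```

In C++ the equivalent would be assigning SERVICE_ACCEPT_STOP | SERVICE_ACCEPT_PRESHUTDOWN (or SERVICE_ACCEPT_STOP | SERVICE_ACCEPT_SHUTDOWN) to dwControlsAccepted before calling SetServiceStatus.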
As noted by the commenters above, you will need to choose from either SERVICE_ACCEPT_SHUTDOWN or SERVICE_ACCEPT_PRESHUTDOWN (Vista or later). If you are using SERVICE_ACCEPT_PRESHUTDOWN, you will need to register your service with the SCM using RegisterServiceCtrlHandlerEx instead of RegisterServiceCtrlHandler else you will not be receiving the pre-shutdown notifications. The handler prototype also changes from Handler to HandlerEx.
Another point to note is that handling pure shutdown events is limited to 5 seconds in Windows Server 2012 (and presumably Windows 8), 12 seconds in Windows 7 and Windows Server 2008, 20 seconds in Windows XP before your service is killed while stopping. This is the reason why you may need the pre-shutdown notification. You may want to change this at \\HKLM\SYSTEM\CurrentControlSet\Control\WaitToKillServiceTimeout.
In the comment from alexpi there is a key piece of information. I found that the service handling PRESHUTDOWN needs to update the service status with a new checkpoint number (repeatedly) before WaitToKillServiceTimeout has elapsed. My server was configured to 5000 ms and my service only updated every 12000 ms, and the server went into the SHUTDOWN phase, which caused my attempt to stop another service to return the error that the shutdown was in progress.
As I understand the documentation, these two notifications are different. If all you need is for your service to receive the preshutdown notification, you should go with: dwServiceStatus.dwControlsAccepted = SERVICE_ACCEPT_PRESHUTDOWN; But if you also want your service to receive shutdown notifications, you should go with your second option.

How to kill /re-start a long running task

Is there a way to kill / re-start a long running task in AWS SWF? Sometimes some of our tasks run for a longer duration and we would like to manually kill a certain task (either via UI or programmatically) and re-start the task if possible. How to achieve this?
The console is one option for manually terminating a workflow.
You can also set timeouts on the whole workflow execution or on individual activities. These can be set when you register your activity or when you start it (defaultTaskStartToCloseTimeoutSeconds).
It's not clear what language you're using.
If you're using Java, then you should look into Exponential Retry in the Flow Framework. This makes the SDK restart your activity if it fails.
A long-running activity is expected to heartbeat using RecordActivityTaskHeartbeat. If the activity process hangs or crashes, this leads to a timeout failure after the short heartbeat interval instead of after the long task execution timeout.
The workflow code (decider) can always request activity cancellation through the RequestCancelActivityTask decision. The cancellation request is returned as output of the RecordActivityTaskHeartbeat call. The activity implementation should cancel itself and report back to the service using the RespondActivityTaskCanceled API call.
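The heartbeat-and-cancel loop described above might look roughly like this on the activity side. It is a sketch in Python with the SWF client left out: record_heartbeat is a hypothetical callable standing in for the RecordActivityTaskHeartbeat call, assumed to return a mapping with a cancelRequested flag, and run_activity is an illustrative name.

```python
def run_activity(steps, record_heartbeat, task_token):
    """Run steps one at a time, heartbeating before each; stop if cancellation is requested.

    steps: list of zero-argument callables making up the long-running work
    record_heartbeat: stand-in for RecordActivityTaskHeartbeat (assumed to
        return a mapping containing "cancelRequested")
    task_token: token identifying this activity task
    Returns ("completed", results) or ("canceled", results_so_far).
    """
    results = []
    for step in steps:
        reply = record_heartbeat(taskToken=task_token,
                                 details=f"{len(results)} steps done")
        if reply.get("cancelRequested"):
            # The decider asked for cancellation: stop and let the caller report it.
            return ("canceled", results)
        results.append(step())
    return ("completed", results)
```

On a "canceled" result the worker would then report back with RespondActivityTaskCanceled, as described above. Splitting the work into steps is an assumption of this sketch; the heartbeat has to be sent more often than the heartbeat timeout configured for the activity.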
See the Error Handling section of the AWS Flow Framework Developer Guide for the AWS Flow Framework way of cancelling activities.
Sometimes an activity implementation cannot support heartbeating and self-cancellation. The solution is to execute another "kill" activity that terminates the first activity execution. For example, under Unix such a kill activity could issue a "kill -9" command for the process that implements the first one.