HERE Batch Geocoder Accepted but never Finishes

I've been evaluating moving our mapping and routing apps to HERE's REST API. I've been testing some scenarios to prove it out, and the one I can't get working correctly is batch geocoding.
Submitting the data to geocode works fine and I get a valid RequestID back, but when I poll for the status of the batch job it always says "accepted" and never changes.
I am using a developer account with a 90-day trial. Could there be a limitation due to the type of account?
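While diagnosing this, it may help to poll with an explicit deadline, so that a job stuck in "accepted" shows up as a timeout rather than an endless wait. The sketch below is generic and not HERE-specific: `fetch_status` is a hypothetical callable standing in for whatever request returns the job status, and every status name other than "accepted" is an assumption.

```python
import time


def poll_until_done(fetch_status, timeout_s=600.0, interval_s=5.0,
                    pending=("accepted", "running"),
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it leaves the pending states or time runs out.

    fetch_status is a hypothetical callable returning the job status as a
    string. The pending state names are assumptions, apart from "accepted".
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status not in pending:
            return status  # job reached some non-pending state
        if clock() >= deadline:
            raise TimeoutError(
                f"job still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.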

Looks like it's a queue issue, except mine has been going on for nearly a week.
See also: HERE API never runs batch job, always returns accepted status

GCP Alert Filters Don't Affect Open Incidents

I have an alert configured to send an email when the sum of executions of cloud functions that finished with a status other than 'error' or 'ok' is above 0 (grouped by function name).
The way I defined the alert is:
And the secondary aggregator is delta.
The problem is that once the alert is open, it looks like the filters don't matter any more: the alert stays open as long as the cloud function is triggered and finishes with any status (even the 'ok' status keeps it open, as long as it's triggered often enough).
At the moment the only solution I can think of is to define a log-based metric that does the counting itself, and then base the alert on that custom metric instead of the built-in one.
Is there something that I'm missing?
Edit:
Adding another image to show what I think might be the problem:
From the image above we see that the graph won't go down to 0 but stays at 1, which is not how other, normal incidents behave.

According to the official documentation:
"Monitoring automatically closes an incident when it observes that the condition is no longer met or when 7 days have passed without an observation that the condition is still being met."
That made me think there are cases where the condition no longer being met is not enough to close the incident, which is confirmed here:
"If measurements are missing (for example, if there are no HTTP requests for a couple of minutes), the policy uses the last recorded value to evaluate conditions."
A lack of HTTP requests isn't a reason to close the incident, because the policy keeps using the last recorded value (the one that triggered it).
So using alerts for HTTP requests is fine, but you need to close them yourself. I think it would be better to use a custom metric instead if you want them to close automatically.

django with heavy computation and long runtime - offline computation and send results

I have a django app where the user sends a request, and the server does some SQL lookup, followed by computation on results and finally showing the results to the user.
The SQL lookup and the computation afterwards can take a long time, maybe 30+ minutes. I have seen some web pages ask for an email address in such cases and then send you the URL later. But I'm not sure how this can be done in Django or whether there are other options for this situation. Any pointer would be very helpful.
(I'm sorry, but as I said it's a rather general question; I don't know how I can provide a minimal runnable example for this.)

One way to accomplish this would be to use something like Celery, which is a distributed task queue. The processing task would go into the queue (synchronously or asynchronously), and when the task completes it would call a function that emails the user to tell them the results are ready.
Documentation: https://docs.celeryproject.org/en/stable/django/first-steps-with-django.html
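As a rough illustration of that flow, here is a minimal sketch of the work a queued task would do: run the slow computation, then email a link. It is written as plain Python so it runs without Celery or Django. The names `run_report_and_notify`, `compute`, `send_email`, `base_url` and the `/results/<id>/` URL layout are all hypothetical.

```python
def run_report_and_notify(user_email, query, compute, send_email, base_url):
    """Run the slow work, then email the user a link to the results.

    In a Celery setup a function like this would be registered as a task
    (for example with Celery's @shared_task decorator), and the Django view
    would enqueue it with .delay(...) and return a response immediately.
    The callables are passed in only so this sketch runs standalone; in a
    real task they would be ordinary module-level functions.
    """
    result_id = compute(query)                 # slow SQL lookup + computation
    url = f"{base_url}/results/{result_id}/"   # hypothetical results page
    send_email(
        to=user_email,
        subject="Your results are ready",
        body=f"Your results are available at {url}",
    )
    return url
```

The view then only has to validate the request, enqueue the task, and show a "we'll email you" page, so the 30+ minute computation never blocks an HTTP request.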

Alexa sent multiple request to AWS Lambda

I'm building an Alexa skill that sends a request to my web server; the web server then does some processing and uploads a file to Amazon S3.
While the web server is processing, the skill polls Amazon S3 for the file every 10 seconds until it appears. The response is based on the file's content.
Unfortunately, the web server process takes more than 1 minute, which means the skill must wait more than 1 minute for the file before it can respond.
For now, I use a progressive response with async/await in my code, and the skill does keep waiting for the file on S3.
But I found that the skill automatically sends a second request to Lambda after 50 seconds. That means that for the same skill, I get two Lambda functions running at the same time.
The result is: after the first response made by the progressive response, 50 seconds later I hear another response, also made by the progressive response, which belongs to the second request.
After that, nothing else happens until the end.
I know it is bad to make the skill wait this long, but I still want to find a workable approach if the skill needs to wait this long.
There are some points I want to figure out:
Is there any way to prevent the skill from sending the second request to Lambda?
Is there another way I can try to accomplish the goal?
Thanks

Eventually, I found that the second invocation of Lambda does not come from Alexa but from AWS Lambda itself. See the following article:
https://cloudonaut.io/your-lambda-function-might-execute-twice-deal-with-it/
So you have to handle this situation in your Lambda code. One thing you can rely on is that both invocations carry the same request ID. You can tell whether this is the first execution by checking your storage for the request ID, which you store during the first execution.
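The request-ID check described above might look roughly like this. It is a sketch under assumptions: the module-level set stands in for a durable store (a database table, for instance), and `do_slow_work` is a hypothetical placeholder for the real skill logic. `context.aws_request_id` is the request ID that the Lambda Python runtime exposes on the context object.

```python
# In-memory stand-in for a durable store (assumption). A set like this only
# survives within one warm container, so real code needs shared storage.
_seen_request_ids = set()


def do_slow_work(event):
    """Hypothetical placeholder for the real skill logic."""
    return {"handled": event.get("name", "unknown")}


def handler(event, context):
    """Lambda entry point that skips work for a request ID already seen."""
    request_id = context.aws_request_id  # identical on a retried invocation
    if request_id in _seen_request_ids:
        # Second invocation of the same request: do not repeat the work.
        return {"duplicate": True}
    # Record the ID before starting, since the first run may still be
    # in progress when the retry arrives.
    _seen_request_ids.add(request_id)
    return {"duplicate": False, "result": do_slow_work(event)}
```

With a real shared store, the check and the write would ideally be a single conditional operation so that two concurrent invocations cannot both pass the check.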
Besides, I also found that once the Alexa skill waits for more than 1 minute, it crashes and speaks an error (tested on an Amazon Echo). Nothing in the AWS Lambda log differs from a normal execution, meaning the log looks fine even though the actual result is not.
Hope this helps someone who is also struggling with this problem.

SWF Activity is not completing even though the computation has finished

I'm testing a new SWF workflow, and I've got an activity that makes a RESTful call out to another service. The problem is that I can see through logging that the actual call takes less than a second to complete, but the activity always times out in SWF (START_TO_CLOSE of 5 minutes). More specifically, the RESTful call is a list call, and when I limit the batch size to a small number, the activity completes and moves on very quickly. But at some seemingly arbitrary threshold, it chokes completely.
Does anyone have any insight into this? I've read that SWF calls have a size limitation of 1 MB. Does anyone know how to find the size of the data my workers are trying to pass to SWF?

After some remote debugging, it turns out the response from the task is too big and the activity is failing silently. The failure occurs when the framework tries to report the response back to SWF, and the SDK calls RespondActivityTaskCompleted. That API has a length restriction on the internal result param:
Length Constraints: Maximum length of 32768.
This is a validation error that throws an uncaught exception and is swallowed internally until the Activity times out.

I wouldn't recommend using activity input and output parameters for passing large data sets. SWF is an orchestration technology, not a data-passing one. The standard workarounds are:
Storing the result in a separate store (S3, for example) and passing a reference to it.
Caching the result locally on a machine and routing all following activities to the same host so they have access to the cached result. See the fileprocessing sample for details of the routing approach.
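The first workaround can be sketched as follows. This is only an illustration under assumptions: `store` is a hypothetical callable that saves text somewhere durable (an S3 object, say) and returns a reference such as a key, and the `inline`/`ref` envelope is an invented convention, not something SWF defines. The 32768 limit is the one quoted in the other answer.

```python
import json

MAX_RESULT_LENGTH = 32768  # length limit quoted for the activity result


def make_activity_result(payload, store, max_length=MAX_RESULT_LENGTH):
    """Return a string small enough to report as an activity result.

    store is a hypothetical callable that persists the text and returns a
    reference to it. Small payloads are returned inline; large ones are
    stored externally and only the reference is returned.
    """
    inline = json.dumps({"inline": payload})
    if len(inline) <= max_length:
        return inline
    # Too large to report directly: persist it and pass the reference on.
    return json.dumps({"ref": store(json.dumps(payload))})
```

The next activity would then check for `ref` and load the data from the store itself, so the workflow only ever carries small strings.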
BTW, have you checked out Cadence, which is an open source version of SWF with much better client-side libraries?

Is there an AWS / PagerDuty service that will alert me if it's NOT notified?

We've got a little Java scheduler running on AWS ECS. It's doing what cron used to do on our old monolith. It fires up (Fargate) tasks in Docker containers. We've got a task that runs every hour and it's quite important to us. I want to know if it crashes or fails to run for any reason (e.g. the Java scheduler fails, or someone turns the task off).
I'm looking for a service that will alert me if it's not notified. I want to call the notification system every time the script runs successfully. Then if the alert system doesn't get the "OK" notification as expected, it shoots off an alert.
I figure this kind of service must exist, and I don't want to reinvent the wheel trying to build it myself. I guess my question is, what's it called? And where can I go to get that kind of thing? (We're using AWS, obviously, and we've got a PagerDuty account.)

We use this approach for these types of problems. First, the task has to write a timestamp to a file in S3 or EFS. This file is the external evidence that the task ran to completion. Then you need an HTTP-based service that reads that file and checks whether the timestamp is valid, i.e. has been updated in the last hour. This could be a simple PHP or Node.js script. It is exposed to the public web, e.g. https://example.com/heartbeat.php. The script returns an HTTP response code of 200 if the timestamp file is present and valid, or 500 if not. Then we use StatusCake to monitor the URL and notify us via its PagerDuty integration if there is an incident. We usually include a message in the response so a human can see the nature of the error.
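The check behind such a heartbeat URL could be as small as the sketch below. It is written in Python rather than PHP or Node.js, and leaves out reading the file and serving HTTP. The function name, the messages, and the one-hour default are assumptions for illustration.

```python
import time


def heartbeat_status(last_run_epoch, now_epoch=None, max_age_s=3600):
    """Return (http_status, message) for a heartbeat check.

    last_run_epoch is the timestamp the task wrote on completion (seconds
    since the epoch), or None if the timestamp file is missing.
    """
    now = time.time() if now_epoch is None else now_epoch
    if last_run_epoch is None:
        return 500, "timestamp file missing"
    age = now - last_run_epoch
    if age > max_age_s:
        # Stale timestamp: the task has not completed recently enough.
        return 500, f"last successful run was {int(age)} seconds ago"
    return 200, "ok"
```

For an hourly task, `max_age_s` probably wants some slack beyond 3600 so that a run finishing a few minutes late does not page anyone.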
This may seem tedious, but it is foolproof. A failure anywhere along the line triggers a notification. StatusCake has a great free service level. This approach can be used to monitor any critical task in the same way. We've learned the hard way that critical cron-type tasks and processes can fail for any number of reasons, and you want to know before it becomes customer critical. 24x7x365 monitoring of these types of tasks is necessary, and helps us sleep better at night.
Note: We always have a daily system test event that triggers a PagerDuty notification at 9am each day. For the truly paranoid, this assures that PagerDuty itself has not failed in some way, e.g. through misconfiguration. Our support team knows that if they don't get a test alert each day, there is a problem in the notification system itself. The tech on duty has to acknowledge the incident as per SOP. If they do not acknowledge it, it escalates to the next tier, and we know we have to have a talk about response times. It keeps people on their toes. This is the final piece to ensure you have a robust monitoring infrastructure.

Opsgenie has a heartbeat service, which is basically a watchdog timer. You can configure it to call you if you don't ping them within x minutes.
Unfortunately I would not recommend them. I have been using them for 4 years, and in that time they have changed their account system twice and silently left my paid account orphaned. I have to find a new vendor as soon as I have some free time.
