Does the HERE batch geocoding job request response take longer when there are more locations? - geocoding

After starting a new batch geocoding job (step one here), does the time it takes to get a response (step two here) depend on the number of individual geocode requests? (i.e., does it take longer to get a response for 10,000 locations than for 10 locations?)
On a similar note, what are the different possible statuses that can be returned in the response? (for instance, "accepted" in step two here)
I tried looking for these answers in the HERE batch geocoding documentation, but couldn't find anything.
The HERE API FAQ page directed me here for any technical support.

Although the batch geocoder (BGC) backend runs on a larger infrastructure and job items are processed in parallel to some degree, not all items of a batch job can be processed at the same time. So yes, bigger jobs take longer.
A batch job can be in one of the following states:
submitted: A batch job was submitted to the batch system and is ready to be started. The user can start the batch job by sending an HTTP PUT request with action=run.
accepted: The batch job has been verified for correctness and validity and has been added to the queue, waiting to be scheduled for execution.
running: The job is being processed now.
complete: Job processing completed.
cancelled: The job has been cancelled by the user with the HTTP PUT command action=cancel.
deleted: The job was deleted by the user with an HTTP DELETE command.
failed: The job failed while running. This is unusual and caused by an internal error. You can try to restart the job with a PUT request and action=run, or delete the job.
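Since larger jobs take longer, a client usually polls the job status until it reaches a final state. The following is only a minimal sketch of that loop, using the state names listed above. `fetch_status` is a hypothetical callable that you would implement yourself to query the job and return its status string; the interval and maximum wait are arbitrary assumptions, not values from the HERE documentation.

```python
import time
from typing import Callable

# States from the list above after which the job will not change on its own.
TERMINAL_STATES = {"complete", "cancelled", "deleted", "failed"}


def wait_for_job(fetch_status: Callable[[], str],
                 interval_s: float = 30.0,
                 max_wait_s: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch_status() until a terminal state is seen or max_wait_s elapses.

    fetch_status is a hypothetical helper: it should ask the batch service
    for the job's current status and return it as a string.
    """
    waited = 0.0
    while True:
        status = fetch_status().lower()
        if status in TERMINAL_STATES:
            return status
        if waited >= max_wait_s:
            raise TimeoutError(f"job still '{status}' after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

A job that stays in "submitted" never finishes by itself (it still needs action=run), so the maximum wait also protects against polling forever in that case.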
State transition graph

Related

Can you do batch pull messages with Google Pub Sub?

Trying to optimize our application by doing batch pulling. Pub Sub seems to allow asynchronously pulling one message at a time with different client nodes, but is there no way for a single node to do a batch pull from Pub Sub?
Both Streaming Pull and the Pull RPC seem to allow the subscriber to consume only one message at a time. Right now, it looks like we would have to pull one message at a time and do application-level batching.
Any insight would be helpful. I'm pretty new to GCP in general.
The underlying pull and streaming pull operations can receive batches of messages in the same response. The Cloud Pub/Sub client library, which uses streaming pull, breaks these batches apart and hands them to the provided user callback one at a time. Therefore, you need not worry about optimizing the underlying receiving of messages.
If your concern is optimizing the subscriber code at the application level, e.g., you want to batch writes into a database, then you have a couple of options:
Use Pull directly, which allows one to process all of the messages in a batch at a time. Note that using pull effectively requires many simultaneously outstanding pull requests and replacing requests that return with new requests immediately.
In your user callback, re-batch messages and once the batch reaches a desired size (or you've waited a sufficient amount of time to fill the batch), process all of the messages together and then ack them.
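The second option (re-batching inside the callback) could look roughly like the sketch below. It is an illustration only, not part of the Pub/Sub client library: `MessageBatcher`, `max_size`, and `max_wait_s` are made-up names, and the only assumption about a message is that it may expose an `ack()` method, as Pub/Sub client messages do.

```python
import threading
import time
from typing import Any, Callable, List


class MessageBatcher:
    """Collects messages and hands them to `process` in batches.

    A batch is flushed when it holds max_size messages or when its oldest
    message has waited max_wait_s seconds (checked in flush_if_due()).
    """

    def __init__(self, process: Callable[[List[Any]], None],
                 max_size: int = 100, max_wait_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._process = process
        self._max_size = max_size
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._batch: List[Any] = []
        self._first_arrival = 0.0
        # Callbacks may run on several threads, so guard the shared batch.
        self._lock = threading.RLock()

    def add(self, message: Any) -> None:
        """Call this from the subscriber callback instead of processing directly."""
        with self._lock:
            if not self._batch:
                self._first_arrival = self._clock()
            self._batch.append(message)
            self.flush_if_due()

    def flush_if_due(self) -> None:
        """Flush if the batch is full or too old; also call this periodically."""
        with self._lock:
            if not self._batch:
                return
            full = len(self._batch) >= self._max_size
            stale = self._clock() - self._first_arrival >= self._max_wait_s
            if full or stale:
                self.flush()

    def flush(self) -> None:
        with self._lock:
            batch, self._batch = self._batch, []
            if not batch:
                return
            self._process(batch)
            # Ack only after the whole batch was processed without raising.
            for message in batch:
                ack = getattr(message, "ack", None)
                if callable(ack):
                    ack()
```

Because the age check only runs when `add()` or `flush_if_due()` is called, a quiet subscription needs something (for example a small background timer) to call `flush_if_due()` now and then, and the ack deadline has to be long enough to cover the wait.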
You can probably implement that with Dataflow (Apache Beam). You can have a running streaming job in which you group, window, and transform messages according to your requirements. The results of processing can be saved in batches or streamed further. It probably makes sense when the number of messages is really big.

How to do "live request batching" in gcloud

Here is my situation:
I have a rather slow tensorflow model that runs on GPU (2 to 3 seconds per prediction)
A prediction for a single 'entity' vs a prediction for 8 'entities' takes about the same time
This means I could be 8 times as efficient by simply combining multiple predictions in the same request
I have a service on AI platform serving requests to that model
The service works for slow request rates but has trouble scaling up (anything over 4 QPS is too much to handle)
My question then is:
Is there a standard way / best practice for batching live client requests:
When receiving a request, wait a little bit for other requests
After a while, or when the number of requests reaches a set number, forward the requests in a single "batch" to another service.
If traffic is low, the delay will expire before the batch is full, but since traffic is low, that's not an issue
If traffic is high, the batch will be full before the delay, and the client will have to wait less
I have an almost-working solution with App Engine + Firebase (for hosting the shared 'queue'), but implementing the delay is giving me trouble (App Engine doesn't seem to like Python's threading.Timer).
I'd appreciate something that could work with app engine, but at this point I'm open to any suggestions (as long as it is applicable on google cloud).
Thanks!
The ideal (but not the cheapest) solution is to use Dataflow.
When a prediction request comes in, publish it to PubSub.
Deploy a Dataflow job in streaming mode, with fixed windows of X minutes and an additional, non-accumulating trigger that fires after Y events in the window.
When a window trigger fires (either on the number of messages or on the timer), do the batch processing.
You can imagine other designs that are simpler and cheaper. In both of them, you still publish the prediction requests to PubSub.
#1: Schedule a Cloud Function or a Cloud Run service every X minutes to pull the PubSub subscription and then trigger the batch job. The drawback is that the interval is fixed.
#2: When you publish a message to PubSub, also store in Firestore (for example) a counter that you increment and the date of the first message published to PubSub.
If the number of messages is above your threshold, send a request to your other process, which pulls the PubSub subscription and runs the batch processing (as in #1). Reset the counter value and the message date value.
Set up a Cloud Scheduler job that checks, every minute, the date of the first message in Firestore. If it is older than your time limit, send a request to your other process, which pulls the PubSub subscription and runs the batch processing (as in #1). Reset the counter value and the message date value.
Option #2 will generate a lot of Firestore reads and writes, but will be cheaper than Dataflow.
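The decision logic behind design #2 (trigger on either the counter or the age of the first message) can be sketched as a pure function. This is only an illustration: the function name, the default threshold of 8, and the 2-second limit are assumptions picked to match the question, and reading and resetting the values in Firestore is left out entirely.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_trigger_batch(pending_count: int,
                         first_message_at: Optional[datetime],
                         now: datetime,
                         max_count: int = 8,
                         max_age: timedelta = timedelta(seconds=2)) -> bool:
    """Return True when the pending requests should be sent as one batch.

    pending_count and first_message_at stand for the counter and the date of
    the first message that design #2 keeps in shared storage.
    """
    if pending_count <= 0 or first_message_at is None:
        return False  # nothing is waiting
    if pending_count >= max_count:
        return True   # batch is full: trigger without waiting for the timer
    return now - first_message_at >= max_age  # oldest request waited long enough
```

Whichever process triggers the batch should reset the counter and the date in the same transaction, otherwise two triggers can fire for the same set of messages.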

Alexa sent multiple requests to AWS Lambda

I'm building an Alexa skill that sends a request to my web server; the web server then does some processing and uploads a file to Amazon S3.
While the web server is processing, I make the skill try to get the file from Amazon S3 every 10 seconds until it gets the file. The response is based on the file content.
Unfortunately, the web server process takes more than 1 minute. That means the skill must wait more than 1 minute for the file before it can respond.
For now, I use a progressive response with async/await in my code, and the skill does keep waiting for the file on S3.
But I found that a second request is sent to Lambda automatically after 50 seconds. That means that, for the same skill, I get two Lambda functions running at the same time.
The result is: after the first response made by the progressive response, 50 seconds later I hear another response, which is also made by the progressive response but belongs to the second request.
After that, nothing happens until the end.
I know it is bad to make the skill wait this long, but I still want to find a workable approach for when the skill needs to wait this long.
There are some points I want to figure out.
Is there any way to prevent the skill from sending the second request to Lambda?
Is there another way I can try to accomplish the goal?
Thanks
Eventually, I found that the second invocation of Lambda does not come from Alexa but from AWS Lambda itself. Refer to the following article:
https://cloudonaut.io/your-lambda-function-might-execute-twice-deal-with-it/
So you have to deal with this kind of situation in your Lambda code. One thing you can use is that both invocations have the same request id. So you can tell whether this is the first execution by checking your storage for the request id, which you store on the first execution.
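A minimal sketch of that request-id check for a Python Lambda handler follows. `handle_once` and `do_work` are hypothetical names, and the in-memory set is only a stand-in: module-level state does not survive across Lambda containers, so a real deployment would need durable storage with an atomic "insert if absent" operation. `context.aws_request_id` is the request id attribute of the Lambda Python context object.

```python
from typing import Any, Callable, Dict, Set

# Stand-in for durable storage (an assumption for illustration only).
_seen_request_ids: Set[str] = set()


def handle_once(event: Dict[str, Any], context: Any,
                do_work: Callable[[Dict[str, Any]], Any]) -> Any:
    """Run do_work(event) only the first time a given request id is seen."""
    request_id = context.aws_request_id
    if request_id in _seen_request_ids:
        # Same request id as an earlier invocation: treat it as a duplicate.
        return None
    _seen_request_ids.add(request_id)
    return do_work(event)
```

In the actual handler this would be called as `handle_once(event, context, my_real_logic)`, where `my_real_logic` is whatever the skill does on the first invocation.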
Besides, I also found that once the Alexa skill waits for more than 1 minute, it crashes and speaks an error (tested on an Amazon Echo). Nothing in the AWS Lambda log differs from a normal execution, meaning the log seems fine but the actual execution result is not.
Hope this helps someone who is also struggling with this problem.

Is there an AWS / Pagerduty service that will alert me if it's NOT notified

We've got a little Java scheduler running on AWS ECS. It's doing what cron used to do on our old monolith: it fires up (Fargate) tasks in Docker containers. We've got a task that runs every hour and it's quite important to us. I want to know if it crashes or fails to run for any reason (e.g. the Java scheduler fails, or someone turns the task off).
I'm looking for a service that will alert me if it's not notified. I want to call the notification system every time the script runs successfully. Then if the alert system doesn't get the "OK" notification as expected, it shoots off an alert.
I figure this kind of service must exist, and I don't want to re-invent the wheel trying to build it myself. I guess my question is, what's it called? And where can I go to get that kind of thing? (We're using AWS, obviously, and we've got a PagerDuty account.)
We use this approach for these types of problems. First, the task has to write a timestamp to a file in S3 or EFS. This file is the external evidence that the task ran to completion. Then you need an HTTP-based service that reads that file and checks whether the timestamp is valid, i.e. has been updated in the last hour. This could be a simple PHP or Node.js script. This process is exposed to the public web, e.g. https://example.com/heartbeat.php. The script returns an HTTP response code of 200 if the timestamp file is present and valid, or 500 if not. Then we use StatusCake to monitor the URL and notify us via its PagerDuty integration if there is an incident. We usually include a message in the response so a human can see the nature of the error.
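The check that the heartbeat script performs can be sketched as below. The answer suggests PHP or Node.js; this Python version is only meant to illustrate the logic, and the function name and message texts are assumptions. Reading the timestamp from S3 or EFS and serving the result over HTTP are deliberately left out.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def heartbeat_status(last_run: Optional[datetime], now: datetime,
                     max_age: timedelta = timedelta(hours=1)) -> Tuple[int, str]:
    """Map the task's last recorded timestamp to an HTTP status and a message.

    last_run is None when the timestamp file is missing or unreadable.
    """
    if last_run is None:
        return 500, "timestamp file missing"
    age = now - last_run
    if age > max_age:
        return 500, f"last successful run was {age} ago"
    return 200, "OK"
```

For a task that runs exactly once an hour, a `max_age` slightly above one hour may avoid false alarms when a run starts a little late; the right margin depends on the task.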
This may seem tedious, but it is foolproof. Any failure anywhere along the line will be immediately notified. StatusCake has a great free service level. This approach can be used to monitor any critical task in the same way. We've learned the hard way that critical cron-type tasks and processes can fail for any number of reasons, and you want to know before it becomes customer critical. 24x7x365 monitoring of these types of tasks is necessary, and helps us sleep better at night.
Note: We always have a daily system test event that triggers a PagerDuty notification at 9am each day. For the truly paranoid, this assures that PagerDuty itself has not failed in some way, e.g. through misconfiguration. Our support team knows that if they don't get a test alert each day, there is a problem in the notification system itself. The tech on duty has to acknowledge the incident as per SOP. If they do not acknowledge it, it escalates to the next tier, and we know we have to have a talk about response times. It keeps people on their toes. This is the final piece to ensure you have a robust monitoring infrastructure.
OpsGenie has a heartbeat service, which is basically a watchdog timer. You can configure it to call you if you don't ping them in x number of minutes.
Unfortunately, I would not recommend them. I have been using them for 4 years, and they have changed their account system twice and silently left my paid account orphaned. I have to find a new vendor as soon as I have some free time.

Is there a way to set a walltime on AWS Batch jobs?

Is there a way to set a maximum running time for AWS Batch jobs (or queues)? This is a standard setting in most batch managers, which avoids wasting resources when a job hangs for whatever reason.
As of April 2018, AWS Batch supports setting a job timeout when submitting a job or in the job definition.
https://aws.amazon.com/about-aws/whats-new/2018/04/aws-batch-adds-support-for-automatic-termination-with-job-execution-timeout/
You specify an attemptDurationSeconds parameter, which must be at least 60 seconds, either in your job definition, or when you submit the job. When this number of seconds has passed following the job attempt's startedAt timestamp, AWS Batch terminates the job. On the compute resource, your job's container receives a SIGTERM signal to give your application a chance to shut down gracefully; if the container is still running after 30 seconds, a SIGKILL signal is sent to forcefully shut down the container.
Source: https://docs.aws.amazon.com/batch/latest/userguide/job_timeouts.html
POST /v1/submitjob HTTP/1.1
Content-type: application/json

{
    ...
    "timeout": {
        "attemptDurationSeconds": number
    }
}
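From Python, the same timeout can be passed through boto3's Batch client. The sketch below only builds and validates the arguments so that it runs without AWS access; `build_submit_job_args` is a hypothetical helper, and the job name, queue, and definition values are placeholders.

```python
from typing import Any, Dict

# Minimum value stated in the AWS Batch documentation quoted above.
MIN_ATTEMPT_DURATION_S = 60


def build_submit_job_args(job_name: str, job_queue: str, job_definition: str,
                          timeout_s: int) -> Dict[str, Any]:
    """Build keyword arguments for a Batch SubmitJob call with a job timeout."""
    if timeout_s < MIN_ATTEMPT_DURATION_S:
        raise ValueError(
            f"attemptDurationSeconds must be at least {MIN_ATTEMPT_DURATION_S}")
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "timeout": {"attemptDurationSeconds": timeout_s},
    }


# With boto3 installed and AWS credentials configured, the dict can be used as:
#   import boto3
#   args = build_submit_job_args("nightly", "my-queue", "my-job-def", 3600)
#   boto3.client("batch").submit_job(**args)
```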
AFAIK there is no feature to do this. However, a workaround was suggested in the forum for a similar question.
One idea is to call Batch as an Activity from Step Functions and ping
back on a schedule (e.g. every minute) from that job. If it stops
responding then you can detect that situation as a Timeout in the
activity and act accordingly (terminate the job etc.). Not an ideal
solution (especially if the job continues to ping back as a "zombie"),
but it's a start. You'd also likely have to store activity tokens in a
database to trace them to Batch job id.
Alternatively, you split that setup into 2 steps, and schedule a Batch
job from a Lambda in the first state, then pass the Batch job id to
the second step which then polls Batch (from another Lambda) for its
state with Retry and IntervalSeconds (e.g. once every minute, or even
with exponential backoff), and MaxAttempts calculated based on your
timeout. This way, you don't need any external state storage
mechanism, long polling or even a "ping back" from the job (it CAN be
a zombie), but the downside is more steps.
There is no option to set a timeout on a Batch job, but you can set up a Lambda function that triggers every hour or so and deletes jobs created more than, say, 24 hours ago.
I have been working with AWS for some time now and could not find a way to set a maximum running time for Batch jobs.
However, there are some alternative approaches you could use.
AWS Forum
Sadly, there is no way to set an execution time limit on AWS Batch.
One solution may be to edit the Docker entry point so that it enforces the execution time limit itself.
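One way to read the entry point suggestion is a small wrapper that starts the real command and stops it after a fixed time. The sketch below is an assumption about how that could be done in Python, not something the answer spells out; the file name `entrypoint.py`, the argument order, and the exit code 124 (borrowed from the GNU `timeout` convention) are all arbitrary choices.

```python
import subprocess
import sys
from typing import List, Optional

TIMEOUT_EXIT_CODE = 124  # arbitrary choice, mirrors GNU `timeout`


def run_with_limit(cmd: List[str], limit_s: float) -> int:
    """Run cmd and return its exit code, or TIMEOUT_EXIT_CODE if it ran too long."""
    try:
        # On timeout, subprocess.run kills the child process and then raises.
        return subprocess.run(cmd, timeout=limit_s).returncode
    except subprocess.TimeoutExpired:
        return TIMEOUT_EXIT_CODE


def main(argv: Optional[List[str]] = None) -> int:
    """Expected arguments: entrypoint.py <limit_seconds> <command> [args...]"""
    argv = sys.argv if argv is None else argv
    return run_with_limit(argv[2:], float(argv[1]))


# As a container entry point, the script would end with:
#   sys.exit(main())
```

Only the direct child process is killed on timeout; if the job spawns its own subprocesses, they may need separate handling.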