Monitor that a Lambda executes in New Relic - amazon-web-services

I'm trying to use New Relic to monitor whether my Lambda has been executed within the last 25 hours, and to alert if it hasn't.
I have the following NRQL which gives me the graph I want to see:
SELECT sum(`provider.invocations.Sum`) FROM ServerlessSample WHERE provider.resource = 'my_lambda_name'
I then want to alert if this dips below 1 for 1,500 minutes (25 hours), but New Relic only allows me to set an alert window of up to 120 minutes. Any tips on how to get around this?

Interesting question. As I have seen on the New Relic discussion page (the Explorers Hub), there might be a solution for your task.
Can you please review this link:
https://discuss.newrelic.com/t/relic-solution-extending-the-functionality-of-nrql-alert-conditions-beyond-a-single-minute/75441
If you think about this for a moment, you might see how NRQL queries using percentile or stddev are a lot less useful than they seem when used in an alert condition. After all, calculating the standard deviation over an hour (or 24 hours) can be meaningful, but stddev(duration) or percentile(duration, 95) calculated over only 60 seconds is less meaningful.
I think that limit is 24 hours, but I haven't tested it yet.
Hope this helps. I will try to give it a go as well to see whether this works.
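If the alert condition window itself can't be stretched far enough, one pattern is to run the same NRQL query on your own schedule over the full 25-hour window and raise the alert from that check (for example from a small scheduled Lambda or cron job). The sketch below is only illustrative: it assumes New Relic's NerdGraph GraphQL endpoint, a user API key, and an account ID, and the exact response shape should be verified against your account.

```python
# Hypothetical sketch: run the NRQL query over the last 25 hours via NerdGraph
# and flag the Lambda as stale if the sum is zero. The endpoint, credentials,
# and response shape are assumptions to verify against your New Relic account.
import os
import requests

NERDGRAPH_URL = "https://api.newrelic.com/graphql"   # assumed NerdGraph endpoint
API_KEY = os.environ["NEW_RELIC_API_KEY"]            # assumed user API key
ACCOUNT_ID = int(os.environ["NEW_RELIC_ACCOUNT_ID"])

NRQL = ("SELECT sum(`provider.invocations.Sum`) FROM ServerlessSample "
        "WHERE provider.resource = 'my_lambda_name' SINCE 25 hours ago")

graphql_query = f"""
{{
  actor {{
    account(id: {ACCOUNT_ID}) {{
      nrql(query: "{NRQL}") {{ results }}
    }}
  }}
}}
"""

resp = requests.post(
    NERDGRAPH_URL,
    headers={"API-Key": API_KEY},
    json={"query": graphql_query},
    timeout=30,
)
resp.raise_for_status()
results = resp.json()["data"]["actor"]["account"]["nrql"]["results"]

# The single result row should hold the sum; treat "no invocations in 25 h" as an alert.
invocations = list(results[0].values())[0] if results else 0
if not invocations:
    print("ALERT: Lambda has not run in the last 25 hours")  # hook up SNS/PagerDuty here
```

Wired to an EventBridge schedule (or any cron), this sidesteps the 120-minute limit entirely, at the cost of maintaining the checker yourself.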

Related

Inconsistent AWS Lambda Runtimes

We have created an AWS Lambda function in Python and added code to measure the execution time, because we noticed that the runtime varies significantly based on how long it has been since the last Lambda call, and provisioned concurrency seems to have little impact. For all the runs in the table below, we were running the same Lambda with the same inputs, and the program is deterministic, so everything related to our code is held constant. Below is a table of what we found. Interval means how long we waited before making another call to the Lambda; for example, 2 hours means that if we called the Lambda at 1:00 PM, we called it again at 3:00 PM and 5:00 PM, etc. (but not in between). # Invocations is the total number of times the Lambda was called in succession for a given interval.
| Provisioned Concurrency | Interval   | # Invocations | Avg Duration | p90 Duration |
|-------------------------|------------|---------------|--------------|--------------|
| off                     | 2 hours    | 42            | 17.9 sec     | 32.0 sec     |
| off                     | 10 minutes | 304           | 14.6 sec     | 21.2 sec     |
| off                     | 1 minute   | 364           | 4.5 sec      | 6.3 sec      |
| on                      | 2 hours    | 50            | 18.1 sec     | 31.2 sec     |
| on                      | 10 minutes | 49            | 10.2 sec     | 29.6 sec     |
| on                      | 1 minute   | 404           | 4.6 sec      | 6.3 sec      |
So, a few observations that seem odd; maybe someone else has experience with this or knows what is going on:
We thought there would be a "hot" versus "cold" state for the Lambdas, where provisioned concurrency would keep the Lambdas "hot" all the time. However, both the 10-minute and the 2-hour intervals seem to land in a "warm" state (i.e. 10 minutes is faster than 2 hours, but not by much). Then, at 1 minute, the Lambda appears to be "hot" and consistently fast. Any idea what is going on here?
Also, we thought that provisioned concurrency would essentially make the Lambda stay in the "hot" state regardless of invocation interval; however, that is not the case, as can be seen with the 2-hour interval actually taking slightly longer. Having said that, the 10-minute interval is slightly shorter with provisioned concurrency on.
Any help/insight on this is greatly appreciated.
Extra information based on comments:
The Lambda is triggered via API Gateway from an HTTP POST request.
The runtime is a custom Docker image. We are doing this so we can use a particular version of TensorFlow.
The Lambda does use other services: S3 and DynamoDB. We added logs to time each section, and we have identified that the majority of the time is spent inside our self-contained section. In other words, it doesn't appear that the S3 download/upload or DynamoDB read/write is taking much time at all. Most of the time is spent in our CPU-intensive algorithm, which does image processing with TensorFlow. For those familiar with deep learning, the program loads a trained model and we run inference with that model; each test used the same image as input.
We've also tried adjusting the CPU/memory of the Lambda, and found it doesn't seem to improve much beyond a certain point, and it does not fix the problem. The experiments from the table were gathered using 4 GB Lambdas, and our logs report that less than 1 GB of memory is ever used.
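For reference, the per-section timing mentioned above ("we added logs to time each section") can be done with a small wrapper like the sketch below; download_input, run_inference, and write_results are hypothetical stand-ins for the real S3, TensorFlow, and DynamoDB steps.

```python
# Minimal sketch of per-section timing inside a Lambda handler.
import time
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Placeholder stand-ins for the real S3 / TensorFlow / DynamoDB steps.
def download_input(event):   # e.g. s3.get_object(...)
    return event

def run_inference(payload):  # CPU-bound TensorFlow step
    return payload

def write_results(output):   # e.g. dynamodb.put_item(...)
    pass

def timed(name, fn, *args, **kwargs):
    """Run fn, log how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    logger.info("%s took %.3f s", name, time.perf_counter() - start)
    return result

def handler(event, context):
    payload = timed("s3_download", download_input, event)
    output = timed("inference", run_inference, payload)
    timed("dynamodb_write", write_results, output)
    return {"statusCode": 200}
```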

Slot time consumed in BigQuery

I ran a query which resulted in the below stats.
Elapsed time: 12.1 sec
Slot time consumed: 14 hr 12 min
total_slot_ms: 51147110 (which is 14 hr 12 min)
We are on an on-demand pricing plan, so the maximum would be 2,000 slots. That being said, if I had used 2,000 slots for the whole 12.1-second span, I should end up with a total_slot_ms of 24,200,000 (which is 2000 x 12.1 x 1000). However, the total_slot_ms is 51,147,110. The average number of slots used is 51,147,110 / 12,100 ≈ 4,227, which is way above 2,000. Can someone explain to me how I ended up using more than 2,000 slots?
In a Google course, there is an example where a query shows 13 seconds of "elapsed time" and 50 minutes of "slot time consumed". The instructor says:
Hey, across all of our workers, we did essentially 50 minutes of work massively in parallel, 50 minutes so that your query could be returned back in 13 seconds. Best of all for you, you don't need to worry about spinning up those workers, moving data in-between them, making sure they're sharing all their results between their aggregations. All you care about is writing the SQL, finding the insights, and then running that query in a very fast turnaround. But there is abstracted from you a lot of distributed parallel processing that's happening.
Increasing BigQuery slot capacity significantly improves overall query performance. Although the number of slots is subject to quota restrictions under the BigQuery on-demand pricing plan, exceeding the slot limit does not incur additional costs:
BigQuery slots are shared among all queries in a single project.
BigQuery might burst beyond this limit to accelerate your queries.
To check how many slots you're using, see Monitoring BigQuery using Cloud Monitoring.
BigQuery on-demand supports limited bursting. https://cloud.google.com/bigquery/docs/release-notes#December_10_2019
You might want to check the execution plan for the query and look at the slot time spent on wait, read, compute, and write activities at each stage. Since these are on-demand slots, you may see a lot of wait time, and that adds up into the total. Besides bursting, walking through each stage of the execution plan will also help you see that the total slot time is not necessarily actual concurrent slot consumption but rather an equivalent slot consumption.
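As a quick sanity check on the arithmetic in the question: the "equivalent" average slot count is simply total_slot_ms divided by the elapsed milliseconds, and with bursting it can legitimately exceed the 2,000-slot on-demand baseline. A small worked example using the figures quoted above:

```python
# Worked example with the numbers from the question: total slot-milliseconds
# divided by elapsed milliseconds gives the *equivalent* average slot count,
# which can exceed the 2000-slot on-demand baseline because of bursting.
total_slot_ms = 51_147_110      # reported total_slot_ms
elapsed_ms = 12.1 * 1000        # 12.1 s elapsed time

avg_equivalent_slots = total_slot_ms / elapsed_ms
print(f"Equivalent average slots: {avg_equivalent_slots:.0f}")        # ~4227
print(f"2000 slots for the full run: {2000 * elapsed_ms:,.0f} slot-ms")  # 24,200,000
```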

Camunda: retry a Service Task with many different time intervals

Can I configure many different interval expressions for the "retry time cycle"? For example,
something like: "R6/PT10S,R2/PT30M" - 6 retries every 10 seconds and then 2 retries every 30 minutes.
Thanks in Advance,
Wladi
The job executor section of the Camunda user guide only shows an example of comma-separated intervals without repeats.
Looking at the code, it seems a repeat is only recognized if a single interval is configured:
https://github.com/camunda/camunda-bpm-platform/blob/7.13.0/engine/src/main/java/org/camunda/bpm/engine/impl/util/ParseUtil.java#L88
The lookup of which interval to apply also does not consider any repeats
https://github.com/camunda/camunda-bpm-platform/blob/7.13.0/engine/src/main/java/org/camunda/bpm/engine/impl/cmd/DefaultJobRetryCmd.java#L113
This sounds like a useful feature though; you might want to open a feature request in the Camunda issue tracker.
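Until such a feature exists, one workaround is to expand the repeats yourself into the plain comma-separated list that the engine already accepts (per the user guide example above). A small helper sketch, not an official Camunda feature:

```python
# Sketch: expand "R6/PT10S,R2/PT30M" into the explicit comma-separated list
# that the retry time cycle does accept ("PT10S,PT10S,...,PT30M,PT30M").
def expand_retry_cycle(expression: str) -> str:
    intervals = []
    for part in expression.split(","):
        part = part.strip()
        if part.startswith("R") and "/" in part:
            repeats, interval = part[1:].split("/", 1)
            intervals.extend([interval] * int(repeats))
        else:
            intervals.append(part)
    return ",".join(intervals)

print(expand_retry_cycle("R6/PT10S,R2/PT30M"))
# -> PT10S,PT10S,PT10S,PT10S,PT10S,PT10S,PT30M,PT30M
```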

How do I add a time delay to an Alexa Quiz Skill

I am creating a quiz-based Alexa Skill. This quiz has three levels (1, 2, and 3). I would like to reduce the amount of time the user has to answer as they progress through the levels.
I'm aware that I cannot extend the 8-second reply time that is fixed for Alexa Skills, so here is my current attempt. At level 1, the user will have the initial 8 seconds to respond, and if they do not, Alexa will re-prompt them, adding another 8 seconds; in total, level 1 players have approximately 16 seconds to respond. At level 2, I will not allow Alexa to re-prompt the user, but after the 8 seconds state that they have run out of time and tell the user their score before saving it, so level 2 players have roughly 8 seconds. However, I'm unsure whether or not I can reduce the initial 8 seconds to 5 seconds for level 3. Any help is much appreciated, thanks.
Edit: This is all taking place within an AWS Lambda function.
Unfortunately, this isn't something that can be changed. The time allowed for the user to respond is fixed.
You don't have control over the timeouts aside from the 8 seconds plus 8 seconds in the reprompt. The only workaround I can think of is measuring the roundtrip of the question/answer in the backend and rejecting answers that arrive after the time limit you want to enforce.
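A rough sketch of that backend roundtrip check, written against the raw Alexa request/response JSON rather than any particular SDK; the per-level limits, attribute names, and helper functions are illustrative only:

```python
# Illustrative sketch of the "measure the roundtrip in the backend" workaround:
# record when the question was asked in the session attributes, and when the
# answer comes back, reject it if it took longer than the level allows.
import time

TIME_LIMIT_SECONDS = {1: 16, 2: 8, 3: 5}  # per-level limits (illustrative)

def build_response(speech, session_attributes, end_session=False):
    """Minimal Alexa response envelope."""
    return {
        "version": "1.0",
        "sessionAttributes": session_attributes,
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": end_session,
        },
    }

def ask_question(level, question_text):
    # Stamp the question time so the answer handler can measure the roundtrip.
    attrs = {"level": level, "asked_at": time.time()}
    return build_response(question_text, attrs)

def handle_answer(event):
    attrs = event.get("session", {}).get("attributes", {})
    elapsed = time.time() - attrs.get("asked_at", 0)
    level = attrs.get("level", 1)
    if elapsed > TIME_LIMIT_SECONDS.get(level, 16):
        return build_response("Sorry, you ran out of time.", attrs, end_session=True)
    # ...otherwise score the answer as usual (omitted here)...
    return build_response("Thanks, checking your answer.", attrs)
```

Note this only approximates the user's thinking time, since the roundtrip also includes speech recognition and network latency.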
There is a workaround.
You can use an audio clip in SSML and prompt the user to precede their answer with the wake word.
<speak>
You have 30 seconds, when you are ready just say: Alexa, and your answer.
<audio src="soundbank://soundlibrary/ui/gameshow/amzn_ui_sfx_gameshow_countdown_loop_32s_full_01"/>
Time is over. Tell me your answer.
</speak>

Google AutoML Importing text items very slow

I'm importing text items into Google's AutoML. Each row contains around 5,000 characters and I'm adding 70K of these rows. This is a multi-label data set. There is no progress bar or indication of how long this process will take. It's been running for a couple of hours. Is there any way to calculate the time remaining or the total estimated time? I'd like to add additional data sets, but I'm worried that this will be a very long process before the training even begins. Any sort of formula to create even a semi-wild guess would be great.
-Thanks!
I don't think that's possible today, but I filed a feature request [1] that you can follow for updates. I asked about both importing data and training, since an estimate could be useful for training too.
I tried training with 50K records (~300 bytes/record) and the load took more than 20 minutes, after which I killed it. I retried with 1K, which ran for 20 minutes and then emailed me an error message saying I had multiple labels per input (yes, so what? training data is going to have some of those) and that I had more than 100 labels. I simplified the classification buckets and re-ran. It took another 20 minutes and was successful. Then I ran training, which took 3 hours and billed me $11. That maps to $550 for 50K records, assuming linear behavior. The prediction results were not bad for a first pass, but I got the feeling that it is throwing a very large neural net at the problem. It would help if they said what network it was and its dimensions. They do say "beta" :)
Don't waste your time trying to use Google for text classification. I am a heavy GCP user, but Microsoft LUIS is far better, more precise, and so much faster that I can't believe both products are trying to solve the same problem.
LUIS has much better documentation, supports more languages, has a much better test interface, and is way faster. I don't know if it is cheaper yet because the pricing model is different, but we are willing to pay more.