We were looking to use the AWS Comprehend custom classifier, but its pricing seems very high: it starts charging the moment an endpoint is put up, even if it is not used ("Endpoints are billed on one second increments, with a minimum of 60 seconds. Charges will continue to incur from the time you start the endpoint until it is deleted even if no documents are analyzed.")
So, we need the feature but would like to see if there is an alternate way to use the classifiers we have.
Any ideas?
Comprehend supports both synchronous and asynchronous inference on custom classifiers. Synchronous inference provides sub-second response time but requires setting up a custom endpoint to host the model and is charged on uptime.
Asynchronous inference (StartDocumentClassificationJob) usually takes a few minutes to an hour depending on the amount of data being processed, and is billed based on data volume (1 billing unit = 100 characters).
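For reference, here is a minimal sketch of the asynchronous route using boto3; the classifier ARN, S3 paths, and IAM role below are placeholders you would replace with your own:

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

# Placeholder ARNs/paths -- replace with your own classifier, buckets, and role.
response = comprehend.start_document_classification_job(
    JobName="nightly-classification",
    DocumentClassifierArn="arn:aws:comprehend:us-east-1:123456789012:document-classifier/my-classifier",
    InputDataConfig={
        "S3Uri": "s3://my-bucket/comprehend/input/",
        "InputFormat": "ONE_DOC_PER_LINE",  # or ONE_DOC_PER_FILE
    },
    OutputDataConfig={"S3Uri": "s3://my-bucket/comprehend/output/"},
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendDataAccessRole",
)

# Poll until the job finishes; results land as a compressed file under the output S3 prefix.
job_id = response["JobId"]
status = comprehend.describe_document_classification_job(JobId=job_id)
print(status["DocumentClassificationJobProperties"]["JobStatus"])  # SUBMITTED / IN_PROGRESS / COMPLETED
```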
I am going to be using Athena for report generation on data available in S3. A lot of it is time series data coming from IoT devices.
Users can request reports over years and years' worth of data but will mostly be weekly, monthly or annual.
I am thinking of saving aggregates every 15 minutes, e.g. at 12:00, 12:15, 12:30, 12:45, 1:00, etc. The calculated aggregates should always fall on the full 15-minute marks and cannot be at 12:03, 12:18, and so on. Is this possible with Kinesis Data Analytics? If yes, how?
If not, does scheduling a Lambda to be triggered every 5-10 minutes and having Athena calculate those aggregates sound like a reasonable approach? Any alternatives I should consider?
Kinesis Data Analytics runs Apache Flink, which supports tumbling windows. Intervals starting at 00:00, 00:15, etc. should work by default when the window size is set to 15 minutes.
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/operators/windows/#tumbling-windows
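For illustration, here is roughly what that looks like with the PyFlink Table API (the stream name, field names, and region are made up; in a real Kinesis Data Analytics application you would INSERT INTO a sink table rather than print the result):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical Kinesis source with an event-time watermark.
t_env.execute_sql("""
    CREATE TABLE readings (
        device_id  STRING,
        reading    DOUBLE,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '30' SECOND
    ) WITH (
        'connector' = 'kinesis',
        'stream' = 'iot-readings',
        'aws.region' = 'eu-west-1',
        'format' = 'json'
    )
""")

# Tumbling windows are aligned to the epoch, so 15-minute windows always start at :00, :15, :30, :45.
t_env.execute_sql("""
    SELECT device_id, window_start, window_end, AVG(reading) AS avg_reading
    FROM TABLE(
        TUMBLE(TABLE readings, DESCRIPTOR(event_time), INTERVAL '15' MINUTES))
    GROUP BY device_id, window_start, window_end
""").print()
```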
Since a 15-minute interval is quite slow, you could also consider writing an AWS Glue job (Apache Spark) and having it triggered periodically with built-in Glue triggers.
Or you can go with your current solution (Lambda/Athena).
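If you go the Lambda/Athena route, a minimal sketch could look like the following (the database, table, columns, and results bucket are hypothetical); flooring the epoch seconds to a multiple of 900 keeps the buckets on exact 15-minute boundaries:

```python
import boto3

athena = boto3.client("athena")

# Hypothetical table: iot.events(device_id, value, event_time) backed by S3.
# In practice you would bound the query to the last completed 15-minute bucket.
QUERY = """
SELECT device_id,
       from_unixtime(floor(to_unixtime(event_time) / 900) * 900) AS bucket_start,
       avg(value) AS avg_value,
       count(*)   AS samples
FROM iot.events
WHERE event_time >= now() - interval '1' hour
GROUP BY 1, 2
"""

def handler(event, context):
    # Triggered by an EventBridge schedule (e.g. every 15 minutes).
    return athena.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Database": "iot"},
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
    )["QueryExecutionId"]
```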
One of the main decisions here is how much you need to invest to learn Spark or Flink vs. the (I assume) already familiar Athena queries. I would reserve some limited time to test each approach before picking one; this way you can quickly see where things get complicated.
I would like to set a monthly threshold on the number of traces collected by AWS X-Ray (mainly to avoid unexpected expenses).
It seems that sampling rules let us limit trace ingestion, but they use a one-second window.
https://docs.aws.amazon.com/xray/latest/devguide/xray-console-sampling.html
But setting a limit on the number of traces per second might cause me to lose some important traces. Basically, the one-second window seems unreasonably narrow, and I would rather set the limit for a whole month.
Is there any way to achieve that?
If not, does anyone know the reason why AWS does not enable that?
(Update)
Answer by Lei Wang confirms that it is not possible and speculates about the possible reasons (see the post for details).
Interestingly, Log Analytics workspaces in Azure have this functionality, so it should not be impossible to add something similar to AWS X-Ray.
X-Ray right now supports 2 basic sampling behaviors:
a fixed rate (ratio)
a reservoir (limit on traces sampled per second)
These 2 can be combined into a 3rd behavior: reservoir + rate. For example, a 1/s reservoir + 5% rate means: sample at least 1 trace per second, then if the throughput is over 1 per second, sample an additional 5% of the remaining requests.
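For illustration, that reservoir + rate combination could be created with boto3 roughly like this (the rule name is made up, and the wildcards apply the rule to all services):

```python
import boto3

xray = boto3.client("xray")

# Reservoir of 1 trace/second plus a 5% fixed rate on top, matching the example above.
xray.create_sampling_rule(
    SamplingRule={
        "RuleName": "one-per-second-plus-5-percent",  # hypothetical name
        "ResourceARN": "*",
        "Priority": 100,
        "FixedRate": 0.05,     # 5% of requests beyond the reservoir
        "ReservoirSize": 1,    # at least 1 trace per second
        "ServiceName": "*",
        "ServiceType": "*",
        "Host": "*",
        "HTTPMethod": "*",
        "URLPath": "*",
        "Version": 1,
    }
)
```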
As for why X-Ray does not support other sampling behaviors such as the per-month limit you mention: I guess it is technically not easy to implement, and it is not clear whether it is a common user requirement. X-Ray cannot guarantee that the customer won't reboot the application within the month, and even if the user says the application never reboots, the X-Ray SDK would still need a communication mechanism to calculate the total number of traces across the fleet. So the only possible workaround is for the user's application to keep track of how many traces are in the X-Ray backend in total by querying periodically.
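A rough sketch of that workaround: a scheduled job counts the traces recorded since its last run and adds them to a running monthly total kept somewhere persistent (DynamoDB, a parameter, etc. - left out here); once the total crosses the monthly budget, you could call update_sampling_rule to throttle ingestion for the rest of the month.

```python
import boto3
from datetime import datetime, timedelta, timezone

xray = boto3.client("xray")

def count_recent_traces(minutes: int = 15) -> int:
    """Count traces recorded in the last `minutes` minutes (run on a matching schedule)."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    total = 0
    paginator = xray.get_paginator("get_trace_summaries")
    for page in paginator.paginate(StartTime=start, EndTime=end, Sampling=False):
        total += len(page["TraceSummaries"])
    return total
```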
We use Event Hub, and the intent is to be able to archive the inbound event data for troubleshooting/analytics reasons. Understandably, the built-in Event Hub Capture plays that role; however, looking at the price tag, my boss is not happy. His question is: what benefit does it have compared to us simply writing a function to bridge the Event Hub to some sort of storage (e.g. Blob) ourselves, and would that justify the cost saving in the long run?
I don't know how to answer this, could you please help?
The Azure Functions consumption plan is billed mainly on the number of executions, whereas Event Hub Capture is billed on the number of throughput units (TUs).
Here are a couple of things that can help reduce the Function app's execution count:
A smaller EH partition count - for example, 4 partitions would deliver events in larger batches than 32 partitions would.
A larger batchSize in the function app's config.
Since you have only 3 partitions and 1 TU of traffic to process, you will probably save by running a function rather than Capture. I recommend doing some test runs to see how many executions are incurred; then you can compare the hourly cost of the Function app to the $0.10 hourly fixed cost of EH Capture.
I assume storage-side billing will probably be similar, or you can even try to reduce it further by increasing batching and decreasing the number of storage calls.
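For illustration, a minimal sketch of such a bridge function in Python (the container name, connection-string setting, and blob naming are assumptions; the Event Hub trigger binding must have cardinality set to "many" so events arrive in batches):

```python
# __init__.py of an Event Hub triggered Azure Function (Python programming model v1).
# Raising maxBatchSize in host.json reduces the number of executions and storage calls.
import os
import uuid
from datetime import datetime, timezone
from typing import List

import azure.functions as func
from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string(os.environ["ARCHIVE_STORAGE_CONNECTION"])
container = blob_service.get_container_client("eventhub-archive")  # hypothetical container name


def main(events: List[func.EventHubEvent]) -> None:
    # One blob per invocation: the bigger the batch, the fewer executions.
    payload = "\n".join(e.get_body().decode("utf-8") for e in events)
    blob_name = f"{datetime.now(timezone.utc):%Y/%m/%d/%H%M%S}-{uuid.uuid4().hex}.jsonl"
    container.upload_blob(name=blob_name, data=payload)
```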
I've been using Copperegg for a while now and have generally been happy with it until lately, where I have had a few issues. It's being used to monitor a number of EC2 instances that must be up 24/7.
Last week I was getting phantom alerts that servers had gone down when they hadn't, which I can cope with, but I also didn't get an alert when I should have: one server had high CPU for over 5 minutes when the alert should be triggered after 1 minute. CopperEgg support weren't all that helpful, merely agreeing that an alert should have been triggered.
The latter of those problems is unacceptable, and if it were to happen again outside of working hours then serious problems would follow.
So, I'm looking for alternative services that will do that same job. I've looked at Datadog and New Relic, but both have a significant problem in that they will only alert me of a problem 5 minutes after it's occurred, rather than the 1 minute I can get with Copperegg.
What else is out there that can do the same job and will also integrate with Pager Duty?
tl;dr: Amazon CloudWatch will do what you want, and probably much, much more.
I believe that Amazon actually offers a service that would accomplish your goal - CloudWatch (pricing). I'm going to take your points one by one. Note that I haven't actually used it before, but the documentation is fairly clear.
One server had high CPU for over 5 mins when the alert should be triggered after 1 minute
It looks like CloudWatch can be configured to send an alert (which I'll get to) after one minute of a condition being met.
One can actually set conditions for many other metrics as well - this is what I see on one of my instances, and I think that detailed monitoring (I use the free tier) might expose even more.
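As a rough sketch, a 1-minute CPU alarm could be created with boto3 like this (the instance ID, threshold, and SNS topic ARN are placeholders):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm after a single 60-second period of high CPU; the SNS topic is what
# PagerDuty subscribes to (see the SNS pricing note below).
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-i-0123456789abcdef0",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=60,                # requires detailed monitoring for 1-minute datapoints
    EvaluationPeriods=1,      # trigger after one breaching minute
    Threshold=90.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:pagerduty-alerts"],
)
```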
What else is out there that can do the same job and will also integrate with Pager Duty?
I'm assuming you're talking about this. It turns out that PagerDuty has a helpful guide just for integrating CloudWatch. How nice!
Pricing
Here's the pricing page, as you would probably like to parse it instead of me telling you. I'll give a brief overview, though:
You don't want basic monitoring, as it only gives you metrics once per five minutes (which you've indicated is unacceptable.) Instead, you want detailed monitoring (once every minute).
For an EC2 instance, the price for detailed monitoring is $3.50 per instance per month. Additionally, every alarm you make is $0.10 per month. This is actually very cheap compared to CopperEgg's pricing - $70/mo versus maybe $30 per month for 9 instances and copious amounts of alarms. In reality, you'll probably be paying more like $10/mo.
Pager Duty's tutorial suggests you use SNS, which is another cost. The good thing: it's dirt cheap. $0.60 per million notifications. If you ever get above a dollar in a year for SNS, you need to perform some serious reliability improvements on your servers.
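A minimal sketch of that wiring with boto3 (the integration URL is a placeholder; the exact endpoint comes from PagerDuty's CloudWatch guide):

```python
import boto3

sns = boto3.client("sns")

# Create the topic the CloudWatch alarm publishes to, then subscribe the
# PagerDuty integration endpoint to it.
topic_arn = sns.create_topic(Name="pagerduty-alerts")["TopicArn"]
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="https",
    Endpoint="https://events.pagerduty.com/integration/<integration-key>/enqueue",  # placeholder
)
```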
Other shiny things!
You're not just limited to Amazon's pre-packaged metrics! You can actually send custom metrics (the time it took to complete a cronjob, whatever) to CloudWatch via a PUT request. Quite handy.
Submit Custom Metrics generated by your own applications (or by AWS resources not mentioned above) and have them monitored by Amazon CloudWatch. You can submit these metrics to Amazon CloudWatch via a simple Put API request.
(from here)
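As a quick sketch with boto3 (the namespace and metric name are invented):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish a custom metric, e.g. how long a cron job took.
cloudwatch.put_metric_data(
    Namespace="MyApp/Cron",
    MetricData=[{
        "MetricName": "BackupDurationSeconds",
        "Value": 42.0,
        "Unit": "Seconds",
    }],
)
```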
Conclusion
So all in all: CloudWatch is quite cheap, can do 1-minute frequency stats, and will integrate with Pager Duty.
tl;dr: Server Density will do what you want, on top of that it has web checks and custom metrics too.
In short Server Density is a monitoring tool that will monitor all the relevant server metrics. You can take a look at this page where it’s all described.
One server had high CPU for over 5 mins when the alert should be triggered after 1 minute
Server Density's open source agent collects and posts the data to their server every minute, and you can decide yourself when an alert should be triggered - for example, alerting 1 person after 1 minute and then repeating the alert every 5 minutes.
There are a lot of other metrics that you can alert on too.
What else is out there that can do the same job and will also integrate with Pager Duty?
Server Density also integrates with PagerDuty. The only thing you need to do is generate an API key at PagerDuty and provide it in the settings; you can then check PagerDuty as one of the alert recipients.
Pricing
You can find the pricing page here. I'll give you a brief overview of it. The pricing starts at $10 for one server plus one web check, and then gets cheaper per server the more servers you add.
Everything will be monitored once every minute, and there are no fees for the number of alerts configured or triggered, even if it's an SMS to your phone number. The cost is slightly higher than the CloudWatch example, but the support is good. If you used CopperEgg before, they have a migration tool too.
Other shiny things!
Server Density allows you to monitor all the things! The only thing you need to do is send it custom metrics, which you can do with a plugin written by yourself or by someone else.
I have to say that the graphs Server Density provides are somewhat akin to eye candy too. Most other monitoring solutions I've seen out there have quite dull dashboards.
Conclusion
It will do the job for you. Not as cheap as CloudWatch, but it doesn't lock you into AWS. It'll give you 1-minute frequency metrics and integrate with PagerDuty, plus a lot more.
How should we architect a solution that uses Amazon Mechanical Turk API to process a stream of tasks instead of a single batch of bulk tasks?
Here's more info:
Our app receives a stream of about 1,000 photos and videos per day. Each picture or video contains 6-8 numbers (it's the serial number of an electronic device) that need to be transcribed, along with a "certainty level" for the transcription (e.g. "Certain", "Uncertain", "Can't Read"). The transcription will take under 10 seconds per image and under 20 seconds per video and will require minimal skill or training.
Our app will get uploads of these images continuously throughout the day and we want to turn them into numbers within a few minutes. The ideal solution would be for us to upload new tasks every minute (under 20 per minute during peak periods) and download results every minute too.
Two questions:
To ensure a good balance of fast turnaround time, accuracy, and cost effectiveness, should we submit one task at a time, or is it best to batch tasks? If so, what variables should we consider when setting a batch size?
Are there libraries or hosted services that wrap the MTurk API to more easily handle use-cases like ours where HIT generation is streaming and ongoing rather than one-time?
Apologies for the newbie questions, we're new to Mechanical Turk.
Streaming tasks one at a time to Turk
You can stream tasks individually through Mechanical Turk's API by using the CreateHIT operation. Every time you receive an image in your app, you can call the CreateHIT operation to immediately send the task to Turk.
You can also set up notifications through the API, so you can be alerted as soon as a task is completed. Turk Notification API Docs
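For illustration, a minimal sketch using boto3's MTurk client, which is a current way to call the same CreateHIT operation (the reward, timings, and external form URL are placeholders):

```python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# ExternalQuestion pointing at a hypothetical HTTPS page that shows one image and a text box.
QUESTION_XML = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/transcribe?image_id={image_id}</ExternalURL>
  <FrameHeight>400</FrameHeight>
</ExternalQuestion>
"""

def submit_image(image_id: str) -> str:
    """Create one HIT per incoming image as soon as it is uploaded."""
    hit = mturk.create_hit(
        Title="Transcribe the serial number in this image",
        Description="Type the 6-8 digit serial number and say how certain you are.",
        Keywords="transcription, image, serial number",
        Reward="0.05",                    # placeholder price in USD
        MaxAssignments=1,
        AssignmentDurationInSeconds=120,
        LifetimeInSeconds=3600,
        Question=QUESTION_XML.format(image_id=image_id),
    )
    return hit["HIT"]["HITId"]
```

To avoid polling for results every minute, the notification API mentioned above (UpdateNotificationSettings) can push assignment-completed events to SQS or SNS instead.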
Batching vs Streaming
As for batching vs streaming, you're better off streaming to achieve a good balance of turnaround time and cost. Batching won't drive down costs much, and improving accuracy depends largely on vetting, reviewing, and tracking worker performance, either manually or through automated processes.
Libraries and Services
Most libraries offer all of the operations available in the API, so you can just search Google or GitHub for a library in your programming language. (We use the Ruby library rturk.)
A good list of companies that offer hosted solutions can be found under the Metaplatforms section of an answer on Quora to the question: What are some crowdsourcing services similar to Amazon Mechanical Turk? (Disclaimer: my company, Houdini, is one of the solutions listed there.)