I'm trying to set up a CloudWatch graph using SEARCH. I've done this before, but I can't get it to work the way I want this time. My issue is that I want to include only some of the metric dimensions in the graph. I can search on just the namespace and metric, like this:
SEARCH('Namespace="MyServiceName" MetricName="LatencyFromCreation"', 'Average', 300)
That aggregates, but it doesn't give me any dimensions. I can include all the dimensions like this:
SEARCH('{MyServiceName,ResultState,ItemType,LogGroup,ServiceName,ServiceType} LatencyFromCreation', 'Average', 300)
That doesn't do any aggregation; it shows me every combination of dimensions. But when I try to remove some of them (as described in the docs and shown in the examples) to keep only the dimensions I'm interested in, like this:
SEARCH('{MyServiceName,ItemType} LatencyFromCreation', 'Average', 300)
I get no results, even though the docs seem to make it clear that this should work. Am I missing something? Are the docs inaccurate?
I found this question, which is related, but the answer that mentions using SEARCH does not make it clear whether it is displaying a subset of the dimensions or not.
I was able to find a solution by using the CloudWatch Embedded Metric Format (EMF) as the log format emitted by my services (as mentioned in this post that Ryan linked), combined with Metric Filters that create new metrics from the EMF logs, in addition to the ones CloudWatch creates automatically. This allowed me to choose exactly which dimensions were included in the metric by adjusting the pattern on the filter. I was then able to use all of the normal Metric Math goodness that CloudWatch provides on this new metric.
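For reference, here is a minimal sketch of what one of those EMF log lines can look like, using the metric and dimension names from my question (the helper function itself is hypothetical):

import json
import time

def emit_latency_metric(latency_ms, item_type):
    # Only the keys listed under "Dimensions" become metric dimensions;
    # any other root-level fields stay as plain, queryable log fields.
    print(json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "MyServiceName",
                "Dimensions": [["ItemType"]],  # keep only the dimensions you want
                "Metrics": [{"Name": "LatencyFromCreation", "Unit": "Milliseconds"}],
            }],
        },
        "ItemType": item_type,
        "LatencyFromCreation": latency_ms,
    }))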
Feels pretty hacky, but it works.
I am interested in using the tune library for reinforcement learning, and I would like to use its built-in TensorBoard capability. However, the metric I am using to tune my hyperparameters is based on a time-consuming evaluation procedure that should be run only infrequently.
According to the documentation, it looks like the _train method returns a dictionary that is used both for logging and for tuning the hyperparameters. Is it possible to perform logging more frequently within the _train method? Alternatively, could I return the values that I wish to log from _train but sometimes omit the expensive-to-compute metric from the dictionary?
One option is to use your own logging mechanism in the Trainable. You can log to the trial-specific directory (Trainable.logdir). If this conflicts with the built-in TensorBoard logging, you can disable that by setting tune.run(loggers=None).
Another option is, as you mentioned, to omit the expensive-to-compute metric from the dictionary some of the time. If you run into issues with that, you can also return None as the value for the metrics that you don't plan to compute in a particular iteration.
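A rough sketch of that second approach, assuming the class-based Trainable API (the random values are placeholders for your real training and evaluation code):

import random

from ray import tune

class MyTrainable(tune.Trainable):
    def _train(self):
        # Cheap metric computed on every iteration (placeholder value).
        result = {"cheap_metric": random.random()}
        # Only pay for the expensive evaluation every 10th iteration;
        # in between, return None so the result schema stays consistent.
        if self._iteration % 10 == 0:
            result["expensive_metric"] = self._expensive_eval()
        else:
            result["expensive_metric"] = None
        return result

    def _expensive_eval(self):
        # Placeholder for the time-consuming evaluation procedure.
        return random.random()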
Hope that helps!
I'm using AWS SageMaker, and I want to create something that, given a text description, recognizes the place being described. Is that possible?
If there are no other classes besides the text that you would like your model to identify, you may not need a multiclass classifier.
You could train your own text detection model on Amazon SageMaker with the built-in Object Detection Algorithm, using a dataset of labelled examples, but this becomes rather involved for a problem that already has existing solutions available.
If the appearance of the text you're trying to detect is identical each time, the problem space is reduced from interpreting variable text to simply gathering enough examples and performing object detection on the visual "pattern" your text forms. Note that if the text appeared in different fonts or styles, a generic object-detection model would not handle that dynamically, and an OCR-based solution would likely be necessary.
More broadly, for text identification in images on AWS, you have quite a few options:
Amazon Rekognition has a DetectText method that will enable you to easily find text within an image. If it's a small or simple phrase of alphanumeric characters, this should work very well for your use case (see the short boto3 sketch after this list).
Amazon Textract will help you perform OCR (optical character recognition) while retaining the structure of the source. This is great for documents and tables, but it doesn't sound applicable to your use case.
The AWS Marketplace also has hosted options available from third-party vendors. One example for text region identification is this one from RocketML.
There are also some great open source tools I'd recommend looking into: OpenCV for ascertaining the text bounding boxes, and Tesseract for OCR and text extraction. This blog post does a good job walking through the process of using them together.
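To give a feel for the Rekognition option mentioned above, here's a minimal boto3 sketch of the DetectText call; the bucket and key are placeholders:

import boto3

rekognition = boto3.client("rekognition")

# Detect text in an image stored in S3 (bucket and key are placeholders).
response = rekognition.detect_text(
    Image={"S3Object": {"Bucket": "my-bucket", "Name": "photo.jpg"}}
)

for detection in response["TextDetections"]:
    # Each detection is a LINE or WORD with a confidence score.
    print(detection["Type"], detection["DetectedText"], detection["Confidence"])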
Any of these will help to solve your problem of performing OCR/text identification on AWS, but the best choice comes down to what your current and future needs are, and how quickly you're looking to implement the feature.
Your question is not clear regarding the data that you have or the problem that you want to solve.
If you have a text that includes a place name in it (for example, "I visited Seattle and enjoyed the fish market"), you can use Amazon Comprehend's named entity extraction, which detects entity types including places ("Seattle" in the above example):
{
    "Entities": [
        {
            "Score": 0.9857407212257385,
            "Type": "LOCATION",
            "Text": "Seattle",
            "BeginOffset": 10,
            "EndOffset": 17
        }
    ]
}
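For completeness, a minimal boto3 sketch of the DetectEntities call that produces output of this shape:

import boto3

comprehend = boto3.client("comprehend")

response = comprehend.detect_entities(
    Text="I visited Seattle and enjoyed the fish market",
    LanguageCode="en",
)

# Keep only the location entities ("Seattle" in the example above).
places = [e["Text"] for e in response["Entities"] if e["Type"] == "LOCATION"]
print(places)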
If the description is more general and you want to classify whether the description is of a hotel, a restaurant, a theme park, a concert/show, or a similar type of place, you can use either Custom Classification in Comprehend or the Neural Topic Model in SageMaker (https://docs.aws.amazon.com/sagemaker/latest/dg/ntm.html). You will need some labelled examples of the classes, i.e. documents/sentences to use for model training.
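If you go the Custom Classification route, kicking off training is a single API call once your labelled examples are in S3. A rough boto3 sketch, where the classifier name, role ARN, and S3 URI are all placeholders:

import boto3

comprehend = boto3.client("comprehend")

# Train a custom classifier from a CSV of "label,document" rows in S3.
comprehend.create_document_classifier(
    DocumentClassifierName="place-type-classifier",
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendAccess",
    InputDataConfig={"S3Uri": "s3://my-bucket/training/places.csv"},
    LanguageCode="en",
)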
I have built a classification model using Weka. I have two classes, namely {spam, non-spam}. After applying the StringToWordVector filter, I get 10,000 attributes for 19,000 records. Then I use the LIBLINEAR library to build the model, which gives me F-scores as follows:
Spam: 94%
Non-spam: 98%
When I use the same model to predict new instances, it predicts all of them as spam.
Also, when I use a test set that is identical to the training set, it predicts all of them as spam too. I am exhausted from trying to find the problem. Any help will be appreciated.
I also get this wrong every so often. Then I watch this video to remind myself how it's done: https://www.youtube.com/watch?v=Tggs3Bd3ojQ, where Prof. Witten, one of the Weka developers/architects, shows how to correctly use the FilteredClassifier (which in turn is configured to load the StringToWordVector filter) on the training set and the test set.
This is shown for Weka 3.6; Weka 3.7 might be slightly different.
What does ZeroR give you? If it's close to 100%, you know that any classification algorithm should not be too far off either.
Why do you optimize for F-measure? Just asking; I have never used it and don't know much about it. (I would optimize for the "Precision" metric, assuming you have much more spam than non-spam.)
Given that I have a description of, say, product X in multiple languages, along with its price and availability dictated by region/locale, how do I go about telling Django to render the most appropriate variant of the content based on the region the request originates from? Amazon would be a good example of what I am trying to achieve.
Is it best to store each variant in the database and then look at the request headers to serve the most appropriate content, or is there a best-practice way to achieve this?
I was struggling with the same problem. The localeurl library seems to handle these cases, so you don't have to write the logic yourself. I still haven't tested the library, but at first glance it seems to be exactly what we need. You can read more about it here.
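In the meantime, Django's built-in LocaleMiddleware already picks a language per request from the session or the Accept-Language header. A minimal settings.py sketch (localeurl layers URL-based language prefixes on top of this kind of setup):

# settings.py -- minimal sketch of Django's built-in locale detection.
USE_I18N = True

LANGUAGES = (
    ('en', 'English'),
    ('de', 'German'),
    ('fr', 'French'),
)

MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.locale.LocaleMiddleware',  # must come after SessionMiddleware
    'django.middleware.common.CommonMiddleware',
)

Region-specific data such as price and availability would still be stored per locale in the database; in a view you can then pick the right variant using request.LANGUAGE_CODE, which LocaleMiddleware sets for you.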
I use rrdtool, and what I want to do now is send an alert if something does not follow the expected values. I tried rrdtool's Holt-Winters feature, but I was looking for something simpler.
Any suggestions?
Have a look at updatev (instead of update): it will tell you what it writes to the database. This information can be used nicely to decide on threshold violations and such.
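For example, a rough sketch of acting on updatev's output from a Python script; the RRD path, update value, and threshold are placeholders, and the exact output format should be double-checked against your rrdtool version:

import subprocess

THRESHOLD = 100.0  # placeholder alert threshold

# updatev prints each value it actually writes to the RRAs, in lines
# roughly like: [1183948062]RRA[AVERAGE][1]DS[latency] = 4.2000000000e+01
out = subprocess.check_output(
    ["rrdtool", "updatev", "metrics.rrd", "N:42"], text=True
)

for line in out.splitlines():
    if "=" not in line:
        continue
    label, _, value = line.partition("=")
    value = value.strip()
    if value.lower() not in ("nan", "-nan") and float(value) > THRESHOLD:
        print("ALERT: %s exceeded %s: %s" % (label.strip(), THRESHOLD, value))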