AWS CloudWatch dynamic labels not working properly - amazon-web-services

I wrote a query to show some metrics in a graph in AWS CloudWatch. The query groups by 2 different dimensions, and the default label is hard to understand:
I was trying to use dynamic labels to make the label more expressive:
[action: ${PROP('Dim.Action')}, exception: ${PROP('Dim.exception')}]
But the values of the dimensions never get printed (the dimension names are correct):
I tried other properties, such as the namespace or metric name, but none of them get printed either.
Any idea what might be preventing the dynamic labels from working correctly?

According to this answer on AWS re:Post, this type of dynamic labelling is only supported for single metrics, not for queries, so this will simply not work at the moment.
https://repost.aws/questions/QUOqinLRJFQIC_sI-NclckOw/aws-cloud-watch-graphed-metrics-are-dynamic-labels-with-dimensions-broken
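For reference, here is a rough sketch of that difference in a dashboard definition, using boto3. Everything below (namespace, metric, dimension values, dashboard name) is made up for illustration; only the ${PROP(...)} label syntax comes from the question and the linked answer.

```python
import json
import boto3  # assumes AWS credentials are already configured

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "properties": {
                "region": "us-east-1",
                "title": "Dynamic label demo",
                "metrics": [
                    # Single metric: the ${PROP(...)} placeholder resolves in the legend.
                    ["MyApp", "Errors", "Action", "Login",
                     {"label": "action: ${PROP('Dim.Action')}"}],
                    # Search/query expression: the same placeholder is not resolved,
                    # which matches the limitation described in the linked answer.
                    [{"expression": "SEARCH('{MyApp,Action,exception} MetricName=\"Errors\"', 'Sum', 300)",
                      "label": "action: ${PROP('Dim.Action')}"}],
                ],
            },
        }
    ]
}

cloudwatch.put_dashboard(
    DashboardName="dynamic-label-demo",
    DashboardBody=json.dumps(dashboard_body),
)
```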

Related

Can I use pre-labeled data in AWS SageMaker Ground Truth NER?

Let's say I have some text data that has already been labeled in SageMaker. This data could have been labeled either by humans or by an NER model. Then let's say I want to have a human go back over the dataset, either to label a new entity class or to correct existing labels. How would I set up a labeling job to allow this? I tried using an output manifest from another labeling job, but none of the documents that were already labeled can be accessed by workers for re-labeling.
Yes, this is possible. You are looking for custom labeling workflows. You can also apply either Majority Voting (MV) or MDS to evaluate the accuracy of the job.
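As a rough, hedged sketch of what re-labeling from a previous job's output manifest could look like with boto3: every ARN, bucket, template, and the choice of a new LabelAttributeName below are placeholders/assumptions rather than values from the question, and the pre-annotation and consolidation Lambdas are what make this a custom workflow.

```python
import boto3  # assumes AWS credentials and an existing private workteam

sagemaker = boto3.client("sagemaker")

sagemaker.create_labeling_job(
    LabelingJobName="ner-relabel-pass-2",
    # A new attribute name so documents labeled in the first pass are offered
    # to workers again instead of being skipped (assumption about how Ground
    # Truth treats objects that already carry this attribute).
    LabelAttributeName="entities-v2",
    InputConfig={
        "DataSource": {
            "S3DataSource": {
                # Output manifest produced by the first labeling job.
                "ManifestS3Uri": "s3://my-bucket/first-job/manifests/output/output.manifest"
            }
        }
    },
    OutputConfig={"S3OutputPath": "s3://my-bucket/relabel-output/"},
    RoleArn="arn:aws:iam::123456789012:role/SageMakerGroundTruthRole",
    HumanTaskConfig={
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:123456789012:workteam/private-crowd/my-team",
        # Custom template that renders the existing entity labels for review.
        "UiConfig": {"UiTemplateS3Uri": "s3://my-bucket/templates/ner-review.liquid.html"},
        # Custom-workflow Lambdas: pre-fill existing labels, then consolidate
        # the workers' answers (e.g. by majority voting).
        "PreHumanTaskLambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:pre-label",
        "AnnotationConsolidationConfig": {
            "AnnotationConsolidationLambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:consolidate-labels"
        },
        "TaskTitle": "Review and correct entity labels",
        "TaskDescription": "Fix or add entity spans on pre-labeled documents",
        "NumberOfHumanWorkersPerDataObject": 3,
        "TaskTimeLimitInSeconds": 600,
    },
)
```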

AWS Elasticsearch exceeded limit of total fields in index

I'm running Elasticsearch on AWS, and haven't quite understood how to properly address this issue.
Right now I have the items stored in DynamoDB and use DynamoDB Streams to send the items to a Lambda that then uses dynamodb-stream-elasticsearch to send them to Elasticsearch when they are created/updated.
Some properties can be objects with many nested properties, which can themselves be objects; it was when these new fields were added that I first started getting this error. Due to the nature of these items, these new properties will need to be searchable in the future.
Initially the default field limit had not been changed. After my first search on how to fix this I increased the limit to 5000, and have now had to increase it to 12000. The instance type is t2.small.elasticsearch. The AWS Elasticsearch console is already reporting the instance health as yellow after I increased the limit.
Which is the best way to tackle this sort of situation?
Does increasing the instance type fix it, or is it a matter of breaking up the item and having multiple separate indexes? If the solution is the latter, is there a good tutorial/guide on how to do this with this setup (AWS DynamoDB/Elasticsearch)?
By default, the maximum number of fields in an index is 1000, but you can increase that by changing the index.mapping.total_fields.limit index setting.
See other settings to prevent mappings explosion: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/mapping.html#mapping-limit-settings
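As a minimal sketch, raising that limit is a single settings update against the index; the endpoint and index name below are placeholders, and on an AWS domain the request has to be allowed by your access policy (or signed):

```python
import requests  # plain HTTP for illustration; any Elasticsearch client works

ES_ENDPOINT = "https://my-domain.us-east-1.es.amazonaws.com"  # placeholder

# Raise the per-index field limit (the default is 1000).
response = requests.put(
    f"{ES_ENDPOINT}/my-index/_settings",
    json={"index.mapping.total_fields.limit": 2000},
)
response.raise_for_status()
print(response.json())
```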
Which is the best way to tackle this sort of situation?
Using the flattened datatype could be a solution:
The nested type is a specialised version of the object datatype that allows arrays of objects to be indexed in a way that they can be queried independently of each other.
When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with key and value fields. Instead, consider using the flattened datatype, which maps an entire object as a single field and allows for simple searches over its contents. Nested documents and queries are typically expensive, so using the flattened datatype for this use case is a better option.
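A small sketch of what mapping one of those deeply nested objects as flattened might look like (the index and field names are made up, and flattened is only available from Elasticsearch 7.3 with X-Pack, so it is worth checking whether your AWS Elasticsearch version supports it):

```python
import requests

ES_ENDPOINT = "https://my-domain.us-east-1.es.amazonaws.com"  # placeholder

# Map the deeply nested "attributes" object as a single flattened field so its
# inner keys no longer count towards index.mapping.total_fields.limit.
mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "attributes": {"type": "flattened"},
        }
    }
}

response = requests.put(f"{ES_ENDPOINT}/items-v2", json=mapping)
response.raise_for_status()
```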

Google Stackdriver Log Based Metrics: how to extract values using a regular expression from the log line

I have log lines of the following form in my Google Cloud Console:
Updated blacklist info about 123 minions. max_blacklist_per_minion=20, median_blacklist_per_minion=8, blacklist_free_minions=31
And I'm trying to set up some log-based metrics to get a longer-term overview of the values (i.e. how are they changing? Are they lower or higher than yesterday? etc.).
However, I didn't find any examples for this scenario in the documentation, and what I could think of doesn't seem to work. Specifically, I'm trying to understand what I need to select in "Field name" to get access to the log line (so that I can write a regular expression against it).
I tried textPayload, but that seems to be empty for this log entry. Looking at the actual log entry there should also be a protoPayload.line[0], but that doesn't seem to work either.
In the "Metric Editor" built into the logs viewer UI you can use "protoPayload.line.logMessage" as the field name. For some reason the UI doesn't want to suggest 'line' (seems like a bug; same behavior in the filter box).
The log-based metric won't distinguish based on the index of the app log line, so something like 'line[0]' won't work. For a distribution, all values are extracted. A count metric would count the log entry (i.e. 1 regardless of the number of 'line' matches).
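To make the extraction concrete, a distribution metric would use a regular expression with a single capture group against protoPayload.line.logMessage; the pattern below is just one example, run against the sample line from the question:

```python
import re

# Sample application log line from the question.
line = ("Updated blacklist info about 123 minions. "
        "max_blacklist_per_minion=20, median_blacklist_per_minion=8, "
        "blacklist_free_minions=31")

# The same pattern can be pasted into the metric editor's regular expression
# field; the single capture group is the value recorded by the distribution.
pattern = r"median_blacklist_per_minion=(\d+)"

match = re.search(pattern, line)
if match:
    print(int(match.group(1)))  # -> 8
```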

What is the effect of using a filtered classifier over a normal classifier in Weka

I have used Weka for text classification. First I used the StringToWordVector filter, and the filtered data were used with an SVM classifier (LibSVM) for cross-validation. Later I read a blog post here.
It said that it is not suitable to apply the filter first and then perform cross-validation. Instead, it proposes using FilteredClassifier. The justification is:
Two weeks ago, I wrote a post on how to chain filters and classifiers in WEKA, in order to avoid misleading results when performing experiments with text collections. The issue was that, when using N Fold Cross Validation (CV) in your data, you should not apply the StringToWordVector (STWV) filter on the full data collection and then perform the CV evaluation on your data, because you would be using words that are present in your test subset (but not in your training subset) for each run.
I cannot understand the reason behind this. Does anyone know why?
When you use the filter before N-fold cross-validation, you are filtering every word that appears in each instance, regardless of whether it is a test instance or a training instance. At that point the filter has no way to know whether an instance is a test instance or a training instance. So if you are using StringToWordVector with TFTransform or any similar operation, words in the test instances may affect the transformed values. (Simply put, if you are implementing a bag of words, you would be taking the test instances into consideration too.) This is not acceptable, since the training parameters should not be affected by the testing data. So instead you can do the filtering on the fly: that is FilteredClassifier.
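The same idea sketched outside Weka, purely for illustration: in scikit-learn, putting the vectorizer and classifier into one Pipeline plays the role of FilteredClassifier, so each cross-validation fold builds its vocabulary from its training split only (the data here is toy data, not from the question):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = ["good movie", "great film", "terrible plot",
         "awful acting", "wonderful story", "boring scenes"]
labels = [1, 1, 0, 0, 1, 0]

# Leaky setup (analogous to running StringToWordVector on the full dataset first):
# X = CountVectorizer().fit_transform(texts)   # vocabulary has seen the test folds
# cross_val_score(LinearSVC(), X, labels, cv=3)

# Safe setup (analogous to FilteredClassifier): the vectorizer is refit inside
# each training fold, so words from the test fold never influence the features.
pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("classifier", LinearSVC()),
])
print(cross_val_score(pipeline, texts, labels, cv=3))
```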
To get an idea of how N-fold cross-validation works, please refer to Rushdi Shams's answer to the following question. Please let me know whether you understood it. Cheers..!!
Cross Validation in Weka

Google Analytics exclude empty custom variable in a custom report

I have a custom variable set for all visitors; for our registered users it's some value, for unregistered users, it's empty.
I can find unregistered users in an advanced segment using the settings Exclude Custom Variable (Value 02) Matching Regexp .+ -- works brilliantly.
But I need a report of unregistered visitors for a dashboard, and tried to do the same thing with a filter. I have a metric of Visits and a dimension of something all visitors will have (e.g. Browser). My filter is identical to the one in the advanced segment, but ... not brilliant. I get no visits. I have tried to Include with a regex ^$ but no love there, either.
Any ideas what I am doing wrong?
To help you understand your problem and the solution, let me illustrate how data recording works in any collection process (Google Analytics is one of the tools used for data collection and analysis):
To record and analyse data, you first decide what you want to record, and then how. Maybe this how is where Google Analytics comes in for you. So, the data that you want to see is the metric; it can have a name and a (usually numeric) value, and each dimension is how you want to separate or drill down into the various views of the data. As an example, if you want to know how many visitors visited your site every day, and you want to be able to see through which source they came, Daily Visitor Count is your metric and Source is your dimension.
The important thing to understand here is that dimensions and metrics are not bound together. What I mean is that deciding that Daily Visitor Count should be viewable by Source doesn't add a source to every update of the Daily Visitor Count metric. In order to view the metric by the dimension, you need to record a value for the dimension every time you record the metric.
If you don't record a dimension for a metric, then you cannot retrieve those metric values by applying a filter on that dimension. A dimension filter only gives you access to the values recorded against the dimension, not to all metrics, because dimensions don't contain values of metrics; only metrics can optionally carry values for dimensions.
So when you query "dimension matches regex .+", it works with both include and exclude, but you cannot query metrics with an empty dimension using a dimension filter. The best fix would be to record a standard or default value for the dimension, something like (not set) or unknown, every time you record the metric, so that you can separate the two groups.
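A toy illustration of that last point, just mimicking the metric/dimension model described above in plain Python (the field names are made up):

```python
# Each "hit" carries a metric plus whatever dimensions were actually recorded.
hits = [
    {"visits": 1, "customVar2": "registered"},
    {"visits": 1},                               # unregistered: variable never set
    {"visits": 1, "customVar2": "registered"},
]

# Filtering on the dimension only sees hits where a value was recorded, so an
# "include ^$" (empty value) filter still matches nothing for the second hit.
unregistered = [h for h in hits if h.get("customVar2") == ""]
print(sum(h["visits"] for h in unregistered))    # 0 -- the problem in the question

# Recording a default such as "(not set)" on every hit makes the filter usable.
hits_with_default = [dict(h, customVar2=h.get("customVar2", "(not set)")) for h in hits]
unregistered = [h for h in hits_with_default if h["customVar2"] == "(not set)"]
print(sum(h["visits"] for h in unregistered))    # 1
```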
Hope that helps. :)
I just hope you understand that what you were trying to do is conceptually wrong, though it could still have been made technically feasible.