SageMaker Clarify Bias Detection for Continuous Features

How does SageMaker Clarify bias detection work for features that are continuous?
Does it bin continuous variables automatically or do users need to bin them themselves before running the bias detection job?
Using the fairness and explainability example, I selected the 'Capital Gain' facet (it has values 0-99999, no nulls) and set facet_values_or_threshold=[5000], expecting the split to occur at 5000.
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[0],
    facet_name='Capital Gain',
    facet_values_or_threshold=[5000]
)
The result was:
"error": "CI: facet set is empty. Check that x[facet] has non-zero length."
I assume this is because 'Capital Gain' doesn't contain the exact value 5000. I then tested with facet_values_or_threshold=[2174]
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[0],
    facet_name='Capital Gain',
    facet_values_or_threshold=[2174]
)
and got a result:
{
  "version": "1.0",
  "pre_training_bias_metrics": {
    "label": "Target",
    "facets": {
      "Capital Gain": [
        {
          "value_or_threshold": "2174",
          "metrics": [
            {
              "name": "CI",
              "description": "Class Imbalance (CI)",
              "value": 0.9969498043896293
            }
          ]
        }
      ]
    },
    "label_value_or_threshold": "0"
  }
}
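For reference, the Class Imbalance value Clarify reports can be reproduced by hand. The sketch below is my own illustration (not Clarify's implementation): it computes CI = (n_a - n_d) / (n_a + n_d), where n_d is the size of the disadvantaged (facet) group and n_a the size of the rest.

```python
def class_imbalance(facet_values, disadvantaged) -> float:
    """CI = (n_a - n_d) / (n_a + n_d), where n_d counts rows whose facet
    value satisfies the disadvantaged predicate and n_a counts the rest."""
    n_d = sum(1 for v in facet_values if disadvantaged(v))
    n_a = len(facet_values) - n_d
    return (n_a - n_d) / (n_a + n_d)

# Toy example: 9 rows with capital gain 0, 1 row with capital gain 2174.
values = [0] * 9 + [2174]
print(class_imbalance(values, lambda v: v == 2174))  # 0.8
```

A CI close to 1, as in the result above, means the facet group (here, rows with 'Capital Gain' equal to 2174) is a tiny fraction of the dataset.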

Related

AWS Step Function Error with Input to Map State

I have the following iteration state defined in a Map State:
"WriteRteToDB": {
  "Comment": "Write Rte to DB. Also records the risk calculations in the same table.",
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke",
  "End": true,
  "Parameters": {
    "FunctionName": "logger-lambda",
    "RtInfo.$": "States.Array($)",
    "ExecutionId.$": "$$.Execution.Id",
    "InitTime.$": "$$.Execution.StartTime"
  }
}
The parameters defined produce the following input:
{
  "FunctionName": "logger-lambda",
  "RtInfo": {
    "status": 200,
    "rte": {
      "date": "2022-06-05 00:00:00",
      "rt_value": 778129128.6631782,
      "lower_80": 0,
      "upper_80": 0.5,
      "location_id": "WeWork Office Space & Coworking, Town Square, Alpharetta, GA, USA",
      "syndrome": "Gastrointestinal"
    }
  },
  "InitTime": "2022-06-05T15:04:57.297Z",
  "ExecutionId": "arn:aws:states:us-east-1:1xxxxxxxxxx1:execution:RadaRx-rteForecast:0dbf2743-abb5-e0b6-56d0-2cc82a24e3b4"
}
But the following Error is produced:
{
  "error": "States.Runtime",
  "cause": "An error occurred while executing the state 'WriteRteToDB' (entered at the event id #28). The Parameters '{\"FunctionName\":\"logger-lambda\",\"RtInfo\":[{\"status\":200,\"rte\":{\"date\":\"2022-12-10 00:00:00\",\"rt_value\":1.3579795204795204,\"lower_80\":0,\"upper_80\":0.5,\"location_id\":\"Atlanta Tech Park, Technology Parkway, Peachtree Corners, GA, USA\",\"syndrome\":\"Influenza Like Illnesses\"}}],\"InitTime\":\"2022-06-05T16:06:10.132Z\",\"ExecutionId\":\"arn:aws:states:us-east-1:1xxxxxxxxxx1:execution:RadaRx-rteForecast:016a37f2-d01c-9bfd-dc3f-1288fb7c1af6\"}' could not be used to start the Task: [The field \"RtInfo\" is not supported by Step Functions]"
}
I have already tried wrapping RtInfo inside an array of length 1, as you can see above, since it is a state within the Map State. I have also checked the input size to make sure it does not exceed the 256 KB max input/output quota.
Your Task's Parameters field has incorrect syntax. With the arn:aws:states:::lambda:invoke resource, Step Functions only accepts the Lambda Invoke API's own fields (FunctionName, Payload, InvocationType, etc.) at the top level of Parameters. Pass RtInfo and the other user-defined inputs under the Payload key:
"Parameters": {
  "FunctionName": "logger-lambda",
  "Payload": {
    "RtInfo.$": "States.Array($)",
    "ExecutionId.$": "$$.Execution.Id",
    "InitTime.$": "$$.Execution.StartTime"
  }
}
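With the Payload wrapper, the Lambda function receives exactly the contents of Payload as its event. A minimal sketch of what logger-lambda might do with it (the handler body and return shape are my assumptions, only the field names come from the question):

```python
# Hypothetical handler for logger-lambda: with the corrected Parameters,
# the event is exactly the Payload object built by Step Functions.
def handler(event, context=None):
    rt_info = event["RtInfo"][0]  # States.Array($) wraps the item in a list
    rte = rt_info["rte"]
    # A real implementation would write these fields to the DB;
    # here we just return them to show the shape of the input.
    return {
        "execution_id": event["ExecutionId"],
        "init_time": event["InitTime"],
        "location": rte["location_id"],
        "rt_value": rte["rt_value"],
    }

sample_event = {
    "RtInfo": [{"status": 200,
                "rte": {"date": "2022-12-10 00:00:00",
                        "rt_value": 1.35,
                        "location_id": "X"}}],
    "ExecutionId": "arn:aws:states:us-east-1:111111111111:execution:demo:abc",
    "InitTime": "2022-06-05T16:06:10.132Z",
}
```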

AWS EventBridge Input transformation rule with array List

I have an event with an ArrayList:
"TelephoneDetails": {
  "Telephone": [
    {
      "Number": "<Number>",
      "Type": "<Type>",
      "Primary": "<Primary>",
      "TextEnabled": "<TextEnabled>"
    },
    {
      "Number": "<Number>",
      "Type": "<Type>",
      "Primary": "<Primary>",
      "TextEnabled": "<TextEnabled>"
    }
  ]
}
How do I write the InputTransformer's InputPath for this?
I can get Telephone[0] using this:
{
  "Type": "$.detail.payload.TelephoneDetails.Telephone[0].Type",
  "Number": "$.detail.payload.TelephoneDetails.Telephone[0].Number",
  "Primary": "$.detail.payload.TelephoneDetails.Telephone[0].Primary",
  "TextEnabled": "$.detail.payload.TelephoneDetails.Telephone[0].TextEnabled"
}
But I don't understand how to write it if the ArrayList has N elements.
It's as simple as:
"TextEnabled": "$.detail.payload.TelephoneDetails.Telephone[*].TextEnabled"
Note the [*] instead of [0], which lets the template engine iterate over the list of Telephones.
I think you can't do this with plain EventBridge syntax. Probably the best way would be to put a Lambda function as the target of your EventBridge rule, perform the transformations there, and then forward the result to your real target.
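Such a transforming Lambda could look roughly like this (a sketch; the handler and the flat output shape are my assumptions, the field names come from the event above):

```python
# Hypothetical transformer Lambda: extracts every Telephone entry from the
# EventBridge event and returns a flat list, ready to forward to the real target.
def transform(event):
    telephones = event["detail"]["payload"]["TelephoneDetails"]["Telephone"]
    return [
        {
            "Number": t["Number"],
            "Type": t["Type"],
            "Primary": t["Primary"],
            "TextEnabled": t["TextEnabled"],
        }
        for t in telephones
    ]

sample = {"detail": {"payload": {"TelephoneDetails": {"Telephone": [
    {"Number": "1", "Type": "mobile", "Primary": "yes", "TextEnabled": "yes"},
    {"Number": "2", "Type": "home", "Primary": "no", "TextEnabled": "no"},
]}}}}
```

This handles any N, at the cost of an extra Lambda invocation per event.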

How are Map states priced in AWS Step Functions?

I found the pricing documentation ambiguous with regard to the Map states that fan out based on an array in the input. Does anyone know if each fan out ends up being counted as a "state transition" incurring the $0.025 cost? Here is an example input and state machine definition for reference.
Input:
{
  "data": [
    // Is each of these going to be a "state transition"?
    { "name": "a" },
    { "name": "b" }
  ]
}
Definition:
{
  "StartAt": "Start",
  "States": {
    "Start": {
      "Type": "Map",
      "ItemsPath": "$.data",
      "End": true,
      "Iterator": {
        "StartAt": "Monitor",
        "States": {
          "Monitor": {
            "Type": "Task",
            "Resource": "some-lambda",
            "End": true
          }
        }
      }
    }
  }
}
The cost of Step Functions state transitions is $0.025 per 1,000 state transitions, i.e. $0.000025 per state transition (beyond the free tier). With AWS Step Functions, you pay for the number of state transitions.
So let's see how many state transitions a Map state creates. AWS Step Functions only charges for events that end with "Entered".
For each Map state we have at least these 4 events:
MapStateEntered (counted as a state transition)
MapStateStarted (not counted)
MapStateSucceeded (not counted)
MapStateExited (not counted)
And for each iteration of the Map state we have these 2 events:
MapIterationStarted (not counted)
MapIterationSucceeded (not counted)
Each state inside the iterator additionally produces its own StateEntered event, which is counted. So for a Map state the cost is:
cost = (1 + iterations * (steps inside iteration)) * $0.000025
So for your example (an execution with a Map state with 2 iterations of 1 step each), the overhead of the Map state is:
transitions: 1 + 2 * 1 = 3
cost: 3 * $0.000025 = $0.000075
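The counting rule above can be sketched as a quick calculator (the constant and the formula come from this answer, not from an AWS API):

```python
PRICE_PER_TRANSITION = 0.000025  # USD, standard Step Functions pricing

def map_state_transitions(iterations: int, steps_per_iteration: int) -> int:
    """Counted transitions: 1 for MapStateEntered, plus one
    StateEntered event per step in every iteration."""
    return 1 + iterations * steps_per_iteration

def map_state_cost(iterations: int, steps_per_iteration: int) -> float:
    return map_state_transitions(iterations, steps_per_iteration) * PRICE_PER_TRANSITION

# The example from the question: 2 iterations, 1 task each.
print(map_state_transitions(2, 1))  # 3
print(map_state_cost(2, 1))         # 7.5e-05
```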

Amazon EventBridge: Match an object inside of an array

I'm stuck on defining a rule that matches my events.
I've googled and tested.
Let's say we have the following event, which contains the object user in the array events:
{
  "version": "0",
  "...": "...",
  "detail": {
    "events": [
      {
        "user": {
          "id": "5efdee60b48e7c1836078290"
        }
      }
    ]
  }
}
Is there any way to match the user.id in an EventBus rule?
I've already tried to use the following rule which is not valid:
{
  "detail": {
    "events": [
      {
        "user": {
          "id": [
            "5efdee60b48e7c1836078290"
          ]
        }
      }
    ]
  }
}
then,
{
  "detail": {
    "events[0]": {
      "user": {
        "id": [
          "5efdee60b48e7c1836078290"
        ]
      }
    }
  }
}
also had no effect.
I don't want to give up, but I'm getting tired of it ;)
This pattern should work to match this event:
{
  "detail": {
    "events": {
      "user": {
        "id": [
          "5efdee60b48e7c1836078290"
        ]
      }
    }
  }
}
Today, EventBridge only supports matching simple values (string, integer, boolean, null) with an array. You can read more in the service documentation.
I did some playing around with your example, but I can't make it work. Based on reading "Arrays in EventBridge Event Patterns", I have to conclude that matching complex values inside arrays is not possible.
The quote that seems to confirm this is "If the value in the event is an array, then the pattern matches if the intersection of the pattern array and the event array is non-empty."
And from the Event Patterns page "Match values are always in arrays." So if your pattern is an array and the value in the event is also an array (this is the example you gave), a "set" based intersection test is performed. Your pattern would have to match the entire array entry, not just a single field like you have in the example.
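The set-intersection rule quoted above can be illustrated with a small sketch (my own illustration, not EventBridge code): a pattern array matches an event array only if they share at least one identical entry, compared as whole values.

```python
import json

def intersection_matches(pattern_values, event_values):
    """EventBridge-style array matching (illustration only): the pattern
    matches if the intersection of the two arrays is non-empty.
    Entries are compared as whole JSON values, which is why matching a
    single field inside an object entry does not work."""
    pattern_set = {json.dumps(v, sort_keys=True) for v in pattern_values}
    event_set = {json.dumps(v, sort_keys=True) for v in event_values}
    return bool(pattern_set & event_set)

events = [{"user": {"id": "5efdee60b48e7c1836078290"}}]
# A whole-entry pattern matches ...
print(intersection_matches([{"user": {"id": "5efdee60b48e7c1836078290"}}], events))  # True
# ... but a partial entry does not.
print(intersection_matches([{"id": "5efdee60b48e7c1836078290"}], events))  # False
```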

Elasticsearch _reindex fails

I am working on AWS Elasticsearch. It doesn't allow opening/closing an index, so settings changes cannot be applied to an existing index.
In order to change the settings of an index, I have to create a new index with the new settings and then move the data from the old index into the new one.
So first I created a new index with
PUT new_index
{
  "settings": {
    "max_result_window": 3000000,
    "analysis": {
      "filter": {
        "german_stop": {
          "type": "stop",
          "stopwords": "_german_"
        },
        "german_keywords": {
          "type": "keyword_marker",
          "keywords": ["whatever"]
        },
        "german_stemmer": {
          "type": "stemmer",
          "language": "light_german"
        }
      },
      "analyzer": {
        "my_german_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "german_stop",
            "german_keywords",
            "german_normalization",
            "german_stemmer"
          ]
        }
      }
    }
  }
}
It succeeded. Then I tried to move the data from the old index into the new one with this query:
POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
It failed with
Request failed to get to the server (status code: 504)
I checked the indices with the _cat API; it gives:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open old_index AGj8WN_RRvOwrajKhDrbPw 5 1 2256482 767034 7.8gb 7.8gb
yellow open new_index WnGZ3GsUSR-WLKggp7Brjg 5 1 52000 0 110.2mb 110.2mb
Seemingly some data has been loaded into the new index, so I'm wondering why the _reindex doesn't finish.
The 504 only means the HTTP request timed out at the gateway; the reindex task keeps running in the background. You can check the status of the reindex with the tasks API:
GET _tasks?detailed=true&actions=*reindex
There is a "status" object in the response which has the field "total":
total is the total number of operations that the reindex expects to perform. You can estimate the progress by adding the updated, created, and deleted fields. The request will finish when their sum is equal to the total field.
Link to ES Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
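The progress estimate described in that documentation can be computed directly from the task's status object, for example (a sketch; the field names come from the _tasks response):

```python
def reindex_progress(status: dict) -> float:
    """Estimate reindex progress (0.0-1.0) from the "status" object of a
    task returned by GET _tasks?detailed=true&actions=*reindex."""
    done = status["updated"] + status["created"] + status["deleted"]
    return done / status["total"]

# Example status, using the doc counts from the _cat output above
# (hypothetical numbers for the fields _cat does not show):
status = {"total": 2256482, "updated": 0, "created": 52000, "deleted": 0}
print(f"{reindex_progress(status):.1%}")  # 2.3%
```

At that rate the 504 is unsurprising: copying a 7.8 GB index simply takes longer than the gateway is willing to keep the connection open.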