I found the pricing documentation ambiguous with regard to the Map states that fan out based on an array in the input. Does anyone know if each fan out ends up being counted as a "state transition" incurring the $0.025 cost? Here is an example input and state machine definition for reference.
Input:
{ "data": [
// Is each of these going to be a "state transition"?
{ "name": "a" },
{ "name": "b" },
] }
Definition:
{
"StartAt": "Start",
"States": {
"Start": {
"Type": "Map",
"ItemsPath": "$.data",
"End": true,
"Iterator": {
"StartAt": "Monitor",
"States": {
"Monitor": {
"Type": "Task",
"Resource": "some-lambda",
"End": true
}
}
}
}
}
}
The cost of Step Functions State Transitions is:
$0.000025 PER STATE TRANSITION THEREAFTER
$0.025 per 1,000 state transitions
With AWS Step Functions, you pay for the number of state transitions.
So let's see how many state transitions a Map state creates.
AWS Step Functions only charges customers for history events whose names end with Entered.
Each Map state produces at least these 4 events:
MapStateEntered (Counted as state transition)
MapStateStarted (Not Counted)
MapStateSucceeded (Not Counted)
MapStateExited (Not Counted)
And each iteration of the Map state produces these 2 events:
MapIterationStarted (Not Counted)
MapIterationSucceeded (Not Counted)
So for a Map state we can assume the cost is one transition for MapStateEntered plus one transition for every state entered inside each iteration:
cost = (1 + iterations * (steps inside iteration) ) * $0.000025
So for your example (An execution with a Map state with 2 iterations), the overhead of Map State is:
transitions: 1+2*1 = 3
cost: 3 * 0.000025 = $0.000075
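The arithmetic above can be sketched as a small helper. This is only an illustration of the formula in this answer, assuming the $0.000025 per-transition price quoted above and ignoring any free tier; the function names are made up for the example.

```python
from decimal import Decimal

# Per-transition price quoted in the answer above (assumption: no free tier applied).
PRICE_PER_TRANSITION = Decimal("0.000025")


def map_state_transitions(iterations, steps_per_iteration):
    """Billable transitions: 1 for MapStateEntered plus one per state entered in each iteration."""
    return 1 + iterations * steps_per_iteration


def map_state_cost(iterations, steps_per_iteration):
    """Cost in USD of a single Map state, following the formula in the answer."""
    return map_state_transitions(iterations, steps_per_iteration) * PRICE_PER_TRANSITION


# The example from the question: 2 array items, 1 Task state inside the iterator.
print(map_state_transitions(2, 1))
print(map_state_cost(2, 1))
```

Decimal is used instead of float so that very small prices multiply without rounding noise.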
I have the following chain: Lambda -> SNS -> SQS -> EC2 script -> DynamoDB table.
The test event for the Lambda:
{
"body": {
"MessageAttributes": {
"vote": {
"Type": "String",
"StringValue": "b"
},
"voter": {
"Type": "String",
"StringValue": "count"
}
}
}
}
Originally the table should have 1 partition and 2 attributes, a and b, created with this JSON:
{
"voter": {
"S": "count"
},
"a": {
"N": "11"
},
"b": {
"N": "20"
}
}
The issue is in the EC2 script that runs continuously. It has a function update_count(vote) that should just increase the existing a or b by 1. The code:
def update_count(vote):
    logging.info('update count....')
    table.update_item(
        Key={'voter': 'count'},
        UpdateExpression="ADD #vote = #vote + :incr",
        ExpressionAttributeNames={'#vote': vote},
        ExpressionAttributeValues={':incr': 1}
    )
When I send my test event, it overwrites my DynamoDB table, which ends up with the following structure:
An UpdateItem will never overwrite attributes which are not set in the UpdateExpression.
I would ensure that you have no other code which is inadvertently overwriting your item, perhaps a Lambda function listening to a stream, or another function being called unknowingly in your application.
You can enable DynamoDB data plane logging in CloudTrail to understand what is happening:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/logging-using-cloudtrail.html
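Separately from the overwrite question, note that the ADD action takes the form "ADD path value", with no "=" or "+"; the "#vote = #vote + :incr" form belongs to SET. A minimal sketch of the arguments for an atomic increment is below. The helper name build_update_kwargs and the restriction to "a" and "b" are assumptions made for the example, not part of the original script.

```python
def build_update_kwargs(vote):
    """Build the keyword arguments for Table.update_item that add 1 to attribute `vote`."""
    # Assumption for this example: only the two counters from the question are valid.
    if vote not in ("a", "b"):
        raise ValueError(f"unexpected vote value: {vote!r}")
    return {
        "Key": {"voter": "count"},
        # ADD syntax is "ADD <path> <value>"; it increments a number in place.
        "UpdateExpression": "ADD #vote :incr",
        "ExpressionAttributeNames": {"#vote": vote},
        "ExpressionAttributeValues": {":incr": 1},
    }


print(build_update_kwargs("b"))
```

In the script from the question this would be used as table.update_item(**build_update_kwargs(vote)), where table is the existing boto3 Table resource. Only the named attribute is touched, so the other counter and the key stay as they are.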
How does SageMaker Clarify bias detection work for features that are continuous?
Does it bin continuous variables automatically or do users need to bin them themselves before running the bias detection job?
Using the fairness and explainability example, I selected the 'Capital Gain' facet (it has values 0-99999, no nulls), and set the facet_values_or_threshold=[5000] (expecting the split to occur on 5000).
bias_config = clarify.BiasConfig(label_values_or_threshold=[0],
facet_name='Capital Gain',
facet_values_or_threshold=[5000]
)
The result was: "error": "CI: facet set is empty. Check that x[facet] has non-zero length." I assume this is due to the fact that 'Capital Gain' doesn't have the exact value of 5000. I tested with facet_values_or_threshold=[2174]
bias_config = clarify.BiasConfig(label_values_or_threshold=[0],
facet_name='Capital Gain',
facet_values_or_threshold=[2174]
)
and got a result:
{
"version": "1.0",
"pre_training_bias_metrics": {
"label": "Target",
"facets": {
"Capital Gain": [
{
"value_or_threshold": "2174",
"metrics": [
{
"name": "CI",
"description": "Class Imbalance (CI)",
"value": 0.9969498043896293
}
]
}
]
},
"label_value_or_threshold": "0"
}
}
I have the following iteration state defined in a Map State:
"WriteRteToDB": {
"Comment": "Write Rte to DB. Also records the risk calculations in the same table.",
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"End": true,
"Parameters": {
"FunctionName": "logger-lambda",
"RtInfo.$": "States.Array($)",
"ExecutionId.$": "$$.Execution.Id",
"InitTime.$": "$$.Execution.StartTime"
}
}
The parameters defined produce the following input:
{
"FunctionName": "logger-lambda",
"RtInfo": {
"status": 200,
"rte": {
"date": "2022-06-05 00:00:00",
"rt_value": 778129128.6631782,
"lower_80": 0,
"upper_80": 0.5,
"location_id": "WeWork Office Space & Coworking, Town Square, Alpharetta, GA, USA",
"syndrome": "Gastrointestinal"
}
},
"InitTime": "2022-06-05T15:04:57.297Z",
"ExecutionId": "arn:aws:states:us-east-1:1xxxxxxxxxx1:execution:RadaRx-rteForecast:0dbf2743-abb5-e0b6-56d0-2cc82a24e3b4"
}
But the following Error is produced:
{
"error": "States.Runtime",
"cause": "An error occurred while executing the state 'WriteRteToDB' (entered at the event id #28). The Parameters '{\"FunctionName\":\"logger-lambda\",\"RtInfo\":[{\"status\":200,\"rte\":{\"date\":\"2022-12-10 00:00:00\",\"rt_value\":1.3579795204795204,\"lower_80\":0,\"upper_80\":0.5,\"location_id\":\"Atlanta Tech Park, Technology Parkway, Peachtree Corners, GA, USA\",\"syndrome\":\"Influenza Like Illnesses\"}}],\"InitTime\":\"2022-06-05T16:06:10.132Z\",\"ExecutionId\":\"arn:aws:states:us-east-1:1xxxxxxxxxx1:execution:RadaRx-rteForecast:016a37f2-d01c-9bfd-dc3f-1288fb7c1af6\"}' could not be used to start the Task: [The field \"RtInfo\" is not supported by Step Functions]"
}
I have already tried wrapping the RtInfo inside an array of length 1 as you can observe from above, considering that it is a state within the Map State. I have also checked Input size to make sure that it does not cross the Max Input/Output quota of 256KB.
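Your task's Parameters have incorrect syntax. With the arn:aws:states:::lambda:invoke resource, the top level of Parameters accepts only Lambda Invoke fields such as FunctionName and Payload, so pass RtInfo and the other user-defined inputs under the Payload key: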
"Parameters": {
"FunctionName": "logger-lambda",
"Payload": {
"RtInfo.$": "States.Array($)",
"ExecutionId.$": "$$.Execution.Id",
"InitTime.$": "$$.Execution.StartTime"
}
}
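The same restructuring can be sketched as a small helper that moves any user-defined keys under Payload. This is only an illustration: the helper name wrap_in_payload is made up, and the set of accepted top-level fields is an assumption based on the Lambda Invoke request parameters, so check it against the Step Functions documentation for your integration.

```python
# Assumed set of top-level fields accepted by the lambda:invoke integration.
LAMBDA_INVOKE_FIELDS = {
    "FunctionName", "Payload", "InvocationType", "Qualifier", "ClientContext",
}


def wrap_in_payload(parameters):
    """Return a copy of a Task's Parameters with custom keys moved under Payload."""
    fixed = {}
    custom = {}
    for key, value in parameters.items():
        # "RtInfo.$" and "RtInfo" refer to the same field name.
        base = key[:-2] if key.endswith(".$") else key
        if base in LAMBDA_INVOKE_FIELDS:
            fixed[key] = value
        else:
            custom[key] = value
    if custom:
        # Assumes any existing Payload is a plain object, not a "Payload.$" path.
        fixed["Payload"] = {**fixed.get("Payload", {}), **custom}
    return fixed


original = {
    "FunctionName": "logger-lambda",
    "RtInfo.$": "States.Array($)",
    "ExecutionId.$": "$$.Execution.Id",
    "InitTime.$": "$$.Execution.StartTime",
}
print(wrap_in_payload(original))
```

Applied to the Parameters from the question, this produces the structure shown in the answer: FunctionName at the top level and the three custom fields inside Payload.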
I am new to Elasticsearch. I have only one shard with no replica, and this shard gets the default shard size. Now I want to set the shard size explicitly by using a template. But when I search for this here, it doesn't have any property to set shard size. Am I missing something? Is there any other way to do it? And what is the default size for a shard? Below is my current template:
{
"index_patterns": ["centralized-logging-index-*"],
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
}
}
I am using elasticsearch on AWS.
The default number of primary shards per index changed from 5 to 1 in ES 7.0, and the above API should work when you want to change the number of shards for an index or in an index template.
I just created an index with 5 primary shards using the index API.
PUT aws-domain/myindex
{
"settings": {
"number_of_shards": 5 // myindex will have 5 primary shards
}
}
You can verify the same using the GET on above API
GET aws-domain/myindex
{
"myindex": {
"aliases": {},
"mappings": {},
"settings": {
"index": {
"creation_date": "1591277226075",
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "aKbGoSs9RhC_q5iUH6qauw",
"version": {
"created": "7040299"
},
"provided_name": "myindex"
}
}
}
}
I'm not sure what you mean by shard size. If you mean the size in GB or similar, there is no property to set this: it is dynamic and changes with the number of docs.
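Since shard size cannot be set directly, the practical lever is number_of_shards: for a given amount of data, more primary shards means smaller shards. A rough sketch of that sizing arithmetic follows. The function name and the target size are assumptions for illustration only; target_shard_gb is a planning figure you choose, not an Elasticsearch setting.

```python
import math


def primary_shards_for(expected_index_gb, target_shard_gb):
    """Estimate how many primary shards keep each shard near a chosen target size."""
    if expected_index_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be non-negative and the target must be positive")
    # An index always has at least one primary shard.
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


# Hypothetical numbers: a 100 GB index with shards of roughly 30 GB each.
print(primary_shards_for(100, 30))
```

The result would go into "number_of_shards" in the template's settings.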
I am working on AWS Elasticsearch. It doesn't allow opening or closing an index, so setting changes cannot be applied to an existing index.
In order to change the settings of an index, I have to create a new index with the new settings and then move the data from the old index into the new one.
So first I created a new index with
PUT new_index
{
"settings": {
"max_result_window":3000000,
"analysis": {
"filter": {
"german_stop": {
"type": "stop",
"stopwords": "_german_"
},
"german_keywords": {
"type": "keyword_marker",
"keywords": ["whatever"]
},
"german_stemmer": {
"type": "stemmer",
"language": "light_german"
}
},
"analyzer": {
"my_german_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"german_stop",
"german_keywords",
"german_normalization",
"german_stemmer"
]
}
}
}
}
}
It succeeded. Then I tried to move the data from the old index into the new one with this query:
POST _reindex
{
"source": {
"index": "old_index"
},
"dest": {
"index": "new_index"
}
}
It failed with
Request failed to get to the server (status code: 504)
I checked the indices with _cat api, it gives
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open old_index AGj8WN_RRvOwrajKhDrbPw 5 1 2256482 767034 7.8gb 7.8gb
yellow open new_index WnGZ3GsUSR-WLKggp7Brjg 5 1 52000 0 110.2mb 110.2mb
Seemingly some data was loaded into the new index, so I am wondering why the _reindex doesn't work.
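The 504 most likely means the HTTP request timed out while the reindex kept running on the cluster, which would explain the documents already present in new_index. You can check the status of the reindex with the tasks API: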
GET _tasks?detailed=true&actions=*reindex
There is a "status" object in response which has field "total":
total is the total number of operations that the reindex expects to perform. You can estimate the progress by adding the updated, created, and deleted fields. The request will finish when their sum is equal to the total field.
Link to ES Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
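The progress estimate described above can be sketched as a small function over the "status" object. The function name is made up, and the sample values are illustrative only (taken from the document counts in the question, not from a real _tasks response).

```python
def reindex_progress(status):
    """Fraction of a reindex that is done, from the task's "status" object."""
    total = status.get("total", 0)
    if total <= 0:
        return 0.0
    # Sum the operation counters, as the documentation quoted above suggests.
    done = status.get("updated", 0) + status.get("created", 0) + status.get("deleted", 0)
    return min(done / total, 1.0)


# Illustrative status: 52000 documents created out of 2256482 expected.
sample_status = {"total": 2256482, "updated": 0, "created": 52000, "deleted": 0}
print(f"{reindex_progress(sample_status):.1%}")
```

When the returned fraction reaches 1.0, the sum of the counters equals total and the request has finished.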