Riak: query array of 2i through Map Reduce

Riak: query array of 2i through Map Reduce - mapreduce

Can I query an array of secondary indexes in a Map Reduce job in Riak, instead of just the one key? I want to do something like this:
"inputs": {
"bucket": myBycket,
"index": "myIndex",
"key": key + " OR " + key + " OR " + key
},
"query": [{
"map": {
"language":"javascript",
"name":"Riak.mapValuesJson"
}
}]
}
But I have not found any support for it. The keys are in no particular order, so I don't think that I can use ranged queries.

You could certainly handle this with Map Reduce. If you presort your target keys, you could limit the range and possibly improve performance.
{
"inputs": {
"bucket": myBycket,
"index": "myIndex",
"start": firstkey,
"end": lastkey
},
"query": [{
"map": {
"language":"javascript",
"source":"function(v) {
if (v.key == "key1" || v.key == "key2") {
return [<<put what you want returned here>>]
} else {
return []
}
}"
}
}]
}
You should note that according to the docs site javascript Map Reduce has been officially deprecated, so you may want to use Erlang functions for new development.

Related

What's the best practice for unmarshalling data returned from a dynamo operation in aws step functions?

I am running a state machine running a dynamodb query (called using CallAwsService). The format returned looks like this:
{
Items: [
{
"string" : {
"B": blob,
"BOOL": boolean,
"BS": [ blob ],
"L": [
"AttributeValue"
],
"M": {
"string" : "AttributeValue"
},
"N": "string",
"NS": [ "string" ],
"NULL": boolean,
"S": "string",
"SS": [ "string" ]
}
}
]
}
I would like to unmarshall this data efficiently and would like to avoid using a lambda call for this
The CDK code we're currently using for the query is below
interface FindItemsStepFunctionProps {
table: Table
id: string
}
export const FindItemsStepFunction = (scope: Construct, props: FindItemStepFunctionProps): StateMachine => {
const { table, id } = props
const definition = new CallAwsService(scope, 'Query', {
service: 'dynamoDb',
action: 'query',
parameters: {
TableName: table.tableName,
IndexName: 'exampleIndexName',
KeyConditionExpression: 'id = :id',
ExpressionAttributeValues: {
':id': {
'S.$': '$.path.id',
},
},
},
iamResources: ['*'],
})
return new StateMachine(scope, id, {
logs: {
destination: new LogGroup(scope, `${id}LogGroup`, {
logGroupName: `${id}LogGroup`,
removalPolicy: RemovalPolicy.DESTROY,
retention: RetentionDays.ONE_WEEK,
}),
level: LogLevel.ALL,
},
definition,
stateMachineType: StateMachineType.EXPRESS,
stateMachineName: id,
timeout: Duration.minutes(5),
})
}

Can you unmarshall the data downstream? I'm not too well versed on StepFunctions, do you have the ability to import utilities?
Unmarshalling DDB JSON is as simple as calling the unmarshall function from DynamoDB utility:
https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/modules/_aws_sdk_util_dynamodb.html
You may need to do so downstream as StepFunctions seems to implement the low level client.

Step functions still don't make it easy enough to call DynamoDB directly from a step in a state machine without using a Lambda function. The main missing parts are the handling of the different cases of finding zero, one or more records in a query, and the unmarshaling of the slightly complicated format of DynamoDB records. Sadly the $utils library is still not supported in step functions.
You will need to implement these two in specific steps in the graph.
Here is a diagram of the steps that we use as DynamoDB query template:
The first step is used to provide parameters to the query. This step can be omitted and define the parameters in the query step:
"Set Query Parameters": {
"Type": "Pass",
"Next": "DynamoDB Query ...",
"Result": {
"tableName": "<TABLE_NAME>",
"key_value": "<QUERY_KEY>",
"attribute_value": "<ATTRIBUTE_VALUE>"
}
}
The next step is the actual query to DynamoDB. You can also use GetItem instead of Query if you have the record keys.
"Type": "Task",
"Parameters": {
"TableName": "$.tableName",
"IndexName": "<INDEX_NAME_IF_NEEDED>",
"KeyConditionExpression": "#n1 = :v1",
"FilterExpression": "#n2.#n3 = :v2",
"ExpressionAttributeNames": {
"#n1": "<KEY_NAME>",
"#n2": "<ATTRIBUTE_NAME>",
"#n3": "<NESTED_ATTRIBUTE_NAME>"
},
"ExpressionAttributeValues": {
":v1": {
"S.$": "$.key_value"
},
":v2": {
"S.$": "$.attribute_value"
}
},
"ScanIndexForward": false
},
"Resource": "arn:aws:states:::aws-sdk:dynamodb:query",
"ResultPath": "$.ddb_record",
"ResultSelector": {
"result.$": "$.Items[0]"
},
"Next": "Check for DDB Object"
}
The above example seems a bit complicated, using both ExpressionAttributeNames and ExpressionAttributeValues. However, it makes it possible to query on nested attributes such as item.id.
In this example, we only take the first item response with $.Items[0]. However, you can take all the results if you need more than one.
The next step is to check if the query returned a record or not.
"Check for DDB Object": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.ddb_record.result",
"IsNull": false,
"Comment": "Found Context Object",
"Next": "Parse DDB Object"
}
],
"Default": "Do Nothing"
}
And lastly, to answer your original question, we can parse the query result, in case that we have one:
"Parse DDB Object": {
"Type": "Pass",
"Parameters": {
"string_object.$": "$.ddb_record.result.string_object.S",
"bool_object.$": "$.ddb_record.result.bool_object.Bool",
"dict_object": {
"nested_dict_object.$": "$.ddb_record.result.item.M.name.S",
},
"dict_object_full.$": "States.StringToJson($.ddb_record.result.JSON_object.S)"
},
"ResultPath": "$.parsed_ddb_record",
"End": true
}
Please note that:
Simple strings are easily converted by "string_object.$": "$.ddb_record.result.string_object.S"
The same for numbers or booleans by "bool_object.$": "$.ddb_record.result.bool_object.Bool")
Nested objects are parsing the map object ("item.name.$": "$.ddb_record.result.item.M.name.S", for example)
Creation of a JSON object can be achieved by using States.StringToJson
The parsed object is added as a new entry on the flow using "ResultPath": "$.parsed_ddb_record"

Access an Array Item by index in AWS Dynamodb Query Results "Items" in Step Function

I have this dynamodb:Query in my step function:
{
"Type": "Task",
"Resource": "arn:aws:states:::aws-sdk:dynamodb:query",
"Next": "If nothing returned by query Or Study not yet Zipped",
"Parameters": {
"TableName": "TEST-StudyProcessingTable",
"ScanIndexForward": false,
"Limit": 1,
"KeyConditionExpression": "OrderID = :OrderID",
"FilterExpression": "StudyID = :StudyID",
"ExpressionAttributeValues": {
":OrderID": {
"S.$": "$.body.order_id"
},
":StudyID": {
"S.$": "$.body.study_id"
}
}
},
"ResultPath": "$.processed_files"
}
The results comes in as an array called Items which is nested under my ResultPath
processed_files.Items:
{
"body": {
"order_id": "1001",
"study_id": "1"
},
"processed_files": {
"Count": 1,
"Items": [
{
"Status": {
"S": "unzipped"
},
"StudyID": {
"S": "1"
},
"ZipFileS3Key": {
"S": "path/to/the/file"
},
"UploadSet": {
"S": "4"
},
"OrderID": {
"S": "1001"
},
"UploadSet#StudyID": {
"S": "4#1"
}
}
],
"LastEvaluatedKey": {
"OrderID": {
"S": "1001"
},
"UploadSet#StudyID": {
"S": "4#1"
}
},
"ScannedCount": 1
}
}
My question is how do i access the items inside this array from a choice state in a step function?
I need to query then decide something based on the results by checking the item in a condition in a choice state.
The problem is that since this is an array I can't access it using regular JsonPath (like with Items.item), and in my next step the choice condition does NOT accept an index like processed_files.Items['0'].Status

Ok so the answer was so simple all you need to do is use a number instead of string for the array index like this.
processed_files.Items[0].Status
I was originally mislead by an error I received which said that it expected a ' or '[' after the first '['. I mistakenly thought this meant it only accepts strings.
I was wrong, it works like any other array.
I hope this helps somebody one day.

AWS State Machine - Update DynamoDB table is replacing the ID with "$.id"

I have a step where I want to update a object on a DynamoDB table.
Everything works except its creating a new object with the ID value of "$.id", instead of updating where the ID I pass in.
This is my first state machine attempt so what have I done wrong here?
"update-table-processing": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:updateItem",
"ResultPath": "$.updateResult",
"Parameters": {
"TableName": "Projects",
"Key": {
"id": {
"S": "$.id"
}
},
"UpdateExpression": "SET step = :updateRef",
"ExpressionAttributeValues": {
":updateRef": {
"S": "processing"
}
},
"ReturnValues": "ALL_NEW"
},
"Next": "create-project"
},
Do I somehow need to tell DynamoDB to evaluate "$.id" rather than treating it as a "S", or is this happening because I've not mapped the input correctly that the "$.id" value is empty?
My input looks like:
{
"id": "f8185735-c90d-4d4e-8689-cec68a48b1bc"
}

In order to specify data from your input you have to use a Key-Value pair, with the key value ending in a ".$". So to fix this you need to change it to:
"Key": {
"id": {
"S.$": "$.id"
}
},
Using the above it should correctly resolve to the value from your input instead of the string value "$.id".
References - https://docs.aws.amazon.com/step-functions/latest/dg/input-output-inputpath-params.html#input-output-parameters

Regular Expressions and Elastic Search

I am trying to retrieve some company results using elasticsearch. I want to get companies that start with "A", then "B", etc. If I just do a pretty typical query with "prefix" like so
GET apple/company/_search
{
"query": {
"prefix": {
"name": "a"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
But this will return Acme as well as Lemur and Associates, so I need to distinguish between A at the beginning of the whole name versus just A at the beginning of a word.
It would seem like regular expressions would come to the rescue here, but elastic search just ignores whatever I try. In tests with other applications, ^[\S]a* should get you anything that starts with A that doesn't have a space in front of it. Elastic search returns 0 results with the following:
GET apple/company/_search
{
"query": {
"regexp": {
"name": "^[\S]a*"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
In FACT, the Sense UI for Elasticsearch will immediately alert you to a "Bad String Syntax Error". That's because even in a query elastic search wants some characters escaped. Nonetheless ^[\\S]a* doesn't work either.

Searching in Elasticsearch is both about the query itself, but also about the modelling of your data so it suits best the query to be used. One cannot simply index whatever and then try to struggle to come up with a query that does something.
The Elasticsearch way for your query is to have the following mapping for that field:
PUT /apple
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"keyword_lowercase": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
}
},
"mappings": {
"company": {
"properties": {
"name": {
"type": "string",
"fields": {
"analyzed_lowercase": {
"type": "string",
"analyzer": "keyword_lowercase"
}
}
}
}
}
}
}
And to use this query:
GET /apple/company/_search
{
"query": {
"prefix": {
"name.analyzed_lowercase": {
"value": "a"
}
}
}
}
or
GET /apple/company/_search
{
"query": {
"query_string": {
"query": "name.analyzed_lowercase:A*"
}
}
}

Elasticsearch : es.index() changes the Mapping when message is pushed

I am trying to push some messages like this to elasticsearch
id=1
list=asd,bcv mnmn,kjkj, pop asd dgf
so, each message has an id field which is a string, and a list field that contains a list of string values
when i push this into elastic and try to create charts in kibana, the default analyzer kicks in and splits my list by the space character. Hence it breaks up my values. I tried to create a mapping for my index as
mapping='''
{
"test":
{
"properties": {
"DocumentID": {
"type": "string"
},
"Tags":{
"type" : "string",
"index" : "not_analyzed"
}
}
}
}'''
es = Elasticsearch([{'host': server, 'port': port}])
indexName = "testindex"
es.indices.create(index=indexName, body=mapping)
so this should create the index with the mapping i defined. Now , i push the messages by simply
es.index(indexName, docType, messageBody)
but even now, Kibana breaks up my values! why was the mapping not applied ?
and when i do
GET /testindex/_mapping/test
i get
{
"testindex": {
"mappings": {
"test": {
"properties": {
"DocumentID": {
"type": "string"
},
"Tags": {
"type": "string"
}
}
}
}
}
}
why did the mapping change? How can i specify the mapping type when i do
es.index()

You were very close. You need to provide the root mappings object while creating the index and you dont need it when using _mapping end point and that is the reason put_mapping worked and create did not. You can see that in api.
mapping = '''
{
"mappings": {
"test": {
"properties": {
"DocumentID": {
"type": "string"
},
"Tags": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
'''
Now this will work as expected
es.indices.create(index=indexName, body=mapping)
Hope this helps

i was able to get the correct mapping to work by
es.indices.create(index=indexName)
es.indices.put_mapping(docType, mapping, indexName)
i dont understand why
es.indices.create(index=indexName, body=mapping)
did not work. this should have worked as per the API.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Riak: query array of 2i through Map Reduce - mapreduce

Related

What's the best practice for unmarshalling data returned from a dynamo operation in aws step functions?

Access an Array Item by index in AWS Dynamodb Query Results "Items" in Step Function

AWS State Machine - Update DynamoDB table is replacing the ID with "$.id"

Regular Expressions and Elastic Search

Elasticsearch : es.index() changes the Mapping when message is pushed

Categories

Resources