Generating JSON object with dynamic keys in AWS Step Functions

Background:
I am trying to add a DynamoDB:GetItem step to my state machine in AWS Step Functions. The GetItem API takes input in the following format:
{
"TableName": "MyDynamoDBTable",
"Key": {
"Column": {
"S": "MyEntry"
}
}
}
where "Column" is the primary key name, and "MyEntry" is the primary key value. The issue is that I want to specify both the primary key name and its value dynamically, using JSON path references.
Unfortunately, AWS won't let me use a path reference for the primary key name ("Column"), so I can't do something like
{
"TableName": "MyDynamoDBTable",
"Key.$": {
"$.ColumnName": {
"S": "MyEntry"
}
}
}
Problem:
The only workaround I could think of (albeit a bit ugly) is to combine the States.StringToJson and States.Format intrinsic functions: first generate a stringified version of the input for the Key.$ field, then convert that string to JSON. Something like:
{
"TableName.$": "$.TableName",
"Key.$": "States.StringToJson(States.Format('\{\"{}\":\{\"S.$\":\"{}\"\}\}', $.PrimaryKeyName, $.PrimaryKeyValue))"
}
It should work in theory, but it seems that AWS Step Functions is not happy about escaping double quotes? It's not able to parse the definition above.
So my question is:
Is there a way to make this work? (either by escaping double quotes somehow, or through a totally different approach)

After lots of experimentation, I finally found a way to make dynamic keys work. I am using a Pass step with the following parameters defined:
{
"Key.$": "States.StringToJson(States.Format('\\{\"{}\":\\{\"S\":\"{}\"\\}\\}', $.HashKeyName, $.HashKeyValue))"
}
The secret, apparently, was to use a double backslash (\\) when escaping the { and } symbols. Escaping " wasn't the problem after all, even though it isn't documented in the AWS docs.
The result of this transformation is the following:
{
"Key": {
"MyHashKeyName": {
"S": "MyHashKeyValue"
}
}
}
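The escaping layers can be traced outside Step Functions. The sketch below is only an illustration: emulateFormat is a hypothetical, simplified stand-in for States.Format (not the AWS implementation), and the sample key name and value are the ones from the result above. It decodes the definition text with a JSON parser, applies the format-string escapes, and then parses the result the way States.StringToJson would. It also shows that a single backslash before { is not a valid JSON escape, which is one plausible reason the definition in the question could not be parsed.

```typescript
// Hypothetical, simplified stand-in for States.Format (NOT the AWS implementation):
// a backslash makes the next character literal, "{}" consumes the next argument.
function emulateFormat(template: string, ...args: string[]): string {
  let out = "";
  let next = 0;
  for (let i = 0; i < template.length; i++) {
    const ch = template[i];
    if (ch === "\\" && i + 1 < template.length) {
      out += template[i + 1]; // escaped character is copied literally
      i++;
    } else if (ch === "{" && template[i + 1] === "}") {
      out += args[next++]; // "{}" placeholder takes the next argument
      i++;
    } else {
      out += ch;
    }
  }
  return out;
}

// The "Key.$" value exactly as typed in the state machine definition (raw JSON text).
const definitionText = String.raw`"States.StringToJson(States.Format('\\{\"{}\":\\{\"S\":\"{}\"\\}\\}', $.HashKeyName, $.HashKeyValue))"`;

// Layer 1: JSON decoding turns \\ into \ and \" into ".
const intrinsic: string = JSON.parse(definitionText);

// Layer 2: the format string is the text between the single quotes.
const formatTemplate = intrinsic.slice(intrinsic.indexOf("'") + 1, intrinsic.lastIndexOf("'"));
const formatted = emulateFormat(formatTemplate, "MyHashKeyName", "MyHashKeyValue");

// Layer 3: the string that States.StringToJson would then parse.
const keyObject = JSON.parse(formatted);
console.log(intrinsic);
console.log(JSON.stringify(keyObject));

// With a single backslash, \{ is not a valid JSON escape, so the definition
// text itself is rejected by a JSON parser.
let singleBackslashParses = true;
try {
  JSON.parse(String.raw`"\{"`);
} catch {
  singleBackslashParses = false;
}
console.log(singleBackslashParses);
```

The double backslash exists only to survive JSON decoding; after that, States.Format sees a single backslash in front of each brace.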

Find string in between in kibana elastic search with regex like in splunk

In splunk, we can filter out dynamic string in between two strings.
Say for example,
<TextileType>Shirt</TextileType>
<TextileType>Trousers</TextileType>
<TextileType>Shirt</TextileType>
<TextileType>Trousers</TextileType>
<TextileType>Shirt</TextileType>
The output I am expecting:
Shirt - 3
Trousers - 2
I am able to do this in splunk, easily.
How can I achieve this in Kibana?
I have tried many approaches, but I could not write a regex that does what I need.
Note: Here's the example json query, in which I need to add regex. In this example, I am just trying to search for "Shirt" manually, which I am expecting to get dynamically.
{
"query": {
"match": {
"text": {
"query": "Shirt",
"type": "phrase"
}
}
}
}
Considering data is in the sample index, you can use a wildcard search:
GET /sample/_search
{
"query": {
"wildcard":{
"column2":"*Shirt*"
}
}
}
Notice how it only returns results containing the keyword Shirt.
If you are looking to clean the data, you'd need to run it through a logstash pipeline to strip the XML tags and leave you with the text.
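For reference, the extraction and counting the question asks for can be sketched on the client side. This is not a Kibana or Elasticsearch feature, just an illustration of the expected result; countTextileTypes is a hypothetical helper and the sample lines are the ones from the question.

```typescript
// Sample messages copied from the question.
const sampleMessages: string[] = [
  "<TextileType>Shirt</TextileType>",
  "<TextileType>Trousers</TextileType>",
  "<TextileType>Shirt</TextileType>",
  "<TextileType>Trousers</TextileType>",
  "<TextileType>Shirt</TextileType>",
];

// Count every value found between <TextileType> and </TextileType>.
function countTextileTypes(lines: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of lines) {
    // A fresh global regex per line keeps lastIndex handling simple.
    const pattern = /<TextileType>(.*?)<\/TextileType>/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(line)) !== null) {
      const value = match[1];
      counts[value] = (counts[value] || 0) + 1;
    }
  }
  return counts;
}

const textileCounts = countTextileTypes(sampleMessages);
console.log(textileCounts);
```

The same capture group is what a Logstash cleaning step would have to extract into its own field before the counting could be done as an aggregation.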

AWS CDK - StepFunction - Modify input before passing to the next step

I have a 3 step state machine for a step function.
InputStep -> ExecuteSparkJob -> OutputLambda
ExecuteSparkJob is a Glue task. Since it cannot pass its output to the step function, it writes its output to an S3 bucket. OutputLambda reads it from there and passes it on to the step function.
The idea of InputStep is simply to define a common S3 URI that the following steps can use.
Below is the code I have for the Input Step.
const op1 = Data.stringAt("$.op1");
const op2 = Data.stringAt("$.op2");
const inputTask = new Pass(this, "Input Step", {
result: Result.fromString(this.getURI(op1, op2)),
resultPath: "$.s3path"
});
getURI(op1: string, op2: string): string {
return op1.concat("/").concat(op2).concat("/").concat("response");
}
However, the string manipulation in getURI is not working. The path references in inputTask.result are not replaced with the values from the input; they end up in the output as literal text.
This is the input and output to the Input Step
{
"op1": "test1",
"op2": "test2"
}
Output
{
"op1": "test1",
"op2": "test2",
"responsePath": "$.op1/$.op2/response"
}
Is it possible to do some string manipulations using parameters in the Path in Step Function definition? If yes, what am I missing?
Thanks for your help.
You can use one or more EvaluateExpression Tasks - it's still a bit clunky.
You can find examples here.
API doc here.
Use a Lambda function instead of a Pass state to build the string.
Step Functions doesn't currently support string concatenation with reference paths. The Result field of a Pass state doesn't allow reference paths either. It has to be a static value.
The Pass state's Parameters field supports the intrinsic functions and substitutions you need to do this natively, without a Lambda task. The Result field doesn't.
Compose a string from the execution inputs with the Format intrinsic function:
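const inputTask = new Pass(this, "Input Step", {
  parameters: {
    // States.Format fills each {} with the next argument
    path: JsonPath.format(
      "{}/{}/response",
      JsonPath.stringAt("$.op1"),
      JsonPath.stringAt("$.op2")
    ),
  },
  // resultPath adds the parameters object to the input under $.s3
  resultPath: "$.s3",
});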
The resulting string value test1/test2/response will be output to $.s3.path.
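For orientation, here is roughly what a Pass state that writes the formatted string to $.s3.path looks like in Amazon States Language, written out by hand as a TypeScript object. The exact JSON that CDK synthesizes is an assumption, and emulateInputStep is a hypothetical local stand-in that only mimics the effect on the sample input from the question.

```typescript
// Hand-written approximation of the Pass state definition (assumed, not
// copied from CDK output).
const inputStepDefinition = {
  Type: "Pass",
  Parameters: {
    // The ".$" suffix asks Step Functions to evaluate the value instead of
    // passing it through as a literal string.
    "path.$": "States.Format('{}/{}/response', $.op1, $.op2)",
  },
  // ResultPath places the Parameters result under $.s3 and keeps the input.
  ResultPath: "$.s3",
};

// Hypothetical local stand-in for the state's effect on the sample input.
function emulateInputStep(input: { op1: string; op2: string }) {
  const path = `${input.op1}/${input.op2}/response`;
  return { ...input, s3: { path } };
}

const stepOutput = emulateInputStep({ op1: "test1", op2: "test2" });
console.log(JSON.stringify(inputStepDefinition, null, 2));
console.log(JSON.stringify(stepOutput));
```

The original op1 and op2 fields stay in the state output, so later steps can still read them alongside $.s3.path.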

Elasticsearch Query on indexes whose name is matching a certain pattern

I have a couple of indexes in my Elasticsearch DB as follows
Index_2019_01
Index_2019_02
Index_2019_03
Index_2019_04
.
.
Index_2019_12
Suppose I want to search only on the first 3 Indexes.
I mean a regular expression like this:
select count(*) from Index_2019_0[1-3] where LanguageId="English"
What is the correct way to do that in Elasticsearch?
How can I query several indexes with certain names?
This can be achieved via multi-index search, which is a built-in capability of Elasticsearch. To achieve the described behavior, try a query like this:
POST /index_2019_01,index_2019_02/_search
{
"query": {
"match": {
"LanguageID": "English"
}
}
}
Or, using URI search:
curl 'http://<host>:<port>/index_2019_01,index_2019_02/_search?q=LanguageID:English'
More details are available here. Note that Elasticsearch requires index names to be lowercase.
Can I use a regex to specify index name pattern?
In short, no. It is possible to refer to the index name in queries through the special "virtual" field _index, but its use is limited. For instance, one cannot run a regexp against the index name:
The _index is exposed as a virtual field — it is not added to the
Lucene index as a real field. This means that you can use the _index
field in a term or terms query (or any query that is rewritten to a
term query, such as the match, query_string or simple_query_string
query), but it does not support prefix, wildcard, regexp, or fuzzy
queries.
For instance, the query from above can be rewritten as:
POST /_search
{
"query": {
"bool": {
"must": [
{
"terms": {
"_index": [
"index_2019_01",
"index_2019_02"
]
}
},
{
"match": {
"LanguageID": "English"
}
}
]
}
}
}
This employs a bool query and a terms query.
Hope that helps!
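If a real regular expression over index names is needed, one option is to apply it on the client and send the resulting explicit list. A minimal sketch, assuming the index names are already known to the client (knownIndices is a hypothetical, hard-coded list and searchPathFor is a hypothetical helper):

```typescript
// Hypothetical list of index names known to the client.
const knownIndices: string[] = [
  "index_2019_01",
  "index_2019_02",
  "index_2019_03",
  "index_2019_04",
  "index_2019_12",
];

// Keep the names that match the pattern and build a multi-index search path.
function searchPathFor(indices: string[], pattern: RegExp): string {
  const selected = indices.filter((name) => pattern.test(name));
  return "/" + selected.join(",") + "/_search";
}

// Lowercase variant of the pattern from the question.
const firstQuarterPath = searchPathFor(knownIndices, /^index_2019_0[1-3]$/);
console.log(firstQuarterPath);
```

The regex never reaches Elasticsearch here; the server only sees a comma-separated list of concrete index names.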
Why use POST when you are not adding any data?
I advise using GET in your case. Secondly, if the indices have similar names, as they do in your case, you should use an index pattern like the one in the query below:
GET /index_2019_*/_search
{
"query": {
"match": {
"LanguageID": "English"
}
}
}
Or with curl:
curl -XGET "http://<host>:<port>/index_2019_*/_search" -H 'Content-Type: application/json' -d'{"query": {"match":{"LanguageID": "English"}}}'
While searching for indices using a regex is not possible, you might be able to use date math to take you a bit further.
You can look at the docs here.
As an example, let's say you want the last 3 months from those indices.
That means that if we have
index_2019_01
index_2019_02
index_2019_03
index_2019_04
and today is 2019/04/20, we could use the following query to get 04, 03 and 02:
GET /<index_{now/M-0M{yyyy_MM}}>,<index_{now/M-1M{yyyy_MM}}>,<index_{now/M-2M{yyyy_MM}}>/_search
I used M-0M for the first one so the query construction loop doesn't need a special case for the first index.
See the docs on URL-encoding this query and on using literal braces in an index name. If you use a client, the URL encoding is done for you (at least in the Python client).
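The construction loop mentioned above can be sketched as follows. dateMathIndices is a hypothetical helper, and the "index_" prefix is chosen to match the index names listed earlier. The second step percent-encodes the whole expression, which is what has to happen when the request is built by hand rather than through a client.

```typescript
// Build one date math name per month: the current month (M-0M) and the
// months before it. No special case is needed for the first entry.
function dateMathIndices(prefix: string, months: number): string {
  const names: string[] = [];
  for (let i = 0; i < months; i++) {
    names.push("<" + prefix + "{now/M-" + i + "M{yyyy_MM}}>");
  }
  return names.join(",");
}

const dateMathExpression = dateMathIndices("index_", 3);

// Percent-encode <, >, {, }, / and , before placing the expression in a URL.
const dateMathPath = "/" + encodeURIComponent(dateMathExpression) + "/_search";

console.log(dateMathExpression);
console.log(dateMathPath);
```

Elasticsearch resolves the expressions on the server at request time, so the same path keeps working as the months roll over.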

MongoDB: Aggregation using $cond with $regex

I am trying to group data in multiple stages.
At the moment my query looks like this:
db.captions.aggregate([
{$project: {
"videoId": "$videoId",
"plainText": "$plainText",
"Group1": {$cond: {if: {$eq: ["plainText", {"$regex": /leave\sa\scomment/i}]},
then: "Yes", else: "No"}}}}
])
I am not sure whether it is actually possible to use the $regex operator within a $cond in the aggregation stage. I would appreciate your help very much!
Thanks in advance
UPDATE: Starting with MongoDB v4.1.11, there finally appears to be a nice solution for your problem which is documented here.
Original answer:
As I wrote in the comments above, $regex does not work inside $cond as of now. There is an open JIRA ticket for that but it's, err, well, open...
In your specific case, I would suggest solving this on the client side unless you're dealing with huge amounts of input data of which you only ever return small subsets. Judging by your query, it looks like you are always going to retrieve all documents, just bucketed into two result groups ("Yes" and "No").
If you don't want or cannot solve that topic on the client side, then here is something that uses $facet (MongoDB >= v3.4 required) - it's neither particularly fast nor overly pretty but it might help you to get started.
db.captions.aggregate([{
$facet: { // create two stages that will be processed using the full input data set from the "captions" collection
"CallToActionYes": [{ // the first stage will...
$match: { // only contain documents...
"plainText": /leave\sa\scomment/i // that are allowed by the $regex filter (which could be extended with multiple $or expressions or changed to $in/$nin which accept regular expressions, too)
}
}, {
$addFields: { // for all matching documents...
"CallToAction": "Yes" // we create a new field called "CallToAction" which will be set to "Yes"
}
}],
"CallToActionNo": [{ // similar as above except we're doing the inverse filter using $not
$match: {
"plainText": { $not: /leave\sa\scomment/i }
}
}, {
$addFields: {
"CallToAction": "No" // and, of course, we set the field to "No"
}
}]
}
}, {
$project: { // we got two arrays of result documents out of the previous stage
"allDocuments" : { $setUnion: [ "$CallToActionYes", "$CallToActionNo" ] } // so let's merge them into a single one called "allDocuments"
}
}, {
$unwind: "$allDocuments" // flatten the "allDocuments" result array
}, {
$replaceRoot: { // restore the original document structure by moving everything inside "allDocuments" up to the top
newRoot: "$allDocuments"
}
}, {
$project: { // include only the two relevant fields in the output (and the _id)
"videoId": 1,
"CallToAction": 1
}
}])
As always with the aggregation framework, it may help to remove individual stages from the end of the pipeline and run the partial query in order to get an understanding of what each individual stage does.
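The update at the top of this answer does not name the operator. Assuming it refers to $regexMatch (part of the aggregation framework from MongoDB 4.2), the original $project stage could be sketched as below. The pipeline is shown as a plain TypeScript value that could be passed to aggregate; classifyLocally is a hypothetical helper that applies the same regex on the client, only to show what the stage is expected to compute.

```typescript
// Pipeline sketch: $regexMatch returns a boolean, so it can drive $cond.
const captionPipeline = [
  {
    $project: {
      videoId: "$videoId",
      plainText: "$plainText",
      Group1: {
        $cond: {
          if: { $regexMatch: { input: "$plainText", regex: /leave\sa\scomment/i } },
          then: "Yes",
          else: "No",
        },
      },
    },
  },
];

// Hypothetical client-side equivalent of the Group1 expression.
function classifyLocally(plainText: string): string {
  return /leave\sa\scomment/i.test(plainText) ? "Yes" : "No";
}

console.log(JSON.stringify(Object.keys(captionPipeline[0].$project)));
console.log(classifyLocally("Please LEAVE a comment below"));
console.log(classifyLocally("Thanks for watching"));
```

Compared with the $facet approach, this keeps every document in a single pass and needs no $setUnion, $unwind or $replaceRoot afterwards.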

AWS DynamoDB Scan filterExpression - simple number comparison

I am trying to do a simple DynamoDB scan with a filter expression (documentation here).
This is my expression string:
"attribute_exists("my_db_key") AND ("my_db_key" = 1)"
This simply states:
"If a value for my_db_key exists AND my_db_key EQUALS 1, return it in the results"
However, it does not work and I get this error:
Invalid FilterExpression: Syntax error; token: "1", near: "= 1)
I am aware that I can use an attribute value placeholder for values and then use that in the expression but I do not want to do this. And according to Amazon's documentation it is NOT required.
So how do I do this simple expression? Does anyone have an example or link to documentation? Amazon's documentation is unfortunately of no help.
NOTE: I am implementing this with AWSDynamoDBScanInput on iOS but my issue here is to do with global expression syntax so it should not matter.
Your params need to look something like this (for the Node AWS library):
const params = {
  "FilterExpression": 'attribute_exists("my_db_key") AND ("my_db_key" = :value)',
  // the literal 1 moves out of the expression and into a placeholder
  "ExpressionAttributeValues": {
    ":value": 1
  },
  // ...
};
docClient.scan(params, function(err, data){
  // Handle err or process data
})
For some languages, the parameters should look more like this:
{
  "FilterExpression": 'attribute_exists("my_db_key") AND ("my_db_key" = :value)',
  "ExpressionAttributeValues": {
    // typed attribute values carry numbers as strings
    ":value": {"N": "1"}
  },
  // ...
};
You have to use a placeholder and pass the value separately. Here's some documentation and a post from the AWS forums.
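The two parameter shapes can be put side by side as plain objects. This is a sketch under assumptions: the table name "MyTable" and both helper functions are hypothetical, no AWS SDK call is made, and the attribute name is passed through an ExpressionAttributeNames placeholder (#k) instead of being written into the expression.

```typescript
interface ScanParams {
  TableName: string;
  FilterExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, unknown>;
}

// Document-client style: values are plain JavaScript values.
function documentClientParams(table: string, attribute: string, value: number): ScanParams {
  return {
    TableName: table,
    FilterExpression: "attribute_exists(#k) AND #k = :value",
    ExpressionAttributeNames: { "#k": attribute },
    ExpressionAttributeValues: { ":value": value },
  };
}

// Low-level style: values are typed, and numbers travel as strings under "N".
function lowLevelParams(table: string, attribute: string, value: number): ScanParams {
  return {
    TableName: table,
    FilterExpression: "attribute_exists(#k) AND #k = :value",
    ExpressionAttributeNames: { "#k": attribute },
    ExpressionAttributeValues: { ":value": { N: String(value) } },
  };
}

// "MyTable" is a hypothetical table name.
const docParams = documentClientParams("MyTable", "my_db_key", 1);
const rawParams = lowLevelParams("MyTable", "my_db_key", 1);
console.log(JSON.stringify(docParams));
console.log(JSON.stringify(rawParams));
```

In both shapes the literal 1 never appears in the filter expression itself, which is what the syntax error in the question was pointing at.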